Commit Graph

25191 Commits

Author SHA1 Message Date
Alexander Korotkov bc3c8db8ae Display length and bounds histograms in pg_stats
Values corresponding to STATISTIC_KIND_RANGE_LENGTH_HISTOGRAM and
STATISTIC_KIND_BOUNDS_HISTOGRAM were not exposed to pg_stats when these
slot kinds were introduced in 918eee0c49.

This commit adds the missing fields to pg_stats.

Catversion is bumped.

Discussion: https://postgr.es/m/flat/b67d8b57-9357-7e82-a2e7-f6ce6eaeec67@postgrespro.ru
Author: Egor Rogov, Soumyadeep Chakraborty
Reviewed-by: Tomas Vondra, Justin Pryzby, Jian He
2023-11-27 01:32:17 +02:00
Tom Lane 3558f120f8 Doc: list AT TIME ZONE and COLLATE in operator precedence table.
These constructs have precedence, but we forgot to list them.
In HEAD, mention AT LOCAL as well as AT TIME ZONE.

Per gripe from Shay Rojansky.

Discussion: https://postgr.es/m/CADT4RqBPdbsZW7HS1jJP319TMRHs1hzUiP=iRJYR6UqgHCrgNQ@mail.gmail.com
2023-11-26 16:40:24 -05:00
Tomas Vondra b2caf7c0e1 Fix brin.c indentation issues introduced by c1ec02be1d
Per buildfarm member koel.
2023-11-26 21:35:32 +01:00
Tomas Vondra c1ec02be1d Reuse BrinDesc and BrinRevmap in brininsert
The brininsert code used to initialize (and destroy) BrinDesc and
BrinRevmap for each tuple, which is not free. This patch initializes
these structures only once, and reuses them for all inserts in the same
command. The data is passed through indexInfo->ii_AmCache.

This also introduces an optional AM callback "aminsertcleanup" that
allows performing custom cleanup in case simply pfree-ing ii_AmCache is
not sufficient (which is the case when the cache contains TupleDesc,
Buffers, and so on).

Author: Soumyadeep Chakraborty
Reviewed-by: Alvaro Herrera, Matthias van de Meent, Tomas Vondra
Discussion: https://postgr.es/m/CAE-ML%2B9r2%3DaO1wwji1sBN9gvPz2xRAtFUGfnffpd0ZqyuzjamA%40mail.gmail.com
2023-11-25 20:27:28 +01:00
Bruce Momjian 8d981341a5 C comment: clarify that WAL files can be _recycled_ or removed
Reported-by: Michael Paquier

Discussion: https://postgr.es/m/CAB7nPqSDdF0heotQU3gsepgqx+9c+6KjLd3R6aNYH7KKfDd2ig@mail.gmail.com

Author: Michael Paquier

Backpatch-through: master
2023-11-25 10:48:18 -05:00
Bruce Momjian 79588d3c8d Use SECS_PER_HOUR macro in tzparser.c, instead of constants
Reported-by: CharSyam

Discussion: https://postgr.es/m/CAMrLSE5j_aWfoBDMrSvk14oBKSy+-2cjzNNH_FciirA7Kwo9TA@mail.gmail.com

Author: CharSyam

Backpatch-through: master
2023-11-24 22:36:23 -05:00
Bruce Momjian 344afc7769 modify segno. for pg_walfile_name() and pg_walfile_name_offset()
Previously these functions returned the previous segment number if the
LSN was on a segment boundary.  We now always return the current segment
number for an LSN.

Docs updated to reflect this change.  Regression tests added, author
Andres Freund.

Also mentioned in thread https://postgr.es/m/flat/20220204225057.GA1535307%40nathanxps13#d964275c9540d8395e138efc0a75f7e8

BACKWARD INCOMPATIBILITY

Reported-by: Kyotaro Horiguchi

Discussion: https://postgr.es/m/20190726.172120.101752680.horikyota.ntt@gmail.com

Co-authored-by: Kyotaro Horiguchi

Backpatch-through: master
2023-11-24 19:44:09 -05:00
Tom Lane d053a879bb Fix timing-dependent failure in GSSAPI data transmission.
When using GSSAPI encryption in non-blocking mode, libpq sometimes
failed with "GSSAPI caller failed to retransmit all data needing
to be retried".  The cause is that pqPutMsgEnd rounds its transmit
request down to an even multiple of 8K, and sometimes that can lead
to not requesting a write of data that was requested to be written
(but reported as not written) earlier.  That can upset pg_GSS_write's
logic for dealing with not-yet-written data, since it's possible
the data in question had already been incorporated into an encrypted
packet that we weren't able to send during the previous call.

We could fix this with a one-or-two-line hack to disable pqPutMsgEnd's
round-down behavior, but that seems like making the caller work around
a behavior that pg_GSS_write shouldn't expose in this way.  Instead,
adjust pg_GSS_write to never report a partial write: it either
reports a complete write, or reflects the failure of the lower-level
pqsecure_raw_write call.  The requirement still exists for the caller
to present at least as much data as on the previous call, but with
the caller-visible write start point not moving there is no temptation
for it to present less.  We lose some ability to reclaim buffer space
early, but I doubt that that will make much difference in practice.

This also gets rid of a rather dubious assumption that "any
interesting failure condition (from pqsecure_raw_write) will recur
on the next try".  We've not seen failure reports traceable to that,
but I've never trusted it particularly and am glad to remove it.

Make the same adjustments to the equivalent backend routine
be_gssapi_write().  It is probable that there's no bug on the backend
side, since we don't have a notion of nonblock mode there; but we
should keep the logic the same to ease future maintenance.

Per bug #18210 from Lars Kanis.  Back-patch to all supported branches.

Discussion: https://postgr.es/m/18210-4c6d0b14627f2eb8@postgresql.org
2023-11-23 13:30:18 -05:00
Heikki Linnakangas 50c67c2019 Use ResourceOwner to track WaitEventSets.
A WaitEventSet holds file descriptors or event handles (on Windows).
If FreeWaitEventSet is not called, those fds or handles are leaked.
Use ResourceOwners to track WaitEventSets, to clean those up
automatically on error.

This was a live bug in async Append nodes, if a FDW's
ForeignAsyncRequest function failed. (In back branches, I will apply a
more localized fix for that based on PG_TRY-PG_FINALLY.)

The added test doesn't check for leaking resources, so it passed even
before this commit. But at least it covers the code path.

In the passing, fix misleading comment on what the 'nevents' argument
to WaitEventSetWait means.

Report by Alexander Lakhin, analysis and suggestion for the fix by
Tom Lane. Fixes bug #17828.

Reviewed-by: Alexander Lakhin, Thomas Munro
Discussion: https://www.postgresql.org/message-id/472235.1678387869@sss.pgh.pa.us
2023-11-23 13:31:36 +02:00
Bruce Momjian 414e75540f C comment: fix typos with unnecessary apostrophes
Reported-by: Vinayak Pokale

Discussion: https://postgr.es/m/CAEySZvh7gPTOqMhuKOBXEt=qF_1BCvFQB4MAJ4yaTPJHxgX_zw@mail.gmail.com

Author: Vinayak Pokale

Backpatch-through: master
2023-11-22 23:41:15 -05:00
Amit Kapila eeb0ebad79 Fix the initial sync tables with no columns.
The copy command formed for initial sync was using parenthesis for tables
with no columns leading to syntax error. This patch avoids adding
parenthesis for such tables.

Reported-by: Justin G
Author: Vignesh C
Reviewed-by: Peter Smith, Amit Kapila
Backpatch-through: 15
Discussion: http://postgr.es/m/18203-df37fe354b626670@postgresql.org
2023-11-22 11:44:14 +05:30
Amit Kapila ff68cc6f3b Stop the search once the slot for replication origin is found.
In replorigin_session_setup(), we were needlessly looping for
max_replication_slots even after finding an existing slot for the origin.
This shouldn't hurt us much except for probably large values of
max_replication_slots.

Author: Antonin Houska
Discussion: http://postgr.es/m/2694.1700471273@antos
2023-11-22 08:39:24 +05:30
Michael Paquier e83aa9f92f Simplify some logic in CreateReplicationSlot()
This refactoring reduces the code in charge of creating replication
slots from two "if" block to a single one, making it slightly cleaner.

This change is possible since 1d04a59be3, that has removed the
intermediate code that existed between the two "if" blocks in charge of
initializing the output message buffer.

Author: Peter Smith
Discussion: https://postgr.es/m/CAHut+PtnJzqKT41Zt8pChRzba=QgCqjtfYvcf84NMj3VFJoKfw@mail.gmail.com
2023-11-21 13:55:01 +09:00
Amit Kapila 7c3fb505b1 Log messages for replication slot acquisition and release.
This commit log messages (at LOG level when log_replication_commands is
set, otherwise at DEBUG1 level) when walsenders acquire and release
replication slots. These messages help to know the lifetime of a
replication slot - one can know how long a streaming standby, logical
subscriber, or replication slot consumer is down. These messages will be
useful on production servers to debug and analyze inactive replication
slots.

Note that these messages are emitted only for walsenders but not for
backends. This is because walsenders are the ones that typically hold
replication slots for longer durations, unlike backends which hold them
for executing replication related functions.

Author: Bharath Rupireddy
Reviewed-by: Peter Smith, Amit Kapila, Alvaro Herrera
Discussion: http://postgr.es/m/CALj2ACX17G7F-jeLt+7KhJ6YxVeRwR8Zk0rDh4VnT546o0UpTQ@mail.gmail.com
2023-11-21 07:59:53 +05:30
Jeff Davis ad57c2a7c5 Optimize check_search_path() by using SearchPathCache.
A hash lookup is faster than re-validating the string, particularly
because we use SplitIdentifierString() for validation.

Important when search_path changes frequently.

Discussion: https://postgr.es/m/04c8592dbd694e4114a3ed87139a7a04e4363030.camel%40j-davis.com
2023-11-20 15:53:42 -08:00
Jeff Davis 8efa301532 Be more paranoid about OOM in search_path cache.
Recent commit f26c2368dc introduced a search_path cache, but left some
potential out-of-memory hazards. Simplify the code and make it safer
against OOM.

This change reintroduces one list_copy(), losing a small amount of the
performance gained in f26c2368dc. A future change may optimize away
the list_copy() again if it can be done in a safer way.

Discussion: https://postgr.es/m/e6fded24cb8a2c53d4ef069d9f69cc7baaafe9ef.camel@j-davis.com
2023-11-20 15:36:07 -08:00
Michael Paquier 3650e7a393 Prevent overflow for block number in buffile.c
As coded, the start block calculated by BufFileAppend() would overflow
once more than 16k files are used with a default block size.  This issue
existed before b1e5c9fa9a, but there's no reason not to be clean about
it.

Per report from Coverity, with a fix suggested by Tom Lane.
2023-11-20 09:14:53 +09:00
Tomas Vondra 28f84f72fb Lock table in DROP STATISTICS
The DROP STATISTICS code failed to properly lock the table, leading to

  ERROR:  tuple concurrently deleted

when executed concurrently with ANALYZE.

Fixed by modifying RemoveStatisticsById() to acquire the same lock as
ANALYZE. This function is called only by DROP STATISTICS, as ANALYZE
calls RemoveStatisticsDataById() directly.

Reported by Justin Pryzby, fix by me. Backpatch through 12. The code was
like this since it was introduced in 10, but older releases are EOL.

Reported-by: Justin Pryzby
Reviewed-by: Tom Lane
Backpatch-through: 12

Discussion: https://postgr.es/m/ZUuk-8CfbYeq6g_u@pryzbyj2023
2023-11-19 21:03:38 +01:00
Dean Rasheed b218fbb7a3 Guard against overflow in interval_mul() and interval_div().
Commits 146604ec43 and a898b409f6 added overflow checks to
interval_mul(), but not to interval_div(), which contains almost
identical code, and so is susceptible to the same kinds of
overflows. In addition, those checks did not catch all possible
overflow conditions.

Add additional checks to the "cascade down" code in interval_mul(),
and copy all the overflow checks over to the corresponding code in
interval_div(), so that they both generate "interval out of range"
errors, rather than returning bogus results.

Given that these errors are relatively easy to hit, back-patch to all
supported branches.

Per bug #18200 from Alexander Lakhin, and subsequent investigation.

Discussion: https://postgr.es/m/18200-5ea288c7b2d504b1%40postgresql.org
2023-11-18 14:41:20 +00:00
Andres Freund b2e237afdd Release lock on heap buffer before vacuuming FSM
When there are no indexes on a table, we vacuum each heap block after
pruning it and then update the freespace map. Periodically, we also
vacuum the freespace map. This was done while unnecessarily holding a
lock on the heap page. Release the lock before calling
FreeSpaceMapVacuumRange() and, while we're at it, ensure the range
includes the heap block we just vacuumed.

There are no known deadlocks or other similar issues, therefore don't
backpatch. It's certainly not good to do all this work under a lock, but it's
not frequently reached, making it not worth the risk of backpatching.

Author: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/CAAKRu_YiL%3D44GvGnt1dpYouDSSoV7wzxVoXs8m3p311rp-TVQQ%40mail.gmail.com
2023-11-17 12:46:55 -08:00
Tom Lane f7816aec23 Extract column statistics from CTE references, if possible.
examine_simple_variable() left this as an unimplemented case years
ago, with the result that plans for queries involving un-flattened
CTEs might be much stupider than necessary.  It's not hard to extend
the existing logic for RTE_SUBQUERY cases to also be able to drill
down into CTEs, so let's do that.

There was some discussion of whether this patch breaks the idea
of a MATERIALIZED CTE being an optimization fence.  We concluded
it's okay, because we already allow the outer planner level to
see the estimated width and rowcount of the CTE result, and
letting it see column statistics too seems fairly equivalent.
Basically, what we expect of the optimization fence is that the
outer query should not affect the plan chosen for the CTE query.
Once that plan is chosen, it's okay for the outer planner level
to make use of whatever information we have about it.

Jian Guo and Tom Lane, per complaint from Hans Buschmann

Discussion: https://postgr.es/m/4504e67078d648cdac3651b2960da6e7@nidsa.net
2023-11-17 14:36:23 -05:00
Tom Lane 8d5573b92e Don't specify number of dimensions in cases where we don't know it.
A few places in array_in() and plperl would report a misleading value
(always MAXDIM+1) for the number of dimensions in the input, because
we'd error out as soon as that was clearly too large rather than
scanning the entire input.  There doesn't seem to be much value in
offering the true number, at least not enough to justify the extra
complication involved in trying to get it.  So just remove that
parenthetical remark.  We already have other places that do it
like that, anyway.

Per suggestions from Alexander Lakhin and Heikki Linnakangas.

Discussion: https://postgr.es/m/2794005.1683042087@sss.pgh.pa.us
2023-11-17 11:29:46 -05:00
Michael Paquier b1e5c9fa9a Change logtape/tuplestore code to use int64 for block numbers
The code previously relied on "long" as type to track block numbers,
which would be 4 bytes in all Windows builds or any 32-bit builds.  This
limited the code to be able to handle up to 16TB of data with the
default block size of 8kB, like during a CLUSTER.  This code now relies
on a more portable int64, which should be more than enough for at least
the next 20 years to come.

This issue has been reported back in 2017, but nothing was done about it
back then, so here we go now.

Reported-by: Peter Geoghegan
Reviewed-by: Heikki Linnakangas
Discussion: https://postgr.es/m/CAH2-WznCscXnWmnj=STC0aSa7QG+BRedDnZsP=Jo_R9GUZvUrg@mail.gmail.com
2023-11-17 11:20:53 +09:00
Michael Paquier c99c7a4871 Remove NOT_USED BufFileTellBlock() from buffile.c
This routine has been marked as NOT_USED since 20ad43b576 from 2000,
and a patch is planned to switch the logtape/tuplestore APIs to rely on
int64 rather than long for the block nunbers, which is more portable.

Keeping it is more confusing than anything at this stage, so let's get
rid of it entirely.

Thanks for Heikki Linnakangas for the poke on this one.

Discussion: https://postgr.es/m/5047be8c-7ee6-4dd5-af76-6c916c3103b4@iki.fi
2023-11-17 10:46:50 +09:00
Tom Lane 743ddafc71 Ensure we preprocess expressions before checking their volatility.
contain_mutable_functions and contain_volatile_functions give
reliable answers only after expression preprocessing (specifically
eval_const_expressions).  Some places understand this, but some did
not get the memo --- which is not entirely their fault, because the
problem is documented only in places far away from those functions.
Introduce wrapper functions that allow doing the right thing easily,
and add commentary in hopes of preventing future mistakes from
copy-and-paste of code that's only conditionally safe.

Two actual bugs of this ilk are fixed here.  We failed to preprocess
column GENERATED expressions before checking mutability, so that the
code could fail to detect the use of a volatile function
default-argument expression, or it could reject a polymorphic function
that is actually immutable on the datatype of interest.  Likewise,
column DEFAULT expressions weren't preprocessed before determining if
it's safe to apply the attmissingval mechanism.  A false negative
would just result in an unnecessary table rewrite, but a false
positive could allow the attmissingval mechanism to be used in a case
where it should not be, resulting in unexpected initial values in a
new column.

In passing, re-order the steps in ComputePartitionAttrs so that its
checks for invalid column references are done before applying
expression_planner, rather than after.  The previous coding would
not complain if a partition expression contains a disallowed column
reference that gets optimized away by constant folding, which seems
to me to be a behavior we do not want.

Per bug #18097 from Jim Keener.  Back-patch to all supported versions.

Discussion: https://postgr.es/m/18097-ebb179674f22932f@postgresql.org
2023-11-16 10:05:14 -05:00
Michael Paquier 2e8a0edc2a Add target "slru" to pg_stat_reset_shared()
Currently, pg_stat_reset_shared() cannot reset the counters in the view
pg_stat_slru even if it is a type of shared stats.  This patch adds
support for a new value in pg_stat_reset_shared(), called "slru", able
to do that.  Note that pg_stat_reset_shared(NULL) also resets SLRU
counters.

There may be a point in removing pg_stat_reset_slru() that was
introduced in 28cac71bd3 (v13~) as the new option overlaps with this
function, but we would lose the ability to reset individual SLRU
counters.  This is left for future reconsideration.

Author: Atsushi Torikoshi
Discussion: https://postgr.es/m/e3c25d72e81378e7b64f3c52e0306fc9@oss.nttdata.com
2023-11-16 15:41:34 +09:00
Nathan Bossart 6a72c42fd5 Retire MemoryContextResetAndDeleteChildren() macro.
As of commit eaa5808e8e, MemoryContextResetAndDeleteChildren() is
just a backwards compatibility macro for MemoryContextReset().  Now
that some time has passed, this macro seems more likely to create
confusion.

This commit removes the macro and replaces all remaining uses with
calls to MemoryContextReset().  Any third-party code that use this
macro will need to be adjusted to call MemoryContextReset()
instead.  Since the two have behaved the same way since v9.5, such
adjustments won't produce any behavior changes for all
currently-supported versions of PostgreSQL.

Reviewed-by: Amul Sul, Tom Lane, Alvaro Herrera, Dagfinn Ilmari Mannsåker
Discussion: https://postgr.es/m/20231113185950.GA1668018%40nathanxps13
2023-11-15 13:42:30 -06:00
Heikki Linnakangas c21e6e2fd4 Clear CurrentResourceOwner earlier in CommitTransaction.
Alexander reported a crash with repeated create + drop database, after
the ResourceOwner rewrite (commit b8bff07daa). That was fixed by the
previous commit, but it nevertheless seems like a good idea clear
CurrentResourceOwner earlier, because you're not supposed to use it
for anything after we start releasing it.

Reviewed-by: Alexander Lakhin
Discussion: https://www.postgresql.org/message-id/11b70743-c5f3-3910-8e5b-dd6c115ff829%40gmail.com
2023-11-15 11:03:49 +01:00
Heikki Linnakangas a8b330ffb6 Fix dsa.c with different resource owners.
The comments in dsa.c suggested that areas were owned by resource
owners, but it was not in fact tracked explicitly. The DSM attachments
held by the dsa were owned by resource owners, but not the area
itself.  That led to confusion if you used one resource owner to
attach or create the area, but then switched to a different resource
owner before allocating or even just accessing the allocations in the
area with dsa_get_address(). The additional DSM segments associated
with the area would get owned by a different resource owner than the
initial segment.  To fix, add an explicit 'resowner' field to
dsa_area.  It replaces the 'mapping_pinned' flag; resowner == NULL now
indicates that the mapping is pinned.

This is arguably a bug fix, but I'm not backpatching because it
doesn't seem to be a live bug in the back branches. In 'master', it is
a bug because commit b8bff07daa made ResourceOwners more strict so
that you are no longer allowed to remember new resources in a
ResourceOwner after you have started to release it. Merely accessing a
dsa pointer might need to attach a new DSM segment, and before this
commit it was temporarily remembered in the current owner for a very
brief period even if the DSA was pinned. And that could happen in
AtEOXact_PgStat(), which is called after the owner is already released.

Reported-by: Alexander Lakhin
Reviewed-by: Alexander Lakhin, Thomas Munro, Andres Freund
Discussion: https://www.postgresql.org/message-id/11b70743-c5f3-3910-8e5b-dd6c115ff829%40gmail.com
2023-11-15 10:34:28 +01:00
Jeff Davis f26c2368dc Add cache for recomputeNamespacePath().
When search_path is changed to something that was previously set, and
no invalidation happened in between, use the cached list of namespace
OIDs rather than recomputing them. This avoids syscache lookups and
ACL checks.

Important when the search_path changes frequently, such as when set in
proconfig.

An earlier version of this patch was reviewd by Nathan Bossart. This
version simplifies a few things and is safer in case of OOM.

Discussion: https://www.postgresql.org/message-id/abf4ce8804e0e05dff8c1725ae6a8ed28b7d66e0.camel%40j-davis.com
Reviewed-by: Nathan Bossart
2023-11-14 17:53:30 -08:00
Robert Haas 025584a168 Change how a base backup decides which files have checksums.
Previously, it thought that any plain file located under global, base,
or a tablespace directory had checksums unless it was in a short list
of excluded files. Now, it thinks that files in those directories have
checksums if parse_filename_for_nontemp_relation says that they are
relation files. (Temporary relation files don't matter because they're
excluded from the backup anyway.)

This changes the behavior if you have stray files not managed by
PostgreSQL in the relevant directories. Previously, you'd get some
kind of checksum-related complaint if such files existed, assuming
that the cluster had checksums enabled and that the base backup
wasn't run with NOVERIFY_CHECKSUMS. Now, you won't get those
complaints any more. That seems like an improvement to me, because
those files were presumably not created by PostgreSQL and so there
is no reason to think that they would be checksummed like a
PostgreSQL relation file. (If we want to complain about such files,
we should complain about them existing at all, not just about their
checksums.)

The point of this change is to make the code more consistent.
sendDir() was already calling parse_filename_for_nontemp_relation()
as part of an effort to determine which files to include in the
backup. So, it already had the information about whether a certain
file was a relation file. sendFile() then used a separate method,
embodied in is_checksummed_file(), to make what is essentially
the same determination. It's better not to make the same decision
using two different methods, especially in closely-related code.

Patch by me. Reviewed by Dilip Kumar and Álvaro Herrera. Thanks
also to Jakub Wartak and Peter Eisentraut for comments, suggestions,
and testing on the larger patch set of which this is a part.

Discussion: http://postgr.es/m/CAFiTN-snhaKkWhi2Gz5i3cZeKefun6sYL==wBoqqnTXxX4_mFA@mail.gmail.com
Discussion: http://postgr.es/m/202311141312.u4qx5gtpvfq3@alvherre.pgsql
2023-11-14 10:51:05 -05:00
Dean Rasheed 519fc1bd9e Support +/- infinity in the interval data type.
This adds support for infinity to the interval data type, using the
same input/output representation as the other date/time data types
that support infinity. This allows various arithmetic operations on
infinite dates, timestamps and intervals.

The new values are represented by setting all fields of the interval
to INT32/64_MIN for -infinity, and INT32/64_MAX for +infinity. This
ensures that they compare as less/greater than all other interval
values, without the need for any special-case comparison code.

Note that, since those 2 values were formerly accepted as legal finite
intervals, pg_upgrade and dump/restore from an old database will turn
them from finite to infinite intervals. That seems OK, since those
exact values should be extremely rare in practice, and they are
outside the documented range supported by the interval type, which
gives us a certain amount of leeway.

Bump catalog version.

Joseph Koshakow, Jian He, and Ashutosh Bapat, reviewed by me.

Discussion: https://postgr.es/m/CAAvxfHea4%2BsPybKK7agDYOMo9N-Z3J6ZXf3BOM79pFsFNcRjwA%40mail.gmail.com
2023-11-14 10:58:49 +00:00
Peter Eisentraut 3849fe7c2b Replace Gen_dummy_probes.sed with Gen_dummy_probes.pl
To generate a dummy probes.h file when dtrace is not available, we had
two different scripts: A sed version, which is the original version,
and a Perl version, which was generated by s2p.  This split was
necessary because Perl was not a mandatory build dependency on Unix,
but sed was not guaranteed to be available on Windows.

(The Meson build system used the sed version even on Windows, which
was probably incorrect and probably would have had to be fixed before
elevating that build system from experimental status.)

As of 721856ff24, Perl is a required build dependency, so this split
is no longer necessary.  We can just use the Perl script in all build
environments and remove a whole bunch of infrastructure to keep the
two variants in sync.

The new Gen_dummy_probes.pl is not the version generated by s2p but a
new implementation written by hand by adapting the sed version to Perl
syntax.

Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://www.postgresql.org/message-id/3fd0f1bc-4483-4ba9-8aa0-64765b052039@eisentraut.org
2023-11-14 10:27:10 +01:00
Michael Paquier e5cca6288a Add support for pg_stat_reset_slru without argument
pg_stat_reset_slru currently requires an input argument, either:
- NULL to reset the SLRU counters of everything.
- A specific value to reset a single SLRU cache.

This commit adds support for a new pattern: pg_stat_reset_slru without
any argument works the same way as pg_stat_reset_slru(NULL), relying on
a DEFAULT in the function definition to handle this case.  This makes
the function more consistent with 23c8c0c8f4.

Bump catalog version.

Author: Bharath Rupireddy
Reviewed-by: Atsushi Torikoshi
Discussion: https://postgr.es/m/CALj2ACW1VizYg01EeH_cA-7qA+4NzWVAoZ5Lw9_XYO1RRHAZbA@mail.gmail.com
2023-11-14 09:50:52 +09:00
Tom Lane 83472de606 Improve readability and error detection of array_in().
Rewrite array_in() and its subroutines so that we make only one
pass over the input text, rather than two.  This requires
potentially re-pallocing the working arrays values[] and nulls[]
larger than our initial guess, but that cost will hopefully be made
up by avoiding duplicate parsing.  In any case this coding seems
much clearer and more straightforward than what we had before.

This also fixes array_in() to reject non-rectangular input (that is,
different brace depths in different parts of the input) more reliably
than before, and to give a better error message when it does so.
This is analogous to the plpython and plperl fixes in 0553528e7 and
f47004add.  Like those PLs, we now accept input such as '{{},{}}'
as a valid representation of an empty array, which we did not before.

Additionally, reject explicit array subscripts that are outside the
integer range (previously you just got whatever atoi() converted
them to), and make some other minor improvements in error reporting.

Although this is arguably a bug fix, it's also a behavioral change
that might trip somebody up, so no back-patch.

Tom Lane, Heikki Linnakangas, and Jian He.  Thanks to Alexander Lakhin
for the initial report and for review/testing.

Discussion: https://postgr.es/m/2794005.1683042087@sss.pgh.pa.us
2023-11-13 13:01:51 -05:00
Bruce Momjian acc95f29ef Add error about the use of FREEZE in COPY TO
Also clarify some other error wording.

Reported-by: Kyotaro Horiguchi

Discussion: https://postgr.es/m/20220802.133046.1941977979333284049.horikyota.ntt@gmail.com

Backpatch-through: master
2023-11-13 12:53:03 -05:00
Tom Lane 5c62ecf6ec Don't release index root page pin in ginFindParents().
It's clearly stated in the comments that ginFindParents() must keep
the pin on the index's root page that's associated with the topmost
GinBtreeStack item.  However, the code path for the case that the
desired downlink has been pushed down to the next index level
ignored this proviso, and would release the pin anyway if we were
still examining the root level.  That led to an assertion failure
or "buffer NNNN is not owned by resource owner" error later, when
we try to release the pin again at the end of the insertion.

This is quite hard to reproduce, since it can only happen if an
index root page split occurs concurrently with our own insertion.
Thanks to Jeff Janes for finding a test case that triggers it
often enough to allow investigation.

This has been there since the beginning of GIN, so back-patch
to all supported branches.

Discussion: https://postgr.es/m/CAMkU=1yCAKtv86dMrD__Ja-7KzjE=uMeKX8y__cx5W-OEWy2ow@mail.gmail.com
2023-11-13 11:44:35 -05:00
Etsuro Fujita 06e8e71e7f Remove incorrect file reference in comment.
Commit b7eda3e0e moved XidInMVCCSnapshot() from tqual.c into snapmgr.c,
but follow-up commit c91560def incorrectly updated this reference.  We
could fix it, but as pointed out by Daniel Gustafsson, 1) the reader can
easily find the file that contains the definition of that function, e.g.
by grepping, and 2) this kind of reference is prone to going stale; so
let's just remove it.

Back-patch to all supported branches.

Reviewed by Daniel Gustafsson.

Discussion: https://postgr.es/m/CAPmGK145VdKkPBLWS2urwhgsfidbSexwY-9zCL6xSUJH%2BBTUUg%40mail.gmail.com
2023-11-13 19:05:00 +09:00
Amit Kapila 861f86beea Use REGBUF_NO_CHANGE at one more place in the hash index.
Commit 00d7fb5e2e started to use REGBUF_NO_CHANGE at a few places in the
code where we register the buffer before marking it dirty but missed
updating one of the code flows in the hash index where we free the overflow
page without any live tuples on it.

Author: Amit Kapila and Hayato Kuroda
Discussion: http://postgr.es/m/f045c8f7-ee24-ead6-3679-c04a43d21351@gmail.com
2023-11-13 14:08:26 +05:30
Michael Paquier 7606175991 Extend sendFileWithContent() to handle custom content length in basebackup.c
sendFileWithContent() previously got the content length by using
strlen(), assuming that the content given is always a string.  Some
patches are under discussion to pass binary contents to a base backup
stream, where an arbitrary length needs to be given by the caller
instead.

The patch extends sendFileWithContent() to be able to handle this case,
where len < 0 can be used to indicate an arbitrary length rather than
rely on strlen() for the content length.

A comment in sendFileWithContent() mentioned the backup_label file.
However, this routine is used by more file types, like the tablespace
map, so adjust it in passing.

Author: David Steele
Discussion: https://postgr.es/m/2daf8adc-8db7-4204-a7f2-a7e94e2bfa4b@pgmasters.net
2023-11-13 08:26:44 +09:00
Michael Paquier 23c8c0c8f4 Add ability to reset all shared stats types in pg_stat_reset_shared()
Currently, pg_stat_reset_shared() can use an argument to specify the
target of statistics to reset, doing nothing for NULL as it is strict.

This patch adds to pg_stat_reset_shared() the possibility to reset all
the stats types already handled in this function rather than do nothing
if the argument value given is NULL or if nothing is specified
(proisstrict is switched to false).  Like previously, SLRUs are not
included in what gets reset.

The idea to use NULL or no argument to control if all the shared stats
already covered by this function should be reset has been proposed by
Andres Freund.

Bump catalog version.

Author: Atsushi Torikoshi
Reviewed-by: Kyotaro Horiguchi, Michael Paquier, Bharath Rupireddy,
Matthias van de Meent
Discussion: https://postgr.es/m/4291a55137ddda77cf7cc5f46e846daf@oss.nttdata.com
2023-11-12 16:43:12 +09:00
Alexander Korotkov b7f315c9d7 Fix how SJE checks against PHVs
It seems that a PHV evaluated/needed at or below the self join should not have
a problem if we remove the self join.  But this requires further investigation.
For now, we just do not remove self joins if the rel to be removed is laterally
referenced by PHVs.

Discussion: https://postgr.es/m/CAMbWs4-ns73VF9gi37q61G3dS6Xuos+HtryMaBh37WQn=BsaJw@mail.gmail.com
Author: Richard Guo
2023-11-10 22:46:46 +02:00
Amit Kapila 8bfb231b43 Prohibit max_slot_wal_keep_size to value other than -1 during upgrade.
We don't want existing slots in the old cluster to get invalidated during
the upgrade. During an upgrade, we set this variable to -1 via the command
line in an attempt to prevent such invalidations, but users have ways to
override it. This patch ensures the value is not overridden by the user.

Author: Kyotaro Horiguchi
Reviewed-by: Peter Smith, Alvaro Herrera
Discussion: http://postgr.es/m/20231027.115759.2206827438943188717.horikyota.ntt@gmail.com
2023-11-10 08:45:01 +05:30
Tom Lane 36f5594c0f Fix computation of varnullingrels when const-folding field selection.
We can simplify FieldSelect on a whole-row Var into a plain Var
for the selected field.  However, we should copy the whole-row Var's
varnullingrels when we do so, because the new Var is clearly nullable
by exactly the same rels as the original.  Failure to do this led to
errors like "wrong varnullingrels (b) (expected (b 3)) for Var 2/2".

Richard Guo, per bug #18184 from Marian Krucina.  Back-patch to
v16 where varnullingrels was introduced.

Discussion: https://postgr.es/m/18184-5868dd258782058e@postgresql.org
2023-11-09 15:46:16 -05:00
Alexander Korotkov b44a1708ab Fix the way SJE removes references from PHVs
Add missing replacement of relids in phv->phexpr.  Also, remove extra
replace_relid() over phv->phrels.

Reported-by:  Zuming Jiang
Bug: #18187
Discussion: https://postgr.es/m/flat/18187-831da249cbd2ff8e%40postgresql.org
Author: Richard Guo
Reviewed-by: Andrei Lepikhov
2023-11-09 14:25:13 +02:00
Dean Rasheed 3850d4dec1 Avoid integer overflow hazard in interval_time().
When casting an interval to a time, the original code suffered from
64-bit integer overflow for inputs with a sufficiently large negative
"time" field, leading to bogus results.

Fix by rewriting the algorithm in a simpler form, that more obviously
cannot overflow. While at it, improve the test coverage to include
negative interval inputs.

Discussion: https://postgr.es/m/CAEZATCXoUKHkcuq4q63hkiPsKZJd0kZWzgKtU%2BNT0aU4wbf_Pw%40mail.gmail.com
2023-11-09 12:10:14 +00:00
Dean Rasheed a4f7d33a90 Fix AFTER ROW trigger execution in MERGE cross-partition update.
When executing a MERGE UPDATE action, if the UPDATE is turned into a
cross-partition DELETE then INSERT, do not attempt to invoke AFTER
UPDATE ROW triggers, or any of the other post-update actions in
ExecUpdateEpilogue().

For consistency with a plain UPDATE command, such triggers should not
be fired (and typically fail anyway), and similarly, other post-update
actions, such as WCO/RLS checks should not be executed, and might also
lead to unexpected failures.

Therefore, as with ExecUpdate(), make ExecMergeMatched() return
immediately if ExecUpdateAct() reports that a cross-partition update
was done, to be sure that no further processing is done for that
tuple.

Back-patch to v15, where MERGE was introduced.

Discussion: https://postgr.es/m/CAEZATCWjBgagyNZs02vgDF0DvASYj-iHTFtXG2-nP3orZhmtcw%40mail.gmail.com
2023-11-09 11:23:42 +00:00
David Rowley 10d34fefc2 Ensure we use the correct spelling of "ensure"
We seem to have accidentally used "insure" in a few places.  Correct
that.

Author: Peter Smith
Discussion: https://postgr.es/m/CAHut+Pv0biqrhA3pMhu40aDsj343mTsD75khKnHsLqR8P04f=Q@mail.gmail.com
Backpatch-through: 12, oldest supported version
2023-11-10 00:15:54 +13:00
Heikki Linnakangas 8f4a1ab471 Fix bug in the new ResourceOwner implementation.
When the hash table is in use, ResoureOwnerSort() moves any elements
from the small fixed-size array to the hash table, and sorts it. When
the hash table is not in use, it sorts the elements in the small
fixed-size array directly. However, ResourceOwnerSort() and
ResourceOwnerReleaseAll() had different idea on when the hash table is
in use: ResourceOwnerSort() checked owner->nhash != 0, and
ResourceOwnerReleaseAll() checked owner->hash != NULL. If the hash
table was allocated but was currently empty, you hit an assertion
failure.

Reported-by: Alexander Lakhin <exclusion@gmail.com>
Discussion: https://www.postgresql.org/message-id/be58d565-9e95-d417-4e47-f6bd408dea4b@gmail.com
2023-11-09 01:33:14 +02:00
Alvaro Herrera b0f7dd915b
Check stack depth in new recursive functions
Commit b0e96f3119 introduced a bunch of recursive functions, but
failed to make them check for stack depth.  This can cause the backend
to crash when operating on inheritance hierarchies several thousands
deep.  Protect the code by adding the missing stack depth checks.

Reported-by: Alexander Lakhin <exclusion@gmail.com>
Discussion: https://postgr.es/m/b2ac2392-9727-5f76-e890-721ac80c1615@gmail.com
2023-11-08 18:44:54 +01:00
Heikki Linnakangas 954e43564d Use a faster hash function in resource owners.
This buys back some of the performance loss that we otherwise saw from the
previous commit.

Reviewed-by: Aleksander Alekseev, Michael Paquier, Julien Rouhaud
Reviewed-by: Kyotaro Horiguchi, Hayato Kuroda, Álvaro Herrera, Zhihong Yu
Reviewed-by: Peter Eisentraut, Andres Freund
Discussion: https://www.postgresql.org/message-id/d746cead-a1ef-7efe-fb47-933311e876a3%40iki.fi
2023-11-08 13:30:52 +02:00
Heikki Linnakangas b8bff07daa Make ResourceOwners more easily extensible.
Instead of having a separate array/hash for each resource kind, use a
single array and hash to hold all kinds of resources. This makes it
possible to introduce new resource "kinds" without having to modify
the ResourceOwnerData struct. In particular, this makes it possible
for extensions to register custom resource kinds.

The old approach was to have a small array of resources of each kind,
and if it fills up, switch to a hash table. The new approach also uses
an array and a hash, but now the array and the hash are used at the
same time. The array is used to hold the recently added resources, and
when it fills up, they are moved to the hash. This keeps the access to
recent entries fast, even when there are a lot of long-held resources.

All the resource-specific ResourceOwnerEnlarge*(),
ResourceOwnerRemember*(), and ResourceOwnerForget*() functions have
been replaced with three generic functions that take resource kind as
argument. For convenience, we still define resource-specific wrapper
macros around the generic functions with the old names, but they are
now defined in the source files that use those resource kinds.

The release callback no longer needs to call ResourceOwnerForget on
the resource being released. ResourceOwnerRelease unregisters the
resource from the owner before calling the callback. That needed some
changes in bufmgr.c and some other files, where releasing the
resources previously always called ResourceOwnerForget.

Each resource kind specifies a release priority, and
ResourceOwnerReleaseAll releases the resources in priority order. To
make that possible, we have to restrict what you can do between
phases. After calling ResourceOwnerRelease(), you are no longer
allowed to remember any more resources in it or to forget any
previously remembered resources by calling ResourceOwnerForget.  There
was one case where that was done previously. At subtransaction commit,
AtEOSubXact_Inval() would handle the invalidation messages and call
RelationFlushRelation(), which temporarily increased the reference
count on the relation being flushed. We now switch to the parent
subtransaction's resource owner before calling AtEOSubXact_Inval(), so
that there is a valid ResourceOwner to temporarily hold that relcache
reference.

Other end-of-xact routines make similar calls to AtEOXact_Inval()
between release phases, but I didn't see any regression test failures
from those, so I'm not sure if they could reach a codepath that needs
remembering extra resources.

There were two exceptions to how the resource leak WARNINGs on commit
were printed previously: llvmjit silently released the context without
printing the warning, and a leaked buffer io triggered a PANIC. Now
everything prints a WARNING, including those cases.

Add tests in src/test/modules/test_resowner.

Reviewed-by: Aleksander Alekseev, Michael Paquier, Julien Rouhaud
Reviewed-by: Kyotaro Horiguchi, Hayato Kuroda, Álvaro Herrera, Zhihong Yu
Reviewed-by: Peter Eisentraut, Andres Freund
Discussion: https://www.postgresql.org/message-id/cbfabeb0-cd3c-e951-a572-19b365ed314d%40iki.fi
2023-11-08 13:30:50 +02:00
Heikki Linnakangas b70c2143bb Move a few ResourceOwnerEnlarge() calls for safety and clarity.
These are functions where a lot of things happen between the
ResourceOwnerEnlarge and ResourceOwnerRemember calls. It's important
that there are no unrelated ResourceOwnerRemember calls in the code in
between, otherwise the reserved entry might be used up by the
intervening ResourceOwnerRemember and not be available at the intended
ResourceOwnerRemember call anymore. I don't see any bugs here, but the
longer the code path between the calls is, the harder it is to verify.

In bufmgr.c, there is a function similar to ResourceOwnerEnlarge,
ReservePrivateRefCountEntry(), to ensure that the private refcount
array has enough space. The ReservePrivateRefCountEntry() calls were
made at different places than the ResourceOwnerEnlargeBuffers()
calls. Move the ResourceOwnerEnlargeBuffers() and
ReservePrivateRefCountEntry() calls together for consistency.

Reviewed-by: Aleksander Alekseev, Michael Paquier, Julien Rouhaud
Reviewed-by: Kyotaro Horiguchi, Hayato Kuroda, Álvaro Herrera, Zhihong Yu
Reviewed-by: Peter Eisentraut, Andres Freund
Discussion: https://www.postgresql.org/message-id/cbfabeb0-cd3c-e951-a572-19b365ed314d%40iki.fi
2023-11-08 13:30:46 +02:00
Michael Paquier 1b2c6b756e Enlarge assertion in bloom_init() for false_positive_rate
false_positive_rate is a parameter that can be set with the bloom
opclass in BRIN, and setting it to a value of exactly 0.25 would trigger
an assertion in the first INSERT done on the index with value set.

The assertion changed here relied on BLOOM_{MIN|MAX}_FALSE_POSITIVE_RATE
that are somewhat arbitrary values, and specifying an out-of-range value
would also trigger a failure when defining such an index.  So, as-is,
the assertion was just doubling on the min-max check of the reloption.
This is now enlarged to check that it is a correct percentage value,
instead, based on a suggestion by Tom Lane.

Author: Alexander Lakhin
Reviewed-by: Tom Lane, Shihao Zhong
Discussion: https://postgr.es/m/17969-a6c54de48026d694@postgresql.org
Backpatch-through: 14
2023-11-08 14:06:26 +09:00
Alvaro Herrera 615f5f6faa
Stop including parsenodes.h in plannodes.h
I added it by mistake in commit 7103ebb7aa.  To clean up, struct
MergeAction needs to be moved to primnodes.h from parsenodes.h.  (This
forces us to also move OverridingKind to primnodes.h).

Having to add parsenodes.h to bootstrap.h as fallout is a bit
surprising, since nothing nominally needs it there.  However, per
comments in bootscanner.l, it is needed so that YYSTYPE can be declared.
I think this only started with commit dac048f71e, but I didn't
actually verify that.

In passing, stop including parsenodes.h in tcopprot.h.  Nothing needs it
there.

Per discussion on a patch by Ashutosh Bapat.

Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/202311071106.6y7b2ascqjlz@alvherre.pgsql
2023-11-07 19:26:39 +01:00
Michael Paquier c2bdd2c5b1 Reorder two functions in inval.c
This file separates public and static functions with a separator
comment, but two routines were not defined in a location reflecting
that, so reorder them.

Author: Aleksander Alekseev
Reviewed-by: Álvaro Herrera, Michael Paquier
Discussion: https://postgr.es/m/CAJ7c6TMX2dd0g91UKvcC+CVygKQYJkKJq1+ZzT4rOK42+b53=w@mail.gmail.com
2023-11-07 11:55:13 +09:00
David Rowley ac7d6f5f83 Make use of initReadOnlyStringInfo() in more places
f0efa5aec introduced the concept of "read-only" StringInfos which makes
use of an existing, possibly not NUL terminated, buffer.

Here we adjust two places that make use of StringInfos to receive data
to avoid using appendBinaryStringInfo() in cases where a NUL termination
character is not required.  This saves a possible palloc() and saves
having to needlessly memcpy() from one buffer to another.

Here we adjust two places which were using appendBinaryStringInfo().
Neither of these cases seem particularly performance-critical.  In the
case of XLogWalRcvProcessMsg(), the appendBinaryStringInfo() was only
appending 24 bytes.  The change made here does mean that we can get rid
of the incoming_message global variable and make that local instead.

The apply_spooled_messages() case applies in logical decoding when
applying (possibly large) changes which have been serialized to a file.

Reviewed-by: Amit Kapila
Discussion: https://postgr.es/m/CAApHDvoxYUDHwqPf-ShvchsERf1RzmkGoLwg63JNvHCkDCuyKQ@mail.gmail.com
2023-11-07 11:16:43 +13:00
Tom Lane 18b585155a Detect integer overflow while computing new array dimensions.
array_set_element() and related functions allow an array to be
enlarged by assigning to subscripts outside the current array bounds.
While these places were careful to check that the new bounds are
allowable, they neglected to consider the risk of integer overflow
in computing the new bounds.  In edge cases, we could compute new
bounds that are invalid but get past the subsequent checks,
allowing bad things to happen.  Memory stomps that are potentially
exploitable for arbitrary code execution are possible, and so is
disclosure of server memory.

To fix, perform the hazardous computations using overflow-detecting
arithmetic routines, which fortunately exist in all still-supported
branches.

The test cases added for this generate (after patching) errors that
mention the value of MaxArraySize, which is platform-dependent.
Rather than introduce multiple expected-files, use psql's VERBOSITY
parameter to suppress the printing of the message text.  v11 psql
lacks that parameter, so omit the tests in that branch.

Our thanks to Pedro Gallegos for reporting this problem.

Security: CVE-2023-5869
2023-11-06 10:56:43 -05:00
Tom Lane 3b0776fde5 Compute aggregate argument types correctly in transformAggregateCall().
transformAggregateCall() captures the datatypes of the aggregate's
arguments immediately to construct the Aggref.aggargtypes list.
This seems reasonable because the arguments have already been
transformed --- but there is an edge case where they haven't been.
Specifically, if we have an unknown-type literal in an ANY argument
position, nothing will have been done with it earlier.  But if we
also have DISTINCT, then addTargetToGroupList() converts the literal
to "text" type, resulting in the aggargtypes list not matching the
actual runtime type of the argument.  The end result is that the
aggregate tries to interpret a "text" value as being of type
"unknown", that is a zero-terminated C string.  If the text value
contains no zero bytes, this could result in disclosure of server
memory following the text literal value.

To fix, move the collection of the aggargtypes list to the end
of transformAggregateCall(), after DISTINCT has been handled.
This requires slightly more code, but not a great deal.

Our thanks to Jingzhou Fu for reporting this problem.

Security: CVE-2023-5868
2023-11-06 10:38:00 -05:00
Peter Eisentraut 721856ff24 Remove distprep
A PostgreSQL release tarball contains a number of prebuilt files, in
particular files produced by bison, flex, perl, and well as html and
man documentation.  We have done this consistent with established
practice at the time to not require these tools for building from a
tarball.  Some of these tools were hard to get, or get the right
version of, from time to time, and shipping the prebuilt output was a
convenience to users.

Now this has at least two problems:

One, we have to make the build system(s) work in two modes: Building
from a git checkout and building from a tarball.  This is pretty
complicated, but it works so far for autoconf/make.  It does not
currently work for meson; you can currently only build with meson from
a git checkout.  Making meson builds work from a tarball seems very
difficult or impossible.  One particular problem is that since meson
requires a separate build directory, we cannot make the build update
files like gram.h in the source tree.  So if you were to build from a
tarball and update gram.y, you will have a gram.h in the source tree
and one in the build tree, but the way things work is that the
compiler will always use the one in the source tree.  So you cannot,
for example, make any gram.y changes when building from a tarball.
This seems impossible to fix in a non-horrible way.

Second, there is increased interest nowadays in precisely tracking the
origin of software.  We can reasonably track contributions into the
git tree, and users can reasonably track the path from a tarball to
packages and downloads and installs.  But what happens between the git
tree and the tarball is obscure and in some cases non-reproducible.

The solution for both of these issues is to get rid of the step that
adds prebuilt files to the tarball.  The tarball now only contains
what is in the git tree (*).  Getting the additional build
dependencies is no longer a problem nowadays, and the complications to
keep these dual build modes working are significant.  And of course we
want to get the meson build system working universally.

This commit removes the make distprep target altogether.  The make
dist target continues to do its job, it just doesn't call distprep
anymore.

(*) - The tarball also contains the INSTALL file that is built at make
dist time, but not by distprep.  This is unchanged for now.

The make maintainer-clean target, whose job it is to remove the
prebuilt files in addition to what make distclean does, is now just an
alias to make distprep.  (In practice, it is probably obsolete given
that git clean is available.)

The following programs are now hard build requirements in configure
(they were already required by meson.build):

- bison
- flex
- perl

Reviewed-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://www.postgresql.org/message-id/flat/e07408d9-e5f2-d9fd-5672-f53354e9305e@eisentraut.org
2023-11-06 15:18:04 +01:00
Noah Misch b72de09a1b Set GUC "is_superuser" in all processes that set AuthenticatedUserId.
It was always false in single-user mode, in autovacuum workers, and in
background workers.  This had no specifically-identified security
consequences, but non-core code or future work might make it
security-relevant.  Back-patch to v11 (all supported versions).

Jelte Fennema-Nio.  Reported by Jelte Fennema-Nio.
2023-11-06 06:14:13 -08:00
Noah Misch 3a9b18b309 Ban role pg_signal_backend from more superuser backend types.
Documentation says it cannot signal "a backend owned by a superuser".
On the contrary, it could signal background workers, including the
logical replication launcher.  It could signal autovacuum workers and
the autovacuum launcher.  Block all that.  Signaling autovacuum workers
and those two launchers doesn't stall progress beyond what one could
achieve other ways.  If a cluster uses a non-core extension with a
background worker that does not auto-restart, this could create a denial
of service with respect to that background worker.  A background worker
with bugs in its code for responding to terminations or cancellations
could experience those bugs at a time the pg_signal_backend member
chooses.  Back-patch to v11 (all supported versions).

Reviewed by Jelte Fennema-Nio.  Reported by Hemanth Sandrana and
Mahendrakar Srinivasarao.

Security: CVE-2023-5870
2023-11-06 06:14:13 -08:00
Daniel Gustafsson 526fe0d799 Add XMLText function (SQL/XML X038)
This function implements the standard XMLTest function, which
converts text into xml text nodes. It uses the libxml2 function
xmlEncodeSpecialChars to escape predefined entities (&"<>), so
that those do not cause any conflict when concatenating the text
node output with existing xml documents.

This also adds a note in  features.sgml about not supporting
XML(SEQUENCE). The SQL specification defines a RETURNING clause
to a set of XML functions, where RETURNING CONTENT or RETURNING
SEQUENCE can be defined. Since PostgreSQL doesn't support
XML(SEQUENCE) all of these functions operate with an
implicit RETURNING CONTENT.

Author: Jim Jones <jim.jones@uni-muenster.de>
Reviewed-by: Vik Fearing <vik@postgresfriends.org>
Discussion: https://postgr.es/m/86617a66-ec95-581f-8d54-08059cca8885@uni-muenster.de
2023-11-06 09:38:29 +01:00
Alexander Korotkov 93c85db3b5 Fix allocation of UniqueRelInfo
Reported-by: Richard Guo
Discussion: https://postgr.es/m/CAMbWs4_STsG1PKQBuvQC8W4sPo3KvML3=jOTjKLUYQuK3g8cpQ@mail.gmail.com
2023-11-06 10:04:01 +02:00
Alexander Korotkov ec63622c03 Fix usage of the parse tree for estimate_num_groups() in set operations
recurse_set_operations() uses the parse tree for the group number estimation,
because of the "varno 0" hack.  At the same time 2489d76c49 made root->parse
and corresponding parent_root->simple_rte_array[]->subquery distinct copies
of the parse tree, while d3d55ce571 introduced self-join removal replacing
relid of removed relation only in one of the copies.

The present commit fixes this bug by making recurse_set_operations() call
estimate_num_groups() with the copy of the parse tree processed by self-join
removal.

In future, we may think about maintaining just one copy of the parse tree
and/or keeping removed relids as aliases.

Reported-by: Zuming Jiang
Bug: #18170
Discussion: https://postgr.es/m/flat/18170-f1d17bf9a0d58b24%40postgresql.org
Author: Richard Guo, Alexander Korotkov
Reviewed-by: Andrei Lepikhov
2023-11-04 03:30:18 +02:00
Tom Lane 0bc726d95a Make GetConfigOption/GetConfigOptionResetString return "" for NULL.
As per the preceding commit, GUC APIs generally expose NULL-valued
string variables as empty strings.  Extend that policy to
GetConfigOption() and GetConfigOptionResetString(), eliminating
a crash hazard for unwary callers, as well as a fundamental
ambiguity in GetConfigOption()'s API.

No back-patch, since this is an API change and conceivably somebody
somewhere is depending on this corner case.

Xing Guo, Aleksander Alekseev, Tom Lane

Discussion: https://postgr.es/m/CACpMh+AyDx5YUpPaAgzVwC1d8zfOL4JoD-uyFDnNSa1z0EsDQQ@mail.gmail.com
2023-11-02 11:53:36 -04:00
Tom Lane 7704a1a72e Be more wary about NULL values for GUC string variables.
get_explain_guc_options() crashed if a string GUC marked GUC_EXPLAIN
has a NULL boot_val.  Nosing around found a couple of other places
that seemed insufficiently cautious about NULL string values, although
those are likely unreachable in practice.  Add some commentary
defining the expectations for NULL values of string variables,
in hopes of forestalling future additions of more such bugs.

Xing Guo, Aleksander Alekseev, Tom Lane

Discussion: https://postgr.es/m/CACpMh+AyDx5YUpPaAgzVwC1d8zfOL4JoD-uyFDnNSa1z0EsDQQ@mail.gmail.com
2023-11-02 11:47:33 -04:00
Jeff Davis a02b37fc08 Additional unicode primitive functions.
Introduce unicode_version(), icu_unicode_version(), and
unicode_assigned().

The latter requires introducing a new lookup table for the Unicode
General Category, which is generated along with the other Unicode
lookup tables.

Discussion: https://postgr.es/m/CA+TgmoYzYR-yhU6k1XFCADeyj=Oyz2PkVsa3iKv+keM8wp-F_A@mail.gmail.com
Reviewed-by: Peter Eisentraut
2023-11-01 22:47:06 -07:00
Daniel Gustafsson 0f852cccd9 Fix function name in comment
The name of the function resulting from the macro expansion was
incorrectly stated.

Backpatch to 16 where it was introduced.

Author: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Discussion: https://postgr.es/m/20231101.172308.1740861597185391383.horikyota.ntt@gmail.com
Backpatch-through: v16
2023-11-01 11:46:30 +01:00
Bruce Momjian 989adace3f doc: 1-byte varlena headers can be used for user PLAIN storage
This also updates some C comments.

Reported-by: suchithjn22@gmail.com

Discussion: https://postgr.es/m/167336599095.2667301.15497893107226841625@wrigleys.postgresql.org

Author: Laurenz Albe (doc patch)

Backpatch-through: 11
2023-10-31 09:10:35 -04:00
Bruce Momjian 75e700db45 improve alignment of postgresql.conf comments
Discussion: https://postgr.es/m/CAHut+Ps5MdQ1b4jp9rd63zfE2X25mV58y1W+hm2v53svtGDxBQ@mail.gmail.com

Author: Peter Smith

Backpatch-through: master
2023-10-31 08:51:36 -04:00
Noah Misch 13503eb590 Diagnose !indisvalid in more SQL functions.
pgstatindex failed with ERRCODE_DATA_CORRUPTED, of the "can't-happen"
class XX.  The other functions succeeded on an empty index; they might
have malfunctioned if the failed index build left torn I/O or other
complex state.  Report an ERROR in statistics functions pgstatindex,
pgstatginindex, pgstathashindex, and pgstattuple.  Report DEBUG1 and
skip all index I/O in maintenance functions brin_desummarize_range,
brin_summarize_new_values, brin_summarize_range, and
gin_clean_pending_list.  Back-patch to v11 (all supported versions).

Discussion: https://postgr.es/m/20231001195309.a3@google.com
2023-10-30 14:46:05 -07:00
Bruce Momjian 56b30e266e pgindent run to fix commits de64268561 and 5ae2087202
Reported-by: Michael Paquier

Discussion: https://postgr.es/m/ZT9YH7-TTx27V3yW@paquier.xyz

Backpatch-through: master
2023-10-30 14:52:35 -04:00
Peter Eisentraut 0c60e8ba80 Fill in more of ObjectProperty
Fill in .objtype field where an appropriate value exists.

These cases are currently not used (see also comments at
get_object_type()), but we might as well fill in what's possible in
case additional uses arise.

Discussion: https://www.postgresql.org/message-id/flat/75ae5875-3abc-dafc-8aec-73247ed41cde@eisentraut.org
2023-10-30 06:08:53 -04:00
Michael Paquier dc5bd38894 Delay recovery mode LOG after reading backup_label and/or checkpoint record
When beginning recovery, a LOG is displayed by the startup process to
show which recovery mode will be used depending on the .signal file(s)
set in the data folder, like "standby mode", recovery up to a given
target type and value, or archive recovery.

A different patch is under discussion to simplify the startup code by
requiring the presence of recovery.signal and/or standby.signal when a
backup_label file is read.  Delaying a bit this LOG ensures that the
correct recovery mode would be reported, and putting it at this position
does not make it lose its value.

While on it, this commit adds a few comments documenting a bit more the
initial recovery steps and their dependencies, and fixes an incorrect
comment format.  This introduces no behavior changes.

Extracted from a larger patch by me.

Reviewed-by: David Steele, Bowen Shi
Discussion: https://postgr.es/m/ZArVOMifjzE7f8W7@paquier.xyz
2023-10-30 15:28:20 +09:00
Michael Paquier 1ffdc03c21 Mention standby.signal in FATALs for checkpoint record missing at recovery
When beginning recovery from a base backup by reading a backup_label
file, it may be possible that no checkpoint record is available
depending on the method used when the case backup was taken, which would
prevent recovery from beginning.  In this case, the FATAL messages
issued, initially added by c900c15269, mentioned recovery.signal as
an option to do recovery but not standby.signal.  Let's add it as an
available option, for clarity.

Per suggestion from Bowen Shi, extracted from a larger patch by me.

Author: Michael Paquier
Discussion: https://postgr.es/m/CAM_vCudkSjr7NsNKSdjwtfAm9dbzepY6beZ5DP177POKy8=2aw@mail.gmail.com
2023-10-30 13:56:02 +09:00
Michael Paquier 96f052613f Introduce pg_stat_checkpointer
Historically, the statistics of the checkpointer have been always part
of pg_stat_bgwriter.  This commit removes a few columns from
pg_stat_bgwriter, and introduces pg_stat_checkpointer with equivalent,
renamed columns (plus a new one for the reset timestamp):
- checkpoints_timed -> num_timed
- checkpoints_req -> num_requested
- checkpoint_write_time -> write_time
- checkpoint_sync_time -> sync_time
- buffers_checkpoint -> buffers_written

The fields of PgStat_CheckpointerStats and its SQL functions are renamed
to match with the new field names, for consistency.  Note that
background writer and checkpointer have been split into two different
processes in commits 806a2aee37 and bf405ba8e4.  The pgstat
structures were already split, making this change straight-forward.

Bump catalog version.

Author: Bharath Rupireddy
Reviewed-by: Bertrand Drouvot, Andres Freund, Michael Paquier
Discussion: https://postgr.es/m/CALj2ACVxX2ii=66RypXRweZe2EsBRiPMj0aHfRfHUeXJcC7kHg@mail.gmail.com
2023-10-30 09:47:16 +09:00
Michael Paquier bf01e1ba96 Refactor some code related to transaction-level statistics for relations
This commit refactors find_tabstat_entry() so as transaction counters
for inserted, updated and deleted tuples are included in the result
returned.   If a shared entry is found for a relation, its result is now
a copy of the PgStat_TableStatus entry retrieved from shared memory.
This idea has been proposed by Andres Freund.

While on it, the following SQL functions, used in system views, are
refactored with macros, in the same spirit as 83a1a1b566, reducing the
amount of code:
- pg_stat_get_xact_tuples_deleted()
- pg_stat_get_xact_tuples_inserted()
- pg_stat_get_xact_tuples_updated()

There is now only one caller of find_tabstat_entry() in the tree.

Author: Bertrand Drouvot
Discussion: https://postgr.es/m/b9e1f543-ee93-8168-d530-d961708ad9d3@gmail.com
2023-10-30 08:23:39 +09:00
Dean Rasheed b2d55447a5 Guard against overflow in make_interval().
The original code did very little to guard against integer or floating
point overflow when computing the interval's fields.  Detect any such
overflows and error out, rather than silently returning bogus results.

Joseph Koshakow, reviewed by Ashutosh Bapat and me.

Discussion: https://postgr.es/m/CAAvxfHcm1TPwH_zaGWuFoL8pZBestbRZTU6Z%3D-RvAdSXTPbKfg%40mail.gmail.com
2023-10-29 15:51:53 +00:00
Tom Lane 237f8765df Fix intra-query memory leak when a SRF returns zero rows.
When looping around after finding that the set-returning function
returned zero rows for the current input tuple, ExecProjectSet
neglected to reset either of the two memory contexts it's
responsible for cleaning out.  Typically this wouldn't cause much
problem, because once the SRF does return at least one row, the
contexts would get reset on the next call.  However, if the SRF
returns no rows for many input tuples in succession, quite a lot
of memory could be transiently consumed.

To fix, make sure we reset both contexts while looping around.

Per bug #18172 from Sergei Kornilov.  Back-patch to all supported
branches.

Discussion: https://postgr.es/m/18172-9b8c5fc1d676ded3@postgresql.org
2023-10-28 14:05:01 -04:00
Bruce Momjian de64268561 doc: comment wording improvement
Discussion: https://postgr.es/m/CAEG8a3L7UoZXH1VmzpV-VDkex2kt68nWKuW1WiohoT=RrzYKWA@mail.gmail.com

Author: Junwang Zhao

Backpatch-through: master
2023-10-28 12:59:00 -04:00
Bruce Momjian 12cf3ac7f3 doc Improve C GUC-related comments
Discussion: https://postgr.es/m/CAEG8a3LZHTR5S+OPZCbZvECwsqdbx=pBRFZZyDjKaAtgoALOQQ@mail.gmail.com

Author: Junwang Zhao

Backpatch-through: master
2023-10-27 19:05:25 -04:00
Tomas Vondra c6cf6d353c Fix minmax-multi distance for extreme interval values
When calculating distance for interval values, the code mostly mimicked
interval_mi, i.e. it built a new interval value for the difference.
That however does not work for sufficiently distant interval values,
when the difference overflows the interval range.

Instead, we can calculate the distance directly, without constructing
the intermediate (and unnecessary) interval value.

Backpatch to 14, where minmax-multi indexes were introduced.

Reported-by: Dean Rasheed
Reviewed-by: Ashutosh Bapat, Dean Rasheed
Backpatch-through: 14
Discussion: https://postgr.es/m/eef0ea8c-4aaa-8d0d-027f-58b1f35dd170@enterprisedb.com
2023-10-27 18:15:37 +02:00
Tomas Vondra 8da86d62a1 Fix minmax-multi on infinite date/timestamp values
Make sure that infinite values in date/timestamp columns are treated as
if in infinite distance. Infinite values should not be merged with other
values, leaving them as outliers. The code however returned distance 0
in this case, so that infinite values were merged first. While this does
not break the index (i.e. it still produces correct query results), it
may make it much less efficient.

We don't need explicit handling of infinite date/timestamp values when
calculating distances, because those values are represented as extreme
but regular values (e.g. INT64_MIN/MAX for the timestamp type).

We don't need an exact distance, just a value that is much larger than
distanced between regular values. With the added cast to double values,
we can simply subtract the values.

The regression test queries a value in the "gap" and checks the range
was properly eliminated by the BRIN index.

This only affects minmax-multi indexes on timestamp/date columns with
infinite values, which is not very common in practice. The affected
indexes may need to be rebuilt.

Backpatch to 14, where minmax-multi indexes were introduced.

Reported-by: Ashutosh Bapat
Reviewed-by: Ashutosh Bapat, Dean Rasheed
Backpatch-through: 14
Discussion: https://postgr.es/m/eef0ea8c-4aaa-8d0d-027f-58b1f35dd170@enterprisedb.com
2023-10-27 18:15:37 +02:00
Tomas Vondra 394d517314 Fix calculation in brin_minmax_multi_distance_date
When calculating the distance between date values, make sure to subtract
them in the right order, i.e. (larger - smaller).

The distance is used to determine which values to merge, and is expected
to be a positive value. The code unfortunately did the subtraction in
the opposite order, i.e. (smaller - larger), thus producing negative
values and merging values the most distant values first.

The resulting index is correct (i.e. produces correct results), but may
be significantly less efficient. This affects all minmax-multi indexes
on date columns.

Backpatch to 14, where minmax-multi indexes were introduced.

Reported-by: Ashutosh Bapat
Reviewed-by: Ashutosh Bapat, Dean Rasheed
Backpatch-through: 14
Discussion: https://postgr.es/m/eef0ea8c-4aaa-8d0d-027f-58b1f35dd170@enterprisedb.com
2023-10-27 18:15:37 +02:00
Tomas Vondra b5489b75c6 Fix overflow when calculating timestamp distance in BRIN
When calculating distances for timestamp values for BRIN minmax-multi
indexes, we need to be careful about overflows for extreme values. If
the value overflows into a negative value, the index may be inefficient.

The new regression test checks this for the timestamp type by adding a
table with enough values to force range compaction/merging. The values
are close to min/max, which means a risk of overflow.

Fixed by converting the int64 values to double first, before calculating
the distance. This prevents the overflow. We may lose some precision, of
course, but that's good enough. In the worst case we build a slightly
less efficient index, but for large distances this won't matter.

This only affects minmax-multi indexes on timestamp columns, with ranges
containing values sufficiently distant to cause an overflow. That seems
like a fairly rare case in practice.

Backpatch to 14, where minmax-multi indexes were introduced.

Reported-by: Ashutosh Bapat
Reviewed-by: Ashutosh Bapat, Dean Rasheed
Backpatch-through: 14
Discussion: https://postgr.es/m/eef0ea8c-4aaa-8d0d-027f-58b1f35dd170@enterprisedb.com
2023-10-27 18:15:37 +02:00
Alexander Korotkov 2b26a69455 Make UniqueRelInfo a node
d3d55ce571 changed RelOptInfo.unique_for_rels from the list of Relid sets to
the list of UniqueRelInfo's.  But it didn't make UniqueRelInfo a node.
This commit makes UniqueRelInfo a node.  Also this commit revises some
comments related to RelOptInfo.unique_for_rels.

Reported-by: Tom Lane
Discussion: https://postgr.es/m/flat/1189851.1698340331%40sss.pgh.pa.us
2023-10-27 05:45:16 +03:00
Michael Paquier 74604a37f2 Remove buffers_backend and buffers_backend_fsync from pg_stat_checkpointer
Two attributes related to checkpointer statistics are removed in this
commit:
- buffers_backend, that counts the number of buffers written directly by
a backend.
- buffers_backend_fsync, that counts the number of times a backend had
to do fsync() by its own.

These are actually not checkpointer properties but backend properties.
Also, pg_stat_io provides a more accurate and equivalent report of these
numbers, by tracking all the I/O stats related to backends, including
writes and fsyncs, so storing them in pg_stat_checkpointer was
redundant.

Thanks also to Robert Haas and Amit Kapila for their input.

Bump catalog version.

Author: Bharath Rupireddy
Reviewed-by: Bertrand Drouvot, Andres Freund
Discussion: https://postgr.es/m/20230210004604.mcszbscsqs3bc5nx@awork3.anarazel.de
2023-10-27 11:16:39 +09:00
David Rowley 0c882a2988 Optimize various aggregate deserialization functions, take 2
f0efa5aec added initReadOnlyStringInfo to allow a StringInfo to be
initialized from an existing buffer and also relaxed the requirement
that a StringInfo's buffer must be NUL terminated at data[len].  Now
that we have that, there's no need for these aggregate deserial
functions to use appendBinaryStringInfo() as that rather wastefully
palloc'd a new buffer and memcpy'd in the bytea's buffer.  Instead, we can
just use the bytea's buffer and point the StringInfo directly to that
using the new initializer function.

In Amdahl's law, this speeds up the serial portion of parallel
aggregates and makes sum(numeric), avg(numeric), var_pop(numeric),
var_samp(numeric), variance(numeric), stddev_pop(numeric),
stddev_samp(numeric), stddev(numeric), array_agg(anyarray),
string_agg(text) and string_agg(bytea) scale better in parallel queries.

Author: David Rowley
Discussion: https://postgr.es/m/CAApHDvr%3De-YOigriSHHm324a40HPqcUhSp6pWWgjz5WwegR%3DcQ%40mail.gmail.com
2023-10-27 10:41:55 +13:00
Amit Langote 1f06b7fc6e Avoid compiler warning in non-assert builds
After 01575ad788, expand_single_inheritance_child()'s parentOID
variable is read only in an Assert, provoking a compiler warning in
non-assert builds.  Fix that by marking the variable with
PG_USED_FOR_ASSERTS_ONLY.

Per report and suggestion from David Rowley

Discussion: https://postgr.es/m/CAApHDvpjA_8Wxu4DCTRVAvPxC9atwMe6N%2ByvrcGsgb7mrfdpJA%40mail.gmail.com
2023-10-26 17:32:38 +09:00
Peter Eisentraut 611806cd72 Add trailing commas to enum definitions
Since C99, there can be a trailing comma after the last value in an
enum definition.  A lot of new code has been introducing this style on
the fly.  Some new patches are now taking an inconsistent approach to
this.  Some add the last comma on the fly if they add a new last
value, some are trying to preserve the existing style in each place,
some are even dropping the last comma if there was one.  We could
nudge this all in a consistent direction if we just add the trailing
commas everywhere once.

I omitted a few places where there was a fixed "last" value that will
always stay last.  I also skipped the header files of libpq and ecpg,
in case people want to use those with older compilers.  There were
also a small number of cases where the enum type wasn't used anywhere
(but the enum values were), which ended up confusing pgindent a bit,
so I left those alone.

Discussion: https://www.postgresql.org/message-id/flat/386f8c45-c8ac-4681-8add-e3b0852c1620%40eisentraut.org
2023-10-26 09:20:54 +02:00
David Rowley f0efa5aec1 Introduce the concept of read-only StringInfos
There were various places in our codebase which conjured up a StringInfo
by manually assigning the StringInfo fields and setting the data field
to point to some existing buffer.  There wasn't much consistency here as
to what fields like maxlen got set to and in one location we didn't
correctly ensure that the buffer was correctly NUL terminated at len
bytes, as per what was documented as required in stringinfo.h

Here we introduce 2 new functions to initialize StringInfos.  One allows
callers to initialize a StringInfo passing along a buffer that is
already allocated by palloc.  Here the StringInfo code uses this buffer
directly rather than doing any memcpying into a new allocation.  Having
this as a function allows us to verify the buffer is correctly NUL
terminated.  StringInfos initialized this way can be appended to and
reset just like any other normal StringInfo.

The other new initialization function also accepts an existing buffer,
but the given buffer does not need to be a pointer to a palloc'd chunk.
This buffer could be a pointer pointing partway into some palloc'd chunk
or may not even be palloc'd at all.  StringInfos initialized this way
are deemed as "read-only".  This means that it's not possible to
append to them or reset them.

For the latter of the two new initialization functions mentioned above,
we relax the requirement that the data buffer must be NUL terminated.
Relaxing this requirement is convenient in a few places as it can save
us from having to allocate an entire new buffer just to add the NUL
terminator or save us from having to temporarily add a NUL only to have to
put the original char back again later.

Incompatibility note:

Here we also forego adding the NUL in a few places where it does not
seem to be required.  These locations are passing the given StringInfo
into a type's receive function.  It does not seem like any of our
built-in receive functions require this, but perhaps there's some UDT
out there in the wild which does require this.  It is likely worthy of
a mention in the release notes that a UDT's receive function mustn't rely
on the input StringInfo being NUL terminated.

Author: David Rowley
Reviewed-by: Tom Lane
Discussion: https://postgr.es/m/CAApHDvorfO3iBZ%3DxpiZvp3uHtJVLyFaPBSvcAhAq2HPLnaNSwQ%40mail.gmail.com
2023-10-26 16:31:48 +13:00
Amit Langote 01575ad788 Prevent duplicate RTEPermissionInfo for plain-inheritance parents
Currently, expand_single_inheritance_child() doesn't reset
perminfoindex in a plain-inheritance parent's child RTE, because
prior to 387f9ed0a0, the executor would use the first child RTE to
locate the parent's RTEPermissionInfo.  That in turn causes
add_rte_to_flat_rtable() to create an extra RTEPermissionInfo
belonging to the parent's child RTE with the same content as the one
belonging to the parent's original ("root") RTE.

In 387f9ed0a0, we changed things so that the executor can now use the
parent's "root" RTE for locating its RTEPermissionInfo instead of the
child RTE, so the latter's perminfoindex need not be set anymore, so
make it so.

Reported-by: Tom Lane
Discussion: https://postgr.es/m/839708.1698174464@sss.pgh.pa.us
Backpatch-through: 16
2023-10-26 11:53:56 +09:00
Amit Kapila 29d0a77fa6 Migrate logical slots to the new node during an upgrade.
While reading information from the old cluster, a list of logical
slots is fetched. At the later part of upgrading, pg_upgrade revisits the
list and restores slots by executing pg_create_logical_replication_slot()
on the new cluster. Migration of logical replication slots is only
supported when the old cluster is version 17.0 or later.

If the old node has invalid slots or slots with unconsumed WAL records,
the pg_upgrade fails. These checks are needed to prevent data loss.

The significant advantage of this commit is that it makes it easy to
continue logical replication even after upgrading the publisher node.
Previously, pg_upgrade allowed copying publications to a new node. With
this patch, adjusting the connection string to the new publisher will
cause the apply worker on the subscriber to connect to the new publisher
automatically. This enables seamless continuation of logical replication,
even after an upgrade.

Author: Hayato Kuroda, Hou Zhijie
Reviewed-by: Peter Smith, Bharath Rupireddy, Dilip Kumar, Vignesh C, Shlok Kyal
Discussion: http://postgr.es/m/TYAPR01MB58664C81887B3AF2EB6B16E3F5939@TYAPR01MB5866.jpnprd01.prod.outlook.com
Discussion: http://postgr.es/m/CAA4eK1+t7xYcfa0rEQw839=b2MzsfvYDPz3xbD+ZqOdP3zpKYg@mail.gmail.com
2023-10-26 07:06:55 +05:30
Alexander Korotkov d3d55ce571 Remove useless self-joins
The Self Join Elimination (SJE) feature removes an inner join of a plain table
to itself in the query tree if is proved that the join can be replaced with
a scan without impacting the query result.  Self join and inner relation are
replaced with the outer in query, equivalence classes, and planner info
structures. Also, inner restrictlist moves to the outer one with removing
duplicated clauses. Thus, this optimization reduces the length of the range
table list (this especially makes sense for partitioned relations), reduces
the number of restriction clauses === selectivity estimations, and potentially
can improve total planner prediction for the query.

The SJE proof is based on innerrel_is_unique machinery.

We can remove a self-join when for each outer row:
 1. At most one inner row matches the join clause.
 2. Each matched inner row must be (physically) the same row as the outer one.

In this patch we use the next approach to identify a self-join:
 1. Collect all merge-joinable join quals which look like a.x = b.x
 2. Add to the list above the baseretrictinfo of the inner table.
 3. Check innerrel_is_unique() for the qual list.  If it returns false, skip
    this pair of joining tables.
 4. Check uniqueness, proved by the baserestrictinfo clauses. To prove
    the possibility of self-join elimination inner and outer clauses must have
    an exact match.

The relation replacement procedure is not trivial and it is partly combined
with the one, used to remove useless left joins.  Tests, covering this feature,
were added to join.sql.  Some regression tests changed due to self-join removal
logic.

Discussion: https://postgr.es/m/flat/64486b0b-0404-e39e-322d-0801154901f3%40postgrespro.ru
Author: Andrey Lepikhov, Alexander Kuzmenkov
Reviewed-by: Tom Lane, Robert Haas, Andres Freund, Simon Riggs, Jonathan S. Katz
Reviewed-by: David Rowley, Thomas Munro, Konstantin Knizhnik, Heikki Linnakangas
Reviewed-by: Hywel Carver, Laurenz Albe, Ronan Dunklau, vignesh C, Zhihong Yu
Reviewed-by: Greg Stark, Jaime Casanova, Michał Kłeczek, Alena Rybakina
Reviewed-by: Alexander Korotkov
2023-10-25 12:59:16 +03:00
Tom Lane 387f9ed0a0 Fix problems when a plain-inheritance parent table is excluded.
When an UPDATE/DELETE/MERGE's target table is an old-style
inheritance tree, it's possible for the parent to get excluded
from the plan while some children are not.  (I believe this is
only possible if we can prove that a CHECK ... NO INHERIT
constraint on the parent contradicts the query WHERE clause,
so it's a very unusual case.)  In such a case, ExecInitModifyTable
mistakenly concluded that the first surviving child is the target
table, leading to at least two bugs:

1. The wrong table's statement-level triggers would get fired.

2. In v16 and up, it was possible to fail with "invalid perminfoindex
0 in RTE with relid nnnn" due to the child RTE not having permissions
data included in the query plan.  This was hard to reproduce reliably
because it did not occur unless the update triggered some non-HOT
index updates.

In v14 and up, this is easy to fix by defining ModifyTable.rootRelation
to be the parent RTE in plain inheritance as well as partitioned cases.

While the wrong-triggers bug also appears in older branches, the
relevant code in both the planner and executor is quite a bit
different, so it would take a good deal of effort to develop and
test a suitable patch.  Given the lack of field complaints about the
trigger issue, I'll desist for now.  (Patching v11 for this seems
unwise anyway, given that it will have no more releases after next
month.)

Per bug #18147 from Hans Buschmann.

Amit Langote and Tom Lane

Discussion: https://postgr.es/m/18147-6fc796538913ee88@postgresql.org
2023-10-24 14:48:33 -04:00
Jeff Davis 00d7fb5e2e Assert that buffers are marked dirty before XLogRegisterBuffer().
Enforce the rule from transam/README in XLogRegisterBuffer(), and
update callers to follow the rule.

Hash indexes sometimes register clean pages as a part of the locking
protocol, so provide a REGBUF_NO_CHANGE flag to support that use.

Discussion: https://postgr.es/m/c84114f8-c7f1-5b57-f85a-3adc31e1a904@iki.fi
Reviewed-by: Heikki Linnakangas
2023-10-23 17:17:46 -07:00
Michael Paquier 9972c7de1d Fix typos in wait_event.c
Noticed while working on a different patch.  Introduced in af720b4c50.
2023-10-24 08:05:29 +09:00
Robert Haas 5b36e8f078 Change struct tablespaceinfo's oid member from 'char *' to 'Oid'
This shouldn't change behavior except in the unusual case where
there are file in the tablespace directory that have entirely
numeric names but are nevertheless not possible names for a
tablespace directory, either because their names have leading zeroes
that shouldn't be there, or the value is actually zero, or because
the value is too large to represent as an OID.

In those cases, the directory would previously have made it into
the list of tablespaceinfo objects and no longer will. Thus, base
backups will now ignore such directories, instead of treating them
as legitimate tablespace directories. Similarly, if entries for
such tablespaces occur in a tablespace_map file, they will now
be rejected as erroneous, instead of being honored.

This is infrastructure for future work that wants to be able to
know the tablespace of each relation that is part of a backup
*as an OID*. By strengthening the up-front validation, we don't
have to worry about weird cases later, and can more easily avoid
repeated string->integer conversions.

Patch by me, reviewed by David Steele.

Discussion: http://postgr.es/m/CA+TgmoZNVeBzoqDL8xvr-nkaepq815jtDR4nJzPew7=3iEuM1g@mail.gmail.com
2023-10-23 15:17:26 -04:00
Robert Haas 5c47c6546c Refactor parse_filename_for_nontemp_relation to parse more.
Instead of returning the number of characters in the RelFileNumber,
return the RelFileNumber itself. Continue to return the fork number,
as before, and additionally return the segment number.

parse_filename_for_nontemp_relation now rejects a RelFileNumber or
segment number that begins with a leading zero. Before, we accepted
such cases as relation filenames, but if we continued to do so after
this change, the function might return the same values for two
different files (e.g. 1234.5 and 001234.5 or 1234.005) which could be
annoying for callers. Since we don't actually ever generate filenames
with leading zeroes in the names, any such files that we find must
have been created by something other than PostgreSQL, and it is
therefore reasonable to treat them as non-relation files.

Along the way, change unlogged_relation_entry to store a RelFileNumber
rather than an OID. This update should have been made in
851f4cc75c, but it was overlooked.
It's trivial to make the update as part of this commit, perhaps more
trivial than it would have been without it, so do that.

Patch by me, reviewed by David Steele.

Discussion: http://postgr.es/m/CA+TgmoZNVeBzoqDL8xvr-nkaepq815jtDR4nJzPew7=3iEuM1g@mail.gmail.com
2023-10-23 15:08:53 -04:00
Michael Paquier b6f1cca9ba Remove unnecessary break in pg_logical_replication_slot_advance()
pg_logical_replication_slot_advance() included a break condition to stop
when a targeted LSN is reached, when processing a series of WAL records
with XLogReadRecord().  Since 38a957316d, it matched with the check of
its main while loop.  This condition saved from an extra CFI check,
actually pointless, so let's remove this condition and simplify the
code.

In passing, fix an incorrect comment.

Author: Bharath Rupireddy
Reviewed-by: Tom Lane, Gurjeet Singh
Discussion: https://postgr.es/m/CALj2ACWfGDLQ2cy7ZKwxnJqbDkO6Yvqqrqxne5ZN4HYm=PRTGg@mail.gmail.com
2023-10-23 10:20:30 +09:00
Thomas Munro dab889d60b Fix min_dynamic_shared_memory on Windows.
When min_dynamic_shared_memory is set above 0, we try to find space in a
pre-allocated region of the main shared memory area instead of calling
dsm_impl_XXX() routines to allocate more.  The dsm_pin_segment() and
dsm_unpin_segment() routines had a bug: they called dsm_impl_XXX()
routines even for main region segments.  Nobody noticed before now
because those routines do nothing on Unix, but on Windows they'd fail
while attempting to duplicate an invalid Windows HANDLE.  Add the
missing gating.

Back-patch to 14, where commit 84b1c63a added this feature.  Fixes
pgsql-bugs bug #18165.

Reported-by: Maxime Boyer <maxime.boyer@cra-arc.gc.ca>
Tested-by: Alexander Lakhin <exclusion@gmail.com>
Discussion: https://postgr.es/m/18165-bf4f525cea6e51de%40postgresql.org
2023-10-22 10:04:55 +13:00
Tom Lane 2d870b4aef Allow ALTER SYSTEM to set unrecognized custom GUCs.
Previously, ALTER SYSTEM failed if the target GUC wasn't present in
the session's GUC hashtable.  That is a reasonable behavior for core
(single-part) GUC names, and for custom GUCs for which we have loaded
an extension that's reserved the prefix.  But it's unnecessarily
restrictive otherwise, and it also causes inconsistent behavior:
you can "ALTER SYSTEM SET foo.bar" only if you did "SET foo.bar"
earlier in the session.  That's fairly silly.

Hence, refactor things so that we can execute ALTER SYSTEM even
if the variable doesn't have a GUC hashtable entry, as long as the
name meets the custom-variable naming requirements and does not
have a reserved prefix.  (It's safe to do this even if the
variable belongs to an extension we currently don't have loaded.
A bad value will at worst cause a WARNING when the extension
does get loaded.)

Also, adjust GRANT ON PARAMETER to have the same opinions about
whether to allow an unrecognized GUC name, and to throw the
same errors if not (it previously used a one-size-fits-all
message for several distinguishable conditions).  By default,
only a superuser will be allowed to do ALTER SYSTEM SET on an
unrecognized name, but it's possible to GRANT the ability to
do it.

Patch by me, pursuant to a documentation complaint from
Gavin Panella.  Arguably this is a bug fix, but given the
lack of other complaints I'll refrain from back-patching.

Discussion: https://postgr.es/m/2617358.1697501956@sss.pgh.pa.us
Discussion: https://postgr.es/m/169746329791.169914.16613647309012285391@wrigleys.postgresql.org
2023-10-21 13:35:19 -04:00
Alvaro Herrera 36a14afc07
Make some error strings more generic
It's undesirable to have SQL commands or configuration options in a
translatable error string, so take some of these out.
2023-10-20 22:52:15 +02:00
Tom Lane 2b5154beab Extend ALTER OPERATOR to allow setting more optimization attributes.
Allow the COMMUTATOR, NEGATOR, MERGES, and HASHES attributes to be set
by ALTER OPERATOR.  However, we don't allow COMMUTATOR/NEGATOR to be
changed once set, nor allow the MERGES/HASHES flags to be unset once
set.  Changes like that might invalidate plans already made, and
dealing with the consequences seems like more trouble than it's worth.
The main use-case we foresee for this is to allow addition of missed
properties in extension update scripts, such as extending an existing
operator to support hashing.  So only transitions from not-set to set
states seem very useful.

This patch also causes us to reject some incorrect cases that formerly
resulted in inconsistent catalog state, such as trying to set the
commutator of an operator to be some other operator that already has a
(different) commutator.

While at it, move the InvokeObjectPostCreateHook call for CREATE
OPERATOR to not occur until after we've fixed up commutator or negator
links as needed.  The previous ordering could only be justified by
thinking of the OperatorUpd call as a kind of ALTER OPERATOR step;
but we don't call InvokeObjectPostAlterHook therein.  It seems better
to let the hook see the final state of the operator object.

In the documentation, move the discussion of how to establish
commutator pairs from xoper.sgml to the CREATE OPERATOR ref page.

Tommy Pavlicek, reviewed and editorialized a bit by me

Discussion: https://postgr.es/m/CAEhP-W-vGVzf4udhR5M8Bdv88UYnPrhoSkj3ieR3QNrsGQoqdg@mail.gmail.com
2023-10-20 12:28:46 -04:00
Robert Haas afd12774ae During online checkpoints, insert XLOG_CHECKPOINT_REDO at redo point.
This allows tools that read the WAL sequentially to identify (possible)
redo points when they're reached, rather than only being able to
detect them in retrospect when XLOG_CHECKPOINT_ONLINE is found, possibly
much later in the WAL stream. There are other possible applications as
well; see the discussion links below.

Any redo location that precedes the checkpoint location should now point
to an XLOG_CHECKPOINT_REDO record, so add a cross-check to verify this.

While adjusting the code in CreateCheckPoint() for this patch, I made it
call WALInsertLockAcquireExclusive a bit later than before, since there
appears to be no need for it to be held while checking whether the system
is idle, whether this is an end-of-recovery checkpoint, or what the current
timeline is.

Bump XLOG_PAGE_MAGIC.

Patch by me, based in part on earlier work from Dilip Kumar. Review by
Dilip Kumar, Amit Kapila, Andres Freund, and Michael Paquier.

Discussion: http://postgr.es/m/CA+TgmoYy-Vc6G9QKcAKNksCa29cv__czr+N9X_QCxEfQVpp_8w@mail.gmail.com
Discussion: http://postgr.es/m/20230614194717.jyuw3okxup4cvtbt%40awork3.anarazel.de
Discussion: http://postgr.es/m/CA+hUKG+b2ego8=YNW2Ohe9QmSiReh1-ogrv8V_WZpJTqP3O+2w@mail.gmail.com
2023-10-19 14:47:29 -04:00
Tom Lane 8483a54b7d Doc: modernize comment for boolin().
Most of the behavior described by this comment was moved to
parse_bool_with_len() some time ago.  Move what's still
valuable there too, and drop the rest.

Peter Smith

Discussion: https://postgr.es/m/CAHut+PtMJURKp=U8Z=Ktp0zV40sEb1f-iEk9FvY2GQe+5ZBnwg@mail.gmail.com
2023-10-19 11:31:05 -04:00
Michael Paquier 295c36c0c1 Add local_blk_{read|write}_time I/O timing statistics for local blocks
There was no I/O timing statistics for counting read and write timings
on local blocks, contrary to the counterparts for temp and shared
blocks.  This information is available when track_io_timing is enabled.

The output of EXPLAIN is updated to show this information.  An update of
pg_stat_statements is planned next.

Author: Nazir Bilal Yavuz
Reviewed-by: Robert Haas, Melanie Plageman
Discussion: https://postgr.es/m/CAN55FZ19Ss279mZuqGbuUNxka0iPbLgYuOQXqAKewrjNrp27VA@mail.gmail.com
2023-10-19 13:39:38 +09:00
Michael Paquier 13d00729d4 Rename I/O timing statistics columns to shared_blk_{read|write}_time
These two counters, defined in BufferUsage to track respectively the
time spent while reading and writing blocks have historically only
tracked data related to shared buffers, when track_io_timing is enabled.

An upcoming patch to add specific counters for local buffers will take
advantage of this rename as it has come up that no data is currently
tracked for local buffers, and tracking local and shared buffers using
the same fields would be inconsistent with the treatment done for temp
buffers.  Renaming the existing fields clarifies what the block type of
each stats field is.

pg_stat_statement is updated to reflect the rename.  No extension
version bump is required as 5a3423ad8e has done one, affecting v17~.

Author: Nazir Bilal Yavuz
Reviewed-by: Robert Haas, Melanie Plageman
Discussion: https://postgr.es/m/CAN55FZ19Ss279mZuqGbuUNxka0iPbLgYuOQXqAKewrjNrp27VA@mail.gmail.com
2023-10-19 11:26:40 +09:00
Thomas Munro 76200e5ee4 jit: Changes for LLVM 17.
Changes required by https://llvm.org/docs/NewPassManager.html.

Back-patch to 12, leaving the final release of 11 unchanged, consistent
with earlier decision not to back-patch LLVM 16 support either.

Author: Dmitry Dolgov <9erthalion6@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Thomas Munro <thomas.munro@gmail.com>
Discussion: https://postgr.es/m/CA%2BhUKG%2BWXznXCyTgCADd%3DHWkP9Qksa6chd7L%3DGCnZo-MBgg9Lg%40mail.gmail.com
2023-10-19 05:13:23 +13:00
Thomas Munro f90b4a846b jit: Supply LLVMGlobalGetValueType() for LLVM < 8.
Commit 37d5babb used this C API function while adding support for LLVM
16 and opaque pointers, but it's not available in LLVM 7 and older.
Provide it in our own llvmjit_wrap.cpp.  It just calls a C++ function
that pre-dates LLVM 3.9, our minimum target.

Back-patch to 12, like 37d5babb.

Discussion: https://postgr.es/m/CA%2BhUKGKnLnJnWrkr%3D4mSGhE5FuTK55FY15uULR7%3Dzzc%3DwX4Nqw%40mail.gmail.com
2023-10-19 03:01:55 +13:00
Thomas Munro 37d5babb5c jit: Support opaque pointers in LLVM 16.
Remove use of LLVMGetElementType() and provide the type of all pointers
to LLVMBuildXXX() functions when emitting IR, as required by modern LLVM
versions[1].

 * For LLVM <= 14, we'll still use the old LLVMBuildXXX() functions.
 * For LLVM == 15, we'll continue to do the same, explicitly opting
   out of opaque pointer mode.
 * For LLVM >= 16, we'll use the new LLVMBuildXXX2() functions that take
   the extra type argument.

The difference is hidden behind some new IR emitting wrapper functions
l_load(), l_gep(), l_call() etc.  The change is mostly mechanical,
except that at each site the correct type had to be provided.

In some places we needed to do some extra work to get functions types,
including some new wrappers for C++ APIs that are not yet exposed by in
LLVM's C API, and some new "example" functions in llvmjit_types.c
because it's no longer possible to start from the function pointer type
and ask for the function type.

Back-patch to 12, because it's a little tricker in 11 and we agreed not
to put the latest LLVM support into the upcoming final release of 11.

[1] https://llvm.org/docs/OpaquePointers.html

Reviewed-by: Dmitry Dolgov <9erthalion6@gmail.com>
Reviewed-by: Ronan Dunklau <ronan.dunklau@aiven.io>
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/CA%2BhUKGKNX_%3Df%2B1C4r06WETKTq0G4Z_7q4L4Fxn5WWpMycDj9Fw%40mail.gmail.com
2023-10-18 22:47:23 +13:00
Michael Paquier d17ffc734d Count write times when extending relation files for shared buffers
Relation files extended by multiple blocks at a time have been counting
the number of blocks written, but forgot to increment the write time in
this case, as single-block write and relation extension are treated as
two different I/O operations in the shared stats: IOOP_EXTEND vs
IOOP_WRITE.  In this case IOOP_EXTEND was forgotten for normal
(non-temporary) relations, still the number of blocks written was
incremented according to the relation extend done.

Write times are tracked when track_io_timing is enabled, which is not
the case by default.

Author: Nazir Bilal Yavuz
Reviewed-by: Robert Haas, Melanie Plageman
Discussion: https://postgr.es/m/CAN55FZ19Ss279mZuqGbuUNxka0iPbLgYuOQXqAKewrjNrp27VA@mail.gmail.com
Backpatch-through: 16
2023-10-18 14:54:33 +09:00
Michael Paquier 173b56f1ef Add flush option to pg_logical_emit_message()
Since its introduction, LogLogicalMessage() (via the SQL interface
pg_logical_emit_message()) has never included a call to XLogFlush(),
causing it to potentially lose messages on a crash when used in
non-transactional mode.  This has come up to me as a problem while
playing with ideas to design a test suite for what has become
039_end_of_wal.pl introduced in bae868caf2 by Thomas Munro, because
there are no direct ways to force a WAL flush via SQL.

The default is false, to not flush messages and influence existing
use-cases where this function could be used.  If set to true, the
message emitted is flushed before returning back to the caller, making
the message durable on crash.  This new option has no effect when using
pg_logical_emit_message() in transactional mode, as the record's flush
is guaranteed by the WAL record generated by the transaction committed.

Two queries of test_decoding are tweaked to cover the new code path for
the flush.

Bump catalog version.

Author: Michael Paquier
Reviewed-by: Andres Freund, Amit Kapila, Fujii Masao, Tung Nguyen, Tomas
Vondra
Discussion: https://postgr.es/m/ZNsdThSe2qgsfs7R@paquier.xyz
2023-10-18 11:24:59 +09:00
Tom Lane 19fa977311 Dodge a compiler bug affecting timetz_zone/timetz_izone.
Use a modulo operator instead of implementing the same behavior
with a loop.  The loop solution is doubtless microscopically
faster for the typical case of only wrapping into the very next
day, but maybe not so much for large interval values.  In any
case, timetz is such a backwater that it's doubtful anybody
would notice any performance change anyway.

This avoids a compiler bug occurring in AIX's xlc, even in pretty
late-model revisions.

We did not have test coverage for the case where the initial
result->time value is negative, so add that.

For the moment, install this only in HEAD.  My plan is to
back-patch the test case, and then the code change assuming that
buildfarm testing proves the bug occurs in the back branches.
(That seems pretty likely, but let's find out for sure.)

Per buildfarm results from commits 97957fdba and 2f0472030.
Thanks to Michael Paquier for the idea to use a modulo operation
to replace the faulty loop.

Discussion: https://postgr.es/m/CA+hUKGK=DOC+hE-62FKfZy=Ybt5uLkrg3zCZD-jFykM-iPn8yw@mail.gmail.com
2023-10-17 13:10:35 -04:00
Nathan Bossart 97550c0711 Avoid calling proc_exit() in processes forked by system().
The SIGTERM handler for the startup process immediately calls
proc_exit() for the duration of the restore_command, i.e., a call
to system().  This system() call forks a new process to execute the
shell command, and this child process inherits the parent's signal
handlers.  If both the parent and child processes receive SIGTERM,
both will attempt to call proc_exit().  This can end badly.  For
example, both processes will try to remove themselves from the
PGPROC shared array.

To fix this problem, this commit adds a check in
StartupProcShutdownHandler() to see whether MyProcPid == getpid().
If they match, this is the parent process, and we can proc_exit()
like before.  If they do not match, this is a child process, and we
just emit a message to STDERR (in a signal safe manner) and
_exit(), thereby skipping any problematic exit callbacks.

This commit also adds checks in proc_exit(), ProcKill(), and
AuxiliaryProcKill() that verify they are not being called within
such child processes.

Suggested-by: Andres Freund
Reviewed-by: Thomas Munro, Andres Freund
Discussion: https://postgr.es/m/Y9nGDSgIm83FHcad%40paquier.xyz
Discussion: https://postgr.es/m/20230223231503.GA743455%40nathanxps13
Backpatch-through: 11
2023-10-17 10:41:48 -05:00
Robert Haas 2406c4e34c Reword messages about impending (M)XID exhaustion.
First, we shouldn't recommend switching to single-user mode, because
that's terrible advice. Especially on newer versions where VACUUM
will enter emergency mode when nearing (M)XID exhaustion, it's
perfectly fine to just VACUUM in multi-user mode. Doing it that way
is less disruptive and avoids disabling the safeguards that prevent
actual wraparound, so recommend that instead.

Second, be more precise about what is going to happen (when we're
nearing the limits) or what is happening (when we actually hit them).
The database doesn't shut down, nor does it refuse all commands. It
refuses commands that assign whichever of XIDs and MXIDs are nearly
exhausted.

No back-patch. The existing hint that advises going to single-user
mode is sufficiently awful advice that removing it or changing it
might be justifiable even though we normally avoid changing
user-facing messages in back-branches, but I (rhaas) felt that it
was better to be more conservative and limit this fix to master
only. Aside from the usual risk of breaking translations, people
might be used to the existing message, or even have monitoring
scripts that look for it.

Alexander Alekseev, John Naylor, Robert Haas, reviewed at various
times by Peter Geoghegan, Hannu Krosing, and Andres Freund.

Discussion: http://postgr.es/m/CA+TgmoZBg95FiR9wVQPAXpGPRkacSt2okVge+PKPPFppN7sfnQ@mail.gmail.com
2023-10-17 10:34:21 -04:00
Robert Haas a1a5da8cb7 Talk about assigning, rather than generating, new MultiXactIds.
The word "assign" is used in various places internally to describe what
GetNewMultiXactId does, but the user-facing messages have previously
said "generate". For consistency, standardize on "assign," which seems
(at least to me) to be slightly clearer.

Discussion: http://postgr.es/m/CA+TgmoaoE1_i3=4-7GCTtKLVZVQ2Gh6qESW2VG1OprtycxOHMA@mail.gmail.com
2023-10-17 10:23:31 -04:00
Michael Paquier d6b0c2bcb1 Improve truncation of pg_serial/, removing "apparent wraparound" LOGs
It is possible that the tail XID of pg_serial/ gets ahead of its head
XID, which would cause the truncation of pg_serial/ done during
checkpoints to show up as a "wraparound" LOG in SimpleLruTruncate(),
which is confusing.  This also wastes a bit of disk space until the head
page is reclaimed again.

CheckPointPredicate() is changed so as the cutoff page for the
truncation is switched to the head page if the tail XID has advanced
beyond the head XID, rather than the tail page.  This prevents the
confusing LOG message about a wraparound while allowing some truncation
to be done to cut in disk space.

This could be considered as a bug fix, but the original behavior is
harmless as well, resulting only in disk space temporarily wasted, so
no backpatch is done.

Author: Sami Imseih
Reviewed-by: Heikki Linnakangas, Michael Paquier
Discussion: https://postgr.es/m/755E19CA-D02C-4A4C-80D3-74F775410C48@amazon.com
2023-10-17 14:36:21 +09:00
Amit Kapila 79243de13f Restart the apply worker if the privileges have been revoked.
Restart the apply worker if the subscription owner's superuser privileges
have been revoked. This is required so that the subscription connection
string gets revalidated and use the password option to connect to the
publisher for non-superusers, if required.

Author: Vignesh C
Reviewed-by: Amit Kapila
Discussion: http://postgr.es/m/CALDaNm2Dxmhq08nr4P6G+24QvdBo_GAVyZ_Q1TcGYK+8NHs9xw@mail.gmail.com
2023-10-17 08:41:44 +05:30
Tom Lane 54b208f909 Ensure we have a snapshot while dropping ON COMMIT DROP temp tables.
Dropping a temp table could entail TOAST table access to clean out
toasted catalog entries, such as large pg_constraint.conbin strings
for complex CHECK constraints.  If we did that via ON COMMIT DROP,
we triggered the assertion in init_toast_snapshot(), because
there was no provision for setting up a snapshot for the drop
actions.  Fix that.

(I assume here that the adjacent truncation actions for ON COMMIT
DELETE ROWS don't have a similar problem: it doesn't seem like
nontransactional truncations would need to touch any toasted fields.
If that proves wrong, we could refactor a bit to have the same
snapshot acquisition cover that too.)

The test case added here does not fail before v15, because that
assertion was added in 277692220 which was not back-patched.
However, the race condition the assertion warns of surely
exists further back, so back-patch to all supported branches.

Per report from Richard Guo.

Discussion: https://postgr.es/m/CAMbWs4-x26=_QxxgdJyNbiCDzvtr2WV5ZDso_v-CukKEe6cBZw@mail.gmail.com
2023-10-16 14:06:14 -04:00
Nathan Bossart 8fb13dd6ab Move extra code out of the Pre/PostRestoreCommand() section.
If SIGTERM is received within this section, the startup process
will immediately proc_exit() in the signal handler, so it is
inadvisable to include any more code than is required there (as
such code is unlikely to be compatible with doing proc_exit() in a
signal handler).  This commit moves the code recently added to this
section (see 1b06d7bac9 and 7fed801135) to outside of the section.
This ensures that the startup process only calls proc_exit() in its
SIGTERM handler for the duration of the system() call, which is how
this code worked from v8.4 to v14.

Reported-by: Michael Paquier, Thomas Munro
Analyzed-by: Andres Freund
Suggested-by: Tom Lane
Reviewed-by: Michael Paquier, Robert Haas, Thomas Munro, Andres Freund
Discussion: https://postgr.es/m/Y9nGDSgIm83FHcad%40paquier.xyz
Discussion: https://postgr.es/m/20230223231503.GA743455%40nathanxps13
Backpatch-through: 15
2023-10-16 12:41:55 -05:00
Michael Paquier e9718b4bd3 Fix code indentation violations in e83d1b0c40
koel has not reported this one yet, I have just bumped on it while
looking at a different patch.
2023-10-16 09:36:31 +09:00
Thomas Munro 01529c7040 Fix comment from commit 22655aa231.
Per automated complaint from BF animal koel this needed to be
re-indented, but there was also a typo.  Back-patch to 16.
2023-10-16 13:32:41 +13:00
Alexander Korotkov e83d1b0c40 Add support event triggers on authenticated login
This commit introduces trigger on login event, allowing to fire some actions
right on the user connection.  This can be useful for logging or connection
check purposes as well as for some personalization of environment.  Usage
details are described in the documentation included, but shortly usage is
the same as for other triggers: create function returning event_trigger and
then create event trigger on login event.

In order to prevent the connection time overhead when there are no triggers
the commit introduces pg_database.dathasloginevt flag, which indicates database
has active login triggers.  This flag is set by CREATE/ALTER EVENT TRIGGER
command, and unset at connection time when no active triggers found.

Author: Konstantin Knizhnik, Mikhail Gribkov
Discussion: https://postgr.es/m/0d46d29f-4558-3af9-9c85-7774e14a7709%40postgrespro.ru
Reviewed-by: Pavel Stehule, Takayuki Tsunakawa, Greg Nancarrow, Ivan Panchenko
Reviewed-by: Daniel Gustafsson, Teodor Sigaev, Robert Haas, Andres Freund
Reviewed-by: Tom Lane, Andrey Sokolov, Zhihong Yu, Sergey Shinderuk
Reviewed-by: Gregory Stark, Nikita Malakhov, Ted Yu
2023-10-16 03:18:22 +03:00
Thomas Munro c558e6fd92 Acquire ControlFileLock in relevant SQL functions.
Commit dc7d70ea added functions that read the control file, but didn't
acquire ControlFileLock.  With unlucky timing, file systems that have
weak interlocking like ext4 and ntfs could expose partially overwritten
contents, and the checksum would fail.

Back-patch to all supported releases.

Reviewed-by: David Steele <david@pgmasters.net>
Reviewed-by: Anton A. Melnikov <aamelnikov@inbox.ru>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/20221123014224.xisi44byq3cf5psi%40awork3.anarazel.de
2023-10-16 10:43:47 +13:00
Tom Lane fcdd6689d0 Harden xxx_is_visible() functions against concurrent object drops.
For the same reasons given in commit 403ac226d, adjust these
functions to not assume that checking SearchSysCacheExists can
guarantee success of a later fetch.

This follows the same internal API choices made in the earlier commit:
add a function XXXExt(oid, is_missing) and use that to eliminate
the need for a separate existence check.  The changes are very
straightforward, though tedious.  For the moment I just made the new
functions static in namespace.c, but we could export them if a need
emerges.

Per bug #18014 from Alexander Lakhin.  Given the lack of hard evidence
that there's a bug in non-debug builds, I'm content to fix this only
in HEAD.

Discussion: https://postgr.es/m/18014-28c81cb79d44295d@postgresql.org
2023-10-14 16:13:11 -04:00
Tom Lane 403ac226dd Harden has_xxx_privilege() functions against concurrent object drops.
The versions of these functions that accept object OIDs are supposed
to return NULL, rather than failing, if the target object has been
dropped.  This makes it safe(r) to use them in queries that scan
catalogs, since the functions will be applied to objects that are
visible in the query's snapshot but might now be gone according to
the catalog snapshot.  In most cases we implemented this by doing
a SearchSysCacheExists test and assuming that if that succeeds, we
can safely invoke the appropriate aclchk.c function, which will
immediately re-fetch the same syscache entry.  It was argued that
if the existence test succeeds then the followup fetch must succeed
as well, for lack of any intervening AcceptInvalidationMessages call.

Alexander Lakhin demonstrated that this is not so when
CATCACHE_FORCE_RELEASE is enabled: the syscache entry will be forcibly
dropped at the end of SearchSysCacheExists, and then it is possible
for the catalog snapshot to get advanced while re-fetching the entry.
Alexander's test case requires the operation to happen inside a
parallel worker, but that seems incidental to the fundamental problem.
What remains obscure is whether there is a way for this to happen in a
non-debug build.  Nonetheless, CATCACHE_FORCE_RELEASE is a very useful
test methodology, so we'd better make the code safe for it.

After some discussion we concluded that the most future-proof fix
is to give up the assumption that checking SearchSysCacheExists can
guarantee success of a later fetch.  At best that assumption leads
to fragile code --- for example, has_type_privilege appears broken
for array types even if you believe the assumption holds.  And it's
not even particularly efficient.

There had already been some work towards extending the aclchk.c
APIs to include "is_missing" output flags, so this patch extends
that work to cover all the aclchk.c functions that are used by the
has_xxx_privilege() functions.  (This allows getting rid of some
ad-hoc decisions about not throwing errors in certain places in
aclchk.c.)

In passing, this fixes the has_sequence_privilege() functions to
provide the same guarantees as their cousins: for some reason the
SearchSysCacheExists tests never got added to those.

There is more work to do to remove the unsafe coding pattern with
SearchSysCacheExists in other places, but this is a pretty
self-contained patch so I'll commit it separately.

Per bug #18014 from Alexander Lakhin.  Given the lack of hard evidence
that there's a bug in non-debug builds, I'm content to fix this only
in HEAD.  (Perhaps we should clean up the has_sequence_privilege()
oversight in the back branches, but in the absence of field complaints
I'm not too excited about that either.)

Discussion: https://postgr.es/m/18014-28c81cb79d44295d@postgresql.org
2023-10-14 14:49:50 -04:00
Andres Freund 22655aa231 Fix bulk table extension when copying into multiple partitions
When COPYing into a partitioned table that does now permit the use of
table_multi_insert(), we could error out with
  ERROR: could not read block NN in file "base/...": read only 0 of 8192 bytes

because BulkInsertState->next_free was not reset between partitions. This
problem occurred only when not able to use table_multi_insert(), as a
dedicated BulkInsertState for each partition is used in that case.

The bug was introduced in 00d1e02be2, but it was hard to hit at that point,
as commonly bulk relation extension is not used when not using
table_multi_insert(). It became more likely after 82a4edabd2, which expanded
the use of bulk extension.

To fix the bug, reset the bulk relation extension state in BulkInsertState in
ReleaseBulkInsertStatePin(). That was added (in b1ecb9b3fc) to tackle a very
similar issue.  Obviously the name is not quite correct, but there might be
external callers, and bulk insert state needs to be reset in precisely in the
situations that ReleaseBulkInsertStatePin() already needed to be called.

Medium term the better fix likely is to disallow reusing BulkInsertState
across relations.

Add a test that, without the fix, reproduces #18130 in most
configurations. The test also catches the problem fixed in b1ecb9b3fc when
run with small shared_buffers.

Reported-by: Ivan Kolombet <enderstd@gmail.com>
Analyzed-by: Tom Lane <tgl@sss.pgh.pa.us>
Analyzed-by: Andres Freund <andres@anarazel.de>
Bug: #18130
Discussion: https://postgr.es/m/18130-7a86a7356a75209d%40postgresql.org
Discussion: https://postgr.es/m/257696.1695670946%40sss.pgh.pa.us
Backpatch: 16-
2023-10-13 19:16:44 -07:00
Nathan Bossart 8d140c5822 Improve the naming in wal_sync_method code.
* sync_method is renamed to wal_sync_method.

* sync_method_options[] is renamed to wal_sync_method_options[].

* assign_xlog_sync_method() is renamed to assign_wal_sync_method().

* The names of the available synchronization methods are now
  prefixed with "WAL_SYNC_METHOD_" and have been moved into a
  WalSyncMethod enum.

* PLATFORM_DEFAULT_SYNC_METHOD is renamed to
  PLATFORM_DEFAULT_WAL_SYNC_METHOD, and DEFAULT_SYNC_METHOD is
  renamed to DEFAULT_WAL_SYNC_METHOD.

These more descriptive names help distinguish the code for
wal_sync_method from the code for DataDirSyncMethod (e.g., the
recovery_init_sync_method configuration parameter and the
--sync-method option provided by several frontend utilities).  This
change also prevents name collisions between the aforementioned
sets of code.  Since this only improves the naming of internal
identifiers, there should be no behavior change.

Author: Maxim Orlov
Discussion: https://postgr.es/m/CACG%3DezbL1gwE7_K7sr9uqaCGkWhmvRTcTEnm3%2BX1xsRNwbXULQ%40mail.gmail.com
2023-10-13 15:16:45 -05:00
Michael Paquier 97957fdbaa Add support for AT LOCAL
When converting a timestamp to/from with/without time zone, the SQL
Standard specifies an AT LOCAL variant of AT TIME ZONE which uses the
session's time zone.  This includes three system functions able to do
the work in the same way as the existing flavors for AT TIME ZONE,
except that these need to be marked as stable as they depend on the
session's TimeZone GUC.

Bump catalog version.

Author: Vik Fearing
Reviewed-by: Laurenz Albe, Cary Huang, Michael Paquier
Discussion: https://postgr.es/m/8e25dec4-5667-c1a5-6581-167d710c2182@postgresfriends.org
2023-10-13 13:01:37 +09:00
Thomas Munro 0013ba290b Add wait events for checkpoint delay mechanism.
When MyProc->delayChkptFlags is set to temporarily block phase
transitions in a concurrent checkpoint, the checkpointer enters a
sleep-poll loop to wait for the flag to be cleared.  We should show that
as a wait event in the pg_stat_activity view.

Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/CA%2BhUKGL7Whi8iwKbzkbn_1fixH3Yy8aAPz7mfq6Hpj7FeJrKMg%40mail.gmail.com
2023-10-13 16:43:22 +13:00
Robert Haas df9a3d4e99 Unify two isLogSwitch tests in XLogInsertRecord.
An upcoming patch wants to introduce an additional special case in
this function. To keep that as cheap as possible, minimize the amount
of branching that we do based on whether this is an XLOG_SWITCH
record.

Additionally, and also in the interest of keeping the overhead of
special-case code paths as low as possible, apply likely() to the
non-XLOG_SWITCH case, since only a very tiny fraction of WAL records
will be XLOG_SWITCH records.

Patch by me, reviewed by Dilip Kumar, Amit Kapila, Andres Freund,
and Michael Paquier.

Discussion: http://postgr.es/m/CA+TgmoYy-Vc6G9QKcAKNksCa29cv__czr+N9X_QCxEfQVpp_8w@mail.gmail.com
2023-10-12 13:48:21 -04:00
David Rowley d9e46dfb78 Fix runtime partition pruning for HASH partitioned tables
This could only affect HASH partitioned tables with at least 2 partition
key columns.

If partition pruning was delayed until execution and the query contained
an IS NULL qual on one of the partitioned keys, and some subsequent
partitioned key was being compared to a non-Const, then this could result
in a crash due to the incorrect keyno being used to calculate the
stateidx for the expression evaluation code.

Here we fix this by properly skipping partitioned keys which have a
nullkey set.  Effectively, this must be the same as what's going on
inside perform_pruning_base_step().

Sergei Glukhov also provided a patch, but that's not what's being used
here.

Reported-by: Sergei Glukhov
Reviewed-by: tender wang, Sergei Glukhov
Discussion: https://postgr.es/m/d05b26fa-af54-27e1-f693-6c31590802fa@postgrespro.ru
Backpatch-through: 11, where runtime partition pruning was added.
2023-10-13 01:12:31 +13:00
David Rowley f0c409d9c7 Fix incorrect step generation in HASH partition pruning
get_steps_using_prefix_recurse() incorrectly assumed that it could stop
recursive processing of the 'prefix' list when cur_keyno was one before
the step_lastkeyno.  Since hash partition pruning can prune using IS
NULL quals, and these IS NULL quals are not present in the 'prefix'
list, then that logic could cause more levels of recursion than what is
needed and lead to there being no more items in the 'prefix' list to
process.  This would manifest itself as a crash in some code that
expected the 'start' ListCell not to be NULL.

Here we adjust the logic so that instead of stopping recursion at 1 key
before the step_lastkeyno, we just look at the llast(prefix) item and
ensure we only recursively process up until just before whichever the last
key is.  This effectively allows keys to be missing in the 'prefix' list.

This change does mean that step_lastkeyno is no longer needed, so we
remove that from the static functions.  I also spent quite some time
reading this code and testing it to try to convince myself that there
are no other issues.  That resulted in the irresistible temptation of
rewriting some comments, many of which were just not true or inconcise.

Reported-by: Sergei Glukhov
Reviewed-by: Sergei Glukhov, tender wang
Discussion: https://postgr.es/m/2f09ce72-315e-2a33-589a-8519ada8df61@postgrespro.ru
Backpatch-through: 11, where partition pruning was introduced.
2023-10-12 19:50:38 +13:00
Michael Paquier e7689190b3 Add option to bgworkers to allow the bypass of role login check
This adds a new option called BGWORKER_BYPASS_ROLELOGINCHECK to the
flags available to BackgroundWorkerInitializeConnection() and
BackgroundWorkerInitializeConnectionByOid().

This gives the possibility to bgworkers to bypass the role login check,
making possible the use of a role that has no login rights while not
being a superuser.  PostgresInit() gains a new flag called
INIT_PG_OVERRIDE_ROLE_LOGIN, taking advantage of the refactoring done in
4800a5dfb4.

Regression tests are added to worker_spi to check the behavior of this
new option with bgworkers.

Author: Bertrand Drouvot
Reviewed-by: Nathan Bossart, Michael Paquier, Bharath Rupireddy
Discussion: https://postgr.es/m/bcc36259-7850-4882-97ef-d6b905d2fc51@gmail.com
2023-10-12 09:24:17 +09:00
Tom Lane b6a77c6a6c Reindent comment in GenericXLogFinish().
Restore pgindent cleanliness, per buildfarm member koel.
2023-10-11 17:14:31 -04:00
Tom Lane 5d8aa8bced Fix missed optimization in relation_excluded_by_constraints().
In commit 3fc6e2d7f, I (tgl) argued that we only need to check for
a constant-FALSE restriction clause when there's exactly one
restriction clause, on the grounds that const-folding would have
thrown away anything ANDed with a Const FALSE.  That's true just after
const-folding has been applied, but subsequent processing such as
equivalence class expansion could result in cases where a Const FALSE
is ANDed with some other stuff.  (Compare for instance joinrels.c's
restriction_is_constant_false.)  Hence, tweak this logic to check all
the elements of the baserestrictinfo list, not just one; that's cheap
enough to not be worth worrying about.

There is one existing test case where this visibly improves the plan.
There would not be any savings in runtime, but the planner effort and
executor startup effort will be reduced, and anyway it's odd that
we can detect related cases but not this one.

Richard Guo (independently discovered by David Rowley)

Discussion: https://postgr.es/m/CAMbWs4_x3-CnVVrCboS1LkEhB5V+W7sLSCabsRiG+n7+5_kqbg@mail.gmail.com
2023-10-11 12:51:38 -04:00
Heikki Linnakangas 16671ba6e7 Move canAcceptConnections check from ProcessStartupPacket to caller.
The check is not about processing the startup packet, so the calling
function seems like a more natural place. I'm also working on a patch
that moves 'canAcceptConnections' out of the Port struct, and this
makes that refactoring more convenient.

Reviewed-by: Tristan Partin
Discussion: https://www.postgresql.org/message-id/7a59b073-5b5b-151e-7ed3-8b01ff7ce9ef@iki.fi
2023-10-11 14:06:38 +03:00
Michael Paquier 4800a5dfb4 Refactor InitPostgres() to use bitwise option flags
InitPostgres() has been using a set of boolean arguments to control its
behavior, and a patch under discussion was aiming at expanding it with a
third one.  In preparation for expanding this area, this commit switches
all the current boolean arguments of this routine to a single bits32
argument instead.  Two values are currently supported for the flags:
- INIT_PG_LOAD_SESSION_LIBS to load [session|local]_preload_libraries at
startup.
- INIT_PG_OVERRIDE_ALLOW_CONNS to allow connection to a database even if
it has !datallowconn.  This is used by bgworkers.

Reviewed-by: Bertrand Drouvot
Discussion: https://postgr.es/m/ZSTn66_BXRZCeaqS@paquier.xyz
2023-10-11 12:31:49 +09:00
Jeff Davis ef74c7197c Fix bug in GenericXLogFinish().
Mark the buffers dirty before writing WAL.

Discussion: https://postgr.es/m/25104133-7df8-cae3-b9a2-1c0aaa1c094a@iki.fi
Reviewed-by: Heikki Linnakangas
Backpatch-through: 11
2023-10-10 11:01:13 -07:00
Tom Lane 14661ba1a7 Replace has_multiple_baserels() with a bitmap test on all_baserels.
Since we added the PlannerInfo.all_baserels set, it's not really
necessary to grovel over the rangetable to count baserels in the
current query.  So let's drop has_multiple_baserels() in favor
of a bms_membership() test.  This might be microscopically
faster, but the main point is to remove some unnecessary code.

Richard Guo

Discussion: https://postgr.es/m/CAMbWs4_8RcSbbfs1ASZLrMuL0c0EQgXWcoLTQD8swBRY_pQQiA@mail.gmail.com
2023-10-10 13:08:29 -04:00
Peter Eisentraut 1d91d24d9a Add const to values and nulls arguments
This excludes any changes that would change the external AM APIs.

Reviewed-by: Aleksander Alekseev <aleksander@timescale.com>
Discussion: https://www.postgresql.org/message-id/flat/14c31f4a-0347-0805-dce8-93a9072c05a5%40eisentraut.org
2023-10-10 07:50:43 +02:00
David Rowley fc4089f3c6 Fix possible crash in add_paths_to_append_rel()
While working on a8a968a82, I failed to consider that
cheapest_startup_path can be NULL when there is no non-parameterized
path in the pathlist.  This is well documented in set_cheapest(), I just
failed to notice.

Here we adjust the code to just check if the RelOptInfo has a
cheapest_startup_path set before adding it to the startup_subpaths list.

Reported-by: Richard Guo
Author: Richard Guo
Discussion: https://postgr.es/m/CAMbWs49w3t03V69XhdCuw+GDwivny4uQUxrkVp6Gejaspt0wMQ@mail.gmail.com
2023-10-10 16:50:03 +13:00
David Rowley 4f3b56eea2 Revert "Optimize various aggregate deserialization functions"
This reverts commit 608fd198de.

On 2nd thoughts, the StringInfo API requires that strings are NUL
terminated and pointing directly to the data in a bytea Datum isn't NUL
terminated.

Discussion: https://postgr.es/m/CAApHDvorfO3iBZ=xpiZvp3uHtJVLyFaPBSvcAhAq2HPLnaNSwQ@mail.gmail.com
2023-10-10 14:16:54 +13:00
Heikki Linnakangas 637109d13a Rename StartBackgroundWorker() to BackgroundWorkerMain().
The comment claimed that it is "called from postmaster", but it is
actually called in the child process, pretty early in the process
initialization. I guess you could interpret "called from postmaster"
to mean that, but it seems wrong to me. Rename the function to be
consistent with other functions with similar role.

Reviewed-by: Thomas Munro
Discussion: https://www.postgresql.org/message-id/4f95c1fc-ad3c-7974-3a8c-6faa3931804c@iki.fi
2023-10-09 11:52:09 +03:00
Heikki Linnakangas 0bbafb5342 Allocate Backend structs in PostmasterContext.
The child processes don't need them. By allocating them in
PostmasterContext, the memory gets free'd and is made available for
other stuff in the child processes.

Reviewed-by: Thomas Munro
Discussion: https://www.postgresql.org/message-id/4f95c1fc-ad3c-7974-3a8c-6faa3931804c@iki.fi
2023-10-09 11:29:39 +03:00
Heikki Linnakangas 1ca312686e Clarify the checks in RegisterBackgroundWorker.
In EXEC_BACKEND or single-user mode, we process
shared_preload_libraries at postmaster startup as usual, but also at
backend startup. When a library calls RegisterBackgroundWorker() when
being loaded into a backend process, we go through the motions to add
the worker to BackgroundWorkerList, even though that is a
postmaster-private data structure. Make it return early when called in
a backend process, without changing BackgroundWorkerList.

You could argue that it was intentional: In non-EXEC_BACKEND mode, the
backend processes inherit BackgroundWorkerList at fork(), so it does
make some sense to initialize it to the same state in EXEC_BACKEND
mode, too. It's clearly a postmaster-private structure, though, and
all the functions that use it are clearly marked as "should only be
called in postmaster".

You could also argue that libraries should not call
RegisterBackgroundWorker() during backend startup. It's too late to
correctly register any static background workers at that stage. But
it's a common pattern in extensions, and it doesn't seem worth the
churn to require all extensions to change it.

Another sloppiness was the exception for "internal" background
workers. We checked that RegisterBackgroundWorker() was called during
shared_preload_libraries processing, or the background worker was an
internal one. That exception was made in commit 665d1fad99 to allow
postmaster to register the logical apply launcher in
ApplyLauncherRegister(). The way the check was written, it would not
complain if you registered an internal background worker in a regular
backend process. But it would complain if postmaster registered a
background worker defined in a shared library, outside
shared_preload_libraries processing. I think the correct rule is that
you can only register static background workers in the postmaster
process, and only before the bgworker shared memory array has been
initialized. Check for that more directly.

Reviewed-by: Thomas Munro
Discussion: https://www.postgresql.org/message-id/4f95c1fc-ad3c-7974-3a8c-6faa3931804c@iki.fi
2023-10-09 11:29:33 +03:00
David Rowley 608fd198de Optimize various aggregate deserialization functions
The serialized representation of an internal aggregate state is a bytea
value.  In each deserial function, in order to "receive" the bytea value
we appended it onto a short-lived StringInfoData using
appendBinaryStringInfo.  This was a little wasteful as it meant having to
palloc memory, copy a (possibly long) series of bytes then later pfree
that memory.  Instead of going to this extra trouble, we can just fake up
a StringInfoData and point the data directly at the bytea's payload.  This
should help increase the performance of internal aggregate
deserialization.

Reviewed-by: Michael Paquier
Discussion: https://postgr.es/m/CAApHDvr=e-YOigriSHHm324a40HPqcUhSp6pWWgjz5WwegR=cQ@mail.gmail.com
2023-10-09 17:25:16 +13:00
Amit Kapila 7cc2f59dd5 Remove duplicate words in docs and code comments.
Additionally, add a missing "the" in a couple of places.

Author: Vignesh C, Dagfinn Ilmari Mannsåker
Discussion: http://postgr.es/m/CALDaNm28t+wWyPfuyqEaARS810Je=dRFkaPertaLAEJYY2cWYQ@mail.gmail.com
2023-10-09 09:18:47 +05:30
David Rowley d8a295389b Strip off ORDER BY/DISTINCT aggregate pathkeys in create_agg_path
1349d2790 added code to adjust the PlannerInfo.group_pathkeys so that
ORDER BY / DISTINCT aggregate functions could obtain pre-sorted inputs
to allow faster execution.  That commit forgot to adjust the pathkeys in
create_agg_path().  Some code in there assumed that it was always fine
to make the AggPath's pathkeys the same as its subpath's.  That seems to
have been ok up until 1349d2790, but since that commit adds pathkeys for
columns which are within the aggregate function, those columns won't be
available above the aggregate node.  This can result in "could not find
pathkey item to sort" during create_plan().

The fix here is to strip off the additional pathkeys added by
adjust_group_pathkeys_for_groupagg().  It seems that the pathkeys here
will only ever be group_pathkeys, so all we need to do is check if the
length of the pathkey list is longer than the num_groupby_pathkeys and
get rid of the additional ones only if we see extras.

Reported-by: Justin Pryzby
Reviewed-by: Richard Guo
Discussion: https://postgr.es/m/ZQhYYRhUxpW3PSf9%40telsasoft.com
Backpatch-through: 16, where 1349d2790 was introduced
2023-10-09 16:37:05 +13:00
David Rowley 77db132637 Remove debug_print_rel and replace usages with pprint
Going by c4a1933b4, b33ef397a and 05893712c (to name just a few), it seems
that maintaining debug_print_rel() is often forgotten.  In the case of
c4a1933b4, it was several years before anyone noticed that a path type
was not handled by debug_print_rel().  (debug_print_rel() is only
compiled when building with OPTIMIZER_DEBUG).

After a quick survey on the pgsql-hackers mailing list, nobody came
forward to admit that they use OPTIMIZER_DEBUG.  So to prevent any future
maintenance neglect, let's just remove debug_print_rel() and have
OPTIMIZER_DEBUG make use of pprint() instead (as suggested by Tom Lane).
If anyone wants to come forward to claim they make use of
OPTIMIZER_DEBUG in a way that they need debug_print_rel() then they have
around 10 months remaining in the v17 cycle where we could revert this.
If nobody comes forward in that time, then we can likely safely declare
debug_print_rel() as not worth keeping.

Discussion: https://postgr.es/m/CAApHDvoCdjo8Cu2zEZF4-AxWG-90S+pYXAnoDDa9J3xH-OrczQ@mail.gmail.com
2023-10-09 15:53:16 +13:00
Alexander Korotkov 82a7132f53 Fix another typo in e0b1ee17dc
Reported-by: Richard Guo
Discussion: https://postgr.es/m/CAMbWs4_kHMJDak75y1kBTirv-drS1-knT-7Mpg5LprAjqRJDVA%40mail.gmail.com
2023-10-07 20:36:47 +03:00
Alexander Korotkov e8c334c47a Fix typos in e0b1ee17dc
Reported-by: Alexander Lakhin
2023-10-07 11:55:55 +03:00
Etsuro Fujita aec684ff0f Remove extra parenthesis from comment. 2023-10-06 18:30:00 +09:00
Alexander Korotkov e0b1ee17dc Skip checking of scan keys required for directional scan in B-tree
Currently, B-tree code matches every scan key to every item on the page.
Imagine the ordered B-tree scan for the query like this.

SELECT * FROM tbl WHERE col > 'a' AND col < 'b' ORDER BY col;

The (col > 'a') scan key will be always matched once we find the location to
start the scan.  The (col < 'b') scan key will match every item on the page
as long as it matches the last item on the page.

This patch implements prechecking of the scan keys required for directional
scan on beginning of page scan.  If precheck is successful we can skip this
scan keys check for the items on the page.  That could lead to significant
acceleration especially if the comparison operator is expensive.

Idea from patch by Konstantin Knizhnik.

Discussion: https://postgr.es/m/079c3f8e-3371-abe2-e93c-fc8a0ae3f571%40garret.ru
Reviewed-by: Peter Geoghegan, Pavel Borisov
2023-10-06 10:40:51 +03:00
Heikki Linnakangas 5da0a622e8 Fix crash on syslogger startup
When syslogger starts up, ListenSockets is still NULL. Don't try to
pfree it. Oversight in commit e29c464395.

Reported-by: Michael Paquier
Discussion: https://www.postgresql.org/message-id/ZR-uNkgL7m60lWUe@paquier.xyz
2023-10-06 10:22:02 +03:00
Peter Eisentraut 180e3394a7 Push attcompression and attstorage handling into BuildDescForRelation()
This was previously handled by the callers but it can be moved into a
common place.

Discussion: https://www.postgresql.org/message-id/flat/52a125e4-ff9a-95f5-9f61-b87cf447e4da@eisentraut.org
2023-10-05 16:20:46 +02:00
Peter Eisentraut 04e485273b Move BuildDescForRelation() from tupdesc.c to tablecmds.c
BuildDescForRelation() main job is to convert ColumnDef lists to
pg_attribute/tuple descriptor arrays, which is really mostly an
internal subroutine of DefineRelation() and some related functions,
which is more the remit of tablecmds.c and doesn't have much to do
with the basic tuple descriptor interfaces in tupdesc.c.  This is also
supported by observing the header includes we can remove in tupdesc.c.
By moving it over, we can also (in the future) make
BuildDescForRelation() use more internals of tablecmds.c that are not
sensible to be exposed in tupdesc.c.

Discussion: https://www.postgresql.org/message-id/flat/52a125e4-ff9a-95f5-9f61-b87cf447e4da@eisentraut.org
2023-10-05 16:20:46 +02:00
Peter Eisentraut 6d341407a6 Push attidentity and attgenerated handling into BuildDescForRelation()
Previously, this was handled by the callers separately, but it can be
trivially moved into BuildDescForRelation() so that it is handled in a
central place.

Reviewed-by: Alvaro Herrera <alvherre@alvh.no-ip.org>
Discussion: https://www.postgresql.org/message-id/flat/52a125e4-ff9a-95f5-9f61-b87cf447e4da@eisentraut.org
2023-10-05 16:20:46 +02:00
Heikki Linnakangas e29c464395 Refactor ListenSocket array.
Keep track of the used size of the array. That avoids looping through
the whole array in a few places. It doesn't matter from a performance
point of view since the array is small anyway, but this feels less
surprising and is a little less code. Now that we have an explicit
NumListenSockets variable that is statically initialized to 0, we
don't need the loop to initialize the array.

Allocate the array in PostmasterContext. The array isn't needed in
child processes, so this allows reusing that memory. We could easily
make the array resizable now, but we haven't heard any complaints
about the current 64 sockets limit.

Discussion: https://www.postgresql.org/message-id/7bb7ad65-a018-2419-742f-fa5fd877d338@iki.fi
2023-10-05 15:05:25 +03:00
Alvaro Herrera 1c99cde2f3
Improve JsonLexContext's freeability
Previously, the JSON code didn't have to worry too much about freeing
JsonLexContext, because it was never too long-lived.  With new features
being added for SQL/JSON this is no longer the case.  Add a routine
that knows how to free this struct and apply that to a few places, to
prevent this from becoming problematic.

At the same time, we change the API of makeJsonLexContextCstringLen to
make it receive a pointer to JsonLexContext for callers that want it to
be stack-allocated; it can also be passed as NULL to get the original
behavior of a palloc'ed one.

This also causes an ABI break due to the addition of flags to
JsonLexContext, so we can't easily backpatch it.  AFAICS that's not much
of a problem; apparently some leaks might exist in JSON usage of
text-search, for example via json_to_tsvector, but I haven't seen any
complaints about that.

Per Coverity complaint about datum_to_jsonb_internal().

Discussion: https://postgr.es/m/20230808174110.oq3iymllsv6amkih@alvherre.pgsql
2023-10-05 10:59:08 +02:00
David Rowley a8a968a821 Consider cheap startup paths in add_paths_to_append_rel
6b94e7a6d did this for ordered append paths to allow fast startup
MergeAppends, however, nothing was done for the Append case.

Here we adjust add_paths_to_append_rel() to have it build an AppendPath
containing the cheapest startup paths from each of the child relations
when the append rel has "consider_startup" set.

Author: Andy Fan, David Rowley
Discussion: https://www.postgresql.org/message-id/CAKU4AWrXSkUV=Pt-gRxQT7EbfUeNssprGyNsB=5mJibFZ6S3ww@mail.gmail.com
2023-10-05 21:03:10 +13:00
David Rowley 0b053e78b5 Fix memory leak in Memoize code
Ensure we switch to the per-tuple memory context to prevent any memory
leaks of detoasted Datums in MemoizeHash_hash() and MemoizeHash_equal().

Reported-by: Orlov Aleksej
Author: Orlov Aleksej, David Rowley
Discussion: https://postgr.es/m/83281eed63c74e4f940317186372abfd%40cft.ru
Backpatch-through: 14, where Memoize was added
2023-10-05 20:30:47 +13:00
Peter Eisentraut 5e4282772a Remove RelationGetIndexRawAttOptions()
There was only one caller left, for which this function was overkill.

Also, having it in relcache.c was inappropriate, since it doesn't work
with the relcache at all.

Discussion: https://www.postgresql.org/message-id/flat/f84640e3-00d3-5abd-3f41-e6a19d33c40b@eisentraut.org
2023-10-03 17:51:02 +02:00
Peter Eisentraut 7841623571 Remove IndexInfo.ii_OpclassOptions field
It is unnecessary to include this field in IndexInfo.  It is only used
by DDL code, not during execution.  It is really only used to pass
local information around between functions in index.c and indexcmds.c,
for which it is clearer to use local variables, like in similar cases.

Discussion: https://www.postgresql.org/message-id/flat/f84640e3-00d3-5abd-3f41-e6a19d33c40b@eisentraut.org
2023-10-03 17:51:02 +02:00
Tom Lane af3ee8a086 Add some notes about why "ALTER TYPE enum DROP VALUE" is hard.
In hopes of putting these where any would-be implementer is sure to
find them, make a placeholder grammar production for ALTER DROP VALUE
and put them there.  This is really just a docs patch, though.

Vik Fearing, with a bit more wordsmithing by me

Discussion: https://postgr.es/m/9fffd149-da0f-0c9c-6745-731fb688642a@postgresfriends.org
2023-10-03 11:41:42 -04:00
Robert Haas c2ba3fdea5 In basebackup.c, refactor to create read_file_data_into_buffer.
This further reduces the length and complexity of sendFile(),
hopefully make it easier to understand and modify. In addition
to moving some logic into a new function, I took this opportunity
to make a few slight adjustments to sendFile() itself, including
renaming the 'len' variable to 'bytes_done', since we use it to represent
the number of bytes we've already handled so far, not the total
length of the file.

Patch by me, reviewed by David Steele.

Discussion: http://postgr.es/m/CA+TgmoYt5jXH4U6cu1dm9Oe2FTn1aae6hBNhZzJJjyjbE_zYig@mail.gmail.com
2023-10-03 11:00:40 -04:00
Robert Haas 053183138a In basebackup.c, refactor to create verify_page_checksum.
If checksum verification fails for a particular page, we reread the
page and try one more time. The code that does this somewhat complex
and difficult to follow. Move some of the logic into a new function
and rearrange the code a bit to try to make it clearer. This way,
we don't need the block_retry Boolean, a couple of other variables
move from sendFile() into the new function, and some code is now less
deeply indented.

Patch by me, reviewed by David Steele.

Discussion: http://postgr.es/m/CA+TgmoYt5jXH4U6cu1dm9Oe2FTn1aae6hBNhZzJJjyjbE_zYig@mail.gmail.com
2023-10-03 10:37:20 -04:00
Michael Paquier a956bd3fa9 Avoid memory size overflow when allocating backend activity buffer
The code in charge of copying the contents of PgBackendStatus to local
memory could fail on memory allocation because of an overflow on the
amount of memory to use.  The overflow can happen when combining a high
value track_activity_query_size (max at 1MB) with a large
max_connections, when both multiplied get higher than INT32_MAX as both
parameters treated as signed integers.  This could for example trigger
with the following functions, all calling pgstat_read_current_status():
- pg_stat_get_backend_subxact()
- pg_stat_get_backend_idset()
- pg_stat_get_progress_info()
- pg_stat_get_activity()
- pg_stat_get_db_numbackends()

The change to use MemoryContextAllocHuge() has been introduced in
8d0ddccec6, so backpatch down to 12.

Author: Jakub Wartak
Discussion: https://postgr.es/m/CAKZiRmw8QSNVw2qNK-dznsatQqz+9DkCquxP0GHbbv1jMkGHMA@mail.gmail.com
Backpatch-through: 12
2023-10-03 15:37:00 +09:00
David Rowley 2075ba9dc9 Tidy-up some appendStringInfo*() usages
Make a few newish calls to appendStringInfo() which have no special
formatting use appendStringInfoString() instead.  Also, adjust usages of
appendStringInfoString() which only append a string containing a single
character to make use of appendStringInfoChar() instead.

This makes the code marginally faster, but primarily this change is so
we use the StringInfo type as it was intended to be used.

Discussion: https://postgr.es/m/CAApHDvpXKQmL+r=VDNS98upqhr9yGBhv2Jw3GBFFk_wKHcB39A@mail.gmail.com
2023-10-03 17:09:52 +13:00
Michael Paquier 6b18b3fe2c Fail hard on out-of-memory failures in xlogreader.c
This commit changes the WAL reader routines so as a FATAL for the
backend or exit(FAILURE) for the frontend is triggered if an allocation
for a WAL record decode fails in walreader.c, rather than treating this
case as bogus data, which would be equivalent to the end of WAL.  The
key is to avoid palloc_extended(MCXT_ALLOC_NO_OOM) in walreader.c,
relying on plain palloc() calls.

The previous behavior could make WAL replay finish too early than it
should.  For example, crash recovery finishing earlier may corrupt
clusters because not all the WAL available locally was replayed to
ensure a consistent state.  Out-of-memory failures would show up
randomly depending on the memory pressure on the host, but one simple
case would be to generate a large record, then replay this record after
downsizing a host, as Ethan Mertz originally reported.

This relies on bae868caf2, as the WAL reader routines now do the
memory allocation required for a record only once its header has been
fully read and validated, making xl_tot_len trustable.  Making the WAL
reader react differently on out-of-memory or bogus record data would
require ABI changes, so this is the safest choice for stable branches.
Also, it is worth noting that 3f1ce97346 has been using a plain
palloc() in this code for some time now.

Thanks to Noah Misch and Thomas Munro for the discussion.

Like the other commit, backpatch down to 12, leaving out v11 that will
be EOL'd soon.  The behavior of considering a failed allocation as bogus
data comes originally from 0ffe11abd3, where the record length
retrieved from its header was not entirely trustable.

Reported-by: Ethan Mertz
Discussion: https://postgr.es/m/ZRKKdI5-RRlta3aF@paquier.xyz
Backpatch-through: 12
2023-10-03 10:21:44 +09:00
Robert Haas 1ccc1e05ae Remove retry loop in heap_page_prune().
The retry loop is needed because heap_page_prune() calls
HeapTupleSatisfiesVacuum() and then lazy_scan_prune() does the same
thing again, and they might get different answers due to concurrent
clog updates.  But this patch makes heap_page_prune() return the
HeapTupleSatisfiesVacuum() results that it computed back to the
caller, which allows lazy_scan_prune() to avoid needing to recompute
those values in the first place. That's nice both because it eliminates
the need for a retry loop and also because it's cheaper.

Melanie Plageman, reviewed by David Geier, Andres Freund, and me.

Discussion: https://postgr.es/m/CAAKRu_br124qsGJieuYA0nGjywEukhK1dKBfRdby_4yY3E9SXA%40mail.gmail.com
2023-10-02 11:40:07 -04:00
Heikki Linnakangas e64c733bb1 Flush WAL stats in bgwriter
bgwriter can write out WAL, but did not flush the WAL pgstat counters,
so the writes were not seen in pg_stat_wal.

Back-patch to v14, where pg_stat_wal was introduced.

Author: Nazir Bilal Yavuz
Reviewed-by: Matthias van de Meent, Kyotaro Horiguchi
Discussion: https://www.postgresql.org/message-id/CAN55FZ2FPYngovZstr%3D3w1KSEHe6toiZwrurbhspfkXe5UDocg%40mail.gmail.com
2023-10-02 12:39:35 +03:00
Heikki Linnakangas f0bd0b4489 Add rmgrdesc README
In the README, briefly explain what rmgrdesc functions are, and why
they are in a separate directory. Commit c03c2eae0a added some
guidelines on the preferred output format; move that to the README
too.

Reviewed-by: Melanie Plageman, Peter Geoghegan
Discussion: https://www.postgresql.org/message-id/9159daf7-f42d-781b-458f-1b2cf32cb256%40iki.fi
2023-10-02 12:18:57 +03:00
Amit Langote c8ec5e0543 Revert "Add soft error handling to some expression nodes"
This reverts commit 7fbc75b26e.

Looks like the LLVM additions may not be totally correct.
2023-10-02 13:48:15 +09:00
Amit Langote 7fbc75b26e Add soft error handling to some expression nodes
This adjusts the expression evaluation code for CoerceViaIO and
CoerceToDomain to handle errors softly if needed.

For CoerceViaIo, this means using InputFunctionCallSafe(), which
provides the option to handle errors softly, instead of calling the
type input function directly.

For CoerceToDomain, this simply entails replacing the ereport() in
ExecEvalConstraintCheck() by errsave().

In both cases, the ErrorSaveContext to be used when evaluating the
expression is stored by ExecInitExprRec() in the expression's struct
in the expression's ExprEvalStep.  The ErrorSaveContext is passed by
setting ExprState.escontext to point to it when calling
ExecInitExprRec() on the expression whose errors are to be handled
softly.

Note that no call site of ExecInitExprRec() has been changed in this
commit, so there's no functional change.  This is intended for
implementing new SQL/JSON expression nodes in future commits that
will use to it suppress errors that may occur during type coercions.

Reviewed-by: Álvaro Herrera
Discussion: https://postgr.es/m/CA+HiwqE4XTdfb1nW=Ojoy_tQSRhYt-q_kb6i5d4xcKyrLC1Nbg@mail.gmail.com
2023-10-02 11:52:28 +09:00
Noah Misch e1f95ec8cf Correct assertion and comments about XLogRecordMaxSize.
The largest allocation, of xl_tot_len+8192, is in allocate_recordbuf().

Discussion: https://postgr.es/m/20230812211327.GB2326466@rfd.leadboat.com
2023-10-01 12:20:55 -07:00
Tom Lane 5b7b382464 Fix datalen calculation in tsvectorrecv().
After receiving position data for a lexeme, tsvectorrecv()
advanced its "datalen" value by (npos+1)*sizeof(WordEntry)
where the correct calculation is (npos+1)*sizeof(WordEntryPos).
This accidentally failed to render the constructed tsvector
invalid, but it did result in leaving some wasted space
approximately equal to the space consumed by the position data.
That could have several bad effects:

* Disk space is wasted if the received tsvector is stored into a
  table as-is.

* A legal tsvector could get rejected with "maximum total lexeme
  length exceeded" if the extra space pushes it over the MAXSTRPOS
  limit.

* In edge cases, the finished tsvector could be assigned a length
  larger than the allocated size of its palloc chunk, conceivably
  leading to SIGSEGV when the tsvector gets copied somewhere else.
  The odds of a field failure of this sort seem low, though valgrind
  testing could probably have found this.

While we're here, let's express the calculation as
"sizeof(uint16) + npos * sizeof(WordEntryPos)" to avoid the type
pun implicit in the "npos + 1" formulation.  It's not wrong
given that WordEntryPos had better be 2 bytes to avoid padding
problems, but it seems clearer this way.

Report and patch by Denis Erokhin.  Back-patch to all supported
versions.

Discussion: https://postgr.es/m/009801d9f2d9$f29730c0$d7c59240$@datagile.ru
2023-10-01 13:16:47 -04:00
Tom Lane d8a09939a3 In COPY FROM, fail cleanly when unsupported encoding conversion is needed.
In recent releases, such cases fail with "cache lookup failed for
function 0" rather than complaining that the conversion function
doesn't exist as prior versions did.  Seems to be a consequence of
sloppy refactoring in commit f82de5c46.  Add the missing error check.

Per report from Pierre Fortin.  Back-patch to v14 where the
oversight crept in.

Discussion: https://postgr.es/m/20230929163739.3bea46e5.pfortin@pfortin.com
2023-10-01 12:09:26 -04:00
Andrew Dunstan 276393f53e Only evaluate default values as required when doing COPY FROM
Commit 9f8377f7a2 was a little too eager in fetching default values.
Normally this would not matter, but if the default value is not valid
for the type (e.g. a varchar that's too long) it caused an unnecessary
error.

Complaint and fix from Laurenz Albe

Backpatch to release 16.

Discussion: https://postgr.es/m/75a7b7483aeb331aa017328d606d568fc715b90d.camel@cybertec.at
2023-10-01 10:18:41 -04:00
Andrew Dunstan f6d4c9cf16 Provide FORCE_NULL * and FORCE_NOT_NULL * options for COPY FROM
These options already exist, but you need to specify a column list for
them, which can be cumbersome. We already have the possibility of all
columns for FORCE QUOTE, so this is simply extending that facility to
FORCE_NULL and FORCE_NOT_NULL.

Author: Zhang Mingli
Reviewed-By: Richard Guo, Kyatoro Horiguchi, Michael Paquier.

Discussion: https://postgr.es/m/CACJufxEnVqzOFtqhexF2+AwOKFrV8zHOY3y=p+gPK6eB14pn_w@mail.gmail.com
2023-09-30 12:34:41 -04:00
Heikki Linnakangas c181f2e2bc Fix briefly showing old progress stats for ANALYZE on inherited tables.
ANALYZE on a table with inheritance children analyzes all the child
tables in a loop. When stepping to next child table, it updated the
child rel ID value in the command progress stats, but did not reset
the 'sample_blks_total' and 'sample_blks_scanned' counters.
acquire_sample_rows() updates 'sample_blks_total' as soon as the scan
starts and 'sample_blks_scanned' after processing the first block, but
until then, pg_stat_progress_analyze would display a bogus combination
of the new child table relid with old counter values from the
previously processed child table. Fix by resetting 'sample_blks_total'
and 'sample_blks_scanned' to zero at the same time that
'current_child_table_relid' is updated.

Backpatch to v13, where pg_stat_progress_analyze view was introduced.

Reported-by: Justin Pryzby
Discussion: https://www.postgresql.org/message-id/20230122162345.GP13860%40telsasoft.com
2023-09-30 17:03:50 +03:00
Dean Rasheed 1d5caec221 Fix EvalPlanQual rechecking during MERGE.
Under some circumstances, concurrent MERGE operations could lead to
inconsistent results, that varied according the plan chosen. This was
caused by a lack of rowmarks on the source relation, which meant that
EvalPlanQual rechecking was not guaranteed to return the same source
tuples when re-running the join query.

Fix by ensuring that preprocess_rowmarks() sets up PlanRowMarks for
all non-target relations used in MERGE, in the same way that it does
for UPDATE and DELETE.

Per bug #18103. Back-patch to v15, where MERGE was introduced.

Dean Rasheed, reviewed by Richard Guo.

Discussion: https://postgr.es/m/18103-c4386baab8e355e3%40postgresql.org
2023-09-30 10:52:21 +01:00
Bruce Momjian 6d0c39a293 C comment: add optimizer function reference
Reported-by: James Coleman

Discussion: https://postgr.es/m/CAAaqYe9F6uoMhAr+8rMLwvGzaKaSknPA0Wi3Ehtv8pbSYmJq-Q@mail.gmail.com

Backpatch-through: master
2023-09-29 14:25:59 -04:00
David Rowley d40d827219 Robustify find_base_rel and find_base_rel_ignore_join
Improve find_base_rel() and find_base_rel_ignore_join() so that they
raise an ERROR if they ever receive a negative relid value in
non-cassert builds.  If either of these functions had ever received a
negative relid then they'd have attempted to access memory that does not
belong to simple_rel_array.

Because no evidence has been presented of actual cases where bugs have
caused this to happen, here we take a lightweight approach to checking
for negative values and simply cast both values to uint32 before
performing the comparison.  This will cause any negative relids to be
seen as greater than simple_rel_array_size which will ERROR rather than
attempt to access a negative simple_rel_array element.  Obviously, the
run-time error is better than a crash, so it makes sense to protect
against this, especially when it can be done without adding any
additional run-time overhead.

There is a slight change here if the functions are ever called with a
relid of 0.  This will pass the bounds check, but that array entry
should be NULL (along with the corresponding simple_rte_array entry), so
won't pass the "if (rel)" condition and still fall through and raise an
ERROR.

Author: Ranier Vilela
Reviewed-by: Ashutosh Bapat, David Rowley
Discussion: https://postgr.es/m/CAEudQArQSghBu2gLojg4o_tnHj_x2HcS%3D%2BwewL3NJS8z0VnK%2Bg%40mail.gmail.com
2023-09-29 16:58:32 +13:00
Peter Geoghegan 714780dcdd Fix btmarkpos/btrestrpos array key wraparound bug.
nbtree's mark/restore processing failed to correctly handle an edge case
involving array key advancement and related search-type scan key state.
Scans with ScalarArrayScalarArrayOpExpr quals requiring mark/restore
processing (for a merge join) could incorrectly conclude that an
affected array/scan key must not have advanced during the time between
marking and restoring the scan's position.

As a result of all this, array key handling within btrestrpos could skip
a required call to _bt_preprocess_keys().  This confusion allowed later
primitive index scans to overlook tuples matching the true current array
keys.  The scan's search-type scan keys would still have spurious values
corresponding to the final array element(s) -- not values matching the
first/now-current array element(s).

To fix, remember that "array key wraparound" has taken place during the
ongoing btrescan in a flag variable stored in the scan's state, and use
that information at the point where btrestrpos decides if another call
to _bt_preprocess_keys is required.

Oversight in commit 70bc5833, which taught nbtree to handle array keys
during mark/restore processing, but missed this subtlety.  That commit
was itself a bug fix for an issue in commit 9e8da0f7, which taught
nbtree to handle ScalarArrayOpExpr quals natively.

Author: Peter Geoghegan <pg@bowt.ie>
Discussion: https://postgr.es/m/CAH2-WzkgP3DDRJxw6DgjCxo-cu-DKrvjEv_ArkP2ctBJatDCYg@mail.gmail.com
Backpatch: 11- (all supported branches).
2023-09-28 16:29:37 -07:00
Tom Lane 9f71e10d65 Fix checking of index expressions in CompareIndexInfo().
This code was sloppy about comparison of index columns that
are expressions.  It didn't reliably reject cases where one
index has an expression where the other has a plain column,
and it could index off the start of the attmap array, leading
to a Valgrind complaint (though an actual crash seems unlikely).

I'm not sure that the expression-vs-column sloppiness leads
to any visible problem in practice, because the subsequent
comparison of the two expression lists would reject cases
where the indexes have different numbers of expressions
overall.  Maybe we could falsely match indexes having the
same expressions in different column positions, but it'd
require unlucky contents of the word before the attmap array.
It's not too surprising that no problem has been reported
from the field.  Nonetheless, this code is clearly wrong.

Per bug #18135 from Alexander Lakhin.  Back-patch to all
supported branches.

Discussion: https://postgr.es/m/18135-532f4a755e71e4d2@postgresql.org
2023-09-28 14:05:25 -04:00
Robert Haas 4e9fc3a976 Return data from heap_page_prune via a struct.
Previously, one of the values in the struct was returned as the return
value, and another was returned via an output parameter. In
preparation for returning more stuff, consolidate both values into a
struct returned via an output parameter.

Melanie Plageman, reviewed by Andres Freund and by me.

Discussion: https://postgr.es/m/CAAKRu_br124qsGJieuYA0nGjywEukhK1dKBfRdby_4yY3E9SXA%40mail.gmail.com
2023-09-28 10:36:34 -04:00
David Rowley c4a1933b48 Add missing TidRangePath handling in print_path()
Tid Range scans were added back in bb437f995.  That commit forgot to add
handling for TidRangePaths in print_path().

Only people building with OPTIMIZER_DEBUG might have noticed this, which
likely is the reason it's taken 4 years for anyone to notice.

Author: Andrey Lepikhov
Reported-by: Andrey Lepikhov
Discussion: https://postgr.es/m/379082d6-1b6a-4cd6-9ecf-7157d8c08635@postgrespro.ru
Backpatch-through: 14, where bb437f995 was introduced
2023-09-29 00:02:22 +13:00
Etsuro Fujita c68f78538f Fix typo in src/backend/access/transam/README. 2023-09-28 19:45:00 +09:00
Amit Langote d060e921ea Remove obsolete executor cleanup code
This commit removes unnecessary ExecExprFreeContext() calls in
ExecEnd* routines because the actual cleanup is managed by
FreeExecutorState(). With no callers remaining for
ExecExprFreeContext(), this commit also removes the function.

This commit also drops redundant ExecClearTuple() calls, because
ExecResetTupleTable() in ExecEndPlan() already takes care of
resetting and dropping all TupleTableSlots initialized with
ExecInitScanTupleSlot() and ExecInitExtraTupleSlot().

After these modifications, the ExecEnd*() routines for ValuesScan,
NamedTuplestoreScan, and WorkTableScan became redundant. So, this
commit removes them.

Reviewed-by: Robert Haas
Discussion: https://postgr.es/m/CA+HiwqFGkMSge6TgC9KQzde0ohpAycLQuV7ooitEEpbKB0O_mg@mail.gmail.com
2023-09-28 09:44:39 +09:00
Michael Paquier 9210afd3bc Move tracking of in_streaming to PGOutputData
"in_streaming" is a flag used to track if an instance of pgoutput is
streaming changes.  When pgoutput is started, the flag was always reset,
switched it back and forth in the stream start/stop callbacks.

Before this commit, it was a global variable, which is confusing as it
is actually attached to a state of PGOutputData.  Per my analysis, using
a global variable did not lead to an active bug like in 54ccfd6586,
but it makes the code more consistent.  Note that we cannot backpatch
this change anyway as it requires the addition of a new field to
PGOutputData, exposed in pgoutput.h.

Author: Hou Zhijie
Reviewed-by: Amit Kapila, Michael Paquier, Peter Smith
Discussion: https://postgr.es/m/OS0PR01MB571690EF24F51F51EFFCBB0E94FAA@OS0PR01MB5716.jpnprd01.prod.outlook.com
2023-09-28 09:33:51 +09:00
Peter Eisentraut ebf76f2753 Add TupleDescGetDefault()
This unifies some repetitive code.

Note: I didn't push the "not found" error message into the new
function, even though all existing callers would be able to make use
of it.  Using the existing error handling as-is would probably require
exposing the Relation type via tupdesc.h, which doesn't seem
desirable.  (Or even if we changed it to just report the OID, it would
inject the concept of a relation containing the tuple descriptor into
tupdesc.h, which might be a layering violation.  Perhaps some further
improvements could be considered here separately.)

Discussion: https://www.postgresql.org/message-id/flat/52a125e4-ff9a-95f5-9f61-b87cf447e4da%40eisentraut.org
2023-09-27 18:52:40 +01:00
Daniel Gustafsson 9dce22033d llvmjit: Use explicit LLVMContextRef for inlining
When performing inlining LLVM unfortunately "leaks" types (the
types survive and are usable, but a new round of inlining will
recreate new structurally equivalent types). This accumulation
will over time amount to a memory leak which for some queries
can be large enough to trigger the OOM process killer.

To avoid accumulation of types, all IR related data is stored
in an LLVMContextRef which is dropped and recreated in order
to release all types.  Dropping and recreating incurs overhead,
so it will be done only after 100 queries. This is a heuristic
which might be revisited, but until we can get the size of the
context from LLVM we are flying a bit blind.

This issue has been reported several times, there may be more
references to it in the archives on top of the threads linked
below.

Backpatching of this fix will be handled once it has matured
in master for a bit.

Reported-By: Justin Pryzby <pryzby@telsasoft.com>
Reported-By: Kurt Roeckx <kurt@roeckx.be>
Reported-By: Jaime Casanova <jcasanov@systemguards.com.ec>
Reported-By: Lauri Laanmets <pcspets@gmail.com>
Author: Andres Freund and Daniel Gustafsson
Discussion: https://postgr.es/m/7acc8678-df5f-4923-9cf6-e843131ae89d@www.fastmail.com
Discussion: https://postgr.es/m/20201218235607.GC30237@telsasoft.com
Discussion: https://postgr.es/m/CAPH-tTxLf44s3CvUUtQpkDr1D8Hxqc2NGDzGXS1ODsfiJ6WSqA@mail.gmail.com
2023-09-27 13:02:21 +02:00
Daniel Gustafsson ef668d8bf5 llvmjit: Make llvm_types_module variable static
Commit b059d2f456 introduced llvm_types_module and accidentally
exported it. As there is no usecase for accessing this variable
externally, this makes it static.

Author: Andres Freund <andres@anarazel.de>
Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Discussion: https://postgr.es/m/20221101055132.pjjsvlkeo4stbjkq@awork3.anarazel.de
2023-09-27 13:02:14 +02:00
Daniel Gustafsson 2dad308e73 llvmjit: Remove unnecessary types
These types were added in fb46ac26fe but hasn't been used, so
remove until there is a need for them.

Author: Andres Freund <andres@anarazel.de>
Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Discussion: https://postgr.es/m/20221101055132.pjjsvlkeo4stbjkq@awork3.anarazel.de
2023-09-27 13:02:01 +02:00
Amit Kapila 54ccfd6586 Fix the misuse of origin filter across multiple pg_logical_slot_get_changes() calls.
The pgoutput module uses a global variable (publish_no_origin) to cache
the action for the origin filter, but we didn't reset the flag when
shutting down the output plugin, so subsequent retries may access the
previous publish_no_origin value.

We fix this by storing the flag in the output plugin's private data.
Additionally, the patch removes the currently unused origin string from the
structure.

For the back branch, to avoid changing the exposed structure, we eliminated the
global variable and instead directly used the origin string for change
filtering.

Author: Hou Zhijie
Reviewed-by: Amit Kapila, Michael Paquier
Backpatch-through: 16
Discussion: http://postgr.es/m/OS0PR01MB571690EF24F51F51EFFCBB0E94FAA@OS0PR01MB5716.jpnprd01.prod.outlook.com
2023-09-27 14:32:51 +05:30
Bruce Momjian 441bbd2988 doc: correct reference to pg_relation in comment
Reported-by: Dagfinn Ilmari Mannsåker

Discussion: https://postgr.es/m/87sf9apnr0.fsf@wibble.ilmari.org

Backpatch-through: master
2023-09-26 17:07:14 -04:00
Peter Eisentraut b0ae29512c MergeAttributes() and related variable renaming
Mainly, rename "schema" to "columns" and related changes.  The
previous naming has long been confusing.

Discussion: https://www.postgresql.org/message-id/flat/52a125e4-ff9a-95f5-9f61-b87cf447e4da%40eisentraut.org
2023-09-26 16:08:35 +01:00
Peter Eisentraut 369202bf4b Clean up MergeCheckConstraint()
If the constraint is not already in the list, add it ourselves,
instead of making the caller do it.  This makes the interface more
consistent with other "merge" functions in this file.

Discussion: https://www.postgresql.org/message-id/flat/52a125e4-ff9a-95f5-9f61-b87cf447e4da%40eisentraut.org
2023-09-26 14:01:53 +01:00
Heikki Linnakangas 28d3c2ddcf Fix another bug in parent page splitting during GiST index build.
Yet another bug in the ilk of commits a7ee7c851 and 741b88435. In
741b88435, we took care to clear the memorized location of the
downlink when we split the parent page, because splitting the parent
page can move the downlink. But we missed that even *updating* a tuple
on the parent can move it, because updating a tuple on a gist page is
implemented as a delete+insert, so the updated tuple gets moved to the
end of the page.

This commit fixes the bug in two different ways (belt and suspenders):

1. Clear the downlink when we update a tuple on the parent page, even
   if it's not split. This the same approach as in commits a7ee7c851
   and 741b88435.

   I also noticed that gistFindCorrectParent did not clear the
   'downlinkoffnum' when it stepped to the right sibling. Fix that
   too, as it seems like a clear bug even though I haven't been able
   to find a test case to hit that.

2. Change gistFindCorrectParent so that it treats 'downlinkoffnum'
   merely as a hint. It now always first checks if the downlink is
   still at that location, and if not, it scans the page like before.
   That's more robust if there are still more cases where we fail to
   clear 'downlinkoffnum' that we haven't yet uncovered. With this,
   it's no longer necessary to meticulously clear 'downlinkoffnum',
   so this makes the previous fixes unnecessary, but I didn't revert
   them because it still seems nice to clear it when we know that the
   downlink has moved.

Also add the test case using the same test data that Alexander
posted. I tried to reduce it to a smaller test, and I also tried to
reproduce this with different test data, but I was not able to, so
let's just include what we have.

Backpatch to v12, like the previous fixes.

Reported-by: Alexander Lakhin
Discussion: https://www.postgresql.org/message-id/18129-caca016eaf0c3702@postgresql.org
2023-09-26 14:14:49 +03:00
Peter Eisentraut 64b787656d Add some const qualifiers
There was a mismatch between the const qualifiers for
excludeDirContents in src/backend/backup/basebackup.c and
src/bin/pg_rewind/filemap.c, which led to a quick search for similar
cases.  We should make excludeDirContents match, but the rest of the
changes seem like a good idea as well.

Author: David Steele <david@pgmasters.net>
Discussion: https://www.postgresql.org/message-id/flat/669a035c-d23d-2f38-7ff0-0cb93e01d610@pgmasters.net
2023-09-26 11:28:57 +01:00
Peter Eisentraut eddad679d2 Clean up MergeAttributesIntoExisting()
Make variable naming clearer and more consistent.  Move some variables
to smaller scope.  Remove some unnecessary intermediate variables.
Try to save some vertical space.

Apply analogous changes to nearby MergeConstraintsIntoExisting() and
RemoveInheritance() for consistency.

Discussion: https://www.postgresql.org/message-id/flat/52a125e4-ff9a-95f5-9f61-b87cf447e4da%40eisentraut.org
2023-09-26 09:09:36 +01:00
Peter Eisentraut eb36c6ac84 Remove unused include
This was added in add5cf28d4 but was apparently never used.

Discussion: https://www.postgresql.org/message-id/flat/f84640e3-00d3-5abd-3f41-e6a19d33c40b@eisentraut.org
2023-09-26 07:56:41 +01:00
Michael Paquier e221c0befb Fix behavior of "force" in pgstat_report_wal()
As implemented in 5891c7a8ed, setting "force" to true in
pgstat_report_wal() causes the routine to not wait for the pgstat
shmem lock if it cannot be acquired, in which case the WAL and I/O
statistics finish by not being flushed.  The origin of the confusion
comes from pgstat_flush_wal() and pgstat_flush_io(), that use "nowait"
as sole argument.  The I/O stats are new in v16.

This is the opposite behavior of what has been used in
pgstat_report_stat(), where "force" is the opposite of "nowait".  In
this case, when "force" is true, the routine sets "nowait" to false,
which would cause the routine to wait for the pgstat shmem lock,
ensuring that the stats are always flushed.  When "force" is false,
"nowait" is set to true, and the stats would only not be flushed if the
pgstat shmem lock can be acquired, returning immediately without
flushing the stats if the lock cannot be acquired.

This commit changes pgstat_report_wal() so as "force" has the same
behavior as in pgstat_report_stat().  There are currently three callers
of pgstat_report_wal():
- Two in the checkpointer where force=true during a shutdown and the
main checkpointer loop.  Now the code behaves so as the stats are always
flushed.
- One in the main loop of the bgwriter, where force=false.  Now the code
behaves so as the stats would not be flushed if the pgstat shmem lock
could not be acquired.

Before this commit, some stats on WAL and I/O could have been lost after
a shutdown, for example.

Reported-by: Ryoga Yoshida
Author: Ryoga Yoshida, Michael Paquier
Discussion: https://postgr.es/m/f87a4d7be70530606b864fd1df91718c@oss.nttdata.com
Backpatch-through: 15
2023-09-26 09:29:47 +09:00
Thomas Munro becfbdd6c1 Fix edge-case for xl_tot_len broken by bae868ca.
bae868ca removed a check that was still needed.  If you had an
xl_tot_len at the end of a page that was too small for a record header,
but not big enough to span onto the next page, we'd immediately perform
the CRC check using a bogus large length.  Because of arbitrary coding
differences between the CRC implementations on different platforms,
nothing very bad happened on common modern systems.  On systems using
the _sb8.c fallback we could segfault.

Restore that check, add a new assertion and supply a test for that case.
Back-patch to 12, like bae868ca.

Tested-by: Tom Lane <tgl@sss.pgh.pa.us>
Tested-by: Alexander Lakhin <exclusion@gmail.com>
Discussion: https://postgr.es/m/CA%2BhUKGLCkTT7zYjzOxuLGahBdQ%3DMcF%3Dz5ZvrjSOnW4EDhVjT-g%40mail.gmail.com
2023-09-26 10:53:38 +13:00
Nathan Bossart 13aeaf0797 Add worker type to pg_stat_subscription.
Thanks to commit 2a8b40e368, the logical replication worker type is
easily determined.  The worker type could already be deduced via
other columns such as leader_pid and relid, but that is unnecessary
complexity for users.

Bumps catversion.

Author: Peter Smith
Reviewed-by: Michael Paquier, Maxim Orlov, Amit Kapila
Discussion: https://postgr.es/m/CAHut%2BPtmbSMfErSk0S7xxVdZJ9XVE3xVLhqBTmT91kf57BeKDQ%40mail.gmail.com
2023-09-25 14:12:43 -07:00
Tom Lane dc8d72c1c2 Collect dependency information for parsed CallStmts.
Parse analysis of a CallStmt will inject mutable information,
for instance the OID of the called procedure, so that subsequent
DDL may create a need to re-parse the CALL.  We failed to detect
this for CALLs in plpgsql routines, because no dependency information
was collected when putting a CallStmt into the plan cache.  That
could lead to misbehavior or strange errors such as "cache lookup
failed".

Before commit ee895a655, the issue would only manifest for CALLs
appearing in atomic contexts, because we re-planned non-atomic
CALLs every time through anyway.

It is now apparent that extract_query_dependencies() probably
needs a special case for every utility statement type for which
stmt_requires_parse_analysis() returns true.  I wanted to add
something like Assert(!stmt_requires_parse_analysis(...)) when
falling out of extract_query_dependencies_walker without doing
anything, but there are API issues as well as a more fundamental
point: stmt_requires_parse_analysis is supposed to be applied to
raw parser output, so it'd be cheating to assume it will give the
correct answer for post-parse-analysis trees.  I contented myself
with adding a comment.

Per bug #18131 from Christian Stork.  Back-patch to all supported
branches.

Discussion: https://postgr.es/m/18131-576854e79c5cd264@postgresql.org
2023-09-25 14:42:17 -04:00
Tom Lane cf1c65070a Limit to_tsvector_byid's initial array allocation to something sane.
The initial estimate of the number of distinct ParsedWords is just
that: an estimate.  Don't let it exceed what palloc is willing to
allocate.  If in fact we need more entries, we'll eventually fail
trying to enlarge the array.  But if we don't, this allows success on
inputs that currently draw "invalid memory alloc request size".

Per bug #18080 from Uwe Binder.  Back-patch to all supported branches.

Discussion: https://postgr.es/m/18080-d5c5e58fef8c99b7@postgresql.org
2023-09-25 11:50:28 -04:00
Tom Lane 3aff1d3fd0 Doc: improve cross-reference in Makefile comment.
Per gripe from Japin Li.

Discussion: https://postgr.es/m/MEYP282MB16692171F13B5DF40DB768EEB6FCA@MEYP282MB1669.AUSP282.PROD.OUTLOOK.COM
2023-09-25 11:25:19 -04:00
Daniel Gustafsson c1609cf3c0 Fix typo in numutils.c comments
s/messges/messages/
2023-09-25 13:29:34 +02:00
Daniel Gustafsson 7750fefdb2 Add GUC for temporarily disabling event triggers
In order to troubleshoot misbehaving or buggy event triggers, the
documented advice is to enter single-user mode.  In an attempt to
reduce the number of situations where single-user mode is required
(or even recommended) for non-extraordinary maintenance, this GUC
allows to temporarily suspend event triggers.

This was originally extracted from a larger patchset which aimed
at supporting event triggers on login events.

Reviewed-by: Ted Yu <yuzhihong@gmail.com>
Reviewed-by: Mikhail Gribkov <youzhick@gmail.com>
Reviewed-by: Justin Pryzby <pryzby@telsasoft.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Discussion: https://postgr.es/m/9140106E-F9BF-4D85-8FC8-F2D3C094A6D9@yesql.se
Discussion: https://postgr.es/m/0d46d29f-4558-3af9-9c85-7774e14a7709@postgrespro.ru
2023-09-25 12:41:49 +02:00
Thomas Munro bae868caf2 Don't trust unvalidated xl_tot_len.
xl_tot_len comes first in a WAL record.  Usually we don't trust it to be
the true length until we've validated the record header.  If the record
header was split across two pages, previously we wouldn't do the
validation until after we'd already tried to allocate enough memory to
hold the record, which was bad because it might actually be garbage
bytes from a recycled WAL file, so we could try to allocate a lot of
memory.  Release 15 made it worse.

Since 70b4f82a4b, we'd at least generate an end-of-WAL condition if the
garbage 4 byte value happened to be > 1GB, but we'd still try to
allocate up to 1GB of memory bogusly otherwise.  That was an
improvement, but unfortunately release 15 tries to allocate another
object before that, so you could get a FATAL error and recovery could
fail.

We can fix both variants of the problem more fundamentally using
pre-existing page-level validation, if we just re-order some logic.

The new order of operations in the split-header case defers all memory
allocation based on xl_tot_len until we've read the following page.  At
that point we know that its first few bytes are not recycled data, by
checking its xlp_pageaddr, and that its xlp_rem_len agrees with
xl_tot_len on the preceding page.  That is strong evidence that
xl_tot_len was truly the start of a record that was logged.

This problem was most likely to occur on a standby, because
walreceiver.c recycles WAL files without zeroing out trailing regions of
each page.  We could fix that too, but it wouldn't protect us from rare
crash scenarios where the trailing zeroes don't make it to disk.

With reliable xl_tot_len validation in place, the ancient policy of
considering malloc failure to indicate corruption at end-of-WAL seems
quite surprising, but changing that is left for later work.

Also included is a new TAP test to exercise various cases of end-of-WAL
detection by writing contrived data into the WAL from Perl.

Back-patch to 12.  We decided not to put this change into the final
release of 11.

Author: Thomas Munro <thomas.munro@gmail.com>
Author: Michael Paquier <michael@paquier.xyz>
Reported-by: Alexander Lakhin <exclusion@gmail.com>
Reviewed-by: Noah Misch <noah@leadboat.com> (the idea, not the code)
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Sergei Kornilov <sk@zsrv.org>
Reviewed-by: Alexander Lakhin <exclusion@gmail.com>
Discussion: https://postgr.es/m/17928-aa92416a70ff44a2%40postgresql.org
2023-09-23 10:26:24 +12:00
Daniel Gustafsson 5f3aa309a8 Avoid potential pfree on NULL on OpenSSL errors
Guard against the pointer being NULL before pfreeing upon an error
returned from OpenSSL.  Also handle errors from X509_NAME_print_ex
which can return -1 on memory allocation errors.

Backpatch down to v15 where the code was added.

Author: Sergey Shinderuk <s.shinderuk@postgrespro.ru>
Discussion: https://postgr.es/m/8db5374d-32e0-6abb-d402-40762511eff2@postgrespro.ru
Backpatch-through: v15
2023-09-22 11:18:25 +02:00
Peter Eisentraut e59fcbd712 Simplify information schema check constraint deparsing
The computation of the column
information_schema.check_constraints.check_clause used
pg_get_constraintdef() plus some string manipulation to get the check
clause back out.  This ended up with an extra pair of parentheses,
which is only an aesthetic problem, but also with suffixes like "NOT
VALID", which don't belong into that column.  We can fix both of these
problems and simplify the code by just using pg_get_expr() instead.

Discussion: https://www.postgresql.org/message-id/799b59ef-3330-f0d2-ee23-8cdfa1740987@eisentraut.org
2023-09-22 07:43:26 +02:00
Tom Lane 48e2b234f8 Fix COMMIT/ROLLBACK AND CHAIN in the presence of subtransactions.
In older branches, COMMIT/ROLLBACK AND CHAIN failed to propagate
the current transaction's properties to the new transaction if
there was any open subtransaction (unreleased savepoint).
Instead, some previous transaction's properties would be restored.
This is because the "if (s->chain)" check in CommitTransactionCommand
examined the wrong instance of the "chain" flag and falsely
concluded that it didn't need to save transaction properties.

Our regression tests would have noticed this, except they used
identical transaction properties for multiple tests in a row,
so that the faulty behavior was not distinguishable from correct
behavior.

Commit 12d768e70 fixed the problem in v15 and later, but only rather
accidentally, because I removed the "if (s->chain)" test to avoid a
compiler warning, while not realizing that the warning was flagging a
real bug.

In v14 and before, remove the if-test and save transaction properties
unconditionally; just as in the newer branches, that's not expensive
enough to justify thinking harder.

Add the comment and extra regression test to v15 and later to
forestall any future recurrence, but there's no live bug in those
branches.

Patch by me, per bug #18118 from Liu Xiang.  Back-patch to v12 where
the AND CHAIN feature was added.

Discussion: https://postgr.es/m/18118-4b72fcbb903aace6@postgresql.org
2023-09-21 23:11:30 -04:00
Etsuro Fujita c621467d2b Update comment about set_join_pathlist_hook().
The comment introduced by commit e7cb7ee14 was a bit too terse, which
could lead to extensions doing different things within the hook function
than we intend to allow.  Extend the comment to explain what they can do
within the hook function.

Back-patch to all supported branches.

In passing, I rephrased a nearby comment that I recently added to the
back branches.

Reviewed by David Rowley and Andrei Lepikhov.

Discussion: https://postgr.es/m/CAPmGK15SBPA1nr3Aqsdm%2BYyS-ay0Ayo2BRYQ8_A2To9eLqwopQ%40mail.gmail.com
2023-09-21 19:45:00 +09:00
Michael Paquier c868cbfef7 Fix typos in pgoutput.c
RelationSyncCache was mentioned in two comments under a different name.
Issue noticed while reviewing a different patch touching the same area.

Introduced by 665d1fad99.

Discussion: https://postgr.es/m/ZQk1Ca_eFDTmBiZy@paquier.xyz
2023-09-20 10:02:12 +09:00
Peter Eisentraut c5b0582841 Replace more MemSet calls with struct initialization
This fixes up 10ea0f924a to use the style introduced by 9fd45870c1.

Author: Richard Guo <guofenglinux@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/CAMbWs490gJf5A=ydqyjh+Z8mVQa_foTGtcmBtHGLra0aOwLWHQ@mail.gmail.com
2023-09-19 11:35:01 +02:00
Heikki Linnakangas bf094372d1 Fix GiST README's explanation of the NSN cross-check.
The text got the condition backwards, it's "NSN > LSN", not "NSN < LSN".
While we're at it, expand it a little for clarity.

Reviewed-by: Daniel Gustafsson
Discussion: https://www.postgresql.org/message-id/4cb46e18-e688-524a-0f73-b1f03ed5d6ee@iki.fi
2023-09-19 11:53:51 +03:00
Peter Eisentraut 9847ca2c79 Standardize type of extend_by counter
The counter of extend_by loops is mixed int and uint32.  Fix by
standardizing from int to uint32, to match the extend_by variable.

Fixup for 31966b151e.

Author: Ranier Vilela <ranier.vf@gmail.com>
Reviewed-by: Gurjeet Singh <gurjeet@singh.im>
Reviewed-by: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/CAEudQAqHG-JP-YnG54ftL_b7v6-57rMKwET_MSvEoen0UHuPig@mail.gmail.com
2023-09-19 09:46:01 +02:00
Michael Paquier 78a33bba4c Improve error message for snapshot import in snapmgr.c, take two
When a snapshot file fails to be read in ImportSnapshot(), it would
issue an ERROR as "invalid snapshot identifier" when opening a stream
for it in read-only mode.  The error handling is improved to be more
talkative in failure cases:
- If a snapshot identifier uses incorrect characters, complain with the
same error as before this commit.
- If the snapshot file cannot be found in pg_snapshots/, complain with a
"snapshot \"foo\" does not exist" instead.  This maps to the case where
AllocateFile() fails on ENOENT.  Based on a suggestion from Andres
Freund.
- If AllocateFile() throws something else than ENOENT as errno, report
it with more details in %m instead, as these failures are never
expected.

b29504eeb489 was the first improvement take.  The older error message
exists since bb446b689b that introduced snapshot imports.  Two test
cases are added to cover the cases of an identifier with an incorrect
format and of a missing snapshot.

Author: Bharath Rupireddy
Reviewed-by: Andres Freund, Daniel Gustafsson, Michael Paquier
Discussion: https://postgr.es/m/CALj2ACWmr=3KdxDkm8h7Zn1XxBoF6hdzq8WQyMn2y1OL5RYFrg@mail.gmail.com
2023-09-19 10:19:50 +09:00
Nathan Bossart 5af0263afd Make binaryheap available to frontend code.
There are a couple of places in frontend code that could make use
of this simple binary heap implementation.  This commit makes
binaryheap usable in frontend code, much like commit 26aaf97b68 did
for StringInfo.  Like StringInfo, the header file is left in lib/
to reduce the likelihood of unnecessary breakage.

The frontend version of binaryheap exposes a void *-based API since
frontend code does not have access to the Datum definitions.  This
seemed like a better approach than switching all existing uses to
void * or making the Datum definitions available to frontend code.

Reviewed-by: Tom Lane, Alvaro Herrera
Discussion: https://postgr.es/m/3612876.1689443232%40sss.pgh.pa.us
2023-09-18 12:18:33 -07:00
Tom Lane f73fa5a470 Don't crash if cursor_to_xmlschema is used on a non-data-returning Portal.
cursor_to_xmlschema() assumed that any Portal must have a tupDesc,
which is not so.  Add a defensive check.

It's plausible that this mistake occurred because of the rather
poorly chosen name of the lookup function SPI_cursor_find(),
which in such cases is returning something that isn't very much
like a cursor.  Add some documentation to try to forestall future
errors of the same ilk.

Report and patch by Boyu Yang (docs changes by me).  Back-patch
to all supported branches.

Discussion: https://postgr.es/m/dd343010-c637-434c-a8cb-418f53bda3b8.yangboyu.yby@alibaba-inc.com
2023-09-18 14:28:17 -04:00
Peter Eisentraut a0a5e0feb3 Fix information schema for catalogued not-null constraints
The column check_constraints.check_clause should be like

    col IS NOT NULL

without a surrounding CHECK (...).

Discussion: https://www.postgresql.org/message-id/09489196-0bc1-e796-c43e-63425f7c5910@eisentraut.org
2023-09-18 08:10:51 +02:00
Tom Lane e0e492e5a9 Track nesting depth correctly when drilling down into RECORD Vars.
expandRecordVariable() failed to adjust the parse nesting structure
correctly when recursing to inspect an outer-level Var.  This could
result in assertion failures or core dumps in corner cases.

Likewise, get_name_for_var_field() failed to adjust the deparse
namespace stack correctly when recursing to inspect an outer-level
Var.  In this case the likely result was a "bogus varno" error
while deparsing a view.

Per bug #18077 from Jingzhou Fu.  Back-patch to all supported
branches.

Richard Guo, with some adjustments by me

Discussion: https://postgr.es/m/18077-b9db97c6e0ab45d8@postgresql.org
2023-09-15 17:01:52 -04:00
Daniel Gustafsson a396e20ad0 Rename variable for code clarity
When tracking IO timing for WAL, the duration is what we calculate
based on the start and end timestamps, it's not what the variable
contains. Rename the timestamp variable to end to better communicate
what it contains.  Original patch by Krishnakumar with additional
hacking to fix another occurrence by me.

Author: Krishnakumar R <kksrcv001@gmail.com>
Discussion: https://postgr.es/m/CAPMWgZ9f9o8awrQpjo8oxnNQ=bMDVPx00NE0QcDzvHD_ZrdLPw@mail.gmail.com
2023-09-15 19:05:57 +02:00
Heikki Linnakangas 18724af9e8 Remove unnecessary smgrimmedsync() when creating unlogged table.
This became safe after commit 4b4798e138. The smgrcreate() call will
now register the segment for syncing at the next checkpoint, so we
don't need to sync it here. If a checkpoint happens before the
creation is WAL-logged, the records will be replayed when starting
recovery from the checkpoint. If a checkpoint happens after the WAL
logging, the checkpoint will fsync() it.

In the passing, clarify a comment in smgrDoPendingSyncs().

Discussion: https://www.postgresql.org/message-id/6e5bbc08-cdfc-b2b3-9e23-1a914b9850a9%40iki.fi
Reviewed-by: Robert Haas
2023-09-15 17:29:37 +03:00
Daniel Gustafsson b0ec61c9c2 Quote filenames in error messages
The majority of all filenames are quoted in user facing error and
log messages, but a few were still printed without quotes.  While
these filenames do not risk causing any ambiguity as their format
is strict, quote them anyways to be consistent across all logs.

Also concatenate a message to keep it one line to make it easier
to grep for in the code.

Reviewed-by: Peter Eisentraut <peter@eisentraut.org>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/080EEABE-6645-4A46-AB20-6285ADAC44FE@yesql.se
2023-09-14 11:17:33 +02:00
Peter Eisentraut be6f7cd9bb Fix indentation in SQL file 2023-09-14 09:42:43 +02:00
Michael Paquier be022908cf Revert "Improve error message on snapshot import in snapmgr.c"
This reverts commit a0d87bcd9b, following a remark from Andres Frend
that the new error can be triggered with an incorrect SET TRANSACTION
SNAPSHOT command without being really helpful for the user as it uses
the internal file name.

Discussion: https://postgr.es/m/20230914020724.hlks7vunitvtbbz4@awork3.anarazel.de
Backpatch-through: 11
2023-09-14 16:00:01 +09:00
Amit Kapila e0b2eed047 Flush logical slots to disk during a shutdown checkpoint if required.
It's entirely possible for a logical slot to have a confirmed_flush LSN
higher than the last value saved on disk while not being marked as dirty.
Currently, it is not a major problem but a later patch adding support for
the upgrade of slots relies on that value being properly flushed to disk.

It can also help avoid processing the same transactions again in some
boundary cases after the clean shutdown and restart.  Say, we process
some transactions for which we didn't send anything downstream (the
changes got filtered) but the confirm_flush LSN is updated due to
keepalives.  As we don't flush the latest value of confirm_flush LSN, it
may lead to processing the same changes again without this patch.

The approach taken by this patch has been suggested by Ashutosh Bapat.

Author: Vignesh C, Julien Rouhaud, Kuroda Hayato
Reviewed-by: Amit Kapila, Dilip Kumar, Michael Paquier, Ashutosh Bapat, Peter Smith, Hou Zhijie
Discussion: http://postgr.es/m/CAA4eK1JzJagMmb_E8D4au=GYQkxox0AfNBm1FbP7sy7t4YWXPQ@mail.gmail.com
Discussion: http://postgr.es/m/TYAPR01MB58664C81887B3AF2EB6B16E3F5939@TYAPR01MB5866.jpnprd01.prod.outlook.com
2023-09-14 08:57:05 +05:30
Andres Freund 7369798a83 Fix tracking of temp table relation extensions as writes
Karina figured out that I (Andres) confused BufferUsage.temp_blks_written with
BufferUsage.local_blks_written in fcdda1e4b5.

Tests in core PG can't easily test this, as BufferUsage is just used for
EXPLAIN (ANALYZE, BUFFERS) and pg_stat_statements. Thus this commit adds tests
for this to pg_stat_statements.

Reported-by: Karina Litskevich <litskevichkarina@gmail.com>
Author: Karina Litskevich <litskevichkarina@gmail.com>
Author: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/CACiT8ibxXA6+0amGikbeFhm8B84XdQVo6D0Qfd1pQ1s8zpsnxQ@mail.gmail.com
Backpatch: 16-, where fcdda1e4b5 was merged
2023-09-13 19:14:09 -07:00
Michael Paquier a0d87bcd9b Improve error message on snapshot import in snapmgr.c
When a snapshot file fails to be read in ImportSnapshot(), it would
issue an ERROR as "invalid snapshot identifier" when opening a stream
for it in read-only mode.  This error message is reworded to be the same
as all the other messages used in this case on failure, which is useful
when debugging this area.

Thinko introduced by bb446b689b where snapshot imports have been
added.  A backpatch down to 11 is done as this can improve any work
related to snapshot imports in older branches.

Author: Bharath Rupireddy
Reviewed-by: Daniel Gustafsson
Discussion: https://postgr.es/m/CALj2ACWmr=3KdxDkm8h7Zn1XxBoF6hdzq8WQyMn2y1OL5RYFrg@mail.gmail.com
Backpatch-through: 11
2023-09-14 10:30:08 +09:00
Michael Paquier b8f44a4779 Refactor error messages for unsupported providers in pg_locale.c
These code paths should not be reached normally, but if they are an
error with "(null)" as information for the collation provider would show
up if no locale is set, while we can assume that we are referring to
libc.

This refactors the code so as the provider is always reported even if no
locale is set.  The name of the function where the error happens is
added, while on it, as it can be helpful for debugging.

Issue introduced by d87d548cd0, so backpatch down to 16.

Author: Michael Paquier, Ranier Vilela
Reviewed-by: Jeff Davis, Kyotaro Horiguchi
Discussion: https://postgr.es/m/7073610042fcf97e1bea2ce08b7e0214b5e11094.camel@j-davis.com
Backpatch-through: 16
2023-09-14 08:35:02 +09:00
David Rowley ee3a551e96 Fix incorrect logic in plan dependency recording
Both 50e17ad28 and 29f45e299 mistakenly tried to record a plan dependency
on a function but mistakenly inverted the OidIsValid test.  This meant
that we'd record a dependency only when the function's Oid was
InvalidOid.  Clearly this was meant to *not* record the dependency in
that case.

50e17ad28 made this mistake first, then in v15 29f45e299 copied the same
mistake.

Reported-by: Tom Lane
Backpatch-through: 14, where 50e17ad28 first made this mistake
Discussion: https://postgr.es/m/2277537.1694301772@sss.pgh.pa.us
2023-09-14 11:27:29 +12:00
Amit Kapila f062cddafe Fix the ALTER SUBSCRIPTION to reflect the change in run_as_owner option.
Reported-by: Jeff Davis
Author: Hou Zhijie
Reviewed-by: Amit Kapila
Backpatch-through: 16
Discussion: http://postgr.es/m/17b62714fd115bd1899afd922954540a5c6a0467.camel@j-davis.com
2023-09-13 09:34:30 +05:30
Thomas Munro 3acd0599bd Fix exception safety bug in typcache.c.
If an out-of-memory error was thrown at an unfortunate time,
ensure_record_cache_typmod_slot_exists() could leak memory and leave
behind a global state that produced an infinite loop on the next call.

Fix by merging RecordCacheArray and RecordIdentifierArray into a single
array.  With only one allocation or re-allocation, there is no
intermediate state.

Back-patch to all supported releases.

Reported-by: "James Pang (chaolpan)" <chaolpan@cisco.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/PH0PR11MB519113E738814BDDA702EDADD6EFA%40PH0PR11MB5191.namprd11.prod.outlook.com
2023-09-13 14:58:22 +12:00
Michael Paquier e434e21e11 Remove redundant assignments in copyfrom.c
The tuple descriptor and the number of attributes are assigned twice to
the same values in BeginCopyFrom(), for what looks like a small thinko
coming from the refactoring done in c532d15ddd.

Author: Jingtang Zhang
Discussion: https://postgr.es/m/CAPsk3_CrYeXUVHEiaWAYxY9BKiGvGT3AoXo_+Jm0xP_s_VmXCA@mail.gmail.com
2023-09-09 21:12:41 +09:00
Daniel Gustafsson 5a3423ad8e Add JIT deform_counter
generation_counter includes time spent on both JIT:ing expressions
and tuple deforming which are configured independently via options
jit_expressions and jit_tuple_deforming.  As they are  combined in
the same counter it's not apparent what fraction of time the tuple
deforming takes.

This adds deform_counter dedicated to tuple deforming, which allows
seeing more directly the influence jit_tuple_deforming is having on
the query. The counter is exposed in EXPLAIN and pg_stat_statements
bumpin pg_stat_statements to 1.11.

Author: Dmitry Dolgov <9erthalion6@gmail.com>
Reviewed-by: Pavel Stehule <pavel.stehule@gmail.com>
Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Discussion: https://postgr.es/m/20220612091253.eegstkufdsu4kfls@erthalion.local
2023-09-08 15:05:12 +02:00
Thomas Munro 04a09ee944 Teach WaitEventSetWait() to report multiple events on Windows.
The WAIT_USE_WIN32 implementation of WaitEventSetWait() previously
reported at most one event per call, because that's what the underlying
WaitForMultipleObjects() call does.

We can make the behavior match the three Unix implementations by looping
until our output buffer is full, or there are no more events available
now.  This makes no difference to most callers including the regular
FEBE socket code, since they ask for at most one event anyway.  A
difference in socket accept priority might be perceived by end users
after commit 7389aad6 started using WaitEventSet in the postmaster.
With this commit, the accept order now matches Unix systems, servicing
listening sockets in round-robin order.

We decided it wasn't really a bug or worth back-patching, but it seems
good to align the behavior across platforms.

Reviewed-by: Andres Freund <andres@anarazel.de> (earlier version)
Tested-by: "Wei Wang (Fujitsu)" <wangw.fnst@fujitsu.com>
Discussion: https://postgr.es/m/CA%2BhUKG%2BA2dk29hr5zRP3HVJQ-_PncNJM6HVQ7aaYLXLRBZU-xw%40mail.gmail.com
2023-09-08 18:49:08 +12:00
Thomas Munro 9f0602539d Remove some more "snapshot too old" vestiges.
Commit f691f5b8 removed the logic, but left behind some now-useless
Snapshot arguments to various AM-internal functions, and missed a couple
of comments.

Reported-by: Peter Geoghegan <pg@bowt.ie>
Discussion: https://postgr.es/m/CAH2-Wznj9qSNXZ1P1uWTUD_FeaTezbUazb416EPwi4Qr_jR_6A%40mail.gmail.com
2023-09-08 17:12:12 +12:00
Michael Paquier e722846daf Improve BackendXidGetPid() to only access allProcs on matching XID
Compilers are able to optimize that, but it makes the code slightly more
readable this way.

Author: Zhao Junwang
Reviewed-by: Ashutosh Bapat
Discussion: https://postgr.es/m/CAEG8a3+i9gtqF65B+g_puVaCQuf0rZC-EMqMyEjGFJYOqUUWfA@mail.gmail.com
2023-09-08 10:00:29 +09:00
Robert Haas 9caf042088 Reorder tests in get_cheapest_path_for_pathkeys().
Checking parallel safety should be even cheaper than cost comparison, so
do that first.

Also make some minor, related comment improvements.

Richard Guo, reviewed by Aleksander Alekseev, Andy Fan, and me.

Discussion: http://postgr.es/m/CAMbWs4-KE2wf4QPj_Sr5mX4QFtBNNKGmxK=+e=KZEGUjdG33=g@mail.gmail.com
2023-09-07 13:51:35 -04:00
Alvaro Herrera ac22a9545c
Move privilege check to the right place
Now that ATExecDropConstraint doesn't recurse anymore, so it's wrong to
test privileges "during recursion" there.  Move the check to
dropconstraint_internal, which is the place where recursion occurs.

In passing, remove now-useless 'recursing' argument to
ATExecDropConstraint.

Discussion: https://postgr.es/m/202309051744.y4mndw5gwzhh@alvherre.pgsql
2023-09-07 12:15:18 +02:00
Alvaro Herrera 3af7217942
Update information_schema definition for not-null constraints
Now that we have catalogued not-null constraints, our information_schema
definition can be updated to grab those rather than fabricate synthetic
definitions.

Note that we still don't have catalog rows for not-null constraints on
domains, but we've never had not-null constraints listed in
information_schema, so that's a problem to be solved separately.

Co-authored-by: Peter Eisentraut <peter.eisentraut@enterprisedb.com>
Co-authored-by: Álvaro Herrera <alvherre@alvh.no-ip.org>
Discussion: https://postgr.es/m/81b461c4-edab-5d8c-2f88-203108425340@enterprisedb.com
Discussion: https://postgr.es/m/202309041710.psytrxlsiqex@alvherre.pgsql
2023-09-07 11:33:01 +02:00
Thomas Munro 0da096d78e Fix recovery conflict SIGUSR1 handling.
We shouldn't be doing non-trivial work in signal handlers in general,
and in this case the handler could reach unsafe code and corrupt state.
It also clobbered its own "reason" code.

Move all recovery conflict decision logic into the next
CHECK_FOR_INTERRUPTS(), and have the signal handler just set flags and
the latch, following the standard pattern.  Since there are several
different "reasons", use a separate flag for each.

With this refactoring, the recovery conflict system no longer
piggy-backs on top of the regular query cancelation mechanism, but
instead raises an error directly if it decides that is necessary.  It
still needs to respect QueryCancelHoldoffCount, because otherwise the
FEBE protocol might get out of sync (see commit 2b3a8b20c2).

This fixes one class of intermittent failure in the new
031_recovery_conflict.pl test added by commit 9f8a050f, though the buggy
coding is much older.  Failures outside contrived testing seem to be
very rare (or perhaps incorrectly attributed) in the field, based on
lack of reports.

No back-patch for now due to complexity and release schedule.  We have
the option to back-patch into 16 later, as 16 has prerequisite commit
bea3d7e.

Reviewed-by: Andres Freund <andres@anarazel.de> (earlier version)
Reviewed-by: Michael Paquier <michael@paquier.xyz> (earlier version)
Reviewed-by: Robert Haas <robertmhaas@gmail.com> (earlier version)
Tested-by: Christoph Berg <myon@debian.org>
Discussion: https://postgr.es/m/CA%2BhUKGK3PGKwcKqzoosamn36YW-fsuTdOPPF1i_rtEO%3DnEYKSg%40mail.gmail.com
Discussion: https://postgr.es/m/CALj2ACVr8au2J_9D88UfRCi0JdWhyQDDxAcSVav0B0irx9nXEg%40mail.gmail.com
2023-09-07 12:39:24 +12:00
Nathan Bossart 3ed1956719 Make enum for sync methods available to frontend code.
This commit renames RecoveryInitSyncMethod to DataDirSyncMethod and
moves it to common/file_utils.h.  This is preparatory work for a
follow-up commit that will allow specifying the synchronization
method in frontend utilities such as pg_upgrade and pg_basebackup.

Reviewed-by: Michael Paquier
Discussion: https://postgr.es/m/ZN2ZB4afQ2JbR9TA%40paquier.xyz
2023-09-06 16:26:39 -07:00
Michael Paquier 59cbf60c0f Remove column for wait event names in wait_event_names.txt
This file is now made of two columns, removing the column listing the
user-visible strings used in the system views and the documentation:
- Enum definitions for each class without the prefix "WAIT_EVENT_", so
as this information can be grepped in the code and wait_event_names.txt
at the same time.
- Description in the documentation.

The wait event names are now generated from the enum objects in
CamelCase, with the underscores removed.  The data generated for wait
events is consistent with what was produced by 414f6c0fb7.

This has the advantage to remove WAIT_EVENT_DOCONLY, which was a
placeholder for the wait event types Lock and LWLock as these two only
require the generation of the documentation.

Reviewed-by: Bertrand Drouvot
Discussion: https://postgr.es/m/ZOxVHQwEC/9X/p/z@paquier.xyz
2023-09-06 10:27:02 +09:00
Michael Paquier 414f6c0fb7 Use more consistent names for wait event objects and types
The event names use the same case-insensitive characters, hence applying
lower() or upper() to the monitoring queries allows the detection of the
same events as before this change.  It is possible to cross-check the
data with the system view pg_wait_events, for instance, with a query
like that showing no differences:
SELECT lower(type), lower(name), description
  FROM pg_wait_events ORDER BY 1, 2;

This will help in the introduction of more simplifications in the format
of wait_event_names.  Some of the enum values in the code had to be
renamed a bit to follow the same convention naming across the board.

Reviewed-by: Bertrand Drouvot
Discussion: https://postgr.es/m/ZOxVHQwEC/9X/p/z@paquier.xyz
2023-09-06 10:04:43 +09:00
Nathan Bossart f39b265808 Move PG_TEMP_FILE* macros to file_utils.h.
Presently, frontend code that needs to use these macros must either
include storage/fd.h, which declares several frontend-unsafe
functions, or duplicate the macros.  This commit moves these macros
to common/file_utils.h, which is safe for both frontend and backend
code.  Consequently, we can also remove the duplicated macros in
pg_checksums and stop including storage/fd.h in pg_rewind.

Reviewed-by: Michael Paquier
Discussion: https://postgr.es/m/ZOP5qoUualu5xl2Z%40paquier.xyz
2023-09-05 17:02:06 -07:00
Nathan Bossart 119c23eb98 Replace known_assigned_xids_lck with memory barriers.
This lock was introduced before memory barrier support was added,
and it is only used to guarantee proper memory ordering when
KnownAssignedXidsAdd() appends to the array without a lock.  Now
that such memory barrier support exists, we can remove the lock and
use barriers instead.

Suggested-by: Tom Lane
Author: Michail Nikolaev
Reviewed-by: Robert Haas
Discussion: https://postgr.es/m/CANtu0oh0si%3DjG5z_fLeFtmYcETssQ08kLEa8b6TQqDm_cinroA%40mail.gmail.com
2023-09-05 13:59:06 -07:00
Thomas Munro f691f5b80a Remove the "snapshot too old" feature.
Remove the old_snapshot_threshold setting and mechanism for producing
the error "snapshot too old", originally added by commit 848ef42b.
Unfortunately it had a number of known problems in terms of correctness
and performance, mostly reported by Andres in the course of his work on
snapshot scalability.  We agreed to remove it, after a long period
without an active plan to fix it.

This is certainly a desirable feature, and someone might propose a new
or improved implementation in the future.

Reported-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/CACG%3DezYV%2BEvO135fLRdVn-ZusfVsTY6cH1OZqWtezuEYH6ciQA%40mail.gmail.com
Discussion: https://postgr.es/m/20200401064008.qob7bfnnbu4w5cw4%40alap3.anarazel.de
Discussion: https://postgr.es/m/CA%2BTgmoY%3Daqf0zjTD%2B3dUWYkgMiNDegDLFjo%2B6ze%3DWtpik%2B3XqA%40mail.gmail.com
2023-09-05 19:53:43 +12:00
Michael Paquier aa0d350456 Improve description of keys in tsvector
If all the bits of a key in a tsvector are true (marked with ALLISTRUE),
gtsvectorout() would show the following description:
"0 true bits, 0 false bits"

This is confusing, as all the bits are true, but this would be
equivalent to the information if siglen is 0.

This commit improves the output so as "all true bits" show instead in
this case.  Alexander has proposed a regression test for pageinspect,
not included here as it is rather expensive compared to its coverage
value.

Author: Alexander Lakhin
Discussion: https://postgr.es/m/17950-6c80a8d2b94ec695@postgresql.org
2023-09-05 13:57:44 +09:00
Michael Paquier ae10dbb0c5 Fix out-of-bound read in gtsvector_picksplit()
This could lead to an imprecise choice when splitting an index page of a
GiST index on a tsvector, deciding which entries should remain on the
old page and which entries should move to a new page.

This is wrong since tsearch2 has been moved into core with commit
140d4ebcb4, so backpatch all the way down.  This error has been
spotted by valgrind.

Author: Alexander Lakhin
Discussion: https://postgr.es/m/17950-6c80a8d2b94ec695@postgresql.org
Backpatch-through: 11
2023-09-04 14:55:37 +09:00
Amit Kapila e70ed4b1b8 Fix typo in decode.c.
Author: Hou Zhijie
Discussion: http://postgr.es/m/OS0PR01MB57162DFFFCFCDA2E4B95899394E4A@OS0PR01MB5716.jpnprd01.prod.outlook.com
2023-09-04 09:06:15 +05:30
Michael Paquier 2b8e5273e9 Fix handling of shared statistics with dropped databases
Dropping a database while a connection is attempted on it was able to
lead to the presence of valid database entries in shared statistics.
The issue is that MyDatabaseId was getting set too early than it should,
as, if the connection attempted on the dropped database fails when
renamed or dropped, the shutdown callback of the shared statistics would
finish by re-inserting a correct entry related to the database already
dropped.

As analyzed by the bug reporters, this issue could lead to phantom
entries in the database list maintained by the autovacuum launcher
(in rebuild_database_list()) if the database dropped was part of the
database list when it was still valid.  After the database was dropped,
it would remain the highest on the list of databases to considered by
the autovacuum worker as things to process.  This would prevent
autovacuum jobs to happen on all the other databases still present.

The commit fixes this issue by delaying setting MyDatabaseId until the
database existence has been re-checked with the second scan on
pg_database after getting a shared lock on it, and by switching
pgstat_update_dbstats() so as nothing happens if MyDatabaseId is not
valid.

Issue introduced by 5891c7a8ed, so backpatch down to 15.

Reported-by: Will Mortensen, Jacob Speidel
Analyzed-by: Will Mortensen, Jacob Speidel
Author: Andres Freund
Discussion: https://postgr.es/m/17973-bca1f7d5c14f601e@postgresql.org
Backpatch-through: 15
2023-09-04 08:04:22 +09:00
Alvaro Herrera d0ec2ddbe0
Fix not-null constraint test
When a partitioned table has a primary key, trying to find the
corresponding not-null constraint for that column would come up empty,
causing code that's trying to check said not-null constraint to crash.
Fix by only running the check when the not-null constraint exists.

Reported-by: Alexander Lakhin <exclusion@gmail.com>
Discussion: https://postgr.es/m/d57b4a69-7394-3146-5976-9a1ef27e7972@gmail.com
2023-09-01 19:49:20 +02:00
Alvaro Herrera e09d763e25
ATPrepAddPrimaryKey: ignore non-PK constraints
Because of lack of test coverage, this function added by b0e96f3119
wasn't ignoring constraint types other than primary keys, which it
should have.  Add some lines to a test for it.

Reported-by: Richard Guo <guofenglinux@gmail.com>
Discussion: https://postgr.es/m/CAMbWs48bc-k_-1fh0dZpAhp_LiR5MfEX9haystmoBboR_4czCQ@mail.gmail.com
2023-09-01 14:21:27 +02:00
Heikki Linnakangas e8d74ad625 Report syncscan position at end of scan.
The comment in heapgettup_advance_block() says that it reports the
scan position before checking for end of scan, but that didn't match
the code. The code was refactored in commit 7ae0ab0ad9, which
inadvertently changed the order of the check and reporting. Change it
back.

This caused a few regression test failures with a small shared_buffers
setting like 10 MB. The 'portals' and 'cluster' tests perform seqscans
that are large enough that sync seqscans kick in. When the sync scan
position is not updated at end of scan, the next seq scan doesn't
start at the beginning of the table, and the test queries are
sensitive to that.

Reviewed-by: Melanie Plageman, David Rowley
Discussion: https://www.postgresql.org/message-id/6f991389-ae22-d844-a9d8-9aceb7c01a9a@iki.fi
Backpatch-through: 16
2023-08-31 13:02:15 +03:00
Peter Eisentraut d7ceb41b9b Correct ObjectProperty entry for transforms
There was some confusion in the ObjectProperty entry for transforms.
Some fields had values that were apparently meant for a different
field.  Also, some fields were not assigned, which is okay for most
fields, but not for all.  In particular, for .oid_catcache_id,
.name_catcache_id, and .objtype, zero is a valid value, so we need to
use -1 if not applicable.  It has apparently been like that from the
very beginning (commit cac7658205).  The faulty values were not
actually reachable, so it's not a big problem in practice, but we
should make it correct.

Discussion: https://www.postgresql.org/message-id/flat/75ae5875-3abc-dafc-8aec-73247ed41cde@eisentraut.org
2023-08-31 11:11:59 +02:00
Peter Eisentraut f94dec76cc genbki.pl: Factor out boilerplate generation
Discussion: https://www.postgresql.org/message-id/flat/75ae5875-3abc-dafc-8aec-73247ed41cde@eisentraut.org
2023-08-31 10:21:34 +02:00
Peter Eisentraut 226d0a6b98 Restructure DECLARE_INDEX arguments
Separate the table name from the index declaration.  We need that
anyway later for the ALTER TABLE / USING INDEX commands, so we might
as well structure the declarations like that to begin with.

Discussion: https://www.postgresql.org/message-id/flat/75ae5875-3abc-dafc-8aec-73247ed41cde@eisentraut.org
2023-08-31 08:14:57 +02:00
Michael Paquier b5934bfd60 Fix some shadow variables in src/backend/replication/
The code is able to compile already without warnings under
-Wshadow=compatible-local, which is itself already enabled in the tree,
and the ones fixed here showed up with the more restrictive -Wshadow.

There are more of these that we may want to look at, and the ones fixed
here made the code confusing.

Author: Peter Smith
Discussion: https://postgr.es/m/CAHut+PuR0y4ofNOxi691VTVWmBfScHV9AaBMGSpeh8+DKp81Nw@mail.gmail.com
2023-08-31 08:07:48 +09:00
Nathan Bossart d0fe3046ee Use actual backend IDs in pg_stat_get_backend_subxact().
Unlike the other pg_stat_get_backend* functions,
pg_stat_get_backend_subxact() looks up the backend entry by using
its integer argument as a 1-based index in an internal array.  The
other functions look for the entry with the matching session
backend ID.  These numbers often match, but that isn't reliably
true.

This commit resolves this discrepancy by introducing
pgstat_get_local_beentry_by_backend_id() and using it in
pg_stat_get_backend_subxact().  We cannot use
pgstat_get_beentry_by_backend_id() because it returns a
PgBackendStatus, which lacks the locally computed additions
available in LocalPgBackendStatus that are required by
pg_stat_get_backend_subxact().

Author: Ian Barwick
Reviewed-by: Sami Imseih, Michael Paquier, Robert Haas
Discussion: https://postgr.es/m/CAB8KJ%3Dj-ACb3H4L9a_b3ZG3iCYDW5aEu3WsPAzkm2S7JzS1Few%40mail.gmail.com
Backpatch-through: 16
2023-08-30 14:47:01 -07:00
Nathan Bossart 3d51cb5197 Rename some support functions for pgstat* views.
Presently, pgstat_fetch_stat_beentry() accepts a session's backend
ID as its argument, and pgstat_fetch_stat_local_beentry() accepts a
1-based index in an internal array as its argument.  The former is
typically used wherever a user must provide a backend ID, and the
latter is usually used internally when looping over all entries in
the array.  This difference was first introduced by d7e39d72ca.
Before that commit, both functions accepted a 1-based index to the
internal array.

This commit renames these two functions to make it clear whether
they use the backend ID or the 1-based index to look up the entry.
This is preparatory work for a follow-up change that will introduce
a function for looking up a LocalPgBackendStatus using a backend
ID.

Reviewed-by: Ian Barwick, Sami Imseih, Michael Paquier, Robert Haas
Discussion: https://postgr.es/m/CAB8KJ%3Dj-ACb3H4L9a_b3ZG3iCYDW5aEu3WsPAzkm2S7JzS1Few%40mail.gmail.com
Backpatch-through: 16
2023-08-30 14:46:52 -07:00
Peter Eisentraut fe39f43352 Fix possible compiler warning
related to 1fa9241bdd
2023-08-30 16:21:21 +02:00
Nathan Bossart 6475e663b3 Fix misuse of PqMsg_Close.
EndCommand() and EndReplicationCommand() should use
PqMsg_CommandComplete instead.  Oversight in commit f4b54e1ed9.

Reported-by: Pavel Stehule, Tatsuo Ishii
Author: Pavel Stehule
Reviewed-by: Aleksander Alekseev, Michael Paquier
Discussion: https://postgr.es/m/CAFj8pRAMDCJXjnwiCkCB1yO1f7NPggFY8PwwAJDnugu-Z2G-Cg%40mail.gmail.com
2023-08-29 18:32:38 -07:00
Michael Paquier 52c6c0f196 Avoid possible overflow with ltsGetFreeBlock() in logtape.c
nFreeBlocks, defined as a long, stores the number of free blocks in a
logical tape.  ltsGetFreeBlock() has been using an int to store the
value of nFreeBlocks, which could lead to overflows on platforms where
long and int are not the same size (in short everything except Windows
where long is 4 bytes).

The problematic intermediate variable is switched to be a long instead
of an int.

Issue introduced by c02fdc9223, so backpatch down to 13.

Author: Ranier vilela
Reviewed-by: Peter Geoghegan, David Rowley
Discussion: https://postgr.es/m/CAEudQApLDWCBR_xmwNjGBrDo+f+S4E87x3s7-+hoaKqYdtC4JQ@mail.gmail.com
Backpatch-through: 13
2023-08-30 08:03:42 +09:00
Alvaro Herrera 9b581c5341
Disallow changing NO INHERIT status of a not-null constraint
It makes no sense to add a NO INHERIT not-null constraint to a child
table that already has one in that column inherited from its parent.
Disallow that, and add tests for the relevant cases.

Per complaint from Kyotaro Horiguchi.  I also used part of his proposed
patch.

Co-authored-by: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Co-authored-by: Álvaro Herrera <alvherre@alvh.no-ip.org>
Discussion: https://postgr.es/m/20230828.161658.1184657435220765047.horikyota.ntt@gmail.com
2023-08-29 19:19:24 +02:00
Peter Eisentraut 63956bed7b Rename logical_replication_mode to debug_logical_replication_streaming
The logical_replication_mode GUC is intended for testing and debugging
purposes, but its current name may be misleading and encourage users to make
unnecessary changes.

To avoid confusion, renaming the GUC to a less misleading name
debug_logical_replication_streaming that casual users are less likely to mistakenly
assume needs to be modified in a regular logical replication setup.

Author: Hou Zhijie <houzj.fnst@cn.fujitsu.com>
Reviewed-by: Peter Smith <smithpb2250@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/d672d774-c44b-6fec-f993-793e744f169a%40eisentraut.org
2023-08-29 15:19:56 +02:00
Peter Eisentraut 6844d3275a Remove useless if condition
We can call GetAttributeCompression() with a NULL argument.  It
handles that internally already.  This change makes all the callers of
GetAttributeCompression() uniform.

Reviewed-by: Alvaro Herrera <alvherre@alvh.no-ip.org>
Discussion: https://www.postgresql.org/message-id/flat/52a125e4-ff9a-95f5-9f61-b87cf447e4da@eisentraut.org
2023-08-29 08:58:56 +02:00
Peter Eisentraut 689c66a84b Remove useless if condition
This is useless because these fields are not set anywhere before, so
we can assign them unconditionally.  This also makes this more
consistent with ATExecAddColumn().

Reviewed-by: Alvaro Herrera <alvherre@alvh.no-ip.org>
Discussion: https://www.postgresql.org/message-id/flat/52a125e4-ff9a-95f5-9f61-b87cf447e4da@eisentraut.org
2023-08-29 08:52:22 +02:00
Peter Eisentraut 1fa9241bdd Make more use of makeColumnDef()
Since we already have it, we might as well make full use of it,
instead of assembling ColumnDef by hand in several places.

Reviewed-by: Alvaro Herrera <alvherre@alvh.no-ip.org>
Discussion: https://www.postgresql.org/message-id/flat/52a125e4-ff9a-95f5-9f61-b87cf447e4da@eisentraut.org
2023-08-29 08:45:05 +02:00
Peter Eisentraut 2b088c8e4a Add some const decorations
Discussion: https://www.postgresql.org/message-id/flat/52a125e4-ff9a-95f5-9f61-b87cf447e4da@eisentraut.org
2023-08-29 08:30:45 +02:00
Heikki Linnakangas 5fec3c870e Initialize ListenSocket array earlier.
After commit b0bea38705, syslogger prints 63 warnings about failing to
close a listen socket at postmaster startup. That's because the
syslogger process forks before the ListenSockets array is initialized,
so ClosePostmasterPorts() calls "close(0)" 64 times. The first call
succeeds, because fd 0 is stdin.

This has been like this since commit 9a86f03b4e in version 13, which
moved the SysLogger_Start() call to before initializing ListenSockets.
We just didn't notice until commit b0bea38705 added the LOG message.

Reported by Michael Paquier and Jeff Janes.

Author: Michael Paquier
Discussion: https://www.postgresql.org/message-id/ZOvvuQe0rdj2slA9%40paquier.xyz
Discussion: https://www.postgresql.org/message-id/ZO0fgDwVw2SUJiZx@paquier.xyz#482670177eb4eaf4c9f03c1eed963e5f
Backpatch-through: 13
2023-08-29 09:09:40 +03:00
Michael Paquier f593c5517d Tweak pg_promote() to report failures on kill() or postmaster failures
Since its introduction in 10074651e3, pg_promote() has been returning
a false status in three cases:
- SIGUSR1 not sent to the postmaster process.
- Postmaster death during standby promotion.
- Standby not promoted within the specified wait time.

An application calling this function will have a hard time understanding
what a false state returned actually means.

Per discussion, this switches the two first states to fail rather than
return a "false" status, making the second case more consistent with the
existing CHECK_FOR_INTERRUPTS in the wait loop.  False is only returned
when the promotion is not completed within the specified time (60s by
default).

Author: Ashutosh Sharma
Reviewed-by: Fujii Masao, Laurenz Albe, Michael Paquier
Discussion: https://postgr.es/m/CAE9k0P=QTrwptL0t4J0fuBRDDjgsT-0PVKd-ikd96i1hyL7Bcg@mail.gmail.com
2023-08-29 08:45:04 +09:00
Peter Eisentraut 36e4419d1f Make error messages about WAL segment size more consistent
Make the primary messages more compact and make the detail messages
uniform.  In initdb.c and pg_resetwal.c, use the newish
option_parse_int() to simplify some of the option parsing.  For the
backend GUC wal_segment_size, add a GUC check hook to do the
verification instead of coding it in bootstrap.c.  This might be
overkill, but that way the check is in the right place and it becomes
more self-documenting.

In passing, make pg_controldata use the logging API for warning
messages.

Reviewed-by: Aleksander Alekseev <aleksander@timescale.com>
Discussion: https://www.postgresql.org/message-id/flat/9939aa8a-d7be-da2c-7715-0a0b5535a1f7@eisentraut.org
2023-08-28 15:17:04 +02:00
Michael Paquier bb9002257b Fix some typos in wait_event_names.txt
Noticed in passing, while hacking on a different patch touching this
area.
2023-08-28 17:09:12 +09:00
Michael Paquier 617f9b7d4b Tighten unit parsing in internal values
Interval values now generate an error when the user has multiple
consecutive units or a unit without a value.  Previously, it was
possible to specify multiple units consecutively which is contrary to
what the documentation allows, so it was possible to finish with
confusing interval values.

This is a follow-up of the work done in 165d581f14.

Author: Joseph Koshakow
Reviewed-by: Jacob Champion, Gurjeet Singh, Reid Thompson
Discussion: https://postgr.es/m/CAAvxfHd-yNO+XYnUxL=GaNZ1n+eE0V-oE0+-cC1jdjdU0KS3iw@mail.gmail.com
2023-08-28 14:27:17 +09:00
Michael Paquier 165d581f14 Tighten handling of "ago" in interval values
This commit Restrict the unit "ago" to only appear at the end of the
interval.  According to the documentation, a direction can only be
defined at the end of an interval, but it was possible to define it in
the middle of the string or define it multiple times.

In spirit, this is similar to the error handling improvements done in
5b3c595355 or bcc704b524.

Author: Joseph Koshakow
Reviewed-by: Jacob Champion, Gurjeet Singh, Reid Thompson
Discussion: https://postgr.es/m/CAAvxfHd-yNO+XYnUxL=GaNZ1n+eE0V-oE0+-cC1jdjdU0KS3iw@mail.gmail.com
2023-08-28 13:49:55 +09:00
Peter Eisentraut 9a0ddc39c6 Format list of catalog files in makefile vertically
This makes it easier to compare the lists visually with the
corresponding meson lists.

In passing, copy over some relevant comments from the makefiles to
meson.build.

Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Alvaro Herrera <alvherre@alvh.no-ip.org>
Discussion: https://www.postgresql.org/message-id/flat/a306be82-ee71-4554-d499-49a45a654396%40eisentraut.org
2023-08-28 06:20:56 +02:00
Michael Paquier d6d1430f40 Remove dead code in DecodeInterval()
This commit removes some dead code related to the unit type RESERV,
whose last use has been removed from the unit lookup table used for
intervals ("deltatktbl" in datetime.c) in 666cbae16d.  Before that,
RESERV was used as an equivalent of "invalid", but that's now
unreachable.

Author: Joseph Koshakow
Reviewed-by: Jacob Champion, Gurjeet Singh, Reid Thompson
Discussion: https://postgr.es/m/CAAvxfHd-yNO+XYnUxL=GaNZ1n+eE0V-oE0+-cC1jdjdU0KS3iw@mail.gmail.com
2023-08-28 12:53:41 +09:00
Michael Paquier bb45156f34 Show names of DEALLOCATE as constants in pg_stat_statements
This commit switches query jumbling so as prepared statement names are
treated as constants in DeallocateStmt.  A boolean field is added to
DeallocateStmt to make a distinction between ALL and named prepared
statements, as "name" was used to make this difference before, NULL
meaning DEALLOCATE ALL.

Prior to this commit, DEALLOCATE was not tracked in pg_stat_statements,
for the reason that it was not possible to treat its name parameter as a
constant.  Now that query jumbling applies to all the utility nodes,
this reason does not apply anymore.

Like 638d42a3c5, this can be a huge advantage for monitoring where
prepared statement names are randomly generated, preventing bloat in
pg_stat_statements.  A couple of tests are added to track the new
behavior.

Author: Dagfinn Ilmari Mannsåker, Michael Paquier
Reviewed-by: Julien Rouhaud
Discussion: https://postgr.es/m/ZMhT9kNtJJsHw6jK@paquier.xyz
2023-08-27 17:27:44 +09:00
Michael Paquier e48b19c5db Generate new LOG for "trust" connections under log_connections
Adding an extra LOG for connections that have not set an authn ID, like
when the "trust" authentication method is used, is useful for audit
purposes.

A couple of TAP tests for SSL and authentication need to be tweaked to
adapt to this new LOG generated, as some scenarios expected no logs but
they now get a hit.

Reported-by: Shaun Thomas
Author: Jacob Champion
Reviewed-by: Robert Haas, Michael Paquier
Discussion: https://postgr.es/m/CAFdbL1N7-GF-ZXKaB3XuGA+CkSmnjFvqb8hgjMnDfd+uhL2u-A@mail.gmail.com
2023-08-26 20:11:19 +09:00
Alvaro Herrera b0e96f3119
Catalog not-null constraints
We now create contype='n' pg_constraint rows for not-null constraints.

We propagate these constraints to other tables during operations such as
adding inheritance relationships, creating and attaching partitions and
creating tables LIKE other tables.  We also spawn not-null constraints
for inheritance child tables when their parents have primary keys.
These related constraints mostly follow the well-known rules of
conislocal and coninhcount that we have for CHECK constraints, with some
adaptations: for example, as opposed to CHECK constraints, we don't
match not-null ones by name when descending a hierarchy to alter it,
instead matching by column name that they apply to.  This means we don't
require the constraint names to be identical across a hierarchy.

For now, we omit them for system catalogs.  Maybe this is worth
reconsidering.  We don't support NOT VALID nor DEFERRABLE clauses
either; these can be added as separate features later (this patch is
already large and complicated enough.)

psql shows these constraints in \d+.

pg_dump requires some ad-hoc hacks, particularly when dumping a primary
key.  We now create one "throwaway" not-null constraint for each column
in the PK together with the CREATE TABLE command, and once the PK is
created, all those throwaway constraints are removed.  This avoids
having to check each tuple for nullness when the dump restores the
primary key creation.

pg_upgrading from an older release requires a somewhat brittle procedure
to create a constraint state that matches what would be created if the
database were being created fresh in Postgres 17.  I have tested all the
scenarios I could think of, and it works correctly as far as I can tell,
but I could have neglected weird cases.

This patch has been very long in the making.  The first patch was
written by Bernd Helmle in 2010 to add a new pg_constraint.contype value
('n'), which I (Álvaro) then hijacked in 2011 and 2012, until that one
was killed by the realization that we ought to use contype='c' instead:
manufactured CHECK constraints.  However, later SQL standard
development, as well as nonobvious emergent properties of that design
(mostly, failure to distinguish them from "normal" CHECK constraints as
well as the performance implication of having to test the CHECK
expression) led us to reconsider this choice, so now the current
implementation uses contype='n' again.  During Postgres 16 this had
already been introduced by commit e056c557ae, but there were some
problems mainly with the pg_upgrade procedure that couldn't be fixed in
reasonable time, so it was reverted.

In 2016 Vitaly Burovoy also worked on this feature[1] but found no
consensus for his proposed approach, which was claimed to be closer to
the letter of the standard, requiring an additional pg_attribute column
to track the OID of the not-null constraint for that column.
[1] https://postgr.es/m/CAKOSWNkN6HSyatuys8xZxzRCR-KL1OkHS5-b9qd9bf1Rad3PLA@mail.gmail.com

Author: Álvaro Herrera <alvherre@alvh.no-ip.org>
Author: Bernd Helmle <mailings@oopsware.de>
Reviewed-by: Justin Pryzby <pryzby@telsasoft.com>
Reviewed-by: Peter Eisentraut <peter.eisentraut@enterprisedb.com>
Reviewed-by: Dean Rasheed <dean.a.rasheed@gmail.com>
2023-08-25 13:31:24 +02:00
Amit Kapila 9c13b6814a Reset the logical worker type while cleaning up other worker info.
Commit 2a8b40e36 introduces the worker type field for logical replication
workers, but forgot to reset the type when the worker exits. This can lead
to recognizing a stopped worker as a valid logical replication worker.

Fix it by resetting the worker type and additionally adding the safeguard
to not use LogicalRepWorker until ->in_use is verified.

Reported-by: Thomas Munro based on cfbot reports.
Author: Hou Zhijie, Alvaro Herrera
Reviewed-by: Amit Kapila
Discussion: http://postgr.es/m/CA+hUKGK2RQh4LifVgBmkHsCYChP-65UwGXOmnCzYVa5aAt4GWg@mail.gmail.com
2023-08-25 08:57:55 +05:30
Tom Lane d8b2fcc9d4 Avoid unnecessary plancache revalidation of utility statements.
Revalidation of a plancache entry (after a cache invalidation event)
requires acquiring a snapshot.  Normally that is harmless, but not
if the cached statement is one that needs to run without acquiring a
snapshot.  We were already aware of that for TransactionStmts,
but for some reason hadn't extrapolated to the other statements that
PlannedStmtRequiresSnapshot() knows mustn't set a snapshot.  This can
lead to unexpected failures of commands such as SET TRANSACTION
ISOLATION LEVEL.  We can fix it in the same way, by excluding those
command types from revalidation.

However, we can do even better than that: there is no need to
revalidate for any statement type for which parse analysis, rewrite,
and plan steps do nothing interesting, which is nearly all utility
commands.  To mechanize this, invent a parser function
stmt_requires_parse_analysis() that tells whether parse analysis does
anything beyond wrapping a CMD_UTILITY Query around the raw parse
tree.  If that's what it does, then rewrite and plan will just
skip the Query, so that it is not possible for the same raw parse
tree to produce a different plan tree after cache invalidation.

stmt_requires_parse_analysis() is basically equivalent to the
existing function analyze_requires_snapshot(), except that for
obscure reasons that function omits ReturnStmt and CallStmt.
It is unclear whether those were oversights or intentional.
I have not been able to demonstrate a bug from not acquiring a
snapshot while analyzing these commands, but at best it seems mighty
fragile.  It seems safer to acquire a snapshot for parse analysis of
these commands too, which allows making stmt_requires_parse_analysis
and analyze_requires_snapshot equivalent.

In passing this fixes a second bug, which is that ResetPlanCache
would exclude ReturnStmts and CallStmts from revalidation.
That's surely *not* safe, since they contain parsable expressions.

Per bug #18059 from Pavel Kulakov.  Back-patch to all supported
branches.

Discussion: https://postgr.es/m/18059-79c692f036b25346@postgresql.org
2023-08-24 12:02:46 -04:00
Heikki Linnakangas b0bea38705 Use FD_CLOEXEC on ListenSockets
It's good hygiene if e.g. an extension launches a subprogram when
being loaded. We went through some effort to close them in the child
process in EXEC_BACKEND mode, but it's better to not hand them down to
the child process in the first place. We still need to close them
after fork when !EXEC_BACKEND, but it's a little simpler.

In the passing, LOG a message if closing the client connection or
listen socket fails. Shouldn't happen, but if it does, would be nice
to know.

Reviewed-by: Tristan Partin, Andres Freund, Thomas Munro
Discussion: https://www.postgresql.org/message-id/7a59b073-5b5b-151e-7ed3-8b01ff7ce9ef@iki.fi
2023-08-24 17:03:05 +03:00
Peter Eisentraut d71e6055e4 Fix lack of message pluralization 2023-08-24 14:22:46 +02:00
Peter Eisentraut 4f3514f201 Rename hook functions for debug_io_direct to match variable name.
Commit 319bae9a renamed the GUC.  Rename the check and assign functions
to match, and alphabetize.

Back-patch to 16.

Author: Peter Eisentraut <peter@eisentraut.org>
Discussion: https://postgr.es/m/2769341e-fa28-c2ee-3e4b-53fdcaaf2271%40eisentraut.org
2023-08-24 22:25:49 +12:00
Amit Kapila 27449ccc4d Fix the error message when failing to restore the snapshot.
The SnapBuildRestoreContents() used a const value in the error message to
indicate the size in bytes it was expecting to read from the serialized
snapshot file. Fix it by reporting the size that was actually passed.

Author: Hou Zhijie
Reviewed-by: Amit Kapila
Backpatch-through: 16
Discussion: http://postgr.es/m/OS0PR01MB5716D408364F7DF32221C08D941FA@OS0PR01MB5716.jpnprd01.prod.outlook.com
2023-08-24 14:37:29 +05:30
Peter Eisentraut 9efcf442b9 Fix translation markers
Conditionals cannot be inside gettext trigger functions, they must be
applied outside.
2023-08-24 10:25:51 +02:00
Heikki Linnakangas bf8bf6d0bd Fix _bt_allequalimage() call within critical section.
_bt_allequalimage() does complicated things, so it's not OK to call it
in a critical section. Per buildfarm failure on 'prion', which uses
-DRELCACHE_FORCE_RELEASE -DCATCACHE_FORCE_RELEASE options.

Discussion: https://www.postgresql.org/message-id/6e5bbc08-cdfc-b2b3-9e23-1a914b9850a9@iki.fi
Backpatch-through: 16, like commit ccadf73163 that introduced this
2023-08-23 18:12:41 +03:00
Nathan Bossart 260a1f18da Add to_bin() and to_oct().
This commit introduces functions for converting numbers to their
equivalent binary and octal representations.  Also, the base
conversion code for these functions and to_hex() has been moved to
a common helper function.

Co-authored-by: Eric Radman
Reviewed-by: Ian Barwick, Dag Lem, Vignesh C, Tom Lane, Peter Eisentraut, Kirk Wolak, Vik Fearing, John Naylor, Dean Rasheed
Discussion: https://postgr.es/m/Y6IyTQQ/TsD5wnsH%40vm3.eradman.com
2023-08-23 07:49:03 -07:00
Heikki Linnakangas ccadf73163 Use the buffer cache when initializing an unlogged index.
Some of the ambuildempty functions used smgrwrite() directly, followed
by smgrimmedsync(). A few small problems with that:

Firstly, one is supposed to use smgrextend() when extending a
relation, not smgrwrite(). It doesn't make much difference in
production builds. smgrextend() updates the relation size cache, so
you miss that, but that's harmless because we never use the cached
relation size of an init fork. But if you compile with
CHECK_WRITE_VS_EXTEND, you get an assertion failure.

Secondly, the smgrwrite() calls were performed before WAL-logging, so
the page image written to disk had 0/0 as the LSN, not the LSN of the
WAL record. That's also harmless in practice, but seems sloppy.

Thirdly, it's better to use the buffer cache, because then you don't
need to smgrimmedsync() the relation to disk, which adds latency.
Bypassing the cache makes sense for bulk operations like index
creation, but not when you're just initializing an empty index.
Creation of unlogged tables is hardly performance bottleneck in any
real world applications, but nevertheless.

Backpatch to v16, but no further. These issues should be harmless in
practice, so better to not rock the boat in older branches.

Reviewed-by: Robert Haas
Discussion: https://www.postgresql.org/message-id/6e5bbc08-cdfc-b2b3-9e23-1a914b9850a9@iki.fi
2023-08-23 17:21:31 +03:00
Daniel Gustafsson 27a36f79b6 Fix wording in comment
The comment for the DSM_OP_CREATE paramater read "the a new handle"
which is confusing. Fix by rewording to indicate what the parameter
means for DSM_OP_CREATE.

Reported-by: Junwang Zhao <zhjwpku@gmail.com>
Discussion: https://postgr.es/m/CAEG8a3J2bc197ym-M_ykOXb9ox2eNn-QNKNeoSAoHYSw2NCOnw@mail.gmail.com
2023-08-23 10:22:55 +02:00
Peter Eisentraut ae556c4416 Some vertical reformatting
Remove some line breaks that have become unnecessary after some
variable renaming.

Discussion: https://www.postgresql.org/message-id/flat/5ed89c69-f4e6-5dab-4003-63bde7460e5e%40eisentraut.org
2023-08-23 06:39:39 +02:00
Peter Eisentraut 23382b0f8b Rename some function arguments for better clarity
Especially make sure that array arguments have plural names.

Discussion: https://www.postgresql.org/message-id/flat/5ed89c69-f4e6-5dab-4003-63bde7460e5e%40eisentraut.org
2023-08-23 06:39:39 +02:00
Peter Eisentraut 11af63fb48 Add const decorations
in index.c and indexcmds.c and some adjacent places.  This especially
makes it easier to understand for some complicated function signatures
which are the input and the output arguments.

Discussion: https://www.postgresql.org/message-id/flat/5ed89c69-f4e6-5dab-4003-63bde7460e5e%40eisentraut.org
2023-08-23 06:39:39 +02:00
Nathan Bossart f4b54e1ed9 Introduce macros for protocol characters.
This commit introduces descriptively-named macros for the
identifiers used in wire protocol messages.  These new macros are
placed in a new header file so that they can be easily used by
third-party code.

Author: Dave Cramer
Reviewed-by: Alvaro Herrera, Tatsuo Ishii, Peter Smith, Robert Haas, Tom Lane, Peter Eisentraut, Michael Paquier
Discussion: https://postgr.es/m/CADK3HHKbBmK-PKf1bPNFoMC%2BoBt%2BpD9PH8h5nvmBQskEHm-Ehw%40mail.gmail.com
2023-08-22 19:16:12 -07:00
Thomas Munro 7114791158 ExtendBufferedWhat -> BufferManagerRelation.
Commit 31966b15 invented a way for functions dealing with relation
extension to accept a Relation in online code and an SMgrRelation in
recovery code.  It seems highly likely that future bufmgr.c interfaces
will face the same problem, and need to do something similar.
Generalize the names so that each interface doesn't have to re-invent
the wheel.

Back-patch to 16.  Since extension AM authors might start using the
constructor macros once 16 ships, we agreed to do the rename in 16
rather than waiting for 17.

Reviewed-by: Peter Geoghegan <pg@bowt.ie>
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/CA%2BhUKG%2B6tLD2BhpRWycEoti6LVLyQq457UL4ticP5xd8LqHySA%40mail.gmail.com
2023-08-23 12:31:23 +12:00
Andrew Dunstan a684581085 Cache by-reference missing values in a long lived context
Attribute missing values might be needed past the lifetime of the tuple
descriptors from which they are extracted. To avoid possibly using
pointers for by-reference values which might thus be left dangling, we
cache a datumCopy'd version of the datum in the TopMemoryContext. Since
we first search for the value this only needs to be done once per
session for any such value.

Original complaint from Tom Lane, idea for mitigation by Andrew Dunstan,
tweaked by Tom Lane.

Backpatch to version 11 where missing values were introduced.

Discussion: https://postgr.es/m/1306569.1687978174@sss.pgh.pa.us
2023-08-22 15:17:05 -04:00
Alvaro Herrera 757fa45c86
Add comment missing in a4a232b1e7
Noticed while studying nearby code
2023-08-22 12:22:21 +02:00
Amit Kapila 1cdc6d86bf Simplify the logical worker type checks by using the switch on worker type.
The current code uses if/else statements at various places to take worker
specific actions. Change those to use the switch on worker type added by
commit 2a8b40e368. This makes code easier to read and understand.

Author: Peter Smith
Reviewed-by: Amit Kapila, Hou Zhijie
Discussion: http://postgr.es/m/CAHut+PttPSuP0yoZ=9zLDXKqTJ=d0bhxwKaEaNcaym1XqcvDEg@mail.gmail.com
2023-08-22 08:50:44 +05:30
Michael Paquier 6fde2d9a00 Fix pg_stat_reset_single_table_counters() for shared relations
This commit fixes the function of $subject for shared relations.  This
feature has been added by e042678.  Unfortunately, this new behavior got
removed by 5891c7a when moving statistics to shared memory.

Reported-by: Mitsuru Hinata
Author: Masahiro Ikeda
Reviewed-by: Kyotaro Horiguchi, Masahiko Sawada
Discussion: https://postgr.es/m/7cc69f863d9b1bc677544e3accd0e4b4@oss.nttdata.com
Backpatch-through: 15
2023-08-21 13:32:14 +09:00
Michael Paquier 1e68e43d3f Add system view pg_wait_events
This new view, wrapped around a SRF, shows some information known about
wait events, as of:
- Name.
- Type (Activity, I/O, Extension, etc.).
- Description.

All the information retrieved comes from wait_event_names.txt, and the
description is the same as the documentation with filters applied to
remove any XML markups.  This view is useful when joined with
pg_stat_activity to get the description of a wait event reported.

Custom wait events for extensions are included in the view.

Original idea by Yves Colin.

Author: Bertrand Drouvot
Reviewed-by: Kyotaro Horiguchi, Masahiro Ikeda, Tom Lane, Michael
Paquier
Discussion: https://postgr.es/m/0e2ae164-dc89-03c3-cf7f-de86378053ac@gmail.com
2023-08-20 15:35:02 +09:00
Peter Eisentraut 881cd9e581 Remove dubious warning message from SQL/JSON functions
There was a warning that FORMAT JSON has no effect on json/jsonb
types, which is true, but it's not clear why we should issue a warning
about it.  The SQL standard does not say anything about this, which
should generally govern the behavior here.  So remove it.

Discussion: https://www.postgresql.org/message-id/flat/dfec2cae-d17e-c508-6d16-c2dba82db486%40eisentraut.org
2023-08-18 07:41:14 +02:00
Michael Paquier 00e49233a9 Fix format if entry in wait_event_names.txt
The entry LockManager had two successive whitespaces between two words.
This is not an actual bug, but let's be clean.  Thinko in fa88928.

Reported-by: Masahiro Ikeda
Author: Bertrand Drouvot
Discussion: https://postgr.es/m/dd836027-2e9e-4df9-9fd9-7527cd1757e1@gmail.com
2023-08-18 08:11:10 +09:00
Thomas Munro 81e36d3e0d Invalidate smgr_targblock in smgrrelease().
In rare circumstances involving relfilenode reuse, it might have been
possible for smgr_targblock to finish up pointing past the end.

Oversight in b74e94dc.  Back-patch to 15.

Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi>
Discussion: https://postgr.es/m/CA%2BhUKGJ8NTvqLHz6dqbQnt2c8XCki4r2QvXjBQcXpVwxTY_pvA%40mail.gmail.com
2023-08-17 15:45:13 +12:00
Michael Paquier 352ea3acf8 Add OAT hook calls for more subcommands of ALTER TABLE
The OAT hooks are added in ALTER TABLE for the following subcommands:
- { ENABLE | DISABLE | [NO] FORCE } ROW LEVEL SECURITY
- { ENABLE | DISABLE } TRIGGER
- { ENABLE | DISABLE } RULE.  Note that there was hook for pg_rewrite,
but not for relation ALTER'ed in pg_class.

Tests are added to test_oat_hook for all the subcommand patterns gaining
hooks here.  Based on an ask from Legs Mansion.

Discussion: https://postgr.es/m/tencent_083B3850655AC6EE04FA0A400766D3FE8309@qq.com
2023-08-17 08:54:17 +09:00
Peter Eisentraut dca20013eb Unify some error messages
We had essentially the same error in several different wordings.
Unify that.
2023-08-16 16:17:00 +02:00
Peter Eisentraut 1e7ca1189c Improved CREATE SUBSCRIPTION message for clarity
Discussion: https://www.postgresql.org/message-id/CAHut+PtfzQ7JRkb0-Y_UejAxaLQ17-bGMvV4MJJHcPoP3ML2bg@mail.gmail.com
2023-08-16 15:27:42 +02:00
Peter Eisentraut 78806a9509 Remove incorrect field from information schema
The source code comment already said that the presence of the field
element_types.domain_default might be a bug in the standard, since it
never made sense there.  Indeed, the field is gone in newer versions
of the standard.  So just remove it.
2023-08-16 13:46:26 +02:00
John Naylor c9bfa40914 Split out tiebreaker comparisons from comparetup_* functions
Previously, if a specialized comparator found equal datum1 keys,
the "comparetup" function would repeat the comparison on the
datum before proceeding with the unabbreviated first key
and/or additional sort keys.

Move comparing additional sort keys into "tiebreak" functions so
that specialized comparators can call these directly if needed,
avoiding duplicate work.

Reviewed by David Rowley

Discussion: https://postgr.es/m/CAFBsxsGaVfUrjTghpf%3DkDBYY%3DjWx1PN-fuusVe7Vw5s0XqGdGw%40mail.gmail.com
2023-08-16 17:15:07 +07:00
Etsuro Fujita 9e9931d2bf Re-allow FDWs and custom scan providers to replace joins with pseudoconstant quals.
This was disabled in commit 6f80a8d9c due to the lack of support for
handling of pseudoconstant quals assigned to replaced joins in
createplan.c.  To re-allow it, this patch adds the support by 1)
modifying the ForeignPath and CustomPath structs so that if they
represent foreign and custom scans replacing a join with a scan, they
store the list of RestrictInfo nodes to apply to the join, as in
JoinPaths, and by 2) modifying create_scan_plan() in createplan.c so
that it uses that list in that case, instead of the baserestrictinfo
list, to get pseudoconstant quals assigned to the join, as mentioned in
the commit message for that commit.

Important item for the release notes: this is non-backwards-compatible
since it modifies the ForeignPath and CustomPath structs, as mentioned
above, and changes the argument lists for FDW helper functions
create_foreignscan_path(), create_foreign_join_path(), and
create_foreign_upper_path().

Richard Guo, with some additional changes by me, reviewed by Nishant
Sharma, Suraj Kharage, and Richard Guo.

Discussion: https://postgr.es/m/CADrsxdbcN1vejBaf8a%2BQhrZY5PXL-04mCd4GDu6qm6FigDZd6Q%40mail.gmail.com
2023-08-15 16:45:00 +09:00
Thomas Munro 5ffb7c7750 De-pessimize ConditionVariableCancelSleep().
Commit b91dd9de was concerned with a theoretical problem with our
non-atomic condition variable operations.  If you stop sleeping, and
then cancel the sleep in a separate step, you might be signaled in
between, and that could be lost.  That doesn't matter for callers of
ConditionVariableBroadcast(), but callers of ConditionVariableSignal()
might be upset if a signal went missing like this.

Commit bc971f4025 interacted badly with that logic, because it doesn't
use ConditionVariableSleep(), which would normally put us back in the
wait list.  ConditionVariableCancelSleep() would be confused and think
we'd received an extra signal, and try to forward it to another backend,
resulting in wakeup storms.

New idea: ConditionVariableCancelSleep() can just return true if we've
been signaled.  Hypothetical users of ConditionVariableSignal() would
then still have a way to deal with rare lost signals if they are
concerned about that problem.

Back-patch to 16, where bc971f4025 arrived.

Reported-by: Tomas Vondra <tomas.vondra@enterprisedb.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/2840876b-4cfe-240f-0a7e-29ffd66711e7%40enterprisedb.com
2023-08-15 10:23:47 +12:00
Andres Freund 82a4edabd2 hio: Take number of prior relation extensions into account
The new relation extension logic, introduced in 00d1e02be2, could lead to
slowdowns in some scenarios. E.g., when loading narrow rows into a table using
COPY, the caller of RelationGetBufferForTuple() will only request a small
number of pages. Without concurrency, we just extended using pwritev() in that
case. However, if there is *some* concurrency, we switched between extending
by a small number of pages and a larger number of pages, depending on the
number of waiters for the relation extension logic.  However, some
filesystems, XFS in particular, do not perform well when switching between
extending files using fallocate() and pwritev().

To avoid that issue, remember the number of prior relation extensions in
BulkInsertState and extend more aggressively if there were prior relation
extensions. That not just avoids the aforementioned slowdown, but also leads
to noticeable performance gains in other situations, primarily due to
extending more aggressively when there is no concurrency. I should have done
it this way from the get go.

Reported-by: Masahiko Sawada <sawada.mshk@gmail.com>
Author: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/CAD21AoDvDmUQeJtZrau1ovnT_smN940=Kp6mszNGK3bq9yRN6g@mail.gmail.com
Backpatch: 16-, where the new relation extension code was added
2023-08-14 11:33:09 -07:00
Michael Paquier af720b4c50 Change custom wait events to use dynamic shared hash tables
Currently, the names of the custom wait event must be registered for
each backend, requiring all these to link to the shared memory area of
an extension, even if these are not loaded with
shared_preload_libraries.

This patch relaxes the constraints related to this infrastructure by
storing the wait events and their names in two dynamic hash tables in
shared memory.  This has the advantage to simplify the registration of
custom wait events to a single routine call that returns an event ID
ready for consumption:
uint32 WaitEventExtensionNew(const char *wait_event_name);

The caller of this routine can then cache locally the ID returned, to be
used for pgstat_report_wait_start(), WaitLatch() or a similar routine.

The implementation uses two hash tables: one with a key based on the
event name to avoid duplicates and a second using the event ID as key
for event lookups, like on pg_stat_activity.  These tables can hold a
minimum of 16 entries, and a maximum of 128 entries, which should be plenty
enough.

The code changes done in worker_spi show how things are simplified (most
of the code removed in this commit comes from there):
- worker_spi_init() is gone.
- No more shared memory hooks required (size requested and
initialization).
- The custom wait event ID is cached in the process that needs to set
it, with one single call to WaitEventExtensionNew() to retrieve it.

Per suggestion from Andres Freund.

Author: Masahiro Ikeda, with a few tweaks from me.
Discussion: https://postgr.es/m/20230801032349.aaiuvhtrcvvcwzcx@awork3.anarazel.de
2023-08-14 14:47:27 +09:00
Amit Kapila 2a8b40e368 Simplify determining logical replication worker types.
We deduce a LogicalRepWorker's type from the values of several different
fields ('relid' and 'leader_pid') whenever logic needs to know it.

In fact, the logical replication worker type is already known at the time
of launching the LogicalRepWorker and it never changes for the lifetime of
that process. Instead of deducing the type, it is simpler to just store it
one time, and access it directly thereafter.

Author: Peter Smith
Reviewed-by: Amit Kapila, Bharath Rupireddy
Discussion: http://postgr.es/m/CAHut+PttPSuP0yoZ=9zLDXKqTJ=d0bhxwKaEaNcaym1XqcvDEg@mail.gmail.com
2023-08-14 08:38:03 +05:30
Noah Misch c36b636096 Fix off-by-one in XLogRecordMaxSize check.
pg_logical_emit_message(false, '_', repeat('x', 1069547465)) failed with
self-contradictory message "WAL record would be 1069547520 bytes (of
maximum 1069547520 bytes)".  There's no particular benefit from allowing
or denying one byte in either direction; XLogRecordMaxSize could rise a
few megabytes without trouble.  Hence, this is just for cleanliness.
Back-patch to v16, where this check first appeared.
2023-08-12 14:37:05 -07:00
Michael Paquier 638d42a3c5 Show GIDs of two-phase commit commands as constants in pg_stat_statements
This relies on the "location" field added to TransactionStmt in 31de7e6,
now applied to the "gid" field used by 2PC commands.  These commands are
now reported like:
COMMIT PREPARED $1
PREPARE TRANSACTION $1
ROLLBACK PREPARED $1

Applying constants for these commands is a huge advantage for workloads
that rely a lot on 2PC commands with different GIDs.  Some tests are
added to track the new behavior.

Reviewed-by: Julien Rouhaud
Discussion: https://postgr.es/m/ZMhT9kNtJJsHw6jK@paquier.xyz
2023-08-12 10:44:15 +09:00
Michael Paquier 5dc456b7dd Fix code indentation violations introduced by recent commit
The two culprit commits are 5765cfe and 5e0c761.

Per buildfarm member koel for the first commit, while I have noticed the
second one in passing.
2023-08-11 20:43:34 +09:00
Jeff Davis 5765cfe18c Transform proconfig for faster execution.
Store function config settings in lists to avoid the need to parse and
allocate for each function execution.

Speedup is modest but significant. Additionally, this change also
seems cleaner and supports some other performance improvements under
discussion.

Discussion: https://postgr.es/m/04c8592dbd694e4114a3ed87139a7a04e4363030.camel@j-davis.com
Reviewed-by: Nathan Bossart
2023-08-10 12:43:53 -07:00
Alvaro Herrera b57cfb439b
Document RelationGetIndexAttrBitmap better
Commit 19d8e2308b changed the list of set-of-columns that can be
returned by RelationGetIndexAttrBitmap, but didn't update its
"documentation".  That was pretty hard to read already, so rewrite to
make it more comprehensible, adding the missing values while at it.

Backpatch to 16, like that commit.

Discussion: https://postgr.es/m/20230809091155.7c7f3gttjk3dj4ze@alvherre.pgsql
Reviewed-by: Tomas Vondra <tomas.vondra@enterprisedb.com>
2023-08-10 12:04:07 +02:00
Jeff Davis fa2e874946 Recalculate search_path after ALTER ROLE.
Renaming a role can affect the meaning of the special string $user, so
must cause search_path to be recalculated.

Discussion: https://postgr.es/m/186761d32c0255debbdf50b6310b581b9c973e6c.camel@j-davis.com
Reviewed-by: Nathan Bossart, Michael Paquier
Backpatch-through: 11
2023-08-09 13:09:25 -07:00
Alvaro Herrera c27f8621ee
struct PQcommMethods: use C99 designated initializers
As in 98afa68d93, 2c86077765, et al.
2023-08-09 11:30:59 +02:00
Michael Paquier f05b1fa1ff doc: Fix incorrect entries generated from wait_event_names.txt
fa88928 has introduced wait_event_names.txt, and some of its entries had
some documentation fields with more information than necessary.

This commit brings back the description of all the wait events to be
consistent with the older stable branches.  Five descriptions were
incorrect.

Author: Bertrand Drouvot
Discussion: https://postgr.es/m/e378989e-1899-643a-dec1-10f691a0a105@gmail.com
2023-08-08 08:17:53 +09:00
Noah Misch cd5f2a3570 Reject substituting extension schemas or owners matching ["$'\].
Substituting such values in extension scripts facilitated SQL injection
when @extowner@, @extschema@, or @extschema:...@ appeared inside a
quoting construct (dollar quoting, '', or "").  No bundled extension was
vulnerable.  Vulnerable uses do appear in a documentation example and in
non-bundled extensions.  Hence, the attack prerequisite was an
administrator having installed files of a vulnerable, trusted,
non-bundled extension.  Subject to that prerequisite, this enabled an
attacker having database-level CREATE privilege to execute arbitrary
code as the bootstrap superuser.  By blocking this attack in the core
server, there's no need to modify individual extensions.  Back-patch to
v11 (all supported versions).

Reported by Micah Gate, Valerie Woolard, Tim Carey-Smith, and Christoph
Berg.

Security: CVE-2023-39417
2023-08-07 06:05:56 -07:00
Peter Eisentraut 2bdd7b262f Translation updates
Source-Git-URL: https://git.postgresql.org/git/pgtranslation/messages.git
Source-Git-Hash: 97398d714ace69f0c919984e160f429b6fd2300e
2023-08-07 12:39:30 +02:00
David Rowley 990c3650c5 Don't Memoize lateral joins with volatile join conditions
The use of Memoize was already disabled in normal joins when the join
conditions had volatile functions per the code in
match_opclause_to_indexcol().  Ordinarily, the parameterization for the
inner side of a nested loop will be an Index Scan or at least eventually
lead to an index scan (perhaps nested several joins deep). However, for
lateral joins, that's not the case and seq scans can be parameterized
too, so we can't rely on match_opclause_to_indexcol().

Here we explicitly check the parameterization for volatile functions and
don't consider the generation of a Memoize path when such functions
are present.

Author: Richard Guo
Discussion: https://postgr.es/m/CAMbWs49nHFnHbpepLsv_yF3qkpCS4BdB-v8HoJVv8_=Oat0u_w@mail.gmail.com
Backpatch-through: 14, where Memoize was introduced
2023-08-07 22:14:21 +12:00
Dean Rasheed c2e08b04c9 Fix RLS policy usage in MERGE.
If MERGE executes an UPDATE action on a table with row-level security,
the code incorrectly applied the WITH CHECK clauses from the target
table's INSERT policies to new rows, instead of the clauses from the
table's UPDATE policies. In addition, it failed to check new rows
against the target table's SELECT policies, if SELECT permissions were
required (likely to always be the case).

In addition, if MERGE executes a DO NOTHING action for matched rows,
the code incorrectly applied the USING clauses from the target table's
DELETE policies to existing target tuples. These policies were applied
as checks that would throw an error, if they did not pass.

Fix this, so that a MERGE UPDATE action applies the same RLS policies
as a plain UPDATE query with a WHERE clause, and a DO NOTHING action
does not apply any RLS checks (other than adding clauses from SELECT
policies to the join).

Back-patch to v15, where MERGE was introduced.

Dean Rasheed, reviewed by Stephen Frost.

Security: CVE-2023-39418
2023-08-07 09:28:47 +01:00
David Rowley fdd79d8992 Fix misleading comment in paraminfo_get_equal_hashops
The comment mistakenly claimed the code was checking PlaceHolderVars for
volatile functions when the code was actually checking lateral vars.

Update the comment to reflect the reality of the code.

Author: Richard Guo
Discussion: https://postgr.es/m/CAMbWs48HZGZOV85g0fx8z1qDx6NNKHexJPT2FCnKnZhxBWkd-A@mail.gmail.com
2023-08-07 18:16:46 +12:00
David Rowley 6c00d2c9d4 Tidy up join_search_one_level code
The code in join_search_one_level was a bit convoluted.  With a casual
glance, you might think that other_rels_list was being set to something
other than joinrels[1] when level == 2, however, joinrels[level - 1] is
joinrels[1] when level == 2, so nothing special needs to happen to set
other_rels_list.  Let's clean that up to avoid confusing anyone.

In passing, we may as well modernize the loop in
make_rels_by_clause_joins() and instead of passing in the ListCell to
start looping from, let's just pass in the index where to start from and
make use of for_each_from().  Ever since 1cff1b95a, Lists are arrays
under the hood. lnext() and list_head() both seem a little too linked-list
like.

Author: Alex Hsieh, David Rowley, Richard Guo
Reviewed-by: Julien Rouhaud
Discussion: https://postgr.es/m/CANWNU8x9P9aCXGF%3DaT-A_8mLTAT0LkcZ_ySYrGbcuHzMQw2-1g%40mail.gmail.com
2023-08-06 21:51:54 +12:00
Amit Kapila 81ccbe520f Simplify some of the logical replication worker-type checks.
Author: Peter Smith
Reviewed-by: Hou Zhijie
Discussion: http://postgr.es/m/CAHut+Pv-xkEpuPzbEJ=ZSi7Hp2RoGJf=VA-uDRxLi1KHSneFjg@mail.gmail.com
2023-08-04 08:15:07 +05:30
David Rowley 3fd19a9b23 Minor adjustments to WindowAgg startup cost code
This is a follow-on of 3900a02c9 containing some changes which I forgot
to commit locally before forming a patch with git format-patch.

Discussion: https://postgr.es/m/CAApHDvrB0S5BMv+0-wTTqWFE-BJ0noWqTnDu9QQfjZ2VSpLv_g@mail.gmail.com
2023-08-04 10:47:54 +12:00
David Rowley 3900a02c97 Account for startup rows when costing WindowAggs
Here we adjust the costs for WindowAggs so that they properly take into
account how much of their subnode they must read before outputting the
first row.  Without this, we always assumed that the startup cost for the
WindowAgg was not much more expensive than the startup cost of its
subnode, however, that's going to be completely wrong in many cases.  The
WindowAgg may have to read *all* of its subnode to output a single row
with certain window bound options.

Here we estimate how many rows we'll need to read from the WindowAgg's
subnode and proportionally add more of the subnode's run costs onto the
WindowAgg's startup costs according to how much of it we expect to have to
read in order to produce the first WindowAgg row.

The reason this is more important than we might have initially thought is
that we may end up making use of a path from the lower planner that works
well as a cheap startup plan when the query has a LIMIT clause, however,
the WindowAgg might mean we need to read far more rows than what the LIMIT
specifies.

No backpatch on this so as not to cause plan changes in released
versions.

Bug: #17862
Reported-by: Tim Palmer
Author: David Rowley
Reviewed-by: Andy Fan
Discussion: https://postgr.es/m/17862-1ab8f74b0f7b0611@postgresql.org
Discussion: https://postgr.es/m/CAApHDvrB0S5BMv+0-wTTqWFE-BJ0noWqTnDu9QQfjZ2VSpLv_g@mail.gmail.com
2023-08-04 09:27:38 +12:00
Amit Kapila 02c1b64fb1 Refactor to split Apply and Tablesync Workers code.
Both apply and tablesync workers were using ApplyWorkerMain() as entry
point. As the name implies, ApplyWorkerMain() should be considered as
the main function for apply workers. Tablesync worker's path was hidden
and does not have enough in common to share the same main function with
apply worker.

Also, most of the code shared by both worker types is already combined
in LogicalRepApplyLoop(). There is no need to combine the rest in
ApplyWorkerMain() anymore.

This patch introduces TablesyncWorkerMain() as a new entry point for
tablesync workers. This aims to increase code readability and would help
with future improvements like the reuse of tablesync workers in the
initial synchronization.

Author: Melih Mutlu based on suggestions by Melanie Plageman
Reviewed-by: Peter Smith, Kuroda Hayato, Amit Kapila
Discussion: http://postgr.es/m/CAGPVpCTq=rUDd4JUdaRc1XUWf4BrH2gdSNf3rtOMUGj9rPpfzQ@mail.gmail.com
2023-08-03 08:59:50 +05:30
Masahiko Sawada 0125c4e21d Fix ReorderBufferCheckMemoryLimit() comment.
Commit 7259736a6 updated the comment but it was not correct since
ReorderBufferLargestStreamableTopTXN() returns only top-level
transactions.

Reviewed-by: Amit Kapila
Discussion: https://postgr.es/m/CAD21AoA9XB7OR86BqvrCe2dMYX%2BZv3-BvVmjF%3DGY2z6jN-kqjg%40mail.gmail.com
Backpatch-through: 14
2023-08-02 15:01:13 +09:00
David Rowley 3845577cb5 Fix performance regression in pg_strtointNN_safe functions
Between 6fcda9aba and 1b6f632a3, the pg_strtoint functions became quite
a bit slower in v16, despite efforts in 6b423ec67 to speed these up.

Since the majority of cases for these functions will only contain
base-10 digits, perhaps prefixed by a '-', it makes sense to have a
special case for this and just fall back on the more complex version
which processes hex, octal, binary and underscores if the fast path
version fails to parse the string.

While we're here, update the header comments for these functions to
mention that hex, octal and binary formats along with underscore
separators are now supported.

Author: Andres Freund, David Rowley
Reported-by: Masahiko Sawada
Reviewed-by: Dean Rasheed, John Naylor
Discussion: https://postgr.es/m/CAD21AoDvDmUQeJtZrau1ovnT_smN940%3DKp6mszNGK3bq9yRN6g%40mail.gmail.com
Backpatch-through: 16, where 6fcda9aba and 1b6f632a3 were added
2023-08-02 12:05:41 +12:00
David Rowley deae1657ee Fix overly strict Assert in jsonpath code
This was failing for queries which try to get the .type() of a
jpiLikeRegex.  For example:

select jsonb_path_query('["string", "string"]',
                        '($[0] like_regex ".{7}").type()');

Reported-by: Alexander Kozhemyakin
Bug: #18035
Discussion: https://postgr.es/m/18035-64af5cdcb5adf2a9@postgresql.org
Backpatch-through: 12, where SQL/JSON path was added.
2023-08-02 01:39:47 +12:00
Noah Misch d3a38318ac Rename OverrideSearchPath to SearchPathMatcher.
The previous commit removed the "override" APIs.  Surviving APIs facilitate
plancache.c to snapshot search_path and test whether the current value equals
a remembered snapshot.

Aleksander Alekseev.  Reported by Alexander Lakhin and Noah Misch.

Discussion: https://postgr.es/m/8ffb4650-52c4-6a81-38fc-8f99be981130@gmail.com
2023-07-31 17:04:47 -07:00
Noah Misch 7c5c4e1c03 Remove PushOverrideSearchPath() and PopOverrideSearchPath().
Since commit 681d9e4621, they have no in-tree
calls.  Any new calls would introduce security vulnerabilities like the one
fixed in that commit.

Alexander Lakhin, reviewed by Aleksander Alekseev.

Discussion: https://postgr.es/m/8ffb4650-52c4-6a81-38fc-8f99be981130@gmail.com
2023-07-31 17:04:47 -07:00
Michael Paquier c9af054653 Support custom wait events for wait event type "Extension"
Two backend routines are added to allow extension to allocate and define
custom wait events, all of these being allocated in the type
"Extension":
* WaitEventExtensionNew(), that allocates a wait event ID computed from
a counter in shared memory.
* WaitEventExtensionRegisterName(), to associate a custom string to the
wait event ID allocated.

Note that this includes an example of how to use this new facility in
worker_spi with tests in TAP for various scenarios, and some
documentation about how to use them.

Any code in the tree that currently uses WAIT_EVENT_EXTENSION could
switch to this new facility to define custom wait events.  This is left
as work for future patches.

Author: Masahiro Ikeda
Reviewed-by: Andres Freund, Michael Paquier, Tristan Partin, Bharath
Rupireddy
Discussion: https://postgr.es/m/b9f5411acda0cf15c8fbb767702ff43e@oss.nttdata.com
2023-07-31 17:09:24 +09:00
Michael Paquier 7395a90db8 Add WAIT_EVENT_{CLASS,ID}_MASK in wait_event.c
These are useful to extract the class ID and the event ID associated to
a single uint32 wait_event_info.  Only two code paths use them now, but
an upcoming patch will extend their use.

Discussion: https://postgr.es/m/ZMcJ7F7nkGkIs8zP@paquier.xyz
2023-07-31 16:14:47 +09:00
Etsuro Fujita 6f80a8d9c1 Disallow replacing joins with scans in problematic cases.
Commit e7cb7ee14, which introduced the infrastructure for FDWs and
custom scan providers to replace joins with scans, failed to add support
handling of pseudoconstant quals assigned to replaced joins in
createplan.c, leading to an incorrect plan without a gating Result node
when postgres_fdw replaced a join with such a qual.

To fix, we could add the support by 1) modifying the ForeignPath and
CustomPath structs to store the list of RestrictInfo nodes to apply to
the join, as in JoinPaths, if they represent foreign and custom scans
replacing a join with a scan, and by 2) modifying create_scan_plan() in
createplan.c to use that list in that case, instead of the
baserestrictinfo list, to get pseudoconstant quals assigned to the join;
but #1 would cause an ABI break.  So fix by modifying the infrastructure
to just disallow replacing joins with such quals.

Back-patch to all supported branches.

Reported by Nishant Sharma.  Patch by me, reviewed by Nishant Sharma and
Richard Guo.

Discussion: https://postgr.es/m/CADrsxdbcN1vejBaf8a%2BQhrZY5PXL-04mCd4GDu6qm6FigDZd6Q%40mail.gmail.com
2023-07-28 15:45:00 +09:00
Tom Lane 38df84c65e Eliminate fixed token-length limit in hba.c.
Historically, hba.c limited tokens in the authentication configuration
files (pg_hba.conf and pg_ident.conf) to less than 256 bytes.  We have
seen a few reports of this limit causing problems; notably, for
moderately-complex LDAP configurations.  Let's get rid of the fixed
limit by using a StringInfo instead of a fixed-size buffer.
This actually takes less code than before, since we can get rid of
a nontrivial error recovery stanza.  It's doubtless a hair slower,
but parsing the content of the HBA files should in no way be
performance-critical.

Although this is a pretty straightforward patch, it doesn't seem
worth the risk to back-patch given the small number of complaints
to date.  In released branches, we'll just raise MAX_TOKEN to
ameliorate the problem.

Discussion: https://postgr.es/m/1588937.1690221208@sss.pgh.pa.us
2023-07-27 11:56:35 -04:00
David Rowley b635ac03e8 Fix performance problem with new COPY DEFAULT code
9f8377f7a added code to allow COPY FROM insert a column's default value
when the input matches the DEFAULT string specified in the COPY command.

Here we fix some inefficient code which needlessly palloc0'd an array to
store if we should use the default value or input value for the given
column.  This array was being palloc0'd and pfree'd once per row.  It's
much more efficient to allocate this once and just reset the values once
per row.

Reported-by: Masahiko Sawada
Author: Masahiko Sawada
Discussion: https://postgr.es/m/CAD21AoDvDmUQeJtZrau1ovnT_smN940%3DKp6mszNGK3bq9yRN6g%40mail.gmail.com
Backpatch-through: 16, where 9f8377f7a was introduced.
2023-07-27 14:47:05 +12:00
Michael Paquier f6a84546b1 Add sanity asserts for index OID and attnums during cache init
There was already a check on the relation OID, but not its index OID and
the attributes that can be used during the syscache lookups.  The two
assertions added by this commit are cheap, and actually useful for
developers to fasten the detection of incorrect data in a new entry
added in the syscache list, as these assertions are triggered during the
initial cache loading (initdb, or just backend startup), not requiring a
syscache that uses the new entry.

While on it, the relation OID check is switched to use OidIsValid().

Author: Aleksander Alekseev
Reviewed-by: Dagfinn Ilmari Mannsåker, Zhang Mingli, Michael Paquier
Discussion: https://postgr.es/m/CAJ7c6TOjUTJ0jxvWY6oJeP2-840OF8ch7qscZQsuVuotXTOS_g@mail.gmail.com
2023-07-27 10:55:16 +09:00
Michael Paquier 31de7e60da Show savepoint names as constants in pg_stat_statements
In pg_stat_statements, savepoint names now show up as constants with a
parameter symbol, using as base query string the one added as a new
entry to the PGSS hash table, leading to:
RELEASE $1
ROLLBACK TO $1
SAVEPOINT $1

Applying constants to these query parts is a huge advantage for
workloads that generate randomly savepoint points, like ORMs (Django is
at the origin of this patch).  The ODBC driver is a second layer that
likes a lot savepoints, though it does not use a random naming pattern.

A "location" field is added to TransactionStmt, now set only for
savepoints.  The savepoint name is ignored by the query jumbling.  The
location can be extended to other query patterns, if required, like 2PC
commands.  Some tests are added to pg_stat_statements for all the query
patterns supported by the parser.

ROLLBACK, ROLLBACK TO SAVEPOINT and ROLLBACK TRANSACTION TO SAVEPOINT
have the same Node representation, so all these are equivalents.  The
same happens for RELEASE and RELEASE SAVEPOINT.

Author: Greg Sabino Mullane
Discussion: https://postgr.es/m/CAKAnmm+2s9PA4OaumwMJReWHk8qvJ_-g1WqxDRDAN1BSUfxyTw@mail.gmail.com
2023-07-27 09:42:33 +09:00
Amit Langote 03734a7fed Add more SQL/JSON constructor functions
This Patch introduces three SQL standard JSON functions:

JSON()
JSON_SCALAR()
JSON_SERIALIZE()

JSON() produces json values from text, bytea, json or jsonb values,
and has facilitites for handling duplicate keys.

JSON_SCALAR() produces a json value from any scalar sql value,
including json and jsonb.

JSON_SERIALIZE() produces text or bytea from input which containis
or represents json or jsonb;

For the most part these functions don't add any significant new
capabilities, but they will be of use to users wanting standard
compliant JSON handling.

Catversion bumped as this changes ruleutils.c.

Author: Nikita Glukhov <n.gluhov@postgrespro.ru>
Author: Teodor Sigaev <teodor@sigaev.ru>
Author: Oleg Bartunov <obartunov@gmail.com>
Author: Alexander Korotkov <aekorotkov@gmail.com>
Author: Andrew Dunstan <andrew@dunslane.net>
Author: Amit Langote <amitlangote09@gmail.com>

Reviewers have included (in no particular order) Andres Freund, Alexander
Korotkov, Pavel Stehule, Andrew Alsup, Erik Rijkers, Zihong Yu,
Himanshu Upadhyaya, Daniel Gustafsson, Justin Pryzby, Álvaro Herrera,
Peter Eisentraut

Discussion: https://postgr.es/m/cd0bb935-0158-78a7-08b5-904886deac4b@postgrespro.ru
Discussion: https://postgr.es/m/20220616233130.rparivafipt6doj3@alap3.anarazel.de
Discussion: https://postgr.es/m/abd9b83b-aa66-f230-3d6d-734817f0995d%40postgresql.org
Discussion: https://postgr.es/m/CA+HiwqE4XTdfb1nW=Ojoy_tQSRhYt-q_kb6i5d4xcKyrLC1Nbg@mail.gmail.com
2023-07-26 17:08:33 +09:00
Amit Langote 254ac5a7c3 Rename a nonterminal used in SQL/JSON grammar
This renames json_output_clause_opt to json_returning_clause_opt,
because the new name makes more sense given that the governing
keyword is RETURNING.

Per suggestion from Álvaro Herrera.

Discussion: https://postgr.es/m/20230707122820.wy5zlmhn4tdzbojl%40alvherre.pgsql
2023-07-26 17:06:09 +09:00
Amit Langote b22391a2ff Some refactoring to export json(b) conversion functions
This is to export datum_to_json(), datum_to_jsonb(), and
jsonb_from_cstring(), though the last one is exported as
jsonb_from_text().

A subsequent commit to add new SQL/JSON constructor functions will
need them for calling from the executor.

Discussion: https://postgr.es/m/20230720160252.ldk7jy6jqclxfxkq%40alvherre.pgsql
2023-07-26 17:06:03 +09:00
Masahiko Sawada bd88404d3c Fix crash with RemoveFromWaitQueue() when detecting a deadlock.
Commit 5764f611e used dclist_delete_from() to remove the proc from the
wait queue. However, since it doesn't clear dist_node's next/prev to
NULL, it could call RemoveFromWaitQueue() twice: when the process
detects a deadlock and then when cleaning up locks on aborting the
transaction. The waiting lock information is cleared in the first
call, so it led to a crash in the second call.

Backpatch to v16, where the change was introduced.

Bug: #18031
Reported-by: Justin Pryzby, Alexander Lakhin
Reviewed-by: Andres Freund
Discussion: https://postgr.es/m/ZKy4AdrLEfbqrxGJ%40telsasoft.com
Discussion: https://postgr.es/m/18031-ebe2d08cb405f6cc@postgresql.org
Backpatch-through: 16
2023-07-26 14:41:26 +09:00
Michael Paquier 66d86d4201 Document more assumptions of LWLock variable changes with WAL inserts
This commit adds a few comments about what LWLockWaitForVar() relies on
when a backend waits for a variable update on its LWLocks for WAL
insertions up to an expected LSN.

First, LWLockWaitForVar() does not include a memory barrier, relying on
a spinlock taken at the beginning of WaitXLogInsertionsToFinish().  This
was hidden behind two layers of routines in lwlock.c.  This assumption
is now documented at the top of LWLockWaitForVar(), and detailed at bit
more within LWLockConflictsWithVar().

Second, document why WaitXLogInsertionsToFinish() does not include
memory barriers, relying on a spinlock at its top, which is, per Andres'
input, fine for two different reasons, both depending on the fact that
the caller of WaitXLogInsertionsToFinish() is waiting for a LSN up to a
certain value.

This area's documentation and assumptions could be improved more in the
future, but at least that's a beginning.

Author: Bharath Rupireddy, Andres Freund
Reviewed-by: Michael Paquier
Discussion: https://postgr.es/m/CALj2ACVF+6jLvqKe6xhDzCCkr=rfd6upaGc3477Pji1Ke9G7Bg@mail.gmail.com
2023-07-26 12:06:04 +09:00
Amit Kapila 62e9af4c63 Fix code indentation vioaltion introduced in commit d38ad8e31d.
Per buildfarm member koel

Discussion: https://postgr.es/m/ZL9bsGhthne6FaVV@paquier.xyz
2023-07-25 12:35:58 +05:30
Masahiko Sawada d0ce9d0bc7 Remove unnecessary checks for indexes for REPLICA IDENTITY FULL tables.
Previously, when selecting an usable index for update/delete for the
REPLICA IDENTITY FULL table, in IsIndexOnlyExpression(), we used to
check if all index fields are not expressions. However, it was not
necessary, because it is enough to check if only the leftmost index
field is not an expression (and references the remote table column)
and this check has already been done by
RemoteRelContainsLeftMostColumnOnIdx().

This commit removes IsIndexOnlyExpression() and
RemoteRelContainsLeftMostColumnOnIdx() and all checks for usable
indexes for REPLICA IDENTITY FULL tables are now performed by
IsIndexUsableForReplicaIdentityFull().

Backpatch this to remain the code consistent.

Reported-by: Peter Smith
Reviewed-by: Amit Kapila, Önder Kalacı
Discussion: https://postgr.es/m/CAHut%2BPsGRE5WSsY0jcLHJEoA17MrbP9yy8FxdjC_ZOAACxbt%2BQ%40mail.gmail.com
Backpatch-through: 16
2023-07-25 15:09:34 +09:00
Michael Paquier 71e4cc6b8e Optimize WAL insertion lock acquisition and release with some atomics
The WAL insertion lock variable insertingAt is currently being read
and written with the help of the LWLock wait list lock to avoid any read
of torn values.  This wait list lock can become a point of contention on
a highly concurrent write workloads.

This commit switches insertingAt to a 64b atomic variable that provides
torn-free reads/writes.  On platforms without 64b atomic support, the
fallback implementation uses spinlocks to provide the same guarantees
for the values read.  LWLockWaitForVar(), through
LWLockConflictsWithVar(), reads the new value to check if it still needs
to wait with a u64 atomic operation.  LWLockUpdateVar() updates the
variable before waking up the waiters with an exchange_u64 (full memory
barrier).  LWLockReleaseClearVar() now uses also an exchange_u64 to
reset the variable.  Before this commit, all these steps relied on
LWLockWaitListLock() and LWLockWaitListUnlock().

This reduces contention on LWLock wait list lock and improves
performance of highly-concurrent write workloads.  Here are some
numbers using pg_logical_emit_message() (HEAD at d6677b93) with various
arbitrary record lengths and clients up to 1k on a rather-large machine
(64 vCPUs, 512GB of RAM, 16 cores per sockets, 2 sockets), in terms of
TPS numbers coming from pgbench:
 message_size_b     |     16 |     64 |    256 |   1024
--------------------+--------+--------+--------+-------
 patch_4_clients    |  83830 |  82929 |  80478 |  73131
 patch_16_clients   | 267655 | 264973 | 250566 | 213985
 patch_64_clients   | 380423 | 378318 | 356907 | 294248
 patch_256_clients  | 360915 | 354436 | 326209 | 263664
 patch_512_clients  | 332654 | 321199 | 287521 | 240128
 patch_1024_clients | 288263 | 276614 | 258220 | 217063
 patch_2048_clients | 252280 | 243558 | 230062 | 192429
 patch_4096_clients | 212566 | 213654 | 205951 | 166955
 head_4_clients     |  83686 |  83766 |  81233 |  73749
 head_16_clients    | 266503 | 265546 | 249261 | 213645
 head_64_clients    | 366122 | 363462 | 341078 | 261707
 head_256_clients   | 132600 | 132573 | 134392 | 165799
 head_512_clients   | 118937 | 114332 | 116860 | 150672
 head_1024_clients  | 133546 | 115256 | 125236 | 151390
 head_2048_clients  | 137877 | 117802 | 120909 | 138165
 head_4096_clients  | 113440 | 115611 | 120635 | 114361

Bharath has been measuring similar improvements, where the limit of the
WAL insertion lock begins to be felt when more than 256 concurrent
clients are involved in this specific workload.

An extra patch has been discussed to introduce a fast-exit path in
LWLockUpdateVar() when there are no waiters, still this does not
influence the write-heavy workload cases discussed as there are always
waiters.  This will be considered separately.

Author: Bharath Rupireddy
Reviewed-by: Nathan Bossart, Andres Freund, Michael Paquier
Discussion: https://postgr.es/m/CALj2ACVF+6jLvqKe6xhDzCCkr=rfd6upaGc3477Pji1Ke9G7Bg@mail.gmail.com
2023-07-25 13:38:58 +09:00
Amit Kapila d38ad8e31d Fix the display of UNKNOWN message type in apply worker.
We include the message type while displaying an error context in the
apply worker. Now, while retrieving the message type string if the
message type is unknown we throw an error that will hide the original
error. So, instead, we need to simply return the string indicating an
unknown message type.

Reported-by: Ashutosh Bapat
Author: Euler Taveira, Amit Kapila
Reviewed-by: Ashutosh Bapat
Backpatch-through: 15
Discussion: https://postgr.es/m/CAExHW5suAEDW-mBZt_qu4RVxWZ1vL54-L+ci2zreYWebpzxYsA@mail.gmail.com
2023-07-25 09:12:29 +05:30
Andres Freund f3bc519288 Fix off-by-one in LimitAdditionalPins()
Due to the bug LimitAdditionalPins() could return 0, violating
LimitAdditionalPins()'s API ("One additional pin is always allowed"). This
could be hit when setting shared_buffers very low and using a fair amount of
concurrency.

This bug was introduced in 31966b151e.

Author: "Anton A. Melnikov" <aamelnikov@inbox.ru>
Reported-by: "Anton A. Melnikov" <aamelnikov@inbox.ru>
Reported-by: Victoria Shepard
Discussion: https://postgr.es/m/ae46f2fb-5586-3de0-b54b-1bb0f6410ebd@inbox.ru
Backpatch: 16-
2023-07-24 19:07:52 -07:00
Tom Lane bda97e47af Avoid compiler warning in non-assert builds.
After 3c90dcd03, try_partitionwise_join's child_joinrelids
variable is read only in an Assert, provoking a compiler
warning in non-assert builds.  Rearrange code to avoid the
warning and eliminate unnecessary work in the non-assert case.

Per CI testing (via Jeff Davis and Bharath Rupireddy)

Discussion: https://postgr.es/m/ef0de9713e605451f1b60b30648c5ee900b2394c.camel@j-davis.com
2023-07-22 10:32:52 -04:00
Tom Lane 3c90dcd039 Fix calculation of relid sets for partitionwise child joins.
Applying add_outer_joins_to_relids() to a child join doesn't actually
work, even if we've built a SpecialJoinInfo specialized to the child,
because that function will also compare the join's relids to elements
of the main join_info_list, which only deal in regular relids not
child relids.  This mistake escaped detection by the existing
partitionwise join tests because they didn't test any cases where
add_outer_joins_to_relids() needs to add additional OJ relids (that
is, any cases where join reordering per identity 3 is possible).

Instead, let's apply adjust_child_relids() to the relids of the parent
join.  This requires minor code reordering to collect the relevant
AppendRelInfo structures first, but that's work we'd do shortly anyway.

Report and fix by Richard Guo; cosmetic changes by me

Discussion: https://postgr.es/m/CAMbWs49NCNbyubZWgci3o=_OTY=snCfAPtMnM-32f3mm-K-Ckw@mail.gmail.com
2023-07-21 12:00:14 -04:00
Amit Langote 7c7412cae3 Code review for commit b6e1157e7d
b6e1157e7d made some changes to enforce that
JsonValueExpr.formatted_expr is always set and is the expression that
gives a JsonValueExpr its runtime value, but that's not really
apparent from the comments about and the code manipulating
formatted_expr.  This commit fixes that.

Per suggestion from Álvaro Herrera.

Discussion: https://postgr.es/m/20230718155313.3wqg6encgt32adqb%40alvherre.pgsql
2023-07-21 19:15:34 +09:00
Tom Lane 9089287aa0 Guard against null plan pointer in CachedPlanIsSimplyValid().
If both the passed-in plan pointer and plansource->gplan are
NULL, CachedPlanIsSimplyValid would think that the plan pointer
is possibly-valid and try to dereference it.  For the one extant
call site in plpgsql, this situation doesn't normally happen
which is why we've not noticed. However, it appears to be possible
if the previous use of the cached plan failed, as per report from
Justin Pryzby.  Add an extra check to prevent crashing.
Back-patch to v13 where this code was added.

Discussion: https://postgr.es/m/ZLlV+STFz1l/WhAQ@telsasoft.com
2023-07-20 14:23:46 -04:00
Daniel Gustafsson 29a0ccbce9 Revert "Add notBefore and notAfter to SSL cert info display"
Due to an oversight in reviewing, this used functionality not
compatible with old versions of OpenSSL.

This reverts commit 75ec5e7bec.
2023-07-20 17:18:12 +02:00
Daniel Gustafsson 75ec5e7bec Add notBefore and notAfter to SSL cert info display
This adds the X509 attributes notBefore and notAfter to sslinfo
as well as pg_stat_ssl to allow verifying and identifying the
validity period of the current client certificate.

Author: Cary Huang <cary.huang@highgo.ca>
Discussion: https://postgr.es/m/182b8565486.10af1a86f158715.2387262617218380588@highgo.ca
2023-07-20 17:07:32 +02:00
Amit Langote 3c152a27b0 Unify JSON categorize type API and export for external use
This essentially removes the JsonbTypeCategory enum and
jsonb_categorize_type() and integrates any jsonb-specific logic that
was in jsonb_categorize_type() into json_categorize_type(), now
moved to jsonfuncs.c.  The remaining JsonTypeCategory enum and
json_categorize_type() cover the needs of the callers in both json.c
and jsonb.c.  json_categorize_type() has grown a new parameter named
is_jsonb for callers to engage the jsonb-specific behavior of
json_categorize_type().

One notable change in the now exported API of json_categorize_type()
is that it now always returns *outfuncoid even though a caller may
have no need currently to see one.

This is in preparation of later commits to implement additional
SQL/JSON functions.

Co-authored-by: Álvaro Herrera <alvherre@alvh.no-ip.org>
Reviewed-by: Álvaro Herrera <alvherre@alvh.no-ip.org>
Discussion: https://postgr.es/m/CA+HiwqE4XTdfb1nW=Ojoy_tQSRhYt-q_kb6i5d4xcKyrLC1Nbg@mail.gmail.com
2023-07-20 16:19:56 +09:00
Michael Paquier 2a990abd79 Add missing ObjectIdGetDatum() in syscache lookup calls for Oids
Based on how postgres.h foes the Oid <-> Datum conversion, there is no
existing bugs but let's be consistent.  17 spots have been noticed as
incorrectly passing down Oids rather than Datums.  Aleksander got one,
Zhang two and I the rest.

Author: Michael Paquier, Aleksander Alekseev, Zhang Mingli
Discussion: https://postgr.es/m/ZLUhqsqQN1MOaxdw@paquier.xyz
2023-07-20 15:18:25 +09:00
Nathan Bossart cdaedfc96d Support parenthesized syntax for CLUSTER without a table name.
b5913f6120 added a parenthesized syntax for CLUSTER, but it
requires specifying a table name.  This is unlike commands such as
VACUUM and ANALYZE, which do not require specifying a table in the
parenthesized syntax.  This change resolves this inconsistency.
This is preparatory work for a follow-up commit that will move the
unparenthesized syntax to the "Compatibility" section of the
CLUSTER documentation.

Reviewed-by: Melanie Plageman, Michael Paquier
Discussion: https://postgr.es/m/CAAKRu_bc5uHieG1976kGqJKxyWtyQt9yvktjsVX%2Bi7NOigDjOA%40mail.gmail.com
2023-07-19 15:26:52 -07:00
Nathan Bossart 018b61f81b Rearrange CLUSTER rules in gram.y.
This change moves the unparenthesized syntax for CLUSTER to the end
of the ClusterStmt rules in preparation for a follow-up commit that
will move this syntax to the "Compatibility" section of the CLUSTER
documentation.  The documentation for the CLUSTER syntaxes has also
been consolidated.

Suggested-by: Melanie Plageman
Discussion https://postgr.es/m/CAAKRu_bc5uHieG1976kGqJKxyWtyQt9yvktjsVX%2Bi7NOigDjOA%40mail.gmail.com
2023-07-19 15:26:43 -07:00
Michael Paquier 4e465aac36 Fix indentation in twophase.c
This has been missed in cb0cca1, noticed before buildfarm member koel
has been able to complain while poking at a different patch.  Like the
other commit, backpatch all the way down to limit the odds of merge
conflicts.

Backpatch-through: 11
2023-07-18 14:04:31 +09:00
Michael Paquier cb0cca1880 Fix recovery of 2PC transaction during crash recovery
A crash in the middle of a checkpoint with some two-phase state data
already flushed to disk by this checkpoint could cause a follow-up crash
recovery to recover twice the same transaction, once from what has been
found in pg_twophase/ at the beginning of recovery and a second time
when replaying its corresponding record.

This would lead to FATAL failures in the startup process during
recovery, where the same transaction would have a state recovered twice
instead of once:
LOG:  recovering prepared transaction 731 from shared memory
LOG:  recovering prepared transaction 731 from shared memory
FATAL:  lock ExclusiveLock on object 731/0/0 is already held

This issue is fixed by skipping the addition of any 2PC state coming
from a record whose equivalent 2PC state file has already been loaded in
TwoPhaseState at the beginning of recovery by restoreTwoPhaseData(),
which is OK as long as the system has not reached a consistent state.

The timing to get a messed up recovery processing is very racy, and
would very unlikely happen.  The thread that has reported the issue has
demonstrated the bug using injection points to force a PANIC in the
middle of a checkpoint.

Issue introduced in 728bd99, so backpatch all the way down.

Reported-by: "suyu.cmj" <mengjuan.cmj@alibaba-inc.com>
Author: "suyu.cmj" <mengjuan.cmj@alibaba-inc.com>
Author: Michael Paquier
Discussion: https://postgr.es/m/109e6994-b971-48cb-84f6-829646f18b4c.mengjuan.cmj@alibaba-inc.com
Backpatch-through: 11
2023-07-18 13:43:44 +09:00
Nathan Bossart 884eee5bfb Remove db_user_namespace.
This feature was intended to be a temporary measure to support
per-database user names.  A better one hasn't materialized in the
~21 years since it was added, and nobody claims to be using it, so
let's just remove it.

Reviewed-by: Michael Paquier, Magnus Hagander
Discussion: https://postgr.es/m/20230630200509.GA2830328%40nathanxps13
Discussion: https://postgr.es/m/20230630215608.GD2941194%40nathanxps13
2023-07-17 11:44:59 -07:00
David Rowley 2c2eb0d6b2 Shrink memory contexts struct sizes
Here we reduce the block size fields in AllocSetContext, GenerationContext
and SlabContext from Size down to uint32.  Ever since c6e0fe1f2, blocks
for non-dedicated palloc chunks can no longer be larger than 1GB, so
there's no need to store the various block size fields as 64-bit values.
32 bits are enough to store 2^30.

Here we also further reduce the memory context struct sizes by getting rid
of the 'keeper' field which stores a pointer to the context's keeper
block.  All the context types which have this field always allocate the
keeper block in the same allocation as the memory context itself, so the
keeper block always comes right at the end of the context struct.  Add
some macros to calculate that address rather than storing it in the
context.

Overall, in AllocSetContext and GenerationContext, this saves 20 bytes on
64-bit builds which for ALLOCSET_SMALL_SIZES can sometimes mean the
difference between having to allocate a 2nd block and storing all the
required allocations on the keeper block alone.  Such contexts are used
in relcache to store cache entries for indexes, of which there can be
a large number in a single backend.

Author: Melih Mutlu
Reviewed-by: David Rowley
Discussion: https://postgr.es/m/CAGPVpCSOW3uJ1QJmsMR9_oE3X7fG_z4q0AoU4R_w+2RzvroPFg@mail.gmail.com
2023-07-17 11:16:56 +12:00
Tom Lane e08d74ca13 Allow plan nodes with initPlans to be considered parallel-safe.
If the plan itself is parallel-safe, and the initPlans are too,
there's no reason anymore to prevent the plan from being marked
parallel-safe.  That restriction (dating to commit ab77a5a45) was
really a special case of the fact that we couldn't transmit subplans
to parallel workers at all.  We fixed that in commit 5e6d8d2bb and
follow-ons, but this case never got addressed.

We still forbid attaching initPlans to a Gather node that's
inserted pursuant to debug_parallel_query = regress.  That's because,
when we hide the Gather from EXPLAIN output, we'd hide the initPlans
too, causing cosmetic regression diffs.  It seems inadvisable to
kluge EXPLAIN to the extent required to make the output look the
same, so just don't do it in that case.

Along the way, this also takes care of some sloppiness about updating
path costs to match when we move initplans from one place to another
during createplan.c and setrefs.c.  Since all the planning decisions
are already made by that point, this is just cosmetic; but it seems
good to keep EXPLAIN output consistent with where the initplans are.

The diff in query_planner() might be worth remarking on.  I found that
one because after fixing things to allow parallel-safe initplans, one
partition_prune test case changed plans (as shown in the patch) ---
but only when debug_parallel_query was active.  The reason proved to
be that we only bothered to mark Result nodes as potentially
parallel-safe when debug_parallel_query is on.  This neglects the fact
that parallel-safety may be of interest for a sub-query even though
the Result itself doesn't parallelize.

Discussion: https://postgr.es/m/1129530.1681317832@sss.pgh.pa.us
2023-07-14 11:41:20 -04:00
Tom Lane d0d44049d1 Account for optimized MinMax aggregates during SS_finalize_plan.
We are capable of optimizing MIN() and MAX() aggregates on indexed
columns into subqueries that exploit the index, rather than the normal
thing of scanning the whole table.  When we do this, we replace the
Aggref node(s) with Params referencing subquery outputs.  Such Params
really ought to be included in the per-plan-node extParam/allParam
sets computed by SS_finalize_plan.  However, we've never done so
up to now because of an ancient implementation choice to perform
that substitution during set_plan_references, which runs after
SS_finalize_plan, so that SS_finalize_plan never sees these Params.

This seems like clearly a bug, yet there have been no field reports
of problems that could trace to it.  This may be because the types
of Plan nodes that could contain Aggrefs do not have any of the
rescan optimizations that are controlled by extParam/allParam.
Nonetheless it seems certain to bite us someday, so let's fix it
in a self-contained patch that can be back-patched if we find a
case in which there's a live bug pre-v17.

The cleanest fix would be to perform a separate tree walk to do
these substitutions before SS_finalize_plan runs.  That seems
unattractive, first because a whole-tree mutation pass is expensive,
and second because we lack infrastructure for visiting expression
subtrees in a Plan tree, so that we'd need a new function knowing
as much as SS_finalize_plan knows about that.  I also considered
swapping the order of SS_finalize_plan and set_plan_references,
but that fell foul of various assumptions that seem tricky to fix.
So the approach adopted here is to teach SS_finalize_plan itself
to check for such Aggrefs.  I refactored things a bit in setrefs.c
to avoid having three copies of the code that does that.

Given the lack of any currently-known bug, no test case here.

Discussion: https://postgr.es/m/2391880.1689025003@sss.pgh.pa.us
2023-07-14 11:41:20 -04:00
Tom Lane b8d3dae00f Improve error message for MaxAllocSize overrun in accumArrayResult.
Before, if you went past about 64M array elements in array_agg() and
allied functions, you got a generic "invalid memory alloc request
size" error.  This patch replaces that with "array size exceeds the
maximum allowed", which seems more user-friendly since it points you
to needing to reduce the size of your array result.  (This is the
same error text you'd get from construct_md_array in the event of
overrunning the maximum physical size for the finished array.)

Per question from Shaozhong Shi.  Since this hasn't come up often,
I don't feel a need to back-patch.

Discussion: https://postgr.es/m/CA+i5JwYtVS9z2E71PcNKAVPbOn4R2wuj-LqbJsYr_XOz73q7dQ@mail.gmail.com
2023-07-14 10:35:49 -04:00
Amit Langote 00f2a2556c Add missing initializations of p_perminfo
In a61b1f7482, we failed to update transformFromClauseItem() and
buildNSItemFromLists() to set ParseNamespaceItem.p_perminfo causing
it to point to garbage.

Pointed out by Tom Lane.

Reported-by: Farias de Oliveira <matheusfarias519@gmail.com>
Discussion: https://postgr.es/m/3173476.1689286373%40sss.pgh.pa.us
Backpatch-through: 16
2023-07-14 14:56:35 +09:00
Nathan Bossart a0363ab7aa Fix privilege check for SET SESSION AUTHORIZATION.
Presently, the privilege check for SET SESSION AUTHORIZATION checks
whether the original authenticated role was a superuser at
connection start time.  Even if the role loses the superuser
attribute, its existing sessions are permitted to change session
authorization to any role.

This commit modifies this privilege check to verify the original
authenticated role currently has superuser.  In the event that the
authenticated role loses superuser within a session authorization
change, the authorization change will remain in effect, which means
the user can still take advantage of the target role's privileges.
However, [RE]SET SESSION AUTHORIZATION will only permit switching
to the original authenticated role.

Author: Joseph Koshakow
Discussion: https://postgr.es/m/CAAvxfHc-HHzONQ2oXdvhFF9ayRnidPwK%2BfVBhRzaBWYYLVQL-g%40mail.gmail.com
2023-07-13 21:13:45 -07:00
Nathan Bossart 9987a7bf34 Move privilege check for SET SESSION AUTHORIZATION.
Presently, the privilege check for SET SESSION AUTHORIZATION is
performed in session_authorization's assign_hook.  A relevant
comment states, "It's OK because the check does not require catalog
access and can't fail during an end-of-transaction GUC
reversion..."  However, we plan to add a catalog lookup to this
privilege check in a follow-up commit.

This commit moves this privilege check to the check_hook for
session_authorization.  Like check_role(), we do not throw a hard
error for insufficient privileges when the source is PGC_S_TEST.

Author: Joseph Koshakow
Discussion: https://postgr.es/m/CAAvxfHc-HHzONQ2oXdvhFF9ayRnidPwK%2BfVBhRzaBWYYLVQL-g%40mail.gmail.com
2023-07-13 21:10:36 -07:00
Amit Kapila edca342434 Allow the use of a hash index on the subscriber during replication.
Commit 89e46da5e5 allowed using BTREE indexes that are neither
PRIMARY KEY nor REPLICA IDENTITY on the subscriber during apply of
update/delete. This patch extends that functionality to also allow HASH
indexes.

We explored supporting other index access methods as well but they don't
have a fixed strategy for equality operation which is required by the
current infrastructure in logical replication to scan the indexes.

Author: Kuroda Hayato
Reviewed-by: Peter Smith, Onder Kalaci, Amit Kapila
Discussion: https://postgr.es/m/TYAPR01MB58669D7414E59664E17A5827F522A@TYAPR01MB5866.jpnprd01.prod.outlook.com
2023-07-14 08:21:54 +05:30
Michael Paquier a5ea825f95 Add indisreplident to fields refreshed by RelationReloadIndexInfo()
RelationReloadIndexInfo() is a fast-path used for index reloads in the
relation cache, and it has always forgotten about updating
indisreplident, which is something that would happen after an index is
selected for a replica identity.  This can lead to incorrect cache
information provided when executing a command in a transaction context
that updates indisreplident.

None of the code paths currently on HEAD that need to check upon
pg_index.indisreplident fetch its value from the relation cache, always
relying on a fresh copy on the syscache.  Unfortunately, this may not be
the case of out-of-core code, that could see out-of-date value.

Author: Shruthi Gowda
Reviewed-by: Robert Haas, Dilip Kumar, Michael Paquier
Discussion: https://postgr.es/m/CAASxf_PBcxax0wW-3gErUyftZ0XrCs3Lrpuhq4-Z3Fak1DoW7Q@mail.gmail.com
Backpatch-through: 11
2023-07-14 11:15:34 +09:00
Michael Paquier 38ea6aa90e Fix updates of indisvalid for partitioned indexes
indisvalid is switched to true for partitioned indexes when all its
partitions have valid indexes when attaching a new partition, up to the
top-most parent if all its leaves are themselves valid when dealing with
multiple layers of partitions.

The copy of the tuple from pg_index used to switch indisvalid to true
came from the relation cache, which is incorrect.  Particularly, in the
case reported by Shruthi Gowda, executing a series of commands in a
single transaction would cause the validation of partitioned indexes to
use an incorrect version of a pg_index tuple, as indexes are reloaded
after an invalidation request with RelationReloadIndexInfo(), a much
faster version than a full index cache rebuild.  In this case, the
limited information updated in the cache leads to an incorrect version
of the tuple used.  One of the symptoms reported was the following
error, with a replica identity update, for instance:
"ERROR: attempted to update invisible tuple"

This is incorrect since 8b08f7d, so backpatch all the way down.

Reported-by: Shruthi Gowda
Author: Michael Paquier
Reviewed-by: Shruthi Gowda, Dilip Kumar
Discussion: https://postgr.es/m/CAASxf_PBcxax0wW-3gErUyftZ0XrCs3Lrpuhq4-Z3Fak1DoW7Q@mail.gmail.com
Backpatch-through: 11
2023-07-14 10:12:48 +09:00
Thomas Munro d0c28601ef Remove wal_sync_method=fsync_writethrough on Windows.
The "fsync" level already flushes drive write caches on Windows (as does
"fdatasync"), so it only confuses matters to have an apparently higher
level that isn't actually different at all.

That leaves "fsync_writethrough" only for macOS, where it actually does
something different.

Reviewed-by: Magnus Hagander <magnus@hagander.net>
Discussion: https://postgr.es/m/CA%2BhUKGJ2CG2SouPv2mca2WCTOJxYumvBARRcKPraFMB6GSEMcA%40mail.gmail.com
2023-07-14 12:30:13 +12:00
Michael Paquier aea7fe33fb Add information about line contents on parsing failure of wait_event_names.txt
The contents of the line whose parsing failed was not reported in the
error message produced by generate-wait_event_types.pl, making harder
than necessary the debugging of incorrectly-shaped entries in the file.

Reported-by: Andres Freund
Discussion: https://postgr.es/m/ZK9S3jFEV1X797Ll@paquier.xyz
2023-07-14 09:09:23 +09:00
Michael Paquier 183a60a628 Remove double quotes from the second column of wait_event_names.txt
The double quotes used for the wait event names are not required, as
the values quoted are made of single words.  The files generated by
generate-wait_event_types.pl (pgstat_wait_event.c, wait_event_types.h
and wait_event_types.sgml) are exactly the same before and after this
commit, hence the wait event names and the enum elements have the same
names as before.

Discussion: https://postgr.es/m/ZK9S3jFEV1X797Ll@paquier.xyz
2023-07-14 08:55:11 +09:00
Andres Freund c66a7d75e6 Handle DROP DATABASE getting interrupted
Until now, when DROP DATABASE got interrupted in the wrong moment, the removal
of the pg_database row would also roll back, even though some irreversible
steps have already been taken. E.g. DropDatabaseBuffers() might have thrown
out dirty buffers, or files could have been unlinked. But we continued to
allow connections to such a corrupted database.

To fix this, mark databases invalid with an in-place update, just before
starting to perform irreversible steps. As we can't add a new column in the
back branches, we use pg_database.datconnlimit = -2 for this purpose.

An invalid database cannot be connected to anymore, but can still be
dropped.

Unfortunately we can't easily add output to psql's \l to indicate that some
database is invalid, it doesn't fit in any of the existing columns.

Add tests verifying that a interrupted DROP DATABASE is handled correctly in
the backend and in various tools.

Reported-by: Evgeny Morozov <postgresql3@realityexists.net>
Author: Andres Freund <andres@anarazel.de>
Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Reviewed-by: Thomas Munro <thomas.munro@gmail.com>
Discussion: https://postgr.es/m/20230509004637.cgvmfwrbht7xm7p6@awork3.anarazel.de
Discussion: https://postgr.es/m/20230314174521.74jl6ffqsee5mtug@awork3.anarazel.de
Backpatch: 11-, bug present in all supported versions
2023-07-13 13:03:28 -07:00
Andres Freund 83ecfa9fad Release lock after encountering bogs row in vac_truncate_clog()
When vac_truncate_clog() encounters bogus datfrozenxid / datminmxid values, it
returns early. Unfortunately, until now, it did not release
WrapLimitsVacuumLock. If the backend later tries to acquire
WrapLimitsVacuumLock, the session / autovacuum worker hangs in an
uncancellable way. Similarly, other sessions will hang waiting for the
lock. However, if the backend holding the lock exited or errored out for some
reason, the lock was released.

The bug was introduced as a side effect of 566372b3d6.

It is interesting that there are no production reports of this problem. That
is likely due to a mix of bugs leading to bogus values having gotten less
common, process exit releasing locks and instances of hangs being hard to
debug for "normal" users.

Discussion: https://postgr.es/m/20230621221208.vhsqgduwfpzwxnpg@awork3.anarazel.de
2023-07-13 13:03:28 -07:00
Amit Langote 40b3af72a7 Add missing const qualifier
Missed in commit 785480c953.

Pointed out by Tom Lane.

Discussion: https://postgr.es/m/2795364.1689221300%40sss.pgh.pa.us
2023-07-13 22:38:26 +09:00
Amit Langote 328f492d25 Fix code indentation violation in commit b6e1157e7d
Per buildfarm member koel via Andrew Dunstan.
2023-07-13 22:26:10 +09:00
Peter Eisentraut e1c83e7b96 Fix untranslatable log message assembly
We can't inject the name of the logical replication worker into a log
message like that.  But for these messages we don't really need the
precision of knowing what kind of worker it was, so just write
"logical replication worker" and keep the message in one piece.

Discussion: https://www.postgresql.org/message-id/flat/CAHut%2BPt1xwATviPGjjtJy5L631SGf3qjV9XUCmxLu16cHamfgg%40mail.gmail.com
2023-07-13 13:21:43 +02:00
Michael Paquier ccfca8ea42 Remove duplicated assignment of LLVMJitHandle->lljit
This duplicated assignment when emiting some code not yet compiled.

Oversight in 6c57f2e.

Author: Matheus Alcantara
Reviewed-by: Gurjeet Singh
Discussion: https://postgr.es/m/La1Tfi7wirg9uGbCx_y7Qb9kl2T15mYouDXjCKAFuDqzQWnTRwH7KNLGigLLcxRs91V6dp2ySs1j_7mB4btzEZJ9NIMEAyOjUyAS7Jx-ydQ=@pm.me
2023-07-13 16:44:17 +09:00
Masahiko Sawada fd48a86c62 Doc: clarify the conditions of usable indexes for REPLICA IDENTITY FULL tables.
Commit 89e46da5e allowed REPLICA IDENTITY FULL tables to use an index
on the subscriber during apply of update/delete. This commit clarifies
in the documentation that the leftmost field of candidate indexes must
be a column (not an expression) that references the published relation
column.

The source code comments are also updated accordingly.

Reviewed-by: Peter Smith, Amit Kapila
Discussion: https://postgr.es/m/CAD21AoDJjffEvUFKXT27Q5U8-UU9JHv4rrJ9Ke8Zkc5UPWHLvA@mail.gmail.com
Backpatch-through: 16
2023-07-13 15:03:17 +09:00
Nathan Bossart 0fef877538 Rename session_auth_is_superuser to current_role_is_superuser.
This variable might've been accurately named when it was added in
ea886339b8, but the name hasn't been accurate since at least the
introduction of SET ROLE in e5d6b91220.  The corresponding
documentation was fixed in eedb068c0a.  This commit renames the
variable accordingly.

Suggested-by: Joseph Koshakow
Discussion: https://postgr.es/m/CAAvxfHc-HHzONQ2oXdvhFF9ayRnidPwK%2BfVBhRzaBWYYLVQL-g%40mail.gmail.com
2023-07-12 21:28:54 -07:00
Amit Langote b6e1157e7d Don't include CaseTestExpr in JsonValueExpr.formatted_expr
A CaseTestExpr is currently being put into
JsonValueExpr.formatted_expr as placeholder for the result of
evaluating JsonValueExpr.raw_expr, which in turn is evaluated
separately.  Though, there's no need for this indirection if
raw_expr itself can be embedded into formatted_expr and evaluated
as part of evaluating the latter, especially as there is no
special reason to evaluate it separately.  So this commit makes it
so.  As a result, JsonValueExpr.raw_expr no longer needs to be
evaluated in ExecInterpExpr(), eval_const_exprs_mutator() etc. and
is now only used for displaying the original "unformatted"
expression in ruleutils.c.

While at it, this also removes the function makeCaseTestExpr(),
because the code in makeJsonConstructorExpr() looks more readable
without it IMO and isn't used by anyone else either.

Finally, a note is added in the comment above CaseTestExpr's
definition that JsonConstructorExpr is also using it.

Reviewed-by: Álvaro Herrera <alvherre@alvh.no-ip.org>
Discussion: https://postgr.es/m/CA+HiwqE4XTdfb1nW=Ojoy_tQSRhYt-q_kb6i5d4xcKyrLC1Nbg@mail.gmail.com
2023-07-13 12:13:58 +09:00
Amit Langote 785480c953 Pass constructName to transformJsonValueExpr()
This allows it to pass to coerce_to_specific_type() the actual name
corresponding to the specific JSON_* function expression being
transformed, instead of the currently hardcoded string.

Reviewed-by: Álvaro Herrera <alvherre@alvh.no-ip.org>
Discussion: https://postgr.es/m/CA+HiwqE4XTdfb1nW=Ojoy_tQSRhYt-q_kb6i5d4xcKyrLC1Nbg@mail.gmail.com
2023-07-13 12:13:49 +09:00
Michael Paquier c17164aec8 Simplify some conditions related to [LW]Lock in generate-wait_event_types.pl
The first check on the enum values was not necessary as the values set
in wait_event_names.txt for the classes LWLock and Lock were able to
satisfy the check.  The second check when generating the C and header
files is now based on a match of the class names, making it simpler to
understand.

Author: Masahiro Ikeda, Michael Paquier
Discussion: https://postgr.es/m/eaf82a85c0ef1b55dc3b651d3f7b867a@oss.nttdata.com
2023-07-13 09:09:04 +09:00
Peter Eisentraut 7ef2912519 Remove ancient special case code for dropping oid columns
The special handling of negative attribute numbers in
RemoveAttributeById() was introduced to support SET WITHOUT OIDS
(commit 24614a9880).  But that feature doesn't exist anymore, so we
can revert to the previous, simpler version.

Reviewed-by: Alvaro Herrera <alvherre@alvh.no-ip.org>
Discussion: https://www.postgresql.org/message-id/flat/52a125e4-ff9a-95f5-9f61-b87cf447e4da@eisentraut.org
2023-07-12 16:13:50 +02:00
Peter Eisentraut 5eaa0e92ee Remove ancient special case code for adding oid columns
The special handling of negative attribute numbers in
ATExecAddColumn() was introduced to support SET WITH OIDS (commit
6d1e361852).  But that feature doesn't exist anymore, so we can revert
to the previous, simpler version.  In passing, also remove an obsolete
comment about OID support.

Reviewed-by: Alvaro Herrera <alvherre@alvh.no-ip.org>
Discussion: https://www.postgresql.org/message-id/flat/52a125e4-ff9a-95f5-9f61-b87cf447e4da@eisentraut.org
2023-07-12 16:13:40 +02:00
Peter Eisentraut adf333b4ed Remove obsolete comment about OID support
Reviewed-by: Alvaro Herrera <alvherre@alvh.no-ip.org>
Discussion: https://www.postgresql.org/message-id/flat/52a125e4-ff9a-95f5-9f61-b87cf447e4da@eisentraut.org
2023-07-12 16:13:31 +02:00
Peter Eisentraut 8c852ba9a4 Allow some exclusion constraints on partitions
Previously we only allowed unique B-tree constraints on partitions
(and only if the constraint included all the partition keys).  But we
could allow exclusion constraints with the same restriction.  We also
require that those columns be compared for equality, not something
like &&.

Author: Paul A. Jungwirth <pj@illuminatedcomputing.com>
Reviewed-by: Ronan Dunklau <ronan.dunklau@aiven.io>
Reviewed-by: Peter Eisentraut <peter@eisentraut.org>
Discussion: https://www.postgresql.org/message-id/flat/ec8b1d9b-502e-d1f8-e909-1bf9dffe6fa5@illuminatedcomputing.com
2023-07-12 09:25:17 +02:00
Masahiko Sawada 46ebdfe164 Report index vacuum progress.
This commit adds two columns: indexes_total and indexes_processed, to
pg_stat_progress_vacuum system view to show the index vacuum
progress. These numbers are reported in the "vacuuming indexes" and
"cleaning up indexes" phases.

This uses the new parallel message type for progress reporting added
by be06506e7.

Bump catversion because this changes the definition of
pg_stat_progress_vacuum.

Author: Sami Imseih
Reviewed by: Masahiko Sawada, Michael Paquier, Nathan Bossart, Andres Freund
Discussion: https://www.postgresql.org/message-id/flat/5478DFCD-2333-401A-B2F0-0D186AB09228@amazon.com
2023-07-11 12:34:01 +09:00
Masahiko Sawada f1889729dd Add new parallel message type to progress reporting.
This commit adds a new type of parallel message 'P' to allow a
parallel worker to poke at a leader to update the progress.

Currently it supports only incremental progress reporting but it's
possible to allow for supporting of other backend progress APIs in the
future.

There are no users of this new message type as of this commit. That
will follow in future commits.

Idea from Andres Freund.

Author: Sami Imseih
Reviewed by: Michael Paquier, Masahiko Sawada
Discussion: https://www.postgresql.org/message-id/flat/5478DFCD-2333-401A-B2F0-0D186AB09228@amazon.com
2023-07-11 12:33:54 +09:00
Thomas Munro 4e9fa6d56b Don't expose Windows' mbstowcs_l() and wcstombs_l().
Windows has similar functions with leading underscores.  Previously, we
provided the rename via a macro in win32_port.h.  In fact its functions
are not always good replacements for the Unix functions, since they
can't deal with UTF-8.  They are only currently used by pg_locale.c,
which is careful to redirect to other Windows routines for UTF-8.  Given
that portability hazard, it seem unlikely to be a good idea to encourage
any other code to think of these functions as being available outside
pg_locale.c.  Any code that thinks it wants these functions probably
wants our wchar2char() or char2wchar() routines instead, or it won't
actually work on Windows in UTF-8 databases.

Furthermore, some major libc implementations including glibc don't have
them (they only have the standard variants without _l), so external code
is very unlikely to require them to exist.

Reviewed-by: Peter Eisentraut <peter@eisentraut.org>
Discussion: https://postgr.es/m/CA%2BhUKG%2Bt_CHPzEoPnKyARJBJgE9-GxNajJo6ZuSfRK_KWFO%2B6w%40mail.gmail.com
2023-07-11 09:34:22 +12:00
Tom Lane a8b7424684 Be more rigorous about local variables in PostgresMain().
Since PostgresMain calls sigsetjmp, any local variables that are not
marked "volatile" have a risk of unspecified behavior.  In practice
this means that when control returns via longjmp, such variables might
get reset to their values as of the time of sigsetjmp, depending on
whether the compiler chose to put them in registers or on the stack.
We were careful about this for "send_ready_for_query", but not the
other local variables.

In the case of the timeout_enabled flags, resetting them to
their initial "false" states is actually good, since we do
"disable_all_timeouts()" in the longjmp cleanup code path.  If that
does not happen, we risk uselessly calling "disable_timeout()" later,
which is harmless but a little bit expensive.  Let's explicitly reset
these flags so that the behavior is correct and platform-independent.
(This change means that we really don't need the new "volatile"
markings after all, but let's install them anyway since any change
in this logic could re-introduce a problem.)

There is no issue for "firstchar" and "input_message" because those
are explicitly reinitialized each time through the query processing
loop.  To make that clearer, move them to be declared inside the loop.
That leaves us with all the function-lifespan locals except the
sigjmp_buf itself marked as volatile, which seems like a good policy
to have going forward.

Because of the possibility of extra disable_timeout() calls, this
seems worth back-patching.

Sergey Shinderuk and Tom Lane

Discussion: https://postgr.es/m/2eda015b-7dff-47fd-d5e2-f1a9899b90a6@postgrespro.ru
2023-07-10 12:14:34 -04:00
Peter Eisentraut a44d96add2 Fix pgindent
for commit e53a611523
2023-07-10 12:05:32 +02:00
Peter Eisentraut e53a611523 Message wording improvements 2023-07-10 10:47:24 +02:00
Michael Paquier 9b286858e3 Add more sanity checks with callers of changeDependencyFor()
changeDependencyFor() returns the number of pg_depend entries changed,
or 0 if there is a problem.  The callers of this routine expect only one
dependency to change, but they did not check for the result returned.
The following code paths gain checks:
- Namespace for extensions.
- Namespace for various object types (see AlterObjectNamespace).
- Planner support function for a function.

Some existing error messages related to all that are reworded to be more
consistent with the project style, and the new error messages added
follow the same style.  This change has exposed one bug fixed a bit
earlier with bd5ddbe.

Reviewed-by: Heikki Linnakangas, Akshat Jaimini
Discussion: https://postgr.es/m/ZJzD/rn+UbloKjB7@paquier.xyz
2023-07-10 13:08:10 +09:00
Michael Paquier bd5ddbe866 Fix ALTER EXTENSION SET SCHEMA with objects outside an extension's schema
As coded, the code would use as a base comparison the namespace OID from
the first object scanned in pg_depend when switching its namespace
dependency entry to the new one, and use it as a base of comparison for
any follow-up checks.  It would also be used as the old namespace OID to
switch *from* for the extension's pg_depend entry.  Hence, if the first
object scanned has a namespace different than the one stored in the
extension, we would finish by:
- Not checking that the extension objects map with the extension's
schema.
- Not switching the extension -> namespace dependency entry to the new
namespace provided by the user, making ALTER EXTENSION ineffective.

This issue exists since this command has been introduced in d9572c4 for
relocatable extension, so backpatch all the way down to 11.  The test
case has been provided by Heikki, that I have tweaked a bit to show the
effects on pg_depend for the extension.

Reported-by: Heikki Linnakangas
Author: Michael Paquier, Heikki Linnakangas
Discussion: https://postgr.es/m/20eea594-a05b-4c31-491b-007b6fceef28@iki.fi
Backpatch-through: 11
2023-07-10 09:40:07 +09:00
Peter Eisentraut f8d03ea727 Remove unnecessary unbind in LDAP search+bind mode
Comments in src/backend/libpq/auth.c say: (after successfully finding
the final DN to check the user-supplied password against)

/* Unbind and disconnect from the LDAP server */

and later

/*
 * Need to re-initialize the LDAP connection, so that we can bind to
 * it with a different username.
 */

But the protocol actually permits multiple subsequent authentications
("binds") over a single connection.

So, it seems like the whole connection re-initialization thing was
just a confusion and can be safely removed, thus saving quite a few
network round-trips, especially for the case of ldaps/starttls.

Author: Anatoly Zaretsky <anatoly.zaretsky@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/CALbq6kmJ-1+58df4B51ctPfTOSyPbY8Qi2=ct8oR=i4TamkUoQ@mail.gmail.com
2023-07-09 08:51:46 +02:00
Thomas Munro 8d9a9f034e All supported systems have locale_t.
locale_t is defined by POSIX.1-2008 and SUSv4, and available on all
targeted systems.  For Windows, win32_port.h redirects to a partial
implementation called _locale_t.  We can now remove a lot of
compile-time tests for HAVE_LOCALE_T, and associated comments and dead
code branches that were needed for older computers.

Since configure + MinGW builds didn't detect locale_t but now we assume
that all systems have it, further inconsistencies among the 3 Windows build
systems were revealed.  With this commit, we no longer define
HAVE_WCSTOMBS_L and HAVE_MBSTOWCS_L on any Windows build system, but
we have logic to deal with that so that replacements are available where
appropriate.

Reviewed-by: Noah Misch <noah@leadboat.com>
Reviewed-by: Tristan Partin <tristan@neon.tech>
Reviewed-by: Peter Eisentraut <peter@eisentraut.org>
Discussion: https://postgr.es/m/CA%2BhUKGLg7_T2GKwZFAkEf0V7vbnur-NfCjZPKZb%3DZfAXSV1ORw%40mail.gmail.com
2023-07-09 11:55:18 +12:00
Peter Eisentraut 5edf438eeb Make some indentation in gram.y consistent
Reviewed-by: Amit Langote <amitlangote09@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/CA+HiwqE4XTdfb1nW=Ojoy_tQSRhYt-q_kb6i5d4xcKyrLC1Nbg@mail.gmail.com
2023-07-08 15:56:01 +02:00
Nathan Bossart 151c22deee Revert MAINTAIN privilege and pg_maintain predefined role.
This reverts the following commits: 4dbdb82513, c2122aae63,
5b1a879943, 9e1e9d6560, ff9618e82a, 60684dd834, 4441fc704d,
and b5d6382496.  A role with the MAINTAIN privilege may be able to
use search_path tricks to escalate privileges to the table owner.
Unfortunately, it is too late in the v16 development cycle to apply
the proposed fix, i.e., restricting search_path when running
maintenance commands.

Bumps catversion.

Reviewed-by: Jeff Davis
Discussion: https://postgr.es/m/E1q7j7Y-000z1H-Hr%40gemulon.postgresql.org
Backpatch-through: 16
2023-07-07 11:25:13 -07:00
Tomas Vondra ec99d6e9c8 Document relaxed HOT for summarizing indexes
Commit 19d8e2308b allowed a weaker check for HOT with summarizing
indexes, but it did not update README.HOT. So do that now.

Patch by Matthias van de Meent, minor changes by me. Backpatch to 16,
where the optimization was introduced.

Author: Matthias van de Meent
Reviewed-by: Tomas Vondra
Backpatch-through: 16
Discussion: https://postgr.es/m/CAEze2WiEOm8V+c9kUeYp2BPhbEc5s473fUf51xNeqvSFGv44Ew@mail.gmail.com
2023-07-07 19:04:53 +02:00
Heikki Linnakangas 3142a8845b WAL-log the creation of the init fork of unlogged indexes.
We create a file, so we better WAL-log it. In practice, all the
built-in index AMs and all extensions that I'm aware of write a
metapage to the init fork, which is WAL-logged, and replay of the
metapage implicitly creates the fork too. But if ambuildempty() didn't
write any page, we would miss it.

This can be seen with dummy_index_am. Set up replication, create a
'dummy_index_am' index on an unlogged table, and look at the files
created in the replica: the init fork is not created on the
replica. Dummy_index_am doesn't do anything with the relation files,
however, so it doesn't lead to any user-visible errors.

Backpatch to all supported versions.

Reviewed-by: Robert Haas
Discussion: https://www.postgresql.org/message-id/6e5bbc08-cdfc-b2b3-9e23-1a914b9850a9%40iki.fi
2023-07-06 17:25:29 +03:00
Amit Kapila 69a674a170 Fix code indentation vioaltion introduced in commit cc32ec24fd.
Per buildfarm member koel

Discussion: https://postgr.es/m/CAA4eK1K2Y_iegdXRUNbsghY9+b-4cSOrxYt9T8TtwXkkdWMVJA@mail.gmail.com
2023-07-06 11:49:18 +05:30
Michael Paquier a14354cac0 Add GUC parameter "huge_pages_status"
This is useful to show the allocation state of huge pages when setting
up a server with "huge_pages = try", where allocating huge pages would
be attempted but the server would continue its startup sequence even if
the allocation fails.  The effective status of huge pages is not easily
visible without OS-level tools (or for instance, a lookup at
/proc/N/smaps), and the environments where Postgres runs may not
authorize that.  Like the other GUCs related to huge pages, this works
for Linux and Windows.

This GUC can report as values:
- "on", if huge pages were allocated.
- "off", if huge pages were not allocated.
- "unknown", a special state that could only be seen when using for
example postgres -C because it is only possible to know if the shared
memory allocation worked after we can check for the GUC values, even if
checking a runtime-computed GUC.  This value should never be seen when
querying for the GUC on a running server.  An assertion is added to
check that.

The discussion has also turned around having a new function to grab this
status, but this would have required more tricks for -DEXEC_BACKEND,
something that GUCs already handle.

Noriyoshi Shinoda has initiated the thread that has led to the result of
this commit.

Author: Justin Pryzby
Reviewed-by: Nathan Bossart, Kyotaro Horiguchi, Michael Paquier
Discussion: https://postgr.es/m/TU4PR8401MB1152EBB0D271F827E2E37A01EECC9@TU4PR8401MB1152.NAMPRD84.PROD.OUTLOOK.COM
2023-07-06 14:42:36 +09:00
Michael Paquier cf05113eb0 Add newline at the end of header generated by generate-wait_event_types.pl
The header file wait_event_types.h was generated without a newline at
its end, which was inconsistent with all the other things generated
automatically.

Per offline gripe from Nathan Bossart.
2023-07-06 13:35:50 +09:00
Amit Kapila cc32ec24fd Revert the commits related to allowing page lock to conflict among parallel group members.
This commit reverts the work done by commits 3ba59ccc89 and 72e78d831a.
Those commits were incorrect in asserting that we never acquire any other
heavy-weight lock after acquring page lock other than relation extension
lock. We can acquire a lock on catalogs while doing catalog look up after
acquring page lock.

This won't impact any existing feature but we need to think some other way
to achieve this before parallelizing other write operations or even
improving the parallelism in vacuum (like allowing multiple workers
for an index).

Reported-by: Jaime Casanova
Author: Amit Kapila
Backpatch-through: 13
Discussion: https://postgr.es/m/CAJKUy5jffnRKNvRHKQ0LynRb0RJC-o4P8Ku3x9vGAVLwDBWumQ@mail.gmail.com
2023-07-06 08:52:10 +05:30
Michael Paquier ae6d06f096 Handle \v as a whitespace character in parsers
This commit comes as a continuation of the discussion that has led to
d522b05, as \v was handled inconsistently when parsing array values or
anything going through the parsers, and changing a parser behavior in
stable branches is a scary thing to do.  The parsing of array values now
uses the more central scanner_isspace() and array_isspace() is removed.

As pointing out by Peter Eisentraut, fix a confusing reference to
horizontal space in the parsers with the term "horiz_space".  \f was
included in this set since 3cfdd8f from 2000, but it is not horizontal.
"horiz_space" is renamed to "non_newline_space", to refer to all
whitespace characters except newlines.

The changes impact the parsers for the backend, psql, seg, cube, ecpg
and replication commands.  Note that JSON should not escape \v, as per
RFC 7159, so these are not touched.

Reviewed-by: Peter Eisentraut, Tom Lane
Discussion: https://postgr.es/m/ZJKcjNwWHHvw9ksQ@paquier.xyz
2023-07-06 08:16:24 +09:00
Heikki Linnakangas 4f4d73466d Fix leak of LLVM "fatal-on-oom" section counter.
llvm_release_context() called llvm_enter_fatal_on_oom(), but was missing
the corresponding llvm_leave_fatal_on_oom() call. As a result, if JIT was
used at all, we were almost always in the "fatal-on-oom" state.

It only makes a difference if you use an extension written in C++, and
run out of memory in a C++ 'new' call. In that case, you would get a
PostgreSQL FATAL error, instead of the default behavior of throwing a
C++ exception.

Back-patch to all supported versions.

Reviewed-by: Daniel Gustafsson
Discussion: https://www.postgresql.org/message-id/54b78cca-bc84-dad8-4a7e-5b56f764fab5@iki.fi
2023-07-05 13:13:13 +03:00
Heikki Linnakangas 5e8068f04e Change example in pgindent README on "/*-----" comments.
Most, but not all, of our "/*----" style comments have an end-guard
line with dashes at the end of the comment. However, pgindent doesn't
care about the end-guards, so they mostly just waste screen
space. Going forward, let's not require end-guards.

Remove a broken end-guard in a comment in walsender.c that led me to
think about this. Remove the end guard from another comment in
walsender.c for consistency, so that we use the same style in all
comments in the file.

However, we have thousands of existing "/*----" comments the repository,
so it's not worth the code churn to change them all.

Discussion: https://www.postgresql.org/message-id/fb083c91-d490-3b65-25f3-05e9118b6b0d%40iki.fi
2023-07-05 10:02:15 +03:00
Daniel Gustafsson a77d39d5f4 Rename EVT cache hash to make context name unique
BuildEventTriggerCache sets up a context "EventTriggerCache" which
house a hash named "Event Trigger Cache", which in turn creates a
context with the table name.  This generated log output for memory
context dumps like below:

LOG:  level: 2; EventTriggerCache: 8192 total in 1 blocks; 7928 free (4 chunks); 264 used
LOG:  level: 3; Event Trigger Cache: 8192 total in 1 blocks; 2616 free (0 chunks); 5576 used

This renames the hash to ensure that the hash context has a unique
name for easier log reading and debugging.

Discussion: https://postgr.es/m/5EDC969E-CAE3-4CBD-965E-3B8A1294CFA4@yesql.se
2023-07-05 08:53:09 +02:00
Masahiko Sawada 68a59f9e99 pgstat: fix subscription stats entry leak.
Commit 7b64e4b3 taught DropSubscription() to drop stats entry of
subscription that is not associated with a replication slot for apply
worker at DROP SUBSCRIPTION but missed covering the case where the
subscription is not associated with replication slots for both apply
worker and tablesync worker.

Also add a test to verify that the stats for slot-less subscription is
removed at DROP SUBSCRIPTION time.

Backpatch down to 15.

Author: Masahiko Sawada
Reviewed-by: Nathan Bossart, Hayato Kuroda, Melih Mutlu, Amit Kapila
Discussion: https://postgr.es/m/CAD21AoB71zkP7uPT7JDPsZcvp0749ExEQnOJxeNKPDFisHar+w@mail.gmail.com
Backpatch-through: 15
2023-07-05 14:49:46 +09:00
Michael Paquier fa88928470 Generate automatically code and documentation related to wait events
The documentation and the code is generated automatically from a new
file called wait_event_names.txt, formatted in sections dedicated to
each wait event class (Timeout, Lock, IO, etc.) with three tab-separated
fields:
- C symbol in enums
- Format in the system views
- Description in the docs

Using this approach has several advantages, as we have proved to be
rather bad in maintaining this area of the tree across the years:
- The order of each item in the documentation and the code, which should
be alphabetical, has become incorrect multiple times, and the script
generating the code and documentation has a few rules to enforce that,
making the maintenance a no-brainer.
- Some wait events were added to the code, but not documented, so this
cannot be missed now.
- The order of the tables for each wait event class is enforced in the
documentation (the input .txt file does so as well for clarity, though
this is not mandatory).
- Less code, shaving 1.2k lines from the tree, with 1/3 of the savings
coming from the code, the rest from the documentation.

The wait event types "Lock" and "LWLock" still have their own code path
for their code, hence only the documentation is created for them.  These
classes are listed with a special marker called WAIT_EVENT_DOCONLY in
the input file.

Adding a new wait event now requires only an update of
wait_event_names.txt, with "Lock" and "LWLock" treated as exceptions.

This commit has been tested with configure/Makefile, the CI and VPATH
build.  clean, distclean and maintainer-clean were working fine.

Author: Bertrand Drouvot, Michael Paquier
Discussion: https://postgr.es/m/77a86b3a-c4a8-5f5d-69b9-d70bbf2e9b98@gmail.com
2023-07-05 10:53:11 +09:00
Daniel Gustafsson 48efb2302b Fix assertion failure in snapshot building
Clear any potential stale next_phase_at value from the snapshot
builder which otherwise may trip an assertion check ensuring
that there is no next_phase_at value.

This can be reproduced by running 80 concurrent sessions like
the below where $c is a loop counter (assumes there has been
1..$c databases created) :

  echo "
    CREATE TABLE replication_example(id SERIAL PRIMARY KEY,
                                     somedata int,
                                     text varchar(120));
    SELECT 'init' FROM
      pg_create_logical_replication_slot('regression_slot_$c',
                                         'test_decoding');
    SELECT data FROM
      pg_logical_slot_get_changes('regression_slot_$c', NULL,
                                  NULL, 'include-xids', '0',
                                  'skip-empty-xacts', '1');
  " | psql -d regress_$c >>psql.log &

Backpatch down to v16.

Bug: #17695
Author: Masahiko Sawada <sawada.mshk@gmail.com>
Reviewed-by: Alexander Lakhin <exclusion@gmail.com>
Reported-by: bowenshi <zxwsbg@qq.com>
Discussion: https://postgr.es/m/17695-6be9277c9295985f@postgresql.org
Backpatch-through: v16
2023-07-04 17:36:13 +02:00
Heikki Linnakangas 4b4798e138 Ensure that creation of an empty relfile is fsync'd at checkpoint.
If you create a table and don't insert any data into it, the relation file
is never fsync'd. You don't lose data, because an empty table doesn't have
any data to begin with, but if you crash and lose the file, subsequent
operations on the table will fail with "could not open file" error.

To fix, register an fsync request in mdcreate(), like we do for mdwrite().

Per discussion, we probably should also fsync the containing directory
after creating a new file. But that's a separate and much wider issue.

Backpatch to all supported versions.

Reviewed-by: Andres Freund, Thomas Munro
Discussion: https://www.postgresql.org/message-id/d47d8122-415e-425c-d0a2-e0160829702d%40iki.fi
2023-07-04 17:57:03 +03:00
David Rowley 625d5b3ca0 Allow Incremental Sorts on GiST and SP-GiST indexes
Previously an "amcanorderbyop" index would only be used when the index
could provide sorted results which satisfied all query_pathkeys.  Here
we relax this so that we also allow these indexes to be considered by the
planner when they only provide partially sorted results.  This allows the
planner to later consider making use of an Incremental Sort to satisfy the
remaining pathkeys.  This change is particularly useful for KNN-type
queries which contain a LIMIT clause and an additional ORDER BY clause for
a non-indexed column.

Author: Miroslav Bendik
Reviewed-by: Richard Guo, David Rowley
Discussion: https://postgr.es/m/CAPoEpV0QYDtzjwamwWUBqyWpaCVbJV2d6qOD7Uy09bWn47PJtw%40mail.gmail.com
2023-07-04 23:08:52 +12:00
Thomas Munro 03f80daac8 Re-bin segment when memory pages are freed.
It's OK to be lazy about re-binning memory segments when allocating,
because that can only leave segments in a bin that's too high.  We'll
search higher bins if necessary while allocating next time, and
also eventually re-bin, so no memory can become unreachable that way.

However, when freeing memory, the largest contiguous range of free pages
might go up, so we should re-bin eagerly to make sure we don't leave the
segment in a bin that is too low for get_best_segment() to find.

The re-binning code is moved into a function of its own, so it can be
called whenever free pages are returned to the segment's free page map.

Back-patch to all supported releases.

Author: Dongming Liu <ldming101@gmail.com>
Reviewed-by: Robert Haas <robertmhaas@gmail.com> (earlier version)
Reviewed-by: Thomas Munro <thomas.munro@gmail.com>
Discussion: https://postgr.es/m/CAL1p7e8LzB2LSeAXo2pXCW4%2BRya9s0sJ3G_ReKOU%3DAjSUWjHWQ%40mail.gmail.com
2023-07-04 15:16:47 +12:00
David Rowley a8c09daa8b Remove trailing zero words from Bitmapsets
Prior to this, Bitmapsets could contain trailing words which had no
members set.  Many places in bitmapset.c had to loop over trailing words
to check for empty words.  If we ensure we always remove these trailing
zero words then we can perform various optimizations such as fast
pathing bms_is_subset to return false when 'a' has more words than 'b'.
A similar optimization is possible in bms_equal.  Both of these together
can yield quite significant performance increases in the query planner
when querying a partitioned table with around 100 or more partitions.

While we're at it, since the minimum number of words a Bitmapset can
contain is 1, we can make use of do/while loops instead of for loops
when looping over all words in a set.  This means checking the loop
condition 1 less time, which for single-word sets cuts the loop
condition checks in half.

Author: David Rowley
Reviewed-by: Yuya Watari
Discussion: https://postgr.es/m/CAApHDvr5O41MuUjw0DQKqmAnv7QqvmLqXReEd5o4nXTzWp8-+w@mail.gmail.com
2023-07-04 12:34:48 +12:00
Nathan Bossart 957845789b Increase size of bgw_library_name.
This commit increases the size of the bgw_library_name member of
the BackgroundWorker struct from BGW_MAXLEN (96) bytes to MAXPGPATH
(default of 1024) bytes so that it can store longer file names
(e.g., absolute paths).

Author: Yurii Rashkovskii
Reviewed-by: Daniel Gustafsson, Aleksander Alekseev
Discussion: https://postgr.es/m/CA%2BRLCQyjFV5Y8tG5QgUb6gjteL4S3p%2B1gcyqWTqigyM93WZ9Pg%40mail.gmail.com
2023-07-03 15:02:16 -07:00
Thomas Munro 126552c85c Fix race in SSI interaction with gin fast path.
The ginfast.c code previously checked for conflicts in before locking
the relevant buffer, leaving a window where a RW conflict could be
missed.  Re-order.

There was also a place where buffer ID and block number were confused
while trying to predicate-lock a page, noted by visual inspection.

Back-patch to all supported releases.  Fixes one more problem discovered
with the reproducer from bug #17949, in this case when Dmitry tried
other index types.

Reported-by: Artem Anisimov <artem.anisimov.255@gmail.com>
Reported-by: Dmitry Dolgov <9erthalion6@gmail.com>
Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi>
Discussion: https://postgr.es/m/17949-a0f17035294a55e2%40postgresql.org
2023-07-04 09:07:31 +12:00
Thomas Munro bcc93a389c Fix race in SSI interaction with bitmap heap scan.
When performing a bitmap heap scan, we don't want to miss concurrent
writes that occurred after we observed the heap's rs_nblocks, but before
we took predicate locks on index pages.  Therefore, we can't skip
fetching any heap tuples that are referenced by the index, because we
need to test them all with CheckForSerializableConflictOut().  The
old optimization that would ignore any references to blocks >=
rs_nblocks gets in the way of that requirement, because it means that
concurrent writes in that window are ignored.

Removing that optimization shouldn't affect correctness at any isolation
level, because any new tuples shouldn't be visible to an MVCC snapshot.
There also shouldn't be any error-causing references to heap blocks past
the end, because we should have held at least an AccessShareLock on the
table before the index scan.  It can't get smaller while our transaction
is running.  For now, though, we'll keep the optimization at lower
levels to avoid making unnecessary changes in a bug fix.

Back-patch to all supported releases.  In release 11, the code is in a
different place but not fundamentally different.  Fixes one aspect of
bug #17949.

Reported-by: Artem Anisimov <artem.anisimov.255@gmail.com>
Reviewed-by: Dmitry Dolgov <9erthalion6@gmail.com>
Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi>
Discussion: https://postgr.es/m/17949-a0f17035294a55e2%40postgresql.org
2023-07-04 09:07:31 +12:00
Thomas Munro f9b7fc651a Fix race in SSI interaction with empty btrees.
When predicate-locking btrees, we have a special case for completely
empty btrees, since there is no page to lock.  This was racy, because,
without buffer lock held, a matching key could be inserted between the
_bt_search() and the PredicateLockRelation() calls.

Fix, by rechecking _bt_search() after taking the relation-level SIREAD
lock, if using SERIALIZABLE isolation and an empty btree is discovered.

Back-patch to all supported releases.  Fixes one aspect of bug #17949.

Reported-by: Artem Anisimov <artem.anisimov.255@gmail.com>
Reviewed-by: Dmitry Dolgov <9erthalion6@gmail.com>
Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi>
Discussion: https://postgr.es/m/17949-a0f17035294a55e2%40postgresql.org
2023-07-04 09:07:31 +12:00
Nathan Bossart 562bee0fc1 Don't truncate database and user names in startup packets.
Unlike commands such as CREATE DATABASE, ProcessStartupPacket()
does not perform multibyte-aware truncation of overlength names.
This means that connection attempts might fail even if the user
provides the same overlength names that were used in CREATE
DATABASE, CREATE ROLE, etc.  Ideally, we'd do the same multibyte-
aware truncation in both code paths, but it doesn't seem worth the
added complexity of trying to discover the encoding of the names.
Instead, let's simply skip truncating the names in the startup
packet and let the user/database lookup fail later on.  With this
change, users must provide the exact names stored in the catalogs,
even if the names were truncated.

This reverts commit d18c1d1f51.

Author: Bertrand Drouvot
Reviewed-by: Kyotaro Horiguchi, Tom Lane
Discussion: https://postgr.es/m/07436793-1426-29b2-f924-db7422a05fb7%40gmail.com
2023-07-03 13:18:05 -07:00
Tomas Vondra 29cf61ade3 Consider fillfactor when estimating relation size
When table_block_relation_estimate_size() estimated the number of tuples
in a relation without statistics (e.g. right after load), it did not
consider fillfactor when calculating density. With non-default
fillfactor values, this may result in significant overestimate of the
number of tuples - up to 10x with the minimum 10% fillfactor. This may
have unexpected consequences, e.g. when creating hash indexes.

This considers the current fillfactor value in the "no statistics" code
path.  If the fillfactor changes after loading data into the table, the
estimate may be off. But that seems much less likely than changing the
fillfactor before the data load.

Reviewed-by: Corey Huinker, Peter Eisentraut
Discussion: https://postgr.es/m/cf154ef9-6bac-d268-b735-67a3443debba@enterprisedb.com
2023-07-03 18:55:31 +02:00
Tomas Vondra a4cfeeca5a Fix code indentation violations
Commits ce5aaea8cd, 2b8b2852bb and 28d03feac3 violated the expected code
indentation rules, upsetting the new buildfarm member "koel."

Discussion: https://postgr.es/m/ZKIU4mhWpgJOM0W0%40paquier.xyz
2023-07-03 12:47:49 +02:00
Peter Eisentraut 6d56c501a7 A minor simplification for List manipulation
Fix one place that was using lfirst(list_head(list)) by using linitial(list)
instead.  They are equivalent but the latter is simpler.  We did the same in
9d299a49.

Author: Richard Guo <guofenglinux@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/CAMbWs49dJnpezDQDDxCPKq7+=_3NyqLqGqnhqCjd+dYe4MS15w@mail.gmail.com
2023-07-03 11:39:03 +02:00
Peter Eisentraut c69bdf837f Take pg_attribute out of VacAttrStats
The VacAttrStats structure contained the whole Form_pg_attribute for a
column, but it actually only needs attstattarget from there.  So
remove the Form_pg_attribute field and make a separate field for
attstattarget.  This simplifies some code for extended statistics that
doesn't deal with a column but an expression, which had to fake up
pg_attribute rows to satisfy internal APIs.  Also, we can remove some
comments that essentially said "don't look at pg_attribute directly".

Reviewed-by: Tomas Vondra <tomas.vondra@enterprisedb.com>
Discussion: https://www.postgresql.org/message-id/flat/d6069765-5971-04d3-c10d-e4f7b2e9c459%40eisentraut.org
2023-07-03 07:18:57 +02:00
Peter Eisentraut 7a7f60aef8 Add macro for maximum statistics target
The number of places where 10000 was hardcoded had grown a bit beyond
the comfort level.  Introduce a macro MAX_STATISTICS_TARGET instead.

Reviewed-by: Tomas Vondra <tomas.vondra@enterprisedb.com>
Discussion: https://www.postgresql.org/message-id/flat/d6069765-5971-04d3-c10d-e4f7b2e9c459%40eisentraut.org
2023-07-03 07:18:57 +02:00
Peter Eisentraut 3ee2f25d21 Change type of pg_statistic_ext.stxstattarget
Change from int32 to int16, to match attstattarget (changed in
90189eefc1).

Reviewed-by: Tomas Vondra <tomas.vondra@enterprisedb.com>
Discussion: https://www.postgresql.org/message-id/flat/d6069765-5971-04d3-c10d-e4f7b2e9c459%40eisentraut.org
2023-07-03 07:18:57 +02:00
Michael Paquier 8e278b6576 Remove support for OpenSSL 1.0.1
Here are some notes about this change:
- As X509_get_signature_nid() should always exist (OpenSSL and
LibreSSL), hence HAVE_X509_GET_SIGNATURE_NID is now gone.
- OPENSSL_API_COMPAT is bumped to 0x10002000L.
- One comment related to 1.0.1e introduced by 74242c2 is removed.

Upstream OpenSSL still provides long-term support for 1.0.2 in a closed
fashion, so removing it is out of scope for a few years, at least.

Reviewed-by: Jacob Champion, Daniel Gustafsson
Discussion: https://postgr.es/m/ZG3JNursG69dz1lr@paquier.xyz
2023-07-03 13:20:27 +09:00
Michael Paquier 2aeaf80e57 Refactor some code related to wait events "BufferPin" and "Extension"
The following changes are done:
- Addition of WaitEventBufferPin and WaitEventExtension, that hold a
list of wait events related to each category.
- Addition of two functions that encapsulate the list of wait events for
each category.
- Rename BUFFER_PIN to BUFFERPIN (only this wait event class used an
underscore, requiring a specific rule in the automation script).

These changes make a bit easier the automatic generation of all the code
and documentation related to wait events, as all the wait event
categories are now controlled by consistent structures and functions.

Author: Bertrand Drouvot
Discussion: https://postgr.es/m/c6f35117-4b20-4c78-1df5-d3056010dcf5@gmail.com
Discussion: https://postgr.es/m/77a86b3a-c4a8-5f5d-69b9-d70bbf2e9b98@gmail.com
2023-07-03 11:01:02 +09:00
David Rowley c65102006b Remove redundant PARTITION BY columns from WindowClauses
Here we adjust the query planner to have it remove items from a window
clause's PARTITION BY clause in cases where the pathkey for a column in
the PARTITION BY clause is redundant.

Doing this allows the optimization added in 9d9c02ccd to stop window
aggregation early rather than going into "pass-through" mode to find
tuples belonging to the next partition.  Also, when we manage to remove
all PARTITION BY columns, we now no longer needlessly check that the
current tuple belongs to the same partition as the last tuple in
nodeWindowAgg.c.  If the pathkey was redundant then all tuples must
contain the same value for the given redundant column, so there's no point
in checking that during execution.

Author: David Rowley
Reviewed-by: Richard Guo
Discussion: https://postgr.es/m/CAApHDvo2ji+hdxrxfXtRtsfSVw3to2o1nCO20qimw0dUGK8hcQ@mail.gmail.com
2023-07-03 12:49:43 +12:00
Thomas Munro 4637a6ac0b Silence "missing contrecord" error.
Commit dd38ff28ad added a new error message "missing contrecord" when
we fail to reassemble a record.  Unfortunately that caused noisy
messages to be logged by pg_waldump at end of segment, and by walsender
when asked to shut down on a segment boundary.

Remove the new error message, so that this condition signals end-of-
WAL without a message.  It's arguably a reportable condition that should
not be silenced while performing crash recovery, but fixing that without
introducing noise in the other cases will require more research.

Back-patch to 15.

Reported-by: Tomas Vondra <tomas.vondra@enterprisedb.com>
Discussion: https://postgr.es/m/6a1df56e-4656-b3ce-4b7a-a9cb41df8189%40enterprisedb.com
2023-07-03 11:16:27 +12:00
Tomas Vondra ce5aaea8cd Fix oversight in handling of modifiedCols since f24523672d
Commit f24523672d fixed a memory leak by moving the modifiedCols bitmap
into the per-row memory context. In the case of AFTER UPDATE triggers,
the bitmap is however referenced from an event kept until the end of the
query, resulting in a use-after-free bug.

Fixed by copying the bitmap into the AfterTriggerEvents memory context,
which is the one where we keep the trigger events. There's only one
place that needs to do the copy, but the memory context may not exist
yet. Doing that in a separate function seems more readable.

Report by Alexander Pyhalov, fix by me. Backpatch to 13, where the
bitmap was added to the event by commit 71d60e2aa0.

Reported-by: Alexander Pyhalov
Backpatch-through: 13
Discussion: https://postgr.es/m/acddb17c89b0d6cb940eaeda18c08bbe@postgrespro.ru
2023-07-02 22:21:02 +02:00
Tomas Vondra 98640f960e Fix memory leak in Incremental Sort rescans
The Incremental Sort had a couple issues, resulting in leaking memory
during rescans, possibly triggering OOM. The code had a couple of
related flaws:

1. During rescans, the sort states were reset but then also set to NULL
   (despite the comment saying otherwise). ExecIncrementalSort then
   sees NULL and initializes a new sort state, leaking the memory used
   by the old one.

2. Initializing the sort state also automatically rebuilt the info about
   presorted keys, leaking the already initialized info. presorted_keys
   was also unnecessarily reset to NULL.

Patch by James Coleman, based on patches by Laurenz Albe and Tom Lane.
Backpatch to 13, where Incremental Sort was introduced.

Author: James Coleman, Laurenz Albe, Tom Lane
Reported-by: Laurenz Albe, Zu-Ming Jiang
Backpatch-through: 13
Discussion: https://postgr.es/m/b2bd02dff61af15e3526293e2771f874cf2a3be7.camel%40cybertec.at
Discussion: https://postgr.es/m/db03c582-086d-e7cd-d4a1-3bc722f81765%40inf.ethz.ch
2023-07-02 20:03:30 +02:00
Tomas Vondra 0457109344 Improve BRIN minmax-multi opclass test coverage
Per the code coverage report, the existing regression tests did not
exercice some a couple important BRIN minmax-multi code paths.

- The tests focused on testing planning with a range of scan key
  strategies, but not the execution. Fixed by adding queries that
  actually test query execution for both equality and inequality.

- All tests created indexes after inserting data, but this only
  exercises the CREATE INDEX strategy that sees all values at once, not
  incremental summary updates. The new tests flip the order and create
  the index before adding data.

- The assert check(s) validating correctness of expanded ranges were
  present only in the "union" code path, which is not covered by
  regression tests at all (as it requires concurrency etc.). Fixed by
  adding the asserts to a couple more places.

Reviewed-by: Heikki Linnakangas
Discussion: https://postgr.es/m/57020b2e-d9c9-9bc7-4892-b36d9bb07563%40enterprisedb.com
2023-07-02 10:33:38 +02:00
Tomas Vondra 2b8b2852bb Introduce bloom_filter_size for BRIN bloom opclass
Move the calculation of Bloom filter parameters (for BRIN indexes) into
a separate function to make reuse easier. At the moment we only call it
from one place, but that may change and it's easier to read anyway.

Reviewed-by: Heikki Linnakangas
Discussion: https://postgr.es/m/0e1f3350-c9cf-ab62-43a5-5dae314de89c%40enterprisedb.com
2023-07-02 10:24:29 +02:00
Tomas Vondra 28d03feac3 Minor cleanups in the BRIN code
BRIN bloom and minmax-multi opclasses were somewhat inconsistent when
dealing with bool variables, assigning to them Datum values etc. While
not a bug, it makes the code harder to understand, so fix that.

While at it, update an incorrect comment copied to bloom opclass from
minmax, talking about strategies not supported by bloom.

Reviewed-by: Heikki Linnakangas
Discussion: https://postgr.es/m/0e1f3350-c9cf-ab62-43a5-5dae314de89c%40enterprisedb.com
2023-07-02 10:21:17 +02:00
Thomas Munro 4f49b3f849 Trust signalfd on illumos, again.
Commit 3ab4fc5d avoided choosing signalfd by default on illumos, because
it triggered kernel panics.  That was fixed, so we can remove a kludge
from our code.  Users/packagers can still override the default choice at
compile time if desired, and we'll leave the back-branches unchanged so
they keep choosing self-pipe by default, but we'll default to signalfd
(like we do for Linux) in 17.  Fixed kernels should be everywhere by the
time 17 ships.

The illumos issues were:

 * https://www.illumos.org/issues/13700
 * https://www.illumos.org/issues/14892

Discussion: https://postgr.es/m/CA+hUKG+NK-K_G_i1H3OpDTwYPEsiwQi_jw58PGcW2H+-N2eVCA@mail.gmail.com
2023-07-02 15:28:48 +12:00
Heikki Linnakangas e251e780bf Remove redundant check for fast_forward.
We already checked for it earlier in the function.

Reviewed-by: Andres Freund
Discussion: https://www.postgresql.org/message-id/1ba2899e-77f8-7866-79e5-f3b7d1251a3e@iki.fi
2023-06-30 18:31:10 +03:00
Heikki Linnakangas a0dd4c95b9 Improve comment on why we need ctid->(cmin,cmax) mapping.
Combocids are only part of the problem. Explain the problem in more detail.

Reviewed-by: Andres Freund
Discussion: https://www.postgresql.org/message-id/1ba2899e-77f8-7866-79e5-f3b7d1251a3e@iki.fi
2023-06-30 18:30:32 +03:00
Michael Paquier cfc43aeb38 Fix marking of indisvalid for partitioned indexes at creation
The logic that introduced partitioned indexes missed a few things when
invalidating a partitioned index when these are created, still the code
is written to handle recursions:
1) If created from scratch because a mapping index could not be found,
the new index created could be itself invalid, if for example it was a
partitioned index with one of its leaves invalid.
2) A CCI was missing when indisvalid is set for a parent index, leading
to inconsistent trees when recursing across more than one level for a
partitioned index creation if an invalidation of the parent was
required.

This could lead to the creation of a partition index tree where some of
the partitioned indexes are marked as invalid, but some of the parents
are marked valid, which is not something that should happen (as
validatePartitionedIndex() defines, indisvalid is switched to true for a
partitioned index iff all its partitions are themselves valid).

This patch makes sure that indisvalid is set to false on a partitioned
index if at least one of its partition is invalid.  The flag is set to
true if *all* its partitions are valid.

The regression test added in this commit abuses of a failed concurrent
index creation, marked as invalid, that maps with an index created on
its partitioned table afterwards.

Reported-by: Alexander Lakhin
Reviewed-by: Alexander Lakhin
Discussion: https://postgr.es/m/14987634-43c0-0cb3-e075-94d423607e08@gmail.com
Backpatch-through: 11
2023-06-30 13:54:48 +09:00
Michael Paquier 23d8624fe5 Use named captures in Catalog::ParseHeader()
Using at least perl 5.14 is required since 4c15327, meaning that it is
possible to use named captures and the %+ hash instead of having to
count parenthesis groups manually.

While on it, CATALOG is made more flexible in its handling of
whitespaces for parameter lists (see the addition of \s* in this
case).  The generated postgres.bki remains exactly the same before and
after this commit.

Author: Dagfinn Ilmari Mannsåker
Reviewed-by: John Naylor
Discussion: https://postgr.es/m/87y1l3s7o9.fsf@wibble.ilmari.org
2023-06-30 09:16:27 +09:00
Michael Paquier 97d8910104 Fix pg_depend entry to AMs after ALTER TABLE .. SET ACCESS METHOD
ALTER TABLE .. SET ACCESS METHOD was not registering a dependency to the
new access method with the relation altered in its rewrite phase, making
possible the drop of an access method even if there are relations that
depend on it.  During the rewrite, a temporary relation is created to
build the new relation files before swapping the new and old files, and,
while the temporary relation was registering a correct dependency to the
new AM, the old relation did not do that.  A dependency on the access
method is added when the relation files are swapped, which is the point
where pg_class is updated.

Materialized views and tables use the same code path, hence both were
impacted.

Backpatch down to 15, where this command has been introduced.

Reported-by: Alexander Lakhin
Reviewed-by: Nathan Bossart, Andres Freund
Discussion: https://postgr.es/m/18000-9145c25b1af475ca@postgresql.org
Backpatch-through: 15
2023-06-30 07:49:01 +09:00
Tom Lane a798660ebe Defend against bogus parameterization of join input paths.
An outer join cannot be formed using an input path that is parameterized
by a value that is supposed to be nulled by the outer join.  This is
obviously nonsensical, and it could lead to a bad plan being selected;
although currently it seems that we'll hit various sanity-check
assertions first.

I think that such cases were formerly prevented by the delay_upper_joins
mechanism, but now that that's gone we need an explicit check.

(Perhaps we should avoid generating baserel paths that could
lead to this situation in the first place; but it seems like
having a defense at the join level would be a good idea anyway.)

Richard Guo and Tom Lane, per report from Jaime Casanova

Discussion: https://postgr.es/m/CAJKUy5g2uZRrUDZJ8p-=giwcSHVUn0c9nmdxPSY0jF0Ov8VoEA@mail.gmail.com
2023-06-29 12:12:52 -04:00
Tom Lane 43af714def Fix order of operations in ExecEvalFieldStoreDeForm().
If the given composite datum is toasted out-of-line,
DatumGetHeapTupleHeader will perform database accesses to detoast it.
That can invalidate the result of get_cached_rowtype, as documented
(perhaps not plainly enough) in that function's API spec; which leads
to strange errors or crashes when we try to use the TupleDesc to read
the tuple.  In short then, trying to update a field of a composite
column could fail intermittently if the overall column value is wide
enough to require toasting.

We can fix the bug at no cost by just changing the order of
operations, since we don't need the TupleDesc until after detoasting.
(Other callers of get_cached_rowtype appear to get this right already,
so there's only one bug.)

Note that the added regression test case reveals this bug reliably
only with debug_discard_caches/CLOBBER_CACHE_ALWAYS.

Per bug #17994 from Alexander Lakhin.  Sadly, this patch does not fix
the missing-values issue revealed in the bug discussion; we'll need
some more work to cover that.

Discussion: https://postgr.es/m/17994-5c7100b51b4790e9@postgresql.org
2023-06-29 10:19:10 -04:00
Peter Eisentraut efcf55f8fe Remove inappropriate raw_expression_tree_walker() code
It was walking into the ColumnDef->compression field, which is not a
node but a string.  This code is currently not reachable (because the
compression field is only set in situations that don't go through
raw_expression_tree_walker()), but if it had been, this could have
behaved erratically.
2023-06-29 10:34:53 +02:00
Peter Eisentraut 39a584dc90 Error message wording improvements 2023-06-29 09:14:55 +02:00
Peter Eisentraut 046c8c5c8f Reword error messages for consistency 2023-06-28 19:30:26 +02:00
Michael Paquier fc55c7ff8d Ignore invalid indexes when enforcing index rules in ALTER TABLE ATTACH PARTITION
A portion of ALTER TABLE .. ATTACH PARTITION is to ensure that the
partition being attached to the partitioned table has a correct set of
indexes, so as there is a consistent index mapping between the
partitioned table and its new-to-be partition.  However, as introduced
in 8b08f7d, the current logic could choose an invalid index as a match,
which is something that can exist when dealing with more than two levels
of partitioning, like attaching a partitioned table (that has
partitions, with an index created by CREATE INDEX ON ONLY) to another
partitioned table.

A partitioned index with indisvalid set to false is equivalent to an
incomplete partition tree, meaning that an invalid partitioned index
does not have indexes defined in all its partitions.  Hence, choosing an
invalid partitioned index can create inconsistent partition index trees,
where the parent attaching to is valid, but its partition may be
invalid.

In the report from Alexander Lakhin, this showed up as an assertion
failure when validating an index.  Without assertions enabled, the
partition index tree would be actually broken, as indisvalid should
be switched to true for a partitioned index once all its partitions are
themselves valid.  With two levels of partitioning, the top partitioned
table used a valid index and was able to link to an invalid index stored
on its partition, itself a partitioned table.

I have studied a few options here (like the possibility to switch
indisvalid to false for the parent), but came down to the conclusion
that we'd better rely on a simple rule: invalid indexes had better never
be chosen, so as the partition attached uses and creates indexes that
the parent expects.  Some regression tests are added to provide some
coverage.  Note that the existing coverage is not impacted.

This is a problem since partitioned indexes exist, so backpatch all the
way down to v11.

Reported-by: Alexander Lakhin
Discussion: https://postgr.es/14987634-43c0-0cb3-e075-94d423607e08@gmail.com
Backpatch-through: 11
2023-06-28 15:57:31 +09:00
Michael Paquier 2ecbb0a493 Remove dependency to query text in JumbleQuery()
Since 3db72eb, the query ID of utilities is generated using the Query
structure, making the use of the query string in JumbleQuery()
unnecessary.  This commit removes the argument "querytext" from
JumbleQuery().

Reported-by: Joe Conway
Reviewed-by: Nathan Bossart
Discussion: https://postgr.es/m/ZJlQAWE4COFqHuAV@paquier.xyz
2023-06-28 08:59:36 +09:00
Peter Eisentraut 3f0199da5a Translation updates
Source-Git-URL: https://git.postgresql.org/git/pgtranslation/messages.git
Source-Git-Hash: ab77975e9d2cde44da796c18af3ec1a66f0df7ae
2023-06-26 12:02:02 +02:00
Heikki Linnakangas b491a15f8c Change "..." to cstring in old input/output function comments.
It was not clear what the "..." meant.

Author: Steve Chavez
Discussion: https://www.postgresql.org/message-id/CAGRrpzZzeh7zC3yaVG9di%3DydJ%2BiHwdXnFPB3evGFKvC1zf6ajA@mail.gmail.com
2023-06-26 11:52:31 +03:00
Tom Lane 691594acd6 Check for interrupts and stack overflow in TParserGet().
TParserGet() recurses for some token types, meaning it's possible
to drive it to stack overflow.  Since this is a minority behavior,
I chose to add the check_stack_depth() call to the two places that
recurse rather than doing it during every single call.

While at it, add CHECK_FOR_INTERRUPTS(), because this can run
unpleasantly long for long inputs.

Per bug #17995 from Zuming Jiang.  This is old, so back-patch
to all supported branches.

Discussion: https://postgr.es/m/17995-9f20ff3e6389db4c@postgresql.org
2023-06-24 17:18:08 -04:00
Peter Eisentraut 3ad5f07c0f Error message refactoring
Take some untranslatable things out of the message and replace by
format placeholders, to reduce translatable strings and reduce
translation mistakes.
2023-06-23 16:36:17 +02:00
Nathan Bossart 4dbdb82513 Fix cache lookup hazards introduced by ff9618e82a.
ff9618e82a introduced has_partition_ancestor_privs(), which is used
to check whether a user has MAINTAIN on any partition ancestors.
This involves syscache lookups, and presently this function does
not take any relation locks, so it is likely subject to the same
kind of cache lookup failures that were fixed by 19de0ab23c.

To fix this problem, this commit partially reverts ff9618e82a.
Specifically, it removes the partition-related changes, including
the has_partition_ancestor_privs() function mentioned above.  This
means that MAINTAIN on a partitioned table is no longer sufficient
to perform maintenance commands on its partitions.  This is more
like how privileges for maintenance commands work on supported
versions.  Privileges are checked for each partition, so a command
that flows down to all partitions might refuse to process them
(e.g., if the current user doesn't have MAINTAIN on the partition).

In passing, adjust a few related comments and error messages, and
add a test for the privilege checks for CLUSTER on a partitioned
table.

Reviewed-by: Michael Paquier, Jeff Davis
Discussion: https://postgr.es/m/20230613211246.GA219055%40nathanxps13
2023-06-22 15:48:20 -07:00
Peter Geoghegan 5f0762f147 nbtree VACUUM: cope with topparent inconsistencies.
Avoid "right sibling %u of block %u is not next child" errors when
vacuuming a corrupt nbtree index.  Just LOG the issue and press on.
That way VACUUM will have a decent chance of finishing off all required
processing for the index (and for the table as a whole).

This is similar to recent work from commit 5abff197, as well as work
from commit 5b861baa (later backpatched as commit 43e409ce), which
taught nbtree VACUUM to keep going when its "re-find" check fails.  The
hardening added by this commit takes place directly after the "re-find"
check, right before the critical section for the first stage of page
deletion.

Author: Peter Geoghegan <pg@bowt.ie>
Discussion: https://postgr.es/m/CAH2-Wz=dayg0vjs4+er84TS9ami=csdzjpuiCGbEw=idhwqhzQ@mail.gmail.com
Backpatch: 11- (all supported versions).
2023-06-21 17:41:58 -07:00
Jeff Davis f3a01af29b ICU: do not convert locale 'C' to 'en-US-u-va-posix'.
Older versions of ICU canonicalize "C" to "en-US-u-va-posix"; but
starting in ICU version 64, the "C" locale is considered
obsolete. Postgres commit ea1db8ae70 introduced code to always
canonicalize "C" to "en-US-u-va-posix" for consistency and
convenience, but it was deemed too confusing.

This commit removes that code, so that "C" is treated like other ICU
locale names: canonicalization is attempted, and if it fails, the
behavior is controlled by icu_validation_level.

A similar change was previously committed as f7faa9976c, then reverted
due to an ICU-version-dependent test failure. This commit un-reverts
it, omitting the test because we now expect the behavior to depend on
the version of ICU being used.

Discussion: https://postgr.es/m/3a200aca-4672-4b37-fc91-5d198a323503%40eisentraut.org
Discussion: https://postgr.es/m/f83f089ee1e9acd5dbbbf3353294d24e1f196e95.camel@j-davis.com
Discussion: https://postgr.es/m/37520ec1ae9591f83132f82dbd625f3fc2d69c16.camel@j-davis.com
2023-06-21 13:18:25 -07:00
Tom Lane 555b929bbe Avoid Assert failure when processing empty statement in aborted xact.
exec_parse_message() wants to create a cached plan in all cases,
including for empty input.  The empty-input path does not have
a test for being in an aborted transaction, making it possible
that plancache.c will fail due to trying to do database lookups
even though there's no real work to do.

One solution would be to throw an aborted-transaction error in
this path too, but it's not entirely clear whether the lack of
such an error was intentional or whether some clients might be
relying on non-error behavior.  Instead, let's hack plancache.c
so that it treats empty statements with the same logic it
already had for transaction control commands, ensuring that it
can soldier through even in an already-aborted transaction.

Per bug #17983 from Alexander Lakhin.  Back-patch to all
supported branches.

Discussion: https://postgr.es/m/17983-da4569fcb878672e@postgresql.org
2023-06-21 11:07:24 -04:00
Amit Kapila a734caa25f Fix the errhint message and docs for drop subscription failure.
The existing errhint message and docs were missing the fact that we can't
disassociate from the slot unless the subscription is disabled.

Author: Robert Sjöblom, Peter Smith
Reviewed-by: Peter Eisentraut, Amit Kapila
Backpatch-through: 11
Discussion: https://postgr.es/m/807bdf85-61ea-88e2-5712-6d9fcd4eabff@fortnox.se
2023-06-21 10:36:09 +05:30
Nathan Bossart 5b1a879943 Move bool parameter for vacuum_rel() to option bits.
ff9618e82a introduced the skip_privs parameter, which is used to
skip privilege checks when recursing to a relation's TOAST table.
This parameter should have been added as a flag bit in
VacuumParams->options instead.

Suggested-by: Michael Paquier
Reviewed-by: Michael Paquier, Jeff Davis
Discussion: https://postgr.es/m/ZIj4v1CwqlDVJZfB%40paquier.xyz
2023-06-20 15:14:58 -07:00
Tom Lane 45392626c9 Fix hash join when inner hashkey expressions contain Params.
If the inner-side expressions contain PARAM_EXEC Params, we must
re-hash whenever the values of those Params change.  The executor
mechanism for that exists already, but we failed to invoke it because
finalize_plan() neglected to search the Hash.hashkeys field for
Params.  This allowed a previous scan's hash table to be re-used
when it should not be, leading to rows missing from the join's output.
(I believe incorrectly-included join rows are impossible however,
since checking the real hashclauses would reject false matches.)

This bug is very ancient, dating probably to d24d75ff1 of 7.4.
Sadly, this simple fix depends on the plan representational changes
made by 2abd7ae9b, so it will only work back to v12.  I thought
about trying to make some kind of hack for v11, but I'm leery
of putting code significantly different from what is used in the
newer branches into a nearly-EOL branch.  Seeing that the bug
escaped detection for a full twenty years, problematic cases
must be rare; so I don't feel too awful about leaving v11 as-is.

Per bug #17985 from Zuming Jiang.  Back-patch to v12.

Discussion: https://postgr.es/m/17985-748b66607acd432e@postgresql.org
2023-06-20 17:47:53 -04:00
Tom Lane 3af87736bf Fix another cause of "wrong varnullingrels" planner failures.
I removed the delay_upper_joins mechanism in commit b448f1c8d,
reasoning that it was only needed when we have a single-table
(SELECT ... WHERE) as the immediate RHS child of a left join,
and we could get rid of that by hoisting the WHERE condition into
the parent join's quals.  However that new code missed a case:
we could have "foo LEFT JOIN ((SELECT ... WHERE) LEFT JOIN bar)",
and if the two left joins can be commuted then we now have the
problematic query shape.  We can fix this too easily enough,
by allowing the syntactically-lower left join to pass through
its parent qual location pointer recursively.  That lets
prepjointree.c discard the SELECT by temporarily hoisting the
WHERE condition into the ancestor join's qual.

Per bug #17978 from Zuming Jiang.

Discussion: https://postgr.es/m/17978-12f3d93a55297266@postgresql.org
2023-06-20 11:09:56 -04:00
Tom Lane efeb12ef0b Don't include outer join relids in lateral_relids bitmapsets.
This avoids an assertion failure when outer joins are rearranged
per identity 3.  Listing only the baserels from a PlaceHolderVar's
ph_lateral set should be enough to ensure that the required values
are available when we need to compute the PHV --- it's what we
did before inventing nullingrel sets, after all.  It's a bit
unsatisfying; but with beta2 hard upon us, there's not time to
look for an aesthetically cleaner fix.

Richard Guo and Tom Lane

Discussion: https://postgr.es/m/CAMbWs48Jcw-NvnxT23WiHP324wG44DvzcH1j4hc0Zn+3sR9cfg@mail.gmail.com
2023-06-20 10:29:57 -04:00
Tom Lane 0655c03ef9 Centralize fixups for mismatched nullingrels in nestloop params.
It turns out that the fixes we applied in commits bfd332b3f
and 63e4f13d2 were not nearly enough to solve the problem.
We'd focused narrowly on subquery RTEs with lateral references,
but lateral references can occur in several other RTE kinds
such as function RTEs.  Putting the same hack into half a dozen
code paths seems quite unattractive.  Hence, revert the code changes
(but not the test cases) from those commits and instead solve it
centrally in identify_current_nestloop_params(), as Richard proposed
originally.  This is a bit annoying because it could mask erroneous
nullingrels in nestloop params that are generated from non-LATERAL
parameterized paths; but on balance I don't see a better way.
Maybe at some future time we'll be motivated to find a more rigorous
approach to nestloop params, but that's not happening for beta2.

Richard Guo and Tom Lane

Discussion: https://postgr.es/m/CAMbWs48Jcw-NvnxT23WiHP324wG44DvzcH1j4hc0Zn+3sR9cfg@mail.gmail.com
2023-06-20 10:22:52 -04:00
Tom Lane b334612b8a Pre-beta2 mechanical code beautification.
Run pgindent and pgperltidy.  It seems we're still some ways
away from all committers doing this automatically.  Now that
we have a buildfarm animal that will whine about poorly-indented
code, we'll try to keep the tree more tidy.

Discussion: https://postgr.es/m/3156045.1687208823@sss.pgh.pa.us
2023-06-20 09:50:43 -04:00
Michael Paquier 68cb5af46c Enable archiving in recovery TAP test 009_twophase.pl
This is a follow-up of f663b00, that has been committed to v13 and v14,
tweaking the TAP test for two-phase transactions so as it provides
coverage for the bug that has been fixed.  This change is done in its
own commit for clarity, as v15 and HEAD did not show the problematic
behavior, still missed coverage for it.

While on it, this adds a comment about the dependency of the last
partial segment rename and RecoverPreparedTransactions() at the end of
recovery, as that can be easy to miss.

Author: Michael Paquier
Reviewed-by: Kyotaro Horiguchi
Discussion: https://postgr.es/m/743b9b45a2d4013bd90b6a5cba8d6faeb717ee34.camel@cybertec.at
Backpatch-through: 13
2023-06-20 10:25:27 +09:00
Andres Freund 0d369ac650 fd.c: Retry after EINTR in more places
Starting with 4d330a61bb we can use posix_fallocate() to extend
files. Unfortunately in some situation, e.g. on tmpfs filesystems, EINTR may
be returned. See also 4518c798b2.

To fix, add a retry path to FileFallocate(). In contrast to 4518c798b2 the
amount we extend by is limited and the extending may happen at a high
frequency, so disabling signals does not appear to be the correct path here.

Also add retry paths to other file operations currently lacking them (around
fdatasync(), fsync(), ftruncate(), posix_fadvise(), sync_file_range(),
truncate()) - they are all documented or have been observed to return EINTR.

Even though most of these functions used in the back branches, it does not
seem worth the risk to backpatch - outside of the new-to-16 case of
posix_fallocate() I am not aware of problem reports due to the lack of
retries.

Reported-by: Christoph Berg <myon@debian.org>
Discussion: https://postgr.es/m/ZEZDj1H61ryrmY9o@msg.df7cb.de
Backpatch: -
2023-06-19 14:11:32 -07:00
David Rowley 7fcd7ef2a9 Don't use partial unique indexes for unique proofs in the planner
Here we adjust relation_has_unique_index_for() so that it no longer makes
use of partial unique indexes as uniqueness proofs.  It is incorrect to
use these as the predicates used by check_index_predicates() to set
predOK makes use of not only baserestrictinfo quals as proofs, but also
qual from join conditions.  For relation_has_unique_index_for()'s case, we
need to know the relation is unique for a given set of columns before any
joins are evaluated, so if predOK was only set to true due to some join
qual, then it's unsafe to use such indexes in
relation_has_unique_index_for().  The final plan may not even make use
of that index, which could result in reading tuples that are not as
unique as the planner previously expected them to be.

Bug: #17975
Reported-by: Tor Erik Linnerud
Backpatch-through: 11, all supported versions
Discussion: https://postgr.es/m/17975-98a90c156f25c952%40postgresql.org
2023-06-19 13:00:42 +12:00
Jeff Davis a14e75eb0b CREATE DATABASE: make LOCALE apply to all collation providers.
For CREATE DATABASE, make LOCALE parameter apply regardless of the
provider used. Also affects initdb and createdb --locale arguments.

Previously, LOCALE (and --locale) only affected the database default
collation when using the libc provider.

Discussion: https://postgr.es/m/1a63084d-221e-4075-619e-6b3e590f673e@enterprisedb.com
Reviewed-by: Peter Eisentraut
2023-06-16 10:27:32 -07:00
Amit Langote 160c23b5fa Fix typo in comment.
Back-patch down to 11.

Author: Sho Kato (<kato-sho@fujitsu.com>)
Discussion: https://postgr.es/m/TYCPR01MB68499042A33BC32241193AAF9F5BA%40TYCPR01MB6849.jpnprd01.prod.outlook.com
2023-06-16 10:17:15 +09:00
Tom Lane f4c00d138f When removing a left join, clean out references in EquivalenceClasses.
Since commit b448f1c8d, we've been able to remove left joins
(that are otherwise removable) even when they are underneath
other left joins, a case that was previously prevented by a
delay_upper_joins check.  This is a clear improvement, but
it has a surprising side-effect: it's now possible that there
are EquivalenceClasses whose relid sets mention the removed
baserel and/or outer join.  If we fail to clean those up,
we may drop essential join quals due to not having any join
level that appears to satisfy their relid sets.

(It's not quite 100% clear that this was impossible before.
But the lack of complaints since we added join removal a dozen
years ago strongly suggests that it was impossible.)

Richard Guo and Tom Lane, per bug #17976 from Zuming Jiang

Discussion: https://postgr.es/m/17976-4b638b525e9a983b@postgresql.org
2023-06-15 15:24:50 -04:00
Amit Langote 5c8c8079b0 Remove outdated reference to a removed file
parse_jsontable.c was removed as part of 2f2b18bd3f, though its
mention in src/backend/parser/README was not.  Fix that.

Discussion: https://postgr.es/m/CA%2BHiwqHDzw8AP8p_dEkFr0xg458ZTf58zbivAHhK4UeNrx9Tdg%40mail.gmail.com
2023-06-15 22:35:42 +09:00
Masahiko Sawada a54fc892ad Replace GUC_UNIT_MEMORY|GUC_UNIT_TIME with GUC_UNIT.
We used (GUC_UNIT_MEMORY | GUC_UNIT_TIME) instead of GUC_UNIT some
places but we already define it in guc.h. This commit replaces them
with GUC_UNIT for better consistency with their surrounding code.

Author: Japin Li
Reviewed-by: Richard Guo, Michael Paquier, Masahiko Sawada
Discussion: https://postgr.es/m/MEYP282MB1669EC0FED922F7A151673ACB65AA@MEYP282MB1669.AUSP282.PROD.OUTLOOK.COM
2023-06-15 17:04:19 +09:00
Amit Kapila b5c517379a Fix possible crash in tablesync worker.
Commit c3afe8cf5a added a new password_required option but forgot that you
need database access to check whether an arbitrary role ID is a superuser.

Commit e7e7da2f8d fixed a similar bug in apply worker, and this patch
fixes a similar bug in tablesync worker.

Author: Hou Zhijie
Reviewed-by: Amit Kapila
Discussion: https://postgr.es/m/OS0PR01MB571607F5A9D723755268D36294759@OS0PR01MB5716.jpnprd01.prod.outlook.com
2023-06-15 08:37:48 +05:30
Noah Misch f9f31aa91f Make parseNodeString() C idiom compatible with Visual Studio 2015.
Between v15 and now, this function's "else if" chain grew from 252 lines
to 592 lines, exceeding a compiler limit that manifests as "fatal error
C1026: parser stack overflow, program too complex (compiling source file
src/backend/nodes/readfuncs.c)".  Use "if (...)  return ...;" instead.

Reviewed by Tom Lane, Peter Eisentraut and Michael Paquier.  Not all
reviewers endorse this.

Discussion: https://postgr.es/m/20230607185458.GA1334487@rfd.leadboat.com
2023-06-14 05:31:54 -07:00
Masahiko Sawada 4327f6c748 Fix typo in comment.
Introduced in 4d330a61bb.

Author: Masahiko Sawada
Reviewed-by: Michael Paquier
Discussion: https://postgr.es/m/CAD21AoDg8rTWJkrNJg9UTP89vS8smfib2c55DVqKrCn8zR-GYA@mail.gmail.com
2023-06-14 13:28:41 +09:00
Amit Langote 0f8cfaf892 Retain relkind too in RTE_SUBQUERY entries for views.
47bb9db75 modified the ApplyRetrieveRule()'s conversion of a view's
original RTE_RELATION entry into an RTE_SUBQUERY one to retain relid,
rellockmode, and perminfoindex so that the executor can lock the view
and check its permissions.  It seems better to also retain
relkind for cross-checking that the exception of an
RTE_SUBQUERY entry being allowed to carry relation details only
applies to views, so do so.

Bump catversion because this changes the output format of
RTE_SUBQUERY RTEs.

Suggested-by: David Steele <david@pgmasters.net>
Reviewed-by: David Steele <david@pgmasters.net>
Reviewed-by: Álvaro Herrera <alvherre@alvh.no-ip.org>
Discussion: https://postgr.es/m/3953179e-9540-e5d1-a743-4bef368785b0%40pgmasters.net
2023-06-14 12:00:10 +09:00
Tom Lane 63e4f13d2a Fix "wrong varnullingrels" for Memoize's lateral references, too.
The issue fixed in commit bfd332b3f can also bite Memoize plans,
because of the separate copies of lateral reference Vars made
by paraminfo_get_equal_hashops.  Apply the same hacky fix there.

(In passing, clean up shaky grammar in the existing comments
for this function.)

Richard Guo

Discussion: https://postgr.es/m/CAMbWs4-krwk0Wbd6WdufMAupuou_Ua73ijQ4XQCr1Mb5BaVtKQ@mail.gmail.com
2023-06-13 18:01:33 -04:00
Tom Lane 792213f2e9 Correctly update hasSubLinks while mutating a rule action.
rewriteRuleAction neglected to check for SubLink nodes in the
securityQuals of range table entries.  This could lead to failing
to convert such a SubLink to a SubPlan, resulting in assertion
crashes or weird errors later in planning.

In passing, fix some poor coding in rewriteTargetView:
we should not pass the source parsetree's hasSubLinks
field to ReplaceVarsFromTargetList's outer_hasSubLinks.
ReplaceVarsFromTargetList knows enough to ignore that
when a Query node is passed, but it's still confusing
and bad precedent: if we did try to update that flag
we'd be updating a stale copy of the parsetree.

Per bug #17972 from Alexander Lakhin.  This has been broken since
we added RangeTblEntry.securityQuals (although the presented test
case only fails back to 215b43cdc), so back-patch all the way.

Discussion: https://postgr.es/m/17972-f422c094237847d0@postgresql.org
2023-06-13 15:58:43 -04:00
Andres Freund e3cb1a586c Report stats when replaying XLOG_RUNNING_XACTS
Previously stats in the startup process would only get reported during
shutdown of the startup process. It has been that way for a long time, but
became a lot more noticeable with the new pg_stat_io view, which separates out
IO done by different backend types...

While replaying after every XLOG_RUNNING_XACTS isn't the prettiest approach,
it has the advantage of being quite easy. Given that we're well past feature
freeze...

It's not a problem that we don't report stats more frequently with
wal_level=minimal, in that case stats can't be read before the stats process
has shut down.

Besides the above, this commit also changes pgstat_report_stat() to acquire
the timestamp with GetCurrentTimestamp() instead of
GetCurrentTransactionStopTimestamp().

Thanks to Melih Mutlu, Kyotaro Horiguchi for prototypes of other approaches to
solving this issue.

Reported-by: Fujii Masao <masao.fujii@oss.nttdata.com>
Discussion: https://postgr.es/m/5315aedc-fbca-1556-c5de-dc2e00b23a14@oss.nttdata.com
2023-06-12 15:06:39 -07:00
Tom Lane 7398e27224 Accept fractional seconds in jsonpath's datetime() method.
Commit 927d9abb6 purported to make datetime() accept any string
that could be output for a datetime value by to_jsonb().  But it
overlooked the possibility of fractional seconds being present,
so that cases as simple as to_jsonb(now()) would defeat it.

Fix by adding formats that include ".US" to the list in
executeDateTimeMethod().  (Note that while this is nominally
microseconds, it'll do the right thing for fractions with
fewer than six digits.)

In passing, re-order the list to restore the datatype ordering
specified in its comment.  The violation accidentally did not
break anything; but the next edit might be less lucky, so add
more comments.

Per report from Tim Field.  Back-patch to v13 where datetime()
was added, like the previous patch.

Discussion: https://postgr.es/m/014A028B-5CE6-4FDF-AC24-426CA6FC9CEE@mohiohio.com
2023-06-12 10:54:44 -04:00
Noah Misch 8c7ad6f156 Add win32ver data to meson-built postgres.exe.
As in the older build systems, the resources object is not an input to
postgres.def.

Reviewed by Andres Freund.

Discussion: https://postgr.es/m/20230607231407.GC1334487@rfd.leadboat.com
2023-06-12 07:40:38 -07:00
Noah Misch 04411cbfdb Give postgres.exe the icon of other executables.
We had left it icon-free since users won't achieve much by opening it
from Windows Explorer.  Subsequent to that decision, Task Manager
started to show the icon.  That shifts the balance in favor of attaching
the icon, so do so.  No back-patch, but make this late addition to v16.

Reviewed by Andres Freund and Magnus Hagander.

Discussion: https://postgr.es/m/20230608014507.GD1334487@rfd.leadboat.com
2023-06-12 07:40:38 -07:00
Tom Lane bfd332b3fd Fix "wrong varnullingrels" for subquery nestloop parameters.
If we apply outer join identity 3 when relation C is a subquery
having lateral references to relation B, then the lateral references
within C continue to bear the original syntactically-correct
varnullingrels marks, but that won't match what is available from
the outer side of the nestloop.  Compensate for that in
process_subquery_nestloop_params().  This is a slightly hacky fix,
but we certainly don't want to re-plan C in toto for each possible
outer join order, so there's not a lot of better alternatives.

Richard Guo and Tom Lane, per report from Markus Winand

Discussion: https://postgr.es/m/DFBB2D25-DE97-49CA-A60E-07C881EA59A7@winand.at
2023-06-12 10:01:26 -04:00
Heikki Linnakangas 548d726030 Remove a few unused global variables and declarations.
- Commit 3eb77eba5a, which moved the pending ops queue from md.c to
  sync.c, introduced a duplicate, unused 'pendingOpsCxt'
  variable. (I'm surprised none of the compilers or static analysis
  tools have complained about that.)

- Commit c2fe139c20 moved the 'synchronize_seqscans' variable and
  introduced an extern declaration in tableam.h, making the one in
  guc_tables.c unnecessary.

- Commit 6f0cf87872 removed the 'pgstat_temp_directory' GUC, but
  forgot to remove the corresponding global variable.

- Commit 1b4e729eaa removed the 'pg_krb_realm' GUC, and its global
  variable, but forgot the declaration in auth.h.

Spotted all these by reading the code.
2023-06-12 16:25:37 +03:00
Peter Geoghegan d088ba5a5a nbtree: Allocate new pages in separate function.
Split nbtree's _bt_getbuf function is two: code that read locks or write
locks existing pages remains in _bt_getbuf, while code that deals with
allocating new pages is moved to a new, dedicated function called
_bt_allocbuf.  This simplifies most _bt_getbuf callers, since it is no
longer necessary for them to pass a heaprel argument.  Many of the
changes to nbtree from commit 61b313e4 can be reverted.  This minimizes
the divergence between HEAD/PostgreSQL 16 and earlier release branches.

_bt_allocbuf replaces the previous nbtree idiom of passing P_NEW to
_bt_getbuf.  There are only 3 affected call sites, all of which continue
to pass a heaprel for recovery conflict purposes.  Note that nbtree's
use of P_NEW was superficial; nbtree never actually relied on the P_NEW
code paths in bufmgr.c, so this change is strictly mechanical.

GiST already took the same approach; it has a dedicated function for
allocating new pages called gistNewBuffer().  That factor allowed commit
61b313e4 to make much more targeted changes to GiST.

Author: Peter Geoghegan <pg@bowt.ie>
Reviewed-By: Heikki Linnakangas <hlinnaka@iki.fi>
Discussion: https://postgr.es/m/CAH2-Wz=8Z9qY58bjm_7TAHgtW6RzZ5Ke62q5emdCEy9BAzwhmg@mail.gmail.com
2023-06-10 14:08:25 -07:00