Explain where pqsignal() came from, what problem it originally solved
without assuming the reader is familiar with historical Unixen, why we
still need it, what it does for us now, and the key differences in
frontend code on Windows.
Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi>
Discussion: https://postgr.es/m/CA%2BhUKG%2BRst1h3uo%2BXRgdRVnWHBa4mmj5gFbmCzZr73s-Fh_5JA%40mail.gmail.com
When using GSSAPI encryption in non-blocking mode, libpq sometimes
failed with "GSSAPI caller failed to retransmit all data needing
to be retried". The cause is that pqPutMsgEnd rounds its transmit
request down to an even multiple of 8K, and sometimes that can lead
to not requesting a write of data that was requested to be written
(but reported as not written) earlier. That can upset pg_GSS_write's
logic for dealing with not-yet-written data, since it's possible
the data in question had already been incorporated into an encrypted
packet that we weren't able to send during the previous call.
We could fix this with a one-or-two-line hack to disable pqPutMsgEnd's
round-down behavior, but that seems like making the caller work around
a behavior that pg_GSS_write shouldn't expose in this way. Instead,
adjust pg_GSS_write to never report a partial write: it either
reports a complete write, or reflects the failure of the lower-level
pqsecure_raw_write call. The requirement still exists for the caller
to present at least as much data as on the previous call, but with
the caller-visible write start point not moving there is no temptation
for it to present less. We lose some ability to reclaim buffer space
early, but I doubt that that will make much difference in practice.
This also gets rid of a rather dubious assumption that "any
interesting failure condition (from pqsecure_raw_write) will recur
on the next try". We've not seen failure reports traceable to that,
but I've never trusted it particularly and am glad to remove it.
Make the same adjustments to the equivalent backend routine
be_gssapi_write(). It is probable that there's no bug on the backend
side, since we don't have a notion of nonblock mode there; but we
should keep the logic the same to ease future maintenance.
Per bug #18210 from Lars Kanis. Back-patch to all supported branches.
Discussion: https://postgr.es/m/18210-4c6d0b14627f2eb8@postgresql.org
A WaitEventSet holds file descriptors or event handles (on Windows).
If FreeWaitEventSet is not called, those fds or handles are leaked.
Use ResourceOwners to track WaitEventSets, to clean those up
automatically on error.
This was a live bug in async Append nodes, if a FDW's
ForeignAsyncRequest function failed. (In back branches, I will apply a
more localized fix for that based on PG_TRY-PG_FINALLY.)
The added test doesn't check for leaking resources, so it passed even
before this commit. But at least it covers the code path.
In the passing, fix misleading comment on what the 'nevents' argument
to WaitEventSetWait means.
Report by Alexander Lakhin, analysis and suggestion for the fix by
Tom Lane. Fixes bug #17828.
Reviewed-by: Alexander Lakhin, Thomas Munro
Discussion: https://www.postgresql.org/message-id/472235.1678387869@sss.pgh.pa.us
The copy command formed for initial sync was using parenthesis for tables
with no columns leading to syntax error. This patch avoids adding
parenthesis for such tables.
Reported-by: Justin G
Author: Vignesh C
Reviewed-by: Peter Smith, Amit Kapila
Backpatch-through: 15
Discussion: http://postgr.es/m/18203-df37fe354b626670@postgresql.org
In replorigin_session_setup(), we were needlessly looping for
max_replication_slots even after finding an existing slot for the origin.
This shouldn't hurt us much except for probably large values of
max_replication_slots.
Author: Antonin Houska
Discussion: http://postgr.es/m/2694.1700471273@antos
As written, the query checked for an access method of type 's', which is
not an AM type supported in the core code.
Error introduced by 8586bf7ed8. As this query is not checking what it
should, backpatch all the way down.
Reviewed-by: Aleksander Alekseev
Discussion: https://postgr.es/m/ZVxJkAJrKbfHETiy@paquier.xyz
Backpatch-through: 12
The original code would miscalculate the total number of cells when the
table to print has more than ~4 billion cells, leading to an unnecessary
error. Repair by changing some computations to be 64-bits wide. Add
some necessary overflow checks.
Author: Hongxu Ma <interma@outlook.com>
Discussion: https://postgr.es/m/TYBP286MB0351B057B101C90D7C1239E6B4E2A@TYBP286MB0351.JPNP286.PROD.OUTLOOK.COM
This refactoring reduces the code in charge of creating replication
slots from two "if" block to a single one, making it slightly cleaner.
This change is possible since 1d04a59be3, that has removed the
intermediate code that existed between the two "if" blocks in charge of
initializing the output message buffer.
Author: Peter Smith
Discussion: https://postgr.es/m/CAHut+PtnJzqKT41Zt8pChRzba=QgCqjtfYvcf84NMj3VFJoKfw@mail.gmail.com
This commit log messages (at LOG level when log_replication_commands is
set, otherwise at DEBUG1 level) when walsenders acquire and release
replication slots. These messages help to know the lifetime of a
replication slot - one can know how long a streaming standby, logical
subscriber, or replication slot consumer is down. These messages will be
useful on production servers to debug and analyze inactive replication
slots.
Note that these messages are emitted only for walsenders but not for
backends. This is because walsenders are the ones that typically hold
replication slots for longer durations, unlike backends which hold them
for executing replication related functions.
Author: Bharath Rupireddy
Reviewed-by: Peter Smith, Amit Kapila, Alvaro Herrera
Discussion: http://postgr.es/m/CALj2ACX17G7F-jeLt+7KhJ6YxVeRwR8Zk0rDh4VnT546o0UpTQ@mail.gmail.com
Recent commit f26c2368dc introduced a search_path cache, but left some
potential out-of-memory hazards. Simplify the code and make it safer
against OOM.
This change reintroduces one list_copy(), losing a small amount of the
performance gained in f26c2368dc. A future change may optimize away
the list_copy() again if it can be done in a safer way.
Discussion: https://postgr.es/m/e6fded24cb8a2c53d4ef069d9f69cc7baaafe9ef.camel@j-davis.com
As coded, the start block calculated by BufFileAppend() would overflow
once more than 16k files are used with a default block size. This issue
existed before b1e5c9fa9a, but there's no reason not to be clean about
it.
Per report from Coverity, with a fix suggested by Tom Lane.
The DROP STATISTICS code failed to properly lock the table, leading to
ERROR: tuple concurrently deleted
when executed concurrently with ANALYZE.
Fixed by modifying RemoveStatisticsById() to acquire the same lock as
ANALYZE. This function is called only by DROP STATISTICS, as ANALYZE
calls RemoveStatisticsDataById() directly.
Reported by Justin Pryzby, fix by me. Backpatch through 12. The code was
like this since it was introduced in 10, but older releases are EOL.
Reported-by: Justin Pryzby
Reviewed-by: Tom Lane
Backpatch-through: 12
Discussion: https://postgr.es/m/ZUuk-8CfbYeq6g_u@pryzbyj2023
Commits 146604ec43 and a898b409f6 added overflow checks to
interval_mul(), but not to interval_div(), which contains almost
identical code, and so is susceptible to the same kinds of
overflows. In addition, those checks did not catch all possible
overflow conditions.
Add additional checks to the "cascade down" code in interval_mul(),
and copy all the overflow checks over to the corresponding code in
interval_div(), so that they both generate "interval out of range"
errors, rather than returning bogus results.
Given that these errors are relatively easy to hit, back-patch to all
supported branches.
Per bug #18200 from Alexander Lakhin, and subsequent investigation.
Discussion: https://postgr.es/m/18200-5ea288c7b2d504b1%40postgresql.org
Compute size first, then allocate, then update the structure.
Previously, an out-of-memory when growing could leave the hashtable in
an inconsistent state.
Discussion: https://postgr.es/m/20231117201334.eyb542qr5yk4gilq@awork3.anarazel.de
Reviewed-by: Andres Freund
Reviewed-by: Gurjeet Singh
When there are no indexes on a table, we vacuum each heap block after
pruning it and then update the freespace map. Periodically, we also
vacuum the freespace map. This was done while unnecessarily holding a
lock on the heap page. Release the lock before calling
FreeSpaceMapVacuumRange() and, while we're at it, ensure the range
includes the heap block we just vacuumed.
There are no known deadlocks or other similar issues, therefore don't
backpatch. It's certainly not good to do all this work under a lock, but it's
not frequently reached, making it not worth the risk of backpatching.
Author: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/CAAKRu_YiL%3D44GvGnt1dpYouDSSoV7wzxVoXs8m3p311rp-TVQQ%40mail.gmail.com
examine_simple_variable() left this as an unimplemented case years
ago, with the result that plans for queries involving un-flattened
CTEs might be much stupider than necessary. It's not hard to extend
the existing logic for RTE_SUBQUERY cases to also be able to drill
down into CTEs, so let's do that.
There was some discussion of whether this patch breaks the idea
of a MATERIALIZED CTE being an optimization fence. We concluded
it's okay, because we already allow the outer planner level to
see the estimated width and rowcount of the CTE result, and
letting it see column statistics too seems fairly equivalent.
Basically, what we expect of the optimization fence is that the
outer query should not affect the plan chosen for the CTE query.
Once that plan is chosen, it's okay for the outer planner level
to make use of whatever information we have about it.
Jian Guo and Tom Lane, per complaint from Hans Buschmann
Discussion: https://postgr.es/m/4504e67078d648cdac3651b2960da6e7@nidsa.net
This adds alternative expected files for various tests.
In src/test/regress/sql/password.sql, we make a small change to the
test so that the CREATE ROLE still succeeds even if the ALTER ROLE
that attempts to set a password might fail. That way, the roles are
available for the rest of the test file in either case.
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Discussion: https://www.postgresql.org/message-id/flat/dbbd927f-ef1f-c9a1-4ec6-c759778ac852%40enterprisedb.com
A few places in array_in() and plperl would report a misleading value
(always MAXDIM+1) for the number of dimensions in the input, because
we'd error out as soon as that was clearly too large rather than
scanning the entire input. There doesn't seem to be much value in
offering the true number, at least not enough to justify the extra
complication involved in trying to get it. So just remove that
parenthetical remark. We already have other places that do it
like that, anyway.
Per suggestions from Alexander Lakhin and Heikki Linnakangas.
Discussion: https://postgr.es/m/2794005.1683042087@sss.pgh.pa.us
The code previously relied on "long" as type to track block numbers,
which would be 4 bytes in all Windows builds or any 32-bit builds. This
limited the code to be able to handle up to 16TB of data with the
default block size of 8kB, like during a CLUSTER. This code now relies
on a more portable int64, which should be more than enough for at least
the next 20 years to come.
This issue has been reported back in 2017, but nothing was done about it
back then, so here we go now.
Reported-by: Peter Geoghegan
Reviewed-by: Heikki Linnakangas
Discussion: https://postgr.es/m/CAH2-WznCscXnWmnj=STC0aSa7QG+BRedDnZsP=Jo_R9GUZvUrg@mail.gmail.com
This routine has been marked as NOT_USED since 20ad43b576 from 2000,
and a patch is planned to switch the logtape/tuplestore APIs to rely on
int64 rather than long for the block nunbers, which is more portable.
Keeping it is more confusing than anything at this stage, so let's get
rid of it entirely.
Thanks for Heikki Linnakangas for the poke on this one.
Discussion: https://postgr.es/m/5047be8c-7ee6-4dd5-af76-6c916c3103b4@iki.fi
contain_mutable_functions and contain_volatile_functions give
reliable answers only after expression preprocessing (specifically
eval_const_expressions). Some places understand this, but some did
not get the memo --- which is not entirely their fault, because the
problem is documented only in places far away from those functions.
Introduce wrapper functions that allow doing the right thing easily,
and add commentary in hopes of preventing future mistakes from
copy-and-paste of code that's only conditionally safe.
Two actual bugs of this ilk are fixed here. We failed to preprocess
column GENERATED expressions before checking mutability, so that the
code could fail to detect the use of a volatile function
default-argument expression, or it could reject a polymorphic function
that is actually immutable on the datatype of interest. Likewise,
column DEFAULT expressions weren't preprocessed before determining if
it's safe to apply the attmissingval mechanism. A false negative
would just result in an unnecessary table rewrite, but a false
positive could allow the attmissingval mechanism to be used in a case
where it should not be, resulting in unexpected initial values in a
new column.
In passing, re-order the steps in ComputePartitionAttrs so that its
checks for invalid column references are done before applying
expression_planner, rather than after. The previous coding would
not complain if a partition expression contains a disallowed column
reference that gets optimized away by constant folding, which seems
to me to be a behavior we do not want.
Per bug #18097 from Jim Keener. Back-patch to all supported versions.
Discussion: https://postgr.es/m/18097-ebb179674f22932f@postgresql.org
If the tap_tests option is disabled under Meson, the TAP tests are
currently not registered at all. But this makes it harder to see what
is going on, why suddently there are fewer tests than before.
Instead, run testwrap with an option that marks the test as skipped.
That way, the total list and count of tests is constant whether the
option is enabled or not.
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://www.postgresql.org/message-id/ad5ec96d-69ec-317b-a137-367ea5019b61@eisentraut.org
Currently, pg_stat_reset_shared() cannot reset the counters in the view
pg_stat_slru even if it is a type of shared stats. This patch adds
support for a new value in pg_stat_reset_shared(), called "slru", able
to do that. Note that pg_stat_reset_shared(NULL) also resets SLRU
counters.
There may be a point in removing pg_stat_reset_slru() that was
introduced in 28cac71bd3 (v13~) as the new option overlaps with this
function, but we would lose the ability to reset individual SLRU
counters. This is left for future reconsideration.
Author: Atsushi Torikoshi
Discussion: https://postgr.es/m/e3c25d72e81378e7b64f3c52e0306fc9@oss.nttdata.com
"AS" is added as a suggested keyword for CREATE TABLE for a few query
patterns, including the case where a list of columns is given in
parenthesis.
More queries can be now completed with the keywords supported for
queries in a CTAS, after:
CREATE TABLE [TEMP|TEMPORARY|UNLOGGED] <name> [ (...) ] AS
Author: Gilles Darold
Reviewed-by: Jim Jones
Discussion: https://postgr.es/m/e462b251-99a7-4abc-aedc-214688742c80@darold.net
The fallback implementation of pg_atomic_test_set_flag() that uses
atomic-exchange gives pg_atomic_exchange_u32_impl() an extra
argument. This issue has been present since the introduction of
the atomics API in commit b64d92f1a5.
Reviewed-by: Andres Freund
Discussion: https://postgr.es/m/20231114035439.GA1809032%40nathanxps13
Backpatch-through: 12
As of commit eaa5808e8e, MemoryContextResetAndDeleteChildren() is
just a backwards compatibility macro for MemoryContextReset(). Now
that some time has passed, this macro seems more likely to create
confusion.
This commit removes the macro and replaces all remaining uses with
calls to MemoryContextReset(). Any third-party code that use this
macro will need to be adjusted to call MemoryContextReset()
instead. Since the two have behaved the same way since v9.5, such
adjustments won't produce any behavior changes for all
currently-supported versions of PostgreSQL.
Reviewed-by: Amul Sul, Tom Lane, Alvaro Herrera, Dagfinn Ilmari Mannsåker
Discussion: https://postgr.es/m/20231113185950.GA1668018%40nathanxps13
Alexander reported a crash with repeated create + drop database, after
the ResourceOwner rewrite (commit b8bff07daa). That was fixed by the
previous commit, but it nevertheless seems like a good idea clear
CurrentResourceOwner earlier, because you're not supposed to use it
for anything after we start releasing it.
Reviewed-by: Alexander Lakhin
Discussion: https://www.postgresql.org/message-id/11b70743-c5f3-3910-8e5b-dd6c115ff829%40gmail.com
The comments in dsa.c suggested that areas were owned by resource
owners, but it was not in fact tracked explicitly. The DSM attachments
held by the dsa were owned by resource owners, but not the area
itself. That led to confusion if you used one resource owner to
attach or create the area, but then switched to a different resource
owner before allocating or even just accessing the allocations in the
area with dsa_get_address(). The additional DSM segments associated
with the area would get owned by a different resource owner than the
initial segment. To fix, add an explicit 'resowner' field to
dsa_area. It replaces the 'mapping_pinned' flag; resowner == NULL now
indicates that the mapping is pinned.
This is arguably a bug fix, but I'm not backpatching because it
doesn't seem to be a live bug in the back branches. In 'master', it is
a bug because commit b8bff07daa made ResourceOwners more strict so
that you are no longer allowed to remember new resources in a
ResourceOwner after you have started to release it. Merely accessing a
dsa pointer might need to attach a new DSM segment, and before this
commit it was temporarily remembered in the current owner for a very
brief period even if the DSA was pinned. And that could happen in
AtEOXact_PgStat(), which is called after the owner is already released.
Reported-by: Alexander Lakhin
Reviewed-by: Alexander Lakhin, Thomas Munro, Andres Freund
Discussion: https://www.postgresql.org/message-id/11b70743-c5f3-3910-8e5b-dd6c115ff829%40gmail.com
When search_path is changed to something that was previously set, and
no invalidation happened in between, use the cached list of namespace
OIDs rather than recomputing them. This avoids syscache lookups and
ACL checks.
Important when the search_path changes frequently, such as when set in
proconfig.
An earlier version of this patch was reviewd by Nathan Bossart. This
version simplifies a few things and is safer in case of OOM.
Discussion: https://www.postgresql.org/message-id/abf4ce8804e0e05dff8c1725ae6a8ed28b7d66e0.camel%40j-davis.com
Reviewed-by: Nathan Bossart
Previously, it thought that any plain file located under global, base,
or a tablespace directory had checksums unless it was in a short list
of excluded files. Now, it thinks that files in those directories have
checksums if parse_filename_for_nontemp_relation says that they are
relation files. (Temporary relation files don't matter because they're
excluded from the backup anyway.)
This changes the behavior if you have stray files not managed by
PostgreSQL in the relevant directories. Previously, you'd get some
kind of checksum-related complaint if such files existed, assuming
that the cluster had checksums enabled and that the base backup
wasn't run with NOVERIFY_CHECKSUMS. Now, you won't get those
complaints any more. That seems like an improvement to me, because
those files were presumably not created by PostgreSQL and so there
is no reason to think that they would be checksummed like a
PostgreSQL relation file. (If we want to complain about such files,
we should complain about them existing at all, not just about their
checksums.)
The point of this change is to make the code more consistent.
sendDir() was already calling parse_filename_for_nontemp_relation()
as part of an effort to determine which files to include in the
backup. So, it already had the information about whether a certain
file was a relation file. sendFile() then used a separate method,
embodied in is_checksummed_file(), to make what is essentially
the same determination. It's better not to make the same decision
using two different methods, especially in closely-related code.
Patch by me. Reviewed by Dilip Kumar and Álvaro Herrera. Thanks
also to Jakub Wartak and Peter Eisentraut for comments, suggestions,
and testing on the larger patch set of which this is a part.
Discussion: http://postgr.es/m/CAFiTN-snhaKkWhi2Gz5i3cZeKefun6sYL==wBoqqnTXxX4_mFA@mail.gmail.com
Discussion: http://postgr.es/m/202311141312.u4qx5gtpvfq3@alvherre.pgsql
This adds support for infinity to the interval data type, using the
same input/output representation as the other date/time data types
that support infinity. This allows various arithmetic operations on
infinite dates, timestamps and intervals.
The new values are represented by setting all fields of the interval
to INT32/64_MIN for -infinity, and INT32/64_MAX for +infinity. This
ensures that they compare as less/greater than all other interval
values, without the need for any special-case comparison code.
Note that, since those 2 values were formerly accepted as legal finite
intervals, pg_upgrade and dump/restore from an old database will turn
them from finite to infinite intervals. That seems OK, since those
exact values should be extremely rare in practice, and they are
outside the documented range supported by the interval type, which
gives us a certain amount of leeway.
Bump catalog version.
Joseph Koshakow, Jian He, and Ashutosh Bapat, reviewed by me.
Discussion: https://postgr.es/m/CAAvxfHea4%2BsPybKK7agDYOMo9N-Z3J6ZXf3BOM79pFsFNcRjwA%40mail.gmail.com
To generate a dummy probes.h file when dtrace is not available, we had
two different scripts: A sed version, which is the original version,
and a Perl version, which was generated by s2p. This split was
necessary because Perl was not a mandatory build dependency on Unix,
but sed was not guaranteed to be available on Windows.
(The Meson build system used the sed version even on Windows, which
was probably incorrect and probably would have had to be fixed before
elevating that build system from experimental status.)
As of 721856ff24, Perl is a required build dependency, so this split
is no longer necessary. We can just use the Perl script in all build
environments and remove a whole bunch of infrastructure to keep the
two variants in sync.
The new Gen_dummy_probes.pl is not the version generated by s2p but a
new implementation written by hand by adapting the sed version to Perl
syntax.
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://www.postgresql.org/message-id/3fd0f1bc-4483-4ba9-8aa0-64765b052039@eisentraut.org
pg_stat_reset_slru currently requires an input argument, either:
- NULL to reset the SLRU counters of everything.
- A specific value to reset a single SLRU cache.
This commit adds support for a new pattern: pg_stat_reset_slru without
any argument works the same way as pg_stat_reset_slru(NULL), relying on
a DEFAULT in the function definition to handle this case. This makes
the function more consistent with 23c8c0c8f4.
Bump catalog version.
Author: Bharath Rupireddy
Reviewed-by: Atsushi Torikoshi
Discussion: https://postgr.es/m/CALj2ACW1VizYg01EeH_cA-7qA+4NzWVAoZ5Lw9_XYO1RRHAZbA@mail.gmail.com
checkExtensionMembership() set the DUMP_COMPONENT_SECLABEL and
DUMP_COMPONENT_POLICY flags for extension member objects, even though
we lack any infrastructure for tracking extensions' initial settings
of these properties. This is not OK. The result was that a dump
would always include commands to set these properties for extension
objects that have them, with at least three negative consequences:
1. The restoring user might not have privilege to set these properties
on these objects.
2. The properties might be incorrect/irrelevant for the version of the
extension that's installed in the destination database.
3. The dump itself might fail, in the case of RLS properties attached
to extension tables that the dumping user lacks privilege to LOCK.
(That's because we must get at least AccessShareLock to ensure that
we don't fail while trying to decompile the RLS expressions.)
When and if somebody cares to invent initial-state infrastructure for
extensions' RLS policies and security labels, we could think about
finding another way around problem #3. But in the absence of such
infrastructure, this whole thing is just wrong and we shouldn't do it.
(Note: this applies only to ordinary dumps; binary-upgrade dumps
still dump and restore extension member objects separately, with
all properties.)
Tom Lane and Jacob Champion. Back-patch to all supported branches.
Discussion: https://postgr.es/m/00d46a48-3324-d9a0-49bf-e7f0f11d1038@timescale.com
This was done particularly for geometric data types.
Reported-by: Christoph Berg
Discussion: https://postgr.es/m/YGI8Leuk0WvmNWLr@msg.df7cb.de
Co-authored-by: Kyotaro Horiguchi
Backpatch-through: master
Default privileges are represented as NULL::aclitem[] in catalog ACL
columns, while revoking all privileges leaves an empty aclitem[].
These two cases used to produce identical output in psql meta-commands
like \dp. Using something like "\pset null '(default)'" as a
workaround for spotting the difference did not work, because null
values were always displayed as empty strings by describe.c's
meta-commands.
This patch improves that with two changes:
1. Print "(none)" for empty privileges so that the user is able to
distinguish them from default privileges, even without special
workarounds.
2. Remove the special handling of null values in describe.c,
so that "\pset null" is honored like everywhere else.
(This affects all output from these commands, not only ACLs.)
The privileges shown by \dconfig+ and \ddp as well as the column
privileges shown by \dp are not affected by change #1, because the
respective aclitem[] is reset to NULL or deleted from the catalog
instead of leaving an empty array.
Erik Wienhold and Laurenz Albe
Discussion: https://postgr.es/m/1966228777.127452.1694979110595@office.mailbox.org
Rewrite array_in() and its subroutines so that we make only one
pass over the input text, rather than two. This requires
potentially re-pallocing the working arrays values[] and nulls[]
larger than our initial guess, but that cost will hopefully be made
up by avoiding duplicate parsing. In any case this coding seems
much clearer and more straightforward than what we had before.
This also fixes array_in() to reject non-rectangular input (that is,
different brace depths in different parts of the input) more reliably
than before, and to give a better error message when it does so.
This is analogous to the plpython and plperl fixes in 0553528e7 and
f47004add. Like those PLs, we now accept input such as '{{},{}}'
as a valid representation of an empty array, which we did not before.
Additionally, reject explicit array subscripts that are outside the
integer range (previously you just got whatever atoi() converted
them to), and make some other minor improvements in error reporting.
Although this is arguably a bug fix, it's also a behavioral change
that might trip somebody up, so no back-patch.
Tom Lane, Heikki Linnakangas, and Jian He. Thanks to Alexander Lakhin
for the initial report and for review/testing.
Discussion: https://postgr.es/m/2794005.1683042087@sss.pgh.pa.us
It's clearly stated in the comments that ginFindParents() must keep
the pin on the index's root page that's associated with the topmost
GinBtreeStack item. However, the code path for the case that the
desired downlink has been pushed down to the next index level
ignored this proviso, and would release the pin anyway if we were
still examining the root level. That led to an assertion failure
or "buffer NNNN is not owned by resource owner" error later, when
we try to release the pin again at the end of the insertion.
This is quite hard to reproduce, since it can only happen if an
index root page split occurs concurrently with our own insertion.
Thanks to Jeff Janes for finding a test case that triggers it
often enough to allow investigation.
This has been there since the beginning of GIN, so back-patch
to all supported branches.
Discussion: https://postgr.es/m/CAMkU=1yCAKtv86dMrD__Ja-7KzjE=uMeKX8y__cx5W-OEWy2ow@mail.gmail.com
Commit b7eda3e0e moved XidInMVCCSnapshot() from tqual.c into snapmgr.c,
but follow-up commit c91560def incorrectly updated this reference. We
could fix it, but as pointed out by Daniel Gustafsson, 1) the reader can
easily find the file that contains the definition of that function, e.g.
by grepping, and 2) this kind of reference is prone to going stale; so
let's just remove it.
Back-patch to all supported branches.
Reviewed by Daniel Gustafsson.
Discussion: https://postgr.es/m/CAPmGK145VdKkPBLWS2urwhgsfidbSexwY-9zCL6xSUJH%2BBTUUg%40mail.gmail.com
Commit 00d7fb5e2e started to use REGBUF_NO_CHANGE at a few places in the
code where we register the buffer before marking it dirty but missed
updating one of the code flows in the hash index where we free the overflow
page without any live tuples on it.
Author: Amit Kapila and Hayato Kuroda
Discussion: http://postgr.es/m/f045c8f7-ee24-ead6-3679-c04a43d21351@gmail.com
sendFileWithContent() previously got the content length by using
strlen(), assuming that the content given is always a string. Some
patches are under discussion to pass binary contents to a base backup
stream, where an arbitrary length needs to be given by the caller
instead.
The patch extends sendFileWithContent() to be able to handle this case,
where len < 0 can be used to indicate an arbitrary length rather than
rely on strlen() for the content length.
A comment in sendFileWithContent() mentioned the backup_label file.
However, this routine is used by more file types, like the tablespace
map, so adjust it in passing.
Author: David Steele
Discussion: https://postgr.es/m/2daf8adc-8db7-4204-a7f2-a7e94e2bfa4b@pgmasters.net
Currently, pg_stat_reset_shared() can use an argument to specify the
target of statistics to reset, doing nothing for NULL as it is strict.
This patch adds to pg_stat_reset_shared() the possibility to reset all
the stats types already handled in this function rather than do nothing
if the argument value given is NULL or if nothing is specified
(proisstrict is switched to false). Like previously, SLRUs are not
included in what gets reset.
The idea to use NULL or no argument to control if all the shared stats
already covered by this function should be reset has been proposed by
Andres Freund.
Bump catalog version.
Author: Atsushi Torikoshi
Reviewed-by: Kyotaro Horiguchi, Michael Paquier, Bharath Rupireddy,
Matthias van de Meent
Discussion: https://postgr.es/m/4291a55137ddda77cf7cc5f46e846daf@oss.nttdata.com
Three queries did not consider partitioned indexes and tables, and
surrounding comments have not been updated in a while.
Like 4b9fbd6be4, this is only cosmetic currently as no such relkinds
exist at this stage of the regression tests, but running these queries
on existing clusters could lead to incorrect results.
Author: Jian He, Michael Paquier
Discussion: https://postgr.es/m/CACJufxGsB1ciahkNDccyxhw-Pfp_-_y+Wx+1BOdRyVVxKojAbg@mail.gmail.com
It seems that a PHV evaluated/needed at or below the self join should not have
a problem if we remove the self join. But this requires further investigation.
For now, we just do not remove self joins if the rel to be removed is laterally
referenced by PHVs.
Discussion: https://postgr.es/m/CAMbWs4-ns73VF9gi37q61G3dS6Xuos+HtryMaBh37WQn=BsaJw@mail.gmail.com
Author: Richard Guo
We don't want existing slots in the old cluster to get invalidated during
the upgrade. During an upgrade, we set this variable to -1 via the command
line in an attempt to prevent such invalidations, but users have ways to
override it. This patch ensures the value is not overridden by the user.
Author: Kyotaro Horiguchi
Reviewed-by: Peter Smith, Alvaro Herrera
Discussion: http://postgr.es/m/20231027.115759.2206827438943188717.horikyota.ntt@gmail.com
We can simplify FieldSelect on a whole-row Var into a plain Var
for the selected field. However, we should copy the whole-row Var's
varnullingrels when we do so, because the new Var is clearly nullable
by exactly the same rels as the original. Failure to do this led to
errors like "wrong varnullingrels (b) (expected (b 3)) for Var 2/2".
Richard Guo, per bug #18184 from Marian Krucina. Back-patch to
v16 where varnullingrels was introduced.
Discussion: https://postgr.es/m/18184-5868dd258782058e@postgresql.org
When casting an interval to a time, the original code suffered from
64-bit integer overflow for inputs with a sufficiently large negative
"time" field, leading to bogus results.
Fix by rewriting the algorithm in a simpler form, that more obviously
cannot overflow. While at it, improve the test coverage to include
negative interval inputs.
Discussion: https://postgr.es/m/CAEZATCXoUKHkcuq4q63hkiPsKZJd0kZWzgKtU%2BNT0aU4wbf_Pw%40mail.gmail.com
When executing a MERGE UPDATE action, if the UPDATE is turned into a
cross-partition DELETE then INSERT, do not attempt to invoke AFTER
UPDATE ROW triggers, or any of the other post-update actions in
ExecUpdateEpilogue().
For consistency with a plain UPDATE command, such triggers should not
be fired (and typically fail anyway), and similarly, other post-update
actions, such as WCO/RLS checks should not be executed, and might also
lead to unexpected failures.
Therefore, as with ExecUpdate(), make ExecMergeMatched() return
immediately if ExecUpdateAct() reports that a cross-partition update
was done, to be sure that no further processing is done for that
tuple.
Back-patch to v15, where MERGE was introduced.
Discussion: https://postgr.es/m/CAEZATCWjBgagyNZs02vgDF0DvASYj-iHTFtXG2-nP3orZhmtcw%40mail.gmail.com
When computing "0 - INT64_MIN", most platforms would report an
overflow error, which is correct. However, platforms without integer
overflow builtins or 128-bit integers would fail to spot the overflow,
and incorrectly return INT64_MIN.
Back-patch to all supported branches.
Patch be me. Thanks to Jian He for initial investigation, and Laurenz
Albe and Tom Lane for review.
Discussion: https://postgr.es/m/CAEZATCUNK-AZSD0jVdgkk0N%3DNcAXBWeAEX-QU9AnJPensikmdQ%40mail.gmail.com
When the hash table is in use, ResoureOwnerSort() moves any elements
from the small fixed-size array to the hash table, and sorts it. When
the hash table is not in use, it sorts the elements in the small
fixed-size array directly. However, ResourceOwnerSort() and
ResourceOwnerReleaseAll() had different idea on when the hash table is
in use: ResourceOwnerSort() checked owner->nhash != 0, and
ResourceOwnerReleaseAll() checked owner->hash != NULL. If the hash
table was allocated but was currently empty, you hit an assertion
failure.
Reported-by: Alexander Lakhin <exclusion@gmail.com>
Discussion: https://www.postgresql.org/message-id/be58d565-9e95-d417-4e47-f6bd408dea4b@gmail.com
Commit b0e96f3119 introduced a bunch of recursive functions, but
failed to make them check for stack depth. This can cause the backend
to crash when operating on inheritance hierarchies several thousands
deep. Protect the code by adding the missing stack depth checks.
Reported-by: Alexander Lakhin <exclusion@gmail.com>
Discussion: https://postgr.es/m/b2ac2392-9727-5f76-e890-721ac80c1615@gmail.com
When PQsendFlushRequest() was added by commit 69cf1d5429, we argued
against adding a PQflush() call in it[1]. This is still the right
decision: if the user wants a flush to occur, they can just call that.
However, we failed to realize that the message bytes could still be
given to the kernel for transmitting when this can be made without
blocking. That's what pqPipelineFlush() does, and it is done for every
single other message type sent by libpq, so do that.
(When the socket is in blocking mode this may indeed block, but that's
what all the other libpq message-sending routines do, too.)
[1] https://www.postgresql.org/message-id/202106252352.5ca4byasfun5%40alvherre.pgsql
Author: Jelte Fennema-Nio <postgres@jeltef.nl>
Discussion: https://postgr.es/m/CAGECzQTxZRevRWkKodE-SnJk1Yfm4eKT+8E4Cyq3MJ9YKTnNew@mail.gmail.com
This buys back some of the performance loss that we otherwise saw from the
previous commit.
Reviewed-by: Aleksander Alekseev, Michael Paquier, Julien Rouhaud
Reviewed-by: Kyotaro Horiguchi, Hayato Kuroda, Álvaro Herrera, Zhihong Yu
Reviewed-by: Peter Eisentraut, Andres Freund
Discussion: https://www.postgresql.org/message-id/d746cead-a1ef-7efe-fb47-933311e876a3%40iki.fi
Instead of having a separate array/hash for each resource kind, use a
single array and hash to hold all kinds of resources. This makes it
possible to introduce new resource "kinds" without having to modify
the ResourceOwnerData struct. In particular, this makes it possible
for extensions to register custom resource kinds.
The old approach was to have a small array of resources of each kind,
and if it fills up, switch to a hash table. The new approach also uses
an array and a hash, but now the array and the hash are used at the
same time. The array is used to hold the recently added resources, and
when it fills up, they are moved to the hash. This keeps the access to
recent entries fast, even when there are a lot of long-held resources.
All the resource-specific ResourceOwnerEnlarge*(),
ResourceOwnerRemember*(), and ResourceOwnerForget*() functions have
been replaced with three generic functions that take resource kind as
argument. For convenience, we still define resource-specific wrapper
macros around the generic functions with the old names, but they are
now defined in the source files that use those resource kinds.
The release callback no longer needs to call ResourceOwnerForget on
the resource being released. ResourceOwnerRelease unregisters the
resource from the owner before calling the callback. That needed some
changes in bufmgr.c and some other files, where releasing the
resources previously always called ResourceOwnerForget.
Each resource kind specifies a release priority, and
ResourceOwnerReleaseAll releases the resources in priority order. To
make that possible, we have to restrict what you can do between
phases. After calling ResourceOwnerRelease(), you are no longer
allowed to remember any more resources in it or to forget any
previously remembered resources by calling ResourceOwnerForget. There
was one case where that was done previously. At subtransaction commit,
AtEOSubXact_Inval() would handle the invalidation messages and call
RelationFlushRelation(), which temporarily increased the reference
count on the relation being flushed. We now switch to the parent
subtransaction's resource owner before calling AtEOSubXact_Inval(), so
that there is a valid ResourceOwner to temporarily hold that relcache
reference.
Other end-of-xact routines make similar calls to AtEOXact_Inval()
between release phases, but I didn't see any regression test failures
from those, so I'm not sure if they could reach a codepath that needs
remembering extra resources.
There were two exceptions to how the resource leak WARNINGs on commit
were printed previously: llvmjit silently released the context without
printing the warning, and a leaked buffer io triggered a PANIC. Now
everything prints a WARNING, including those cases.
Add tests in src/test/modules/test_resowner.
Reviewed-by: Aleksander Alekseev, Michael Paquier, Julien Rouhaud
Reviewed-by: Kyotaro Horiguchi, Hayato Kuroda, Álvaro Herrera, Zhihong Yu
Reviewed-by: Peter Eisentraut, Andres Freund
Discussion: https://www.postgresql.org/message-id/cbfabeb0-cd3c-e951-a572-19b365ed314d%40iki.fi
These are functions where a lot of things happen between the
ResourceOwnerEnlarge and ResourceOwnerRemember calls. It's important
that there are no unrelated ResourceOwnerRemember calls in the code in
between, otherwise the reserved entry might be used up by the
intervening ResourceOwnerRemember and not be available at the intended
ResourceOwnerRemember call anymore. I don't see any bugs here, but the
longer the code path between the calls is, the harder it is to verify.
In bufmgr.c, there is a function similar to ResourceOwnerEnlarge,
ReservePrivateRefCountEntry(), to ensure that the private refcount
array has enough space. The ReservePrivateRefCountEntry() calls were
made at different places than the ResourceOwnerEnlargeBuffers()
calls. Move the ResourceOwnerEnlargeBuffers() and
ReservePrivateRefCountEntry() calls together for consistency.
Reviewed-by: Aleksander Alekseev, Michael Paquier, Julien Rouhaud
Reviewed-by: Kyotaro Horiguchi, Hayato Kuroda, Álvaro Herrera, Zhihong Yu
Reviewed-by: Peter Eisentraut, Andres Freund
Discussion: https://www.postgresql.org/message-id/cbfabeb0-cd3c-e951-a572-19b365ed314d%40iki.fi
`openssl` is an optional dependency in the meson build as it may not be
installed in an environment even if SSL libraries are around. The meson
scripts assume that, but the SSL tests thought that it was a hard
dependency, causing a meson installation to fail if `openssl` could not
be found. Like similar tests that depend on external commands, and to
be consistent with ./configure for the SSL tests, this commit makes the
command existence optional in the tests.
Author: Tristan Partin
Discussion: https://postgr.es/m/CWSX6P5OUUM5.N7B74KQ06ZP6@neon.tech
Backpatch-through: 16
false_positive_rate is a parameter that can be set with the bloom
opclass in BRIN, and setting it to a value of exactly 0.25 would trigger
an assertion in the first INSERT done on the index with value set.
The assertion changed here relied on BLOOM_{MIN|MAX}_FALSE_POSITIVE_RATE
that are somewhat arbitrary values, and specifying an out-of-range value
would also trigger a failure when defining such an index. So, as-is,
the assertion was just doubling on the min-max check of the reloption.
This is now enlarged to check that it is a correct percentage value,
instead, based on a suggestion by Tom Lane.
Author: Alexander Lakhin
Reviewed-by: Tom Lane, Shihao Zhong
Discussion: https://postgr.es/m/17969-a6c54de48026d694@postgresql.org
Backpatch-through: 14
I added it by mistake in commit 7103ebb7aa. To clean up, struct
MergeAction needs to be moved to primnodes.h from parsenodes.h. (This
forces us to also move OverridingKind to primnodes.h).
Having to add parsenodes.h to bootstrap.h as fallout is a bit
surprising, since nothing nominally needs it there. However, per
comments in bootscanner.l, it is needed so that YYSTYPE can be declared.
I think this only started with commit dac048f71e, but I didn't
actually verify that.
In passing, stop including parsenodes.h in tcopprot.h. Nothing needs it
there.
Per discussion on a patch by Ashutosh Bapat.
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/202311071106.6y7b2ascqjlz@alvherre.pgsql
f0efa5aec introduced the concept of "read-only" StringInfos which makes
use of an existing, possibly not NUL terminated, buffer.
Here we adjust two places that make use of StringInfos to receive data
to avoid using appendBinaryStringInfo() in cases where a NUL termination
character is not required. This saves a possible palloc() and saves
having to needlessly memcpy() from one buffer to another.
Here we adjust two places which were using appendBinaryStringInfo().
Neither of these cases seem particularly performance-critical. In the
case of XLogWalRcvProcessMsg(), the appendBinaryStringInfo() was only
appending 24 bytes. The change made here does mean that we can get rid
of the incoming_message global variable and make that local instead.
The apply_spooled_messages() case applies in logical decoding when
applying (possibly large) changes which have been serialized to a file.
Reviewed-by: Amit Kapila
Discussion: https://postgr.es/m/CAApHDvoxYUDHwqPf-ShvchsERf1RzmkGoLwg63JNvHCkDCuyKQ@mail.gmail.com
array_set_element() and related functions allow an array to be
enlarged by assigning to subscripts outside the current array bounds.
While these places were careful to check that the new bounds are
allowable, they neglected to consider the risk of integer overflow
in computing the new bounds. In edge cases, we could compute new
bounds that are invalid but get past the subsequent checks,
allowing bad things to happen. Memory stomps that are potentially
exploitable for arbitrary code execution are possible, and so is
disclosure of server memory.
To fix, perform the hazardous computations using overflow-detecting
arithmetic routines, which fortunately exist in all still-supported
branches.
The test cases added for this generate (after patching) errors that
mention the value of MaxArraySize, which is platform-dependent.
Rather than introduce multiple expected-files, use psql's VERBOSITY
parameter to suppress the printing of the message text. v11 psql
lacks that parameter, so omit the tests in that branch.
Our thanks to Pedro Gallegos for reporting this problem.
Security: CVE-2023-5869
transformAggregateCall() captures the datatypes of the aggregate's
arguments immediately to construct the Aggref.aggargtypes list.
This seems reasonable because the arguments have already been
transformed --- but there is an edge case where they haven't been.
Specifically, if we have an unknown-type literal in an ANY argument
position, nothing will have been done with it earlier. But if we
also have DISTINCT, then addTargetToGroupList() converts the literal
to "text" type, resulting in the aggargtypes list not matching the
actual runtime type of the argument. The end result is that the
aggregate tries to interpret a "text" value as being of type
"unknown", that is a zero-terminated C string. If the text value
contains no zero bytes, this could result in disclosure of server
memory following the text literal value.
To fix, move the collection of the aggargtypes list to the end
of transformAggregateCall(), after DISTINCT has been handled.
This requires slightly more code, but not a great deal.
Our thanks to Jingzhou Fu for reporting this problem.
Security: CVE-2023-5868
A PostgreSQL release tarball contains a number of prebuilt files, in
particular files produced by bison, flex, perl, and well as html and
man documentation. We have done this consistent with established
practice at the time to not require these tools for building from a
tarball. Some of these tools were hard to get, or get the right
version of, from time to time, and shipping the prebuilt output was a
convenience to users.
Now this has at least two problems:
One, we have to make the build system(s) work in two modes: Building
from a git checkout and building from a tarball. This is pretty
complicated, but it works so far for autoconf/make. It does not
currently work for meson; you can currently only build with meson from
a git checkout. Making meson builds work from a tarball seems very
difficult or impossible. One particular problem is that since meson
requires a separate build directory, we cannot make the build update
files like gram.h in the source tree. So if you were to build from a
tarball and update gram.y, you will have a gram.h in the source tree
and one in the build tree, but the way things work is that the
compiler will always use the one in the source tree. So you cannot,
for example, make any gram.y changes when building from a tarball.
This seems impossible to fix in a non-horrible way.
Second, there is increased interest nowadays in precisely tracking the
origin of software. We can reasonably track contributions into the
git tree, and users can reasonably track the path from a tarball to
packages and downloads and installs. But what happens between the git
tree and the tarball is obscure and in some cases non-reproducible.
The solution for both of these issues is to get rid of the step that
adds prebuilt files to the tarball. The tarball now only contains
what is in the git tree (*). Getting the additional build
dependencies is no longer a problem nowadays, and the complications to
keep these dual build modes working are significant. And of course we
want to get the meson build system working universally.
This commit removes the make distprep target altogether. The make
dist target continues to do its job, it just doesn't call distprep
anymore.
(*) - The tarball also contains the INSTALL file that is built at make
dist time, but not by distprep. This is unchanged for now.
The make maintainer-clean target, whose job it is to remove the
prebuilt files in addition to what make distclean does, is now just an
alias to make distprep. (In practice, it is probably obsolete given
that git clean is available.)
The following programs are now hard build requirements in configure
(they were already required by meson.build):
- bison
- flex
- perl
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://www.postgresql.org/message-id/flat/e07408d9-e5f2-d9fd-5672-f53354e9305e@eisentraut.org
It was always false in single-user mode, in autovacuum workers, and in
background workers. This had no specifically-identified security
consequences, but non-core code or future work might make it
security-relevant. Back-patch to v11 (all supported versions).
Jelte Fennema-Nio. Reported by Jelte Fennema-Nio.
Documentation says it cannot signal "a backend owned by a superuser".
On the contrary, it could signal background workers, including the
logical replication launcher. It could signal autovacuum workers and
the autovacuum launcher. Block all that. Signaling autovacuum workers
and those two launchers doesn't stall progress beyond what one could
achieve other ways. If a cluster uses a non-core extension with a
background worker that does not auto-restart, this could create a denial
of service with respect to that background worker. A background worker
with bugs in its code for responding to terminations or cancellations
could experience those bugs at a time the pg_signal_backend member
chooses. Back-patch to v11 (all supported versions).
Reviewed by Jelte Fennema-Nio. Reported by Hemanth Sandrana and
Mahendrakar Srinivasarao.
Security: CVE-2023-5870
This function implements the standard XMLTest function, which
converts text into xml text nodes. It uses the libxml2 function
xmlEncodeSpecialChars to escape predefined entities (&"<>), so
that those do not cause any conflict when concatenating the text
node output with existing xml documents.
This also adds a note in features.sgml about not supporting
XML(SEQUENCE). The SQL specification defines a RETURNING clause
to a set of XML functions, where RETURNING CONTENT or RETURNING
SEQUENCE can be defined. Since PostgreSQL doesn't support
XML(SEQUENCE) all of these functions operate with an
implicit RETURNING CONTENT.
Author: Jim Jones <jim.jones@uni-muenster.de>
Reviewed-by: Vik Fearing <vik@postgresfriends.org>
Discussion: https://postgr.es/m/86617a66-ec95-581f-8d54-08059cca8885@uni-muenster.de
pg_resetwal had poor test coverage. There are some TAP tests, but
they all run with -n, so they don't actually test the full
functionality. (There is a non-dry-run call of pg_resetwal in the
recovery test suite, but that is incidental.)
This adds a bunch of more tests to test all the different options and
scenarios.
Reviewed-by: Aleksander Alekseev <aleksander@timescale.com>
Discussion: https://www.postgresql.org/message-id/flat/0f3ab4a1-ae80-56e8-3426-6b4a02507687@eisentraut.org
On Windows, GetDataDirectoryCreatePerm() just did nothing. The way
the code in some callers is structured, this is the first function
that tries to access the data directory. So it also ends up the place
that is responsible for reporting that a data directory does not exist
or similar. Therefore, on Windows, these scenarios end up on
potentially completely different code paths.
To unify this, to make testing more consistent across platforms, have
GetDataDirectoryCreatePerm() run the stat() call on Windows as well,
even though it won't do anything with the result. That way, file
system errors are reporting to callers in the same way as on
non-Windows.
Reviewed-by: Aleksander Alekseev <aleksander@timescale.com>
Discussion: https://www.postgresql.org/message-id/15a59bca-0383-183c-9383-0446da9b87e1%40eisentraut.org
recurse_set_operations() uses the parse tree for the group number estimation,
because of the "varno 0" hack. At the same time 2489d76c49 made root->parse
and corresponding parent_root->simple_rte_array[]->subquery distinct copies
of the parse tree, while d3d55ce571 introduced self-join removal replacing
relid of removed relation only in one of the copies.
The present commit fixes this bug by making recurse_set_operations() call
estimate_num_groups() with the copy of the parse tree processed by self-join
removal.
In future, we may think about maintaining just one copy of the parse tree
and/or keeping removed relids as aliases.
Reported-by: Zuming Jiang
Bug: #18170
Discussion: https://postgr.es/m/flat/18170-f1d17bf9a0d58b24%40postgresql.org
Author: Richard Guo, Alexander Korotkov
Reviewed-by: Andrei Lepikhov
As per the preceding commit, GUC APIs generally expose NULL-valued
string variables as empty strings. Extend that policy to
GetConfigOption() and GetConfigOptionResetString(), eliminating
a crash hazard for unwary callers, as well as a fundamental
ambiguity in GetConfigOption()'s API.
No back-patch, since this is an API change and conceivably somebody
somewhere is depending on this corner case.
Xing Guo, Aleksander Alekseev, Tom Lane
Discussion: https://postgr.es/m/CACpMh+AyDx5YUpPaAgzVwC1d8zfOL4JoD-uyFDnNSa1z0EsDQQ@mail.gmail.com
get_explain_guc_options() crashed if a string GUC marked GUC_EXPLAIN
has a NULL boot_val. Nosing around found a couple of other places
that seemed insufficiently cautious about NULL string values, although
those are likely unreachable in practice. Add some commentary
defining the expectations for NULL values of string variables,
in hopes of forestalling future additions of more such bugs.
Xing Guo, Aleksander Alekseev, Tom Lane
Discussion: https://postgr.es/m/CACpMh+AyDx5YUpPaAgzVwC1d8zfOL4JoD-uyFDnNSa1z0EsDQQ@mail.gmail.com
Introduce unicode_version(), icu_unicode_version(), and
unicode_assigned().
The latter requires introducing a new lookup table for the Unicode
General Category, which is generated along with the other Unicode
lookup tables.
Discussion: https://postgr.es/m/CA+TgmoYzYR-yhU6k1XFCADeyj=Oyz2PkVsa3iKv+keM8wp-F_A@mail.gmail.com
Reviewed-by: Peter Eisentraut
The logical replication launcher may start apply workers during an
upgrade. This could be the cause of corruptions on a new cluster if
these are able to apply changes before the physical files are copied
over to the new cluster.
The chance of being able to do so is small as pg_upgrade uses its own
port and unix domain directory (the latter is customizable with
--socketdir), but just preventing the launcher to start is safer at the
end, because we are then sure that no changes will be applied. Like
29d0a77fa6 for max_slot_wal_keep_size, this is only set when a cluster
uses v17 or newer.
Author: Vignesh C
Discussion: https://postgr.es/m/CALDaNm2g9ZKf=y8X6z6MsLCuh8WwU-=Q6pLj35NFi2M5BZNS_A@mail.gmail.com
The test missed that custom GUCs need to be ignored from the list of
parameters that can exist in postgresql.conf.sample. This caused the
test to fail on a server where such a module is loaded, when using
EXTRA_INSTALL and TEMP_CONFIG, for instance.
Author: Anton A. Melnikov
Discussion: https://postgr.es/m/fc5509ce-5144-4dac-8d13-21793da44fc5@postgrespro.ru
Backpatch-through: 15
pgstatindex failed with ERRCODE_DATA_CORRUPTED, of the "can't-happen"
class XX. The other functions succeeded on an empty index; they might
have malfunctioned if the failed index build left torn I/O or other
complex state. Report an ERROR in statistics functions pgstatindex,
pgstatginindex, pgstathashindex, and pgstattuple. Report DEBUG1 and
skip all index I/O in maintenance functions brin_desummarize_range,
brin_summarize_new_values, brin_summarize_range, and
gin_clean_pending_list. Back-patch to v11 (all supported versions).
Discussion: https://postgr.es/m/20231001195309.a3@google.com
When beginning recovery, a LOG is displayed by the startup process to
show which recovery mode will be used depending on the .signal file(s)
set in the data folder, like "standby mode", recovery up to a given
target type and value, or archive recovery.
A different patch is under discussion to simplify the startup code by
requiring the presence of recovery.signal and/or standby.signal when a
backup_label file is read. Delaying a bit this LOG ensures that the
correct recovery mode would be reported, and putting it at this position
does not make it lose its value.
While on it, this commit adds a few comments documenting a bit more the
initial recovery steps and their dependencies, and fixes an incorrect
comment format. This introduces no behavior changes.
Extracted from a larger patch by me.
Reviewed-by: David Steele, Bowen Shi
Discussion: https://postgr.es/m/ZArVOMifjzE7f8W7@paquier.xyz
When beginning recovery from a base backup by reading a backup_label
file, it may be possible that no checkpoint record is available
depending on the method used when the case backup was taken, which would
prevent recovery from beginning. In this case, the FATAL messages
issued, initially added by c900c15269, mentioned recovery.signal as
an option to do recovery but not standby.signal. Let's add it as an
available option, for clarity.
Per suggestion from Bowen Shi, extracted from a larger patch by me.
Author: Michael Paquier
Discussion: https://postgr.es/m/CAM_vCudkSjr7NsNKSdjwtfAm9dbzepY6beZ5DP177POKy8=2aw@mail.gmail.com
Historically, the statistics of the checkpointer have been always part
of pg_stat_bgwriter. This commit removes a few columns from
pg_stat_bgwriter, and introduces pg_stat_checkpointer with equivalent,
renamed columns (plus a new one for the reset timestamp):
- checkpoints_timed -> num_timed
- checkpoints_req -> num_requested
- checkpoint_write_time -> write_time
- checkpoint_sync_time -> sync_time
- buffers_checkpoint -> buffers_written
The fields of PgStat_CheckpointerStats and its SQL functions are renamed
to match with the new field names, for consistency. Note that
background writer and checkpointer have been split into two different
processes in commits 806a2aee37 and bf405ba8e4. The pgstat
structures were already split, making this change straight-forward.
Bump catalog version.
Author: Bharath Rupireddy
Reviewed-by: Bertrand Drouvot, Andres Freund, Michael Paquier
Discussion: https://postgr.es/m/CALj2ACVxX2ii=66RypXRweZe2EsBRiPMj0aHfRfHUeXJcC7kHg@mail.gmail.com
This commit refactors find_tabstat_entry() so as transaction counters
for inserted, updated and deleted tuples are included in the result
returned. If a shared entry is found for a relation, its result is now
a copy of the PgStat_TableStatus entry retrieved from shared memory.
This idea has been proposed by Andres Freund.
While on it, the following SQL functions, used in system views, are
refactored with macros, in the same spirit as 83a1a1b566, reducing the
amount of code:
- pg_stat_get_xact_tuples_deleted()
- pg_stat_get_xact_tuples_inserted()
- pg_stat_get_xact_tuples_updated()
There is now only one caller of find_tabstat_entry() in the tree.
Author: Bertrand Drouvot
Discussion: https://postgr.es/m/b9e1f543-ee93-8168-d530-d961708ad9d3@gmail.com
Handling of login trigger FATAL error could cause a timing-dependant panic of
IPC::Run. This commit excludes checks which involves handling of such errors.
Reported-by: Tom Lane
Discussion: https://postgr.es/m/2268825.1698618066%40sss.pgh.pa.us
Among numerous other oversights, commit 482675987 neglected to fix
pg_dump to dump this new subscription option. Since the new default
is "false" while the previous behavior corresponds to "true", this
would cause legacy subscriptions to silently change behavior during
dump/reload or pg_upgrade. That seems like a bad idea. Even if it
was intended, failing to preserve the option once set in a new
installation is certainly not OK.
While here, reorder associated stanzas in pg_dump to match the
field order in pg_subscription, in hopes of reducing the impression
that all this code was written with the aid of a dartboard.
Back-patch to v16 where this new field was added.
Philip Warner (cosmetic tweaks by me)
Discussion: https://postgr.es/m/20231027042539.01A3A220F0A@thebes.rime.com.au
The original code did very little to guard against integer or floating
point overflow when computing the interval's fields. Detect any such
overflows and error out, rather than silently returning bogus results.
Joseph Koshakow, reviewed by Ashutosh Bapat and me.
Discussion: https://postgr.es/m/CAAvxfHcm1TPwH_zaGWuFoL8pZBestbRZTU6Z%3D-RvAdSXTPbKfg%40mail.gmail.com
When looping around after finding that the set-returning function
returned zero rows for the current input tuple, ExecProjectSet
neglected to reset either of the two memory contexts it's
responsible for cleaning out. Typically this wouldn't cause much
problem, because once the SRF does return at least one row, the
contexts would get reset on the next call. However, if the SRF
returns no rows for many input tuples in succession, quite a lot
of memory could be transiently consumed.
To fix, make sure we reset both contexts while looping around.
Per bug #18172 from Sergei Kornilov. Back-patch to all supported
branches.
Discussion: https://postgr.es/m/18172-9b8c5fc1d676ded3@postgresql.org
Debian recently decided to split out a bunch of "obsolete" timezone
names into a new tzdata-legacy package, which isn't installed by
default. One of these zone names is Pacific/Enderbury, and that
breaks our regression tests (on --with-system-tzdata builds)
because our default timezone abbreviations list defines PHOT as
Pacific/Enderbury.
Pacific/Enderbury got renamed to Pacific/Kanton in tzdata 2021b,
so that in distros that still have this entry it's just a symlink
to Pacific/Kanton anyway. So one answer would be to redefine PHOT
as Pacific/Kanton. However, then things would fail if the
installed tzdata predates 2021b, which is recent enough that that
seems like a real problem.
Instead, let's just remove PHOT from the default list. That seems
likely to affect nobody in the real world, because (a) it was an
abbreviation that the tzdb crew made up in the first place, with
no evidence of real-world usage, and (b) the total human population
of the Phoenix Islands is less than two dozen persons, per Wikipedia.
If anyone does use this zone abbreviation they can easily put it back
via a custom abbreviations file.
We'll keep PHOT in the Pacific.txt reference file, but change it
to Pacific/Kanton there, as that definition seems more likely to
be useful to future readers of that file.
Per report from Victor Wagner. Back-patch to all supported
branches.
Discussion: https://postgr.es/m/20231027152049.4b5c8044@wagner.wagner.home
Add the 'checkunique' argument to bt_index_check() and bt_index_parent_check().
When the flag is specified the procedures will check the unique constraint
violation for unique indexes. Only one heap entry for all equal keys in
the index should be visible (including posting list entries). Report an error
otherwise.
pg_amcheck called with the --checkunique option will do the same check for all
the indexes it checks.
Author: Anastasia Lubennikova <lubennikovaav@gmail.com>
Author: Pavel Borisov <pashkin.elfe@gmail.com>
Author: Maxim Orlov <orlovmg@gmail.com>
Reviewed-by: Mark Dilger <mark.dilger@enterprisedb.com>
Reviewed-by: Zhihong Yu <zyu@yugabyte.com>
Reviewed-by: Peter Geoghegan <pg@bowt.ie>
Reviewed-by: Aleksander Alekseev <aleksander@timescale.com>
Discussion: https://postgr.es/m/CALT9ZEHRn5xAM5boga0qnrCmPV52bScEK2QnQ1HmUZDD301JEg%40mail.gmail.com
When calculating distance for interval values, the code mostly mimicked
interval_mi, i.e. it built a new interval value for the difference.
That however does not work for sufficiently distant interval values,
when the difference overflows the interval range.
Instead, we can calculate the distance directly, without constructing
the intermediate (and unnecessary) interval value.
Backpatch to 14, where minmax-multi indexes were introduced.
Reported-by: Dean Rasheed
Reviewed-by: Ashutosh Bapat, Dean Rasheed
Backpatch-through: 14
Discussion: https://postgr.es/m/eef0ea8c-4aaa-8d0d-027f-58b1f35dd170@enterprisedb.com
Make sure that infinite values in date/timestamp columns are treated as
if in infinite distance. Infinite values should not be merged with other
values, leaving them as outliers. The code however returned distance 0
in this case, so that infinite values were merged first. While this does
not break the index (i.e. it still produces correct query results), it
may make it much less efficient.
We don't need explicit handling of infinite date/timestamp values when
calculating distances, because those values are represented as extreme
but regular values (e.g. INT64_MIN/MAX for the timestamp type).
We don't need an exact distance, just a value that is much larger than
distanced between regular values. With the added cast to double values,
we can simply subtract the values.
The regression test queries a value in the "gap" and checks the range
was properly eliminated by the BRIN index.
This only affects minmax-multi indexes on timestamp/date columns with
infinite values, which is not very common in practice. The affected
indexes may need to be rebuilt.
Backpatch to 14, where minmax-multi indexes were introduced.
Reported-by: Ashutosh Bapat
Reviewed-by: Ashutosh Bapat, Dean Rasheed
Backpatch-through: 14
Discussion: https://postgr.es/m/eef0ea8c-4aaa-8d0d-027f-58b1f35dd170@enterprisedb.com
When calculating the distance between date values, make sure to subtract
them in the right order, i.e. (larger - smaller).
The distance is used to determine which values to merge, and is expected
to be a positive value. The code unfortunately did the subtraction in
the opposite order, i.e. (smaller - larger), thus producing negative
values and merging values the most distant values first.
The resulting index is correct (i.e. produces correct results), but may
be significantly less efficient. This affects all minmax-multi indexes
on date columns.
Backpatch to 14, where minmax-multi indexes were introduced.
Reported-by: Ashutosh Bapat
Reviewed-by: Ashutosh Bapat, Dean Rasheed
Backpatch-through: 14
Discussion: https://postgr.es/m/eef0ea8c-4aaa-8d0d-027f-58b1f35dd170@enterprisedb.com
When calculating distances for timestamp values for BRIN minmax-multi
indexes, we need to be careful about overflows for extreme values. If
the value overflows into a negative value, the index may be inefficient.
The new regression test checks this for the timestamp type by adding a
table with enough values to force range compaction/merging. The values
are close to min/max, which means a risk of overflow.
Fixed by converting the int64 values to double first, before calculating
the distance. This prevents the overflow. We may lose some precision, of
course, but that's good enough. In the worst case we build a slightly
less efficient index, but for large distances this won't matter.
This only affects minmax-multi indexes on timestamp columns, with ranges
containing values sufficiently distant to cause an overflow. That seems
like a fairly rare case in practice.
Backpatch to 14, where minmax-multi indexes were introduced.
Reported-by: Ashutosh Bapat
Reviewed-by: Ashutosh Bapat, Dean Rasheed
Backpatch-through: 14
Discussion: https://postgr.es/m/eef0ea8c-4aaa-8d0d-027f-58b1f35dd170@enterprisedb.com
d3d55ce571 changed RelOptInfo.unique_for_rels from the list of Relid sets to
the list of UniqueRelInfo's. But it didn't make UniqueRelInfo a node.
This commit makes UniqueRelInfo a node. Also this commit revises some
comments related to RelOptInfo.unique_for_rels.
Reported-by: Tom Lane
Discussion: https://postgr.es/m/flat/1189851.1698340331%40sss.pgh.pa.us
Two attributes related to checkpointer statistics are removed in this
commit:
- buffers_backend, that counts the number of buffers written directly by
a backend.
- buffers_backend_fsync, that counts the number of times a backend had
to do fsync() by its own.
These are actually not checkpointer properties but backend properties.
Also, pg_stat_io provides a more accurate and equivalent report of these
numbers, by tracking all the I/O stats related to backends, including
writes and fsyncs, so storing them in pg_stat_checkpointer was
redundant.
Thanks also to Robert Haas and Amit Kapila for their input.
Bump catalog version.
Author: Bharath Rupireddy
Reviewed-by: Bertrand Drouvot, Andres Freund
Discussion: https://postgr.es/m/20230210004604.mcszbscsqs3bc5nx@awork3.anarazel.de
f0efa5aec added initReadOnlyStringInfo to allow a StringInfo to be
initialized from an existing buffer and also relaxed the requirement
that a StringInfo's buffer must be NUL terminated at data[len]. Now
that we have that, there's no need for these aggregate deserial
functions to use appendBinaryStringInfo() as that rather wastefully
palloc'd a new buffer and memcpy'd in the bytea's buffer. Instead, we can
just use the bytea's buffer and point the StringInfo directly to that
using the new initializer function.
In Amdahl's law, this speeds up the serial portion of parallel
aggregates and makes sum(numeric), avg(numeric), var_pop(numeric),
var_samp(numeric), variance(numeric), stddev_pop(numeric),
stddev_samp(numeric), stddev(numeric), array_agg(anyarray),
string_agg(text) and string_agg(bytea) scale better in parallel queries.
Author: David Rowley
Discussion: https://postgr.es/m/CAApHDvr%3De-YOigriSHHm324a40HPqcUhSp6pWWgjz5WwegR%3DcQ%40mail.gmail.com
Since C99, there can be a trailing comma after the last value in an
enum definition. A lot of new code has been introducing this style on
the fly. Some new patches are now taking an inconsistent approach to
this. Some add the last comma on the fly if they add a new last
value, some are trying to preserve the existing style in each place,
some are even dropping the last comma if there was one. We could
nudge this all in a consistent direction if we just add the trailing
commas everywhere once.
I omitted a few places where there was a fixed "last" value that will
always stay last. I also skipped the header files of libpq and ecpg,
in case people want to use those with older compilers. There were
also a small number of cases where the enum type wasn't used anywhere
(but the enum values were), which ended up confusing pgindent a bit,
so I left those alone.
Discussion: https://www.postgresql.org/message-id/flat/386f8c45-c8ac-4681-8add-e3b0852c1620%40eisentraut.org
There were various places in our codebase which conjured up a StringInfo
by manually assigning the StringInfo fields and setting the data field
to point to some existing buffer. There wasn't much consistency here as
to what fields like maxlen got set to and in one location we didn't
correctly ensure that the buffer was correctly NUL terminated at len
bytes, as per what was documented as required in stringinfo.h
Here we introduce 2 new functions to initialize StringInfos. One allows
callers to initialize a StringInfo passing along a buffer that is
already allocated by palloc. Here the StringInfo code uses this buffer
directly rather than doing any memcpying into a new allocation. Having
this as a function allows us to verify the buffer is correctly NUL
terminated. StringInfos initialized this way can be appended to and
reset just like any other normal StringInfo.
The other new initialization function also accepts an existing buffer,
but the given buffer does not need to be a pointer to a palloc'd chunk.
This buffer could be a pointer pointing partway into some palloc'd chunk
or may not even be palloc'd at all. StringInfos initialized this way
are deemed as "read-only". This means that it's not possible to
append to them or reset them.
For the latter of the two new initialization functions mentioned above,
we relax the requirement that the data buffer must be NUL terminated.
Relaxing this requirement is convenient in a few places as it can save
us from having to allocate an entire new buffer just to add the NUL
terminator or save us from having to temporarily add a NUL only to have to
put the original char back again later.
Incompatibility note:
Here we also forego adding the NUL in a few places where it does not
seem to be required. These locations are passing the given StringInfo
into a type's receive function. It does not seem like any of our
built-in receive functions require this, but perhaps there's some UDT
out there in the wild which does require this. It is likely worthy of
a mention in the release notes that a UDT's receive function mustn't rely
on the input StringInfo being NUL terminated.
Author: David Rowley
Reviewed-by: Tom Lane
Discussion: https://postgr.es/m/CAApHDvorfO3iBZ%3DxpiZvp3uHtJVLyFaPBSvcAhAq2HPLnaNSwQ%40mail.gmail.com
Currently, expand_single_inheritance_child() doesn't reset
perminfoindex in a plain-inheritance parent's child RTE, because
prior to 387f9ed0a0, the executor would use the first child RTE to
locate the parent's RTEPermissionInfo. That in turn causes
add_rte_to_flat_rtable() to create an extra RTEPermissionInfo
belonging to the parent's child RTE with the same content as the one
belonging to the parent's original ("root") RTE.
In 387f9ed0a0, we changed things so that the executor can now use the
parent's "root" RTE for locating its RTEPermissionInfo instead of the
child RTE, so the latter's perminfoindex need not be set anymore, so
make it so.
Reported-by: Tom Lane
Discussion: https://postgr.es/m/839708.1698174464@sss.pgh.pa.us
Backpatch-through: 16
While reading information from the old cluster, a list of logical
slots is fetched. At the later part of upgrading, pg_upgrade revisits the
list and restores slots by executing pg_create_logical_replication_slot()
on the new cluster. Migration of logical replication slots is only
supported when the old cluster is version 17.0 or later.
If the old node has invalid slots or slots with unconsumed WAL records,
the pg_upgrade fails. These checks are needed to prevent data loss.
The significant advantage of this commit is that it makes it easy to
continue logical replication even after upgrading the publisher node.
Previously, pg_upgrade allowed copying publications to a new node. With
this patch, adjusting the connection string to the new publisher will
cause the apply worker on the subscriber to connect to the new publisher
automatically. This enables seamless continuation of logical replication,
even after an upgrade.
Author: Hayato Kuroda, Hou Zhijie
Reviewed-by: Peter Smith, Bharath Rupireddy, Dilip Kumar, Vignesh C, Shlok Kyal
Discussion: http://postgr.es/m/TYAPR01MB58664C81887B3AF2EB6B16E3F5939@TYAPR01MB5866.jpnprd01.prod.outlook.com
Discussion: http://postgr.es/m/CAA4eK1+t7xYcfa0rEQw839=b2MzsfvYDPz3xbD+ZqOdP3zpKYg@mail.gmail.com
The Self Join Elimination (SJE) feature removes an inner join of a plain table
to itself in the query tree if is proved that the join can be replaced with
a scan without impacting the query result. Self join and inner relation are
replaced with the outer in query, equivalence classes, and planner info
structures. Also, inner restrictlist moves to the outer one with removing
duplicated clauses. Thus, this optimization reduces the length of the range
table list (this especially makes sense for partitioned relations), reduces
the number of restriction clauses === selectivity estimations, and potentially
can improve total planner prediction for the query.
The SJE proof is based on innerrel_is_unique machinery.
We can remove a self-join when for each outer row:
1. At most one inner row matches the join clause.
2. Each matched inner row must be (physically) the same row as the outer one.
In this patch we use the next approach to identify a self-join:
1. Collect all merge-joinable join quals which look like a.x = b.x
2. Add to the list above the baseretrictinfo of the inner table.
3. Check innerrel_is_unique() for the qual list. If it returns false, skip
this pair of joining tables.
4. Check uniqueness, proved by the baserestrictinfo clauses. To prove
the possibility of self-join elimination inner and outer clauses must have
an exact match.
The relation replacement procedure is not trivial and it is partly combined
with the one, used to remove useless left joins. Tests, covering this feature,
were added to join.sql. Some regression tests changed due to self-join removal
logic.
Discussion: https://postgr.es/m/flat/64486b0b-0404-e39e-322d-0801154901f3%40postgrespro.ru
Author: Andrey Lepikhov, Alexander Kuzmenkov
Reviewed-by: Tom Lane, Robert Haas, Andres Freund, Simon Riggs, Jonathan S. Katz
Reviewed-by: David Rowley, Thomas Munro, Konstantin Knizhnik, Heikki Linnakangas
Reviewed-by: Hywel Carver, Laurenz Albe, Ronan Dunklau, vignesh C, Zhihong Yu
Reviewed-by: Greg Stark, Jaime Casanova, Michał Kłeczek, Alena Rybakina
Reviewed-by: Alexander Korotkov
Instead of connecting to the server with psql to check if it is ready
for running tests, this changes pg_regress to use PQPing which avoids
performing system() calls which are expensive on some platforms, like
Windows. The frequency of tests is also increased in order to connect
to the server faster.
This patch is part of a larger effort to make testing consume fewer
resources in order to be able to fit more tests into the available
CI system constraints.
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/20230823192239.jxew5s3sjru63lio@awork3.anarazel.de
When an UPDATE/DELETE/MERGE's target table is an old-style
inheritance tree, it's possible for the parent to get excluded
from the plan while some children are not. (I believe this is
only possible if we can prove that a CHECK ... NO INHERIT
constraint on the parent contradicts the query WHERE clause,
so it's a very unusual case.) In such a case, ExecInitModifyTable
mistakenly concluded that the first surviving child is the target
table, leading to at least two bugs:
1. The wrong table's statement-level triggers would get fired.
2. In v16 and up, it was possible to fail with "invalid perminfoindex
0 in RTE with relid nnnn" due to the child RTE not having permissions
data included in the query plan. This was hard to reproduce reliably
because it did not occur unless the update triggered some non-HOT
index updates.
In v14 and up, this is easy to fix by defining ModifyTable.rootRelation
to be the parent RTE in plain inheritance as well as partitioned cases.
While the wrong-triggers bug also appears in older branches, the
relevant code in both the planner and executor is quite a bit
different, so it would take a good deal of effort to develop and
test a suitable patch. Given the lack of field complaints about the
trigger issue, I'll desist for now. (Patching v11 for this seems
unwise anyway, given that it will have no more releases after next
month.)
Per bug #18147 from Hans Buschmann.
Amit Langote and Tom Lane
Discussion: https://postgr.es/m/18147-6fc796538913ee88@postgresql.org
Enforce the rule from transam/README in XLogRegisterBuffer(), and
update callers to follow the rule.
Hash indexes sometimes register clean pages as a part of the locking
protocol, so provide a REGBUF_NO_CHANGE flag to support that use.
Discussion: https://postgr.es/m/c84114f8-c7f1-5b57-f85a-3adc31e1a904@iki.fi
Reviewed-by: Heikki Linnakangas
This shouldn't change behavior except in the unusual case where
there are file in the tablespace directory that have entirely
numeric names but are nevertheless not possible names for a
tablespace directory, either because their names have leading zeroes
that shouldn't be there, or the value is actually zero, or because
the value is too large to represent as an OID.
In those cases, the directory would previously have made it into
the list of tablespaceinfo objects and no longer will. Thus, base
backups will now ignore such directories, instead of treating them
as legitimate tablespace directories. Similarly, if entries for
such tablespaces occur in a tablespace_map file, they will now
be rejected as erroneous, instead of being honored.
This is infrastructure for future work that wants to be able to
know the tablespace of each relation that is part of a backup
*as an OID*. By strengthening the up-front validation, we don't
have to worry about weird cases later, and can more easily avoid
repeated string->integer conversions.
Patch by me, reviewed by David Steele.
Discussion: http://postgr.es/m/CA+TgmoZNVeBzoqDL8xvr-nkaepq815jtDR4nJzPew7=3iEuM1g@mail.gmail.com
Instead of returning the number of characters in the RelFileNumber,
return the RelFileNumber itself. Continue to return the fork number,
as before, and additionally return the segment number.
parse_filename_for_nontemp_relation now rejects a RelFileNumber or
segment number that begins with a leading zero. Before, we accepted
such cases as relation filenames, but if we continued to do so after
this change, the function might return the same values for two
different files (e.g. 1234.5 and 001234.5 or 1234.005) which could be
annoying for callers. Since we don't actually ever generate filenames
with leading zeroes in the names, any such files that we find must
have been created by something other than PostgreSQL, and it is
therefore reasonable to treat them as non-relation files.
Along the way, change unlogged_relation_entry to store a RelFileNumber
rather than an OID. This update should have been made in
851f4cc75c, but it was overlooked.
It's trivial to make the update as part of this commit, perhaps more
trivial than it would have been without it, so do that.
Patch by me, reviewed by David Steele.
Discussion: http://postgr.es/m/CA+TgmoZNVeBzoqDL8xvr-nkaepq815jtDR4nJzPew7=3iEuM1g@mail.gmail.com
pg_logical_replication_slot_advance() included a break condition to stop
when a targeted LSN is reached, when processing a series of WAL records
with XLogReadRecord(). Since 38a957316d, it matched with the check of
its main while loop. This condition saved from an extra CFI check,
actually pointless, so let's remove this condition and simplify the
code.
In passing, fix an incorrect comment.
Author: Bharath Rupireddy
Reviewed-by: Tom Lane, Gurjeet Singh
Discussion: https://postgr.es/m/CALj2ACWfGDLQ2cy7ZKwxnJqbDkO6Yvqqrqxne5ZN4HYm=PRTGg@mail.gmail.com
When min_dynamic_shared_memory is set above 0, we try to find space in a
pre-allocated region of the main shared memory area instead of calling
dsm_impl_XXX() routines to allocate more. The dsm_pin_segment() and
dsm_unpin_segment() routines had a bug: they called dsm_impl_XXX()
routines even for main region segments. Nobody noticed before now
because those routines do nothing on Unix, but on Windows they'd fail
while attempting to duplicate an invalid Windows HANDLE. Add the
missing gating.
Back-patch to 14, where commit 84b1c63a added this feature. Fixes
pgsql-bugs bug #18165.
Reported-by: Maxime Boyer <maxime.boyer@cra-arc.gc.ca>
Tested-by: Alexander Lakhin <exclusion@gmail.com>
Discussion: https://postgr.es/m/18165-bf4f525cea6e51de%40postgresql.org
Previously, ALTER SYSTEM failed if the target GUC wasn't present in
the session's GUC hashtable. That is a reasonable behavior for core
(single-part) GUC names, and for custom GUCs for which we have loaded
an extension that's reserved the prefix. But it's unnecessarily
restrictive otherwise, and it also causes inconsistent behavior:
you can "ALTER SYSTEM SET foo.bar" only if you did "SET foo.bar"
earlier in the session. That's fairly silly.
Hence, refactor things so that we can execute ALTER SYSTEM even
if the variable doesn't have a GUC hashtable entry, as long as the
name meets the custom-variable naming requirements and does not
have a reserved prefix. (It's safe to do this even if the
variable belongs to an extension we currently don't have loaded.
A bad value will at worst cause a WARNING when the extension
does get loaded.)
Also, adjust GRANT ON PARAMETER to have the same opinions about
whether to allow an unrecognized GUC name, and to throw the
same errors if not (it previously used a one-size-fits-all
message for several distinguishable conditions). By default,
only a superuser will be allowed to do ALTER SYSTEM SET on an
unrecognized name, but it's possible to GRANT the ability to
do it.
Patch by me, pursuant to a documentation complaint from
Gavin Panella. Arguably this is a bug fix, but given the
lack of other complaints I'll refrain from back-patching.
Discussion: https://postgr.es/m/2617358.1697501956@sss.pgh.pa.us
Discussion: https://postgr.es/m/169746329791.169914.16613647309012285391@wrigleys.postgresql.org
Allow the COMMUTATOR, NEGATOR, MERGES, and HASHES attributes to be set
by ALTER OPERATOR. However, we don't allow COMMUTATOR/NEGATOR to be
changed once set, nor allow the MERGES/HASHES flags to be unset once
set. Changes like that might invalidate plans already made, and
dealing with the consequences seems like more trouble than it's worth.
The main use-case we foresee for this is to allow addition of missed
properties in extension update scripts, such as extending an existing
operator to support hashing. So only transitions from not-set to set
states seem very useful.
This patch also causes us to reject some incorrect cases that formerly
resulted in inconsistent catalog state, such as trying to set the
commutator of an operator to be some other operator that already has a
(different) commutator.
While at it, move the InvokeObjectPostCreateHook call for CREATE
OPERATOR to not occur until after we've fixed up commutator or negator
links as needed. The previous ordering could only be justified by
thinking of the OperatorUpd call as a kind of ALTER OPERATOR step;
but we don't call InvokeObjectPostAlterHook therein. It seems better
to let the hook see the final state of the operator object.
In the documentation, move the discussion of how to establish
commutator pairs from xoper.sgml to the CREATE OPERATOR ref page.
Tommy Pavlicek, reviewed and editorialized a bit by me
Discussion: https://postgr.es/m/CAEhP-W-vGVzf4udhR5M8Bdv88UYnPrhoSkj3ieR3QNrsGQoqdg@mail.gmail.com