This new function is equivalent to PQpipelineSync(), except that it does
not flush anything to the server except if the size threshold of the
output buffer is reached; the user must subsequently call PQflush()
instead.
Its purpose is to reduce the system call overhead of pipeline mode, by
giving to applications more control over the timing of the flushes when
manipulating commands in pipeline mode.
Author: Anton Kirilov
Reviewed-by: Jelte Fennema-Nio, Robert Haas, Álvaro Herrera, Denis
Laxalde, Michael Paquier
Discussion: https://postgr.es/m/CACV6eE5arHFZEA717=iKEa_OewpVFfWJOmsOdGrqqsr8CJVfWQ@mail.gmail.com
When removing a useless join, we'd remove PHVs that are not used at join
partner rels or above the join. A PHV that references the join's relid
in ph_eval_at is logically "above" the join and thus should not be
removed. We have the following check for that:
!bms_is_member(ojrelid, phinfo->ph_eval_at)
However, in the case of SJE removing a useless inner join, 'ojrelid' is
set to -1, which would trigger the "negative bitmapset member not
allowed" error in bms_is_member().
Fix it by skipping examining ojrelid for inner joins in this check.
Reported-by: Zuming Jiang
Bug: #18260
Discussion: https://postgr.es/m/18260-1b6a0c4ae311b837%40postgresql.org
Author: Richard Guo
Reviewed-by: Andrei Lepikhov
Most of these tests have been introduced in 6dd8b00807, to check for
behaviors related to hashing and hash plans, and money is a data type
with btree support but no hash functions. These tests are switched to
use varbit instead, to provide the same coverage.
Some other tests historically used money but don't really need it for
what they wanted to test (see rules.sql). Plans and coverage are
unchanged after the modifications done here.
Support for money may be removed a a later point, but this needs more
discussion.
Discussion: https://postgr.es/m/18240-c5da758d7dc1ecf0@postgresql.org
When ExecBRUpdateTriggers switches to a new target tuple as a result
of the EvalPlanQual logic, it must form a new proposed update tuple.
Since commit 86dc90056, that tuple (the result of
ExecGetUpdateNewTuple) has been a virtual tuple that might contain
pointers to by-ref fields of the new target tuple (in "oldslot").
However, immediately after that we materialize oldslot, causing it to
drop its buffer pin, whereupon the by-ref pointers are unsafe to use.
This is a live bug only when the new target tuple is in a different
page than the original target tuple, since we do still hold a pin on
the original one. (Before 86dc90056, there was no bug because the
EPQ plantree would hold a pin on the new target tuple; but now that's
not assured.) To fix, forcibly materialize the new tuple before we
materialize oldslot. This costs nothing since we would have done that
shortly anyway.
The real-world impact of this is probably minimal. A visible failure
could occur if the new target tuple's buffer were recycled for some
other page in the short interval before we materialize newslot within
the trigger-calling loop; but that's quite unlikely given that we'd
just touched that page. There's a larger hazard that some other
process could prune and repack that page within the window. We have
lock on the new target tuple, but that wouldn't prevent it being moved
on the page.
Alexander Lakhin and Tom Lane, per bug #17798 from Alexander Lakhin.
Back-patch to v14 where 86dc90056 came in.
Discussion: https://postgr.es/m/17798-0907404928dcf0dd@postgresql.org
The pg_amcheck test reports a skip message if the layout of the index
does not match expectations. That message includes the bytes that
were expected and the ones that were found. But the found ones are
arbitrary bytes, which can have funny effects on the terminal when
they are printed. To avoid that, escape non-word characters before
printing.
Reviewed-by: Aleksander Alekseev <aleksander@timescale.com>
Discussion: https://www.postgresql.org/message-id/flat/3f96f079-64e5-468a-8a19-cb481f0d31e5%40eisentraut.org
We've long had a policy that any toasted fields in a catalog tuple
should be pulled in-line before entering the tuple in a catalog cache.
However, that requires access to the catalog's toast table, and we'll
typically do AcceptInvalidationMessages while opening the toast table.
So it's possible that the catalog tuple is outdated by the time we
finish detoasting it. Since no cache entry exists yet, we can't
mark the entry stale during AcceptInvalidationMessages, and instead
we'll press forward and build an apparently-valid cache entry. The
upshot is that we have a race condition whereby an out-of-date entry
could be made in a backend's catalog cache, and persist there
indefinitely causing indeterminate misbehavior.
To fix, use the existing systable_recheck_tuple code to recheck
whether the catalog tuple is still up-to-date after we finish
detoasting it. If not, loop around and restart the process of
searching the catalog and constructing cache entries from the top.
The case is rare enough that this shouldn't create any meaningful
performance penalty, even in the SearchCatCacheList case where
we need to tear down and reconstruct the whole list.
Indeed, the case is so rare that AFAICT it doesn't occur during
our regression tests, and there doesn't seem to be any easy way
to build a test that would exercise it reliably. To allow
testing of the retry code paths, add logic (in USE_ASSERT_CHECKING
builds only) that randomly pretends that the recheck failed about
one time out of a thousand. This is enough to ensure that we'll
pass through the retry paths during most regression test runs.
By adding an extra level of looping, this commit creates a need
to reindent most of SearchCatCacheMiss and SearchCatCacheList.
I'll do that separately, to allow putting those changes in
.git-blame-ignore-revs.
Patch by me; thanks to Alexander Lakhin for having built a test
case to prove the bug is real, and to Xiaoran Wang for review.
Back-patch to all supported branches.
Discussion: https://postgr.es/m/1393953.1698353013@sss.pgh.pa.us
Discussion: https://postgr.es/m/CAGjhLkOoBEC9mLsnB42d3CO1vcMx71MLSEuigeABbQ8oRdA6gw@mail.gmail.com
This changes the pg_attribute field attstattarget into a nullable
field in the variable-length part of the row. If no value is set by
the user for attstattarget, it is now null instead of previously -1.
This saves space in pg_attribute and tuple descriptors for most
practical scenarios. (ATTRIBUTE_FIXED_PART_SIZE is reduced from 108
to 104.) Also, null is the semantically more correct value.
The ANALYZE code internally continues to represent the default
statistics target by -1, so that that code can avoid having to deal
with null values. But that is now contained to the ANALYZE code.
Only the DDL code deals with attstattarget possibly null.
For system columns, the field is now always null. The ANALYZE code
skips system columns anyway.
To set a column's statistics target to the default value, the new
command form ALTER TABLE ... SET STATISTICS DEFAULT can be used. (SET
STATISTICS -1 still works.)
Reviewed-by: Alvaro Herrera <alvherre@alvh.no-ip.org>
Discussion: https://www.postgresql.org/message-id/flat/4da8d211-d54d-44b9-9847-f2a9f1184c76@eisentraut.org
A superuser may create a subscription with password_required=true, but
which uses a connection string without a password.
Previously, if the owner of such a subscription was changed to a
non-superuser, the non-superuser was able to utilize a password from
another source (like a password file or the PGPASSWORD environment
variable), which should not have been allowed.
This commit adds a step to re-validate the connection string before
connecting.
Reported-by: Jeff Davis
Author: Vignesh C
Reviewed-by: Peter Smith, Robert Haas, Amit Kapila
Discussion: https://www.postgresql.org/message-id/flat/e5892973ae2a80a1a3e0266806640dae3c428100.camel%40j-davis.com
Backpatch-through: 16
BuildDescForRelation() has all the knowledge for converting a
ColumnDef into pg_attribute/tuple descriptor. ATExecAddColumn() can
make use of that, instead of duplicating all that logic. We just pass
a one-element list of ColumnDef and we get back exactly the data
structure we need. Note that we don't even need to touch
BuildDescForRelation() to make this work.
Reviewed-by: Alvaro Herrera <alvherre@alvh.no-ip.org>
Discussion: https://www.postgresql.org/message-id/flat/52a125e4-ff9a-95f5-9f61-b87cf447e4da@eisentraut.org
It missed a entry for tmp_check/ generated by the tests. While on it,
append a slash at the beginning of "pg_walsummary" to restrict its check
to the current directory, like anywhere else.
Oversights in ee1bfd1683.
jit.c and dfgr.c had a copy of the same code to check if a file exists
or not, with a twist: jit.c did not check for EACCES when failing the
stat() call for the path whose existence is tested. This refactored
routine will be used by an upcoming patch.
Reviewed-by: Ashutosh Bapat
Discussion: https://postgr.es/m/ZTiV8tn_MIb_H2rE@paquier.xyz
This is a rework of 7021d3b176, where we relied on forcing
max_logical_replication_workers to 0 in the postgres command. This
commit now prevents logical replication launchers to start using -b and
a backend-side check based on IsBinaryUpgrade, effective when upgrading
from 17 and newer versions.
This commit improves the comments explaining why this restriction is
necessary.
This discussion was on hold until we were sure how to add support for
subscribers in pg_upgrade, something now done thanks to 9a17be1e24.
Reviewed-by: Álvaro Herrera, Amit Kapila, Tom Lane
Discussion: https://postgr.es/m/ZU2TeVkUg5qEi7Oy@paquier.xyz
The code for wrapping subquery output expressions in PlaceHolderVars
believed that if the expression already was a PlaceHolderVar, it was
never necessary to wrap that in another one. That's wrong if the
expression is underneath an outer join and involves a lateral
reference to outside that scope: failing to add an additional PHV
risks evaluating the expression at the wrong place and hence not
forcing it to null when the outer join should do so. This is an
oversight in commit 9e7e29c75, which added logic to forcibly wrap
lateral-reference Vars in PlaceHolderVars, but didn't see that the
adjacent case for PlaceHolderVars needed the same treatment.
The test case we have for this doesn't fail before 4be058fe9, but now
that I see the problem I wonder if it is possible to demonstrate
related errors before that. That's moot though, since all such
branches are out of support.
Per bug #18284 from Holger Reise. Back-patch to all supported
branches.
Discussion: https://postgr.es/m/18284-47505a20c23647f8@postgresql.org
Instead, just have lazy_scan_prune() and lazy_scan_noprune() update
LVRelState->nonempty_pages directly. This makes the two functions
more similar and also removes makes lazy_scan_noprune need one fewer
output parameters.
Melanie Plageman, reviewed by Andres Freund, Michael Paquier, and me
Discussion: http://postgr.es/m/CAAKRu_btji_wQdg=ok-5E4v_bGVxKYnnFFe7RA6Frc1EcOwtSg@mail.gmail.com
pg_combinebackup had various problems:
* strncpy was used in various places where strlcpy should be used
instead, to avoid any possibility of the result not being
\0-terminated.
* scan_for_existing_tablespaces() failed to close the directory,
and an error when opening the directory was reported with the
wrong pathname.
* write_reconstructed_file() contained some redundant and therefore
dead code.
* flush_manifest() didn't check the result of pg_checksum_update()
as we do in other places, and misused a local pathname variable
that shouldn't exist at all.
In pg_basebackup, the wrong variable name was used in one place,
due to a copy and paste that was not properly adjusted.
In blkreftable.c, the loop incorrectly doubled chunkno instead of
max_chunks. Fix that. Also remove a nearby assertion per repeated
off-list complaints from Tom Lane.
Per Coverity and subsequent code inspection by me and by Tom Lane.
Discussion: http://postgr.es/m/CA+Tgmobvqqj-DW9F7uUzT-cQqs6wcVb-Xhs=w=hzJnXSE-kRGw@mail.gmail.com
This can dump the contents of the WAL summary files found in
pg_wal/summaries. Normally, this shouldn't really be something anyone
needs to do, but it may be needed for debugging problems with
incremental backup, or could possibly be useful to external tools.
Discussion: http://postgr.es/m/CA+Tgmobvqqj-DW9F7uUzT-cQqs6wcVb-Xhs=w=hzJnXSE-kRGw@mail.gmail.com
This makes it possible to access information about the progress
of WAL summarization from SQL. The previously-added functions
pg_available_wal_summaries() and pg_wal_summary_contents() only
examine on-disk state, but this function exposes information from
the server's shared memory.
Discussion: http://postgr.es/m/CA+Tgmobvqqj-DW9F7uUzT-cQqs6wcVb-Xhs=w=hzJnXSE-kRGw@mail.gmail.com
During upgrade, when pg_restore performs CREATE DATABASE, bgwriter or
checkpointer may flush buffers and hold a file handle for pg_largeobject,
so later TRUNCATE pg_largeobject command will fail if OS (such as older
Windows versions) doesn't remove an unlinked file completely till it's
open. The probability of seeing this behavior is higher in this test
because we use wal_level as logical via allows_streaming => 'logical'
which in turn set shared_buffers as 1MB and make it more probable for
bgwriter to hold the file handle.
Diagnosed-by: Alexander Lakhin
Author: Hayato Kuroda, Amit Kapila
Reviewed-by: Alexander Lakhin
Discussion: https://postgr.es/m/TYAPR01MB5866AB7FD922CE30A2565B8BF5A8A@TYAPR01MB5866.jpnprd01.prod.outlook.com
In commit 3e51b278d I (tgl) caused initdb to leave lc_messages and
other lc_xxx GUCs commented-out in the installed postgresql.conf file
if they were going to be set to 'C'. This was a hack for cosmetic
purposes, and it was buggy because lc_messages' wired-in default is
not 'C' but '' (empty string). That led to --no-locale not having
the expected effect, since the postmaster would then obtain
lc_messages from its startup environment.
Let's just revert to the prior behavior of always de-commenting the
lc_xxx entries; the argument for changing that longstanding behavior
was weak in the first place.
Also, fix postgresql.conf.sample's erroneous claim that the default
value of lc_messages is 'C'. I suspect that was what misled me into
making this mistake in the first place.
Report and patch by Kyotaro Horiguchi. Back-patch to v16 where
the problem was introduced.
Discussion: https://postgr.es/m/20231122.162700.1995154567625541112.horikyota.ntt@gmail.com
Many psql backslash commands tolerate trailing semicolons, even
though that's not part of the official syntax. These did not.
They tried to, by passing semicolon = true to psql_scan_slash_option,
but that function ignored this parameter in OT_WHOLE_LINE mode.
Teach it to do the right thing, and remove the now-duplicative
logic in exec_command_help.
Discussion: https://postgr.es/m/2012251.1704746912@sss.pgh.pa.us
These were not testing the same thing as the comparable Assert
in calc_nestloop_required_outer(), because we neglected to map
the given Paths' relids to top-level relids. When considering
a partition child join the latter is the correct thing to do.
This oversight is old, but since it's only an overly-weak Assert
check there doesn't seem to be much value in back-patching.
Richard Guo (with cosmetic changes and comment updates by me)
Discussion: https://postgr.es/m/CAMbWs49sqbe9GBZ8sy8dSfKRNURgicR85HX8vgzcgQsPF0XY1w@mail.gmail.com
Commit 9d9c02ccd, which added the notion of a "run condition" for
window functions, neglected to teach nodeFuncs.c to process the new
field. Remarkably, that doesn't seem to have had any ill effects
before we invented Var.varnullingrels, but now it can cause visible
failures in join-removal scenarios.
I have no faith that there's not reachable problems in v15 too,
so back-patch the code change to v15 where 9d9c02ccd came in.
The test case seems irrelevant to v15, though.
Per bug #18277 from Zuming Jiang. Diagnosis and patch by
Richard Guo.
Discussion: https://postgr.es/m/18277-089ead83b329a2fd@postgresql.org
On Windows, cmd.exe is used to launch the postmaster process to ease its
redirection setup. However, cmd.exe may execute other programs at
startup due to autorun configurations, which could influence the
postmaster startup. This patch adds /D flag to the launcher cmd.exe
command line to disable autorun settings written in the registry.
This is arguably a bug, but no backpatch is done now out of caution.
Reported-by: Hayato Kuroda
Author: Kyotaro Horiguchi
Reviewed-by: Robert Haas, Michael Paquier
Discussion: https://postgr.es/m/20230922.161551.320043332510268554.horikyota.ntt@gmail.com
Both lwlocknames.txt and wait_event_names.txt contain a list of all
the predefined LWLocks, i.e., those with predefined positions
within MainLWLockArray. It is easy to miss one or the other,
especially since the list in wait_event_names.txt omits the "Lock"
suffix from all the LWLock wait events. This commit adds a cross-
check of these lists to the script that generates lwlocknames.h.
If the lists do not match exactly, building will fail.
Suggested-by: Robert Haas
Reviewed-by: Robert Haas, Michael Paquier, Bertrand Drouvot
Discussion: https://postgr.es/m/20240102173120.GA1061678%40nathanxps13
Essentially this moves the non-interactive part of psql's "\password"
command into an exported client function. The password is not sent to the
server in cleartext because it is "encrypted" (in the case of scram and md5
it is actually hashed, but we have called these encrypted passwords for a
long time now) on the client side. This is good because it ensures the
cleartext password is never known by the server, and therefore won't end up
in logs, pg_stat displays, etc.
In other words, it exists for the same reason as PQencryptPasswordConn(), but
is more convenient as it both builds and runs the "ALTER USER" command for
you. PQchangePassword() uses PQencryptPasswordConn() to do the password
encryption. PQencryptPasswordConn() is passed a NULL for the algorithm
argument, hence encryption is done according to the server's
password_encryption setting.
Also modify the psql client to use the new function. That provides a builtin
test case. Ultimately drivers built on top of libpq should expose this
function and its use should be generally encouraged over doing ALTER USER
directly for password changes.
Author: Joe Conway
Reviewed-by: Tom Lane
Discussion: https://postgr.es/m/flat/b75955f7-e8cc-4bbd-817f-ef536bacbe93%40joeconway.com
The target relation for INSERT/UPDATE/DELETE/MERGE has a different behavior
than other relations in EvalPlanQual() and RETURNING clause. This is why we
forbid target relation to be either source or target relation in SJE.
It's not clear if we could ever support this.
Reported-by: Alexander Lakhin
Discussion: https://postgr.es/m/b9e8f460-f9a6-0e9b-e8ba-60d59f0bc22c%40gmail.com
When SJE uses RelOptInfo.unique_for_rels cache, it passes filtered quals to
innerrel_is_unique_ext(). That might lead to an invalid match to cache entries
made by previous non self-join checking calls. Add UniqueRelInfo.self_join
flag to prevent such cases. Also, fix that SJE should require a strict match
of outerrelids to make sure UniqueRelInfo.extra_clauses are valid.
Reported-by: Alexander Lakhin
Discussion: https://postgr.es/m/4788f781-31bd-9796-d7d6-588a751c8787%40gmail.com
This replaces dblink's blocking libpq calls, allowing cancellation and
allowing DROP DATABASE (of a database not involved in the query). Apart
from explicit dblink_cancel_query() calls, dblink still doesn't cancel
the remote side. The replacement for the blocking calls consists of
new, general-purpose query execution wrappers in the libpqsrv facility.
Out-of-tree extensions should adopt these. Use them in postgres_fdw,
replacing a local implementation from which the libpqsrv implementation
derives. This is a bug fix for dblink. Code inspection identified the
bug at least thirteen years ago, but user complaints have not appeared.
Hence, no back-patch for now.
Discussion: https://postgr.es/m/20231122012945.74@rfd.leadboat.com
Since commit 599b33b94, this function assumed that every RTE_RELATION
RangeTblEntry would have an associated RelOptInfo. But that's not so:
we only build RelOptInfos for relations that are scanned by the query.
In particular the target of an INSERT won't have one, so that Vars
appearing in an INSERT ... RETURNING list will not have an associated
RelOptInfo. This apparently wasn't a problem before commit f7816aec2
taught examine_simple_variable() to drill down into CTEs containing
INSERT RETURNING, but it is now.
To fix, add a fallback code path that gets the userid to use directly
from the RTEPermissionInfo associated with the RTE. (Sadly, we must
have two code paths, because not every RTE has a RTEPermissionInfo
either.)
Per report from Alexander Lakhin. No back-patch, since the case is
apparently unreachable before f7816aec2.
Discussion: https://postgr.es/m/608a4886-6c60-0f9e-97d5-591256bd4150@gmail.com
During the calculations of the maximum for the number of buckets, take into
account that later we round that to the next power of 2.
Reported-by: Karen Talarico
Bug: #16925
Discussion: https://postgr.es/m/16925-ec96d83529d0d629%40postgresql.org
Author: Thomas Munro, Andrei Lepikhov, Alexander Korotkov
Reviewed-by: Alena Rybakina
Backpatch-through: 12
When the SJE code handles the transfer of qual clauses from the removed
relation to the remaining one, it replaces the Vars of the removed
relation with the Vars of the remaining relation for each clause, and
then reintegrates these clauses into the appropriate restriction or join
clause lists, while attempting to avoid duplicates.
However, the code compares RestrictInfo->clause to determine if two
clauses are duplicates. This is just flat wrong. Two RestrictInfos
with the same clause can have different required_relids,
incompatible_relids, is_pushed_down, and so on. This can cause qual
clauses to be mistakenly omitted, leading to wrong results.
This patch fixes it by comparing the entire RestrictInfos not just their
clauses ignoring 'rinfo_serial' field (otherwise almost all RestrictInfos will
be unique). Making 'rinfo_serial' equal_ignore would break other code. This
is why this commit implements our own comparison function for checking the
equality of RestrictInfos.
Reported-by: Zuming Jiang
Bug: #18261
Discussion: https://postgr.es/m/18261-2a75d748c928609b%40postgresql.org
Author: Richard Guo
Support referencing a composite-type variable in %TYPE.
Remove the undocumented, untested, and pretty useless ability
to have the subject of %TYPE be an (unqualified) type name.
You get the same result by just not writing %TYPE.
Add or adjust some test cases to improve code coverage here.
Discussion: https://postgr.es/m/716852.1704402127@sss.pgh.pa.us
A typo has been introduced by 31966b151e when updating the state of a
local buffer when a temporary relation is extended, for the case of a
block included in the relation range extended, when it is already found
in the hash table holding the local buffers. In this case, BM_VALID
should be cleared, but the buffer state was changed so as BM_VALID
remained while clearing the other flags.
As reported on the thread, it was possible to corrupt the state of the
local buffers on ENOSPC, but the states would be corrupted on any kind
of ERROR during the relation extend (like partial writes or some other
errno).
Reported-by: Alexander Lakhin
Author: Tender Wang
Reviewed-by: Richard Guo, Alexander Lakhin, Michael Paquier
Discussion: https://postgr.es/m/18259-6e256429825dd435@postgresql.org
Backpatch-through: 16
If we have DECHIST statistics about the argument expression, use
the average number of distinct elements as the array length estimate.
(It'd be better to use the average total number of elements, but
that is not currently calculated by compute_array_stats(), and
it's unclear that it'd be worth extra effort to get.)
To do this, we have to change the signature of estimate_array_length
to pass the "root" pointer. While at it, also change its result
type to "double". That's probably not really necessary, but it
avoids any risk of overflow of the value extracted from DECHIST.
All existing callers are going to use the result in a "double"
calculation anyway.
Paul Jungwirth, reviewed by Jian He and myself
Discussion: https://postgr.es/m/CA+renyUnM2d+SmrxKpDuAdpiq6FOM=FByvi6aS6yi__qyf6j9A@mail.gmail.com
Many foreach loops only use the ListCell pointer to retrieve the
content of the cell, like so:
ListCell *lc;
foreach(lc, mylist)
{
int myint = lfirst_int(lc);
...
}
This commit adds a few convenience macros that automatically
declare the loop variable and retrieve the current cell's contents.
This allows us to rewrite the previous loop like this:
foreach_int(myint, mylist)
{
...
}
This commit also adjusts a few existing loops in order to add
coverage for the new/adjusted macros. There is presently no plan
to bulk update all foreach loops, as that could introduce a
significant amount of back-patching pain. Instead, these macros
are primarily intended for use in new code.
Author: Jelte Fennema-Nio
Reviewed-by: David Rowley, Alvaro Herrera, Vignesh C, Tom Lane
Discussion: https://postgr.es/m/CAGECzQSwXKnxGwW1_Q5JE%2B8Ja20kyAbhBHO04vVrQsLcDciwXA%40mail.gmail.com
This adds a new ALTER TABLE subcommand ALTER COLUMN ... SET EXPRESSION
that changes the generation expression of a generated column.
The syntax is not standard but was adapted from other SQL
implementations.
This command causes a table rewrite, using the usual ALTER TABLE
mechanisms. The implementation is similar to and makes use of some of
the infrastructure of the SET DATA TYPE subcommand (for example,
rebuilding constraints and indexes afterwards). The new command
requires a new pass in AlterTablePass, and the ADD COLUMN pass had to
be moved earlier so that combinations of ADD COLUMN and SET EXPRESSION
can work.
Author: Amul Sul <sulamul@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/CAAJ_b94yyJeGA-5M951_Lr+KfZokOp-2kXicpmEhi5FXhBeTog@mail.gmail.com
1349d2790 added code to allow DISTINCT and ORDER BY aggregates to work
more efficiently by using presorted input. That commit added some code
that made use of the AggState's tmpcontext and adjusted the
ecxt_outertuple and ecxt_innertuple slots before checking if the current
row is distinct from the previously seen row. That code forgot to set the
TupleTableSlots back to what they were originally, which could result in
errors such as:
ERROR: attribute 1 of type record has wrong type
This only affects aggregate functions which have multiple arguments when
DISTINCT is used. For example: string_agg(DISTINCT col, ', ')
Thanks to Tom Lane for identifying the breaking commit.
Bug: #18264
Reported-by: Vojtěch Beneš
Discussion: https://postgr.es/m/18264-e363593d7e9feb7d@postgresql.org
Backpatch-through: 16, where 1349d2790 was added
This patch changes the existing 'conflicting' field to 'conflict_reason'
in pg_replication_slots. This new field indicates the reason for the
logical slot's conflict with recovery. It is always NULL for physical
slots, as well as for logical slots which are not invalidated. The
non-NULL values indicate that the slot is marked as invalidated. Possible
values are:
wal_removed = required WAL has been removed.
rows_removed = required rows have been removed.
wal_level_insufficient = the primary doesn't have a wal_level sufficient
to perform logical decoding.
The existing users of 'conflicting' column can get the same answer by
using 'conflict_reason' IS NOT NULL.
Author: Shveta Malik
Reviewed-by: Amit Kapila, Bertrand Drouvot, Michael Paquier
Discussion: https://postgr.es/m/ZYOE8IguqTbp-seF@paquier.xyz
CheckPWChallengeAuth() would return STATUS_ERROR if the user does not
exist or has no password assigned, even if the client disconnected
without responding to the password challenge (as libpq often will,
for example). We should return STATUS_EOF in that case, and the
lower-level functions do, but this code level got it wrong since the
refactoring done in 7ac955b34. This breaks the intent of not logging
anything for EOF cases (cf. comments in auth_failed()) and might
also confuse users of ClientAuthentication_hook.
Per report from Liu Lang. Back-patch to all supported versions.
Discussion: https://postgr.es/m/b725238c-539d-cb09-2bff-b5e6cb2c069c@esgyn.cn
Second attempt at 283a95da92. Since we can't reorder the enum values
of JsonPathItemType, instead reorder the switch cases where they are
used to generally follow the order of the enum values, for better
maintainability.
This reverts commit 283a95da92.
The reordering of JsonPathItemType affects the binary on-disk
compatibility of the jsonpath type, so we must not change it. Revert
for now and consider.
Swap the arguments to TimestampDifferenceMilliseconds so that we get
a positive answer instead of zero.
Then use the result of that computation instead of ignoring it.
Per reports from Alexander Lakhin.
Discussion: http://postgr.es/m/8b686764-7ac1-74c3-70f9-b64685a2535f@gmail.com
New TAP tests added by commits 9a17be1e and 4710b67d missed to convert
warnings to FATAL. This commit fixes that.
Author: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>
Some of the TAP tests have been historically setting the environment
variable PGDATABASE to 'postgres', which is not needed because
PostgreSQL::Test::Cluster already sets it when initialized. This commit
removes these explicit setups.
Note that the dependency of cluster -a with PGDATABASE (from
1caef31d9e) is still documented.
Author: Bharath Rupireddy
Discussion: https://postgr.es/m/CALj2ACXLAz5dW3ZP+Fec8g6jQMMmDyCVT+qdbye2h7QJJmhsdw@mail.gmail.com
Avoid leaving a dangling pointer in the unlikely event that
nsphash_create fails. Improve comments, and fix formatting by
adding typedefs.list entries.
Discussion: https://postgr.es/m/3972900.1704145107@sss.pgh.pa.us
This feature will allow us to replicate the changes on subscriber nodes
after the upgrade.
Previously, only the subscription metadata information was preserved.
Without the list of relations and their state, it's not possible to
re-enable the subscriptions without missing some records as the list of
relations can only be refreshed after enabling the subscription (and
therefore starting the apply worker). Even if we added a way to refresh
the subscription while enabling a publication, we still wouldn't know
which relations are new on the publication side, and therefore should be
fully synced, and which shouldn't.
To preserve the subscription relations, this patch teaches pg_dump to
restore the content of pg_subscription_rel from the old cluster by using
binary_upgrade_add_sub_rel_state SQL function. This is supported only
in binary upgrade mode.
The subscription's replication origin is needed to ensure that we don't
replicate anything twice.
To preserve the replication origins, this patch teaches pg_dump to update
the replication origin along with creating a subscription by using
binary_upgrade_replorigin_advance SQL function to restore the
underlying replication origin remote LSN. This is supported only in
binary upgrade mode.
pg_upgrade will check that all the subscription relations are in 'i'
(init) or in 'r' (ready) state and will error out if that's not the case,
logging the reason for the failure. This helps to avoid the risk of any
dangling slot or origin after the upgrade.
Author: Vignesh C, Julien Rouhaud, Shlok Kyal
Reviewed-by: Peter Smith, Masahiko Sawada, Michael Paquier, Amit Kapila, Hayato Kuroda
Discussion: https://postgr.es/m/20230217075433.u5mjly4d5cr4hcfe@jrouhaud
The brinbuildCallbackParallel callback used by parallel BRIN builds did
not consider that the parallel table scans may be synchronized, starting
from an arbitrary block and then wrap around.
If this happened and the scan actually did wrap around, tuples from the
beginning of the table were added to the last range produced by the same
worker. The index would be missing range at the beginning of the table,
while the last range would be too wide. This would not produce incorrect
query results, but it'd be less efficient.
Fixed by checking for both past and future ranges in the callback. The
worker may produce multiple summaries for the same page range, but the
leader will merge them as if the summaries came from different workers.
Discussion: https://postgr.es/m/c2ee7d69-ce17-43f2-d1a0-9811edbda6e6%40enterprisedb.com
Commit b437571714 added support for parallel builds for BRIN indexes,
using code similar to BTREE parallel builds, and also a new tuplesort
variant. This commit simplifies the new code in two ways:
* The "spool" grouping tuplesort and the heap/index is not necessary.
The heap/index are available as separate arguments, causing confusion.
So remove the spool, and use the tuplesort directly.
* The new tuplesort variant does not need the heap/index, as it sorts
simply by the range block number, without accessing the tuple data.
So simplify that too.
Initial report and patch by Ranier Vilela, further cleanup by me.
Author: Ranier Vilela
Discussion: https://postgr.es/m/CAEudQAqD7f2i4iyEaAz-5o-bf6zXVX-AkNUBm-YjUXEemaEh6A%40mail.gmail.com
Commit 16671ba6e7 moved the code that sends "sorry, too many clients
already" and other such messages, but it had the effect that we would
send that error even if the the startup packet processing failed, e.g.
because the client sent an invalid startup packet. That was not
intentional.
Spotted while reading the code again.
When enabled (default off), this logs a backtrace anytime elog() or an
equivalent ereport() for internal errors is called.
This is not well covered by the existing backtrace_functions, because
there are many equally-worded low-level errors in many functions. And
if you find out where the error is, then you need to manually rewrite
the elog() to ereport() to attach the errbacktrace(), which is
annoying. Having a backtrace automatically on every elog() call could
be very helpful during development for various kinds of common errors
from palloc, syscache, node support, etc.
Discussion: https://www.postgresql.org/message-id/flat/ba76c6bc-f03f-4285-bf16-47759cfcab9e@eisentraut.org
There are a lot of Perl scripts in the tree, mostly code generation
and TAP tests. Occasionally, these scripts produce warnings. These
are probably always mistakes on the developer side (true positives).
Typical examples are warnings from genbki.pl or related when you make
a mess in the catalog files during development, or warnings from tests
when they massage a config file that looks different on different
hosts, or mistakes during merges (e.g., duplicate subroutine
definitions), or just mistakes that weren't noticed because there is a
lot of output in a verbose build.
This changes all warnings into fatal errors, by replacing
use warnings;
by
use warnings FATAL => 'all';
in all Perl files.
Discussion: https://www.postgresql.org/message-id/flat/06f899fd-1826-05ab-42d6-adeb1fd5e200%40eisentraut.org
If the underlying table isn't being dumped, it's useless to dump
an extended statistics object; it'll just cause errors at restore.
We have always applied similar policies to, say, indexes.
(When and if we get cross-table stats objects, it might be profitable
to think a little harder about what to do with them. But for now
there seems no point in considering a stats object as anything but
an appendage of its table.)
Rian McGuire and Tom Lane, per report from Rian McGuire.
Back-patch to supported branches.
Discussion: https://postgr.es/m/7075d3aa-3f05-44a5-b68f-47dc6a8a0550@buildkite.com
This function was originally coded with a handmade expansion
of the array subscripts. We can do it a little faster and far
more legibly today, by using unnest() WITH ORDINALITY.
While at it, let's apply the rowcount estimation support that exists
for the underlying unnest() function: reduce the default ROWS estimate
to 100 and attach array_unnest_support. I'm not sure that
array_unnest_support can do anything useful today with the call sites
that exist in information_schema, but it can't hurt, and the existing
default rowcount of 1000 is surely much too high for any of these
cases.
The psql.sql regression script is using _pg_expandarray() as a
test case for \sf+. While we could keep doing so, the new one-line
function body makes a poor test case for \sf+ row-numbering, so
switch it to print another information_schema function.
Discussion: https://postgr.es/m/1424303.1703355485@sss.pgh.pa.us
add_file_to_manifest declared its mtime argument as pg_time_t,
apparently on the principle that copy-and-paste from the backend
is fine. However, the callers are passing struct stat's st_mtime
field which is plain time_t, and add_file_to_manifest itself is
passing the value to gmtime(3) which expects plain time_t,
so the whole thing would not work at all on any platform where
those types are different. Fortunately we can just switch this
variable to time_t.
Per warnings from assorted buildfarm members.
The code was passing a scalar argument to node->restart(), but it was
expecting a hash, which causes a warning from Perl ("Odd number of
elements in hash assignment").
But the node->restart() function doesn't take a mode argument anyway.
This was probably copied from an incorrect comment (see commit
750c59d7ec). The default restart mode is already "fast", so the test
should still be semantically correct without explicitly specifying the
mode.
Discussion: https://www.postgresql.org/message-id/e3f4bf1b-63d3-408a-b07e-d35a0fdf1b98@eisentraut.org
Recently-introduced code in reconstruct.c was using "unsigned"
to store the result of read(), pg_pread(), or write(). This is
completely bogus: it breaks subsequent tests for the result being
negative, as we're being reminded of by a chorus of buildfarm
warnings. Switch to "int" as was doubtless intended. (There are
several other uses of "unsigned" in this file that also look poorly
chosen to me, but for now I'm just trying to clean up the buildfarm.)
A larger problem is that "int" is not necessarily wide enough to hold
the result: per POSIX, all these functions return ssize_t. In places
where the requested read or write length clearly fits in int, that's
academic. It may be academic anyway as long as we constrain
individual data files to 1GB, since even a readv or writev-like
operation would then not be responsible for transferring more than
1GB. Nonetheless it seems like trouble waiting to happen, so I made
a pass over readv and writev calls and fixed the result variables
where that seemed appropriate. We might want to think about changing
some of the fd.c functions to return ssize_t too, for future-proofing;
but I didn't tackle that here.
Discussion: https://postgr.es/m/1672202.1703441340@sss.pgh.pa.us
I don't think there's a real problem here, because if we reach
the loop over 'tles' then we will either find at least one
TimeLineHistoryEntry such that oldest_segno != 0, in which case
unsummarized_lsn will be initialized, or else unsummarized_tli
will remain 0 and an error will occur before unsummarized_lsn
is used for anything. But some compilers are complainining, as
reported on list by Nathan Bossart and off-list by Andrew Dunstan.
Discussion: http://postgr.es/m/20231223215147.GA69623@nathanxps13
e0b1ee17dc introduced optimization for matching B-tree scan keys required for
the directional scan. However, it incorrectly assumed that all keys required
for opposite direction scan are satisfied by _bt_first(). It has been
illustrated that with multiple scan keys over the same column, a lesser one
(according to the scan direction) could win leaving the other one unsatisfied.
Instead of relying on _bt_first() this commit introduces code that memorizes
whether there was at least one match on the page. If that's true we know that
keys required for opposite-direction scan are satisfied as soon as
corresponding values are not NULLs.
Also, this commit simplifies the description for the optimization of keys
required for the current direction scan. Now the flag used for this is named
continuescanPrechecked and means exactly that *continuescan flag is known
to be true for the last item on the page.
Reported-by: Peter Geoghegan
Discussion: https://postgr.es/m/CAH2-Wzn0LeLcb1PdBnK0xisz8NpHkxRrMr3NWJ%2BKOK-WZ%2BQtTQ%40mail.gmail.com
Reviewed-by: Pavel Borisov
It's not necessary to keep the firstPage flag as a field of BTScanOpaqueData.
This commit makes it an argument of the _bt_readpage() function. We can easily
distinguish first-time and repeated calls (within the scan) of this function.
Reported-by: Peter Geoghegan
Discussion: https://postgr.es/m/CAH2-Wzk4SOsw%2BtHuTFiz8U9Jqj-R77rYPkhWKODCBb1mdHACXA%40mail.gmail.com
Reviewed-by: Pavel Borisov
There are a lot of situations when we share the same pointer to a Bitmapset
structure across different places. In order to evade undesirable side effects
replace_relid() function should always return a copy.
Reported-by: Richard Guo
Discussion: https://postgr.es/m/CAMbWs4_wJthNtYBL%2BSsebpgF-5L2r5zFFk6xYbS0A78GKOTFHw%40mail.gmail.com
Reviewed-by: Richard Guo, Andres Freund, Ashutosh Bapat, Andrei Lepikhov
set_config_option() bails out early if it detects that the option to
be set is PGC_BACKEND or PGC_SU_BACKEND class and we're reading the
config file in a postmaster child; we don't want to apply any new
value in such a case. That's fine as far as it goes, but it fails
to consider the requirements of the pg_file_settings view: for that,
we need to check validity of the value even though we have no
intention to apply it. Because we didn't, even very silly values
for affected GUCs would be reported as valid by the view. There
are only half a dozen such GUCs, which perhaps explains why this
got overlooked for so long.
Fix by continuing when changeVal is false; this parallels the logic
in some other early-exit paths.
Also, the check added by commit 924bcf4f1 to prevent GUC changes in
parallel workers seems a few bricks shy of a load: it's evidently
assuming that ereport(elevel, ...) won't return. Make sure we
bail out if it does. The lack of trouble reports suggests that
this is only a latent bug, i.e. parallel workers don't actually
reach here with elevel < ERROR. (Per the code coverage report,
we never reach here at all in the regression suite.) But we clearly
don't want to risk proceeding if that does happen.
Per report from Rıdvan Korkmaz. These are ancient bugs, so back-patch
to all supported branches.
Discussion: https://postgr.es/m/2089235.1703617353@sss.pgh.pa.us
Like commit 388e80132, use "#pragma GCC system_header" to silence
warnings appearing within the Python headers, since newer Python
versions no longer worry about some restrictions we still use like
-Wdeclaration-after-statement.
This patch improves on 388e80132 by inventing a separate wrapper
header file, allowing the pragma to be tightly scoped to just
the Python headers and not other stuff we have laying about in
plpython.h. I applied the same technique to plperl for the same
reason: the original patch suppressed warnings for a good deal
of our own code, not only the Perl headers.
Like the previous commit, back-patch to supported branches.
Peter Eisentraut and Tom Lane
Discussion: https://postgr.es/m/ae523163-6d2a-4b81-a875-832e48dec502@eisentraut.org
This will eventually be replaced once some translations exist
for pg_combinebackup's messages. In the meantime, we need
something here to prevent "make" from complaining in
builds with --enable-nls. Per advice added in 88dad06b4.
Self-join removal appears to be safe to apply with placeholder variables
as long as we handle PlaceHolderVar in replace_varno_walker() and replace
relid in phinfo->ph_lateral.
Discussion: https://postgr.es/m/18187-831da249cbd2ff8e%40postgresql.org
Author: Richard Guo
Reviewed-by: Andrei Lepikhov
This commit also retires sje_walker. This increases the generalty of replacing
varno in the parse tree and simplifies the code.
Discussion: https://postgr.es/m/18187-831da249cbd2ff8e%40postgresql.org
Author: Richard Guo
Reviewed-by: Andrei Lepikhov
Bhis commit introduces enhancements to the pg_stat_checkpointer view by adding
three new columns: restartpoints_timed, restartpoints_req, and
restartpoints_done. These additions aim to improve the visibility and
monitoring of restartpoint processes on replicas.
Previously, it was challenging to differentiate between successful and failed
restartpoint requests. This limitation arises because restartpoints on replicas
are dependent on checkpoint records from the primary, and cannot occur more
frequently than these checkpoints.
The new columns allow for clear distinction and tracking of restartpoint
requests, their triggers, and successful completions. This enhancement aids
database administrators and developers in better understanding and diagnosing
issues related to restartpoint behavior, particularly in scenarios where
restartpoint requests may fail.
System catalog is changed. Catversion is bumped.
Discussion: https://postgr.es/m/99b2ccd1-a77a-962a-0837-191cdf56c2b9%40inbox.ru
Author: Anton A. Melnikov
Reviewed-by: Kyotaro Horiguchi, Alexander Korotkov
Using a scale factor large enough so as the number of rows to insert
gets larger than INT32_MAX would cause an infinite loop in
initPopulateTable(), preventing pgbench to finish its initialization.
Oversight in e35cc3b3f2 that has refactored the data generation logic.
Author: John Hsu
Reviewed-by: Tatsuo Ishii, Japin Li
Discussion: https://postgr.es/m/CA+-JvFvHsOafjHcuFPfkyouHNZvbOXhBNhwZxKm3WNgYz9bwzA@mail.gmail.com
Commit 664d75753 pulled 010_tab_completion.pl's infrastructure for
invoking an interactive psql session out into a generally-useful test
function, but it didn't move enough stuff. We need to set up various
environment variables that readline will look at, both to ensure
stability of test results and to prevent test actions from cluttering
the calling user's ~/.psql_history. Expecting calling scripts to
remember to do that is too failure-prone: the other existing caller
001_password.pl did not do it. Hence, remove those initialization
steps from 010_tab_completion.pl and put them into interactive_psql().
Since interactive_psql was already making a local ENV hash, this has
no effect on calling scripts.
Discussion: https://postgr.es/m/794610.1703182896@sss.pgh.pa.us
When a column is dropped, the fields attacl, attoptions, and
attfdwoptions were kept unchanged. This is probably harmless, but it
seems wasteful, and leaves potentially dangling data lying around (for
example, attacl could contain references to users that are later also
dropped).
Change this to set those fields to null when a column is marked as
dropped.
Reviewed-by: Alvaro Herrera <alvherre@alvh.no-ip.org>
Discussion: https://www.postgresql.org/message-id/flat/249d819d-1763-4580-8110-0bf91a0f08b7@eisentraut.org
Up to now, our distribution tarballs have included a plain-text form
of the installation.sgml chapter. The rationale for that was that a
recipient might not have either ready internet access or HTML-viewing
tools; a theory that seems downright quaint today. Maintaining the
ability to generate this file is not without cost, because it puts
special requirements on installation.sgml that are often overlooked.
Moreover, we are moving in the direction of making our distribution
tarballs be pure git snapshots for traceability/reproducibility
reasons; including generated files doesn't fit into that plan.
Hence, let's just drop INSTALL and remove the infrastructure for
generating it. The top-level README will now recommend visiting
our website to see the installation instructions. As a useful
side-effect, we can get rid of README.git which has provoked
confusion.
Discussion: https://postgr.es/m/20231220114927.faccqqprmuyrzdip@alap3.anarazel.de
Discussion: https://postgr.es/m/e07408d9-e5f2-d9fd-5672-f53354e9305e@eisentraut.org
Commit 1301c80b21 removed some infrastructure needed to check
windows-oriented perl scripts. It also removed most such scripts, but
this one was left over. We repair the damage by making Win32::Registry a
conditional requirement that is only loaded on Windows. With this change
`perl -cw win32tzlist.pl` once again passes on non-Windows machines.
Discussion: https://postgr.es/m/a2bd77fd-61b8-4c2b-b12e-3e22ae260f82@eisentraut.org
Another oversight in dc2123400, visible when building/testing
in the source directory. (There's a lot of stuff we could
simplify if we stop supporting that case, but for now it's
still mainstream.)
This is necessary when spgcanreturn() is invoked on a partitioned
index, and the failure might be reachable in other scenarios as
well. The rest of what spgGetCache() does is perfectly sensible
for a partitioned index, so we should allow it to go through.
I think the main takeaway from this is that we lack sufficient test
coverage for non-btree partitioned indexes. Therefore, I added
simple test cases for brin and gin as well as spgist (hash and
gist AMs were covered already in indexing.sql).
Per bug #18256 from Alexander Lakhin. Although the known test case
only fails since v16 (3c569049b), I've got no faith at all that there
aren't other ways to reach this problem; so back-patch to all
supported branches.
Discussion: https://postgr.es/m/18256-0b0e1b6e4a620f1b@postgresql.org
Fix a bug during MERGE if a cross-partition update is attempted on a
partitioned table with a BEFORE DELETE ROW trigger that returns NULL,
to prevent the update. This would cause an error to be thrown, or an
assert failure in an assert-enabled build.
This was an oversight in 9321c79c86, which failed to properly
distinguish a DELETE prevented by a trigger from one prevented by a
concurrent update. Fix by having ExecDelete() return the TM_Result
status to ExecCrossPartitionUpdate(), so that it can distinguish the
two cases, and make ExecCrossPartitionUpdate() return the TM_Result
status to ExecUpdateAct(), so that it can return the correct status
from a concurrent update.
In addition, ensure that the command tag is correctly updated by
having ExecMergeMatched() pass canSetTag to ExecUpdateAct(), rather
than passing false, so that it updates the command tag if it does a
cross-partition update, making this code path in ExecMergeMatched()
consistent with ExecUpdate().
Per bug #18238 from Alexander Lakhin. Back-patch to v15, where MERGE
was introduced.
Dean Rasheed, reviewed by Richard Guo and Jian He.
Discussion: https://postgr.es/m/18238-2f2bdc7f720180b9%40postgresql.org
This is a function that makes a node jump by N WAL segments, which is
something a couple of tests have been relying on for some cases related
to streaming, replication slot limits and logical decoding on standbys.
Hence, this centralizes the logic, while making it cheaper by relying on
pg_logical_emit_message() to emit WAL records before switching to a new
segment.
Author: Bharath Rupireddy
Reviewed-by: Kyotaro Horiguchi, Euler Taveira
Discussion: https://postgr.es/m/CALj2ACU3R8QFCvDewHCMKjgb2w_-CMCyd6DAK=Jb-af14da5eg@mail.gmail.com
Commit 6af179395 added isCatalogRel field to some WAL record types,
but this field was not shown in the rmgr descriptions. This commit
changes the several rmgr descriptions to display the isCatalogRel
field.
Author: Bertrand Drouvot
Reviewed-by: Michael Paquier, Masahiko Sawada
Discussion: https://postgr.es/m/957dc8f9-2a02-4640-9c01-9dcbf97c4187%40gmail.com
--show-diff becomes --diff, and --silent-diff becomes --check. These
options may now be given together. Without --check, --diff will exit
with a zero status even if diffs are found. With --check, it will now
exit with a non-zero status in that case.
Author: Tristan Partin
Reviewed-by: Daniel Gustafsson, Jelte Fennema-Nio
Discussion: https://postgr.es/m/CXLX2XYTH9S6.140SC6Y61VD88@neon.tech
The pg_dump compression was using strdup() instead of pg_strdup()
and failed to check the returned pointer for out-of-memory before
dereferencing it. Fix by using pg_strdup() instead which probably
was the intention here in the original patch.
Backpatch to v16 where pg_dump compression was introduced.
Reviewed-by: Tristan Partin <tristan@neon.tech>
Reviewed-by: Nathan Bossart <nathandbossart@gmail.com>
Discussion: https://postgr.es/m/CC661D60-6F4C-474D-B9CF-E789ACA3CEFC@yesql.se
Backpatch-through: 16
To take an incremental backup, you use the new replication command
UPLOAD_MANIFEST to upload the manifest for the prior backup. This
prior backup could either be a full backup or another incremental
backup. You then use BASE_BACKUP with the INCREMENTAL option to take
the backup. pg_basebackup now has an --incremental=PATH_TO_MANIFEST
option to trigger this behavior.
An incremental backup is like a regular full backup except that
some relation files are replaced with files with names like
INCREMENTAL.${ORIGINAL_NAME}, and the backup_label file contains
additional lines identifying it as an incremental backup. The new
pg_combinebackup tool can be used to reconstruct a data directory
from a full backup and a series of incremental backups.
Patch by me. Reviewed by Matthias van de Meent, Dilip Kumar, Jakub
Wartak, Peter Eisentraut, and Álvaro Herrera. Thanks especially to
Jakub for incredibly helpful and extensive testing.
Discussion: http://postgr.es/m/CA+TgmoYOYZfMCyOXFyC-P+-mdrZqm5pP2N7S-r0z3_402h9rsA@mail.gmail.com
When active, this process writes WAL summary files to
$PGDATA/pg_wal/summaries. Each summary file contains information for a
certain range of LSNs on a certain TLI. For each relation, it stores a
"limit block" which is 0 if a relation is created or destroyed within
a certain range of WAL records, or otherwise the shortest length to
which the relation was truncated during that range of WAL records, or
otherwise InvalidBlockNumber. In addition, it stores a list of blocks
which have been modified during that range of WAL records, but
excluding blocks which were removed by truncation after they were
modified and never subsequently modified again.
In other words, it tells us which blocks need to copied in case of an
incremental backup covering that range of WAL records. But this
doesn't yet add the capability to actually perform an incremental
backup; the next patch will do that.
A new parameter summarize_wal enables or disables this new background
process. The background process also automatically deletes summary
files that are older than wal_summarize_keep_time, if that parameter
has a non-zero value and the summarizer is configured to run.
Patch by me, with some design help from Dilip Kumar and Andres Freund.
Reviewed by Matthias van de Meent, Dilip Kumar, Jakub Wartak, Peter
Eisentraut, and Álvaro Herrera.
Discussion: http://postgr.es/m/CA+TgmoYOYZfMCyOXFyC-P+-mdrZqm5pP2N7S-r0z3_402h9rsA@mail.gmail.com
First, mark the xlblocks member with InvalidXLogRecPtr, then issue a
write barrier, then initialize it. That ensures that the xlblocks
member doesn't appear valid while the contents are being initialized.
In preparation for reading WAL buffer contents without a lock.
Author: Bharath Rupireddy
Discussion: https://postgr.es/m/CALj2ACVfFMfqD5oLzZSQQZWfXiJqd-NdX0_317veP6FuB31QWA@mail.gmail.com
Reviewed-by: Andres Freund
This commit removes all the scripts located in src/tools/msvc/ to build
PostgreSQL with Visual Studio on Windows, meson becoming the recommended
way to achieve that. The scripts held some information that is still
relevant with meson, information kept and moved to better locations.
Comments that referred directly to the scripts are removed.
All the documentation still relevant that was in install-windows.sgml
has been moved to installation.sgml under a new subsection for Visual.
All the content specific to the scripts is removed. Some adjustments
for the documentation are planned in a follow-up set of changes.
Author: Michael Paquier
Reviewed-by: Peter Eisentraut, Andres Freund
Discussion: https://postgr.es/m/ZQzp_VMJcerM1Cs_@paquier.xyz
This makes it possible for the code to be easily reused by other
client-side tools, and/or by the server.
Patch by me. Review of this patch in particular by at least Peter
Eisentraut; reviewers for the patch series in general include Dilip
Kumar, Andres Fruend, David Steele, Álvaro Herrera, and Jakub Wartak.
Discussion: http://postgr.es/m/CA+TgmoZ6UGZVnSy5iak6s6+AXu_DewXovDjhLs3-su6nmU_x_g@mail.gmail.com
It's at least theoretically possible to overflow int32 when adding up
column width estimates to make a row width estimate. (The bug example
isn't terribly convincing as a real use-case, but perhaps wide joins
would provide a more plausible route to trouble.) This'd lead to
assertion failures or silly planner behavior. To forestall it, make
the relevant functions compute their running sums in int64 arithmetic
and then clamp to int32 range at the end. We can reasonably assume
that MaxAllocSize is a hard limit on actual tuple width, so clamping
to that is simply a correction for dubious input values, and there's
no need to go as far as widening width variables to int64 everywhere.
Per bug #18247 from RekGRpth. There've been no reports of this issue
arising in practical cases, so I feel no need to back-patch.
Richard Guo and Tom Lane
Discussion: https://postgr.es/m/18247-11ac477f02954422@postgresql.org
- Remove MemoryContextAllocZeroAligned(). It was supposed to be a
faster version of MemoryContextAllocZero(), but modern compilers turn
the MemSetLoop() into a call to memset() anyway, making it more or
less identical to MemoryContextAllocZero(). That was the only user of
MemSetTest, MemSetLoop, so remove those too, as well as palloc0fast().
- Convert newNode() to a static inline function. When this was
originally originally written, it was written as a macro because
testing showed that gcc didn't inline the size check as we
intended. Modern compiler versions do, and now that it just calls
palloc0() there is no size-check to inline anyway.
One nice effect is that the palloc0() takes one less argument than
MemoryContextAllocZeroAligned(), which saves a few instructions in the
callers of newNode().
Reviewed-by: Peter Eisentraut, Tom Lane, John Naylor, Thomas Munro
Discussion: https://www.postgresql.org/message-id/b51f1fa7-7e6a-4ecc-936d-90a8a1659e7c@iki.fi
Currently, we give a misleading error if the user omits to pass the
required parameter 'proto_version' in SQL API
pg_logical_slot_get_changes() or during START_REPLICATION protocol
message. The error raised is as follows which indicates that the wrong
version is passed.
ERROR: client sent proto_version=0 but server only supports protocol 1 or higher
Author: Emre Hasegeli
Reviewed-by: Peter Smith, Amit Kapila
Discussion: https://postgr.es/m/CAE2gYzwdwtUbs-tPSV-QBwgTubiyGD2ZGsSnAVsDfAGGLDrGOA@mail.gmail.com
The value was double in the original implementation of this logic.
Commit da08a6598 pulled it out into a subroutine, but carelessly
declared the parameter as int when it should have been double.
On most platforms, the only ill effect would be to clamp the value
to be not more than INT_MAX, which would seldom be exceeded and
probably wouldn't change the estimates too much anyway. Nonetheless,
it's wrong and can cause complaints from ubsan.
While here, improve the comments and parameter names.
This is an ABI change in a globally exposed subroutine, so
back-patching would create some risk of breaking extensions.
The value of the fix doesn't seem high enough to warrant taking
that risk, so fix in HEAD only.
Per report from Alexander Lakhin.
Discussion: https://postgr.es/m/f5e15fe1-202d-1936-f47c-f0c69a936b72@gmail.com
Presently, all platforms implement atomic exchanges by performing
an atomic compare-and-swap in a loop until it succeeds. This can
be especially expensive when there is contention on the atomic
variable. This commit optimizes atomic exchanges on many platforms
by using compiler intrinsics, which should compile into something
much less expensive than a compare-and-swap loop. Since these
intrinsics have been available for some time, the inline assembly
implementations are omitted.
Suggested-by: Andres Freund
Reviewed-by: Andres Freund
Discussion: https://postgr.es/m/20231129212905.GA1258737%40nathanxps13
Commit dc3f9bc549 mainly targeted the JSONTYPE_NUMERIC code path.
This commit applies similar optimizations (e.g., removing
unnecessary runtime calls to strlen() and palloc()) to nearby code.
Reviewed-by: Tom Lane
Discussion: https://postgr.es/m/20231208203708.GA4126315%40nathanxps13
smgrreadv() and smgrwritev() and their md.c implementations call
FileReadV() and FileWriteV(). A range of disk blocks beginning at
'blocknum' and extending for 'nblocks' can be scattered to or gathered
from multiple buffers with a single system call. The traditional
smgrread() and smgrwrite() functions are implemented in terms of the new
functions.
Later commits will introduce calls with nblocks > 1, but the following
behavioral changes can be seen already:
* After a short transfer we'll now retry until we eventually read 0
bytes (= EOF) or get ENOSPC, EDQUOT, EFBIG etc, where previously we
would infer the reason. Retrying is consistent with xlog.c's
treatment of large WAL writes, and arguably also xlog.c and fd.c's
treatment of EINTR. Arbitrary short returns for larger transfers have
been observed on several OSes, and might in theory also happen for
transient reasons with our own pg_p*v() fallback code.
* After unexpected EOF or -1, the error thrown now talks about
a range even for the single block case, eg "blocks 42..42".
Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi>
Discussion: https://postgr.es/m/CA+hUKGJkOiOCa+mag4BF+zHo7qo=o9CFheB8=g6uT5TUm2gkvA@mail.gmail.com
Originally, this routine relied on track_io_timing to check if a time
interval for an I/O operation stored in pg_stat_io should be initialized
or not. However, the addition of WAL statistics to pg_stat_io requires
that the initialization happens when track_wal_io_timing is enabled,
which is dependent on the code path where the I/O operation happens.
Author: Nazir Bilal Yavuz
Discussion: https://postgr.es/m/CAN55FZ3AiQ+ZMxUuXnBpd0Rrh1YhwJ5FudkHg=JU0P+-W8T4Vg@mail.gmail.com
During the development that led to commit 357889eb17, for a time we
had the value LIMIT_OPTION_DEFAULT, which was mostly but not completely
removed later on, before commit. Complete the removal now.
Author: Zhang Mingli <avamingli@gmail.com>
Discussion: https://postgr.es/m/59d61a1a-3858-475a-964f-24468c97cc67@Spark
Previously smgrprefetch() could issue POSIX_FADV_WILLNEED advice for a
single block at a time. Add an nblocks argument so that we can do the
same for a range of blocks. This usually produces a single system call,
but might need to loop if it crosses a segment boundary. Initially it
is only called with nblocks == 1, but proposed patches will make wider
calls.
Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi> (earlier version)
Discussion: https://postgr.es/m/CA+hUKGJkOiOCa+mag4BF+zHo7qo=o9CFheB8=g6uT5TUm2gkvA@mail.gmail.com
In v16 and up (since commit afbfc0298), large object ownership
checking has been broken because object_ownercheck() didn't take care
of the discrepancy between our object-address representation of large
objects (classId == LargeObjectRelationId) and the catalog where their
ownership info is actually stored (LargeObjectMetadataRelationId).
This resulted in failures such as "unrecognized class ID: 2613"
when trying to update blob properties as a non-superuser.
Poking around for related bugs, I found that AlterObjectOwner_internal
would pass the wrong classId to the PostAlterHook in the no-op code
path where the large object already has the desired owner. Also,
recordExtObjInitPriv checked for the wrong classId; that bug is only
latent because the stanza is dead code anyway, but as long as we're
carrying it around it should be less wrong. These bugs are quite old.
In HEAD, we can reduce the scope for future bugs of this ilk by
changing AlterObjectOwner_internal's API to let the translation happen
inside that function, rather than requiring callers to know about it.
A more bulletproof fix, perhaps, would be to start using
LargeObjectMetadataRelationId as the dependency and object-address
classId for blobs. However that has substantial risk of breaking
third-party code; even within our own code, it'd create hassles
for pg_dump which would have to cope with a version-dependent
representation. For now, keep the status quo.
Discussion: https://postgr.es/m/2650449.1702497209@sss.pgh.pa.us
Dead tuples are ignored and are not marked as dead during recovery, as
it can lead to MVCC issues on a standby because its xmin may not match
with the primary. This information is tracked by a field called
"xactStartedInRecovery" in the transaction state data, switched on when
starting a transaction in recovery.
Unfortunately, this information was not correctly tracked when starting
a subtransaction, because the transaction state used for the
subtransaction did not update "xactStartedInRecovery" based on the state
of its parent. This would cause index scans done in subtransactions to
return inconsistent data, depending on how the xmin of the primary
and/or the standby evolved.
This is broken since the introduction of hot standby in efc16ea520, so
backpatch all the way down.
Author: Fei Changhong
Reviewed-by: Kyotaro Horiguchi
Discussion: https://postgr.es/m/tencent_C4D907A5093C071A029712E73B43C6512706@qq.com
Backpatch-through: 12
FileReadV() and FileWriteV() adapt pg_preadv() and pg_pwritev() for
fd.c's virtual file descriptors. The simple FileRead() and FileWrite()
functions are now implemented in terms of the vectored functions, to
avoid code duplication, and they are converted back to the corresponding
simple system calls further down (commit 15c9ac36). Later work will
make more interesting multi-iovec calls.
The traditional behavior of reporting a "fake" ENOSPC error is
simplified. It's now always set for non-failing writes, for the benefit
of callers that expect to log a meaningful "%m" if they determine that
the write was short. (Perhaps we should consider getting rid of that
expectation one day.)
Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi>
Discussion: https://postgr.es/m/CA+hUKGJkOiOCa+mag4BF+zHo7qo=o9CFheB8=g6uT5TUm2gkvA@mail.gmail.com
compute_remaining_iovec() is a re-usable routine for retrying after
pg_readv() or pg_writev() reports a short transfer. This will gain new
users in a later commit, but can already replace the open-coded
equivalent code in the existing pg_pwritev_with_retry() function.
Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi>
Discussion: https://postgr.es/m/CA+hUKGJkOiOCa+mag4BF+zHo7qo=o9CFheB8=g6uT5TUm2gkvA@mail.gmail.com
These two macros wouldn't work if used in an inline function definition
in a header seen by g++, because __builtin_types_compatible_p is only
available in C. Redirect to standard C++ const_cast (which also
adds/removes volatile despite its name).
Per cpluspluscheck failure in a development branch.
Suggested-by: Peter Eisentraut <peter@eisentraut.org>
Discussion: https://postgr.es/m/CA%2BhUKGK3OXFjkOyZiw-DgL2bUqk9by1uGuCnViJX786W%2BfyDSw%40mail.gmail.com
OpenSSL will sometimes return SSL_ERROR_SYSCALL without having set
errno; this is apparently a reflection of recv(2)'s habit of not
setting errno when reporting EOF. Ensure that we treat such cases
the same as read EOF. Previously, we'd frequently report them like
"could not accept SSL connection: Success" which is confusing, or
worse report them with an unrelated errno left over from some
previous syscall.
To fix, ensure that errno is zeroed immediately before the call,
and report its value only when it's not zero afterwards; otherwise
report EOF.
For consistency, I've applied the same coding pattern in libpq's
pqsecure_raw_read(). Bare recv(2) shouldn't really return -1 without
setting errno, but in case it does we might as well cope.
Per report from Andres Freund. Back-patch to all supported versions.
Discussion: https://postgr.es/m/20231208181451.deqnflwxqoehhxpe@awork3.anarazel.de
This removes the production json_encoding_clause_opt, instead merging
it into json_format_clause. Also remove the auxiliary
makeJsonEncoding() function.
Reviewed-by: Amit Langote <amitlangote09@gmail.com>
Discussion: https://postgr.es/m/202312071841.u2gueb5dsrbk%40alvherre.pgsql
This GUC was intended as a debugging help in the 9.0 area when hot
standby and streaming replication were being developped, able to offer
more information at LOG level rather than DEBUGn. There are more tools
available these days that are able to offer rather equivalent
information, like pg_waldump introduced in 9.3. It is not obvious how
this facility is useful these days, so let's remove it.
Author: Bharath Rupireddy
Discussion: https://postgr.es/m/ZXEXEAUVFrvpquSd@paquier.xyz
The apply worker needs to update the state of the subscription tables to
'READY' during the synchronization phase which requires locking the
corresponding subscription. The apply worker also waits for the
subscription tables to reach the 'SYNCDONE' state after holding the locks
on the subscription and the wait is done using WaitLatch. The 'SYNCDONE'
state is changed by tablesync workers again by locking the corresponding
subscription. Both the state updates use AccessShareLock mode to lock the
subscription, so they can't block each other. However, a backend can
simultaneously try to acquire a lock on the same subscription using
AccessExclusiveLock mode to alter the subscription. Now, the backend's
wait on a lock can sneak in between the apply worker and table sync worker
causing deadlock.
In other words, apply_worker waits for tablesync worker which waits for
backend, and backend waits for apply worker. This is not detected by the
deadlock detector because apply worker uses WaitLatch.
The fix is to release existing locks in apply worker before it starts to
wait for tablesync worker to change the state.
Reported-by: Tomas Vondra
Author: Shlok Kyal
Reviewed-by: Amit Kapila, Peter Smith
Backpatch-through: 12
Discussion: https://postgr.es/m/d291bb50-12c4-e8af-2af2-7bb9bb4d8e3e@enterprisedb.com
Remove comments that supposed that holding a pin was a useful interlock
for _bt_walk_left(). There are times when _bt_walk_left() doesn't hold
either a lock or a pin on any page, so clearly this can't be true.
_bt_walk_left() is even prepared to deal with concurrent deletion of
both the original page and any pages to its left.
Oversight in commit 2ed5b87f96.
This commit does the following:
* In datum_to_json_internal(), the call to IsValidJsonNumber() is
replaced with simplified validation code. This avoids an extra
call to strlen() in this path, and it avoids validating the
entire string (which is okay since we know we're dealing with a
numeric data type's output).
* In datum_to_json_internal(), the call to escape_json() in the
JSONTYPE_NUMERIC path is replaced with code that just surrounds
the string with quotes. In passing, some other nearby calls to
appendStringInfo() have been replaced with similar code to avoid
unnecessary calls to vsnprintf().
* In composite_to_json(), the length of the separator is now
determined at compile time to avoid unnecessary calls to
strlen().
On my machine, this speeds up a benchmark for the proposed COPY TO
(FORMAT json) command with many integers by upwards of 20%. There
are likely other code paths that could be given a similar
treatment, but that is left as a future exercise.
Reviewed-by: Jeff Davis, Tom Lane, David Rowley, John Naylor
Discussion: https://postgr.es/m/20231207231251.GB3359478%40nathanxps13
Teach _bt_binsrch (and related helper routines like _bt_search and
_bt_compare) about the initial positioning requirements of backward
scans. Routines like _bt_binsrch already know all about "nextkey"
searches, so it seems natural to teach them about "goback"/backward
searches, too. These concepts are closely related, and are much easier
to understand when discussed together.
Now that certain implementation details are hidden from _bt_first, it's
straightforward to add a new optimization: backward scans using the <
strategy now avoid extra leaf page accesses in certain "boundary cases".
Consider the following example, which uses the tenk1 table (and its
tenk1_hundred index) from the standard regression tests:
SELECT * FROM tenk1 WHERE hundred < 12 ORDER BY hundred DESC LIMIT 1;
Before this commit, nbtree would scan two leaf pages, even though it was
only really necessary to scan one leaf page. We'll now descend straight
to the leaf page containing a (12, -inf) high key instead. The scan
will locate matching non-pivot tuples with "hundred" values starting
from the value 11. The scan won't waste a page access on the right
sibling leaf page, which cannot possibly contain any matching tuples.
You can think of the optimization added by this commit as disabling an
optimization (the _bt_compare "!pivotsearch" behavior that was added to
Postgres 12 in commit dd299df8) for a small subset of cases where it was
always counterproductive.
Equivalently, you can think of the new optimization as extending the
"pivotsearch" behavior that page deletion by VACUUM has long required
(since the aforementioned Postgres 12 commit went in) to other, similar
cases. Obviously, this isn't strictly necessary for these new cases
(unlike VACUUM, _bt_first is prepared to move the scan to the left once
on the leaf level), but the underlying principle is the same.
Author: Peter Geoghegan <pg@bowt.ie>
Reviewed-By: Matthias van de Meent <boekewurm+postgres@gmail.com>
Discussion: https://postgr.es/m/CAH2-Wz=XPzM8HzaLPq278Vms420mVSHfgs9wi5tjFKHcapZCEw@mail.gmail.com
Allow using multiple worker processes to build BRIN index, which until
now was supported only for BTREE indexes. For large tables this often
results in significant speedup when the build is CPU-bound.
The work is split in a simple way - each worker builds BRIN summaries on
a subset of the table, determined by the regular parallel scan used to
read the data, and feeds them into a shared tuplesort which sorts them
by blkno (start of the range). The leader then reads this sorted stream
of ranges, merges duplicates (which may happen if the parallel scan does
not align with BRIN pages_per_range), and adds the resulting ranges into
the index.
The number of duplicate results produced by workers (requiring merging
in the leader process) should be fairly small, thanks to how parallel
scans assign chunks to workers. The likelihood of duplicate results may
increase for higher pages_per_range values, but then there are fewer
page ranges in total. In any case, we expect the merging to be much
cheaper than summarization, so this should be a win.
Most of the parallelism infrastructure is a simplified copy of the code
used by BTREE indexes, omitting the parts irrelevant for BRIN indexes
(e.g. uniqueness checks).
This also introduces a new index AM flag amcanbuildparallel, determining
whether to attempt to start parallel workers for the index build.
Original patch by me, with reviews and substantial reworks by Matthias
van de Meent, certainly enough to make him a co-author.
Author: Tomas Vondra, Matthias van de Meent
Reviewed-by: Matthias van de Meent
Discussion: https://postgr.es/m/c2ee7d69-ce17-43f2-d1a0-9811edbda6e6%40enterprisedb.com
When building BRIN indexes, the brinbuildCallback only advances to the
next page range when seeing a tuple that doesn't belong to the current
one. This means that the index may end up missing ranges at the end of
the table, if those pages do not contain any indexable tuples.
We tend not to have completely empty pages at the end of a relation, but
this also applies to partial indexes, where the tuples may simply not
match the index predicate. This results in inefficient scans using the
affected BRIN index - without the summaries, the page ranges have to be
read and processed, which consumes I/O and possibly also CPU time.
The existing code already added empty ranges for earlier parts of the
table, this commit makes sure we add them for the ranges at the end of
the table too.
Patch by Matthias van de Meent, with review/improvements by me.
Author: Matthias van de Meent
Reviewed-by: Tomas Vondra
Discussion: https://postgr.es/m/CAEze2WiMsPZg%3DxkvSF_jt4%3D69k6K7gz5B8V2wY3gCGZ%2B1BzCbQ%40mail.gmail.com
Commit 252dcb3239 introduced initdb template caching to speed up
tests by re-using initdb output. The initdb command didn't however
use the --no-clean option to preserve generated data in case initdb
crashes unlike pg_regress which does do this. This adds the option
to initdb to aid debugging.
While changing the commandline, switch to using long options for
initdb to make the code more self-documenting.
Author: Matthias van de Meent <boekewurm+postgres@gmail.com>
Discussion: https://postgr.es/m/CAEze2WhSTjfK_M+Ea4GSQp8odrEOaQS8HyORd1TJUEiyXaB+rw@mail.gmail.com
This works today, and it's valuable to ensure it doesn't get broken
if/when we get around to refactoring the implementation.
Author: Nikolay Shaplov <dhyan@nataraj.su>
Discussion: https://postgr.es/m/4563991.km65PDbjlG@thinkpad-pgpro
While checking if a record could fit in the circular WAL decoding
buffer, the coding from commit 3f1ce973 used arithmetic that could
overflow. 64 bit systems were unaffected for various technical reasons,
which probably explains the lack of problem reports. Likewise for 32
bit systems running known 32 bit kernels. The systems at risk of
problems appear to be 32 bit processes running on 64 bit kernels, with
unlucky placement in memory.
Per complaint from GCC -fsanitize=undefined -m32, while testing
variations of 039_end_of_wal.pl.
Back-patch to 15.
Reviewed-by: Nathan Bossart <nathandbossart@gmail.com>
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Discussion: https://postgr.es/m/CA%2BhUKGKH0oRPOX7DhiQ_b51sM8HqcPp2J3WA-Oen%3DdXog%2BAGGQ%40mail.gmail.com
During a pg_upgrade test using an old dump, all references to the old
regress shared library path (so, dylib or dll) are updated to point to
the library path used by the new build, to ensure a consistent
comparison between the old and new dumps.
The test previously relied on a hardcoded value of "src/test/regress/"
to build the new path value, which would point to an incorrect location
for the meson and vpath builds. This is replaced by REGRESS_SHLIB, able
to point to the correct location of the regress shared library.
Author: Alexander Lakhin
Discussion: https://postgr.es/m/a628d8ad-a08a-2eab-4ca9-641bc82d3193@gmail.com
Backpatch-through: 15
tts_virtual_copyslot() contained an Assert that checked that the srcslot
contained <= attributes than the dstslot. This seems to be backwards as
if the srcslot contained fewer attributes then the dstslot could be left
with stale Datum values from the previously stored tuple. It might make
more sense to allow the source to contain additional attributes and only
copy the leading ones that also exist in the destination, however, that's
not what we're doing here.
Here we just remove the Assert from tts_virtual_copyslot() and add an
Assert to ExecCopySlot() to verify the attribute counts match. There
does not seem to be any places where the destination contains fewer
attributes, so instead of going to the trouble of making the code
properly handle this, let's just ensure the attribute counts match. If
this Assert fails then that will indicate that we do have cases that
require us to handle the srcslot with more attributes than the dstslot.
It seems better to only write this code if there's a genuine requirement
for it rather than write it now only to leave it untested.
Thanks to Andres Freund for helping with the analysis of this.
Discussion: https://postgr.es/m/CAApHDvpMAvBL0T+TRORquyx1iqFQKMVTXtqNocOw0Pa2uh1heg@mail.gmail.com
An invalid index is skipped when doing REINDEX CONCURRENTLY at table
level, with INDEX_CORRUPTED used as errcode. This is confusing,
because an invalid index could exist after an interruption. The errcode
is switched to OBJECT_NOT_IN_PREREQUISITE_STATE instead, as per a
suggestion from Andres Freund.
While on it, the error messages are reworded, and a hint is added,
telling how to rebuild an invalid index in this case. This has been
suggested by Noah Misch.
Discussion: https://postgr.es/m/20231118230958.4fm3fhk4ypshxopa@awork3.anarazel.de
The commit 29d0a77fa6 labelled binary_upgrade_logical_slot_has_caught_up()
as a non-strict function to allow providing a better error message to callers
in case the passed slot_name is NULL. On further discussion, it seems that
it is not helpful to have a different error message for NULL input in this
function, so this patch marks the function as strict.
This patch also removes the explicit permission check to use replication
slots as this function is invoked only by superusers and instead adds an
Assert.
Reported-by: Masahiko Sawada
Author: Hayato Kuroda
Reviewed-by: Vignesh C
Discussion: https://postgr.es/m/CAD21AoDSyiBKkMXBxN_gUayZZUCOgyHnG8Ge8rcPXNP3Tf6B4g@mail.gmail.com
A REINDEX CONCURRENTLY run on a table with no indexes would always pop
the topmost snapshot from the active snapshot stack, making the snapshot
handling inconsistent between the multiple-relation and single-relation
cases. This commit slightly changes the snapshot stack handling so as a
snapshot is popped only ReindexMultipleInternal() in this case after a
relation has been reindexed, fixing a problem where an event trigger
function may need a snapshot but does not have one. This also keeps the
places where PopActiveSnapshot() is called closer to each other.
While on it, this expands the existing tests to cover all the cases that
could be faced with REINDEX commands and such event triggers, for one or
more relations, with or without indexes.
This behavior is inconsistent since 5dc92b844e, but we've never had a
need for an active snapshot at the end of a REINDEX until now.
Thanks also to Jian He for the input.
Reported-by: Alexander Lakhin
Discussion: https://postgr.es/m/cb538743-484c-eb6a-a8c5-359980cd3a17@gmail.com
pg_test_fsync's signal_cleanup() intentionally ignores the write()
result since there's not much we could do about it, but certain
compilers make that harder than it ought to be.
This was missed in commit 52e98d4502.
Reviewed-by: Tristan Partin, Peter Eisentraut
Discussion: https://postgr.es/m/20231206161839.GA2828158%40nathanxps13
This commit removes the restriction that would not apply filters to the
dumps used for comparison in the TAP test of pg_upgrade when using the
same base version for the old and new nodes.
The previous logic would fail on Windows if loading a dump while using
the same set of binaries for the old and new nodes, as the library
dependencies updated in the old dump would append CRLFs to the dump
file as it is treated as a text file. The dump filtering logic replaces
all CRLFs (\r\n) by LFs (\n), which is able to prevent this issue.
When the old and new versions of the binaries are the same,
AdjustUpgrade removes all blank lines, removes version-based comments
generated by pg_dump and replaces CRLFs by LFs.
Reported-by: Alexander Lakhin
Discussion: https://postgr.es/m/60d434b9-53d9-9ea1-819b-efebdcf44e41@gmail.com
Backpatch-through: 15
The old names were too generic, and would have applied to any binary
that made use of JsonManifestParseContext. Rename to make the names
specific to pg_verifybackup, since there are plans afoot to reuse
this infrastructure.
Per suggestion from Álvaro Herrra.
Discussion: http://postgr.es/m/202311131625.o7hzq3oukuyd@alvherre.pgsql
There is no real reason to just run multiple words together when
we can instead punctuate them with marks intended for that purpose.
Per suggestion from Álvaro Herrera.
Discussion: http://postgr.es/m/202311161021.nisg7imt7kyf@alvherre.pgsql
When preparing commit 98e675ed7a I accidentally forgot to run
pgindent, which did produce a diff. Fix by adding the required
whitespace per the koel buildfarm failure.
Commit 5a991ef869 accidentally reversed the order of the tuples
and fields parameters, making the error message incorrectly refer
to 3 tuples with 1 field when IDENTIFY_SYSTEM returns 1 tuple and
3 or 4 fields. Fix by changing the order of the parameters. This
also adds a comment describing why we check for < 3 when postgres
since 9.4 has been sending 4 fields.
Backpatch all the way since the bug is almost a decade old.
Author: Tomonari Katsumata <t.katsumata1122@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Bug: #18224
Backpatch-through: v12
The logic to keep the libpq command queue in sync with queries that have
been processed had a bug when errors were returned for reasons other
than problems in queries -- for example, when a connection is lost. We
incorrectly consumed an element from the command queue every time, but
this is wrong and can lead to the queue becoming empty ahead of time,
leading to later malfunction: PQgetResult would return nothing,
potentially causing the calling application to enter a busy loop.
Fix by making the SYNC queue element a barrier that can only be consumed
when a SYNC message is received.
Backpatch to 14.
Reported by: Иван Трофимов (Ivan Trofimov) <i.trofimow@yandex.ru>
Discussion: https://postgr.es/m/17948-fcace7557e449957@postgresql.org
The failed test was trying to validate that the two_phase option for a
slot is retained after the upgrade. The problem was that it didn't get
enabled before the upgrade so expecting to be enabled after the upgrade
was not right in the first place. This is a timing issue because we enable
the two_phase once the initial sync for all the tables is finished.
Normally, we wait for the two_phase state to be enabled but this test
missed that step.
Per buildfarm.
Author: Hayato Kuroda
Discussion: https://postgr.es/m/TY3PR01MB98892396D1071FC4320D6B31F586A@TY3PR01MB9889.jpnprd01.prod.outlook.com
Presently, pg_convert() allocates a new bytea and copies the result
regardless of whether any conversion actually happened. This
commit adjusts this function to return the source pointer as-is if
no conversion occurred. This optimization isn't expected to make a
tremendous difference, but it still seems worthwhile to avoid
unnecessary memory allocations.
Author: Yurii Rashkovskii
Reviewed-by: Bertrand Drouvot
Discussion: https://postgr.es/m/CA%2BRLCQyknBPSWXRBQGOi6aYEcdQ9RpH9Kch4GjoeY8dQ3D%2Bvhw%40mail.gmail.com
After commit fd5e8b440d, InitProcess() is called later in the
EXEC_BACKEND startup sequence, so it's enough to set the
am_autovacuum_[launcher|worker] variables at the same place as in the
!EXEC_BACKEND case.
It draws an unnecessary error in builds compiled without thread support.
Added by commit 038f586d5f, which was backpatched to 14; though in
branch master we no longer support such builds, there's no reason to
have this there, so remove it in all branches since 14.
Reported-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/ZW2G9Ix4nBKLcSSO@paquier.xyz