Commit Graph

43186 Commits

Author SHA1 Message Date
Amit Langote b4fad46b6b SQL/JSON: Improve some error messages
This improves some error messages emitted by SQL/JSON query functions
by mentioning column name when available, such as when they are
invoked as part of evaluating JSON_TABLE() columns.  To do so, a new
field column_name is added to both JsonFuncExpr and JsonExpr that is
only populated when creating those nodes for transformed JSON_TABLE()
columns.

While at it, relevant error messages are reworded for clarity.

Reported-by: Jian He <jian.universality@gmail.com>
Suggested-by: Jian He <jian.universality@gmail.com>
Discussion: https://postgr.es/m/CACJufxG_e0QLCgaELrr2ZNz7AxPeGCNKAORe3fHtFCQLsH4J4Q@mail.gmail.com
2024-04-18 14:45:48 +09:00
Alexander Korotkov 40126ac68f Refactoring for CommitTransactionCommand()/AbortCurrentTransaction()
fefd9a3fed turned tail recursion of CommitTransactionCommand() and
AbortCurrentTransaction() into iteration.  However, it splits the handling of
cases between different functions.

This commit puts the handling of all the cases into
AbortCurrentTransactionInternal() and CommitTransactionCommandInternal().
Now CommitTransactionCommand() and AbortCurrentTransaction() are just doing
the repeated calls of internal functions.

Reported-by: Andres Freund
Discussion: https://postgr.es/m/20240415224834.w6piwtefskoh32mv%40awork3.anarazel.de
Author: Andres Freund
2024-04-18 00:29:53 +03:00
Andres Freund 3ab8cf9275 Remove GlobalVisTestNonRemovable[Full]Horizon, not used anymore
GlobalVisTestNonRemovableHorizon() was only used for the implementation of
snapshot_too_old, which was removed in f691f5b80a. As using
GlobalVisTestNonRemovableHorizon() is not particularly efficient, no new uses
for it should be added. Therefore remove.

Discussion: https://postgr.es/m/20240415185720.q4dg4dlcyvvrabz4@awork3.anarazel.de
2024-04-17 11:21:17 -07:00
Tomas Vondra 0c2f5552d5 Cleanup parallel BRIN index build code
Commit b437571714 added support for parallel builds of BRIN indexes,
using code similar to BTREE. But there were to be a couple unnecessary
differences, particularly in how the leader waits for the workers, and
merges the results. So remove these, to make the code more similar.

The leader never waited on the workersdonecv condition variable, but
simply called WaitForParallelWorkersToFinish() in _brin_end_parallel()
and then merged the per-worker results. This worked correctly, but it
seems better to do the wait and merge before _brin_end_parallel().

This commit moves the relevant code to _brin_parallel_heapscan/merge(),
which means  _brin_end_parallel() remains responsible only for exiting
the parallel mode and accumulating WAL usage data.

Discussion: https://postgr.es/m/3733d042-71e1-6ae6-5fac-00c12db62db6@enterprisedb.com
2024-04-17 18:31:46 +02:00
Peter Eisentraut ca89db5f9d Remove dead code
The configure check for HAVE_DECL_LLVMORCREGISTERPERF was removed by
e9a9843e13, but some code guarded by it was left.  (That commit
removed the "register" calls but left the "unregister" calls.)  That
code cannot be reached anymore, so remove it.

Reported-by: Heikki Linnakangas <hlinnaka@iki.fi>
Discussion: https://www.postgresql.org/message-id/5539b16c-cff7-46d5-9621-c3fb6b549e9e@iki.fi
2024-04-17 10:48:04 +02:00
Peter Eisentraut 2e75492b3c Add missing source file to libpq/nls.mk 2024-04-17 09:11:02 +02:00
Michael Paquier 91fe092a96 Fix typos with function name in event_trigger.c
Databases exist, contrary to datatabases.

Oversight in e83d1b0c40.

Author: Japin Li
Discussion: https://postgr.es/m/ME3P282MB316611A2F7BF43919F695228B6082@ME3P282MB3166.AUSP282.PROD.OUTLOOK.COM
2024-04-17 14:56:31 +09:00
Masahiko Sawada a6d0fa5ef8 Disallow specifying ON_ERROR option without value.
The ON_ERROR option of the COPY command previously allowed omitting
its value, which was inconsistent with the syntax synopsis in the
documentation and the behavior of other non-boolean COPY options.

This change enforces providing a value for the ON_ERROR option,
ensuring consistency across other non-boolean options and aligning
with the documented syntax.

Author: Atsushi Torikoshi
Reviewed-by: Masahiko Sawada
Discussion: https://postgr.es/m/a9770bf57646d90dedc3d54cf32634b2%40oss.nttdata.com
2024-04-17 11:31:27 +09:00
David Rowley 58cf2e120e Update mmgr's README to mention BumpContext
Oversight in 29f6a959c.

In passing, since we now have 4 memory context types to choose from,
provide a brief overview of the specialities of each memory context
type.

Reported-by: Amul Sul
Author: Amul Sul, David Rowley
Discussion: https://postgr.es/m/CAAJ_b94U2s9nHh--DEK=sPEZUQ+x7vQJ7529fF8UAH97QJ9NXg@mail.gmail.com
2024-04-17 10:49:09 +12:00
David Rowley 6d2fd66b99 Push dedicated BumpBlocks to the tail of the blocks list
BumpContext relies on using the head block from its 'blocks' field to
use as the current block to allocate new chunks to.  When we receive an
allocation request larger than allocChunkLimit, we place these chunks on
a new dedicated block and, until now, we pushed the block onto the
*head* of the 'blocks' list.

This behavior caused the previous bump block to no longer be available
for new normal-sized (non-large) allocations and would result in blocks
only being partially filled if a large allocation request arrived before
the block became full.

Here adjust the code to push these dedicated blocks onto the *tail* of
the blocks list so that the head block remains intact and available to
be used by normal allocation request sizes until it becomes full.

In passing, make the elog(ERROR) calls for the unsupported callbacks
consistent.  Likewise for the header comments for those functions.

Discussion: https://postgr.es/m/CAApHDvp9___r-ayJj0nZ6GD3MeCGwGZ0_6ZptWpwj+zqHtmwCw@mail.gmail.com
Discussion: https://postgr.es/m/CAApHDvqerXpzUnuDQfUEi3DZA+9=Ud9WSt3ruxN5b6PcOosx2g@mail.gmail.com
2024-04-17 10:40:31 +12:00
Peter Eisentraut 2ea5d8bece Mark some new location fields as ParseLoc
Some new code probably didn't see 605721f819 and continued to use
type int for parse location fields.  Fix those.
2024-04-16 20:22:41 +02:00
Tom Lane ec07d0d7fa Clean up more indent breakage from 6377e12a5.
Per buildfarm member koel.
2024-04-16 13:00:40 -04:00
Tom Lane 6f0cef9353 Fix assorted bugs in ecpg's macro mechanism.
The code associated with EXEC SQL DEFINE was unreadable and full of
bugs, notably:

* It'd attempt to free a non-malloced string if the ecpg program
tries to redefine a macro that was defined on the command line.

* Possible memory stomp if user writes "-D=foo".

* Undef'ing or redefining a macro defined on the command line would
change the state visible to the next file, when multiple files are
specified on the command line.  (While possibly that could have been
an intentional choice, the code clearly intends to revert to the
original macro state; it's just failing to consider this interaction.)

* Missing "break" in defining a new macro meant that redefinition
of an existing name would cause an extra entry to be added to the
definition list.  While not immediately harmful, a subsequent undef
would result in the prior entry becoming visible again.

* The interactions with input buffering are subtle and were entirely
undocumented.

It's not that surprising that we hadn't noticed these bugs,
because there was no test coverage at all of either the -D
command line switch or multiple input files.  This patch adds
such coverage (in a rather hacky way I guess).

In addition to the code bugs, the user documentation was confused
about whether the -D switch defines a C macro or an ecpg one, and
it failed to mention that you can write "-Dsymbol=value".

These problems are old, so back-patch to all supported branches.

Discussion: https://postgr.es/m/998011.1713217712@sss.pgh.pa.us
2024-04-16 12:31:42 -04:00
Peter Geoghegan c62d2ebd9e Fix nbtree "deduce NOT NULL" scan key comment.
Oversight in commit c9c0589fda.
2024-04-16 12:04:20 -04:00
Tom Lane 984c0eccd0 Undo incorrect typedefs.list change.
6377e12a5 should not have removed AcquireSampleRowsFunc,
as that's still in use.  Per buildfarm member koel.
2024-04-16 11:38:04 -04:00
Tom Lane 03107b4eda Ensure generated join clauses for child rels have correct relids.
When building a join clause derived from an EquivalenceClass, if the
clause is to be used with an appendrel child relation then make sure
its clause_relids include the relids of that child relation.
Normally this would be true already because the EquivalenceMember
would be a Var of that relation.  However, if the appendrel represents
a flattened UNION ALL construct then some child EquivalenceMembers
could be constants with no relids.  The resulting under-marked clause
is problematic because it could mislead join_clause_is_movable_into
about where the clause should be evaluated.  We do not have an example
showing incorrect plan generation, but there are existing cases in
the regression tests that will fail the Asserts this patch adds to
get_baserel_parampathinfo.  A similarly wrong conclusion about a
clause being considered by get_joinrel_parampathinfo would lead to
wrong placement of the clause.  (This also squares with the way
that clause_relids is calculated for non-equijoin clauses in
adjust_appendrel_attrs.)

The other reason for wanting these new Asserts is that the previous
blithe assumption that the results of generate_join_implied_equalities
"necessarily satisfy join_clause_is_movable_into" turns out to be
wrong pre-v16.  If it's still wrong it'd be good to find out.

Per bug #18429 from Benoît Ryder.  The bug as filed was fixed by
commit 2489d76c4, but these changes correlate with the fix we
will need to apply in pre-v16 branches.

Discussion: https://postgr.es/m/18429-8982d4a348cc86c6@postgresql.org
2024-04-16 11:22:51 -04:00
Peter Geoghegan f698704155 Fix nbtree posting list comment.
Oversight in commit 0d861bbb70.
2024-04-16 11:20:41 -04:00
Peter Geoghegan aa1def44c3 Fix nbtree page recycling comment.
Oversight in commit e5d8a99903.
2024-04-16 11:14:36 -04:00
Alexander Korotkov 6377e12a5a revert: Generalize relation analyze in table AM interface
This commit reverts 27bc1772fc and dd1f6b0c17.  Per review by Andres Freund.

Discussion: https://postgr.es/m/20240415201057.khoyxbwwxfgzomeo%40awork3.anarazel.de
2024-04-16 13:14:20 +03:00
David Rowley bea97cd02e Improve test coverage in bump.c
There were no callers of BumpAllocLarge() in the regression tests, so
here we add a sort with a tuple large enough to use that path in bump.c.

Also, BumpStats() wasn't being called, so add a test to sysviews.sql to
call pg_backend_memory_contexts() while a bump context exists in the
backend.

Reported-by: Andres Freund
Discussion: https://postgr.es/m/20240414223305.m3i5eju6zylabvln@awork3.anarazel.de
2024-04-16 16:21:31 +12:00
Michael Paquier 768ceeeaa1 Add missing PGDLLIMPORT markings
All backend-side variables should be marked with PGDLLIMPORT, as per
policy introduced in 8ec569479f.  aafc05de1b has forgotten
MyClientSocket, and 05c3980e7f LoadedSSL.

These can be spotted with a command like this one (be careful of not
switching __pg_log_level):
src/tools/mark_pgdllimport.pl $(git ls-files src/include/)

Reported-by: Peter Eisentraut
Discussion: https://postgr.es/m/ZhzkRzrkKhbeQMRm@paquier.xyz
2024-04-16 09:38:46 +09:00
Tom Lane e0df80828a Fix type-checking of RECORD-returning functions in FROM, redux.
Commit 2ed8f9a01 intended to institute a policy that if a
RangeTblFunction has a coldeflist, then the function return type is
certainly RECORD, and we should use the coldeflist as the source of
truth about what the columns of the record type are.  When the
original function has been folded to a constant, inspection of the
constant might give a different answer.  This situation will lead to
a tuple-type-mismatch error at execution, but up until that point we
need to consistently believe the coldeflist, or we'll have problems
from different bits of code reaching different conclusions.

expandRTE didn't get that memo though, and would try to produce a
tupdesc based on the constant in this situation, leading to an
assertion failure.  (Desultory testing suggests that non-assert
builds often manage to give the expected error, although I also
saw a "cache lookup failed for type 0" error, and it seems at
least possible that a crash could happen.)

Some other callers of get_expr_result_type and get_expr_result_tupdesc
were also being incautious about this.  While none of them seem to
have actual bugs, they're working harder than necessary in this case,
besides which it seems safest to have an explicit policy of not using
those functions on an RTE with a coldeflist.  Adjust the code
accordingly, and add commentary to funcapi.c about this policy.

Also fix an obsolete comment that claimed "get_expr_result_type()
doesn't know how to extract type info from a RECORD constant".
That hasn't been true since commit d57534740.

Per bug #18422 from Alexander Lakhin.
As with the previous commit, back-patch to all supported branches.

Discussion: https://postgr.es/m/18422-89ca86c8eac5246d@postgresql.org
2024-04-15 12:56:56 -04:00
Alvaro Herrera cee8db3f68
ATTACH PARTITION: Don't match a PK with a UNIQUE constraint
When matching constraints in AttachPartitionEnsureIndexes() we weren't
testing the constraint type, which could make a UNIQUE key lacking a
not-null constraint incorrectly satisfy a primary key requirement.  Fix
this by testing that the constraint types match.  (Other possible
mismatches are verified by comparing index properties.)

Discussion: https://postgr.es/m/202402051447.wimb4xmtiiyb@alvherre.pgsql
2024-04-15 15:07:47 +02:00
Alexander Korotkov 9dfcac8e15 Grammar fixes for split/merge partitions code
The fixes relate to comments, error messages, and corresponding expected output
of regression tests.

Discussion: https://postgr.es/m/CAMbWs49DDsknxyoycBqiE72VxzL_sYHF6zqL8dSeNehKPJhkKg%40mail.gmail.com
Discussion: https://postgr.es/m/86bfd241-a58c-479a-9a72-2c67a02becf8%40postgrespro.ru
Discussion: https://postgr.es/m/CAHewXNkGMPU50QG7V6Q60JGFORfo8LfYO1_GCkCa0VWbmB-fEw%40mail.gmail.com
Author: Richard Guo, Dmitry Koval, Tender Wang
2024-04-15 16:00:02 +03:00
Alvaro Herrera c3709100be
Fix propagating attnotnull in multiple inheritance
In one of the many strange corner cases of multiple inheritance being
used, commit b0e96f3119 missed a CommandCounterIncrement() call after
updating the attnotnull flag during ALTER TABLE ADD COLUMN, which caused
a catalog tuple to be update attempted twice in the same command, giving
rise to a "tuple already updated by self" error.  Add the missing call
to solve that, and a test case that reproduces the scenario.

As a (perhaps surprising) secondary effect, this CCI addition triggers
another behavior change: when a primary key is added to a parent
partitioned table and the column in an existing partition does not have
a not-null constraint, we no longer error out.  This will probably be a
welcome change by some users, and I think it's unlikely that anybody
will miss the old behavior.

Reported-by: Alexander Lakhin <exclusion@gmail.com>
Discussion: http://postgr.es/m/045dec3f-9b3d-aa44-0c99-85f6992306c7@gmail.com
2024-04-15 12:20:56 +02:00
Peter Eisentraut 6ff21c0530 psql: Make output of \dD more stable
\dD showed domain check constraints in arbitrary order, which can
cause regression test failures, which was exposed by commit
9895b35cb8.  To fix, order the constraints by conname, which matches
what psql does in other queries listing constraints.
2024-04-15 09:28:48 +02:00
Peter Eisentraut 9895b35cb8 Fix ALTER DOMAIN NOT NULL syntax
This addresses a few problems with commit e5da0fe3c2 ("Catalog domain
not-null constraints").

In CREATE DOMAIN, a NOT NULL constraint looks like

    CREATE DOMAIN d1 AS int [ CONSTRAINT conname ] NOT NULL

(Before e5da0fe3c2, the constraint name was accepted but ignored.)

But in ALTER DOMAIN, a NOT NULL constraint looks like

    ALTER DOMAIN d1 ADD [ CONSTRAINT conname ] NOT NULL VALUE

where VALUE is where for a table constraint the column name would be.
(This works as of e5da0fe3c2.  Before e5da0fe3c2, this syntax
resulted in an internal error.)

But for domains, this latter syntax is confusing and needlessly
inconsistent between CREATE and ALTER.  So this changes it to just

    ALTER DOMAIN d1 ADD [ CONSTRAINT conname ] NOT NULL

(None of these syntaxes are per SQL standard; we are just living with
the bits of inconsistency that have built up over time.)

In passing, this also changes the psql \dD output to not show not-null
constraints in the column "Check", since it's already shown in the
column "Nullable".  This has also been off since e5da0fe3c2.

Reviewed-by: jian he <jian.universality@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/9ec24d7b-633d-463a-84c6-7acff769c9e8%40eisentraut.org
2024-04-15 08:34:45 +02:00
Heikki Linnakangas d21d61b96f Put back initialization of 'sslmode', to silence Coverity
Coverity pointed out that the function checks for conn->sslmode !=
NULL, which implies that it might be NULL, but later we access it
without a NULL-check anyway. It doesn't know that it is in fact always
initialized earlier, in conninfo_add_defaults(), and hence the
NULL-check is not necessary. However, there is a lot of distance
between conninfo_add_defaults() and pqConnectOptions2(), so it's not
surprising that it doesn't see that. Put back the initialization code,
as it existed before commit 05fd30c0e7, to silence the warning.

In the long run, I'd like to refactor the libpq options handling and
initalization code. It seems silly to strdup() and copy strings, for
things like sslmode that have a limited set of possible values; it
should be an enum. But that's for another day.
2024-04-14 23:02:43 +03:00
Tomas Vondra cd4b6af620 Fix unnecessary padding in incremental backups
Commit 10e3226ba1 added padding to incremental backups to ensure the
block data is properly aligned. The code in sendFile() however failed to
consider that the header may be a multiple of BLCKSZ and thus already
aligned, adding a full BLCKSZ of unnecessary padding.

Not only does this make the incremental file a bit larger, but the other
places calculating the amount of padding did realize it's not needed and
did not include it in the formula. This resulted in pg_basebackup
getting confused while parsing the data stream, trying to access files
with invalid filenames (e.g. with binary data etc.) and failing.
2024-04-14 20:37:49 +02:00
Tomas Vondra bb616ed3e6 Use the correct PG_DETOAST_DATUM macro in BRIN
Commit 6bcda4a721 replaced PG_DETOAST_DATUM with PG_DETOAST_DATUM_PACKED
in two BRIN output functions, for minmax-multi and bloom opclasses. But
this is incorrect - the code is accessing the data through structs that
already include a 4B header, so the detoast needs to match that. But the
PACKED macro may keep the 1B header, which means the struct fields will
point to incorrect data.

Backpatch-through: 16
Discussion: https://postgr.es/m/1df00a66-db5a-4e66-809a-99b386a06d86%40enterprisedb.com
2024-04-14 18:19:58 +02:00
Tomas Vondra 2f20ced1eb Update nbits_set in brin_bloom_union
Properly update the number of bits set in the bitmap after merging the
filters in brin_bloom_union.

This is mostly harmless, as the counter is used only in the output
function, which means pageinspect may show incorrect information about
the BRIN summary. The counter does not affect correctness.

Discovered while adding a regression test comparing indexes built with
and without parallelism. The parallel index builds exercise the union
procedure when merging results from workers, which is otherwise very
hard to do in a test. Which is why this went unnoticed until now.

Backpatch through 14, where the BRIN bloom opclasses were introduced.

Backpatch-through: 14
Discussion: https://postgr.es/m/1df00a66-db5a-4e66-809a-99b386a06d86%40enterprisedb.com
2024-04-14 18:07:15 +02:00
Noah Misch 9358297431 freespace: Don't return blocks past the end of the main fork.
GetPageWithFreeSpace() callers assume the returned block exists in the
main fork, failing with "could not read block" errors if that doesn't
hold.  Make that assumption reliable now.  It hadn't been guaranteed,
due to the weak WAL and data ordering of participating components.  Most
operations on the fsm fork are not WAL-logged.  Relation extension is
not WAL-logged.  Hence, an fsm-fork block on disk can reference a
main-fork block that no WAL record has initialized.  That could happen
after an OS crash, a replica promote, or a PITR restore.  wal_log_hints
makes the trouble easier to hit; a replica promote or PITR ending just
after a relevant fsm-fork FPI_FOR_HINT may yield this broken state.  The
v16 RelationAddBlocks() mechanism also makes the trouble easier to hit,
since it bulk-extends even without extension lock waiters.  Commit
917dc7d239 stopped trouble around
truncation, but vectors involving PageIsNew() pages remained.

This implementation adds a RelationGetNumberOfBlocks() call when the
cached relation size doesn't confirm a block exists.  We've been unable
to identify a benchmark that slows materially, but this may show up as
additional time in lseek().  An alternative without that overhead would
be a new ReadBufferMode such that ReadBufferExtended() returns NULL
after a 0-byte read, with all other errors handled normally.  However,
each GetFreeIndexPage() caller would then need code for the return-NULL
case.  Back-patch to v14, due to earlier versions not caching relation
size and the absence of a pre-v16 problem report.

Ronan Dunklau.  Reported by Ronan Dunklau.

Discussion: https://postgr.es/m/1878547.tdWV9SEqCh%40aivenlaptop
2024-04-13 08:34:20 -07:00
Heikki Linnakangas 4cc1c76fe9 Document PG_TEST_EXTRA=libpq_encryption and also check 'kerberos'
In the libpq encryption negotiation tests, don't run the GSSAPI tests
unless PG_TEST_EXTRA='kerberos' is also set. That makes it possible to
still run most of the tests when GSSAPI support is compiled in, but
there's no MIT Kerberos installation.
2024-04-12 19:52:39 +03:00
Heikki Linnakangas 65dfe9d167 Move libpq encryption negotiation tests
The test targets libpq's options, so 'src/test/interfaces/libpq/t' is
a more natural place for it.

While doing this, I noticed that I had missed adding the
libpq_encryption subdir to the Makefile. That's why this commit only
needs to remove it from the meson.build file.

Per Peter Eisentraut's suggestion.

Discussion: https://www.postgresql.org/message-id/09d4bf5d-d0fa-4c66-a1d7-5ec757609646@eisentraut.org
2024-04-12 19:52:37 +03:00
Heikki Linnakangas 084cae5513 Fix compilation with --with-gssapi --without-openssl
The #define is spelled ENABLE_GSS, not USE_GSS. Introduced in commit
05fd30c0e7, reported by Thomas Munro.

Discussion: https://www.postgresql.org/message-id/CA%2BhUKG%2BHRTtB%2Bx%2BKKKj_cfX6sNhbeGuqmGxjGMwdVPG7YGFP8w@mail.gmail.com
2024-04-12 19:52:34 +03:00
Heikki Linnakangas 0a5f229189 Fix libpq_encryption tests when compiled without SSL support
It correctly skipped tests involving SSL in the server when SSL
support was not compiled in, but even when SSL is not enabled in the
server and the connection is established without SSL, libpq behaves
differently in many of the test scenarios when libpq is compiled
without SSL support. For example, with sslmode=prefer, if libpq is
compiled with SSL support it will attempt to use SSL, but without SSL
support it will try authenticating in plaintext mode directly. The
expected test output didn't take that into account.

Discussion: https://www.postgresql.org/message-id/CA%2BhUKG%2BHRTtB%2Bx%2BKKKj_cfX6sNhbeGuqmGxjGMwdVPG7YGFP8w@mail.gmail.com
2024-04-12 19:52:28 +03:00
Andrew Dunstan 929c05774b Don't allocate large buffer on the stack in pg_verifybackup
Per complaint from Andres Freund. Follow his suggestion to allocate the
buffer once in the calling routine instead.

Also make a tiny indentation improvement.

Discussion: https://postgr.es/m/20240411190147.a3yries632olfcgg@awork3.anarazel.de
2024-04-12 10:52:25 -04:00
Andrew Dunstan 42fa4b6601 Assorted minor cleanups in the test_json_parser module
Per gripes from Michael Paquier

Discussion: https://postgr.es/m/ZhTQ6_w1vwOhqTQI@paquier.xyz

Along the way, also clean up a handful of typos in 3311ea86ed and
ea7b4e9a2a, found by Alexander Lakhin, and a couple of stylistic
snafus noted by Daniel Westermann and Daniel Gustafsson.
2024-04-12 10:32:30 -04:00
Andrew Dunstan b8a7bfa333 Shrink test file for test_json_parser module
Also delete live URLs

Jacob Champion

Discussion: https://postgr.es/m/CAOYmi+mtH=V1wZKAOauCd5QqQWr61hnXMJbJ9h-CZXAa1JXd3w@mail.gmail.com
2024-04-12 10:32:30 -04:00
Andrew Dunstan daf554dbea Add a TAP test for test_json_parser_perf
This just makes sure the test can run with a single iteration. A real
performance test would test with many more.
2024-04-12 10:32:30 -04:00
Andrew Dunstan 661ab4e185 Fix some memory leaks associated with parsing json and manifests
Coverity complained about not freeing some memory associated with
incrementally parsing backup manifests. To fix that, provide and use a new
shutdown function for the JsonManifestParseIncrementalState object, in
line with a suggestion from Tom Lane.

While analysing the problem, I noticed a buglet in freeing memory for
incremental json lexers. To fix that remove a bogus condition on
freeing the memory allocated for them.
2024-04-12 10:32:30 -04:00
David Rowley b9ecefecc7 Fix recently introduced typo in code comment
Reported-by: Richard Guo
Discussion: https://postgr.es/m/CAMbWs49kAsZUsj7-0SBLvE9+uKz0RCqMEmM3NVytc1YvS8sTrQ@mail.gmail.com
2024-04-12 23:15:52 +12:00
Amit Kapila 3741f2a09d Fix the review comments and a bug in the slot sync code.
Ensure that when updating the catalog_xmin of the synced slots, it is
first written to disk before changing the in-memory value
(effective_catalog_xmin). This is to prevent a scenario where the
in-memory value change triggers a vacuum to remove catalog tuples before
the catalog_xmin is written to disk. In the event of a crash before the
catalog_xmin is persisted, we would not know that some required catalog
tuples have been removed and the synced slot would be invalidated.

Change the sanity check to ensure that remote_slot's confirmed_flush LSN
can't precede the local/synced slot during slot sync. Note that the
restart_lsn of the synced/local slot can be ahead of remote_slot. This can
happen when slot advancing machinery finds a running xacts record after
reaching the consistent state at a later point than the primary where it
serializes the snapshot and updates the restart_lsn.

Make the check to sync slots robust by allowing to sync only when the
confirmed_lsn, restart_lsn, or catalog_xmin of the remote slot is ahead of
the synced/local slot.

Reported-by: Amit Kapila and Shveta Malik
Author: Hou Zhijie, Shveta Malik
Reviewed-by: Amit Kapila, Bertrand Drouvot
Discussion: https://postgr.es/m/OS0PR01MB57162B67D3CB01B2756FBA6D94062@OS0PR01MB5716.jpnprd01.prod.outlook.com
Discussion: https://postgr.es/m/CAJpy0uCSS5zmdyUXhvw41HSdTbRqX1hbYqkOfHNj7qQ+2zn0AQ@mail.gmail.com
2024-04-12 15:10:41 +05:30
David Rowley 3af7040985 Fix IS [NOT] NULL qual optimization for inheritance tables
b262ad440 added code to have the planner remove redundant IS NOT NULL
quals and eliminate needless scans for IS NULL quals on tables where the
qual's column has a NOT NULL constraint.

That commit failed to consider that an inheritance parent table could
have differing NOT NULL constraints between the parent and the child.
This caused issues as if we eliminated a qual on the parent, when
applying the quals to child tables in apply_child_basequals(), the qual
might not have been added to the parent's baserestrictinfo.

Here we fix this by not applying the optimization to remove redundant
quals to RelOptInfos belonging to inheritance parents and applying the
optimization again in apply_child_basequals().  Effectively, this means
that the parent and child are considered independently as the parent has
both an inh=true and inh=false RTE and we still apply the optimization
to the RelOptInfo corresponding to the inh=false RTE.

We're able to still apply the optimization in add_base_clause_to_rel()
for partitioned tables as the NULLability of partitions must match that
of their parent.  And, if we ever expand restriction_is_always_false()
and restriction_is_always_true() to handle partition constraints then we
can apply the same logic as, even in multi-level partitioned tables,
there's no way to route values to a partition when the qual does not
match the partition qual of the partitioned table's parent partition.
The same is true for CHECK constraints as those must also match between
arent partitioned tables and their partitions.

Author: Richard Guo, David Rowley
Discussion: https://postgr.es/m/CAMbWs4930gQSZmjR7aANzEapdy61gCg6z8dT-kAEYD0sYWKPdQ@mail.gmail.com
2024-04-12 20:07:53 +12:00
Alexander Korotkov 772faafca1 Revert: Implement pg_wal_replay_wait() stored procedure
This commit reverts 06c418e163, e37662f221, bf1e650806, 25f42429e2,
ee79928441, and 74eaf66f98 per review by Heikki Linnakangas.

Discussion: https://postgr.es/m/b155606b-e744-4218-bda5-29379779da1a%40iki.fi
2024-04-11 17:28:15 +03:00
Alexander Korotkov 922c4c461d Revert: Allow table AM to store complex data structures in rd_amcache
This commit reverts 02eb07ea89 per review by Andres Freund.

Discussion: https://postgr.es/m/20240410165236.rwyrny7ihi4ddxw4%40awork3.anarazel.de
2024-04-11 16:02:49 +03:00
Alexander Korotkov 8dd0bb84da Revert: Allow table AM tuple_insert() method to return the different slot
This commit reverts c35a3fb5e0 per review by Andres Freund.

Discussion: https://postgr.es/m/20240410165236.rwyrny7ihi4ddxw4%40awork3.anarazel.de
2024-04-11 16:02:45 +03:00
Alexander Korotkov 193e6d18e5 Revert: Allow locking updated tuples in tuple_update() and tuple_delete()
This commit reverts 87985cc925 and 818861eb57 per review by Andres Freund.

Discussion: https://postgr.es/m/20240410165236.rwyrny7ihi4ddxw4%40awork3.anarazel.de
2024-04-11 16:01:34 +03:00
Alexander Korotkov da841aa4dc Revert: Let table AM insertion methods control index insertion
This commit reverts b1484a3f19 per review by Andres Freund.

Discussion: https://postgr.es/m/20240410165236.rwyrny7ihi4ddxw4%40awork3.anarazel.de
2024-04-11 16:01:30 +03:00
Alexander Korotkov bc1e2092eb Revert: Custom reloptions for table AM
This commit reverts 9bd99f4c26 and 422041542f per review by Andres Freund.

Discussion: https://postgr.es/m/20240410165236.rwyrny7ihi4ddxw4%40awork3.anarazel.de
2024-04-11 15:46:35 +03:00
Michael Paquier 8f136af3c4 Use correct datatype for xmin variables in slot.c
Two variables storing a slot's effective_xmin and effective_catalog_xmin
were saved as XLogRecPtr, which is incorrect as these should be
TransactionIds.

Oversight in 818fefd8fd.

Author: Bharath Rupireddy
Discussion: https://postgr.es/m/CALj2ACVPSB74mrDTFezz-LV3Oi6F3SN71QA0oUHvndzi5dwTNg@mail.gmail.com
Backpatch-through: 16
2024-04-11 17:19:20 +09:00
Masahiko Sawada 810f64a015 Revert indexed and enlargable binary heap implementation.
This reverts commit b840508644 and bcb14f4abc. These commits were made
for commit 5bec1d6bc5 (Improve eviction algorithm in ReorderBuffer
using max-heap for many subtransactions). However, per discussion,
commit efb8acc0d0 replaced binary heap + index with pairing heap, and
made these commits unnecessary.

Reported-by: Jeff Davis
Discussion: https://postgr.es/m/12747c15811d94efcc5cda72d6b35c80d7bf3443.camel%40j-davis.com
2024-04-11 17:18:05 +09:00
Masahiko Sawada efb8acc0d0 Replace binaryheap + index with pairingheap in reorderbuffer.c
A pairing heap can perform the same operations as the binary heap +
index, with as good or better algorithmic complexity, and that's an
existing data structure so that we don't need to invent anything new
compared to v16. This commit makes the new binaryheap functionality
that was added in commits b840508644 and bcb14f4abc unnecessary, but
they will be reverted separately.

Remove the optimization to only build and maintain the heap when the
amount of memory used is close to the limit, becuase the bookkeeping
overhead with the pairing heap seems to be small enough that it
doesn't matter in practice.

Reported-by: Jeff Davis
Author: Heikki Linnakangas
Reviewed-by: Michael Paquier, Hayato Kuroda, Masahiko Sawada
Discussion: https://postgr.es/m/12747c15811d94efcc5cda72d6b35c80d7bf3443.camel%40j-davis.com
2024-04-11 17:04:38 +09:00
Thomas Munro 942219996c Fix grammar.
Reported-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/ZhdKqj5DwoOzirFv%40paquier.xyz
2024-04-11 14:35:42 +12:00
Thomas Munro d8f5acbdb9 Fix potential stack overflow in incremental backup.
The user can set RELSEG_SIZE to a high number at compile time, so we
can't use it to control the size of an array on the stack: it could be
many gigabytes in size.  On closer inspection, we don't really need that
intermediate array anyway.  Let's just write directly into the output
array, and then perform the absolute->relative adjustment in place.
This fixes new code from commit dc21234005.

Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Discussion: https://postgr.es/m/CA%2BhUKG%2B2hZ0sBztPW4mkLfng0qfkNtAHFUfxOMLizJ0BPmi5%2Bg%40mail.gmail.com
2024-04-11 13:23:45 +12:00
Michael Paquier f56a9def71 Fix inconsistency with replay of hash squeeze record for clean buffers
aa5edbe379 has tweaked _hash_freeovflpage() so as the write buffer's
LSN is updated only when necessary, when REGBUF_NO_CHANGE is not used.

The replay code was not consistent with that, causing the write buffer's
LSN to be updated and its page to be marked as dirty even if the buffer
was registered in a "clean" state.  This was possible for the case of a
squeeze record when there are no tuples to add to the write buffer, for
(is_prim_bucket_same_wrt && !is_prev_bucket_same_wrt).

I have performed some validation of this commit with
wal_consistency_checking and a change in WAL that logs REGBUF_NO_CHANGE
to a new BKPIMAGE_*.  Thanks to that, it is possible to know at replay
if a buffer was clean when it was registered, then cross-checked the LSN
of the "clean" page copy coming from WAL with the LSN of the block once
the record has been replayed.  This eats one bit in bimg_info, which is
not acceptable to be integrated as-is, but it could become handy in the
future.  I didn't spot other areas than the one fixed by this commit at
the extent of what the main regression test suite covers.

As this is an oversight in aa5edbe379, no backpatch is required.

Reported-by: Zubeyr Eryilmaz
Author: Hayato Kuroda
Reviewed-by: Amit Kapila, Michael Paquier
Discussion: https://postgr.es/m/ZbyVVG_7eW3YD5-A@paquier.xyz
2024-04-11 09:20:51 +09:00
Tom Lane 5392dd3d2a Fix plpgsql's handling of -- comments following expressions.
Up to now, read_sql_construct() has collected all the source text from
the statement or expression's initial token up to the character just
before the "until" token.  It normally tries to strip trailing
whitespace from that, largely for neatness.  If there was a "-- text"
comment after the expression, this resulted in removing the newline
that terminates the comment, which creates a hazard if we try to paste
the collected text into a larger SQL construct without inserting a
newline after it.  In particular this caused our handling of CASE
constructs to fail if there's a comment after a WHEN expression.

Commit 4adead1d2 noticed a similar problem with cursor arguments,
and worked around it through the rather crude hack of suppressing
the whitespace-trimming behavior for those.  Rather than do that
and leave the hazard open for future hackers to trip over, let's
fix it properly.  pl_scanner.c already has enough infrastructure
to report the end location of the expression's last token, so
we can copy up to that location and never collect any trailing
whitespace or comment to begin with.

Erik Wienhold and Tom Lane, per report from Michal Bartak.
Back-patch to all supported branches.

Discussion: https://postgr.es/m/CAAVzF_FjRoi8fOVuLCZhQJx6HATQ7MKm=aFOHWZODFnLmjX-xA@mail.gmail.com
2024-04-10 15:45:58 -04:00
Michael Paquier 5105c90796 Make GIN tests using injection points concurrent-safe
f587338dec has introduced in the test module injection_points a SQL
function called injection_points_set_local(), that can be used to make
all the injection points linked to the process where they are attached,
discarded automatically if any remain once the process exits.

e2e3b8ae9e has added a NO_INSTALLCHECK to the test module to prevent
the use of installcheck.  Now that there is a way to make the test
concurrent-safe, let's use it and remove the installcheck restriction.

Concurrency issues could be easily reproduced by running in a tight
loop a command like this one, in src/test/modules/gin/ (hardcoding
pg_sleep() after attaching injection points enlarges the race window)
and a second test suite like contrib/btree_gin/:

  make installcheck USE_MODULE_DB=1

Reviewed-by: Andrey Borodin
Discussion: https://postgr.es/m/ZhNG4Io9uYOgwv3F@paquier.xyz
2024-04-10 13:48:13 +09:00
Amit Kapila 7e85d1c75f Fix a test in failover slots regression test.
Wait for the standby to catch up before syncing the slots with
pg_sync_replication_slots(), otherwise, the logical slot could be ahead
and the sync would fail.

The other way to fix the test is to change it to use slotsync worker and
poll for the sync to get finished but the current approach is better as
this is a predictable way to write the test.

Per buildfarm

Author: Hou Zhijie
Reviewed-by: Amit Kapila
Discussion: https://postgr.es/m/OS0PR01MB571665359F2F5DCD3ADABC9F94002@OS0PR01MB5716.jpnprd01.prod.outlook.com
2024-04-10 08:44:17 +05:30
Thomas Munro 53c8d6c9f1 Fix illegal attribute propagation in LLVM JIT.
Commit 72559438 started copying more attributes from AttributeTemplate
to the functions we generate on the fly.  In the case of deform
functions, which return void, this meant that "noundef", from
AttributeTemplate's return value (a Datum) was copied to a void type.
Older LLVM releases were OK with that, but LLVM 18 crashes.

Update our llvm_copy_attributes() function to skip copying the attribute
for the return value, if the target function returns void.

Thanks to Dmitry Dolgov for help chasing this down.

Back-patch to all supported releases, like 72559438.

Reported-by: Pavel Stehule <pavel.stehule@gmail.com>
Reviewed-by: Dmitry Dolgov <9erthalion6@gmail.com>
Discussion: https://postgr.es/m/CAFj8pRACpVFr7LMdVYENUkScG5FCYMZDDdSGNU-tch%2Bw98OxYg%40mail.gmail.com
2024-04-10 12:13:46 +12:00
David Rowley 8461424fd7 Fixup various StringInfo function usages
This adjusts various appendStringInfo* function calls to use a more
appropriate and efficient function with the same behavior.  For example,
use appendStringInfoChar() when appending a single character rather than
appendStringInfo() and appendStringInfoString() when no formatting is
required rather than using appendStringInfo().

All adjustments made here are in code that's new to v17, so it makes
sense to fix these now rather than wait a few years and make
backpatching harder.

Discussion: https://postgr.es/m/CAApHDvojY2UvMiO+9_55ArTj10P1LBNJyyoGB+C65BLDNT0GsQ@mail.gmail.com
Reviewed-by: Nathan Bossart, Tom Lane
2024-04-10 11:53:32 +12:00
Alexander Korotkov ff9f72c68f revert: Transform OR clauses to ANY expression
This commit reverts 72bd38cc99 due to implementation and design issues.

Reported-by: Tom Lane
Discussion: https://postgr.es/m/3604469.1712628736%40sss.pgh.pa.us
2024-04-10 02:28:09 +03:00
David Rowley 5a15bdea3b Remove unused BumpBlockIsValid macro
The bump allocator was recently added in 29f6a959c.  Our other
allocators have a similar macro to this, but seemingly the version of
the macro for those allocators is only used in places where the chunk
header is decoded.  Since the bump allocator has no chunk header, none
of those functions exist for bump therefore macro is unused.  Remove it.

Reported-by: Peter Eisentraut
Discussion: https://postgr.es/m/5f724fb2-96e1-4f36-b65b-47b337ad432e@eisentraut.org
2024-04-10 11:10:16 +12:00
Alexander Korotkov c99ef1811a Checks for ALTER TABLE ... SPLIT/MERGE PARTITIONS ... commands
Check that the target partition actually belongs to the parent table.

Reported-by: Alexander Lakhin
Discussion: https://postgr.es/m/cd842601-cf1a-9806-f7b7-d2509b93ba61%40gmail.com
Author: Dmitry Koval
2024-04-10 01:47:21 +03:00
Peter Eisentraut 27074bce08 Fix incorrect format placeholders 2024-04-09 14:33:06 +02:00
Peter Eisentraut 43a9cab484 Fix whitespace 2024-04-09 11:32:48 +02:00
John Naylor bf183f168c Get rid of anonymous struct
This is a C11 feature, and we require C99. While at it, go the further
step and get rid of the surrounding union (with uintptr_t) entirely,
as there is currently no use case for this file to access the header of
BlocktableEntry as a uintptr_t, and there are no additional alignment
requirements. The least invasive way seems to be to transfer the old
union name to this struct.

Reported by Pavel Borisov and Andres Freund, per buildfarm member mylodon
Reviewed by Pavel Borisov

Discussion: https://postgr.es/m/CALT9ZEH11NYV8AOzKb1bWhCf6J0H=H31f0MgT9xX+HdqvcA1rw@mail.gmail.com
2024-04-09 16:16:01 +07:00
Heikki Linnakangas baa82b78dc libpq error message fixes
Remove stray paren, capitalize SSL and ALPN.

Author: Kyotaro Horiguchi
Discussion: https://www.postgresql.org/message-id/20240409.104613.1653854506705708036.horikyota.ntt@gmail.com
2024-04-09 08:06:31 +03:00
Michael Paquier deca6ac136 Add missing set_pglocale_pgservice() for pg_walsummary and pg_combinebackup
These calls are required to make both tools work with NLS.

Author: Kyotaro Horiguchi
Discussion: https://postgr.es/m/20240408.162702.183779935636035593.horikyota.ntt@gmail.com
2024-04-09 14:01:33 +09:00
Michael Paquier f4083c4975 injection_points: Fix race condition with local injection point tests
The module relies on a shmem exit callback to clean up any injection
points linked to a specific process.  One of the tests checks for the
case of an injection point name reused in a second connection where the
first connection should clean it up, but it did not count for the fact
that the shmem exit callback of the first connection may not have run
when the second connection begins its work.

The regress library includes a wait_pid() that can be used for this
purpose, instead of a custom wait logic, so let's rely on it to wait for
the first connection to exit before working with the second connection.
The module gains a REGRESS_OPTS to be able to look at the regress
library's dlpath.

This issue could be reproduced with a hardcoded sleep() in the shmem
exit callback, and the CI has been able to trigger it sporadically.

Oversight in f587338dec.

Reported-by: Bharath Rupireddy
Reviewed-by: Andrey Borodin
Discussion: https://postgr.es/m/ZhOd3NXAutteokGL@paquier.xyz
2024-04-09 10:31:12 +09:00
Tom Lane f463de59d9 In psql, avoid leaking a PGresult after a query is cancelled.
After a query cancel, the tail end of ExecQueryAndProcessResults
took care to clear any not-yet-read PGresults; but it forgot about
the one it has already read.  There would only be such a result
when handling a multi-command string made with "\;", so that you'd
have to cancel an earlier command in such a string to reach the
bug at all.  Even then, there would only be leakage of a single
PGresult per cancel, so it's not surprising nobody noticed this.
But a leak is a leak.

Noted while re-reviewing 90f517821, but this is independent of that:
it dates to 7844c9918.  Back-patch to v15 where that came in.
2024-04-08 17:00:07 -04:00
Tom Lane c21d4c416a Further review for re-implementation of psql's FETCH_COUNT feature.
Alexander Lakhin noted an obsolete comment, which led me to revisit
some other important comments in the patch, and that study turned up a
couple of unintended ways in which the chunked-fetch code path didn't
match the normal code path in ExecQueryAndProcessResults.  The only
nontrivial problem is that it didn't call PrintQueryStatus, so that
we'd not print the final status result from DML ... RETURNING
commands.  To avoid code duplication, move the filter for whether a
result is from RETURNING from PrintQueryResult to PrintQueryStatus.

Discussion: https://postgr.es/m/0023bea5-79c0-476e-96c8-dad599cc3ad8@gmail.com
2024-04-08 15:49:10 -04:00
John Naylor 0fe5f64367 Teach radix tree to embed values at runtime
Previously, the decision to store values in leaves or within the child
pointer was made at compile time, with variable length values using
leaves by necessity. This commit allows introspecting the length of
variable length values at runtime for that decision. This requires
the ability to tell whether the last-level child pointer is actually
a value, so we use a pointer tag in the lowest level bit.

Use this in TID store. This entails adding a byte to the header to
reserve space for the tag. Commit f35bd9bf3 stores up to three offsets
within the header with no bitmap, and now the header can be embedded
as above. This reduces worst-case memory usage when TIDs are sparse.

Reviewed (in an earlier version) by Masahiko Sawada

Discussion: https://postgr.es/m/CANWCAZYw+_KAaUNruhJfE=h6WgtBKeDG32St8vBJBEY82bGVRQ@mail.gmail.com
Discussion: https://postgr.es/m/CAD21AoBci3Hujzijubomo1tdwH3XtQ9F89cTNQ4bsQijOmqnEw@mail.gmail.com
2024-04-08 18:54:35 +07:00
John Naylor f35bd9bf35 Teach TID store to skip bitmap for small numbers of offsets
The header portion of BlocktableEntry has enough padding space for
an array of 3 offsets (1 on 32-bit platforms). Use this space instead
of having a sparse bitmap array. This will take up a constant amount
of space no matter what the offsets are.

Reviewed (in an earlier version) by Masahiko Sawada

Discussion: https://postgr.es/m/CANWCAZYw+_KAaUNruhJfE=h6WgtBKeDG32St8vBJBEY82bGVRQ@mail.gmail.com
Discussion: https://postgr.es/m/CAD21AoBci3Hujzijubomo1tdwH3XtQ9F89cTNQ4bsQijOmqnEw@mail.gmail.com
2024-04-08 18:47:09 +07:00
Alexander Korotkov dd1f6b0c17 Provide a way block-level table AMs could re-use acquire_sample_rows()
While keeping API the same, this commit provides a way for block-level table
AMs to re-use existing acquire_sample_rows() by providing custom callbacks
for getting the next block and the next tuple.

Reported-by: Andres Freund
Discussion: https://postgr.es/m/20240407214001.jgpg5q3yv33ve6y3%40awork3.anarazel.de
Reviewed-by: Pavel Borisov
2024-04-08 14:39:48 +03:00
Alexander Korotkov df64c81ca9 Fix some grammer errors from error messages and codes comments
Discussion: https://postgr.es/m/CAHewXNkGMPU50QG7V6Q60JGFORfo8LfYO1_GCkCa0VWbmB-fEw%40mail.gmail.com
Author: Tender Wang
2024-04-08 14:39:41 +03:00
Alexander Korotkov 422041542f Fill CommonRdOptions with default values in extract_autovac_opts()
Reported-by: Thomas Munro
Reported-by: Pavel Borisov
Discussion: https://postgr.es/m/CA%2BhUKGLZzLR50RBvuqOO3MZ%3DF54ETz-rTp1PDX9uDGP_GqyYqA%40mail.gmail.com
2024-04-08 12:18:23 +03:00
Heikki Linnakangas 3dbd2ff786 Adjust wording of trace_connection_negotiation GUC's description
We're not very consistent about this across all the GUCs, but the
"Logs ..." phrasing is more common than "Log ...", and is used by the
neighboring "log_connections" and "log_disconnections" GUCs, so switch
to that.

Author: Kyotaro Horiguchi
Discussion: https://www.postgresql.org/message-id/20240408.154010.1170771365226258348.horikyota.ntt@gmail.com
2024-04-08 12:14:20 +03:00
Alexander Korotkov 9bd99f4c26 Custom reloptions for table AM
Let table AM define custom reloptions for its tables. This allows specifying
AM-specific parameters by the WITH clause when creating a table.

The reloptions, which could be used outside of table AM, are now extracted
into the CommonRdOptions data structure.  These options could be by decision
of table AM directly specified by a user or calculated in some way.

The new test module test_tam_options evaluates the ability to set up custom
reloptions and calculate fields of CommonRdOptions on their base.

The code may use some parts from prior work by Hao Wu.

Discussion: https://postgr.es/m/CAPpHfdurb9ycV8udYqM%3Do0sPS66PJ4RCBM1g-bBpvzUfogY0EA%40mail.gmail.com
Discussion: https://postgr.es/m/AMUA1wBBBxfc3tKRLLdU64rb.1.1683276279979.Hmail.wuhao%40hashdata.cn
Reviewed-by: Reviewed-by: Pavel Borisov, Matthias van de Meent, Jess Davis
2024-04-08 11:23:28 +03:00
Amit Kapila 6f3d8d5e7c Fix the intermittent buildfarm failures in 040_standby_failover_slots_sync.
It is possible that even if the primary waits for the subscriber to catch
up and then disables the subscription, the XLOG_RUNNING_XACTS record gets
inserted between the two steps by bgwriter and walsender processes it.
This can move the restart_lsn of the corresponding slot in an
unpredictable way which further leads to slot sync failure.

To ensure predictable behaviour, we drop the subscription and manually
create the slot before the test. The other idea we discussed to write a
predictable test is to use injection points to control the bgwriter
logging XLOG_RUNNING_XACTS but that needs more analysis. We can add a
separate test using injection points.

Per buildfarm

Author: Hou Zhijie
Reviewed-by: Amit Kapila, Shveta Malik
Discussion: https://postgr.es/m/CAA4eK1JD8h_XLRsK_o_Xh=5MhTzm+6d4Cb4_uPgFJ2wSQDah=g@mail.gmail.com
2024-04-08 13:21:55 +05:30
John Naylor 8a1b31e6e5 Use bump context for TID bitmaps stored by vacuum
Vacuum does not pfree individual entries, and only frees the entire
storage space when finished with it. This allows using a bump context,
eliminating the chunk header in each leaf allocation. Most leaf
allocations will be 16 to 32 bytes, so that's a significant savings.
TidStoreCreateLocal gets a boolean parameter to indicate that the
created store is insert-only.

This requires a separate tree context for iteration, since we free
the iteration state after iteration completes.

Discussion: https://postgr.es/m/CANWCAZac%3DpBePg3rhX8nXkUuaLoiAJJLtmnCfZsPEAS4EtJ%3Dkg%40mail.gmail.com
Discussion: https://postgr.es/m/CANWCAZZQFfxvzO8yZHFWtQV+Z2gAMv1ku16Vu7KWmb5kZQyd1w@mail.gmail.com
2024-04-08 14:39:49 +07:00
Amit Langote bb766cde63 JSON_TABLE: Add support for NESTED paths and columns
A NESTED path allows to extract data from nested levels of JSON
objects given by the parent path expression, which are projected as
columns specified using a nested COLUMNS clause, just like the parent
COLUMNS clause.  Rows comprised from a NESTED columns are "joined"
to the row comprised from the parent columns.  If a particular NESTED
path evaluates to 0 rows, then the nested COLUMNS will emit NULLs,
making it an OUTER join.

NESTED columns themselves may include NESTED paths to allow
extracting data from arbitrary nesting levels, which are likewise
joined against the rows at the parent level.

Multiple NESTED paths at a given level are called "sibling" paths
and their rows are combined by UNIONing them, that is, after being
joined against the parent row as described above.

Author: Nikita Glukhov <n.gluhov@postgrespro.ru>
Author: Teodor Sigaev <teodor@sigaev.ru>
Author: Oleg Bartunov <obartunov@gmail.com>
Author: Alexander Korotkov <aekorotkov@gmail.com>
Author: Andrew Dunstan <andrew@dunslane.net>
Author: Amit Langote <amitlangote09@gmail.com>
Author: Jian He <jian.universality@gmail.com>

Reviewers have included (in no particular order):

Andres Freund, Alexander Korotkov, Pavel Stehule, Andrew Alsup,
Erik Rijkers, Zihong Yu, Himanshu Upadhyaya, Daniel Gustafsson,
Justin Pryzby, Álvaro Herrera, Jian He

Discussion: https://postgr.es/m/cd0bb935-0158-78a7-08b5-904886deac4b@postgrespro.ru
Discussion: https://postgr.es/m/20220616233130.rparivafipt6doj3@alap3.anarazel.de
Discussion: https://postgr.es/m/abd9b83b-aa66-f230-3d6d-734817f0995d%40postgresql.org
Discussion: https://postgr.es/m/CA+HiwqE4XTdfb1nW=Ojoy_tQSRhYt-q_kb6i5d4xcKyrLC1Nbg@mail.gmail.com
2024-04-08 16:14:13 +09:00
Amit Langote f6a2529920 Fix JsonExpr deparsing to emit QUOTES and WRAPPER correctly
Currently, get_json_expr_options() does not emit the default values
for QUOTES (KEEP QUOTES) and WRAPPER (WITHOUT WRAPPER).  That causes
the deparsed JSON_TABLE() columns, such as those contained in a a
view's query, to behave differently when executed than the original
definition.  That's because the rules encoded in
transformJsonTableColumns() will choose either JSON_VALUE() or
JSON_QUERY() as implementation to execute a given column's path
expression depending on the QUOTES and WRAPPER specificationd and
they have slightly different semantics.

Reported-by: Jian He <jian.universality@gmail.com>
Discussion: https://postgr.es/m/CACJufxEqhqsfrg_p7EMyo5zak3d767iFDL8vz_4%3DZBHpOtrghw%40mail.gmail.com
2024-04-08 16:14:12 +09:00
Amit Langote 561b74ddb8 Fix restriction on specifying KEEP QUOTES in JSON_QUERY()
Currently, transformJsonFuncExpr() enforces some restrictions on
the combinations of QUOTES and WRAPPER clauses that can be specified
in JSON_QUERY().  The intent was to only prevent the useless
combination WITH WRAPPER OMIT QUOTES, but the coding prevented KEEP
QUOTES too, which is not helpful. Fix that.
2024-04-08 16:02:29 +09:00
Alexander Korotkov b453a7a16a Fix the wording of or_to_any_transform_limit description
Reported-by: Kyotaro Horiguchi
Discussion: https://postgr.es/m/20240408.144657.1746688590065601661.horikyota.ntt%40gmail.com
2024-04-08 08:59:27 +03:00
Alexander Korotkov 2af75e1174 Fix the value of or_to_any_transform_limit in postgresql.conf.sample
Reported-by: Justin Pryzby
Discussion: https://postgr.es/m/ZhM8jH8gsKm5Q-9p%40pryzbyj2023
2024-04-08 08:51:07 +03:00
Andres Freund e3b69be951 Remove references to old function name
In a97bbe1f1d I accidentally referenced heapgetpage(), both in a function
name and a comment. But since 44086b0975 the relevant function is named
heap_prepare_pagescan().  Rename the new function to page_collect_tuples().

Reported-by: Melanie Plageman <melanieplageman@gmail.com>
Reported-by: David Rowley <dgrowleyml@gmail.com>
Discussion: https://postgr.es/m/20240407172615.cocrsvboqm3ttqe4@awork3.anarazel.de
Discussion: https://postgr.es/m/CAApHDvp4SniHopTrVeKWcEvNXFtdki0utAvO=5R7H6TNhtULRQ@mail.gmail.com
2024-04-07 22:05:50 -07:00
Thomas Munro 13453eedd3 Add pg_buffercache_evict() function for testing.
When testing buffer pool logic, it is useful to be able to evict
arbitrary blocks.  This function can be used in SQL queries over the
pg_buffercache view to set up a wide range of buffer pool states.  Of
course, buffer mappings might change concurrently so you might evict a
block other than the one you had in mind, and another session might
bring it back in at any time.  That's OK for the intended purpose of
setting up developer testing scenarios, and more complicated interlocking
schemes to give stronger guararantees about that would likely be less
flexible for actual testing work anyway.  Superuser-only.

Author: Palak Chaturvedi <chaturvedipalak1911@gmail.com>
Author: Thomas Munro <thomas.munro@gmail.com> (docs, small tweaks)
Reviewed-by: Nitin Jadhav <nitinjadhavpostgres@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Cary Huang <cary.huang@highgo.ca>
Reviewed-by: Cédric Villemain <cedric.villemain+pgsql@abcsql.com>
Reviewed-by: Jim Nasby <jim.nasby@gmail.com>
Reviewed-by: Maxim Orlov <orlovmg@gmail.com>
Reviewed-by: Thomas Munro <thomas.munro@gmail.com>
Reviewed-by: Melanie Plageman <melanieplageman@gmail.com>
Discussion: https://postgr.es/m/CALfch19pW48ZwWzUoRSpsaV9hqt0UPyaBPC4bOZ4W+c7FF566A@mail.gmail.com
2024-04-08 16:23:40 +12:00
John Naylor 0ea51bac38 Fix alignment of stack variable
Declare with union similar to PGAlignedBlock.

Report and fix by Andres Freund

Discussion: https://postgr.es/m/20240407190731.izm3mdazednrsiqk%40awork3.anarazel.de
2024-04-08 10:40:20 +07:00
Masahiko Sawada 304b6b1a6b Add more tab completion support for ALTER DEFAULT PRIVILEGES in psql.
This adds tab completion of "GRANT" and "REVOKE [GRANT OPTION FOR]"
for ALTER DEFAULT PRIVILEGES, and adds "WITH GRANT OPTION" for
ALTER DEFAULT PRIVILEGES ... GRANT ... TO role.

Author: Vignesh C, with cosmetic adjustments by me
Reviewed-by: Shubham Khanna, Masahiko Sawada
Discussion: https://postgr.es/m/CALDaNm1aEdJb-QJi%3DGWStkfj_%2BEDUK_VtDkn%2BTjQ2z7HyU0MBw%40mail.gmail.com
2024-04-08 12:15:10 +09:00
Peter Geoghegan 3b08133cd1 Remove redundant nbtree preprocessing assertions.
One of the assertions was the subject of a false positive complaint from
Coverity, but none of the assertions added much, so get rid of them.

Reported-By: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/3000247.1712537309@sss.pgh.pa.us
2024-04-07 22:13:28 -04:00
Andres Freund af7e90a277 simplehash: Free collisions array in SH_STAT
While SH_STAT() is only used for debugging, the allocated array can be large,
and therefore should be freed.

It's unclear why coverity started warning now.

Reported-by: Tom Lane <tgl@sss.pgh.pa.us>
Reported-by: Coverity
Discussion: https://postgr.es/m/3005248.1712538233@sss.pgh.pa.us
Backpatch: 12-
2024-04-07 19:08:41 -07:00
Heikki Linnakangas 3e60e956b0 Fix check for 'outlen' return from SSL_select_next_proto()
Fixes compiler warning reported by Andres Freund.

Discusssion: https://www.postgresql.org/message-id/20240408015055.xsuahullywpfwyvu@awork3.anarazel.de
2024-04-08 05:03:17 +03:00
Heikki Linnakangas d60ab76f63 Silence perlcritic warnings in new libpq tests
Per buildfarm member 'koel'.
2024-04-08 04:32:26 +03:00
Heikki Linnakangas 91044ae4ba Send ALPN in TLS handshake, require it in direct SSL connections
libpq now always tries to send ALPN. With the traditional negotiated
SSL connections, the server accepts the ALPN, and refuses the
connection if it's not what we expect, but connecting without ALPN is
still OK. With the new direct SSL connections, ALPN is mandatory.

NOTE: This uses "TBD-pgsql" as the protocol ID. We must register a
proper one with IANA before the release!

Author: Greg Stark, Heikki Linnakangas
Reviewed-by: Matthias van de Meent, Jacob Champion
2024-04-08 04:24:51 +03:00
Heikki Linnakangas d39a49c1e4 Support TLS handshake directly without SSLRequest negotiation
By skipping SSLRequest, you can eliminate one round-trip when
establishing a TLS connection. It is also more friendly to generic TLS
proxies that don't understand the PostgreSQL protocol.

This is disabled by default in libpq, because the direct TLS handshake
will fail with old server versions. It can be enabled with the
sslnegotation=direct option. It will still fall back to the negotiated
TLS handshake if the server rejects the direct attempt, either because
it is an older version or the server doesn't support TLS at all, but
the fallback can be disabled with the sslnegotiation=requiredirect
option.

Author: Greg Stark, Heikki Linnakangas
Reviewed-by: Matthias van de Meent, Jacob Champion
2024-04-08 04:24:49 +03:00
Heikki Linnakangas 05fd30c0e7 Refactor libpq state machine for negotiating encryption
This fixes the few corner cases noted in commit 705843d294, as shown
by the changes in the test.

Author: Heikki Linnakangas, Matthias van de Meent
Reviewed-by: Jacob Champion
2024-04-08 04:24:46 +03:00
Thomas Munro 041b96802e Use streaming I/O in ANALYZE.
The ANALYZE command prefetches and reads sample blocks chosen by a
BlockSampler algorithm. Instead of calling [Prefetch|Read]Buffer() for
each block, ANALYZE now uses the streaming API introduced in b5a9b18cd0.

Author: Nazir Bilal Yavuz <byavuz81@gmail.com>
Reviewed-by: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Jakub Wartak <jakub.wartak@enterprisedb.com>
Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi>
Reviewed-by: Thomas Munro <thomas.munro@gmail.com>
Discussion: https://postgr.es/m/flat/CAN55FZ0UhXqk9v3y-zW_fp4-WCp43V8y0A72xPmLkOM%2B6M%2BmJg%40mail.gmail.com
2024-04-08 13:16:28 +12:00
Michael Paquier f587338dec injection_points: Introduce runtime conditions
This adds a new SQL function injection_points_set_local() that can be
used to force injection points to be run only in the process where they
are attached.  This is handy for SQL tests to:
- Detach automatically injection points when the process exits.
- Allow tests with injection points to run concurrently with other test
suites, so as such modules do not have to be marked with
NO_INSTALLCHECK.

Currently, the only condition that can be registered is for a PID.
This could be extended to more kinds later, if required, like database
names/OIDs, roles, or more concepts I did not consider.

Using a single function for SQL scripts is an idea from Heikki
Linnakangas.

Reviewed-by: Andrey Borodin
Discussion: https://postgr.es/m/ZfP7IDs9TvrKe49x@paquier.xyz
2024-04-08 09:47:50 +09:00
Heikki Linnakangas 705843d294 Enhance libpq encryption negotiation tests with new GUC
The new "log_connection_negotiation" server option causes the server
to print messages to the log when it receives a SSLRequest or
GSSENCRequest packet from the client. Together with "log_connections",
it gives a trace of how a connection and encryption is
negotiatated. Use the option in the libpq_encryption test, to verify
in more detail how libpq negotiates encryption with different
gssencmode and sslmode options.

This revealed a couple of cases where libpq retries encryption or
authentication, when it should already know that it cannot succeed.  I
marked them with XXX comments in the test tables. They only happen
when the connection was going to fail anyway, and only with rare
combinations of options, so they're not serious.

Discussion: https://www.postgresql.org/message-id/CAEze2Wja8VUoZygCepwUeiCrWa4jP316k0mvJrOW4PFmWP0Tcw@mail.gmail.com
2024-04-08 02:49:37 +03:00
Heikki Linnakangas 20f9b61cc1 With gssencmode='require', check credential cache before connecting
Previously, libpq would establish the TCP connection, and then
immediately disconnect if the credentials were not available.  The
same thing happened if you tried to use a Unix domain socket with
gssencmode=require. Check those conditions before establishing the TCP
connection.

This is a very minor issue, but my motivation to do this now is that
I'm about to add more detail to the tests for encryption negotiation.
This makes the case of gssencmode=require but no credentials
configured fail at the same stage as with gssencmode=require and
GSSAPI support not compiled at all. That avoids having to deal with
variations in expected output depending on build options.

Discussion: https://www.postgresql.org/message-id/CAEze2Wja8VUoZygCepwUeiCrWa4jP316k0mvJrOW4PFmWP0Tcw@mail.gmail.com
2024-04-08 02:49:35 +03:00
Heikki Linnakangas 1169920ff7 Add tests for libpq gssencmode and sslmode options
Test all combinations of gssencmode, sslmode, whether the server
supports SSL and/or GSSAPI encryption, and whether they are accepted
by pg_hba.conf. This is in preparation for refactoring that code in
libpq, and for adding a new option for "direct SSL" connections, which
adds another dimension to the logic.

If we add even more options in the future, testing all combinations
will become unwieldy and we'll need to rethink this, but for now an
exhaustive test is nice.

Author: Heikki Linnakangas, Matthias van de Meent
Reviewed-by: Jacob Champion
Discussion: https://www.postgresql.org/message-id/a3af4070-3556-461d-aec8-a8d794f94894@iki.fi
2024-04-08 02:49:32 +03:00
Heikki Linnakangas 9f899562d4 Move Kerberos module
So that we can reuse it in new tests.

Discussion: https://www.postgresql.org/message-id/a3af4070-3556-461d-aec8-a8d794f94894@iki.fi
Reviewed-by: Jacob Champion, Matthias van de Meent
2024-04-08 02:49:30 +03:00
Michael Paquier 997db123c0 Make GIN test using injection points repeatable
As written, the test would fail when run repeatedly because one of the
injection points attached was not detached.  This would not matter if
the test is rewritten to be concurrently safe, but let's be clean and
it is a good practice.

Oversight in 6a1ea02c49.

Discussion: https://postgr.es/m/ZfP7IDs9TvrKe49x@paquier.xyz
2024-04-08 08:45:04 +09:00
David Rowley 705ec05653 Fix incorrect KeeperBlock macro in bump.c
The macro was missing a MAXALIGN around the sizeof(BumpContext) which
would cause problems detecting the keeper block on 32-bit systems that
have a MAXALIGN value of 8.

Thank you to Andres Freund, Tomas Vondra and Tom Lane for investigating
and testing.

Reported-by: Melanie Plageman, Tomas Vondra
Discussion: https://postgr.es/m/CAAKRu_Y6dZjiJEZghgNZp0Gjar1JVq-CH7XGDqExDVHnPgDjuw@mail.gmail.com
Discussion: https://postgr.es/m/a4a10b89-6ba8-4abd-b449-019aafff04fc@enterprisedb.com
2024-04-08 11:06:31 +12:00
Alexander Korotkov beabea6c20 Fix usage of same ListCell transform_or_to_any()'s in nested loops
Discussion: https://postgr.es/m/CAAKRu_b4SXNW4GAM0bv3e6wcL5ODSXg1ZdRCn6uyLLjSPbveBg%40mail.gmail.com
Author: Melanie Plageman
2024-04-08 01:38:37 +03:00
Alexander Korotkov 72bd38cc99 Transform OR clauses to ANY expression
Replace (expr op C1) OR (expr op C2) ... with expr op ANY(ARRAY[C1, C2, ...])
on the preliminary stage of optimization when we are still working with the
expression tree.

Here Cn is a n-th constant expression, 'expr' is non-constant expression, 'op'
is an operator which returns boolean result and has a commuter (for the case
of reverse order of constant and non-constant parts of the expression,
like 'Cn op expr').

Sometimes it can lead to not optimal plan.  This is why there is a
or_to_any_transform_limit GUC.  It specifies a threshold value of length of
arguments in an OR expression that triggers the OR-to-ANY transformation.
Generally, more groupable OR arguments mean that transformation will be more
likely to win than to lose.

Discussion: https://postgr.es/m/567ED6CA.2040504%40sigaev.ru
Author: Alena Rybakina <lena.ribackina@yandex.ru>
Author: Andrey Lepikhov <a.lepikhov@postgrespro.ru>
Reviewed-by: Peter Geoghegan <pg@bowt.ie>
Reviewed-by: Ranier Vilela <ranier.vf@gmail.com>
Reviewed-by: Alexander Korotkov <aekorotkov@gmail.com>
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Reviewed-by: Jian He <jian.universality@gmail.com>
2024-04-08 01:27:52 +03:00
Daniel Gustafsson 75a47b6a0d Change debug printing to log filename
When restarting the cluster fails the code introduced in 33774978c7
printed the full log contents to aid debugging.  For cases when the
logfile is large this adds unnecessary overhead.  Reduce to printing
the logfile path instead.

Reported-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/20240406214439.2n4zf2w7ukhf7dsy@awork3.anarazel.de
2024-04-08 00:24:20 +02:00
Tom Lane 0c66a164e7 Remove useless duplicate call of defGetBoolean().
Seems to be a copy-and-paste error dating to dc2123400.
Noted while reviewing a related documentation patch.
2024-04-07 17:08:06 -04:00
Alvaro Herrera a0e0fb1ba5
Use conditional variable to wait for next MultiXact offset
In one multixact.c edge case, we need a mechanism to wait for one
multixact offset to be written before being allowed to read the next
one.  We used to handle this case by sleeping for one millisecond and
retrying, but such sleeps have been reported as problematic in
production cases.  We can avoid the problem by using a condition
variable: readers sleep on it and then every creator of multixacts
broadcasts into the CV when creation is sufficiently far along.

Author: Kyotaro Horiguchi <horikyotajntt@gmail.com>
Reviewed-by: Andrey Borodin <amborodin@acm.org>
Discussion: https://postgr.es/m/47A598F4-B4E7-4029-8FEC-A06A6C3CB4B5@yandex-team.ru
Discussion: https://postgr.es/m/20200515.090333.24867479329066911.horikyota.ntt
2024-04-07 20:33:45 +02:00
Peter Geoghegan 473411fc51 Avoid extra lookups with nbtree array inequalities.
nbtree index scans with SAOP inequalities (but no SAOP equalities)
performed extra ORDER proc lookups for any remaining equality strategy
scan keys.  This could waste cycles, and caused assertion failures.
Keeping around a separate ORDER proc is only necessary for a scan's
non-array/non-SAOP equality scan keys when the scan has at least one
other SAOP equality strategy key (a SAOP inequality shouldn't count).

To fix, replace _bt_preprocess_array_keys_final's assertion with a test
that makes the function return early when the scan has no SAOP equality
scan keys.

Oversight in commit 1b134ca5, which enhanced nbtree ScalarArrayOp
execution.

Reported-By: Alexander Lakhin <exclusion@gmail.com>
Discussion: https://postgr.es/m/0539d3d3-a402-0a49-ed5e-26429dffc4bd@gmail.com
2024-04-07 14:15:54 -04:00
Heikki Linnakangas a475a2fa3b Don't clobber test exit code at cleanup in LDAP/Kerberors tests
If the test script die()d before running the first test, the whole test
was interpreted as SKIPped rather than failed. The PostgreSQL::Cluster
module got this right.

Backpatch to all supported versions.

Discussion: https://www.postgresql.org/message-id/fb898a70-3a88-4629-88e9-f2375020061d@iki.fi
2024-04-07 20:21:27 +03:00
Heikki Linnakangas e13b586d7c Improve check in LDAP test to find the OpenLDAP installation
If the OpenLDAP installation directory is not found, set $setup to 0
so that the LDAP tests are skipped. The macOS checks were already
doing that, but the checks on other OS's were not. While we're at it,
improve the error message when the tests are skipped, to specify
whether the OS is supported at all, or if we just didn't find the
installation directory.

This was accidentally "working" without this, i.e. we were skipping
the tests if the OpenLDAP installation was not found, because of a bug
in the LdapServer test module: the END block clobbered the exit code
so if the script die()s before running the first subtest, the whole
test script was marked as SKIPped. The next commit will fix that bug,
but we need to fix the setup code first.

These checks should probably go into configure/meson, but this is
better than nothing and allows fixing the bug in the END block.

Backpatch to all supported versions.

Discussion: https://www.postgresql.org/message-id/fb898a70-3a88-4629-88e9-f2375020061d@iki.fi
2024-04-07 20:21:21 +03:00
Thomas Munro b7b0f3f272 Use streaming I/O in sequential scans.
Instead of calling ReadBuffer() for each block, heap sequential scans
and TID range scans now use the streaming API introduced in b5a9b18cd0.

Author: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Thomas Munro <thomas.munro@gmail.com>
Discussion: https://postgr.es/m/flat/CAAKRu_YtXJiYKQvb5JsA2SkwrsizYLugs4sSOZh3EAjKUg%3DgEQ%40mail.gmail.com
2024-04-08 01:53:57 +12:00
David Rowley 6ed83d5fa5 Use bump memory context for tuplesorts
29f6a959c added a bump allocator type for efficient compact allocations.
Here we make use of this for non-bounded tuplesorts to store tuples.
This is very space efficient when storing narrow tuples due to bump.c
not having chunk headers.  This means we can fit more tuples in work_mem
before spilling to disk, or perform an in-memory sort touching fewer
cacheline.

Author: David Rowley
Reviewed-by: Nathan Bossart
Reviewed-by: Matthias van de Meent
Reviewed-by: Tomas Vondra
Reviewed-by: John Naylor
Discussion: https://postgr.es/m/CAApHDvqGSpCU95TmM=Bp=6xjL_nLys4zdZOpfNyWBk97Xrdj2w@mail.gmail.com
2024-04-08 00:32:26 +12:00
Alvaro Herrera f3ff7bf83b
Add XLogCtl->logInsertResult
This tracks the position of WAL that's been fully copied into WAL
buffers by all processes emitting WAL.  (For some reason we call that
"WAL insertion").  This is updated using atomic monotonic advance during
WaitXLogInsertionsToFinish, which is not when the insertions actually
occur, but it's the only place where we know where have all the
insertions have completed.

This value is useful in WALReadFromBuffers, which can verify that
callers don't try to read past what has been inserted.  (However, more
infrastructure is needed in order to actually use WAL after the flush
point, since it could be lost.)

The value is also useful in WaitXLogInsertionsToFinish() itself, since
we can now exit quickly when all WAL has been already inserted, without
even having to take any locks.
2024-04-07 14:06:30 +02:00
David Rowley 29f6a959cf Introduce a bump memory allocator
This introduces a bump MemoryContext type.  The bump context is best
suited for short-lived memory contexts which require only allocations
of memory and never a pfree or repalloc, which are unsupported.

Memory palloc'd into a bump context has no chunk header.  This makes
bump a useful context type when lots of small allocations need to be
done without any need to pfree those allocations.  Allocation sizes are
rounded up to the next MAXALIGN boundary, so with this and no chunk
header, allocations are very compact indeed.

Allocations are also very fast as bump does not check any freelists to
try and make use of previously free'd chunks.  It just checks if there
is enough room on the current block, and if so it bumps the freeptr
beyond this chunk and returns the value that the freeptr was previously
pointing to.  Simple and fast.  A new block is malloc'd when there's not
enough space in the current block.

Code using the bump allocator must take care never to call any functions
which could try to call realloc() (or variants), pfree(),
GetMemoryChunkContext() or GetMemoryChunkSpace() on a bump allocated
chunk.  Due to lack of chunk headers, these operations are unsupported.
To increase the chances of catching such issues, when compiled with
MEMORY_CONTEXT_CHECKING, bump allocated chunks are given a header and
any attempt to perform an unsupported operation will result in an ERROR.
Without MEMORY_CONTEXT_CHECKING, code attempting an unsupported
operation could result in a segfault.

A follow-on commit will implement the first user of bump.

Author: David Rowley
Reviewed-by: Nathan Bossart
Reviewed-by: Matthias van de Meent
Reviewed-by: Tomas Vondra
Reviewed-by: John Naylor
Discussion: https://postgr.es/m/CAApHDvqGSpCU95TmM=Bp=6xjL_nLys4zdZOpfNyWBk97Xrdj2w@mail.gmail.com
2024-04-08 00:02:43 +12:00
David Rowley 0ba8b75e7e Enlarge bit-space for MemoryContextMethodID
Reserve 4 bits for MemoryContextMethodID rather than 3.  3 bits did
technically allow a maximum of 8 memory context types, however, we've
opted to reserve some bit patterns which left us with only 4 slots, all
of which were used.

Here we add another bit which frees up 8 slots for future memory context
types.

In passing, adjust the enum names in MemoryContextMethodID to make it
more clear which ones can be used and which ones are reserved.

Author: Matthias van de Meent, David Rowley
Discussion: https://postgr.es/m/CAApHDvqGSpCU95TmM=Bp=6xjL_nLys4zdZOpfNyWBk97Xrdj2w@mail.gmail.com
2024-04-07 23:32:00 +12:00
David Rowley c4ab7da606 Avoid needless large memcpys in libpq socket writing
Until now, when calling pq_putmessage to write new data to a libpq
socket, all writes are copied into a buffer and that buffer gets flushed
when full to avoid having to perform small writes to the socket.

There are cases where we must write large amounts of data to the socket,
sometimes larger than the size of the buffer.  In this case, it's
wasteful to memcpy this data into the buffer and flush it out, instead,
we can send it directly from the memory location that the data is already
stored in.

Here we adjust internal_putbytes() so that after having just flushed the
buffer to the socket, if the remaining bytes to send is as big or bigger
than the buffer size, we just send directly rather than needlessly
copying into the PqSendBuffer buffer first.

Examples of operations that write large amounts of data in one message
are; outputting large tuples with SELECT or COPY TO STDOUT and
pg_basebackup.

Author: Melih Mutlu
Reviewed-by: Heikki Linnakangas
Reviewed-by: Jelte Fennema-Nio
Reviewed-by: David Rowley
Reviewed-by: Ranier Vilela
Reviewed-by: Andres Freund
Discussion: https://postgr.es/m/CAGPVpCR15nosj0f6xe-c2h477zFR88q12e6WjEoEZc8ZYkTh3Q@mail.gmail.com
2024-04-07 21:20:18 +12:00
Andres Freund a97bbe1f1d Reduce branches in heapgetpage()'s per-tuple loop
Until now, heapgetpage()'s loop over all tuples performed some conditional
checks for each tuple, even though condition did not change across the loop.

This commit fixes that by moving the loop into an inline function. By calling
it with different constant arguments, the compiler can generate an optimized
loop for the different conditions, at the price of two per-page checks.

For cases of all-visible tables and an isolation level other than
serializable, speedups of up to 25% have been measured.

Reviewed-by: John Naylor <johncnaylorls@gmail.com>
Reviewed-by: Zhang Mingli <zmlpostgres@gmail.com>
Tested-by: Quan Zongliang <quanzongliang@yeah.net>
Discussion: https://postgr.es/m/20230716015656.xjvemfbp5fysjiea@awork3.anarazel.de
Discussion: https://postgr.es/m/2ef7ff1b-3d18-2283-61b1-bbd25fc6c7ce@yeah.net
2024-04-06 23:52:26 -07:00
Nathan Bossart 41c51f0c68 Optimize visibilitymap_count() with AVX-512 instructions.
Commit 792752af4e added infrastructure for using AVX-512 intrinsic
functions, and this commit uses that infrastructure to optimize
visibilitymap_count().  Specificially, a new pg_popcount_masked()
function is introduced that applies a bitmask to every byte in the
buffer prior to calculating the population count, which is used to
filter out the all-visible or all-frozen bits as needed.  Platforms
without AVX-512 support should also see a nice speedup due to the
reduced number of calls to a function pointer.

Co-authored-by: Ants Aasma
Discussion: https://postgr.es/m/BL1PR11MB5304097DF7EA81D04C33F3D1DCA6A%40BL1PR11MB5304.namprd11.prod.outlook.com
2024-04-06 22:58:23 -05:00
Nathan Bossart 792752af4e Optimize pg_popcount() with AVX-512 instructions.
Presently, pg_popcount() processes data in 32-bit or 64-bit chunks
when possible.  Newer hardware that supports AVX-512 instructions
can use 512-bit chunks, which provides a nice speedup, especially
for larger buffers.  This commit introduces the infrastructure
required to detect compiler and CPU support for the required
AVX-512 intrinsic functions, and it adds a new pg_popcount()
implementation that uses these functions.  If CPU support for this
optimized implementation is detected at runtime, a function pointer
is updated so that it is used by subsequent calls to pg_popcount().

Most of the existing in-tree calls to pg_popcount() should benefit
from these instructions, and calls with smaller buffers should at
least not regress compared to v16.  The new infrastructure
introduced by this commit can also be used to optimize
visibilitymap_count(), but that is left for a follow-up commit.

Co-authored-by: Paul Amonson, Ants Aasma
Reviewed-by: Matthias van de Meent, Tom Lane, Noah Misch, Akash Shankaran, Alvaro Herrera, Andres Freund, David Rowley
Discussion: https://postgr.es/m/BL1PR11MB5304097DF7EA81D04C33F3D1DCA6A%40BL1PR11MB5304.namprd11.prod.outlook.com
2024-04-06 21:56:23 -05:00
Thomas Munro 158f581923 Fix if/while thinko in read_stream.c edge case.
When we determine that a wanted block can't be combined with the current
pending read, it's time to start that read to get it out of the way.  An
"if" in that code path should have been a "while", because it might take
more than one go in case of partial reads.  This was only broken for
smaller ranges, as the more common case of io_combine_limit-sized ranges
is handled earlier in the code and knows how to loop, hiding the bug for
a while.

Discovered while testing large parallel sequential scans of partially
cached tables.  The ramp-up-and-down block allocator for parallel scans
could hit the problem case and skip some blocks near the end that should
have been streamed.

Defect in commit b5a9b18c.

Discussion: https://postgr.es/m/CA%2BhUKG%2Bh8Whpv0YsJqjMVkjYX%2B80fTVc6oi-V%2BzxJvykLpLHYQ%40mail.gmail.com
2024-04-07 14:51:33 +12:00
Tom Lane beb012b42f Disable parallel query in psql error-with-FETCH_COUNT test.
The buildfarm members using debug_parallel_query = regress are mostly
unhappy with this test.  I guess what is happening is that rows
generated by a parallel worker are buffered, and might or might not
get to the leader before the expected error occurs.  We did not see
any variability in the old version of this test because each FETCH
would succeed or fail atomically, leading to a predictable number of
rows emitted before failure.  I don't find this to be a bug, just
unspecified behavior, so let's disable parallel query for this one
test case to make the results stable.
2024-04-06 21:49:24 -04:00
Tom Lane 90f5178211 Re-implement psql's FETCH_COUNT feature atop libpq's chunked mode.
Formerly this was done with a cursor, which is problematic since
not all result-set-returning query types can be put into a cursor.
The new implementation is better integrated into other psql
features, too.

Daniel Vérité, reviewed by Laurenz Albe and myself (and whacked
around a bit by me, so any remaining bugs are my fault)

Discussion: https://postgr.es/m/CAKZiRmxsVTkO928CM+-ADvsMyePmU3L9DQCa9NwqjvLPcEe5QA@mail.gmail.com
2024-04-06 20:45:11 -04:00
Tom Lane 4643a2b265 Support retrieval of results in chunks with libpq.
This patch generalizes libpq's existing single-row mode to allow
individual partial-result PGresults to contain up to N rows, rather
than always one row.  This reduces malloc overhead compared to plain
single-row mode, and it is very useful for psql's FETCH_COUNT feature,
since otherwise we'd have to add code (and cycles) to either merge
single-row PGresults into a bigger one or teach psql's
results-printing logic to accept arrays of PGresults.

To avoid API breakage, PQsetSingleRowMode() remains the same, and we
add a new function PQsetChunkedRowsMode() to invoke the more general
case.  Also, PGresults obtained the old way continue to carry the
PGRES_SINGLE_TUPLE status code, while if PQsetChunkedRowsMode() is
used then their status code is PGRES_TUPLES_CHUNK.  The underlying
logic is the same either way, though.

Daniel Vérité, reviewed by Laurenz Albe and myself (and whacked
around a bit by me, so any remaining bugs are my fault)

Discussion: https://postgr.es/m/CAKZiRmxsVTkO928CM+-ADvsMyePmU3L9DQCa9NwqjvLPcEe5QA@mail.gmail.com
2024-04-06 20:45:11 -04:00
Tomas Vondra 92641d8d65 Change BitmapAdjustPrefetchIterator to accept BlockNumber
BitmapAdjustPrefetchIterator() only used the blockno member of the
passed in TBMIterateResult to ensure that the prefetch iterator and
regular iterator stay in sync. Pass it the BlockNumber only, so that we
can move away from using the TBMIterateResult outside of table AM
specific code.

Author: Melanie Plageman
Reviewed-by: Tomas Vondra, Andres Freund, Heikki Linnakangas
Discussion: https://postgr.es/m/CAAKRu_ZwCwWFeL_H3ia26bP2e7HiKLWt0ZmGXPVwPO6uXq0vaA%40mail.gmail.com
2024-04-07 01:25:15 +02:00
Tomas Vondra 1fdb0ce9b1 BitmapHeapScan: Use correct recheck flag for skip_fetch
As of 7c70996ebf, BitmapPrefetch() used the recheck flag for
the current block to determine whether or not it should skip prefetching
the proposed prefetch block. As explained in the comment, this assumed
the index AM will report the same recheck value for the future page as
it did for the current page - but there's no guarantee.

This only affects prefetching - if the recheck flag changes, we may
prefetch blocks unecessarily and not prefetch blocks that will be
needed. But we don't need to rely on that assumption - we know the
recheck flag for the block we're considering prefetching, so we can
use that.

The impact is very limited in practice - the opclass would need to
assign different recheck flags to different blocks, but none of the
built-in opclasses seems to do that.

Author: Melanie Plageman
Reviewed-by: Tomas Vondra, Andres Freund, Tom Lane
Discussion: https://postgr.es/m/1939305.1712415547%40sss.pgh.pa.us
2024-04-07 00:51:03 +02:00
Tomas Vondra 04e72ed617 BitmapHeapScan: Push skip_fetch optimization into table AM
Commit 7c70996ebf introduced an optimization to allow bitmap
scans to operate like index-only scans by not fetching a block from the
heap if none of the underlying data is needed and the block is marked
all visible in the visibility map.

With the introduction of table AMs, a FIXME was added to this code
indicating that the skip_fetch logic should be pushed into the table
AM-specific code, as not all table AMs may use a visibility map in the
same way.

This commit resolves this FIXME for the current block. The layering
violation is still present in BitmapHeapScans's prefetching code, which
uses the visibility map to decide whether or not to prefetch a block.
However, this can be addressed independently.

Author: Melanie Plageman
Reviewed-by: Andres Freund, Heikki Linnakangas, Tomas Vondra, Mark Dilger
Discussion: https://postgr.es/m/CAAKRu_ZwCwWFeL_H3ia26bP2e7HiKLWt0ZmGXPVwPO6uXq0vaA%40mail.gmail.com
2024-04-07 00:24:14 +02:00
Alexander Korotkov 87c21bb941 Implement ALTER TABLE ... SPLIT PARTITION ... command
This new DDL command splits a single partition into several parititions.
Just like ALTER TABLE ... MERGE PARTITIONS ... command, new patitions are
created using createPartitionTable() function with parent partition as the
template.

This commit comprises quite naive implementation which works in single process
and holds the ACCESS EXCLUSIVE LOCK on the parent table during all the
operations including the tuple routing.  This is why this new DDL command
can't be recommended for large partitioned tables under a high load.  However,
this implementation come in handy in certain cases even as is.
Also, it could be used as a foundation for future implementations with lesser
locking and possibly parallel.

Discussion: https://postgr.es/m/c73a1746-0cd0-6bdd-6b23-3ae0b7c0c582%40postgrespro.ru
Author: Dmitry Koval
Reviewed-by: Matthias van de Meent, Laurenz Albe, Zhihong Yu, Justin Pryzby
Reviewed-by: Alvaro Herrera, Robert Haas, Stephane Tachoires
2024-04-07 01:18:44 +03:00
Alexander Korotkov 1adf16b8fb Implement ALTER TABLE ... MERGE PARTITIONS ... command
This new DDL command merges several partitions into the one partition of the
target table.  The target partition is created using new
createPartitionTable() function with parent partition as the template.

This commit comprises quite naive implementation which works in single process
and holds the ACCESS EXCLUSIVE LOCK on the parent table during all the
operations including the tuple routing.  This is why this new DDL command
can't be recommended for large partitioned tables under a high load.  However,
this implementation come in handy in certain cases even as is.
Also, it could be used as a foundation for future implementations with lesser
locking and possibly parallel.

Discussion: https://postgr.es/m/c73a1746-0cd0-6bdd-6b23-3ae0b7c0c582%40postgrespro.ru
Author: Dmitry Koval
Reviewed-by: Matthias van de Meent, Laurenz Albe, Zhihong Yu, Justin Pryzby
Reviewed-by: Alvaro Herrera, Robert Haas, Stephane Tachoires
2024-04-07 01:18:43 +03:00
Tomas Vondra fe1431e39c BitmapHeapScan: postpone setting can_skip_fetch
Set BitmapHeapScanState->can_skip_fetch in BitmapHeapNext() instead of
in ExecInitBitmapHeapScan(). This is a preliminary step to pushing the
skip fetch optimization into heap AM code.

Author: Melanie Plageman
Reviewed-by: Tomas Vondra, Andres Freund, Heikki Linnakangas
Discussion: https://postgr.es/m/CAAKRu_ZwCwWFeL_H3ia26bP2e7HiKLWt0ZmGXPVwPO6uXq0vaA%40mail.gmail.com
2024-04-06 23:56:49 +02:00
Alexander Korotkov 74eaf66f98 Call WaitLSNCleanup() in AbortTransaction()
Even though waiting for replay LSN happens without explicit transaction,
AbortTransaction() is responsible for the cleanup of the shared memory if
the error is thrown in a stored procedure.  So, we need to do WaitLSNCleanup()
there to clean up after some unexpected error happened while waiting for
replay LSN.

Discussion: https://postgr.es/m/202404051815.eri4u5q6oj26%40alvherre.pgsql
Author: Alvaro Herrera
2024-04-07 00:49:53 +03:00
Alexander Korotkov ee79928441 Clarify what is protected by WaitLSNLock
Not just WaitLSNState.waitersHeap, but also WaitLSNState.procInfos and
updating of WaitLSNState.minWaitedLSN is protected by WaitLSNLock.  There
is one now documented exclusion on fast-path checking of
WaitLSNProcInfo.inHeap flag.

Discussion: https://postgr.es/m/202404030658.hhj3vfxeyhft%40alvherre.pgsql
2024-04-07 00:49:53 +03:00
Alexander Korotkov 25f42429e2 Use an LWLock instead of a spinlock in waitlsn.c
This should prevent busy-waiting when number of waiting processes is high.

Discussion: https://postgr.es/m/202404030658.hhj3vfxeyhft%40alvherre.pgsql
Author: Alvaro Herrera
2024-04-07 00:49:53 +03:00
Tomas Vondra 1577081e96 BitmapHeapScan: begin scan after bitmap creation
Change the order so that the table scan is initialized only after
initializing the index scan and building the bitmap.

This is mostly a cosmetic change for now, but later commits will need
to pass parameters to table_beginscan_bm() that are unavailable in
ExecInitBitmapHeapScan().

Author: Melanie Plageman
Reviewed-by: Tomas Vondra, Andres Freund, Heikki Linnakangas
Discussion: https://postgr.es/m/CAAKRu_ZwCwWFeL_H3ia26bP2e7HiKLWt0ZmGXPVwPO6uXq0vaA%40mail.gmail.com
2024-04-06 22:58:04 +02:00
Noah Misch 06558f4952 Backport IPC::Run optimization to src/test/perl.
This one-liner makes the TAP portion of "make check-world" 7% faster on
a non-Windows machine.

Discussion: https://postgr.es/m/20240331050310.09@rfd.leadboat.com
2024-04-06 09:27:55 -07:00
Peter Geoghegan 5bf748b86b Enhance nbtree ScalarArrayOp execution.
Commit 9e8da0f7 taught nbtree to handle ScalarArrayOpExpr quals
natively.  This works by pushing down the full context (the array keys)
to the nbtree index AM, enabling it to execute multiple primitive index
scans that the planner treats as one continuous index scan/index path.
This earlier enhancement enabled nbtree ScalarArrayOp index-only scans.
It also allowed scans with ScalarArrayOp quals to return ordered results
(with some notable restrictions, described further down).

Take this general approach a lot further: teach nbtree SAOP index scans
to decide how to execute ScalarArrayOp scans (when and where to start
the next primitive index scan) based on physical index characteristics.
This can be far more efficient.  All SAOP scans will now reliably avoid
duplicative leaf page accesses (just like any other nbtree index scan).
SAOP scans whose array keys are naturally clustered together now require
far fewer index descents, since we'll reliably avoid starting a new
primitive scan just to get to a later offset from the same leaf page.

The scan's arrays now advance using binary searches for the array
element that best matches the next tuple's attribute value.  Required
scan key arrays (i.e. arrays from scan keys that can terminate the scan)
ratchet forward in lockstep with the index scan.  Non-required arrays
(i.e. arrays from scan keys that can only exclude non-matching tuples)
"advance" without the process ever rolling over to a higher-order array.

Naturally, only required SAOP scan keys trigger skipping over leaf pages
(non-required arrays cannot safely end or start primitive index scans).
Consequently, even index scans of a composite index with a high-order
inequality scan key (which we'll mark required) and a low-order SAOP
scan key (which we won't mark required) now avoid repeating leaf page
accesses -- that benefit isn't limited to simpler equality-only cases.
In general, all nbtree index scans now output tuples as if they were one
continuous index scan -- even scans that mix a high-order inequality
with lower-order SAOP equalities reliably output tuples in index order.
This allows us to remove a couple of special cases that were applied
when building index paths with SAOP clauses during planning.

Bugfix commit 807a40c5 taught the planner to avoid generating unsafe
path keys: path keys on a multicolumn index path, with a SAOP clause on
any attribute beyond the first/most significant attribute.  These cases
are now all safe, so we go back to generating path keys without regard
for the presence of SAOP clauses (just like with any other clause type).
Affected queries can now exploit scan output order in all the usual ways
(e.g., certain "ORDER BY ... LIMIT n" queries can now terminate early).

Also undo changes from follow-up bugfix commit a4523c5a, which taught
the planner to produce alternative index paths, with path keys, but
without low-order SAOP index quals (filter quals were used instead).
We'll no longer generate these alternative paths, since they can no
longer offer any meaningful advantages over standard index qual paths.
Affected queries thereby avoid all of the disadvantages that come from
using filter quals within index scan nodes.  They can avoid extra heap
page accesses from using filter quals to exclude non-matching tuples
(index quals will never have that problem).  They can also skip over
irrelevant sections of the index in more cases (though only when nbtree
determines that starting another primitive scan actually makes sense).

There is a theoretical risk that removing restrictions on SAOP index
paths from the planner will break compatibility with amcanorder-based
index AMs maintained as extensions.  Such an index AM could have the
same limitations around ordered SAOP scans as nbtree had up until now.
Adding a pro forma incompatibility item about the issue to the Postgres
17 release notes seems like a good idea.

Author: Peter Geoghegan <pg@bowt.ie>
Author: Matthias van de Meent <boekewurm+postgres@gmail.com>
Reviewed-By: Heikki Linnakangas <hlinnaka@iki.fi>
Reviewed-By: Matthias van de Meent <boekewurm+postgres@gmail.com>
Reviewed-By: Tomas Vondra <tomas.vondra@enterprisedb.com>
Discussion: https://postgr.es/m/CAH2-Wz=ksvN_sjcnD1+Bt-WtifRA5ok48aDYnq3pkKhxgMQpcw@mail.gmail.com
2024-04-06 11:47:10 -04:00
Tom Lane ddd9e43a92 Remove obsolete comment in CopyReadLineText().
When this bit of commentary was written, it was alluding to the
fact that we looked for newlines and EOD markers in the raw
(not yet encoding-converted) input data.  We don't do that anymore,
preferring to batch the conversion of larger chunks of input and
split it into lines later.  Hence there's no longer any need for
assumptions about the relevant characters being encoding-invariant,
and we should remove this comment saying we assume that.

Discussion: https://postgr.es/m/1461688.1712347668@sss.pgh.pa.us
2024-04-06 11:16:27 -04:00
John Naylor a365d9e2e8 Speed up tail processing when hashing aligned C strings, take two
After encountering the NUL terminator, the word-at-a-time loop exits
and we must hash the remaining bytes. Previously we calculated
the terminator's position and re-loaded the remaining bytes from
the input string. This was slower than the unaligned case for very
short strings. We already have all the data we need in a register,
so let's just mask off the bytes we need and hash them immediately.

In addition to endianness issues, the previous attempt upset valgrind
in the way it computed the mask. Whether by accident or by wisdom,
the author's proposed method passes locally with valgrind 3.22.

Ants Aasma, with cosmetic adjustments by me

Discussion: https://postgr.es/m/CANwKhkP7pCiW_5fAswLhs71-JKGEz1c1%2BPC0a_w1fwY4iGMqUA%40mail.gmail.com
2024-04-06 17:14:28 +07:00
John Naylor 0c25fee359 Teach fasthash_accum to use platform endianness for bytewise loads
This function previously used a mix of word-wise loads and bytewise
loads. The bytewise loads happened to be little-endian regardless of
platform. This in itself is not a problem. However, a future commit
will require the same result whether A) the input is loaded as a
word with the relevent bytes masked-off, or B) the input is loaded
one byte at a time.

While at it, improve debuggability of the internal hash state.

Discussion: https://postgr.es/m/CANWCAZZpuV1mES1mtSpAq8tWJewbrv4gEz6R_k4gzNG8GZ5gag%40mail.gmail.com
2024-04-06 17:14:28 +07:00
Thomas Munro 98f320eb2e Increase default vacuum_buffer_usage_limit to 2MB.
The BAS_VACUUM ring size has been 256kB since commit d526575f introduced
the mechanism 17 years ago.  Commit 1cbbee03 recently made it
configurable but retained the traditional default.  The correct default
size has been debated for years, but 256kB is certainly very small.
VACUUM soon needs to write back data it dirtied only 32 blocks ago,
which usually requires flushing the WAL.  New experiments in prefetching
pages for VACUUM exacerbated the problem by crashing into dirty data
even sooner.  Let's make the default 2MB.  That's 1.6% of the default
toy buffer pool size, and 0.2% of 1GB, which would be a considered a
small shared_buffers setting for a real system these days.  Users are
still free to set the GUC to a different value.

Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/20240403221257.md4gfki3z75cdyf6%40awork3.anarazel.de
Discussion: https://postgr.es/m/CA%2BhUKGLY4Q4ZY4f1rvnFtv6%2BPkjNf8MejdPkcju3Qii9DYqqcQ%40mail.gmail.com
2024-04-06 23:12:03 +13:00
Thomas Munro 3bd8439ed6 Allow BufferAccessStrategy to limit pin count.
While pinning extra buffers to look ahead, users of strategies are in
danger of using too many buffers.  For some strategies, that means
"escaping" from the ring, and in others it means forcing dirty data to
disk very frequently with associated WAL flushing.  Since external code
has no insight into any of that, allow individual strategy types to
expose a clamp that should be applied when deciding how many buffers to
pin at once.

Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Melanie Plageman <melanieplageman@gmail.com>
Discussion: https://postgr.es/m/CAAKRu_aJXnqsyZt6HwFLnxYEBgE17oypkxbKbT1t1geE_wvH2Q%40mail.gmail.com
2024-04-06 23:11:45 +13:00
John Naylor f956ecd035 Convert uses of hash_string_pointer to fasthash equivalent
Remove duplicate hash_string_pointer() function definitions by creating
a new inline function hash_string() for this purpose.

This has the added advantage of avoiding strlen() calls when doing hash
lookup. It's not clear how many of these are perfomance-sensitive
enough to benefit from that, but the simplification is worth it on
its own.

Reviewed by Jeff Davis

Discussion: https://postgr.es/m/CANWCAZbg_XeSeY0a_PqWmWqeRATvzTzUNYRLeT%2Bbzs%2BYQdC92g%40mail.gmail.com
2024-04-06 12:20:40 +07:00
John Naylor db17594ad7 Add macro to disable address safety instrumentation
fasthash_accum_cstring_aligned() uses a technique, found in various
strlen() implementations, to detect a string's NUL terminator by
reading a word at at time. That triggers failures when testing with
"-fsanitize=address", at least with frontend code. To enable using
this function anywhere, add a function attribute macro to disable
such testing.

Reviewed by Jeff Davis

Discussion: https://postgr.es/m/CANWCAZbwvp7oUEkbw-xP4L0_S_WNKq-J-ucP4RCNDPJnrakUPw%40mail.gmail.com
2024-04-06 12:20:40 +07:00
John Naylor 4b968e2027 Fix incorrect return type
fasthash32() calculates a 32-bit hashcode, but the return
type was uint64. Change to uint32.

Noted by Jeff Davis

Discussion: https://postgr.es/m/b16c93e6c736a422d4de668343515375664eb05d.camel%40j-davis.com
2024-04-06 12:20:40 +07:00
Thomas Munro aa1e8c2064 Improve read_stream.c's fast path.
The "fast path" for well cached scans that don't do any I/O was
accidentally coded in a way that could only be triggered by pg_prewarm's
usage pattern, which starts out with a higher distance because of the
flags it passes in.  We want it to work for streaming sequential scans
too, once that patch is committed.  Adjust.

Reviewed-by: Melanie Plageman <melanieplageman@gmail.com>
Discussion: https://postgr.es/m/CA%2BhUKGKXZALJ%3D6aArUsXRJzBm%3Dqvc4AWp7%3DiJNXJQqpbRLnD_w%40mail.gmail.com
2024-04-06 18:00:35 +13:00
Andres Freund 9e7386924e Fix headerscheck violation introduced in f8ce4ed78c
Per ci.
2024-04-05 14:27:45 -07:00
Andrew Dunstan c3e60f3d7e Silence some compiler warnings in commit 3311ea86ed
Per report from Nathan Bossart
2024-04-05 16:08:40 -04:00
Robert Haas 55a5ee30cd Fix incorrect calculation in BlockRefTableEntryGetBlocks.
The previous formula was incorrect in the case where the function's
nblocks argument was a multiple of BLOCKS_PER_CHUNK, which happens
whenever a relation segment file is exactly 512MB or exactly 1GB in
length. In such cases, the formula would calculate a stop_offset of
0 rather than 65536, resulting in modified blocks in the second half
of a 1GB file, or all the modified blocks in a 512MB file, being
omitted from the incremental backup.

Reported off-list by Tomas Vondra and Jakub Wartak.

Discussion: http://postgr.es/m/CA+TgmoYwy_KHp1-5GYNmVa=zdeJWhNH1T0SBmEuvqQNJEHj1Lw@mail.gmail.com
2024-04-05 13:41:19 -04:00
Tomas Vondra 079d94ab34 Check HAVE_COPY_FILE_RANGE before calling copy_file_range
Fix a mistake in ac81101551 - write_reconstructed_file() called
copy_file_range() without properly checking HAVE_COPY_FILE_RANGE.

Reported by several macOS machines. Also reported by cfbot, but I missed
that issue before commit.
2024-04-05 19:38:20 +02:00
Tomas Vondra ac81101551 Allow using copy_file_range in write_reconstructed_file
This commit allows using copy_file_range() for efficient combining of
data from multiple files, instead of simply reading/writing the blocks.
Depending on the filesystem and other factors (size of the increment,
distribution of modified blocks etc.) this may be faster than the
block-by-block copy, but more importantly it enables various features
provided by CoW filesystems.

If a checksum needs to be calculated for the file, the same strategy as
when copying whole files is used - copy_file_range is used to copy the
blocks, but the file is also read for the checksum calculation.

While the checksum calculation is rarely needed when cloning whole
files, when reconstructing the files from multiple backups it needs to
happen almost always (the only exception is when the user specified
--no-manifest).

Author: Tomas Vondra
Reviewed-by: Thomas Munro, Jakub Wartak, Robert Haas
Discussion: https://postgr.es/m/3024283a-7491-4240-80d0-421575f6bb23%40enterprisedb.com
2024-04-05 19:19:36 +02:00
Alvaro Herrera b8b37e41ba
Make libpqsrv_cancel's return const char *, not char *
Per headerscheck's C++ check.

Discussion: https://postgr.es/m/372769.1712179784@sss.pgh.pa.us
2024-04-05 18:23:10 +02:00
Tomas Vondra 8e392595e5 Remove unused variable in checksum_file()
The 'offset' variable was set but otherwise unused.

Per buildfarm animals with clang, e.g. sifaka and longlin.
2024-04-05 18:13:15 +02:00
Tomas Vondra f8ce4ed78c Allow copying files using clone/copy_file_range
Adds --clone/--copy-file-range options to pg_combinebackup, to allow
copying files using file cloning or copy_file_range(). These methods may
be faster than the standard block-by-block copy, but the main advantage
is that they enable various features provided by CoW filesystems.

This commit only uses these copy methods for files that did not change
and can be copied as a whole from a single backup.

These new copy methods may not be available on all platforms, in which
case the command throws an error (immediately, even if no files would be
copied as a whole). This early failure seems better than failing later
when trying to copy the first file, after performing a lot of work on
earlier files.

If the requested copy method is available, but a checksum needs to be
recalculated (e.g. because of a different checksum type), the file is
still copied using the requested method, but it is also read for the
checksum calculation. Depending on the filesystem this may be more
expensive than just performing the simple copy, but it does enable the
CoW benefits.

Initial patch by Jakub Wartak, various reworks and improvements by me.

Author: Tomas Vondra, Jakub Wartak
Reviewed-by: Thomas Munro, Jakub Wartak, Robert Haas
Discussion: https://postgr.es/m/3024283a-7491-4240-80d0-421575f6bb23%40enterprisedb.com
2024-04-05 18:01:32 +02:00
Tom Lane 3c5ff36aba Suppress "variable may be used uninitialized" warning.
Buildfarm member caiman is showing this, which surprises me because
it's very late-model gcc (14.0.1) and ought to be smart enough to
know that elog(ERROR) doesn't return.  But we're likely to see the
same from stupider compilers too, so add a dummy initialization in
our usual style.
2024-04-05 10:58:30 -04:00
Tomas Vondra 10e3226ba1 Align blocks in incremental backups to BLCKSZ
Align blocks stored in incremental files to BLCKSZ, so that the
incremental backups work well with CoW filesystems.

The header of the incremental file is padded with \0 to a multiple of
BLCKSZ, so that the block data (also BLCKSZ) is aligned to BLCKSZ. The
padding is added only to files containing block data, so files with just
the header remain small. This adds a bit of extra space, but as the
number of blocks increases the overhead gets negligible very quickly.
And as the padding is \0 bytes, it does compress extremely well.

The alignment is important for CoW filesystems that usually require the
blocks to be aligned to filesystem page size for features like block
sharing, deduplication etc. to work well. With the variable sized header
the blocks in the increments were not aligned at all, negating the
benefits of the CoW filesystems.

This matters even for non-CoW filesystems, for example when placed on a
RAID array. If the block is not aligned, it may easily span multiple
devices, causing read and write amplification.

It might be better to align the blocks to the filesystem page, not
BLCKSZ, but we have no good way to determine that. Even if we determine
the page size at the time of taking the backup, the backup may move. For
now the BLCKSZ seems sufficient - the filesystem page is usually 4K, so
the default BLCKSZ (8K by default) is aligned to that.

Author: Tomas Vondra
Reviewed-by: Robert Haas, Jakub Wartak
Discussion: https://postgr.es/m/3024283a-7491-4240-80d0-421575f6bb23%40enterprisedb.com
2024-04-05 16:30:01 +02:00
Alvaro Herrera ee1cbe806d
Operate XLogCtl->log{Write,Flush}Result with atomics
This removes the need to hold both the info_lck spinlock and
WALWriteLock to update them.  We use stock atomic write instead, with
WALWriteLock held.  Readers can use atomic read, without any locking.

This allows for some code to be reordered: some places were a bit
contorted to avoid repeated spinlock acquisition, but that's no longer a
concern, so we can turn them to more natural coding.  Some further
changes are possible (maybe to performance wins), but in this commit I
did rather minimal ones only, to avoid increasing the blast radius.

Reviewed-by: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>
Reviewed-by: Jeff Davis <pgsql@j-davis.com>
Reviewed-by: Andres Freund <andres@anarazel.de> (earlier versions)
Discussion: https://postgr.es/m/20200831182156.GA3983@alvherre.pgsql
2024-04-05 16:14:39 +02:00
Amit Kapila 6f132ed693 Allow synced slots to have their inactive_since.
This commit does two things:
1) Maintains inactive_since for sync slots whenever the slot is released
just like any other regular slot.

2) Ensures the value is set to the current timestamp during the promotion
of standby to help correctly interpret the time after promotion. We don't
want the slots to appear inactive for a long time after promotion if they
haven't been synchronized recently. This would also avoid the invalidation
of such slots immediately after promotion if tomorrow we have a feature
that invalidates slots based on their inactivity time. Whoever acquires
the slot i.e. makes the slot active will reset it to NULL.

Author: Bharath Rupireddy
Reviewed-by: Bertrand Drouvot, Amit Kapila, Shveta Malik, Masahiko Sawada
Discussion: https://postgr.es/m/CAA4eK1KrPGwfZV9LYGidjxHeW+rxJ=E2ThjXvwRGLO=iLNuo=Q@mail.gmail.com
Discussion: https://postgr.es/m/CALj2ACW4aUe-_uFQOjdWCEN-xXoLGhmvRFnL8SNw_TZ5nJe+aw@mail.gmail.com
Discussion: https://postgr.es/m/CA+Tgmob_Ta-t2ty8QrKHBGnNLrf4ZYcwhGHGFsuUoFrAEDw4sA@mail.gmail.com
2024-04-05 09:48:49 +05:30
Michael Paquier f98dbdeb51 Add "ABI_compatibility" regions to wait_event_names.txt
The current design behind the automatic generation of the C code and
documentation related to wait events introduced in fa88928470 does not
offer a way to attach new wait events without breaking ABI
compatibility, as all the events are forcibly reordered for each section
in the input file wait_event_names.txt.  Adding new wait events to
stable branches is something that has happened in the past, 0b6517a3b7
being a recent example of that with VERSION_FILE_SYNC, so we need a way
to generate any C code for wait events while maintaining compatibility
on stable branches already released.

This commit solves this issue by adding a new region called
"ABI_compatibility" (keyword could be updated to something else if
someone had a better idea) to each section of wait_event_names.txt, so
as one can add new wait events to stable branches in
wait_event_names.txt while keeping the code ABI-compatible.
"ABI_compatibility" has no impact on the documentation generated: all
the wait events of one section are still alphabetically ordered.  LWLock
and Lock sections generate their C code elsewhere, so they do not need
an "ABI_compatibility" region.

For example, let's imagine a wait_event_names.txt like that:
Section: ClassName - Foo
FOO_1	"Waiting in Foo 1"
FOO_2	"Waiting in Foo 2"
ABI_compatibility:
NEW_FOO_1	"Waiting in New Foo 1"
NEW_BAR_1	"Waiting in New Bar 1"

This results in the following enum, where the events in the ABI region
are listed last with the same ordering as in wait_event_names.txt:
typedef enum
{
    WAIT_EVENT_FOO_1,
    WAIT_EVENT_FOO_2,
    WAIT_EVENT_NEW_FOO_1,
    WAIT_EVENT_NEW_BAR_1
} WaitEventFoo;

New wait events added in stable branches should be added at the end of
each ABI_compatibility region, and ABI_compatibility should remain empty
on HEAD and unreleased stable branches.

This design has been suggested by Noah Misch and me.

Reported-by: Noah Misch
Author: Bertrand Drouvot
Reviewed-by: Michael Paquier
Discussion: https://postgr.es/m/20240317183114.16@rfd.leadboat.com
2024-04-05 08:56:52 +09:00
Jeff Davis e2a2357671 Fix test failures when language environment is not UTF-8.
For tests that depend on UTF-8 encoding, force LC_COLLATE=C and
LC_CTYPE=C to avoid an encoding mismatch.

Reported-by: Thomas Munro
Discussion: https://postgr.es/m/CA+hUKGK-ZqV1njkG_=xcCqXh2fcMkz85FTMnhS2opm4ZerH=xw@mail.gmail.com
2024-04-04 16:10:12 -07:00
Robert Haas e57fe3824e Fix old, misleading comment for PGRES_POLLING_ACTIVE.
The comment implies that we can eventually remove this, but per
discussion, we actually don't want to do that ever, in order to
maintain compatibility.

Jelte Fennema-Nio, reviewed by Tristan Partin

Discussion: http://postgr.es/m/CAGECzQTO72jKed5461W8cytV2Msh_e+WUZjOyX_RUQCbjk4LRA@mail.gmail.com
2024-04-04 16:22:11 -04:00
Robert Haas 12b964d781 Remove reachable call to pg_unreachable().
The loop just before this uses break, not return, so this line
is reachable.  Commit cafe105655
introduced this issue.

Jelte Fennema-Nio, reviewed by Tristan Partin

Discussion: http://postgr.es/m/CAGECzQTO72jKed5461W8cytV2Msh_e+WUZjOyX_RUQCbjk4LRA@mail.gmail.com
2024-04-04 16:22:11 -04:00
Tom Lane 096a761d68 Fix ecpg's mechanism for detecting unsupported cases in the grammar.
ecpg wants to emit a warning if it parses a SQL construct that the
backend can parse but will immediately throw a FEATURE_NOT_SUPPORTED
error for.  The way it was testing for this was to see if the string
ERRCODE_FEATURE_NOT_SUPPORTED appeared anywhere in the gram.y code.
This is, of course, not nearly good enough, as there are plenty of
rules in gram.y that throw that error only conditionally.  There was
a hack dating to 2008 to suppress the warning in one rule that
doesn't even exist anymore, but nothing for other cases we've created
since then.  End result was that you could get "unsupported feature
will be passed to server" warnings while compiling perfectly good SQL
code in ecpg.  Somehow we'd not heard complaints about this, but
it was exposed by the recent addition of an ecpg test for a SQL/JSON
construct.

To fix, suppress the warning if the rule contains any "if" statement.
Manual comparison of gram.y with the generated preproc.y file shows
that the warning is now emitted only in rules where it's sensible.

This problem has existed for a long time, so back-patch to all
supported branches.

Discussion: https://postgr.es/m/603615.1712245382@sss.pgh.pa.us
2024-04-04 15:31:53 -04:00
Tom Lane 332d406140 Further cleanup for recent JSON-related commits.
The link commands in test_json_parser/Makefile were a long way
shy of a load, as evidenced by buildfarm failures.  Model them
on pgxs.mk's PROGRAM rule.  (Probably we should have put these
two test programs in different subdirectories so we could
actually use the PROGRAM rule.  But I won't question that
decision today.)
2024-04-04 13:39:12 -04:00
Tom Lane 2497a669ef Further cleanup for recent JSON-related commits.
Add overlooked .gitignore entries.

Fix test_json_parser/Makefile to use the pgxs.mk clean rule
instead of fighting it.  Suppresses a warning from make,
at least for me.
2024-04-04 13:21:25 -04:00
Andrew Dunstan 88620824c2 Tidy up after incremental JSON parser patch
Remove junk left over from non-vpath builds.

Try to remedy gettext error on some platforms.
2024-04-04 12:41:55 -04:00
Andrew Dunstan 1b00fe30a6 Fix warnings re typedef redefinition in ea7b4e9a2a and 3311ea86ed
Per gripe from Tom Lane and the buildfarm
2024-04-04 11:36:26 -04:00
Amit Langote 6f4d63e989 Add missing initialization in transformJsonFuncExpr()
de3600452b added some code for the new JSON_TABLE_OP to that function
but missed to initialize the default_format variable.

Reported-by: Erik Rijkers <er@xs4all.nl>
Discussion: https://postgr.es/m/254b2fa2-2f6b-a30a-20ee-21f8a2c12a50@xs4all.nl
2024-04-04 22:01:13 +09:00
Amit Langote 2f6e78b061 Fix typo introduced in 6185c9737
Reported-by: Jian He <jian.universality@gmail.com>
Discussion: https://postgr.es/m/CACJufxGHiU0p0usjh5hnR0_ByZn4tq1FC3eKAtrQgJeKU6W9kw@mail.gmail.com
2024-04-04 20:53:23 +09:00
Amit Langote de3600452b Add basic JSON_TABLE() functionality
JSON_TABLE() allows JSON data to be converted into a relational view
and thus used, for example, in a FROM clause, like other tabular
data.  Data to show in the view is selected from a source JSON object
using a JSON path expression to get a sequence of JSON objects that's
called a "row pattern", which becomes the source to compute the
SQL/JSON values that populate the view's output columns.  Column
values themselves are computed using JSON path expressions applied to
each of the JSON objects comprising the "row pattern", for which the
SQL/JSON query functions added in 6185c9737c are used.

To implement JSON_TABLE() as a table function, this augments the
TableFunc and TableFuncScanState nodes that are currently used to
support XMLTABLE() with some JSON_TABLE()-specific fields.

Note that the JSON_TABLE() spec includes NESTED COLUMNS and PLAN
clauses, which are required to provide more flexibility to extract
data out of nested JSON objects, but they are not implemented here
to keep this commit of manageable size.

Author: Nikita Glukhov <n.gluhov@postgrespro.ru>
Author: Teodor Sigaev <teodor@sigaev.ru>
Author: Oleg Bartunov <obartunov@gmail.com>
Author: Alexander Korotkov <aekorotkov@gmail.com>
Author: Andrew Dunstan <andrew@dunslane.net>
Author: Amit Langote <amitlangote09@gmail.com>
Author: Jian He <jian.universality@gmail.com>

Reviewers have included (in no particular order):

Andres Freund, Alexander Korotkov, Pavel Stehule, Andrew Alsup,
Erik Rijkers, Zihong Yu, Himanshu Upadhyaya, Daniel Gustafsson,
Justin Pryzby, Álvaro Herrera, Jian He

Discussion: https://postgr.es/m/cd0bb935-0158-78a7-08b5-904886deac4b@postgrespro.ru
Discussion: https://postgr.es/m/20220616233130.rparivafipt6doj3@alap3.anarazel.de
Discussion: https://postgr.es/m/abd9b83b-aa66-f230-3d6d-734817f0995d%40postgresql.org
Discussion: https://postgr.es/m/CA+HiwqE4XTdfb1nW=Ojoy_tQSRhYt-q_kb6i5d4xcKyrLC1Nbg@mail.gmail.com
2024-04-04 20:20:15 +09:00
Peter Eisentraut a9d6c38684 pg_upgrade: Fix typo in message 2024-04-04 12:58:57 +02:00
Andrew Dunstan 222e11a10a Use incremental parsing of backup manifests.
This changes the three callers to json_parse_manifest() to use
json_parse_manifest_incremental_chunk() if appropriate. In the case of
the backend caller, since we don't know the size of the manifest in
advance we always call the incremental parser.

Author: Andrew Dunstan
Reviewed-By: Jacob Champion

Discussion: https://postgr.es/m/7b0a51d6-0d9d-7366-3a1a-f74397a02f55@dunslane.net
2024-04-04 06:46:40 -04:00
Andrew Dunstan ea7b4e9a2a Add support for incrementally parsing backup manifests
This adds the infrastructure for using the new non-recursive JSON parser
in processing manifests. It's important that callers make sure that the
last piece of json handed to the incremental manifest parser contains
the entire last few lines of the manifest, including the checksum.

Author: Andrew Dunstan
Reviewed-By: Jacob Champion

Discussion: https://postgr.es/m/7b0a51d6-0d9d-7366-3a1a-f74397a02f55@dunslane.net
2024-04-04 06:46:40 -04:00
Andrew Dunstan 3311ea86ed Introduce a non-recursive JSON parser
This parser uses an explicit prediction stack, unlike the present
recursive descent parser where the parser state is represented on the
call stack. This difference makes the new parser suitable for use in
incremental parsing of huge JSON documents that cannot be conveniently
handled piece-wise by the recursive descent parser. One potential use
for this will be in parsing large backup manifests associated with
incremental backups.

Because this parser is somewhat slower than the recursive descent
parser, it  is not replacing that parser, but is an additional parser
available to callers.

For testing purposes, if the build is done with -DFORCE_JSON_PSTACK, all
JSON parsing is done with the non-recursive parser, in which case only
trivial regression differences in error messages should be observed.

Author: Andrew Dunstan
Reviewed-By: Jacob Champion

Discussion: https://postgr.es/m/7b0a51d6-0d9d-7366-3a1a-f74397a02f55@dunslane.net
2024-04-04 06:46:40 -04:00
David Rowley 3a4a3537a9 Secondary refactor of heap scanning functions
Similar to 44086b097, refactor heap scanning functions to be more
suitable for the read stream API.

Author: Melanie Plageman
Discussion: https://postgr.es/m/CAAKRu_YtXJiYKQvb5JsA2SkwrsizYLugs4sSOZh3EAjKUg=gEQ@mail.gmail.com
2024-04-04 19:22:45 +13:00
Michael Paquier 2a217c3717 Coordinate emit_log_hook and all log destinations to share the same timeval
This would cause the timestamp values used by emit_log_hook and all the
other log destinations to differ, because the timestamps are reset
before sending the logs to the server and after calling the hook.

This change matters for emit_log_hook when generating log information
with 'n' or 'm' in log_line_prefix through log_status_format(), or when
doing direct calls to get_formatted_log_time() like in the JSON or CSV
log formats.

While on it, this commit fixes a couple of comments related to the
formatted timestamps where the JSON was not mentioned.  Oversight in
dc686681e0, that I have noticed while reviewing this patch.

Author: Kambam Vinay, Michael Paquier
Discussion: https://postgr.es/m/CANiRfmsK36A0i8mnQtzaxhSm3CUCimPwJPp4WQNq53OdSNkgWg@mail.gmail.com
2024-04-04 14:15:22 +09:00
David Rowley 44086b0975 Preliminary refactor of heap scanning functions
To allow the use of the read stream API added in b5a9b18cd for
sequential scans on heap tables, here we make some adjustments to make
that change less invasive and perhaps make the code easier to follow in
the process.

Here heapgetpage() gets broken into two functions:

1) The part which reads the block has now been moved into a function
   named heapfetchbuf().
2) The part which performed pruning and populated the scan's
   rs_vistuples[] array is now moved into a new function named
   heap_prepare_pagescan().

The functionality provided by heap_prepare_pagescan() was only ever
required by SO_ALLOW_PAGEMODE scans, so the branching that was
previously done in heapgetpage() is no longer needed as we simply just
don't call heap_prepare_pagescan() from heapgettup() in the refactored
code.

Author: Melanie Plageman
Discussion: https://postgr.es/m/CAAKRu_YtXJiYKQvb5JsA2SkwrsizYLugs4sSOZh3EAjKUg=gEQ@mail.gmail.com
2024-04-04 16:41:13 +13:00
Michael Paquier 85230a247c pg_regress: Save errno in emit_tap_output_v() and switch to %m
emit_tap_output_v() includes some fprintf() calls for some output
related to the TAP protocol, that may clobber errno and break %m.  This
commit makes the logging of pg_regress smarter by saving errno before
restoring it in vfprintf() where the input strings are used, removing
the need for strerror().  All logs are switched to %m rather than
strerror(), shaving some code.

This was not a problem until now as pg_regress.c did not use %m, but the
change is simple enough that we have no reason to not support this
placeholder, and that will avoid future mistakes if new logs that
include %m are added.

Author: Dagfinn Ilmari Mannsåker
Reviewed-by: Peter Eisentraunt, Michael Paquier
Discussion: https://postgr.es/m/87sf13jhuw.fsf@wibble.ilmari.org
2024-04-04 11:33:07 +09:00
Jeff Davis 71b66171d0 CREATE INDEX: do not update stats during binary upgrade.
During binary upgrade, indexes are created before the data is moved
into place, so it will always be zero.

This is not currently a major problem, but will be when we try to
preserve statistics during upgrade.

Author: Corey Huinker
Discussion: https://postgr.es/m/CADkLM=daPdFB8V0tgFxK-dLowFsAEzWRWJHyxij7BG3kBjcouA@mail.gmail.com
2024-04-03 16:12:45 -07:00
Tom Lane 06286709ee Invent SERIALIZE option for EXPLAIN.
EXPLAIN (ANALYZE, SERIALIZE) allows collection of statistics about
the volume of data emitted by a query, as well as the time taken
to convert the data to the on-the-wire format.  Previously there
was no way to investigate this without actually sending the data
to the client, in which case network transmission costs might
swamp what you wanted to see.  In particular this feature allows
investigating the costs of de-TOASTing compressed or out-of-line
data during formatting.

Stepan Rutz and Matthias van de Meent,
reviewed by Tomas Vondra and myself

Discussion: https://postgr.es/m/ca0adb0e-fa4e-c37e-1cd7-91170b18cae1@gmx.de
2024-04-03 17:41:57 -04:00
Alexander Korotkov 97ce821e3e Fix the parameters order for TableAmRoutine.relation_copy_for_cluster()
Specify OldTable first, NewTable second as used by
table_relation_copy_for_cluster() and as implemented in
heapam_relation_copy_for_cluster().

Backpatch to PostgreSQL 12, where TableAmRoutine was introduced.

Discussion: https://postgr.es/m/ME3P282MB3166860D4911AE82F92DF7C5B63F2%40ME3P282MB3166.AUSP282.PROD.OUTLOOK.COM
Author: Japin Li
Reviewed-by: Pavel Borisov
Backpatch-through: 12
2024-04-04 00:34:28 +03:00
Alvaro Herrera c9920a9068
Split XLogCtl->LogwrtResult into separate struct members
After this change we have XLogCtl->logWriteResult and ->logFlushResult.
There's no functional change, other than the fact that the assignment
from shared memory to local is no longer done via struct assignment, but
instead using a macro that copies each member separately.

The current representation is inconvenient going forward; notably, we
would like to add a new member "Copy" (to keep track of the last
position copied into WAL buffers), so the symmetry between the values in
shared memory vs. those in local would be lost.

This also gives us freedom to later change the concurrency model for the
values in shared memory: we can make them use atomics instead of relying
on the info_lck spinlock.

Reviewed-by: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>
Discussion: https://postgr.es/m/202404031119.cd2kugjk2vho@alvherre.pgsql
2024-04-03 19:55:11 +02:00
Nathan Bossart deb1486c7d Inline pg_popcount() for small buffers.
If there aren't many bytes to process, the function call overhead
of the optimized implementation isn't worth taking, so instead we
inline a loop that consults pg_number_of_ones in that case.  If
there are many bytes to process, we accept the function call
overhead because the optimized versions are likely to be faster.
The threshold at which we use the optimized implementation is set
to the smallest amount of data required to use special popcount
instructions.

Reviewed-by: Alvaro Herrera, Tom Lane
Discussion: https://postgr.es/m/20240402155301.GA2750455%40nathanxps13
2024-04-03 12:22:02 -05:00
Heikki Linnakangas 6dbb490261 Combine freezing and pruning steps in VACUUM
Execute both freezing and pruning of tuples in the same
heap_page_prune() function, now called heap_page_prune_and_freeze(),
and emit a single WAL record containing all changes. That reduces the
overall amount of WAL generated.

This moves the freezing logic from vacuumlazy.c to the
heap_page_prune_and_freeze() function. The main difference in the
coding is that in vacuumlazy.c, we looked at the tuples after the
pruning had already happened, but in heap_page_prune_and_freeze() we
operate on the tuples before pruning. The heap_prepare_freeze_tuple()
function is now invoked after we have determined that a tuple is not
going to be pruned away.

VACUUM no longer needs to loop through the items on the page after
pruning. heap_page_prune_and_freeze() does all the work. It now
returns the list of dead offsets, including existing LP_DEAD items, to
the caller. Similarly it's now responsible for tracking 'all_visible',
'all_frozen', and 'hastup' on the caller's behalf.

Author: Melanie Plageman <melanieplageman@gmail.com>
Discussion: https://www.postgresql.org/message-id/20240330055710.kqg6ii2cdojsxgje@liskov
2024-04-03 19:32:28 +03:00
Heikki Linnakangas 26d138f644 Refactor how heap_prune_chain() updates prunable_xid
In preparation of freezing and counting tuples which are not
candidates for pruning, split heap_prune_record_unchanged() into
multiple functions, depending the kind of line pointer. That's not too
interesting right now, but makes the next commit smaller.

Recording the lowest soon-to-be prunable xid is one of the actions we
take for unchanged LP_NORMAL item pointers but not for others, so move
that to the new heap_prune_record_unchanged_lp_normal() function. The
next commit will add more actions to these functions.

Author: Melanie Plageman <melanieplageman@gmail.com>
Discussion: https://www.postgresql.org/message-id/20240330055710.kqg6ii2cdojsxgje@liskov
2024-04-03 19:32:21 +03:00
Alvaro Herrera be2f073100
Fix zeroing of pg_serial page without SLRU bank lock
Bug in commit 53c2a97a9266: we failed to acquire the correct SLRU bank
lock when iterating to zero-out intermediate pages in predicate.c.
Rewrite the code block so that we follow the locking protocol correctly.

Also update an outdated comment in the same file -- SerialSLRULock
exists no more.

Reported-by: Alexander Lakhin <exclusion@gmail.com>
Reviewed-by: Dilip Kumar <dilipbalaut@gmail.com>
Discussion: https://postgr.es/m/2a25eaf4-a3a4-5fd1-6241-9d7c73142085@gmail.com
2024-04-03 17:49:44 +02:00
Alexander Korotkov bf1e650806 Use the pairing heap instead of a flat array for LSN replay waiters
06c418e163 introduced pg_wal_replay_wait() procedure allowing to wait for
the particular LSN to be replayed on standby.  The waiters were stored in
the flat array.  Even though scanning small arrays is fast, that might be a
problem at scale (a lot of waiting processes).

This commit replaces the flat shared memory array with the pairing heap,
which holds the waiter with the least LSN at the top.  This gives us O(log N)
complexity for both inserting and removing waiters.

Reported-by: Alvaro Herrera
Discussion: https://postgr.es/m/202404030658.hhj3vfxeyhft%40alvherre.pgsql
2024-04-03 18:15:41 +03:00
Daniel Gustafsson 936e3fa378 Drop global objects after completed test
Project policy is to not leave global objects behind after a regress
test run.  This was found as a result of the development of a patch
to make pg_regress detect such leftovers automatically, which in the
end was withdrawn due to issues with parallel runs.

Discussion: https://postgr.es/m/E1phvk7-000VAH-7k@gemulon.postgresql.org
2024-04-03 13:33:25 +02:00
Amit Kapila 2ec005b4e2 Ensure that the sync slots reach a consistent state after promotion without losing data.
We were directly copying the LSN locations while syncing the slots on the
standby. Now, it is possible that at some particular restart_lsn there are
some running xacts, which means if we start reading the WAL from that
location after promotion, we won't reach a consistent snapshot state at
that point. However, on the primary, we would have already been in a
consistent snapshot state at that restart_lsn so we would have just
serialized the existing snapshot.

To avoid this problem we will use the advance_slot functionality unless
the snapshot already exists at the synced restart_lsn location. This will
help us to ensure that snapbuilder/slot statuses are updated properly
without generating any changes. Note that the synced slot will remain as
RS_TEMPORARY till the decoding from corresponding restart_lsn can reach a
consistent snapshot state after which they will be marked as
RS_PERSISTENT.

Per buildfarm

Author: Hou Zhijie
Reviewed-by: Bertrand Drouvot, Shveta Malik, Bharath Rupireddy, Amit Kapila
Discussion: https://postgr.es/m/OS0PR01MB5716B3942AE49F3F725ACA92943B2@OS0PR01MB5716.jpnprd01.prod.outlook.com
2024-04-03 14:04:59 +05:30
Alexander Korotkov e37662f221 Minor improvements for waitlsn.c
* Remove extra includes
 * Fill 'cur' in addLSNWaiter() before taking the spinlock
 * Initialize 'endtime' with zero in WaitForLSN() to avoid compiler warning

Reported-by: Alvaro Herrera, Masahiko Sawada, Daniel Gustafsson
Discussion: https://postgr.es/m/202404030658.hhj3vfxeyhft%40alvherre.pgsql
Discussion: https://postgr.es/m/CAD21AoAx7irptnPH1OkkkNh9E0M6X-phfX7sYZfwoMsc1qV1sQ%40mail.gmail.com
2024-04-03 11:32:39 +03:00
Daniel Gustafsson 9301308bd1 Fix indentation from cafe105655
Per buildfarm animal koel
2024-04-03 09:44:47 +02:00
Daniel Gustafsson 226261f387 Add error codes to some PANIC/FATAL errors reports
This adds errcodes to a set of PANIC and FATAL errors in xlog.c
and relcache.c,  which previously had no errcode at all set, in
order to make fleetwide analysis of errorlogs easier. There are
many more ereport/elogs left which could benefit from having an
errcode but this at least makes a dent in the issue.

Author: Nazir Bilal Yavuz <byavuz81@gmail.com>
Discussion: https://postgr.es/m/CAN55FZ1k8LgLEqncPGmz_fWnrobV6bjABOTH4tOWta6xNcPQig@mail.gmail.com
2024-04-03 09:19:25 +02:00
Nathan Bossart c627d944e6 Add built-in ERROR handling for archive callbacks.
Presently, the archiver process restarts when an archive callback
ERRORs.  To avoid this, archive module authors can use sigsetjmp(),
manage a memory context, etc., but that requires a lot of extra
code that will likely look roughly the same between modules.  This
commit adds basic archive callback ERROR handling to pgarch.c so
that module authors won't ordinarily need to worry about this.
While this built-in handler attempts to clean up anything that an
archive module could conceivably have left behind, it is possible
that some modules are doing unexpected things that require
additional cleanup.  Module authors should be sure to do any extra
required cleanup in a PG_CATCH block within the archiving callback.

The archiving callback is now called in a short-lived memory
context that the archiver process resets between invocations.  If a
module requires longer-lived storage, it must maintain its own
memory context.

Thanks to these changes, the basic_archive module can be greatly
simplified.

Suggested-by: Andres Freund
Reviewed-by: Andres Freund, Yong Li
Discussion: https://postgr.es/m/20230217215624.GA3131134%40nathanxps13
2024-04-02 22:28:11 -05:00
Masahiko Sawada 5bec1d6bc5 Improve eviction algorithm in ReorderBuffer using max-heap for many subtransactions.
Previously, when selecting the transaction to evict during logical
decoding, we check all transactions to find the largest
transaction. This could lead to a significant replication lag
especially in the case where there are many subtransactions.

This commit improves the eviction algorithm in ReorderBuffer using the
max-heap with transaction size as the key to efficiently find the
largest transaction.

The max-heap starts with empty. While the max-heap is empty, we don't
do anything for the max-heap when updating the memory
counter. Therefore, we get the largest transaction in O(N) time, where
N is the number of transactions including top-level transactions and
subtransactions.

We build the max-heap just before selecting the largest transactions
if the number of transactions being decoded is higher than the
threshold, MAX_HEAP_TXN_COUNT_THRESHOLD. After building the max-heap,
we also update the max-heap when updating the memory counter. The
intention is to efficiently find the largest transaction in O(1) time
instead of incurring the cost of memory counter updates (O(log
N)). Once the number of transactions got lower than the threshold, we
reset the max-heap.

The performance benchmark results showed significant speed up (more
than x30 speed up on my machine) in decoding a transaction with 100k
subtransactions, whereas there is no visible overhead in other cases.

Reviewed-by: Amit Kapila, Hayato Kuroda, Vignesh C, Ajin Cherian,
Tomas Vondra, Shubham Khanna, Peter Smith, Álvaro Herrera,
Euler Taveira
Discussion: https://postgr.es/m/CAD21AoAfKTgrBrLq96GcTv9d6k97zaQcDM-rxfKEt4GSe0qnaQ%40mail.gmail.com
2024-04-03 11:40:42 +09:00
David Rowley 7487044d6c Don't adjust ressortgroupref in generate_setop_child_grouplist()
This is already done inside assignSortGroupRef(), therefore is
redundant.

Oversight from 66c0185a3.

Reported-by: Tom Lane
Discussion: https://postgr.es/m/3703023.1711654574@sss.pgh.pa.us
2024-04-03 15:39:29 +13:00
Masahiko Sawada b840508644 Add functions to binaryheap for efficient key removal and update.
Previously, binaryheap didn't support updating a key and removing a
node in an efficient way. For example, in order to remove a node from
the binaryheap, the caller had to pass the node's position within the
array that the binaryheap internally has. Removing a node from the
binaryheap is done in O(log n) but searching for the key's position is
done in O(n).

This commit adds a hash table to binaryheap in order to track the
position of each nodes in the binaryheap. That way, by using newly
added functions such as binaryheap_update_up() etc., both updating a
key and removing a node can be done in O(1) on an average and O(log n)
in worst case. This is known as the indexed binary heap. The caller
can specify to use the indexed binaryheap by passing indexed = true.

The current code does not use the new indexing logic, but it will be
used by an upcoming patch.

Reviewed-by: Vignesh C, Peter Smith, Hayato Kuroda, Ajin Cherian,
Tomas Vondra, Shubham Khanna
Discussion: https://postgr.es/m/CAD21AoDffo37RC-eUuyHJKVEr017V2YYDLyn1xF_00ofptWbkg%40mail.gmail.com
2024-04-03 10:44:21 +09:00
Masahiko Sawada bcb14f4abc Make binaryheap enlargeable.
The node array space of the binaryheap is doubled when there is no
available space.

Reviewed-by: Vignesh C, Peter Smith, Hayato Kuroda, Ajin Cherian,
Tomas Vondra, Shubham Khanna
Discussion: https://postgr.es/m/CAD21AoDffo37RC-eUuyHJKVEr017V2YYDLyn1xF_00ofptWbkg%40mail.gmail.com
2024-04-03 10:27:43 +09:00
Alexander Korotkov 2c91e13013 Move WaitLSNShmemInit() to CreateOrAttachShmemStructs()
Thanks to Andres Freund, Thomas Munrom and David Rowley for investigating
this issue.

Discussion: https://postgr.es/m/CAPpHfdvap5mMLikt8CUjA0osAvCJHT0qnYeR3f84EJ_Kvse0mg%40mail.gmail.com
2024-04-03 02:55:03 +03:00
David Rowley 3b1a7eb289 Don't zero tuple_fraction when planning UNIONs with ORDER BYs
Since 66c0185a3, the planner is able to use Merge Append -> Unique to
implement UNION queries and each subquery is prompted to produce Paths
correctly sorted by the UNION's targetlist.

Here we remove some now redundant code which was zeroing the
tuple_fraction at the parent level.  This will allow the planner to
consider cheap startup paths when planning the UNION's subqueries.

EXCEPT and INTERSECT set operations still have the tuple_fraction zeroed
in generate_nonunion_paths().  These operations currently always read
all of their subqueries' tuples.

Reported-by: Tom Lane
Discussion: https://postgr.es/m/3703023.1711654574@sss.pgh.pa.us
2024-04-03 11:40:33 +13:00