Commit Graph

11441 Commits

Author SHA1 Message Date
Michael Paquier 449e798c77 Introduce sequence_*() access functions
Similarly to tables and indexes, these functions are able to open
relations with a sequence relkind, which is useful to make a distinction
with the other relation kinds.  Previously, commands/sequence.c used a
mix of table_{close,open}() and relation_{close,open}() routines when
manipulating sequence relations, so this clarifies the code.

A direct effect of this change is to align the error messages produced
when attempting DDLs for sequences on relations with an unexpected
relkind, like a table or an index with ALTER SEQUENCE, providing an
extra error detail about the relkind of the relation used in the DDL
query.
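
For illustration, a sequence-only DDL run against a table now reports
the offending relation's kind (names are placeholders, and the message
wording is indicative only):

    CREATE TABLE t (a int);
    ALTER SEQUENCE t RESTART;  -- fails, with an error detail noting that "t" is a table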

Author: Michael Paquier
Reviewed-by: Tomas Vondra
Discussion: https://postgr.es/m/ZWlohtKAs0uVVpZ3@paquier.xyz
2024-02-26 16:04:59 +09:00
Heikki Linnakangas d360e3cc60 Fix compiler warning on typedef redeclaration
bulk_write.c:78:3: error: redefinition of typedef 'BulkWriteState' is a C11 feature [-Werror,-Wtypedef-redefinition]
    } BulkWriteState;
      ^
    ../../../../src/include/storage/bulk_write.h:20:31: note: previous definition is here
    typedef struct BulkWriteState BulkWriteState;
                                  ^
    1 error generated.

Per buildfarm animals 'sifaka' and 'longfin'.

Discussion: https://www.postgresql.org/message-id/9e1f63c3-ef16-404c-b3cb-859a96eaba39@iki.fi
2024-02-23 17:39:27 +02:00
Heikki Linnakangas 8af2565248 Introduce a new smgr bulk loading facility.
The new facility makes it easier to optimize bulk loading, as the
logic for buffering, WAL-logging, and syncing the relation only needs
to be implemented once. It's also less error-prone: We have had a
number of bugs in how a relation is fsync'd - or not - at the end of a
bulk loading operation. By centralizing that logic to one place, we
only need to write it correctly once.

The new facility is faster for small relations: Instead of calling
smgrimmedsync(), we register the fsync to happen at next checkpoint,
which avoids the fsync latency. That can make a big difference if you
are e.g. restoring a schema-only dump with lots of relations.

It is also slightly more efficient with large relations, as the WAL
logging is performed multiple pages at a time. That avoids some WAL
header overhead. The sorted GiST index build did that already; this
moves the buffering to the new facility.

The changes to the pageinspect GiST test need an explanation: Before this
patch, the sorted GiST index build set the LSN on every page to the
special GistBuildLSN value, not the LSN of the WAL record, even though
they were WAL-logged. There was no particular need for it, it just
happened naturally when we wrote out the pages before WAL-logging
them. Now we WAL-log the pages first, like in B-tree build, so the
pages are stamped with the record's real LSN. When the build is not
WAL-logged, we still use GistBuildLSN. To make the test output
predictable, use an unlogged index.

Reviewed-by: Andres Freund
Discussion: https://www.postgresql.org/message-id/30e8f366-58b3-b239-c521-422122dd5150%40iki.fi
2024-02-23 16:10:51 +02:00
Amit Kapila 93db6cbda0 Add a new slot sync worker to synchronize logical slots.
By enabling slot synchronization, all the failover logical replication
slots on the primary (assuming configurations are appropriate) are
automatically created on the physical standbys and are synced
periodically. The slot sync worker on the standby server pings the primary
server at regular intervals to get the necessary failover logical slots
information and create/update the slots locally. The slots that no longer
require synchronization are automatically dropped by the worker.

The nap time of the worker is tuned according to the activity on the
primary. The slot sync worker waits for some time before the next
synchronization, with the duration varying based on whether any slots were
updated during the last cycle.

A new parameter sync_replication_slots enables or disables this new
process.
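
For illustration, a standby could be configured roughly as follows
(assuming both parameters are reloadable, and that primary_slot_name and
a dbname in primary_conninfo are already set up):

    ALTER SYSTEM SET sync_replication_slots = on;
    ALTER SYSTEM SET hot_standby_feedback = on;
    SELECT pg_reload_conf();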

On promotion, the slot sync worker is shut down by the startup process to
drop any temporary slots acquired by the slot sync worker and to prevent
the worker from trying to fetch the failover slots.

The functionality to allow logical walsenders to wait for the physical
standby will be added in a subsequent commit.

Author: Shveta Malik, Hou Zhijie based on design inputs by Masahiko Sawada and Amit Kapila
Reviewed-by: Masahiko Sawada, Bertrand Drouvot, Peter Smith, Dilip Kumar, Ajin Cherian, Nisha Moond, Kuroda Hayato, Amit Kapila
Discussion: https://postgr.es/m/514f6f2f-6833-4539-39f1-96cd1e011f23@enterprisedb.com
2024-02-22 15:25:15 +05:30
Peter Eisentraut fbc93b8b5f Remove custom Constraint node read/write implementations
This is part of an effort to reduce the number of special cases in the
automatically generated node support functions.

Allegedly, only certain fields of the Constraint node are valid based
on contype.  But this has historically not been kept up to date in the
read/write functions.  The Constraint node is only used for debugging
DDL statements, so there are no strong requirements for its output,
and there is no enforcement for its correctness.  (There was no read
support before a6bc3301925.)  Commits e7a552f303 and abf46ad9c7 are
examples of where omissions were fixed.

This patch just removes the custom read/write implementations for the
Constraint node type.  Now we just output all the fields, which is a
bit more than before, but at least we don't have to maintain these
functions anymore.  Also, we lose the string representation of the
contype field, but for this marginal use case that seems tolerable.
This patch also changes the documentation of the Constraint struct to
put less emphasis on grouping fields by constraint type but rather
document for each field how it's used.

Reviewed-by: Paul Jungwirth <pj@illuminatedcomputing.com>
Discussion: https://www.postgresql.org/message-id/flat/4b27fc50-8cd6-46f5-ab20-88dbaadca645@eisentraut.org
2024-02-22 07:07:12 +01:00
Michael Paquier 943f7ae1c8 Add lookup table for replication slot conflict reasons
This commit switches the handling of the conflict cause strings for
replication slots to use a table rather than being explicitly listed,
using a C99-designated initializer syntax for the array elements.  This
makes the whole thing more readable while easing future maintenance,
with fewer areas to update when adding a new conflict reason.

This is similar to 74a7306310, but the scale of the change is smaller
as there are fewer conflict causes than LWLock builtin tranche names.

Author: Bharath Rupireddy
Reviewed-by: Jelte Fennema-Nio
Discussion: https://postgr.es/m/CALj2ACUxSLA91QGFrJsWNKs58KXb1C03mbuwKmzqqmoAKLwJaw@mail.gmail.com
2024-02-22 08:40:40 +09:00
Heikki Linnakangas 28f3915b73 Remove superfluous 'pgprocno' field from PGPROC
It was always just the index of the PGPROC entry from the beginning of
the proc array. Introduce a macro to compute it from the pointer
instead.

Reviewed-by: Andres Freund
Discussion: https://www.postgresql.org/message-id/8171f1aa-496f-46a6-afc3-c46fe7a9b407@iki.fi
2024-02-22 01:21:34 +02:00
Peter Eisentraut 74563f6b90 Revert "Improve compression and storage support with inheritance"
This reverts commit 0413a55699.

pg_dump cannot currently dump all the structures that are allowed by
this patch.  This needs more work in pg_dump and more test coverage.

Discussion: https://www.postgresql.org/message-id/flat/24656cec-d6ef-4d15-8b5b-e8dfc9c833a7@eisentraut.org
2024-02-20 11:10:59 +01:00
Nathan Bossart 3b42bdb471 Use new overflow-safe integer comparison functions.
Commit 6b80394781 introduced integer comparison functions designed
to be as efficient as possible while avoiding overflow.  This
commit makes use of these functions in many of the in-tree qsort()
comparators to help ensure transitivity.  Many of these comparator
functions should also see a small performance boost.

Author: Mats Kindahl
Reviewed-by: Andres Freund, Fabrízio de Royes Mello
Discussion: https://postgr.es/m/CA%2B14426g2Wa9QuUpmakwPxXFWG_1FaY0AsApkvcTBy-YfS6uaw%40mail.gmail.com
2024-02-16 14:05:36 -06:00
Nathan Bossart 6b80394781 Introduce overflow-safe integer comparison functions.
This commit adds integer comparison functions that are designed to
be as efficient as possible while avoiding overflow.  A follow-up
commit will make use of these functions in many of the in-tree
qsort() comparators.  The new functions are not better in all cases
(e.g., when the comparator function is inlined), so it is important
to consider the context before using them.

Author: Mats Kindahl
Reviewed-by: Tom Lane, Heikki Linnakangas, Andres Freund, Thomas Munro, Andrey Borodin, Fabrízio de Royes Mello
Discussion: https://postgr.es/m/CA%2B14426g2Wa9QuUpmakwPxXFWG_1FaY0AsApkvcTBy-YfS6uaw%40mail.gmail.com
2024-02-16 13:37:02 -06:00
Nathan Bossart 5497daf3aa Replace calls to pg_qsort() with the qsort() macro.
Calls to this function might give the impression that pg_qsort()
is somehow different than qsort(), when in fact there is a qsort()
macro in port.h that expands all in-tree uses to pg_qsort().

Reviewed-by: Mats Kindahl
Discussion: https://postgr.es/m/CA%2B14426g2Wa9QuUpmakwPxXFWG_1FaY0AsApkvcTBy-YfS6uaw%40mail.gmail.com
2024-02-16 11:37:50 -06:00
Peter Eisentraut 0413a55699 Improve compression and storage support with inheritance
A child table can specify a compression or storage method different
from its parents.  This was previously an error.  (But this was
inconsistently enforced because for example the settings could be
changed later using ALTER TABLE.)  This now also allows an explicit
override if multiple parents have different compression or storage
settings, which was previously an error that could not be overridden.

The compression and storage properties remain unchanged in a child
inheriting from parent(s) after its creation, i.e., when using ALTER
TABLE ...  INHERIT.  (This is not changed.)

Before this change, the error detail would mention the first pair of
conflicting parent compression or storage methods.  But with this
change, the check waits until the child specification is considered, by
which time we may have encountered many such conflicting pairs.  Hence the
error detail after this change does not include the conflicting
compression/storage methods.  Those can be obtained from parent
definitions if necessary.  The code to maintain a list of all
conflicting methods, or even just the first conflicting pair, does not
seem worth the convenience it offers.  This change is in line with what we
do with conflicting default values.

Before this commit, the specified storage method could be stored in
ColumnDef::storage (CREATE TABLE ... LIKE) or ColumnDef::storage_name
(CREATE TABLE ...).  This caused MergeChildAttribute() and
MergeInheritedAttribute() to ignore a storage method specified in the
child definition, since they looked only at ColumnDef::storage.  This
commit removes ColumnDef::storage and instead uses
ColumnDef::storage_name to save any storage method specification. This
is similar to how compression method specification is handled.

Author: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/24656cec-d6ef-4d15-8b5b-e8dfc9c833a7@eisentraut.org
2024-02-16 13:27:46 +01:00
Alexander Korotkov 51efe38cb9 Introduce transaction_timeout
This commit adds a timeout that is expected to be used to prevent
long-running transactions. Any session whose transaction spans longer
than this timeout will be terminated.

However, this timeout is not applied to prepared transactions.
Only transactions with user connections are affected.
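
A minimal illustration (the timeout value is arbitrary):

    SET transaction_timeout = '5s';
    BEGIN;
    SELECT pg_sleep(10);  -- the session is terminated once the transaction exceeds 5 seconds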

Discussion: https://postgr.es/m/CAAhFRxiQsRs2Eq5kCo9nXE3HTugsAAJdSQSmxncivebAxdmBjQ%40mail.gmail.com
Author: Andrey Borodin <amborodin@acm.org>
Author: Japin Li <japinli@hotmail.com>
Author: Junwang Zhao <zhjwpku@gmail.com>
Reviewed-by: Nikolay Samokhvalov <samokhvalov@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Fujii Masao <masao.fujii@oss.nttdata.com>
Reviewed-by: bt23nguyent <bt23nguyent@oss.nttdata.com>
Reviewed-by: Yuhang Qiu <iamqyh@gmail.com>
2024-02-15 23:56:12 +02:00
Nathan Bossart 8fd0498de2 Remove obsolete check in SIGTERM handler for the startup process.
Thanks to commit 3b00fdba9f, this check in the SIGTERM handler for
the startup process is now obsolete and can be removed.  Instead of
leaving around the dead function write_stderr_signal_safe(), I've
opted to just remove it for now.

This partially reverts commit 97550c0711.

Reviewed-by: Andres Freund, Noah Misch
Discussion: https://postgr.es/m/20231121212008.GA3742740%40nathanxps13
2024-02-14 17:09:31 -06:00
Nathan Bossart 8d8afd48d3 Allow pg_monitor to execute pg_current_logfile().
We allow roles with privileges of pg_monitor to execute functions
like pg_ls_logdir(), so it seems natural that such roles would also
be able to execute this function.
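
For example (role name hypothetical):

    CREATE ROLE monitor_user LOGIN;
    GRANT pg_monitor TO monitor_user;
    -- connected as monitor_user:
    SELECT pg_current_logfile();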

Bumps catversion.

Co-authored-by: Pavlo Golub
Reviewed-by: Daniel Gustafsson
Discussion: https://postgr.es/m/CAK7ymcLmEYWyQkiCZ64WC-HCzXAB0omM%3DYpj9B3rXe8vUAFMqw%40mail.gmail.com
2024-02-14 11:48:29 -06:00
Amit Kapila ddd5f4f54a Add a slot synchronization function.
This commit introduces a new SQL function pg_sync_replication_slots()
which is used to synchronize the logical replication slots from the
primary server to the physical standby so that logical replication can be
resumed after a failover or planned switchover.

A new 'synced' flag is introduced in pg_replication_slots view, indicating
whether the slot has been synchronized from the primary server. On a
standby, synced slots cannot be dropped or consumed, and any attempt to
perform logical decoding on them will result in an error.

The logical replication slots on the primary can be synchronized to the
hot standby by using the 'failover' parameter of
pg_create_logical_replication_slot(), or by using the 'failover' option of
CREATE SUBSCRIPTION during slot creation, and then calling
pg_sync_replication_slots() on standby. For the synchronization to work,
it is mandatory to have a physical replication slot between the primary
and the standby, i.e., 'primary_slot_name' should be configured on the
standby, and 'hot_standby_feedback' must be enabled on the standby. It is
also necessary to specify a valid 'dbname' in the 'primary_conninfo'.
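
As a concrete sketch of that workflow (object names are placeholders,
and 'failover' is assumed to be the trailing optional argument of
pg_create_logical_replication_slot()):

    -- on the primary
    SELECT pg_create_logical_replication_slot('myslot', 'pgoutput', false, false, true);

    -- on the standby, once primary_slot_name, hot_standby_feedback and a
    -- dbname in primary_conninfo are configured
    SELECT pg_sync_replication_slots();
    SELECT slot_name, synced FROM pg_replication_slots;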

If a logical slot is invalidated on the primary, then that slot on the
standby is also invalidated.

If a logical slot on the primary is valid but is invalidated on the
standby, then that slot is dropped but will be recreated on the standby in
the next pg_sync_replication_slots() call provided the slot still exists
on the primary server. It is okay to recreate such slots as long as these
are not consumable on standby (which is the case currently). This
situation may occur due to the following reasons:
- The 'max_slot_wal_keep_size' on the standby is insufficient to retain
WAL records from the restart_lsn of the slot.
- 'primary_slot_name' is temporarily reset to null and the physical slot
is removed.

The slot synchronization status on the standby can be monitored using the
'synced' column of pg_replication_slots view.

The functionality to automatically synchronize slots by a background
worker and to allow logical walsenders to wait for the physical standby
will be added in subsequent commits.

Author: Hou Zhijie, Shveta Malik, Ajin Cherian based on an earlier version by Peter Eisentraut
Reviewed-by: Masahiko Sawada, Bertrand Drouvot, Peter Smith, Dilip Kumar, Nisha Moond, Kuroda Hayato, Amit Kapila
Discussion: https://postgr.es/m/514f6f2f-6833-4539-39f1-96cd1e011f23@enterprisedb.com
2024-02-14 09:45:36 +05:30
Michael Paquier 06bd311bce Revert "Refactor CopyReadAttributes{CSV,Text}() to use a callback in COPY FROM"
This reverts commit 95fb5b4902, for reasons similar to what led to
1aa8324b81.  In this case, the callback was called once per row, which
is not as bad as the previous callback introduced for COPY TO, called
once per argument for each row.  Still, the patch set under discussion
to plug custom routines into the COPY paths would be able to know which
subroutine to use depending on its CopyFromState, so this led to a
suboptimal approach in the end.

For now, this part is reverted to consider better which approach to use.

Discussion: https://postgr.es/m/20240206014125.qofww7ew3dx3v3uk@awork3.anarazel.de
2024-02-14 10:07:22 +09:00
Jeff Davis 91f2cae7a4 Read WAL directly from WAL buffers.
If available, read directly from WAL buffers, avoiding the need to go
through the filesystem. Only for physical replication for now, but can
be expanded to other callers.

In preparation for replicating unflushed WAL data.

Author: Bharath Rupireddy
Discussion: https://postgr.es/m/CALj2ACXKKK%3DwbiG5_t6dGao5GoecMwRkhr7GjVBM_jg54%2BNa%3DQ%40mail.gmail.com
Reviewed-by: Andres Freund, Alvaro Herrera, Nathan Bossart, Dilip Kumar, Nitin Jadhav, Melih Mutlu, Kyotaro Horiguchi
2024-02-12 11:11:22 -08:00
Thomas Munro 65f438471b Fix gai_strerror() thread-safety on Windows.
Commit 5579388d removed code that supplied a fallback implementation of
getaddrinfo(), which was dead code on modern systems.  One tiny piece of
the removed code was still doing something useful on Windows, though:
that OS's own gai_strerror()/gai_strerrorA() function returns a pointer
to a static buffer that it overwrites each time, so it's not
thread-safe.  In rare circumstances, a multi-threaded client program
could get an incorrect or corrupted error message.

Restore the replacement gai_strerror() function, though now that it's
only for Windows we can put it into a win32-specific file and cut it
down to the errors that Windows documents.  The error messages here are
taken from FreeBSD, because Windows' own messages seemed too verbose.

Back-patch to 16.

Reviewed-by: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Discussion: https://postgr.es/m/CA%2BhUKGKz%2BF9d2PTiXwfYV7qJw%2BWg2jzACgSDgPizUw7UG%3Di58A%40mail.gmail.com
2024-02-12 11:14:21 +13:00
Daniel Gustafsson 5c7038d70b Refactor pipe_read_line to return the full line
Commit 5b2f4afffe refactored find_other_exec() and in the process
created pipe_read_line() as a static routine for reading a single
line of output, aimed at reading version numbers.  Commit a7e8ece41
later exposed it externally in order to read a postgresql.conf GUC
using "postgres -C ..".  Further, f06b1c598 also made use of it for
reading a version string much like find_other_exec().  The internal
variable remained "pgver", even when used for other purposes.

Since the function requires passing a buffer and its size, and at
most size - 1 bytes will be read via fgets(), there is a truncation
risk when using this for reading GUCs (like how pg_rewind does,
though the risk in this case is marginal).

To keep this as generic functionality for reading a line from a pipe,
this refactors pipe_read_line() to return an allocated buffer
containing the whole line, removing the risk of silent truncation.

Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi>
Reviewed-by: Álvaro Herrera <alvherre@alvh.no-ip.org>
Reviewed-by: John Naylor <johncnaylorls@gmail.com>
Discussion: https://postgr.es/m/DEDF73CE-D528-49A3-9089-B3592FD671A9@yesql.se
2024-02-09 15:03:16 +01:00
John Naylor 2579985086 Fix warnings in cpluspluscheck
Various int variables were compared to macros that are of type size_t,
which caused -Wsign-compare warnings in cpluspluscheck.  Change those
to size_t, which also better describes their purpose.

Per report from Peter Eisentraut

Discussion: https://postgr.es/m/486847dc-6de5-464a-938e-bac98ec2438b%40eisentraut.org
2024-02-08 10:07:26 +07:00
Alvaro Herrera d172b717c6 Use atomic access for SlruShared->latest_page_number
The new concurrency model proposed for slru.c to improve performance
does not include any single lock that would coordinate processes
doing concurrent reads/writes on SlruShared->latest_page_number.
We can instead use atomic reads and writes for that variable.

Author: Dilip Kumar <dilipbalaut@gmail.com>
Reviewed-by: Andrey M. Borodin <x4mmm@yandex-team.ru>
Discussion: https://postgr.es/m/CAFiTN-vzDvNz=ExGXz6gdyjtzGixKSqs0mKHMmaQ8sOSEFZ33A@mail.gmail.com
2024-02-06 10:54:10 +01:00
John Naylor b83033c3cf Further cosmetic review of hashfn_unstable.h
In follow-up to e97b672c8,
* Flesh out comments explaining the incremental interface
* Clarify detection of zero bytes when hashing aligned C strings

The latter was suggested and reviewed by Jeff Davis

Discussion: https://postgr.es/m/48e8f8bbe0be9c789f98776c7438244ab7a7cc63.camel%40j-davis.com
2024-02-06 14:49:06 +07:00
John Naylor 9ed3ee5001 Simplify initialization of incremental hash state
The standalone functions fasthash{32,64} use length for two purposes:
how many bytes to hash, and how to perturb the internal seed.

Developers using the incremental interface may not know the length
ahead of time (e.g. for C strings). In this case, it's advised to
pass length to the finalizer, but initialization still needed some
length up front, in the form of a placeholder macro.

Separate the concerns by having the standalone functions perturb the
internal seed themselves using their own length parameter, allowing us
to remove "len" from fasthash_init(), as well as the placeholder macro.

Discussion: https://postgr.es/m/CANWCAZbTUk2LOyhsFo33gjLyLAHZ7ucXCi5K9u%3D%2BPtnTShDKtw%40mail.gmail.com
2024-02-06 14:39:36 +07:00
Peter Eisentraut 1ae5ace755 Fix meson installation of new generated files
Fix for 9b1a6f50b9: We want to install catalog/syscache_ids.h but not
catalog/syscache_info.h.  The meson code has this backwards.  The
makefiles are ok.

Reported-by: Aleksander Alekseev <aleksander@timescale.com>
Discussion: https://www.postgresql.org/message-id/CAJ7c6TMDGmAiozDjJQ3%3DP3cd-1BidC_GpitcAuU0aqq-r1eSoQ%40mail.gmail.com
2024-02-05 15:45:29 +01:00
Amit Kapila dafbfed9ef Enhance libpqrcv APIs to support slot synchronization.
This patch provides support for regular (non-replication) connections in
libpqrcv_connect(). This can be used to execute SQL statements on the
primary server without starting a walsender.

A new API libpqrcv_get_dbname_from_conninfo() is also added to extract the
database name from the given connection-info.

Note that this patch doesn't change any existing functionality but later
patches implementing the slot synchronization will use this functionality
to connect to the primary server to fetch required slot information.

Author: Shveta Malik, Hou Zhijie, Ajin Cherian
Reviewed-by: Peter Smith, Bertrand Drouvot, Dilip Kumar, Masahiko Sawada, Nisha Moond, Kuroda Hayato, Amit Kapila
Discussion: https://postgr.es/m/514f6f2f-6833-4539-39f1-96cd1e011f23@enterprisedb.com
2024-02-05 10:54:06 +05:30
Michael Paquier 95fb5b4902 Refactor CopyReadAttributes{CSV,Text}() to use a callback in COPY FROM
CopyReadAttributes{CSV,Text}() are used to parse lines for text and CSV
format.  Refactoring them to use a callback reduces the number of "if"
branches that need to be checked when parsing fields in CSV and text
mode in a COPY FROM, something that can become more noticeable with more
attributes and more lines to process.

Extracted from a larger patch by the same author.

Author: Sutou Kouhei
Discussion: https://postgr.es/m/20231204.153548.2126325458835528809.kou@clear-code.com
2024-02-05 09:46:02 +09:00
Heikki Linnakangas 21d9c3ee4e Give SMgrRelation pointers a well-defined lifetime.
After calling smgropen(), it was not clear how long you could continue
to use the result, because various code paths including cache
invalidation could call smgrclose(), which freed the memory.

Guarantee that the object won't be destroyed until the end of the
current transaction, or in recovery, the commit/abort record that
destroys the underlying storage.

smgrclose() is now just an alias for smgrrelease(). It closes files
and forgets all state except the rlocator, but keeps the SMgrRelation
object valid.

A new smgrdestroy() function is used by rare places that know there
should be no other references to the SMgrRelation.

The short version:

 * smgrclose() is now just an alias for smgrrelease(). It releases
   resources, but doesn't destroy until EOX
 * smgrdestroy() now frees memory, and should rarely be used.

Existing code should be unaffected, but it is now possible for code that
has an SMgrRelation object to use it repeatedly during a transaction as
long as the storage hasn't been physically dropped.  Such code would
normally hold a lock on the relation.

This also replaces the "ownership" mechanism of SMgrRelations with a
pin counter.  An SMgrRelation can now be "pinned", which prevents it
from being destroyed at end of transaction.  There can be multiple pins
on the same SMgrRelation.  In practice, the pin mechanism is only used
by the relcache, so there cannot be more than one pin on the same
SMgrRelation (except briefly in swap_relation_files()).

Author: Thomas Munro, Heikki Linnakangas
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Discussion: https://www.postgresql.org/message-id/CA%2BhUKGJ8NTvqLHz6dqbQnt2c8XCki4r2QvXjBQcXpVwxTY_pvA@mail.gmail.com
2024-01-31 12:31:02 +02:00
Amit Kapila 776621a5e4 Add a failover option to subscriptions.
This commit introduces a new subscription option named 'failover', which
provides users with the ability to set the failover property of the
replication slot on the publisher when creating or altering a
subscription.

This uses the replication commands introduced by commit 7329240437 to
enable the failover option for a logical replication slot.

If the failover option is set to true, the associated replication slots
(i.e. the main slot and the table sync slots) in the upstream database are
enabled to be synchronized to the standbys. Note that the capability to
sync the replication slots will be added in subsequent commits.
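
A sketch of the new option (connection string and names are
placeholders):

    CREATE SUBSCRIPTION sub1
        CONNECTION 'host=primary dbname=postgres'
        PUBLICATION pub1
        WITH (failover = true);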

Thanks to Masahiko Sawada for the design inputs.

Author: Shveta Malik, Hou Zhijie, Ajin Cherian
Reviewed-by: Peter Smith, Bertrand Drouvot, Dilip Kumar, Masahiko Sawada, Nisha Moond, Kuroda Hayato, Amit Kapila
Discussion: https://postgr.es/m/514f6f2f-6833-4539-39f1-96cd1e011f23@enterprisedb.com
2024-01-30 16:49:28 +05:30
Nathan Bossart 97287bdfae Move is_valid_ascii() to ascii.h.
This function requires simd.h, which is a rather large dependency
for a widely-used header file like pg_wchar.h.  Furthermore, there
is a report of a third-party tool that is struggling to use
pg_wchar.h due to its dependence on simd.h (presumably because
simd.h uses several intrinsics).  Moving the function to the much
less popular ascii.h resolves these issues for now.

This commit is back-patched for the benefit of the aforementioned
third-party tool.  The simd.h dependency was only added in v16,
but we've opted to back-patch to v15 so that is_valid_ascii() lives
in the same file for all versions where it exists.  This could
break existing third-party code that uses the function, but we
couldn't find any examples of such code.  It should be possible to
fix any code that this commit breaks by including ascii.h in the
file that uses is_valid_ascii().

Author: Jubilee Young
Reviewed-by: Tom Lane, John Naylor, Andres Freund, Eric Ridge
Discussion: https://postgr.es/m/CAPNHn3oKJJxMsYq%2BqLYzVJOFrUcOr4OF1EC-KtFT-qh8nOOOtQ%40mail.gmail.com
Backpatch-through: 15
2024-01-29 12:08:57 -06:00
Alvaro Herrera 5de890e361 Add EXPLAIN (MEMORY) to report planner memory consumption
This adds a new "Memory:" line under the "Planning:" group (which
currently only has "Buffers:") when the MEMORY option is specified.

In order to make the reporting reasonably accurate, we create a separate
memory context for planner activities, to be used only when this option
is given.  The total amount of memory allocated by that context is
reported as "allocated"; we subtract memory in the context's freelists
from that and report that result as "used".  We use
MemoryContextStatsInternal() to obtain the quantities.
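
For example (the exact numbers depend on the query, build, and
platform):

    EXPLAIN (MEMORY) SELECT relname FROM pg_class WHERE relkind = 'r';
    -- the "Planning:" group now contains a line of the form
    --   Memory: used=NNNN bytes  allocated=NNNN bytes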

The code structure to show buffer usage during planning was not in
amazing shape, so I (Álvaro) modified the patch a bit to clean that up
in passing.

Author: Ashutosh Bapat
Reviewed-by: David Rowley, Andrey Lepikhov, Jian He, Andy Fan
Discussion: https://www.postgresql.org/message-id/CAExHW5sZA=5LJ_ZPpRO-w09ck8z9p7eaYAqq3Ks9GDfhrxeWBw@mail.gmail.com
2024-01-29 17:53:03 +01:00
Amit Kapila 7329240437 Allow setting failover property in the replication command.
This commit implements a new replication command called
ALTER_REPLICATION_SLOT and a corresponding walreceiver API function named
walrcv_alter_slot. Additionally, the CREATE_REPLICATION_SLOT command has
been extended to support the failover option.

These new additions allow the modification of the failover property of a
replication slot on the publisher. A subsequent commit will make use of
these commands in subscription commands and will add the tests as well to
cover the functionality added/changed by this commit.

Author: Hou Zhijie, Shveta Malik
Reviewed-by: Peter Smith, Bertrand Drouvot, Dilip Kumar, Masahiko Sawada, Nisha Moond, Kuroda, Hayato, Amit Kapila
Discussion: https://postgr.es/m/514f6f2f-6833-4539-39f1-96cd1e011f23@enterprisedb.com
2024-01-29 09:37:23 +05:30
Masahiko Sawada 08e6344fd6 Remove ReorderBufferTupleBuf structure.
Since commit a4ccc1cef, the 'node' and 'alloc_tuple_size' fields of
the ReorderBufferTupleBuf structure are no longer used. This leaves
only the 'tuple' field in the structure. Since keeping a single-field
structure makes little sense, the ReorderBufferTupleBuf is removed
entirely. The code is refactored accordingly.

No back-patching since these are ABI changes in an exposed structure
and functions, and there would be some risk of breaking extensions.

Author: Aleksander Alekseev
Reviewed-by: Amit Kapila, Masahiko Sawada, Reid Thompson
Discussion: https://postgr.es/m/CAD21AoCvnuxiXXfRecp7g9+CeC35POQfhuQeJFr7_9u_Q5jc_Q@mail.gmail.com
2024-01-29 10:37:16 +09:00
Tom Lane 8ba6fdf905 Support TZ and OF format codes in to_timestamp().
Formerly, these were only supported in to_char(), but there seems
little reason for that restriction.  We should at least have enough
support to permit round-tripping the output of to_char().

In that spirit, TZ accepts either zone abbreviations or numeric
(HH or HH:MM) offsets, which are the cases that to_char() can output.
In an ideal world we'd make it take full zone names too, but
that seems like it'd introduce an unreasonable amount of ambiguity,
since the rules for POSIX-spec zone names are so lax.

OF is a subset of this, accepting only HH or HH:MM.
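
For example, round-tripping to_char() output (the abbreviation and
offset values are illustrative):

    SELECT to_timestamp('2024-01-25 17:47:08 EST', 'YYYY-MM-DD HH24:MI:SS TZ');
    SELECT to_timestamp('2024-01-25 17:47:08 -05', 'YYYY-MM-DD HH24:MI:SS OF');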

One small benefit of this improvement is that we can simplify
jsonpath's executeDateTimeMethod function, which no longer needs
to consider the HH and HH:MM cases separately.  Moreover, letting
it accept zone abbreviations means it will accept "Z" to mean UTC,
which is emitted by JSON.stringify() for example.

Patch by me, reviewed by Aleksander Alekseev and Daniel Gustafsson

Discussion: https://postgr.es/m/1681086.1686673242@sss.pgh.pa.us
2024-01-25 17:47:08 -05:00
Andrew Dunstan 66ea94e8e6 Implement various jsonpath methods
This commit implements the jsonpath .bigint(), .boolean(),
.date(), .decimal([precision [, scale]]), .integer(), .number(),
.string(), .time(), .time_tz(), .timestamp(), and .timestamp_tz()
methods.

.bigint() converts the given JSON string or a numeric value to
the bigint type representation.

.boolean() converts the given JSON string, numeric, or boolean
value to the boolean type representation.  In the numeric case, only
integers are allowed. We use the parse_bool() backend function
to convert a string to a bool.

.decimal([precision [, scale]]) converts the given JSON string
or a numeric value to the numeric type representation.  If precision
and scale are provided for .decimal(), then it is converted to the
equivalent numeric typmod and applied to the numeric number.

.integer() and .number() convert the given JSON string or a
numeric value to the int4 and numeric type representation.

.string() uses the datatype's output function to convert numeric
and various date/time types to the string representation.

The JSON string representing a valid date/time is converted to the
specific date or time type representation using jsonpath .date(),
.time(), .time_tz(), .timestamp(), .timestamp_tz() methods.  The
changes use the infrastructure of the .datetime() method and perform
the datatype conversion as appropriate.  Unlike the .datetime()
method, none of these methods accept a format template and use ISO
DateTime format instead.  However, except for .date(), the
date/time related methods take an optional precision to adjust the
fractional seconds.
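
A few illustrative calls (input values arbitrary, results omitted):

    SELECT jsonb_path_query('"1234567890123"', '$.bigint()');
    SELECT jsonb_path_query('"2024-01-25"', '$.date()');
    SELECT jsonb_path_query('"12.345"', '$.decimal(6, 2)');
    SELECT jsonb_path_query('"17:47:08.123456"', '$.time(3)');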

Jeevan Chalke, reviewed by Peter Eisentraut and Andrew Dunstan.
2024-01-25 10:15:43 -05:00
Peter Eisentraut 924d046dcf Add a const decoration
Useful for a subsequent patch.

Discussion: https://www.postgresql.org/message-id/flat/52a125e4-ff9a-95f5-9f61-b87cf447e4da@eisentraut.org
2024-01-25 13:34:49 +01:00
Alvaro Herrera 55627ba2d3 Remove dummy_spinlock
It's been unused since 1b468a131b (2015).
2024-01-25 11:43:47 +01:00
Amit Kapila c393308b69 Allow to enable failover property for replication slots via SQL API.
This commit adds the failover property to the replication slot. The
failover property indicates whether the slot will be synced to the standby
servers, enabling the resumption of corresponding logical replication
after failover. But note that this commit does not yet include the
capability to sync the replication slot; the subsequent commits will add
that capability.

A new optional parameter 'failover' is added to the
pg_create_logical_replication_slot() function. We will also enable
setting the 'failover' option for slots via the subscription commands in
subsequent commits.

The value of the 'failover' flag is displayed as part of
the pg_replication_slots view.
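
A minimal sketch, assuming 'failover' is the last optional parameter of
the function and using a placeholder slot name:

    SELECT pg_create_logical_replication_slot('myslot', 'pgoutput', false, false, true);
    SELECT slot_name, failover FROM pg_replication_slots WHERE slot_name = 'myslot';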

Author: Hou Zhijie, Shveta Malik, Ajin Cherian
Reviewed-by: Peter Smith, Bertrand Drouvot, Dilip Kumar, Masahiko Sawada, Nisha Moond, Kuroda, Hayato, Amit Kapila
Discussion: https://postgr.es/m/514f6f2f-6833-4539-39f1-96cd1e011f23@enterprisedb.com
2024-01-25 12:15:46 +05:30
Thomas Munro 820b5af73d jit: Require at least LLVM 10.
Remove support for older LLVM versions.  The default on common software
distributions will be at least LLVM 10 when PostgreSQL 17 ships.

Reviewed-by: Peter Eisentraut <peter@eisentraut.org>
Discussion: https://postgr.es/m/CA%2BhUKGLhNs5geZaVNj2EJ79Dx9W8fyWUU3HxcpZy55sMGcY%3DiA%40mail.gmail.com
2024-01-25 15:42:34 +13:00
Masahiko Sawada 729439607a Add progress reporting of skipped tuples during COPY FROM.
9e2d870119 enabled the COPY command to skip malformed data; however,
there was no visibility into how many tuples were actually skipped
during COPY FROM.

This commit adds a new "tuples_skipped" column to
pg_stat_progress_copy view to report the number of tuples that were
skipped because they contain malformed data.
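
For example, from another session while a COPY with ON_ERROR set to
ignore is running (column list abbreviated):

    SELECT relid::regclass, tuples_processed, tuples_skipped
    FROM pg_stat_progress_copy;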

Bump catalog version.

Author: Atsushi Torikoshi
Reviewed-by: Masahiko Sawada
Discussion: https://postgr.es/m/d12fd8c99adcae2744212cb23feff6ed%40oss.nttdata.com
2024-01-25 10:57:41 +09:00
Peter Eisentraut 46a0cd4cef Add temporal PRIMARY KEY and UNIQUE constraints
Add WITHOUT OVERLAPS clause to PRIMARY KEY and UNIQUE constraints.
These are backed by GiST indexes instead of B-tree indexes, since they
are essentially exclusion constraints with = for the scalar parts of
the key and && for the temporal part.
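
A sketch of the syntax (names are placeholders; btree_gist is assumed to
be required for the scalar part of the key):

    CREATE EXTENSION btree_gist;
    CREATE TABLE booking (
        room_id   int,
        booked_at daterange,
        CONSTRAINT booking_pk PRIMARY KEY (room_id, booked_at WITHOUT OVERLAPS)
    );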

Author: Paul A. Jungwirth <pj@illuminatedcomputing.com>
Reviewed-by: Peter Eisentraut <peter@eisentraut.org>
Reviewed-by: jian he <jian.universality@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/CA+renyUApHgSZF9-nd-a0+OPGharLQLO=mDHcY4_qQ0+noCUVg@mail.gmail.com
2024-01-24 16:34:37 +01:00
Amit Langote 1edb3b491b Adjust populate_record_field() to handle errors softly
This adds a Node *escontext parameter to it and a bunch of functions
downstream to it, replacing any ereport()s in that path by either
errsave() or ereturn() as appropriate.  This also adds code to those
functions where necessary to return early upon encountering a soft
error.

The changes here are mainly intended to suppress errors in the
functions of jsonfuncs.c.  Functions in any external modules, such as
arrayfuncs.c, that those functions may in turn call are not changed
here based on the assumption that the various checks in jsonfuncs.c
functions should ensure that only values that are structurally valid
get passed to the functions in those external modules.  An exception
is made for domain_check() to allow handling domain constraint
violation errors softly.

For testing, this adds a function jsonb_populate_record_valid(),
which returns true if jsonb_populate_record() would finish without
causing an error for the provided JSON object, false otherwise.  Note
that jsonb_populate_record() internally calls populate_record(),
which in turn uses populate_record_field().
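
A minimal illustration, using a hypothetical composite type:

    CREATE TYPE rec AS (a int, b text);
    SELECT jsonb_populate_record_valid(NULL::rec, '{"a": 1, "b": "x"}');   -- true
    SELECT jsonb_populate_record_valid(NULL::rec, '{"a": "not an int"}');  -- false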

Extracted from a much larger patch to add SQL/JSON query functions.

Author: Nikita Glukhov <n.gluhov@postgrespro.ru>
Author: Teodor Sigaev <teodor@sigaev.ru>
Author: Oleg Bartunov <obartunov@gmail.com>
Author: Alexander Korotkov <aekorotkov@gmail.com>
Author: Andrew Dunstan <andrew@dunslane.net>
Author: Amit Langote <amitlangote09@gmail.com>

Reviewers have included (in no particular order) Andres Freund,
Alexander Korotkov, Pavel Stehule, Andrew Alsup, Erik Rijkers,
Zihong Yu, Himanshu Upadhyaya, Daniel Gustafsson, Justin Pryzby,
Álvaro Herrera, Jian He, Peter Eisentraut

Discussion: https://postgr.es/m/cd0bb935-0158-78a7-08b5-904886deac4b@postgrespro.ru
Discussion: https://postgr.es/m/20220616233130.rparivafipt6doj3@alap3.anarazel.de
Discussion: https://postgr.es/m/abd9b83b-aa66-f230-3d6d-734817f0995d%40postgresql.org
Discussion: https://postgr.es/m/CA+HiwqHROpf9e644D8BRqYvaAPmgBZVup-xKMDPk-nd4EpgzHw@mail.gmail.com
Discussion: https://postgr.es/m/CA+HiwqE4XTdfb1nW=Ojoy_tQSRhYt-q_kb6i5d4xcKyrLC1Nbg@mail.gmail.com
2024-01-24 15:04:33 +09:00
Amit Langote aaaf9449ec Add soft error handling to some expression nodes
This adjusts the code for CoerceViaIO and CoerceToDomain expression
nodes to handle errors softly.

For CoerceViaIo, this adds a new ExprEvalStep opcode
EEOP_IOCOERCE_SAFE, which is implemented in the new accompanying
function ExecEvalCoerceViaIOSafe().  The only difference from
EEOP_IOCOERCE's inline implementation is that the input function
receives an ErrorSaveContext via the function's
FunctionCallInfo.context, which it can use to handle errors softly.

For CoerceToDomain, this simply entails replacing the ereport() in
ExecEvalConstraintNotNull() and ExecEvalConstraintCheck() by
errsave() passing it the ErrorSaveContext passed in the expression's
ExprEvalStep.

In both cases, the ErrorSaveContext to be used is passed by setting
ExprState.escontext to point to it before calling ExecInitExprRec()
on the expression tree whose errors are to be handled softly.

Note that there's no functional change as of this commit as no call
site of ExecInitExprRec() has been changed.  This is intended for
implementing new SQL/JSON expression nodes in future commits.

Extracted from a much larger patch to add SQL/JSON query functions.

Author: Nikita Glukhov <n.gluhov@postgrespro.ru>
Author: Teodor Sigaev <teodor@sigaev.ru>
Author: Oleg Bartunov <obartunov@gmail.com>
Author: Alexander Korotkov <aekorotkov@gmail.com>
Author: Andrew Dunstan <andrew@dunslane.net>
Author: Amit Langote <amitlangote09@gmail.com>

Reviewers have included (in no particular order) Andres Freund,
Alexander Korotkov, Pavel Stehule, Andrew Alsup, Erik Rijkers,
Zihong Yu, Himanshu Upadhyaya, Daniel Gustafsson, Justin Pryzby,
Álvaro Herrera, Jian He, Peter Eisentraut

Discussion: https://postgr.es/m/cd0bb935-0158-78a7-08b5-904886deac4b@postgrespro.ru
Discussion: https://postgr.es/m/20220616233130.rparivafipt6doj3@alap3.anarazel.de
Discussion: https://postgr.es/m/abd9b83b-aa66-f230-3d6d-734817f0995d%40postgresql.org
Discussion: https://postgr.es/m/CA+HiwqHROpf9e644D8BRqYvaAPmgBZVup-xKMDPk-nd4EpgzHw@mail.gmail.com
Discussion: https://postgr.es/m/CA+HiwqE4XTdfb1nW=Ojoy_tQSRhYt-q_kb6i5d4xcKyrLC1Nbg@mail.gmail.com
2024-01-24 15:04:33 +09:00
Peter Eisentraut 6eb6086faa Fix makefiles for newly added files
The new files added in 9b1a6f50b9 need to be mentioned in a few more
places in the makefiles.

Discussion: https://www.postgresql.org/message-id/flat/E1rSAY2-002hIk-4y%40gemulon.postgresql.org
2024-01-23 16:33:26 +01:00
Peter Eisentraut 9b1a6f50b9 Generate syscache info from catalog files
Add a new genbki macros MAKE_SYSCACHE that specifies the syscache ID
macro, the underlying index, and the number of buckets.  From that, we
can generate the existing tables in syscache.h and syscache.c via
genbki.pl.

Reviewed-by: John Naylor <johncnaylorls@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/75ae5875-3abc-dafc-8aec-73247ed41cde@eisentraut.org
2024-01-23 07:31:06 +01:00
David Rowley b262ad440e Add better handling of redundant IS [NOT] NULL quals
Until now PostgreSQL has not been very smart about optimizing away IS
NOT NULL base quals on columns defined as NOT NULL.  The evaluation of
these needless quals adds overhead.  Ordinarily, anyone who came
complaining about that would likely just have been told to not include
the qual in their query if it's not required.  However, a recent bug
report indicates this might not always be possible.

Bug 17540 highlighted that when we optimize Min/Max aggregates the IS NOT
NULL qual that the planner adds to make the rewritten plan ignore NULLs
can cause issues with poor index choice.  That particular case
demonstrated that other quals, especially ones for which no statistics
are available to give the planner a chance at estimating an approximate
selectivity, can result in poor index choice due to cheap startup paths
being preferred with LIMIT 1.

Here we take a generic approach to fixing this by having the planner check
for NOT NULL columns and just have the planner remove these quals (when
they're not needed) for all queries, not just when optimizing Min/Max
aggregates.

Additionally, here we also detect IS NULL quals on a NOT NULL column and
transform that into a gating qual so that we don't have to perform the
scan at all.  This also works for join relations when the Var is not
nullable by any outer join.

This also helps with the self-join removal work as it must replace
strict join quals with IS NOT NULL quals to ensure equivalence with the
original query.
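
For illustration (plan shapes indicative only):

    CREATE TABLE t (a int NOT NULL);
    EXPLAIN SELECT * FROM t WHERE a IS NOT NULL;  -- the redundant qual is removed from the scan
    EXPLAIN SELECT * FROM t WHERE a IS NULL;      -- reduced to a gating qual; no scan is needed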

Author: David Rowley, Richard Guo, Andy Fan
Reviewed-by: Richard Guo, David Rowley
Discussion: https://postgr.es/m/CAApHDvqg6XZDhYRPz0zgOcevSMo0d3vxA9DvHrZtKfqO30WTnw@mail.gmail.com
Discussion: https://postgr.es/m/17540-7aa1855ad5ec18b4%40postgresql.org
2024-01-23 18:09:18 +13:00
Michael Paquier d86d20f0ba Add backend support for injection points
Injection points are a new facility that makes possible for developers
to run custom code in pre-defined code paths.  Its goal is to provide
ways to design and run advanced tests, for cases like:
- Race conditions, where processes need to do actions in a controlled
ordered manner.
- Forcing a state, like an ERROR, FATAL or even PANIC for OOM, to force
recovery, etc.
- Arbitrary sleeps.

This implements some basics, and there are plans to extend it more in
the future depending on what's required.  Hence, this commit adds a set
of routines in the backend that allows developers to attach, detach and
run injection points:
- A code path calling an injection point can be declared with the macro
INJECTION_POINT(name).
- InjectionPointAttach() and InjectionPointDetach() to respectively
attach and detach a callback to/from an injection point.  An injection
point name is registered in a shmem hash table with a library name and a
function name, which will be used to load the callback attached to an
injection point when its code path is run.

Injection point names are just strings, so an injection point can be
declared and run by out-of-core extensions and modules, with callbacks
defined in external libraries.

This facility is hidden behind a dedicated switch for ./configure and
meson, disabled by default.

Note that backends use a local cache to store callbacks already loaded,
cleaning up their cache on a best-effort basis if a callback is found to
have been removed.  This could be refined further, but what we have here
was sufficient for the tests written while implementing these backend
APIs.

Author: Michael Paquier, with doc suggestions from Ashutosh Bapat.
Reviewed-by: Ashutosh Bapat, Nathan Bossart, Álvaro Herrera, Dilip
Kumar, Amul Sul, Nazir Bilal Yavuz
Discussion: https://postgr.es/m/ZTiV8tn_MIb_H2rE@paquier.xyz
2024-01-22 10:15:50 +09:00
Alexander Korotkov 0452b461bc Explore alternative orderings of group-by pathkeys during optimization.
When evaluating a query with a multi-column GROUP BY clause, we can minimize
sort operations or avoid them if we synchronize the order of GROUP BY clauses
with the ORDER BY sort clause or sort order, which comes from the underlying
query tree. Grouping does not imply any ordering, so we can compare
the keys in arbitrary order, and a Hash Agg leverages this. But for Group Agg,
we simply compared keys in the order specified in the query. This commit
explores alternative ordering of the keys, trying to find a cheaper one.

The ordering of group keys may interact with other parts of the query, some of
which may not be known while planning the grouping. For example, there may be
an explicit ORDER BY clause or some other ordering-dependent operation higher up
in the query, and using the same ordering may allow using an incremental
sort or even eliminating the sort entirely.

The patch always keeps the ordering specified in the query, assuming the user
might have additional insights.

This introduces a new GUC enable_group_by_reordering so that the optimization
may be disabled if needed.
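
The new GUC can be toggled as usual:

    SET enable_group_by_reordering = off;  -- keep the GROUP BY order as written in the query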

Discussion: https://postgr.es/m/7c79e6a5-8597-74e8-0671-1c39d124c9d6%40sigaev.ru
Author: Andrei Lepikhov, Teodor Sigaev
Reviewed-by: Tomas Vondra, Claudio Freire, Gavin Flower, Dmitry Dolgov
Reviewed-by: Robert Haas, Pavel Borisov, David Rowley, Zhihong Yu
Reviewed-by: Tom Lane, Alexander Korotkov, Richard Guo, Alena Rybakina
2024-01-21 22:21:36 +02:00
Tom Lane 075df6b208 Add planner support functions for range operators <@ and @>.
These support functions will transform expressions with constant
range values into direct comparisons on the range bound values,
which are frequently better-optimizable.  The transformation is
skipped however if it would require double evaluation of a
volatile or expensive element expression.
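
As an indicative example, a containment qual against a constant range
such as the one below can now be rewritten by the planner into plain
comparisons on the bounds (roughly x >= 1 AND x < 10):

    EXPLAIN SELECT * FROM generate_series(1, 100) AS g(x)
    WHERE x <@ int4range(1, 10);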

Along the way, add the range opfamily OID to range typcache entries,
since load_rangetype_info has to compute that anyway and it seems
silly to duplicate the work later.

Kim Johan Andersson and Jian He, reviewed by Laurenz Albe

Discussion: https://postgr.es/m/94f64d1f-b8c0-b0c5-98bc-0793a34e0851@kimmet.dk
2024-01-20 13:57:54 -05:00
Nathan Bossart 8b2bcf3f28 Introduce the dynamic shared memory registry.
Presently, the most straightforward way for a shared library to use
shared memory is to request it at server startup via a
shmem_request_hook, which requires specifying the library in
shared_preload_libraries.  Alternatively, the library can create a
dynamic shared memory (DSM) segment, but absent a shared location
to store the segment's handle, other backends cannot use it.  This
commit introduces a registry for DSM segments so that these other
backends can look up existing segments with a library-specified
string.  This allows libraries to easily use shared memory without
needing to request it at server startup.

The registry is accessed via the new GetNamedDSMSegment() function.
This function handles allocating the segment and initializing it
via a provided callback.  If another backend already created and
initialized the segment, it simply attaches the segment.
GetNamedDSMSegment() locks the registry appropriately to ensure
that only one backend initializes the segment and that all other
backends just attach it.

The registry itself is comprised of a dshash table that stores the
DSM segment handles keyed by a library-specified string.

Reviewed-by: Michael Paquier, Andrei Lepikhov, Nikita Malakhov, Robert Haas, Bharath Rupireddy, Zhang Mingli, Amul Sul
Discussion: https://postgr.es/m/20231205034647.GA2705267%40nathanxps13
2024-01-19 14:24:36 -06:00
Peter Eisentraut 6db4598fcb Add stratnum GiST support function
This is support function 12 for the GiST AM and translates
"well-known" RT*StrategyNumber values into whatever strategy number is
used by the opclass (since no particular numbers are actually
required).  We will use this to support temporal PRIMARY
KEY/UNIQUE/FOREIGN KEY/FOR PORTION OF functionality.

This commit adds two implementations, one for internal GiST opclasses
(just an identity function) and another for btree_gist opclasses.  It
updates btree_gist from 1.7 to 1.8, adding the support function for
all its opclasses.
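
Existing installations can pick up the new support function through the
usual extension upgrade path:

    ALTER EXTENSION btree_gist UPDATE TO '1.8';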

Author: Paul A. Jungwirth <pj@illuminatedcomputing.com>
Reviewed-by: Peter Eisentraut <peter@eisentraut.org>
Reviewed-by: jian he <jian.universality@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/CA+renyUApHgSZF9-nd-a0+OPGharLQLO=mDHcY4_qQ0+noCUVg@mail.gmail.com
2024-01-19 15:42:13 +01:00
Alexander Korotkov b725b7eec4 Rename COPY option from SAVE_ERROR_TO to ON_ERROR
The option names now are "stop" (default) and "ignore".  The future options
could be "file 'filename.log'" and "table 'tablename'".

Discussion: https://postgr.es/m/20240117.164859.2242646601795501168.horikyota.ntt%40gmail.com
Author: Jian He
Reviewed-by: Atsushi Torikoshi
2024-01-19 15:15:51 +02:00
John Naylor dd0a0cfc81 Fixed misspelled byteswap function for big endian machines
Per buildfarm members lora and mamba
2024-01-19 13:26:18 +07:00
John Naylor 0aba255440 Add optimized C string hashing
Given an already-initialized hash state and a NUL-terminated string,
accumulate the hash of the string into the hash state and return the
length for the caller to (optionally) save for the finalizer. This
avoids a strlen call.

If the string pointer is aligned, we can use a word-at-a-time
algorithm for NUL lookahead. The aligned case is only used on 64-bit
platforms, since it's not worth the extra complexity for 32-bit.

Handling the tail of the string after finishing the word-wise loop
was inspired by NetBSD's strlen(), but no code was taken since that
is written in assembly language.

As demonstration, use this in the search path cache. This brings the
general case performance closer to the special case optimization done
in commit a86c61c9ee. There are other places that could benefit, but
that is left for future work.

Jeff Davis and John Naylor
Reviewed by Heikki Linnakangas, Jian He, Junwang Zhao

Discussion: https://postgr.es/m/3820f030fd008ff14134b3e9ce5cc6dd623ed479.camel%40j-davis.com
Discussion: https://postgr.es/m/b40292c99e623defe5eadedab1d438cf51a4107c.camel%40j-davis.com
2024-01-19 12:56:15 +07:00
John Naylor e97b672c88 Add inline incremental hash functions for in-memory use
It can be useful for a hash function to expose separate initialization,
accumulation, and finalization steps.  In particular, this is useful
for building inline hash functions for simplehash.  Instead of trying
to whack around hash_bytes while maintaining its current behavior on
all platforms, we base this work on fasthash (MIT licensed) which
is simple, faster than hash_bytes for inputs over 12 bytes long,
and also passes the hash function testing suite SMHasher.

The fasthash functions have been reimplemented using our added-on
incremental interface to validate that this method will still give
the same answer, provided we have the input length ahead of time.

This functionality lives in a new header hashfn_unstable.h. The name
implies we have the freedom to change things across versions that
would be unacceptable for our other hash functions that are used for
e.g. hash indexes and hash partitioning. As such, these should only
be used for in-memory data structures like hash tables. There is also
no guarantee of being independent of endianness or pointer size.

As demonstration, use fasthash for pgstat_hash_hash_key.  Previously
this called the 32-bit murmur finalizer on the three elements,
then joined them with hash_combine(). The new function is simpler,
faster and takes up less binary space. While the collision and bias
behavior were almost certainly fine with the previous coding, now we
have objective confidence of that.

There are other places that could benefit from this, but that is left
for future work.

Reviewed by Jeff Davis, Heikki Linnakangas, Jian He, Junwang Zhao
Credit to Andres Freund for the idea

Discussion: https://postgr.es/m/20231122223432.lywt4yz2bn7tlp27%40awork3.anarazel.de
2024-01-19 12:44:09 +07:00
David Rowley 4b31063643 Fix broken Bitmapset optimization in DiscreteKnapsack()
Some code in DiscreteKnapsack() attempted to zero all words in a
Bitmapset by performing bms_del_members() to delete all the members from
itself before replacing those members with members from another set.
When that code was written, this was a valid way to manipulate the set
in such a way as to save palloc from having to be called to allocate a new
Bitmapset.  However, 00b41463c modified Bitmapsets so that an empty set is
*always* represented as NULL and this breaks the optimization as the
Bitmapset code will always pfree the memory when the set becomes empty.

Since DiscreteKnapsack() has been coded to avoid as many unneeded
allocations as possible, it seems risky to not fix this.  Here we add
bms_replace_members() to effectively perform an in-place copy of another
set, reusing the memory of the existing set, when possible.

This got broken in v16, but no backpatch for now as there've been no
complaints.

Reviewed-by: Richard Guo
Discussion: https://postgr.es/m/CAApHDvoTCBkBU2PJghNOFUiO0q=QP4WAWHi5sJP6_4=b2WodrA@mail.gmail.com
2024-01-19 10:44:36 +13:00
Robert Haas c120550edb Optimize vacuuming of relations with no indexes.
If there are no indexes on a relation, items can be marked LP_UNUSED
instead of LP_DEAD when pruning. This significantly reduces WAL
volume, since we no longer need to emit one WAL record for pruning
and a second to change the LP_DEAD line pointers thus created to
LP_UNUSED.

Melanie Plageman, reviewed by Andres Freund, Peter Geoghegan, and me

Discussion: https://postgr.es/m/CAAKRu_bgvb_k0gKOXWzNKWHt560R0smrGe3E8zewKPs8fiMKkw%40mail.gmail.com
2024-01-18 10:03:42 -05:00
Michael Paquier 8013850c85 Add try_index_open(), conditional variant of index_open()
try_index_open() is able to open an index if its relkind fits, except
that it would return NULL instead of generating an error if the relation
does not exist.  This new routine will be used by an upcoming patch to
make REINDEX on partitioned relations more robust when an index in a
partition tree is dropped.

Extracted from a larger patch by the same author.

Author: Fei Changhong
Discussion: https://postgr.es/m/tencent_6A52106095ACDE55333E3AD33F304C0C3909@qq.com
Backpatch-through: 14
2024-01-18 15:04:24 +09:00
Alexander Korotkov 9e2d870119 Add new COPY option SAVE_ERROR_TO
Currently, when the source data contains values that are unexpected for
the data type or range, the entire COPY fails. However, in some cases,
such data can be ignored
and just copying normal data is preferable.

This commit adds a new option SAVE_ERROR_TO, which specifies where to save the
error information. When this option is specified, COPY skips soft errors and
continues copying.

Currently, SAVE_ERROR_TO only supports "none". This indicates error information
is not saved and COPY just skips the unexpected data and continues running.
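
As introduced by this commit (the option is later renamed to ON_ERROR by
b725b7eec4; table and file names are placeholders):

    COPY t FROM '/tmp/data.csv' WITH (FORMAT csv, SAVE_ERROR_TO none);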

Later works are expected to add more choices, such as 'log' and 'table'.

Author: Damir Belyalov, Atsushi Torikoshi, Alex Shulgin, Jian He
Discussion: https://postgr.es/m/87k31ftoe0.fsf_-_%40commandprompt.com
Reviewed-by: Pavel Stehule, Andres Freund, Tom Lane, Daniel Gustafsson,
Reviewed-by: Alena Rybakina, Andy Fan, Andrei Lepikhov, Masahiko Sawada
Reviewed-by: Vignesh C, Atsushi Torikoshi
2024-01-16 23:08:53 +02:00
Heikki Linnakangas c6b86eaa55 Add missing PGDLLIMPORT markings
Since commit 8ec569479f, we have a policy of marking all backend
variables with PGDLLIMPORT.

Reported-by: Anton A. Melnikov
Discussion: https://www.postgresql.org/message-id/0b78546c-ffef-4cd9-9ba1-d1e6aab88cea@postgrespro.ru
2024-01-16 13:53:28 +02:00
Peter Eisentraut 4f622503d6 Make attstattarget nullable
This changes the pg_attribute field attstattarget into a nullable
field in the variable-length part of the row.  If no value is set by
the user for attstattarget, it is now null instead of previously -1.
This saves space in pg_attribute and tuple descriptors for most
practical scenarios.  (ATTRIBUTE_FIXED_PART_SIZE is reduced from 108
to 104.)  Also, null is the semantically more correct value.

The ANALYZE code internally continues to represent the default
statistics target by -1, so that that code can avoid having to deal
with null values.  But that is now contained to the ANALYZE code.
Only the DDL code has to deal with a possibly-null attstattarget.

For system columns, the field is now always null.  The ANALYZE code
skips system columns anyway.

To set a column's statistics target to the default value, the new
command form ALTER TABLE ... SET STATISTICS DEFAULT can be used.  (SET
STATISTICS -1 still works.)
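
For example (table and column names are illustrative):

    -- Reset the per-column statistics target to the default (now stored as null).
    ALTER TABLE my_table ALTER COLUMN my_col SET STATISTICS DEFAULT;

    -- The old spelling remains accepted and is equivalent.
    ALTER TABLE my_table ALTER COLUMN my_col SET STATISTICS -1;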

Reviewed-by: Alvaro Herrera <alvherre@alvh.no-ip.org>
Discussion: https://www.postgresql.org/message-id/flat/4da8d211-d54d-44b9-9847-f2a9f1184c76@eisentraut.org
2024-01-13 18:14:53 +01:00
Michael Paquier e72a37528d Refactor code checking for file existence
jit.c and dfmgr.c had a copy of the same code to check if a file exists
or not, with a twist: jit.c did not check for EACCES when failing the
stat() call for the path whose existence is tested.  This refactored
routine will be used by an upcoming patch.

Reviewed-by: Ashutosh Bapat
Discussion: https://postgr.es/m/ZTiV8tn_MIb_H2rE@paquier.xyz
2024-01-12 12:04:51 +09:00
Robert Haas d9ef650fca Add new function pg_get_wal_summarizer_state().
This makes it possible to access information about the progress
of WAL summarization from SQL. The previously-added functions
pg_available_wal_summaries() and pg_wal_summary_contents() only
examine on-disk state, but this function exposes information from
the server's shared memory.
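
As a quick sketch of how the on-disk and in-memory views fit together:

    -- Summary files already written out to pg_wal/summaries.
    SELECT * FROM pg_available_wal_summaries();

    -- Progress of the summarizer process, read from shared memory.
    SELECT * FROM pg_get_wal_summarizer_state();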

Discussion: http://postgr.es/m/CA+Tgmobvqqj-DW9F7uUzT-cQqs6wcVb-Xhs=w=hzJnXSE-kRGw@mail.gmail.com
2024-01-11 12:41:18 -05:00
Nathan Bossart 5b1b9bce84 Cross-check lists of predefined LWLocks.
Both lwlocknames.txt and wait_event_names.txt contain a list of all
the predefined LWLocks, i.e., those with predefined positions
within MainLWLockArray.  It is easy to miss one or the other,
especially since the list in wait_event_names.txt omits the "Lock"
suffix from all the LWLock wait events.  This commit adds a cross-
check of these lists to the script that generates lwlocknames.h.
If the lists do not match exactly, building will fail.

Suggested-by: Robert Haas
Reviewed-by: Robert Haas, Michael Paquier, Bertrand Drouvot
Discussion: https://postgr.es/m/20240102173120.GA1061678%40nathanxps13
2024-01-09 11:05:19 -06:00
Alexander Korotkov 30b4955a46 Fix misuse of RelOptInfo.unique_for_rels cache by SJE
When SJE uses RelOptInfo.unique_for_rels cache, it passes filtered quals to
innerrel_is_unique_ext().  That might lead to an invalid match to cache entries
made by previous non self-join checking calls.  Add UniqueRelInfo.self_join
flag to prevent such cases.  Also, make SJE require a strict match of
outerrelids to make sure UniqueRelInfo.extra_clauses are valid.

Reported-by: Alexander Lakhin
Discussion: https://postgr.es/m/4788f781-31bd-9796-d7d6-588a751c8787%40gmail.com
2024-01-09 00:09:06 +02:00
Noah Misch d3c5f37dd5 Make dblink interruptible, via new libpqsrv APIs.
This replaces dblink's blocking libpq calls, allowing cancellation and
allowing DROP DATABASE (of a database not involved in the query).  Apart
from explicit dblink_cancel_query() calls, dblink still doesn't cancel
the remote side.  The replacement for the blocking calls consists of
new, general-purpose query execution wrappers in the libpqsrv facility.
Out-of-tree extensions should adopt these.  Use them in postgres_fdw,
replacing a local implementation from which the libpqsrv implementation
derives.  This is a bug fix for dblink.  Code inspection identified the
bug at least thirteen years ago, but user complaints have not appeared.
Hence, no back-patch for now.

Discussion: https://postgr.es/m/20231122012945.74@rfd.leadboat.com
2024-01-08 11:39:56 -08:00
Noah Misch 0efc831847 Remove excess #include "utils/wait_event.h".
This simplifies copying libpq/libpq-be-fe-helpers.h into extensions,
because some supported PostgreSQL versions lack the other header.

Discussion: https://postgr.es/m/20231122012945.74@rfd.leadboat.com
2024-01-08 11:39:56 -08:00
Noah Misch 196799d676 Fix missing word in comment. 2024-01-08 11:39:56 -08:00
Tom Lane 89b69db82a Allow examine_simple_variable() to work on INSERT RETURNING Vars.
Since commit 599b33b94, this function assumed that every RTE_RELATION
RangeTblEntry would have an associated RelOptInfo.  But that's not so:
we only build RelOptInfos for relations that are scanned by the query.
In particular the target of an INSERT won't have one, so that Vars
appearing in an INSERT ... RETURNING list will not have an associated
RelOptInfo.  This apparently wasn't a problem before commit f7816aec2
taught examine_simple_variable() to drill down into CTEs containing
INSERT RETURNING, but it is now.

To fix, add a fallback code path that gets the userid to use directly
from the RTEPermissionInfo associated with the RTE.  (Sadly, we must
have two code paths, because not every RTE has a RTEPermissionInfo
either.)

Per report from Alexander Lakhin.  No back-patch, since the case is
apparently unreachable before f7816aec2.

Discussion: https://postgr.es/m/608a4886-6c60-0f9e-97d5-591256bd4150@gmail.com
2024-01-08 11:48:44 -05:00
Tom Lane 9391f71523 Teach estimate_array_length() to use statistics where available.
If we have DECHIST statistics about the argument expression, use
the average number of distinct elements as the array length estimate.
(It'd be better to use the average total number of elements, but
that is not currently calculated by compute_array_stats(), and
it's unclear that it'd be worth extra effort to get.)

To do this, we have to change the signature of estimate_array_length
to pass the "root" pointer.  While at it, also change its result
type to "double".  That's probably not really necessary, but it
avoids any risk of overflow of the value extracted from DECHIST.
All existing callers are going to use the result in a "double"
calculation anyway.

Paul Jungwirth, reviewed by Jian He and myself

Discussion: https://postgr.es/m/CA+renyUnM2d+SmrxKpDuAdpiq6FOM=FByvi6aS6yi__qyf6j9A@mail.gmail.com
2024-01-04 18:36:19 -05:00
Nathan Bossart 14dd0f27d7 Add macros for looping through a List without a ListCell.
Many foreach loops only use the ListCell pointer to retrieve the
content of the cell, like so:

    ListCell   *lc;

    foreach(lc, mylist)
    {
        int         myint = lfirst_int(lc);

        ...
    }

This commit adds a few convenience macros that automatically
declare the loop variable and retrieve the current cell's contents.
This allows us to rewrite the previous loop like this:

    foreach_int(myint, mylist)
    {
        ...
    }

This commit also adjusts a few existing loops in order to add
coverage for the new/adjusted macros.  There is presently no plan
to bulk update all foreach loops, as that could introduce a
significant amount of back-patching pain.  Instead, these macros
are primarily intended for use in new code.

Author: Jelte Fennema-Nio
Reviewed-by: David Rowley, Alvaro Herrera, Vignesh C, Tom Lane
Discussion: https://postgr.es/m/CAGECzQSwXKnxGwW1_Q5JE%2B8Ja20kyAbhBHO04vVrQsLcDciwXA%40mail.gmail.com
2024-01-04 16:09:34 -06:00
Peter Eisentraut 5d06e99a3c ALTER TABLE command to change generation expression
This adds a new ALTER TABLE subcommand ALTER COLUMN ... SET EXPRESSION
that changes the generation expression of a generated column.
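
A usage sketch, with made-up table and column names:

    CREATE TABLE orders (
        qty        numeric,
        unit_price numeric,
        total      numeric GENERATED ALWAYS AS (qty * unit_price) STORED
    );

    -- Change the generation expression; this rewrites the table.
    ALTER TABLE orders
        ALTER COLUMN total SET EXPRESSION AS (qty * unit_price * 1.10);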

The syntax is not standard but was adapted from other SQL
implementations.

This command causes a table rewrite, using the usual ALTER TABLE
mechanisms.  The implementation is similar to and makes use of some of
the infrastructure of the SET DATA TYPE subcommand (for example,
rebuilding constraints and indexes afterwards).  The new command
requires a new pass in AlterTablePass, and the ADD COLUMN pass had to
be moved earlier so that combinations of ADD COLUMN and SET EXPRESSION
can work.

Author: Amul Sul <sulamul@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/CAAJ_b94yyJeGA-5M951_Lr+KfZokOp-2kXicpmEhi5FXhBeTog@mail.gmail.com
2024-01-04 16:28:54 +01:00
Amit Kapila 007693f2a3 Track conflict_reason in pg_replication_slots.
This patch changes the existing 'conflicting' field to 'conflict_reason'
in pg_replication_slots. This new field indicates the reason for the
logical slot's conflict with recovery. It is always NULL for physical
slots, as well as for logical slots which are not invalidated. The
non-NULL values indicate that the slot is marked as invalidated. Possible
values are:

wal_removed = required WAL has been removed.
rows_removed = required rows have been removed.
wal_level_insufficient = the primary doesn't have a wal_level sufficient
to perform logical decoding.

The existing users of the 'conflicting' column can get the same answer by
using 'conflict_reason' IS NOT NULL.
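
For example, a query equivalent to the old boolean column could look like this:

    SELECT slot_name,
           conflict_reason,
           conflict_reason IS NOT NULL AS conflicting
    FROM pg_replication_slots
    WHERE slot_type = 'logical';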

Author: Shveta Malik
Reviewed-by: Amit Kapila, Bertrand Drouvot, Michael Paquier
Discussion: https://postgr.es/m/ZYOE8IguqTbp-seF@paquier.xyz
2024-01-04 08:26:25 +05:30
Bruce Momjian 29275b1d17 Update copyright for 2024
Reported-by: Michael Paquier

Discussion: https://postgr.es/m/ZZKTDPxBBMt3C0J9@paquier.xyz

Backpatch-through: 12
2024-01-03 20:49:05 -05:00
Peter Eisentraut 59fd390d5e Second attempt at organizing jsonpath operators and methods
Second attempt at 283a95da92.  Since we can't reorder the enum values
of JsonPathItemType, instead reorder the switch cases where they are
used to generally follow the order of the enum values, for better
maintainability.
2024-01-03 21:56:41 +01:00
Peter Eisentraut 0958f8f6bf Revert "Reorganise jsonpath operators and methods"
This reverts commit 283a95da92.

The reordering of JsonPathItemType affects the binary on-disk
compatibility of the jsonpath type, so we must not change it.  Revert
for now and reconsider.
2024-01-03 21:02:49 +01:00
Peter Eisentraut 283a95da92 Reorganise jsonpath operators and methods
Various jsonpath operators and methods add various keywords, switch
cases, and documentation entries in some order.  However, they are not
consistent; reorder them for better maintainability or readability.

Author: Jeevan Chalke <jeevan.chalke@enterprisedb.com>
Discussion: https://www.postgresql.org/message-id/flat/CAM2+6=XjTyqrrqHAOj80r0wVQxJSxc0iyib9bPC55uFO9VKatg@mail.gmail.com
2024-01-03 11:25:33 +01:00
Peter Eisentraut c1b9e1e56d Add numeric_int8_opt_error() to optionally suppress errors
This matches the existing numeric_int4_opt_error() (see commit
16d489b0fe).  It will be used by a future JSON-related patch, which
wants to report errors in its own way and thus does not want the
internal functions to throw any error.

Author: Jeevan Chalke <jeevan.chalke@enterprisedb.com>
Discussion: https://www.postgresql.org/message-id/flat/CAM2+6=XjTyqrrqHAOj80r0wVQxJSxc0iyib9bPC55uFO9VKatg@mail.gmail.com
2024-01-03 10:05:35 +01:00
Robert Haas e62e73f3a2 gist: fix typo "split(t)ed" -> "split"
Dagfinn Ilmari Mannsåker, reviewed by Shubham Khanna.

Discussion: http://postgr.es/m/87le9fmi01.fsf@wibble.ilmari.org
2024-01-02 12:24:28 -05:00
Robert Haas 0d9937d118 Fix typos in comments and in one isolation test.
Dagfinn Ilmari Mannsåker, reviewed by Shubham Khanna. Some subtractions
by me.

Discussion: http://postgr.es/m/87le9fmi01.fsf@wibble.ilmari.org
2024-01-02 12:05:41 -05:00
Peter Eisentraut 141752bbb0 Fix typos in simplehash.h
Author: Richard Guo <guofenglinux@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/18252-d46d27900a277d87@postgresql.org
2024-01-02 10:18:47 +01:00
Amit Kapila 9a17be1e24 Allow upgrades to preserve the full subscription's state.
This feature will allow us to replicate the changes on subscriber nodes
after the upgrade.

Previously, only the subscription metadata information was preserved.
Without the list of relations and their state, it's not possible to
re-enable the subscriptions without missing some records as the list of
relations can only be refreshed after enabling the subscription (and
therefore starting the apply worker).  Even if we added a way to refresh
the subscription while enabling a publication, we still wouldn't know
which relations are new on the publication side, and therefore should be
fully synced, and which shouldn't.

To preserve the subscription relations, this patch teaches pg_dump to
restore the content of pg_subscription_rel from the old cluster by using
binary_upgrade_add_sub_rel_state SQL function. This is supported only
in binary upgrade mode.

The subscription's replication origin is needed to ensure that we don't
replicate anything twice.

To preserve the replication origins, this patch teaches pg_dump to update
the replication origin along with creating a subscription by using
binary_upgrade_replorigin_advance SQL function to restore the
underlying replication origin remote LSN. This is supported only in
binary upgrade mode.

pg_upgrade will check that all the subscription relations are in 'i'
(init) or in 'r' (ready) state and will error out if that's not the case,
logging the reason for the failure. This helps to avoid the risk of any
dangling slot or origin after the upgrade.

Author: Vignesh C, Julien Rouhaud, Shlok Kyal
Reviewed-by: Peter Smith, Masahiko Sawada, Michael Paquier, Amit Kapila, Hayato Kuroda
Discussion: https://postgr.es/m/20230217075433.u5mjly4d5cr4hcfe@jrouhaud
2024-01-02 08:08:46 +05:30
Tomas Vondra 6c63bcbf3c Minor cleanup of the BRIN parallel build code
Commit b437571714 added support for parallel builds for BRIN indexes,
using code similar to BTREE parallel builds, and also a new tuplesort
variant. This commit simplifies the new code in two ways:

* The "spool" grouping tuplesort and the heap/index is not necessary.
  The heap/index are available as separate arguments, causing confusion.
  So remove the spool, and use the tuplesort directly.

* The new tuplesort variant does not need the heap/index, as it sorts
  simply by the range block number, without accessing the tuple data.
  So simplify that too.

Initial report and patch by Ranier Vilela, further cleanup by me.

Author: Ranier Vilela
Discussion: https://postgr.es/m/CAEudQAqD7f2i4iyEaAz-5o-bf6zXVX-AkNUBm-YjUXEemaEh6A%40mail.gmail.com
2023-12-30 23:15:04 +01:00
Peter Eisentraut a740b213d4 Add GUC backtrace_on_internal_error
When enabled (default off), this logs a backtrace anytime elog() or an
equivalent ereport() for internal errors is called.

This is not well covered by the existing backtrace_functions, because
there are many equally-worded low-level errors in many functions.  And
if you find out where the error is, then you need to manually rewrite
the elog() to ereport() to attach the errbacktrace(), which is
annoying.  Having a backtrace automatically on every elog() call could
be very helpful during development for various kinds of common errors
from palloc, syscache, node support, etc.
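
For example, to turn it on for a session during development (this is a
developer option, so it may require appropriate privileges):

    SET backtrace_on_internal_error = on;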

Discussion: https://www.postgresql.org/message-id/flat/ba76c6bc-f03f-4285-bf16-47759cfcab9e@eisentraut.org
2023-12-30 11:43:57 +01:00
Peter Eisentraut c538592959 Make all Perl warnings fatal
There are a lot of Perl scripts in the tree, mostly code generation
and TAP tests.  Occasionally, these scripts produce warnings.  These
are probably always mistakes on the developer side (true positives).
Typical examples are warnings from genbki.pl or related when you make
a mess in the catalog files during development, or warnings from tests
when they massage a config file that looks different on different
hosts, or mistakes during merges (e.g., duplicate subroutine
definitions), or just mistakes that weren't noticed because there is a
lot of output in a verbose build.

This changes all warnings into fatal errors, by replacing

    use warnings;

by

    use warnings FATAL => 'all';

in all Perl files.

Discussion: https://www.postgresql.org/message-id/flat/06f899fd-1826-05ab-42d6-adeb1fd5e200%40eisentraut.org
2023-12-29 18:20:00 +01:00
Tom Lane 58054de2d0 Improve the implementation of information_schema._pg_expandarray().
This function was originally coded with a handmade expansion
of the array subscripts.  We can do it a little faster and far
more legibly today, by using unnest() WITH ORDINALITY.
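
For illustration, the technique the new function body relies on pairs each
array element with its 1-based position:

    SELECT u.elem, u.n
    FROM unnest(ARRAY['a','b','c']) WITH ORDINALITY AS u(elem, n);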

While at it, let's apply the rowcount estimation support that exists
for the underlying unnest() function: reduce the default ROWS estimate
to 100 and attach array_unnest_support.  I'm not sure that
array_unnest_support can do anything useful today with the call sites
that exist in information_schema, but it can't hurt, and the existing
default rowcount of 1000 is surely much too high for any of these
cases.

The psql.sql regression script is using _pg_expandarray() as a
test case for \sf+.  While we could keep doing so, the new one-line
function body makes a poor test case for \sf+ row-numbering, so
switch it to print another information_schema function.

Discussion: https://postgr.es/m/1424303.1703355485@sss.pgh.pa.us
2023-12-27 15:55:46 -05:00
Alexander Korotkov 7e6fb5da41 Improvements and fixes for e0b1ee17dc
e0b1ee17dc introduced optimization for matching B-tree scan keys required for
the directional scan.  However, it incorrectly assumed that all keys required
for opposite direction scan are satisfied by _bt_first().  It has been
illustrated that with multiple scan keys over the same column, a lesser one
(according to the scan direction) could win, leaving the other one unsatisfied.

Instead of relying on _bt_first() this commit introduces code that memorizes
whether there was at least one match on the page.  If that's true we know that
keys required for opposite-direction scan are satisfied as soon as
corresponding values are not NULLs.

Also, this commit simplifies the description for the optimization of keys
required for the current direction scan.  Now the flag used for this is named
continuescanPrechecked and means exactly that the *continuescan flag is known
to be true for the last item on the page.

Reported-by: Peter Geoghegan
Discussion: https://postgr.es/m/CAH2-Wzn0LeLcb1PdBnK0xisz8NpHkxRrMr3NWJ%2BKOK-WZ%2BQtTQ%40mail.gmail.com
Reviewed-by: Pavel Borisov
2023-12-27 14:35:08 +02:00
Alexander Korotkov 06b10f80ba Remove BTScanOpaqueData.firstPage
It's not necessary to keep the firstPage flag as a field of BTScanOpaqueData.
This commit makes it an argument of the _bt_readpage() function.  We can easily
distinguish first-time and repeated calls (within the scan) of this function.

Reported-by: Peter Geoghegan
Discussion: https://postgr.es/m/CAH2-Wzk4SOsw%2BtHuTFiz8U9Jqj-R77rYPkhWKODCBb1mdHACXA%40mail.gmail.com
Reviewed-by: Pavel Borisov
2023-12-27 14:21:49 +02:00
Alexander Korotkov 7d58f2342b REALLOCATE_BITMAPSETS manual compile-time option
This option forces each bitmapset modification to reallocate the bitmapset.
This is useful for debugging dangling pointers to bitmapsets.

Discussion: https://postgr.es/m/CAMbWs4_wJthNtYBL%2BSsebpgF-5L2r5zFFk6xYbS0A78GKOTFHw%40mail.gmail.com
Reviewed-by: Richard Guo, Andres Freund, Ashutosh Bapat, Andrei Lepikhov
2023-12-27 03:57:57 +02:00
Alexander Korotkov 12915a58ee Enhance checkpointer restartpoint statistics
This commit introduces enhancements to the pg_stat_checkpointer view by adding
three new columns: restartpoints_timed, restartpoints_req, and
restartpoints_done. These additions aim to improve the visibility and
monitoring of restartpoint processes on replicas.
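
The new counters can be read like the other pg_stat_checkpointer columns:

    SELECT restartpoints_timed, restartpoints_req, restartpoints_done
    FROM pg_stat_checkpointer;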

Previously, it was challenging to differentiate between successful and failed
restartpoint requests. This limitation arises because restartpoints on replicas
are dependent on checkpoint records from the primary, and cannot occur more
frequently than these checkpoints.

The new columns allow for clear distinction and tracking of restartpoint
requests, their triggers, and successful completions.  This enhancement aids
database administrators and developers in better understanding and diagnosing
issues related to restartpoint behavior, particularly in scenarios where
restartpoint requests may fail.

System catalog is changed.  Catversion is bumped.

Discussion: https://postgr.es/m/99b2ccd1-a77a-962a-0837-191cdf56c2b9%40inbox.ru
Author: Anton A. Melnikov
Reviewed-by: Kyotaro Horiguchi, Alexander Korotkov
2023-12-25 01:12:36 +02:00
Robert Haas 49f2194ed5 Fix numerous typos in incremental backup commits.
Apparently, spell check would have been a really good idea.

Alexander Lakhin, with a few additions as per an off-list report
from Andres Freund.

Discussion: http://postgr.es/m/f08f7c60-1ad3-0b57-d580-54b11f07cddf@gmail.com
2023-12-21 15:36:17 -05:00
Robert Haas dc21234005 Add support for incremental backup.
To take an incremental backup, you use the new replication command
UPLOAD_MANIFEST to upload the manifest for the prior backup. This
prior backup could either be a full backup or another incremental
backup.  You then use BASE_BACKUP with the INCREMENTAL option to take
the backup.  pg_basebackup now has an --incremental=PATH_TO_MANIFEST
option to trigger this behavior.

An incremental backup is like a regular full backup except that
some relation files are replaced with files with names like
INCREMENTAL.${ORIGINAL_NAME}, and the backup_label file contains
additional lines identifying it as an incremental backup. The new
pg_combinebackup tool can be used to reconstruct a data directory
from a full backup and a series of incremental backups.

Patch by me.  Reviewed by Matthias van de Meent, Dilip Kumar, Jakub
Wartak, Peter Eisentraut, and Álvaro Herrera. Thanks especially to
Jakub for incredibly helpful and extensive testing.

Discussion: http://postgr.es/m/CA+TgmoYOYZfMCyOXFyC-P+-mdrZqm5pP2N7S-r0z3_402h9rsA@mail.gmail.com
2023-12-20 09:49:12 -05:00
Robert Haas 174c480508 Add a new WAL summarizer process.
When active, this process writes WAL summary files to
$PGDATA/pg_wal/summaries. Each summary file contains information for a
certain range of LSNs on a certain TLI. For each relation, it stores a
"limit block" which is 0 if a relation is created or destroyed within
a certain range of WAL records, or otherwise the shortest length to
which the relation was truncated during that range of WAL records, or
otherwise InvalidBlockNumber. In addition, it stores a list of blocks
which have been modified during that range of WAL records, but
excluding blocks which were removed by truncation after they were
modified and never subsequently modified again.

In other words, it tells us which blocks need to be copied in case of an
incremental backup covering that range of WAL records. But this
doesn't yet add the capability to actually perform an incremental
backup; the next patch will do that.

A new parameter summarize_wal enables or disables this new background
process.  The background process also automatically deletes summary
files that are older than wal_summarize_keep_time, if that parameter
has a non-zero value and the summarizer is configured to run.

Patch by me, with some design help from Dilip Kumar and Andres Freund.
Reviewed by Matthias van de Meent, Dilip Kumar, Jakub Wartak, Peter
Eisentraut, and Álvaro Herrera.

Discussion: http://postgr.es/m/CA+TgmoYOYZfMCyOXFyC-P+-mdrZqm5pP2N7S-r0z3_402h9rsA@mail.gmail.com
2023-12-20 08:42:28 -05:00
Robert Haas aafc07c7a1 Move src/bin/pg_verifybackup/parse_manifest.c into src/common.
This makes it possible for the code to be easily reused by other
client-side tools, and/or by the server.

Patch by me. Review of this patch in particular by at least Peter
Eisentraut; reviewers for the patch series in general include Dilip
Kumar, Andres Freund, David Steele, Álvaro Herrera, and Jakub Wartak.

Discussion: http://postgr.es/m/CA+TgmoZ6UGZVnSy5iak6s6+AXu_DewXovDjhLs3-su6nmU_x_g@mail.gmail.com
2023-12-19 15:24:31 -05:00
Tom Lane 7e1ce2b3de Prevent integer overflow when forming tuple width estimates.
It's at least theoretically possible to overflow int32 when adding up
column width estimates to make a row width estimate.  (The bug example
isn't terribly convincing as a real use-case, but perhaps wide joins
would provide a more plausible route to trouble.)  This'd lead to
assertion failures or silly planner behavior.  To forestall it, make
the relevant functions compute their running sums in int64 arithmetic
and then clamp to int32 range at the end.  We can reasonably assume
that MaxAllocSize is a hard limit on actual tuple width, so clamping
to that is simply a correction for dubious input values, and there's
no need to go as far as widening width variables to int64 everywhere.

Per bug #18247 from RekGRpth.  There've been no reports of this issue
arising in practical cases, so I feel no need to back-patch.

Richard Guo and Tom Lane

Discussion: https://postgr.es/m/18247-11ac477f02954422@postgresql.org
2023-12-19 11:12:16 -05:00
Peter Eisentraut 2a607fb822 Update comment for Cardinality typedef
Author: Richard Guo <guofenglinux@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/CAMbWs4-Zd7Yy80RL1NdskLLo-wz6QoqsbC5TKs%3D3yZxG3BT_aA%40mail.gmail.com
2023-12-19 14:58:47 +01:00
Heikki Linnakangas 3c080fb4fa Simplify newNode() by removing special cases
- Remove MemoryContextAllocZeroAligned(). It was supposed to be a
  faster version of MemoryContextAllocZero(), but modern compilers turn
  the MemSetLoop() into a call to memset() anyway, making it more or
  less identical to MemoryContextAllocZero(). That was the only user of
  MemSetTest, MemSetLoop, so remove those too, as well as palloc0fast().

- Convert newNode() to a static inline function. When this was
  originally written, it was coded as a macro because
  testing showed that gcc didn't inline the size check as we
  intended. Modern compiler versions do, and now that it just calls
  palloc0() there is no size-check to inline anyway.

One nice effect is that the palloc0() takes one less argument than
MemoryContextAllocZeroAligned(), which saves a few instructions in the
callers of newNode().

Reviewed-by: Peter Eisentraut, Tom Lane, John Naylor, Thomas Munro
Discussion: https://www.postgresql.org/message-id/b51f1fa7-7e6a-4ecc-936d-90a8a1659e7c@iki.fi
2023-12-19 12:11:47 +02:00
Tom Lane 8b965c549d compute_bitmap_pages' loop_count parameter should be double not int.
The value was double in the original implementation of this logic.
Commit da08a6598 pulled it out into a subroutine, but carelessly
declared the parameter as int when it should have been double.
On most platforms, the only ill effect would be to clamp the value
to be not more than INT_MAX, which would seldom be exceeded and
probably wouldn't change the estimates too much anyway.  Nonetheless,
it's wrong and can cause complaints from ubsan.

While here, improve the comments and parameter names.

This is an ABI change in a globally exposed subroutine, so
back-patching would create some risk of breaking extensions.
The value of the fix doesn't seem high enough to warrant taking
that risk, so fix in HEAD only.

Per report from Alexander Lakhin.

Discussion: https://postgr.es/m/f5e15fe1-202d-1936-f47c-f0c69a936b72@gmail.com
2023-12-18 12:46:10 -05:00
Nathan Bossart 64b1fb5f03 Optimize pg_atomic_exchange_u32 and pg_atomic_exchange_u64.
Presently, all platforms implement atomic exchanges by performing
an atomic compare-and-swap in a loop until it succeeds.  This can
be especially expensive when there is contention on the atomic
variable.  This commit optimizes atomic exchanges on many platforms
by using compiler intrinsics, which should compile into something
much less expensive than a compare-and-swap loop.  Since these
intrinsics have been available for some time, the inline assembly
implementations are omitted.

Suggested-by: Andres Freund
Reviewed-by: Andres Freund
Discussion: https://postgr.es/m/20231129212905.GA1258737%40nathanxps13
2023-12-18 10:53:32 -06:00
Thomas Munro 4908c58720 Provide vectored variants of smgrread() and smgrwrite().
smgrreadv() and smgrwritev() and their md.c implementations call
FileReadV() and FileWriteV().  A range of disk blocks beginning at
'blocknum' and extending for 'nblocks' can be scattered to or gathered
from multiple buffers with a single system call.  The traditional
smgrread() and smgrwrite() functions are implemented in terms of the new
functions.

Later commits will introduce calls with nblocks > 1, but the following
behavioral changes can be seen already:

* After a short transfer we'll now retry until we eventually read 0
  bytes (= EOF) or get ENOSPC, EDQUOT, EFBIG etc, where previously we
  would infer the reason.  Retrying is consistent with xlog.c's
  treatment of large WAL writes, and arguably also xlog.c and fd.c's
  treatment of EINTR.  Arbitrary short returns for larger transfers have
  been observed on several OSes, and might in theory also happen for
  transient reasons with our own pg_p*v() fallback code.

* After unexpected EOF or -1, the error thrown now talks about
  a range even for the single block case, eg "blocks 42..42".

Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi>
Discussion: https://postgr.es/m/CA+hUKGJkOiOCa+mag4BF+zHo7qo=o9CFheB8=g6uT5TUm2gkvA@mail.gmail.com
2023-12-18 15:01:50 +13:00
Michael Paquier 3c9d9acae0 Refactor pgstat_prepare_io_time() with an input argument instead of a GUC
Originally, this routine relied on track_io_timing to check if a time
interval for an I/O operation stored in pg_stat_io should be initialized
or not.  However, the addition of WAL statistics to pg_stat_io requires
that the initialization happens when track_wal_io_timing is enabled,
which is dependent on the code path where the I/O operation happens.

Author: Nazir Bilal Yavuz
Discussion: https://postgr.es/m/CAN55FZ3AiQ+ZMxUuXnBpd0Rrh1YhwJ5FudkHg=JU0P+-W8T4Vg@mail.gmail.com
2023-12-16 20:16:20 +01:00
Alvaro Herrera a6be0600ac
Remove useless LIMIT_OPTION_DEFAULT value from LimitOption
During the development that led to commit 357889eb17, for a time we
had the value LIMIT_OPTION_DEFAULT, which was mostly but not completely
removed later on, before commit.  Complete the removal now.

Author: Zhang Mingli <avamingli@gmail.com>
Discussion: https://postgr.es/m/59d61a1a-3858-475a-964f-24468c97cc67@Spark
2023-12-16 18:20:03 +01:00
Thomas Munro b485ad7f07 Provide multi-block smgrprefetch().
Previously smgrprefetch() could issue POSIX_FADV_WILLNEED advice for a
single block at a time.  Add an nblocks argument so that we can do the
same for a range of blocks.  This usually produces a single system call,
but might need to loop if it crosses a segment boundary.  Initially it
is only called with nblocks == 1, but proposed patches will make wider
calls.

Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi> (earlier version)
Discussion: https://postgr.es/m/CA+hUKGJkOiOCa+mag4BF+zHo7qo=o9CFheB8=g6uT5TUm2gkvA@mail.gmail.com
2023-12-16 17:51:21 +13:00
Tom Lane 59bd34c2fa Fix bugs in manipulation of large objects.
In v16 and up (since commit afbfc0298), large object ownership
checking has been broken because object_ownercheck() didn't take care
of the discrepancy between our object-address representation of large
objects (classId == LargeObjectRelationId) and the catalog where their
ownership info is actually stored (LargeObjectMetadataRelationId).
This resulted in failures such as "unrecognized class ID: 2613"
when trying to update blob properties as a non-superuser.

Poking around for related bugs, I found that AlterObjectOwner_internal
would pass the wrong classId to the PostAlterHook in the no-op code
path where the large object already has the desired owner.  Also,
recordExtObjInitPriv checked for the wrong classId; that bug is only
latent because the stanza is dead code anyway, but as long as we're
carrying it around it should be less wrong.  These bugs are quite old.

In HEAD, we can reduce the scope for future bugs of this ilk by
changing AlterObjectOwner_internal's API to let the translation happen
inside that function, rather than requiring callers to know about it.

A more bulletproof fix, perhaps, would be to start using
LargeObjectMetadataRelationId as the dependency and object-address
classId for blobs.  However that has substantial risk of breaking
third-party code; even within our own code, it'd create hassles
for pg_dump which would have to cope with a version-dependent
representation.  For now, keep the status quo.

Discussion: https://postgr.es/m/2650449.1702497209@sss.pgh.pa.us
2023-12-15 13:55:05 -05:00
Thomas Munro 871fe4917e Provide vectored variants of FileRead() and FileWrite().
FileReadV() and FileWriteV() adapt pg_preadv() and pg_pwritev() for
fd.c's virtual file descriptors.  The simple FileRead() and FileWrite()
functions are now implemented in terms of the vectored functions, to
avoid code duplication, and they are converted back to the corresponding
simple system calls further down (commit 15c9ac36).  Later work will
make more interesting multi-iovec calls.

The traditional behavior of reporting a "fake" ENOSPC error is
simplified.  It's now always set for non-failing writes, for the benefit
of callers that expect to log a meaningful "%m" if they determine that
the write was short.  (Perhaps we should consider getting rid of that
expectation one day.)

Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi>
Discussion: https://postgr.es/m/CA+hUKGJkOiOCa+mag4BF+zHo7qo=o9CFheB8=g6uT5TUm2gkvA@mail.gmail.com
2023-12-12 13:12:43 +13:00
Thomas Munro 0c6be59f5e Provide helper for retrying partial vectored I/O.
compute_remaining_iovec() is a re-usable routine for retrying after
pg_readv() or pg_writev() reports a short transfer.  This will gain new
users in a later commit, but can already replace the open-coded
equivalent code in the existing pg_pwritev_with_retry() function.

Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi>
Discussion: https://postgr.es/m/CA+hUKGJkOiOCa+mag4BF+zHo7qo=o9CFheB8=g6uT5TUm2gkvA@mail.gmail.com
2023-12-12 10:57:18 +13:00
Thomas Munro baf7c93ed5 Define unconstify() and unvolatize() for C++.
These two macros wouldn't work if used in an inline function definition
in a header seen by g++, because __builtin_types_compatible_p is only
available in C.  Redirect to standard C++ const_cast (which also
adds/removes volatile despite its name).

Per cpluspluscheck failure in a development branch.

Suggested-by: Peter Eisentraut <peter@eisentraut.org>
Discussion: https://postgr.es/m/CA%2BhUKGK3OXFjkOyZiw-DgL2bUqk9by1uGuCnViJX786W%2BfyDSw%40mail.gmail.com
2023-12-12 09:46:46 +13:00
Alvaro Herrera d3fe6e90ba
Simplify productions for FORMAT JSON [ ENCODING name ]
This removes the production json_encoding_clause_opt, instead merging
it into json_format_clause.  Also remove the auxiliary
makeJsonEncoding() function.

Reviewed-by: Amit Langote <amitlangote09@gmail.com>
Discussion: https://postgr.es/m/202312071841.u2gueb5dsrbk%40alvherre.pgsql
2023-12-11 11:55:34 +01:00
Michael Paquier c7a3e6b46d Remove trace_recovery_messages
This GUC was intended as a debugging aid in the 9.0 era, when hot
standby and streaming replication were being developed, offering
more information at LOG level rather than DEBUGn.  There are more tools
available these days that are able to offer rather equivalent
information, like pg_waldump introduced in 9.3.  It is not obvious how
this facility is useful these days, so let's remove it.

Author: Bharath Rupireddy
Discussion: https://postgr.es/m/ZXEXEAUVFrvpquSd@paquier.xyz
2023-12-11 11:49:02 +01:00
Peter Eisentraut 90834ceccd Remove some unnecessary includes of "access/xlog_internal.h"
There were a few places where access/xlog_internal.h was apparently
included unnecessarily.  In some of those places, a more specific
header file (that somehow came in via access/xlog_internal.h) can be
used instead.

Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi>
Discussion: https://www.postgresql.org/message-id/flat/a56a6eec-eb14-471b-9570-3cac23603964%40eisentraut.org
2023-12-10 07:46:06 +01:00
Jeff Davis 867dd2dc87 Cache opaque handle for GUC option to avoid repeated lookups.
When setting GUCs from proconfig, performance is important, and hash
lookups in the GUC table are significant.

Per suggestion from Robert Haas.

Discussion: https://postgr.es/m/CA+TgmoYpKxhR3HOD9syK2XwcAUVPa0+ba0XPnwWBcYxtKLkyxA@mail.gmail.com
Reviewed-by: John Naylor
2023-12-08 11:16:01 -08:00
Peter Geoghegan c9c0589fda Optimize nbtree backward scan boundary cases.
Teach _bt_binsrch (and related helper routines like _bt_search and
_bt_compare) about the initial positioning requirements of backward
scans.  Routines like _bt_binsrch already know all about "nextkey"
searches, so it seems natural to teach them about "goback"/backward
searches, too.  These concepts are closely related, and are much easier
to understand when discussed together.

Now that certain implementation details are hidden from _bt_first, it's
straightforward to add a new optimization: backward scans using the <
strategy now avoid extra leaf page accesses in certain "boundary cases".
Consider the following example, which uses the tenk1 table (and its
tenk1_hundred index) from the standard regression tests:

SELECT * FROM tenk1 WHERE hundred < 12 ORDER BY hundred DESC LIMIT 1;

Before this commit, nbtree would scan two leaf pages, even though it was
only really necessary to scan one leaf page.  We'll now descend straight
to the leaf page containing a (12, -inf) high key instead.  The scan
will locate matching non-pivot tuples with "hundred" values starting
from the value 11.  The scan won't waste a page access on the right
sibling leaf page, which cannot possibly contain any matching tuples.

You can think of the optimization added by this commit as disabling an
optimization (the _bt_compare "!pivotsearch" behavior that was added to
Postgres 12 in commit dd299df8) for a small subset of cases where it was
always counterproductive.

Equivalently, you can think of the new optimization as extending the
"pivotsearch" behavior that page deletion by VACUUM has long required
(since the aforementioned Postgres 12 commit went in) to other, similar
cases.  Obviously, this isn't strictly necessary for these new cases
(unlike VACUUM, _bt_first is prepared to move the scan to the left once
on the leaf level), but the underlying principle is the same.

Author: Peter Geoghegan <pg@bowt.ie>
Reviewed-By: Matthias van de Meent <boekewurm+postgres@gmail.com>
Discussion: https://postgr.es/m/CAH2-Wz=XPzM8HzaLPq278Vms420mVSHfgs9wi5tjFKHcapZCEw@mail.gmail.com
2023-12-08 11:05:17 -08:00
Tomas Vondra b437571714 Allow parallel CREATE INDEX for BRIN indexes
Allow using multiple worker processes to build BRIN index, which until
now was supported only for BTREE indexes. For large tables this often
results in significant speedup when the build is CPU-bound.

The work is split in a simple way - each worker builds BRIN summaries on
a subset of the table, determined by the regular parallel scan used to
read the data, and feeds them into a shared tuplesort which sorts them
by blkno (start of the range). The leader then reads this sorted stream
of ranges, merges duplicates (which may happen if the parallel scan does
not align with BRIN pages_per_range), and adds the resulting ranges into
the index.

The number of duplicate results produced by workers (requiring merging
in the leader process) should be fairly small, thanks to how parallel
scans assign chunks to workers. The likelihood of duplicate results may
increase for higher pages_per_range values, but then there are fewer
page ranges in total. In any case, we expect the merging to be much
cheaper than summarization, so this should be a win.

Most of the parallelism infrastructure is a simplified copy of the code
used by BTREE indexes, omitting the parts irrelevant for BRIN indexes
(e.g. uniqueness checks).

This also introduces a new index AM flag amcanbuildparallel, determining
whether to attempt to start parallel workers for the index build.

Original patch by me, with reviews and substantial reworks by Matthias
van de Meent, certainly enough to make him a co-author.

Author: Tomas Vondra, Matthias van de Meent
Reviewed-by: Matthias van de Meent
Discussion: https://postgr.es/m/c2ee7d69-ce17-43f2-d1a0-9811edbda6e6%40enterprisedb.com
2023-12-08 18:15:26 +01:00
Heikki Linnakangas b31ba5310b Rename ShmemVariableCache to TransamVariables
The old name was misleading: It's not a cache, the values kept in the
struct are the authoritative source.

Reviewed-by: Tristan Partin, Richard Guo
Discussion: https://www.postgresql.org/message-id/6537d63d-4bb5-46f8-9b5d-73a8ba4720ab@iki.fi
2023-12-08 09:47:15 +02:00
Heikki Linnakangas 15916ffb04 Initialize ShmemVariableCache like other shmem areas
For sake of consistency.

Reviewed-by: Tristan Partin, Richard Guo
Discussion: https://www.postgresql.org/message-id/6537d63d-4bb5-46f8-9b5d-73a8ba4720ab@iki.fi
2023-12-08 09:46:59 +02:00
Jeff Davis 719b342d36 Shrink Unicode category table.
Missing entries can implicitly be considered "unassigned".

Discussion: https://postgr.es/m/ff4c2f2f9c8fc7ca27c1c24ae37ecaeaeaff6b53.camel@j-davis.com
2023-12-07 15:44:03 -08:00
David Rowley d16a0c1e2e Verify that attribute counts match in ExecCopySlot
tts_virtual_copyslot() contained an Assert that checked that the srcslot
contained no more attributes than the dstslot.  This seems to be backwards:
if the srcslot contained fewer attributes, then the dstslot could be left
with stale Datum values from the previously stored tuple.  It might make
more sense to allow the source to contain additional attributes and only
copy the leading ones that also exist in the destination; however, that's
not what we're doing here.

Here we just remove the Assert from tts_virtual_copyslot() and add an
Assert to ExecCopySlot() to verify the attribute counts match.  There
does not seem to be any places where the destination contains fewer
attributes, so instead of going to the trouble of making the code
properly handle this, let's just ensure the attribute counts match.  If
this Assert fails then that will indicate that we do have cases that
require us to handle the srcslot with more attributes than the dstslot.
It seems better to only write this code if there's a genuine requirement
for it rather than write it now only to leave it untested.

Thanks to Andres Freund for helping with the analysis of this.

Discussion: https://postgr.es/m/CAApHDvpMAvBL0T+TRORquyx1iqFQKMVTXtqNocOw0Pa2uh1heg@mail.gmail.com
2023-12-07 21:28:24 +13:00
Amit Kapila 0bf62460bb Fix issues in binary_upgrade_logical_slot_has_caught_up().
The commit 29d0a77fa6 labelled binary_upgrade_logical_slot_has_caught_up()
as a non-strict function to allow providing a better error message to callers
in case the passed slot_name is NULL. On further discussion, it seems that
it is not helpful to have a different error message for NULL input in this
function, so this patch marks the function as strict.

This patch also removes the explicit permission check for using replication
slots, since this function is invoked only by superusers, and adds an
Assert instead.

Reported-by: Masahiko Sawada
Author: Hayato Kuroda
Reviewed-by: Vignesh C
Discussion: https://postgr.es/m/CAD21AoDSyiBKkMXBxN_gUayZZUCOgyHnG8Ge8rcPXNP3Tf6B4g@mail.gmail.com
2023-12-07 08:42:48 +05:30
Heikki Linnakangas e7c6efe305 Remove now-unnecessary Autovacuum[Launcher|Worker]IAm functions
After commit fd5e8b440d, InitProcess() is called later in the
EXEC_BACKEND startup sequence, so it's enough to set the
am_autovacuum_[launcher|worker] variables at the same place as in the
!EXEC_BACKEND case.
2023-12-04 15:34:37 +02:00
Peter Eisentraut 457428d9e9 Remove unnecessary include of <math.h>
This was probably never necessary.  (The header used to use random(),
but that shouldn't require <math.h> either.  In any case, that's gone,
too.)

Reviewed-by: Shubham Khanna <Shubham.Khanna@fujitsu.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://www.postgresql.org/message-id/flat/cff5475d-e0a9-4561-b094-794aa36bd031%40eisentraut.org
2023-12-04 06:35:22 +01:00
Peter Eisentraut da67cb0a44 Remove unnecessary include of <sys/socket.h>
This was put here as part of a mechanical replacement of the old
"getaddrinfo.h" with <netdb.h> plus <sys/socket.h> (commit
5579388d2d).  But here, we only need netdb.h (for NI_MAXHOST), not
sys/socket.h.

Reviewed-by: Shubham Khanna <Shubham.Khanna@fujitsu.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://www.postgresql.org/message-id/flat/cff5475d-e0a9-4561-b094-794aa36bd031%40eisentraut.org
2023-12-04 06:35:22 +01:00
Peter Eisentraut dffb2b478f Remove unnecessary includes of <signal.h>
These were once needed for sig_atomic_t, but that no longer appears in
these headers, so the include is not needed.

Reviewed-by: Shubham Khanna <Shubham.Khanna@fujitsu.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://www.postgresql.org/message-id/flat/cff5475d-e0a9-4561-b094-794aa36bd031%40eisentraut.org
2023-12-04 06:35:22 +01:00
Michael Paquier f21848de20 Add support for REINDEX in event triggers
This commit adds support for REINDEX in event triggers, making this
command react for the events ddl_command_start and ddl_command_end.  The
indexes rebuilt are collected with the ReindexStmt emitted by the
caller, for the concurrent and non-concurrent paths.

Thanks to that, it is possible to know a full list of the indexes that a
single REINDEX command has worked on.
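
A sketch of an event trigger reacting to REINDEX, assuming the rebuilt indexes
are exposed through pg_event_trigger_ddl_commands():

    CREATE FUNCTION log_reindex() RETURNS event_trigger
    LANGUAGE plpgsql AS $$
    DECLARE
        r record;
    BEGIN
        FOR r IN SELECT * FROM pg_event_trigger_ddl_commands() LOOP
            RAISE NOTICE 'REINDEX rebuilt %', r.object_identity;
        END LOOP;
    END;
    $$;

    CREATE EVENT TRIGGER track_reindex
        ON ddl_command_end
        WHEN TAG IN ('REINDEX')
        EXECUTE FUNCTION log_reindex();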

Author: Garrett Thornburg, Jian He
Reviewed-by: Jim Jones, Michael Paquier
Discussion: https://postgr.es/m/CAEEqfk5bm32G7sbhzHbES9WejD8O8DCEOaLkxoBP7HNWxjPpvg@mail.gmail.com
2023-12-04 09:53:49 +09:00
Heikki Linnakangas 388491f1e5 Pass BackgroundWorker entry in the parameter file in EXEC_BACKEND mode
The BackgroundWorker struct is now passed the same way as the Port
struct. Seems more consistent.

This makes it possible to move InitProcess later in SubPostmasterMain
(in next commit), as we no longer need to access shared memory to read
background worker entry.

Reviewed-by: Tristan Partin, Andres Freund, Alexander Lakhin
Discussion: https://www.postgresql.org/message-id/7a59b073-5b5b-151e-7ed3-8b01ff7ce9ef@iki.fi
2023-12-03 16:38:54 +02:00
Heikki Linnakangas 69d903367c Refactor CreateSharedMemoryAndSemaphores
For clarity, have separate functions for *creating* the shared memory
and semaphores at postmaster or single-user backend startup, and
for *attaching* to existing shared memory structures in EXEC_BACKEND
case. CreateSharedMemoryAndSemaphores() is now called only at
postmaster startup, and a new AttachSharedMemoryStructs() function is
called at backend startup in EXEC_BACKEND mode.

Reviewed-by: Tristan Partin, Andres Freund
Discussion: https://www.postgresql.org/message-id/7a59b073-5b5b-151e-7ed3-8b01ff7ce9ef@iki.fi
2023-12-03 16:09:42 +02:00
Thomas Munro 15c9ac3629 Optimize pg_readv/pg_pwritev single vector case.
For the trivial case of iovcnt == 1, kernels are measurably slower at
dealing with the more complex arguments of preadv/pwritev than the
equivalent plain old pread/pwrite.  The overheads are worth it for
iovcnt > 1, but for 1 let's just redirect to the cheaper calls.  While
we could leave it to callers to worry about that, we already have to
have our own pg_ wrappers for portability reasons so it seems
reasonable to centralize this knowledge there (thanks to Heikki for this
suggestion).  Try to avoid function call overheads by making them
inlinable, which might also allow the compiler to avoid the branch in
some cases.  For systems that don't have preadv and pwritev (currently:
Windows and [closed] Solaris), we might as well pull the replacement
functions up into the static inline functions too.

Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi>
Discussion: https://postgr.es/m/CA+hUKGJkOiOCa+mag4BF+zHo7qo=o9CFheB8=g6uT5TUm2gkvA@mail.gmail.com
2023-11-29 17:19:25 +13:00
Alexander Korotkov 2cdf131c46 Use larger segment file names for pg_notify
This avoids the wraparound in async.c and removes the corresponding code
complexity. The maximum number of allocated SLRU pages for the NOTIFY / LISTEN
queue is now determined by the max_notify_queue_pages GUC. The default
value is 1048576. It allows up to 8 GB of disk space to be consumed, which is
exactly the limit we had previously.
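
A quick way to check or lower the cap (the arithmetic assumes the default 8kB
block size):

    -- Default of 1048576 pages corresponds to the old 8 GB ceiling.
    SHOW max_notify_queue_pages;

    -- postgresql.conf example: cap the queue at about 2 GB instead.
    -- max_notify_queue_pages = 262144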

Author: Maxim Orlov, Aleksander Alekseev, Alexander Korotkov, Teodor Sigaev
Author: Nikita Glukhov, Pavel Borisov, Yura Sokolov
Reviewed-by: Jacob Champion, Heikki Linnakangas, Alexander Korotkov
Reviewed-by: Japin Li, Pavel Borisov, Tom Lane, Peter Eisentraut, Andres Freund
Reviewed-by: Andrey Borodin, Dilip Kumar, Aleksander Alekseev
Discussion: https://postgr.es/m/CACG%3DezZe1NQSCnfHOr78AtAZxJZeCvxrts0ygrxYwe%3DpyyjVWA%40mail.gmail.com
Discussion: https://postgr.es/m/CAJ7c6TPDOYBYrnCAeyndkBktO0WG2xSdYduTF0nxq%2BvfkmTF5Q%40mail.gmail.com
2023-11-29 01:41:48 +02:00
Alexander Korotkov 4ed8f0913b Index SLRUs by 64-bit integers rather than by 32-bit integers
We've had repeated bugs in the area of handling SLRU wraparound in the past,
some of which have caused data loss. Switching to an indexing system for SLRUs
that does not wrap around should allow us to get rid of a whole bunch
of problems and improve the overall reliability of the system.

This particular patch however only changes the indexing and doesn't address
the wraparound per se. This is going to be done in the following patches.

Author: Maxim Orlov, Aleksander Alekseev, Alexander Korotkov, Teodor Sigaev
Author: Nikita Glukhov, Pavel Borisov, Yura Sokolov
Reviewed-by: Jacob Champion, Heikki Linnakangas, Alexander Korotkov
Reviewed-by: Japin Li, Pavel Borisov, Tom Lane, Peter Eisentraut, Andres Freund
Reviewed-by: Andrey Borodin, Dilip Kumar, Aleksander Alekseev
Discussion: https://postgr.es/m/CACG%3DezZe1NQSCnfHOr78AtAZxJZeCvxrts0ygrxYwe%3DpyyjVWA%40mail.gmail.com
Discussion: https://postgr.es/m/CAJ7c6TPDOYBYrnCAeyndkBktO0WG2xSdYduTF0nxq%2BvfkmTF5Q%40mail.gmail.com
2023-11-29 01:40:56 +02:00
Tom Lane c82207a548 Use BIO_{get,set}_app_data instead of BIO_{get,set}_data.
We should have done it this way all along, but we accidentally got
away with using the wrong BIO field up until OpenSSL 3.2.  There,
the library's BIO routines that we rely on use the "data" field
for their own purposes, and our conflicting use causes assorted
weird behaviors up to and including core dumps when SSL connections
are attempted.  Switch to using the approved field for the purpose,
i.e. app_data.

While at it, remove our configure probes for BIO_get_data as well
as the fallback implementation.  BIO_{get,set}_app_data have been
there since long before any OpenSSL version that we still support,
even in the back branches.

Also, update src/test/ssl/t/001_ssltests.pl to allow for a minor
change in an error message spelling that evidently came in with 3.2.

Tristan Partin and Bo Andreson.  Back-patch to all supported branches.

Discussion: https://postgr.es/m/CAN55FZ1eDDYsYaL7mv+oSLUij2h_u6hvD4Qmv-7PK7jkji0uyQ@mail.gmail.com
2023-11-28 12:34:03 -05:00
Heikki Linnakangas 60f227316c Fix assertions with RI triggers in heap_update and heap_delete.
If the tuple being updated is not visible to the crosscheck snapshot,
we return TM_Updated but the assertions would not hold in that case.
Move them to before the cross-check.

Fixes bug #17893. Backpatch to all supported versions.

Author: Alexander Lakhin
Backpatch-through: 12
Discussion: https://www.postgresql.org/message-id/17893-35847009eec517b5%40postgresql.org
2023-11-28 12:00:14 +02:00
Michael Paquier 5ad49322e5 Fix comment in tableam.h about GetHeapamTableAmRoutine()
This routine is located in heapam_handler.c, not tableamapi.c.  Issue
noted while hacking the area for a different patch.

Reviewed-by: Richard Guo
Discussion: https://postgr.es/m/ZWQuHltp2KS_0Cct@paquier.xyz
2023-11-28 08:40:08 +09:00
Nathan Bossart 75680c3d80 Retire a few backwards compatibility macros.
As of commits dd04e958c8 and 1833f1a1c3, tuplestore_donestoring(),
SPI_push(), SPI_pop(), SPI_push_conditional(),
SPI_pop_conditional(), and SPI_restore_connection() are no-op
macros provided for backwards compatibility.  This commit removes
these macros, so any uses in third-party code will need to be
removed, too.  Since these macros have been no-ops for a while,
such adjustments won't produce any behavior changes for all
currently-supported versions of PostgreSQL.

Author: Bharath Rupireddy
Discussion: https://postgr.es/m/CALj2ACVeO58JM5tK2Qa8QC-%3DkC8sdkJOTd4BFU%3DK8zs4gGYpjQ%40mail.gmail.com
2023-11-27 13:10:09 -06:00
Alexander Korotkov bc3c8db8ae Display length and bounds histograms in pg_stats
Values corresponding to STATISTIC_KIND_RANGE_LENGTH_HISTOGRAM and
STATISTIC_KIND_BOUNDS_HISTOGRAM were not exposed to pg_stats when these
slot kinds were introduced in 918eee0c49.

This commit adds the missing fields to pg_stats.
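
A hedged example of reading them; the column names below are assumptions
derived from the statistic kinds, so check the view definition for the exact
spelling:

    -- Hypothetical column names; verify with \d pg_stats.
    SELECT attname, range_length_histogram, range_bounds_histogram
    FROM pg_stats
    WHERE tablename = 'my_range_table';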

Catversion is bumped.

Discussion: https://postgr.es/m/flat/b67d8b57-9357-7e82-a2e7-f6ce6eaeec67@postgrespro.ru
Author: Egor Rogov, Soumyadeep Chakraborty
Reviewed-by: Tomas Vondra, Justin Pryzby, Jian He
2023-11-27 01:32:17 +02:00
Alexander Korotkov 441c8a3134 Update comments for pg_statistic catalog table
Add a reminder that the pg_stats view needs to be modified whenever a new slot
kind is added, to prevent situations like 918eee0c49, where updating pg_stats
was forgotten.

Also, revise the comment that only non-null, non-empty rows are considered
for the range length histogram.

Discussion: https://postgr.es/m/flat/b67d8b57-9357-7e82-a2e7-f6ce6eaeec67@postgrespro.ru
Author: Egor Rogov, Soumyadeep Chakraborty
Reviewed-by: Tomas Vondra, Justin Pryzby, Jian He
2023-11-27 01:32:17 +02:00
Tomas Vondra c1ec02be1d Reuse BrinDesc and BrinRevmap in brininsert
The brininsert code used to initialize (and destroy) BrinDesc and
BrinRevmap for each tuple, which is not free. This patch initializes
these structures only once, and reuses them for all inserts in the same
command. The data is passed through indexInfo->ii_AmCache.

This also introduces an optional AM callback "aminsertcleanup" that
allows performing custom cleanup in case simply pfree-ing ii_AmCache is
not sufficient (which is the case when the cache contains TupleDesc,
Buffers, and so on).

Author: Soumyadeep Chakraborty
Reviewed-by: Alvaro Herrera, Matthias van de Meent, Tomas Vondra
Discussion: https://postgr.es/m/CAE-ML%2B9r2%3DaO1wwji1sBN9gvPz2xRAtFUGfnffpd0ZqyuzjamA%40mail.gmail.com
2023-11-25 20:27:28 +01:00
Heikki Linnakangas 50c67c2019 Use ResourceOwner to track WaitEventSets.
A WaitEventSet holds file descriptors or event handles (on Windows).
If FreeWaitEventSet is not called, those fds or handles are leaked.
Use ResourceOwners to track WaitEventSets, to clean those up
automatically on error.

This was a live bug in async Append nodes, if a FDW's
ForeignAsyncRequest function failed. (In back branches, I will apply a
more localized fix for that based on PG_TRY-PG_FINALLY.)

The added test doesn't check for leaking resources, so it passed even
before this commit. But at least it covers the code path.

In passing, fix a misleading comment on what the 'nevents' argument
to WaitEventSetWait means.

Report by Alexander Lakhin, analysis and suggestion for the fix by
Tom Lane. Fixes bug #17828.

Reviewed-by: Alexander Lakhin, Thomas Munro
Discussion: https://www.postgresql.org/message-id/472235.1678387869@sss.pgh.pa.us
2023-11-23 13:31:36 +02:00
Alvaro Herrera 24ea53dcfa
Avoid overflow in fe_utils' printTable()
The original code would miscalculate the total number of cells when the
table to print has more than ~4 billion cells, leading to an unnecessary
error.  Repair by changing some computations to be 64 bits wide.  Add
some necessary overflow checks.

Author: Hongxu Ma <interma@outlook.com>
Discussion: https://postgr.es/m/TYBP286MB0351B057B101C90D7C1239E6B4E2A@TYBP286MB0351.JPNP286.PROD.OUTLOOK.COM
2023-11-21 14:55:29 +01:00
Jeff Davis b282fa88df simplehash: preserve consistency in case of OOM.
Compute size first, then allocate, then update the structure.

Previously, an out-of-memory when growing could leave the hashtable in
an inconsistent state.

Discussion: https://postgr.es/m/20231117201334.eyb542qr5yk4gilq@awork3.anarazel.de
Reviewed-by: Andres Freund
Reviewed-by: Gurjeet Singh
2023-11-17 13:58:16 -08:00
Michael Paquier b1e5c9fa9a Change logtape/tuplestore code to use int64 for block numbers
The code previously relied on "long" as type to track block numbers,
which would be 4 bytes in all Windows builds or any 32-bit builds.  This
limited the code to handling up to 16TB of data with the
default block size of 8kB, like during a CLUSTER.  This code now relies
on a more portable int64, which should be more than enough for at least
the next 20 years to come.

This issue was reported back in 2017, but nothing was done about it
back then, so here we go now.

Reported-by: Peter Geoghegan
Reviewed-by: Heikki Linnakangas
Discussion: https://postgr.es/m/CAH2-WznCscXnWmnj=STC0aSa7QG+BRedDnZsP=Jo_R9GUZvUrg@mail.gmail.com
2023-11-17 11:20:53 +09:00
Tom Lane 743ddafc71 Ensure we preprocess expressions before checking their volatility.
contain_mutable_functions and contain_volatile_functions give
reliable answers only after expression preprocessing (specifically
eval_const_expressions).  Some places understand this, but some did
not get the memo --- which is not entirely their fault, because the
problem is documented only in places far away from those functions.
Introduce wrapper functions that allow doing the right thing easily,
and add commentary in hopes of preventing future mistakes from
copy-and-paste of code that's only conditionally safe.

Two actual bugs of this ilk are fixed here.  We failed to preprocess
column GENERATED expressions before checking mutability, so that the
code could fail to detect the use of a volatile function
default-argument expression, or it could reject a polymorphic function
that is actually immutable on the datatype of interest.  Likewise,
column DEFAULT expressions weren't preprocessed before determining if
it's safe to apply the attmissingval mechanism.  A false negative
would just result in an unnecessary table rewrite, but a false
positive could allow the attmissingval mechanism to be used in a case
where it should not be, resulting in unexpected initial values in a
new column.

In passing, re-order the steps in ComputePartitionAttrs so that its
checks for invalid column references are done before applying
expression_planner, rather than after.  The previous coding would
not complain if a partition expression contains a disallowed column
reference that gets optimized away by constant folding, which seems
to me to be a behavior we do not want.

Per bug #18097 from Jim Keener.  Back-patch to all supported versions.

Discussion: https://postgr.es/m/18097-ebb179674f22932f@postgresql.org
2023-11-16 10:05:14 -05:00
Nathan Bossart 69c32b8b35 Fix fallback implementation for pg_atomic_test_set_flag().
The fallback implementation of pg_atomic_test_set_flag() that uses
atomic-exchange gives pg_atomic_exchange_u32_impl() an extra
argument.  This issue has been present since the introduction of
the atomics API in commit b64d92f1a5.

Reviewed-by: Andres Freund
Discussion: https://postgr.es/m/20231114035439.GA1809032%40nathanxps13
Backpatch-through: 12
2023-11-15 15:04:18 -06:00
Nathan Bossart 6a72c42fd5 Retire MemoryContextResetAndDeleteChildren() macro.
As of commit eaa5808e8e, MemoryContextResetAndDeleteChildren() is
just a backwards compatibility macro for MemoryContextReset().  Now
that some time has passed, this macro seems more likely to create
confusion.

This commit removes the macro and replaces all remaining uses with
calls to MemoryContextReset().  Any third-party code that use this
macro will need to be adjusted to call MemoryContextReset()
instead.  Since the two have behaved the same way since v9.5, such
adjustments won't produce any behavior changes for all
currently-supported versions of PostgreSQL.

Reviewed-by: Amul Sul, Tom Lane, Alvaro Herrera, Dagfinn Ilmari Mannsåker
Discussion: https://postgr.es/m/20231113185950.GA1668018%40nathanxps13
2023-11-15 13:42:30 -06:00
Dean Rasheed 519fc1bd9e Support +/- infinity in the interval data type.
This adds support for infinity to the interval data type, using the
same input/output representation as the other date/time data types
that support infinity. This allows various arithmetic operations on
infinite dates, timestamps and intervals.

The new values are represented by setting all fields of the interval
to INT32/64_MIN for -infinity, and INT32/64_MAX for +infinity. This
ensures that they compare as less/greater than all other interval
values, without the need for any special-case comparison code.

Note that, since those 2 values were formerly accepted as legal finite
intervals, pg_upgrade and dump/restore from an old database will turn
them from finite to infinite intervals. That seems OK, since those
exact values should be extremely rare in practice, and they are
outside the documented range supported by the interval type, which
gives us a certain amount of leeway.

Bump catalog version.

Joseph Koshakow, Jian He, and Ashutosh Bapat, reviewed by me.

Discussion: https://postgr.es/m/CAAvxfHea4%2BsPybKK7agDYOMo9N-Z3J6ZXf3BOM79pFsFNcRjwA%40mail.gmail.com
2023-11-14 10:58:49 +00:00
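
To illustrate the feature above, a minimal SQL sketch of the new infinite
interval values (behavior per the commit description; result comments are
paraphrased, not quoted output):

    -- infinite intervals are now accepted on input
    SELECT 'infinity'::interval, '-infinity'::interval;

    -- they compare above/below every finite interval with no special-case code
    SELECT 'infinity'::interval > '1000000 years'::interval;   -- true

    -- arithmetic with dates and timestamps propagates the infinity
    SELECT timestamp '2024-01-01' + interval 'infinity';       -- infinity
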
Peter Eisentraut 3849fe7c2b Replace Gen_dummy_probes.sed with Gen_dummy_probes.pl
To generate a dummy probes.h file when dtrace is not available, we had
two different scripts: A sed version, which is the original version,
and a Perl version, which was generated by s2p.  This split was
necessary because Perl was not a mandatory build dependency on Unix,
but sed was not guaranteed to be available on Windows.

(The Meson build system used the sed version even on Windows, which
was probably incorrect and probably would have had to be fixed before
elevating that build system from experimental status.)

As of 721856ff24, Perl is a required build dependency, so this split
is no longer necessary.  We can just use the Perl script in all build
environments and remove a whole bunch of infrastructure to keep the
two variants in sync.

The new Gen_dummy_probes.pl is not the version generated by s2p but a
new implementation written by hand by adapting the sed version to Perl
syntax.

Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://www.postgresql.org/message-id/3fd0f1bc-4483-4ba9-8aa0-64765b052039@eisentraut.org
2023-11-14 10:27:10 +01:00
Michael Paquier e5cca6288a Add support for pg_stat_reset_slru without argument
pg_stat_reset_slru currently requires an input argument, either:
- NULL to reset the SLRU counters of everything.
- A specific value to reset a single SLRU cache.

This commit adds support for a new pattern: pg_stat_reset_slru without
any argument works the same way as pg_stat_reset_slru(NULL), relying on
a DEFAULT in the function definition to handle this case.  This makes
the function more consistent with 23c8c0c8f4.

Bump catalog version.

Author: Bharath Rupireddy
Reviewed-by: Atsushi Torikoshi
Discussion: https://postgr.es/m/CALj2ACW1VizYg01EeH_cA-7qA+4NzWVAoZ5Lw9_XYO1RRHAZbA@mail.gmail.com
2023-11-14 09:50:52 +09:00
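
A short sketch of the call forms for the commit above (the SLRU name in the
last call is only an example; valid names are those listed in the
pg_stat_slru view):

    -- new: no argument resets the counters of every SLRU cache
    SELECT pg_stat_reset_slru();

    -- equivalent, with an explicit NULL
    SELECT pg_stat_reset_slru(NULL);

    -- reset a single SLRU cache by name, as before
    SELECT pg_stat_reset_slru('CommitTs');
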
Bruce Momjian f279241b09 psql: improve description consistency of \dTS data types
This was done particularly for geometric data types.

Reported-by: Christoph Berg

Discussion: https://postgr.es/m/YGI8Leuk0WvmNWLr@msg.df7cb.de

Co-authored-by: Kyotaro Horiguchi

Backpatch-through: master
2023-11-13 16:26:59 -05:00
Tom Lane 83472de606 Improve readability and error detection of array_in().
Rewrite array_in() and its subroutines so that we make only one
pass over the input text, rather than two.  This requires
potentially re-pallocing the working arrays values[] and nulls[]
larger than our initial guess, but that cost will hopefully be made
up by avoiding duplicate parsing.  In any case this coding seems
much clearer and more straightforward than what we had before.

This also fixes array_in() to reject non-rectangular input (that is,
different brace depths in different parts of the input) more reliably
than before, and to give a better error message when it does so.
This is analogous to the plpython and plperl fixes in 0553528e7 and
f47004add.  Like those PLs, we now accept input such as '{{},{}}'
as a valid representation of an empty array, which we did not before.

Additionally, reject explicit array subscripts that are outside the
integer range (previously you just got whatever atoi() converted
them to), and make some other minor improvements in error reporting.

Although this is arguably a bug fix, it's also a behavioral change
that might trip somebody up, so no back-patch.

Tom Lane, Heikki Linnakangas, and Jian He.  Thanks to Alexander Lakhin
for the initial report and for review/testing.

Discussion: https://postgr.es/m/2794005.1683042087@sss.pgh.pa.us
2023-11-13 13:01:51 -05:00
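
A brief sketch of the user-visible changes from the array_in() rewrite above
(error messages are not reproduced):

    -- now accepted as a valid spelling of an empty array, as in plpython/plperl
    SELECT '{{},{}}'::int[];

    -- non-rectangular input (different brace depths) is rejected more reliably
    SELECT '{{1,2},{3}}'::int[];                       -- error

    -- explicit subscripts outside the integer range are now rejected
    SELECT '[2147483647:2147483648]={1,2}'::int[];     -- error
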
Michael Paquier 23c8c0c8f4 Add ability to reset all shared stats types in pg_stat_reset_shared()
Currently, pg_stat_reset_shared() can use an argument to specify the
target of statistics to reset, doing nothing for NULL as it is strict.

This patch adds to pg_stat_reset_shared() the possibility to reset all
the stats types already handled in this function rather than do nothing
if the argument value given is NULL or if nothing is specified
(proisstrict is switched to false).  Like previously, SLRUs are not
included in what gets reset.

The idea to use NULL or no argument to control if all the shared stats
already covered by this function should be reset has been proposed by
Andres Freund.

Bump catalog version.

Author: Atsushi Torikoshi
Reviewed-by: Kyotaro Horiguchi, Michael Paquier, Bharath Rupireddy,
Matthias van de Meent
Discussion: https://postgr.es/m/4291a55137ddda77cf7cc5f46e846daf@oss.nttdata.com
2023-11-12 16:43:12 +09:00
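
A sketch of the new call forms for pg_stat_reset_shared() (SLRUs are still
reset separately with pg_stat_reset_slru()):

    -- reset all shared stats types handled by this function
    SELECT pg_stat_reset_shared();
    SELECT pg_stat_reset_shared(NULL);         -- equivalent

    -- resetting a single target still works as before
    SELECT pg_stat_reset_shared('bgwriter');
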
Amit Kapila 8bfb231b43 Prohibit max_slot_wal_keep_size to value other than -1 during upgrade.
We don't want existing slots in the old cluster to get invalidated during
the upgrade. During an upgrade, we set this variable to -1 via the command
line in an attempt to prevent such invalidations, but users have ways to
override it. This patch ensures the value is not overridden by the user.

Author: Kyotaro Horiguchi
Reviewed-by: Peter Smith, Alvaro Herrera
Discussion: http://postgr.es/m/20231027.115759.2206827438943188717.horikyota.ntt@gmail.com
2023-11-10 08:45:01 +05:30
David Rowley 10d34fefc2 Ensure we use the correct spelling of "ensure"
We seem to have accidentally used "insure" in a few places.  Correct
that.

Author: Peter Smith
Discussion: https://postgr.es/m/CAHut+Pv0biqrhA3pMhu40aDsj343mTsD75khKnHsLqR8P04f=Q@mail.gmail.com
Backpatch-through: 12, oldest supported version
2023-11-10 00:15:54 +13:00
Dean Rasheed 0e3e8fbd3a Fix corner-case 64-bit integer subtraction bug on some platforms.
When computing "0 - INT64_MIN", most platforms would report an
overflow error, which is correct. However, platforms without integer
overflow builtins or 128-bit integers would fail to spot the overflow,
and incorrectly return INT64_MIN.

Back-patch to all supported branches.

Patch by me. Thanks to Jian He for initial investigation, and Laurenz
Albe and Tom Lane for review.

Discussion: https://postgr.es/m/CAEZATCUNK-AZSD0jVdgkk0N%3DNcAXBWeAEX-QU9AnJPensikmdQ%40mail.gmail.com
2023-11-09 09:50:23 +00:00
Heikki Linnakangas 954e43564d Use a faster hash function in resource owners.
This buys back some of the performance loss that we otherwise saw from the
previous commit.

Reviewed-by: Aleksander Alekseev, Michael Paquier, Julien Rouhaud
Reviewed-by: Kyotaro Horiguchi, Hayato Kuroda, Álvaro Herrera, Zhihong Yu
Reviewed-by: Peter Eisentraut, Andres Freund
Discussion: https://www.postgresql.org/message-id/d746cead-a1ef-7efe-fb47-933311e876a3%40iki.fi
2023-11-08 13:30:52 +02:00
Heikki Linnakangas b8bff07daa Make ResourceOwners more easily extensible.
Instead of having a separate array/hash for each resource kind, use a
single array and hash to hold all kinds of resources. This makes it
possible to introduce new resource "kinds" without having to modify
the ResourceOwnerData struct. In particular, this makes it possible
for extensions to register custom resource kinds.

The old approach was to have a small array of resources of each kind,
and if it fills up, switch to a hash table. The new approach also uses
an array and a hash, but now the array and the hash are used at the
same time. The array is used to hold the recently added resources, and
when it fills up, they are moved to the hash. This keeps the access to
recent entries fast, even when there are a lot of long-held resources.

All the resource-specific ResourceOwnerEnlarge*(),
ResourceOwnerRemember*(), and ResourceOwnerForget*() functions have
been replaced with three generic functions that take resource kind as
argument. For convenience, we still define resource-specific wrapper
macros around the generic functions with the old names, but they are
now defined in the source files that use those resource kinds.

The release callback no longer needs to call ResourceOwnerForget on
the resource being released. ResourceOwnerRelease unregisters the
resource from the owner before calling the callback. That needed some
changes in bufmgr.c and some other files, where releasing the
resources previously always called ResourceOwnerForget.

Each resource kind specifies a release priority, and
ResourceOwnerReleaseAll releases the resources in priority order. To
make that possible, we have to restrict what you can do between
phases. After calling ResourceOwnerRelease(), you are no longer
allowed to remember any more resources in it or to forget any
previously remembered resources by calling ResourceOwnerForget.  There
was one case where that was done previously. At subtransaction commit,
AtEOSubXact_Inval() would handle the invalidation messages and call
RelationFlushRelation(), which temporarily increased the reference
count on the relation being flushed. We now switch to the parent
subtransaction's resource owner before calling AtEOSubXact_Inval(), so
that there is a valid ResourceOwner to temporarily hold that relcache
reference.

Other end-of-xact routines make similar calls to AtEOXact_Inval()
between release phases, but I didn't see any regression test failures
from those, so I'm not sure if they could reach a codepath that needs
remembering extra resources.

There were two exceptions to how the resource leak WARNINGs on commit
were printed previously: llvmjit silently released the context without
printing the warning, and a leaked buffer io triggered a PANIC. Now
everything prints a WARNING, including those cases.

Add tests in src/test/modules/test_resowner.

Reviewed-by: Aleksander Alekseev, Michael Paquier, Julien Rouhaud
Reviewed-by: Kyotaro Horiguchi, Hayato Kuroda, Álvaro Herrera, Zhihong Yu
Reviewed-by: Peter Eisentraut, Andres Freund
Discussion: https://www.postgresql.org/message-id/cbfabeb0-cd3c-e951-a572-19b365ed314d%40iki.fi
2023-11-08 13:30:50 +02:00
Alvaro Herrera 615f5f6faa Stop including parsenodes.h in plannodes.h
I added it by mistake in commit 7103ebb7aa.  To clean up, struct
MergeAction needs to be moved to primnodes.h from parsenodes.h.  (This
forces us to also move OverridingKind to primnodes.h).

Having to add parsenodes.h to bootstrap.h as fallout is a bit
surprising, since nothing nominally needs it there.  However, per
comments in bootscanner.l, it is needed so that YYSTYPE can be declared.
I think this only started with commit dac048f71e, but I didn't
actually verify that.

In passing, stop including parsenodes.h in tcopprot.h.  Nothing needs it
there.

Per discussion on a patch by Ashutosh Bapat.

Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/202311071106.6y7b2ascqjlz@alvherre.pgsql
2023-11-07 19:26:39 +01:00
Tom Lane 18b585155a Detect integer overflow while computing new array dimensions.
array_set_element() and related functions allow an array to be
enlarged by assigning to subscripts outside the current array bounds.
While these places were careful to check that the new bounds are
allowable, they neglected to consider the risk of integer overflow
in computing the new bounds.  In edge cases, we could compute new
bounds that are invalid but get past the subsequent checks,
allowing bad things to happen.  Memory stomps that are potentially
exploitable for arbitrary code execution are possible, and so is
disclosure of server memory.

To fix, perform the hazardous computations using overflow-detecting
arithmetic routines, which fortunately exist in all still-supported
branches.

The test cases added for this generate (after patching) errors that
mention the value of MaxArraySize, which is platform-dependent.
Rather than introduce multiple expected-files, use psql's VERBOSITY
parameter to suppress the printing of the message text.  v11 psql
lacks that parameter, so omit the tests in that branch.

Our thanks to Pedro Gallegos for reporting this problem.

Security: CVE-2023-5869
2023-11-06 10:56:43 -05:00
Peter Eisentraut 721856ff24 Remove distprep
A PostgreSQL release tarball contains a number of prebuilt files, in
particular files produced by bison, flex, and perl, as well as html and
man documentation.  We have done this consistent with established
practice at the time to not require these tools for building from a
tarball.  Some of these tools were hard to get, or get the right
version of, from time to time, and shipping the prebuilt output was a
convenience to users.

Now this has at least two problems:

One, we have to make the build system(s) work in two modes: Building
from a git checkout and building from a tarball.  This is pretty
complicated, but it works so far for autoconf/make.  It does not
currently work for meson; you can currently only build with meson from
a git checkout.  Making meson builds work from a tarball seems very
difficult or impossible.  One particular problem is that since meson
requires a separate build directory, we cannot make the build update
files like gram.h in the source tree.  So if you were to build from a
tarball and update gram.y, you will have a gram.h in the source tree
and one in the build tree, but the way things work is that the
compiler will always use the one in the source tree.  So you cannot,
for example, make any gram.y changes when building from a tarball.
This seems impossible to fix in a non-horrible way.

Second, there is increased interest nowadays in precisely tracking the
origin of software.  We can reasonably track contributions into the
git tree, and users can reasonably track the path from a tarball to
packages and downloads and installs.  But what happens between the git
tree and the tarball is obscure and in some cases non-reproducible.

The solution for both of these issues is to get rid of the step that
adds prebuilt files to the tarball.  The tarball now only contains
what is in the git tree (*).  Getting the additional build
dependencies is no longer a problem nowadays, and the complications to
keep these dual build modes working are significant.  And of course we
want to get the meson build system working universally.

This commit removes the make distprep target altogether.  The make
dist target continues to do its job, it just doesn't call distprep
anymore.

(*) - The tarball also contains the INSTALL file that is built at make
dist time, but not by distprep.  This is unchanged for now.

The make maintainer-clean target, whose job it is to remove the
prebuilt files in addition to what make distclean does, is now just an
alias to make distclean.  (In practice, it is probably obsolete given
that git clean is available.)

The following programs are now hard build requirements in configure
(they were already required by meson.build):

- bison
- flex
- perl

Reviewed-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://www.postgresql.org/message-id/flat/e07408d9-e5f2-d9fd-5672-f53354e9305e@eisentraut.org
2023-11-06 15:18:04 +01:00
Daniel Gustafsson 526fe0d799 Add XMLText function (SQL/XML X038)
This function implements the standard XMLText function, which
converts text into xml text nodes. It uses the libxml2 function
xmlEncodeSpecialChars to escape predefined entities (&"<>), so
that those do not cause any conflict when concatenating the text
node output with existing xml documents.

This also adds a note in features.sgml about not supporting
XML(SEQUENCE). The SQL specification defines a RETURNING clause
to a set of XML functions, where RETURNING CONTENT or RETURNING
SEQUENCE can be defined. Since PostgreSQL doesn't support
XML(SEQUENCE) all of these functions operate with an
implicit RETURNING CONTENT.

Author: Jim Jones <jim.jones@uni-muenster.de>
Reviewed-by: Vik Fearing <vik@postgresfriends.org>
Discussion: https://postgr.es/m/86617a66-ec95-581f-8d54-08059cca8885@uni-muenster.de
2023-11-06 09:38:29 +01:00
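
A minimal sketch of the new xmltext() function (output not quoted verbatim):

    -- &, ", < and > in the input are replaced by their predefined entities
    SELECT xmltext('1 < 2 & 3 > "two"');

    -- so the result can safely be combined with existing XML content
    SELECT xmlconcat(xmltext('a < b'), '<tag/>'::xml);
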
Tom Lane 7704a1a72e Be more wary about NULL values for GUC string variables.
get_explain_guc_options() crashed if a string GUC marked GUC_EXPLAIN
has a NULL boot_val.  Nosing around found a couple of other places
that seemed insufficiently cautious about NULL string values, although
those are likely unreachable in practice.  Add some commentary
defining the expectations for NULL values of string variables,
in hopes of forestalling future additions of more such bugs.

Xing Guo, Aleksander Alekseev, Tom Lane

Discussion: https://postgr.es/m/CACpMh+AyDx5YUpPaAgzVwC1d8zfOL4JoD-uyFDnNSa1z0EsDQQ@mail.gmail.com
2023-11-02 11:47:33 -04:00
Jeff Davis a02b37fc08 Additional unicode primitive functions.
Introduce unicode_version(), icu_unicode_version(), and
unicode_assigned().

The latter requires introducing a new lookup table for the Unicode
General Category, which is generated along with the other Unicode
lookup tables.

Discussion: https://postgr.es/m/CA+TgmoYzYR-yhU6k1XFCADeyj=Oyz2PkVsa3iKv+keM8wp-F_A@mail.gmail.com
Reviewed-by: Peter Eisentraut
2023-11-01 22:47:06 -07:00
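
A quick sketch of the three functions introduced above (that
icu_unicode_version() returns NULL in builds without ICU is an assumption
here):

    -- Unicode version of PostgreSQL's built-in tables
    SELECT unicode_version();

    -- Unicode version used by the linked ICU library (assumed NULL without ICU)
    SELECT icu_unicode_version();

    -- true only if every character of the string is an assigned code point
    SELECT unicode_assigned('some text');
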
Bruce Momjian 741ed2065c C comment: adjust statistics mention
No need to talk about the statistics collector.

Discussion: https://postgr.es/m/8a82417cdb6e8038fe276d4960e3207a@oss.nttdata.com

Author: Álvaro Herrera

Backpatch-through: master
2023-10-31 11:02:04 -04:00
Michael Paquier 96f052613f Introduce pg_stat_checkpointer
Historically, the statistics of the checkpointer have been always part
of pg_stat_bgwriter.  This commit removes a few columns from
pg_stat_bgwriter, and introduces pg_stat_checkpointer with equivalent,
renamed columns (plus a new one for the reset timestamp):
- checkpoints_timed -> num_timed
- checkpoints_req -> num_requested
- checkpoint_write_time -> write_time
- checkpoint_sync_time -> sync_time
- buffers_checkpoint -> buffers_written

The fields of PgStat_CheckpointerStats and its SQL functions are renamed
to match with the new field names, for consistency.  Note that
background writer and checkpointer have been split into two different
processes in commits 806a2aee37 and bf405ba8e4.  The pgstat
structures were already split, making this change straight-forward.

Bump catalog version.

Author: Bharath Rupireddy
Reviewed-by: Bertrand Drouvot, Andres Freund, Michael Paquier
Discussion: https://postgr.es/m/CALj2ACVxX2ii=66RypXRweZe2EsBRiPMj0aHfRfHUeXJcC7kHg@mail.gmail.com
2023-10-30 09:47:16 +09:00
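
A sketch of querying the new view, with the renamed columns mapped back to
their old pg_stat_bgwriter names:

    SELECT num_timed,          -- was checkpoints_timed
           num_requested,      -- was checkpoints_req
           write_time,         -- was checkpoint_write_time
           sync_time,          -- was checkpoint_sync_time
           buffers_written,    -- was buffers_checkpoint
           stats_reset         -- new reset timestamp
    FROM pg_stat_checkpointer;
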
Dean Rasheed b2d55447a5 Guard against overflow in make_interval().
The original code did very little to guard against integer or floating
point overflow when computing the interval's fields.  Detect any such
overflows and error out, rather than silently returning bogus results.

Joseph Koshakow, reviewed by Ashutosh Bapat and me.

Discussion: https://postgr.es/m/CAAvxfHcm1TPwH_zaGWuFoL8pZBestbRZTU6Z%3D-RvAdSXTPbKfg%40mail.gmail.com
2023-10-29 15:51:53 +00:00
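
A short sketch of the behavior change in make_interval() (the exact error
text is not quoted):

    -- a value that would overflow the interval's fields now raises an error
    -- instead of silently returning a bogus result
    SELECT make_interval(years => 2147483647);     -- error

    -- ordinary calls are unaffected
    SELECT make_interval(years => 1, months => 2, days => 3);
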
Alexander Korotkov 2b26a69455 Make UniqueRelInfo a node
d3d55ce571 changed RelOptInfo.unique_for_rels from the list of Relid sets to
the list of UniqueRelInfo's.  But it didn't make UniqueRelInfo a node.
This commit makes UniqueRelInfo a node.  Also this commit revises some
comments related to RelOptInfo.unique_for_rels.

Reported-by: Tom Lane
Discussion: https://postgr.es/m/flat/1189851.1698340331%40sss.pgh.pa.us
2023-10-27 05:45:16 +03:00
Michael Paquier 74604a37f2 Remove buffers_backend and buffers_backend_fsync from pg_stat_checkpointer
Two attributes related to checkpointer statistics are removed in this
commit:
- buffers_backend, that counts the number of buffers written directly by
a backend.
- buffers_backend_fsync, that counts the number of times a backend had
to do fsync() by its own.

These are actually not checkpointer properties but backend properties.
Also, pg_stat_io provides a more accurate and equivalent report of these
numbers, by tracking all the I/O stats related to backends, including
writes and fsyncs, so storing them in pg_stat_checkpointer was
redundant.

Thanks also to Robert Haas and Amit Kapila for their input.

Bump catalog version.

Author: Bharath Rupireddy
Reviewed-by: Bertrand Drouvot, Andres Freund
Discussion: https://postgr.es/m/20230210004604.mcszbscsqs3bc5nx@awork3.anarazel.de
2023-10-27 11:16:39 +09:00
Peter Eisentraut 611806cd72 Add trailing commas to enum definitions
Since C99, there can be a trailing comma after the last value in an
enum definition.  A lot of new code has been introducing this style on
the fly.  Some new patches are now taking an inconsistent approach to
this.  Some add the last comma on the fly if they add a new last
value, some are trying to preserve the existing style in each place,
some are even dropping the last comma if there was one.  We could
nudge this all in a consistent direction if we just add the trailing
commas everywhere once.

I omitted a few places where there was a fixed "last" value that will
always stay last.  I also skipped the header files of libpq and ecpg,
in case people want to use those with older compilers.  There were
also a small number of cases where the enum type wasn't used anywhere
(but the enum values were), which ended up confusing pgindent a bit,
so I left those alone.

Discussion: https://www.postgresql.org/message-id/flat/386f8c45-c8ac-4681-8add-e3b0852c1620%40eisentraut.org
2023-10-26 09:20:54 +02:00
David Rowley f0efa5aec1 Introduce the concept of read-only StringInfos
There were various places in our codebase which conjured up a StringInfo
by manually assigning the StringInfo fields and setting the data field
to point to some existing buffer.  There wasn't much consistency here as
to what fields like maxlen got set to, and in one location we didn't
ensure that the buffer was correctly NUL terminated at len bytes, as
documented as required in stringinfo.h.

Here we introduce 2 new functions to initialize StringInfos.  One allows
callers to initialize a StringInfo passing along a buffer that is
already allocated by palloc.  Here the StringInfo code uses this buffer
directly rather than doing any memcpying into a new allocation.  Having
this as a function allows us to verify the buffer is correctly NUL
terminated.  StringInfos initialized this way can be appended to and
reset just like any other normal StringInfo.

The other new initialization function also accepts an existing buffer,
but the given buffer does not need to be a pointer to a palloc'd chunk.
This buffer could be a pointer pointing partway into some palloc'd chunk
or may not even be palloc'd at all.  StringInfos initialized this way
are deemed as "read-only".  This means that it's not possible to
append to them or reset them.

For the latter of the two new initialization functions mentioned above,
we relax the requirement that the data buffer must be NUL terminated.
Relaxing this requirement is convenient in a few places as it can save
us from having to allocate an entire new buffer just to add the NUL
terminator or save us from having to temporarily add a NUL only to have to
put the original char back again later.

Incompatibility note:

Here we also forego adding the NUL in a few places where it does not
seem to be required.  These locations are passing the given StringInfo
into a type's receive function.  It does not seem like any of our
built-in receive functions require this, but perhaps there's some UDT
out there in the wild which does require this.  It is likely worthy of
a mention in the release notes that a UDT's receive function mustn't rely
on the input StringInfo being NUL terminated.

Author: David Rowley
Reviewed-by: Tom Lane
Discussion: https://postgr.es/m/CAApHDvorfO3iBZ%3DxpiZvp3uHtJVLyFaPBSvcAhAq2HPLnaNSwQ%40mail.gmail.com
2023-10-26 16:31:48 +13:00
Amit Kapila 29d0a77fa6 Migrate logical slots to the new node during an upgrade.
While reading information from the old cluster, a list of logical
slots is fetched. In a later part of the upgrade, pg_upgrade revisits the
list and restores slots by executing pg_create_logical_replication_slot()
on the new cluster. Migration of logical replication slots is only
supported when the old cluster is version 17.0 or later.

If the old node has invalid slots or slots with unconsumed WAL records,
pg_upgrade fails. These checks are needed to prevent data loss.

The significant advantage of this commit is that it makes it easy to
continue logical replication even after upgrading the publisher node.
Previously, pg_upgrade allowed copying publications to a new node. With
this patch, adjusting the connection string to the new publisher will
cause the apply worker on the subscriber to connect to the new publisher
automatically. This enables seamless continuation of logical replication,
even after an upgrade.

Author: Hayato Kuroda, Hou Zhijie
Reviewed-by: Peter Smith, Bharath Rupireddy, Dilip Kumar, Vignesh C, Shlok Kyal
Discussion: http://postgr.es/m/TYAPR01MB58664C81887B3AF2EB6B16E3F5939@TYAPR01MB5866.jpnprd01.prod.outlook.com
Discussion: http://postgr.es/m/CAA4eK1+t7xYcfa0rEQw839=b2MzsfvYDPz3xbD+ZqOdP3zpKYg@mail.gmail.com
2023-10-26 07:06:55 +05:30
Alexander Korotkov d3d55ce571 Remove useless self-joins
The Self Join Elimination (SJE) feature removes an inner join of a plain table
to itself in the query tree if it is proved that the join can be replaced
with a scan without impacting the query result.  The self join and the inner
relation are replaced with the outer one in the query, in equivalence classes,
and in planner info structures.  Also, the inner restrictlist is moved to the
outer one, removing duplicated clauses.  Thus, this optimization reduces the
length of the range table list (which especially matters for partitioned
relations), reduces the number of restriction clauses and hence of selectivity
estimations, and can potentially improve overall planner predictions for the
query.

The SJE proof is based on innerrel_is_unique machinery.

We can remove a self-join when for each outer row:
 1. At most one inner row matches the join clause.
 2. Each matched inner row must be (physically) the same row as the outer one.

In this patch we use the following approach to identify a self-join:
 1. Collect all merge-joinable join quals which look like a.x = b.x
 2. Add to the list above the baserestrictinfo of the inner table.
 3. Check innerrel_is_unique() for the qual list.  If it returns false, skip
    this pair of joining tables.
 4. Check uniqueness, proved by the baserestrictinfo clauses. To prove
    the possibility of self-join elimination inner and outer clauses must have
    an exact match.

The relation replacement procedure is not trivial, and it is partly combined
with the one used to remove useless left joins.  Tests covering this feature
were added to join.sql.  Some regression tests changed due to self-join removal
logic.

Discussion: https://postgr.es/m/flat/64486b0b-0404-e39e-322d-0801154901f3%40postgrespro.ru
Author: Andrey Lepikhov, Alexander Kuzmenkov
Reviewed-by: Tom Lane, Robert Haas, Andres Freund, Simon Riggs, Jonathan S. Katz
Reviewed-by: David Rowley, Thomas Munro, Konstantin Knizhnik, Heikki Linnakangas
Reviewed-by: Hywel Carver, Laurenz Albe, Ronan Dunklau, vignesh C, Zhihong Yu
Reviewed-by: Greg Stark, Jaime Casanova, Michał Kłeczek, Alena Rybakina
Reviewed-by: Alexander Korotkov
2023-10-25 12:59:16 +03:00
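
A sketch of the kind of query the Self Join Elimination targets (table and
column names are invented; the resulting plan is not shown):

    CREATE TABLE tab (id int PRIMARY KEY, payload text);

    -- joining tab to itself on its unique key satisfies both conditions above,
    -- so the planner can replace the join with a single scan of tab
    -- (plus an id IS NOT NULL qual implied by the inner join)
    EXPLAIN (COSTS OFF)
    SELECT a.payload
    FROM tab a JOIN tab b ON a.id = b.id;
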
Tom Lane 387f9ed0a0 Fix problems when a plain-inheritance parent table is excluded.
When an UPDATE/DELETE/MERGE's target table is an old-style
inheritance tree, it's possible for the parent to get excluded
from the plan while some children are not.  (I believe this is
only possible if we can prove that a CHECK ... NO INHERIT
constraint on the parent contradicts the query WHERE clause,
so it's a very unusual case.)  In such a case, ExecInitModifyTable
mistakenly concluded that the first surviving child is the target
table, leading to at least two bugs:

1. The wrong table's statement-level triggers would get fired.

2. In v16 and up, it was possible to fail with "invalid perminfoindex
0 in RTE with relid nnnn" due to the child RTE not having permissions
data included in the query plan.  This was hard to reproduce reliably
because it did not occur unless the update triggered some non-HOT
index updates.

In v14 and up, this is easy to fix by defining ModifyTable.rootRelation
to be the parent RTE in plain inheritance as well as partitioned cases.

While the wrong-triggers bug also appears in older branches, the
relevant code in both the planner and executor is quite a bit
different, so it would take a good deal of effort to develop and
test a suitable patch.  Given the lack of field complaints about the
trigger issue, I'll desist for now.  (Patching v11 for this seems
unwise anyway, given that it will have no more releases after next
month.)

Per bug #18147 from Hans Buschmann.

Amit Langote and Tom Lane

Discussion: https://postgr.es/m/18147-6fc796538913ee88@postgresql.org
2023-10-24 14:48:33 -04:00
Jeff Davis 00d7fb5e2e Assert that buffers are marked dirty before XLogRegisterBuffer().
Enforce the rule from transam/README in XLogRegisterBuffer(), and
update callers to follow the rule.

Hash indexes sometimes register clean pages as a part of the locking
protocol, so provide a REGBUF_NO_CHANGE flag to support that use.

Discussion: https://postgr.es/m/c84114f8-c7f1-5b57-f85a-3adc31e1a904@iki.fi
Reviewed-by: Heikki Linnakangas
2023-10-23 17:17:46 -07:00
Robert Haas 5b36e8f078 Change struct tablespaceinfo's oid member from 'char *' to 'Oid'
This shouldn't change behavior except in the unusual case where
there are files in the tablespace directory that have entirely
numeric names but are nevertheless not possible names for a
tablespace directory, either because their names have leading zeroes
that shouldn't be there, or the value is actually zero, or because
the value is too large to represent as an OID.

In those cases, the directory would previously have made it into
the list of tablespaceinfo objects and no longer will. Thus, base
backups will now ignore such directories, instead of treating them
as legitimate tablespace directories. Similarly, if entries for
such tablespaces occur in a tablespace_map file, they will now
be rejected as erroneous, instead of being honored.

This is infrastructure for future work that wants to be able to
know the tablespace of each relation that is part of a backup
*as an OID*. By strengthening the up-front validation, we don't
have to worry about weird cases later, and can more easily avoid
repeated string->integer conversions.

Patch by me, reviewed by David Steele.

Discussion: http://postgr.es/m/CA+TgmoZNVeBzoqDL8xvr-nkaepq815jtDR4nJzPew7=3iEuM1g@mail.gmail.com
2023-10-23 15:17:26 -04:00
Robert Haas 5c47c6546c Refactor parse_filename_for_nontemp_relation to parse more.
Instead of returning the number of characters in the RelFileNumber,
return the RelFileNumber itself. Continue to return the fork number,
as before, and additionally return the segment number.

parse_filename_for_nontemp_relation now rejects a RelFileNumber or
segment number that begins with a leading zero. Before, we accepted
such cases as relation filenames, but if we continued to do so after
this change, the function might return the same values for two
different files (e.g. 1234.5 and 001234.5 or 1234.005) which could be
annoying for callers. Since we don't actually ever generate filenames
with leading zeroes in the names, any such files that we find must
have been created by something other than PostgreSQL, and it is
therefore reasonable to treat them as non-relation files.

Along the way, change unlogged_relation_entry to store a RelFileNumber
rather than an OID. This update should have been made in
851f4cc75c, but it was overlooked.
It's trivial to make the update as part of this commit, perhaps more
trivial than it would have been without it, so do that.

Patch by me, reviewed by David Steele.

Discussion: http://postgr.es/m/CA+TgmoZNVeBzoqDL8xvr-nkaepq815jtDR4nJzPew7=3iEuM1g@mail.gmail.com
2023-10-23 15:08:53 -04:00
Tom Lane 2d870b4aef Allow ALTER SYSTEM to set unrecognized custom GUCs.
Previously, ALTER SYSTEM failed if the target GUC wasn't present in
the session's GUC hashtable.  That is a reasonable behavior for core
(single-part) GUC names, and for custom GUCs for which we have loaded
an extension that's reserved the prefix.  But it's unnecessarily
restrictive otherwise, and it also causes inconsistent behavior:
you can "ALTER SYSTEM SET foo.bar" only if you did "SET foo.bar"
earlier in the session.  That's fairly silly.

Hence, refactor things so that we can execute ALTER SYSTEM even
if the variable doesn't have a GUC hashtable entry, as long as the
name meets the custom-variable naming requirements and does not
have a reserved prefix.  (It's safe to do this even if the
variable belongs to an extension we currently don't have loaded.
A bad value will at worst cause a WARNING when the extension
does get loaded.)

Also, adjust GRANT ON PARAMETER to have the same opinions about
whether to allow an unrecognized GUC name, and to throw the
same errors if not (it previously used a one-size-fits-all
message for several distinguishable conditions).  By default,
only a superuser will be allowed to do ALTER SYSTEM SET on an
unrecognized name, but it's possible to GRANT the ability to
do it.

Patch by me, pursuant to a documentation complaint from
Gavin Panella.  Arguably this is a bug fix, but given the
lack of other complaints I'll refrain from back-patching.

Discussion: https://postgr.es/m/2617358.1697501956@sss.pgh.pa.us
Discussion: https://postgr.es/m/169746329791.169914.16613647309012285391@wrigleys.postgresql.org
2023-10-21 13:35:19 -04:00
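
A sketch under the relaxed rules above (the prefix and role name are
hypothetical):

    -- allowed even though no extension has reserved the "myext" prefix and no
    -- "SET myext.batch_size" was run earlier in the session; by default only a
    -- superuser may do this for an unrecognized name
    ALTER SYSTEM SET myext.batch_size = '100';

    -- the ability can also be granted
    GRANT ALTER SYSTEM ON PARAMETER myext.batch_size TO some_role;
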
Tom Lane 2b5154beab Extend ALTER OPERATOR to allow setting more optimization attributes.
Allow the COMMUTATOR, NEGATOR, MERGES, and HASHES attributes to be set
by ALTER OPERATOR.  However, we don't allow COMMUTATOR/NEGATOR to be
changed once set, nor allow the MERGES/HASHES flags to be unset once
set.  Changes like that might invalidate plans already made, and
dealing with the consequences seems like more trouble than it's worth.
The main use-case we foresee for this is to allow addition of missed
properties in extension update scripts, such as extending an existing
operator to support hashing.  So only transitions from not-set to set
states seem very useful.

This patch also causes us to reject some incorrect cases that formerly
resulted in inconsistent catalog state, such as trying to set the
commutator of an operator to be some other operator that already has a
(different) commutator.

While at it, move the InvokeObjectPostCreateHook call for CREATE
OPERATOR to not occur until after we've fixed up commutator or negator
links as needed.  The previous ordering could only be justified by
thinking of the OperatorUpd call as a kind of ALTER OPERATOR step;
but we don't call InvokeObjectPostAlterHook therein.  It seems better
to let the hook see the final state of the operator object.

In the documentation, move the discussion of how to establish
commutator pairs from xoper.sgml to the CREATE OPERATOR ref page.

Tommy Pavlicek, reviewed and editorialized a bit by me

Discussion: https://postgr.es/m/CAEhP-W-vGVzf4udhR5M8Bdv88UYnPrhoSkj3ieR3QNrsGQoqdg@mail.gmail.com
2023-10-20 12:28:46 -04:00
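
A sketch of the newly settable attributes (operator and type names are
placeholders, e.g. in an extension update script):

    -- add hash-join support to an existing operator
    ALTER OPERATOR === (mytype, mytype) SET (HASHES);

    -- commutator and negator links may be added once, but not changed later
    ALTER OPERATOR === (mytype, mytype)
        SET (COMMUTATOR = ===, NEGATOR = !==);
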
Robert Haas afd12774ae During online checkpoints, insert XLOG_CHECKPOINT_REDO at redo point.
This allows tools that read the WAL sequentially to identify (possible)
redo points when they're reached, rather than only being able to
detect them in retrospect when XLOG_CHECKPOINT_ONLINE is found, possibly
much later in the WAL stream. There are other possible applications as
well; see the discussion links below.

Any redo location that precedes the checkpoint location should now point
to an XLOG_CHECKPOINT_REDO record, so add a cross-check to verify this.

While adjusting the code in CreateCheckPoint() for this patch, I made it
call WALInsertLockAcquireExclusive a bit later than before, since there
appears to be no need for it to be held while checking whether the system
is idle, whether this is an end-of-recovery checkpoint, or what the current
timeline is.

Bump XLOG_PAGE_MAGIC.

Patch by me, based in part on earlier work from Dilip Kumar. Review by
Dilip Kumar, Amit Kapila, Andres Freund, and Michael Paquier.

Discussion: http://postgr.es/m/CA+TgmoYy-Vc6G9QKcAKNksCa29cv__czr+N9X_QCxEfQVpp_8w@mail.gmail.com
Discussion: http://postgr.es/m/20230614194717.jyuw3okxup4cvtbt%40awork3.anarazel.de
Discussion: http://postgr.es/m/CA+hUKG+b2ego8=YNW2Ohe9QmSiReh1-ogrv8V_WZpJTqP3O+2w@mail.gmail.com
2023-10-19 14:47:29 -04:00
Michael Paquier 295c36c0c1 Add local_blk_{read|write}_time I/O timing statistics for local blocks
There was no I/O timing statistics for counting read and write timings
on local blocks, contrary to the counterparts for temp and shared
blocks.  This information is available when track_io_timing is enabled.

The output of EXPLAIN is updated to show this information.  An update of
pg_stat_statements is planned next.

Author: Nazir Bilal Yavuz
Reviewed-by: Robert Haas, Melanie Plageman
Discussion: https://postgr.es/m/CAN55FZ19Ss279mZuqGbuUNxka0iPbLgYuOQXqAKewrjNrp27VA@mail.gmail.com
2023-10-19 13:39:38 +09:00
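
A sketch of how the new timings surface (the exact EXPLAIN output line is not
reproduced here):

    SET track_io_timing = on;

    CREATE TEMP TABLE scratch AS
        SELECT g AS id, repeat('x', 100) AS filler
        FROM generate_series(1, 100000) g;

    -- with BUFFERS, the I/O timing output now also covers time spent reading
    -- and writing local (temp-relation) blocks, not just shared and temp blocks
    EXPLAIN (ANALYZE, BUFFERS) SELECT count(*) FROM scratch;
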
Michael Paquier 13d00729d4 Rename I/O timing statistics columns to shared_blk_{read|write}_time
These two counters, defined in BufferUsage to track respectively the
time spent while reading and writing blocks have historically only
tracked data related to shared buffers, when track_io_timing is enabled.

An upcoming patch to add specific counters for local buffers will take
advantage of this rename as it has come up that no data is currently
tracked for local buffers, and tracking local and shared buffers using
the same fields would be inconsistent with the treatment done for temp
buffers.  Renaming the existing fields clarifies what the block type of
each stats field is.

pg_stat_statement is updated to reflect the rename.  No extension
version bump is required as 5a3423ad8e has done one, affecting v17~.

Author: Nazir Bilal Yavuz
Reviewed-by: Robert Haas, Melanie Plageman
Discussion: https://postgr.es/m/CAN55FZ19Ss279mZuqGbuUNxka0iPbLgYuOQXqAKewrjNrp27VA@mail.gmail.com
2023-10-19 11:26:40 +09:00
Michael Paquier 7fb355db14 Install wait_event_types.h in VPATH builds
An extra rule is needed in src/include/Makefile for VPATH builds to
install any generated server-side include files, and wait_event_types.h
was forgotten from the set.

Issue introduced by fa88928470.

Reported-by: Christoph Berg
Discussion: https://postgr.es/m/ZTAA11u7CtX6NqlK@msg.df7cb.de
2023-10-19 09:42:46 +09:00
Thomas Munro f90b4a846b jit: Supply LLVMGlobalGetValueType() for LLVM < 8.
Commit 37d5babb used this C API function while adding support for LLVM
16 and opaque pointers, but it's not available in LLVM 7 and older.
Provide it in our own llvmjit_wrap.cpp.  It just calls a C++ function
that pre-dates LLVM 3.9, our minimum target.

Back-patch to 12, like 37d5babb.

Discussion: https://postgr.es/m/CA%2BhUKGKnLnJnWrkr%3D4mSGhE5FuTK55FY15uULR7%3Dzzc%3DwX4Nqw%40mail.gmail.com
2023-10-19 03:01:55 +13:00
Thomas Munro 37d5babb5c jit: Support opaque pointers in LLVM 16.
Remove use of LLVMGetElementType() and provide the type of all pointers
to LLVMBuildXXX() functions when emitting IR, as required by modern LLVM
versions[1].

 * For LLVM <= 14, we'll still use the old LLVMBuildXXX() functions.
 * For LLVM == 15, we'll continue to do the same, explicitly opting
   out of opaque pointer mode.
 * For LLVM >= 16, we'll use the new LLVMBuildXXX2() functions that take
   the extra type argument.

The difference is hidden behind some new IR emitting wrapper functions
l_load(), l_gep(), l_call() etc.  The change is mostly mechanical,
except that at each site the correct type had to be provided.

In some places we needed to do some extra work to get function types,
including some new wrappers for C++ APIs that are not yet exposed in
LLVM's C API, and some new "example" functions in llvmjit_types.c
because it's no longer possible to start from the function pointer type
and ask for the function type.

Back-patch to 12, because it's a little trickier in 11 and we agreed not
to put the latest LLVM support into the upcoming final release of 11.

[1] https://llvm.org/docs/OpaquePointers.html

Reviewed-by: Dmitry Dolgov <9erthalion6@gmail.com>
Reviewed-by: Ronan Dunklau <ronan.dunklau@aiven.io>
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/CA%2BhUKGKNX_%3Df%2B1C4r06WETKTq0G4Z_7q4L4Fxn5WWpMycDj9Fw%40mail.gmail.com
2023-10-18 22:47:23 +13:00
Michael Paquier 173b56f1ef Add flush option to pg_logical_emit_message()
Since its introduction, LogLogicalMessage() (via the SQL interface
pg_logical_emit_message()) has never included a call to XLogFlush(),
causing it to potentially lose messages on a crash when used in
non-transactional mode.  This came up as a problem while
playing with ideas to design a test suite for what has become
039_end_of_wal.pl introduced in bae868caf2 by Thomas Munro, because
there are no direct ways to force a WAL flush via SQL.

The default is false, so that messages are not flushed and existing
use-cases of this function are unaffected.  If set to true, the
message emitted is flushed before returning to the caller, making
the message durable on crash.  This new option has no effect when using
pg_logical_emit_message() in transactional mode, as the record's flush
is guaranteed by the WAL generated when the transaction commits.

Two queries of test_decoding are tweaked to cover the new code path for
the flush.

Bump catalog version.

Author: Michael Paquier
Reviewed-by: Andres Freund, Amit Kapila, Fujii Masao, Tung Nguyen, Tomas
Vondra
Discussion: https://postgr.es/m/ZNsdThSe2qgsfs7R@paquier.xyz
2023-10-18 11:24:59 +09:00
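
A sketch of the new argument (prefix and payload are arbitrary; the flush
flag defaults to false):

    -- non-transactional message, now flushed to disk before returning
    SELECT pg_logical_emit_message(false, 'my_prefix', 'payload', true);

    -- without the flag the behavior is unchanged: the message may be lost on crash
    SELECT pg_logical_emit_message(false, 'my_prefix', 'payload');
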
Nathan Bossart 97550c0711 Avoid calling proc_exit() in processes forked by system().
The SIGTERM handler for the startup process immediately calls
proc_exit() for the duration of the restore_command, i.e., a call
to system().  This system() call forks a new process to execute the
shell command, and this child process inherits the parent's signal
handlers.  If both the parent and child processes receive SIGTERM,
both will attempt to call proc_exit().  This can end badly.  For
example, both processes will try to remove themselves from the
PGPROC shared array.

To fix this problem, this commit adds a check in
StartupProcShutdownHandler() to see whether MyProcPid == getpid().
If they match, this is the parent process, and we can proc_exit()
like before.  If they do not match, this is a child process, and we
just emit a message to STDERR (in a signal safe manner) and
_exit(), thereby skipping any problematic exit callbacks.

This commit also adds checks in proc_exit(), ProcKill(), and
AuxiliaryProcKill() that verify they are not being called within
such child processes.

Suggested-by: Andres Freund
Reviewed-by: Thomas Munro, Andres Freund
Discussion: https://postgr.es/m/Y9nGDSgIm83FHcad%40paquier.xyz
Discussion: https://postgr.es/m/20230223231503.GA743455%40nathanxps13
Backpatch-through: 11
2023-10-17 10:41:48 -05:00
Amit Kapila 79243de13f Restart the apply worker if the privileges have been revoked.
Restart the apply worker if the subscription owner's superuser privileges
have been revoked. This is required so that the subscription connection
string gets revalidated and, if required, the password option is used by
non-superusers to connect to the publisher.

Author: Vignesh C
Reviewed-by: Amit Kapila
Discussion: http://postgr.es/m/CALDaNm2Dxmhq08nr4P6G+24QvdBo_GAVyZ_Q1TcGYK+8NHs9xw@mail.gmail.com
2023-10-17 08:41:44 +05:30
Alexander Korotkov e83d1b0c40 Add support event triggers on authenticated login
This commit introduces a trigger on the login event, allowing some actions to
be fired right at user connection time.  This can be useful for logging or
connection-check purposes as well as for some personalization of the
environment.  Usage details are described in the included documentation; in
short, usage is the same as for other triggers: create a function returning
event_trigger and then create an event trigger on the login event.

In order to prevent connection-time overhead when there are no triggers, the
commit introduces the pg_database.dathasloginevt flag, which indicates that the
database has active login triggers.  This flag is set by the CREATE/ALTER EVENT
TRIGGER command, and unset at connection time when no active triggers are found.

Author: Konstantin Knizhnik, Mikhail Gribkov
Discussion: https://postgr.es/m/0d46d29f-4558-3af9-9c85-7774e14a7709%40postgrespro.ru
Reviewed-by: Pavel Stehule, Takayuki Tsunakawa, Greg Nancarrow, Ivan Panchenko
Reviewed-by: Daniel Gustafsson, Teodor Sigaev, Robert Haas, Andres Freund
Reviewed-by: Tom Lane, Andrey Sokolov, Zhihong Yu, Sergey Shinderuk
Reviewed-by: Gregory Stark, Nikita Malakhov, Ted Yu
2023-10-16 03:18:22 +03:00
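
A minimal sketch of the usage described above (function and trigger names are
invented):

    CREATE FUNCTION note_login() RETURNS event_trigger
    LANGUAGE plpgsql AS $$
    BEGIN
        RAISE NOTICE 'user % connected to %', session_user, current_database();
    END;
    $$;

    CREATE EVENT TRIGGER on_login_notice ON login
        EXECUTE FUNCTION note_login();
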
Noah Misch 5f27b5f848 Dissociate btequalimage() from interval_ops, ending its deduplication.
Under interval_ops, some equal values are distinguishable.  One such
pair is '24:00:00' and '1 day'.  With that being so, btequalimage()
breaches the documented contract for the "equalimage" btree support
function.  This can cause incorrect results from index-only scans.
Users should REINDEX any btree indexes having interval-type columns.
After updating, pg_amcheck will report an error for almost all such
indexes.  This fix makes interval_ops simply omit the support function,
like numeric_ops does.  Back-patch to v13, where btequalimage() first
appeared.  In back branches, for the benefit of old catalog content,
btequalimage() code will return false for type "interval".  Going
forward, back-branch initdb will include the catalog change.

Reviewed by Peter Geoghegan.

Discussion: https://postgr.es/m/20231011013317.22.nmisch@google.com
2023-10-14 16:33:51 -07:00
Tom Lane fcdd6689d0 Harden xxx_is_visible() functions against concurrent object drops.
For the same reasons given in commit 403ac226d, adjust these
functions to not assume that checking SearchSysCacheExists can
guarantee success of a later fetch.

This follows the same internal API choices made in the earlier commit:
add a function XXXExt(oid, is_missing) and use that to eliminate
the need for a separate existence check.  The changes are very
straightforward, though tedious.  For the moment I just made the new
functions static in namespace.c, but we could export them if a need
emerges.

Per bug #18014 from Alexander Lakhin.  Given the lack of hard evidence
that there's a bug in non-debug builds, I'm content to fix this only
in HEAD.

Discussion: https://postgr.es/m/18014-28c81cb79d44295d@postgresql.org
2023-10-14 16:13:11 -04:00
Tom Lane 403ac226dd Harden has_xxx_privilege() functions against concurrent object drops.
The versions of these functions that accept object OIDs are supposed
to return NULL, rather than failing, if the target object has been
dropped.  This makes it safe(r) to use them in queries that scan
catalogs, since the functions will be applied to objects that are
visible in the query's snapshot but might now be gone according to
the catalog snapshot.  In most cases we implemented this by doing
a SearchSysCacheExists test and assuming that if that succeeds, we
can safely invoke the appropriate aclchk.c function, which will
immediately re-fetch the same syscache entry.  It was argued that
if the existence test succeeds then the followup fetch must succeed
as well, for lack of any intervening AcceptInvalidationMessages call.

Alexander Lakhin demonstrated that this is not so when
CATCACHE_FORCE_RELEASE is enabled: the syscache entry will be forcibly
dropped at the end of SearchSysCacheExists, and then it is possible
for the catalog snapshot to get advanced while re-fetching the entry.
Alexander's test case requires the operation to happen inside a
parallel worker, but that seems incidental to the fundamental problem.
What remains obscure is whether there is a way for this to happen in a
non-debug build.  Nonetheless, CATCACHE_FORCE_RELEASE is a very useful
test methodology, so we'd better make the code safe for it.

After some discussion we concluded that the most future-proof fix
is to give up the assumption that checking SearchSysCacheExists can
guarantee success of a later fetch.  At best that assumption leads
to fragile code --- for example, has_type_privilege appears broken
for array types even if you believe the assumption holds.  And it's
not even particularly efficient.

There had already been some work towards extending the aclchk.c
APIs to include "is_missing" output flags, so this patch extends
that work to cover all the aclchk.c functions that are used by the
has_xxx_privilege() functions.  (This allows getting rid of some
ad-hoc decisions about not throwing errors in certain places in
aclchk.c.)

In passing, this fixes the has_sequence_privilege() functions to
provide the same guarantees as their cousins: for some reason the
SearchSysCacheExists tests never got added to those.

There is more work to do to remove the unsafe coding pattern with
SearchSysCacheExists in other places, but this is a pretty
self-contained patch so I'll commit it separately.

Per bug #18014 from Alexander Lakhin.  Given the lack of hard evidence
that there's a bug in non-debug builds, I'm content to fix this only
in HEAD.  (Perhaps we should clean up the has_sequence_privilege()
oversight in the back branches, but in the absence of field complaints
I'm not too excited about that either.)

Discussion: https://postgr.es/m/18014-28c81cb79d44295d@postgresql.org
2023-10-14 14:49:50 -04:00
Nathan Bossart 8d140c5822 Improve the naming in wal_sync_method code.
* sync_method is renamed to wal_sync_method.

* sync_method_options[] is renamed to wal_sync_method_options[].

* assign_xlog_sync_method() is renamed to assign_wal_sync_method().

* The names of the available synchronization methods are now
  prefixed with "WAL_SYNC_METHOD_" and have been moved into a
  WalSyncMethod enum.

* PLATFORM_DEFAULT_SYNC_METHOD is renamed to
  PLATFORM_DEFAULT_WAL_SYNC_METHOD, and DEFAULT_SYNC_METHOD is
  renamed to DEFAULT_WAL_SYNC_METHOD.

These more descriptive names help distinguish the code for
wal_sync_method from the code for DataDirSyncMethod (e.g., the
recovery_init_sync_method configuration parameter and the
--sync-method option provided by several frontend utilities).  This
change also prevents name collisions between the aforementioned
sets of code.  Since this only improves the naming of internal
identifiers, there should be no behavior change.

Author: Maxim Orlov
Discussion: https://postgr.es/m/CACG%3DezbL1gwE7_K7sr9uqaCGkWhmvRTcTEnm3%2BX1xsRNwbXULQ%40mail.gmail.com
2023-10-13 15:16:45 -05:00
Michael Paquier 97957fdbaa Add support for AT LOCAL
When converting a timestamp to/from with/without time zone, the SQL
Standard specifies an AT LOCAL variant of AT TIME ZONE which uses the
session's time zone.  This includes three system functions able to do
the work in the same way as the existing flavors for AT TIME ZONE,
except that these need to be marked as stable as they depend on the
session's TimeZone GUC.

Bump catalog version.

Author: Vik Fearing
Reviewed-by: Laurenz Albe, Cary Huang, Michael Paquier
Discussion: https://postgr.es/m/8e25dec4-5667-c1a5-6581-167d710c2182@postgresfriends.org
2023-10-13 13:01:37 +09:00
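
A sketch of the new syntax, which is equivalent to AT TIME ZONE with the
session's TimeZone setting:

    SET TimeZone = 'Asia/Tokyo';

    -- convert using the session's time zone
    SELECT timestamptz '2023-10-13 13:01:37+09' AT LOCAL;

    -- same result as spelling the zone out explicitly
    SELECT timestamptz '2023-10-13 13:01:37+09' AT TIME ZONE 'Asia/Tokyo';
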
Michael Paquier e7689190b3 Add option to bgworkers to allow the bypass of role login check
This adds a new option called BGWORKER_BYPASS_ROLELOGINCHECK to the
flags available to BackgroundWorkerInitializeConnection() and
BackgroundWorkerInitializeConnectionByOid().

This gives bgworkers the possibility to bypass the role login check, making
it possible to use a role that has no login rights while not being a
superuser.  InitPostgres() gains a new flag called
INIT_PG_OVERRIDE_ROLE_LOGIN, taking advantage of the refactoring done in
4800a5dfb4.

Regression tests are added to worker_spi to check the behavior of this
new option with bgworkers.

Author: Bertrand Drouvot
Reviewed-by: Nathan Bossart, Michael Paquier, Bharath Rupireddy
Discussion: https://postgr.es/m/bcc36259-7850-4882-97ef-d6b905d2fc51@gmail.com
2023-10-12 09:24:17 +09:00
Michael Paquier 4800a5dfb4 Refactor InitPostgres() to use bitwise option flags
InitPostgres() has been using a set of boolean arguments to control its
behavior, and a patch under discussion was aiming at expanding it with a
third one.  In preparation for expanding this area, this commit switches
all the current boolean arguments of this routine to a single bits32
argument instead.  Two values are currently supported for the flags:
- INIT_PG_LOAD_SESSION_LIBS to load [session|local]_preload_libraries at
startup.
- INIT_PG_OVERRIDE_ALLOW_CONNS to allow connection to a database even if
it has !datallowconn.  This is used by bgworkers.

Reviewed-by: Bertrand Drouvot
Discussion: https://postgr.es/m/ZSTn66_BXRZCeaqS@paquier.xyz
2023-10-11 12:31:49 +09:00
Peter Eisentraut 1d91d24d9a Add const to values and nulls arguments
This excludes any changes that would change the external AM APIs.

Reviewed-by: Aleksander Alekseev <aleksander@timescale.com>
Discussion: https://www.postgresql.org/message-id/flat/14c31f4a-0347-0805-dce8-93a9072c05a5%40eisentraut.org
2023-10-10 07:50:43 +02:00
Heikki Linnakangas 637109d13a Rename StartBackgroundWorker() to BackgroundWorkerMain().
The comment claimed that it is "called from postmaster", but it is
actually called in the child process, pretty early in the process
initialization. I guess you could interpret "called from postmaster"
to mean that, but it seems wrong to me. Rename the function to be
consistent with other functions that have a similar role.

Reviewed-by: Thomas Munro
Discussion: https://www.postgresql.org/message-id/4f95c1fc-ad3c-7974-3a8c-6faa3931804c@iki.fi
2023-10-09 11:52:09 +03:00
David Rowley 77db132637 Remove debug_print_rel and replace usages with pprint
Going by c4a1933b4, b33ef397a and 05893712c (to name just a few), it seems
that maintaining debug_print_rel() is often forgotten.  In the case of
c4a1933b4, it was several years before anyone noticed that a path type
was not handled by debug_print_rel().  (debug_print_rel() is only
compiled when building with OPTIMIZER_DEBUG).

After a quick survey on the pgsql-hackers mailing list, nobody came
forward to admit that they use OPTIMIZER_DEBUG.  So to prevent any future
maintenance neglect, let's just remove debug_print_rel() and have
OPTIMIZER_DEBUG make use of pprint() instead (as suggested by Tom Lane).
If anyone comes forward to claim they use OPTIMIZER_DEBUG in a way that
requires debug_print_rel(), they have around 10 months remaining in the
v17 cycle in which we could revert this.
If nobody comes forward in that time, then we can likely safely declare
debug_print_rel() as not worth keeping.
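
As a rough sketch of the replacement (the surrounding call site is
illustrative, not quoted from the planner code):

    #ifdef OPTIMIZER_DEBUG
        /* Previously: debug_print_rel(root, rel); */
        pprint(rel);        /* generic node-tree printer */
    #endif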

Discussion: https://postgr.es/m/CAApHDvoCdjo8Cu2zEZF4-AxWG-90S+pYXAnoDDa9J3xH-OrczQ@mail.gmail.com
2023-10-09 15:53:16 +13:00
Tom Lane b6c7cfac88 Restore proper linkage of pg_char_to_encoding() and friends.
Back in the 8.3 era we discovered that it was problematic if
libpq.so had encoding ID assignments different from the backend,
which is possible because on some platforms libpq.so might be
of a different major version from the calling programs.
psql should use libpq's assignments, but initdb has to use the
backend's, else it will put wrong values into pg_database.
The solution devised in commit 8468146b0 relied on giving initdb
its own copy of encnames.c rather than relying on the functions
exported by libpq.  Later, that metamorphosed into ensuring that
libpgcommon got linked before libpq -- which made things OK for
initdb but broke psql.  We didn't notice for lack of any changes
in enum pg_enc since then.  Commit 06843df4a reversed that, fixing
the latent bug in psql but adding one in initdb.  The meson build
infrastructure is also not being sufficiently careful about link
order, and trying to make it so would be equally fragile.

Hence, let's use a new scheme based on giving the libpq-exported
symbols different real names than the same functions exported from
libpgcommon.a or libpgcommon_srv.a.  (We could distinguish those
two cases as well, but there seems no need to.)  libpq gets the
official names to avoid an ABI break for libpq clients, while the
other cases use #define's to make the real names "xxx_private"
rather than "xxx".  By controlling where the #define's are
applied, we can force any particular client program to use one
set or the other of the encnames.c functions.
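
A minimal sketch of that renaming trick; the guard macro below is made up for
illustration, and only the "xxx_private" suffix comes from the description
above.

    /* Seen by libpgcommon consumers, but not by libpq itself: */
    #ifndef BUILDING_LIBPQ
    #define pg_char_to_encoding      pg_char_to_encoding_private
    #define pg_encoding_to_char      pg_encoding_to_char_private
    #define pg_valid_server_encoding pg_valid_server_encoding_private
    #endif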

We cannot back-patch this, since it'd be an ABI break for backend
loadable modules, but there seems little need to.  We're just
trying to ensure that the world is safe for hypothetical future
additions to enum pg_enc.

In passing this should fix "duplicate symbol" linker warnings
that we've been seeing on AIX buildfarm members since commit
06843df4a.  It's not very clear why that linker is complaining
now, when there were strictly *more* duplicates visible before,
but in any case this should remove the reason for complaint.

Patch by me; thanks to Andres Freund for review.

Discussion: https://postgr.es/m/2385119.1696354473@sss.pgh.pa.us
2023-10-07 12:08:10 -04:00
Alexander Korotkov e0b1ee17dc Skip checking of scan keys required for directional scan in B-tree
Currently, the B-tree code matches every scan key to every item on the page.
Imagine an ordered B-tree scan for a query like this:

SELECT * FROM tbl WHERE col > 'a' AND col < 'b' ORDER BY col;

The (col > 'a') scan key will always be matched once we find the location to
start the scan.  The (col < 'b') scan key will match every item on the page
as long as it matches the last item on the page.

This patch implements prechecking of the scan keys required for a directional
scan at the beginning of each page scan.  If the precheck succeeds, we can skip
checking those scan keys for the remaining items on the page.  That can lead to
significant acceleration, especially if the comparison operator is expensive.
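
In a deliberately simplified sketch (toy types, not the committed nbtree
code), the trick is that page items are ordered, so one comparison against the
last item can vouch for the whole page:

    /* items[] is sorted ascending; upper_bound models the (col < 'b') key. */
    static void
    scan_page(const int *items, int nitems, int upper_bound)
    {
        /* Precheck: if the last item passes, every earlier item passes too. */
        bool    skip_upper_check = (items[nitems - 1] < upper_bound);

        for (int i = 0; i < nitems; i++)
        {
            if (!skip_upper_check && items[i] >= upper_bound)
                break;          /* range exhausted in scan direction */
            /* ... evaluate any remaining, non-required scan keys here ... */
        }
    }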

Idea from patch by Konstantin Knizhnik.

Discussion: https://postgr.es/m/079c3f8e-3371-abe2-e93c-fc8a0ae3f571%40garret.ru
Reviewed-by: Peter Geoghegan, Pavel Borisov
2023-10-06 10:40:51 +03:00
Peter Eisentraut 04e485273b Move BuildDescForRelation() from tupdesc.c to tablecmds.c
BuildDescForRelation()'s main job is to convert ColumnDef lists to
pg_attribute/tuple descriptor arrays.  That makes it really an
internal subroutine of DefineRelation() and some related functions,
which is more the remit of tablecmds.c and doesn't have much to do
with the basic tuple descriptor interfaces in tupdesc.c.  This is also
supported by the header includes we can now remove from tupdesc.c.
By moving it over, we can also (in the future) make
BuildDescForRelation() use more internals of tablecmds.c that are not
sensible to be exposed in tupdesc.c.

Discussion: https://www.postgresql.org/message-id/flat/52a125e4-ff9a-95f5-9f61-b87cf447e4da@eisentraut.org
2023-10-05 16:20:46 +02:00
Heikki Linnakangas e29c464395 Refactor ListenSocket array.
Keep track of the used size of the array. That avoids looping through
the whole array in a few places. It doesn't matter from a performance
point of view since the array is small anyway, but this feels less
surprising and is a little less code. Now that we have an explicit
NumListenSockets variable that is statically initialized to 0, we
don't need the loop to initialize the array.

Allocate the array in PostmasterContext. The array isn't needed in
child processes, so this allows reusing that memory. We could easily
make the array resizable now, but we haven't heard any complaints
about the current 64 sockets limit.
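
A hedged sketch of the bookkeeping pattern (names approximate the postmaster
code rather than reproducing it):

    #define MAXLISTEN 64

    static pgsocket *ListenSockets;         /* palloc'd in PostmasterContext */
    static int      NumListenSockets = 0;   /* static zero: no init loop */

    static void
    add_listen_socket(pgsocket fd)
    {
        Assert(NumListenSockets < MAXLISTEN);
        ListenSockets[NumListenSockets++] = fd;
    }

    static void
    close_listen_sockets(void)
    {
        /* loop only over the used portion of the array */
        for (int i = 0; i < NumListenSockets; i++)
            closesocket(ListenSockets[i]);
        NumListenSockets = 0;
    }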

Discussion: https://www.postgresql.org/message-id/7bb7ad65-a018-2419-742f-fa5fd877d338@iki.fi
2023-10-05 15:05:25 +03:00
Alvaro Herrera 1c99cde2f3 Improve JsonLexContext's freeability
Previously, the JSON code didn't have to worry too much about freeing
JsonLexContext, because it was never too long-lived.  With new features
being added for SQL/JSON, this is no longer the case.  Add a routine
that knows how to free this struct, and apply it in a few places, to
prevent this from becoming problematic.

At the same time, we change the API of makeJsonLexContextCstringLen to
make it receive a pointer to JsonLexContext for callers that want it to
be stack-allocated; it can also be passed as NULL to get the original
behavior of a palloc'ed one.
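
A usage sketch of the two flavors; the name of the freeing routine is my
assumption of what was added here, as is the exact argument list:

    JsonLexContext  lex;        /* stack-allocated flavor */

    makeJsonLexContextCstringLen(&lex, json, strlen(json),
                                 GetDatabaseEncoding(), true);
    /* ... use the lexer ... */
    freeJsonLexContext(&lex);   /* releases internals; struct stays ours */

    /* Passing NULL keeps the original palloc'd behavior: */
    JsonLexContext *plex = makeJsonLexContextCstringLen(NULL, json, strlen(json),
                                                        GetDatabaseEncoding(), true);
    freeJsonLexContext(plex);   /* here the struct itself is freed too */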

This also causes an ABI break due to the addition of flags to
JsonLexContext, so we can't easily backpatch it.  AFAICS that's not much
of a problem; apparently some leaks might exist in JSON usage of
text-search, for example via json_to_tsvector, but I haven't seen any
complaints about that.

Per Coverity complaint about datum_to_jsonb_internal().

Discussion: https://postgr.es/m/20230808174110.oq3iymllsv6amkih@alvherre.pgsql
2023-10-05 10:59:08 +02:00
Peter Eisentraut 5e4282772a Remove RelationGetIndexRawAttOptions()
There was only one caller left, for which this function was overkill.

Also, having it in relcache.c was inappropriate, since it doesn't work
with the relcache at all.

Discussion: https://www.postgresql.org/message-id/flat/f84640e3-00d3-5abd-3f41-e6a19d33c40b@eisentraut.org
2023-10-03 17:51:02 +02:00