Commit Graph

2212 Commits

Author SHA1 Message Date
Heikki Linnakangas aafc05de1b Refactor postmaster child process launching
Introduce new postmaster_child_launch() function that deals with the
differences in EXEC_BACKEND mode.

Refactor the mechanism of passing information from the parent to child
process. Instead of using different command-line arguments when
launching the child process in EXEC_BACKEND mode, pass a
variable-length blob of startup data along with all the global
variables. The contents of that blob depend on the kind of child
process being launched. In !EXEC_BACKEND mode, we use the same blob,
but it's simply inherited from the parent to child process.

Reviewed-by: Tristan Partin, Andres Freund
Discussion: https://www.postgresql.org/message-id/7a59b073-5b5b-151e-7ed3-8b01ff7ce9ef@iki.fi
2024-03-18 11:35:30 +02:00
Dean Rasheed c649fa24a4 Add RETURNING support to MERGE.
This allows a RETURNING clause to be appended to a MERGE query, to
return values based on each row inserted, updated, or deleted. As with
plain INSERT, UPDATE, and DELETE commands, the returned values are
based on the new contents of the target table for INSERT and UPDATE
actions, and on its old contents for DELETE actions. Values from the
source relation may also be returned.

As with INSERT/UPDATE/DELETE, the output of MERGE ... RETURNING may be
used as the source relation for other operations such as WITH queries
and COPY commands.

Additionally, a special function merge_action() is provided, which
returns 'INSERT', 'UPDATE', or 'DELETE', depending on the action
executed for each row. The merge_action() function can be used
anywhere in the RETURNING list, including in arbitrary expressions and
subqueries, but it is an error to use it anywhere outside of a MERGE
query's RETURNING list.

Dean Rasheed, reviewed by Isaac Morland, Vik Fearing, Alvaro Herrera,
Gurjeet Singh, Jian He, Jeff Davis, Merlin Moncure, Peter Eisentraut,
and Wolfgang Walther.

Discussion: http://postgr.es/m/CAEZATCWePEGQR5LBn-vD6SfeLZafzEm2Qy_L_Oky2=qw2w3Pzg@mail.gmail.com
2024-03-17 13:58:59 +00:00
Peter Eisentraut d939cb2fd6 Generalize handling of nullable pg_attribute columns in DDL
DDL code uses tuple descriptors to pass around pg_attribute values
during table and index creation.  But tuple descriptors don't include
the variable-length/nullable columns of pg_attribute, so they have to
be handled separately.  Right now, the attoptions field is handled in
a one-off way with a separate argument passed to
InsertPgAttributeTuples().  The other affected fields of pg_attribute
are right now not needed at relation creation time.

The goal of this patch is to generalize this to allow handling
additional variable-length/nullable columns of pg_attribute in a
similar manner.  For that, create a new struct
FormExtraData_pg_attribute, which is to be passed around in parallel
to the tuple descriptor and optionally supplies the additional
columns.  Right now, this struct only contains one field for
attoptions, so no functionality is actually changed by this.

Reviewed-by: Tomas Vondra <tomas.vondra@enterprisedb.com>
Discussion: https://www.postgresql.org/message-id/flat/4da8d211-d54d-44b9-9847-f2a9f1184c76@eisentraut.org
2024-03-17 12:30:51 +01:00
Peter Eisentraut 6ab2e8385d Put genbki.pl output into src/include/catalog/ directly
With the makefile rules, the output of genbki.pl was written to
src/backend/catalog/, and then the header files were linked to
src/include/catalog/.

This changes it so that the output files are written directly to
src/include/catalog/.  This makes the logic simpler, and it also makes
the behavior consistent with the meson build system.  Also, the list
of catalog files is now kept in parallel in
src/include/catalog/{meson.build,Makefile}, while before the makefiles
had it in src/backend/catalog/Makefile.

Reviewed-by: Andreas Karlsson <andreas@proxel.se>
Discussion: https://www.postgresql.org/message-id/flat/21b74bdc-183d-4dd5-9c27-9378d178f459@eisentraut.org
2024-03-14 07:11:21 +01:00
Thomas Munro 0265e5c120 ci: Use a RAM disk and more CPUs on FreeBSD.
Run the tests in a RAM disk.  It's still a UFS file system and is backed
by 20GB of disk, but this avoids a lot of I/O.  Even though we disable
fsync, our tests do a lot of directory manipulations, some of which
force file system meta-data to disk and flush slow device write caches
on UFS.  This was a bottleneck preventing effective scaling beyond 2
CPUs.

Now we can use 4 CPUs like on other OSes, for a huge speedup.

Reviewed-by: Maxim Orlov <orlovmg@gmail.com>
Discussion: https://postgr.es/m/CA%2BhUKG%2BFXLcEg1dyTqJjDiNQ8pGom4KrJj4wF38C90thti9dVA%40mail.gmail.com
2024-03-13 14:58:27 +13:00
Alvaro Herrera 61461a300c
libpq: Add encrypted and non-blocking query cancellation routines
The existing PQcancel API uses blocking IO, which makes PQcancel
impossible to use in an event loop based codebase without blocking the
event loop until the call returns.  It also doesn't encrypt the
connection over which the cancel request is sent, even when the original
connection required encryption.

This commit adds a PQcancelConn struct and assorted functions, which
provide a better mechanism of sending cancel requests; in particular all
the encryption used in the original connection are also used in the
cancel connection.  The main entry points are:

- PQcancelCreate creates the PQcancelConn based on the original
  connection (but does not establish an actual connection).
- PQcancelStart can be used to initiate non-blocking cancel requests,
  using encryption if the original connection did so, which must be
  pumped using
- PQcancelPoll.
- PQcancelReset puts a PQcancelConn back in state so that it can be
  reused to send a new cancel request to the same connection.
- PQcancelBlocking is a simpler-to-use blocking API that still uses
  encryption.

Additional functions are
 - PQcancelStatus, mimicks PQstatus;
 - PQcancelSocket, mimicks PQcancelSocket;
 - PQcancelErrorMessage, mimicks PQerrorMessage;
 - PQcancelFinish, mimicks PQfinish.

Author: Jelte Fennema-Nio <postgres@jeltef.nl>
Reviewed-by: Denis Laxalde <denis.laxalde@dalibo.com>
Discussion: https://postgr.es/m/AM5PR83MB0178D3B31CA1B6EC4A8ECC42F7529@AM5PR83MB0178.EURPRD83.prod.outlook.com
2024-03-12 17:32:25 +01:00
Heikki Linnakangas 4945e4ed4a Move initialization of the Port struct to the child process
In postmaster, use a more lightweight ClientSocket struct that
encapsulates just the socket itself and the remote endpoint's address
that you get from accept() call. ClientSocket is passed to the child
process, which initializes the bigger Port struct. This makes it more
clear what information postmaster initializes, and what is left to the
child process.

Rename the StreamServerPort and StreamConnection functions to make it
more clear what they do. Remove StreamClose, replacing it with plain
closesocket() calls.

Reviewed-by: Tristan Partin, Andres Freund
Discussion: https://www.postgresql.org/message-id/7a59b073-5b5b-151e-7ed3-8b01ff7ce9ef@iki.fi
2024-03-12 13:42:38 +02:00
Michael Paquier 2c8118ee5d Use printf's %m format instead of strerror(errno) in more places
Most callers of strerror() are removed from the backend code.  The
remaining callers require special handling with a saved errno from a
previous system call.  The frontend code still needs strerror() where
error states need to be handled outside of fprintf.

Note that pg_regress is not changed to use %m as the TAP output may
clobber errno, since those functions call fprintf() and friends before
evaluating the format string.

Support for %m in src/port/snprintf.c has been added in d6c55de1f9,
hence all the stable branches currently supported include it.

Author: Dagfinn Ilmari Mannsåker
Discussion: https://postgr.es/m/87sf13jhuw.fsf@wibble.ilmari.org
2024-03-12 10:02:54 +09:00
Peter Eisentraut 7b8e2ae2fd Combine headerscheck and cpluspluscheck scripts
They are mostly the same, and it is tedious to maintain two copies of
essentially the same exclude list.  headerscheck now has a new option
--cplusplus to select the cpluspluscheck functionality.  The top-level
make targets are still the same.

Reviewed-by: Thomas Munro <thomas.munro@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://www.postgresql.org/message-id/flat/4754a5b0-a32b-4036-a99a-6de14cf9fd72@eisentraut.org
2024-03-10 07:56:17 +01:00
Amit Kapila bf279ddd1c Introduce a new GUC 'standby_slot_names'.
This patch provides a way to ensure that physical standbys that are
potential failover candidates have received and flushed changes before
the primary server making them visible to subscribers. Doing so guarantees
that the promoted standby server is not lagging behind the subscribers
when a failover is necessary.

The logical walsender now guarantees that all local changes are sent and
flushed to the standby servers corresponding to the replication slots
specified in 'standby_slot_names' before sending those changes to the
subscriber.

Additionally, the SQL functions pg_logical_slot_get_changes,
pg_logical_slot_peek_changes and pg_replication_slot_advance are modified
to ensure that they process changes for failover slots only after physical
slots specified in 'standby_slot_names' have confirmed WAL receipt for those.

Author: Hou Zhijie and Shveta Malik
Reviewed-by: Masahiko Sawada, Peter Smith, Bertrand Drouvot, Ajin Cherian, Nisha Moond, Amit Kapila
Discussion: https://postgr.es/m/514f6f2f-6833-4539-39f1-96cd1e011f23@enterprisedb.com
2024-03-08 08:10:45 +05:30
John Naylor ee1b30f128 Add template for adaptive radix tree
This implements a radix tree data structure based on the design in
"The Adaptive Radix Tree: ARTful Indexing for Main-Memory Databases"
by Viktor Leis, Alfons Kemper, and ThomasNeumann, 2013. The main
technique that makes it adaptive is using several different node types,
each with a different capacity of elements, and a different algorithm
for accessing them. The nodes start small and grow/shrink as needed.

The main advantage over hash tables is efficient sorted iteration and
better memory locality when successive keys are lexicographically
close together. The implementation currently assumes 64-bit integer
keys, and traversing the tree is in general slower than a linear
probing hash table, so this is not a general-purpose associative array.

The paper describes two other techniques not implemented here,
namely "path compression" and "lazy expansion". These can further
reduce memory usage and speed up traversal, but the former would add
significant complexity and the latter requires storing the full key
with the value. We do trivially compress the path when leading bytes
of the key are zeros, however.

For value storage, we use "combined pointer/value slots", as
recommended in the paper. Values of size equal or smaller than the the
platform's pointer type are stored in the array of child pointers in
the last level node, while larger values are each stored in a separate
allocation. This is for now fixed at compile time, but it would be
fairly trivial to allow determining at runtime how variable-length
values are stored.

One innovation in our implementation compared to the ART paper is
decoupling the notion of node "size class" from "kind". The size
classes within a given node kind have the same underlying type, but
a variable capacity for children, so we can introduce additional node
sizes with little additional code.

To enable different use cases to specialize for different value types
and for shared/local memory, we use macro-templatized code generation
in the same manner as simplehash.h and sort_template.h.

Future commits will use this infrastructure for storing TIDs.

Patch by Masahiko Sawada and John Naylor, but a substantial amount of
credit is due to Andres Freund, whose proof-of-concept was a valuable
source of coding idioms and awareness of performance pitfalls, and
who reviewed earlier versions.

Discussion: https://postgr.es/m/CAD21AoAfOZvmfR0j8VmZorZjL7RhTiQdVttNuC4W-Shdc2a-AA%40mail.gmail.com
2024-03-07 12:40:11 +07:00
Heikki Linnakangas 067701f577 Remove MyAuxProcType, use MyBackendType instead
MyAuxProcType was redundant with MyBackendType.

Reviewed-by: Reid Thompson, Andres Freund
Discussion: https://www.postgresql.org/message-id/f3ecd4cb-85ee-4e54-8278-5fabfb3a4ed0@iki.fi
2024-03-04 10:25:09 +02:00
Michael Paquier 37b369dc67 injection_points: Add wait and wakeup of processes
This commit adds two features to the in-core module for injection
points:
- A new callback called "wait" that can be attached to an injection
point to make it wait.
- A new SQL function to update the shared state and broadcast the update
using a condition variable.  This function uses an input an injection
point name.

This offers the possibility to stop a process in flight and wake it up
in a controlled manner, which is useful when implementing tests that aim
to trigger scenarios for race conditions (some tests are planned for
integration).  The logic uses a set of counters with a condition
variable to monitor and broadcast the changes.  Up to 8 waits can be
registered in a single run, which should be plenty enough.  Waits can be
monitored in pg_stat_activity, based on the injection point name which
is registered in a custom wait event under the "Extension" category.

The shared memory state used by the module is registered using the DSM
registry, and is optional, so there is no need to load the module with
shared_preload_libraries to be able to use these features.

Author: Michael Paquier
Reviewed-by: Andrey Borodin, Bertrand Drouvot
Discussion: https://postgr.es/m/ZdLuxBk5hGpol91B@paquier.xyz
2024-03-04 09:19:13 +09:00
Michael Paquier 655dc31046 Simplify pg_enc2gettext_tbl[] with C99-designated initializer syntax
This commit switches pg_enc2gettext_tbl[] in encnames.c to use a
C99-designated initializer syntax.

pg_bind_textdomain_codeset() is simplified so as it is possible to do
a direct lookup at the gettext() array with a value of the enum pg_enc
rather than doing a loop through all its elements, as long as the
encoding value provided by GetDatabaseEncoding() is in the correct range
of supported encoding values.  Note that PG_MULE_INTERNAL gains a value
in the array, pointing to NULL.

Author: Jelte Fennema-Nio
Discussion: https://postgr.es/m/CAGECzQT3caUbcCcszNewCCmMbCuyP7XNAm60J3ybd6PN5kH2Dw@mail.gmail.com
2024-03-01 18:03:48 +09:00
Heikki Linnakangas 0b16bb8776 Remove AIX support
There isn't a lot of user demand for AIX support, we have a bunch of
hacks to work around AIX-specific compiler bugs and idiosyncrasies,
and no one has stepped up to the plate to properly maintain it.
Remove support for AIX to get rid of that maintenance overhead. It's
still supported for stable versions.

The acute issue that triggered this decision was that after commit
8af2565248, the AIX buildfarm members have been hitting this
assertion:

    TRAP: failed Assert("(uintptr_t) buffer == TYPEALIGN(PG_IO_ALIGN_SIZE, buffer)"), File: "md.c", Line: 472, PID: 2949728

Apperently the "pg_attribute_aligned(a)" attribute doesn't work on AIX
for values larger than PG_IO_ALIGN_SIZE, for a static const variable.
That could be worked around, but we decided to just drop the AIX support
instead.

Discussion: https://www.postgresql.org/message-id/20240224172345.32@rfd.leadboat.com
Reviewed-by: Andres Freund, Noah Misch, Thomas Munro
2024-02-28 15:17:23 +04:00
Heikki Linnakangas 8af2565248 Introduce a new smgr bulk loading facility.
The new facility makes it easier to optimize bulk loading, as the
logic for buffering, WAL-logging, and syncing the relation only needs
to be implemented once. It's also less error-prone: We have had a
number of bugs in how a relation is fsync'd - or not - at the end of a
bulk loading operation. By centralizing that logic to one place, we
only need to write it correctly once.

The new facility is faster for small relations: Instead of of calling
smgrimmedsync(), we register the fsync to happen at next checkpoint,
which avoids the fsync latency. That can make a big difference if you
are e.g. restoring a schema-only dump with lots of relations.

It is also slightly more efficient with large relations, as the WAL
logging is performed multiple pages at a time. That avoids some WAL
header overhead. The sorted GiST index build did that already, this
moves the buffering to the new facility.

The changes to pageinspect GiST test needs an explanation: Before this
patch, the sorted GiST index build set the LSN on every page to the
special GistBuildLSN value, not the LSN of the WAL record, even though
they were WAL-logged. There was no particular need for it, it just
happened naturally when we wrote out the pages before WAL-logging
them. Now we WAL-log the pages first, like in B-tree build, so the
pages are stamped with the record's real LSN. When the build is not
WAL-logged, we still use GistBuildLSN. To make the test output
predictable, use an unlogged index.

Reviewed-by: Andres Freund
Discussion: https://www.postgresql.org/message-id/30e8f366-58b3-b239-c521-422122dd5150%40iki.fi
2024-02-23 16:10:51 +02:00
Amit Kapila ddd5f4f54a Add a slot synchronization function.
This commit introduces a new SQL function pg_sync_replication_slots()
which is used to synchronize the logical replication slots from the
primary server to the physical standby so that logical replication can be
resumed after a failover or planned switchover.

A new 'synced' flag is introduced in pg_replication_slots view, indicating
whether the slot has been synchronized from the primary server. On a
standby, synced slots cannot be dropped or consumed, and any attempt to
perform logical decoding on them will result in an error.

The logical replication slots on the primary can be synchronized to the
hot standby by using the 'failover' parameter of
pg-create-logical-replication-slot(), or by using the 'failover' option of
CREATE SUBSCRIPTION during slot creation, and then calling
pg_sync_replication_slots() on standby. For the synchronization to work,
it is mandatory to have a physical replication slot between the primary
and the standby aka 'primary_slot_name' should be configured on the
standby, and 'hot_standby_feedback' must be enabled on the standby. It is
also necessary to specify a valid 'dbname' in the 'primary_conninfo'.

If a logical slot is invalidated on the primary, then that slot on the
standby is also invalidated.

If a logical slot on the primary is valid but is invalidated on the
standby, then that slot is dropped but will be recreated on the standby in
the next pg_sync_replication_slots() call provided the slot still exists
on the primary server. It is okay to recreate such slots as long as these
are not consumable on standby (which is the case currently). This
situation may occur due to the following reasons:
- The 'max_slot_wal_keep_size' on the standby is insufficient to retain
WAL records from the restart_lsn of the slot.
- 'primary_slot_name' is temporarily reset to null and the physical slot
is removed.

The slot synchronization status on the standby can be monitored using the
'synced' column of pg_replication_slots view.

A functionality to automatically synchronize slots by a background worker
and allow logical walsenders to wait for the physical will be done in
subsequent commits.

Author: Hou Zhijie, Shveta Malik, Ajin Cherian based on an earlier version by Peter Eisentraut
Reviewed-by: Masahiko Sawada, Bertrand Drouvot, Peter Smith, Dilip Kumar, Nisha Moond, Kuroda Hayato, Amit Kapila
Discussion: https://postgr.es/m/514f6f2f-6833-4539-39f1-96cd1e011f23@enterprisedb.com
2024-02-14 09:45:36 +05:30
Peter Eisentraut 428e2de1b8 Remove obsolete script related to MSVC build system 2024-02-11 23:43:50 +01:00
Michael Paquier 3e91dba8b0 Fix various issues with ALTER TEXT SEARCH CONFIGURATION
This commit addresses a set of issues when changing token type mappings
in a text search configuration when using duplicated token names:
- ADD MAPPING would fail on insertion because of a constraint failure
after inserting the same mapping.
- ALTER MAPPING with an "overridden" configuration failed with "tuple
already updated by self" when the token mappings are removed.
- DROP MAPPING failed with "tuple already updated by self", like
previously, but in a different code path.

The code is refactored so the token names (with their numbers) are
handled as a List with unique members rather than an array with numbers,
ensuring that no duplicates mess up with the catalog inserts, updates
and deletes.  The list is generated by getTokenTypes(), with the same
error handling as previously while duplicated tokens are discarded from
the list used to work on the catalogs.

Regression tests are expanded to cover much more ground for the cases
fixed by this commit, as there was no coverage for the code touched in
this commit.  A bit more is done regarding the fact that a token name
not supported by a configuration's parser should result in an error even
if IF EXISTS is used in a DROP MAPPING clause.  This is implied in the
code but there was no coverage for that, and it was very easy to miss.

These issues exist since at least their introduction in core with
140d4ebcb4, so backpatch all the way down.

Reported-by: Alexander Lakhin
Author: Tender Wang, Michael Paquier
Discussion: https://postgr.es/m/18310-1eb233c5908189c8@postgresql.org
Backpatch-through: 12
2024-01-31 13:15:21 +09:00
Amit Kapila 7329240437 Allow setting failover property in the replication command.
This commit implements a new replication command called
ALTER_REPLICATION_SLOT and a corresponding walreceiver API function named
walrcv_alter_slot. Additionally, the CREATE_REPLICATION_SLOT command has
been extended to support the failover option.

These new additions allow the modification of the failover property of a
replication slot on the publisher. A subsequent commit will make use of
these commands in subscription commands and will add the tests as well to
cover the functionality added/changed by this commit.

Author: Hou Zhijie, Shveta Malik
Reviewed-by: Peter Smith, Bertrand Drouvot, Dilip Kumar, Masahiko Sawada, Nisha Moond, Kuroda, Hayato, Amit Kapila
Discussion: https://postgr.es/m/514f6f2f-6833-4539-39f1-96cd1e011f23@enterprisedb.com
2024-01-29 09:37:23 +05:30
Masahiko Sawada 08e6344fd6 Remove ReorderBufferTupleBuf structure.
Since commit a4ccc1cef, the 'node' and 'alloc_tuple_size' fields of
the ReorderBufferTupleBuf structure are no longer used. This leaves
only the 'tuple' field in the structure. Since keeping a single-field
structure makes little sense, the ReorderBufferTupleBuf is removed
entirely. The code is refactored accordingly.

No back-patching since these are ABI changes in an exposed structure
and functions, and there would be some risk of breaking extensions.

Author: Aleksander Alekseev
Reviewed-by: Amit Kapila, Masahiko Sawada, Reid Thompson
Discussion: https://postgr.es/m/CAD21AoCvnuxiXXfRecp7g9+CeC35POQfhuQeJFr7_9u_Q5jc_Q@mail.gmail.com
2024-01-29 10:37:16 +09:00
Peter Eisentraut 9b1a6f50b9 Generate syscache info from catalog files
Add a new genbki macros MAKE_SYSCACHE that specifies the syscache ID
macro, the underlying index, and the number of buckets.  From that, we
can generate the existing tables in syscache.h and syscache.c via
genbki.pl.

Reviewed-by: John Naylor <johncnaylorls@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/75ae5875-3abc-dafc-8aec-73247ed41cde@eisentraut.org
2024-01-23 07:31:06 +01:00
Michael Paquier d86d20f0ba Add backend support for injection points
Injection points are a new facility that makes possible for developers
to run custom code in pre-defined code paths.  Its goal is to provide
ways to design and run advanced tests, for cases like:
- Race conditions, where processes need to do actions in a controlled
ordered manner.
- Forcing a state, like an ERROR, FATAL or even PANIC for OOM, to force
recovery, etc.
- Arbitrary sleeps.

This implements some basics, and there are plans to extend it more in
the future depending on what's required.  Hence, this commit adds a set
of routines in the backend that allows developers to attach, detach and
run injection points:
- A code path calling an injection point can be declared with the macro
INJECTION_POINT(name).
- InjectionPointAttach() and InjectionPointDetach() to respectively
attach and detach a callback to/from an injection point.  An injection
point name is registered in a shmem hash table with a library name and a
function name, which will be used to load the callback attached to an
injection point when its code path is run.

Injection point names are just strings, so as an injection point can be
declared and run by out-of-core extensions and modules, with callbacks
defined in external libraries.

This facility is hidden behind a dedicated switch for ./configure and
meson, disabled by default.

Note that backends use a local cache to store callbacks already loaded,
cleaning up their cache if a callback has found to be removed on a
best-effort basis.  This could be refined further but any tests but what
we have here was fine with the tests I've written while implementing
these backend APIs.

Author: Michael Paquier, with doc suggestions from Ashutosh Bapat.
Reviewed-by: Ashutosh Bapat, Nathan Bossart, Álvaro Herrera, Dilip
Kumar, Amul Sul, Nazir Bilal Yavuz
Discussion: https://postgr.es/m/ZTiV8tn_MIb_H2rE@paquier.xyz
2024-01-22 10:15:50 +09:00
Alexander Korotkov 0452b461bc Explore alternative orderings of group-by pathkeys during optimization.
When evaluating a query with a multi-column GROUP BY clause, we can minimize
sort operations or avoid them if we synchronize the order of GROUP BY clauses
with the ORDER BY sort clause or sort order, which comes from the underlying
query tree. Grouping does not imply any ordering, so we can compare
the keys in arbitrary order, and a Hash Agg leverages this. But for Group Agg,
we simply compared keys in the order specified in the query. This commit
explores alternative ordering of the keys, trying to find a cheaper one.

The ordering of group keys may interact with other parts of the query, some of
which may not be known while planning the grouping. For example, there may be
an explicit ORDER BY clause or some other ordering-dependent operation higher up
in the query, and using the same ordering may allow using either incremental
sort or even eliminating the sort entirely.

The patch always keeps the ordering specified in the query, assuming the user
might have additional insights.

This introduces a new GUC enable_group_by_reordering so that the optimization
may be disabled if needed.

Discussion: https://postgr.es/m/7c79e6a5-8597-74e8-0671-1c39d124c9d6%40sigaev.ru
Author: Andrei Lepikhov, Teodor Sigaev
Reviewed-by: Tomas Vondra, Claudio Freire, Gavin Flower, Dmitry Dolgov
Reviewed-by: Robert Haas, Pavel Borisov, David Rowley, Zhihong Yu
Reviewed-by: Tom Lane, Alexander Korotkov, Richard Guo, Alena Rybakina
2024-01-21 22:21:36 +02:00
Nathan Bossart 8b2bcf3f28 Introduce the dynamic shared memory registry.
Presently, the most straightforward way for a shared library to use
shared memory is to request it at server startup via a
shmem_request_hook, which requires specifying the library in
shared_preload_libraries.  Alternatively, the library can create a
dynamic shared memory (DSM) segment, but absent a shared location
to store the segment's handle, other backends cannot use it.  This
commit introduces a registry for DSM segments so that these other
backends can look up existing segments with a library-specified
string.  This allows libraries to easily use shared memory without
needing to request it at server startup.

The registry is accessed via the new GetNamedDSMSegment() function.
This function handles allocating the segment and initializing it
via a provided callback.  If another backend already created and
initialized the segment, it simply attaches the segment.
GetNamedDSMSegment() locks the registry appropriately to ensure
that only one backend initializes the segment and that all other
backends just attach it.

The registry itself is comprised of a dshash table that stores the
DSM segment handles keyed by a library-specified string.

Reviewed-by: Michael Paquier, Andrei Lepikhov, Nikita Malakhov, Robert Haas, Bharath Rupireddy, Zhang Mingli, Amul Sul
Discussion: https://postgr.es/m/20231205034647.GA2705267%40nathanxps13
2024-01-19 14:24:36 -06:00
Alexander Korotkov b725b7eec4 Rename COPY option from SAVE_ERROR_TO to ON_ERROR
The option names now are "stop" (default) and "ignore".  The future options
could be "file 'filename.log'" and "table 'tablename'".

Discussion: https://postgr.es/m/20240117.164859.2242646601795501168.horikyota.ntt%40gmail.com
Author: Jian He
Reviewed-by: Atsushi Torikoshi
2024-01-19 15:15:51 +02:00
John Naylor e97b672c88 Add inline incremental hash functions for in-memory use
It can be useful for a hash function to expose separate initialization,
accumulation, and finalization steps.  In particular, this is useful
for building inline hash functions for simplehash.  Instead of trying
to whack around hash_bytes while maintaining its current behavior on
all platforms, we base this work on fasthash (MIT licensed) which
is simple, faster than hash_bytes for inputs over 12 bytes long,
and also passes the hash function testing suite SMHasher.

The fasthash functions have been reimplemented using our added-on
incremental interface to validate that this method will still give
the same answer, provided we have the input length ahead of time.

This functionality lives in a new header hashfn_unstable.h. The name
implies we have the freedom to change things across versions that
would be unacceptable for our other hash functions that are used for
e.g. hash indexes and hash partitioning. As such, these should only
be used for in-memory data structures like hash tables. There is also
no guarantee of being independent of endianness or pointer size.

As demonstration, use fasthash for pgstat_hash_hash_key.  Previously
this called the 32-bit murmur finalizer on the three elements,
then joined them with hash_combine(). The new function is simpler,
faster and takes up less binary space. While the collision and bias
behavior were almost certainly fine with the previous coding, now we
have objective confidence of that.

There are other places that could benefit from this, but that is left
for future work.

Reviewed by Jeff Davis, Heikki Linnakangas, Jian He, Junwang Zhao
Credit to Andres Freund for the idea

Discussion: https://postgr.es/m/20231122223432.lywt4yz2bn7tlp27%40awork3.anarazel.de
2024-01-19 12:44:09 +07:00
Robert Haas e313a61137 Remove LVPagePruneState.
Commit cb970240f1 moved some code from
lazy_scan_heap() to lazy_scan_prune(), and now some things that used to
need to be passed back and forth are completely local to lazy_scan_prune().
Hence, this struct is mostly obsolete.  The only thing that still
needs to be passed back to the caller is has_lpdead_items, and that's
also passed back by lazy_scan_noprune(), so do it the same way in both
cases.

Melanie Plageman, reviewed and slightly revised by me.

Discussion: http://postgr.es/m/CAAKRu_aM=OL85AOr-80wBsCr=vLVzhnaavqkVPRkFBtD0zsuLQ@mail.gmail.com
2024-01-18 15:17:09 -05:00
Alexander Korotkov 9e2d870119 Add new COPY option SAVE_ERROR_TO
Currently, when source data contains unexpected data regarding data type or
range, the entire COPY fails. However, in some cases, such data can be ignored
and just copying normal data is preferable.

This commit adds a new option SAVE_ERROR_TO, which specifies where to save the
error information. When this option is specified, COPY skips soft errors and
continues copying.

Currently, SAVE_ERROR_TO only supports "none". This indicates error information
is not saved and COPY just skips the unexpected data and continues running.

Later works are expected to add more choices, such as 'log' and 'table'.

Author: Damir Belyalov, Atsushi Torikoshi, Alex Shulgin, Jian He
Discussion: https://postgr.es/m/87k31ftoe0.fsf_-_%40commandprompt.com
Reviewed-by: Pavel Stehule, Andres Freund, Tom Lane, Daniel Gustafsson,
Reviewed-by: Alena Rybakina, Andy Fan, Andrei Lepikhov, Masahiko Sawada
Reviewed-by: Vignesh C, Atsushi Torikoshi
2024-01-16 23:08:53 +02:00
Robert Haas ee1bfd1683 Add new pg_walsummary tool.
This can dump the contents of the WAL summary files found in
pg_wal/summaries. Normally, this shouldn't really be something anyone
needs to do, but it may be needed for debugging problems with
incremental backup, or could possibly be useful to external tools.

Discussion: http://postgr.es/m/CA+Tgmobvqqj-DW9F7uUzT-cQqs6wcVb-Xhs=w=hzJnXSE-kRGw@mail.gmail.com
2024-01-11 12:48:27 -05:00
Andrew Dunstan dbad1c53e9 Add copyright notices to a few perl scripts that don't have them 2024-01-05 13:15:50 +00:00
Bruce Momjian 29275b1d17 Update copyright for 2024
Reported-by: Michael Paquier

Discussion: https://postgr.es/m/ZZKTDPxBBMt3C0J9@paquier.xyz

Backpatch-through: 12
2024-01-03 20:49:05 -05:00
Michael Paquier 359a3c586d Fix some typos
Author: Dagfinn Ilmari Mannsåker
Reviewed-by: Shubham Khanna
Discussion: https://postgr.es/m/87le9fmi01.fsf@wibble.ilmari.org
2024-01-03 14:22:54 +09:00
Tom Lane ae6bc39317 Minor fixes for search path cache code.
Avoid leaving a dangling pointer in the unlikely event that
nsphash_create fails.  Improve comments, and fix formatting by
adding typedefs.list entries.

Discussion: https://postgr.es/m/3972900.1704145107@sss.pgh.pa.us
2024-01-02 14:57:21 -05:00
Robert Haas e62e73f3a2 gist: fix typo "split(t)ed" -> "split"
Dagfinn Ilmari Mannsåker, reviewed by Shubham Khanna.

Discussion: http://postgr.es/m/87le9fmi01.fsf@wibble.ilmari.org
2024-01-02 12:24:28 -05:00
Amit Kapila 9a17be1e24 Allow upgrades to preserve the full subscription's state.
This feature will allow us to replicate the changes on subscriber nodes
after the upgrade.

Previously, only the subscription metadata information was preserved.
Without the list of relations and their state, it's not possible to
re-enable the subscriptions without missing some records as the list of
relations can only be refreshed after enabling the subscription (and
therefore starting the apply worker).  Even if we added a way to refresh
the subscription while enabling a publication, we still wouldn't know
which relations are new on the publication side, and therefore should be
fully synced, and which shouldn't.

To preserve the subscription relations, this patch teaches pg_dump to
restore the content of pg_subscription_rel from the old cluster by using
binary_upgrade_add_sub_rel_state SQL function. This is supported only
in binary upgrade mode.

The subscription's replication origin is needed to ensure that we don't
replicate anything twice.

To preserve the replication origins, this patch teaches pg_dump to update
the replication origin along with creating a subscription by using
binary_upgrade_replorigin_advance SQL function to restore the
underlying replication origin remote LSN. This is supported only in
binary upgrade mode.

pg_upgrade will check that all the subscription relations are in 'i'
(init) or in 'r' (ready) state and will error out if that's not the case,
logging the reason for the failure. This helps to avoid the risk of any
dangling slot or origin after the upgrade.

Author: Vignesh C, Julien Rouhaud, Shlok Kyal
Reviewed-by: Peter Smith, Masahiko Sawada, Michael Paquier, Amit Kapila, Hayato Kuroda
Discussion: https://postgr.es/m/20230217075433.u5mjly4d5cr4hcfe@jrouhaud
2024-01-02 08:08:46 +05:30
Peter Eisentraut cea89c93a1 Turn AT_PASS_* macros into an enum
This make this code simpler and easier to follow.  Also, patches that
want to change the passes won't have to renumber the whole list.

Reviewed-by: Amul Sul <sulamul@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/CAAJ_b94yyJeGA-5M951_Lr+KfZokOp-2kXicpmEhi5FXhBeTog@mail.gmail.com
2024-01-01 19:17:18 +01:00
Michael Paquier a99009a9a3 Exclude files generated by generate-wait_event_types.pl from pgindent
The format of these files becomes arguably worse after being indented,
and, as they are generated, there is no point in applying an indentation
anyway.

Author: Bharath Rupireddy
Discussion: https://postgr.es/m/CALj2ACW2JUocmieuR3n9AXL4iSsHcL1LmNkiukuFRUvKNMoiKg@mail.gmail.com
2023-12-31 18:06:56 +09:00
Tomas Vondra 6c63bcbf3c Minor cleanup of the BRIN parallel build code
Commit b437571714 added support for parallel builds for BRIN indexes,
using code similar to BTREE parallel builds, and also a new tuplesort
variant. This commit simplifies the new code in two ways:

* The "spool" grouping tuplesort and the heap/index is not necessary.
  The heap/index are available as separate arguments, causing confusion.
  So remove the spool, and use the tuplesort directly.

* The new tuplesort variant does not need the heap/index, as it sorts
  simply by the range block number, without accessing the tuple data.
  So simplify that too.

Initial report and patch by Ranier Vilela, further cleanup by me.

Author: Ranier Vilela
Discussion: https://postgr.es/m/CAEudQAqD7f2i4iyEaAz-5o-bf6zXVX-AkNUBm-YjUXEemaEh6A%40mail.gmail.com
2023-12-30 23:15:04 +01:00
Peter Eisentraut c538592959 Make all Perl warnings fatal
There are a lot of Perl scripts in the tree, mostly code generation
and TAP tests.  Occasionally, these scripts produce warnings.  These
are probably always mistakes on the developer side (true positives).
Typical examples are warnings from genbki.pl or related when you make
a mess in the catalog files during development, or warnings from tests
when they massage a config file that looks different on different
hosts, or mistakes during merges (e.g., duplicate subroutine
definitions), or just mistakes that weren't noticed because there is a
lot of output in a verbose build.

This changes all warnings into fatal errors, by replacing

    use warnings;

by

    use warnings FATAL => 'all';

in all Perl files.

Discussion: https://www.postgresql.org/message-id/flat/06f899fd-1826-05ab-42d6-adeb1fd5e200%40eisentraut.org
2023-12-29 18:20:00 +01:00
Tom Lane e2b73f4a4d Stop generating plain-text INSTALL instructions.
Up to now, our distribution tarballs have included a plain-text form
of the installation.sgml chapter.  The rationale for that was that a
recipient might not have either ready internet access or HTML-viewing
tools; a theory that seems downright quaint today.  Maintaining the
ability to generate this file is not without cost, because it puts
special requirements on installation.sgml that are often overlooked.
Moreover, we are moving in the direction of making our distribution
tarballs be pure git snapshots for traceability/reproducibility
reasons; including generated files doesn't fit into that plan.
Hence, let's just drop INSTALL and remove the infrastructure for
generating it.  The top-level README will now recommend visiting
our website to see the installation instructions.  As a useful
side-effect, we can get rid of README.git which has provoked
confusion.

Discussion: https://postgr.es/m/20231220114927.faccqqprmuyrzdip@alap3.anarazel.de
Discussion: https://postgr.es/m/e07408d9-e5f2-d9fd-5672-f53354e9305e@eisentraut.org
2023-12-22 13:32:15 -05:00
Andrew Dunstan 8ddf9c1dc0 Make win32tzlist.pl checkable again
Commit 1301c80b21 removed some infrastructure needed to check
windows-oriented perl scripts. It also removed most such scripts, but
this one was left over. We repair the damage by making Win32::Registry a
conditional requirement that is only loaded on Windows. With this change
`perl -cw win32tzlist.pl` once again passes on non-Windows machines.

Discussion: https://postgr.es/m/a2bd77fd-61b8-4c2b-b12e-3e22ae260f82@eisentraut.org
2023-12-22 13:56:27 +00:00
Andrew Dunstan 387aecc948 Rename pgindent options
--show-diff becomes --diff, and --silent-diff becomes --check. These
options may now be given together. Without --check, --diff will exit
with a zero status even if diffs are found. With --check, it will now
exit with a non-zero status in that case.

Author: Tristan Partin
Reviewed-by: Daniel Gustafsson, Jelte Fennema-Nio

Discussion: https://postgr.es/m/CXLX2XYTH9S6.140SC6Y61VD88@neon.tech
2023-12-20 22:37:57 +00:00
Robert Haas dc21234005 Add support for incremental backup.
To take an incremental backup, you use the new replication command
UPLOAD_MANIFEST to upload the manifest for the prior backup. This
prior backup could either be a full backup or another incremental
backup.  You then use BASE_BACKUP with the INCREMENTAL option to take
the backup.  pg_basebackup now has an --incremental=PATH_TO_MANIFEST
option to trigger this behavior.

An incremental backup is like a regular full backup except that
some relation files are replaced with files with names like
INCREMENTAL.${ORIGINAL_NAME}, and the backup_label file contains
additional lines identifying it as an incremental backup. The new
pg_combinebackup tool can be used to reconstruct a data directory
from a full backup and a series of incremental backups.

Patch by me.  Reviewed by Matthias van de Meent, Dilip Kumar, Jakub
Wartak, Peter Eisentraut, and Álvaro Herrera. Thanks especially to
Jakub for incredibly helpful and extensive testing.

Discussion: http://postgr.es/m/CA+TgmoYOYZfMCyOXFyC-P+-mdrZqm5pP2N7S-r0z3_402h9rsA@mail.gmail.com
2023-12-20 09:49:12 -05:00
Robert Haas 174c480508 Add a new WAL summarizer process.
When active, this process writes WAL summary files to
$PGDATA/pg_wal/summaries. Each summary file contains information for a
certain range of LSNs on a certain TLI. For each relation, it stores a
"limit block" which is 0 if a relation is created or destroyed within
a certain range of WAL records, or otherwise the shortest length to
which the relation was truncated during that range of WAL records, or
otherwise InvalidBlockNumber. In addition, it stores a list of blocks
which have been modified during that range of WAL records, but
excluding blocks which were removed by truncation after they were
modified and never subsequently modified again.

In other words, it tells us which blocks need to copied in case of an
incremental backup covering that range of WAL records. But this
doesn't yet add the capability to actually perform an incremental
backup; the next patch will do that.

A new parameter summarize_wal enables or disables this new background
process.  The background process also automatically deletes summary
files that are older than wal_summarize_keep_time, if that parameter
has a non-zero value and the summarizer is configured to run.

Patch by me, with some design help from Dilip Kumar and Andres Freund.
Reviewed by Matthias van de Meent, Dilip Kumar, Jakub Wartak, Peter
Eisentraut, and Álvaro Herrera.

Discussion: http://postgr.es/m/CA+TgmoYOYZfMCyOXFyC-P+-mdrZqm5pP2N7S-r0z3_402h9rsA@mail.gmail.com
2023-12-20 08:42:28 -05:00
Michael Paquier 1301c80b21 Remove MSVC scripts
This commit removes all the scripts located in src/tools/msvc/ to build
PostgreSQL with Visual Studio on Windows, meson becoming the recommended
way to achieve that.  The scripts held some information that is still
relevant with meson, information kept and moved to better locations.
Comments that referred directly to the scripts are removed.

All the documentation still relevant that was in install-windows.sgml
has been moved to installation.sgml under a new subsection for Visual.
All the content specific to the scripts is removed.  Some adjustments
for the documentation are planned in a follow-up set of changes.

Author: Michael Paquier
Reviewed-by: Peter Eisentraut, Andres Freund
Discussion: https://postgr.es/m/ZQzp_VMJcerM1Cs_@paquier.xyz
2023-12-20 09:44:37 +09:00
Jeff Davis 867dd2dc87 Cache opaque handle for GUC option to avoid repeasted lookups.
When setting GUCs from proconfig, performance is important, and hash
lookups in the GUC table are significant.

Per suggestion from Robert Haas.

Discussion: https://postgr.es/m/CA+TgmoYpKxhR3HOD9syK2XwcAUVPa0+ba0XPnwWBcYxtKLkyxA@mail.gmail.com
Reviewed-by: John Naylor
2023-12-08 11:16:01 -08:00
Tomas Vondra b437571714 Allow parallel CREATE INDEX for BRIN indexes
Allow using multiple worker processes to build BRIN index, which until
now was supported only for BTREE indexes. For large tables this often
results in significant speedup when the build is CPU-bound.

The work is split in a simple way - each worker builds BRIN summaries on
a subset of the table, determined by the regular parallel scan used to
read the data, and feeds them into a shared tuplesort which sorts them
by blkno (start of the range). The leader then reads this sorted stream
of ranges, merges duplicates (which may happen if the parallel scan does
not align with BRIN pages_per_range), and adds the resulting ranges into
the index.

The number of duplicate results produced by workers (requiring merging
in the leader process) should be fairly small, thanks to how parallel
scans assign chunks to workers. The likelihood of duplicate results may
increase for higher pages_per_range values, but then there are fewer
page ranges in total. In any case, we expect the merging to be much
cheaper than summarization, so this should be a win.

Most of the parallelism infrastructure is a simplified copy of the code
used by BTREE indexes, omitting the parts irrelevant for BRIN indexes
(e.g. uniqueness checks).

This also introduces a new index AM flag amcanbuildparallel, determining
whether to attempt to start parallel workers for the index build.

Original patch by me, with reviews and substantial reworks by Matthias
van de Meent, certainly enough to make him a co-author.

Author: Tomas Vondra, Matthias van de Meent
Reviewed-by: Matthias van de Meent
Discussion: https://postgr.es/m/c2ee7d69-ce17-43f2-d1a0-9811edbda6e6%40enterprisedb.com
2023-12-08 18:15:26 +01:00
Heikki Linnakangas b31ba5310b Rename ShmemVariableCache to TransamVariables
The old name was misleading: It's not a cache, the values kept in the
struct are the authoritative source.

Reviewed-by: Tristan Partin, Richard Guo
Discussion: https://www.postgresql.org/message-id/6537d63d-4bb5-46f8-9b5d-73a8ba4720ab@iki.fi
2023-12-08 09:47:15 +02:00
Robert Haas d463aa06a9 Rename JsonManifestParseContext callbacks.
There is no real reason to just run multiple words together when
we can instead punctuate them with marks intended for that purpose.

Per suggestion from Álvaro Herrera.

Discussion: http://postgr.es/m/202311161021.nisg7imt7kyf@alvherre.pgsql
2023-12-05 12:17:49 -05:00