Commit Graph

2197 Commits

Author SHA1 Message Date
Heikki Linnakangas 8af2565248 Introduce a new smgr bulk loading facility.
The new facility makes it easier to optimize bulk loading, as the
logic for buffering, WAL-logging, and syncing the relation only needs
to be implemented once. It's also less error-prone: We have had a
number of bugs in how a relation is fsync'd - or not - at the end of a
bulk loading operation. By centralizing that logic to one place, we
only need to write it correctly once.

The new facility is faster for small relations: Instead of calling
smgrimmedsync(), we register the fsync to happen at next checkpoint,
which avoids the fsync latency. That can make a big difference if you
are e.g. restoring a schema-only dump with lots of relations.

It is also slightly more efficient with large relations, as the WAL
logging is performed multiple pages at a time. That avoids some WAL
header overhead. The sorted GiST index build did that already; this
commit moves that buffering to the new facility.

The changes to the pageinspect GiST test need an explanation: Before this
patch, the sorted GiST index build set the LSN on every page to the
special GistBuildLSN value, not the LSN of the WAL record, even though
they were WAL-logged. There was no particular need for it, it just
happened naturally when we wrote out the pages before WAL-logging
them. Now we WAL-log the pages first, like in B-tree build, so the
pages are stamped with the record's real LSN. When the build is not
WAL-logged, we still use GistBuildLSN. To make the test output
predictable, use an unlogged index.
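
For illustration, a minimal sketch of what the adjusted test relies on
(object names assumed, not the actual test):

    CREATE EXTENSION pageinspect;
    CREATE UNLOGGED TABLE test_gist AS
      SELECT point(i, i) AS p FROM generate_series(1, 1000) i;
    CREATE INDEX test_gist_idx ON test_gist USING gist (p);
    -- Unlogged build: pages keep the constant GistBuildLSN,
    -- so the output stays predictable.
    SELECT lsn FROM gist_page_opaque_info(get_raw_page('test_gist_idx', 0));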

Reviewed-by: Andres Freund
Discussion: https://www.postgresql.org/message-id/30e8f366-58b3-b239-c521-422122dd5150%40iki.fi
2024-02-23 16:10:51 +02:00
Amit Kapila ddd5f4f54a Add a slot synchronization function.
This commit introduces a new SQL function pg_sync_replication_slots()
which is used to synchronize the logical replication slots from the
primary server to the physical standby so that logical replication can be
resumed after a failover or planned switchover.

A new 'synced' flag is introduced in pg_replication_slots view, indicating
whether the slot has been synchronized from the primary server. On a
standby, synced slots cannot be dropped or consumed, and any attempt to
perform logical decoding on them will result in an error.

The logical replication slots on the primary can be synchronized to the
hot standby by using the 'failover' parameter of
pg_create_logical_replication_slot(), or by using the 'failover' option of
CREATE SUBSCRIPTION during slot creation, and then calling
pg_sync_replication_slots() on the standby. For the synchronization to
work, a physical replication slot between the primary and the standby is
mandatory (i.e., 'primary_slot_name' must be configured on the standby),
'hot_standby_feedback' must be enabled on the standby, and a valid
'dbname' must be specified in 'primary_conninfo'.
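
A minimal sketch of that flow (slot and plugin names assumed):

    -- On the primary: create a failover-enabled logical slot.
    SELECT pg_create_logical_replication_slot('test_slot', 'pgoutput',
                                              temporary => false,
                                              twophase => false,
                                              failover => true);
    -- On the standby, with the prerequisites above in place:
    SELECT pg_sync_replication_slots();
    SELECT slot_name, synced FROM pg_replication_slots;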

If a logical slot is invalidated on the primary, then that slot on the
standby is also invalidated.

If a logical slot on the primary is valid but is invalidated on the
standby, then that slot is dropped but will be recreated on the standby in
the next pg_sync_replication_slots() call provided the slot still exists
on the primary server. It is okay to recreate such slots as long as they
are not consumable on the standby (which is the case currently). This
situation may occur due to the following reasons:
- The 'max_slot_wal_keep_size' on the standby is insufficient to retain
WAL records from the restart_lsn of the slot.
- 'primary_slot_name' is temporarily reset to null and the physical slot
is removed.

The slot synchronization status on the standby can be monitored using the
'synced' column of pg_replication_slots view.

Functionality to automatically synchronize slots via a background worker,
and to allow logical walsenders to wait for the physical standby, will be
added in subsequent commits.

Author: Hou Zhijie, Shveta Malik, Ajin Cherian, based on an earlier version by Peter Eisentraut
Reviewed-by: Masahiko Sawada, Bertrand Drouvot, Peter Smith, Dilip Kumar, Nisha Moond, Kuroda Hayato, Amit Kapila
Discussion: https://postgr.es/m/514f6f2f-6833-4539-39f1-96cd1e011f23@enterprisedb.com
2024-02-14 09:45:36 +05:30
Peter Eisentraut 428e2de1b8 Remove obsolete script related to MSVC build system 2024-02-11 23:43:50 +01:00
Michael Paquier 3e91dba8b0 Fix various issues with ALTER TEXT SEARCH CONFIGURATION
This commit addresses a set of issues when changing token type mappings
in a text search configuration when using duplicated token names:
- ADD MAPPING would fail on insertion because of a constraint failure
after inserting the same mapping.
- ALTER MAPPING with an "overridden" configuration failed with "tuple
already updated by self" when the token mappings are removed.
- DROP MAPPING failed with "tuple already updated by self", like
previously, but in a different code path.
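
For instance, a sketch of commands with duplicated token names that hit
these code paths (configuration name assumed):

    CREATE TEXT SEARCH CONFIGURATION tsconf (COPY = english);
    ALTER TEXT SEARCH CONFIGURATION tsconf
      ALTER MAPPING FOR word, word WITH simple;  -- duplicated token name
    ALTER TEXT SEARCH CONFIGURATION tsconf
      DROP MAPPING FOR word, word;               -- likewise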

The code is refactored so the token names (with their numbers) are
handled as a List with unique members rather than an array with numbers,
ensuring that no duplicates interfere with the catalog inserts, updates
and deletes.  The list is generated by getTokenTypes(), with the same
error handling as previously, while duplicated tokens are discarded from
the list used to work on the catalogs.

Regression tests are expanded to cover much more ground for the cases
fixed by this commit, as there was no coverage for the code touched in
this commit.  Tests are also added for the fact that a token name not
supported by a configuration's parser results in an error even if
IF EXISTS is used in a DROP MAPPING clause.  This behavior is implied in
the code but had no coverage, so it was very easy to miss.

These issues exist since at least their introduction in core with
140d4ebcb4, so backpatch all the way down.

Reported-by: Alexander Lakhin
Author: Tender Wang, Michael Paquier
Discussion: https://postgr.es/m/18310-1eb233c5908189c8@postgresql.org
Backpatch-through: 12
2024-01-31 13:15:21 +09:00
Amit Kapila 7329240437 Allow setting failover property in the replication command.
This commit implements a new replication command called
ALTER_REPLICATION_SLOT and a corresponding walreceiver API function named
walrcv_alter_slot. Additionally, the CREATE_REPLICATION_SLOT command has
been extended to support the failover option.

These new additions allow the modification of the failover property of a
replication slot on the publisher. A subsequent commit will make use of
these commands in subscription commands and will add the tests as well to
cover the functionality added/changed by this commit.
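
A rough sketch of the syntax, as run on a replication connection (slot
and plugin names assumed; option spelling per this commit's description):

    CREATE_REPLICATION_SLOT test_slot LOGICAL pgoutput (FAILOVER true)
    ALTER_REPLICATION_SLOT test_slot (FAILOVER false)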

Author: Hou Zhijie, Shveta Malik
Reviewed-by: Peter Smith, Bertrand Drouvot, Dilip Kumar, Masahiko Sawada, Nisha Moond, Kuroda Hayato, Amit Kapila
Discussion: https://postgr.es/m/514f6f2f-6833-4539-39f1-96cd1e011f23@enterprisedb.com
2024-01-29 09:37:23 +05:30
Masahiko Sawada 08e6344fd6 Remove ReorderBufferTupleBuf structure.
Since commit a4ccc1cef, the 'node' and 'alloc_tuple_size' fields of
the ReorderBufferTupleBuf structure are no longer used. This leaves
only the 'tuple' field in the structure. Since keeping a single-field
structure makes little sense, the ReorderBufferTupleBuf is removed
entirely. The code is refactored accordingly.

No back-patching since these are ABI changes in an exposed structure
and functions, and there would be some risk of breaking extensions.

Author: Aleksander Alekseev
Reviewed-by: Amit Kapila, Masahiko Sawada, Reid Thompson
Discussion: https://postgr.es/m/CAD21AoCvnuxiXXfRecp7g9+CeC35POQfhuQeJFr7_9u_Q5jc_Q@mail.gmail.com
2024-01-29 10:37:16 +09:00
Peter Eisentraut 9b1a6f50b9 Generate syscache info from catalog files
Add a new genbki macros MAKE_SYSCACHE that specifies the syscache ID
macro, the underlying index, and the number of buckets.  From that, we
can generate the existing tables in syscache.h and syscache.c via
genbki.pl.

Reviewed-by: John Naylor <johncnaylorls@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/75ae5875-3abc-dafc-8aec-73247ed41cde@eisentraut.org
2024-01-23 07:31:06 +01:00
Michael Paquier d86d20f0ba Add backend support for injection points
Injection points are a new facility that makes it possible for developers
to run custom code in pre-defined code paths.  Its goal is to provide
ways to design and run advanced tests, for cases like:
- Race conditions, where processes need to do actions in a controlled
ordered manner.
- Forcing a state, like an ERROR, FATAL or even PANIC for OOM, to force
recovery, etc.
- Arbitrary sleeps.

This implements some basics, and there are plans to extend it more in
the future depending on what's required.  Hence, this commit adds a set
of routines in the backend that allows developers to attach, detach and
run injection points:
- A code path calling an injection point can be declared with the macro
INJECTION_POINT(name).
- InjectionPointAttach() and InjectionPointDetach() to respectively
attach and detach a callback to/from an injection point.  An injection
point name is registered in a shmem hash table with a library name and a
function name, which will be used to load the callback attached to an
injection point when its code path is run.

Injection point names are just strings, so an injection point can be
declared and run by out-of-core extensions and modules, with callbacks
defined in external libraries.

This facility is hidden behind a dedicated switch for ./configure and
meson, disabled by default.

Note that backends use a local cache to store callbacks already loaded,
cleaning up their cache, on a best-effort basis, if a callback is found
to have been removed.  This could be refined further, but what we have
here was sufficient for the tests written while implementing these
backend APIs.

Author: Michael Paquier, with doc suggestions from Ashutosh Bapat.
Reviewed-by: Ashutosh Bapat, Nathan Bossart, Álvaro Herrera, Dilip
Kumar, Amul Sul, Nazir Bilal Yavuz
Discussion: https://postgr.es/m/ZTiV8tn_MIb_H2rE@paquier.xyz
2024-01-22 10:15:50 +09:00
Alexander Korotkov 0452b461bc Explore alternative orderings of group-by pathkeys during optimization.
When evaluating a query with a multi-column GROUP BY clause, we can minimize
sort operations or avoid them if we synchronize the order of GROUP BY clauses
with the ORDER BY sort clause, or with the sort order that comes from the
underlying query tree. Grouping does not imply any ordering, so we can compare
the keys in arbitrary order, and a Hash Agg leverages this. But for Group Agg,
we simply compared keys in the order specified in the query. This commit
explores alternative ordering of the keys, trying to find a cheaper one.

The ordering of group keys may interact with other parts of the query, some of
which may not be known while planning the grouping. For example, there may be
an explicit ORDER BY clause or some other ordering-dependent operation higher up
in the query, and using the same ordering may allow either using an
incremental sort or eliminating the sort entirely.

The patch always keeps the ordering specified in the query, assuming the user
might have additional insights.

This introduces a new GUC enable_group_by_reordering so that the optimization
may be disabled if needed.
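
For example (table name assumed):

    SET enable_group_by_reordering = on;  -- default; 'off' disables it
    -- The planner may evaluate GROUP BY b, a in (a, b) order to match
    -- the ORDER BY and avoid an extra sort:
    EXPLAIN (COSTS OFF)
    SELECT a, b, count(*) FROM tab GROUP BY b, a ORDER BY a, b;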

Discussion: https://postgr.es/m/7c79e6a5-8597-74e8-0671-1c39d124c9d6%40sigaev.ru
Author: Andrei Lepikhov, Teodor Sigaev
Reviewed-by: Tomas Vondra, Claudio Freire, Gavin Flower, Dmitry Dolgov
Reviewed-by: Robert Haas, Pavel Borisov, David Rowley, Zhihong Yu
Reviewed-by: Tom Lane, Alexander Korotkov, Richard Guo, Alena Rybakina
2024-01-21 22:21:36 +02:00
Nathan Bossart 8b2bcf3f28 Introduce the dynamic shared memory registry.
Presently, the most straightforward way for a shared library to use
shared memory is to request it at server startup via a
shmem_request_hook, which requires specifying the library in
shared_preload_libraries.  Alternatively, the library can create a
dynamic shared memory (DSM) segment, but absent a shared location
to store the segment's handle, other backends cannot use it.  This
commit introduces a registry for DSM segments so that these other
backends can look up existing segments with a library-specified
string.  This allows libraries to easily use shared memory without
needing to request it at server startup.

The registry is accessed via the new GetNamedDSMSegment() function.
This function handles allocating the segment and initializing it
via a provided callback.  If another backend already created and
initialized the segment, it simply attaches the segment.
GetNamedDSMSegment() locks the registry appropriately to ensure
that only one backend initializes the segment and that all other
backends just attach it.

The registry itself is a dshash table that stores the DSM segment
handles, keyed by a library-specified string.

Reviewed-by: Michael Paquier, Andrei Lepikhov, Nikita Malakhov, Robert Haas, Bharath Rupireddy, Zhang Mingli, Amul Sul
Discussion: https://postgr.es/m/20231205034647.GA2705267%40nathanxps13
2024-01-19 14:24:36 -06:00
Alexander Korotkov b725b7eec4 Rename COPY option from SAVE_ERROR_TO to ON_ERROR
The option names now are "stop" (default) and "ignore".  The future options
could be "file 'filename.log'" and "table 'tablename'".

Discussion: https://postgr.es/m/20240117.164859.2242646601795501168.horikyota.ntt%40gmail.com
Author: Jian He
Reviewed-by: Atsushi Torikoshi
2024-01-19 15:15:51 +02:00
John Naylor e97b672c88 Add inline incremental hash functions for in-memory use
It can be useful for a hash function to expose separate initialization,
accumulation, and finalization steps.  In particular, this is useful
for building inline hash functions for simplehash.  Instead of trying
to whack around hash_bytes while maintaining its current behavior on
all platforms, we base this work on fasthash (MIT licensed) which
is simple, faster than hash_bytes for inputs over 12 bytes long,
and also passes the hash function testing suite SMHasher.

The fasthash functions have been reimplemented using our added-on
incremental interface to validate that this method will still give
the same answer, provided we have the input length ahead of time.

This functionality lives in a new header hashfn_unstable.h. The name
implies we have the freedom to change things across versions that
would be unacceptable for our other hash functions that are used for
e.g. hash indexes and hash partitioning. As such, these should only
be used for in-memory data structures like hash tables. There is also
no guarantee of being independent of endianness or pointer size.

As demonstration, use fasthash for pgstat_hash_hash_key.  Previously
this called the 32-bit murmur finalizer on the three elements,
then joined them with hash_combine(). The new function is simpler,
faster and takes up less binary space. While the collision and bias
behavior were almost certainly fine with the previous coding, now we
have objective confidence of that.

There are other places that could benefit from this, but that is left
for future work.

Reviewed by Jeff Davis, Heikki Linnakangas, Jian He, Junwang Zhao
Credit to Andres Freund for the idea

Discussion: https://postgr.es/m/20231122223432.lywt4yz2bn7tlp27%40awork3.anarazel.de
2024-01-19 12:44:09 +07:00
Robert Haas e313a61137 Remove LVPagePruneState.
Commit cb970240f1 moved some code from
lazy_scan_heap() to lazy_scan_prune(), and now some things that used to
need to be passed back and forth are completely local to lazy_scan_prune().
Hence, this struct is mostly obsolete.  The only thing that still
needs to be passed back to the caller is has_lpdead_items, and that's
also passed back by lazy_scan_noprune(), so do it the same way in both
cases.

Melanie Plageman, reviewed and slightly revised by me.

Discussion: http://postgr.es/m/CAAKRu_aM=OL85AOr-80wBsCr=vLVzhnaavqkVPRkFBtD0zsuLQ@mail.gmail.com
2024-01-18 15:17:09 -05:00
Alexander Korotkov 9e2d870119 Add new COPY option SAVE_ERROR_TO
Currently, when the source data contains a value that is invalid for the
target data type or out of range, the entire COPY fails. However, in some
cases such rows can be ignored, and copying the remaining data is
preferable.

This commit adds a new option SAVE_ERROR_TO, which specifies where to save the
error information. When this option is specified, COPY skips soft errors and
continues copying.

Currently, SAVE_ERROR_TO only supports "none". This indicates error information
is not saved and COPY just skips the unexpected data and continues running.

Later work is expected to add more choices, such as 'log' and 'table'.
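
A sketch of the usage as committed here (table name assumed; the option
was later renamed to ON_ERROR, see b725b7eec4 above):

    COPY tab FROM STDIN WITH (SAVE_ERROR_TO none);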

Author: Damir Belyalov, Atsushi Torikoshi, Alex Shulgin, Jian He
Discussion: https://postgr.es/m/87k31ftoe0.fsf_-_%40commandprompt.com
Reviewed-by: Pavel Stehule, Andres Freund, Tom Lane, Daniel Gustafsson,
Reviewed-by: Alena Rybakina, Andy Fan, Andrei Lepikhov, Masahiko Sawada
Reviewed-by: Vignesh C, Atsushi Torikoshi
2024-01-16 23:08:53 +02:00
Robert Haas ee1bfd1683 Add new pg_walsummary tool.
This can dump the contents of the WAL summary files found in
pg_wal/summaries. Normally, this shouldn't really be something anyone
needs to do, but it may be needed for debugging problems with
incremental backup, or could possibly be useful to external tools.

Discussion: http://postgr.es/m/CA+Tgmobvqqj-DW9F7uUzT-cQqs6wcVb-Xhs=w=hzJnXSE-kRGw@mail.gmail.com
2024-01-11 12:48:27 -05:00
Andrew Dunstan dbad1c53e9 Add copyright notices to a few perl scripts that don't have them 2024-01-05 13:15:50 +00:00
Bruce Momjian 29275b1d17 Update copyright for 2024
Reported-by: Michael Paquier

Discussion: https://postgr.es/m/ZZKTDPxBBMt3C0J9@paquier.xyz

Backpatch-through: 12
2024-01-03 20:49:05 -05:00
Michael Paquier 359a3c586d Fix some typos
Author: Dagfinn Ilmari Mannsåker
Reviewed-by: Shubham Khanna
Discussion: https://postgr.es/m/87le9fmi01.fsf@wibble.ilmari.org
2024-01-03 14:22:54 +09:00
Tom Lane ae6bc39317 Minor fixes for search path cache code.
Avoid leaving a dangling pointer in the unlikely event that
nsphash_create fails.  Improve comments, and fix formatting by
adding typedefs.list entries.

Discussion: https://postgr.es/m/3972900.1704145107@sss.pgh.pa.us
2024-01-02 14:57:21 -05:00
Robert Haas e62e73f3a2 gist: fix typo "split(t)ed" -> "split"
Dagfinn Ilmari Mannsåker, reviewed by Shubham Khanna.

Discussion: http://postgr.es/m/87le9fmi01.fsf@wibble.ilmari.org
2024-01-02 12:24:28 -05:00
Amit Kapila 9a17be1e24 Allow upgrades to preserve the full subscription's state.
This feature will allow us to replicate the changes on subscriber nodes
after the upgrade.

Previously, only the subscription metadata information was preserved.
Without the list of relations and their state, it's not possible to
re-enable the subscriptions without missing some records as the list of
relations can only be refreshed after enabling the subscription (and
therefore starting the apply worker).  Even if we added a way to refresh
the subscription while enabling a publication, we still wouldn't know
which relations are new on the publication side, and therefore should be
fully synced, and which shouldn't.

To preserve the subscription relations, this patch teaches pg_dump to
restore the content of pg_subscription_rel from the old cluster by using
the binary_upgrade_add_sub_rel_state SQL function. This is supported only
in binary upgrade mode.

The subscription's replication origin is needed to ensure that we don't
replicate anything twice.

To preserve the replication origins, this patch teaches pg_dump to update
the replication origin along with creating a subscription, by using the
binary_upgrade_replorigin_advance SQL function to restore the
underlying replication origin remote LSN. This is supported only in
binary upgrade mode.

pg_upgrade will check that all the subscription relations are in 'i'
(init) or in 'r' (ready) state and will error out if that's not the case,
logging the reason for the failure. This helps to avoid the risk of any
dangling slot or origin after the upgrade.
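
A hedged sketch of a query matching that check (pg_upgrade's own check
is more thorough):

    -- Subscription relations in states other than 'i' (init) or
    -- 'r' (ready) would make pg_upgrade error out:
    SELECT s.subname, sr.srrelid::regclass, sr.srsubstate
    FROM pg_subscription_rel sr
    JOIN pg_subscription s ON s.oid = sr.srsubid
    WHERE sr.srsubstate NOT IN ('i', 'r');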

Author: Vignesh C, Julien Rouhaud, Shlok Kyal
Reviewed-by: Peter Smith, Masahiko Sawada, Michael Paquier, Amit Kapila, Hayato Kuroda
Discussion: https://postgr.es/m/20230217075433.u5mjly4d5cr4hcfe@jrouhaud
2024-01-02 08:08:46 +05:30
Peter Eisentraut cea89c93a1 Turn AT_PASS_* macros into an enum
This makes the code simpler and easier to follow.  Also, patches that
want to change the passes won't have to renumber the whole list.

Reviewed-by: Amul Sul <sulamul@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/CAAJ_b94yyJeGA-5M951_Lr+KfZokOp-2kXicpmEhi5FXhBeTog@mail.gmail.com
2024-01-01 19:17:18 +01:00
Michael Paquier a99009a9a3 Exclude files generated by generate-wait_event_types.pl from pgindent
The format of these files becomes arguably worse after being indented,
and, as they are generated, there is no point in indenting them anyway.

Author: Bharath Rupireddy
Discussion: https://postgr.es/m/CALj2ACW2JUocmieuR3n9AXL4iSsHcL1LmNkiukuFRUvKNMoiKg@mail.gmail.com
2023-12-31 18:06:56 +09:00
Tomas Vondra 6c63bcbf3c Minor cleanup of the BRIN parallel build code
Commit b437571714 added support for parallel builds for BRIN indexes,
using code similar to BTREE parallel builds, and also a new tuplesort
variant. This commit simplifies the new code in two ways:

* The "spool" grouping tuplesort and the heap/index is not necessary.
  The heap/index are available as separate arguments, causing confusion.
  So remove the spool, and use the tuplesort directly.

* The new tuplesort variant does not need the heap/index, as it sorts
  simply by the range block number, without accessing the tuple data.
  So simplify that too.

Initial report and patch by Ranier Vilela, further cleanup by me.

Author: Ranier Vilela
Discussion: https://postgr.es/m/CAEudQAqD7f2i4iyEaAz-5o-bf6zXVX-AkNUBm-YjUXEemaEh6A%40mail.gmail.com
2023-12-30 23:15:04 +01:00
Peter Eisentraut c538592959 Make all Perl warnings fatal
There are a lot of Perl scripts in the tree, mostly code generation
and TAP tests.  Occasionally, these scripts produce warnings.  These
are probably always mistakes on the developer side (true positives).
Typical examples are warnings from genbki.pl or related when you make
a mess in the catalog files during development, or warnings from tests
when they massage a config file that looks different on different
hosts, or mistakes during merges (e.g., duplicate subroutine
definitions), or just mistakes that weren't noticed because there is a
lot of output in a verbose build.

This changes all warnings into fatal errors, by replacing

    use warnings;

by

    use warnings FATAL => 'all';

in all Perl files.

Discussion: https://www.postgresql.org/message-id/flat/06f899fd-1826-05ab-42d6-adeb1fd5e200%40eisentraut.org
2023-12-29 18:20:00 +01:00
Tom Lane e2b73f4a4d Stop generating plain-text INSTALL instructions.
Up to now, our distribution tarballs have included a plain-text form
of the installation.sgml chapter.  The rationale for that was that a
recipient might not have either ready internet access or HTML-viewing
tools; a theory that seems downright quaint today.  Maintaining the
ability to generate this file is not without cost, because it puts
special requirements on installation.sgml that are often overlooked.
Moreover, we are moving in the direction of making our distribution
tarballs be pure git snapshots for traceability/reproducibility
reasons; including generated files doesn't fit into that plan.
Hence, let's just drop INSTALL and remove the infrastructure for
generating it.  The top-level README will now recommend visiting
our website to see the installation instructions.  As a useful
side-effect, we can get rid of README.git which has provoked
confusion.

Discussion: https://postgr.es/m/20231220114927.faccqqprmuyrzdip@alap3.anarazel.de
Discussion: https://postgr.es/m/e07408d9-e5f2-d9fd-5672-f53354e9305e@eisentraut.org
2023-12-22 13:32:15 -05:00
Andrew Dunstan 8ddf9c1dc0 Make win32tzlist.pl checkable again
Commit 1301c80b21 removed some infrastructure needed to check
windows-oriented perl scripts. It also removed most such scripts, but
this one was left over. We repair the damage by making Win32::Registry a
conditional requirement that is only loaded on Windows. With this change
`perl -cw win32tzlist.pl` once again passes on non-Windows machines.

Discussion: https://postgr.es/m/a2bd77fd-61b8-4c2b-b12e-3e22ae260f82@eisentraut.org
2023-12-22 13:56:27 +00:00
Andrew Dunstan 387aecc948 Rename pgindent options
--show-diff becomes --diff, and --silent-diff becomes --check. These
options may now be given together. Without --check, --diff will exit
with a zero status even if diffs are found. With --check, it will now
exit with a non-zero status in that case.

Author: Tristan Partin
Reviewed-by: Daniel Gustafsson, Jelte Fennema-Nio

Discussion: https://postgr.es/m/CXLX2XYTH9S6.140SC6Y61VD88@neon.tech
2023-12-20 22:37:57 +00:00
Robert Haas dc21234005 Add support for incremental backup.
To take an incremental backup, you use the new replication command
UPLOAD_MANIFEST to upload the manifest for the prior backup. This
prior backup could either be a full backup or another incremental
backup.  You then use BASE_BACKUP with the INCREMENTAL option to take
the backup.  pg_basebackup now has an --incremental=PATH_TO_MANIFEST
option to trigger this behavior.
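
A rough protocol-level sketch of that flow (normally driven by
pg_basebackup rather than issued by hand):

    UPLOAD_MANIFEST             -- then send the prior backup_manifest
    BASE_BACKUP (INCREMENTAL)   -- produces the INCREMENTAL.* files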

An incremental backup is like a regular full backup except that
some relation files are replaced with files with names like
INCREMENTAL.${ORIGINAL_NAME}, and the backup_label file contains
additional lines identifying it as an incremental backup. The new
pg_combinebackup tool can be used to reconstruct a data directory
from a full backup and a series of incremental backups.

Patch by me.  Reviewed by Matthias van de Meent, Dilip Kumar, Jakub
Wartak, Peter Eisentraut, and Álvaro Herrera. Thanks especially to
Jakub for incredibly helpful and extensive testing.

Discussion: http://postgr.es/m/CA+TgmoYOYZfMCyOXFyC-P+-mdrZqm5pP2N7S-r0z3_402h9rsA@mail.gmail.com
2023-12-20 09:49:12 -05:00
Robert Haas 174c480508 Add a new WAL summarizer process.
When active, this process writes WAL summary files to
$PGDATA/pg_wal/summaries. Each summary file contains information for a
certain range of LSNs on a certain TLI. For each relation, it stores a
"limit block" which is 0 if a relation is created or destroyed within
a certain range of WAL records, or otherwise the shortest length to
which the relation was truncated during that range of WAL records, or
otherwise InvalidBlockNumber. In addition, it stores a list of blocks
which have been modified during that range of WAL records, but
excluding blocks which were removed by truncation after they were
modified and never subsequently modified again.

In other words, it tells us which blocks need to be copied in case of an
incremental backup covering that range of WAL records. But this
doesn't yet add the capability to actually perform an incremental
backup; the next patch will do that.

A new parameter summarize_wal enables or disables this new background
process.  The background process also automatically deletes summary
files that are older than wal_summarize_keep_time, if that parameter
has a non-zero value and the summarizer is configured to run.

Patch by me, with some design help from Dilip Kumar and Andres Freund.
Reviewed by Matthias van de Meent, Dilip Kumar, Jakub Wartak, Peter
Eisentraut, and Álvaro Herrera.

Discussion: http://postgr.es/m/CA+TgmoYOYZfMCyOXFyC-P+-mdrZqm5pP2N7S-r0z3_402h9rsA@mail.gmail.com
2023-12-20 08:42:28 -05:00
Michael Paquier 1301c80b21 Remove MSVC scripts
This commit removes all the scripts located in src/tools/msvc/ to build
PostgreSQL with Visual Studio on Windows, meson becoming the recommended
way to achieve that.  The scripts held some information that is still
relevant with meson, information kept and moved to better locations.
Comments that referred directly to the scripts are removed.

All the documentation still relevant that was in install-windows.sgml
has been moved to installation.sgml under a new subsection for Visual
Studio.
All the content specific to the scripts is removed.  Some adjustments
for the documentation are planned in a follow-up set of changes.

Author: Michael Paquier
Reviewed-by: Peter Eisentraut, Andres Freund
Discussion: https://postgr.es/m/ZQzp_VMJcerM1Cs_@paquier.xyz
2023-12-20 09:44:37 +09:00
Jeff Davis 867dd2dc87 Cache opaque handle for GUC option to avoid repeated lookups.
When setting GUCs from proconfig, performance is important, and hash
lookups in the GUC table are significant.

Per suggestion from Robert Haas.

Discussion: https://postgr.es/m/CA+TgmoYpKxhR3HOD9syK2XwcAUVPa0+ba0XPnwWBcYxtKLkyxA@mail.gmail.com
Reviewed-by: John Naylor
2023-12-08 11:16:01 -08:00
Tomas Vondra b437571714 Allow parallel CREATE INDEX for BRIN indexes
Allow using multiple worker processes to build a BRIN index, which until
now was supported only for BTREE indexes. For large tables this often
results in significant speedup when the build is CPU-bound.

The work is split in a simple way - each worker builds BRIN summaries on
a subset of the table, determined by the regular parallel scan used to
read the data, and feeds them into a shared tuplesort which sorts them
by blkno (start of the range). The leader then reads this sorted stream
of ranges, merges duplicates (which may happen if the parallel scan does
not align with BRIN pages_per_range), and adds the resulting ranges into
the index.

The number of duplicate results produced by workers (requiring merging
in the leader process) should be fairly small, thanks to how parallel
scans assign chunks to workers. The likelihood of duplicate results may
increase for higher pages_per_range values, but then there are fewer
page ranges in total. In any case, we expect the merging to be much
cheaper than summarization, so this should be a win.

Most of the parallelism infrastructure is a simplified copy of the code
used by BTREE indexes, omitting the parts irrelevant for BRIN indexes
(e.g. uniqueness checks).

This also introduces a new index AM flag amcanbuildparallel, determining
whether to attempt to start parallel workers for the index build.
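
For example (table and column names assumed):

    SET max_parallel_maintenance_workers = 4;
    CREATE INDEX measurements_brin_idx ON measurements
      USING brin (created_at) WITH (pages_per_range = 128);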

Original patch by me, with reviews and substantial reworks by Matthias
van de Meent, certainly enough to make him a co-author.

Author: Tomas Vondra, Matthias van de Meent
Reviewed-by: Matthias van de Meent
Discussion: https://postgr.es/m/c2ee7d69-ce17-43f2-d1a0-9811edbda6e6%40enterprisedb.com
2023-12-08 18:15:26 +01:00
Heikki Linnakangas b31ba5310b Rename ShmemVariableCache to TransamVariables
The old name was misleading: It's not a cache, the values kept in the
struct are the authoritative source.

Reviewed-by: Tristan Partin, Richard Guo
Discussion: https://www.postgresql.org/message-id/6537d63d-4bb5-46f8-9b5d-73a8ba4720ab@iki.fi
2023-12-08 09:47:15 +02:00
Robert Haas d463aa06a9 Rename JsonManifestParseContext callbacks.
There is no real reason to just run multiple words together when
we can instead punctuate them with marks intended for that purpose.

Per suggestion from Álvaro Herrera.

Discussion: http://postgr.es/m/202311161021.nisg7imt7kyf@alvherre.pgsql
2023-12-05 12:17:49 -05:00
Daniel Gustafsson a5cf808be5 Read include/exclude commands for dump/restore from file
When there is a need to filter multiple tables with include and/or exclude
options, it's quite possible to run into the limitations of the command line.
This adds a --filter=FILENAME feature to pg_dump, pg_dumpall and pg_restore
which is used to supply a file containing object exclude/include commands
which work just like their command-line counterparts. The format of the file
is one command per row, like:

    <command> <object> <objectpattern>

<command> can be "include" or "exclude", <object> can be table_data, index
table_data_and_children, database, extension, foreign_data, function, table
schema, table_and_children or trigger.
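
For example, a filter file passed as --filter=objects.txt might contain
(object names assumed):

    include table customers*
    exclude table_data audit_log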

This patch has gone through many revisions and design changes over a long
period of time, the list of reviewers reflect reviewers of some version of
the patch, not necessarily the final version.

Patch by Pavel Stehule with some additional hacking by me.

Author: Pavel Stehule <pavel.stehule@gmail.com>
Reviewed-by: Justin Pryzby <pryzby@telsasoft.com>
Reviewed-by: vignesh C <vignesh21@gmail.com>
Reviewed-by: Dean Rasheed <dean.a.rasheed@gmail.com>
Reviewed-by: Tomas Vondra <tomas.vondra@enterprisedb.com>
Reviewed-by: Julien Rouhaud <rjuju123@gmail.com>
Reviewed-by: Erik Rijkers <er@xs4all.nl>
Discussion: https://postgr.es/m/CAFj8pRB10wvW0CC9Xq=1XDs=zCQxer3cbLcNZa+qiX4cUH-G_A@mail.gmail.com
2023-11-29 14:56:24 +01:00
Thomas Munro 15c9ac3629 Optimize pg_readv/pg_pwritev single vector case.
For the trivial case of iovcnt == 1, kernels are measurably slower at
dealing with the more complex arguments of preadv/pwritev than the
equivalent plain old pread/pwrite.  The overheads are worth it for
iovcnt > 1, but for 1 let's just redirect to the cheaper calls.  While
we could leave it to callers to worry about that, we already have to
have our own pg_ wrappers for portability reasons so it seems
reasonable to centralize this knowledge there (thanks to Heikki for this
suggestion).  Try to avoid function call overheads by making them
inlinable, which might also allow the compiler to avoid the branch in
some cases.  For systems that don't have preadv and pwritev (currently:
Windows and [closed] Solaris), we might as well pull the replacement
functions up into the static inline functions too.

Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi>
Discussion: https://postgr.es/m/CA+hUKGJkOiOCa+mag4BF+zHo7qo=o9CFheB8=g6uT5TUm2gkvA@mail.gmail.com
2023-11-29 17:19:25 +13:00
Tom Lane c82207a548 Use BIO_{get,set}_app_data instead of BIO_{get,set}_data.
We should have done it this way all along, but we accidentally got
away with using the wrong BIO field up until OpenSSL 3.2.  There,
the library's BIO routines that we rely on use the "data" field
for their own purposes, and our conflicting use causes assorted
weird behaviors up to and including core dumps when SSL connections
are attempted.  Switch to using the approved field for the purpose,
i.e. app_data.

While at it, remove our configure probes for BIO_get_data as well
as the fallback implementation.  BIO_{get,set}_app_data have been
there since long before any OpenSSL version that we still support,
even in the back branches.

Also, update src/test/ssl/t/001_ssltests.pl to allow for a minor
change in an error message spelling that evidently came in with 3.2.

Tristan Partin and Bo Andreson.  Back-patch to all supported branches.

Discussion: https://postgr.es/m/CAN55FZ1eDDYsYaL7mv+oSLUij2h_u6hvD4Qmv-7PK7jkji0uyQ@mail.gmail.com
2023-11-28 12:34:03 -05:00
Tomas Vondra b2caf7c0e1 Fix brin.c indentation issues introduced by c1ec02be1d
Per buildfarm member koel.
2023-11-26 21:35:32 +01:00
Peter Eisentraut bc15a126bb Explicitly skip TAP tests under Meson if disabled
If the tap_tests option is disabled under Meson, the TAP tests are
currently not registered at all.  But this makes it harder to see what
is going on and why there are suddenly fewer tests than before.

Instead, run testwrap with an option that marks the test as skipped.
That way, the total list and count of tests is constant whether the
option is enabled or not.

Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://www.postgresql.org/message-id/ad5ec96d-69ec-317b-a137-367ea5019b61@eisentraut.org
2023-11-16 08:14:33 +01:00
Dean Rasheed 519fc1bd9e Support +/- infinity in the interval data type.
This adds support for infinity to the interval data type, using the
same input/output representation as the other date/time data types
that support infinity. This allows various arithmetic operations on
infinite dates, timestamps and intervals.

The new values are represented by setting all fields of the interval
to INT32/64_MIN for -infinity, and INT32/64_MAX for +infinity. This
ensures that they compare as less/greater than all other interval
values, without the need for any special-case comparison code.
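
For example:

    SELECT interval 'infinity', interval '-infinity';
    SELECT interval '-infinity' < interval '-1000000 years';  -- true
    SELECT timestamp 'infinity' - timestamp '2000-01-01';     -- infinity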

Note that, since those 2 values were formerly accepted as legal finite
intervals, pg_upgrade and dump/restore from an old database will turn
them from finite to infinite intervals. That seems OK, since those
exact values should be extremely rare in practice, and they are
outside the documented range supported by the interval type, which
gives us a certain amount of leeway.

Bump catalog version.

Joseph Koshakow, Jian He, and Ashutosh Bapat, reviewed by me.

Discussion: https://postgr.es/m/CAAvxfHea4%2BsPybKK7agDYOMo9N-Z3J6ZXf3BOM79pFsFNcRjwA%40mail.gmail.com
2023-11-14 10:58:49 +00:00
Peter Eisentraut 3849fe7c2b Replace Gen_dummy_probes.sed with Gen_dummy_probes.pl
To generate a dummy probes.h file when dtrace is not available, we had
two different scripts: A sed version, which is the original version,
and a Perl version, which was generated by s2p.  This split was
necessary because Perl was not a mandatory build dependency on Unix,
but sed was not guaranteed to be available on Windows.

(The Meson build system used the sed version even on Windows, which
was probably incorrect and probably would have had to be fixed before
elevating that build system from experimental status.)

As of 721856ff24, Perl is a required build dependency, so this split
is no longer necessary.  We can just use the Perl script in all build
environments and remove a whole bunch of infrastructure to keep the
two variants in sync.

The new Gen_dummy_probes.pl is not the version generated by s2p but a
new implementation written by hand by adapting the sed version to Perl
syntax.

Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://www.postgresql.org/message-id/3fd0f1bc-4483-4ba9-8aa0-64765b052039@eisentraut.org
2023-11-14 10:27:10 +01:00
Tom Lane 83472de606 Improve readability and error detection of array_in().
Rewrite array_in() and its subroutines so that we make only one
pass over the input text, rather than two.  This requires
potentially re-pallocing the working arrays values[] and nulls[]
larger than our initial guess, but that cost will hopefully be made
up by avoiding duplicate parsing.  In any case this coding seems
much clearer and more straightforward than what we had before.

This also fixes array_in() to reject non-rectangular input (that is,
different brace depths in different parts of the input) more reliably
than before, and to give a better error message when it does so.
This is analogous to the plpython and plperl fixes in 0553528e7 and
f47004add.  Like those PLs, we now accept input such as '{{},{}}'
as a valid representation of an empty array, which we did not before.

Additionally, reject explicit array subscripts that are outside the
integer range (previously you just got whatever atoi() converted
them to), and make some other minor improvements in error reporting.
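
A sketch of the behavioral changes:

    SELECT '{{},{}}'::int[];      -- now accepted: an empty array
    SELECT '{{1},{2,3}}'::int[];  -- non-rectangular: rejected
    SELECT '[2147483648:2147483648]={1}'::int[];  -- subscript out of
                                                  -- integer range: error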

Although this is arguably a bug fix, it's also a behavioral change
that might trip somebody up, so no back-patch.

Tom Lane, Heikki Linnakangas, and Jian He.  Thanks to Alexander Lakhin
for the initial report and for review/testing.

Discussion: https://postgr.es/m/2794005.1683042087@sss.pgh.pa.us
2023-11-13 13:01:51 -05:00
David Rowley 10d34fefc2 Ensure we use the correct spelling of "ensure"
We seem to have accidentally used "insure" in a few places.  Correct
that.

Author: Peter Smith
Discussion: https://postgr.es/m/CAHut+Pv0biqrhA3pMhu40aDsj343mTsD75khKnHsLqR8P04f=Q@mail.gmail.com
Backpatch-through: 12, oldest supported version
2023-11-10 00:15:54 +13:00
Heikki Linnakangas b8bff07daa Make ResourceOwners more easily extensible.
Instead of having a separate array/hash for each resource kind, use a
single array and hash to hold all kinds of resources. This makes it
possible to introduce new resource "kinds" without having to modify
the ResourceOwnerData struct. In particular, this makes it possible
for extensions to register custom resource kinds.

The old approach was to have a small array of resources of each kind,
and if it filled up, to switch to a hash table. The new approach also uses
an array and a hash, but now the array and the hash are used at the
same time. The array is used to hold the recently added resources, and
when it fills up, they are moved to the hash. This keeps the access to
recent entries fast, even when there are a lot of long-held resources.

All the resource-specific ResourceOwnerEnlarge*(),
ResourceOwnerRemember*(), and ResourceOwnerForget*() functions have
been replaced with three generic functions that take resource kind as
argument. For convenience, we still define resource-specific wrapper
macros around the generic functions with the old names, but they are
now defined in the source files that use those resource kinds.

The release callback no longer needs to call ResourceOwnerForget on
the resource being released. ResourceOwnerRelease unregisters the
resource from the owner before calling the callback. That needed some
changes in bufmgr.c and some other files, where releasing the
resources previously always called ResourceOwnerForget.

Each resource kind specifies a release priority, and
ResourceOwnerReleaseAll releases the resources in priority order. To
make that possible, we have to restrict what you can do between
phases. After calling ResourceOwnerRelease(), you are no longer
allowed to remember any more resources in it or to forget any
previously remembered resources by calling ResourceOwnerForget.  There
was one case where that was done previously. At subtransaction commit,
AtEOSubXact_Inval() would handle the invalidation messages and call
RelationFlushRelation(), which temporarily increased the reference
count on the relation being flushed. We now switch to the parent
subtransaction's resource owner before calling AtEOSubXact_Inval(), so
that there is a valid ResourceOwner to temporarily hold that relcache
reference.

Other end-of-xact routines make similar calls to AtEOXact_Inval()
between release phases, but I didn't see any regression test failures
from those, so I'm not sure if they could reach a codepath that needs
remembering extra resources.

There were two exceptions to how the resource leak WARNINGs on commit
were printed previously: llvmjit silently released the context without
printing the warning, and a leaked buffer io triggered a PANIC. Now
everything prints a WARNING, including those cases.

Add tests in src/test/modules/test_resowner.

Reviewed-by: Aleksander Alekseev, Michael Paquier, Julien Rouhaud
Reviewed-by: Kyotaro Horiguchi, Hayato Kuroda, Álvaro Herrera, Zhihong Yu
Reviewed-by: Peter Eisentraut, Andres Freund
Discussion: https://www.postgresql.org/message-id/cbfabeb0-cd3c-e951-a572-19b365ed314d%40iki.fi
2023-11-08 13:30:50 +02:00
Peter Eisentraut 721856ff24 Remove distprep
A PostgreSQL release tarball contains a number of prebuilt files, in
particular files produced by bison, flex, and perl, as well as html and
man documentation.  We have done this, consistent with established
practice at the time, to not require these tools for building from a
tarball.  Some of these tools were hard to get, or to get the right
version of, from time to time, and shipping the prebuilt output was a
convenience to users.

Now this has at least two problems:

One, we have to make the build system(s) work in two modes: Building
from a git checkout and building from a tarball.  This is pretty
complicated, but it works so far for autoconf/make.  It does not
work for meson; currently you can only build with meson from a git
checkout.  Making meson builds work from a tarball seems very
difficult or impossible.  One particular problem is that since meson
requires a separate build directory, we cannot make the build update
files like gram.h in the source tree.  So if you were to build from a
tarball and update gram.y, you will have a gram.h in the source tree
and one in the build tree, but the way things work is that the
compiler will always use the one in the source tree.  So you cannot,
for example, make any gram.y changes when building from a tarball.
This seems impossible to fix in a non-horrible way.

Second, there is increased interest nowadays in precisely tracking the
origin of software.  We can reasonably track contributions into the
git tree, and users can reasonably track the path from a tarball to
packages and downloads and installs.  But what happens between the git
tree and the tarball is obscure and in some cases non-reproducible.

The solution for both of these issues is to get rid of the step that
adds prebuilt files to the tarball.  The tarball now only contains
what is in the git tree (*).  Getting the additional build
dependencies is no longer a problem nowadays, and the complications to
keep these dual build modes working are significant.  And of course we
want to get the meson build system working universally.

This commit removes the make distprep target altogether.  The make
dist target continues to do its job, it just doesn't call distprep
anymore.

(*) - The tarball also contains the INSTALL file that is built at make
dist time, but not by distprep.  This is unchanged for now.

The make maintainer-clean target, whose job it is to remove the
prebuilt files in addition to what make distclean does, is now just an
alias to make distclean.  (In practice, it is probably obsolete given
that git clean is available.)

The following programs are now hard build requirements in configure
(they were already required by meson.build):

- bison
- flex
- perl

Reviewed-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://www.postgresql.org/message-id/flat/e07408d9-e5f2-d9fd-5672-f53354e9305e@eisentraut.org
2023-11-06 15:18:04 +01:00
David Rowley 151ffcf6d8 Try again to fix the MSVC build
My last attempt in 39c959ef2 mistakenly added the missing file
conditionally, based on some unrelated condition.

Reported-by: Thomas Munro
Discussion: https://postgr.es/m/CA+hUKGLovvAXim9Fytn=jxks9s=JhP5=8Oyy0cbxGG-ggALJtg@mail.gmail.com
2023-11-04 15:41:16 +13:00
David Rowley 39c959ef25 Add missing unicode_category.c to MSVC build scripts
Fixes MSVC build failure introduced by a02b37fc0
2023-11-03 20:12:36 +13:00
Amit Kapila 29d0a77fa6 Migrate logical slots to the new node during an upgrade.
While reading information from the old cluster, a list of logical
slots is fetched. Later in the upgrade, pg_upgrade revisits the
list and restores slots by executing pg_create_logical_replication_slot()
on the new cluster. Migration of logical replication slots is only
supported when the old cluster is version 17.0 or later.

If the old node has invalid slots or slots with unconsumed WAL records,
pg_upgrade fails. These checks are needed to prevent data loss.
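
A hedged way to look for such slots before upgrading (pg_upgrade's own
checks are more thorough):

    SELECT slot_name, wal_status, confirmed_flush_lsn
    FROM pg_replication_slots
    WHERE slot_type = 'logical';
    -- wal_status = 'lost' indicates an invalidated slot.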

The significant advantage of this commit is that it makes it easy to
continue logical replication even after upgrading the publisher node.
Previously, pg_upgrade allowed copying publications to a new node. With
this patch, adjusting the connection string to the new publisher will
cause the apply worker on the subscriber to connect to the new publisher
automatically. This enables seamless continuation of logical replication,
even after an upgrade.

Author: Hayato Kuroda, Hou Zhijie
Reviewed-by: Peter Smith, Bharath Rupireddy, Dilip Kumar, Vignesh C, Shlok Kyal
Discussion: http://postgr.es/m/TYAPR01MB58664C81887B3AF2EB6B16E3F5939@TYAPR01MB5866.jpnprd01.prod.outlook.com
Discussion: http://postgr.es/m/CAA4eK1+t7xYcfa0rEQw839=b2MzsfvYDPz3xbD+ZqOdP3zpKYg@mail.gmail.com
2023-10-26 07:06:55 +05:30
Alexander Korotkov d3d55ce571 Remove useless self-joins
The Self Join Elimination (SJE) feature removes an inner join of a plain table
to itself in the query tree if it is proved that the join can be replaced
with a scan without impacting the query result.  The self-join and the
inner relation are replaced with the outer one in the query, equivalence
classes, and planner info structures. Also, the inner restrictlist moves
to the outer one, with duplicated clauses removed. Thus, this optimization
reduces the length of the range table list (this especially makes sense
for partitioned relations), reduces the number of restriction clauses and
hence of selectivity estimations, and potentially can improve total
planner prediction for the query.

The SJE proof is based on innerrel_is_unique machinery.

We can remove a self-join when for each outer row:
 1. At most one inner row matches the join clause.
 2. Each matched inner row must be (physically) the same row as the outer one.

In this patch we use the following approach to identify a self-join:
 1. Collect all merge-joinable join quals which look like a.x = b.x
 2. Add to the list above the baserestrictinfo of the inner table.
 3. Check innerrel_is_unique() for the qual list.  If it returns false, skip
    this pair of joining tables.
 4. Check uniqueness, proved by the baserestrictinfo clauses. To prove
    the possibility of self-join elimination, inner and outer clauses must
    have an exact match.
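
A sketch of a join the feature can remove (table assumed to have a
primary key on id):

    EXPLAIN (COSTS OFF)
    SELECT a1.* FROM tab a1 JOIN tab a2 ON a1.id = a2.id;
    -- With SJE, this may be planned as a single scan of tab (with an
    -- added id IS NOT NULL filter) instead of a self-join.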

The relation replacement procedure is not trivial, and it is partly
combined with the one used to remove useless left joins.  Tests covering
this feature were added to join.sql.  Some regression tests changed due
to the self-join removal logic.

Discussion: https://postgr.es/m/flat/64486b0b-0404-e39e-322d-0801154901f3%40postgrespro.ru
Author: Andrey Lepikhov, Alexander Kuzmenkov
Reviewed-by: Tom Lane, Robert Haas, Andres Freund, Simon Riggs, Jonathan S. Katz
Reviewed-by: David Rowley, Thomas Munro, Konstantin Knizhnik, Heikki Linnakangas
Reviewed-by: Hywel Carver, Laurenz Albe, Ronan Dunklau, vignesh C, Zhihong Yu
Reviewed-by: Greg Stark, Jaime Casanova, Michał Kłeczek, Alena Rybakina
Reviewed-by: Alexander Korotkov
2023-10-25 12:59:16 +03:00