postgresql

Commit Graph

Author	SHA1	Message	Date
Michael Paquier	f98dbdeb51	Add "ABI_compatibility" regions to wait_event_names.txt The current design behind the automatic generation of the C code and documentation related to wait events introduced in `fa88928470` does not offer a way to attach new wait events without breaking ABI compatibility, as all the events are forcibly reordered for each section in the input file wait_event_names.txt. Adding new wait events to stable branches is something that has happened in the past, `0b6517a3b7` being a recent example of that with VERSION_FILE_SYNC, so we need a way to generate any C code for wait events while maintaining compatibility on stable branches already released. This commit solves this issue by adding a new region called "ABI_compatibility" (keyword could be updated to something else if someone had a better idea) to each section of wait_event_names.txt, so as one can add new wait events to stable branches in wait_event_names.txt while keeping the code ABI-compatible. "ABI_compatibility" has no impact on the documentation generated: all the wait events of one section are still alphabetically ordered. LWLock and Lock sections generate their C code elsewhere, so they do not need an "ABI_compatibility" region. For example, let's imagine a wait_event_names.txt like that: Section: ClassName - Foo FOO_1 "Waiting in Foo 1" FOO_2 "Waiting in Foo 2" ABI_compatibility: NEW_FOO_1 "Waiting in New Foo 1" NEW_BAR_1 "Waiting in New Bar 1" This results in the following enum, where the events in the ABI region are listed last with the same ordering as in wait_event_names.txt: typedef enum { WAIT_EVENT_FOO_1, WAIT_EVENT_FOO_2, WAIT_EVENT_NEW_FOO_1, WAIT_EVENT_NEW_BAR_1 } WaitEventFoo; New wait events added in stable branches should be added at the end of each ABI_compatibility region, and ABI_compatibility should remain empty on HEAD and unreleased stable branches. This design has been suggested by Noah Misch and me. Reported-by: Noah Misch Author: Bertrand Drouvot Reviewed-by: Michael Paquier Discussion: https://postgr.es/m/20240317183114.16@rfd.leadboat.com	2024-04-05 08:56:52 +09:00
Alexander Korotkov	06c418e163	Implement pg_wal_replay_wait() stored procedure pg_wal_replay_wait() is to be used on standby and specifies waiting for the specific WAL location to be replayed before starting the transaction. This option is useful when the user makes some data changes on primary and needs a guarantee to see these changes on standby. The queue of waiters is stored in the shared memory array sorted by LSN. During replay of WAL waiters whose LSNs are already replayed are deleted from the shared memory array and woken up by setting of their latches. pg_wal_replay_wait() needs to wait without any snapshot held. Otherwise, the snapshot could prevent the replay of WAL records implying a kind of self-deadlock. This is why it is only possible to implement pg_wal_replay_wait() as a procedure working in a non-atomic context, not a function. Catversion is bumped. Discussion: https://postgr.es/m/eb12f9b03851bb2583adab5df9579b4b%40postgrespro.ru Author: Kartyshov Ivan, Alexander Korotkov Reviewed-by: Michael Paquier, Peter Eisentraut, Dilip Kumar, Amit Kapila Reviewed-by: Alexander Lakhin, Bharath Rupireddy, Euler Taveira	2024-04-02 22:48:03 +03:00
Masahiko Sawada	667e65aac3	Use TidStore for dead tuple TIDs storage during lazy vacuum. Previously, we used a simple array for storing dead tuple IDs during lazy vacuum, which had a number of problems: * The array used a single allocation and so was limited to 1GB. * The allocation was pessimistically sized according to table size. * Lookup with binary search was slow because of poor CPU cache and branch prediction behavior. This commit replaces that array with the TID store from commit `30e144287a`. Since the backing radix tree makes small allocations as needed, the 1GB limit is now gone. Further, the total memory used is now often smaller by an order of magnitude or more, depending on the distribution of blocks and offsets. These two features should make multiple rounds of heap scanning and index cleanup an extremely rare event. TID lookup during index cleanup is also several times faster, even more so when index order is correlated with heap tuple order. Since there is no longer a predictable relationship between the number of dead tuples vacuumed and the space taken up by their TIDs, the number of tuples no longer provides any meaningful insights for users, nor is the maximum number predictable. For that reason this commit also changes to byte-based progress reporting, with the relevant columns of pg_stat_progress_vacuum renamed accordingly to max_dead_tuple_bytes and dead_tuple_bytes. For parallel vacuum, both the TID store and supplemental information specific to vacuum are shared among the parallel vacuum workers. As with the previous array, we don't take any locks on TidStore during parallel vacuum since writes are still only done by the leader process. Bump catalog version. Reviewed-by: John Naylor, (in an earlier version) Dilip Kumar Discussion: https://postgr.es/m/CAD21AoAfOZvmfR0j8VmZorZjL7RhTiQdVttNuC4W-Shdc2a-AA%40mail.gmail.com	2024-04-02 10:15:37 +09:00
Daniel Gustafsson	697f8d266c	Revert "Add notBefore and notAfter to SSL cert info display" This reverts commit `6acb0a628e` since LibreSSL didn't support ASN1_TIME_diff until OpenBSD 7.1, leaving the older OpenBSD animals in the buildfarm complaining. Per plover in the buildfarm. Discussion: https://postgr.es/m/F0DF7102-192D-4C21-96AE-9A01AE153AD1@yesql.se	2024-03-22 22:58:41 +01:00
Daniel Gustafsson	6acb0a628e	Add notBefore and notAfter to SSL cert info display This adds the X509 attributes notBefore and notAfter to sslinfo as well as pg_stat_ssl to allow verifying and identifying the validity period of the current client certificate. OpenSSL has APIs for extracting notAfter and notBefore, but they are only supported in recent versions so we have to calculate the dates by hand in order to make this work for the older versions of OpenSSL that we still support. Original patch by Cary Huang with additional hacking by Jacob and myself. Author: Cary Huang <cary.huang@highgo.ca> Co-author: Jacob Champion <jacob.champion@enterprisedb.com> Co-author: Daniel Gustafsson <daniel@yesql.se> Discussion: https://postgr.es/m/182b8565486.10af1a86f158715.2387262617218380588@highgo.ca	2024-03-22 21:25:25 +01:00
Alvaro Herrera	da952b415f	Rework lwlocknames.txt to become lwlocklist.h This way, we can fold the list of lock names to occur in BuiltinTrancheNames instead of having its own separate array. This saves two lines of code in GetLWTrancheName and some space in BuiltinTrancheNames, as foreseen in commit `74a7306310`, as well as removing the need for a separate lwlocknames.c file. We still have to build lwlocknames.h using Perl code, which initially I wanted to avoid, but it gives us the chance to cross-check wait_event_names.txt. Discussion: https://postgr.es/m/202401231025.gbv4nnte5fmm@alvherre.pgsql	2024-03-20 11:55:20 +01:00
Peter Eisentraut	97d85be365	Make the order of the header file includes consistent Similar to commit `7e735035f2`. Author: Richard Guo <guofenglinux@gmail.com> Reviewed-by: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/CAMbWs4-WhpCFMbXCjtJ%2BFzmjfPrp7Hw1pk4p%2BZpU95Kh3ofZ1A%40mail.gmail.com	2024-03-13 15:07:00 +01:00
Michael Paquier	77cf6a78de	Add some asserts based on LWLockHeldByMe() for replication slot statistics Two assertions checking that ReplicationSlotAllocationLock is acquired are added to pgstat_create_replslot() and pgstat_drop_replslot(), corresponding to the routines in charge of the creation and the drop of replication slot statistics. The code previously relied on this assumption and documented it in comments, but did not enforce this policy at runtime. Reviewed-by: Bertrand Drouvot Discussion: https://postgr.es/m/Ze_p-hmD_yFeVYXg@paquier.xyz	2024-03-13 07:45:11 +09:00
Michael Paquier	b36fbd9f8d	Improve consistency of replication slot statistics The replication slot stats stored in shared memory rely on an internal index number. Both pgstat_reset_replslot() and pgstat_fetch_replslot() lacked some LWLock protections with ReplicationSlotControlLock while operating on these index numbers. This issue could cause these two functions to potentially operate on incorrect slots when taken in isolation in the event of slots dropped and/or re-created concurrently. Note that pg_stat_get_replication_slot() is called once per slot when querying pg_stat_replication_slots, meaning that the stats are retrieved across multiple ReplicationSlotControlLock acquisitions. So, while this commit improves more consistency, it may still be possible that statistics are not completely consistent for a single scan of pg_stat_replication_slots under concurrent replication slot drop or creation activity. The issue should unlikely be a problem in practice, causing the report of inconsistent stats or or the stats reset of an incorrect slot, so no backpatch is done. Author: Bertrand Drouvot Reviewed-by: Heikki Linnakangas, Shveta Malik, Michael Paquier Discussion: https://postgr.es/m/ZeGq1HDWFfLkjh4o@ip-10-97-1-34.eu-west-3.compute.internal	2024-03-11 10:25:01 +09:00
Amit Kapila	bf279ddd1c	Introduce a new GUC 'standby_slot_names'. This patch provides a way to ensure that physical standbys that are potential failover candidates have received and flushed changes before the primary server making them visible to subscribers. Doing so guarantees that the promoted standby server is not lagging behind the subscribers when a failover is necessary. The logical walsender now guarantees that all local changes are sent and flushed to the standby servers corresponding to the replication slots specified in 'standby_slot_names' before sending those changes to the subscriber. Additionally, the SQL functions pg_logical_slot_get_changes, pg_logical_slot_peek_changes and pg_replication_slot_advance are modified to ensure that they process changes for failover slots only after physical slots specified in 'standby_slot_names' have confirmed WAL receipt for those. Author: Hou Zhijie and Shveta Malik Reviewed-by: Masahiko Sawada, Peter Smith, Bertrand Drouvot, Ajin Cherian, Nisha Moond, Amit Kapila Discussion: https://postgr.es/m/514f6f2f-6833-4539-39f1-96cd1e011f23@enterprisedb.com	2024-03-08 08:10:45 +05:30
Jeff Davis	0984a3b851	Run pgindent again on the same file. Apparently, pgindent got confused by the double space. The first time I ran it, it moved the function name to the next line. The second time I ran it, it moved the function name back, but without the double space. Now the results appear stable.	2024-03-05 11:16:23 -08:00
Jeff Davis	b406af1806	Run pgindent for commit `ef4cfdce0e`.	2024-03-05 10:58:24 -08:00
Heikki Linnakangas	ef4cfdce0e	Fix references to renamed function in comments I renamed the function in commit `024c521117`, but missed these comments. Reported-by: Richard Guo Discussion: https://www.postgresql.org/message-id/CAMbWs4-jR6qc7JRMKwz-zXQy_AYLUZ3PHjGep4B91of321cqWw@mail.gmail.com	2024-03-05 18:23:58 +02:00
Heikki Linnakangas	55cdba2647	Fix a leftover reference to backend_id in comment Commit `024c521117` replaced backend_id with proc_number. Reported-by: Alexander Lakhin	2024-03-05 09:15:02 +02:00
Peter Eisentraut	dbbca2cf29	Remove unused #include's from backend .c files as determined by include-what-you-use (IWYU) While IWYU also suggests to add a bunch of #include's (which is its main purpose), this patch does not do that. In some cases, a more specific #include replaces another less specific one. Some manual adjustments of the automatic result: - IWYU currently doesn't know about includes that provide global variable declarations (like -Wmissing-variable-declarations), so those includes are being kept manually. - All includes for port(ability) headers are being kept for now, to play it safe. - No changes of catalog/pg_foo.h to catalog/pg_foo_d.h, to keep the patch from exploding in size. Note that this patch touches just *.c files, so nothing declared in header files changes in hidden ways. As a small example, in src/backend/access/transam/rmgr.c, some IWYU pragma annotations are added to handle a special case there. Discussion: https://www.postgresql.org/message-id/flat/af837490-6b2f-46df-ba05-37ea6a6653fc%40eisentraut.org	2024-03-04 12:02:20 +01:00
Heikki Linnakangas	393b5599e5	Use MyBackendType in more places to check what process this is Remove IsBackgroundWorker, IsAutoVacuumLauncherProcess(), IsAutoVacuumWorkerProcess(), and IsLogicalSlotSyncWorker() in favor of new AmProcess() macros that use MyBackendType. For consistency with the existing AmProcess() macros. Reviewed-by: Andres Freund Discussion: https://www.postgresql.org/message-id/f3ecd4cb-85ee-4e54-8278-5fabfb3a4ed0@iki.fi	2024-03-04 10:25:12 +02:00
Heikki Linnakangas	024c521117	Replace BackendIds with 0-based ProcNumbers Now that BackendId was just another index into the proc array, it was redundant with the 0-based proc numbers used in other places. Replace all usage of backend IDs with proc numbers. The only place where the term "backend id" remains is in a few pgstat functions that expose backend IDs at the SQL level. Those IDs are now in fact 0-based ProcNumbers too, but the documentation still calls them "backend ids". That term still seems appropriate to describe what the numbers are, so I let it be. One user-visible effect is that pg_temp_0 is now a valid temp schema name, for backend with ProcNumber 0. Reviewed-by: Andres Freund Discussion: https://www.postgresql.org/message-id/8171f1aa-496f-46a6-afc3-c46fe7a9b407@iki.fi	2024-03-03 19:38:22 +02:00
Heikki Linnakangas	ab355e3a88	Redefine backend ID to be an index into the proc array Previously, backend ID was an index into the ProcState array, in the shared cache invalidation manager (sinvaladt.c). The entry in the ProcState array was reserved at backend startup by scanning the array for a free entry, and that was also when the backend got its backend ID. Things become slightly simpler if we redefine backend ID to be the index into the PGPROC array, and directly use it also as an index to the ProcState array. This uses a little more memory, as we reserve a few extra slots in the ProcState array for aux processes that don't need them, but the simplicity is worth it. Aux processes now also have a backend ID. This simplifies the reservation of BackendStatusArray and ProcSignal slots. You can now convert a backend ID into an index into the PGPROC array simply by subtracting 1. We still use 0-based "pgprocnos" in various places, for indexes into the PGPROC array, but the only difference now is that backend IDs start at 1 while pgprocnos start at 0. (The next commmit will get rid of the term "backend ID" altogether and make everything 0-based.) There is still a 'backendId' field in PGPROC, now part of 'vxid' which encapsulates the backend ID and local transaction ID together. It's needed for prepared xacts. For regular backends, the backendId is always equal to pgprocno + 1, but for prepared xact PGPROC entries, it's the ID of the original backend that processed the transaction. Reviewed-by: Andres Freund, Reid Thompson Discussion: https://www.postgresql.org/message-id/8171f1aa-496f-46a6-afc3-c46fe7a9b407@iki.fi	2024-03-03 19:37:28 +02:00
Alvaro Herrera	53c2a97a92	Improve performance of subsystems on top of SLRU More precisely, what we do here is make the SLRU cache sizes configurable with new GUCs, so that sites with high concurrency and big ranges of transactions in flight (resp. multixacts/subtransactions) can benefit from bigger caches. In order for this to work with good performance, two additional changes are made: 1. the cache is divided in "banks" (to borrow terminology from CPU caches), and algorithms such as eviction buffer search only affect one specific bank. This forestalls the problem that linear searching for a specific buffer across the whole cache takes too long: we only have to search the specific bank, whose size is small. This work is authored by Andrey Borodin. 2. Change the locking regime for the SLRU banks, so that each bank uses a separate LWLock. This allows for increased scalability. This work is authored by Dilip Kumar. (A part of this was previously committed as d172b717c6f4.) Special care is taken so that the algorithms that can potentially traverse more than one bank release one bank's lock before acquiring the next. This should happen rarely, but particularly clog.c's group commit feature needed code adjustment to cope with this. I (Álvaro) also added lots of comments to make sure the design is sound. The new GUCs match the names introduced by `bcdfa5f2e2` in the pg_stat_slru view. The default values for these parameters are similar to the previous sizes of each SLRU. commit_ts, clog and subtrans accept value 0, which means to adjust by dividing shared_buffers by 512 (so 2MB for every 1GB of shared_buffers), with a cap of 8MB. (A new slru.c function SimpleLruAutotuneBuffers() was added to support this.) The cap was previously 1MB for clog, so for sites with more than 512MB of shared memory the total memory used increases, which is likely a good tradeoff. However, other SLRUs (notably multixact ones) retain smaller sizes and don't support a configured value of 0. These values based on shared_buffers may need to be revisited, but that's an easy change. There was some resistance to adding these new GUCs: it would be better to adjust to memory pressure automatically somehow, for example by stealing memory from shared_buffers (where the caches can grow and shrink naturally). However, doing that seems to be a much larger project and one which has made virtually no progress in several years, and because this is such a pain point for so many users, here we take the pragmatic approach. Author: Andrey Borodin <x4mmm@yandex-team.ru> Author: Dilip Kumar <dilipbalaut@gmail.com> Reviewed-by: Amul Sul, Gilles Darold, Anastasia Lubennikova, Ivan Lazarev, Robert Haas, Thomas Munro, Tomas Vondra, Yura Sokolov, Васильев Дмитрий (Dmitry Vasiliev). Discussion: https://postgr.es/m/2BEC2B3F-9B61-4C1D-9FB5-5FAB0F05EF86@yandex-team.ru Discussion: https://postgr.es/m/CAFiTN-vzDvNz=ExGXz6gdyjtzGixKSqs0mKHMmaQ8sOSEFZ33A@mail.gmail.com	2024-02-28 17:05:31 +01:00
Nathan Bossart	42a1de3013	Add helper functions for dshash tables with string keys. Presently, string keys are not well-supported for dshash tables. The dshash code always copies key_size bytes into new entries' keys, and dshash.h only provides compare and hash functions that forward to memcmp() and tag_hash(), both of which do not stop at the first NUL. This means that callers must pad string keys so that the data beyond the first NUL does not adversely affect the results of copying, comparing, and hashing the keys. To better support string keys in dshash tables, this commit does a couple things: * A new copy_function field is added to the dshash_parameters struct. This function pointer specifies how the key should be copied into new table entries. For example, we only want to copy up to the first NUL byte for string keys. A dshash_memcpy() helper function is provided and used for all existing in-tree dshash tables without string keys. * A set of helper functions for string keys are provided. These helper functions forward to strcmp(), strcpy(), and string_hash(), all of which ignore data beyond the first NUL. This commit also adjusts the DSM registry's dshash table to use the new helper functions for string keys. Reviewed-by: Andy Fan Discussion: https://postgr.es/m/20240119215941.GA1322079%40nathanxps13	2024-02-26 15:47:13 -06:00
Nathan Bossart	5fe08c006c	Use NULL instead of 0 for 'arg' argument in dshash_create() calls. A couple of dshash_create() callers provide 0 for the 'void *arg' argument, which might give readers the incorrect impression that this is some sort of "flags" parameter. Reviewed-by: Andy Fan Discussion: https://postgr.es/m/20240119215941.GA1322079%40nathanxps13	2024-02-26 15:46:01 -06:00
Amit Kapila	93db6cbda0	Add a new slot sync worker to synchronize logical slots. By enabling slot synchronization, all the failover logical replication slots on the primary (assuming configurations are appropriate) are automatically created on the physical standbys and are synced periodically. The slot sync worker on the standby server pings the primary server at regular intervals to get the necessary failover logical slots information and create/update the slots locally. The slots that no longer require synchronization are automatically dropped by the worker. The nap time of the worker is tuned according to the activity on the primary. The slot sync worker waits for some time before the next synchronization, with the duration varying based on whether any slots were updated during the last cycle. A new parameter sync_replication_slots enables or disables this new process. On promotion, the slot sync worker is shut down by the startup process to drop any temporary slots acquired by the slot sync worker and to prevent the worker from trying to fetch the failover slots. A functionality to allow logical walsenders to wait for the physical will be done in a subsequent commit. Author: Shveta Malik, Hou Zhijie based on design inputs by Masahiko Sawada and Amit Kapila Reviewed-by: Masahiko Sawada, Bertrand Drouvot, Peter Smith, Dilip Kumar, Ajin Cherian, Nisha Moond, Kuroda Hayato, Amit Kapila Discussion: https://postgr.es/m/514f6f2f-6833-4539-39f1-96cd1e011f23@enterprisedb.com	2024-02-22 15:25:15 +05:30
Noah Misch	0b6517a3b7	Sync PG_VERSION file in CREATE DATABASE. An OS crash could leave PG_VERSION empty or missing. The same symptom appeared in a backup by block device snapshot, taken after the next checkpoint and before the OS flushes the PG_VERSION blocks. Device snapshots are not a documented backup method, however. Back-patch to v15, where commit `9c08aea6a3` introduced STRATEGY=WAL_LOG and made it the default. Discussion: https://postgr.es/m/20240130195003.0a.nmisch@google.com	2024-02-01 13:44:19 -08:00
Michael Paquier	235c09efbb	Fix stats_fetch_consistency with stats for fixed-numbered objects This impacts the statistics retrieved in transactions for the following views when updating the value of stats_fetch_consistency, leading to behaviors contrary to what is documented since `605994651b` as an update of this parameter should discard all statistics snapshot data: - pg_stat_archiver - pg_stat_bgwriter - pg_stat_checkpointer - pg_stat_io - pg_stat_slru - pg_stat_wal For example, updating stats_fetch_consistency from "snapshot" to "cache" in a transaction did not re-fetch any fresh data, using data cached from the time when "snapshot" was in use. Author: Shinya Kato Discussion: https://postgr.es/m/d77fc5190d4dbe1738d77231488e768b@oss.nttdata.com Backpatch-through: 15	2024-02-01 17:12:50 +09:00
Alvaro Herrera	7b745d85b8	Split use of SerialSLRULock, creating SerialControlLock predicate.c has been using SerialSLRULock (the control lock for its SLRU structure) to coordinate access to SerialControlData, another of its numerous shared memory structures; this is unnecessary and confuses further SLRU scalability work. Create a separate LWLock to cover SerialControlData. Extracted from a larger patch from the same author, and some additional changes by Álvaro. Author: Dilip Kumar <dilip.kumar@enterprisedb.com> Discussion: https://postgr.es/m/CAFiTN-vzDvNz=ExGXz6gdyjtzGixKSqs0mKHMmaQ8sOSEFZ33A@mail.gmail.com	2024-01-30 18:11:17 +01:00
Michael Paquier	b199eb89c6	Fix some typos Author: Yongtao Huang Discussion: https://postgr.es/m/CAOe1Go1F99o5JsphtXdDC5bxm7AzetU8q3AxLh4AAVGKu1AzEQ@mail.gmail.com	2024-01-22 13:55:25 +09:00
Michael Paquier	d86d20f0ba	Add backend support for injection points Injection points are a new facility that makes possible for developers to run custom code in pre-defined code paths. Its goal is to provide ways to design and run advanced tests, for cases like: - Race conditions, where processes need to do actions in a controlled ordered manner. - Forcing a state, like an ERROR, FATAL or even PANIC for OOM, to force recovery, etc. - Arbitrary sleeps. This implements some basics, and there are plans to extend it more in the future depending on what's required. Hence, this commit adds a set of routines in the backend that allows developers to attach, detach and run injection points: - A code path calling an injection point can be declared with the macro INJECTION_POINT(name). - InjectionPointAttach() and InjectionPointDetach() to respectively attach and detach a callback to/from an injection point. An injection point name is registered in a shmem hash table with a library name and a function name, which will be used to load the callback attached to an injection point when its code path is run. Injection point names are just strings, so as an injection point can be declared and run by out-of-core extensions and modules, with callbacks defined in external libraries. This facility is hidden behind a dedicated switch for ./configure and meson, disabled by default. Note that backends use a local cache to store callbacks already loaded, cleaning up their cache if a callback has found to be removed on a best-effort basis. This could be refined further but any tests but what we have here was fine with the tests I've written while implementing these backend APIs. Author: Michael Paquier, with doc suggestions from Ashutosh Bapat. Reviewed-by: Ashutosh Bapat, Nathan Bossart, Álvaro Herrera, Dilip Kumar, Amul Sul, Nazir Bilal Yavuz Discussion: https://postgr.es/m/ZTiV8tn_MIb_H2rE@paquier.xyz	2024-01-22 10:15:50 +09:00
Nathan Bossart	8b2bcf3f28	Introduce the dynamic shared memory registry. Presently, the most straightforward way for a shared library to use shared memory is to request it at server startup via a shmem_request_hook, which requires specifying the library in shared_preload_libraries. Alternatively, the library can create a dynamic shared memory (DSM) segment, but absent a shared location to store the segment's handle, other backends cannot use it. This commit introduces a registry for DSM segments so that these other backends can look up existing segments with a library-specified string. This allows libraries to easily use shared memory without needing to request it at server startup. The registry is accessed via the new GetNamedDSMSegment() function. This function handles allocating the segment and initializing it via a provided callback. If another backend already created and initialized the segment, it simply attaches the segment. GetNamedDSMSegment() locks the registry appropriately to ensure that only one backend initializes the segment and that all other backends just attach it. The registry itself is comprised of a dshash table that stores the DSM segment handles keyed by a library-specified string. Reviewed-by: Michael Paquier, Andrei Lepikhov, Nikita Malakhov, Robert Haas, Bharath Rupireddy, Zhang Mingli, Amul Sul Discussion: https://postgr.es/m/20231205034647.GA2705267%40nathanxps13	2024-01-19 14:24:36 -06:00
Heikki Linnakangas	2b53a462cf	Fix incorrect comment on how BackendStatusArray is indexed The comment was copy-pasted from the call to ProcSignalInit() in AuxiliaryProcessMain(), which uses a similar scheme of having reserved slots for aux processes after MaxBackends slots for backends. However, ProcSignalInit() indexing starts from 1, whereas BackendStatusArray starts from 0. The code is correct, but the comment was wrong. Discussion: https://www.postgresql.org/message-id/f3ecd4cb-85ee-4e54-8278-5fabfb3a4ed0@iki.fi Backpatch-through: v14	2024-01-17 15:44:10 +02:00
Nathan Bossart	5b1b9bce84	Cross-check lists of predefined LWLocks. Both lwlocknames.txt and wait_event_names.txt contain a list of all the predefined LWLocks, i.e., those with predefined positions within MainLWLockArray. It is easy to miss one or the other, especially since the list in wait_event_names.txt omits the "Lock" suffix from all the LWLock wait events. This commit adds a cross- check of these lists to the script that generates lwlocknames.h. If the lists do not match exactly, building will fail. Suggested-by: Robert Haas Reviewed-by: Robert Haas, Michael Paquier, Bertrand Drouvot Discussion: https://postgr.es/m/20240102173120.GA1061678%40nathanxps13	2024-01-09 11:05:19 -06:00
Bruce Momjian	29275b1d17	Update copyright for 2024 Reported-by: Michael Paquier Discussion: https://postgr.es/m/ZZKTDPxBBMt3C0J9@paquier.xyz Backpatch-through: 12	2024-01-03 20:49:05 -05:00
Robert Haas	371b07e894	Remove Lock suffix from WALSummarizerLock in wait_event_names.txt Nathan Bossart Discussion: https://postgr.es/m/20240102173120.GA1061678@nathanxps13	2024-01-02 13:17:23 -05:00
Robert Haas	5c430f9dc5	Add WALSummarizerLock to wait_event_names.txt Per report from Nathan Bossart. Discussion: http://postgr.es/m/20231227153647.GA601861@nathanxps13	2024-01-02 10:31:49 -05:00
Peter Eisentraut	c538592959	Make all Perl warnings fatal There are a lot of Perl scripts in the tree, mostly code generation and TAP tests. Occasionally, these scripts produce warnings. These are probably always mistakes on the developer side (true positives). Typical examples are warnings from genbki.pl or related when you make a mess in the catalog files during development, or warnings from tests when they massage a config file that looks different on different hosts, or mistakes during merges (e.g., duplicate subroutine definitions), or just mistakes that weren't noticed because there is a lot of output in a verbose build. This changes all warnings into fatal errors, by replacing use warnings; by use warnings FATAL => 'all'; in all Perl files. Discussion: https://www.postgresql.org/message-id/flat/06f899fd-1826-05ab-42d6-adeb1fd5e200%40eisentraut.org	2023-12-29 18:20:00 +01:00
Alexander Korotkov	12915a58ee	Enhance checkpointer restartpoint statistics Bhis commit introduces enhancements to the pg_stat_checkpointer view by adding three new columns: restartpoints_timed, restartpoints_req, and restartpoints_done. These additions aim to improve the visibility and monitoring of restartpoint processes on replicas. Previously, it was challenging to differentiate between successful and failed restartpoint requests. This limitation arises because restartpoints on replicas are dependent on checkpoint records from the primary, and cannot occur more frequently than these checkpoints. The new columns allow for clear distinction and tracking of restartpoint requests, their triggers, and successful completions. This enhancement aids database administrators and developers in better understanding and diagnosing issues related to restartpoint behavior, particularly in scenarios where restartpoint requests may fail. System catalog is changed. Catversion is bumped. Discussion: https://postgr.es/m/99b2ccd1-a77a-962a-0837-191cdf56c2b9%40inbox.ru Author: Anton A. Melnikov Reviewed-by: Kyotaro Horiguchi, Alexander Korotkov	2023-12-25 01:12:36 +02:00
Robert Haas	174c480508	Add a new WAL summarizer process. When active, this process writes WAL summary files to $PGDATA/pg_wal/summaries. Each summary file contains information for a certain range of LSNs on a certain TLI. For each relation, it stores a "limit block" which is 0 if a relation is created or destroyed within a certain range of WAL records, or otherwise the shortest length to which the relation was truncated during that range of WAL records, or otherwise InvalidBlockNumber. In addition, it stores a list of blocks which have been modified during that range of WAL records, but excluding blocks which were removed by truncation after they were modified and never subsequently modified again. In other words, it tells us which blocks need to copied in case of an incremental backup covering that range of WAL records. But this doesn't yet add the capability to actually perform an incremental backup; the next patch will do that. A new parameter summarize_wal enables or disables this new background process. The background process also automatically deletes summary files that are older than wal_summarize_keep_time, if that parameter has a non-zero value and the summarizer is configured to run. Patch by me, with some design help from Dilip Kumar and Andres Freund. Reviewed by Matthias van de Meent, Dilip Kumar, Jakub Wartak, Peter Eisentraut, and Álvaro Herrera. Discussion: http://postgr.es/m/CA+TgmoYOYZfMCyOXFyC-P+-mdrZqm5pP2N7S-r0z3_402h9rsA@mail.gmail.com	2023-12-20 08:42:28 -05:00
Michael Paquier	3c9d9acae0	Refactor pgstat_prepare_io_time() with an input argument instead of a GUC Originally, this routine relied on track_io_timing to check if a time interval for an I/O operation stored in pg_stat_io should be initialized or not. However, the addition of WAL statistics to pg_stat_io requires that the initialization happens when track_wal_io_timing is enabled, which is dependent on the code path where the I/O operation happens. Author: Nazir Bilal Yavuz Discussion: https://postgr.es/m/CAN55FZ3AiQ+ZMxUuXnBpd0Rrh1YhwJ5FudkHg=JU0P+-W8T4Vg@mail.gmail.com	2023-12-16 20:16:20 +01:00
Peter Eisentraut	721856ff24	Remove distprep A PostgreSQL release tarball contains a number of prebuilt files, in particular files produced by bison, flex, perl, and well as html and man documentation. We have done this consistent with established practice at the time to not require these tools for building from a tarball. Some of these tools were hard to get, or get the right version of, from time to time, and shipping the prebuilt output was a convenience to users. Now this has at least two problems: One, we have to make the build system(s) work in two modes: Building from a git checkout and building from a tarball. This is pretty complicated, but it works so far for autoconf/make. It does not currently work for meson; you can currently only build with meson from a git checkout. Making meson builds work from a tarball seems very difficult or impossible. One particular problem is that since meson requires a separate build directory, we cannot make the build update files like gram.h in the source tree. So if you were to build from a tarball and update gram.y, you will have a gram.h in the source tree and one in the build tree, but the way things work is that the compiler will always use the one in the source tree. So you cannot, for example, make any gram.y changes when building from a tarball. This seems impossible to fix in a non-horrible way. Second, there is increased interest nowadays in precisely tracking the origin of software. We can reasonably track contributions into the git tree, and users can reasonably track the path from a tarball to packages and downloads and installs. But what happens between the git tree and the tarball is obscure and in some cases non-reproducible. The solution for both of these issues is to get rid of the step that adds prebuilt files to the tarball. The tarball now only contains what is in the git tree (). Getting the additional build dependencies is no longer a problem nowadays, and the complications to keep these dual build modes working are significant. And of course we want to get the meson build system working universally. This commit removes the make distprep target altogether. The make dist target continues to do its job, it just doesn't call distprep anymore. () - The tarball also contains the INSTALL file that is built at make dist time, but not by distprep. This is unchanged for now. The make maintainer-clean target, whose job it is to remove the prebuilt files in addition to what make distclean does, is now just an alias to make distprep. (In practice, it is probably obsolete given that git clean is available.) The following programs are now hard build requirements in configure (they were already required by meson.build): - bison - flex - perl Reviewed-by: Michael Paquier <michael@paquier.xyz> Reviewed-by: Andres Freund <andres@anarazel.de> Discussion: https://www.postgresql.org/message-id/flat/e07408d9-e5f2-d9fd-5672-f53354e9305e@eisentraut.org	2023-11-06 15:18:04 +01:00
Michael Paquier	96f052613f	Introduce pg_stat_checkpointer Historically, the statistics of the checkpointer have been always part of pg_stat_bgwriter. This commit removes a few columns from pg_stat_bgwriter, and introduces pg_stat_checkpointer with equivalent, renamed columns (plus a new one for the reset timestamp): - checkpoints_timed -> num_timed - checkpoints_req -> num_requested - checkpoint_write_time -> write_time - checkpoint_sync_time -> sync_time - buffers_checkpoint -> buffers_written The fields of PgStat_CheckpointerStats and its SQL functions are renamed to match with the new field names, for consistency. Note that background writer and checkpointer have been split into two different processes in commits `806a2aee37` and `bf405ba8e4`. The pgstat structures were already split, making this change straight-forward. Bump catalog version. Author: Bharath Rupireddy Reviewed-by: Bertrand Drouvot, Andres Freund, Michael Paquier Discussion: https://postgr.es/m/CALj2ACVxX2ii=66RypXRweZe2EsBRiPMj0aHfRfHUeXJcC7kHg@mail.gmail.com	2023-10-30 09:47:16 +09:00
Michael Paquier	bf01e1ba96	Refactor some code related to transaction-level statistics for relations This commit refactors find_tabstat_entry() so as transaction counters for inserted, updated and deleted tuples are included in the result returned. If a shared entry is found for a relation, its result is now a copy of the PgStat_TableStatus entry retrieved from shared memory. This idea has been proposed by Andres Freund. While on it, the following SQL functions, used in system views, are refactored with macros, in the same spirit as `83a1a1b566`, reducing the amount of code: - pg_stat_get_xact_tuples_deleted() - pg_stat_get_xact_tuples_inserted() - pg_stat_get_xact_tuples_updated() There is now only one caller of find_tabstat_entry() in the tree. Author: Bertrand Drouvot Discussion: https://postgr.es/m/b9e1f543-ee93-8168-d530-d961708ad9d3@gmail.com	2023-10-30 08:23:39 +09:00
Michael Paquier	74604a37f2	Remove buffers_backend and buffers_backend_fsync from pg_stat_checkpointer Two attributes related to checkpointer statistics are removed in this commit: - buffers_backend, that counts the number of buffers written directly by a backend. - buffers_backend_fsync, that counts the number of times a backend had to do fsync() by its own. These are actually not checkpointer properties but backend properties. Also, pg_stat_io provides a more accurate and equivalent report of these numbers, by tracking all the I/O stats related to backends, including writes and fsyncs, so storing them in pg_stat_checkpointer was redundant. Thanks also to Robert Haas and Amit Kapila for their input. Bump catalog version. Author: Bharath Rupireddy Reviewed-by: Bertrand Drouvot, Andres Freund Discussion: https://postgr.es/m/20230210004604.mcszbscsqs3bc5nx@awork3.anarazel.de	2023-10-27 11:16:39 +09:00
Michael Paquier	9972c7de1d	Fix typos in wait_event.c Noticed while working on a different patch. Introduced in `af720b4c50`.	2023-10-24 08:05:29 +09:00
Michael Paquier	295c36c0c1	Add local_blk_{read\|write}_time I/O timing statistics for local blocks There was no I/O timing statistics for counting read and write timings on local blocks, contrary to the counterparts for temp and shared blocks. This information is available when track_io_timing is enabled. The output of EXPLAIN is updated to show this information. An update of pg_stat_statements is planned next. Author: Nazir Bilal Yavuz Reviewed-by: Robert Haas, Melanie Plageman Discussion: https://postgr.es/m/CAN55FZ19Ss279mZuqGbuUNxka0iPbLgYuOQXqAKewrjNrp27VA@mail.gmail.com	2023-10-19 13:39:38 +09:00
Michael Paquier	13d00729d4	Rename I/O timing statistics columns to shared_blk_{read\|write}_time These two counters, defined in BufferUsage to track respectively the time spent while reading and writing blocks have historically only tracked data related to shared buffers, when track_io_timing is enabled. An upcoming patch to add specific counters for local buffers will take advantage of this rename as it has come up that no data is currently tracked for local buffers, and tracking local and shared buffers using the same fields would be inconsistent with the treatment done for temp buffers. Renaming the existing fields clarifies what the block type of each stats field is. pg_stat_statement is updated to reflect the rename. No extension version bump is required as `5a3423ad8e` has done one, affecting v17~. Author: Nazir Bilal Yavuz Reviewed-by: Robert Haas, Melanie Plageman Discussion: https://postgr.es/m/CAN55FZ19Ss279mZuqGbuUNxka0iPbLgYuOQXqAKewrjNrp27VA@mail.gmail.com	2023-10-19 11:26:40 +09:00
Michael Paquier	d17ffc734d	Count write times when extending relation files for shared buffers Relation files extended by multiple blocks at a time have been counting the number of blocks written, but forgot to increment the write time in this case, as single-block write and relation extension are treated as two different I/O operations in the shared stats: IOOP_EXTEND vs IOOP_WRITE. In this case IOOP_EXTEND was forgotten for normal (non-temporary) relations, still the number of blocks written was incremented according to the relation extend done. Write times are tracked when track_io_timing is enabled, which is not the case by default. Author: Nazir Bilal Yavuz Reviewed-by: Robert Haas, Melanie Plageman Discussion: https://postgr.es/m/CAN55FZ19Ss279mZuqGbuUNxka0iPbLgYuOQXqAKewrjNrp27VA@mail.gmail.com Backpatch-through: 16	2023-10-18 14:54:33 +09:00
Thomas Munro	0013ba290b	Add wait events for checkpoint delay mechanism. When MyProc->delayChkptFlags is set to temporarily block phase transitions in a concurrent checkpoint, the checkpointer enters a sleep-poll loop to wait for the flag to be cleared. We should show that as a wait event in the pg_stat_activity view. Reviewed-by: Robert Haas <robertmhaas@gmail.com> Reviewed-by: Michael Paquier <michael@paquier.xyz> Discussion: https://postgr.es/m/CA%2BhUKGL7Whi8iwKbzkbn_1fixH3Yy8aAPz7mfq6Hpj7FeJrKMg%40mail.gmail.com	2023-10-13 16:43:22 +13:00
Michael Paquier	a956bd3fa9	Avoid memory size overflow when allocating backend activity buffer The code in charge of copying the contents of PgBackendStatus to local memory could fail on memory allocation because of an overflow on the amount of memory to use. The overflow can happen when combining a high value track_activity_query_size (max at 1MB) with a large max_connections, when both multiplied get higher than INT32_MAX as both parameters treated as signed integers. This could for example trigger with the following functions, all calling pgstat_read_current_status(): - pg_stat_get_backend_subxact() - pg_stat_get_backend_idset() - pg_stat_get_progress_info() - pg_stat_get_activity() - pg_stat_get_db_numbackends() The change to use MemoryContextAllocHuge() has been introduced in `8d0ddccec6`, so backpatch down to 12. Author: Jakub Wartak Discussion: https://postgr.es/m/CAKZiRmw8QSNVw2qNK-dznsatQqz+9DkCquxP0GHbbv1jMkGHMA@mail.gmail.com Backpatch-through: 12	2023-10-03 15:37:00 +09:00
Michael Paquier	e221c0befb	Fix behavior of "force" in pgstat_report_wal() As implemented in `5891c7a8ed`, setting "force" to true in pgstat_report_wal() causes the routine to not wait for the pgstat shmem lock if it cannot be acquired, in which case the WAL and I/O statistics finish by not being flushed. The origin of the confusion comes from pgstat_flush_wal() and pgstat_flush_io(), that use "nowait" as sole argument. The I/O stats are new in v16. This is the opposite behavior of what has been used in pgstat_report_stat(), where "force" is the opposite of "nowait". In this case, when "force" is true, the routine sets "nowait" to false, which would cause the routine to wait for the pgstat shmem lock, ensuring that the stats are always flushed. When "force" is false, "nowait" is set to true, and the stats would only not be flushed if the pgstat shmem lock can be acquired, returning immediately without flushing the stats if the lock cannot be acquired. This commit changes pgstat_report_wal() so as "force" has the same behavior as in pgstat_report_stat(). There are currently three callers of pgstat_report_wal(): - Two in the checkpointer where force=true during a shutdown and the main checkpointer loop. Now the code behaves so as the stats are always flushed. - One in the main loop of the bgwriter, where force=false. Now the code behaves so as the stats would not be flushed if the pgstat shmem lock could not be acquired. Before this commit, some stats on WAL and I/O could have been lost after a shutdown, for example. Reported-by: Ryoga Yoshida Author: Ryoga Yoshida, Michael Paquier Discussion: https://postgr.es/m/f87a4d7be70530606b864fd1df91718c@oss.nttdata.com Backpatch-through: 15	2023-09-26 09:29:47 +09:00
Michael Paquier	59cbf60c0f	Remove column for wait event names in wait_event_names.txt This file is now made of two columns, removing the column listing the user-visible strings used in the system views and the documentation: - Enum definitions for each class without the prefix "WAIT_EVENT_", so as this information can be grepped in the code and wait_event_names.txt at the same time. - Description in the documentation. The wait event names are now generated from the enum objects in CamelCase, with the underscores removed. The data generated for wait events is consistent with what was produced by `414f6c0fb7`. This has the advantage to remove WAIT_EVENT_DOCONLY, which was a placeholder for the wait event types Lock and LWLock as these two only require the generation of the documentation. Reviewed-by: Bertrand Drouvot Discussion: https://postgr.es/m/ZOxVHQwEC/9X/p/z@paquier.xyz	2023-09-06 10:27:02 +09:00
Michael Paquier	414f6c0fb7	Use more consistent names for wait event objects and types The event names use the same case-insensitive characters, hence applying lower() or upper() to the monitoring queries allows the detection of the same events as before this change. It is possible to cross-check the data with the system view pg_wait_events, for instance, with a query like that showing no differences: SELECT lower(type), lower(name), description FROM pg_wait_events ORDER BY 1, 2; This will help in the introduction of more simplifications in the format of wait_event_names. Some of the enum values in the code had to be renamed a bit to follow the same convention naming across the board. Reviewed-by: Bertrand Drouvot Discussion: https://postgr.es/m/ZOxVHQwEC/9X/p/z@paquier.xyz	2023-09-06 10:04:43 +09:00

1 2 3 4

167 Commits