Commit Graph

30 Commits

Author SHA1 Message Date
Alexander Korotkov 772faafca1 Revert: Implement pg_wal_replay_wait() stored procedure
This commit reverts 06c418e163, e37662f221, bf1e650806, 25f42429e2,
ee79928441, and 74eaf66f98 per review by Heikki Linnakangas.

Discussion: https://postgr.es/m/b155606b-e744-4218-bda5-29379779da1a%40iki.fi
2024-04-11 17:28:15 +03:00
Alvaro Herrera a0e0fb1ba5
Use condition variable to wait for next MultiXact offset
In one multixact.c edge case, we need a mechanism to wait for one
multixact offset to be written before being allowed to read the next
one.  We used to handle this case by sleeping for one millisecond and
retrying, but such sleeps have been reported as problematic in
production cases.  We can avoid the problem by using a condition
variable: readers sleep on it and then every creator of multixacts
broadcasts into the CV when creation is sufficiently far along.
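
As a minimal sketch of the pattern, using the condition variable API
(the variable, helper, and wait event names here are hypothetical, not
the ones actually used in multixact.c):
static ConditionVariable *OffsetsCV;    /* lives in shared memory */

/* Reader: sleep until the offset we need has been written. */
ConditionVariablePrepareToSleep(OffsetsCV);
while (!offset_is_written(multi))       /* hypothetical check */
    ConditionVariableSleep(OffsetsCV, WAIT_EVENT_MULTIXACT_CREATION);
ConditionVariableCancelSleep();

/* Creator: wake all readers once creation is sufficiently far along. */
ConditionVariableBroadcast(OffsetsCV);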

Author: Kyotaro Horiguchi <horikyotajntt@gmail.com>
Reviewed-by: Andrey Borodin <amborodin@acm.org>
Discussion: https://postgr.es/m/47A598F4-B4E7-4029-8FEC-A06A6C3CB4B5@yandex-team.ru
Discussion: https://postgr.es/m/20200515.090333.24867479329066911.horikyota.ntt
2024-04-07 20:33:45 +02:00
Alexander Korotkov 25f42429e2 Use an LWLock instead of a spinlock in waitlsn.c
This should prevent busy-waiting when the number of waiting processes is high.
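
A minimal sketch, assuming a dedicated WaitLSNLock and a hypothetical
helper; unlike a spinlock, a contended LWLock puts the backend to sleep
instead of busy-looping:
LWLockAcquire(WaitLSNLock, LW_EXCLUSIVE);
addLSNWaiter(target_lsn);   /* hypothetical: insert into the waiter queue */
LWLockRelease(WaitLSNLock);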

Discussion: https://postgr.es/m/202404030658.hhj3vfxeyhft%40alvherre.pgsql
Author: Alvaro Herrera
2024-04-07 00:49:53 +03:00
Michael Paquier f98dbdeb51 Add "ABI_compatibility" regions to wait_event_names.txt
The current design behind the automatic generation of the C code and
documentation related to wait events introduced in fa88928470 does not
offer a way to attach new wait events without breaking ABI
compatibility, as all the events are forcibly reordered for each section
in the input file wait_event_names.txt.  Adding new wait events to
stable branches is something that has happened in the past, 0b6517a3b7
being a recent example of that with VERSION_FILE_SYNC, so we need a way
to generate the C code for wait events while maintaining ABI
compatibility on already-released stable branches.

This commit solves this issue by adding a new region called
"ABI_compatibility" (keyword could be updated to something else if
someone had a better idea) to each section of wait_event_names.txt, so
that new wait events can be added to stable branches in
wait_event_names.txt while keeping the code ABI-compatible.
"ABI_compatibility" has no impact on the documentation generated: all
the wait events of one section are still alphabetically ordered.  LWLock
and Lock sections generate their C code elsewhere, so they do not need
an "ABI_compatibility" region.

For example, let's imagine a wait_event_names.txt like this:
Section: ClassName - Foo
FOO_1	"Waiting in Foo 1"
FOO_2	"Waiting in Foo 2"
ABI_compatibility:
NEW_FOO_1	"Waiting in New Foo 1"
NEW_BAR_1	"Waiting in New Bar 1"

This results in the following enum, where the events in the ABI region
are listed last with the same ordering as in wait_event_names.txt:
typedef enum
{
    WAIT_EVENT_FOO_1,
    WAIT_EVENT_FOO_2,
    WAIT_EVENT_NEW_FOO_1,
    WAIT_EVENT_NEW_BAR_1
} WaitEventFoo;

New wait events added in stable branches should be added at the end of
each ABI_compatibility region, and ABI_compatibility should remain empty
on HEAD and unreleased stable branches.

This design has been suggested by Noah Misch and me.

Reported-by: Noah Misch
Author: Bertrand Drouvot
Reviewed-by: Michael Paquier
Discussion: https://postgr.es/m/20240317183114.16@rfd.leadboat.com
2024-04-05 08:56:52 +09:00
Alexander Korotkov 06c418e163 Implement pg_wal_replay_wait() stored procedure
pg_wal_replay_wait() is to be used on a standby and waits for a specific
WAL location to be replayed before starting the transaction.  This is
useful when the user makes some data changes on the primary and needs a
guarantee of seeing these changes on the standby.

The queue of waiters is stored in a shared memory array sorted by LSN.
During WAL replay, waiters whose LSNs have already been replayed are
removed from the array and woken up by setting their latches.
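
A rough sketch of a per-waiter entry in that array (field and type
names are illustrative, not the actual waitlsn.c definitions):
typedef struct WaitLSNWaiter
{
    XLogRecPtr  lsn;        /* replay position the backend waits for */
    int         procno;     /* identifies the waiting backend's latch */
} WaitLSNWaiter;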

pg_wal_replay_wait() needs to wait without any snapshot held.  Otherwise,
the snapshot could prevent the replay of WAL records, leading to a kind
of self-deadlock.  This is why it is only possible to implement
pg_wal_replay_wait() as a procedure working in a non-atomic context,
not a function.

Catversion is bumped.

Discussion: https://postgr.es/m/eb12f9b03851bb2583adab5df9579b4b%40postgrespro.ru
Author: Kartyshov Ivan, Alexander Korotkov
Reviewed-by: Michael Paquier, Peter Eisentraut, Dilip Kumar, Amit Kapila
Reviewed-by: Alexander Lakhin, Bharath Rupireddy, Euler Taveira
2024-04-02 22:48:03 +03:00
Masahiko Sawada 667e65aac3 Use TidStore for dead tuple TIDs storage during lazy vacuum.
Previously, we used a simple array for storing dead tuple IDs during
lazy vacuum, which had a number of problems:

* The array used a single allocation and so was limited to 1GB.
* The allocation was pessimistically sized according to table size.
* Lookup with binary search was slow because of poor CPU cache and
  branch prediction behavior.

This commit replaces that array with the TID store from commit
30e144287a.

Since the backing radix tree makes small allocations as needed, the
1GB limit is now gone. Further, the total memory used is now often
smaller by an order of magnitude or more, depending on the
distribution of blocks and offsets. These two features should make
multiple rounds of heap scanning and index cleanup an extremely rare
event. TID lookup during index cleanup is also several times faster,
even more so when index order is correlated with heap tuple order.
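
A rough sketch of the usage pattern during lazy vacuum (the exact
signatures are assumptions and may not match tidstore.h; dead_items is
a TidStore created once per vacuum):
/* Heap scan: record all dead offsets found on one block at once. */
TidStoreSetBlockOffsets(dead_items, blkno, offsets, num_offsets);

/* Index cleanup: probe whether an index entry points to a dead tuple. */
if (TidStoreIsMember(dead_items, &itup->t_tid))
    /* delete the index entry */ ;

/* Progress reporting is now byte-based. */
dead_tuple_bytes = TidStoreMemoryUsage(dead_items);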

Since there is no longer a predictable relationship between the number
of dead tuples vacuumed and the space taken up by their TIDs, the
number of tuples no longer provides any meaningful insights for users,
nor is the maximum number predictable. For that reason this commit
also changes to byte-based progress reporting, with the relevant
columns of pg_stat_progress_vacuum renamed accordingly to
max_dead_tuple_bytes and dead_tuple_bytes.

For parallel vacuum, both the TID store and supplemental information
specific to vacuum are shared among the parallel vacuum workers. As
with the previous array, we don't take any locks on TidStore during
parallel vacuum since writes are still only done by the leader
process.

Bump catalog version.

Reviewed-by: John Naylor, (in an earlier version) Dilip Kumar
Discussion: https://postgr.es/m/CAD21AoAfOZvmfR0j8VmZorZjL7RhTiQdVttNuC4W-Shdc2a-AA%40mail.gmail.com
2024-04-02 10:15:37 +09:00
Alvaro Herrera da952b415f
Rework lwlocknames.txt to become lwlocklist.h
This way, we can fold the list of lock names into BuiltinTrancheNames
instead of keeping it in its own separate array.  This
saves two lines of code in GetLWTrancheName and some space in
BuiltinTrancheNames, as foreseen in commit 74a7306310, as well as
removing the need for a separate lwlocknames.c file.

We still have to build lwlocknames.h using Perl code, which initially I
wanted to avoid, but it gives us the chance to cross-check
wait_event_names.txt.
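
A sketch of the resulting x-macro pattern (the entries and the expansion
shown here are illustrative, not the exact lwlock.c code):
/* lwlocklist.h */
PG_LWLOCK(1, ShmemIndex)
PG_LWLOCK(2, OidGen)

/* consumer: define the macro, include the list, undefine it */
static const char *const BuiltinTrancheNames[] = {
#define PG_LWLOCK(id, name) [id] = #name,
#include "storage/lwlocklist.h"
#undef PG_LWLOCK
};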

Discussion: https://postgr.es/m/202401231025.gbv4nnte5fmm@alvherre.pgsql
2024-03-20 11:55:20 +01:00
Amit Kapila bf279ddd1c Introduce a new GUC 'standby_slot_names'.
This patch provides a way to ensure that physical standbys that are
potential failover candidates have received and flushed changes before
the primary server makes them visible to subscribers. Doing so guarantees
that the promoted standby server is not lagging behind the subscribers
when a failover is necessary.

The logical walsender now guarantees that all local changes are sent and
flushed to the standby servers corresponding to the replication slots
specified in 'standby_slot_names' before sending those changes to the
subscriber.

Additionally, the SQL functions pg_logical_slot_get_changes,
pg_logical_slot_peek_changes and pg_replication_slot_advance are modified
to ensure that they process changes for failover slots only after physical
slots specified in 'standby_slot_names' have confirmed WAL receipt for those changes.
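
An illustrative primary-side setting (the slot name is hypothetical):
standby_slot_names = 'standby1_slot'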

Author: Hou Zhijie and Shveta Malik
Reviewed-by: Masahiko Sawada, Peter Smith, Bertrand Drouvot, Ajin Cherian, Nisha Moond, Amit Kapila
Discussion: https://postgr.es/m/514f6f2f-6833-4539-39f1-96cd1e011f23@enterprisedb.com
2024-03-08 08:10:45 +05:30
Alvaro Herrera 53c2a97a92
Improve performance of subsystems on top of SLRU
More precisely, what we do here is make the SLRU cache sizes
configurable with new GUCs, so that sites with high concurrency and big
ranges of transactions in flight (resp. multixacts/subtransactions) can
benefit from bigger caches.  In order for this to work with good
performance, two additional changes are made:

1. the cache is divided into "banks" (to borrow terminology from CPU
   caches), and algorithms such as the eviction-buffer search only affect
   one specific bank.  This forestalls the problem that linear searching
   for a specific buffer across the whole cache takes too long: we only
   have to search the specific bank, whose size is small.  This work is
   authored by Andrey Borodin.

2. Change the locking regime for the SLRU banks, so that each bank uses
   a separate LWLock.  This allows for increased scalability.  This work
   is authored by Dilip Kumar.  (A part of this was previously committed as
   d172b717c6f4.)

Special care is taken so that the algorithms that can potentially
traverse more than one bank release one bank's lock before acquiring the
next.  This should happen rarely; in particular, clog.c's group commit
feature needed code adjustment to cope with this.  I (Álvaro) also added
lots of comments to make sure the design is sound.

The new GUCs match the names introduced by bcdfa5f2e2 in the
pg_stat_slru view.

The default values for these parameters are similar to the previous
sizes of each SLRU.  commit_ts, clog and subtrans accept a value of 0, which
means to adjust by dividing shared_buffers by 512 (so 2MB for every 1GB
of shared_buffers), with a cap of 8MB.  (A new slru.c function
SimpleLruAutotuneBuffers() was added to support this.)  The cap was
previously 1MB for clog, so for sites with more than 512MB of shared
memory the total memory used increases, which is likely a good tradeoff.
However, other SLRUs (notably multixact ones) retain smaller sizes and
don't support a configured value of 0.  These values based on
shared_buffers may need to be revisited, but that's an easy change.
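
An illustrative postgresql.conf excerpt (values are examples only; the
GUC names are assumed to be the pg_stat_slru names with a _buffers
suffix):
transaction_buffers = 0         # clog; 0 = scale with shared_buffers
commit_timestamp_buffers = 0
subtransaction_buffers = 0
multixact_offset_buffers = 16   # multixact SLRUs do not accept 0
multixact_member_buffers = 32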

There was some resistance to adding these new GUCs: it would be better
to adjust to memory pressure automatically somehow, for example by
stealing memory from shared_buffers (where the caches can grow and
shrink naturally).  However, doing that seems to be a much larger
project and one which has made virtually no progress in several years,
and because this is such a pain point for so many users, here we take
the pragmatic approach.

Author: Andrey Borodin <x4mmm@yandex-team.ru>
Author: Dilip Kumar <dilipbalaut@gmail.com>
Reviewed-by: Amul Sul, Gilles Darold, Anastasia Lubennikova,
	Ivan Lazarev, Robert Haas, Thomas Munro, Tomas Vondra,
	Yura Sokolov, Васильев Дмитрий (Dmitry Vasiliev).
Discussion: https://postgr.es/m/2BEC2B3F-9B61-4C1D-9FB5-5FAB0F05EF86@yandex-team.ru
Discussion: https://postgr.es/m/CAFiTN-vzDvNz=ExGXz6gdyjtzGixKSqs0mKHMmaQ8sOSEFZ33A@mail.gmail.com
2024-02-28 17:05:31 +01:00
Amit Kapila 93db6cbda0 Add a new slot sync worker to synchronize logical slots.
By enabling slot synchronization, all the failover logical replication
slots on the primary (assuming configurations are appropriate) are
automatically created on the physical standbys and are synced
periodically. The slot sync worker on the standby server pings the primary
server at regular intervals to get the necessary failover logical slots
information and create/update the slots locally. The slots that no longer
require synchronization are automatically dropped by the worker.

The nap time of the worker is tuned according to the activity on the
primary. The slot sync worker waits for some time before the next
synchronization, with the duration varying based on whether any slots were
updated during the last cycle.

A new parameter sync_replication_slots enables or disables this new
process.
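
An illustrative standby-side setting (a suitable primary_conninfo and
hot_standby_feedback are assumed to be configured as well):
sync_replication_slots = on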

On promotion, the slot sync worker is shut down by the startup process to
drop any temporary slots acquired by the slot sync worker and to prevent
the worker from trying to fetch the failover slots.

Functionality to allow logical walsenders to wait for the physical
standbys will be added in a subsequent commit.

Author: Shveta Malik, Hou Zhijie based on design inputs by Masahiko Sawada and Amit Kapila
Reviewed-by: Masahiko Sawada, Bertrand Drouvot, Peter Smith, Dilip Kumar, Ajin Cherian, Nisha Moond, Kuroda Hayato, Amit Kapila
Discussion: https://postgr.es/m/514f6f2f-6833-4539-39f1-96cd1e011f23@enterprisedb.com
2024-02-22 15:25:15 +05:30
Noah Misch 0b6517a3b7 Sync PG_VERSION file in CREATE DATABASE.
An OS crash could leave PG_VERSION empty or missing.  The same symptom
appeared in a backup by block device snapshot, taken after the next
checkpoint and before the OS flushes the PG_VERSION blocks.  Device
snapshots are not a documented backup method, however.  Back-patch to
v15, where commit 9c08aea6a3 introduced
STRATEGY=WAL_LOG and made it the default.

Discussion: https://postgr.es/m/20240130195003.0a.nmisch@google.com
2024-02-01 13:44:19 -08:00
Alvaro Herrera 7b745d85b8
Split use of SerialSLRULock, creating SerialControlLock
predicate.c has been using SerialSLRULock (the control lock for its SLRU
structure) to coordinate access to SerialControlData, another of its
numerous shared memory structures; this is unnecessary and confuses
further SLRU scalability work.  Create a separate LWLock to cover
SerialControlData.

Extracted from a larger patch from the same author, and some additional
changes by Álvaro.

Author: Dilip Kumar <dilip.kumar@enterprisedb.com>
Discussion: https://postgr.es/m/CAFiTN-vzDvNz=ExGXz6gdyjtzGixKSqs0mKHMmaQ8sOSEFZ33A@mail.gmail.com
2024-01-30 18:11:17 +01:00
Michael Paquier d86d20f0ba Add backend support for injection points
Injection points are a new facility that makes it possible for developers
to run custom code in pre-defined code paths.  Its goal is to provide
ways to design and run advanced tests, for cases like:
- Race conditions, where processes need to do actions in a controlled
ordered manner.
- Forcing a state, like an ERROR, FATAL or even PANIC for OOM, to force
recovery, etc.
- Arbitrary sleeps.

This implements some basics, and there are plans to extend it more in
the future depending on what's required.  Hence, this commit adds a set
of routines in the backend that allows developers to attach, detach and
run injection points:
- A code path calling an injection point can be declared with the macro
INJECTION_POINT(name).
- InjectionPointAttach() and InjectionPointDetach() to respectively
attach and detach a callback to/from an injection point.  An injection
point name is registered in a shmem hash table with a library name and a
function name, which will be used to load the callback attached to an
injection point when its code path is run.

Injection point names are just strings, so an injection point can be
declared and run by out-of-core extensions and modules, with callbacks
defined in external libraries.
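
A hedged sketch of the API described above (the point, library, and
callback names are hypothetical):
/* In a backend code path under test: */
INJECTION_POINT("my-test-point");

/* In a test module: attach a callback exported by "my_test_lib". */
InjectionPointAttach("my-test-point", "my_test_lib", "my_test_callback");

/* The callback itself: */
void
my_test_callback(const char *name)
{
    elog(NOTICE, "injection point %s reached", name);
}

/* Later, detach it: */
InjectionPointDetach("my-test-point");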

This facility is hidden behind a dedicated switch for ./configure and
meson, disabled by default.

Note that backends use a local cache to store callbacks already loaded,
cleaning up entries on a best-effort basis when a callback is found to
have been removed.  This could be refined further, but what we have here
was fine with the tests I wrote while implementing these backend APIs.

Author: Michael Paquier, with doc suggestions from Ashutosh Bapat.
Reviewed-by: Ashutosh Bapat, Nathan Bossart, Álvaro Herrera, Dilip
Kumar, Amul Sul, Nazir Bilal Yavuz
Discussion: https://postgr.es/m/ZTiV8tn_MIb_H2rE@paquier.xyz
2024-01-22 10:15:50 +09:00
Nathan Bossart 8b2bcf3f28 Introduce the dynamic shared memory registry.
Presently, the most straightforward way for a shared library to use
shared memory is to request it at server startup via a
shmem_request_hook, which requires specifying the library in
shared_preload_libraries.  Alternatively, the library can create a
dynamic shared memory (DSM) segment, but absent a shared location
to store the segment's handle, other backends cannot use it.  This
commit introduces a registry for DSM segments so that these other
backends can look up existing segments with a library-specified
string.  This allows libraries to easily use shared memory without
needing to request it at server startup.

The registry is accessed via the new GetNamedDSMSegment() function.
This function handles allocating the segment and initializing it
via a provided callback.  If another backend already created and
initialized the segment, it simply attaches the segment.
GetNamedDSMSegment() locks the registry appropriately to ensure
that only one backend initializes the segment and that all other
backends just attach it.

The registry itself consists of a dshash table that stores the
DSM segment handles keyed by a library-specified string.
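
A minimal sketch of the usage pattern (struct, callback, and segment
names are hypothetical, and the exact GetNamedDSMSegment() signature is
an assumption):
typedef struct MySharedState
{
    LWLock      lock;
    int         counter;
} MySharedState;

static void
my_init_callback(void *ptr)
{
    MySharedState *state = (MySharedState *) ptr;

    LWLockInitialize(&state->lock, LWLockNewTrancheId());
    state->counter = 0;
}

/* In any backend, at first use: */
bool        found;
MySharedState *state = GetNamedDSMSegment("my_extension",
                                          sizeof(MySharedState),
                                          my_init_callback,
                                          &found);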

Reviewed-by: Michael Paquier, Andrei Lepikhov, Nikita Malakhov, Robert Haas, Bharath Rupireddy, Zhang Mingli, Amul Sul
Discussion: https://postgr.es/m/20231205034647.GA2705267%40nathanxps13
2024-01-19 14:24:36 -06:00
Nathan Bossart 5b1b9bce84 Cross-check lists of predefined LWLocks.
Both lwlocknames.txt and wait_event_names.txt contain a list of all
the predefined LWLocks, i.e., those with predefined positions
within MainLWLockArray.  It is easy to miss one or the other,
especially since the list in wait_event_names.txt omits the "Lock"
suffix from all the LWLock wait events.  This commit adds a cross-
check of these lists to the script that generates lwlocknames.h.
If the lists do not match exactly, building will fail.

Suggested-by: Robert Haas
Reviewed-by: Robert Haas, Michael Paquier, Bertrand Drouvot
Discussion: https://postgr.es/m/20240102173120.GA1061678%40nathanxps13
2024-01-09 11:05:19 -06:00
Bruce Momjian 29275b1d17 Update copyright for 2024
Reported-by: Michael Paquier

Discussion: https://postgr.es/m/ZZKTDPxBBMt3C0J9@paquier.xyz

Backpatch-through: 12
2024-01-03 20:49:05 -05:00
Robert Haas 371b07e894 Remove Lock suffix from WALSummarizerLock in wait_event_names.txt
Nathan Bossart

Discussion: https://postgr.es/m/20240102173120.GA1061678@nathanxps13
2024-01-02 13:17:23 -05:00
Robert Haas 5c430f9dc5 Add WALSummarizerLock to wait_event_names.txt
Per report from Nathan Bossart.

Discussion: http://postgr.es/m/20231227153647.GA601861@nathanxps13
2024-01-02 10:31:49 -05:00
Robert Haas 174c480508 Add a new WAL summarizer process.
When active, this process writes WAL summary files to
$PGDATA/pg_wal/summaries. Each summary file contains information for a
certain range of LSNs on a certain TLI.  For each relation, it stores a
"limit block": 0 if the relation was created or destroyed within that
range of WAL records; otherwise, the shortest length to which the
relation was truncated during that range of WAL records; otherwise,
InvalidBlockNumber.  In addition, it stores a list of blocks
which have been modified during that range of WAL records, but
excluding blocks which were removed by truncation after they were
modified and never subsequently modified again.

In other words, it tells us which blocks need to be copied in case of an
incremental backup covering that range of WAL records. But this
doesn't yet add the capability to actually perform an incremental
backup; the next patch will do that.

A new parameter summarize_wal enables or disables this new background
process.  The background process also automatically deletes summary
files that are older than wal_summarize_keep_time, if that parameter
has a non-zero value and the summarizer is configured to run.
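
An illustrative configuration (the value is an example; parameter names
as in this commit):
summarize_wal = on
wal_summarize_keep_time = '7d'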

Patch by me, with some design help from Dilip Kumar and Andres Freund.
Reviewed by Matthias van de Meent, Dilip Kumar, Jakub Wartak, Peter
Eisentraut, and Álvaro Herrera.

Discussion: http://postgr.es/m/CA+TgmoYOYZfMCyOXFyC-P+-mdrZqm5pP2N7S-r0z3_402h9rsA@mail.gmail.com
2023-12-20 08:42:28 -05:00
Thomas Munro 0013ba290b Add wait events for checkpoint delay mechanism.
When MyProc->delayChkptFlags is set to temporarily block phase
transitions in a concurrent checkpoint, the checkpointer enters a
sleep-poll loop to wait for the flag to be cleared.  We should show that
as a wait event in the pg_stat_activity view.
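
A hedged sketch of such a sleep-poll loop with the new wait event
reported (the event name is an assumption based on this description):
while (HaveVirtualXIDsDelayingChkpt(vxids, nvxids, DELAY_CHKPT_START))
{
    pgstat_report_wait_start(WAIT_EVENT_CHECKPOINT_DELAY_START);
    pg_usleep(10000L);      /* 10ms between polls */
    pgstat_report_wait_end();
}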

Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/CA%2BhUKGL7Whi8iwKbzkbn_1fixH3Yy8aAPz7mfq6Hpj7FeJrKMg%40mail.gmail.com
2023-10-13 16:43:22 +13:00
Michael Paquier 59cbf60c0f Remove column for wait event names in wait_event_names.txt
This file is now made of two columns, removing the column listing the
user-visible strings used in the system views and the documentation:
- Enum definitions for each class without the prefix "WAIT_EVENT_", so
that this information can be grepped in the code and wait_event_names.txt
at the same time.
- Description in the documentation.
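
An illustrative excerpt in the new two-column format:
Section: ClassName - WaitEventTimeout

BASE_BACKUP_THROTTLE	"Waiting during base backup when throttling activity."
CHECKPOINT_WRITE_DELAY	"Waiting between writes while performing a checkpoint."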

The wait event names are now generated from the enum objects in
CamelCase, with the underscores removed.  The data generated for wait
events is consistent with what was produced by 414f6c0fb7.

This has the advantage of removing WAIT_EVENT_DOCONLY, which was a
placeholder for the wait event types Lock and LWLock as these two only
require the generation of the documentation.

Reviewed-by: Bertrand Drouvot
Discussion: https://postgr.es/m/ZOxVHQwEC/9X/p/z@paquier.xyz
2023-09-06 10:27:02 +09:00
Michael Paquier 414f6c0fb7 Use more consistent names for wait event objects and types
The event names use the same case-insensitive characters, hence applying
lower() or upper() to the monitoring queries allows the detection of the
same events as before this change.  It is possible to cross-check the
data with the system view pg_wait_events, for instance with a query
like this one, which should show no differences:
SELECT lower(type), lower(name), description
  FROM pg_wait_events ORDER BY 1, 2;

This will help in introducing more simplifications in the format of
wait_event_names.txt.  Some of the enum values in the code had to be
renamed a bit to follow the same naming convention across the board.

Reviewed-by: Bertrand Drouvot
Discussion: https://postgr.es/m/ZOxVHQwEC/9X/p/z@paquier.xyz
2023-09-06 10:04:43 +09:00
Thomas Munro f691f5b80a Remove the "snapshot too old" feature.
Remove the old_snapshot_threshold setting and mechanism for producing
the error "snapshot too old", originally added by commit 848ef42b.
Unfortunately it had a number of known problems in terms of correctness
and performance, mostly reported by Andres in the course of his work on
snapshot scalability.  We agreed to remove it, after a long period
without an active plan to fix it.

This is certainly a desirable feature, and someone might propose a new
or improved implementation in the future.

Reported-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/CACG%3DezYV%2BEvO135fLRdVn-ZusfVsTY6cH1OZqWtezuEYH6ciQA%40mail.gmail.com
Discussion: https://postgr.es/m/20200401064008.qob7bfnnbu4w5cw4%40alap3.anarazel.de
Discussion: https://postgr.es/m/CA%2BTgmoY%3Daqf0zjTD%2B3dUWYkgMiNDegDLFjo%2B6ze%3DWtpik%2B3XqA%40mail.gmail.com
2023-09-05 19:53:43 +12:00
Michael Paquier bb9002257b Fix some typos in wait_event_names.txt
Noticed in passing, while hacking on a different patch touching this
area.
2023-08-28 17:09:12 +09:00
Michael Paquier 00e49233a9 Fix format of entry in wait_event_names.txt
The entry LockManager had two successive whitespaces between two words.
This is not an actual bug, but let's be clean.  Thinko in fa88928.

Reported-by: Masahiro Ikeda
Author: Bertrand Drouvot
Discussion: https://postgr.es/m/dd836027-2e9e-4df9-9fd9-7527cd1757e1@gmail.com
2023-08-18 08:11:10 +09:00
Michael Paquier af720b4c50 Change custom wait events to use dynamic shared hash tables
Currently, the names of custom wait events must be registered by each
backend, requiring all of them to link to an extension's shared memory
area, even if the extension is not loaded with
shared_preload_libraries.

This patch relaxes the constraints related to this infrastructure by
storing the wait events and their names in two dynamic hash tables in
shared memory.  This has the advantage of simplifying the registration of
custom wait events to a single routine call that returns an event ID
ready for consumption:
uint32 WaitEventExtensionNew(const char *wait_event_name);

The caller of this routine can then cache the returned ID locally, to be
used for pgstat_report_wait_start(), WaitLatch() or a similar routine.
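
A minimal usage sketch (event and variable names are illustrative):
static uint32 my_wait_event = 0;

if (my_wait_event == 0)
    my_wait_event = WaitEventExtensionNew("MyExtensionMainLoop");

(void) WaitLatch(MyLatch, WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
                 1000L, my_wait_event);
ResetLatch(MyLatch);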

The implementation uses two hash tables: one with a key based on the
event name to avoid duplicates and a second using the event ID as key
for event lookups, as in pg_stat_activity.  These tables can hold a
minimum of 16 entries and a maximum of 128 entries, which should be
plenty.

The code changes done in worker_spi show how things are simplified (most
of the code removed in this commit comes from there):
- worker_spi_init() is gone.
- No more shared memory hooks required (size requested and
initialization).
- The custom wait event ID is cached in the process that needs to set
it, with one single call to WaitEventExtensionNew() to retrieve it.

Per suggestion from Andres Freund.

Author: Masahiro Ikeda, with a few tweaks from me.
Discussion: https://postgr.es/m/20230801032349.aaiuvhtrcvvcwzcx@awork3.anarazel.de
2023-08-14 14:47:27 +09:00
Michael Paquier f05b1fa1ff doc: Fix incorrect entries generated from wait_event_names.txt
fa88928 has introduced wait_event_names.txt, and some of its entries had
some documentation fields with more information than necessary.

This commit brings back the description of all the wait events to be
consistent with the older stable branches.  Five descriptions were
incorrect.

Author: Bertrand Drouvot
Discussion: https://postgr.es/m/e378989e-1899-643a-dec1-10f691a0a105@gmail.com
2023-08-08 08:17:53 +09:00
Michael Paquier c9af054653 Support custom wait events for wait event type "Extension"
Two backend routines are added to allow extensions to allocate and define
custom wait events, all of these being allocated in the type
"Extension":
* WaitEventExtensionNew(), that allocates a wait event ID computed from
a counter in shared memory.
* WaitEventExtensionRegisterName(), to associate a custom string with the
allocated wait event ID.
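
A minimal sketch of the two-step registration (exact signatures are
assumptions):
uint32      we = WaitEventExtensionNew();

WaitEventExtensionRegisterName(we, "MyCustomWait");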

Note that this includes an example of how to use this new facility in
worker_spi with tests in TAP for various scenarios, and some
documentation about how to use them.

Any code in the tree that currently uses WAIT_EVENT_EXTENSION could
switch to this new facility to define custom wait events.  This is left
as work for future patches.

Author: Masahiro Ikeda
Reviewed-by: Andres Freund, Michael Paquier, Tristan Partin, Bharath
Rupireddy
Discussion: https://postgr.es/m/b9f5411acda0cf15c8fbb767702ff43e@oss.nttdata.com
2023-07-31 17:09:24 +09:00
Michael Paquier 183a60a628 Remove double quotes from the second column of wait_event_names.txt
The double quotes used for the wait event names are not required, as
the values quoted are made of single words.  The files generated by
generate-wait_event_types.pl (pgstat_wait_event.c, wait_event_types.h
and wait_event_types.sgml) are exactly the same before and after this
commit, hence the wait event names and the enum elements have the same
names as before.

Discussion: https://postgr.es/m/ZK9S3jFEV1X797Ll@paquier.xyz
2023-07-14 08:55:11 +09:00
Michael Paquier fa88928470 Generate automatically code and documentation related to wait events
The documentation and the code are generated automatically from a new
file called wait_event_names.txt, formatted in sections dedicated to
each wait event class (Timeout, Lock, IO, etc.) with three tab-separated
fields:
- C symbol in enums
- Format in the system views
- Description in the docs
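
An illustrative line with the three tab-separated fields:
WAIT_EVENT_BASE_BACKUP_THROTTLE	"BaseBackupThrottle"	"Waiting during base backup when throttling activity."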

Using this approach has several advantages, as we have proved to be
rather bad at maintaining this area of the tree over the years:
- The order of each item in the documentation and the code, which should
be alphabetical, has become incorrect multiple times, and the script
generating the code and documentation has a few rules to enforce that,
making the maintenance a no-brainer.
- Some wait events were added to the code, but not documented, so this
cannot be missed now.
- The order of the tables for each wait event class is enforced in the
documentation (the input .txt file does so as well for clarity, though
this is not mandatory).
- Less code, shaving 1.2k lines from the tree, with 1/3 of the savings
coming from the code, the rest from the documentation.

The wait event types "Lock" and "LWLock" still have their own code
paths, hence only the documentation is generated for them.  These
classes are listed with a special marker called WAIT_EVENT_DOCONLY in
the input file.

Adding a new wait event now requires only an update of
wait_event_names.txt, with "Lock" and "LWLock" treated as exceptions.

This commit has been tested with configure/Makefile, the CI and VPATH
builds.  The clean, distclean and maintainer-clean targets work fine.

Author: Bertrand Drouvot, Michael Paquier
Discussion: https://postgr.es/m/77a86b3a-c4a8-5f5d-69b9-d70bbf2e9b98@gmail.com
2023-07-05 10:53:11 +09:00