Commit Graph

162 Commits

Daniel Gustafsson 64e401b62b Fix indentation from a11f330b5
Per buildfarm animal koel
2024-03-25 14:18:33 +01:00
Amit Kapila a11f330b55 Track last_inactive_time in pg_replication_slots.
This commit adds a new property called last_inactive_time for slots. It is
set to 0 whenever a slot is made active/acquired and set to the current
timestamp whenever the slot is inactive/released or restored from the disk.
Note that we don't set the last_inactive_time for the slots currently being
synced from the primary to the standby because such slots are typically
inactive as decoding is not allowed on those.

The 'last_inactive_time' will be useful on production servers to debug and
analyze inactive replication slots. It will also help to know the lifetime
of a replication slot - one can know how long a streaming standby, logical
subscriber, or replication slot consumer is down.

The 'last_inactive_time' will also be useful to implement inactive
timeout-based replication slot invalidation in a future commit.

Author: Bharath Rupireddy
Reviewed-by: Bertrand Drouvot, Amit Kapila, Shveta Malik
Discussion: https://www.postgresql.org/message-id/CALj2ACW4aUe-_uFQOjdWCEN-xXoLGhmvRFnL8SNw_TZ5nJe+aw@mail.gmail.com
2024-03-25 16:34:33 +05:30
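
A minimal sketch of how the new column can be read on a build containing this commit (output depends on server state):

    -- 'last_inactive_time' is reset when a slot is acquired and set when it
    -- is released or restored from disk, per the description above.
    SELECT slot_name, active, last_inactive_time
    FROM pg_replication_slots;
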
Amit Kapila 6ae701b437 Track invalidation_reason in pg_replication_slots.
Until now, the reason for replication slot invalidation was not tracked
directly in pg_replication_slots. A recent commit 007693f2a3 added
'conflict_reason' to show the reasons for slot conflict/invalidation, but
only for logical slots.

This commit adds a new column 'invalidation_reason' to show invalidation
reasons for both physical and logical slots. This commit also turns the
'conflict_reason' text column into a 'conflicting' boolean column (effectively
reverting commit 007693f2a3). The 'conflicting' column is true for
invalidation reasons 'rows_removed' and 'wal_level_insufficient' because
those make the slot conflict with recovery. When 'conflicting' is true,
one can now look at the new 'invalidation_reason' column for the reason
for the logical slot's conflict with recovery.

The new 'invalidation_reason' column will also be useful to track other
invalidation reasons in future commits.

Author: Bharath Rupireddy
Reviewed-by: Bertrand Drouvot, Amit Kapila, Shveta Malik
Discussion: https://www.postgresql.org/message-id/ZfR7HuzFEswakt/a%40ip-10-97-1-34.eu-west-3.compute.internal
Discussion: https://www.postgresql.org/message-id/CALj2ACW4aUe-_uFQOjdWCEN-xXoLGhmvRFnL8SNw_TZ5nJe+aw@mail.gmail.com
2024-03-22 13:52:05 +05:30
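
A sketch of reading the reworked columns:

    -- 'conflicting' is a boolean; when true, 'invalidation_reason' gives the
    -- reason for the logical slot's conflict with recovery.
    SELECT slot_name, slot_type, conflicting, invalidation_reason
    FROM pg_replication_slots;
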
Michael Paquier d6e171fed6 Keep replication slot statistics on invalidation
The code path in charge of invalidating a replication slot includes a
call to pgstat_drop_replslot(), which would result in removing the
statistics of the slot once invalidated.  However, there is no need to
remove the statistics of an invalidated slot as one could still be
interested in looking at them to understand the activity of the slot
until its actual removal.

The initial design of the feature committed in be87200efd used the
approach to drop the slots, which is likely why the statistics were
still removed during the invalidation.

Another problem with this operation is that it was done without holding
ReplicationSlotAllocationLock, leaving it unprotected on concurrent
activity.  This part is arguably a bug, but that's a limited problem in
practice so no backpatch is done.

In passing, this commit adds a test to check this behavior.  The only
remaining code path where slot statistics are dropped is now the one where
the slot itself gets dropped.

Author: Bertrand Drouvot
Discussion: https://postgr.es/m/ZermH08Eq6YydHpO@ip-10-97-1-34.eu-west-3.compute.internal
2024-03-12 14:22:31 +09:00
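
For example, the retained statistics of an invalidated slot can still be inspected (a sketch; assumes at least one logical slot with statistics exists):

    -- Slot statistics remain available until the slot is actually dropped.
    SELECT r.slot_name, r.wal_status, s.total_txns, s.total_bytes
    FROM pg_replication_slots AS r
    JOIN pg_stat_replication_slots AS s USING (slot_name);
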
Amit Kapila bf279ddd1c Introduce a new GUC 'standby_slot_names'.
This patch provides a way to ensure that physical standbys that are
potential failover candidates have received and flushed changes before
the primary server makes them visible to subscribers. Doing so guarantees
that the promoted standby server is not lagging behind the subscribers
when a failover is necessary.

The logical walsender now guarantees that all local changes are sent and
flushed to the standby servers corresponding to the replication slots
specified in 'standby_slot_names' before sending those changes to the
subscriber.

Additionally, the SQL functions pg_logical_slot_get_changes,
pg_logical_slot_peek_changes and pg_replication_slot_advance are modified
to ensure that they process changes for failover slots only after physical
slots specified in 'standby_slot_names' have confirmed WAL receipt for those.

Author: Hou Zhijie and Shveta Malik
Reviewed-by: Masahiko Sawada, Peter Smith, Bertrand Drouvot, Ajin Cherian, Nisha Moond, Amit Kapila
Discussion: https://postgr.es/m/514f6f2f-6833-4539-39f1-96cd1e011f23@enterprisedb.com
2024-03-08 08:10:45 +05:30
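
A configuration sketch, using the GUC name as introduced by this commit and a hypothetical physical slot name:

    -- On the primary: logical changes are sent to subscribers only after the
    -- standby using the physical slot 'phys_standby1' has confirmed the WAL.
    ALTER SYSTEM SET standby_slot_names = 'phys_standby1';
    SELECT pg_reload_conf();
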
Michael Paquier 65db0cfb4c Revert "Add recovery TAP test for race condition with slot invalidations"
This reverts commit 08a52ab151, due to some sporadic instability in
the test.  Getting the test right should require some redesign with a
second injection point, but let's revert it for now to avoid these
issues in the CI as a lot of patches are under discussion in this last
commit fest.

Per buildfarm members hachi and gokiburi.

Discussion: https://postgr.es/m/ZekQQHCrIqLVpGz5@paquier.xyz
2024-03-07 09:57:52 +09:00
Michael Paquier 08a52ab151 Add recovery TAP test for race condition with slot invalidations
This commit adds a recovery test to provide coverage for the bug fixed
in 818fefd8fd, using an injection point to wait just after the process
of an active slot is killed.  The trick is to give enough time for
effective_xmin and effective_catalog_xmin to advance so as the slot
invalidation robustness can be checked since the active process is
killed without holding its slot's mutex for a short time.

Author: Bertrand Drouvot
Discussion: https://postgr.es/m/ZdyZya4YrNapWKqz@ip-10-97-1-34.eu-west-3.compute.internal
2024-03-06 14:39:40 +09:00
Heikki Linnakangas 024c521117 Replace BackendIds with 0-based ProcNumbers
Now that BackendId was just another index into the proc array, it was
redundant with the 0-based proc numbers used in other places. Replace
all usage of backend IDs with proc numbers.

The only place where the term "backend id" remains is in a few pgstat
functions that expose backend IDs at the SQL level. Those IDs are now
in fact 0-based ProcNumbers too, but the documentation still calls
them "backend ids". That term still seems appropriate to describe what
the numbers are, so I let it be.

One user-visible effect is that pg_temp_0 is now a valid temp schema
name, for the backend with ProcNumber 0.

Reviewed-by: Andres Freund
Discussion: https://www.postgresql.org/message-id/8171f1aa-496f-46a6-afc3-c46fe7a9b407@iki.fi
2024-03-03 19:38:22 +02:00
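
The SQL-level functions referred to above can be exercised like this (a sketch; the exact set of pgstat functions varies across versions):

    -- "Backend ids" exposed at the SQL level are now 0-based ProcNumbers.
    SELECT id AS backend_id, pg_stat_get_backend_pid(id) AS pid
    FROM pg_stat_get_backend_idset() AS id;
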
Michael Paquier efa70c15c7 Make GetSlotInvalidationCause() return RS_INVAL_NONE on unexpected input
943f7ae1c8 changed GetSlotInvalidationCause() so that it would
return the last element of SlotInvalidationCauses[] when an incorrect
conflict reason name is given by a caller, but this should return
RS_INVAL_NONE in such cases, even if such a state should never be
reached in practice.

Per gripe from Peter Smith.

Reviewed-by: Bharath Rupireddy
Discussion: https://postgr.es/m/CAHut+PtsrSWxczpGkSaSVtJo+BXrvJ3Hwp5gES14bbL-G+HL7A@mail.gmail.com
2024-02-22 19:59:58 +09:00
Amit Kapila 93db6cbda0 Add a new slot sync worker to synchronize logical slots.
By enabling slot synchronization, all the failover logical replication
slots on the primary (assuming configurations are appropriate) are
automatically created on the physical standbys and are synced
periodically. The slot sync worker on the standby server pings the primary
server at regular intervals to get the necessary information about failover
logical slots and to create/update them locally. The slots that no longer
require synchronization are automatically dropped by the worker.

The nap time of the worker is tuned according to the activity on the
primary. The slot sync worker waits for some time before the next
synchronization, with the duration varying based on whether any slots were
updated during the last cycle.

A new parameter sync_replication_slots enables or disables this new
process.

On promotion, the slot sync worker is shut down by the startup process to
drop any temporary slots acquired by the slot sync worker and to prevent
the worker from trying to fetch the failover slots.

The functionality to allow logical walsenders to wait for the physical
standbys will be added in a subsequent commit.

Author: Shveta Malik, Hou Zhijie based on design inputs by Masahiko Sawada and Amit Kapila
Reviewed-by: Masahiko Sawada, Bertrand Drouvot, Peter Smith, Dilip Kumar, Ajin Cherian, Nisha Moond, Kuroda Hayato, Amit Kapila
Discussion: https://postgr.es/m/514f6f2f-6833-4539-39f1-96cd1e011f23@enterprisedb.com
2024-02-22 15:25:15 +05:30
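
A configuration sketch for enabling the worker on a standby (the prerequisites described for pg_sync_replication_slots() further below are assumed to be in place):

    -- On the standby: start the slot sync worker.
    ALTER SYSTEM SET sync_replication_slots = on;
    SELECT pg_reload_conf();
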
Michael Paquier 943f7ae1c8 Add lookup table for replication slot conflict reasons
This commit switches the handling of the conflict cause strings for
replication slots to use a table rather than being explicitly listed,
using a C99-designated initializer syntax for the array elements.  This
makes the whole thing more readable while easing future maintenance, with
fewer areas to update when adding a new conflict reason.

This is similar to 74a7306310, but the scale of the change is smaller
as there are fewer conflict causes than LWLock builtin tranche names.

Author: Bharath Rupireddy
Reviewed-by: Jelte Fennema-Nio
Discussion: https://postgr.es/m/CALj2ACUxSLA91QGFrJsWNKs58KXb1C03mbuwKmzqqmoAKLwJaw@mail.gmail.com
2024-02-22 08:40:40 +09:00
Michael Paquier 818fefd8fd Fix race leading to incorrect conflict cause in InvalidatePossiblyObsoleteSlot()
The invalidation of an active slot is done in two steps:
- Termination of the backend holding it, if any.
- Report that the slot is obsolete, with a conflict cause depending on
the slot's data.

This can be racy because between these two steps the slot mutex is
released while doing system calls, which means that effective_xmin and
effective_catalog_xmin could advance during that time, leading to the
detection of a conflict cause different from the one originally determined
before the process owning the slot is terminated.

Holding the mutex longer is not an option, so this commit changes the
code to record the LSNs stored in the slot during the termination of the
process owning the slot.

Bonus thanks to Alexander Lakhin for the various tests and the analysis.

Author: Bertrand Drouvot
Reviewed-by: Michael Paquier, Bharath Rupireddy
Discussion: https://postgr.es/m/ZaTjW2Xh+TQUCOH0@ip-10-97-1-34.eu-west-3.compute.internal
Backpatch-through: 16
2024-02-20 13:43:51 +09:00
Amit Kapila ddd5f4f54a Add a slot synchronization function.
This commit introduces a new SQL function pg_sync_replication_slots()
which is used to synchronize the logical replication slots from the
primary server to the physical standby so that logical replication can be
resumed after a failover or planned switchover.

A new 'synced' flag is introduced in pg_replication_slots view, indicating
whether the slot has been synchronized from the primary server. On a
standby, synced slots cannot be dropped or consumed, and any attempt to
perform logical decoding on them will result in an error.

The logical replication slots on the primary can be synchronized to the
hot standby by using the 'failover' parameter of
pg_create_logical_replication_slot(), or by using the 'failover' option of
CREATE SUBSCRIPTION during slot creation, and then calling
pg_sync_replication_slots() on standby. For the synchronization to work,
it is mandatory to have a physical replication slot between the primary
and the standby, i.e., 'primary_slot_name' should be configured on the
standby, and 'hot_standby_feedback' must be enabled on the standby. It is
also necessary to specify a valid 'dbname' in the 'primary_conninfo'.

If a logical slot is invalidated on the primary, then that slot on the
standby is also invalidated.

If a logical slot on the primary is valid but is invalidated on the
standby, then that slot is dropped but will be recreated on the standby in
the next pg_sync_replication_slots() call provided the slot still exists
on the primary server. It is okay to recreate such slots as long as they
are not consumable on the standby (which is the case currently). This
situation may occur due to the following reasons:
- The 'max_slot_wal_keep_size' on the standby is insufficient to retain
WAL records from the restart_lsn of the slot.
- 'primary_slot_name' is temporarily reset to null and the physical slot
is removed.

The slot synchronization status on the standby can be monitored using the
'synced' column of pg_replication_slots view.

The functionality to automatically synchronize slots by a background worker
and to allow logical walsenders to wait for the physical standbys will be
added in subsequent commits.

Author: Hou Zhijie, Shveta Malik, Ajin Cherian based on an earlier version by Peter Eisentraut
Reviewed-by: Masahiko Sawada, Bertrand Drouvot, Peter Smith, Dilip Kumar, Nisha Moond, Kuroda Hayato, Amit Kapila
Discussion: https://postgr.es/m/514f6f2f-6833-4539-39f1-96cd1e011f23@enterprisedb.com
2024-02-14 09:45:36 +05:30
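
A usage sketch of the standby side (assumes a failover-enabled logical slot exists on the primary and the prerequisites listed above are configured):

    -- On the standby: synchronize failover slots once, then check the result.
    SELECT pg_sync_replication_slots();
    SELECT slot_name, failover, synced FROM pg_replication_slots;
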
Amit Kapila 22f7e61a63 Clean-ups for 776621a5e4 and 7329240437.
Following are a few clean-ups related to failover option support in slots:
1. Improve the documentation in create_subscription.sgml.
2. Remove the spurious blank line in subscriptioncmds.c.
3. Remove the NOTICE for alter_replication_slot in subscriptioncmds.c as
we would sometimes print it even when nothing has changed. One can find
the change by enabling log_replication_commands on the publisher.
4. Optimize ReplicationSlotAlter() function to prevent disk flushing when
the slot's data remains unchanged.

Author: Hou Zhijie
Reviewed-by: Amit Kapila
Discussion: https://postgr.es/m/514f6f2f-6833-4539-39f1-96cd1e011f23@enterprisedb.com
Discussion: https://postgr.es/m/OS0PR01MB57164904651FB588A518E98894472@OS0PR01MB5716.jpnprd01.prod.outlook.com
2024-02-07 10:04:04 +05:30
Amit Kapila a9a47fb6d9 Fix comments in ReplicationSlotAcquire().
They were incorrectly referring to a slot parameter in
ReplicationSlotAcquire() which is not passed to the API.

Author: Wang Wei
Reviewed-by: Amit Kapila
Discussion: https://postgr.es/m/OS3PR01MB6275E3CE4DC15FF8B8B80D3A9E7A2@OS3PR01MB6275.jpnprd01.prod.outlook.com
2024-01-29 10:12:58 +05:30
Amit Kapila 7329240437 Allow setting failover property in the replication command.
This commit implements a new replication command called
ALTER_REPLICATION_SLOT and a corresponding walreceiver API function named
walrcv_alter_slot. Additionally, the CREATE_REPLICATION_SLOT command has
been extended to support the failover option.

These new additions allow the modification of the failover property of a
replication slot on the publisher. A subsequent commit will make use of
these commands in subscription commands and will add the tests as well to
cover the functionality added/changed by this commit.

Author: Hou Zhijie, Shveta Malik
Reviewed-by: Peter Smith, Bertrand Drouvot, Dilip Kumar, Masahiko Sawada, Nisha Moond, Kuroda Hayato, Amit Kapila
Discussion: https://postgr.es/m/514f6f2f-6833-4539-39f1-96cd1e011f23@enterprisedb.com
2024-01-29 09:37:23 +05:30
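
A sketch of the protocol-level usage, issued on a replication connection (e.g. psql "dbname=postgres replication=database"); the slot names are hypothetical and the exact option syntax follows the replication command grammar:

    -- Change the failover property of an existing slot.
    ALTER_REPLICATION_SLOT sub1_slot ( FAILOVER );

    -- The extended CREATE_REPLICATION_SLOT form can set it at creation time.
    CREATE_REPLICATION_SLOT sub2_slot LOGICAL pgoutput ( FAILOVER );
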
Amit Kapila c393308b69 Allow to enable failover property for replication slots via SQL API.
This commit adds the failover property to the replication slot. The
failover property indicates whether the slot will be synced to the standby
servers, enabling the resumption of corresponding logical replication
after failover. But note that this commit does not yet include the
capability to sync the replication slot; subsequent commits will add
that capability.

A new optional parameter 'failover' is added to the
pg_create_logical_replication_slot() function. We will also enable setting
the 'failover' option for slots via the subscription commands in
subsequent commits.

The value of the 'failover' flag is displayed as part of
the pg_replication_slots view.

Author: Hou Zhijie, Shveta Malik, Ajin Cherian
Reviewed-by: Peter Smith, Bertrand Drouvot, Dilip Kumar, Masahiko Sawada, Nisha Moond, Kuroda Hayato, Amit Kapila
Discussion: https://postgr.es/m/514f6f2f-6833-4539-39f1-96cd1e011f23@enterprisedb.com
2024-01-25 12:15:46 +05:30
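
A sketch of the SQL API (slot and plugin names are illustrative):

    -- Create a failover-enabled logical slot and verify the new flag.
    SELECT pg_create_logical_replication_slot('sub1_slot', 'test_decoding',
                                              failover => true);
    SELECT slot_name, failover FROM pg_replication_slots;
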
Bruce Momjian 29275b1d17 Update copyright for 2024
Reported-by: Michael Paquier

Discussion: https://postgr.es/m/ZZKTDPxBBMt3C0J9@paquier.xyz

Backpatch-through: 12
2024-01-03 20:49:05 -05:00
Amit Kapila 7c3fb505b1 Log messages for replication slot acquisition and release.
This commit logs messages (at LOG level when log_replication_commands is
set, otherwise at DEBUG1 level) when walsenders acquire and release
replication slots. These messages help to know the lifetime of a
replication slot - one can know how long a streaming standby, logical
subscriber, or replication slot consumer is down. These messages will be
useful on production servers to debug and analyze inactive replication
slots.

Note that these messages are emitted only for walsenders but not for
backends. This is because walsenders are the ones that typically hold
replication slots for longer durations, unlike backends which hold them
for executing replication-related functions.

Author: Bharath Rupireddy
Reviewed-by: Peter Smith, Amit Kapila, Alvaro Herrera
Discussion: http://postgr.es/m/CALj2ACX17G7F-jeLt+7KhJ6YxVeRwR8Zk0rDh4VnT546o0UpTQ@mail.gmail.com
2023-11-21 07:59:53 +05:30
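
To get these messages at LOG level, enable the GUC mentioned above, for example:

    -- Walsender slot acquisition/release is then logged at LOG level.
    ALTER SYSTEM SET log_replication_commands = on;
    SELECT pg_reload_conf();
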
Amit Kapila 8bfb231b43 Prohibit max_slot_wal_keep_size to value other than -1 during upgrade.
We don't want existing slots in the old cluster to get invalidated during
the upgrade. During an upgrade, we set this variable to -1 via the command
line in an attempt to prevent such invalidations, but users have ways to
override it. This patch ensures the value is not overridden by the user.

Author: Kyotaro Horiguchi
Reviewed-by: Peter Smith, Alvaro Herrera
Discussion: http://postgr.es/m/20231027.115759.2206827438943188717.horikyota.ntt@gmail.com
2023-11-10 08:45:01 +05:30
Amit Kapila 29d0a77fa6 Migrate logical slots to the new node during an upgrade.
While reading information from the old cluster, a list of logical
slots is fetched. In a later part of the upgrade, pg_upgrade revisits the
list and restores slots by executing pg_create_logical_replication_slot()
on the new cluster. Migration of logical replication slots is only
supported when the old cluster is version 17.0 or later.

If the old node has invalid slots or slots with unconsumed WAL records,
pg_upgrade fails. These checks are needed to prevent data loss.

The significant advantage of this commit is that it makes it easy to
continue logical replication even after upgrading the publisher node.
Previously, pg_upgrade allowed copying publications to a new node. With
this patch, adjusting the connection string to the new publisher will
cause the apply worker on the subscriber to connect to the new publisher
automatically. This enables seamless continuation of logical replication,
even after an upgrade.

Author: Hayato Kuroda, Hou Zhijie
Reviewed-by: Peter Smith, Bharath Rupireddy, Dilip Kumar, Vignesh C, Shlok Kyal
Discussion: http://postgr.es/m/TYAPR01MB58664C81887B3AF2EB6B16E3F5939@TYAPR01MB5866.jpnprd01.prod.outlook.com
Discussion: http://postgr.es/m/CAA4eK1+t7xYcfa0rEQw839=b2MzsfvYDPz3xbD+ZqOdP3zpKYg@mail.gmail.com
2023-10-26 07:06:55 +05:30
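
A rough way to eyeball on the old cluster whether logical slots are valid and caught up before running pg_upgrade (a sketch; pg_upgrade performs its own, stricter checks):

    SELECT slot_name, confirmed_flush_lsn, wal_status,
           pg_current_wal_insert_lsn() AS current_lsn
    FROM pg_replication_slots
    WHERE slot_type = 'logical';
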
David Rowley 2075ba9dc9 Tidy-up some appendStringInfo*() usages
Make a few newish calls to appendStringInfo() which have no special
formatting use appendStringInfoString() instead.  Also, adjust usages of
appendStringInfoString() which only append a string containing a single
character to make use of appendStringInfoChar() instead.

This makes the code marginally faster, but primarily this change is so
we use the StringInfo type as it was intended to be used.

Discussion: https://postgr.es/m/CAApHDvpXKQmL+r=VDNS98upqhr9yGBhv2Jw3GBFFk_wKHcB39A@mail.gmail.com
2023-10-03 17:09:52 +13:00
Amit Kapila e0b2eed047 Flush logical slots to disk during a shutdown checkpoint if required.
It's entirely possible for a logical slot to have a confirmed_flush LSN
higher than the last value saved on disk while not being marked as dirty.
Currently, it is not a major problem but a later patch adding support for
the upgrade of slots relies on that value being properly flushed to disk.

It can also help avoid processing the same transactions again in some
boundary cases after a clean shutdown and restart.  Say, we process
some transactions for which we didn't send anything downstream (the
changes got filtered) but the confirmed_flush LSN is updated due to
keepalives.  Without this patch, as we don't flush the latest value of the
confirmed_flush LSN, the same changes may be processed again.

The approach taken by this patch has been suggested by Ashutosh Bapat.

Author: Vignesh C, Julien Rouhaud, Kuroda Hayato
Reviewed-by: Amit Kapila, Dilip Kumar, Michael Paquier, Ashutosh Bapat, Peter Smith, Hou Zhijie
Discussion: http://postgr.es/m/CAA4eK1JzJagMmb_E8D4au=GYQkxox0AfNBm1FbP7sy7t4YWXPQ@mail.gmail.com
Discussion: http://postgr.es/m/TYAPR01MB58664C81887B3AF2EB6B16E3F5939@TYAPR01MB5866.jpnprd01.prod.outlook.com
2023-09-14 08:57:05 +05:30
Peter Eisentraut d71e6055e4 Fix lack of message pluralization 2023-08-24 14:22:46 +02:00
Peter Eisentraut 3ad5f07c0f Error message refactoring
Take some untranslatable things out of the message and replace them with
format placeholders, to reduce translatable strings and reduce
translation mistakes.
2023-06-23 16:36:17 +02:00
Andres Freund 5ec69b71f1 Improve error messages introduced in be87200efd and 0fdab27ad6
Author: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Discussion: https://postgr.es/m/20230411.120301.93333867350615278.horikyota.ntt@gmail.com
Discussion: https://postgr.es/m/20230412174244.6njadz4uoiez3l74@awork3.anarazel.de
2023-04-12 11:00:37 -07:00
Andres Freund 0fdab27ad6 Allow logical decoding on standbys
Unsurprisingly, this requires wal_level = logical to be set on the primary and
standby. The infrastructure added in 26669757b6 ensures that slots are
invalidated if the primary's wal_level is lowered.

Creating a slot on a standby waits for an xl_running_xact record to be
processed. If the primary is idle (and thus not emitting xl_running_xact
records), that can take a while.  To make that faster, this commit also
introduces the pg_log_standby_snapshot() function. By executing it on the
primary, completion of slot creation on the standby can be accelerated.

Note that logical decoding on a standby does not itself enforce that required
catalog rows are not removed. The user has to use physical replication slots +
hot_standby_feedback or other measures to prevent that. If catalog rows
required for a slot are removed, the slot is invalidated.

See 6af1793954 for an overall design of logical decoding on a standby.

Bumps catversion, for the addition of the pg_log_standby_snapshot() function.

Author: "Drouvot, Bertrand" <bertranddrouvot.pg@gmail.com>
Author: Andres Freund <andres@anarazel.de> (in an older version)
Author: Amit Khandekar <amitdkhan.pg@gmail.com> (in an older version)
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Fabrízio de Royes Mello <fabriziomello@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Reviewed-By: Robert Haas <robertmhaas@gmail.com>
2023-04-08 02:20:05 -07:00
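
A sketch of the workflow (slot and plugin names are illustrative; wal_level = logical is assumed on both nodes):

    -- On the standby: create a logical slot; this may block until a
    -- running-xacts record from the primary is replayed.
    SELECT pg_create_logical_replication_slot('standby_slot', 'test_decoding');

    -- On the primary: log a standby snapshot to speed up slot creation
    -- waiting on the standby.
    SELECT pg_log_standby_snapshot();
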
Andres Freund 26669757b6 Handle logical slot conflicts on standby
During WAL replay on the standby, when a conflict with a logical slot is
identified, invalidate such slots. There are two sources of conflicts:
1) Using the information added in 6af1793954, logical slots are invalidated if
   required rows are removed
2) wal_level on the primary server is reduced to below logical

Uses the infrastructure introduced in the prior commit, be87200efd.

Change InvalidatePossiblyObsoleteSlot() to use a recovery conflict to
interrupt use of a slot, if called in the startup process. The new recovery
conflict is added to pg_stat_database_conflicts, as confl_active_logicalslot.

See 6af1793954 for an overall design of logical decoding on a standby.

Bumps catversion for the addition of the pg_stat_database_conflicts column.
Bumps PGSTAT_FILE_FORMAT_ID for the same reason.

Author: "Drouvot, Bertrand" <bertranddrouvot.pg@gmail.com>
Author: Andres Freund <andres@anarazel.de>
Author: Amit Khandekar <amitdkhan.pg@gmail.com> (in an older version)
Reviewed-by: "Drouvot, Bertrand" <bertranddrouvot.pg@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Reviewed-by: Fabrízio de Royes Mello <fabriziomello@gmail.com>
Reviewed-by: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Reviewed-by: Alvaro Herrera <alvherre@alvh.no-ip.org>
Discussion: https://postgr.es/m/20230407075009.igg7be27ha2htkbt@awork3.anarazel.de
2023-04-08 00:05:44 -07:00
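
The new conflict counter can be read per database, for example:

    -- Number of active logical slot uses cancelled due to recovery conflicts.
    SELECT datname, confl_active_logicalslot
    FROM pg_stat_database_conflicts;
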
Andres Freund be87200efd Support invalidating replication slots due to horizon and wal_level
Needed for logical decoding on a standby. Slots need to be invalidated because
of the horizon if rows required for logical decoding are removed. If the
primary's wal_level is lowered from 'logical', logical slots on the standby
need to be invalidated.

The new invalidation methods will be used in a subsequent commit.

Logical slots that have been invalidated can be identified via the new
pg_replication_slots.conflicting column.

See 6af1793954 for an overall design of logical decoding on a standby.

Bumps catversion for the addition of the new pg_replication_slots column.

Author: "Drouvot, Bertrand" <bertranddrouvot.pg@gmail.com>
Author: Andres Freund <andres@anarazel.de>
Author: Amit Khandekar <amitdkhan.pg@gmail.com> (in an older version)
Reviewed-by: "Drouvot, Bertrand" <bertranddrouvot.pg@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Reviewed-by: Fabrízio de Royes Mello <fabriziomello@gmail.com>
Reviewed-by: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Reviewed-by: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Alvaro Herrera <alvherre@alvh.no-ip.org>
Discussion: https://postgr.es/m/20230407075009.igg7be27ha2htkbt@awork3.anarazel.de
2023-04-07 22:40:27 -07:00
Andres Freund 15f8203a59 Replace replication slot's invalidated_at LSN with an enum
This is mainly useful because the upcoming logical-decoding-on-standby feature
adds further reasons for invalidating slots, and we don't want to end up with
multiple invalidated_* fields, or check different attributes.

Eventually we should consider not resetting restart_lsn when invalidating a
slot due to max_slot_wal_keep_size. But that's a user visible change, so left
for later.

Increases SLOT_VERSION, due to the changed field (with a different alignment,
no less).

Reviewed-by: "Drouvot, Bertrand" <bertranddrouvot.pg@gmail.com>
Reviewed-by: Alvaro Herrera <alvherre@alvh.no-ip.org>
Reviewed-by: Melanie Plageman <melanieplageman@gmail.com>
Discussion: https://postgr.es/m/20230407075009.igg7be27ha2htkbt@awork3.anarazel.de
2023-04-07 21:47:25 -07:00
Peter Eisentraut de4d456b40 Improve several permission-related error messages.
Mainly move some detail from errmsg to errdetail, remove explicit
mention of superuser where appropriate, since that is implied in most
permission checks, and make messages more uniform.

Author: Nathan Bossart <nathandbossart@gmail.com>
Discussion: https://www.postgresql.org/message-id/20230316234701.GA903298@nathanxps13
2023-03-17 10:33:09 +01:00
Peter Eisentraut 442f870065 Integrate superuser check into has_rolreplication()
This makes it consistent with similar functions like
has_createrole_privilege() and allows removing some explicit superuser
checks.

Author: Nathan Bossart <nathandbossart@gmail.com>
Discussion: https://www.postgresql.org/message-id/20230310000313.GA3992372%40nathanxps13
2023-03-16 15:43:33 +01:00
Bruce Momjian c8e1ba736b Update copyright for 2023
Backpatch-through: 11
2023-01-02 15:00:37 -05:00
Alvaro Herrera 0557e17702 Ignore invalidated slots while computing oldest catalog Xmin
Once a logical slot has acquired a catalog_xmin, it doesn't let go of
it, even when invalidated by exceeding the max_slot_wal_keep_size, which
means that dead catalog tuples are no longer removed by vacuum from the
point of invalidation until the slot is dropped.  This could be
catastrophic if catalog churn is high.

Change the computation of Xmin to ignore invalidated slots,
to prevent dead rows from accumulating.

Backpatch to 13, where slot invalidation appeared.

Author: Sirisha Chamarthi <sirichamarthi22@gmail.com>
Reviewed-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Discussion: https://postgr.es/m/CAKrAKeUEDeqquN9vwzNeG-CN8wuVsfRYbeOUV9qKO_RHok=j+g@mail.gmail.com
2022-11-22 10:56:07 +01:00
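
A sketch of how such invalidated-but-not-dropped slots can be spotted (wal_status and catalog_xmin are existing pg_replication_slots columns):

    -- After this change, the catalog_xmin of slots invalidated by
    -- max_slot_wal_keep_size no longer holds back vacuum.
    SELECT slot_name, wal_status, catalog_xmin
    FROM pg_replication_slots
    WHERE wal_status = 'lost';
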
Peter Eisentraut b1099eca8f Remove AssertArg and AssertState
These don't offer anything over plain Assert, and their usage had
already been declared obsolescent.

Author: Nathan Bossart <nathandbossart@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://www.postgresql.org/message-id/20221009210148.GA900071@nathanxps13
2022-10-28 09:19:06 +02:00
Michael Paquier 7d25958453 Clean up some GUC declarations and comments
This adjusts a few things for GUCs related to logical replication,
replication slots and WAL senders, in the shape of incorrect comments
and values inconsistent with their initial default value.

Author: Peter Smith
Reviewed-by: Nathan Bossart, Tom Lane, Justin Pryzby
Discussion: https://postgr.es/m/CAHut+PtHE0XSfjjRQ6D4v7+dqzCw=d+1a64ujra4EX8aoc_Z+w@mail.gmail.com
2022-10-25 14:06:07 +09:00
Andres Freund 06dbd619bf pgstat: Prevent stats reset from corrupting slotname by removing slotname
Previously PgStat_StatReplSlotEntry contained the slotname, which was mainly
used when writing out the stats during shutdown, to identify the slot in the
serialized data (at runtime the index in ReplicationSlotCtl->replication_slots
is used, but that can change during a restart). Unfortunately the slotname was
overwritten when the slot's stats were reset.

That turned out to only cause "real" problems if the slot was active during
the reset, triggering an assertion failure at the next
pgstat_report_replslot(). In other paths the stats were re-initialized during
pgstat_acquire_replslot().

Fix this by removing slotname from PgStat_StatReplSlotEntry. Instead we can
get the slot's name from the slot itself. Besides fixing a bug, this also is
architecturally cleaner (a name is not really statistics). This is safe
because stats for a slot removed while the server was shut down will not be
restored at startup.

In 15 the slotname is not removed, but renamed, to avoid changing the stats
format. In master, bump PGSTAT_FILE_FORMAT_ID.

This commit does not contain a test for the fix. I think this can only be
tested by a tap test starting pg_recvlogical in the background and checking
pg_recvlogical's output. That type of test is notoriously hard to make reliable,
so committing it shortly before the release is wrapped seems like a bad idea.

Reported-by: Jaime Casanova <jcasanov@systemguards.com.ec>
Author: Andres Freund <andres@anarazel.de>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Reviewed-by: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Discussion: https://postgr.es/m/YxfagaTXUNa9ggLb@ahch-to
Backpatch: 15-, where the bug was introduced in 5891c7a8ed
2022-10-08 09:43:29 -07:00
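
For reference, the reset path involved here is the SQL-level reset function (the slot name is hypothetical):

    -- Reset statistics for one slot, or for all slots with NULL.
    SELECT pg_stat_reset_replication_slot('sub1_slot');
    SELECT pg_stat_reset_replication_slot(NULL);
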
Tom Lane 551aa6b7b9 Improve wording of log messages triggered by max_slot_wal_keep_size.
The one about "terminating process to release replication slot" told
you nothing about why that was happening.  The one about "invalidating
slot because its restart_lsn exceeds max_slot_wal_keep_size" told you
what was happening, but violated our message style guideline about
keeping the primary message short.  Add DETAIL/HINT lines to carry
the appropriate detail and make the two cases more uniform.

While here, fix bogus test logic in 019_replslot_limit.pl: if it timed
out without seeing the expected log message, no test failure would be
reported.  This is flat broken since commit 549ec201d removed the test
counts; even before that it was horribly bad style, since you'd only
get told that not all tests had been run.

Kyotaro Horiguchi, reviewed by Bertrand Drouvot; test fixes by me

Discussion: https://postgr.es/m/20211214.130456.2233153190058148084.horikyota.ntt@gmail.com
2022-09-29 13:27:48 -04:00
Peter Geoghegan bfcf1b3480 Harmonize parameter names in storage and AM code.
Make sure that function declarations use names that exactly match the
corresponding names from function definitions in storage, catalog,
access method, executor, and logical replication code, as well as in
miscellaneous utility/library code.

Like other recent commits that cleaned up function parameter names, this
commit was written with help from clang-tidy.  Later commits will do the
same for other parts of the codebase.

Author: Peter Geoghegan <pg@bowt.ie>
Reviewed-By: David Rowley <dgrowleyml@gmail.com>
Discussion: https://postgr.es/m/CAH2-WznJt9CMM9KJTMjJh_zbL5hD9oX44qdJ4aqZtjFi-zA3Tg@mail.gmail.com
2022-09-19 19:18:36 -07:00
Michael Paquier bfb9dfd937 Expand the use of get_dirent_type(), shaving a few calls to stat()/lstat()
Several backend-side loops scanning one or more directories with
ReadDir() (WAL segment recycle/removal in xlog.c, backend-side directory
copy, temporary file removal, configuration file parsing, some logical
decoding logic and some pgtz stuff) already know the type of the entry
being scanned thanks to the dirent structure associated to the entry, on
platforms where we know about DT_REG, DT_DIR and DT_LNK to make the
difference between a regular file, a directory and a symbolic link.

Relying on the dirent structure of an entry saves a few system calls to
stat() and lstat() in the loops updated here, shaving some code while on
it.  The logic of the code remains the same, calling stat() or lstat()
depending on if it is necessary to look through symlinks.

Authors: Nathan Bossart, Bharath Rupireddy
Reviewed-by: Andres Freund, Thomas Munro, Michael Paquier
Discussion: https://postgr.es/m/CALj2ACV8n-J-f=yiLUOx2=HrQGPSOZM3nWzyQQvLPcccPXxEdg@mail.gmail.com
2022-09-02 16:58:06 +09:00
Andres Freund 3f8148c256 Revert 019_replslot_limit.pl related debugging aids.
This reverts most of 91c0570a79, f28bf667f6, fe0972ee5e, afdeff1052. The
only thing left is the retry loop in 019_replslot_limit.pl that avoids
spurious failures by retrying a couple times.

We haven't seen any hard evidence that this is caused by anything but slow
process shutdown. We did not find any cases where walsenders did not vanish
after waiting for longer. Therefore there's no reason for this debugging code
to remain.

Discussion: https://postgr.es/m/20220530190155.47wr3x2prdwyciah@alap3.anarazel.de
Backpatch: 15-
2022-07-05 11:01:10 -07:00
Tom Lane 23e7b38bfe Pre-beta mechanical code beautification.
Run pgindent, pgperltidy, and reformat-dat-files.
I manually fixed a couple of comments that pgindent uglified.
2022-05-12 15:17:30 -04:00
David Rowley b0e5f02ddc Fix various typos and spelling mistakes in code comments
Author: Justin Pryzby
Discussion: https://postgr.es/m/20220411020336.GB26620@telsasoft.com
2022-04-11 20:49:41 +12:00
Andres Freund 5891c7a8ed pgstat: store statistics in shared memory.
Previously the statistics collector received statistics updates via UDP and
shared statistics data by writing them out to temporary files regularly. These
files can reach tens of megabytes and are written out up to twice a
second. This has repeatedly prevented us from adding additional useful
statistics.

Now statistics are stored in shared memory. Statistics for variable-numbered
objects are stored in a dshash hashtable (backed by dynamic shared
memory). Fixed-numbered stats are stored in plain shared memory.

The header for pgstat.c contains an overview of the architecture.

The stats collector is not needed anymore, remove it.

By utilizing the transactional statistics drop infrastructure introduced in a
prior commit, statistics entries cannot "leak" anymore. Previously leaked
statistics were dropped by pgstat_vacuum_stat(), called from [auto-]vacuum. On
systems with many small relations pgstat_vacuum_stat() could be quite
expensive.

Now that replicas drop statistics entries for dropped objects, it is not
necessary anymore to reset stats when starting from a cleanly shut down
replica.

Subsequent commits will perform some further code cleanup, adapt docs and add
tests.

Bumps PGSTAT_FILE_FORMAT_ID.

Author: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Author: Andres Freund <andres@anarazel.de>
Author: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-By: Andres Freund <andres@anarazel.de>
Reviewed-By: Thomas Munro <thomas.munro@gmail.com>
Reviewed-By: Justin Pryzby <pryzby@telsasoft.com>
Reviewed-By: "David G. Johnston" <david.g.johnston@gmail.com>
Reviewed-By: Tomas Vondra <tomas.vondra@2ndquadrant.com> (in a much earlier version)
Reviewed-By: Arthur Zakirov <a.zakirov@postgrespro.ru> (in a much earlier version)
Reviewed-By: Antonin Houska <ah@cybertec.at> (in a much earlier version)
Discussion: https://postgr.es/m/20220303021600.hs34ghqcw6zcokdh@alap3.anarazel.de
Discussion: https://postgr.es/m/20220308205351.2xcn6k4x5yivcxyd@alap3.anarazel.de
Discussion: https://postgr.es/m/20210319235115.y3wz7hpnnrshdyv6@alap3.anarazel.de
2022-04-06 21:29:46 -07:00
Andres Freund e41aed674f pgstat: revise replication slot API in preparation for shared memory stats.
Previously the pgstat <-> replication slots API was done on the basis of
names. However, the upcoming move to storing stats in shared memory makes it
more convenient to use an integer as key.

Change the replication slot functions to take the slot rather than the slot
name, and expose ReplicationSlotIndex() to compute the index of a replication
slot. Special handling will be required for restarts, as the index is not
stable across restarts. For now pgstat internally still uses names.

Rename pgstat_report_replslot_{create,drop}() to
pgstat_{create,drop}_replslot() to match the functions for other kinds of
stats.

Reviewed-By: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Discussion: https://postgr.es/m/20220404041516.cctrvpadhuriawlq@alap3.anarazel.de
2022-04-06 18:38:24 -07:00
Andres Freund 91c0570a79 Don't fail for > 1 walsenders in 019_replslot_limit, add debug messages.
So far the first of the retries introduced in f28bf667f6 resolves the
issue. But I (Andres) am still suspicious that the start of the failures might
indicate a problem.

To reduce noise, stop reporting a failure if a retry resolves the problem. To
allow figuring out what causes the slow slot drop, add a few more debug
messages to ReplicationSlotDropPtr.

See also commit afdeff1052, fe0972ee5e and f28bf667f6.

Discussion: https://postgr.es/m/20220327213219.smdvfkq2fl74flow@alap3.anarazel.de
2022-03-27 22:35:42 -07:00
Andres Freund d33aeefd9b Fix warning on mingw due to pid_t width, introduced in fe0972ee5e. 2022-02-26 16:07:07 -08:00
Andres Freund fe0972ee5e Add further debug info to help debug 019_replslot_limit.pl failures.
See also afdeff1052. Failures after that commit provided a few more hints,
but not yet enough to understand what's going on.

In 019_replslot_limit.pl shut down nodes with fast instead of immediate mode
if we observe the failure mode. That should tell us whether the failures we're
observing are just a timing issue under high load. PGCTLTIMEOUT should prevent
buildfarm animals from hanging endlessly.

Also adds a bit more logging to replication slot drop and ShutdownPostgres().

Discussion: https://postgr.es/m/20220225192941.hqnvefgdzaro6gzg@alap3.anarazel.de
2022-02-25 17:04:39 -08:00
Andres Freund afdeff1052 Add temporary debug info to help debug 019_replslot_limit.pl failures.
I have not been able to reproduce the occasional failures of
019_replslot_limit.pl we are seeing in the buildfarm and not for lack of
trying. The additional logging and increased log level will hopefully help.

Will be reverted once the cause is identified.

Discussion: https://postgr.es/m/20220218231415.c4plkp4i3reqcwip@alap3.anarazel.de
2022-02-22 18:02:34 -08:00
Andres Freund 2f6501fa3c Move replication slot release to before_shmem_exit().
Previously, replication slots were released in ProcKill() on error, resulting
in the drop of ephemeral replication slots being reported after the stats
subsystem was already shut down.

To fix this problem, move replication slot release to a before_shmem_exit()
hook that is called before the stats collector shuts down. There wasn't really
a good reason for the slot handling to be in ProcKill() anyway.

Patch by Masahiko Sawada, with very minor polishing by me.

I, Andres, wrote a test for dropping slots during process exit, but there may
be some OS dependent issues around the number of times FATAL error messages
are displayed due to a still debated libpq issue. So that test will be
committed separately / later.

Reviewed-By: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Reviewed-By: Andres Freund <andres@anarazel.de>
Author: Masahiko Sawada <sawada.mshk@gmail.com>
Discussion: https://postgr.es/m/CAD21AoDAeEpAbZEyYJsPZJUmSPaRicVSBObaL7sPaofnKz+9zg@mail.gmail.com
2022-02-14 17:08:17 -08:00