postgresql

Commit Graph

Author	SHA1	Message	Date
Alexander Korotkov	b1484a3f19	Let table AM insertion methods control index insertion Previously, the executor did index insert unconditionally after calling table AM interface methods tuple_insert() and multi_insert(). This commit introduces the new parameter insert_indexes for these two methods. Setting '*insert_indexes' to true saves the current logic. Setting it to false indicates that table AM cares about index inserts itself and doesn't want the caller to do that. Discussion: https://postgr.es/m/CAPpHfdurb9ycV8udYqM%3Do0sPS66PJ4RCBM1g-bBpvzUfogY0EA%40mail.gmail.com Reviewed-by: Pavel Borisov, Matthias van de Meent, Mark Dilger	2024-03-30 22:53:56 +02:00
Alexander Korotkov	c95c25f9af	Custom reloptions for table AM Let table AM define custom reloptions for its tables. This allows to specify AM-specific parameters by WITH clause when creating a table. The code may use some parts from prior work by Hao Wu. Discussion: https://postgr.es/m/CAPpHfdurb9ycV8udYqM%3Do0sPS66PJ4RCBM1g-bBpvzUfogY0EA%40mail.gmail.com Discussion: https://postgr.es/m/AMUA1wBBBxfc3tKRLLdU64rb.1.1683276279979.Hmail.wuhao%40hashdata.cn Reviewed-by: Reviewed-by: Pavel Borisov, Matthias van de Meent	2024-03-30 22:36:25 +02:00
Alexander Korotkov	27bc1772fc	Generalize relation analyze in table AM interface Currently, there is just one algorithm for sampling tuples from a table written in acquire_sample_rows(). Custom table AM can just redefine the way to get the next block/tuple by implementing scan_analyze_next_block() and scan_analyze_next_tuple() API functions. This approach doesn't seem general enough. For instance, it's unclear how to sample this way index-organized tables. This commit allows table AM to encapsulate the whole sampling algorithm (currently implemented in acquire_sample_rows()) into the relation_analyze() API function. Discussion: https://postgr.es/m/CAPpHfdurb9ycV8udYqM%3Do0sPS66PJ4RCBM1g-bBpvzUfogY0EA%40mail.gmail.com Reviewed-by: Pavel Borisov, Matthias van de Meent	2024-03-30 22:34:04 +02:00
Tom Lane	0075d78947	Allow "internal" subtransactions in parallel mode. Allow use of BeginInternalSubTransaction() in parallel mode, so long as the subtransaction doesn't attempt to acquire an XID or increment the command counter. Given those restrictions, the other parallel processes don't need to know about the subtransaction at all, so this should be safe. The benefit is that it allows subtransactions intended for error recovery, such as pl/pgsql exception blocks, to be used in PARALLEL SAFE functions. Another reason for doing this is that the API of BeginInternalSubTransaction() doesn't allow reporting failure. pl/python for one, and perhaps other PLs, copes very poorly with an error longjmp out of BeginInternalSubTransaction(). The headline feature of this patch removes the only easily-triggerable failure case within that function. There remain some resource-exhaustion and similar cases, which we now deal with by promoting them to FATAL errors, so that callers need not try to clean up. (It is likely that such errors would leave us with corrupted transaction state inside xact.c, making recovery difficult if not impossible anyway.) Although this work started because of a report of a pl/python crash, we're not going to do anything about that in the back branches. Back-patching this particular fix is obviously not very wise. While we could contemplate some narrower band-aid, pl/python is already an untrusted language, so it seems okay to classify this as a "so don't do that" case. Patch by me, per report from Hao Zhang. Thanks to Robert Haas for review. Discussion: https://postgr.es/m/CALY6Dr-2yLVeVPhNMhuBnRgOZo1UjoTETgtKBx1B2gUi8yy+3g@mail.gmail.com	2024-03-28 12:43:10 -04:00
Heikki Linnakangas	427005742b	Remove obsolete comment about VACUUM retrying pruning Commit `1ccc1e05ae` removed the retry logic that the comment talked about. Reviewed-by: Melanie Plageman <melanieplageman@gmail.com> Discussion: https://www.postgresql.org/message-id/20240328015326.x5gnzsohl6j23b42@liskov	2024-03-28 10:18:48 +02:00
Masahiko Sawada	2d8f56dabb	Rethink create and attach APIs of shared TidStore. Previously, the behavior of TidStoreCreate() was inconsistent between local and shared TidStore instances in terms of memory limitation. For local TidStore, a memory context was created with initial and maximum memory block sizes, as well as a minimum memory context size, based on the specified max_bytes values. However, for shared TidStore, the provided DSA area was used for TID storage. Although commit `bb952c8c8b` allowed specifying the initial and maximum DSA segment sizes, callers would have needed to clamp their own limits, which was not consistent and user-friendly. With this commit, when creating a shared TidStore, a dedicated DSA area is created for TID storage instead of using a provided DSA area. The initial and maximum DSA segment sizes are chosen based on the specified max_bytes. Other processes can attach to the shared TidStore using the handle of the created DSA returned by the new TidStoreGetDSA() function and the DSA pointer returned by TidStoreGetHandle(). The created DSA has the same lifetime as the shared TidStore and is deleted when all processes detach from it. To improve clarity, the TidStoreCreate() function has been divided into two separate functions: TidStoreCreateLocal() and TidStoreCreateShared(). Reviewed-by: John Naylor Discussion: https://postgr.es/m/CAD21AoAyc1j%3DBCdUqZfk6qbdjZ68UgRx1Gkpk0oah4K7S0Ri9g%40mail.gmail.com	2024-03-28 10:03:28 +09:00
Alexander Korotkov	818861eb57	Fix some typos and grammar issues from commit `87985cc925` Reported-by: Alexander Lakhin	2024-03-27 11:47:41 +02:00
Masahiko Sawada	4edb37e322	Fix a calculation in TidStoreCreate(). Since we expect that the max_bytes is in bytes, not in kilobytes, it should not be multiplied by 1024. Introduced by `30e144287a`. Reported-by: John Naylor, David Rowley Reviewed-by: John Naylor Discussion: https://postgr.es/m/CANWCAZZTE-14ofsucofTuhFsfuDGBNf%3DNZb22TMYT8bxA41oQQ%40mail.gmail.com Discussion: https://postgr.es/m/CAApHDvojg82NDaDEpj1WEZSbVTafj%3DDRmW%2BFrkBdW8ScL4OFxA%40mail.gmail.com	2024-03-26 13:06:06 +09:00
Alexander Korotkov	87985cc925	Allow locking updated tuples in tuple_update() and tuple_delete() Currently, in read committed transaction isolation mode (default), we have the following sequence of actions when tuple_update()/tuple_delete() finds the tuple updated by the concurrent transaction. 1. Attempt to update/delete tuple with tuple_update()/tuple_delete(), which returns TM_Updated. 2. Lock tuple with tuple_lock(). 3. Re-evaluate plan qual (recheck if we still need to update/delete and calculate the new tuple for update). 4. Second attempt to update/delete tuple with tuple_update()/tuple_delete(). This attempt should be successful, since the tuple was previously locked. This commit eliminates step 2 by taking the lock during the first tuple_update()/tuple_delete() call. The heap table access method saves some effort by checking the updated tuple once instead of twice. Future undo-based table access methods, which will start from the latest row version, can immediately place a lock there. Also, this commit makes tuple_update()/tuple_delete() optionally save the old tuple into the dedicated slot. That saves efforts on re-fetching tuples in certain cases. The code in nodeModifyTable.c is simplified by removing the nested switch/case. Discussion: https://postgr.es/m/CAPpHfdua-YFw3XTprfutzGp28xXLigFtzNbuFY8yPhqeq6X5kg%40mail.gmail.com Reviewed-by: Aleksander Alekseev, Pavel Borisov, Vignesh C, Mason Sharp Reviewed-by: Andres Freund, Chris Travers	2024-03-26 01:27:56 +02:00
Heikki Linnakangas	f83d709760	Merge prune, freeze and vacuum WAL record formats The new combined WAL record is now used for pruning, freezing and 2nd pass of vacuum. This is in preparation for changing VACUUM to write a combined prune+freeze record per page, instead of separate two records. The new WAL record format now supports that, but the code still always writes separate records for pruning and freezing. This reserves separate XLOG_HEAP2_* info codes for when the pruning record is emitted for on-access pruning or VACUUM, per Peter Geoghegan's suggestion. The record format is identical, but having separate info codes makes it easier analyze pruning and vacuuming with pg_waldump. The function to emit the new WAL record, log_heap_prune_and_freeze(), is in pruneheap.c. The existing heap_log_freeze_plan() and its subroutines are moved to pruneheap.c without changes, to keep them together with log_heap_prune_and_freeze(). Author: Melanie Plageman <melanieplageman@gmail.com> Discussion: https://www.postgresql.org/message-id/CAAKRu_azf-zH%3DDgVbquZ3tFWjMY1w5pO8m-TXJaMdri8z3933g@mail.gmail.com Discussion: https://www.postgresql.org/message-id/CAAKRu_b2oE4GL%3Dq4g9mcByS9yT7wTQvEH9OLpabj28e%2BWKFi2A@mail.gmail.com	2024-03-25 14:59:58 +02:00
Alexander Korotkov	b670b93a66	Fix an oversight in refactoring in `06b10f80ba`. It was against intended skipping prechecking keys optimization in the first page of range queries to not influence point queries performance. Reported-by: Anton Melnikov Discussion: https://postgr.es/m/30cd7524-b9f1-4cf8-9c4a-223eb2e34441%40postgrespro.ru Author: Pavel Borisov	2024-03-22 15:25:53 +02:00
Alexander Korotkov	c35a3fb5e0	Allow table AM tuple_insert() method to return the different slot This allows table AM to return a native tuple slot even if VirtualTupleTableSlot is given as an input. Native tuple slots have knowledge about system attributes, which could be accessed in the future. table_multi_insert() method already can modify the input 'slots' array. Discussion: https://postgr.es/m/CAPpHfdurb9ycV8udYqM%3Do0sPS66PJ4RCBM1g-bBpvzUfogY0EA%40mail.gmail.com Reviewed-by: Matthias van de Meent, Mark Dilger, Pavel Borisov Reviewed-by: Nikita Malakhov, Japin Li	2024-03-21 23:00:40 +02:00
Alexander Korotkov	02eb07ea89	Allow table AM to store complex data structures in rd_amcache The new table AM method free_rd_amcache is responsible for freeing all the memory related to rd_amcache and setting free_rd_amcache to NULL. If the new method is not specified, we still assume rd_amcache to be a single chunk of memory, which could be just pfree'd. Discussion: https://postgr.es/m/CAPpHfdurb9ycV8udYqM%3Do0sPS66PJ4RCBM1g-bBpvzUfogY0EA%40mail.gmail.com Reviewed-by: Matthias van de Meent, Mark Dilger, Pavel Borisov Reviewed-by: Nikita Malakhov, Japin Li	2024-03-21 23:00:34 +02:00
Masahiko Sawada	30e144287a	Add TIDStore, to store sets of TIDs (ItemPointerData) efficiently. TIDStore is a data structure designed to efficiently store large sets of TIDs. For TID storage, it employs a radix tree, where the key is a block number, and the value is a bitmap representing offset numbers. The TIDStore can be created on a DSA area and used by multiple backend processes simultaneously. There are potential future users such as tidbitmap.c, though it's very likely the interface will need to evolve as we come to understand the needs of different kinds of users. For example, we can support updating the offset bitmap of existing values. Currently, the TIDStore is not used for anything yet, aside from the test code. But an upcoming patch will use it. This includes a unit test module, in src/test/modules/test_tidstore. Co-authored-by: John Naylor Discussion: https://postgr.es/m/CAD21AoAfOZvmfR0j8VmZorZjL7RhTiQdVttNuC4W-Shdc2a-AA%40mail.gmail.com	2024-03-21 10:08:42 +09:00
Heikki Linnakangas	c9c260decd	Remove unused PruneState member rel PruneState->rel is no longer being used, so just remove it. Author: Melanie Plageman <melanieplageman@gmail.com> Discussion: https://www.postgresql.org/message-id/20240320013602.6sypr4cx6sefpemg@liskov	2024-03-20 10:13:42 +02:00
Heikki Linnakangas	c33084205a	Reorganize heap_page_prune() function comment heap_page_prune()'s function header comment didn't explain the parameters in the same order they appear in the function. Fix that. Author: Melanie Plageman <melanieplageman@gmail.com> Discussion: https://www.postgresql.org/message-id/20240320013602.6sypr4cx6sefpemg@liskov	2024-03-20 10:13:39 +02:00
Peter Eisentraut	20e58105ba	Separate equalRowTypes() from equalTupleDescs() This introduces a new function equalRowTypes() that is effectively a subset of equalTupleDescs() but only compares the number of attributes and attribute name, type, typmod, and collation. This is enough for most existing uses of equalTupleDescs(), which are changed to use the new function. The only remaining callers of equalTupleDescs() are those that really want to check the full tuple descriptor as such, without concern about record or row or record type semantics. The existing function hashTupleDesc() is renamed to hashRowType(), because it now corresponds more to equalRowTypes(). The purpose of this change is to be clearer about the semantics of the equality asked for by each caller. (At least one caller had a comment that questioned whether equalTupleDescs() was too restrictive.) For example, `4f622503d6` removed attstattarget from the tuple descriptor structure. It was not fully clear at the time how this should affect equalTupleDescs(). Now the answer is clear: By their own definitions, equalRowTypes() does not care, and equalTupleDescs() just compares whatever is in the tuple descriptor but does not care why it is in there. Reviewed-by: Tomas Vondra <tomas.vondra@enterprisedb.com> Discussion: https://www.postgresql.org/message-id/flat/f656d6d9-6660-4518-a006-2f65cafbebd1%40eisentraut.org	2024-03-17 05:58:04 +01:00
Heikki Linnakangas	84c18acaf6	Remove redundant snapshot copying from parallel leader to workers The parallel query infrastructure copies the leader backend's active snapshot to the worker processes. But BitmapHeapScan node also had bespoken code to pass the snapshot from leader to the worker. That was redundant, so remove it. The removed code was analogous to the snapshot serialization in table_parallelscan_initialize(), but that was the wrong role model. A parallel bitmap heap scan is more like an independent non-parallel bitmap heap scan in each parallel worker as far as the table AM is concerned, because the coordination is done in nodeBitmapHeapscan.c, and the table AM doesn't need to know anything about it. This relies on the assumption that es_snapshot == GetActiveSnapshot(). That's not a new assumption, things would get weird if you used the QueryDesc's snapshot for visibility checks in the scans, but the active snapshot for evaluating quals, for example. This could use some refactoring and cleanup, but for now, just add some assertions. Reviewed-by: Dilip Kumar, Robert Haas Discussion: https://www.postgresql.org/message-id/5f3b9d59-0f43-419d-80ca-6d04c07cf61a@iki.fi	2024-03-14 15:18:10 +02:00
Peter Eisentraut	97d85be365	Make the order of the header file includes consistent Similar to commit `7e735035f2`. Author: Richard Guo <guofenglinux@gmail.com> Reviewed-by: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/CAMbWs4-WhpCFMbXCjtJ%2BFzmjfPrp7Hw1pk4p%2BZpU95Kh3ofZ1A%40mail.gmail.com	2024-03-13 15:07:00 +01:00
Peter Geoghegan	3045324214	Update obsolete index scan TID comments. Oversight in commit `c2fe139c20`.	2024-03-11 18:07:10 -04:00
Heikki Linnakangas	3d8652cd32	Remove unneeded vacuum_delay_point from heap_vac_scan_get_next_block heap_vac_scan_get_next_block() does relatively little work, so there is no need to call vacuum_delay_point(). A future commit will call heap_vac_scan_get_next_block() from a callback, and we would like to avoid calling vacuum_delay_point() in that callback. Author: Melanie Plageman Discussion: https://postgr.es/m/CAAKRu_Yf3gvXGcCnqqfoq0Q8LX8UM-e-qbm_B1LeZh60f8WhWA%40mail.gmail.com	2024-03-11 20:45:33 +02:00
Heikki Linnakangas	4e76f984a7	Confine vacuum skip logic to lazy_scan_skip() Rename lazy_scan_skip() to heap_vac_scan_next_block() and move more code into the function, so that the caller doesn't need to know about ranges or skipping anymore. heap_vac_scan_next_block() returns the next block to process, and the logic for determining that block is all within the function. This makes the skipping logic easier to understand, as it's all in the same function, and makes the calling code easier to understand as it's less cluttered. The state variables needed to manage the skipping logic are moved to LVRelState. heap_vac_scan_next_block() now manages its own VM buffer separately from the caller's vmbuffer variable. The caller's vmbuffer holds the VM page for the current block its processing, while heap_vac_scan_next_block() keeps a pin on the VM page for the next unskippable block. Most of the time they are the same, so we hold two pins on the same buffer, but it's more convenient to manage them separately. For readability inside heap_vac_scan_next_block(), move the logic of finding the next unskippable block to separate function, and add some comments. This refactoring will also help future patches to switch to using a streaming read interface, and eventually AIO (https://postgr.es/m/CA%2BhUKGJkOiOCa%2Bmag4BF%2BzHo7qo%3Do9CFheB8%3Dg6uT5TUm2gkvA%40mail.gmail.com) Author: Melanie Plageman, Heikki Linnakangas Reviewed-by: Andres Freund (older version) Discussion: https://postgr.es/m/CAAKRu_Yf3gvXGcCnqqfoq0Q8LX8UM-e-qbm_B1LeZh60f8WhWA%40mail.gmail.com	2024-03-11 20:43:58 +02:00
Heikki Linnakangas	674e49c73c	Set all_visible_according_to_vm correctly with DISABLE_PAGE_SKIPPING It's important for 'all_visible_according_to_vm' to correctly reflect whether the VM bit is set or not, even when we are not trusting the VM to skip pages, because contrary to what the comment said, lazy_scan_prune() relies on it. If it's incorrectly set to 'false', when the VM bit is in fact set, lazy_scan_prune() will try to set the VM bit again and dirty the page unnecessarily. As a result, if you used DISABLE_PAGE_SKIPPING, all heap pages were dirtied, even if there were no changes. We would also fail to clear any VM bits that were set incorrectly. This was broken in commit `980ae17310`, so backpatch to v16. Backpatch-through: 16 Reviewed-by: Melanie Plageman, Peter Geoghegan Discussion: https://www.postgresql.org/message-id/3df2b582-dc1c-46b6-99b6-38eddd1b2784@iki.fi	2024-03-11 09:28:09 +02:00
Michael Paquier	f500ba07fa	Add some checkpoint and redo LSNs to a couple of recovery errors Two FATALs and one PANIC gain details about the LSNs they fail at: - When restoring from a backup_label, the FATAL log generated when not finding the checkpoint record now reports its LSN. - When restoring from a backup_label, the FATAL log generated when not finding the redo record referenced by a checkpoint record now shows both the redo and checkpoint record LSNs. - When not restoring from a backup_label, the PANIC error generated when not finding the checkpoint record now reports its LSN. This information is useful when debugging corruption issues, and these LSNs may not show up in the logs depending on the level of logging configured in the backend. Author: David Steele Discussion: https://postgr.es/m/0e90da89-77ca-4ccf-872c-9626d755e288@pgmasters.net	2024-03-11 09:08:05 +09:00
Alexander Korotkov	6f38c43eb1	Avoid stack overflow in ShowTransactionStateRec() The function recurses, but didn't perform stack-depth checks. It's just a debugging aid, so instead of the usual check_stack_depth() call, stop the printing if we'd risk stack overflow. Here's an example of how to test this: (n=1000000; printf "BEGIN;"; for ((i=1;i<=$n;i++)); do printf "SAVEPOINT s$i;"; done; printf "SET log_min_messages = 'DEBUG5'; SAVEPOINT sp;") \| psql >/dev/null In the passing, swap building the list of child XIDs and recursing to parent. That saves memory while recursing, reducing the risk of out of memory errors with lots of subtransactions. The saving is not very significant in practice, but this order seems more logical anyway. Report by Egor Chindyaskin and Alexander Lakhin. Discussion: https://www.postgresql.org/message-id/1672760457.940462079%40f306.i.mail.ru Author: Heikki Linnakangas Reviewed-by: Robert Haas, Andres Freund, Alexander Korotkov	2024-03-08 13:18:30 +02:00
Alexander Korotkov	fefd9a3fed	Turn tail recursion into iteration in CommitTransactionCommand() Usually the compiler will optimize away the tail recursion anyway, but if it doesn't, you can drive the function into stack overflow. For example: (n=1000000; printf "BEGIN;"; for ((i=1;i<=$n;i++)); do printf "SAVEPOINT s$i;"; done; printf "ERROR; COMMIT;") \| psql >/dev/null In order to get better readability and less changes to the existing code the recursion-replacing loop is implemented as a wrapper function. Report by Egor Chindyaskin and Alexander Lakhin. Discussion: https://postgr.es/m/1672760457.940462079%40f306.i.mail.ru Author: Alexander Korotkov, Heikki Linnakangas	2024-03-08 13:18:30 +02:00
Alvaro Herrera	0d3a71d0c8	Fix misspelled assertions Remove an extra & operator, per Tom Lane. My bugs, introduced with commit `53c2a97a92`. Discussion: https://postgr.es/m/3885480.1709590472@sss.pgh.pa.us	2024-03-05 12:13:12 +01:00
Alvaro Herrera	1a2654b32b	Rework redundant code in subtrans.c When this code was written the duplicity didn't matter, but with all the SLRU-bank stuff we just added, it has become excessive. Turn it into a simpler loop with no code duplication. Also add a test so that this code becomes covered. Discussion: https://postgr.es/m/202403041517.3a35jw53os65@alvherre.pgsql	2024-03-05 12:09:18 +01:00
Jeff Davis	59825d1639	Fix buildfarm failures from `2af07e2f74`. Use GUC_ACTION_SAVE rather than GUC_ACTION_SET, necessary for working with parallel query. Now that the call requires more arguments, wrap the call in a new function to avoid code duplication and offer a place for a comment. Discussion: https://postgr.es/m/E1rhJpO-0027Wf-9L@gemulon.postgresql.org	2024-03-04 19:42:16 -08:00
Jeff Davis	2af07e2f74	Fix search_path to a safe value during maintenance operations. While executing maintenance operations (ANALYZE, CLUSTER, REFRESH MATERIALIZED VIEW, REINDEX, or VACUUM), set search_path to 'pg_catalog, pg_temp' to prevent inconsistent behavior. Functions that are used for functional indexes, in index expressions, or in materialized views and depend on a different search path must be declared with CREATE FUNCTION ... SET search_path='...'. This change was previously committed as `05e1737351`, then reverted in commit `2fcc7ee7af` because it was too late in the cycle. Preparation for the MAINTAIN privilege, which was previously reverted due to search_path manipulation hazards. Discussion: https://postgr.es/m/d4ccaf3658cb3c281ec88c851a09733cd9482f22.camel@j-davis.com Discussion: https://postgr.es/m/E1q7j7Y-000z1H-Hr%40gemulon.postgresql.org Discussion: https://postgr.es/m/e44327179e5c9015c8dda67351c04da552066017.camel%40j-davis.com Reviewed-by: Greg Stark, Nathan Bossart, Noah Misch	2024-03-04 17:31:38 -08:00
Alvaro Herrera	a0b808baef	Rework locking code in GetMultiXactIdMembers After commit `53c2a97a92`, the code flow around the "retry" goto label in GetMultiXactIdMembers was confused about what was possible: we never return there with a held lock, so there's no point in testing for one. This realization lets us simplify the code a bit. While at it, make the scope of a couple of local variables in the same function a bit tighter. Per Coverity.	2024-03-04 17:48:01 +01:00
Alvaro Herrera	f9baaf96d3	Simplify coding in slru.c New code in `53c2a97a92` uses direct array access to shared->bank_locks[bankno].lock which can be made a little bit more legible by using the SimpleLruGetBankLock helper function. Nothing terribly serious, but let's add some clarity. Discussion: https://postgr.es/m/202403041517.3a35jw53os65@alvherre.pgsql	2024-03-04 17:37:47 +01:00
Peter Eisentraut	dbbca2cf29	Remove unused #include's from backend .c files as determined by include-what-you-use (IWYU) While IWYU also suggests to add a bunch of #include's (which is its main purpose), this patch does not do that. In some cases, a more specific #include replaces another less specific one. Some manual adjustments of the automatic result: - IWYU currently doesn't know about includes that provide global variable declarations (like -Wmissing-variable-declarations), so those includes are being kept manually. - All includes for port(ability) headers are being kept for now, to play it safe. - No changes of catalog/pg_foo.h to catalog/pg_foo_d.h, to keep the patch from exploding in size. Note that this patch touches just *.c files, so nothing declared in header files changes in hidden ways. As a small example, in src/backend/access/transam/rmgr.c, some IWYU pragma annotations are added to handle a special case there. Discussion: https://www.postgresql.org/message-id/flat/af837490-6b2f-46df-ba05-37ea6a6653fc%40eisentraut.org	2024-03-04 12:02:20 +01:00
Heikki Linnakangas	24eebc65c2	Remove unused 'countincludesself' argument to pq_sendcountedtext() It has been unused since we removed support for protocol version 2.	2024-03-04 12:56:05 +02:00
Heikki Linnakangas	0dd094c4a0	Remove unused ParallelWorkerInfo.pid field The pid was originally used in error context of messages propagated from parallel workers, but commit `292794f82b` removed that. If the need arises in the future, you can also get the pid with "shm_mq_get_sender(pcxt->worker[i].error_mqh)->pid".	2024-03-04 12:56:02 +02:00
Heikki Linnakangas	393b5599e5	Use MyBackendType in more places to check what process this is Remove IsBackgroundWorker, IsAutoVacuumLauncherProcess(), IsAutoVacuumWorkerProcess(), and IsLogicalSlotSyncWorker() in favor of new AmProcess() macros that use MyBackendType. For consistency with the existing AmProcess() macros. Reviewed-by: Andres Freund Discussion: https://www.postgresql.org/message-id/f3ecd4cb-85ee-4e54-8278-5fabfb3a4ed0@iki.fi	2024-03-04 10:25:12 +02:00
Michael Paquier	6782709df8	Add regression test for restart points during promotion This test serves as a way to demonstrate how to use the features introduced in `37b369dc67`, while providing coverage for `7863ee4def` that caused the startup process to throw "PANIC: could not locate a valid checkpoint record" when starting recovery. The test checks that a node is able to properly restart following a crash when a restart point was finishing across a promotion, with an injection point added in the middle of CreateRestartPoint() to stop the restartpoint in flight. Note that this test fails when `7863ee4def` is reverted. Kyotaro Horiguchi is the original author of this test, that has been originally posted on the thread where `7863ee4def` was discussed. I have just upgraded and polished it to rely on injection points, making it much cheaper to reproduce the failure. This test requires injection points to be enabled in the builds, hence meson and ./configure need an update to pass this knowledge down to the test. The name of the new injection point follows the same naming convention as `6a1ea02c49`. The Makefile's EXTRA_INSTALL of recovery TAP tests is updated to include modules/injection_points. Author: Kyotaro Horiguchi, Michael Paquier Reviewed-by: Andrey Borodin, Bertrand Drouvot Discussion: https://postgr.es/m/ZdLuxBk5hGpol91B@paquier.xyz	2024-03-04 09:49:03 +09:00
Heikki Linnakangas	024c521117	Replace BackendIds with 0-based ProcNumbers Now that BackendId was just another index into the proc array, it was redundant with the 0-based proc numbers used in other places. Replace all usage of backend IDs with proc numbers. The only place where the term "backend id" remains is in a few pgstat functions that expose backend IDs at the SQL level. Those IDs are now in fact 0-based ProcNumbers too, but the documentation still calls them "backend ids". That term still seems appropriate to describe what the numbers are, so I let it be. One user-visible effect is that pg_temp_0 is now a valid temp schema name, for backend with ProcNumber 0. Reviewed-by: Andres Freund Discussion: https://www.postgresql.org/message-id/8171f1aa-496f-46a6-afc3-c46fe7a9b407@iki.fi	2024-03-03 19:38:22 +02:00
Heikki Linnakangas	ab355e3a88	Redefine backend ID to be an index into the proc array Previously, backend ID was an index into the ProcState array, in the shared cache invalidation manager (sinvaladt.c). The entry in the ProcState array was reserved at backend startup by scanning the array for a free entry, and that was also when the backend got its backend ID. Things become slightly simpler if we redefine backend ID to be the index into the PGPROC array, and directly use it also as an index to the ProcState array. This uses a little more memory, as we reserve a few extra slots in the ProcState array for aux processes that don't need them, but the simplicity is worth it. Aux processes now also have a backend ID. This simplifies the reservation of BackendStatusArray and ProcSignal slots. You can now convert a backend ID into an index into the PGPROC array simply by subtracting 1. We still use 0-based "pgprocnos" in various places, for indexes into the PGPROC array, but the only difference now is that backend IDs start at 1 while pgprocnos start at 0. (The next commmit will get rid of the term "backend ID" altogether and make everything 0-based.) There is still a 'backendId' field in PGPROC, now part of 'vxid' which encapsulates the backend ID and local transaction ID together. It's needed for prepared xacts. For regular backends, the backendId is always equal to pgprocno + 1, but for prepared xact PGPROC entries, it's the ID of the original backend that processed the transaction. Reviewed-by: Andres Freund, Reid Thompson Discussion: https://www.postgresql.org/message-id/8171f1aa-496f-46a6-afc3-c46fe7a9b407@iki.fi	2024-03-03 19:37:28 +02:00
Nathan Bossart	963d3072af	Convert unloggedLSN to an atomic variable. Currently, this variable is an XLogRecPtr protected by a spinlock. By converting it to an atomic variable, we can remove the spinlock, which saves a small amount of shared memory space. Since this code is not performance-critical, we use atomic operations with full barrier semantics to make it easy to reason about correctness. Author: John Morris Reviewed-by: Michael Paquier, Robert Haas, Andres Freund, Stephen Frost, Bharath Rupireddy Discussion: https://postgr.es/m/BYAPR13MB26772534335255E50318C574A0409%40BYAPR13MB2677.namprd13.prod.outlook.com Discussion: https://postgr.es/m/MN2PR13MB2688FD8B757316CB5C54C8A2A0DDA%40MN2PR13MB2688.namprd13.prod.outlook.com	2024-02-29 14:34:10 -06:00
Amit Kapila	b3f6b14cf4	Fixups for commit `93db6cbda0`. Ensure to set always-secure search path for both local and remote connections during slot synchronization, so that malicious users can't redirect user code (e.g. operators). In the passing, improve the name of define, remove spurious return statement, and a minor change in one of the comments. Author: Bertrand Drouvot and Shveta Malik Reviewed-by: Amit Kapila, Peter Smith Discussion: https://postgr.es/m/514f6f2f-6833-4539-39f1-96cd1e011f23@enterprisedb.com Discussion: https://postgr.es/m/ZdcejBDCr+wlVGnO@ip-10-97-1-34.eu-west-3.compute.internal Discussion: https://postgr.es/m/CAJpy0uBNP=nrkNJkJSfF=jSocEh8vU2Owa8Rtpi=63fG=SvfVQ@mail.gmail.com	2024-02-29 09:45:20 +05:30
Alvaro Herrera	53c2a97a92	Improve performance of subsystems on top of SLRU More precisely, what we do here is make the SLRU cache sizes configurable with new GUCs, so that sites with high concurrency and big ranges of transactions in flight (resp. multixacts/subtransactions) can benefit from bigger caches. In order for this to work with good performance, two additional changes are made: 1. the cache is divided in "banks" (to borrow terminology from CPU caches), and algorithms such as eviction buffer search only affect one specific bank. This forestalls the problem that linear searching for a specific buffer across the whole cache takes too long: we only have to search the specific bank, whose size is small. This work is authored by Andrey Borodin. 2. Change the locking regime for the SLRU banks, so that each bank uses a separate LWLock. This allows for increased scalability. This work is authored by Dilip Kumar. (A part of this was previously committed as d172b717c6f4.) Special care is taken so that the algorithms that can potentially traverse more than one bank release one bank's lock before acquiring the next. This should happen rarely, but particularly clog.c's group commit feature needed code adjustment to cope with this. I (Álvaro) also added lots of comments to make sure the design is sound. The new GUCs match the names introduced by `bcdfa5f2e2` in the pg_stat_slru view. The default values for these parameters are similar to the previous sizes of each SLRU. commit_ts, clog and subtrans accept value 0, which means to adjust by dividing shared_buffers by 512 (so 2MB for every 1GB of shared_buffers), with a cap of 8MB. (A new slru.c function SimpleLruAutotuneBuffers() was added to support this.) The cap was previously 1MB for clog, so for sites with more than 512MB of shared memory the total memory used increases, which is likely a good tradeoff. However, other SLRUs (notably multixact ones) retain smaller sizes and don't support a configured value of 0. These values based on shared_buffers may need to be revisited, but that's an easy change. There was some resistance to adding these new GUCs: it would be better to adjust to memory pressure automatically somehow, for example by stealing memory from shared_buffers (where the caches can grow and shrink naturally). However, doing that seems to be a much larger project and one which has made virtually no progress in several years, and because this is such a pain point for so many users, here we take the pragmatic approach. Author: Andrey Borodin <x4mmm@yandex-team.ru> Author: Dilip Kumar <dilipbalaut@gmail.com> Reviewed-by: Amul Sul, Gilles Darold, Anastasia Lubennikova, Ivan Lazarev, Robert Haas, Thomas Munro, Tomas Vondra, Yura Sokolov, Васильев Дмитрий (Dmitry Vasiliev). Discussion: https://postgr.es/m/2BEC2B3F-9B61-4C1D-9FB5-5FAB0F05EF86@yandex-team.ru Discussion: https://postgr.es/m/CAFiTN-vzDvNz=ExGXz6gdyjtzGixKSqs0mKHMmaQ8sOSEFZ33A@mail.gmail.com	2024-02-28 17:05:31 +01:00
Alvaro Herrera	bcdfa5f2e2	Rename SLRU elements in view pg_stat_slru The new names are intended to match those in an upcoming patch that adds a few GUCs to configure the SLRU buffer sizes. Backwards compatibility concern: this changes the accepted names for function pg_stat_slru_rest(). Since this function recognizes "any other string" as a request to reset the entry for "other", this means that calling it with the old names would silently reset "other" instead of doing nothing or throwing an error. Reviewed-by: Andrey M. Borodin <x4mmm@yandex-team.ru> Discussion: https://postgr.es/m/202402261616.dlriae7b6emv@alvherre.pgsql	2024-02-28 09:39:52 +01:00
Michael Paquier	17a3f79f81	Fix comment thinko in sequence.c One comment mentioned indexes, but the relation opened should be sequences. Reported-by: Matthias van de Meent Discussion: https://postgr.es/m/CAEze2WiMGNG9XK3NSUen-5BARhCnP=u=FXnf8pvpL2qDKeOsZg@mail.gmail.com	2024-02-27 08:19:39 +09:00
Alvaro Herrera	5f79cb7629	slru.c: Reduce scope of variables in 'for' blocks Pretty boring.	2024-02-26 16:49:50 +01:00
Michael Paquier	449e798c77	Introduce sequence_*() access functions Similarly to tables and indexes, these functions are able to open relations with a sequence relkind, which is useful to make a distinction with the other relation kinds. Previously, commands/sequence.c used a mix of table_{close,open}() and relation_{close,open}() routines when manipulating sequence relations, so this clarifies the code. A direct effect of this change is to align the error messages produced when attempting DDLs for sequences on relations with an unexpected relkind, like a table or an index with ALTER SEQUENCE, providing an extra error detail about the relkind of the relation used in the DDL query. Author: Michael Paquier Reviewed-by: Tomas Vondra Discussion: https://postgr.es/m/ZWlohtKAs0uVVpZ3@paquier.xyz	2024-02-26 16:04:59 +09:00
Tom Lane	f5a465f1a0	Promote assertion about !ReindexIsProcessingIndex to runtime error. When this assertion was installed (in commit `d2f60a3ab`), I thought it was only for catching server logic errors that caused accesses to catalogs that were undergoing index rebuilds. However, it will also fire in case of a user-defined index expression that attempts to access its own table. We occasionally see reports of people trying to do that, and typically getting unintelligible low-level errors as a result. We can provide a more on-point message by making this a regular runtime check. While at it, adjust the similar error check in systable_beginscan_ordered to use the same message text. That one is (probably) not reachable without a coding bug, but we might as well use a translatable message if we have one. Per bug #18363 from Alexander Lakhin. Back-patch to all supported branches. Discussion: https://postgr.es/m/18363-e3598a5a572d0699@postgresql.org	2024-02-25 16:15:07 -05:00
Heikki Linnakangas	8af2565248	Introduce a new smgr bulk loading facility. The new facility makes it easier to optimize bulk loading, as the logic for buffering, WAL-logging, and syncing the relation only needs to be implemented once. It's also less error-prone: We have had a number of bugs in how a relation is fsync'd - or not - at the end of a bulk loading operation. By centralizing that logic to one place, we only need to write it correctly once. The new facility is faster for small relations: Instead of of calling smgrimmedsync(), we register the fsync to happen at next checkpoint, which avoids the fsync latency. That can make a big difference if you are e.g. restoring a schema-only dump with lots of relations. It is also slightly more efficient with large relations, as the WAL logging is performed multiple pages at a time. That avoids some WAL header overhead. The sorted GiST index build did that already, this moves the buffering to the new facility. The changes to pageinspect GiST test needs an explanation: Before this patch, the sorted GiST index build set the LSN on every page to the special GistBuildLSN value, not the LSN of the WAL record, even though they were WAL-logged. There was no particular need for it, it just happened naturally when we wrote out the pages before WAL-logging them. Now we WAL-log the pages first, like in B-tree build, so the pages are stamped with the record's real LSN. When the build is not WAL-logged, we still use GistBuildLSN. To make the test output predictable, use an unlogged index. Reviewed-by: Andres Freund Discussion: https://www.postgresql.org/message-id/30e8f366-58b3-b239-c521-422122dd5150%40iki.fi	2024-02-23 16:10:51 +02:00
Amit Kapila	93db6cbda0	Add a new slot sync worker to synchronize logical slots. By enabling slot synchronization, all the failover logical replication slots on the primary (assuming configurations are appropriate) are automatically created on the physical standbys and are synced periodically. The slot sync worker on the standby server pings the primary server at regular intervals to get the necessary failover logical slots information and create/update the slots locally. The slots that no longer require synchronization are automatically dropped by the worker. The nap time of the worker is tuned according to the activity on the primary. The slot sync worker waits for some time before the next synchronization, with the duration varying based on whether any slots were updated during the last cycle. A new parameter sync_replication_slots enables or disables this new process. On promotion, the slot sync worker is shut down by the startup process to drop any temporary slots acquired by the slot sync worker and to prevent the worker from trying to fetch the failover slots. A functionality to allow logical walsenders to wait for the physical will be done in a subsequent commit. Author: Shveta Malik, Hou Zhijie based on design inputs by Masahiko Sawada and Amit Kapila Reviewed-by: Masahiko Sawada, Bertrand Drouvot, Peter Smith, Dilip Kumar, Ajin Cherian, Nisha Moond, Kuroda Hayato, Amit Kapila Discussion: https://postgr.es/m/514f6f2f-6833-4539-39f1-96cd1e011f23@enterprisedb.com	2024-02-22 15:25:15 +05:30
Heikki Linnakangas	28f3915b73	Remove superfluous 'pgprocno' field from PGPROC It was always just the index of the PGPROC entry from the beginning of the proc array. Introduce a macro to compute it from the pointer instead. Reviewed-by: Andres Freund Discussion: https://www.postgresql.org/message-id/8171f1aa-496f-46a6-afc3-c46fe7a9b407@iki.fi	2024-02-22 01:21:34 +02:00

1 2 3 4 5 ...

4860 Commits