postgresql

Commit Graph

Author	SHA1	Message	Date
Michael Paquier	8d9978a717	Apply quotes more consistently to GUC names in logs Quotes are applied to GUCs in a very inconsistent way across the code base, with a mix of double quotes or no quotes used. This commit removes double quotes around all the GUC names that are obviously referred to as parameters with non-English words (use of underscore, mixed case, etc). This is the result of a discussion with Álvaro Herrera, Nathan Bossart, Laurenz Albe, Peter Eisentraut, Tom Lane and Daniel Gustafsson. Author: Peter Smith Discussion: https://postgr.es/m/CAHut+Pv-kSN8SkxSdoHano_wPubqcg5789ejhCDZAcLFceBR-w@mail.gmail.com	2023-11-30 14:11:45 +09:00
Heikki Linnakangas	60f227316c	Fix assertions with RI triggers in heap_update and heap_delete. If the tuple being updated is not visible to the crosscheck snapshot, we return TM_Updated but the assertions would not hold in that case. Move them to before the cross-check. Fixes bug #17893. Backpatch to all supported versions. Author: Alexander Lakhin Backpatch-through: 12 Discussion: https://www.postgresql.org/message-id/17893-35847009eec517b5%40postgresql.org	2023-11-28 12:00:14 +02:00
Andres Freund	b2e237afdd	Release lock on heap buffer before vacuuming FSM When there are no indexes on a table, we vacuum each heap block after pruning it and then update the freespace map. Periodically, we also vacuum the freespace map. This was done while unnecessarily holding a lock on the heap page. Release the lock before calling FreeSpaceMapVacuumRange() and, while we're at it, ensure the range includes the heap block we just vacuumed. There are no known deadlocks or other similar issues, therefore don't backpatch. It's certainly not good to do all this work under a lock, but it's not frequently reached, making it not worth the risk of backpatching. Author: Melanie Plageman <melanieplageman@gmail.com> Reviewed-by: Andres Freund <andres@anarazel.de> Discussion: https://postgr.es/m/CAAKRu_YiL%3D44GvGnt1dpYouDSSoV7wzxVoXs8m3p311rp-TVQQ%40mail.gmail.com	2023-11-17 12:46:55 -08:00
David Rowley	10d34fefc2	Ensure we use the correct spelling of "ensure" We seem to have accidentally used "insure" in a few places. Correct that. Author: Peter Smith Discussion: https://postgr.es/m/CAHut+Pv0biqrhA3pMhu40aDsj343mTsD75khKnHsLqR8P04f=Q@mail.gmail.com Backpatch-through: 12, oldest supported version	2023-11-10 00:15:54 +13:00
Peter Eisentraut	611806cd72	Add trailing commas to enum definitions Since C99, there can be a trailing comma after the last value in an enum definition. A lot of new code has been introducing this style on the fly. Some new patches are now taking an inconsistent approach to this. Some add the last comma on the fly if they add a new last value, some are trying to preserve the existing style in each place, some are even dropping the last comma if there was one. We could nudge this all in a consistent direction if we just add the trailing commas everywhere once. I omitted a few places where there was a fixed "last" value that will always stay last. I also skipped the header files of libpq and ecpg, in case people want to use those with older compilers. There were also a small number of cases where the enum type wasn't used anywhere (but the enum values were), which ended up confusing pgindent a bit, so I left those alone. Discussion: https://www.postgresql.org/message-id/flat/386f8c45-c8ac-4681-8add-e3b0852c1620%40eisentraut.org	2023-10-26 09:20:54 +02:00
Thomas Munro	01529c7040	Fix comment from commit `22655aa231`. Per automated complaint from BF animal koel this needed to be re-indented, but there was also a typo. Back-patch to 16.	2023-10-16 13:32:41 +13:00
Andres Freund	22655aa231	Fix bulk table extension when copying into multiple partitions When COPYing into a partitioned table that does not permit the use of table_multi_insert(), we could error out with ERROR: could not read block NN in file "base/...": read only 0 of 8192 bytes because BulkInsertState->next_free was not reset between partitions. This problem occurred only when not able to use table_multi_insert(), as a dedicated BulkInsertState for each partition is used in that case. The bug was introduced in `00d1e02be2`, but it was hard to hit at that point, as commonly bulk relation extension is not used when not using table_multi_insert(). It became more likely after `82a4edabd2`, which expanded the use of bulk extension. To fix the bug, reset the bulk relation extension state in BulkInsertState in ReleaseBulkInsertStatePin(). That was added (in `b1ecb9b3fc`) to tackle a very similar issue. Obviously the name is not quite correct, but there might be external callers, and bulk insert state needs to be reset in precisely the situations that ReleaseBulkInsertStatePin() already needed to be called. Medium term the better fix likely is to disallow reusing BulkInsertState across relations. Add a test that, without the fix, reproduces #18130 in most configurations. The test also catches the problem fixed in `b1ecb9b3fc` when run with small shared_buffers. Reported-by: Ivan Kolombet <enderstd@gmail.com> Analyzed-by: Tom Lane <tgl@sss.pgh.pa.us> Analyzed-by: Andres Freund <andres@anarazel.de> Bug: #18130 Discussion: https://postgr.es/m/18130-7a86a7356a75209d%40postgresql.org Discussion: https://postgr.es/m/257696.1695670946%40sss.pgh.pa.us Backpatch: 16-	2023-10-13 19:16:44 -07:00
Robert Haas	1ccc1e05ae	Remove retry loop in heap_page_prune(). The retry loop is needed because heap_page_prune() calls HeapTupleSatisfiesVacuum() and then lazy_scan_prune() does the same thing again, and they might get different answers due to concurrent clog updates. But this patch makes heap_page_prune() return the HeapTupleSatisfiesVacuum() results that it computed back to the caller, which allows lazy_scan_prune() to avoid needing to recompute those values in the first place. That's nice both because it eliminates the need for a retry loop and also because it's cheaper. Melanie Plageman, reviewed by David Geier, Andres Freund, and me. Discussion: https://postgr.es/m/CAAKRu_br124qsGJieuYA0nGjywEukhK1dKBfRdby_4yY3E9SXA%40mail.gmail.com	2023-10-02 11:40:07 -04:00
Robert Haas	4e9fc3a976	Return data from heap_page_prune via a struct. Previously, one of the values in the struct was returned as the return value, and another was returned via an output parameter. In preparation for returning more stuff, consolidate both values into a struct returned via an output parameter. Melanie Plageman, reviewed by Andres Freund and by me. Discussion: https://postgr.es/m/CAAKRu_br124qsGJieuYA0nGjywEukhK1dKBfRdby_4yY3E9SXA%40mail.gmail.com	2023-09-28 10:36:34 -04:00
Heikki Linnakangas	18724af9e8	Remove unnecessary smgrimmedsync() when creating unlogged table. This became safe after commit `4b4798e138`. The smgrcreate() call will now register the segment for syncing at the next checkpoint, so we don't need to sync it here. If a checkpoint happens before the creation is WAL-logged, the records will be replayed when starting recovery from the checkpoint. If a checkpoint happens after the WAL logging, the checkpoint will fsync() it. In passing, clarify a comment in smgrDoPendingSyncs(). Discussion: https://www.postgresql.org/message-id/6e5bbc08-cdfc-b2b3-9e23-1a914b9850a9%40iki.fi Reviewed-by: Robert Haas	2023-09-15 17:29:37 +03:00
Thomas Munro	9f0602539d	Remove some more "snapshot too old" vestiges. Commit `f691f5b8` removed the logic, but left behind some now-useless Snapshot arguments to various AM-internal functions, and missed a couple of comments. Reported-by: Peter Geoghegan <pg@bowt.ie> Discussion: https://postgr.es/m/CAH2-Wznj9qSNXZ1P1uWTUD_FeaTezbUazb416EPwi4Qr_jR_6A%40mail.gmail.com	2023-09-08 17:12:12 +12:00
Thomas Munro	f691f5b80a	Remove the "snapshot too old" feature. Remove the old_snapshot_threshold setting and mechanism for producing the error "snapshot too old", originally added by commit `848ef42b`. Unfortunately it had a number of known problems in terms of correctness and performance, mostly reported by Andres in the course of his work on snapshot scalability. We agreed to remove it, after a long period without an active plan to fix it. This is certainly a desirable feature, and someone might propose a new or improved implementation in the future. Reported-by: Andres Freund <andres@anarazel.de> Discussion: https://postgr.es/m/CACG%3DezYV%2BEvO135fLRdVn-ZusfVsTY6cH1OZqWtezuEYH6ciQA%40mail.gmail.com Discussion: https://postgr.es/m/20200401064008.qob7bfnnbu4w5cw4%40alap3.anarazel.de Discussion: https://postgr.es/m/CA%2BTgmoY%3Daqf0zjTD%2B3dUWYkgMiNDegDLFjo%2B6ze%3DWtpik%2B3XqA%40mail.gmail.com	2023-09-05 19:53:43 +12:00
Heikki Linnakangas	e8d74ad625	Report syncscan position at end of scan. The comment in heapgettup_advance_block() says that it reports the scan position before checking for end of scan, but that didn't match the code. The code was refactored in commit `7ae0ab0ad9`, which inadvertently changed the order of the check and reporting. Change it back. This caused a few regression test failures with a small shared_buffers setting like 10 MB. The 'portals' and 'cluster' tests perform seqscans that are large enough that sync seqscans kick in. When the sync scan position is not updated at end of scan, the next seq scan doesn't start at the beginning of the table, and the test queries are sensitive to that. Reviewed-by: Melanie Plageman, David Rowley Discussion: https://www.postgresql.org/message-id/6f991389-ae22-d844-a9d8-9aceb7c01a9a@iki.fi Backpatch-through: 16	2023-08-31 13:02:15 +03:00
Thomas Munro	7114791158	ExtendBufferedWhat -> BufferManagerRelation. Commit `31966b15` invented a way for functions dealing with relation extension to accept a Relation in online code and an SMgrRelation in recovery code. It seems highly likely that future bufmgr.c interfaces will face the same problem, and need to do something similar. Generalize the names so that each interface doesn't have to re-invent the wheel. Back-patch to 16. Since extension AM authors might start using the constructor macros once 16 ships, we agreed to do the rename in 16 rather than waiting for 17. Reviewed-by: Peter Geoghegan <pg@bowt.ie> Reviewed-by: Andres Freund <andres@anarazel.de> Discussion: https://postgr.es/m/CA%2BhUKG%2B6tLD2BhpRWycEoti6LVLyQq457UL4ticP5xd8LqHySA%40mail.gmail.com	2023-08-23 12:31:23 +12:00
Andres Freund	82a4edabd2	hio: Take number of prior relation extensions into account The new relation extension logic, introduced in `00d1e02be2`, could lead to slowdowns in some scenarios. E.g., when loading narrow rows into a table using COPY, the caller of RelationGetBufferForTuple() will only request a small number of pages. Without concurrency, we just extended using pwritev() in that case. However, if there is some concurrency, we switched between extending by a small number of pages and a larger number of pages, depending on the number of waiters for the relation extension logic. However, some filesystems, XFS in particular, do not perform well when switching between extending files using fallocate() and pwritev(). To avoid that issue, remember the number of prior relation extensions in BulkInsertState and extend more aggressively if there were prior relation extensions. That not just avoids the aforementioned slowdown, but also leads to noticeable performance gains in other situations, primarily due to extending more aggressively when there is no concurrency. I should have done it this way from the get go. Reported-by: Masahiko Sawada <sawada.mshk@gmail.com> Author: Andres Freund <andres@anarazel.de> Discussion: https://postgr.es/m/CAD21AoDvDmUQeJtZrau1ovnT_smN940=Kp6mszNGK3bq9yRN6g@mail.gmail.com Backpatch: 16-, where the new relation extension code was added	2023-08-14 11:33:09 -07:00
Masahiko Sawada	46ebdfe164	Report index vacuum progress. This commit adds two columns: indexes_total and indexes_processed, to pg_stat_progress_vacuum system view to show the index vacuum progress. These numbers are reported in the "vacuuming indexes" and "cleaning up indexes" phases. This uses the new parallel message type for progress reporting added by be06506e7. Bump catversion because this changes the definition of pg_stat_progress_vacuum. Author: Sami Imseih Reviewed by: Masahiko Sawada, Michael Paquier, Nathan Bossart, Andres Freund Discussion: https://www.postgresql.org/message-id/flat/5478DFCD-2333-401A-B2F0-0D186AB09228@amazon.com	2023-07-11 12:34:01 +09:00
Tomas Vondra	ec99d6e9c8	Document relaxed HOT for summarizing indexes Commit `19d8e2308b` allowed a weaker check for HOT with summarizing indexes, but it did not update README.HOT. So do that now. Patch by Matthias van de Meent, minor changes by me. Backpatch to 16, where the optimization was introduced. Author: Matthias van de Meent Reviewed-by: Tomas Vondra Backpatch-through: 16 Discussion: https://postgr.es/m/CAEze2WiEOm8V+c9kUeYp2BPhbEc5s473fUf51xNeqvSFGv44Ew@mail.gmail.com	2023-07-07 19:04:53 +02:00
Thomas Munro	bcc93a389c	Fix race in SSI interaction with bitmap heap scan. When performing a bitmap heap scan, we don't want to miss concurrent writes that occurred after we observed the heap's rs_nblocks, but before we took predicate locks on index pages. Therefore, we can't skip fetching any heap tuples that are referenced by the index, because we need to test them all with CheckForSerializableConflictOut(). The old optimization that would ignore any references to blocks >= rs_nblocks gets in the way of that requirement, because it means that concurrent writes in that window are ignored. Removing that optimization shouldn't affect correctness at any isolation level, because any new tuples shouldn't be visible to an MVCC snapshot. There also shouldn't be any error-causing references to heap blocks past the end, because we should have held at least an AccessShareLock on the table before the index scan. It can't get smaller while our transaction is running. For now, though, we'll keep the optimization at lower levels to avoid making unnecessary changes in a bug fix. Back-patch to all supported releases. In release 11, the code is in a different place but not fundamentally different. Fixes one aspect of bug #17949. Reported-by: Artem Anisimov <artem.anisimov.255@gmail.com> Reviewed-by: Dmitry Dolgov <9erthalion6@gmail.com> Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi> Discussion: https://postgr.es/m/17949-a0f17035294a55e2%40postgresql.org	2023-07-04 09:07:31 +12:00
Peter Geoghegan	d088ba5a5a	nbtree: Allocate new pages in separate function. Split nbtree's _bt_getbuf function in two: code that read locks or write locks existing pages remains in _bt_getbuf, while code that deals with allocating new pages is moved to a new, dedicated function called _bt_allocbuf. This simplifies most _bt_getbuf callers, since it is no longer necessary for them to pass a heaprel argument. Many of the changes to nbtree from commit `61b313e4` can be reverted. This minimizes the divergence between HEAD/PostgreSQL 16 and earlier release branches. _bt_allocbuf replaces the previous nbtree idiom of passing P_NEW to _bt_getbuf. There are only 3 affected call sites, all of which continue to pass a heaprel for recovery conflict purposes. Note that nbtree's use of P_NEW was superficial; nbtree never actually relied on the P_NEW code paths in bufmgr.c, so this change is strictly mechanical. GiST already took the same approach; it has a dedicated function for allocating new pages called gistNewBuffer(). That factor allowed commit `61b313e4` to make much more targeted changes to GiST. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Heikki Linnakangas <hlinnaka@iki.fi> Discussion: https://postgr.es/m/CAH2-Wz=8Z9qY58bjm_7TAHgtW6RzZ5Ke62q5emdCEy9BAzwhmg@mail.gmail.com	2023-06-10 14:08:25 -07:00
Tom Lane	0245f8db36	Pre-beta mechanical code beautification. Run pgindent, pgperltidy, and reformat-dat-files. This set of diffs is a bit larger than typical. We've updated to pg_bsd_indent 2.1.2, which properly indents variable declarations that have multi-line initialization expressions (the continuation lines are now indented one tab stop). We've also updated to perltidy version 20230309 and changed some of its settings, which reduces its desire to add whitespace to lines to make assignments etc. line up. Going forward, that should make for fewer random-seeming changes to existing code. Discussion: https://postgr.es/m/20230428092545.qfb3y5wcu4cm75ur@alvherre.pgsql	2023-05-19 17:24:48 -04:00
Michael Paquier	8961cb9a03	Fix typos in comments The changes done in this commit impact comments with no direct user-visible changes, with fixes for incorrect function, variable or structure names. Author: Alexander Lakhin Discussion: https://postgr.es/m/e8c38840-596a-83d6-bd8d-cebc51111572@gmail.com	2023-05-02 12:23:08 +09:00
Daniel Gustafsson	4a6603cd46	Fix assertion failure in heap_vacuum_rel Commit `7d71d3dd08` changed resetting the VacuumFailsafeActive flag to an assertion since the flag is reset before starting vacuuming a relation. This however failed to take recursive calls of vacuum_rel() and vacuum of TOAST tables into consideration. Fix by reverting back to resetting the flag. Author: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-by: Melanie Plageman <melanieplageman@gmail.com> Reported-by: John Naylor <john.naylor@enterprisedb.com> Discussion: https://postgr.es/m/CAFBsxsFz=GqaG5Ens5aNgVYoV2Y+pfMUijX0ku+CCkWfALwiqg@mail.gmail.com	2023-04-28 10:30:05 +02:00
Peter Geoghegan	e944063294	Fix xl_heap_lock WAL record field's data type. Make xl_heap_lock's infobits_set field of type uint8, not int8. Using int8 isn't appropriate given that the field just holds status bits. This fixes an oversight in commit `0ac5ad5134`. In passing rename the nearby TransactionId field to "xmax" to make things consistent with related records, such as xl_heap_lock_updated. Deliberately avoid a bump in XLOG_PAGE_MAGIC. No backpatch, either. Author: Peter Geoghegan <pg@bowt.ie> Discussion: https://postgr.es/m/CAH2-WzkCd3kOS8b7Rfxw7Mh1_6jvX=Nzo-CWR1VBTiOtVZkWHA@mail.gmail.com	2023-04-11 14:07:54 -07:00
Andres Freund	26669757b6	Handle logical slot conflicts on standby During WAL replay on the standby, when a conflict with a logical slot is identified, invalidate such slots. There are two sources of conflicts: 1) Using the information added in `6af1793954`, logical slots are invalidated if required rows are removed 2) wal_level on the primary server is reduced to below logical Uses the infrastructure introduced in the prior commit. FIXME: add commit reference. Change InvalidatePossiblyObsoleteSlot() to use a recovery conflict to interrupt use of a slot, if called in the startup process. The new recovery conflict is added to pg_stat_database_conflicts, as confl_active_logicalslot. See `6af1793954` for an overall design of logical decoding on a standby. Bumps catversion for the addition of the pg_stat_database_conflicts column. Bumps PGSTAT_FILE_FORMAT_ID for the same reason. Author: "Drouvot, Bertrand" <bertranddrouvot.pg@gmail.com> Author: Andres Freund <andres@anarazel.de> Author: Amit Khandekar <amitdkhan.pg@gmail.com> (in an older version) Reviewed-by: "Drouvot, Bertrand" <bertranddrouvot.pg@gmail.com> Reviewed-by: Andres Freund <andres@anarazel.de> Reviewed-by: Robert Haas <robertmhaas@gmail.com> Reviewed-by: Fabrízio de Royes Mello <fabriziomello@gmail.com> Reviewed-by: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> Reviewed-by: Amit Kapila <amit.kapila16@gmail.com> Reviewed-by: Alvaro Herrera <alvherre@alvh.no-ip.org> Discussion: https://postgr.es/m/20230407075009.igg7be27ha2htkbt@awork3.anarazel.de	2023-04-08 00:05:44 -07:00
Thomas Munro	faeedbcefd	Introduce PG_IO_ALIGN_SIZE and align all I/O buffers. In order to have the option to use O_DIRECT/FILE_FLAG_NO_BUFFERING in a later commit, we need the addresses of user space buffers to be well aligned. The exact requirements vary by OS and file system (typically sectors and/or memory pages). The address alignment size is set to 4096, which is enough for currently known systems: it matches modern sectors and common memory page size. There is no standard governing O_DIRECT's requirements so we might eventually have to reconsider this with more information from the field or future systems. Aligning I/O buffers on memory pages is also known to improve regular buffered I/O performance. Three classes of I/O buffers for regular data pages are adjusted: (1) Heap buffers are now allocated with the new palloc_aligned() or MemoryContextAllocAligned() functions introduced by commit `439f6175`. (2) Stack buffers now use a new struct PGIOAlignedBlock to respect PG_IO_ALIGN_SIZE, if possible with this compiler. (3) The buffer pool is also aligned in shared memory. WAL buffers were already aligned on XLOG_BLCKSZ. It's possible for XLOG_BLCKSZ to be configured smaller than PG_IO_ALIGN_SIZE and thus for O_DIRECT WAL writes to fail to be well aligned, but that's a pre-existing condition and will be addressed by a later commit. BufFiles are not yet addressed (there's no current plan to use O_DIRECT for those, but they could potentially get some incidental speedup even in plain buffered I/O operations through better alignment). If we can't align stack objects suitably using the compiler extensions we know about, we disable the use of O_DIRECT by setting PG_O_DIRECT to 0. This avoids the need to consider systems that have O_DIRECT but can't align stack objects the way we want; such systems could in theory be supported with more work but we don't currently know of any such machines, so it's easier to pretend there is no O_DIRECT support instead. That's an existing and tested class of system. Add assertions that all buffers passed into smgrread(), smgrwrite() and smgrextend() are correctly aligned, unless PG_O_DIRECT is 0 (= stack alignment tricks may be unavailable) or the block size has been set too small to allow arrays of buffers to be all aligned. Author: Thomas Munro <thomas.munro@gmail.com> Author: Andres Freund <andres@anarazel.de> Reviewed-by: Justin Pryzby <pryzby@telsasoft.com> Discussion: https://postgr.es/m/CA+hUKGK1X532hYqJ_MzFWt0n1zt8trz980D79WbjwnT-yYLZpg@mail.gmail.com	2023-04-08 16:34:50 +12:00
Andres Freund	00d1e02be2	hio: Use ExtendBufferedRelBy() to extend tables more efficiently While we already had some form of bulk extension for relations, it was fairly limited. It only amortized the cost of acquiring the extension lock, the relation itself was still extended one-by-one. Bulk extension was also solely triggered by contention, not by the amount of data inserted. To address this, use ExtendBufferedRelBy(), introduced in `31966b151e`, to extend the relation. We try to extend the relation by multiple blocks in two situations: 1) The caller tells RelationGetBufferForTuple() that it will need multiple pages. For now that's only used by heap_multi_insert(), see commit FIXME. 2) If there is contention on the extension lock, use the number of waiters for the lock as a multiplier for the number of blocks to extend by. This is similar to what we already did. Previously we additionally multiplied the numbers of waiters by 20, but with the new relation extension infrastructure I could not see a benefit in doing so. Using the freespacemap to provide empty pages can cause significant contention, and adds measurable overhead, even if there is no contention. To reduce that, remember the blocks the relation was extended by in the BulkInsertState, in the extending backend. In case 1) from above, the blocks the extending backend needs are not entered into the FSM, as we know that we will need those blocks. One complication with using the FSM to record empty pages, is that we need to insert blocks into the FSM, when we already hold a buffer content lock. To avoid doing IO while holding a content lock, release the content lock before recording free space. Currently that opens a small window in which another backend could fill the block, if a concurrent VACUUM records the free space. If that happens, we retry, similar to the already existing case when otherBuffer is provided. In the future it might be worth closing the race by preventing VACUUM from recording the space in newly extended pages. This change provides very significant wins (3x at 16 clients, on my workstation) for concurrent COPY into a single relation. Even single threaded COPY is measurably faster, primarily due to not dirtying pages while extending, if supported by the operating system (see commit `4d330a61bb`). Even single-row INSERTs benefit, although to a much smaller degree, as the relation extension lock rarely is the primary bottleneck. Reviewed-by: Melanie Plageman <melanieplageman@gmail.com> Discussion: https://postgr.es/m/20221029025420.eplyow6k7tgu6he3@awork3.anarazel.de	2023-04-06 16:53:17 -07:00
Andres Freund	5279e9db8e	heapam: Pass number of required pages to RelationGetBufferForTuple() A future commit will use this information to determine how aggressively to extend the relation by. In heap_multi_insert() we know accurately how many pages we need once we need to extend the relation, providing an accurate lower bound for how much to extend. Reviewed-by: Melanie Plageman <melanieplageman@gmail.com> Discussion: https://postgr.es/m/20221029025420.eplyow6k7tgu6he3@awork3.anarazel.de	2023-04-06 16:17:16 -07:00
Daniel Gustafsson	7d71d3dd08	Refresh cost-based delay params more frequently in autovacuum Allow autovacuum to reload the config file more often so that cost-based delay parameters can take effect while VACUUMing a relation. Previously, autovacuum workers only reloaded the config file once per relation vacuumed, so config changes could not take effect until beginning to vacuum the next table. Now, check if a reload is pending roughly once per block, when checking if we need to delay. In order for autovacuum workers to safely update their own cost delay and cost limit parameters without impacting performance, we had to rethink when and how these values were accessed. Previously, an autovacuum worker's wi_cost_limit was set only at the beginning of vacuuming a table, after reloading the config file. Therefore, at the time that autovac_balance_cost() was called, workers vacuuming tables with no cost-related storage parameters could still have different values for their wi_cost_limit_base and wi_cost_delay. Now that the cost parameters can be updated while vacuuming a table, workers will (within some margin of error) have no reason to have different values for cost limit and cost delay (in the absence of cost-related storage parameters). This removes the rationale for keeping cost limit and cost delay in shared memory. Balancing the cost limit requires only the number of active autovacuum workers vacuuming a table with no cost-based storage parameters. Author: Melanie Plageman <melanieplageman@gmail.com> Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-by: Daniel Gustafsson <daniel@yesql.se> Reviewed-by: Kyotaro Horiguchi <horikyota.ntt@gmail.com> Reviewed-by: Robert Haas <robertmhaas@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/CAAKRu_ZngzqnEODc7LmS1NH04Kt6Y9huSjz5pp7%2BDXhrjDA0gw%40mail.gmail.com	2023-04-07 01:00:21 +02:00
Daniel Gustafsson	71a825194f	Make vacuum failsafe_active globally visible While vacuuming a table in failsafe mode, VacuumCostActive should not be re-enabled. This currently isn't a problem because vacuum cost parameters are only refreshed in between vacuuming tables and failsafe status is reset for every table. In preparation for allowing vacuum cost parameters to be updated more frequently, elevate LVRelState->failsafe_active to a global, VacuumFailsafeActive, which will be checked when determining whether or not to re-enable vacuum cost-related delays. Author: Melanie Plageman <melanieplageman@gmail.com> Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-by: Daniel Gustafsson <daniel@yesql.se> Reviewed-by: Kyotaro Horiguchi <horikyota.ntt@gmail.com> Reviewed-by: Robert Haas <robertmhaas@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/CAAKRu_ZngzqnEODc7LmS1NH04Kt6Y9huSjz5pp7%2BDXhrjDA0gw%40mail.gmail.com	2023-04-07 00:54:08 +02:00
Andres Freund	18103b7c5f	hio: Don't pin the VM while holding buffer lock while extending Starting with commit `7db0cd2145`, RelationGetBufferForTuple() did a visibilitymap_pin() while holding an exclusive buffer content lock on a newly extended page, when using COPY FREEZE. We elsewhere try hard to avoid doing IO while holding a content lock. And until `14f98e0af9`, that happened while holding the relation extension lock. Practically, this isn't a huge issue, because COPY FREEZE is restricted to relations created or truncated in the current session, so it's unlikely there's a lot of contention. We can't avoid doing IO while holding the content lock by pinning the VM earlier, because we don't know which page it will be on. While we could just ignore the issue in this case, a future commit will add bulk relation extension, which needs to enter pages into the FSM while also trying to hold onto a buffer lock. To address this issue, use visibilitymap_pin_ok() to see if the relevant buffer is already pinned. If not, release the buffer, pin the VM buffer, and acquire the lock again. This opens up a small window for other backends to insert data onto the page - as the page is not entered into the freespacemap, other backends won't see it normally, but a concurrent vacuum could enter the page, if it started just after the relation is extended. In case the page is used by another backend, retry. This is very similar to how locking "otherBuffer" is already dealt with. Reviewed-by: Tomas Vondra <tomas.vondra@enterprisedb.com> Discussion: http://postgr.es/m/20230325025740.wzvchp2kromw4zqz@awork3.anarazel.de	2023-04-06 11:11:13 -07:00
Andres Freund	bba9003b62	hio: Relax rules for calling GetVisibilityMapPins() GetVisibilityMapPins() insisted on the buffer1/buffer2 being in a specific order. This required checks at the callsite. As a subsequent patch will add another callsite, move related logic into GetVisibilityMapPins(). Discussion: https://postgr.es/m/20230403190030.fk2frxv6faklrseb@awork3.anarazel.de	2023-04-06 10:36:30 -07:00
Andres Freund	fcdda1e4b5	Use ExtendBufferedRelTo() in {vm,fsm}_extend() This uses ExtendBufferedRelTo(), introduced in `31966b151e`, to extend the visibilitymap and freespacemap to the size needed. It also happens to fix a warning introduced in `3d6a98457d`, reported by Tom Lane. Discussion: https://postgr.es/m/20221029025420.eplyow6k7tgu6he3@awork3.anarazel.de Discussion: https://postgr.es/m/2194723.1680736788@sss.pgh.pa.us	2023-04-05 17:50:09 -07:00
Andres Freund	3d6a98457d	Don't initialize page in {vm,fsm}_extend(), not needed The read path needs to be able to initialize pages anyway, as relation extensions are not durable. By avoiding initializing pages, we can, in a future patch, extend the relation by multiple blocks at once. Using smgrextend() for {vm,fsm}_extend() is not a good idea in general, as at least one page of the VM/FSM will be read immediately after, always causing a cache miss, requiring us to read content we just wrote. Discussion: https://postgr.es/m/20230301223515.pucbj7nb54n4i4nv@awork3.anarazel.de	2023-04-05 08:19:39 -07:00
Peter Geoghegan	a349b86603	Move heaprel struct field next to index rel field. Commit `61b313e4` added a heaprel struct member to IndexVacuumInfo, but placed it last. Move the heaprel struct member next to the index struct member to improve the code's readability. Author: Peter Geoghegan <pg@bowt.ie> Discussion: https://postgr.es/m/CAH2-WznG=TV6S9d3VA=y0vBHbXwnLs9_LLdiML=aNJuHeriwxg@mail.gmail.com	2023-04-03 11:01:11 -07:00
Alexander Korotkov	2b65bf046d	Revert `11470f544e` Discussion: https://postgr.es/m/20230323003003.plgaxjqahjgkuxrk%40awork3.anarazel.de	2023-04-03 16:54:31 +03:00
David Rowley	4830f10243	Disable vacuum's use of a buffer access strategy during failsafe Traditionally, vacuum always makes use of a buffer access strategy 32 buffers in size. This means that running vacuums tend not to cause too many shared buffers to become dirty, however, this can cause vacuums to run much more slowly than they otherwise could as WAL flushes will occur more frequently due to having to flush WAL out to the LSN of the dirty page before that page can be written to disk. When we are performing failsafe VACUUMs (as added in `1e55e7d17`), we really want to make the vacuum work go as quickly as possible, so here we disable the buffer access strategy when entering failsafe mode while vacuuming a relation. Per idea and analysis from Andres Freund. In passing, also include some changes I had intended for `32fbe0239`. Author: Melanie Plageman Reviewed-by: Justin Pryzby, David Rowley Discussion: https://postgr.es/m/20230111182720.ejifsclfwymw2reb%40awork3.anarazel.de	2023-04-03 23:05:58 +12:00
Andres Freund	6af1793954	Add info in WAL records in preparation for logical slot conflict handling This commit only implements one prerequisite part for allowing logical decoding. The commit message contains an explanation of the overall design, which later commits will refer back to. Overall design: 1. We want to enable logical decoding on standbys, but replay of WAL from the primary might remove data that is needed by logical decoding, causing error(s) on the standby. To prevent those errors, a new replication conflict scenario needs to be addressed (as much as hot standby does). 2. Our chosen strategy for dealing with this type of replication slot is to invalidate logical slots for which needed data has been removed. 3. To do this we need the latestRemovedXid for each change, just as we do for physical replication conflicts, but we also need to know whether any particular change was to data that logical replication might access. That way, during WAL replay, we know when there is a risk of conflict and, if so, if there is a conflict. 4. We can't rely on the standby's relcache entries for this purpose in any way, because the startup process can't access catalog contents. 5. Therefore every WAL record that potentially removes data from the index or heap must carry a flag indicating whether or not it is one that might be accessed during logical decoding. Why do we need this for logical decoding on standby? First, let's forget about logical decoding on standby and recall that on a primary database, any catalog rows that may be needed by a logical decoding replication slot are not removed. This is done thanks to the catalog_xmin associated with the logical replication slot. But, with logical decoding on standby, in the following cases: - hot_standby_feedback is off - hot_standby_feedback is on but there is no physical slot between the primary and the standby.
Then, hot_standby_feedback will work, but only while the connection is alive (for example a node restart would break it) Then, the primary may delete system catalog rows that could be needed by the logical decoding on the standby (as it does not know about the catalog_xmin on the standby). So, it’s mandatory to identify those rows and invalidate the slots that may need them if any. Identifying those rows is the purpose of this commit. Implementation: When a WAL replay on standby indicates that a catalog table tuple is to be deleted by an xid that is greater than a logical slot's catalog_xmin, then that means the slot's catalog_xmin conflicts with the xid, and we need to handle the conflict. While subsequent commits will do the actual conflict handling, this commit adds a new field isCatalogRel in such WAL records (and a new bit set in the xl_heap_visible flags field), that is true for catalog tables, so as to arrange for conflict handling. The affected WAL records are the ones that already contain the snapshotConflictHorizon field, namely: - gistxlogDelete - gistxlogPageReuse - xl_hash_vacuum_one_page - xl_heap_prune - xl_heap_freeze_page - xl_heap_visible - xl_btree_reuse_page - xl_btree_delete - spgxlogVacuumRedirect Due to this new field being added, xl_hash_vacuum_one_page and gistxlogDelete do now contain the offsets to be deleted as a FLEXIBLE_ARRAY_MEMBER. This is needed to ensure correct alignment. It's not needed on the other structs where isCatalogRel has been added. This commit just introduces the WAL format changes mentioned above. Handling the actual conflicts will follow in future commits. Bumps XLOG_PAGE_MAGIC as several WAL records are changed.
Author: "Drouvot, Bertrand" <bertranddrouvot.pg@gmail.com> Author: Andres Freund <andres@anarazel.de> (in an older version) Author: Amit Khandekar <amitdkhan.pg@gmail.com> (in an older version) Reviewed-by: "Drouvot, Bertrand" <bertranddrouvot.pg@gmail.com> Reviewed-by: Andres Freund <andres@anarazel.de> Reviewed-by: Robert Haas <robertmhaas@gmail.com> Reviewed-by: Fabrízio de Royes Mello <fabriziomello@gmail.com> Reviewed-by: Melanie Plageman <melanieplageman@gmail.com>	2023-04-02 12:32:19 -07:00
Andres Freund	61b313e47e	Pass down table relation into more index relation functions This is done in preparation for logical decoding on standby, which needs to include whether visibility affecting WAL records are about a (user) catalog table. Which is only known for the table, not the indexes. It's also nice to be able to pass the heap relation to GlobalVisTestFor() in vacuumRedirectAndPlaceholder(). Author: "Drouvot, Bertrand" <bertranddrouvot.pg@gmail.com> Discussion: https://postgr.es/m/21b700c3-eecf-2e05-a699-f8c78dd31ec7@gmail.com	2023-04-01 20:18:29 -07:00
Andres Freund	a88a18b125	Assert only valid flag bits are passed to visibilitymap_set() If visibilitymap_set() is called with flags containing a higher bit than VISIBILITYMAP_ALL_FROZEN, the state of neighboring pages is affected. While there was an assertion that some valid bits were set, it did not check that only valid bits were. Change that. Discussion: https://postgr.es/m/20230331043300.gux3s5wzrapqi4oe@awork3.anarazel.de	2023-04-01 18:00:19 -07:00
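The difference between "some valid bits are set" and "only valid bits are set" in the entry above can be illustrated with a small C sketch. It is not the PostgreSQL source: the constant values are assumed, and `vm_flags_are_valid()` and `vm_pack()` are hypothetical helpers that model a map byte holding two status bits for each of four heap pages.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed values for this sketch: two status bits per heap page. */
#define VISIBILITYMAP_ALL_VISIBLE 0x01
#define VISIBILITYMAP_ALL_FROZEN  0x02
#define VISIBILITYMAP_VALID_BITS  0x03

/*
 * Hypothetical helper: true only when at least one valid bit is set
 * and no bit outside the valid mask is set.
 */
static bool
vm_flags_are_valid(uint8_t flags)
{
    return flags != 0 && (flags & ~VISIBILITYMAP_VALID_BITS) == 0;
}

/*
 * Hypothetical packing model: slot (0..3) selects one heap page's
 * two-bit field inside a single map byte.
 */
static uint8_t
vm_pack(uint8_t mapbyte, unsigned slot, uint8_t flags)
{
    return (uint8_t) (mapbyte | (flags << (slot * 2)));
}
```

In this model, passing the stray bit `0x04` for slot 0 lands in slot 1's field, which is the kind of neighbor corruption a stricter assertion is meant to catch.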
Andres Freund	14f98e0af9	hio: Release extension lock before initializing page / pinning VM PageInit() while holding the extension lock is unnecessary after `0d1fe9f74e` started to use RBM_ZERO_AND_LOCK - nobody can look at the new page before we release the page lock. PageInit() zeroes the page, which isn't that cheap, so deferring it until after the extension lock is released seems like a good idea. Doing visibilitymap_pin() while holding the extension lock, introduced in `7db0cd2145`, looks like an accident. Due to the restrictions on HEAP_INSERT_FROZEN it's unlikely to be a performance issue, but it still seems better to move it out. We also are doing the visibilitymap_pin() while holding the buffer lock, which will be fixed in a separate commit. Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi> Discussion: http://postgr.es/m/419312fd-9255-078c-c3e3-f0525f911d7f@iki.fi	2023-04-01 17:50:18 -07:00
Peter Geoghegan	ae4fdde135	Count updates that move row to a new page. Add pgstat counter to track row updates that result in the successor version going to a new heap page, leaving behind an original version whose t_ctid points to the new version. The current count is shown by the n_tup_newpage_upd column of each of the pg_stat_*_tables views. The new n_tup_newpage_upd column complements the existing n_tup_hot_upd and n_tup_upd columns. Tables that have high n_tup_newpage_upd values (relative to n_tup_upd) are good candidates for tuning heap fillfactor. Corey Huinker, with small tweaks by me. Author: Corey Huinker <corey.huinker@gmail.com> Reviewed-By: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Andres Freund <andres@anarazel.de> Discussion: https://postgr.es/m/CADkLM=ded21M9iZ36hHm-vj2rE2d=zcKpUQMds__Xm2pxLfHKA@mail.gmail.com	2023-03-23 11:16:17 -07:00
Alexander Korotkov	11470f544e	Allow locking updated tuples in tuple_update() and tuple_delete() Currently, in read committed transaction isolation mode (default), we have the following sequence of actions when tuple_update()/tuple_delete() finds the tuple updated by concurrent transaction. 1. Attempt to update/delete tuple with tuple_update()/tuple_delete(), which returns TM_Updated. 2. Lock tuple with tuple_lock(). 3. Re-evaluate plan qual (recheck if we still need to update/delete and calculate the new tuple for update). 4. Second attempt to update/delete tuple with tuple_update()/tuple_delete(). This attempt should be successful, since the tuple was previously locked. This patch eliminates step 2 by taking the lock during first tuple_update()/tuple_delete() call. Heap table access method saves some efforts by checking the updated tuple once instead of twice. Future undo-based table access methods, which will start from the latest row version, can immediately place a lock there. The code in nodeModifyTable.c is simplified by removing the nested switch/case. Discussion: https://postgr.es/m/CAPpHfdua-YFw3XTprfutzGp28xXLigFtzNbuFY8yPhqeq6X5kg%40mail.gmail.com Reviewed-by: Aleksander Alekseev, Pavel Borisov, Vignesh C, Mason Sharp Reviewed-by: Andres Freund, Chris Travers	2023-03-23 00:26:59 +03:00
Tomas Vondra	19d8e2308b	Ignore BRIN indexes when checking for HOT updates When determining whether an index update may be skipped by using HOT, we can ignore attributes indexed by block summarizing indexes without references to individual tuples that need to be cleaned up. A new type TU_UpdateIndexes provides a signal to the executor to determine which indexes to update - no indexes, all indexes, or only the summarizing indexes. This also removes rd_indexattr list, and replaces it with rd_attrsvalid flag. The list was not used anywhere, and a simple flag is sufficient. This was originally committed as `5753d4ee32`, but then got reverted by `e3fcca0d0d` because of correctness issues. Original patch by Josef Simanek, various fixes and improvements by Tomas Vondra and me. Authors: Matthias van de Meent, Josef Simanek, Tomas Vondra Reviewed-by: Tomas Vondra, Alvaro Herrera Discussion: https://postgr.es/m/05ebcb44-f383-86e3-4f31-0a97a55634cf@enterprisedb.com Discussion: https://postgr.es/m/CAFp7QwpMRGcDAQumN7onN9HjrJ3u4X3ZRXdGFT0K5G2JWvnbWg%40mail.gmail.com	2023-03-20 11:02:42 +01:00
Tom Lane	462bb7f128	Remove bms_first_member(). This function has been semi-deprecated ever since we invented bms_next_member(). Its habit of scribbling on the input bitmapset isn't great, plus for sufficiently large bitmapsets it would take O(N^2) time to complete a loop. Now we have the additional problem that reducing the input to empty while leaving it still accessible would violate a planned invariant. So let's just get rid of it, after updating the few extant callers to use bms_next_member(). Patch by me; thanks to Nathan Bossart and Richard Guo for review. Discussion: https://postgr.es/m/1159933.1677621588@sss.pgh.pa.us	2023-03-02 11:34:29 -05:00
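A rough C sketch of the non-destructive iteration style that the entry above favors follows. The set representation and the names `set_next_member()` and `set_sum_members()` are hypothetical, not PostgreSQL's bitmapset code; the negative end-of-set sentinel is an assumption modeled on the usual `while ((x = next(set, x)) >= 0)` loop idiom.

```c
#include <stdint.h>

#define BITS_PER_WORD 64

/*
 * Hypothetical stand-in for a "next member" lookup: return the smallest
 * member greater than prevbit, or -2 when none remains. The input is
 * const, so the caller's set is never modified.
 */
static int
set_next_member(const uint64_t *words, int nwords, int prevbit)
{
    for (int bit = prevbit + 1; bit < nwords * BITS_PER_WORD; bit++)
    {
        if (words[bit / BITS_PER_WORD] & (UINT64_C(1) << (bit % BITS_PER_WORD)))
            return bit;
    }
    return -2;
}

/* Visit every member once; each call resumes where the previous one stopped. */
static int
set_sum_members(const uint64_t *words, int nwords)
{
    int sum = 0;
    int x = -1;

    while ((x = set_next_member(words, nwords, x)) >= 0)
        sum += x;
    return sum;
}
```

Because each lookup resumes after the previous member, a full loop touches every bit position once. A destructive "first member" call, by contrast, has to rescan from the start of the set each time, which is where the quadratic cost for large sets comes from.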
Peter Eisentraut	b9f0e54bc9	Update types in smgr API Change data buffer to void *, from char *, and add const where appropriate. This makes it match the File API (see also `2d4f1ba6cf`) and stdio. Discussion: https://www.postgresql.org/message-id/flat/11dda853-bb5b-59ba-a746-e168b1ce4bdb%40enterprisedb.com	2023-02-27 07:47:46 +01:00
David Rowley	cfcf56f923	More refactoring of heapgettup() and heapgettup_pagemode() Here we further simplify the code in heapgettup() and heapgettup_pagemode() to make better use of the helper functions added in the previous recent refactors in this area. In passing, remove an unneeded cast added in `8ca6d49f6`. Author: Melanie Plageman Reviewed-by: Andres Freund, David Rowley Discussion: https://postgr.es/m/CAAKRu_YSOnhKsDyFcqJsKtBSrd32DP-jjXmv7hL0BPD-z0TGXQ@mail.gmail.com	2023-02-07 17:24:07 +13:00
David Rowley	7ae0ab0ad9	Reduce code duplication between heapgettup and heapgettup_pagemode The code to get the next block number was exactly the same between these two functions, so let's just put it into a helper function and call that from both locations. Author: Melanie Plageman Reviewed-by: Andres Freund, David Rowley Discussion: https://postgr.es/m/CAAKRu_bvkhka0CZQun28KTqhuUh5ZqY=_T8QEqZqOL02rpi2bw@mail.gmail.com	2023-02-03 16:20:43 +13:00
David Rowley	8ca6d49f63	Add helper functions to simplify heapgettup code Here we add heapgettup_start_page() and heapgettup_continue_page() to simplify the code in the heapgettup() function. Author: Melanie Plageman Reviewed-by: David Rowley Discussion: https://postgr.es/m/CAAKRu_bvkhka0CZQun28KTqhuUh5ZqY=_T8QEqZqOL02rpi2bw@mail.gmail.com	2023-02-03 12:38:42 +13:00
David Rowley	f9bc34fcb6	Further refactor of heapgettup and heapgettup_pagemode Backward and forward scans share much of the same page acquisition code. Here we consolidate that code to reduce some duplication. Additionally, add a new rs_coffset field to HeapScanDescData to track the offset of the current tuple. The new field fits nicely into the padding between a bool and BlockNumber field and saves having to look at the last returned tuple to figure out which offset we should be looking at for the current tuple. Author: Melanie Plageman Reviewed-by: David Rowley Discussion: https://postgr.es/m/CAAKRu_bvkhka0CZQun28KTqhuUh5ZqY=_T8QEqZqOL02rpi2bw@mail.gmail.com	2023-02-03 11:48:39 +13:00
David Rowley	fb1a59de0c	Refactor heapam.c adding heapgettup_initial_block function Here we adjust heapgettup() and heapgettup_pagemode() to move the code that fetches the first block number to scan out into a helper function. This removes some code duplication. Author: Melanie Plageman Reviewed-by: David Rowley Discussion: https://postgr.es/m/CAAKRu_bvkhka0CZQun28KTqhuUh5ZqY=_T8QEqZqOL02rpi2bw@mail.gmail.com	2023-02-02 14:17:15 +13:00
David Rowley	e9aaf06328	Remove dead NoMovementScanDirection code Here we remove some dead code from heapgettup() and heapgettup_pagemode() which was trying to support NoMovementScanDirection scans. This code can never be reached as standard_ExecutorRun() never calls ExecutePlan with NoMovementScanDirection. Additionally, plans which were scanning an unordered index would use NoMovementScanDirection rather than ForwardScanDirection. There was no real need for this, so here we adjust this so we use ForwardScanDirection for unordered index scans. A comment in pathnodes.h claimed that NoMovementScanDirection was used for PathKey reasons, but if that was true, it no longer is, per code in build_index_paths(). This does change the non-text format of the EXPLAIN output so that unordered index scans now have a "Forward" scan direction rather than "NoMovement". The text format of EXPLAIN has not changed. Author: Melanie Plageman Reviewed-by: Tom Lane, David Rowley Discussion: https://postgr.es/m/CAAKRu_bvkhka0CZQun28KTqhuUh5ZqY=_T8QEqZqOL02rpi2bw@mail.gmail.com	2023-02-01 10:52:41 +13:00
Peter Geoghegan	6c6b497266	Revert "Add eager and lazy freezing strategies to VACUUM." This reverts commit `4d41799261`. Broad concerns about regressions caused by eager freezing strategy have been raised. Whether or not these concerns can be worked through in any time frame is far from certain. Discussion: https://postgr.es/m/20230126004347.gepcmyenk2csxrri@awork3.anarazel.de	2023-01-25 22:22:27 -08:00
Peter Geoghegan	4d41799261	Add eager and lazy freezing strategies to VACUUM. Eager freezing strategy avoids large build-ups of all-visible pages. It makes VACUUM trigger page-level freezing whenever doing so will enable the page to become all-frozen in the visibility map. This is useful for tables that experience continual growth, particularly strict append-only tables such as pgbench's history table. Eager freezing significantly improves performance stability by spreading out the cost of freezing over time, rather than doing most freezing during aggressive VACUUMs. It complements the insert autovacuum mechanism added by commit `b07642db`. VACUUM determines its freezing strategy based on the value of the new vacuum_freeze_strategy_threshold GUC (or reloption) with logged tables. Tables that exceed the size threshold use the eager freezing strategy. Unlogged tables and temp tables always use eager freezing strategy, since the added cost is negligible there. Non-permanent relations won't incur any extra overhead in WAL written (for the obvious reason), nor in pages dirtied (since any extra freezing will only take place on pages whose PD_ALL_VISIBLE bit needed to be set either way). VACUUM uses lazy freezing strategy for logged tables that fall under the GUC size threshold. Page-level freezing triggers based on the criteria established in commit `1de58df4`, which added basic page-level freezing. Eager freezing is strictly more aggressive than lazy freezing. Settings like vacuum_freeze_min_age still get applied in just the same way in every VACUUM, independent of the strategy in use. The only mechanical difference between eager and lazy freezing strategies is that only the former applies its own additional criteria to trigger freezing pages. 
Note that even lazy freezing strategy will trigger freezing whenever a page happens to have required that an FPI be written during pruning, provided that the page will thereby become all-frozen in the visibility map afterwards (due to the FPI optimization from commit `1de58df4`). The vacuum_freeze_strategy_threshold default setting is 4GB. This is a relatively low setting that prioritizes performance stability. It will be reviewed at the end of the Postgres 16 beta period. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Jeff Davis <pgsql@j-davis.com> Reviewed-By: Andres Freund <andres@anarazel.de> Reviewed-By: Matthias van de Meent <boekewurm+postgres@gmail.com> Discussion: https://postgr.es/m/CAH2-WzkFok_6EAHuK39GaW4FjEFQsY=3J0AAd6FXk93u-Xq3Fg@mail.gmail.com	2023-01-25 14:15:38 -08:00
David Rowley	009dbdea02	Run pgindent on heapam.c An upcoming patch by Melanie Plageman does some refactoring work in this area. Run pgindent on that file now before making any changes so that it's easier to maintain/evolve each of the individual patches doing the refactor work. Additionally, add a few new required typedefs to the list to make it easier to do future pgindent runs on this file during the refactor work. Discussion: https://postgr.es/m/CAAKRu_YSOnhKsDyFcqJsKtBSrd32DP-jjXmv7hL0BPD-z0TGXQ@mail.gmail.com	2023-01-23 23:08:38 +13:00
Peter Geoghegan	980ae17310	Tighten up VACUUM's approach to setting VM bits. Tighten up the way that visibilitymap_set() is called: request that both the all-visible and all-frozen bits get set whenever the all-frozen bit is set, regardless of what we think we know about the present state of the all-visible bit. Also make sure that the page level PD_ALL_VISIBLE flag is set in the same code path. In practice there doesn't seem to be a concrete scenario in which the previous approach could lead to inconsistencies. It was almost possible in scenarios involving concurrent HOT updates from transactions that abort, but (unlike pruning) freezing can never remove XIDs > VACUUM's OldestXmin, even those from transactions that are known to have aborted. That was protective here. These issues have been around since commit `a892234f83`, which added the all-frozen bit to the VM fork. There is no known live bug here, so no backpatch. In passing, add some defensive assertions to catch the issue, and stop reading the existing state of the VM when setting the VM in VACUUM's final heap pass. We already know that affected pages must have had at least one LP_DEAD item before we set it LP_UNUSED, so there is no point in reading the VM when it is set like this. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Andres Freund <andres@anarazel.de> Discussion: https://postgr.es/m/CAH2-WznuNGSzF8v6OsgjaC5aYsb3cZ6HW6MLm30X0d65cmSH6A@mail.gmail.com	2023-01-16 09:34:37 -08:00
Peter Geoghegan	d30b499997	Make lazy_vacuum_heap_rel match lazy_scan_heap. Make lazy_vacuum_heap_rel variable names match those from lazy_scan_heap where that makes sense. Extracted from a larger patch to deal with issues with how vacuumlazy.c sets pages all-frozen. Author: Peter Geoghegan <pg@bowt.ie> Discussion: https://postgr.es/m/CAH2-WznuNGSzF8v6OsgjaC5aYsb3cZ6HW6MLm30X0d65cmSH6A@mail.gmail.com	2023-01-11 18:45:32 -08:00
Peter Geoghegan	8c233cf86b	vacuumlazy.c: Tweak local variable name. Make a local variable name consistent with the name from its WAL record. Extracted from a larger patch to deal with issues with how vacuumlazy.c sets pages all-frozen. Author: Peter Geoghegan <pg@bowt.ie> Discussion: https://postgr.es/m/CAH2-WznuNGSzF8v6OsgjaC5aYsb3cZ6HW6MLm30X0d65cmSH6A@mail.gmail.com	2023-01-11 17:57:18 -08:00
Peter Geoghegan	50767705ed	Rename and relocate freeze plan dedup routines. Rename the heapam.c freeze plan deduplication routines added by commit `9e540599` to names that follow conventions for functions in heapam.c. Also relocate the functions so that they're next to their caller, which runs during original execution, when FREEZE_PAGE WAL records are built. The routines were initially placed next to (and followed the naming conventions of) conceptually related REDO routine code, but that scheme turned out to be kind of jarring when considered in a wider context. Author: Peter Geoghegan <pg@bowt.ie> Reported-By: Andres Freund <andres@anarazel.de> Discussion: https://postgr.es/m/20230109214308.icz26oqvt3k2274c@awork3.anarazel.de	2023-01-11 17:30:42 -08:00
Peter Geoghegan	af3855cb77	Improve TransactionIdDidAbort() documentation. Document that TransactionIdDidAbort() won't indicate that transactions that were in-progress during a crash have aborted. Tie this to existing discussion of the TransactionIdIsInProgress() and TransactionIdDidCommit() protocol that code in heapam_visibility.c (and a few other places) must observe. Follow-up to bugfix commit `eb5ad4ff`. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Andres Freund <andres@anarazel.de> Discussion: https://postgr.es/m/CAH2-Wzn4bEEqgmaUQL3aJ73yM9gAeK-wE4ngi7kjRjLztb+P0w@mail.gmail.com	2023-01-11 15:31:42 -08:00
Peter Geoghegan	eb5ad4ff05	Check that xmax didn't commit in freeze check. We cannot rely on TransactionIdDidAbort here, since in general it may report transactions that were in-progress at the time of an earlier hard crash as not aborted, effectively behaving as if they were still in progress even after crash recovery completes. Go back to defensively verifying that xmax didn't commit instead. Oversight in commit `79d4bf4e`. Author: Peter Geoghegan <pg@bowt.ie> Reported-By: Andres Freund <andres@anarazel.de> Discussion: https://postgr.es/m/20230104035636.hy5djyr2as4gbc4q@awork3.anarazel.de	2023-01-03 21:48:27 -08:00
Peter Geoghegan	54afdcd618	vacuumlazy.c: Save get_database_name() in vacrel. This brings dbname strings in line with namespace and relation name strings. Author: Peter Geoghegan <pg@bowt.ie> Discussion: https://postgr.es/m/CAH2-WzkQ1TKU-DdNvnGeL870di3+CU1UTo-7nw7xFDpVE-XGjA@mail.gmail.com	2023-01-03 11:48:47 -08:00
Peter Geoghegan	79d4bf4eff	Delay commit status checks until freezing executes. pg_xact lookups are relatively expensive. Move the xmin/xmax commit status checks from the point that freeze plans are prepared to the point that they're actually executed. Otherwise we'll repeat many commit status checks whenever multiple successive VACUUM operations scan the same pages and decide against freezing each time, which is a waste of cycles. Oversight in commit `1de58df4`, which added page-level freezing. Author: Peter Geoghegan <pg@bowt.ie> Discussion: https://postgr.es/m/CAH2-WzkZpe4K6qMfEt8H4qYJCKc2R7TPvKsBva7jc9w7iGXQSw@mail.gmail.com	2023-01-03 11:22:36 -08:00
Peter Geoghegan	b37a083239	Refine the definition of page-level freezing. Improve comments added by commit `1de58df4` which describe the lazy_scan_prune "freeze the page" path. These newly revised comments are based on suggestions from Jeff Davis. In passing, remove nearby visibility_cutoff_xid comments left over from commit `6daeeb1f`. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Jeff Davis <pgsql@j-davis.com> Discussion: https://postgr.es/m/ebc857107fe3edd422ef8a65191ca4a8da568b9b.camel@j-davis.com	2023-01-03 10:08:55 -08:00
Bruce Momjian	c8e1ba736b	Update copyright for 2023 Backpatch-through: 11	2023-01-02 15:00:37 -05:00
Peter Geoghegan	325bc54eed	Adjust VACUUM hastup LP_REDIRECT comments. The term "truncation" has been ambiguous since commit `10a8d13823` added line pointer array truncation during heap pruning. Clear things up by specifying that we're talking about rel truncation here, to match nearby comments that apply to tuples with storage.	2023-01-02 10:18:22 -08:00
Peter Geoghegan	6daeeb1f91	Avoid special XID snapshotConflictHorizon values. Don't allow VACUUM to WAL-log the value FrozenTransactionId as the snapshotConflictHorizon of freezing or visibility map related WAL records. The only special XID value that's an allowable snapshotConflictHorizon is InvalidTransactionId, which is interpreted as "record definitely doesn't require a recovery conflict". Author: Peter Geoghegan <pg@bowt.ie> Discussion: https://postgr.es/m/CAH2-WznuNGSzF8v6OsgjaC5aYsb3cZ6HW6MLm30X0d65cmSH6A@mail.gmail.com	2023-01-02 10:16:51 -08:00
Peter Eisentraut	e351f85418	Push lpp variable closer to usage in heapgetpage() Author: Melanie Plageman <melanieplageman@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/CAAKRu_YSOnhKsDyFcqJsKtBSrd32DP-jjXmv7hL0BPD-z0TGXQ@mail.gmail.com	2023-01-02 09:39:31 +01:00
Peter Geoghegan	1de58df4fe	Add page-level freezing to VACUUM. Teach VACUUM to decide on whether or not to trigger freezing at the level of whole heap pages. Individual XIDs and MXIDs fields from tuple headers now trigger freezing of whole pages, rather than independently triggering freezing of each individual tuple header field. Managing the cost of freezing over time now significantly influences when and how VACUUM freezes. The overall amount of WAL written is the single most important freezing related cost, in general. Freezing each page's tuples together in batch allows VACUUM to take full advantage of the freeze plan WAL deduplication optimization added by commit `9e540599`. Also teach VACUUM to trigger page-level freezing whenever it detects that heap pruning generated an FPI. We'll have already written a large amount of WAL just to do that much, so it's very likely a good idea to get freezing out of the way for the page early. This only happens in cases where it will directly lead to marking the page all-frozen in the visibility map. In most cases "freezing a page" removes all XIDs < OldestXmin, and all MXIDs < OldestMxact. It doesn't quite work that way in certain rare cases involving MultiXacts, though. It is convenient to define "freeze the page" in a way that gives FreezeMultiXactId the leeway to put off the work of processing an individual tuple's xmax whenever it happens to be a MultiXactId that would require an expensive second pass to process aggressively (allocating a new multi is especially worth avoiding here). FreezeMultiXactId is eager when processing is cheap (as it usually is), and lazy in the event of an individual multi that happens to require expensive second pass processing. This avoids regressions related to processing of multis that page-level freezing might otherwise cause. 
Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Jeff Davis <pgsql@j-davis.com> Reviewed-By: Andres Freund <andres@anarazel.de> Discussion: https://postgr.es/m/CAH2-WzkFok_6EAHuK39GaW4FjEFQsY=3J0AAd6FXk93u-Xq3Fg@mail.gmail.com	2022-12-28 08:50:47 -08:00
Peter Geoghegan	63c844a0a5	Remove overzealous MultiXact freeze assertion. When VACUUM determines that an existing MultiXact should use a freeze plan that sets xmax to InvalidTransactionId, the original Multi may or may not be before OldestMxact. Remove an incorrect assertion that expected it to always be from before OldestMxact. Oversight in commit 4ce3af. Author: Peter Geoghegan <pg@bowt.ie> Reported-By: Hayato Kuroda <kuroda.hayato@fujitsu.com> Discussion: https://postgr.es/m/TYAPR01MB5866B24104FD80B5D7E65C3EF5ED9@TYAPR01MB5866.jpnprd01.prod.outlook.com	2022-12-26 23:36:02 -08:00
Peter Geoghegan	07eef53955	Use scanned_pages to decide when to failsafe check. Perform a failsafe check every time VACUUM's first heap scan scans a further FAILSAFE_EVERY_PAGES pages, rather than using an approach based on the number of physical blocks that our current blkno is from the blkno at the time of the previous failsafe check. That way VACUUM will perform a failsafe check every time it has scanned a uniform number of pages, without it mattering when or how VACUUM skipped pages using the visibility map. Sami Imseih, with changes to FAILSAFE_EVERY_PAGES comments added by me. Author: Sami Imseih <simseih@amazon.com> Reviewed-By: Peter Geoghegan <pg@bowt.ie> Discussion: https://postgr.es/m/401CE010-4049-4B94-9961-0B610A5D254D%40amazon.com	2022-12-22 10:41:40 -08:00
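The scanned-pages cadence described in the entry above can be sketched in C as follows. This is only an illustration under stated assumptions: `SKETCH_FAILSAFE_EVERY_PAGES` is a tiny stand-in for the real FAILSAFE_EVERY_PAGES constant, and `count_failsafe_checks()` is a hypothetical function that counts how many checks would fire for a given pattern of skipped blocks.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-in value chosen for readability; not the real constant. */
#define SKETCH_FAILSAFE_EVERY_PAGES 4

/*
 * Hypothetical model of the first heap pass: skipped[blkno] is true when
 * the visibility map let the scan skip that block. A check fires each time
 * a further SKETCH_FAILSAFE_EVERY_PAGES pages have actually been scanned,
 * regardless of how far blkno has advanced.
 */
static int
count_failsafe_checks(const bool *skipped, int nblocks)
{
    uint32_t scanned_pages = 0;
    uint32_t next_check = SKETCH_FAILSAFE_EVERY_PAGES;
    int checks = 0;

    for (int blkno = 0; blkno < nblocks; blkno++)
    {
        if (skipped[blkno])
            continue;           /* skipped blocks do not count as scanned */

        scanned_pages++;
        if (scanned_pages >= next_check)
        {
            checks++;
            next_check += SKETCH_FAILSAFE_EVERY_PAGES;
        }
    }
    return checks;
}
```

Keying the check to `scanned_pages` rather than to block-number distance means long skipped ranges neither trigger extra checks nor delay the next one.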
Peter Geoghegan	4ce3afb82e	Refactor how VACUUM passes around its XID cutoffs. Use a dedicated struct for the XID/MXID cutoffs used by VACUUM, such as FreezeLimit and OldestXmin. This state is initialized in vacuum.c, and then passed around by code from vacuumlazy.c to heapam.c freezing related routines. The new convention is that everybody works off of the same cutoff state, which is passed around via pointers to const. Also simplify some of the logic for dealing with frozen xmin in heap_prepare_freeze_tuple: add dedicated "xmin_already_frozen" state to clearly distinguish xmin XIDs that we're going to freeze from those that were already frozen from before. That way the routine's xmin handling code is symmetrical with the existing xmax handling code. This is preparation for an upcoming commit that will add page level freezing. Also refactor the control flow within FreezeMultiXactId(), while adding stricter sanity checks. We now test OldestXmin directly, instead of using FreezeLimit as an inexact proxy for OldestXmin. This is further preparation for the page level freezing work, which will make the function's caller cede control of page level freezing to the function where appropriate (where heap_prepare_freeze_tuple sees a tuple that happens to contain a MultiXactId in its xmax). Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Jeff Davis <pgsql@j-davis.com> Discussion: https://postgr.es/m/CAH2-WznS9TxXmz2_=SY+SyJyDFbiOftKofM9=aDo68BbXNBUMA@mail.gmail.com	2022-12-22 09:37:59 -08:00
Andrew Dunstan	8284cf5f74	Add copyright notices to meson files Discussion: https://postgr.es/m/222b43a5-2fb3-2c1b-9cd0-375d376c8246@dunslane.net	2022-12-20 07:54:39 -05:00
Peter Eisentraut	75f49221c2	Static assertions cleanup Because we added StaticAssertStmt() first before StaticAssertDecl(), some uses as well as the instructions in c.h are now a bit backwards from the "native" way static assertions are meant to be used in C. This updates the guidance and moves some static assertions to better places. Specifically, since the addition of StaticAssertDecl(), we can put static assertions at the file level. This moves a number of static assertions out of function bodies, where they might have been stuck out of necessity, to perhaps better places at the file level or in header files. Also, when the static assertion appears in a position where a declaration is allowed, then using StaticAssertDecl() is more native than StaticAssertStmt(). Reviewed-by: John Naylor <john.naylor@enterprisedb.com> Discussion: https://www.postgresql.org/message-id/flat/941a04e7-dd6f-c0e4-8cdf-a33b3338cbda%40enterprisedb.com	2022-12-15 10:10:32 +01:00
Michael Paquier	83a1a1b566	Generate pg_stat_get*() functions for tables using macros The same code pattern is repeated 17 times for int64 counters (0 for missing entry) and 5 times for timestamps (NULL for missing entry) on table entries. This code is switched to use a macro for the basic code instead, shaving a few hundred lines of originally-duplicated code. The function names remain the same, but some fields of PgStat_StatTabEntry have to be renamed to cope with the new style. Author: Bertrand Drouvot Reviewed-by: Nathan Bossart Discussion: https://postgr.es/m/20221204173207.GA2669116@nathanxps13	2022-12-06 10:46:35 +09:00
Peter Geoghegan	b6074846ce	Simplify vacuum_set_xid_limits() signature. Pass VACUUM parameters (VacuumParams state) to vacuum_set_xid_limits() directly, rather than passing most individual VacuumParams fields as separate arguments. Also make vacuum_set_xid_limits() output parameter symbol names match those used by its vacuumlazy.c caller. Author: Peter Geoghegan <pg@bowt.ie> Discussion: https://postgr.es/m/CAH2-Wz=TE7gW5DgSahDkf0UEZigFGAoHNNN6EvSrdzC=Kn+hrA@mail.gmail.com	2022-11-23 11:10:06 -08:00
Peter Geoghegan	02d647bbf0	Don't test HEAP_XMAX_INVALID when freezing xmax. We shouldn't ever need to rely on whether HEAP_XMAX_INVALID is set in t_infomask when considering whether or not an xmax should be deemed already frozen, since that status flag is just a hint. The only acceptable representation for an "xmax_already_frozen" raw xmax field is the transaction ID value zero (also known as InvalidTransactionId). Adjust code that superficially appeared to rely on HEAP_XMAX_INVALID to make the rule about xmax_already_frozen clear. Also avoid needlessly rereading the tuple's raw xmax. Oversight in bugfix commit `d2599ecf`. There is no evidence that this ever led to incorrect behavior, so no backpatch. The worst consequence of this bug was that VACUUM could hypothetically fail to notice and report on certain kinds of corruption, which seems fairly benign. Author: Peter Geoghegan <pg@bowt.ie> Discussion: https://postgr.es/m/CAH2-Wzkh3DMCDRPfhZxj9xCq9v3WmzvmbiCpf1dNKUBPadhCbQ@mail.gmail.com	2022-11-23 10:49:39 -08:00
Peter Geoghegan	1489b1ce72	Standardize rmgrdesc recovery conflict XID output. Standardize on the name snapshotConflictHorizon for all XID fields from WAL records that generate recovery conflicts when in hot standby mode. This supersedes the previous latestRemovedXid naming convention. The new naming convention places emphasis on how the values are actually used by REDO routines. How the values are generated during original execution (details of which vary by record type) is deemphasized. Users of tools like pg_waldump can now grep for snapshotConflictHorizon to see all potential sources of recovery conflicts in a standardized way, without necessarily having to consider which specific record types might be involved. Also bring a couple of WAL record types that didn't follow any kind of naming convention into line. These are heapam's VISIBLE record type and SP-GiST's VACUUM_REDIRECT record type. Now every WAL record whose REDO routine calls ResolveRecoveryConflictWithSnapshot() passes through the snapshotConflictHorizon field from its WAL record. This is follow-up work to the refactoring from commit `9e540599` that made FREEZE_PAGE WAL records use a standard snapshotConflictHorizon style XID cutoff. No bump in XLOG_PAGE_MAGIC, since the underlying format of affected WAL records doesn't change. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Andres Freund <andres@anarazel.de> Discussion: https://postgr.es/m/CAH2-Wzm2CQUmViUq7Opgk=McVREHSOorYaAjR1ZpLYkRN7_dPw@mail.gmail.com	2022-11-17 14:55:08 -08:00
Peter Geoghegan	813492dacc	Use correct type name in comments about freezing. Oversight in commit `9e540599`, which added freeze plan deduplication.	2022-11-17 09:34:12 -08:00
Peter Eisentraut	8e1db29cdb	Variable renaming in preparation for refactoring Rename page -> block and dp -> page where appropriate. The old naming mixed up block and page in confusing ways. Author: Melanie Plageman <melanieplageman@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/CAAKRu_YSOnhKsDyFcqJsKtBSrd32DP-jjXmv7hL0BPD-z0TGXQ@mail.gmail.com	2022-11-16 16:40:34 +01:00
Peter Eisentraut	d1cb4e9f92	Remove useless casts Maybe these are left from when PageGetItem() was a macro, but now they are clearly useless.	2022-11-16 16:01:06 +01:00
Peter Eisentraut	4eb3b11200	Turn HeapKeyTest macro into inline function It is easier to read as a function. Author: Melanie Plageman <melanieplageman@gmail.com> Reviewed-by: Andres Freund <andres@anarazel.de> Discussion: https://www.postgresql.org/message-id/flat/CAAKRu_YSOnhKsDyFcqJsKtBSrd32DP-jjXmv7hL0BPD-z0TGXQ@mail.gmail.com	2022-11-16 13:26:48 +01:00
Peter Geoghegan	9e5405993c	Deduplicate freeze plans in freeze WAL records. Make heapam WAL records that describe freezing performed by VACUUM more space efficient by storing each distinct "freeze plan" once, alongside an array of associated page offset numbers (one per freeze plan). The freeze plans required for most heap pages tend to naturally have a great deal of redundancy, so this technique is very effective in practice. It often leads to freeze WAL records that are less than 20% of the size of equivalent WAL records generated using the previous approach. The freeze plan concept was introduced by commit `3b97e6823b`, which fixed bugs in VACUUM's handling of MultiXacts. We retain the concept of freeze plans, but go back to using page offset number arrays. There is no loss of generality here because deduplication is an additive process that gets applied mechanically when FREEZE_PAGE WAL records are built. More than anything else, freeze plan deduplication is an optimization that reduces the marginal cost of freezing additional tuples on pages that will need to have at least one or two tuples frozen in any case. Ongoing work that adds page-level freezing to VACUUM will take full advantage of the improved cost profile through batching. Also refactor some of the details surrounding recovery conflicts needed to REDO freeze records in passing: make original execution responsible for generating a standard latestRemovedXid cutoff, rather than working backwards to get the same cutoff in the REDO routine. Bugfix commit `66fbcb0d2e` did it the other way around, which is equivalent but obscures what's going on. Also rename the cutoff field from the WAL record/struct (rename the field cutoff_xid to latestRemovedXid to match similar WAL records). Processing of conflicts by REDO routines is already completely uniform, so tools like pg_waldump should present the information driving the process uniformly. There are two remaining WAL record types that still don't quite follow this convention (heapam's VISIBLE record type and SP-GiST's VACUUM_REDIRECT record type). They can be brought into line by later work that totally standardizes how the cutoffs are presented. Bump XLOG_PAGE_MAGIC. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-By: Nathan Bossart <nathandbossart@gmail.com> Reviewed-By: Justin Pryzby <pryzby@telsasoft.com> Discussion: https://postgr.es/m/CAH2-Wz=XytErMnb8FAyFd+OQEbiipB0Q2FmFdXrggPL4VBnRYQ@mail.gmail.com	2022-11-15 07:48:41 -08:00
Jeff Davis	97c61f70d1	Document WAL rules related to PD_ALL_VISIBLE in README. Also improve comments. Discussion: https://postgr.es/m/a50005c1c537f89bb359057fd70e66bb83bce969.camel@j-davis.com Reviewed-by: Peter Geoghegan	2022-11-12 08:37:50 -08:00
Jeff Davis	d6a3dbe14f	Fix theoretical torn page hazard. The original report was concerned with a possible inconsistency between the heap and the visibility map, which I was unable to confirm. The concern has been retracted. However, there did seem to be a torn page hazard when using checksums. By not setting the heap page LSN during redo, the protections of minRecoveryPoint were bypassed. Fixed, along with a misleading comment. It may have been impossible to hit this problem in practice, because it would require a page tear between the checksum and the flags, so I am marking this as a theoretical risk. But, as discussed, it did violate expectations about the page LSN, so it may have other consequences. Backpatch to all supported versions. Reported-by: Konstantin Knizhnik Reviewed-by: Konstantin Knizhnik Discussion: https://postgr.es/m/fed17dac-8cb8-4f5b-d462-1bb4908c029e@garret.ru Backpatch-through: 11	2022-11-11 12:38:29 -08:00
Jeff Davis	3eb8eeccbe	Remove obsolete comments and code from prior to `f8f4227976`. XLogReadBufferForRedo() and XLogReadBufferForRedoExtended() only return BLK_NEEDS_REDO if the record LSN is greater than the page LSN, so the redo routine doesn't need to do the LSN check again. Discussion: https://postgr.es/m/0c37b80e62b1f3007d5a6d1292bd8fa0c275627a.camel@j-davis.com	2022-11-11 08:49:30 -08:00
Andres Freund	c3652cd84a	Remove redundant breaks in HeapTupleSatisfiesVisibility Author: Melanie Plageman <melanieplageman@gmail.com> Discussion: https://postgr.es/m/CAAKRu_ZJg_N7zHtWP+JoSY_hrce4+GKioL137Y2c2En-kuXQ7g@mail.gmail.com	2022-11-05 01:31:17 -07:00
Michael Paquier	2a71de8915	Remove unneeded includes of <sys/stat.h> Since `bfb9dfd`, none of the files updated in this commit have any stat() calls, so these inclusions are not necessary, for the same reasons as `233cf6e`. Per discussion with John Naylor. Discussion: https://postgr.es/m/CAFBsxsGGGX7KD6RxbNoSJzuSc8Gz3hOxcfhTOMLB_hJcm68dKQ@mail.gmail.com	2022-11-05 12:31:28 +09:00
David Rowley	7c335b7a20	Add doubly linked count list implementation We have various requirements when using a dlist_head to keep track of the number of items in the list. This, traditionally, has been done by maintaining a counter variable in the calling code. Here we tidy this up by adding "dclist", which is very similar to dlist but also keeps track of the number of items stored in the list. Callers may use the new dclist_count() function when they need to know how many items are stored. Obtaining the count is an O(1) operation. For simplicity reasons, dclist and dlist both use dlist_node as their node type and dlist_iter/dlist_mutable_iter as their iterator type. dclists have all of the same functionality as dlists except there is no function named dclist_delete(). To remove an item from a list dclist_delete_from() must be used. This requires knowing which dclist the given item is stored in. Additionally, here we also convert some dlists where additional code exists to keep track of the number of items stored and to make these use dclists instead. Author: David Rowley Reviewed-by: Bharath Rupireddy, Aleksander Alekseev Discussion: https://postgr.es/m/CAApHDvrtVxr+FXEX0VbViCFKDGxA3tWDgw9oFewNXCJMmwLjLg@mail.gmail.com	2022-11-02 14:06:05 +13:00
Peter Eisentraut	b1099eca8f	Remove AssertArg and AssertState These don't offer anything over plain Assert, and their usage had already been declared obsolescent. Author: Nathan Bossart <nathandbossart@gmail.com> Reviewed-by: Michael Paquier <michael@paquier.xyz> Discussion: https://www.postgresql.org/message-id/20221009210148.GA900071@nathanxps13	2022-10-28 09:19:06 +02:00
David Rowley	d37aa3d358	Allow nodeSort to perform Datum sorts for byref types Here we add a new 'copy' parameter to tuplesort_getdatum so that we can instruct the function not to datumCopy() byref Datums before returning. Similar to `91e9e89dc`, this can provide significant performance improvements in nodeSort when sorting by a single byref column and the sort's targetlist contains only that column. This allows us to re-enable Datum sorts for byref types which was disabled in `3a5817695` due to a reported memory leak. Additionally, here we slightly optimize DISTINCT aggregates so that we no longer perform any datumCopy() when we find the current value not to be distinct from the previous value. Previously the code would always take a copy of the most recent Datum and pfree the previous value, even when the values were the same. Testing shows a small but noticeable performance increase when aggregate transitions are skipped due to the current transition value being the same as the prior one. Author: David Rowley Discussion: https://postgr.es/m/CAApHDvqS6wC5U==k9Hd26E4EQXH3QR67-T4=Q1rQ36NGvjfVSg@mail.gmail.com Discussion: https://postgr.es/m/CAApHDvqHonfe9G1cVaKeHbDx70R_zCrM3qP2AGXpGrieSKGnhA@mail.gmail.com	2022-10-28 09:25:12 +13:00
David Rowley	2d0bbedda7	Rename shadowed local variables In a similar effort to `f01592f91`, here we mostly rename shadowed local variables to remove the warnings produced when compiling with -Wshadow=compatible-local. This fixes 63 warnings and leaves just 5. Author: Justin Pryzby, David Rowley Reviewed-by: Justin Pryzby Discussion: https://postgr.es/m/20220817145434.GC26426%40telsasoft.com	2022-10-05 21:01:41 +13:00
Tom Lane	2dc2e4e31a	Avoid improbable PANIC during heap_update, redux. Commit `34f581c39` intended to ensure that RelationGetBufferForTuple would acquire a visibility-map page pin in case the otherBuffer's all-visible bit had become set since we last had lock on that page. But I missed a case: when we're extending the relation, VM concerns were dealt with only in the relatively-less-likely case that we fail to conditionally lock the otherBuffer. I think I'd believed that we couldn't need to worry about it if the conditional lock succeeds, which is true for the target buffer; but the otherBuffer was unlocked for awhile so its bit might be set anyway. So we need to do the GetVisibilityMapPins dance, and then also recheck the page's free space, in both cases. Per report from Jaime Casanova. Back-patch to v12 as the previous patch was (although there's still no evidence that the bug is reachable pre-v14). Discussion: https://postgr.es/m/E1lWLjP-00006Y-Ml@gemulon.postgresql.org	2022-09-30 19:37:00 -04:00
Thomas Munro	b6d8a60aba	Restore pg_pread and friends. Commits `cf112c12` and `a0dc8271` were a little too hasty in getting rid of the pg_ prefixes where we use pread(), pwrite() and vectored variants. We dropped support for ancient Unixes where we needed to use lseek() to implement replacements for those, but it turns out that Windows also changes the current position even when you pass in an offset to ReadFile() and WriteFile() if the file handle is synchronous, despite its documentation saying otherwise. Switching to asynchronous file handles would fix that, but have other complications. For now let's just put back the pg_ prefix and add some comments to highlight the non-standard side-effect, which we can now describe as Windows-only. Reported-by: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> Reviewed-by: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> Discussion: https://postgr.es/m/20220923202439.GA1156054%40nathanxps13	2022-09-29 13:12:11 +13:00
Jeff Davis	163b0993a1	Fix race condition where heap_delete() fails to pin VM page. Similar to `5f12bc94dc`, the code must re-check PageIsAllVisible() after buffer lock is re-acquired. Backpatching to the same version, 12. Discussion: https://postgr.es/m/CAEP4nAw9jYQDKd_5Y+-s2E4YiUJq1vqiikFjYGpLShtp-K3gag@mail.gmail.com Reported-by: Robins Tharakan Reviewed-by: Robins Tharakan Backpatch-through: 12	2022-09-22 11:04:00 -07:00
Andres Freund	e6927270cd	meson: Add initial version of meson based build system Autoconf is showing its age, fewer and fewer contributors know how to wrangle it. Recursive make has a lot of hard to resolve dependency issues and slow incremental rebuilds. Our home-grown MSVC build system is hard to maintain for developers not using Windows and runs tests serially. While these and other issues could individually be addressed with incremental improvements, together they seem best addressed by moving to a more modern build system. After evaluating different build system choices, we chose to use meson, to a good degree based on the adoption by other open source projects. We decided that it's more realistic to commit a relatively early version of the new build system and mature it in tree. This commit adds an initial version of a meson based build system. It supports building postgres on at least AIX, FreeBSD, Linux, macOS, NetBSD, OpenBSD, Solaris and Windows (however only gcc is supported on aix, solaris). For Windows/MSVC postgres can now be built with ninja (faster, particularly for incremental builds) and msbuild (supporting the visual studio GUI, but building slower). Several aspects (e.g. Windows rc file generation, PGXS compatibility, LLVM bitcode generation, documentation adjustments) are done in subsequent commits requiring further review. Other aspects (e.g. not installing test-only extensions) are not yet addressed. When building on Windows with msbuild, builds are slower when using a visual studio version older than 2019, because those versions do not support MultiToolTask, required by meson for intra-target parallelism. The plan is to remove the MSVC specific build system in src/tools/msvc soon after reaching feature parity. However, we're not planning to remove the autoconf/make build system in the near future. Likely we're going to keep at least the parts required for PGXS to keep working around until all supported versions build with meson. Some initial help for postgres developers is at https://wiki.postgresql.org/wiki/Meson With contributions from Thomas Munro, John Naylor, Stone Tickle and others. Author: Andres Freund <andres@anarazel.de> Author: Nazir Bilal Yavuz <byavuz81@gmail.com> Author: Peter Eisentraut <peter@eisentraut.org> Reviewed-By: Peter Eisentraut <peter.eisentraut@enterprisedb.com> Discussion: https://postgr.es/m/20211012083721.hvixq4pnh2pixr3j@alap3.anarazel.de	2022-09-21 22:37:17 -07:00
Peter Geoghegan	4bac9600f0	Harmonize heapam and tableam parameter names. Make sure that function declarations use names that exactly match the corresponding names from function definitions. Having parameter names that are reliably consistent in this way will make it easier to reason about groups of related C functions from the same translation unit as a module. It will also make certain refactoring tasks easier. Like other recent commits that cleaned up function parameter names, this commit was written with help from clang-tidy. Later commits will do the same for other parts of the codebase. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: David Rowley <dgrowleyml@gmail.com> Discussion: https://postgr.es/m/CAH2-WznJt9CMM9KJTMjJh_zbL5hD9oX44qdJ4aqZtjFi-zA3Tg@mail.gmail.com	2022-09-19 16:46:23 -07:00
Peter Geoghegan	d977ffd923	Instrument freezing in autovacuum log reports. Add a new line to log reports from autovacuum (as well as VACUUM VERBOSE output) that shows information about freezing. Emphasis is placed on the total number of heap pages that had one or more tuples frozen by VACUUM. The total number of tuples frozen is also shown. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Jeff Janes <jeff.janes@gmail.com> Discussion: https://postgr.es/m/CAH2-WznTY6D0zyE8VLrC6Gd4kh_HGAXxnTPtcOQOOsxzLx9zog@mail.gmail.com	2022-09-08 10:29:39 -07:00
Michael Paquier	bfb9dfd937	Expand the use of get_dirent_type(), shaving a few calls to stat()/lstat() Several backend-side loops scanning one or more directories with ReadDir() (WAL segment recycle/removal in xlog.c, backend-side directory copy, temporary file removal, configuration file parsing, some logical decoding logic and some pgtz stuff) already know the type of the entry being scanned thanks to the dirent structure associated to the entry, on platforms where we know about DT_REG, DT_DIR and DT_LNK to make the difference between a regular file, a directory and a symbolic link. Relying on the dirent structure of an entry saves a few system calls to stat() and lstat() in the loops updated here, shaving some code while on it. The logic of the code remains the same, calling stat() or lstat() depending on if it is necessary to look through symlinks. Authors: Nathan Bossart, Bharath Rupireddy Reviewed-by: Andres Freund, Thomas Munro, Michael Paquier Discussion: https://postgr.es/m/CALj2ACV8n-J-f=yiLUOx2=HrQGPSOZM3nWzyQQvLPcccPXxEdg@mail.gmail.com	2022-09-02 16:58:06 +09:00
Peter Geoghegan	c3ffa731a5	Derive freeze cutoff from nextXID, not OldestXmin. Before now, the cutoffs that VACUUM used to determine which XIDs/MXIDs to freeze were determined at the start of each VACUUM by taking related cutoffs that represent which XIDs/MXIDs VACUUM should treat as still running, and subtracting an XID/MXID age based value controlled by GUCs like vacuum_freeze_min_age. The FreezeLimit cutoff (XID freeze cutoff) was derived by subtracting an XID age value from OldestXmin, while the MultiXactCutoff cutoff (MXID freeze cutoff) was derived by subtracting an MXID age value from OldestMxact. This approach didn't match the approach used nearby to determine whether this VACUUM operation should be an aggressive VACUUM or not. VACUUM now uses the standard approach instead: it subtracts the same age-based values from next XID/next MXID (rather than subtracting from OldestXmin/OldestMxact). This approach is simpler and more uniform. Most of the time it will have only a negligible impact on how and when VACUUM freezes. It will occasionally make VACUUM more robust in the event of problems caused by long-running transactions. These are cases where OldestXmin and OldestMxact are held back by so much that they attain an age that is a significant fraction of the value of age-based settings like vacuum_freeze_min_age. There is no principled reason why freezing should be affected in any way by the presence of a long-running transaction -- at least not before the point that the OldestXmin and OldestMxact limits used by each VACUUM operation attain an age that makes it unsafe to freeze some of the XIDs/MXIDs whose age exceeds the value of the relevant age-based settings. The new approach should at least make freezing degrade more gracefully than before, even in the most extreme cases. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Nathan Bossart <nathandbossart@gmail.com> Reviewed-By: Matthias van de Meent <boekewurm+postgres@gmail.com> Discussion: https://postgr.es/m/CAH2-WzkOv5CEeyOO=c91XnT5WBR_0gii0Wn5UbZhJ=4TTykDYg@mail.gmail.com	2022-08-31 11:37:35 -07:00
Peter Geoghegan	9887dd38f9	Adjust comments that called MultiXactIds "XMIDs". Oversights in commits `0b018fab` and `f3c15cbe`.	2022-08-29 19:42:30 -07:00

1143 Commits