postgresql

Commit Graph

Author	SHA1	Message	Date
Peter Geoghegan	c59278a1aa	Fix parallel amvacuumcleanup safety bug. Commit `b4af70cb` inverted the return value of the function parallel_processing_is_safe(), but missed the amvacuumcleanup test. Index AMs that don't support parallel cleanup at all were affected. The practical consequences of this bug were not very serious. Hash indexes are affected, but since they just return the number of blocks during hashvacuumcleanup anyway, it can't have had much impact. Author: Masahiko Sawada <sawada.mshk@gmail.com> Discussion: https://postgr.es/m/CAD21AoA-Em+aeVPmBbL_s1V-ghsJQSxYL-i3JP8nTfPiD1wjKw@mail.gmail.com Backpatch: 14-, where commit `b4af70cb` appears.	2021-11-02 19:52:11 -07:00
Peter Geoghegan	9bacec15b6	Don't overlook indexes during parallel VACUUM. Commit `b4af70cb`, which simplified state managed by VACUUM, performed refactoring of parallel VACUUM in passing. Confusion about the exact details of the tasks that the leader process is responsible for led to code that made it possible for parallel VACUUM to miss a subset of the table's indexes entirely. Specifically, indexes that fell under the min_parallel_index_scan_size size cutoff were missed. These indexes are supposed to be vacuumed by the leader (alongside any parallel unsafe indexes), but weren't vacuumed at all. Affected indexes could easily end up with duplicate heap TIDs, once heap TIDs were recycled for new heap tuples. This had generic symptoms that might be seen with almost any index corruption involving structural inconsistencies between an index and its table. To fix, make sure that the parallel VACUUM leader process performs any required index vacuuming for indexes that happen to be below the size cutoff. Also document the design of parallel VACUUM with these below-size-cutoff indexes. It's unclear how many users might be affected by this bug. There had to be at least three indexes on the table to hit the bug: a smaller index, plus at least two additional indexes that themselves exceed the size cutoff. Cases with just one additional index would not run into trouble, since the parallel VACUUM cost model requires two larger-than-cutoff indexes on the table to apply any parallel processing. Note also that autovacuum was not affected, since it never uses parallel processing. Test case based on tests from a larger patch to test parallel VACUUM by Masahiko Sawada. Many thanks to Kamigishi Rei for her invaluable help with tracking this problem down. Author: Peter Geoghegan <pg@bowt.ie> Author: Masahiko Sawada <sawada.mshk@gmail.com> Reported-By: Kamigishi Rei <iijima.yun@koumakan.jp> Reported-By: Andrew Gierth <andrew@tao11.riddles.org.uk> Diagnosed-By: Andres Freund <andres@anarazel.de> Bug: #17245 Discussion: https://postgr.es/m/17245-ddf06aaf85735f36@postgresql.org Discussion: https://postgr.es/m/20211030023740.qbnsl2xaoh2grq3d@alap3.anarazel.de Backpatch: 14-, where the refactoring commit appears.	2021-11-02 12:06:17 -07:00
Peter Geoghegan	5f55fc5a34	Demote pg_unreachable() in heapam to an assertion. Commit `d168b66682`, which overhauled index deletion, added a pg_unreachable() to the end of a sort comparator used when sorting heap TIDs from an index page. This allows the compiler to apply optimizations that assume that the heap TIDs from the index AM must always be unique. That doesn't seem like a good idea now, given recent reports of corruption involving duplicate TIDs in indexes on Postgres 14. Demote to an assertion, just in case. Backpatch: 14-, where index deletion was overhauled.	2021-10-29 10:53:48 -07:00
Daniel Gustafsson	941921b875	Replace occurrences of InvalidXid with InvalidTransactionId While Xid is a known shortening of TransactionId, InvalidXid is not defined in the code. Fix comments which mistakenly were using the shorter version. Author: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> Discussion: https://postgr.es/m/CALj2ACUQzdigML868nV4cojfELPkEzNLNOk7b91Pho4JB90fng@mail.gmail.com	2021-10-04 10:31:01 +02:00
Peter Geoghegan	c1a47dfe2e	vacuumlazy.c: Remove obsolete 'onecall' comment. Remove obsolete reference to lazy_vacuum()'s onecall argument. The function argument was removed by commit `3499df0dee`. Also remove adjoining comment block that introduces the wraparound failsafe concept. Talking about the failsafe here no longer makes sense, since lazy_vacuum() (and related functions) are no longer the only place where the failsafe might be triggered. This has been the case since commit `c242baa4a8` taught VACUUM to consider triggering the failsafe mechanism during its initial heap scan.	2021-09-25 10:22:53 -07:00
Peter Geoghegan	c7aeb775df	Document issue with heapam line pointer truncation. Checking that an offset number isn't past the end of a heap page's line pointer array was just a defensive sanity check for HOT-chain traversal code before commit `3c3b8a4b`. It's etrictly necessary now, though. Add comments that reference the issue to code in heapam that needs to get it right. Per suggestion from Alexander Lakhin. Discussion: https://postgr.es/m/f76a292c-9170-1aef-91a0-59d9443b99a3@gmail.com	2021-09-22 19:21:36 -07:00
Peter Geoghegan	5e6716cde5	Remove overzealous index deletion assertion. A broken HOT chain is not an unexpected condition, even when the offset number points past the end of the page's line pointer array. heap_prune_chain() does not (and never has) treated this condition as unexpected, so derivative code in heap_index_delete_tuples() shouldn't do so either. Oversight in commit `4228817449`. The assertion can probably only fail on Postgres 14 and master. Earlier releases don't have commit `3c3b8a4b`, which taught VACUUM to truncate the line pointer array of heap pages. Backpatch all the same, just to be consistent. Author: Peter Geoghegan <pg@bowt.ie> Reported-By: Alexander Lakhin <exclusion@gmail.com> Discussion: https://postgr.es/m/17197-9438f31f46705182@postgresql.org Backpatch: 12-, just like commit `4228817449`.	2021-09-20 14:26:25 -07:00
Michael Paquier	fd0625c7a9	Clean up some code using "(expr) ? true : false" All the code paths simplified here were already using a boolean or used an expression that led to zero or one, making the extra bits unnecessary. Author: Justin Pryzby Reviewed-by: Tom Lane, Michael Paquier, Peter Smith Discussion: https://postgr.es/m/20210428182936.GE27406@telsasoft.com	2021-09-08 09:44:04 +09:00
Peter Geoghegan	b175b9cde7	VACUUM VERBOSE: Don't report "pages removed". It doesn't make any sense to report this information, since VACUUM VERBOSE reports on heap relation truncation directly. This was an oversight in commit `7ab96cf6`, which made VACUUM VERBOSE output a little more consistent with nearby autovacuum-specific log output. Adjust comments that describe how this is supposed to work in passing. Also bring truncation-related VACUUM VERBOSE output in line with the convention established for VACUUM VERBOSE output by commit `f4f4a649`. Author: Peter Geoghegan <pg@bowt.ie> Backpatch: 14-, where VACUUM VERBOSE's output changed.	2021-08-31 20:37:18 -07:00
Peter Geoghegan	6320806ac3	vacuumlazy.c: Correct prune state comment. Oversight in commit `7ab96cf6b3`.	2021-08-31 16:35:01 -07:00
Peter Geoghegan	47029f775a	Remove unneeded old_rel_pages VACUUM state field. The field hasn't been used since commit `3d351d91`, which redefined pg_class.reltuples to be -1 before the first VACUUM or ANALYZE. Also rename a local variable of the same name ("old_rel_pages"). This is used by relation truncation to represent the original relation size at the start of the ongoing VACUUM operation. Rename it to orig_rel_pages, since that's a lot clearer. (This name matches similar nearby code.)	2021-08-31 14:59:52 -07:00
Alvaro Herrera	961dd75657	Report tuple address in data-corruption error message Most data-corruption reports mention the location of the problem, but this one failed to. Add it. Backpatch all the way back. In 12 and older, also assign the ERRCODE_DATA_CORRUPTED error code as was done in commit `fd6ec93bf8` for 13 and later. Discussion: https://postgr.es/m/202108191637.oqyzrdtnheir@alvherre.pgsql	2021-08-30 16:29:12 -04:00
Peter Geoghegan	bda822554b	track_io_timing logging: Don't special case 0 ms. Adjust track_io_timing related logging code added by commit `94d13d474d`. Make it consistent with other nearby autovacuum and autoanalyze logging code by removing logic that suppressed zero millisecond outputs. log_autovacuum_min_duration log output now reliably shows "read:" and "write:" millisecond-based values in its report (when track_io_timing is enabled). Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Stephen Frost <sfrost@snowman.net> Discussion: https://postgr.es/m/CAH2-WznW0FNxSVQMSRazAMYNfZ6DR_gr5WE78hc6E1CBkkJpzw@mail.gmail.com Backpatch: 14-, where the track_io_timing logging was introduced.	2021-08-27 13:34:00 -07:00
Peter Geoghegan	fdfbfa24fa	Reorder log_autovacuum_min_duration log output. This order seems more natural. It starts with details that are particular to heap and index data structures, and ends with system-level costs incurred during the autovacuum worker's VACUUM/ANALYZE operation. Author: Peter Geoghegan <pg@bowt.ie> Discussion: https://postgr.es/m/CAH2-WzkzxK6ahA9xxsOftRtBX_R0swuHZsvo4QUbak1Bz7hb7Q@mail.gmail.com Backpatch: 14-, which enhanced the log output in various ways.	2021-08-27 13:08:41 -07:00
Peter Geoghegan	de5dcb0796	vacuumlazy.c: Remove unnecessary parentheses. This was arguably a minor oversight in commit `b4af70cb`, which cleaned up the function signatures of functions that modify IndexBulkDeleteResult variables.	2021-08-27 09:47:16 -07:00
Daniel Gustafsson	069d33d0c5	Emit namespace in the post-copy errmsg During a VACUUM or CLUSTER command, the initial output emits a fully qualified relation path with namespace. The post-action errmsg only emitted the relation name however, which may lead to hard to parse output when using multiple jobs with vacuumdb as the output from different jobs may be interleaved. Include the full path in the post-action errmsg to be consistent with the initial errmsg. Author: Mike Fiedler <miketheman@gmail.com> Reviewed-by: Corey Huinker <corey.huinker@gmail.com> Discussion: https://postgr.es/m/CAMerE0oz+8G-aORZL_BJcPxnBqewZAvND4bSUysjz+r-oT1BxQ@mail.gmail.com	2021-08-16 20:06:54 +02:00
Peter Eisentraut	f4f4a649d8	Message style improvements	2021-08-07 12:09:37 +02:00
Heikki Linnakangas	317632f307	Move InRecovery and standbyState global vars to xlogutils.c. They are used in code that runs both during normal operation and during WAL replay, and needs to behave differently during replay. Move them to xlogutils.c, because that's where we have other helper functions used by redo routines. Reviewed-by: Andres Freund Discussion: https://www.postgresql.org/message-id/b3b71061-4919-e882-4857-27e370ab134a%40iki.fi	2021-07-31 09:50:26 +03:00
Tom Lane	f10f0ae420	Replace RelationOpenSmgr() with RelationGetSmgr(). The idea behind this patch is to design out bugs like the one fixed by commit `9d523119f`. Previously, once one did RelationOpenSmgr(rel), it was considered okay to access rel->rd_smgr directly for some not-very-clear interval. But since that pointer will be cleared by relcache flushes, we had bugs arising from overreliance on a previous RelationOpenSmgr call still being effective. Now, very little code except that in rel.h and relcache.c should ever touch the rd_smgr field directly. The normal coding rule is to use RelationGetSmgr(rel) and not expect the result to be valid for longer than one smgr function call. There are a couple of places where using the function every single time seemed like overkill, but they are now annotated with large warning comments. Amul Sul, after an idea of mine. Discussion: https://postgr.es/m/CANiYTQsU7yMFpQYnv=BrcRVqK_3U3mtAzAsJCaqtzsDHfsUbdQ@mail.gmail.com	2021-07-12 17:01:36 -04:00
Michael Paquier	70685385d7	Use WaitLatch() instead of pg_usleep() at end-of-vacuum truncation This has the advantage to make a process more responsive when the postmaster dies, even if the wait time was rather limited as there was only a 50ms timeout here. Another advantage of this change is for monitoring, as we gain a new wait event for the end-of-vacuum truncation. Author: Bharath Rupireddy Reviewed-by: Aleksander Alekseev, Thomas Munro, Michael Paquier Discussion: https://postgr.es/m/CALj2ACU4AdPCq6NLfcA-ZGwX7pPCK5FgEj-CAU0xCKzkASSy_A@mail.gmail.com	2021-07-02 12:58:34 +09:00
Andrew Dunstan	e1c1c30f63	Pre branch pgindent / pgperltidy run Along the way make a slight adjustment to src/include/utils/queryjumble.h to avoid an unused typedef.	2021-06-28 11:05:54 -04:00
Peter Geoghegan	e8f201ab82	Remove overzealous VACUUM failsafe assertions. The failsafe can trigger when index processing is already disabled. This can happen when VACUUM's INDEX_CLEANUP parameter is "off" and the failsafe happens to trigger. Remove assertions that assume that index processing is directly tied to the failsafe. Oversight in commit `c242baa4`, which made it possible for the failsafe to trigger in a two-pass strategy VACUUM that has yet to make its first call to lazy_vacuum_all_indexes().	2021-06-20 18:14:00 -07:00
Peter Geoghegan	3499df0dee	Support disabling index bypassing by VACUUM. Generalize the INDEX_CLEANUP VACUUM parameter (and the corresponding reloption): make it into a ternary style boolean parameter. It now exposes a third option, "auto". The "auto" option (which is now the default) enables the "bypass index vacuuming" optimization added by commit `1e55e7d1`. "VACUUM (INDEX_CLEANUP TRUE)" is redefined to once again make VACUUM simply do any required index vacuuming, regardless of how few dead tuples are encountered during the first scan of the target heap relation (unless there are exactly zero). This gives users a way of opting out of the "bypass index vacuuming" optimization, if for whatever reason that proves necessary. It is also expected to be used by PostgreSQL developers as a testing option from time to time. "VACUUM (INDEX_CLEANUP FALSE)" does the same thing as it always has: it forcibly disables both index vacuuming and index cleanup. It's not expected to be used much in PostgreSQL 14. The failsafe mechanism added by commit `1e55e7d1` addresses the same problem in a simpler way. INDEX_CLEANUP can now be thought of as a testing and compatibility option. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-By: Justin Pryzby <pryzby@telsasoft.com> Discussion: https://postgr.es/m/CAH2-WznrBoCST4_Gxh_G9hA8NzGUbeBGnOUC8FcXcrhqsv6OHQ@mail.gmail.com	2021-06-18 20:04:07 -07:00
Peter Geoghegan	958cfbcf2d	Remove unneeded field from VACUUM state. Bugfix commit `5fc89376` effectively made the lock_waiter_detected field from vacuumlazy.c's global state struct into private state owned by lazy_truncate_heap(). Finish this off by replacing the struct field with a local variable.	2021-06-15 08:59:36 -07:00
Michael Paquier	dbab0c07e5	Remove forced toast recompression in VACUUM FULL/CLUSTER The extra checks added by the recompression of toast data introduced in `bbe0a81` is proving to have a performance impact on VACUUM or CLUSTER even if no recompression is done. This is more noticeable with more toastable columns that contain non-NULL values. Improvements could be done to make those extra checks less expensive, but that's not material for 14 at this stage, and we are not sure either if the code path of VACUUM FULL/CLUSTER is adapted for this job. Per discussion with several people, including Andres Freund, Robert Haas, Álvaro Herrera, Tom Lane and myself. Discussion: https://postgr.es/m/20210527003144.xxqppojoiwurc2iz@alap3.anarazel.de	2021-06-14 09:25:50 +09:00
Tomas Vondra	d1f0aa7696	Fix pg_visibility regression failure with CLOBBER_CACHE_ALWAYS Commit `8e03eb92e9` reverted a bit too much code, reintroducing one of the issues fixed by `39b66a91bd` - a page might have been left partially empty after relcache invalidation. Reported-By: Tom Lane Author: Masahiko Sawada Discussion: https://postgr.es/m/822752.1623032114@sss.pgh.pa.us Discussion: https://postgr.es/m/CAD21AoA%3D%3Df2VSw3c-Cp_y%3DWLKHMKc1D6s7g3YWsCOvgaYPpJcg%40mail.gmail.com	2021-06-08 19:33:11 +02:00
David Rowley	f736e188ce	Standardize usages of appendStringInfo and appendPQExpBuffer Fix a few places that were using appendStringInfo() when they should have been using appendStringInfoString(). Also some cases of appendPQExpBuffer() that would have been better suited to use appendPQExpBufferChar(), and finally, some places that used appendPQExpBuffer() when appendPQExpBufferStr() would have suited better. There are no bugs are being fixed here. The aim is just to make the code use the most optimal function for the job. All the code being changed here is new to PG14. It makes sense to fix these before we branch for PG15. There are a few other places that we could fix, but those cases are older code so fixing those seems less worthwhile as it may cause unnecessary back-patching pain in the future. Author: Hou Zhijie Discussion: https://postgr.es/m/OS0PR01MB5716732158B1C4142C6FE375943D9@OS0PR01MB5716.jpnprd01.prod.outlook.com	2021-06-03 16:38:03 +12:00
Tomas Vondra	8e03eb92e9	Revert most of `39b66a91bd` Reverts most of commit `39b66a91bd`, which was found to cause significant regression for REFRESH MATERIALIZED VIEW. This means only rows inserted by heap_multi_insert will benefit from the optimization, implemented in commit `7db0cd2145`. Reported-by: Masahiko Sawada Discussion: https://postgr.es/m/CAD21AoA%3D%3Df2VSw3c-Cp_y%3DWLKHMKc1D6s7g3YWsCOvgaYPpJcg%40mail.gmail.com	2021-06-03 00:13:59 +02:00
Peter Geoghegan	9afdea9824	Fix VACUUM VERBOSE's LP_DEAD item pages output. Oversight in commit `5100010e`.	2021-05-27 17:09:16 -07:00
Tom Lane	e6241d8e03	Rethink definition of pg_attribute.attcompression. Redefine '\0' (InvalidCompressionMethod) as meaning "if we need to compress, use the current setting of default_toast_compression". This allows '\0' to be a suitable default choice regardless of datatype, greatly simplifying code paths that initialize tupledescs and the like. It seems like a more user-friendly approach as well, because now the default compression choice doesn't migrate into table definitions, meaning that changing default_toast_compression is usually sufficient to flip an installation's behavior; one needn't tediously issue per-column ALTER SET COMPRESSION commands. Along the way, fix a few minor bugs and documentation issues with the per-column-compression feature. Adopt more robust APIs for SetIndexStorageProperties and GetAttributeCompression. Bump catversion because typical contents of attcompression will now be different. We could get away without doing that, but it seems better to ensure v14 installations all agree on this. (We already forced initdb for beta2, anyway.) Discussion: https://postgr.es/m/626613.1621787110@sss.pgh.pa.us	2021-05-27 13:24:27 -04:00
Michael Paquier	190fa5a00a	Fix typo in heapam.c Author: Hou Zhijie Discussion: https://postgr.es/m/OS0PR01MB571612191738540B27A8DE5894249@OS0PR01MB5716.jpnprd01.prod.outlook.com	2021-05-26 19:53:03 +09:00
Michael Paquier	fb0f5f0172	Fix memory leak when de-toasting compressed values in VACUUM FULL/CLUSTER VACUUM FULL and CLUSTER can be used to enforce the use of the existing compression method of a toastable column if a value currently stored is compressed with a method that does not match the column's defined method. The code in charge of decompressing and recompressing toast values at rewrite left around the detoasted values, causing an accumulation of memory allocated in TopTransactionContext. When processing large relations, this could cause the system to run out of memory. The detoasted values are not needed once their tuple is rewritten, and this commit ensures that the necessary cleanup happens. Issue introduced by `bbe0a81d`. The comments of the area are reordered a bit while on it. Reported-by: Andres Freund Analyzed-by: Andres Freund Author: Michael Paquier Reviewed-by: Dilip Kumar Discussion: https://postgr.es/m/20210521211929.pcehg6f23icwstdb@alap3.anarazel.de	2021-05-25 14:27:18 +09:00
Peter Geoghegan	c242baa4a8	Consider triggering VACUUM failsafe during scan. The wraparound failsafe mechanism added by commit `1e55e7d1` handled the one-pass strategy case (i.e. the "table has no indexes" case) by adding a dedicated failsafe check. This made up for the fact that the usual one-pass checks inside lazy_vacuum_all_indexes() cannot ever be reached during a one-pass strategy VACUUM. This approach failed to account for two-pass VACUUMs that opt out of index vacuuming up-front. The INDEX_CLEANUP off case in the only case that works like that. Fix this by performing a failsafe check every 4GB during the first scan of the heap, regardless of the details of the VACUUM. This eliminates the special case, and will make the failsafe trigger more reliably. Author: Peter Geoghegan <pg@bowt.ie> Reported-By: Andres Freund <andres@anarazel.de> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Discussion: https://postgr.es/m/20210424002921.pb3t7h6frupdqnkp@alap3.anarazel.de	2021-05-24 17:14:02 -07:00
Peter Geoghegan	fbe9b80610	Fix autovacuum log output heap truncation issue. The percentage of blocks from the table value reported by autovacuum log output (following commit `5100010ee4`) should never exceed 100% because it describes the state of the table back when lazy_vacuum() was called. The value could nevertheless exceed 100% in the event of heap relation truncation. We failed to compensate for how truncation affects rel_pages. Fix the faulty accounting by using the original rel_pages value instead of the current/final rel_pages value. Reported-By: Andres Freund <andres@anarazel.de> Discussion: https://postgr.es/m/20210423204306.5osfpkt2ggaedyvy@alap3.anarazel.de	2021-05-13 16:07:17 -07:00
Tom Lane	def5b065ff	Initial pgindent and pgperltidy run for v14. Also "make reformat-dat-files". The only change worthy of note is that pgindent messed up the formatting of launcher.c's struct LogicalRepWorkerId, which led me to notice that that struct wasn't used at all anymore, so I just took it out.	2021-05-12 13:14:10 -04:00
Fujii Masao	d780d7c088	Change data type of counters in BufferUsage and WalUsage from long to int64. Previously long was used as the data type for some counters in BufferUsage and WalUsage. But long is only four byte, e.g., on Windows, and it's entirely possible to wrap a four byte counter. For example, emitting more than four billion WAL records in one transaction isn't actually particularly rare. To avoid the overflows of those counters, this commit changes the data type of them from long to int64. Suggested-by: Andres Freund Author: Masahiro Ikeda Reviewed-by: Fujii Masao Discussion: https://postgr.es/m/20201221211650.k7b53tcnadrciqjo@alap3.anarazel.de Discussion: https://postgr.es/m/af0964ac-7080-1984-dc23-513754987716@oss.nttdata.com	2021-05-12 09:56:34 +09:00
Tom Lane	049e1e2edb	Fix mishandling of resjunk columns in ON CONFLICT ... UPDATE tlists. It's unusual to have any resjunk columns in an ON CONFLICT ... UPDATE list, but it can happen when MULTIEXPR_SUBLINK SubPlans are present. If it happens, the ON CONFLICT UPDATE code path would end up storing tuples that include the values of the extra resjunk columns. That's fairly harmless in the short run, but if new columns are added to the table then the values would become accessible, possibly leading to malfunctions if they don't match the datatypes of the new columns. This had escaped notice through a confluence of missing sanity checks, including * There's no cross-check that a tuple presented to heap_insert or heap_update matches the table rowtype. While it's difficult to check that fully at reasonable cost, we can easily add assertions that there aren't too many columns. * The output-column-assignment cases in execExprInterp.c lacked any sanity checks on the output column numbers, which seems like an oversight considering there are plenty of assertion checks on input column numbers. Add assertions there too. * We failed to apply nodeModifyTable's ExecCheckPlanOutput() to the ON CONFLICT UPDATE tlist. That wouldn't have caught this specific error, since that function is chartered to ignore resjunk columns; but it sure seems like a bad omission now that we've seen this bug. In HEAD, the right way to fix this is to make the processing of ON CONFLICT UPDATE tlists work the same as regular UPDATE tlists now do, that is don't add "SET x = x" entries, and use ExecBuildUpdateProjection to evaluate the tlist and combine it with old values of the not-set columns. This adds a little complication to ExecBuildUpdateProjection, but allows removal of a comparable amount of now-dead code from the planner. In the back branches, the most expedient solution seems to be to (a) use an output slot for the ON CONFLICT UPDATE projection that actually matches the target table, and then (b) invent a variant of ExecBuildProjectionInfo that can be told to not store values resulting from resjunk columns, so it doesn't try to store into nonexistent columns of the output slot. (We can't simply ignore the resjunk columns altogether; they have to be evaluated for MULTIEXPR_SUBLINK to work.) This works back to v10. In 9.6, projections work much differently and we can't cheaply give them such an option. The 9.6 version of this patch works by inserting a JunkFilter when it's necessary to get rid of resjunk columns. In addition, v11 and up have the reverse problem when trying to perform ON CONFLICT UPDATE on a partitioned table. Through a further oversight, adjust_partition_tlist() discarded resjunk columns when re-ordering the ON CONFLICT UPDATE tlist to match a partition. This accidentally prevented the storing-bogus-tuples problem, but at the cost that MULTIEXPR_SUBLINK cases didn't work, typically crashing if more than one row has to be updated. Fix by preserving resjunk columns in that routine. (I failed to resist the temptation to add more assertions there too, and to do some minor code beautification.) Per report from Andres Freund. Back-patch to all supported branches. Security: CVE-2021-32028	2021-05-10 11:02:29 -04:00
Peter Geoghegan	c9787385db	Remove overzealous VACUUM visibility map assertion. The all_visible_according_to_vm variable's value is inherently prone to becoming invalidated concurrently, since it is set before we even acquire a lock on a related heap page buffer. Oversight in commit `7136bf34`, which added the assertion in passing. Author: Masahiko Sawada <sawada.mshk@gmail.com> Reported-By: Tang <tanghy.fnst@fujitsu.com> Diagnosed-By:: Masahiko Sawada <sawada.mshk@gmail.com> Discussion: https://postgr.es/m/CAD21AoDzgc8_MYrA5m1fyydomw_eVKtQiYh7sfDK4KEhdMsf_g@mail.gmail.com	2021-05-06 13:17:39 -07:00
Peter Geoghegan	7136bf34f2	Document LP_DEAD accounting issues in VACUUM. Document VACUUM's soft assumption that any LP_DEAD items encountered during pruning will become LP_UNUSED items before VACUUM finishes up. This is integral to the accounting used by VACUUM to generate its final report on the table to the stats collector. It also affects how VACUUM determines which heap pages are truncatable. In both cases VACUUM is concerned with the likely contents of the page in the near future, not the current contents of the page. This state of affairs created the false impression that VACUUM's dead tuple accounting had significant difference with similar accounting used during ANALYZE. There were and are no substantive differences, at least when the soft assumption completely works out. This is far clearer now. Also document cases where things don't quite work out for VACUUM's dead tuple accounting. It's possible that a significant number of LP_DEAD items will be left behind by VACUUM, and won't be recorded as remaining dead tuples in VACUUM's statistics collector report. This behavior dates back to commit `a96c41fe`, which taught VACUUM to run without index and heap vacuuming at the user's request. The failsafe mechanism added to VACUUM more recently by commit `1e55e7d1` takes the same approach to dead tuple accounting. Reported-By: Masahiko Sawada <sawada.mshk@gmail.com> Discussion: https://postgr.es/m/CAH2-Wz=Jmtu18PrsYq3EvvZJGOmZqSO2u3bvKpx9xJa5uhNp=Q@mail.gmail.com	2021-04-19 18:55:31 -07:00
Peter Eisentraut	f59b58e2a1	Use correct format placeholder for block numbers Should be %u rather than %d.	2021-04-17 09:40:50 +02:00
Peter Geoghegan	60f1f09ff4	Don't truncate heap when VACUUM's failsafe is in effect. It seems like a good idea to bypass heap truncation when the wraparound failsafe mechanism (which was added in commit `1e55e7d1`) is in effect. Deliberately don't bypass heap truncation in the INDEX_CLEANUP=off case, even though it is similar to the failsafe case. There is already a separate reloption (and related VACUUM parameter) for that. Reported-By: Masahiko Sawada <sawada.mshk@gmail.com> Discussion: https://postgr.es/m/CAD21AoDWRh6oTN5T8wa+cpZUVpHXET8BJ8Da7WHVHpwkPP6KLg@mail.gmail.com	2021-04-13 12:58:31 -07:00
Tom Lane	34f581c39e	Avoid improbable PANIC during heap_update. heap_update needs to clear any existing "all visible" flag on the old tuple's page (and on the new page too, if different). Per coding rules, to do this it must acquire pin on the appropriate visibility-map page while not holding exclusive buffer lock; which creates a race condition since someone else could set the flag whenever we're not holding the buffer lock. The code is supposed to handle that by re-checking the flag after acquiring buffer lock and retrying if it became set. However, one code path through heap_update itself, as well as one in its subroutine RelationGetBufferForTuple, failed to do this. The end result, in the unlikely event that a concurrent VACUUM did set the flag while we're transiently not holding lock, is a non-recurring "PANIC: wrong buffer passed to visibilitymap_clear" failure. This has been seen a few times in the buildfarm since recent VACUUM changes that added code paths that could set the all-visible flag while holding only exclusive buffer lock. Previously, the flag was (usually?) set only after doing LockBufferForCleanup, which would insist on buffer pin count zero, thus preventing the flag from becoming set partway through heap_update. However, it's clear that it's heap_update not VACUUM that's at fault here. What's less clear is whether there is any hazard from these bugs in released branches. heap_update is certainly violating API expectations, but if there is no code path that can set all-visible without a cleanup lock then it's only a latent bug. That's not 100% certain though, besides which we should worry about extensions or future back-patch fixes that could introduce such code paths. I chose to back-patch to v12. Fixing RelationGetBufferForTuple before that would require also back-patching portions of older fixes (notably `0d1fe9f74`), which is more code churn than seems prudent to fix a hypothetical issue. Discussion: https://postgr.es/m/2247102.1618008027@sss.pgh.pa.us	2021-04-13 12:17:24 -04:00
Thomas Munro	b1df6b696b	Fix potential SSI hazard in heap_update(). Commit `6f38d4dac3` failed to heed a warning about the stability of the value pointed to by "otid". The caller is allowed to pass in a pointer to newtup->t_self, which will be updated during the execution of the function. Instead, the SSI check should use the value we copy into oldtup.t_self near the top of the function. Not a live bug, because newtup->t_self doesn't really get updated until a bit later, but it was confusing and broke the rule established by the comment. Back-patch to 13. Reported-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/2689164.1618160085%40sss.pgh.pa.us	2021-04-13 13:02:56 +12:00
Peter Geoghegan	5100010ee4	Teach VACUUM to bypass unnecessary index vacuuming. VACUUM has never needed to call ambulkdelete() for each index in cases where there are precisely zero TIDs in its dead_tuples array by the end of its first pass over the heap (also its only pass over the heap in this scenario). Index vacuuming is simply not required when this happens. Index cleanup will still go ahead, but in practice most calls to amvacuumcleanup() are usually no-ops when there were zero preceding ambulkdelete() calls. In short, VACUUM has generally managed to avoid index scans when there were clearly no index tuples to delete from indexes. But cases with _close to_ no index tuples to delete were another matter -- a round of ambulkdelete() calls took place (one per index), each of which performed a full index scan. VACUUM now behaves just as if there were zero index tuples to delete in cases where there are in fact "virtually zero" such tuples. That is, it can now bypass index vacuuming and heap vacuuming as an optimization (though not index cleanup). Whether or not VACUUM bypasses indexes is determined dynamically, based on the just-observed number of heap pages in the table that have one or more LP_DEAD items (LP_DEAD items in heap pages have a 1:1 correspondence with index tuples that still need to be deleted from each index in the worst case). We only skip index vacuuming when 2% or less of the table's pages have one or more LP_DEAD items -- bypassing index vacuuming as an optimization must not noticeably impede setting bits in the visibility map. As a further condition, the dead_tuples array (i.e. VACUUM's array of LP_DEAD item TIDs) must not exceed 32MB at the point that the first pass over the heap finishes, which is also when the decision to bypass is made. (The VACUUM must also have been able to fit all TIDs in its maintenance_work_mem-bound dead_tuples space, though with a default maintenance_work_mem setting it can't matter.) This avoids surprising jumps in the duration and overhead of routine vacuuming with workloads where successive VACUUM operations consistently have almost zero dead index tuples. The number of LP_DEAD items may well accumulate over multiple VACUUM operations, before finally the threshold is crossed and VACUUM performs conventional index vacuuming. Even then, the optimization will have avoided a great deal of largely unnecessary index vacuuming. In the future we may teach VACUUM to skip index vacuuming on a per-index basis, using a much more sophisticated approach. For now we only consider the extreme cases, where we can be quite confident that index vacuuming just isn't worth it using simple heuristics. Also log information about how many heap pages have one or more LP_DEAD items when autovacuum logging is enabled. Author: Masahiko Sawada <sawada.mshk@gmail.com> Author: Peter Geoghegan <pg@bowt.ie> Discussion: https://postgr.es/m/CAD21AoD0SkE11fMw4jD4RENAwBMcw1wasVnwpJVw3tVqPOQgAw@mail.gmail.com Discussion: https://postgr.es/m/CAH2-WzmkebqPd4MVGuPTOS9bMFvp9MDs5cRTCOsv1rQJ3jCbXw@mail.gmail.com	2021-04-07 16:14:54 -07:00
Peter Geoghegan	1e55e7d175	Add wraparound failsafe to VACUUM. Add a failsafe mechanism that is triggered by VACUUM when it notices that the table's relfrozenxid and/or relminmxid are dangerously far in the past. VACUUM checks the age of the table dynamically, at regular intervals. When the failsafe triggers, VACUUM takes extraordinary measures to finish as quickly as possible so that relfrozenxid and/or relminmxid can be advanced. VACUUM will stop applying any cost-based delay that may be in effect. VACUUM will also bypass any further index vacuuming and heap vacuuming -- it only completes whatever remaining pruning and freezing is required. Bypassing index/heap vacuuming is enabled by commit `8523492d`, which made it possible to dynamically trigger the mechanism already used within VACUUM when it is run with INDEX_CLEANUP off. It is expected that the failsafe will almost always trigger within an autovacuum to prevent wraparound, long after the autovacuum began. However, the failsafe mechanism can trigger in any VACUUM operation. Even in a non-aggressive VACUUM, where we're likely to not advance relfrozenxid, it still seems like a good idea to finish off remaining pruning and freezing. An aggressive/anti-wraparound VACUUM will be launched immediately afterwards. Note that the anti-wraparound VACUUM that follows will itself trigger the failsafe, usually before it even begins its first (and only) pass over the heap. The failsafe is controlled by two new GUCs: vacuum_failsafe_age, and vacuum_multixact_failsafe_age. There are no equivalent reloptions, since that isn't expected to be useful. The GUCs have rather high defaults (both default to 1.6 billion), and are expected to generally only be used to make the failsafe trigger sooner/more frequently. Author: Masahiko Sawada <sawada.mshk@gmail.com> Author: Peter Geoghegan <pg@bowt.ie> Discussion: https://postgr.es/m/CAD21AoD0SkE11fMw4jD4RENAwBMcw1wasVnwpJVw3tVqPOQgAw@mail.gmail.com Discussion: https://postgr.es/m/CAH2-WzmgH3ySGYeC-m-eOBsa2=sDwa292-CFghV4rESYo39FsQ@mail.gmail.com	2021-04-07 12:37:45 -07:00
Peter Geoghegan	3c3b8a4b26	Truncate line pointer array during VACUUM. Teach VACUUM to truncate the line pointer array of each heap page when a contiguous group of LP_UNUSED line pointers appear at the end of the array -- these unused and unreferenced items are excluded. This process occurs during VACUUM's second pass over the heap, right after LP_DEAD line pointers on the page (those encountered/pruned during the first pass) are marked LP_UNUSED. Truncation avoids line pointer bloat with certain workloads, particularly those involving continual range DELETEs and bulk INSERTs against the same table. Also harden heapam code to check for an out-of-range page offset number in places where we weren't already doing so. Author: Matthias van de Meent <boekewurm+postgres@gmail.com> Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-By: Peter Geoghegan <pg@bowt.ie> Discussion: https://postgr.es/m/CAEze2WjgaQc55Y5f5CQd3L=eS5CZcff2Obxp=O6pto8-f0hC4w@mail.gmail.com Discussion: https://postgr.es/m/CAH2-Wzn6a64PJM1Ggzm=uvx2otsopJMhFQj_g1rAj4GWr3ZSzw@mail.gmail.com	2021-04-07 08:47:15 -07:00
Peter Geoghegan	8523492d4e	Remove tupgone special case from vacuumlazy.c. Retry the call to heap_prune_page() in rare cases where there is disagreement between the heap_prune_page() call and the call to HeapTupleSatisfiesVacuum() that immediately follows. Disagreement is possible when a concurrently-aborted transaction makes a tuple DEAD during the tiny window between each step. This was the only case where a tuple considered DEAD by VACUUM still had storage following pruning. VACUUM's definition of dead tuples is now uniformly simple and unambiguous: dead tuples from each page are always LP_DEAD line pointers that were encountered just after we performed pruning (and just before we considered freezing remaining items with tuple storage). Eliminating the tupgone=true special case enables INDEX_CLEANUP=off style skipping of index vacuuming that takes place based on flexible, dynamic criteria. The INDEX_CLEANUP=off case had to know about skipping indexes up-front before now, due to a subtle interaction with the special case (see commit `dd695979`) -- this was a special case unto itself. Now there are no special cases. And so now it won't matter when or how we decide to skip index vacuuming: it won't affect how pruning behaves, and it won't be affected by any of the implementation details of pruning or freezing. Also remove XLOG_HEAP2_CLEANUP_INFO records. These are no longer necessary because we now rely entirely on heap pruning taking care of recovery conflicts. There is no longer any need to generate recovery conflicts for DEAD tuples that pruning just missed. This also means that heap vacuuming now uses exactly the same strategy for recovery conflicts as index vacuuming always has: REDO routines never need to process a latestRemovedXid from the WAL record, since earlier REDO of the WAL record from pruning is sufficient in all cases. The generic XLOG_HEAP2_CLEAN record type is now split into two new record types to reflect this new division (these are called XLOG_HEAP2_PRUNE and XLOG_HEAP2_VACUUM). Also stop acquiring a super-exclusive lock for heap pages when they're vacuumed during VACUUM's second heap pass. A regular exclusive lock is enough. This is correct because heap page vacuuming is now strictly a matter of setting the LP_DEAD line pointers to LP_UNUSED. No other backend can have a pointer to a tuple located in a pinned buffer that can be invalidated by a concurrent heap page vacuum operation. Heap vacuuming can now be thought of as conceptually similar to index vacuuming and conceptually dissimilar to heap pruning. Heap pruning now has sole responsibility for anything involving the logical contents of the database (e.g., managing transaction status information, recovery conflicts, considering what to do with HOT chains). Index vacuuming and heap vacuuming are now only concerned with recycling garbage items from physical data structures that back the logical database. Bump XLOG_PAGE_MAGIC due to pruning and heap page vacuum WAL record changes. Credit for the idea of retrying pruning a page to avoid the tupgone case goes to Andres Freund. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Andres Freund <andres@anarazel.de> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Discussion: https://postgr.es/m/CAH2-WznneCXTzuFmcwx_EyRQgfsfJAAsu+CsqRFmFXCAar=nJw@mail.gmail.com	2021-04-06 08:49:22 -07:00
Peter Geoghegan	7ab96cf6b3	Refactor lazy_scan_heap() loop. Add a lazy_scan_heap() subsidiary function that handles heap pruning and tuple freezing: lazy_scan_prune(). This is a great deal cleaner. The code that remains in lazy_scan_heap()'s per-block loop can now be thought of as code that either comes before or after the call to lazy_scan_prune(), which is now the clear focal point. This division is enforced by the way in which we now manage state. lazy_scan_prune() outputs state (using its own struct) that describes what to do with the page following pruning and freezing (e.g., visibility map maintenance, recording free space in the FSM). It doesn't get passed any special instructional state from the preamble code, though. Also cleanly separate the logic used by a VACUUM with INDEX_CLEANUP=off from the logic used by single-heap-pass VACUUMs. The former case is now structured as the omission of index and heap vacuuming by a two pass VACUUM. The latter case goes back to being used only when the table happens to have no indexes (just as it was before commit `a96c41fe`). This structure is much more natural, since the whole point of INDEX_CLEANUP=off is to skip the index and heap vacuuming that would otherwise take place. The single-heap-pass case doesn't skip any useful work, though -- it just does heap pruning and heap vacuuming together when the table happens to have no indexes. Both of these changes are preparation for an upcoming patch that generalizes the mechanism used by INDEX_CLEANUP=off. The later patch will allow VACUUM to give up on index and heap vacuuming dynamically, as problems emerge (e.g., with wraparound), so that an affected VACUUM operation can finish up as soon as possible. Also fix a very old bug in single-pass VACUUM VERBOSE output. We were reporting the number of tuples deleted via pruning as a direct substitute for reporting the number of LP_DEAD items removed in a function that deals with the second pass over the heap. But that doesn't work at all -- they're two different things. To fix, start tracking the total number of LP_DEAD items encountered during pruning, and use that in the report instead. A single pass VACUUM will always vacuum away whatever LP_DEAD items a heap page has immediately after it is pruned, so the total number of LP_DEAD items encountered during pruning equals the total number vacuumed-away. (They are _not_ equal in the INDEX_CLEANUP=off case, but that's okay because skipping index vacuuming is now a totally orthogonal concept to one-pass VACUUM.) Also stop reporting the count of LP_UNUSED items in VACUUM VERBOSE output. This makes the output of VACUUM VERBOSE more consistent with log_autovacuum's output (because it never showed information about LP_UNUSED items). VACUUM VERBOSE reported LP_UNUSED items left behind by the last VACUUM, and LP_UNUSED items created via pruning HOT chains during the current VACUUM (it never included LP_UNUSED items left behind by the current VACUUM's second pass over the heap). This makes it useless as an indicator of line pointer bloat, which must have been the original intention. (Like the first VACUUM VERBOSE issue, this issue was arguably an oversight in commit `282d2a03`, which added the heap-only tuple optimization.) Finally, stop reporting empty_pages in VACUUM VERBOSE output, and start reporting pages_removed instead. This also makes the output of VACUUM VERBOSE more consistent with log_autovacuum's output (which does not show empty_pages, but does show pages_removed). An empty page isn't meaningfully different to a page that is almost empty, or a page that is empty but for only a small number of remaining LP_UNUSED items. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Discussion: https://postgr.es/m/CAH2-WznneCXTzuFmcwx_EyRQgfsfJAAsu+CsqRFmFXCAar=nJw@mail.gmail.com	2021-04-06 07:49:39 -07:00
Peter Geoghegan	f6b8f19a08	Allocate access strategy in parallel VACUUM workers. Commit `49f49def` took entirely the wrong approach to fixing this issue. Just allocate a local buffer access strategy in each individual worker instead of trying to propagate state. This state was never propagated by parallel VACUUM in the first place. It looks like the only reason that this worked following commit `40d964ec` was that it involved static global variables, which are initialized to 0 per the C standard. A more comprehensive fix may be necessary, even on HEAD. This fix should at least get the buildfarm green once again. Thanks once again to Thomas Munro for continued off-list assistance with the issue.	2021-04-05 17:17:40 -07:00
Peter Geoghegan	49f49defe7	Propagate parallel VACUUM's buffer access strategy. Parallel VACUUM relied on global variable state from the leader process being propagated to workers on fork(). Commit `b4af70cb` removed most uses of global variables inside vacuumlazy.c, but did not account for the buffer access strategy state. To fix, propagate the state through shared memory instead. Per buildfarm failures on elver, curculio, and morepork. Many thanks to Thomas Munro for off-list assistance with this issue.	2021-04-05 14:56:56 -07:00
Peter Geoghegan	b4af70cb21	Simplify state managed by VACUUM. Reorganize the state struct used by VACUUM -- group related items together to make it easier to understand. Also stop relying on stack variables inside lazy_scan_heap() -- move those into the state struct instead. Doing things this way simplifies large groups of related functions whose function signatures had a lot of unnecessary redundancy. Switch over to using int64 for the struct fields used to count things that are reported to the user via log_autovacuum and VACUUM VERBOSE output. We were using double, but that doesn't seem to have any advantages. Using int64 makes it possible to add assertions that verify that the first pass over the heap (pruning) encounters precisely the same number of LP_DEAD items that get deleted from indexes later on, in the second pass over the heap. These assertions will be added in later commits. Finally, adjust the signatures of functions with IndexBulkDeleteResult pointer arguments in cases where there was ambiguity about whether or not the argument relates to a single index or all indexes. Functions now use the idiom that both ambulkdelete() and amvacuumcleanup() have always used (where appropriate): accept a mutable IndexBulkDeleteResult pointer argument, and return a result IndexBulkDeleteResult pointer to caller. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Discussion: https://postgr.es/m/CAH2-WzkeOSYwC6KNckbhk2b1aNnWum6Yyn0NKP9D-Hq1LGTDPw@mail.gmail.com	2021-04-05 13:21:44 -07:00
Noah Misch	0ff8bbdee1	Accept slightly-filled pages for tuples larger than fillfactor. We always inserted a larger-than-fillfactor tuple into a newly-extended page, even when existing pages were empty or contained nothing but an unused line pointer. This was unnecessary relation extension. Start tolerating page usage up to 1/8 the maximum space that could be taken up by line pointers. This is somewhat arbitrary, but it should allow more cases to reuse pages. This has no effect on tables with fillfactor=100 (the default). John Naylor and Floris van Nee. Reviewed by Matthias van de Meent. Reported by Floris van Nee. Discussion: https://postgr.es/m/6e263217180649339720afe2176c50aa@opammb0562.comp.optiver.com	2021-03-30 18:53:44 -07:00
David Rowley	af527705ed	Adjust design of per-worker parallel seqscan data struct The design of the data structures which allow storage of the per-worker memory during parallel seq scans were not ideal. The work done in `56788d215` required an additional data structure to allow workers to remember the range of pages that had been allocated to them for processing during a parallel seqscan. That commit added a void pointer field to TableScanDescData to allow heapam to store the per-worker allocation information. However putting the field there made very little sense given that we have AM specific structs for that, e.g. HeapScanDescData. Here we remove the void pointer field from TableScanDescData and add a dedicated field for this purpose to HeapScanDescData. Previously we also allocated memory for this parallel per-worker data for all scans, regardless if it was a parallel scan or not. This was just a wasted allocation for non-parallel scans, so here we make the allocation conditional on the scan being parallel. Also, add previously missing pfree() to free the per-worker data in heap_endscan(). Reported-by: Andres Freund Reviewed-by: Andres Freund Discussion: https://postgr.es/m/20210317023101.anvejcfotwka6gaa@alap3.anarazel.de	2021-03-30 10:17:09 +13:00
Michael Paquier	a1999a01bb	Sanitize the term "combo CID" in code comments Combo CIDs were referred in the code comments using different terms across various places of the code, so unify a bit the term used with what is currently in use in some of the READMEs. Author: "Hou, Zhijie" Discussion: https://postgr.es/m/1d42865c91404f46af4562532fdbea31@G08CNEXMBPEKD05.g08.fujitsu.local	2021-03-25 16:08:03 +09:00
Michael Paquier	99dd75fb99	Reword slightly logs generated for index stats in autovacuum Using "remain" is confusing, as it implies that the index file can shrink. Instead, use "in total". Per discussion with Peter Geoghegan. Discussion: https://postgr.es/m/CAH2-WzkYgHZzpGOwR14CScJsjaQpvJrEkEfkh_=wGhzLb=yVdQ@mail.gmail.com	2021-03-24 09:36:03 +09:00
Tom Lane	9d523119fd	Avoid possible crash while finishing up a heap rewrite. end_heap_rewrite was not careful to ensure that the target relation is open at the smgr level before performing its final smgrimmedsync. In ordinary cases this is no problem, because it would have been opened earlier during the rewrite. However a crash can be reproduced by re-clustering an empty table with CLOBBER_CACHE_ALWAYS enabled. Although that exact scenario does not crash in v13, I think that's a chance result of unrelated planner changes, and the problem is likely still reachable with other test cases. The true proximate cause of this failure is commit `c6b92041d`, which replaced a call to heap_sync (which was careful about opening smgr) with a direct call to smgrimmedsync. Hence, back-patch to v13. Amul Sul, per report from Neha Sharma; cosmetic changes and test case by me. Discussion: https://postgr.es/m/CANiYTQsU7yMFpQYnv=BrcRVqK_3U3mtAzAsJCaqtzsDHfsUbdQ@mail.gmail.com	2021-03-23 11:24:16 -04:00
Michael Paquier	5aed6a1fc2	Add per-index stats information in verbose logs of autovacuum Once a relation's autovacuum is completed, the logs include more information about this relation state if the threshold of log_autovacuum_min_duration (or its relation option) is reached, with for example contents about the statistics of the VACUUM operation for the relation, WAL and system usage. This commit adds more information about the statistics of the relation's indexes, with one line of logs generated for each index. The index stats were already calculated, but not printed in the context of autovacuum yet. While on it, some refactoring is done to keep track of the index statistics directly within LVRelStats, simplifying some routines related to parallel VACUUMs. Author: Masahiko Sawada Reviewed-by: Michael Paquier, Euler Taveira Discussion: https://postgr.es/m/CAD21AoAy6SxHiTivh5yAPJSUE4S=QRPpSZUdafOSz0R+fRcM6Q@mail.gmail.com	2021-03-23 13:25:14 +09:00
Bruce Momjian	95d77149c5	Add macro RelationIsPermanent() to report relation permanence Previously, to check relation permanence, the Relation's Form_pg_class structure member relpersistence was compared to the value RELPERSISTENCE_PERMANENT ("p"). This commit adds the macro RelationIsPermanent() and is used in appropirate places to simplify the code. This matches other RelationIs* macros. This macro will be used in more places in future cluster file encryption patches. Discussion: https://postgr.es/m/20210318153134.GH20766@tamriel.snowman.net	2021-03-22 20:23:52 -04:00
Robert Haas	bbe0a81db6	Allow configurable LZ4 TOAST compression. There is now a per-column COMPRESSION option which can be set to pglz (the default, and the only option in up until now) or lz4. Or, if you like, you can set the new default_toast_compression GUC to lz4, and then that will be the default for new table columns for which no value is specified. We don't have lz4 support in the PostgreSQL code, so to use lz4 compression, PostgreSQL must be built --with-lz4. In general, TOAST compression means compression of individual column values, not the whole tuple, and those values can either be compressed inline within the tuple or compressed and then stored externally in the TOAST table, so those properties also apply to this feature. Prior to this commit, a TOAST pointer has two unused bits as part of the va_extsize field, and a compessed datum has two unused bits as part of the va_rawsize field. These bits are unused because the length of a varlena is limited to 1GB; we now use them to indicate the compression type that was used. This means we only have bit space for 2 more built-in compresison types, but we could work around that problem, if necessary, by introducing a new vartag_external value for any further types we end up wanting to add. Hopefully, it won't be too important to offer a wide selection of algorithms here, since each one we add not only takes more coding but also adds a build dependency for every packager. Nevertheless, it seems worth doing at least this much, because LZ4 gets better compression than PGLZ with less CPU usage. It's possible for LZ4-compressed datums to leak into composite type values stored on disk, just as it is for PGLZ. It's also possible for LZ4-compressed attributes to be copied into a different table via SQL commands such as CREATE TABLE AS or INSERT .. SELECT. It would be expensive to force such values to be decompressed, so PostgreSQL has never done so. For the same reasons, we also don't force recompression of already-compressed values even if the target table prefers a different compression method than was used for the source data. These architectural decisions are perhaps arguable but revisiting them is well beyond the scope of what seemed possible to do as part of this project. However, it's relatively cheap to recompress as part of VACUUM FULL or CLUSTER, so this commit adjusts those commands to do so, if the configured compression method of the table happens not to match what was used for some column value stored therein. Dilip Kumar. The original patches on which this work was based were written by Ildus Kurbangaliev, and those were patches were based on even earlier work by Nikita Glukhov, but the design has since changed very substantially, since allow a potentially large number of compression methods that could be added and dropped on a running system proved too problematic given some of the architectural issues mentioned above; the choice of which specific compression method to add first is now different; and a lot of the code has been heavily refactored. More recently, Justin Przyby helped quite a bit with testing and reviewing and this version also includes some code contributions from him. Other design input and review from Tomas Vondra, Álvaro Herrera, Andres Freund, Oleg Bartunov, Alexander Korotkov, and me. Discussion: http://postgr.es/m/20170907194236.4cefce96%40wp.localdomain Discussion: http://postgr.es/m/CAFiTN-uUpX3ck%3DK0mLEk-G_kUQY%3DSNOTeqdaNRR9FMdQrHKebw%40mail.gmail.com	2021-03-19 15:10:38 -04:00
Peter Geoghegan	fbe4cb3bd4	Fix comment about promising tuples. Oversight in commit `d168b66682`, which added bottom-up index deletion.	2021-03-16 13:38:52 -07:00
Stephen Frost	94d13d474d	Improve logging of auto-vacuum and auto-analyze When logging auto-vacuum and auto-analyze activity, include the I/O timing if track_io_timing is enabled. Also, for auto-analyze, add the read rate and the dirty rate, similar to how that information has historically been logged for auto-vacuum. Stephen Frost and Jakub Wartak Reviewed-By: Heikki Linnakangas, Tomas Vondra Discussion: https://www.postgresql.org/message-id/VI1PR0701MB69603A433348EDCF783C6ECBF6EF0%40VI1PR0701MB6960.eurprd07.prod.outlook.com	2021-03-16 14:46:48 -04:00
Peter Geoghegan	0ea71c93a0	Notice that heap page has dead items during VACUUM. Consistently set a flag variable that tracks whether the current heap page has a dead item during lazy vacuum's heap scan. We missed the common case where there is an preexisting (or even a new) LP_DEAD heap line pointer. Also make it clear that the variable might be affected by an existing line pointer, say from an earlier opportunistic pruning operation. This distinction is important because it's the main reason why we can't just use the nearby tups_vacuumed variable instead. No backpatch. In theory failing to set the page level flag variable had no consequences. Currently it is only used to defensively check if a page marked all visible has dead items, which should never happen anyway (if it does then the table must be corrupt). Author: Masahiko Sawada <sawada.mshk@gmail.com> Diagnosed-By: Masahiko Sawada <sawada.mshk@gmail.com> Discussion: https://postgr.es/m/CAD21AoAtZb4+HJT_8RoOXvu4HM-Zd4HKS3YSMCH6+-W=bDyh-w@mail.gmail.com	2021-03-14 18:05:57 -07:00
Peter Geoghegan	5f8727f5a6	VACUUM ANALYZE: Always update pg_class.reltuples. vacuumlazy.c sometimes fails to update pg_class entries for each index (to ensure that pg_class.reltuples is current), even though analyze.c assumed that that must have happened during VACUUM ANALYZE. There are at least a couple of reasons for this. For example, vacuumlazy.c could fail to update pg_class when the index AM indicated that its statistics are merely an estimate, per the contract for amvacuumcleanup() routines established by commit `e57345975c` back in 2006. Stop assuming that pg_class must have been updated with accurate statistics within VACUUM ANALYZE -- update pg_class for indexes at the same time as the table relation in all cases. That way VACUUM ANALYZE will never fail to keep pg_class.reltuples reasonably accurate. The only downside of this approach (compared to the old approach) is that it might inaccurately set pg_class.reltuples for indexes whose heap relation ends up with the same inaccurate value anyway. This doesn't seem too bad. We already consistently called vac_update_relstats() (to update pg_class) for the heap/table relation twice during any VACUUM ANALYZE -- once in vacuumlazy.c, and once in analyze.c. We now make sure that we call vac_update_relstats() at least once (though often twice) for each index. This is follow up work to commit `9f3665fb`, which dealt with issues in btvacuumcleanup(). Technically this fixes an unrelated issue, though. btvacuumcleanup() no longer provides an accurate num_index_tuples value following commit `9f3665fb` (when there was no btbulkdelete() call during the VACUUM operation in question), but hashvacuumcleanup() has worked in the same way for many years now. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Discussion: https://postgr.es/m/CAH2-WzknxdComjhqo4SUxVFk_Q1171GJO2ZgHZ1Y6pion6u8rA@mail.gmail.com Backpatch: 13-, just like commit `9f3665fb`.	2021-03-10 17:07:57 -08:00
Peter Eisentraut	bbaf315309	Add bound check before bsearch() for performance In the current lazy vacuum implementation, some index AMs such as btree indexes call lazy_tid_reaped() for each index tuple during ambulkdelete to check if the index tuple points to the (collected) garbage tuple. In that function, we simply call bsearch(), but we should be able to know the result without bsearch() if the index tuple points to the heap tuple that is out of range of the collected garbage tuples. Therefore, add a simple bound check before resorting to bsearch(). Testing has shown that this can give significant performance benefits. Author: Masahiko Sawada <masahiko.sawada@2ndquadrant.com> Discussion: https://www.postgresql.org/message-id/flat/CA+fd4k76j8jKzJzcx8UqEugvayaMSnQz0iLUt_XgBp-_-bd22A@mail.gmail.com	2021-03-10 15:19:37 +01:00
David Rowley	bb437f995d	Add TID Range Scans to support efficient scanning ranges of TIDs This adds a new executor node named TID Range Scan. The query planner will generate paths for TID Range scans when quals are discovered on base relations which search for ranges on the table's ctid column. These ranges may be open at either end. For example, WHERE ctid >= '(10,0)'; will return all tuples on page 10 and over. To support this, two new optional callback functions have been added to table AM. scan_set_tidrange is used to set the scan range to just the given range of TIDs. scan_getnextslot_tidrange fetches the next tuple in the given range. For AMs were scanning ranges of TIDs would not make sense, these functions can be set to NULL in the TableAmRoutine. The query planner won't generate TID Range Scan Paths in that case. Author: Edmund Horner, David Rowley Reviewed-by: David Rowley, Tomas Vondra, Tom Lane, Andres Freund, Zhihong Yu Discussion: https://postgr.es/m/CAMyN-kB-nFTkF=VA_JPwFNo08S0d-Yk0F741S2B7LDmYAi8eyA@mail.gmail.com	2021-02-27 22:59:36 +13:00
Peter Geoghegan	2376361839	VACUUM VERBOSE: Count "newly deleted" index pages. Teach VACUUM VERBOSE to report on pages deleted by the _current_ VACUUM operation -- these are newly deleted pages. VACUUM VERBOSE continues to report on the total number of deleted pages in the entire index (no change there). The former is a subset of the latter. The distinction between each category of deleted index page only arises with index AMs where page deletion is supported and is decoupled from page recycling for performance reasons. This is follow-up work to commit `e5d8a999`, which made nbtree store 64-bit XIDs (not 32-bit XIDs) in pages at the point at which they're deleted. Note that the btm_last_cleanup_num_delpages metapage field added by that commit usually gets set to pages_newly_deleted. The exceptions (the scenarios in which they're not equal) all seem to be tricky cases for the implementation (of page deletion and recycling) in general. Author: Peter Geoghegan <pg@bowt.ie> Discussion: https://postgr.es/m/CAH2-WznpdHvujGUwYZ8sihX%3Dd5u-tRYhi-F4wnV2uN2zHpMUXw%40mail.gmail.com	2021-02-25 14:32:18 -08:00
Alvaro Herrera	8deb6b38dc	Reinstate HEAP_XMAX_LOCK_ONLY\|HEAP_KEYS_UPDATED as allowed Commit `866e24d47d` added an assert that HEAP_XMAX_LOCK_ONLY and HEAP_KEYS_UPDATED cannot appear together, on the faulty assumption that the latter necessarily referred to an update and not a tuple lock; but that's wrong, because SELECT FOR UPDATE can use precisely that combination, as evidenced by the amcheck test case added here. Remove the Assert(), and also patch amcheck's verify_heapam.c to not complain if the combination is found. Also, out of overabundance of caution, update (across all branches) README.tuplock to be more explicit about this. Author: Julien Rouhaud <rjuju123@gmail.com> Reviewed-by: Mahendra Singh Thalor <mahi6run@gmail.com> Reviewed-by: Dilip Kumar <dilipbalaut@gmail.com> Discussion: https://postgr.es/m/20210124061758.GA11756@nol	2021-02-23 17:30:21 -03:00
Peter Eisentraut	6f6f284c7e	Simplify printing of LSNs Add a macro LSN_FORMAT_ARGS for use in printf-style printing of LSNs. Convert all applicable code to use it. Reviewed-by: Ashutosh Bapat <ashutosh.bapat@enterprisedb.com> Reviewed-by: Kyotaro Horiguchi <horikyota.ntt@gmail.com> Reviewed-by: Michael Paquier <michael@paquier.xyz> Discussion: https://www.postgresql.org/message-id/flat/CAExHW5ub5NaTELZ3hJUCE6amuvqAtsSxc7O+uK7y4t9Rrk23cw@mail.gmail.com	2021-02-23 10:27:02 +01:00
Alvaro Herrera	0f5505a881	Remove pointless HeapTupleHeaderIndicatesMovedPartitions calls Pavan Deolasee recently noted that a few of the HeapTupleHeaderIndicatesMovedPartitions calls added by commit `5db6df0c01` are useless, since they are done after comparing t_self with t_ctid. But because t_self can never be set to the magical values that indicate that the tuple moved partition, this can never succeed: if the first test fails (so we know t_self equals t_ctid), necessarily the second test will also fail. So these checks can be removed and no harm is done. There's no bug here, just a code legibility issue. Reported-by: Pavan Deolasee <pavan.deolasee@gmail.com> Discussion: https://postgr.es/m/20200929164411.GA15497@alvherre.pgsql	2021-02-22 16:51:44 -03:00
Andres Freund	8001cb77ee	Fix heap_page_prune() parameter order confusion introduced in `dc7420c2c9`. Both luckily and unluckily the passed values meant the same for all types. Luckily because that meant my confusion caused no harm, unluckily because otherwise the compiler might have warned... In passing, synchronize parameter names between definition and declaration. Reported-By: Peter Geoghegan <pg@bowt.ie> Author: Andres Freund <andres@anarazel.de> Discussion: https://postgr.es/m/CAH2-Wz=L=nBoepQdH9b5Qd0nMvepFT2CnT6sjWvvpOXa=K8HVQ@mail.gmail.com	2021-02-15 17:12:12 -08:00
Peter Geoghegan	7cde6b13a9	Adjust lazy_scan_heap() accounting comments. Explain which particular LP_DEAD line pointers get accounted for by the tups_vacuumed variable.	2021-02-14 19:28:37 -08:00
Robert Haas	d18e75664a	Remove CheckpointLock. Up until now, we've held this lock when performing a checkpoint or restartpoint, but commit `076a055acf` back in 2004 and commit `7e48b77b1c` from 2009, taken together, have removed all need for this. In the present code, there's only ever one process entitled to attempt a checkpoint: either the checkpointer, during normal operation, or the postmaster, during single-user operation. So, we don't need the lock. One possible concern in making this change is that it means that a substantial amount of code where HOLD_INTERRUPTS() was previously in effect due to the preceding LWLockAcquire() will now be running without that. This could mean that ProcessInterrupts() gets called in places from which it didn't before. However, this seems unlikely to do very much, because the checkpointer doesn't have any signal mapped to die(), so it's not clear how, for example, ProcDiePending = true could happen in the first place. Similarly with ClientConnectionLost and recovery conflicts. Also, if there are any such problems, we might want to fix them rather than reverting this, since running lots of code with interrupt handling suspended is generally bad. Patch by me, per an inquiry by Amul Sul. Review by Tom Lane and Michael Paquier. Discussion: http://postgr.es/m/CAAJ_b97XnBBfYeSREDJorFsyoD1sHgqnNuCi=02mNQBUMnA=FA@mail.gmail.com	2021-01-25 12:34:38 -05:00
David Rowley	16dfe253e3	Fix hypothetical bug in heap backward scans Both heapgettup() and heapgettup_pagemode() incorrectly set the first page to scan in a backward scan in which the number of pages to scan was specified by heap_setscanlimits(). The code incorrectly started the scan at the end of the relation when startBlk was 0, or otherwise at startBlk - 1, neither of which is correct when only scanning a subset of pages. The fix here checks if heap_setscanlimits() has changed the number of pages to scan and if so we set the first page to scan as the final page in the specified range during backward scans. Proper adjustment of this code was forgotten when heap_setscanlimits() was added in `7516f5259` back in 9.5. However, practice, nowhere in core code performs backward scans after having used heap_setscanlimits(), yet, it is possible an extension uses the heap functions in this way, hence backpatch. An upcoming patch does use heap_setscanlimits() with backward scans, so this must be fixed before that can go in. Author: David Rowley Discussion: https://postgr.es/m/CAApHDvpGc9h0_oVD2CtgBcxCS1N-qDYZSeBRnUh+0CWJA9cMaA@mail.gmail.com Backpatch-through: 9.5, all supported versions	2021-01-25 19:52:18 +13:00
Tomas Vondra	39b66a91bd	Fix COPY FREEZE with CLOBBER_CACHE_ALWAYS This adds code omitted from commit `7db0cd2145` by accident, which had two consequences. Firstly, only rows inserted by heap_multi_insert were frozen as expected when running COPY FREEZE, while heap_insert left rows unfrozen. That however includes rows in TOAST tables, so a lot of data might have been left unfrozen. Secondly, page might have been left partially empty after relcache invalidation. This addresses both of those issues. Discussion: https://postgr.es/m/CABOikdN-ptGv0mZntrK2Q8OtfUuAjqaYMGmkdU1dCKFtUxVLrg@mail.gmail.com	2021-01-24 01:08:11 +01:00
Tomas Vondra	7db0cd2145	Set PD_ALL_VISIBLE and visibility map bits in COPY FREEZE Make sure COPY FREEZE marks the pages as PD_ALL_VISIBLE and updates the visibility map. Until now we only marked individual tuples as frozen, but page-level flags were not updated, so the first VACUUM after the COPY FREEZE had to rewrite the whole table. This is a fairly old patch, and multiple people worked on it. The first version was written by Jeff Janes, and then reworked by Pavan Deolasee and Anastasia Lubennikova. Author: Anastasia Lubennikova, Pavan Deolasee, Jeff Janes Reviewed-by: Kuntal Ghosh, Jeff Janes, Tomas Vondra, Masahiko Sawada, Andres Freund, Ibrar Ahmed, Robert Haas, Tatsuro Ishii, Darafei Praliaskouski Discussion: https://postgr.es/m/CABOikdN-ptGv0mZntrK2Q8OtfUuAjqaYMGmkdU1dCKFtUxVLrg@mail.gmail.com Discussion: https://postgr.es/m/CAMkU%3D1w3osJJ2FneELhhNRLxfZitDgp9FPHee08NT2FQFmz_pQ%40mail.gmail.com	2021-01-17 22:28:26 +01:00
Peter Geoghegan	d168b66682	Enhance nbtree index tuple deletion. Teach nbtree and heapam to cooperate in order to eagerly remove duplicate tuples representing dead MVCC versions. This is "bottom-up deletion". Each bottom-up deletion pass is triggered lazily in response to a flood of versions on an nbtree leaf page. This usually involves a "logically unchanged index" hint (these are produced by the executor mechanism added by commit `9dc718bd`). The immediate goal of bottom-up index deletion is to avoid "unnecessary" page splits caused entirely by version duplicates. It naturally has an even more useful effect, though: it acts as a backstop against accumulating an excessive number of index tuple versions for any given _logical row_. Bottom-up index deletion complements what we might now call "top-down index deletion": index vacuuming performed by VACUUM. Bottom-up index deletion responds to the immediate local needs of queries, while leaving it up to autovacuum to perform infrequent clean sweeps of the index. The overall effect is to avoid certain pathological performance issues related to "version churn" from UPDATEs. The previous tableam interface used by index AMs to perform tuple deletion (the table_compute_xid_horizon_for_tuples() function) has been replaced with a new interface that supports certain new requirements. Many (perhaps all) of the capabilities added to nbtree by this commit could also be extended to other index AMs. That is left as work for a later commit. Extend deletion of LP_DEAD-marked index tuples in nbtree by adding logic to consider extra index tuples (that are not LP_DEAD-marked) for deletion in passing. This increases the number of index tuples deleted significantly in many cases. The LP_DEAD deletion process (which is now called "simple deletion" to clearly distinguish it from bottom-up deletion) won't usually need to visit any extra table blocks to check these extra tuples. We have to visit the same table blocks anyway to generate a latestRemovedXid value (at least in the common case where the index deletion operation's WAL record needs such a value). Testing has shown that the "extra tuples" simple deletion enhancement increases the number of index tuples deleted with almost any workload that has LP_DEAD bits set in leaf pages. That is, it almost never fails to delete at least a few extra index tuples. It helps most of all in cases that happen to naturally have a lot of delete-safe tuples. It's not uncommon for an individual deletion operation to end up deleting an order of magnitude more index tuples compared to the old naive approach (e.g., custom instrumentation of the patch shows that this happens fairly often when the regression tests are run). Add a further enhancement that augments simple deletion and bottom-up deletion in indexes that make use of deduplication: Teach nbtree's _bt_delitems_delete() function to support granular TID deletion in posting list tuples. It is now possible to delete individual TIDs from posting list tuples provided the TIDs have a tableam block number of a table block that gets visited as part of the deletion process (visiting the table block can be triggered directly or indirectly). Setting the LP_DEAD bit of a posting list tuple is still an all-or-nothing thing, but that matters much less now that deletion only needs to start out with the right _general_ idea about which index tuples are deletable. Bump XLOG_PAGE_MAGIC because xl_btree_delete changed. No bump in BTREE_VERSION, since there are no changes to the on-disk representation of nbtree indexes. Indexes built on PostgreSQL 12 or PostgreSQL 13 will automatically benefit from bottom-up index deletion (i.e. no reindexing required) following a pg_upgrade. The enhancement to simple deletion is available with all B-Tree indexes following a pg_upgrade, no matter what PostgreSQL version the user upgrades from. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Heikki Linnakangas <hlinnaka@iki.fi> Reviewed-By: Victor Yegorov <vyegorov@gmail.com> Discussion: https://postgr.es/m/CAH2-Wzm+maE3apHB8NOtmM=p-DO65j2V5GzAWCOEEuy3JZgb2g@mail.gmail.com	2021-01-13 09:21:32 -08:00
Peter Geoghegan	9dc718bdf2	Pass down "logically unchanged index" hint. Add an executor aminsert() hint mechanism that informs index AMs that the incoming index tuple (the tuple that accompanies the hint) is not being inserted by execution of an SQL statement that logically modifies any of the index's key columns. The hint is received by indexes when an UPDATE takes place that does not apply an optimization like heapam's HOT (though only for indexes where all key columns are logically unchanged). Any index tuple that receives the hint on insert is expected to be a duplicate of at least one existing older version that is needed for the same logical row. Related versions will typically be stored on the same index page, at least within index AMs that apply the hint. Recognizing the difference between MVCC version churn duplicates and true logical row duplicates at the index AM level can help with cleanup of garbage index tuples. Cleanup can intelligently target tuples that are likely to be garbage, without wasting too many cycles on less promising tuples/pages (index pages with little or no version churn). This is infrastructure for an upcoming commit that will teach nbtree to perform bottom-up index deletion. No index AM actually applies the hint just yet. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Victor Yegorov <vyegorov@gmail.com> Discussion: https://postgr.es/m/CAH2-Wz=CEKFa74EScx_hFVshCOn6AA5T-ajFASTdzipdkLTNQQ@mail.gmail.com	2021-01-13 08:11:00 -08:00
Bruce Momjian	ca3b37487b	Update copyright for 2021 Backpatch-through: 9.5	2021-01-02 13:06:25 -05:00
Peter Geoghegan	32d6287d2e	Get heap page max offset with buffer lock held. On further reflection it seems better to call PageGetMaxOffsetNumber() after acquiring a buffer lock on the page. This shouldn't really matter, but doing it this way is cleaner. Follow-up to commit `42288174`. Backpatch: 12-, just like commit `42288174`	2020-12-30 17:21:42 -08:00
Peter Geoghegan	4228817449	Fix index deletion latestRemovedXid bug. The logic for determining the latest removed XID for the purposes of generating recovery conflicts in REDO routines was subtly broken. It failed to follow links from HOT chains, and so failed to consider all relevant heap tuple headers in some cases. To fix, expand the loop that deals with LP_REDIRECT line pointers to also deal with HOT chains. The new version of the loop is loosely based on a similar loop from heap_prune_chain(). The impact of this bug is probably quite limited, since the horizon code necessarily deals with heap tuples that are pointed to by LP_DEAD-set index tuples. The process of setting LP_DEAD index tuples (e.g. within the kill_prior_tuple mechanism) is highly correlated with opportunistic pruning of pointed-to heap tuples. Plus the question of generating a recovery conflict usually comes up some time after index tuple LP_DEAD bits were initially set, unlike heap pruning, where a latestRemovedXid is generated at the point of the pruning operation (heap pruning has no deferred "would-be page split" style processing that produces conflicts lazily). Only backpatch to Postgres 12, the first version where this logic runs during original execution (following commit `558a9165e0`). The index latestRemovedXid mechanism has had the same bug since it first appeared over 10 years ago (in commit `a760893d`), but backpatching to all supported versions now seems like a bad idea on balance. Running the new improved code during recovery seems risky, especially given the lack of complaints from the field. Author: Peter Geoghegan <pg@bowt.ie> Discussion: https://postgr.es/m/CAH2-Wz=Eib393+HHcERK_9MtgNS7Ew1HY=RDC_g6GL46zM5C6Q@mail.gmail.com Backpatch: 12-	2020-12-30 16:29:05 -08:00
Tom Lane	b3817f5f77	Improve hash_create()'s API for some added robustness. Invent a new flag bit HASH_STRINGS to specify C-string hashing, which was formerly the default; and add assertions insisting that exactly one of the bits HASH_STRINGS, HASH_BLOBS, and HASH_FUNCTION be set. This is in hopes of preventing recurrences of the type of oversight fixed in commit `a1b8aa1e4` (i.e., mistakenly omitting HASH_BLOBS). Also, when HASH_STRINGS is specified, insist that the keysize be more than 8 bytes. This is a heuristic, but it should catch accidental use of HASH_STRINGS for integer or pointer keys. (Nearly all existing use-cases set the keysize to NAMEDATALEN or more, so there's little reason to think this restriction should be problematic.) Tweak hash_create() to insist that the HASH_ELEM flag be set, and remove the defaults it had for keysize and entrysize. Since those defaults were undocumented and basically useless, no callers omitted HASH_ELEM anyway. Also, remove memset's zeroing the HASHCTL parameter struct from those callers that had one. This has never been really necessary, and while it wasn't a bad coding convention it was confusing that some callers did it and some did not. We might as well save a few cycles by standardizing on "not". Also improve the documentation for hash_create(). In passing, improve reinit.c's usage of a hash table by storing the key as a binary Oid rather than a string; and, since that's a temporary hash table, allocate it in CurrentMemoryContext for neatness. Discussion: https://postgr.es/m/590625.1607878171@sss.pgh.pa.us	2020-12-15 11:38:53 -05:00
Michael Paquier	947789f1f5	Avoid using tuple from syscache for update of pg_database.datfrozenxid pg_database.datfrozenxid gets updated using an in-place update at the end of vacuum or autovacuum. Since `96cdeae`, as pg_database has a toast relation, it is possible for a pg_database tuple to have toast values if there is a large set of ACLs in place. In such a case, the in-place update would fail because of the flattening of the toast values done for the catcache entry fetched. Instead of using a copy from the catcache, this changes the logic to fetch the copy of the tuple by directly scanning pg_database. Per the lack of complaints on the matter, no backpatch is done. Note that before `96cdeae`, attempting to insert such a tuple to pg_database would cause a "row is too big" error, so the end-of-vacuum problem was not reachable. Author: Ashwin Agrawal, Junfeng Yang Discussion: https://postgr.es/m/DM5PR0501MB38800D9E4605BCA72DD35557CCE10@DM5PR0501MB3880.namprd05.prod.outlook.com	2020-12-08 12:13:19 +09:00
Fujii Masao	3df51ca8b3	Fix CLUSTER progress reporting of number of blocks scanned. Previously pg_stat_progress_cluster view reported the current block number in heap scan as the number of heap blocks scanned (i.e., heap_blks_scanned). This reported number could be incorrect when synchronize_seqscans is enabled, because it allowed the heap scan to start at block in middle. This could result in wraparounds in the heap_blks_scanned column when the heap scan wrapped around. This commit fixes the bug by calculating the number of blocks from the block that the heap scan starts at to the current block in scan, and reporting that number in the heap_blks_scanned column. Also, in pg_stat_progress_cluster view, previously heap_blks_scanned could not reach heap_blks_total at the end of heap scan phase if the last pages scanned were empty. This commit fixes the bug by manually updating heap_blks_scanned to the same value as heap_blks_total when the heap scan phase finishes. Back-patch to v12 where pg_stat_progress_cluster view was introduced. Reported-by: Matthias van de Meent Author: Matthias van de Meent Reviewed-by: Fujii Masao Discussion: https://postgr.es/m/CAEze2WjCBWSGkVfYag001Rc4+-nNLDpWM7QbyD6yPvuhKs-gYQ@mail.gmail.com	2020-11-27 20:16:44 +09:00
Peter Geoghegan	180cf876d4	Remove ineffective heapam CHECK_FOR_INTERRUPTS(). Remove a CHECK_FOR_INTERRUPTS() call that could never actually handle an interrupt. We always have a heap page buffer lock at this point. Having a useless CHECK_FOR_INTERRUPTS() call is harmless but misleading. It is probably possible to work around the immediate problem by moving the CHECK_FOR_INTERRUPTS() to before the heap page buffer lock is acquired. That isn't enough to make the function responsive to interrupts, though. The index AM caller will still hold an exclusive buffer lock of its own.	2020-11-09 09:00:12 -08:00
Alvaro Herrera	52eec1c53a	Message style improvements * Avoid pointlessly highlighting that an index vacuum was executed by a parallel worker; user doesn't care. * Don't give the impression that a non-concurrent reindex of an invalid index on a TOAST table would work, because it wouldn't. * Add a "translator:" comment for a mysterious message. Discussion: https://postgr.es/m/20201107034943.GA16596@alvherre.pgsql Reviewed-by: Michael Paquier <michael@paquier.xyz>	2020-11-07 19:33:43 -03:00
Michael Paquier	8a15e735be	Fix some grammar and typos in comments and docs The documentation fixes are backpatched down to where they apply. Author: Justin Pryzby Discussion: https://postgr.es/m/20201031020801.GD3080@telsasoft.com Backpatch-through: 9.6	2020-11-02 15:14:41 +09:00
Noah Misch	f90e80b913	Reproduce debug_query_string==NULL on parallel workers. Certain background workers initiate parallel queries while debug_query_string==NULL, at which point they attempted strlen(NULL) and died to SIGSEGV. Older debug_query_string observers allow NULL, so do likewise in these newer ones. Back-patch to v11, where commit `7de4a1bcc5` introduced the first of these. Discussion: https://postgr.es/m/20201014022636.GA1962668@rfd.leadboat.com	2020-10-31 08:43:28 -07:00
Andres Freund	94bc27b576	Centralize horizon determination for temp tables, fixing bug due to skew. This fixes a bug in the edge case where, for a temp table, heap_page_prune() can end up with a different horizon than heap_vacuum_rel(). Which can trigger errors like "ERROR: cannot freeze committed xmax ...". The bug was introduced due to interaction of `a7212be8b9` "Set cutoff xmin more aggressively when vacuuming a temporary table." with `dc7420c2c9` "snapshot scalability: Don't compute global horizons while building snapshots.". The problem is caused by lazy_scan_heap() assuming that the only reason its HeapTupleSatisfiesVacuum() call would return HEAPTUPLE_DEAD is if the tuple is a HOT tuple, or if the tuple's inserting transaction has aborted since the heap_page_prune() call. But after `a7212be8b9` that was also possible in other cases for temp tables, because heap_page_prune() uses a different visibility test after `dc7420c2c9`. The fix is fairly simple: Move the special case logic for temp tables from vacuum_set_xid_limits() to the infrastructure introduced in `dc7420c2c9`. That ensures that the horizon used for pruning is at least as aggressive as the one used by lazy_scan_heap(). The concrete horizon used for temp tables is slightly different than the logic in `dc7420c2c9`, but should always be as aggressive as before (see comments). A significant benefit to centralizing the logic procarray.c is that now the more aggressive horizons for temp tables does not just apply to VACUUM but also to e.g. HOT pruning and the nbtree killtuples logic. Because isTopLevel is not needed by vacuum_set_xid_limits() anymore, I undid the the related changes from `a7212be8b9`. This commit also adds an isolation test ensuring that the more aggressive vacuuming and pruning of temp tables keeps working. Debugged-By: Amit Kapila <amit.kapila16@gmail.com> Debugged-By: Tom Lane <tgl@sss.pgh.pa.us> Debugged-By: Ashutosh Sharma <ashu.coek88@gmail.com> Author: Andres Freund <andres@anarazel.de> Discussion: https://postgr.es/m/20201014203103.72oke6hqywcyhx7s@alap3.anarazel.de Discussion: https://postgr.es/m/20201015083735.derdzysdtqdvxshp@alap3.anarazel.de	2020-10-28 18:02:31 -07:00
Robert Haas	866e24d47d	Extend amcheck to check heap pages. Mark Dilger, reviewed by Peter Geoghegan, Andres Freund, Álvaro Herrera, Michael Paquier, Amul Sul, and by me. Some last-minute cosmetic revisions by me. Discussion: http://postgr.es/m/12ED3DA8-25F0-4B68-937D-D907CFBF08E7@enterprisedb.com	2020-10-22 08:44:18 -04:00
Alvaro Herrera	93f84d59f8	Revert "Remove pointless HeapTupleHeaderIndicatesMovedPartitions calls" This reverts commit `85adb5e91e`. It was not intended for commit just yet.	2020-10-15 15:16:11 -03:00
Alvaro Herrera	85adb5e91e	Remove pointless HeapTupleHeaderIndicatesMovedPartitions calls Pavan Deolasee recently noted that a few of the HeapTupleHeaderIndicatesMovedPartitions calls added by commit `5db6df0c01` are useless, since they are done after comparing t_self with t_ctid. But because t_self can never be set to the magical values that indicate that the tuple moved partition, this can never succeed: if the first test fails (so we know t_self equals t_ctid), necessarily the second test will also fail. So these checks can be removed and no harm is done. Discussion: https://postgr.es/m/20200929164411.GA15497@alvherre.pgsql	2020-10-15 14:32:34 -03:00
Amit Kapila	0d32511eca	Fix comments in heapam.c. After commits `85f6b49c2c` and `3ba59ccc89`, we can allow parallel inserts which was earlier not possible as parallel group members won't conflict for relation extension and page lock. In those commits, we forgot to update comments at few places. Author: Amit Kapila Reviewed-by: Robert Haas and Dilip Kumar Backpatch-through: 13 Discussion: https://postgr.es/m/CAFiTN-tMrQh5FFMPx5aWJ+1gi1H6JxktEhq5mDwCHgnEO5oBkA@mail.gmail.com	2020-09-18 09:50:44 +05:30
Peter Eisentraut	3e0242b24c	Message fixes and style improvements	2020-09-14 06:42:30 +02:00
Tom Lane	a7212be8b9	Set cutoff xmin more aggressively when vacuuming a temporary table. Since other sessions aren't allowed to look into a temporary table of our own session, we do not need to worry about the global xmin horizon when setting the vacuum XID cutoff. Indeed, if we're not inside a transaction block, we may set oldestXmin to be the next XID, because there cannot be any in-doubt tuples in a temp table, nor any tuples that are dead but still visible to some snapshot of our transaction. (VACUUM, of course, is never inside a transaction block; but we need to test that because CLUSTER shares the same code.) This approach allows us to always clean out a temp table completely during VACUUM, independently of concurrent activity. Aside from being useful in its own right, that simplifies building reproducible test cases. Discussion: https://postgr.es/m/3490536.1598629609@sss.pgh.pa.us	2020-09-01 18:40:43 -04:00
Tom Lane	3d351d916b	Redefine pg_class.reltuples to be -1 before the first VACUUM or ANALYZE. Historically, we've considered the state with relpages and reltuples both zero as indicating that we do not know the table's tuple density. This is problematic because it's impossible to distinguish "never yet vacuumed" from "vacuumed and seen to be empty". In particular, a user cannot use VACUUM or ANALYZE to override the planner's normal heuristic that an empty table should not be believed to be empty because it is probably about to get populated. That heuristic is a good safety measure, so I don't care to abandon it, but there should be a way to override it if the table is indeed intended to stay empty. Hence, represent the initial state of ignorance by setting reltuples to -1 (relpages is still set to zero), and apply the minimum-ten-pages heuristic only when reltuples is still -1. If the table is empty, VACUUM or ANALYZE (but not CREATE INDEX) will override that to reltuples = relpages = 0, and then we'll plan on that basis. This requires a bunch of fiddly little changes, but we can get rid of some ugly kluges that were formerly needed to maintain the old definition. One notable point is that FDWs' GetForeignRelSize methods will see baserel->tuples = -1 when no ANALYZE has been done on the foreign table. That seems like a net improvement, since those methods were formerly also in the dark about what baserel->tuples = 0 really meant. Still, it is an API change. I bumped catversion because code predating this change would get confused by seeing reltuples = -1. Discussion: https://postgr.es/m/F02298E0-6EF4-49A1-BCB6-C484794D9ACC@thebuild.com	2020-08-30 12:21:51 -04:00
Amit Kapila	7e453634bb	Add additional information in the vacuum error context. The additional information added will be an offset number for heap operations. This information will help us in finding the exact tuple due to which the error has occurred. Author: Mahendra Singh Thalor and Amit Kapila Reviewed-by: Sawada Masahiko, Justin Pryzby and Amit Kapila Discussion: https://postgr.es/m/CAKYtNApK488TDF4bMbw+1QH8HJf9cxdNDXquhU50TK5iv_FtCQ@mail.gmail.com	2020-08-26 09:40:52 +05:30
Amit Kapila	a3c66de6c5	Improve the vacuum error context phase information. We were displaying the wrong phase information for 'info' message in the index clean up phase because we were switching to the previous phase a bit early. We were also not displaying context information for heap phase unless the block number is valid which is fine for error cases but for messages at 'info' or lower error level it appears to be inconsistent with index phase information. Reported-by: Sawada Masahiko Author: Sawada Masahiko Reviewed-by: Amit Kapila Backpatch-through: 13, where it was introduced Discussion: https://postgr.es/m/CA+fd4k4HcbhPnCs7paRTw1K-AHin8y4xKomB9Ru0ATw0UeTy2w@mail.gmail.com	2020-08-24 08:16:19 +05:30
Andres Freund	c62a0a49f3	Revert "Make vacuum a bit more verbose to debug BF failure." This reverts commit `49967da65a`. Enough time has passed that we can be confident that `07f32fcd23` resolved the issue. Therefore we can remove the temporary debugging aids. Author: Andres Freund <andres@anarazel.de> Discussion: https://postgr.es/m/E1k7tGP-0005V0-5k@gemulon.postgresql.org	2020-08-20 12:59:00 -07:00
Andres Freund	49967da65a	Make vacuum a bit more verbose to debug BF failure. This is temporary. While possibly some more error checking / debugging in this path would be a good thing, it'll not look exactly like this. Discussion: https://postgr.es/m/20200816181604.l54m6kss5ntd6xow@alap3.anarazel.de	2020-08-16 12:57:01 -07:00
Noah Misch	676a9c3cc4	Correct several behavior descriptions in comments. Reuse cautionary language from src/test/ssl/README in src/test/kerberos/README. SLRUs have had access to six-character segments names since commit `73c986adde`, and recovery stopped calling HeapTupleHeaderAdvanceLatestRemovedXid() in commit `558a9165e0`. The other corrections are more self-evident.	2020-08-15 20:21:52 -07:00
Andres Freund	941697c3c1	snapshot scalability: Introduce dense array of in-progress xids. The new array contains the xids for all connected backends / in-use PGPROC entries in a dense manner (in contrast to the PGPROC/PGXACT arrays which can have unused entries interspersed). This improves performance because GetSnapshotData() always needs to scan the xids of all live procarray entries and now there's no need to go through the procArray->pgprocnos indirection anymore. As the set of running top-level xids changes rarely, compared to the number of snapshots taken, this substantially increases the likelihood of most data required for a snapshot being in l2 cache. In read-mostly workloads scanning the xids[] array will sufficient to build a snapshot, as most backends will not have an xid assigned. To keep the xid array dense ProcArrayRemove() needs to move entries behind the to-be-removed proc's one further up in the array. Obviously moving array entries cannot happen while a backend sets it xid. I.e. locking needs to prevent that array entries are moved while a backend modifies its xid. To avoid locking ProcArrayLock in GetNewTransactionId() - a fairly hot spot already - ProcArrayAdd() / ProcArrayRemove() now needs to hold XidGenLock in addition to ProcArrayLock. Adding / Removing a procarray entry is not a very frequent operation, even taking 2PC into account. Due to the above, the dense array entries can only be read or modified while holding ProcArrayLock and/or XidGenLock. This prevents a concurrent ProcArrayRemove() from shifting the dense array while it is accessed concurrently. While the new dense array is very good when needing to look at all xids it is less suitable when accessing a single backend's xid. In particular it would be problematic to have to acquire a lock to access a backend's own xid. Therefore a backend's xid is not just stored in the dense array, but also in PGPROC. This also allows a backend to only access the shared xid value when the backend had acquired an xid. The infrastructure added in this commit will be used for the remaining PGXACT fields in subsequent commits. They are kept separate to make review easier. Author: Andres Freund <andres@anarazel.de> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Reviewed-By: Thomas Munro <thomas.munro@gmail.com> Reviewed-By: David Rowley <dgrowleyml@gmail.com> Discussion: https://postgr.es/m/20200301083601.ews6hz5dduc3w2se@alap3.anarazel.de	2020-08-14 15:33:35 -07:00
Alvaro Herrera	a811ea5bde	Handle new HOT chains in index-build table scans When a table is scanned by heapam_index_build_range_scan (née IndexBuildHeapScan) and the table lock being held allows concurrent data changes, it is possible for new HOT chains to sprout in a page that were unknown when the scan of a page happened. This leads to an error such as ERROR: failed to find parent tuple for heap-only tuple at (X,Y) in table "tbl" because the root tuple was not present when we first obtained the list of the page's root tuples. This can be fixed by re-obtaining the list of root tuples, if we see that a heap-only tuple appears to point to a non-existing root. This was reported by Anastasia as occurring for BRIN summarization (which exists since 9.5), but I think it could theoretically also happen with CREATE INDEX CONCURRENTLY (much older) or REINDEX CONCURRENTLY (very recent). It seems a happy coincidence that BRIN forces us to backpatch this all the way to 9.5. Reported-by: Anastasia Lubennikova <a.lubennikova@postgrespro.ru> Diagnosed-by: Anastasia Lubennikova <a.lubennikova@postgrespro.ru> Co-authored-by: Anastasia Lubennikova <a.lubennikova@postgrespro.ru> Co-authored-by: Álvaro Herrera <alvherre@alvh.no-ip.org> Discussion: https://postgr.es/m/602d8487-f0b2-5486-0088-0f372b2549fa@postgrespro.ru Backpatch: 9.5 - master	2020-08-13 17:33:49 -04:00
Andres Freund	dc7420c2c9	snapshot scalability: Don't compute global horizons while building snapshots. To make GetSnapshotData() more scalable, it cannot not look at at each proc's xmin: While snapshot contents do not need to change whenever a read-only transaction commits or a snapshot is released, a proc's xmin is modified in those cases. The frequency of xmin modifications leads to, particularly on higher core count systems, many cache misses inside GetSnapshotData(), despite the data underlying a snapshot not changing. That is the most significant source of GetSnapshotData() scaling poorly on larger systems. Without accessing xmins, GetSnapshotData() cannot calculate accurate horizons / thresholds as it has so far. But we don't really have to: The horizons don't actually change that much between GetSnapshotData() calls. Nor are the horizons actually used every time a snapshot is built. The trick this commit introduces is to delay computation of accurate horizons until there use and using horizon boundaries to determine whether accurate horizons need to be computed. The use of RecentGlobal[Data]Xmin to decide whether a row version could be removed has been replaces with new GlobalVisTest* functions. These use two thresholds to determine whether a row can be pruned: 1) definitely_needed, indicating that rows deleted by XIDs >= definitely_needed are definitely still visible. 2) maybe_needed, indicating that rows deleted by XIDs < maybe_needed can definitely be removed GetSnapshotData() updates definitely_needed to be the xmin of the computed snapshot. When testing whether a row can be removed (with GlobalVisTestIsRemovableXid()) and the tested XID falls in between the two (i.e. XID >= maybe_needed && XID < definitely_needed) the boundaries can be recomputed to be more accurate. As it is not cheap to compute accurate boundaries, we limit the number of times that happens in short succession. As the boundaries used by GlobalVisTestIsRemovableXid() are never reset (with maybe_needed updated by GetSnapshotData()), it is likely that further test can benefit from an earlier computation of accurate horizons. To avoid regressing performance when old_snapshot_threshold is set (as that requires an accurate horizon to be computed), heap_page_prune_opt() doesn't unconditionally call TransactionIdLimitedForOldSnapshots() anymore. Both the computation of the limited horizon, and the triggering of errors (with SetOldSnapshotThresholdTimestamp()) is now only done when necessary to remove tuples. This commit just removes the accesses to PGXACT->xmin from GetSnapshotData(), but other members of PGXACT residing in the same cache line are accessed. Therefore this in itself does not result in a significant improvement. Subsequent commits will take advantage of the fact that GetSnapshotData() now does not need to access xmins anymore. Note: This contains a workaround in heap_page_prune_opt() to keep the snapshot_too_old tests working. While that workaround is ugly, the tests currently are not meaningful, and it seems best to address them separately. Author: Andres Freund <andres@anarazel.de> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Reviewed-By: Thomas Munro <thomas.munro@gmail.com> Reviewed-By: David Rowley <dgrowleyml@gmail.com> Discussion: https://postgr.es/m/20200301083601.ews6hz5dduc3w2se@alap3.anarazel.de	2020-08-12 16:03:49 -07:00
Amit Kapila	7259736a6e	Implement streaming mode in ReorderBuffer. Instead of serializing the transaction to disk after reaching the logical_decoding_work_mem limit in memory, we consume the changes we have in memory and invoke stream API methods added by commit `45fdc9738b`. However, sometimes if we have incomplete toast or speculative insert we spill to the disk because we can't generate the complete tuple and stream. And, as soon as we get the complete tuple we stream the transaction including the serialized changes. We can do this incremental processing thanks to having assignments (associating subxact with toplevel xacts) in WAL right away, and thanks to logging the invalidation messages at each command end. These features are added by commits `0bead9af48` and `c55040ccd0` respectively. Now that we can stream in-progress transactions, the concurrent aborts may cause failures when the output plugin consults catalogs (both system and user-defined). We handle such failures by returning ERRCODE_TRANSACTION_ROLLBACK sqlerrcode from system table scan APIs to the backend or WALSender decoding a specific uncommitted transaction. The decoding logic on the receipt of such a sqlerrcode aborts the decoding of the current transaction and continue with the decoding of other transactions. We have ReorderBufferTXN pointer in each ReorderBufferChange by which we know which xact it belongs to. The output plugin can use this to decide which changes to discard in case of stream_abort_cb (e.g. when a subxact gets discarded). We also provide a new option via SQL APIs to fetch the changes being streamed. Author: Dilip Kumar, Tomas Vondra, Amit Kapila, Nikhil Sontakke Reviewed-by: Amit Kapila, Kuntal Ghosh, Ajin Cherian Tested-by: Neha Sharma, Mahendra Singh Thalor and Ajin Cherian Discussion: https://postgr.es/m/688b0b7f-2f6c-d827-c27b-216a8e3ea700@2ndquadrant.com	2020-08-08 07:47:06 +05:30
Thomas Munro	c5315f4f44	Cache smgrnblocks() results in recovery. Avoid repeatedly calling lseek(SEEK_END) during recovery by caching the size of each fork. For now, we can't use the same technique in other processes, because we lack a shared invalidation mechanism. Do this by generalizing the pre-existing caching used by FSM and VM to support all forks. Discussion: https://postgr.es/m/CAEepm%3D3SSw-Ty1DFcK%3D1rU-K6GSzYzfdD4d%2BZwapdN7dTa6%3DnQ%40mail.gmail.com	2020-07-31 14:29:52 +12:00
Michael Paquier	e3931d01f3	Use multi-inserts for pg_attribute and pg_shdepend For pg_attribute, this allows to insert at once a full set of attributes for a relation (roughly 15% of WAL reduction in extreme cases). For pg_shdepend, this reduces the work done when creating new shared dependencies from a database template. The number of slots used for the insertion is capped at 64kB of data inserted for both, depending on the number of items to insert and the length of the rows involved. More can be done for other catalogs, like pg_depend. This part requires a different approach as the number of slots to use depends also on the number of entries discarded as pinned dependencies. This is also related to the rework or dependency handling for ALTER TABLE and CREATE TABLE, mainly. Author: Daniel Gustafsson Reviewed-by: Andres Freund, Michael Paquier Discussion: https://postgr.es/m/20190213182737.mxn6hkdxwrzgxk35@alap3.anarazel.de	2020-07-31 10:54:26 +09:00
Thomas Munro	cb04ad4985	Move syncscan.c to src/backend/access/common. Since the tableam.c code needs to make use of the syncscan.c routines itself, and since other block-oriented AMs might also want to use it one day, it didn't make sense for it to live under src/backend/access/heap. Reviewed-by: Andres Freund <andres@anarazel.de> Discussion: https://postgr.es/m/CA%2BhUKGLCnG%3DNEAByg6bk%2BCT9JZD97Y%3DAxKhh27Su9FeGWOKvDg%40mail.gmail.com	2020-07-29 16:59:33 +12:00
David Rowley	56788d2156	Allocate consecutive blocks during parallel seqscans Previously we would allocate blocks to parallel workers during a parallel sequential scan 1 block at a time. Since other workers were likely to request a block before a worker returns for another block number to work on, this could lead to non-sequential I/O patterns in each worker which could cause the operating system's readahead to perform poorly or not at all. Here we change things so that we allocate consecutive "chunks" of blocks to workers and have them work on those until they're done, at which time we allocate another chunk for the worker. The size of these chunks is based on the size of the relation. Initial patch here was by Thomas Munro which showed some good improvements just having a fixed chunk size of 64 blocks with a simple ramp-down near the end of the scan. The revisions of the patch to make the chunk size based on the relation size and the adjusted ramp-down in powers of two was done by me, along with quite extensive benchmarking to determine the optimal chunk sizes. For the most part, benchmarks have shown significant performance improvements for large parallel sequential scans on Linux, FreeBSD and Windows using SSDs. It's less clear how this affects the performance of cloud providers. Tests done so far are unable to obtain stable enough performance to provide meaningful benchmark results. It is possible that this could cause some performance regressions on more obscure filesystems, so we may need to later provide users with some ability to get something closer to the old behavior. For now, let's leave that until we see that it's really required. Author: Thomas Munro, David Rowley Reviewed-by: Ranier Vilela, Soumyadeep Chakraborty, Robert Haas Reviewed-by: Amit Kapila, Kirk Jamison Discussion: https://postgr.es/m/CA+hUKGJ_EErDv41YycXcbMbCBkztA34+z1ts9VQH+ACRuvpxig@mail.gmail.com	2020-07-26 21:02:45 +12:00
Peter Geoghegan	a766d6ca22	Avoid harmless Valgrind no-buffer-pin errors. Valgrind builds with assertions enabled sometimes perform a theoretically unsafe page access inside an assertion in heapam_tuple_lock(). This happened when the eval-plan-qual isolation test ran one of the permutations added by commit `a2418f9e23`. Avoid complaints from Valgrind by moving the assertion ever so slightly. This is minor cleanup for commit `1e0dfd16`, which added Valgrind buffer access instrumentation. No backpatch, since this only happens within an assertion, and seems very unlikely to cause any real problems even with assert-enabled builds.	2020-07-19 16:12:51 -07:00
Michael Paquier	9168793d72	Fix comments related to table AMs Incorrect function names were referenced. As this fixes some portions of tableam.h, that is mentioned in the docs as something to look at when implementing a table AM, backpatch down to 12 where this has been introduced. Author: Hironobu Suzuki Discussion: https://postgr.es/m/8fe6d672-28dd-3f1d-7aed-ac2f6d599d3f@interdb.jp Backpatch-through: 12	2020-07-14 13:17:11 +09:00
Andres Freund	5e7bbb5286	code: replace 'master' with 'primary' where appropriate. Also changed "in the primary" to "on the primary", and added a few "the" before "primary". Author: Andres Freund Reviewed-By: David Steele Discussion: https://postgr.es/m/20200615182235.x7lch5n6kcjq4aue@alap3.anarazel.de	2020-07-08 12:57:23 -07:00
Amit Kapila	231ef5b90d	Remove unused function parameter in end_parallel_vacuum. Author: Vignesh C Reviewed-by: Sawada Masahiko Backpatch-through: 13, where it was introduced Discussion: https://postgr.es/m/CALDaNm3Ppt71NafGY5mk3V2i3Q+mm93pVibDq-0NpW7WU67Jcg@mail.gmail.com	2020-07-06 08:21:52 +05:30
Amit Kapila	a69e041d0c	Improve vacuum error context handling. Use separate functions to save and restore error context information as that made code easier to understand. Also, make it clear that the index information required for error context is sane. Author: Andres Freund, Justin Pryzby, Amit Kapila Backpatch-through: 13, where it was introduced Discussion: https://postgr.es/m/CAA4eK1LWo+v1OWu=Sky27GTGSCuOmr7iaURNbc5xz6jO+SaPeA@mail.gmail.com	2020-07-01 07:58:36 +05:30
Thomas Munro	7aa4fb5925	Improve comments for [Heap]CheckForSerializableConflictOut(). Rewrite the documentation of these functions, in light of recent bug fix commit `5940ffb2`. Back-patch to 13 where the check-for-conflict-out code was split up into AM-specific and generic parts, and new documentation was added that now looked wrong. Reviewed-by: Peter Geoghegan <pg@bowt.ie> Discussion: https://postgr.es/m/db7b729d-0226-d162-a126-8a8ab2dc4443%40jepsen.io	2020-06-12 10:55:38 +12:00
Peter Geoghegan	5940ffb221	Avoid update conflict out serialization anomalies. SSI's HeapCheckForSerializableConflictOut() test failed to correctly handle conditions involving a concurrently inserted tuple which is later concurrently updated by a separate transaction . A SELECT statement that called HeapCheckForSerializableConflictOut() could end up using the same XID (updater's XID) for both the original tuple, and the successor tuple, missing the XID of the xact that created the original tuple entirely. This only happened when neither tuple from the chain was visible to the transaction's MVCC snapshot. The observable symptoms of this bug were subtle. A pair of transactions could commit, with the later transaction failing to observe the effects of the earlier transaction (because of the confusion created by the update to the non-visible row). This bug dates all the way back to commit `dafaa3ef`, which added SSI. To fix, make sure that we check the xmin of concurrently inserted tuples that happen to also have been updated concurrently. Author: Peter Geoghegan Reported-By: Kyle Kingsbury Reviewed-By: Thomas Munro Discussion: https://postgr.es/m/db7b729d-0226-d162-a126-8a8ab2dc4443@jepsen.io Backpatch: All supported versions	2020-06-11 10:09:47 -07:00
Tom Lane	5cbfce562f	Initial pgindent and pgperltidy run for v13. Includes some manual cleanup of places that pgindent messed up, most of which weren't per project style anyway. Notably, it seems some people didn't absorb the style rules of commit `c9d297751`, because there were a bunch of new occurrences of function calls with a newline just after the left paren, all with faulty expectations about how the rest of the call would get indented.	2020-05-14 13:06:50 -04:00
Alvaro Herrera	17cc133f01	Dial back -Wimplicit-fallthrough to level 3 The additional pain from level 4 is excessive for the gain. Also revert all the source annotation changes to their original wordings, to avoid back-patching pain. Discussion: https://postgr.es/m/31166.1589378554@sss.pgh.pa.us	2020-05-13 15:31:14 -04:00
Alvaro Herrera	3e9744465d	Add -Wimplicit-fallthrough to CFLAGS and CXXFLAGS Use it at level 4, a bit more restrictive than the default level, and tweak our commanding comments to FALLTHROUGH. (However, leave zic.c alone, since it's external code; to avoid the warnings that would appear there, change CFLAGS for that file in the Makefile.) Author: Julien Rouhaud <rjuju123@gmail.com> Author: Álvaro Herrera <alvherre@alvh.no-ip.org> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/20200412081825.qyo5vwwco3fv4gdo@nol Discussion: https://postgr.es/m/flat/E1fDenm-0000C8-IJ@gemulon.postgresql.org	2020-05-12 16:07:30 -04:00
Amit Kapila	69bfaf2e1d	Change the display of WAL usage statistics in Explain. In commit `33e05f89c5`, we have added the option to display WAL usage statistics in Explain and auto_explain. The display format used two spaces between each field which is inconsistent with Buffer usage statistics which is using one space between each field. Change the format to make WAL usage statistics consistent with Buffer usage statistics. This commit also changed the usage of "full page writes" to "full page images" for WAL usage statistics to make it consistent with other parts of code and docs. Author: Julien Rouhaud, Amit Kapila Reviewed-by: Justin Pryzby, Kyotaro Horiguchi and Amit Kapila Discussion: https://postgr.es/m/CAB-hujrP8ZfUkvL5OYETipQwA=e3n7oqHFU=4ZLxWS_Cza3kQQ@mail.gmail.com	2020-05-05 08:00:53 +05:30
Peter Eisentraut	aaf069aa34	Remove HEAPDEBUGALL This has been broken since PostgreSQL 12 and was probably never really used. PostgreSQL 12 added an analogous HEAPAMSLOTDEBUGALL, which still works right now, but it's also not very useful, so remove that as well. Discussion: https://www.postgresql.org/message-id/flat/645c0646-4218-d4c3-409a-a7003a0c108d%402ndquadrant.com	2020-04-22 08:35:33 +02:00
Amit Kapila	a6fea120a7	Comments and doc fixes for commit `40d964ec99`. Reported-by: Justin Pryzby Author: Justin Pryzby, with few changes by me Reviewed-by: Amit Kapila and Sawada Masahiko Discussion: https://postgr.es/m/20200322021801.GB2563@telsasoft.com	2020-04-14 08:10:27 +05:30
Amit Kapila	ef08ca113f	Cosmetic fixups for WAL usage work. Reported-by: Justin Pryzby and Euler Taveira Author: Justin Pryzby and Julien Rouhaud Reviewed-by: Amit Kapila Discussion: https://postgr.es/m/CAB-hujrP8ZfUkvL5OYETipQwA=e3n7oqHFU=4ZLxWS_Cza3kQQ@mail.gmail.com	2020-04-13 15:31:16 +05:30
Amit Kapila	b7ce6de93b	Allow autovacuum to log WAL usage statistics. This commit allows autovacuum to log WAL usage statistics added by commit `df3b181499`. Author: Julien Rouhaud Reviewed-by: Dilip Kumar and Amit Kapila Discussion: https://postgr.es/m/CAB-hujrP8ZfUkvL5OYETipQwA=e3n7oqHFU=4ZLxWS_Cza3kQQ@mail.gmail.com	2020-04-06 16:24:51 +05:30
Andres Freund	f946069e68	Use TransactionXmin instead of RecentGlobalXmin in heap_abort_speculative(). There's a very low risk that RecentGlobalXmin could be far enough in the past to be older than relfrozenxid, or even wrapped around. Luckily the consequences of that having happened wouldn't be too bad - the page wouldn't be pruned for a while. Avoid that risk by using TransactionXmin instead. As that's announced via MyPgXact->xmin, it is protected against wrapping around (see code comments for details around relfrozenxid). Author: Andres Freund Discussion: https://postgr.es/m/20200328213023.s4eyijhdosuc4vcj@alap3.anarazel.de Backpatch: 9.5-	2020-04-05 17:47:30 -07:00
Noah Misch	c6b92041d3	Skip WAL for new relfilenodes, under wal_level=minimal. Until now, only selected bulk operations (e.g. COPY) did this. If a given relfilenode received both a WAL-skipping COPY and a WAL-logged operation (e.g. INSERT), recovery could lose tuples from the COPY. See src/backend/access/transam/README section "Skipping WAL for New RelFileNode" for the new coding rules. Maintainers of table access methods should examine that section. To maintain data durability, just before commit, we choose between an fsync of the relfilenode and copying its contents to WAL. A new GUC, wal_skip_threshold, guides that choice. If this change slows a workload that creates small, permanent relfilenodes under wal_level=minimal, try adjusting wal_skip_threshold. Users setting a timeout on COMMIT may need to adjust that timeout, and log_min_duration_statement analysis will reflect time consumption moving to COMMIT from commands like COPY. Internally, this requires a reliable determination of whether RollbackAndReleaseCurrentSubTransaction() would unlink a relation's current relfilenode. Introduce rd_firstRelfilenodeSubid. Amend the specification of rd_createSubid such that the field is zero when a new rel has an old rd_node. Make relcache.c retain entries for certain dropped relations until end of transaction. Bump XLOG_PAGE_MAGIC, since this introduces XLOG_GIST_ASSIGN_LSN. Future servers accept older WAL, so this bump is discretionary. Kyotaro Horiguchi, reviewed (in earlier, similar versions) by Robert Haas. Heikki Linnakangas and Michael Paquier implemented earlier designs that materially clarified the problem. Reviewed, in earlier designs, by Andrew Dunstan, Andres Freund, Alvaro Herrera, Tom Lane, Fujii Masao, and Simon Riggs. Reported by Martijn van Oosterhout. Discussion: https://postgr.es/m/20150702220524.GA9392@svana.org	2020-04-04 12:25:34 -07:00
Amit Kapila	df3b181499	Add infrastructure to track WAL usage. This allows gathering the WAL generation statistics for each statement execution. The three statistics that we collect are the number of WAL records, the number of full page writes and the amount of WAL bytes generated. This helps the users who have write-intensive workload to see the impact of I/O due to WAL. This further enables us to see approximately what percentage of overall WAL is due to full page writes. In the future, we can extend this functionality to allow us to compute the the exact amount of WAL data due to full page writes. This patch in itself is just an infrastructure to compute WAL usage data. The upcoming patches will expose this data via explain, auto_explain, pg_stat_statements and verbose (auto)vacuum output. Author: Kirill Bychik, Julien Rouhaud Reviewed-by: Dilip Kumar, Fujii Masao and Amit Kapila Discussion: https://postgr.es/m/CAB-hujrP8ZfUkvL5OYETipQwA=e3n7oqHFU=4ZLxWS_Cza3kQQ@mail.gmail.com	2020-04-04 10:02:08 +05:30
Amit Kapila	3a5e22138a	Allow parallel vacuum to accumulate buffer usage. Commit `40d964ec99` allowed vacuum command to process indexes in parallel but forgot to accumulate the buffer usage stats of parallel workers. This allows leader backend to accumulate buffer usage stats of all the parallel workers. Reported-by: Julien Rouhaud Author: Sawada Masahiko Reviewed-by: Dilip Kumar, Amit Kapila and Julien Rouhaud Discussion: https://postgr.es/m/20200328151721.GB12854@nol	2020-04-02 08:04:58 +05:30
Amit Kapila	ef75140fe7	Avoid calls to RelationGetRelationName() and RelationGetNamespace() in vacuum code. After commit `b61d161c14`, during vacuum, we cache the information of relation name and relation namespace in local structure LVRelStats so that we can use it in an error callback function. We can use the cached information to avoid the calls to RelationGetRelationName(), RelationGetNamespace() and get_namespace_name(). This is mainly for the consistent in vacuum code path but it will avoid the extra syscache lookup we do in get_namespace_name(). Author: Justin Pryzby Reviewed-by: Amit Kapila Discussion: https://www.postgresql.org/message-id/20191120210600.GC30362@telsasoft.com	2020-03-31 09:34:49 +05:30
Michael Paquier	dd9ac7d5d8	Revert "Skip redundant anti-wraparound vacuums" This reverts commit `2aa6e33`, that added a fast path to skip anti-wraparound and non-aggressive autovacuum jobs (these have no sense as anti-wraparound implies aggressive). With a cluster using a high amount of relations with a portion of them being heavily updated, this could cause autovacuum to lock down, with autovacuum workers attempting repeatedly those jobs on the same relations for the same database, that just kept being skipped. This lock down can be solved with a manual VACUUM FREEZE. Justin King has reported one environment where the issue happened, and Julien Rouhaud and I have been able to reproduce it in a second environment. With a very aggressive autovacuum_freeze_max_age, triggering those jobs with pgbench is a matter of minutes, and hitting the lock down is a lot harder (my local tests failed to do that). Note that anti-wraparound and non-aggressive jobs can only be triggered on a subset of shared catalogs: - pg_auth_members - pg_authid - pg_database - pg_replication_origin - pg_shseclabel - pg_subscription - pg_tablespace While the lock down was possible down to v12, the root cause of those jobs is a much older issue, which needs more analysis. Bonus thanks to Andres Freund for the discussion. Reported-by: Justin King Discussion: https://postgr.es/m/CAE39h22zPLrkH17GrkDgAYL3kbjvySYD1io+rtnAUFnaJJVS4g@mail.gmail.com Backpatch-through: 12	2020-03-31 08:27:47 +09:00
Andres Freund	d4b34f60c5	Deduplicate PageIsNew() check in lazy_scan_heap(). The recheck isn't needed anymore, as RelationGetBufferForTuple() now extends the relation with RBM_ZERO_AND_LOCK. Previously we needed to handle the fact that relation extension extended the relation and then separately acquired a lock on the page - while expecting that the page is empty. Reported-By: Ranier Vilela Discussion: https://postgr.es/m/CAEudQArA_=J0D5T258xsCY6Xtf6wiH4b=QDPDgVS+WZUN10WDw@mail.gmail.com	2020-03-30 13:56:40 -07:00
Amit Kapila	b61d161c14	Introduce vacuum errcontext to display additional information. The additional information displayed will be block number for error occurring while processing heap and index name for error occurring while processing the index. This will help us in diagnosing the problems that occur during a vacuum. For ex. due to corruption (either caused by bad hardware or by some bug) if we get some error while vacuuming, it can help us identify the block in heap and or additional index information. It sets up an error context callback to display additional information with the error. During different phases of vacuum (heap scan, heap vacuum, index vacuum, index clean up, heap truncate), we update the error context callback to display appropriate information. We can extend it to a bit more granular level like adding the phases for FSM operations or for prefetching the blocks while truncating. However, I felt that it requires adding many more error callback function calls and can make the code a bit complex, so left those for now. Author: Justin Pryzby, with few changes by Amit Kapila Reviewed-by: Alvaro Herrera, Amit Kapila, Andres Freund, Michael Paquier and Sawada Masahiko Discussion: https://www.postgresql.org/message-id/20191120210600.GC30362@telsasoft.com	2020-03-30 07:33:38 +05:30
Noah Misch	de9396326e	Revert "Skip WAL for new relfilenodes, under wal_level=minimal." This reverts commit `cb2fd7eac2`. Per numerous buildfarm members, it was incompatible with parallel query, and a test case assumed LP64. Back-patch to 9.5 (all supported versions). Discussion: https://postgr.es/m/20200321224920.GB1763544@rfd.leadboat.com	2020-03-22 09:24:09 -07:00
Noah Misch	cb2fd7eac2	Skip WAL for new relfilenodes, under wal_level=minimal. Until now, only selected bulk operations (e.g. COPY) did this. If a given relfilenode received both a WAL-skipping COPY and a WAL-logged operation (e.g. INSERT), recovery could lose tuples from the COPY. See src/backend/access/transam/README section "Skipping WAL for New RelFileNode" for the new coding rules. Maintainers of table access methods should examine that section. To maintain data durability, just before commit, we choose between an fsync of the relfilenode and copying its contents to WAL. A new GUC, wal_skip_threshold, guides that choice. If this change slows a workload that creates small, permanent relfilenodes under wal_level=minimal, try adjusting wal_skip_threshold. Users setting a timeout on COMMIT may need to adjust that timeout, and log_min_duration_statement analysis will reflect time consumption moving to COMMIT from commands like COPY. Internally, this requires a reliable determination of whether RollbackAndReleaseCurrentSubTransaction() would unlink a relation's current relfilenode. Introduce rd_firstRelfilenodeSubid. Amend the specification of rd_createSubid such that the field is zero when a new rel has an old rd_node. Make relcache.c retain entries for certain dropped relations until end of transaction. Back-patch to 9.5 (all supported versions). This introduces a new WAL record type, XLOG_GIST_ASSIGN_LSN, without bumping XLOG_PAGE_MAGIC. As always, update standby systems before master systems. This changes sizeof(RelationData) and sizeof(IndexStmt), breaking binary compatibility for affected extensions. (The most recent commit to affect the same class of extensions was 089e4d405d0f3b94c74a2c6a54357a84a681754b.) Kyotaro Horiguchi, reviewed (in earlier, similar versions) by Robert Haas. Heikki Linnakangas and Michael Paquier implemented earlier designs that materially clarified the problem. Reviewed, in earlier designs, by Andrew Dunstan, Andres Freund, Alvaro Herrera, Tom Lane, Fujii Masao, and Simon Riggs. Reported by Martijn van Oosterhout. Discussion: https://postgr.es/m/20150702220524.GA9392@svana.org	2020-03-21 09:38:26 -07:00
Thomas Munro	fc34b0d9de	Introduce a maintenance_io_concurrency setting. Introduce a GUC and a tablespace option to control I/O prefetching, much like effective_io_concurrency, but for work that is done on behalf of many client sessions. Use the new setting in heapam.c instead of the hard-coded formula effective_io_concurrency + 10 introduced by commit `558a9165e0`. Go with a default value of 10 for now, because it's a round number pretty close to the value used for that existing case. Discussion: https://postgr.es/m/CA%2BhUKGJUw08dPs_3EUcdO6M90GnjofPYrWp4YSLaBkgYwS-AqA%40mail.gmail.com	2020-03-16 17:14:26 +13:00
Tom Lane	3ed2005ff5	Introduce macros for typalign and typstorage constants. Our usual practice for "poor man's enum" catalog columns is to define macros for the possible values and use those, not literal constants, in C code. But for some reason lost in the mists of time, this was never done for typalign/attalign or typstorage/attstorage. It's never too late to make it better though, so let's do that. The reason I got interested in this right now is the need to duplicate some uses of the TYPSTORAGE constants in an upcoming ALTER TYPE patch. But in general, this sort of change aids greppability and readability, so it's a good idea even without any specific motivation. I may have missed a few places that could be converted, and it's even more likely that pending patches will re-introduce some hard-coded references. But that's not fatal --- there's no expectation that we'd actually change any of these values. We can clean up stragglers over time. Discussion: https://postgr.es/m/16457.1583189537@sss.pgh.pa.us	2020-03-04 10:34:25 -05:00
Michael Paquier	7d672b76bf	Issue properly WAL record for CID of first catalog tuple in multi-insert Multi-insert for heap is not yet used actively for catalogs, but the code to support this case is in place for logical decoding. The existing code forgot to issue a XLOG_HEAP2_NEW_CID record for the first tuple inserted, leading to failures when attempting to use multiple inserts for catalogs at decoding time. This commit fixes the problem by WAL-logging the needed CID. This is not an active bug, so no back-patch is done. Author: Daniel Gustafsson Discussion: https://postgr.es/m/E0D4CC67-A1CF-4DF4-991D-B3AC2EB5FAE9@yesql.se	2020-02-25 07:55:22 +09:00
Fujii Masao	0074919794	Fix mesurement of elapsed time during truncating heap in VACUUM. VACUUM may truncate heap in several batches. The activity report is logged for each batch, and contains the number of pages in the table before and after the truncation, and also the elapsed time during the truncation. Previously the elapsed time reported in each batch was the total elapsed time since starting the truncation until finishing each batch. For example, if the truncation was processed dividing into three batches, the second batch reported the accumulated time elapsed during both first and second batches. This is strange and confusing because the number of pages in the table reported together is not total. Instead, each batch should report the time elapsed during only that batch. The cause of this issue was that the resource usage snapshot was initialized only at the beginning of the truncation and was never reset later. This commit fixes the issue by changing VACUUM so that the resource usage snapshot is reset at each batch. Back-patch to all supported branches. Reported-by: Tatsuhito Kasahara Author: Tatsuhito Kasahara Reviewed-by: Masahiko Sawada, Fujii Masao Discussion: https://postgr.es/m/CAP0=ZVJsf=NvQuy+QXQZ7B=ZVLoDV_JzsVC1FRsF1G18i3zMGg@mail.gmail.com	2020-02-19 20:37:26 +09:00
Thomas Munro	701a51fd4e	Use pg_pwrite() in more places. This removes some lseek() system calls. Author: Thomas Munro Reviewed-by: Andres Freund Discussion: https://postgr.es/m/CA%2BhUKGJ%2BoHhnvqjn3%3DHro7xu-YDR8FPr0FL6LF35kHRX%3D_bUzg%40mail.gmail.com	2020-02-11 17:50:22 +13:00
Alvaro Herrera	15d13e8291	Make vacuum buffer counters 64 bits wide Using 32 bit counters means they can now realistically wrap around when vacuuming extremely large tables. Because they're signed integers, stats printed by vacuum look very odd when they do. We'd love to backpatch this, but refrain because the variables are exported and could cause third-party code to break. Reviewed-by: Julien Rouhaud, Tom Lane, Michael Paquier Discussion: https://postgr.es/m/20200131205926.GA16367@alvherre.pgsql	2020-02-05 16:59:29 -03:00
Alvaro Herrera	c9d2977519	Clean up newlines following left parentheses We used to strategically place newlines after some function call left parentheses to make pgindent move the argument list a few chars to the left, so that the whole line would fit under 80 chars. However, pgindent no longer does that, so the newlines just made the code vertically longer for no reason. Remove those newlines, and reflow some of those lines for some extra naturality. Reviewed-by: Michael Paquier, Tom Lane Discussion: https://postgr.es/m/20200129200401.GA6303@alvherre.pgsql	2020-01-30 13:42:14 -03:00
Thomas Munro	11da6bccd1	Fix compile error on HP C. Per build farm animal anole, after commit `6f38d4dac3`.	2020-01-28 20:30:40 +13:00
Thomas Munro	6f38d4dac3	Remove dependency on HeapTuple from predicate locking functions. The following changes make the predicate locking functions more generic and suitable for use by future access methods: - PredicateLockTuple() is renamed to PredicateLockTID(). It takes ItemPointer and inserting transaction ID instead of HeapTuple. - CheckForSerializableConflictIn() takes blocknum instead of buffer. - CheckForSerializableConflictOut() no longer takes HeapTuple or buffer. Author: Ashwin Agrawal Reviewed-by: Andres Freund, Kuntal Ghosh, Thomas Munro Discussion: https://postgr.es/m/CALfoeiv0k3hkEb3Oqk%3DziWqtyk2Jys1UOK5hwRBNeANT_yX%2Bng%40mail.gmail.com	2020-01-28 13:13:04 +13:00
Michael Paquier	f942dfb952	Clarify some comments in vacuumlazy.c Author: Justin Pryzby Discussion: https://postgr.es/m/20200113004542.GA26045@telsasoft.com	2020-01-23 15:56:56 +09:00
Amit Kapila	79a3efb84d	Fix the computation of max dead tuples during the vacuum. In commit `40d964ec99`, we changed the way memory is allocated for dead tuples but forgot to update the place where we compute the maximum number of dead tuples. This could lead to invalid memory requests. Reported-by: Andres Freund Diagnosed-by: Andres Freund Author: Masahiko Sawada Reviewed-by: Amit Kapila and Dilip Kumar Discussion: https://postgr.es/m/20200121060020.e3cr7s7fj5rw4lok@alap3.anarazel.de	2020-01-22 07:43:51 +05:30
Amit Kapila	40d964ec99	Allow vacuum command to process indexes in parallel. This feature allows the vacuum to leverage multiple CPUs in order to process indexes. This enables us to perform index vacuuming and index cleanup with background workers. This adds a PARALLEL option to VACUUM command where the user can specify the number of workers that can be used to perform the command which is limited by the number of indexes on a table. Specifying zero as a number of workers will disable parallelism. This option can't be used with the FULL option. Each index is processed by at most one vacuum process. Therefore parallel vacuum can be used when the table has at least two indexes. The parallel degree is either specified by the user or determined based on the number of indexes that the table has, and further limited by max_parallel_maintenance_workers. The index can participate in parallel vacuum iff it's size is greater than min_parallel_index_scan_size. Author: Masahiko Sawada and Amit Kapila Reviewed-by: Dilip Kumar, Amit Kapila, Robert Haas, Tomas Vondra, Mahendra Singh and Sergei Kornilov Tested-by: Mahendra Singh and Prabhat Sahu Discussion: https://postgr.es/m/CAD21AoDTPMgzSkV4E3SFo1CH_x50bf5PqZFQf4jmqjk-C03BWg@mail.gmail.com https://postgr.es/m/CAA4eK1J-VoR9gzS5E75pcD-OH0mEyCdp8RihcwKrcuw7J-Q0+w@mail.gmail.com	2020-01-20 07:57:49 +05:30
Michael Paquier	7689d907bb	Fix comment in heapam.c Improvement per suggestion from Tom Lane. Author: Daniel Gustafsson Discussion: https://postgr.es/m/FED18699-4270-4778-8DA8-10F119A5ECF3@yesql.se	2020-01-13 17:57:38 +09:00
Robert Haas	ce242ae154	tableam: New callback relation_fetch_toast_slice. Instead of always calling heap_fetch_toast_slice during detoasting, invoke a table AM callback which, when the toast table is a heap table, will be heap_fetch_toast_slice. This makes it possible for a table AM other than heap to be used as a TOAST table. It also completes the series of commits intended to improve the interaction of tableam with TOAST that began with commit 8b94dab06617ef80a0901ab103ebd8754427ef5a; detoast.c is now, hopefully, fully AM-independent. Patch by me, reviewed by Andres Freund and Peter Eisentraut. Discussion: http://postgr.es/m/CA+TgmoZv-=2iWM4jcw5ZhJeL18HF96+W1yJeYrnGMYdkFFnEpQ@mail.gmail.com	2020-01-07 14:36:38 -05:00
Robert Haas	83322e38da	tableam: Allow choice of toast AM. Previously, the toast table had to be implemented by the same AM that was used for the main table, which was bad, because the detoasting code won't work with anything but heap. This commit doesn't fix the latter problem, although there's another patch coming which does, but it does let you pick something that works (i.e. heap, right now). Patch by me, reviewed by Andres Freund. Discussion: http://postgr.es/m/CA+TgmoZv-=2iWM4jcw5ZhJeL18HF96+W1yJeYrnGMYdkFFnEpQ@mail.gmail.com	2020-01-07 14:23:25 -05:00
Bruce Momjian	7559d8ebfa	Update copyrights for 2020 Backpatch-through: update all files in master, backpatch legal files through 9.4	2020-01-01 12:21:45 -05:00
Michael Paquier	7854e07f25	Revert "Rename files and headers related to index AM" This follows multiple complains from Peter Geoghegan, Andres Freund and Alvaro Herrera that this issue ought to be dug more before actually happening, if it happens. Discussion: https://postgr.es/m/20191226144606.GA5659@alvherre.pgsql	2019-12-27 08:09:00 +09:00

1 2 3 4 5 ...

1025 Commits