postgresql

Commit Graph

Author	SHA1	Message	Date
Michael Paquier	58f5edf849	Fix typo with wait event for SLRU buffer of commit timestamps This wait event was documented as "CommitTsBuffer" since its introduction, but the code named it "CommitTSBuffer". This commit fixes the code to follow the term documented, which is also more consistent with the naming of the other wait events used for commit timestamps. Introduced by `5da1493`. Author: Alexander Lakhin Discussion: https://postgr.es/m/e8c38840-596a-83d6-bd8d-cebc51111572@gmail.com Backpatch-through: 13	2023-05-05 21:25:44 +09:00
Alvaro Herrera	f75cec4fff	Fix ExecCheckPermissions call in RI_Initial_Check RI_Initial_Check was setting up a list of RTEPermissionInfo for ExecCheckPermissions() wrong, and the problem is subtle enough that it doesn't have any immediate effect in core code. However, if an extension is using the ExecutorCheckPerms_hook, then it would get the wrong parameters and perhaps arrive at a wrong conclusion, or outright malfunction. Fix by constructing that list and the RTE list more honestly. We also add an assertion check to verify that these lists match. This new assertion would have caught this bug. Co-authored-by: Олег Целебровский (Oleg Tselebrovskii) <o.tselebrovskiy@postgrespro.ru> Co-authored-by: Álvaro Herrera <alvherre@alvh.no-ip.org> Reviewed-by: Amit Langote <amitlangote09@gmail.com> Discussion: https://postgr.es/m/3722b7a2cbe27a1796ee40824bd86dd1@postgrespro.ru	2023-05-04 19:55:56 +02:00
Tom Lane	4c40995f61	In array_position()/array_positions(), beware of empty input array. These functions incautiously fetched the array's first lower bound even when the array is zero-dimensional, thus fetching the word after the allocated array space. While almost always harmless, with very bad luck this could result in SIGSEGV. Fix by adding an early exit for empty input. Per bug #17920 from Alexander Lakhin. Discussion: https://postgr.es/m/17920-f7c228c627b6d02e%40postgresql.org	2023-05-04 11:48:23 -04:00
Alvaro Herrera	5472743d9e	Revert "Move PartitionPruneInfo out of plan nodes into PlannedStmt" This reverts commit `ec38694894` and its fixup `589bb81649`. This change was intended to support query planning avoiding acquisition of locks on partitions that were going to be pruned; however, the overall project took a different direction at [1] and this bit is no longer needed. Put things back the way they were as agreed in [2], to avoid unnecessary complexity. Discussion: [1] https://postgr.es/m/4191508.1674157166@sss.pgh.pa.us Discussion: [2] https://postgr.es/m/20230502175409.kcoirxczpdha26wt@alvherre.pgsql	2023-05-04 12:09:59 +02:00
Amit Kapila	de63f8dade	Fix assertion failure in apply worker. During exit, the logical replication apply worker tries to release session level locks, if any. However, if the apply worker exits due to an error before its connection is initialized, trying to release locks can lead to assertion failure. The locks will be acquired once the worker is initialized, so we don't need to release them till the worker initialization is complete. Reported-by: Alexander Lakhin Author: Hou Zhijie based on inputs from Sawada Masahiko and Amit Kapila Reviewed-by: Amit Kapila Discussion: https://postgr.es/m/2185d65f-5aae-3efa-c48f-fb42b173ef5c@gmail.com	2023-05-03 10:17:49 +05:30
Peter Eisentraut	e0bb5d0c5b	Update SQL features Some updates for SQL:2023 and some new features in PostgreSQL 16.	2023-05-02 10:59:21 +02:00
Michael Paquier	8961cb9a03	Fix typos in comments The changes done in this commit impact comments with no direct user-visible changes, with fixes for incorrect function, variable or structure names. Author: Alexander Lakhin Discussion: https://postgr.es/m/e8c38840-596a-83d6-bd8d-cebc51111572@gmail.com	2023-05-02 12:23:08 +09:00
Michael Paquier	4dadd660f0	Fix crashes with CREATE SCHEMA AUTHORIZATION and schema elements CREATE SCHEMA AUTHORIZATION with appended schema elements can lead to crashes when comparing the schema name of the query with the schemas used in the qualification of some clauses in the elements' queries. The origin of the problem is that the transformation routine for the elements listed in a CREATE SCHEMA query uses as new, expected, schema name the one listed in CreateSchemaStmt itself. However, depending on the query, CreateSchemaStmt.schemaname may be NULL, being computed instead from the role specification of the query given by the AUTHORIZATION clause, that could be either: - A user name string, with the new schema name being set to the same value as the role given. - Guessed from CURRENT_ROLE, SESSION_ROLE or CURRENT_ROLE, with a new schema name computed from the security context where CREATE SCHEMA is running. Regression tests are added for CREATE SCHEMA with some appended elements (some of them with schema qualifications), covering also some role specification patterns. While on it, this simplifies the context structure used during the transformation of the elements listed in a CREATE SCHEMA query by removing the fields for the role specification and the role type. They were not used, and for the role specification this could be confusing as the schema name may by extracted from that at the beginning of CreateSchemaCommand(). This issue exists for a long time, so backpatch down to all the versions supported. Reported-by: Song Hongyu Author: Michael Paquier Reviewed-by: Richard Guo Discussion: https://postgr.es/m/17909-f65c12dfc5f0451d@postgresql.org Backpatch-through: 11	2023-04-28 19:29:12 +09:00
Daniel Gustafsson	4a6603cd46	Fix assertion failure in heap_vacuum_rel Commit `7d71d3dd08` changed resetting the VacuumFailsafeActive flag to an assertion since the flag is reset before starting vacuuming a relation. This however failed to take recursive calls of vacuum_rel() and vacuum of TOAST tables into consideration. Fix by reverting back to resettting the flag. Author: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-by: Melanie Plageman <melanieplageman@gmail.com> Reported-by: John Naylor <john.naylor@enterprisedb.com> Discussion: https://postgr.es/m/CAFBsxsFz=GqaG5Ens5aNgVYoV2Y+pfMUijX0ku+CCkWfALwiqg@mail.gmail.com	2023-04-28 10:30:05 +02:00
Masahiko Sawada	b72f564d87	Add unit to vacuum_buffer_usage_limit value in postgresql.conf.sample. Also adjust the indentation of the comment to the surrounding parameters. Author: Masahiko Sawada Reviewed-by: David Rowley, Daniel Gustafsson, Melanie Plageman Discussion: https://postgr.es/m/CAD21AoCBSqmqOKVH4Q256DeCC_UE50gu1sgixcjLFZGLEbABVA@mail.gmail.com	2023-04-28 15:40:12 +09:00
Nathan Bossart	b72623671d	Prevent underflow in KeepLogSeg(). The call to XLogGetReplicationSlotMinimumLSN() might return a greater LSN than the one given to the function. Subsequent segment number calculations might then underflow, which could result in unexpected behavior when removing or recyling WAL files. This was introduced with max_slot_wal_keep_size in `c655077639`. To fix, skip the block of code for replication slots if the LSN is greater. Reported-by: Xu Xingwang Author: Kyotaro Horiguchi Reviewed-by: Junwang Zhao Discussion: https://postgr.es/m/17903-4288d439dee856c6%40postgresql.org Backpatch-through: 13	2023-04-27 14:31:17 -07:00
Alexander Korotkov	db93e739ac	Fix wrong construct_array_builtin() call in GUCArrayDelete() The current code unintentionally uses the wrong datum to construct an array. The bug was introduced by `096dd80f3c`, so no backpatching is needed. Reported-by: David Steele Discussion: https://postgr.es/m/d46f9265-ff3c-6743-2278-6772598233c2%40pgmasters.net Author: Nathan Bossart Reviewed-by: David Steele, Tom Lane	2023-04-27 22:06:14 +03:00
Michael Paquier	84cc142674	Re-add tracking of wait event SLRUFlushSync SLRUFlushSync has been accidently removed during `dee663f`, that has moved the flush of the SLRU files to the checkpointer, so add it back. The issue has been noticed by Thomas when checking for orphaned wait events. Author: Thomas Munro Reviewed-by: Bharath Rupireddy Discussion: https://postgr.es/m/CA+hUKGK6tqm59KuF1z+h5Y8fsWcu5v8+84kduSHwRzwjB2aa_A@mail.gmail.com	2023-04-26 07:10:06 +09:00
Daniel Gustafsson	bfac8f8bc4	Fix vacuum_cost_delay check for balance calculation. Commit `1021bd6a89` excluded autovacuum workers from cost-limit balance calculations when per-relation options were set. The code checks for limit and cost_delay being greater than zero, but since cost_delay can be set to -1 the test needs to check for greater than or zero. Backpatch to all supported branches since `1021bd6a89` was backpatched all the way at the time. Author: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-by: Melanie Plageman <melanieplageman@gmail.com> Reviewed-by: Daniel Gustafsson <daniel@yesql.se> Discussion: https://postgr.es/m/CAD21AoBS7o6Ljt_vfqPQPf67AhzKu3fR0iqk8B=vVYczMugKMQ@mail.gmail.com Backpatch-through: v11 (all supported branches)	2023-04-25 13:54:10 +02:00
Michael Paquier	806fad7573	Fix buffer refcount leak with FDW bulk inserts The leak would show up when using batch inserts with foreign tables included in a partition tree, as the slots used in the batch were not reset once processed. In order to fix this problem, some ExecClearTuple() are added to clean up the slots used once a batch is filled and processed, mapping with the number of slots currently in use as tracked by the counter ri_NumSlots. This buffer refcount leak has been introduced in `b676ac4` with the addition of the executor facility to improve bulk inserts for FDWs, so backpatch down to 14. Alexander has provided the patch (slightly modified by me). The test for postgres_fdw comes from me, based on the test case that the author has sent in the report. Author: Alexander Pyhalov Discussion: https://postgr.es/m/b035780a740efd38dc30790c76927255@postgrespro.ru Backpatch-through: 14	2023-04-25 09:42:19 +09:00
Andres Freund	1118cd37eb	Remove vacuum_defer_cleanup_age vacuum_defer_cleanup_age was introduced before hot_standby_feedback and replication slots existed. It is hard to use reasonably - commonly it will either be set too low (not preventing recovery conflicts, while still causing some bloat), or too high (causing a lot of bloat). The alternatives do not have that issue. That on its own might not be sufficient reason to remove vacuum_defer_cleanup_age, but it also complicates computation of xid horizons. See e.g. the bug fixed in `be504a3e97`. It also is untested. This commit removes TransactionIdRetreatSafely(), as there are no users anymore. There might be potential future users, hence noting that here. Reviewed-by: Daniel Gustafsson <daniel@yesql.se> Reviewed-by: Justin Pryzby <pryzby@telsasoft.com> Reviewed-by: Alvaro Herrera <alvherre@alvh.no-ip.org> Discussion: https://postgr.es/m/20230317230930.nhsgk3qfk7f4axls@awork3.anarazel.de	2023-04-24 12:21:02 -07:00
Tom Lane	fce3b26e97	Rename ExecAggTransReparent, and improve its documentation. The name of this function suggests that it ought to reparent R/W expanded objects to be children of the persistent aggcontext, instead of copying them. In fact it does no such thing, and if you try to make it do so you will see multiple regression failures. Rename it to the less-misleading ExecAggCopyTransValue, and add commentary about why that attractive-sounding optimization won't work. Also adjust comments at call sites, some of which were describing logic that has since been moved into ExecAggCopyTransValue. Discussion: https://postgr.es/m/3004282.1681930251@sss.pgh.pa.us	2023-04-24 13:01:33 -04:00
Peter Eisentraut	5a8366ad75	doc: Update SQL features names Some feature names have been adjusted over time upstream. This pulls in those changes.	2023-04-24 15:39:54 +02:00
Daniel Gustafsson	69537f5d17	Remove duplicate lines of code Commit `6df7a9698b` accidentally included two identical prototypes for default_multirange_selectivi() and commit `086cf1458c` added a break; statement where one was already present, thus duplicating it. While there is no bug caused by this, fix by removing the duplicated lines as they provide no value. Backpatch the fix for duplicate prototypes to v14 and the duplicate break statement fix to all supported branches to avoid backpatching hazards due to the removal. Reported-by: Anton Voloshin <a.voloshin@postgrespro.ru> Discussion: https://postgr.es/m/0e69cb60-0176-f6d0-7e15-6478b7d85724@postgrespro.ru	2023-04-24 11:16:17 +02:00
Masahiko Sawada	781ac42d43	Use elog to report unexpected action in handle_streamed_transaction(). An oversight in commit `216a784829`. Author: Masahiko Sawada Reviewed-by: Kyotaro Horiguchi, Amit Kapila Discussion: https://postgr.es/m/CAD21AoDDbM8_HJt-nMCvcjTK8K9hPzXWqJj7pyaUvR4mm_NrSg@mail.gmail.com	2023-04-24 15:37:14 +09:00
Alexander Korotkov	cd115c3530	Fix custom validators call in build_local_reloptions() We need to call them only when validate == true. Backpatch to 13, where opclass options were introduced. Reported-by: Tom Lane Discussion: https://postgr.es/m/2656633.1681831542%40sss.pgh.pa.us Reviewed-by: Tom Lane, Pavel Borisov Backpatch-through: 13	2023-04-23 13:58:41 +03:00
Jeff Davis	c04c6c5d6f	Avoid character classification in regex escape parsing. For regex escape sequences, just test directly for the relevant ASCII characters rather than using locale-sensitive character classification. This fixes an assertion failure when a locale considers a non-ASCII character, such as "൧", to be a digit. Reported-by: Richard Guo Discussion: https://postgr.es/m/CAMbWs49Q6UoKGeT8pBkMtJGJd+16CBFZaaWUk9Du+2ERE5g_YA@mail.gmail.com Backpatch-through: 11	2023-04-21 08:19:41 -07:00
David Rowley	d91d163529	Fix incorrect function name reference This function was renamed in `0c9d84427` but this comment wasn't updated. Author: Alexander Lakhin Discussion: https://postgr.es/m/699beab4-a6ca-92c9-f152-f559caf6dc25@gmail.com	2023-04-21 10:46:08 +12:00
Michael Paquier	0ecb87e1fa	Remove io prefix from pg_stat_io columns `a9c70b46` added the statistics view pg_stat_io which contained columns "io_context" and "io_object". Given that the columns are in the pg_stat_io view, the "io" prefix is somewhat redundant, so remove it. The code variables referring to these fields are kept unchanged so as they can keep their context about I/O. Bump catalog version. Author: Melanie Plageman Reviewed-by: Kyotaro Horiguchi, Fabrízio de Royes Mello Discussion: https://postgr.es/m/CAAKRu_aAQoJWrvT2BYYQvJChFKra_O-5ra3jhzKJZqWsTR1CPQ@mail.gmail.com	2023-04-21 07:21:50 +09:00
Daniel Gustafsson	a9781ae11b	Fix autovacuum cost debug logging Commit `7d71d3dd0` introduced finer grained updates of autovacuum option changes by increasing the frequency of reading the configuration file. The debug logging of cost parameter was however changed such that some initial values weren't logged. Fix by changing logging to use the old frequency of logging regardless of them changing. Also avoid taking a log for rendering the log message unless the set loglevel is such that the log entry will be emitted. Author: Masahiko Sawada <sawada.mshk@gmail.com> Discussion: https://postgr.es/m/CAD21AoBS7o6Ljt_vfqPQPf67AhzKu3fR0iqk8B=vVYczMugKMQ@mail.gmail.com	2023-04-20 15:45:44 +02:00
Amit Kapila	c1cc4e688b	Restart the apply worker if the 'password_required' option is changed. The apply worker is restarted if any subscription option that affects the remote connection was changed. In commit `c3afe8cf5a`, we added the option 'password_required' which can affect the remote connection, so we should restart the worker if it was changed. Author: Amit Kapila Reviewed-by: Robert Haas Discussion: https://postgr.es/m/CAA4eK1+z9UDFEynXLsWeMMuUZc1iQkRwj2HNDtxUHTPo-u1F4A@mail.gmail.com Discussion: https://postgr.es/m/9DFC88D3-1300-4DE8-ACBC-4CEF84399A53@enterprisedb.com	2023-04-20 08:56:18 +05:30
Thomas Munro	7d3d72b55e	Remove obsolete defense against strxfrm() bugs. Old versions of Solaris and illumos had buffer overrun bugs in their strxfrm() implementations. The bugs were fixed more than a decade ago and the relevant releases are long out of vendor support. It's time to remove the defense added by commit `be8b06c3`. Reviewed-by: Nathan Bossart <nathandbossart@gmail.com> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/CA+hUKGJ-ZPJwKHVLbqye92-ZXeLoCHu5wJL6L6HhNP7FkJ=meA@mail.gmail.com	2023-04-20 13:20:14 +12:00
David Rowley	e35ded2956	Fix list_copy_head() with empty Lists list_copy_head() given an empty List would crash from trying to dereference the List to obtain its length. Since NIL is how we represent an empty List, we should just be returning another empty List in this case. list_copy_head() is new to v16, so let's fix it now before too many people start coding around the buggy NIL behavior. Reported-by: Miroslav Bendik Discussion: https://postgr.es/m/CAPoEpV02WhawuWnmnKet6BqU63bEu7oec0pJc=nKMtPsHMzTXQ@mail.gmail.com	2023-04-20 10:34:46 +12:00
Peter Geoghegan	2584639653	Use nbtdesc "level" field name consistently. The "lev" name that appeared in NEWROOT nbtree record desc output was inconsistent with the symbol name from the underlying C struct. It was also inconsistent with nbtdesc output for other nearby record types with similar level fields. Standardize on "level" to make everything consistent. Follow-up to commit `1c453cfd`.	2023-04-19 12:15:15 -07:00
Peter Geoghegan	50547a3fae	Fix wal_consistency_checking enhanced desc output. Recent enhancements to rmgr desc routines that made the output summarize certain block data (added by commits `7d8219a4` and `1c453cfd`) dealt with records that lack relevant block data (and so have nothing to give a more detailed summary of) by testing !DecodedBkpBlock.has_image. As a result, more detailed descriptions of block data were not output when wal_consistency_checking was enabled. This bug affected records with summarizable block data that also happened to have an FPI that the REDO routine isn't supposed to apply (FPIs used for consistency checking purposes only). The presence of such an FPI was incorrectly taken to indicate the absence of block data. To fix, test DecodedBkpBlock.has_data, not !DecodedBkpBlock.has_image. This is the exact condition that we care about, not an inexact proxy. Author: Peter Geoghegan <pg@bowt.ie> Discussion: https://postgr.es/m/CAH2-Wzm5Sc9cBg1qWV_cEBfLNJCrW9FjS-SoHVt8FLA7Ldn8yg@mail.gmail.com	2023-04-19 10:42:39 -07:00
Peter Eisentraut	77dedeb2c4	Remove some tabs in SQL code in C string literals This is not handled uniformly throughout the code, but at least nearby code can be consistent.	2023-04-19 09:29:43 +02:00
David Rowley	3f58a4e296	Fix various typos and incorrect/outdated name references Author: Alexander Lakhin Discussion: https://postgr.es/m/699beab4-a6ca-92c9-f152-f559caf6dc25@gmail.com	2023-04-19 13:50:33 +12:00
Peter Geoghegan	06e0652750	Remove useless argument from nbtree dedup function. _bt_dedup_pass()'s heapRel argument hasn't been needed or used since commit `cf2acaf4dc` made deleting any existing LP_DEAD index tuples the caller's responsibility.	2023-04-18 10:33:15 -07:00
Robert Haas	363e8f9115	Fix pg_basebackup with in-place tablespaces some more. Commit `c6f2f01611` purported to make this work, but problems remained. In a plain-format backup, the files from an in-place tablespace got included in the tar file for the main tablespace, which is wrong but it's not clear that it has any user-visible consequences. In a tar-format backup, the TABLESPACE_MAP option is used, and so we never iterated over pg_tblspc and thus never backed up the in-place tablespaces anywhere at all. To fix this, reverse the changes in that commit, so that when we scan pg_tblspc during a backup, we create tablespaceinfo objects even for in-place tablespaces. We set the field that would normally contain the absolute pathname to the relative path pg_tblspc/${TSOID}, and that's good enough to make basebackup.c happy without any further changes. However, pg_basebackup needs a couple of adjustments to make it work. First, it needs to understand that a relative path for a tablespace means it's an in-place tablespace. Second, it needs to tolerate the situation where restoring the main tablespace tries to create pg_tblspc or a subdirectory and finds that it already exists, because we restore user-defined tablespaces before the main tablespace. Since in-place tablespaces are only intended for use in development and testing, no back-patch. Patch by me, reviewed by Thomas Munro and Michael Paquier. Discussion: http://postgr.es/m/CA+TgmobwvbEp+fLq2PykMYzizcvuNv0a7gPMJtxOTMOuuRLMHg@mail.gmail.com	2023-04-18 11:23:34 -04:00
David Rowley	eef231e816	Fix some typos and some incorrectly duplicated words Author: Justin Pryzby Reviewed-by: David Rowley Discussion: https://postgr.es/m/ZD3D1QxoccnN8A1V@telsasoft.com	2023-04-18 14:03:49 +12:00
David Rowley	b4dbf3e924	Fix various typos This fixes many spelling mistakes in comments, but a few references to invalid parameter names, function names and option names too in comments and also some in string constants Also, fix an #undef that was undefining the incorrect definition Author: Alexander Lakhin Reviewed-by: Justin Pryzby Discussion: https://postgr.es/m/d5f68d19-c0fc-91a9-118d-7c6a5a3f5fad@gmail.com	2023-04-18 13:23:23 +12:00
Jeff Davis	e39d512f3e	Comment fix for `60684dd834`. Reported-by: Peter Eisentraut Discussion: https://postgr.es/m/766f3799-0269-162f-ba63-4cae34a5534f@enterprisedb.com	2023-04-17 13:45:50 -07:00
Tom Lane	3e383f9b68	Avoid trying to write an empty WAL record in log_newpage_range(). If the last few pages in the specified range are empty (all zero), then log_newpage_range() could try to emit an empty WAL record containing no FPIs. This at least upsets an Assert in ReserveXLogInsertLocation, and might perhaps have bad real-world consequences in non-assert builds. This has been broken since log_newpage_range() was introduced, but the case was hard if not impossible to hit before commit `3d6a98457` decided it was okay to leave VM and FSM pages intentionally zero. Nonetheless, it seems prudent to back-patch. log_newpage_range() was added in v12 but later back-patched, so this affects all supported branches. Matthias van de Meent, per report from Justin Pryzby Discussion: https://postgr.es/m/ZD1daibg4RF50IOj@telsasoft.com	2023-04-17 14:22:26 -04:00
Peter Eisentraut	1c77de9801	doc: Add additional SQL features codes from SQL:2023 These were mysteriously omitted in `c9f57541d9`.	2023-04-17 16:06:41 +02:00
Tom Lane	78d5952dd0	Ensure result of an aggregate's finalfunc is made read-only. The finalfunc might return a read-write expanded object. If we de-duplicate multiple call sites for the aggregate, any function(s) receiving the aggregate result earlier could alter or destroy the value that reaches the ones called later. This is a brown-paper-bag bug in commit `42b746d4c`, because we actually considered the need for read-only-ness but failed to realize that it applied to the case with a finalfunc as well as the case without. Per report from Justin Pryzby. New error in HEAD, no need for back-patch. Discussion: https://postgr.es/m/ZDm5TuKsh3tzoEjz@telsasoft.com	2023-04-16 14:16:40 -04:00
Tom Lane	064eb89e83	Fix assignment to array of domain over composite, redux. Commit `3e310d837` taught isAssignmentIndirectionExpr() to look through CoerceToDomain nodes. That's not sufficient, because since commit `04fe805a1` it's been possible for the planner to simplify CoerceToDomain to RelabelType when the domain has no constraints to enforce. So we need to look through RelabelType too. Per bug #17897 from Alexander Lakhin. Although `3e310d837` was back-patched to v11, it seems sufficient to apply this change to v12 and later, since `04fe805a1` came in in v12. Dmitry Dolgov Discussion: https://postgr.es/m/17897-4216c546c3874044@postgresql.org	2023-04-15 12:01:39 -04:00
David Rowley	414d66220a	Adjust Valgrind macro usage to protect chunk headers Prior to this commit we only ever protected MemoryChunk's requested_size field with Valgrind NOACCESS. This means that if the hdrmask field is ever accessed accidentally then we're not going to get any warnings from Valgrind about it. Valgrind would have warned us about the problem fixed in `92957ed98` had we already been doing this. Per suggestion from Tom Lane Reviewed-by: Richard Guo Discussion: https://postgr.es/m/1650235.1672694719@sss.pgh.pa.us Discussion: https://postgr.es/m/CAApHDvr=FZNGbj252Z6M9BSFKoq6BMxgkQ2yEAGUYoo7RquqZg@mail.gmail.com	2023-04-15 11:59:52 +12:00
Andres Freund	43a33ef54e	Support RBM_ZERO_AND_CLEANUP_LOCK in ExtendBufferedRelTo(), add tests For some reason I had not implemented RBM_ZERO_AND_CLEANUP_LOCK support in ExtendBufferedRelTo(), likely thinking it not being reachable. But it is reachable, e.g. when replaying a WAL record for a page in a relation that subsequently is truncated (likely only reachable when doing crash recovery or PITR, not during ongoing streaming replication). As now all of the RBM_* modes are supported, remove assertions checking mode. As we had no test coverage for this scenario, add a new TAP test. There's plenty more that ought to be tested in this area... Reported-by: Tom Lane <tgl@sss.pgh.pa.us> Reported-by: Alexander Lakhin <exclusion@gmail.com> Reviewed-by: Alexander Lakhin <exclusion@gmail.com> Discussion: https://postgr.es/m/392271.1681238924%40sss.pgh.pa.us Discussion: https://postgr.es/m/0b5eb82b-cb99-e0a4-b932-3dc60e2e3926@gmail.com	2023-04-14 11:30:33 -07:00
Tom Lane	e4d905f772	NULL is not an ideal way to spell bool "false". Thinko in commit `6633cfb21`, detected by buildfarm member hamerkop.	2023-04-14 13:31:51 -04:00
David Rowley	e0693faf79	Fix incorrect partition pruning logic for boolean partitioned tables The partition pruning logic assumed that "b IS NOT true" was exactly the same as "b IS FALSE". This is not the case when considering NULL values. Fix this so we correctly include any partition which could hold NULL values for the NOT case. Additionally, this fixes a bug in the partition pruning code which handles partitioned tables partitioned like ((NOT boolcol)). This is a seemingly unlikely schema design, and it was untested and also broken. Here we add tests for the ((NOT boolcol)) case and insert some actual data into those tables and verify we do get the correct rows back when running queries. I've also adjusted the existing boolpart tests to include some data and verify we get the correct results too. Both the bugs being fixed here could lead to incorrect query results with fewer rows being returned than expected. No additional rows could have been returned accidentally. In passing, remove needless ternary expression. It's more simple just to pass !is_not_clause to makeBoolConst(). It makes sense to do this so the code is consistent with the bug fix in the "else if" condition just below. David Kimura did submit a patch to fix the first of the issues here, but that's not what's being committed here. Reported-by: David Kimura Reviewed-by: Richard Guo, David Kimura Discussion: https://postgr.es/m/CAHnPFjQ5qxs6J_p+g8=ww7GQvfn71_JE+Tygj0S7RdRci1uwPw@mail.gmail.com Backpatch-through: 11, all supported versions	2023-04-14 16:20:27 +12:00
Thomas Munro	558c9d75fe	Fix PHJ match bit initialization. Hash join tuples reuse the HOT status bit to indicate match status during hash join execution. Correct reuse requires clearing the bit in all tuples. Serial hash join and parallel multi-batch hash join do so upon inserting the tuple into the hashtable. Single batch parallel hash join and batch 0 of unexpected multi-batch hash joins forgot to do this. It hadn't come up before because hashtable tuple match bits are only used for right and full outer joins and parallel ROJ and FOJ were unsupported. `11c2d6fdf5` introduced support for parallel ROJ/FOJ but neglected to ensure the match bits were reset. Author: Melanie Plageman <melanieplageman@gmail.com> Reported-by: Richard Guo <guofenglinux@gmail.com> Discussion: https://postgr.es/m/flat/CAMbWs48Nde1Mv%3DBJv6_vXmRKHMuHZm2Q_g4F6Z3_pn%2B3EV6BGQ%40mail.gmail.com	2023-04-14 11:02:38 +12:00
Michael Paquier	a282697088	Remove code in charge of freeing regexps generation by Lab.c `bea3d7e` has redesigned the regexp engine so as all the allocations go through palloc() with a dedicated memory context. hba.c had to cope with the past memory management logic by going through all the HBA and ident lines generated, then directly free all the regexps found in AuthTokens to ensure that no leaks would happen. Such leaks could happen for example in the postmaster after a SIGHUP, in the event of an HBA and/or ident reload failure where all the new content parsed must be discarded, including all the regexps that may have been compiled. Now that regexps are palloc()'d in their own memory context, MemoryContextDelete() is enough to ensure that all the compiled regexps are properly gone. Simplifying this logic in hba.c has the effect to only remove code. Most of it is new in v16, except the part for regexps compiled in ident entries for the system username, so doing this cleanup now rather than when v17 opens for business will reduce future diffs with the upcoming REL_16_STABLE. Some comments were incorrect since `bea3d7e`, now fixed to reflect the reality. Reviewed-by: Bertrand Drouvot, Álvaro Herrera Discussion: https://postgr.es/m/ZDdJ289Ky2qEj4h+@paquier.xyz	2023-04-14 07:27:44 +09:00
David Rowley	0981846b9c	Remove old GUC name mapping for "force_parallel_mode" This GUC was renamed to debug_parallel_query in `5352ca22e`. That commit added an entry into map_old_guc_names[] to allow the old name still to work. That was done to allow a transition time where the buildfarm configs could be swapped over to use debug_parallel_query instead. That work is now complete. Here we remove the old name with the intention of breaking any user code which is using force_parallel_query. As mentioned in the commit message for `5352ca22e`, it appeared many users were misled into thinking that setting this GUC was doing something useful for them to make queries run more quickly. Discussion: https://postgr.es/m/CAApHDvoR7EOz7Tvyzrd18FO-Dw2Cp4Uyq25TEWguK+XyCJtzOw@mail.gmail.com	2023-04-14 10:19:45 +12:00
Peter Geoghegan	d6f0f95a6b	Harmonize some more function parameter names. Make sure that function declarations use names that exactly match the corresponding names from function definitions in a few places. These inconsistencies were all introduced relatively recently, after the code base had parameter name mismatches fixed in bulk (see commits starting with commits `4274dc22` and `035ce1fe`). pg_bsd_indent still has a couple of similar inconsistencies, which I (pgeoghegan) have left untouched for now. Like all earlier commits that cleaned up function parameter names, this commit was written with help from clang-tidy.	2023-04-13 10:15:20 -07:00
Stephen Frost	f7431bca8b	Explicitly require MIT Kerberos for GSSAPI WHen building with GSSAPI support, explicitly require MIT Kerberos and check for gssapi_ext.h in configure.ac and meson.build. Also add documentation explicitly stating that we now require MIT Kerberos when building with GSSAPI support. Reveiwed by: Johnathan Katz Discussion: https://postgr.es/m/abcc73d0-acf7-6896-e0dc-f5bc12a61bb1@postgresql.org	2023-04-13 08:55:13 -04:00
Stephen Frost	6633cfb216	De-Revert "Add support for Kerberos credential delegation" This reverts commit `3d03b24c3` (Revert Add support for Kerberos credential delegation) which was committed on the grounds of concern about portability, but on further review and discussion, it's clear that we are better off explicitly requiring MIT Kerberos as that appears to be the only GSSAPI library currently that's under proper maintenance and ongoing development. The API used for storing credentials was added to MIT Kerberos over a decade ago while for the other libraries which appear to be mainly based on Heimdal, which exists explicitly to be a re-implementation of MIT Kerberos, the API never made it to a released version (even though it was added to the Heimdal git repo over 5 years ago..). This post-feature-freeze change was approved by the RMT. Discussion: https://postgr.es/m/ZDDO6jaESKaBgej0%40tamriel.snowman.net	2023-04-13 08:55:07 -04:00
Thomas Munro	b37d051b0e	Remove overzealous assertion from PHJ. We can't assert that we're the only process attached to a barrier after BarrierArriveAndDetachExceptLast(). Although that'll be true almost always, a late-starting parallel worker can attach very briefly (that is, immediately detach after checking the phase) right at that moment. BarrierArriveAndDetachExceptLast() already contains an assertion like that, but it holds a spinlock preventing the race. This thinko caused a one-off failure on build farm animal chimaera. Diagnosed-by: Melanie Plageman <melanieplageman@gmail.com> Reported-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/3590249.1680971629@sss.pgh.pa.us	2023-04-13 09:37:54 +12:00
Andres Freund	5ec69b71f1	Improve error messages introduced in `be87200efd` and `0fdab27ad6` Author: Kyotaro Horiguchi <horikyota.ntt@gmail.com> Discussion: https://postgr.es/m/20230411.120301.93333867350615278.horikyota.ntt@gmail.com Discussion: https://postgr.es/m/20230412174244.6njadz4uoiez3l74@awork3.anarazel.de	2023-04-12 11:00:37 -07:00
Alvaro Herrera	9ce04b50e1	Revert "Catalog NOT NULL constraints" and fallout This reverts commit `e056c557ae` and minor later fixes thereof. There's a few problems in this new feature -- most notably regarding pg_upgrade behavior, but others as well. This new feature is not in any way critical on its own, so instead of scrambling to fix it we revert it and try again in early 17 with these issues in mind. Discussion: https://postgr.es/m/3801207.1681057430@sss.pgh.pa.us	2023-04-12 19:29:21 +02:00
Tom Lane	88ceac5d77	Fix parallel-safety marking when moving initplans to another node. Our policy since commit `ab77a5a45` has been that a plan node having any initplans is automatically not parallel-safe. (This could be relaxed, but not today.) clean_up_removed_plan_level neglected this, and could attach initplans to a parallel-safe child plan node without clearing the plan's parallel-safe flag. That could lead to "subplan was not initialized" errors at runtime, in case an initplan referenced another one and only the referencing one got transmitted to parallel workers. The fix in clean_up_removed_plan_level is trivial enough. materialize_finished_plan also moves initplans from one node to another, but it's okay because it already copies the source node's parallel_safe flag. The other place that does this kind of thing is standard_planner's hack to inject a top-level Gather when debug_parallel_query is active. But that's actually dead code given that we're correctly enforcing the "initplans aren't parallel safe" rule, so just replace it with an Assert that there are no initplans. Also improve some related comments. Normally we'd add a regression test case for this sort of bug. The mistake itself is already reached by existing tests, but there is accidentally no visible problem. The only known test case that creates an actual failure seems too indirect and fragile to justify keeping it as a regression test (not least because it fails to fail in v11, though the bug is clearly present there too). Per report from Justin Pryzby. Back-patch to all supported branches. Discussion: https://postgr.es/m/ZDVt6MaNWkRDO1LQ@telsasoft.com	2023-04-12 10:46:38 -04:00
Peter Eisentraut	5f38a2034e	Fix incorrect format placeholders	2023-04-12 10:05:50 +02:00
Peter Geoghegan	c03c2eae0a	Refine the guidelines for rmgrdesc authors. Clarify the goals of the recently added guidelines for rmgrdesc authors: to avoid gratuitous inconsistencies across resource managers, and to make it reasonably easy to write a reusable custom parser. Beyond that, the guidelines leave rmgrdesc authors with a significant amount of leeway. This even includes the leeway to invent custom conventions (in cases where it's warranted). Follow-up to commit `7d8219a4`. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Melanie Plageman <melanieplageman@gmail.com> Discussion: https://postgr.es/m/CAH2-WzkbYuvwYKm-Y-72QEh6SPMQcAo9uONv+mR3bMGcu9E_Cg@mail.gmail.com	2023-04-11 15:26:24 -07:00
Peter Geoghegan	96149a180d	Fix Heap rmgr's desc output for infobits arrays. Make heap desc routines that output status bit as arrays of constants avoid outputting array literals that contain superfluous punctuation characters that complicate parsing the output. Also make sure that no heap desc routine repeats the same key name (at the same nesting level), for the same reason. Arguably, these were both oversights in commit `7d8219a4`. In passing, make the desc output code (which covers Heap's DELETE, UPDATE, HOT_UPDATE, LOCK, and LOCK_UPDATED record types) consistent in terms of the output order of each field. This order also matches WAL record struct order. Heap's DELETE desc output now shows the record's xmax field for the first time (just like UPDATE/HOT_UPDATE records). Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Melanie Plageman <melanieplageman@gmail.com> Discussion: https://postgr.es/m/CAH2-Wz=pNYtxiJ2Jx5Lj=fKo1OEZ4GE0p_kct+ugAUTqBwU46g@mail.gmail.com	2023-04-11 15:25:02 -07:00
Peter Geoghegan	e944063294	Fix xl_heap_lock WAL record field's data type. Make xl_heap_lock's infobits_set field of type uint8, not int8. Using int8 isn't appropriate given that the field just holds status bits. This fixes an oversight in commit `0ac5ad5134`. In passing rename the nearby TransactionId field to "xmax" to make things consistency with related records, such as xl_heap_lock_updated. Deliberately avoid a bump in XLOG_PAGE_MAGIC. No backpatch, either. Author: Peter Geoghegan <pg@bowt.ie> Discussion: https://postgr.es/m/CAH2-WzkCd3kOS8b7Rfxw7Mh1_6jvX=Nzo-CWR1VBTiOtVZkWHA@mail.gmail.com	2023-04-11 14:07:54 -07:00
David Rowley	4c8a1b4769	Fix uninitialized variable in transformTableLikeClause() process_notnull_constraints should be set to false until we discover a NOT NULL column. Discovered while running Valgrind. Discussion: https://postgr.es/m/CAApHDvoMyiZVi1KW5WVdqMRzWsWkD3F7n6QD+BbAO6WTeAWsUQ@mail.gmail.com	2023-04-11 23:01:12 +12:00
David Rowley	68a2a437f4	Improve ereports for VACUUM's BUFFER_USAGE_LIMIT option There's no need to check if opt->arg is NULL since defGetString() already does that and raises an ERROR if it is. Let's just remove that check. Also, combine the two remaining ERRORs into a single check. It seems better to give an indication about what sort of values we're looking for rather than just to state that the value given isn't valid. Make BUFFER_USAGE_LIMIT uppercase in this ERROR message too. It's already upper case in one other error message, so make that consistent. Reported-by: Kyotaro Horiguchi Discussion: https://postgr.es/m/20230411.102335.1643720544536884844.horikyota.ntt@gmail.com	2023-04-11 19:36:34 +12:00
Peter Geoghegan	26e65ebdb2	Clarify nbtree posting list update desc issue. Per complaint from Melanie Plageman. Follow-up to commit `5d6728e5`. Reported-By: Melanie Plageman <melanieplageman@gmail.com> Discussion: https://postgr.es/m/20230411002315.oyaicmcqrq2hb3ek@liskov	2023-04-10 17:55:23 -07:00
Peter Geoghegan	5d6728e588	Fix nbtree posting list update desc output. We cannot use the generic array_desc approach with per-tuple nbtree posting list update metadata because array_desc can only deal with fixed width elements (e.g., page offset numbers). Using array_desc led to incorrect rmgr descriptions for updates from nbtree DELETE/VACUUM WAL records. To fix, add specialized code to describe the update metadata as array elements in desc output. We now iterate over the update metadata using an approach that matches related REDO routines. Also stop showing the updates offset number array separately in nbtree DELETE/VACUUM desc output. It's redundant information, since the same page offset numbers appear in the description of each individual update element. Also make some small tweaks to the way that we format arrays in all desc routines (not just nbtree desc routines) to make arrays a little less verbose. Oversight in commit `1c453cfd`, which enhanced the nbtree rmgr desc routines. Author: Peter Geoghegan <pg@bowt.ie> Discussion: https://postgr.es/m/CAH2-WzkbYuvwYKm-Y-72QEh6SPMQcAo9uONv+mR3bMGcu9E_Cg@mail.gmail.com	2023-04-10 11:15:41 -07:00
Stephen Frost	3d03b24c35	Revert "Add support for Kerberos credential delegation" This reverts commit `3d4fa227bc`. Per discussion and buildfarm, this depends on APIs that seem to not be available on at least one platform (NetBSD). Should be certainly possible to rework to be optional on that platform if necessary but bit late for that at this point. Discussion: https://postgr.es/m/3286097.1680922218@sss.pgh.pa.us	2023-04-08 07:21:35 -04:00
Thomas Munro	db4f21e4a3	Redesign interrupt/cancel API for regex engine. Previously, a PostgreSQL-specific callback checked by the regex engine had a way to trigger a special error code REG_CANCEL if it detected that the next call to CHECK_FOR_INTERRUPTS() would certainly throw via ereport(). A later proposed bugfix aims to move some complex logic out of signal handlers, so that it won't run until the next CHECK_FOR_INTERRUPTS(), which makes the above design impossible unless we split CHECK_FOR_INTERRUPTS() into two phases, one to run logic and another to ereport(). We may develop such a system in the future, but for the regex code it is no longer necessary. An earlier commit moved regex memory management over to our MemoryContext system. Given that the purpose of the two-phase interrupt checking was to free memory before throwing, something we don't need to worry about anymore, it seems simpler to inject CHECK_FOR_INTERRUPTS() directly into cancelation points, and just let it throw. Since the plan is to keep PostgreSQL-specific concerns separate from the main regex engine code (with a view to bein able to stay in sync with other projects), do this with a new macro INTERRUPT(), customizable in regcustom.h and defaulting to nothing. Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/CA%2BhUKGK3PGKwcKqzoosamn36YW-fsuTdOPPF1i_rtEO%3DnEYKSg%40mail.gmail.com	2023-04-08 22:10:39 +12:00
Thomas Munro	4f51429dd7	Update tsearch regex memory management. Now that our regex engine uses palloc(), it's not necessary to set up a special memory context callback to free compiled regexes. The regex has no resources other than the memory that is already going to be freed in bulk. Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/CA%2BhUKGK3PGKwcKqzoosamn36YW-fsuTdOPPF1i_rtEO%3DnEYKSg%40mail.gmail.com	2023-04-08 22:09:17 +12:00
Thomas Munro	bea3d7e383	Use MemoryContext API for regex memory management. Previously, regex_t objects' memory was managed with malloc() and free() directly. Switch to palloc()-based memory management instead. Advantages: * memory used by cached regexes is now visible with MemoryContext observability tools * cleanup can be done automatically in certain failure modes (something that later commits will take advantage of) * cleanup can be done in bulk On the downside, there may be more fragmentation (wasted memory) due to per-regex MemoryContext objects. This is a problem shared with other cached objects in PostgreSQL and can probably be improved with later tuning. Thanks to Noah Misch for suggesting this general approach, which unblocks later work on interrupts. Suggested-by: Noah Misch <noah@leadboat.com> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/CA%2BhUKGK3PGKwcKqzoosamn36YW-fsuTdOPPF1i_rtEO%3DnEYKSg%40mail.gmail.com	2023-04-08 22:08:41 +12:00
Andres Freund	0fdab27ad6	Allow logical decoding on standbys Unsurprisingly, this requires wal_level = logical to be set on the primary and standby. The infrastructure added in `26669757b6` ensures that slots are invalidated if the primary's wal_level is lowered. Creating a slot on a standby waits for a xl_running_xact record to be processed. If the primary is idle (and thus not emitting xl_running_xact records), that can take a while. To make that faster, this commit also introduces the pg_log_standby_snapshot() function. By executing it on the primary, completion of slot creation on the standby can be accelerated. Note that logical decoding on a standby does not itself enforce that required catalog rows are not removed. The user has to use physical replication slots + hot_standby_feedback or other measures to prevent that. If catalog rows required for a slot are removed, the slot is invalidated. See `6af1793954` for an overall design of logical decoding on a standby. Bumps catversion, for the addition of the pg_log_standby_snapshot() function. Author: "Drouvot, Bertrand" <bertranddrouvot.pg@gmail.com> Author: Andres Freund <andres@anarazel.de> (in an older version) Author: Amit Khandekar <amitdkhan.pg@gmail.com> (in an older version) Reviewed-by: Andres Freund <andres@anarazel.de> Reviewed-by: FabrÌzio de Royes Mello <fabriziomello@gmail.com> Reviewed-by: Amit Kapila <amit.kapila16@gmail.com> Reviewed-By: Robert Haas <robertmhaas@gmail.com>	2023-04-08 02:20:05 -07:00
Andres Freund	e101dfac3a	For cascading replication, wake physical and logical walsenders separately Physical walsenders can't send data until it's been flushed; logical walsenders can't decode and send data until it's been applied. On the standby, the WAL is flushed first, which will only wake up physical walsenders; and then applied, which will only wake up logical walsenders. Previously, all walsenders were awakened when the WAL was flushed. That was fine for logical walsenders on the primary; but on the standby the flushed WAL would have been not applied yet, so logical walsenders were awakened too early. Per idea from Jeff Davis and Amit Kapila. Author: "Drouvot, Bertrand" <bertranddrouvot.pg@gmail.com> Reviewed-By: Jeff Davis <pgsql@j-davis.com> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Reviewed-by: Amit Kapila <amit.kapila16@gmail.com> Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com> Discussion: https://postgr.es/m/CAA4eK1+zO5LUeisabX10c81LU-fWMKO4M9Wyg1cdkbW7Hqh6vQ@mail.gmail.com	2023-04-08 01:06:00 -07:00
Andres Freund	26669757b6	Handle logical slot conflicts on standby During WAL replay on the standby, when a conflict with a logical slot is identified, invalidate such slots. There are two sources of conflicts: 1) Using the information added in `6af1793954`, logical slots are invalidated if required rows are removed 2) wal_level on the primary server is reduced to below logical Uses the infrastructure introduced in the prior commit. FIXME: add commit reference. Change InvalidatePossiblyObsoleteSlot() to use a recovery conflict to interrupt use of a slot, if called in the startup process. The new recovery conflict is added to pg_stat_database_conflicts, as confl_active_logicalslot. See `6af1793954` for an overall design of logical decoding on a standby. Bumps catversion for the addition of the pg_stat_database_conflicts column. Bumps PGSTAT_FILE_FORMAT_ID for the same reason. Author: "Drouvot, Bertrand" <bertranddrouvot.pg@gmail.com> Author: Andres Freund <andres@anarazel.de> Author: Amit Khandekar <amitdkhan.pg@gmail.com> (in an older version) Reviewed-by: "Drouvot, Bertrand" <bertranddrouvot.pg@gmail.com> Reviewed-by: Andres Freund <andres@anarazel.de> Reviewed-by: Robert Haas <robertmhaas@gmail.com> Reviewed-by: Fabrízio de Royes Mello <fabriziomello@gmail.com> Reviewed-by: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> Reviewed-by: Amit Kapila <amit.kapila16@gmail.com> Reviewed-by: Alvaro Herrera <alvherre@alvh.no-ip.org> Discussion: https://postgr.es/m/20230407075009.igg7be27ha2htkbt@awork3.anarazel.de	2023-04-08 00:05:44 -07:00
Andres Freund	be87200efd	Support invalidating replication slots due to horizon and wal_level Needed for logical decoding on a standby. Slots need to be invalidated because of the horizon if rows required for logical decoding are removed. If the primary's wal_level is lowered from 'logical', logical slots on the standby need to be invalidated. The new invalidation methods will be used in a subsequent commit. Logical slots that have been invalidated can be identified via the new pg_replication_slots.conflicting column. See `6af1793954` for an overall design of logical decoding on a standby. Bumps catversion for the addition of the new pg_replication_slots column. Author: "Drouvot, Bertrand" <bertranddrouvot.pg@gmail.com> Author: Andres Freund <andres@anarazel.de> Author: Amit Khandekar <amitdkhan.pg@gmail.com> (in an older version) Reviewed-by: "Drouvot, Bertrand" <bertranddrouvot.pg@gmail.com> Reviewed-by: Andres Freund <andres@anarazel.de> Reviewed-by: Robert Haas <robertmhaas@gmail.com> Reviewed-by: Fabrízio de Royes Mello <fabriziomello@gmail.com> Reviewed-by: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> Reviewed-by: Amit Kapila <amit.kapila16@gmail.com> Reviewed-by: Melanie Plageman <melanieplageman@gmail.com> Reviewed-by: Alvaro Herrera <alvherre@alvh.no-ip.org> Discussion: https://postgr.es/m/20230407075009.igg7be27ha2htkbt@awork3.anarazel.de	2023-04-07 22:40:27 -07:00
Andres Freund	4397abd0a2	Prevent use of invalidated logical slot in CreateDecodingContext() Previously we had checks for this in multiple places. Support for logical decoding on standbys will add other forms of invalidation, making it worth while to centralize the checks. This slightly changes the error message for both the walsender and SQL interface. Particularly the SQL interface error was inaccurate, as the "This slot has never previously reserved WAL" portion was unreachable. Reviewed-by: "Drouvot, Bertrand" <bertranddrouvot.pg@gmail.com> Reviewed-by: Melanie Plageman <melanieplageman@gmail.com> Discussion: https://postgr.es/m/20230407075009.igg7be27ha2htkbt@awork3.anarazel.de	2023-04-07 22:19:05 -07:00
Andres Freund	15f8203a59	Replace replication slot's invalidated_at LSN with an enum This is mainly useful because the upcoming logical-decoding-on-standby feature adds further reasons for invalidating slots, and we don't want to end up with multiple invalidated_* fields, or check different attributes. Eventually we should consider not resetting restart_lsn when invalidating a slot due to max_slot_wal_keep_size. But that's a user visible change, so left for later. Increases SLOT_VERSION, due to the changed field (with a different alignment, no less). Reviewed-by: "Drouvot, Bertrand" <bertranddrouvot.pg@gmail.com> Reviewed-by: Alvaro Herrera <alvherre@alvh.no-ip.org> Reviewed-by: Melanie Plageman <melanieplageman@gmail.com> Discussion: https://postgr.es/m/20230407075009.igg7be27ha2htkbt@awork3.anarazel.de	2023-04-07 21:47:25 -07:00
Thomas Munro	d4e71df6d7	Add io_direct setting (developer-only). Provide a way to ask the kernel to use O_DIRECT (or local equivalent) where available for data and WAL files, to avoid or minimize kernel caching. This hurts performance currently and is not intended for end users yet. Later proposed work would introduce our own I/O clustering, read-ahead, etc to replace the facilities the kernel disables with this option. The only user-visible change, if the developer-only GUC is not used, is that this commit also removes the obscure logic that would activate O_DIRECT for the WAL when wal_sync_method=open_[data]sync and wal_level=minimal (which also requires max_wal_senders=0). Those are non-default and unlikely settings, and this behavior wasn't (correctly) documented. The same effect can be achieved with io_direct=wal. Author: Thomas Munro <thomas.munro@gmail.com> Author: Andres Freund <andres@anarazel.de> Author: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> Reviewed-by: Justin Pryzby <pryzby@telsasoft.com> Reviewed-by: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> Discussion: https://postgr.es/m/CA%2BhUKGK1X532hYqJ_MzFWt0n1zt8trz980D79WbjwnT-yYLZpg%40mail.gmail.com	2023-04-08 16:35:07 +12:00
Thomas Munro	faeedbcefd	Introduce PG_IO_ALIGN_SIZE and align all I/O buffers. In order to have the option to use O_DIRECT/FILE_FLAG_NO_BUFFERING in a later commit, we need the addresses of user space buffers to be well aligned. The exact requirements vary by OS and file system (typically sectors and/or memory pages). The address alignment size is set to 4096, which is enough for currently known systems: it matches modern sectors and common memory page size. There is no standard governing O_DIRECT's requirements so we might eventually have to reconsider this with more information from the field or future systems. Aligning I/O buffers on memory pages is also known to improve regular buffered I/O performance. Three classes of I/O buffers for regular data pages are adjusted: (1) Heap buffers are now allocated with the new palloc_aligned() or MemoryContextAllocAligned() functions introduced by commit `439f6175`. (2) Stack buffers now use a new struct PGIOAlignedBlock to respect PG_IO_ALIGN_SIZE, if possible with this compiler. (3) The buffer pool is also aligned in shared memory. WAL buffers were already aligned on XLOG_BLCKSZ. It's possible for XLOG_BLCKSZ to be configured smaller than PG_IO_ALIGNED_SIZE and thus for O_DIRECT WAL writes to fail to be well aligned, but that's a pre-existing condition and will be addressed by a later commit. BufFiles are not yet addressed (there's no current plan to use O_DIRECT for those, but they could potentially get some incidental speedup even in plain buffered I/O operations through better alignment). If we can't align stack objects suitably using the compiler extensions we know about, we disable the use of O_DIRECT by setting PG_O_DIRECT to 0. This avoids the need to consider systems that have O_DIRECT but can't align stack objects the way we want; such systems could in theory be supported with more work but we don't currently know of any such machines, so it's easier to pretend there is no O_DIRECT support instead. That's an existing and tested class of system. Add assertions that all buffers passed into smgrread(), smgrwrite() and smgrextend() are correctly aligned, unless PG_O_DIRECT is 0 (= stack alignment tricks may be unavailable) or the block size has been set too small to allow arrays of buffers to be all aligned. Author: Thomas Munro <thomas.munro@gmail.com> Author: Andres Freund <andres@anarazel.de> Reviewed-by: Justin Pryzby <pryzby@telsasoft.com> Discussion: https://postgr.es/m/CA+hUKGK1X532hYqJ_MzFWt0n1zt8trz980D79WbjwnT-yYLZpg@mail.gmail.com	2023-04-08 16:34:50 +12:00
Stephen Frost	3d4fa227bc	Add support for Kerberos credential delegation Support GSSAPI/Kerberos credentials being delegated to the server by a client. With this, a user authenticating to PostgreSQL using Kerberos (GSSAPI) credentials can choose to delegate their credentials to the PostgreSQL server (which can choose to accept them, or not), allowing the server to then use those delegated credentials to connect to another service, such as with postgres_fdw or dblink or theoretically any other service which is able to be authenticated using Kerberos. Both postgres_fdw and dblink are changed to allow non-superuser password-less connections but only when GSSAPI credentials have been delegated to the server by the client and GSSAPI is used to authenticate to the remote system. Authors: Stephen Frost, Peifeng Qiu Reviewed-By: David Christensen Discussion: https://postgr.es/m/CO1PR05MB8023CC2CB575E0FAAD7DF4F8A8E29@CO1PR05MB8023.namprd05.prod.outlook.com	2023-04-07 21:58:04 -04:00
Andres Freund	ac8d53dae5	Track IO times in pg_stat_io `a9c70b46db` and 8aaa04b32S added counting of IO operations to a new view, pg_stat_io. Now, add IO timing for reads, writes, extends, and fsyncs to pg_stat_io as well. This combines the tracking for pgBufferUsage with the tracking for pg_stat_io into a new function pgstat_count_io_op_time(). This should make it a bit easier to avoid the somewhat costly instr_time conversion done for pgBufferUsage. Author: Melanie Plageman <melanieplageman@gmail.com> Reviewed-by: Andres Freund <andres@anarazel.de> Reviewed-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com> Discussion: https://postgr.es/m/flat/CAAKRu_ay5iKmnbXZ3DsauViF3eMxu4m1oNnJXqV_HyqYeg55Ww%40mail.gmail.com	2023-04-07 17:04:56 -07:00
Peter Geoghegan	1c453cfd89	Show more detail in nbtree rmgr descriptions. Show a detailed description of the page offset number arrays that appear in certain nbtree WAL records. Also brings nbtree desc routines in line with the guidelines established by recent commit `7d8219a4`. Author: Melanie Plageman <melanieplageman@gmail.com> Reviewed-By: Peter Geoghegan <pg@bowt.ie> Discussion: https://postgr.es/m/flat/20230109215842.fktuhesvayno6o4g%40awork3.anarazel.de	2023-04-07 16:46:23 -07:00
Peter Geoghegan	7d8219a444	Show more detail in heapam rmgr descriptions. Add helper functions that output arrays in a standard format, and use the functions inside heapdesc routines. This allows tools like pg_walinspect to show a detailed description of the page offset number arrays for records like PRUNE and VACUUM (unless there was an FPI). Also document the conventions that desc routines should follow. Only the heapdesc routines follow the conventions for now, so they're just guidelines for the time being. Based on a suggestion from Andres Freund. Author: Melanie Plageman <melanieplageman@gmail.com> Reviewed-By: Peter Geoghegan <pg@bowt.ie> Discussion: https://postgr.es/m/flat/20230109215842.fktuhesvayno6o4g%40awork3.anarazel.de	2023-04-07 16:08:52 -07:00
Andres Freund	704261ecc6	Improve IO accounting for temp relation writes Both pgstat_database and pgBufferUsage count IO timing for reads of temporary relation blocks into local buffers. However, both failed to count write IO timing for flushes of dirty local buffers. Fix. Additionally, FlushRelationBuffers() seems to have omitted counting write IO (both count and timing) stats for both pgstat_database and pgBufferUsage. Fix. Author: Melanie Plageman <melanieplageman@gmail.com> Reviewed-by: Andres Freund <andres@anarazel.de> Discussion: https://postgr.es/m/20230321023451.7rzy4kjj2iktrg2r%40awork3.anarazel.de	2023-04-07 13:24:26 -07:00
Alvaro Herrera	e056c557ae	Catalog NOT NULL constraints We now create pg_constaint rows for NOT NULL constraints with contype='n'. We propagate these constraints during operations such as adding inheritance relationships, creating and attaching partitions, creating tables LIKE other tables. We mostly follow the well-known rules of conislocal and coninhcount that we have for CHECK constraints, with some adaptations; for example, as opposed to CHECK constraints, we don't match NOT NULL ones by name when descending a hierarchy to alter it; instead we match by column number. This means we don't require the constraint names to be identical across a hierarchy. For now, we omit them from system catalogs. Maybe this is worth reconsidering. We don't support NOT VALID nor DEFERRABLE clauses either; these can be added as separate features later (this patch is already large and complicated enough.) This has been very long in the making. The first patch was written by Bernd Helmle in 2010 to add a new pg_constraint.contype value ('n'), which I (Álvaro) then hijacked in 2011 and 2012, until that one was killed by the realization that we ought to use contype='c' instead: manufactured CHECK constraints. However, later SQL standard development, as well as nonobvious emergent properties of that design (mostly, failure to distinguish them from "normal" CHECK constraints as well as the performance implication of having to test the CHECK expression) led us to reconsider this choice, so now the current implementation uses contype='n' again. In 2016 Vitaly Burovoy also worked on this feature[1] but found no consensus for his proposed approach, which was claimed to be closer to the letter of the standard, requiring additional pg_attribute columns to track the OID of the NOT NULL constraint for that column. [1] https://postgr.es/m/CAKOSWNkN6HSyatuys8xZxzRCR-KL1OkHS5-b9qd9bf1Rad3PLA@mail.gmail.com Author: Álvaro Herrera <alvherre@alvh.no-ip.org> Author: Bernd Helmle <mailings@oopsware.de> Reviewed-by: Justin Pryzby <pryzby@telsasoft.com> Reviewed-by: Peter Eisentraut <peter.eisentraut@enterprisedb.com> Discussion: https://postgr.es/m/CACA0E642A0267EDA387AF2B%40%5B172.26.14.62%5D Discussion: https://postgr.es/m/AANLkTinLXMOEMz+0J29tf1POokKi4XDkWJ6-DDR9BKgU@mail.gmail.com Discussion: https://postgr.es/m/20110707213401.GA27098@alvh.no-ip.org Discussion: https://postgr.es/m/1343682669-sup-2532@alvh.no-ip.org Discussion: https://postgr.es/m/CAKOSWNkN6HSyatuys8xZxzRCR-KL1OkHS5-b9qd9bf1Rad3PLA@mail.gmail.com Discussion: https://postgr.es/m/20220817181249.q7qvj3okywctra3c@alvherre.pgsql	2023-04-07 19:59:57 +02:00
Tom Lane	ff245a3788	Doc: improve descriptions of max_[pred_]locks_per_transaction GUCs. The old wording described these as being multiplied by max_connections plus max_prepared_transactions, which hasn't been exactly right for some time thanks to the addition of various auxiliary processes. Moreover, exactness here is a bit pointless given that the lock tables can expand into the initially-unallocated "slop" space in shared memory. Rather than trying to track exactly what the code is doing, let's just use the term "server processes". Likewise adjust these GUCs' description strings in guc_tables.c. Wang Wei, reviewed by Nathan Bossart and myself Discussion: https://postgr.es/m/OS3PR01MB6275BDD09C9B875C65FCC5AB9EA39@OS3PR01MB6275.jpnprd01.prod.outlook.com	2023-04-07 13:29:29 -04:00
Tom Lane	888f2ea0a8	Add array_sample() and array_shuffle() functions. These are useful in Monte Carlo applications. Martin Kalcher, reviewed/adjusted by Daniel Gustafsson and myself Discussion: https://postgr.es/m/9d160a44-7675-51e8-60cf-6d64b76db831@aboutsource.net	2023-04-07 11:47:07 -04:00
Andres Freund	21d7c05a5c	Fix copy-paste bug in `12f3867f55` triggering an assert after a write error The same condition accidentally was copied to both branches. Manual testing confirms that otherwise the error recovery path works fine. Found while reviewing the logical-decoding-on-standby patch.	2023-04-07 01:02:46 -07:00
Michael Paquier	8fcb32db98	Add more protections in WAL record APIs against overflows This commit adds a limit to the size of an XLogRecord at 1020MB, based on a suggestion by Heikki Linnakangas. This counts for the overhead needed by the XLogReader when allocating the memory it needs to read a record in DecodeXLogRecordRequiredSpace(), based on the record size. An assertion based on that is added to detect that any additions in the XLogReader facilities would not cause any overflows. If that's ever the case, the upper bound allowed would need to be adjusted. Before this, it was possible for an external module to create WAL records large enough to be assembled but not replayable, causing failures when replaying such WAL records on standbys. One case mentioned where this is possible is the in-core function pg_logical_emit_message() (wrapper for LogLogicalMessage), that allows to emit WAL records with an arbitrary amount of data potentially higher than the replay limit of approximately 1GB (limit of a palloc, minus the overhead needed by a XLogReader). This commit is a follow-up of `ffd1b6b` that has added similar protections for the block-level data. Here, the checks are extended to the whole record length, mainrdata_len being extended from uint32 to uint64 with the routines registering buffer and record data still limited to uint32 to minimize the checks when assembling a record. All the error messages related to overflow checks are improved to provide more context about the error happening. Author: Matthias van de Meent Reviewed-by: Andres Freund, Heikki Linnakangas, Michael Paquier Discussion: https://postgr.es/m/CAEze2WgGiw+LZt+vHf8tWqB_6VxeLsMeoAuod0N=ij1q17n5pw@mail.gmail.com	2023-04-07 10:10:17 +09:00
Andres Freund	26158b852d	Use ExtendBufferedRelTo() in XLogReadBufferExtended() Instead of extending the relation block-by-block, use ExtendBufferedRelTo(), introduced in `31966b151e`. This is faster and simpler. This also somewhat reduces the danger that disconnected segments pose (which can be "discovered" once the previous segment reaches SEGSIZE), as ExtendBufferedRelTo() won't extend past the block it has been asked. However, the risk of the content of such a disconnected segment being invalid remains. Discussion: https://postgr.es/m/20221029025420.eplyow6k7tgu6he3@awork3.anarazel.de Discussion: https://postgr.es/m/20230223010147.32oir7sb66slqnjk@awork3.anarazel.de	2023-04-06 17:56:17 -07:00
David Rowley	ae78cae3be	Add --buffer-usage-limit option to vacuumdb `1cbbee033` added BUFFER_USAGE_LIMIT to the VACUUM and ANALYZE commands, so here we permit that option to be specified in vacuumdb. In passing, adjust the documents for vacuum_buffer_usage_limit and the BUFFER_USAGE_LIMIT VACUUM option to mention "kB" rather than "KB". Do the same for the ERROR message in ExecVacuum() and check_vacuum_buffer_usage_limit(). Without that we might tell a user that the valid minimum value is 128 KB only to reject that because we accept only "kB" and not "KB". Also, add a small reminder comment in vacuum.h to try to trigger the memory of anyone adding new fields to VacuumParams that they might want to consider if vacuumdb needs to grow a new option too. Author: Melanie Plageman Reviewed-by: Justin Pryzby Reviewed-by: David Rowley Discussion: https://postgr.es/m/ZAzTg3iEnubscvbf@telsasoft.com	2023-04-07 12:47:10 +12:00
Andres Freund	00d1e02be2	hio: Use ExtendBufferedRelBy() to extend tables more efficiently While we already had some form of bulk extension for relations, it was fairly limited. It only amortized the cost of acquiring the extension lock, the relation itself was still extended one-by-one. Bulk extension was also solely triggered by contention, not by the amount of data inserted. To address this, use ExtendBufferedRelBy(), introduced in `31966b151e`, to extend the relation. We try to extend the relation by multiple blocks in two situations: 1) The caller tells RelationGetBufferForTuple() that it will need multiple pages. For now that's only used by heap_multi_insert(), see commit FIXME. 2) If there is contention on the extension lock, use the number of waiters for the lock as a multiplier for the number of blocks to extend by. This is similar to what we already did. Previously we additionally multiplied the numbers of waiters by 20, but with the new relation extension infrastructure I could not see a benefit in doing so. Using the freespacemap to provide empty pages can cause significant contention, and adds measurable overhead, even if there is no contention. To reduce that, remember the blocks the relation was extended by in the BulkInsertState, in the extending backend. In case 1) from above, the blocks the extending backend needs are not entered into the FSM, as we know that we will need those blocks. One complication with using the FSM to record empty pages, is that we need to insert blocks into the FSM, when we already hold a buffer content lock. To avoid doing IO while holding a content lock, release the content lock before recording free space. Currently that opens a small window in which another backend could fill the block, if a concurrent VACUUM records the free space. If that happens, we retry, similar to the already existing case when otherBuffer is provided. In the future it might be worth closing the race by preventing VACUUM from recording the space in newly extended pages. This change provides very significant wins (3x at 16 clients, on my workstation) for concurrent COPY into a single relation. Even single threaded COPY is measurably faster, primarily due to not dirtying pages while extending, if supported by the operating system (see commit `4d330a61bb`). Even single-row INSERTs benefit, although to a much smaller degree, as the relation extension lock rarely is the primary bottleneck. Reviewed-by: Melanie Plageman <melanieplageman@gmail.com> Discussion: https://postgr.es/m/20221029025420.eplyow6k7tgu6he3@awork3.anarazel.de	2023-04-06 16:53:17 -07:00
David Rowley	1cbbee0338	Add VACUUM/ANALYZE BUFFER_USAGE_LIMIT option Add new options to the VACUUM and ANALYZE commands called BUFFER_USAGE_LIMIT to allow users more control over how large to make the buffer access strategy that is used to limit the usage of buffers in shared buffers. Larger rings can allow VACUUM to run more quickly but have the drawback of VACUUM possibly evicting more buffers from shared buffers that might be useful for other queries running on the database. Here we also add a new GUC named vacuum_buffer_usage_limit which controls how large to make the access strategy when it's not specified in the VACUUM/ANALYZE command. This defaults to 256KB, which is the same size as the access strategy was prior to this change. This setting also controls how large to make the buffer access strategy for autovacuum. Per idea by Andres Freund. Author: Melanie Plageman Reviewed-by: David Rowley Reviewed-by: Andres Freund Reviewed-by: Justin Pryzby Reviewed-by: Bharath Rupireddy Discussion: https://postgr.es/m/20230111182720.ejifsclfwymw2reb@awork3.anarazel.de	2023-04-07 11:40:31 +12:00
Andres Freund	5279e9db8e	heapam: Pass number of required pages to RelationGetBufferForTuple() A future commit will use this information to determine how aggressively to extend the relation by. In heap_multi_insert() we know accurately how many pages we need once we need to extend the relation, providing an accurate lower bound for how much to extend. Reviewed-by: Melanie Plageman <melanieplageman@gmail.com> Discussion: https://postgr.es/m/20221029025420.eplyow6k7tgu6he3@awork3.anarazel.de	2023-04-06 16:17:16 -07:00
Daniel Gustafsson	7d71d3dd08	Refresh cost-based delay params more frequently in autovacuum Allow autovacuum to reload the config file more often so that cost-based delay parameters can take effect while VACUUMing a relation. Previously, autovacuum workers only reloaded the config file once per relation vacuumed, so config changes could not take effect until beginning to vacuum the next table. Now, check if a reload is pending roughly once per block, when checking if we need to delay. In order for autovacuum workers to safely update their own cost delay and cost limit parameters without impacting performance, we had to rethink when and how these values were accessed. Previously, an autovacuum worker's wi_cost_limit was set only at the beginning of vacuuming a table, after reloading the config file. Therefore, at the time that autovac_balance_cost() was called, workers vacuuming tables with no cost-related storage parameters could still have different values for their wi_cost_limit_base and wi_cost_delay. Now that the cost parameters can be updated while vacuuming a table, workers will (within some margin of error) have no reason to have different values for cost limit and cost delay (in the absence of cost-related storage parameters). This removes the rationale for keeping cost limit and cost delay in shared memory. Balancing the cost limit requires only the number of active autovacuum workers vacuuming a table with no cost-based storage parameters. Author: Melanie Plageman <melanieplageman@gmail.com> Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-by: Daniel Gustafsson <daniel@yesql.se> Reviewed-by: Kyotaro Horiguchi <horikyota.ntt@gmail.com> Reviewed-by: Robert Haas <robertmhaas@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/CAAKRu_ZngzqnEODc7LmS1NH04Kt6Y9huSjz5pp7%2BDXhrjDA0gw%40mail.gmail.com	2023-04-07 01:00:21 +02:00
Daniel Gustafsson	a85c60a945	Separate vacuum cost variables from GUCs Vacuum code run both by autovacuum workers and a backend doing VACUUM/ANALYZE previously inspected VacuumCostLimit and VacuumCostDelay, which are the global variables backing the GUCs vacuum_cost_limit and vacuum_cost_delay. Autovacuum workers needed to override these variables with their own values, derived from autovacuum_vacuum_cost_limit and autovacuum_vacuum_cost_delay and worker cost limit balancing logic. This led to confusing code which, in some cases, both derived and set a new value of VacuumCostLimit from VacuumCostLimit. In preparation for refreshing these GUC values more often, introduce new, independent global variables and add a function to update them using the GUCs and existing logic. Per suggestion by Kyotaro Horiguchi Author: Melanie Plageman <melanieplageman@gmail.com> Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-by: Daniel Gustafsson <daniel@yesql.se> Reviewed-by: Kyotaro Horiguchi <horikyota.ntt@gmail.com> Reviewed-by: Robert Haas <robertmhaas@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/CAAKRu_ZngzqnEODc7LmS1NH04Kt6Y9huSjz5pp7%2BDXhrjDA0gw%40mail.gmail.com	2023-04-07 00:54:53 +02:00
Daniel Gustafsson	71a825194f	Make vacuum failsafe_active globally visible While vacuuming a table in failsafe mode, VacuumCostActive should not be re-enabled. This currently isn't a problem because vacuum cost parameters are only refreshed in between vacuuming tables and failsafe status is reset for every table. In preparation for allowing vacuum cost parameters to be updated more frequently, elevate LVRelState->failsafe_active to a global, VacuumFailsafeActive, which will be checked when determining whether or not to re-enable vacuum cost-related delays. Author: Melanie Plageman <melanieplageman@gmail.com> Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-by: Daniel Gustafsson <daniel@yesql.se> Reviewed-by: Kyotaro Horiguchi <horikyota.ntt@gmail.com> Reviewed-by: Robert Haas <robertmhaas@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/CAAKRu_ZngzqnEODc7LmS1NH04Kt6Y9huSjz5pp7%2BDXhrjDA0gw%40mail.gmail.com	2023-04-07 00:54:08 +02:00
Tom Lane	029dea882a	Fix ts_headline() edge cases for empty query and empty search text. tsquery's GETQUERY() macro is only safe to apply to a tsquery that is known non-empty; otherwise it gives a pointer to garbage. Before commit `5a617d75d`, ts_headline() avoided this pitfall, but only in a very indirect, nonobvious way. (hlCover could not reach its TS_execute call, because if the query contains no lexemes then hlFirstIndex would surely return -1.) After that commit, it fell into the trap, resulting in weird errors such as "unrecognized operator" and/or valgrind complaints. In HEAD, fix this by not calling TS_execute_locations() at all for an empty query. In the back branches, add a defensive check to hlCover() --- that's not fixing any live bug, but I judge the code a bit too fragile as-is. Also, both mark_hl_fragments() and mark_hl_words() were careless about the possibility of empty search text: in the cases where no match has been found, they'd end up telling mark_fragment() to mark from word indexes 0 to 0 inclusive, even when there is no word 0. This is harmless since we over-allocated the prs->words array, but it does annoy valgrind. Fix so that the end index is -1 and thus mark_fragment() will do nothing in such cases. Bottom line is that this fixes a live bug in HEAD, but in the back branches it's only getting rid of a valgrind nitpick. Back-patch anyway. Per report from Alexander Lakhin. Discussion: https://postgr.es/m/c27f642d-020b-01ff-ae61-086af287c4fd@gmail.com	2023-04-06 15:52:44 -04:00
Andres Freund	18103b7c5f	hio: Don't pin the VM while holding buffer lock while extending Starting with commit `7db0cd2145`, RelationGetBufferForTuple() did a visibilitymap_pin() while holding an exclusive buffer content lock on a newly extended page, when using COPY FREEZE. We elsewhere try hard to avoid to doing IO while holding a content lock. And until `14f98e0af9`, that happened while holding the relation extension lock. Practically, this isn't a huge issue, because COPY FREEZE is restricted to relations created or truncated in the current session, so it's unlikely there's a lot of contention. We can't avoid doing IO while holding the content lock by pinning the VM earlier, because we don't know which page it will be on. While we could just ignore the issue in this case, a future commit will add bulk relation extension, which needs to enter pages into the FSM while also trying to hold onto a buffer lock. To address this issue, use visibilitymap_pin_ok() to see if the relevant buffer is already pinned. If not, release the buffer, pin the VM buffer, and acquire the lock again. This opens up a small window for other backends to insert data onto the page - as the page is not entered into the freespacemap, other backends won't see it normally, but a concurrent vacuum could enter the page, if it started just after the relation is extended. In case the page is used by another backend, retry. This is very similar to how locking "otherBuffer" is already dealt with. Reviewed-by: Tomas Vondra <tomas.vondra@enterprisedb.com> Discussion: http://postgr.es/m/20230325025740.wzvchp2kromw4zqz@awork3.anarazel.de	2023-04-06 11:11:13 -07:00
Andres Freund	bba9003b62	hio: Relax rules for calling GetVisibilityMapPins() GetVisibilityMapPins() insisted on the buffer1/buffer2 being in a specific order. This required checks at the callsite. As a subsequent patch will add another callsite, move related logic into GetVisibilityMapPins(). Discussion: https://postgr.es/m/20230403190030.fk2frxv6faklrseb@awork3.anarazel.de	2023-04-06 10:36:30 -07:00
Tomas Vondra	2820adf775	Support long distance matching for zstd compression zstd compression supports a special mode for finding matched in distant past, which may result in better compression ratio, at the expense of using more memory (the window size is 128MB). To enable this optional mode, use the "long" keyword when specifying the compression method (--compress=zstd:long). Author: Justin Pryzby Reviewed-by: Tomas Vondra, Jacob Champion Discussion: https://postgr.es/m/20230224191840.GD1653@telsasoft.com Discussion: https://postgr.es/m/20220327205020.GM28503@telsasoft.com	2023-04-06 17:18:42 +02:00
David Rowley	b9b125b9c1	Move various prechecks from vacuum() into ExecVacuum() vacuum() is used for both the VACUUM command and for autovacuum. There were many prechecks being done inside vacuum() that were just not relevant to autovacuum. Let's move the bulk of these into ExecVacuum() so that they're only executed when running the VACUUM command. This removes a small amount of overhead when autovacuum vacuums a table. While we are at it, allocate VACUUM's BufferAccessStrategy in ExecVacuum() and pass it into vacuum() instead of expecting vacuum() to make it if it's not already made by the calling function. To make this work, we need to create the vacuum memory context slightly earlier, so we now need to pass that down to vacuum() so that it's available for use in other memory allocations. Author: Melanie Plageman Reviewed-by: David Rowley Discussion: https://postgr.es/m/20230405211534.4skgskbilnxqrmxg@awork3.anarazel.de	2023-04-06 15:44:52 +12:00
Andres Freund	acab1b0914	Convert many uses of ReadBuffer[Extended](P_NEW) to ExtendBufferedRel() A few places are not converted. Some because they are tackled in later commits (e.g. hio.c, xlogutils.c), some because they are more complicated (e.g. brin_pageops.c). Having a few users of ReadBuffer(P_NEW) is good anyway, to ensure the backward compat path stays working. Discussion: https://postgr.es/m/20221029025420.eplyow6k7tgu6he3@awork3.anarazel.de	2023-04-05 18:57:29 -07:00
Andres Freund	fcdda1e4b5	Use ExtendBufferedRelTo() in {vm,fsm}_extend() This uses ExtendBufferedRelTo(), introduced in `31966b151e`, to extend the visibilitymap and freespacemap to the size needed. It also happens to fix a warning introduced in `3d6a98457d`, reported by Tom Lane. Discussion: https://postgr.es/m/20221029025420.eplyow6k7tgu6he3@awork3.anarazel.de Discussion: https://postgr.es/m/2194723.1680736788@sss.pgh.pa.us	2023-04-05 17:50:09 -07:00
David Rowley	bccd6908ca	Always make a BufferAccessStrategy for ANALYZE `32fbe0239` changed things so we didn't bother allocating the BufferAccessStrategy during VACUUM (ONLY_DATABASE_STATS); and VACUUM (FULL), however, it forgot to consider that VACUUM (FULL, ANALYZE) is a possible combination. That change would have resulted in such a command allowing ANALYZE to make full use of shared buffers, which wasn't intended, so fix that. Reported-by: Melanie Plageman Discussion: https://postgr.es/m/CAAKRu_bJRKe+v_=OqwC+5sA3j5qv8rqdAwy3+yHaO3wmtfrCRg@mail.gmail.com	2023-04-06 12:37:03 +12:00
Michael Paquier	1d477a907e	Fix row tracking in pg_stat_statements with extended query protocol pg_stat_statements relies on EState->es_processed to count the number of rows processed by ExecutorRun(). This proves to be a problem under the extended query protocol when the result of a query is fetched through more than one call of ExecutorRun(), as es_processed is reset each time ExecutorRun() is called. This causes pg_stat_statements to report the number of rows calculated in the last execute fetch, rather than the global sum of all the rows processed. As pquery.c tells, this is a problem when a portal does not use holdStore. For example, DMLs with RETURNING would report a correct tuple count as these do one execution cycle when the query is first executed to fill in the portal's store with one ExecutorRun(), feeding on the portal's store for each follow-up execute fetch depending on the fetch size requested by the client. The fix proposed for this issue is simple with the addition of an extra counter in EState that's preserved across multiple ExecutorRun() calls, incremented with the value calculated in es_processed. This approach is not back-patchable, unfortunately. Note that libpq does not currently give any way to control the fetch size when using the extended v3 protocol, meaning that in-core testing is not possible yet. This issue can be easily verified with the JDBC driver, though, with autocommit disabled. Hence, having in-core tests requires more features, left for future discussion: - At least two new libpq routines splitting PQsendQueryGuts(), one for the bind/describe and a second for a series of execute fetches with a custom fetch size, likely in a fashion similar to what JDBC does. - A psql meta-command for the execute phase. This part is not strictly mandatory, still it could be handy. Reported-by: Andrew Dunstan (original discovery by Simon Siggs) Author: Sami Imseih Reviewed-by: Tom Lane, Michael Paquier Discussion: https://postgr.es/m/EBE6C507-9EB6-4142-9E4D-38B1673363A7@amazon.com Discussion: https://postgr.es/m/c90890e7-9c89-c34f-d3c5-d5c763a34bd8@dunslane.net	2023-04-06 09:29:03 +09:00
Andres Freund	31966b151e	bufmgr: Introduce infrastructure for faster relation extension The primary bottlenecks for relation extension are: 1) The extension lock is held while acquiring a victim buffer for the new page. Acquiring a victim buffer can require writing out the old page contents including possibly needing to flush WAL. 2) When extending via ReadBuffer() et al, we write a zero page during the extension, and then later write out the actual page contents. This can nearly double the write rate. 3) The existing bulk relation extension infrastructure in hio.c just amortized the cost of acquiring the relation extension lock, but none of the other costs. Unfortunately 1) cannot currently be addressed in a central manner as the callers to ReadBuffer() need to acquire the extension lock. To address that, this this commit moves the responsibility for acquiring the extension lock into bufmgr.c functions. That allows to acquire the relation extension lock for just the required time. This will also allow us to improve relation extension further, without changing callers. The reason we write all-zeroes pages during relation extension is that we hope to get ENOSPC errors earlier that way (largely works, except for CoW filesystems). It is easier to handle out-of-space errors gracefully if the page doesn't yet contain actual tuples. This commit addresses 2), by using the recently introduced smgrzeroextend(), which extends the relation, without dirtying the kernel page cache for all the extended pages. To address 3), this commit introduces a function to extend a relation by multiple blocks at a time. There are three new exposed functions: ExtendBufferedRel() for extending the relation by a single block, ExtendBufferedRelBy() to extend a relation by multiple blocks at once, and ExtendBufferedRelTo() for extending a relation up to a certain size. To avoid duplicating code between ReadBuffer(P_NEW) and the new functions, ReadBuffer(P_NEW) now implements relation extension with ExtendBufferedRel(), using a flag to tell ExtendBufferedRel() that the relation lock is already held. Note that this commit does not yet lead to a meaningful performance or scalability improvement - for that uses of ReadBuffer(P_NEW) will need to be converted to ExtendBuffered*(), which will be done in subsequent commits. Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi> Reviewed-by: Melanie Plageman <melanieplageman@gmail.com> Discussion: https://postgr.es/m/20221029025420.eplyow6k7tgu6he3@awork3.anarazel.de	2023-04-05 16:21:09 -07:00
Andres Freund	12f3867f55	bufmgr: Support multiple in-progress IOs by using resowner A future patch will add support for extending relations by multiple blocks at once. To be concurrency safe, the buffers for those blocks need to be marked as BM_IO_IN_PROGRESS. Until now we only had infrastructure for recovering from an IO error for a single buffer. This commit extends that infrastructure to multiple buffers by using the resource owner infrastructure. This commit increases the size of the ResourceOwnerData struct, which appears to have a just about measurable overhead in very extreme workloads. Medium term we are planning to substantially shrink the size of ResourceOwnerData. Short term the increase is small enough to not worry about it for now. Reviewed-by: Melanie Plageman <melanieplageman@gmail.com> Discussion: https://postgr.es/m/20221029025420.eplyow6k7tgu6he3@awork3.anarazel.de Discussion: https://postgr.es/m/20221029200025.w7bvlgvamjfo6z44@awork3.anarazel.de	2023-04-05 14:17:55 -07:00
Tom Lane	16dc2703c5	Support "Right Anti Join" plan shapes. Merge and hash joins can support antijoin with the non-nullable input on the right, using very simple combinations of their existing logic for right join and anti join. This gives the planner more freedom about how to order the join. It's particularly useful for hash join, since we may now have the option to hash the smaller table instead of the larger. Richard Guo, reviewed by Ronan Dunklau and myself Discussion: https://postgr.es/m/CAMbWs48xh9hMzXzSy3VaPzGAz+fkxXXTUbCLohX1_L8THFRm2Q@mail.gmail.com	2023-04-05 16:59:09 -04:00
Andres Freund	dad50f677c	bufmgr: Acquire and clean victim buffer separately Previously we held buffer locks for two buffer mapping partitions at the same time to change the identity of buffers. Particularly for extending relations needing to hold the extension lock while acquiring a victim buffer is painful.But it also creates a bottleneck for workloads that just involve reads. Now we instead first acquire a victim buffer and write it out, if necessary. Then we remove that buffer from the old partition with just the old partition's partition lock held and insert it into the new partition with just that partition's lock held. By separating out the victim buffer acquisition, future commits will be able to change relation extensions to scale better. On my workstation, a micro-benchmark exercising buffered reads strenuously and under a lot of concurrency, sees a >2x improvement. Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi> Reviewed-by: Melanie Plageman <melanieplageman@gmail.com> Discussion: https://postgr.es/m/20221029025420.eplyow6k7tgu6he3@awork3.anarazel.de	2023-04-05 13:47:46 -07:00
Tom Lane	65eb2d00c6	Acquire locks on views in AcquirePlannerLocks, too. Commit `47bb9db75` taught AcquireExecutorLocks to re-acquire locks on views using data from their RTE_SUBQUERY replacements, but it now seems like we should make AcquirePlannerLocks do the same. In this way, if a view has been redefined, we will notice that a bit earlier while checking validity of a cached plan and thereby avoid some wasted work. Report and patch by Amit Langote. Discussion: https://postgr.es/m/CA+HiwqH0xZOQ+GQAdKeckY1R4NOeHdzhtfxkAMJLSchpapNk5w@mail.gmail.com	2023-04-05 15:56:35 -04:00
Andres Freund	794f259447	bufmgr: Add Pin/UnpinLocalBuffer() So far these were open-coded in quite a few places, without a good reason. Reviewed-by: Melanie Plageman <melanieplageman@gmail.com> Reviewed-by: David Rowley <dgrowleyml@gmail.com> Discussion: https://postgr.es/m/20221029025420.eplyow6k7tgu6he3@awork3.anarazel.de	2023-04-05 10:42:17 -07:00
Andres Freund	819b69a81d	bufmgr: Add some more error checking [infrastructure] around pinning This adds a few more assertions against a buffer being local in places we don't expect, and extracts the check for a buffer being pinned exactly once from LockBufferForCleanup() into its own function. Later commits will use this function. Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi> Reviewed-by: Melanie Plageman <melanieplageman@gmail.com> Discussion: http://postgr.es/m/419312fd-9255-078c-c3e3-f0525f911d7f@iki.fi	2023-04-05 10:42:17 -07:00
Andres Freund	4d330a61bb	Add smgrzeroextend(), FileZero(), FileFallocate() smgrzeroextend() uses FileFallocate() to efficiently extend files by multiple blocks. When extending by a small number of blocks, use FileZero() instead, as using posix_fallocate() for small numbers of blocks is inefficient for some file systems / operating systems. FileZero() is also used as the fallback for FileFallocate() on platforms / filesystems that don't support fallocate. A big advantage of using posix_fallocate() is that it typically won't cause dirty buffers in the kernel pagecache. So far the most common pattern in our code is that we smgrextend() a page full of zeroes and put the corresponding page into shared buffers, from where we later write out the actual contents of the page. If the kernel, e.g. due to memory pressure or elapsed time, already wrote back the all-zeroes page, this can lead to doubling the amount of writes reaching storage. There are no users of smgrzeroextend() as of this commit. That will follow in future commits. Reviewed-by: Melanie Plageman <melanieplageman@gmail.com> Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi> Reviewed-by: Kyotaro Horiguchi <horikyota.ntt@gmail.com> Reviewed-by: David Rowley <dgrowleyml@gmail.com> Reviewed-by: John Naylor <john.naylor@enterprisedb.com> Discussion: https://postgr.es/m/20221029025420.eplyow6k7tgu6he3@awork3.anarazel.de	2023-04-05 10:06:39 -07:00
Tom Lane	4766eef317	Fix another issue with ENABLE/DISABLE TRIGGER on partitioned tables. In v13 and v14, the ENABLE/DISABLE TRIGGER USER variant malfunctioned on cloned triggers, failing to find the clones because it thought they were system triggers. Other variants of ENABLE/DISABLE TRIGGER would improperly apply a superuserness check. Fix by adjusting the is-it- a-system-trigger check to match reality in those branches. (As far as I can find, this is the only place that got it wrong.) There's no such bug in v15/HEAD, because we revised the catalog representation of system triggers to be what this code was expecting. However, add the test case to these branches anyway, because this area is visibly pretty fragile. Also remove an obsoleted comment. The recent v15/HEAD commit `6949b921d` fixed a nearby bug. I now see that my commit message for that was inaccurate: the behavior of recursing to clone triggers is older than v15, but it didn't apply to the case in v13/v14 because in those branches parent partitioned tables have no pg_trigger entries for foreign-key triggers. But add the test case from that commit to v13/v14, just to show what is happening there. Per bug #17886 from DzmitryH. Discussion: https://postgr.es/m/17886-5406d5d828aa4aa3@postgresql.org	2023-04-05 12:56:32 -04:00
Andres Freund	3d6a98457d	Don't initialize page in {vm,fsm}_extend(), not needed The read path needs to be able to initialize pages anyway, as relation extensions are not durable. By avoiding initializing pages, we can, in a future patch, extend the relation by multiple blocks at once. Using smgrextend() for {vm,fsm}_extend() is not a good idea in general, as at least one page of the VM/FSM will be read immediately after, always causing a cache miss, requiring us to read content we just wrote. Discussion: https://postgr.es/m/20230301223515.pucbj7nb54n4i4nv@awork3.anarazel.de	2023-04-05 08:19:39 -07:00
Robert Haas	86a3fc7ec8	Fix wrong word in comment. Reported by Peter Smith. Discussion: http://postgr.es/m/CAHut+Pt52ueOEAO-G5qeZiiPv1p9pBT_W5Vj3BTYfC8sD8LFxw@mail.gmail.com	2023-04-05 09:50:08 -04:00
Peter Eisentraut	f275af8cb8	Update information_schema for SQL:2023 This is mainly a light renumbering to match the sections in the standard. The comments for SQL_IMPLEMENTATION_INFO and SQL_SIZING are no longer applicable because the required information has been moved into part 9.	2023-04-05 09:57:44 +02:00
Peter Eisentraut	c9f57541d9	doc: Update SQL features/conformance information to SQL:2023 Optional subfeatures have been changed to top-level features, so there is a bit of a churn in the list for that. Some existing functions have been added to the standard, so they are moved from the "other" to the "standard" lists in their sections. Discussion: https://www.postgresql.org/message-id/flat/63f285d9-4ec8-0c9e-4bf5-e76334ddc0af@enterprisedb.com	2023-04-05 09:20:25 +02:00
Peter Eisentraut	c209d317e9	Fix minor signed/unsigned mixup The chunk header is unsigned, and the output format takes unsigned, so casting it to signed in between is incorrect.	2023-04-05 07:34:52 +02:00
Andres Freund	3f695b3117	sequences: Lock buffer before initializing page fill_seq_fork_with_data(), used to initialize a new sequence relation, only locked the buffer after calling PageInit(), even though PageInit() modifies page contents. This is unlikely to cause real-world issues, as the relation is exclusively locked at that point, and the buffer not yet marked dirty, so other processes should not access the buffer. This issue looks to have been present since the introduction of sequences in `e8647c45d6`. Given the low risk, it does not seem worth backpatching the fix. Discussion: https://postgr.es/m/20230404185501.wdkmo3k7bedlx6qk@awork3.anarazel.de	2023-04-04 16:42:52 -07:00
Jeff Davis	36320cbc16	Fix MSVC warning introduced in `ea1db8ae70`. Discussion: https://postgr.es/m/CA+hUKGJR1BhCORa5WdvwxztD3arhENcwaN1zEQ1Upg20BwjKWA@mail.gmail.com Reported-by: Thomas Munro	2023-04-04 15:43:18 -07:00
Thomas Munro	f303ec6210	Remove comment obsoleted by `11c2d6fd`. Reported-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/1604497.1680637072%40sss.pgh.pa.us	2023-04-05 09:40:34 +12:00
Jeff Davis	ea1db8ae70	Canonicalize ICU locale names to language tags. Convert to BCP47 language tags before storing in the catalog, except during binary upgrade or when the locale comes from an existing collation or template database. The resulting language tags can vary slightly between ICU versions. For instance, "@colBackwards=yes" is converted to "und-u-kb-true" in older versions of ICU, and to the simpler (but equivalent) "und-u-kb" in newer versions. The process of canonicalizing to a language tag also understands more input locale string formats than ucol_open(). For instance, "fr_CA.UTF-8" is misinterpreted by ucol_open() and the region is ignored; effectively treating it the same as the locale "fr" and opening the wrong collator. Canonicalization properly interprets the language and region, resulting in the language tag "fr-CA", which can then be understood by ucol_open(). This commit fixes a problem in prior versions due to ucol_open() misinterpreting locale strings as described above. For instance, creating an ICU collation with locale "fr_CA.UTF-8" would store that string directly in the catalog, which would later be passed to (and misinterpreted by) ucol_open(). After this commit, the locale string will be canonicalized to language tag "fr-CA" in the catalog, which will be properly understood by ucol_open(). Because this fix affects the resulting collator, we cannot change the locale string stored in the catalog for existing databases or collations; otherwise we'd risk corrupting indexes. Therefore, only canonicalize locales for newly-created (not upgraded) collations/databases. For similar reasons, do not backport. Discussion: https://postgr.es/m/8c7af6820aed94dc7bc259d2aa7f9663518e6137.camel@j-davis.com Reviewed-by: Peter Eisentraut	2023-04-04 10:38:58 -07:00
Robert Haas	482675987b	Add a run_as_owner option to subscriptions. This option is normally false, but can be set to true to obtain the legacy behavior where the subscription runs with the permissions of the subscription owner rather than the permissions of the table owner. The advantages of this mode are (1) it doesn't require that the subscription owner have permission to SET ROLE to each table owner and (2) since no role switching occurs, the SECURITY_RESTRICTED_OPERATION restrictions do not apply. On the downside, it allows any table owner to easily usurp the privileges of the subscription owner - basically, to take over their account. Because that's generally quite undesirable, we don't make this mode the default, but we do make it available, just in case the new behavior causes too many problems for someone. Discussion: http://postgr.es/m/CA+TgmoZ-WEeG6Z14AfH7KhmpX2eFh+tZ0z+vf0=eMDdbda269g@mail.gmail.com	2023-04-04 12:03:03 -04:00
Robert Haas	1e10d49b65	Perform logical replication actions as the table owner. Up until now, logical replication actions have been performed as the subscription owner, who will generally be a superuser. Commit `cec57b1a0f` documented hazards associated with that situation, namely, that any user who owns a table on the subscriber side could assume the privileges of the subscription owner by attaching a trigger, expression index, or some other kind of executable code to it. As a remedy, it suggested not creating configurations where users who are not fully trusted own tables on the subscriber. Although that will work, it basically precludes using logical replication in the way that people typically want to use it, namely, to replicate a database from one node to another without necessarily having any restrictions on which database users can own tables. So, instead, change logical replication to execute INSERT, UPDATE, DELETE, and TRUNCATE operations as the table owner when they are replicated. Since this involves switching the active user frequently within a session that is authenticated as the subscription user, also impose SECURITY_RESTRICTED_OPERATION restrictions on logical replication code. As an exception, if the table owner can SET ROLE to the subscription owner, these restrictions have no security value, so don't impose them in that case. Subscription owners are now required to have the ability to SET ROLE to every role that owns a table that the subscription is replicating. If they don't, replication will fail. Superusers, who normally own subscriptions, satisfy this property by default. Non-superusers users who own subscriptions will need to be granted the roles that own relevant tables. Patch by me, reviewed (but not necessarily in its entirety) by Jelte Fennema, Jeff Davis, and Noah Misch. Discussion: http://postgr.es/m/CA+TgmoaSCkg9ww9oppPqqs+9RVqCexYCE6Aq=UsYPfnOoDeFkw@mail.gmail.com	2023-04-04 11:25:23 -04:00
Alvaro Herrera	71bfd1543f	Code review for recent SQL/JSON commits - At the last minute and for no particularly good reason, I changed the WITHOUT token to be marked especially for lookahead, from the one in WITHOUT TIME to the one in WITHOUT UNIQUE. Study of upcoming patches (where a new WITHOUT ARRAY WRAPPER clause is added) showed me that the former was better, so put it back the way the original patch had it. - update exprTypmod() for JsonConstructorExpr to return the typmod of the RETURNING clause, as a comment there suggested. Perhaps it's possible for this to make a difference with datetime types, but I didn't try to build a test case. - The nodeFuncs.c support code for new nodes was calling walker() directly instead of the WALK() macro as introduced by commit `1c27d16e6e`. Modernize that. Also add exprLocation() support for a couple of nodes that missed it. Lastly, reorder the code more sensibly. The WITHOUT_LA -> WITHOUT change means that stored rules containing either WITHOUT TIME ZONE or WITHOUT UNIQUE KEYS would change representation. Therefore, bump catversion. Discussion: https://postgr.es/m/20230329181708.e64g2tpy7jyufqkr@alvherre.pgsql	2023-04-04 14:04:30 +02:00
Andres Freund	8a2b1b1477	bufmgr: Remove buffer-write-dirty tracepoints The trace point was using the relfileno / fork / block for the to-be-read-in buffer. Some upcoming work would make that more expensive to provide. We still have buffer-flush-start/done, which can serve most tracing needs that buffer-write-dirty could serve. Discussion: https://postgr.es/m/f5164e7a-eef6-8972-75a3-8ac622ed0c6e@iki.fi	2023-04-03 18:02:41 -07:00
Peter Geoghegan	05a304a855	Make SP-GiST redirect cleanup more aggressive. Commit `61b313e4` made VACUUM pass down a heaprel to index AM bulkdelete and vacuumcleanup routines. Although this was primarily intended as preparation for logical decoding on standbys, it also made it easy to correct an old deficiency in how we determine how to cleanup SP-GiST redirect and placeholder tuples. Pass the heaprel to GlobalVisTestFor() during cleanup of redirect and placeholder tuples, rather than pessimistically passing NULL. Author: Bertrand Drouvot <bertranddrouvot.pg@gmail.com> Discussion: https://postgr.es/m/02392033-f030-a3c8-c7d0-5c27eb529fec@gmail.com	2023-04-03 11:47:48 -07:00
Peter Geoghegan	e48c817395	Recycle deleted nbtree pages more aggressively. Commit `61b313e4` made nbtree consistently pass down a heaprel to low level routines like _bt_getbuf(). Although this was primarily intended as preparation for logical decoding on standbys, it also made it easy to correct an old deficiency in how nbtree VACUUM determines whether or not it's now safe to recycle deleted pages. Pass the heaprel to GlobalVisTestFor() in nbtree routines that deal with recycle safety. nbtree now makes less pessimistic assumptions about recycle safety within non-catalog relations. This enhancement complements the recycling enhancement added by commit `9dd963ae25`. nbtree remains just as pessimistic as ever when it comes to recycle safety within indexes on catalog relations. There is no fundamental reason why we need to treat catalog relations differently, though. The behavioral inconsistency is a consequence of the way that nbtree uses nextXID values to implement what Lanin and Shasha call "the drain technique". Note in particular that it has nothing to do with whether or not index tuples might still be required for an older MVCC snapshot. Author: Bertrand Drouvot <bertranddrouvot.pg@gmail.com> Discussion: https://postgr.es/m/CAH2-WzkaiDxCje0yPuH=3Uh2p1V_2pFGY==xfbZoZu7Ax_NB8g@mail.gmail.com	2023-04-03 11:31:43 -07:00
Peter Geoghegan	a349b86603	Move heaprel struct field next to index rel field. Commit `61b313e4` added a heaprel struct member to IndexVacuumInfo, but placed it last. Move the heaprel struct member next to the index struct member to improve the code's readability. Author: Peter Geoghegan <pg@bowt.ie> Discussion: https://postgr.es/m/CAH2-WznG=TV6S9d3VA=y0vBHbXwnLs9_LLdiML=aNJuHeriwxg@mail.gmail.com	2023-04-03 11:01:11 -07:00
Robert Haas	e7e7da2f8d	Fix possible logical replication crash. Commit `c3afe8cf5a` added a new password_required option but forgot that you need database access to check whether an arbitrary role ID is a superuser. Report and patch by Hou Zhijie. I added a comment. Thanks to Alexander Lakhin for devising a way to reproduce the crash. Discussion: http://postgr.es/m/OS0PR01MB5716BFD7EC44284C89F40808948F9@OS0PR01MB5716.jpnprd01.prod.outlook.com	2023-04-03 13:54:21 -04:00
Tom Lane	a8a00124f1	When using valgrind, log the current query after an error is detected. In USE_VALGRIND builds, add code to print the text of the current query to the valgrind log whenever the valgrind error count has increased during the query. Valgrind will already have printed its report, if the error is distinct from ones already seen, so that this works out fairly well as a way of annotating the log. Onur Tirtir and Tom Lane Discussion: https://postgr.es/m/AM9PR83MB0498531E804DC8DF8CFF0E8FE9D09@AM9PR83MB0498.EURPRD83.prod.outlook.com	2023-04-03 10:18:38 -04:00
Alexander Korotkov	b0b91ced16	Revert `764da7710b` Discussion: https://postgr.es/m/20230323003003.plgaxjqahjgkuxrk%40awork3.anarazel.de	2023-04-03 16:55:09 +03:00
Alexander Korotkov	2b65bf046d	Revert `11470f544e` Discussion: https://postgr.es/m/20230323003003.plgaxjqahjgkuxrk%40awork3.anarazel.de	2023-04-03 16:54:31 +03:00
David Rowley	8d928e3a9f	Rename BufferAccessStrategyData.ring_size to nbuffers The new name better reflects what the field is - the size of the buffers[] array. ring_size sounded more like it is in units of bytes. An upcoming commit allows a BufferAccessStrategy of custom sizes, so it seems relevant to improve this beforehand. Author: Melanie Plageman Reviewed-by: David Rowley Discussion: https://postgr.es/m/CAAKRu_YefVbhg4rAxU2frYFdTap61UftH-kUNQZBvAs%2BYK81Zg%40mail.gmail.com	2023-04-03 23:31:16 +12:00
David Rowley	4830f10243	Disable vacuum's use of a buffer access strategy during failsafe Traditionally, vacuum always makes use of a buffer access strategy 32 buffers in size. This means that running vacuums tend not to cause too many shared buffers to become dirty, however, this can cause vacuums to run much more slowly than they otherwise could as WAL flushes will occur more frequently due to having to flush WAL out to the LSN of the dirty page before that page can be written to disk. When we are performing failsafe VACUUMs (as added in `1e55e7d17`), we really want to make the vacuum work go as quickly as possible, so here we disable the buffer access strategy when entering failsafe mode while vacuuming a relation. Per idea and analyis from Andres Freund. In passing, also include some changes I had intended for `32fbe0239`. Author: Melanie Plageman Reviewed-by: Justin Pryzby, David Rowley Discussion: https://postgr.es/m/20230111182720.ejifsclfwymw2reb%40awork3.anarazel.de	2023-04-03 23:05:58 +12:00
David Rowley	32fbe0239b	Only make buffer strategy for vacuum when it's likely needed VACUUM FULL and VACUUM ONLY_DATABASE_STATS will not use the vacuum strategy ring created in vacuum(), so don't waste effort making it in those cases. There are other conceivable cases where the buffer strategy also won't be used, but those are probably less common and not worth troubling over too much. For example VACUUM (PROCESS_MAIN false, PROCESS_TOAST false). There are other cases too, but many of these are only discovered once inside vacuum_rel(). Author: Melanie Plageman Reviewed-by: David Rowley Discussion: https://postgr.es/m/CAAKRu_ZLRuzkM3zKogiZAz2hUony37yLY4aaLb8fPf8fgqs5Og@mail.gmail.com	2023-04-03 19:19:45 +12:00
David Rowley	3f476c9534	Remove some global variables from vacuum.c Using global variables because we don't want to pass these values around as parameters does not really seem like a great idea, so let's remove these two global variables and adjust a few functions to accept these values as parameters instead. This is part of a wider patch which intends to allow the size of the buffer access strategy that vacuum uses to be adjusted. Author: Melanie Plageman Reviewed-by: Bharath Rupireddy Discussion: https://postgr.es/m/CAAKRu_b1q_07uquUtAvLqTM%3DW9nzee7QbtzHwA4XdUo7KX_Cnw%40mail.gmail.com	2023-04-03 15:07:25 +12:00
Andres Freund	6af1793954	Add info in WAL records in preparation for logical slot conflict handling This commit only implements one prerequisite part for allowing logical decoding. The commit message contains an explanation of the overall design, which later commits will refer back to. Overall design: 1. We want to enable logical decoding on standbys, but replay of WAL from the primary might remove data that is needed by logical decoding, causing error(s) on the standby. To prevent those errors, a new replication conflict scenario needs to be addressed (as much as hot standby does). 2. Our chosen strategy for dealing with this type of replication slot is to invalidate logical slots for which needed data has been removed. 3. To do this we need the latestRemovedXid for each change, just as we do for physical replication conflicts, but we also need to know whether any particular change was to data that logical replication might access. That way, during WAL replay, we know when there is a risk of conflict and, if so, if there is a conflict. 4. We can't rely on the standby's relcache entries for this purpose in any way, because the startup process can't access catalog contents. 5. Therefore every WAL record that potentially removes data from the index or heap must carry a flag indicating whether or not it is one that might be accessed during logical decoding. Why do we need this for logical decoding on standby? First, let's forget about logical decoding on standby and recall that on a primary database, any catalog rows that may be needed by a logical decoding replication slot are not removed. This is done thanks to the catalog_xmin associated with the logical replication slot. But, with logical decoding on standby, in the following cases: - hot_standby_feedback is off - hot_standby_feedback is on but there is no a physical slot between the primary and the standby. Then, hot_standby_feedback will work, but only while the connection is alive (for example a node restart would break it) Then, the primary may delete system catalog rows that could be needed by the logical decoding on the standby (as it does not know about the catalog_xmin on the standby). So, it’s mandatory to identify those rows and invalidate the slots that may need them if any. Identifying those rows is the purpose of this commit. Implementation: When a WAL replay on standby indicates that a catalog table tuple is to be deleted by an xid that is greater than a logical slot's catalog_xmin, then that means the slot's catalog_xmin conflicts with the xid, and we need to handle the conflict. While subsequent commits will do the actual conflict handling, this commit adds a new field isCatalogRel in such WAL records (and a new bit set in the xl_heap_visible flags field), that is true for catalog tables, so as to arrange for conflict handling. The affected WAL records are the ones that already contain the snapshotConflictHorizon field, namely: - gistxlogDelete - gistxlogPageReuse - xl_hash_vacuum_one_page - xl_heap_prune - xl_heap_freeze_page - xl_heap_visible - xl_btree_reuse_page - xl_btree_delete - spgxlogVacuumRedirect Due to this new field being added, xl_hash_vacuum_one_page and gistxlogDelete do now contain the offsets to be deleted as a FLEXIBLE_ARRAY_MEMBER. This is needed to ensure correct alignment. It's not needed on the others struct where isCatalogRel has been added. This commit just introduces the WAL format changes mentioned above. Handling the actual conflicts will follow in future commits. Bumps XLOG_PAGE_MAGIC as the several WAL records are changed. Author: "Drouvot, Bertrand" <bertranddrouvot.pg@gmail.com> Author: Andres Freund <andres@anarazel.de> (in an older version) Author: Amit Khandekar <amitdkhan.pg@gmail.com> (in an older version) Reviewed-by: "Drouvot, Bertrand" <bertranddrouvot.pg@gmail.com> Reviewed-by: Andres Freund <andres@anarazel.de> Reviewed-by: Robert Haas <robertmhaas@gmail.com> Reviewed-by: Fabrízio de Royes Mello <fabriziomello@gmail.com> Reviewed-by: Melanie Plageman <melanieplageman@gmail.com>	2023-04-02 12:32:19 -07:00
Andres Freund	61b313e47e	Pass down table relation into more index relation functions This is done in preparation for logical decoding on standby, which needs to include whether visibility affecting WAL records are about a (user) catalog table. Which is only known for the table, not the indexes. It's also nice to be able to pass the heap relation to GlobalVisTestFor() in vacuumRedirectAndPlaceholder(). Author: "Drouvot, Bertrand" <bertranddrouvot.pg@gmail.com> Discussion: https://postgr.es/m/21b700c3-eecf-2e05-a699-f8c78dd31ec7@gmail.com	2023-04-01 20:18:29 -07:00
Andres Freund	a88a18b125	Assert only valid flag bits are passed to visibilitymap_set() If visibilitymap_set() is called with flags containing a higher bit than VISIBILITYMAP_ALL_FROZEN, the state of neighboring pages is affected. While there was an assertion that some valid bits were set, it did not check that only valid bits were. Change that. Discussion: https://postgr.es/m/20230331043300.gux3s5wzrapqi4oe@awork3.anarazel.de	2023-04-01 18:00:19 -07:00
Andres Freund	14f98e0af9	hio: Release extension lock before initializing page / pinning VM PageInit() while holding the extension lock is unnecessary after `0d1fe9f74e` started to use RBM_ZERO_AND_LOCK - nobody can look at the new page before we release the page lock. PageInit() zeroes the page, which isn't that cheap, so deferring it until after the extension lock is released seems like a good idea. Doing visibilitymap_pin() while holding the extension lock, introduced in `7db0cd2145`, looks like an accident. Due to the restrictions on HEAP_INSERT_FROZEN it's unlikely to be a performance issue, but it still seems better to move it out. We also are doing the visibilitymap_pin() while holding the buffer lock, which will be fixed in a separate commit. Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi> Discussion: http://postgr.es/m/419312fd-9255-078c-c3e3-f0525f911d7f@iki.fi	2023-04-01 17:50:18 -07:00
Alvaro Herrera	6ee30209a6	SQL/JSON: support the IS JSON predicate This patch introduces the SQL standard IS JSON predicate. It operates on text and bytea values representing JSON, as well as on the json and jsonb types. Each test has IS and IS NOT variants and supports a WITH UNIQUE KEYS flag. The tests are: IS JSON [VALUE] IS JSON ARRAY IS JSON OBJECT IS JSON SCALAR These should be self-explanatory. The WITH UNIQUE KEYS flag makes these return false when duplicate keys exist in any object within the value, not necessarily directly contained in the outermost object. Author: Nikita Glukhov <n.gluhov@postgrespro.ru> Author: Teodor Sigaev <teodor@sigaev.ru> Author: Oleg Bartunov <obartunov@gmail.com> Author: Alexander Korotkov <aekorotkov@gmail.com> Author: Amit Langote <amitlangote09@gmail.com> Author: Andrew Dunstan <andrew@dunslane.net> Reviewers have included (in no particular order) Andres Freund, Alexander Korotkov, Pavel Stehule, Andrew Alsup, Erik Rijkers, Zihong Yu, Himanshu Upadhyaya, Daniel Gustafsson, Justin Pryzby. Discussion: https://postgr.es/m/CAF4Au4w2x-5LTnN_bxky-mq4=WOqsGsxSpENCzHRAzSnEd8+WQ@mail.gmail.com Discussion: https://postgr.es/m/cd0bb935-0158-78a7-08b5-904886deac4b@postgrespro.ru Discussion: https://postgr.es/m/20220616233130.rparivafipt6doj3@alap3.anarazel.de Discussion: https://postgr.es/m/abd9b83b-aa66-f230-3d6d-734817f0995d%40postgresql.org	2023-03-31 22:34:04 +02:00
Tom Lane	a2a0c7c29e	Further tweaking of width_bucket() edge cases. I realized that the third overflow case I posited in commit `b0e9e4d76` actually should be handled in a different way: rather than tolerating the idea that the quotient could round to 1, we should clamp so that the output cannot be more than "count" when we know that the operand is less than bound2. That being the case, we don't need an overflow-aware increment in that code path, which leads me to revert the movement of the pg_add_s32_overflow() call. (The diff in width_bucket_float8 might be easier to read by comparing against b0e9e4d76^.) What's more, width_bucket_numeric also has this problem of the quotient potentially rounding to 1, so add a clamp there too. As before, I'm not quite convinced that a back-patch is warranted. Discussion: https://postgr.es/m/391415.1680268470@sss.pgh.pa.us	2023-03-31 16:29:55 -04:00
Tom Lane	f0d65c0eaf	Reject system columns as elements of foreign keys. Up through v11 it was sensible to use the "oid" system column as a foreign key column, but since that was removed there's no visible usefulness in making any of the remaining system columns a foreign key. Moreover, since the TupleTableSlot rewrites in v12, such cases actively fail because of implicit assumptions that only user columns appear in foreign keys. The lack of complaints about that seems like good evidence that no one is trying to do it. Hence, rather than trying to repair those assumptions (of which there are at least two, maybe more), let's just forbid the case up front. Per this patch, a system column in either the referenced or referencing side of a foreign key will draw this error; however, putting one in the referenced side would have failed later anyway, since we don't allow unique indexes to be made on system columns. Per bug #17877 from Alexander Lakhin. Back-patch to v12; the case still appears to work in v11, so we shouldn't break it there. Discussion: https://postgr.es/m/17877-4bcc658e33df6de1@postgresql.org	2023-03-31 11:18:49 -04:00
Tom Lane	c2d7d679c1	Ensure acquire_inherited_sample_rows sets its output parameters. The totalrows/totaldeadrows outputs were left uninitialized in cases where we find no analyzable child tables of a partitioned table. This could lead to setting the partitioned table's pg_class.reltuples value to garbage. It's not clear that that would have any very bad effects in practice, but fix it anyway because it's making valgrind unhappy. Reported and diagnosed by Alexander Lakhin (bug #17880). Discussion: https://postgr.es/m/17880-9282037c923d856e@postgresql.org	2023-03-31 10:08:40 -04:00
Alvaro Herrera	9b058f6b0d	Move ExecEvalJsonConstructor new function to a more natural place Commit `7081ac46ac` put it at the end of the file, but that doesn't look very nice.	2023-03-31 12:55:25 +02:00
Alvaro Herrera	47a9709846	No need to add FORMAT to the keyword precedence list Commit `7081ac46ac` put it there. Remove it.	2023-03-31 11:16:43 +02:00
Andres Freund	8aaa04b32d	Track shared buffer hits in pg_stat_io Among other things, this should make it easier to calculate a useful cache hit ratio by excluding buffer reads via buffer access strategies. As buffer access strategies reuse buffers (and thus evict the prior buffer contents), it is normal to see reads on repeated scans of the same data. Author: Melanie Plageman <melanieplageman@gmail.com> Reviewed-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com> Reviewed-by: Andres Freund <andres@anarazel.de> Discussion: https://postgr.es/m/CAAKRu_beMa9Hzih40%3DXPYqhDVz6tsgUGTrhZXRo%3Dunp%2Bszb%3DUA%40mail.gmail.com	2023-03-30 19:24:21 -07:00
David Rowley	6c3b697b19	Fix List memory issue in transformColumnDefinition When calling generateSerialExtraStmts(), we would pass in the constraint->options. In some cases, generateSerialExtraStmts() would modify the referenced List to remove elements from it, but doing so is invalid without assigning the list back to all variables that point to it. In the particular reported problem case, the List became empty, in which cases it became NIL, but the passed in constraint->options didn't get to find out about that and was left pointing to free'd memory. To fix this, just perform a list_copy() inside generateSerialExtraStmts(). We could just do a list_copy() just before we perform the delete from the list, however, that seems less robust. Let's make sure the generated CreateSeqStmt gets a completely different copy of the list to be safe. Bug: #17879 Reported-by: Fei Changhong Diagnosed-by: Fei Changhong Discussion: https://postgr.es/m/17879-b7dfb5debee58ff5@postgresql.org Backpatch-through: 11, all supported versions	2023-03-31 12:13:05 +13:00
Thomas Munro	11c2d6fdf5	Parallel Hash Full Join. Full and right outer joins were not supported in the initial implementation of Parallel Hash Join because of deadlock hazards (see discussion). Therefore FULL JOIN inhibited parallelism, as the other join strategies can't do that in parallel either. Add a new PHJ phase PHJ_BATCH_SCAN that scans for unmatched tuples on the inner side of one batch's hash table. For now, sidestep the deadlock problem by terminating parallelism there. The last process to arrive at that phase emits the unmatched tuples, while others detach and are free to go and work on other batches, if there are any, but otherwise they finish the join early. That unfairness is considered acceptable for now, because it's better than no parallelism at all. The build and probe phases are run in parallel, and the new scan-for-unmatched phase, while serial, is usually applied to the smaller of the two relations and is either limited by some multiple of work_mem, or it's too big and is partitioned into batches and then the situation is improved by batch-level parallelism. Author: Melanie Plageman <melanieplageman@gmail.com> Author: Thomas Munro <thomas.munro@gmail.com> Reviewed-by: Thomas Munro <thomas.munro@gmail.com> Discussion: https://postgr.es/m/CA%2BhUKG%2BA6ftXPz4oe92%2Bx8Er%2BxpGZqto70-Q_ERwRaSyA%3DafNg%40mail.gmail.com	2023-03-31 11:34:03 +13:00
Andres Freund	ca7b3c4c00	pg_stat_wal: Accumulate time as instr_time instead of microseconds In instr_time.h it is stated that: * When summing multiple measurements, it's recommended to leave the * running sum in instr_time form (ie, use INSTR_TIME_ADD or * INSTR_TIME_ACCUM_DIFF) and convert to a result format only at the end. The reason for that is that converting to microseconds is not cheap, and can loose precision. Therefore this commit changes 'PendingWalStats' to use 'instr_time' instead of 'PgStat_Counter' while accumulating 'wal_write_time' and 'wal_sync_time'. Author: Nazir Bilal Yavuz <byavuz81@gmail.com> Reviewed-by: Andres Freund <andres@anarazel.de> Reviewed-by: Kyotaro Horiguchi <horikyota.ntt@gmail.com> Reviewed-by: Melanie Plageman <melanieplageman@gmail.com> Discussion: https://postgr.es/m/1feedb83-7aa9-cb4b-5086-598349d3f555@gmail.com	2023-03-30 14:23:14 -07:00
Alvaro Herrera	63cc20205c	Simplify transformJsonAggConstructor() API There's no need for callers to pass aggregate names so that the function can resolve them to OIDs, when callers can just pass aggregate OIDs directly to begin with.	2023-03-30 21:07:24 +02:00
Alvaro Herrera	60966f56c3	Fix inconsistencies and style issues in new SQL/JSON code Reported by Alexander Lakhin. Discussion: https://postgr.es/m/60483139-5c34-851d-baee-6c0d014e1710@gmail.com	2023-03-30 21:06:31 +02:00
Alvaro Herrera	589bb81649	Fix setrefs.c code for adjusting partPruneInfos We were transferring partPruneInfos from PlannerInfo into PlannerGlobal wrong, essentially relying on all of them being transferred, and adjusting their list indexes based on that. But apparently it's possible that some of them are skipped, so that strategy leads to a corrupted execution tree. Instead, adjust each Append/MergeAppend's partpruneinfo index as we copy from one list to the other, which seems safer anyway. This requires adjusting the RT offset of the RTE referenced in each partPruneInfo ahead of actually adjusting the RTE itself, which seems a bit too ad-hoc. This problem was introduced by commit `ec38694894`. However, it may be that we no longer require the change introduced there, so perhaps we should revert both the present commit and that one. Problem noticed by sqlsmith. Author: Amit Langote <amitlangote09@gmail.com Discussion: https://postgr.es/m/CA+HiwqG6tbc2oadsbyyy24b2AL295XHQgyLRWghmA7u_SL1K8A@mail.gmail.com	2023-03-30 21:06:31 +02:00
Andres Freund	f9054b7a7c	Fix format code in fd.c debugging infrastructure These were not sufficiently adjusted in `2d4f1ba6cf`.	2023-03-30 10:26:10 -07:00
Andres Freund	558cf80387	bufmgr: Fix undefined behaviour with, unrealistically, large temp_buffers Quoting Melanie: > Since if buffer is INT_MAX, then the -(buffer + 1) version invokes > undefined behavior while the -buffer - 1 version doesn't. All other places were already using the correct version. I (Andres), copied the code into more places in a patch. Melanie caught it in review, but to prevent more people from copying the bad code, fix it. Even if it is a theoretical issue. We really ought to wrap these accesses in a helper function... As this is a theoretical issue, don't backpatch. Reported-by: Melanie Plageman <melanieplageman@gmail.com> Discussion: https://postgr.es/m/CAAKRu_aW2SX_LWtwHgfnqYpBrunMLfE9PD6-ioPpkh92XH0qpg@mail.gmail.com	2023-03-30 10:26:10 -07:00
Robert Haas	c3afe8cf5a	Add new predefined role pg_create_subscription. This role can be granted to non-superusers to allow them to issue CREATE SUBSCRIPTION. The non-superuser must additionally have CREATE permissions on the database in which the subscription is to be created. Most forms of ALTER SUBSCRIPTION, including ALTER SUBSCRIPTION .. SKIP, now require only that the role performing the operation own the subscription, or inherit the privileges of the owner. However, to use ALTER SUBSCRIPTION ... RENAME or ALTER SUBSCRIPTION ... OWNER TO, you also need CREATE permission on the database. This is similar to what we do for schemas. To change the owner of a schema, you must also have permission to SET ROLE to the new owner, similar to what we do for other object types. Non-superusers are required to specify a password for authentication and the remote side must use the password, similar to what is required for postgres_fdw and dblink. A superuser who wants a non-superuser to own a subscription that does not rely on password authentication may set the new password_required=false property on that subscription. A non-superuser may not set password_required=false and may not modify a subscription that already has password_required=false. This new password_required subscription property works much like the eponymous postgres_fdw property. In both cases, the actual semantics are that a password is not required if either (1) the property is set to false or (2) the relevant user is the superuser. Patch by me, reviewed by Andres Freund, Jeff Davis, Mark Dilger, and Stephen Frost (but some of those people did not fully endorse all of the decisions that the patch makes). Discussion: http://postgr.es/m/CA+TgmoaDH=0Xj7OBiQnsHTKcF2c4L+=gzPBUKSJLh8zed2_+Dg@mail.gmail.com	2023-03-30 11:37:19 -04:00
Tom Lane	b0e9e4d76c	Avoid overflow in width_bucket_float8(). The original coding of this function paid little attention to the possibility of overflow. There were actually three different hazards: 1. The range from bound1 to bound2 could exceed DBL_MAX, which on IEEE-compliant machines produces +Infinity in the subtraction. At best we'd lose all precision in the result, and at worst produce NaN due to dividing Inf/Inf. The range can't exceed twice DBL_MAX though, so we can fix this case by scaling all the inputs by 0.5. 2. We computed count * (operand - bound1), which is also at risk of float overflow, before dividing. Safer is to do the division first, producing a quotient that should be in [0,1), and even after allowing for roundoff error can't be outside [0,1]; then multiplying by count can't produce a result overflowing an int. (width_bucket_numeric does the multiplication first on the grounds that that improves accuracy of its result, but I don't think that a similar argument can be made in float arithmetic.) 3. If the division result does round to 1, and count is INT_MAX, the final addition of 1 would overflow an int. We took care of that in the operand >= bound2 case but did not consider that it could be possible in the main path. Fix that by moving the overflow-aware addition of 1 so it is done that way in all cases. The fix for point 2 creates a possibility that values very close to a bucket boundary will be rounded differently than they were before. I'm not troubled by that for HEAD, but it is an argument against putting this into the stable branches. Given that the cases being fixed here are fairly extreme and unlikely to be hit in normal use, it seems best not to back-patch. Mats Kindahl and Tom Lane Discussion: https://postgr.es/m/17876-61f280d1601f978d@postgresql.org	2023-03-30 11:27:36 -04:00
Peter Eisentraut	261cf8962b	Fix incorrect format placeholders	2023-03-30 08:33:43 +02:00
Amit Kapila	da324d6cd4	Refactor pgoutput_change(). Instead of mostly-duplicate code for different operation (insert/update/delete) types, write a common code to compute old/new tuples, and check the row filter. Author: Hou Zhijie Reviewed-by: Peter Smith, Amit Kapila Discussion: https://postgr.es/m/OS0PR01MB5716194A47FFA8D91133687D94BF9@OS0PR01MB5716.jpnprd01.prod.outlook.com	2023-03-30 11:10:38 +05:30
David Rowley	902ecd3bd4	Fix outdated comments regarding TupleTableSlots The tts_flag is named TTS_FLAG_SHOULDFREE, so use that instead of TTS_SHOULDFREE, which is the name of the macro that checks for that flag. Additionally, `4da597edf` got rid of the TupleTableSlot.tts_tuple field but forgot to update a comment which referenced that field. Fix that. Reported-by: Zhen Mingyang <zhenmingyang@yeah.net> Reported-by: Richard Guo <guofenglinux@gmail.com> Discussion: https://postgr.es/m/1a96696c.9d3.187193989c3.Coremail.zhenmingyang@yeah.net	2023-03-30 16:37:03 +13:00
Tom Lane	8e5eef50c5	Fix dereference of dangling pointer in GiST index buffering build. gistBuildCallback tried to fetch the size of an index tuple that might have already been freed by gistProcessEmptyingQueue. While this seems to usually be harmless in production builds, in principle it could result in a SIGSEGV, or more likely a bogus value for indtuplesSize leading to poor page-split decisions later in the build. The memory management here is confusing and could stand to be refactored, but for the moment it seems to be enough to fetch the tuple size sooner. AFAICT the indtuples[Size] totals aren't used in between these places; even if they were, the updated values shouldn't be any worse to use. So just move the incrementing of the totals up. It's not very clear why our valgrind-using buildfarm animals haven't noticed this problem, because the relevant code path does seem to be exercised according to the code coverage report. I think the reason that we didn't fix this bug after the first report is that I'd wanted to try to understand that better. However, now that it's been re-discovered let's just be pragmatic and fix it already. Original report by Alexander Lakhin (bug #16329), later rediscovered by Egor Chindyaskin (bug #17874). Patch by Alexander Lakhin (commentary by Pavel Borisov and me). Back-patch to all supported branches. Discussion: https://postgr.es/m/16329-7a6aa9b6fa1118a1@postgresql.org Discussion: https://postgr.es/m/17874-63ca6c7ce42d2103@postgresql.org	2023-03-29 11:31:30 -04:00
Tom Lane	58c9600a9f	Remove empty function BufmgrCommit(). This function has been a no-op for over a decade. Even if bufmgr regains a need to be called during commit, it seems unlikely that the most appropriate call points would be precisely here, so it's not doing us much good as a placeholder either. Now, removing it probably doesn't save any noticeable number of cycles --- but the main call is inside the commit critical section, and the less work done there the better. Matthias van de Meent Discussion: https://postgr.es/m/CAEze2Wi1=tLKbxZnXzcD+8fYKyKqBtivVakLQC_mYBsP4Y8qVA@mail.gmail.com	2023-03-29 09:13:57 -04:00
Alvaro Herrera	7081ac46ac	SQL/JSON: add standard JSON constructor functions This commit introduces the SQL/JSON standard-conforming constructors for JSON types: JSON_ARRAY() JSON_ARRAYAGG() JSON_OBJECT() JSON_OBJECTAGG() Most of the functionality was already present in PostgreSQL-specific functions, but these include some new functionality such as the ability to skip or include NULL values, and to allow duplicate keys or throw error when they are found, as well as the standard specified syntax to specify output type and format. Author: Nikita Glukhov <n.gluhov@postgrespro.ru> Author: Teodor Sigaev <teodor@sigaev.ru> Author: Oleg Bartunov <obartunov@gmail.com> Author: Alexander Korotkov <aekorotkov@gmail.com> Author: Amit Langote <amitlangote09@gmail.com> Reviewers have included (in no particular order) Andres Freund, Alexander Korotkov, Pavel Stehule, Andrew Alsup, Erik Rijkers, Zihong Yu, Himanshu Upadhyaya, Daniel Gustafsson, Justin Pryzby. Discussion: https://postgr.es/m/CAF4Au4w2x-5LTnN_bxky-mq4=WOqsGsxSpENCzHRAzSnEd8+WQ@mail.gmail.com Discussion: https://postgr.es/m/cd0bb935-0158-78a7-08b5-904886deac4b@postgrespro.ru Discussion: https://postgr.es/m/20220616233130.rparivafipt6doj3@alap3.anarazel.de Discussion: https://postgr.es/m/abd9b83b-aa66-f230-3d6d-734817f0995d%40postgresql.org	2023-03-29 12:11:36 +02:00
Peter Eisentraut	38b7437b90	Fix some section numbers in information_schema.sql Some of the section numbers that appeared multiple times were not updated completely by previous changes `d61d9aa750` and `eb3a1376c9`.	2023-03-29 11:34:37 +02:00
Peter Eisentraut	0d15afc875	Simplify useless 0L constants In ancient times, these belonged to arguments or fields that were actually of type long, but now they are not anymore, so this "L" decoration is just confusing. (Some other 0L and other "L" constants remain, where they are actually associated with a long type.)	2023-03-29 08:25:12 +02:00
Amit Kapila	062a844424	Avoid syncing data twice for the 'publish_via_partition_root' option. When there are multiple publications for a subscription and one of those publishes via the parent table by using publish_via_partition_root and the other one directly publishes the child table, we end up copying the same data twice during initial synchronization. The reason for this was that we get both the parent and child tables from the publisher and try to copy the data for both of them. This patch extends the function pg_get_publication_tables() to take a publication list as its input parameter. This allows us to exclude a partition table whose ancestor is published by the same publication list. This problem does exist in back-branches but we decide to fix it there in a separate commit if required. The fix for back-branches requires quite complicated changes to fetch the required table information from the publisher as we can't update the function pg_get_publication_tables() in back-branches. We are not sure whether we want to deviate and complicate the code in back-branches for this problem as there are no field reports yet. Author: Wang wei Reviewed-by: Peter Smith, Jacob Champion, Kuroda Hayato, Vignesh C, Osumi Takamichi, Amit Kapila Discussion: https://postgr.es/m/OS0PR01MB57167F45D481F78CDC5986F794B99@OS0PR01MB5716.jpnprd01.prod.outlook.com	2023-03-29 10:46:58 +05:30
Jeff Davis	1671f990dd	Validate ICU locales. For ICU collations, ensure that the locale's language exists in ICU, and that the locale can be opened. Basic validation helps avoid minor mistakes and misspellings, which often fall back to the root locale instead of the intended locale. It's even more important to avoid such mistakes in ICU versions 54 and earlier, where the same (misspelled) locale string could fall back to different locales depending on the environment. Discussion: https://postgr.es/m/11b1eeb7e7667fdd4178497aeb796c48d26e69b9.camel@j-davis.com Discussion: https://postgr.es/m/df2efad0cae7c65180df8e5ebb709e5eb4f2a82b.camel@j-davis.com Reviewed-by: Peter Eisentraut	2023-03-28 16:34:29 -07:00
Tom Lane	326a33a289	Fix corner-case planner failure for MERGE. MERGE planning could fail with "variable not found in subplan target list" if the target table is partitioned and all its partitions are excluded at plan time, or in the case where it has no partitions but used to have some. This happened because distribute_row_identity_vars thought it didn't need to make the target table's reltarget list fully valid; but if we generate a join plan then that is required because the dummy Result node's tlist will be made from the reltarget. The same logic appears in distribute_row_identity_vars in v14, but AFAICS the problem is unreachable in that branch for lack of MERGE. In other updating statements, the target table is always inner-joined to any other tables, so if the target is known dummy then the whole plan reduces to dummy, so no join nodes are created. So I'll refrain from back-patching this code change to v14 for now. Per report from Alvaro Herrera. Discussion: https://postgr.es/m/20230328112248.6as34mlx5sr4kltg@alvherre.pgsql	2023-03-28 11:39:24 -04:00
Jeff Davis	8b3eb0c584	Fix error inconsistency in older ICU versions. To support older ICU versions, we rely on icu_set_collation_attributes() to do error checking that is handled directly by ucol_open() in newer ICU versions. Commit `3b50275b12` introduced a slight inconsistency, where the error report includes the fixed-up locale string, rather than the locale string passed to pg_ucol_open(). Refactor slightly so that pg_ucol_open() handles the errors from both ucol_open() and icu_set_collation_attributes(), making it easier to see any differences between the error reports. It also makes pg_ucol_open() responsible for closing the UCollator on error, which seems like the right place. Discussion: https://postgr.es/m/04182066-7655-344a-b8b7-040b1b2490fb%40enterprisedb.com Reviewed-by: Peter Eisentraut	2023-03-28 08:24:18 -07:00
Peter Eisentraut	90189eefc1	Save a few bytes in pg_attribute Change the columns attndims, attstattarget, and attinhcount from int32 to int16, and reorder a bit. This saves some space (currently 4 bytes) in pg_attribute and tuple descriptors, which translates into small performance benefits and/or room for new columns in pg_attribute needed by future features. attndims and attinhcount are never realistically used with values larger than int16. Just to be sure, add some overflow checks. attstattarget is currently limited explicitly to 10000. For consistency, pg_constraint.coninhcount is also changed like attinhcount. Discussion: https://www.postgresql.org/message-id/flat/d07ffc2b-e0e8-77f7-38fb-be921dff71af%40enterprisedb.com	2023-03-28 10:05:56 +02:00
Michael Paquier	4efd0bf7ea	Generate a few more functions of pgstatfuncs.c with macros Two new macros are added with their respective functions switched to use them. These are for functions with millisecond stats, with and without "xact" in their names (for the stats that can be tracked within a transaction). While on it, prefix the macro for float8 on database entries with "_MS", as it does a us->ms conversion, based on a suggestion from Andres Freund. Author: Bertrand Drouvot Discussion: https://postgr.es/m/6e2efb4f-6fd0-807e-f6bf-94207db8183a@gmail.com	2023-03-28 07:35:33 +09:00
Tom Lane	a3c9d35ae1	Reject attempts to alter composite types used in indexes. find_composite_type_dependencies() ignored indexes, which is a poor decision because an expression index could have a stored column of a composite (or other container) type even when the underlying table does not. Teach it to detect such cases and error out. We have to work a bit harder than for other relations because the pg_depend entry won't identify the specific index column of concern, but it's not much new code. This does not address bug #17872's original complaint that dropping a column in such a type might lead to violations of the uniqueness property that a unique index is supposed to ensure. That seems of much less concern to me because it won't lead to crashes. Per bug #17872 from Alexander Lakhin. Back-patch to all supported branches. Discussion: https://postgr.es/m/17872-d0fbb799dc3fd85d@postgresql.org	2023-03-27 15:04:15 -04:00
Daniel Gustafsson	b577743000	Make SCRAM iteration count configurable Replace the hardcoded value with a GUC such that the iteration count can be raised in order to increase protection against brute-force attacks. The hardcoded value for SCRAM iteration count was defined to be 4096, which is taken from RFC 7677, so set the default for the GUC to 4096 to match. In RFC 7677 the recommendation is at least 15000 iterations but 4096 is listed as a SHOULD requirement given that it's estimated to yield a 0.5s processing time on a mobile handset of the time of RFC writing (late 2015). Raising the iteration count of SCRAM will make stored passwords more resilient to brute-force attacks at a higher computational cost during connection establishment. Lowering the count will reduce computational overhead during connections at the tradeoff of reducing strength against brute-force attacks. There are however platforms where even a modest iteration count yields a too high computational overhead, with weaker password encryption schemes chosen as a result. In these situations, SCRAM with a very low iteration count still gives benefits over weaker schemes like md5, so we allow the iteration count to be set to one at the low end. The new GUC is intentionally generically named such that it can be made to support future SCRAM standards should they emerge. At that point the value can be made into key:value pairs with an undefined key as a default which will be backwards compatible with this. Reviewed-by: Michael Paquier <michael@paquier.xyz> Reviewed-by: Jonathan S. Katz <jkatz@postgresql.org> Discussion: https://postgr.es/m/F72E7BC7-189F-4B17-BF47-9735EB72C364@yesql.se	2023-03-27 09:46:29 +02:00
Michael Paquier	850f4b4c8c	Generate pg_stat_get_xact*() functions for relations using macros This change replaces seven functions definitions by macros. This is the same idea as `8018ffb` or `83a1a1b`, taking advantage of the variable rename done in `8089517` for relation entries. Author: Bertrand Drouvot Discussion: https://postgr.es/m/631e3084-c5d9-8463-7540-fcff4674caa5@gmail.com	2023-03-27 09:57:41 +09:00
Tom Lane	554841699f	Fix oversights in array manipulation. The nested-arrays code path in ExecEvalArrayExpr() used palloc to allocate the result array, whereas every other array-creating function has used palloc0 since `18c0b4ecc`. This mostly works, but unused bits past the end of the nulls bitmap may end up undefined. That causes valgrind complaints with -DWRITE_READ_PARSE_PLAN_TREES, and could cause planner misbehavior as cited in `18c0b4ecc`. There seems no very good reason why we should strive to avoid palloc0 in just this one case, so fix it the easy way with s/palloc/palloc0/. While looking at that I noted that we also failed to check for overflow of "nbytes" and "nitems" while summing the sizes of the sub-arrays, potentially allowing a crash due to undersized output allocation. For "nbytes", follow the policy used by other array-munging code of checking for overflow after each addition. (As elsewhere, the last addition of the array's overhead space doesn't need an extra check, since palloc itself will catch a value between 1Gb and 2Gb.) For "nitems", there's no very good reason to sum the inputs at all, since we can perfectly well use ArrayGetNItems' result instead of ignoring it. Per discussion of this bug, also remove redundant zeroing of the nulls bitmap in array_set_element and array_set_slice. Patch by Alexander Lakhin and myself, per bug #17858 from Alexander Lakhin; thanks also to Richard Guo. These bugs are a dozen years old, so back-patch to all supported branches. Discussion: https://postgr.es/m/17858-8fd287fd3663d051@postgresql.org	2023-03-26 13:41:06 -04:00
Daniel Gustafsson	d435f15fff	Add SysCacheGetAttrNotNull for guaranteed not-null attrs When extracting an attr from a cached tuple in the syscache with SysCacheGetAttr the isnull parameter must be checked in case the attr cannot be NULL. For cases when this is known beforehand, a wrapper is introduced which perform the errorhandling internally on behalf of the caller, invoking an elog in case of a NULL attr. Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Reviewed-by: Peter Eisentraut <peter.eisentraut@enterprisedb.com> Reviewed-by: David Rowley <dgrowleyml@gmail.com> Discussion: https://postgr.es/m/AD76405E-DB45-46B6-941F-17B1EB3A9076@yesql.se	2023-03-25 22:49:33 +01:00
Noah Misch	e33967b13b	Comment on expectations for AutoVacuumWorkItem handlers. This might prevent a repeat of the brin_summarize_range() vulnerability that commit `a117cebd63` fixed.	2023-03-25 13:00:27 -07:00
Tom Lane	27f5c712b2	Fix CREATE INDEX progress reporting for multi-level partitioning. The "partitions_total" and "partitions_done" fields were updated as though the current level of partitioning was the only one. In multi-level cases, not only could partitions_total change over the course of the command, but partitions_done could go backwards or exceed the currently-reported partitions_total. Fix by setting partitions_total to the total number of direct and indirect children once at command start, and then just incrementing partitions_done at appropriate points. Invent a new progress monitoring function "pgstat_progress_incr_param" to simplify doing the latter. We can avoid adding cost for the former when doing CREATE INDEX, because ProcessUtility already enumerates the children and it's pretty easy to pass the count down to DefineIndex. In principle the same could be done in ALTER TABLE, but that's structurally difficult; for now, just eat the cost of an extra find_all_inheritors scan in that case. Ilya Gladyshev and Justin Pryzby Discussion: https://postgr.es/m/a15f904a70924ffa4ca25c3c744cff31e0e6e143.camel@gmail.com	2023-03-25 15:34:03 -04:00
Jeff Davis	81a6d57e33	Fix abbreviated keys bug introduced in `d87d548cd0`. Discussion: http://postgr.es/m/CAMkU=1z17XJatF-rMCY3Cjqcxer-Kyn57x6h3OSCpJ0LpAp0ig@mail.gmail.com Reported-by: Jeff Janes	2023-03-25 11:08:32 -07:00
Tom Lane	3c05284d83	Invent GENERIC_PLAN option for EXPLAIN. This provides a very simple way to see the generic plan for a parameterized query. Without this, it's necessary to define a prepared statement and temporarily change plan_cache_mode, which is a bit tedious. One thing that's a bit of a hack perhaps is that we disable execution-time partition pruning when the GENERIC_PLAN option is given. That's because the pruning code may attempt to fetch the value of one of the parameters, which would fail. Laurenz Albe, reviewed by Julien Rouhaud, Christoph Berg, Michel Pelletier, Jim Jones, and myself Discussion: https://postgr.es/m/0a29b954b10b57f0d135fe12aa0909bd41883eb0.camel@cybertec.at	2023-03-24 17:07:22 -04:00
Jeff Davis	a03b3b6b4a	Avoid potential UCollator leak for older ICU versions. ICU versions 53 and earlier rely on icu_set_collation_attributes() to process the attributes in the locale string. Avoid leaking the already-opened UCollator object if an error is encountered. Discussion: https://postgr.es/m/04182066-7655-344a-b8b7-040b1b2490fb%40enterprisedb.com Reviewed-by: Peter Eisentraut	2023-03-24 08:48:03 -07:00
Jeff Davis	9a24289915	pg_locale.c: change ereport() to elog(). Discussion: https://postgr.es/m/73553013-3926-0f34-0fb8-f37909fe4902@enterprisedb.com Reported-by: Peter Eisentraut	2023-03-24 08:47:42 -07:00
Daniel Gustafsson	a04761ac77	Fix typo in header comment Commit `4c04be9b0` accidentally left off the _id portion of the function name in the header comment. Author: Junwang Zhao <zhjwpku@gmail.com> Discussion: https://postgr.es/m/CAEG8a3LP+ytnAXSzR=yiEaQrde+iCybMHsuPn9n=UN3puV_1tw@mail.gmail.com	2023-03-24 09:03:31 +01:00
Peter Eisentraut	a9bc04b211	Fix incorrect format placeholders The fields of NLSVERSIONINFOEX are of type DWORD, which is unsigned long, so the results of the computations being printed are also of type unsigned long.	2023-03-24 07:21:40 +01:00
Andres Freund	e522049f23	meson: add install-{quiet, world} targets To define our own install target, we need dependencies on the i18n targets, which we did not collect so far. Discussion: https://postgr.es/m/3fc3bb9b-f7f8-d442-35c1-ec82280c564a@enterprisedb.com	2023-03-23 21:20:18 -07:00
Michael Paquier	8089517ab8	Rename fields in pgstat structures for functions and relations This commit renames the members of a few pgstat structures related to functions and relations, by respectively removing their prefix "f_" and "t_". The statistics for functions and relations and handled in their own file, and pgstatfuncs.c associates each field in a structure variable named based on the object type handled, so no information is lost with this rename. This will help with some of the refactoring aimed for pgstatfuncs.c, as this makes more consistent the field names with the SQL functions retrieving them. Author: Bertrand Drouvot Reviewed-by: Michael Paquier, Melanie Plageman Discussion: https://postgr.es/m/9142f62a-a422-145c-bde0-b5bc498a4ada@gmail.com	2023-03-24 08:46:29 +09:00
Peter Geoghegan	ae4fdde135	Count updates that move row to a new page. Add pgstat counter to track row updates that result in the successor version going to a new heap page, leaving behind an original version whose t_ctid points to the new version. The current count is shown by the n_tup_newpage_upd column of each of the pg_stat_*_tables views. The new n_tup_newpage_upd column complements the existing n_tup_hot_upd and n_tup_upd columns. Tables that have high n_tup_newpage_upd values (relative to n_tup_upd) are good candidates for tuning heap fillfactor. Corey Huinker, with small tweaks by me. Author: Corey Huinker <corey.huinker@gmail.com> Reviewed-By: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Andres Freund <andres@anarazel.de> Discussion: https://postgr.es/m/CADkLM=ded21M9iZ36hHm-vj2rE2d=zcKpUQMds__Xm2pxLfHKA@mail.gmail.com	2023-03-23 11:16:17 -07:00
Jeff Davis	3b50275b12	Handle the "und" locale in ICU versions 54 and older. The "und" locale is an alternative spelling of the root locale, but it was not recognized until ICU 55. To maintain common behavior across all supported ICU versions, check for "und" and replace with "root" before opening. Previously, the lack of support for "und" was dangerous, because versions 54 and older fall back to the environment when a locale is not found. If the user specified "und" for the language (which is expected and documented), it could not only resolve to the wrong collator, but it could unexpectedly change (which could lead to corrupt indexes). This effectively reverts commit `d72900bded`, which worked around the problem for the built-in "unicode" collation, and is no longer necessary. Discussion: https://postgr.es/m/60da0cecfb512a78b8666b31631a636215d8ce73.camel@j-davis.com Discussion: https://postgr.es/m/0c6fa66f2753217d2a40480a96bd2ccf023536a1.camel@j-davis.com Reviewed-by: Peter Eisentraut	2023-03-23 10:08:27 -07:00
Jeff Davis	a326aac8f1	Wrap ICU ucol_open(). Hide details of supporting older ICU versions in a wrapper function. The current code only needs to handle icu_set_collation_attributes(), but a subsequent commit will add additional version-specific code. Discussion: https://postgr.es/m/7ee414ad-deb5-1144-8a0e-b34ae3b71cd5@enterprisedb.com Reviewed-by: Peter Eisentraut	2023-03-23 09:15:25 -07:00
Amit Kapila	adedf54e65	Ignore generated columns during apply of update/delete. We fail to apply updates and deletes when the REPLICA IDENTITY FULL is used for the table having generated columns. We didn't use to ignore generated columns while doing tuple comparison among the tuples from the publisher and subscriber during apply of updates and deletes. Author: Onder Kalaci Reviewed-by: Shi yu, Amit Kapila Backpatch-through: 12 Discussion: https://postgr.es/m/CACawEhVQC9WoofunvXg12aXtbqKnEgWxoRx3+v8q32AWYsdpGg@mail.gmail.com	2023-03-23 11:58:36 +05:30
Amit Kapila	ecb696527c	Allow logical replication to copy tables in binary format. This patch allows copying tables in the binary format during table synchronization when the binary option for a subscription is enabled. Previously, tables are copied in text format even if the subscription is created with the binary option enabled. Copying tables in binary format may reduce the time spent depending on column types. A binary copy for initial table synchronization is supported only when both publisher and subscriber are v16 or later. Author: Melih Mutlu Reviewed-by: Peter Smith, Shi yu, Euler Taveira, Vignesh C, Kuroda Hayato, Osumi Takamichi, Bharath Rupireddy, Hou Zhijie Discussion: https://postgr.es/m/CAGPVpCQvAziCLknEnygY0v1-KBtg%2BOm-9JHJYZOnNPKFJPompw%40mail.gmail.com	2023-03-23 08:45:51 +05:30
Thomas Munro	8fba928fd7	Improve the naming of Parallel Hash Join phases. * Commit `3048898e` dropped -ING from PHJ wait event names. Update the corresponding barrier phases names to match. * Rename the "DONE" phases to "FREE". That's symmetrical with "ALLOCATE", and names the activity that actually happens in that phase (as we do for the other phases) rather than a state. The bug fixed by commit `8d578b9b` might have been more obvious with this name. * Rename the batch/bucket growth barriers' "ALLOCATE" phases to "REALLOCATE", a better description of what they do. * Update the high level comments about phases to highlight phases are executed by a single process with an asterisk (mostly memory management phases). No behavior change, as this is just improving internal identifiers. The only user-visible sign of this is that a couple of wait events' display names change from "...Allocate" to "...Reallocate" in pg_stat_activity, to stay in sync with the internal names. Reviewed-by: Melanie Plageman <melanieplageman@gmail.com> Discussion: https://postgr.es/m/CA%2BhUKG%2BMDpwF2Eo2LAvzd%3DpOh81wUTsrwU1uAwR-v6OGBB6%2B7g%40mail.gmail.com	2023-03-23 13:14:25 +13:00
Alexander Korotkov	11470f544e	Allow locking updated tuples in tuple_update() and tuple_delete() Currently, in read committed transaction isolation mode (default), we have the following sequence of actions when tuple_update()/tuple_delete() finds the tuple updated by concurrent transaction. 1. Attempt to update/delete tuple with tuple_update()/tuple_delete(), which returns TM_Updated. 2. Lock tuple with tuple_lock(). 3. Re-evaluate plan qual (recheck if we still need to update/delete and calculate the new tuple for update). 4. Second attempt to update/delete tuple with tuple_update()/tuple_delete(). This attempt should be successful, since the tuple was previously locked. This patch eliminates step 2 by taking the lock during first tuple_update()/tuple_delete() call. Heap table access method saves some efforts by checking the updated tuple once instead of twice. Future undo-based table access methods, which will start from the latest row version, can immediately place a lock there. The code in nodeModifyTable.c is simplified by removing the nested switch/case. Discussion: https://postgr.es/m/CAPpHfdua-YFw3XTprfutzGp28xXLigFtzNbuFY8yPhqeq6X5kg%40mail.gmail.com Reviewed-by: Aleksander Alekseev, Pavel Borisov, Vignesh C, Mason Sharp Reviewed-by: Andres Freund, Chris Travers	2023-03-23 00:26:59 +03:00
Alexander Korotkov	764da7710b	Evade extra table_tuple_fetch_row_version() in ExecUpdate()/ExecDelete() When we lock tuple using table_tuple_lock() then we at the same time fetch the locked tuple to the slot. In this case we can skip extra table_tuple_fetch_row_version() thank to we've already fetched the 'old' tuple and nobody can change it concurrently since it's locked. Discussion: https://postgr.es/m/CAPpHfdua-YFw3XTprfutzGp28xXLigFtzNbuFY8yPhqeq6X5kg%40mail.gmail.com Reviewed-by: Aleksander Alekseev, Pavel Borisov, Vignesh C, Mason Sharp Reviewed-by: Andres Freund, Chris Travers	2023-03-23 00:26:59 +03:00
Andres Freund	5df319f3d5	Fix memory leak and inefficiency in CREATE DATABASE ... STRATEGY WAL_LOG RelationCopyStorageUsingBuffer() did not free the strategies used to access the source / target relation. They memory was released at the end of the transaction, but when using a template database with a lot of relations, the temporary leak can become big prohibitively big. RelationCopyStorageUsingBuffer() acquired the buffer for the target relation with RBM_NORMAL, therefore requiring a read of a block guaranteed to be zero. Use RBM_ZERO_AND_LOCK instead. Reviewed-by: Robert Haas <robertmhaas@gmail.com> Discussion: https://postgr.es/m/20230321070113.o2vqqxogjykwgfrr@awork3.anarazel.de Backpatch: 15-, where STRATEGY WAL_LOG was introduced	2023-03-22 09:20:34 -07:00
Jeff Davis	869650fa86	Support language tags in older ICU versions (53 and earlier). By calling uloc_canonicalize() before parsing the attributes, the existing locale attribute parsing logic works on language tags as well. Fix a small memory leak, too. Discussion: http://postgr.es/m/60da0cecfb512a78b8666b31631a636215d8ce73.camel@j-davis.com Reviewed-by: Peter Eisentraut	2023-03-21 16:12:37 -07:00
Michael Paquier	e8e1f96c49	Fix make maintainer-clean with queryjumblefuncs.*.c files in src/backend/nodes/ The files generated by gen_node_support.pl for query jumbling (queryjumblefuncs.funcs.c and queryjumblefuncs.switch.c) were not being removed on make maintainer-clean (they need to remain around after a simple "clean"). This commit makes the operation consistent with the copy, equal, out and read files. While on it, update a comment in the nodes'README where a reference to queryjumblefuncs.funcs.c was missing. Reported-by: Nathan Bossart Reviewed-by: Richard Guo, Daniel Gustafsson Discussion: https://postgr.es/m/ZBgAfTHcL6W7zGdW@paquier.xyz	2023-03-22 07:51:16 +09:00
David Rowley	b94c671648	Fix incorrect comment in preptlist.c Author: Etsuro Fujita Reviewed-by: Richard Guo, Tom Lane Discussion: https://postgr.es/m/CAPmGK15V8dcVxL9vcgVWPHV6pw1qzM42LzoUkQDB7-e+1onnJw@mail.gmail.com	2023-03-22 08:58:13 +13:00
David Rowley	f48b4f892f	Correct Memoize's estimated cache hit ratio calculation As demonstrated by David Johnston, the Memoize cache hit ratio calculation wasn't quite correct. This change only affects the estimated hit ratio when the estimated number of entries to cache is estimated not to fit inside the cache. For example, if we expect 2000 distinct cache key values and only expect to be able to cache 1000 of those at once due to memory constraints, with an estimate of 10000 calls, if we could store all entries then the hit ratio should be 80% to account for the first 2000 of the 10000 calls to be a cache miss due to the value not being cached yet. If we can only store 1000 entries for each of the 2000 distinct possible values at once then the 80% should be reduced by half to make the final estimate of 40%. Previously, the calculation would have produced an estimated hit ratio of 30%, which wasn't correct. Apply to master only so as not to destabilize plans in the back branches. Reported-by: David G. Johnston Discussion: https://postgr.es/m/CAKFQuwZEmcNk3YQo2Xj4EDUOdY6qakad31rOD1Vc4q1_s68-Ew@mail.gmail.com Discussion: https://postgr.es/m/CAApHDvrV44LwiF4W_qf_RpbGYWSgp1kF=cZr+kTRRaALUfmXqw@mail.gmail.com	2023-03-22 08:44:54 +13:00
Amit Kapila	b797def595	Ignore dropped columns during apply of update/delete. We fail to apply updates and deletes when the REPLICA IDENTITY FULL is used for the table having dropped columns. We didn't use to ignore dropped columns while doing tuple comparison among the tuples from the publisher and subscriber during apply of updates and deletes. Author: Onder Kalaci, Shi yu Reviewed-by: Amit Kapila Discussion: https://postgr.es/m/CACawEhVQC9WoofunvXg12aXtbqKnEgWxoRx3+v8q32AWYsdpGg@mail.gmail.com	2023-03-21 09:47:21 +05:30
Thomas Munro	8d578b9b2e	Fix race in parallel hash join batch cleanup, take II. With unlucky timing and parallel_leader_participation=off (not the default), PHJ could attempt to access per-batch shared state just as it was being freed. There was code intended to prevent that by checking for a cleared pointer, but it was racy. Fix, by introducing an extra barrier phase. The new phase PHJ_BUILD_RUNNING means that it's safe to access the per-batch state to find a batch to help with, and PHJ_BUILD_DONE means that it is too late. The last to detach will free the array of per-batch state as before, but now it will also atomically advance the phase, so that late attachers can avoid the hazard. This mirrors the way per-batch hash tables are freed (see phases PHJ_BATCH_PROBING and PHJ_BATCH_DONE). An earlier attempt to fix this (commit `3b8981b6`, later reverted) missed one special case. When the inner side is empty (the "empty inner optimization), the build barrier would only make it to PHJ_BUILD_HASHING_INNER phase before workers attempted to detach from the hashtable. In that case, fast-forward the build barrier to PHJ_BUILD_RUNNING before proceeding, so that our later assertions hold and we can still negotiate who is cleaning up. Revealed by build farm failures, where BarrierAttach() failed a sanity check assertion, because the memory had been clobbered by dsa_free(). In non-assert builds, the result could be a segmentation fault. Back-patch to all supported releases. Author: Thomas Munro <thomas.munro@gmail.com> Author: Melanie Plageman <melanieplageman@gmail.com> Reported-by: Michael Paquier <michael@paquier.xyz> Reported-by: David Geier <geidav.pg@gmail.com> Tested-by: David Geier <geidav.pg@gmail.com> Discussion: https://postgr.es/m/20200929061142.GA29096%40paquier.xyz	2023-03-21 14:29:34 +13:00

... 2 3 4 5 6 ...

24330 Commits