postgresql

Commit Graph

Author	SHA1	Message	Date
John Naylor	4b968e2027	Fix incorrect return type fasthash32() calculates a 32-bit hashcode, but the return type was uint64. Change to uint32. Noted by Jeff Davis Discussion: https://postgr.es/m/b16c93e6c736a422d4de668343515375664eb05d.camel%40j-davis.com	2024-04-06 12:20:40 +07:00
Thomas Munro	aa1e8c2064	Improve read_stream.c's fast path. The "fast path" for well cached scans that don't do any I/O was accidentally coded in a way that could only be triggered by pg_prewarm's usage pattern, which starts out with a higher distance because of the flags it passes in. We want it to work for streaming sequential scans too, once that patch is committed. Adjust. Reviewed-by: Melanie Plageman <melanieplageman@gmail.com> Discussion: https://postgr.es/m/CA%2BhUKGKXZALJ%3D6aArUsXRJzBm%3Dqvc4AWp7%3DiJNXJQqpbRLnD_w%40mail.gmail.com	2024-04-06 18:00:35 +13:00
Andres Freund	9e7386924e	Fix headerscheck violation introduced in `f8ce4ed78c` Per ci.	2024-04-05 14:27:45 -07:00
Andrew Dunstan	c3e60f3d7e	Silence some compiler warnings in commit `3311ea86ed` Per report from Nathan Bossart	2024-04-05 16:08:40 -04:00
Robert Haas	55a5ee30cd	Fix incorrect calculation in BlockRefTableEntryGetBlocks. The previous formula was incorrect in the case where the function's nblocks argument was a multiple of BLOCKS_PER_CHUNK, which happens whenever a relation segment file is exactly 512MB or exactly 1GB in length. In such cases, the formula would calculate a stop_offset of 0 rather than 65536, resulting in modified blocks in the second half of a 1GB file, or all the modified blocks in a 512MB file, being omitted from the incremental backup. Reported off-list by Tomas Vondra and Jakub Wartak. Discussion: http://postgr.es/m/CA+TgmoYwy_KHp1-5GYNmVa=zdeJWhNH1T0SBmEuvqQNJEHj1Lw@mail.gmail.com	2024-04-05 13:41:19 -04:00
Tomas Vondra	079d94ab34	Check HAVE_COPY_FILE_RANGE before calling copy_file_range Fix a mistake in `ac81101551` - write_reconstructed_file() called copy_file_range() without properly checking HAVE_COPY_FILE_RANGE. Reported by several macOS machines. Also reported by cfbot, but I missed that issue before commit.	2024-04-05 19:38:20 +02:00
Tomas Vondra	ac81101551	Allow using copy_file_range in write_reconstructed_file This commit allows using copy_file_range() for efficient combining of data from multiple files, instead of simply reading/writing the blocks. Depending on the filesystem and other factors (size of the increment, distribution of modified blocks etc.) this may be faster than the block-by-block copy, but more importantly it enables various features provided by CoW filesystems. If a checksum needs to be calculated for the file, the same strategy as when copying whole files is used - copy_file_range is used to copy the blocks, but the file is also read for the checksum calculation. While the checksum calculation is rarely needed when cloning whole files, when reconstructing the files from multiple backups it needs to happen almost always (the only exception is when the user specified --no-manifest). Author: Tomas Vondra Reviewed-by: Thomas Munro, Jakub Wartak, Robert Haas Discussion: https://postgr.es/m/3024283a-7491-4240-80d0-421575f6bb23%40enterprisedb.com	2024-04-05 19:19:36 +02:00
Alvaro Herrera	b8b37e41ba	Make libpqsrv_cancel's return const char , not char Per headerscheck's C++ check. Discussion: https://postgr.es/m/372769.1712179784@sss.pgh.pa.us	2024-04-05 18:23:10 +02:00
Tomas Vondra	8e392595e5	Remove unused variable in checksum_file() The 'offset' variable was set but otherwise unused. Per buildfarm animals with clang, e.g. sifaka and longlin.	2024-04-05 18:13:15 +02:00
Tomas Vondra	f8ce4ed78c	Allow copying files using clone/copy_file_range Adds --clone/--copy-file-range options to pg_combinebackup, to allow copying files using file cloning or copy_file_range(). These methods may be faster than the standard block-by-block copy, but the main advantage is that they enable various features provided by CoW filesystems. This commit only uses these copy methods for files that did not change and can be copied as a whole from a single backup. These new copy methods may not be available on all platforms, in which case the command throws an error (immediately, even if no files would be copied as a whole). This early failure seems better than failing later when trying to copy the first file, after performing a lot of work on earlier files. If the requested copy method is available, but a checksum needs to be recalculated (e.g. because of a different checksum type), the file is still copied using the requested method, but it is also read for the checksum calculation. Depending on the filesystem this may be more expensive than just performing the simple copy, but it does enable the CoW benefits. Initial patch by Jakub Wartak, various reworks and improvements by me. Author: Tomas Vondra, Jakub Wartak Reviewed-by: Thomas Munro, Jakub Wartak, Robert Haas Discussion: https://postgr.es/m/3024283a-7491-4240-80d0-421575f6bb23%40enterprisedb.com	2024-04-05 18:01:32 +02:00
Tom Lane	3c5ff36aba	Suppress "variable may be used uninitialized" warning. Buildfarm member caiman is showing this, which surprises me because it's very late-model gcc (14.0.1) and ought to be smart enough to know that elog(ERROR) doesn't return. But we're likely to see the same from stupider compilers too, so add a dummy initialization in our usual style.	2024-04-05 10:58:30 -04:00
Tomas Vondra	10e3226ba1	Align blocks in incremental backups to BLCKSZ Align blocks stored in incremental files to BLCKSZ, so that the incremental backups work well with CoW filesystems. The header of the incremental file is padded with \0 to a multiple of BLCKSZ, so that the block data (also BLCKSZ) is aligned to BLCKSZ. The padding is added only to files containing block data, so files with just the header remain small. This adds a bit of extra space, but as the number of blocks increases the overhead gets negligible very quickly. And as the padding is \0 bytes, it does compress extremely well. The alignment is important for CoW filesystems that usually require the blocks to be aligned to filesystem page size for features like block sharing, deduplication etc. to work well. With the variable sized header the blocks in the increments were not aligned at all, negating the benefits of the CoW filesystems. This matters even for non-CoW filesystems, for example when placed on a RAID array. If the block is not aligned, it may easily span multiple devices, causing read and write amplification. It might be better to align the blocks to the filesystem page, not BLCKSZ, but we have no good way to determine that. Even if we determine the page size at the time of taking the backup, the backup may move. For now the BLCKSZ seems sufficient - the filesystem page is usually 4K, so the default BLCKSZ (8K by default) is aligned to that. Author: Tomas Vondra Reviewed-by: Robert Haas, Jakub Wartak Discussion: https://postgr.es/m/3024283a-7491-4240-80d0-421575f6bb23%40enterprisedb.com	2024-04-05 16:30:01 +02:00
Alvaro Herrera	ee1cbe806d	Operate XLogCtl->log{Write,Flush}Result with atomics This removes the need to hold both the info_lck spinlock and WALWriteLock to update them. We use stock atomic write instead, with WALWriteLock held. Readers can use atomic read, without any locking. This allows for some code to be reordered: some places were a bit contorted to avoid repeated spinlock acquisition, but that's no longer a concern, so we can turn them to more natural coding. Some further changes are possible (maybe to performance wins), but in this commit I did rather minimal ones only, to avoid increasing the blast radius. Reviewed-by: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> Reviewed-by: Jeff Davis <pgsql@j-davis.com> Reviewed-by: Andres Freund <andres@anarazel.de> (earlier versions) Discussion: https://postgr.es/m/20200831182156.GA3983@alvherre.pgsql	2024-04-05 16:14:39 +02:00
Amit Kapila	6f132ed693	Allow synced slots to have their inactive_since. This commit does two things: 1) Maintains inactive_since for sync slots whenever the slot is released just like any other regular slot. 2) Ensures the value is set to the current timestamp during the promotion of standby to help correctly interpret the time after promotion. We don't want the slots to appear inactive for a long time after promotion if they haven't been synchronized recently. This would also avoid the invalidation of such slots immediately after promotion if tomorrow we have a feature that invalidates slots based on their inactivity time. Whoever acquires the slot i.e. makes the slot active will reset it to NULL. Author: Bharath Rupireddy Reviewed-by: Bertrand Drouvot, Amit Kapila, Shveta Malik, Masahiko Sawada Discussion: https://postgr.es/m/CAA4eK1KrPGwfZV9LYGidjxHeW+rxJ=E2ThjXvwRGLO=iLNuo=Q@mail.gmail.com Discussion: https://postgr.es/m/CALj2ACW4aUe-_uFQOjdWCEN-xXoLGhmvRFnL8SNw_TZ5nJe+aw@mail.gmail.com Discussion: https://postgr.es/m/CA+Tgmob_Ta-t2ty8QrKHBGnNLrf4ZYcwhGHGFsuUoFrAEDw4sA@mail.gmail.com	2024-04-05 09:48:49 +05:30
Michael Paquier	f98dbdeb51	Add "ABI_compatibility" regions to wait_event_names.txt The current design behind the automatic generation of the C code and documentation related to wait events introduced in `fa88928470` does not offer a way to attach new wait events without breaking ABI compatibility, as all the events are forcibly reordered for each section in the input file wait_event_names.txt. Adding new wait events to stable branches is something that has happened in the past, `0b6517a3b7` being a recent example of that with VERSION_FILE_SYNC, so we need a way to generate any C code for wait events while maintaining compatibility on stable branches already released. This commit solves this issue by adding a new region called "ABI_compatibility" (keyword could be updated to something else if someone had a better idea) to each section of wait_event_names.txt, so as one can add new wait events to stable branches in wait_event_names.txt while keeping the code ABI-compatible. "ABI_compatibility" has no impact on the documentation generated: all the wait events of one section are still alphabetically ordered. LWLock and Lock sections generate their C code elsewhere, so they do not need an "ABI_compatibility" region. For example, let's imagine a wait_event_names.txt like that: Section: ClassName - Foo FOO_1 "Waiting in Foo 1" FOO_2 "Waiting in Foo 2" ABI_compatibility: NEW_FOO_1 "Waiting in New Foo 1" NEW_BAR_1 "Waiting in New Bar 1" This results in the following enum, where the events in the ABI region are listed last with the same ordering as in wait_event_names.txt: typedef enum { WAIT_EVENT_FOO_1, WAIT_EVENT_FOO_2, WAIT_EVENT_NEW_FOO_1, WAIT_EVENT_NEW_BAR_1 } WaitEventFoo; New wait events added in stable branches should be added at the end of each ABI_compatibility region, and ABI_compatibility should remain empty on HEAD and unreleased stable branches. This design has been suggested by Noah Misch and me. Reported-by: Noah Misch Author: Bertrand Drouvot Reviewed-by: Michael Paquier Discussion: https://postgr.es/m/20240317183114.16@rfd.leadboat.com	2024-04-05 08:56:52 +09:00
Jeff Davis	e2a2357671	Fix test failures when language environment is not UTF-8. For tests that depend on UTF-8 encoding, force LC_COLLATE=C and LC_CTYPE=C to avoid an encoding mismatch. Reported-by: Thomas Munro Discussion: https://postgr.es/m/CA+hUKGK-ZqV1njkG_=xcCqXh2fcMkz85FTMnhS2opm4ZerH=xw@mail.gmail.com	2024-04-04 16:10:12 -07:00
Robert Haas	e57fe3824e	Fix old, misleading comment for PGRES_POLLING_ACTIVE. The comment implies that we can eventually remove this, but per discussion, we actually don't want to do that ever, in order to maintain compatibility. Jelte Fennema-Nio, reviewed by Tristan Partin Discussion: http://postgr.es/m/CAGECzQTO72jKed5461W8cytV2Msh_e+WUZjOyX_RUQCbjk4LRA@mail.gmail.com	2024-04-04 16:22:11 -04:00
Robert Haas	12b964d781	Remove reachable call to pg_unreachable(). The loop just before this uses break, not return, so this line is reachable. Commit `cafe105655` introduced this issue. Jelte Fennema-Nio, reviewed by Tristan Partin Discussion: http://postgr.es/m/CAGECzQTO72jKed5461W8cytV2Msh_e+WUZjOyX_RUQCbjk4LRA@mail.gmail.com	2024-04-04 16:22:11 -04:00
Tom Lane	096a761d68	Fix ecpg's mechanism for detecting unsupported cases in the grammar. ecpg wants to emit a warning if it parses a SQL construct that the backend can parse but will immediately throw a FEATURE_NOT_SUPPORTED error for. The way it was testing for this was to see if the string ERRCODE_FEATURE_NOT_SUPPORTED appeared anywhere in the gram.y code. This is, of course, not nearly good enough, as there are plenty of rules in gram.y that throw that error only conditionally. There was a hack dating to 2008 to suppress the warning in one rule that doesn't even exist anymore, but nothing for other cases we've created since then. End result was that you could get "unsupported feature will be passed to server" warnings while compiling perfectly good SQL code in ecpg. Somehow we'd not heard complaints about this, but it was exposed by the recent addition of an ecpg test for a SQL/JSON construct. To fix, suppress the warning if the rule contains any "if" statement. Manual comparison of gram.y with the generated preproc.y file shows that the warning is now emitted only in rules where it's sensible. This problem has existed for a long time, so back-patch to all supported branches. Discussion: https://postgr.es/m/603615.1712245382@sss.pgh.pa.us	2024-04-04 15:31:53 -04:00
Tom Lane	332d406140	Further cleanup for recent JSON-related commits. The link commands in test_json_parser/Makefile were a long way shy of a load, as evidenced by buildfarm failures. Model them on pgxs.mk's PROGRAM rule. (Probably we should have put these two test programs in different subdirectories so we could actually use the PROGRAM rule. But I won't question that decision today.)	2024-04-04 13:39:12 -04:00
Tom Lane	2497a669ef	Further cleanup for recent JSON-related commits. Add overlooked .gitignore entries. Fix test_json_parser/Makefile to use the pgxs.mk clean rule instead of fighting it. Suppresses a warning from make, at least for me.	2024-04-04 13:21:25 -04:00
Andrew Dunstan	88620824c2	Tidy up after incremental JSON parser patch Remove junk left over from non-vpath builds. Try to remedy gettext error on some platforms.	2024-04-04 12:41:55 -04:00
Andrew Dunstan	1b00fe30a6	Fix warnings re typedef redefinition in `ea7b4e9a2a` and `3311ea86ed` Per gripe from Tom Lane and the buildfarm	2024-04-04 11:36:26 -04:00
Amit Langote	6f4d63e989	Add missing initialization in transformJsonFuncExpr() `de3600452b` added some code for the new JSON_TABLE_OP to that function but missed to initialize the default_format variable. Reported-by: Erik Rijkers <er@xs4all.nl> Discussion: https://postgr.es/m/254b2fa2-2f6b-a30a-20ee-21f8a2c12a50@xs4all.nl	2024-04-04 22:01:13 +09:00
Amit Langote	2f6e78b061	Fix typo introduced in `6185c9737` Reported-by: Jian He <jian.universality@gmail.com> Discussion: https://postgr.es/m/CACJufxGHiU0p0usjh5hnR0_ByZn4tq1FC3eKAtrQgJeKU6W9kw@mail.gmail.com	2024-04-04 20:53:23 +09:00
Amit Langote	de3600452b	Add basic JSON_TABLE() functionality JSON_TABLE() allows JSON data to be converted into a relational view and thus used, for example, in a FROM clause, like other tabular data. Data to show in the view is selected from a source JSON object using a JSON path expression to get a sequence of JSON objects that's called a "row pattern", which becomes the source to compute the SQL/JSON values that populate the view's output columns. Column values themselves are computed using JSON path expressions applied to each of the JSON objects comprising the "row pattern", for which the SQL/JSON query functions added in `6185c9737c` are used. To implement JSON_TABLE() as a table function, this augments the TableFunc and TableFuncScanState nodes that are currently used to support XMLTABLE() with some JSON_TABLE()-specific fields. Note that the JSON_TABLE() spec includes NESTED COLUMNS and PLAN clauses, which are required to provide more flexibility to extract data out of nested JSON objects, but they are not implemented here to keep this commit of manageable size. Author: Nikita Glukhov <n.gluhov@postgrespro.ru> Author: Teodor Sigaev <teodor@sigaev.ru> Author: Oleg Bartunov <obartunov@gmail.com> Author: Alexander Korotkov <aekorotkov@gmail.com> Author: Andrew Dunstan <andrew@dunslane.net> Author: Amit Langote <amitlangote09@gmail.com> Author: Jian He <jian.universality@gmail.com> Reviewers have included (in no particular order): Andres Freund, Alexander Korotkov, Pavel Stehule, Andrew Alsup, Erik Rijkers, Zihong Yu, Himanshu Upadhyaya, Daniel Gustafsson, Justin Pryzby, Álvaro Herrera, Jian He Discussion: https://postgr.es/m/cd0bb935-0158-78a7-08b5-904886deac4b@postgrespro.ru Discussion: https://postgr.es/m/20220616233130.rparivafipt6doj3@alap3.anarazel.de Discussion: https://postgr.es/m/abd9b83b-aa66-f230-3d6d-734817f0995d%40postgresql.org Discussion: https://postgr.es/m/CA+HiwqE4XTdfb1nW=Ojoy_tQSRhYt-q_kb6i5d4xcKyrLC1Nbg@mail.gmail.com	2024-04-04 20:20:15 +09:00
Peter Eisentraut	a9d6c38684	pg_upgrade: Fix typo in message	2024-04-04 12:58:57 +02:00
Andrew Dunstan	222e11a10a	Use incremental parsing of backup manifests. This changes the three callers to json_parse_manifest() to use json_parse_manifest_incremental_chunk() if appropriate. In the case of the backend caller, since we don't know the size of the manifest in advance we always call the incremental parser. Author: Andrew Dunstan Reviewed-By: Jacob Champion Discussion: https://postgr.es/m/7b0a51d6-0d9d-7366-3a1a-f74397a02f55@dunslane.net	2024-04-04 06:46:40 -04:00
Andrew Dunstan	ea7b4e9a2a	Add support for incrementally parsing backup manifests This adds the infrastructure for using the new non-recursive JSON parser in processing manifests. It's important that callers make sure that the last piece of json handed to the incremental manifest parser contains the entire last few lines of the manifest, including the checksum. Author: Andrew Dunstan Reviewed-By: Jacob Champion Discussion: https://postgr.es/m/7b0a51d6-0d9d-7366-3a1a-f74397a02f55@dunslane.net	2024-04-04 06:46:40 -04:00
Andrew Dunstan	3311ea86ed	Introduce a non-recursive JSON parser This parser uses an explicit prediction stack, unlike the present recursive descent parser where the parser state is represented on the call stack. This difference makes the new parser suitable for use in incremental parsing of huge JSON documents that cannot be conveniently handled piece-wise by the recursive descent parser. One potential use for this will be in parsing large backup manifests associated with incremental backups. Because this parser is somewhat slower than the recursive descent parser, it is not replacing that parser, but is an additional parser available to callers. For testing purposes, if the build is done with -DFORCE_JSON_PSTACK, all JSON parsing is done with the non-recursive parser, in which case only trivial regression differences in error messages should be observed. Author: Andrew Dunstan Reviewed-By: Jacob Champion Discussion: https://postgr.es/m/7b0a51d6-0d9d-7366-3a1a-f74397a02f55@dunslane.net	2024-04-04 06:46:40 -04:00
David Rowley	3a4a3537a9	Secondary refactor of heap scanning functions Similar to `44086b097`, refactor heap scanning functions to be more suitable for the read stream API. Author: Melanie Plageman Discussion: https://postgr.es/m/CAAKRu_YtXJiYKQvb5JsA2SkwrsizYLugs4sSOZh3EAjKUg=gEQ@mail.gmail.com	2024-04-04 19:22:45 +13:00
Michael Paquier	2a217c3717	Coordinate emit_log_hook and all log destinations to share the same timeval This would cause the timestamp values used by emit_log_hook and all the other log destinations to differ, because the timestamps are reset before sending the logs to the server and after calling the hook. This change matters for emit_log_hook when generating log information with 'n' or 'm' in log_line_prefix through log_status_format(), or when doing direct calls to get_formatted_log_time() like in the JSON or CSV log formats. While on it, this commit fixes a couple of comments related to the formatted timestamps where the JSON was not mentioned. Oversight in `dc686681e0`, that I have noticed while reviewing this patch. Author: Kambam Vinay, Michael Paquier Discussion: https://postgr.es/m/CANiRfmsK36A0i8mnQtzaxhSm3CUCimPwJPp4WQNq53OdSNkgWg@mail.gmail.com	2024-04-04 14:15:22 +09:00
David Rowley	44086b0975	Preliminary refactor of heap scanning functions To allow the use of the read stream API added in `b5a9b18cd` for sequential scans on heap tables, here we make some adjustments to make that change less invasive and perhaps make the code easier to follow in the process. Here heapgetpage() gets broken into two functions: 1) The part which reads the block has now been moved into a function named heapfetchbuf(). 2) The part which performed pruning and populated the scan's rs_vistuples[] array is now moved into a new function named heap_prepare_pagescan(). The functionality provided by heap_prepare_pagescan() was only ever required by SO_ALLOW_PAGEMODE scans, so the branching that was previously done in heapgetpage() is no longer needed as we simply just don't call heap_prepare_pagescan() from heapgettup() in the refactored code. Author: Melanie Plageman Discussion: https://postgr.es/m/CAAKRu_YtXJiYKQvb5JsA2SkwrsizYLugs4sSOZh3EAjKUg=gEQ@mail.gmail.com	2024-04-04 16:41:13 +13:00
Michael Paquier	85230a247c	pg_regress: Save errno in emit_tap_output_v() and switch to %m emit_tap_output_v() includes some fprintf() calls for some output related to the TAP protocol, that may clobber errno and break %m. This commit makes the logging of pg_regress smarter by saving errno before restoring it in vfprintf() where the input strings are used, removing the need for strerror(). All logs are switched to %m rather than strerror(), shaving some code. This was not a problem until now as pg_regress.c did not use %m, but the change is simple enough that we have no reason to not support this placeholder, and that will avoid future mistakes if new logs that include %m are added. Author: Dagfinn Ilmari Mannsåker Reviewed-by: Peter Eisentraunt, Michael Paquier Discussion: https://postgr.es/m/87sf13jhuw.fsf@wibble.ilmari.org	2024-04-04 11:33:07 +09:00
Jeff Davis	71b66171d0	CREATE INDEX: do not update stats during binary upgrade. During binary upgrade, indexes are created before the data is moved into place, so it will always be zero. This is not currently a major problem, but will be when we try to preserve statistics during upgrade. Author: Corey Huinker Discussion: https://postgr.es/m/CADkLM=daPdFB8V0tgFxK-dLowFsAEzWRWJHyxij7BG3kBjcouA@mail.gmail.com	2024-04-03 16:12:45 -07:00
Tom Lane	06286709ee	Invent SERIALIZE option for EXPLAIN. EXPLAIN (ANALYZE, SERIALIZE) allows collection of statistics about the volume of data emitted by a query, as well as the time taken to convert the data to the on-the-wire format. Previously there was no way to investigate this without actually sending the data to the client, in which case network transmission costs might swamp what you wanted to see. In particular this feature allows investigating the costs of de-TOASTing compressed or out-of-line data during formatting. Stepan Rutz and Matthias van de Meent, reviewed by Tomas Vondra and myself Discussion: https://postgr.es/m/ca0adb0e-fa4e-c37e-1cd7-91170b18cae1@gmx.de	2024-04-03 17:41:57 -04:00
Alexander Korotkov	97ce821e3e	Fix the parameters order for TableAmRoutine.relation_copy_for_cluster() Specify OldTable first, NewTable second as used by table_relation_copy_for_cluster() and as implemented in heapam_relation_copy_for_cluster(). Backpatch to PostgreSQL 12, where TableAmRoutine was introduced. Discussion: https://postgr.es/m/ME3P282MB3166860D4911AE82F92DF7C5B63F2%40ME3P282MB3166.AUSP282.PROD.OUTLOOK.COM Author: Japin Li Reviewed-by: Pavel Borisov Backpatch-through: 12	2024-04-04 00:34:28 +03:00
Alvaro Herrera	c9920a9068	Split XLogCtl->LogwrtResult into separate struct members After this change we have XLogCtl->logWriteResult and ->logFlushResult. There's no functional change, other than the fact that the assignment from shared memory to local is no longer done via struct assignment, but instead using a macro that copies each member separately. The current representation is inconvenient going forward; notably, we would like to add a new member "Copy" (to keep track of the last position copied into WAL buffers), so the symmetry between the values in shared memory vs. those in local would be lost. This also gives us freedom to later change the concurrency model for the values in shared memory: we can make them use atomics instead of relying on the info_lck spinlock. Reviewed-by: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> Discussion: https://postgr.es/m/202404031119.cd2kugjk2vho@alvherre.pgsql	2024-04-03 19:55:11 +02:00
Nathan Bossart	deb1486c7d	Inline pg_popcount() for small buffers. If there aren't many bytes to process, the function call overhead of the optimized implementation isn't worth taking, so instead we inline a loop that consults pg_number_of_ones in that case. If there are many bytes to process, we accept the function call overhead because the optimized versions are likely to be faster. The threshold at which we use the optimized implementation is set to the smallest amount of data required to use special popcount instructions. Reviewed-by: Alvaro Herrera, Tom Lane Discussion: https://postgr.es/m/20240402155301.GA2750455%40nathanxps13	2024-04-03 12:22:02 -05:00
Heikki Linnakangas	6dbb490261	Combine freezing and pruning steps in VACUUM Execute both freezing and pruning of tuples in the same heap_page_prune() function, now called heap_page_prune_and_freeze(), and emit a single WAL record containing all changes. That reduces the overall amount of WAL generated. This moves the freezing logic from vacuumlazy.c to the heap_page_prune_and_freeze() function. The main difference in the coding is that in vacuumlazy.c, we looked at the tuples after the pruning had already happened, but in heap_page_prune_and_freeze() we operate on the tuples before pruning. The heap_prepare_freeze_tuple() function is now invoked after we have determined that a tuple is not going to be pruned away. VACUUM no longer needs to loop through the items on the page after pruning. heap_page_prune_and_freeze() does all the work. It now returns the list of dead offsets, including existing LP_DEAD items, to the caller. Similarly it's now responsible for tracking 'all_visible', 'all_frozen', and 'hastup' on the caller's behalf. Author: Melanie Plageman <melanieplageman@gmail.com> Discussion: https://www.postgresql.org/message-id/20240330055710.kqg6ii2cdojsxgje@liskov	2024-04-03 19:32:28 +03:00
Heikki Linnakangas	26d138f644	Refactor how heap_prune_chain() updates prunable_xid In preparation of freezing and counting tuples which are not candidates for pruning, split heap_prune_record_unchanged() into multiple functions, depending the kind of line pointer. That's not too interesting right now, but makes the next commit smaller. Recording the lowest soon-to-be prunable xid is one of the actions we take for unchanged LP_NORMAL item pointers but not for others, so move that to the new heap_prune_record_unchanged_lp_normal() function. The next commit will add more actions to these functions. Author: Melanie Plageman <melanieplageman@gmail.com> Discussion: https://www.postgresql.org/message-id/20240330055710.kqg6ii2cdojsxgje@liskov	2024-04-03 19:32:21 +03:00
Alvaro Herrera	be2f073100	Fix zeroing of pg_serial page without SLRU bank lock Bug in commit 53c2a97a9266: we failed to acquire the correct SLRU bank lock when iterating to zero-out intermediate pages in predicate.c. Rewrite the code block so that we follow the locking protocol correctly. Also update an outdated comment in the same file -- SerialSLRULock exists no more. Reported-by: Alexander Lakhin <exclusion@gmail.com> Reviewed-by: Dilip Kumar <dilipbalaut@gmail.com> Discussion: https://postgr.es/m/2a25eaf4-a3a4-5fd1-6241-9d7c73142085@gmail.com	2024-04-03 17:49:44 +02:00
Alexander Korotkov	bf1e650806	Use the pairing heap instead of a flat array for LSN replay waiters `06c418e163` introduced pg_wal_replay_wait() procedure allowing to wait for the particular LSN to be replayed on standby. The waiters were stored in the flat array. Even though scanning small arrays is fast, that might be a problem at scale (a lot of waiting processes). This commit replaces the flat shared memory array with the pairing heap, which holds the waiter with the least LSN at the top. This gives us O(log N) complexity for both inserting and removing waiters. Reported-by: Alvaro Herrera Discussion: https://postgr.es/m/202404030658.hhj3vfxeyhft%40alvherre.pgsql	2024-04-03 18:15:41 +03:00
Daniel Gustafsson	936e3fa378	Drop global objects after completed test Project policy is to not leave global objects behind after a regress test run. This was found as a result of the development of a patch to make pg_regress detect such leftovers automatically, which in the end was withdrawn due to issues with parallel runs. Discussion: https://postgr.es/m/E1phvk7-000VAH-7k@gemulon.postgresql.org	2024-04-03 13:33:25 +02:00
Amit Kapila	2ec005b4e2	Ensure that the sync slots reach a consistent state after promotion without losing data. We were directly copying the LSN locations while syncing the slots on the standby. Now, it is possible that at some particular restart_lsn there are some running xacts, which means if we start reading the WAL from that location after promotion, we won't reach a consistent snapshot state at that point. However, on the primary, we would have already been in a consistent snapshot state at that restart_lsn so we would have just serialized the existing snapshot. To avoid this problem we will use the advance_slot functionality unless the snapshot already exists at the synced restart_lsn location. This will help us to ensure that snapbuilder/slot statuses are updated properly without generating any changes. Note that the synced slot will remain as RS_TEMPORARY till the decoding from corresponding restart_lsn can reach a consistent snapshot state after which they will be marked as RS_PERSISTENT. Per buildfarm Author: Hou Zhijie Reviewed-by: Bertrand Drouvot, Shveta Malik, Bharath Rupireddy, Amit Kapila Discussion: https://postgr.es/m/OS0PR01MB5716B3942AE49F3F725ACA92943B2@OS0PR01MB5716.jpnprd01.prod.outlook.com	2024-04-03 14:04:59 +05:30
Alexander Korotkov	e37662f221	Minor improvements for waitlsn.c * Remove extra includes * Fill 'cur' in addLSNWaiter() before taking the spinlock * Initialize 'endtime' with zero in WaitForLSN() to avoid compiler warning Reported-by: Alvaro Herrera, Masahiko Sawada, Daniel Gustafsson Discussion: https://postgr.es/m/202404030658.hhj3vfxeyhft%40alvherre.pgsql Discussion: https://postgr.es/m/CAD21AoAx7irptnPH1OkkkNh9E0M6X-phfX7sYZfwoMsc1qV1sQ%40mail.gmail.com	2024-04-03 11:32:39 +03:00
Daniel Gustafsson	9301308bd1	Fix indentation from `cafe105655` Per buildfarm animal koel	2024-04-03 09:44:47 +02:00
Daniel Gustafsson	226261f387	Add error codes to some PANIC/FATAL errors reports This adds errcodes to a set of PANIC and FATAL errors in xlog.c and relcache.c, which previously had no errcode at all set, in order to make fleetwide analysis of errorlogs easier. There are many more ereport/elogs left which could benefit from having an errcode but this at least makes a dent in the issue. Author: Nazir Bilal Yavuz <byavuz81@gmail.com> Discussion: https://postgr.es/m/CAN55FZ1k8LgLEqncPGmz_fWnrobV6bjABOTH4tOWta6xNcPQig@mail.gmail.com	2024-04-03 09:19:25 +02:00
Nathan Bossart	c627d944e6	Add built-in ERROR handling for archive callbacks. Presently, the archiver process restarts when an archive callback ERRORs. To avoid this, archive module authors can use sigsetjmp(), manage a memory context, etc., but that requires a lot of extra code that will likely look roughly the same between modules. This commit adds basic archive callback ERROR handling to pgarch.c so that module authors won't ordinarily need to worry about this. While this built-in handler attempts to clean up anything that an archive module could conceivably have left behind, it is possible that some modules are doing unexpected things that require additional cleanup. Module authors should be sure to do any extra required cleanup in a PG_CATCH block within the archiving callback. The archiving callback is now called in a short-lived memory context that the archiver process resets between invocations. If a module requires longer-lived storage, it must maintain its own memory context. Thanks to these changes, the basic_archive module can be greatly simplified. Suggested-by: Andres Freund Reviewed-by: Andres Freund, Yong Li Discussion: https://postgr.es/m/20230217215624.GA3131134%40nathanxps13	2024-04-02 22:28:11 -05:00
Masahiko Sawada	5bec1d6bc5	Improve eviction algorithm in ReorderBuffer using max-heap for many subtransactions. Previously, when selecting the transaction to evict during logical decoding, we check all transactions to find the largest transaction. This could lead to a significant replication lag especially in the case where there are many subtransactions. This commit improves the eviction algorithm in ReorderBuffer using the max-heap with transaction size as the key to efficiently find the largest transaction. The max-heap starts with empty. While the max-heap is empty, we don't do anything for the max-heap when updating the memory counter. Therefore, we get the largest transaction in O(N) time, where N is the number of transactions including top-level transactions and subtransactions. We build the max-heap just before selecting the largest transactions if the number of transactions being decoded is higher than the threshold, MAX_HEAP_TXN_COUNT_THRESHOLD. After building the max-heap, we also update the max-heap when updating the memory counter. The intention is to efficiently find the largest transaction in O(1) time instead of incurring the cost of memory counter updates (O(log N)). Once the number of transactions got lower than the threshold, we reset the max-heap. The performance benchmark results showed significant speed up (more than x30 speed up on my machine) in decoding a transaction with 100k subtransactions, whereas there is no visible overhead in other cases. Reviewed-by: Amit Kapila, Hayato Kuroda, Vignesh C, Ajin Cherian, Tomas Vondra, Shubham Khanna, Peter Smith, Álvaro Herrera, Euler Taveira Discussion: https://postgr.es/m/CAD21AoAfKTgrBrLq96GcTv9d6k97zaQcDM-rxfKEt4GSe0qnaQ%40mail.gmail.com	2024-04-03 11:40:42 +09:00
David Rowley	7487044d6c	Don't adjust ressortgroupref in generate_setop_child_grouplist() This is already done inside assignSortGroupRef(), therefore is redundant. Oversight from `66c0185a3`. Reported-by: Tom Lane Discussion: https://postgr.es/m/3703023.1711654574@sss.pgh.pa.us	2024-04-03 15:39:29 +13:00
Masahiko Sawada	b840508644	Add functions to binaryheap for efficient key removal and update. Previously, binaryheap didn't support updating a key and removing a node in an efficient way. For example, in order to remove a node from the binaryheap, the caller had to pass the node's position within the array that the binaryheap internally has. Removing a node from the binaryheap is done in O(log n) but searching for the key's position is done in O(n). This commit adds a hash table to binaryheap in order to track the position of each nodes in the binaryheap. That way, by using newly added functions such as binaryheap_update_up() etc., both updating a key and removing a node can be done in O(1) on an average and O(log n) in worst case. This is known as the indexed binary heap. The caller can specify to use the indexed binaryheap by passing indexed = true. The current code does not use the new indexing logic, but it will be used by an upcoming patch. Reviewed-by: Vignesh C, Peter Smith, Hayato Kuroda, Ajin Cherian, Tomas Vondra, Shubham Khanna Discussion: https://postgr.es/m/CAD21AoDffo37RC-eUuyHJKVEr017V2YYDLyn1xF_00ofptWbkg%40mail.gmail.com	2024-04-03 10:44:21 +09:00
Masahiko Sawada	bcb14f4abc	Make binaryheap enlargeable. The node array space of the binaryheap is doubled when there is no available space. Reviewed-by: Vignesh C, Peter Smith, Hayato Kuroda, Ajin Cherian, Tomas Vondra, Shubham Khanna Discussion: https://postgr.es/m/CAD21AoDffo37RC-eUuyHJKVEr017V2YYDLyn1xF_00ofptWbkg%40mail.gmail.com	2024-04-03 10:27:43 +09:00
Alexander Korotkov	2c91e13013	Move WaitLSNShmemInit() to CreateOrAttachShmemStructs() Thanks to Andres Freund, Thomas Munrom and David Rowley for investigating this issue. Discussion: https://postgr.es/m/CAPpHfdvap5mMLikt8CUjA0osAvCJHT0qnYeR3f84EJ_Kvse0mg%40mail.gmail.com	2024-04-03 02:55:03 +03:00
David Rowley	3b1a7eb289	Don't zero tuple_fraction when planning UNIONs with ORDER BYs Since `66c0185a3`, the planner is able to use Merge Append -> Unique to implement UNION queries and each subquery is prompted to produce Paths correctly sorted by the UNION's targetlist. Here we remove some now redundant code which was zeroing the tuple_fraction at the parent level. This will allow the planner to consider cheap startup paths when planning the UNION's subqueries. EXCEPT and INTERSECT set operations still have the tuple_fraction zeroed in generate_nonunion_paths(). These operations currently always read all of their subqueries' tuples. Reported-by: Tom Lane Discussion: https://postgr.es/m/3703023.1711654574@sss.pgh.pa.us	2024-04-03 11:40:33 +13:00
Alexander Korotkov	06c418e163	Implement pg_wal_replay_wait() stored procedure pg_wal_replay_wait() is to be used on standby and specifies waiting for the specific WAL location to be replayed before starting the transaction. This option is useful when the user makes some data changes on primary and needs a guarantee to see these changes on standby. The queue of waiters is stored in the shared memory array sorted by LSN. During replay of WAL waiters whose LSNs are already replayed are deleted from the shared memory array and woken up by setting of their latches. pg_wal_replay_wait() needs to wait without any snapshot held. Otherwise, the snapshot could prevent the replay of WAL records implying a kind of self-deadlock. This is why it is only possible to implement pg_wal_replay_wait() as a procedure working in a non-atomic context, not a function. Catversion is bumped. Discussion: https://postgr.es/m/eb12f9b03851bb2583adab5df9579b4b%40postgrespro.ru Author: Kartyshov Ivan, Alexander Korotkov Reviewed-by: Michael Paquier, Peter Eisentraut, Dilip Kumar, Amit Kapila Reviewed-by: Alexander Lakhin, Bharath Rupireddy, Euler Taveira	2024-04-02 22:48:03 +03:00
Tom Lane	6faca9ae28	Avoid deadlock during orphan temp table removal. If temp tables have dependencies (such as sequences) then it's possible for autovacuum's cleanup of orphan temp tables to deadlock against an incoming backend that's trying to clean out the temp namespace for its own use. That can happen because RemoveTempRelations' performDeletion call can visit objects within the namespace in an order different from the order in which a per-table deletion will visit them. To fix, observe that performDeletion will begin by taking an exclusive lock on the temp namespace (even though it won't actually delete it). So, if we can get a shared lock on the namespace, we can be sure we're not running concurrently with RemoveTempRelations, while also not conflicting with ordinary use of the namespace. This requires introducing a conditional version of LockDatabaseObject, but that's no big deal. (It's surprising we've got along without that this long.) Report and patch by Mikhail Zhilin. Back-patch to all supported branches. Discussion: https://postgr.es/m/c43ce028-2bc2-4865-9b89-3f706246eed5@postgrespro.ru	2024-04-02 14:59:32 -04:00
Nathan Bossart	4133c1f45c	Avoid function call overhead of pg_popcount() in syslogger.c. Instead of calling the pg_popcount() function for a single byte, we can look up the value in the pg_number_of_ones array. Discussion: https://postgr.es/m/20240401221117.GB2362108%40nathanxps13	2024-04-02 10:32:49 -05:00
Nathan Bossart	6687430c98	Refactor code for setting pg_popcount* function pointers. Presently, there are three copies of this code, and a proposed follow-up patch would add more code to each copy. This commit introduces a new inline function for this code and makes use of it in the pg_popcount*_choose functions, thereby reducing code duplication. Author: Paul Amonson Discussion: https://postgr.es/m/BL1PR11MB5304097DF7EA81D04C33F3D1DCA6A%40BL1PR11MB5304.namprd11.prod.outlook.com	2024-04-02 10:16:00 -05:00
Tom Lane	38698dd38e	Unwind #if spaghetti in hmac_openssl.c a bit. Make this code a little less confusing by defining a separate macro that controls whether we'll use ResourceOwner facilities to track the existence of a pg_hmac_ctx context. The proximate reason to touch this is that since `b8bff07da`, we got "unused function" warnings if building with older OpenSSL, because the #if guards around the ResourceOwner wrapper function definitions were different from those around the calls of those functions. Pulling the ResourceOwner machinations outside of the #ifdef HAVE_xxx guards fixes that and makes the code clearer too. Discussion: https://postgr.es/m/1394271.1712016101@sss.pgh.pa.us	2024-04-02 10:41:44 -04:00
Robert Haas	cafe105655	Allow SIGINT to cancel psql database reconnections. After installing the SIGINT handler in psql, SIGINT can no longer cancel database reconnections. For instance, if the user starts a reconnection and then needs to do some form of interaction (ie psql is polling), there is no way to cancel the reconnection process currently. Use PQconnectStartParams() in order to insert a cancel_pressed check into the polling loop. Tristan Partin, reviewed by Gurjeet Singh, Heikki Linnakangas, Jelte Fennema-Nio, and me. Discussion: http://postgr.es/m/D08WWCPVHKHN.3QELIKZJ2D9RZ@neon.tech	2024-04-02 10:26:10 -04:00
Robert Haas	f5e4dedfa8	Expose PQsocketPoll via libpq This is useful when connecting to a database asynchronously via PQconnectStart(), since it handles deciding between poll() and select(), and some of the required boilerplate. Tristan Partin, reviewed by Gurjeet Singh, Heikki Linnakangas, Jelte Fennema-Nio, and me. Discussion: http://postgr.es/m/D08WWCPVHKHN.3QELIKZJ2D9RZ@neon.tech	2024-04-02 10:15:56 -04:00
Thomas Munro	b5a9b18cd0	Provide API for streaming relation data. Introduce an abstraction allowing relation data to be accessed as a stream of buffers, with an implementation that is more efficient than the equivalent sequence of ReadBuffer() calls. Client code supplies a callback that can say which block number it wants next, and then consumes individual buffers one at a time from the stream. This division puts read_stream.c in control of how far ahead it can see and allows it to read clusters of neighboring blocks with StartReadBuffers(). It also issues POSIX_FADV_WILLNEED advice ahead of time when random access is detected. Other variants of I/O stream will be proposed in future work (for example to support recovery, whose LsnReadQueue device in xlogprefetcher.c is a distant cousin of this code and should eventually be replaced by this), but this basic API is sufficient for many common executor usage patterns involving predictable access to a single fork of a single relation. Several patches using this API are proposed separately. This stream concept is loosely based on ideas from Andres Freund on how we should pave the way for later work on asynchronous I/O. Author: Thomas Munro <thomas.munro@gmail.com> Author: Heikki Linnakangas <hlinnaka@iki.fi> (contributions) Author: Melanie Plageman <melanieplageman@gmail.com> (contributions) Suggested-by: Andres Freund <andres@anarazel.de> Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi> Reviewed-by: Melanie Plageman <melanieplageman@gmail.com> Reviewed-by: Nazir Bilal Yavuz <byavuz81@gmail.com> Reviewed-by: Andres Freund <andres@anarazel.de> Tested-by: Tomas Vondra <tomas.vondra@enterprisedb.com> Discussion: https://postgr.es/m/CA+hUKGJkOiOCa+mag4BF+zHo7qo=o9CFheB8=g6uT5TUm2gkvA@mail.gmail.com	2024-04-03 00:49:46 +13:00
Thomas Munro	210622c60e	Provide vectored variant of ReadBuffer(). Break ReadBuffer() up into two steps. StartReadBuffers() and WaitReadBuffers() give us two main advantages: 1. Multiple consecutive blocks can be read with one system call. 2. Advice (hints of future reads) can optionally be issued to the kernel ahead of time. The traditional ReadBuffer() function is now implemented in terms of those functions, to avoid duplication. A new GUC io_combine_limit is defined, and the functions for limiting per-backend pin counts are made into public APIs. Those are provided for use by callers of StartReadBuffers(), when deciding how many buffers to read at once. The following commit will add a higher level mechanism for doing that automatically with a practical interface. With some more infrastructure in later work, StartReadBuffers() could be extended to start real asynchronous I/O instead of just issuing advice and leaving WaitReadBuffers() to do the work synchronously. Author: Thomas Munro <thomas.munro@gmail.com> Author: Andres Freund <andres@anarazel.de> (some optimization tweaks) Reviewed-by: Melanie Plageman <melanieplageman@gmail.com> Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi> Reviewed-by: Nazir Bilal Yavuz <byavuz81@gmail.com> Reviewed-by: Dilip Kumar <dilipbalaut@gmail.com> Reviewed-by: Andres Freund <andres@anarazel.de> Tested-by: Tomas Vondra <tomas.vondra@enterprisedb.com> Discussion: https://postgr.es/m/CA+hUKGJkOiOCa+mag4BF+zHo7qo=o9CFheB8=g6uT5TUm2gkvA@mail.gmail.com	2024-04-03 00:23:20 +13:00
Alvaro Herrera	13b3b62746	Don't use the pg_am system catalog in new test This causes deadlocks because it's a highly trafficked catalog. Use a regular table created by the same test instead. Discussion: https://postgr.es/m/f3e61e27-19d0-5e40-3eb2-53282fa0532a@gmail.com	2024-04-02 13:10:16 +02:00
Alexander Korotkov	867cc7b6dd	Revert "Custom reloptions for table AM" This reverts commit `c95c25f9af` due to multiple design issues spotted after commit. Reported-by: Jeff Davis Discussion: https://postgr.es/m/11550b536211d5748bb2865ed6cb3502ff073bf7.camel%40j-davis.com	2024-04-02 11:29:00 +03:00
Masahiko Sawada	667e65aac3	Use TidStore for dead tuple TIDs storage during lazy vacuum. Previously, we used a simple array for storing dead tuple IDs during lazy vacuum, which had a number of problems: * The array used a single allocation and so was limited to 1GB. * The allocation was pessimistically sized according to table size. * Lookup with binary search was slow because of poor CPU cache and branch prediction behavior. This commit replaces that array with the TID store from commit `30e144287a`. Since the backing radix tree makes small allocations as needed, the 1GB limit is now gone. Further, the total memory used is now often smaller by an order of magnitude or more, depending on the distribution of blocks and offsets. These two features should make multiple rounds of heap scanning and index cleanup an extremely rare event. TID lookup during index cleanup is also several times faster, even more so when index order is correlated with heap tuple order. Since there is no longer a predictable relationship between the number of dead tuples vacuumed and the space taken up by their TIDs, the number of tuples no longer provides any meaningful insights for users, nor is the maximum number predictable. For that reason this commit also changes to byte-based progress reporting, with the relevant columns of pg_stat_progress_vacuum renamed accordingly to max_dead_tuple_bytes and dead_tuple_bytes. For parallel vacuum, both the TID store and supplemental information specific to vacuum are shared among the parallel vacuum workers. As with the previous array, we don't take any locks on TidStore during parallel vacuum since writes are still only done by the leader process. Bump catalog version. Reviewed-by: John Naylor, (in an earlier version) Dilip Kumar Discussion: https://postgr.es/m/CAD21AoAfOZvmfR0j8VmZorZjL7RhTiQdVttNuC4W-Shdc2a-AA%40mail.gmail.com	2024-04-02 10:15:37 +09:00
David Rowley	d5d2205c8d	Fix assert failure when planning setop subqueries with CTEs `66c0185a3` adjusted the UNION planner to request that union child queries produce Paths correctly ordered to implement the UNION by way of MergeAppend followed by Unique. The code there made a bad assumption that if the root->parent_root->parse had setOperations set that the query must be the child subquery of a set operation. That's not true when it comes to planning a non-inlined CTE which is parented by a set operation. This causes issues as the CTE's targetlist has no requirement to match up to the SetOperationStmt's groupClauses Fix this by adding a new parameter to both subquery_planner() and grouping_planner() to explicitly pass the SetOperationStmt only when planning set operation child subqueries. Thank you to Tom Lane for helping to rationalize the decision on the best function signature for subquery_planner(). Reported-by: Alexander Lakhin Discussion: https://postgr.es/m/242fc7c6-a8aa-2daf-ac4c-0a231e2619c1@gmail.com	2024-04-02 12:15:45 +13:00
Tom Lane	3622c80846	Avoid "unused variable" warning on non-USE_SSL_ENGINE platforms. If we are building with openssl but USE_SSL_ENGINE didn't get set, initialize_SSL's variable "pkey" is declared but used nowhere. Apparently this combination hasn't been exercised in the buildfarm before now, because I've not seen this warning before, even though the code has been like this a long time. Move the declaration to silence the warning (and remove its useless initialization). Per buildfarm member sawshark. Back-patch to all supported branches.	2024-04-01 19:01:36 -04:00
Heikki Linnakangas	3d0f730bf1	Introduce 'options' argument to heap_page_prune() Currently there is only one option, HEAP_PAGE_PRUNE_MARK_UNUSED_NOW which replaces the old boolean argument, but upcoming patches will introduce at least one more. Having a lot of boolean arguments makes it hard to see at the call sites what the arguments mean, so prefer a bitmask of options with human-readable names. Author: Melanie Plageman <melanieplageman@gmail.com> Author: Heikki Linnakangas <heikki.linnakangas@iki.fi> Discussion: https://www.postgresql.org/message-id/20240401172219.fngjosaqdgqqvg4e@liskov	2024-04-02 00:56:05 +03:00
Tom Lane	959b38d770	Invent --transaction-size option for pg_restore. This patch allows pg_restore to wrap its commands into transaction blocks, somewhat like --single-transaction, except that we commit and start a new block after every N objects. Using this mode with a size limit of 1000 or so objects greatly reduces the number of transactions consumed by the restore, while preventing any one transaction from taking enough locks to overrun the receiving server's shared lock table. (A value of 1000 works well with the default lock table size of around 6400 locks. Higher --transaction-size values can be used if one has increased the receiving server's lock table size.) Excessive consumption of XIDs has been reported as a problem for pg_upgrade in particular, but it could be bad for any restore; and the change also reduces the number of fsyncs and amount of WAL generated, so it should provide speed benefits too. This patch does not try to make parallel workers batch the SQL commands they issue. The trouble with doing that is that other workers may need to see the objects a worker creates right away. Possibly this can be improved later. In this patch I have hard-wired pg_upgrade to use a transaction size of 1000 divided by the number of parallel restore jobs allowed (without that, we'd still be at risk of overrunning the shared lock table). Perhaps there would be value in adding another pg_upgrade option to allow user control of that, but I'm unsure that it's worth the trouble; I think few users would use it, and any who did would see not that much benefit compared to the default. Patch by me, but the original idea to batch SQL commands during restore is due to Robins Tharakan. Discussion: https://postgr.es/m/a9f9376f1c3343a6bb319dce294e20ac@EX13D05UWC001.ant.amazon.com	2024-04-01 16:46:24 -04:00
Tom Lane	a45c78e328	Rearrange pg_dump's handling of large objects for better efficiency. Commit `c0d5be5d6` caused pg_dump to create a separate BLOB metadata TOC entry for each large object (blob), but it did not touch the ancient decision to put all the blobs' data into a single "BLOBS" TOC entry. This is bad for a few reasons: for databases with millions of blobs, the TOC becomes unreasonably large, causing performance issues; selective restore of just some blobs is quite impossible; and we cannot parallelize either dump or restore of the blob data, since our architecture for that relies on farming out whole TOC entries to worker processes. To improve matters, let's group multiple blobs into each blob metadata TOC entry, and then make corresponding per-group blob data TOC entries. Selective restore using pg_restore's -l/-L switches is then possible, though only at the group level. (Perhaps we should provide a switch to allow forcing one-blob-per-group for users who need precise selective restore and don't have huge numbers of blobs. This patch doesn't do that, instead just hard-wiring the maximum number of blobs per entry at 1000.) The blobs in a group must all have the same owner, since the TOC entry format only allows one owner to be named. In this implementation we also require them to all share the same ACL (grants); the archive format wouldn't require that, but pg_dump's representation of DumpableObjects does. It seems unlikely that either restriction will be problematic for databases with huge numbers of blobs. The metadata TOC entries now have a "desc" string of "BLOB METADATA", and their "defn" string is just a newline-separated list of blob OIDs. The restore code has to generate creation commands, ALTER OWNER commands, and drop commands (for --clean mode) from that. We would need special-case code for ALTER OWNER and drop in any case, so the alternative of keeping the "defn" as directly executable SQL code for creation wouldn't buy much, and it seems like it'd bloat the archive to little purpose. Since we require the blobs of a metadata group to share the same ACL, we can furthermore store only one copy of that ACL, and then make pg_restore regenerate the appropriate commands for each blob. This saves space in the dump file not only by removing duplicative SQL command strings, but by not needing a separate TOC entry for each blob's ACL. In turn, that reduces client-side memory requirements for handling many blobs. ACL TOC entries that need this special processing are labeled as "ACL"/"LARGE OBJECTS nnn..nnn". If we have a blob with a unique ACL, continue to label it as "ACL"/"LARGE OBJECT nnn". We don't actually have to make such a distinction, but it saves a few cycles during restore for the easy case, and it seems like a good idea to not change the TOC contents unnecessarily. The data TOC entries ("BLOBS") are exactly the same as before, except that now there can be more than one, so we'd better give them identifying tag strings. Also, commit `c0d5be5d6` put the new BLOB metadata TOC entries into SECTION_PRE_DATA, which perhaps is defensible in some ways, but it's a rather odd choice considering that we go out of our way to treat blobs as data. Moreover, because parallel restore handles the PRE_DATA section serially, this means we'd only get part of the parallelism speedup we could hope for. Move these entries into SECTION_DATA, letting us parallelize the lo_create calls not just the data loading when there are many blobs. Add dependencies to ensure that we won't try to load data for a blob we've not yet created. As this stands, we still generate a separate TOC entry for any comment or security label attached to a blob. I feel comfortable in believing that comments and security labels on blobs are rare, so this patch should be enough to get most of the useful TOC compression for blobs. We have to bump the archive file format version number, since existing versions of pg_restore wouldn't know they need to do something special for BLOB METADATA, plus they aren't going to work correctly with multiple BLOBS entries or multiple-large-object ACL entries. The directory and tar-file format handlers need some work for multiple BLOBS entries: they used to hard-wire the file name as "blobs.toc", which is replaced here with "blobs_<dumpid>.toc". The 002_pg_dump.pl test script also knows about that and requires minor updates. (I had to drop the test for manually-compressed blobs.toc files with LZ4, because lz4's obtuse command line design requires explicit specification of the output file name which seems impractical here. I don't think we're losing any useful test coverage thereby; that test stanza seems completely duplicative with the gzip and zstd cases anyway.) In passing, centralize management of the lo_buf used to hold data while restoring blobs. The code previously had each format handler create lo_buf, which seems rather pointless given that the format handlers all make it the same way. Moreover, the format handlers never use lo_buf directly, making this setup a failure from a separation-of-concerns standpoint. Let's move the responsibility into pg_backup_archiver.c, which is the only module concerned with lo_buf. The reason to do this in this patch is that it allows a centralized fix for the now-false assumption that we never restore blobs in parallel. Also, get rid of dead code in DropLOIfExists: it's been a long time since we had any need to be able to restore to a pre-9.0 server. Discussion: https://postgr.es/m/a9f9376f1c3343a6bb319dce294e20ac@EX13D05UWC001.ant.amazon.com	2024-04-01 16:25:56 -04:00
Tom Lane	5eac8cef24	Avoid possible longjmp-induced logic error in PLy_trigger_build_args. The "pltargs" variable wasn't marked volatile, which makes it unsafe to change its value within the PG_TRY block. It looks like the worst outcome would be to fail to release a refcount on Py_None during an (improbable) error exit, which would likely go unnoticed in the field. Still, it's a bug. A one-liner fix could be to mark pltargs volatile, but on the whole it seems cleaner to arrange things so that we don't change its value within PG_TRY. Per report from Xing Guo. This has been there for quite awhile, so back-patch to all supported branches. Discussion: https://postgr.es/m/CACpMh+DLrk=fDv07MNpBT4J413fDAm+gmMXgi8cjPONE+jvzuw@mail.gmail.com	2024-04-01 15:15:03 -04:00
Tom Lane	91cbb4b492	Fix assorted resource leaks in new pg_createsubscriber code. Various error paths did not release resources before returning. While it's likely that the program would just exit shortly later, none of the functions in question have summary exit(1) calls, so they should not be assuming that. Ranier Vilela and Tom Lane, per reports from Coverity Discussion: https://postgr.es/m/CAEudQAr2_SZFxB4kXJiL4+2UaNZxUk5UBJtj0oXyJYMGZu-03g@mail.gmail.com	2024-04-01 13:47:49 -04:00
Heikki Linnakangas	6f47f68831	Handle non-chain tuples outside of heap_prune_chain() Handle dead branches of aborted HOT chains outside heap_prune_chain() as a separate phase. This simplifies the logic in heap_prune_chain(), as well as allowing us to clean up more RECENTLY_DEAD -> DEAD chains. To accomplish this efficiently, partition tuples into HOT and non-HOT while first collecting visibility information for each tuple in heap_page_prune(). Then call heap_prune_chain() only on potential chain members. Then mop up the leftover HOT tuples afterwards. As part of this, keep track of which items on page have already been processed, in 'processed' array. This replaces the 'marked' array which was only set for tuples marked for removal or redirection. The 'processed' array is updated also for items that are left unchanged, when we conclude that an item can be left unchanged. At the end of pruning, every item on the page should be marked as processed in the array; an assertion is added for that. Author: Melanie Plageman <melanieplageman@gmail.com> Author: Heikki Linnakangas <heikki.linnakangas@iki.fi> Discussion: https://www.postgresql.org/message-id/20240330055710.kqg6ii2cdojsxgje@liskov	2024-04-01 20:33:50 +03:00
Heikki Linnakangas	7aa00f1360	Refactor heap_prune_chain() Keep track of the number of deleted tuples in PruneState and record this information when recording a tuple dead, unused or redirected. This removes a special case from the traversal and chain processing logic as well as setting a precedent of recording the impact of prune actions in the record functions themselves. This paradigm will be used in future commits which move tracking of additional statistics on pruning actions from lazy_scan_prune() to heap_prune_chain(). Simplify heap_prune_chain()'s chain traversal logic by handling each case explicitly. That is, do not attempt to share code when processing different types of chains. For each category of chain, process it specifically and procedurally: first handling the root, then any intervening tuples, and, finally, the end of the chain. While we are at it, add a few new comments to heap_prune_chain() clarifying some special cases involving RECENTLY_DEAD tuples. Author: Melanie Plageman <melanieplageman@gmail.com> Discussion: https://www.postgresql.org/message-id/20240330055710.kqg6ii2cdojsxgje@liskov	2024-04-01 13:28:44 +03:00
Heikki Linnakangas	9917e79d99	Minor refactoring in heap_page_prune Pass 'page', 'blockno' and 'maxoff' to heap_prune_chain() as arguments, so that it doesn't need to fetch them from the buffer. This saves a few cycles per chain. Remove the "if (off_loc != NULL)" checks, and require the caller to pass a non-NULL 'off_loc'. Pass a pointer to a dummy local variable when it's not needed. Those checks are cheap, but it's still better to avoid them in the per-chain loops when we can do so easily. The CPU time saving from these changes are hardly measurable, but fewer instructions is good anyway, so why not. I spotted the potential for these while reviewing Melanie Plageman's patch set to combine prune and freeze records. Discussion: https://www.postgresql.org/message-id/CAAKRu_abm2tHhrc0QSQa%3D%3DsHe%3DVA1%3Doz1dJMQYUOKuHmu%2B9Xrg%40mail.gmail.com	2024-04-01 12:07:30 +03:00
Masahiko Sawada	f5a227895e	Add new COPY option LOG_VERBOSITY. This commit adds a new COPY option LOG_VERBOSITY, which controls the amount of messages emitted during processing. Valid values are 'default' and 'verbose'. This is currently used in COPY FROM when ON_ERROR option is set to ignore. If 'verbose' is specified, a NOTICE message is emitted for each discarded row, providing additional information such as line number, column name, and the malformed value. This helps users to identify problematic rows that failed to load. Author: Bharath Rupireddy Reviewed-by: Michael Paquier, Atsushi Torikoshi, Masahiko Sawada Discussion: https://www.postgresql.org/message-id/CALj2ACUk700cYhx1ATRQyRw-fBM%2BaRo6auRAitKGff7XNmYfqQ%40mail.gmail.com	2024-04-01 15:25:25 +09:00
John Naylor	f4ad0021af	Revert "Speed up tail processing when hashing aligned C strings" This reverts commit `07f0f6abfc`. This has shown failures on both Valgrind and big-endian machines, per members skink and pike.	2024-03-31 14:18:36 +07:00
John Naylor	07f0f6abfc	Speed up tail processing when hashing aligned C strings After encountering the NUL terminator, the word-at-a-time loop exits and we must hash the remaining bytes. Previously we calculated the terminator's position and re-loaded the remaining bytes from the input string. We already have all the data we need in a register, so let's just mask off the bytes we need and hash them immediately. The mask can be cheaply computed without knowing the terminator's position. We still need that position for the length calculation, but the CPU can now do that in parallel with other work, shortening the dependency chain. Ants Aasma and John Naylor Discussion: https://postgr.es/m/CANwKhkP7pCiW_5fAswLhs71-JKGEz1c1%2BPC0a_w1fwY4iGMqUA%40mail.gmail.com	2024-03-31 12:19:16 +07:00
Alexander Korotkov	b1484a3f19	Let table AM insertion methods control index insertion Previously, the executor did index insert unconditionally after calling table AM interface methods tuple_insert() and multi_insert(). This commit introduces the new parameter insert_indexes for these two methods. Setting '*insert_indexes' to true saves the current logic. Setting it to false indicates that table AM cares about index inserts itself and doesn't want the caller to do that. Discussion: https://postgr.es/m/CAPpHfdurb9ycV8udYqM%3Do0sPS66PJ4RCBM1g-bBpvzUfogY0EA%40mail.gmail.com Reviewed-by: Pavel Borisov, Matthias van de Meent, Mark Dilger	2024-03-30 22:53:56 +02:00
Alexander Korotkov	c95c25f9af	Custom reloptions for table AM Let table AM define custom reloptions for its tables. This allows to specify AM-specific parameters by WITH clause when creating a table. The code may use some parts from prior work by Hao Wu. Discussion: https://postgr.es/m/CAPpHfdurb9ycV8udYqM%3Do0sPS66PJ4RCBM1g-bBpvzUfogY0EA%40mail.gmail.com Discussion: https://postgr.es/m/AMUA1wBBBxfc3tKRLLdU64rb.1.1683276279979.Hmail.wuhao%40hashdata.cn Reviewed-by: Reviewed-by: Pavel Borisov, Matthias van de Meent	2024-03-30 22:36:25 +02:00
Alexander Korotkov	27bc1772fc	Generalize relation analyze in table AM interface Currently, there is just one algorithm for sampling tuples from a table written in acquire_sample_rows(). Custom table AM can just redefine the way to get the next block/tuple by implementing scan_analyze_next_block() and scan_analyze_next_tuple() API functions. This approach doesn't seem general enough. For instance, it's unclear how to sample this way index-organized tables. This commit allows table AM to encapsulate the whole sampling algorithm (currently implemented in acquire_sample_rows()) into the relation_analyze() API function. Discussion: https://postgr.es/m/CAPpHfdurb9ycV8udYqM%3Do0sPS66PJ4RCBM1g-bBpvzUfogY0EA%40mail.gmail.com Reviewed-by: Pavel Borisov, Matthias van de Meent	2024-03-30 22:34:04 +02:00
Tom Lane	b154d8a6d0	Add pg_basetype() function to extract a domain's base type. This SQL-callable function behaves much like our internal utility function getBaseType(), except it returns NULL rather than failing for an invalid type OID. (That behavior is modeled on our experience with other catalog-inquiry functions such as the ACL checking functions.) The key advantage over doing a join to pg_type is that it will loop as needed to find the bottom base type of a nest of domains. Steve Chavez, reviewed by jian he and others Discussion: https://postgr.es/m/CAGRrpzZSX8j=MQcbCSEisFA=ic=K3bknVfnFjAv1diVJxFHJvg@mail.gmail.com	2024-03-30 13:57:19 -04:00
Dean Rasheed	0294df2f1f	Add support for MERGE ... WHEN NOT MATCHED BY SOURCE. This allows MERGE commands to include WHEN NOT MATCHED BY SOURCE actions, which operate on rows that exist in the target relation, but not in the data source. These actions can execute UPDATE, DELETE, or DO NOTHING sub-commands. This is in contrast to already-supported WHEN NOT MATCHED actions, which operate on rows that exist in the data source, but not in the target relation. To make this distinction clearer, such actions may now be written as WHEN NOT MATCHED BY TARGET. Writing WHEN NOT MATCHED without specifying BY SOURCE or BY TARGET is equivalent to writing WHEN NOT MATCHED BY TARGET. Dean Rasheed, reviewed by Alvaro Herrera, Ted Yu and Vik Fearing. Discussion: https://postgr.es/m/CAEZATCWqnKGc57Y_JanUBHQXNKcXd7r=0R4NEZUVwP+syRkWbA@mail.gmail.com	2024-03-30 10:00:26 +00:00
Jeff Davis	46e5441fa5	Add unicode_strtitle() for Unicode Default Case Conversion. This brings the titlecasing implementation for the builtin provider out of formatting.c and into unicode_case.c, along with unicode_strlower() and unicode_strupper(). Accepts an arbitrary word boundary callback. Simple for now, but can be extended to support the Unicode Default Case Conversion algorithm with full case mapping. Discussion: https://postgr.es/m/3bc653b5d562ae9e2838b11cb696816c328a489a.camel@j-davis.com Reviewed-by: Peter Eisentraut	2024-03-29 17:35:07 -07:00
Daniel Gustafsson	a96a8b15fa	Remove superfluous trailing semicolons Two semicolons were accidentally added to rows which were already terminated semicolons. While harmless, fix by removing these. Author: Richard Guo <guofenglinux@gmail.com> Discussion: https://postgr.es/m/CAMbWs4_fnJ0+yOgFioswzLE7t6R8P6cqbuacFVeZqbESFAjs1A@mail.gmail.com	2024-03-29 23:51:43 +01:00
Jeff Davis	46a44dc372	Use version for builtin collations. Given that the version field already exists, there's little reason not to use it. Suggestion from Peter Eisentraut. Discussion: https://postgr.es/m/613c120a-5413-4fa7-a501-6590eae558f8@eisentraut.org Reviewed-by: Peter Eisentraut	2024-03-29 10:53:26 -07:00
Tom Lane	c2df2ed90a	Try to stabilize flappy test result. This recently-added test case checks the plan of an inner join between two identical tables. It's just chance which join order the planner will pick, and in the presence of any variation in the underlying statistics, the displayed plan might change. Add a WHERE condition to break the cost symmetry and hopefully stabilize matters. (We're still trying to understand exactly why the underlying statistics aren't as stable as intended, but this seems like a good change anyway, since this test would surely bite us again in future.) While here, clean up assorted comment spelling, grammar, and whitespace problems. Discussion: https://postgr.es/m/4168116.1711720146@sss.pgh.pa.us	2024-03-29 10:40:31 -04:00
Robert Haas	d3ae2a24f2	Add allow_alter_system GUC. This is marked PGC_SIGHUP, so it can only be set in a configuration file, not anywhere else; and it is also marked GUC_DISALLOW_IN_AUTO_FILE, so it can't be set using ALTER SYSTEM. When set to false, the ALTER SYSTEM command is disallowed. There was considerable concern that this would be misinterpreted as a security feature, which it is not, because a determined superuser has various ways of bypassing it. Hence, a lot of work has gone into wordsmithing the documentation, in the hopes of avoiding any such confusion. Jelte Fennemia-Nio and Gabriele Bartolini, with wording suggestions for the documentation from many others. Discussion: http://postgr.es/m/CA%2BVUV5rEKt2%2BCdC_KUaPoihMu%2Bi5ChT4WVNTr4CD5-xXZUfuQw%40mail.gmail.com	2024-03-29 08:45:11 -04:00
Tom Lane	0075d78947	Allow "internal" subtransactions in parallel mode. Allow use of BeginInternalSubTransaction() in parallel mode, so long as the subtransaction doesn't attempt to acquire an XID or increment the command counter. Given those restrictions, the other parallel processes don't need to know about the subtransaction at all, so this should be safe. The benefit is that it allows subtransactions intended for error recovery, such as pl/pgsql exception blocks, to be used in PARALLEL SAFE functions. Another reason for doing this is that the API of BeginInternalSubTransaction() doesn't allow reporting failure. pl/python for one, and perhaps other PLs, copes very poorly with an error longjmp out of BeginInternalSubTransaction(). The headline feature of this patch removes the only easily-triggerable failure case within that function. There remain some resource-exhaustion and similar cases, which we now deal with by promoting them to FATAL errors, so that callers need not try to clean up. (It is likely that such errors would leave us with corrupted transaction state inside xact.c, making recovery difficult if not impossible anyway.) Although this work started because of a report of a pl/python crash, we're not going to do anything about that in the back branches. Back-patching this particular fix is obviously not very wise. While we could contemplate some narrower band-aid, pl/python is already an untrusted language, so it seems okay to classify this as a "so don't do that" case. Patch by me, per report from Hao Zhang. Thanks to Robert Haas for review. Discussion: https://postgr.es/m/CALY6Dr-2yLVeVPhNMhuBnRgOZo1UjoTETgtKBx1B2gUi8yy+3g@mail.gmail.com	2024-03-28 12:43:10 -04:00
Alvaro Herrera	e2395cdbe8	ALTER TABLE: rework determination of access method ID Avoid setting an access method OID for relation kinds that don't take one. Code review for new feature added in `374c7a2290`. Author: Justin Pryzby <pryzby@telsasoft.com> Reported-by: Alexander Lakhin <exclusion@gmail.com> Discussion: https://postgr.es/m/e5516ac1-5264-c3c0-d822-9e6f614ea93b@gmail.com	2024-03-28 16:51:20 +01:00
Tom Lane	be98a550cc	Update comment in set_dummy_rel_pathlist(). This comment claimed that set_dummy_rel_pathlist() has callers other than (possibly indirectly) set_rel_size(). It doesn't, so revise the argument to not rely on that. Noted by Richard Guo. Discussion: https://postgr.es/m/CAMbWs4-KFEU_fDuJPNCOkUu3rwvZvKBEytkd9VrM4kH4-2h1CQ@mail.gmail.com	2024-03-28 11:44:00 -04:00
Alvaro Herrera	213c959a29	Remove translation markers from libpq-be-fe-helpers.h Apparently these markers cause the modules to not link correctly in some platforms, at least per buildfarm member indri; moreover, this code is only used in modules that don't have a translation. If we someday add i18n support to contrib/ it might be worth revisiting this.	2024-03-28 13:12:12 +01:00
Alvaro Herrera	2466d6654f	libpq-be-fe-helpers.h: wrap new cancel APIs Commit `61461a300c` introduced new functions to libpq for cancelling queries. This commit introduces a helper function that backend-side libraries and extensions can use to invoke those. This function takes a timeout and can itself be interrupted while it is waiting for a cancel request to be sent and processed, instead of being blocked. This replaces the usage of the old functions in postgres_fdw and dblink. Finally, it also adds some test coverage for the cancel support in postgres_fdw. Author: Jelte Fennema-Nio <postgres@jeltef.nl> Discussion: https://postgr.es/m/CAGECzQT_VgOWWENUqvUV9xQmbaCyXjtRRAYO8W07oqashk_N+g@mail.gmail.com	2024-03-28 11:31:03 +01:00
Heikki Linnakangas	427005742b	Remove obsolete comment about VACUUM retrying pruning Commit `1ccc1e05ae` removed the retry logic that the comment talked about. Reviewed-by: Melanie Plageman <melanieplageman@gmail.com> Discussion: https://www.postgresql.org/message-id/20240328015326.x5gnzsohl6j23b42@liskov	2024-03-28 10:18:48 +02:00
Masahiko Sawada	f1bb9284f7	Improve tab completion for ALTER TABLE ALTER COLUMN SET in psql. The commit changes the tab completion to add DATA TYPE after ALTER TABLE ... ALTER COLUMN ... SET. Author: Vignesh C Reviewed-by: Shubham Khanna, Masahiko Sawada Discussion: https://postgr.es/m/CALDaNm1aEdJb-QJi%3DGWStkfj_%2BEDUK_VtDkn%2BTjQ2z7HyU0MBw%40mail.gmail.com	2024-03-28 16:30:10 +09:00
Nathan Bossart	7188a7806d	Improve style of pg_lfind32(). This commit simplifies pg_lfind32() a bit by moving the standard one-by-one linear search code to an inline helper function. Reviewed-by: Tom Lane Discussion: https://postgr.es/m/20240327013616.GA3940109%40nathanxps13	2024-03-27 20:26:05 -05:00
Masahiko Sawada	2d8f56dabb	Rethink create and attach APIs of shared TidStore. Previously, the behavior of TidStoreCreate() was inconsistent between local and shared TidStore instances in terms of memory limitation. For local TidStore, a memory context was created with initial and maximum memory block sizes, as well as a minimum memory context size, based on the specified max_bytes values. However, for shared TidStore, the provided DSA area was used for TID storage. Although commit `bb952c8c8b` allowed specifying the initial and maximum DSA segment sizes, callers would have needed to clamp their own limits, which was not consistent and user-friendly. With this commit, when creating a shared TidStore, a dedicated DSA area is created for TID storage instead of using a provided DSA area. The initial and maximum DSA segment sizes are chosen based on the specified max_bytes. Other processes can attach to the shared TidStore using the handle of the created DSA returned by the new TidStoreGetDSA() function and the DSA pointer returned by TidStoreGetHandle(). The created DSA has the same lifetime as the shared TidStore and is deleted when all processes detach from it. To improve clarity, the TidStoreCreate() function has been divided into two separate functions: TidStoreCreateLocal() and TidStoreCreateShared(). Reviewed-by: John Naylor Discussion: https://postgr.es/m/CAD21AoAyc1j%3DBCdUqZfk6qbdjZ68UgRx1Gkpk0oah4K7S0Ri9g%40mail.gmail.com	2024-03-28 10:03:28 +09:00
Jeff Davis	b0be28761e	Run perltidy on generate-unicode_version.pl.	2024-03-27 13:21:29 -07:00
Tom Lane	a767cdc84c	Fix unnecessary use of moving-aggregate mode with non-moving frame. When a plain aggregate is used as a window function, and the window frame start is specified as UNBOUNDED PRECEDING, the frame's head cannot move so we do not need to use moving-aggregate mode. The check for that was put into initialize_peragg(), failing to notice that ExecInitWindowAgg() calls that function before it's filled in winstate->frameOptions. Since makeNode() would have zeroed the field, this didn't provoke uninitialized-value complaints, nor would the erroneous decision have resulted in more than a little inefficiency. Still, it's wrong, so move the initialization of winstate->frameOptions earlier to make it work properly. While here, also fix a thinko in a comment. Both errors crept in in commit `a9d9acbf2` which introduced the moving-aggregate mode. Spotted by Vallimaharajan G. Back-patch to all supported branches. Discussion: https://postgr.es/m/18e7f2a5167.fe36253866818.977923893562469143@zohocorp.com	2024-03-27 13:39:03 -04:00
Robert Haas	de7e96bd0f	Rename COMPAT_OPTIONS_CLIENT to COMPAT_OPTIONS_OTHER. The user-facing name is "Other Platforms and Clients", but the internal name seems too focused on clients specifically, especially given the plan to add a new setting to this session that is about platform or deployment model compatibility rather than client compatibility. Jelte Fennema-Nio Discussion: http://postgr.es/m/CAGECzQTfMbDiM6W3av+3weSnHxJvPmuTEcjxVvSt91sQBdOxuQ@mail.gmail.com	2024-03-27 10:45:28 -04:00
David Rowley	d6a6957d53	Fix unstable aggregate regression test Buildfarm member avocet has shown a plan change by switching the finalize aggregate stage to use a GroupAggregate rather than a HashAggregate. This is consistent with autovacuum having triggered on the table, per analysis by Alexander Lakhin. Fix this by disabling autovacuum on the table. Reported-by: Alexander Lakhin Discussion: https://postgr.es/m/d4493a28-589a-5328-fed5-250f2d7d3e2a@gmail.com Backpatch-through: 16, where this test was added.	2024-03-28 00:13:48 +13:00
Dean Rasheed	e6341323a8	Add functions to generate random numbers in a specified range. This adds 3 new variants of the random() function: random(min integer, max integer) returns integer random(min bigint, max bigint) returns bigint random(min numeric, max numeric) returns numeric Each returns a random number x in the range min <= x <= max. For the numeric function, the number of digits after the decimal point is equal to the number of digits that "min" or "max" has after the decimal point, whichever has more. The main entry points for these functions are in a new C source file. The existing random(), random_normal(), and setseed() functions are moved there too, so that they can all share the same PRNG state, which is kept private to that file. Dean Rasheed, reviewed by Jian He, David Zhang, Aleksander Alekseev, and Tomas Vondra. Discussion: https://postgr.es/m/CAEZATCV89Vxuq93xQdmc0t-0Y2zeeNQTdsjbmV7dyFBPykbV4Q@mail.gmail.com	2024-03-27 10:12:39 +00:00
Alexander Korotkov	818861eb57	Fix some typos and grammar issues from commit `87985cc925` Reported-by: Alexander Lakhin	2024-03-27 11:47:41 +02:00
Amit Kapila	677a45c4ae	Fix random failure in 004_subscription. After the upgrade, the failed test was ensuring that the changes made on the publisher should be replicated to the subscriber. We missed waiting for one of the subscriptions to catch up. Per buildfarm Author: Vignesh C Reviewed-by: Kuroda Hayato Discussion: https://postgr.es/m/CALDaNm0z=fLtio1h50K8WossUGXU+gy0H9y9=RYh1DDZiq2EDw@mail.gmail.com	2024-03-27 14:15:03 +05:30
Amit Kapila	6d49c8d4b4	Change last_inactive_time to inactive_since in pg_replication_slots. Commit `a11f330b55` added last_inactive_time to show the last time the slot was inactive. But, it tells the last time that a currently-inactive slot previously WAS active. This could be unclear, so we changed the name to inactive_since. Reported-by: Robert Haas Author: Bharath Rupireddy Reviewed-by: Bertrand Drouvot, Shveta Malik, Amit Kapila Discussion: https://postgr.es/m/CA+Tgmob_Ta-t2ty8QrKHBGnNLrf4ZYcwhGHGFsuUoFrAEDw4sA@mail.gmail.com Discussion: https://postgr.es/m/CALj2ACUXS0SfbHzsX8bqo+7CZhocsV52Kiu7OWGb5HVPAmJqnA@mail.gmail.com	2024-03-27 09:27:44 +05:30
Masahiko Sawada	bb952c8c8b	Allow specifying initial and maximum segment sizes for DSA. Previously, the DSA segment size always started with 1MB and grew up to DSA_MAX_SEGMENT_SIZE. It was inconvenient in certain scenarios, such as when the caller desired a soft constraint on the total DSA segment size, limiting it to less than 1MB. This commit introduces the capability to specify the initial and maximum DSA segment sizes when creating a DSA area, providing more flexibility and control over memory usage. Reviewed-by: John Naylor, Tomas Vondra Discussion: https://postgr.es/m/CAD21AoAYGGC1ePjVX0H%2Bpp9rH%3D9vuPK19nNOiu12NprdV5TVJA%40mail.gmail.com	2024-03-27 11:43:29 +09:00
Nathan Bossart	1f42337be5	Fix compiler warning for pg_lfind32(). The newly-introduced "one_by_one" label produces -Wunused-label warnings when building without SIMD support. To fix, move the label into the SIMD section of this function. Oversight in commit `7644a7340c`. Reported-by: Tom Lane Discussion: https://postgr.es/m/3189995.1711495704%40sss.pgh.pa.us	2024-03-26 20:27:46 -05:00
Tom Lane	9d00cf4772	Remove some redundant set_cheapest() calls. Commit `e2fa76d80` centralized the responsibility for doing set_cheapest() for a baserel, but these functions added later seemingly didn't get the memo. There's no apparent reason why we need the cheapest path for these relation types to be available any sooner than it is for other base relation types, so delete the duplicate calls. Doesn't save much since there's only one path in these cases, but it might improve clarity. Richard Guo Discussion: https://postgr.es/m/CAMbWs4-KFEU_fDuJPNCOkUu3rwvZvKBEytkd9VrM4kH4-2h1CQ@mail.gmail.com	2024-03-26 16:02:44 -04:00
Nathan Bossart	d365ae7054	Optimize roles_is_member_of() with a Bloom filter. When the list of roles gathered by roles_is_member_of() grows very large, a Bloom filter is created to help avoid some linear searches through the list. The threshold for creating the Bloom filter is set arbitrarily high and may require future adjustment. Suggested-by: Tom Lane Reviewed-by: Tom Lane Discussion: https://postgr.es/m/CAGvXd3OSMbJQwOSc-Tq-Ro1CAz%3DvggErdSG7pv2s6vmmTOLJSg%40mail.gmail.com	2024-03-26 14:43:37 -05:00
Tom Lane	fad3b5b5ac	Fix failure of ALTER FOREIGN TABLE SET SCHEMA to move sequences. Ordinary ALTER TABLE SET SCHEMA will also move any owned sequences into the new schema. We failed to do likewise for foreign tables, because AlterTableNamespaceInternal believed that only certain relkinds could have indexes, owned sequences, or constraints. We could simply add foreign tables to that relkind list, but it seems likely that the same oversight could be made again in future. Instead let's remove the relkind filter altogether. These functions shouldn't cost much when there are no objects that they need to process, and surely this isn't an especially performance-critical case anyway. Per bug #18407 from Vidushi Gupta. Back-patch to all supported branches. Discussion: https://postgr.es/m/18407-4fd07373d252c6a0@postgresql.org	2024-03-26 15:28:31 -04:00
Nathan Bossart	7644a7340c	Micro-optimize pg_lfind32(). This commit improves the performance of pg_lfind32() in many cases by modifying it to process the remaining "tail" of elements with SIMD instructions instead of processing them one-by-one. Since the SIMD code processes a large block of elements, this means that we will process a subset of elements more than once, but that won't affect the correctness of the result, and testing has shown that this helps more cases than it regresses. With this change, the standard one-by-one linear search code is only used for small arrays and for platforms without SIMD support. Suggested-by: John Naylor Reviewed-by: John Naylor Discussion: https://postgr.es/m/20231129171526.GA857928%40nathanxps13	2024-03-26 14:03:32 -05:00
Tom Lane	a65724dfa7	Propagate pathkeys from CTEs up to the outer query. If we know the sort order of a CTE's output, and it is relevant to the outer query, label the CTE's outer-query access path using those pathkeys. This may enable optimizations such as avoiding a sort in the outer query. The code for hoisting pathkeys into the outer query already exists for regular RTE_SUBQUERY subqueries, but it wasn't getting used for CTEs, possibly out of concern for maintaining an optimization fence between the CTE and the outer query. However, on the same arguments used for commit `f7816aec2`, there seems no harm in letting the outer query know what the inner query decided to do. In support of this, we now remember the best Path as well as Plan for each subquery for the rest of the planner run. There may be future applications for having that at hand, and it surely costs little to build one more List. Richard Guo (minor mods by me) Discussion: https://postgr.es/m/CAMbWs49xYd3f8CrE8-WW3--dV1zH_sDSDn-vs2DzHj81Wcnsew@mail.gmail.com	2024-03-26 13:05:51 -04:00
Bruce Momjian	e648e77e25	C comment: mention no doc for negative start of substring(text) Also add URL to hackers discussion. Backpatch-through: master	2024-03-26 12:27:41 -04:00
Tom Lane	8a92b70c11	Allow "make check"-style testing to work with musl C library. The musl dynamic linker saves a pointer to the process' environment value of LD_LIBRARY_PATH very early in startup. When we move/clobber the environment to make more room for ps status strings, we clobber that value and thereby prevent libraries from being found via LD_LIBRARY_PATH, which breaks the use of a temporary installation for testing purposes. To fix, stop collecting usable space for ps status if we notice that the variable we are about to clobber is LD_LIBRARY_PATH. This will result in some reduction in how long the ps status can be, but it's only likely to occur in temporary test contexts, so it doesn't seem like a big problem. In any case, we don't have to do it if we see we are on glibc, which surely is where the majority of our Linux testing is done. Thomas Munro, Bruce Momjian, and Tom Lane, per report from Wolfgang Walther. Back-patch to all supported branches, with the hope that we'll set up a buildfarm animal to test on this platform. Discussion: https://postgr.es/m/fddd1cd6-dc16-40a2-9eb5-d7fef2101488@technowledgy.de	2024-03-26 11:44:49 -04:00
Peter Eisentraut	89e5ef7e21	Remove ObjectClass type ObjectClass is an enum whose values correspond to catalog OIDs. But the extra layer of redirection, which is used only in small parts of the code, and the similarity to ObjectType, are confusing and cumbersome. One advantage has been that some switches processing the OCLASS enum don't have "default:" cases. This is so that the compiler tells us when we fail to add support for some new object class. But you can also handle that with some assertions and proper test coverage. It's not even clear how strong this benefit is. For example, in AlterObjectNamespace_oid(), you could still put a new OCLASS into the "ignore object types that don't have schema-qualified names" case, and it might or might not be wrong. Also, there are already various OCLASS switches that do have a default case, so it's not even clear what the preferred coding style should be. Reviewed-by: jian he <jian.universality@gmail.com> Reviewed-by: Michael Paquier <michael@paquier.xyz> Discussion: https://www.postgresql.org/message-id/flat/CAGECzQT3caUbcCcszNewCCmMbCuyP7XNAm60J3ybd6PN5kH2Dw%40mail.gmail.com	2024-03-26 10:08:56 +01:00
Peter Eisentraut	8c4f2d5475	Message fixes for pg_createsubscriber Author: Kyotaro Horiguchi <horikyota.ntt@gmail.com> Discussion: https://www.postgresql.org/message-id/20240326.140116.1116279856046587865.horikyota.ntt@gmail.com	2024-03-26 08:22:46 +01:00
Masahiko Sawada	a0e22ef911	Fix inconsistent function prototypes with function definitions. Introduced by `30e144287a`. Reviewed-by: John Naylor Discussion: https://postgr.es/m/CAD21AoCaDT%2B-ZaVjbtvumms0tyyHPNLELK2UX-MLG9XCgioaNw%40mail.gmail.com	2024-03-26 13:13:26 +09:00
Masahiko Sawada	4edb37e322	Fix a calculation in TidStoreCreate(). Since we expect that the max_bytes is in bytes, not in kilobytes, it should not be multiplied by 1024. Introduced by `30e144287a`. Reported-by: John Naylor, David Rowley Reviewed-by: John Naylor Discussion: https://postgr.es/m/CANWCAZZTE-14ofsucofTuhFsfuDGBNf%3DNZb22TMYT8bxA41oQQ%40mail.gmail.com Discussion: https://postgr.es/m/CAApHDvojg82NDaDEpj1WEZSbVTafj%3DDRmW%2BFrkBdW8ScL4OFxA%40mail.gmail.com	2024-03-26 13:06:06 +09:00
Alexander Korotkov	41d3780d3d	Improve error message for tts_(virtual\|minimal)_is_current_xact_tuple Discussion: https://postgr.es/m/CALT9ZEHNeagO5PLb4Nv9J_ZaCtp%2BArdVmbSLc0RHUzx_RPAa4w%40mail.gmail.com Author: Pavel Borisov	2024-03-26 01:55:22 +02:00
Alexander Korotkov	10baee0c95	Add comments on some MinimalTupleSlots methods usage Discussion: https://postgr.es/m/CALT9ZEHNeagO5PLb4Nv9J_ZaCtp%2BArdVmbSLc0RHUzx_RPAa4w%40mail.gmail.com Author: Pavel Borisov	2024-03-26 01:53:34 +02:00
Alexander Korotkov	8ffc2aa720	Add EvalPlanQual delete returning isolation test Author: Andres Freund Reviewed-by: Pavel Borisov Discussion: https://www.postgresql.org/message-id/flat/CAPpHfdua-YFw3XTprfutzGp28xXLigFtzNbuFY8yPhqeq6X5kg%40mail.gmail.com	2024-03-26 01:28:05 +02:00
Alexander Korotkov	87985cc925	Allow locking updated tuples in tuple_update() and tuple_delete() Currently, in read committed transaction isolation mode (default), we have the following sequence of actions when tuple_update()/tuple_delete() finds the tuple updated by the concurrent transaction. 1. Attempt to update/delete tuple with tuple_update()/tuple_delete(), which returns TM_Updated. 2. Lock tuple with tuple_lock(). 3. Re-evaluate plan qual (recheck if we still need to update/delete and calculate the new tuple for update). 4. Second attempt to update/delete tuple with tuple_update()/tuple_delete(). This attempt should be successful, since the tuple was previously locked. This commit eliminates step 2 by taking the lock during the first tuple_update()/tuple_delete() call. The heap table access method saves some effort by checking the updated tuple once instead of twice. Future undo-based table access methods, which will start from the latest row version, can immediately place a lock there. Also, this commit makes tuple_update()/tuple_delete() optionally save the old tuple into the dedicated slot. That saves efforts on re-fetching tuples in certain cases. The code in nodeModifyTable.c is simplified by removing the nested switch/case. Discussion: https://postgr.es/m/CAPpHfdua-YFw3XTprfutzGp28xXLigFtzNbuFY8yPhqeq6X5kg%40mail.gmail.com Reviewed-by: Aleksander Alekseev, Pavel Borisov, Vignesh C, Mason Sharp Reviewed-by: Andres Freund, Chris Travers	2024-03-26 01:27:56 +02:00
Tom Lane	c7076ba6ad	Refactor predicate_{implied,refuted}_by_simple_clause. Put the node-type-dependent operations into switches on nodeTag. This should ease addition of new proof rules for other expression node types. There is no functional change, although some tests are made in a different order than before. Also, add a couple of new cross-checks in test_predtest.c. James Coleman (part of a larger patch series) Discussion: https://postgr.es/m/CAAaqYe8Bo4bf_i6qKj8KBsmHMYXhe3Xt6vOe3OBQnOaf3_XBWg@mail.gmail.com	2024-03-25 17:45:15 -04:00
Jeff Davis	bc5fcaa289	Clarify comment for LogicalTapeSetBlocks(). Discussion: https://postgr.es/m/1229327.1711160246@sss.pgh.pa.us Backpatch-through: 13	2024-03-25 11:51:44 -07:00
Nathan Bossart	3ff01b2b6e	Adjust pgbench option for debug mode. Many other utilities use -d to specify the database to use, but pgbench uses it to enable debug mode. This is causing some users to accidentally enable it. This commit changes -d to accept the database name and introduces --dbname. Debug mode can still be enabled with --debug. This is a backward-incompatible change, but it has been judged to be worth the trade-off, i.e., some scripts that use pgbench will need to be updated. Author: Greg Sabino Mullane Reviewed-by: Tomas Vondra, Euler Taveira, Alvaro Herrera, David Christensen Discussion: https://postgr.es/m/CAKAnmmLjAzwVtb%3DVEaeuCtnmOLpzkJ1uJ_XiQ362YdD9B72HSg%40mail.gmail.com	2024-03-25 11:08:53 -05:00
Alvaro Herrera	374c7a2290	Allow specifying an access method for partitioned tables It's now possible to specify a table access method via CREATE TABLE ... USING for a partitioned table, as well change it with ALTER TABLE ... SET ACCESS METHOD. Specifying an AM for a partitioned table lets the value be used for all future partitions created under it, closely mirroring the behavior of the TABLESPACE option for partitioned tables. Existing partitions are not modified. For a partitioned table with no AM specified, any new partitions are created with the default_table_access_method. Also add ALTER TABLE ... SET ACCESS METHOD DEFAULT, which reverts to the original state of using the default for new partitions. The relcache of partitioned tables is not changed: rd_tableam is not set, even if a partitioned table has a relam set. Author: Justin Pryzby <pryzby@telsasoft.com> Author: Soumyadeep Chakraborty <soumyadeep2007@gmail.com> Author: Michaël Paquier <michael@paquier.xyz> Reviewed-by: The authors themselves Discussion: https://postgr.es/m/CAE-ML+9zM4wJCGCBGv01k96qQ3gFv4WFcFy=zqPHKeaEFwwv6A@mail.gmail.com Discussion: https://postgr.es/m/20210308010707.GA29832%40telsasoft.com	2024-03-25 16:30:36 +01:00
Daniel Gustafsson	b8528fe026	Fix typo in comment Spotted while reviewing a patch changing things around this area.	2024-03-25 14:49:17 +01:00
Daniel Gustafsson	b2d6b4c728	ecpg: Fix return code for overflow in numeric conversion The decimal conversion functions dectoint and dectolong are documented to return ECPG_INFORMIX_NUM_OVERFLOW in case of overflows, but always returned -1 on all errors due to incorrectly checking the returnvalue from the PGTYPES* functions. Author: Aidar Imamov <a.imamov@postgrespro.ru> Discussion: https://postgr.es/m/54d2b53327516d9454daa5fb2f893bdc@postgrespro.ru	2024-03-25 14:18:36 +01:00
Daniel Gustafsson	64e401b62b	Fix indentation from `a11f330b5` Per buildfarm animal koel	2024-03-25 14:18:33 +01:00
Heikki Linnakangas	f83d709760	Merge prune, freeze and vacuum WAL record formats The new combined WAL record is now used for pruning, freezing and 2nd pass of vacuum. This is in preparation for changing VACUUM to write a combined prune+freeze record per page, instead of separate two records. The new WAL record format now supports that, but the code still always writes separate records for pruning and freezing. This reserves separate XLOG_HEAP2_* info codes for when the pruning record is emitted for on-access pruning or VACUUM, per Peter Geoghegan's suggestion. The record format is identical, but having separate info codes makes it easier analyze pruning and vacuuming with pg_waldump. The function to emit the new WAL record, log_heap_prune_and_freeze(), is in pruneheap.c. The existing heap_log_freeze_plan() and its subroutines are moved to pruneheap.c without changes, to keep them together with log_heap_prune_and_freeze(). Author: Melanie Plageman <melanieplageman@gmail.com> Discussion: https://www.postgresql.org/message-id/CAAKRu_azf-zH%3DDgVbquZ3tFWjMY1w5pO8m-TXJaMdri8z3933g@mail.gmail.com Discussion: https://www.postgresql.org/message-id/CAAKRu_b2oE4GL%3Dq4g9mcByS9yT7wTQvEH9OLpabj28e%2BWKFi2A@mail.gmail.com	2024-03-25 14:59:58 +02:00
Peter Eisentraut	d44032d014	pg_createsubscriber: creates a new logical replica from a standby server It must be run on the target server and should be able to connect to the source server (publisher) and the target server (subscriber). All tables in the specified database(s) are included in the logical replication setup. A pair of publication and subscription objects are created for each database. The main advantage of pg_createsubscriber over the common logical replication setup is the initial data copy. It also reduces the catchup phase. Some prerequisites must be met to successfully run it. It is basically the logical replication requirements. It starts creating a publication using FOR ALL TABLES and a replication slot for each specified database. Write recovery parameters into the target data directory and start the target server. It specifies the LSN of the last replication slot (replication start point) up to which the recovery will proceed. Wait until the target server is promoted. Create one subscription per specified database (using publication and replication slot created in a previous step) on the target server. Set the replication progress to the replication start point for each subscription. Enable the subscription for each specified database on the target server. And finally, change the system identifier on the target server. Author: Euler Taveira <euler.taveira@enterprisedb.com> Reviewed-by: Hayato Kuroda <kuroda.hayato@fujitsu.com> Reviewed-by: Amit Kapila <amit.kapila16@gmail.com> Reviewed-by: Shlok Kyal <shlok.kyal.oss@gmail.com> Reviewed-by: Vignesh C <vignesh21@gmail.com> Reviewed-by: Shubham Khanna <khannashubham1197@gmail.com> Reviewed-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> Reviewed-by: Peter Eisentraut <peter@eisentraut.org> Discussion: https://www.postgresql.org/message-id/flat/5ac50071-f2ed-4ace-a8fd-b892cffd33eb@www.fastmail.com	2024-03-25 12:42:47 +01:00
Amit Kapila	a11f330b55	Track last_inactive_time in pg_replication_slots. This commit adds a new property called last_inactive_time for slots. It is set to 0 whenever a slot is made active/acquired and set to the current timestamp whenever the slot is inactive/released or restored from the disk. Note that we don't set the last_inactive_time for the slots currently being synced from the primary to the standby because such slots are typically inactive as decoding is not allowed on those. The 'last_inactive_time' will be useful on production servers to debug and analyze inactive replication slots. It will also help to know the lifetime of a replication slot - one can know how long a streaming standby, logical subscriber, or replication slot consumer is down. The 'last_inactive_time' will also be useful to implement inactive timeout-based replication slot invalidation in a future commit. Author: Bharath Rupireddy Reviewed-by: Bertrand Drouvot, Amit Kapila, Shveta Malik Discussion: https://www.postgresql.org/message-id/CALj2ACW4aUe-_uFQOjdWCEN-xXoLGhmvRFnL8SNw_TZ5nJe+aw@mail.gmail.com	2024-03-25 16:34:33 +05:30
Amit Langote	0f7863afef	Code review for `6190d828cd` * Fix the comment of init_dummy_sjinfo() to remove references to non-existing parameters 'rel1' and 'rel2'. * Adjust consider_new_or_clause() to call init_dummy_sjinfo() to make up a SpecialJoinInfo for inner joins like other code sites that were adjusted in `6190d828cd` to do so. Author: Richard Guo <guofenglinux@gmail.com> Reported-by: Richard Guo <guofenglinux@gmail.com> Discussion: https://postgr.es/m/CAExHW5tHqEf3ASVqvFFcghYGPfpy7o3xnvhHwBGbJFMRH8KjNw@mail.gmail.com	2024-03-25 19:45:27 +09:00
Alexander Korotkov	cc0e7ebd30	reindexdb: Fix warning about uninitialized indices_tables_cell Initialize indices_tables_cell with NULL to silence the warning. Also, refactor the place of the first assignment of indices_tables_cell. Reported-by: Thomas Munro, David Rowley, Tom Lane, Richard Guo Discussion: https://postgr.es/m/2348025.1711332418%40sss.pgh.pa.us Discussion: https://postgr.es/m/E1roXs4-005UdX-1V%40gemulon.postgresql.org	2024-03-25 11:40:25 +02:00
Amit Langote	6190d828cd	Do not translate dummy SpecialJoinInfos for child joins This teaches build_child_join_sjinfo() to create the dummy SpecialJoinInfos (those created for inner joins) directly for a given child join, skipping the unnecessary overhead of translating the parent joinrel's SpecialJoinInfo. To that end, this commit moves the code to initialize the dummy SpecialJoinInfos to a new function named init_dummy_sjinfo() and changes the few existing sites that have this code and build_child_join_sjinfo() to call this new function. Author: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> Reviewed-by: Richard Guo <guofenglinux@gmail.com> Reviewed-by: Amit Langote <amitlangote09@gmail.com> Reviewed-by: Andrey Lepikhov <a.lepikhov@postgrespro.ru> Reviewed-by: Tomas Vondra <tomas.vondra@enterprisedb.com> Discussion: https://postgr.es/m/CAExHW5tHqEf3ASVqvFFcghYGPfpy7o3xnvhHwBGbJFMRH8KjNw@mail.gmail.com	2024-03-25 18:06:47 +09:00
Amit Langote	5278d0a2e8	Reduce memory used by partitionwise joins Specifically, this commit reduces the memory consumed by the SpecialJoinInfos that are allocated for child joins in try_partitionwise_join() by freeing them at the end of creating paths for each child join. A SpecialJoinInfo allocated for a given child join is a copy of the parent join's SpecialJoinInfo, which contains the translated copies of the various Relids bitmapsets and semi_rhs_exprs, which is a List of Nodes. The newly added freeing step frees the struct itself and the various bitmapsets, but not semi_rhs_exprs, because there's no handy function to free the memory of Node trees. Author: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> Reviewed-by: Richard Guo <guofenglinux@gmail.com> Reviewed-by: Amit Langote <amitlangote09@gmail.com> Reviewed-by: Andrey Lepikhov <a.lepikhov@postgrespro.ru> Reviewed-by: Tomas Vondra <tomas.vondra@enterprisedb.com> Discussion: https://postgr.es/m/CAExHW5tHqEf3ASVqvFFcghYGPfpy7o3xnvhHwBGbJFMRH8KjNw@mail.gmail.com	2024-03-25 18:06:46 +09:00
Masahiko Sawada	80d5d4937c	Fix potential integer handling issue in radixtree.h. Coverity complained about the integer handling issue; if we start with an arbitrary non-negative shift value, the loop may decrement it down to something less than zero before exiting. This commit adds an assertion to make sure the 'shift' is always 0 after the loop, and uses 0 as the shift to get the key chunk in the following operation. Introduced by `ee1b30f12`. Reported-by: Tom Lane as per coverity Reviewed-by: Tom Lane Discussion: https://postgr.es/m/2089517.1711299216%40sss.pgh.pa.us	2024-03-25 12:06:41 +09:00
David Rowley	66c0185a3d	Allow planner to use Merge Append to efficiently implement UNION Until now, UNION queries have often been suboptimal as the planner has only ever considered using an Append node and making the results unique by either using a Hash Aggregate, or by Sorting the entire Append result and running it through the Unique operator. Both of these methods always require reading all rows from the union subqueries. Here we adjust the union planner so that it can request that each subquery produce results in target list order so that these can be Merge Appended together and made unique with a Unique node. This can improve performance significantly as the union child can make use of the likes of btree indexes and/or Merge Joins to provide the top-level UNION with presorted input. This is especially good if the top-level UNION contains a LIMIT node that limits the output rows to a small subset of the unioned rows as cheap startup plans can be used. Author: David Rowley Reviewed-by: Richard Guo, Andy Fan Discussion: https://postgr.es/m/CAApHDvpb_63XQodmxKUF8vb9M7CxyUyT4sWvEgqeQU-GB7QFoQ@mail.gmail.com	2024-03-25 14:31:14 +13:00
Alexander Korotkov	47f99a407d	reindexdb: Add the index-level REINDEX with multiple jobs Straight-forward index-level REINDEX is not supported with multiple jobs as we cannot control the concurrent processing of multiple indexes depending on the same relation. Instead, we dedicate the whole table to certain reindex job. Thus, if indexes in the lists belong to different tables, that gives us a fair level of parallelism. This commit teaches get_parallel_object_list() to fetch table names for indexes in the case of index-level REINDEX. The same tables are grouped together in the output order, and the list of indexes is also rebuilt to match that order. Later during processingof that list, we push indexes belonging to the same table into the same job. Discussion: https://postgr.es/m/CACG%3DezZU_VwDi-1PN8RUSE6mcYG%2BYx1NH_rJO4%2BKe-mKqLp%3DNw%40mail.gmail.com Author: Maxim Orlov, Svetlana Derevyanko, Alexander Korotkov Reviewed-by: Michael Paquier	2024-03-25 02:07:15 +02:00
Jeff Davis	503c0ad976	Fix convert_case(), introduced in `5c40364dd6`. Check source length before checking for NUL terminator to avoid reading one byte past the string end. Also fix unreachable bug when caller does not expect NUL-terminated result. Add unit test coverage of convert_case() in case_test.c, which makes it easier to reproduce the valgrind failure. Discussion: https://postgr.es/m/7a9fd36d-7a38-4dc2-e676-fc939491a95a@gmail.com Reported-by: Alexander Lakhin	2024-03-24 16:28:34 -07:00
Tom Lane	af1d395843	Allow more cases to pass the unsafe-use-of-new-enum-value restriction. Up to now we've rejected cases like BEGIN; CREATE TYPE rainbow AS ENUM (); ALTER TYPE rainbow ADD VALUE 'red'; -- use the value 'red', perhaps in a constraint or index COMMIT; The concern is that the uncommitted enum value 'red' might get into an index and then break the index if we roll back the ALTER ADD. If the ALTER is in the same transaction as the CREATE then it's really perfectly safe, but we weren't taking the trouble to identify that. pg_dump in binary-upgrade mode will emit enum definitions that look like the above, which up to now didn't fall foul of the unsafe-usage check because we processed each restore command as a separate transaction. However an upcoming patch proposes to bundle the restore commands into large transactions to reduce XID consumption during pg_upgrade, and that makes this behavior a problem. To fix, remember the OIDs of enum types created in the current transaction, and allow use of enum values that are added to one later in the same transaction. To do this fully correctly in the presence of subtransactions, we'd have to track subtransaction nesting level of the CREATE and do maintenance work at every subsequent subtransaction exit. That seems expensive, and we don't need it to satisfy pg_dump's usage. Hence, apply the additional optimization only when the CREATE and ALTER are at outermost transaction level. Patch by me, reviewed by Andrew Dunstan Discussion: https://postgr.es/m/1548468.1711220438@sss.pgh.pa.us	2024-03-24 14:30:29 -04:00
Tom Lane	d37e0d0c50	Release PQconninfoOptions array in GetDbnameFromConnectionOptions(). It wasn't getting freed in one code path, which Coverity identified as a resource leak. It's probably of little consequence, but re-ordering the code into the correct sequence is no more work than dismissing the complaint. Minor oversight in commit `a145f424d`. While here, improve the unreasonably clunky coding of FindDbnameInConnParams: use of an output parameter is unnecessary and prone to uninitialized-variable problems.	2024-03-24 12:31:05 -04:00
Tom Lane	225e1dde46	Release temporary array in check_for_data_types_usage(). Coverity identified this as a resource leak. It's surely of no consequence given that the function is called only once per run, but freeing the storage is no more work than dismissing the complaint. Minor oversight in commit `347758b12`.	2024-03-24 12:13:35 -04:00
Peter Eisentraut	fc2d260c7e	ci: freebsd repartition script didn't copy .git directory We need a slightly different "cp" incantation to make sure top-level "dot" files, such as ".git", are also copied. This is relevant for example if a script wants to execute a git command. This currently does not happen, but it has come up while testing other patches. Reviewed-by: Tristan Partin <tristan@neon.tech> Discussion: https://www.postgresql.org/message-id/flat/40e80f77-a294-4f29-a16f-e21bc7bc75fc%40eisentraut.org	2024-03-24 08:41:14 +01:00
Peter Eisentraut	34768ee361	Add temporal FOREIGN KEY contraints Add PERIOD clause to foreign key constraint definitions. This is supported for range and multirange types. Temporal foreign keys check for range containment instead of equality. This feature matches the behavior of the SQL standard temporal foreign keys, but it works on PostgreSQL's native ranges instead of SQL's "periods", which don't exist in PostgreSQL (yet). Reference actions ON {UPDATE,DELETE} {CASCADE,SET NULL,SET DEFAULT} are not supported yet. Author: Paul A. Jungwirth <pj@illuminatedcomputing.com> Reviewed-by: Peter Eisentraut <peter@eisentraut.org> Reviewed-by: jian he <jian.universality@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/CA+renyUApHgSZF9-nd-a0+OPGharLQLO=mDHcY4_qQ0+noCUVg@mail.gmail.com	2024-03-24 07:37:13 +01:00
Daniel Gustafsson	697f8d266c	Revert "Add notBefore and notAfter to SSL cert info display" This reverts commit `6acb0a628e` since LibreSSL didn't support ASN1_TIME_diff until OpenBSD 7.1, leaving the older OpenBSD animals in the buildfarm complaining. Per plover in the buildfarm. Discussion: https://postgr.es/m/F0DF7102-192D-4C21-96AE-9A01AE153AD1@yesql.se	2024-03-22 22:58:41 +01:00
Tom Lane	473182c952	Use a hash table for catcache.c's CatCList objects. Up to now, all of the "catcache list" objects within a catalog cache were just chained together on a single dlist, requiring O(N) time to search. Remarkably, we've not had serious performance problems with that so far; but we got a complaint of a bad performance regression from v15 in a case with a large number of roles in the system, which traced down to O(N^2) total time when we probed N catcache lists. Replace that data structure with a hashtable having an enlargeable number of dlists, in an exactly parallel way to the data structure we've used for years for the plain CatCTup cache members. The extra cost of maintaining a hash table seems negligible, since we were already computing a hash value for list searches. Normally this'd be HEAD-only material, but in view of the performance regression it seems advisable to back-patch into v16. In the v16 version of the patch, leave the dead cc_lists field where it is and add the new fields at the end of struct catcache, to avoid possible ABI breakage in case any external code is looking at these structs. (We assume no external code is actually allocating new catcache structs.) Per report from alex work. Discussion: https://postgr.es/m/CAGvXd3OSMbJQwOSc-Tq-Ro1CAz=vggErdSG7pv2s6vmmTOLJSg@mail.gmail.com	2024-03-22 17:13:53 -04:00
Daniel Gustafsson	6acb0a628e	Add notBefore and notAfter to SSL cert info display This adds the X509 attributes notBefore and notAfter to sslinfo as well as pg_stat_ssl to allow verifying and identifying the validity period of the current client certificate. OpenSSL has APIs for extracting notAfter and notBefore, but they are only supported in recent versions so we have to calculate the dates by hand in order to make this work for the older versions of OpenSSL that we still support. Original patch by Cary Huang with additional hacking by Jacob and myself. Author: Cary Huang <cary.huang@highgo.ca> Co-author: Jacob Champion <jacob.champion@enterprisedb.com> Co-author: Daniel Gustafsson <daniel@yesql.se> Discussion: https://postgr.es/m/182b8565486.10af1a86f158715.2387262617218380588@highgo.ca	2024-03-22 21:25:25 +01:00
Alexander Korotkov	b670b93a66	Fix an oversight in refactoring in `06b10f80ba`. It was against intended skipping prechecking keys optimization in the first page of range queries to not influence point queries performance. Reported-by: Anton Melnikov Discussion: https://postgr.es/m/30cd7524-b9f1-4cf8-9c4a-223eb2e34441%40postgrespro.ru Author: Pavel Borisov	2024-03-22 15:25:53 +02:00
Peter Eisentraut	d20d8fbd3e	Do not output actual value of location fields in node serialization by default This changes nodeToString() to not output the actual value of location fields in nodes, but instead it writes -1. This mirrors the fact that stringToNode() also does not read location field values but always stores -1. For most uses of nodeToString(), which is to store nodes in catalog fields, this is more useful. We don't store original query texts in catalogs, so any lingering query location values are not meaningful. For debugging purposes, there is a new nodeToStringWithLocations(), which mirrors the existing stringToNodeWithLocations(). This is used for WRITE_READ_PARSE_PLAN_TREES and nodes/print.c functions, which covers all the debugging uses. Reviewed-by: Matthias van de Meent <boekewurm+postgres@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/CAEze2WgrCiR3JZmWyB0YTc8HV7ewRdx13j0CqD6mVkYAW+SFGQ@mail.gmail.com	2024-03-22 09:49:12 +01:00
Amit Kapila	6ae701b437	Track invalidation_reason in pg_replication_slots. Till now, the reason for replication slot invalidation is not tracked directly in pg_replication_slots. A recent commit `007693f2a3` added 'conflict_reason' to show the reasons for slot conflict/invalidation, but only for logical slots. This commit adds a new column 'invalidation_reason' to show invalidation reasons for both physical and logical slots. And, this commit also turns 'conflict_reason' text column to 'conflicting' boolean column (effectively reverting commit `007693f2a3`). The 'conflicting' column is true for invalidation reasons 'rows_removed' and 'wal_level_insufficient' because those make the slot conflict with recovery. When 'conflicting' is true, one can now look at the new 'invalidation_reason' column for the reason for the logical slot's conflict with recovery. The new 'invalidation_reason' column will also be useful to track other invalidation reasons in the future commit. Author: Bharath Rupireddy Reviewed-by: Bertrand Drouvot, Amit Kapila, Shveta Malik Discussion: https://www.postgresql.org/message-id/ZfR7HuzFEswakt/a%40ip-10-97-1-34.eu-west-3.compute.internal Discussion: https://www.postgresql.org/message-id/CALj2ACW4aUe-_uFQOjdWCEN-xXoLGhmvRFnL8SNw_TZ5nJe+aw@mail.gmail.com	2024-03-22 13:52:05 +05:30
Peter Eisentraut	b4080fa3dc	Make RangeTblEntry dump order consistent Put the fields alias and eref earlier in the struct, so that it matches the order in _outRangeTblEntry()/_readRangeTblEntry(). This helps if we ever want to fully automate out/read of RangeTblEntry. Also, it makes dumps in the debugger easier to read in the same way. Internally, this makes no difference. Reviewed-by: Andrew Dunstan <andrew@dunslane.net> Discussion: https://www.postgresql.org/message-id/flat/4b27fc50-8cd6-46f5-ab20-88dbaadca645@eisentraut.org	2024-03-22 07:28:33 +01:00
Peter Eisentraut	367c989cd8	Remove custom _jumbleRangeTblEntry() This is part of an effort to reduce the number of special cases in the automatically generated node support functions. This patch removes _jumbleRangeTblEntry() and instead adds per-field query_jumble_ignore annotations to match the behavior of the previous custom code. The pg_stat_statements test suite has some coverage of this. It gets rid of the switch on rtekind; this should be technically correct, since we do the equal and copy functions like this also. The list of fields to jumble has been checked and is considered correct as of `8b29a119fd`. Reviewed-by: Andrew Dunstan <andrew@dunslane.net> Discussion: https://www.postgresql.org/message-id/flat/4b27fc50-8cd6-46f5-ab20-88dbaadca645@eisentraut.org	2024-03-22 07:23:47 +01:00
Peter Eisentraut	d575051b9a	Reformat some node comments Reformat some comments in node field definitions to avoid long lines. This makes room for per-field annotations. Similar to `835d476fd2`. Reviewed-by: Andrew Dunstan <andrew@dunslane.net> Discussion: https://www.postgresql.org/message-id/flat/4b27fc50-8cd6-46f5-ab20-88dbaadca645@eisentraut.org	2024-03-22 07:21:51 +01:00
Peter Eisentraut	1e1eb12c25	Improve comment Clarify that RangeTblEntry.lateral reflects whether LATERAL was specified in the statement (as opposed to whether lateralness is implicit). Also, the list of applicable entry types was incomplete. Reviewed-by: Andrew Dunstan <andrew@dunslane.net> Discussion: https://www.postgresql.org/message-id/flat/4b27fc50-8cd6-46f5-ab20-88dbaadca645@eisentraut.org	2024-03-22 07:21:06 +01:00
Peter Eisentraut	83d8065b1f	Remove obsolete comment The idea to use a union in the definition of RangeTblEntry is clearly not being pursued. Reviewed-by: Andrew Dunstan <andrew@dunslane.net> Discussion: https://www.postgresql.org/message-id/flat/4b27fc50-8cd6-46f5-ab20-88dbaadca645@eisentraut.org	2024-03-22 07:20:11 +01:00
Amit Langote	085e759e9d	Avoid splitting errmsg string to span multiple lines The error message being fixed was added in `6185c9737c`. While at it, add an "a" to the sentence. Reported-by: Kyotaro Horiguchi <horikyota.ntt@gmail.com> Discussion: https://postgr.es/m/20240322.095149.895185546948714852.horikyota.ntt%40gmail.com	2024-03-22 12:04:06 +09:00
Daniel Gustafsson	7e65ad197f	Fix dumping role comments when using --no-role-passwords Commit `9a83d56b38` added support for allowing pg_dumpall to dump roles without including passwords, which accidentally made dumps omit COMMENTs on roles. This fixes it by using pg_authid to get the comment. Backpatch to all supported versions. Patch simultaneously written independently by Álvaro and myself. Author: Álvaro Herrera <alvherre@alvh.no-ip.org> Author: Daniel Gustafsson <daniel@yesql.se> Reported-by: Bartosz Chroł <bartosz.chrol@handen.pl> Discussion: https://postgr.es/m/AS8P194MB1271CDA0ADCA7B75FCD8E767F7332@AS8P194MB1271.EURP194.PROD.OUTLOOK.COM Discussion: https://postgr.es/m/CAEP4nAz9V4H41_4ESJd1Gf0v%3DdevkqO1%3Dpo91jUw-GJSx8Hxqg%40mail.gmail.com Backpatch-through: v12	2024-03-21 23:31:57 +01:00
Alexander Korotkov	0997e0af27	Add TupleTableSlotOps.is_current_xact_tuple() method This allows us to abstract how/whether table AM uses transaction identifiers. A custom table AM can use a custom slot, which may not store xmin directly, but determine the tuple belonging to the current transaction in the other way. Discussion: https://postgr.es/m/CAPpHfdurb9ycV8udYqM%3Do0sPS66PJ4RCBM1g-bBpvzUfogY0EA%40mail.gmail.com Reviewed-by: Matthias van de Meent, Mark Dilger, Pavel Borisov Reviewed-by: Nikita Malakhov, Japin Li	2024-03-21 23:00:43 +02:00
Alexander Korotkov	c35a3fb5e0	Allow table AM tuple_insert() method to return the different slot This allows table AM to return a native tuple slot even if VirtualTupleTableSlot is given as an input. Native tuple slots have knowledge about system attributes, which could be accessed in the future. table_multi_insert() method already can modify the input 'slots' array. Discussion: https://postgr.es/m/CAPpHfdurb9ycV8udYqM%3Do0sPS66PJ4RCBM1g-bBpvzUfogY0EA%40mail.gmail.com Reviewed-by: Matthias van de Meent, Mark Dilger, Pavel Borisov Reviewed-by: Nikita Malakhov, Japin Li	2024-03-21 23:00:40 +02:00
Alexander Korotkov	02eb07ea89	Allow table AM to store complex data structures in rd_amcache The new table AM method free_rd_amcache is responsible for freeing all the memory related to rd_amcache and setting free_rd_amcache to NULL. If the new method is not specified, we still assume rd_amcache to be a single chunk of memory, which could be just pfree'd. Discussion: https://postgr.es/m/CAPpHfdurb9ycV8udYqM%3Do0sPS66PJ4RCBM1g-bBpvzUfogY0EA%40mail.gmail.com Reviewed-by: Matthias van de Meent, Mark Dilger, Pavel Borisov Reviewed-by: Nikita Malakhov, Japin Li	2024-03-21 23:00:34 +02:00
Daniel Gustafsson	adcdb2c8dd	Explicitly require password for SCRAM exchange This refactors the SASL init flow to set password_needed on the two SCRAM exchanges currently supported. The code already required this but was set up in such a way that all SASL exchanges required using a password, a restriction which may not hold for all exchanges (the example at hand being the proposed OAuthbearer exchange). This was extracted from a larger patchset to introduce OAuthBearer authentication and authorization. Author: Jacob Champion <jacob.champion@enterprisedb.com> Discussion: https://postgr.es/m/d1b467a78e0e36ed85a09adf979d04cf124a9d4b.camel@vmware.com	2024-03-21 14:45:54 +01:00
Daniel Gustafsson	24178e235e	Refactor SASL exchange to return tri-state status The SASL exchange callback returned state in to output variables: done and success. This refactors that logic by introducing a new return variable of type SASLStatus which makes the code easier to read and understand, and prepares for future SASL exchanges which operate asynchronously. This was extracted from a larger patchset to introduce OAuthBearer authentication and authorization. Author: Jacob Champion <jacob.champion@enterprisedb.com> Discussion: https://postgr.es/m/d1b467a78e0e36ed85a09adf979d04cf124a9d4b.camel@vmware.com	2024-03-21 14:45:46 +01:00
David Rowley	1db689715d	Temporarily install debugging in partition_prune test The buildfarm animal parula has been sporadically failing in the partition_prune test for the past week or so. It appears like an auto-vacuum or auto-analyze has run on one of the partitions of the "ab" table, causing the plan to change. This is unexpected as the table is empty. Here we install some telemetry to find out if this is the case. This also joins in pg_index to see if something has gone wrong with the index which could result in the planner being unable to use that index. We can revert this once we've figured out the cause of the plan instability. Reported-by: Tom Lane Investigation-by: Tom Lane Discussion: https://postgr.es/m/4009739.1710878318%40sss.pgh.pa.us	2024-03-21 21:21:05 +13:00
Amit Langote	6185c9737c	Add SQL/JSON query functions This introduces the following SQL/JSON functions for querying JSON data using jsonpath expressions: JSON_EXISTS(), which can be used to apply a jsonpath expression to a JSON value to check if it yields any values. JSON_QUERY(), which can be used to to apply a jsonpath expression to a JSON value to get a JSON object, an array, or a string. There are various options to control whether multi-value result uses array wrappers and whether the singleton scalar strings are quoted or not. JSON_VALUE(), which can be used to apply a jsonpath expression to a JSON value to return a single scalar value, producing an error if it multiple values are matched. Both JSON_VALUE() and JSON_QUERY() functions have options for handling EMPTY and ERROR conditions, which can be used to specify the behavior when no values are matched and when an error occurs during jsonpath evaluation, respectively. Author: Nikita Glukhov <n.gluhov@postgrespro.ru> Author: Teodor Sigaev <teodor@sigaev.ru> Author: Oleg Bartunov <obartunov@gmail.com> Author: Alexander Korotkov <aekorotkov@gmail.com> Author: Andrew Dunstan <andrew@dunslane.net> Author: Amit Langote <amitlangote09@gmail.com> Author: Peter Eisentraut <peter@eisentraut.org> Author: Jian He <jian.universality@gmail.com> Reviewers have included (in no particular order): Andres Freund, Alexander Korotkov, Pavel Stehule, Andrew Alsup, Erik Rijkers, Zihong Yu, Himanshu Upadhyaya, Daniel Gustafsson, Justin Pryzby, Álvaro Herrera, Jian He, Anton A. Melnikov, Nikita Malakhov, Peter Eisentraut, Tomas Vondra Discussion: https://postgr.es/m/cd0bb935-0158-78a7-08b5-904886deac4b@postgrespro.ru Discussion: https://postgr.es/m/20220616233130.rparivafipt6doj3@alap3.anarazel.de Discussion: https://postgr.es/m/abd9b83b-aa66-f230-3d6d-734817f0995d%40postgresql.org Discussion: https://postgr.es/m/CA+HiwqHROpf9e644D8BRqYvaAPmgBZVup-xKMDPk-nd4EpgzHw@mail.gmail.com Discussion: https://postgr.es/m/CA+HiwqE4XTdfb1nW=Ojoy_tQSRhYt-q_kb6i5d4xcKyrLC1Nbg@mail.gmail.com	2024-03-21 17:07:03 +09:00
Amit Kapila	a145f424d5	Allow dbname to be written as part of connstring via pg_basebackup's -R option. Commit `cca97ce6a6` allowed dbname in pg_basebackup connstring and in this commit we allow it to be written in postgresql.auto.conf when -R option is used. The database name in the connection string will be used by the logical replication slot synchronization on standby. The dbname will be recorded only if specified explicitly in the connection string or environment variable. Masahiko Sawada hasn't reviewed the code in detail but endorsed the idea. Author: Vignesh C, Kuroda Hayato Reviewed-by: Amit Kapila Discussion: https://postgr.es/m/CAB8KJ=hdKdg+UeXhReeHpHA6N6v3e0qFF+ZsPFHk9_ThWKf=2A@mail.gmail.com	2024-03-21 10:50:33 +05:30
Masahiko Sawada	30e144287a	Add TIDStore, to store sets of TIDs (ItemPointerData) efficiently. TIDStore is a data structure designed to efficiently store large sets of TIDs. For TID storage, it employs a radix tree, where the key is a block number, and the value is a bitmap representing offset numbers. The TIDStore can be created on a DSA area and used by multiple backend processes simultaneously. There are potential future users such as tidbitmap.c, though it's very likely the interface will need to evolve as we come to understand the needs of different kinds of users. For example, we can support updating the offset bitmap of existing values. Currently, the TIDStore is not used for anything yet, aside from the test code. But an upcoming patch will use it. This includes a unit test module, in src/test/modules/test_tidstore. Co-authored-by: John Naylor Discussion: https://postgr.es/m/CAD21AoAfOZvmfR0j8VmZorZjL7RhTiQdVttNuC4W-Shdc2a-AA%40mail.gmail.com	2024-03-21 10:08:42 +09:00
Tom Lane	995e0fbc1c	Un-break genbki.pl's error reporting capabilities. This essentially reverts commit `69eb643b2`, which added a fast path in Catalog::ParseData, but neglected to preserve the behavior of adding a line_number field in each hash. That makes it impossible for genbki.pl to provide any localization of error reports, which is bad enough; but actually the affected error reports failed entirely, producing useless bleats like "use of undefined value in sprintf". `69eb643b2` claimed to get a 15% speedup, but I'm not sure I believe that: the time to rebuild the bki files changes by less than 1% for me. In any case, making debugging of mistakes in .dat files more difficult would not be justified by even an order of magnitude speedup here; it's just not that big a chunk of the total build time. Per report from David Wheeler. Although it's also broken in v16, I don't think this is worth a back-patch, since we're very unlikely to touch the v16 catalog data again. Discussion: https://postgr.es/m/19238.1710953049@sss.pgh.pa.us	2024-03-20 18:02:50 -04:00
Tom Lane	1218ca9956	Add to_regtypemod function to extract typemod from a string type name. In combination with to_regtype, this allows converting a string to the "canonicalized" form emitted by format_type. That usage requires parsing the string twice, which is slightly annoying but not really too expensive. We considered alternatives such as returning a record type, but that way was notationally uglier than this, and possibly less flexible. Like to_regtype(), we'd rather that this return NULL for any bad input, but the underlying type-parsing logic isn't yet capable of not throwing syntax errors. Adjust the documentation for both functions to point that out. In passing, fix up a couple of nearby entries in the System Catalog Information Functions table that had not gotten the word about our since-v13 convention for displaying function usage examples. David Wheeler and Erik Wienhold, reviewed by Pavel Stehule, Jim Jones, and others. Discussion: https://postgr.es/m/DF2324CA-2673-4ABE-B382-26B5770B6AA3@justatheory.com	2024-03-20 17:11:28 -04:00
Nathan Bossart	80686761c4	Avoid overflow in MaybeRemoveOldWalSummaries(). This commit limits the maximum value of wal_summary_keep_time to INT_MAX / SECS_PER_MINUTE to avoid overflow when it is converted to seconds. In passing, use the HOURS_PER_DAY, MINS_PER_HOUR, and SECS_PER_MINUTE macros in the code for this GUC instead of hard- coding those values. Discussion: https://postgr.es/m/20240314210010.GA3056455%40nathanxps13	2024-03-20 13:31:58 -05:00
Jeff Davis	9acae56ce0	Inline basic UTF-8 functions. Shows a measurable speedup when processing UTF-8 data, such as with the new builtin collation provider. Discussion: https://postgr.es/m/163f4e2190cdf67f67016044e503c5004547e5a9.camel@j-davis.com Reviewed-by: Peter Eisentraut	2024-03-20 09:40:57 -07:00
Nathan Bossart	2b520860c0	Revert "Temporary patch to help debug pg_walsummary test failures." Thanks to commits `ea18eb7d62`, `b6ee30ec08`, and `19a829a327`, the 002_blocks.pl test now consistently passes, so we can remove this temporary debugging code. This reverts commit `5ddf997347`. Discussion: https://postgr.es/m/20240314210010.GA3056455%40nathanxps13	2024-03-20 11:34:00 -05:00
Alvaro Herrera	a0390f6ca6	Review wording on tablespaces w.r.t. partitioned tables Remove a redundant comment, and document pg_class.reltablespace properly in catalogs.sgml. After commits `a36c84c3e4`, `87259588d0` and others. Backpatch to 12. Discussion: https://postgr.es/m/202403191013.w2kr7wqlamqz@alvherre.pgsql	2024-03-20 15:28:14 +01:00
Alvaro Herrera	da952b415f	Rework lwlocknames.txt to become lwlocklist.h This way, we can fold the list of lock names to occur in BuiltinTrancheNames instead of having its own separate array. This saves two lines of code in GetLWTrancheName and some space in BuiltinTrancheNames, as foreseen in commit `74a7306310`, as well as removing the need for a separate lwlocknames.c file. We still have to build lwlocknames.h using Perl code, which initially I wanted to avoid, but it gives us the chance to cross-check wait_event_names.txt. Discussion: https://postgr.es/m/202401231025.gbv4nnte5fmm@alvherre.pgsql	2024-03-20 11:55:20 +01:00
Peter Eisentraut	e5da0fe3c2	Catalog domain not-null constraints This applies the explicit catalog representation of not-null constraints introduced by `b0e96f3119` for table constraints also to domain not-null constraints. Reviewed-by: Aleksander Alekseev <aleksander@timescale.com> Reviewed-by: jian he <jian.universality@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/9ec24d7b-633d-463a-84c6-7acff769c9e8%40eisentraut.org	2024-03-20 10:05:37 +01:00
Heikki Linnakangas	c9c260decd	Remove unused PruneState member rel PruneState->rel is no longer being used, so just remove it. Author: Melanie Plageman <melanieplageman@gmail.com> Discussion: https://www.postgresql.org/message-id/20240320013602.6sypr4cx6sefpemg@liskov	2024-03-20 10:13:42 +02:00
Heikki Linnakangas	c33084205a	Reorganize heap_page_prune() function comment heap_page_prune()'s function header comment didn't explain the parameters in the same order they appear in the function. Fix that. Author: Melanie Plageman <melanieplageman@gmail.com> Discussion: https://www.postgresql.org/message-id/20240320013602.6sypr4cx6sefpemg@liskov	2024-03-20 10:13:39 +02:00
Dean Rasheed	522ed12f7c	Add "--exclude-extension" to pg_dump's options. This option (or equivalently specifying "exclude extension pattern" in a filter file) allows extensions matching the specified pattern to be excluded from the dump. Ayush Vatsa, reviewed by Junwang Zhao, Dean Rasheed, and Daniel Gustafsson. Discussion: https://postgr.es/m/CACX+KaP=VgVy9h-EUh598DTu+-fNr1jyEmpghC8rRp9s=w33Kg@mail.gmail.com	2024-03-20 08:05:44 +00:00
Heikki Linnakangas	d63d486d6c	Remove assertions that some compiler say are tautological To avoid the compiler warnings: launch_backend.c:211:39: warning: comparison of constant 16 with expression of type 'BackendType' (aka 'enum BackendType') is always true [-Wtautological-constant-out-of-range-compare] launch_backend.c:233:39: warning: comparison of constant 16 with expression of type 'BackendType' (aka 'enum BackendType') is always true [-Wtautological-constant-out-of-range-compare] The point of the assertions was to fail more explicitly if someone adds a new BackendType to the end of the enum, but forgets to add it to the child_process_kinds array. It was a pretty weak assertion to begin with, because it wouldn't catch if you added a new BackendType in the middle of the enum. So let's just remove it. Per buildfarm member ayu and a few others, spotted by Tom Lane. Discussion: https://www.postgresql.org/message-id/4119680.1710913067@sss.pgh.pa.us	2024-03-20 09:14:51 +02:00
Peter Eisentraut	9578393bc5	Add tests for domain-related information schema views Reviewed-by: Aleksander Alekseev <aleksander@timescale.com> Discussion: https://www.postgresql.org/message-id/flat/9ec24d7b-633d-463a-84c6-7acff769c9e8%40eisentraut.org	2024-03-20 07:08:01 +01:00
Jeff Davis	f69319f2f1	Support C.UTF-8 locale in the new builtin collation provider. The builtin C.UTF-8 locale has similar semantics to the libc locale of the same name. That is, code point sort order (fast, memcmp-based) combined with Unicode semantics for character operations such as pattern matching, regular expressions, and LOWER()/INITCAP()/UPPER(). The character semantics are based on Unicode simple case mappings. The builtin provider's C.UTF-8 offers several important advantages over libc: * faster sorting -- benefits from additional optimizations such as abbreviated keys and varstrfastcmp_c * faster case conversion, e.g. LOWER(), at least compared with some libc implementations * available on all platforms with identical semantics, and the semantics are stable, testable, and documentable within a given Postgres major version Being based on memcmp, the builtin C.UTF-8 locale does not offer natural language sort order. But it is an improvement for most use cases that might otherwise use libc's "C.UTF-8" locale, as well as many use cases that use libc's "C" locale. Discussion: https://postgr.es/m/ff4c2f2f9c8fc7ca27c1c24ae37ecaeaeaff6b53.camel%40j-davis.com Reviewed-by: Daniel Vérité, Peter Eisentraut, Jeremy Schneider	2024-03-19 15:24:41 -07:00
Tom Lane	fd0398fcb0	Improve EXPLAIN's display of SubPlan nodes and output parameters. Historically we've printed SubPlan expression nodes as "(SubPlan N)", which is pretty uninformative. Trying to reproduce the original SQL for the subquery is still as impractical as before, and would be mighty verbose as well. However, we can still do better than that. Displaying the "testexpr" when present, and adding a keyword to indicate the SubLinkType, goes a long way toward showing what's really going on. In addition, this patch gets rid of EXPLAIN's use of "$n" to represent subplan and initplan output Params. Instead we now print "(SubPlan N).colX" or "(InitPlan N).colX" to represent the X'th output column of that subplan. This eliminates confusion with the use of "$n" to represent PARAM_EXTERN Params, and it's useful for the first part of this change because it eliminates needing some other indication of which subplan is referenced by a SubPlan that has a testexpr. In passing, this adds simple regression test coverage of the ROWCOMPARE_SUBLINK code paths, which were entirely unburdened by testing before. Tom Lane and Dean Rasheed, reviewed by Aleksander Alekseev. Thanks to Chantal Keller for raising the question of whether this area couldn't be improved. Discussion: https://postgr.es/m/2838538.1705692747@sss.pgh.pa.us	2024-03-19 18:19:24 -04:00
Nathan Bossart	cc4826dd5e	Inline pg_popcount{32,64} into pg_popcount(). On some systems, calls to pg_popcount{32,64} are indirected through a function pointer. This commit converts pg_popcount() to a function pointer on those systems so that we can inline the appropriate pg_popcount{32,64} implementations into each of the pg_popcount() implementations. Since pg_popcount() may call pg_popcount{32,64} several times, this can significantly improve its performance. Suggested-by: David Rowley Reviewed-by: David Rowley Discussion: https://postgr.es/m/CAApHDvrb7MJRB6JuKLDEY2x_LKdFHwVbogpjZBCX547i5%2BrXOQ%40mail.gmail.com	2024-03-19 14:46:16 -05:00
Tom Lane	b7e2121ab7	Postpone reparameterization of paths until create_plan(). When considering nestloop paths for individual partitions within a partitionwise join, if the inner path is parameterized, it is parameterized by the topmost parent of the outer rel, not the corresponding outer rel itself. Therefore, we need to translate the parameterization so that the inner path is parameterized by the corresponding outer rel. Up to now, we did this while generating join paths. However, that's problematic because we must also translate some expressions that are shared across all paths for a relation, such as restriction clauses (kept in the RelOptInfo and/or IndexOptInfo) and TableSampleClauses (kept in the RangeTblEntry). The existing code fails to translate these at all, leading to wrong answers, odd failures such as "variable not found in subplan target list", or executor crashes. But we can't modify them during path generation, because that would break things if we end up choosing some non-partitioned-join path. So this patch postpones reparameterization of the inner path until createplan.c, where it is safe to modify the referenced RangeTblEntry, RelOptInfo or IndexOptInfo, because we have made a final choice of which Path to use. We do still have to check during path generation that the reparameterization will be possible. So we introduce a new function path_is_reparameterizable_by_child() to detect that. The duplication between path_is_reparameterizable_by_child() and reparameterize_path_by_child() is a bit annoying, but there seems no other good answer. A small benefit is that we can avoid building useless reparameterized trees in cases where a non-partitioned join is ultimately chosen. Also, reparameterize_path_by_child() can now be allowed to scribble on the input paths, saving a few cycles. This fix repairs the same problems previously addressed in the back branches by commits `62f120203` et al. Richard Guo, reviewed at various times by Ashutosh Bapat, Andrei Lepikhov, Alena Rybakina, Robert Haas, and myself Discussion: https://postgr.es/m/CAMbWs496+N=UAjOc=rcD3P7B6oJe4rZw08e_TZRUsWbPxZW3Tw@mail.gmail.com	2024-03-19 14:51:58 -04:00
Peter Eisentraut	605721f819	gen_node_support.pl: Mark location fields as type alias ParseLoc Instead of the rather ugly type=int + name ~= location$, we now have a marker type for offset pointers or sizes that are only relevant when a query text is included, which decreases the complexity required in gen_node_support.pl for handling these values. Author: Matthias van de Meent <boekewurm+postgres@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/CAEze2WgrCiR3JZmWyB0YTc8HV7ewRdx13j0CqD6mVkYAW+SFGQ@mail.gmail.com	2024-03-19 16:56:44 +01:00
Daniel Gustafsson	347758b120	pg_upgrade: run all data type checks per connection The checks for data type usage were each connecting to all databases in the cluster and running their query. On clusters which have a lot of databases this can become unnecessarily expensive. This moves the checks to run in a single connection instead to minimize setup and teardown overhead. Reviewed-by: Nathan Bossart <nathandbossart@gmail.com> Reviewed-by: Justin Pryzby <pryzby@telsasoft.com> Reviewed-by: Peter Eisentraut <peter@eisentraut.org> Discussion: https://postgr.es/m/BB4C76F-D416-4F9F-949E-DBE950D37787@yesql.se	2024-03-19 13:32:50 +01:00
Peter Eisentraut	5577a71fb0	Use half-open interval notation in without_overlaps tests This way, the input literals match the output in any error messages. Author: Paul A. Jungwirth <pj@illuminatedcomputing.com> Discussion: https://www.postgresql.org/message-id/flat/CA+renyUApHgSZF9-nd-a0+OPGharLQLO=mDHcY4_qQ0+noCUVg@mail.gmail.com	2024-03-19 11:44:14 +01:00
Peter Eisentraut	49b579f92d	Fix misleading comments To match code changes in `229fb58d4f`.	2024-03-19 10:55:51 +01:00
Peter Eisentraut	a88c800deb	Use daterange and YMD in without_overlaps tests instead of tsrange. This makes things a lot easier to read, especially when we get to the FOREIGN KEY tests later. Author: Paul A. Jungwirth <pj@illuminatedcomputing.com> Discussion: https://www.postgresql.org/message-id/flat/CA+renyUApHgSZF9-nd-a0+OPGharLQLO=mDHcY4_qQ0+noCUVg@mail.gmail.com	2024-03-19 10:17:03 +01:00
Peter Eisentraut	794f10f6b9	Add some UUID support functions Add uuid_extract_timestamp() and uuid_extract_version(). Author: Andrey Borodin Reviewed-by: Sergey Prokhorenko, Kirk Wolak, Przemysław Sztoch Reviewed-by: Nikolay Samokhvalov, Jelte Fennema-Nio, Aleksander Alekseev Reviewed-by: Peter Eisentraut, Chris Travers, Lukas Fittl Discussion: https://postgr.es/m/CAAhFRxitJv%3DyoGnXUgeLB_O%2BM7J2BJAmb5jqAT9gZ3bij3uLDA%40mail.gmail.com	2024-03-19 09:32:04 +01:00
Peter Eisentraut	d56cb42b54	Activate perlcritic InputOutput::RequireCheckedSyscalls and fix resulting warnings This checks that certain I/O-related Perl functions properly check their return value. Some parts of the PostgreSQL code had been a bit sloppy about that. The new perlcritic warnings are fixed here. I didn't design any beautiful error messages, mostly just used "or die $!", which mostly matches existing code, and also this is developer-level code, so having the system error plus source code reference should be ok. Initially, we only activate this check for a subset of what the perlcritic check would warn about. The effective list is chmod flock open read rename seek symlink system The initial set of functions is picked because most existing code already checked the return value of those, so any omissions are probably unintended, or because it seems important for test correctness. The actual perlcritic configuration is written as an exclude list. That seems better so that we are clear on what we are currently not checking. Maybe future patches want to investigate checking some of the other functions. (In principle, we might eventually want to check all of them, but since this is test and build support code, not production code, there are probably some reasonable compromises to be made.) Reviewed-by: Daniel Gustafsson <daniel@yesql.se> Discussion: https://www.postgresql.org/message-id/flat/88b7d4f2-46d9-4cc7-b1f7-613c90f9a76a%40eisentraut.org	2024-03-19 07:09:31 +01:00
Jeff Davis	f9f3fb1cb7	Update src/common/unicode/README. Change the test description to include the case mapping test. Oversight in `5c40364dd6`.	2024-03-18 16:39:29 -07:00
Jeff Davis	60769c62dc	Fix another warning, introduced by `846311051e`. Discussion: https://postgr.es/m/3703896.1710799495@sss.pgh.pa.us Reported-by: Tom Lane	2024-03-18 15:34:43 -07:00
Jeff Davis	846311051e	Address more review comments on commit `2d819a08a1`. Based on comments from Peter Eisentraut. * Document CREATE DATABASE ... BUILTIN_LOCALE. * Determine required encoding based on locale name for CREATE COLLATION. Use -1 for "C" (requires catversion bump). * initdb output fixups. * Make ctype_is_c a constant true for now. * Fixups to ICU 010_create_database.pl test. Discussion: https://postgr.es/m/4135cf11-206d-40ed-96c0-9363c1232379@eisentraut.org	2024-03-18 11:58:13 -07:00
Alvaro Herrera	66ab9371a2	dblink/isolationtester/fe_utils: Use new cancel API Commit `61461a300c` introduced new functions to libpq for cancelling queries. This replaces the usage of the old ones in parts of the codebase with these newer ones. This specifically leaves out changes to psql and pgbench, as those would need a much larger refactor to be able to call them due to the new functions not being signal-safe; and also postgres_fdw, because the original code there is not clear to me (Álvaro) and not fully tested. Author: Jelte Fennema-Nio <postgres@jeltef.nl> Discussion: https://postgr.es/m/CAGECzQT_VgOWWENUqvUV9xQmbaCyXjtRRAYO8W07oqashk_N+g@mail.gmail.com	2024-03-18 19:28:58 +01:00
Jeff Davis	61f352ece9	Fix unreachable code warning from commit `2d819a08a1`. Found by Coverity. Discussion: https://postgr.es/m/3422201.1710711993@sss.pgh.pa.us Reported-by: Tom Lane	2024-03-18 09:15:47 -07:00
Peter Eisentraut	5162f663fc	Add missing source files to nls.mk	2024-03-18 16:46:22 +01:00
Alvaro Herrera	6b3678d347	Put libpq_pipeline cancel test back I disabled it in `cc6e64afda` because it was unstable; hopefully the changes here make it more stable. (Jelte proposed to submit the queries using simple query protocol, but I chose to instead make the monitor connection wait until the PgSleep wait event is seen. This is probably not a terribly meaningful increase in coverage ...) Discussion: https://postgr.es/m/CAGECzQRQh_5tSy+5cndgv08eNJ2O0Zpwn2YddJtSsmC=Wpy1BQ@mail.gmail.com	2024-03-18 13:14:55 +01:00
Heikki Linnakangas	0960ae1967	Fix EXPLAIN Bitmap heap scan to count pages with no visible tuples Previously, bitmap heap scans only counted lossy and exact pages for explain when there was at least one visible tuple on the page. heapam_scan_bitmap_next_block() returned true only if there was a "valid" page with tuples to be processed. However, the lossy and exact page counters in EXPLAIN should count the number of pages represented in a lossy or non-lossy way in the constructed bitmap, regardless of whether or not the pages ultimately contained visible tuples. Backpatch to all supported versions. Author: Melanie Plageman Discussion: https://www.postgresql.org/message-id/CAAKRu_ZwCwWFeL_H3ia26bP2e7HiKLWt0ZmGXPVwPO6uXq0vaA@mail.gmail.com Discussion: https://www.postgresql.org/message-id/CAAKRu_bxrXeZ2rCnY8LyeC2Ls88KpjWrQ%2BopUrXDRXdcfwFZGA@mail.gmail.com	2024-03-18 14:03:58 +02:00
Peter Eisentraut	48018f1d8c	Add some const decorations Discussion: https://www.postgresql.org/message-id/flat/5ac50071-f2ed-4ace-a8fd-b892cffd33eb@www.fastmail.com	2024-03-18 12:07:09 +01:00
Heikki Linnakangas	05c3980e7f	Move code for backend startup to separate file This is code that runs in the backend process after forking, rather than postmaster. Move it out of postmaster.c for clarity. Reviewed-by: Tristan Partin, Andres Freund Discussion: https://www.postgresql.org/message-id/7a59b073-5b5b-151e-7ed3-8b01ff7ce9ef@iki.fi	2024-03-18 11:38:10 +02:00
Heikki Linnakangas	aafc05de1b	Refactor postmaster child process launching Introduce new postmaster_child_launch() function that deals with the differences in EXEC_BACKEND mode. Refactor the mechanism of passing information from the parent to child process. Instead of using different command-line arguments when launching the child process in EXEC_BACKEND mode, pass a variable-length blob of startup data along with all the global variables. The contents of that blob depend on the kind of child process being launched. In !EXEC_BACKEND mode, we use the same blob, but it's simply inherited from the parent to child process. Reviewed-by: Tristan Partin, Andres Freund Discussion: https://www.postgresql.org/message-id/7a59b073-5b5b-151e-7ed3-8b01ff7ce9ef@iki.fi	2024-03-18 11:35:30 +02:00
Heikki Linnakangas	f1baed18bc	Move some functions from postmaster.c to a new source file This just moves the functions, with no other changes, to make the next commits smaller and easier to review. The moved functions are related to launching postmaster child processes in EXEC_BACKEND mode. Reviewed-by: Tristan Partin, Andres Freund Discussion: https://www.postgresql.org/message-id/7a59b073-5b5b-151e-7ed3-8b01ff7ce9ef@iki.fi	2024-03-18 11:35:05 +02:00
Heikki Linnakangas	14cddee9cc	Split registration of Win32 deadchild callback to separate function The next commit will move the internal_forkexec() function to a different source file, but it makes sense to keep all the code related to the win32 waitpid() emulation in postmaster.c. Split it off to a separate function now, to make the next commit more mechanical. Reviewed-by: Tristan Partin, Andres Freund Discussion: https://www.postgresql.org/message-id/7a59b073-5b5b-151e-7ed3-8b01ff7ce9ef@iki.fi	2024-03-18 11:35:01 +02:00
Michael Paquier	ca108be72e	Remove references to backup_fs_hot() in Cluster.pm This routine has been removed in `39969e2a1e` with the exclusive backup mode but there were still references to it. Issue noticed while working on `071e3ad59d`.	2024-03-18 14:21:32 +09:00
Nathan Bossart	949300402b	Initialize variables to placate compiler. Since commit `012460ee93`, some compilers have been warning that a couple of variables may be used uninitialized. There doesn't appear to be any actual risk, so let's just initialize these variables to 0 to silence the compiler warnings. Discussion: https://postgr.es/m/20240317192927.GA3978212%40nathanxps13	2024-03-17 20:16:15 -05:00
Daniel Gustafsson	d6607016c7	Support json_errdetail in FRONTEND code Allocate memory for the error message inside memory owned by the JsonLexContext and move responsibility away from the caller for freeing it. This means that we can partially revert `b44669b2ca` as this is now safe to use in FRONTEND code. The motivation for this comes from the OAuth and incremental JSON patchsets but it also adds value on its own. Author: Jacob Champion <jacob.champion@enterprisedb.com> Reviewed-by: Michael Paquier <michael@paquier.xyz> Discussion: https://postgr.es/m/CAOYmi+mWdTd6ujtyF7MsvXvk7ToLRVG_tYAcaGbQLvf=N4KrQw@mail.gmail.com	2024-03-17 23:56:15 +01:00
Tom Lane	33f13168cc	Mark hash_corrupted() as pg_attribute_noreturn. Coverity started complaining about this after `cc5ef90ed`. The code's not really different from before, but might as well clarify its intent.	2024-03-17 17:54:45 -04:00
Dean Rasheed	c649fa24a4	Add RETURNING support to MERGE. This allows a RETURNING clause to be appended to a MERGE query, to return values based on each row inserted, updated, or deleted. As with plain INSERT, UPDATE, and DELETE commands, the returned values are based on the new contents of the target table for INSERT and UPDATE actions, and on its old contents for DELETE actions. Values from the source relation may also be returned. As with INSERT/UPDATE/DELETE, the output of MERGE ... RETURNING may be used as the source relation for other operations such as WITH queries and COPY commands. Additionally, a special function merge_action() is provided, which returns 'INSERT', 'UPDATE', or 'DELETE', depending on the action executed for each row. The merge_action() function can be used anywhere in the RETURNING list, including in arbitrary expressions and subqueries, but it is an error to use it anywhere outside of a MERGE query's RETURNING list. Dean Rasheed, reviewed by Isaac Morland, Vik Fearing, Alvaro Herrera, Gurjeet Singh, Jian He, Jeff Davis, Merlin Moncure, Peter Eisentraut, and Wolfgang Walther. Discussion: http://postgr.es/m/CAEZATCWePEGQR5LBn-vD6SfeLZafzEm2Qy_L_Oky2=qw2w3Pzg@mail.gmail.com	2024-03-17 13:58:59 +00:00
Peter Eisentraut	6a004f1be8	Add attstattarget to FormExtraData_pg_attribute This allows setting attstattarget when a relation is created. We make use of this by having index_concurrently_create_copy() copy over the attstattarget values when the new index is created, instead of having index_concurrently_swap() fix it up later. Reviewed-by: Tomas Vondra <tomas.vondra@enterprisedb.com> Discussion: https://www.postgresql.org/message-id/flat/4da8d211-d54d-44b9-9847-f2a9f1184c76@eisentraut.org	2024-03-17 12:38:27 +01:00
Peter Eisentraut	d939cb2fd6	Generalize handling of nullable pg_attribute columns in DDL DDL code uses tuple descriptors to pass around pg_attribute values during table and index creation. But tuple descriptors don't include the variable-length/nullable columns of pg_attribute, so they have to be handled separately. Right now, the attoptions field is handled in a one-off way with a separate argument passed to InsertPgAttributeTuples(). The other affected fields of pg_attribute are right now not needed at relation creation time. The goal of this patch is to generalize this to allow handling additional variable-length/nullable columns of pg_attribute in a similar manner. For that, create a new struct FormExtraData_pg_attribute, which is to be passed around in parallel to the tuple descriptor and optionally supplies the additional columns. Right now, this struct only contains one field for attoptions, so no functionality is actually changed by this. Reviewed-by: Tomas Vondra <tomas.vondra@enterprisedb.com> Discussion: https://www.postgresql.org/message-id/flat/4da8d211-d54d-44b9-9847-f2a9f1184c76@eisentraut.org	2024-03-17 12:30:51 +01:00
Peter Eisentraut	012460ee93	Make stxstattarget nullable To match attstattarget change (commit `4f622503d6`). The logic inside CreateStatistics() is clarified a bit compared to that previous patch, and so here we also update ATExecSetStatistics() to match. Reviewed-by: Tomas Vondra <tomas.vondra@enterprisedb.com> Discussion: https://www.postgresql.org/message-id/flat/4da8d211-d54d-44b9-9847-f2a9f1184c76@eisentraut.org	2024-03-17 12:26:26 +01:00
Dean Rasheed	33e729c514	Fix EXPLAIN output for subplans in MERGE. Given a subplan in a MERGE query, EXPLAIN would sometimes fail to properly display expressions involving Params referencing variables in other parts of the plan tree. This would affect subplans outside the topmost join plan node, for which expansion of Params would go via the top-level ModifyTable plan node. The problem was that "inner_tlist" for the ModifyTable node's deparse_namespace was set to the join node's targetlist, but "inner_plan" was set to the ModifyTable node itself, rather than the join node, leading to incorrect results when descending to the referenced variable. Fix and backpatch to v15, where MERGE was introduced. Discussion: https://postgr.es/m/CAEZATCWAv-sZuH%2BwG5xJ-%2BGt7qGNGX8wUQd3XYydMFDKgRB9nw%40mail.gmail.com	2024-03-17 10:17:11 +00:00
Peter Eisentraut	20e58105ba	Separate equalRowTypes() from equalTupleDescs() This introduces a new function equalRowTypes() that is effectively a subset of equalTupleDescs() but only compares the number of attributes and attribute name, type, typmod, and collation. This is enough for most existing uses of equalTupleDescs(), which are changed to use the new function. The only remaining callers of equalTupleDescs() are those that really want to check the full tuple descriptor as such, without concern about record or row or record type semantics. The existing function hashTupleDesc() is renamed to hashRowType(), because it now corresponds more to equalRowTypes(). The purpose of this change is to be clearer about the semantics of the equality asked for by each caller. (At least one caller had a comment that questioned whether equalTupleDescs() was too restrictive.) For example, `4f622503d6` removed attstattarget from the tuple descriptor structure. It was not fully clear at the time how this should affect equalTupleDescs(). Now the answer is clear: By their own definitions, equalRowTypes() does not care, and equalTupleDescs() just compares whatever is in the tuple descriptor but does not care why it is in there. Reviewed-by: Tomas Vondra <tomas.vondra@enterprisedb.com> Discussion: https://www.postgresql.org/message-id/flat/f656d6d9-6660-4518-a006-2f65cafbebd1%40eisentraut.org	2024-03-17 05:58:04 +01:00
Daniel Gustafsson	b783186515	Add destroyStringInfo function for cleaning up StringInfos destroyStringInfo() is a counterpart to makeStringInfo(), freeing a palloc'd StringInfo and its data. This is a convenience function to align the StringInfo API with the PQExpBuffer API. Originally added in the OAuth patchset, it was extracted and committed separately in order to aid upcoming JSON work. Author: Daniel Gustafsson <daniel@yesql.se> Author: Jacob Champion <jacob.champion@enterprisedb.com> Reviewed-by: Michael Paquier <michael@paquier.xyz> Discussion: https://postgr.es/m/CAOYmi+mWdTd6ujtyF7MsvXvk7ToLRVG_tYAcaGbQLvf=N4KrQw@mail.gmail.com	2024-03-16 23:18:28 +01:00
Alexander Korotkov	927332b95e	psql: fix variable existence tab completion psql has the :{?name} syntax for testing for a psql variable existence. This commit implements a tab completion for this syntax. Notably, in order to implement this we have to remove '{' from WORD_BREAKS. It appears that '{' here from the very beginning and it comes from the default value of rl_basic_word_break_characters. And :{?name} is the only psql syntax using the '{' sign. So, removing it from WORD_BREAKS shouldn't break anything. Discussion: https://postgr.es/m/CAGRrpzZU48F2oV3d8eDLr%3D4TU9xFH5Jt9ED%2BqU1%2BX91gMH68Sw%40mail.gmail.com Author: Steve Chavez Reviewed-by: Erik Wienhold	2024-03-16 23:49:10 +02:00
Alexander Korotkov	605062227f	Use locale-aware value for \watch in 005_timeouts.pl Reported-by: Alexander Lakhin	2024-03-15 21:37:17 +02:00
Daniel Gustafsson	196eeb6b2f	Fix handling of expecteddir in pg_regress Commit `c855872074` introduced a new parameter to pg_regress to set the directory where to look for expected files, but accidentally only implemented it for when compiling pg_regress for ECPG tests. Fix by adding support for the parameter to the main regression test compilation of pg_regress as well. Backpatch to v16 where --expecteddir was introduced. Author: Anthonin Bonnefoy <anthonin.bonnefoy@datadoghq.com> Discussion: https://postgr.es/m/CAO6_Xqq5yKJHcJsq__LPcKwSY0XHRqVERNWGxx5ttNXXj7+W=A@mail.gmail.com Backpatch-through: 16	2024-03-15 17:02:07 +01:00
Heikki Linnakangas	d802ff06d0	Fix backstop in gin test if injection point is not reached Per Tom Lane's observation that the test got stuck in infinite loop if the injection_points module was not loaded. It was supposed to give up after 10000 iterations, but the backstop was broken. Discussion: https://www.postgresql.org/message-id/2498595.1710511222%40sss.pgh.pa.us	2024-03-15 17:55:12 +02:00
Heikki Linnakangas	85f65d7a26	Try to unbreak injection-fault tests in the buildfarm The buildfarm script attempts to run all tests marked as NO_INSTALLCHECK under src/test/modules without paying attention to whether they are enabled or disabled in the parent Makefile. That hasn't been a problem so far, because all the tests marked with NO_INSTALLCHECK ran unconditionally in "make check". But commit `e2e3b8ae9e` changed that: the injection fault tests are marked as NO_INSTALLCHECK, and also depend on --enable-injection-points. Try to work around that by ensuring that "make check" does nothing in the those subdirectories. We can hopefully get rid of this hack soon, after fixing the buildfarm client, or by switching to meson. Discussion: https://www.postgresql.org/message-id/8e4cf596-dd70-432e-9068-16466ed596ed%40iki.fi	2024-03-15 15:22:12 +02:00
Alexander Korotkov	7a65cc079e	Fix wordings in timeouts TAP test Reported-by: Kyotaro Horiguchi Discussion: https://postgr.es/m/20240315.104235.1835366724413653745.horikyota.ntt%40gmail.com Author: Andrey Borodin	2024-03-15 14:38:22 +02:00
Alexander Korotkov	4c2eda67f5	Fix race condition in transaction timeout TAP tests The interruption handler within the injection point can get stuck in an infinite loop while handling transaction timeout. To avoid this situation we reset the timeout flag before invoking the injection point. Author: Alexander Korotkov Reviewed-by: Andrey Borodin Discussion: https://postgr.es/m/ZfPchPC6oNN71X2J%40paquier.xyz	2024-03-15 14:38:22 +02:00
Heikki Linnakangas	a3f349c612	Improve log messages referring to background worker processes "Worker" could also mean autovacuum worker or slot sync worker, so let's be more explicit. Per Tristan Partin's suggestion. Discussion: https://www.postgresql.org/message-id/CZM6WDX5H4QI.NZG1YUCKWLA@neon.tech	2024-03-15 13:14:38 +02:00
Heikki Linnakangas	e2e3b8ae9e	Disable tests using injection points in installcheck The 'gin' test injections faults to GIN index build. If another test running concurrently in the same cluster also tries to create a GIN index, it will hit the fault, too. To fix, disable tests using injection points when running against an existing cluster. A better long-term solution would be to make the injection points scoped to the database or process, but this will do for now. Discussion: https://www.postgresql.org/message-id/CA%2BhUKGJYhcG_o2nwSK6r01eOZJwNWUJUbX%3D%3DAVnW84f-%2B8yamQ@mail.gmail.com Discussion: https://www.postgresql.org/message-id/10fd6cdd-c5d9-46fe-9fa1-7e661191309e@iki.fi	2024-03-15 13:06:57 +02:00
Michael Paquier	071e3ad59d	Add basic TAP tests for the low-level backup method, take two There are currently no tests for the low-level backup method where pg_backup_start() and pg_backup_stop() are involved while taking a file-system backup. The tests introduced in this commit rely on a background psql process to make sure that the backup is taken while the session doing the SQL start and stop calls remains alive. Two cases are checked here with the backup taken: - Recovery without a backup_label, leading to a corrupted state. - Recovery with a backup_label, with a consistent state reached. Both cases cross-check some patterns in the logs generated when running recovery. Compared to the first attempt in `99b4a63bef`, this includes a couple of fixes making the CI stable (5 runs succeeded here): - Add the file to the list of tests in meson.build. - Race condition with the first WAL segment that we expect in the primary's archives, by adding a poll on pg_stat_archiver. The second segment with the checkpoint record is archived thanks to pg_backup_stop waiting for it. - Fix failure of test where the backup_label does not exist. The cluster inherits the configuration of the first node; it was attempting to store segments in the first node's archives, triggering failures with copy on Windows. - Fix failure of test on Windows because of incorrect parsing of the backup_file in the success case. The data of the backup_label file is retrieved from the output pg_backup_stop() from a BackgroundPsql written directly to the backup's data folder. This would include CRLFs (\r\n), causing the startup process to fail at the beginning of recovery when parsing the backup_label because only LFs (\n) are allowed. Author: David Steele Discussion: https://postgr.es/m/f20fcc82-dadb-478d-beb4-1e2ffb0ace76@pgmasters.net	2024-03-15 08:29:54 +09:00
Michael Paquier	cc5ef90edd	Refactor initial hash lookup in dynahash.c The same pattern is used three times in dynahash.c to retrieve a bucket number and a hash bucket from a hash value. This has popped up while discussing improvements for the type cache, where this piece of refactoring would become useful. Note that hash_search_with_hash_value() does not need the bucket number, just the hash bucket. Author: Teodor Sigaev Reviewed-by: Aleksander Alekseev, Michael Paquier Discussion: https://postgr.es/m/5812a6e5-68ae-4d84-9d85-b443176966a1@sigaev.ru	2024-03-15 07:57:17 +09:00
David Rowley	4169850f0b	Trim ORDER BY/DISTINCT aggregate pathkeys in gather_grouping_paths Similar to `d8a295389`, trim off any PathKeys which are for ORDER BY / DISTINCT aggregate functions from the PathKey List for the Gather Merge paths created by gather_grouping_paths(). These additional PathKeys are not valid to use after grouping has taken place as these PathKeys belong to columns which are inputs to an aggregate function and, therefore are unavailable after aggregation. Reported-by: Alexander Lakhin Discussion: https://postgr.es/m/cf63174c-8c89-3953-cb49-48f41f74941a@gmail.com Backpatch-through: 16, where `1349d2790` was added	2024-03-15 11:54:36 +13:00
Daniel Gustafsson	4665cebc8a	Login event trigger documentation wordsmithing Minor wordsmithing on the login trigger documentation and code comments to improve readability, as well as fixing a few small incorrect statements in the comments. Author: Robert Treat <rob@xzilla.net> Discussion: https://postgr.es/m/CAJSLCQ0aMWUh1m6E9YdjeqV61baQ=EhteJX8XOxXg8H_2Lcr0Q@mail.gmail.com	2024-03-14 23:35:35 +01:00
Tom Lane	b4a71cf65d	Make INSERT-from-multiple-VALUES-rows handle domain target columns. Commit `a3c7a993d` fixed some cases involving target columns that are arrays or composites by applying transformAssignedExpr to the VALUES entries, and then stripping off any assignment ArrayRefs or FieldStores that the transformation added. But I forgot about domains over arrays or composites :-(. Such cases would either fail with surprising complaints about mismatched datatypes, or insert unexpected coercions that could lead to odd results. To fix, extend the stripping logic to get rid of CoerceToDomain if it's atop an ArrayRef or FieldStore. While poking at this, I realized that there's a poorly documented and not-at-all-tested behavior nearby: we coerce each VALUES column to the domain type separately, and rely on the rewriter to merge those operations so that the domain constraints are checked only once. If that merging did not happen, it's entirely possible that we'd get unexpected domain constraint failures due to checking a partially-updated container value. There's no bug there, but while we're here let's improve the commentary about it and add some test cases that explicitly exercise that behavior. Per bug #18393 from Pablo Kharo. Back-patch to all supported branches. Discussion: https://postgr.es/m/18393-65fedb1a0de9260d@postgresql.org	2024-03-14 14:57:16 -04:00
Nathan Bossart	d1162cfda8	Add pg_column_toast_chunk_id(). This function returns the chunk_id of an on-disk TOASTed value. If the value is un-TOASTed or not on-disk, it returns NULL. This is useful for identifying which values are actually TOASTed and for investigating "unexpected chunk number" errors. Bumps catversion. Author: Yugo Nagata Reviewed-by: Jian He Discussion: https://postgr.es/m/20230329105507.d764497456eeac1ca491b5bd%40sraoss.co.jp	2024-03-14 10:58:00 -05:00
Heikki Linnakangas	84c18acaf6	Remove redundant snapshot copying from parallel leader to workers The parallel query infrastructure copies the leader backend's active snapshot to the worker processes. But BitmapHeapScan node also had bespoken code to pass the snapshot from leader to the worker. That was redundant, so remove it. The removed code was analogous to the snapshot serialization in table_parallelscan_initialize(), but that was the wrong role model. A parallel bitmap heap scan is more like an independent non-parallel bitmap heap scan in each parallel worker as far as the table AM is concerned, because the coordination is done in nodeBitmapHeapscan.c, and the table AM doesn't need to know anything about it. This relies on the assumption that es_snapshot == GetActiveSnapshot(). That's not a new assumption, things would get weird if you used the QueryDesc's snapshot for visibility checks in the scans, but the active snapshot for evaluating quals, for example. This could use some refactoring and cleanup, but for now, just add some assertions. Reviewed-by: Dilip Kumar, Robert Haas Discussion: https://www.postgresql.org/message-id/5f3b9d59-0f43-419d-80ca-6d04c07cf61a@iki.fi	2024-03-14 15:18:10 +02:00
Robert Haas	2346df6fc3	Allow a no-wait lock acquisition to succeed in more cases. We don't determine the position at which a process waiting for a lock should insert itself into the wait queue until we reach ProcSleep(), and we may at that point discover that we must insert ourselves ahead of everyone who wants a conflicting lock, in which case we obtain the lock immediately. Up until now, a no-wait lock acquisition would fail in such cases, erroneously claiming that the lock couldn't be obtained immediately. Fix that by trying ProcSleep even in the no-wait case. No back-patch for now, because I'm treating this as an improvement to the existing no-wait feature. It could instead be argued that it's a bug fix, on the theory that there should never be any case whatsoever where no-wait fails to obtain a lock that would have been obtained immediately without no-wait, but I'm reluctant to interpret the semantics of no-wait that strictly. Robert Haas and Jingxian Li Discussion: http://postgr.es/m/CA+TgmobCH-kMXGVpb0BB-iNMdtcNkTvcZ4JBxDJows3kYM+GDg@mail.gmail.com	2024-03-14 08:56:06 -04:00
Alexander Korotkov	eeefd4280f	Add TAP tests for timeouts This commit adds new tests to verify that transaction_timeout, idle_session_timeout, and idle_in_transaction_session_timeout work as expected. We introduce new injection points in before throwing a timeout FATAL error and check these injection points are reached. Discussion: https://postgr.es/m/CAAhFRxiQsRs2Eq5kCo9nXE3HTugsAAJdSQSmxncivebAxdmBjQ%40mail.gmail.com Author: Andrey Borodin Reviewed-by: Alexander Korotkov	2024-03-14 13:12:15 +02:00
Alexander Korotkov	e85662df44	Fix false reports in pg_visibility Currently, pg_visibility computes its xid horizon using the GetOldestNonRemovableTransactionId(). The problem is that this horizon can sometimes go backward. That can lead to reporting false errors. In order to fix that, this commit implements a new function GetStrictOldestNonRemovableTransactionId(). This function computes the xid horizon, which would be guaranteed to be newer or equal to any xid horizon computed before. We have to do the following to achieve this. 1. Ignore processes xmin's, because they consider connection to other databases that were ignored before. 2. Ignore KnownAssignedXids, because they are not database-aware. At the same time, the primary could compute its horizons database-aware. 3. Ignore walsender xmin, because it could go backward if some replication connections don't use replication slots. As a result, we're using only currently running xids to compute the horizon. Surely these would significantly sacrifice accuracy. But we have to do so to avoid reporting false errors. Inspired by earlier patch by Daniel Shelepanov and the following discussion with Robert Haas and Tom Lane. Discussion: https://postgr.es/m/1649062270.289865713%40f403.i.mail.ru Reviewed-by: Alexander Lakhin, Dmitry Koval	2024-03-14 13:12:05 +02:00
Alvaro Herrera	cc6e64afda	Comment out noisy libpq_pipeline test libpq_pipeline's new 'cancel' test needs more research; disable it temporarily to prevent measles in the buildfarm.	2024-03-14 10:23:38 +01:00
Daniel Gustafsson	6b41ef0330	Fix documentation comment for pg_md5_hash Commit `b69aba7457` added the errstr parameter to pg_md5_hash but missed updating the synopsis in the documentation comment. The follow-up commit `587de223f0` added the parameter to the list of outputs. The returnvalue had been changed from integer to bool before that but remained in the synopsis. This fixes both. Author: Tatsuro Yamada <tatsuro.yamada@ntt.com> Discussion: https://postgr.es/m/TYYPR01MB82313576150CC86084A122CD9E292@TYYPR01MB8231.jpnprd01.prod.outlook.com	2024-03-14 09:23:37 +01:00
Amit Kapila	9c40db3b02	Fix typos in reorderbuffer.c. Author: Kyotaro Horiguchi Discussion: https://postgr.es/m/20240314.132817.1496502692848380820.horikyota.ntt@gmail.com	2024-03-14 12:12:55 +05:30
Jeff Davis	2d819a08a1	Introduce "builtin" collation provider. New provider for collations, like "libc" or "icu", but without any external dependency. Initially, the only locale supported by the builtin provider is "C", which is identical to the libc provider's "C" locale. The libc provider's "C" locale has always been treated as a special case that uses an internal implementation, without using libc at all -- so the new builtin provider uses the same implementation. The builtin provider's locale is independent of the server environment variables LC_COLLATE and LC_CTYPE. Using the builtin provider, the database collation locale can be "C" while LC_COLLATE and LC_CTYPE are set to "en_US", which is impossible with the libc provider. By offering a new builtin provider, it clarifies that the semantics of a collation using this provider will never depend on libc, and makes it easier to document the behavior. Discussion: https://postgr.es/m/ab925f69-5f9d-f85e-b87c-bd2a44798659@joeconway.com Discussion: https://postgr.es/m/dd9261f4-7a98-4565-93ec-336c1c110d90@manitou-mail.org Discussion: https://postgr.es/m/ff4c2f2f9c8fc7ca27c1c24ae37ecaeaeaff6b53.camel%40j-davis.com Reviewed-by: Daniel Vérité, Peter Eisentraut, Jeremy Schneider	2024-03-13 23:33:44 -07:00
Peter Eisentraut	6ab2e8385d	Put genbki.pl output into src/include/catalog/ directly With the makefile rules, the output of genbki.pl was written to src/backend/catalog/, and then the header files were linked to src/include/catalog/. This changes it so that the output files are written directly to src/include/catalog/. This makes the logic simpler, and it also makes the behavior consistent with the meson build system. Also, the list of catalog files is now kept in parallel in src/include/catalog/{meson.build,Makefile}, while before the makefiles had it in src/backend/catalog/Makefile. Reviewed-by: Andreas Karlsson <andreas@proxel.se> Discussion: https://www.postgresql.org/message-id/flat/21b74bdc-183d-4dd5-9c27-9378d178f459@eisentraut.org	2024-03-14 07:11:21 +01:00
Michael Paquier	6cb1b632b3	Revert "Add basic TAP tests for the low-level backup method" This reverts commit `99b4a63bef`. The test is proving to be unstable, so revert it for now. One of the failures seen involves the cluster started without the backup_label, where the archives of the primary are overwritten, causing recovery failures on Windows. This is simple to fix, but there is another issue that's creeping behind in the form of an "invalid data in file" ERROR when parsing the backup_label for the second recovery case, as an effect of the backup_label data written after retrieving its contents from pg_backup_stop(). _ Per buildfarm member sidewinder.	2024-03-14 13:19:12 +09:00
Michael Paquier	99b4a63bef	Add basic TAP tests for the low-level backup method There are currently no tests for the low-level backup method where pg_backup_start() and pg_backup_stop() are involved while taking a file-system backup. The tests introduced in this commit rely on a background psql process to make sure that the backup is taken while the session doing the SQL start and stop calls remains alive. Two cases are checked here with the backup taken: - Recovery without a backup_label, leading to a corrupted state. - Recovery with a backup_label, with a consistent state reached. Both cases cross-check some patterns in the logs generated when running recovery. Author: David Steele Discussion: https://postgr.es/m/f20fcc82-dadb-478d-beb4-1e2ffb0ace76@pgmasters.net	2024-03-14 10:49:52 +09:00
Alexander Korotkov	e820db5b56	Improve documentation for pg_stat_checkpointer fields pg_stat_checkpointer contains statistics for checkpoints and restartpoints. Before `12915a58ee` documentation said only about checkpoints implying that restartpoint is the variation of checkpoint. `12915a58ee` introduced new separate statistics fields for restartpoints. This commit explicitly documents fields that are relevant for both checkpoints and restartpoints. Reported-by: Magnus Hagander Discussion: https://postgr.es/m/CABUevExav5-SR0x%2BG9kBUMV0G8XsvSUfuyyqmYBBJi6VHns6sw%40mail.gmail.com Reviewed-by: Anton A. Melnikov	2024-03-14 02:17:59 +02:00
Nathan Bossart	ecb0fd3372	Reintroduce MAINTAIN privilege and pg_maintain predefined role. Roles with MAINTAIN on a relation may run VACUUM, ANALYZE, REINDEX, REFRESH MATERIALIZE VIEW, CLUSTER, and LOCK TABLE on the relation. Roles with privileges of pg_maintain may run those same commands on all relations. This was previously committed for v16, but it was reverted in commit `151c22deee` due to concerns about search_path tricks that could be used to escalate privileges to the table owner. Commits `2af07e2f74`, `59825d1639`, and `c7ea3f4229` resolved these concerns by restricting search_path when running maintenance commands. Bumps catversion. Reviewed-by: Jeff Davis Discussion: https://postgr.es/m/20240305161235.GA3478007%40nathanxps13	2024-03-13 14:49:26 -05:00
Robert Haas	2041bc4276	Add the system identifier to backup manifests. Before this patch, if you took a full backup on server A and then tried to use the backup manifest to take an incremental backup on server B, it wouldn't know that the manifest was from a different server and so the incremental backup operation could potentially complete without error. When you later tried to run pg_combinebackup, you'd find out that your incremental backup was and always had been invalid. That's poor timing, because nobody likes finding out about backup problems only at restore time. With this patch, you'll get an error when trying to take the (invalid) incremental backup, which seems a lot nicer. Amul Sul, revised by me. Review by Michael Paquier. Discussion: http://postgr.es/m/CA+TgmoYLZzbSAMM3cAjV4Y+iCRZn-bR9H2+Mdz7NdaJFU1Zb5w@mail.gmail.com	2024-03-13 15:12:33 -04:00
Alvaro Herrera	1ee910ce43	Hopefully make libpq_pipeline's new cancel test more reliable The newly introduced cancel test in libpq_pipeline was flaky. It's not completely clear why, but one option is that the check for "active" was actually seeing the active state for the previous query. This change should address any such race condition by first waiting until the connection is reported as idle. Author: Jelte Fennema-Nio <me@jeltef.nl> Discussion: https://postgr.es/m/CAGECzQRvmUK5-d68A+cm+fgmfht9Dv2uZ28-qq3QiaF6EAZqPQ@mail.gmail.com	2024-03-13 19:55:09 +01:00
Robert Haas	dbfc447165	Expose new function get_controlfile_by_exact_path(). This works just like get_controlfile(), but expects the path to the control file rather than the path to the data directory that contains the control file. This makes more sense in cases where the caller has already constructed the path to the control file itself. Amul Sul and Robert Haas, reviewed by Michael Paquier	2024-03-13 12:06:44 -04:00
Peter Eisentraut	97d85be365	Make the order of the header file includes consistent Similar to commit `7e735035f2`. Author: Richard Guo <guofenglinux@gmail.com> Reviewed-by: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/CAMbWs4-WhpCFMbXCjtJ%2BFzmjfPrp7Hw1pk4p%2BZpU95Kh3ofZ1A%40mail.gmail.com	2024-03-13 15:07:00 +01:00
Peter Eisentraut	6612185883	Fix incorrect format placeholders	2024-03-13 06:40:32 +01:00

... 3 4 5 6 7 ...

43091 Commits