postgresql

Commit Graph

Author	SHA1	Message	Date
Tomas Vondra	f8ce4ed78c	Allow copying files using clone/copy_file_range Adds --clone/--copy-file-range options to pg_combinebackup, to allow copying files using file cloning or copy_file_range(). These methods may be faster than the standard block-by-block copy, but the main advantage is that they enable various features provided by CoW filesystems. This commit only uses these copy methods for files that did not change and can be copied as a whole from a single backup. These new copy methods may not be available on all platforms, in which case the command throws an error (immediately, even if no files would be copied as a whole). This early failure seems better than failing later when trying to copy the first file, after performing a lot of work on earlier files. If the requested copy method is available, but a checksum needs to be recalculated (e.g. because of a different checksum type), the file is still copied using the requested method, but it is also read for the checksum calculation. Depending on the filesystem this may be more expensive than just performing the simple copy, but it does enable the CoW benefits. Initial patch by Jakub Wartak, various reworks and improvements by me. Author: Tomas Vondra, Jakub Wartak Reviewed-by: Thomas Munro, Jakub Wartak, Robert Haas Discussion: https://postgr.es/m/3024283a-7491-4240-80d0-421575f6bb23%40enterprisedb.com	2024-04-05 18:01:32 +02:00
Tom Lane	3c5ff36aba	Suppress "variable may be used uninitialized" warning. Buildfarm member caiman is showing this, which surprises me because it's very late-model gcc (14.0.1) and ought to be smart enough to know that elog(ERROR) doesn't return. But we're likely to see the same from stupider compilers too, so add a dummy initialization in our usual style.	2024-04-05 10:58:30 -04:00
Robert Haas	fe8eaa5442	docs: Merge separate chapters on built-in index AMs into one. The documentation index is getting very long, which makes it hard to find things. Since these chapters are all very similar in structure and content, merging them is a natural way of reducing the size of the toplevel index. Rather than actually combining all of the SGML into a single file, keep one file per <sect1>, and add a glue file that includes all of them. Discussion: http://postgr.es/m/CA+Tgmob7_uoYuS2=rVwpVXaRwP-UXz+++saYTC-BCZ42QzSNKQ@mail.gmail.com	2024-04-05 10:34:04 -04:00
Tomas Vondra	10e3226ba1	Align blocks in incremental backups to BLCKSZ Align blocks stored in incremental files to BLCKSZ, so that the incremental backups work well with CoW filesystems. The header of the incremental file is padded with \0 to a multiple of BLCKSZ, so that the block data (also BLCKSZ) is aligned to BLCKSZ. The padding is added only to files containing block data, so files with just the header remain small. This adds a bit of extra space, but as the number of blocks increases the overhead gets negligible very quickly. And as the padding is \0 bytes, it does compress extremely well. The alignment is important for CoW filesystems that usually require the blocks to be aligned to filesystem page size for features like block sharing, deduplication etc. to work well. With the variable sized header the blocks in the increments were not aligned at all, negating the benefits of the CoW filesystems. This matters even for non-CoW filesystems, for example when placed on a RAID array. If the block is not aligned, it may easily span multiple devices, causing read and write amplification. It might be better to align the blocks to the filesystem page, not BLCKSZ, but we have no good way to determine that. Even if we determine the page size at the time of taking the backup, the backup may move. For now the BLCKSZ seems sufficient - the filesystem page is usually 4K, so the default BLCKSZ (8K by default) is aligned to that. Author: Tomas Vondra Reviewed-by: Robert Haas, Jakub Wartak Discussion: https://postgr.es/m/3024283a-7491-4240-80d0-421575f6bb23%40enterprisedb.com	2024-04-05 16:30:01 +02:00
Alvaro Herrera	ee1cbe806d	Operate XLogCtl->log{Write,Flush}Result with atomics This removes the need to hold both the info_lck spinlock and WALWriteLock to update them. We use stock atomic write instead, with WALWriteLock held. Readers can use atomic read, without any locking. This allows for some code to be reordered: some places were a bit contorted to avoid repeated spinlock acquisition, but that's no longer a concern, so we can turn them to more natural coding. Some further changes are possible (maybe to performance wins), but in this commit I did rather minimal ones only, to avoid increasing the blast radius. Reviewed-by: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> Reviewed-by: Jeff Davis <pgsql@j-davis.com> Reviewed-by: Andres Freund <andres@anarazel.de> (earlier versions) Discussion: https://postgr.es/m/20200831182156.GA3983@alvherre.pgsql	2024-04-05 16:14:39 +02:00
Amit Kapila	6f132ed693	Allow synced slots to have their inactive_since. This commit does two things: 1) Maintains inactive_since for sync slots whenever the slot is released just like any other regular slot. 2) Ensures the value is set to the current timestamp during the promotion of standby to help correctly interpret the time after promotion. We don't want the slots to appear inactive for a long time after promotion if they haven't been synchronized recently. This would also avoid the invalidation of such slots immediately after promotion if tomorrow we have a feature that invalidates slots based on their inactivity time. Whoever acquires the slot i.e. makes the slot active will reset it to NULL. Author: Bharath Rupireddy Reviewed-by: Bertrand Drouvot, Amit Kapila, Shveta Malik, Masahiko Sawada Discussion: https://postgr.es/m/CAA4eK1KrPGwfZV9LYGidjxHeW+rxJ=E2ThjXvwRGLO=iLNuo=Q@mail.gmail.com Discussion: https://postgr.es/m/CALj2ACW4aUe-_uFQOjdWCEN-xXoLGhmvRFnL8SNw_TZ5nJe+aw@mail.gmail.com Discussion: https://postgr.es/m/CA+Tgmob_Ta-t2ty8QrKHBGnNLrf4ZYcwhGHGFsuUoFrAEDw4sA@mail.gmail.com	2024-04-05 09:48:49 +05:30
Michael Paquier	f98dbdeb51	Add "ABI_compatibility" regions to wait_event_names.txt The current design behind the automatic generation of the C code and documentation related to wait events introduced in `fa88928470` does not offer a way to attach new wait events without breaking ABI compatibility, as all the events are forcibly reordered for each section in the input file wait_event_names.txt. Adding new wait events to stable branches is something that has happened in the past, `0b6517a3b7` being a recent example of that with VERSION_FILE_SYNC, so we need a way to generate any C code for wait events while maintaining compatibility on stable branches already released. This commit solves this issue by adding a new region called "ABI_compatibility" (keyword could be updated to something else if someone had a better idea) to each section of wait_event_names.txt, so that one can add new wait events to stable branches in wait_event_names.txt while keeping the code ABI-compatible. "ABI_compatibility" has no impact on the documentation generated: all the wait events of one section are still alphabetically ordered. LWLock and Lock sections generate their C code elsewhere, so they do not need an "ABI_compatibility" region. For example, let's imagine a wait_event_names.txt like this: Section: ClassName - Foo FOO_1 "Waiting in Foo 1" FOO_2 "Waiting in Foo 2" ABI_compatibility: NEW_FOO_1 "Waiting in New Foo 1" NEW_BAR_1 "Waiting in New Bar 1" This results in the following enum, where the events in the ABI region are listed last with the same ordering as in wait_event_names.txt: typedef enum { WAIT_EVENT_FOO_1, WAIT_EVENT_FOO_2, WAIT_EVENT_NEW_FOO_1, WAIT_EVENT_NEW_BAR_1 } WaitEventFoo; New wait events added in stable branches should be added at the end of each ABI_compatibility region, and ABI_compatibility should remain empty on HEAD and unreleased stable branches. This design has been suggested by Noah Misch and me. Reported-by: Noah Misch Author: Bertrand Drouvot Reviewed-by: Michael Paquier Discussion: https://postgr.es/m/20240317183114.16@rfd.leadboat.com	2024-04-05 08:56:52 +09:00
Jeff Davis	e2a2357671	Fix test failures when language environment is not UTF-8. For tests that depend on UTF-8 encoding, force LC_COLLATE=C and LC_CTYPE=C to avoid an encoding mismatch. Reported-by: Thomas Munro Discussion: https://postgr.es/m/CA+hUKGK-ZqV1njkG_=xcCqXh2fcMkz85FTMnhS2opm4ZerH=xw@mail.gmail.com	2024-04-04 16:10:12 -07:00
Robert Haas	e57fe3824e	Fix old, misleading comment for PGRES_POLLING_ACTIVE. The comment implies that we can eventually remove this, but per discussion, we actually don't want to do that ever, in order to maintain compatibility. Jelte Fennema-Nio, reviewed by Tristan Partin Discussion: http://postgr.es/m/CAGECzQTO72jKed5461W8cytV2Msh_e+WUZjOyX_RUQCbjk4LRA@mail.gmail.com	2024-04-04 16:22:11 -04:00
Robert Haas	12b964d781	Remove reachable call to pg_unreachable(). The loop just before this uses break, not return, so this line is reachable. Commit `cafe105655` introduced this issue. Jelte Fennema-Nio, reviewed by Tristan Partin Discussion: http://postgr.es/m/CAGECzQTO72jKed5461W8cytV2Msh_e+WUZjOyX_RUQCbjk4LRA@mail.gmail.com	2024-04-04 16:22:11 -04:00
Tom Lane	096a761d68	Fix ecpg's mechanism for detecting unsupported cases in the grammar. ecpg wants to emit a warning if it parses a SQL construct that the backend can parse but will immediately throw a FEATURE_NOT_SUPPORTED error for. The way it was testing for this was to see if the string ERRCODE_FEATURE_NOT_SUPPORTED appeared anywhere in the gram.y code. This is, of course, not nearly good enough, as there are plenty of rules in gram.y that throw that error only conditionally. There was a hack dating to 2008 to suppress the warning in one rule that doesn't even exist anymore, but nothing for other cases we've created since then. End result was that you could get "unsupported feature will be passed to server" warnings while compiling perfectly good SQL code in ecpg. Somehow we'd not heard complaints about this, but it was exposed by the recent addition of an ecpg test for a SQL/JSON construct. To fix, suppress the warning if the rule contains any "if" statement. Manual comparison of gram.y with the generated preproc.y file shows that the warning is now emitted only in rules where it's sensible. This problem has existed for a long time, so back-patch to all supported branches. Discussion: https://postgr.es/m/603615.1712245382@sss.pgh.pa.us	2024-04-04 15:31:53 -04:00
Tom Lane	332d406140	Further cleanup for recent JSON-related commits. The link commands in test_json_parser/Makefile were a long way shy of a load, as evidenced by buildfarm failures. Model them on pgxs.mk's PROGRAM rule. (Probably we should have put these two test programs in different subdirectories so we could actually use the PROGRAM rule. But I won't question that decision today.)	2024-04-04 13:39:12 -04:00
Tom Lane	2497a669ef	Further cleanup for recent JSON-related commits. Add overlooked .gitignore entries. Fix test_json_parser/Makefile to use the pgxs.mk clean rule instead of fighting it. Suppresses a warning from make, at least for me.	2024-04-04 13:21:25 -04:00
Andrew Dunstan	88620824c2	Tidy up after incremental JSON parser patch Remove junk left over from non-vpath builds. Try to remedy gettext error on some platforms.	2024-04-04 12:41:55 -04:00
Andrew Dunstan	1b00fe30a6	Fix warnings re typedef redefinition in `ea7b4e9a2a` and `3311ea86ed` Per gripe from Tom Lane and the buildfarm	2024-04-04 11:36:26 -04:00
Amit Langote	6f4d63e989	Add missing initialization in transformJsonFuncExpr() `de3600452b` added some code for the new JSON_TABLE_OP to that function but failed to initialize the default_format variable. Reported-by: Erik Rijkers <er@xs4all.nl> Discussion: https://postgr.es/m/254b2fa2-2f6b-a30a-20ee-21f8a2c12a50@xs4all.nl	2024-04-04 22:01:13 +09:00
Amit Langote	2f6e78b061	Fix typo introduced in `6185c9737` Reported-by: Jian He <jian.universality@gmail.com> Discussion: https://postgr.es/m/CACJufxGHiU0p0usjh5hnR0_ByZn4tq1FC3eKAtrQgJeKU6W9kw@mail.gmail.com	2024-04-04 20:53:23 +09:00
Amit Langote	de3600452b	Add basic JSON_TABLE() functionality JSON_TABLE() allows JSON data to be converted into a relational view and thus used, for example, in a FROM clause, like other tabular data. Data to show in the view is selected from a source JSON object using a JSON path expression to get a sequence of JSON objects that's called a "row pattern", which becomes the source to compute the SQL/JSON values that populate the view's output columns. Column values themselves are computed using JSON path expressions applied to each of the JSON objects comprising the "row pattern", for which the SQL/JSON query functions added in `6185c9737c` are used. To implement JSON_TABLE() as a table function, this augments the TableFunc and TableFuncScanState nodes that are currently used to support XMLTABLE() with some JSON_TABLE()-specific fields. Note that the JSON_TABLE() spec includes NESTED COLUMNS and PLAN clauses, which are required to provide more flexibility to extract data out of nested JSON objects, but they are not implemented here to keep this commit of manageable size. Author: Nikita Glukhov <n.gluhov@postgrespro.ru> Author: Teodor Sigaev <teodor@sigaev.ru> Author: Oleg Bartunov <obartunov@gmail.com> Author: Alexander Korotkov <aekorotkov@gmail.com> Author: Andrew Dunstan <andrew@dunslane.net> Author: Amit Langote <amitlangote09@gmail.com> Author: Jian He <jian.universality@gmail.com> Reviewers have included (in no particular order): Andres Freund, Alexander Korotkov, Pavel Stehule, Andrew Alsup, Erik Rijkers, Zihong Yu, Himanshu Upadhyaya, Daniel Gustafsson, Justin Pryzby, Álvaro Herrera, Jian He Discussion: https://postgr.es/m/cd0bb935-0158-78a7-08b5-904886deac4b@postgrespro.ru Discussion: https://postgr.es/m/20220616233130.rparivafipt6doj3@alap3.anarazel.de Discussion: https://postgr.es/m/abd9b83b-aa66-f230-3d6d-734817f0995d%40postgresql.org Discussion: https://postgr.es/m/CA+HiwqE4XTdfb1nW=Ojoy_tQSRhYt-q_kb6i5d4xcKyrLC1Nbg@mail.gmail.com	2024-04-04 20:20:15 +09:00
Peter Eisentraut	a9d6c38684	pg_upgrade: Fix typo in message	2024-04-04 12:58:57 +02:00
Andrew Dunstan	222e11a10a	Use incremental parsing of backup manifests. This changes the three callers to json_parse_manifest() to use json_parse_manifest_incremental_chunk() if appropriate. In the case of the backend caller, since we don't know the size of the manifest in advance we always call the incremental parser. Author: Andrew Dunstan Reviewed-By: Jacob Champion Discussion: https://postgr.es/m/7b0a51d6-0d9d-7366-3a1a-f74397a02f55@dunslane.net	2024-04-04 06:46:40 -04:00
Andrew Dunstan	ea7b4e9a2a	Add support for incrementally parsing backup manifests This adds the infrastructure for using the new non-recursive JSON parser in processing manifests. It's important that callers make sure that the last piece of json handed to the incremental manifest parser contains the entire last few lines of the manifest, including the checksum. Author: Andrew Dunstan Reviewed-By: Jacob Champion Discussion: https://postgr.es/m/7b0a51d6-0d9d-7366-3a1a-f74397a02f55@dunslane.net	2024-04-04 06:46:40 -04:00
Andrew Dunstan	3311ea86ed	Introduce a non-recursive JSON parser This parser uses an explicit prediction stack, unlike the present recursive descent parser where the parser state is represented on the call stack. This difference makes the new parser suitable for use in incremental parsing of huge JSON documents that cannot be conveniently handled piece-wise by the recursive descent parser. One potential use for this will be in parsing large backup manifests associated with incremental backups. Because this parser is somewhat slower than the recursive descent parser, it is not replacing that parser, but is an additional parser available to callers. For testing purposes, if the build is done with -DFORCE_JSON_PSTACK, all JSON parsing is done with the non-recursive parser, in which case only trivial regression differences in error messages should be observed. Author: Andrew Dunstan Reviewed-By: Jacob Champion Discussion: https://postgr.es/m/7b0a51d6-0d9d-7366-3a1a-f74397a02f55@dunslane.net	2024-04-04 06:46:40 -04:00
Peter Eisentraut	585df02b44	Silence meson warning Commit `619bc23a1a` introduced WARNING: Project targets '>=0.54' but uses feature introduced in '0.55.0': Passing executable/found program object to script parameter of add_dist_script Work around that by wrapping the offending line in a meson version check. Author: Tristan Partin <tristan@neon.tech> Discussion: https://www.postgresql.org/message-id/flat/D096Q3NFFVH1.1T5RE4MOO9ZFH%40neon.tech	2024-04-04 11:22:07 +02:00
Etsuro Fujita	dd24098cd6	postgres_fdw: Remove useless ternary expression. There is no case where we would call pgfdw_exec_cleanup_query or pgfdw_exec_cleanup_query_{begin,end} with a NULL query string, so this expression is pointless; remove it and instead add to the latter functions an assertion ensuring the given query string is not NULL. Thinko in commit `815d61fcd`. Discussion: https://postgr.es/m/CAPmGK14mm%2B%3DUjyjoWj_Hu7c%2BQqX-058RFfF%2BqOkcMZ_Nj52v-A%40mail.gmail.com	2024-04-04 17:55:00 +09:00
David Rowley	3a4a3537a9	Secondary refactor of heap scanning functions Similar to `44086b097`, refactor heap scanning functions to be more suitable for the read stream API. Author: Melanie Plageman Discussion: https://postgr.es/m/CAAKRu_YtXJiYKQvb5JsA2SkwrsizYLugs4sSOZh3EAjKUg=gEQ@mail.gmail.com	2024-04-04 19:22:45 +13:00
Michael Paquier	2a217c3717	Coordinate emit_log_hook and all log destinations to share the same timeval Previously, the timestamp values used by emit_log_hook and all the other log destinations could differ, because the timestamps were reset before sending the logs to the server and after calling the hook. This change matters for emit_log_hook when generating log information with 'n' or 'm' in log_line_prefix through log_status_format(), or when doing direct calls to get_formatted_log_time() like in the JSON or CSV log formats. While on it, this commit fixes a couple of comments related to the formatted timestamps where the JSON was not mentioned. Oversight in `dc686681e0`, that I have noticed while reviewing this patch. Author: Kambam Vinay, Michael Paquier Discussion: https://postgr.es/m/CANiRfmsK36A0i8mnQtzaxhSm3CUCimPwJPp4WQNq53OdSNkgWg@mail.gmail.com	2024-04-04 14:15:22 +09:00
David Rowley	44086b0975	Preliminary refactor of heap scanning functions To allow the use of the read stream API added in `b5a9b18cd` for sequential scans on heap tables, here we make some adjustments to make that change less invasive and perhaps make the code easier to follow in the process. Here heapgetpage() gets broken into two functions: 1) The part which reads the block has now been moved into a function named heapfetchbuf(). 2) The part which performed pruning and populated the scan's rs_vistuples[] array is now moved into a new function named heap_prepare_pagescan(). The functionality provided by heap_prepare_pagescan() was only ever required by SO_ALLOW_PAGEMODE scans, so the branching that was previously done in heapgetpage() is no longer needed as we simply just don't call heap_prepare_pagescan() from heapgettup() in the refactored code. Author: Melanie Plageman Discussion: https://postgr.es/m/CAAKRu_YtXJiYKQvb5JsA2SkwrsizYLugs4sSOZh3EAjKUg=gEQ@mail.gmail.com	2024-04-04 16:41:13 +13:00
Michael Paquier	85230a247c	pg_regress: Save errno in emit_tap_output_v() and switch to %m emit_tap_output_v() includes some fprintf() calls for some output related to the TAP protocol, that may clobber errno and break %m. This commit makes the logging of pg_regress smarter by saving errno before restoring it in vfprintf() where the input strings are used, removing the need for strerror(). All logs are switched to %m rather than strerror(), shaving some code. This was not a problem until now as pg_regress.c did not use %m, but the change is simple enough that we have no reason to not support this placeholder, and that will avoid future mistakes if new logs that include %m are added. Author: Dagfinn Ilmari Mannsåker Reviewed-by: Peter Eisentraut, Michael Paquier Discussion: https://postgr.es/m/87sf13jhuw.fsf@wibble.ilmari.org	2024-04-04 11:33:07 +09:00
Jeff Davis	71b66171d0	CREATE INDEX: do not update stats during binary upgrade. During binary upgrade, indexes are created before the data is moved into place, so the stats computed at that point will always be zero. This is not currently a major problem, but will be when we try to preserve statistics during upgrade. Author: Corey Huinker Discussion: https://postgr.es/m/CADkLM=daPdFB8V0tgFxK-dLowFsAEzWRWJHyxij7BG3kBjcouA@mail.gmail.com	2024-04-03 16:12:45 -07:00
Tom Lane	06286709ee	Invent SERIALIZE option for EXPLAIN. EXPLAIN (ANALYZE, SERIALIZE) allows collection of statistics about the volume of data emitted by a query, as well as the time taken to convert the data to the on-the-wire format. Previously there was no way to investigate this without actually sending the data to the client, in which case network transmission costs might swamp what you wanted to see. In particular this feature allows investigating the costs of de-TOASTing compressed or out-of-line data during formatting. Stepan Rutz and Matthias van de Meent, reviewed by Tomas Vondra and myself Discussion: https://postgr.es/m/ca0adb0e-fa4e-c37e-1cd7-91170b18cae1@gmx.de	2024-04-03 17:41:57 -04:00
Alexander Korotkov	97ce821e3e	Fix the parameters order for TableAmRoutine.relation_copy_for_cluster() Specify OldTable first, NewTable second as used by table_relation_copy_for_cluster() and as implemented in heapam_relation_copy_for_cluster(). Backpatch to PostgreSQL 12, where TableAmRoutine was introduced. Discussion: https://postgr.es/m/ME3P282MB3166860D4911AE82F92DF7C5B63F2%40ME3P282MB3166.AUSP282.PROD.OUTLOOK.COM Author: Japin Li Reviewed-by: Pavel Borisov Backpatch-through: 12	2024-04-04 00:34:28 +03:00
Robert Haas	f470b5c679	docs: Demote "Monitoring Disk Usage" from chapter to section. This chapter is very short, and the immediately preceding chapter is called "Monitoring Database Activity". So, instead of having a separate chapter for this, make it the last section of the preceding chapter instead. Discussion: http://postgr.es/m/CA+Tgmob7_uoYuS2=rVwpVXaRwP-UXz+++saYTC-BCZ42QzSNKQ@mail.gmail.com	2024-04-03 16:09:41 -04:00
Alvaro Herrera	c9920a9068	Split XLogCtl->LogwrtResult into separate struct members After this change we have XLogCtl->logWriteResult and ->logFlushResult. There's no functional change, other than the fact that the assignment from shared memory to local is no longer done via struct assignment, but instead using a macro that copies each member separately. The current representation is inconvenient going forward; notably, we would like to add a new member "Copy" (to keep track of the last position copied into WAL buffers), so the symmetry between the values in shared memory vs. those in local would be lost. This also gives us freedom to later change the concurrency model for the values in shared memory: we can make them use atomics instead of relying on the info_lck spinlock. Reviewed-by: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> Discussion: https://postgr.es/m/202404031119.cd2kugjk2vho@alvherre.pgsql	2024-04-03 19:55:11 +02:00
Nathan Bossart	deb1486c7d	Inline pg_popcount() for small buffers. If there aren't many bytes to process, the function call overhead of the optimized implementation isn't worth taking, so instead we inline a loop that consults pg_number_of_ones in that case. If there are many bytes to process, we accept the function call overhead because the optimized versions are likely to be faster. The threshold at which we use the optimized implementation is set to the smallest amount of data required to use special popcount instructions. Reviewed-by: Alvaro Herrera, Tom Lane Discussion: https://postgr.es/m/20240402155301.GA2750455%40nathanxps13	2024-04-03 12:22:02 -05:00
Heikki Linnakangas	6dbb490261	Combine freezing and pruning steps in VACUUM Execute both freezing and pruning of tuples in the same heap_page_prune() function, now called heap_page_prune_and_freeze(), and emit a single WAL record containing all changes. That reduces the overall amount of WAL generated. This moves the freezing logic from vacuumlazy.c to the heap_page_prune_and_freeze() function. The main difference in the coding is that in vacuumlazy.c, we looked at the tuples after the pruning had already happened, but in heap_page_prune_and_freeze() we operate on the tuples before pruning. The heap_prepare_freeze_tuple() function is now invoked after we have determined that a tuple is not going to be pruned away. VACUUM no longer needs to loop through the items on the page after pruning. heap_page_prune_and_freeze() does all the work. It now returns the list of dead offsets, including existing LP_DEAD items, to the caller. Similarly it's now responsible for tracking 'all_visible', 'all_frozen', and 'hastup' on the caller's behalf. Author: Melanie Plageman <melanieplageman@gmail.com> Discussion: https://www.postgresql.org/message-id/20240330055710.kqg6ii2cdojsxgje@liskov	2024-04-03 19:32:28 +03:00
Heikki Linnakangas	26d138f644	Refactor how heap_prune_chain() updates prunable_xid In preparation for freezing and counting tuples which are not candidates for pruning, split heap_prune_record_unchanged() into multiple functions, depending on the kind of line pointer. That's not too interesting right now, but makes the next commit smaller. Recording the lowest soon-to-be prunable xid is one of the actions we take for unchanged LP_NORMAL item pointers but not for others, so move that to the new heap_prune_record_unchanged_lp_normal() function. The next commit will add more actions to these functions. Author: Melanie Plageman <melanieplageman@gmail.com> Discussion: https://www.postgresql.org/message-id/20240330055710.kqg6ii2cdojsxgje@liskov	2024-04-03 19:32:21 +03:00
Alvaro Herrera	be2f073100	Fix zeroing of pg_serial page without SLRU bank lock Bug in commit 53c2a97a9266: we failed to acquire the correct SLRU bank lock when iterating to zero-out intermediate pages in predicate.c. Rewrite the code block so that we follow the locking protocol correctly. Also update an outdated comment in the same file -- SerialSLRULock exists no more. Reported-by: Alexander Lakhin <exclusion@gmail.com> Reviewed-by: Dilip Kumar <dilipbalaut@gmail.com> Discussion: https://postgr.es/m/2a25eaf4-a3a4-5fd1-6241-9d7c73142085@gmail.com	2024-04-03 17:49:44 +02:00
Alexander Korotkov	bf1e650806	Use the pairing heap instead of a flat array for LSN replay waiters `06c418e163` introduced the pg_wal_replay_wait() procedure, which allows waiting for a particular LSN to be replayed on a standby. The waiters were stored in a flat array. Even though scanning small arrays is fast, that might be a problem at scale (a lot of waiting processes). This commit replaces the flat shared memory array with a pairing heap, which holds the waiter with the least LSN at the top. This gives us O(log N) complexity for both inserting and removing waiters. Reported-by: Alvaro Herrera Discussion: https://postgr.es/m/202404030658.hhj3vfxeyhft%40alvherre.pgsql	2024-04-03 18:15:41 +03:00
Daniel Gustafsson	936e3fa378	Drop global objects after completed test Project policy is to not leave global objects behind after a regress test run. This was found as a result of the development of a patch to make pg_regress detect such leftovers automatically, which in the end was withdrawn due to issues with parallel runs. Discussion: https://postgr.es/m/E1phvk7-000VAH-7k@gemulon.postgresql.org	2024-04-03 13:33:25 +02:00
Amit Kapila	2ec005b4e2	Ensure that the sync slots reach a consistent state after promotion without losing data. We were directly copying the LSN locations while syncing the slots on the standby. Now, it is possible that at some particular restart_lsn there are some running xacts, which means if we start reading the WAL from that location after promotion, we won't reach a consistent snapshot state at that point. However, on the primary, we would have already been in a consistent snapshot state at that restart_lsn so we would have just serialized the existing snapshot. To avoid this problem we will use the advance_slot functionality unless the snapshot already exists at the synced restart_lsn location. This will help us to ensure that snapbuilder/slot statuses are updated properly without generating any changes. Note that the synced slot will remain as RS_TEMPORARY till the decoding from corresponding restart_lsn can reach a consistent snapshot state after which they will be marked as RS_PERSISTENT. Per buildfarm Author: Hou Zhijie Reviewed-by: Bertrand Drouvot, Shveta Malik, Bharath Rupireddy, Amit Kapila Discussion: https://postgr.es/m/OS0PR01MB5716B3942AE49F3F725ACA92943B2@OS0PR01MB5716.jpnprd01.prod.outlook.com	2024-04-03 14:04:59 +05:30
Alexander Korotkov	e37662f221	Minor improvements for waitlsn.c * Remove extra includes * Fill 'cur' in addLSNWaiter() before taking the spinlock * Initialize 'endtime' with zero in WaitForLSN() to avoid compiler warning Reported-by: Alvaro Herrera, Masahiko Sawada, Daniel Gustafsson Discussion: https://postgr.es/m/202404030658.hhj3vfxeyhft%40alvherre.pgsql Discussion: https://postgr.es/m/CAD21AoAx7irptnPH1OkkkNh9E0M6X-phfX7sYZfwoMsc1qV1sQ%40mail.gmail.com	2024-04-03 11:32:39 +03:00
Daniel Gustafsson	9301308bd1	Fix indentation from `cafe105655` Per buildfarm animal koel	2024-04-03 09:44:47 +02:00
Daniel Gustafsson	226261f387	Add error codes to some PANIC/FATAL errors reports This adds errcodes to a set of PANIC and FATAL errors in xlog.c and relcache.c, which previously had no errcode at all set, in order to make fleetwide analysis of errorlogs easier. There are many more ereport/elogs left which could benefit from having an errcode but this at least makes a dent in the issue. Author: Nazir Bilal Yavuz <byavuz81@gmail.com> Discussion: https://postgr.es/m/CAN55FZ1k8LgLEqncPGmz_fWnrobV6bjABOTH4tOWta6xNcPQig@mail.gmail.com	2024-04-03 09:19:25 +02:00
Nathan Bossart	c627d944e6	Add built-in ERROR handling for archive callbacks. Presently, the archiver process restarts when an archive callback ERRORs. To avoid this, archive module authors can use sigsetjmp(), manage a memory context, etc., but that requires a lot of extra code that will likely look roughly the same between modules. This commit adds basic archive callback ERROR handling to pgarch.c so that module authors won't ordinarily need to worry about this. While this built-in handler attempts to clean up anything that an archive module could conceivably have left behind, it is possible that some modules are doing unexpected things that require additional cleanup. Module authors should be sure to do any extra required cleanup in a PG_CATCH block within the archiving callback. The archiving callback is now called in a short-lived memory context that the archiver process resets between invocations. If a module requires longer-lived storage, it must maintain its own memory context. Thanks to these changes, the basic_archive module can be greatly simplified. Suggested-by: Andres Freund Reviewed-by: Andres Freund, Yong Li Discussion: https://postgr.es/m/20230217215624.GA3131134%40nathanxps13	2024-04-02 22:28:11 -05:00
Masahiko Sawada	5bec1d6bc5	Improve eviction algorithm in ReorderBuffer using max-heap for many subtransactions. Previously, when selecting the transaction to evict during logical decoding, we check all transactions to find the largest transaction. This could lead to a significant replication lag especially in the case where there are many subtransactions. This commit improves the eviction algorithm in ReorderBuffer using the max-heap with transaction size as the key to efficiently find the largest transaction. The max-heap starts with empty. While the max-heap is empty, we don't do anything for the max-heap when updating the memory counter. Therefore, we get the largest transaction in O(N) time, where N is the number of transactions including top-level transactions and subtransactions. We build the max-heap just before selecting the largest transactions if the number of transactions being decoded is higher than the threshold, MAX_HEAP_TXN_COUNT_THRESHOLD. After building the max-heap, we also update the max-heap when updating the memory counter. The intention is to efficiently find the largest transaction in O(1) time instead of incurring the cost of memory counter updates (O(log N)). Once the number of transactions got lower than the threshold, we reset the max-heap. The performance benchmark results showed significant speed up (more than x30 speed up on my machine) in decoding a transaction with 100k subtransactions, whereas there is no visible overhead in other cases. Reviewed-by: Amit Kapila, Hayato Kuroda, Vignesh C, Ajin Cherian, Tomas Vondra, Shubham Khanna, Peter Smith, Álvaro Herrera, Euler Taveira Discussion: https://postgr.es/m/CAD21AoAfKTgrBrLq96GcTv9d6k97zaQcDM-rxfKEt4GSe0qnaQ%40mail.gmail.com	2024-04-03 11:40:42 +09:00
David Rowley	7487044d6c	Don't adjust ressortgroupref in generate_setop_child_grouplist() This is already done inside assignSortGroupRef(), therefore is redundant. Oversight from `66c0185a3`. Reported-by: Tom Lane Discussion: https://postgr.es/m/3703023.1711654574@sss.pgh.pa.us	2024-04-03 15:39:29 +13:00
Masahiko Sawada	b840508644	Add functions to binaryheap for efficient key removal and update. Previously, binaryheap didn't support updating a key and removing a node in an efficient way. For example, in order to remove a node from the binaryheap, the caller had to pass the node's position within the array that the binaryheap internally has. Removing a node from the binaryheap is done in O(log n) but searching for the key's position is done in O(n). This commit adds a hash table to binaryheap in order to track the position of each node in the binaryheap. That way, by using newly added functions such as binaryheap_update_up() etc., both updating a key and removing a node can be done in O(1) on average and O(log n) in the worst case. This is known as the indexed binary heap. The caller can specify to use the indexed binaryheap by passing indexed = true. The current code does not use the new indexing logic, but it will be used by an upcoming patch. Reviewed-by: Vignesh C, Peter Smith, Hayato Kuroda, Ajin Cherian, Tomas Vondra, Shubham Khanna Discussion: https://postgr.es/m/CAD21AoDffo37RC-eUuyHJKVEr017V2YYDLyn1xF_00ofptWbkg%40mail.gmail.com	2024-04-03 10:44:21 +09:00
Masahiko Sawada	bcb14f4abc	Make binaryheap enlargeable. The node array space of the binaryheap is doubled when there is no available space. Reviewed-by: Vignesh C, Peter Smith, Hayato Kuroda, Ajin Cherian, Tomas Vondra, Shubham Khanna Discussion: https://postgr.es/m/CAD21AoDffo37RC-eUuyHJKVEr017V2YYDLyn1xF_00ofptWbkg%40mail.gmail.com	2024-04-03 10:27:43 +09:00
Alexander Korotkov	2c91e13013	Move WaitLSNShmemInit() to CreateOrAttachShmemStructs() Thanks to Andres Freund, Thomas Munro and David Rowley for investigating this issue. Discussion: https://postgr.es/m/CAPpHfdvap5mMLikt8CUjA0osAvCJHT0qnYeR3f84EJ_Kvse0mg%40mail.gmail.com	2024-04-03 02:55:03 +03:00
David Rowley	3b1a7eb289	Don't zero tuple_fraction when planning UNIONs with ORDER BYs Since `66c0185a3`, the planner is able to use Merge Append -> Unique to implement UNION queries and each subquery is prompted to produce Paths correctly sorted by the UNION's targetlist. Here we remove some now redundant code which was zeroing the tuple_fraction at the parent level. This will allow the planner to consider cheap startup paths when planning the UNION's subqueries. EXCEPT and INTERSECT set operations still have the tuple_fraction zeroed in generate_nonunion_paths(). These operations currently always read all of their subqueries' tuples. Reported-by: Tom Lane Discussion: https://postgr.es/m/3703023.1711654574@sss.pgh.pa.us	2024-04-03 11:40:33 +13:00
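
The indexed binary heap described in b840508644 (and used for eviction in 5bec1d6bc5) can be illustrated with a minimal sketch. This is not PostgreSQL's binaryheap code or API: the class and method names below are hypothetical, it is written in Python rather than C, and a plain dict stands in for the hash table that tracks each node's position. The point it tries to show is that the position lookup no longer needs an O(n) scan, so an update or removal only pays for the sift.

```python
class IndexedMaxHeap:
    """Max-heap of [key, item] nodes plus a dict mapping item -> array index.

    Hypothetical illustration only; names do not match PostgreSQL's binaryheap.
    """

    def __init__(self):
        self._nodes = []  # heap array of [key, item]
        self._pos = {}    # item -> index in self._nodes (the "hash table")

    def __len__(self):
        return len(self._nodes)

    def _swap(self, i, j):
        # Swap two nodes and keep the position index in sync.
        self._nodes[i], self._nodes[j] = self._nodes[j], self._nodes[i]
        self._pos[self._nodes[i][1]] = i
        self._pos[self._nodes[j][1]] = j

    def _sift_up(self, i):
        while i > 0:
            parent = (i - 1) // 2
            if self._nodes[parent][0] >= self._nodes[i][0]:
                break
            self._swap(i, parent)
            i = parent

    def _sift_down(self, i):
        n = len(self._nodes)
        while True:
            largest = i
            for child in (2 * i + 1, 2 * i + 2):
                if child < n and self._nodes[child][0] > self._nodes[largest][0]:
                    largest = child
            if largest == i:
                break
            self._swap(i, largest)
            i = largest

    def add(self, item, key):
        self._nodes.append([key, item])
        self._pos[item] = len(self._nodes) - 1
        self._sift_up(len(self._nodes) - 1)

    def peek(self):
        # The largest key is always at the root.
        key, item = self._nodes[0]
        return item, key

    def update(self, item, key):
        # Find the node through the index instead of scanning the array.
        i = self._pos[item]
        old = self._nodes[i][0]
        self._nodes[i][0] = key
        if key > old:
            self._sift_up(i)
        else:
            self._sift_down(i)

    def remove(self, item):
        i = self._pos.pop(item)
        last = self._nodes.pop()
        if i < len(self._nodes):
            # Move the former last node into the hole and restore heap order.
            self._nodes[i] = last
            self._pos[last[1]] = i
            self._sift_up(i)
            self._sift_down(i)


# Usage: transaction ids keyed by (made-up) memory sizes.
heap = IndexedMaxHeap()
for xid, size in [(100, 4096), (101, 512), (102, 8192)]:
    heap.add(xid, size)
heap.update(101, 16384)  # transaction 101 grew
heap.remove(102)         # transaction 102 finished
largest = heap.peek()    # candidate for eviction
```

In this sketch the transaction ids and sizes are invented values; the eviction candidate is simply whatever sits at the root of the heap.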

58450 Commits (All Branches)