postgresql

Commit Graph

Author	SHA1	Message	Date
David Rowley	2d3389c28c	Doc: document cases where queryid is stable The documents were clear that queryid should not be assumed to be stable between major versions but said nothing about minor versions and left the reader to guess if that was implied by the mention of the instability of queryid between major versions. Here we give minor versions an explicit mention to indicate queryid can generally be assumed stable between minor versions. Reviewed-by: Michael Paquier Discussion: https://postgr.es/m/CAApHDvpYGE6h0cD9UO-eHySPynPj1L3J%3DHxT%2BA7Ud8_Yo6AuzA%40mail.gmail.com Backpatch-through: 12	2024-04-20 13:53:35 +12:00
Tomas Vondra	41d2c6f952	Add missing index_insert_cleanup calls The optimization for inserts into BRIN indexes added by `c1ec02be1d` relies on a cache that needs to be explicitly released after calling index_insert(). The commit however failed to invoke the cleanup in validate_index(), which calls index_insert() indirectly through table_index_validate_scan(). After inspecting index_insert() callers, it seems unique_key_recheck() is missing the call too. Fixed by adding the two missing index_insert_cleanup() calls. The commit does two additional improvements. The aminsertcleanup() signature is modified to have the index as the first argument, to make it more like the other AM callbacks. And the aminsertcleanup() callback is invoked even if the ii_AmCache is NULL, so that it can decide if the cleanup is necessary. Author: Alvaro Herrera, Tomas Vondra Reported-by: Alexander Lakhin Discussion: https://postgr.es/m/202401091043.e3nrqiad6gb7@alvherre.pgsql	2024-04-19 16:08:34 +02:00
Tomas Vondra	95d14b7ae2	Fix a couple typos in BRIN code Typos introduced by commits `c1ec02be1d`, `b437571714` and `dae761a87e`. Author: Alvaro Herrera Reported-by: Alexander Lakhin Discussion: https://postgr.es/m/202401091043.e3nrqiad6gb7@alvherre.pgsql	2024-04-19 15:43:17 +02:00
Daniel Gustafsson	e6488fcb05	Doc: Remove mention of @ and ~ GiST operators These operators were removed by `2f70fdb064` in the v14 cycle but they were accidentally left in the table of build-in operator classes. Backpatch down to v14 where the operators where removed. Author: Aleksander Alekseev <aleksander@timescale.com> Reported-by: Colin Caine <cmcaine@gmail.com> Discussion: https://postgr.es/m/CADwQTQbbr2UQ_fpbyc+8ay=RwEYgYk=TZxH3+RHDqAQfoG+EWA@mail.gmail.com Backpatch-through: v14	2024-04-19 14:50:10 +02:00
Daniel Gustafsson	84db9a0eb1	Doc: Update link to the mentioned subsection This updates the link from pg_createsubscriber to initial data sync to actually link to the subsection in question as opposed to the main logical replication section. Author: Pavel Luzanov <p.luzanov@postgrespro.ru> Discussion: https://postgr.es/m/a4af555a-ac60-4416-877d-0440d29b8763@postgrespro.ru	2024-04-18 23:02:39 +02:00
Daniel Gustafsson	950d4a2cb1	Fix typos and duplicate words This fixes various typos, duplicated words, and tiny bits of whitespace mainly in code comments but also in docs. Author: Daniel Gustafsson <daniel@yesql.se> Author: Heikki Linnakangas <hlinnaka@iki.fi> Author: Alexander Lakhin <exclusion@gmail.com> Author: David Rowley <dgrowleyml@gmail.com> Author: Nazir Bilal Yavuz <byavuz81@gmail.com> Discussion: https://postgr.es/m/3F577953-A29E-4722-98AD-2DA9EFF2CBB8@yesql.se	2024-04-18 21:28:07 +02:00
Robert Haas	fbed6ebe41	Remove spurious "the". Spotted by Martin Marqués. Discussion: http://postgr.es/m/CABeG9LvQMtsKrOkhcA_mKJu1duArw4v+smeJKurYGjPVBZFecg@mail.gmail.com	2024-04-18 13:06:27 -04:00
Robert Haas	2e2d4604d9	docs: Mention that pg_combinebackup does not verify backups. We don't want users to think that pg_combinebackup is trying to check the validity of individual backups, because it isn't. Adjust the wording about sanity checks to make it clear that verification of individual backups is the job of pg_verifybackup, and that the checks performed by pg_combinebackup are around the relationships between the backups. Per discussion with David Steele. Discussion: http://postgr.es/m/e6f930c3-590c-47b9-b094-217bb2a3e22e@pgmasters.net	2024-04-18 09:52:19 -04:00
Amit Langote	ef744ebb73	SQL/JSON: Miscellaneous fixes and improvements This addresses some post-commit review comments for commits `6185c973`, `de3600452`, and 9425c596a0, with the following changes: * Fix JSON_TABLE() syntax documentation to use the term "path_expression" for JSON path expressions instead of "json_path_specification" to be consistent with the other SQL/JSON functions. * Fix a typo in the example code in JSON_TABLE() documentation. * Rewrite some newly added comments in jsonpath.h. * In JsonPathQuery(), add missing cast to int before printing an enum value. Reported-by: Jian He <jian.universality@gmail.com> Discussion: https://postgr.es/m/CACJufxG_e0QLCgaELrr2ZNz7AxPeGCNKAORe3fHtFCQLsH4J4Q@mail.gmail.com	2024-04-18 14:46:43 +09:00
Masahiko Sawada	f6f8ac8e75	doc: Fix COPY ON_ERROR option syntax synopsis. ON_ERROR option values don't require quotations, which was inconsistent with the syntax synopsis in the documentation. Oversight in `b725b7eec4`. Author: Atsushi Torikoshi Reviewed-by: Masahiko Sawada Discussion: https://postgr.es/m/CAD21AoC%3Dn4xR3%2BKQiqodnfT9chSB62XwZqmMff39H%3Dx9DS4scQ%40mail.gmail.com	2024-04-17 16:10:18 +09:00
Tom Lane	6f0cef9353	Fix assorted bugs in ecpg's macro mechanism. The code associated with EXEC SQL DEFINE was unreadable and full of bugs, notably: * It'd attempt to free a non-malloced string if the ecpg program tries to redefine a macro that was defined on the command line. * Possible memory stomp if user writes "-D=foo". * Undef'ing or redefining a macro defined on the command line would change the state visible to the next file, when multiple files are specified on the command line. (While possibly that could have been an intentional choice, the code clearly intends to revert to the original macro state; it's just failing to consider this interaction.) * Missing "break" in defining a new macro meant that redefinition of an existing name would cause an extra entry to be added to the definition list. While not immediately harmful, a subsequent undef would result in the prior entry becoming visible again. * The interactions with input buffering are subtle and were entirely undocumented. It's not that surprising that we hadn't noticed these bugs, because there was no test coverage at all of either the -D command line switch or multiple input files. This patch adds such coverage (in a rather hacky way I guess). In addition to the code bugs, the user documentation was confused about whether the -D switch defines a C macro or an ecpg one, and it failed to mention that you can write "-Dsymbol=value". These problems are old, so back-patch to all supported branches. Discussion: https://postgr.es/m/998011.1713217712@sss.pgh.pa.us	2024-04-16 12:31:42 -04:00
Robert Haas	09d9800e52	docs: Consolidate into new "WAL for Extensions" chapter. Previously, we had consecutive, very short chapters called "Generic WAL" and "Custom WAL Resource Managers," explaining different approaches to the same problem. Merge them into a single chapter. Explain most of the differences between the approaches in the chapter's introductory text, rather than in the individual sections. Discussion: http://postgr.es/m/46ac50c1-6b2a-404f-a683-b67af6ab56e9@eisentraut.org	2024-04-15 15:57:13 -04:00
Nathan Bossart	953cf49e16	doc: Note exceptions for SET ROLE's effect on privilege checks. The documentation for SET ROLE states that superusers who switch to a non-superuser role lose their superuser privileges. While this is true for most commands, there are exceptions such as SET ROLE and SET SESSION AUTHORIZATION, which continue to use the current session user and the authenticated user, respectively. Furthermore, the description of this command already describes its effect, so it is arguably unnecessary to include this special case. This commit removes the note about the superuser case and adds a sentence about the aforementioned exceptions to the description. Co-authored-by: Yurii Rashkovskii Reviewed-by: Shubham Khanna, Robert Haas, Michael Paquier Discussion: https://postgr.es/m/CA%2BRLCQysHtME0znk2KUMJN343ksboSRQSU-hCnOjesX6VK300Q%40mail.gmail.com	2024-04-15 14:03:24 -05:00
Alexander Korotkov	9dfcac8e15	Grammar fixes for split/merge partitions code The fixes relate to comments, error messages, and corresponding expected output of regression tests. Discussion: https://postgr.es/m/CAMbWs49DDsknxyoycBqiE72VxzL_sYHF6zqL8dSeNehKPJhkKg%40mail.gmail.com Discussion: https://postgr.es/m/86bfd241-a58c-479a-9a72-2c67a02becf8%40postgrespro.ru Discussion: https://postgr.es/m/CAHewXNkGMPU50QG7V6Q60JGFORfo8LfYO1_GCkCa0VWbmB-fEw%40mail.gmail.com Author: Richard Guo, Dmitry Koval, Tender Wang	2024-04-15 16:00:02 +03:00
Peter Eisentraut	9895b35cb8	Fix ALTER DOMAIN NOT NULL syntax This addresses a few problems with commit `e5da0fe3c2` ("Catalog domain not-null constraints"). In CREATE DOMAIN, a NOT NULL constraint looks like CREATE DOMAIN d1 AS int [ CONSTRAINT conname ] NOT NULL (Before `e5da0fe3c2`, the constraint name was accepted but ignored.) But in ALTER DOMAIN, a NOT NULL constraint looks like ALTER DOMAIN d1 ADD [ CONSTRAINT conname ] NOT NULL VALUE where VALUE is where for a table constraint the column name would be. (This works as of `e5da0fe3c2`. Before `e5da0fe3c2`, this syntax resulted in an internal error.) But for domains, this latter syntax is confusing and needlessly inconsistent between CREATE and ALTER. So this changes it to just ALTER DOMAIN d1 ADD [ CONSTRAINT conname ] NOT NULL (None of these syntaxes are per SQL standard; we are just living with the bits of inconsistency that have built up over time.) In passing, this also changes the psql \dD output to not show not-null constraints in the column "Check", since it's already shown in the column "Nullable". This has also been off since `e5da0fe3c2`. Reviewed-by: jian he <jian.universality@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/9ec24d7b-633d-463a-84c6-7acff769c9e8%40eisentraut.org	2024-04-15 08:34:45 +02:00
Noah Misch	68ba46dfe3	Correct "improve role option documentation". This corrects doc commit `21912e3c02`. Back-patch to v16, like that one. Reviewed by David G. Johnston. Discussion: https://postgr.es/m/20240331061642.07@rfd.leadboat.com	2024-04-13 07:56:14 -07:00
Heikki Linnakangas	4cc1c76fe9	Document PG_TEST_EXTRA=libpq_encryption and also check 'kerberos' In the libpq encryption negotiation tests, don't run the GSSAPI tests unless PG_TEST_EXTRA='kerberos' is also set. That makes it possible to still run most of the tests when GSSAPI support is compiled in, but there's no MIT Kerberos installation.	2024-04-12 19:52:39 +03:00
Tom Lane	6d4f062714	Doc: fix bogus to_date() examples. November doesn't have 31 days. Remarkably, this thinko has escaped detection since commit `3f1998727`. Noted by Y. Saburov. Discussion: https://postgr.es/m/171276122213.681.531905738590773705@wrigleys.postgresql.org	2024-04-11 11:09:00 -04:00
Alexander Korotkov	772faafca1	Revert: Implement pg_wal_replay_wait() stored procedure This commit reverts `06c418e163`, `e37662f221`, `bf1e650806`, `25f42429e2`, `ee79928441`, and `74eaf66f98` per review by Heikki Linnakangas. Discussion: https://postgr.es/m/b155606b-e744-4218-bda5-29379779da1a%40iki.fi	2024-04-11 17:28:15 +03:00
Daniel Gustafsson	52b49b796c	Doc: Update ulinks to RFC documents to avoid redirect The tools.ietf.org site has been decommissioned and replaced by a number of sites serving various purposes. Links to RFCs and BCPs are now 301 redirected to their new respective IETF sites. Since this serves no purpose and only adds network overhead, update our links to the new locations. Backpatch to all supported versions. Discussion: https://postgr.es/m/3C1CEA99-FCED-447D-9858-5A579B4C6687@yesql.se Backpatch-through: v12	2024-04-10 13:53:25 +02:00
Alexander Korotkov	ff9f72c68f	revert: Transform OR clauses to ANY expression This commit reverts `72bd38cc99` due to implementation and design issues. Reported-by: Tom Lane Discussion: https://postgr.es/m/3604469.1712628736%40sss.pgh.pa.us	2024-04-10 02:28:09 +03:00
David Rowley	b1b13d2b52	Doc: use "an SQL" instead of "a SQL" Although which is correct depends entirely on whether you pronounce SQL as "ess-que-ell" or "sequel", we have standardized on the former in our user-facing documentation, so use the correct article according to that pronunciation. Discussion: https://postgr.es/m/CAApHDvp3osQwQam+wNTp9BdhP+QfWO6aY6ZTixQQMfM-UArKCw@mail.gmail.com	2024-04-10 10:43:31 +12:00
Daniel Gustafsson	ad55cc9845	doc: Remove stray comma from list of psql options Back in 7.2 the list of options had short options and long options on the same line separated by comma, but since 7.3 they are listed separate lines. The comma on -X was left behind so fix by removing and backpatching all the way. Reported-by: y.saburov@gmail.com Discussion: https://postgr.es/m/171267154345.684.7212826057932148541@wrigleys.postgresql.org Backpatch-through: v12	2024-04-09 23:39:38 +02:00
Peter Eisentraut	43a9cab484	Fix whitespace	2024-04-09 11:32:48 +02:00
Heikki Linnakangas	e9f29233fd	Fix typo in docs Author: Erik Rijkers Discussion: https://www.postgresql.org/message-id/0167b1e1-676c-66ba-e857-3ad7cd84404f@xs4all.nl	2024-04-09 08:04:20 +03:00
Amit Langote	bb766cde63	JSON_TABLE: Add support for NESTED paths and columns A NESTED path allows to extract data from nested levels of JSON objects given by the parent path expression, which are projected as columns specified using a nested COLUMNS clause, just like the parent COLUMNS clause. Rows comprised from a NESTED columns are "joined" to the row comprised from the parent columns. If a particular NESTED path evaluates to 0 rows, then the nested COLUMNS will emit NULLs, making it an OUTER join. NESTED columns themselves may include NESTED paths to allow extracting data from arbitrary nesting levels, which are likewise joined against the rows at the parent level. Multiple NESTED paths at a given level are called "sibling" paths and their rows are combined by UNIONing them, that is, after being joined against the parent row as described above. Author: Nikita Glukhov <n.gluhov@postgrespro.ru> Author: Teodor Sigaev <teodor@sigaev.ru> Author: Oleg Bartunov <obartunov@gmail.com> Author: Alexander Korotkov <aekorotkov@gmail.com> Author: Andrew Dunstan <andrew@dunslane.net> Author: Amit Langote <amitlangote09@gmail.com> Author: Jian He <jian.universality@gmail.com> Reviewers have included (in no particular order): Andres Freund, Alexander Korotkov, Pavel Stehule, Andrew Alsup, Erik Rijkers, Zihong Yu, Himanshu Upadhyaya, Daniel Gustafsson, Justin Pryzby, Álvaro Herrera, Jian He Discussion: https://postgr.es/m/cd0bb935-0158-78a7-08b5-904886deac4b@postgrespro.ru Discussion: https://postgr.es/m/20220616233130.rparivafipt6doj3@alap3.anarazel.de Discussion: https://postgr.es/m/abd9b83b-aa66-f230-3d6d-734817f0995d%40postgresql.org Discussion: https://postgr.es/m/CA+HiwqE4XTdfb1nW=Ojoy_tQSRhYt-q_kb6i5d4xcKyrLC1Nbg@mail.gmail.com	2024-04-08 16:14:13 +09:00
Thomas Munro	13453eedd3	Add pg_buffercache_evict() function for testing. When testing buffer pool logic, it is useful to be able to evict arbitrary blocks. This function can be used in SQL queries over the pg_buffercache view to set up a wide range of buffer pool states. Of course, buffer mappings might change concurrently so you might evict a block other than the one you had in mind, and another session might bring it back in at any time. That's OK for the intended purpose of setting up developer testing scenarios, and more complicated interlocking schemes to give stronger guararantees about that would likely be less flexible for actual testing work anyway. Superuser-only. Author: Palak Chaturvedi <chaturvedipalak1911@gmail.com> Author: Thomas Munro <thomas.munro@gmail.com> (docs, small tweaks) Reviewed-by: Nitin Jadhav <nitinjadhavpostgres@gmail.com> Reviewed-by: Andres Freund <andres@anarazel.de> Reviewed-by: Cary Huang <cary.huang@highgo.ca> Reviewed-by: Cédric Villemain <cedric.villemain+pgsql@abcsql.com> Reviewed-by: Jim Nasby <jim.nasby@gmail.com> Reviewed-by: Maxim Orlov <orlovmg@gmail.com> Reviewed-by: Thomas Munro <thomas.munro@gmail.com> Reviewed-by: Melanie Plageman <melanieplageman@gmail.com> Discussion: https://postgr.es/m/CALfch19pW48ZwWzUoRSpsaV9hqt0UPyaBPC4bOZ4W+c7FF566A@mail.gmail.com	2024-04-08 16:23:40 +12:00
Heikki Linnakangas	91044ae4ba	Send ALPN in TLS handshake, require it in direct SSL connections libpq now always tries to send ALPN. With the traditional negotiated SSL connections, the server accepts the ALPN, and refuses the connection if it's not what we expect, but connecting without ALPN is still OK. With the new direct SSL connections, ALPN is mandatory. NOTE: This uses "TBD-pgsql" as the protocol ID. We must register a proper one with IANA before the release! Author: Greg Stark, Heikki Linnakangas Reviewed-by: Matthias van de Meent, Jacob Champion	2024-04-08 04:24:51 +03:00
Heikki Linnakangas	d39a49c1e4	Support TLS handshake directly without SSLRequest negotiation By skipping SSLRequest, you can eliminate one round-trip when establishing a TLS connection. It is also more friendly to generic TLS proxies that don't understand the PostgreSQL protocol. This is disabled by default in libpq, because the direct TLS handshake will fail with old server versions. It can be enabled with the sslnegotation=direct option. It will still fall back to the negotiated TLS handshake if the server rejects the direct attempt, either because it is an older version or the server doesn't support TLS at all, but the fallback can be disabled with the sslnegotiation=requiredirect option. Author: Greg Stark, Heikki Linnakangas Reviewed-by: Matthias van de Meent, Jacob Champion	2024-04-08 04:24:49 +03:00
Alexander Korotkov	72bd38cc99	Transform OR clauses to ANY expression Replace (expr op C1) OR (expr op C2) ... with expr op ANY(ARRAY[C1, C2, ...]) on the preliminary stage of optimization when we are still working with the expression tree. Here Cn is a n-th constant expression, 'expr' is non-constant expression, 'op' is an operator which returns boolean result and has a commuter (for the case of reverse order of constant and non-constant parts of the expression, like 'Cn op expr'). Sometimes it can lead to not optimal plan. This is why there is a or_to_any_transform_limit GUC. It specifies a threshold value of length of arguments in an OR expression that triggers the OR-to-ANY transformation. Generally, more groupable OR arguments mean that transformation will be more likely to win than to lose. Discussion: https://postgr.es/m/567ED6CA.2040504%40sigaev.ru Author: Alena Rybakina <lena.ribackina@yandex.ru> Author: Andrey Lepikhov <a.lepikhov@postgrespro.ru> Reviewed-by: Peter Geoghegan <pg@bowt.ie> Reviewed-by: Ranier Vilela <ranier.vf@gmail.com> Reviewed-by: Alexander Korotkov <aekorotkov@gmail.com> Reviewed-by: Robert Haas <robertmhaas@gmail.com> Reviewed-by: Jian He <jian.universality@gmail.com>	2024-04-08 01:27:52 +03:00
Tom Lane	626603d463	Doc: clarify behavior of boolean options in replication protocol commands. Same idea as `ec7e053a9`, but applying to the walsender commands described in protocol.sgml. Peter Smith Discussion: https://postgr.es/m/CAHut+PvwjZfdGt2R8HTXgSZft=jZKymrS8KUg31pS7zqaaWKKw@mail.gmail.com	2024-04-07 17:16:32 -04:00
Tom Lane	2daeba6a4e	Doc: show how to get the equivalent of LIMIT for UPDATE/DELETE. Add examples showing use of a CTE and a self-join to perform partial UPDATEs and DELETEs. Corey Huinker, reviewed by Laurenz Albe Discussion: https://postgr.es/m/CADkLM=caNEQsUwPWnfi2jR4ix99E0EJM_3jtcE-YjnEQC7Rssw@mail.gmail.com	2024-04-07 16:26:47 -04:00
Tom Lane	1973d9fb31	Doc: update documentation about EXCLUDE constraint elements. What the documentation calls an exclude_element is an index_elem according to gram.y, and it allows all the same options that a CREATE INDEX column specification does. The COLLATE patch neglected to update the CREATE/ALTER TABLE docs about that, and later the opclass-parameters patch made the same oversight. Add those options to the syntax synopses, and polish the associated text a bit. Back-patch to v13 where opclass parameters came in. We could update v12 with just the COLLATE omission, but it doesn't quite seem worth the trouble at this point. Shihao Zhong, reviewed by Daniel Vérité, Shubham Khanna and myself Discussion: https://postgr.es/m/CAGRkXqShbVyB8E3gapfdtuwiWTiK=Q67Qb9qwxu=+-w0w46EBA@mail.gmail.com	2024-04-07 15:36:11 -04:00
Tom Lane	4643a2b265	Support retrieval of results in chunks with libpq. This patch generalizes libpq's existing single-row mode to allow individual partial-result PGresults to contain up to N rows, rather than always one row. This reduces malloc overhead compared to plain single-row mode, and it is very useful for psql's FETCH_COUNT feature, since otherwise we'd have to add code (and cycles) to either merge single-row PGresults into a bigger one or teach psql's results-printing logic to accept arrays of PGresults. To avoid API breakage, PQsetSingleRowMode() remains the same, and we add a new function PQsetChunkedRowsMode() to invoke the more general case. Also, PGresults obtained the old way continue to carry the PGRES_SINGLE_TUPLE status code, while if PQsetChunkedRowsMode() is used then their status code is PGRES_TUPLES_CHUNK. The underlying logic is the same either way, though. Daniel Vérité, reviewed by Laurenz Albe and myself (and whacked around a bit by me, so any remaining bugs are my fault) Discussion: https://postgr.es/m/CAKZiRmxsVTkO928CM+-ADvsMyePmU3L9DQCa9NwqjvLPcEe5QA@mail.gmail.com	2024-04-06 20:45:11 -04:00
Alexander Korotkov	87c21bb941	Implement ALTER TABLE ... SPLIT PARTITION ... command This new DDL command splits a single partition into several parititions. Just like ALTER TABLE ... MERGE PARTITIONS ... command, new patitions are created using createPartitionTable() function with parent partition as the template. This commit comprises quite naive implementation which works in single process and holds the ACCESS EXCLUSIVE LOCK on the parent table during all the operations including the tuple routing. This is why this new DDL command can't be recommended for large partitioned tables under a high load. However, this implementation come in handy in certain cases even as is. Also, it could be used as a foundation for future implementations with lesser locking and possibly parallel. Discussion: https://postgr.es/m/c73a1746-0cd0-6bdd-6b23-3ae0b7c0c582%40postgrespro.ru Author: Dmitry Koval Reviewed-by: Matthias van de Meent, Laurenz Albe, Zhihong Yu, Justin Pryzby Reviewed-by: Alvaro Herrera, Robert Haas, Stephane Tachoires	2024-04-07 01:18:44 +03:00
Alexander Korotkov	1adf16b8fb	Implement ALTER TABLE ... MERGE PARTITIONS ... command This new DDL command merges several partitions into the one partition of the target table. The target partition is created using new createPartitionTable() function with parent partition as the template. This commit comprises quite naive implementation which works in single process and holds the ACCESS EXCLUSIVE LOCK on the parent table during all the operations including the tuple routing. This is why this new DDL command can't be recommended for large partitioned tables under a high load. However, this implementation come in handy in certain cases even as is. Also, it could be used as a foundation for future implementations with lesser locking and possibly parallel. Discussion: https://postgr.es/m/c73a1746-0cd0-6bdd-6b23-3ae0b7c0c582%40postgrespro.ru Author: Dmitry Koval Reviewed-by: Matthias van de Meent, Laurenz Albe, Zhihong Yu, Justin Pryzby Reviewed-by: Alvaro Herrera, Robert Haas, Stephane Tachoires	2024-04-07 01:18:43 +03:00
Peter Geoghegan	5bf748b86b	Enhance nbtree ScalarArrayOp execution. Commit `9e8da0f7` taught nbtree to handle ScalarArrayOpExpr quals natively. This works by pushing down the full context (the array keys) to the nbtree index AM, enabling it to execute multiple primitive index scans that the planner treats as one continuous index scan/index path. This earlier enhancement enabled nbtree ScalarArrayOp index-only scans. It also allowed scans with ScalarArrayOp quals to return ordered results (with some notable restrictions, described further down). Take this general approach a lot further: teach nbtree SAOP index scans to decide how to execute ScalarArrayOp scans (when and where to start the next primitive index scan) based on physical index characteristics. This can be far more efficient. All SAOP scans will now reliably avoid duplicative leaf page accesses (just like any other nbtree index scan). SAOP scans whose array keys are naturally clustered together now require far fewer index descents, since we'll reliably avoid starting a new primitive scan just to get to a later offset from the same leaf page. The scan's arrays now advance using binary searches for the array element that best matches the next tuple's attribute value. Required scan key arrays (i.e. arrays from scan keys that can terminate the scan) ratchet forward in lockstep with the index scan. Non-required arrays (i.e. arrays from scan keys that can only exclude non-matching tuples) "advance" without the process ever rolling over to a higher-order array. Naturally, only required SAOP scan keys trigger skipping over leaf pages (non-required arrays cannot safely end or start primitive index scans). Consequently, even index scans of a composite index with a high-order inequality scan key (which we'll mark required) and a low-order SAOP scan key (which we won't mark required) now avoid repeating leaf page accesses -- that benefit isn't limited to simpler equality-only cases. In general, all nbtree index scans now output tuples as if they were one continuous index scan -- even scans that mix a high-order inequality with lower-order SAOP equalities reliably output tuples in index order. This allows us to remove a couple of special cases that were applied when building index paths with SAOP clauses during planning. Bugfix commit `807a40c5` taught the planner to avoid generating unsafe path keys: path keys on a multicolumn index path, with a SAOP clause on any attribute beyond the first/most significant attribute. These cases are now all safe, so we go back to generating path keys without regard for the presence of SAOP clauses (just like with any other clause type). Affected queries can now exploit scan output order in all the usual ways (e.g., certain "ORDER BY ... LIMIT n" queries can now terminate early). Also undo changes from follow-up bugfix commit `a4523c5a`, which taught the planner to produce alternative index paths, with path keys, but without low-order SAOP index quals (filter quals were used instead). We'll no longer generate these alternative paths, since they can no longer offer any meaningful advantages over standard index qual paths. Affected queries thereby avoid all of the disadvantages that come from using filter quals within index scan nodes. They can avoid extra heap page accesses from using filter quals to exclude non-matching tuples (index quals will never have that problem). They can also skip over irrelevant sections of the index in more cases (though only when nbtree determines that starting another primitive scan actually makes sense). There is a theoretical risk that removing restrictions on SAOP index paths from the planner will break compatibility with amcanorder-based index AMs maintained as extensions. Such an index AM could have the same limitations around ordered SAOP scans as nbtree had up until now. Adding a pro forma incompatibility item about the issue to the Postgres 17 release notes seems like a good idea. Author: Peter Geoghegan <pg@bowt.ie> Author: Matthias van de Meent <boekewurm+postgres@gmail.com> Reviewed-By: Heikki Linnakangas <hlinnaka@iki.fi> Reviewed-By: Matthias van de Meent <boekewurm+postgres@gmail.com> Reviewed-By: Tomas Vondra <tomas.vondra@enterprisedb.com> Discussion: https://postgr.es/m/CAH2-Wz=ksvN_sjcnD1+Bt-WtifRA5ok48aDYnq3pkKhxgMQpcw@mail.gmail.com	2024-04-06 11:47:10 -04:00
Thomas Munro	98f320eb2e	Increase default vacuum_buffer_usage_limit to 2MB. The BAS_VACUUM ring size has been 256kB since commit `d526575f` introduced the mechanism 17 years ago. Commit `1cbbee03` recently made it configurable but retained the traditional default. The correct default size has been debated for years, but 256kB is certainly very small. VACUUM soon needs to write back data it dirtied only 32 blocks ago, which usually requires flushing the WAL. New experiments in prefetching pages for VACUUM exacerbated the problem by crashing into dirty data even sooner. Let's make the default 2MB. That's 1.6% of the default toy buffer pool size, and 0.2% of 1GB, which would be a considered a small shared_buffers setting for a real system these days. Users are still free to set the GUC to a different value. Reviewed-by: Andres Freund <andres@anarazel.de> Discussion: https://postgr.es/m/20240403221257.md4gfki3z75cdyf6%40awork3.anarazel.de Discussion: https://postgr.es/m/CA%2BhUKGLY4Q4ZY4f1rvnFtv6%2BPkjNf8MejdPkcju3Qii9DYqqcQ%40mail.gmail.com	2024-04-06 23:12:03 +13:00
Tomas Vondra	f8ce4ed78c	Allow copying files using clone/copy_file_range Adds --clone/--copy-file-range options to pg_combinebackup, to allow copying files using file cloning or copy_file_range(). These methods may be faster than the standard block-by-block copy, but the main advantage is that they enable various features provided by CoW filesystems. This commit only uses these copy methods for files that did not change and can be copied as a whole from a single backup. These new copy methods may not be available on all platforms, in which case the command throws an error (immediately, even if no files would be copied as a whole). This early failure seems better than failing later when trying to copy the first file, after performing a lot of work on earlier files. If the requested copy method is available, but a checksum needs to be recalculated (e.g. because of a different checksum type), the file is still copied using the requested method, but it is also read for the checksum calculation. Depending on the filesystem this may be more expensive than just performing the simple copy, but it does enable the CoW benefits. Initial patch by Jakub Wartak, various reworks and improvements by me. Author: Tomas Vondra, Jakub Wartak Reviewed-by: Thomas Munro, Jakub Wartak, Robert Haas Discussion: https://postgr.es/m/3024283a-7491-4240-80d0-421575f6bb23%40enterprisedb.com	2024-04-05 18:01:32 +02:00
Robert Haas	fe8eaa5442	docs: Merge separate chapters on built-in index AMs into one. The documentation index is getting very long, which makes it hard to find things. Since these chapters are all very similar in structure and content, merging them is a natural way of reducing the size of the toplevel index. Rather than actually combining all of the SGML into a single file, keep one file per <sect1>, and add a glue file that includes all of them. Discussion: http://postgr.es/m/CA+Tgmob7_uoYuS2=rVwpVXaRwP-UXz+++saYTC-BCZ42QzSNKQ@mail.gmail.com	2024-04-05 10:34:04 -04:00
Amit Kapila	6f132ed693	Allow synced slots to have their inactive_since. This commit does two things: 1) Maintains inactive_since for sync slots whenever the slot is released just like any other regular slot. 2) Ensures the value is set to the current timestamp during the promotion of standby to help correctly interpret the time after promotion. We don't want the slots to appear inactive for a long time after promotion if they haven't been synchronized recently. This would also avoid the invalidation of such slots immediately after promotion if tomorrow we have a feature that invalidates slots based on their inactivity time. Whoever acquires the slot i.e. makes the slot active will reset it to NULL. Author: Bharath Rupireddy Reviewed-by: Bertrand Drouvot, Amit Kapila, Shveta Malik, Masahiko Sawada Discussion: https://postgr.es/m/CAA4eK1KrPGwfZV9LYGidjxHeW+rxJ=E2ThjXvwRGLO=iLNuo=Q@mail.gmail.com Discussion: https://postgr.es/m/CALj2ACW4aUe-_uFQOjdWCEN-xXoLGhmvRFnL8SNw_TZ5nJe+aw@mail.gmail.com Discussion: https://postgr.es/m/CA+Tgmob_Ta-t2ty8QrKHBGnNLrf4ZYcwhGHGFsuUoFrAEDw4sA@mail.gmail.com	2024-04-05 09:48:49 +05:30
Amit Langote	de3600452b	Add basic JSON_TABLE() functionality JSON_TABLE() allows JSON data to be converted into a relational view and thus used, for example, in a FROM clause, like other tabular data. Data to show in the view is selected from a source JSON object using a JSON path expression to get a sequence of JSON objects that's called a "row pattern", which becomes the source to compute the SQL/JSON values that populate the view's output columns. Column values themselves are computed using JSON path expressions applied to each of the JSON objects comprising the "row pattern", for which the SQL/JSON query functions added in `6185c9737c` are used. To implement JSON_TABLE() as a table function, this augments the TableFunc and TableFuncScanState nodes that are currently used to support XMLTABLE() with some JSON_TABLE()-specific fields. Note that the JSON_TABLE() spec includes NESTED COLUMNS and PLAN clauses, which are required to provide more flexibility to extract data out of nested JSON objects, but they are not implemented here to keep this commit of manageable size. Author: Nikita Glukhov <n.gluhov@postgrespro.ru> Author: Teodor Sigaev <teodor@sigaev.ru> Author: Oleg Bartunov <obartunov@gmail.com> Author: Alexander Korotkov <aekorotkov@gmail.com> Author: Andrew Dunstan <andrew@dunslane.net> Author: Amit Langote <amitlangote09@gmail.com> Author: Jian He <jian.universality@gmail.com> Reviewers have included (in no particular order): Andres Freund, Alexander Korotkov, Pavel Stehule, Andrew Alsup, Erik Rijkers, Zihong Yu, Himanshu Upadhyaya, Daniel Gustafsson, Justin Pryzby, Álvaro Herrera, Jian He Discussion: https://postgr.es/m/cd0bb935-0158-78a7-08b5-904886deac4b@postgrespro.ru Discussion: https://postgr.es/m/20220616233130.rparivafipt6doj3@alap3.anarazel.de Discussion: https://postgr.es/m/abd9b83b-aa66-f230-3d6d-734817f0995d%40postgresql.org Discussion: https://postgr.es/m/CA+HiwqE4XTdfb1nW=Ojoy_tQSRhYt-q_kb6i5d4xcKyrLC1Nbg@mail.gmail.com	2024-04-04 20:20:15 +09:00
Tom Lane	06286709ee	Invent SERIALIZE option for EXPLAIN. EXPLAIN (ANALYZE, SERIALIZE) allows collection of statistics about the volume of data emitted by a query, as well as the time taken to convert the data to the on-the-wire format. Previously there was no way to investigate this without actually sending the data to the client, in which case network transmission costs might swamp what you wanted to see. In particular this feature allows investigating the costs of de-TOASTing compressed or out-of-line data during formatting. Stepan Rutz and Matthias van de Meent, reviewed by Tomas Vondra and myself Discussion: https://postgr.es/m/ca0adb0e-fa4e-c37e-1cd7-91170b18cae1@gmx.de	2024-04-03 17:41:57 -04:00
Robert Haas	f470b5c679	docs: Demote "Monitoring Disk Usage" from chapter to section. This chapter is very short, and the immediately preceding chapter is called "Monitoring Database Activity". So, instead of having a separate chapter for this, make it the last section of the preceding chapter instead. Discussion: http://postgr.es/m/CA+Tgmob7_uoYuS2=rVwpVXaRwP-UXz+++saYTC-BCZ42QzSNKQ@mail.gmail.com	2024-04-03 16:09:41 -04:00
Nathan Bossart	c627d944e6	Add built-in ERROR handling for archive callbacks. Presently, the archiver process restarts when an archive callback ERRORs. To avoid this, archive module authors can use sigsetjmp(), manage a memory context, etc., but that requires a lot of extra code that will likely look roughly the same between modules. This commit adds basic archive callback ERROR handling to pgarch.c so that module authors won't ordinarily need to worry about this. While this built-in handler attempts to clean up anything that an archive module could conceivably have left behind, it is possible that some modules are doing unexpected things that require additional cleanup. Module authors should be sure to do any extra required cleanup in a PG_CATCH block within the archiving callback. The archiving callback is now called in a short-lived memory context that the archiver process resets between invocations. If a module requires longer-lived storage, it must maintain its own memory context. Thanks to these changes, the basic_archive module can be greatly simplified. Suggested-by: Andres Freund Reviewed-by: Andres Freund, Yong Li Discussion: https://postgr.es/m/20230217215624.GA3131134%40nathanxps13	2024-04-02 22:28:11 -05:00
Alexander Korotkov	06c418e163	Implement pg_wal_replay_wait() stored procedure pg_wal_replay_wait() is to be used on standby and specifies waiting for the specific WAL location to be replayed before starting the transaction. This option is useful when the user makes some data changes on primary and needs a guarantee to see these changes on standby. The queue of waiters is stored in the shared memory array sorted by LSN. During replay of WAL waiters whose LSNs are already replayed are deleted from the shared memory array and woken up by setting of their latches. pg_wal_replay_wait() needs to wait without any snapshot held. Otherwise, the snapshot could prevent the replay of WAL records implying a kind of self-deadlock. This is why it is only possible to implement pg_wal_replay_wait() as a procedure working in a non-atomic context, not a function. Catversion is bumped. Discussion: https://postgr.es/m/eb12f9b03851bb2583adab5df9579b4b%40postgrespro.ru Author: Kartyshov Ivan, Alexander Korotkov Reviewed-by: Michael Paquier, Peter Eisentraut, Dilip Kumar, Amit Kapila Reviewed-by: Alexander Lakhin, Bharath Rupireddy, Euler Taveira	2024-04-02 22:48:03 +03:00
Robert Haas	f5e4dedfa8	Expose PQsocketPoll via libpq This is useful when connecting to a database asynchronously via PQconnectStart(), since it handles deciding between poll() and select(), and some of the required boilerplate. Tristan Partin, reviewed by Gurjeet Singh, Heikki Linnakangas, Jelte Fennema-Nio, and me. Discussion: http://postgr.es/m/D08WWCPVHKHN.3QELIKZJ2D9RZ@neon.tech	2024-04-02 10:15:56 -04:00
Thomas Munro	210622c60e	Provide vectored variant of ReadBuffer(). Break ReadBuffer() up into two steps. StartReadBuffers() and WaitReadBuffers() give us two main advantages: 1. Multiple consecutive blocks can be read with one system call. 2. Advice (hints of future reads) can optionally be issued to the kernel ahead of time. The traditional ReadBuffer() function is now implemented in terms of those functions, to avoid duplication. A new GUC io_combine_limit is defined, and the functions for limiting per-backend pin counts are made into public APIs. Those are provided for use by callers of StartReadBuffers(), when deciding how many buffers to read at once. The following commit will add a higher level mechanism for doing that automatically with a practical interface. With some more infrastructure in later work, StartReadBuffers() could be extended to start real asynchronous I/O instead of just issuing advice and leaving WaitReadBuffers() to do the work synchronously. Author: Thomas Munro <thomas.munro@gmail.com> Author: Andres Freund <andres@anarazel.de> (some optimization tweaks) Reviewed-by: Melanie Plageman <melanieplageman@gmail.com> Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi> Reviewed-by: Nazir Bilal Yavuz <byavuz81@gmail.com> Reviewed-by: Dilip Kumar <dilipbalaut@gmail.com> Reviewed-by: Andres Freund <andres@anarazel.de> Tested-by: Tomas Vondra <tomas.vondra@enterprisedb.com> Discussion: https://postgr.es/m/CA+hUKGJkOiOCa+mag4BF+zHo7qo=o9CFheB8=g6uT5TUm2gkvA@mail.gmail.com	2024-04-03 00:23:20 +13:00
Masahiko Sawada	667e65aac3	Use TidStore for dead tuple TIDs storage during lazy vacuum. Previously, we used a simple array for storing dead tuple IDs during lazy vacuum, which had a number of problems: * The array used a single allocation and so was limited to 1GB. * The allocation was pessimistically sized according to table size. * Lookup with binary search was slow because of poor CPU cache and branch prediction behavior. This commit replaces that array with the TID store from commit `30e144287a`. Since the backing radix tree makes small allocations as needed, the 1GB limit is now gone. Further, the total memory used is now often smaller by an order of magnitude or more, depending on the distribution of blocks and offsets. These two features should make multiple rounds of heap scanning and index cleanup an extremely rare event. TID lookup during index cleanup is also several times faster, even more so when index order is correlated with heap tuple order. Since there is no longer a predictable relationship between the number of dead tuples vacuumed and the space taken up by their TIDs, the number of tuples no longer provides any meaningful insights for users, nor is the maximum number predictable. For that reason this commit also changes to byte-based progress reporting, with the relevant columns of pg_stat_progress_vacuum renamed accordingly to max_dead_tuple_bytes and dead_tuple_bytes. For parallel vacuum, both the TID store and supplemental information specific to vacuum are shared among the parallel vacuum workers. As with the previous array, we don't take any locks on TidStore during parallel vacuum since writes are still only done by the leader process. Bump catalog version. Reviewed-by: John Naylor, (in an earlier version) Dilip Kumar Discussion: https://postgr.es/m/CAD21AoAfOZvmfR0j8VmZorZjL7RhTiQdVttNuC4W-Shdc2a-AA%40mail.gmail.com	2024-04-02 10:15:37 +09:00
Tom Lane	959b38d770	Invent --transaction-size option for pg_restore. This patch allows pg_restore to wrap its commands into transaction blocks, somewhat like --single-transaction, except that we commit and start a new block after every N objects. Using this mode with a size limit of 1000 or so objects greatly reduces the number of transactions consumed by the restore, while preventing any one transaction from taking enough locks to overrun the receiving server's shared lock table. (A value of 1000 works well with the default lock table size of around 6400 locks. Higher --transaction-size values can be used if one has increased the receiving server's lock table size.) Excessive consumption of XIDs has been reported as a problem for pg_upgrade in particular, but it could be bad for any restore; and the change also reduces the number of fsyncs and amount of WAL generated, so it should provide speed benefits too. This patch does not try to make parallel workers batch the SQL commands they issue. The trouble with doing that is that other workers may need to see the objects a worker creates right away. Possibly this can be improved later. In this patch I have hard-wired pg_upgrade to use a transaction size of 1000 divided by the number of parallel restore jobs allowed (without that, we'd still be at risk of overrunning the shared lock table). Perhaps there would be value in adding another pg_upgrade option to allow user control of that, but I'm unsure that it's worth the trouble; I think few users would use it, and any who did would see not that much benefit compared to the default. Patch by me, but the original idea to batch SQL commands during restore is due to Robins Tharakan. Discussion: https://postgr.es/m/a9f9376f1c3343a6bb319dce294e20ac@EX13D05UWC001.ant.amazon.com	2024-04-01 16:46:24 -04:00

1 2 3 4 5 ...

17065 Commits