postgresql

Commit Graph

Author	SHA1	Message	Date
Alvaro Herrera	3fe773b149	Track detached partitions more accurately in partdescs In `d6b8d29419` I (Álvaro) was sloppy about recording whether a partition descripor does or does not include detached partitions, when the snapshot checking does not see the pg_inherits row marked detached. In that case no partition was omitted, yet in the relcache entry we were saving the partdesc as omitting partitions. Flip that (so we save it as a partdesc not omitting partitions, which indeed it doesn't), which hopefully makes the code easier to reason about. Author: Amit Langote <amitlangote09@gmail.com> Discussion: https://postgr.es/m/CA+HiwqE7GxGU4VdzwZzfiz+Ont5SsopoFkgtrZGEdPqWRL+biA@mail.gmail.com	2021-05-06 12:47:30 -04:00
Amit Kapila	592f00f8de	Update replication statistics after every stream/spill. Currently, replication slot statistics are updated at prepare, commit, and rollback. Now, if the transaction is interrupted the stats might not get updated. Fixed this by updating replication statistics after every stream/spill. In passing update the docs to change the description of some of the slot stats. Author: Vignesh C, Sawada Masahiko Reviewed-by: Amit Kapila Discussion: https://postgr.es/m/20210319185247.ldebgpdaxsowiflw@alap3.anarazel.de	2021-05-06 11:21:26 +05:30
Andres Freund	7f2e10baa2	jit: Fix warning reported by gcc-11 caused by dubious function signature. Reported-By: Erik Rijkers <er@xs4all.nl> Discussion: https://postgr.es/m/833107370.1313189.1619647621213@webmailclassic.xs4all.nl Backpatch: 13, where `b059d2f456` introduced the issue.	2021-05-05 22:13:55 -07:00
Amit Kapila	2ce353fc19	Tighten the concurrent abort check during decoding. During decoding of an in-progress or prepared transaction, we detect concurrent abort with an error code ERRCODE_TRANSACTION_ROLLBACK. That is not sufficient because a callback can decide to throw that error code at other times as well. Reported-by: Tom Lane Author: Amit Kapila Reviewed-by: Dilip Kumar Discussion: https://postgr.es/m/CAA4eK1KCjPRS4aZHB48QMM4J8XOC1+TD8jo-4Yu84E+MjwqVhA@mail.gmail.com	2021-05-06 08:26:42 +05:30
Alvaro Herrera	c250062df4	Remove unused argument of ATAddForeignConstraint Commit `0325d7a595` made this unused but forgot to remove it. Do so now. Author: Amit Langote <amitlangote09@gmail.com> Discussion: https://postgr.es/m/209c99fe-b9a2-94f4-cd68-a8304186a09e@lab.ntt.co.jp	2021-05-05 12:27:39 -04:00
Alvaro Herrera	6f70d7ca1d	Have ALTER CONSTRAINT recurse on partitioned tables When ALTER TABLE .. ALTER CONSTRAINT changes deferrability properties changed in a partitioned table, we failed to propagate those changes correctly to partitions and to triggers. Repair by adding a recursion mechanism to affect all derived constraints and all derived triggers. (In particular, recurse to partitions even if their respective parents are already in the desired state: it is possible for the partitions to have been altered individually.) Because foreign keys involve tables in two sides, we cannot use the standard ALTER TABLE recursion mechanism, so we invent our own by following pg_constraint.conparentid down. When ALTER TABLE .. ALTER CONSTRAINT is invoked on the derived pg_constraint object that's automaticaly created in a partition as a result of a constraint added to its parent, raise an error instead of pretending to work and then failing to modify all the affected triggers. Before this commit such a command would be allowed but failed to affect all triggers, so it would silently misbehave. (Restoring dumps of existing databases is not affected, because pg_dump does not produce anything for such a derived constraint anyway.) Add some tests for the case. Backpatch to 11, where foreign key support was added to partitioned tables by commit `3de241dba8`. (A related change is commit `f56f8f8da6` in pg12 which added support for FKs referencing partitioned tables; this is what forces us to use an ad-hoc recursion mechanism for this.) Diagnosed by Tom Lane from bug report from Ron L Johnson. As of this writing, no reviews were offered. Discussion: https://postgr.es/m/75fe0761-a291-86a9-c8d8-4906da077469@gmail.com Discussion: https://postgr.es/m/3144850.1607369633@sss.pgh.pa.us	2021-05-05 12:21:50 -04:00
Peter Eisentraut	38f36aad8c	GUC description improvements for clarity	2021-05-05 08:18:22 +02:00
Alvaro Herrera	e798d095da	Fix OID passed to object-alter hook during ALTER CONSTRAINT The OID of the constraint is used instead of the OID of the trigger -- an easy mistake to make. Apparently the object-alter hooks are not very well tested :-( Backpatch to 12, where this typo was introduced by `578b229718` Discussion: https://postgr.es/m/20210503231633.GA6994@alvherre.pgsql	2021-05-04 10:09:12 -04:00
Peter Eisentraut	a970edbed3	Fix ALTER TABLE / INHERIT with generated columns When running ALTER TABLE t2 INHERIT t1, we must check that columns in t2 that correspond to a generated column in t1 are also generated and have the same generation expression. Otherwise, this would allow creating setups that a normal CREATE TABLE sequence would not allow. Discussion: https://www.postgresql.org/message-id/22de27f6-7096-8d96-4619-7b882932ca25@2ndquadrant.com	2021-05-04 12:09:08 +02:00
Bruce Momjian	f7a97b6ec3	Update query_id computation Properly fix: - the "ONLY" in FROM [ONLY] isn't hashed - the agglevelsup field in GROUPING isn't hashed - WITH TIES not being hashed (new in PG 13) - "DISTINCT" in "GROUP BY [DISTINCT]" isn't hashed (new in PG 14) Reported-by: Julien Rouhaud Discussion: https://postgr.es/m/20210425081119.ulyzxqz23ueh3wuj@nol	2021-05-03 14:59:39 -04:00
Tom Lane	f68970e33f	Fix performance issue in new regex match-all detection code. Commit `824bf7190` introduced a new search of the NFAs generated by regex compilation. I failed to think hard about the performance characteristics of that search, with the predictable outcome that it's bad: weird regexes can trigger exponential search time. Worse, there's no check-for-interrupt in that code, so you can't even cancel the query if this happens. Fix by introducing memo-ization of the search results, so that any one NFA state need be examined in detail just once. This potentially uses a lot of memory, but we can bound the memory usage by putting a limit on the number of states for which we'll try to prove match-all-ness. That is sane because we already have a limit (DUPINF) on the maximum finite string length that a matchall regex can match; and patterns that involve much more than DUPINF states would probably exceed that limit anyway. Also, rearrange the logic so that we check the basic is-the-graph- all-RAINBOW-arcs property before we start the recursive search to determine path lengths. This will ensure that we fall out quickly whenever the NFA couldn't possibly be matchall. Also stick in a check-for-interrupt, just in case these measures don't completely eliminate the risk of slowness. Discussion: https://postgr.es/m/3483895.1619898362@sss.pgh.pa.us	2021-05-03 11:42:31 -04:00
Peter Eisentraut	b94409a02f	Prevent lwlock dtrace probes from unnecessary work If dtrace is compiled in but disabled, the lwlock dtrace probes still evaluate their arguments. Since PostgreSQL 13, T_NAME(lock) does nontrivial work, so it should be avoided if not needed. To fix, make these calls conditional on the *_ENABLED() macro corresponding to each probe. Reviewed-by: Craig Ringer <craig.ringer@enterprisedb.com> Discussion: https://www.postgresql.org/message-id/CAGRY4nwxKUS_RvXFW-ugrZBYxPFFM5kjwKT5O+0+Stuga5b4+Q@mail.gmail.com	2021-05-03 12:18:27 +02:00
Peter Eisentraut	c285babf8f	Remove unused function argument became unused by `04942bffd0`	2021-05-03 09:05:58 +02:00
Amit Kapila	205f466282	Fix the computation of slot stats for 'total_bytes'. Previously, we were using the size of all the changes present in ReorderBuffer to compute total_bytes after decoding a transaction and that can lead to counting some of the transactions' changes more than once. Fix it by using the size of the changes decoded for a transaction to compute 'total_bytes'. Author: Sawada Masahiko Reviewed-by: Vignesh C, Amit Kapila Discussion: https://postgr.es/m/20210319185247.ldebgpdaxsowiflw@alap3.anarazel.de	2021-05-03 07:22:08 +05:30
Alexander Korotkov	eb086056fe	Make websearch_to_tsquery() parse text in quotes as a single token websearch_to_tsquery() splits text in quotes into tokens and connects them with phrase operator on its own. However, that leads to surprising results when the token contains no words. For instance, websearch_to_tsquery('"aaa: bbb"') is 'aaa <2> bbb', because it is equivalent of to_tsquery(E'aaa <-> \':\' <-> bbb'). But websearch_to_tsquery('"aaa: bbb"') has to be 'aaa <-> bbb' in order to match to_tsvector('aaa: bbb'). Since `0c4f355c6a`, we anyway connect lexemes of complex tokens with phrase operators. Thus, let's just websearch_to_tsquery() parse text in quotes as a single token. Therefore, websearch_to_tsquery() should process the quoted text in the same way phraseto_tsquery() does. This solution is what we exactly need and also simplifies the code. This commit is an incompatible change, so we don't backpatch it. Reported-by: Valentin Gatien-Baron Discussion: https://postgr.es/m/CA%2B0DEqiZs7gdOd4ikmg%3D0UWG%2BSwWOLxPsk_JW-sx9WNOyrb0KQ%40mail.gmail.com Author: Alexander Korotkov Reviewed-by: Tom Lane, Zhihong Yu	2021-05-03 04:18:19 +03:00
Bruce Momjian	651d005e76	Revert use singular for -1 (commits `9ee7d533da` and `5da9868ed9` Turns out you can specify negative values using plurals: https://english.stackexchange.com/questions/9735/is-1-followed-by-a-singular-or-plural-noun so the previous code was correct enough, and consistent with other usage in our code. Also add comment in the two places where this could be confused. Reported-by: Noah Misch Diagnosed-by: 20210425115726.GA2353095@rfd.leadboat.com	2021-05-01 10:42:44 -04:00
Tom Lane	2efcd502e5	Disallow calling anything but plain functions via the fastpath API. Reject aggregates, window functions, and procedures. Aggregates failed anyway, though with a somewhat obscure error message. Window functions would hit an Assert or null-pointer dereference. Procedures seemed to work as long as you didn't try to do transaction control, but (a) transaction control is sort of the point of a procedure, and (b) it's not entirely clear that no bugs lurk in that path. Given the lack of testing of this area, it seems safest to be conservative in what we support. Also reject proretset functions, as the fastpath protocol can't support returning a set. Also remove an easily-triggered assertion that the given OID isn't 0; the subsequent lookups can handle that case themselves. Per report from Theodor-Arsenij Larionov-Trichkin. Back-patch to all supported branches. (The procedure angle only applies in v11+, of course.) Discussion: https://postgr.es/m/2039442.1615317309@sss.pgh.pa.us	2021-04-30 14:10:26 -04:00
Amit Kapila	ee4ba01dbb	Fix the bugs in selecting the transaction for streaming. There were two problems: a. We were always selecting the next available txn instead of selecting it when it is larger than the previous transaction. b. We were selecting the transactions which haven't made any changes to the database (base snapshot is not set). Later it was hitting an Assert because we don't decode such transactions and the changes in txn remain as it is. It is better not to choose such transactions for streaming in the first place. Reported-by: Haiying Tang Author: Dilip Kumar Reviewed-by: Amit Kapila Discussion: https://postgr.es/m/OS0PR01MB61133B94E63177040F7ECDA1FB429@OS0PR01MB6113.jpnprd01.prod.outlook.com	2021-04-30 10:49:52 +05:30
David Rowley	3c80e96dff	Adjust EXPLAIN output for parallel Result Cache plans Here we adjust the EXPLAIN ANALYZE output for Result Cache so that we don't show any Result Cache stats for parallel workers who don't contribute anything to Result Cache plan nodes. I originally had ideas that workers who don't help could still have their Result Cache stats displayed. The idea with that was so that I could write some parallel Result Cache regression tests that show the EXPLAIN ANALYZE output. However, I realized a little too late that such tests would just not be possible to have run in a stable way on the buildfarm. With that knowledge, before `9eacee2e6` went in, I had removed all of the tests that were showing the EXPLAIN ANALYZE output of a parallel Result Cache plan, however, I forgot to put back the code that adjusts the EXPLAIN output to hide the Result Cache stats for parallel workers who were not fast enough to help out before query execution was over. All other nodes behave this way and so should Result Cache. Additionally, with this change, it now seems safe enough to remove the SET force_parallel_mode = off that I had added to the regression tests. Also, perform some cleanup in the partition_prune tests. I had adjusted the explain_parallel_append() function to sanitize the Result Cache EXPLAIN ANALYZE output. However, since I didn't actually include any parallel Result Cache tests that show their EXPLAIN ANALYZE output, that code does nothing and can be removed. In passing, move the setting of memPeakKb into the scope where it's used. Reported-by: Amit Khandekar Author: David Rowley, Amit Khandekar Discussion: https://postgr.es/m/CAJ3gD9d8SkfY95GpM1zmsOtX2-Ogx5q-WLsf8f0ykEb0hCRK3w@mail.gmail.com	2021-04-30 14:46:42 +12:00
Peter Eisentraut	3a948ea0a2	pg_hba.conf.sample: Reword connection type section Improve the wording in the connection type section of pg_hba.conf.sample a bit. After the hostgssenc part was added on, the whole thing became a bit wordy, and it's also a bit inaccurate for example in that the current wording for "host" appears to say that it does not apply to GSS-encrypted connections. Discussion: https://www.postgresql.org/message-id/flat/fc06dcc5-513f-e944-cd07-ba51dd7c6916%40enterprisedb.com	2021-04-29 07:00:20 +02:00
Tom Lane	9626325da5	Add heuristic incoming-message-size limits in the server. We had a report of confusing server behavior caused by a client bug that sent junk to the server: the server thought the junk was a very long message length and waited patiently for data that would never come. We can reduce the risk of that by being less trusting about message lengths. For a long time, libpq has had a heuristic rule that it wouldn't believe large message size words, except for a small number of message types that are expected to be (potentially) long. This provides some defense against loss of message-boundary sync and other corrupted-data cases. The server does something similar, except that up to now it only limited the lengths of messages received during the connection authentication phase. Let's do the same as in libpq and put restrictions on the allowed length of all messages, while distinguishing between message types that are expected to be long and those that aren't. I used a limit of 10000 bytes for non-long messages. (libpq's corresponding limit is 30000 bytes, but given the asymmetry of the FE/BE protocol, there's no good reason why the numbers should be the same.) Experimentation suggests that this is at least a factor of 10, maybe a factor of 100, more than we really need; but plenty of daylight seems desirable to avoid false positives. In any case we can adjust the limit based on beta-test results. For long messages, set a limit of MaxAllocSize - 1, which is the most that we can absorb into the StringInfo buffer that the message is collected in. This just serves to make sure that a bogus message size is reported as such, rather than as a confusing gripe about not being able to enlarge a string buffer. While at it, make sure that non-mainline code paths (such as COPY FROM STDIN) are as paranoid as SocketBackend is, and validate the message type code before believing the message length. This provides an additional guard against getting stuck on corrupted input. Discussion: https://postgr.es/m/2003757.1619373089@sss.pgh.pa.us	2021-04-28 15:50:46 -04:00
Alvaro Herrera	d6b8d29419	Allow a partdesc-omitting-partitions to be cached Makes partition descriptor acquisition faster during the transient period in which a partition is in the process of being detached. This also adds the restriction that only one partition can be in pending-detach state for a partitioned table. While at it, return find_inheritance_children() API to what it was before `71f4c8c6f7`, and create a separate find_inheritance_children_extended() that returns detailed info about detached partitions. (This incidentally fixes a bug in `8aba932251` whereby a memory context holding a transient partdesc is reparented to a NULL PortalContext, leading to permanent leak of that memory. The fix is to no longer rely on reparenting contexts to PortalContext. Reported by Amit Langote.) Per gripe from Amit Langote Discussion: https://postgr.es/m/CA+HiwqFgpP1LxJZOBYGt9rpvTjXXkg5qG2+Xch2Z1Q7KrqZR1A@mail.gmail.com	2021-04-28 15:44:35 -04:00
Michael Paquier	f93f0b5b25	Fix use-after-release issue with pg_identify_object_as_address() Spotted by buildfarm member prion, with -DRELCACHE_FORCE_RELEASE. Introduced in `f7aab36`. Discussion: https://postgr.es/m/2759018.1619577848@sss.pgh.pa.us Backpatch-through: 9.6	2021-04-28 11:58:08 +09:00
Michael Paquier	f7aab36d61	Fix pg_identify_object_as_address() with event triggers Attempting to use this function with event triggers failed, as, since its introduction in `a676201`, this code has never associated an object name with event triggers. This addresses the failure by adding the event trigger name to the set defining its object address. Note that regression tests are added within event_trigger and not object_address to avoid issues with concurrent connections in parallel schedules. Author: Joel Jacobson Discussion: https://postgr.es/m/3c905e77-a026-46ae-8835-c3f6cd1d24c8@www.fastmail.com Backpatch-through: 9.6	2021-04-28 11:17:58 +09:00
Fujii Masao	8e9ea08bae	Don't pass "ONLY" options specified in TRUNCATE to foreign data wrapper. Commit `8ff1c94649` allowed TRUNCATE command to truncate foreign tables. Previously the information about "ONLY" options specified in TRUNCATE command were passed to the foreign data wrapper. Then postgres_fdw constructed the TRUNCATE command to issue the remote server and included "ONLY" options in it based on the passed information. On the other hand, "ONLY" options specified in SELECT, UPDATE or DELETE have no effect when accessing or modifying the remote table, i.e., are not passed to the foreign data wrapper. So it's inconsistent to make only TRUNCATE command pass the "ONLY" options to the foreign data wrapper. Therefore this commit changes the TRUNCATE command so that it doesn't pass the "ONLY" options to the foreign data wrapper, for the consistency with other statements. Also this commit changes postgres_fdw so that it always doesn't include "ONLY" options in the TRUNCATE command that it constructs. Author: Fujii Masao Reviewed-by: Bharath Rupireddy, Kyotaro Horiguchi, Justin Pryzby, Zhihong Yu Discussion: https://postgr.es/m/551ed8c1-f531-818b-664a-2cecdab99cd8@oss.nttdata.com	2021-04-27 14:41:27 +09:00
Amit Kapila	3fa17d3771	Use HTAB for replication slot statistics. Previously, we used to use the array of size max_replication_slots to store stats for replication slots. But that had two problems in the cases where a message for dropping a slot gets lost: 1) the stats for the new slot are not recorded if the array is full and 2) writing beyond the end of the array if the user reduces the max_replication_slots. This commit uses HTAB for replication slot statistics, resolving both problems. Now, pgstat_vacuum_stat() search for all the dead replication slots in stats hashtable and tell the collector to remove them. To avoid showing the stats for the already-dropped slots, pg_stat_replication_slots view searches slot stats by the slot name taken from pg_replication_slots. Also, we send a message for creating a slot at slot creation, initializing the stats. This reduces the possibility that the stats are accumulated into the old slot stats when a message for dropping a slot gets lost. Reported-by: Andres Freund Author: Sawada Masahiko, test case by Vignesh C Reviewed-by: Amit Kapila, Vignesh C, Dilip Kumar Discussion: https://postgr.es/m/20210319185247.ldebgpdaxsowiflw@alap3.anarazel.de	2021-04-27 09:09:11 +05:30
Amit Kapila	e7eea52b2d	Fix Logical Replication of Truncate in synchronous commit mode. The Truncate operation acquires an exclusive lock on the target relation and indexes. It then waits for logical replication of the operation to finish at commit. Now because we are acquiring the shared lock on the target index to get index attributes in pgoutput while sending the changes for the Truncate operation, it leads to a deadlock. Actually, we don't need to acquire a lock on the target index as we build the cache entry using a historic snapshot and all the later changes are absorbed while decoding WAL. So, we wrote a special purpose function for logical replication to get a bitmap of replica identity attribute numbers where we get that information without locking the target index. We decided not to backpatch this as there doesn't seem to be any field complaint about this issue since it was introduced in commit `5dfd1e5a` in v11. Reported-by: Haiying Tang Author: Takamichi Osumi, test case by Li Japin Reviewed-by: Amit Kapila, Ajin Cherian Discussion: https://postgr.es/m/OS0PR01MB6113C2499C7DC70EE55ADB82FB759@OS0PR01MB6113.jpnprd01.prod.outlook.com	2021-04-27 08:28:26 +05:30
Tom Lane	04942bffd0	Remove rewriteTargetListIU's expansion of view targetlists in UPDATE. Commit `2ec993a7c`, which added triggers on views, modified the rewriter to add dummy entries like "SET x = x" for all columns that weren't actually being updated by the user in any UPDATE directed at a view. That was needed at the time to produce a complete "NEW" row to pass to the trigger. Later it was found to cause problems for ordinary updatable views, so commit `cab5dc5da` restricted it to happen only for trigger-updatable views. But in the wake of commit `86dc90056`, we really don't need it at all. nodeModifyTable.c populates the trigger "OLD" row from the whole-row variable that is generated for the view, and then it computes the "NEW" row using that old row and the UPDATE targetlist. So there is no need for the UPDATE tlist to have dummy entries, any more than it needs them for regular tables or other types of views. (The comments for rewriteTargetListIU suggest that we must do this for correct expansion of NEW references in rules, but I now think that that was just lazy comment editing in `2ec993a7c`. If we didn't need it for rules on views before there were triggers, we don't need it after that.) This essentially propagates 86dc90056's decision that we don't need dummy column updates into the view case. Aside from making the different cases more uniform and hence possibly forestalling future bugs, it ought to save a little bit of rewriter/planner effort. Discussion: https://postgr.es/m/2181213.1619397634@sss.pgh.pa.us	2021-04-26 13:58:00 -04:00
Amit Kapila	f25a4584c6	Avoid sending prepare multiple times while decoding. We send the prepare for the concurrently aborted xacts so that later when rollback prepared is decoded and sent, the downstream should be able to rollback such a xact. For 'streaming' case (when we send changes for in-progress transactions), we were sending prepare twice when concurrent abort was detected. Author: Peter Smith Reviewed-by: Amit Kapila Discussion: https://postgr.es/m/f82133c6-6055-b400-7922-97dae9f2b50b@enterprisedb.com	2021-04-26 11:27:44 +05:30
Amit Kapila	6d2e87a077	Fix typo in reorderbuffer.c. Author: Peter Smith Discussion: https://postgr.es/m/CAHut+PtvzuYY0zu=dVRK_WVz5WGos1+otZWgEWqjha1ncoSRag@mail.gmail.com	2021-04-26 08:42:46 +05:30
Tom Lane	08a9869665	Update comments for rewriteTargetListIU(). This function's behavior for UPDATE on a trigger-updatable view was justified by analogy to what preptlist.c used to do for UPDATE on regular tables. Since preptlist.c hasn't done that since `86dc90056`, that argument is no longer sensible, let alone convincing. I think we do still need it to act that way, so update the comment to explain why.	2021-04-25 18:02:03 -04:00
Michael Paquier	9b5558e7ad	Fix come comments in execMain.c `1375422` has refactored this area of the executor code, and some comments went out-of-sync. Author: Yukun Wang Reviewed-by: Amul Sul Discussion: https://postgr.es/m/OS0PR01MB60033394FCAEF79B98F078F5B4459@OS0PR01MB6003.jpnprd01.prod.outlook.com	2021-04-24 15:07:04 +09:00
Michael Paquier	4aba61b870	Add some forgotten LSN_FORMAT_ARGS() in xlogreader.c `6f6f284` has introduced a specific macro to make printf()-ing of LSNs easier. This takes care of what looks like the remaining code paths that did not get the call. Author: Michael Paquier Reviewed-by: Kyotaro Horiguchi, Tom Lane Discussion: https://postgr.es/m/YIJS9x6K8ruizN7j@paquier.xyz	2021-04-24 09:09:02 +09:00
Peter Eisentraut	82c3cd9741	Factor out system call names from error messages Instead, put them in via a format placeholder. This reduces the number of distinct translatable messages and also reduces the chances of typos during translation. We already did this for the system call arguments in a number of cases, so this is just the same thing taken a bit further. Discussion: https://www.postgresql.org/message-id/flat/92d6f545-5102-65d8-3c87-489f71ea0a37%40enterprisedb.com	2021-04-23 14:21:37 +02:00
Peter Eisentraut	9486844f30	Use correct format placeholder for WSAGetLastError() Some code thought this was unsigned, but it's signed int.	2021-04-23 14:21:37 +02:00
Alexander Korotkov	6bbcff096f	Mark multirange_constructor0() and multirange_constructor2() strict These functions shouldn't receive null arguments: multirange_constructor0() doesn't have any arguments while multirange_constructor2() has a single array argument, which is never null. But mark them strict anyway for the sake of uniformity. Also, make checks for null arguments use elog() instead of ereport() as these errors should normally be never thrown. And adjust corresponding comments. Catversion is bumped. Reported-by: Peter Eisentraut Discussion: https://postgr.es/m/0f783a96-8d67-9e71-996b-f34a7352eeef%40enterprisedb.com	2021-04-23 13:25:45 +03:00
Fujii Masao	3f20d5f370	Reorder COMPRESSION option in gram.y and parsenodes.h into alphabetical order. Commit `bbe0a81db6` introduced "INCLUDING COMPRESSION" option in CREATE TABLE command, but previously TableLikeOption in gram.y and parsenodes.h didn't classify this new option in alphabetical order with the rest. Author: Fujii Masao Reviewed-by: Michael Paquier Discussion: https://postgr.es/m/YHerAixOhfR1ryXa@paquier.xyz	2021-04-23 19:10:24 +09:00
Peter Eisentraut	7776a23a4b	Fix incorrect format placeholder	2021-04-23 07:21:13 +02:00
Michael Paquier	45c0c5f70e	Fix some comments in fmgr.c Oversight in `2a0faed`. Author: Hou Zhijie Discussion: https://postgr.es/m/OS0PR01MB5716405E2464D85E6DB6DC0794469@OS0PR01MB5716.jpnprd01.prod.outlook.com	2021-04-23 13:34:02 +09:00
Michael Paquier	62aa2bb293	Remove use of [U]INT64_FORMAT in some translatable strings %lld with (long long), or %llu with (unsigned long long) are more adapted. This is similar to `3286065`. Author: Kyotaro Horiguchi Discussion: https://postgr.es/m/20210421.200000.1462448394029407895.horikyota.ntt@gmail.com	2021-04-23 13:25:49 +09:00
Etsuro Fujita	bb684c82f7	Minor code cleanup in asynchronous execution support. This is cleanup for commit 27e1f1456: * ExecAppendAsyncEventWait(), which was modified a bit further by commit `a8af856d3`, duplicated the same nevents calculation. Simplify the code a little bit to avoid the duplication. Update comments there. * Add an assertion to ExecAppendAsyncRequest(). * Update a comment about merging the async_capable options from input relations in merge_fdw_options(), per complaint from Kyotaro Horiguchi. * Add a comment for fetch_more_data_begin(). Author: Etsuro Fujita Discussion: https://postgr.es/m/CAPmGK1637W30Wx3MnrReewhafn6F_0J76mrJGoFXFnpPq4QfvA%40mail.gmail.com	2021-04-23 12:00:00 +09:00
Tom Lane	d479d00285	Don't crash on reference to an un-available system column. Adopt a more consistent policy about what slot-type-specific getsysattr functions should do when system attributes are not available. To wit, they should all throw the same user-oriented error, rather than variously crashing or emitting developer-oriented messages. This closes a identifiable problem in commits `a71cfc56b` and `3fb93103a` (in v13 and v12), so back-patch into those branches, along with a test case to try to ensure we don't break it again. It is not known that any of the former crash cases are reachable in HEAD, but this seems like a good safety improvement in any case. Discussion: https://postgr.es/m/141051591267657@mail.yandex.ru	2021-04-22 17:30:55 -04:00
Alvaro Herrera	43b55ec4bc	Fix uninitialized memory bug Have interested callers of find_inheritance_children set the detached_exist value to false prior to calling it, so that that routine only has to set it true in the rare cases where it is necessary. Don't touch it otherwise. Per buildfarm member thorntail (which reported a UBSan failure here).	2021-04-22 16:04:48 -04:00
Alvaro Herrera	8aba932251	Fix relcache inconsistency hazard in partition detach During queries coming from ri_triggers.c, we need to omit partitions that are marked pending detach -- otherwise, the RI query is tricked into allowing a row into the referencing table whose corresponding row is in the detached partition. Which is bogus: once the detach operation completes, the row becomes an orphan. However, the code was not doing that in repeatable-read transactions, because relcache kept a copy of the partition descriptor that included the partition, and used it in the RI query. This commit changes the partdesc cache code to only keep descriptors that aren't dependent on a snapshot (namely: those where no detached partition exist, and those where detached partitions are included). When a partdesc-without- detached-partitions is requested, we create one afresh each time; also, those partdescs are stored in PortalContext instead of CacheMemoryContext. find_inheritance_children gets a new output *detached_exist boolean, which indicates whether any partition marked pending-detach is found. Its "include_detached" input flag is changed to "omit_detached", because that name captures desired the semantics more naturally. CreatePartitionDirectory() and RelationGetPartitionDesc() arguments are identically renamed. This was noticed because a buildfarm member that runs with relcache clobbering, which would not keep the improperly cached partdesc, broke one test, which led us to realize that the expected output of that test was bogus. This commit also corrects that expected output. Author: Amit Langote <amitlangote09@gmail.com> Author: Álvaro Herrera <alvherre@alvh.no-ip.org> Discussion: https://postgr.es/m/3269784.1617215412@sss.pgh.pa.us	2021-04-22 15:13:25 -04:00
Michael Paquier	f3b141c482	Fix relation leak for subscribers firing triggers in logical replication Creating a trigger on a relation to which an apply operation is triggered would cause a relation leak once the change gets committed, as the executor would miss that the relation needs to be closed beforehand. This issue got introduced with the refactoring done in `1375422c`, where it becomes necessary to track relations within es_opened_result_relations to make sure that they are closed. We have discussed using ExecInitResultRelation() coupled with ExecCloseResultRelations() for the relations in need of tracking by the apply operations in the subscribers, which would simplify greatly the opening and closing of indexes, but this requires a larger rework and reorganization of the worker code, particularly for the tuple routing part. And that's not really welcome post feature freeze. So, for now, settle down to the same solution as TRUNCATE which is to fill in es_opened_result_relations with the relation opened, to make sure that ExecGetTriggerResultRel() finds them and that they get closed. The code is lightly refactored so as a relation is not registered three times for each DML code path, making the whole a bit easier to follow. Reported-by: Tang Haiying, Shi Yu, Hou Zhijie Author: Amit Langote, Masahiko Sawada, Hou Zhijie Reviewed-by: Amit Kapila, Michael Paquier Discussion: https://postgr.es/m/OS0PR01MB611383FA0FE92EB9DE21946AFB769@OS0PR01MB6113.jpnprd01.prod.outlook.com	2021-04-22 12:48:54 +09:00
Alvaro Herrera	7c298c6573	Add comment about extract_autovac_opts not holding lock Per observation from Tom Lane. Discussion: https://postgr.es/m/1901125.1617904665@sss.pgh.pa.us	2021-04-21 18:36:12 -04:00
Alvaro Herrera	7b357cc6ae	Don't add a redundant constraint when detaching a partition On ALTER TABLE .. DETACH CONCURRENTLY, we add a new table constraint that duplicates the partition constraint. But if the partition already has another constraint that implies that one, then that's unnecessary. We were already avoiding the addition of a duplicate constraint if there was an exact 'equal' match -- this just improves the quality of the check. Author: Justin Pryzby <pryzby@telsasoft.com> Reviewed-by: Álvaro Herrera <alvherre@alvh.no-ip.org> Discussion: https://postgr.es/m/20210410184226.GY6592@telsasoft.com	2021-04-21 18:12:05 -04:00
Peter Eisentraut	d84ffffe58	Add DISTINCT to information schema usage views Since pg_depend can contain duplicate entries, we need to eliminate those in information schema views that build on pg_depend, using DISTINCT. Some of the older views already did that correctly, but some of the more recently added ones didn't. (In some of these views, it might not be possible to reproduce the issue because of how the implementation happens to deduplicate dependencies while recording them, but it seems better to keep this consistent in all cases.)	2021-04-21 11:54:47 +02:00
Peter Eisentraut	544b28088f	doc: Improve hyphenation consistency	2021-04-21 08:14:43 +02:00
Peter Eisentraut	f0ec598b43	Fix typo	2021-04-21 08:07:37 +02:00
Tom Lane	783be78ca9	Improve WAL record descriptions for SP-GiST records. While tracking down the bug fixed in the preceding commit, I got quite annoyed by the low quality of spg_desc's output. Add missing fields, try to make the formatting consistent.	2021-04-20 17:01:49 -04:00
Bruce Momjian	db01f797dd	Fix interaction of log_line_prefix's query_id and log_statement log_statement is issued before query_id can be computed, so properly clear the value, and document the interaction. Reported-by: Fujii Masao, Michael Paquier Discussion: https://postgr.es/m/YHPkU8hFi4no4NSw@paquier.xyz Author: Julien Rouhaud	2021-04-20 12:57:59 -04:00
Bruce Momjian	9660834dd8	adjust query id feature to use pg_stat_activity.query_id Previously, it was pg_stat_activity.queryid to match the pg_stat_statements queryid column. This is an adjustment to patch `4f0b0966c8`. This also adjusts some of the internal function calls to match. Catversion bumped. Reported-by: Álvaro Herrera, Julien Rouhaud Discussion: https://postgr.es/m/20210408032704.GA7498@alvherre.pgsql	2021-04-20 12:22:26 -04:00
Tom Lane	7645376774	Rename find_em_expr_usable_for_sorting_rel. I didn't particularly like this function name, as it fails to express what's going on. Also, returning the sort expression alone isn't too helpful --- typically, a caller would also need some other fields of the EquivalenceMember. But the sole caller really only needs a bool result, so let's make it "bool relation_can_be_sorted_early()". Discussion: https://postgr.es/m/91f3ec99-85a4-fa55-ea74-33f85a5c651f@swarm64.com	2021-04-20 11:37:36 -04:00
Tom Lane	3753982441	Fix planner failure in some cases of sorting by an aggregate. An oversight introduced by the incremental-sort patches caused "could not find pathkey item to sort" errors in some situations where a sort key involves an aggregate or window function. The basic problem here is that find_em_expr_usable_for_sorting_rel isn't properly modeling what prepare_sort_from_pathkeys will do later. Rather than hoping we can keep those functions in sync, let's refactor so that they actually share the code for identifying a suitable sort expression. With this refactoring, tlist.c's tlist_member_ignore_relabel is unused. I removed it in HEAD but left it in place in v13, in case any extensions are using it. Per report from Luc Vlaming. Back-patch to v13 where the problem arose. James Coleman and Tom Lane Discussion: https://postgr.es/m/91f3ec99-85a4-fa55-ea74-33f85a5c651f@swarm64.com	2021-04-20 11:32:02 -04:00
Magnus Hagander	8b4b5669cd	Fix typo in comment Author: Julien Rouhaud Backpatch-through: 11 Discussion: https://postgr.es/m/20210420121659.odjueyd4rpilorn5@nol	2021-04-20 14:35:16 +02:00
Peter Geoghegan	7136bf34f2	Document LP_DEAD accounting issues in VACUUM. Document VACUUM's soft assumption that any LP_DEAD items encountered during pruning will become LP_UNUSED items before VACUUM finishes up. This is integral to the accounting used by VACUUM to generate its final report on the table to the stats collector. It also affects how VACUUM determines which heap pages are truncatable. In both cases VACUUM is concerned with the likely contents of the page in the near future, not the current contents of the page. This state of affairs created the false impression that VACUUM's dead tuple accounting had significant difference with similar accounting used during ANALYZE. There were and are no substantive differences, at least when the soft assumption completely works out. This is far clearer now. Also document cases where things don't quite work out for VACUUM's dead tuple accounting. It's possible that a significant number of LP_DEAD items will be left behind by VACUUM, and won't be recorded as remaining dead tuples in VACUUM's statistics collector report. This behavior dates back to commit `a96c41fe`, which taught VACUUM to run without index and heap vacuuming at the user's request. The failsafe mechanism added to VACUUM more recently by commit `1e55e7d1` takes the same approach to dead tuple accounting. Reported-By: Masahiko Sawada <sawada.mshk@gmail.com> Discussion: https://postgr.es/m/CAH2-Wz=Jmtu18PrsYq3EvvZJGOmZqSO2u3bvKpx9xJa5uhNp=Q@mail.gmail.com	2021-04-19 18:55:31 -07:00
Peter Eisentraut	640b91c3ed	Use correct format placeholder for pids Should be signed, not unsigned.	2021-04-19 10:43:18 +02:00
Michael Paquier	7ef8b52cf0	Fix typos and grammar in comments and docs Author: Justin Pryzby Discussion: https://postgr.es/m/20210416070310.GG3315@telsasoft.com	2021-04-19 11:32:30 +09:00
Thomas Munro	8e861eaae8	Explain postmaster's treatment of SIGURG. Add a few words of comment to explain why SIGURG doesn't follow the dummy_handler pattern used for SIGUSR2, since that might otherwise appear to be a bug. Discussion: https://postgr.es/m/4006115.1618577212%40sss.pgh.pa.us	2021-04-19 10:35:51 +12:00
Peter Eisentraut	f59b58e2a1	Use correct format placeholder for block numbers Should be %u rather than %d.	2021-04-17 09:40:50 +02:00
Tom Lane	f24b156997	Rethink extraction of collation dependencies. As it stands, find_expr_references_walker() pays attention to leaf-node collation fields while ignoring the input collations of actual function and operator nodes. That seems exactly backwards from a semantic standpoint, and it leads to reporting dependencies on collations that really have nothing to do with the expression's behavior. Hence, rewrite to look at function input collations instead. This isn't completely perfect either; it fails to account for the behavior of record_eq and its siblings. (The previous coding at least gave an approximation of that, though I think it could be fooled pretty easily into considering the columns of irrelevant composite types.) We may be able to improve on this later, but for now this should satisfy the buildfarm members that didn't like `ef387bed8`. In passing fix some oversights in GetTypeCollations(), and get rid of its duplicative de-duplications. (I'm worried that it's still potentially O(N^2) or worse, but this makes it a little better.) Discussion: https://postgr.es/m/3564817.1618420687@sss.pgh.pa.us	2021-04-16 22:23:46 -04:00
Tom Lane	767982e362	Convert built-in SQL-language functions to SQL-standard-body style. Adopt the new pre-parsed representation for all built-in and information_schema SQL-language functions, except for a small number that can't presently be converted because they have polymorphic arguments. This eliminates residual hazards around search-path safety of these functions, and might provide some small performance benefits by reducing parsing costs. It seems useful also to provide more test coverage for the SQL-standard-body feature. Discussion: https://postgr.es/m/3956760.1618529139@sss.pgh.pa.us	2021-04-16 18:37:02 -04:00
Tom Lane	e809493725	Split function definitions out of system_views.sql into a new file. Invent system_functions.sql to carry the function definitions that were formerly in system_views.sql. The function definitions were already a quarter of the file and are about to be more, so it seems appropriate to give them their own home. In passing, fix an oversight in dfb75e478: it neglected to call check_input() for system_constraints.sql. Discussion: https://postgr.es/m/3956760.1618529139@sss.pgh.pa.us	2021-04-16 18:37:02 -04:00
Tom Lane	ef387bed87	Fix bogus collation-version-recording logic. recordMultipleDependencies had the wrong scope for its "version" variable, allowing a version label to leak from the collation entry it was meant for to subsequent non-collation entries. This is relatively hard to trigger because of the OID-descending order that the inputs will normally arrive in: subsequent non-collation items will tend to be pinned. But it can be exhibited easily with a custom collation. Also, don't special-case the default collation, but instead ignore pinned-ness of a collation when we've found a version for it. This avoids creating useless pg_depend entries, and removes a not-very- future-proof assumption that C, POSIX, and DEFAULT are the only pinned collations. A small problem is that, because the default collation may or may not have a version, the regression tests can't assume anything about whether dependency entries will be made for it. This seems OK though since it's now handled just the same as other collations, and we have test cases for both versioned and unversioned collations. Fixes oversights in commit `257836a75`. Thanks to Julien Rouhaud for review. Discussion: https://postgr.es/m/3564817.1618420687@sss.pgh.pa.us	2021-04-16 12:26:50 -04:00
Tom Lane	f90c708a04	Fix wrong units in two ExplainPropertyFloat calls. This is only a latent bug, since these calls are only reached for non-text output formats, and currently none of those will print the units. Still, we should get it right in case that ever changes. Justin Pryzby Discussion: https://postgr.es/m/20210415163846.GA3315@telsasoft.com	2021-04-16 11:30:27 -04:00
Amit Kapila	f5fc2f5b23	Add information of total data processed to replication slot stats. This adds the statistics about total transactions count and total transaction data logically sent to the decoding output plugin from ReorderBuffer. Users can query the pg_stat_replication_slots view to check these stats. Suggested-by: Andres Freund Author: Vignesh C and Amit Kapila Reviewed-by: Sawada Masahiko, Amit Kapila Discussion: https://postgr.es/m/20210319185247.ldebgpdaxsowiflw@alap3.anarazel.de	2021-04-16 07:34:43 +05:30
Tom Lane	409723365b	Provide query source text when parsing a SQL-standard function body. Without this, we lose error cursor positions, as shown in the modified regression test result. Discussion: https://postgr.es/m/2197698.1617984583@sss.pgh.pa.us	2021-04-15 17:24:12 -04:00
Tom Lane	83efce7a1e	Revert "Cope with NULL query string in ExecInitParallelPlan()." This reverts commit `b3ee4c5038`. We don't need it in the wake of the preceding commit, which added an upstream check that the querystring isn't null. Discussion: https://postgr.es/m/2197698.1617984583@sss.pgh.pa.us	2021-04-15 17:17:45 -04:00
Tom Lane	1111b2668d	Undo decision to allow pg_proc.prosrc to be NULL. Commit `e717a9a18` changed the longstanding rule that prosrc is NOT NULL because when a SQL-language function is written in SQL-standard style, we don't currently have anything useful to put there. This seems a poor decision though, as it could easily have negative impacts on external PLs (opening them to crashes they didn't use to have, for instance). SQL-function-related code can just as easily test "is prosqlbody not null" as "is prosrc null", so there's no real gain there either. Hence, revert the NOT NULL marking removal and adjust related logic. For now, we just put an empty string into prosrc for SQL-standard functions. Maybe we'll have a better idea later, although the history of things like pg_attrdef.adsrc suggests that it's not easy to maintain a string equivalent of a node tree. This also adds an assertion that queryDesc->sourceText != NULL to standard_ExecutorStart. We'd been silently relying on that for awhile, so let's make it less silent. Also fix some overlooked documentation and test cases. Discussion: https://postgr.es/m/2197698.1617984583@sss.pgh.pa.us	2021-04-15 17:17:20 -04:00
Tom Lane	e1623b7d86	Fix obsolete comments referencing JoinPathExtraData.extra_lateral_rels. That field went away in commit `edca44b15`, but it seems that commit `45be99f8c` re-introduced some comments mentioning it. Noted by James Coleman, though this isn't exactly his proposed new wording. Also thanks to Justin Pryzby for software archaeology. Discussion: https://postgr.es/m/CAAaqYe8fxZjq3na+XkNx4C78gDqykH-7dbnzygm9Qa9nuDTePg@mail.gmail.com	2021-04-14 14:28:24 -04:00
Peter Eisentraut	07e5e66742	Improve quoting in some error messages	2021-04-14 09:11:29 +02:00
Michael Paquier	ac725ee0f9	doc: Move force_parallel_mode to section for developer options This GUC has always been classified as a planner option since its introduction in `7c944bd`, and was listed in postgresql.conf.sample. As this parameter exists for testing purposes, move it to the section dedicated to developer parameters and hence remove it from postgresql.conf.sample. This will avoid any temptation to play with it on production servers for users that should never really have to touch this parameter. The general description used for developer options is reworded a bit, to take into account the inclusion of force_parallel_mode, per a suggestion from Tom Lane. Per discussion between Tom Lane, Bruce Momjian, Justin Pryzby, Bharath Rupireddy and me. Author: Justin Pryzby, Tom Lane Discussion: https://postgr.es/m/20210403152402.GA8049@momjian.us	2021-04-14 15:55:55 +09:00
Amit Kapila	cca57c1d9b	Use NameData datatype for slotname in stats. This will make it consistent with the other usage of slotname in the code. In the passing, change pgstat_report_replslot signature to use a structure rather than multiple parameters. Reported-by: Andres Freund Author: Vignesh C Reviewed-by: Sawada Masahiko, Amit Kapila Discussion: https://postgr.es/m/20210319185247.ldebgpdaxsowiflw@alap3.anarazel.de	2021-04-14 08:55:03 +05:30
Tomas Vondra	20661c15db	Initialize t_self and t_tableOid in statext_expressions_load The function is building a fake heap tuple, but left some of the header fields (tid and table OID) uninitialized. Per Coverity report. Reported-by: Ranier Vilela Discussion: https://postgr.es/m/CAEudQApj6h8tZ0-eP5Af5PKc5NG1YUc7=SdN_99YoHS51fKa0Q@mail.gmail.com	2021-04-14 00:46:12 +02:00
Peter Geoghegan	60f1f09ff4	Don't truncate heap when VACUUM's failsafe is in effect. It seems like a good idea to bypass heap truncation when the wraparound failsafe mechanism (which was added in commit `1e55e7d1`) is in effect. Deliberately don't bypass heap truncation in the INDEX_CLEANUP=off case, even though it is similar to the failsafe case. There is already a separate reloption (and related VACUUM parameter) for that. Reported-By: Masahiko Sawada <sawada.mshk@gmail.com> Discussion: https://postgr.es/m/CAD21AoDWRh6oTN5T8wa+cpZUVpHXET8BJ8Da7WHVHpwkPP6KLg@mail.gmail.com	2021-04-13 12:58:31 -07:00
Tom Lane	6c0373ab77	Allow table-qualified variable names in ON CONFLICT ... WHERE. Previously you could only use unqualified variable names here. While that's not a functional deficiency, since only the target table can be referenced, it's a surprising inconsistency with the rules for partial-index predicates, on which this syntax is supposedly modeled. The fix for that is no harder than passing addToRelNameSpace = true to addNSItemToQuery. However, it's really pretty bogus for transformOnConflictArbiter and transformOnConflictClause to be messing with the namespace item for the target table at all. It's not theirs to manage, it results in duplicative creations of namespace items, and transformOnConflictClause wasn't even doing it quite correctly (that coding resulted in two nsitems for the target table, since it hadn't cleaned out the existing one). Hence, make transformInsertStmt responsible for setting up the target nsitem once for both these clauses and RETURNING. Also, arrange for ON CONFLICT ... UPDATE's "excluded" pseudo-relation to be added to the rangetable before we run transformOnConflictArbiter. This produces a more helpful HINT if someone writes "excluded.col" in the arbiter expression. Per bug #16958 from Lukas Eder. Although I agree this is a bug, the consequences are hardly severe, so no back-patch. Discussion: https://postgr.es/m/16958-963f638020de271c@postgresql.org	2021-04-13 15:39:41 -04:00
Tom Lane	69d5ca484b	Fix some inappropriately-disallowed uses of ALTER ROLE/DATABASE SET. Most GUC check hooks that inspect database state have special checks that prevent them from throwing hard errors for state-dependent issues when source == PGC_S_TEST. This allows, for example, "ALTER DATABASE d SET default_text_search_config = foo" when the "foo" configuration hasn't been created yet. Without this, we have problems during dump/reload or pg_upgrade, because pg_dump has no idea about possible dependencies of GUC values and can't ensure a safe restore ordering. However, check_role() and check_session_authorization() hadn't gotten the memo about that, and would throw hard errors anyway. It's not entirely clear what is the use-case for "ALTER ROLE x SET role = y", but we've now heard two independent complaints about that bollixing an upgrade, so apparently some people are doing it. Hence, fix these two functions to act more like other check hooks with similar needs. (But I did not change their insistence on being inside a transaction, as it's still not apparent that setting either GUC from the configuration file would be wise.) Also fix check_temp_buffers, which had a different form of the disease of making state-dependent checks without any exception for PGC_S_TEST. A cursory survey of other GUC check hooks did not find any more issues of this ilk. (There are a lot of interdependencies among PGC_POSTMASTER and PGC_SIGHUP GUCs, which may be a bad idea, but they're not relevant to the immediate concern because they can't be set via ALTER ROLE/DATABASE.) Per reports from Charlie Hornsby and Nathan Bossart. Back-patch to all supported branches. Discussion: https://postgr.es/m/HE1P189MB0523B31598B0C772C908088DB7709@HE1P189MB0523.EURP189.PROD.OUTLOOK.COM Discussion: https://postgr.es/m/20160711223641.1426.86096@wrigleys.postgresql.org	2021-04-13 15:10:18 -04:00
Tom Lane	c2db458c10	Redesign the caching done by get_cached_rowtype(). Previously, get_cached_rowtype() cached a pointer to a reference-counted tuple descriptor from the typcache, relying on the ExprContextCallback mechanism to release the tupdesc refcount when the expression tree using the tupdesc was destroyed. This worked fine when it was designed, but the introduction of within-DO-block COMMITs broke it. The refcount is logged in a transaction-lifespan resource owner, but plpgsql won't destroy simple expressions made within the DO block (before its first commit) until the DO block is exited. That results in a warning about a leaked tupdesc refcount when the COMMIT destroys the original resource owner, and then an error about the active resource owner not holding a matching refcount when the expression is destroyed. To fix, get rid of the need to have a shutdown callback at all, by instead caching a pointer to the relevant typcache entry. Those survive for the life of the backend, so we needn't worry about the pointer becoming stale. (For registered RECORD types, we can still cache a pointer to the tupdesc, knowing that it won't change for the life of the backend.) This mechanism has been in use in plpgsql and expandedrecord.c since commit `4b93f5799`, and seems to work well. This change requires modifying the ExprEvalStep structs used by the relevant expression step types, which is slightly worrisome for back-patching. However, there seems no good reason for extensions to be familiar with the details of these particular sub-structs. Per report from Rohit Bhogate. Back-patch to v11 where within-DO-block COMMITs became a thing. Discussion: https://postgr.es/m/CAAV6ZkQRCVBh8qAY+SZiHnz+U+FqAGBBDaDTjF2yiKa2nJSLKg@mail.gmail.com	2021-04-13 13:37:07 -04:00
Tom Lane	34f581c39e	Avoid improbable PANIC during heap_update. heap_update needs to clear any existing "all visible" flag on the old tuple's page (and on the new page too, if different). Per coding rules, to do this it must acquire pin on the appropriate visibility-map page while not holding exclusive buffer lock; which creates a race condition since someone else could set the flag whenever we're not holding the buffer lock. The code is supposed to handle that by re-checking the flag after acquiring buffer lock and retrying if it became set. However, one code path through heap_update itself, as well as one in its subroutine RelationGetBufferForTuple, failed to do this. The end result, in the unlikely event that a concurrent VACUUM did set the flag while we're transiently not holding lock, is a non-recurring "PANIC: wrong buffer passed to visibilitymap_clear" failure. This has been seen a few times in the buildfarm since recent VACUUM changes that added code paths that could set the all-visible flag while holding only exclusive buffer lock. Previously, the flag was (usually?) set only after doing LockBufferForCleanup, which would insist on buffer pin count zero, thus preventing the flag from becoming set partway through heap_update. However, it's clear that it's heap_update not VACUUM that's at fault here. What's less clear is whether there is any hazard from these bugs in released branches. heap_update is certainly violating API expectations, but if there is no code path that can set all-visible without a cleanup lock then it's only a latent bug. That's not 100% certain though, besides which we should worry about extensions or future back-patch fixes that could introduce such code paths. I chose to back-patch to v12. Fixing RelationGetBufferForTuple before that would require also back-patching portions of older fixes (notably `0d1fe9f74`), which is more code churn than seems prudent to fix a hypothetical issue. Discussion: https://postgr.es/m/2247102.1618008027@sss.pgh.pa.us	2021-04-13 12:17:24 -04:00
Noah Misch	455dbc010b	Use "-I." in directories holding Bison parsers, for Oracle compilers. With the Oracle Developer Studio 12.6 compiler, #line directives alter the current source file location for purposes of #include "..." directives. Hence, a VPATH build failed with 'cannot find include file: "specscanner.c"'. With two exceptions, parser-containing directories already add "-I. -I$(srcdir)"; eliminate the exceptions. Back-patch to 9.6 (all supported versions).	2021-04-12 19:24:41 -07:00
Thomas Munro	b1df6b696b	Fix potential SSI hazard in heap_update(). Commit `6f38d4dac3` failed to heed a warning about the stability of the value pointed to by "otid". The caller is allowed to pass in a pointer to newtup->t_self, which will be updated during the execution of the function. Instead, the SSI check should use the value we copy into oldtup.t_self near the top of the function. Not a live bug, because newtup->t_self doesn't really get updated until a bit later, but it was confusing and broke the rule established by the comment. Back-patch to 13. Reported-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/2689164.1618160085%40sss.pgh.pa.us	2021-04-13 13:02:56 +12:00
Tom Lane	c402b02b9f	Fix old bug with coercing the result of a COLLATE expression. There are hacks in parse_coerce.c to push down a requested coercion to below any CollateExpr that may appear. However, we did that even if the requested data type is non-collatable, leading to an invalid expression tree in which CollateExpr is applied to a non-collatable type. The fix is just to drop the CollateExpr altogether, reasoning that it's useless. This bug is ten years old, dating to the original addition of COLLATE support. The lack of field complaints suggests that there aren't a lot of user-visible consequences. We noticed the problem because it would trigger an assertion in DefineVirtualRelation if the invalid structure appears as an output column of a view; however, in a non-assert build, you don't see a crash just a (subtly incorrect) complaint about applying collation to a non-collatable type. I found that by putting the incorrect structure further down in a view, I could make a view definition that would fail dump/reload, per the added regression test case. But CollateExpr doesn't do anything at run-time, so this likely doesn't lead to any really exciting consequences. Per report from Yulin Pei. Back-patch to all supported branches. Discussion: https://postgr.es/m/HK0PR01MB22744393C474D503E16C8509F4709@HK0PR01MB2274.apcprd01.prod.exchangelabs.com	2021-04-12 14:37:49 -04:00
Michael Paquier	b094063cd1	Move log_autovacuum_min_duration into its correct sections This GUC has already been classified as LOGGING_WHAT, but its location in postgresql.conf.sample and the documentation did not reflect that, so fix those inconsistencies. Author: Justin Pryzby Discussion: https://postgr.es/m/20210404012546.GK6592@telsasoft.com	2021-04-12 13:53:17 +09:00
Michael Paquier	7a3972597f	Fix out-of-bound memory access for interval -> char conversion Using Roman numbers (via "RM" or "rm") for a conversion to calculate a number of months has never considered the case of negative numbers, where a conversion could easily cause out-of-bound memory accesses. The conversions in themselves were not completely consistent either, as specifying 12 would result in NULL, but it should mean XII. This commit reworks the conversion calculation to have a more consistent behavior: - If the number of months and years is 0, return NULL. - If the number of months is positive, return the exact month number. - If the number of months is negative, do a backward calculation, with -1 meaning December, -2 November, etc. Reported-by: Theodor Arsenij Larionov-Trichkin Author: Julien Rouhaud Discussion: https://postgr.es/m/16953-f255a18f8c51f1d5@postgresql.org backpatch-through: 9.6	2021-04-12 11:30:50 +09:00
Tom Lane	6277435a8a	Silence some Coverity warnings and improve code consistency. Coverity complained about possible overflow in expressions like intresult = tm->tm_sec * 1000000 + fsec; on the grounds that the multiplication would happen in 32-bit arithmetic before widening to the int64 result. I think these are all false positives because of the limited possible range of tm_sec; but nonetheless it seems silly to spell it like that when nearby lines have the identical computation written with a 64-bit constant. ... or more accurately, with an LL constant, which is not project style. Make all of these use INT64CONST(), as we do elsewhere. This is all new code from `a2da77cdb`, so no need for back-patch.	2021-04-11 17:02:04 -04:00
Tom Lane	9cb9233409	Fix uninitialized variable from commit `a4d75c86b`. The path for *exprs != NIL would misbehave, and likely crash, since pull_varattnos expects its last argument to be valid at call. Found by Coverity --- we have no coverage of this path in the regression tests.	2021-04-11 11:46:46 -04:00
Fujii Masao	81a23dd879	Avoid unnecessary table open/close in TRUNCATE command. ExecuteTruncate() filters out the duplicate tables specified in the TRUNCATE command, for example in the case where "TRUNCATE foo, foo" is executed. Such duplicate tables obviously don't need to be opened and closed because they are skipped. But previously it always opened the tables before checking whether they were duplicated ones or not, and then closed them if they were. That is, the duplicated tables were opened and closed unnecessarily. This commit changes ExecuteTruncate() so that it opens the table after it confirms that table is not duplicated one, which leads to avoid unnecessary table open/close. Do not back-patch because such unnecessary table open/close is not a bug though it exists in older versions. Author: Bharath Rupireddy Reviewed-by: Amul Sul, Fujii Masao Discussion: https://postgr.es/m/CALj2ACUdBO_sXJTa08OZ0YT0qk7F_gAmRa9hT4dxRcgPS4nsZA@mail.gmail.com	2021-04-12 00:05:58 +09:00
Fujii Masao	08aa89b326	Remove COMMIT_TS_SETTS record. Commit `438fc4a39c` prevented the WAL replay from writing COMMIT_TS_SETTS record. By this change there is no code that generates COMMIT_TS_SETTS record in PostgreSQL core. Also we can think that there are no extensions using the record because we've not received so far any complaints about the issue that commit `438fc4a39c` fixed. Therefore this commit removes COMMIT_TS_SETTS record and its related code. Even without this record, the timestamp required for commit timestamp feature can be acquired from the COMMIT record. Bump WAL page magic. Reported-by: lx zou <zoulx1982@163.com> Author: Fujii Masao Reviewed-by: Alvaro Herrera Discussion: https://postgr.es/m/16931-620d0f2fdc6108f1@postgresql.org	2021-04-12 00:04:30 +09:00
Noah Misch	df5efaf441	Standardize pg_authid oid_symbol values. Commit `c9c41c7a33` used two different naming patterns. Standardize on the majority pattern, which was the only pattern in the last reviewed version of that commit.	2021-04-10 12:01:41 -07:00
Peter Eisentraut	496e58bb0e	Improve behavior of date_bin with origin in the future Currently, when the origin is after the input, the result is the timestamp at the end of the bin, rather than the beginning as expected. This puts the result consistently at the beginning of the bin. Author: John Naylor <john.naylor@enterprisedb.com> Discussion: https://www.postgresql.org/message-id/CAFBsxsGjLDxQofRfH+d4KSAXxPf3MMevUG7s6EDfdBOvHLDLjw@mail.gmail.com	2021-04-10 19:33:46 +02:00
Tom Lane	07b76833b1	Doc: update documentation of check_function_bodies. Adjust docs and description string to note that check_function_bodies applies to procedures too. (In hindsight it should have been named check_routine_bodies, but it seems too late for that now.) Daniel Westermann Discussion: https://postgr.es/m/GV0P278MB04834A9EB9A74B036DC7CE49D2739@GV0P278MB0483.CHEP278.PROD.OUTLOOK.COM	2021-04-10 12:08:28 -04:00
David Rowley	152d33bcce	Improve slightly misleading comments in nodeFuncs.c There were some comments in nodeFuncs.c that, depending on your interpretation of the word "result", could lead you to believe that the comments were badly copied and pasted from somewhere else. If you thought of "result" as the return value of the function that the comment is written in, then you'd be misled. However, if you'd correctly interpreted "result" to mean the result type of the given node type, you'd not have seen any issues. Here we do a small cleanup to try to prevent any future misinterpretations. Per wording suggestion from Tom Lane. Reviewed-by: Tom Lane Discussion: https://postgr.es/m/CAApHDvp+Bw=2Qiu5=uXMKfC7gd0+B=4JvexVgGJU=am2g9a1CA@mail.gmail.com	2021-04-10 19:19:45 +12:00
Thomas Munro	846d35b2dc	Make new GUC short descriptions more consistent. Reported-by: Daniel Westermann (DWE) <daniel.westermann@dbi-services.com> Discussion: https://postgr.es/m/GV0P278MB0483490FEAC879DCA5ED583DD2739%40GV0P278MB0483.CHEP278.PROD.OUTLOOK.COM	2021-04-10 08:41:07 +12:00
Thomas Munro	dc88460c24	Doc: Review for "Optionally prefetch referenced data in recovery." Typos, corrections and language improvements in the docs, and a few in code comments too. Reported-by: Justin Pryzby <pryzby@telsasoft.com> Discussion: https://postgr.es/m/20210409033703.GP6592%40telsasoft.com	2021-04-10 08:21:53 +12:00
Alvaro Herrera	0e69f705cc	Set pg_class.reltuples for partitioned tables When commit `0827e8af70` added auto-analyze support for partitioned tables, it included code to obtain reltuples for the partitioned table as a number of catalog accesses to read pg_class.reltuples for each partition. That's not only very inefficient, but also problematic because autovacuum doesn't hold any locks on any of those tables -- and doesn't want to. Replace that code with a read of pg_class.reltuples for the partitioned table, and make sure ANALYZE and TRUNCATE properly maintain that value. I found no code that would be affected by the change of relpages from zero to non-zero for partitioned tables, and no other code that should be maintaining it, but if there is, hopefully it'll be an easy fix. Per buildfarm. Author: Álvaro Herrera <alvherre@alvh.no-ip.org> Reviewed-by: Zhihong Yu <zyu@yugabyte.com> Discussion: https://postgr.es/m/1823909.1617862590@sss.pgh.pa.us	2021-04-09 11:50:33 -04:00
Magnus Hagander	1798d8f8b6	Fix typo Author: Daniel Westermann Backpatch-through: 9.6 Discussion: https://postgr.es/m/GV0P278MB0483A7AA85BAFCC06D90F453D2739@GV0P278MB0483.CHEP278.PROD.OUTLOOK.COM	2021-04-09 12:40:56 +02:00
Michael Paquier	609b0652af	Fix typos and grammar in documentation and code comments Comment fixes are applied on HEAD, and documentation improvements are applied on back-branches where needed. Author: Justin Pryzby Discussion: https://postgr.es/m/20210408164008.GJ6592@telsasoft.com Backpatch-through: 9.6	2021-04-09 13:53:07 +09:00
Peter Geoghegan	796092fb84	Silence another _bt_check_unique compiler warning. Per complaint from Tom Lane Discussion: https://postgr.es/m/1922884.1617909599@sss.pgh.pa.us	2021-04-08 12:54:31 -07:00
Tom Lane	01add89454	Suppress uninitialized-variable warning. Several buildfarm critters that don't usually produce such warnings are complaining about `e717a9a18`. I think it's actually safe, but move initialization to silence the warning.	2021-04-08 15:14:26 -04:00
Bruce Momjian	0f61727b75	Fixes for query_id feature Ignore parallel workers in pg_stat_statements Oversight in `4f0b0966c8` which exposed queryid in parallel workers. Counters are aggregated by the main backend process so parallel workers would report duplicated activity, and could also report activity for the wrong entry as they are only aware of the top level queryid. Fix thinko in pg_stat_get_activity when retrieving the queryid. Remove unnecessary call to pgstat_report_queryid(). Reported-by: Amit Kapila, Andres Freund, Thomas Munro Discussion: https://postgr.es/m/20210408051735.lfbdzun5zdlax5gd@alap3.anarazel.de p634GTSOqnDW86Owrn6qDAVosC5dJjXjp7BMfc5Gz1Q@mail.gmail.com Author: Julien Rouhaud	2021-04-08 11:16:01 -04:00
Fujii Masao	8ff1c94649	Allow TRUNCATE command to truncate foreign tables. This commit introduces new foreign data wrapper API for TRUNCATE. It extends TRUNCATE command so that it accepts foreign tables as the targets to truncate and invokes that API. Also it extends postgres_fdw so that it can issue TRUNCATE command to foreign servers, by adding new routine for that TRUNCATE API. The information about options specified in TRUNCATE command, e.g., ONLY, CACADE, etc is passed to FDW via API. The list of foreign tables to truncate is also passed to FDW. FDW truncates the foreign data sources that the passed foreign tables specify, based on those information. For example, postgres_fdw constructs TRUNCATE command using them and issues it to the foreign server. For performance, TRUNCATE command invokes the FDW routine for TRUNCATE once per foreign server that foreign tables to truncate belong to. Author: Kazutaka Onishi, Kohei KaiGai, slightly modified by Fujii Masao Reviewed-by: Bharath Rupireddy, Michael Paquier, Zhihong Yu, Alvaro Herrera, Stephen Frost, Ashutosh Bapat, Amit Langote, Daniel Gustafsson, Ibrar Ahmed, Fujii Masao Discussion: https://postgr.es/m/CAOP8fzb_gkReLput7OvOK+8NHgw-RKqNv59vem7=524krQTcWA@mail.gmail.com Discussion: https://postgr.es/m/CAJuF6cMWDDqU-vn_knZgma+2GMaout68YUgn1uyDnexRhqqM5Q@mail.gmail.com	2021-04-08 20:56:08 +09:00
David Rowley	50e17ad281	Speedup ScalarArrayOpExpr evaluation ScalarArrayOpExprs with "useOr=true" and a set of Consts on the righthand side have traditionally been evaluated by using a linear search over the array. When these arrays contain large numbers of elements then this linear search could become a significant part of execution time. Here we add a new method of evaluating ScalarArrayOpExpr expressions to allow them to be evaluated by first building a hash table containing each element, then on subsequent evaluations, we just probe that hash table to determine if there is a match. The planner is in charge of determining when this optimization is possible and it enables it by setting hashfuncid in the ScalarArrayOpExpr. The executor will only perform the hash table evaluation when the hashfuncid is set. This means that not all cases are optimized. For example CHECK constraints containing an IN clause won't go through the planner, so won't get the hashfuncid set. We could maybe do something about that at some later date. The reason we're not doing it now is from fear that we may slow down cases where the expression is evaluated only once. Those cases can be common, for example, a single row INSERT to a table with a CHECK constraint containing an IN clause. In the planner, we enable this when there are suitable hash functions for the ScalarArrayOpExpr's operator and only when there is at least MIN_ARRAY_SIZE_FOR_HASHED_SAOP elements in the array. The threshold is currently set to 9. Author: James Coleman, David Rowley Reviewed-by: David Rowley, Tomas Vondra, Heikki Linnakangas Discussion: https://postgr.es/m/CAAaqYe8x62+=wn0zvNKCj55tPpg-JBHzhZFFc6ANovdqFw7-dA@mail.gmail.com	2021-04-08 23:51:22 +12:00
Thomas Munro	1d257577e0	Optionally prefetch referenced data in recovery. Introduce a new GUC recovery_prefetch, disabled by default. When enabled, look ahead in the WAL and try to initiate asynchronous reading of referenced data blocks that are not yet cached in our buffer pool. For now, this is done with posix_fadvise(), which has several caveats. Better mechanisms will follow in later work on the I/O subsystem. The GUC maintenance_io_concurrency is used to limit the number of concurrent I/Os we allow ourselves to initiate, based on pessimistic heuristics used to infer that I/Os have begun and completed. The GUC wal_decode_buffer_size is used to limit the maximum distance we are prepared to read ahead in the WAL to find uncached blocks. Reviewed-by: Alvaro Herrera <alvherre@2ndquadrant.com> (parts) Reviewed-by: Andres Freund <andres@anarazel.de> (parts) Reviewed-by: Tomas Vondra <tomas.vondra@2ndquadrant.com> (parts) Tested-by: Tomas Vondra <tomas.vondra@2ndquadrant.com> Tested-by: Jakub Wartak <Jakub.Wartak@tomtom.com> Tested-by: Dmitry Dolgov <9erthalion6@gmail.com> Tested-by: Sait Talha Nisanci <Sait.Nisanci@microsoft.com> Discussion: https://postgr.es/m/CA%2BhUKGJ4VJN8ttxScUFM8dOKX0BrBiboo5uz1cq%3DAovOddfHpA%40mail.gmail.com	2021-04-08 23:20:42 +12:00
Thomas Munro	f003d9f872	Add circular WAL decoding buffer. Teach xlogreader.c to decode its output into a circular buffer, to support optimizations based on looking ahead. * XLogReadRecord() works as before, consuming records one by one, and allowing them to be examined via the traditional XLogRecGetXXX() macros. * An alternative new interface XLogNextRecord() is added that returns pointers to DecodedXLogRecord structs that can be examined directly. * XLogReadAhead() provides a second cursor that lets you see further ahead, as long as data is available and there is enough space in the decoding buffer. This returns DecodedXLogRecord pointers to the caller, but also adds them to a queue of records that will later be consumed by XLogNextRecord()/XLogReadRecord(). The buffer's size is controlled with wal_decode_buffer_size. The buffer could potentially be placed into shared memory, for future projects. Large records that don't fit in the circular buffer are called "oversized" and allocated separately with palloc(). Discussion: https://postgr.es/m/CA+hUKGJ4VJN8ttxScUFM8dOKX0BrBiboo5uz1cq=AovOddfHpA@mail.gmail.com	2021-04-08 23:20:42 +12:00
Thomas Munro	323cbe7c7d	Remove read_page callback from XLogReader. Previously, the XLogReader module would fetch new input data using a callback function. Redesign the interface so that it tells the caller to insert more data with a special return value instead. This API suits later patches for prefetching, encryption and maybe other future projects that would otherwise require continually extending the callback interface. As incidental cleanup work, move global variables readOff, readLen and readSegNo inside XlogReaderState. Author: Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> Author: Heikki Linnakangas <hlinnaka@iki.fi> (parts of earlier version) Reviewed-by: Antonin Houska <ah@cybertec.at> Reviewed-by: Alvaro Herrera <alvherre@2ndquadrant.com> Reviewed-by: Takashi Menjo <takashi.menjo@gmail.com> Reviewed-by: Andres Freund <andres@anarazel.de> Reviewed-by: Thomas Munro <thomas.munro@gmail.com> Discussion: https://postgr.es/m/20190418.210257.43726183.horiguchi.kyotaro%40lab.ntt.co.jp	2021-04-08 23:20:42 +12:00
David Rowley	5ac9c43073	Cleanup partition pruning step generation There was some code in gen_prune_steps_from_opexps that needlessly checked a list was not empty when it clearly had to contain at least one item. This prompted a further cleanup operation in partprune.c. Additionally, the previous code could end up adding additional needless INTERSECT steps. However, those do not appear to be able to cause any misbehavior. gen_prune_steps_from_opexps is now no longer in charge of generating combine pruning steps. Instead, gen_partprune_steps_internal, which already does some combine step creation has been given the sole responsibility of generating all combine steps. This means that when we recursively call gen_partprune_steps_internal, since it always now adds a combine step when it produces multiple steps, we can just pay attention to the final step returned. In passing, do quite a bit of work on the comments to try to more clearly explain the role of both gen_partprune_steps_internal and gen_prune_steps_from_opexps. This is fairly complex code so some extra effort to give any new readers an overview of how things work seems like a good idea. Author: Amit Langote Reported-by: Andy Fan Reviewed-by: Kyotaro Horiguchi, Andy Fan, Ryan Lambert, David Rowley Discussion: https://postgr.es/m/CAKU4AWqWoVii+bRTeBQmeVW+PznkdO8DfbwqNsu9Gj4ubt9A6w@mail.gmail.com	2021-04-08 22:35:48 +12:00
Magnus Hagander	aaf0432572	Add functions to wait for backend termination This adds a function, pg_wait_for_backend_termination(), and a new timeout argument to pg_terminate_backend(), which will wait for the backend to actually terminate (with or without signaling it to do so depending on which function is called). The default behaviour of pg_terminate_backend() remains being timeout=0 which does not waiting. For pg_wait_for_backend_termination() the default wait is 5 seconds. Author: Bharath Rupireddy Reviewed-By: Fujii Masao, David Johnston, Muhammad Usama, Hou Zhijie, Magnus Hagander Discussion: https://postgr.es/m/CALj2ACUBpunmyhYZw-kXCYs5NM+h6oG_7Df_Tn4mLmmUQifkqA@mail.gmail.com	2021-04-08 11:40:54 +02:00
Thomas Munro	2f27f8c511	Provide ReadRecentBuffer() to re-pin buffers by ID. If you know the ID of a buffer that recently held a block that you would like to pin, this function can be used check if it's still there. It can be used to avoid a second lookup in the buffer mapping table after PrefetchBuffer() reports a cache hit. Reviewed-by: Andres Freund <andres@anarazel.de> Discussion: https://postgr.es/m/CA+hUKGJ4VJN8ttxScUFM8dOKX0BrBiboo5uz1cq=AovOddfHpA@mail.gmail.com	2021-04-08 17:50:25 +12:00
Alvaro Herrera	0827e8af70	autovacuum: handle analyze for partitioned tables Previously, autovacuum would completely ignore partitioned tables, which is not good regarding analyze -- failing to analyze those tables means poor plans may be chosen. Make autovacuum aware of those tables by propagating "changes since analyze" counts from the leaf partitions up the partitioning hierarchy. This also introduces necessary reloptions support for partitioned tables (autovacuum_enabled, autovacuum_analyze_scale_factor, autovacuum_analyze_threshold). It's unclear how best to document this aspect. Author: Yuzuko Hosoya <yuzukohosoya@gmail.com> Reviewed-by: Kyotaro Horiguchi <horikyota.ntt@gmail.com> Reviewed-by: Tomas Vondra <tomas.vondra@enterprisedb.com> Reviewed-by: Álvaro Herrera <alvherre@alvh.no-ip.org> Discussion: https://postgr.es/m/CAKkQ508_PwVgwJyBY=0Lmkz90j8CmWNPUxgHvCUwGhMrouz6UA@mail.gmail.com	2021-04-08 01:19:36 -04:00
Andres Freund	b3ee4c5038	Cope with NULL query string in ExecInitParallelPlan(). It's far from clear that this is the right approach - but a good portion of the buildfarm has been red for a few hours, on the last day of the CF. And this fixes at least the obvious crash. So let's go with that for now. Discussion: https://postgr.es/m/20210407225806.majgznh4lk34hjvu%40alap3.anarazel.de	2021-04-07 22:08:24 -07:00
Amit Kapila	8ffb003591	Fix typo in jsonfuncs.c. Author: Tatsuro Yamada Discussion: https://postgr.es/m/7c166a60-2808-6b89-9524-feefc6233748@nttcom.co.jp_1	2021-04-08 10:24:00 +05:30
Alvaro Herrera	4131f755d5	Repair find_inheritance_children with no active snapshot When working on a scan with only a catalog snapshot, we may not have an ActiveSnapshot set. If we were to come across a detached partition, that would cause a crash. Fix by only ignoring detached partitions when there's an active snapshot.	2021-04-08 00:46:14 -04:00
Bruce Momjian	f57a2f5e03	Add csvlog output for the new query_id value This also adjusts the printf format for query id used by log_line_prefix (%Q). Reported-by: Justin Pryzby Discussion: https://postgr.es/m/20210408005402.GG24239@momjian.us Author: Julien Rouhaud, Bruce Momjian	2021-04-07 22:30:30 -04:00
Peter Geoghegan	5100010ee4	Teach VACUUM to bypass unnecessary index vacuuming. VACUUM has never needed to call ambulkdelete() for each index in cases where there are precisely zero TIDs in its dead_tuples array by the end of its first pass over the heap (also its only pass over the heap in this scenario). Index vacuuming is simply not required when this happens. Index cleanup will still go ahead, but in practice most calls to amvacuumcleanup() are usually no-ops when there were zero preceding ambulkdelete() calls. In short, VACUUM has generally managed to avoid index scans when there were clearly no index tuples to delete from indexes. But cases with _close to_ no index tuples to delete were another matter -- a round of ambulkdelete() calls took place (one per index), each of which performed a full index scan. VACUUM now behaves just as if there were zero index tuples to delete in cases where there are in fact "virtually zero" such tuples. That is, it can now bypass index vacuuming and heap vacuuming as an optimization (though not index cleanup). Whether or not VACUUM bypasses indexes is determined dynamically, based on the just-observed number of heap pages in the table that have one or more LP_DEAD items (LP_DEAD items in heap pages have a 1:1 correspondence with index tuples that still need to be deleted from each index in the worst case). We only skip index vacuuming when 2% or less of the table's pages have one or more LP_DEAD items -- bypassing index vacuuming as an optimization must not noticeably impede setting bits in the visibility map. As a further condition, the dead_tuples array (i.e. VACUUM's array of LP_DEAD item TIDs) must not exceed 32MB at the point that the first pass over the heap finishes, which is also when the decision to bypass is made. (The VACUUM must also have been able to fit all TIDs in its maintenance_work_mem-bound dead_tuples space, though with a default maintenance_work_mem setting it can't matter.) This avoids surprising jumps in the duration and overhead of routine vacuuming with workloads where successive VACUUM operations consistently have almost zero dead index tuples. The number of LP_DEAD items may well accumulate over multiple VACUUM operations, before finally the threshold is crossed and VACUUM performs conventional index vacuuming. Even then, the optimization will have avoided a great deal of largely unnecessary index vacuuming. In the future we may teach VACUUM to skip index vacuuming on a per-index basis, using a much more sophisticated approach. For now we only consider the extreme cases, where we can be quite confident that index vacuuming just isn't worth it using simple heuristics. Also log information about how many heap pages have one or more LP_DEAD items when autovacuum logging is enabled. Author: Masahiko Sawada <sawada.mshk@gmail.com> Author: Peter Geoghegan <pg@bowt.ie> Discussion: https://postgr.es/m/CAD21AoD0SkE11fMw4jD4RENAwBMcw1wasVnwpJVw3tVqPOQgAw@mail.gmail.com Discussion: https://postgr.es/m/CAH2-WzmkebqPd4MVGuPTOS9bMFvp9MDs5cRTCOsv1rQJ3jCbXw@mail.gmail.com	2021-04-07 16:14:54 -07:00
Peter Eisentraut	e717a9a18b	SQL-standard function body This adds support for writing CREATE FUNCTION and CREATE PROCEDURE statements for language SQL with a function body that conforms to the SQL standard and is portable to other implementations. Instead of the PostgreSQL-specific AS $$ string literal $$ syntax, this allows writing out the SQL statements making up the body unquoted, either as a single statement: CREATE FUNCTION add(a integer, b integer) RETURNS integer LANGUAGE SQL RETURN a + b; or as a block CREATE PROCEDURE insert_data(a integer, b integer) LANGUAGE SQL BEGIN ATOMIC INSERT INTO tbl VALUES (a); INSERT INTO tbl VALUES (b); END; The function body is parsed at function definition time and stored as expression nodes in a new pg_proc column prosqlbody. So at run time, no further parsing is required. However, this form does not support polymorphic arguments, because there is no more parse analysis done at call time. Dependencies between the function and the objects it uses are fully tracked. A new RETURN statement is introduced. This can only be used inside function bodies. Internally, it is treated much like a SELECT statement. psql needs some new intelligence to keep track of function body boundaries so that it doesn't send off statements when it sees semicolons that are inside a function body. Tested-by: Jaime Casanova <jcasanov@systemguards.com.ec> Reviewed-by: Julien Rouhaud <rjuju123@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/1c11f1eb-f00c-43b7-799d-2d44132c02d7@2ndquadrant.com	2021-04-07 21:47:55 +02:00
Peter Geoghegan	1e55e7d175	Add wraparound failsafe to VACUUM. Add a failsafe mechanism that is triggered by VACUUM when it notices that the table's relfrozenxid and/or relminmxid are dangerously far in the past. VACUUM checks the age of the table dynamically, at regular intervals. When the failsafe triggers, VACUUM takes extraordinary measures to finish as quickly as possible so that relfrozenxid and/or relminmxid can be advanced. VACUUM will stop applying any cost-based delay that may be in effect. VACUUM will also bypass any further index vacuuming and heap vacuuming -- it only completes whatever remaining pruning and freezing is required. Bypassing index/heap vacuuming is enabled by commit `8523492d`, which made it possible to dynamically trigger the mechanism already used within VACUUM when it is run with INDEX_CLEANUP off. It is expected that the failsafe will almost always trigger within an autovacuum to prevent wraparound, long after the autovacuum began. However, the failsafe mechanism can trigger in any VACUUM operation. Even in a non-aggressive VACUUM, where we're likely to not advance relfrozenxid, it still seems like a good idea to finish off remaining pruning and freezing. An aggressive/anti-wraparound VACUUM will be launched immediately afterwards. Note that the anti-wraparound VACUUM that follows will itself trigger the failsafe, usually before it even begins its first (and only) pass over the heap. The failsafe is controlled by two new GUCs: vacuum_failsafe_age, and vacuum_multixact_failsafe_age. There are no equivalent reloptions, since that isn't expected to be useful. The GUCs have rather high defaults (both default to 1.6 billion), and are expected to generally only be used to make the failsafe trigger sooner/more frequently. Author: Masahiko Sawada <sawada.mshk@gmail.com> Author: Peter Geoghegan <pg@bowt.ie> Discussion: https://postgr.es/m/CAD21AoD0SkE11fMw4jD4RENAwBMcw1wasVnwpJVw3tVqPOQgAw@mail.gmail.com Discussion: https://postgr.es/m/CAH2-WzmgH3ySGYeC-m-eOBsa2=sDwa292-CFghV4rESYo39FsQ@mail.gmail.com	2021-04-07 12:37:45 -07:00
Bruce Momjian	4f0b0966c8	Make use of in-core query id added by commit `5fd9dfa5f5` Use the in-core query id computation for pg_stat_activity, log_line_prefix, and EXPLAIN VERBOSE. Similar to other fields in pg_stat_activity, only the queryid from the top level statements are exposed, and if the backends status isn't active then the queryid from the last executed statements is displayed. Add a %Q placeholder to include the queryid in log_line_prefix, which will also only expose top level statements. For EXPLAIN VERBOSE, if a query identifier has been computed, either by enabling compute_query_id or using a third-party module, display it. Bump catalog version. Discussion: https://postgr.es/m/20210407125726.tkvjdbw76hxnpwfi@nol Author: Julien Rouhaud Reviewed-by: Alvaro Herrera, Nitin Jadhav, Zhihong Yu	2021-04-07 14:04:06 -04:00
Bruce Momjian	5fd9dfa5f5	Move pg_stat_statements query jumbling to core. Add compute_query_id GUC to control whether a query identifier should be computed by the core (off by default). It's thefore now possible to disable core queryid computation and use pg_stat_statements with a different algorithm to compute the query identifier by using a third-party module. To ensure that a single source of query identifier can be used and is well defined, modules that calculate a query identifier should throw an error if compute_query_id specified to compute a query id and if a query idenfitier was already calculated. Discussion: https://postgr.es/m/20210407125726.tkvjdbw76hxnpwfi@nol Author: Julien Rouhaud Reviewed-by: Alvaro Herrera, Nitin Jadhav, Zhihong Yu	2021-04-07 13:06:56 -04:00
Tom Lane	0d46771eaa	Comment cleanup for `a1115fa07`. Amit Langote Discussion: https://postgr.es/m/CA+HiwqEcawatEaUh1uTbZMEZTJeLzbroRTz9_X9Z5CFjTWJkhw@mail.gmail.com	2021-04-07 12:22:02 -04:00
Peter Geoghegan	3c3b8a4b26	Truncate line pointer array during VACUUM. Teach VACUUM to truncate the line pointer array of each heap page when a contiguous group of LP_UNUSED line pointers appear at the end of the array -- these unused and unreferenced items are excluded. This process occurs during VACUUM's second pass over the heap, right after LP_DEAD line pointers on the page (those encountered/pruned during the first pass) are marked LP_UNUSED. Truncation avoids line pointer bloat with certain workloads, particularly those involving continual range DELETEs and bulk INSERTs against the same table. Also harden heapam code to check for an out-of-range page offset number in places where we weren't already doing so. Author: Matthias van de Meent <boekewurm+postgres@gmail.com> Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-By: Peter Geoghegan <pg@bowt.ie> Discussion: https://postgr.es/m/CAEze2WjgaQc55Y5f5CQd3L=eS5CZcff2Obxp=O6pto8-f0hC4w@mail.gmail.com Discussion: https://postgr.es/m/CAH2-Wzn6a64PJM1Ggzm=uvx2otsopJMhFQj_g1rAj4GWr3ZSzw@mail.gmail.com	2021-04-07 08:47:15 -07:00
Tom Lane	3db826bd55	Tighten up allowed names for custom GUC parameters. Formerly we were pretty lax about what a custom GUC's name could be; so long as it had at least one dot in it, we'd take it. However, corner cases such as dashes or equal signs in the name would cause various bits of functionality to misbehave. Rather than trying to make the world perfectly safe for that, let's just require that custom names look like "identifier.identifier", where "identifier" means something that scan.l would accept without double quotes. Along the way, this patch refactors things slightly in guc.c so that find_option() is responsible for reporting GUC-not-found cases, allowing removal of duplicative code from its callers. Per report from Hubert Depesz Lubaczewski. No back-patch, since the consequences of the problem don't seem to warrant changing behavior in stable branches. Discussion: https://postgr.es/m/951335.1612910077@sss.pgh.pa.us	2021-04-07 11:22:22 -04:00
Tomas Vondra	23607a8156	Don't add non-existent pages to bitmap from BRIN The code in bringetbitmap() simply added the whole matching page range to the TID bitmap, as determined by pages_per_range, even if some of the pages were beyond the end of the heap. The query then might fail with an error like this: ERROR: could not open file "base/20176/20228.2" (target block 262144): previous segment is only 131021 blocks In this case, the relation has 262093 pages (131072 and 131021 pages), but we're trying to acess block 262144, i.e. first block of the 3rd segment. At that point _mdfd_getseg() notices the preceding segment is incomplete, and fails. Hitting this in practice is rather unlikely, because: * Most indexes use power-of-two ranges, so segments and page ranges align perfectly (segment end is also a page range end). * The table size has to be just right, with the last segment being almost full - less than one page range from full segment, so that the last page range actually crosses the segment boundary. * Prefetch has to be enabled. The regular page access checks that pages are not beyond heap end, but prefetch does not. On older releases (before 12) the execution stops after hitting the first non-existent page, so the prefetch distance has to be sufficient to reach the first page in the next segment to trigger the issue. Since 12 it's enough to just have prefetch enabled, the prefetch distance does not matter. Fixed by not adding non-existent pages to the TID bitmap. Backpatch all the way back to 9.6 (BRIN indexes were introduced in 9.5, but that release is EOL). Backpatch-through: 9.6	2021-04-07 15:58:36 +02:00
Magnus Hagander	c1968426ba	Refactor hba_authname The previous implementation (from `9afffcb833`) had an unnecessary check on the boundaries of the enum which trigtered compile warnings. To clean it up, move the pre-existing static assert to a central location and call that. Reported-By: Erik Rijkers Reviewed-By: Michael Paquier Discussion: https://postgr.es/m/1056399262.13159.1617793249020@webmailclassic.xs4all.nl	2021-04-07 14:24:47 +02:00
Heikki Linnakangas	d92b1cdbab	Revert "Add sortsupport for gist_btree opclasses, for faster index builds." This reverts commit `9f984ba6d2`. It was making the buildfarm unhappy, apparently setting client_min_messages in a regression test produces different output if log_statement='all'. Another issue is that I now suspect the bit sortsupport function was in fact not correct to call byteacmp(). Revert to investigate both of those issues.	2021-04-07 14:33:21 +03:00
Heikki Linnakangas	9f984ba6d2	Add sortsupport for gist_btree opclasses, for faster index builds. Commit `16fa9b2b30` introduced a faster way to build GiST indexes, by sorting all the data. This commit adds the sortsupport functions needed to make use of that feature for btree_gist. Author: Andrey Borodin Discussion: https://www.postgresql.org/message-id/2F3F7265-0D22-44DB-AD71-8554C743D943@yandex-team.ru	2021-04-07 13:22:05 +03:00
Peter Eisentraut	dd13ad9d39	Fix use of cursor sensitivity terminology Documentation and comments in code and tests have been using the terms sensitive/insensitive cursor incorrectly relative to the SQL standard. (Cursor sensitivity is only relevant for changes made in the same transaction as the cursor, not for concurrent changes in other sessions.) Moreover, some of the behavior of PostgreSQL is incorrect according to the SQL standard, confusing the issue further. (WHERE CURRENT OF changes are not visible in insensitive cursors, but they should be.) This change corrects the terminology and removes the claim that sensitive cursors are supported. It also adds a test case that checks the insensitive behavior in a "correct" way, using a change command not using WHERE CURRENT OF. Finally, it adds the ASENSITIVE cursor option to select the default asensitive behavior, per SQL standard. There are no changes to cursor behavior in this patch. Discussion: https://www.postgresql.org/message-id/flat/96ee8b30-9889-9e1b-b053-90e10c050e85%40enterprisedb.com	2021-04-07 08:05:55 +02:00
Peter Eisentraut	0b5e824528	Message improvement The previous wording contained a superfluous comma. Adjust phrasing for grammatical correctness and clarity.	2021-04-07 07:42:44 +02:00
Michael Paquier	4c0239cb7a	Remove redundant memset(0) calls for page init of some index AMs Bloom, GIN, GiST and SP-GiST rely on PageInit() to initialize the contents of a page, and this routine fills entirely a page with zeros for a size of BLCKSZ, including the special space. Those index AMs have been using an extra memset() call to fill with zeros the special page space, or even the whole page, which is not necessary as PageInit() already does this work, so let's remove them. GiST was not doing this extra call, but has commented out a system call that did so since `6236991`. While on it, remove one MAXALIGN() for SP-GiST as PageInit() takes care of that. This makes the whole page initialization logic more consistent across all index AMs. Author: Bharath Rupireddy Reviewed-by: Vignesh C, Mahendra Singh Thalor Discussion: https://postgr.es/m/CALj2ACViOo2qyaPT7krWm4LRyRTw9kOXt+g6PfNmYuGA=YHj9A@mail.gmail.com	2021-04-07 14:35:26 +09:00
Michael Paquier	9afffcb833	Add some information about authenticated identity via log_connections The "authenticated identity" is the string used by an authentication method to identify a particular user. In many common cases, this is the same as the PostgreSQL username, but for some third-party authentication methods, the identifier in use may be shortened or otherwise translated (e.g. through pg_ident user mappings) before the server stores it. To help administrators see who has actually interacted with the system, this commit adds the capability to store the original identity when authentication succeeds within the backend's Port, and generates a log entry when log_connections is enabled. The log entries generated look something like this (where a local user named "foouser" is connecting to the database as the database user called "admin"): LOG: connection received: host=[local] LOG: connection authenticated: identity="foouser" method=peer (/data/pg_hba.conf:88) LOG: connection authorized: user=admin database=postgres application_name=psql Port->authn_id is set according to the authentication method: bsd: the PostgreSQL username (aka the local username) cert: the client's Subject DN gss: the user principal ident: the remote username ldap: the final bind DN pam: the PostgreSQL username (aka PAM username) password (and all pw-challenge methods): the PostgreSQL username peer: the peer's pw_name radius: the PostgreSQL username (aka the RADIUS username) sspi: either the down-level (SAM-compatible) logon name, if compat_realm=1, or the User Principal Name if compat_realm=0 The trust auth method does not set an authenticated identity. Neither does clientcert=verify-full. Port->authn_id could be used for other purposes, like a superuser-only extra column in pg_stat_activity, but this is left as future work. PostgresNode::connect_{ok,fails}() have been modified to let tests check the backend log files for required or prohibited patterns, using the new log_like and log_unlike parameters. This uses a method based on a truncation of the existing server log file, like issues_sql_like(). Tests are added to the ldap, kerberos, authentication and SSL test suites. Author: Jacob Champion Reviewed-by: Stephen Frost, Magnus Hagander, Tom Lane, Michael Paquier Discussion: https://postgr.es/m/c55788dd1773c521c862e8e0dddb367df51222be.camel@vmware.com	2021-04-07 10:16:39 +09:00
Tom Lane	a1115fa078	Postpone some more stuff out of ExecInitModifyTable. Delay creation of the projections for INSERT and UPDATE tuples until they're needed. This saves a pretty fair amount of work when only some of the partitions are actually touched. The logic associated with identifying junk columns in UPDATE/DELETE is moved to another loop, allowing removal of one loop over the target relations; but it didn't actually change at all. Extracted from a larger patch, which seemed to me to be too messy to push in one commit. Amit Langote, reviewed at different times by Heikki Linnakangas and myself Discussion: https://postgr.es/m/CA+HiwqG7ZruBmmih3wPsBZ4s0H2EhywrnXEduckY5Hr3fWzPWA@mail.gmail.com	2021-04-06 18:13:17 -04:00
Tom Lane	c5b7ba4e67	Postpone some stuff out of ExecInitModifyTable. Arrange to do some things on-demand, rather than immediately during executor startup, because there's a fair chance of never having to do them at all: * Don't open result relations' indexes until needed. * Don't initialize partition tuple routing, nor the child-to-root tuple conversion map, until needed. This wins in UPDATEs on partitioned tables when only some of the partitions will actually receive updates; with larger partition counts the savings is quite noticeable. Also, we can remove some sketchy heuristics in ExecInitModifyTable about whether to set up tuple routing. Also, remove execPartition.c's private hash table tracking which partitions were already opened by the ModifyTable node. Instead use the hash added to ModifyTable itself by commit `86dc90056`. To allow lazy computation of the conversion maps, we now set ri_RootResultRelInfo in all child ResultRelInfos. We formerly set it only in some, not terribly well-defined, cases. This has user-visible side effects in that now more error messages refer to the root relation instead of some partition (and provide error data in the root's column order, too). It looks to me like this is a strict improvement in consistency, so I don't have a problem with the output changes visible in this commit. Extracted from a larger patch, which seemed to me to be too messy to push in one commit. Amit Langote, reviewed at different times by Heikki Linnakangas and myself Discussion: https://postgr.es/m/CA+HiwqG7ZruBmmih3wPsBZ4s0H2EhywrnXEduckY5Hr3fWzPWA@mail.gmail.com	2021-04-06 15:57:11 -04:00
Andres Freund	90c885cdab	Increment xactCompletionCount during subtransaction abort. Snapshot caching, introduced in `623a9ba79b`, did not increment xactCompletionCount during subtransaction abort. That could lead to an older snapshot being reused. That is, at least as far as I can see, not a correctness issue (for MVCC snapshots there's no difference between "in progress" and "aborted"). The only difference between the old and new snapshots would be a newer ->xmax. While HeapTupleSatisfiesMVCC makes the same visibility determination, reusing the old snapshot leads HeapTupleSatisfiesMVCC to not set HEAP_XMIN_INVALID. Which subsequently causes the kill_prior_tuple optimization to not kick in (via HeapTupleIsSurelyDead() returning false). The performance effects of doing the same index-lookups over and over again is how the issue was discovered... Fix the issue by incrementing xactCompletionCount in XidCacheRemoveRunningXids. It already acquires ProcArrayLock exclusively, making that an easy proposition. Add a test to ensure that kill_prior_tuple prevents index growth when it involves aborted subtransaction of the current transaction. Author: Andres Freund Discussion: https://postgr.es/m/20210406043521.lopeo7bbigad3n6t@alap3.anarazel.de Discussion: https://postgr.es/m/20210317055718.v6qs3ltzrformqoa%40alap3.anarazel.de	2021-04-06 09:24:50 -07:00
Peter Geoghegan	8523492d4e	Remove tupgone special case from vacuumlazy.c. Retry the call to heap_prune_page() in rare cases where there is disagreement between the heap_prune_page() call and the call to HeapTupleSatisfiesVacuum() that immediately follows. Disagreement is possible when a concurrently-aborted transaction makes a tuple DEAD during the tiny window between each step. This was the only case where a tuple considered DEAD by VACUUM still had storage following pruning. VACUUM's definition of dead tuples is now uniformly simple and unambiguous: dead tuples from each page are always LP_DEAD line pointers that were encountered just after we performed pruning (and just before we considered freezing remaining items with tuple storage). Eliminating the tupgone=true special case enables INDEX_CLEANUP=off style skipping of index vacuuming that takes place based on flexible, dynamic criteria. The INDEX_CLEANUP=off case had to know about skipping indexes up-front before now, due to a subtle interaction with the special case (see commit `dd695979`) -- this was a special case unto itself. Now there are no special cases. And so now it won't matter when or how we decide to skip index vacuuming: it won't affect how pruning behaves, and it won't be affected by any of the implementation details of pruning or freezing. Also remove XLOG_HEAP2_CLEANUP_INFO records. These are no longer necessary because we now rely entirely on heap pruning taking care of recovery conflicts. There is no longer any need to generate recovery conflicts for DEAD tuples that pruning just missed. This also means that heap vacuuming now uses exactly the same strategy for recovery conflicts as index vacuuming always has: REDO routines never need to process a latestRemovedXid from the WAL record, since earlier REDO of the WAL record from pruning is sufficient in all cases. The generic XLOG_HEAP2_CLEAN record type is now split into two new record types to reflect this new division (these are called XLOG_HEAP2_PRUNE and XLOG_HEAP2_VACUUM). Also stop acquiring a super-exclusive lock for heap pages when they're vacuumed during VACUUM's second heap pass. A regular exclusive lock is enough. This is correct because heap page vacuuming is now strictly a matter of setting the LP_DEAD line pointers to LP_UNUSED. No other backend can have a pointer to a tuple located in a pinned buffer that can be invalidated by a concurrent heap page vacuum operation. Heap vacuuming can now be thought of as conceptually similar to index vacuuming and conceptually dissimilar to heap pruning. Heap pruning now has sole responsibility for anything involving the logical contents of the database (e.g., managing transaction status information, recovery conflicts, considering what to do with HOT chains). Index vacuuming and heap vacuuming are now only concerned with recycling garbage items from physical data structures that back the logical database. Bump XLOG_PAGE_MAGIC due to pruning and heap page vacuum WAL record changes. Credit for the idea of retrying pruning a page to avoid the tupgone case goes to Andres Freund. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Andres Freund <andres@anarazel.de> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Discussion: https://postgr.es/m/CAH2-WznneCXTzuFmcwx_EyRQgfsfJAAsu+CsqRFmFXCAar=nJw@mail.gmail.com	2021-04-06 08:49:22 -07:00
Tom Lane	789d81de8a	Fix missing #include in nodeResultCache.h. Per cpluspluscheck.	2021-04-06 11:23:56 -04:00
Tomas Vondra	518442c7f3	Fix handling of clauses incompatible with extended statistics Handling of incompatible clauses while applying extended statistics was a bit confused - while handling a mix of compatible and incompatible clauses it sometimes incorrectly treated the incompatible clauses as compatible, resulting in a crash. Fixed by reworking the code applying the selected statistics object to make it easier to understand, and adding a proper compatibility check. Reported-by: David Rowley Discussion: https://postgr.es/m/CAApHDvpYT10-nkSp8xXe-nbO3jmoaRyRFHbzh-RWMfAJynqgpQ%40mail.gmail.com	2021-04-06 16:56:06 +02:00
Peter Geoghegan	7ab96cf6b3	Refactor lazy_scan_heap() loop. Add a lazy_scan_heap() subsidiary function that handles heap pruning and tuple freezing: lazy_scan_prune(). This is a great deal cleaner. The code that remains in lazy_scan_heap()'s per-block loop can now be thought of as code that either comes before or after the call to lazy_scan_prune(), which is now the clear focal point. This division is enforced by the way in which we now manage state. lazy_scan_prune() outputs state (using its own struct) that describes what to do with the page following pruning and freezing (e.g., visibility map maintenance, recording free space in the FSM). It doesn't get passed any special instructional state from the preamble code, though. Also cleanly separate the logic used by a VACUUM with INDEX_CLEANUP=off from the logic used by single-heap-pass VACUUMs. The former case is now structured as the omission of index and heap vacuuming by a two pass VACUUM. The latter case goes back to being used only when the table happens to have no indexes (just as it was before commit `a96c41fe`). This structure is much more natural, since the whole point of INDEX_CLEANUP=off is to skip the index and heap vacuuming that would otherwise take place. The single-heap-pass case doesn't skip any useful work, though -- it just does heap pruning and heap vacuuming together when the table happens to have no indexes. Both of these changes are preparation for an upcoming patch that generalizes the mechanism used by INDEX_CLEANUP=off. The later patch will allow VACUUM to give up on index and heap vacuuming dynamically, as problems emerge (e.g., with wraparound), so that an affected VACUUM operation can finish up as soon as possible. Also fix a very old bug in single-pass VACUUM VERBOSE output. We were reporting the number of tuples deleted via pruning as a direct substitute for reporting the number of LP_DEAD items removed in a function that deals with the second pass over the heap. But that doesn't work at all -- they're two different things. To fix, start tracking the total number of LP_DEAD items encountered during pruning, and use that in the report instead. A single pass VACUUM will always vacuum away whatever LP_DEAD items a heap page has immediately after it is pruned, so the total number of LP_DEAD items encountered during pruning equals the total number vacuumed-away. (They are _not_ equal in the INDEX_CLEANUP=off case, but that's okay because skipping index vacuuming is now a totally orthogonal concept to one-pass VACUUM.) Also stop reporting the count of LP_UNUSED items in VACUUM VERBOSE output. This makes the output of VACUUM VERBOSE more consistent with log_autovacuum's output (because it never showed information about LP_UNUSED items). VACUUM VERBOSE reported LP_UNUSED items left behind by the last VACUUM, and LP_UNUSED items created via pruning HOT chains during the current VACUUM (it never included LP_UNUSED items left behind by the current VACUUM's second pass over the heap). This makes it useless as an indicator of line pointer bloat, which must have been the original intention. (Like the first VACUUM VERBOSE issue, this issue was arguably an oversight in commit `282d2a03`, which added the heap-only tuple optimization.) Finally, stop reporting empty_pages in VACUUM VERBOSE output, and start reporting pages_removed instead. This also makes the output of VACUUM VERBOSE more consistent with log_autovacuum's output (which does not show empty_pages, but does show pages_removed). An empty page isn't meaningfully different to a page that is almost empty, or a page that is empty but for only a small number of remaining LP_UNUSED items. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Discussion: https://postgr.es/m/CAH2-WznneCXTzuFmcwx_EyRQgfsfJAAsu+CsqRFmFXCAar=nJw@mail.gmail.com	2021-04-06 07:49:39 -07:00
Tom Lane	091e22b2e6	Clean up treatment of missing default and CHECK-constraint records. Andrew Gierth reported that it's possible to crash the backend if no pg_attrdef record is found to match an attribute that has atthasdef set. AttrDefaultFetch warns about this situation, but then leaves behind a relation tupdesc that has null "adbin" pointer(s), which most places don't guard against. We considered promoting the warning to an error, but throwing errors during relcache load is pretty drastic: it effectively locks one out of using the relation at all. What seems better is to leave the load-time behavior as a warning, but then throw an error in any code path that wants to use a default and can't find it. This confines the error to a subset of INSERT/UPDATE operations on the table, and in particular will at least allow a pg_dump to succeed. Also, we should fix AttrDefaultFetch to not leave any null pointers in the tupdesc, because that just creates an untested bug hazard. While at it, apply the same philosophy of "warn at load, throw error only upon use of the known-missing info" to CHECK constraints. CheckConstraintFetch is very nearly the same logic as AttrDefaultFetch, but for reasons lost in the mists of time, it was throwing ERROR for the same cases that AttrDefaultFetch treats as WARNING. Make the two functions more nearly alike. In passing, get rid of potentially-O(N^2) loops in equalTupleDesc by making AttrDefaultFetch sort the entries after fetching them, so that equalTupleDesc can assume that entries in two equal tupdescs must be in matching order. (CheckConstraintFetch already was sorting CHECK constraints, but equalTupleDesc hadn't been told about it.) There's some argument for back-patching this, but with such a small number of field reports, I'm content to fix it in HEAD. Discussion: https://postgr.es/m/87pmzaq4gx.fsf@news-spur.riddles.org.uk	2021-04-06 10:34:39 -04:00
Fujii Masao	9de9294b0c	Stop archive recovery if WAL generated with wal_level=minimal is found. Previously if hot standby was enabled, archive recovery exited with an error when it found WAL generated with wal_level=minimal. But if hot standby was disabled, it just reported a warning and continued in that case. Which could lead to data loss or errors during normal operation. A warning was emitted, but users could easily miss that and not notice this serious situation until they encountered the actual errors. To improve this situation, this commit changes archive recovery so that it exits with FATAL error when it finds WAL generated with wal_level=minimal whatever the setting of hot standby. This enables users to notice the serious situation soon. The FATAL error is thrown if archive recovery starts from a base backup taken before wal_level is changed to minimal. When archive recovery exits with the error, if users have a base backup taken after setting wal_level to higher than minimal, they can recover the database by starting archive recovery from that newer backup. But note that if such backup doesn't exist, there is no easy way to complete archive recovery, which may make the database server unstartable and users may lose whole database. The commit adds the note about this risk into the document. Even in the case of unstartable database server, previously by just disabling hot standby users could avoid the error during archive recovery, forcibly start up the server and salvage data from it. But note that this commit makes this procedure unavailable at all. Author: Takamichi Osumi Reviewed-by: Laurenz Albe, Kyotaro Horiguchi, David Steele, Fujii Masao Discussion: https://postgr.es/m/OSBPR01MB4888CBE1DA08818FD2D90ED8EDF90@OSBPR01MB4888.jpnprd01.prod.outlook.com	2021-04-06 22:56:51 +09:00
Etsuro Fujita	a8af856d32	Adjust input value to WaitEventSetWait() in ExecAppendAsyncEventWait(). Adjust the number of events given to WaitEventSetWait() so that it doesn't exceed the maximum number of events in the WaitEventSet given to that function (set->nevents_space) in hopes of making the buildfarm green. Per valgrind failure report from Tom Lane and the buildfarm. Author: Etsuro Fujita Discussion: https://postgr.es/m/3411577.1617289776%40sss.pgh.pa.us	2021-04-06 19:15:00 +09:00
Peter Eisentraut	82ed7748b7	ALTER SUBSCRIPTION ... ADD/DROP PUBLICATION At present, if we want to update publications in a subscription, we can use SET PUBLICATION. However, it requires supplying all publications that exists and the new publications. If we want to add new publications, it's inconvenient. The new syntax only supplies the new publications. When the refresh is true, it only refreshes the new publications. Author: Japin Li <japinli@hotmail.com> Author: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/MEYP282MB166939D0D6C480B7FBE7EFFBB6BC0@MEYP282MB1669.AUSP282.PROD.OUTLOOK.COM	2021-04-06 11:49:51 +02:00
Peter Eisentraut	a2da77cdb4	Change return type of EXTRACT to numeric The previous implementation of EXTRACT mapped internally to date_part(), which returned type double precision (since it was implemented long before the numeric type existed). This can lead to imprecise output in some cases, so returning numeric would be preferrable. Changing the return type of an existing function is a bit risky, so instead we do the following: We implement a new set of functions, which are now called "extract", in parallel to the existing date_part functions. They work the same way internally but use numeric instead of float8. The EXTRACT construct is now mapped by the parser to these new extract functions. That way, dumps of views etc. from old versions (which would use date_part) continue to work unchanged, but new uses will map to the new extract functions. Additionally, the reverse compilation of EXTRACT now reproduces the original syntax, using the new mechanism introduced in `40c24bfef9`. The following minor changes of behavior result from the new implementation: - The column name from an isolated EXTRACT call is now "extract" instead of "date_part". - Extract from date now rejects inappropriate field names such as HOUR. It was previously mapped internally to extract from timestamp, so it would silently accept everything appropriate for timestamp. - Return values when extracting fields with possibly fractional values, such as second and epoch, now have the full scale that the value has internally (so, for example, '1.000000' instead of just '1'). Reported-by: Petr Fedorov <petr.fedorov@phystech.edu> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://www.postgresql.org/message-id/flat/42b73d2d-da12-ba9f-570a-420e0cce19d9@phystech.edu	2021-04-06 07:20:42 +02:00
Fujii Masao	f5d94e405e	Fix typo in pgstat.c. Introduced by `9868167500`. Author: Vignesh C Discussion: https://postgr.es/m/CALDaNm1DqgaLBAJrtGznKk1sR1mH-augmp7LfGvxWwTUhah+rg@mail.gmail.com	2021-04-06 14:09:40 +09:00
Fujii Masao	43620e3286	Add function to log the memory contexts of specified backend process. Commit `3e98c0bafb` added pg_backend_memory_contexts view to display the memory contexts of the backend process. However its target process is limited to the backend that is accessing to the view. So this is not so convenient when investigating the local memory bloat of other backend process. To improve this situation, this commit adds pg_log_backend_memory_contexts() function that requests to log the memory contexts of the specified backend process. This information can be also collected by calling MemoryContextStats(TopMemoryContext) via a debugger. But this technique cannot be used in some environments because no debugger is available there. So, pg_log_backend_memory_contexts() allows us to see the memory contexts of specified backend more easily. Only superusers are allowed to request to log the memory contexts because allowing any users to issue this request at an unbounded rate would cause lots of log messages and which can lead to denial of service. On receipt of the request, at the next CHECK_FOR_INTERRUPTS(), the target backend logs its memory contexts at LOG_SERVER_ONLY level, so that these memory contexts will appear in the server log but not be sent to the client. It logs one message per memory context. Because if it buffers all memory contexts into StringInfo to log them as one message, which may require the buffer to be enlarged very much and lead to OOM error since there can be a large number of memory contexts in a backend. When a backend process is consuming huge memory, logging all its memory contexts might overrun available disk space. To prevent this, now this patch limits the number of child contexts to log per parent to 100. As with MemoryContextStats(), it supposes that practical cases where the log gets long will typically be huge numbers of siblings under the same parent context; while the additional debugging value from seeing details about individual siblings beyond 100 will not be large. There was another proposed patch to add the function to return the memory contexts of specified backend as the result sets, instead of logging them, in the discussion. However that patch is not included in this commit because it had several issues to address. Thanks to Tatsuhito Kasahara, Andres Freund, Tom Lane, Tomas Vondra, Michael Paquier, Kyotaro Horiguchi and Zhihong Yu for the discussion. Bump catalog version. Author: Atsushi Torikoshi Reviewed-by: Kyotaro Horiguchi, Zhihong Yu, Fujii Masao Discussion: https://postgr.es/m/0271f440ac77f2a4180e0e56ebd944d1@oss.nttdata.com	2021-04-06 13:44:15 +09:00
Amit Kapila	ac4645c015	Allow pgoutput to send logical decoding messages. The output plugin accepts a new parameter (messages) that controls if logical decoding messages are written into the replication stream. It is useful for those clients that use pgoutput as an output plugin and needs to process messages that were written by pg_logical_emit_message(). Although logical streaming replication protocol supports logical decoding messages now, logical replication does not use this feature yet. Author: David Pirotte, Euler Taveira Reviewed-by: Euler Taveira, Andres Freund, Ashutosh Bapat, Amit Kapila Discussion: https://postgr.es/m/CADK3HHJ-+9SO7KuRLH=9Wa1rAo60Yreq1GFNkH_kd0=CdaWM+A@mail.gmail.com	2021-04-06 08:40:47 +05:30
Amit Kapila	531737ddad	Refactor function parse_output_parameters. Instead of using multiple parameters in parse_ouput_parameters function signature, use the struct PGOutputData that encapsulates all pgoutput options. It will be useful for future work where we need to add other options in pgoutput. Author: Euler Taveira Reviewed-by: Amit Kapila Discussion: https://postgr.es/m/CADK3HHJ-+9SO7KuRLH=9Wa1rAo60Yreq1GFNkH_kd0=CdaWM+A@mail.gmail.com	2021-04-06 08:26:31 +05:30
Peter Geoghegan	f6b8f19a08	Allocate access strategy in parallel VACUUM workers. Commit `49f49def` took entirely the wrong approach to fixing this issue. Just allocate a local buffer access strategy in each individual worker instead of trying to propagate state. This state was never propagated by parallel VACUUM in the first place. It looks like the only reason that this worked following commit `40d964ec` was that it involved static global variables, which are initialized to 0 per the C standard. A more comprehensive fix may be necessary, even on HEAD. This fix should at least get the buildfarm green once again. Thanks once again to Thomas Munro for continued off-list assistance with the issue.	2021-04-05 17:17:40 -07:00
Tom Lane	09c1c6ab4b	Support INCLUDE'd columns in SP-GiST. Not much to say here: does what it says on the tin. We steal a previously-always-zero bit from the nextOffset field of leaf index tuples in order to track whether there is a nulls bitmap. Otherwise it works about like included columns in other index types. Pavel Borisov, reviewed by Andrey Borodin and Anastasia Lubennikova, and rather heavily editorialized on by me Discussion: https://postgr.es/m/CALT9ZEFi-vMp4faht9f9Junb1nO3NOSjhpxTmbm1UGLMsLqiEQ@mail.gmail.com	2021-04-05 18:41:21 -04:00
Peter Geoghegan	49f49defe7	Propagate parallel VACUUM's buffer access strategy. Parallel VACUUM relied on global variable state from the leader process being propagated to workers on fork(). Commit `b4af70cb` removed most uses of global variables inside vacuumlazy.c, but did not account for the buffer access strategy state. To fix, propagate the state through shared memory instead. Per buildfarm failures on elver, curculio, and morepork. Many thanks to Thomas Munro for off-list assistance with this issue.	2021-04-05 14:56:56 -07:00
Peter Geoghegan	b4af70cb21	Simplify state managed by VACUUM. Reorganize the state struct used by VACUUM -- group related items together to make it easier to understand. Also stop relying on stack variables inside lazy_scan_heap() -- move those into the state struct instead. Doing things this way simplifies large groups of related functions whose function signatures had a lot of unnecessary redundancy. Switch over to using int64 for the struct fields used to count things that are reported to the user via log_autovacuum and VACUUM VERBOSE output. We were using double, but that doesn't seem to have any advantages. Using int64 makes it possible to add assertions that verify that the first pass over the heap (pruning) encounters precisely the same number of LP_DEAD items that get deleted from indexes later on, in the second pass over the heap. These assertions will be added in later commits. Finally, adjust the signatures of functions with IndexBulkDeleteResult pointer arguments in cases where there was ambiguity about whether or not the argument relates to a single index or all indexes. Functions now use the idiom that both ambulkdelete() and amvacuumcleanup() have always used (where appropriate): accept a mutable IndexBulkDeleteResult pointer argument, and return a result IndexBulkDeleteResult pointer to caller. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Discussion: https://postgr.es/m/CAH2-WzkeOSYwC6KNckbhk2b1aNnWum6Yyn0NKP9D-Hq1LGTDPw@mail.gmail.com	2021-04-05 13:21:44 -07:00
Stephen Frost	6c3ffd697e	Add pg_read_all_data and pg_write_all_data roles A commonly requested use-case is to have a role who can run an unfettered pg_dump without having to explicitly GRANT that user access to all tables, schemas, et al, without that role being a superuser. This address that by adding a "pg_read_all_data" role which implicitly gives any member of this role SELECT rights on all tables, views and sequences, and USAGE rights on all schemas. As there may be cases where it's also useful to have a role who has write access to all objects, pg_write_all_data is also introduced and gives users implicit INSERT, UPDATE and DELETE rights on all tables, views and sequences. These roles can not be logged into directly but instead should be GRANT'd to a role which is able to log in. As noted in the documentation, if RLS is being used then an administrator may (or may not) wish to set BYPASSRLS on the login role which these predefined roles are GRANT'd to. Reviewed-by: Georgios Kokolatos Discussion: https://postgr.es/m/20200828003023.GU29590@tamriel.snowman.net	2021-04-05 13:42:52 -04:00
Fujii Masao	ad8b674922	Shut down transaction tracking at startup process exit. Maxim Orlov reported that the shutdown of standby server could result in the following assertion failure. The cause of this issue was that, when the shutdown caused the startup process to exit, recovery-time transaction tracking was not shut down even if it's already initialized, and some locks the tracked transactions were holding could not be released. At this situation, if other process was invoked and the PGPROC entry that the startup process used was assigned to it, it found such unreleased locks and caused the assertion failure, during the initialization of it. TRAP: FailedAssertion("SHMQueueEmpty(&(MyProc->myProcLocks[i]))" This commit fixes this issue by making the startup process shut down transaction tracking and release all locks, at the exit of it. Back-patch to all supported branches. Reported-by: Maxim Orlov Author: Fujii Masao Reviewed-by: Maxim Orlov Discussion: https://postgr.es/m/ad4ce692cc1d89a093b471ab1d969b0b@postgrespro.ru	2021-04-06 02:25:37 +09:00
Michael Paquier	9f6f1f9b8e	Fix typo in collationcmds.c Introduced by `51e225d`. Author: Anton Voloshin Discussion: https://postgr.es/m/05477da0-703c-7de7-998c-5879738e8f39@postgrespro.ru	2021-04-05 11:18:12 +09:00
Tom Lane	dfc843d465	Fix more confusion in SP-GiST. spg_box_quad_leaf_consistent unconditionally returned the leaf datum as leafValue, even though in its usage for poly_ops that value is of completely the wrong type. In versions before 12, that was harmless because the core code did nothing with leafValue in non-index-only scans ... but since commit `2a6368343`, if we were doing a KNN-style scan, spgNewHeapItem would unconditionally try to copy the value using the wrong datatype parameters. Said copying is a waste of time and space if we're not going to return the data, but it accidentally failed to fail until I fixed the datatype confusion in `ac9099fc1`. Hence, change spgNewHeapItem to not copy the datum unless we're actually going to return it later. This saves cycles and dodges the question of whether lossy opclasses are returning the right type. Also change spg_box_quad_leaf_consistent to not return data that might be of the wrong type, as insurance against somebody introducing a similar bug into the core code in future. It seems like a good idea to back-patch these two changes into v12 and v13, although I'm afraid to change spgNewHeapItem's mistaken idea of which datatype to use in those branches. Per buildfarm results from `ac9099fc1`. Discussion: https://postgr.es/m/3728741.1617381471@sss.pgh.pa.us	2021-04-04 17:57:07 -04:00
Tom Lane	ac9099fc1d	Fix confusion in SP-GiST between attribute type and leaf storage type. According to the documentation, the attType passed to the opclass config function (and also relied on by the core code) is the type of the heap column or expression being indexed. But what was actually being passed was the type stored for the index column. This made no difference for user-defined SP-GiST opclasses, because we weren't allowing the STORAGE clause of CREATE OPCLASS to be used, so the two types would be the same. But it's silly not to allow that, seeing that the built-in poly_ops opclass has a different value for opckeytype than opcintype, and that if you want to do lossy storage then the types must really be different. (Thus, user-defined opclasses doing lossy storage had to lie about what type is in the index.) Hence, remove the restriction, and make sure that we use the input column type not opckeytype where relevant. For reasons of backwards compatibility with existing user-defined opclasses, we can't quite insist that the specified leafType match the STORAGE clause; instead just add an amvalidate() warning if they don't match. Also fix some bugs that would only manifest when trying to return index entries when attType is different from attLeafType. It's not too surprising that these have not been reported, because the only usual reason for such a difference is to store the leaf value lossily, rendering index-only scans impossible. Add a src/test/modules module to exercise cases where attType is different from attLeafType and yet index-only scan is supported. Discussion: https://postgr.es/m/3728741.1617381471@sss.pgh.pa.us	2021-04-04 14:28:57 -04:00
Tomas Vondra	d9c5b9a9ee	Fix bug in brin_minmax_multi_union When calling sort_expanded_ranges() we need to remember the return value, because the function sorts and also deduplicates the ranges. So the number of ranges may decrease. brin_minmax_multi_union failed to do that, which resulted in crashes due to bogus ranges (equal minval/maxval but not marked as compacted). Reported-by: Jaime Casanova Discussion: https://postgr.es/m/20210404052550.GA4376%40ahch-to	2021-04-04 19:36:12 +02:00
Tomas Vondra	1dad2a5ea3	Fix order of parameters in BRIN minmax-multi calls The BRIN minmax-multi consistent function incorrectly assumed it can lookup an operator, and then swap the arguments to get the commutator. For example <(a,b) would be called as <(b,a) to get >(a,b). This works when the arguments are of the same type, but with cross-type opclasses this fails. We can't swap <(float4,float8) arguments, for example. Fixed by passing arguments in the right order. Discussion: https://postgr.es/m/CAJKUy5jLZFLCxyxfT%3DMfK5mtPfSzHA1rVLowR-j4RRsFVvKm7A%40mail.gmail.com	2021-04-04 19:25:41 +02:00
Tomas Vondra	e1fbe1181c	Fix BRIN minmax-multi distance for inet type The distance calculation ignored the mask, unlike the inet comparator, which resulted in negative distance in some cases. Fixed by applying the mask in brin_minmax_multi_distance_inet. I've considered simply calling inetmi() to calculate the delta, but that does not consider mask either. Reviewed-by: Zhihong Yu Discussion: https://postgr.es/m/1a0a7b9d-9bda-e3a2-7fa4-88f15042a051%40enterprisedb.com	2021-04-04 19:23:32 +02:00
Tomas Vondra	7262f2421a	Fix BRIN minmax-multi distance for timetz type The distance calculation ignored the time zone, so the result of (b-a) might have ended negative even if (b > a). Fixed by considering the time zone difference. Reported-by: Jaime Casanova Discussion: https://postgr.es/m/CAJKUy5jLZFLCxyxfT%3DMfK5mtPfSzHA1rVLowR-j4RRsFVvKm7A%40mail.gmail.com	2021-04-04 19:22:23 +02:00
Tomas Vondra	2b10e0e3c2	Fix BRIN minmax-multi distance for interval type The distance calculation for interval type was treating months as having 31 days, which is inconsistent with the interval comparator (using 30 days). Due to this it was possible to get negative distance (b-a) when (a<b), trigerring an assert. Fixed by adopting the same logic as interval_cmp_value. Reported-by: Jaime Casanova Discussion: https://postgr.es/m/CAJKUy5jKH0Xhneau2mNftNPtTy-BVgQfXc8zQkEvRvBHfeUThQ%40mail.gmail.com	2021-04-04 19:19:51 +02:00
Andres Freund	225a22b19e	Improve efficiency of wait event reporting, remove proc.h dependency. pgstat_report_wait_start() and pgstat_report_wait_end() required two conditional branches so far. One to check if MyProc is NULL, the other to check if pgstat_track_activities is set. As wait events are used around comparatively lightweight operations, and are inlined (reducing branch predictor effectiveness), that's not great. The dependency on MyProc has a second disadvantage: Low-level subsystems, like storage/file/fd.c, report wait events, but architecturally it is preferable for them to not depend on inter-process subsystems like proc.h (defining PGPROC). After this change including pgstat.h (nor obviously its sub-components like backend_status.h, wait_event.h, ...) does not pull in IPC related headers anymore. These goals, efficiency and abstraction, are achieved by having pgstat_report_wait_start/end() not interact with MyProc, but instead a new my_wait_event_info variable. At backend startup it points to a local variable, removing the need to check for MyProc being NULL. During process initialization my_wait_event_info is redirected to MyProc->wait_event_info. At shutdown this is reversed. Because wait event reporting now does not need to know about where the wait event is stored, it does not need to know about PGPROC anymore. The removal of the branch for checking pgstat_track_activities is simpler: Don't check anymore. The cost due to the branch are often higher than the store - and even if not, pgstat_track_activities is rarely disabled. The main motivator to commit this work now is that removing the (indirect) pgproc.h include from pgstat.h simplifies a patch to move statistics reporting to shared memory (which still has a chance to get into 14). Author: Andres Freund <andres@anarazel.de> Discussion: https://postgr.es/m/20210402194458.2vu324hkk2djq6ce@alap3.anarazel.de	2021-04-03 12:03:45 -07:00
Andres Freund	e1025044cd	Split backend status and progress related functionality out of pgstat.c. Backend status (supporting pg_stat_activity) and command progress (supporting pg_stat_progress*) related code is largely independent from the rest of pgstat.[ch] (supporting views like pg_stat_all_tables that accumulate data over time). See also `a333476b92`. This commit doesn't rename the function names to make the distinction from the rest of pgstat_ clearer - that'd be more invasive and not clearly beneficial. If we were to decide to do such a rename at some point, it's better done separately from moving the code as well. Robert's review was of an earlier version. Reviewed-By: Robert Haas <robertmhaas@gmail.com> Discussion: https://postgr.es/m/20210316195440.twxmlov24rr2nxrg@alap3.anarazel.de	2021-04-03 11:42:52 -07:00
Michael Paquier	e6bdfd9700	Refactor HMAC implementations Similarly to the cryptohash implementations, this refactors the existing HMAC code into a single set of APIs that can be plugged with any crypto libraries PostgreSQL is built with (only OpenSSL currently). If there is no such libraries, a fallback implementation is available. Those new APIs are designed similarly to the existing cryptohash layer, so there is no real new design here, with the same logic around buffer bound checks and memory handling. HMAC has a dependency on cryptohashes, so all the cryptohash types supported by cryptohash{_openssl}.c can be used with HMAC. This refactoring is an advantage mainly for SCRAM, that included its own implementation of HMAC with SHA256 without relying on the existing crypto libraries even if PostgreSQL was built with their support. This code has been tested on Windows and Linux, with and without OpenSSL, across all the versions supported on HEAD from 1.1.1 down to 1.0.1. I have also checked that the implementations are working fine using some sample results, a custom extension of my own, and doing cross-checks across different major versions with SCRAM with the client and the backend. Author: Michael Paquier Reviewed-by: Bruce Momjian Discussion: https://postgr.es/m/X9m0nkEJEzIPXjeZ@paquier.xyz	2021-04-03 17:30:49 +09:00
Andres Freund	1d9c5d0ce2	Do not rely on pgstat.h to indirectly include storage/ headers. An upcoming patch might remove the (now indirect) proc.h include (which in turn includes other headers), and it's cleaner for the modified files to include their dependencies directly anyway... Discussion: https://postgr.es/m/20210402194458.2vu324hkk2djq6ce@alap3.anarazel.de	2021-04-02 20:02:47 -07:00
Andres Freund	a333476b92	Split wait event related code from pgstat.[ch] into wait_event.[ch]. The wait event related code is independent from the rest of the pgstat.[ch] code, of nontrivial size and changes on a regular basis. Put it into its own set of files. As there doesn't seem to be a good pre-existing directory for code like this, add src/backend/utils/activity. Reviewed-By: Robert Haas <robertmhaas@gmail.com> Discussion: https://postgr.es/m/20210316195440.twxmlov24rr2nxrg@alap3.anarazel.de	2021-04-02 20:02:26 -07:00
David Rowley	1267d9862f	Remove useless Asserts in Result Cache code Testing if an unsigned variable is >= 0 is pretty pointless. There's likely enough code in remove_cache_entry() to verify the cache memory accounting is correct in assert enabled builds. These Asserts were not adding much extra cover, even if they had been checking >= 0 on a signed variable. Reported-by: Andres Freund Discussion: https://postgr.es/m/20210402204734.6mo3nfacnljlicgn@alap3.anarazel.de	2021-04-03 10:41:43 +13:00
Thomas Munro	c30f54ad73	Detect POLLHUP/POLLRDHUP while running queries. Provide a new GUC check_client_connection_interval that can be used to check whether the client connection has gone away, while running very long queries. It is disabled by default. For now this uses a non-standard Linux extension (also adopted by at least one other OS). POLLRDHUP is not defined by POSIX, and other OSes don't have a reliable way to know if a connection was closed without actually trying to read or write. In future we might consider trying to send a no-op/heartbeat message instead, but that could require protocol changes. Author: Sergey Cherkashin <s.cherkashin@postgrespro.ru> Author: Thomas Munro <thomas.munro@gmail.com> Reviewed-by: Thomas Munro <thomas.munro@gmail.com> Reviewed-by: Tatsuo Ishii <ishii@sraoss.co.jp> Reviewed-by: Konstantin Knizhnik <k.knizhnik@postgrespro.ru> Reviewed-by: Zhihong Yu <zyu@yugabyte.com> Reviewed-by: Andres Freund <andres@anarazel.de> Reviewed-by: Maksim Milyutin <milyutinma@gmail.com> Reviewed-by: Tsunakawa, Takayuki/綱川貴之 <tsunakawa.takay@fujitsu.com> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> (much earlier version) Discussion: https://postgr.es/m/77def86b27e41f0efcba411460e929ae%40postgrespro.ru	2021-04-03 09:02:41 +13:00
Tom Lane	53aafdb9ff	Strip file names reported in error messages on Windows, too. Commit `dd136052b` established a policy that error message FILE items should include only the base name of the reporting source file, for uniformity and succinctness. We now observe that some Windows compilers use backslashes in __FILE__ strings, so truncate at backslashes as well. This is expected to fix some platform variation in the results of the new libpq_pipeline test module. Discussion: https://postgr.es/m/3650140.1617372290@sss.pgh.pa.us	2021-04-02 10:43:54 -04:00
Peter Eisentraut	9c5f67fd62	Add support for NullIfExpr in eval_const_expressions Author: Hou Zhijie <houzj.fnst@cn.fujitsu.com> Discussion: https://www.postgresql.org/message-id/flat/7ea5ce773bbc4eea9ff1a381acd3b102@G08CNEXMBPEKD05.g08.fujitsu.local	2021-04-02 11:01:49 +02:00
Fujii Masao	96bdb7e19d	Fix pgstat_report_replslot() to use proper data types for its arguments. The caller of pgstat_report_replslot() passes int64 values to the function. Also the function stores those values in PgStat_Counter (i.e., int64) fields of PgStat_MsgReplSlot struct. But previously the function used "int" as the data types of some arguments for those values, which could lead to the overflow of values. To avoid this risk, this commit fixes pgstat_report_replslot() to use PgStat_Counter type for the arguments. Since they are the statistics counters, PgStat_Counter, the data type used for counters, is used for them instead of int64. Reported-by: Vignesh C Author: Vignesh C Reviewed-by: Jeevan Ladhe, Fujii Masao Discussion: https://postgr.es/m/CALDaNm080OpG=ZwOb0i8EyChH5SyHAMFWJCKaKTXmrfvJLbgaA@mail.gmail.com	2021-04-02 17:27:31 +09:00
David Rowley	9eacee2e62	Add Result Cache executor node (take 2) Here we add a new executor node type named "Result Cache". The planner can include this node type in the plan to have the executor cache the results from the inner side of parameterized nested loop joins. This allows caching of tuples for sets of parameters so that in the event that the node sees the same parameter values again, it can just return the cached tuples instead of rescanning the inner side of the join all over again. Internally, result cache uses a hash table in order to quickly find tuples that have been previously cached. For certain data sets, this can significantly improve the performance of joins. The best cases for using this new node type are for join problems where a large portion of the tuples from the inner side of the join have no join partner on the outer side of the join. In such cases, hash join would have to hash values that are never looked up, thus bloating the hash table and possibly causing it to multi-batch. Merge joins would have to skip over all of the unmatched rows. If we use a nested loop join with a result cache, then we only cache tuples that have at least one join partner on the outer side of the join. The benefits of using a parameterized nested loop with a result cache increase when there are fewer distinct values being looked up and the number of lookups of each value is large. Also, hash probes to lookup the cache can be much faster than the hash probe in a hash join as it's common that the result cache's hash table is much smaller than the hash join's due to result cache only caching useful tuples rather than all tuples from the inner side of the join. This variation in hash probe performance is more significant when the hash join's hash table no longer fits into the CPU's L3 cache, but the result cache's hash table does. The apparent "random" access of hash buckets with each hash probe can cause a poor L3 cache hit ratio for large hash tables. Smaller hash tables generally perform better. The hash table used for the cache limits itself to not exceeding work_mem * hash_mem_multiplier in size. We maintain a dlist of keys for this cache and when we're adding new tuples and realize we've exceeded the memory budget, we evict cache entries starting with the least recently used ones until we have enough memory to add the new tuples to the cache. For parameterized nested loop joins, we now consider using one of these result cache nodes in between the nested loop node and its inner node. We determine when this might be useful based on cost, which is primarily driven off of what the expected cache hit ratio will be. Estimating the cache hit ratio relies on having good distinct estimates on the nested loop's parameters. For now, the planner will only consider using a result cache for parameterized nested loop joins. This works for both normal joins and also for LATERAL type joins to subqueries. It is possible to use this new node for other uses in the future. For example, to cache results from correlated subqueries. However, that's not done here due to some difficulties obtaining a distinct estimation on the outer plan to calculate the estimated cache hit ratio. Currently we plan the inner plan before planning the outer plan so there is no good way to know if a result cache would be useful or not since we can't estimate the number of times the subplan will be called until the outer plan is generated. The functionality being added here is newly introducing a dependency on the return value of estimate_num_groups() during the join search. Previously, during the join search, we only ever needed to perform selectivity estimations. With this commit, we need to use estimate_num_groups() in order to estimate what the hit ratio on the result cache will be. In simple terms, if we expect 10 distinct values and we expect 1000 outer rows, then we'll estimate the hit ratio to be 99%. Since cache hits are very cheap compared to scanning the underlying nodes on the inner side of the nested loop join, then this will significantly reduce the planner's cost for the join. However, it's fairly easy to see here that things will go bad when estimate_num_groups() incorrectly returns a value that's significantly lower than the actual number of distinct values. If this happens then that may cause us to make use of a nested loop join with a result cache instead of some other join type, such as a merge or hash join. Our distinct estimations have been known to be a source of trouble in the past, so the extra reliance on them here could cause the planner to choose slower plans than it did previous to having this feature. Distinct estimations are also fairly hard to estimate accurately when several tables have been joined already or when a WHERE clause filters out a set of values that are correlated to the expressions we're estimating the number of distinct value for. For now, the costing we perform during query planning for result caches does put quite a bit of faith in the distinct estimations being accurate. When these are accurate then we should generally see faster execution times for plans containing a result cache. However, in the real world, we may find that we need to either change the costings to put less trust in the distinct estimations being accurate or perhaps even disable this feature by default. There's always an element of risk when we teach the query planner to do new tricks that it decides to use that new trick at the wrong time and causes a regression. Users may opt to get the old behavior by turning the feature off using the enable_resultcache GUC. Currently, this is enabled by default. It remains to be seen if we'll maintain that setting for the release. Additionally, the name "Result Cache" is the best name I could think of for this new node at the time I started writing the patch. Nobody seems to strongly dislike the name. A few people did suggest other names but no other name seemed to dominate in the brief discussion that there was about names. Let's allow the beta period to see if the current name pleases enough people. If there's some consensus on a better name, then we can change it before the release. Please see the 2nd discussion link below for the discussion on the "Result Cache" name. Author: David Rowley Reviewed-by: Andy Fan, Justin Pryzby, Zhihong Yu, Hou Zhijie Tested-By: Konstantin Knizhnik Discussion: https://postgr.es/m/CAApHDvrPcQyQdWERGYWx8J%2B2DLUNgXu%2BfOSbQ1UscxrunyXyrQ%40mail.gmail.com Discussion: https://postgr.es/m/CAApHDvq=yQXr5kqhRviT2RhNKwToaWr9JAN5t+5_PzhuRJ3wvg@mail.gmail.com	2021-04-02 14:10:56 +13:00
Tom Lane	1ebdec8c03	Rethink handling of pass-by-value leaf datums in SP-GiST. The existing convention in SP-GiST is that any pass-by-value datatype is stored in Datum representation, i.e. it's of width sizeof(Datum) even when typlen is less than that. This is okay, or at least it's too late to change it, for prefix datums and node-label datums in inner (upper) tuples. But it's problematic for leaf datums, because we'd prefer those to be stored in Postgres' standard on-disk representation so that we can easily extend leaf tuples to carry additional "included" columns. I believe, however, that we can get away with just up and changing that. This would be an unacceptable on-disk-format break, but there are two big mitigating factors: 1. It seems quite unlikely that there are any SP-GiST opclasses out there that use pass-by-value leaf datatypes. Certainly none of the ones in core do, nor has codesearch.debian.net heard of any. Given what SP-GiST is good for, it's hard to conceive of a use-case where the leaf-level values would be both small and fixed-width. (As an example, if you wanted to index text values with the leaf level being just a byte, then every text string would have to be represented with one level of inner tuple per preceding byte, which would be horrendously space-inefficient and slow to access. You always want to use as few inner-tuple levels as possible, leaving as much as possible in the leaf values.) 2. Even granting that you have such an index, this change only breaks things on big-endian machines. On little-endian, the high order bytes of the Datum format will now just appear to be alignment padding space. So, change the code to store pass-by-value leaf datums in their usual on-disk form. Inner-tuple datums are not touched. This is extracted from a larger patch that intends to add support for "included" columns. I'm committing it separately for visibility in our commit logs. Pavel Borisov and Tom Lane, reviewed by Andrey Borodin Discussion: https://postgr.es/m/CALT9ZEFi-vMp4faht9f9Junb1nO3NOSjhpxTmbm1UGLMsLqiEQ@mail.gmail.com	2021-04-01 17:55:17 -04:00
Stephen Frost	c9c41c7a33	Rename Default Roles to Predefined Roles The term 'default roles' wasn't quite apt as these roles aren't able to be modified or removed after installation, so rename them to be 'Predefined Roles' instead, adding an entry into the newly added Obsolete Appendix to help users of current releases find the new documentation. Bruce Momjian and Stephen Frost Discussion: https://postgr.es/m/157742545062.1149.11052653770497832538%40wrigleys.postgresql.org and https://www.postgresql.org/message-id/20201120211304.GG16415@tamriel.snowman.net	2021-04-01 15:32:06 -04:00
Peter Eisentraut	91e7c90329	Fix internal extract(timezone_minute) formulas Through various refactorings over time, the extract(timezone_minute from time with time zone) and extract(timezone_minute from timestamp with time zone) implementations ended up with two different but equally nonsensical formulas by using SECS_PER_MINUTE and MINS_PER_HOUR interchangeably. Since those two are of course both the same number, the formulas do work, but for readability, fix them to be semantically correct.	2021-04-01 16:12:53 +02:00
Heikki Linnakangas	f82de5c46b	Do COPY FROM encoding conversion/verification in larger chunks. This gives a small performance gain, by reducing the number of calls to the conversion/verification function, and letting it work with larger inputs. Also, reorganizing the input pipeline makes it easier to parallelize the input parsing: after the input has been converted to the database encoding, the next stage of finding the newlines can be done in parallel, because there cannot be any newline chars "embedded" in multi-byte characters in the encodings that we support as server encodings. This changes behavior in one corner case: if client and server encodings are the same single-byte encoding (e.g. latin1), previously the input would not be checked for zero bytes ('\0'). Any fields containing zero bytes would be truncated at the zero. But if encoding conversion was needed, the conversion routine would throw an error on the zero. After this commit, the input is always checked for zeros. Reviewed-by: John Naylor Discussion: https://www.postgresql.org/message-id/e7861509-3960-538a-9025-b75a61188e01%40iki.fi	2021-04-01 12:23:40 +03:00
Heikki Linnakangas	ea1b99a661	Add 'noError' argument to encoding conversion functions. With the 'noError' argument, you can try to convert a buffer without knowing the character boundaries beforehand. The functions now need to return the number of input bytes successfully converted. This is is a backwards-incompatible change, if you have created a custom encoding conversion with CREATE CONVERSION. This adds a check to pg_upgrade for that, refusing the upgrade if there are any user-defined encoding conversions. Custom conversions are very rare, there are no commonly used extensions that I know of that uses that feature. No other objects can depend on conversions, so if you do have one, you can fairly easily drop it before upgrading, and recreate it after the upgrade with an updated version. Add regression tests for built-in encoding conversions. This doesn't cover every conversion, but it covers all the internal functions in conv.c that are used to implement the conversions. Reviewed-by: John Naylor Discussion: https://www.postgresql.org/message-id/e7861509-3960-538a-9025-b75a61188e01%40iki.fi	2021-04-01 11:45:22 +03:00
Amit Kapila	4778826532	Ensure to send a prepare after we detect concurrent abort during decoding. It is possible that while decoding a prepared transaction, it gets aborted concurrently via a ROLLBACK PREPARED command. In that case, we were skipping all the changes and directly sending Rollback Prepared when we find the same in WAL. However, the downstream has no idea of the GID of such a transaction. So, ensure to send prepare even when a concurrent abort is detected. Author: Ajin Cherian Reviewed-by: Markus Wanner, Amit Kapila Discussion: https://postgr.es/m/f82133c6-6055-b400-7922-97dae9f2b50b@enterprisedb.com	2021-04-01 07:57:34 +05:30
David Rowley	28b3e3905c	Revert `b6002a796` This removes "Add Result Cache executor node". It seems that something weird is going on with the tracking of cache hits and misses as highlighted by many buildfarm animals. It's not yet clear what the problem is as other parts of the plan indicate that the cache did work correctly, it's just the hits and misses that were being reported as 0. This is especially a bad time to have the buildfarm so broken, so reverting before too many more animals go red. Discussion: https://postgr.es/m/CAApHDvq_hydhfovm4=izgWs+C5HqEeRScjMbOgbpC-jRAeK3Yw@mail.gmail.com	2021-04-01 13:33:23 +13:00
David Rowley	b6002a796d	Add Result Cache executor node Here we add a new executor node type named "Result Cache". The planner can include this node type in the plan to have the executor cache the results from the inner side of parameterized nested loop joins. This allows caching of tuples for sets of parameters so that in the event that the node sees the same parameter values again, it can just return the cached tuples instead of rescanning the inner side of the join all over again. Internally, result cache uses a hash table in order to quickly find tuples that have been previously cached. For certain data sets, this can significantly improve the performance of joins. The best cases for using this new node type are for join problems where a large portion of the tuples from the inner side of the join have no join partner on the outer side of the join. In such cases, hash join would have to hash values that are never looked up, thus bloating the hash table and possibly causing it to multi-batch. Merge joins would have to skip over all of the unmatched rows. If we use a nested loop join with a result cache, then we only cache tuples that have at least one join partner on the outer side of the join. The benefits of using a parameterized nested loop with a result cache increase when there are fewer distinct values being looked up and the number of lookups of each value is large. Also, hash probes to lookup the cache can be much faster than the hash probe in a hash join as it's common that the result cache's hash table is much smaller than the hash join's due to result cache only caching useful tuples rather than all tuples from the inner side of the join. This variation in hash probe performance is more significant when the hash join's hash table no longer fits into the CPU's L3 cache, but the result cache's hash table does. The apparent "random" access of hash buckets with each hash probe can cause a poor L3 cache hit ratio for large hash tables. Smaller hash tables generally perform better. The hash table used for the cache limits itself to not exceeding work_mem * hash_mem_multiplier in size. We maintain a dlist of keys for this cache and when we're adding new tuples and realize we've exceeded the memory budget, we evict cache entries starting with the least recently used ones until we have enough memory to add the new tuples to the cache. For parameterized nested loop joins, we now consider using one of these result cache nodes in between the nested loop node and its inner node. We determine when this might be useful based on cost, which is primarily driven off of what the expected cache hit ratio will be. Estimating the cache hit ratio relies on having good distinct estimates on the nested loop's parameters. For now, the planner will only consider using a result cache for parameterized nested loop joins. This works for both normal joins and also for LATERAL type joins to subqueries. It is possible to use this new node for other uses in the future. For example, to cache results from correlated subqueries. However, that's not done here due to some difficulties obtaining a distinct estimation on the outer plan to calculate the estimated cache hit ratio. Currently we plan the inner plan before planning the outer plan so there is no good way to know if a result cache would be useful or not since we can't estimate the number of times the subplan will be called until the outer plan is generated. The functionality being added here is newly introducing a dependency on the return value of estimate_num_groups() during the join search. Previously, during the join search, we only ever needed to perform selectivity estimations. With this commit, we need to use estimate_num_groups() in order to estimate what the hit ratio on the result cache will be. In simple terms, if we expect 10 distinct values and we expect 1000 outer rows, then we'll estimate the hit ratio to be 99%. Since cache hits are very cheap compared to scanning the underlying nodes on the inner side of the nested loop join, then this will significantly reduce the planner's cost for the join. However, it's fairly easy to see here that things will go bad when estimate_num_groups() incorrectly returns a value that's significantly lower than the actual number of distinct values. If this happens then that may cause us to make use of a nested loop join with a result cache instead of some other join type, such as a merge or hash join. Our distinct estimations have been known to be a source of trouble in the past, so the extra reliance on them here could cause the planner to choose slower plans than it did previous to having this feature. Distinct estimations are also fairly hard to estimate accurately when several tables have been joined already or when a WHERE clause filters out a set of values that are correlated to the expressions we're estimating the number of distinct value for. For now, the costing we perform during query planning for result caches does put quite a bit of faith in the distinct estimations being accurate. When these are accurate then we should generally see faster execution times for plans containing a result cache. However, in the real world, we may find that we need to either change the costings to put less trust in the distinct estimations being accurate or perhaps even disable this feature by default. There's always an element of risk when we teach the query planner to do new tricks that it decides to use that new trick at the wrong time and causes a regression. Users may opt to get the old behavior by turning the feature off using the enable_resultcache GUC. Currently, this is enabled by default. It remains to be seen if we'll maintain that setting for the release. Additionally, the name "Result Cache" is the best name I could think of for this new node at the time I started writing the patch. Nobody seems to strongly dislike the name. A few people did suggest other names but no other name seemed to dominate in the brief discussion that there was about names. Let's allow the beta period to see if the current name pleases enough people. If there's some consensus on a better name, then we can change it before the release. Please see the 2nd discussion link below for the discussion on the "Result Cache" name. Author: David Rowley Reviewed-by: Andy Fan, Justin Pryzby, Zhihong Yu Tested-By: Konstantin Knizhnik Discussion: https://postgr.es/m/CAApHDvrPcQyQdWERGYWx8J%2B2DLUNgXu%2BfOSbQ1UscxrunyXyrQ%40mail.gmail.com Discussion: https://postgr.es/m/CAApHDvq=yQXr5kqhRviT2RhNKwToaWr9JAN5t+5_PzhuRJ3wvg@mail.gmail.com	2021-04-01 12:32:22 +13:00
Tom Lane	c545e9524d	Don't prematurely cram a value into a short int. Since `a4d75c86b`, some buildfarm members have been warning that Assert(attnum <= MaxAttrNumber); is useless if attnum is an AttrNumber. I'm not certain how plausible it is that the value coming out of the bitmap could actually exceed MaxAttrNumber, but we seem to have thought that that was possible back in `7300a6995`. Revert the intermediate variable to int so that we have the same overflow protection as before.	2021-03-31 16:45:24 -04:00
Tom Lane	6197db5340	Improve style of some replication-related error messages. Put the remote end's error message into the primary error string, instead of relegating it to errdetail(). Although this could end up being awkward if the remote sends us a really long error message, it seems more in keeping with our message style guidelines, and more helpful in situations where the errdetail could get dropped. Peter Smith Discussion: https://postgr.es/m/CAHut+Ps-Qv2yQceCwobQDP0aJOkfDzRFrOaR6+2Op2K=WHGeWg@mail.gmail.com	2021-03-31 15:25:53 -04:00
Joe Conway	b12bd4869b	Fix has_column_privilege function corner case According to the comments, when an invalid or dropped column oid is passed to has_column_privilege(), the intention has always been to return NULL. However, when the caller had table level privilege the invalid/missing column was never discovered, because table permissions were checked first. Fix that by introducing extended versions of pg_attribute_acl(check\|mask) and pg_class_acl(check\|mask) which take a new argument, is_missing. When is_missing is NULL, the old behavior is preserved. But when is_missing is passed by the caller, no ERROR is thrown for dropped or missing columns/relations, and is_missing is flipped to true. This in turn allows has_column_privilege to check for column privileges first, providing the desired semantics. Not backpatched since it is a user visible behavioral change with no previous complaints, and the fix is a bit on the invasive side. Author: Joe Conway Reviewed-By: Tom Lane Reported by: Ian Barwick Discussion: https://postgr.es/m/flat/9b5f4311-157b-4164-7fe7-077b4fe8ed84%40joeconway.com	2021-03-31 13:55:25 -04:00
Tom Lane	86dc90056d	Rework planning and execution of UPDATE and DELETE. This patch makes two closely related sets of changes: 1. For UPDATE, the subplan of the ModifyTable node now only delivers the new values of the changed columns (i.e., the expressions computed in the query's SET clause) plus row identity information such as CTID. ModifyTable must re-fetch the original tuple to merge in the old values of any unchanged columns. The core advantage of this is that the changed columns are uniform across all tables of an inherited or partitioned target relation, whereas the other columns might not be. A secondary advantage, when the UPDATE involves joins, is that less data needs to pass through the plan tree. The disadvantage of course is an extra fetch of each tuple to be updated. However, that seems to be very nearly free in context; even worst-case tests don't show it to add more than a couple percent to the total query cost. At some point it might be interesting to combine the re-fetch with the tuple access that ModifyTable must do anyway to mark the old tuple dead; but that would require a good deal of refactoring and it seems it wouldn't buy all that much, so this patch doesn't attempt it. 2. For inherited UPDATE/DELETE, instead of generating a separate subplan for each target relation, we now generate a single subplan that is just exactly like a SELECT's plan, then stick ModifyTable on top of that. To let ModifyTable know which target relation a given incoming row refers to, a tableoid junk column is added to the row identity information. This gets rid of the horrid hack that was inheritance_planner(), eliminating O(N^2) planning cost and memory consumption in cases where there were many unprunable target relations. Point 2 of course requires point 1, so that there is a uniform definition of the non-junk columns to be returned by the subplan. We can't insist on uniform definition of the row identity junk columns however, if we want to keep the ability to have both plain and foreign tables in a partitioning hierarchy. Since it wouldn't scale very far to have every child table have its own row identity column, this patch includes provisions to merge similar row identity columns into one column of the subplan result. In particular, we can merge the whole-row Vars typically used as row identity by FDWs into one column by pretending they are type RECORD. (It's still okay for the actual composite Datums to be labeled with the table's rowtype OID, though.) There is more that can be done to file down residual inefficiencies in this patch, but it seems to be committable now. FDW authors should note several API changes: * The argument list for AddForeignUpdateTargets() has changed, and so has the method it must use for adding junk columns to the query. Call add_row_identity_var() instead of manipulating the parse tree directly. You might want to reconsider exactly what you're adding, too. * PlanDirectModify() must now work a little harder to find the ForeignScan plan node; if the foreign table is part of a partitioning hierarchy then the ForeignScan might not be the direct child of ModifyTable. See postgres_fdw for sample code. * To check whether a relation is a target relation, it's no longer sufficient to compare its relid to root->parse->resultRelation. Instead, check it against all_result_relids or leaf_result_relids, as appropriate. Amit Langote and Tom Lane Discussion: https://postgr.es/m/CA+HiwqHpHdqdDn48yCEhynnniahH78rwcrv1rEX65-fsZGBOLQ@mail.gmail.com	2021-03-31 11:52:37 -04:00
Peter Eisentraut	055fee7eb4	Allow an alias to be attached to a JOIN ... USING This allows something like SELECT ... FROM t1 JOIN t2 USING (a, b, c) AS x where x has the columns a, b, c and unlike a regular alias it does not hide the range variables of the tables being joined t1 and t2. Per SQL:2016 feature F404 "Range variable for common column names". Reviewed-by: Vik Fearing <vik.fearing@2ndquadrant.com> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://www.postgresql.org/message-id/flat/454638cf-d563-ab76-a585-2564428062af@2ndquadrant.com	2021-03-31 17:10:50 +02:00
Etsuro Fujita	27e1f14563	Add support for asynchronous execution. This implements asynchronous execution, which runs multiple parts of a non-parallel-aware Append concurrently rather than serially to improve performance when possible. Currently, the only node type that can be run concurrently is a ForeignScan that is an immediate child of such an Append. In the case where such ForeignScans access data on different remote servers, this would run those ForeignScans concurrently, and overlap the remote operations to be performed simultaneously, so it'll improve the performance especially when the operations involve time-consuming ones such as remote join and remote aggregation. We may extend this to other node types such as joins or aggregates over ForeignScans in the future. This also adds the support for postgres_fdw, which is enabled by the table-level/server-level option "async_capable". The default is false. Robert Haas, Kyotaro Horiguchi, Thomas Munro, and myself. This commit is mostly based on the patch proposed by Robert Haas, but also uses stuff from the patch proposed by Kyotaro Horiguchi and from the patch proposed by Thomas Munro. Reviewed by Kyotaro Horiguchi, Konstantin Knizhnik, Andrey Lepikhov, Movead Li, Thomas Munro, Justin Pryzby, and others. Discussion: https://postgr.es/m/CA%2BTgmoaXQEt4tZ03FtQhnzeDEMzBck%2BLrni0UWHVVgOTnA6C1w%40mail.gmail.com Discussion: https://postgr.es/m/CA%2BhUKGLBRyu0rHrDCMC4%3DRn3252gogyp1SjOgG8SEKKZv%3DFwfQ%40mail.gmail.com Discussion: https://postgr.es/m/20200228.170650.667613673625155850.horikyota.ntt%40gmail.com	2021-03-31 18:45:00 +09:00
Peter Eisentraut	66392d3965	Add p_names field to ParseNamespaceItem ParseNamespaceItem had a wired-in assumption that p_rte->eref describes the table and column aliases exposed by the nsitem. This relaxes this by creating a separate p_names field in an nsitem. This is mainly preparation for a patch for JOIN USING aliases, but it saves one indirection in common code paths, so it's possibly a win on its own. Author: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://www.postgresql.org/message-id/785329.1616455091@sss.pgh.pa.us	2021-03-31 10:52:37 +02:00
Peter Eisentraut	91c5a8caaa	Add errhint_plural() function and make use of it Similar to existing errmsg_plural() and errdetail_plural(). Some errhint() calls hadn't received the proper plural treatment yet.	2021-03-31 09:16:25 +02:00
Noah Misch	0ff8bbdee1	Accept slightly-filled pages for tuples larger than fillfactor. We always inserted a larger-than-fillfactor tuple into a newly-extended page, even when existing pages were empty or contained nothing but an unused line pointer. This was unnecessary relation extension. Start tolerating page usage up to 1/8 the maximum space that could be taken up by line pointers. This is somewhat arbitrary, but it should allow more cases to reuse pages. This has no effect on tables with fillfactor=100 (the default). John Naylor and Floris van Nee. Reviewed by Matthias van de Meent. Reported by Floris van Nee. Discussion: https://postgr.es/m/6e263217180649339720afe2176c50aa@opammb0562.comp.optiver.com	2021-03-30 18:53:44 -07:00
Tom Lane	65158f497a	Remove small inefficiency in ExecARDeleteTriggers/ExecARUpdateTriggers. Whilst poking at nodeModifyTable.c, I chanced to notice that while its calls to ExecBRTriggers and ExecIRTriggers are protected by tests to see if there are any relevant triggers to fire, its calls to ExecARTriggers are not; the latter functions do the equivalent tests themselves. This seems possibly reasonable given the more complex conditions involved, but what's less reasonable is that the ExecAR functions aren't careful to do no work when there is no work to be done. ExecARInsertTriggers gets this right, but the other two will both force creation of a slot that the query may have no use for. ExecARUpdateTriggers additionally performed a usually-useless ExecClearTuple() on that slot. This is probably all pretty microscopic in real workloads, but a cycle shaved is a cycle earned.	2021-03-30 20:01:31 -04:00
Bruce Momjian	5da9868ed9	In messages, use singular nouns for -1, like we do for +1. This outputs "-1 year", not "-1 years". Reported-by: neverov.max@gmail.com Bug: 16939 Discussion: https://postgr.es/m/16939-cceeb03fb72736ee@postgresql.org	2021-03-30 18:34:27 -04:00
Stephen Frost	4753ef37e0	Use a WaitLatch for vacuum/autovacuum sleeping Instead of using pg_usleep() in vacuum_delay_point(), use a WaitLatch. This has the advantage that we will realize if the postmaster has been killed since the last time we decided to sleep while vacuuming. Reviewed-by: Thomas Munro Discussion: https://postgr.es/m/CAFh8B=kcdk8k-Y21RfXPu5dX=bgPqJ8TC3p_qxR_ygdBS=JN5w@mail.gmail.com	2021-03-30 12:52:56 -04:00
David Rowley	ed934d4fa3	Allow estimate_num_groups() to pass back further details about the estimation Here we add a new output parameter to estimate_num_groups() to allow it to inform the caller of additional, possibly useful information about the estimation. The new output parameter is a struct that currently contains just a single field with a set of flags. This was done rather than having the flags as an output parameter to allow future fields to be added without having to change the signature of the function at a later date when we want to pass back further information that might not be suitable to store in the flags field. It seems reasonable that one day in the future that the planner would want to know more about the estimation. For example, how many individual sets of statistics was the estimation generated from? The planner may want to take that into account if we ever want to consider risks as well as costs when generating plans. For now, there's only 1 flag we set in the flags field. This is to indicate if the estimation fell back on using the hard-coded constants in any part of the estimation. Callers may like to change their behavior if this is set, and this gives them the ability to do so. Callers may pass the flag pointer as NULL if they have no interest in obtaining any additional information about the estimate. We're not adding any actual usages of these flags here. Some follow-up commits will make use of this feature. Additionally, we're also not making any changes to add support for clauselist_selectivity() and clauselist_selectivity_ext(). However, if this is required in the future then the same struct being added here should be fine to use as a new output argument for those functions too. Author: David Rowley Discussion: https://postgr.es/m/CAApHDvqQqpk=1W-G_ds7A9CsXX3BggWj_7okinzkLVhDubQzjA@mail.gmail.com	2021-03-30 20:52:46 +13:00
David Rowley	efd9d92bb3	Fix compiler warning in unistr function Some compilers are not aware that elog/ereport ERROR does not return.	2021-03-30 20:28:09 +13:00
Amit Kapila	f64ea6dc5c	Add a xid argument to the filter_prepare callback for output plugins. Along with gid, this provides a different way to identify the transaction. The users that use xid in some way to prepare the transactions can use it to filter prepare transactions. The later commands COMMIT PREPARED or ROLLBACK PREPARED carries both identifiers, providing an output plugin the choice of what to use. Author: Markus Wanner Reviewed-by: Vignesh C, Amit Kapila Discussion: https://postgr.es/m/ee280000-7355-c4dc-e47b-2436e7be959c@enterprisedb.com	2021-03-30 10:34:43 +05:30
David Rowley	af527705ed	Adjust design of per-worker parallel seqscan data struct The design of the data structures which allow storage of the per-worker memory during parallel seq scans were not ideal. The work done in `56788d215` required an additional data structure to allow workers to remember the range of pages that had been allocated to them for processing during a parallel seqscan. That commit added a void pointer field to TableScanDescData to allow heapam to store the per-worker allocation information. However putting the field there made very little sense given that we have AM specific structs for that, e.g. HeapScanDescData. Here we remove the void pointer field from TableScanDescData and add a dedicated field for this purpose to HeapScanDescData. Previously we also allocated memory for this parallel per-worker data for all scans, regardless if it was a parallel scan or not. This was just a wasted allocation for non-parallel scans, so here we make the allocation conditional on the scan being parallel. Also, add previously missing pfree() to free the per-worker data in heap_endscan(). Reported-by: Andres Freund Reviewed-by: Andres Freund Discussion: https://postgr.es/m/20210317023101.anvejcfotwka6gaa@alap3.anarazel.de	2021-03-30 10:17:09 +13:00
Andrew Dunstan	6d7a6feac4	Allow matching the DN of a client certificate for authentication Currently we only recognize the Common Name (CN) of a certificate's subject to be matched against the user name. Thus certificates with subjects '/OU=eng/CN=fred' and '/OU=sales/CN=fred' will have the same connection rights. This patch provides an option to match the whole Distinguished Name (DN) instead of just the CN. On any hba line using client certificate identity, there is an option 'clientname' which can have values of 'DN' or 'CN'. The default is 'CN', the current procedure. The DN is matched against the RFC2253 formatted DN, which looks like 'CN=fred,OU=eng'. This facility of probably best used in conjunction with an ident map. Discussion: https://postgr.es/m/92e70110-9273-d93c-5913-0bccb6562740@dunslane.net Reviewed-By: Michael Paquier, Daniel Gustafsson, Jacob Champion	2021-03-29 15:49:39 -04:00
Peter Eisentraut	f37fec837c	Add unistr function This allows decoding a string with Unicode escape sequences. It is similar to Unicode escape strings, but offers some more flexibility. Author: Pavel Stehule <pavel.stehule@gmail.com> Reviewed-by: Asif Rehman <asifr.rehman@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/CAFj8pRA5GnKT+gDVwbVRH2ep451H_myBt+NTz8RkYUARE9+qOQ@mail.gmail.com	2021-03-29 11:56:53 +02:00
Peter Geoghegan	30aaab26e5	PageAddItemExtended(): Add LP_UNUSED assertion. Assert that LP_UNUSED items have no storage. If it's worth having defensive code in non-assert builds then it's worth having an assertion as well.	2021-03-28 20:10:02 -07:00
David Rowley	f58b230ed0	Cache if PathTarget and RestrictInfos contain volatile functions Here we aim to reduce duplicate work done by contain_volatile_functions() by caching whether PathTargets and RestrictInfos contain any volatile functions the first time contain_volatile_functions() is called for them. Any future calls for these nodes just use the cached value rather than going to the trouble of recursively checking the sub-node all over again. Thanks to Tom Lane for the idea. Any locations in the code which make changes to a PathTarget or RestrictInfo which could change the outcome of the volatility check must change the cached value back to VOLATILITY_UNKNOWN again. contain_volatile_functions() is the only code in charge of setting the cache value to either VOLATILITY_VOLATILE or VOLATILITY_NOVOLATILE. Some existing code does benefit from this additional caching, however, this change is mainly aimed at an upcoming patch that must check for volatility during the join search. Repeated volatility checks in that case can become very expensive when the join search contains more than a few relations. Author: David Rowley Discussion: https://postgr.es/m/3795226.1614059027@sss.pgh.pa.us	2021-03-29 14:55:26 +13:00
Peter Eisentraut	8df2f37114	Improve consistency of SQL code capitalization	2021-03-27 10:17:12 +01:00

... 2 3 4 5 6 ...

21944 Commits