postgresql

Commit Graph

Author	SHA1	Message	Date
Tom Lane	e81e5741a6	Fix full text search to handle NOT above a phrase search correctly. Queries such as '!(foo<->bar)' failed to find matching rows when implemented as a GiST or GIN index search. That's because of failing to handle phrase searches as tri-valued when considering a query without any position information for the target tsvector. We can only say that the phrase operator might match, not that it does match; and therefore its NOT also might match. The previous coding incorrectly inverted the approximate phrase result to decide that there was certainly no match. To fix, we need to make TS_phrase_execute return a real ternary result, and then bubble that up accurately in TS_execute. As long as we have to do that anyway, we can simplify the baroque things TS_phrase_execute was doing internally to manage tri-valued searching with only a bool as explicit result. For now, I left the externally-visible result of TS_execute as a plain bool. There do not appear to be any outside callers that need to distinguish a three-way result, given that they passed in a flag saying what to do in the absence of position data. This might need to change someday, but we wouldn't want to back-patch such a change. Although tsginidx.c has its own TS_execute_ternary implementation for use at upper index levels, that sadly managed to get this case wrong as well :-(. Fixing it is a lot easier fortunately. Per bug #16388 from Charles Offenbacher. Back-patch to 9.6 where phrase search was introduced. Discussion: https://postgr.es/m/16388-98cffba38d0b7e6e@postgresql.org	2020-04-27 12:21:04 -04:00
Peter Eisentraut	d51f704fd8	pg_dump: Replace can't-happen error with assertion	2020-04-27 14:24:20 +02:00
Michael Paquier	641b76d9d1	Fix some typos Author: Justin Pryzby Discussion: https://postgr.es/m/20200408165653.GF2228@telsasoft.com	2020-04-27 14:59:36 +09:00
Peter Eisentraut	f057980149	Fix typo from `303640199d`	2020-04-26 13:48:33 +02:00
Noah Misch	896135512e	Raise a timeout to 180s, in test 003_recovery_targets.pl. Buildfarm member chipmunk has failed twice due to taking >30s, and twenty-four runs of other members have used >5s. The test is new in v13, so no back-patch.	2020-04-25 18:45:27 -07:00
Peter Geoghegan	7154aa16a6	Fix another minor page deletion buffer lock issue. Avoid accessing the leaf page's top parent tuple without a buffer lock held during the second phase of nbtree page deletion. The old approach was safe, though only because VACUUM never drops its buffer pin (and because only VACUUM itself can modify a half-dead page). Even still, it seems like a good idea to be strict here. Tighten things up by copying the top parent page's block number to a local variable before releasing the buffer lock on the leaf page -- not after. This is a follow-up to commit `fa7ff642`, which fixed a similar issue in the first phase of nbtree page deletion. Update some related comments in passing. Discussion: https://postgr.es/m/CAH2-WzkLgyN3zBvRZ1pkNJThC=xi_0gpWRUb_45eexLH1+k2_Q@mail.gmail.com	2020-04-25 16:45:20 -07:00
Peter Geoghegan	fa7ff642c2	Fix minor nbtree page deletion buffer lock issue. Avoid accessing the deletion target page's special area during nbtree page deletion at a point where there is no buffer lock held. This issue was detected by a patch that extends Valgrind's memcheck tool to mark nbtree pages that are unsafe to access (due to not having a buffer lock or buffer pin) as NOACCESS. We do hold a buffer pin at this point, and only access the special area, so the old approach was safe. Even still, it seems like a good idea to tighten up the rules in this area. There is no reason to not simply insist on always holding a buffer lock (not just a pin) when accessing nbtree pages. Update some related comments in passing. Discussion: https://postgr.es/m/CAH2-WzkLgyN3zBvRZ1pkNJThC=xi_0gpWRUb_45eexLH1+k2_Q@mail.gmail.com	2020-04-25 14:17:02 -07:00
Noah Misch	f246ea3b2a	In caught-up logical walsender, sleep only in WalSndWaitForWal(). Before sleeping, WalSndWaitForWal() sends a keepalive if MyWalSnd->write < sentPtr. When the latest physical LSN yields no logical replication messages (a common case), that keepalive elicits a reply. Processing the reply updates pg_stat_replication.replay_lsn. WalSndLoop() lacks that; when WalSndLoop() slept, replay_lsn advancement could stall until wal_receiver_status_interval elapsed. This sometimes stalled src/test/subscription/t/001_rep_changes.pl for up to 10s. Reviewed by Fujii Masao and Michael Paquier. Discussion: https://postgr.es/m/20200418070142.GA1075445@rfd.leadboat.com	2020-04-25 10:18:12 -07:00
Noah Misch	72a3dc321d	Revert "When WalSndCaughtUp, sleep only in WalSndWaitForWal()." This reverts commit `4216858122`. It caused idle physical walsenders to busy-wait, as reported by Fujii Masao. Discussion: https://postgr.es/m/20200417054146.GA1061007@rfd.leadboat.com	2020-04-25 10:17:26 -07:00
Andrew Gierth	d9a4cce29d	Fix error case for CREATE ROLE ... IN ROLE. CreateRole() was passing a Value node, not a RoleSpec node, for the newly-created role name when adding the role as a member of existing roles for the IN ROLE syntax. This mistake went unnoticed because the node in question is used only for error messages and is not accessed on non-error paths. In older pg versions (such as 9.5 where this was found), this results in an "unexpected node type" error in place of the real error. That node type check was removed at some point, after which the code would accidentally fail to fail on 64-bit platforms (on which accessing the Value node as if it were a RoleSpec would be mostly harmless) or give an "unexpected role type" error on 32-bit platforms. Fix the code to pass the correct node type, and add an lfirst_node assertion just in case. Per report on irc from user m1chelangelo. Backpatch all the way, because this error has been around for a long time.	2020-04-25 05:09:30 +01:00
Tom Lane	6c5f916168	Update Windows timezone name list to include currently-known zones. Thanks to Juan José Santamaría Flecha. Discussion: https://postgr.es/m/5752.1587740484@sss.pgh.pa.us	2020-04-24 17:53:23 -04:00
Tom Lane	bd8c5cee96	Improve placement of "display name" comment in win32_tzmap[] entries. Sticking this comment at the end of the last line was a bad idea: it's not particularly readable, and it tempts pgindent to mess with line breaks within the comment, which in turn reveals that win32tzlist.pl's clean_displayname() does the wrong thing to clean up such line breaks. While that's not hard to fix, there's basically no excuse for this arrangement to begin with, especially since it makes the table layout needlessly vary across back branches with different pgindent rules. Let's just put the comment inside the braces, instead. This commit just moves and reformats the comments, and updates win32tzlist.pl to match; there's no actual data change. Per odd-looking results from Juan José Santamaría Flecha. Back-patch, since the point is to make win32_tzmap[] look the same in all supported branches again. Discussion: https://postgr.es/m/5752.1587740484@sss.pgh.pa.us	2020-04-24 17:21:44 -04:00
Bruce Momjian	395a9a1248	git_changelog: use modern format for rel branch names in example e.g., REL_12_STABLE	2020-04-24 15:16:07 -04:00
Robert Haas	05021a2c0c	Try to avoid compiler warnings in optimized builds. Per report from Andres Freund, who also says that this fix works for him. Discussion: http://postgr.es/m/20200405193118.alprgmozhxcfabkw@alap3.anarazel.de	2020-04-24 14:11:45 -04:00
Tom Lane	baf17ad9df	Repair performance regression in information_schema.triggers view. Commit `32ff26911` introduced use of rank() into the triggers view to calculate the spec-mandated action_order column. As written, this prevents query constraints on the table-name column from being pushed below the window aggregate step. That's bad for performance of this typical usage pattern, since the view now has to be evaluated for all tables not just the one(s) the user wants to see. It's also the cause of some recent buildfarm failures, in which trying to evaluate the view rows for triggers in process of being dropped resulted in "cache lookup failed for function NNN" errors. Those rows aren't of interest to the test script doing the query, but the filter that would eliminate them is being applied too late. None of this happened before the rank() call was there, so it's a regression compared to v10 and before. We can improve matters by changing the rank() call so that instead of partitioning by OIDs, it partitions by nspname and relname, casting those to sql_identifier so that they match the respective view output columns exactly. The planner has enough intelligence to know that constraints on partitioning columns are safe to push down, so this eliminates the performance problem and the regression test failure risk. We could make the other partitioning columns match view outputs as well, but it'd be more complicated and the performance benefits are questionable. Side note: as this stands, the planner will push down constraints on event_object_table and trigger_schema, but not on event_object_schema, because it checks for ressortgroupref matches not expression equivalence. That might be worth improving someday, but it's not necessary to fix the immediate concern. Back-patch to v11 where the rank() call was added. Ordinarily we'd not change information_schema in released branches, but the test failure has been seen in v12 and presumably could happen in v11 as well, so we need to do this to keep the buildfarm happy. The change is harmless so far as users are concerned. Some might wish to apply it to existing installations if performance of this type of query is of concern, but those who don't are no worse off. I bumped catversion in HEAD as a pro forma matter (there's no catalog incompatibility that would really require a re-initdb). Obviously that can't be done in the back branches. Discussion: https://postgr.es/m/5891.1587594470@sss.pgh.pa.us	2020-04-24 12:02:36 -04:00
Tom Lane	4cac3a49e6	Update time zone data files to tzdata release 2020a. DST law changes in Morocco and the Canadian Yukon. Historical corrections for Shanghai. The America/Godthab zone is renamed to America/Nuuk to reflect current English usage; however, the old name remains available as a compatibility link.	2020-04-24 10:54:47 -04:00
Peter Eisentraut	3a89615776	Update Unicode data to Unicode 13.0.0 and CLDR 37	2020-04-24 09:52:59 +02:00
Michael Paquier	f9c1b8dba4	Remove some unstable parts from new TAP test for archive status check. The test is proving to have timing issues when looking at archive status files on standbys after crash recovery, while other parts of the test rely on pg_stat_archiver as a wait point to make sure that a given state of the archiving is reached. The coverage is not heavily impacted by the removal of those extra tests. Per reports from several buildfarm animals, like crake, piculet, culicidae and francolin. Discussion: https://postgr.es/m/20200424005929.GK33034@paquier.xyz Backpatch-through: 9.5	2020-04-24 11:33:41 +09:00
Michael Paquier	4e87c4836a	Fix handling of WAL segments ready to be archived during crash recovery `78ea8b5` has fixed an issue related to the recycling of WAL segments on standbys depending on archive_mode. However, it has introduced a regression with the handling of WAL segments ready to be archived during crash recovery, causing those files to be recycled without getting archived. This commit fixes the regression by tracking in shared memory if a live cluster is either in crash recovery or archive recovery as the handling of WAL segments ready to be archived is different in both cases (those WAL segments should not be removed during crash recovery), and by using this new shared memory state to decide if a segment can be recycled or not. Previously, it was not possible to know if a cluster was in crash recovery or archive recovery as the shared state was able to track only if recovery was happening or not, leading to the problem. A set of TAP tests is added to close the gap here, making sure that WAL segments ready to be archived are correctly handled when a cluster is in archive or crash recovery with archive_mode set to "on" or "always", for both standby and primary. Reported-by: Benoît Lobréau Author: Jehan-Guillaume de Rorthais Reviewed-by: Kyotaro Horiguchi, Fujii Masao, Michael Paquier Discussion: https://postgr.es/m/20200331172229.40ee00dc@firost Backpatch-through: 9.5	2020-04-24 08:48:28 +09:00
Tom Lane	3436c5e283	Remove ACLDEBUG #define and associated code. In the footsteps of `aaf069aa3`, remove ACLDEBUG, which was the only other remaining undocumented symbol in pg_config_manual.h. The fact that nobody had bothered to document it in seventeen years is a good clue to its usefulness. In practice, none of the tracing logic it enabled would be of any value without additional effort. Discussion: https://postgr.es/m/6631.1587565046@sss.pgh.pa.us	2020-04-23 15:38:04 -04:00
Tom Lane	ee88ef55db	Remove useless (and broken) logging logic in memory context functions. Nobody really uses this stuff, especially not since we created valgrind-based infrastructure that does the same thing better. It is thus unsurprising that the generation.c and slab.c versions were actually broken. Rather than fix 'em, let's just remove 'em. Alexander Lakhin Discussion: https://postgr.es/m/8936216c-3492-3f6e-634b-d638fddc5f91@gmail.com	2020-04-23 15:27:37 -04:00
Robert Haas	ab7646ff92	Also rename 'struct manifest_info'. The previous commit renamed the struct's typedef, but not the struct name itself.	2020-04-23 09:47:50 -04:00
Robert Haas	3989dbdf12	Rename exposed identifiers to say "backup manifest". Function names declared "extern" now use BackupManifest in the name rather than just Manifest, and data types use backup_manifest rather than just manifest. Per note from Michael Paquier. Discussion: http://postgr.es/m/20200418125713.GG350229@paquier.xyz	2020-04-23 08:44:06 -04:00
Andres Freund	299298bc87	Fix transient memory leak for SRFs in FROM. In `a9c35cf85c` I changed ExecMakeTableFunctionResult() to dynamically allocate the FunctionCallInfo used to call the SRF. Unfortunately I did not account for the fact that the surrounding memory context has query lifetime, leading to a leak till the end of the query. In most cases the leak is fairly inconsequential, but if the FunctionScan is done many times in the query, the leak can add up. This happens e.g. if the function scan is on the inner side of a nested loop, due to a lateral join. EXPLAIN SELECT sum(f) FROM generate_series(1, 100000000) g(i), generate_series(i, i+1) f; quickly shows the leak. Instead of explicitly freeing the FunctionCallInfo it seems better to make sure all the per-set temporary state in ExecMakeTableFunctionResult() is cleaned up wholesale. Currently that's probably just the FunctionCallInfo allocation, but since there's some initialization work, and since there's already an appropriate context, this seems like a more robust approach. Bug: #16112 Reported-By: Ben Cornett Author: Andres Freund Reviewed-By: Tom Lane Discussion: https://postgr.es/m/16112-4448bbf55a404189%40postgresql.org Backpatch: 12, `a9c35cf85c`	2020-04-22 19:53:06 -07:00
Fujii Masao	0a89e93bfa	Fix option related issues in pg_verifybackup. This commit does: - get rid of the garbage code for unused --print-parse-wal option. - add help message for --quiet option into usage(). - fix typo of option name in help message. Author: Fujii Masao Reviewed-by: Robert Haas Discussion: https://postgr.es/m/ff4710f7-2331-4f6b-012e-d76da3275e91@oss.nttdata.com	2020-04-23 11:32:17 +09:00
Peter Geoghegan	48107e396f	nbtree: Rename BT_RESERVED_OFFSET_MASK. The mask was added by commit `8224de4f`, which introduced INCLUDE nbtree indexes. The status bits really were reserved initially. We now use 2 out of 4 of the bits for additional tuple metadata, though. Rename the mask to BT_STATUS_OFFSET_MASK. Also consolidate related nbtree.h code comments about the format of pivot tuples and posting list tuples.	2020-04-22 16:09:55 -07:00
Tomas Vondra	de0dc1a847	Fix cost_incremental_sort for expressions with varno 0. When estimating the number of pre-sorted groups in cost_incremental_sort we must not pass Vars with varno 0 to estimate_num_groups, which would cause failures in find_base_rel. This may happen when sorting output of set operations, thanks to generate_append_tlist. Unlike recurse_set_operations we can't easily access the original target list, so if we find any Vars with varno 0, we fall back to the default estimate DEFAULT_NUM_DISTINCT. Reported-by: Justin Pryzby Discussion: https://postgr.es/m/20200411214639.GK2228%40telsasoft.com	2020-04-23 00:15:24 +02:00
David Rowley	9f2c4edec2	Remove bogus Assert in foreign key cloning code This Assert was trying to ensure that the number of columns in the foreign key being cloned was the same number of attributes in the parentRel. Of course, it's perfectly valid to have columns in the table which are not part of the foreign key constraint. It appears that this Assert was misunderstanding that. Reported-by: Rajkumar Raghuwanshi Reviewed-by: amul sul Discussion: https://postgr.es/m/CAKcux6=z1dtiWw5BOpqDx-U6KTiq+zD0Y2m810zUtWL+giVXWA@mail.gmail.com	2020-04-22 22:12:19 +12:00
Peter Eisentraut	aaf069aa34	Remove HEAPDEBUGALL This has been broken since PostgreSQL 12 and was probably never really used. PostgreSQL 12 added an analogous HEAPAMSLOTDEBUGALL, which still works right now, but it's also not very useful, so remove that as well. Discussion: https://www.postgresql.org/message-id/flat/645c0646-4218-d4c3-409a-a7003a0c108d%402ndquadrant.com	2020-04-22 08:35:33 +02:00
Michael Paquier	cd12323440	Fix single-record reads to use restore_command if available in pg_rewind readOneRecord() is used now when looking for a checkpoint record to check if the target server is an ancestor of the source across multiple timelines, and using a restore_command if available improves the stability of the operation. This part was missed in `a7e8ece`. Reported-by: Kyotaro Horiguchi Discussion: https://postgr.es/m/20200421.150830.1410714948345179794.horikyota.ntt@gmail.com	2020-04-22 08:08:28 +09:00
Alvaro Herrera	c33869cc3b	psql \d: Display table where trigger is defined, if inherited It's important to know that a trigger is cloned from a parent table, because of the behavior that the trigger is dropped on detach. Make psql's \d display it. We'd like to backpatch, but lack of the pg_trigger.tgparentid column makes it more difficult. Punt for now. If somebody wants to volunteer an implementation that reads pg_depend on older versions, that can probably be backpatched. Authors: Justin Pryzby, Amit Langote, Álvaro Herrera Discussion: https://postgr.es/m/20200419002206.GM26953@telsasoft.com	2020-04-21 18:37:26 -04:00
Michael Paquier	27dbe1a184	Fix memory leak in libpq when using sslmode=verify-full Checking if Subject Alternative Names (SANs) from a certificate match with the hostname connected to leaked memory after each lookup done. This is broken since `acd08d7` that added support for SANs in SSL certificates, so backpatch down to 9.5. Author: Roman Peshkurov Reviewed-by: Hamid Akhtar, Michael Paquier, David Steele Discussion: https://postgr.es/m/CALLDf-pZ-E3mjxd5=bnHsDu9zHEOnpgPgdnO84E2RuwMCjjyPw@mail.gmail.com Backpatch-through: 9.5	2020-04-22 07:27:03 +09:00
Tom Lane	d12bdba77b	Fix possible crash during FATAL exit from reindexing. index.c supposed that it could just use a PG_TRY block to clean up the state associated with an active REINDEX operation. However, that code doesn't run if we do a FATAL exit --- for example, due to a SIGTERM shutdown signal --- while the REINDEX is happening. And that state does get consulted during catalog accesses, which makes it problematic if we do any catalog accesses during shutdown --- for example, to clean up any temp tables created in the session. If this combination of circumstances occurred, we could find ourselves trying to access already-freed memory. In debug builds that'd fairly reliably cause an assertion failure. In production we might often get away with it, but with some bad luck it could cause a core dump. Another possible bad outcome is an erroneous conclusion that an index-to-be-accessed is being reindexed; but it looks like that would be unlikely to have any consequences worse than failing to drop temp tables right away. (They'd still get dropped by the next session that uses that temp schema.) To fix, get rid of the use of PG_TRY here, and instead hook into the transaction abort mechanisms to clean up reindex state. Per bug #16378 from Alexander Lakhin. This has been wrong for a very long time, so back-patch to all supported branches. Discussion: https://postgr.es/m/16378-7a70ca41b3ec2009@postgresql.org	2020-04-21 15:58:42 -04:00
Tom Lane	5836d32655	Fix minor violations of FunctionCallInvoke usage protocol. Working on commit `1c455078b` led me to check through FunctionCallInvoke call sites to see if every one was being honest about (a) making sure that fcinfo.isnull is initially false, and (b) checking its state after the call. Sure enough, I found some violations. The main one is that finalize_partialaggregate re-used serialfn_fcinfo without resetting isnull, even though it clearly intends to cater for serialfns that return NULL. There would only be an issue with a non-strict serialfn, since it's unlikely that a serialfn would return NULL for non-null input. We have no non-strict serialfns in core, and there may be none in the wild either, which would account for the lack of complaints. Still, it's clearly wrong, so back-patch that fix to 9.6 where finalize_partialaggregate was introduced. Also, arrayfuncs.c and rowtypes.c contained various callers that were not bothering to check for result nulls. While what's being called is a comparison or hash function that probably shouldn't return null, that's a lousy excuse for not having any check at all. There are existing places that just Assert(!fcinfo->isnull) in comparable situations, so I added that to the places that were calling btree comparison or hash support functions. In the places calling boolean-returning equality functions, it's quite cheap to have them treat isnull as FALSE, so make those places do that. Also remove some "locfcinfo->isnull = false" assignments that are unnecessary given the assumption that no previous call returned null. These changes seem like mostly neatnik-ism or debugging support, so I didn't back-patch.	2020-04-21 14:23:53 -04:00
Alvaro Herrera	afccd76f1c	Fix detaching partitions with cloned row triggers. When a partition is detached, any triggers that had been cloned from its parent were not properly disentangled from its parent triggers. This resulted in triggers that could not be dropped because they depended on the trigger in the no-longer-parent table: ALTER TABLE t DETACH PARTITION t1; DROP TRIGGER trig ON t1; ERROR: cannot drop trigger trig on table t1 because trigger trig on table t requires it HINT: You can drop trigger trig on table t instead. Moreover the table can no longer be re-attached to its parent, because the trigger name is already taken: ALTER TABLE t ATTACH PARTITION t1 FOR VALUES FROM (1) TO (2); ERROR: trigger "trig" for relation "t1" already exists The former is a bug introduced in commit `86f575948c`. (The latter is not necessarily a bug, but it makes the bug more uncomfortable.) To avoid the complexity that would be needed to tell whether the trigger has a local definition that has to be merged with the one coming from the parent table, establish the behavior that the trigger is removed when the table is detached. Backpatch to pg11. Author: Justin Pryzby <pryzby@telsasoft.com> Reviewed-by: Amit Langote <amitlangote09@gmail.com> Reviewed-by: Álvaro Herrera <alvherre@alvh.no-ip.org> Discussion: https://www.postgresql.org/message-id/flat/20200408152412.GZ2228@telsasoft.com	2020-04-21 13:57:00 -04:00
Peter Geoghegan	1542e16f2c	Consider outliers in split interval calculation. Commit `0d861bbb`, which introduced deduplication to nbtree, added some logic to take large posting list tuples into account when choosing a split point. We subtract firstright posting list overhead from the projected new high key size when calculating leftfree/rightfree values for an affected candidate split point. Posting list tuples aren't special to nbtsplitloc.c, but taking them into account like this makes a huge difference in practice. Posting list tuples are frequently tuple size outliers. However, commit `0d861bbb` missed a closely related issue: split interval itself is calculated based on the assumption that tuples on the page being split are roughly equisized. That assumption was acceptable back when commit `fab25024` taught the logic for choosing a split point about suffix truncation, but it's pretty questionable now that very large tuple sizes are common. This oversight led to unbalanced page splits in low cardinality multi-column indexes when deduplication was used: page splits that don't give sufficient weight to how unbalanced the split is when the interval happens to include some large posting list tuples (and when most other tuples on the page are not so large). Nail this down by calculating an initial split interval in a way that's attuned to the actual cost that we want to keep under control (not a fuzzy proxy for the cost): apply a leftfree + rightfree evenness test to each candidate split point that actually gets included in the split interval (for the default strategy). This replaces logic that used a percentage of all legal split points for the page as the basis of the initial split interval. Discussion: https://postgr.es/m/CAH2-WznJt5aT2uUB2Bs+JBLdwe0XTX67+xeLFcaNvCKxO=QBVQ@mail.gmail.com	2020-04-21 09:59:24 -07:00
Tom Lane	1c455078b0	Allow matchingsel() to be used with operators that might return NULL. Although selfuncs.c will never call a target operator with null inputs, some functions might return null anyway. The existing coding will fail if that happens (since FunctionCall2Coll will punt), which seems undesirable given that matchingsel() has such a broad range of potential applicability --- in fact, we already have a problem because we apply it to jsonb_path_exists_opr, which can return null. Hence, rejigger the underlying functions mcv_selectivity and histogram_selectivity to cope, treating a null result as false. While we are at it, we can move the InitFunctionCallInfoData overhead out of the inner loops, which isn't a huge number of cycles but might save something considering we are likely calling functions as cheap as int4eq(). Plus, the number of loop cycles to be expected is much more than it was when this code was written, since typical settings of default_statistics_target are higher. In view of that consideration, let's apply the same change to var_eq_const, eqjoinsel_inner, and eqjoinsel_semi. We do not expect equality functions to ever return null for non-null inputs (and certainly that code has been that way a long time without complaints), but the cycle savings seem attractive, especially in the eqjoinsel loops where there's potentially an O(N^2) savings. Similar code exists in ineq_histogram_selectivity and get_variable_range, but I forebore from changing those for now. The performance argument for changing ineq_histogram_selectivity is really weak anyway, since that will only iterate log2(N) times. Nikita Glukhov and Tom Lane Discussion: https://postgr.es/m/9d3b0959-95d6-c37e-2c0b-287bcfe5c705@postgrespro.ru	2020-04-21 12:56:55 -04:00
Tom Lane	9d25e1aa31	Clean up cpluspluscheck violation. "operator" is a reserved word in C++, so per project conventions, don't use it as an identifier in header files. My oversight in commit `a80818605`.	2020-04-21 11:21:15 -04:00
Tom Lane	2117c3cb3d	Fix duplicate typedef from commit `0d8c9c121`. Older gcc versions don't like duplicate typedefs, so get rid of that in favor of doing it like we do it elsewhere, ie just use a "struct" declaration when trying to avoid importing a whole header file. Also, there seems no reason to include stringinfo.h here at all, so get rid of that addition too. Discussion: https://postgr.es/m/27239.1587415696@sss.pgh.pa.us	2020-04-21 11:13:05 -04:00
Robert Haas	079ac29d4d	Move the server's backup manifest code to a separate file. basebackup.c is already a pretty big and complicated file, so it makes more sense to keep the backup manifest support routines in a separate file, for clarity and ease of maintenance. Discussion: http://postgr.es/m/CA+TgmoavRak5OdP76P8eJExDYhPEKWjMb0sxW7dF01dWFgE=uA@mail.gmail.com	2020-04-20 14:38:15 -04:00
Alvaro Herrera	1e324cb0e7	Add tab-completion for ALTER INDEX .. [NO] DEPENDS ON ... as added in the prior commit. (We'd like to have tab-completion for the other object types too, but they don't have sub-command completion yet.) Author: Ibrar Ahmed <ibrar.ahmad@gmail.com> Reviewed-by: Álvaro Herrera <alvherre@alvh.no-ip.org> Discussion: https://postgr.es/m/CALtqXTcogrFEVP9uou5vFtnGsn+vHZUu9+9a0inarfYVOHScYQ@mail.gmail.com	2020-04-20 13:42:41 -04:00
Alvaro Herrera	5fc703946b	Add ALTER .. NO DEPENDS ON. Commit `f2fcad27d5` (9.6 era) added the ability to mark objects as dependent on an extension, but forgot to add a way for such dependencies to be removed. This commit fixes that oversight. Strictly speaking this should be backpatched to 9.6, but due to lack of demand we're not doing so at this time. Discussion: https://postgr.es/m/20200217225333.GA30974@alvherre.pgsql Reviewed-by: ahsan hadi <ahsan.hadi@gmail.com> Reviewed-by: Ibrar Ahmed <ibrar.ahmad@gmail.com> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>	2020-04-20 13:42:12 -04:00
Magnus Hagander	7e4e574744	Allow pg_read_all_stats to access all stats views again. The views pg_stat_progress_* had not gotten the memo that pg_read_all_stats is supposed to be able to read all statistics. Also make a pass over all text-returning pg_stat_xyz functions that could return "insufficient privilege" and make sure they also respect pg_read_all_stats. Reported-by: Andrey M. Borodin Reviewed-by: Andrey M. Borodin, Kyotaro Horiguchi Discussion: https://postgr.es/m/13145F2F-8458-4977-9D2D-7B2E862E5722@yandex-team.ru	2020-04-20 12:53:40 +02:00
Jeff Davis	0cacb2b79d	Fix missing pfree() in logtape.c, missed by `24d85952`.	2020-04-19 10:33:06 -07:00
Peter Eisentraut	73afabcdc2	Fix update-unicode target The normalization-check target needs to be run last, after moving the newly generated files into place. Also, we need an additional dependency so that unicode_norm.o is rebuilt first. Otherwise, norm_test will still test the old files but against the new expected results, which will probably fail.	2020-04-19 14:59:29 +02:00
Tom Lane	f332241a60	Fix race conditions in synchronous standby management. We have repeatedly seen the buildfarm reach the Assert(false) in SyncRepGetSyncStandbysPriority. This apparently is due to failing to consider the possibility that the sync_standby_priority values in shared memory might be inconsistent; but they will be whenever only some of the walsenders have updated their values after a change in the synchronous_standby_names setting. That function is vastly too complex for what it does, anyway, so rewriting it seems better than trying to apply a band-aid fix. Furthermore, the API of SyncRepGetSyncStandbys is broken by design: it returns a list of WalSnd array indexes, but there is nothing guaranteeing that the contents of the WalSnd array remain stable. Thus, if some walsender exits and then a new walsender process takes over that WalSnd array slot, a caller might make use of WAL position data that it should not, potentially leading to incorrect decisions about whether to release transactions that are waiting for synchronous commit. To fix, replace SyncRepGetSyncStandbys with a new function SyncRepGetCandidateStandbys that copies all the required data from shared memory while holding the relevant mutexes. If the associated walsender process then exits, this data is still safe to make release decisions with, since we know that that much WAL was sent to a valid standby server. This incidentally means that we no longer need to treat sync_standby_priority as protected by the SyncRepLock rather than the per-walsender mutex. SyncRepGetSyncStandbys is no longer used by the core code, so remove it entirely in HEAD. However, it seems possible that external code is relying on that function, so do not remove it from the back branches. Instead, just remove the known-incorrect Assert. When the bug occurs, the function will return a too-short list, which callers should treat as meaning there are not enough sync standbys, which seems like a reasonably safe fallback until the inconsistent state is resolved. Moreover it's bug-compatible with what has been happening in non-assert builds. We cannot do anything about the walsender-replacement race condition without an API/ABI break. The bogus assertion exists back to 9.6, but 9.6 is sufficiently different from the later branches that the patch doesn't apply at all. I chose to just remove the bogus assertion in 9.6, feeling that the probability of a bad outcome from the walsender-replacement race condition is too low to justify rewriting the whole patch for 9.6. Discussion: https://postgr.es/m/21519.1585272409@sss.pgh.pa.us	2020-04-18 14:02:44 -04:00
David Rowley	3cb02e307e	Fix possible crash with GENERATED ALWAYS columns In some corner cases, this could also lead to corrupted values being included in the tuple. Users who are concerned that they are affected by this should first upgrade and then perform a base backup of their database and restore onto an off-line server. They should then query each table with generated columns to ensure there are no rows where the generated expression does not match a newly calculated version of the GENERATED ALWAYS expression. If no crashes occur and no rows are returned then you're not affected. Fixes bug #16369. Reported-by: Cameron Ezell Discussion: https://postgr.es/m/16369-5845a6f1bef59884@postgresql.org Backpatch-through: 12 (where GENERATED ALWAYS columns were added.)	2020-04-18 14:10:37 +12:00
Andrew Dunstan	6741cfa596	Revert "Only provide new libpq sslpasskey hook for openssl-enabled builds" This reverts commit `9e24109f1a`. This caused build errors when building without openssl, and it's simplest just to revert it.	2020-04-17 16:53:01 -04:00
Andrew Dunstan	f342d7ad03	Only provide openssl_tls_init_hook if building with openssl This should have been protected by #ifdef USE_OPENSSL in commit `896fcdb230`. Per the real complaint this time from Daniel Gustafsson.	2020-04-17 15:57:19 -04:00
Andrew Dunstan	a9659fb654	Use a slightly more liberal regex to detect Visual Studio version Apparently in some language versions of Visual Studio nmake outputs some material after the version number and before the end of the line. This has been seen in Chinese versions. Therefore, we no longer demand that the version string comes at the end of a line. Per complaint from Cuiping Lin. Backpatch to all live branches.	2020-04-17 14:44:33 -04:00
Andrew Dunstan	9e24109f1a	Only provide new libpq sslpasskey hook for openssl-enabled builds In commit `4dc6355210` I neglected to put #ifdef USE_OPENSSL around the declarations of the new items. This is remedied here. Per complaint from Daniel Gustafsson.	2020-04-17 14:11:18 -04:00
Tom Lane	3125a5baec	Fix possible future cache reference leak in ALTER EXTENSION ADD/DROP. recordExtObjInitPriv and removeExtObjInitPriv were sloppy about calling ReleaseSysCache. The cases cannot occur given current usage in ALTER EXTENSION ADD/DROP, since we wouldn't get here for these relkinds; but it seems wise to clean up better. In passing, extend test logic in test_pg_dump to exercise the dropped-column code paths here. Since the case is unreachable at present, there seems no great need to back-patch; hence fix HEAD only. Kyotaro Horiguchi, with test case and comment adjustments by me Discussion: https://postgr.es/m/20200417.151831.1153577605111650154.horikyota.ntt@gmail.com	2020-04-17 13:41:59 -04:00
Michael Paquier	198efe774b	Fix minor memory leak in pg_basebackup and pg_receivewal The result of the query used to retrieve the WAL segment size from the backend was not getting freed in two code paths. Both pg_basebackup and pg_receivewal exit immediately if a failure happened on this query, so this was not an actual problem, but it could be an issue if this code gets used for other tools in different ways, be they future tools in this code tree or external, existing, ones. Oversight in commit `fc49e24`, so backpatch down to 11. Author: Jie Zhang Discussion: https://postgr.es/m/970ad9508461469b9450b64027842331@G08CNEXMBPEKD06.g08.fujitsu.local Backpatch-through: 11	2020-04-17 10:45:08 +09:00
David Rowley	5b736e9cf9	Remove unneeded constraint dependency tracking It was previously thought that remove_useless_groupby_columns() needed to keep track of which constraints the generated plan depended upon; however, this is unnecessary. The confusion likely arose regarding this because of check_functional_grouping(), which does need to track the dependency to ensure VIEWs with columns which are functionally dependent on the GROUP BY remain so. For remove_useless_groupby_columns(), cached plans will just become invalidated when the primary key's underlying index is removed through the normal relcache invalidation code. Here we just remove the unneeded code which records the dependency and updates the comments. The previous comments claimed that we could not use UNIQUE constraints for the same optimization due to lack of a pg_constraint record for NOT NULL constraints (which are required because NULLs can be duplicated in a unique index). Since we don't actually need a pg_constraint record to handle the invalidation, it looks like we could add code to do this in the future. But not today. We're not really fixing any bug in the code here, this fix is just to set the record straight on UNIQUE constraints. This code was added back in 9.6, but due to lack of any bug, we'll not be backpatching this. Reviewed-by: Tom Lane Discussion: https://postgr.es/m/CAApHDvrdYa=VhOoMe4ZZjZ-G4ALnD-xuAeUNCRTL+PYMVN8OnQ@mail.gmail.com	2020-04-17 10:29:49 +12:00
Amit Kapila	24d2d38b1e	Fix the usage of parallel and full options of vacuum command. Earlier we were inconsistent in allowing the usage of parallel and full options. Change it such that we disallow them only when they are combined in a way that we don't support. In passing, improve the comments in some of the existing tests of parallel vacuum. Reported-by: Tushar Ahuja Author: Justin Pryzby, Amit Kapila Reviewed-by: Sawada Masahiko, Michael Paquier, Mahendra Singh Thalor and Amit Kapila Discussion: https://postgr.es/m/58c8d171-e665-6fa3-a9d3-d9423b694dae%40enterprisedb.com	2020-04-16 10:55:02 +05:30
Michael Paquier	542d7817f7	Silently disable generation of manifests with servers <= 12 in pg_basebackup Since `0d8c9c1`, pg_basebackup would generate an error if connected to a backend version older than 12 where backup manifests are not supported. Avoiding this error is possible by using the --no-manifest option. This error handling could be confusing for some users, where patching a backup script that interacts with multiple backend versions would cause the addition of --no-manifest to potentially not generate a backup manifest even for Postgres 13 and newer versions. As we want to encourage the use of backup manifests as much as possible, this commit silently disables manifests where not supported, instead of generating an error. While at it, rework the code a bit to make it more consistent with the surroundings when generating the BASE_BACKUP command. Per discussion with Andres Freund, Stephen Frost, Robert Haas, Álvaro Herrera, Kyotaro Horiguchi, Tom Lane, David Steele, and me. Author: Michael Paquier Discussion: https://postgr.es/m/20200410080910.GZ1606@paquier.xyz	2020-04-16 13:57:07 +09:00
Peter Geoghegan	f0ca378d4c	Slightly simplify nbtree split point choice loop. Spotted during post-commit review of the nbtree deduplication commit (commit `0d861bbb`).	2020-04-15 15:47:26 -07:00
Michael Paquier	8f4ee44bcd	Fix minor memory leak in pg_dump A query used to read default ACL information from the catalogs did not free a set of PQExpBuffer. Oversight in commit `e2090d9`, so backpatch down to 9.6. Author: Jie Zhang Reviewed-by: Sawada Masahiko Discussion: https://postgr.es/m/05bcbc5857f948efa0b451b85a48ae10@G08CNEXMBPEKD06.g08.fujitsu.local Backpatch-through: 9.6	2020-04-15 15:56:01 +09:00
Fujii Masao	a2ac73e7be	Code review for backup manifest. This commit prevents pg_basebackup from receiving backup_manifest file when --no-manifest is specified. Previously, when pg_basebackup was writing a tarfile to stdout, it tried to receive backup_manifest file even when --no-manifest was specified, and reported an error. Also remove unused -m option from pg_basebackup. Also fix typo in BASE_BACKUP command documentation. Author: Fujii Masao Reviewed-by: Michael Paquier, Robert Haas Discussion: https://postgr.es/m/01e3ed3a-8729-5aaa-ca84-e60e3ca59db8@oss.nttdata.com	2020-04-15 11:15:12 +09:00
Peter Geoghegan	4a05a64095	Remove obsolete "hole in center of page" comment. A comment from the Berkeley days incorrectly claimed that the page management code cares about the contents of the hole in the center of the page (at least in the case of the left half of an nbtree page split). Commit `8fa30f906b` added an addendum that stated that the original comment was "probably obsolete". It's definitely obsolete, though, so remove the original comment plus the addendum.	2020-04-14 14:38:28 -07:00
Tom Lane	2d59643dbc	Account for collation when coercing the output of a SQL function. Commit `913bbd88d` overlooked that the result of coerce_to_target_type might need collation fixups. Per report from Andreas Joseph Krogh. Discussion: https://postgr.es/m/VisenaEmail.72.37d08ec2b8cb8fb5.17179940cd3@tc7-visena	2020-04-14 17:30:36 -04:00
Andrew Dunstan	0516f94d18	Stop requiring an explicit return from perl subroutines The consensus of the project appears to be that this provides little benefit and is simply an annoyance. Discussion: https://postgr.es/m/27481.1586618092@sss.pgh.pa.us	2020-04-14 16:55:34 -04:00
Andrew Dunstan	e60c6f6ea1	Set Perl search path more idiomatically Back in commits `1df92eeafe`, `f884a96819`, and `592123efbb` I used some hackish code to set the script search path, unaware despite decades of perl that there was a completely standard way to do this. This patch changes those cases to use the standard perl FindBin package.	2020-04-14 16:47:07 -04:00
Peter Geoghegan	80634e3b18	Rearrange _bt_insertonpg() "update metapage" code. Nest the "update metapage as part of insert into root-like page" branch inside the broader "insert into internal page" branch. This improves readability.	2020-04-14 09:33:18 -07:00
Michael Paquier	8128b0c152	Fix collection of typos and grammar mistakes in the tree, volume 2 This fixes some comments and documentation new as of Postgres 13, and is a follow-up of the work done in `dd0f37e`. Author: Justin Pryzby Discussion: https://postgr.es/m/20200408165653.GF2228@telsasoft.com	2020-04-14 14:45:43 +09:00
Peter Geoghegan	f762b2feba	Add defensive "split_only_page" nbtree assertion. Clearly it's not okay for nbtree to split a page that is the only page on its level, and then find that it has to split the parent one level up in turn. There is simply no code to handle the split_only_page case in the _bt_insertonpg() "newitem won't fit" branch (only the "newitem fits" branch handles split_only_page). Add a defensive assertion that will fail if a split_only_page call to _bt_insertonpg() somehow ends up splitting the target/parent page. I (pgeoghegan) believe that we don't need split_only_page handling for the "newitem won't fit" branch because anybody calling _bt_insertonpg() like this would have to hold a lock on the same one and only child page.	2020-04-13 21:11:03 -07:00
Amit Kapila	a6fea120a7	Comments and doc fixes for commit `40d964ec99`. Reported-by: Justin Pryzby Author: Justin Pryzby, with few changes by me Reviewed-by: Amit Kapila and Sawada Masahiko Discussion: https://postgr.es/m/20200322021801.GB2563@telsasoft.com	2020-04-14 08:10:27 +05:30
Peter Geoghegan	826ee1a019	Make _bt_insertonpg() more like _bt_split(). It seems like a good idea for nbtree's retail insert code to be absolutely consistent with nbtree's page split code for anything that naturally requires equivalent handling. Anything that concerns inserting newitem (which is handled as part of the page split atomic action when a page split is required) should work in exactly the same way. With that in mind, make _bt_insertonpg() handle 'cbuf' in a way that matches _bt_split().	2020-04-13 19:26:41 -07:00
Noah Misch	d60cfb6bf2	Add a wait_for_catchup() before immediate stop of a test master. Per buildfarm member hoverfly, a slow walsender could make the test fail. Back-patch to v10, where the test was introduced. Discussion: https://postgr.es/m/20200414013849.GA886648@rfd.leadboat.com	2020-04-13 18:47:28 -07:00
Peter Geoghegan	bc3087b626	Harmonize nbtree page split point code. An nbtree split point can be thought of as a point between two adjoining tuples from an imaginary version of the page being split that includes the incoming/new item (in addition to the items that really are on the page). These adjoining tuples are called the lastleft and firstright tuples. The variables that represent split points contained a field called firstright, which is an offset number of the first data item from the original page that goes on the new right page. The corresponding tuple from origpage was usually the same thing as the actual firstright tuple, but not always: the firstright tuple is sometimes the new/incoming item instead. This situation seems unnecessarily confusing. Make things clearer by renaming the origpage offset returned by _bt_findsplitloc() to "firstrightoff". We now have a firstright tuple and a firstrightoff offset number which are comparable to the newitem/lastleft tuples and the newitemoff/lastleftoff offset numbers respectively. Also make sure that we are consistent about how we describe nbtree page split point state. Push the responsibility for dealing with pg_upgrade'd !heapkeyspace indexes down to lower level code, relieving _bt_split() from dealing with it directly. This means that we always have a palloc'd left page high key on the leaf level, no matter what. This enables simplifying some of the code (and code comments) within _bt_split(). Finally, restructure the page split code to make it clearer why suffix truncation (which only takes place during leaf page splits) is completely different to the first data item truncation that takes place during internal page splits. Tuples are marked as having fewer attributes stored in both cases, and the firstright tuple is truncated in both cases, so it's easy to imagine somebody missing the distinction.	2020-04-13 16:39:55 -07:00
Andrew Dunstan	8f00d84afc	Use perl's $/ more idiomatically This replaces a few occurrences of ugly code with a more clean and idiomatic usage. The problem was highlighted by perlcritic, but we're not enforcing the policy that led to the discovery. Discussion: https://postgr.es/m/20200412074245.GB623763@rfd.leadboat.com	2020-04-13 12:06:11 -04:00
Andrew Dunstan	7be5d8df1f	Use perl warnings pragma consistently We've had a mixture of the warnings pragma, the -w switch on the shebang line, and no warnings at all. This patch removes the -w switch and adds the warnings pragma to all perl sources missing it. It raises the severity of the TestingAndDebugging::RequireUseWarnings perlcritic policy to level 5, so that we catch any future violations. Discussion: https://postgr.es/m/20200412074245.GB623763@rfd.leadboat.com	2020-04-13 11:55:45 -04:00
Andrew Dunstan	8930e43ecd	Print policy name in perlcritic messages This makes it easier to do a web search for details of the policy that's been violated, as well as displaying the name that might be needed for a policy override. Various perlcritic settings changes are being discussed, but this one should be uncontroversial.	2020-04-13 11:46:18 -04:00
Amit Kapila	ef08ca113f	Cosmetic fixups for WAL usage work. Reported-by: Justin Pryzby and Euler Taveira Author: Justin Pryzby and Julien Rouhaud Reviewed-by: Amit Kapila Discussion: https://postgr.es/m/CAB-hujrP8ZfUkvL5OYETipQwA=e3n7oqHFU=4ZLxWS_Cza3kQQ@mail.gmail.com	2020-04-13 15:31:16 +05:30
Peter Eisentraut	0c620a5803	Improve error messages after LoadLibrary() Move the file name to a format parameter to ease translatability. Add error code where missing. Make the wording consistent.	2020-04-13 10:24:46 +02:00
Robert Haas	dbc60c5593	Rename pg_validatebackup to pg_verifybackup. Also, use "verify" rather than "validate" to refer to the process being undertaken here. Per discussion, that is a more appropriate term. Discussion: https://www.postgresql.org/message-id/172c9d9b-1d0a-1b94-1456-376b1e017322@2ndquadrant.com Discussion: http://postgr.es/m/CA+TgmobLgMh6p8FmLbj_rv9Uhd7tPrLnAyLgGd2SoSj=qD-bVg@mail.gmail.com	2020-04-12 11:26:05 -04:00
Tom Lane	35cb574aa8	Suppress -Wimplicit-fallthrough warning in new LIMIT WITH TIES code. The placement of the fall-through comment in this code appears not to work to suppress the warning in recent gcc. Move it to the bottom of the case group, and add an assertion that we didn't get there through some other code path. Also improve wording of nearby comments. Julien Rouhaud, comment hacking by me Discussion: https://postgr.es/m/CAOBaU_aLdPGU5wCpaowNLF-Q8328iR7mj1yJAhMOVsdLwY+sdg@mail.gmail.com	2020-04-11 15:02:44 -04:00
Noah Misch	328c70997b	Optimize RelationFindReplTupleSeq() for CLOBBER_CACHE_ALWAYS. Specifically, remember lookup_type_cache() results instead of retrieving them once per comparison. Under CLOBBER_CACHE_ALWAYS, this reduced src/test/subscription/t/001_rep_changes.pl elapsed time by an order of magnitude, which reduced check-world elapsed time by 9%. Discussion: https://postgr.es/m/20200406085420.GC162712@rfd.leadboat.com	2020-04-11 10:30:12 -07:00
Noah Misch	4216858122	When WalSndCaughtUp, sleep only in WalSndWaitForWal(). Before sleeping, WalSndWaitForWal() sends a keepalive if MyWalSnd->write < sentPtr. That is important in logical replication. When the latest physical LSN yields no logical replication messages (a common case), that keepalive elicits a reply, and processing the reply updates pg_stat_replication.replay_lsn. WalSndLoop() lacks that; when WalSndLoop() slept, replay_lsn advancement could stall until wal_receiver_status_interval elapsed. This sometimes stalled src/test/subscription/t/001_rep_changes.pl for up to 10s. Discussion: https://postgr.es/m/20200406063649.GA3738151@rfd.leadboat.com	2020-04-11 10:30:00 -07:00
Tom Lane	969f9d0b4b	Make EXPLAIN report maximum hashtable usage across multiple rescans. Before discarding the old hash table in ExecReScanHashJoin, capture its statistics, ensuring that we report the maximum hashtable size across repeated rescans of the hash input relation. We can repurpose the existing code for reporting hashtable size in parallel workers to help with this, making the patch pretty small. This also ensures that if rescans happen within parallel workers, we get the correct maximums across all instances. Konstantin Knizhnik and Tom Lane, per diagnosis by Thomas Munro of a trouble report from Alvaro Herrera. Discussion: https://postgr.es/m/20200323165059.GA24950@alvherre.pgsql	2020-04-11 12:39:19 -04:00
Tom Lane	5c27bce7f3	Clear dangling pointer to avoid bogus EXPLAIN printout in a corner case. ExecReScanHashJoin will destroy the join's hash table if it expects that the inner relation will produce different rows on rescan. Up to now it's not bothered to clear the additional pointer to that hash table that exists in the child HashState node. However, it's possible for the query to terminate without building a fresh hash table (this happens if the outer relation is found to be empty during the final rescan). So we can end with a dangling pointer to a deleted hash table. That was harmless originally, but since 9.0 EXPLAIN ANALYZE has used that pointer to print hash table statistics. In debug builds this reproducibly results in garbage statistics. In non-debug builds there's frequently no ill effects, but in principle one could get wrong EXPLAIN ANALYZE output, or perhaps even a crash if free() has released the hashtable memory back to the OS. To fix, just make sure we clear the additional pointer when destroying the hash table. In problematic cases, EXPLAIN ANALYZE will then print no hashtable statistics (reverting to its pre-9.0 behavior). This isn't ideal, but since the problem manifests only in unusual corner cases, it's hard to justify taking any risks to do better in the back branches. A follow-on patch will improve matters in HEAD. Konstantin Knizhnik and Tom Lane, per diagnosis by Thomas Munro of a trouble report from Alvaro Herrera. Discussion: https://postgr.es/m/20200323165059.GA24950@alvherre.pgsql	2020-04-11 12:29:06 -04:00
Peter Eisentraut	12fb189bfe	Fix RELCACHE_FORCE_RELEASE issue Introduced by `83fd4532a7`. To fix, the tuple descriptors need to be copied into the current memory context. Discussion: https://www.postgresql.org/message-id/04d78603-edae-9243-9dde-fe3037176a7d@2ndquadrant.com	2020-04-11 15:07:25 +02:00
Peter Eisentraut	5a1d0c9925	Fix relcache reference leak Introduced by `83fd4532a7`	2020-04-11 09:44:14 +02:00
Tom Lane	401418ca6a	Suppress unused-variable warning. Ashutosh Bapat Discussion: https://postgr.es/m/CAG-ACPWPB8Lc_aFj25eiPFqi31YB5vmaZnb39mbHSf5Yej=miA@mail.gmail.com	2020-04-10 12:00:28 -04:00
Michael Paquier	dd0f37ecce	Fix collection of typos and grammar mistakes in the tree This fixes some comments and documentation new as of Postgres 13. Author: Justin Pryzby Discussion: https://postgr.es/m/20200408165653.GF2228@telsasoft.com	2020-04-10 11:18:39 +09:00
Tom Lane	e083fa34ce	Further stabilize results of 019_replslot_limit.pl. Depending on specific values of restart_lsn or pg_current_wal_lsn() is obviously unsafe. The previous coding tried to dodge this issue by rounding off, but that's not good enough, as shown by multiple buildfarm members. Nuke all the uses of these values except for null-ness checks, pending some credible argument why we should think something else could be usefully stable. Kyotaro Horiguchi, further modified by me Discussion: https://postgr.es/m/E1jM1Sa-0003mS-99@gemulon.postgresql.org	2020-04-09 17:28:58 -04:00
Tom Lane	2e0e409e3c	Further cleanup of ts_headline code. Suppress a probably-meaningless uninitialized-variable warning (induced by my previous patch, I'm sorry to say). Improve mark_hl_fragments()'s test for overlapping cover strings: it failed to consider the possibility that the current string is strictly within another one. That's unlikely given the preceding splitting into MaxWords fragments, but I don't think it's impossible. Discussion: https://postgr.es/m/16345-2e0cf5cddbdcd3b4@postgresql.org	2020-04-09 15:38:43 -04:00
Tom Lane	c9b0c678d3	Fix default text search parser's ts_headline code for phrase queries. This code could produce very poor results when asked to highlight a string based on a query using phrase-match operators. The root cause is that hlCover(), which is supposed to find a minimal substring that matches the query, was written assuming that word position is not significant. I'm only 95% convinced that its algorithm was correct even for plain AND/OR queries; but it definitely fails completely for phrase matches, causing it to possibly not identify a cover string at all. Hence, rewrite hlCover() with a less-tense algorithm that just tries all the possible substrings, earlier and shorter ones first. (This is not as bad as it sounds performance-wise, because all of the string matching has been done already: the repeated tsquery match checks boil down to pointer comparisons.) Unfortunately, since that approach produces more candidate cover strings than before, it also exposes that there were bugs in the heuristics in mark_hl_words() for selecting a best cover string. Fixes there include: * Do not apply the ShortWord filter to words that appear in the query. * Remove a misguided optimization for quickly rejecting a cover. * Fix order-of-operation bug that could cause computation of a wrong figure of merit (poslen) when shortening a cover. * Change the preference rule so that candidate headlines that do not include their whole cover string (after MaxWords trimming) are lowest priority, since they may not actually satisfy the user's query. This results in some changes in existing regression test cases, but they all seem reasonable. Note in particular that the tests involving strings like "1 2 3" were previously being affected by the ShortWord filter, masking the normal matching behavior. Per bug #16345 from Augustinas Jokubauskas; the new test cases are based on that example. Back-patch to 9.6 where phrase search was added to tsquery. Discussion: https://postgr.es/m/16345-2e0cf5cddbdcd3b4@postgresql.org	2020-04-09 13:19:23 -04:00
Tom Lane	b10f8bb9fd	Cosmetic improvements for default text search parser's ts_headline code. This code was woefully unreadable and under-commented. Try to improve matters by adding comments, using some macros to make complicated if-tests more readable, using boolean type where appropriate, etc. There are a couple of tiny coding improvements too, but this commit includes (I hope) no behavioral change. Nonetheless, back-patch as far as 9.6, because a followup bug-fixing commit depends on this. Discussion: https://postgr.es/m/16345-2e0cf5cddbdcd3b4@postgresql.org	2020-04-09 12:36:59 -04:00
Peter Eisentraut	e92e4a2b68	Fix CREATE TABLE LIKE INCLUDING GENERATED column order issue CREATE TABLE LIKE INCLUDING GENERATED would fail if a generated column referred to a column with a higher attribute number. This is because the column mapping mechanism created the mapping incrementally as columns are added. This was sufficient for previous uses of that mechanism (omitting dropped columns), and it also happened to work if generated columns only referred to columns with lower attribute numbers, but here it failed. This fix is to build the attribute mapping in a separate loop before processing the columns in detail. Bug: #16342 Reported-by: Ethan Waldo <ewaldo@healthetechs.com> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>	2020-04-09 16:36:45 +02:00
Fujii Masao	1ec50a81ec	Exclude backup_manifest file that existed in database, from BASE_BACKUP. If there is already a backup_manifest file in the database cluster, it belongs to the past backup that was used to start this server. It is not correct for the backup being taken now. So this commit changes pg_basebackup so that it always skips such backup_manifest file. The backup_manifest file for the current backup will be injected separately if users want it. Author: Fujii Masao Reviewed-by: Robert Haas Discussion: https://postgr.es/m/78f76a3d-1a28-a97d-0394-5c96985dd1c0@oss.nttdata.com	2020-04-09 22:37:11 +09:00
Amit Kapila	5c71362174	Allow parallel create index to accumulate buffer usage stats. Currently, we don't account for buffer usage incurred by parallel workers for parallel create index. This commit allows each worker to record the buffer usage stats and the leader backend to accumulate those stats at the end of the operation. This will allow pg_stat_statements to display correct buffer usage stats for (parallel) create index command. Reported-by: Julien Rouhaud Author: Sawada Masahiko Reviewed-by: Dilip Kumar, Julien Rouhaud and Amit Kapila Backpatch-through: 11, where this was introduced Discussion: https://postgr.es/m/20200328151721.GB12854@nol	2020-04-09 09:49:30 +05:30
Andrew Dunstan	c3e4cbaab9	Msys2 tweaks for pg_validatebackup corruption test 1. Tell Msys2 not to mangle the tablespace map parameter 2. If rmdir doesn't work, fall back to trying unlink on the entry in pg_tblspc. Discussion: https://postgr.es/m/7330a7c7-ce5f-9769-39a1-bdb0b32bb4a6@2ndQuadrant.com	2020-04-08 17:50:55 -04:00
Peter Eisentraut	f45b8e51b6	createuser: Change a fprintf to pg_log_error	2020-04-08 19:26:09 +02:00
Tomas Vondra	cea09246e5	Stabilize incremental_sort tests The test never did ANALYZE on the test table, so the plans depended on various defaults (e.g. number of groups being 200). This worked most of the time, but with CLOBBER_CACHE_ALWAYS the autoanalyze often managed to build accurate stats, changing the plan. Fixed by increasing the size of test tables a bit, making the Sort a bit more expensive than Incremental Sort. The tests were constructed to test transitions in the Incremental Sort algorithm, and this change does not break that. Reviewed-by: James Coleman Discussion: https://postgr.es/m/CAPpHfds1waRZ=NOmueYq0sx1ZSCnt+5QJvizT8ndT2=etZEeAQ@mail.gmail.com	2020-04-08 18:30:11 +02:00
Tom Lane	a9d70c1087	Fix pg_dump/pg_restore to restore event trigger comments later. Repair an oversight in commit 8728b2c70: if we're postponing restore of event triggers to the end, we must also postpone restoring any comments on them, since of course we cannot create the comments first. (This opens yet another opportunity for an event trigger to bollix the restore, but there's no help for that.) Per bug #16346 from Alexander Lakhin. Like the previous commit, back-patch to all supported branches. Hamid Akhtar and Tom Lane Discussion: https://postgr.es/m/16346-6210ad7a0ea81be1@postgresql.org	2020-04-08 11:23:39 -04:00
Thomas Munro	d140f2f3e2	Rationalize GetWalRcv{Write,Flush}RecPtr(). GetWalRcvWriteRecPtr() previously reported the latest flushed location. Adopt the conventional terminology used elsewhere in the tree by renaming it to GetWalRcvFlushRecPtr(), and likewise for some related variables that used the term "received". Add a new definition of GetWalRcvWriteRecPtr(), which returns the latest written value. This will allow later patches to use the value for non-data-integrity purposes, without having to wait for the flush pointer to advance. Reviewed-by: Alvaro Herrera <alvherre@2ndquadrant.com> Reviewed-by: Andres Freund <andres@anarazel.de> Discussion: https://postgr.es/m/CA%2BhUKGJ4VJN8ttxScUFM8dOKX0BrBiboo5uz1cq%3DAovOddfHpA%40mail.gmail.com	2020-04-08 23:45:09 +12:00
Peter Eisentraut	83fd4532a7	Allow publishing partition changes via ancestors To control whether partition changes are replicated using their own identity and schema or an ancestor's, add a new parameter that can be set per publication named 'publish_via_partition_root'. This allows replicating a partitioned table into a different partition structure on the subscriber. Author: Amit Langote <amitlangote09@gmail.com> Reviewed-by: Rafia Sabih <rafia.pghackers@gmail.com> Reviewed-by: Peter Eisentraut <peter.eisentraut@2ndquadrant.com> Reviewed-by: Petr Jelinek <petr@2ndquadrant.com> Discussion: https://www.postgresql.org/message-id/flat/CA+HiwqH=Y85vRK3mOdjEkqFK+E=ST=eQiHdpj43L=_eJMOOznQ@mail.gmail.com	2020-04-08 11:19:23 +02:00
Alexander Korotkov	1aac32df89	Revert `0f5ca02f53` `0f5ca02f53` introduces 3 new keywords. That appears to be too much for a relatively small feature. Given that we are now past feature freeze, it's already too late to discuss new syntax. So, revert. Discussion: https://postgr.es/m/28209.1586294824%40sss.pgh.pa.us	2020-04-08 11:37:27 +03:00
David Rowley	02a2e8b442	Modify additional power 2 calculations to use new helper functions 2nd pass of modifying various places which obtain the next power of 2 of a number and make them use the new functions added in `f0705bb62`. In passing, also modify num_combinations(). This can be implemented using simple bitshifting rather than looping. Reviewed-by: John Naylor Discussion: https://postgr.es/m/20200114173553.GE32763%40fetter.org	2020-04-08 18:29:51 +12:00
Michael Paquier	c0187869a0	Fix crash when using COLLATE in partition bound expressions Attempting to use a COLLATE clause with a type that is not collatable in a partition bound expression could crash the server. This commit fixes the code by adding more checks similar to what is done when computing index or partition attributes by making sure that there is a collation iff the type is collatable. Backpatch down to 12, as `7c079d7` introduced this problem. Reported-by: Alexander Lakhin Author: Dmitry Dolgov Discussion: https://postgr.es/m/16325-809194cf742313ab@postgresql.org Backpatch-through: 12	2020-04-08 15:04:51 +09:00
David Rowley	d025cf88ba	Modify various power 2 calculations to use new helper functions First pass of modifying various places that obtain the next power of 2 of a number and make them use the new functions added in pg_bitutils.h instead. This also removes the _hash_log2() function. There are no longer any callers in core. Other users can swap their _hash_log2(n) call to make use of pg_ceil_log2_32(n). Author: David Fetter, with some minor adjustments by me Reviewed-by: John Naylor, Jesse Zhang Discussion: https://postgr.es/m/20200114173553.GE32763%40fetter.org	2020-04-08 16:55:03 +12:00
Jeff Davis	50a38f6517	Create memory context for HashAgg with a reasonable maxBlockSize. If the memory context's maxBlockSize is too big, a single block allocation can suddenly exceed work_mem. For Hash Aggregation, this can mean spilling to disk too early or reporting a confusing memory usage number for EXPLAIN ANALYZE. Introduce CreateWorkExprContext(), which is like CreateExprContext(), except that it creates the AllocSet with a maxBlockSize that is reasonable in proportion to work_mem. Right now, CreateWorkExprContext() is only used by Hash Aggregation, but it may be generally useful in the future. Discussion: https://postgr.es/m/412a3fbf306f84d8d78c4009e11791867e62b87c.camel@j-davis.com	2020-04-07 21:25:28 -07:00
David Rowley	f0705bb628	Add functions to calculate the next power of 2 There are many areas in the code where we need to determine the next highest power of 2 of a given number. We tend to always do that in an ad-hoc way each time, generally with some tight for loop which performs a bitshift left once per loop and goes until it finds a number above the given number. Here we add two generic functions which make use of the existing pg_leftmost_one_pos* functions which, when available, will allow us to calculate the next power of 2 without any looping. Here we don't add any code which uses these new functions. That will be done in follow-up commits. Author: David Fetter, with some minor adjustments by me Reviewed-by: John Naylor, Jesse Zhang Discussion: https://postgr.es/m/20200114173553.GE32763%40fetter.org	2020-04-08 16:22:52 +12:00
Tom Lane	7a5d74b7dd	Put back mistakenly removed #include. In commit `4dbcb3f84` I removed some code from parse_coerce.c, and also removed some apparently-no-longer-needed #includes. But removing datum.h broke some not-compiled-by-default code. Discussion: https://postgr.es/m/20200407205436.pyjhddw5bn5upvsu@development	2020-04-08 00:10:16 -04:00
Alvaro Herrera	9e9abed746	Remove testing for precise LSN/reserved bytes in new TAP test Trying to ensure that a slot's restart_lsn or amount of reserved bytes exactly match some specific values seems unnecessary, and fragile as shown by failures in multiple buildfarm members. Discussion: https://postgr.es/m/20200407232602.GA21559@alvherre.pgsql	2020-04-07 23:28:27 -04:00
Thomas Munro	3985b600f5	Support PrefetchBuffer() in recovery. Provide PrefetchSharedBuffer(), a variant that takes SMgrRelation, for use in recovery. Rename LocalPrefetchBuffer() to PrefetchLocalBuffer() for consistency. Add a return value to all of these. In recovery, tolerate and report missing files, so we can handle relations unlinked before crash recovery began. Also report cache hits and misses, so that callers can do faster buffer lookups and better I/O accounting. Reviewed-by: Alvaro Herrera <alvherre@2ndquadrant.com> Reviewed-by: Andres Freund <andres@anarazel.de> Discussion: https://postgr.es/m/CA%2BhUKGJ4VJN8ttxScUFM8dOKX0BrBiboo5uz1cq%3DAovOddfHpA%40mail.gmail.com	2020-04-08 14:56:57 +12:00
Tom Lane	981643dcdb	Allow partitionwise join to handle nested FULL JOIN USING cases. This case didn't work because columns merged by FULL JOIN USING are represented in the parse tree by COALESCE expressions, and the logic for recognizing a partitionable join failed to match upper-level join clauses to such expressions. To fix, synthesize suitable COALESCE expressions and add them to the nullable_partexprs lists. This is pretty ugly and brute-force, but it gets the job done. (I have ambitions of rethinking the way outer-join output Vars are represented, so maybe that will provide a cleaner solution someday. For now, do this.) Amit Langote, reviewed by Justin Pryzby, Richard Guo, and myself Discussion: https://postgr.es/m/CA+HiwqG2WVUGmLJqtR0tPFhniO=H=9qQ+Z3L_ZC+Y3-EVQHFGg@mail.gmail.com	2020-04-07 22:12:14 -04:00
Etsuro Fujita	c8434d64ce	Allow partitionwise joins in more cases. Previously, the partitionwise join technique only allowed partitionwise join when input partitioned tables had exactly the same partition bounds. This commit extends the technique to some cases when the tables have different partition bounds, by using an advanced partition-matching algorithm introduced by this commit. For both the input partitioned tables, the algorithm checks whether every partition of one input partitioned table only matches one partition of the other input partitioned table at most, and vice versa. In such a case the join between the tables can be broken down into joins between the matching partitions, so the algorithm produces the pairs of the matching partitions, plus the partition bounds for the join relation, to allow partitionwise join for computing the join. Currently, the algorithm works for list-partitioned and range-partitioned tables, but not hash-partitioned tables. See comments in partition_bounds_merge(). Ashutosh Bapat and Etsuro Fujita, most of regression tests by Rajkumar Raghuwanshi, some of the tests by Mark Dilger and Amul Sul, reviewed by Dmitry Dolgov and Amul Sul, with additional review at various points by Ashutosh Bapat, Mark Dilger, Robert Haas, Antonin Houska, Amit Langote, Justin Pryzby, and Tomas Vondra Discussion: https://postgr.es/m/CAFjFpRdjQvaUEV5DJX3TW6pU5eq54NCkadtxHX2JiJG_GvbrCA@mail.gmail.com	2020-04-08 10:25:00 +09:00
Tom Lane	41a194f491	Fix circle_in to accept "(x,y),r" as it's advertised to do. Our documentation describes four allowed input syntaxes for circles, but the regression tests tried only three ... with predictable consequences. Remarkably, this has been wrong since the circle datatype was added in 1997, but nobody noticed till now. David Zhang, with some help from me Discussion: https://postgr.es/m/332c47fa-d951-7574-b5cc-a8f7f7201202@highgo.ca	2020-04-07 20:50:28 -04:00
Andres Freund	75848bc744	snapshot scalability: Move delayChkpt from PGXACT to PGPROC. The goal of separating hotly accessed per-backend data from PGPROC into PGXACT is to make accesses fast (GetSnapshotData() in particular). But delayChkpt is not actually accessed frequently; only when starting a checkpoint. As it is frequently modified (multiple times in the course of a single transaction), storing it in the same cacheline as hotly accessed data unnecessarily dirties a contended cacheline. Therefore move delayChkpt to PGPROC. This is part of a larger series of patches intending to improve GetSnapshotData() scalability. It is committed and pushed separately, as it is independently beneficial (small but measurable win, limited by the other frequent modifications of PGXACT). Author: Andres Freund Reviewed-By: Robert Haas, Thomas Munro, David Rowley Discussion: https://postgr.es/m/20200301083601.ews6hz5dduc3w2se@alap3.anarazel.de	2020-04-07 17:36:23 -07:00
Tomas Vondra	2b88fdde30	Track SLRU page hits in SimpleLruReadPage_ReadOnly SLRU page hits were tracked only in SimpleLruReadPage, but that's not enough because we may hit the page in SimpleLruReadPage_ReadOnly in which case we don't call SimpleLruReadPage at all. Reported-by: Kuntal Ghosh Discussion: https://postgr.es/m/20200119143707.gyinppnigokesjok@development	2020-04-08 02:15:47 +02:00
Andres Freund	91c40548d5	Fix XLogReader FD leak that makes backends unusable after 2PC usage. Before the fix every 2PC commit/abort leaked a file descriptor. As the files are opened using BasicOpenFile(), that quickly leads to the backend running out of file descriptors. Once enough 2PC abort/commit have caused enough FDs to leak, any IO in the backend will fail with "Too many open files", as BasicOpenFilePerm() will have triggered all open files known to fd.c to be closed. The leak causing the problem at hand is a consequence of `0dc8ead46`, but is only exacerbated by it. Previously most XLogPageReadCB callbacks used static variables to cache one open file, but after the commit the cache is private to each XLogReader instance. There never was infrastructure to close FDs at the time of XLogReaderFree, but the way XLogReader was used limited the leak to one FD. This commit just closes the FD during XLogReaderFree() if the FD is stored in XLogReaderState.seg.ws_segno. This may not be the way to solve this medium/long term, but at least unbreaks 2PC. Discussion: https://postgr.es/m/20200406025651.fpzdb5yyb7qyhqko@alap3.anarazel.de	2020-04-07 17:03:04 -07:00
Alvaro Herrera	7e2ffb3885	Appease perlcritic Food for the gods must always be found somehow, even when the land starves.	2020-04-07 19:09:55 -04:00
Peter Geoghegan	60cbd7751c	Remove nbtree BTreeTupleSetAltHeapTID() function. Since heap TID is supposed to be just another key attribute to the implementation, it doesn't make much sense to have separate BTreeTupleSetNAtts() and BTreeTupleSetAltHeapTID() functions. Merge the two functions together. This slightly simplifies _bt_truncate().	2020-04-07 15:56:52 -07:00
Alvaro Herrera	c655077639	Allow users to limit storage reserved by replication slots Replication slots are useful to retain data that may be needed by a replication system. But experience has shown that allowing them to retain excessive data can lead to the primary failing because of running out of space. This new feature allows the user to configure a maximum amount of space to be reserved using the new option max_slot_wal_keep_size. Slots that overrun that space are invalidated at checkpoint time, enabling the storage to be released. Author: Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-by: Jehan-Guillaume de Rorthais <jgdr@dalibo.com> Reviewed-by: Álvaro Herrera <alvherre@alvh.no-ip.org> Discussion: https://postgr.es/m/20170228.122736.123383594.horiguchi.kyotaro@lab.ntt.co.jp	2020-04-07 18:35:00 -04:00
Tom Lane	b63c293bcb	Allow psql's \g and \gx commands to transiently change \pset options. We invented \gx to allow the "\pset expanded" flag to be forced on for the duration of one command output, but that turns out to not be nearly enough to satisfy the demand for variant output formats. Hence, make it possible to change any pset option(s) for the duration of a single command output, by writing "option=value ..." inside parentheses, for example \g (format=csv csv_fieldsep='\t') somefile \gx can now be understood as a shorthand for including expanded=on inside the parentheses. Patch by me, expanding on a proposal by Pavel Stehule Discussion: https://postgr.es/m/CAFj8pRBx9OnBPRJVtfA5ycUpySge-XootAXAsv_4rrkHxJ8eRg@mail.gmail.com	2020-04-07 17:46:29 -04:00
Alexander Korotkov	0f5ca02f53	Implement waiting for given lsn at transaction start This commit adds the following optional clause to the BEGIN and START TRANSACTION commands. WAIT FOR LSN lsn [ TIMEOUT timeout ] The new clause postpones transaction start until the given lsn is applied on the standby. This clause allows the user to be sure that changes previously made on the primary will be visible on the standby. A new shared memory struct is used to track the awaited lsn per backend. The recovery process wakes up the backend once the required lsn is applied. Author: Ivan Kartyshov, Anna Akenteva Reviewed-by: Craig Ringer, Thomas Munro, Robert Haas, Kyotaro Horiguchi Reviewed-by: Masahiko Sawada, Ants Aasma, Dmitry Ivanov, Simon Riggs Reviewed-by: Amit Kapila, Alexander Korotkov Discussion: https://postgr.es/m/0240c26c-9f84-30ea-fca9-93ab2df5f305%40postgrespro.ru	2020-04-07 23:51:10 +03:00
Alvaro Herrera	357889eb17	Support FETCH FIRST WITH TIES WITH TIES is an option to the FETCH FIRST N ROWS clause (the SQL standard's spelling of LIMIT), where you additionally get rows that compare equal to the last of those N rows by the columns in the mandatory ORDER BY clause. There was a proposal by Andrew Gierth to implement this functionality in a more powerful way that would yield more features, but the other patch had not been finished at this time, so we decided to use this one for now in the spirit of incremental development. Author: Surafel Temesgen <surafel3000@gmail.com> Reviewed-by: Álvaro Herrera <alvherre@alvh.no-ip.org> Reviewed-by: Tomas Vondra <tomas.vondra@2ndquadrant.com> Discussion: https://postgr.es/m/CALAY4q9ky7rD_A4vf=FVQvCGngm3LOes-ky0J6euMrg=_Se+ag@mail.gmail.com Discussion: https://postgr.es/m/87o8wvz253.fsf@news-spur.riddles.org.uk	2020-04-07 16:22:13 -04:00
Tom Lane	26a944cf29	Adjust bytea get_bit/set_bit to use int8 not int4 for bit numbering. Since the existing bit number argument can't exceed INT32_MAX, it's not possible for these functions to manipulate bits beyond the first 256MB of a bytea value. Lift that restriction by redeclaring the bit number arguments as int8 (which requires a catversion bump, hence is not back-patchable). The similarly-named functions for bit/varbit don't really have a problem because we restrict those types to at most VARBITMAXLEN bits; hence leave them alone. While here, extend the encode/decode functions in utils/adt/encode.c to allow dealing with values wider than 1GB. This is not a live bug or restriction in current usage, because no input could be more than 1GB, and since none of the encoders can expand a string more than 4X, the result size couldn't overflow uint32. But it might be desirable to support more in future, so make the input length values size_t and the potential-output-length values uint64. Also add some test cases to improve the miserable code coverage of these functions. Movead Li, editorialized some by me; also reviewed by Ashutosh Bapat Discussion: https://postgr.es/m/20200312115135445367128@highgo.ca	2020-04-07 15:57:58 -04:00
Tomas Vondra	9c74ceb20b	Remove debugging elog from pgstat_recv_resetslrucounter Reported-by: Thomas Munro	2020-04-07 19:20:20 +02:00
Tomas Vondra	d22782a539	Minor improvements in Incremental Sort explain Some places still used "Maximum" instead of "Peak" when displaying info about sort space, so fix that. Also, add a comment clarifying why it's correct to check the number of full/prefix sort groups. Author: James Coleman Discussion: https://postgr.es/m/CAPpHfds1waRZ=NOmueYq0sx1ZSCnt+5QJvizT8ndT2=etZEeAQ@mail.gmail.com	2020-04-07 18:25:13 +02:00
Fujii Masao	4bd0ad9e44	Prevent archive recovery from scanning non-existent WAL files. Previously when there were multiple timelines listed in the history file of the recovery target timeline, archive recovery searched all of them, starting from the newest timeline to the oldest one, to find the segment to read. That is, archive recovery had to continuously fail scanning the segment until it reached the timeline that the segment belonged to. These scans for non-existent segment could be harmful on the recovery performance especially when archival area was located on the remote storage and each scan could take a long time. To address the issue, this commit changes archive recovery so that it skips scanning the timeline that the segment to read doesn't belong to. Author: Kyotaro Horiguchi, tweaked a bit by Fujii Masao Reviewed-by: David Steele, Pavel Suderevsky, Grigory Smolkin Discussion: https://postgr.es/m/16159-f5a34a3a04dc67e0@postgresql.org Discussion: https://postgr.es/m/20200129.120222.1476610231001551715.horikyota.ntt@gmail.com	2020-04-08 00:49:29 +09:00
Tomas Vondra	ba3e76cc57	Consider Incremental Sort paths at additional places Commit `d2d8a229bc` introduced Incremental Sort, but it was considered only in create_ordered_paths() as an alternative to regular Sort. There are many other places that require sorted input and might benefit from considering Incremental Sort too. This patch modifies a number of those places, but not all. The concern is that just adding Incremental Sort to any place that already adds Sort may increase the number of paths considered, negatively affecting planning time, without any benefit. So we've taken a more conservative approach, based on analysis of which places do affect a set of queries that did seem practical. This means some less common queries may not benefit from Incremental Sort yet. Author: Tomas Vondra Reviewed-by: James Coleman Discussion: https://postgr.es/m/CAPpHfds1waRZ=NOmueYq0sx1ZSCnt+5QJvizT8ndT2=etZEeAQ@mail.gmail.com	2020-04-07 16:43:22 +02:00
Tom Lane	c7654f6a37	Fix representation of SORT_TYPE_STILL_IN_PROGRESS. It turns out that the code did indeed rely on a zeroed TuplesortInstrumentation.sortMethod field to indicate "this worker never did anything", although it seems the issue only comes up during certain race-condition-y cases. Hence, rearrange the TuplesortMethod enum to restore SORT_TYPE_STILL_IN_PROGRESS to having the value zero, and add some comments reinforcing that that isn't optional. Also future-proof a loop over the possible values of the enum. sizeof(bits32) happened to be the correct limit value, but only by purest coincidence. Per buildfarm and local investigation. Discussion: https://postgr.es/m/12222.1586223974@sss.pgh.pa.us	2020-04-06 22:22:13 -04:00
Thomas Munro	4c04be9b05	Introduce xid8-based functions to replace txid_XXX. The txid_XXX family of fmgr functions exposes 64 bit transaction IDs to users as int8. Now that we have an SQL type xid8 for FullTransactionId, define a new set of functions including pg_current_xact_id() and pg_current_snapshot() based on that. Keep the old functions around too, for now. It's a bit sneaky to use the same C functions for both, but since the binary representation is identical except for the signedness of the type, and since older functions are the ones using the wrong signedness, and since we'll presumably drop the older ones after a reasonable period of time, it seems reasonable to switch to FullTransactionId internally and share the code for both. Reviewed-by: Fujii Masao <masao.fujii@oss.nttdata.com> Reviewed-by: Takao Fujii <btfujiitkp@oss.nttdata.com> Reviewed-by: Yoshikazu Imai <imai.yoshikazu@fujitsu.com> Reviewed-by: Mark Dilger <mark.dilger@enterprisedb.com> Discussion: https://postgr.es/m/20190725000636.666m5mad25wfbrri%40alap3.anarazel.de	2020-04-07 12:04:32 +12:00
Thomas Munro	aeec457de8	Add SQL type xid8 to expose FullTransactionId to users. Similar to xid, but 64 bits wide. This new type is suitable for use in various system views and administration functions. Reviewed-by: Fujii Masao <masao.fujii@oss.nttdata.com> Reviewed-by: Takao Fujii <btfujiitkp@oss.nttdata.com> Reviewed-by: Yoshikazu Imai <imai.yoshikazu@fujitsu.com> Reviewed-by: Mark Dilger <mark.dilger@enterprisedb.com> Discussion: https://postgr.es/m/20190725000636.666m5mad25wfbrri%40alap3.anarazel.de	2020-04-07 12:03:59 +12:00
Tomas Vondra	4bea576b03	Use INT64_FORMAT when formatting int64 values in explain Per report from lapwing.	2020-04-07 01:16:57 +02:00
Tomas Vondra	23ba3b5ee2	Fix failures in incremental_sort due to number of workers The last test in incremental_sort suite prints a parallel plan, but some of the buildfarm animals have custom max_parallel_workers_per_gather values, causing failures. Fixed by setting the GUC to an explicit value. Discussion: https://postgr.es/m/CAPpHfds1waRZ=NOmueYq0sx1ZSCnt+5QJvizT8ndT2=etZEeAQ@mail.gmail.com	2020-04-07 00:02:07 +02:00
Peter Geoghegan	ce2cee0ade	Fix nbtree kill_prior_tuple posting list assert. An assertion added by commit `0d861bbb` checked that _bt_killitems() only processes a BTScanPosItem whose heap TID is contained in a posting list tuple when its page offset number still matches what is on the page (i.e. when it matches the posting list tuple's current offset number). This was only correct in the common case where the page can't have changed since we first read it. It was not correct in cases where we don't drop the buffer pin (and don't need to verify the page hasn't changed using its LSN). The latter category includes scans involving unlogged tables, and scans that use a non-MVCC snapshot, per the logic originally introduced by commit `2ed5b87f`. The assertion still seems helpful. Fix it by taking cases where the page may have been concurrently modified into account. Reported-By: Anastasia Lubennikova, Alexander Lakhin Discussion: https://postgr.es/m/c4e38e9a-0f9c-8e53-e639-adf343f94472@postgrespro.ru	2020-04-06 14:46:33 -07:00
Tomas Vondra	7d6d82a524	Fix show_incremental_sort_info with force_parallel_mode When executed with force_parallel_mode=regress, the function was exiting too early and thus failed to print the worker stats. Fixed by making it more like show_sort_info. Discussion: https://postgr.es/m/CAPpHfds1waRZ=NOmueYq0sx1ZSCnt+5QJvizT8ndT2=etZEeAQ@mail.gmail.com	2020-04-06 23:19:13 +02:00
Tomas Vondra	d2d8a229bc	Implement Incremental Sort Incremental Sort is an optimized variant of multikey sort for cases when the input is already sorted by a prefix of the requested sort keys. For example when the relation is already sorted by (key1, key2) and we need to sort it by (key1, key2, key3) we can simply split the input rows into groups having equal values in (key1, key2), and only sort/compare the remaining column key3. This has a number of benefits: - Reduced memory consumption, because only a single group (determined by values in the sorted prefix) needs to be kept in memory. This may also eliminate the need to spill to disk. - Lower startup cost, because Incremental Sort produces results after each prefix group, which is beneficial for plans where startup cost matters (like for example queries with LIMIT clause). We consider both Sort and Incremental Sort, and decide based on costing. The implemented algorithm operates in two different modes: - Fetching a minimum number of tuples without check of equality on the prefix keys, and sorting on all columns when safe. - Fetching all tuples for a single prefix group and then sorting by comparing only the remaining (non-prefix) keys. We always start in the first mode, and employ a heuristic to switch into the second mode if we believe it's beneficial - the goal is to minimize the number of unnecessary comparisons while keeping memory consumption below work_mem. This is a very old patch series. The idea was originally proposed by Alexander Korotkov back in 2013, and then revived in 2017. In 2018 the patch was taken over by James Coleman, who wrote and rewrote most of the current code. There were many reviewers/contributors since 2013 - I've done my best to pick the most active ones, and listed them in this commit message. Author: James Coleman, Alexander Korotkov Reviewed-by: Tomas Vondra, Andreas Karlsson, Marti Raudsepp, Peter Geoghegan, Robert Haas, Thomas Munro, Antonin Houska, Andres Freund, Alexander Kuzmenkov Discussion: https://postgr.es/m/CAPpHfdscOX5an71nHd8WSUH6GNOCf=V7wgDaTXdDd9=goN-gfA@mail.gmail.com Discussion: https://postgr.es/m/CAPpHfds1waRZ=NOmueYq0sx1ZSCnt+5QJvizT8ndT2=etZEeAQ@mail.gmail.com	2020-04-06 21:35:10 +02:00
Tom Lane	3c8553547b	Re-stabilize infinite_recurse() test case. Since commit `8f59f6b9c0`, CLOBBER_CACHE_ALWAYS buildfarm members have been failing this test case because the error message now sometimes includes an error cursor position. It seems largely just luck that that never happened before, and there are likely to be more ways it could happen in future. Hence, rather than trying to prevent it, adjust the test script to suppress that component of the report. At some point we might need to back-patch this, but refrain until there's a demonstrated need. (We'd need a different fix before v12, anyway, since VERBOSITY=sqlstate is a recent thing.) Tom Lane and Andres Freund Discussion: https://postgr.es/m/30675.1586111599@sss.pgh.pa.us	2020-04-06 12:00:37 -04:00
Peter Eisentraut	f1ac27bfda	Add logical replication support to replicate into partitioned tables Mainly, this adds support code in logical/worker.c for applying replicated operations whose target is a partitioned table to its relevant partitions. Author: Amit Langote <amitlangote09@gmail.com> Reviewed-by: Rafia Sabih <rafia.pghackers@gmail.com> Reviewed-by: Peter Eisentraut <peter.eisentraut@2ndquadrant.com> Reviewed-by: Petr Jelinek <petr@2ndquadrant.com> Discussion: https://www.postgresql.org/message-id/flat/CA+HiwqH=Y85vRK3mOdjEkqFK+E=ST=eQiHdpj43L=_eJMOOznQ@mail.gmail.com	2020-04-06 15:15:52 +02:00
Amit Kapila	b7ce6de93b	Allow autovacuum to log WAL usage statistics. This commit allows autovacuum to log WAL usage statistics added by commit `df3b181499`. Author: Julien Rouhaud Reviewed-by: Dilip Kumar and Amit Kapila Discussion: https://postgr.es/m/CAB-hujrP8ZfUkvL5OYETipQwA=e3n7oqHFU=4ZLxWS_Cza3kQQ@mail.gmail.com	2020-04-06 16:24:51 +05:30
Michael Paquier	8ef9451f58	Refactor cluster.c to use new routine get_index_isclustered() This new cache lookup routine has been introduced in `a40caf5`, and more code paths can directly use it. Note that in cluster_rel(), the code was returning immediately if the tuple's entry in pg_index for the clustered index was not valid. This commit changes the code so as a lookup error is raised instead, something that could not happen from the start as we check for the existence of the index beforehand, while holding an exclusive lock on the parent table. Author: Justin Pryzby Reviewed-by: Álvaro Herrera, Michael Paquier Discussion: https://postgr.es/m/20200202161718.GI13621@telsasoft.com	2020-04-06 11:44:23 +09:00
Amit Kapila	33e05f89c5	Add the option to report WAL usage in EXPLAIN and auto_explain. This commit adds a new option WAL similar to existing option BUFFERS in the EXPLAIN command. This option allows to include information on WAL record generation added by commit `df3b181499` in EXPLAIN output. This also allows the WAL usage information to be displayed via the auto_explain module. A new parameter auto_explain.log_wal controls whether WAL usage statistics are printed when an execution plan is logged. This parameter has no effect unless auto_explain.log_analyze is enabled. Author: Julien Rouhaud Reviewed-by: Dilip Kumar and Amit Kapila Discussion: https://postgr.es/m/CAB-hujrP8ZfUkvL5OYETipQwA=e3n7oqHFU=4ZLxWS_Cza3kQQ@mail.gmail.com	2020-04-06 08:02:15 +05:30
Michael Paquier	a40caf5f86	Preserve clustered index after rewrites with ALTER TABLE A table rewritten by ALTER TABLE would lose tracking of an index usable for CLUSTER. This setting is tracked by pg_index.indisclustered and is controlled by ALTER TABLE, so some extra work was needed to restore it properly. Note that ALTER TABLE only marks the index that can be used for clustering, and does not do the actual operation. Author: Amit Langote, Justin Pryzby Reviewed-by: Ibrar Ahmed, Michael Paquier Discussion: https://postgr.es/m/20200202161718.GI13621@telsasoft.com Backpatch-through: 9.5	2020-04-06 11:03:49 +09:00
Andres Freund	fc3f4453a2	Recompute stack base in forked postmaster children. This is for the benefit of running postgres under the rr debugger. When using rr, signal handlers running while a syscall is active use an alternative stack. As e.g. bgworkers are started from within signal handlers, the forked backend then has a different stack base than postmaster. Previously that subsequently led to those processes triggering spurious "stack depth limit exceeded" errors. Discussion: https://postgr.es/m/20200327182217.ubrrl32lyfhxfwk5@alap3.anarazel.de	2020-04-05 18:23:30 -07:00
Andres Freund	f946069e68	Use TransactionXmin instead of RecentGlobalXmin in heap_abort_speculative(). There's a very low risk that RecentGlobalXmin could be far enough in the past to be older than relfrozenxid, or even wrapped around. Luckily the consequences of that having happened wouldn't be too bad - the page wouldn't be pruned for a while. Avoid that risk by using TransactionXmin instead. As that's announced via MyPgXact->xmin, it is protected against wrapping around (see code comments for details around relfrozenxid). Author: Andres Freund Discussion: https://postgr.es/m/20200328213023.s4eyijhdosuc4vcj@alap3.anarazel.de Backpatch: 9.5-	2020-04-05 17:47:30 -07:00
Andres Freund	549a3e23c3	Fix recently introduced typo. Reported-By: David Rowley	2020-04-05 12:03:09 -07:00
Peter Eisentraut	a9d9bdd3ad	Save errno across LWLockRelease() calls Fixup for "Drop slot's LWLock before returning from SaveSlotToPath()" Reported-by: Michael Paquier <michael@paquier.xyz>	2020-04-05 10:02:00 +02:00
Tom Lane	18d85e9b8a	Further improve stability fix for partition_aggregate test. Commit `7cb0a423f` overlooked that the multi-level partition test table pagg_tab_ml still had an exactly even row split at its upper level of partitioning, so that some of the sub-aggregation plan steps still had exactly equal costs, leading to plan instability. Tweak the partition boundaries some more to make the row distribution unequal at both levels. This leads to more changes in the "expected" plan order than the previous round, but it seems fine. (Actually, I'm surprised that this didn't affect even more plans in this test: looking at the underlying costs shows that some of the parallel plan groups are not getting sorted by cost. Bug?) Per buildfarm member lousyjack, https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=lousyjack&dt=2020-04-04%2021%3A03%3A04 Discussion: https://postgr.es/m/24467.1585838693@sss.pgh.pa.us	2020-04-05 00:53:28 -04:00
Noah Misch	70de4e950c	Add perl2host call missing from a new test file. Oversight in today's commit `c6b92041d3`. Per buildfarm member jacana. Discussion: http://postgr.es/m/20200404223212.GC3442685@rfd.leadboat.com	2020-04-04 15:45:45 -07:00
Tom Lane	07871d40c7	Remove bogus Assert, add some regression test cases showing why. Commit `77ec5affb` added an assertion to enforce_generic_type_consistency that boils down to "if the function result is polymorphic, there must be at least one polymorphic argument". This should be true for user-created functions, but there are built-in functions for which it's not true, as pointed out by Jaime Casanova. Hence, go back to the old behavior of leaving the return type alone. There's only a limited amount of stuff you can do with such a function result, but it does work to some extent; add some regression test cases to ensure we don't break that again. Discussion: https://postgr.es/m/CAJGNTeMbhtsCUZgJJ8h8XxAJbK7U2ipsX8wkHRtZRz-NieT8RA@mail.gmail.com	2020-04-04 18:03:30 -04:00
Noah Misch	c6b92041d3	Skip WAL for new relfilenodes, under wal_level=minimal. Until now, only selected bulk operations (e.g. COPY) did this. If a given relfilenode received both a WAL-skipping COPY and a WAL-logged operation (e.g. INSERT), recovery could lose tuples from the COPY. See src/backend/access/transam/README section "Skipping WAL for New RelFileNode" for the new coding rules. Maintainers of table access methods should examine that section. To maintain data durability, just before commit, we choose between an fsync of the relfilenode and copying its contents to WAL. A new GUC, wal_skip_threshold, guides that choice. If this change slows a workload that creates small, permanent relfilenodes under wal_level=minimal, try adjusting wal_skip_threshold. Users setting a timeout on COMMIT may need to adjust that timeout, and log_min_duration_statement analysis will reflect time consumption moving to COMMIT from commands like COPY. Internally, this requires a reliable determination of whether RollbackAndReleaseCurrentSubTransaction() would unlink a relation's current relfilenode. Introduce rd_firstRelfilenodeSubid. Amend the specification of rd_createSubid such that the field is zero when a new rel has an old rd_node. Make relcache.c retain entries for certain dropped relations until end of transaction. Bump XLOG_PAGE_MAGIC, since this introduces XLOG_GIST_ASSIGN_LSN. Future servers accept older WAL, so this bump is discretionary. Kyotaro Horiguchi, reviewed (in earlier, similar versions) by Robert Haas. Heikki Linnakangas and Michael Paquier implemented earlier designs that materially clarified the problem. Reviewed, in earlier designs, by Andrew Dunstan, Andres Freund, Alvaro Herrera, Tom Lane, Fujii Masao, and Simon Riggs. Reported by Martijn van Oosterhout. Discussion: https://postgr.es/m/20150702220524.GA9392@svana.org	2020-04-04 12:25:34 -07:00
Peter Eisentraut	552fcebff0	Revert "Improve handling of parameter differences in physical replication" This reverts commit `246f136e76`. That patch wasn't quite complete enough. Discussion: https://www.postgresql.org/message-id/flat/E1jIpJu-0007Ql-CL%40gemulon.postgresql.org	2020-04-04 09:08:12 +02:00
Amit Kapila	df3b181499	Add infrastructure to track WAL usage. This allows gathering the WAL generation statistics for each statement execution. The three statistics that we collect are the number of WAL records, the number of full page writes and the amount of WAL bytes generated. This helps the users who have write-intensive workloads to see the impact of I/O due to WAL. This further enables us to see approximately what percentage of overall WAL is due to full page writes. In the future, we can extend this functionality to allow us to compute the exact amount of WAL data due to full page writes. This patch in itself is just an infrastructure to compute WAL usage data. The upcoming patches will expose this data via explain, auto_explain, pg_stat_statements and verbose (auto)vacuum output. Author: Kirill Bychik, Julien Rouhaud Reviewed-by: Dilip Kumar, Fujii Masao and Amit Kapila Discussion: https://postgr.es/m/CAB-hujrP8ZfUkvL5OYETipQwA=e3n7oqHFU=4ZLxWS_Cza3kQQ@mail.gmail.com	2020-04-04 10:02:08 +05:30
Jeff Davis	0588ee63aa	Include chunk overhead in hash table entry size estimate. Don't try to be precise about it, just use a constant 16 bytes of chunk overhead. Being smarter would require knowing the memory context where the chunk will be allocated, which is not known by all callers. Discussion: https://postgr.es/m/20200325220936.il3ni2fj2j2b45y5@alap3.anarazel.de	2020-04-03 20:07:58 -07:00
Robert Haas	3e0d80fd8d	Fix resource management bug with replication=database. Commit `0d8c9c1210` allowed BASE_BACKUP to acquire a ResourceOwner without a transaction so that the backup manifest functionality could use a BufFile, but it overlooked the fact that when a walsender is used with replication=database, it might have a transaction in progress, because in that mode, SQL and replication commands can be mixed. Try to fix things up so that the two cleanup mechanisms don't conflict. Per buildfarm member serinus, which triggered the problem when CREATE_REPLICATION_SLOT failed from inside a transaction. It passed on the subsequent run, so evidently the failure doesn't happen every time.	2020-04-03 22:28:37 -04:00
Robert Haas	db1531cae0	Be more careful about time_t vs. pg_time_t in basebackup.c. lapwing is complaining about a call to pg_gmtime, saying that it "expected 'const pg_time_t *' but argument is of type 'time_t *'". I at first thought that the problem had something to do with const, but Thomas Munro suggested that it might be just because time_t and pg_time_t are different identifiers. lapwing is i686 rather than x86_64, and pg_time_t is always int64, so that seems like a good guess. There is other code that just casts time_t to pg_time_t without any conversion function, so try that approach here. Introduced in commit `0d8c9c1210`.	2020-04-03 20:18:47 -04:00
Robert Haas	9f8f881caa	pg_validatebackup: Fix 'make clean' to remove tmp_check. Report by Tom Lane. Discussion: http://postgr.es/m/22394.1585951968@sss.pgh.pa.us	2020-04-03 19:51:18 -04:00
Robert Haas	19c0422ad0	pg_validatebackup: Adjust TAP tests to undo permissions change. It may be necessary to go further and remove this test altogether, but I'm going to try this fix first. It's not clear, at least to me, exactly how this is breaking buildfarm members, but it appears to be doing so.	2020-04-03 19:01:59 -04:00
Robert Haas	460314db08	pg_validatebackup: Also use perl2host in TAP tests. Second try at getting the buildfarm to be happy with 003_corruption.pl as added by commit `0d8c9c1210`. Per suggestion from Álvaro Herrera. Discussion: http://postgr.es/m/20200403205412.GA8279@alvherre.pgsql	2020-04-03 17:18:23 -04:00
Tom Lane	0568e7a2a4	Cosmetic improvements for code related to partitionwise join. Move have_partkey_equi_join and match_expr_to_partition_keys to relnode.c, since they're used only there. Refactor build_joinrel_partition_info to split out the code that fills the joinrel's partition key lists; this doesn't have any non-cosmetic impact, but it seems like a useful separation of concerns. Improve assorted nearby comments. Amit Langote, with a little further editorialization by me Discussion: https://postgr.es/m/CA+HiwqG2WVUGmLJqtR0tPFhniO=H=9qQ+Z3L_ZC+Y3-EVQHFGg@mail.gmail.com	2020-04-03 17:00:35 -04:00
Robert Haas	21dc48840c	pg_validatebackup: Use tempdir_short in TAP tests. The buildfarm is very unhappy right now because TAP test 003_corruption.pl uses TestLib::tempdir to generate the name of a temporary directory that is used as a tablespace name, and this results in a 'symbolic link target too long' error message on many of the buildfarm machines, but not on my machine. It appears that other people have run into similar problems in the past and that TestLib::tempdir_short was the solution, so let's try using that instead.	2020-04-03 15:40:35 -04:00
Robert Haas	87e3004340	pg_validatebackup: Adjust TAP tests to placate perlcritic. It seems that we have a policy that every Perl subroutine should end with an explicit "return", so add explicit "return" statements to all the new subroutines added by my prior commit `0d8c9c1210`. Per buildfarm.	2020-04-03 15:28:59 -04:00
Robert Haas	0d8c9c1210	Generate backup manifests for base backups, and validate them. A manifest is a JSON document which includes (1) the file name, size, last modification time, and an optional checksum for each file backed up, (2) timelines and LSNs for whatever WAL will need to be replayed to make the backup consistent, and (3) a checksum for the manifest itself. By default, we use CRC-32C when checksumming data files, because we are trying to detect corruption and user error, not foil an adversary. However, pg_basebackup and the server-side BASE_BACKUP command now have options to select a different algorithm, so users wanting a cryptographic hash function can select SHA-224, SHA-256, SHA-384, or SHA-512. Users not wanting file checksums at all can disable them, or disable generating of the backup manifest altogether. Using a cryptographic hash function in place of CRC-32C consumes significantly more CPU cycles, which may slow down backups in some cases. A new tool called pg_validatebackup can validate a backup against the manifest. If no checksums are present, it can still check that the right files exist and that they have the expected sizes. If checksums are present, it can also verify that each file has the expected checksum. Additionally, it calls pg_waldump to verify that the expected WAL files are present and parseable. Only plain format backups can be validated directly, but tar format backups can be validated after extracting them. Robert Haas, with help, ideas, review, and testing from David Steele, Stephen Frost, Andrew Dunstan, Rushabh Lathia, Suraj Kharage, Tushar Ahuja, Rajkumar Raghuwanshi, Mark Dilger, Davinder Singh, Jeevan Chalke, Amit Kapila, Andres Freund, and Noah Misch. Discussion: http://postgr.es/m/CA+TgmoZV8dw1H2bzZ9xkKwdrk8+XYa+DC9H=F7heO2zna5T6qg@mail.gmail.com	2020-04-03 15:05:59 -04:00
Fujii Masao	ce77abe63c	Include information on buffer usage during planning phase, in EXPLAIN output, take two. When BUFFERS option is enabled, EXPLAIN command includes the information on buffer usage during each plan node, in its output. In addition to that, this commit makes EXPLAIN command include also the information on buffer usage during planning phase, in its output. This feature makes it easier to discern the cases where a lot of buffer access happens during planning. This commit revives the original commit `ed7a509571` that was reverted by commit `19db23bcbd`. The original commit had to be reverted because it caused the regression test failure on the buildfarm members prion and dory. But since commit `c0885c4c30` got rid of the cause of the test failure, the original commit can be safely introduced again. Author: Julien Rouhaud, slightly revised by Fujii Masao Reviewed-by: Justin Pryzby Discussion: https://postgr.es/m/16109-26a1a88651e90608@postgresql.org	2020-04-04 03:13:17 +09:00
Tom Lane	e41955faf0	Fix bugs in gin_fuzzy_search_limit processing. entryGetItem()'s three code paths each contained bugs associated with filtering the entries for gin_fuzzy_search_limit. The posting-tree path failed to advance "advancePast" after having decided to filter an item. If we ran out of items on the current page and needed to advance to the next, what would actually happen is that entryLoadMoreItems() would re-load the same page. Eventually, the random dropItem() test would accept one of the same items it'd previously rejected, and we'd move on --- but it could take awhile with small gin_fuzzy_search_limit. To add insult to injury, this case would inevitably cause entryLoadMoreItems() to decide it needed to re-descend from the root, making things even slower. The posting-list path failed to implement gin_fuzzy_search_limit filtering at all, so that all entries in the posting list would be returned. The bitmap-result path used a "gotitem" variable that it failed to update in the one place where it'd actually make a difference, ie at the one "continue" statement. I think this was unreachable in practice, because if we'd looped around then it shouldn't be the case that the entries on the new page are before advancePast. Still, the "gotitem" variable was contributing nothing to either clarity or correctness, so get rid of it. Refactor all three loops so that the termination conditions are more alike and less unreadable. The code coverage report showed that we had no coverage at all for the re-descend-from-root code path in entryLoadMoreItems(), which seems like a very bad thing, so add a test case that exercises it. We also had exactly no coverage for gin_fuzzy_search_limit, so add a simplistic test case that at least hits those code paths a little bit. Back-patch to all supported branches. Adé Heyward and Tom Lane Discussion: https://postgr.es/m/CAEknJCdS-dE1Heddptm7ay2xTbSeADbkaQ8bU2AXRCVC2LdtKQ@mail.gmail.com	2020-04-03 13:15:45 -04:00
Fujii Masao	c0885c4c30	Improve stability of explain regression test. The explain regression test runs EXPLAIN commands via the function that filters unstable outputs. To produce more stable test output, this commit improves the function so that it also filters out text-mode Buffers lines. This is necessary because text-mode Buffers lines vary depending on the system state. This improvement will get rid of the regression test failure that the commit `ed7a509571` caused on the buildfarm members prion and dory because of the instability of Buffers lines. Author: Fujii Masao Reviewed-by: Tom Lane Discussion: https://postgr.es/m/20200403025751.GB1759@paquier.xyz	2020-04-04 01:26:39 +09:00
Robert Haas	3031440e98	pg_waldump: Don't call XLogDumpDisplayStats() if -q is specified. Commit `ac44367efb` introduced this problem. Report and fix by Fujii Masao. Discussion: http://postgr.es/m/d332b8f0-0c72-3cd6-6945-7a86a503662a@oss.nttdata.com	2020-04-03 11:58:58 -04:00
Robert Haas	c12e43a2e0	Add checksum helper functions. These functions make it easier to write code that wants to compute a checksum for some data while allowing the user to configure the type of checksum that gets used. This is another piece of infrastructure for the upcoming patch to add backup manifests. Patch written from scratch by me, but it is similar to previous work by Rushabh Lathia and Suraj Kharage. Suraj also reviewed this version off-list. Advice on how not to break Windows from Davinder Singh. Discussion: http://postgr.es/m/CA+TgmoZV8dw1H2bzZ9xkKwdrk8+XYa+DC9H=F7heO2zna5T6qg@mail.gmail.com Discussion: http://postgr.es/m/CA+TgmoZRTBiPyvQEwV79PU1ePTtSEo2UeVncrkJMbn1sU1gnRA@mail.gmail.com	2020-04-03 11:52:43 -04:00
Tom Lane	6dd9f35779	Fix bogus CALLED_AS_TRIGGER() defenses. contrib/lo's lo_manage() thought it could use trigdata->tg_trigger->tgname in its error message about not being called as a trigger. That naturally led to a core dump. unique_key_recheck() figured it could Assert that fcinfo->context is a TriggerData node in advance of having checked that it's being called as a trigger. That's harmless in production builds, and perhaps not that easy to reach in any case, but it's logically wrong. The first of these per bug #16340 from William Crowell; the second from manual inspection of other CALLED_AS_TRIGGER call sites. Back-patch the lo.c change to all supported branches, the other to v10 where the thinko crept in. Discussion: https://postgr.es/m/16340-591c7449dc7c8c47@postgresql.org	2020-04-03 11:24:56 -04:00
Fujii Masao	19db23bcbd	Revert "Include information on buffer usage during planning phase, in EXPLAIN output." This reverts commit `ed7a509571`. Per buildfarm member prion.	2020-04-03 12:20:42 +09:00
Fujii Masao	18808f8c89	Add wait events for recovery conflicts. This commit introduces new wait events RecoveryConflictSnapshot and RecoveryConflictTablespace. The former is reported while waiting for recovery conflict resolution on a vacuum cleanup. The latter is reported while waiting for recovery conflict resolution on dropping tablespace. Also this commit changes the code so that the wait event Lock is reported while waiting in ResolveRecoveryConflictWithVirtualXIDs() for recovery conflict resolution on a lock. Basically the wait event Lock is reported during that wait, but previously was not reported only when that wait happened in ResolveRecoveryConflictWithVirtualXIDs(). Author: Masahiko Sawada Reviewed-by: Fujii Masao Discussion: https://postgr.es/m/CA+fd4k4mXWTwfQLS3RPwGr4xnfAEs1ysFfgYHvmmoUgv6Zxvmg@mail.gmail.com	2020-04-03 12:15:56 +09:00
Michael Paquier	9d8ef98800	Add support for \aset in pgbench This option is similar to \gset, except that it is able to store all results from combined SQL queries into separate variables. If a query returns multiple rows, the last result is stored and if a query returns no rows, nothing is stored. While on it, add a TAP test for \gset to check for a failure when a query returns multiple rows. Author: Fabien Coelho Reviewed-by: Ibrar Ahmed, Michael Paquier Discussion: https://postgr.es/m/alpine.DEB.2.21.1904081914200.2529@lancre	2020-04-03 11:45:15 +09:00
Fujii Masao	ed7a509571	Include information on buffer usage during planning phase, in EXPLAIN output. When BUFFERS option is enabled, EXPLAIN command includes the information on buffer usage during each plan node, in its output. In addition to that, this commit makes EXPLAIN command include also the information on buffer usage during planning phase, in its output. This feature makes it easier to discern the cases where lots of buffer access happen during planning. Author: Julien Rouhaud, slightly revised by Fujii Masao Reviewed-by: Justin Pryzby Discussion: https://postgr.es/m/16109-26a1a88651e90608@postgresql.org	2020-04-03 11:27:09 +09:00
Robert Haas	ac44367efb	pg_waldump: Add a --quiet option. The primary motivation for this change is that it will be used by the upcoming patch to add backup manifests, but it also seems to have some potential more general use. Andres Freund and Robert Haas Discussion: http://postgr.es/m/20200330020814.nspra4mvby42yoa4@alap3.anarazel.de	2020-04-02 20:25:04 -04:00
Tom Lane	7cb0a423f9	Improve stability fix for partition_aggregate test. Instead of disabling autovacuum on these test tables, adjust the partition boundaries so that the child partitions are not all the same size. That should cause the planner to use a predictable ordering of the per-partition scan nodes even in cases where autovacuum causes the rowcount estimates to be off a bit. Moreover, this also lets these tests show that the planner does properly order the tables in descending size order, something that wasn't being proven before. The pagg_tab1 and pagg_tab2 partitions are still all the same size, but that should be fine, because those tables are so small that (1) autovacuum won't fire on them, and (2) even if it did, it couldn't change the reltuples value --- with only one page, it can't see just part of the relation. Discussion: https://postgr.es/m/24467.1585838693@sss.pgh.pa.us	2020-04-02 19:43:51 -04:00
Tom Lane	0b34e7d307	Improve user control over truncation of logged bind-parameter values. This patch replaces the boolean GUC log_parameters_on_error introduced by commit `ba79cb5dc` with an integer log_parameter_max_length_on_error, adding the ability to specify how many bytes to trim each logged parameter value to. (The previous coding hard-wired that choice at 64 bytes.) In addition, add a new parameter log_parameter_max_length that provides similar control over truncation of query parameters that are logged in response to statement-logging options, as opposed to errors. Previous releases always logged such parameters in full, possibly causing log bloat. For backwards compatibility with prior releases, log_parameter_max_length defaults to -1 (log in full), while log_parameter_max_length_on_error defaults to 0 (no logging). Per discussion, log_parameter_max_length is SUSET since the DBA should control routine logging behavior, but log_parameter_max_length_on_error is USERSET because it also affects errcontext data sent back to the client. Alexey Bashtanov, editorialized a little by me Discussion: https://postgr.es/m/b10493cc-a399-a03a-67c7-068f2791ee50@imap.cc	2020-04-02 15:04:51 -04:00
David Rowley	cefb82d49e	Attempt to stabilize partitionwise_aggregate test In `b07642dbc`, we added code to trigger autovacuums based on the number of INSERTs into a table. This seems to have caused some destabilization of the regression tests. Likely this is due to an autovacuum triggering mid-test and (per theory from Tom Lane) one of the test's queries causes autovacuum to skip some number of pages, resulting in the reltuples estimate changing. The failure that this is attempting to fix is around the order of subnodes in an Append. Since the planner orders these according to the subnode cost, it's possible that a small change in the reltuples value changes the subnode's cost enough that it swaps position with one of its fellow subnodes. The failure here only seems to occur on slower buildfarm machines. In this case, lousyjack, which seems to have taken over 8 minutes to run just the partitionwise_aggregate test. Such a slow run would increase the chances that the autovacuum launcher would trigger a vacuum mid-test. Faster machines run this test in sub-second time, so have a much smaller window for an autovacuum to trigger. Here we fix this by disabling autovacuum on all tables created in the test. Additionally, this reverts the change made in the partitionwise_aggregate test in `2dc16efed`. Discussion: https://postgr.es/m/22297.1585797192@sss.pgh.pa.us	2020-04-02 21:26:54 +13:00
Peter Eisentraut	2991ac5fc9	Add SQL functions for Unicode normalization This adds SQL expressions NORMALIZE() and IS NORMALIZED to convert and check Unicode normal forms, per SQL standard. To support fast IS NORMALIZED tests, we pull in a new data file DerivedNormalizationProps.txt from Unicode and build a lookup table from that, using techniques similar to ones already used for other Unicode data. make update-unicode will keep it up to date. We only build and use these tables for the NFC and NFKC forms, because they are too big for NFD and NFKD and the improvement is not significant enough there. Reviewed-by: Daniel Verite <daniel@manitou-mail.org> Reviewed-by: Andreas Karlsson <andreas@proxel.se> Discussion: https://www.postgresql.org/message-id/flat/c1909f27-c269-2ed9-12f8-3ab72c8caf7a@2ndquadrant.com	2020-04-02 08:56:27 +02:00
Peter Eisentraut	c6e0edad46	Add some comments to some SQL features Otherwise, it could be confusing to a reader that some of these well-publicized features are simply listed as unsupported without further explanation.	2020-04-02 07:52:20 +02:00
Thomas Munro	37b3794dfc	Add maintenance_io_concurrency to postgresql.conf.sample. New GUC from commit `fc34b0d9`.	2020-04-02 16:50:36 +13:00
Amit Kapila	3a5e22138a	Allow parallel vacuum to accumulate buffer usage. Commit `40d964ec99` allowed vacuum command to process indexes in parallel but forgot to accumulate the buffer usage stats of parallel workers. This allows leader backend to accumulate buffer usage stats of all the parallel workers. Reported-by: Julien Rouhaud Author: Sawada Masahiko Reviewed-by: Dilip Kumar, Amit Kapila and Julien Rouhaud Discussion: https://postgr.es/m/20200328151721.GB12854@nol	2020-04-02 08:04:58 +05:30
Fujii Masao	17e0328224	Allow pg_stat_statements to track planning statistics. This commit makes pg_stat_statements support new GUC pg_stat_statements.track_planning. If this option is enabled, pg_stat_statements tracks the planning statistics of the statements, e.g., the number of times the statement was planned, the total time spent planning the statement, etc. This feature is useful for finding statements that take a long time to plan. Previously, since pg_stat_statements tracked only the execution statistics, we could not use it for that purpose. The planning and execution statistics are stored at the end of each phase separately. So there is not always a one-to-one relationship between them. For example, if the statement is successfully planned but fails in the execution phase, only its planning statistics are stored. This may cause users to see pg_stat_statements results that differ from those of the previous version. To avoid this, pg_stat_statements.track_planning needs to be disabled. This commit bumps the version of pg_stat_statements to 1.8 since it changes the definition of pg_stat_statements function. Author: Julien Rouhaud, Pascal Legrand, Thomas Munro, Fujii Masao Reviewed-by: Sergei Kornilov, Tomas Vondra, Yoshikazu Imai, Haribabu Kommi, Tom Lane Discussion: https://postgr.es/m/CAHGQGwFx_=DO-Gu-MfPW3VQ4qC7TfVdH2zHmvZfrGv6fQ3D-Tw@mail.gmail.com Discussion: https://postgr.es/m/CAEepm=0e59Y_6Q_YXYCTHZkqOc6H2pJ54C_Xe=VFu50Aqqp_sA@mail.gmail.com Discussion: https://postgr.es/m/DB6PR0301MB21352F6210E3B11934B0DCC790B00@DB6PR0301MB2135.eurprd03.prod.outlook.com	2020-04-02 11:20:19 +09:00
Tomas Vondra	28cac71bd3	Collect statistics about SLRU caches There's a number of SLRU caches used to access important data like clog, commit timestamps, multixact, asynchronous notifications, etc. Until now we had no easy way to monitor these shared caches, compute hit ratios, number of reads/writes etc. This commit extends the statistics collector to track this information for a predefined list of SLRUs, and also introduces a new system view pg_stat_slru displaying the data. The list of built-in SLRUs is fixed, but additional SLRUs may be defined in extensions. Unfortunately, there's no suitable registry of SLRUs, so this patch simply defines a fixed list of SLRUs with entries for the built-in ones and one entry for all additional SLRUs. Extensions adding their own SLRU are fairly rare, so this seems acceptable. This patch only allows monitoring of SLRUs, not tuning. The SLRU sizes are still fixed (hard-coded in the code) and it's not entirely clear which of the SLRUs might need a GUC to tune size. In a way, allowing us to determine that is one of the goals of this patch. Bump catversion as the patch introduces new functions and system view. Author: Tomas Vondra Reviewed-by: Alvaro Herrera Discussion: https://www.postgresql.org/message-id/flat/20200119143707.gyinppnigokesjok@development	2020-04-02 02:34:21 +02:00
Tom Lane	501b018799	Check equality semantics for unique indexes on partitioned tables. We require the partition key to be a subset of the set of columns being made unique, so that physically-separate indexes on the different partitions are sufficient to enforce the uniqueness constraint. The existing code checked that the listed columns appear, but did not inquire into the index semantics, which is a serious oversight given that different index opclasses might enforce completely different notions of uniqueness. Ideally, perhaps, we'd just match the partition key opfamily to the index opfamily. But hash partitioning uses hash opfamilies which we can't directly match to btree opfamilies. Hence, look up the equality operator in each family, and accept if it's the same operator. This should be okay in a fairly general sense, since the equality operator ought to precisely represent the opfamily's notion of uniqueness. A remaining weak spot is that we don't have a cross-index-AM notion of which opfamily member is "equality". But we know which one to use for hash and btree AMs, and those are the only two that are relevant here at present. (Any non-core AMs that know how to enforce equality are out of luck, for now.) Back-patch to v11 where this feature was introduced. Guancheng Luo, revised a bit by me Discussion: https://postgr.es/m/D9C3CEF7-04E8-47A1-8300-CA1DCD5ED40D@gmail.com	2020-04-01 14:49:49 -04:00
Tom Lane	a80818605e	Improve selectivity estimation for assorted match-style operators. Quite a few matching operators such as JSONB's @> used "contsel" and "contjoinsel" as their selectivity estimators. That was a bad idea, because (a) contsel is only a stub, yielding a fixed default estimate, and (b) that default is 0.001, meaning we estimate these operators as five times more selective than equality, which is surely pretty silly. There's a good model for improving this in ltree's ltreeparentsel(): for any "var OP constant" query, we can try applying the operator to all of the column's MCV and histogram values, taking the latter as being a random sample of the non-MCV values. That code is actually 100% generic, except for the question of exactly what default selectivity ought to be plugged in when we don't have stats. Hence, migrate the guts of ltreeparentsel() into the core code, provide wrappers "matchingsel" and "matchingjoinsel" with a more-appropriate default estimate, and use those for the non-geometric operators that formerly used contsel (mostly JSONB containment operators and tsquery matching). Also apply this code to some match-like operators in hstore, ltree, and pg_trgm, including the former users of ltreeparentsel as well as ones that improperly used contsel. Since commit `911e70207` just created new versions of those extensions that we haven't released yet, we can sneak this change into those new versions instead of having to create an additional generation of update scripts. Patch by me, reviewed by Alexey Bashtanov Discussion: https://postgr.es/m/12237.1582833074@sss.pgh.pa.us	2020-04-01 10:32:33 -04:00
Peter Eisentraut	d8653f4687	Refactor code to look up local replication tuple This unifies some duplicate code. Author: Amit Langote <amitlangote09@gmail.com> Discussion: https://www.postgresql.org/message-id/CA+HiwqFjYE5anArxvkjr37AQMd52L-LZtz9Ld2QrLQ3YfcYhTw@mail.gmail.com	2020-04-01 15:34:41 +02:00
Michael Paquier	8d84dd0012	Fix crash in psql when attempting to reuse old connection In a psql session, if the connection to the server is abruptly cut, the referenced connection would become NULL as of CheckConnection(). This could cause a hard crash with psql if attempting to connect by reusing the past connection's data because of a null-pointer dereference with either PQhost() or PQdb(). This issue is fixed by making sure that no reuse of the past connection is done if it does not exist. Issue has been introduced by `6e5f8d4`, so backpatch down to 12. Reported-by: Hugh Wang Author: Michael Paquier Reviewed-by: Álvaro Herrera, Tom Lane Discussion: https://postgr.es/m/16330-b34835d83619e25d@postgresql.org Backpatch-through: 12	2020-04-01 14:45:45 +09:00
Amit Kapila	2401d93718	Fix coverity complaint about commit `40d964ec99`. Coverity complained that dividing integer expressions and then converting the integer quotient to type "double" would lose the fractional part. Casting one of the operands of the expression to double should fix the report. Author: Mahendra Singh Thalor Reviewed-by: Amit Kapila Discussion: https://postgr.es/m/20200329224818.6phnhv7o2q2rfovf@alap3.anarazel.de	2020-04-01 09:28:13 +05:30
Bruce Momjian	08481eedd1	psql: do file completion for \gx This was missed when the feature was added. Reported-by: Vik Fearing Discussion: https://postgr.es/m/eca20529-0b06-b493-ee38-f071a75dcd5b@postgresfriends.org Backpatch-through: 10	2020-03-31 23:01:34 -04:00
Michael Paquier	a7e8ece41c	Add -c/--restore-target-wal to pg_rewind pg_rewind needs to copy from the source cluster to the target cluster a set of relation blocks changed from the previous checkpoint where WAL forked up to the end of WAL on the target. Building this list of relation blocks requires a range of WAL segments that may not be present anymore on the target's pg_wal, causing pg_rewind to fail. It is possible to work around this issue by manually copying the WAL segments needed, but this may lead to some extra and actually useless work. This commit introduces a new option allowing pg_rewind to use a restore_command while doing the rewind by grabbing the parameter value of restore_command from the target cluster configuration. This allows the rewind operation to be more reliable, so that only the WAL segments needed by the rewind are restored from the archives. In order to be able to do that, a new routine is added to src/common/ to allow frontend tools to restore files from archives using an already-built restore command. This version is simpler than the backend equivalent as there is no need to handle the non-recovery case. Author: Alexey Kondratov Reviewed-by: Andrey Borodin, Andres Freund, Alvaro Herrera, Alexander Korotkov, Michael Paquier Discussion: https://postgr.es/m/a3acff50-5a0d-9a2c-b3b2-ee36168955c1@postgrespro.ru	2020-04-01 10:57:03 +09:00
Peter Geoghegan	7dbe290da4	Add CREATE INDEX deduplication assertions. Add two assertions that verify the assumptions about posting list tuple space accounting and suffix truncation made within nbtsort.c.	2020-03-31 14:38:39 -07:00
Tom Lane	fe3036527a	Fix race condition in statext_store(). Must hold some lock on the pg_statistic_ext_data catalog before we look up the tuple we aim to replace. Otherwise a concurrent VACUUM FULL or similar operation could move it to a different TID, leaving us trying to replace the wrong tuple. Back-patch to v12 where this got broken. Credit goes to Dean Rasheed; I'm just doing the clerical work. Discussion: https://postgr.es/m/CAEZATCU0zHMDiQV0g8P2U+YSP9C1idUPrn79DajsbonwkN0xvQ@mail.gmail.com	2020-03-31 17:06:22 -04:00
Tom Lane	0936d1b6ff	Still another try at stabilizing stats_ext test results. The stats_ext test is not expecting that autovacuum will touch any of its tables; an expectation falsified by commit `b07642dbc`. Although I'm suspicious that there's something else going on that makes extended stats estimates not 100% reproducible, it's pretty easy to demonstrate that there are places in this test that fail if an autovacuum updates the table's stats unexpectedly. Hence, revert the band-aid changes made by `2dc16efed` and `24566b359` in favor of summarily disabling autovacuum for all the tables that this test checks estimated rowcounts for. Also remove an evidently obsolete comment at the head of the test. Discussion: https://postgr.es/m/15012.1585623298@sss.pgh.pa.us	2020-03-31 16:09:25 -04:00
Fujii Masao	b0236508d3	Improve the message logged when recovery is paused. When recovery target is reached and recovery is paused because of recovery_target_action=pause, executing pg_wal_replay_resume() causes the standby to promote, i.e., the recovery to end. So, in this case, the previous message "Execute pg_wal_replay_resume() to continue" logged was confusing because pg_wal_replay_resume() doesn't cause the recovery to continue. This commit improves the message logged when recovery is paused, and the proper message is output based on what (pg_wal_replay_pause or recovery_target_action) causes recovery to be paused. Author: Sergei Kornilov, revised by Fujii Masao Reviewed-by: Robert Haas Discussion: https://postgr.es/m/19168211580382043@myt5-b646bde4b8f3.qloud-c.yandex.net	2020-04-01 03:35:13 +09:00
Bruce Momjian	051fd5e0f9	Allow ecpg to be built stand-alone, allow parallel libpq make This change defines SHLIB_PREREQS for the libpgport dependency, rather than using a makefile rule. This was broken in PG 12. Reported-by: Filip Janus Discussion: https://postgr.es/m/E5Dc85EGUY4wyG8cjAU0qoEdCJxGK_qhW1s9qSuYq9A@mail.gmail.com Author: Dagfinn Ilmari Mannsåker (for libpq) Backpatch-through: 12	2020-03-31 14:17:32 -04:00
Tom Lane	82e8018522	Teach pg_ls_dir_files() to ignore ENOENT failures from stat(). Buildfarm experience shows that this function can fail with ENOENT if some other process unlinks a file between when we read the directory entry and when we try to stat() it. The problem is old but we had not noticed it until `085b6b667` added regression test coverage. To fix, just ignore ENOENT failures. There is one other case that this might hide: a symlink that points to nowhere. That seems okay though, at least better than erroring. Back-patch to v10 where this function was added, since the regression test cases were too. Discussion: https://postgr.es/m/20200308173103.GC1357@telsasoft.com	2020-03-31 12:57:55 -04:00
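A minimal Python sketch of the race described in `82e8018522` and of the fix it names (ignore ENOENT from stat()). This is not the server's C code; the function name `list_dir_files` and the choice to return only regular files are assumptions made for illustration.

```python
import os
import stat


def list_dir_files(dir_path):
    """Return sorted (name, size) pairs for regular files in dir_path.

    Entries that disappear between the directory read and the stat()
    call are skipped instead of raising an error.
    """
    results = []
    for name in os.listdir(dir_path):
        path = os.path.join(dir_path, name)
        try:
            st = os.stat(path)  # follows symlinks, like stat(2)
        except FileNotFoundError:
            # ENOENT: the file was unlinked concurrently, or the entry
            # is a symlink that points to nowhere. Skip it.
            continue
        if stat.S_ISREG(st.st_mode):  # assumption: report regular files only
            results.append((name, st.st_size))
    return sorted(results)
```

A dangling symlink is an easy way to reproduce the ENOENT path without an actual race, which is also the one extra case the commit message says this approach hides.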
Alexander Korotkov	02a5786df2	Improve error reporting in opclasscmds.c This commit improves error reporting introduced by `911e702077`. It puts the argument of errmsg() on a single line so that the error text is easier to grep for in the source. It also improves the wording of errhint().	2020-03-31 17:51:57 +03:00
Magnus Hagander	087d3d0583	Fix assorted typos Author: Daniel Gustafsson <daniel@yesql.se>	2020-03-31 16:00:06 +02:00
Peter Eisentraut	de3bbfcc96	Fix INSERT OVERRIDING USER VALUE behavior The original implementation disallowed using OVERRIDING USER VALUE on identity columns defined as GENERATED ALWAYS, which is not per standard. So allow that now. Expand documentation and tests around this. Author: Dean Rasheed <dean.a.rasheed@gmail.com> Reviewed-by: Peter Eisentraut <peter.eisentraut@2ndquadrant.com> Reviewed-by: Vik Fearing <vik@postgresfriends.org> Discussion: https://www.postgresql.org/message-id/flat/CAEZATCVrh2ufCwmzzM%3Dk_OfuLhTTPBJCdFkimst2kry4oHepuQ%40mail.gmail.com	2020-03-31 08:50:39 +02:00
Michael Paquier	616ae3d2b0	Move routine definitions of xlogarchive.c to a new header file The definitions of the routines defined in xlogarchive.c have been part of xlog_internal.h which is included by several frontend tools, but all those routines are only called by the backend. More cleanup could be done within xlog_internal.h, but that's already a nice cut. This will help a follow-up patch for pg_rewind where handling of restore_command is added for frontends. Author: Alexey Kondratov, Michael Paquier Reviewed-by: Álvaro Herrera, Alexander Korotkov Discussion: https://postgr.es/m/a3acff50-5a0d-9a2c-b3b2-ee36168955c1@postgrespro.ru	2020-03-31 15:33:04 +09:00
Peter Eisentraut	fc8c3bdde2	Update SQL features Set T653 to supported. This has always been possible.	2020-03-31 08:25:03 +02:00
Amit Kapila	ef75140fe7	Avoid calls to RelationGetRelationName() and RelationGetNamespace() in vacuum code. After commit `b61d161c14`, during vacuum, we cache the information of relation name and relation namespace in local structure LVRelStats so that we can use it in an error callback function. We can use the cached information to avoid the calls to RelationGetRelationName(), RelationGetNamespace() and get_namespace_name(). This is mainly for consistency in the vacuum code path, but it will also avoid the extra syscache lookup we do in get_namespace_name(). Author: Justin Pryzby Reviewed-by: Amit Kapila Discussion: https://www.postgresql.org/message-id/20191120210600.GC30362@telsasoft.com	2020-03-31 09:34:49 +05:30
Peter Geoghegan	f01157e2ac	Further simplify nbtree high key truncation. Commit `7c2dbc69` reorganized _bt_truncate() in a way that enables a further simplification that I (pgeoghegan) missed: Since we mark the tuple that is returned to the caller as a pivot tuple before the point where its heap TID is set as of `7c2dbc69`, it is possible to use the high level BTreeTupleGetHeapTID() inline function to get an item pointer. Do it that way now. This approach is clearer and more maintainable.	2020-03-30 17:34:12 -07:00
Michael Paquier	dd9ac7d5d8	Revert "Skip redundant anti-wraparound vacuums" This reverts commit `2aa6e33`, that added a fast path to skip anti-wraparound and non-aggressive autovacuum jobs (these make no sense as anti-wraparound implies aggressive). With a cluster using a large number of relations with a portion of them being heavily updated, this could cause autovacuum to lock down, with autovacuum workers attempting repeatedly those jobs on the same relations for the same database, that just kept being skipped. This lock down can be solved with a manual VACUUM FREEZE. Justin King has reported one environment where the issue happened, and Julien Rouhaud and I have been able to reproduce it in a second environment. With a very aggressive autovacuum_freeze_max_age, triggering those jobs with pgbench is a matter of minutes, and hitting the lock down is a lot harder (my local tests failed to do that). Note that anti-wraparound and non-aggressive jobs can only be triggered on a subset of shared catalogs: - pg_auth_members - pg_authid - pg_database - pg_replication_origin - pg_shseclabel - pg_subscription - pg_tablespace While the lock down was possible down to v12, the root cause of those jobs is a much older issue, which needs more analysis. Bonus thanks to Andres Freund for the discussion. Reported-by: Justin King Discussion: https://postgr.es/m/CAE39h22zPLrkH17GrkDgAYL3kbjvySYD1io+rtnAUFnaJJVS4g@mail.gmail.com Backpatch-through: 12	2020-03-31 08:27:47 +09:00
Peter Geoghegan	7c2dbc691c	Refactor nbtree high key truncation. Simplify _bt_truncate(), the routine that generates truncated leaf page high keys. Remove a micro-optimization that avoided a second palloc0() call (this was used when a heap TID was needed in the final pivot tuple, though only when the index happened to not be an INCLUDE index). Removing this dubious micro-optimization allows _bt_truncate() to use the index_truncate_tuple() indextuple.c utility routine in all cases. This was already the common case. This commit is a HEAD-only follow up to bugfix commit `4b42a899`.	2020-03-30 15:52:39 -07:00
Andres Freund	d4b34f60c5	Deduplicate PageIsNew() check in lazy_scan_heap(). The recheck isn't needed anymore, as RelationGetBufferForTuple() now extends the relation with RBM_ZERO_AND_LOCK. Previously we needed to handle the fact that relation extension extended the relation and then separately acquired a lock on the page - while expecting that the page is empty. Reported-By: Ranier Vilela Discussion: https://postgr.es/m/CAEudQArA_=J0D5T258xsCY6Xtf6wiH4b=QDPDgVS+WZUN10WDw@mail.gmail.com	2020-03-30 13:56:40 -07:00
Alexander Korotkov	364bdd0b41	Fix missing SP-GiST support in `911e702077` `911e702077` misses setting of amoptsprocnum for SP-GiST. This commit fixes that.	2020-03-30 23:45:03 +03:00
Alexander Korotkov	851b14b0c6	Remove rudiments of supporting procnum == 0 from `911e702077` Early versions of the opclass options patch used support procedure number zero as the opclass options procedure. This commit removes the rudiments of that, which were committed in `911e702077`. It also implements correct handling of amoptsprocnum == 0.	2020-03-30 23:43:25 +03:00
Peter Geoghegan	4b42a89938	Consistently truncate non-key suffix columns. INCLUDE indexes failed to have their non-key attributes physically truncated away in certain rare cases. This led to physically larger pivot tuples that contained useless non-key attribute values. The impact on users should be negligible, but this is still clearly a regression (Postgres 11 supports INCLUDE indexes, and yet was not affected). The bug appeared in commit `dd299df8`, which introduced "true" suffix truncation of key attributes. Discussion: https://postgr.es/m/CAH2-Wz=E8pkV9ivRSFHtv812H5ckf8s1-yhx61_WrJbKccGcrQ@mail.gmail.com Backpatch: 12-, where "true" suffix truncation was introduced.	2020-03-30 12:03:59 -07:00
Alexander Korotkov	911e702077	Implement operator class parameters PostgreSQL provides a set of template index access methods, where opclasses have much freedom in the semantics of indexing. These index AMs are GiST, GIN, SP-GiST and BRIN. There, opclasses define the representation of keys, operations on them and supported search strategies. So, it's natural that opclasses may face some tradeoffs, which require a user-side decision. This commit implements opclass parameters allowing users to set some values, which tell opclass how to index the particular dataset. This commit doesn't introduce new storage in system catalog. Instead it uses pg_attribute.attoptions, which is used for table column storage options but unused for index attributes. In order to evade changing signature of each opclass support function, we implement unified way to pass options to opclass support functions. Options are set to fn_expr as the constant bytea expression. It's possible due to the fact that opclass support functions are executed outside of expressions, so fn_expr is unused for them. This commit comes with some examples of opclass options usage. We parametrize signature length in GiST. That applies to multiple opclasses: tsvector_ops, gist__intbig_ops, gist_ltree_ops, gist__ltree_ops, gist_trgm_ops and gist_hstore_ops. Also we parametrize maximum number of integer ranges for gist__int_ops. However, the main future usage of this feature is expected to be json, where users would be able to specify which way to index particular json parts. Catversion is bumped. Discussion: https://postgr.es/m/d22c3a18-31c7-1879-fc11-4c1ce2f5e5af%40postgrespro.ru Author: Nikita Glukhov, revised by me Reviewed-by: Nikolay Shaplov, Robert Haas, Tom Lane, Tomas Vondra, Alvaro Herrera	2020-03-30 19:17:23 +03:00
Peter Eisentraut	1d53432ff9	Allow using Unix-domain sockets on Windows in tests The test suites currently don't use Unix-domain sockets on Windows. This optionally allows enabling that by setting the environment variable PG_TEST_USE_UNIX_SOCKETS. This should currently be considered experimental. In particular, pg_regress.c contains some comments that the cleanup code for Unix-domain sockets doesn't work correctly under Windows, which hasn't been a problem until now. But it's good enough for locally supervised testing of the functionality. Reviewed-by: Andrew Dunstan <andrew.dunstan@2ndquadrant.com> Discussion: https://www.postgresql.org/message-id/flat/54bde68c-d134-4eb8-5bd3-8af33b72a010@2ndquadrant.com	2020-03-30 17:35:29 +02:00
Tom Lane	8c49454caa	Be more careful about extracting encoding from locale strings on Windows. GetLocaleInfoEx() can fail on strings that setlocale() was perfectly happy with. A common way for that to happen is if the locale string is actually a Unix-style string, say "et_EE.UTF-8". In that case, what's after the dot is an encoding name, not a Windows codepage number; blindly treating it as a codepage number led to failure, with a fairly silly error message. Hence, check to see if what's after the dot is all digits, and if not, treat it as a literal encoding name rather than a codepage number. This will do the right thing with many Unix-style locale strings, and produce a more sensible error message otherwise. Somewhat independently of that, treat a zero (CP_ACP) result from GetLocaleInfoEx() as meaning that we must use UTF-8 encoding. Back-patch to all supported branches. Juan José Santamaría Flecha Discussion: https://postgr.es/m/24905.1585445371@sss.pgh.pa.us	2020-03-30 11:14:58 -04:00
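The digit check described in `8c49454caa` can be sketched in Python as follows. This only illustrates the decision rule stated in the message (all digits after the dot means codepage number, anything else means literal encoding name); the function name, the return shape, and the use of the last dot are assumptions, and locale modifiers such as `@euro` are not handled.

```python
def encoding_from_locale(locale_name):
    """Classify the part of a locale string that follows the last dot.

    Returns ("codepage", int) for a Windows-style numeric suffix,
    ("encoding", str) for a Unix-style encoding name, or None when
    the string has no usable suffix.
    """
    _, dot, suffix = locale_name.rpartition(".")
    if not dot or not suffix:
        return None
    # Only plain ASCII digits count as a codepage number.
    if suffix.isascii() and suffix.isdigit():
        return ("codepage", int(suffix))
    # Otherwise treat the suffix as a literal encoding name.
    return ("encoding", suffix)
```

With this rule, `"et_EE.UTF-8"` is classified as an encoding name rather than being misread as a codepage number, which is the failure the commit message describes.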
David Rowley	24566b359d	Attempt to fix unstable regression tests, take 2 Following up on `2dc16efed`, petalura has suffered some additional failures in stats_ext which again appear to be around the timing of an autovacuum during the test, causing instability in the row estimates. Again, let's fix this by explicitly performing a VACUUM on the table and not leave it to happen by chance of an autovacuum pass. Discussion: https://postgr.es/m/CAApHDvok5hmXr%2BbUbJe7%2B2sQzWo4B_QzSk7RKFR9fP6BjYXx5g%40mail.gmail.com	2020-03-30 23:41:11 +13:00
Fujii Masao	64638ccba3	Report waiting via PS while recovery is waiting for buffer pin in hot standby. Previously while the startup process was waiting for the recovery conflict with snapshot, tablespace or lock to be resolved, waiting was reported in PS display, but not in the case of recovery conflict with buffer pin. This commit makes the startup process in hot standby report waiting via PS while waiting for the conflicts with other backends holding buffer pins to be resolved. Author: Masahiko Sawada Reviewed-by: Fujii Masao Discussion: https://postgr.es/m/CA+fd4k4mXWTwfQLS3RPwGr4xnfAEs1ysFfgYHvmmoUgv6Zxvmg@mail.gmail.com	2020-03-30 17:35:03 +09:00
Peter Eisentraut	246f136e76	Improve handling of parameter differences in physical replication When certain parameters are changed on a physical replication primary, this is communicated to standbys using the XLOG_PARAMETER_CHANGE WAL record. The standby then checks whether its own settings are at least as big as the ones on the primary. If not, the standby shuts down with a fatal error. The correspondence of settings between primary and standby is required because those settings influence certain shared memory sizings that are required for processing WAL records that the primary might send. For example, if the primary sends a prepared transaction, the standby must have had max_prepared_transaction set appropriately or it won't be able to process those WAL records. However, fatally shutting down the standby immediately upon receipt of the parameter change record might be a bit of an overreaction. The resources related to those settings are not required immediately at that point, and might never be required if the activity on the primary does not exhaust all those resources. If we just let the standby roll on with recovery, it will eventually produce an appropriate error when those resources are used. So this patch relaxes this a bit. Upon receipt of XLOG_PARAMETER_CHANGE, we still check the settings but only issue a warning and set a global flag if there is a problem. Then when we actually hit the resource issue and the flag was set, we issue another warning message with relevant information. At that point we pause recovery, so a hot standby remains usable. We also repeat the last warning message once a minute so it is harder to miss or ignore. Reviewed-by: Sergei Kornilov <sk@zsrv.org> Reviewed-by: Masahiko Sawada <masahiko.sawada@2ndquadrant.com> Reviewed-by: Kyotaro Horiguchi <horikyota.ntt@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/4ad69a4c-cc9b-0dfe-0352-8b1b0cd36c7b@2ndquadrant.com	2020-03-30 09:53:45 +02:00
Peter Eisentraut	a01e1b8b9d	Add new part SQL/MDA to information_schema.sql_parts	2020-03-30 08:55:55 +02:00
Fujii Masao	6aba63ef3e	Allow the planner-related functions and hook to accept the query string. This commit adds query_string argument into the planner-related functions and hook and allows us to pass the query string to them. Currently there is no user of the query string passed. But the upcoming patch for the planning counters will add the planning hook function into pg_stat_statements and the function will need the query string. So this change will be necessary for that patch. Also this change is useful for some extensions that want to use the query string in their planner hook function. Author: Pascal Legrand, Julien Rouhaud Reviewed-by: Yoshikazu Imai, Tom Lane, Fujii Masao Discussion: https://postgr.es/m/CAOBaU_bU1m3_XF5qKYtSj1ua4dxd=FWDyh2SH4rSJAUUfsGmAQ@mail.gmail.com Discussion: https://postgr.es/m/1583789487074-0.post@n3.nabble.com	2020-03-30 13:51:05 +09:00
Fujii Masao	4a539a25eb	Expose BufferUsageAccumDiff(). Previously pg_stat_statements calculated the difference of buffer counters by its own code even while BufferUsageAccumDiff() had the same code. This commit exposes BufferUsageAccumDiff() and makes pg_stat_statements use it for the calculation, in order to simplify the code. This change also would be useful for the upcoming patch for the planning counters in pg_stat_statements because the patch will add one more code for the calculation of difference of buffer counters and that can easily be done by using BufferUsageAccumDiff(). Author: Julien Rouhaud Reviewed-by: Fujii Masao Discussion: https://postgr.es/m/bdfee4e0-a304-2498-8da5-3cb52c0a193e@oss.nttdata.com	2020-03-30 12:15:26 +09:00
Amit Kapila	b61d161c14	Introduce vacuum errcontext to display additional information. The additional information displayed will be block number for error occurring while processing heap and index name for error occurring while processing the index. This will help us in diagnosing the problems that occur during a vacuum. For example, if we get some error while vacuuming due to corruption (either caused by bad hardware or by some bug), it can help us identify the block in heap and/or additional index information. It sets up an error context callback to display additional information with the error. During different phases of vacuum (heap scan, heap vacuum, index vacuum, index clean up, heap truncate), we update the error context callback to display appropriate information. We can extend it to a bit more granular level like adding the phases for FSM operations or for prefetching the blocks while truncating. However, I felt that it requires adding many more error callback function calls and can make the code a bit complex, so left those for now. Author: Justin Pryzby, with few changes by Amit Kapila Reviewed-by: Alvaro Herrera, Amit Kapila, Andres Freund, Michael Paquier and Sawada Masahiko Discussion: https://www.postgresql.org/message-id/20191120210600.GC30362@telsasoft.com	2020-03-30 07:33:38 +05:30
Peter Eisentraut	9cedb16660	pg_regress: Observe TMPDIR Put the temporary socket directory under TMPDIR, if that environment variable is set, instead of the hardcoded /tmp. This allows running the tests if there is no /tmp at all (for example on Windows, although running the tests with Unix-domain sockets is not enabled on Windows yet). We also use TMPDIR everywhere else /tmp is hardcoded, so this makes the behavior consistent. Reviewed-by: Andrew Dunstan <andrew.dunstan@2ndquadrant.com> Discussion: https://www.postgresql.org/message-id/flat/54bde68c-d134-4eb8-5bd3-8af33b72a010@2ndquadrant.com	2020-03-29 09:25:40 +02:00
Peter Eisentraut	b79911dc8c	Update SQL features Change F181 to supported. It requires that an embedded C program can be split across multiple files, which ECPG easily supports.	2020-03-29 08:56:41 +02:00
David Rowley	2dc16efedc	Attempt to fix unstable regression tests `b07642dbc` added code to trigger autovacuums based on the number of inserts into a table. This seems to have caused some regression test results to destabilize. I suspect this is due to autovacuum triggering a vacuum sometime after the test's ANALYZE run and perhaps reltuples is ending up being set to a slightly different value as a result. Attempt to resolve this by running a VACUUM ANALYZE on the affected table instead of just ANALYZE. pg_class.reltuples will still get set to whatever ANALYZE chooses but we should no longer get the proceeding autovacuum overriding that. The overhead this adds to each test's runtime seems small enough not to worry about. I measure 3-4% on stats_ext and can't measure any change in partition_aggregate. I'm unable to recreate the issue locally, so this is a bit of a blind fix. Discussion: https://postgr.es/m/CAApHDvpWmpqYrKwwDQyeDq8dAyK7GMNaxDhrG69CkSuXoEg%2BVg%40mail.gmail.com	2020-03-29 19:36:20 +13:00
Peter Geoghegan	a7b9d24e4e	Make deduplication use number of key attributes. Use IndexRelationGetNumberOfKeyAttributes() rather than IndexRelationGetNumberOfAttributes() when determining whether or not two index tuples are suitable for merging together into a single posting list tuple. This is a little bit tidier. It brings affected code in nbtdedup.c a little closer to similar, related code in nbtsplitloc.c.	2020-03-28 20:25:03 -07:00
Andres Freund	42750b08d9	Ensure snapshot is registered within ScanPgRelation(). In 9.4 I added support to use a historical snapshot in ScanPgRelation(), while adding logical decoding. Unfortunately a conflict with the concurrent removal of SnapshotNow was incorrectly resolved, leading to an unregistered snapshot being used. It is not correct to use an unregistered (or non-active) snapshot for anything non-trivial, because catalog invalidations can cause the snapshot to be invalidated. Luckily it seems unlikely to actively cause problems in practice, as ScanPgRelation() requires that we already have a lock on the relation, we only look for a single row, and we don't appear to rely on the result's tid to be correct. It however is clearly wrong and potential negative consequences would likely be hard to find. So it seems worth backpatching the fix, even without a concrete hazard. Discussion: https://postgr.es/m/20200229052459.wzhqnbhrriezg4v2@alap3.anarazel.de Backpatch: 9.5-	2020-03-28 12:26:46 -07:00
Jeff Davis	7351bfeda3	Fix costing for disk-based hash aggregation. Report and suggestions from Richard Guo and Tomas Vondra. Discussion: https://postgr.es/m/CAMbWs4_W8fYbAn8KxgidAaZHON_Oo08OYn9ze=7remJymLqo5g@mail.gmail.com	2020-03-28 12:07:49 -07:00
Dean Rasheed	4083f445c0	Improve the performance and accuracy of numeric sqrt() and ln(). Instead of using Newton's method to compute numeric square roots, use the Karatsuba square root algorithm, which performs better for numbers of all sizes. In practice, this is 3-5 times faster for inputs with just a few digits and up to around 10 times faster for larger inputs. Also, the new algorithm guarantees that the final digit of the result is correctly rounded, since it computes an integer square root with truncation, containing at least 1 extra decimal digit before rounding. The former algorithm would occasionally round the wrong way because it rounded both the intermediate and final results. In addition, arrange for sqrt_var() to explicitly support negative rscale values (rounding before the decimal point). This allows the argument reduction phase of ln_var() to be optimised for large inputs, since it only needs to compute square roots with a few more digits than the final ln() result, rather than computing all the digits before the decimal point. For very large inputs, this can be many thousands of times faster. In passing, optimise div_var_fast() in a couple of places where it was doing unnecessary work. Patch by me, reviewed by Tom Lane and Tels. Discussion: https://postgr.es/m/CAEZATCV1A7+jD3P30Zu31KjaxeSEyOn3v9d6tYegpxcq3cQu-g@mail.gmail.com	2020-03-28 14:37:53 +00:00
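The rounding argument in `4083f445c0` (truncated integer square root with one extra decimal digit, then a single rounding step) can be illustrated with a short Python sketch. It does not implement the Karatsuba square root algorithm; Python's `math.isqrt` stands in for the truncating integer square root, and the function name and the scaled-integer return value are assumptions made for illustration.

```python
import math


def sqrt_rounded(n, digits):
    """Square root of a non-negative integer n, rounded half-up to
    `digits` decimal places.

    The result is returned as an integer scaled by 10**digits, so
    sqrt_rounded(2, 4) == 14142 stands for 1.4142.
    """
    # Scale so that the truncated integer root carries digits + 1
    # decimal places: one guard digit beyond what the caller wants.
    truncated = math.isqrt(n * 10 ** (2 * (digits + 1)))
    # Round exactly once, using the guard digit. Because the root was
    # truncated (never rounded up), a guard digit >= 5 means the true
    # value is at or above the halfway point, so rounding up is safe.
    return (truncated + 5) // 10
```

For n = 7 and two decimal places the truncated root is 2645, the guard digit is 5, and the single rounding step yields 265 (2.65), which matches the true value 2.6457...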
Peter Eisentraut	8f3ec75de4	Enable Unix-domain sockets support on Windows As of Windows 10 version 1803, Unix-domain sockets are supported on Windows. But it's not automatically detected by configure because it looks for struct sockaddr_un and Windows doesn't define that. So we just make our own definition on Windows and override the configure result. Set DEFAULT_PGSOCKET_DIR to empty on Windows so by default no Unix-domain socket is used, because there is no good standard location. In pg_upgrade, we have to do some extra tweaking to preserve the existing behavior of not using Unix-domain sockets on Windows. Adding support would be desirable, but it needs further work, in particular a way to select whether to use Unix-domain sockets from the command-line or with a run-time test. The pg_upgrade test script needs a fix. The previous code passed "localhost" to postgres -k, which only happened to work because Windows used to ignore the -k argument value altogether. We instead need to pass an empty string to get the desired effect. The test suites will continue to not use Unix-domain sockets on Windows. This requires a small tweak in pg_regress.c. The TAP tests don't need to be changed because they decide by the operating system rather than HAVE_UNIX_SOCKETS. Reviewed-by: Andrew Dunstan <andrew.dunstan@2ndquadrant.com> Discussion: https://www.postgresql.org/message-id/flat/54bde68c-d134-4eb8-5bd3-8af33b72a010@2ndquadrant.com	2020-03-28 15:01:01 +01:00
Dean Rasheed	87779aa474	Prevent functional dependency estimates from exceeding column estimates. Formerly we applied a functional dependency "a => b with dependency degree f" using the formula P(a,b) = P(a) * [f + (1-f)P(b)] This leads to the possibility that the combined selectivity P(a,b) could exceed P(b), which is not ideal. The addition of support for IN and OR clauses (commits `8f321bd16c` and `ccaa3569f5`) would seem to make this more likely, since the user-supplied values in such clauses are not necessarily compatible with the functional dependency. Mitigate this by using the formula P(a,b) = f Min(P(a), P(b)) + (1-f) * P(a) * P(b) instead, which guarantees that the combined selectivity is less than each column's individual selectivity. Logically, this modifies the part of the formula that accounts for dependent rows to handle cases where P(a) > P(b), whilst not changing the second term which accounts for independent rows. Additionally, this refactors the way that functional dependencies are applied, so now dependencies_clauselist_selectivity() estimates both the implying clauses and the implied clauses for each functional dependency (formerly only the implied clauses were estimated), and now all clauses for each attribute are taken into account (formerly only one clause for each implied attribute was estimated). This removes the previously built-in assumption that only equality clauses will be seen, which is no longer true, and opens up the possibility of applying functional dependencies to more general clauses. Patch by me, reviewed by Tomas Vondra. Discussion: https://postgr.es/m/CAEZATCXaNFZyOhR4XXAfkvj1tibRBEjje6ZbXwqWUB_tqbH%3Drw%40mail.gmail.com Discussion: https://postgr.es/m/20200318002946.6dvblukm3cfmgir2%40development	2020-03-28 12:48:34 +00:00
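The two formulas quoted in `87779aa474` are easy to compare numerically. The Python sketch below transcribes them directly from the commit message; the function names and the sample selectivities are assumptions chosen to show a case where P(a) > P(b).

```python
def old_combined_selectivity(p_a, p_b, f):
    """Former formula: P(a,b) = P(a) * [f + (1-f) * P(b)]."""
    return p_a * (f + (1.0 - f) * p_b)


def new_combined_selectivity(p_a, p_b, f):
    """New formula: P(a,b) = f * Min(P(a), P(b)) + (1-f) * P(a) * P(b)."""
    return f * min(p_a, p_b) + (1.0 - f) * p_a * p_b


# Hypothetical inputs: clause on a matches 50% of rows, clause on b
# matches 10%, and the dependency a => b has degree 0.9.
p_a, p_b, f = 0.5, 0.1, 0.9
old = old_combined_selectivity(p_a, p_b, f)  # about 0.455, above P(b)
new = new_combined_selectivity(p_a, p_b, f)  # about 0.095, below P(b)
```

With these inputs the former formula estimates that more rows satisfy both clauses than satisfy the clause on b alone, while the new formula stays at or below both individual selectivities. When P(a) <= P(b), the two formulas give the same result, since Min(P(a), P(b)) is then P(a).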
Peter Eisentraut	145cb16d3b	Cleanup in SQL features files Feature C011 was still listed in sql_feature_packages.txt but had been removed from sql_features.txt, so also remove from the former.	2020-03-28 08:46:18 +01:00
David Rowley	b07642dbcd	Trigger autovacuum based on number of INSERTs Traditionally autovacuum has only ever invoked a worker based on the estimated number of dead tuples in a table and for anti-wraparound purposes. For the latter, with certain classes of tables such as insert-only tables, anti-wraparound vacuums could be the first vacuum that the table ever receives. This could often lead to autovacuum workers being busy for extended periods of time due to having to potentially freeze every page in the table. This could be particularly bad for very large tables. New clusters, or recently pg_restored clusters could suffer even more as many large tables may have the same relfrozenxid, which could result in large numbers of tables requiring an anti-wraparound vacuum all at once. Here we aim to reduce the work required by anti-wraparound and aggressive vacuums in general, by triggering autovacuum when the table has received enough INSERTs. This is controlled by adding two new GUCs and reloptions; autovacuum_vacuum_insert_threshold and autovacuum_vacuum_insert_scale_factor. These work exactly the same as the existing scale factor and threshold controls, only base themselves off the number of inserts since the last vacuum, rather than the number of dead tuples. New controls were added rather than reusing the existing controls, to allow these new vacuums to be tuned independently and perhaps even completely disabled altogether, which can be done by setting autovacuum_vacuum_insert_threshold to -1. We make no attempt to skip index cleanup operations on these vacuums as they may trigger for an insert-mostly table which continually doesn't have enough dead tuples to trigger an autovacuum for the purpose of removing those dead tuples. If we were to skip cleaning the indexes in this case, then it is possible for the index(es) to become bloated over time. There are additional benefits to triggering autovacuums based on inserts, as tables which never contain enough dead tuples to trigger an autovacuum are now more likely to receive a vacuum, which can mark more of the table as "allvisible" and encourage the query planner to make use of Index Only Scans. Currently, we still obey vacuum_freeze_min_age when triggering these new autovacuums based on INSERTs. For large insert-only tables, it may be beneficial to lower the table's autovacuum_freeze_min_age so that tuples are eligible to be frozen sooner. Here we've opted not to zero that for these types of vacuums, since the table may just be insert-mostly and we may otherwise freeze tuples that are still destined to be updated or removed in the near future. There was some debate as to what exactly the new scale factor and threshold should default to. For now, these are set to 0.2 and 1000, respectively. There may be some motivation to adjust these before the release. Author: Laurenz Albe, Darafei Praliaskouski Reviewed-by: Alvaro Herrera, Masahiko Sawada, Chris Travers, Andres Freund, Justin Pryzby Discussion: https://postgr.es/m/CAC8Q8t%2Bj36G_bLF%3D%2B0iMo6jGNWnLnWb1tujXuJr-%2Bx8ZCCTqoQ%40mail.gmail.com	2020-03-28 19:20:12 +13:00
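The trigger condition described in `b07642dbcd` can be sketched in Python. The message says the new settings work the same way as the existing threshold and scale factor controls, so the sketch assumes the usual form (base threshold plus scale factor times the table's tuple count) and a strict greater-than comparison; the function and parameter names are hypothetical, and the defaults are the 1000 and 0.2 quoted in the message.

```python
def insert_vacuum_needed(inserts_since_vacuum, reltuples,
                         insert_threshold=1000, insert_scale_factor=0.2):
    """Decide whether inserts alone would trigger an autovacuum.

    insert_threshold mirrors autovacuum_vacuum_insert_threshold and
    insert_scale_factor mirrors autovacuum_vacuum_insert_scale_factor.
    """
    # A negative threshold (for example -1) disables insert-triggered
    # vacuums entirely, as described in the commit message.
    if insert_threshold < 0:
        return False
    limit = insert_threshold + insert_scale_factor * reltuples
    return inserts_since_vacuum > limit
```

Under these assumptions, a table with 1,000,000 tuples and default settings becomes eligible once roughly 201,000 rows have been inserted since its last vacuum, regardless of how few dead tuples it holds.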
Peter Geoghegan	9945ad6e90	Justify nbtree page split locking in code comment. Delaying unlocking the right child page until after the point that the left child's parent page has been refound is no longer truly necessary. Commit `40dae7ec` made nbtree tolerant of interrupted page splits. VACUUM was taught to avoid deleting a page that happens to be the right half of an incomplete split. As long as page splits don't unlock the left child page until the end of the second/final phase, it should be safe to unlock the right child page earlier (at the end of the first phase). It probably isn't actually useful to release the right child's lock earlier like this (it probably won't improve performance). Even still, pointing out that it ought to be safe to do so should make it easier to understand the overall design.	2020-03-27 16:44:52 -07:00
Alvaro Herrera	1e6148032e	Allow walreceiver configuration to change on reload The parameters primary_conninfo, primary_slot_name and wal_receiver_create_temp_slot can now be changed with a simple "reload" signal, no longer requiring a server restart. This is achieved by signalling the walreceiver process to terminate and having it start again with the new values. Thanks to Andres Freund, Kyotaro Horiguchi, Fujii Masao for discussion. Author: Sergei Kornilov <sk@zsrv.org> Reviewed-by: Michael Paquier <michael@paquier.xyz> Reviewed-by: Álvaro Herrera <alvherre@alvh.no-ip.org> Discussion: https://postgr.es/m/19513901543181143@sas1-19a94364928d.qloud-c.yandex.net	2020-03-27 19:51:37 -03:00
Alvaro Herrera	092c6936de	Set wal_receiver_create_temp_slot PGC_POSTMASTER Commit `3297308278` gave walreceiver the ability to create and use a temporary replication slot, and made it controllable by a GUC (enabled by default) that can be changed with SIGHUP. That's useful but has two problems: one, it's possible to cause the origin server to fill its disk if the slot doesn't advance in time; and also there's a disconnect between state passed down via the startup process and GUCs that walreceiver reads directly. We handle the first problem by setting the option to disabled by default. If the user enables it, it's on their head to make sure that disk doesn't fill up. We handle the second problem by passing the flag via startup rather than having walreceiver acquire it directly, and making it PGC_POSTMASTER (which ensures a walreceiver always has the fresh value). A future commit can relax this (to PGC_SIGHUP again) by having the startup process signal walreceiver to shutdown whenever the value changes. Author: Sergei Kornilov <sk@zsrv.org> Reviewed-by: Michael Paquier <michael@paquier.xyz> Reviewed-by: Álvaro Herrera <alvherre@alvh.no-ip.org> Discussion: https://postgr.es/m/20200122055510.GH174860@paquier.xyz	2020-03-27 16:20:33 -03:00
Tom Lane	fbc7a71608	Rearrange validity checks for plpgsql "simple" expressions. Buildfarm experience shows what probably should've occurred to me before: if a cache flush occurs partway through building a generic plan, then the plansource may have is_valid = false even though the plan is valid. We need to accept this case, use the generated plan, and then try to replan the next time. We can't try to replan immediately, because that would produce an infinite loop in CLOBBER_CACHE_ALWAYS builds; moreover it's really overkill. (We can assume that the plan is valid, it's just possibly a bit stale. Note that the pre-existing code behaved this way, and the non-simple-expression code paths do too.) Conversely, not using the generated plan would drop us into the not-a-simple-expression code path, which is bad for performance and would also cause regression-test failures due to visibly different error-reporting behavior. Hence, refactor the validity-check functions so that the initial check and recheck cases can react differently to plansource->is_valid. This makes their usage a bit simpler, too. Discussion: https://postgr.es/m/7072.1585332104@sss.pgh.pa.us	2020-03-27 14:47:34 -04:00
Peter Eisentraut	8d1b9648c5	Update SQL features Change F311 to supported. This was already accomplished when subfeature F311-04 (WITH CHECK OPTION) was added, but the top-level feature wasn't updated at the time.	2020-03-27 08:36:08 +01:00
Tom Lane	8f59f6b9c0	Improve performance of "simple expressions" in PL/pgSQL. For relatively simple expressions (say, "x + 1" or "x > 0"), plpgsql's management overhead exceeds the cost of evaluating the expression. This patch substantially improves that situation, providing roughly 2X speedup for such trivial expressions. First, add infrastructure in the plancache to allow fast re-validation of cached plans that contain no table access, and hence need no locks. Teach plpgsql to use this infrastructure for expressions that it's already deemed "simple" (which in particular will never contain table references). The fast path still requires checking that search_path hasn't changed, so provide a fast path for OverrideSearchPathMatchesCurrent by counting changes that have occurred to the active search path in the current session. This is simplistic but seems enough for now, seeing that PushOverrideSearchPath is not used in any performance-critical cases. Second, manage the refcounts on simple expressions' cached plans using a transaction-lifespan resource owner, so that we only need to take and release an expression's refcount once per transaction not once per expression evaluation. The management of this resource owner exactly parallels the existing management of plpgsql's simple-expression EState. Add some regression tests covering this area, in particular verifying that expression caching doesn't break semantics for search_path changes. Patch by me, but it owes something to previous work by Amit Langote, who recognized that getting rid of plancache-related overhead would be a useful thing to do here. Also thanks to Andres Freund for review. Discussion: https://postgr.es/m/CAFj8pRDRVfLdAxsWeVLzCAbkLFZhW549K+67tpOc-faC8uH8zw@mail.gmail.com	2020-03-26 18:58:57 -04:00
Tom Lane	86e5badd22	Ensure that plpgsql cleans up cleanly during parallel-worker exit. plpgsql_xact_cb ought to treat events XACT_EVENT_PARALLEL_COMMIT and XACT_EVENT_PARALLEL_ABORT like XACT_EVENT_COMMIT and XACT_EVENT_ABORT respectively, since its goal is to do process-local cleanup. This oversight caused plpgsql's end-of-transaction cleanup to not get done in parallel workers. Since a parallel worker will exit just after the transaction cleanup, the effects of this are limited. I couldn't find any case in the core code with user-visible effects, but perhaps there are some in extensions. In any case it's wrong, so let's fix it before it bites us not after. In passing, add some comments around the handling of expression evaluation resources in DO blocks. There's no live bug there, but it's quite unobvious what's happening; at least I thought so. This isn't related to the other issue, except that I found both things while poking at expression-evaluation performance. Back-patch the plpgsql_xact_cb fix to 9.5 where those event types were introduced, and the DO-block commentary to v11 where DO blocks gained the ability to issue COMMIT/ROLLBACK. Discussion: https://postgr.es/m/10353.1585247879@sss.pgh.pa.us	2020-03-26 18:06:55 -04:00
Magnus Hagander	eff5b245df	Document that pg_checksums exists in checksums README Author: Daniel Gustafsson <daniel@yesql.se>	2020-03-26 15:05:54 +01:00
Peter Eisentraut	49bf81536e	Drop slot's LWLock before returning from SaveSlotToPath() When SaveSlotToPath() is called with elevel=LOG, the early exits didn't release the slot's io_in_progress_lock. This could result in a walsender being stuck on the lock forever. A possible way to get into this situation is if the offending code paths are triggered in a low disk space situation. Author: Pavan Deolasee <pavan.deolasee@2ndquadrant.com> Reported-by: Craig Ringer <craig@2ndquadrant.com> Discussion: https://www.postgresql.org/message-id/flat/56a138c5-de61-f553-7e8f-6789296de785%402ndquadrant.com	2020-03-26 13:29:20 +01:00
Tom Lane	958aa438aa	Further fixes for ssl_passphrase_callback test module. The Makefile should set TAP_TESTS = 1, not implement the infrastructure for itself. For one thing, it missed the appropriate "make clean" steps. For another, the buildfarm isn't running this test because it wasn't hooked into "make installcheck" either.	2020-03-25 22:05:27 -04:00
Andrew Dunstan	e984fb341f	Don't listen to localhost in ssl_passphrase_callback test Commit `896fcdb230` contained an unnecessary setting that listened to localhost. Since the test doesn't actually try to make an SSL connection to the database this isn't required. Moreover, it's a security hole. Per gripe from Tom Lane.	2020-03-25 21:14:14 -04:00
Tom Lane	13c98bdfc4	Fix assorted portability issues in commit `896fcdb23`. Some platforms require libssl to be linked explicitly in the new SSL test module. Borrow contrib/sslinfo's code for that. Since src/test/modules/Makefile now has a variable SUBDIRS list, it needs to follow the ALWAYS_SUBDIRS protocol for that (cf. comments in Makefile.global.in). Blindly try to fix MSVC build failures by adding PGDLLIMPORT.	2020-03-25 19:37:30 -04:00
Andrew Dunstan	896fcdb230	Provide a TLS init hook The default hook function sets the default password callback function. In order to allow preloaded libraries to have an opportunity to override the default, TLS initialization is now delayed slightly until after shared preloaded libraries have been loaded. A test module is provided which contains a trivial example that decodes an obfuscated password for an SSL certificate. Author: Andrew Dunstan Reviewed By: Andreas Karlsson, Asaba Takanori Discussion: https://postgr.es/m/04116472-818b-5859-1d74-3d995aab2252@2ndQuadrant.com	2020-03-25 17:13:17 -04:00
Alvaro Herrera	ffd398021c	pg_dump new test: Change order of arguments Some getopt_long implementations don't like to have a non-option argument before option arguments, so put the database name as the last switch. Per buildfarm member hoverfly.	2020-03-25 15:15:32 -03:00
Alvaro Herrera	2f9eb31320	pg_dump: Allow dumping data of specific foreign servers The new command-line switch --include-foreign-data=PATTERN lets the user specify foreign servers from which to dump foreign table data. This can be refined by further inclusion/exclusion switches, so that the user has full control over which tables to dump. A limitation is that this doesn't work in combination with parallel dumps, for implementation reasons. This might be lifted in the future, but requires shuffling some code around. Author: Luis Carril <luis.carril@swarm64.com> Reviewed-by: Daniel Gustafsson <daniel@yesql.se> Reviewed-by: Surafel Temesgen <surafel3000@gmail.com> Reviewed-by: vignesh C <vignesh21@gmail.com> Reviewed-by: Álvaro Herrera <alvherre@2ndQuadrant.com> Discussion: https://postgr.es/m/LEJPR01MB0185483C0079D2F651B16231E7FC0@LEJPR01MB0185.DEUPRD01.PROD.OUTLOOK.DE	2020-03-25 13:19:31 -03:00
Tom Lane	bda6dedbea	Go back to returning int from ereport auxiliary functions. This reverts the parts of commit `17a28b0364` that changed ereport's auxiliary functions from returning dummy integer values to returning void. It turns out that a minority of compilers complain (not entirely unreasonably) about constructs such as (condition) ? errdetail(...) : 0 if errdetail() returns void rather than int. We could update those call sites to say "(void) 0" perhaps, but the expectation for this patch set was that ereport callers would not have to change anything. And this aspect of the patch set was already the most invasive and least compelling part of it, so let's just drop it. Per buildfarm. Discussion: https://postgr.es/m/CA+fd4k6N8EjNvZpM8nme+y+05mz-SM8Z_BgkixzkA34R+ej0Kw@mail.gmail.com	2020-03-25 11:57:36 -04:00
Peter Eisentraut	f5817595a7	Define EXEC_BACKEND in pg_config_manual.h It was for unclear reasons defined in a separate location, which makes it more cumbersome to override for testing, and it also did not have any prominent documentation. Move to pg_config_manual.h, where similar things are already collected. The previous definition on the command-line had the effect of defining it to the value 1, but now that we don't need that anymore we just define it to empty, to simplify manual editing a bit. Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://www.postgresql.org/message-id/flat/b7053ba8-b008-5335-31de-2fe4fe41ef0f%402ndquadrant.com	2020-03-25 14:31:14 +01:00
Peter Eisentraut	e8b1774fc2	Update SQL features The name of E182 was changed in SQL:2011. Also, we can change it to supported because all it requires is one embedded language to be supported, which we do.	2020-03-25 08:46:41 +01:00
Thomas Munro	352f6f2df6	Add collation versions for Windows. On Vista and later, use GetNLSVersionEx() to request collation version information. Reviewed-by: Juan José Santamaría Flecha <juanjo.santamaria@gmail.com> Discussion: https://postgr.es/m/CA%2BhUKGJvqup3s%2BJowVTcacZADO6dOhfdBmvOPHLS3KXUJu41Jw%40mail.gmail.com	2020-03-25 16:04:32 +13:00
Thomas Munro	382a821907	Allow NULL version for individual collations. Remove the documented restriction that collation providers must either return NULL for all collations or non-NULL for all collations. Use NULL for glibc collations like "C.UTF-8", which might otherwise lead future proposed commits to force unnecessary index rebuilds. Reviewed-by: Peter Eisentraut <peter.eisentraut@2ndquadrant.com> Discussion: https://postgr.es/m/CA%2BhUKGJvqup3s%2BJowVTcacZADO6dOhfdBmvOPHLS3KXUJu41Jw%40mail.gmail.com	2020-03-25 15:53:24 +13:00
Jeff Davis	dd8e19132a	Consider disk-based hash aggregation to implement DISTINCT. Correct oversight in `1f39bce0`. If enable_hashagg_disk=true, we should consider hash aggregation for DISTINCT when applicable.	2020-03-24 18:30:04 -07:00
Jeff Davis	3649133b14	Avoid allocating unnecessary zero-sized array. If there are no aggregates, there is no need to allocate an array of zero AggStatePerGroupData elements.	2020-03-24 18:30:04 -07:00
Peter Geoghegan	b150a76793	Fix nbtree deduplication README commentary. Descriptions of some aspects of how deduplication works were unclear in a couple of places.	2020-03-24 14:58:27 -07:00
Andres Freund	112b006fe7	logical decoding: Remove TODO about unnecessary optimization. Measurements show, and intuition agrees, that there's currently no known cases where adding a fastpath to avoid allocating / ordering a heap for a single transaction is worthwhile. Author: Dilip Kumar Discussion: https://postgr.es/m/CAFiTN-sp701wvzvnLQJGk7JDqrFM8f--97-ihbwkU8qvn=p8nw@mail.gmail.com	2020-03-24 12:15:03 -07:00
Peter Eisentraut	f15ace7935	Fix compiler warning on Cygwin `bf68b79e50` introduced an unused variable compiler warning on Cygwin.	2020-03-24 19:31:02 +01:00
Tom Lane	17a28b0364	Improve the internal implementation of ereport(). Change all the auxiliary error-reporting routines to return void, now that we no longer need to pretend they are passing something useful to errfinish(). While this probably doesn't save anything significant at the machine-code level, it allows detection of some additional types of mistakes. Pass the error location details (__FILE__, __LINE__, PG_FUNCNAME_MACRO) to errfinish not errstart. This shaves a few cycles off the case where errstart decides we're not going to emit anything. Re-implement elog() as a trivial wrapper around ereport(), removing the separate support infrastructure it used to have. Aside from getting rid of some now-surplus code, this means that elog() now really does have exactly the same semantics as ereport(), in particular that it can skip evaluation work if the message is not to be emitted. Andres Freund and Tom Lane Discussion: https://postgr.es/m/CA+fd4k6N8EjNvZpM8nme+y+05mz-SM8Z_BgkixzkA34R+ej0Kw@mail.gmail.com	2020-03-24 12:08:48 -04:00
Tom Lane	e3a87b4991	Re-implement the ereport() macro using __VA_ARGS__. Now that we require C99, we can depend on __VA_ARGS__ to work, and revising ereport() to use it has several significant benefits: * The extra parentheses around the auxiliary function calls are now optional. Aside from being a bit less ugly, this removes a common gotcha for new contributors, because in some cases the compiler errors you got from forgetting them were unintelligible. * The auxiliary function calls are now evaluated as a comma expression list rather than as extra arguments to errfinish(). This means that compilers can be expected to warn about no-op expressions in the list, allowing detection of several other common mistakes such as forgetting to add errmsg(...) when converting an elog() call to ereport(). * Unlike the situation with extra function arguments, comma expressions are guaranteed to be evaluated left-to-right, so this removes platform dependency in the order of the auxiliary function calls. While that dependency hasn't caused us big problems in the past, this change does allow dropping some rather shaky assumptions around errcontext() domain handling. There's no intention to make wholesale changes of existing ereport calls, but as proof-of-concept this patch removes the extra parens from a couple of calls in postgres.c. While new code can be written either way, code intended to be back-patched will need to use extra parens for awhile yet. It seems worth back-patching this change into v12, so as to reduce the window where we have to be careful about that by one year. Hence, this patch is careful to preserve ABI compatibility; a followup HEAD-only patch will make some additional simplifications. Andres Freund and Tom Lane Discussion: https://postgr.es/m/CA+fd4k6N8EjNvZpM8nme+y+05mz-SM8Z_BgkixzkA34R+ej0Kw@mail.gmail.com	2020-03-24 11:49:00 -04:00
Peter Eisentraut	cef27ae01a	Fix compiler warning A variable was unused in non-assert builds. Simplify the code to avoid the issue. Reported-by: Erik Rijkers <er@xs4all.nl>	2020-03-24 16:02:01 +01:00
Peter Eisentraut	97ee604d9b	Some refactoring of logical/worker.c This moves the main operations of apply_handle_{insert\|update\|delete}, that of inserting, updating, deleting a tuple into/from a given relation, into corresponding apply_handle_{insert\|update\|delete}_internal functions. This allows performing those operations on relations that are not directly the targets of replication, which is something a later patch will use for targeting partitioned tables. Author: Amit Langote <amitlangote09@gmail.com> Reviewed-by: Rafia Sabih <rafia.pghackers@gmail.com> Reviewed-by: Peter Eisentraut <peter.eisentraut@2ndquadrant.com> Discussion: https://www.postgresql.org/message-id/flat/CA+HiwqH=Y85vRK3mOdjEkqFK+E=ST=eQiHdpj43L=_eJMOOznQ@mail.gmail.com	2020-03-24 15:00:54 +01:00
Peter Eisentraut	d40d564c5a	Add support for other normal forms to Unicode normalization API It previously only supported NFKC, for use by SASLprep. This expands the API to offer the choice of all four normalization forms. Right now, there are no internal users of the forms other than NFKC. Reviewed-by: Daniel Verite <daniel@manitou-mail.org> Reviewed-by: Andreas Karlsson <andreas@proxel.se> Discussion: https://www.postgresql.org/message-id/flat/c1909f27-c269-2ed9-12f8-3ab72c8caf7a@2ndquadrant.com	2020-03-24 10:02:46 +01:00
Andres Freund	cedffbdb8b	Report wait event for cost-based vacuum delay. Author: Justin Pryzby Discussion: https://postgr.es/m/20200321040750.GD13662@telsasoft.com	2020-03-23 22:53:22 -07:00
Fujii Masao	496ee647ec	Prefer standby promotion over recovery pause. Previously if a promotion was triggered while recovery was paused, the paused state continued. Also recovery could be paused by executing pg_wal_replay_pause() even while a promotion was ongoing. That is, recovery pause had higher priority than a standby promotion. But this behavior was not desirable because most users basically wanted the recovery to complete as soon as possible and the server to become the master when they requested a promotion. This commit changes recovery so that it prefers a promotion over recovery pause. That is, if a promotion is triggered while recovery is paused, the paused state ends and a promotion continues. Also this commit makes recovery pause functions like pg_wal_replay_pause() throw an error if they are executed while a promotion is ongoing. Internally, this commit adds a new internal function PromoteIsTriggered() that returns true if a promotion is triggered. Since the name of this function and the existing function IsPromoteTriggered() are confusingly similar, the commit changes the name of IsPromoteTriggered() to IsPromoteSignaled(), as a more appropriate name. Author: Fujii Masao Reviewed-by: Atsushi Torikoshi, Sergei Kornilov Discussion: https://postgr.es/m/00c194b2-dbbb-2e8a-5b39-13f14048ef0a@oss.nttdata.com	2020-03-24 12:46:48 +09:00
Michael Paquier	e09ad07b21	Move routine building restore_command to src/common/ restore_command has only been used until now by the backend, but there is a pending patch for pg_rewind to make use of that in the frontend. Author: Alexey Kondratov Reviewed-by: Andrey Borodin, Andres Freund, Alvaro Herrera, Alexander Korotkov, Michael Paquier Discussion: https://postgr.es/m/a3acff50-5a0d-9a2c-b3b2-ee36168955c1@postgrespro.ru	2020-03-24 12:13:36 +09:00
Fujii Masao	b8e20d6dab	Add wait events for WAL archive and recovery pause. This commit introduces new wait events BackupWaitWalArchive and RecoveryPause. The former is reported while waiting for the WAL files required for the backup to be successfully archived. The latter is reported while waiting for recovery in pause state to be resumed. Author: Fujii Masao Reviewed-by: Michael Paquier, Atsushi Torikoshi, Robert Haas Discussion: https://postgr.es/m/f0651f8c-9c96-9f29-0ff9-80414a15308a@oss.nttdata.com	2020-03-24 11:12:21 +09:00
Jeff Davis	76df765e88	Reduce test time for disk-based Hash Aggregation. Discussion: https://postgr.es/m/23196.1584943506@sss.pgh.pa.us	2020-03-23 19:03:49 -07:00
Fujii Masao	67e0adfb3f	Report NULL as total backup size if it's not estimated. Previously 0 was reported in pg_stat_progress_basebackup.backup_total if the total backup size was not estimated. Per discussion, our consensus is that NULL is a better choice as the value in backup_total in that case. So this commit makes pg_stat_progress_basebackup view report NULL in backup_total column if the estimation is disabled. Bump catversion. Author: Fujii Masao Reviewed-by: Amit Langote, Magnus Hagander, Alvaro Herrera Discussion: https://postgr.es/m/CABUevExnhOD89zBDuPvfAAh243RzNpwCPEWNLtMYpKHMB8gbAQ@mail.gmail.com	2020-03-24 10:43:41 +09:00
Jeff Davis	64fe602279	Fixes for Disk-based Hash Aggregation. Justin Pryzby raised a couple issues with commit `1f39bce0`. Fixed. Also, tweak the way the size of a hash entry is estimated and the number of buckets is estimated when calling BuildTupleHashTableExt(). Discussion: https://www.postgresql.org/message-id/20200319064222.GR26184@telsasoft.com	2020-03-23 15:43:07 -07:00
Andres Freund	f801ceb696	Add regression tests for constraint errors in partitioned tables. While #16293 only applied to 11 (and 10 to some degree), it seems best to add tests to all branches with partitioning support. Reported-By: Daniel WM Author: Andres Freund Bug: #16293 Discussion: https://postgr.es/m/16293-26f5777d10143a66@postgresql.org Backpatch: 10-	2020-03-23 15:06:11 -07:00
Alexander Korotkov	0df94beb36	Fix ordering in typedefs.list	2020-03-24 00:59:17 +03:00
Tom Lane	980a70b976	Fix our getopt_long's behavior for a command line argument of just "-". src/port/getopt_long.c failed on such an argument, always seeing it as an unrecognized switch. This is unhelpful; better is to treat such an item as a non-switch argument. That behavior is what we find in GNU's getopt_long(); it's what src/port/getopt.c does; and it is required by POSIX for getopt(), which getopt_long() ought to be generally a superset of. Moreover, it's expected by ecpg, which intends an argument of "-" to mean "read from stdin". So fix it. Also add some documentation about ecpg's behavior in this area, since that was miserably underdocumented. I had to reverse-engineer it from the code. Per bug #16304 from James Gray. Back-patch to all supported branches, since this has been broken forever. Discussion: https://postgr.es/m/16304-c662b00a1322db7f@postgresql.org	2020-03-23 11:58:00 -04:00
Michael Paquier	faa650a99b	Revert "Refactor compile-time assertion checks in c.h" This reverts commit `b7f64c6`, which broke the fallback implementation for C++. We have discussed a couple of alternatives to reduce the number of implementations for those asserts, but nothing allowing to reduce the number of implementations down to three instead of four, so there is no benefit in keeping this patch. Thanks to Tom Lane for the discussion. Discussion: https://postgr.es/m/20200313115033.GA183471@paquier.xyz	2020-03-23 12:52:37 +09:00
Amit Kapila	33753ac9d7	Add object names to partition integrity violations. All errors of SQLSTATE class 23 should include the name of an object associated with the error in separate fields of the error report message. We do this so that applications need not try to extract them from the possibly-localized human-readable text of the message. Reported-by: Chris Bandy Author: Chris Bandy Reviewed-by: Amit Kapila and Amit Langote Discussion: https://postgr.es/m/0aa113a3-3c7f-db48-bcd8-f9290b2269ae@gmail.com	2020-03-23 08:09:15 +05:30
Michael Paquier	79dfa8afb2	Add bound checks for ssl_min_protocol_version and ssl_max_protocol_version Mixing incorrect bounds in the SSL context leads to confusing error messages generated by OpenSSL which are hard to act on. New range checks are added when both min/max parameters are loaded in the context of a SSL reload to improve the error reporting. Note that this does not make use of the GUC hook machinery contrary to `41aadee`, as there is no way to ensure a consistent range check (except if there is a way one day to define range types for GUC parameters?). Hence, this patch applies only to OpenSSL, and uses a logic similar to other parameters to trigger an error when reloading the SSL context in a session. Author: Michael Paquier Reviewed-by: Daniel Gustafsson Discussion: https://postgr.es/m/20200114035420.GE1515@paquier.xyz	2020-03-23 11:01:41 +09:00
Noah Misch	de9396326e	Revert "Skip WAL for new relfilenodes, under wal_level=minimal." This reverts commit `cb2fd7eac2`. Per numerous buildfarm members, it was incompatible with parallel query, and a test case assumed LP64. Back-patch to 9.5 (all supported versions). Discussion: https://postgr.es/m/20200321224920.GB1763544@rfd.leadboat.com	2020-03-22 09:24:09 -07:00
Tom Lane	d0587f52b3	Fix up recent breakage of headerscheck and cpluspluscheck. headerscheck and cpluspluscheck should skip the recently-added cmdtaglist.h header, since (like kwlist.h and some other similarly- designed headers) it's not meant to be included standalone. evtcache.h was missing an #include to support its usage of Bitmapset. typecmds.h was missing an #include to support its usage of ParseState. The first two of these were evidently oversights in commit `2f9661311`. I didn't track down exactly which change broke typecmds.h, but it must have been some rearrangement in one of its existing inclusions, because it's referenced ParseState for quite a long time and there were no complaints from these checking programs before.	2020-03-21 18:28:44 -04:00
Noah Misch	cb2fd7eac2	Skip WAL for new relfilenodes, under wal_level=minimal. Until now, only selected bulk operations (e.g. COPY) did this. If a given relfilenode received both a WAL-skipping COPY and a WAL-logged operation (e.g. INSERT), recovery could lose tuples from the COPY. See src/backend/access/transam/README section "Skipping WAL for New RelFileNode" for the new coding rules. Maintainers of table access methods should examine that section. To maintain data durability, just before commit, we choose between an fsync of the relfilenode and copying its contents to WAL. A new GUC, wal_skip_threshold, guides that choice. If this change slows a workload that creates small, permanent relfilenodes under wal_level=minimal, try adjusting wal_skip_threshold. Users setting a timeout on COMMIT may need to adjust that timeout, and log_min_duration_statement analysis will reflect time consumption moving to COMMIT from commands like COPY. Internally, this requires a reliable determination of whether RollbackAndReleaseCurrentSubTransaction() would unlink a relation's current relfilenode. Introduce rd_firstRelfilenodeSubid. Amend the specification of rd_createSubid such that the field is zero when a new rel has an old rd_node. Make relcache.c retain entries for certain dropped relations until end of transaction. Back-patch to 9.5 (all supported versions). This introduces a new WAL record type, XLOG_GIST_ASSIGN_LSN, without bumping XLOG_PAGE_MAGIC. As always, update standby systems before master systems. This changes sizeof(RelationData) and sizeof(IndexStmt), breaking binary compatibility for affected extensions. (The most recent commit to affect the same class of extensions was 089e4d405d0f3b94c74a2c6a54357a84a681754b.) Kyotaro Horiguchi, reviewed (in earlier, similar versions) by Robert Haas. Heikki Linnakangas and Michael Paquier implemented earlier designs that materially clarified the problem. Reviewed, in earlier designs, by Andrew Dunstan, Andres Freund, Alvaro Herrera, Tom Lane, Fujii Masao, and Simon Riggs. Reported by Martijn van Oosterhout. Discussion: https://postgr.es/m/20150702220524.GA9392@svana.org	2020-03-21 09:38:26 -07:00
Noah Misch	d3e572855b	In log_newpage_range(), heed forkNum and page_std arguments. The function assumed forkNum=MAIN_FORKNUM and page_std=true, ignoring the actual arguments. Existing callers passed exactly those values, so there's no live bug. Back-patch to v12, where the function first appeared, because another fix needs this. Discussion: https://postgr.es/m/20191118045434.GA1173436@rfd.leadboat.com	2020-03-21 09:38:26 -07:00
Noah Misch	e629a01f69	During heap rebuild, lock any TOAST index until end of transaction. swap_relation_files() calls toast_get_valid_index() to find and lock this index, just before swapping with the rebuilt TOAST index. The latter function releases the lock before returning. Potential for mischief is low; a concurrent session can issue ALTER INDEX ... SET (fillfactor = ...), which is not alarming. Nonetheless, changing pg_class.relfilenode without a lock is unconventional. Back-patch to 9.5 (all supported versions), because another fix needs this. Discussion: https://postgr.es/m/20191226001521.GA1772687@rfd.leadboat.com	2020-03-21 09:38:26 -07:00
Noah Misch	d60ef94d76	Fix cosmetic blemishes involving rd_createSubid. Remove an obsolete comment from AtEOXact_cleanup(). Restore formatting of a comment in struct RelationData, mangled by the pgindent run in commit `9af4159fce`. Back-patch to 9.5 (all supported versions), because another fix stacks on this.	2020-03-21 09:38:26 -07:00
Amit Kapila	3ba59ccc89	Allow page lock to conflict among parallel group members. This is required as it is no safer for two related processes to perform clean up in gin indexes at a time than for unrelated processes to do the same. After acquiring page locks, we can acquire relation extension lock but reverse never happens which means these will also not participate in deadlock. So, avoid checking wait edges from this lock. Currently, the parallel mode is strictly read-only, but after this patch we have the infrastructure to allow parallel inserts and parallel copy. Author: Dilip Kumar, Amit Kapila Reviewed-by: Amit Kapila, Kuntal Ghosh and Sawada Masahiko Discussion: https://postgr.es/m/CAD21AoCmT3cFQUN4aVvzy5chw7DuzXrJCbrjTU05B+Ss=Gn1LA@mail.gmail.com	2020-03-21 08:48:06 +05:30
Alvaro Herrera	069b750ca7	Fix bogus last-minute edit in `4e62091341` Noticed by Erik Rijkers before I was able to push the fix.	2020-03-20 18:13:12 -03:00
Alvaro Herrera	4e62091341	pg_dump: Add FOREIGN to ALTER statements, if appropriate Author: Luis Carril Reviewed-by: Tomas Vondra, Daniel Gustafsson, Álvaro Herrera Discussion: https://postgr.es/m/LEJPR01MB0185A19B2E7C98E5E2A031F5E7F20@LEJPR01MB0185.DEUPRD01.PROD.OUTLOOK.DE	2020-03-20 17:33:18 -03:00
Andrew Dunstan	71c2fd0c04	Turn off deprecated bison warnings under MSVC These are disabled by the configure code, so this is just fixing an inconsistency in the MSVC code. Backpatch to all live branches.	2020-03-20 13:55:15 -04:00
Peter Eisentraut	b03436994b	psql: Catch and report errors while printing result table Errors (for example I/O errors or disk full) while printing out result tables were completely ignored, which could result in silently truncated output in scripts, for example. Fix by adding some basic error checking and reporting. Author: Daniel Verite <daniel@manitou-mail.org> Author: David Zhang <david.zhang@highgo.ca> Discussion: https://www.postgresql.org/message-id/flat/9a0b3c8d-ee14-4b1d-9d0a-2c993bdabacc@manitou-mail.org	2020-03-20 16:04:15 +01:00
Amit Kapila	85f6b49c2c	Allow relation extension lock to conflict among parallel group members. This is required as it is no safer for two related processes to extend the same relation at a time than for unrelated processes to do the same. We don't acquire a heavyweight lock on any other object after relation extension lock which means such a lock can never participate in the deadlock cycle. So, avoid checking wait edges from this lock. This provides an infrastructure to allow parallel operations like insert, copy, etc. which were earlier not possible as parallel group members won't conflict for relation extension lock. Author: Dilip Kumar, Amit Kapila Reviewed-by: Amit Kapila, Kuntal Ghosh and Sawada Masahiko Discussion: https://postgr.es/m/CAD21AoCmT3cFQUN4aVvzy5chw7DuzXrJCbrjTU05B+Ss=Gn1LA@mail.gmail.com	2020-03-20 08:20:56 +05:30
Peter Geoghegan	b27e1b3418	nbtree: Remove obsolete _bt_pgaddtup() comments. Remove comments that are a throwback to a time when nbtree cared about write-ordering dependencies. The comments are similar to those removed by commit `9ee7414e`, among others.	2020-03-19 14:56:56 -07:00
Jeff Davis	2fd6a44ad5	Revert "Specialize MemoryContextMemAllocated()." This reverts commit `e00912e11a`.	2020-03-19 12:21:50 -07:00
Bruce Momjian	2247a1ea5f	pg_upgrade: make get_major_server_version() err msg consistent This patch fixes the error message in get_major_server_version() to be "could not parse version file", and uses the full file path name, rather than just the data directory path. Also, commit `4109bb5de4` added the cause of the failure to the "could not open" error message, and improved quoting. This patch backpatches the "could not open" cause to PG 12, where it was first widely used, and backpatches the quoting fix in that patch to all supported releases. Reported-by: Tom Lane Discussion: https://postgr.es/m/87pne2w98h.fsf@wibble.ilmari.org Author: Dagfinn Ilmari Mannsåker Backpatch-through: 9.5	2020-03-19 15:20:55 -04:00
Alexander Korotkov	45452825e5	Add new typedefs introduced in `773df883e8` to typedefs.list	2020-03-19 21:40:45 +03:00
Tom Lane	24e2885ee3	Introduce "anycompatible" family of polymorphic types. This patch adds the pseudo-types anycompatible, anycompatiblearray, anycompatiblenonarray, and anycompatiblerange. They work much like anyelement, anyarray, anynonarray, and anyrange respectively, except that the actual input values need not match precisely in type. Instead, if we can find a common supertype (using the same rules as for UNION/CASE type resolution), then the parser automatically promotes the input values to that type. For example, "myfunc(anycompatible, anycompatible)" can match a call with one integer and one bigint argument, with the integer automatically promoted to bigint. With anyelement in the definition, the user would have had to cast the integer explicitly. The new types also provide a second, independent set of type variables for function matching; thus with "myfunc(anyelement, anyelement, anycompatible) returns anycompatible" the first two arguments are constrained to be the same type, but the third can be some other type, and the result has the type of the third argument. The need for more than one set of type variables was foreseen back when we first invented the polymorphic types, but we never did anything about it. Pavel Stehule, revised a bit by me Discussion: https://postgr.es/m/CAFj8pRDna7VqNi8gR+Tt2Ktmz0cq5G93guc3Sbn_NVPLdXAkqA@mail.gmail.com	2020-03-19 11:43:11 -04:00
Fujii Masao	fab13dc50b	Make pg_basebackup ask the server to estimate the total backup size, by default. This commit changes pg_basebackup so that it specifies PROGRESS option in BASE_BACKUP replication command whether --progress is specified or not. This causes the server to estimate the total backup size and report it in pg_stat_progress_basebackup.backup_total, by default. This is a reasonable default because the time required for the estimation would not be so large in most cases. Also this commit adds new option --no-estimate-size to pg_basebackup. This option prevents the server from performing the estimation, and so is useful to avoid the estimation time if it's too long. Author: Fujii Masao Reviewed-by: Magnus Hagander, Amit Langote Discussion: https://postgr.es/m/CABUevEyDPPSjP7KRvfTXPdqOdY5aWNkqsB5aAXs3bco5ZwtGHg@mail.gmail.com	2020-03-19 17:09:00 +09:00
Peter Eisentraut	c314c147c0	Prepare to support non-tables in publications This by itself doesn't change any functionality but prepares the way for having relations other than base tables in publications. Make arrangements for COPY handling the initial table sync. For non-tables we have to use COPY (SELECT ...) instead of directly copying from the table, but then we have to take care to omit generated columns from the column list. Also, remove a hardcoded reference to relkind = 'r' and rely on the publisher to send only what it can actually publish, which will be correct even in future cross-version scenarios. Reviewed-by: Amit Langote <amitlangote09@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/CA+HiwqH=Y85vRK3mOdjEkqFK+E=ST=eQiHdpj43L=_eJMOOznQ@mail.gmail.com	2020-03-19 08:25:07 +01:00
Fujii Masao	1d253bae57	Rename the recovery-related wait events. This commit renames RecoveryWalAll and RecoveryWalStream wait events to RecoveryWalStream and RecoveryRetrieveRetryInterval, respectively, in order to make the names and what they are more consistent. For example, previously RecoveryWalAll was reported as a wait event while the recovery was waiting for WAL from a stream, and which was confusing because the name was very different from the situation where the wait actually could happen. The names of macro variables for those wait events also are renamed accordingly. This commit also changes the category of RecoveryRetrieveRetryInterval to Timeout from Activity because the wait event is reported while waiting based on wal_retrieve_retry_interval. Author: Fujii Masao Reviewed-by: Kyotaro Horiguchi, Atsushi Torikoshi Discussion: https://postgr.es/m/124997ee-096a-5d09-d8da-2c7a57d0816e@oss.nttdata.com	2020-03-19 15:32:55 +09:00
Amit Kapila	72e78d831a	Add assert to ensure that page locks don't participate in deadlock cycle. Assert that we don't acquire any other heavyweight lock while holding the page lock except for relation extension. However, these locks are never taken in reverse order which implies that page locks will never participate in the deadlock cycle. Similar to relation extension, page locks are also held for a short duration, so imposing such a restriction won't hurt. Author: Dilip Kumar, with few changes by Amit Kapila Reviewed-by: Amit Kapila, Kuntal Ghosh and Sawada Masahiko Discussion: https://postgr.es/m/CAD21AoCmT3cFQUN4aVvzy5chw7DuzXrJCbrjTU05B+Ss=Gn1LA@mail.gmail.com	2020-03-19 08:11:45 +05:30
Peter Geoghegan	6312c08a29	nbtree: Use raw PageAddItem() for retail inserts. Only internal page splits need to call _bt_pgaddtup() instead of PageAddItem(), and only for data items, one of which will end up at the first offset (or first offset after the high key offset) on the new right page. This data item alone will need to be truncated in _bt_pgaddtup(). Since there is no reason why retail inserts ever need to truncate the incoming item, use a raw PageAddItem() call there instead. Even _bt_split() uses raw PageAddItem() calls for left page and right page high keys. Clearly the _bt_pgaddtup() shim function wasn't really encapsulating anything. _bt_pgaddtup() should now be thought of as a _bt_split() helper function. Note that the assertions from commit `d1e241c2` verify that retail inserts never insert an item at an internal page's negative infinity offset. This invariant could only ever be violated as a result of a basic logic error in nbtinsert.c.	2020-03-18 18:17:37 -07:00
Michael Paquier	d41202f36e	Fix comment related to concurrent index swapping in index.c A comment about switching indisvalid of the new and old indexes swapped in REINDEX CONCURRENTLY got this backwards. Issue introduced by `5dc92b8`, the original commit of REINDEX CONCURRENTLY. Author: Julien Rouhaud Discussion: https://postgr.es/m/20200318143340.GA46897@nol Backpatch-through: 12	2020-03-19 09:51:33 +09:00
Jeff Davis	1f39bce021	Disk-based Hash Aggregation. While performing hash aggregation, track memory usage when adding new groups to a hash table. If the memory usage exceeds work_mem, enter "spill mode". In spill mode, new groups are not created in the hash table(s), but existing groups continue to be advanced if input tuples match. Tuples that would cause a new group to be created are instead spilled to a logical tape to be processed later. The tuples are spilled in a partitioned fashion. When all tuples from the outer plan are processed (either by advancing the group or spilling the tuple), finalize and emit the groups from the hash table. Then, create new batches of work from the spilled partitions, and select one of the saved batches and process it (possibly spilling recursively). Author: Jeff Davis Reviewed-by: Tomas Vondra, Adam Lee, Justin Pryzby, Taylor Vesely, Melanie Plageman Discussion: https://postgr.es/m/507ac540ec7c20136364b5272acbcd4574aa76ef.camel@j-davis.com	2020-03-18 15:42:02 -07:00
Jeff Davis	e00912e11a	Specialize MemoryContextMemAllocated(). An AllocSet doubles the size of allocated blocks (up to maxBlockSize), which means that the current block can represent half of the total allocated space for the memory context. But the free space in the current block may never have been touched, so don't count the untouched memory as allocated for the purposes of MemoryContextMemAllocated(). Discussion: https://postgr.es/m/ec63d70b668818255486a83ffadc3aec492c1f57.camel@j-davis.com	2020-03-18 15:39:14 -07:00
Alvaro Herrera	487e9861d0	Enable BEFORE row-level triggers for partitioned tables ... with the limitation that the tuple must remain in the same partition. Reviewed-by: Ashutosh Bapat Discussion: https://postgr.es/m/20200227165158.GA2071@alvherre.pgsql	2020-03-18 18:58:05 -03:00
Peter Geoghegan	b029395f5e	Refactor nbtree fastpath optimization. Commit `2b272734`, which added the fastpath rightmost leaf page cache insert optimization, added code to _bt_doinsert() to handle using and invalidating the backend local block cache. It doesn't seem like a good place to handle these low level details, though. _bt_doinsert() is supposed to be a high level function -- it is the main entry point to nbtinsert.c. Restructure the code by placing handling of the rightmost block cache at the start of a new _bt_search() shim function, _bt_search_insert(). The new function is called from _bt_doinsert(), which uses it as a _bt_search() variant that conveniently accepts its BTInsertState state as an argument. _bt_doinsert() no longer needs to directly consider the fastpath optimization. Discussion: https://postgr.es/m/CAH2-Wzk59cxKJRd=rfbyub6-V4yWRjsOYRkUNHBLT1P1GdtCQQ@mail.gmail.com	2020-03-18 14:42:49 -07:00
Peter Eisentraut	a2b1faa0f2	Implement type regcollation This will be helpful for a following commit and it's also just generally useful, like the other reg* types. Author: Julien Rouhaud Reviewed-by: Thomas Munro and Michael Paquier Discussion: https://postgr.es/m/CAEepm%3D0uEQCpfq_%2BLYFBdArCe4Ot98t1aR4eYiYTe%3DyavQygiQ%40mail.gmail.com	2020-03-18 21:21:00 +01:00
Tomas Vondra	ccaa3569f5	Recognize some OR clauses as compatible with functional dependencies Since commit `8f321bd16c` functional dependencies can handle IN clauses, which however introduced a possible (and surprising) inconsistency, because IN clauses may be expressed as an OR clause, which are still considered incompatible. For example a IN (1, 2, 3) may be rewritten as (a = 1 OR a = 2 OR a = 3). The IN clause will work fine with functional dependencies, but the OR clause will force the estimation to fall back to plain per-column estimates, possibly introducing significant estimation errors. This commit recognizes OR clauses equivalent to an IN clause (when all arguments are compatible and reference the same attribute) as a special case, compatible with functional dependencies. This allows applying functional dependencies, just like for IN clauses. This does not eliminate the difference in estimating the clause itself, i.e. IN clause and OR clause still use different formulas. It would be possible to change that (for these special OR clauses), but that's not really about extended statistics - it was always like this. Moreover the errors are usually much smaller compared to ignoring dependencies. Author: Tomas Vondra Reviewed-by: Dean Rasheed Discussion: https://www.postgresql.org/message-id/flat/13902317.Eha0YfKkKy%40pierred-pdoc	2020-03-18 16:41:49 +01:00
Tomas Vondra	6f72dbc48b	Fix wording of several extended stats comments Reported-by: Thomas Munro Discussion: https://www.postgresql.org/message-id/flat/20200113230008.g67iyk4cs3xbnjju@development	2020-03-18 13:40:13 +01:00
Amit Kapila	b4f140869f	Add missing errcode() in a few ereport calls. This will allow specifying the SQLSTATE error code for the errors in the missing places. Reported-by: Sawada Masahiko Author: Sawada Masahiko Backpatch-through: 9.5 Discussion: https://postgr.es/m/CA+fd4k6N8EjNvZpM8nme+y+05mz-SM8Z_BgkixzkA34R+ej0Kw@mail.gmail.com	2020-03-18 09:27:14 +05:30
Michael Paquier	fdeeb524b4	Fix typo in indexcmds.c Introduced by `61d7c7b`. Backpatch-through: 12	2020-03-18 11:13:12 +09:00

35665 Commits