postgresql

Commit Graph

Author	SHA1	Message	Date
Alvaro Herrera	afccd76f1c	Fix detaching partitions with cloned row triggers When a partition is detached, any triggers that had been cloned from its parent were not properly disentangled from its parent triggers. This resulted in triggers that could not be dropped because they depended on the trigger in the trigger in the no-longer-parent table: ALTER TABLE t DETACH PARTITION t1; DROP TRIGGER trig ON t1; ERROR: cannot drop trigger trig on table t1 because trigger trig on table t requires it HINT: You can drop trigger trig on table t instead. Moreover the table can no longer be re-attached to its parent, because the trigger name is already taken: ALTER TABLE t ATTACH PARTITION t1 FOR VALUES FROM (1)TO(2); ERROR: trigger "trig" for relation "t1" already exists The former is a bug introduced in commit `86f575948c`. (The latter is not necessarily a bug, but it makes the bug more uncomfortable.) To avoid the complexity that would be needed to tell whether the trigger has a local definition that has to be merged with the one coming from the parent table, establish the behavior that the trigger is removed when the table is detached. Backpatch to pg11. Author: Justin Pryzby <pryzby@telsasoft.com> Reviewed-by: Amit Langote <amitlangote09@gmail.com> Reviewed-by: Álvaro Herrera <alvherre@alvh.no-ip.org> Discussion: https://www.postgresql.org/message-id/flat/20200408152412.GZ2228@telsasoft.com	2020-04-21 13:57:00 -04:00
Peter Geoghegan	1542e16f2c	Consider outliers in split interval calculation. Commit `0d861bbb`, which introduced deduplication to nbtree, added some logic to take large posting list tuples into account when choosing a split point. We subtract firstright posting list overhead from the projected new high key size when calculating leftfree/rightfree values for an affected candidate split point. Posting list tuples aren't special to nbtsplitloc.c, but taking them into account like this makes a huge difference in practice. Posting list tuples are frequently tuple size outliers. However, commit `0d861bbb` missed a closely related issue: split interval itself is calculated based on the assumption that tuples on the page being split are roughly equisized. That assumption was acceptable back when commit `fab25024` taught the logic for choosing a split point about suffix truncation, but it's pretty questionable now that very large tuple sizes are common. This oversight led to unbalanced page splits in low cardinality multi-column indexes when deduplication was used: page splits that don't give sufficient weight to how unbalanced the split is when the interval happens to include some large posting list tuples (and when most other tuples on the page are not so large). Nail this down by calculating an initial split interval in a way that's attuned to the actual cost that we want to keep under control (not a fuzzy proxy for the cost): apply a leftfree + rightfree evenness test to each candidate split point that actually gets included in the split interval (for the default strategy). This replaces logic that used a percentage of all legal split points for the page as the basis of the initial split interval. Discussion: https://postgr.es/m/CAH2-WznJt5aT2uUB2Bs+JBLdwe0XTX67+xeLFcaNvCKxO=QBVQ@mail.gmail.com	2020-04-21 09:59:24 -07:00
Tom Lane	1c455078b0	Allow matchingsel() to be used with operators that might return NULL. Although selfuncs.c will never call a target operator with null inputs, some functions might return null anyway. The existing coding will fail if that happens (since FunctionCall2Coll will punt), which seems undesirable given that matchingsel() has such a broad range of potential applicability --- in fact, we already have a problem because we apply it to jsonb_path_exists_opr, which can return null. Hence, rejigger the underlying functions mcv_selectivity and histogram_selectivity to cope, treating a null result as false. While we are at it, we can move the InitFunctionCallInfoData overhead out of the inner loops, which isn't a huge number of cycles but might save something considering we are likely calling functions as cheap as int4eq(). Plus, the number of loop cycles to be expected is much more than it was when this code was written, since typical settings of default_statistics_target are higher. In view of that consideration, let's apply the same change to var_eq_const, eqjoinsel_inner, and eqjoinsel_semi. We do not expect equality functions to ever return null for non-null inputs (and certainly that code has been that way a long time without complaints), but the cycle savings seem attractive, especially in the eqjoinsel loops where there's potentially an O(N^2) savings. Similar code exists in ineq_histogram_selectivity and get_variable_range, but I forebore from changing those for now. The performance argument for changing ineq_histogram_selectivity is really weak anyway, since that will only iterate log2(N) times. Nikita Glukhov and Tom Lane Discussion: https://postgr.es/m/9d3b0959-95d6-c37e-2c0b-287bcfe5c705@postgrespro.ru	2020-04-21 12:56:55 -04:00
Tom Lane	9d25e1aa31	Clean up cpluspluscheck violation. "operator" is a reserved word in C++, so per project conventions, don't use it as an identifier in header files. My oversight in commit `a80818605`.	2020-04-21 11:21:15 -04:00
Robert Haas	079ac29d4d	Move the server's backup manifest code to a separate file. basebackup.c is already a pretty big and complicated file, so it makes more sense to keep the backup manifest support routines in a separate file, for clarity and ease of maintenance. Discussion: http://postgr.es/m/CA+TgmoavRak5OdP76P8eJExDYhPEKWjMb0sxW7dF01dWFgE=uA@mail.gmail.com	2020-04-20 14:38:15 -04:00
Alvaro Herrera	5fc703946b	Add ALTER .. NO DEPENDS ON Commit `f2fcad27d5` (9.6 era) added the ability to mark objects as dependent an extension, but forgot to add a way for such dependencies to be removed. This commit fixes that oversight. Strictly speaking this should be backpatched to 9.6, but due to lack of demand we're not doing so at this time. Discussion: https://postgr.es/m/20200217225333.GA30974@alvherre.pgsql Reviewed-by: ahsan hadi <ahsan.hadi@gmail.com> Reviewed-by: Ibrar Ahmed <ibrar.ahmad@gmail.com> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>	2020-04-20 13:42:12 -04:00
Magnus Hagander	7e4e574744	Allow pg_read_all_stats to access all stats views again The views pg_stat_progress_* had not gotten the memo that pg_read_all_stats is supposed to be able to read all statistics. Also make a pass over all text-returning pg_stat_xyz functions that could return "insufficient privilege" and make sure they also respect pg_read_all_status. Reported-by: Andrey M. Borodin Reviewed-by: Andrey M. Borodin, Kyotaro Horiguchi Discussion: https://postgr.es/m/13145F2F-8458-4977-9D2D-7B2E862E5722@yandex-team.ru	2020-04-20 12:53:40 +02:00
Jeff Davis	0cacb2b79d	Fix missing pfree() in logtape.c, missed by `24d85952`.	2020-04-19 10:33:06 -07:00
Tom Lane	f332241a60	Fix race conditions in synchronous standby management. We have repeatedly seen the buildfarm reach the Assert(false) in SyncRepGetSyncStandbysPriority. This apparently is due to failing to consider the possibility that the sync_standby_priority values in shared memory might be inconsistent; but they will be whenever only some of the walsenders have updated their values after a change in the synchronous_standby_names setting. That function is vastly too complex for what it does, anyway, so rewriting it seems better than trying to apply a band-aid fix. Furthermore, the API of SyncRepGetSyncStandbys is broken by design: it returns a list of WalSnd array indexes, but there is nothing guaranteeing that the contents of the WalSnd array remain stable. Thus, if some walsender exits and then a new walsender process takes over that WalSnd array slot, a caller might make use of WAL position data that it should not, potentially leading to incorrect decisions about whether to release transactions that are waiting for synchronous commit. To fix, replace SyncRepGetSyncStandbys with a new function SyncRepGetCandidateStandbys that copies all the required data from shared memory while holding the relevant mutexes. If the associated walsender process then exits, this data is still safe to make release decisions with, since we know that that much WAL was sent to a valid standby server. This incidentally means that we no longer need to treat sync_standby_priority as protected by the SyncRepLock rather than the per-walsender mutex. SyncRepGetSyncStandbys is no longer used by the core code, so remove it entirely in HEAD. However, it seems possible that external code is relying on that function, so do not remove it from the back branches. Instead, just remove the known-incorrect Assert. When the bug occurs, the function will return a too-short list, which callers should treat as meaning there are not enough sync standbys, which seems like a reasonably safe fallback until the inconsistent state is resolved. Moreover it's bug-compatible with what has been happening in non-assert builds. We cannot do anything about the walsender-replacement race condition without an API/ABI break. The bogus assertion exists back to 9.6, but 9.6 is sufficiently different from the later branches that the patch doesn't apply at all. I chose to just remove the bogus assertion in 9.6, feeling that the probability of a bad outcome from the walsender-replacement race condition is too low to justify rewriting the whole patch for 9.6. Discussion: https://postgr.es/m/21519.1585272409@sss.pgh.pa.us	2020-04-18 14:02:44 -04:00
David Rowley	3cb02e307e	Fix possible crash with GENERATED ALWAYS columns In some corner cases, this could also lead to corrupted values being included in the tuple. Users who are concerned that they are affected by this should first upgrade and then perform a base backup of their database and restore onto an off-line server. They should then query each table with generated columns to ensure there are no rows where the generated expression does not match a newly calculated version of the GENERATED ALWAYS expression. If no crashes occur and no rows are returned then you're not affected. Fixes bug #16369. Reported-by: Cameron Ezell Discussion: https://postgr.es/m/16369-5845a6f1bef59884@postgresql.org Backpatch-through: 12 (where GENERATED ALWAYS columns were added.)	2020-04-18 14:10:37 +12:00
Tom Lane	3125a5baec	Fix possible future cache reference leak in ALTER EXTENSION ADD/DROP. recordExtObjInitPriv and removeExtObjInitPriv were sloppy about calling ReleaseSysCache. The cases cannot occur given current usage in ALTER EXTENSION ADD/DROP, since we wouldn't get here for these relkinds; but it seems wise to clean up better. In passing, extend test logic in test_pg_dump to exercise the dropped-column code paths here. Since the case is unreachable at present, there seems no great need to back-patch; hence fix HEAD only. Kyotaro Horiguchi, with test case and comment adjustments by me Discussion: https://postgr.es/m/20200417.151831.1153577605111650154.horikyota.ntt@gmail.com	2020-04-17 13:41:59 -04:00
David Rowley	5b736e9cf9	Remove unneeded constraint dependency tracking It was previously thought that remove_useless_groupby_columns() needed to keep track of which constraints the generated plan depended upon, however, this is unnecessary. The confusion likely arose regarding this because of check_functional_grouping(), which does need to track the dependency to ensure VIEWs with columns which are functionally dependant on the GROUP BY remain so. For remove_useless_groupby_columns(), cached plans will just become invalidated when the primary key's underlying index is removed through the normal relcache invalidation code. Here we just remove the unneeded code which records the dependency and updates the comments. The previous comments claimed that we could not use UNIQUE constraints for the same optimization due to lack of a pg_constraint record for NOT NULL constraints (which are required because NULLs can be duplicated in a unique index). Since we don't actually need a pg_constraint record to handle the invalidation, it looks like we could add code to do this in the future. But not today. We're not really fixing any bug in the code here, this fix is just to set the record straight on UNIQUE constraints. This code was added back in 9.6, but due to lack of any bug, we'll not be backpatching this. Reviewed-by: Tom Lane Discussion: https://postgr.es/m/CAApHDvrdYa=VhOoMe4ZZjZ-G4ALnD-xuAeUNCRTL+PYMVN8OnQ@mail.gmail.com	2020-04-17 10:29:49 +12:00
Amit Kapila	24d2d38b1e	Fix the usage of parallel and full options of vacuum command. Earlier we were inconsistent in allowing the usage of parallel and full options. Change it such that we disallow them only when they are combined in a way that we don't support. In passing, improve the comments in some of the existing tests of parallel vacuum. Reported-by: Tushar Ahuja Author: Justin Pryzby, Amit Kapila Reviewed-by: Sawada Masahiko, Michael Paquier, Mahendra Singh Thalor and Amit Kapila Discussion: https://postgr.es/m/58c8d171-e665-6fa3-a9d3-d9423b694dae%40enterprisedb.com	2020-04-16 10:55:02 +05:30
Peter Geoghegan	f0ca378d4c	Slightly simplify nbtree split point choice loop. Spotted during post-commit review of the nbtree deduplication commit (commit `0d861bbb`).	2020-04-15 15:47:26 -07:00
Peter Geoghegan	4a05a64095	Remove obsolete "hole in center of page" comment. A comment from the Berkeley days incorrectly claimed that the page management code cares about the contents of the hole in the center of the page (at least in the case of the left half of an nbtree page split). Commit `8fa30f906b` added an addendum that stated that the original comment was "probably obsolete". It's definitely obsolete, though, so remove the original comment plus the addendum.	2020-04-14 14:38:28 -07:00
Tom Lane	2d59643dbc	Account for collation when coercing the output of a SQL function. Commit `913bbd88d` overlooked that the result of coerce_to_target_type might need collation fixups. Per report from Andreas Joseph Krogh. Discussion: https://postgr.es/m/VisenaEmail.72.37d08ec2b8cb8fb5.17179940cd3@tc7-visena	2020-04-14 17:30:36 -04:00
Andrew Dunstan	e60c6f6ea1	Set Perl search path more idiomatically Back in commits `1df92eeafe`, `f884a96819`, and `592123efbb` I used some hackish code to set the script search path, unaware despite decades of perl that there was a completely standard way to do this. This patch changes those cases to use the standard perl FindBin package.	2020-04-14 16:47:07 -04:00
Peter Geoghegan	80634e3b18	Rearrange _bt_insertonpg() "update metapage" code. Nest the "update metapage as part of insert into root-like page" branch inside the broader "insert into internal page" branch. This improves readability.	2020-04-14 09:33:18 -07:00
Michael Paquier	8128b0c152	Fix collection of typos and grammar mistakes in the tree, volume 2 This fixes some comments and documentation new as of Postgres 13, and is a follow-up of the work done in `dd0f37e`. Author: Justin Pryzby Discussion: https://postgr.es/m/20200408165653.GF2228@telsasoft.com	2020-04-14 14:45:43 +09:00
Peter Geoghegan	f762b2feba	Add defensive "split_only_page" nbtree assertion. Clearly it's not okay for nbtree to split a page that is the only page on its level, and then find that it has to split the parent one level up in turn. There is simply no code to handle the split_only_page case in the _bt_insertonpg() "newitem won't fit" branch (only the "newitem fits" branch handles split_only_page). Add a defensive assertion that will fail if a split_only_page call to _bt_insertonpg() somehow ends up splitting the target/parent page. I (pgeoghegan) believe that we don't need split_only_page handling for the "newitem won't fit" branch because anybody calling _bt_insertonpg() like this would have to hold a lock on the same one and only child page.	2020-04-13 21:11:03 -07:00
Amit Kapila	a6fea120a7	Comments and doc fixes for commit `40d964ec99`. Reported-by: Justin Pryzby Author: Justin Pryzby, with few changes by me Reviewed-by: Amit Kapila and Sawada Masahiko Discussion: https://postgr.es/m/20200322021801.GB2563@telsasoft.com	2020-04-14 08:10:27 +05:30
Peter Geoghegan	826ee1a019	Make _bt_insertonpg() more like _bt_split(). It seems like a good idea for nbtree's retail insert code to be absolutely consistent with nbtree's page split code for anything that naturally requires equivalent handling. Anything that concerns inserting newitem (which is handled as part of the page split atomic action when a page split is required) should work in exactly the same way. With that in mind, make _bt_insertonpg() handle 'cbuf' in a way that matches _bt_split().	2020-04-13 19:26:41 -07:00
Peter Geoghegan	bc3087b626	Harmonize nbtree page split point code. An nbtree split point can be thought of as a point between two adjoining tuples from an imaginary version of the page being split that includes the incoming/new item (in addition to the items that really are on the page). These adjoining tuples are called the lastleft and firstright tuples. The variables that represent split points contained a field called firstright, which is an offset number of the first data item from the original page that goes on the new right page. The corresponding tuple from origpage was usually the same thing as the actual firstright tuple, but not always: the firstright tuple is sometimes the new/incoming item instead. This situation seems unnecessarily confusing. Make things clearer by renaming the origpage offset returned by _bt_findsplitloc() to "firstrightoff". We now have a firstright tuple and a firstrightoff offset number which are comparable to the newitem/lastleft tuples and the newitemoff/lastleftoff offset numbers respectively. Also make sure that we are consistent about how we describe nbtree page split point state. Push the responsibility for dealing with pg_upgrade'd !heapkeyspace indexes down to lower level code, relieving _bt_split() from dealing with it directly. This means that we always have a palloc'd left page high key on the leaf level, no matter what. This enables simplifying some of the code (and code comments) within _bt_split(). Finally, restructure the page split code to make it clearer why suffix truncation (which only takes place during leaf page splits) is completely different to the first data item truncation that takes place during internal page splits. Tuples are marked as having fewer attributes stored in both cases, and the firstright tuple is truncated in both cases, so it's easy to imagine somebody missing the distinction.	2020-04-13 16:39:55 -07:00
Andrew Dunstan	7be5d8df1f	Use perl warnings pragma consistently We've had a mixture of the warnings pragma, the -w switch on the shebang line, and no warnings at all. This patch removes the -w swicth and add the warnings pragma to all perl sources missing it. It raises the severity of the TestingAndDebugging::RequireUseWarnings perlcritic policy to level 5, so that we catch any future violations. Discussion: https://postgr.es/m/20200412074245.GB623763@rfd.leadboat.com	2020-04-13 11:55:45 -04:00
Amit Kapila	ef08ca113f	Cosmetic fixups for WAL usage work. Reported-by: Justin Pryzby and Euler Taveira Author: Justin Pryzby and Julien Rouhaud Reviewed-by: Amit Kapila Discussion: https://postgr.es/m/CAB-hujrP8ZfUkvL5OYETipQwA=e3n7oqHFU=4ZLxWS_Cza3kQQ@mail.gmail.com	2020-04-13 15:31:16 +05:30
Peter Eisentraut	0c620a5803	Improve error messages after LoadLibrary() Move the file name to a format parameter to ease translatability. Add error code where missing. Make the wording consistent.	2020-04-13 10:24:46 +02:00
Tom Lane	35cb574aa8	Suppress -Wimplicit-fallthrough warning in new LIMIT WITH TIES code. The placement of the fall-through comment in this code appears not to work to suppress the warning in recent gcc. Move it to the bottom of the case group, and add an assertion that we didn't get there through some other code path. Also improve wording of nearby comments. Julien Rouhaud, comment hacking by me Discussion: https://postgr.es/m/CAOBaU_aLdPGU5wCpaowNLF-Q8328iR7mj1yJAhMOVsdLwY+sdg@mail.gmail.com	2020-04-11 15:02:44 -04:00
Noah Misch	328c70997b	Optimize RelationFindReplTupleSeq() for CLOBBER_CACHE_ALWAYS. Specifically, remember lookup_type_cache() results instead of retrieving them once per comparison. Under CLOBBER_CACHE_ALWAYS, this reduced src/test/subscription/t/001_rep_changes.pl elapsed time by an order of magnitude, which reduced check-world elapsed time by 9%. Discussion: https://postgr.es/m/20200406085420.GC162712@rfd.leadboat.com	2020-04-11 10:30:12 -07:00
Noah Misch	4216858122	When WalSndCaughtUp, sleep only in WalSndWaitForWal(). Before sleeping, WalSndWaitForWal() sends a keepalive if MyWalSnd->write < sentPtr. That is important in logical replication. When the latest physical LSN yields no logical replication messages (a common case), that keepalive elicits a reply, and processing the reply updates pg_stat_replication.replay_lsn. WalSndLoop() lacks that; when WalSndLoop() slept, replay_lsn advancement could stall until wal_receiver_status_interval elapsed. This sometimes stalled src/test/subscription/t/001_rep_changes.pl for up to 10s. Discussion: https://postgr.es/m/20200406063649.GA3738151@rfd.leadboat.com	2020-04-11 10:30:00 -07:00
Tom Lane	969f9d0b4b	Make EXPLAIN report maximum hashtable usage across multiple rescans. Before discarding the old hash table in ExecReScanHashJoin, capture its statistics, ensuring that we report the maximum hashtable size across repeated rescans of the hash input relation. We can repurpose the existing code for reporting hashtable size in parallel workers to help with this, making the patch pretty small. This also ensures that if rescans happen within parallel workers, we get the correct maximums across all instances. Konstantin Knizhnik and Tom Lane, per diagnosis by Thomas Munro of a trouble report from Alvaro Herrera. Discussion: https://postgr.es/m/20200323165059.GA24950@alvherre.pgsql	2020-04-11 12:39:19 -04:00
Tom Lane	5c27bce7f3	Clear dangling pointer to avoid bogus EXPLAIN printout in a corner case. ExecReScanHashJoin will destroy the join's hash table if it expects that the inner relation will produce different rows on rescan. Up to now it's not bothered to clear the additional pointer to that hash table that exists in the child HashState node. However, it's possible for the query to terminate without building a fresh hash table (this happens if the outer relation is found to be empty during the final rescan). So we can end with a dangling pointer to a deleted hash table. That was harmless originally, but since 9.0 EXPLAIN ANALYZE has used that pointer to print hash table statistics. In debug builds this reproducibly results in garbage statistics. In non-debug builds there's frequently no ill effects, but in principle one could get wrong EXPLAIN ANALYZE output, or perhaps even a crash if free() has released the hashtable memory back to the OS. To fix, just make sure we clear the additional pointer when destroying the hash table. In problematic cases, EXPLAIN ANALYZE will then print no hashtable statistics (reverting to its pre-9.0 behavior). This isn't ideal, but since the problem manifests only in unusual corner cases, it's hard to justify taking any risks to do better in the back branches. A follow-on patch will improve matters in HEAD. Konstantin Knizhnik and Tom Lane, per diagnosis by Thomas Munro of a trouble report from Alvaro Herrera. Discussion: https://postgr.es/m/20200323165059.GA24950@alvherre.pgsql	2020-04-11 12:29:06 -04:00
Peter Eisentraut	12fb189bfe	Fix RELCACHE_FORCE_RELEASE issue Introduced by `83fd4532a7`. To fix, the tuple descriptors need to be copied into the current memory context. Discussion: https://www.postgresql.org/message-id/04d78603-edae-9243-9dde-fe3037176a7d@2ndquadrant.com	2020-04-11 15:07:25 +02:00
Peter Eisentraut	5a1d0c9925	Fix relcache reference leak Introduced by `83fd4532a7`	2020-04-11 09:44:14 +02:00
Tom Lane	401418ca6a	Suppress unused-variable warning. Ashutosh Bapat Discussion: https://postgr.es/m/CAG-ACPWPB8Lc_aFj25eiPFqi31YB5vmaZnb39mbHSf5Yej=miA@mail.gmail.com	2020-04-10 12:00:28 -04:00
Michael Paquier	dd0f37ecce	Fix collection of typos and grammar mistakes in the tree This fixes some comments and documentation new as of Postgres 13. Author: Justin Pryzby Discussion: https://postgr.es/m/20200408165653.GF2228@telsasoft.com	2020-04-10 11:18:39 +09:00
Tom Lane	2e0e409e3c	Further cleanup of ts_headline code. Suppress a probably-meaningless uninitialized-variable warning (induced by my previous patch, I'm sorry to say). Improve mark_hl_fragments()'s test for overlapping cover strings: it failed to consider the possibility that the current string is strictly within another one. That's unlikely given the preceding splitting into MaxWords fragments, but I don't think it's impossible. Discussion: https://postgr.es/m/16345-2e0cf5cddbdcd3b4@postgresql.org	2020-04-09 15:38:43 -04:00
Tom Lane	c9b0c678d3	Fix default text search parser's ts_headline code for phrase queries. This code could produce very poor results when asked to highlight a string based on a query using phrase-match operators. The root cause is that hlCover(), which is supposed to find a minimal substring that matches the query, was written assuming that word position is not significant. I'm only 95% convinced that its algorithm was correct even for plain AND/OR queries; but it definitely fails completely for phrase matches, causing it to possibly not identify a cover string at all. Hence, rewrite hlCover() with a less-tense algorithm that just tries all the possible substrings, earlier and shorter ones first. (This is not as bad as it sounds performance-wise, because all of the string matching has been done already: the repeated tsquery match checks boil down to pointer comparisons.) Unfortunately, since that approach produces more candidate cover strings than before, it also exposes that there were bugs in the heuristics in mark_hl_words() for selecting a best cover string. Fixes there include: * Do not apply the ShortWord filter to words that appear in the query. * Remove a misguided optimization for quickly rejecting a cover. * Fix order-of-operation bug that could cause computation of a wrong figure of merit (poslen) when shortening a cover. * Change the preference rule so that candidate headlines that do not include their whole cover string (after MaxWords trimming) are lowest priority, since they may not actually satisfy the user's query. This results in some changes in existing regression test cases, but they all seem reasonable. Note in particular that the tests involving strings like "1 2 3" were previously being affected by the ShortWord filter, masking the normal matching behavior. Per bug #16345 from Augustinas Jokubauskas; the new test cases are based on that example. Back-patch to 9.6 where phrase search was added to tsquery. Discussion: https://postgr.es/m/16345-2e0cf5cddbdcd3b4@postgresql.org	2020-04-09 13:19:23 -04:00
Tom Lane	b10f8bb9fd	Cosmetic improvements for default text search parser's ts_headline code. This code was woefully unreadable and under-commented. Try to improve matters by adding comments, using some macros to make complicated if-tests more readable, using boolean type where appropriate, etc. There are a couple of tiny coding improvements too, but this commit includes (I hope) no behavioral change. Nonetheless, back-patch as far as 9.6, because a followup bug-fixing commit depends on this. Discussion: https://postgr.es/m/16345-2e0cf5cddbdcd3b4@postgresql.org	2020-04-09 12:36:59 -04:00
Peter Eisentraut	e92e4a2b68	Fix CREATE TABLE LIKE INCLUDING GENERATED column order issue CREATE TABLE LIKE INCLUDING GENERATED would fail if a generated column referred to a column with a higher attribute number. This is because the column mapping mechanism created the mapping incrementally as columns are added. This was sufficient for previous uses of that mechanism (omitting dropped columns), and it also happened to work if generated columns only referred to columns with lower attribute numbers, but here it failed. This fix is to build the attribute mapping in a separate loop before processing the columns in detail. Bug: #16342 Reported-by: Ethan Waldo <ewaldo@healthetechs.com> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>	2020-04-09 16:36:45 +02:00
Fujii Masao	1ec50a81ec	Exclude backup_manifest file that existed in database, from BASE_BACKUP. If there is already a backup_manifest file in the database cluster, it belongs to the past backup that was used to start this server. It is not correct for the backup being taken now. So this commit changes pg_basebackup so that it always skips such backup_manifest file. The backup_manifest file for the current backup will be injected separately if users want it. Author: Fujii Masao Reviewed-by: Robert Haas Discussion: https://postgr.es/m/78f76a3d-1a28-a97d-0394-5c96985dd1c0@oss.nttdata.com	2020-04-09 22:37:11 +09:00
Amit Kapila	5c71362174	Allow parallel create index to accumulate buffer usage stats. Currently, we don't account for buffer usage incurred by parallel workers for parallel create index. This commit allows each worker to record the buffer usage stats and leader backend to accumulate that stats at the end of the operation. This will allow pg_stat_statements to display correct buffer usage stats for (parallel) create index command. Reported-by: Julien Rouhaud Author: Sawada Masahiko Reviewed-by: Dilip Kumar, Julien Rouhaud and Amit Kapila Backpatch-through: 11, where this was introduced Discussion: https://postgr.es/m/20200328151721.GB12854@nol	2020-04-09 09:49:30 +05:30
Thomas Munro	d140f2f3e2	Rationalize GetWalRcv{Write,Flush}RecPtr(). GetWalRcvWriteRecPtr() previously reported the latest flushed location. Adopt the conventional terminology used elsewhere in the tree by renaming it to GetWalRcvFlushRecPtr(), and likewise for some related variables that used the term "received". Add a new definition of GetWalRcvWriteRecPtr(), which returns the latest written value. This will allow later patches to use the value for non-data-integrity purposes, without having to wait for the flush pointer to advance. Reviewed-by: Alvaro Herrera <alvherre@2ndquadrant.com> Reviewed-by: Andres Freund <andres@anarazel.de> Discussion: https://postgr.es/m/CA%2BhUKGJ4VJN8ttxScUFM8dOKX0BrBiboo5uz1cq%3DAovOddfHpA%40mail.gmail.com	2020-04-08 23:45:09 +12:00
Peter Eisentraut	83fd4532a7	Allow publishing partition changes via ancestors To control whether partition changes are replicated using their own identity and schema or an ancestor's, add a new parameter that can be set per publication named 'publish_via_partition_root'. This allows replicating a partitioned table into a different partition structure on the subscriber. Author: Amit Langote <amitlangote09@gmail.com> Reviewed-by: Rafia Sabih <rafia.pghackers@gmail.com> Reviewed-by: Peter Eisentraut <peter.eisentraut@2ndquadrant.com> Reviewed-by: Petr Jelinek <petr@2ndquadrant.com> Discussion: https://www.postgresql.org/message-id/flat/CA+HiwqH=Y85vRK3mOdjEkqFK+E=ST=eQiHdpj43L=_eJMOOznQ@mail.gmail.com	2020-04-08 11:19:23 +02:00
Alexander Korotkov	1aac32df89	Revert `0f5ca02f53` `0f5ca02f53` introduces 3 new keywords. It appears to be too much for relatively small feature. Given now we past feature freeze, it's already late for discussion of the new syntax. So, revert. Discussion: https://postgr.es/m/28209.1586294824%40sss.pgh.pa.us	2020-04-08 11:37:27 +03:00
David Rowley	02a2e8b442	Modify additional power 2 calculations to use new helper functions 2nd pass of modifying various places which obtain the next power of 2 of a number and make them use the new functions added in `f0705bb62`. In passing, also modify num_combinations(). This can be implemented using simple bitshifting rather than looping. Reviewed-by: John Naylor Discussion: https://postgr.es/m/20200114173553.GE32763%40fetter.org	2020-04-08 18:29:51 +12:00
Michael Paquier	c0187869a0	Fix crash when using COLLATE in partition bound expressions Attempting to use a COLLATE clause with a type that it not collatable in a partition bound expression could crash the server. This commit fixes the code by adding more checks similar to what is done when computing index or partition attributes by making sure that there is a collation iff the type is collatable. Backpatch down to 12, as `7c079d7` introduced this problem. Reported-by: Alexander Lakhin Author: Dmitry Dolgov Discussion: https://postgr.es/m/16325-809194cf742313ab@postgresql.org Backpatch-through: 12	2020-04-08 15:04:51 +09:00
David Rowley	d025cf88ba	Modify various power 2 calculations to use new helper functions First pass of modifying various places that obtain the next power of 2 of a number and make them use the new functions added in pg_bitutils.h instead. This also removes the _hash_log2() function. There are no longer any callers in core. Other users can swap their _hash_log2(n) call to make use of pg_ceil_log2_32(n). Author: David Fetter, with some minor adjustments by me Reviewed-by: John Naylor, Jesse Zhang Discussion: https://postgr.es/m/20200114173553.GE32763%40fetter.org	2020-04-08 16:55:03 +12:00
Jeff Davis	50a38f6517	Create memory context for HashAgg with a reasonable maxBlockSize. If the memory context's maxBlockSize is too big, a single block allocation can suddenly exceed work_mem. For Hash Aggregation, this can mean spilling to disk too early or reporting a confusing memory usage number for EXPLAN ANALYZE. Introduce CreateWorkExprContext(), which is like CreateExprContext(), except that it creates the AllocSet with a maxBlockSize that is reasonable in proportion to work_mem. Right now, CreateWorkExprContext() is only used by Hash Aggregation, but it may be generally useful in the future. Discussion: https://postgr.es/m/412a3fbf306f84d8d78c4009e11791867e62b87c.camel@j-davis.com	2020-04-07 21:25:28 -07:00
Tom Lane	7a5d74b7dd	Put back mistakenly removed #include. In commit `4dbcb3f84` I removed some code from parse_coerce.c, and also removed some apparently-no-longer-needed #includes. But removing datum.h broke some not-compiled-by-default code. Discussion: https://postgr.es/m/20200407205436.pyjhddw5bn5upvsu@development	2020-04-08 00:10:16 -04:00
Thomas Munro	3985b600f5	Support PrefetchBuffer() in recovery. Provide PrefetchSharedBuffer(), a variant that takes SMgrRelation, for use in recovery. Rename LocalPrefetchBuffer() to PrefetchLocalBuffer() for consistency. Add a return value to all of these. In recovery, tolerate and report missing files, so we can handle relations unlinked before crash recovery began. Also report cache hits and misses, so that callers can do faster buffer lookups and better I/O accounting. Reviewed-by: Alvaro Herrera <alvherre@2ndquadrant.com> Reviewed-by: Andres Freund <andres@anarazel.de> Discussion: https://postgr.es/m/CA%2BhUKGJ4VJN8ttxScUFM8dOKX0BrBiboo5uz1cq%3DAovOddfHpA%40mail.gmail.com	2020-04-08 14:56:57 +12:00
Tom Lane	981643dcdb	Allow partitionwise join to handle nested FULL JOIN USING cases. This case didn't work because columns merged by FULL JOIN USING are represented in the parse tree by COALESCE expressions, and the logic for recognizing a partitionable join failed to match upper-level join clauses to such expressions. To fix, synthesize suitable COALESCE expressions and add them to the nullable_partexprs lists. This is pretty ugly and brute-force, but it gets the job done. (I have ambitions of rethinking the way outer-join output Vars are represented, so maybe that will provide a cleaner solution someday. For now, do this.) Amit Langote, reviewed by Justin Pryzby, Richard Guo, and myself Discussion: https://postgr.es/m/CA+HiwqG2WVUGmLJqtR0tPFhniO=H=9qQ+Z3L_ZC+Y3-EVQHFGg@mail.gmail.com	2020-04-07 22:12:14 -04:00
Etsuro Fujita	c8434d64ce	Allow partitionwise joins in more cases. Previously, the partitionwise join technique only allowed partitionwise join when input partitioned tables had exactly the same partition bounds. This commit extends the technique to some cases when the tables have different partition bounds, by using an advanced partition-matching algorithm introduced by this commit. For both the input partitioned tables, the algorithm checks whether every partition of one input partitioned table only matches one partition of the other input partitioned table at most, and vice versa. In such a case the join between the tables can be broken down into joins between the matching partitions, so the algorithm produces the pairs of the matching partitions, plus the partition bounds for the join relation, to allow partitionwise join for computing the join. Currently, the algorithm works for list-partitioned and range-partitioned tables, but not hash-partitioned tables. See comments in partition_bounds_merge(). Ashutosh Bapat and Etsuro Fujita, most of regression tests by Rajkumar Raghuwanshi, some of the tests by Mark Dilger and Amul Sul, reviewed by Dmitry Dolgov and Amul Sul, with additional review at various points by Ashutosh Bapat, Mark Dilger, Robert Haas, Antonin Houska, Amit Langote, Justin Pryzby, and Tomas Vondra Discussion: https://postgr.es/m/CAFjFpRdjQvaUEV5DJX3TW6pU5eq54NCkadtxHX2JiJG_GvbrCA@mail.gmail.com	2020-04-08 10:25:00 +09:00
Tom Lane	41a194f491	Fix circle_in to accept "(x,y),r" as it's advertised to do. Our documentation describes four allowed input syntaxes for circles, but the regression tests tried only three ... with predictable consequences. Remarkably, this has been wrong since the circle datatype was added in 1997, but nobody noticed till now. David Zhang, with some help from me Discussion: https://postgr.es/m/332c47fa-d951-7574-b5cc-a8f7f7201202@highgo.ca	2020-04-07 20:50:28 -04:00
Andres Freund	75848bc744	snapshot scalability: Move delayChkpt from PGXACT to PGPROC. The goal of separating hotly accessed per-backend data from PGPROC into PGXACT is to make accesses fast (GetSnapshotData() in particular). But delayChkpt is not actually accessed frequently; only when starting a checkpoint. As it is frequently modified (multiple times in the course of a single transaction), storing it in the same cacheline as hotly accessed data unnecessarily dirties a contended cacheline. Therefore move delayChkpt to PGPROC. This is part of a larger series of patches intending to improve GetSnapshotData() scalability. It is committed and pushed separately, as it is independently beneficial (small but measurable win, limited by the other frequent modifications of PGXACT). Author: Andres Freund Reviewed-By: Robert Haas, Thomas Munro, David Rowley Discussion: https://postgr.es/m/20200301083601.ews6hz5dduc3w2se@alap3.anarazel.de	2020-04-07 17:36:23 -07:00
Tomas Vondra	2b88fdde30	Track SLRU page hits in SimpleLruReadPage_ReadOnly SLRU page hits were tracked only in SimpleLruReadPage, but that's not enough because we may hit the page in SimpleLruReadPage_ReadOnly in which case we don't call SimpleLruReadPage at all. Reported-by: Kuntal Ghosh Discussion: https://postgr.es/m/20200119143707.gyinppnigokesjok@development	2020-04-08 02:15:47 +02:00
Andres Freund	91c40548d5	Fix XLogReader FD leak that makes backends unusable after 2PC usage. Before the fix every 2PC commit/abort leaked a file descriptor. As the files are opened using BasicOpenFile(), that quickly leads to the backend running out of file descriptors. Once enough 2PC abort/commit have caused enough FDs to leak, any IO in the backend will fail with "Too many open files", as BasicOpenFilePerm() will have triggered all open files known to fd.c to be closed. The leak causing the problem at hand is a consequence of `0dc8ead46`, but is only exascerbated by it. Previously most XLogPageReadCB callbacks used static variables to cache one open file, but after the commit the cache is private to each XLogReader instance. There never was infrastructure to close FDs at the time of XLogReaderFree, but the way XLogReader was used limited the leak to one FD. This commit just closes the during XLogReaderFree() if the FD is stored in XLogReaderState.seg.ws_segno. This may not be the way to solve this medium/long term, but at least unbreaks 2PC. Discussion: https://postgr.es/m/20200406025651.fpzdb5yyb7qyhqko@alap3.anarazel.de	2020-04-07 17:03:04 -07:00
Peter Geoghegan	60cbd7751c	Remove nbtree BTreeTupleSetAltHeapTID() function. Since heap TID is supposed to be just another key attribute to the implementation, it doesn't make much sense to have separate BTreeTupleSetNAtts() and BTreeTupleSetAltHeapTID() functions. Merge the two functions together. This slightly simplifies _bt_truncate().	2020-04-07 15:56:52 -07:00
Alvaro Herrera	c655077639	Allow users to limit storage reserved by replication slots Replication slots are useful to retain data that may be needed by a replication system. But experience has shown that allowing them to retain excessive data can lead to the primary failing because of running out of space. This new feature allows the user to configure a maximum amount of space to be reserved using the new option max_slot_wal_keep_size. Slots that overrun that space are invalidated at checkpoint time, enabling the storage to be released. Author: Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-by: Jehan-Guillaume de Rorthais <jgdr@dalibo.com> Reviewed-by: Álvaro Herrera <alvherre@alvh.no-ip.org> Discussion: https://postgr.es/m/20170228.122736.123383594.horiguchi.kyotaro@lab.ntt.co.jp	2020-04-07 18:35:00 -04:00
Alexander Korotkov	0f5ca02f53	Implement waiting for given lsn at transaction start This commit adds following optional clause to BEGIN and START TRANSACTION commands. WAIT FOR LSN lsn [ TIMEOUT timeout ] New clause pospones transaction start till given lsn is applied on standby. This clause allows user be sure, that changes previously made on primary would be visible on standby. New shared memory struct is used to track awaited lsn per backend. Recovery process wakes up backend once required lsn is applied. Author: Ivan Kartyshov, Anna Akenteva Reviewed-by: Craig Ringer, Thomas Munro, Robert Haas, Kyotaro Horiguchi Reviewed-by: Masahiko Sawada, Ants Aasma, Dmitry Ivanov, Simon Riggs Reviewed-by: Amit Kapila, Alexander Korotkov Discussion: https://postgr.es/m/0240c26c-9f84-30ea-fca9-93ab2df5f305%40postgrespro.ru	2020-04-07 23:51:10 +03:00
Alvaro Herrera	357889eb17	Support FETCH FIRST WITH TIES WITH TIES is an option to the FETCH FIRST N ROWS clause (the SQL standard's spelling of LIMIT), where you additionally get rows that compare equal to the last of those N rows by the columns in the mandatory ORDER BY clause. There was a proposal by Andrew Gierth to implement this functionality in a more powerful way that would yield more features, but the other patch had not been finished at this time, so we decided to use this one for now in the spirit of incremental development. Author: Surafel Temesgen <surafel3000@gmail.com> Reviewed-by: Álvaro Herrera <alvherre@alvh.no-ip.org> Reviewed-by: Tomas Vondra <tomas.vondra@2ndquadrant.com> Discussion: https://postgr.es/m/CALAY4q9ky7rD_A4vf=FVQvCGngm3LOes-ky0J6euMrg=_Se+ag@mail.gmail.com Discussion: https://postgr.es/m/87o8wvz253.fsf@news-spur.riddles.org.uk	2020-04-07 16:22:13 -04:00
Tom Lane	26a944cf29	Adjust bytea get_bit/set_bit to use int8 not int4 for bit numbering. Since the existing bit number argument can't exceed INT32_MAX, it's not possible for these functions to manipulate bits beyond the first 256MB of a bytea value. Lift that restriction by redeclaring the bit number arguments as int8 (which requires a catversion bump, hence is not back-patchable). The similarly-named functions for bit/varbit don't really have a problem because we restrict those types to at most VARBITMAXLEN bits; hence leave them alone. While here, extend the encode/decode functions in utils/adt/encode.c to allow dealing with values wider than 1GB. This is not a live bug or restriction in current usage, because no input could be more than 1GB, and since none of the encoders can expand a string more than 4X, the result size couldn't overflow uint32. But it might be desirable to support more in future, so make the input length values size_t and the potential-output-length values uint64. Also add some test cases to improve the miserable code coverage of these functions. Movead Li, editorialized some by me; also reviewed by Ashutosh Bapat Discussion: https://postgr.es/m/20200312115135445367128@highgo.ca	2020-04-07 15:57:58 -04:00
Tomas Vondra	9c74ceb20b	Remove debugging elog from pgstat_recv_resetslrucounter Reported-by: Thomas Munro	2020-04-07 19:20:20 +02:00
Tomas Vondra	d22782a539	Minor improvements in Incremental Sort explain Some places still used "Maximum" instead of "Peak" when displaying info about sort space, so fix that. Also, add a comment clarifying why it's correct to check the number of full/prefix sort groups. Author: James Coleman Discussion: https://postgr.es/m/CAPpHfds1waRZ=NOmueYq0sx1ZSCnt+5QJvizT8ndT2=etZEeAQ@mail.gmail.com	2020-04-07 18:25:13 +02:00
Fujii Masao	4bd0ad9e44	Prevent archive recovery from scanning non-existent WAL files. Previously when there were multiple timelines listed in the history file of the recovery target timeline, archive recovery searched all of them, starting from the newest timeline to the oldest one, to find the segment to read. That is, archive recovery had to continuously fail scanning the segment until it reached the timeline that the segment belonged to. These scans for non-existent segment could be harmful on the recovery performance especially when archival area was located on the remote storage and each scan could take a long time. To address the issue, this commit changes archive recovery so that it skips scanning the timeline that the segment to read doesn't belong to. Author: Kyotaro Horiguchi, tweaked a bit by Fujii Masao Reviewed-by: David Steele, Pavel Suderevsky, Grigory Smolkin Discussion: https://postgr.es/m/16159-f5a34a3a04dc67e0@postgresql.org Discussion: https://postgr.es/m/20200129.120222.1476610231001551715.horikyota.ntt@gmail.com	2020-04-08 00:49:29 +09:00
Tomas Vondra	ba3e76cc57	Consider Incremental Sort paths at additional places Commit `d2d8a229bc` introduced Incremental Sort, but it was considered only in create_ordered_paths() as an alternative to regular Sort. There are many other places that require sorted input and might benefit from considering Incremental Sort too. This patch modifies a number of those places, but not all. The concern is that just adding Incremental Sort to any place that already adds Sort may increase the number of paths considered, negatively affecting planning time, without any benefit. So we've taken a more conservative approach, based on analysis of which places do affect a set of queries that did seem practical. This means some less common queries may not benefit from Incremental Sort yet. Author: Tomas Vondra Reviewed-by: James Coleman Discussion: https://postgr.es/m/CAPpHfds1waRZ=NOmueYq0sx1ZSCnt+5QJvizT8ndT2=etZEeAQ@mail.gmail.com	2020-04-07 16:43:22 +02:00
Tom Lane	c7654f6a37	Fix representation of SORT_TYPE_STILL_IN_PROGRESS. It turns out that the code did indeed rely on a zeroed TuplesortInstrumentation.sortMethod field to indicate "this worker never did anything", although it seems the issue only comes up during certain race-condition-y cases. Hence, rearrange the TuplesortMethod enum to restore SORT_TYPE_STILL_IN_PROGRESS to having the value zero, and add some comments reinforcing that that isn't optional. Also future-proof a loop over the possible values of the enum. sizeof(bits32) happened to be the correct limit value, but only by purest coincidence. Per buildfarm and local investigation. Discussion: https://postgr.es/m/12222.1586223974@sss.pgh.pa.us	2020-04-06 22:22:13 -04:00
Thomas Munro	4c04be9b05	Introduce xid8-based functions to replace txid_XXX. The txid_XXX family of fmgr functions exposes 64 bit transaction IDs to users as int8. Now that we have an SQL type xid8 for FullTransactionId, define a new set of functions including pg_current_xact_id() and pg_current_snapshot() based on that. Keep the old functions around too, for now. It's a bit sneaky to use the same C functions for both, but since the binary representation is identical except for the signedness of the type, and since older functions are the ones using the wrong signedness, and since we'll presumably drop the older ones after a reasonable period of time, it seems reasonable to switch to FullTransactionId internally and share the code for both. Reviewed-by: Fujii Masao <masao.fujii@oss.nttdata.com> Reviewed-by: Takao Fujii <btfujiitkp@oss.nttdata.com> Reviewed-by: Yoshikazu Imai <imai.yoshikazu@fujitsu.com> Reviewed-by: Mark Dilger <mark.dilger@enterprisedb.com> Discussion: https://postgr.es/m/20190725000636.666m5mad25wfbrri%40alap3.anarazel.de	2020-04-07 12:04:32 +12:00
Thomas Munro	aeec457de8	Add SQL type xid8 to expose FullTransactionId to users. Similar to xid, but 64 bits wide. This new type is suitable for use in various system views and administration functions. Reviewed-by: Fujii Masao <masao.fujii@oss.nttdata.com> Reviewed-by: Takao Fujii <btfujiitkp@oss.nttdata.com> Reviewed-by: Yoshikazu Imai <imai.yoshikazu@fujitsu.com> Reviewed-by: Mark Dilger <mark.dilger@enterprisedb.com> Discussion: https://postgr.es/m/20190725000636.666m5mad25wfbrri%40alap3.anarazel.de	2020-04-07 12:03:59 +12:00
Tomas Vondra	4bea576b03	Use INT64_FORMAT when formatting int64 values in explain Per report from lapwing.	2020-04-07 01:16:57 +02:00
Peter Geoghegan	ce2cee0ade	Fix nbtree kill_prior_tuple posting list assert. An assertion added by commit `0d861bbb` checked that _bt_killitems() only processes a BTScanPosItem whose heap TID is contained in a posting list tuple when its page offset number still matches what is on the page (i.e. when it matches the posting list tuple's current offset number). This was only correct in the common case where the page can't have changed since we first read it. It was not correct in cases where we don't drop the buffer pin (and don't need to verify the page hasn't changed using its LSN). The latter category includes scans involving unlogged tables, and scans that use a non-MVCC snapshot, per the logic originally introduced by commit `2ed5b87f`. The assertion still seems helpful. Fix it by taking cases where the page may have been concurrently modified into account. Reported-By: Anastasia Lubennikova, Alexander Lakhin Discussion: https://postgr.es/m/c4e38e9a-0f9c-8e53-e639-adf343f94472@postgrespro.ru	2020-04-06 14:46:33 -07:00
Tomas Vondra	7d6d82a524	Fix show_incremental_sort_info with force_parallel_mode When executed with force_parallel_mode=regress, the function was exiting too early and thus failed to print the worker stats. Fixed by making it more like show_sort_info. Discussion: https://postgr.es/m/CAPpHfds1waRZ=NOmueYq0sx1ZSCnt+5QJvizT8ndT2=etZEeAQ@mail.gmail.com	2020-04-06 23:19:13 +02:00
Tomas Vondra	d2d8a229bc	Implement Incremental Sort Incremental Sort is an optimized variant of multikey sort for cases when the input is already sorted by a prefix of the requested sort keys. For example when the relation is already sorted by (key1, key2) and we need to sort it by (key1, key2, key3) we can simply split the input rows into groups having equal values in (key1, key2), and only sort/compare the remaining column key3. This has a number of benefits: - Reduced memory consumption, because only a single group (determined by values in the sorted prefix) needs to be kept in memory. This may also eliminate the need to spill to disk. - Lower startup cost, because Incremental Sort produce results after each prefix group, which is beneficial for plans where startup cost matters (like for example queries with LIMIT clause). We consider both Sort and Incremental Sort, and decide based on costing. The implemented algorithm operates in two different modes: - Fetching a minimum number of tuples without check of equality on the prefix keys, and sorting on all columns when safe. - Fetching all tuples for a single prefix group and then sorting by comparing only the remaining (non-prefix) keys. We always start in the first mode, and employ a heuristic to switch into the second mode if we believe it's beneficial - the goal is to minimize the number of unnecessary comparions while keeping memory consumption below work_mem. This is a very old patch series. The idea was originally proposed by Alexander Korotkov back in 2013, and then revived in 2017. In 2018 the patch was taken over by James Coleman, who wrote and rewrote most of the current code. There were many reviewers/contributors since 2013 - I've done my best to pick the most active ones, and listed them in this commit message. Author: James Coleman, Alexander Korotkov Reviewed-by: Tomas Vondra, Andreas Karlsson, Marti Raudsepp, Peter Geoghegan, Robert Haas, Thomas Munro, Antonin Houska, Andres Freund, Alexander Kuzmenkov Discussion: https://postgr.es/m/CAPpHfdscOX5an71nHd8WSUH6GNOCf=V7wgDaTXdDd9=goN-gfA@mail.gmail.com Discussion: https://postgr.es/m/CAPpHfds1waRZ=NOmueYq0sx1ZSCnt+5QJvizT8ndT2=etZEeAQ@mail.gmail.com	2020-04-06 21:35:10 +02:00
Peter Eisentraut	f1ac27bfda	Add logical replication support to replicate into partitioned tables Mainly, this adds support code in logical/worker.c for applying replicated operations whose target is a partitioned table to its relevant partitions. Author: Amit Langote <amitlangote09@gmail.com> Reviewed-by: Rafia Sabih <rafia.pghackers@gmail.com> Reviewed-by: Peter Eisentraut <peter.eisentraut@2ndquadrant.com> Reviewed-by: Petr Jelinek <petr@2ndquadrant.com> Discussion: https://www.postgresql.org/message-id/flat/CA+HiwqH=Y85vRK3mOdjEkqFK+E=ST=eQiHdpj43L=_eJMOOznQ@mail.gmail.com	2020-04-06 15:15:52 +02:00
Amit Kapila	b7ce6de93b	Allow autovacuum to log WAL usage statistics. This commit allows autovacuum to log WAL usage statistics added by commit `df3b181499`. Author: Julien Rouhaud Reviewed-by: Dilip Kumar and Amit Kapila Discussion: https://postgr.es/m/CAB-hujrP8ZfUkvL5OYETipQwA=e3n7oqHFU=4ZLxWS_Cza3kQQ@mail.gmail.com	2020-04-06 16:24:51 +05:30
Michael Paquier	8ef9451f58	Refactor cluster.c to use new routine get_index_isclustered() This new cache lookup routine has been introduced in `a40caf5`, and more code paths can directly use it. Note that in cluster_rel(), the code was returning immediately if the tuple's entry in pg_index for the clustered index was not valid. This commit changes the code so as a lookup error is raised instead, something that could not happen from the start as we check for the existence of the index beforehand, while holding an exclusive lock on the parent table. Author: Justin Pryzby Reviewed-by: Álvaro Herrera, Michael Paquier Discussion: https://postgr.es/m/20200202161718.GI13621@telsasoft.com	2020-04-06 11:44:23 +09:00
Amit Kapila	33e05f89c5	Add the option to report WAL usage in EXPLAIN and auto_explain. This commit adds a new option WAL similar to existing option BUFFERS in the EXPLAIN command. This option allows to include information on WAL record generation added by commit `df3b181499` in EXPLAIN output. This also allows the WAL usage information to be displayed via the auto_explain module. A new parameter auto_explain.log_wal controls whether WAL usage statistics are printed when an execution plan is logged. This parameter has no effect unless auto_explain.log_analyze is enabled. Author: Julien Rouhaud Reviewed-by: Dilip Kumar and Amit Kapila Discussion: https://postgr.es/m/CAB-hujrP8ZfUkvL5OYETipQwA=e3n7oqHFU=4ZLxWS_Cza3kQQ@mail.gmail.com	2020-04-06 08:02:15 +05:30
Michael Paquier	a40caf5f86	Preserve clustered index after rewrites with ALTER TABLE A table rewritten by ALTER TABLE would lose tracking of an index usable for CLUSTER. This setting is tracked by pg_index.indisclustered and is controlled by ALTER TABLE, so some extra work was needed to restore it properly. Note that ALTER TABLE only marks the index that can be used for clustering, and does not do the actual operation. Author: Amit Langote, Justin Pryzby Reviewed-by: Ibrar Ahmed, Michael Paquier Discussion: https://postgr.es/m/20200202161718.GI13621@telsasoft.com Backpatch-through: 9.5	2020-04-06 11:03:49 +09:00
Andres Freund	fc3f4453a2	Recompute stack base in forked postmaster children. This is for the benefit of running postgres under the rr debugger. When using rr signal handlers running while a syscall is active use an alternative stack. As e.g. bgworkers are started from within signal handlers, the forked backend then has a different stack base than postmaster. Previously that subsequently lead to those processes triggering spurious "stack depth limit exceeded" errors. Discussion: https://postgr.es/m/20200327182217.ubrrl32lyfhxfwk5@alap3.anarazel.de	2020-04-05 18:23:30 -07:00
Andres Freund	f946069e68	Use TransactionXmin instead of RecentGlobalXmin in heap_abort_speculative(). There's a very low risk that RecentGlobalXmin could be far enough in the past to be older than relfrozenxid, or even wrapped around. Luckily the consequences of that having happened wouldn't be too bad - the page wouldn't be pruned for a while. Avoid that risk by using TransactionXmin instead. As that's announced via MyPgXact->xmin, it is protected against wrapping around (see code comments for details around relfrozenxid). Author: Andres Freund Discussion: https://postgr.es/m/20200328213023.s4eyijhdosuc4vcj@alap3.anarazel.de Backpatch: 9.5-	2020-04-05 17:47:30 -07:00
Andres Freund	549a3e23c3	Fix recently introduced typo. Reported-By: David Rowley	2020-04-05 12:03:09 -07:00
Peter Eisentraut	a9d9bdd3ad	Save errno across LWLockRelease() calls Fixup for "Drop slot's LWLock before returning from SaveSlotToPath()" Reported-by: Michael Paquier <michael@paquier.xyz>	2020-04-05 10:02:00 +02:00
Tom Lane	07871d40c7	Remove bogus Assert, add some regression test cases showing why. Commit `77ec5affb` added an assertion to enforce_generic_type_consistency that boils down to "if the function result is polymorphic, there must be at least one polymorphic argument". This should be true for user-created functions, but there are built-in functions for which it's not true, as pointed out by Jaime Casanova. Hence, go back to the old behavior of leaving the return type alone. There's only a limited amount of stuff you can do with such a function result, but it does work to some extent; add some regression test cases to ensure we don't break that again. Discussion: https://postgr.es/m/CAJGNTeMbhtsCUZgJJ8h8XxAJbK7U2ipsX8wkHRtZRz-NieT8RA@mail.gmail.com	2020-04-04 18:03:30 -04:00
Noah Misch	c6b92041d3	Skip WAL for new relfilenodes, under wal_level=minimal. Until now, only selected bulk operations (e.g. COPY) did this. If a given relfilenode received both a WAL-skipping COPY and a WAL-logged operation (e.g. INSERT), recovery could lose tuples from the COPY. See src/backend/access/transam/README section "Skipping WAL for New RelFileNode" for the new coding rules. Maintainers of table access methods should examine that section. To maintain data durability, just before commit, we choose between an fsync of the relfilenode and copying its contents to WAL. A new GUC, wal_skip_threshold, guides that choice. If this change slows a workload that creates small, permanent relfilenodes under wal_level=minimal, try adjusting wal_skip_threshold. Users setting a timeout on COMMIT may need to adjust that timeout, and log_min_duration_statement analysis will reflect time consumption moving to COMMIT from commands like COPY. Internally, this requires a reliable determination of whether RollbackAndReleaseCurrentSubTransaction() would unlink a relation's current relfilenode. Introduce rd_firstRelfilenodeSubid. Amend the specification of rd_createSubid such that the field is zero when a new rel has an old rd_node. Make relcache.c retain entries for certain dropped relations until end of transaction. Bump XLOG_PAGE_MAGIC, since this introduces XLOG_GIST_ASSIGN_LSN. Future servers accept older WAL, so this bump is discretionary. Kyotaro Horiguchi, reviewed (in earlier, similar versions) by Robert Haas. Heikki Linnakangas and Michael Paquier implemented earlier designs that materially clarified the problem. Reviewed, in earlier designs, by Andrew Dunstan, Andres Freund, Alvaro Herrera, Tom Lane, Fujii Masao, and Simon Riggs. Reported by Martijn van Oosterhout. Discussion: https://postgr.es/m/20150702220524.GA9392@svana.org	2020-04-04 12:25:34 -07:00
Peter Eisentraut	552fcebff0	Revert "Improve handling of parameter differences in physical replication" This reverts commit `246f136e76`. That patch wasn't quite complete enough. Discussion: https://www.postgresql.org/message-id/flat/E1jIpJu-0007Ql-CL%40gemulon.postgresql.org	2020-04-04 09:08:12 +02:00
Amit Kapila	df3b181499	Add infrastructure to track WAL usage. This allows gathering the WAL generation statistics for each statement execution. The three statistics that we collect are the number of WAL records, the number of full page writes and the amount of WAL bytes generated. This helps the users who have write-intensive workload to see the impact of I/O due to WAL. This further enables us to see approximately what percentage of overall WAL is due to full page writes. In the future, we can extend this functionality to allow us to compute the the exact amount of WAL data due to full page writes. This patch in itself is just an infrastructure to compute WAL usage data. The upcoming patches will expose this data via explain, auto_explain, pg_stat_statements and verbose (auto)vacuum output. Author: Kirill Bychik, Julien Rouhaud Reviewed-by: Dilip Kumar, Fujii Masao and Amit Kapila Discussion: https://postgr.es/m/CAB-hujrP8ZfUkvL5OYETipQwA=e3n7oqHFU=4ZLxWS_Cza3kQQ@mail.gmail.com	2020-04-04 10:02:08 +05:30
Jeff Davis	0588ee63aa	Include chunk overhead in hash table entry size estimate. Don't try to be precise about it, just use a constant 16 bytes of chunk overhead. Being smarter would require knowing the memory context where the chunk will be allocated, which is not known by all callers. Discussion: https://postgr.es/m/20200325220936.il3ni2fj2j2b45y5@alap3.anarazel.de	2020-04-03 20:07:58 -07:00
Robert Haas	3e0d80fd8d	Fix resource management bug with replication=database. Commit `0d8c9c1210` allowed BASE_BACKUP to acquire a ResourceOwner without a transaction so that the backup manifest functionality could use a BufFile, but it overlooked the fact that when a walsender is used with replication=database, it might have a transaction in progress, because in that mode, SQL and replication commands can be mixed. Try to fix things up so that the two cleanup mechanisms don't conflict. Per buildfarm member serinus, which triggered the problem when CREATE_REPLICATION_SLOT failed from inside a transaction. It passed on the subsequent run, so evidently the failure doesn't happen every time.	2020-04-03 22:28:37 -04:00
Robert Haas	db1531cae0	Be more careful about time_t vs. pg_time_t in basebackup.c. lapwing is complaining that about a call to pg_gmtime, saying that it "expected 'const pg_time_t ' but argument is of type 'time_t '". I at first thought that the problem had someting to do with const, but Thomas Munro suggested that it might be just because time_t and pg_time_t are different identifers. lapwing is i686 rather than x86_64, and pg_time_t is always int64, so that seems like a good guess. There is other code that just casts time_t to pg_time_t without any conversion function, so try that approach here. Introduced in commit `0d8c9c1210`.	2020-04-03 20:18:47 -04:00
Tom Lane	0568e7a2a4	Cosmetic improvements for code related to partitionwise join. Move have_partkey_equi_join and match_expr_to_partition_keys to relnode.c, since they're used only there. Refactor build_joinrel_partition_info to split out the code that fills the joinrel's partition key lists; this doesn't have any non-cosmetic impact, but it seems like a useful separation of concerns. Improve assorted nearby comments. Amit Langote, with a little further editorialization by me Discussion: https://postgr.es/m/CA+HiwqG2WVUGmLJqtR0tPFhniO=H=9qQ+Z3L_ZC+Y3-EVQHFGg@mail.gmail.com	2020-04-03 17:00:35 -04:00
Robert Haas	0d8c9c1210	Generate backup manifests for base backups, and validate them. A manifest is a JSON document which includes (1) the file name, size, last modification time, and an optional checksum for each file backed up, (2) timelines and LSNs for whatever WAL will need to be replayed to make the backup consistent, and (3) a checksum for the manifest itself. By default, we use CRC-32C when checksumming data files, because we are trying to detect corruption and user error, not foil an adversary. However, pg_basebackup and the server-side BASE_BACKUP command now have options to select a different algorithm, so users wanting a cryptographic hash function can select SHA-224, SHA-256, SHA-384, or SHA-512. Users not wanting file checksums at all can disable them, or disable generating of the backup manifest altogether. Using a cryptographic hash function in place of CRC-32C consumes significantly more CPU cycles, which may slow down backups in some cases. A new tool called pg_validatebackup can validate a backup against the manifest. If no checksums are present, it can still check that the right files exist and that they have the expected sizes. If checksums are present, it can also verify that each file has the expected checksum. Additionally, it calls pg_waldump to verify that the expected WAL files are present and parseable. Only plain format backups can be validated directly, but tar format backups can be validated after extracting them. Robert Haas, with help, ideas, review, and testing from David Steele, Stephen Frost, Andrew Dunstan, Rushabh Lathia, Suraj Kharage, Tushar Ahuja, Rajkumar Raghuwanshi, Mark Dilger, Davinder Singh, Jeevan Chalke, Amit Kapila, Andres Freund, and Noah Misch. Discussion: http://postgr.es/m/CA+TgmoZV8dw1H2bzZ9xkKwdrk8+XYa+DC9H=F7heO2zna5T6qg@mail.gmail.com	2020-04-03 15:05:59 -04:00
Fujii Masao	ce77abe63c	Include information on buffer usage during planning phase, in EXPLAIN output, take two. When BUFFERS option is enabled, EXPLAIN command includes the information on buffer usage during each plan node, in its output. In addition to that, this commit makes EXPLAIN command include also the information on buffer usage during planning phase, in its output. This feature makes it easier to discern the cases where lots of buffer access happen during planning. This commit revives the original commit `ed7a509571` that was reverted by commit `19db23bcbd`. The original commit had to be reverted because it caused the regression test failure on the buildfarm members prion and dory. But since commit `c0885c4c30` got rid of the caues of the test failure, the original commit can be safely introduced again. Author: Julien Rouhaud, slightly revised by Fujii Masao Reviewed-by: Justin Pryzby Discussion: https://postgr.es/m/16109-26a1a88651e90608@postgresql.org	2020-04-04 03:13:17 +09:00
Tom Lane	e41955faf0	Fix bugs in gin_fuzzy_search_limit processing. entryGetItem()'s three code paths each contained bugs associated with filtering the entries for gin_fuzzy_search_limit. The posting-tree path failed to advance "advancePast" after having decided to filter an item. If we ran out of items on the current page and needed to advance to the next, what would actually happen is that entryLoadMoreItems() would re-load the same page. Eventually, the random dropItem() test would accept one of the same items it'd previously rejected, and we'd move on --- but it could take awhile with small gin_fuzzy_search_limit. To add insult to injury, this case would inevitably cause entryLoadMoreItems() to decide it needed to re-descend from the root, making things even slower. The posting-list path failed to implement gin_fuzzy_search_limit filtering at all, so that all entries in the posting list would be returned. The bitmap-result path used a "gotitem" variable that it failed to update in the one place where it'd actually make a difference, ie at the one "continue" statement. I think this was unreachable in practice, because if we'd looped around then it shouldn't be the case that the entries on the new page are before advancePast. Still, the "gotitem" variable was contributing nothing to either clarity or correctness, so get rid of it. Refactor all three loops so that the termination conditions are more alike and less unreadable. The code coverage report showed that we had no coverage at all for the re-descend-from-root code path in entryLoadMoreItems(), which seems like a very bad thing, so add a test case that exercises it. We also had exactly no coverage for gin_fuzzy_search_limit, so add a simplistic test case that at least hits those code paths a little bit. Back-patch to all supported branches. Adé Heyward and Tom Lane Discussion: https://postgr.es/m/CAEknJCdS-dE1Heddptm7ay2xTbSeADbkaQ8bU2AXRCVC2LdtKQ@mail.gmail.com	2020-04-03 13:15:45 -04:00
Tom Lane	6dd9f35779	Fix bogus CALLED_AS_TRIGGER() defenses. contrib/lo's lo_manage() thought it could use trigdata->tg_trigger->tgname in its error message about not being called as a trigger. That naturally led to a core dump. unique_key_recheck() figured it could Assert that fcinfo->context is a TriggerData node in advance of having checked that it's being called as a trigger. That's harmless in production builds, and perhaps not that easy to reach in any case, but it's logically wrong. The first of these per bug #16340 from William Crowell; the second from manual inspection of other CALLED_AS_TRIGGER call sites. Back-patch the lo.c change to all supported branches, the other to v10 where the thinko crept in. Discussion: https://postgr.es/m/16340-591c7449dc7c8c47@postgresql.org	2020-04-03 11:24:56 -04:00
Fujii Masao	19db23bcbd	Revert "Include information on buffer usage during planning phase, in EXPLAIN output." This reverts commit `ed7a509571`. Per buildfarm member prion.	2020-04-03 12:20:42 +09:00
Fujii Masao	18808f8c89	Add wait events for recovery conflicts. This commit introduces new wait events RecoveryConflictSnapshot and RecoveryConflictTablespace. The former is reported while waiting for recovery conflict resolution on a vacuum cleanup. The latter is reported while waiting for recovery conflict resolution on dropping tablespace. Also this commit changes the code so that the wait event Lock is reported while waiting in ResolveRecoveryConflictWithVirtualXIDs() for recovery conflict resolution on a lock. Basically the wait event Lock is reported during that wait, but previously was not reported only when that wait happened in ResolveRecoveryConflictWithVirtualXIDs(). Author: Masahiko Sawada Reviewed-by: Fujii Masao Discussion: https://postgr.es/m/CA+fd4k4mXWTwfQLS3RPwGr4xnfAEs1ysFfgYHvmmoUgv6Zxvmg@mail.gmail.com	2020-04-03 12:15:56 +09:00
Fujii Masao	ed7a509571	Include information on buffer usage during planning phase, in EXPLAIN output. When BUFFERS option is enabled, EXPLAIN command includes the information on buffer usage during each plan node, in its output. In addition to that, this commit makes EXPLAIN command include also the information on buffer usage during planning phase, in its output. This feature makes it easier to discern the cases where lots of buffer access happen during planning. Author: Julien Rouhaud, slightly revised by Fujii Masao Reviewed-by: Justin Pryzby Discussion: https://postgr.es/m/16109-26a1a88651e90608@postgresql.org	2020-04-03 11:27:09 +09:00
Tom Lane	0b34e7d307	Improve user control over truncation of logged bind-parameter values. This patch replaces the boolean GUC log_parameters_on_error introduced by commit `ba79cb5dc` with an integer log_parameter_max_length_on_error, adding the ability to specify how many bytes to trim each logged parameter value to. (The previous coding hard-wired that choice at 64 bytes.) In addition, add a new parameter log_parameter_max_length that provides similar control over truncation of query parameters that are logged in response to statement-logging options, as opposed to errors. Previous releases always logged such parameters in full, possibly causing log bloat. For backwards compatibility with prior releases, log_parameter_max_length defaults to -1 (log in full), while log_parameter_max_length_on_error defaults to 0 (no logging). Per discussion, log_parameter_max_length is SUSET since the DBA should control routine logging behavior, but log_parameter_max_length_on_error is USERSET because it also affects errcontext data sent back to the client. Alexey Bashtanov, editorialized a little by me Discussion: https://postgr.es/m/b10493cc-a399-a03a-67c7-068f2791ee50@imap.cc	2020-04-02 15:04:51 -04:00
Peter Eisentraut	2991ac5fc9	Add SQL functions for Unicode normalization This adds SQL expressions NORMALIZE() and IS NORMALIZED to convert and check Unicode normal forms, per SQL standard. To support fast IS NORMALIZED tests, we pull in a new data file DerivedNormalizationProps.txt from Unicode and build a lookup table from that, using techniques similar to ones already used for other Unicode data. make update-unicode will keep it up to date. We only build and use these tables for the NFC and NFKC forms, because they are too big for NFD and NFKD and the improvement is not significant enough there. Reviewed-by: Daniel Verite <daniel@manitou-mail.org> Reviewed-by: Andreas Karlsson <andreas@proxel.se> Discussion: https://www.postgresql.org/message-id/flat/c1909f27-c269-2ed9-12f8-3ab72c8caf7a@2ndquadrant.com	2020-04-02 08:56:27 +02:00
Peter Eisentraut	c6e0edad46	Add some comments to some SQL features Otherwise, it could be confusing to a reader that some of these well-publicized features are simply listed as unsupported without further explanation.	2020-04-02 07:52:20 +02:00
Thomas Munro	37b3794dfc	Add maintenance_io_concurrency to postgresql.conf.sample. New GUC from commit `fc34b0d9`.	2020-04-02 16:50:36 +13:00
Amit Kapila	3a5e22138a	Allow parallel vacuum to accumulate buffer usage. Commit `40d964ec99` allowed vacuum command to process indexes in parallel but forgot to accumulate the buffer usage stats of parallel workers. This allows leader backend to accumulate buffer usage stats of all the parallel workers. Reported-by: Julien Rouhaud Author: Sawada Masahiko Reviewed-by: Dilip Kumar, Amit Kapila and Julien Rouhaud Discussion: https://postgr.es/m/20200328151721.GB12854@nol	2020-04-02 08:04:58 +05:30
Tomas Vondra	28cac71bd3	Collect statistics about SLRU caches There's a number of SLRU caches used to access important data like clog, commit timestamps, multixact, asynchronous notifications, etc. Until now we had no easy way to monitor these shared caches, compute hit ratios, number of reads/writes etc. This commit extends the statistics collector to track this information for a predefined list of SLRUs, and also introduces a new system view pg_stat_slru displaying the data. The list of built-in SLRUs is fixed, but additional SLRUs may be defined in extensions. Unfortunately, there's no suitable registry of SLRUs, so this patch simply defines a fixed list of SLRUs with entries for the built-in ones and one entry for all additional SLRUs. Extensions adding their own SLRU are fairly rare, so this seems acceptable. This patch only allows monitoring of SLRUs, not tuning. The SLRU sizes are still fixed (hard-coded in the code) and it's not entirely clear which of the SLRUs might need a GUC to tune size. In a way, allowing us to determine that is one of the goals of this patch. Bump catversion as the patch introduces new functions and system view. Author: Tomas Vondra Reviewed-by: Alvaro Herrera Discussion: https://www.postgresql.org/message-id/flat/20200119143707.gyinppnigokesjok@development	2020-04-02 02:34:21 +02:00
Tom Lane	501b018799	Check equality semantics for unique indexes on partitioned tables. We require the partition key to be a subset of the set of columns being made unique, so that physically-separate indexes on the different partitions are sufficient to enforce the uniqueness constraint. The existing code checked that the listed columns appear, but did not inquire into the index semantics, which is a serious oversight given that different index opclasses might enforce completely different notions of uniqueness. Ideally, perhaps, we'd just match the partition key opfamily to the index opfamily. But hash partitioning uses hash opfamilies which we can't directly match to btree opfamilies. Hence, look up the equality operator in each family, and accept if it's the same operator. This should be okay in a fairly general sense, since the equality operator ought to precisely represent the opfamily's notion of uniqueness. A remaining weak spot is that we don't have a cross-index-AM notion of which opfamily member is "equality". But we know which one to use for hash and btree AMs, and those are the only two that are relevant here at present. (Any non-core AMs that know how to enforce equality are out of luck, for now.) Back-patch to v11 where this feature was introduced. Guancheng Luo, revised a bit by me Discussion: https://postgr.es/m/D9C3CEF7-04E8-47A1-8300-CA1DCD5ED40D@gmail.com	2020-04-01 14:49:49 -04:00
Tom Lane	a80818605e	Improve selectivity estimation for assorted match-style operators. Quite a few matching operators such as JSONB's @> used "contsel" and "contjoinsel" as their selectivity estimators. That was a bad idea, because (a) contsel is only a stub, yielding a fixed default estimate, and (b) that default is 0.001, meaning we estimate these operators as five times more selective than equality, which is surely pretty silly. There's a good model for improving this in ltree's ltreeparentsel(): for any "var OP constant" query, we can try applying the operator to all of the column's MCV and histogram values, taking the latter as being a random sample of the non-MCV values. That code is actually 100% generic, except for the question of exactly what default selectivity ought to be plugged in when we don't have stats. Hence, migrate the guts of ltreeparentsel() into the core code, provide wrappers "matchingsel" and "matchingjoinsel" with a more-appropriate default estimate, and use those for the non-geometric operators that formerly used contsel (mostly JSONB containment operators and tsquery matching). Also apply this code to some match-like operators in hstore, ltree, and pg_trgm, including the former users of ltreeparentsel as well as ones that improperly used contsel. Since commit `911e70207` just created new versions of those extensions that we haven't released yet, we can sneak this change into those new versions instead of having to create an additional generation of update scripts. Patch by me, reviewed by Alexey Bashtanov Discussion: https://postgr.es/m/12237.1582833074@sss.pgh.pa.us	2020-04-01 10:32:33 -04:00
Peter Eisentraut	d8653f4687	Refactor code to look up local replication tuple This unifies some duplicate code. Author: Amit Langote <amitlangote09@gmail.com> Discussion: https://www.postgresql.org/message-id/CA+HiwqFjYE5anArxvkjr37AQMd52L-LZtz9Ld2QrLQ3YfcYhTw@mail.gmail.com	2020-04-01 15:34:41 +02:00
Amit Kapila	2401d93718	Fix coverity complaint about commit `40d964ec99`. The coverity complained that dividing integer expressions and then converting the integer quotient to type "double" would lose fractional part. Typecasting one of the arguments of expression with double should fix the report. Author: Mahendra Singh Thalor Reviewed-by: Amit Kapila Discussion: https://postgr.es/m/20200329224818.6phnhv7o2q2rfovf@alap3.anarazel.de	2020-04-01 09:28:13 +05:30
Peter Geoghegan	7dbe290da4	Add CREATE INDEX deduplication assertions. Add two assertions that verify the assumptions about posting list tuple space accounting and suffix truncation made within nbtsort.c.	2020-03-31 14:38:39 -07:00
Tom Lane	fe3036527a	Fix race condition in statext_store(). Must hold some lock on the pg_statistic_ext_data catalog before we look up the tuple we aim to replace. Otherwise a concurrent VACUUM FULL or similar operation could move it to a different TID, leaving us trying to replace the wrong tuple. Back-patch to v12 where this got broken. Credit goes to Dean Rasheed; I'm just doing the clerical work. Discussion: https://postgr.es/m/CAEZATCU0zHMDiQV0g8P2U+YSP9C1idUPrn79DajsbonwkN0xvQ@mail.gmail.com	2020-03-31 17:06:22 -04:00
Fujii Masao	b0236508d3	Improve the message logged when recovery is paused. When recovery target is reached and recovery is paused because of recovery_target_action=pause, executing pg_wal_replay_resume() causes the standby to promote, i.e., the recovery to end. So, in this case, the previous message "Execute pg_wal_replay_resume() to continue" logged was confusing because pg_wal_replay_resume() doesn't cause the recovery to continue. This commit improves the message logged when recovery is paused, and the proper message is output based on what (pg_wal_replay_pause or recovery_target_action) causes recovery to be paused. Author: Sergei Kornilov, revised by Fujii Masao Reviewed-by: Robert Haas Discussion: https://postgr.es/m/19168211580382043@myt5-b646bde4b8f3.qloud-c.yandex.net	2020-04-01 03:35:13 +09:00
Tom Lane	82e8018522	Teach pg_ls_dir_files() to ignore ENOENT failures from stat(). Buildfarm experience shows that this function can fail with ENOENT if some other process unlinks a file between when we read the directory entry and when we try to stat() it. The problem is old but we had not noticed it until `085b6b667` added regression test coverage. To fix, just ignore ENOENT failures. There is one other case that this might hide: a symlink that points to nowhere. That seems okay though, at least better than erroring. Back-patch to v10 where this function was added, since the regression test cases were too. Discussion: https://postgr.es/m/20200308173103.GC1357@telsasoft.com	2020-03-31 12:57:55 -04:00
Alexander Korotkov	02a5786df2	Improve error reporting in opclasscmds.c This commit improves error reporting introduced by `911e702077`. It puts argument of errmsg() to the single line for easier grepping source for error text. Also it improves wording of errhint().	2020-03-31 17:51:57 +03:00
Magnus Hagander	087d3d0583	Fix assorted typos Author: Daniel Gustafsson <daniel@yesql.se>	2020-03-31 16:00:06 +02:00
Peter Eisentraut	de3bbfcc96	Fix INSERT OVERRIDING USER VALUE behavior The original implementation disallowed using OVERRIDING USER VALUE on identity columns defined as GENERATED ALWAYS, which is not per standard. So allow that now. Expand documentation and tests around this. Author: Dean Rasheed <dean.a.rasheed@gmail.com> Reviewed-by: Peter Eisentraut <peter.eisentraut@2ndquadrant.com> Reviewed-by: Vik Fearing <vik@postgresfriends.org> Discussion: https://www.postgresql.org/message-id/flat/CAEZATCVrh2ufCwmzzM%3Dk_OfuLhTTPBJCdFkimst2kry4oHepuQ%40mail.gmail.com	2020-03-31 08:50:39 +02:00
Michael Paquier	616ae3d2b0	Move routine definitions of xlogarchive.c to a new header file The definitions of the routines defined in xlogarchive.c have been part of xlog_internal.h which is included by several frontend tools, but all those routines are only called by the backend. More cleanup could be done within xlog_internal.h, but that's already a nice cut. This will help a follow-up patch for pg_rewind where handling of restore_command is added for frontends. Author: Alexey Kondratov, Michael Paquier Reviewed-by: Álvaro Herrera, Alexander Korotkov Discussion: https://postgr.es/m/a3acff50-5a0d-9a2c-b3b2-ee36168955c1@postgrespro.ru	2020-03-31 15:33:04 +09:00
Peter Eisentraut	fc8c3bdde2	Update SQL features Set T653 to supported. This has always been possible.	2020-03-31 08:25:03 +02:00
Amit Kapila	ef75140fe7	Avoid calls to RelationGetRelationName() and RelationGetNamespace() in vacuum code. After commit `b61d161c14`, during vacuum, we cache the information of relation name and relation namespace in local structure LVRelStats so that we can use it in an error callback function. We can use the cached information to avoid the calls to RelationGetRelationName(), RelationGetNamespace() and get_namespace_name(). This is mainly for the consistent in vacuum code path but it will avoid the extra syscache lookup we do in get_namespace_name(). Author: Justin Pryzby Reviewed-by: Amit Kapila Discussion: https://www.postgresql.org/message-id/20191120210600.GC30362@telsasoft.com	2020-03-31 09:34:49 +05:30
Peter Geoghegan	f01157e2ac	Further simplify nbtree high key truncation. Commit `7c2dbc69` reorganized _bt_truncate() in a way that enables a further simplification that I (pgeoghegan) missed: Since we mark the tuple that is returned to the caller as a pivot tuple before the point where its heap TID is set as of `7c2dbc69`, it is possible to use the high level BTreeTupleGetHeapTID() inline function to get an item pointer. Do it that way now. This approach is clearer and more maintainable.	2020-03-30 17:34:12 -07:00
Michael Paquier	dd9ac7d5d8	Revert "Skip redundant anti-wraparound vacuums" This reverts commit `2aa6e33`, that added a fast path to skip anti-wraparound and non-aggressive autovacuum jobs (these have no sense as anti-wraparound implies aggressive). With a cluster using a high amount of relations with a portion of them being heavily updated, this could cause autovacuum to lock down, with autovacuum workers attempting repeatedly those jobs on the same relations for the same database, that just kept being skipped. This lock down can be solved with a manual VACUUM FREEZE. Justin King has reported one environment where the issue happened, and Julien Rouhaud and I have been able to reproduce it in a second environment. With a very aggressive autovacuum_freeze_max_age, triggering those jobs with pgbench is a matter of minutes, and hitting the lock down is a lot harder (my local tests failed to do that). Note that anti-wraparound and non-aggressive jobs can only be triggered on a subset of shared catalogs: - pg_auth_members - pg_authid - pg_database - pg_replication_origin - pg_shseclabel - pg_subscription - pg_tablespace While the lock down was possible down to v12, the root cause of those jobs is a much older issue, which needs more analysis. Bonus thanks to Andres Freund for the discussion. Reported-by: Justin King Discussion: https://postgr.es/m/CAE39h22zPLrkH17GrkDgAYL3kbjvySYD1io+rtnAUFnaJJVS4g@mail.gmail.com Backpatch-through: 12	2020-03-31 08:27:47 +09:00
Peter Geoghegan	7c2dbc691c	Refactor nbtree high key truncation. Simplify _bt_truncate(), the routine that generates truncated leaf page high keys. Remove a micro-optimization that avoided a second palloc0() call (this was used when a heap TID was needed in the final pivot tuple, though only when the index happened to not be an INCLUDE index). Removing this dubious micro-optimization allows _bt_truncate() to use the index_truncate_tuple() indextuple.c utility routine in all cases. This was already the common case. This commit is a HEAD-only follow up to bugfix commit `4b42a899`.	2020-03-30 15:52:39 -07:00
Andres Freund	d4b34f60c5	Deduplicate PageIsNew() check in lazy_scan_heap(). The recheck isn't needed anymore, as RelationGetBufferForTuple() now extends the relation with RBM_ZERO_AND_LOCK. Previously we needed to handle the fact that relation extension extended the relation and then separately acquired a lock on the page - while expecting that the page is empty. Reported-By: Ranier Vilela Discussion: https://postgr.es/m/CAEudQArA_=J0D5T258xsCY6Xtf6wiH4b=QDPDgVS+WZUN10WDw@mail.gmail.com	2020-03-30 13:56:40 -07:00
Alexander Korotkov	364bdd0b41	Fix missing SP-GiST support in `911e702077` `911e702077` misses setting of amoptsprocnum for SP-GiST. This commit fixes that.	2020-03-30 23:45:03 +03:00
Alexander Korotkov	851b14b0c6	Remove rudiments of supporting procnum == 0 from `911e702077` Early versions of opclass options patch uses zero support procedure as opclass options procedure. This commit removes rudiments of it, which were committed in `911e702077`. Also, it implements correct handling of amoptsprocnum == 0.	2020-03-30 23:43:25 +03:00
Peter Geoghegan	4b42a89938	Consistently truncate non-key suffix columns. INCLUDE indexes failed to have their non-key attributes physically truncated away in certain rare cases. This led to physically larger pivot tuples that contained useless non-key attribute values. The impact on users should be negligible, but this is still clearly a regression (Postgres 11 supports INCLUDE indexes, and yet was not affected). The bug appeared in commit `dd299df8`, which introduced "true" suffix truncation of key attributes. Discussion: https://postgr.es/m/CAH2-Wz=E8pkV9ivRSFHtv812H5ckf8s1-yhx61_WrJbKccGcrQ@mail.gmail.com Backpatch: 12-, where "true" suffix truncation was introduced.	2020-03-30 12:03:59 -07:00
Alexander Korotkov	911e702077	Implement operator class parameters PostgreSQL provides set of template index access methods, where opclasses have much freedom in the semantics of indexing. These index AMs are GiST, GIN, SP-GiST and BRIN. There opclasses define representation of keys, operations on them and supported search strategies. So, it's natural that opclasses may be faced some tradeoffs, which require user-side decision. This commit implements opclass parameters allowing users to set some values, which tell opclass how to index the particular dataset. This commit doesn't introduce new storage in system catalog. Instead it uses pg_attribute.attoptions, which is used for table column storage options but unused for index attributes. In order to evade changing signature of each opclass support function, we implement unified way to pass options to opclass support functions. Options are set to fn_expr as the constant bytea expression. It's possible due to the fact that opclass support functions are executed outside of expressions, so fn_expr is unused for them. This commit comes with some examples of opclass options usage. We parametrize signature length in GiST. That applies to multiple opclasses: tsvector_ops, gist__intbig_ops, gist_ltree_ops, gist__ltree_ops, gist_trgm_ops and gist_hstore_ops. Also we parametrize maximum number of integer ranges for gist__int_ops. However, the main future usage of this feature is expected to be json, where users would be able to specify which way to index particular json parts. Catversion is bumped. Discussion: https://postgr.es/m/d22c3a18-31c7-1879-fc11-4c1ce2f5e5af%40postgrespro.ru Author: Nikita Glukhov, revised by me Reviwed-by: Nikolay Shaplov, Robert Haas, Tom Lane, Tomas Vondra, Alvaro Herrera	2020-03-30 19:17:23 +03:00
Fujii Masao	64638ccba3	Report waiting via PS while recovery is waiting for buffer pin in hot standby. Previously while the startup process was waiting for the recovery conflict with snapshot, tablespace or lock to be resolved, waiting was reported in PS display, but not in the case of recovery conflict with buffer pin. This commit makes the startup process in hot standby report waiting via PS while waiting for the conflicts with other backends holding buffer pins to be resolved. Author: Masahiko Sawada Reviewed-by: Fujii Masao Discussion: https://postgr.es/m/CA+fd4k4mXWTwfQLS3RPwGr4xnfAEs1ysFfgYHvmmoUgv6Zxvmg@mail.gmail.com	2020-03-30 17:35:03 +09:00
Peter Eisentraut	246f136e76	Improve handling of parameter differences in physical replication When certain parameters are changed on a physical replication primary, this is communicated to standbys using the XLOG_PARAMETER_CHANGE WAL record. The standby then checks whether its own settings are at least as big as the ones on the primary. If not, the standby shuts down with a fatal error. The correspondence of settings between primary and standby is required because those settings influence certain shared memory sizings that are required for processing WAL records that the primary might send. For example, if the primary sends a prepared transaction, the standby must have had max_prepared_transaction set appropriately or it won't be able to process those WAL records. However, fatally shutting down the standby immediately upon receipt of the parameter change record might be a bit of an overreaction. The resources related to those settings are not required immediately at that point, and might never be required if the activity on the primary does not exhaust all those resources. If we just let the standby roll on with recovery, it will eventually produce an appropriate error when those resources are used. So this patch relaxes this a bit. Upon receipt of XLOG_PARAMETER_CHANGE, we still check the settings but only issue a warning and set a global flag if there is a problem. Then when we actually hit the resource issue and the flag was set, we issue another warning message with relevant information. At that point we pause recovery, so a hot standby remains usable. We also repeat the last warning message once a minute so it is harder to miss or ignore. Reviewed-by: Sergei Kornilov <sk@zsrv.org> Reviewed-by: Masahiko Sawada <masahiko.sawada@2ndquadrant.com> Reviewed-by: Kyotaro Horiguchi <horikyota.ntt@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/4ad69a4c-cc9b-0dfe-0352-8b1b0cd36c7b@2ndquadrant.com	2020-03-30 09:53:45 +02:00
Peter Eisentraut	a01e1b8b9d	Add new part SQL/MDA to information_schema.sql_parts	2020-03-30 08:55:55 +02:00
Fujii Masao	6aba63ef3e	Allow the planner-related functions and hook to accept the query string. This commit adds query_string argument into the planner-related functions and hook and allows us to pass the query string to them. Currently there is no user of the query string passed. But the upcoming patch for the planning counters will add the planning hook function into pg_stat_statements and the function will need the query string. So this change will be necessary for that patch. Also this change is useful for some extensions that want to use the query string in their planner hook function. Author: Pascal Legrand, Julien Rouhaud Reviewed-by: Yoshikazu Imai, Tom Lane, Fujii Masao Discussion: https://postgr.es/m/CAOBaU_bU1m3_XF5qKYtSj1ua4dxd=FWDyh2SH4rSJAUUfsGmAQ@mail.gmail.com Discussion: https://postgr.es/m/1583789487074-0.post@n3.nabble.com	2020-03-30 13:51:05 +09:00
Fujii Masao	4a539a25eb	Expose BufferUsageAccumDiff(). Previously pg_stat_statements calculated the difference of buffer counters by its own code even while BufferUsageAccumDiff() had the same code. This commit expose BufferUsageAccumDiff() and makes pg_stat_statements use it for the calculation, in order to simply the code. This change also would be useful for the upcoming patch for the planning counters in pg_stat_statements because the patch will add one more code for the calculation of difference of buffer counters and that can easily be done by using BufferUsageAccumDiff(). Author: Julien Rouhaud Reviewed-by: Fujii Masao Discussion: https://postgr.es/m/bdfee4e0-a304-2498-8da5-3cb52c0a193e@oss.nttdata.com	2020-03-30 12:15:26 +09:00
Amit Kapila	b61d161c14	Introduce vacuum errcontext to display additional information. The additional information displayed will be block number for error occurring while processing heap and index name for error occurring while processing the index. This will help us in diagnosing the problems that occur during a vacuum. For ex. due to corruption (either caused by bad hardware or by some bug) if we get some error while vacuuming, it can help us identify the block in heap and or additional index information. It sets up an error context callback to display additional information with the error. During different phases of vacuum (heap scan, heap vacuum, index vacuum, index clean up, heap truncate), we update the error context callback to display appropriate information. We can extend it to a bit more granular level like adding the phases for FSM operations or for prefetching the blocks while truncating. However, I felt that it requires adding many more error callback function calls and can make the code a bit complex, so left those for now. Author: Justin Pryzby, with few changes by Amit Kapila Reviewed-by: Alvaro Herrera, Amit Kapila, Andres Freund, Michael Paquier and Sawada Masahiko Discussion: https://www.postgresql.org/message-id/20191120210600.GC30362@telsasoft.com	2020-03-30 07:33:38 +05:30
Peter Eisentraut	b79911dc8c	Update SQL features Change F181 to supported. It requires that an embedded C program can be split across multiple files, which ECPG easily supports.	2020-03-29 08:56:41 +02:00
Peter Geoghegan	a7b9d24e4e	Make deduplication use number of key attributes. Use IndexRelationGetNumberOfKeyAttributes() rather than IndexRelationGetNumberOfAttributes() when determining whether or not two index tuples are suitable for merging together into a single posting list tuple. This is a little bit tidier. It brings affected code in nbtdedup.c a little closer to similar, related code in nbtsplitloc.c.	2020-03-28 20:25:03 -07:00
Andres Freund	42750b08d9	Ensure snapshot is registered within ScanPgRelation(). In 9.4 I added support to use a historical snapshot in ScanPgRelation(), while adding logical decoding. Unfortunately a conflict with the concurrent removal of SnapshotNow was incorrectly resolved, leading to an unregistered snapshot being used. It is not correct to use an unregistered (or non-active) snapshot for anything non-trivial, because catalog invalidations can cause the snapshot to be invalidated. Luckily it seems unlikely to actively cause problems in practice, as ScanPgRelation() requires that we already have a lock on the relation, we only look for a single row, and we don't appear to rely on the result's tid to be correct. It however is clearly wrong and potential negative consequences would likely be hard to find. So it seems worth backpatching the fix, even without a concrete hazard. Discussion: https://postgr.es/m/20200229052459.wzhqnbhrriezg4v2@alap3.anarazel.de Backpatch: 9.5-	2020-03-28 12:26:46 -07:00
Jeff Davis	7351bfeda3	Fix costing for disk-based hash aggregation. Report and suggestions from Richard Guo and Tomas Vondra. Discussion: https://postgr.es/m/CAMbWs4_W8fYbAn8KxgidAaZHON_Oo08OYn9ze=7remJymLqo5g@mail.gmail.com	2020-03-28 12:07:49 -07:00
Dean Rasheed	4083f445c0	Improve the performance and accuracy of numeric sqrt() and ln(). Instead of using Newton's method to compute numeric square roots, use the Karatsuba square root algorithm, which performs better for numbers of all sizes. In practice, this is 3-5 times faster for inputs with just a few digits and up to around 10 times faster for larger inputs. Also, the new algorithm guarantees that the final digit of the result is correctly rounded, since it computes an integer square root with truncation, containing at least 1 extra decimal digit before rounding. The former algorithm would occasionally round the wrong way because it rounded both the intermediate and final results. In addition, arrange for sqrt_var() to explicitly support negative rscale values (rounding before the decimal point). This allows the argument reduction phase of ln_var() to be optimised for large inputs, since it only needs to compute square roots with a few more digits than the final ln() result, rather than computing all the digits before the decimal point. For very large inputs, this can be many thousands of times faster. In passing, optimise div_var_fast() in a couple of places where it was doing unnecessary work. Patch be me, reviewed by Tom Lane and Tels. Discussion: https://postgr.es/m/CAEZATCV1A7+jD3P30Zu31KjaxeSEyOn3v9d6tYegpxcq3cQu-g@mail.gmail.com	2020-03-28 14:37:53 +00:00
Dean Rasheed	87779aa474	Prevent functional dependency estimates from exceeding column estimates. Formerly we applied a functional dependency "a => b with dependency degree f" using the formula P(a,b) = P(a) * [f + (1-f)P(b)] This leads to the possibility that the combined selectivity P(a,b) could exceed P(b), which is not ideal. The addition of support for IN and OR clauses (commits `8f321bd16c` and `ccaa3569f5`) would seem to make this more likely, since the user-supplied values in such clauses are not necessarily compatible with the functional dependency. Mitigate this by using the formula P(a,b) = f Min(P(a), P(b)) + (1-f) * P(a) * P(b) instead, which guarantees that the combined selectivity is less than each column's individual selectivity. Logically, this is modifies the part of the formula that accounts for dependent rows to handle cases where P(a) > P(b), whilst not changing the second term which accounts for independent rows. Additionally, this refactors the way that functional dependencies are applied, so now dependencies_clauselist_selectivity() estimates both the implying clauses and the implied clauses for each functional dependency (formerly only the implied clauses were estimated), and now all clauses for each attribute are taken into account (formerly only one clause for each implied attribute was estimated). This removes the previously built-in assumption that only equality clauses will be seen, which is no longer true, and opens up the possibility of applying functional dependencies to more general clauses. Patch by me, reviewed by Tomas Vondra. Discussion: https://postgr.es/m/CAEZATCXaNFZyOhR4XXAfkvj1tibRBEjje6ZbXwqWUB_tqbH%3Drw%40mail.gmail.com Discussion: https://postgr.es/m/20200318002946.6dvblukm3cfmgir2%40development	2020-03-28 12:48:34 +00:00
Peter Eisentraut	145cb16d3b	Cleanup in SQL features files Feature C011 was still listed in sql_feature_packages.txt but had been removed from sql_features.txt, so also remove from the former.	2020-03-28 08:46:18 +01:00
David Rowley	b07642dbcd	Trigger autovacuum based on number of INSERTs Traditionally autovacuum has only ever invoked a worker based on the estimated number of dead tuples in a table and for anti-wraparound purposes. For the latter, with certain classes of tables such as insert-only tables, anti-wraparound vacuums could be the first vacuum that the table ever receives. This could often lead to autovacuum workers being busy for extended periods of time due to having to potentially freeze every page in the table. This could be particularly bad for very large tables. New clusters, or recently pg_restored clusters could suffer even more as many large tables may have the same relfrozenxid, which could result in large numbers of tables requiring an anti-wraparound vacuum all at once. Here we aim to reduce the work required by anti-wraparound and aggressive vacuums in general, by triggering autovacuum when the table has received enough INSERTs. This is controlled by adding two new GUCs and reloptions; autovacuum_vacuum_insert_threshold and autovacuum_vacuum_insert_scale_factor. These work exactly the same as the existing scale factor and threshold controls, only base themselves off the number of inserts since the last vacuum, rather than the number of dead tuples. New controls were added rather than reusing the existing controls, to allow these new vacuums to be tuned independently and perhaps even completely disabled altogether, which can be done by setting autovacuum_vacuum_insert_threshold to -1. We make no attempt to skip index cleanup operations on these vacuums as they may trigger for an insert-mostly table which continually doesn't have enough dead tuples to trigger an autovacuum for the purpose of removing those dead tuples. If we were to skip cleaning the indexes in this case, then it is possible for the index(es) to become bloated over time. There are additional benefits to triggering autovacuums based on inserts, as tables which never contain enough dead tuples to trigger an autovacuum are now more likely to receive a vacuum, which can mark more of the table as "allvisible" and encourage the query planner to make use of Index Only Scans. Currently, we still obey vacuum_freeze_min_age when triggering these new autovacuums based on INSERTs. For large insert-only tables, it may be beneficial to lower the table's autovacuum_freeze_min_age so that tuples are eligible to be frozen sooner. Here we've opted not to zero that for these types of vacuums, since the table may just be insert-mostly and we may otherwise freeze tuples that are still destined to be updated or removed in the near future. There was some debate to what exactly the new scale factor and threshold should default to. For now, these are set to 0.2 and 1000, respectively. There may be some motivation to adjust these before the release. Author: Laurenz Albe, Darafei Praliaskouski Reviewed-by: Alvaro Herrera, Masahiko Sawada, Chris Travers, Andres Freund, Justin Pryzby Discussion: https://postgr.es/m/CAC8Q8t%2Bj36G_bLF%3D%2B0iMo6jGNWnLnWb1tujXuJr-%2Bx8ZCCTqoQ%40mail.gmail.com	2020-03-28 19:20:12 +13:00
Peter Geoghegan	9945ad6e90	Justify nbtree page split locking in code comment. Delaying unlocking the right child page until after the point that the left child's parent page has been refound is no longer truly necessary. Commit `40dae7ec` made nbtree tolerant of interrupted page splits. VACUUM was taught to avoid deleting a page that happens to be the right half of an incomplete split. As long as page splits don't unlock the left child page until the end of the second/final phase, it should be safe to unlock the right child page earlier (at the end of the first phase). It probably isn't actually useful to release the right child's lock earlier like this (it probably won't improve performance). Even still, pointing out that it ought to be safe to do so should make it easier to understand the overall design.	2020-03-27 16:44:52 -07:00
Alvaro Herrera	1e6148032e	Allow walreceiver configuration to change on reload The parameters primary_conninfo, primary_slot_name and wal_receiver_create_temp_slot can now be changed with a simple "reload" signal, no longer requiring a server restart. This is achieved by signalling the walreceiver process to terminate and having it start again with the new values. Thanks to Andres Freund, Kyotaro Horiguchi, Fujii Masao for discussion. Author: Sergei Kornilov <sk@zsrv.org> Reviewed-by: Michael Paquier <michael@paquier.xyz> Reviewed-by: Álvaro Herrera <alvherre@alvh.no-ip.org> Discussion: https://postgr.es/m/19513901543181143@sas1-19a94364928d.qloud-c.yandex.net	2020-03-27 19:51:37 -03:00
Alvaro Herrera	092c6936de	Set wal_receiver_create_temp_slot PGC_POSTMASTER Commit `3297308278` gave walreceiver the ability to create and use a temporary replication slot, and made it controllable by a GUC (enabled by default) that can be changed with SIGHUP. That's useful but has two problems: one, it's possible to cause the origin server to fill its disk if the slot doesn't advance in time; and also there's a disconnect between state passed down via the startup process and GUCs that walreceiver reads directly. We handle the first problem by setting the option to disabled by default. If the user enables it, its on their head to make sure that disk doesn't fill up. We handle the second problem by passing the flag via startup rather than having walreceiver acquire it directly, and making it PGC_POSTMASTER (which ensures a walreceiver always has the fresh value). A future commit can relax this (to PGC_SIGHUP again) by having the startup process signal walreceiver to shutdown whenever the value changes. Author: Sergei Kornilov <sk@zsrv.org> Reviewed-by: Michael Paquier <michael@paquier.xyz> Reviewed-by: Álvaro Herrera <alvherre@alvh.no-ip.org> Discussion: https://postgr.es/m/20200122055510.GH174860@paquier.xyz	2020-03-27 16:20:33 -03:00
Tom Lane	fbc7a71608	Rearrange validity checks for plpgsql "simple" expressions. Buildfarm experience shows what probably should've occurred to me before: if a cache flush occurs partway through building a generic plan, then the plansource may have is_valid = false even though the plan is valid. We need to accept this case, use the generated plan, and then try to replan the next time. We can't try to replan immediately, because that would produce an infinite loop in CLOBBER_CACHE_ALWAYS builds; moreover it's really overkill. (We can assume that the plan is valid, it's just possibly a bit stale. Note that the pre-existing code behaved this way, and the non-simple-expression code paths do too.) Conversely, not using the generated plan would drop us into the not-a-simple-expression code path, which is bad for performance and would also cause regression-test failures due to visibly different error-reporting behavior. Hence, refactor the validity-check functions so that the initial check and recheck cases can react differently to plansource->is_valid. This makes their usage a bit simpler, too. Discussion: https://postgr.es/m/7072.1585332104@sss.pgh.pa.us	2020-03-27 14:47:34 -04:00
Peter Eisentraut	8d1b9648c5	Update SQL features Change F311 to supported. This was already accomplished when subfeature F311-04 (WITH CHECK OPTION) was added, but the top-level feature wasn't updated at the time.	2020-03-27 08:36:08 +01:00
Tom Lane	8f59f6b9c0	Improve performance of "simple expressions" in PL/pgSQL. For relatively simple expressions (say, "x + 1" or "x > 0"), plpgsql's management overhead exceeds the cost of evaluating the expression. This patch substantially improves that situation, providing roughly 2X speedup for such trivial expressions. First, add infrastructure in the plancache to allow fast re-validation of cached plans that contain no table access, and hence need no locks. Teach plpgsql to use this infrastructure for expressions that it's already deemed "simple" (which in particular will never contain table references). The fast path still requires checking that search_path hasn't changed, so provide a fast path for OverrideSearchPathMatchesCurrent by counting changes that have occurred to the active search path in the current session. This is simplistic but seems enough for now, seeing that PushOverrideSearchPath is not used in any performance-critical cases. Second, manage the refcounts on simple expressions' cached plans using a transaction-lifespan resource owner, so that we only need to take and release an expression's refcount once per transaction not once per expression evaluation. The management of this resource owner exactly parallels the existing management of plpgsql's simple-expression EState. Add some regression tests covering this area, in particular verifying that expression caching doesn't break semantics for search_path changes. Patch by me, but it owes something to previous work by Amit Langote, who recognized that getting rid of plancache-related overhead would be a useful thing to do here. Also thanks to Andres Freund for review. Discussion: https://postgr.es/m/CAFj8pRDRVfLdAxsWeVLzCAbkLFZhW549K+67tpOc-faC8uH8zw@mail.gmail.com	2020-03-26 18:58:57 -04:00
Magnus Hagander	eff5b245df	Document that pg_checksums exists in checksums README Author: Daniel Gustafsson <daniel@yesql.se>	2020-03-26 15:05:54 +01:00
Peter Eisentraut	49bf81536e	Drop slot's LWLock before returning from SaveSlotToPath() When SaveSlotToPath() is called with elevel=LOG, the early exits didn't release the slot's io_in_progress_lock. This could result in a walsender being stuck on the lock forever. A possible way to get into this situation is if the offending code paths are triggered in a low disk space situation. Author: Pavan Deolasee <pavan.deolasee@2ndquadrant.com> Reported-by: Craig Ringer <craig@2ndquadrant.com> Discussion: https://www.postgresql.org/message-id/flat/56a138c5-de61-f553-7e8f-6789296de785%402ndquadrant.com	2020-03-26 13:29:20 +01:00
Andrew Dunstan	896fcdb230	Provide a TLS init hook The default hook function sets the default password callback function. In order to allow preloaded libraries to have an opportunity to override the default, TLS initialization if now delayed slightly until after shared preloaded libraries have been loaded. A test module is provided which contains a trivial example that decodes an obfuscated password for an SSL certificate. Author: Andrew Dunstan Reviewed By: Andreas Karlsson, Asaba Takanori Discussion: https://postgr.es/m/04116472-818b-5859-1d74-3d995aab2252@2ndQuadrant.com	2020-03-25 17:13:17 -04:00
Tom Lane	bda6dedbea	Go back to returning int from ereport auxiliary functions. This reverts the parts of commit `17a28b0364` that changed ereport's auxiliary functions from returning dummy integer values to returning void. It turns out that a minority of compilers complain (not entirely unreasonably) about constructs such as (condition) ? errdetail(...) : 0 if errdetail() returns void rather than int. We could update those call sites to say "(void) 0" perhaps, but the expectation for this patch set was that ereport callers would not have to change anything. And this aspect of the patch set was already the most invasive and least compelling part of it, so let's just drop it. Per buildfarm. Discussion: https://postgr.es/m/CA+fd4k6N8EjNvZpM8nme+y+05mz-SM8Z_BgkixzkA34R+ej0Kw@mail.gmail.com	2020-03-25 11:57:36 -04:00
Peter Eisentraut	e8b1774fc2	Update SQL features The name of E182 was changed in SQL:2011. Also, we can change it to supported because all it requires is one embedded language to be supported, which we do.	2020-03-25 08:46:41 +01:00
Thomas Munro	352f6f2df6	Add collation versions for Windows. On Vista and later, use GetNLSVersionEx() to request collation version information. Reviewed-by: Juan José Santamaría Flecha <juanjo.santamaria@gmail.com> Discussion: https://postgr.es/m/CA%2BhUKGJvqup3s%2BJowVTcacZADO6dOhfdBmvOPHLS3KXUJu41Jw%40mail.gmail.com	2020-03-25 16:04:32 +13:00
Thomas Munro	382a821907	Allow NULL version for individual collations. Remove the documented restriction that collation providers must either return NULL for all collations or non-NULL for all collations. Use NULL for glibc collations like "C.UTF-8", which might otherwise lead future proposed commits to force unnecessary index rebuilds. Reviewed-by: Peter Eisentraut <peter.eisentraut@2ndquadrant.com> Discussion: https://postgr.es/m/CA%2BhUKGJvqup3s%2BJowVTcacZADO6dOhfdBmvOPHLS3KXUJu41Jw%40mail.gmail.com	2020-03-25 15:53:24 +13:00
Jeff Davis	dd8e19132a	Consider disk-based hash aggregation to implement DISTINCT. Correct oversight in `1f39bce0`. If enable_hashagg_disk=true, we should consider hash aggregation for DISTINCT when applicable.	2020-03-24 18:30:04 -07:00
Jeff Davis	3649133b14	Avoid allocating unnecessary zero-sized array. If there are no aggregates, there is no need to allocate an array of zero AggStatePerGroupData elements.	2020-03-24 18:30:04 -07:00
Peter Geoghegan	b150a76793	Fix nbtree deduplication README commentary. Descriptions of some aspects of how deduplication works were unclear in a couple of places.	2020-03-24 14:58:27 -07:00
Andres Freund	112b006fe7	logical decoding: Remove TODO about unnecessary optimization. Measurements show, and intuition agrees, that there's currently no known cases where adding a fastpath to avoid allocating / ordering a heap for a single transaction is worthwhile. Author: Dilip Kumar Discussion: https://postgr.es/m/CAFiTN-sp701wvzvnLQJGk7JDqrFM8f--97-ihbwkU8qvn=p8nw@mail.gmail.com	2020-03-24 12:15:03 -07:00
Peter Eisentraut	f15ace7935	Fix compiler warning on Cygwin `bf68b79e50` introduced an unused variable compiler warning on Cygwin.	2020-03-24 19:31:02 +01:00
Tom Lane	17a28b0364	Improve the internal implementation of ereport(). Change all the auxiliary error-reporting routines to return void, now that we no longer need to pretend they are passing something useful to errfinish(). While this probably doesn't save anything significant at the machine-code level, it allows detection of some additional types of mistakes. Pass the error location details (__FILE__, __LINE__, PG_FUNCNAME_MACRO) to errfinish not errstart. This shaves a few cycles off the case where errstart decides we're not going to emit anything. Re-implement elog() as a trivial wrapper around ereport(), removing the separate support infrastructure it used to have. Aside from getting rid of some now-surplus code, this means that elog() now really does have exactly the same semantics as ereport(), in particular that it can skip evaluation work if the message is not to be emitted. Andres Freund and Tom Lane Discussion: https://postgr.es/m/CA+fd4k6N8EjNvZpM8nme+y+05mz-SM8Z_BgkixzkA34R+ej0Kw@mail.gmail.com	2020-03-24 12:08:48 -04:00
Tom Lane	e3a87b4991	Re-implement the ereport() macro using __VA_ARGS__. Now that we require C99, we can depend on __VA_ARGS__ to work, and revising ereport() to use it has several significant benefits: * The extra parentheses around the auxiliary function calls are now optional. Aside from being a bit less ugly, this removes a common gotcha for new contributors, because in some cases the compiler errors you got from forgetting them were unintelligible. * The auxiliary function calls are now evaluated as a comma expression list rather than as extra arguments to errfinish(). This means that compilers can be expected to warn about no-op expressions in the list, allowing detection of several other common mistakes such as forgetting to add errmsg(...) when converting an elog() call to ereport(). * Unlike the situation with extra function arguments, comma expressions are guaranteed to be evaluated left-to-right, so this removes platform dependency in the order of the auxiliary function calls. While that dependency hasn't caused us big problems in the past, this change does allow dropping some rather shaky assumptions around errcontext() domain handling. There's no intention to make wholesale changes of existing ereport calls, but as proof-of-concept this patch removes the extra parens from a couple of calls in postgres.c. While new code can be written either way, code intended to be back-patched will need to use extra parens for awhile yet. It seems worth back-patching this change into v12, so as to reduce the window where we have to be careful about that by one year. Hence, this patch is careful to preserve ABI compatibility; a followup HEAD-only patch will make some additional simplifications. Andres Freund and Tom Lane Discussion: https://postgr.es/m/CA+fd4k6N8EjNvZpM8nme+y+05mz-SM8Z_BgkixzkA34R+ej0Kw@mail.gmail.com	2020-03-24 11:49:00 -04:00
Peter Eisentraut	cef27ae01a	Fix compiler warning A variable was unused in non-assert builds. Simplify the code to avoid the issue. Reported-by: Erik Rijkers <er@xs4all.nl>	2020-03-24 16:02:01 +01:00
Peter Eisentraut	97ee604d9b	Some refactoring of logical/worker.c This moves the main operations of apply_handle_{insert\|update\|delete}, that of inserting, updating, deleting a tuple into/from a given relation, into corresponding apply_handle_{insert\|update\|delete}_internal functions. This allows performing those operations on relations that are not directly the targets of replication, which is something a later patch will use for targeting partitioned tables. Author: Amit Langote <amitlangote09@gmail.com> Reviewed-by: Rafia Sabih <rafia.pghackers@gmail.com> Reviewed-by: Peter Eisentraut <peter.eisentraut@2ndquadrant.com> Discussion: https://www.postgresql.org/message-id/flat/CA+HiwqH=Y85vRK3mOdjEkqFK+E=ST=eQiHdpj43L=_eJMOOznQ@mail.gmail.com	2020-03-24 15:00:54 +01:00
Andres Freund	cedffbdb8b	Report wait event for cost-based vacuum delay. Author: Justin Pryzby Discussion: https://postgr.es/m/20200321040750.GD13662@telsasoft.com	2020-03-23 22:53:22 -07:00
Fujii Masao	496ee647ec	Prefer standby promotion over recovery pause. Previously if a promotion was triggered while recovery was paused, the paused state continued. Also recovery could be paused by executing pg_wal_replay_pause() even while a promotion was ongoing. That is, recovery pause had higher priority over a standby promotion. But this behavior was not desirable because most users basically wanted the recovery to complete as soon as possible and the server to become the master when they requested a promotion. This commit changes recovery so that it prefers a promotion over recovery pause. That is, if a promotion is triggered while recovery is paused, the paused state ends and a promotion continues. Also this commit makes recovery pause functions like pg_wal_replay_pause() throw an error if they are executed while a promotion is ongoing. Internally, this commit adds new internal function PromoteIsTriggered() that returns true if a promotion is triggered. Since the name of this function and the existing function IsPromoteTriggered() are confusingly similar, the commit changes the name of IsPromoteTriggered() to IsPromoteSignaled, as more appropriate name. Author: Fujii Masao Reviewed-by: Atsushi Torikoshi, Sergei Kornilov Discussion: https://postgr.es/m/00c194b2-dbbb-2e8a-5b39-13f14048ef0a@oss.nttdata.com	2020-03-24 12:46:48 +09:00
Michael Paquier	e09ad07b21	Move routine building restore_command to src/common/ restore_command has only been used until now by the backend, but there is a pending patch for pg_rewind to make use of that in the frontend. Author: Alexey Kondratov Reviewed-by: Andrey Borodin, Andres Freund, Alvaro Herrera, Alexander Korotkov, Michael Paquier Discussion: https://postgr.es/m/a3acff50-5a0d-9a2c-b3b2-ee36168955c1@postgrespro.ru	2020-03-24 12:13:36 +09:00
Fujii Masao	b8e20d6dab	Add wait events for WAL archive and recovery pause. This commit introduces new wait events BackupWaitWalArchive and RecoveryPause. The former is reported while waiting for the WAL files required for the backup to be successfully archived. The latter is reported while waiting for recovery in pause state to be resumed. Author: Fujii Masao Reviewed-by: Michael Paquier, Atsushi Torikoshi, Robert Haas Discussion: https://postgr.es/m/f0651f8c-9c96-9f29-0ff9-80414a15308a@oss.nttdata.com	2020-03-24 11:12:21 +09:00
Fujii Masao	67e0adfb3f	Report NULL as total backup size if it's not estimated. Previously 0 was reported in pg_stat_progress_basebackup.total_backup if the total backup size was not estimated. Per discussion, our consensus is that NULL is better choise as the value in total_backup in that case. So this commit makes pg_stat_progress_basebackup view report NULL in total_backup column if the estimation is disabled. Bump catversion. Author: Fujii Masao Reviewed-by: Amit Langote, Magnus Hagander, Alvaro Herrera Discussion: https://postgr.es/m/CABUevExnhOD89zBDuPvfAAh243RzNpwCPEWNLtMYpKHMB8gbAQ@mail.gmail.com	2020-03-24 10:43:41 +09:00
Jeff Davis	64fe602279	Fixes for Disk-based Hash Aggregation. Justin Pryzby raised a couple issues with commit `1f39bce0`. Fixed. Also, tweak the way the size of a hash entry is estimated and the number of buckets is estimated when calling BuildTupleHashTableExt(). Discussion: https://www.postgresql.org/message-id/20200319064222.GR26184@telsasoft.com	2020-03-23 15:43:07 -07:00
Amit Kapila	33753ac9d7	Add object names to partition integrity violations. All errors of SQLSTATE class 23 should include the name of an object associated with the error in separate fields of the error report message. We do this so that applications need not try to extract them from the possibly-localized human-readable text of the message. Reported-by: Chris Bandy Author: Chris Bandy Reviewed-by: Amit Kapila and Amit Langote Discussion: https://postgr.es/m/0aa113a3-3c7f-db48-bcd8-f9290b2269ae@gmail.com	2020-03-23 08:09:15 +05:30
Michael Paquier	79dfa8afb2	Add bound checks for ssl_min_protocol_version and ssl_max_protocol_version Mixing incorrect bounds in the SSL context leads to confusing error messages generated by OpenSSL which are hard to act on. New range checks are added when both min/max parameters are loaded in the context of a SSL reload to improve the error reporting. Note that this does not make use of the GUC hook machinery contrary to `41aadee`, as there is no way to ensure a consistent range check (except if there is a way one day to define range types for GUC parameters?). Hence, this patch applies only to OpenSSL, and uses a logic similar to other parameters to trigger an error when reloading the SSL context in a session. Author: Michael Paquier Reviewed-by: Daniel Gustafsson Discussion: https://postgr.es/m/20200114035420.GE1515@paquier.xyz	2020-03-23 11:01:41 +09:00
Noah Misch	de9396326e	Revert "Skip WAL for new relfilenodes, under wal_level=minimal." This reverts commit `cb2fd7eac2`. Per numerous buildfarm members, it was incompatible with parallel query, and a test case assumed LP64. Back-patch to 9.5 (all supported versions). Discussion: https://postgr.es/m/20200321224920.GB1763544@rfd.leadboat.com	2020-03-22 09:24:09 -07:00
Noah Misch	cb2fd7eac2	Skip WAL for new relfilenodes, under wal_level=minimal. Until now, only selected bulk operations (e.g. COPY) did this. If a given relfilenode received both a WAL-skipping COPY and a WAL-logged operation (e.g. INSERT), recovery could lose tuples from the COPY. See src/backend/access/transam/README section "Skipping WAL for New RelFileNode" for the new coding rules. Maintainers of table access methods should examine that section. To maintain data durability, just before commit, we choose between an fsync of the relfilenode and copying its contents to WAL. A new GUC, wal_skip_threshold, guides that choice. If this change slows a workload that creates small, permanent relfilenodes under wal_level=minimal, try adjusting wal_skip_threshold. Users setting a timeout on COMMIT may need to adjust that timeout, and log_min_duration_statement analysis will reflect time consumption moving to COMMIT from commands like COPY. Internally, this requires a reliable determination of whether RollbackAndReleaseCurrentSubTransaction() would unlink a relation's current relfilenode. Introduce rd_firstRelfilenodeSubid. Amend the specification of rd_createSubid such that the field is zero when a new rel has an old rd_node. Make relcache.c retain entries for certain dropped relations until end of transaction. Back-patch to 9.5 (all supported versions). This introduces a new WAL record type, XLOG_GIST_ASSIGN_LSN, without bumping XLOG_PAGE_MAGIC. As always, update standby systems before master systems. This changes sizeof(RelationData) and sizeof(IndexStmt), breaking binary compatibility for affected extensions. (The most recent commit to affect the same class of extensions was 089e4d405d0f3b94c74a2c6a54357a84a681754b.) Kyotaro Horiguchi, reviewed (in earlier, similar versions) by Robert Haas. Heikki Linnakangas and Michael Paquier implemented earlier designs that materially clarified the problem. Reviewed, in earlier designs, by Andrew Dunstan, Andres Freund, Alvaro Herrera, Tom Lane, Fujii Masao, and Simon Riggs. Reported by Martijn van Oosterhout. Discussion: https://postgr.es/m/20150702220524.GA9392@svana.org	2020-03-21 09:38:26 -07:00
Noah Misch	d3e572855b	In log_newpage_range(), heed forkNum and page_std arguments. The function assumed forkNum=MAIN_FORKNUM and page_std=true, ignoring the actual arguments. Existing callers passed exactly those values, so there's no live bug. Back-patch to v12, where the function first appeared, because another fix needs this. Discussion: https://postgr.es/m/20191118045434.GA1173436@rfd.leadboat.com	2020-03-21 09:38:26 -07:00
Noah Misch	e629a01f69	During heap rebuild, lock any TOAST index until end of transaction. swap_relation_files() calls toast_get_valid_index() to find and lock this index, just before swapping with the rebuilt TOAST index. The latter function releases the lock before returning. Potential for mischief is low; a concurrent session can issue ALTER INDEX ... SET (fillfactor = ...), which is not alarming. Nonetheless, changing pg_class.relfilenode without a lock is unconventional. Back-patch to 9.5 (all supported versions), because another fix needs this. Discussion: https://postgr.es/m/20191226001521.GA1772687@rfd.leadboat.com	2020-03-21 09:38:26 -07:00
Noah Misch	d60ef94d76	Fix cosmetic blemishes involving rd_createSubid. Remove an obsolete comment from AtEOXact_cleanup(). Restore formatting of a comment in struct RelationData, mangled by the pgindent run in commit `9af4159fce`. Back-patch to 9.5 (all supported versions), because another fix stacks on this.	2020-03-21 09:38:26 -07:00
Amit Kapila	3ba59ccc89	Allow page lock to conflict among parallel group members. This is required as it is no safer for two related processes to perform clean up in gin indexes at a time than for unrelated processes to do the same. After acquiring page locks, we can acquire relation extension lock but reverse never happens which means these will also not participate in deadlock. So, avoid checking wait edges from this lock. Currently, the parallel mode is strictly read-only, but after this patch we have the infrastructure to allow parallel inserts and parallel copy. Author: Dilip Kumar, Amit Kapila Reviewed-by: Amit Kapila, Kuntal Ghosh and Sawada Masahiko Discussion: https://postgr.es/m/CAD21AoCmT3cFQUN4aVvzy5chw7DuzXrJCbrjTU05B+Ss=Gn1LA@mail.gmail.com	2020-03-21 08:48:06 +05:30
Amit Kapila	85f6b49c2c	Allow relation extension lock to conflict among parallel group members. This is required as it is no safer for two related processes to extend the same relation at a time than for unrelated processes to do the same. We don't acquire a heavyweight lock on any other object after relation extension lock which means such a lock can never participate in the deadlock cycle. So, avoid checking wait edges from this lock. This provides an infrastructure to allow parallel operations like insert, copy, etc. which were earlier not possible as parallel group members won't conflict for relation extension lock. Author: Dilip Kumar, Amit Kapila Reviewed-by: Amit Kapila, Kuntal Ghosh and Sawada Masahiko Discussion: https://postgr.es/m/CAD21AoCmT3cFQUN4aVvzy5chw7DuzXrJCbrjTU05B+Ss=Gn1LA@mail.gmail.com	2020-03-20 08:20:56 +05:30
Peter Geoghegan	b27e1b3418	nbtree: Remove obsolete _bt_pgaddtup() comments. Remove comments that are a throw back to a time when nbtree cared about write-ordering dependencies. The comments are similar to those removed by commit `9ee7414e`, among others.	2020-03-19 14:56:56 -07:00
Jeff Davis	2fd6a44ad5	Revert "Specialize MemoryContextMemAllocated()." This reverts commit `e00912e11a`.	2020-03-19 12:21:50 -07:00
Tom Lane	24e2885ee3	Introduce "anycompatible" family of polymorphic types. This patch adds the pseudo-types anycompatible, anycompatiblearray, anycompatiblenonarray, and anycompatiblerange. They work much like anyelement, anyarray, anynonarray, and anyrange respectively, except that the actual input values need not match precisely in type. Instead, if we can find a common supertype (using the same rules as for UNION/CASE type resolution), then the parser automatically promotes the input values to that type. For example, "myfunc(anycompatible, anycompatible)" can match a call with one integer and one bigint argument, with the integer automatically promoted to bigint. With anyelement in the definition, the user would have had to cast the integer explicitly. The new types also provide a second, independent set of type variables for function matching; thus with "myfunc(anyelement, anyelement, anycompatible) returns anycompatible" the first two arguments are constrained to be the same type, but the third can be some other type, and the result has the type of the third argument. The need for more than one set of type variables was foreseen back when we first invented the polymorphic types, but we never did anything about it. Pavel Stehule, revised a bit by me Discussion: https://postgr.es/m/CAFj8pRDna7VqNi8gR+Tt2Ktmz0cq5G93guc3Sbn_NVPLdXAkqA@mail.gmail.com	2020-03-19 11:43:11 -04:00
Peter Eisentraut	c314c147c0	Prepare to support non-tables in publications This by itself doesn't change any functionality but prepares the way for having relations other than base tables in publications. Make arrangements for COPY handling the initial table sync. For non-tables we have to use COPY (SELECT ...) instead of directly copying from the table, but then we have to take care to omit generated columns from the column list. Also, remove a hardcoded reference to relkind = 'r' and rely on the publisher to send only what it can actually publish, which will be correct even in future cross-version scenarios. Reviewed-by: Amit Langote <amitlangote09@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/CA+HiwqH=Y85vRK3mOdjEkqFK+E=ST=eQiHdpj43L=_eJMOOznQ@mail.gmail.com	2020-03-19 08:25:07 +01:00
Fujii Masao	1d253bae57	Rename the recovery-related wait events. This commit renames RecoveryWalAll and RecoveryWalStream wait events to RecoveryWalStream and RecoveryRetrieveRetryInterval, respectively, in order to make the names and what they are more consistent. For example, previously RecoveryWalAll was reported as a wait event while the recovery was waiting for WAL from a stream, and which was confusing because the name was very different from the situation where the wait actually could happen. The names of macro variables for those wait events also are renamed accordingly. This commit also changes the category of RecoveryRetrieveRetryInterval to Timeout from Activity because the wait event is reported while waiting based on wal_retrieve_retry_interval. Author: Fujii Masao Reviewed-by: Kyotaro Horiguchi, Atsushi Torikoshi Discussion: https://postgr.es/m/124997ee-096a-5d09-d8da-2c7a57d0816e@oss.nttdata.com	2020-03-19 15:32:55 +09:00
Amit Kapila	72e78d831a	Add assert to ensure that page locks don't participate in deadlock cycle. Assert that we don't acquire any other heavyweight lock while holding the page lock except for relation extension. However, these locks are never taken in reverse order which implies that page locks will never participate in the deadlock cycle. Similar to relation extension, page locks are also held for a short duration, so imposing such a restriction won't hurt. Author: Dilip Kumar, with few changes by Amit Kapila Reviewed-by: Amit Kapila, Kuntal Ghosh and Sawada Masahiko Discussion: https://postgr.es/m/CAD21AoCmT3cFQUN4aVvzy5chw7DuzXrJCbrjTU05B+Ss=Gn1LA@mail.gmail.com	2020-03-19 08:11:45 +05:30
Peter Geoghegan	6312c08a29	nbtree: Use raw PageAddItem() for retail inserts. Only internal page splits need to call _bt_pgaddtup() instead of PageAddItem(), and only for data items, one of which will end up at the first offset (or first offset after the high key offset) on the new right page. This data item alone will need to be truncated in _bt_pgaddtup(). Since there is no reason why retail inserts ever need to truncate the incoming item, use a raw PageAddItem() call there instead. Even _bt_split() uses raw PageAddItem() calls for left page and right page high keys. Clearly the _bt_pgaddtup() shim function wasn't really encapsulating anything. _bt_pgaddtup() should now be thought of as a _bt_split() helper function. Note that the assertions from commit `d1e241c2` verify that retail inserts never insert an item at an internal page's negative infinity offset. This invariant could only ever be violated as a result of a basic logic error in nbtinsert.c.	2020-03-18 18:17:37 -07:00
Michael Paquier	d41202f36e	Fix comment related to concurrent index swapping in index.c A comment about switching indisvalid of the new and old indexes swapped in REINDEX CONCURRENTLY got this backwards. Issue introduced by `5dc92b8`, the original commit of REINDEX CONCURRENTLY. Author: Julien Rouhaud Discussion: https://postgr.es/m/20200318143340.GA46897@nol Backpatch-through: 12	2020-03-19 09:51:33 +09:00
Jeff Davis	1f39bce021	Disk-based Hash Aggregation. While performing hash aggregation, track memory usage when adding new groups to a hash table. If the memory usage exceeds work_mem, enter "spill mode". In spill mode, new groups are not created in the hash table(s), but existing groups continue to be advanced if input tuples match. Tuples that would cause a new group to be created are instead spilled to a logical tape to be processed later. The tuples are spilled in a partitioned fashion. When all tuples from the outer plan are processed (either by advancing the group or spilling the tuple), finalize and emit the groups from the hash table. Then, create new batches of work from the spilled partitions, and select one of the saved batches and process it (possibly spilling recursively). Author: Jeff Davis Reviewed-by: Tomas Vondra, Adam Lee, Justin Pryzby, Taylor Vesely, Melanie Plageman Discussion: https://postgr.es/m/507ac540ec7c20136364b5272acbcd4574aa76ef.camel@j-davis.com	2020-03-18 15:42:02 -07:00
Jeff Davis	e00912e11a	Specialize MemoryContextMemAllocated(). An AllocSet doubles the size of allocated blocks (up to maxBlockSize), which means that the current block can represent half of the total allocated space for the memory context. But the free space in the current block may never have been touched, so don't count the untouched memory as allocated for the purposes of MemoryContextMemAllocated(). Discussion: https://postgr.es/m/ec63d70b668818255486a83ffadc3aec492c1f57.camel@j-davis.com	2020-03-18 15:39:14 -07:00
Alvaro Herrera	487e9861d0	Enable BEFORE row-level triggers for partitioned tables ... with the limitation that the tuple must remain in the same partition. Reviewed-by: Ashutosh Bapat Discussion: https://postgr.es/m/20200227165158.GA2071@alvherre.pgsql	2020-03-18 18:58:05 -03:00
Peter Geoghegan	b029395f5e	Refactor nbtree fastpath optimization. Commit `2b272734`, which added the fastpath rightmost leaf page cache insert optimization, added code to _bt_doinsert() to handle using and invalidating the backend local block cache. It doesn't seem like a good place to handle these low level details, though. _bt_doinsert() is supposed to be a high level function -- it is the main entry point to nbtinsert.c. Restructure the code by placing handling of the rightmost block cache at the start of a new _bt_search() shim function, _bt_search_insert(). The new function is called from _bt_doinsert(), which uses it as a _bt_search() variant that conveniently accepts its BTInsertState state as an argument. _bt_doinsert() no longer needs to directly consider the fastpath optimization. Discussion: https://postgr.es/m/CAH2-Wzk59cxKJRd=rfbyub6-V4yWRjsOYRkUNHBLT1P1GdtCQQ@mail.gmail.com	2020-03-18 14:42:49 -07:00
Peter Eisentraut	a2b1faa0f2	Implement type regcollation This will be helpful for a following commit and it's also just generally useful, like the other reg* types. Author: Julien Rouhaud Reviewed-by: Thomas Munro and Michael Paquier Discussion: https://postgr.es/m/CAEepm%3D0uEQCpfq_%2BLYFBdArCe4Ot98t1aR4eYiYTe%3DyavQygiQ%40mail.gmail.com	2020-03-18 21:21:00 +01:00
Tomas Vondra	ccaa3569f5	Recognize some OR clauses as compatible with functional dependencies Since commit `8f321bd16c` functional dependencies can handle IN clauses, which however introduced a possible (and surprising) inconsistency, because IN clauses may be expressed as an OR clause, which are still considered incompatible. For example a IN (1, 2, 3) may be rewritten as (a = 1 OR a = 2 OR a = 3) The IN clause will work fine with functional dependencies, but the OR clause will force the estimation to fall back to plain per-column estimates, possibly introducing significant estimation errors. This commit recognizes OR clauses equivalent to an IN clause (when all arugments are compatible and reference the same attribute) as a special case, compatible with functional dependencies. This allows applying functional dependencies, just like for IN clauses. This does not eliminate the difference in estimating the clause itself, i.e. IN clause and OR clause still use different formulas. It would be possible to change that (for these special OR clauses), but that's not really about extended statistics - it was always like this. Moreover the errors are usually much smaller compared to ignoring dependencies. Author: Tomas Vondra Reviewed-by: Dean Rasheed Discussion: https://www.postgresql.org/message-id/flat/13902317.Eha0YfKkKy%40pierred-pdoc	2020-03-18 16:41:49 +01:00
Tomas Vondra	6f72dbc48b	Fix wording of several extended stats comments Reported-by: Thomas Munro Discussion: https://www.postgresql.org/message-id/flat/20200113230008.g67iyk4cs3xbnjju@development	2020-03-18 13:40:13 +01:00
Amit Kapila	b4f140869f	Add missing errcode() in a few ereport calls. This will allow to specifying SQLSTATE error code for the errors in the missing places. Reported-by: Sawada Masahiko Author: Sawada Masahiko Backpatch-through: 9.5 Discussion: https://postgr.es/m/CA+fd4k6N8EjNvZpM8nme+y+05mz-SM8Z_BgkixzkA34R+ej0Kw@mail.gmail.com	2020-03-18 09:27:14 +05:30
Michael Paquier	fdeeb524b4	Fix typo in indexcmds.c Introduced by `61d7c7b`. Backpatch-through: 12	2020-03-18 11:13:12 +09:00
Amit Kapila	15ef6ff4b9	Assert that we don't acquire a heavyweight lock on another object after relation extension lock. The only exception to the rule is that we can try to acquire the same relation extension lock more than once. This is allowed as we are not creating any new lock for this case. This restriction implies that the relation extension lock won't ever participate in the deadlock cycle because we can never wait for any other heavyweight lock after acquiring this lock. Such a restriction is okay for relation extension locks as unlike other heavyweight locks these are not held till the transaction end. These are taken for a short duration to extend a particular relation and then released. Author: Dilip Kumar, with few changes by Amit Kapila Reviewed-by: Amit Kapila, Kuntal Ghosh and Sawada Masahiko Discussion: https://postgr.es/m/CAD21AoCmT3cFQUN4aVvzy5chw7DuzXrJCbrjTU05B+Ss=Gn1LA@mail.gmail.com	2020-03-18 07:20:17 +05:30
Peter Geoghegan	b897b3aae6	nbtree: Remove useless local variables. Copying block and offset numbers to local variables in _bt_insertonpg() made the code less readable. Remove the variables. There is already code that conditionally calls BufferGetBlockNumber() in the same block, so consistently do it that way instead. Calling BufferGetBlockNumber() is very cheap, but we might as well avoid it when it isn't truly necessary. It isn't truly necessary for _bt_insertonpg() to call BufferGetBlockNumber() in almost all cases. Spotted while working on a patch that refactors the fastpath rightmost leaf page cache optimization, which was added by commit `2b272734`.	2020-03-17 18:39:26 -07:00
Thomas Munro	9b8aa09293	Don't use EV_CLEAR for kqueue events. For the semantics to match the epoll implementation, we need a socket to continue to appear readable/writable if you wait multiple times without doing I/O in between (in Linux terminology: level-triggered rather than edge-triggered). This distinction will be important for later commits. Similar to commit `3b790d256f` for Windows. Reviewed-by: Kyotaro Horiguchi <horikyota.ntt@gmail.com> Discussion: https://postgr.es/m/CA%2BhUKGJAC4Oqao%3DqforhNey20J8CiG2R%3DoBPqvfR0vOJrFysGw%40mail.gmail.com	2020-03-18 13:05:09 +13:00
Thomas Munro	7bc84a1f30	Fix kqueue support under debugger on macOS. While running under a debugger, macOS's getppid() can return the debugger's PID. That could cause a backend to exit because it falsely believed that the postmaster had died, since commit `815c2f09`. Continue to use getppid() as a fast postmaster check after adding the postmaster's PID to a kqueue, to close a PID-reuse race, but double check that it actually exited by trying to read the pipe. The new check isn't reached in the common case. Reported-by: Alexander Korotkov <a.korotkov@postgrespro.ru> Discussion: https://postgr.es/m/CA%2BhUKGKhAxJ8V8RVwCo6zJaeVrdOG1kFBHGZOOjf6DzW_omeMA%40mail.gmail.com	2020-03-18 13:05:07 +13:00
Tom Lane	e6c178b5b7	Refactor our checks for valid function and aggregate signatures. pg_proc.c and pg_aggregate.c had near-duplicate copies of the logic to decide whether a function or aggregate's signature is legal. This seems like a bad thing even without the problem that the upcoming "anycompatible" patch would roughly double the complexity of that logic. Hence, refactor so that the rules are localized in new subroutines supplied by parse_coerce.c. (One could quibble about just where to add that code, but putting it beside enforce_generic_type_consistency seems not totally unreasonable.) The fact that the rules are about to change would mandate some changes in the wording of the associated error messages in any case. I ended up spelling things out in a fairly literal fashion in the errdetail messages, eg "A result of type anyelement requires at least one input of type anyelement, anyarray, anynonarray, anyenum, or anyrange." Perhaps this is overkill, but once there's more than one subgroup of polymorphic types, people might get confused by more-abstract messages. Discussion: https://postgr.es/m/24137.1584139352@sss.pgh.pa.us	2020-03-17 19:36:41 -04:00
Tom Lane	77ec5affbc	Adjust handling of an ANYARRAY actual input for an ANYARRAY argument. Ordinarily it's impossible for an actual input of a function to have declared type ANYARRAY, since we'd resolve that to a concrete array type before doing argument type resolution for the function. But an exception arises for functions applied to certain columns of pg_statistic or pg_stats, since we abuse the "anyarray" pseudotype by using it to declare those columns. So parse_coerce.c has to deal with the case. Previously we allowed an ANYARRAY actual input to match an ANYARRAY polymorphic argument, but only if no other argument or result was declared ANYELEMENT. When that logic was written, those were the only two polymorphic types, and I fear nobody thought carefully about how it ought to extend to the other ones that came along later. But actually it was wrong even then, because if a function has two ANYARRAY arguments, it should be able to expect that they have identical element types, and we'd not be able to ensure that. The correct generalization is that we can match an ANYARRAY actual input to an ANYARRAY polymorphic argument only if no other argument or result is of any polymorphic type, so that no promises are being made about element type compatibility. check_generic_type_consistency can't test that condition, but it seems better anyway to accept such matches there and then throw an error if needed in enforce_generic_type_consistency. That way we can produce a specific error message rather than an unintuitive "function does not exist" complaint. (There's some risk perhaps of getting new ambiguous-function complaints, but I think that any set of functions for which that could happen would be ambiguous against ordinary array columns as well.) While we're at it, we can improve the message that's produced in cases that the code did already object to, as shown in the regression test changes. Also, remove a similar test that got cargo-culted in for ANYRANGE; there are no catalog columns of type ANYRANGE, and I hope we never create any, so that's not needed. (It was incomplete anyway.) While here, update some comments and rearrange the code a bit in preparation for upcoming additions of more polymorphic types. In practical situations I believe this is just exchanging one error message for another, hopefully better, one. So it doesn't seem needful to back-patch, even though the mistake is ancient. Discussion: https://postgr.es/m/21569.1584314271@sss.pgh.pa.us	2020-03-17 18:29:14 -04:00
Alvaro Herrera	5d0c2d5eba	Remove logical_read_local_xlog_page It devolved into a content-less wrapper over read_local_xlog_page, with nothing to add, plus it's easily confused with walsender's logical_read_xlog_page. There doesn't seem to be any reason for it to stay. src/include/replication/logicalfuncs.h becomes empty, so remove it too. The prototypes it initially had were absorbed by generated fmgrprotos.h. Discussion: https://postgr.es/m/20191115214102.GA15616@alvherre.pgsql	2020-03-17 18:18:01 -03:00
Alvaro Herrera	bcd1c36300	Fix consistency issues with replication slot copy Commit 9f06d79ef831's replication slot copying failed to properly reserve the WAL that the slot is expecting to see during DecodingContextFindStartpoint (to set the confirmed_flush LSN), so concurrent activity could remove that WAL and cause the copy process to error out. But it doesn't actually need that WAL anyway: instead of running decode to find confirmed_flush, it can be copied from the source slot. Fix this by rearranging things to avoid DecodingContextFindStartpoint() (leaving the target slot's confirmed_flush_lsn to invalid), and set that up afterwards by copying from the target slot's value. Also ensure the source slot's confirmed_flush_lsn is valid. Reported-by: Arseny Sher Author: Masahiko Sawada, Arseny Sher Discussion: https://postgr.es/m/871rr3ohbo.fsf@ars-thinkpad	2020-03-17 16:13:18 -03:00
Tom Lane	9d9784c840	Remove bogus assertion about polymorphic SQL function result. It is possible to reach check_sql_fn_retval() with an unresolved polymorphic rettype, resulting in an assertion failure as demonstrated by one of the added test cases. However, the code following that throws what seems an acceptable error message, so just remove the Assert and adjust commentary. While here, I thought it'd be a good idea to provide some parallel tests of SQL-function and PL/pgSQL-function polymorphism behavior. Some of these cases are perhaps duplicative of tests elsewhere, but we hadn't any organized coverage of the topic AFAICS. Although that assertion's been wrong all along, it won't have any effect in production builds, so I'm not bothering to back-patch. Discussion: https://postgr.es/m/21569.1584314271@sss.pgh.pa.us	2020-03-17 14:54:46 -04:00
Fujii Masao	1429d3f767	Fix comment in xlog.c. This commit fixes the comment about SharedHotStandbyActive variable. The comment was apparently copy-and-pasted. Author: Atsushi Torikoshi Discussion: https://postgr.es/m/CACZ0uYEjpqZB9wN2Rwc_RMvDybyYqdbkPuDr1NyxJg4f9yGfMw@mail.gmail.com	2020-03-17 12:06:22 +09:00
Tom Lane	41b45576d5	Remove useless pfree()s at the ends of various ValuePerCall SRFs. We don't need to manually clean up allocations in a SRF's multi_call_memory_ctx, because the SRF_RETURN_DONE infrastructure takes care of that (and also ensures that it will happen even if the function never gets a final call, which simple manual cleanup cannot do). Hence, the code removed by this patch is a waste of code and cycles. Worse, it gives the impression that cleaning up manually is a thing, which can lead to more serious errors such as those fixed in commits `085b6b667` and `b4570d33a`. So we should get rid of it. These are not quite actual bugs though, so I couldn't muster the enthusiasm to back-patch. Fix in HEAD only. Justin Pryzby Discussion: https://postgr.es/m/20200308173103.GC1357@telsasoft.com	2020-03-16 21:36:53 -04:00
Tom Lane	b4570d33aa	Avoid holding a directory FD open across assorted SRF calls. This extends the fixes made in commit `085b6b667` to other SRFs with the same bug, namely pg_logdir_ls(), pgrowlocks(), pg_timezone_names(), pg_ls_dir(), and pg_tablespace_databases(). Also adjust various comments and documentation to warn against expecting to clean up resources during a ValuePerCall SRF's final call. Back-patch to all supported branches, since these functions were all born broken. Justin Pryzby, with cosmetic tweaks by me Discussion: https://postgr.es/m/20200308173103.GC1357@telsasoft.com	2020-03-16 21:05:52 -04:00
Peter Geoghegan	113758155c	nbtree: Fix obsolete _bt_search() comment. Oversight in commit `d2086b08b0`.	2020-03-16 15:51:06 -07:00
Peter Geoghegan	013c1f6af6	nbtree: Pass down MAXALIGN()'d itemsz for new item. Refactor nbtinsert.c so that the final itemsz of each new non-pivot tuple (the MAXALIGN()'d size) is determined once. Most of the functions used by leaf page inserts used the insertstate.itemsz value already. This commit makes everything use insertstate.itemsz as standard practice. The goal is to decouple tuple size from "effective" tuple size. Making this distinction isn't truly necessary right now, but that might change in the future. Also explain why we consistently apply MAXALIGN() to get an effective index tuple size. This was rather unclear, in part because it isn't actually strictly necessary right now.	2020-03-16 12:00:10 -07:00
Thomas Munro	fc34b0d9de	Introduce a maintenance_io_concurrency setting. Introduce a GUC and a tablespace option to control I/O prefetching, much like effective_io_concurrency, but for work that is done on behalf of many client sessions. Use the new setting in heapam.c instead of the hard-coded formula effective_io_concurrency + 10 introduced by commit `558a9165e0`. Go with a default value of 10 for now, because it's a round number pretty close to the value used for that existing case. Discussion: https://postgr.es/m/CA%2BhUKGJUw08dPs_3EUcdO6M90GnjofPYrWp4YSLaBkgYwS-AqA%40mail.gmail.com	2020-03-16 17:14:26 +13:00
Thomas Munro	b09ff53667	Simplify the effective_io_concurrency setting. The effective_io_concurrency GUC and equivalent tablespace option were previously passed through a formula based on a theory about RAID spindles and probabilities, to arrive at the number of pages to prefetch in bitmap heap scans. Tomas Vondra, Andres Freund and others argued that it was anachronistic and hard to justify, and commit `558a9165e0` already started down the path of bypassing it in new code. We agreed to drop that logic and use the value directly. For the default setting of 1, there is no change in effect. Higher settings can be converted from the old meaning to the new with: select round(sum(OLD / n::float)) from generate_series(1, OLD) s(n); We might want to consider renaming the GUC before the next release given the change in meaning, but it's not clear that many users had set it very carefully anyway. That decision is deferred for now. Discussion: https://postgr.es/m/CA%2BhUKGJUw08dPs_3EUcdO6M90GnjofPYrWp4YSLaBkgYwS-AqA%40mail.gmail.com	2020-03-16 17:14:26 +13:00
Peter Geoghegan	f207bb0b8f	nbtree: Reorder nbtinsert.c prototypes. Relocate _bt_newroot() prototype, so that the order that prototypes appear in matches the order that the functions are defined in.	2020-03-15 20:53:12 -07:00
Peter Eisentraut	70a7b4776b	Add backend type to csvlog and optionally log_line_prefix The backend type, which corresponds to what pg_stat_activity.backend_type shows, is added as a column to the csvlog and can optionally be added to log_line_prefix using the new %b placeholder. Reviewed-by: Julien Rouhaud <rjuju123@gmail.com> Reviewed-by: Kuntal Ghosh <kuntalghosh.2007@gmail.com> Reviewed-by: Alvaro Herrera <alvherre@2ndquadrant.com> Reviewed-by: Justin Pryzby <pryzby@telsasoft.com> Discussion: https://www.postgresql.org/message-id/flat/c65e5196-4f04-4ead-9353-6088c19615a3@2ndquadrant.com	2020-03-15 11:20:21 +01:00
Tom Lane	87c9c2571c	Rearrange pseudotypes.c to get rid of duplicative code. Commit `a5954de10` replaced a lot of manually-coded stub I/O routines with code generated by macros. That was a good idea but it didn't go far enough, because there were still manually-coded stub input routines for types that had live output routines. Refactor the macro so that we can generate just a stub input routine at need. Also create similar macros to generate stub binary I/O routines, since we have some of those now. The only stub functions that remain hand-coded are shell_in() and shell_out(), which need to be separate because they use different error messages. While here, rearrange the commentary to discuss each type not each function. This provides a better way to explain the why of which types need which support, rather than just duplicatively annotating the functions. Discussion: https://postgr.es/m/24137.1584139352@sss.pgh.pa.us	2020-03-14 15:31:44 -04:00
Tom Lane	4dbcb3f844	Restructure polymorphic-type resolution in funcapi.c. resolve_polymorphic_tupdesc() and resolve_polymorphic_argtypes() failed to cover the case of having to resolve anyarray given only an anyrange input. The bug was masked if anyelement was also used (as either input or output), which probably helps account for our not having noticed. While looking at this I noticed that resolve_generic_type() would produce the wrong answer if asked to make that same resolution. ISTM that resolve_generic_type() is confusingly defined and overly complex, so rather than fix it, let's just make funcapi.c do the specific lookups it requires for itself. With this change, resolve_generic_type() is not used anywhere, so remove it in HEAD. In the back branches, leave it alone (complete with bug) just in case any external code is using it. While we're here, make some other refactoring adjustments in funcapi.c with an eye to upcoming future expansion of the set of polymorphic types: * Simplify quick-exit tests by adding an overall have_polymorphic_result flag. This is about a wash now but will be a win when there are more flags. * Reduce duplication of code between resolve_polymorphic_tupdesc() and resolve_polymorphic_argtypes(). * Don't bother to validate correct matching of anynonarray or anyenum; the parser should have done that, and even if it didn't, just doing "return false" here would lead to a very confusing, off-point error message. (Really, "return false" in these two functions should only occur if the call_expr isn't supplied or we can't obtain data type info from it.) * For the same reason, throw an elog rather than "return false" if we fail to resolve a polymorphic type. The bug's been there since we added anyrange, so back-patch to all supported branches. Discussion: https://postgr.es/m/6093.1584202130@sss.pgh.pa.us	2020-03-14 14:42:22 -04:00
Tomas Vondra	e83daa7e33	Use multi-variate MCV lists to estimate ScalarArrayOpExpr Commit `8f321bd16c` added support for estimating ScalarArrayOpExpr clauses (IN/ANY) clauses using functional dependencies. There's no good reason not to support estimation of these clauses using multi-variate MCV lists too, so this commits implements that. That makes the behavior consistent and MCV lists can estimate all variants (ANY/ALL, inequalities, ...). Author: Tomas Vondra Review: Dean Rasheed Discussion: https://www.postgresql.org/message-id/flat/13902317.Eha0YfKkKy%40pierred-pdoc	2020-03-14 16:13:00 +01:00
Tomas Vondra	8f321bd16c	Use functional dependencies to estimate ScalarArrayOpExpr Until now functional dependencies supported only simple equality clauses and clauses that can be trivially translated to equalities. This commit allows estimation of some ScalarArrayOpExpr (IN/ANY) clauses. For IN clauses we can do this thanks to using operator with equality semantics, which means an IN clause WHERE c IN (1, 2, ..., N) can be translated to WHERE (c = 1 OR c = 2 OR ... OR c = N) IN clauses are now considered compatible with functional dependencies, and rely on the same assumption of consistency of queries with data (which is an assumption we already used for simple equality clauses). This applies also to ALL clauses with an equality operator, which can be considered equivalent to IN clause. ALL clauses are still considered incompatible, although there's some discussion about maybe relaxing this in the future. Author: Pierre Ducroquet Reviewed-by: Tomas Vondra, Dean Rasheed Discussion: https://www.postgresql.org/message-id/flat/13902317.Eha0YfKkKy%40pierred-pdoc	2020-03-14 16:12:41 +01:00
Peter Eisentraut	d90bd24391	Remove am_syslogger global variable Use the new MyBackendType instead. More similar changes for other "am something" variables are possible. This one was just particularly simple. Reviewed-by: Julien Rouhaud <rjuju123@gmail.com> Reviewed-by: Kuntal Ghosh <kuntalghosh.2007@gmail.com> Reviewed-by: Alvaro Herrera <alvherre@2ndquadrant.com> Discussion: https://www.postgresql.org/message-id/flat/c65e5196-4f04-4ead-9353-6088c19615a3@2ndquadrant.com	2020-03-13 14:01:15 +01:00
Peter Eisentraut	8e8a0becb3	Unify several ways to tracking backend type Add a new global variable MyBackendType that uses the same BackendType enum that was previously only used by the stats collector. That way several duplicate ways of checking what type a particular process is can be simplified. Since it's no longer just for stats, move to miscinit.c and rename existing functions to match the expanded purpose. Reviewed-by: Julien Rouhaud <rjuju123@gmail.com> Reviewed-by: Kuntal Ghosh <kuntalghosh.2007@gmail.com> Reviewed-by: Alvaro Herrera <alvherre@2ndquadrant.com> Discussion: https://www.postgresql.org/message-id/flat/c65e5196-4f04-4ead-9353-6088c19615a3@2ndquadrant.com	2020-03-13 14:01:10 +01:00
Peter Eisentraut	1cc9c2412c	Preserve replica identity index across ALTER TABLE rewrite If an index was explicitly set as replica identity index, this setting was lost when a table was rewritten by ALTER TABLE. Because this setting is part of pg_index but actually controlled by ALTER TABLE (not part of CREATE INDEX, say), we have to do some extra work to restore it. Based-on-patch-by: Quan Zongliang <quanzongliang@gmail.com> Reviewed-by: Euler Taveira <euler.taveira@2ndquadrant.com> Discussion: https://www.postgresql.org/message-id/flat/c70fcab2-4866-0d9f-1d01-e75e189db342@gmail.com	2020-03-13 11:57:06 +01:00
Tom Lane	085b6b6679	Avoid holding a directory FD open across pg_ls_dir_files() calls. This coding technique is undesirable because (a) it leaks the FD for the rest of the transaction if the SRF is not run to completion, and (b) allocated FDs are a scarce resource, but multiple interleaved uses of the relevant functions could eat many such FDs. In v11 and later, a query such as "SELECT pg_ls_waldir() LIMIT 1" yields a warning about the leaked FD, and the only reason there's no warning in earlier branches is that fd.c didn't whine about such leaks before commit `9cb7db3f0`. Even disregarding the warning, it wouldn't be too hard to run a backend out of FDs with careless use of these SQL functions. Hence, rewrite the function so that it reads the directory within a single call, returning the results as a tuplestore rather than via value-per-call mode. There are half a dozen other built-in SRFs with similar problems, but let's fix this one to start with, just to see if the buildfarm finds anything wrong with the code. In passing, fix bogus error report for stat() failure: it was whining about the directory when it should be fingering the individual file. Doubtless a copy-and-paste error. Back-patch to v10 where this function was added. Justin Pryzby, with cosmetic tweaks and test cases by me Discussion: https://postgr.es/m/20200308173103.GC1357@telsasoft.com	2020-03-11 15:27:59 -04:00
Peter Eisentraut	bf68b79e50	Refactor ps_status.c API The init_ps_display() arguments were mostly lies by now, so to match typical usage, just use one argument and let the caller assemble it from multiple sources if necessary. The only user of the additional arguments is BackendInitialize(), which was already doing string assembly on the caller side anyway. Remove the second argument of set_ps_display() ("force") and just handle that in init_ps_display() internally. BackendInitialize() also used to set the initial status as "authentication", but that was very far from where authentication actually happened. So now it's set to "initializing" and then "authentication" just before the actual call to ClientAuthentication(). Reviewed-by: Julien Rouhaud <rjuju123@gmail.com> Reviewed-by: Kuntal Ghosh <kuntalghosh.2007@gmail.com> Reviewed-by: Alvaro Herrera <alvherre@2ndquadrant.com> Discussion: https://www.postgresql.org/message-id/flat/c65e5196-4f04-4ead-9353-6088c19615a3@2ndquadrant.com	2020-03-11 16:38:31 +01:00
Alvaro Herrera	899a04f5ed	Avoid duplicates in ALTER ... DEPENDS ON EXTENSION If the command is attempted for an extension that the object already depends on, silently do nothing. In particular, this means that if a database containing multiple such entries is dumped, the restore will silently do the right thing and record just the first one. (At least, in a world where pg_dump does dump such entries -- which it doesn't currently, but it will.) Backpatch to 9.6, where this kind of dependency was introduced. Reviewed-by: Ibrar Ahmed, Tom Lane (offlist) Discussion: https://postgr.es/m/20200217225333.GA30974@alvherre.pgsql	2020-03-11 11:04:59 -03:00
Peter Eisentraut	1c91838181	Clean up order in miscinit.c a bit The code around InitPostmasterChild() from commit `31c453165b` somehow ended up in the middle of a block of code related to "User ID state". Move it into its own block instead.	2020-03-11 13:51:55 +01:00
Peter Eisentraut	aaa3aeddee	Remove HAVE_WORKING_LINK Previously, hard links were not used on Windows and Cygwin, but they support them just fine in currently supported OS versions, so we can use them there as well. Since all supported platforms now support hard links, we can remove the alternative code paths. Rename durable_link_or_rename() to durable_rename_excl() to make the purpose more clear without referencing the implementation details. Discussion: https://www.postgresql.org/message-id/flat/72fff73f-dc9c-4ef4-83e8-d2e60c98df48%402ndquadrant.com	2020-03-11 11:23:04 +01:00
Peter Geoghegan	39eabec904	nbtree: Move fastpath NULL descent stack assertion. Commit `074251db` added an assertion that verified the fastpath/rightmost page insert optimization's assumption about free space: There should always be enough free space on the page to insert the new item without splitting the page. Otherwise, we end up using the "concurrent root page split" phony/fake stack path in _bt_insert_parent(). This does not lead to incorrect behavior, but it is likely to be far slower than simply using the regular _bt_search() path. The assertion catches serious performance bugs that would probably take a long time to detect any other way. It seems much more natural to make this assertion just before the point that we generate a fake/phony descent stack. Move the assert there. This also makes _bt_insertonpg() a bit more readable.	2020-03-10 17:25:47 -07:00
Tom Lane	c8e8b2f9df	Marginal comments and docs cleanup. Fix up some imprecise comments and poor markup from `ba79cb5dc`. Also try to convert the documentation of log_min_duration_sample and friends into passable English.	2020-03-10 17:34:09 -04:00
Peter Geoghegan	d1e241c226	nbtree: Demote minus infinity "can't happen" error. Only a very basic logic bug in a _bt_insertonpg() caller could lead to a violation of this invariant. Besides, any newitemoff used for an internal page is sanitized using other "can't happen" errors in _bt_getstackbuf() or its callers, before _bt_insertonpg() even gets called. Also, move the error/assertion from the insert-without-split path of _bt_insertonpg() to the top of the same function. There is no reason why this invariant only applies to insertions that happen to not result in a page split; cover every insertion. The assertion naturally belongs next to the existing generic assertions that document relatively high-level invariants for the item being inserted.	2020-03-10 14:15:41 -07:00
Tom Lane	cacef17223	Ensure that CREATE TABLE LIKE copies any NO INHERIT constraint property. Since the documentation about LIKE doesn't say that a copied constraint has properties different from the original, it seems that ignoring a NO INHERIT property doesn't meet the principle of least surprise. So make it copy that. (Note, however, that we still don't copy a NOT VALID property; CREATE TABLE offers no way to do that, plus it seems pointless.) Arguably this is a bug fix; but no back-patch, as it seems barely possible somebody is depending on the current behavior. Ildar Musin and Chris Travers; reviewed by Amit Langote and myself Discussion: https://postgr.es/m/CAONYFtMC6C+3AWCVp7Yd8H87Zn0GxG1_iQG6_bQKbaqYZY0=-g@mail.gmail.com	2020-03-10 14:54:00 -04:00
Tom Lane	d01f03a495	Preserve integer and float values accurately in (de)serialize_deflist. Previously, this code just smashed all types of DefElem values to strings, cavalierly reasoning that nobody would care. But in point of fact, most of the defGetFoo functions do distinguish among different input syntaxes; for instance defGetBoolean will accept 1 as an integer but not "1" as a string. This led to CREATE/ALTER TEXT SEARCH DICTIONARY accepting 0 and 1 as values for boolean dictionary properties, only to have the dictionary fail at runtime. We can upgrade this behavior by teaching serialize_deflist that it does not need to quote T_Integer or T_Float nodes' values on output, and then teaching deserialize_deflist to restore unquoted integer or float values as the appropriate node type. This should not break anything using pg_ts_dict.dictinitoption, since that field is just defined as being something valid to include in CREATE TEXT SEARCH DICTIONARY. deserialize_deflist is also used to parse the options arguments for the ts_headline family of functions, but so far as I can see this won't cause any problems there either: the only consumer of that output is prsd_headline which always uses defGetString. (Really that's a bad idea, but I won't risk changing it here.) This is surely a bug fix, but given the lack of field complaints I don't think it's necessary to back-patch. Discussion: https://postgr.es/m/CAMkU=1xRcs_BUPzR0+V3WndaCAv0E_m3h6aUEJ8NF-sY1nnHsw@mail.gmail.com	2020-03-10 12:30:02 -04:00
Alvaro Herrera	40b3e2c201	Split out CreateCast into src/backend/catalog/pg_cast.c This catalog-handling code was previously together with the rest of CastCreate() in src/backend/commands/functioncmds.c. A future patch will need a way to add casts internally, so this will be useful to have separate. Also, move the nearby get_cast_oid() function from functioncmds.c to lsyscache.c, which seems a more natural place for it. Author: Paul Jungwirth, minor edits by Álvaro Discussion: https://postgr.es/m/20200309210003.GA19992@alvherre.pgsql	2020-03-10 11:28:23 -03:00
Peter Eisentraut	3c173a53a8	Remove utils/acl.h from catalog/objectaddress.h The need for this was removed by `8b9e9644dc`. A number of files now need to include utils/acl.h or parser/parse_node.h explicitly where they previously got it indirectly somehow. Since parser/parse_node.h already includes nodes/parsenodes.h, the latter is then removed where the former was added. Also, remove nodes/pg_list.h from objectaddress.h, since that's included via nodes/parsenodes.h. Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Reviewed-by: Alvaro Herrera <alvherre@2ndquadrant.com> Discussion: https://www.postgresql.org/message-id/flat/7601e258-26b2-8481-36d0-dc9dca6f28f1%402ndquadrant.com	2020-03-10 10:27:00 +01:00
Peter Eisentraut	17b9e7f9fe	Support adding partitioned tables to publication When a partitioned table is added to a publication, changes of all of its partitions (current or future) are published via that publication. This change only affects which tables a publication considers as its members. The receiving side still sees the data coming from the individual leaf partitions. So existing restrictions that partition hierarchies can only be replicated one-to-one are not changed by this. Author: Amit Langote <amitlangote09@gmail.com> Reviewed-by: Rafia Sabih <rafia.pghackers@gmail.com> Reviewed-by: Peter Eisentraut <peter.eisentraut@2ndquadrant.com> Discussion: https://www.postgresql.org/message-id/flat/CA+HiwqH=Y85vRK3mOdjEkqFK+E=ST=eQiHdpj43L=_eJMOOznQ@mail.gmail.com	2020-03-10 09:09:32 +01:00
Michael Paquier	61d7c7bce3	Prevent reindex of invalid indexes on TOAST tables Such indexes can only be duplicated leftovers of a previously failed REINDEX CONCURRENTLY command, and a valid equivalent is guaranteed to exist. As toast indexes can only be dropped if invalid, reindexing these would lead to useless duplicated indexes that can't be dropped anymore, except if the parent relation is dropped. Thanks to Justin Pryzby for reminding that this problem was reported long ago during the review of the original patch of REINDEX CONCURRENTLY, but the issue was never addressed. Reported-by: Sergei Kornilov, Justin Pryzby Author: Julien Rouhaud Reviewed-by: Michael Paquier Discussion: https://postgr.es/m/36712441546604286%40sas1-890ba5c2334a.qloud-c.yandex.net Discussion: https://postgr.es/m/20200216190835.GA21832@telsasoft.com Backpatch-through: 12	2020-03-10 15:38:17 +09:00
Fujii Masao	71e0d0a737	Tidy up XLogSource code in xlog.c. This commit replaces 0 used as an initial value of XLogSource variable, with XLOG_FROM_ANY. Also this commit changes those variable so that XLogSource instead of int is used as the type for them. These changes are for code readability and debugger-friendliness. Author: Kyotaro Horiguchi Reviewed-by: Fujii Masao Discussion: https://postgr.es/m/20200227.124830.2197604521555566121.horikyota.ntt@gmail.com	2020-03-10 09:41:44 +09:00
Jeff Davis	24d85952a5	Introduce LogicalTapeSetExtend(). Increases the number of tapes in a logical tape set. This will be important for disk-based hash aggregation, because the maximum number of tapes is not known ahead of time. While discussing this change, it was observed to regress the performance of Sort for at least one test case. The performance regression was because some versions of GCC switch to an inlined version of memcpy() in LogicalTapeWrite() after this change. No performance regression for clang was observed. Because the regression is due to an arbitrary decision by the compiler, I decided it shouldn't hold up this change. If it needs to be fixed, we can find a workaround. Author: Adam Lee, Jeff Davis Discussion: https://postgr.es/m/e54bfec11c59689890f277722aaaabd05f78e22c.camel%40j-davis.com	2020-03-09 10:40:02 -07:00
Fujii Masao	17d3fcdc39	Fix bug that causes to report waiting in PS display twice, in hot standby. Previously "waiting" could appear twice via PS in case of lock conflict in hot standby mode. Specifically this issue happend when the delay in WAL application determined by max_standby_archive_delay and max_standby_streaming_delay had passed but it took more than 500 msec to cancel all the conflicting transactions. Especially we can observe this easily by setting those delay parameters to -1. The cause of this issue was that WaitOnLock() and ResolveRecoveryConflictWithVirtualXIDs() added "waiting" to the process title in that case. This commit prevents ResolveRecoveryConflictWithVirtualXIDs() from reporting waiting in case of lock conflict, to fix the bug. Back-patch to all back branches. Author: Masahiko Sawada Reviewed-by: Fujii Masao Discussion: https://postgr.es/m/CA+fd4k4mXWTwfQLS3RPwGr4xnfAEs1ysFfgYHvmmoUgv6Zxvmg@mail.gmail.com	2020-03-10 00:14:43 +09:00
Peter Eisentraut	71d60e2aa0	Add tg_updatedcols to TriggerData This allows a trigger function to determine for an UPDATE trigger which columns were actually updated. This allows some optimizations in generic trigger functions such as lo_manage and tsvector_update_trigger. Reviewed-by: Daniel Gustafsson <daniel@yesql.se> Discussion: https://www.postgresql.org/message-id/flat/11c5f156-67a9-0fb5-8200-2a8018eb2e0c@2ndquadrant.com	2020-03-09 09:34:55 +01:00
Peter Eisentraut	8f152b6c50	Code simplification Initialize TriggerData to 0 for the whole struct together, instead of each field separately. Reviewed-by: Daniel Gustafsson <daniel@yesql.se> Discussion: https://www.postgresql.org/message-id/flat/11c5f156-67a9-0fb5-8200-2a8018eb2e0c@2ndquadrant.com	2020-03-09 09:34:55 +01:00
Fujii Masao	ef34ab42a8	Avoid assertion failure with targeted recovery in standby mode. At the end of recovery, standby mode is turned off to re-fetch the last valid record from archive or pg_wal. Previously, if recovery target was reached and standby mode was turned off while the current WAL source was stream, recovery could try to retrieve WAL file containing the last valid record unexpectedly from stream even though not in standby mode. This caused an assertion failure. That is, the assertion test confirms that WAL file should not be retrieved from stream if standby mode is not true. This commit moves back the current WAL source to archive if it's stream even though not in standby mode, to avoid that assertion failure. This issue doesn't cause the server to crash when built with assertion disabled. In this case, the attempt to retrieve WAL file from stream not in standby mode just fails. And then recovery tries to retrieve WAL file from archive or pg_wal. Back-patch to all supported branches. Author: Kyotaro Horiguchi Reviewed-by: Fujii Masao Discussion: https://postgr.es/m/20200227.124830.2197604521555566121.horikyota.ntt@gmail.com	2020-03-09 15:32:32 +09:00
Fujii Masao	d9249441ef	Mark ssl_passphrase_command as GUC_SUPERUSER_ONLY. This commit changes the GUC ssl_passphrase_command so that it's examinable by only superuser and a member of pg_read_all_settings. Per discussion, we determined to do this because the parameter may contain a sensitive informtaion like a passphrase itself. Author: Insung Moon Reviewed-by: Keisuke Kuroda Discussion: https://postgr.es/m/CAEMmqBuHVGayc+QkYKgx3gWSdqwTAQGw+0DYn3WhcX-eNa2ntA@mail.gmail.com	2020-03-09 11:41:31 +09:00
Tom Lane	ea7dace2aa	Simplify/speed up assertion cross-check in ginCompressPostingList(). I noticed while testing some other stuff that the CHECK_ENCODING_ROUNDTRIP logic in ginCompressPostingList could account for over 50% of the runtime of an INSERT with a GIN index. While that's not relevant to production performance, it's still kind of annoying in a debug build. Replacing the loop around short memcmp's with one long memcmp works just as well and is significantly faster, at least on my machine.	2020-03-07 13:31:17 -05:00
Peter Eisentraut	8d7def5c27	Fix typo	2020-03-07 08:51:59 +01:00
Tom Lane	a6525588b7	Allow Unicode escapes in any server encoding, not only UTF-8. SQL includes provisions for numeric Unicode escapes in string literals and identifiers. Previously we only accepted those if they represented ASCII characters or the server encoding was UTF-8, making the conversion to internal form trivial. This patch adjusts things so that we'll call the appropriate encoding conversion function in less-trivial cases, allowing the escape sequence to be accepted so long as it corresponds to some character available in the server encoding. This also applies to processing of Unicode escapes in JSONB. However, the old restriction still applies to client-side JSON processing, since that hasn't got access to the server's encoding conversion infrastructure. This patch includes some lexer infrastructure that simplifies throwing errors with error cursors pointing into the middle of a string (or other complex token). For the moment I only used it for errors relating to Unicode escapes, but we might later expand the usage to some other cases. Patch by me, reviewed by John Naylor. Discussion: https://postgr.es/m/2393.1578958316@sss.pgh.pa.us	2020-03-06 14:17:43 -05:00
Tom Lane	fe30e7ebfa	Allow ALTER TYPE to change some properties of a base type. Specifically, this patch allows ALTER TYPE to: * Change the default TOAST strategy for a toastable base type; * Promote a non-toastable type to toastable; * Add/remove binary I/O functions for a type; * Add/remove typmod I/O functions for a type; * Add/remove a custom ANALYZE statistics functions for a type. The first of these can be done by the type's owner; all the others require superuser privilege since misuse could cause problems. The main motivation for this patch is to allow extensions to upgrade the feature sets of their data types, so the set of alterable properties is biased towards that use-case. However it's also true that changing some other properties would be a lot harder, as they get baked into physical storage and/or stored expressions that depend on the type. Along the way, refactor GenerateTypeDependencies() to make it easier to call, refactor DefineType's volatility checks so they can be shared by AlterType, and teach typcache.c that it might have to reload data from the type's pg_type row, a scenario it never handled before. Also rearrange alter_type.sgml a bit for clarity (put the composite-type operations together). Tomas Vondra and Tom Lane Discussion: https://postgr.es/m/20200228004440.b23ein4qvmxnlpht@development	2020-03-06 12:19:29 -05:00
Tom Lane	bb03010b9f	Remove the "opaque" pseudo-type and associated compatibility hacks. A long time ago, it was necessary to declare datatype I/O functions, triggers, and language handler support functions in a very type-unsafe way involving a single pseudo-type "opaque". We got rid of those conventions in 7.3, but there was still support in various places to automatically convert such functions to the modern declaration style, to be able to transparently re-load dumps from pre-7.3 servers. It seems unnecessary to continue to support that anymore, so take out the hacks; whereupon the "opaque" pseudo-type itself is no longer needed and can be dropped. This is part of a group of patches removing various server-side kluges for transparently upgrading pre-8.0 dump files. Since we've had few complaints about dropping pg_dump's support for dumping from pre-8.0 servers (commit `64f3524e2`), it seems okay to now remove these kluges. Discussion: https://postgr.es/m/4110.1583255415@sss.pgh.pa.us	2020-03-05 15:48:56 -05:00
Tom Lane	84eca14bc4	Remove ancient hacks to ignore certain opclass names in CREATE INDEX. Twenty years ago, we removed certain operator classes in favor of letting indexes over their data types be built with some other binary-compatible, more standard opclass. As a hack to allow existing index definitions to be dumped and reloaded, we made CREATE INDEX ignore the removed opclass names, so that such indexes would fall back to the new default opclass for their data types. This was never intended to be a long-lived thing; it carries the obvious risk of breaking some future developer's attempt to re-use those old opclass names. Since all of the cases in question are for opclasses that were removed before PG 8.0, it seems okay to get rid of these hacks now. This is part of a group of patches removing various server-side kluges for transparently upgrading pre-8.0 dump files. Since we've had few complaints about dropping pg_dump's support for dumping from pre-8.0 servers (commit `64f3524e2`), it seems okay to now remove these kluges. Discussion: https://postgr.es/m/3685.1583422389@sss.pgh.pa.us	2020-03-05 15:36:06 -05:00
Tom Lane	e58a599752	Remove ancient support for upgrading pre-7.3 foreign key constraints. Before 7.3, foreign key constraints had no explicit catalog representation, so that what pg_dump produced for them was (usually) a set of three CREATE CONSTRAINT TRIGGER commands. Commit `a2899ebdc` and some follow-on fixes added an ugly hack in CreateTrigger() to recognize that pattern and reconstruct the foreign key definition. However, we've never had any test coverage for that code, so that it's legitimate to wonder if it still works; and having to maintain it in the face of upcoming trigger-related patches seems rather pointless. Let's decree that its time has passed, and drop it. This is part of a group of patches removing various server-side kluges for transparently upgrading pre-8.0 dump files. Since we've had few complaints about dropping pg_dump's support for dumping from pre-8.0 servers (commit `64f3524e2`), it seems okay to now remove these kluges. Daniel Gustafsson Discussion: https://postgr.es/m/805874E2-999C-4CDA-856F-1AFBD9DFE933@yesql.se	2020-03-05 15:25:45 -05:00
Alvaro Herrera	a77315fdf2	Remove RangeIOData->typiofunc We used to carry the I/O function OID in RangeIOData, but it's not used for anything. Since the struct is not exposed to the world anyway, we can simplify it a bit. Also, rename the FmgrInfo member to match the accompanying 'typioparam' and put them in a more sensible order. Reviewed by Tom Lane and Paul Jungwirth. Discussion: https://postgr.es/m/20200304215711.GA8732@alvherre.pgsql	2020-03-05 11:35:02 -03:00
Michael Paquier	fbcf087112	Fix more issues with dependency handling at swap phase of REINDEX CONCURRENTLY When canceling a REINDEX CONCURRENTLY operation after swapping is done, a drop of the parent table would leave behind old indexes. This is a consequence of `68ac9cf`, which fixed the case of pg_depend bloat when repeating REINDEX CONCURRENTLY on the same relation. In order to take care of the problem without breaking the previous fix, this uses a different strategy, possible even with the exiting set of routines to handle dependency changes. The dependencies of/on the new index are additionally switched to the old one, allowing an old invalid index remaining around because of a cancellation or a failure to use the dependency links of the concurrently-created index. This ensures that dropping any objects the old invalid index depends on also drops the old index automatically. Reported-by: Julien Rouhaud Author: Michael Paquier Reviewed-by: Julien Rouhaud Discussion: https://postgr.es/m/20200227080735.l32fqcauy73lon7o@nol Backpatch-through: 12	2020-03-05 12:50:15 +09:00
Jeff Davis	c954d49046	Extend ExecBuildAggTrans() to support a NULL pointer check. Optionally push a step to check for a NULL pointer to the pergroup state. This will be important for disk-based hash aggregation in combination with grouping sets. When memory limits are reached, a given tuple may find its per-group state for some grouping sets but not others. For the former, it advances the per-group state as normal; for the latter, it skips evaluation and the calling code will have to spill the tuple and reprocess it in a later batch. Add the NULL check as a separate expression step because in some common cases it's not needed. Discussion: https://postgr.es/m/20200221202212.ssb2qpmdgrnx52sj%40alap3.anarazel.de	2020-03-04 17:29:18 -08:00
Tom Lane	3ed2005ff5	Introduce macros for typalign and typstorage constants. Our usual practice for "poor man's enum" catalog columns is to define macros for the possible values and use those, not literal constants, in C code. But for some reason lost in the mists of time, this was never done for typalign/attalign or typstorage/attstorage. It's never too late to make it better though, so let's do that. The reason I got interested in this right now is the need to duplicate some uses of the TYPSTORAGE constants in an upcoming ALTER TYPE patch. But in general, this sort of change aids greppability and readability, so it's a good idea even without any specific motivation. I may have missed a few places that could be converted, and it's even more likely that pending patches will re-introduce some hard-coded references. But that's not fatal --- there's no expectation that we'd actually change any of these values. We can clean up stragglers over time. Discussion: https://postgr.es/m/16457.1583189537@sss.pgh.pa.us	2020-03-04 10:34:25 -05:00
Tom Lane	d677550493	Allow to_date/to_timestamp to recognize non-English month/day names. to_char() has long allowed the TM (translation mode) prefix to specify output of translated month or day names; but that prefix had no effect in input format strings. Now it does. to_date() and to_timestamp() will now recognize the same month or day names that to_char() would output for the same format code. Matching is case-insensitive (per the active collation's notion of what that means), just as it has always been for English month/day names without the TM prefix. (As per the discussion thread, there are lots of cases that this feature will not handle, such as alternate day names. But being able to accept what to_char() will output seems useful enough.) In passing, fix some shaky English and violations of message style guidelines in jsonpath errors for the .datetime() method, which depends on this code. Juan José Santamaría Flecha, reviewed and modified by me, with other commentary from Alvaro Herrera, Tomas Vondra, Arthur Zakirov, Peter Eisentraut, Mark Dilger. Discussion: https://postgr.es/m/CAC+AXB3u1jTngJcoC1nAHBf=M3v-jrEfo86UFtCqCjzbWS9QhA@mail.gmail.com	2020-03-03 11:06:47 -05:00
Peter Geoghegan	1e07f5e0a1	Remove overzealous _bt_split() assertions. _bt_split() is passed NULL as its insertion scankey for internal page splits. Two recently added Assert() statements failed to consider this, leading to a crash with pg_upgrade'd BREE_VERSION < 4 indexes. Remove the assertions. The assertions in question were added by commit `0d861bbb`, which added nbtree deduplication. It would be possible to fix the assertions directly instead, but they weren't adding much anyway.	2020-03-02 21:40:11 -08:00
Michael Paquier	0b48f1335d	Fix assertion failure with ALTER TABLE ATTACH PARTITION and indexes Using ALTER TABLE ATTACH PARTITION causes an assertion failure when attempting to work on a partitioned index, because partitioned indexes cannot have partition bounds. The grammar of ALTER TABLE ATTACH PARTITION requires partition bounds, but not ALTER INDEX, so mixing ALTER TABLE with partitioned indexes is confusing. Hence, on HEAD, prevent ALTER TABLE to attach a partition if the relation involved is a partitioned index. On back-branches, as applications may rely on the existing behavior, just remove the culprit assertion. Reported-by: Alexander Lakhin Author: Amit Langote, Michael Paquier Discussion: https://postgr.es/m/16276-5cd1dcc8fb8be7b5@postgresql.org Backpatch-through: 11	2020-03-03 13:55:41 +09:00
Fujii Masao	e65497df8f	Report progress of streaming base backup. This commit adds pg_stat_progress_basebackup view that reports the progress while an application like pg_basebackup is taking a base backup. This uses the progress reporting infrastructure added by `c16dc1aca5`, adding support for streaming base backup. Bump catversion. Author: Fujii Masao Reviewed-by: Kyotaro Horiguchi, Amit Langote, Sergei Kornilov Discussion: https://postgr.es/m/9ed8b801-8215-1f3d-62d7-65bff53f6e94@oss.nttdata.com	2020-03-03 12:03:43 +09:00
Michael Paquier	d79fb88ac7	Preserve pg_index.indisclustered across REINDEX CONCURRENTLY If the flag value is lost, a CLUSTER query following REINDEX CONCURRENTLY could fail. Non-concurrent REINDEX is already handling this case consistently. Author: Justin Pryzby Discussion: https://postgr.es/m/20200229024202.GH29456@telsasoft.com Backpatch-through: 12	2020-03-03 10:12:28 +09:00
Alvaro Herrera	2f9661311b	Represent command completion tags as structs The backend was using strings to represent command tags and doing string comparisons in multiple places, but that's slow and unhelpful. Create a new command list with a supporting structure to use instead; this is stored in a tag-list-file that can be tailored to specific purposes with a caller-definable C macro, similar to what we do for WAL resource managers. The first first such uses are a new CommandTag enum and a CommandTagBehavior struct. Replace numerous occurrences of char *completionTag with a QueryCompletion struct so that the code no longer stores information about completed queries in a cstring. Only at the last moment, in EndCommand(), does this get converted to a string. EventTriggerCacheItem no longer holds an array of palloc’d tag strings in sorted order, but rather just a Bitmapset over the CommandTags. Author: Mark Dilger, with unsolicited help from Álvaro Herrera Reviewed-by: John Naylor, Tom Lane Discussion: https://postgr.es/m/981A9DB4-3F0C-4DA5-88AD-CB9CFF4D6CAD@enterprisedb.com	2020-03-02 18:19:51 -03:00
Peter Geoghegan	77b88bd5dc	Add assertions to _bt_update_posting(). Copy some assertions from _bt_form_posting() to its sibling function, _bt_update_posting(). Discussion: https://postgr.es/m/CAH2-WzkPR8KMwkL0ap976kmXwBCeukTeHz6fB-U__wvuP1S9Zg@mail.gmail.com	2020-03-02 08:07:16 -08:00
Michael Paquier	12c5cad76f	Handle logical decoding in multi-insert for catalog tuples The code path for multi-insert decoding is not stressed yet for catalogs (a future patch may introduce this capability), so no back-patch is needed. Author: Daniel Gustafsson Discussion: https://postgr.es/m/9690D72F-5C4F-4016-9572-6D16684E1D87@yesql.se	2020-03-02 10:00:37 +09:00
Peter Geoghegan	84ec9b231a	Remove dead code from _bt_update_posting(). Discussion: https://postgr.es/m/CAH2-WzmAufHiOku6AGiFD=81VQs5nYJ1L2YkhW1t+BH4CMsgRw@mail.gmail.com	2020-03-01 12:11:26 -08:00
Dean Rasheed	43a899f41f	Fix corner-case loss of precision in numeric ln(). When deciding on the local rscale to use for the Taylor series expansion, ln_var() neglected to account for the fact that the result is subsequently multiplied by a factor of 2^(nsqrt+1), where nsqrt is the number of square root operations performed in the range reduction step, which can be as high as 22 for very large inputs. This could result in a loss of precision, particularly when combined with large rscale values, for which a large number of Taylor series terms is required (up to around 400). Fix by computing a few extra digits in the Taylor series, based on the weight of the multiplicative factor log10(2^(nsqrt+1)). It remains to be proven whether or not the other 8 extra digits used for the Taylor series is appropriate, but this at least deals with the obvious oversight of failing to account for the effects of the final multiplication. Per report from Justin AnyhowStep. Reviewed by Tom Lane. Discussion: https://postgr.es/m/16280-279f299d9c06e56f@postgresql.org	2020-03-01 14:49:25 +00:00
Tom Lane	58c47ccfff	Correctly re-use hash tables in buildSubPlanHash(). Commit `356687bd8` omitted to remove leftover code for destroying a hashed subplan's hash tables, with the result that the tables were always rebuilt not reused; this leads to severe memory leakage if a hashed subplan is re-executed enough times. Moreover, the code for reusing the hashnulls table had a typo that would have made it do the wrong thing if it were reached. Looking at the code coverage report shows severe under-coverage of the potential callers of ResetTupleHashTable, so add some test cases that exercise them. Andreas Karlsson and Tom Lane, per reports from Ranier Vilela and Justin Pryzby. Backpatch to v11, as the faulty commit was. Discussion: https://postgr.es/m/edb62547-c453-c35b-3ed6-a069e4d6b937@proxel.se Discussion: https://postgr.es/m/CAEudQAo=DCebm1RXtig9OH+QivpS97sMkikt0A9qHmMUs+g6ZA@mail.gmail.com Discussion: https://postgr.es/m/20200210032547.GA1412@telsasoft.com	2020-02-29 13:48:09 -05:00
Tom Lane	6afc8aefd3	Remove obsolete comment. Noted while studying subplan hash issue.	2020-02-29 13:23:12 -05:00
Tom Lane	80d76be51c	Avoid failure if autovacuum tries to access a just-dropped temp namespace. Such an access became possible when commit `246a6c8f7` added more aggressive cleanup of orphaned temp relations by autovacuum. Since autovacuum's snapshot might be slightly stale, it could attempt to access an already-dropped temp namespace, resulting in an assertion failure or null-pointer dereference. (In practice, since we don't drop temp namespaces automatically but merely recycle them, this situation could only arise if a superuser does a manual drop of a temp namespace. Still, that should be allowed.) The core of the bug, IMO, is that isTempNamespaceInUse and its callers failed to think hard about whether to treat "temp namespace isn't there" differently from "temp namespace isn't in use". In hopes of forestalling future mistakes of the same ilk, replace that function with a new one checkTempNamespaceStatus, which makes the same tests but returns a three-way enum rather than just a bool. isTempNamespaceInUse is gone entirely in HEAD; but just in case some external code is relying on it, keep it in the back branches, as a bug-compatible wrapper around the new function. Per report originally from Prabhat Kumar Sahu, investigated by Mahendra Singh and Michael Paquier; the final form of the patch is my fault. This replaces the failed fix attempt in `a052f6cbb`. Backpatch as far as v11, as `246a6c8f7` was. Discussion: https://postgr.es/m/CAKYtNAr9Zq=1-ww4etHo-VCC-k120YxZy5OS01VkaLPaDbv2tg@mail.gmail.com	2020-02-28 20:28:34 -05:00
Jeff Davis	32bb4535a0	Fix commit `c11cb17d`. I neglected to update copyfuncs/outfuncs/readfuncs. Discussion: https://postgr.es/m/12491.1582833409%40sss.pgh.pa.us	2020-02-28 09:35:11 -08:00
Alvaro Herrera	db989184cd	Add comments on avoid reuse of parse-time snapshot Apparently, reusing the parse-time query snapshot for later steps (execution) is a frequently considered optimization ... but it doesn't work, for reasons discovered in thread [1]. Adding some comments about why it doesn't really work can relieve some future hackers from wasting time reimplementing it again. [1] https://postgr.es/m/flat/5075D8DF.6050500@fuzzy.cz Author: Michail Nikolaev Discussion: https://postgr.es/m/CANtu0ogp6cTvMJObXP8n=k+JtqxY1iT9UV5MbGCpjjPa5crCiw@mail.gmail.com	2020-02-28 13:25:26 -03:00
Peter Eisentraut	1933ae629e	Add PostgreSQL home page to --help output Per emerging standard in GNU programs and elsewhere. Autoconf already has support for specifying a home page, so we can just that. Reviewed-by: Daniel Gustafsson <daniel@yesql.se> Discussion: https://www.postgresql.org/message-id/flat/8d389c5f-7fb5-8e48-9a4a-68cec44786fa%402ndquadrant.com	2020-02-28 13:12:21 +01:00
Peter Eisentraut	864934131e	Refer to bug report address by symbol rather than hardcoding Use the PACKAGE_BUGREPORT macro that is created by Autoconf for referring to the bug reporting address rather than hardcoding it everywhere. This makes it easier to change the address and it reduces translation work. Reviewed-by: Daniel Gustafsson <daniel@yesql.se> Discussion: https://www.postgresql.org/message-id/flat/8d389c5f-7fb5-8e48-9a4a-68cec44786fa%402ndquadrant.com	2020-02-28 13:12:21 +01:00
Jeff Davis	c11cb17dc5	Save calculated transitionSpace in Agg node. This will be useful in the upcoming Hash Aggregation work to improve estimates for hash table sizing. Discussion: https://postgr.es/m/37091115219dd522fd9ed67333ee8ed1b7e09443.camel%40j-davis.com	2020-02-27 11:20:56 -08:00
Alvaro Herrera	b9b408c487	Record parents of triggers This let us get rid of a recently introduced ugly hack (commit `1fa846f1c9`). Author: Álvaro Herrera Reviewed-by: Amit Langote, Tom Lane Discussion: https://postgr.es/m/20200217215641.GA29784@alvherre.pgsql	2020-02-27 13:23:33 -03:00
Robert Haas	05d8449e73	Move src/backend/utils/hash/hashfn.c to src/common This also involves renaming src/include/utils/hashutils.h, which becomes src/include/common/hashfn.h. Perhaps an argument can be made for keeping the hashutils.h name, but it seemed more consistent to make it match the name of the file, and also more descriptive of what is actually going on here. Patch by me, reviewed by Suraj Kharage and Mark Dilger. Off-list advice on how not to break the Windows build from Davinder Singh and Amit Kapila. Discussion: http://postgr.es/m/CA+TgmoaRiG4TXND8QuM6JXFRkM_1wL2ZNhzaUKsuec9-4yrkgw@mail.gmail.com	2020-02-27 09:25:41 +05:30
Peter Geoghegan	2c0797da2c	Silence another compiler warning in nbtinsert.c. Per complaint from Álvaro Herrera.	2020-02-26 15:15:45 -08:00
Tom Lane	a477bfc1df	Suppress unnecessary RelabelType nodes in more cases. eval_const_expressions sometimes produced RelabelType nodes that were useless because they just relabeled an expression to the same exposed type it already had. This is worth avoiding because it can cause two equivalent expressions to not be equal(), preventing recognition of useful optimizations. In the test case added here, an unpatched planner fails to notice that the "sqli = constant" clause renders a sort step unnecessary, because one code path produces an extra RelabelType and another doesn't. Fix by ensuring that eval_const_expressions_mutator's T_RelabelType case will not add in an unnecessary RelabelType. Also save some code by sharing a subroutine with the effectively-equivalent cases for CollateExpr and CoerceToDomain. (CollateExpr had no bug, and I think that the case couldn't arise with CoerceToDomain, but it seems prudent to do the same check for all three cases.) Back-patch to v12. In principle this has been wrong all along, but I haven't seen a case where it causes visible misbehavior before v12, so refrain from changing stable branches unnecessarily. Per investigation of a report from Eric Gillum. Discussion: https://postgr.es/m/CAMmjdmvAZsUEskHYj=KT9sTukVVCiCSoe_PBKOXsncFeAUDPCQ@mail.gmail.com	2020-02-26 18:14:12 -05:00
Peter Geoghegan	2d8a6fad18	Silence compiler warning in nbtinsert.c. Per buildfarm member longfin.	2020-02-26 13:17:36 -08:00
Peter Geoghegan	0d861bbb70	Add deduplication to nbtree. Deduplication reduces the storage overhead of duplicates in indexes that use the standard nbtree index access method. The deduplication process is applied lazily, after the point where opportunistic deletion of LP_DEAD-marked index tuples occurs. Deduplication is only applied at the point where a leaf page split would otherwise be required. New posting list tuples are formed by merging together existing duplicate tuples. The physical representation of the items on an nbtree leaf page is made more space efficient by deduplication, but the logical contents of the page are not changed. Even unique indexes make use of deduplication as a way of controlling bloat from duplicates whose TIDs point to different versions of the same logical table row. The lazy approach taken by nbtree has significant advantages over a GIN style eager approach. Most individual inserts of index tuples have exactly the same overhead as before. The extra overhead of deduplication is amortized across insertions, just like the overhead of page splits. The key space of indexes works in the same way as it has since commit `dd299df8` (the commit that made heap TID a tiebreaker column). Testing has shown that nbtree deduplication can generally make indexes with about 10 or 15 tuples for each distinct key value about 2.5X - 4X smaller, even with single column integer indexes (e.g., an index on a referencing column that accompanies a foreign key). The final size of single column nbtree indexes comes close to the final size of a similar contrib/btree_gin index, at least in cases where GIN's posting list compression isn't very effective. This can significantly improve transaction throughput, and significantly reduce the cost of vacuuming indexes. A new index storage parameter (deduplicate_items) controls the use of deduplication. The default setting is 'on', so all new B-Tree indexes automatically use deduplication where possible. This decision will be reviewed at the end of the Postgres 13 beta period. There is a regression of approximately 2% of transaction throughput with synthetic workloads that consist of append-only inserts into a table with several non-unique indexes, where all indexes have few or no repeated values. The underlying issue is that cycles are wasted on unsuccessful attempts at deduplicating items in non-unique indexes. There doesn't seem to be a way around it short of disabling deduplication entirely. Note that deduplication of items in unique indexes is fairly well targeted in general, which avoids the problem there (we can use a special heuristic to trigger deduplication passes in unique indexes, since we're specifically targeting "version bloat"). Bump XLOG_PAGE_MAGIC because xl_btree_vacuum changed. No bump in BTREE_VERSION, since the representation of posting list tuples works in a way that's backwards compatible with version 4 indexes (i.e. indexes built on PostgreSQL 12). However, users must still REINDEX a pg_upgrade'd index to use deduplication, regardless of the Postgres version they've upgraded from. This is the only way to set the new nbtree metapage flag indicating that deduplication is generally safe. Author: Anastasia Lubennikova, Peter Geoghegan Reviewed-By: Peter Geoghegan, Heikki Linnakangas Discussion: https://postgr.es/m/55E4051B.7020209@postgrespro.ru https://postgr.es/m/4ab6e2db-bcee-f4cf-0916-3a06e6ccbb55@postgrespro.ru	2020-02-26 13:05:30 -08:00
Peter Geoghegan	612a1ab767	Add equalimage B-Tree support functions. Invent the concept of a B-Tree equalimage ("equality implies image equality") support function, registered as support function 4. This indicates whether it is safe (or not safe) to apply optimizations that assume that any two datums considered equal by an operator class's order method must be interchangeable without any loss of semantic information. This is static information about an operator class and a collation. Register an equalimage routine for almost all of the existing B-Tree opclasses. We only need two trivial routines for all of the opclasses that are included with the core distribution. There is one routine for opclasses that index non-collatable types (which returns 'true' unconditionally), plus another routine for collatable types (which returns 'true' when the collation is a deterministic collation). This patch is infrastructure for an upcoming patch that adds B-Tree deduplication. Author: Peter Geoghegan, Anastasia Lubennikova Discussion: https://postgr.es/m/CAH2-Wzn3Ee49Gmxb7V1VJ3-AC8fWn-Fr8pfWQebHe8rYRxt5OQ@mail.gmail.com	2020-02-26 11:28:25 -08:00
Andres Freund	2742c45080	expression eval: Reduce number of steps for agg transition invocations. Do so by combining the various steps that are part of aggregate transition function invocation into one larger step. As some of the current steps are only necessary for some aggregates, have one variant of the aggregate transition step for each possible combination. To avoid further manual copies of code in the different transition step implementations, move most of the code into helper functions marked as "always inline". The benefit of this change is an increase in performance when aggregating lots of rows. This comes in part due to the reduced number of indirect jumps due to the reduced number of steps, and in part by reducing redundant setup code across steps. This mainly benefits interpreted execution, but the code generated by JIT is also improved a bit. As a nice side-effect it also ends up making the code a bit simpler. A small additional optimization is removing the need to set aggstate->curaggcontext before calling ExecAggInitGroup, choosing to instead passign curaggcontext as an argument. It was, in contrast to other aggregate related functions, only needed to fetch a memory context to copy the transition value into. Author: Andres Freund Discussion: https://postgr.es/m/20191023163849.sosqbfs5yenocez3@alap3.anarazel.de https://postgr.es/m/5c371df7cee903e8cd4c685f90c6c72086d3a2dc.camel@j-davis.com	2020-02-24 15:09:09 -08:00
Michael Paquier	7d672b76bf	Issue properly WAL record for CID of first catalog tuple in multi-insert Multi-insert for heap is not yet used actively for catalogs, but the code to support this case is in place for logical decoding. The existing code forgot to issue a XLOG_HEAP2_NEW_CID record for the first tuple inserted, leading to failures when attempting to use multiple inserts for catalogs at decoding time. This commit fixes the problem by WAL-logging the needed CID. This is not an active bug, so no back-patch is done. Author: Daniel Gustafsson Discussion: https://postgr.es/m/E0D4CC67-A1CF-4DF4-991D-B3AC2EB5FAE9@yesql.se	2020-02-25 07:55:22 +09:00
Tom Lane	3d475515a1	Account explicitly for long-lived FDs that are allocated outside fd.c. The comments in fd.c have long claimed that all file allocations should go through that module, but in reality that's not always practical. fd.c doesn't supply APIs for invoking some FD-producing syscalls like pipe() or epoll_create(); and the APIs it does supply for non-virtual FDs are mostly insistent on releasing those FDs at transaction end; and in some cases the actual open() call is in code that can't be made to use fd.c, such as libpq. This has led to a situation where, in a modern server, there are likely to be seven or so long-lived FDs per backend process that are not known to fd.c. Since NUM_RESERVED_FDS is only 10, that meant we had very few spare FDs if max_files_per_process is >= the system ulimit and fd.c had opened all the files it thought it safely could. The contrib/postgres_fdw regression test, in particular, could easily be made to fall over by running it under a restrictive ulimit. To improve matters, invent functions Acquire/Reserve/ReleaseExternalFD that allow outside callers to tell fd.c that they have or want to allocate a FD that's not directly managed by fd.c. Add calls to track all the fixed FDs in a standard backend session, so that we are honestly guaranteeing that NUM_RESERVED_FDS FDs remain unused below the EMFILE limit in a backend's idle state. The coding rules for these functions say that there's no need to call them in code that just allocates one FD over a fairly short interval; we can dip into NUM_RESERVED_FDS for such cases. That means that there aren't all that many places where we need to worry. But postgres_fdw and dblink must use this facility to account for long-lived FDs consumed by libpq connections. There may be other places where it's worth doing such accounting, too, but this seems like enough to solve the immediate problem. Internally to fd.c, "external" FDs are limited to max_safe_fds/3 FDs. (Callers can choose to ignore this limit, but of course it's unwise to do so except for fixed file allocations.) I also reduced the limit on "allocated" files to max_safe_fds/3 FDs (it had been max_safe_fds/2). Conceivably a smarter rule could be used here --- but in practice, on reasonable systems, max_safe_fds should be large enough that this isn't much of an issue, so KISS for now. To avoid possible regression in the number of external or allocated files that can be opened, increase FD_MINFREE and the lower limit on max_files_per_process a little bit; we now insist that the effective "ulimit -n" be at least 64. This seems like pretty clearly a bug fix, but in view of the lack of field complaints, I'll refrain from risking a back-patch. Discussion: https://postgr.es/m/E1izCmM-0005pV-Co@gemulon.postgresql.org	2020-02-24 17:28:33 -05:00
Robert Haas	a91e2fa941	Adapt hashfn.c and hashutils.h for frontend use. hash_any() and its various variants are defined to return Datum, which is a backend-only concept, but the underlying functions actually want to return uint32 and uint64, and only return Datum because it's convenient for callers who are using them to implement a hash function for some SQL datatype. However, changing these functions to return uint32 and uint64 seems like it might lead to programming errors or back-patching difficulties, both because they are widely used and because failure to use UInt{32,64}GetDatum() might not provoke a compilation error. Instead, rename the existing functions as well as changing the return type, and add static inline wrappers for those callers that need the previous behavior. Although this commit adapts hashutils.h and hashfn.c so that they can be compiled as frontend code, it does not actually do anything that would cause them to be so compiled. That is left for another commit. Patch by me, reviewed by Suraj Kharage and Mark Dilger. Discussion: http://postgr.es/m/CA+TgmoaRiG4TXND8QuM6JXFRkM_1wL2ZNhzaUKsuec9-4yrkgw@mail.gmail.com	2020-02-24 17:27:15 +05:30
Robert Haas	9341c783cc	Put all the prototypes for hashfn.c into the same header file. Previously, some of the prototypes for functions in hashfn.c were in utils/hashutils.h and others were in utils/hsearch.h, but that is confusing and has no particular benefit. Patch by me, reviewed by Suraj Kharage and Mark Dilger. Discussion: http://postgr.es/m/CA+TgmoaRiG4TXND8QuM6JXFRkM_1wL2ZNhzaUKsuec9-4yrkgw@mail.gmail.com	2020-02-24 17:22:45 +05:30
Robert Haas	07b95c3d83	Move bitmap_hash and bitmap_match to bitmapset.c. The closely-related function bms_hash_value is already defined in that file, and this change means that hashfn.c no longer needs to depend on nodes/bitmapset.h. That gets us closer to allowing use of the hash functions in hashfn.c in frontend code. Patch by me, reviewed by Suraj Kharage and Mark Dilger. Discussion: http://postgr.es/m/CA+TgmoaRiG4TXND8QuM6JXFRkM_1wL2ZNhzaUKsuec9-4yrkgw@mail.gmail.com	2020-02-24 17:17:43 +05:30
Michael Paquier	bf883b211e	Add prefix checks in exclude lists for pg_rewind, pg_checksums and base backups An instance of PostgreSQL crashing with a bad timing could leave behind temporary pg_internal.init files, potentially causing failures when verifying checksums. As the same exclusion lists are used between pg_rewind, pg_checksums and basebackup.c, all those tools are extended with prefix checks to keep everything in sync, with dedicated checks added for pg_internal.init. Backpatch down to 11, where pg_checksums (pg_verify_checksums in 11) and checksum verification for base backups have been introduced. Reported-by: Michael Banck Author: Michael Paquier Reviewed-by: Kyotaro Horiguchi, David Steele Discussion: https://postgr.es/m/62031974fd8e941dd8351fbc8c7eff60d59c5338.camel@credativ.de Backpatch-through: 11	2020-02-24 18:13:25 +09:00
Peter Eisentraut	79c2385915	Factor out InitControlFile() from BootStrapXLOG() Right now this only makes BootStrapXLOG() a bit more manageable, but in the future there may be external callers. Discussion: https://www.postgresql.org/message-id/e8f86ba5-48f1-a80a-7f1d-b76bcb9c5c47@2ndquadrant.com	2020-02-22 12:09:27 +01:00
Peter Eisentraut	9745f93afc	Reformat code comment Discussion: https://www.postgresql.org/message-id/e8f86ba5-48f1-a80a-7f1d-b76bcb9c5c47@2ndquadrant.com	2020-02-22 12:09:27 +01:00
Tom Lane	97cf1fa4ed	Assume that we have <wchar.h>. Windows has this, and so do all other live platforms according to the buildfarm; it's been required by POSIX since SUSv2. So remove the configure probe and tests of HAVE_WCHAR_H. This is part of a series of commits to get rid of no-longer-relevant configure checks and dead src/port/ code. I'm committing them separately to make it easier to back out individual changes if they prove less portable than I expect. Discussion: https://postgr.es/m/15379.1582221614@sss.pgh.pa.us	2020-02-21 14:30:47 -05:00
Tom Lane	481c8e9232	Assume that we have utime() and <utime.h>. These are required by POSIX since SUSv2, and no live platforms fail to provide them. On Windows, utime() exists and we bring our own <utime.h>, so we're good there too. So remove the configure probes and ad-hoc substitute code. We don't need to check for utimes() anymore either, since that was only used as a substitute. In passing, make the Windows build include <sys/utime.h> only where we need it, not everywhere. This is part of a series of commits to get rid of no-longer-relevant configure checks and dead src/port/ code. I'm committing them separately to make it easier to back out individual changes if they prove less portable than I expect. Discussion: https://postgr.es/m/15379.1582221614@sss.pgh.pa.us	2020-02-21 14:30:47 -05:00
Tom Lane	abe41f453a	Assume that we have cbrt(). Windows has this, and so do all other live platforms according to the buildfarm, so remove the configure probe and float.c's substitute code. This is part of a series of commits to get rid of no-longer-relevant configure checks and dead src/port/ code. I'm committing them separately to make it easier to back out individual changes if they prove less portable than I expect. Discussion: https://postgr.es/m/15379.1582221614@sss.pgh.pa.us	2020-02-21 14:30:47 -05:00
Jeff Davis	b7fabe80df	Fixup for nodeAgg.c refactor. Commit `5b618e1f` made an unintended behavior change.	2020-02-21 09:34:20 -08:00
Etsuro Fujita	032f9ae012	Avoid redundant checks in partition_bounds_copy(). Previously, partition_bounds_copy() checked whether the strategy for the given partition bounds was hash or not, and then determined the number of elements in the datums in the datums array for the partition bounds, on each iteration of the loop for copying the datums array, but there is no need to do that. Perform the checks only once before the loop iteration. Author: Etsuro Fujita Reported-by: Amit Langote and Julien Rouhaud Discussion: https://postgr.es/m/CAPmGK14Rvxrm8DHWvCjdoks6nwZuHBPvMnWZ6rkEx2KhFeEoPQ@mail.gmail.com	2020-02-21 20:00:45 +09:00
Etsuro Fujita	53b01acd46	Remove extra word from comment.	2020-02-20 19:15:00 +09:00
Tom Lane	70a7732007	Remove support for upgrading extensions from "unpackaged" state. Andres Freund pointed out that allowing non-superusers to run "CREATE EXTENSION ... FROM unpackaged" has security risks, since the unpackaged-to-1.0 scripts don't try to verify that the existing objects they're modifying are what they expect. Just attaching such objects to an extension doesn't seem too dangerous, but some of them do more than that. We could have resolved this, perhaps, by still requiring superuser privilege to use the FROM option. However, it's fair to ask just what we're accomplishing by continuing to lug the unpackaged-to-1.0 scripts forward. None of them have received any real testing since 9.1 days, so they may not even work anymore (even assuming that one could still load the previous "loose" object definitions into a v13 database). And an installation that's trying to go from pre-9.1 to v13 or later in one jump is going to have worse compatibility problems than whether there's a trivial way to convert their contrib modules into extension style. Hence, let's just drop both those scripts and the core-code support for "CREATE EXTENSION ... FROM". Discussion: https://postgr.es/m/20200213233015.r6rnubcvl4egdh5r@alap3.anarazel.de	2020-02-19 16:59:14 -05:00
Jeff Davis	5b618e1f48	Minor refactor of nodeAgg.c. * Separate calculation of hash value from the lookup. * Split build_hash_table() into two functions. * Change lookup_hash_entry() to return AggStatePerGroup. That's all the caller needed, anyway. These changes are to support the upcoming Disk-based Hash Aggregation work. Discussion: https://postgr.es/m/31f5ab871a3ad5a1a91a7a797651f20e77ac7ce3.camel%40j-davis.com	2020-02-19 10:39:11 -08:00
Jeff Davis	8021985d79	logtape.c: allocate read buffer even for an empty tape. Prior to this commit, the read buffer was allocated at the time the tape was rewound; but as an optimization, would not be allocated at all if the tape was empty. That optimization meant that it was valid to have a rewound tape with the buffer set to NULL, but only if a number of conditions were met and only if the API was used properly. After `7fdd919a` refactored the code to support lazily-allocating the buffer, Coverity started complaining. The optimization for empty tapes doesn't seem important, so just allocate the buffer whether the tape has any data or not. Discussion: https://postgr.es/m/20351.1581868306%40sss.pgh.pa.us	2020-02-19 10:04:17 -08:00
Fujii Masao	0074919794	Fix mesurement of elapsed time during truncating heap in VACUUM. VACUUM may truncate heap in several batches. The activity report is logged for each batch, and contains the number of pages in the table before and after the truncation, and also the elapsed time during the truncation. Previously the elapsed time reported in each batch was the total elapsed time since starting the truncation until finishing each batch. For example, if the truncation was processed dividing into three batches, the second batch reported the accumulated time elapsed during both first and second batches. This is strange and confusing because the number of pages in the table reported together is not total. Instead, each batch should report the time elapsed during only that batch. The cause of this issue was that the resource usage snapshot was initialized only at the beginning of the truncation and was never reset later. This commit fixes the issue by changing VACUUM so that the resource usage snapshot is reset at each batch. Back-patch to all supported branches. Reported-by: Tatsuhito Kasahara Author: Tatsuhito Kasahara Reviewed-by: Masahiko Sawada, Fujii Masao Discussion: https://postgr.es/m/CAP0=ZVJsf=NvQuy+QXQZ7B=ZVLoDV_JzsVC1FRsF1G18i3zMGg@mail.gmail.com	2020-02-19 20:37:26 +09:00
Michael Paquier	e2e02191e2	Clean up some code, comments and docs referring to Windows 2000 and older This fixes and updates a couple of comments related to outdated Windows versions. Particularly, src/common/exec.c had a fallback implementation to read a file's line from a pipe because stdin/stdout/stderr does not exist in Windows 2000 that is removed to simplify src/common/ as there are unlikely versions of Postgres running on such platforms. Author: Michael Paquier Reviewed-by: Kyotaro Horiguchi, Juan José Santamaría Flecha Discussion: https://postgr.es/m/20191219021526.GC4202@paquier.xyz	2020-02-19 13:20:33 +09:00
Amit Kapila	e3ff789acf	Stop demanding that top xact must be seen before subxact in decoding. Manifested as ERROR: subtransaction logged without previous top-level txn record this check forbids legit behaviours like - First xl_xact_assignment record is beyond reading, i.e. earlier restart_lsn. - After restart_lsn there is some change of a subxact. - After that, there is second xl_xact_assignment (for another subxact) revealing the relationship between top and first subxact. Such a transaction won't be streamed anyway because we hadn't seen it in full. Saying for sure whether xact of some record encountered after the snapshot was deserialized can be streamed or not requires to know whether it wrote something before deserialization point --if yes, it hasn't been seen in full and can't be decoded. Snapshot doesn't have such info, so there is no easy way to relax the check. Reported-by: Hsu, John Diagnosed-by: Arseny Sher Author: Arseny Sher, Amit Kapila Reviewed-by: Amit Kapila, Dilip Kumar Backpatch-through: 9.5 Discussion: https://postgr.es/m/AB5978B2-1772-4FEE-A245-74C91704ECB0@amazon.com	2020-02-19 08:15:49 +05:30
Peter Geoghegan	fe9b92854e	Remove obsolete _bt_compare() comment. btbuild() has nothing to say about how NULL values compare in nbtree. Besides, there are _bt_compare() header comments that describe how NULL values are handled.	2020-02-18 16:07:16 -08:00
Fujii Masao	b7e51b350c	Make inherited LOCK TABLE perform access permission checks on parent table only. Previously, LOCK TABLE command through a parent table checked the permissions on not only the parent table but also the children tables inherited from it. This was a bug and inherited queries should perform access permission checks on the parent table only. This commit fixes LOCK TABLE so that it does not check the permission on children tables. This patch is applied only in the master branch. We decided not to back-patch because it's not hard to imagine that there are some applications expecting the old behavior and the change breaks their security. Author: Amit Langote Reviewed-by: Fujii Masao Discussion: https://postgr.es/m/CAHGQGwE+GauyG7POtRfRwwthAGwTjPQYdFHR97+LzA4RHGnJxA@mail.gmail.com	2020-02-18 13:13:15 +09:00
Michael Paquier	958f9fb98d	Remove duplicated words in comments Author: Daniel Gustafsson Reviewed-by: Vik Fearing Discussion: https://postgr.es/m/EBC3BFEB-664C-4063-81ED-29F1227DB012@yesql.se	2020-02-18 12:20:55 +09:00
Peter Eisentraut	c6679e4fca	Optimize update of tables with generated columns When updating a table row with generated columns, only recompute those generated columns whose base columns have changed in this update and keep the rest unchanged. This can result in a significant performance benefit. The required information was already kept in RangeTblEntry.extraUpdatedCols; we just have to make use of it. Reviewed-by: Pavel Stehule <pavel.stehule@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/b05e781a-fa16-6b52-6738-761181204567@2ndquadrant.com	2020-02-17 15:20:58 +01:00
Peter Eisentraut	ad3ae64770	Fill in extraUpdatedCols in logical replication The extraUpdatedCols field of the target RTE records which generated columns are affected by an update. This is used in a variety of places, including per-column triggers and foreign data wrappers. When an update was initiated by a logical replication subscription, this field was not filled in, so such an update would not affect generated columns in a way that is consistent with normal updates. To fix, factor out some code from analyze.c to fill in extraUpdatedCols in the logical replication worker as well. Reviewed-by: Pavel Stehule <pavel.stehule@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/b05e781a-fa16-6b52-6738-761181204567@2ndquadrant.com	2020-02-17 15:20:57 +01:00
Fujii Masao	f4ae722141	Add description about GSSOpenServer wait event into document. This commit also updates wait event enum into alphabetical order. Previously the enum entry for GSSOpenServer was added out-of-order. Back-patch to v12 where commit `b0b39f72b9` introduced GSSOpenServer wait event. In v12, the commit doesn't include the update of wait event enum, not to break ABI. Author: Fujii Masao Reviewed-by: Michael Paquier Discussion: https://postgr.es/m/949931aa-4ed4-d867-a7b5-de9c02b2292b@oss.nttdata.com	2020-02-17 16:16:08 +09:00
Tom Lane	faade5d4c6	Update obsolete comment. Noted by Justin Pryzby, though I chose to just rip out the stale text, as it's in no way relevant to this particular function. Discussion: https://postgr.es/m/20200212182337.GZ1412@telsasoft.com	2020-02-15 15:22:40 -05:00
Tom Lane	9d1ec5a8e1	Clarify coding in Catalog::AddDefaultValues. Make it a bit shorter and better-commented; no functional change. Alvaro Herrera and Tom Lane Discussion: https://postgr.es/m/20200212182337.GZ1412@telsasoft.com	2020-02-15 15:13:44 -05:00
Tom Lane	86ff085e83	Don't require pg_class.dat to contain correct relnatts values. Practically everybody who's ever added a column to one of the bootstrap catalogs has been burnt by the need to update the relnatts field in the initial pg_class data to match. Now that we use Perl scripts to generate postgres.bki, we can have the machines take care of that, by filling the field during genbki.pl. While at it, use the BKI_DEFAULTS mechanism to eliminate repetitive specifications of other column values in pg_class.dat, too. They weren't particularly a maintenance problem, but this way is prettier (certainly the spotty previous usage of BKI_DEFAULTS wasn't pretty). No catversion bump needed, since this doesn't actually change the contents of postgres.bki. Per gripe from Justin Pryzby, though this is quite different from his originally proposed solution. Amit Langote, John Naylor, Tom Lane Discussion: https://postgr.es/m/20200212182337.GZ1412@telsasoft.com	2020-02-15 14:57:27 -05:00
Jeff Davis	7fdd919ae7	Logical Tape Set: lazily allocate read buffer. The write buffer was already lazily-allocated, so this is more symmetric. It also means that a freshly-rewound tape (whether for reading or writing) is not consuming memory for the buffer. Discussion: https://postgr.es/m/97c46a59c27f3c38e486ca170fcbc618d97ab049.camel%40j-davis.com	2020-02-13 10:44:25 -08:00
Tom Lane	607f8ce74d	Avoid a performance regression in float overflow/underflow detection. Commit `6bf0bc842` replaced float.c's CHECKFLOATVAL() macro with static inline subroutines, but that wasn't too well thought out. In the original coding, the unlikely condition (isinf(result) or result == 0) was checked first, and the inf_is_valid or zero_is_valid condition only afterwards. The inline-subroutine coding caused that to be swapped around, which is pretty horrid for performance because (a) in common cases the is_valid condition is twice as expensive to evaluate (e.g., requiring two isinf() calls not one) and (b) in common cases the is_valid condition is false, requiring us to perform the unlikely-condition check anyway. Net result is that one isinf() call becomes two or three, resulting in visible performance loss as reported by Keisuke Kuroda. The original fix proposal was to revert the replacement of the macro, but on second thought, that macro was just a bad idea from the beginning: if anything it's a net negative for readability of the code. So instead, let's just open-code all the overflow/underflow tests, being careful to test the unlikely condition first (and mark it unlikely() to help the compiler get the point). Also, rather than having N copies of the actual ereport() calls, collapse those into out-of-line error subroutines to save some code space. This does mean that the error file/line numbers won't be very helpful for figuring out where the issue really is --- but we'd already burned that bridge by putting the ereports into static inlines. In HEAD, check_float[48]_val() are gone altogether. In v12, leave them present in float.h but unused in the core code, just in case some extension is depending on them. Emre Hasegeli, with some kibitzing from me and Andres Freund Discussion: https://postgr.es/m/CANDwggLe1Gc1OrRqvPfGE=kM9K0FSfia0hbeFCEmwabhLz95AA@mail.gmail.com	2020-02-13 13:37:43 -05:00
Tom Lane	0973f5602c	Remove long-dead comments. These should've been dropped by `a8bb8eb58`, but evidently were missed. Not important enough to back-patch.	2020-02-12 14:33:49 -05:00
Thomas Munro	701a51fd4e	Use pg_pwrite() in more places. This removes some lseek() system calls. Author: Thomas Munro Reviewed-by: Andres Freund Discussion: https://postgr.es/m/CA%2BhUKGJ%2BoHhnvqjn3%3DHro7xu-YDR8FPr0FL6LF35kHRX%3D_bUzg%40mail.gmail.com	2020-02-11 17:50:22 +13:00
Jeff Davis	11de6c903d	Change signature of TupleHashTableHash(). Commit `4eaea3db` introduced TupleHashTableHash(), but the signature didn't match the other exposed functions. Separate it into internal and external versions. The external version hides the details behind an API more consistent with the other external functions, and the internal version is still suitable for simplehash.	2020-02-10 10:20:10 -08:00
Alvaro Herrera	b048f558dd	Fix priv checks for ALTER <object> DEPENDS ON EXTENSION Marking an object as dependant on an extension did not have any privilege check whatsoever; this allowed any user to mark objects as droppable by anyone able to DROP EXTENSION, which could be used to cause system-wide havoc. Disallow by checking that the calling user owns the mentioned object. (No constraints are placed on the extension.) Security: CVE-2020-1720 Reported-by: Tom Lane Discussion: 31605.1566429043@sss.pgh.pa.us	2020-02-10 11:47:09 -03:00
Amit Kapila	3dfba9fdf5	Fix typos. Reported-by: Justin Pryzby Author: Justin Pryzby Discussion: https://postgr.es/m/20200206021432.GA24549@telsasoft.com	2020-02-10 09:31:18 +05:30
Tom Lane	4093ff5737	Store the deletion horizon XID for a deleted GIN page on the right page. Commit `b10714080` moved the GinPageSetDeleteXid() call to a spot where the "page" variable was pointing to the wrong page, causing the XID to be inserted on a page that's not being deleted, thus allowing later GinPageIsRecyclable tests to recycle the deleted page too soon. It might be a good idea to stop using the single "page" variable for multiple purposes in this function. But for the moment I just moved the GinPageSetDeleteXid() call down beside the GinPageSetDeleted() call, which seems like a more logical place for it anyway. Back-patch to v11, as the faulty patch was. (Fortunately, the bug hasn't made it into any release yet.) Discussion: https://postgr.es/m/21620.1581098806@sss.pgh.pa.us	2020-02-09 12:02:57 -05:00
Alvaro Herrera	55173d2e66	Fix failure to create FKs correctly in partitions On a multi-level partioned table, when adding a partition not directly connected to the root table, foreign key constraints referencing the root were not cloned to the new partition, leading to the FK being possibly inadvertently violated later on. This was caused by fuzzy thinking in CloneFkReferenced (commit `f56f8f8da6`): it was skipping constraints marked as having parents on the theory that cloning those would create duplicates; but that's only correct for the top level of the partitioning hierarchy. For levels below that one, such constraints must still be considered and only skipped if later on we see that we'd create duplicates. Apparently, I (Álvaro) wrote the comments right but the code implemented something slightly different. Author: Jehan-Guillaume de Rorthais Discussion: https://postgr.es/m/20200206004948.238352db@firost	2020-02-07 18:27:18 -03:00
Alvaro Herrera	9710d3d4a8	Fix TRUNCATE .. CASCADE on partitions When running TRUNCATE CASCADE on a child of a partitioned table referenced by another partitioned table, the truncate was not applied to partitions of the referencing table; this could leave rows violating the constraint in the referencing partitioned table. Repair by walking the pg_constraint chain all the way up to the topmost referencing table. Note: any partitioned tables containing FKs that reference other partitioned tables should be checked for possible violating rows, if TRUNCATE has occurred in partitions of the referenced table. Reported-by: Christophe Courtois Author: Jehan-Guillaume de Rorthais Discussion: https://postgr.es/m/20200204183906.115f693e@firost	2020-02-07 17:09:36 -03:00
Fujii Masao	cb5b28613d	Fix bug in Tid scan. Commit `147e3722f7` changed Tid scan so that it calls table_beginscan() and uses the scan option for seq scan. This change caused two issues. (1) The change caused Tid scan to take a predicate lock on the entire relation in serializable transaction even when relation-level lock is not necessary. This could lead to an unexpected serialization error. (2) The change caused Tid scan to increment the number of seq_scan in pg_stat_*_tables views even though it's not seq scan. This could confuse the users. This commit adds the scan option for Tid scan and makes Tid scan use it, to avoid those issues. Back-patch to v12, where the bug was introduced. Author: Tatsuhito Kasahara Reviewed-by: Kyotaro Horiguchi, Masahiko Sawada, Fujii Masao Discussion: https://postgr.es/m/CAP0=ZVKy+gTbFmB6X_UW0pP3WaeJ-fkUWHoD-pExS=at3CY76g@mail.gmail.com	2020-02-07 22:06:31 +09:00
Andres Freund	b059d2f456	jit: Reference expression step functions via llvmjit_types. The main benefit of doing so is that this allows llvm to ensure that types match - previously that'd only be detected by a crash within the called function. There were a number of cases where we passed a superfluous parameter... To avoid needing to add all the functions to llvmjit.{c,h}, instead get them from the llvm module for llvmjit_types.c. Also use that for the functions from llvmjit_types already in llvmjit.h. Author: Soumyadeep Chakraborty and Andres Freund Discussion: https://postgr.es/m/CADwEdooww3wZv-sXSfatzFRwMuwa186LyTwkBfwEW6NjtooBPA@mail.gmail.com	2020-02-06 22:29:14 -08:00
Jeff Davis	4eaea3db15	Introduce TupleHashTableHash() and LookupTupleHashEntryHash(). Expose two new entry points: one for only calculating the hash value of a tuple, and another for looking up a hash entry when the hash value is already known. This will be useful for disk-based Hash Aggregation to avoid recomputing the hash value for the same tuple after saving and restoring it from disk. Discussion: https://postgr.es/m/37091115219dd522fd9ed67333ee8ed1b7e09443.camel%40j-davis.com	2020-02-06 20:34:01 -08:00
Andres Freund	e6f86f8dd9	jit: Remove redundancies in expression evaluation code generation. This merges the code emission for a number of opcodes by handling the behavioural difference more locally. This reduces code, and also improves the generated code a bit in some cases, by removing redundant constants. Author: Andres Freund Discussion: https://postgr.es/m/20191023163849.sosqbfs5yenocez3@alap3.anarazel.de	2020-02-06 20:01:23 -08:00
Andres Freund	8c2769405f	jit: Reference functions by name in IOCOERCE steps. Previously we used constant function pointer addresses, which prevents inlining and other related optimizations. Author: Andres Freund Discussion: https://postgr.es/m/20191023163849.sosqbfs5yenocez3@alap3.anarazel.de	2020-02-06 19:54:43 -08:00
Andres Freund	1fdb7f9789	expression eval: Don't redundantly keep track of AggState. It's already tracked via ExprState->parent, so we don't need to also include it in ExprEvalStep. When that code originally was written ExprState->parent didn't exist, but it since has been introduced in `6719b238e8`. Author: Andres Freund Discussion: https://postgr.es/m/20191023163849.sosqbfs5yenocez3@alap3.anarazel.de	2020-02-06 19:54:43 -08:00
Andres Freund	1ec7679f1b	expression eval, jit: Minor code cleanups. This mostly consists of using C99 style for loops, moving variables into narrower scopes, and a smattering of other minor improvements. Done separately to make it easier to review patches with actual functional changes. Author: Andres Freund Discussion: https://postgr.es/m/20191023163849.sosqbfs5yenocez3@alap3.anarazel.de	2020-02-06 19:54:15 -08:00
Michael Paquier	5ac4e9a12c	Fix typo in proc.c Author: Julien Rouhaud Discussion: https://postgr.es/m/20200206082333.GA95343@nol	2020-02-07 12:41:10 +09:00
Michael Paquier	414c2fd1e1	Revert "Add GUC checks for ssl_min_protocol_version and ssl_max_protocol_version" This reverts commit `41aadee`, as the GUC checks could run on older values with the new values used, and result in incorrect errors if both parameters are changed at the same time. Per complaint from Tom Lane. Discussion: https://postgr.es/m/27574.1581015893@sss.pgh.pa.us Backpatch-through: 12	2020-02-07 08:10:40 +09:00
Peter Eisentraut	fc7a5e9eaa	Ensure relcache consistency around generated columns In certain transient states, it's possible that a table has attributes with attgenerated set but no default expressions in pg_attrdef yet. In that case, the old code path would not set relation->rd_att->constr->has_generated_stored, unless relation->rd_att->constr was also populated for some other reason. There was probably no practical impact, but it's better to keep this consistent. Reported-by: Andres Freund <andres@anarazel.de> Discussion: https://www.postgresql.org/message-id/flat/20200115181105.ad6ab6dlgyww3lb6%40alap3.anarazel.de	2020-02-06 21:25:01 +01:00
Jeff Davis	7d4395d0a1	Refactor hash_agg_entry_size(). Consolidate the calculations for hash table size estimation. This will help with upcoming Hash Aggregation work that will add additional call sites.	2020-02-06 11:49:56 -08:00
Jeff Davis	c02fdc9223	Logical Tape Set: use min heap for freelist. Previously, the freelist of blocks was tracked as an occasionally-sorted array. A min heap is more resilient to larger freelists or more frequent changes between reading and writing. Discussion: https://postgr.es/m/97c46a59c27f3c38e486ca170fcbc618d97ab049.camel%40j-davis.com	2020-02-06 10:09:45 -08:00
Amit Kapila	cac8ce4a73	Fix typo. Reported-by: Amit Langote Author: Amit Langote Backpatch-through: 9.6, where it was introduced Discussion: https://postgr.es/m/CA+HiwqFNADeukaaGRmTqANbed9Fd81gLi08AWe_F86_942Gspw@mail.gmail.com	2020-02-06 15:57:02 +05:30
Fujii Masao	3ccc66dac6	Fix bug in LWLock statistics mechanism. Previously PostgreSQL built with -DLWLOCK_STATS could report more than one LWLock statistics entries for the same backend process and the same LWLock. This is strange and only one statistics should be output in that case, instead. The cause of this issue is that the key variable used for LWLock stats hash table was not fully initialized. The key consists of two fields and they were initialized. But the following 4 bytes allocated in the key variable for the alignment was not initialized. So even if the same key was specified, hash_search(HASH_ENTER) could not find the existing entry for that key and created new one. This commit fixes this issue by initializing the key variable with zero. As the side effect of this commit, the volume of LWLock statistics output would be reduced very much. Back-patch to v10, where commit `3761fe3c20` introduced the issue. Author: Fujii Masao Reviewed-by: Julien Rouhaud, Kyotaro Horiguchi Discussion: https://postgr.es/m/26359edb-798a-568f-d93a-6aafac49752d@oss.nttdata.com	2020-02-06 14:43:21 +09:00
Michael Paquier	b025f32e0b	Add leader_pid to pg_stat_activity This new field tracks the PID of the group leader used with parallel query. For parallel workers and the leader, the value is set to the PID of the group leader. So, for the group leader, the value is the same as its own PID. Note that this reflects what PGPROC stores in shared memory, so as leader_pid is NULL if a backend has never been involved in parallel query. If the backend is using parallel query or has used it at least once, the value is set until the backend exits. Author: Julien Rouhaud Reviewed-by: Sergei Kornilov, Guillaume Lelarge, Michael Paquier, Tomas Vondra Discussion: https://postgr.es/m/CAOBaU_Yy5bt0vTPZ2_LUM6cUcGeqmYNoJ8-Rgto+c2+w3defYA@mail.gmail.com	2020-02-06 09:18:06 +09:00
Andrew Gierth	bf6cc19e34	Force tuple conversion when the source has missing attributes. Tuple conversion incorrectly concluded that no conversion was needed as long as all the attributes lined up. But if the source tuple has a missing attribute (from addition of a column with default), then the destination tupdesc might not reflect the same default. The typical symptom was that the affected columns would be unexpectedly NULL. Repair by always forcing conversion if the source has missing attributes, which will be filled in by the deform operation. (In theory we could optimize for when the destination has the same default, but that seemed overkill.) Backpatch to 11 where missing attributes were added. Per bug #16242. Vik Fearing (discovery, code, testing) and me (analysis, testcase). Discussion: https://postgr.es/m/16242-d1c9fca28445966b@postgresql.org	2020-02-05 20:21:20 +00:00
Alvaro Herrera	15d13e8291	Make vacuum buffer counters 64 bits wide Using 32 bit counters means they can now realistically wrap around when vacuuming extremely large tables. Because they're signed integers, stats printed by vacuum look very odd when they do. We'd love to backpatch this, but refrain because the variables are exported and could cause third-party code to break. Reviewed-by: Julien Rouhaud, Tom Lane, Michael Paquier Discussion: https://postgr.es/m/20200131205926.GA16367@alvherre.pgsql	2020-02-05 16:59:29 -03:00
Thomas Munro	815c2f0972	Add kqueue(2) support to the WaitEventSet API. Use kevent(2) to wait for events on the BSD family of operating systems and macOS. This is similar to the epoll(2) support added for Linux by commit `98a64d0bd`. Author: Thomas Munro Reviewed-by: Andres Freund, Marko Tiikkaja, Tom Lane Tested-by: Mateusz Guzik, Matteo Beccati, Keith Fiske, Heikki Linnakangas, Michael Paquier, Peter Eisentraut, Rui DeSousa, Tom Lane, Mark Wong Discussion: https://postgr.es/m/CAEepm%3D37oF84-iXDTQ9MrGjENwVGds%2B5zTr38ca73kWR7ez_tA%40mail.gmail.com	2020-02-05 17:35:57 +13:00
Thomas Munro	d9fe702a2c	Handle lack of DSM slots in parallel btree build, take 2. Commit `74618e77` added a new check intended to fix a bug, but put it in the wrong place so that parallel btree build was always disabled. Do the check after we've actually tried to create a DSM segment. Back-patch to 11, like the earlier commit. Reviewed-by: Peter Geoghegan Discussion: https://postgr.es/m/CAH2-WzmDABkJzrNnvf%2BOULK-_A_j9gkYg_Dz-H62jzNv4eKQTw%40mail.gmail.com	2020-02-05 12:27:00 +13:00
Tom Lane	7d91b604d9	Fix handling of "Subplans Removed" field in EXPLAIN output. Commit `499be013d` added this field in a rather poorly-thought-through manner, with the result being that rather than being a field of the Append or MergeAppend plan node as intended (and as it seems to be, in text format), it was actually an element of the "Plans" subgroup. At least in JSON format, that's flat out invalid syntax, because "Plans" is an array not an object. While it's not hard to move the generation of the field so that it appears where it's supposed to, this does result in a visible change in field order in text format, in cases where a Append or MergeAppend plan node has any InitPlans attached. That's slightly annoying to do in stable branches; but the alternative of continuing to emit broken non-text formats seems worse. Also, since the set of fields emitted is not supposed to be data-dependent in non-text formats, make sure that "Subplans Removed" appears in Append and MergeAppend nodes even when it's zero, in those formats. (The previous coding made it look like it could appear in some other node types such as BitmapAnd, but we don't actually support runtime pruning there, so don't emit it in those cases.) Per bug #16171 from Mahadevan Ramachandran. Fix by Daniel Gustafsson and Tom Lane, reviewed by Hamid Akhtar. Back-patch to v11 where this code came in. Discussion: https://postgr.es/m/16171-b72259ab75505fa2@postgresql.org	2020-02-04 13:07:13 -05:00
Alvaro Herrera	1c7a0b387d	Add missing break out seqscan loop in logical replication When replica identity is FULL (an admittedly unusual case), the loop that searches for tuples in execReplication.c didn't stop scanning the table when once a matching tuple was found. Add the missing 'break'. Note slight behavior change: we now return the first matching tuple rather than the last one. They are supposed to be indistinguishable anyway, so this shouldn't matter. Author: Konstantin Knizhnik Discussion: https://postgr.es/m/379743f6-ae91-b866-f7a2-5624e6d2b0a4@postgrespro.ru	2020-02-03 18:59:12 -03:00
Michael Paquier	f1f10a1ba9	Add declaration-level assertions for compile-time checks Those new assertions can be used at file scope, outside of any function for compilation checks. This commit provides implementations for C and C++, and fallback implementations. Author: Peter Smith Reviewed-by: Andres Freund, Kyotaro Horiguchi, Dagfinn Ilmari Mannsåker, Michael Paquier Discussion: https://postgr.es/m/201DD0641B056142AC8C6645EC1B5F62014B8E8030@SYD1217	2020-02-03 14:48:42 +09:00
Andrew Gierth	1fd687a035	Optimizations for integer to decimal output. Using a lookup table of digit pairs reduces the number of divisions needed, and calculating the length upfront saves some work; these ideas are taken from the code previously committed for floats. David Fetter, reviewed by Kyotaro Horiguchi, Tels, and me. Discussion: https://postgr.es/m/20190924052620.GP31596%40fetter.org	2020-02-01 21:57:14 +00:00
Thomas Munro	93745f1e01	Fix memory leak on DSM slot exhaustion. If we attempt to create a DSM segment when no slots are available, we should return the memory to the operating system. Previously we did that if the DSM_CREATE_NULL_IF_MAXSEGMENTS flag was passed in, but we didn't do it if an error was raised. Repair. Back-patch to 9.4, where DSM segments arrived. Author: Thomas Munro Reviewed-by: Robert Haas Reported-by: Julian Backes Discussion: https://postgr.es/m/CA%2BhUKGKAAoEw-R4om0d2YM4eqT1eGEi6%3DQot-3ceDR-SLiWVDw%40mail.gmail.com	2020-02-01 14:29:13 +13:00
Tom Lane	870ad6a59b	Fix not-quite-right string comparison in parse_jsonb_index_flags(). This code would accept "strinX", where X is any 1-byte character, as meaning "string". Clearly it wasn't meant to do that. No back-patch, since this doesn't affect correct queries and there's some tiny chance we'd break somebody's incorrect query in a minor release. Report and patch by Dominik Czarnota. Discussion: https://postgr.es/m/CABEVAa1dU0mDCAfaT8WF2adVXTDsLVJy_izotg6ze_hh-cn8qQ@mail.gmail.com	2020-01-31 17:26:40 -05:00
Tom Lane	74b35eb468	Fix CheckAttributeType's handling of collations for ranges. Commit `fc7695891` changed CheckAttributeType to recurse into ranges, but made it pass down the wrong collation (always InvalidOid, since ranges as such have no collation). This would result in guaranteed failure when considering a range type whose subtype is collatable. Embarrassingly, we lack any regression tests that would expose such a problem (but fortunately, somebody noticed before we shipped this bug in any release). Fix it to pass down the range's subtype collation property instead, and add some regression test cases to exercise collatable-subtype ranges a bit more. Back-patch to all supported branches, as the previous patch was. Report and patch by Julien Rouhaud, test cases tweaked by me Discussion: https://postgr.es/m/CAOBaU_aBWqNweiGUFX0guzBKkcfJ8mnnyyGC_KBQmO12Mj5f_A@mail.gmail.com	2020-01-31 17:03:55 -05:00
Peter Eisentraut	7c23bfd25c	Sprinkle some const decorations This might help clarify the API a bit.	2020-01-31 12:52:08 +01:00
Thomas Munro	ef02fb15a3	Report time spent in posix_fallocate() as a wait event. When allocating DSM segments with posix_fallocate() on Linux (see commit `899bd785`), report this activity as a wait event exactly as we would if we were using file-backed DSM rather than shm_open()-backed DSM. Author: Thomas Munro Discussion: https://postgr.es/m/CA%2BhUKGKCSh4GARZrJrQZwqs5SYp0xDMRr9Bvb%2BHQzJKvRgL6ZA%40mail.gmail.com	2020-01-31 17:29:41 +13:00
Thomas Munro	d061ea21fc	Adjust DSM and DSA slot usage constants. When running a lot of large parallel queries concurrently, or a plan with a lot of separate Gather nodes, it is possible to run out of DSM slots. There are better solutions to these problems requiring architectural redesign work, but for now, let's adjust the constants so that it's more difficult to hit the limit. 1. Previously, a DSA area would create up to four segments at each size before doubling the size. After this commit, it will create only two at each size, so it ramps up faster and therefore needs fewer slots. 2. Previously, the total limit on DSM slots allowed for 2 per connection. Switch to 5 per connection. Also remove an obsolete nearby comment. Author: Thomas Munro Reviewed-by: Robert Haas, Andres Freund Discussion: https://postre.es/m/CA%2BhUKGL6H2BpGbiF7Lj6QiTjTGyTLW_vLR%3DSn2tEBeTcYXiMKw%40mail.gmail.com	2020-01-31 17:29:38 +13:00
Thomas Munro	74618e77b4	Handle lack of DSM slots in parallel btree build. If no DSM slots are available, a ParallelContext can still be created, but its seg pointer is NULL. Teach parallel btree build to cope with that by falling back to a regular non-parallel build, to avoid crashing with a segmentation fault. Back-patch to 11, where parallel CREATE INDEX landed. Reported-by: Nicola Contu Reviewed-by: Peter Geoghegan Discussion: https://postgr.es/m/CA%2BhUKGJgJEBnkuODBVomyK3MWFvDBbMVj%3Dgdt6DnRPU-5sQ6UQ%40mail.gmail.com	2020-01-31 10:25:34 +13:00
Alvaro Herrera	c9d2977519	Clean up newlines following left parentheses We used to strategically place newlines after some function call left parentheses to make pgindent move the argument list a few chars to the left, so that the whole line would fit under 80 chars. However, pgindent no longer does that, so the newlines just made the code vertically longer for no reason. Remove those newlines, and reflow some of those lines for some extra naturality. Reviewed-by: Michael Paquier, Tom Lane Discussion: https://postgr.es/m/20200129200401.GA6303@alvherre.pgsql	2020-01-30 13:42:14 -03:00
Alvaro Herrera	4e89c79a52	Remove excess parens in ereport() calls Cosmetic cleanup, not worth backpatching. Discussion: https://postgr.es/m/20200129200401.GA6303@alvherre.pgsql Reviewed-by: Tom Lane, Michael Paquier	2020-01-30 13:32:04 -03:00
Fujii Masao	e6f1e560e4	Make inherited TRUNCATE perform access permission checks on parent table only. Previously, TRUNCATE command through a parent table checked the permissions on not only the parent table but also the children tables inherited from it. This was a bug and inherited queries should perform access permission checks on the parent table only. This commit fixes that bug. Back-patch to all supported branches. Author: Amit Langote Reviewed-by: Fujii Masao Discussion: https://postgr.es/m/CAHGQGwFHdSvifhJE+-GSNqUHSfbiKxaeQQ7HGcYz6SC2n_oDcg@mail.gmail.com	2020-01-31 00:42:06 +09:00
Michael Paquier	b0afdcad21	Fix slot data persistency when advancing physical replication slots Advancing a physical replication slot with pg_replication_slot_advance() did not mark the slot as dirty if any advancing was done, preventing the follow-up checkpoint to flush the slot data to disk. This caused the advancing to be lost even on clean restarts. This does not happen for logical slots as any advancing marked the slot as dirty. Per discussion, the original feature has been implemented so as in the event of a crash the slot may move backwards to a past LSN. This property is kept and more documentation is added about that. This commit adds some new TAP tests to check the persistency of physical and logical slots after advancing across clean restarts. Author: Alexey Kondratov, Michael Paquier Reviewed-by: Andres Freund, Kyotaro Horiguchi, Craig Ringer Discussion: https://postgr.es/m/059cc53a-8b14-653a-a24d-5f867503b0ee@postgrespro.ru Backpatch-through: 11	2020-01-30 11:14:02 +09:00
Tom Lane	50fc694e43	Invent "trusted" extensions, and remove the pg_pltemplate catalog. This patch creates a new extension property, "trusted". An extension that's marked that way in its control file can be installed by a non-superuser who has the CREATE privilege on the current database, even if the extension contains objects that normally would have to be created by a superuser. The objects within the extension will (by default) be owned by the bootstrap superuser, but the extension itself will be owned by the calling user. This allows replicating the old behavior around trusted procedural languages, without all the special-case logic in CREATE LANGUAGE. We have, however, chosen to loosen the rules slightly: formerly, only a database owner could take advantage of the special case that allowed installation of a trusted language, but now anyone who has CREATE privilege can do so. Having done that, we can delete the pg_pltemplate catalog, moving the knowledge it contained into the extension script files for the various PLs. This ends up being no change at all for the in-core PLs, but it is a large step forward for external PLs: they can now have the same ease of installation as core PLs do. The old "trusted PL" behavior was only available to PLs that had entries in pg_pltemplate, but now any extension can be marked trusted if appropriate. This also removes one of the stumbling blocks for our Python 2 -> 3 migration, since the association of "plpythonu" with Python 2 is no longer hard-wired into pg_pltemplate's initial contents. Exactly where we go from here on that front remains to be settled, but one problem is fixed. Patch by me, reviewed by Peter Eisentraut, Stephen Frost, and others. Discussion: https://postgr.es/m/5889.1566415762@sss.pgh.pa.us	2020-01-29 18:42:43 -05:00
Robert Haas	beb4699091	Move jsonapi.c and jsonapi.h to src/common. To make this work, (1) makeJsonLexContextCstringLen now takes the encoding to be used as an argument; (2) check_stack_depth() is made to do nothing in frontend code, and (3) elog(ERROR, ...) is changed to pg_log_fatal + exit in frontend code. Mark Dilger, reviewed and slightly revised by me. Discussion: http://postgr.es/m/CA+TgmoYfOXhd27MUDGioVh6QtpD0C1K-f6ObSA10AWiHBAL5bA@mail.gmail.com	2020-01-29 10:22:51 -05:00
Peter Eisentraut	dc788668bb	Fail if recovery target is not reached Before, if a recovery target is configured, but the archive ended before the target was reached, recovery would end and the server would promote without further notice. That was deemed to be pretty wrong. With this change, if the recovery target is not reached, it is a fatal error. Based-on-patch-by: Leif Gunnar Erlandsen <leif@lako.no> Reviewed-by: Kyotaro Horiguchi <horikyota.ntt@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/993736dd3f1713ec1f63fc3b653839f5@lako.no	2020-01-29 15:58:14 +01:00
Tom Lane	01d9676a53	Fix dangling pointer in EvalPlanQual machinery. EvalPlanQualStart() supposed that it could re-use the relsubs_rowmark and relsubs_done arrays from a prior instantiation. But since they are allocated in the es_query_cxt of the recheckestate, that's just wrong; EvalPlanQualEnd() will blow away that storage. Therefore we were using storage that could have been reallocated to something else, causing all sorts of havoc. I think this was modeled on the old code's handling of es_epqTupleSlot, but since the code was anyway clearing the arrays at re-use, there's clearly no expectation of importing any outside state. So it's just a dubious savings of a couple of pallocs, which is negligible compared to setting up a new planstate tree. Therefore, just allocate the arrays always. (I moved the allocations slightly for readability.) In principle this bug could cause a problem whenever EPQ rechecks are needed in more than one target table of a ModifyTable plan node. In practice it seems not quite so easy to trigger as that; I couldn't readily duplicate a crash with a partitioned target table, for instance. That's probably down to incidental choices about when to free or reallocate stuff. The added isolation test case does seem to reliably show an assertion failure, though. Per report from Oleksii Kliukin. Back-patch to v12 where the bug was introduced (evidently by commit `3fb307bc4`). Discussion: https://postgr.es/m/EEF05F66-2871-4786-992B-5F45C92FEE2E@hintbits.com	2020-01-28 17:26:37 -05:00
Heikki Linnakangas	30012a04a6	Fix randAccess setting in ReadRecord() Commit `38a957316d` got this backwards. Author: Kyotaro Horiguchi Discussion: https://www.postgresql.org/message-id/20200128.194408.2260703306774646445.horikyota.ntt@gmail.com	2020-01-28 12:55:30 +02:00
Thomas Munro	11da6bccd1	Fix compile error on HP C. Per build farm animal anole, after commit `6f38d4dac3`.	2020-01-28 20:30:40 +13:00
Thomas Munro	78aaa0e823	Don't reset latch in ConditionVariablePrepareToSleep(). It's not OK to do that without calling CHECK_FOR_INTERRUPTS(). Let the next wait loop deal with it, following the usual pattern. One consequence of this bug was that a SIGTERM delivered in a very narrow timing window could leave a parallel worker process waiting forever for a condition variable that will never be signaled, after an error was raised in other process. The code is a bit different in the stable branches due to commit `1321509f`, making problems less likely there. No back-patch for now, but we may finish up deciding to make a similar change after more discussion. Author: Thomas Munro Reviewed-by: Shawn Debnath Reported-by: Tomas Vondra Discussion: https://postgr.es/m/CA%2BhUKGJOm8zZHjVA8svoNT3tHY0XdqmaC_kHitmgXDQM49m1dA%40mail.gmail.com	2020-01-28 15:28:36 +13:00
Amit Kapila	05f18c6b6b	Added relation name in error messages for constraint checks. This gives more information to the user about the error and it makes such messages consistent with the other similar messages in the code. Reported-by: Simon Riggs Author: Mahendra Singh and Simon Riggs Reviewed-by: Beena Emerson and Amit Kapila Discussion: https://postgr.es/m/CANP8+j+7YUvQvGxTrCiw77R23enMJ7DFmyA3buR+fa2pKs4XhA@mail.gmail.com	2020-01-28 07:48:10 +05:30
Michael Paquier	ff8ca5fadd	Add connection parameters to control SSL protocol min/max in libpq These two new parameters, named sslminprotocolversion and sslmaxprotocolversion, allow to respectively control the minimum and the maximum version of the SSL protocol used for the SSL connection attempt. The default setting is to allow any version for both the minimum and the maximum bounds, causing libpq to rely on the bounds set by the backend when negotiating the protocol to use for an SSL connection. The bounds are checked when the values are set at the earliest stage possible as this makes the checks independent of any SSL implementation. Author: Daniel Gustafsson Reviewed-by: Michael Paquier, Cary Huang Discussion: https://postgr.es/m/4F246AE3-A7AE-471E-BD3D-C799D3748E03@yesql.se	2020-01-28 10:40:48 +09:00
Thomas Munro	6f38d4dac3	Remove dependency on HeapTuple from predicate locking functions. The following changes make the predicate locking functions more generic and suitable for use by future access methods: - PredicateLockTuple() is renamed to PredicateLockTID(). It takes ItemPointer and inserting transaction ID instead of HeapTuple. - CheckForSerializableConflictIn() takes blocknum instead of buffer. - CheckForSerializableConflictOut() no longer takes HeapTuple or buffer. Author: Ashwin Agrawal Reviewed-by: Andres Freund, Kuntal Ghosh, Thomas Munro Discussion: https://postgr.es/m/CALfoeiv0k3hkEb3Oqk%3DziWqtyk2Jys1UOK5hwRBNeANT_yX%2Bng%40mail.gmail.com	2020-01-28 13:13:04 +13:00
Tom Lane	4589c6a2a3	Apply project best practices to switches over enum values. In the wake of `1f3a02173`, assorted buildfarm members were warning about "control reaches end of non-void function" or the like. Do what we've done elsewhere: in place of a "default" switch case that will prevent the compiler from warning about unhandled enum values, put a catchall elog() after the switch. And return a dummy value to satisfy compilers that don't know elog() doesn't return.	2020-01-27 18:46:30 -05:00
Robert Haas	73ce2a03f3	Move some code from jsonapi.c to jsonfuncs.c. Specifically, move those functions that depend on ereport() from jsonapi.c to jsonfuncs.c, in preparation for allowing jsonapi.c to be used from frontend code. A few cases where elog(ERROR, ...) is used for can't-happen conditions are left alone; we can handle those in some other way in frontend code. Reviewed by Mark Dilger and Andrew Dunstan. Discussion: http://postgr.es/m/CA+TgmoYfOXhd27MUDGioVh6QtpD0C1K-f6ObSA10AWiHBAL5bA@mail.gmail.com	2020-01-27 11:22:13 -05:00
Robert Haas	1f3a021730	Adjust pg_parse_json() so that it does not directly ereport(). Instead, it now returns a value indicating either success or the type of error which occurred. The old behavior is still available by calling pg_parse_json_or_ereport(). If the new interface is used, an error can be thrown by passing the return value of pg_parse_json() to json_ereport_error(). pg_parse_json() can still elog() in can't-happen cases, but it seems like that issue is best handled separately. Adjust json_lex() and json_count_array_elements() to return an error code, too. This is all in preparation for making the backend's json parser available to frontend code. Reviewed and/or tested by Mark Dilger and Andrew Dunstan. Discussion: http://postgr.es/m/CA+TgmoYfOXhd27MUDGioVh6QtpD0C1K-f6ObSA10AWiHBAL5bA@mail.gmail.com	2020-01-27 11:04:51 -05:00
Thomas Munro	3e4818e9dd	Avoid unnecessary shm writes in Parallel Hash Join. Currently, Parallel Hash Join cannot be used for full/right joins, so there is no point in setting the match flag. It turns out that the cache coherence traffic generated by those writes slows down large systems running many-core joins, so let's stop doing that. In future, if we need to use match bits in parallel joins, we might want to consider setting them only if not already set. Back-patch to 11, where Parallel Hash Join arrived. Reported-by: Deng, Gang Discussion: https://postgr.es/m/0F44E799048C4849BAE4B91012DB910462E9897A%40SHSMSX103.ccr.corp.intel.com	2020-01-27 15:07:03 +13:00
Michael Paquier	10a525230f	Fix some memory leaks and improve restricted token handling on Windows The leaks have been detected by a Coverity run on Windows. No backpatch is done as the leaks are minor. While on it, make restricted token creation more consistent in its error handling by logging an error instead of a warning if missing advapi32.dll, which was missing in the NT4 days. Any modern platform should have this DLL around. Now, if the library is not there, an error is still reported back to the caller, and nothing is done do there is no behavior change done in this commit. Author: Ranier Vilela Discussion: https://postgr.es/m/CAEudQApa9MG0foPkgPX87fipk=vhnF2Xfg+CfUyR08h4R7Mywg@mail.gmail.com	2020-01-27 11:02:05 +09:00
Tom Lane	3ec20c7091	Fix EXPLAIN (SETTINGS) to follow policy about when to print empty fields. In non-TEXT output formats, the "Settings" field should appear when requested, even if it would be empty. Also, get rid of the premature optimization of counting all the GUC_EXPLAIN variables at startup. Since there was no provision for adjusting that count later, all it'd take would be some extension marking a parameter as GUC_EXPLAIN to risk an assertion failure or memory stomp. We could make get_explain_guc_options() count those variables on-the-fly, or dynamically resize its array ... but TBH I do not think that making a transient array of pointers a bit smaller is worth any extra complication, especially when you consider all the other transient space EXPLAIN eats. So just allocate that array at the max possible size. In HEAD, also add some regression test coverage for this feature. Because of the memory-stomp hazard, back-patch to v12 where this feature was added. Discussion: https://postgr.es/m/19416.1580069629@sss.pgh.pa.us	2020-01-26 16:32:19 -05:00
Thomas Munro	f37ff03478	Refactor confusing code in _mdfd_openseg(). As reported independently by a couple of people, _mdfd_openseg() is coded in a way that seems to imply that the segments could be opened in an order that isn't strictly sequential. Even if that were true, it's also using the wrong comparison. It's not an active bug, since the condition is always true anyway, but it's confusing, so replace it with an assertion. Author: Thomas Munro Reviewed-by: Andres Freund, Kyotaro Horiguchi, Noah Misch Discussion: https://postgr.es/m/CA%2BhUKG%2BNBw%2BuSzxF1os-SO6gUuw%3DcqO5DAybk6KnHKzgGvxhxA%40mail.gmail.com Discussion: https://postgr.es/m/20191222091930.GA1280238%40rfd.leadboat.com	2020-01-27 09:12:56 +13:00
Heikki Linnakangas	38a957316d	Refactor XLogReadRecord(), adding XLogBeginRead() function. The signature of XLogReadRecord() required the caller to pass the starting WAL position as argument, or InvalidXLogRecPtr to continue reading at the end of previous record. That's slightly awkward to the callers, as most of them don't want to randomly jump around in the WAL stream, but start reading at one position and then read everything from that point onwards. Remove the 'RecPtr' argument and add a new function XLogBeginRead() to specify the starting position instead. That's more convenient for the callers. Also, xlogreader holds state that is reset when you change the starting position, so having a separate function for doing that feels like a more natural fit. This changes XLogFindNextRecord() function so that it doesn't reset the xlogreader's state to what it was before the call anymore. Instead, it positions the xlogreader to the found record, like XLogBeginRead(). Reviewed-by: Kyotaro Horiguchi, Alvaro Herrera Discussion: https://www.postgresql.org/message-id/5382a7a3-debe-be31-c860-cb810c08f366%40iki.fi	2020-01-26 11:39:00 +02:00
Tom Lane	1001368497	Clean up EXPLAIN's handling of per-worker details. Previously, it was possible for EXPLAIN ANALYZE of a parallel query to produce several different "Workers" fields for a single plan node, because different portions of explain.c independently generated per-worker data and wrapped that output in separate fields. This is pretty bogus, especially for the structured output formats: even if it's not technically illegal, most programs would have a hard time dealing with such data. To improve matters, add infrastructure that allows redirecting per-worker values into a side data structure, and then collect that data into a single "Workers" field after we've finished running all the relevant code for a given plan node. There are a few visible side-effects: * In text format, instead of something like Sort Method: external merge Disk: 4920kB Worker 0: Sort Method: external merge Disk: 5880kB Worker 1: Sort Method: external merge Disk: 5920kB Buffers: shared hit=682 read=10188, temp read=1415 written=2101 Worker 0: actual time=130.058..130.324 rows=1324 loops=1 Buffers: shared hit=337 read=3489, temp read=505 written=739 Worker 1: actual time=130.273..130.512 rows=1297 loops=1 Buffers: shared hit=345 read=3507, temp read=505 written=744 you get Sort Method: external merge Disk: 4920kB Buffers: shared hit=682 read=10188, temp read=1415 written=2101 Worker 0: actual time=130.058..130.324 rows=1324 loops=1 Sort Method: external merge Disk: 5880kB Buffers: shared hit=337 read=3489, temp read=505 written=739 Worker 1: actual time=130.273..130.512 rows=1297 loops=1 Sort Method: external merge Disk: 5920kB Buffers: shared hit=345 read=3507, temp read=505 written=744 * When JIT is enabled, any relevant per-worker JIT stats are attached to the child node of the Gather or Gather Merge node, which is where the other per-worker output has always been. Previously, that info was attached directly to a Gather node, or missed entirely for Gather Merge. * A query's summary JIT data no longer includes a bogus "Worker Number: -1" field. A notable code-level change is that indenting for lines of text-format output should now be handled by calling "ExplainIndentText(es)", instead of hard-wiring how much space to emit. This seems a good deal cleaner anyway. This patch also adds a new "explain.sql" regression test script that's dedicated to testing EXPLAIN. There is more that can be done in that line, certainly, but for now it just adds some coverage of the XML and YAML output formats, which had been completely untested. Although this is surely a bug fix, it's not clear that people would be happy with rearranging EXPLAIN output in a minor release, so apply to HEAD only. Maciek Sakrejda and Tom Lane, based on an idea of Andres Freund's; reviewed by Georgios Kokolatos Discussion: https://postgr.es/m/CAOtHd0AvAA8CLB9Xz0wnxu1U=zJCKrr1r4QwwXi_kcQsHDVU=Q@mail.gmail.com	2020-01-25 18:16:42 -05:00
Dean Rasheed	13661ddd7e	Add functions gcd() and lcm() for integer and numeric types. These compute the greatest common divisor and least common multiple of a pair of numbers using the Euclidean algorithm. Vik Fearing, reviewed by Fabien Coelho. Discussion: https://postgr.es/m/adbd3e0b-e3f1-5bbc-21db-03caf1cef0f7@2ndquadrant.com	2020-01-25 14:00:59 +00:00
Robert Haas	530609aa42	Remove jsonapi.c's lex_accept(). At first glance, this function seems useful, but it actually increases the amount of code required rather than decreasing it. Inline the logic into the callers instead; most callers don't use the 'lexeme' argument for anything and as a result considerable simplification is possible. Along the way, fix the header comment for the nearby function lex_expect(), which mislabeled it as lex_accept(). Patch by me, reviewed by David Steele, Mark Dilger, and Andrew Dunstan. Discussion: http://postgr.es/m/CA+TgmoYfOXhd27MUDGioVh6QtpD0C1K-f6ObSA10AWiHBAL5bA@mail.gmail.com	2020-01-24 10:29:52 -08:00
Robert Haas	11b5e3e35d	Split JSON lexer/parser from 'json' data type support. Keep the code that pertains to the 'json' data type in json.c, but move the lexing and parsing code to a new file jsonapi.c, a name I chose because the corresponding prototypes are in jsonapi.h. This seems like a logical division, because the JSON lexer and parser are also used by the 'jsonb' data type, but the SQL-callable functions in json.c are a separate thing. Also, the new jsonapi.c file needs to include far fewer header files than json.c, which seems like a good sign that this is an appropriate place to insert an abstraction boundary. I took the opportunity to remove a few apparently-unneeded includes from json.c at the same time. Patch by me, reviewed by David Steele, Mark Dilger, and Andrew Dunstan. The previous commit was, too, but I forgot to note it in the commit message. Discussion: http://postgr.es/m/CA+TgmoYfOXhd27MUDGioVh6QtpD0C1K-f6ObSA10AWiHBAL5bA@mail.gmail.com	2020-01-24 10:17:43 -08:00
Robert Haas	ce0425b162	Adjust src/include/utils/jsonapi.h so it's not backend-only. The major change here is that we no longer include jsonb.h into jsonapi.h. The reason that was necessary is that jsonapi.h included several prototypes functions in jsonfuncs.c that depend on the Jsonb type. Move those prototypes to a new header, jsonfuncs.h, and include it where needed. The other change is that JsonEncodeDateTime is now declared in json.h rather than jsonapi.h. Taken together, these steps eliminate all dependencies of jsonapi.h on backend-only data types and header files, so that it can potentially be included in frontend code.	2020-01-24 09:58:37 -08:00
Fujii Masao	d694e0bb79	Add pg_file_sync() to adminpack extension. This function allows us to fsync the specified file or directory. It's useful, for example, when we want to sync the file that pg_file_write() writes out or that COPY TO exports the data into, for durability. Author: Fujii Masao Reviewed-By: Julien Rouhaud, Arthur Zakirov, Michael Paquier, Atsushi Torikoshi Discussion: https://www.postgresql.org/message-id/CAHGQGwGY8uzZ_k8dHRoW1zDcy1Z7=5GQ+So4ZkVy2u=nLsk=hA@mail.gmail.com	2020-01-24 20:42:52 +09:00
Tom Lane	9a3a75cb81	Fix an oversight in commit `4c70098ff`. I had supposed that the from_char_seq_search() call sites were all passing the constant arrays you'd expect them to pass ... but on looking closer, the one for DY format was passing the days[] array not days_short[]. This accidentally worked because the day abbreviations in English are all the same as the first three letters of the full day names. However, once we took out the "maximum comparison length" logic, it stopped working. As penance for that oversight, add regression test cases covering this, as well as every other switch case in DCH_from_char() that was not reached according to the code coverage report. Also, fold the DCH_RM and DCH_rm cases into one --- now that seq_search is case independent, there's no need to pass different comparison arrays for those cases. Back-patch, as the previous commit was.	2020-01-23 16:15:32 -05:00
Tom Lane	4c70098ffa	Clean up formatting.c's logic for matching constant strings. seq_search(), which is used to match input substrings to constants such as month and day names, had a lot of bizarre and unnecessary behaviors. It was mostly possible to avert our eyes from that before, but we don't want to duplicate those behaviors in the upcoming patch to allow recognition of non-English month and day names. So it's time to clean this up. In particular: * seq_search scribbled on the input string, which is a pretty dangerous thing to do, especially in the badly underdocumented way it was done here. Fortunately the input string is a temporary copy, but that was being made three subroutine levels away, making it something easy to break accidentally. The behavior is externally visible nonetheless, in the form of odd case-folding in error reports about unrecognized month/day names. The scribbling is evidently being done to save a few calls to pg_tolower, but that's such a cheap function (at least for ASCII data) that it's pretty pointless to worry about. In HEAD I switched it to be pg_ascii_tolower to ensure it is cheap in all cases; but there are corner cases in Turkish where this'd change behavior, so leave it as pg_tolower in the back branches. * seq_search insisted on knowing the case form (all-upper, all-lower, or initcap) of the constant strings, so that it didn't have to case-fold them to perform case-insensitive comparisons. This likewise seems like excessive micro-optimization, given that pg_tolower is certainly very cheap for ASCII data. It seems unsafe to assume that we know the case form that will come out of pg_locale.c for localized month/day names, so it's better just to define the comparison rule as "downcase all strings before comparing". (The choice between downcasing and upcasing is arbitrary so far as English is concerned, but it might not be in other locales, so follow citext's lead here.) * seq_search also had a parameter that'd cause it to report a match after a maximum number of characters, even if the constant string were longer than that. This was not actually used because no caller passed a value small enough to cut off a comparison. Replicating that behavior for localized month/day names seems expensive as well as useless, so let's get rid of that too. * from_char_seq_search used the maximum-length parameter to truncate the input string in error reports about not finding a matching name. This leads to rather confusing reports in many cases. Worse, it is outright dangerous if the input string isn't all-ASCII, because we risk truncating the string in the middle of a multibyte character. That'd lead either to delivering an illegible error message to the client, or to encoding-conversion failures that obscure the actual data problem. Get rid of that in favor of truncating at whitespace if any (a suggestion due to Alvaro Herrera). In addition to fixing these things, I const-ified the input string pointers of DCH_from_char and its subroutines, to make sure there aren't any other scribbling-on-input problems. The risk of generating a badly-encoded error message seems like enough of a bug to justify back-patching, so patch all supported branches. Discussion: https://postgr.es/m/29432.1579731087@sss.pgh.pa.us	2020-01-23 13:42:09 -05:00
Michael Paquier	f942dfb952	Clarify some comments in vacuumlazy.c Author: Justin Pryzby Discussion: https://postgr.es/m/20200113004542.GA26045@telsasoft.com	2020-01-23 15:56:56 +09:00
Fujii Masao	41c184bc64	Add GUC ignore_invalid_pages. Detection of WAL records having references to invalid pages during recovery causes PostgreSQL to raise a PANIC-level error, aborting the recovery. Setting ignore_invalid_pages to on causes the system to ignore those WAL records (but still report a warning), and continue recovery. This behavior may cause crashes, data loss, propagate or hide corruption, or other serious problems. However, it may allow you to get past the PANIC-level error, to finish the recovery, and to cause the server to start up. Author: Fujii Masao Reviewed-by: Michael Paquier Discussion: https://www.postgresql.org/message-id/CAHGQGwHCK6f77yeZD4MHOnN+PaTf6XiJfEB+Ce7SksSHjeAWtg@mail.gmail.com	2020-01-22 11:56:34 +09:00
Amit Kapila	79a3efb84d	Fix the computation of max dead tuples during the vacuum. In commit `40d964ec99`, we changed the way memory is allocated for dead tuples but forgot to update the place where we compute the maximum number of dead tuples. This could lead to invalid memory requests. Reported-by: Andres Freund Diagnosed-by: Andres Freund Author: Masahiko Sawada Reviewed-by: Amit Kapila and Dilip Kumar Discussion: https://postgr.es/m/20200121060020.e3cr7s7fj5rw4lok@alap3.anarazel.de	2020-01-22 07:43:51 +05:30
Michael Paquier	a904abe2e2	Fix concurrent indexing operations with temporary tables Attempting to use CREATE INDEX, DROP INDEX or REINDEX with CONCURRENTLY on a temporary relation with ON COMMIT actions triggered unexpected errors because those operations use multiple transactions internally to complete their work. Here is for example one confusing error when using ON COMMIT DELETE ROWS: ERROR: index "foo" already contains data Issues related to temporary relations and concurrent indexing are fixed in this commit by enforcing the non-concurrent path to be taken for temporary relations even if using CONCURRENTLY, transparently to the user. Using a non-concurrent path does not matter in practice as locks cannot be taken on a temporary relation by a session different than the one owning the relation, and the non-concurrent operation is more effective. The problem exists with REINDEX since v12 with the introduction of CONCURRENTLY, and with CREATE/DROP INDEX since CONCURRENTLY exists for those commands. In all supported versions, this caused only confusing error messages to be generated. Note that with REINDEX, it was also possible to issue a REINDEX CONCURRENTLY for a temporary relation owned by a different session, leading to a server crash. The idea to enforce transparently the non-concurrent code path for temporary relations comes originally from Andres Freund. Reported-by: Manuel Rigger Author: Michael Paquier, Heikki Linnakangas Reviewed-by: Andres Freund, Álvaro Herrera, Heikki Linnakangas Discussion: https://postgr.es/m/CA+u7OA6gP7YAeCguyseusYcc=uR8+ypjCcgDDCTzjQ+k6S9ksQ@mail.gmail.com Backpatch-through: 9.4	2020-01-22 09:49:18 +09:00
Tom Lane	9b9c5f279e	Clarify behavior of adding and altering a column in same ALTER command. The behavior of something like ALTER TABLE transactions ADD COLUMN status varchar(30) DEFAULT 'old', ALTER COLUMN status SET default 'current'; is to fill existing table rows with 'old', not 'current'. That's intentional and desirable for a couple of reasons: * It makes the behavior the same whether you merge the sub-commands into one ALTER command or give them separately; * If we applied the new default while filling the table, there would be no way to get the existing behavior in one SQL command. The same reasoning applies in cases that add a column and then manipulate its GENERATED/IDENTITY status in a second sub-command, since the generation expression is really just a kind of default. However, that wasn't very obvious (at least not to me; earlier in the referenced discussion thread I'd thought it was a bug to be fixed). And it certainly wasn't documented. Hence, add documentation, code comments, and a test case to clarify that this behavior is all intentional. In passing, adjust ATExecAddColumn's defaults-related relkind check so that it matches up exactly with ATRewriteTables, instead of being effectively (though not literally) the negated inverse condition. The reasoning can be explained a lot more concisely that way, too (not to mention that the comment now matches the code, which it did not before). Discussion: https://postgr.es/m/10365.1558909428@sss.pgh.pa.us	2020-01-21 16:17:21 -05:00
Andres Freund	affdde2e15	Fix edge case leading to agg transitions skipping ExecAggTransReparent() calls. The code checking whether an aggregate transition value needs to be reparented into the current context has always only compared the transition return value with the previous transition value by datum, i.e. without regard for NULLness. This normally works, because when the transition function returns NULL (via fcinfo->isnull), it'll return a value that won't be the same as its input value. But there's no hard requirement that that's the case. And it turns out, it's possible to hit this case (see discussion or reproducers), leading to a non-null transition value not being reparented, followed by a crash caused by that. Instead of adding another comparison of NULLness, instead have ExecAggTransReparent() ensure that pergroup->transValue ends up as 0 when the new transition value is NULL. That avoids having to add an additional branch to the much more common cases of the transition function returning the old transition value (which is a pointer in this case), and when the new value is different, but not NULL. In branches since `69c3936a14`, also deduplicate the reparenting code between the expression evaluation based transitions, and the path for ordered aggregates. Reported-By: Teodor Sigaev, Nikita Glukhov Author: Andres Freund Discussion: https://postgr.es/m/bd34e930-cfec-ea9b-3827-a8bc50891393@sigaev.ru Backpatch: 9.4-, this issue has existed since at least 7.4	2020-01-20 23:26:51 -08:00
Tom Lane	31f403e95f	Further tweaking of jsonb_set_lax(). Some buildfarm members were still warning about this, because in `9c679a08f` I'd missed decorating one of the ereport() code paths with a dummy return. Also, adjust the error messages to be more in line with project style guide.	2020-01-20 14:26:56 -05:00
Heikki Linnakangas	4c87010981	Fix crash in BRIN inclusion op functions, due to missing datum copy. The BRIN add_value() and union() functions need to make a longer-lived copy of the argument, if they want to store it in the BrinValues struct also passed as argument. The functions for the "inclusion operator classes" used with box, range and inet types didn't take into account that the union helper function might return its argument as is, without making a copy. Check for that case, and make a copy if necessary. That case arises at least with the range_union() function, when one of the arguments is an 'empty' range: CREATE TABLE brintest (n numrange); CREATE INDEX brinidx ON brintest USING brin (n); INSERT INTO brintest VALUES ('empty'); INSERT INTO brintest VALUES (numrange(0, 2^1000::numeric)); INSERT INTO brintest VALUES ('(-1, 0)'); SELECT brin_desummarize_range('brinidx', 0); SELECT brin_summarize_range('brinidx', 0); Backpatch down to 9.5, where BRIN was introduced. Discussion: https://www.postgresql.org/message-id/e6e1d6eb-0a67-36aa-e779-bcca59167c14%40iki.fi Reviewed-by: Emre Hasegeli, Tom Lane, Alvaro Herrera	2020-01-20 10:36:35 +02:00
Amit Kapila	40d964ec99	Allow vacuum command to process indexes in parallel. This feature allows the vacuum to leverage multiple CPUs in order to process indexes. This enables us to perform index vacuuming and index cleanup with background workers. This adds a PARALLEL option to VACUUM command where the user can specify the number of workers that can be used to perform the command which is limited by the number of indexes on a table. Specifying zero as a number of workers will disable parallelism. This option can't be used with the FULL option. Each index is processed by at most one vacuum process. Therefore parallel vacuum can be used when the table has at least two indexes. The parallel degree is either specified by the user or determined based on the number of indexes that the table has, and further limited by max_parallel_maintenance_workers. The index can participate in parallel vacuum iff it's size is greater than min_parallel_index_scan_size. Author: Masahiko Sawada and Amit Kapila Reviewed-by: Dilip Kumar, Amit Kapila, Robert Haas, Tomas Vondra, Mahendra Singh and Sergei Kornilov Tested-by: Mahendra Singh and Prabhat Sahu Discussion: https://postgr.es/m/CAD21AoDTPMgzSkV4E3SFo1CH_x50bf5PqZFQf4jmqjk-C03BWg@mail.gmail.com https://postgr.es/m/CAA4eK1J-VoR9gzS5E75pcD-OH0mEyCdp8RihcwKrcuw7J-Q0+w@mail.gmail.com	2020-01-20 07:57:49 +05:30
Tom Lane	9c679a08f0	Silence minor compiler warnings. Ensure that ClassifyUtilityCommandAsReadOnly() has defined behavior even if TransactionStmt.kind has a value that's not one of the declared values for its enum. Suppress warnings from compilers that don't know that elog(ERROR) doesn't return, in ClassifyUtilityCommandAsReadOnly() and jsonb_set_lax(). Per Coverity and buildfarm.	2020-01-19 16:04:36 -05:00
Heikki Linnakangas	7aaefadaac	Remove separate files for the initial contents of pg_(sh)description This data was only in separate files because it was the most convenient way to handle it with a shell script. Now that we use a general-purpose programming language, it's easy to assemble the data into the same format as the rest of the catalogs and output it into postgres.bki. This allows removal of some special-purpose code from initdb.c. Discussion: https://www.postgresql.org/message-id/CACPNZCtVFtjHre6hg9dput0qRPp39pzuyA2A6BT8wdgrRy%2BQdA%40mail.gmail.com Author: John Naylor	2020-01-19 13:54:58 +02:00
Michael Paquier	41aadeeb12	Add GUC checks for ssl_min_protocol_version and ssl_max_protocol_version Mixing incorrect bounds set in the SSL context leads to confusing error messages generated by OpenSSL which are hard to act on. New checks are added within the GUC machinery to improve the user experience as they apply to any SSL implementation, not only OpenSSL, and doing the checks beforehand avoids the creation of a SSL during a reload (or startup) which we know will never be used anyway. Backpatch down to 12, as those parameters have been introduced by `e73e67c`. Author: Michael Paquier Reviewed-by: Daniel Gustafsson Discussion: https://postgr.es/m/20200114035420.GE1515@paquier.xyz Backpatch-through: 12	2020-01-18 12:32:43 +09:00
Alexander Korotkov	4b754d6c16	Avoid full scan of GIN indexes when possible The strategy of GIN index scan is driven by opclass-specific extract_query method. This method that needed search mode is GIN_SEARCH_MODE_ALL. This mode means that matching tuple may contain none of extracted entries. Simple example is '!term' tsquery, which doesn't need any term to exist in matching tsvector. In order to handle such scan key GIN calculates virtual entry, which contains all TIDs of all entries of attribute. In fact this is full scan of index attribute. And typically this is very slow, but allows to handle some queries correctly in GIN. However, current algorithm calculate such virtual entry for each GIN_SEARCH_MODE_ALL scan key even if they are multiple for the same attribute. This is clearly not optimal. This commit improves the situation by introduction of "exclude only" scan keys. Such scan keys are not capable to return set of matching TIDs. Instead, they are capable only to filter TIDs produced by normal scan keys. Therefore, each attribute should contain at least one normal scan key, while rest of them may be "exclude only" if search mode is GIN_SEARCH_MODE_ALL. The same optimization might be applied to the whole scan, not per-attribute. But that leads to NULL values elimination problem. There is trade-off between multiple possible ways to do this. We probably want to do this later using some cost-based decision algorithm. Discussion: https://postgr.es/m/CAOBaU_YGP5-BEt5Cc0%3DzMve92vocPzD%2BXiZgiZs1kjY0cj%3DXBg%40mail.gmail.com Author: Nikita Glukhov, Alexander Korotkov, Tom Lane, Julien Rouhaud Reviewed-by: Julien Rouhaud, Tomas Vondra, Tom Lane	2020-01-18 01:11:39 +03:00
Tom Lane	41c6f9db25	Repair more failures with SubPlans in multi-row VALUES lists. Commit `9b63c13f0` turns out to have been fundamentally misguided: the parent node's subPlan list is by no means the only way in which a child SubPlan node can be hooked into the outer execution state. As shown in bug #16213 from Matt Jibson, we can also get short-lived tuple table slots added to the outer es_tupleTable list. At this point I have little faith that there aren't other possible connections as well; the long time it took to notice this problem shows that this isn't a heavily-exercised situation. Therefore, revert that fix, returning to the coding that passed a NULL parent plan pointer down to the transiently-built subexpressions. That gives us a pretty good guarantee that they won't hook into the outer executor state in any way. But then we need some other solution to make SubPlans work. Adopt the solution speculated about in the previous commit's log message: do expression initialization at plan startup for just those VALUES rows containing SubPlans, abandoning the goal of reclaiming memory intra-query for those rows. In practice it seems unlikely that queries containing a vast number of VALUES rows would be using SubPlans in them, so this should not give up much. (BTW, this test case also refutes my claim in connection with the prior commit that the issue only arises with use of LATERAL. That was just wrong: some variants of SubLink always produce SubPlans.) As with previous patch, back-patch to all supported branches. Discussion: https://postgr.es/m/16213-871ac3bc208ecf23@postgresql.org	2020-01-17 16:17:31 -05:00
Alvaro Herrera	15cac3a523	Set ReorderBufferTXN->final_lsn more eagerly ... specifically, set it incrementally as each individual change is spilled down to disk. This way, it is set correctly when the transaction disappears without trace, ie. without leaving an XACT_ABORT wal record. (This happens when the server crashes midway through a transaction.) Failing to have final_lsn prevents ReorderBufferRestoreCleanup() from working, since it needs the final_lsn in order to know the endpoint of its iteration through spilled files. Commit `df9f682c7b` already tried to fix the problem, but it didn't set the final_lsn in all cases. Revert that, since it's no longer needed. Author: Vignesh C Reviewed-by: Amit Kapila, Dilip Kumar Discussion: https://postgr.es/m/CALDaNm2CLk+K9JDwjYST0sPbGg5AQdvhUt0jbKyX_HdAE0jk3A@mail.gmail.com	2020-01-17 18:00:39 -03:00
Tomas Vondra	543852fd8b	Allocate freechunks bitmap as part of SlabContext The bitmap used by SlabCheck to cross-check free chunks in a block used to be allocated for each SlabCheck call, and was never freed. The memory leak could be fixed by simply adding a pfree call, but it's actually a bad idea to do any allocations in SlabCheck at all as it assumes the state of the memory management as a whole is sane. So instead we allocate the bitmap as part of SlabContext, which means we don't need to do any allocations in SlabCheck and the bitmap goes away together with the SlabContext. Backpatch to 10, where the Slab context was introduced. Author: Tomas Vondra Reported-by: Andres Freund Reviewed-by: Tom Lane Backpatch-through: 10 Discussion: https://www.postgresql.org/message-id/20200116044119.g45f7pmgz4jmodxj%40alap3.anarazel.de	2020-01-17 15:29:11 +01:00
Andrew Dunstan	a83586b554	Add a non-strict version of jsonb_set jsonb_set_lax() is the same as jsonb_set, except that it takes and extra argument that specifies what to do if the value argument is NULL. The default is 'use_json_null'. Other possibilities are 'raise_exception', 'return_target' and 'delete_key', all these behaviours having been suggested as reasonable by various users. Discussion: https://postgr.es/m/375873e2-c957-3a8d-64f9-26c43c2b16e7@2ndQuadrant.com Reviewed by: Pavel Stehule	2020-01-17 11:52:39 +10:30
Michael Paquier	f7cd5896a6	Move OpenSSL routines for min/max protocol setting to src/common/ Two routines have been added in OpenSSL 1.1.0 to set the protocol bounds allowed within a given SSL context: - SSL_CTX_set_min_proto_version - SSL_CTX_set_max_proto_version As Postgres supports OpenSSL down to 1.0.1 (as of HEAD), equivalent replacements exist in the tree, which are only available for the backend. A follow-up patch is planned to add control of the SSL protocol bounds for libpq, so move those routines to src/common/ so as libpq can use them. Author: Daniel Gustafsson Discussion: https://postgr.es/m/4F246AE3-A7AE-471E-BD3D-C799D3748E03@yesql.se	2020-01-17 10:06:17 +09:00
Tom Lane	5afaa2e426	Rationalize code placement between wchar.c, encnames.c, and mbutils.c. Move all the backend-only code that'd crept into wchar.c and encnames.c into mbutils.c. To remove the last few #ifdef dependencies from wchar.c and encnames.c, also make the following changes: * Adjust get_encoding_name_for_icu to return NULL, not throw an error, for unsupported encodings. Its sole caller can perfectly well throw an error instead. (While at it, I also made this function and its sibling is_encoding_supported_by_icu proof against out-of-range encoding IDs.) * Remove the overlength-name error condition from pg_char_to_encoding. It's completely silly not to treat that just like any other the-name-is-not-in-the-table case. Also, get rid of pg_mic_mblen --- there's no obvious reason why conv.c shouldn't call pg_mule_mblen instead. Other than that, this is just code movement and comment-polishing with no functional changes. Notably, I reordered declarations in pg_wchar.h to show which functions are frontend-accessible and which are not. Discussion: https://postgr.es/m/CA+TgmoYO8oq-iy8E02rD8eX25T-9SmyxKWqqks5OMHxKvGXpXQ@mail.gmail.com	2020-01-16 18:08:21 -05:00
Tom Lane	e6afa8918c	Move wchar.c and encnames.c to src/common/. Formerly, various frontend directories symlinked these two sources and then built them locally. That's an ancient, ugly hack, and we now have a much better way: put them into libpgcommon. So do that. (The immediate motivation for this is the prospect of having to introduce still more symlinking if we don't.) This commit moves these two files absolutely verbatim, for ease of reviewing the git history. There's some follow-on work to be done that will modify them a bit. Robert Haas, Tom Lane Discussion: https://postgr.es/m/CA+TgmoYO8oq-iy8E02rD8eX25T-9SmyxKWqqks5OMHxKvGXpXQ@mail.gmail.com	2020-01-16 15:58:55 -05:00
Robert Haas	2eb34ac369	Fix problems with "read only query" checks, and refactor the code. Previously, check_xact_readonly() was responsible for determining which types of queries could not be run in a read-only transaction, standard_ProcessUtility() was responsibility for prohibiting things which were allowed in read only transactions but not in recovery, and utility commands were basically prohibited in bulk in parallel mode by calls to CommandIsReadOnly() in functions.c and spi.c. This situation was confusing and error-prone. Accordingly, move all the checks to a new function ClassifyUtilityCommandAsReadOnly(), which determines the degree to which a given statement is read only. In the old code, check_xact_readonly() inadvertently failed to handle several statement types that actually should have been prohibited, specifically T_CreatePolicyStmt, T_AlterPolicyStmt, T_CreateAmStmt, T_CreateStatsStmt, T_AlterStatsStmt, and T_AlterCollationStmt. As a result, thes statements were erroneously allowed in read only transactions, parallel queries, and standby operation. Generally, they would fail anyway due to some lower-level error check, but we shouldn't rely on that. In the new code structure, future omissions of this type should cause ClassifyUtilityCommandAsReadOnly() to complain about an unrecognized node type. As a fringe benefit, this means we can allow certain types of utility commands in parallel mode, where it's safe to do so. This allows ALTER SYSTEM, CALL, DO, CHECKPOINT, COPY FROM, EXPLAIN, and SHOW. It might be possible to allow additional commands with more work and thought. Along the way, document the thinking process behind the current set of checks, as per discussion especially with Peter Eisentraut. There is some interest in revising some of these rules, but that seems like a job for another patch. Patch by me, reviewed by Tom Lane, Stephen Frost, and Peter Eisentraut. Discussion: http://postgr.es/m/CA+TgmoZ_rLqJt5sYkvh+JpQnfX0Y+B2R+qfi820xNih6x-FQOQ@mail.gmail.com	2020-01-16 12:11:31 -05:00
Tom Lane	0db7c67051	Minor code beautification in regexp.c. Remove duplicated code (apparently introduced by commit `c8ea87e4b`). Also get rid of some PG_USED_FOR_ASSERTS_ONLY variables we don't really need to have. Li Japin, Tom Lane Discussion: https://postgr.es/m/PS1PR0601MB3770A5595B6E5E3FD6F35724B6360@PS1PR0601MB3770.apcprd06.prod.outlook.com	2020-01-16 11:31:30 -05:00
Tom Lane	1281a5c907	Restructure ALTER TABLE execution to fix assorted bugs. We've had numerous bug reports about how (1) IF NOT EXISTS clauses in ALTER TABLE don't behave as-expected, and (2) combining certain actions into one ALTER TABLE doesn't work, though executing the same actions as separate statements does. This patch cleans up all of the cases so far reported from the field, though there are still some oddities associated with identity columns. The core problem behind all of these bugs is that we do parse analysis of ALTER TABLE subcommands too soon, before starting execution of the statement. The root of the bugs in group (1) is that parse analysis schedules derived commands (such as a CREATE SEQUENCE for a serial column) before it's known whether the IF NOT EXISTS clause should cause a subcommand to be skipped. The root of the bugs in group (2) is that earlier subcommands may change the catalog state that later subcommands need to be parsed against. Hence, postpone parse analysis of ALTER TABLE's subcommands, and do that one subcommand at a time, during "phase 2" of ALTER TABLE which is the phase that does catalog rewrites. Thus the catalog effects of earlier subcommands are already visible when we analyze later ones. (The sole exception is that we do parse analysis for ALTER COLUMN TYPE subcommands during phase 1, so that their USING expressions can be parsed against the table's original state, which is what we need. Arguably, these bugs stem from falsely concluding that because ALTER COLUMN TYPE must do early parse analysis, every other command subtype can too.) This means that ALTER TABLE itself must deal with execution of any non-ALTER-TABLE derived statements that are generated by parse analysis. Add a suitable entry point to utility.c to accept those recursive calls, and create a struct to pass through the information needed by the recursive call, rather than making the argument lists of AlterTable() and friends even longer. Getting this to work correctly required a little bit of fiddling with the subcommand pass structure, in particular breaking up AT_PASS_ADD_CONSTR into multiple passes. But otherwise it's mostly a pretty straightforward application of the above ideas. Fixing the residual issues for identity columns requires refactoring of where the dependency link from an identity column to its sequence gets set up. So that seems like suitable material for a separate patch, especially since this one is pretty big already. Discussion: https://postgr.es/m/10365.1558909428@sss.pgh.pa.us	2020-01-15 18:49:24 -05:00
Alvaro Herrera	a166d408eb	Report progress of ANALYZE commands This uses the progress reporting infrastructure added by `c16dc1aca5`, adding support for ANALYZE. Co-authored-by: Álvaro Herrera <alvherre@alvh.no-ip.org> Co-authored-by: Tatsuro Yamada <tatsuro.yamada.tf@nttcom.co.jp> Reviewed-by: Julien Rouhaud, Robert Haas, Anthony Nowocien, Kyotaro Horiguchi, Vignesh C, Amit Langote	2020-01-15 11:14:39 -03:00
Michael Paquier	ac5bdf6261	Fix buggy logic in isTempNamespaceInUse() The logic introduced in this routine as of `246a6c8` would report an incorrect result when a session calls it to check if the temporary namespace owned by the session is in use or not. It is possible to optimize more the routine in this case to avoid a PGPROC lookup, but let's keep the logic simple. As this routine is used only by autovacuum for now, there were no live bugs, still let's be correct for any future code involving it. Author: Michael Paquier Reviewed-by: Julien Rouhaud Discussion: https://postgr.es/m/20200113093703.GA41902@paquier.xyz Backpatch-through: 11	2020-01-15 13:58:33 +09:00
Amit Kapila	4d8a8d0c73	Introduce IndexAM fields for parallel vacuum. Introduce new fields amusemaintenanceworkmem and amparallelvacuumoptions in IndexAmRoutine for parallel vacuum. The amusemaintenanceworkmem tells whether a particular IndexAM uses maintenance_work_mem or not. This will help in controlling the memory used by individual workers as otherwise, each worker can consume memory equal to maintenance_work_mem. The amparallelvacuumoptions tell whether a particular IndexAM participates in a parallel vacuum and if so in which phase (bulkdelete, vacuumcleanup) of vacuum. Author: Masahiko Sawada and Amit Kapila Reviewed-by: Dilip Kumar, Amit Kapila, Tomas Vondra and Robert Haas Discussion: https://postgr.es/m/CAD21AoDTPMgzSkV4E3SFo1CH_x50bf5PqZFQf4jmqjk-C03BWg@mail.gmail.com https://postgr.es/m/CAA4eK1LmcD5aPogzwim5Nn58Ki+74a6Edghx4Wd8hAskvHaq5A@mail.gmail.com	2020-01-15 07:24:14 +05:30
Peter Eisentraut	fe233366f2	Fix compiler warning about format on Windows On 64-bit Windows, pid_t is long long int, so a %d format isn't enough.	2020-01-14 23:59:18 +01:00
Peter Eisentraut	3297308278	walreceiver uses a temporary replication slot by default If no permanent replication slot is configured using primary_slot_name, the walreceiver now creates and uses a temporary replication slot. A new setting wal_receiver_create_temp_slot can be used to disable this behavior, for example, if the remote instance is out of replication slots. Reviewed-by: Masahiko Sawada <masahiko.sawada@2ndquadrant.com> Discussion: https://www.postgresql.org/message-id/CA%2Bfd4k4dM0iEPLxyVyme2RAFsn8SUgrNtBJOu81YqTY4V%2BnqZA%40mail.gmail.com	2020-01-14 14:40:41 +01:00
Peter Eisentraut	ee4ac46c8e	Expose PQbackendPID() through walreceiver API This will be used by a subsequent patch. Reviewed-by: Masahiko Sawada <masahiko.sawada@2ndquadrant.com> Discussion: https://www.postgresql.org/message-id/CA%2Bfd4k4dM0iEPLxyVyme2RAFsn8SUgrNtBJOu81YqTY4V%2BnqZA%40mail.gmail.com	2020-01-14 14:40:41 +01:00
Peter Eisentraut	f595117e24	ALTER TABLE ... ALTER COLUMN ... DROP EXPRESSION Add an ALTER TABLE subcommand for dropping the generated property from a column, per SQL standard. Reviewed-by: Sergei Kornilov <sk@zsrv.org> Discussion: https://www.postgresql.org/message-id/flat/2f7f1d9c-946e-0453-d841-4f38eb9d69b6%402ndquadrant.com	2020-01-14 13:36:03 +01:00
Dean Rasheed	d751ba5235	Make rewriter prevent auto-updates on views with conditional INSTEAD rules. A view with conditional INSTEAD rules and no unconditional INSTEAD rules or INSTEAD OF triggers is not auto-updatable. Previously we relied on a check in the executor to catch this, but that's problematic since the planner may fail to properly handle such a query and thus return a particularly unhelpful error to the user, before reaching the executor check. Instead, trap this in the rewriter and report the correct error there. Doing so also allows us to include more useful error detail than the executor check can provide. This doesn't change the existing behaviour of updatable views; it merely ensures that useful error messages are reported when a view isn't updatable. Per report from Pengzhou Tang, though not adopting that suggested fix. Back-patch to all supported branches. Discussion: https://postgr.es/m/CAG4reAQn+4xB6xHJqWdtE0ve_WqJkdyCV4P=trYr4Kn8_3_PEA@mail.gmail.com	2020-01-14 09:52:21 +00:00
Tom Lane	7f380c59f8	Reduce size of backend scanner's tables. Previously, the core scanner's yy_transition[] array had 37045 elements. Since that number is larger than INT16_MAX, Flex generated the array to contain 32-bit integers. By reimplementing some of the bulkier scanner rules, this patch reduces the array to 20495 elements. The much smaller total length, combined with the consequent use of 16-bit integers for the array elements reduces the binary size by over 200kB. This was accomplished in two ways: 1. Consolidate handling of quote continuations into a new start condition, rather than duplicating that logic for five different string types. 2. Treat Unicode strings and identifiers followed by a UESCAPE sequence as three separate tokens, rather than one. The logic to de-escape Unicode strings is moved to the filter code in parser.c, which already had the ability to provide special processing for token sequences. While we could have implemented the conversion in the grammar, that approach was rejected for performance and maintainability reasons. Performance in microbenchmarks of raw parsing seems equal or slightly faster in most cases, and it's reasonable to expect that in real-world usage (with more competition for the CPU cache) there will be a larger win. The exception is UESCAPE sequences; lexing those is about 10% slower, primarily because the scanner now has to be called three times rather than one. This seems acceptable since that feature is very rarely used. The psql and epcg lexers are likewise modified, primarily because we want to keep them all in sync. Since those lexers don't use the space-hogging -CF option, the space savings is much less, but it's still good for perhaps 10kB apiece. While at it, merge the ecpg lexer's handling of C-style comments used in SQL and in C. Those have different rules regarding nested comments, but since we already have the ability to keep track of the previous start condition, we can use that to handle both cases within a single start condition. This matches the core scanner more closely. John Naylor Discussion: https://postgr.es/m/CACPNZCvaoa3EgVWm5yZhcSTX6RAtaLgniCPcBVOCwm8h3xpWkw@mail.gmail.com	2020-01-13 15:04:31 -05:00
Peter Eisentraut	259bbe1778	Fix base backup with database OIDs larger than INT32_MAX The use of pg_atoi() for parsing a string into an Oid fails for values larger than INT32_MAX, since OIDs are unsigned. Instead, use atooid(). While this has less error checking, the contents of the data directory are expected to be trustworthy, so we don't need to go out of our way to do full error checking. Discussion: https://www.postgresql.org/message-id/flat/dea47fc8-6c89-a2b1-07e3-754ff1ab094b%402ndquadrant.com	2020-01-13 13:41:12 +01:00
Michael Paquier	7689d907bb	Fix comment in heapam.c Improvement per suggestion from Tom Lane. Author: Daniel Gustafsson Discussion: https://postgr.es/m/FED18699-4270-4778-8DA8-10F119A5ECF3@yesql.se	2020-01-13 17:57:38 +09:00
Amit Kapila	4e514c6180	Delete empty pages in each pass during GIST VACUUM. Earlier, we use to postpone deleting empty pages till the second stage of vacuum to amortize the cost of scanning internal pages. However, that can sometimes (say vacuum is canceled or errored between first and second stage) delay the pages to be recycled. Another thing is that to facilitate deleting empty pages in the second stage, we need to share the information about internal and empty pages between different stages of vacuum. It will be quite tricky to share this information via DSM which is required for the upcoming parallel vacuum patch. Also, it will bring the logic to reclaim deleted pages closer to nbtree where we delete empty pages in each pass. Overall, the advantages of deleting empty pages in each pass outweigh the advantages of postponing the same. Author: Dilip Kumar, with changes by Amit Kapila Reviewed-by: Sawada Masahiko and Amit Kapila Discussion: https://postgr.es/m/CAA4eK1LGr+MN0xHZpJ2dfS8QNQ1a_aROKowZB+MPNep8FVtwAA@mail.gmail.com	2020-01-13 07:59:44 +05:30
Tomas Vondra	eae056c19e	Apply multiple multivariate MCV lists when possible Until now we've only used a single multivariate MCV list per relation, covering the largest number of clauses. So for example given a query SELECT * FROM t WHERE a = 1 AND b =1 AND c = 1 AND d = 1 and extended statistics on (a,b) and (c,d), we'd only pick and use one of them. This commit improves this by repeatedly picking and applying the best statistics (matching the largest number of remaining clauses) until no additional statistics is applicable. This greedy algorithm is simple, but may not be optimal. A different choice of statistics may leave fewer clauses unestimated and/or give better estimates for some other reason. This can however happen only when there are overlapping statistics, and selecting one makes it impossible to use the other. E.g. with statistics on (a,b), (c,d), (b,c,d), we may pick either (a,b) and (c,d) or (b,c,d). But it's not clear which option is the best one. We however assume cases like this are rare, and the easiest solution is to define statistics covering the whole group of correlated columns. In the future we might support overlapping stats, using some of the clauses as conditions (in conditional probability sense). Author: Tomas Vondra Reviewed-by: Mark Dilger, Kyotaro Horiguchi Discussion: https://postgr.es/m/20191028152048.jc6pqv5hb7j77ocp@development	2020-01-13 01:21:17 +01:00
Tomas Vondra	aaa6761876	Apply all available functional dependencies When considering functional dependencies during selectivity estimation, it's not necessary to bother with selecting the best extended statistic object and then use just dependencies from it. We can simply consider all applicable functional dependencies at once. This means we need to deserialie all (applicable) dependencies before applying them to the clauses. This is a bit more expensive than picking the best statistics and deserializing dependencies for it. To minimize the additional cost, we ignore statistics that are not applicable. Author: Tomas Vondra Reviewed-by: Mark Dilger Discussion: https://postgr.es/m/20191028152048.jc6pqv5hb7j77ocp@development	2020-01-13 01:21:06 +01:00
Tom Lane	652686a334	Fix edge-case crashes and misestimation in range containment selectivity. When estimating the selectivity of "range_var <@ range_constant" or "range_var @> range_constant", if the upper (or respectively lower) bound of the range_constant was above the last bin of the range_var's histogram, the code would access uninitialized memory and potentially crash (though it seems the probability of a crash is quite low). Handle the endpoint cases explicitly to fix that. While at it, be more paranoid about the possibility of getting NaN or other silly results from the range type's subdiff function. And improve some comments. Ordinarily we'd probably add a regression test case demonstrating the bug in unpatched code. But it's too hard to get it to crash reliably because of the uninitialized-memory dependence, so skip that. Per bug #16122 from Adam Scott. It's been broken from the beginning, apparently, so backpatch to all supported branches. Diagnosis by Michael Paquier, patch by Andrey Borodin and Tom Lane. Discussion: https://postgr.es/m/16122-eb35bc248c806c15@postgresql.org	2020-01-12 14:36:59 -05:00
Michael Paquier	1088729e84	Remove incorrect assertion for INSERT in logical replication's publisher On the publisher, it was assumed that an INSERT change cannot happen for a relation with no replica identity. However this is true only for a change that needs references to old rows, aka UPDATE or DELETE, so trying to use logical replication with a relation that has no replica identity led to an assertion failure in the publisher when issuing an INSERT. This commit removes the incorrect assertion, and adds more regression tests to provide coverage for relations without replica identity. Reported-by: Neha Sharma Author: Dilip Kumar, Michael Paquier Reviewed-by: Andres Freund Discussion: https://postgr.es/m/CANiYTQsL1Hb8_Km08qd32svrqNumXLJeoGo014O7VZymgOhZEA@mail.gmail.com Backpatch-through: 10	2020-01-12 22:43:45 +09:00
Tom Lane	2c0cdc8183	Extensive code review for GSSAPI encryption mechanism. Fix assorted bugs in handling of non-blocking I/O when using GSSAPI encryption. The encryption layer could return the wrong status information to its caller, resulting in effectively dropping some data (or possibly in aborting a not-broken connection), or in a "livelock" situation where data remains to be sent but the upper layers think transmission is done and just go to sleep. There were multiple small thinkos contributing to that, as well as one big one (failure to think through what to do when a send fails after having already transmitted data). Note that these errors could cause failures whether the client application asked for non-blocking I/O or not, since both libpq and the backend always run things in non-block mode at this level. Also get rid of use of static variables for GSSAPI inside libpq; that's entirely not okay given that multiple connections could be open at once inside a single client process. Also adjust a bunch of random small discrepancies between the frontend and backend versions of the send/receive functions -- except for error handling, they should be identical, and now they are. Also extend the Kerberos TAP tests to exercise cases where nontrivial amounts of data need to be pushed through encryption. Before, those tests didn't provide any useful coverage at all for the cases of interest here. (They still might not, depending on timing, but at least there's a chance.) Per complaint from pmc@citylink and subsequent investigation. Back-patch to v12 where this code was introduced. Discussion: https://postgr.es/m/20200109181822.GA74698@gate.oper.dinoex.org	2020-01-11 17:14:08 -05:00
Peter Eisentraut	c67a55da4e	Make lsn argument of walrcv_create_slot() optional Some callers are not using it, so it's wasteful to have to specify it. Reviewed-by: Masahiko Sawada <masahiko.sawada@2ndquadrant.com> Discussion: https://www.postgresql.org/message-id/CA+fd4k4BcYrYucNfTnK-CQX3+jsG+PRPEhHAUSo-W4P0Lec57A@mail.gmail.com	2020-01-11 09:07:14 +01:00
Peter Eisentraut	c096a804d9	Remove STATUS_FOUND Replace the solitary use with a bool. Reviewed-by: Michael Paquier <michael@paquier.xyz> Discussion: https://www.postgresql.org/message-id/flat/a6f91ead-0ce4-2a34-062b-7ab9813ea308%402ndquadrant.com	2020-01-11 07:48:57 +01:00
Noah Misch	38fc056074	Maintain valid md.c state when FileClose() fails. FileClose() failure ordinarily causes a PANIC. Suppose the user disables that PANIC via data_sync_retry=on. After mdclose() issued a FileClose() that failed, calls into md.c raised SIGSEGV. This fix adds repalloc() calls during mdclose(); update a comment about ignoring repalloc() cost. The rate of relation segment count change is a minor factor; more relevant to overall performance is the rate of mdclose() and subsequent re-opening of segments. Back-patch to v10, where commit `45e191e3aa` introduced the bug. Reviewed by Kyotaro Horiguchi. Discussion: https://postgr.es/m/20191222091930.GA1280238@rfd.leadboat.com	2020-01-10 18:31:22 -08:00
Alvaro Herrera	a7b6ab5db1	Clean up representation of flags in struct ReorderBufferTXN This simplifies addition of further flags. Author: Nikhil Sontakke Discussion: https://postgr.es/m/CAMGcDxeViP+R-OL7QhzUV9eKCVjURobuY1Zijik4Ay_Ddwo4Cg@mail.gmail.com	2020-01-10 17:46:57 -03:00
Tom Lane	9ce77d75c5	Reconsider the representation of join alias Vars. The core idea of this patch is to make the parser generate join alias Vars (that is, ones with varno pointing to a JOIN RTE) only when the alias Var is actually different from any raw join input, that is a type coercion and/or COALESCE is necessary to generate the join output value. Otherwise just generate varno/varattno pointing to the relevant join input column. In effect, this means that the planner's flatten_join_alias_vars() transformation is already done in the parser, for all cases except (a) columns that are merged by JOIN USING and are transformed in the process, and (b) whole-row join Vars. In principle that would allow us to skip doing flatten_join_alias_vars() in many more queries than we do now, but we don't have quite enough infrastructure to know that we can do so --- in particular there's no cheap way to know whether there are any whole-row join Vars. I'm not sure if it's worth the trouble to add a Query-level flag for that, and in any case it seems like fit material for a separate patch. But even without skipping the work entirely, this should make flatten_join_alias_vars() faster, particularly where there are nested joins that it previously had to flatten recursively. An essential part of this change is to replace Var nodes' varnoold/varoattno fields with varnosyn/varattnosyn, which have considerably more tightly-defined meanings than the old fields: when they differ from varno/varattno, they identify the Var's position in an aliased JOIN RTE, and the join alias is what ruleutils.c should print for the Var. This is necessary because the varno change destroyed ruleutils.c's ability to find the JOIN RTE from the Var's varno. Another way in which this change broke ruleutils.c is that it's no longer feasible to determine, from a JOIN RTE's joinaliasvars list, which join columns correspond to which columns of the join's immediate input relations. (If those are sub-joins, the joinaliasvars entries may point to columns of their base relations, not the sub-joins.) But that was a horrid mess requiring a lot of fragile assumptions already, so let's just bite the bullet and add some more JOIN RTE fields to make it more straightforward to figure that out. I added two integer-List fields containing the relevant column numbers from the left and right input rels, plus a count of how many merged columns there are. This patch depends on the ParseNamespaceColumn infrastructure that I added in commit `5815696bc`. The biggest bit of code change is restructuring transformFromClauseItem's handling of JOINs so that the ParseNamespaceColumn data is propagated upward correctly. Other than that and the ruleutils fixes, everything pretty much just works, though some processing is now inessential. I grabbed two pieces of low-hanging fruit in that line: 1. In find_expr_references, we don't need to recurse into join alias Vars anymore. There aren't any except for references to merged USING columns, which are more properly handled when we scan the join's RTE. This change actually fixes an edge-case issue: we will now record a dependency on any type-coercion function present in a USING column's joinaliasvar, even if that join column has no references in the query text. The odds of the missing dependency causing a problem seem quite small: you'd have to posit somebody dropping an implicit cast between two data types, without removing the types themselves, and then having a stored rule containing a whole-row Var for a join whose USING merge depends on that cast. So I don't feel a great need to change this in the back branches. But in theory this way is more correct. 2. markRTEForSelectPriv and markTargetListOrigin don't need to recurse into join alias Vars either, because the cases they care about don't apply to alias Vars for USING columns that are semantically distinct from the underlying columns. This removes the only case in which markVarForSelectPriv could be called with NULL for the RTE, so adjust the comments to describe that hack as being strictly internal to markRTEForSelectPriv. catversion bump required due to changes in stored rules. Discussion: https://postgr.es/m/7115.1577986646@sss.pgh.pa.us	2020-01-09 11:56:59 -05:00
Robert Haas	ed10f32e37	Add pg_shmem_allocations view. This tells you about allocations that have been made from the main shared memory segment. The original patch also tried to show information about dynamic shared memory allocation as well, but I decided to leave that problem for another time. Andres Freund and Robert Haas, reviewed by Michael Paquier, Marti Raudsepp, Tom Lane, Álvaro Herrera, and Kyotaro Horiguchi. Discussion: http://postgr.es/m/20140504114417.GM12715@awork2.anarazel.de	2020-01-09 10:59:07 -05:00
Peter Eisentraut	f85a485f89	Add support for automatically updating Unicode derived files We currently have several sets of files generated from data provided by Unicode. These all have ad hoc rules and instructions for updating when new Unicode versions appear, and it's not done consistently. This patch centralizes and automates the process and makes it part of the release checklist. The Unicode and CLDR versions are specified in Makefile.global.in. There is a new make target "update-unicode" that downloads all the relevant files and runs the generation script. There is also a new script for generating the table of combining characters for ucs_wcwidth(). That table is now in a separate include file rather than hardcoded into the middle of other code. This is based on the script that was used for generating `d8594d123c`, but the script itself wasn't committed at that time. Reviewed-by: John Naylor <john.naylor@2ndquadrant.com> Discussion: https://www.postgresql.org/message-id/flat/c8d05f42-443e-6c23-819b-05b31759a37c@2ndquadrant.com	2020-01-09 10:08:14 +01:00
Alvaro Herrera	f5d28710c7	Reimplement nullification of walsender timestamp Make the value null only at pg_stat_activity-output time, as suggested by Tom Lane, instead of messing with the internal state. This should appease buildfarm members with force_parallel_mode=regress, which are running parallel queries on logical replication walsenders. The fact that walsenders can run parallel queries should perhaps be studied more carefully, but for the moment let's get rid of the red blots in buildfarm. Backpatch to pg10, like the previous commit. Discussion: https://postgr.es/m/30804.1578438763@sss.pgh.pa.us	2020-01-08 14:33:49 -03:00
Tom Lane	913bbd88dc	Improve the handling of result type coercions in SQL functions. Use the parser's standard type coercion machinery to convert the output column(s) of a SQL function's final SELECT or RETURNING to the type(s) they should have according to the function's declared result type. We'll allow any case where an assignment-level coercion is available. Previously, we failed unless the required coercion was a binary-compatible one (and the documentation ignored this, falsely claiming that the types must match exactly). Notably, the coercion now accounts for typmods, so that cases where a SQL function is declared to return a composite type whose columns are typmod-constrained now behave as one would expect. Arguably this aspect is a bug fix, but the overall behavioral change here seems too large to consider back-patching. A nice side-effect is that functions can now be inlined in a few cases where we previously failed to do so because of type mismatches. Discussion: https://postgr.es/m/18929.1574895430@sss.pgh.pa.us	2020-01-08 11:07:59 -05:00
Tom Lane	4ac8aaa36f	Fix handling of generated columns in ALTER TABLE. ALTER TABLE failed if a column referenced in a GENERATED expression had been added or changed in type earlier in the ALTER command. That's because the GENERATED expression needs to be evaluated against the table's updated tuples, but it was being evaluated against the original tuples. (Fortunately the executor has adequate cross-checks to notice the mismatch, so we just got an obscure error message and not anything more dangerous.) Per report from Andreas Joseph Krogh. Back-patch to v12 where GENERATED was added. Discussion: https://postgr.es/m/VisenaEmail.200.231b0a41523275d0.16ea7f800c7@tc7-visena	2020-01-08 09:42:53 -05:00
Michael Paquier	65192e0244	Revert "Forbid DROP SCHEMA on temporary namespaces" This reverts commit `a052f6c`, following complains from Robert Haas and Tom Lane. Backpatch down to 9.4, like the previous commit. Discussion: https://postgr.es/m/CA+TgmobL4npEX5=E5h=5Jm_9mZun3MT39Kq2suJFVeamc9skSQ@mail.gmail.com Backpatch-through: 9.4	2020-01-08 10:36:12 +09:00
Alvaro Herrera	b175bd59fa	pg_stat_activity: show NULL stmt start time for walsenders Returning a non-NULL time is pointless, sinc a walsender is not a process that would be running normal transactions anyway, but the code was unintentionally exposing the process start time intermittently, which was not only bogus but it also confused monitoring systems looking for idle transactions. Fix by avoiding all updates in walsenders. Backpatch to 11, where walsenders started appearing in pg_stat_activity. Reported-by: Tomas Vondra Discussion: https://postgr.es/m/20191209234409.exe7osmyalwkt5j4@development	2020-01-07 17:38:48 -03:00
Robert Haas	ce242ae154	tableam: New callback relation_fetch_toast_slice. Instead of always calling heap_fetch_toast_slice during detoasting, invoke a table AM callback which, when the toast table is a heap table, will be heap_fetch_toast_slice. This makes it possible for a table AM other than heap to be used as a TOAST table. It also completes the series of commits intended to improve the interaction of tableam with TOAST that began with commit 8b94dab06617ef80a0901ab103ebd8754427ef5a; detoast.c is now, hopefully, fully AM-independent. Patch by me, reviewed by Andres Freund and Peter Eisentraut. Discussion: http://postgr.es/m/CA+TgmoZv-=2iWM4jcw5ZhJeL18HF96+W1yJeYrnGMYdkFFnEpQ@mail.gmail.com	2020-01-07 14:36:38 -05:00
Robert Haas	83322e38da	tableam: Allow choice of toast AM. Previously, the toast table had to be implemented by the same AM that was used for the main table, which was bad, because the detoasting code won't work with anything but heap. This commit doesn't fix the latter problem, although there's another patch coming which does, but it does let you pick something that works (i.e. heap, right now). Patch by me, reviewed by Andres Freund. Discussion: http://postgr.es/m/CA+TgmoZv-=2iWM4jcw5ZhJeL18HF96+W1yJeYrnGMYdkFFnEpQ@mail.gmail.com	2020-01-07 14:23:25 -05:00
Robert Haas	8147278589	Increase the maximum value of track_activity_query_size. This one-line change provoked a lot of discussion, but ultimately the consensus seems to be that allowing a larger value might be useful to somebody, and probably won't hurt anyone who chooses not to take advantage of the higher maximum limit. Vyacheslav Makarov, reviewed by many people. Discussion: http://postgr.es/m/7b5ecc5a9991045e2f13c84e3047541d@postgrespro.ru	2020-01-07 12:14:19 -05:00
Tom Lane	e369f37086	Reduce the number of GetFlushRecPtr() calls done by walsenders. Since the WAL flush position only moves forward, it's safe to cache its previous value within each walsender process, and update from shared memory only once we've caught up to the previously-seen value. When there are many active walsenders, this makes for a very significant reduction in the amount of contention on the XLogCtl->info_lck spinlock. This patch also adjusts the logic so that we update our idea of the flush position after processing a WAL record, rather than beforehand. This may cause us to realize we're not caught up when the preceding coding would've thought that we were, but that seems all to the good; it may avoid a useless sleep-and-wakeup cycle. Back-patch to v12. The contention problem exists in prior branches, but it's much less severe (due to inefficiencies elsewhere) so there seems no need to take any risk of back-patching further. Pierre Ducroquet, reviewed by Julien Rouhaud Discussion: https://postgr.es/m/2931018.Vxl9zapr77@pierred-pdoc	2020-01-06 16:42:20 -05:00
Tom Lane	20d6225d16	Add functions min_scale(numeric) and trim_scale(numeric). These allow better control of trailing zeroes in numeric values. Pavel Stehule, based on an old proposal of Marko Tiikkaja's; review by Karl Pinc Discussion: https://postgr.es/m/CAFj8pRDjs-navGASeF0Wk74N36YGFJ+v=Ok9_knRa7vDc-qugg@mail.gmail.com	2020-01-06 12:13:53 -05:00
Peter Eisentraut	b9c130a1fd	Have logical replication subscriber fire column triggers The logical replication apply worker did not fire per-column update triggers because the updatedCols bitmap in the RTE was not populated. This fixes that. Reviewed-by: Euler Taveira <euler@timbira.com.br> Discussion: https://www.postgresql.org/message-id/flat/21673e2d-597c-6afe-637e-e8b10425b240%402ndquadrant.com	2020-01-06 08:40:00 +01:00
Michael Paquier	7b283d0e1d	Remove support for OpenSSL 0.9.8 and 1.0.0 Support is out of scope from all the major vendors for these versions (for example RHEL5 uses a version based on 0.9.8, and RHEL6 uses 1.0.1), and it created some extra maintenance work. Upstream has stopped support of 0.9.8 in December 2015 and of 1.0.0 in February 2016. Since `b1abfec`, note that the default SSL protocol version set with ssl_min_protocol_version is TLSv1.2, whose support was added in OpenSSL 1.0.1, so there is no point to enforce ssl_min_protocol_version to TLSv1 in the SSL tests. Author: Michael Paquier Reviewed-by: Daniel Gustafsson, Tom Lane Discussion: https://postgr.es/m/20191205083252.GE5064@paquier.xyz	2020-01-06 12:51:44 +09:00
Peter Geoghegan	fc31001123	Remove redundant incomplete split assertion. The fastpath insert optimization's incomplete split flag Assert() is redundant. We'll reach the more general Assert() within _bt_findinsertloc() in all cases. (Besides, Assert()'ing that the rightmost page doesn't have the flag set never made much sense.)	2020-01-05 17:42:13 -08:00
Peter Eisentraut	3fd40b628c	Make better use of ParseState in ProcessUtility Pass ParseState into the functions called from standard_ProcessUtility() instead passing the query string and query environment separately. No functionality change, but it makes the notation consistent. We had already started moving things into that direction piece by piece, and this completes it. Reviewed-by: Pavel Stehule <pavel.stehule@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/6e7aa4a1-be6a-1a75-b1f9-83a678e5184a@2ndquadrant.com	2020-01-04 13:12:41 +01:00
Peter Geoghegan	d2e5e20e57	Add xl_btree_delete optimization. Commit `558a9165e0` taught _bt_delitems_delete() to produce its own XID horizon on the primary. Standbys no longer needed to generate their own latestRemovedXid, since they could just use the explicitly logged value from the primary instead. The deleted offset numbers array from the xl_btree_delete WAL record was no longer used by the REDO routine for anything other than deleting the items. This enables a minor optimization: We now treat the array as buffer state, not generic WAL data, following _bt_delitems_vacuum()'s example. This should be a minor win, since it allows us to avoid including the deleted items array in cases where XLogInsert() stores the whole buffer anyway. The primary goal here is to make the code more maintainable, though. Removing inessential differences between the two functions highlights the fundamental differences that remain. Also change xl_btree_delete to use uint32 for the size of the array of item offsets being deleted. This brings xl_btree_delete closer to xl_btree_vacuum. Furthermore, it seems like a good idea to use an explicit-width integer type (the field was previously an "int"). Bump XLOG_PAGE_MAGIC because xl_btree_delete changed. Discussion: https://postgr.es/m/CAH2-Wzkz4TjmezzfAbaV1zYrh=fr0bCpzuJTvBe5iUQ3aUPsCQ@mail.gmail.com	2020-01-03 12:18:13 -08:00
Peter Geoghegan	0c41c83d8f	Clear up btree_xlog_split() alignment comment. Adjust a comment that describes how alignment of the new left page high key works in btree_xlog_split(), the nbtree page split REDO routine. The wording used before commit `2c03216d83` is much clearer, so go back to that.	2020-01-02 18:30:25 -08:00
Peter Geoghegan	44e44bd258	Correct _bt_delitems_vacuum() lock comments. The expectation within _bt_delitems_vacuum() is that caller has a super-exclusive/cleanup buffer lock (not just a pin and a write lock).	2020-01-02 13:30:40 -08:00
Alvaro Herrera	1fa846f1c9	Fix cloning of row triggers to sub-partitions When row triggers exist in partitioned partitions that are not either part of FKs or deferred unique constraints, they are not correctly cloned to their partitions. That's because they are marked "internal", and those are purposefully skipped when doing the clone triggers dance. Fix by relaxing the condition on which internal triggers are skipped. Amit Langote initially diagnosed the problem and proposed a fix, but I used a different approach. Reported-by: Petr Fedorov Discussion: https://postgr.es/m/6b3f0646-ba8c-b3a9-c62d-1c6651a1920f@phystech.edu	2020-01-02 17:04:24 -03:00
Tom Lane	915c04f091	Fix typmod exposed for scalar function in FROM, too. On further reflection about commit `4d02eb017`, it occurs to me that expandRTE() had better agree with what addRangeTableEntryForFunction() is doing. So teach that about functions possibly having typmods, too.	2020-01-02 14:02:55 -05:00
Tom Lane	4d02eb017e	Fix collation exposed for scalar function in FROM. One code path in addRangeTableEntryForFunction() neglected to assign a collation to the tupdesc entry it constructs (which is a bit odd considering the other path did do so). This didn't matter before commit `5815696bc`, because nothing would look at the type data in this tupdesc; but now it does. While at it, make sure we assign the correct typmod as well. Most function expressions don't have a determinate typmod, but some do. Per buildfarm, which showed failures in non-C collations, a case I'd not thought to test for this patch :-(	2020-01-02 13:48:54 -05:00
Tom Lane	5815696bc6	Make parser rely more heavily on the ParseNamespaceItem data structure. When I added the ParseNamespaceItem data structure (in commit `5ebaaa494`), it wasn't very tightly integrated into the parser's APIs. In the wake of adding p_rtindex to that struct (commit `b541e9acc`), there is a good reason to make more use of it: by passing around ParseNamespaceItem pointers instead of bare RTE pointers, we can get rid of various messy methods for passing back or deducing the rangetable index of an RTE during parsing. Hence, refactor the addRangeTableEntryXXX functions to build and return a ParseNamespaceItem struct, not just the RTE proper; and replace addRTEtoQuery with addNSItemToQuery, which is passed a ParseNamespaceItem rather than building one internally. Also, add per-column data (a ParseNamespaceColumn array) to each ParseNamespaceItem. These arrays are built during addRangeTableEntryXXX, where we have column type data at hand so that it's nearly free to fill the data structure. Later, when we need to build Vars referencing RTEs, we can use the ParseNamespaceColumn info to avoid the rather expensive operations done in get_rte_attribute_type() or expandRTE(). get_rte_attribute_type() is indeed dead code now, so I've removed it. This makes for a useful improvement in parse analysis speed, around 20% in one moderately-complex test query. The ParseNamespaceColumn structs also include Var identity information (varno/varattno). That info isn't actually being used in this patch, except that p_varno == 0 is a handy test for a dropped column. A follow-on patch will make more use of it. Discussion: https://postgr.es/m/2461.1577764221@sss.pgh.pa.us	2020-01-02 11:29:01 -05:00
Amit Kapila	d207038053	Fix running out of file descriptors for spill files. Currently while decoding changes, if the number of changes exceeds a certain threshold, we spill those to disk. And this happens for each (sub)transaction. Now, while reading all these files, we don't close them until we read all the files. While reading these files, if the number of such files exceeds the maximum number of file descriptors, the operation errors out. Use PathNameOpenFile interface to open these files as that internally has the mechanism to release kernel FDs as needed to get us under the max_safe_fds limit. Reported-by: Amit Khandekar Author: Amit Khandekar Reviewed-by: Amit Kapila Backpatch-through: 9.4 Discussion: https://postgr.es/m/CAJ3gD9c-sECEn79zXw4yBnBdOttacoE-6gAyP0oy60nfs_sabQ@mail.gmail.com	2020-01-02 11:41:04 +05:30
Peter Geoghegan	4b25f5d0ba	Revise BTP_HAS_GARBAGE nbtree VACUUM comments. _bt_delitems_vacuum() comments claimed that it isn't worth another scan of the page to avoid falsely unsetting the BTP_HAS_GARBAGE page flag hint (this happens to be the same wording that was removed from _bt_delitems_delete() by my recent commit `fe97c61c`). The comments made little sense, though. The issue can't have much to do with performing a second scan of the target leaf page, since an LP_DEAD test could easily be performed in the first scan of the page anyway (the scan that takes place in btvacuumpage() caller). Revise the explanation. It makes much more sense to frame this as an issue about recovery conflicts. _bt_delitems_vacuum() cannot easily generate an XID cutoff in the same way that _bt_delitems_delete() is designed to. Falsely unsetting the page flag is not ideal, and is likely to happen more often than was supposed by the original comments. Explain why it usually isn't a problem in practice. There may be an argument for _bt_delitems_vacuum() not clearing the BTP_HAS_GARBAGE bit, removing the question of it being falsely unset by VACUUM (there may even be an argument for not using a page level hint at all). This can be revisited later.	2020-01-01 17:29:41 -08:00
Peter Geoghegan	c5f3b53b0e	Update btree_xlog_delete() comments. Commit `fe97c61c` updated LP_DEAD item deletion comments, but missed a minor discrepancy on the REDO side. Fix it now. In passing, don't talk about the btree_xlog_vacuum() behavior within btree_xlog_delete(). The reliance on XLOG_HEAP2_CLEANUP_INFO records for recovery conflicts is already discussed within btvacuumpage() and mentioned again in passing above btree_xlog_vacuum(), which seems sufficient.	2020-01-01 11:32:07 -08:00
Bruce Momjian	7559d8ebfa	Update copyrights for 2020 Backpatch-through: update all files in master, backpatch legal files through 9.4	2020-01-01 12:21:45 -05:00
Tom Lane	0ce38730ac	Micro-optimize AllocSetFreeIndex() by reference to pg_bitutils code. Use __builtin_clz() where available. Where it isn't, we can still win a little by using the pg_leftmost_one_pos[] lookup table instead of having a private table. Also drop the initial right shift by ALLOC_MINBITS in favor of subtracting ALLOC_MINBITS from the leftmost-one-pos result. This is a win because the compiler can fold that adjustment into other constants it'd have to add anyway, making the shift-removal free. Also, we can explain this coding as an unrolled form of pg_leftmost_one_pos32(), even though that's a bit ahistorical since it long predates pg_bitutils.h. John Naylor, with some cosmetic adjustments by me Discussion: https://postgr.es/m/CACPNZCuNUGMxjK7WTn_=WZnRbfASDdBxmjsVf2+m9MdmeNw_sg@mail.gmail.com	2019-12-28 17:21:17 -05:00
Michael Paquier	a052f6cbb8	Forbid DROP SCHEMA on temporary namespaces This operation was possible for the owner of the schema or a superuser. Down to 9.4, doing this operation would cause inconsistencies in a session whose temporary schema was dropped, particularly if trying to create new temporary objects after the drop. A more annoying consequence is a crash of autovacuum on an assertion failure when logging information about an orphaned temp table dropped. Note that because of `246a6c8` (present in v11~), which has made the removal of orphaned temporary tables more aggressive, the failure could be triggered more easily, but it is possible to reproduce down to 9.4. Reported-by: Mahendra Singh, Prabhat Sahu Author: Michael Paquier Reviewed-by: Kyotaro Horiguchi, Mahendra Singh Discussion: https://postgr.es/m/CAKYtNAr9Zq=1-ww4etHo-VCC-k120YxZy5OS01VkaLPaDbv2tg@mail.gmail.com Backpatch-through: 9.4	2019-12-27 17:58:43 +09:00
Michael Paquier	7854e07f25	Revert "Rename files and headers related to index AM" This follows multiple complains from Peter Geoghegan, Andres Freund and Alvaro Herrera that this issue ought to be dug more before actually happening, if it happens. Discussion: https://postgr.es/m/20191226144606.GA5659@alvherre.pgsql	2019-12-27 08:09:00 +09:00
Tom Lane	b541e9accb	Refactor parser's generation of Var nodes. Instead of passing around a pointer to the RangeTblEntry that provides the desired column, pass a pointer to the associated ParseNamespaceItem. The RTE is trivially reachable from the nsitem, and having the ParseNamespaceItem allows access to additional information. As proof of concept for that, add the rangetable index to ParseNamespaceItem, and use that to get rid of RTERangeTablePosn searches. (I have in mind to teach the parser to generate some different representation for Vars that are nullable by outer joins, and keeping the necessary information in ParseNamespaceItems seems like a reasonable approach to that. But whether that ever happens or not, this seems like good cleanup.) Also refactor the code around scanRTEForColumn so that the "fuzzy match" stuff does not leak out of parse_relation.c. Discussion: https://postgr.es/m/26144.1576858373@sss.pgh.pa.us	2019-12-26 11:16:42 -05:00

... 7 8 9 10 11 ...

20894 Commits