postgresql

Author	SHA1	Message	Date
Tom Lane	ac9099fc1d	Fix confusion in SP-GiST between attribute type and leaf storage type. According to the documentation, the attType passed to the opclass config function (and also relied on by the core code) is the type of the heap column or expression being indexed. But what was actually being passed was the type stored for the index column. This made no difference for user-defined SP-GiST opclasses, because we weren't allowing the STORAGE clause of CREATE OPCLASS to be used, so the two types would be the same. But it's silly not to allow that, seeing that the built-in poly_ops opclass has a different value for opckeytype than opcintype, and that if you want to do lossy storage then the types must really be different. (Thus, user-defined opclasses doing lossy storage had to lie about what type is in the index.) Hence, remove the restriction, and make sure that we use the input column type not opckeytype where relevant. For reasons of backwards compatibility with existing user-defined opclasses, we can't quite insist that the specified leafType match the STORAGE clause; instead just add an amvalidate() warning if they don't match. Also fix some bugs that would only manifest when trying to return index entries when attType is different from attLeafType. It's not too surprising that these have not been reported, because the only usual reason for such a difference is to store the leaf value lossily, rendering index-only scans impossible. Add a src/test/modules module to exercise cases where attType is different from attLeafType and yet index-only scan is supported. Discussion: https://postgr.es/m/3728741.1617381471@sss.pgh.pa.us	2021-04-04 14:28:57 -04:00
Tomas Vondra	d9c5b9a9ee	Fix bug in brin_minmax_multi_union When calling sort_expanded_ranges() we need to remember the return value, because the function sorts and also deduplicates the ranges. So the number of ranges may decrease. brin_minmax_multi_union failed to do that, which resulted in crashes due to bogus ranges (equal minval/maxval but not marked as compacted). Reported-by: Jaime Casanova Discussion: https://postgr.es/m/20210404052550.GA4376%40ahch-to	2021-04-04 19:36:12 +02:00
Tomas Vondra	1dad2a5ea3	Fix order of parameters in BRIN minmax-multi calls The BRIN minmax-multi consistent function incorrectly assumed it can look up an operator, and then swap the arguments to get the commutator. For example <(a,b) would be called as <(b,a) to get >(a,b). This works when the arguments are of the same type, but with cross-type opclasses this fails. We can't swap <(float4,float8) arguments, for example. Fixed by passing arguments in the right order. Discussion: https://postgr.es/m/CAJKUy5jLZFLCxyxfT%3DMfK5mtPfSzHA1rVLowR-j4RRsFVvKm7A%40mail.gmail.com	2021-04-04 19:25:41 +02:00
Tomas Vondra	e1fbe1181c	Fix BRIN minmax-multi distance for inet type The distance calculation ignored the mask, unlike the inet comparator, which resulted in negative distance in some cases. Fixed by applying the mask in brin_minmax_multi_distance_inet. I've considered simply calling inetmi() to calculate the delta, but that does not consider mask either. Reviewed-by: Zhihong Yu Discussion: https://postgr.es/m/1a0a7b9d-9bda-e3a2-7fa4-88f15042a051%40enterprisedb.com	2021-04-04 19:23:32 +02:00
Tomas Vondra	7262f2421a	Fix BRIN minmax-multi distance for timetz type The distance calculation ignored the time zone, so the result of (b-a) might have ended negative even if (b > a). Fixed by considering the time zone difference. Reported-by: Jaime Casanova Discussion: https://postgr.es/m/CAJKUy5jLZFLCxyxfT%3DMfK5mtPfSzHA1rVLowR-j4RRsFVvKm7A%40mail.gmail.com	2021-04-04 19:22:23 +02:00
Tomas Vondra	2b10e0e3c2	Fix BRIN minmax-multi distance for interval type The distance calculation for interval type was treating months as having 31 days, which is inconsistent with the interval comparator (using 30 days). Due to this it was possible to get negative distance (b-a) when (a<b), triggering an assert. Fixed by adopting the same logic as interval_cmp_value. Reported-by: Jaime Casanova Discussion: https://postgr.es/m/CAJKUy5jKH0Xhneau2mNftNPtTy-BVgQfXc8zQkEvRvBHfeUThQ%40mail.gmail.com	2021-04-04 19:19:51 +02:00
Andres Freund	225a22b19e	Improve efficiency of wait event reporting, remove proc.h dependency. pgstat_report_wait_start() and pgstat_report_wait_end() required two conditional branches so far. One to check if MyProc is NULL, the other to check if pgstat_track_activities is set. As wait events are used around comparatively lightweight operations, and are inlined (reducing branch predictor effectiveness), that's not great. The dependency on MyProc has a second disadvantage: Low-level subsystems, like storage/file/fd.c, report wait events, but architecturally it is preferable for them to not depend on inter-process subsystems like proc.h (defining PGPROC). After this change, including pgstat.h (or, obviously, its sub-components like backend_status.h, wait_event.h, ...) does not pull in IPC related headers anymore. These goals, efficiency and abstraction, are achieved by having pgstat_report_wait_start/end() not interact with MyProc, but instead a new my_wait_event_info variable. At backend startup it points to a local variable, removing the need to check for MyProc being NULL. During process initialization my_wait_event_info is redirected to MyProc->wait_event_info. At shutdown this is reversed. Because wait event reporting now does not need to know about where the wait event is stored, it does not need to know about PGPROC anymore. The removal of the branch for checking pgstat_track_activities is simpler: Don't check anymore. The cost due to the branch is often higher than the store - and even if not, pgstat_track_activities is rarely disabled. The main motivator to commit this work now is that removing the (indirect) pgproc.h include from pgstat.h simplifies a patch to move statistics reporting to shared memory (which still has a chance to get into 14). Author: Andres Freund <andres@anarazel.de> Discussion: https://postgr.es/m/20210402194458.2vu324hkk2djq6ce@alap3.anarazel.de	2021-04-03 12:03:45 -07:00
Andres Freund	e1025044cd	Split backend status and progress related functionality out of pgstat.c. Backend status (supporting pg_stat_activity) and command progress (supporting pg_stat_progress*) related code is largely independent from the rest of pgstat.[ch] (supporting views like pg_stat_all_tables that accumulate data over time). See also `a333476b92`. This commit doesn't rename the function names to make the distinction from the rest of pgstat_ clearer - that'd be more invasive and not clearly beneficial. If we were to decide to do such a rename at some point, it's better done separately from moving the code as well. Robert's review was of an earlier version. Reviewed-By: Robert Haas <robertmhaas@gmail.com> Discussion: https://postgr.es/m/20210316195440.twxmlov24rr2nxrg@alap3.anarazel.de	2021-04-03 11:42:52 -07:00
Michael Paquier	e6bdfd9700	Refactor HMAC implementations Similarly to the cryptohash implementations, this refactors the existing HMAC code into a single set of APIs that can be plugged with any crypto libraries PostgreSQL is built with (only OpenSSL currently). If there are no such libraries, a fallback implementation is available. Those new APIs are designed similarly to the existing cryptohash layer, so there is no real new design here, with the same logic around buffer bound checks and memory handling. HMAC has a dependency on cryptohashes, so all the cryptohash types supported by cryptohash{_openssl}.c can be used with HMAC. This refactoring is an advantage mainly for SCRAM, that included its own implementation of HMAC with SHA256 without relying on the existing crypto libraries even if PostgreSQL was built with their support. This code has been tested on Windows and Linux, with and without OpenSSL, across all the versions supported on HEAD from 1.1.1 down to 1.0.1. I have also checked that the implementations are working fine using some sample results, a custom extension of my own, and doing cross-checks across different major versions with SCRAM with the client and the backend. Author: Michael Paquier Reviewed-by: Bruce Momjian Discussion: https://postgr.es/m/X9m0nkEJEzIPXjeZ@paquier.xyz	2021-04-03 17:30:49 +09:00
Andres Freund	1d9c5d0ce2	Do not rely on pgstat.h to indirectly include storage/ headers. An upcoming patch might remove the (now indirect) proc.h include (which in turn includes other headers), and it's cleaner for the modified files to include their dependencies directly anyway... Discussion: https://postgr.es/m/20210402194458.2vu324hkk2djq6ce@alap3.anarazel.de	2021-04-02 20:02:47 -07:00
Andres Freund	a333476b92	Split wait event related code from pgstat.[ch] into wait_event.[ch]. The wait event related code is independent from the rest of the pgstat.[ch] code, of nontrivial size and changes on a regular basis. Put it into its own set of files. As there doesn't seem to be a good pre-existing directory for code like this, add src/backend/utils/activity. Reviewed-By: Robert Haas <robertmhaas@gmail.com> Discussion: https://postgr.es/m/20210316195440.twxmlov24rr2nxrg@alap3.anarazel.de	2021-04-02 20:02:26 -07:00
David Rowley	1267d9862f	Remove useless Asserts in Result Cache code Testing if an unsigned variable is >= 0 is pretty pointless. There's likely enough code in remove_cache_entry() to verify the cache memory accounting is correct in assert enabled builds. These Asserts were not adding much extra cover, even if they had been checking >= 0 on a signed variable. Reported-by: Andres Freund Discussion: https://postgr.es/m/20210402204734.6mo3nfacnljlicgn@alap3.anarazel.de	2021-04-03 10:41:43 +13:00
Thomas Munro	c30f54ad73	Detect POLLHUP/POLLRDHUP while running queries. Provide a new GUC check_client_connection_interval that can be used to check whether the client connection has gone away, while running very long queries. It is disabled by default. For now this uses a non-standard Linux extension (also adopted by at least one other OS). POLLRDHUP is not defined by POSIX, and other OSes don't have a reliable way to know if a connection was closed without actually trying to read or write. In future we might consider trying to send a no-op/heartbeat message instead, but that could require protocol changes. Author: Sergey Cherkashin <s.cherkashin@postgrespro.ru> Author: Thomas Munro <thomas.munro@gmail.com> Reviewed-by: Thomas Munro <thomas.munro@gmail.com> Reviewed-by: Tatsuo Ishii <ishii@sraoss.co.jp> Reviewed-by: Konstantin Knizhnik <k.knizhnik@postgrespro.ru> Reviewed-by: Zhihong Yu <zyu@yugabyte.com> Reviewed-by: Andres Freund <andres@anarazel.de> Reviewed-by: Maksim Milyutin <milyutinma@gmail.com> Reviewed-by: Tsunakawa, Takayuki/綱川貴之 <tsunakawa.takay@fujitsu.com> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> (much earlier version) Discussion: https://postgr.es/m/77def86b27e41f0efcba411460e929ae%40postgrespro.ru	2021-04-03 09:02:41 +13:00
Tom Lane	53aafdb9ff	Strip file names reported in error messages on Windows, too. Commit `dd136052b` established a policy that error message FILE items should include only the base name of the reporting source file, for uniformity and succinctness. We now observe that some Windows compilers use backslashes in __FILE__ strings, so truncate at backslashes as well. This is expected to fix some platform variation in the results of the new libpq_pipeline test module. Discussion: https://postgr.es/m/3650140.1617372290@sss.pgh.pa.us	2021-04-02 10:43:54 -04:00
Peter Eisentraut	9c5f67fd62	Add support for NullIfExpr in eval_const_expressions Author: Hou Zhijie <houzj.fnst@cn.fujitsu.com> Discussion: https://www.postgresql.org/message-id/flat/7ea5ce773bbc4eea9ff1a381acd3b102@G08CNEXMBPEKD05.g08.fujitsu.local	2021-04-02 11:01:49 +02:00
Fujii Masao	96bdb7e19d	Fix pgstat_report_replslot() to use proper data types for its arguments. The caller of pgstat_report_replslot() passes int64 values to the function. Also the function stores those values in PgStat_Counter (i.e., int64) fields of PgStat_MsgReplSlot struct. But previously the function used "int" as the data types of some arguments for those values, which could lead to the overflow of values. To avoid this risk, this commit fixes pgstat_report_replslot() to use PgStat_Counter type for the arguments. Since they are the statistics counters, PgStat_Counter, the data type used for counters, is used for them instead of int64. Reported-by: Vignesh C Author: Vignesh C Reviewed-by: Jeevan Ladhe, Fujii Masao Discussion: https://postgr.es/m/CALDaNm080OpG=ZwOb0i8EyChH5SyHAMFWJCKaKTXmrfvJLbgaA@mail.gmail.com	2021-04-02 17:27:31 +09:00
David Rowley	9eacee2e62	Add Result Cache executor node (take 2) Here we add a new executor node type named "Result Cache". The planner can include this node type in the plan to have the executor cache the results from the inner side of parameterized nested loop joins. This allows caching of tuples for sets of parameters so that in the event that the node sees the same parameter values again, it can just return the cached tuples instead of rescanning the inner side of the join all over again. Internally, result cache uses a hash table in order to quickly find tuples that have been previously cached. For certain data sets, this can significantly improve the performance of joins. The best cases for using this new node type are for join problems where a large portion of the tuples from the inner side of the join have no join partner on the outer side of the join. In such cases, hash join would have to hash values that are never looked up, thus bloating the hash table and possibly causing it to multi-batch. Merge joins would have to skip over all of the unmatched rows. If we use a nested loop join with a result cache, then we only cache tuples that have at least one join partner on the outer side of the join. The benefits of using a parameterized nested loop with a result cache increase when there are fewer distinct values being looked up and the number of lookups of each value is large. Also, hash probes to look up the cache can be much faster than the hash probe in a hash join as it's common that the result cache's hash table is much smaller than the hash join's due to result cache only caching useful tuples rather than all tuples from the inner side of the join. This variation in hash probe performance is more significant when the hash join's hash table no longer fits into the CPU's L3 cache, but the result cache's hash table does. The apparent "random" access of hash buckets with each hash probe can cause a poor L3 cache hit ratio for large hash tables. Smaller hash tables generally perform better. The hash table used for the cache limits itself to not exceeding work_mem * hash_mem_multiplier in size. We maintain a dlist of keys for this cache and when we're adding new tuples and realize we've exceeded the memory budget, we evict cache entries starting with the least recently used ones until we have enough memory to add the new tuples to the cache. For parameterized nested loop joins, we now consider using one of these result cache nodes in between the nested loop node and its inner node. We determine when this might be useful based on cost, which is primarily driven off of what the expected cache hit ratio will be. Estimating the cache hit ratio relies on having good distinct estimates on the nested loop's parameters. For now, the planner will only consider using a result cache for parameterized nested loop joins. This works for both normal joins and also for LATERAL type joins to subqueries. It is possible to use this new node for other uses in the future. For example, to cache results from correlated subqueries. However, that's not done here due to some difficulties obtaining a distinct estimation on the outer plan to calculate the estimated cache hit ratio. Currently we plan the inner plan before planning the outer plan so there is no good way to know if a result cache would be useful or not since we can't estimate the number of times the subplan will be called until the outer plan is generated. The functionality being added here is newly introducing a dependency on the return value of estimate_num_groups() during the join search. Previously, during the join search, we only ever needed to perform selectivity estimations. With this commit, we need to use estimate_num_groups() in order to estimate what the hit ratio on the result cache will be. In simple terms, if we expect 10 distinct values and we expect 1000 outer rows, then we'll estimate the hit ratio to be 99%. Since cache hits are very cheap compared to scanning the underlying nodes on the inner side of the nested loop join, then this will significantly reduce the planner's cost for the join. However, it's fairly easy to see here that things will go bad when estimate_num_groups() incorrectly returns a value that's significantly lower than the actual number of distinct values. If this happens then that may cause us to make use of a nested loop join with a result cache instead of some other join type, such as a merge or hash join. Our distinct estimations have been known to be a source of trouble in the past, so the extra reliance on them here could cause the planner to choose slower plans than it did previous to having this feature. Distinct estimations are also fairly hard to estimate accurately when several tables have been joined already or when a WHERE clause filters out a set of values that are correlated to the expressions we're estimating the number of distinct values for. For now, the costing we perform during query planning for result caches does put quite a bit of faith in the distinct estimations being accurate. When these are accurate then we should generally see faster execution times for plans containing a result cache. However, in the real world, we may find that we need to either change the costings to put less trust in the distinct estimations being accurate or perhaps even disable this feature by default. There's always an element of risk when we teach the query planner to do new tricks that it decides to use that new trick at the wrong time and causes a regression. Users may opt to get the old behavior by turning the feature off using the enable_resultcache GUC. Currently, this is enabled by default. It remains to be seen if we'll maintain that setting for the release. Additionally, the name "Result Cache" is the best name I could think of for this new node at the time I started writing the patch. Nobody seems to strongly dislike the name. A few people did suggest other names but no other name seemed to dominate in the brief discussion that there was about names. Let's allow the beta period to see if the current name pleases enough people. If there's some consensus on a better name, then we can change it before the release. Please see the 2nd discussion link below for the discussion on the "Result Cache" name. Author: David Rowley Reviewed-by: Andy Fan, Justin Pryzby, Zhihong Yu, Hou Zhijie Tested-By: Konstantin Knizhnik Discussion: https://postgr.es/m/CAApHDvrPcQyQdWERGYWx8J%2B2DLUNgXu%2BfOSbQ1UscxrunyXyrQ%40mail.gmail.com Discussion: https://postgr.es/m/CAApHDvq=yQXr5kqhRviT2RhNKwToaWr9JAN5t+5_PzhuRJ3wvg@mail.gmail.com	2021-04-02 14:10:56 +13:00
Tom Lane	1ebdec8c03	Rethink handling of pass-by-value leaf datums in SP-GiST. The existing convention in SP-GiST is that any pass-by-value datatype is stored in Datum representation, i.e. it's of width sizeof(Datum) even when typlen is less than that. This is okay, or at least it's too late to change it, for prefix datums and node-label datums in inner (upper) tuples. But it's problematic for leaf datums, because we'd prefer those to be stored in Postgres' standard on-disk representation so that we can easily extend leaf tuples to carry additional "included" columns. I believe, however, that we can get away with just up and changing that. This would be an unacceptable on-disk-format break, but there are two big mitigating factors: 1. It seems quite unlikely that there are any SP-GiST opclasses out there that use pass-by-value leaf datatypes. Certainly none of the ones in core do, nor has codesearch.debian.net heard of any. Given what SP-GiST is good for, it's hard to conceive of a use-case where the leaf-level values would be both small and fixed-width. (As an example, if you wanted to index text values with the leaf level being just a byte, then every text string would have to be represented with one level of inner tuple per preceding byte, which would be horrendously space-inefficient and slow to access. You always want to use as few inner-tuple levels as possible, leaving as much as possible in the leaf values.) 2. Even granting that you have such an index, this change only breaks things on big-endian machines. On little-endian, the high order bytes of the Datum format will now just appear to be alignment padding space. So, change the code to store pass-by-value leaf datums in their usual on-disk form. Inner-tuple datums are not touched. This is extracted from a larger patch that intends to add support for "included" columns. I'm committing it separately for visibility in our commit logs. Pavel Borisov and Tom Lane, reviewed by Andrey Borodin Discussion: https://postgr.es/m/CALT9ZEFi-vMp4faht9f9Junb1nO3NOSjhpxTmbm1UGLMsLqiEQ@mail.gmail.com	2021-04-01 17:55:17 -04:00
Stephen Frost	c9c41c7a33	Rename Default Roles to Predefined Roles The term 'default roles' wasn't quite apt as these roles aren't able to be modified or removed after installation, so rename them to be 'Predefined Roles' instead, adding an entry into the newly added Obsolete Appendix to help users of current releases find the new documentation. Bruce Momjian and Stephen Frost Discussion: https://postgr.es/m/157742545062.1149.11052653770497832538%40wrigleys.postgresql.org and https://www.postgresql.org/message-id/20201120211304.GG16415@tamriel.snowman.net	2021-04-01 15:32:06 -04:00
Peter Eisentraut	91e7c90329	Fix internal extract(timezone_minute) formulas Through various refactorings over time, the extract(timezone_minute from time with time zone) and extract(timezone_minute from timestamp with time zone) implementations ended up with two different but equally nonsensical formulas by using SECS_PER_MINUTE and MINS_PER_HOUR interchangeably. Since those two are of course both the same number, the formulas do work, but for readability, fix them to be semantically correct.	2021-04-01 16:12:53 +02:00
Heikki Linnakangas	f82de5c46b	Do COPY FROM encoding conversion/verification in larger chunks. This gives a small performance gain, by reducing the number of calls to the conversion/verification function, and letting it work with larger inputs. Also, reorganizing the input pipeline makes it easier to parallelize the input parsing: after the input has been converted to the database encoding, the next stage of finding the newlines can be done in parallel, because there cannot be any newline chars "embedded" in multi-byte characters in the encodings that we support as server encodings. This changes behavior in one corner case: if client and server encodings are the same single-byte encoding (e.g. latin1), previously the input would not be checked for zero bytes ('\0'). Any fields containing zero bytes would be truncated at the zero. But if encoding conversion was needed, the conversion routine would throw an error on the zero. After this commit, the input is always checked for zeros. Reviewed-by: John Naylor Discussion: https://www.postgresql.org/message-id/e7861509-3960-538a-9025-b75a61188e01%40iki.fi	2021-04-01 12:23:40 +03:00
Heikki Linnakangas	ea1b99a661	Add 'noError' argument to encoding conversion functions. With the 'noError' argument, you can try to convert a buffer without knowing the character boundaries beforehand. The functions now need to return the number of input bytes successfully converted. This is a backwards-incompatible change, if you have created a custom encoding conversion with CREATE CONVERSION. This adds a check to pg_upgrade for that, refusing the upgrade if there are any user-defined encoding conversions. Custom conversions are very rare, there are no commonly used extensions that I know of that use that feature. No other objects can depend on conversions, so if you do have one, you can fairly easily drop it before upgrading, and recreate it after the upgrade with an updated version. Add regression tests for built-in encoding conversions. This doesn't cover every conversion, but it covers all the internal functions in conv.c that are used to implement the conversions. Reviewed-by: John Naylor Discussion: https://www.postgresql.org/message-id/e7861509-3960-538a-9025-b75a61188e01%40iki.fi	2021-04-01 11:45:22 +03:00
Amit Kapila	4778826532	Ensure to send a prepare after we detect concurrent abort during decoding. It is possible that while decoding a prepared transaction, it gets aborted concurrently via a ROLLBACK PREPARED command. In that case, we were skipping all the changes and directly sending Rollback Prepared when we find the same in WAL. However, the downstream has no idea of the GID of such a transaction. So, ensure to send prepare even when a concurrent abort is detected. Author: Ajin Cherian Reviewed-by: Markus Wanner, Amit Kapila Discussion: https://postgr.es/m/f82133c6-6055-b400-7922-97dae9f2b50b@enterprisedb.com	2021-04-01 07:57:34 +05:30
David Rowley	28b3e3905c	Revert `b6002a796` This removes "Add Result Cache executor node". It seems that something weird is going on with the tracking of cache hits and misses as highlighted by many buildfarm animals. It's not yet clear what the problem is as other parts of the plan indicate that the cache did work correctly, it's just the hits and misses that were being reported as 0. This is especially a bad time to have the buildfarm so broken, so reverting before too many more animals go red. Discussion: https://postgr.es/m/CAApHDvq_hydhfovm4=izgWs+C5HqEeRScjMbOgbpC-jRAeK3Yw@mail.gmail.com	2021-04-01 13:33:23 +13:00
David Rowley	b6002a796d	Add Result Cache executor node Here we add a new executor node type named "Result Cache". The planner can include this node type in the plan to have the executor cache the results from the inner side of parameterized nested loop joins. This allows caching of tuples for sets of parameters so that in the event that the node sees the same parameter values again, it can just return the cached tuples instead of rescanning the inner side of the join all over again. Internally, result cache uses a hash table in order to quickly find tuples that have been previously cached. For certain data sets, this can significantly improve the performance of joins. The best cases for using this new node type are for join problems where a large portion of the tuples from the inner side of the join have no join partner on the outer side of the join. In such cases, hash join would have to hash values that are never looked up, thus bloating the hash table and possibly causing it to multi-batch. Merge joins would have to skip over all of the unmatched rows. If we use a nested loop join with a result cache, then we only cache tuples that have at least one join partner on the outer side of the join. The benefits of using a parameterized nested loop with a result cache increase when there are fewer distinct values being looked up and the number of lookups of each value is large. Also, hash probes to lookup the cache can be much faster than the hash probe in a hash join as it's common that the result cache's hash table is much smaller than the hash join's due to result cache only caching useful tuples rather than all tuples from the inner side of the join. This variation in hash probe performance is more significant when the hash join's hash table no longer fits into the CPU's L3 cache, but the result cache's hash table does. The apparent "random" access of hash buckets with each hash probe can cause a poor L3 cache hit ratio for large hash tables. 
Smaller hash tables generally perform better. The hash table used for the cache limits itself to not exceeding work_mem * hash_mem_multiplier in size. We maintain a dlist of keys for this cache and when we're adding new tuples and realize we've exceeded the memory budget, we evict cache entries starting with the least recently used ones until we have enough memory to add the new tuples to the cache. For parameterized nested loop joins, we now consider using one of these result cache nodes in between the nested loop node and its inner node. We determine when this might be useful based on cost, which is primarily driven off of what the expected cache hit ratio will be. Estimating the cache hit ratio relies on having good distinct estimates on the nested loop's parameters. For now, the planner will only consider using a result cache for parameterized nested loop joins. This works for both normal joins and also for LATERAL type joins to subqueries. It is possible to use this new node for other uses in the future. For example, to cache results from correlated subqueries. However, that's not done here due to some difficulties obtaining a distinct estimation on the outer plan to calculate the estimated cache hit ratio. Currently we plan the inner plan before planning the outer plan so there is no good way to know if a result cache would be useful or not since we can't estimate the number of times the subplan will be called until the outer plan is generated. The functionality being added here is newly introducing a dependency on the return value of estimate_num_groups() during the join search. Previously, during the join search, we only ever needed to perform selectivity estimations. With this commit, we need to use estimate_num_groups() in order to estimate what the hit ratio on the result cache will be. In simple terms, if we expect 10 distinct values and we expect 1000 outer rows, then we'll estimate the hit ratio to be 99%. 
Since cache hits are very cheap compared to scanning the underlying nodes on the inner side of the nested loop join, then this will significantly reduce the planner's cost for the join. However, it's fairly easy to see here that things will go bad when estimate_num_groups() incorrectly returns a value that's significantly lower than the actual number of distinct values. If this happens then that may cause us to make use of a nested loop join with a result cache instead of some other join type, such as a merge or hash join. Our distinct estimations have been known to be a source of trouble in the past, so the extra reliance on them here could cause the planner to choose slower plans than it did previous to having this feature. Distinct estimations are also fairly hard to estimate accurately when several tables have been joined already or when a WHERE clause filters out a set of values that are correlated to the expressions we're estimating the number of distinct value for. For now, the costing we perform during query planning for result caches does put quite a bit of faith in the distinct estimations being accurate. When these are accurate then we should generally see faster execution times for plans containing a result cache. However, in the real world, we may find that we need to either change the costings to put less trust in the distinct estimations being accurate or perhaps even disable this feature by default. There's always an element of risk when we teach the query planner to do new tricks that it decides to use that new trick at the wrong time and causes a regression. Users may opt to get the old behavior by turning the feature off using the enable_resultcache GUC. Currently, this is enabled by default. It remains to be seen if we'll maintain that setting for the release. Additionally, the name "Result Cache" is the best name I could think of for this new node at the time I started writing the patch. Nobody seems to strongly dislike the name. 
A few people did suggest other names but no other name seemed to dominate in the brief discussion that there was about names. Let's allow the beta period to see if the current name pleases enough people. If there's some consensus on a better name, then we can change it before the release. Please see the 2nd discussion link below for the discussion on the "Result Cache" name. Author: David Rowley Reviewed-by: Andy Fan, Justin Pryzby, Zhihong Yu Tested-By: Konstantin Knizhnik Discussion: https://postgr.es/m/CAApHDvrPcQyQdWERGYWx8J%2B2DLUNgXu%2BfOSbQ1UscxrunyXyrQ%40mail.gmail.com Discussion: https://postgr.es/m/CAApHDvq=yQXr5kqhRviT2RhNKwToaWr9JAN5t+5_PzhuRJ3wvg@mail.gmail.com	2021-04-01 12:32:22 +13:00
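The hit-ratio arithmetic and the least-recently-used eviction described in the commit above can be sketched as follows. This is only an illustration, not the planner or executor code: `estimated_hit_ratio` and `LRUResultCache` are hypothetical names, the estimate assumes exactly one miss per distinct parameter value and a cache big enough to hold every entry, and entry sizes are supplied by the caller rather than measured.

```python
from collections import OrderedDict


def estimated_hit_ratio(outer_rows: float, ndistinct: float) -> float:
    """Expected fraction of lookups that hit, assuming one miss per distinct value."""
    if outer_rows <= 0:
        return 0.0
    # There cannot be more distinct values than outer rows.
    misses = min(ndistinct, outer_rows)
    return (outer_rows - misses) / outer_rows


class LRUResultCache:
    """Toy cache that evicts least recently used entries to stay within a byte budget."""

    def __init__(self, budget_bytes: int):
        self.budget_bytes = budget_bytes
        self.used_bytes = 0
        self.entries = OrderedDict()  # key -> (rows, size); oldest entry first

    def lookup(self, key):
        if key not in self.entries:
            return None
        self.entries.move_to_end(key)  # mark as most recently used
        return self.entries[key][0]

    def store(self, key, rows, size: int) -> bool:
        if size > self.budget_bytes:
            return False  # can never fit, so leave the cache untouched
        if key in self.entries:
            self.used_bytes -= self.entries.pop(key)[1]
        # Evict from the least recently used end until the new entry fits.
        while self.entries and self.used_bytes + size > self.budget_bytes:
            _, (_, old_size) = self.entries.popitem(last=False)
            self.used_bytes -= old_size
        self.entries[key] = (rows, size)
        self.used_bytes += size
        return True
```

With 1000 outer rows and 10 distinct values the sketch gives 0.99, matching the figure in the commit message. If the true number of distinct values were 500, the same formula would give 0.5, which is the kind of misestimate the message warns about.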
Tom Lane	c545e9524d	Don't prematurely cram a value into a short int. Since `a4d75c86b`, some buildfarm members have been warning that Assert(attnum <= MaxAttrNumber); is useless if attnum is an AttrNumber. I'm not certain how plausible it is that the value coming out of the bitmap could actually exceed MaxAttrNumber, but we seem to have thought that that was possible back in `7300a6995`. Revert the intermediate variable to int so that we have the same overflow protection as before.	2021-03-31 16:45:24 -04:00
Tom Lane	6197db5340	Improve style of some replication-related error messages. Put the remote end's error message into the primary error string, instead of relegating it to errdetail(). Although this could end up being awkward if the remote sends us a really long error message, it seems more in keeping with our message style guidelines, and more helpful in situations where the errdetail could get dropped. Peter Smith Discussion: https://postgr.es/m/CAHut+Ps-Qv2yQceCwobQDP0aJOkfDzRFrOaR6+2Op2K=WHGeWg@mail.gmail.com	2021-03-31 15:25:53 -04:00
Joe Conway	b12bd4869b	Fix has_column_privilege function corner case According to the comments, when an invalid or dropped column oid is passed to has_column_privilege(), the intention has always been to return NULL. However, when the caller had table level privilege the invalid/missing column was never discovered, because table permissions were checked first. Fix that by introducing extended versions of pg_attribute_acl(check\|mask) and pg_class_acl(check\|mask) which take a new argument, is_missing. When is_missing is NULL, the old behavior is preserved. But when is_missing is passed by the caller, no ERROR is thrown for dropped or missing columns/relations, and is_missing is flipped to true. This in turn allows has_column_privilege to check for column privileges first, providing the desired semantics. Not backpatched since it is a user visible behavioral change with no previous complaints, and the fix is a bit on the invasive side. Author: Joe Conway Reviewed-By: Tom Lane Reported by: Ian Barwick Discussion: https://postgr.es/m/flat/9b5f4311-157b-4164-7fe7-077b4fe8ed84%40joeconway.com	2021-03-31 13:55:25 -04:00
Tom Lane	86dc90056d	Rework planning and execution of UPDATE and DELETE. This patch makes two closely related sets of changes: 1. For UPDATE, the subplan of the ModifyTable node now only delivers the new values of the changed columns (i.e., the expressions computed in the query's SET clause) plus row identity information such as CTID. ModifyTable must re-fetch the original tuple to merge in the old values of any unchanged columns. The core advantage of this is that the changed columns are uniform across all tables of an inherited or partitioned target relation, whereas the other columns might not be. A secondary advantage, when the UPDATE involves joins, is that less data needs to pass through the plan tree. The disadvantage of course is an extra fetch of each tuple to be updated. However, that seems to be very nearly free in context; even worst-case tests don't show it to add more than a couple percent to the total query cost. At some point it might be interesting to combine the re-fetch with the tuple access that ModifyTable must do anyway to mark the old tuple dead; but that would require a good deal of refactoring and it seems it wouldn't buy all that much, so this patch doesn't attempt it. 2. For inherited UPDATE/DELETE, instead of generating a separate subplan for each target relation, we now generate a single subplan that is just exactly like a SELECT's plan, then stick ModifyTable on top of that. To let ModifyTable know which target relation a given incoming row refers to, a tableoid junk column is added to the row identity information. This gets rid of the horrid hack that was inheritance_planner(), eliminating O(N^2) planning cost and memory consumption in cases where there were many unprunable target relations. Point 2 of course requires point 1, so that there is a uniform definition of the non-junk columns to be returned by the subplan. 
We can't insist on uniform definition of the row identity junk columns however, if we want to keep the ability to have both plain and foreign tables in a partitioning hierarchy. Since it wouldn't scale very far to have every child table have its own row identity column, this patch includes provisions to merge similar row identity columns into one column of the subplan result. In particular, we can merge the whole-row Vars typically used as row identity by FDWs into one column by pretending they are type RECORD. (It's still okay for the actual composite Datums to be labeled with the table's rowtype OID, though.) There is more that can be done to file down residual inefficiencies in this patch, but it seems to be committable now. FDW authors should note several API changes: * The argument list for AddForeignUpdateTargets() has changed, and so has the method it must use for adding junk columns to the query. Call add_row_identity_var() instead of manipulating the parse tree directly. You might want to reconsider exactly what you're adding, too. * PlanDirectModify() must now work a little harder to find the ForeignScan plan node; if the foreign table is part of a partitioning hierarchy then the ForeignScan might not be the direct child of ModifyTable. See postgres_fdw for sample code. * To check whether a relation is a target relation, it's no longer sufficient to compare its relid to root->parse->resultRelation. Instead, check it against all_result_relids or leaf_result_relids, as appropriate. Amit Langote and Tom Lane Discussion: https://postgr.es/m/CA+HiwqHpHdqdDn48yCEhynnniahH78rwcrv1rEX65-fsZGBOLQ@mail.gmail.com	2021-03-31 11:52:37 -04:00
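The two points of the UPDATE rework above can be illustrated with a small sketch. It is a toy model under stated assumptions, not the executor: tables are plain dictionaries keyed by a made-up table OID and then by CTID, `modify_table_update` is a hypothetical name, and each subplan row carries only the SET-clause values plus the `tableoid` and `ctid` row identity columns.

```python
def modify_table_update(tables, subplan_rows):
    """Apply subplan rows to their target tables and return the number of updated rows.

    tables:       {tableoid: {ctid: {column: value}}}
    subplan_rows: dicts with "tableoid", "ctid" and "set" (the new column values only)
    """
    updated = 0
    for row in subplan_rows:
        # The tableoid junk column says which target relation the row belongs to.
        target = tables[row["tableoid"]]
        # Extra fetch of the old tuple, needed for the unchanged columns.
        old_tuple = target[row["ctid"]]
        # Merge: new values override, everything else is carried over.
        target[row["ctid"]] = {**old_tuple, **row["set"]}
        updated += 1
    return updated
```

Because the subplan only returns the changed columns, the two target tables in a hierarchy may have different column sets and still share one subplan, which is the uniformity the commit message relies on.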
Peter Eisentraut	055fee7eb4	Allow an alias to be attached to a JOIN ... USING This allows something like SELECT ... FROM t1 JOIN t2 USING (a, b, c) AS x where x has the columns a, b, c and unlike a regular alias it does not hide the range variables of the tables being joined t1 and t2. Per SQL:2016 feature F404 "Range variable for common column names". Reviewed-by: Vik Fearing <vik.fearing@2ndquadrant.com> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://www.postgresql.org/message-id/flat/454638cf-d563-ab76-a585-2564428062af@2ndquadrant.com	2021-03-31 17:10:50 +02:00
Etsuro Fujita	27e1f14563	Add support for asynchronous execution. This implements asynchronous execution, which runs multiple parts of a non-parallel-aware Append concurrently rather than serially to improve performance when possible. Currently, the only node type that can be run concurrently is a ForeignScan that is an immediate child of such an Append. In the case where such ForeignScans access data on different remote servers, this would run those ForeignScans concurrently, and overlap the remote operations to be performed simultaneously, so it'll improve the performance especially when the operations involve time-consuming ones such as remote join and remote aggregation. We may extend this to other node types such as joins or aggregates over ForeignScans in the future. This also adds the support for postgres_fdw, which is enabled by the table-level/server-level option "async_capable". The default is false. Robert Haas, Kyotaro Horiguchi, Thomas Munro, and myself. This commit is mostly based on the patch proposed by Robert Haas, but also uses stuff from the patch proposed by Kyotaro Horiguchi and from the patch proposed by Thomas Munro. Reviewed by Kyotaro Horiguchi, Konstantin Knizhnik, Andrey Lepikhov, Movead Li, Thomas Munro, Justin Pryzby, and others. Discussion: https://postgr.es/m/CA%2BTgmoaXQEt4tZ03FtQhnzeDEMzBck%2BLrni0UWHVVgOTnA6C1w%40mail.gmail.com Discussion: https://postgr.es/m/CA%2BhUKGLBRyu0rHrDCMC4%3DRn3252gogyp1SjOgG8SEKKZv%3DFwfQ%40mail.gmail.com Discussion: https://postgr.es/m/20200228.170650.667613673625155850.horikyota.ntt%40gmail.com	2021-03-31 18:45:00 +09:00
Peter Eisentraut	66392d3965	Add p_names field to ParseNamespaceItem ParseNamespaceItem had a wired-in assumption that p_rte->eref describes the table and column aliases exposed by the nsitem. This relaxes this by creating a separate p_names field in an nsitem. This is mainly preparation for a patch for JOIN USING aliases, but it saves one indirection in common code paths, so it's possibly a win on its own. Author: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://www.postgresql.org/message-id/785329.1616455091@sss.pgh.pa.us	2021-03-31 10:52:37 +02:00
Peter Eisentraut	91c5a8caaa	Add errhint_plural() function and make use of it Similar to existing errmsg_plural() and errdetail_plural(). Some errhint() calls hadn't received the proper plural treatment yet.	2021-03-31 09:16:25 +02:00
Noah Misch	0ff8bbdee1	Accept slightly-filled pages for tuples larger than fillfactor. We always inserted a larger-than-fillfactor tuple into a newly-extended page, even when existing pages were empty or contained nothing but an unused line pointer. This was unnecessary relation extension. Start tolerating page usage up to 1/8 the maximum space that could be taken up by line pointers. This is somewhat arbitrary, but it should allow more cases to reuse pages. This has no effect on tables with fillfactor=100 (the default). John Naylor and Floris van Nee. Reviewed by Matthias van de Meent. Reported by Floris van Nee. Discussion: https://postgr.es/m/6e263217180649339720afe2176c50aa@opammb0562.comp.optiver.com	2021-03-30 18:53:44 -07:00
Tom Lane	65158f497a	Remove small inefficiency in ExecARDeleteTriggers/ExecARUpdateTriggers. Whilst poking at nodeModifyTable.c, I chanced to notice that while its calls to ExecBRTriggers and ExecIRTriggers are protected by tests to see if there are any relevant triggers to fire, its calls to ExecARTriggers are not; the latter functions do the equivalent tests themselves. This seems possibly reasonable given the more complex conditions involved, but what's less reasonable is that the ExecAR functions aren't careful to do no work when there is no work to be done. ExecARInsertTriggers gets this right, but the other two will both force creation of a slot that the query may have no use for. ExecARUpdateTriggers additionally performed a usually-useless ExecClearTuple() on that slot. This is probably all pretty microscopic in real workloads, but a cycle shaved is a cycle earned.	2021-03-30 20:01:31 -04:00
Bruce Momjian	5da9868ed9	In messages, use singular nouns for -1, like we do for +1. This outputs "-1 year", not "-1 years". Reported-by: neverov.max@gmail.com Bug: 16939 Discussion: https://postgr.es/m/16939-cceeb03fb72736ee@postgresql.org	2021-03-30 18:34:27 -04:00
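The rule in the commit above amounts to choosing the singular noun whenever the absolute value is 1. A minimal English-only sketch follows; `format_unit` is a hypothetical helper and does not reflect how the server builds its messages.

```python
def format_unit(value: int, singular: str, plural: str) -> str:
    """Return e.g. '-1 year' or '2 years', using the singular for both +1 and -1."""
    noun = singular if abs(value) == 1 else plural
    return f"{value} {noun}"
```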
Stephen Frost	4753ef37e0	Use a WaitLatch for vacuum/autovacuum sleeping Instead of using pg_usleep() in vacuum_delay_point(), use a WaitLatch. This has the advantage that we will realize if the postmaster has been killed since the last time we decided to sleep while vacuuming. Reviewed-by: Thomas Munro Discussion: https://postgr.es/m/CAFh8B=kcdk8k-Y21RfXPu5dX=bgPqJ8TC3p_qxR_ygdBS=JN5w@mail.gmail.com	2021-03-30 12:52:56 -04:00
David Rowley	ed934d4fa3	Allow estimate_num_groups() to pass back further details about the estimation Here we add a new output parameter to estimate_num_groups() to allow it to inform the caller of additional, possibly useful information about the estimation. The new output parameter is a struct that currently contains just a single field with a set of flags. This was done rather than having the flags as an output parameter to allow future fields to be added without having to change the signature of the function at a later date when we want to pass back further information that might not be suitable to store in the flags field. It seems reasonable that one day in the future that the planner would want to know more about the estimation. For example, how many individual sets of statistics was the estimation generated from? The planner may want to take that into account if we ever want to consider risks as well as costs when generating plans. For now, there's only 1 flag we set in the flags field. This is to indicate if the estimation fell back on using the hard-coded constants in any part of the estimation. Callers may like to change their behavior if this is set, and this gives them the ability to do so. Callers may pass the flag pointer as NULL if they have no interest in obtaining any additional information about the estimate. We're not adding any actual usages of these flags here. Some follow-up commits will make use of this feature. Additionally, we're also not making any changes to add support for clauselist_selectivity() and clauselist_selectivity_ext(). However, if this is required in the future then the same struct being added here should be fine to use as a new output argument for those functions too. Author: David Rowley Discussion: https://postgr.es/m/CAApHDvqQqpk=1W-G_ds7A9CsXX3BggWj_7okinzkLVhDubQzjA@mail.gmail.com	2021-03-30 20:52:46 +13:00
David Rowley	efd9d92bb3	Fix compiler warning in unistr function Some compilers are not aware that elog/ereport ERROR does not return.	2021-03-30 20:28:09 +13:00
Amit Kapila	f64ea6dc5c	Add a xid argument to the filter_prepare callback for output plugins. Along with gid, this provides a different way to identify the transaction. The users that use xid in some way to prepare the transactions can use it to filter prepared transactions. The later commands COMMIT PREPARED or ROLLBACK PREPARED carry both identifiers, providing an output plugin the choice of what to use. Author: Markus Wanner Reviewed-by: Vignesh C, Amit Kapila Discussion: https://postgr.es/m/ee280000-7355-c4dc-e47b-2436e7be959c@enterprisedb.com	2021-03-30 10:34:43 +05:30
David Rowley	af527705ed	Adjust design of per-worker parallel seqscan data struct The design of the data structures which allow storage of the per-worker memory during parallel seq scans were not ideal. The work done in `56788d215` required an additional data structure to allow workers to remember the range of pages that had been allocated to them for processing during a parallel seqscan. That commit added a void pointer field to TableScanDescData to allow heapam to store the per-worker allocation information. However putting the field there made very little sense given that we have AM specific structs for that, e.g. HeapScanDescData. Here we remove the void pointer field from TableScanDescData and add a dedicated field for this purpose to HeapScanDescData. Previously we also allocated memory for this parallel per-worker data for all scans, regardless if it was a parallel scan or not. This was just a wasted allocation for non-parallel scans, so here we make the allocation conditional on the scan being parallel. Also, add previously missing pfree() to free the per-worker data in heap_endscan(). Reported-by: Andres Freund Reviewed-by: Andres Freund Discussion: https://postgr.es/m/20210317023101.anvejcfotwka6gaa@alap3.anarazel.de	2021-03-30 10:17:09 +13:00
Andrew Dunstan	6d7a6feac4	Allow matching the DN of a client certificate for authentication Currently we only recognize the Common Name (CN) of a certificate's subject to be matched against the user name. Thus certificates with subjects '/OU=eng/CN=fred' and '/OU=sales/CN=fred' will have the same connection rights. This patch provides an option to match the whole Distinguished Name (DN) instead of just the CN. On any hba line using client certificate identity, there is an option 'clientname' which can have values of 'DN' or 'CN'. The default is 'CN', the current procedure. The DN is matched against the RFC2253 formatted DN, which looks like 'CN=fred,OU=eng'. This facility is probably best used in conjunction with an ident map. Discussion: https://postgr.es/m/92e70110-9273-d93c-5913-0bccb6562740@dunslane.net Reviewed-By: Michael Paquier, Daniel Gustafsson, Jacob Champion	2021-03-29 15:49:39 -04:00
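The difference between matching on CN and on the full DN can be sketched as below. This is a simplified illustration: the helper names are hypothetical, the parser splits on plain commas and ignores the escaping rules of RFC 2253, and the real matching happens inside the server's authentication code.

```python
def parse_dn(dn: str):
    """Split an RFC 2253 style DN such as 'CN=fred,OU=eng' into (attribute, value) pairs.

    Simplification: escaped commas and multi-valued RDNs are not handled.
    """
    pairs = []
    for part in dn.split(","):
        key, _, value = part.strip().partition("=")
        pairs.append((key.upper(), value))
    return pairs


def common_name(dn: str):
    """Return the first CN value of the DN, or None if there is none."""
    for key, value in parse_dn(dn):
        if key == "CN":
            return value
    return None


def identity_for(dn: str, clientname: str = "CN"):
    """Pick the string that would be compared against the user name or ident map."""
    return dn if clientname == "DN" else common_name(dn)
```

With `clientname` left at `CN`, the subjects `CN=fred,OU=eng` and `CN=fred,OU=sales` both yield `fred`; with `DN` they stay distinct, which is why an ident map is the natural companion.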
Peter Eisentraut	f37fec837c	Add unistr function This allows decoding a string with Unicode escape sequences. It is similar to Unicode escape strings, but offers some more flexibility. Author: Pavel Stehule <pavel.stehule@gmail.com> Reviewed-by: Asif Rehman <asifr.rehman@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/CAFj8pRA5GnKT+gDVwbVRH2ep451H_myBt+NTz8RkYUARE9+qOQ@mail.gmail.com	2021-03-29 11:56:53 +02:00
Peter Geoghegan	30aaab26e5	PageAddItemExtended(): Add LP_UNUSED assertion. Assert that LP_UNUSED items have no storage. If it's worth having defensive code in non-assert builds then it's worth having an assertion as well.	2021-03-28 20:10:02 -07:00
David Rowley	f58b230ed0	Cache if PathTarget and RestrictInfos contain volatile functions Here we aim to reduce duplicate work done by contain_volatile_functions() by caching whether PathTargets and RestrictInfos contain any volatile functions the first time contain_volatile_functions() is called for them. Any future calls for these nodes just use the cached value rather than going to the trouble of recursively checking the sub-node all over again. Thanks to Tom Lane for the idea. Any locations in the code which make changes to a PathTarget or RestrictInfo which could change the outcome of the volatility check must change the cached value back to VOLATILITY_UNKNOWN again. contain_volatile_functions() is the only code in charge of setting the cache value to either VOLATILITY_VOLATILE or VOLATILITY_NOVOLATILE. Some existing code does benefit from this additional caching, however, this change is mainly aimed at an upcoming patch that must check for volatility during the join search. Repeated volatility checks in that case can become very expensive when the join search contains more than a few relations. Author: David Rowley Discussion: https://postgr.es/m/3795226.1614059027@sss.pgh.pa.us	2021-03-29 14:55:26 +13:00
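The caching scheme in the commit above is a tri-state memo that any modification must reset. The sketch below borrows the names `VOLATILITY_UNKNOWN`, `VOLATILITY_VOLATILE`, `VOLATILITY_NOVOLATILE` and `contain_volatile_functions()` from the message; the `Expr` and `RestrictInfo` classes, the set of volatile function names and the `STATS` counter are assumptions made for illustration only.

```python
from enum import Enum


class Volatility(Enum):
    UNKNOWN = 0      # VOLATILITY_UNKNOWN: not checked yet, or invalidated
    VOLATILE = 1     # VOLATILITY_VOLATILE
    NOVOLATILE = 2   # VOLATILITY_NOVOLATILE


VOLATILE_FUNCS = {"random", "clock_timestamp"}  # illustrative list only
STATS = {"walks": 0}  # counts visited nodes, to make the saving observable


class Expr:
    def __init__(self, func, args=()):
        self.func = func
        self.args = tuple(args)


class RestrictInfo:
    def __init__(self, clause):
        self.clause = clause
        self.has_volatile = Volatility.UNKNOWN

    def set_clause(self, clause):
        # Any change that could alter the outcome must reset the cached value.
        self.clause = clause
        self.has_volatile = Volatility.UNKNOWN


def _walk(expr):
    """Recursive check of the expression tree (the expensive part)."""
    STATS["walks"] += 1
    if expr.func in VOLATILE_FUNCS:
        return True
    return any(_walk(arg) for arg in expr.args)


def contain_volatile_functions(rinfo):
    # Only this function moves the cache out of the UNKNOWN state.
    if rinfo.has_volatile is Volatility.UNKNOWN:
        found = _walk(rinfo.clause)
        rinfo.has_volatile = Volatility.VOLATILE if found else Volatility.NOVOLATILE
    return rinfo.has_volatile is Volatility.VOLATILE
```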
Peter Eisentraut	8df2f37114	Improve consistency of SQL code capitalization	2021-03-27 10:17:12 +01:00
Tomas Vondra	a4d75c86bf	Extended statistics on expressions Allow defining extended statistics on expressions, not just on simple column references. With this commit, expressions are supported by all existing extended statistics kinds, improving the same types of estimates. A simple example may look like this: CREATE TABLE t (a int); CREATE STATISTICS s ON mod(a,10), mod(a,20) FROM t; ANALYZE t; The collected statistics are useful e.g. to estimate queries with those expressions in WHERE or GROUP BY clauses: SELECT * FROM t WHERE mod(a,10) = 0 AND mod(a,20) = 0; SELECT 1 FROM t GROUP BY mod(a,10), mod(a,20); This introduces new internal statistics kind 'e' (expressions) which is built automatically when the statistics object definition includes any expressions. This represents single-expression statistics, as if there was an expression index (but without the index maintenance overhead). The statistics are stored in pg_statistics_ext_data as an array of composite types, which is possible thanks to `79f6a942bd`. CREATE STATISTICS allows building statistics on a single expression, in which case it's not possible to specify statistics kinds. A new system view pg_stats_ext_exprs can be used to display expression statistics, similarly to pg_stats and pg_stats_ext views. ALTER TABLE ... ALTER COLUMN ... TYPE now treats extended statistics the same way it treats indexes, i.e. it drops and recreates the statistics. This means all statistics are reset, and we no longer try to preserve at least the functional dependencies. This should not be a major issue in practice, as the functional dependencies actually rely on per-column statistics, which were always reset anyway. Author: Tomas Vondra Reviewed-by: Justin Pryzby, Dean Rasheed, Zhihong Yu Discussion: https://postgr.es/m/ad7891d2-e90c-b446-9fe2-7419143847d7%40enterprisedb.com	2021-03-27 00:01:11 +01:00
Tomas Vondra	33e52ad9a3	Fix ndistinct estimates with system attributes When estimating the number of groups using extended statistics, the code was discarding information about system attributes. This led to strange situation that SELECT 1 FROM t GROUP BY ctid; could have produced higher estimate (equal to pg_class.reltuples) than SELECT 1 FROM t GROUP BY a, b, ctid; with extended statistics on (a,b). Fixed by retaining information about the system attribute. Backpatch all the way to 10, where extended statistics were introduced. Author: Tomas Vondra Backpatch-through: 10	2021-03-26 22:34:58 +01:00
Noah Misch	a14a0118a1	Add "pg_database_owner" default role. Membership consists, implicitly, of the current database owner. Expect use in template databases. Once pg_database_owner has rights within a template, each owner of a database instantiated from that template will exercise those rights. Reviewed by John Naylor. Discussion: https://postgr.es/m/20201228043148.GA1053024@rfd.leadboat.com	2021-03-26 10:42:17 -07:00
Noah Misch	f687bf61ed	Merge similar algorithms into roles_is_member_of(). The next commit would have complicated two or three algorithms, so take this opportunity to consolidate. No functional changes. Reviewed by John Naylor. Discussion: https://postgr.es/m/20201228043148.GA1053024@rfd.leadboat.com	2021-03-26 10:42:16 -07:00
Tomas Vondra	73b96bad4a	Fix alignment in BRIN minmax-multi deserialization The deserialization failed to ensure correct alignment, as it assumed it can simply point into the serialized value. The serialization however ignores alignment and copies just the significant bytes in order to make the result as small as possible. This caused failures on systems that are sensitive to misaligned addresses, like sparc, or with address sanitizer enabled. Fixed by copying the serialized data to ensure proper alignment. While at it, fix an issue with serialization on big endian machines, using the same store_att_byval/fetch_att trick as extended statistics. Discussion: https://postgr.es/0c8c3304-d3dd-5e29-d5ac-b50589a23c8c%40enterprisedb.com	2021-03-26 16:48:36 +01:00
Tomas Vondra	ab596105b5	BRIN minmax-multi indexes Adds BRIN opclasses similar to the existing minmax, except that instead of summarizing the page range into a single [min,max] range, the summary consists of multiple ranges and/or points, allowing gaps. This allows more efficient handling of data with poor correlation to physical location within the table and/or outlier values, for which the regular minmax opclasses tend to work poorly. It's possible to specify the number of values kept for each page range, either as a single point or an interval boundary. CREATE TABLE t (a int); CREATE INDEX ON t USING brin (a int4_minmax_multi_ops(values_per_range=16)); When building the summary, the values are combined into intervals with the goal to minimize the "covering" (sum of interval lengths), using a support procedure computing distance between two values. Bump catversion, due to various catalog changes. Author: Tomas Vondra <tomas.vondra@postgresql.org> Reviewed-by: Alvaro Herrera <alvherre@alvh.no-ip.org> Reviewed-by: Alexander Korotkov <aekorotkov@gmail.com> Reviewed-by: Sokolov Yura <y.sokolov@postgrespro.ru> Reviewed-by: John Naylor <john.naylor@enterprisedb.com> Discussion: https://postgr.es/m/c1138ead-7668-f0e1-0638-c3be3237e812@2ndquadrant.com Discussion: https://postgr.es/m/5d78b774-7e9c-c94e-12cf-fef51cc89b1a%402ndquadrant.com	2021-03-26 13:54:30 +01:00
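The idea of combining values into a few intervals while keeping the covering small can be sketched with a greedy merge of the closest neighbours. This is an assumption-laden simplification rather than the opclass algorithm: the function names are hypothetical, the limit is expressed as a number of ranges (the real `values_per_range` counts stored values), and plain subtraction stands in for the distance support procedure.

```python
def summarize(values, max_ranges):
    """Reduce values to at most max_ranges closed intervals, merging the smallest gaps first."""
    if max_ranges < 1:
        raise ValueError("max_ranges must be at least 1")
    # Start with every distinct value as a single-point range.
    ranges = [(v, v) for v in sorted(set(values))]
    while len(ranges) > max_ranges:
        # Distance between each range and its right-hand neighbour.
        gaps = [ranges[i + 1][0] - ranges[i][1] for i in range(len(ranges) - 1)]
        i = gaps.index(min(gaps))
        # Merging the closest pair adds the least to the total covered length.
        ranges[i:i + 2] = [(ranges[i][0], ranges[i + 1][1])]
    return ranges


def may_contain(ranges, value):
    """Summary check: False means the page range can be skipped for this value."""
    return any(lo <= value <= hi for lo, hi in ranges)
```

For the values 1, 2, 3, 1000, 1001 and 500000 with three ranges, the sketch keeps `(1, 3)`, `(1000, 1001)` and `(500000, 500000)`, so a lookup for 700 can skip the page range, whereas a single [min,max] summary of `(1, 500000)` could not.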
Tomas Vondra	77b88cd1bb	BRIN bloom indexes Adds a BRIN opclass using a Bloom filter to summarize the range. Indexes using the new opclasses allow only equality queries (similar to hash indexes), but that works fine for data like UUID, MAC addresses etc. for which range queries are not very common. This also means the indexes work for data that is not well correlated to physical location within the table, or perhaps even entirely random (which is a common issue with existing BRIN minmax opclasses). It's possible to specify opclass parameters with the usual Bloom filter parameters, i.e. the desired false-positive rate and the expected number of distinct values per page range. CREATE TABLE t (a int); CREATE INDEX ON t USING brin (a int4_bloom_ops(false_positive_rate = 0.05, n_distinct_per_range = 100)); The opclasses do not operate on the indexed values directly, but compute a 32-bit hash first, and the Bloom filter is built on the hash value. Collisions should not be a huge issue though, as the number of distinct values in a page range is usually fairly small. Bump catversion, due to various catalog changes. Author: Tomas Vondra <tomas.vondra@postgresql.org> Reviewed-by: Alvaro Herrera <alvherre@alvh.no-ip.org> Reviewed-by: Alexander Korotkov <aekorotkov@gmail.com> Reviewed-by: Sokolov Yura <y.sokolov@postgrespro.ru> Reviewed-by: Nico Williams <nico@cryptonector.com> Reviewed-by: John Naylor <john.naylor@enterprisedb.com> Discussion: https://postgr.es/m/c1138ead-7668-f0e1-0638-c3be3237e812@2ndquadrant.com Discussion: https://postgr.es/m/5d78b774-7e9c-c94e-12cf-fef51cc89b1a%402ndquadrant.com	2021-03-26 13:35:32 +01:00
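How the two opclass parameters shape a Bloom filter can be sketched with the textbook sizing formulas. Everything here is an assumption for illustration: the class and function names are hypothetical, `zlib.crc32` stands in for the 32-bit hash mentioned in the commit, the probe positions use simple double hashing, and the index's actual sizing and hashing may differ.

```python
import math
import zlib


def bloom_parameters(n_distinct: int, false_positive_rate: float):
    """Textbook sizing: number of bits m and number of hash probes k."""
    m = math.ceil(-n_distinct * math.log(false_positive_rate) / (math.log(2) ** 2))
    k = max(1, round(m / n_distinct * math.log(2)))
    return m, k


class RangeBloom:
    """Toy Bloom filter summarizing the values of one page range."""

    def __init__(self, n_distinct_per_range: int = 100, false_positive_rate: float = 0.05):
        self.nbits, self.nhashes = bloom_parameters(n_distinct_per_range, false_positive_rate)
        self.bits = 0  # a Python int used as a bitset

    def _positions(self, value):
        data = repr(value).encode()
        h1 = zlib.crc32(data)                    # stand-in 32-bit hash of the value
        h2 = zlib.crc32(data, 0x9E3779B9) | 1    # second hash, forced odd for double hashing
        return [(h1 + i * h2) % self.nbits for i in range(self.nhashes)]

    def add(self, value):
        for pos in self._positions(value):
            self.bits |= 1 << pos

    def might_contain(self, value) -> bool:
        # False is definite; True may be a false positive.
        return all((self.bits >> pos) & 1 for pos in self._positions(value))
```

For the parameters in the example index definition (100 distinct values, 5% false positives) the formulas give 624 bits and 4 probes. Only equality can be answered this way, since hashing destroys ordering.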
Tomas Vondra	a681e3c107	Support the old signature of BRIN consistent function Commit `a1c649d889` changed the signature of the BRIN consistent function by adding a new required parameter. Treating the parameter as optional, which would make the change backward compatible, was rejected with the justification that there are few out-of-core extensions, so it's not worth making the code more complex, and it's better to deal with that in the extension. But after further thought, that would be rather problematic, because pg_upgrade simply dumps catalog contents and the same version of an extension needs to work on both PostgreSQL versions. Supporting both variants of the consistent function (with 3 or 4 arguments) makes that possible. The signature is not the only thing that changed, as commit `72ccf55cb9` moved handling of IS [NOT] NULL keys from the support procedures. But this change is backward compatible - handling the keys in the extension is unnecessary, but harmless. The consistent function will do a bit of unnecessary work, but it should be very cheap. This also undoes most of the changes to the existing opclasses (minmax and inclusion), making them use the old signature again. This should make backpatching simpler. Catversion bump, because of changes in pg_amproc. Author: Tomas Vondra <tomas.vondra@postgresql.org> Author: Nikita Glukhov <n.gluhov@postgrespro.ru> Reviewed-by: Mark Dilger <hornschnorter@gmail.com> Reviewed-by: Alexander Korotkov <aekorotkov@gmail.com> Reviewed-by: Masahiko Sawada <masahiko.sawada@enterprisedb.com> Reviewed-by: John Naylor <john.naylor@enterprisedb.com> Discussion: https://postgr.es/m/c1138ead-7668-f0e1-0638-c3be3237e812@2ndquadrant.com	2021-03-26 13:17:58 +01:00
Robert Haas	5db1fd7823	Fix interaction of TOAST compression with expression indexes. Before, trying to compress a value for insertion into an expression index would crash. Dilip Kumar, with some editing by me. Report by Jaime Casanova. Discussion: http://postgr.es/m/CAJKUy5gcs0zGOp6JXU2mMVdthYhuQpFk=S3V8DOKT=LZC1L36Q@mail.gmail.com	2021-03-25 19:55:32 -04:00
Alvaro Herrera	71f4c8c6f7	ALTER TABLE ... DETACH PARTITION ... CONCURRENTLY Allow a partition to be detached from its partitioned table without blocking concurrent queries, by running in two transactions and only requiring ShareUpdateExclusive in the partitioned table. Because it runs in two transactions, it cannot be used in a transaction block. This is the main reason to use dedicated syntax: so that users can choose to use the original mode if they need it. But also, it doesn't work when a default partition exists (because an exclusive lock would still need to be obtained on it, in order to change its partition constraint.) In case the second transaction is cancelled or a crash occurs, there's ALTER TABLE .. DETACH PARTITION .. FINALIZE, which executes the final steps. The main trick to make this work is the addition of column pg_inherits.inhdetachpending, initially false; can only be set true in the first part of this command. Once that is committed, concurrent transactions that use a PartitionDirectory will include or ignore partitions so marked: in optimizer they are ignored if the row is marked committed for the snapshot; in executor they are always included. As a result, and because of the way PartitionDirectory caches partition descriptors, queries that were planned before the detach will see the rows in the detached partition and queries that are planned after the detach, won't. A CHECK constraint is created that duplicates the partition constraint. This is probably not strictly necessary, and some users will prefer to remove it afterwards, but if the partition is re-attached to a partitioned table, the constraint needn't be rechecked. Author: Álvaro Herrera <alvherre@alvh.no-ip.org> Reviewed-by: Amit Langote <amitlangote09@gmail.com> Reviewed-by: Justin Pryzby <pryzby@telsasoft.com> Discussion: https://postgr.es/m/20200803234854.GA24158@alvherre.pgsql	2021-03-25 18:00:28 -03:00
Alvaro Herrera	cc121d5596	Add comments for AlteredTableInfo->rel The prior commit which introduced it was pretty squalid in terms of code documentation, so add some comments.	2021-03-25 16:07:15 -03:00
Alvaro Herrera	cd03c6e94b	Let ALTER TABLE Phase 2 routines manage the relation pointer Struct AlteredTableInfo gains a new Relation member, to be used only by Phase 2 (ATRewriteCatalogs); this allows ATExecCmd() subroutines to open and close the relation internally. A future commit will use this facility to implement an ALTER TABLE subcommand that closes and reopens the relation across transaction boundaries. (It is possible to keep the relation open past phase 2 to be used by phase 3 instead of having to reopen it at that point, but there are some minor complications with that; it's not clear that there is much to be won from doing that, though.) Author: Álvaro Herrera <alvherre@alvh.no-ip.org> Discussion: https://postgr.es/m/20200803234854.GA24158@alvherre.pgsql	2021-03-25 15:56:11 -03:00
Alvaro Herrera	a24ae3d7b9	Remove StoreSingleInheritance reimplementation I introduced this duplicate code in commit `8b08f7d482` for no good reason. Remove it, and backpatch to 11 where it was introduced. Author: Álvaro Herrera <alvherre@alvh.no-ip.org>	2021-03-25 10:47:38 -03:00
Peter Eisentraut	f2c7ce64ae	Trim some extra whitespace in parser file	2021-03-25 10:17:52 +01:00
Peter Eisentraut	91d1f2d302	Rename a parse node to be more general A WHERE clause will be used for row filtering in logical replication. We already have a similar node: 'WHERE (condition here)'. Let's rename the node to a generic name and use it for row filtering too. Author: Euler Taveira <euler.taveira@enterprisedb.com> Discussion: https://www.postgresql.org/message-id/flat/CAHE3wggb715X+mK_DitLXF25B=jE6xyNCH4YOwM860JR7HarGQ@mail.gmail.com	2021-03-25 10:06:32 +01:00
Michael Paquier	a1999a01bb	Sanitize the term "combo CID" in code comments Combo CIDs were referred in the code comments using different terms across various places of the code, so unify a bit the term used with what is currently in use in some of the READMEs. Author: "Hou, Zhijie" Discussion: https://postgr.es/m/1d42865c91404f46af4562532fdbea31@G08CNEXMBPEKD05.g08.fujitsu.local	2021-03-25 16:08:03 +09:00
Fujii Masao	438fc4a39c	Fix bug in WAL replay of COMMIT_TS_SETTS record. Previously the WAL replay of COMMIT_TS_SETTS record called TransactionTreeSetCommitTsData() with the argument write_xlog=true, which generated and wrote new COMMIT_TS_SETTS record. This should not be acceptable because it's during recovery. This commit fixes the WAL replay of COMMIT_TS_SETTS record so that it calls TransactionTreeSetCommitTsData() with write_xlog=false and doesn't generate new WAL during recovery. Back-patch to all supported branches. Reported-by: lx zou <zoulx1982@163.com> Author: Fujii Masao Reviewed-by: Alvaro Herrera Discussion: https://postgr.es/m/16931-620d0f2fdc6108f1@postgresql.org	2021-03-25 11:23:30 +09:00
Fujii Masao	df9384492b	Improve connection denied error message during recovery. Previously when an archive recovery or a standby was starting and reached the consistent recovery state but hot_standby was configured to off, the error message when a client connected was "the database system is starting up", which was needlessly confusing and not really all that accurate either. This commit improves the connection denied error message during recovery, as follows, so that the users immediately know that their servers are configured to deny those connections. - If hot_standby is disabled, the error message "the database system is not accepting connections" and the detail message "Hot standby mode is disabled." are output when clients connect while an archive recovery or a standby is running. - If hot_standby is enabled, the error message "the database system is not yet accepting connections" and the detail message "Consistent recovery state has not been yet reached." are output when clients connect until the consistent recovery state is reached and postmaster starts accepting read only connections. This commit doesn't change the connection denied error message of "the database system is starting up" during normal server startup and crash recovery. Because it's still suitable for those situations. Author: James Coleman Reviewed-by: Alvaro Herrera, Andres Freund, David Zhang, Tom Lane, Fujii Masao Discussion: https://postgr.es/m/CAAaqYe8h5ES_B=F_zDT+Nj9XU7YEwNhKhHA2RE4CFhAQ93hfig@mail.gmail.com	2021-03-25 10:41:28 +09:00
Peter Eisentraut	37c99d304d	Fix stray double semicolons Reported-by: John Naylor <john.naylor@enterprisedb.com>	2021-03-24 20:42:51 +01:00
Stephen Frost	bbcc4eb2e0	Change checkpoint_completion_target default to 0.9 Common recommendations are that the checkpoint should be spread out as much as possible, provided we avoid having it take too long. This change updates the default to 0.9 (from 0.5) to match that recommendation. There was some debate about possibly removing the option entirely but it seems there may be some corner-cases where having it set much lower to try to force the checkpoint to be as fast as possible could result in fewer periods of time of reduced performance due to kernel flushing. General agreement is that the "spread more" is the preferred approach though and those who need to tune away from that value are much less common. Reviewed-By: Michael Paquier, Peter Eisentraut, Tom Lane, David Steele, Nathan Bossart Discussion: https://postgr.es/m/20201207175329.GM16415%40tamriel.snowman.net	2021-03-24 13:07:51 -04:00
Robert Haas	e5595de03e	Tidy up more loose ends related to configurable TOAST compression. Change the default_toast_compression GUC to be an enum rather than a string. Earlier, uncommitted versions of the patch supported using CREATE ACCESS METHOD to add new compression methods to a running system, but that idea was dropped before commit. So, we can simplify the GUC handling as well, which has the nice side effect of improving the error messages. While updating the documentation to reflect the new GUC type, also move it back to the right place in the list. I moved this while revising what became commit `24f0e395ac`, but apparently the intended ordering is "alphabetical" rather than "whatever Robert thinks looks nice." Rejigger things to avoid having access/toast_compression.h depend on utils/guc.h, so that we don't end up with every file that includes it also depending on something largely unrelated. Move a few inline functions back into the C source file partly to help reduce dependencies and partly just to avoid clutter. A few very minor cosmetic fixes. Original patch by Justin Pryzby, but very heavily edited by me, and reverse reviewed by him and also reviewed by Tom Lane. Discussion: http://postgr.es/m/CA+TgmoYp=GT_ztUCeZg2i4hkHAQv8o=-nVJ1-TKWTG1zQOmOpg@mail.gmail.com	2021-03-24 12:36:08 -04:00
Peter Eisentraut	49ab61f0bd	Add date_bin function Similar to date_trunc, but allows binning by an arbitrary interval rather than just full units. Author: John Naylor <john.naylor@enterprisedb.com> Reviewed-by: David Fetter <david@fetter.org> Reviewed-by: Isaac Morland <isaac.morland@gmail.com> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Reviewed-by: Artur Zakirov <zaartur@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/CACPNZCt4buQFRgy6DyjuZS-2aPDpccRkrJBmgUfwYc1KiaXYxg@mail.gmail.com	2021-03-24 16:18:24 +01:00
Peter Eisentraut	1509c6fc29	Improve an error message Make it the same as another nearby message.	2021-03-24 08:02:06 +01:00
Amit Kapila	26acb54a13	Revert "Enable parallel SELECT for "INSERT INTO ... SELECT ..."." To allow inserts in parallel-mode this feature has to ensure that all the constraints, triggers, etc. are parallel-safe for the partition hierarchy which is costly and we need to find a better way to do that. Additionally, we could have used existing cached information in some cases like indexes, domains, etc. to determine the parallel-safety. List of commits reverted, in reverse chronological order: `ed62d3737c` Doc: Update description for parallel insert reloption. `c8f78b6161` Add a new GUC and a reloption to enable inserts in parallel-mode. `c5be48f092` Improve FK trigger parallel-safety check added by `05c8482f7f`. `e2cda3c20a` Fix use of relcache TriggerDesc field introduced by commit `05c8482f7f`. `e4e87a32cc` Fix valgrind issue in commit `05c8482f7f`. `05c8482f7f` Enable parallel SELECT for "INSERT INTO ... SELECT ...". Discussion: https://postgr.es/m/E1lMiB9-0001c3-SY@gemulon.postgresql.org	2021-03-24 11:29:15 +05:30
Fujii Masao	84007043fc	Rename wait event WalrcvExit to WalReceiverExit. Commit `de829ddf23` added wait event WalrcvExit. But its name is not consistent with other wait events like WalReceiverMain or WalReceiverWaitStart, etc. So this commit renames WalrcvExit to WalReceiverExit. Author: Fujii Masao Reviewed-by: Thomas Munro Discussion: https://postgr.es/m/cced9995-8fa2-7b22-9d91-3f22a2b8c23c@oss.nttdata.com	2021-03-24 10:37:54 +09:00
Fujii Masao	7fbcee1b2d	Log when GetNewOidWithIndex() fails to find unused OID many times. GetNewOidWithIndex() generates a new OID one by one until it finds one not in the relation. If there are very long runs of consecutive existing OIDs, GetNewOidWithIndex() needs to iterate many times in the loop to find unused OID. Since TOAST table can have a large number of entries and there can be such long runs of OIDs, there is the case where it takes so many iterations to find new OID not in TOAST table. Furthermore if all (i.e., 2^32) OIDs are already used, GetNewOidWithIndex() enters something like busy loop and repeats the iterations until at least one OID is marked as unused. There are some reported troubles caused by a large number of iterations in GetNewOidWithIndex(). For example, when inserting a billion records into the table, all the backends doing that insertion operation hung with 100% CPU usage at some point. Previously there was no easy way to detect that GetNewOidWithIndex() failed to find unused OID many times. So, for example, gdb full backtrace of hung backends needed to be taken, in order to investigate that trouble. This is inconvenient and may not be available in some production environments. To provide easy way for that, this commit makes GetNewOidWithIndex() log that it iterates more than GETNEWOID_LOG_THRESHOLD but has not yet found OID unused in the relation. Also this commit makes it repeat logging with exponentially increasing intervals until it iterates more than GETNEWOID_LOG_MAX_INTERVAL, and makes it finally repeat logging every GETNEWOID_LOG_MAX_INTERVAL unless an unused OID is found. Those macro variables are used not to fill up the server log with the similar messages. In the discussion at pgsql-hackers, there was another idea to report the lots of iterations in GetNewOidWithIndex() via wait event. 
But since GetNewOidWithIndex() traverses indexes to find unused OID and which will do I/O, acquire locks, etc, which will overwrite the wait event and reset it to nothing once done. So that idea doesn't work well, and we didn't adopt it. Author: Tomohiro Hiramitsu Reviewed-by: Tatsuhito Kasahara, Kyotaro Horiguchi, Tom Lane, Fujii Masao Discussion: https://postgr.es/m/16722-93043fb459a41073@postgresql.org	2021-03-24 10:36:56 +09:00
Michael Paquier	99dd75fb99	Reword slightly logs generated for index stats in autovacuum Using "remain" is confusing, as it implies that the index file can shrink. Instead, use "in total". Per discussion with Peter Geoghegan. Discussion: https://postgr.es/m/CAH2-WzkYgHZzpGOwR14CScJsjaQpvJrEkEfkh_=wGhzLb=yVdQ@mail.gmail.com	2021-03-24 09:36:03 +09:00
Tomas Vondra	79f6a942bd	Allow composite types in catalog bootstrap When resolving types during catalog bootstrap, try to reload the pg_type contents if a type is not found. That allows catalogs to contain composite types, e.g. row types for other catalogs. Author: Justin Pryzby Reviewed-by: Dean Rasheed, Tomas Vondra Discussion: https://postgr.es/m/ad7891d2-e90c-b446-9fe2-7419143847d7%40enterprisedb.com	2021-03-24 00:47:52 +01:00
Tomas Vondra	e1a5e65703	Convert Typ from array to list in bootstrap It's a bit easier and more convenient to free and reload a List, compared to a plain array. This will be helpful when allowing catalogs to contain composite types. Author: Justin Pryzby Reviewed-by: Dean Rasheed, Tomas Vondra Discussion: https://postgr.es/m/ad7891d2-e90c-b446-9fe2-7419143847d7%40enterprisedb.com	2021-03-24 00:47:40 +01:00
Peter Geoghegan	5b861baa55	nbtree VACUUM: Cope with buggy opclasses. Teach nbtree VACUUM to press on with vacuuming in the event of a page deletion attempt that fails to "re-find" a downlink for its child/target page. There is no good reason to treat this as an irrecoverable error. But there is a good reason not to: pressing on at this point removes any question of VACUUM not making progress solely due to misbehavior from user-defined operator class code. Discussion: https://postgr.es/m/CAH2-Wzma5G9CTtMjbrXTwOym+U=aWg-R7=-htySuztgoJLvZXg@mail.gmail.com	2021-03-23 16:09:51 -07:00
Tom Lane	9d523119fd	Avoid possible crash while finishing up a heap rewrite. end_heap_rewrite was not careful to ensure that the target relation is open at the smgr level before performing its final smgrimmedsync. In ordinary cases this is no problem, because it would have been opened earlier during the rewrite. However a crash can be reproduced by re-clustering an empty table with CLOBBER_CACHE_ALWAYS enabled. Although that exact scenario does not crash in v13, I think that's a chance result of unrelated planner changes, and the problem is likely still reachable with other test cases. The true proximate cause of this failure is commit `c6b92041d`, which replaced a call to heap_sync (which was careful about opening smgr) with a direct call to smgrimmedsync. Hence, back-patch to v13. Amul Sul, per report from Neha Sharma; cosmetic changes and test case by me. Discussion: https://postgr.es/m/CANiYTQsU7yMFpQYnv=BrcRVqK_3U3mtAzAsJCaqtzsDHfsUbdQ@mail.gmail.com	2021-03-23 11:24:16 -04:00
Peter Eisentraut	a6715af1e7	Add bit_count SQL function This function for bit and bytea counts the set bits in the bit or byte string. Internally, we use the existing popcount functionality. For the name, after some discussion, we settled on bit_count, which also exists with this meaning in MySQL, Java, and Python. Author: David Fetter <david@fetter.org> Discussion: https://www.postgresql.org/message-id/flat/20201230105535.GJ13234@fetter.org	2021-03-23 10:13:58 +01:00
Michael Paquier	5aed6a1fc2	Add per-index stats information in verbose logs of autovacuum Once a relation's autovacuum is completed, the logs include more information about this relation state if the threshold of log_autovacuum_min_duration (or its relation option) is reached, with for example contents about the statistics of the VACUUM operation for the relation, WAL and system usage. This commit adds more information about the statistics of the relation's indexes, with one line of logs generated for each index. The index stats were already calculated, but not printed in the context of autovacuum yet. While on it, some refactoring is done to keep track of the index statistics directly within LVRelStats, simplifying some routines related to parallel VACUUMs. Author: Masahiko Sawada Reviewed-by: Michael Paquier, Euler Taveira Discussion: https://postgr.es/m/CAD21AoAy6SxHiTivh5yAPJSUE4S=QRPpSZUdafOSz0R+fRcM6Q@mail.gmail.com	2021-03-23 13:25:14 +09:00
Amit Kapila	4b82ed6eca	Fix dangling pointer reference in stream_cleanup_files. We can't access the entry after it is removed from dynahash. Author: Peter Smith Discussion: https://postgr.es/m/CAHut+Ps-pL++f6CJwPx2+vUqXuew=Xt-9Bi-6kCyxn+Fwi2M7w@mail.gmail.com	2021-03-23 09:43:33 +05:30
Tomas Vondra	a5f002ad9a	Use correct spelling of statistics kind A couple error messages and comments used 'statistic kind', not the correct 'statistics kind'. Fix and backpatch all the way back to 10, where extended statistics were introduced. Backpatch-through: 10	2021-03-23 05:01:35 +01:00
Fujii Masao	1e3e8b51bd	Change the type of WalReceiverWaitStart wait event from Client to IPC. Previously the type of this wait event was Client. But while this wait event is being reported, walreceiver process is waiting for the startup process to set initial data for streaming replication. It's not waiting for any activity on a socket connected to a user application or walsender. So this commit changes the type for WalReceiverWaitStart wait event to IPC. Author: Fujii Masao Reviewed-by: Kyotaro Horiguchi Discussion: https://postgr.es/m/cdacc27c-37ff-f1a4-20e2-ce19933abfcc@oss.nttdata.com	2021-03-23 10:09:42 +09:00
Bruce Momjian	95d77149c5	Add macro RelationIsPermanent() to report relation permanence Previously, to check relation permanence, the Relation's Form_pg_class structure member relpersistence was compared to the value RELPERSISTENCE_PERMANENT ("p"). This commit adds the macro RelationIsPermanent() and uses it in appropriate places to simplify the code. This matches other RelationIs* macros. This macro will be used in more places in future cluster file encryption patches. Discussion: https://postgr.es/m/20210318153134.GH20766@tamriel.snowman.net	2021-03-22 20:23:52 -04:00
Tomas Vondra	8e4b332e88	Optimize allocations in bringetbitmap The bringetbitmap function allocates memory for various purposes, which may be quite expensive, depending on the number of scan keys. Instead of allocating them separately, allocate one big chunk of memory and carve it into smaller pieces as needed - all the pieces have the same lifespan, and it saves quite a bit of CPU and memory overhead. Author: Tomas Vondra <tomas.vondra@postgresql.org> Reviewed-by: Alvaro Herrera <alvherre@alvh.no-ip.org> Reviewed-by: Mark Dilger <hornschnorter@gmail.com> Reviewed-by: Alexander Korotkov <aekorotkov@gmail.com> Reviewed-by: Masahiko Sawada <masahiko.sawada@enterprisedb.com> Reviewed-by: John Naylor <john.naylor@enterprisedb.com> Discussion: https://postgr.es/m/c1138ead-7668-f0e1-0638-c3be3237e812@2ndquadrant.com	2021-03-23 00:47:09 +01:00
Tomas Vondra	72ccf55cb9	Move IS [NOT] NULL handling from BRIN support functions The handling of IS [NOT] NULL clauses is independent of an opclass, and most of the code was exactly the same in both minmax and inclusion. So instead move the code from support procedures to the AM. This simplifies the code - especially the support procedures - quite a bit, as they don't need to care about NULL values and flags at all. It also means the IS [NOT] NULL clauses can be evaluated without invoking the support procedure. Author: Tomas Vondra <tomas.vondra@postgresql.org> Author: Nikita Glukhov <n.gluhov@postgrespro.ru> Reviewed-by: Nikita Glukhov <n.gluhov@postgrespro.ru> Reviewed-by: Mark Dilger <hornschnorter@gmail.com> Reviewed-by: Alexander Korotkov <aekorotkov@gmail.com> Reviewed-by: Masahiko Sawada <masahiko.sawada@enterprisedb.com> Reviewed-by: John Naylor <john.naylor@enterprisedb.com> Discussion: https://postgr.es/m/c1138ead-7668-f0e1-0638-c3be3237e812@2ndquadrant.com	2021-03-23 00:45:42 +01:00
Tomas Vondra	a1c649d889	Pass all scan keys to BRIN consistent function at once This commit changes how we pass scan keys to BRIN consistent function. Instead of passing them one by one, we now pass all scan keys for a given attribute at once. That makes the consistent function a bit more complex, as it has to loop through the keys, but it does allow more elaborate opclasses that can use multiple keys to eliminate ranges much more effectively. The existing BRIN opclasses (minmax, inclusion) don't really benefit from this change. The primary purpose is to allow future opclasses to benefit from seeing all keys at once. This does change the BRIN API, because the signature of the consistent function changes (a new parameter with number of scan keys). So this breaks existing opclasses, and will require supporting two variants of the code for different PostgreSQL versions. We've considered supporting two variants of the consistent function, but we've decided not to do that. Firstly, there's another patch that moves handling of NULL values from the opclass, which means the opclasses need to be updated anyway. Secondly, we're not aware of any out-of-core BRIN opclasses, so it does not seem worth the extra complexity. Bump catversion, because of pg_proc changes. Author: Tomas Vondra <tomas.vondra@postgresql.org> Reviewed-by: Alvaro Herrera <alvherre@alvh.no-ip.org> Reviewed-by: Mark Dilger <hornschnorter@gmail.com> Reviewed-by: Alexander Korotkov <aekorotkov@gmail.com> Reviewed-by: John Naylor <john.naylor@enterprisedb.com> Reviewed-by: Nikita Glukhov <n.gluhov@postgrespro.ru> Discussion: https://postgr.es/m/c1138ead-7668-f0e1-0638-c3be3237e812@2ndquadrant.com	2021-03-23 00:45:03 +01:00
Tomas Vondra	bfa2cee784	Move bsearch_arg to src/port Until now the bsearch_arg function was used only in extended statistics code, so it was defined in that code. But we already have qsort_arg in src/port, so let's move it next to it.	2021-03-23 00:11:22 +01:00
Tom Lane	063dd37ebc	Short-circuit slice requests that are for more than the object's size. substring(), and perhaps other callers, isn't careful to pass a slice length that is no more than the datum's true size. Since toast_decompress_datum_slice's children will palloc the requested slice length, this can waste memory. Also, close study of the liblz4 documentation suggests that it is dependent on the caller to not ask for more than the correct amount of decompressed data; this squares with observed misbehavior with liblz4 1.8.3. Avoid these problems by switching to the normal full-decompression code path if the slice request is >= datum's decompressed size. Tom Lane and Dilip Kumar Discussion: https://postgr.es/m/507597.1616370729@sss.pgh.pa.us	2021-03-22 14:01:20 -04:00
Tom Lane	aeb1631ed2	Mostly-cosmetic adjustments of TOAST-related macros. The authors of `bbe0a81db` hadn't quite got the idea that macros named like SOMETHING_4B_C were only meant for internal endianness-related details in postgres.h. Choose more legible names for macros that are intended to be used elsewhere. Rearrange postgres.h a bit to clarify the separation between those internal macros and ones intended for wider use. Also, avoid using the term "rawsize" for true decompressed size; we've used "extsize" for that, because "rawsize" generally denotes total Datum size including header. This choice seemed particularly unfortunate in tests that were comparing one of these meanings to the other. This patch includes a couple of not-purely-cosmetic changes: be sure that the shifts aligning compression methods are unsigned (not critical today, but will be when compression method 2 exists), and fix broken definition of VARATT_EXTERNAL_GET_COMPRESSION (now VARATT_EXTERNAL_GET_COMPRESS_METHOD), whose callers worked only accidentally. Discussion: https://postgr.es/m/574197.1616428079@sss.pgh.pa.us	2021-03-22 13:43:10 -04:00
Robert Haas	a4d5284a10	Error on invalid TOAST compression in CREATE or ALTER TABLE. The previous coding treated an invalid compression method name as equivalent to the default, which is certainly not right. Justin Pryzby Discussion: http://postgr.es/m/20210321235544.GD4203@telsasoft.com	2021-03-22 10:57:08 -04:00
Robert Haas	226e2be387	More code cleanup for configurable TOAST compression. Remove unused macro. Fix confusion about whether a TOAST compression method is identified by an OID or a char. Justin Pryzby Discussion: http://postgr.es/m/20210321235544.GD4203@telsasoft.com	2021-03-22 09:21:37 -04:00
Michael Paquier	909b449e00	Fix concurrency issues with WAL segment recycling on Windows This commit is mostly a revert of `aaa3aed`, that switched the routine doing the internal renaming of recycled WAL segments to use on Windows a combination of CreateHardLinkA() plus unlink() instead of rename(). As reported by several users of Postgres 13, this is causing concurrency issues when manipulating WAL segments, mostly in the shape of the following error: LOG: could not rename file "pg_wal/000000XX000000YY000000ZZ": Permission denied This moves back to a logic where a single rename() (well, pgrename() for Windows) is used. This issue has proved to be hard to hit when I tested it, facing it only once with an archive_command that was not able to do its work, so it is environment-sensitive. The reporters of this issue have been able to confirm that the situation improved once we switched back to a single rename(). In order to check things, I have provided to the reporters a patched build based on 13.2 with `aaa3aed` reverted, to test if the error goes away, and an unpatched build of 13.2 to test if the error still showed up (just to make sure that I did not mess up my build process). Extra thanks to Fujii Masao for pointing out what looked like the culprit commit, and to all the reporters for taking the time to test what I have sent them. Reported-by: Andrus, Guy Burgess, Yaroslav Pashinsky, Thomas Trenz Reviewed-by: Tom Lane, Andres Freund Discussion: https://postgr.es/m/3861ff1e-0923-7838-e826-094cc9bef737@hot.ee Discussion: https://postgr.es/m/16874-c3eecd319e36a2bf@postgresql.org Discussion: https://postgr.es/m/095ccf8d-7f58-d928-427c-b17ace23cae6@burgess.co.nz Discussion: https://postgr.es/m/16927-67c570d968c99567%40postgresql.org Discussion: https://postgr.es/m/YFBcRbnBiPdGZvfW@paquier.xyz Backpatch-through: 13	2021-03-22 14:02:26 +09:00
Michael Paquier	595b9cba2a	Fix timeline assignment in checkpoints with 2PC transactions Any transactions found as still prepared by a checkpoint have their state data read from the WAL records generated by PREPARE TRANSACTION before being moved into their new location within pg_twophase/. While reading such records, the WAL reader uses the callback read_local_xlog_page() to read a page, that is shared across various parts of the system. This callback, since `1148e22a`, has introduced an update of ThisTimeLineID when reading a record while in recovery, which is potentially helpful in the context of cascading WAL senders. This update of ThisTimeLineID interacts badly with the checkpointer if a promotion happens while some 2PC data is read from its record, as, by changing ThisTimeLineID, any follow-up WAL records would be written to a timeline older than the promoted one. This results in consistency issues. For instance, a subsequent server restart would cause a failure in finding a valid checkpoint record, resulting in a PANIC. This commit changes the code reading the 2PC data to reset the timeline once the 2PC record has been read, to prevent messing up with the static state of the checkpointer. It would be tempting to do the same thing directly in read_local_xlog_page(). However, based on the discussion that has led to `1148e22a`, users may rely on the updates of ThisTimeLineID when a WAL record page is read in recovery, so changing this callback could break some cases that are working currently. A TAP test reproducing the issue is added, relying on a PITR to precisely trigger a promotion with a prepared transaction still tracked. Per discussion with Heikki Linnakangas, Kyotaro Horiguchi, Fujii Masao and myself. Author: Soumyadeep Chakraborty, Jimmy Yih, Kevin Yeap Discussion: https://postgr.es/m/CAE-ML+_EjH_fzfq1F3RJ1=XaaNG=-Jz-i3JqkNhXiLAsM3z-Ew@mail.gmail.com Backpatch-through: 10	2021-03-22 08:30:53 +09:00
Tom Lane	ac897c4834	Fix assorted silliness in ATExecSetCompression(). It's not okay to scribble directly on a syscache entry. Nor to continue accessing said entry after releasing it. Also get rid of not-used local variables. Per valgrind testing.	2021-03-21 18:43:07 -04:00
Peter Geoghegan	9dd963ae25	Recycle nbtree pages deleted during same VACUUM. Maintain a simple array of metadata about pages that were deleted during nbtree VACUUM's current btvacuumscan() call. Use this metadata at the end of btvacuumscan() to attempt to place newly deleted pages in the FSM without further delay. It might not yet be safe to place any of the pages in the FSM by then (they may not be deemed recyclable), but we have little to lose and plenty to gain by trying. In practice there is a very good chance that this will work out when vacuuming larger indexes, where scanning the index naturally takes quite a while. This commit doesn't change the page recycling invariants; it merely improves the efficiency of page recycling within the confines of the existing design. Recycle safety is a part of nbtree's implementation of what Lanin & Shasha call "the drain technique". The design happens to use transaction IDs (they're stored in deleted pages), but that in itself doesn't align the cutoff for recycle safety to any of the XID-based cutoffs used by VACUUM (e.g., OldestXmin). All that matters is whether or not _other_ backends might be able to observe various inconsistencies in the tree structure (that they cannot just detect and recover from by moving right). Recycle safety is purely a question of maintaining the consistency (or the apparent consistency) of a physical data structure. Note that running a simple serial test case involving a large range DELETE followed by a VACUUM VERBOSE will probably show that any newly deleted nbtree pages are not yet reusable/recyclable. This is expected in the absence of even one concurrent XID assignment. It is an old implementation restriction. In practice it's unlikely to be the thing that makes recycling remain unsafe, at least with larger indexes, where recycling newly deleted pages during the same VACUUM actually matters. 
An important high-level goal of this commit (as well as related recent commits `e5d8a999` and `9f3665fb`) is to make expensive deferred cleanup operations in index AMs rare in general. If index vacuuming frequently depends on the next VACUUM operation finishing off work that the current operation started, then the general behavior of index vacuuming is hard to predict. This is relevant to ongoing work that adds a vacuumlazy.c mechanism to skip index vacuuming in certain cases. Anything that makes the real world behavior of index vacuuming simpler and more linear will also make top-down modeling in vacuumlazy.c more robust. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Discussion: https://postgr.es/m/CAH2-Wzk76_P=67iUscb1UN44-gyZL-KgpsXbSxq_bdcMa7Q+wQ@mail.gmail.com	2021-03-21 15:25:39 -07:00
Tom Lane	9fb9691a88	Suppress various new compiler warnings. Compilers that don't understand that elog(ERROR) doesn't return issued warnings here. In the cases in libpq_pipeline.c, we were not exactly helping things by failing to mark pg_fatal() as noreturn. Per buildfarm.	2021-03-21 11:50:43 -04:00
Peter Eisentraut	96ae658e62	Move lwlock-release probe back where it belongs The documentation specifically states that lwlock-release fires before any released waiters have been awakened. It worked that way until `ab5194e6f6`, where it seems to have been misplaced accidentally. Move it back where it belongs. Author: Craig Ringer <craig.ringer@enterprisedb.com> Discussion: https://www.postgresql.org/message-id/CAGRY4nwxKUS_RvXFW-ugrZBYxPFFM5kjwKT5O+0+Stuga5b4+Q@mail.gmail.com	2021-03-21 08:02:30 +01:00
Tomas Vondra	882b2cdc08	Use valid compression method in brin_form_tuple When compressing the BRIN summary, we can't simply use the compression method from the indexed attribute. The summary may use a different data type, e.g. fixed-length attribute may have varlena summary, leading to compression failures. For the built-in BRIN opclasses this happens to work, because the summary uses the same data type as the attribute. When the data types match, we can use the compression method specified for the attribute (it's copied into the index descriptor). Otherwise we don't have much choice and have to use the default one. Author: Tomas Vondra Reviewed-by: Justin Pryzby <pryzby@telsasoft.com> Discussion: https://postgr.es/m/e0367f27-392c-321a-7411-a58e1a7e4817%40enterprisedb.com	2021-03-21 00:28:34 +01:00
Tom Lane	e835e89a0f	Fix memory leak when rejecting bogus DH parameters. While back-patching `e0e569e1d`, I noted that there were some other places where we ought to be applying DH_free(); namely, where we load some DH parameters from a file and then reject them as not being sufficiently secure. While it seems really unlikely that anybody would hit these code paths in production, let alone do so repeatedly, let's fix it for consistency. Back-patch to v10 where this code was introduced. Discussion: https://postgr.es/m/16160-18367e56e9a28264@postgresql.org	2021-03-20 12:47:21 -04:00
Tom Lane	f0c2a5bba6	Avoid leaking memory in RestoreGUCState(), and improve comments. RestoreGUCState applied InitializeOneGUCOption to already-live GUC entries, causing any malloc'd subsidiary data to be forgotten. We do want the effect of resetting the GUC to its compiled-in default, and InitializeOneGUCOption seems like the best way to do that, so add code to free any existing subsidiary data beforehand. The interaction between can_skip_gucvar, SerializeGUCState, and RestoreGUCState is way more subtle than their opaque comments would suggest to an unwary reader. Rewrite and enlarge the comments to try to make it clearer what's happening. Remove a long-obsolete assertion in read_nondefault_variables: the behavior of set_config_option hasn't depended on IsInitProcessingMode since `f5d9698a8` installed a better way of controlling it. Although this is fixing a clear memory leak, the leak is quite unlikely to involve any large amount of data, and it can only happen once in the lifetime of a worker process. So it seems unnecessary to take any risk of back-patching. Discussion: https://postgr.es/m/4105247.1616174862@sss.pgh.pa.us	2021-03-19 23:03:17 -04:00
Thomas Munro	61752afb26	Provide recovery_init_sync_method=syncfs. Since commit `2ce439f3` we have opened every file in the data directory and called fsync() at the start of crash recovery. This can be very slow if there are many files, leading to field complaints of systems taking minutes or even hours to begin crash recovery. Provide an alternative method, for Linux only, where we call syncfs() on every possibly different filesystem under the data directory. This is equivalent, but avoids faulting in potentially many inodes from potentially slow storage. The new mode comes with some caveats, described in the documentation, so the default value for the new setting is "fsync", preserving the older behavior. Reported-by: Michael Brown <michael.brown@discourse.org> Reviewed-by: Fujii Masao <masao.fujii@oss.nttdata.com> Reviewed-by: Paul Guo <guopa@vmware.com> Reviewed-by: Bruce Momjian <bruce@momjian.us> Reviewed-by: Justin Pryzby <pryzby@telsasoft.com> Reviewed-by: David Steele <david@pgmasters.net> Discussion: https://postgr.es/m/11bc2bb7-ecb5-3ad0-b39f-df632734cd81%40discourse.org Discussion: https://postgr.es/m/CAEET0ZHGnbXmi8yF3ywsDZvb3m9CbdsGZgfTXscQ6agcbzcZAw%40mail.gmail.com	2021-03-20 12:07:28 +13:00
Tomas Vondra	b822ae13ea	Use lfirst_int in cmp_list_len_contents_asc The function added in `be45be9c33` is comparing integer lists (IntList) by length and contents, but there were two bugs. Firstly, it used intVal() to extract the value, but that's for Value nodes, not for extracting int values from IntList. Secondly, it called it directly on the ListCell, without doing lfirst(). So just do lfirst_int() instead. Interestingly enough, this did not cause any crashes on the buildfarm, but valgrind rightfully complained about it. Discussion: https://postgr.es/m/bf3805a8-d7d1-ae61-fece-761b7ff41ecc@postgresfriends.org	2021-03-20 00:04:25 +01:00
Robert Haas	d00fbdc431	Fix use-after-ReleaseSysCache problem in ATExecAlterColumnType. Introduced by commit `bbe0a81db6`. Per buildfarm member prion.	2021-03-19 17:17:48 -04:00
Robert Haas	bbe0a81db6	Allow configurable LZ4 TOAST compression. There is now a per-column COMPRESSION option which can be set to pglz (the default, and the only option up until now) or lz4. Or, if you like, you can set the new default_toast_compression GUC to lz4, and then that will be the default for new table columns for which no value is specified. We don't have lz4 support in the PostgreSQL code, so to use lz4 compression, PostgreSQL must be built --with-lz4. In general, TOAST compression means compression of individual column values, not the whole tuple, and those values can either be compressed inline within the tuple or compressed and then stored externally in the TOAST table, so those properties also apply to this feature. Prior to this commit, a TOAST pointer has two unused bits as part of the va_extsize field, and a compressed datum has two unused bits as part of the va_rawsize field. These bits are unused because the length of a varlena is limited to 1GB; we now use them to indicate the compression type that was used. This means we only have bit space for 2 more built-in compression types, but we could work around that problem, if necessary, by introducing a new vartag_external value for any further types we end up wanting to add. Hopefully, it won't be too important to offer a wide selection of algorithms here, since each one we add not only takes more coding but also adds a build dependency for every packager. Nevertheless, it seems worth doing at least this much, because LZ4 gets better compression than PGLZ with less CPU usage. It's possible for LZ4-compressed datums to leak into composite type values stored on disk, just as it is for PGLZ. It's also possible for LZ4-compressed attributes to be copied into a different table via SQL commands such as CREATE TABLE AS or INSERT .. SELECT. It would be expensive to force such values to be decompressed, so PostgreSQL has never done so. 
For the same reasons, we also don't force recompression of already-compressed values even if the target table prefers a different compression method than was used for the source data. These architectural decisions are perhaps arguable but revisiting them is well beyond the scope of what seemed possible to do as part of this project. However, it's relatively cheap to recompress as part of VACUUM FULL or CLUSTER, so this commit adjusts those commands to do so, if the configured compression method of the table happens not to match what was used for some column value stored therein. Dilip Kumar. The original patches on which this work was based were written by Ildus Kurbangaliev, and those patches were based on even earlier work by Nikita Glukhov, but the design has since changed very substantially, since allowing a potentially large number of compression methods that could be added and dropped on a running system proved too problematic given some of the architectural issues mentioned above; the choice of which specific compression method to add first is now different; and a lot of the code has been heavily refactored. More recently, Justin Pryzby helped quite a bit with testing and reviewing and this version also includes some code contributions from him. Other design input and review from Tomas Vondra, Álvaro Herrera, Andres Freund, Oleg Bartunov, Alexander Korotkov, and me. Discussion: http://postgr.es/m/20170907194236.4cefce96%40wp.localdomain Discussion: http://postgr.es/m/CAFiTN-uUpX3ck%3DK0mLEk-G_kUQY%3DSNOTeqdaNRR9FMdQrHKebw%40mail.gmail.com	2021-03-19 15:10:38 -04:00
Fujii Masao	fd31214075	Fix comments in postmaster.c. Commit `86c23a6eb2` changed the option to specify that postgres will stop all other server processes by sending the signal SIGSTOP, from -s to -T. But previously there were comments incorrectly explaining that SIGSTOP behavior is set by -s option. This commit fixes them. Author: Kyotaro Horiguchi Reviewed-by: Fujii Masao Discussion: https://postgr.es/m/20210316.165141.1400441966284654043.horikyota.ntt@gmail.com	2021-03-19 11:28:54 +09:00
Tom Lane	9bacdf9f53	Don't leak malloc'd error string in libpqrcv_check_conninfo(). We leaked the error report from PQconninfoParse, when there was one. It seems unlikely that real usage patterns would repeat the failure often enough to create serious bloat, but let's back-patch anyway to keep the code similar in all branches. Found via valgrind testing. Back-patch to v10 where this code was added. Discussion: https://postgr.es/m/3816764.1616104288@sss.pgh.pa.us	2021-03-18 22:22:47 -04:00
Tom Lane	377b7a8300	Don't leak malloc'd strings when a GUC setting is rejected. Because guc.c prefers to keep all its string values in malloc'd not palloc'd storage, it has to be more careful than usual to avoid leaks. Error exits out of string GUC hook checks failed to clear the proposed value string, and error exits out of ProcessGUCArray() failed to clear the malloc'd results of ParseLongOption(). Found via valgrind testing. This problem is ancient, so back-patch to all supported branches. Discussion: https://postgr.es/m/3816764.1616104288@sss.pgh.pa.us	2021-03-18 22:22:47 -04:00
Tom Lane	d303849b05	Don't leak compiled regex(es) when an ispell cache entry is dropped. The text search cache mechanisms assume that we can clean up an invalidated dictionary cache entry simply by resetting the associated long-lived memory context. However, that does not work for ispell affixes that make use of regular expressions, because the regex library deals in plain old malloc. Hence, we leaked compiled regex(es) any time we dropped such a cache entry. That could quickly add up, since even a fairly trivial regex can use up tens of kB, and a large one can eat megabytes. Add a memory context callback to ensure that a regex gets freed when its owning cache entry is cleared. Found via valgrind testing. This problem is ancient, so back-patch to all supported branches. Discussion: https://postgr.es/m/3816764.1616104288@sss.pgh.pa.us	2021-03-18 22:22:47 -04:00
Tom Lane	415ffdc220	Don't run RelationInitTableAccessMethod in a long-lived context. Some code paths in this function perform syscache lookups, which can lead to table accesses and possibly leakage of cruft into the caller's context. If said context is CacheMemoryContext, we eventually will have visible bloat. But fixing this is no harder than moving one memory context switch step. (The other callers don't have a problem.) Andres Freund and I independently found this via valgrind testing. Back-patch to v12 where this code was added. Discussion: https://postgr.es/m/20210317023101.anvejcfotwka6gaa@alap3.anarazel.de Discussion: https://postgr.es/m/3816764.1616104288@sss.pgh.pa.us	2021-03-18 22:22:47 -04:00
Tom Lane	28644fac10	Don't leak rd_statlist when a relcache entry is dropped. Although these lists are usually NIL, and even when not empty are unlikely to be large, constant relcache update traffic could eventually result in visible bloat of CacheMemoryContext. Found via valgrind testing. Back-patch to v10 where this field was added. Discussion: https://postgr.es/m/3816764.1616104288@sss.pgh.pa.us	2021-03-18 22:22:47 -04:00
Tom Lane	1d581ce712	Fix misuse of foreach_delete_current(). Our coding convention requires this macro's result to be assigned back to the original List variable. In this usage, since the List could not become empty, there was no actual bug --- but some compilers warned about it. Oversight in `be45be9c3`. Discussion: https://postgr.es/m/35077b31-2d62-1e31-0e2e-ddb52d590b73@enterprisedb.com	2021-03-18 19:24:22 -04:00
Tomas Vondra	be45be9c33	Implement GROUP BY DISTINCT With grouping sets, it's possible that some of the grouping sets are duplicate. This is especially common with CUBE and ROLLUP clauses. For example GROUP BY CUBE (a,b), CUBE (b,c) is equivalent to GROUP BY GROUPING SETS ( (a, b, c), (a, b, c), (a, b, c), (a, b), (a, b), (a, b), (a), (a), (a), (c, a), (c, a), (c, a), (c), (b, c), (b), () ) Some of the grouping sets are calculated multiple times, which is mostly unnecessary. This commit implements a new GROUP BY DISTINCT feature, as defined in the SQL standard, which eliminates the duplicate sets. Author: Vik Fearing Reviewed-by: Erik Rijkers, Georgios Kokolatos, Tomas Vondra Discussion: https://postgr.es/m/bf3805a8-d7d1-ae61-fece-761b7ff41ecc@postgresfriends.org	2021-03-18 18:22:18 +01:00
Tomas Vondra	cd91de0d17	Remove temporary files after backend crash After a crash of a backend using temporary files, the files used to be left behind, on the basis that it might be useful for debugging. But we don't have any reports of anyone actually doing that, and it means the disk usage may grow over time due to repeated backend failures (possibly even hitting ENOSPC). So this behavior is a bit unfortunate, and fixing it required either manual cleanup (deleting files, which is error-prone) or restart of the instance (i.e. service disruption). This implements automatic cleanup of temporary files, controlled by a new GUC remove_temp_files_after_crash. By default the files are removed, but it can be disabled to restore the old behavior if needed. Author: Euler Taveira Reviewed-by: Tomas Vondra, Michael Paquier, Anastasia Lubennikova, Thomas Munro Discussion: https://postgr.es/m/CAH503wDKdYzyq7U-QJqGn%3DGm6XmoK%2B6_6xTJ-Yn5WSvoHLY1Ww%40mail.gmail.com	2021-03-18 17:38:28 +01:00
Magnus Hagander	da18d829c2	Fix function name in error hint pg_read_file() is the function that's in core, pg_file_read() is in adminpack. But when using pg_file_read() in adminpack it calls the C level function pg_read_file() in core, which probably threw the original author off. But the error hint should be about the SQL function. Reported-By: Sergei Kornilov Backpatch-through: 11 Discussion: https://postgr.es/m/373021616060475@mail.yandex.ru	2021-03-18 11:22:20 +01:00
Amit Kapila	c8f78b6161	Add a new GUC and a reloption to enable inserts in parallel-mode. Commit `05c8482f7f` added the implementation of parallel SELECT for "INSERT INTO ... SELECT ..." which may incur non-negligible overhead in the additional parallel-safety checks that it performs, even when, in the end, those checks determine that parallelism can't be used. This is normally only ever a problem in the case of when the target table has a large number of partitions. A new GUC option "enable_parallel_insert" is added, to allow insert in parallel-mode. The default is on. In addition to the GUC option, the user may want a mechanism to allow inserts in parallel-mode with finer granularity at table level. The new table option "parallel_insert_enabled" allows this. The default is true. Author: "Hou, Zhijie" Reviewed-by: Greg Nancarrow, Amit Langote, Takayuki Tsunakawa, Amit Kapila Discussion: https://postgr.es/m/CAA4eK1K-cW7svLC2D7DHoGHxdAdg3P37BLgebqBOC2ZLc9a6QQ%40mail.gmail.com Discussion: https://postgr.es/m/CAJcOf-cXnB5cnMKqWEp2E2z7Mvcd04iLVmV=qpFJrR3AcrTS3g@mail.gmail.com	2021-03-18 07:25:27 +05:30
Andres Freund	5f79580ad6	Fix memory lifetime issues of replication slot stats. When accessing replication slot stats, introduced in `9868167500`, pgstat_read_statsfiles() reads the data into newly allocated memory. Unfortunately the current memory context at that point is the caller's, leading to leaks and use-after-free dangers. The fix is trivial, explicitly use pgStatLocalContext. There's some potential for further improvements, but that's outside of the scope of this bugfix. No backpatch necessary, feature is only in HEAD. Author: Andres Freund <andres@anarazel.de> Discussion: https://postgr.es/m/20210317230447.c7uc4g3vbs4wi32i@alap3.anarazel.de	2021-03-17 16:21:46 -07:00
Tom Lane	8620a7f6db	Code review for server's handling of "tablespace map" files. While looking at Robert Foggia's report, I noticed a passel of other issues in the same area: * The scheme for backslash-quoting newlines in pathnames is just wrong; it will misbehave if the last ordinary character in a pathname is a backslash. I'm not sure why we're bothering to allow newlines in tablespace paths, but if we're going to do it we should do it without introducing other problems. Hence, backslashes themselves have to be backslashed too. * The author hadn't read the sscanf man page very carefully, because this code would drop any leading whitespace from the path. (I doubt that a tablespace path with leading whitespace could happen in practice; but if we're bothering to allow newlines in the path, it sure seems like leading whitespace is little less implausible.) Using sscanf for the task of finding the first space is overkill anyway. * While I'm not 100% sure what the rationale for escaping both \r and \n is, if the idea is to allow Windows newlines in the file then this code failed, because it'd throw an error if it saw \r followed by \n. * There's no cross-check for an incomplete final line in the map file, which would be a likely apparent symptom of the improper-escaping bug. On the generation end, aside from the escaping issue we have: * If needtblspcmapfile is true then do_pg_start_backup will pass back escaped strings in tablespaceinfo->path values, which no caller wants or is prepared to deal with. I'm not sure if there's a live bug from that, but it looks like there might be (given the dubious assumption that anyone actually has newlines in their tablespace paths). * It's not being very paranoid about the possibility of random stuff in the pg_tblspc directory. IMO we should ignore anything without an OID-like name. 
The escaping rule change doesn't seem back-patchable: it'll require doubling of backslashes in the tablespace_map file, which is basically a basebackup format change. The odds of that causing trouble are considerably more than the odds of the existing bug causing trouble. The rest of this seems somewhat unlikely to cause problems too, so no back-patch.	2021-03-17 16:18:46 -04:00
Tom Lane	a50e4fd028	Prevent buffer overrun in read_tablespace_map(). Robert Foggia of Trustwave reported that read_tablespace_map() fails to prevent an overrun of its on-stack input buffer. Since the tablespace map file is presumed trustworthy, this does not seem like an interesting security vulnerability, but still we should fix it just in the name of robustness. While here, document that pg_basebackup's --tablespace-mapping option doesn't work with tar-format output, because it doesn't. To make it work, we'd have to modify the tablespace_map file within the tarball sent by the server, which might be possible but I'm not volunteering. (Less-painful solutions would require changing the basebackup protocol so that the source server could adjust the map. That's not very appetizing either.)	2021-03-17 16:10:37 -04:00
Thomas Munro	7f7f25f15e	Revert "Fix race in Parallel Hash Join batch cleanup." This reverts commit `378802e371`. This reverts commit `3b8981b6e1`. Discussion: https://postgr.es/m/CA%2BhUKGJmcqAE3MZeDCLLXa62cWM0AJbKmp2JrJYaJ86bz36LFA%40mail.gmail.com	2021-03-18 01:10:55 +13:00
Michael Paquier	9fd2952cf4	Fix comment in indexing.c `578b229`, that removed support for WITH OIDS, has changed CatalogTupleInsert() to not return an Oid, but one comment was still mentioning that. Author: Vik Fearing Discussion: https://postgr.es/m/fef01975-ed10-3601-7b9e-80ecef72d00b@postgresfriends.org	2021-03-17 18:07:00 +09:00
Peter Eisentraut	e1ae40f381	Small error message improvement	2021-03-17 08:17:33 +01:00
Thomas Munro	378802e371	Update the names of Parallel Hash Join phases. Commit `3048898e` dropped -ING from some wait event names that correspond to barrier phases. Update the phases' names to match. While we're here making cosmetic changes, also rename "DONE" to "FREE". That pairs better with "ALLOCATE", and describes the activity that actually happens in that phase (as we do for the other phases) rather than describing a state. The distinction is clearer after bugfix commit `3b8981b6` split the phase into two. As for the growth barriers, rename their "ALLOCATE" phase to "REALLOCATE", which is probably a better description of what happens then. Also improve the comments about the phases a bit. Discussion: https://postgr.es/m/CA%2BhUKG%2BMDpwF2Eo2LAvzd%3DpOh81wUTsrwU1uAwR-v6OGBB6%2B7g%40mail.gmail.com	2021-03-17 18:43:04 +13:00
Thomas Munro	3b8981b6e1	Fix race in Parallel Hash Join batch cleanup. With very unlucky timing and parallel_leader_participation off, PHJ could attempt to access per-batch state just as it was being freed. There was code intended to prevent that by checking for a cleared pointer, but it was buggy. Fix, by introducing an extra barrier phase. The new phase PHJ_BUILD_RUNNING means that it's safe to access the per-batch state to find a batch to help with, and PHJ_BUILD_DONE means that it is too late. The last to detach will free the array of per-batch state as before, but now it will also atomically advance the phase at the same time, so that late attachers can avoid the hazard, without the data race. This mirrors the way per-batch hash tables are freed (see phases PHJ_BATCH_PROBING and PHJ_BATCH_DONE). Revealed by a one-off build farm failure, where BarrierAttach() failed a sanity check assertion, because the memory had been clobbered by dsa_free(). Back-patch to 11, where the code arrived. Reported-by: Michael Paquier <michael@paquier.xyz> Discussion: https://postgr.es/m/20200929061142.GA29096%40paquier.xyz	2021-03-17 18:05:39 +13:00
Amit Kapila	6b67d72b60	Fix race condition in drop subscription's handling of tablesync slots. Commit `ce0fdbfe97` made tablesync slots permanent and allowed Drop Subscription to drop such slots. However, it is possible that before the tablesync worker could get the acknowledgment of slot creation, drop subscription stops it and that can lead to a dangling slot on the publisher. Prevent cancel/die interrupts while creating a slot in the tablesync worker. Reported-by: Thomas Munro as per buildfarm Author: Amit Kapila Reviewed-by: Vignesh C, Takamichi Osumi Discussion: https://postgr.es/m/CA+hUKGJG9dWpw1cOQ2nzWU8PHjm=PTraB+KgE5648K9nTfwvxg@mail.gmail.com	2021-03-17 08:15:12 +05:30
Thomas Munro	9e7ccd9ef6	Enable parallelism in REFRESH MATERIALIZED VIEW. Pass CURSOR_OPT_PARALLEL_OK to pg_plan_query() so that parallel plans are considered when running the underlying SELECT query. This wasn't done in commit `e9baa5e9`, which did this for CREATE MATERIALIZED VIEW, because it wasn't yet known to be safe. Since REFRESH always inserts into a freshly created table before later merging or swapping the data into place with separate operations, we can enable such plans here too. Author: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> Reviewed-by: Hou, Zhijie <houzj.fnst@cn.fujitsu.com> Reviewed-by: Luc Vlaming <luc@swarm64.com> Reviewed-by: Thomas Munro <thomas.munro@gmail.com> Discussion: https://postgr.es/m/CALj2ACXg-4hNKJC6nFnepRHYT4t5jJVstYvri%2BtKQHy7ydcr8A%40mail.gmail.com	2021-03-17 15:04:17 +13:00
Peter Geoghegan	fbe4cb3bd4	Fix comment about promising tuples. Oversight in commit `d168b66682`, which added bottom-up index deletion.	2021-03-16 13:38:52 -07:00
Tom Lane	4b12ab18c9	Avoid corner-case memory leak in SSL parameter processing. After reading the root cert list from the ssl_ca_file, immediately install it as client CA list of the new SSL context. That gives the SSL context ownership of the list, so that SSL_CTX_free will free it. This avoids a permanent memory leak if we fail further down in be_tls_init(), which could happen if bogus CRL data is offered. The leak could only amount to something if the CRL parameters get broken after server start (else we'd just quit) and then the server is SIGHUP'd many times without fixing the CRL data. That's rather unlikely perhaps, but it seems worth fixing, if only because the code is clearer this way. While we're here, add some comments about the memory management aspects of this logic. Noted by Jelte Fennema and independently by Andres Freund. Back-patch to v10; before commit `de41869b6` it doesn't matter, since we'd not re-execute this code during SIGHUP. Discussion: https://postgr.es/m/16160-18367e56e9a28264@postgresql.org	2021-03-16 16:03:06 -04:00
Stephen Frost	c6fc50cb40	Use pre-fetching for ANALYZE When we have posix_fadvise() available, we can improve the performance of an ANALYZE by quite a bit by using it to inform the kernel of the blocks that we're going to be asking for. Similar to bitmap index scans, the number of buffers pre-fetched is based off of the maintenance_io_concurrency setting (for the particular tablespace or, if not set, globally, via get_tablespace_maintenance_io_concurrency()). Reviewed-By: Heikki Linnakangas, Tomas Vondra Discussion: https://www.postgresql.org/message-id/VI1PR0701MB69603A433348EDCF783C6ECBF6EF0%40VI1PR0701MB6960.eurprd07.prod.outlook.com	2021-03-16 14:46:48 -04:00
Stephen Frost	94d13d474d	Improve logging of auto-vacuum and auto-analyze When logging auto-vacuum and auto-analyze activity, include the I/O timing if track_io_timing is enabled. Also, for auto-analyze, add the read rate and the dirty rate, similar to how that information has historically been logged for auto-vacuum. Stephen Frost and Jakub Wartak Reviewed-By: Heikki Linnakangas, Tomas Vondra Discussion: https://www.postgresql.org/message-id/VI1PR0701MB69603A433348EDCF783C6ECBF6EF0%40VI1PR0701MB6960.eurprd07.prod.outlook.com	2021-03-16 14:46:48 -04:00
Tom Lane	1ea396362b	Improve logging of bad parameter values in BIND messages. Since commit `ba79cb5dc`, values of bind parameters have been logged during errors in extended query mode. However, we only did that after we'd collected and converted all the parameter values, thus failing to offer any useful localization of invalid-parameter problems. Add a separate callback that's used during parameter collection, and have it print the parameter number, along with the input string if text input format is used. Justin Pryzby and Tom Lane Discussion: https://postgr.es/m/20210104170939.GH9712@telsasoft.com Discussion: https://postgr.es/m/CANfkH5k-6nNt-4cSv1vPB80nq2BZCzhFVR5O4VznYbsX0wZmow@mail.gmail.com	2021-03-16 11:16:41 -04:00
Alvaro Herrera	acb7e4eb6b	Implement pipeline mode in libpq Pipeline mode in libpq lets an application avoid the Sync messages in the FE/BE protocol that are implicit in the old libpq API after each query. The application can then insert Sync at its leisure with a new libpq function PQpipelineSync. This can lead to substantial reductions in query latency. Co-authored-by: Craig Ringer <craig.ringer@enterprisedb.com> Co-authored-by: Matthieu Garrigues <matthieu.garrigues@gmail.com> Co-authored-by: Álvaro Herrera <alvherre@alvh.no-ip.org> Reviewed-by: Andres Freund <andres@anarazel.de> Reviewed-by: Aya Iwata <iwata.aya@jp.fujitsu.com> Reviewed-by: Daniel Vérité <daniel@manitou-mail.org> Reviewed-by: David G. Johnston <david.g.johnston@gmail.com> Reviewed-by: Justin Pryzby <pryzby@telsasoft.com> Reviewed-by: Kirk Jamison <k.jamison@fujitsu.com> Reviewed-by: Michael Paquier <michael.paquier@gmail.com> Reviewed-by: Nikhil Sontakke <nikhils@2ndquadrant.com> Reviewed-by: Vaishnavi Prabakaran <VaishnaviP@fast.au.fujitsu.com> Reviewed-by: Zhihong Yu <zyu@yugabyte.com> Discussion: https://postgr.es/m/CAMsr+YFUjJytRyV4J-16bEoiZyH=4nj+sQ7JP9ajwz=B4dMMZw@mail.gmail.com Discussion: https://postgr.es/m/CAJkzx4T5E-2cQe3dtv2R78dYFvz+in8PY7A8MArvLhs_pg75gg@mail.gmail.com	2021-03-15 18:13:42 -03:00
Fujii Masao	d75288fb27	Make archiver process an auxiliary process. This commit changes WAL archiver process so that it's treated as an auxiliary process and can use shared memory. This is an infrastructure patch required for upcoming shared-memory based stats collector patch series. These patch series basically need any processes including archiver that can report the statistics to access to shared memory. Since this patch itself is useful to simplify the code and when users monitor the status of archiver, it's committed separately in advance. This commit simplifies the code for WAL archiving. For example, previously backends need to signal to archiver via postmaster when they notify archiver that there are some WAL files to archive. On the other hand, this commit removes that signal to postmaster and enables backends to notify archiver directly using shared latch. Also, as a side effect of this change, the information about archiver process becomes viewable in the pg_stat_activity view. Author: Kyotaro Horiguchi Reviewed-by: Andres Freund, Álvaro Herrera, Julien Rouhaud, Tomas Vondra, Arthur Zakirov, Fujii Masao Discussion: https://postgr.es/m/20180629.173418.190173462.horiguchi.kyotaro@lab.ntt.co.jp	2021-03-15 13:13:14 +09:00
Peter Geoghegan	0ea71c93a0	Notice that heap page has dead items during VACUUM. Consistently set a flag variable that tracks whether the current heap page has a dead item during lazy vacuum's heap scan. We missed the common case where there is a preexisting (or even a new) LP_DEAD heap line pointer. Also make it clear that the variable might be affected by an existing line pointer, say from an earlier opportunistic pruning operation. This distinction is important because it's the main reason why we can't just use the nearby tups_vacuumed variable instead. No backpatch. In theory failing to set the page level flag variable had no consequences. Currently it is only used to defensively check if a page marked all visible has dead items, which should never happen anyway (if it does then the table must be corrupt). Author: Masahiko Sawada <sawada.mshk@gmail.com> Diagnosed-By: Masahiko Sawada <sawada.mshk@gmail.com> Discussion: https://postgr.es/m/CAD21AoAtZb4+HJT_8RoOXvu4HM-Zd4HKS3YSMCH6+-W=bDyh-w@mail.gmail.com	2021-03-14 18:05:57 -07:00
Amit Kapila	c5be48f092	Improve FK trigger parallel-safety check added by `05c8482f7f`. Commit `05c8482f7f` added special logic related to parallel-safety of FK triggers. This is a bit of a hack and should have instead been done by simply setting appropriate proparallel values on those trigger functions themselves. Suggested-by: Tom Lane Author: Greg Nancarrow Reviewed-by: Amit Kapila Discussion: https://postgr.es/m/2309260.1615485644@sss.pgh.pa.us	2021-03-13 09:20:52 +05:30
Peter Geoghegan	02b5940dbe	Consolidate nbtree VACUUM metapage routines. Simplify the _bt_vacuum_needs_cleanup() function's signature (it only needs a single 'rel' argument now), and move it next to its sibling function in nbtpage.c. I believe that _bt_vacuum_needs_cleanup() was originally located in nbtree.c due to an include dependency issue. That's no longer an issue. Follow-up to commit `9f3665fb`.	2021-03-12 13:11:47 -08:00
Tom Lane	f52c5d6749	Forbid marking an identity column as nullable. GENERATED ALWAYS AS IDENTITY implies NOT NULL, but the code failed to complain if you overrode that with "GENERATED ALWAYS AS IDENTITY NULL". One might think the old behavior was a feature, but it was inconsistent because the outcome varied depending on the order of the clauses, so it seems to have been just an oversight. Per bug #16913 from Pavel Boev. Back-patch to v10 where identity columns were introduced. Vik Fearing (minor tweaks by me) Discussion: https://postgr.es/m/16913-3b5198410f67d8c6@postgresql.org	2021-03-12 11:08:42 -05:00
Thomas Munro	1b88b8908e	Specialize checkpointer sort functions. When sorting a potentially large number of dirty buffers, the checkpointer can benefit from a faster sort routine. One reported improvement on a large buffer pool system was 1.4s -> 0.6s. Reviewed-by: Andres Freund <andres@anarazel.de> Discussion: https://postgr.es/m/CA%2BhUKGJ2-eaDqAum5bxhpMNhvuJmRDZxB_Tow0n-gse%2BHG0Yig%40mail.gmail.com	2021-03-12 23:56:02 +13:00
Amit Kapila	519e4c9ee2	Fix size overflow in calculation introduced by commits `d6ad34f3` and `bea449c6`. Reported-by: Thomas Munro Author: Takayuki Tsunakawa Reviewed-by: Kyotaro Horiguchi Discussion: https://postgr.es/m/CA+hUKG+oPoFizjABt=GXZWTEHx3oev5rAe2scjW2r6F1rguo5w@mail.gmail.com	2021-03-12 15:42:08 +05:30
Amit Kapila	e2cda3c20a	Fix use of relcache TriggerDesc field introduced by commit `05c8482f7f`. The commit added code which used a relcache TriggerDesc field across another cache access, which it shouldn't because the relcache doesn't guarantee it won't get moved. Diagnosed-by: Tom Lane Author: Greg Nancarrow Reviewed-by: Hou Zhijie, Amit Kapila Discussion: https://postgr.es/m/2309260.1615485644@sss.pgh.pa.us	2021-03-12 15:14:41 +05:30
Thomas Munro	57dcc2ef33	Poll postmaster less frequently in recovery. Since commits `9f095299` and `f98b8476` we don't poll the postmaster pipe at all during crash recovery on Linux and FreeBSD, but on other operating systems we were still doing it for every WAL record. Do it less frequently on operating systems where system calls are required, at the cost of delaying exit a bit after postmaster death. This avoids expensive system calls reported to slow down CPU-bound recovery by as much as 10-30%. Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi> Reviewed-by: Fujii Masao <masao.fujii@oss.nttdata.com> Reviewed-by: Michael Paquier <michael@paquier.xyz> Discussion: https://postgr.es/m/CA%2BhUKGK1607VmtrDUHQXrsooU%3Dap4g4R2yaoByWOOA3m8xevUQ%40mail.gmail.com Discussion: https://postgr.es/m/7261eb39-0369-f2f4-1bb5-62f3b6083b5e@iki.fi	2021-03-12 19:45:42 +13:00
Thomas Munro	de829ddf23	Add condition variable for walreceiver shutdown. Use this new CV to wait for walreceiver shutdown without a sleep/poll loop, while also benefiting from standard postmaster death handling. Discussion: https://postgr.es/m/CA%2BhUKGK1607VmtrDUHQXrsooU%3Dap4g4R2yaoByWOOA3m8xevUQ%40mail.gmail.com	2021-03-12 19:45:42 +13:00
Thomas Munro	600f2f50b7	Add condition variable for recovery resume. Replace a sleep loop with a CV, to get a fast reaction time when recovery is resumed or the postmaster exits via standard infrastructure. Unfortunately we still need to wake up every second to perform extra polling during the recovery pause loop. Discussion: https://postgr.es/m/CA%2BhUKGK1607VmtrDUHQXrsooU%3Dap4g4R2yaoByWOOA3m8xevUQ%40mail.gmail.com	2021-03-12 19:45:42 +13:00
Fujii Masao	b82640df00	Send statistics collected during shutdown checkpoint to the stats collector. When shutdown is requested, checkpointer performs checkpoint or restartpoint, and updates the statistics, before it exits. But previously checkpointer didn't send those statistics to the stats collector. Shutdown checkpoint and restartpoint are treated as requested ones instead of scheduled ones, so the number of them are counted in pg_stat_bgwriter.checkpoints_req column. Author: Masahiro Ikeda Reviewed-by: Fujii Masao Discussion: https://postgr.es/m/0509ad67b585a5b86a83d445dfa75392@oss.nttdata.com	2021-03-12 14:23:00 +09:00
Fujii Masao	33394ee6f2	Force to send remaining WAL stats to the stats collector at walwriter exit. In walwriter's main loop, WAL stats message is only sent if enough time has passed since last one was sent to reach PGSTAT_STAT_INTERVAL msecs. This is necessary to avoid overloading the stats collector. But this can cause recent WAL stats to be unsent when walwriter exits. To ensure that all the WAL stats are sent, this commit makes walwriter force sending of the remaining WAL stats to the collector when it exits because of shutdown request. Note that those remaining WAL stats can still be unsent when walwriter exits with non-zero exit code (e.g., FATAL error). This is OK because that walwriter exit leads to server crash and subsequent recovery discards all the stats. So there is no need to send remaining stats in that case. Author: Masahiro Ikeda Reviewed-by: Fujii Masao Discussion: https://postgr.es/m/0509ad67b585a5b86a83d445dfa75392@oss.nttdata.com	2021-03-12 13:29:59 +09:00
Thomas Munro	43c6662496	Minor modernization for README.barrier. Itanium is very uncommon and being discontinued. ARM is everywhere. Prefer ARM as an example of an architecture with weak memory ordering.	2021-03-12 15:36:16 +13:00
Peter Geoghegan	7bb97211a5	Save a few cycles during nbtree VACUUM. Avoid calling RelationGetNumberOfBlocks() unnecessarily in the common case where there are no deleted but not yet recycled pages to recycle during a cleanup-only nbtree VACUUM operation. Follow-up to commit `e5d8a999`, which (among other things) taught the "skip full scan" nbtree VACUUM mechanism to only trigger a full index scan when the absolute number of deleted pages in the index is considered excessive.	2021-03-11 14:18:23 -08:00
Peter Geoghegan	effdd3f3b6	Add back vacuum_cleanup_index_scale_factor parameter. Commit `9f3665fb` removed the vacuum_cleanup_index_scale_factor storage parameter. However, that creates dump/reload hazards when moving across major versions. Add back the vacuum_cleanup_index_scale_factor parameter (though not the GUC of the same name) purely to avoid problems when using tools like pg_upgrade. The parameter remains disabled and undocumented. No backpatch to Postgres 13, since vacuum_cleanup_index_scale_factor was only disabled by REL_13_STABLE's version of master branch commit `9f3665fb` in the first place -- the parameter already looks like this on REL_13_STABLE. Discussion: https://postgr.es/m/YEm/a3Ko3nKnBuVq@paquier.xyz	2021-03-11 12:42:46 -08:00
Robert Haas	32fd2b57d7	Be clear about whether a recovery pause has taken effect. Previously, the code and documentation seem to have essentially assumed that a call to pg_wal_replay_pause() would take effect immediately, but that's not the case, because we only check for a pause in certain places. This means that a tool that uses this function and then wants to do something else afterward that is dependent on the pause having taken effect doesn't know how long it needs to wait to be sure that no more WAL is going to be replayed. To avoid that, add a new function pg_get_wal_replay_pause_state() which returns either 'not paused', 'pause requested', or 'paused'. After calling pg_wal_replay_pause() the status will immediately change from 'not paused' to 'pause requested'; when the startup process has noticed this, the status will change to 'paused'. For backward compatibility, pg_is_wal_replay_paused() still exists and returns the same thing as before: true if a pause has been requested, whether or not it has taken effect yet; and false if not. The documentation is updated to clarify. To improve the chances that a pause request is quickly confirmed effective, adjust things so that WaitForWALToBecomeAvailable will swiftly reach a call to recoveryPausesHere() when a pause request is made. Dilip Kumar, reviewed by Simon Riggs, Kyotaro Horiguchi, Yugo Nagata, Masahiko Sawada, and Bharath Rupireddy. Discussion: http://postgr.es/m/CAFiTN-vcLLWEm8Zr%3DYK83rgYrT9pbC8VJCfa1kY9vL3AUPfu6g%40mail.gmail.com	2021-03-11 15:07:03 -05:00
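The three states and the backward-compatible boolean described in this commit can be modelled with a small Python sketch. It only illustrates the state transitions; the class and method names (`Recovery`, `request_pause`, `startup_notices_pause`) are hypothetical and do not correspond to PostgreSQL source identifiers.

```python
from enum import Enum


class PauseState(Enum):
    NOT_PAUSED = "not paused"
    PAUSE_REQUESTED = "pause requested"
    PAUSED = "paused"


class Recovery:
    """Toy model of the recovery pause state machine."""

    def __init__(self):
        self.state = PauseState.NOT_PAUSED

    def request_pause(self):
        # Models pg_wal_replay_pause(): the request is recorded at once,
        # but it has not taken effect yet.
        if self.state is PauseState.NOT_PAUSED:
            self.state = PauseState.PAUSE_REQUESTED

    def startup_notices_pause(self):
        # Models the startup process reaching one of its pause checks.
        if self.state is PauseState.PAUSE_REQUESTED:
            self.state = PauseState.PAUSED

    def get_pause_state(self):
        # Models pg_get_wal_replay_pause_state().
        return self.state.value

    def is_paused(self):
        # Models pg_is_wal_replay_paused(): true once a pause has been
        # requested, whether or not it has taken effect.
        return self.state is not PauseState.NOT_PAUSED
```

A tool that needs replay to have actually stopped would wait for `get_pause_state()` to report `'paused'` rather than rely on `is_paused()`.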
Peter Geoghegan	5f8727f5a6	VACUUM ANALYZE: Always update pg_class.reltuples. vacuumlazy.c sometimes fails to update pg_class entries for each index (to ensure that pg_class.reltuples is current), even though analyze.c assumed that that must have happened during VACUUM ANALYZE. There are at least a couple of reasons for this. For example, vacuumlazy.c could fail to update pg_class when the index AM indicated that its statistics are merely an estimate, per the contract for amvacuumcleanup() routines established by commit `e57345975c` back in 2006. Stop assuming that pg_class must have been updated with accurate statistics within VACUUM ANALYZE -- update pg_class for indexes at the same time as the table relation in all cases. That way VACUUM ANALYZE will never fail to keep pg_class.reltuples reasonably accurate. The only downside of this approach (compared to the old approach) is that it might inaccurately set pg_class.reltuples for indexes whose heap relation ends up with the same inaccurate value anyway. This doesn't seem too bad. We already consistently called vac_update_relstats() (to update pg_class) for the heap/table relation twice during any VACUUM ANALYZE -- once in vacuumlazy.c, and once in analyze.c. We now make sure that we call vac_update_relstats() at least once (though often twice) for each index. This is follow up work to commit `9f3665fb`, which dealt with issues in btvacuumcleanup(). Technically this fixes an unrelated issue, though. btvacuumcleanup() no longer provides an accurate num_index_tuples value following commit `9f3665fb` (when there was no btbulkdelete() call during the VACUUM operation in question), but hashvacuumcleanup() has worked in the same way for many years now. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Discussion: https://postgr.es/m/CAH2-WzknxdComjhqo4SUxVFk_Q1171GJO2ZgHZ1Y6pion6u8rA@mail.gmail.com Backpatch: 13-, just like commit `9f3665fb`.	2021-03-10 17:07:57 -08:00
Peter Geoghegan	9f3665fbfc	Don't consider newly inserted tuples in nbtree VACUUM. Remove the entire idea of "stale stats" within nbtree VACUUM (stop caring about stats involving the number of inserted tuples). Also remove the vacuum_cleanup_index_scale_factor GUC/param on the master branch (though just disable them on postgres 13). The vacuum_cleanup_index_scale_factor/stats interface made the nbtree AM partially responsible for deciding when pg_class.reltuples stats needed to be updated. This seems contrary to the spirit of the index AM API, though -- it is not actually necessary for an index AM's bulk delete and cleanup callbacks to provide accurate stats when it happens to be inconvenient. The core code owns that. (Index AMs have the authority to perform or not perform certain kinds of deferred cleanup based on their own considerations, such as page deletion and recycling, but that has little to do with pg_class.reltuples/num_index_tuples.) This issue was fairly harmless until the introduction of the autovacuum_vacuum_insert_threshold feature by commit `b07642db`, which had an undesirable interaction with the vacuum_cleanup_index_scale_factor mechanism: it made insert-driven autovacuums perform full index scans, even though there is no real benefit to doing so. This has been tied to a regression with an append-only insert benchmark [1]. Also have remaining cases that perform a full scan of an index during a cleanup-only nbtree VACUUM indicate that the final tuple count is only an estimate. This prevents vacuumlazy.c from setting the index's pg_class.reltuples in those cases (it will now only update pg_class when vacuumlazy.c had TIDs for nbtree to bulk delete). This arguably fixes an oversight in deduplication-related bugfix commit `48e12913`. 
[1] https://smalldatum.blogspot.com/2021/01/insert-benchmark-postgres-is-still.html Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Discussion: https://postgr.es/m/CAD21AoA4WHthN5uU6+WScZ7+J_RcEjmcuH94qcoUPuB42ShXzg@mail.gmail.com Backpatch: 13-, where autovacuum_vacuum_insert_threshold was added.	2021-03-10 16:27:01 -08:00
Bruce Momjian	845ac7f847	C comments: improve description of GiST NSN and GistBuildLSN GiST indexes are complex, so adding more details in the code might help someone. Discussion: https://postgr.es/m/20210302164021.GA364@momjian.us	2021-03-10 17:03:10 -05:00
Thomas Munro	d87251048a	Replace buffer I/O locks with condition variables. 1. Backends waiting for buffer I/O are now interruptible. 2. If something goes wrong in a backend that is currently performing I/O, waiting backends no longer wake up until that backend reaches AbortBufferIO() and broadcasts on the CV. Previously, any waiters would wake up (because the I/O lock was automatically released) and then busy-loop until AbortBufferIO() cleared BM_IO_IN_PROGRESS. 3. LWLockMinimallyPadded is removed, as it would now be unused. Author: Robert Haas <robertmhaas@gmail.com> Reviewed-by: Thomas Munro <thomas.munro@gmail.com> Reviewed-by: Julien Rouhaud <rjuju123@gmail.com> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> (earlier version, 2016) Discussion: https://postgr.es/m/CA%2BhUKGJ8nBFrjLuCTuqKN0pd2PQOwj9b_jnsiGFFMDvUxahj_A%40mail.gmail.com Discussion: https://postgr.es/m/CA+Tgmoaj2aPti0yho7FeEf2qt-JgQPRWb0gci_o1Hfr=C56Xng@mail.gmail.com	2021-03-11 10:36:17 +13:00
Tom Lane	c3ffe34863	Avoid creating duplicate cached plans for inherited FK constraints. When a foreign key constraint is applied to a partitioned table, each leaf partition inherits a similar FK constraint. We were processing all of those constraints independently, meaning that in large partitioning trees we'd build up large collections of cached FK-checking query plans. However, in all cases but one, the generated queries are actually identical for all members of the inheritance tree (because, in most cases, the query only mentions the topmost table of the other side of the FK relationship). So we can share a single cached plan among all the partitions, saving memory, not to mention time to build and maintain the cached plans. Keisuke Kuroda and Amit Langote Discussion: https://postgr.es/m/cab4b85d-9292-967d-adf2-be0d803c3e23@nttcom.co.jp_1	2021-03-10 14:22:31 -05:00
Peter Eisentraut	bbaf315309	Add bound check before bsearch() for performance In the current lazy vacuum implementation, some index AMs such as btree indexes call lazy_tid_reaped() for each index tuple during ambulkdelete to check if the index tuple points to the (collected) garbage tuple. In that function, we simply call bsearch(), but we should be able to know the result without bsearch() if the index tuple points to the heap tuple that is out of range of the collected garbage tuples. Therefore, add a simple bound check before resorting to bsearch(). Testing has shown that this can give significant performance benefits. Author: Masahiko Sawada <masahiko.sawada@2ndquadrant.com> Discussion: https://www.postgresql.org/message-id/flat/CA+fd4k76j8jKzJzcx8UqEugvayaMSnQz0iLUt_XgBp-_-bd22A@mail.gmail.com	2021-03-10 15:19:37 +01:00
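The idea of this commit can be sketched in Python as follows. This is not the PostgreSQL C implementation: `tid_reaped` is a hypothetical stand-in for lazy_tid_reaped(), TIDs are modelled as `(block, offset)` tuples, and `bisect_left` plays the role of bsearch().

```python
from bisect import bisect_left


def tid_reaped(dead_tids, tid):
    """Return True if tid is in the sorted list dead_tids."""
    if not dead_tids:
        return False
    # Cheap bound check: anything outside [first, last] cannot be present,
    # so the binary search can be skipped entirely.
    if tid < dead_tids[0] or tid > dead_tids[-1]:
        return False
    # Fall back to binary search for in-range TIDs.
    i = bisect_left(dead_tids, tid)
    return i < len(dead_tids) and dead_tids[i] == tid
```

The two comparisons pay off when many index tuples point at heap tuples outside the range of the collected garbage, which is the case the commit message targets.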
Peter Eisentraut	1657b37d7c	Small debug message tweak This makes the wording of the delete case match the update case.	2021-03-10 08:16:38 +01:00
Amit Kapila	e4e87a32cc	Fix valgrind issue in commit `05c8482f7f`. Initialize other newly added variables in max_parallel_hazard_context via is_parallel_safe() because we don't check the parallel-safety of target relations in that function. Reported-by: Tom Lane as per buildfarm Author: Amit Kapila Discussion: https://postgr.es/m/2060179.1615347455@sss.pgh.pa.us	2021-03-10 10:06:39 +05:30
Amit Kapila	05c8482f7f	Enable parallel SELECT for "INSERT INTO ... SELECT ...". Parallel SELECT can't be utilized for INSERT in the following cases: - INSERT statement uses the ON CONFLICT DO UPDATE clause - Target table has a parallel-unsafe: trigger, index expression or predicate, column default expression or check constraint - Target table has a parallel-unsafe domain constraint on any column - Target table is a partitioned table with a parallel-unsafe partition key expression or support function The planner is updated to perform additional parallel-safety checks for the cases listed above, for determining whether it is safe to run INSERT in parallel-mode with an underlying parallel SELECT. The planner will consider using parallel SELECT for "INSERT INTO ... SELECT ...", provided nothing unsafe is found from the additional parallel-safety checks, or from the existing parallel-safety checks for SELECT. While checking parallel-safety, we need to check it for all the partitions on the table which can be costly especially when we decide not to use a parallel plan. So, in a separate patch, we will introduce a GUC and/or a reloption to enable/disable parallelism for Insert statements. Prior to entering parallel-mode for the execution of INSERT with parallel SELECT, a TransactionId is acquired and assigned to the current transaction state. This is necessary to prevent the INSERT from attempting to assign the TransactionId whilst in parallel-mode, which is not allowed. This approach has a disadvantage in that if the underlying SELECT does not return any rows, then the TransactionId is not used, however that shouldn't happen in practice in many cases. 
Author: Greg Nancarrow, Amit Langote, Amit Kapila Reviewed-by: Amit Langote, Hou Zhijie, Takayuki Tsunakawa, Antonin Houska, Bharath Rupireddy, Dilip Kumar, Vignesh C, Zhihong Yu, Amit Kapila Tested-by: Tang, Haiying Discussion: https://postgr.es/m/CAJcOf-cXnB5cnMKqWEp2E2z7Mvcd04iLVmV=qpFJrR3AcrTS3g@mail.gmail.com Discussion: https://postgr.es/m/CAJcOf-fAdj=nDKMsRhQzndm-O13NY4dL6xGcEvdX5Xvbbi0V7g@mail.gmail.com	2021-03-10 07:38:58 +05:30
Fujii Masao	ff99918c62	Track total amounts of times spent writing and syncing WAL data to disk. This commit adds new GUC track_wal_io_timing. When this is enabled, the total amounts of time XLogWrite writes and issue_xlog_fsync syncs WAL data to disk are counted in pg_stat_wal. This information would be useful to check how much WAL write and sync affect the performance. Enabling track_wal_io_timing will make the server query the operating system for the current time every time WAL is written or synced, which may cause significant overhead on some platforms. To avoid such additional overhead in the server with track_io_timing enabled, this commit introduces track_wal_io_timing as a separate parameter from track_io_timing. Note that WAL write and sync activity by walreceiver has not been tracked yet. This commit makes the server also track the numbers of times XLogWrite writes and issue_xlog_fsync syncs WAL data to disk, in pg_stat_wal, regardless of the setting of track_wal_io_timing. These counters can be used to calculate the WAL write and sync time per request, for example. Bump PGSTAT_FILE_FORMAT_ID. Bump catalog version. Author: Masahiro Ikeda Reviewed-By: Japin Li, Hayato Kuroda, Masahiko Sawada, David Johnston, Fujii Masao Discussion: https://postgr.es/m/0509ad67b585a5b86a83d445dfa75392@oss.nttdata.com	2021-03-09 16:52:06 +09:00
Michael Paquier	9d2d457009	Add support for more progress reporting in COPY The command (TO or FROM), its type (file, pipe, program or callback), and the number of tuples excluded by a WHERE clause in COPY FROM are added to the progress reporting already available. The column "lines_processed" is renamed to "tuples_processed" to disambiguate the meaning of this column in the cases of CSV and BINARY COPY and to be more consistent with the other catalog progress views. Bump catalog version, again. Author: Matthias van de Meent Reviewed-by: Michael Paquier, Justin Pryzby, Bharath Rupireddy, Josef Šimánek, Tomas Vondra Discussion: https://postgr.es/m/CAEze2WiOcgdH4aQA8NtZq-4dgvnJzp8PohdeKchPkhMY-jWZXA@mail.gmail.com	2021-03-09 14:21:03 +09:00
Michael Paquier	f9264d1524	Remove support for SSL compression PostgreSQL disabled compression as of `e3bdb2d` and the documentation has recommended against using it since. Additionally, SSL compression has been disabled in OpenSSL since version 1.1.0, and was disabled in many distributions long before that. The most recent TLS version, TLSv1.3, disallows compression at the protocol level. This commit removes the feature itself, removing support for the libpq parameter sslcompression (parameter still listed for compatibility reasons with existing connection strings, just ignored), and removes the equivalent field in pg_stat_ssl and de facto PgBackendSSLStatus. Note that, on top of removing the ability to activate compression by configuration, compression is actively disabled in both frontend and backend to avoid overrides from local configurations. A TAP test is added for deprecated SSL parameters to check backwards compatibility. Bump catalog version. Author: Daniel Gustafsson Reviewed-by: Peter Eisentraut, Magnus Hagander, Michael Paquier Discussion: https://postgr.es/m/7E384D48-11C5-441B-9EC3-F7DB1F8518F6@yesql.se	2021-03-09 11:16:47 +09:00
Tom Lane	d4545dc19b	Complain if a function-in-FROM returns a set when it shouldn't. Throw a "function protocol violation" error if a function in FROM tries to return a set though it wasn't marked proretset. Although such cases work at the moment, it doesn't seem like something we want to guarantee will keep working. Besides, there are other negative consequences of not setting the proretset flag, such as potentially bad plans. No back-patch, since if there is any third-party code violating this expectation, people wouldn't appreciate us breaking it in a minor release. Discussion: https://postgr.es/m/1636062.1615141782@sss.pgh.pa.us	2021-03-08 18:54:55 -05:00
Tom Lane	5c06abb9b9	Validate the OID argument of pg_import_system_collations(). "SELECT pg_import_system_collations(0)" caused an assertion failure. With a random nonzero argument --- or indeed with zero, in non-assert builds --- it would happily make pg_collation entries with garbage values of collnamespace. These are harmless as far as I can tell (unless maybe the OID happens to become used for a schema, later on?). In any case this isn't a security issue, since the function is superuser-only. But it seems like a gotcha for unwary DBAs, so let's add a check that the given OID belongs to some schema. Back-patch to v10 where this function was introduced.	2021-03-08 18:21:51 -05:00
Tom Lane	6c20bdb2a2	Further tweak memory management for regex DFAs. Coverity is still unhappy after commit `190c79884`, and after looking closer I think it might be onto something. The callers of newdfa() typically drop out if v->err has been set nonzero, which newdfa() is faithfully doing if it fails. However, what if v->err was already nonzero before we entered newdfa()? Then newdfa() could succeed and the caller would promptly leak its result. I don't think this scenario can actually happen, but the predicate "v->err is always zero when newdfa() is called" seems difficult to be entirely sure of; there's a good deal of code that potentially could get that wrong. It seems better to adjust the callers to directly check for a null result instead of relying on ISERR() tests. This is slightly cheaper than the previous coding anyway. Lacking evidence that there's any real bug, no back-patch.	2021-03-08 16:32:29 -05:00
Amit Kapila	8a812e5106	Track replication origin progress for rollbacks. Commit `1eb6d6527a` allowed tracking replica origin replay progress for 2PC, but it was not complete. It failed to properly track the progress for rollback prepared; in particular, it missed updating the code for recovery. Additionally, we need to allow tracking it on subscriber nodes where wal_level might not be logical. This is required to track decoding of 2PC, which was committed in PG14 (`a271a1b50e`); since nobody has complained about this until now, it is not being backpatched. Author: Amit Kapila Reviewed-by: Michael Paquier and Ajin Cherian Discussion: https://postgr.es/m/CAA4eK1L-kHmMnSdrRW6UhRbCjR7cgh04c+6psY15qzT6ktcd+g@mail.gmail.com	2021-03-08 07:54:03 +05:30
Heikki Linnakangas	3174d69fb9	Remove server and libpq support for old FE/BE protocol version 2. Protocol version 3 was introduced in PostgreSQL 7.4. There shouldn't be many clients or servers left out there without version 3 support. But as a courtesy, I kept just enough of the old protocol support that we can still send the "unsupported protocol version" error in v2 format, so that old clients can display the message properly. Likewise, libpq still understands v2 ErrorResponse messages when establishing a connection. The impetus to do this now is that I'm working on a patch to COPY FROM, to always prefetch some data. We cannot do that safely with the old protocol, because it requires parsing the input one byte at a time to detect the end-of-copy marker. Reviewed-by: Tom Lane, Alvaro Herrera, John Naylor Discussion: https://www.postgresql.org/message-id/9ec25819-0a8a-d51a-17dc-4150bb3cca3b%40iki.fi	2021-03-04 10:45:55 +02:00
Tom Lane	0a687c8f10	Add trim_array() function. This has been in the SQL spec since 2008. It's a pretty thin wrapper around the array slice functionality, but the spec says we should have it, so here it is. Vik Fearing, reviewed by Dian Fay Discussion: https://postgr.es/m/fc92ce17-9655-8ff1-c62a-4dc4c8ccd815@postgresfriends.org	2021-03-03 16:39:57 -05:00
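The "thin wrapper around the array slice functionality" can be illustrated with a short Python sketch of the semantics for a one-dimensional array. This is not the PostgreSQL implementation; the range check and its message wording are assumptions about how out-of-range counts are rejected.

```python
def trim_array(arr, n):
    """Return arr without its last n elements (one-dimensional sketch)."""
    # Assumption: counts outside 0..len(arr) are rejected, not clamped.
    if n < 0 or n > len(arr):
        raise ValueError(
            f"number of elements to trim must be between 0 and {len(arr)}"
        )
    # The whole operation reduces to a single slice.
    return arr[: len(arr) - n]
```

For example, trimming two elements from a six-element list leaves the first four.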
Peter Eisentraut	e527a99055	Some copy-editing of GUC descriptions	2021-03-03 07:14:35 +01:00
Thomas Munro	8eda3eba30	Use sort_template.h for qsort_tuple() and qsort_ssup(). Replace the Perl code previously used to generate specialized sort functions with sort_template.h. Reviewed-by: Daniel Gustafsson <daniel@yesql.se> Discussion: https://postgr.es/m/CA%2BhUKGJ2-eaDqAum5bxhpMNhvuJmRDZxB_Tow0n-gse%2BHG0Yig%40mail.gmail.com	2021-03-03 17:02:32 +13:00
Amit Kapila	19890a064e	Add option to enable two_phase commits via pg_create_logical_replication_slot. Commit `0aa8a01d04` extended the output plugin API to allow decoding of prepared xacts and allowed the user to enable/disable the two-phase option via pg_logical_slot_get_changes(). This can lead to a problem: the first time changes are fetched via pg_logical_slot_get_changes() without the two_phase option enabled, the prepare is not returned even though it comes after the consistent snapshot. The next time changes are fetched, if the two_phase option is enabled, the prepare can be skipped because by that time the start decoding point has moved. So the user will only get the commit prepared. Allow enabling/disabling this option at slot creation time; the default is false. It will break existing slots, which is fine in a major release. Author: Ajin Cherian Reviewed-by: Amit Kapila and Vignesh C Discussion: https://postgr.es/m/d0f60d60-133d-bf8d-bd70-47784d8fabf3@enterprisedb.com	2021-03-03 07:34:11 +05:30
Peter Geoghegan	5b2f2af3d9	nbtree page deletion: Add leaftopparent assertion. Add documenting assertion. This makes it easier to follow how we maintain the top parent link in target subtree's half-dead/leaf level page.	2021-03-02 14:06:07 -08:00
Peter Geoghegan	3d8d5787a3	Fix nbtree page deletion error messages. Adjust some "can't happen" error messages that assumed that the page deletion target page must be a half-dead page. This assumption was wrong in the case of an internal target page. Simply refer to these pages as the target page instead. Internal pages are never marked half-dead. There is exactly one half-dead page for each subtree undergoing deletion. The half-dead page is also the target subtree's leaf-level page. This has been the case since commit `efada2b8`, which totally overhauled nbtree page deletion.	2021-03-02 13:02:24 -08:00
Tom Lane	d16f8c8e41	Mark default_transaction_read_only as GUC_REPORT. This allows clients to find out the setting at connection time without having to expend a query round trip to do so; which is helpful when trying to identify read/write servers. (One must also look at in_hot_standby, but that's already GUC_REPORT, cf bf8a662c9.) Modifying libpq to make use of this will come soon, but I felt it cleaner to push the server change separately. Haribabu Kommi, Greg Nancarrow, Vignesh C; reviewed at various times by Laurenz Albe, Takayuki Tsunakawa, Peter Smith. Discussion: https://postgr.es/m/CAF3+xM+8-ztOkaV9gHiJ3wfgENTq97QcjXQt+rbFQ6F7oNzt9A@mail.gmail.com	2021-03-02 13:53:54 -05:00
Tom Lane	4604f83fdf	Suppress unnecessary regex subre nodes in a couple more cases. This extends the changes made in commit `cebc1d34e`, teaching parseqatom() to generate fewer or cheaper subre nodes in some edge cases. The case of interest here is a quantified atom that is "messy" only because it has greediness opposite to what preceded it (whereas captures and backrefs are intrinsically messy). In this case we don't need an iteration node, since we don't care where the sub-matches of the quantifier are; and we might also not need a second concatenation node. This seems of only marginal real-world use according to my testing, but I wanted to get it in before wrapping up this series of regex performance fixes. Discussion: https://postgr.es/m/1340281.1613018383@sss.pgh.pa.us	2021-03-02 12:14:14 -05:00
Tom Lane	0c3405cf11	Improve performance of regular expression back-references. In some cases, at the time that we're doing an NFA-based precheck of whether a backref subexpression can match at a particular place in the string, we already know which substring the referenced subexpression matched. If so, we might as well forget about the NFA and just compare the substring; this is faster and it gives an exact rather than approximate answer. In general, this optimization can help while we are prechecking within the second child expression of a concat node, while the capture was within the first child expression; then the substring was saved during cdissect() of the first child and will be available to NFA checks done while cdissect() recurses into the second child. It can help quite a lot if the tree looks like concat(capture, concat(expensive stuff, backref)), as we will be able to avoid recursively dissecting the "expensive stuff" before discovering that the backref isn't satisfied with a particular midpoint that the lower concat node is testing. This doesn't help if the concat tree is left-deep, as the capture node won't get set soon enough (and it's hard to fix that without changing the engine's match behavior). Fortunately, right-deep concat trees are the common case. Patch by me, reviewed by Joel Jacobson Discussion: https://postgr.es/m/661609.1614560029@sss.pgh.pa.us	2021-03-02 11:55:12 -05:00
Tom Lane	4aea704a5b	Fix semantics of regular expression back-references. POSIX defines the behavior of back-references thus: The back-reference expression '\n' shall match the same (possibly empty) string of characters as was matched by a subexpression enclosed between "\(" and "\)" preceding the '\n'. As far as I can see, the back-reference is supposed to consider only the data characters matched by the referenced subexpression. However, because our engine copies the NFA constructed from the referenced subexpression, it effectively enforces any constraints therein, too. As an example, '(^.)\1' ought to match 'xx', or any other string starting with two occurrences of the same character; but in our code it does not, and indeed can't match anything, because the '^' anchor constraint is included in the backref's copied NFA. If POSIX intended that, you'd think they'd mention it. Perl for one doesn't act that way, so it's hard to conclude that this isn't a bug. Fix by modifying the backref's NFA immediately after it's copied from the reference, replacing all constraint arcs by EMPTY arcs so that the constraints are treated as automatically satisfied. This still allows us to enforce matching rules that depend only on the data characters; for example, in '(^\d+).*\1' the NFA matching step will still know that the backref can only match strings of digits. Perhaps surprisingly, this change does not affect the results of any of a rather large corpus of real-world regexes. Nonetheless, I would not consider back-patching it, since it's a clear compatibility break. Patch by me, reviewed by Joel Jacobson Discussion: https://postgr.es/m/661609.1614560029@sss.pgh.pa.us	2021-03-02 11:34:53 -05:00
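The intended semantics can be checked against a Perl-style engine. The sketch below uses Python's `re` module, which is not PostgreSQL's regex engine but treats a back-reference as matching only the captured characters, the behavior the commit message attributes to Perl. The input strings other than 'xx' are made up for the example.

```python
import re

# '(^.)\1': the back-reference repeats the captured character; the '^'
# anchor inside the group is not re-applied at the back-reference.
m = re.search(r"(^.)\1", "xx")
matched = m.group(0) if m else None

# No match when the second character differs from the first.
no_match = re.search(r"(^.)\1", "xy")

# '(^\d+).*\1': the back-reference can only match the captured digits.
digits = re.search(r"(^\d+).*\1", "12ab12")
digits_matched = digits.group(0) if digits else None
```

Here `matched` is `'xx'`, which is the result the commit says PostgreSQL's engine previously failed to produce.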
Michael Paquier	fabde52fab	Simplify code to switch pg_class.relrowsecurity in tablecmds.c The same code pattern was repeated twice to enable or disable ROW LEVEL SECURITY with an ALTER TABLE command. This makes the code slightly cleaner. Author: Justin Pryzby Reviewed-by: Zhihong Yu Discussion: https://postgr.es/m/20210228211854.GC20769@telsasoft.com	2021-03-02 12:30:21 +09:00
Tom Lane	ffd3944ab9	Improve reporting for syntax errors in multi-line JSON data. Point to the specific line where the error was detected; the previous code tended to include several preceding lines as well. Avoid re-scanning the entire input to recompute which line that was. Simplify the logic a bit. Add test cases. Simon Riggs and Hamid Akhtar, reviewed by Daniel Gustafsson and myself Discussion: https://postgr.es/m/CANbhV-EPBnXm3MF_TTWBwwqgn1a1Ghmep9VHfqmNBQ8BT0f+_g@mail.gmail.com	2021-03-01 16:44:17 -05:00
Thomas Munro	bd69ddfcdb	Remove obsolete comment for WaitForProcSignalBarrier(). Commit `814f1d8b` removed the behavior described. Reported-by: Amit Kapila <amit.kapila16@gmail.com>	2021-03-02 09:30:57 +13:00
Thomas Munro	f5a5773a9d	Allow condition variables to be used in interrupt code. Adjust the condition variable sleep loop to work correctly when code reached by its internal CHECK_FOR_INTERRUPTS() call interacts with another condition variable. There are no such cases currently, but a proposed patch would do this. Discussion: https://postgr.es/m/CA+hUKGLdemy2gBm80kz20GTe6hNVwoErE8KwcJk6-U56oStjtg@mail.gmail.com	2021-03-01 17:24:47 +13:00
Thomas Munro	814f1d8bc3	Use condition variables for ProcSignalBarriers. Instead of a poll/sleep loop, use a condition variable for precise wake-up whenever a backend's pss_barrierGeneration advances. Discussion: https://postgr.es/m/CA+hUKGLdemy2gBm80kz20GTe6hNVwoErE8KwcJk6-U56oStjtg@mail.gmail.com	2021-03-01 17:23:43 +13:00
Amit Kapila	8bdb1332eb	Avoid repeated decoding of prepared transactions after a restart. In commit `a271a1b50e`, we allowed decoding at prepare time and the prepare was decoded again if there is a restart after decoding it. It was done that way because we can't distinguish between the cases where we have not decoded the prepare because it was prior to consistent snapshot or we have decoded it earlier but restarted. To distinguish between these two cases, we have introduced an initial_consistent_point at the slot level which is an LSN at which we found a consistent point at the time of slot creation. This is also the point where we have exported a snapshot for the initial copy. So, prepared transactions prior to this point are sent along with commit prepared. This commit bumps SNAPBUILD_VERSION because of change in SnapBuild. It will break existing slots which is fine in a major release. Author: Ajin Cherian, based on idea by Andres Freund Reviewed-by: Amit Kapila and Vignesh C Discussion: https://postgr.es/m/d0f60d60-133d-bf8d-bd70-47784d8fabf3@enterprisedb.com	2021-03-01 09:11:18 +05:30
Thomas Munro	6230912f23	Use FeBeWaitSet for walsender.c. This avoids the need to set up and tear down a fresh WaitEventSet every time we need to wait. We have to add an explicit exit on postmaster exit (FeBeWaitSet isn't set up to do that automatically), so move the code to do that into a new function to avoid repetition. Reviewed-by: Kyotaro Horiguchi <horikyota.ntt@gmail.com> (earlier version) Discussion: https://postgr.es/m/CA%2BhUKGJAC4Oqao%3DqforhNey20J8CiG2R%3DoBPqvfR0vOJrFysGw%40mail.gmail.com	2021-03-01 16:19:38 +13:00
Thomas Munro	a042ba2ba7	Introduce symbolic names for FeBeWaitSet positions. Previously we used 0 and 1 to refer to the socket and latch in far flung parts of the tree, without any explanation. Also use PGINVALID_SOCKET rather than -1 in a couple of places that didn't already do that. Reviewed-by: Kyotaro Horiguchi <horikyota.ntt@gmail.com> Discussion: https://postgr.es/m/CA%2BhUKGJAC4Oqao%3DqforhNey20J8CiG2R%3DoBPqvfR0vOJrFysGw%40mail.gmail.com	2021-03-01 16:10:16 +13:00
Amit Kapila	b4e3dc7fd4	Update the docs and comments for decoding of prepared xacts. Commit `a271a1b50e` introduced decoding at prepare time in ReorderBuffer. This can lead to deadlock for out-of-core logical replication solutions that use this feature to build distributed 2PC in case such transactions lock [user] catalog tables exclusively. They need to inform users to not have locks on catalog tables (via explicit LOCK command) in such transactions. Reported-by: Andres Freund Discussion: https://postgr.es/m/20210222222847.tpnb6eg3yiykzpky@alap3.anarazel.de	2021-03-01 08:14:33 +05:30
Thomas Munro	6148656a0b	Use EVFILT_SIGNAL for kqueue latches. Cut down on system calls and other overheads by waiting for SIGURG explicitly with kqueue instead of using a signal handler and self-pipe. Affects *BSD and macOS systems. This leaves only the poll implementation with a signal handler and the traditional self-pipe trick. Discussion: https://postgr.es/m/CA+hUKGJjxPDpzBE0a3hyUywBvaZuC89yx3jK9RFZgfv_KHU7gg@mail.gmail.com	2021-03-01 14:20:04 +13:00
Thomas Munro	6a2a70a020	Use signalfd(2) for epoll latches. Cut down on system calls and other overheads by reading from a signalfd instead of using a signal handler and self-pipe. Affects Linux systems, and possibly others including illumos that implement the Linux epoll and signalfd interfaces. Reviewed-by: Andres Freund <andres@anarazel.de> Discussion: https://postgr.es/m/CA+hUKGJjxPDpzBE0a3hyUywBvaZuC89yx3jK9RFZgfv_KHU7gg@mail.gmail.com	2021-03-01 14:12:02 +13:00
Thomas Munro	83709a0d5a	Use SIGURG rather than SIGUSR1 for latches. Traditionally, SIGUSR1 has been overloaded for ad-hoc signals, procsignal.c signals and latch.c wakeups. Move that last use over to a new dedicated signal. SIGURG is normally used to report out-of-band socket data, but PostgreSQL doesn't use that facility. The signal handler is now installed in all postmaster children by InitializeLatchSupport(). Those wishing to disconnect from it should call ShutdownLatchSupport(). Future patches will use this separation of signals to avoid the need for a signal handler on some operating systems. Discussion: https://postgr.es/m/CA+hUKGJjxPDpzBE0a3hyUywBvaZuC89yx3jK9RFZgfv_KHU7gg@mail.gmail.com	2021-03-01 12:44:12 +13:00
Thomas Munro	c8f3bc2401	Optimize latches to send fewer signals. Don't send signals to processes that aren't sleeping. Author: Andres Freund <andres@anarazel.de> Discussion: https://postgr.es/m/CA+hUKGJjxPDpzBE0a3hyUywBvaZuC89yx3jK9RFZgfv_KHU7gg@mail.gmail.com	2021-03-01 12:44:12 +13:00
Thomas Munro	d1b90995e8	Remove latch.c workaround for Linux < 2.6.27. Commit `82ebbeb0` added a workaround for systems with no epoll_create1() and EPOLL_CLOEXEC. Linux < 2.6.27 and glibc < 2.9 are long gone. Now seems like a good time to drop the extra code, because otherwise we'd have to add similar already-dead workaround code to new patches using XXX_CLOEXEC flags that arrived in the same kernel release. Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/CA%2BhUKGKL_%3DaO%3Dr30N%3Ds9VoDgTqHpRSzePRbA9dkYO7snc7HsxA%40mail.gmail.com	2021-03-01 11:24:28 +13:00
Alvaro Herrera	25936fd46c	Fix use-after-free bug with AfterTriggersTableData.storeslot AfterTriggerSaveEvent() wrongly allocates the slot in execution-span memory context, whereas the correct thing is to allocate it in a transaction-span context, because that's where the enclosing AfterTriggersTableData instance belongs. Backpatch to 12 (the test back to 11, where it works well with no code changes, and it's good to have to confirm that the case was previously well supported); this bug seems introduced by commit `ff11e7f4b9`. Reported-by: Bertrand Drouvot <bdrouvot@amazon.com> Author: Amit Langote <amitlangote09@gmail.com> Discussion: https://postgr.es/m/39a71864-b120-5a5c-8cc5-c632b6f16761@amazon.com	2021-02-27 18:09:15 -03:00
David Rowley	977b2c0853	Add missing TidRangeScan readfunc Mistakenly forgotten in `bb437f995`	2021-02-27 23:21:21 +13:00
David Rowley	bb437f995d	Add TID Range Scans to support efficient scanning ranges of TIDs This adds a new executor node named TID Range Scan. The query planner will generate paths for TID Range scans when quals are discovered on base relations which search for ranges on the table's ctid column. These ranges may be open at either end. For example, WHERE ctid >= '(10,0)'; will return all tuples on page 10 and over. To support this, two new optional callback functions have been added to table AM. scan_set_tidrange is used to set the scan range to just the given range of TIDs. scan_getnextslot_tidrange fetches the next tuple in the given range. For AMs where scanning ranges of TIDs would not make sense, these functions can be set to NULL in the TableAmRoutine. The query planner won't generate TID Range Scan Paths in that case. Author: Edmund Horner, David Rowley Reviewed-by: David Rowley, Tomas Vondra, Tom Lane, Andres Freund, Zhihong Yu Discussion: https://postgr.es/m/CAMyN-kB-nFTkF=VA_JPwFNo08S0d-Yk0F741S2B7LDmYAi8eyA@mail.gmail.com	2021-02-27 22:59:36 +13:00
Peter Eisentraut	f4adc41c4f	Enhanced cycle mark values Per SQL:202x draft, in the CYCLE clause of a recursive query, the cycle mark values can be of type boolean and can be omitted, in which case they default to TRUE and FALSE. Reviewed-by: Vik Fearing <vik@postgresfriends.org> Discussion: https://www.postgresql.org/message-id/flat/db80ceee-6f97-9b4a-8ee8-3ba0c58e5be2@2ndquadrant.com	2021-02-27 08:13:24 +01:00
Tom Lane	0fc1af174c	Improve memory management in regex compiler. The previous logic here created a separate pool of arcs for each state, so that the out-arcs of each state were physically stored within it. Perhaps this choice was driven by trying to not include a "from" pointer within each arc; but Spencer gave up on that idea long ago, and it's hard to see what the value is now. The approach turns out to be fairly disastrous in terms of memory consumption, though. In the first place, NFAs built by this engine seem to have about 4 arcs per state on average, with a majority having only one or two out-arcs. So pre-allocating 10 out-arcs for each state is already cause for a factor of two or more bloat. Worse, the NFA optimization phase moves arcs around with abandon. In a large NFA, some of the states will have hundreds of out-arcs, so towards the end of the optimization phase we have a significant number of states whose arc pools have room for hundreds of arcs each, even though only a few of those arcs are in use. We have seen real-world regexes in which this effect bloats the memory requirement by 25X or even more. Hence, get rid of the per-state arc pools in favor of a single arc pool for the whole NFA, with variable-sized allocation batches instead of always asking for 10 at a time. While we're at it, let's batch the allocations of state structs too, to further reduce the malloc traffic. This incidentally allows moveouts() to be optimized in a similar way to moveins(): when moving an arc to another state, it's now valid to just re-link the same arc struct into a different outchain, where before the code invariants required us to make a physically new arc and then free the old one. These changes reduce the regex compiler's typical space consumption for average-size regexes by about a factor of two, and much more for large or complicated regexes. 
In a large test set of real-world regexes, we formerly had half a dozen cases that failed with "regular expression too complex" due to exceeding the REG_MAX_COMPILE_SPACE limit (about 150MB); we would have had to raise that limit to something close to 400MB to make them work with the old code. Now, none of those cases need more than 13MB to compile. Furthermore, the test set is about 10% faster overall due to less malloc traffic. Discussion: https://postgr.es/m/168861.1614298592@sss.pgh.pa.us	2021-02-26 13:52:10 -05:00
Thomas Munro	8556267b2b	Revert "pg_collation_actual_version() -> pg_collation_current_version()." This reverts commit `9cf184cc05`. Name change less well received than anticipated. Discussion: https://postgr.es/m/afcfb97e-88a1-a540-db95-6c573b93bc2b%40eisentraut.org	2021-02-26 15:29:27 +13:00
Tom Lane	80ca8464fe	Fix list-manipulation bug in WITH RECURSIVE processing. makeDependencyGraphWalker and checkWellFormedRecursionWalker thought they could hold onto a pointer to a list's first cons cell while the list was modified by recursive calls. That was okay when the cons cell was actually separately palloc'd ... but since commit `1cff1b95a`, it's quite unsafe, leading to core dumps or incorrect complaints of faulty WITH nesting. In the field this'd require at least a seven-deep WITH nest to cause an issue, but enabling DEBUG_LIST_MEMORY_USAGE allows the bug to be seen with lesser nesting depths. Per bug #16801 from Alexander Lakhin. Back-patch to v13. Michael Paquier and Tom Lane Discussion: https://postgr.es/m/16801-393c7922143eaa4d@postgresql.org	2021-02-25 20:47:32 -05:00
Peter Geoghegan	2376361839	VACUUM VERBOSE: Count "newly deleted" index pages. Teach VACUUM VERBOSE to report on pages deleted by the _current_ VACUUM operation -- these are newly deleted pages. VACUUM VERBOSE continues to report on the total number of deleted pages in the entire index (no change there). The former is a subset of the latter. The distinction between each category of deleted index page only arises with index AMs where page deletion is supported and is decoupled from page recycling for performance reasons. This is follow-up work to commit `e5d8a999`, which made nbtree store 64-bit XIDs (not 32-bit XIDs) in pages at the point at which they're deleted. Note that the btm_last_cleanup_num_delpages metapage field added by that commit usually gets set to pages_newly_deleted. The exceptions (the scenarios in which they're not equal) all seem to be tricky cases for the implementation (of page deletion and recycling) in general. Author: Peter Geoghegan <pg@bowt.ie> Discussion: https://postgr.es/m/CAH2-WznpdHvujGUwYZ8sihX%3Dd5u-tRYhi-F4wnV2uN2zHpMUXw%40mail.gmail.com	2021-02-25 14:32:18 -08:00
Tom Lane	301ed8812e	Doc: remove src/backend/regex/re_syntax.n. We aren't publishing this file as documentation, and it's been much more haphazardly maintained than the real docs in func.sgml, so let's just drop it. I think the only reason I included it in commit `7bcc6d98f` was that the Berkeley-era sources had had a man page in this directory. Discussion: https://postgr.es/m/4099447.1614186542@sss.pgh.pa.us	2021-02-25 13:33:27 -05:00
Tom Lane	7dc13a0f08	Change regex \D and \W shorthands to always match newlines. Newline is certainly not a digit, nor a word character, so it is sensible that it should match these complemented character classes. Previously, \D and \W acted that way by default, but in newline-sensitive mode ('n' or 'p' flag) they did not match newlines. This behavior was previously forced because explicit complemented character classes don't match newlines in newline-sensitive mode; but as of the previous commit that implementation constraint no longer exists. It seems useful to change this because the primary real-world use for newline-sensitive mode seems to be to match the default behavior of other regex engines such as Perl and Javascript ... and their default behavior is that these match newlines. The old behavior can be kept by writing an explicit complemented character class, i.e. [^[:digit:]] or [^[:word:]]. (This means that \D and \W are not exactly equivalent to those strings, but they weren't anyway.) Discussion: https://postgr.es/m/3220564.1613859619@sss.pgh.pa.us	2021-02-25 13:29:06 -05:00
Tom Lane	2a0af7fe46	Allow complemented character class escapes within regex brackets. The complement-class escapes \D, \S, \W are now allowed within bracket expressions. There is no semantic difficulty with doing that, but the rather hokey macro-expansion-based implementation previously used here couldn't cope. Also, invent "word" as an allowed character class name, thus "\w" is now equivalent to "[[:word:]]" outside brackets, or "[:word:]" within brackets. POSIX allows such implementation-specific extensions, and the same name is used in e.g. bash. One surprising compatibility issue this raises is that constructs such as "[\w-_]" are now disallowed, as our documentation has always said they should be: character classes can't be endpoints of a range. Previously, because \w was just a macro for "[:alnum:]_", such a construct was read as "[[:alnum:]_-_]", so it was accepted so long as the character after "-" was numerically greater than or equal to "_". Some implementation cleanup along the way: * Remove the lexnest() hack, and in consequence clean up wordchrs() to not interact with the lexer. * Fix colorcomplement() to not be O(N^2) in the number of colors involved. * Get rid of useless-as-far-as-I-can-see calls of element() on single-character character element names in brackpart(). element() always maps these to the character itself, and things would be quite broken if it didn't --- should "[a]" match something different than "a" does? Besides, the shortcut path in brackpart() wasn't doing this anyway, making it even more inconsistent. Discussion: https://postgr.es/m/2845172.1613674385@sss.pgh.pa.us Discussion: https://postgr.es/m/3220564.1613859619@sss.pgh.pa.us	2021-02-25 13:00:40 -05:00
Peter Geoghegan	e5d8a99903	Use full 64-bit XIDs in deleted nbtree pages. Otherwise we risk "leaking" deleted pages by making them non-recyclable indefinitely. Commit `6655a729` did the same thing for deleted pages in GiST indexes. That work was used as a starting point here. Stop storing an XID indicating the oldest btpo.xact across all deleted though unrecycled pages in nbtree metapages. There is no longer any reason to care about that condition/the oldest XID. It only ever made sense when wraparound was something _bt_vacuum_needs_cleanup() had to consider. The btm_oldest_btpo_xact metapage field has been repurposed and renamed. It is now btm_last_cleanup_num_delpages, which is used to remember how many non-recycled deleted pages remain from the last VACUUM (in practice its value is usually the precise number of pages that were _newly deleted_ during the specific VACUUM operation that last set the field). The general idea behind storing btm_last_cleanup_num_delpages is to use it to give _some_ consideration to non-recycled deleted pages inside _bt_vacuum_needs_cleanup() -- though never too much. We only really need to avoid leaving a truly excessive number of deleted pages in an unrecycled state forever. We only do this to cover certain narrow cases where no other factor makes VACUUM do a full scan, and yet the index continues to grow (and so actually misses out on recycling existing deleted pages). These metapage changes result in a clear user-visible benefit: We no longer trigger full index scans during VACUUM operations solely due to the presence of only 1 or 2 known deleted (though unrecycled) blocks from a very large index. All that matters now is keeping the costs and benefits in balance over time. Fix an issue that has been around since commit `857f9c36`, which added the "skip full scan of index" mechanism (i.e. the _bt_vacuum_needs_cleanup() logic). 
The accuracy of btm_last_cleanup_num_heap_tuples accidentally hinged upon _when_ the source value gets stored. We now always store btm_last_cleanup_num_heap_tuples in btvacuumcleanup(). This fixes the issue because IndexVacuumInfo.num_heap_tuples (the source field) is expected to accurately indicate the state of the table _after_ the VACUUM completes inside btvacuumcleanup(). A backpatchable fix cannot easily be extracted from this commit. A targeted fix for the issue will follow in a later commit, though that won't happen today. I (pgeoghegan) have chosen to remove any mention of deleted pages in the documentation of the vacuum_cleanup_index_scale_factor GUC/param, since the presence of deleted (though unrecycled) pages is no longer of much concern to users. The vacuum_cleanup_index_scale_factor description in the docs now seems rather unclear in any case, and it should probably be rewritten in the near future. Perhaps some passing mention of page deletion will be added back at the same time. Bump XLOG_PAGE_MAGIC due to nbtree WAL records using full XIDs now. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Discussion: https://postgr.es/m/CAH2-WznpdHvujGUwYZ8sihX=d5u-tRYhi-F4wnV2uN2zHpMUXw@mail.gmail.com	2021-02-24 18:41:34 -08:00
Amit Kapila	8a4f9522d0	Fix relcache reference leak introduced by `ce0fdbfe97`. Author: Sawada Masahiko Reviewed-by: Amit Kapila Discussion: https://postgr.es/m/CAD21AoA7ZEfsOXQ9HQqMv3QYGsEm2H5Wk5ic5S=mvzDf-3a3SA@mail.gmail.com	2021-02-25 07:48:24 +05:30
Michael Paquier	bcf2667bf6	Fix some typos, grammar and style in docs and comments The portions fixing the documentation are backpatched where needed. Author: Justin Pryzby Discussion: https://postgr.es/m/20210210235557.GQ20012@telsasoft.com backpatch-through: 9.6	2021-02-24 16:13:17 +09:00
Peter Eisentraut	8ec8fe0f31	Message style fix Don't quote type name placeholders.	2021-02-24 07:00:49 +01:00
Alvaro Herrera	5a65eacfdc	Fix confusion in comments about generate_gather_paths `d2d8a229bc` introduced a new function generate_useful_gather_paths to be used as a replacement for generate_gather_paths, but forgot to update a couple of places that referenced the older function. This is possibly not 100% complete (ref. create_ordered_paths), but it's better than not changing anything. Author: "Hou, Zhijie" <houzj.fnst@cn.fujitsu.com> Reviewed-by: Tomas Vondra <tomas.vondra@enterprisedb.com> Discussion: https://postgr.es/m/4ce1d5116fe746a699a6d29858c6a39a@G08CNEXMBPEKD05.g08.fujitsu.local	2021-02-23 20:05:15 -03:00
Alvaro Herrera	8deb6b38dc	Reinstate HEAP_XMAX_LOCK_ONLY|HEAP_KEYS_UPDATED as allowed Commit `866e24d47d` added an assert that HEAP_XMAX_LOCK_ONLY and HEAP_KEYS_UPDATED cannot appear together, on the faulty assumption that the latter necessarily referred to an update and not a tuple lock; but that's wrong, because SELECT FOR UPDATE can use precisely that combination, as evidenced by the amcheck test case added here. Remove the Assert(), and also patch amcheck's verify_heapam.c to not complain if the combination is found. Also, out of an overabundance of caution, update (across all branches) README.tuplock to be more explicit about this. Author: Julien Rouhaud <rjuju123@gmail.com> Reviewed-by: Mahendra Singh Thalor <mahi6run@gmail.com> Reviewed-by: Dilip Kumar <dilipbalaut@gmail.com> Discussion: https://postgr.es/m/20210124061758.GA11756@nol	2021-02-23 17:30:21 -03:00
Tom Lane	3db05e76f9	Suppress compiler warning in new regex match-all detection code. gcc 10 is smart enough to notice that control could reach this "hasmatch[depth]" assignment with depth < 0, but not smart enough to know that that would require a badly broken NFA graph. Change the assert() to a plain runtime test to shut it up. Per report from Andres Freund. Discussion: https://postgr.es/m/20210223173437.b3ywijygsy6q42gq@alap3.anarazel.de	2021-02-23 13:55:34 -05:00
Alvaro Herrera	d9d076222f	VACUUM: ignore indexing operations with CONCURRENTLY As envisioned in commit `c98763bf51`, it is possible for VACUUM to ignore certain transactions that are executing CREATE INDEX CONCURRENTLY and REINDEX CONCURRENTLY for the purposes of computing Xmin; that's because we know those transactions are not going to examine any other tables, and are not going to execute anything else in the same transaction. (Only operations on "safe" indexes can be ignored: those on indexes that are neither partial nor expressional). This is extremely useful in cases where CIC/RC can run for a very long time, because that used to be a significant headache for concurrent vacuuming of other tables. Reviewed-by: Matthias van de Meent <boekewurm+postgres@gmail.com> Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com> Discussion: https://postgr.es/m/20210115133858.GA18931@alvherre.pgsql	2021-02-23 12:15:09 -03:00
Peter Eisentraut	6f6f284c7e	Simplify printing of LSNs Add a macro LSN_FORMAT_ARGS for use in printf-style printing of LSNs. Convert all applicable code to use it. Reviewed-by: Ashutosh Bapat <ashutosh.bapat@enterprisedb.com> Reviewed-by: Kyotaro Horiguchi <horikyota.ntt@gmail.com> Reviewed-by: Michael Paquier <michael@paquier.xyz> Discussion: https://www.postgresql.org/message-id/flat/CAExHW5ub5NaTELZ3hJUCE6amuvqAtsSxc7O+uK7y4t9Rrk23cw@mail.gmail.com	2021-02-23 10:27:02 +01:00
Amit Kapila	ade89ba5f4	Fix an oversight in ReorderBufferFinishPrepared. We don't have anything to decode in a transaction if ReorderBufferTXN doesn't exist by the time we decode the commit prepared. So don't create a new ReorderBufferTXN here. This is an oversight in commit `a271a1b5`. Reported-by: Markus Wanner Discussion: https://postgr.es/m/dbec82e2-dbd7-95a2-c6b6-e488cbbdf853@bluegap.ch	2021-02-23 09:47:41 +05:30
Amit Kapila	bc617a7b1c	Change the error message for logical replication authentication failure. The authentication failure error message wasn't distinguishing whether it is a physical replication or logical replication connection failure and was giving incomplete information on what led to failure in case of logical replication connection. Author: Paul Martinez and Amit Kapila Reviewed-by: Euler Taveira and Amit Kapila Discussion: https://postgr.es/m/CACqFVBYahrAi2OPdJfUA3YCvn3QMzzxZdw0ibSJ8wouWeDtiyQ@mail.gmail.com	2021-02-23 09:11:22 +05:30
Alvaro Herrera	0f5505a881	Remove pointless HeapTupleHeaderIndicatesMovedPartitions calls Pavan Deolasee recently noted that a few of the HeapTupleHeaderIndicatesMovedPartitions calls added by commit `5db6df0c01` are useless, since they are done after comparing t_self with t_ctid. But because t_self can never be set to the magical values that indicate that the tuple moved partition, this can never succeed: if the first test fails (so we know t_self equals t_ctid), necessarily the second test will also fail. So these checks can be removed and no harm is done. There's no bug here, just a code legibility issue. Reported-by: Pavan Deolasee <pavan.deolasee@gmail.com> Discussion: https://postgr.es/m/20200929164411.GA15497@alvherre.pgsql	2021-02-22 16:51:44 -03:00
Alvaro Herrera	6a03369a71	Fix typo	2021-02-22 11:34:05 -03:00
Thomas Munro	beb4480c85	Refactor get_collation_current_version(). The code paths for three different OSes finished up with three different ways of excluding C[.xxx] and POSIX from consideration. Merge them. Reviewed-by: Michael Paquier <michael@paquier.xyz> Discussion: https://postgr.es/m/20210117215940.GE8560%40telsasoft.com	2021-02-22 23:32:16 +13:00
Thomas Munro	9cf184cc05	pg_collation_actual_version() -> pg_collation_current_version(). The new name seems a bit more natural. Discussion: https://postgr.es/m/20210117215940.GE8560%40telsasoft.com	2021-02-22 23:32:16 +13:00
Thomas Munro	0fb0a0503b	Hide internal error for pg_collation_actual_version(<bad OID>). Instead of an unsightly internal "cache lookup failed" message, just return NULL for bad OIDs, as is the convention for other similar things. Reported-by: Justin Pryzby <pryzby@telsasoft.com> Reviewed-by: Michael Paquier <michael@paquier.xyz> Discussion: https://postgr.es/m/20210117215940.GE8560%40telsasoft.com	2021-02-22 23:01:20 +13:00
Fujii Masao	f05ed5a5cf	Initialize atomic variable waitStart in PGPROC, at postmaster startup. Commit `46d6e5f567` added the atomic variable "waitStart" into PGPROC struct, to store the time at which wait for lock acquisition started. Previously this variable was initialized every time each backend started. Instead, this commit makes postmaster initialize it at the startup, to ensure that the variable is initialized before any use of it. This commit also moves the code to initialize "waitStart" variable for prepared transactions from TwoPhaseGetDummyProc() to MarkAsPreparingGuts(), which is a more appropriate place because it initializes other PGPROC variables. Author: Fujii Masao Reviewed-by: Atsushi Torikoshi Discussion: https://postgr.es/m/1df88660-6f08-cc6e-b7e2-f85296a2bdab@oss.nttdata.com	2021-02-22 18:25:00 +09:00
Peter Eisentraut	efbfb64241	Improve new hash partition bound check error messages For the error message "every hash partition modulus must be a factor of the next larger modulus", add a detail message that shows the particular numbers and existing partition involved. Also comment the code more. Reviewed-by: Amit Langote <amitlangote09@gmail.com> Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi> Discussion: https://www.postgresql.org/message-id/flat/bb9d60b4-aadb-607a-1a9d-fdc3434dddcd%40enterprisedb.com	2021-02-22 08:06:45 +01:00
Michael Paquier	9294264278	Use pgstat_progress_update_multi_param() where possible This commit changes one code path in REINDEX INDEX and one code path in CREATE INDEX CONCURRENTLY to report the progress of each operation using pgstat_progress_update_multi_param() rather than multiple calls to pgstat_progress_update_param(). This has the advantage of making the progress report more consistent for the end user without impacting the amount of information provided. Author: Bharath Rupireddy Discussion: https://postgr.es/m/CALj2ACV5zW7GxD8D_tyO==bcj6ZktQchEKWKPBOAGKiLhAQo=w@mail.gmail.com	2021-02-22 14:21:40 +09:00
Thomas Munro	db8374d804	Remove outdated reference to RAID spindles. Commit `b09ff536` left behind some outdated advice in the long_desc field of the GUC "effective_io_concurrency". Remove it. Back-patch to 13. Reported-by: Andrew Gierth <andrew@tao11.riddles.org.uk> Reviewed-by: Julien Rouhaud <rjuju123@gmail.com> Discussion: https://postgr.es/m/CA%2BhUKGJyyWqFBxL9gEj-qtjBThGjhAOBE8GBnF8MUJOJ3vrfag%40mail.gmail.com	2021-02-22 14:42:15 +13:00
Tom Lane	190c79884a	Simplify memory management for regex DFAs a little. Coverity complained that functions in regexec.c might leak DFA storage. It's wrong, but this logic is confusing enough that it's not so surprising Coverity couldn't make sense of it. Rewrite in hopes of making it more legible to humans as well as machines.	2021-02-21 20:29:11 -05:00
Tom Lane	ea1268f630	Avoid generating extra subre tree nodes for capturing parentheses. Previously, each pair of capturing parentheses gave rise to a separate subre tree node, whose only function was to identify that we ought to capture the match details for this particular sub-expression. In most cases we don't really need that, since we can perfectly well put a "capture this" annotation on the child node that does the real matching work. As with the two preceding commits, the main value of this is to avoid generating and optimizing an NFA for a tree node that's not really pulling its weight. The chosen data representation only allows one capture annotation per subre node. In the legal-per-spec, but seemingly not very useful, case where there are multiple capturing parens around the exact same bit of the regex (i.e. "((xyz))"), wrap the child node in N-1 capture nodes that act the same as before. We could work harder at that but I'll refrain, pending some evidence that such cases are worth troubling over. In passing, improve the comments in regex.h to say what all the different re_info bits mean. Some of them were pretty obvious but others not so much, so reverse-engineer some documentation. This is part of a patch series that in total reduces the regex engine's runtime by about a factor of four on a large corpus of real-world regexes. Patch by me, reviewed by Joel Jacobson Discussion: https://postgr.es/m/1340281.1613018383@sss.pgh.pa.us	2021-02-20 19:26:41 -05:00
Tom Lane	5810430894	Convert regex engine's subre tree from binary to N-ary style. Instead of having left and right child links in subre structs, have a single child link plus a sibling link. Multiple children of a tree node are now reached by chasing the sibling chain. The beneficiary of this is alternation tree nodes. A regular expression with N (>1) branches is now represented by one alternation node with N children, rather than a tree that includes N alternation nodes as well as N children. While the old representation didn't really cost anything extra at execution time, it was pretty horrid for compilation purposes, because each of the alternation nodes had its own NFA, which we were too stupid not to separately optimize. (To make matters worse, all of those NFAs described the entire alternation pattern, not just the portion of it that one might expect from the tree structure.) We continue to require concatenation nodes to have exactly two children. This data structure is now prepared to support more, but the executor's logic would need some careful redesign, and it's not clear that a lot of benefit could be had. This is part of a patch series that in total reduces the regex engine's runtime by about a factor of four on a large corpus of real-world regexes. Patch by me, reviewed by Joel Jacobson Discussion: https://postgr.es/m/1340281.1613018383@sss.pgh.pa.us	2021-02-20 19:07:45 -05:00
Tom Lane	cebc1d34e5	Fix regex engine to suppress useless concatenation sub-REs. The comment for parsebranch() claims that it avoids generating unnecessary concatenation nodes in the "subre" tree, but it missed some significant cases. Once we've decided that a given atom is "messy" and can't be bundled with the preceding atom(s) of the current regex branch, parseqatom() always generated two new concat nodes, one to concat the messy atom to what follows it in the branch, and an upper node to concatenate the preceding part of the branch to that one. But one or both of these could be unnecessary, if the messy atom is the first, last, or only one in the branch. Improve the code to suppress such useless concat nodes, along with the no-op child nodes representing empty chunks of a branch. Reducing the number of subre tree nodes offers significant savings not only at execution but during compilation, because each subre node has its own NFA that has to be separately optimized. (Maybe someday we'll figure out how to share the optimization work across multiple tree nodes, but it doesn't look easy.) Eliminating upper tree nodes is especially useful because they tend to have larger NFAs. This is part of a patch series that in total reduces the regex engine's runtime by about a factor of four on a large corpus of real-world regexes. Patch by me, reviewed by Joel Jacobson Discussion: https://postgr.es/m/1340281.1613018383@sss.pgh.pa.us	2021-02-20 18:45:29 -05:00
Tom Lane	824bf71902	Recognize "match-all" NFAs within the regex engine. This builds on the previous "rainbow" patch to detect NFAs that will match any string, though possibly with constraints on the string length. This definition is chosen to match constructs such as ".*", ".+", and ".{1,100}". Recognizing such an NFA after the optimization pass is fairly cheap, since we basically just have to verify that all arcs are RAINBOW arcs and count the number of steps to the end state. (Well, there's a bit of complication with pseudo-color arcs for string boundary conditions, but not much.) Once we have these markings, the regex executor functions longest(), shortest(), and matchuntil() don't have to expend per-character work to determine whether a given substring satisfies such an NFA; they just need to check its length against the bounds. Since some matching problems require O(N) invocations of these functions, we've reduced the runtime for an N-character string from O(N^2) to O(N). Of course, this is no help for non-matchall sub-patterns, but those usually have constraints that allow us to avoid needing O(N) substring checks in the first place. It's precisely the unconstrained "match-all" cases that cause the most headaches. This is part of a patch series that in total reduces the regex engine's runtime by about a factor of four on a large corpus of real-world regexes. Patch by me, reviewed by Joel Jacobson Discussion: https://postgr.es/m/1340281.1613018383@sss.pgh.pa.us	2021-02-20 18:31:19 -05:00
Tom Lane	08c0d6ad65	Invent "rainbow" arcs within the regex engine. Some regular expression constructs, most notably the "." match-anything metacharacter, produce a sheaf of parallel NFA arcs covering all possible colors (that is, character equivalence classes). We can make a noticeable improvement in the space and time needed to process large regexes by replacing such cases with a single arc bearing the special color code "RAINBOW". This requires only minor additional complication in places such as pull() and push(). Callers of pg_reg_getoutarcs() must now be prepared for the possibility of seeing a RAINBOW arc. For the one known user, contrib/pg_trgm, that's a net benefit since it cuts the number of arcs to be dealt with, and the handling isn't any different than for other colors that contain too many characters to be dealt with individually. This is part of a patch series that in total reduces the regex engine's runtime by about a factor of four on a large corpus of real-world regexes. Patch by me, reviewed by Joel Jacobson Discussion: https://postgr.es/m/1340281.1613018383@sss.pgh.pa.us	2021-02-20 18:11:56 -05:00
Fujii Masao	8a55cb5ba9	Fix bug in COMMIT AND CHAIN command. This commit fixes COMMIT AND CHAIN command so that it starts new transaction immediately even if savepoints are defined within the transaction to commit. Previously COMMIT AND CHAIN command did not in that case because commit `280a408b48` forgot to make CommitTransactionCommand() handle a transaction chaining when the transaction state was TBLOCK_SUBCOMMIT. Also this commit adds the regression test for COMMIT AND CHAIN command when savepoints are defined. Back-patch to v12 where transaction chaining was added. Reported-by: Arthur Nascimento Author: Fujii Masao Reviewed-by: Arthur Nascimento, Vik Fearing Discussion: https://postgr.es/m/16867-3475744069228158@postgresql.org	2021-02-19 21:57:52 +09:00
Peter Eisentraut	678d0e239b	Update snowball Update to snowball tag v2.1.0. Major changes are new stemmers for Armenian, Serbian, and Yiddish.	2021-02-19 08:10:15 +01:00
Peter Geoghegan	b071a31149	Add nbtree README section on page recycling. Consolidate discussion of how VACUUM places pages in the FSM for recycling by adding a new section that comes after discussion of page deletion. This structure reflects the fact that page recycling is explicitly decoupled from page deletion in Lanin & Shasha's paper. Page recycling in nbtree is an implementation of what the paper calls "the drain technique". This decoupling is an important concept for nbtree VACUUM. Searchers have to detect and recover from concurrent page deletions, but they will never have to reason about concurrent page recycling. Recycling can almost always be thought of as a low level garbage collection operation that asynchronously frees the physical space that backs a logical tree node. Almost all code need only concern itself with logical tree nodes. (Note that "logical tree node" is not currently a term of art in the nbtree code -- this all works implicitly.) This is preparation for an upcoming patch that teaches nbtree VACUUM to remember the details of pages that it deletes on the fly, in local memory. This enables the same VACUUM operation to consider placing its own deleted pages in the FSM later on, when it reaches the end of btvacuumscan().	2021-02-18 21:16:33 -08:00
Tom Lane	b5a66e7353	Fix another ancient bug in parsing of BRE-mode regular expressions. While poking at the regex code, I happened to notice that the bug squashed in commit `afcc8772e` had a sibling: next() failed to return a specific value associated with the '}' token for a "\{m,n\}" quantifier when parsing in basic RE mode. Again, this could result in treating the quantifier as non-greedy, which it never should be in basic mode. For that to happen, the last character before "\}" that sets "nextvalue" would have to set it to zero, or it'd have to have accidentally been zero from the start. The failure can be provoked repeatably with, for example, a bound ending in digit "0". Like the previous patch, back-patch all the way.	2021-02-18 22:38:55 -05:00
Fujii Masao	614b7f18b3	Fix "invalid spinlock number: 0" error in pg_stat_wal_receiver. Commit `2c8dd05d6c` added the atomic variable writtenUpto into walreceiver's shared memory information. It's initialized only when walreceiver started up but could be read via pg_stat_wal_receiver view anytime, i.e., even before it's initialized. In the server built with --disable-atomics and --disable-spinlocks, this uninitialized atomic variable read could cause "invalid spinlock number: 0" error. This commit changed writtenUpto so that it's initialized at the postmaster startup, to avoid the uninitialized variable read via pg_stat_wal_receiver and fix the error. Also this commit moved the read of writtenUpto after the release of spinlock protecting walreceiver's shared variables. This is necessary to prevent new spinlock from being taken by atomic variable read while holding another spinlock, and to shorten the spinlock duration. This change leads writtenUpto not to be consistent with the other walreceiver's shared variables protected by a spinlock. But this is OK because writtenUpto should not be used for data integrity checks. Back-patch to v13 where commit `2c8dd05d6c` introduced the bug. Author: Fujii Masao Reviewed-by: Michael Paquier, Thomas Munro, Andres Freund Discussion: https://postgr.es/m/7ef8708c-5b6b-edd3-2cf2-7783f1c7c175@oss.nttdata.com	2021-02-18 23:28:15 +09:00
Peter Eisentraut	f5465fade9	Allow specifying CRL directory Add another method to specify CRLs, hashed directory method, for both server and client side. This offers a means for server or libpq to load only CRLs that are required to verify a certificate. The CRL directory is specified by separate GUC variables or connection options ssl_crl_dir and sslcrldir, alongside the existing ssl_crl_file and sslcrl, so both methods can be used at the same time. Author: Kyotaro Horiguchi <horikyota.ntt@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/20200731.173911.904649928639357911.horikyota.ntt@gmail.com	2021-02-18 07:59:10 +01:00
Peter Geoghegan	128dd901a5	nbtree README: move VACUUM linear scan section. Discuss VACUUM's linear scan after discussion of tuple deletion by VACUUM, but before discussion of page deletion by VACUUM. This progression is a lot more natural. Also tweak the wording a little. It seems unnecessary to talk about how it worked prior to PostgreSQL 8.2.	2021-02-17 21:13:15 -08:00
Tomas Vondra	927f453a94	Fix tuple routing to initialize batching only for inserts A cross-partition update on a partitioned table is implemented as a delete followed by an insert. With foreign partitions, this was however causing issues, because the FDW and core may disagree on when to enable batching. postgres_fdw was only allowing batching for plain inserts (CMD_INSERT) while core was trying to batch the insert component of the cross-partition update. Fix by restricting core to apply batching only to plain CMD_INSERT queries. It's possible to allow batching for cross-partition updates, but that will require more extensive changes, so better to leave that for a separate patch. Author: Amit Langote Reviewed-by: Tomas Vondra, Takayuki Tsunakawa Discussion: https://postgr.es/m/20200628151002.7x5laxwpgvkyiu3q@development	2021-02-18 00:03:45 +01:00
Tom Lane	4e703d6719	Make some minor improvements in the regex code. Push some hopefully-uncontroversial bits extracted from an upcoming patch series, to remove non-relevant clutter from the main patches. In compact(), return immediately after setting REG_ASSERT error; continuing the loop would just lead to assertion failure below. (Ask me how I know.) In parseqatom(), remove assertion that moresubs() did its job. When moresubs actually did its job, this is redundant with that function's final assert; but when it failed on OOM, this is an assertion crash. We could avoid the crash by adding a NOERR() check before the assertion, but it seems better to subtract code than add it. (Note that there's a NOERR exit a few lines further down, and nothing else between here and there requires moresubs to have succeeded. So we don't really need an extra error exit.) This is a live bug in assert-enabled builds, but given the very low likelihood of OOM in moresub's tiny allocation, I don't think it's worth back-patching. On the other hand, it seems worthwhile to add an assertion that our intended v->subs[subno] target is still null by the time we are ready to insert into it, since there's a recursion in between. In pg_regexec, ensure we fflush any debug output on the way out, and try to make MDEBUG messages more uniform and helpful. (In particular, ensure that all of them are prefixed with the subre's id number, so one can match up entry and exit reports.) Add some test cases in test_regex to improve coverage of lookahead and lookbehind constraints. Adding these now is mainly to establish that this is indeed the existing behavior. Discussion: https://postgr.es/m/1340281.1613018383@sss.pgh.pa.us	2021-02-17 12:24:23 -05:00
Peter Eisentraut	f40c6969d0	Routine usage information schema tables Several information schema views track dependencies between functions/procedures and objects used by them. These had not been implemented so far because PostgreSQL doesn't track objects used in a function body. However, formally, these also show dependencies used in parameter default expressions, which PostgreSQL does support and track. So for the sake of completeness, we might as well add these. If dependency tracking for function bodies is ever implemented, these views will automatically work correctly. Reviewed-by: Erik Rijkers <er@xs4all.nl> Discussion: https://www.postgresql.org/message-id/flat/ac80fc74-e387-8950-9a31-2560778fc1e3%40enterprisedb.com	2021-02-17 18:16:06 +01:00
Peter Eisentraut	0e392fcc0d	Use errmsg_internal for debug messages An inconsistent set of debug-level messages was not using errmsg_internal(), thus uselessly exposing the messages to translation work. Fix those.	2021-02-17 11:33:25 +01:00
Tom Lane	38bb3aef35	Convert tsginidx.c's GIN indexing logic to fully ternary operation. Commit `2f2007fbb` did this partially, but there were two remaining warts. checkcondition_gin handled some uncertain cases by setting the out-of-band recheck flag, some by returning TS_MAYBE, and some by doing both. Meanwhile, TS_execute arbitrarily converted a TS_MAYBE result to TS_YES. Thus, if checkcondition_gin chose to only return TS_MAYBE, the outcome would be TS_YES with no recheck flag, potentially resulting in wrong query outputs. The case where this'd happen is if there were GIN_MAYBE entries in the indexscan results passed to gin_tsquery_[tri]consistent, which so far as I can see would only happen if the tidbitmap used to accumulate indexscan results grew large enough to become lossy. I initially thought of fixing this by ensuring we always set the recheck flag as well as returning TS_MAYBE in uncertain cases. But that errs in the other direction, potentially forcing rechecks of rows that provably match the query (since the recheck flag remains set even if TS_execute later finds that the answer must be TS_YES). Instead, let's get rid of the out-of-band recheck flag altogether and rely on returning TS_MAYBE. This requires exporting a version of TS_execute that will actually return the full ternary result of the evaluation ... but we likely should have done that to start with. Unfortunately it doesn't seem practical to add a regression test case that covers this: the amount of data needed to cause the GIN bitmap to become lossy results in a longer runtime than I think we want to have in the tests. (I'm wondering about allowing smaller work_mem settings to ameliorate that, but it'd be a matter for a separate patch.) Per bug #16865 from Dimitri Nüscheler. Back-patch to v13 where the faulty commit came in. Discussion: https://postgr.es/m/16865-4ffdc3e682e6d75b@postgresql.org	2021-02-16 12:07:14 -05:00
Amit Kapila	f672df5fdd	Remove the unnecessary PrepareWrite in pgoutput. This issue exists from the inception of this code (PG-10) but got exposed by the recent commit `ce0fdbfe97` where we are using origins in tablesync workers. The problem was that we were sometimes sending the prepare_write ('w') message but then the actual message was not being sent and on the subscriber side, we always expect a message after prepare_write message which led to this bug. I refrained from backpatching this because there is no way in the core code to hit this prior to commit `ce0fdbfe97` and we haven't received any complaints so far. Reported-by: Erik Rijkers Author: Amit Kapila and Vignesh C Tested-by: Erik Rijkers Discussion: https://postgr.es/m/1295168140.139428.1613133237154@webmailclassic.xs4all.nl	2021-02-16 07:26:50 +05:30
Andres Freund	8001cb77ee	Fix heap_page_prune() parameter order confusion introduced in `dc7420c2c9`. Both luckily and unluckily the passed values meant the same for all types. Luckily because that meant my confusion caused no harm, unluckily because otherwise the compiler might have warned... In passing, synchronize parameter names between definition and declaration. Reported-By: Peter Geoghegan <pg@bowt.ie> Author: Andres Freund <andres@anarazel.de> Discussion: https://postgr.es/m/CAH2-Wz=L=nBoepQdH9b5Qd0nMvepFT2CnT6sjWvvpOXa=K8HVQ@mail.gmail.com	2021-02-15 17:12:12 -08:00
Andres Freund	a975ff4980	Remove backwards compat ugliness in snapbuild.c. In `955a684e04` we fixed a bug in initial snapshot creation. In the course of which several members of struct SnapBuild were obsoleted. As SnapBuild is serialized to disk we couldn't change the memory layout. Unfortunately I subsequently forgot about removing the backward compat gunk, but luckily Heikki just reminded me. This commit bumps SNAPBUILD_VERSION, therefore breaking existing slots (which is fine in a major release). Author: Andres Freund Reminded-By: Heikki Linnakangas <hlinnaka@iki.fi> Discussion: https://postgr.es/m/c94be044-818f-15e3-1ad3-7a7ae2dfed0a@iki.fi	2021-02-15 16:57:47 -08:00
Tom Lane	0e52903128	Simplify loop logic in nodeIncrementalSort.c. The inner loop in switchToPresortedPrefixMode() can be implemented as a conventional integer-counter for() loop, removing a couple of redundant boolean state variables. The old logic here was a remnant of earlier development, but as things now stand there's no reason for extra complexity. Also, annotate the test case added by `82e0e2930` to explain why it manages to hit the corner case fixed in that commit, and add an EXPLAIN to verify that it's creating an incremental-sort plan. Back-patch to v13, like the previous patch. James Coleman and Tom Lane Discussion: https://postgr.es/m/16846-ae49f51ac379a4cb@postgresql.org	2021-02-15 10:17:58 -05:00
Heikki Linnakangas	54e51dcde0	Make ExecGetInsertedCols() and friends more robust and improve comments. If ExecGetInsertedCols(), ExecGetUpdatedCols() or ExecGetExtraUpdatedCols() were called with a ResultRelInfo that's not in the range table and isn't a partition routing target, the functions would dereference a NULL pointer, relinfo->ri_RootResultRelInfo. Such ResultRelInfos are created when firing RI triggers in tables that are not modified directly. None of the current callers of these functions pass such relations, so this isn't a live bug, but let's make them more robust. Also update comment in ResultRelInfo; after commit `6214e2b228`, ri_RangeTableIndex is zero for ResultRelInfos created for partition tuple routing. Noted by Coverity. Backpatch down to v11, like commit `6214e2b228`. Reviewed-by: Tom Lane, Amit Langote	2021-02-15 09:28:08 +02:00
Fujii Masao	46d6e5f567	Display the time when the process started waiting for the lock, in pg_locks, take 2 This commit adds new column "waitstart" into pg_locks view. This column reports the time when the server process started waiting for the lock if the lock is not held. This information is useful, for example, when examining the amount of time to wait on a lock by subtracting "waitstart" in pg_locks from the current time, and identify the lock that the processes are waiting for very long. This feature uses the current time obtained for the deadlock timeout timer as "waitstart" (i.e., the time when this process started waiting for the lock). Since getting the current time newly can cause overhead, we reuse the already-obtained time to avoid that overhead. Note that "waitstart" is updated without holding the lock table's partition lock, to avoid the overhead by additional lock acquisition. This can cause "waitstart" in pg_locks to become NULL for a very short period of time after the wait started even though "granted" is false. This is OK in practice because we can assume that users are likely to look at "waitstart" when waiting for the lock for a long time. The first attempt of this patch (commit `3b733fcd04`) caused the buildfarm member "rorqual" (built with --disable-atomics --disable-spinlocks) to report the failure of the regression test. It was reverted by commit `890d2182a2`. The cause of this failure was that the atomic variable for "waitstart" in the dummy process entry created at the end of prepare transaction was not initialized. This second attempt fixes that issue. Bump catalog version. Author: Atsushi Torikoshi Reviewed-by: Ian Lawrence Barwick, Robert Haas, Justin Pryzby, Fujii Masao Discussion: https://postgr.es/m/a96013dc51cdc56b2a2b84fa8a16a993@oss.nttdata.com	2021-02-15 15:13:37 +09:00
Peter Geoghegan	7cde6b13a9	Adjust lazy_scan_heap() accounting comments. Explain which particular LP_DEAD line pointers get accounted for by the tups_vacuumed variable.	2021-02-14 19:28:37 -08:00
Thomas Munro	f900a79ecd	Default to wal_sync_method=fdatasync on FreeBSD. FreeBSD 13 gained O_DSYNC, which would normally cause wal_sync_method to choose open_datasync as its default value. That may not be a good choice for all systems, and performs worse than fdatasync in some scenarios. Let's preserve the existing default behavior for now. Like commit `576477e73c`, which did the same for Linux, back-patch to all supported releases. Discussion: https://postgr.es/m/CA%2BhUKGLsAMXBQrCxCXoW-JsUYmdOL8ALYvaX%3DCrHqWxm-nWbGA%40mail.gmail.com	2021-02-15 16:04:59 +13:00
Amit Kapila	d9b0767bec	Fix the warnings introduced in commit `ce0fdbfe97`. Author: Amit Kapila Reviewed-by: Tom Lane Discussion: https://postgr.es/m/1610789.1613170207@sss.pgh.pa.us	2021-02-15 07:28:02 +05:30
Thomas Munro	637668fb1d	Hold interrupts while running dsm_detach() callbacks. While cleaning up after a parallel query or parallel index creation that created temporary files, we could be interrupted by a statement timeout. The error handling path would then fail to clean up the files when it ran dsm_detach() again, because the callback was already popped off the list. Prevent this hazard by holding interrupts while the cleanup code runs. Thanks to Heikki Linnakangas for this suggestion, and also to Kyotaro Horiguchi, Masahiko Sawada, Justin Pryzby and Tom Lane for discussion of this and earlier ideas on how to fix the problem. Back-patch to all supported releases. Reported-by: Justin Pryzby <pryzby@telsasoft.com> Discussion: https://postgr.es/m/20191212180506.GR2082@telsasoft.com	2021-02-15 14:27:33 +13:00
Michael Paquier	b83dcf7928	Add result size as argument of pg_cryptohash_final() for overflow checks With its current design, a careless use of pg_cryptohash_final() could result in an out-of-bound write in memory as the size of the destination buffer to store the result digest is not known to the cryptohash internals, without the caller knowing about that. This commit adds a new argument to pg_cryptohash_final() to allow such sanity checks, and implements such defenses. The internals of SCRAM for HMAC could be tightened a bit more, but as everything is based on SCRAM_KEY_LEN with uses particular to this code there is no need to complicate its interface more than necessary, and this comes back to the refactoring of HMAC in core. Except that, this minimizes the uses of the existing DIGEST_LENGTH variables, relying instead on sizeof() for the result sizes. In ossp-uuid, this also makes the code more defensive, as it already relied on dce_uuid_t being at least the size of a MD5 digest. This is in philosophy similar to `cfc40d3` for base64.c and `aef8948` for hex.c. Reported-by: Ranier Vilela Author: Michael Paquier, Ranier Vilela Reviewed-by: Kyotaro Horiguchi Discussion: https://postgr.es/m/CAEudQAoqEGmcff3J4sTSV-R_16Monuz-UpJFbf_dnVH=APr02Q@mail.gmail.com	2021-02-15 10:18:34 +09:00
Tom Lane	2dd6733108	Minor fixes to improve regex debugging code. When REG_DEBUG is defined, ensure that an un-filled "struct cnfa" is all-zeroes, not just that it has nstates == 0. This is mainly so that looking at "struct subre" structs in gdb doesn't distract one with a lot of garbage fields during regex compilation. Adjust some places that print debug output to have suitable fflush calls afterwards. In passing, correct an erroneous ancient comment: the concatenation subre-s created by parsebranch() have op == '.' not ','. Noted while fooling around with some regex performance improvements.	2021-02-14 19:53:42 -05:00
Thomas Munro	c7ecd6af01	ReadNewTransactionId() -> ReadNextTransactionId(). The new name conveys the effect better, is more consistent with similar functions ReadNextMultiXactId(), ReadNextFullTransactionId(), and matches the name of the variable that it reads. Reported-by: Peter Geoghegan <pg@bowt.ie> Discussion: https://postgr.es/m/CAH2-WzmVR4SakBXQUdhhPpMf1aYvZCnna5%3DHKa7DAgEmBAg%2B8g%40mail.gmail.com	2021-02-15 13:17:02 +13:00
Bruce Momjian	8facf1ea00	README/C-comment: document GiST's NSN value	2021-02-13 13:50:49 -05:00
Tom Lane	ae4867ec74	Avoid divide-by-zero in regex_selectivity() with long fixed prefix. Given a regex pattern with a very long fixed prefix (approaching 500 characters), the result of pow(FIXED_CHAR_SEL, fixed_prefix_len) can underflow to zero. Typically the preceding selectivity calculation would have underflowed as well, so that we compute 0/0 and get NaN. In released branches this leads to an assertion failure later on. That doesn't happen in HEAD, for reasons I've not explored yet, but it's surely still a bug. To fix, just skip the division when the pow() result is zero, so that we'll (most likely) return a zero selectivity estimate. In the edge cases where "sel" didn't yet underflow, perhaps this isn't desirable, but I'm not sure that the case is worth spending a lot of effort on. The results of regex_selectivity_sub() are barely worth the electrons they're written on anyway :-( Per report from Alexander Lakhin. Back-patch to all supported versions. Discussion: https://postgr.es/m/6de0a0c3-ada9-cd0c-3e4e-2fa9964b41e3@gmail.com	2021-02-12 16:26:47 -05:00
Amit Kapila	ce0fdbfe97	Allow multiple xacts during table sync in logical replication. For the initial table data synchronization in logical replication, we use a single transaction to copy the entire table and then synchronize the position in the stream with the main apply worker. There are multiple downsides of this approach: (a) We have to perform the entire copy operation again if there is any error (network breakdown, error in the database operation, etc.) while we synchronize the WAL position between tablesync worker and apply worker; this will be onerous especially for large copies, (b) Using a single transaction in the synchronization-phase (where we can receive WAL from multiple transactions) will have the risk of exceeding the CID limit, (c) The slot will hold the WAL till the entire sync is complete because we never commit till the end. This patch solves all the above downsides by allowing multiple transactions during the tablesync phase. The initial copy is done in a single transaction and after that, we commit each transaction as we receive. To allow recovery after any error or crash, we use a permanent slot and origin to track the progress. The slot and origin will be removed once we finish the synchronization of the table. We also remove slot and origin of tablesync workers if the user performs DROP SUBSCRIPTION .. or ALTER SUBSCRIPTION .. REFRESH and some of the table syncs are still not finished. The commands ALTER SUBSCRIPTION ... REFRESH PUBLICATION and ALTER SUBSCRIPTION ... SET PUBLICATION ... with refresh option as true cannot be executed inside a transaction block because they can now drop the slots for which we have no provision to rollback. This will also open up the path for logical replication of 2PC transactions on the subscriber side. Previously, we couldn't do that because of the requirement of maintaining a single transaction in tablesync workers. Bump catalog version due to change of state in the catalog (pg_subscription_rel). Author: Peter Smith, Amit Kapila, and Takamichi Osumi Reviewed-by: Ajin Cherian, Petr Jelinek, Hou Zhijie and Amit Kapila Discussion: https://postgr.es/m/CAA4eK1KHJxaZS-fod-0fey=0tq3=Gkn4ho=8N4-5HWiCfu0H1A@mail.gmail.com	2021-02-12 07:41:51 +05:30
Peter Geoghegan	3063eb1759	Remove obsolete IndexBulkDeleteResult stats field. The pages_removed field is no longer used for anything. It hasn't been possible for an index to physically shrink since old-style VACUUM FULL was removed by commit `0a469c87`.	2021-02-11 16:49:41 -08:00
Tom Lane	69036aafb9	Simplify jsonfuncs.c code by using strtoint() not strtol(). Explicitly testing for INT_MIN and INT_MAX isn't particularly good style; it's tedious and may draw useless compiler warnings on machines where int and long are the same width. We invented strtoint() precisely for this usage, so use that instead. While here, remove gratuitous variations in the way the tests for did-strtoint-succeed were spelled. Also, avoid attempting to negate INT_MIN; that would probably work given that the result is implicitly cast to uint32, but I think it's nominally undefined behavior. Per gripe from Ranier Vilela, though this isn't his proposed patch. Discussion: https://postgr.es/m/CAEudQAqge3QfzoBRhe59QrB_5g+NmQUj2QpzqZ9Nc7QepXGAEw@mail.gmail.com	2021-02-11 12:49:22 -05:00
Tom Lane	d4c746516b	Remove no-longer-used RTE argument of markVarForSelectPriv(). In the wake of `c028faf2a`, this is no longer needed. I left it out of that patch since the API change would be undesirable in a released branch; but there's no reason not to do it in HEAD.	2021-02-11 11:23:25 -05:00
Peter Eisentraut	4ad5611055	Fix lack of message pluralization	2021-02-10 11:35:45 +01:00
Michael Paquier	092b785fad	Simplify code related to compilation of SSL and OpenSSL This commit makes more generic some comments and code related to the compilation with OpenSSL and SSL in general to ease the addition of more SSL implementations in the future. In libpq, some OpenSSL-only code is moved under USE_OPENSSL and not USE_SSL. While on it, make a comment more consistent in libpq-fe.h. Author: Daniel Gustafsson Discussion: https://postgr.es/m/5382CB4A-9CF3-4145-BA46-C802615935E0@yesql.se	2021-02-10 15:28:19 +09:00
Michael Paquier	bd12080980	Preserve pg_attribute.attstattarget across REINDEX CONCURRENTLY For an index, attstattarget can be updated using ALTER INDEX SET STATISTICS. This data was lost on the new index after REINDEX CONCURRENTLY. The update of this field is done when the old and new indexes are swapped to make the fix back-patchable. Another approach we could look after in the long-term is to change index_create() to pass the wanted values of attstattarget when creating the new relation, but, as this would cause an ABI breakage this can be done only on HEAD. Reported-by: Ronan Dunklau Author: Michael Paquier Reviewed-by: Ronan Dunklau, Tomas Vondra Discussion: https://postgr.es/m/16628084.uLZWGnKmhe@laptop-ronand Backpatch-through: 12	2021-02-10 13:06:48 +09:00
Amit Kapila	cd142e032e	Make pg_replication_origin_drop safe against concurrent drops. Currently, we get the origin id from the name and then drop the origin by taking ExclusiveLock on ReplicationOriginRelationId. So, two concurrent sessions can get the id from the name at the same time and then when they try to drop the origin, one of the sessions will get either "tuple concurrently deleted" or "cache lookup failed for replication origin ..". To prevent this race condition we do the entire operation under lock. This obviates the need for replorigin_drop() API and we have removed it so if any extension authors are using it they need to instead use replorigin_drop_by_name. See its usage in pg_replication_origin_drop(). Author: Peter Smith Reviewed-by: Amit Kapila, Euler Taveira, Petr Jelinek, and Alvaro Herrera Discussion: https://www.postgresql.org/message-id/CAHut%2BPuW8DWV5fskkMWWMqzt-x7RPcNQOtJQBp6SdwyRghCk7A%40mail.gmail.com	2021-02-10 07:17:09 +05:30
Peter Geoghegan	31c7fb41e2	Fix obsolete FSM remarks in nbtree README. The free space map has used a dedicated relation fork rather than shared memory segments for over a decade.	2021-02-09 11:36:51 -08:00
Fujii Masao	890d2182a2	Revert "Display the time when the process started waiting for the lock, in pg_locks." This reverts commit `3b733fcd04`. Per buildfarm members prion and rorqual.	2021-02-09 18:30:40 +09:00
Fujii Masao	3b733fcd04	Display the time when the process started waiting for the lock, in pg_locks. This commit adds new column "waitstart" into pg_locks view. This column reports the time when the server process started waiting for the lock if the lock is not held. This information is useful, for example, when examining the amount of time to wait on a lock by subtracting "waitstart" in pg_locks from the current time, and identify the lock that the processes are waiting for very long. This feature uses the current time obtained for the deadlock timeout timer as "waitstart" (i.e., the time when this process started waiting for the lock). Since getting the current time newly can cause overhead, we reuse the already-obtained time to avoid that overhead. Note that "waitstart" is updated without holding the lock table's partition lock, to avoid the overhead by additional lock acquisition. This can cause "waitstart" in pg_locks to become NULL for a very short period of time after the wait started even though "granted" is false. This is OK in practice because we can assume that users are likely to look at "waitstart" when waiting for the lock for a long time. Bump catalog version. Author: Atsushi Torikoshi Reviewed-by: Ian Lawrence Barwick, Robert Haas, Justin Pryzby, Fujii Masao Discussion: https://postgr.es/m/a96013dc51cdc56b2a2b84fa8a16a993@oss.nttdata.com	2021-02-09 18:10:19 +09:00
Michael Paquier	7cb3048f38	Add option PROCESS_TOAST to VACUUM This option controls if toast tables associated with a relation are vacuumed or not when running a manual VACUUM. It was already possible to trigger a manual VACUUM on a toast relation without processing its main relation, but a manual vacuum on a main relation always forced a vacuum on its toast table. This is useful in scenarios where the level of bloat or transaction age of the main and toast relations differs a lot. This option is an extension of the existing VACOPT_SKIPTOAST that was used by autovacuum to control if toast relations should be skipped or not. This internal flag is renamed to VACOPT_PROCESS_TOAST for consistency with the new option. A new option switch, called --no-process-toast, is added to vacuumdb. Author: Nathan Bossart Reviewed-by: Kirk Jamison, Michael Paquier, Justin Pryzby Discussion: https://postgr.es/m/BA8951E9-1524-48C5-94AF-73B1F0D7857F@amazon.com	2021-02-09 14:13:57 +09:00
Tom Lane	c028faf2a6	Fix mishandling of column-level SELECT privileges for join aliases. scanNSItemForColumn, expandNSItemAttrs, and ExpandSingleTable would pass the wrong RTE to markVarForSelectPriv when dealing with a join ParseNamespaceItem: they'd pass the join RTE, when what we need to mark is the base table that the join column came from. The end result was to not fill the base table's selectedCols bitmap correctly, resulting in an understatement of the set of columns that are read by the query. The executor would still insist on there being at least one selectable column; but with a correctly crafted query, a user having SELECT privilege on just one column of a table would nonetheless be allowed to read all its columns. To fix, make markRTEForSelectPriv fetch the correct RTE for itself, ignoring the possibly-mismatched RTE passed by the caller. Later, we'll get rid of some now-unused RTE arguments, but that risks API breaks so we won't do it in released branches. This problem was introduced by commit `9ce77d75c`, so back-patch to v13 where that came in. Thanks to Sven Klemm for reporting the problem. Security: CVE-2021-20229	2021-02-08 10:14:09 -05:00
Heikki Linnakangas	6214e2b228	Fix permission checks on constraint violation errors on partitions. If a cross-partition UPDATE violates a constraint on the target partition, and the columns in the new partition are in different physical order than in the parent, the error message can reveal columns that the user does not have SELECT permission on. A similar bug was fixed earlier in commit `804b6b6db4`. The cause of the bug is that the callers of the ExecBuildSlotValueDescription() function got confused when constructing the list of modified columns. If the tuple was routed from a parent, we converted the tuple to the parent's format, but the list of modified columns was grabbed directly from the child's RTE entry. ExecUpdateLockMode() had a similar issue. That led to confusion on which columns are key columns, leading to wrong tuple lock being taken on tables referenced by foreign keys, when a row is updated with INSERT ON CONFLICT UPDATE. A new isolation test is added for that corner case. With this patch, the ri_RangeTableIndex field is no longer set for partitions that don't have an entry in the range table. Previously, it was set to the RTE entry of the parent relation, but that was confusing. NOTE: This modifies the ResultRelInfo struct, replacing the ri_PartitionRoot field with ri_RootResultRelInfo. That's a bit risky to backpatch, because it breaks any extensions accessing the field. The change that ri_RangeTableIndex is not set for partitions could potentially break extensions, too. The ResultRelInfos are visible to FDWs at least, and this patch required small changes to postgres_fdw. Nevertheless, this seems like the least bad option. I don't think these fields are widely used in extensions; I don't think there are FDWs out there that use the FDW "direct update" API, other than postgres_fdw. If there is, you will get a compilation error, so hopefully it is caught quickly. Backpatch to 11, where support for both cross-partition UPDATEs, and unique indexes on partitioned tables, were added. Reviewed-by: Amit Langote Security: CVE-2021-3393	2021-02-08 11:01:51 +02:00
Peter Geoghegan	617fffee8a	Rename removable xid function for consistency. GlobalVisIsRemovableFullXid() is now GlobalVisCheckRemovableFullXid(). This is consistent with the general convention for FullTransactionId equivalents of functions that deal with TransactionId values. It now matches the nearby GlobalVisCheckRemovableXid() function, which performs the same check for callers that use TransactionId values. Oversight in commit `dc7420c2c9`. Discussion: https://postgr.es/m/CAH2-Wzmes12jFNDcVgpU89Vp=r6uLFrE-MT0fjSWGsE70UiNaA@mail.gmail.com	2021-02-07 10:11:14 -08:00
Tom Lane	d1d2979852	Revert "Propagate CTE property flags when copying a CTE list into a rule." This reverts commit `ed29089633` and equivalent back-branch commits. The issue is subtler than I thought, and it's far from new, so just before a release deadline is no time to be fooling with it. We'll consider what to do at a bit more leisure. Discussion: https://postgr.es/m/CAJcOf-fAdj=nDKMsRhQzndm-O13NY4dL6xGcEvdX5Xvbbi0V7g@mail.gmail.com	2021-02-07 12:54:08 -05:00
Tom Lane	ed29089633	Propagate CTE property flags when copying a CTE list into a rule. rewriteRuleAction() neglected this step, although it was careful to propagate other similar flags such as hasSubLinks or hasRowSecurity. Omitting to transfer hasRecursive is just cosmetic at the moment, but omitting hasModifyingCTE is a live bug, since the executor certainly looks at that. The proposed test case only fails back to v10, but since the executor examines hasModifyingCTE in 9.x as well, I suspect that a test case could be devised that fails in older branches. Given the nearness of the release deadline, though, I'm not going to spend time looking for a better test. Report and patch by Greg Nancarrow, cosmetic changes by me Discussion: https://postgr.es/m/CAJcOf-fAdj=nDKMsRhQzndm-O13NY4dL6xGcEvdX5Xvbbi0V7g@mail.gmail.com	2021-02-06 19:28:39 -05:00
Tom Lane	dd705a039f	Disallow converting an inheritance child table to a view. Generally, members of inheritance trees must be plain tables (or, in more recent versions, foreign tables). ALTER TABLE INHERIT rejects creating an inheritance relationship that has a view at either end. When DefineQueryRewrite attempts to convert a relation to a view, it already had checks prohibiting doing so for partitioning parents or children as well as traditional-inheritance parents ... but it neglected to check that a traditional-inheritance child wasn't being converted. Since the planner assumes that any inheritance child is a table, this led to making plans that tried to do a physical scan on a view, causing failures (or even crashes, in recent versions). One could imagine trying to support such a case by expanding the view normally, but since the rewriter runs before the planner does inheritance expansion, it would take some very fundamental refactoring to make that possible. There are probably a lot of other parts of the system that don't cope well with such a situation, too. For now, just forbid it. Per bug #16856 from Yang Lin. Back-patch to all supported branches. (In versions before v10, this includes back-patching the portion of commit `501ed02cf` that added has_superclass(). Perhaps the lack of that infrastructure partially explains the missing check.) Discussion: https://postgr.es/m/16856-0363e05c6e1612fd@postgresql.org	2021-02-06 15:17:01 -05:00
Michael Paquier	f7400823c3	Clarify some comments around SharedRecoveryState in xlog.c SharedRecoveryState has been switched from a boolean to an enum as of commit `4e87c48`, but some comments still referred to it as a boolean. Author: Amul Sul Reviewed-by: Dilip Kumar, Kyotaro Horiguchi Discussion: https://postgr.es/m/CAAJ_b97Hf+1SXnm8jySpO+Fhm+-VKFAAce1T_cupUYtnE3Nxig	2021-02-06 10:27:55 +09:00
Heikki Linnakangas	c444472af5	Fix backslash-escaping multibyte chars in COPY FROM. If a multi-byte character is escaped with a backslash in TEXT mode input, and the encoding is one of the client-only encodings where the bytes after the first one can have an ASCII byte "embedded" in the char, we didn't skip the character correctly. After a backslash, we only skipped the first byte of the next character, so if it was a multi-byte character, we would try to process its second byte as if it was a separate character. If it was one of the characters with special meaning, like '\n', '\r', or another '\\', that would cause trouble. One such example is the byte sequence '\x5ca45c2e666f6f' in Big5 encoding. That's supposed to be [backslash][two-byte character][.][f][o][o], but because the second byte of the two-byte character is 0x5c, we incorrectly treat it as another backslash. And because the next character is a dot, we parse it as end-of-copy marker, and throw an "end-of-copy marker corrupt" error. Backpatch to all supported versions. Reviewed-by: John Naylor, Kyotaro Horiguchi Discussion: https://www.postgresql.org/message-id/a897f84f-8dca-8798-3139-07da5bb38728%40iki.fi	2021-02-05 11:14:56 +02:00
Tom Lane	0ff865fbe5	Fix bug in HashAgg's selective-column-spilling logic. Commit `230230223` taught nodeAgg.c that, when spilling tuples from memory in an oversized hash aggregation, it only needed to spill input columns referenced in the node's tlist and quals. Unfortunately, that's wrong: we also have to save the grouping columns. The error is masked in common cases because the grouping columns also appear in the tlist, but that's not necessarily true. The main category of plans where it's not true seems to come from semijoins ("WHERE outercol IN (SELECT innercol FROM innertable)") where the innercol needs an implicit promotion to make it comparable to the outercol. The grouping column will be "innercol::promotedtype", but that expression appears nowhere in the Agg node's own tlist and quals; only the bare "innercol" is found in the tlist. I spent quite a bit of time looking for a suitable regression test case for this, without much success. If the number of distinct values of the innercol is large enough to make spilling happen, the planner tends to prefer a non-HashAgg plan, at least for problem sizes that are reasonable to use in the regression tests. So, no new regression test. However, this patch does demonstrably fix the originally-reported test case. Per report from s.p.e (at) gmx-topmail.de. Backpatch to v13 where the troublesome code came in. Discussion: https://postgr.es/m/trinity-1c565d44-159f-488b-a518-caf13883134f-1611835701633@3c-app-gmx-bap78	2021-02-04 23:01:37 -05:00
Tom Lane	82e0e29308	Fix YA incremental sort bug. switchToPresortedPrefixMode() did the wrong thing if it detected a batch boundary just at the last tuple of a fullsort group. The initially-reported symptom was a "retrieved too many tuples in a bounded sort" error, but the test case added here just silently gives the wrong answer without this patch. I (tgl) am not really happy about committing this patch without review from the incremental-sort authors, but they seem AWOL and we are hard against a release deadline. This does demonstrably make some cases better, anyway. Per bug #16846 from Yoran Heling. Back-patch to v13 where incremental sort was introduced. Neil Chen Discussion: https://postgr.es/m/16846-ae49f51ac379a4cb@postgresql.org	2021-02-04 19:12:14 -05:00
Peter Geoghegan	c34787f910	Harden nbtree page deletion. Add some additional defensive checks in the second phase of index deletion to detect and report index corruption during VACUUM, and to avoid having VACUUM become stuck in more cases. The code is still not robust in the presence of a circular chain of sibling links, though it's not clear whether that really matters. This is follow-up work to commit `3a01f68e`. The new defensive checks rely on the assumption that there can be no more than one VACUUM operation running for an index at any given time. Remove an old comment suggesting that multiple concurrent VACUUMs need to be considered here. This concern now seems highly unlikely to have any real validity, since we clearly rely on the same assumption in several other places. For example, there are much more recent comments that appear in the same function (added by commit `efada2b8e9`) that make the same assumption. Also add a CHECK_FOR_INTERRUPTS() to the relevant code path. Contrary to comments added by commit `3a01f68e`, it is actually possible to handle interrupts here, at least in the common case where processing takes place at the leaf level. We only hold a pin on leafbuf/target page when stepping right at the leaf level. No backpatch due to the lack of complaints following hardening added to the same area by commit `3a01f68e`.	2021-02-04 15:42:36 -08:00
Heikki Linnakangas	2f86ab305e	Fix small error in COPY FROM progress reporting. The # of bytes processed was accumulated slightly incorrectly. After loading more data to the input buffer, we added the number of bytes in the buffer to the sum. But in case of multi-byte characters or escapes, there can be a few unprocessed bytes left over from previous load in the buffer. Those bytes got counted twice.	2021-02-04 17:40:33 +02:00
Peter Eisentraut	3c78e0569c	Refactor Windows error message for easier translation In the error messages referring to the user right "Lock pages in memory", this is a term from the Windows OS, so it should be translated in accordance with the OS localization. Refactor the error messages so this is easier and clearer. Also fix the capitalization to match the existing capitalization in the OS.	2021-02-04 13:31:13 +01:00
Michael Paquier	5128483d06	Ensure unlinking of old index file with REINDEX (TABLESPACE) The original versions of the patch included this part, but a mismerge from my side has made this piece go missing. Oversight in `c5b28604`.	2021-02-04 17:16:47 +09:00
Michael Paquier	fc749bc704	Clarify comment in tablesync.c Author: Peter Smith Reviewed-by: Amit Kapila, Michael Paquier, Euler Taveira Discussion: https://postgr.es/m/CAHut+Pt9_T6pWar0FLtPsygNmme8HPWPdGUyZ_8mE1Yvjdf0ZA@mail.gmail.com	2021-02-04 16:02:31 +09:00
Michael Paquier	c5b286047c	Add TABLESPACE option to REINDEX This patch adds the possibility to move indexes to a new tablespace while rebuilding them. Both the concurrent and the non-concurrent cases are supported, and the following set of restrictions apply: - When using TABLESPACE with a REINDEX command that targets a partitioned table or index, all the indexes of the leaf partitions are moved to the new tablespace. The tablespace references of the non-leaf, partitioned tables in pg_class.reltablespace are not changed. This requires an extra ALTER TABLE SET TABLESPACE. - Any index on a toast table rebuilt as part of a parent table is kept in its original tablespace. - The operation is forbidden on system catalogs, including trying to directly move a toast relation with REINDEX. This results in an error if doing REINDEX on a single object. REINDEX SCHEMA, DATABASE and SYSTEM skip system relations when TABLESPACE is used. Author: Alexey Kondratov, Michael Paquier, Justin Pryzby Reviewed-by: Álvaro Herrera, Michael Paquier Discussion: https://postgr.es/m/8a8f5f73-00d3-55f8-7583-1375ca8f6a91@postgrespro.ru	2021-02-04 14:34:20 +09:00
Tom Lane	9624321ec5	Avoid crash when rolling back within a prepared statement. If a portal is used to run a prepared CALL or DO statement that contains a ROLLBACK, PortalRunMulti fails because the portal's statement list gets cleared by the rollback. (Since the grammar doesn't allow CALL/DO in PREPARE, the only easy way to get to this is via extended query protocol, which treats all inputs as prepared statements.) It's difficult to avoid resetting the portal early because of resource-management issues, so work around this by teaching PortalRunMulti to be wary of portal->stmts having suddenly become NIL. The crash has only been seen to occur in v13 and HEAD (as a consequence of commit `1cff1b95a` having added an extra touch of portal->stmts). But even before that, the code involved touching a List that the portal no longer has any claim on. In the test case at hand, the List will still exist because of another refcount on the cached plan; but I'm far from convinced that it's impossible for the cached plan to have been dropped by the time control gets back to PortalRunMulti. Hence, backpatch to v11 where nested transactions were added. Thomas Munro and Tom Lane, per bug #16811 from James Inform Discussion: https://postgr.es/m/16811-c1b599b2c6c2d622@postgresql.org	2021-02-03 19:38:43 -05:00
Tom Lane	ba0faf81c6	Remove special BKI_LOOKUP magic for namespace and role OIDs. Now that commit `62f34097c` attached BKI_LOOKUP annotation to all the namespace and role OID columns in the catalogs, there's no real reason to have the magic PGNSP and PGUID symbols. Get rid of them in favor of implementing those lookups according to genbki.pl's normal pattern. This means that in the catalog headers, BKI_DEFAULT(PGNSP) becomes BKI_DEFAULT(pg_catalog), which seems a lot more transparent. BKI_DEFAULT(PGUID) becomes BKI_DEFAULT(POSTGRES), which is perhaps less so; but you can look into pg_authid.dat to discover that POSTGRES is the nonce name for the bootstrap superuser. This change also means that if we ever need cross-references in the initial catalog data to any of the other built-in roles besides POSTGRES, or to some other built-in schema besides pg_catalog, we can just do it. No catversion bump here, as there's no actual change in the contents of postgres.bki. Discussion: https://postgr.es/m/3240355.1612129197@sss.pgh.pa.us	2021-02-03 12:01:48 -05:00
Tom Lane	62f34097c8	Build in some knowledge about foreign-key relationships in the catalogs. This follows in the spirit of commit `dfb75e478`, which created primary key and uniqueness constraints to improve the visibility of constraints imposed on the system catalogs. While our catalogs contain many foreign-key-like relationships, they don't quite follow SQL semantics, in that the convention for an omitted reference is to write zero not NULL. Plus, we have some cases in which there are arrays each of whose elements is supposed to be an FK reference; SQL has no way to model that. So we can't create actual foreign key constraints to describe the situation. Nonetheless, we can collect and use knowledge about these relationships. This patch therefore adds annotations to the catalog header files to declare foreign-key relationships. (The BKI_LOOKUP annotations cover simple cases, but we weren't previously distinguishing which such columns are allowed to contain zeroes; we also need new markings for multi-column FK references.) Then, Catalog.pm and genbki.pl are taught to collect this information into a table in a new generated header "system_fk_info.h". The only user of that at the moment is a new SQL function pg_get_catalog_foreign_keys(), which exposes the table to SQL. The oidjoins regression test is rewritten to use pg_get_catalog_foreign_keys() to find out which columns to check. Aside from removing the need for manual maintenance of that test script, this allows it to cover numerous relationships that were not checked by the old implementation based on findoidjoins. (As of this commit, 217 relationships are checked by the test, versus 181 before.) Discussion: https://postgr.es/m/3240355.1612129197@sss.pgh.pa.us	2021-02-02 17:11:55 -05:00
Peter Eisentraut	1d71f3c83c	Improve confusing variable names The prototype calls the second argument of pgstat_progress_update_multi_param() "index", and some callers name their local variable that way. But when the surrounding code deals with index relations, this is confusing, and in at least one case shadowed another variable that is referring to an index relation. Adjust those call sites to have clearer local variable naming, similar to existing callers in indexcmds.c.	2021-02-02 09:20:22 +01:00
Michael Paquier	4ad31bb2ef	Remove unused column atttypmod from initial tablesync query The initial tablesync done by logical replication used a query to fetch the information of a relation's columns that included atttypmod, but it was left unused. This was added by `7c4f524`. Author: Euler Taveira Reviewed-by: Önder Kalacı, Amit Langote, Japin Li Discussion: https://postgr.es/m/CAHE3wggb715X+mK_DitLXF25B=jE6xyNCH4YOwM860JR7HarGQ@mail.gmail.com	2021-02-02 13:59:23 +09:00
Tom Lane	f003a7522b	Remove [Merge]AppendPath.partitioned_rels. It turns out that the calculation of [Merge]AppendPath.partitioned_rels in allpaths.c is faulty and sometimes omits relevant non-leaf partitions, allowing an assertion added by commit `a929e17e5a` to trigger. Rather than fix that, it seems better to get rid of those fields altogether. We don't really need the info until create_plan time, and calculating it once for the selected plan should be cheaper than calculating it for each append path we consider. The preceding two commits did away with all use of the partitioned_rels values; this commit just mechanically removes the fields and the code that calculated them. Discussion: https://postgr.es/m/87sg8tqhsl.fsf@aurora.ydns.eu Discussion: https://postgr.es/m/CAJKUy5gCXDSmFs2c=R+VGgn7FiYcLCsEFEuDNNLGfoha=pBE_g@mail.gmail.com	2021-02-01 14:43:54 -05:00
Tom Lane	5076f88bc9	Remove incidental dependencies on partitioned_rels lists. It turns out that the calculation of [Merge]AppendPath.partitioned_rels in allpaths.c is faulty and sometimes omits relevant non-leaf partitions, allowing an assertion added by commit `a929e17e5a` to trigger. Rather than fix that, it seems better to get rid of those fields altogether. We don't really need the info until create_plan time, and calculating it once for the selected plan should be cheaper than calculating it for each append path we consider. This patch undoes a couple of very minor uses of the partitioned_rels values. createplan.c was testing for nil-ness to optimize away the preparatory work for make_partition_pruneinfo(). That is worth doing if the check is nigh free, but it's not worth going to any great lengths to avoid. create_append_path() was testing for nil-ness as part of deciding how to set up ParamPathInfo for an AppendPath. I replaced that with a check for the appendrel's parent rel being partitioned. That's not quite the same thing but should cover most cases. If we note any interesting loss of optimizations, we can dumb this down to just always use the more expensive method when the parent is a baserel. Discussion: https://postgr.es/m/87sg8tqhsl.fsf@aurora.ydns.eu Discussion: https://postgr.es/m/CAJKUy5gCXDSmFs2c=R+VGgn7FiYcLCsEFEuDNNLGfoha=pBE_g@mail.gmail.com	2021-02-01 14:34:59 -05:00
Tom Lane	fb2d645dd5	Revise make_partition_pruneinfo to not use its partitioned_rels input. It turns out that the calculation of [Merge]AppendPath.partitioned_rels in allpaths.c is faulty and sometimes omits relevant non-leaf partitions, allowing an assertion added by commit `a929e17e5a` to trigger. Rather than fix that, it seems better to get rid of those fields altogether. We don't really need the info until create_plan time, and calculating it once for the selected plan should be cheaper than calculating it for each append path we consider. As a first step, teach make_partition_pruneinfo to collect the relevant partitioned tables for itself. It's not hard to do so by traversing from child tables up to parents using the AppendRelInfo links. While here, make some minor stylistic improvements; mainly, don't use the "Relids" alias for bitmapsets that are not identities of any relation considered by the planner. Try to document the logic better, too. No backpatch, as there does not seem to be a live problem before `a929e17e5a`. Also no new regression test; the code where the bug was will be gone at the end of this patch series, so it seems a bit pointless to memorialize the issue. Tom Lane and David Rowley, per reports from Andreas Seltenreich and Jaime Casanova. Discussion: https://postgr.es/m/87sg8tqhsl.fsf@aurora.ydns.eu Discussion: https://postgr.es/m/CAJKUy5gCXDSmFs2c=R+VGgn7FiYcLCsEFEuDNNLGfoha=pBE_g@mail.gmail.com	2021-02-01 14:05:51 -05:00
Peter Eisentraut	3696a600e2	SEARCH and CYCLE clauses This adds the SQL standard feature that adds the SEARCH and CYCLE clauses to recursive queries to be able to produce breadth- or depth-first search orders and detect cycles. These clauses can be rewritten into queries using existing syntax, and that is what this patch does in the rewriter. Reviewed-by: Vik Fearing <vik@postgresfriends.org> Reviewed-by: Pavel Stehule <pavel.stehule@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/db80ceee-6f97-9b4a-8ee8-3ba0c58e5be2@2ndquadrant.com	2021-02-01 14:32:51 +01:00
Alexander Korotkov	bb513b364b	Get rid of unnecessary memory allocation in jsonb_subscript_assign() Current code allocates memory for JsonbValue, but it could be placed locally.	2021-02-01 14:06:02 +03:00
Michael Paquier	fe61df7f82	Introduce --with-ssl={openssl} as a configure option This is a replacement for the existing --with-openssl, extending the logic to make easier the addition of new SSL libraries. The grammar is chosen to be similar to --with-uuid, where multiple values can be chosen, with "openssl" as the only supported value for now. The original switch, --with-openssl, is kept for compatibility. Author: Daniel Gustafsson, Michael Paquier Reviewed-by: Jacob Champion Discussion: https://postgr.es/m/FAB21FC8-0F62-434F-AA78-6BD9336D630A@yesql.se	2021-02-01 19:19:44 +09:00
Tom Lane	7c5d57caed	Fix portability issue in new jsonbsubs code. On machines where sizeof(Datum) > sizeof(Oid) (that is, any 64-bit platform), the previous coding would compute a misaligned workspace->index pointer if nupper is odd. Architectures where misaligned access is a hard no-no would then fail. This appears to explain why thorntail is unhappy but other buildfarm members are not.	2021-02-01 02:03:59 -05:00
Alexander Korotkov	aa6e46daf5	Throw error when assigning jsonb scalar instead of a composite object During the jsonb subscripting assignment, the provided path might assume an object or an array where the source jsonb has a scalar value. Initial subscripting assignment logic will skip such an update operation with no message shown. This commit makes it throw an error to indicate this type of situation. Discussion: https://postgr.es/m/CA%2Bq6zcV8qvGcDXurwwgUbwACV86Th7G80pnubg42e-p9gsSf%3Dg%40mail.gmail.com Discussion: https://postgr.es/m/CA%2Bq6zcX3mdxGCgdThzuySwH-ApyHHM-G4oB1R0fn0j2hZqqkLQ%40mail.gmail.com Discussion: https://postgr.es/m/CA%2Bq6zcVDuGBv%3DM0FqBYX8DPebS3F_0KQ6OVFobGJPM507_SZ_w%40mail.gmail.com Discussion: https://postgr.es/m/CA%2Bq6zcVovR%2BXY4mfk-7oNk-rF91gH0PebnNfuUjuuDsyHjOcVA%40mail.gmail.com Author: Dmitry Dolgov Reviewed-by: Tom Lane, Arthur Zakirov, Pavel Stehule, Dian M Fay Reviewed-by: Andrew Dunstan, Chapman Flack, Merlin Moncure, Peter Geoghegan Reviewed-by: Alvaro Herrera, Jim Nasby, Josh Berkus, Victor Wagner Reviewed-by: Aleksander Alekseev, Robert Haas, Oleg Bartunov	2021-01-31 23:51:06 +03:00
Alexander Korotkov	81fcc72e66	Filling array gaps during jsonb subscripting This commit introduces two new flags for jsonb assignment: * JB_PATH_FILL_GAPS: Appending array elements on the specified position, gaps are filled with nulls (similar to the JavaScript behavior). This mode also instructs to create the whole path in a jsonb object if some part of the path (more than just the last element) is not present. * JB_PATH_CONSISTENT_POSITION: Assigning keeps array positions consistent by preventing prepending of elements. Both flags are used only in jsonb subscripting assignment. Initially proposed by Nikita Glukhov based on polymorphic subscripting patch, but transformed into an independent change. Discussion: https://postgr.es/m/CA%2Bq6zcV8qvGcDXurwwgUbwACV86Th7G80pnubg42e-p9gsSf%3Dg%40mail.gmail.com Discussion: https://postgr.es/m/CA%2Bq6zcX3mdxGCgdThzuySwH-ApyHHM-G4oB1R0fn0j2hZqqkLQ%40mail.gmail.com Discussion: https://postgr.es/m/CA%2Bq6zcVDuGBv%3DM0FqBYX8DPebS3F_0KQ6OVFobGJPM507_SZ_w%40mail.gmail.com Discussion: https://postgr.es/m/CA%2Bq6zcVovR%2BXY4mfk-7oNk-rF91gH0PebnNfuUjuuDsyHjOcVA%40mail.gmail.com Author: Dmitry Dolgov Reviewed-by: Tom Lane, Arthur Zakirov, Pavel Stehule, Dian M Fay Reviewed-by: Andrew Dunstan, Chapman Flack, Merlin Moncure, Peter Geoghegan Reviewed-by: Alvaro Herrera, Jim Nasby, Josh Berkus, Victor Wagner Reviewed-by: Aleksander Alekseev, Robert Haas, Oleg Bartunov	2021-01-31 23:51:01 +03:00
Alexander Korotkov	676887a3b0	Implementation of subscripting for jsonb Subscripting for jsonb does not support slices, does not have a limit for the number of subscripts, and an assignment expects a replace value to have jsonb type. There is also one functional difference between assignment via subscripting and assignment via jsonb_set(). When an original jsonb container is NULL, the subscripting replaces it with an empty jsonb and proceeds with an assignment. For the sake of code reuse, we rearrange some parts of jsonb functionality to allow the usage of the same functions for jsonb_set and assign subscripting operation. The original idea belongs to Oleg Bartunov. Catversion is bumped. Discussion: https://postgr.es/m/CA%2Bq6zcV8qvGcDXurwwgUbwACV86Th7G80pnubg42e-p9gsSf%3Dg%40mail.gmail.com Discussion: https://postgr.es/m/CA%2Bq6zcX3mdxGCgdThzuySwH-ApyHHM-G4oB1R0fn0j2hZqqkLQ%40mail.gmail.com Discussion: https://postgr.es/m/CA%2Bq6zcVDuGBv%3DM0FqBYX8DPebS3F_0KQ6OVFobGJPM507_SZ_w%40mail.gmail.com Discussion: https://postgr.es/m/CA%2Bq6zcVovR%2BXY4mfk-7oNk-rF91gH0PebnNfuUjuuDsyHjOcVA%40mail.gmail.com Author: Dmitry Dolgov Reviewed-by: Tom Lane, Arthur Zakirov, Pavel Stehule, Dian M Fay Reviewed-by: Andrew Dunstan, Chapman Flack, Merlin Moncure, Peter Geoghegan Reviewed-by: Alvaro Herrera, Jim Nasby, Josh Berkus, Victor Wagner Reviewed-by: Aleksander Alekseev, Robert Haas, Oleg Bartunov	2021-01-31 23:50:40 +03:00
Peter Geoghegan	dc43492e46	Remove unused _bt_delitems_delete() argument. The latestRemovedXid values used by nbtree deletion operations are determined by _bt_delitems_delete()'s caller, so there is no reason to pass a separate heapRel argument. Oversight in commit `d168b66682`.	2021-01-31 10:10:55 -08:00
Alexander Korotkov	0c4f355c6a	Fix parsing of complex morphs to tsquery When to_tsquery() or websearch_to_tsquery() meet a complex morph containing multiple words residing in adjacent positions, these words are connected with the OP_AND operator. That leads to surprising results. For instance, both websearch_to_tsquery('"pg_class pg"') and to_tsquery('pg_class <-> pg') produce the '( pg & class ) <-> pg' tsquery. This tsquery requires the 'pg' and 'class' words to reside at the same position and doesn't match to_tsvector('pg_class pg'). It appears to be ridiculous behavior, which needs to be fixed. This commit makes to_tsquery() and websearch_to_tsquery() connect words residing in adjacent positions with OP_PHRASE. Therefore, those words are now chained normally with other OP_PHRASE operators. The examples above now produce the 'pg <-> class <-> pg' tsquery, which matches to_tsvector('pg_class pg'). Another effect of this commit is that complex morph word positions now need to match the tsvector even if there is no surrounding OP_PHRASE. This behavior change generally looks like an improvement, but it makes this commit not backpatchable. Reported-by: Barry Pederson Bug: #16592 Discussion: https://postgr.es/m/16592-70b110ff9731c07d@postgresql.org Discussion: https://postgr.es/m/CAPpHfdv0EzVhf6CWfB1_TTZqXV_2Sn-jSY3zSd7ePH%3D-%2B1V2DQ%40mail.gmail.com Author: Alexander Korotkov Reviewed-by: Tom Lane, Neil Chen	2021-01-31 20:14:29 +03:00
Peter Eisentraut	dfb75e478c	Add primary keys and unique constraints to system catalogs For those system catalogs that have unique indexes, make a primary key and unique constraint, using ALTER TABLE ... PRIMARY KEY/UNIQUE USING INDEX. This can be helpful for GUI tools that look for a primary key, and it might in the future allow declaring foreign keys, for making schema diagrams. The constraint creation statements are automatically created by genbki.pl from DECLARE_UNIQUE_INDEX directives. To specify which one of the available unique indexes is the primary key, use the new directive DECLARE_UNIQUE_INDEX_PKEY instead. By convention, we usually make a catalog's OID column its primary key, if it has one. Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://www.postgresql.org/message-id/flat/dc5f44d9-5ec1-a596-0251-dadadcdede98@2ndquadrant.com	2021-01-30 19:44:29 +01:00
Peter Eisentraut	6aaaa76bb4	Allow GRANTED BY clause in normal GRANT and REVOKE statements The SQL standard allows a GRANTED BY clause on GRANT and REVOKE (privilege) statements that can specify CURRENT_USER or CURRENT_ROLE. In PostgreSQL, both of these are the default behavior. Since we already have all the parsing support for this for the GRANT (role) statement, we might as well add basic support for this for the privilege variant as well. This allows us to check off SQL feature T332. In the future, perhaps more interesting things could be done with this, too. Reviewed-by: Simon Riggs <simon@2ndquadrant.com> Discussion: https://www.postgresql.org/message-id/flat/f2feac44-b4c5-f38f-3699-2851d6a76dc9@2ndquadrant.com	2021-01-30 09:45:11 +01:00
Noah Misch	7da83415e5	Revive "snapshot too old" with wal_level=minimal and SET TABLESPACE. Given a permanent relation rewritten in the current transaction, the old_snapshot_threshold mechanism assumed the relation had never been subject to early pruning. Hence, a query could fail to report "snapshot too old" when the rewrite followed an early truncation. ALTER TABLE SET TABLESPACE is probably the only rewrite mechanism capable of exposing this bug. REINDEX sets indcheckxmin, avoiding the problem. CLUSTER has zeroed page LSNs since before old_snapshot_threshold existed, so old_snapshot_threshold has never cooperated with it. ALTER TABLE ... SET DATA TYPE makes the table look empty to every past snapshot, which is strictly worse. Back-patch to v13, where commit `c6b92041d3` broke this. Kyotaro Horiguchi and Noah Misch Discussion: https://postgr.es/m/20210113.160705.2225256954956139776.horikyota.ntt@gmail.com	2021-01-30 00:12:18 -08:00
Noah Misch	360bd2321b	Fix error with CREATE PUBLICATION, wal_level=minimal, and new tables. CREATE PUBLICATION has failed spuriously when applied to a permanent relation created or rewritten in the current transaction. Make the same change to another site having the same semantic intent; the second instance has no user-visible consequences. Back-patch to v13, where commit `c6b92041d3` broke this. Kyotaro Horiguchi Discussion: https://postgr.es/m/20210113.160705.2225256954956139776.horikyota.ntt@gmail.com	2021-01-30 00:11:38 -08:00
Noah Misch	8a54e12a38	Fix CREATE INDEX CONCURRENTLY for simultaneous prepared transactions. In a cluster having used CREATE INDEX CONCURRENTLY while having enabled prepared transactions, queries that use the resulting index can silently fail to find rows. Fix this for future CREATE INDEX CONCURRENTLY by making it wait for prepared transactions like it waits for ordinary transactions. This expands the VirtualTransactionId structure domain to admit prepared transactions. It may be necessary to reindex to recover from past occurrences. Back-patch to 9.5 (all supported versions). Andrey Borodin, reviewed (in earlier versions) by Tom Lane and Michael Paquier. Discussion: https://postgr.es/m/2E712143-97F7-4890-B470-4A35142ABC82@yandex-team.ru	2021-01-30 00:00:27 -08:00
Michael Paquier	24843297a9	Adjust comments of CheckRelationTableSpaceMove() and SetRelationTableSpace() `4c9c359`, that introduced those two functions, has been overoptimistic on the point that only ShareUpdateExclusiveLock would be required when moving a relation to a new tablespace. AccessExclusiveLock is a requirement, but ShareUpdateExclusiveLock may be used under specific conditions like REINDEX CONCURRENTLY where waits on past transactions make the operation safe even with a lower-level lock. The current code does only the former, so update the existing comments to reflect that. Once a REINDEX (TABLESPACE) is introduced, those comments would require an extra refresh to mention their new use case. While on it, fix an incorrect variable name. Per discussion with Álvaro Herrera. Discussion: https://postgr.es/m/20210127140741.GA14174@alvherre.pgsql	2021-01-29 13:59:18 +09:00
Thomas Munro	514b411a2b	Retire pg_standby. pg_standby was useful more than a decade ago, but now it is obsolete. It has been proposed that we retire it many times. Now seems like a good time to finally do it, because "waiting restore commands" are incompatible with a proposed recovery prefetching feature. Discussion: https://postgr.es/m/20201029024412.GP5380%40telsasoft.com Author: Justin Pryzby <pryzby@telsasoft.com> Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi> Reviewed-by: Peter Eisentraut <peter.eisentraut@enterprisedb.com> Reviewed-by: Michael Paquier <michael@paquier.xyz> Reviewed-by: Fujii Masao <masao.fujii@oss.nttdata.com>	2021-01-29 14:09:41 +13:00
Tom Lane	1046dbedde	Silence another gcc 11 warning. Per buildfarm and local experimentation, bleeding-edge gcc isn't convinced that the MemSet in reorder_function_arguments() is safe. Shut it up by adding an explicit check that pronargs isn't negative, and by changing MemSet to memset. (It appears that either change is enough to quiet the warning at -O2, but let's do both to be sure.)	2021-01-28 17:19:16 -05:00
Alvaro Herrera	6f5c8a8ec2	Remove bogus restriction from BEFORE UPDATE triggers In trying to protect the user from inconsistent behavior, commit `487e9861d0` "Enable BEFORE row-level triggers for partitioned tables" tried to prevent BEFORE UPDATE FOR EACH ROW triggers from moving the row from one partition to another. However, it turns out that the restriction is wrong in two ways: first, it fails spuriously, preventing valid situations from working, as in bug #16794; and second, it doesn't protect from any misbehavior, because tuple routing would cope anyway. Fix by removing that restriction. We keep the same restriction on BEFORE INSERT FOR EACH ROW triggers, though. It is valid and useful there. In the future we could remove it by having tuple reroute work for inserts as it does for updates. Backpatch to 13. Author: Álvaro Herrera <alvherre@alvh.no-ip.org> Reported-by: Phillip Menke <pg@pmenke.de> Discussion: https://postgr.es/m/16794-350a655580fbb9ae@postgresql.org	2021-01-28 16:56:07 -03:00
Tom Lane	1d9351a87c	Fix hash partition pruning with asymmetric partition sets. perform_pruning_combine_step() was not taught about the number of partition indexes used in hash partitioning; more embarrassingly, get_matching_hash_bounds() also had it wrong. These errors are masked in the common case where all the partitions have the same modulus and no partition is missing. However, with missing or unequal-size partitions, we could erroneously prune some partitions that need to be scanned, leading to silently wrong query answers. While a minimal-footprint fix for this could be to export get_partition_bound_num_indexes and make the incorrect functions use it, I'm of the opinion that that function should never have existed in the first place. It's not reasonable data structure design that PartitionBoundInfoData lacks any explicit record of the length of its indexes[] array. Perhaps that was all right when it could always be assumed equal to ndatums, but something should have been done about it as soon as that stopped being true. Putting in an explicit "nindexes" field makes both partition_bounds_equal() and partition_bounds_copy() simpler, safer, and faster than before, and removes explicit knowledge of the number-of-partition-indexes rules from some other places too. This change also makes get_hash_partition_greatest_modulus obsolete. I left that in place in case any external code uses it, but no core code does anymore. Per bug #16840 from Michał Albrycht. Back-patch to v11 where the hash partitioning code came in. (In the back branches, add the new field at the end of PartitionBoundInfoData to minimize ABI risks.) Discussion: https://postgr.es/m/16840-571a22976f829ad4@postgresql.org	2021-01-28 13:41:55 -05:00
Heikki Linnakangas	6c5576075b	Add direct conversion routines between EUC_TW and Big5. Conversions between EUC_TW and Big5 were previously implemented by converting the whole input to MIC first, and then from MIC to the target encoding. Implement functions to convert directly between the two. The reason to do this now is that I'm working on a patch that will change the conversion function signature so that if the input is invalid, we convert as much as we can and return the number of bytes successfully converted. That's not possible if we use an intermediary format, because if an error happens in the intermediary -> final conversion, we lose track of the location of the invalid character in the original input. Avoiding the intermediate step makes the conversions faster, too. Reviewed-by: John Naylor Discussion: https://www.postgresql.org/message-id/b9e3167f-f84b-7aa4-5738-be578a4db924%40iki.fi	2021-01-28 14:53:03 +02:00
Heikki Linnakangas	b80e10638e	Add mbverifystr() functions specific to each encoding. This makes pg_verify_mbstr() function faster, by allowing more efficient encoding-specific implementations. All the implementations included in this commit are pretty naive, they just call the same encoding-specific verifychar functions that were used previously, but that already gives a performance boost because the tight character-at-a-time loop is simpler. Reviewed-by: John Naylor Discussion: https://www.postgresql.org/message-id/e7861509-3960-538a-9025-b75a61188e01@iki.fi	2021-01-28 14:40:07 +02:00
Andrew Gierth	a3367aa3c4	Don't add bailout adjustment for non-strict deserialize calls. When building aggregate expression steps, strict checks need a bailout jump for when a null value is encountered, so there is a list of steps that require later adjustment. Adding entries to that list for steps that aren't actually strict would be harmless, except that there is an Assert which catches them. This leads to spurious errors on asserts builds, for data sets that trigger parallel aggregation of an aggregate with a non-strict deserialization function (no such aggregates exist in the core system). Repair by not adding the adjustment entry when it's not needed. Backpatch back to 11 where the code was introduced. Per a report from Darafei (Komzpa) of the PostGIS project; analysis and patch by me. Discussion: https://postgr.es/m/87mty7peb3.fsf@news-spur.riddles.org.uk	2021-01-28 10:53:10 +00:00
Michael Paquier	f854c69a5b	Refactor SQL functions of SHA-2 in cryptohashfuncs.c The same code pattern was repeated four times when compiling a SHA-2 hash. This refactoring has the advantage to issue a compilation warning if a new value is added to pg_cryptohash_type, so as anybody doing an addition in this area would need to consider if support for a new SQL function is needed or not. Author: Sehrope Sarkuni, Michael Paquier Discussion: https://postgr.es/m/YA7DvLRn2xnTgsMc@paquier.xyz	2021-01-28 16:13:26 +09:00
Peter Geoghegan	e19594c5c0	Reduce the default value of vacuum_cost_page_miss. When commit `f425b605` introduced cost based vacuum delays back in 2004, the defaults reflected then-current trends in hardware, as well as certain historical limitations in PostgreSQL. There have been enormous improvements in both areas since that time. The cost limit GUC defaults finally became much more representative of current trends following commit `cbccac37`, which decreased autovacuum_vacuum_cost_delay's default by 10x for PostgreSQL 12 (it went from 20ms to only 2ms). The relative costs have shifted too. This should also be accounted for by the defaults. More specifically, the relative importance of avoiding dirtying pages within VACUUM has greatly increased, primarily due to main memory capacity scaling and trends in flash storage. Within Postgres itself, improvements like sequential access during index vacuuming (at least in nbtree and GiST indexes) have also been contributing factors. To reflect all this, decrease the default of vacuum_cost_page_miss to 2. Since the default of vacuum_cost_page_dirty remains 20, dirtying a page is now considered 10x "costlier" than a page miss by default. Author: Peter Geoghegan <pg@bowt.ie> Discussion: https://postgr.es/m/CAH2-WzmLPFnkWT8xMjmcsm7YS3+_Qi3iRWAb2+_Bc8UhVyHfuA@mail.gmail.com	2021-01-27 15:11:13 -08:00
Robert Haas	69059d3b2f	In TrimCLOG(), don't reset XactCtl->shared->latest_page_number. Since the CLOG page number is not recorded directly in the checkpoint record, we have to use ShmemVariableCache->nextXid to figure out the latest CLOG page number at the start of recovery. However, as recovery progresses, replay of CLOG/EXTEND records will update our notion of the latest page number, and we should rely on that being accurate rather than recomputing the value based on an updated notion of nextXid. ShmemVariableCache->nextXid is only an approximation during recovery anyway, whereas CLOG/EXTEND records are an authoritative representation of how the SLRU has been updated. Commit `0fcc2decd4` makes this simplification possible, as before that change clog_redo() might have injected a bogus value here, and we'd want to get rid of that before entering normal running. Patch by me, reviewed by Heikki Linnakangas. Discussion: http://postgr.es/m/CA+TgmoZYig9+AQodhF5sRXuKkJ=RgFDugLr3XX_dz_F-p=TwTg@mail.gmail.com	2021-01-27 15:52:34 -05:00
Robert Haas	0fcc2decd4	In clog_redo(), don't set XactCtl->shared->latest_page_number. The comment is no longer accurate, and hasn't been entirely accurate since Hot Standby was introduced. The original idea here was that StartupCLOG() wouldn't be called until the end of recovery and therefore this value would be uninitialized when this code is reached, but Hot Standby made that true only when hot_standby=off, and commit `1f113abdf8` means that this value is now always initialized before replay even starts. The original purpose of this code was to bypass the sanity check in SimpleLruTruncate(), which will no longer occur: now, if something is wrong, that sanity check might trip during recovery. That's probably a good thing, because in the current code base latest_page_number should always be initialized and therefore we expect that the sanity check should pass. If it doesn't, something has gone wrong, and complaining about it is appropriate. Patch by me, reviewed by Heikki Linnakangas. Discussion: http://postgr.es/m/CA+TgmoZYig9+AQodhF5sRXuKkJ=RgFDugLr3XX_dz_F-p=TwTg@mail.gmail.com	2021-01-27 13:11:30 -05:00
Robert Haas	1f113abdf8	Move StartupCLOG() calls to just after we initialize ShmemVariableCache. Previously, the hot_standby=off code path did this at end of recovery, while the hot_standby=on code path did it at the beginning of recovery. It's better to do this in only one place because (a) it's simpler, (b) StartupCLOG() is trivial so trying to postpone the work isn't useful, and (c) this will make it possible to simplify some other logic. Patch by me, reviewed by Heikki Linnakangas. Discussion: http://postgr.es/m/CA+TgmoZYig9+AQodhF5sRXuKkJ=RgFDugLr3XX_dz_F-p=TwTg@mail.gmail.com	2021-01-27 12:20:46 -05:00
Peter Geoghegan	e42b3c3bd6	Fix GiST index deletion assert issue. Avoid calling heap_index_delete_tuples() with an empty deltids array to avoid an assertion failure. This issue was arguably an oversight in commit `b5f58cf2`, though the failing assert itself was added by my recent commit `d168b666`. No backpatch, though, since the oversight is harmless in the back branches. Author: Peter Geoghegan <pg@bowt.ie> Reported-By: Jaime Casanova <jcasanov@systemguards.com.ec> Discussion: https://postgr.es/m/CAJKUy5jscES84n3puE=sYngyF+zpb4wv8UMtuLnLPv5z=6yyNw@mail.gmail.com	2021-01-26 23:24:37 -08:00
Michael Paquier	4c9c359d38	Refactor code in tablecmds.c to check and process tablespace moves Two code paths of tablecmds.c (for relations with storage and without storage) use the same logic to check if the move of a relation to a new tablespace is allowed or not and to update pg_class.reltablespace and pg_class.relfilenode. A potential TABLESPACE clause for REINDEX, CLUSTER and VACUUM FULL needs similar checks to make sure that nothing is moved around in illegal ways (no mapped relations, shared relations only in pg_global, no move of temp tables owned by other backends). This reorganizes the existing code of ALTER TABLE so as all this logic is controlled by two new routines that can be reused for the other commands able to move relations across tablespaces, limiting the number of code paths in need of the same protections. This also removes some code that was duplicated for tables with and without storage for ALTER TABLE. Author: Alexey Kondratov, Michael Paquier Discussion: https://postgr.es/m/YA+9mAMWYLXJMVPL@paquier.xyz	2021-01-27 11:54:16 +09:00
Tom Lane	d5a83d79c9	Rethink recently-added SPI interfaces. SPI_execute_with_receiver and SPI_cursor_parse_open_with_paramlist are new in v14 (cf. commit `2f48ede08`). Before they can get out the door, let's change their APIs to follow the practice recently established by SPI_prepare_extended etc: shove all optional arguments into a struct that callers are supposed to pre-zero. The hope is to allow future addition of more options without either API breakage or a continuing proliferation of new SPI entry points. With that in mind, choose slightly more generic names for them: SPI_execute_extended and SPI_cursor_parse_open respectively. Discussion: https://postgr.es/m/CAFj8pRCLPdDAETvR7Po7gC5y_ibkn_-bOzbeJb39WHms01194Q@mail.gmail.com	2021-01-26 16:37:12 -05:00
Tom Lane	ee895a655c	Improve performance of repeated CALLs within plpgsql procedures. This patch essentially is cleaning up technical debt left behind by the original implementation of plpgsql procedures, particularly commit `d92bc83c4`. That patch (or more precisely, follow-on patches fixing its worst bugs) forced us to re-plan CALL and DO statements each time through, if we're in a non-atomic context. That wasn't for any fundamental reason, but just because use of a saved plan requires having a ResourceOwner to hold a reference count for the plan, and we had no suitable resowner at hand, nor would the available APIs support using one if we did. While it's not that expensive to create a "plan" for CALL/DO, the cycles do add up in repeated executions. This patch therefore makes the following API changes: * GetCachedPlan/ReleaseCachedPlan are modified to let the caller specify which resowner to use to pin the plan, rather than forcing use of CurrentResourceOwner. * spi.c gains a "SPI_execute_plan_extended" entry point that lets callers say which resowner to use to pin the plan. This borrows the idea of an options struct from the recently added SPI_prepare_extended, hopefully allowing future options to be added without more API breaks. This supersedes SPI_execute_plan_with_paramlist (which I've marked deprecated) as well as SPI_execute_plan_with_receiver (which is new in v14, so I just took it out altogether). * I also took the opportunity to remove the crude hack of letting plpgsql reach into SPI private data structures to mark SPI plans as "no_snapshot". It's better to treat that as an option of SPI_prepare_extended. Now, when running a non-atomic procedure or DO block that contains any CALL or DO commands, plpgsql creates a ResourceOwner that will be used to pin the plans of the CALL/DO commands. (In an atomic context, we just use CurrentResourceOwner, as before.) 
Having done this, we can just save CALL/DO plans normally, whether or not they are used across transaction boundaries. This seems to be good for something like 2X speedup of a CALL of a trivial procedure with a few simple argument expressions. By restricting the creation of an extra ResourceOwner like this, there's essentially zero penalty in cases that can't benefit. Pavel Stehule, with some further hacking by me Discussion: https://postgr.es/m/CAFj8pRCLPdDAETvR7Po7gC5y_ibkn_-bOzbeJb39WHms01194Q@mail.gmail.com	2021-01-25 22:28:29 -05:00
Andres Freund	55ef8555f0	Fix two typos in snapbuild.c. Reported-by: Heikki Linnakangas <hlinnaka@iki.fi> Discussion: https://postgr.es/m/c94be044-818f-15e3-1ad3-7a7ae2dfed0a@iki.fi	2021-01-25 12:15:10 -08:00
Tom Lane	07d46fceb4	Fix broken ruleutils support for function TRANSFORM clauses. I chanced to notice that this dumped core due to a faulty Assert. To add insult to injury, the output has been misformatted since v11. Obviously we need some regression testing here. Discussion: https://postgr.es/m/d1cc628c-3953-4209-957b-29427acc38c8@www.fastmail.com	2021-01-25 13:03:43 -05:00
Robert Haas	d18e75664a	Remove CheckpointLock. Up until now, we've held this lock when performing a checkpoint or restartpoint, but commit `076a055acf` back in 2004 and commit `7e48b77b1c` from 2009, taken together, have removed all need for this. In the present code, there's only ever one process entitled to attempt a checkpoint: either the checkpointer, during normal operation, or the postmaster, during single-user operation. So, we don't need the lock. One possible concern in making this change is that it means that a substantial amount of code where HOLD_INTERRUPTS() was previously in effect due to the preceding LWLockAcquire() will now be running without that. This could mean that ProcessInterrupts() gets called in places from which it didn't before. However, this seems unlikely to do very much, because the checkpointer doesn't have any signal mapped to die(), so it's not clear how, for example, ProcDiePending = true could happen in the first place. Similarly with ClientConnectionLost and recovery conflicts. Also, if there are any such problems, we might want to fix them rather than reverting this, since running lots of code with interrupt handling suspended is generally bad. Patch by me, per an inquiry by Amul Sul. Review by Tom Lane and Michael Paquier. Discussion: http://postgr.es/m/CAAJ_b97XnBBfYeSREDJorFsyoD1sHgqnNuCi=02mNQBUMnA=FA@mail.gmail.com	2021-01-25 12:34:38 -05:00
David Rowley	16dfe253e3	Fix hypothetical bug in heap backward scans Both heapgettup() and heapgettup_pagemode() incorrectly set the first page to scan in a backward scan in which the number of pages to scan was specified by heap_setscanlimits(). The code incorrectly started the scan at the end of the relation when startBlk was 0, or otherwise at startBlk - 1, neither of which is correct when only scanning a subset of pages. The fix here checks if heap_setscanlimits() has changed the number of pages to scan and if so we set the first page to scan as the final page in the specified range during backward scans. Proper adjustment of this code was forgotten when heap_setscanlimits() was added in `7516f5259` back in 9.5. In practice, nowhere in core code performs backward scans after having used heap_setscanlimits(). However, it is possible an extension uses the heap functions in this way, hence backpatch. An upcoming patch does use heap_setscanlimits() with backward scans, so this must be fixed before that can go in. Author: David Rowley Discussion: https://postgr.es/m/CAApHDvpGc9h0_oVD2CtgBcxCS1N-qDYZSeBRnUh+0CWJA9cMaA@mail.gmail.com Backpatch-through: 9.5, all supported versions	2021-01-25 19:52:18 +13:00
Amit Kapila	40ab64c1ec	Fix ALTER PUBLICATION...DROP TABLE behavior. Commit `69bd60672` fixed the initialization of streamed transactions for RelationSyncEntry. It forgot to initialize the publication actions while invalidating the RelationSyncEntry due to which even though the relation is dropped from a particular publication we still publish its changes. Fix it by initializing pubactions when entry got invalidated. Author: Japin Li and Bharath Rupireddy Reviewed-by: Amit Kapila Discussion: https://postgr.es/m/CALj2ACV+0UFpcZs5czYgBpujM9p0Hg1qdOZai_43OU7bqHU_xw@mail.gmail.com	2021-01-25 07:39:29 +05:30
Tomas Vondra	39b66a91bd	Fix COPY FREEZE with CLOBBER_CACHE_ALWAYS This adds code omitted from commit `7db0cd2145` by accident, which had two consequences. Firstly, only rows inserted by heap_multi_insert were frozen as expected when running COPY FREEZE, while heap_insert left rows unfrozen. That however includes rows in TOAST tables, so a lot of data might have been left unfrozen. Secondly, a page might have been left partially empty after relcache invalidation. This addresses both of those issues. Discussion: https://postgr.es/m/CABOikdN-ptGv0mZntrK2Q8OtfUuAjqaYMGmkdU1dCKFtUxVLrg@mail.gmail.com	2021-01-24 01:08:11 +01:00
Tom Lane	7cd9765f9b	Re-allow DISTINCT in pl/pgsql expressions. I'd omitted this from the grammar in commit `c9d529848`, figuring that it wasn't worth supporting. However we already have one complaint, so it seems that judgment was wrong. It doesn't require a huge amount of code, so add it back. (I'm still drawing the line at UNION/INTERSECT/EXCEPT though: those'd require an unreasonable amount of grammar refactoring, and the single-result-row restriction makes them near useless anyway.) Also rethink the documentation: this behavior is a property of all pl/pgsql expressions, not just assignments. Discussion: https://postgr.es/m/20210122134106.e94c5cd7@mail.verfriemelt.org	2021-01-22 16:26:22 -05:00
Peter Eisentraut	09418bed67	Remove bogus tracepoint Calls to LWLockWaitForVar() fired the TRACE_POSTGRESQL_LWLOCK_ACQUIRE tracepoint, but LWLockWaitForVar() never actually acquires the LWLock. (Probably a copy/paste bug in 68a2e52bbaf.) Remove it. Author: Craig Ringer <craig.ringer@enterprisedb.com> Discussion: https://www.postgresql.org/message-id/flat/CAGRY4nxJo+-HCC2i5H93ttSZ4gZO-FSddCwvkb-qAfQ1zdXd1w@mail.gmail.com	2021-01-22 11:58:21 +01:00
Michael Paquier	af0e79c8f4	Move SSL information callback earlier to capture more information The callback for retrieving state change information during connection setup was only installed when the connection was mostly set up, and thus didn't provide much information and missed all the details related to the handshake. This also extends the callback with SSL_state_string_long() to print more information about the state change within the SSL object handled. While there, fix some comments which were incorrectly referring to the callback and its previous location in fe-secure.c. Author: Daniel Gustafsson Discussion: https://postgr.es/m/232CF476-94E1-42F1-9408-719E2AEC5491@yesql.se	2021-01-22 09:26:27 +09:00
Tom Lane	55dc86eca7	Fix pull_varnos' miscomputation of relids set for a PlaceHolderVar. Previously, pull_varnos() took the relids of a PlaceHolderVar as being equal to the relids in its contents, but that fails to account for the possibility that we have to postpone evaluation of the PHV due to outer joins. This could result in a malformed plan. The known cases end up triggering the "failed to assign all NestLoopParams to plan nodes" sanity check in createplan.c, but other symptoms may be possible. The right value to use is the join level we actually intend to evaluate the PHV at. We can get that from the ph_eval_at field of the associated PlaceHolderInfo. However, there are some places that call pull_varnos() before the PlaceHolderInfos have been created; in that case, fall back to the conservative assumption that the PHV will be evaluated at its syntactic level. (In principle this might result in missing some legal optimization, but I'm not aware of any cases where it's an issue in practice.) Things are also a bit ticklish for calls occurring during deconstruct_jointree(), but AFAICS the ph_eval_at fields should have reached their final values by the time we need them. The main problem in making this work is that pull_varnos() has no way to get at the PlaceHolderInfos. We can fix that easily, if a bit tediously, in HEAD by passing it the planner "root" pointer. In the back branches that'd cause an unacceptable API/ABI break for extensions, so leave the existing entry points alone and add new ones with the additional parameter. (If an old entry point is called and encounters a PHV, it'll fall back to using the syntactic level, again possibly missing some valid optimization.) Back-patch to v12. The computation is surely also wrong before that, but it appears that we cannot reach a bad plan thanks to join order restrictions imposed on the subquery that the PlaceHolderVar came from. 
The error only became reachable when commit `4be058fe9` allowed trivial subqueries to be collapsed out completely, eliminating their join order restrictions. Per report from Stephan Springl. Discussion: https://postgr.es/m/171041.1610849523@sss.pgh.pa.us	2021-01-21 15:37:23 -05:00
Tomas Vondra	920f853dc9	Fix initialization of FDW batching in ExecInitModifyTable ExecInitModifyTable has to initialize batching for all result relations, not just the first one. Furthermore, when junk filters were necessary, the pointer pointed past the mtstate->resultRelInfo array. Per reports from multiple non-x86 animals (florican, locust, ...). Discussion: https://postgr.es/m/20200628151002.7x5laxwpgvkyiu3q@development	2021-01-21 03:34:32 +01:00
Tomas Vondra	b663a41363	Implement support for bulk inserts in postgres_fdw Extends the FDW API to allow batching inserts into foreign tables. That is usually much more efficient than inserting individual rows, due to high latency for each round-trip to the foreign server. It was possible to implement something similar in the regular FDW API, but it was inconvenient and there were issues with reporting the number of actually inserted rows etc. This extends the FDW API with two new functions: * GetForeignModifyBatchSize - allows the FDW picking optimal batch size * ExecForeignBatchInsert - inserts a batch of rows at once Currently, only INSERT queries support batching. Support for DELETE and UPDATE may be added in the future. This also implements batching for postgres_fdw. The batch size may be specified using "batch_size" option both at the server and table level. The initial patch version was written by me, but it was rewritten and improved in many ways by Takayuki Tsunakawa. Author: Takayuki Tsunakawa Reviewed-by: Tomas Vondra, Amit Langote Discussion: https://postgr.es/m/20200628151002.7x5laxwpgvkyiu3q@development	2021-01-20 23:57:27 +01:00
Heikki Linnakangas	6b4d3046f4	Fix bug in detecting concurrent page splits in GiST insert In commit `9eb5607e69`, I got the condition on checking for split or deleted page wrong: I used && instead of ||. The comment correctly said "concurrent split _or_ deletion". As a result, GiST insertion could miss a concurrent split, and insert to wrong page. Duncan Sands demonstrated this with a test script that did a lot of concurrent inserts. Backpatch to v12, where this was introduced. REINDEX is required to fix indexes that were affected by this bug. Backpatch-through: 12 Reported-by: Duncan Sands Discussion: https://www.postgresql.org/message-id/a9690483-6c6c-3c82-c8ba-dc1a40848f11%40deepbluecap.com	2021-01-20 11:58:03 +02:00
Michael Paquier	21378e1fef	Fix ALTER DEFAULT PRIVILEGES with duplicated objects Specifying duplicated objects in this command would lead to unique constraint violations in pg_default_acl or "tuple already updated by self" errors. Similarly to GRANT/REVOKE, increment the command ID after each subcommand processing to allow this case to work transparently. A regression test is added by tweaking one of the existing queries of privileges.sql to stress this case. Reported-by: Andrus Author: Michael Paquier Reviewed-by: Álvaro Herrera Discussion: https://postgr.es/m/ae2a7dc1-9d71-8cba-3bb9-e4cb7eb1f44e@hot.ee Backpatch-through: 9.5	2021-01-20 11:38:17 +09:00
Tom Lane	a0efda88a6	Remove faulty support for MergeAppend plan with WHERE CURRENT OF. Somebody extended search_plan_tree() to treat MergeAppend exactly like Append, which is 100% wrong, because unlike Append we can't assume that only one input node is actively returning tuples. Hence a cursor using a MergeAppend across a UNION ALL or inheritance tree could falsely match a WHERE CURRENT OF query at a row that isn't actually the cursor's current output row, but coincidentally has the same TID (in a different table) as the current output row. Delete the faulty code; this means that such a case will now return an error like 'cursor "foo" is not a simply updatable scan of table "bar"', instead of silently misbehaving. Users should not find that surprising though, as the same cursor query could have failed that way already depending on the chosen plan. (It would fail like that if the sort were done with an explicit Sort node instead of MergeAppend.) Expand the clearly-inadequate commentary to be more explicit about what this code is doing, in hopes of forestalling future mistakes. It's been like this for awhile, so back-patch to all supported branches. Discussion: https://postgr.es/m/482865.1611075182@sss.pgh.pa.us	2021-01-19 13:25:33 -05:00
Amit Kapila	ed43677e20	pgindent worker.c. This is a leftover from commit `0926e96c49`. Changing this separately because this file is being modified for upcoming patch logical replication of 2PC. Author: Peter Smith Discussion: https://postgr.es/m/CAHut+Ps+EgG8KzcmAyAgBUi_vuTps6o9ZA8DG6SdnO0-YuOhPQ@mail.gmail.com	2021-01-19 08:10:13 +05:30
Tom Lane	60661bbf2d	Avoid crash with WHERE CURRENT OF and a custom scan plan. execCurrent.c's search_plan_tree() assumed that ForeignScanStates and CustomScanStates necessarily have a valid ss_currentRelation. This is demonstrably untrue for postgres_fdw's remote join and remote aggregation plans, and non-leaf custom scans might not have an identifiable scan relation either. Avoid crashing by ignoring such nodes when the field is null. This solution will lead to errors like 'cursor "foo" is not a simply updatable scan of table "bar"' in cases where maybe we could have allowed WHERE CURRENT OF to work. That's not an issue for postgres_fdw's usages, since joins or aggregations would render WHERE CURRENT OF invalid anyway. But an otherwise-transparent upper level custom scan node might find this annoying. When and if someone cares to expend work on such a scenario, we could invent a custom-scan-provider callback to determine what's safe. Report and patch by David Geier, commentary by me. It's been like this for awhile, so back-patch to all supported branches. Discussion: https://postgr.es/m/0253344d-9bdd-11c4-7f0d-d88c02cd7991@swarm64.com	2021-01-18 18:32:30 -05:00
Tom Lane	3fd80c728d	Narrow the scope of a local variable. This is better style and more symmetrical with the other if-branch. This likely should have been included in `9de77b545` (which created the opportunity), but it was overlooked. Japin Li Discussion: https://postgr.es/m/MEYP282MB16699FA4A7CD57EB250E871FB6A40@MEYP282MB1669.AUSP282.PROD.OUTLOOK.COM	2021-01-18 15:55:01 -05:00
Tom Lane	a6cf3df4eb	Add bytea equivalents of ltrim() and rtrim(). We had bytea btrim() already, but for some reason not the other two. Joel Jacobson Discussion: https://postgr.es/m/d10cd5cd-a901-42f1-b832-763ac6f7ff3a@www.fastmail.com	2021-01-18 15:11:32 -05:00
Robert Haas	a3ed4d1efe	Allow for error or refusal while absorbing a ProcSignalBarrier. Previously, the per-barrier-type functions tasked with absorbing them were expected to always succeed and never throw an error. However, that's a bit inconvenient. Further study has revealed that there are realistic cases where it might not be possible to absorb a ProcSignalBarrier without terminating the transaction, or even the whole backend. Similarly, for some barrier types, there might be other reasons where it's not reasonably possible to absorb the barrier at certain points in the code, so provide a way for a per-barrier-type function to reject absorbing the barrier. Unfortunately, there's still no committed code making use of this infrastructure; hopefully, we'll get there. :-( Patch by me, reviewed by Andres Freund and Amul Sul. Discussion: http://postgr.es/m/20200908182005.xya7wetdh3pndzim@alap3.anarazel.de Discussion: http://postgr.es/m/CA+Tgmob56Pk1-5aTJdVPCWFHon7me4M96ENpGe9n_R4JUjjhZA@mail.gmail.com	2021-01-18 12:09:52 -05:00
Peter Eisentraut	15251c0a60	Pause recovery for insufficient parameter settings When certain parameters are changed on a physical replication primary, this is communicated to standbys using the XLOG_PARAMETER_CHANGE WAL record. The standby then checks whether its own settings are at least as big as the ones on the primary. If not, the standby shuts down with a fatal error. This patch changes this behavior for hot standbys to pause recovery at that point instead. That allows read traffic on the standby to continue while database administrators figure out next steps. When recovery is unpaused, the server shuts down (as before). The idea is to fix the parameters while recovery is paused and then restart when there is a maintenance window. Reviewed-by: Sergei Kornilov <sk@zsrv.org> Discussion: https://www.postgresql.org/message-id/flat/4ad69a4c-cc9b-0dfe-0352-8b1b0cd36c7b@2ndquadrant.com	2021-01-18 09:04:04 +01:00
Michael Paquier	a3dc926009	Refactor option handling of CLUSTER, REINDEX and VACUUM This continues the work done in `b5913f6`. All the options of those commands are changed to use hex values rather than enums to reduce the risk of compatibility bugs when introducing new options. Each option set is moved into a new structure that can be extended with more non-boolean options (this was already the case of VACUUM). The code of REINDEX is restructured so as manual REINDEX commands go through a single routine from utility.c, like VACUUM, to ease the allocation handling of option parameters when a command needs to go through multiple transactions. This can be used as a base infrastructure for future patches related to those commands, including reindex filtering and tablespace support. Per discussion with people mentioned below, as well as Alvaro Herrera and Peter Eisentraut. Author: Michael Paquier, Justin Pryzby Reviewed-by: Alexey Kondratov, Justin Pryzby Discussion: https://postgr.es/m/X8riynBLwxAD9uKk@paquier.xyz	2021-01-18 14:03:10 +09:00
Tomas Vondra	7db0cd2145	Set PD_ALL_VISIBLE and visibility map bits in COPY FREEZE Make sure COPY FREEZE marks the pages as PD_ALL_VISIBLE and updates the visibility map. Until now we only marked individual tuples as frozen, but page-level flags were not updated, so the first VACUUM after the COPY FREEZE had to rewrite the whole table. This is a fairly old patch, and multiple people worked on it. The first version was written by Jeff Janes, and then reworked by Pavan Deolasee and Anastasia Lubennikova. Author: Anastasia Lubennikova, Pavan Deolasee, Jeff Janes Reviewed-by: Kuntal Ghosh, Jeff Janes, Tomas Vondra, Masahiko Sawada, Andres Freund, Ibrar Ahmed, Robert Haas, Tatsuro Ishii, Darafei Praliaskouski Discussion: https://postgr.es/m/CABOikdN-ptGv0mZntrK2Q8OtfUuAjqaYMGmkdU1dCKFtUxVLrg@mail.gmail.com Discussion: https://postgr.es/m/CAMkU%3D1w3osJJ2FneELhhNRLxfZitDgp9FPHee08NT2FQFmz_pQ%40mail.gmail.com	2021-01-17 22:28:26 +01:00
Magnus Hagander	960869da08	Add pg_stat_database counters for sessions and session time This adds counters for the number of sessions, the different kinds of session termination, and timers for how much time is spent in active vs idle in a database to pg_stat_database. Internally this also renames the parameter "force" to disconnect. This was the only use case for the parameter before, so repurposing it to this more narrow use case makes things cleaner than inventing something new. Author: Laurenz Albe Reviewed-By: Magnus Hagander, Soumyadeep Chakraborty, Masahiro Ikeda Discussion: https://postgr.es/m/b07e1f9953701b90c66ed368656f2aef40cac4fb.camel@cybertec.at	2021-01-17 13:52:31 +01:00
Noah Misch	6db992833c	Prevent excess SimpleLruTruncate() deletion. Every core SLRU wraps around. With the exception of pg_notify, the wrap point can fall in the middle of a page. Account for this in the PagePrecedes callback specification and in SimpleLruTruncate()'s use of said callback. Update each callback implementation to fit the new specification. This changes SerialPagePrecedesLogically() from the style of asyncQueuePagePrecedes() to the style of CLOGPagePrecedes(). (Whereas pg_clog and pg_serial share a key space, pg_serial is nothing like pg_notify.) The bug fixed here has the same symptoms and user followup steps as `592a589a04`. Back-patch to 9.5 (all supported versions). Reviewed by Andrey Borodin and (in earlier versions) by Tom Lane. Discussion: https://postgr.es/m/20190202083822.GC32531@gust.leadboat.com	2021-01-16 12:21:35 -08:00
Amit Kapila	c95765f476	Remove unnecessary pstrdup in fetch_table_list. The result of TextDatumGetCString is already palloc'ed so we don't need to allocate memory for it again. We decide not to backpatch it as there doesn't seem to be any case where it can create a meaningful leak. Author: Zhijie Hou Reviewed-by: Daniel Gustafsson Discussion: https://postgr.es/m/229fed2eb8c54c71a96ccb99e516eb12@G08CNEXMBPEKD05.g08.fujitsu.local	2021-01-16 10:15:32 +05:30
Tomas Vondra	c9a0dc3486	Disallow CREATE STATISTICS on system catalogs Add a check that CREATE STATISTICS does not add extended statistics on system catalogs, similarly to indexes etc. It can be overridden using the allow_system_table_mods GUC. This bug has existed since `7b504eb282`, which added extended statistics, so backpatch all the way back to PostgreSQL 10. Author: Tomas Vondra Reported-by: Dean Rasheed Backpatch-through: 10 Discussion: https://postgr.es/m/CAEZATCXAPrrOKwEsyZKQ4uzzJQWBCt6QAvOcgqRGdWwT1zb%2BrQ%40mail.gmail.com	2021-01-15 23:31:22 +01:00
Alvaro Herrera	f9900df5f9	Avoid spurious wait in concurrent reindex This is like commit `c98763bf51`, but for REINDEX CONCURRENTLY. To wit: this flag indicates that the current process is safe to ignore for the purposes of waiting for other snapshots, when doing CREATE INDEX CONCURRENTLY and REINDEX CONCURRENTLY. This helps two processes doing either of those things not deadlock, and also avoids spurious waits. Author: Álvaro Herrera <alvherre@alvh.no-ip.org> Reviewed-by: Dmitry Dolgov <9erthalion6@gmail.com> Reviewed-by: Hamid Akhtar <hamid.akhtar@gmail.com> Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com> Discussion: https://postgr.es/m/20201130195439.GA24598@alvherre.pgsql	2021-01-15 10:31:42 -03:00
Fujii Masao	2ad78a87f0	Fix calculation of how much shared memory is required to store a TOC. Commit `ac883ac453` refactored shm_toc_estimate() but changed its calculation of shared memory size for TOC incorrectly. Previously this could cause too large memory to be allocated. Back-patch to v11 where the bug was introduced. Author: Takayuki Tsunakawa Discussion: https://postgr.es/m/TYAPR01MB2990BFB73170E2C4921E2C4DFEA80@TYAPR01MB2990.jpnprd01.prod.outlook.com	2021-01-15 12:44:17 +09:00
Michael Paquier	5ae1572993	Fix O(N^2) stat() calls when recycling WAL segments The counter tracking the last segment number recycled was getting initialized when recycling one single segment, while it should be used across a full cycle of segments recycled to prevent useless checks related to entries already recycled. This performance issue has been introduced by `b2a5545`, and it was first implemented in `61b86142`. No backpatch is done per the lack of field complaints. Reported-by: Andres Freund, Thomas Munro Author: Michael Paquier Reviewed-By: Andres Freund Discussion: https://postgr.es/m/20170621211016.eln6cxxp3jrv7m4m@alap3.anarazel.de Discussion: https://postgr.es/m/CA+hUKG+DRiF9z1_MU4fWq+RfJMxP7zjoptfcmuCFPeO4JM2iVg@mail.gmail.com	2021-01-15 10:33:13 +09:00

21940 Commits