postgresql

Commit Graph

Author	SHA1	Message	Date
Tomas Vondra	a5f002ad9a	Use correct spelling of statistics kind A couple error messages and comments used 'statistic kind', not the correct 'statistics kind'. Fix and backpatch all the way back to 10, where extended statistics were introduced. Backpatch-through: 10	2021-03-23 05:01:35 +01:00
Fujii Masao	1e3e8b51bd	Change the type of WalReceiverWaitStart wait event from Client to IPC. Previously the type of this wait event was Client. But while this wait event is being reported, walreceiver process is waiting for the startup process to set initial data for streaming replication. It's not waiting for any activity on a socket connected to a user application or walsender. So this commit changes the type for WalReceiverWaitStart wait event to IPC. Author: Fujii Masao Reviewed-by: Kyotaro Horiguchi Discussion: https://postgr.es/m/cdacc27c-37ff-f1a4-20e2-ce19933abfcc@oss.nttdata.com	2021-03-23 10:09:42 +09:00
Bruce Momjian	95d77149c5	Add macro RelationIsPermanent() to report relation permanence Previously, to check relation permanence, the Relation's Form_pg_class structure member relpersistence was compared to the value RELPERSISTENCE_PERMANENT ("p"). This commit adds the macro RelationIsPermanent() and is used in appropirate places to simplify the code. This matches other RelationIs* macros. This macro will be used in more places in future cluster file encryption patches. Discussion: https://postgr.es/m/20210318153134.GH20766@tamriel.snowman.net	2021-03-22 20:23:52 -04:00
Tomas Vondra	8e4b332e88	Optimize allocations in bringetbitmap The bringetbitmap function allocates memory for various purposes, which may be quite expensive, depending on the number of scan keys. Instead of allocating them separately, allocate one bit chunk of memory an carve it into smaller pieces as needed - all the pieces have the same lifespan, and it saves quite a bit of CPU and memory overhead. Author: Tomas Vondra <tomas.vondra@postgresql.org> Reviewed-by: Alvaro Herrera <alvherre@alvh.no-ip.org> Reviewed-by: Mark Dilger <hornschnorter@gmail.com> Reviewed-by: Alexander Korotkov <aekorotkov@gmail.com> Reviewed-by: Masahiko Sawada <masahiko.sawada@enterprisedb.com> Reviewed-by: John Naylor <john.naylor@enterprisedb.com> Discussion: https://postgr.es/m/c1138ead-7668-f0e1-0638-c3be3237e812@2ndquadrant.com	2021-03-23 00:47:09 +01:00
Tomas Vondra	72ccf55cb9	Move IS [NOT] NULL handling from BRIN support functions The handling of IS [NOT] NULL clauses is independent of an opclass, and most of the code was exactly the same in both minmax and inclusion. So instead move the code from support procedures to the AM. This simplifies the code - especially the support procedures - quite a bit, as they don't need to care about NULL values and flags at all. It also means the IS [NOT] NULL clauses can be evaluated without invoking the support procedure. Author: Tomas Vondra <tomas.vondra@postgresql.org> Author: Nikita Glukhov <n.gluhov@postgrespro.ru> Reviewed-by: Nikita Glukhov <n.gluhov@postgrespro.ru> Reviewed-by: Mark Dilger <hornschnorter@gmail.com> Reviewed-by: Alexander Korotkov <aekorotkov@gmail.com> Reviewed-by: Masahiko Sawada <masahiko.sawada@enterprisedb.com> Reviewed-by: John Naylor <john.naylor@enterprisedb.com> Discussion: https://postgr.es/m/c1138ead-7668-f0e1-0638-c3be3237e812@2ndquadrant.com	2021-03-23 00:45:42 +01:00
Tomas Vondra	a1c649d889	Pass all scan keys to BRIN consistent function at once This commit changes how we pass scan keys to BRIN consistent function. Instead of passing them one by one, we now pass all scan keys for a given attribute at once. That makes the consistent function a bit more complex, as it has to loop through the keys, but it does allow more elaborate opclasses that can use multiple keys to eliminate ranges much more effectively. The existing BRIN opclasses (minmax, inclusion) don't really benefit from this change. The primary purpose is to allow future opclases to benefit from seeing all keys at once. This does change the BRIN API, because the signature of the consistent function changes (a new parameter with number of scan keys). So this breaks existing opclasses, and will require supporting two variants of the code for different PostgreSQL versions. We've considered supporting two variants of the consistent, but we've decided not to do that. Firstly, there's another patch that moves handling of NULL values from the opclass, which means the opclasses need to be updated anyway. Secondly, we're not aware of any out-of-core BRIN opclasses, so it does not seem worth the extra complexity. Bump catversion, because of pg_proc changes. Author: Tomas Vondra <tomas.vondra@postgresql.org> Reviewed-by: Alvaro Herrera <alvherre@alvh.no-ip.org> Reviewed-by: Mark Dilger <hornschnorter@gmail.com> Reviewed-by: Alexander Korotkov <aekorotkov@gmail.com> Reviewed-by: John Naylor <john.naylor@enterprisedb.com> Reviewed-by: Nikita Glukhov <n.gluhov@postgrespro.ru> Discussion: https://postgr.es/m/c1138ead-7668-f0e1-0638-c3be3237e812@2ndquadrant.com	2021-03-23 00:45:03 +01:00
Tomas Vondra	bfa2cee784	Move bsearch_arg to src/port Until now the bsearch_arg function was used only in extended statistics code, so it was defined in that code. But we already have qsort_arg in src/port, so let's move it next to it.	2021-03-23 00:11:22 +01:00
Tom Lane	063dd37ebc	Short-circuit slice requests that are for more than the object's size. substring(), and perhaps other callers, isn't careful to pass a slice length that is no more than the datum's true size. Since toast_decompress_datum_slice's children will palloc the requested slice length, this can waste memory. Also, close study of the liblz4 documentation suggests that it is dependent on the caller to not ask for more than the correct amount of decompressed data; this squares with observed misbehavior with liblz4 1.8.3. Avoid these problems by switching to the normal full-decompression code path if the slice request is >= datum's decompressed size. Tom Lane and Dilip Kumar Discussion: https://postgr.es/m/507597.1616370729@sss.pgh.pa.us	2021-03-22 14:01:20 -04:00
Tom Lane	aeb1631ed2	Mostly-cosmetic adjustments of TOAST-related macros. The authors of `bbe0a81db` hadn't quite got the idea that macros named like SOMETHING_4B_C were only meant for internal endianness-related details in postgres.h. Choose more legible names for macros that are intended to be used elsewhere. Rearrange postgres.h a bit to clarify the separation between those internal macros and ones intended for wider use. Also, avoid using the term "rawsize" for true decompressed size; we've used "extsize" for that, because "rawsize" generally denotes total Datum size including header. This choice seemed particularly unfortunate in tests that were comparing one of these meanings to the other. This patch includes a couple of not-purely-cosmetic changes: be sure that the shifts aligning compression methods are unsigned (not critical today, but will be when compression method 2 exists), and fix broken definition of VARATT_EXTERNAL_GET_COMPRESSION (now VARATT_EXTERNAL_GET_COMPRESS_METHOD), whose callers worked only accidentally. Discussion: https://postgr.es/m/574197.1616428079@sss.pgh.pa.us	2021-03-22 13:43:10 -04:00
Robert Haas	a4d5284a10	Error on invalid TOAST compression in CREATE or ALTER TABLE. The previous coding treated an invalid compression method name as equivalent to the default, which is certainly not right. Justin Pryzby Discussion: http://postgr.es/m/20210321235544.GD4203@telsasoft.com	2021-03-22 10:57:08 -04:00
Robert Haas	226e2be387	More code cleanup for configurable TOAST compression. Remove unused macro. Fix confusion about whether a TOAST compression method is identified by an OID or a char. Justin Pryzby Discussion: http://postgr.es/m/20210321235544.GD4203@telsasoft.com	2021-03-22 09:21:37 -04:00
Michael Paquier	909b449e00	Fix concurrency issues with WAL segment recycling on Windows This commit is mostly a revert of `aaa3aed`, that switched the routine doing the internal renaming of recycled WAL segments to use on Windows a combination of CreateHardLinkA() plus unlink() instead of rename(). As reported by several users of Postgres 13, this is causing concurrency issues when manipulating WAL segments, mostly in the shape of the following error: LOG: could not rename file "pg_wal/000000XX000000YY000000ZZ": Permission denied This moves back to a logic where a single rename() (well, pgrename() for Windows) is used. This issue has proved to be hard to hit when I tested it, facing it only once with an archive_command that was not able to do its work, so it is environment-sensitive. The reporters of this issue have been able to confirm that the situation improved once we switched back to a single rename(). In order to check things, I have provided to the reporters a patched build based on 13.2 with `aaa3aed` reverted, to test if the error goes away, and an unpatched build of 13.2 to test if the error still showed up (just to make sure that I did not mess up my build process). Extra thanks to Fujii Masao for pointing out what looked like the culprit commit, and to all the reporters for taking the time to test what I have sent them. Reported-by: Andrus, Guy Burgess, Yaroslav Pashinsky, Thomas Trenz Reviewed-by: Tom Lane, Andres Freund Discussion: https://postgr.es/m/3861ff1e-0923-7838-e826-094cc9bef737@hot.ee Discussion: https://postgr.es/m/16874-c3eecd319e36a2bf@postgresql.org Discussion: https://postgr.es/m/095ccf8d-7f58-d928-427c-b17ace23cae6@burgess.co.nz Discussion: https://postgr.es/m/16927-67c570d968c99567%40postgresql.org Discussion: https://postgr.es/m/YFBcRbnBiPdGZvfW@paquier.xyz Backpatch-through: 13	2021-03-22 14:02:26 +09:00
Michael Paquier	595b9cba2a	Fix timeline assignment in checkpoints with 2PC transactions Any transactions found as still prepared by a checkpoint have their state data read from the WAL records generated by PREPARE TRANSACTION before being moved into their new location within pg_twophase/. While reading such records, the WAL reader uses the callback read_local_xlog_page() to read a page, that is shared across various parts of the system. This callback, since `1148e22a`, has introduced an update of ThisTimeLineID when reading a record while in recovery, which is potentially helpful in the context of cascading WAL senders. This update of ThisTimeLineID interacts badly with the checkpointer if a promotion happens while some 2PC data is read from its record, as, by changing ThisTimeLineID, any follow-up WAL records would be written to an timeline older than the promoted one. This results in consistency issues. For instance, a subsequent server restart would cause a failure in finding a valid checkpoint record, resulting in a PANIC, for instance. This commit changes the code reading the 2PC data to reset the timeline once the 2PC record has been read, to prevent messing up with the static state of the checkpointer. It would be tempting to do the same thing directly in read_local_xlog_page(). However, based on the discussion that has led to `1148e22a`, users may rely on the updates of ThisTimeLineID when a WAL record page is read in recovery, so changing this callback could break some cases that are working currently. A TAP test reproducing the issue is added, relying on a PITR to precisely trigger a promotion with a prepared transaction still tracked. Per discussion with Heikki Linnakangas, Kyotaro Horiguchi, Fujii Masao and myself. Author: Soumyadeep Chakraborty, Jimmy Yih, Kevin Yeap Discussion: https://postgr.es/m/CAE-ML+_EjH_fzfq1F3RJ1=XaaNG=-Jz-i3JqkNhXiLAsM3z-Ew@mail.gmail.com Backpatch-through: 10	2021-03-22 08:30:53 +09:00
Tom Lane	ac897c4834	Fix assorted silliness in ATExecSetCompression(). It's not okay to scribble directly on a syscache entry. Nor to continue accessing said entry after releasing it. Also get rid of not-used local variables. Per valgrind testing.	2021-03-21 18:43:07 -04:00
Peter Geoghegan	9dd963ae25	Recycle nbtree pages deleted during same VACUUM. Maintain a simple array of metadata about pages that were deleted during nbtree VACUUM's current btvacuumscan() call. Use this metadata at the end of btvacuumscan() to attempt to place newly deleted pages in the FSM without further delay. It might not yet be safe to place any of the pages in the FSM by then (they may not be deemed recyclable), but we have little to lose and plenty to gain by trying. In practice there is a very good chance that this will work out when vacuuming larger indexes, where scanning the index naturally takes quite a while. This commit doesn't change the page recycling invariants; it merely improves the efficiency of page recycling within the confines of the existing design. Recycle safety is a part of nbtree's implementation of what Lanin & Shasha call "the drain technique". The design happens to use transaction IDs (they're stored in deleted pages), but that in itself doesn't align the cutoff for recycle safety to any of the XID-based cutoffs used by VACUUM (e.g., OldestXmin). All that matters is whether or not _other_ backends might be able to observe various inconsistencies in the tree structure (that they cannot just detect and recover from by moving right). Recycle safety is purely a question of maintaining the consistency (or the apparent consistency) of a physical data structure. Note that running a simple serial test case involving a large range DELETE followed by a VACUUM VERBOSE will probably show that any newly deleted nbtree pages are not yet reusable/recyclable. This is expected in the absence of even one concurrent XID assignment. It is an old implementation restriction. In practice it's unlikely to be the thing that makes recycling remain unsafe, at least with larger indexes, where recycling newly deleted pages during the same VACUUM actually matters. An important high-level goal of this commit (as well as related recent commits `e5d8a999` and `9f3665fb`) is to make expensive deferred cleanup operations in index AMs rare in general. If index vacuuming frequently depends on the next VACUUM operation finishing off work that the current operation started, then the general behavior of index vacuuming is hard to predict. This is relevant to ongoing work that adds a vacuumlazy.c mechanism to skip index vacuuming in certain cases. Anything that makes the real world behavior of index vacuuming simpler and more linear will also make top-down modeling in vacuumlazy.c more robust. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Discussion: https://postgr.es/m/CAH2-Wzk76_P=67iUscb1UN44-gyZL-KgpsXbSxq_bdcMa7Q+wQ@mail.gmail.com	2021-03-21 15:25:39 -07:00
Tom Lane	9fb9691a88	Suppress various new compiler warnings. Compilers that don't understand that elog(ERROR) doesn't return issued warnings here. In the cases in libpq_pipeline.c, we were not exactly helping things by failing to mark pg_fatal() as noreturn. Per buildfarm.	2021-03-21 11:50:43 -04:00
Peter Eisentraut	96ae658e62	Move lwlock-release probe back where it belongs The documentation specifically states that lwlock-release fires before any released waiters have been awakened. It worked that way until `ab5194e6f6`, where is seems to have been misplaced accidentally. Move it back where it belongs. Author: Craig Ringer <craig.ringer@enterprisedb.com> Discussion: https://www.postgresql.org/message-id/CAGRY4nwxKUS_RvXFW-ugrZBYxPFFM5kjwKT5O+0+Stuga5b4+Q@mail.gmail.com	2021-03-21 08:02:30 +01:00
Tomas Vondra	882b2cdc08	Use valid compression method in brin_form_tuple When compressing the BRIN summary, we can't simply use the compression method from the indexed attribute. The summary may use a different data type, e.g. fixed-length attribute may have varlena summary, leading to compression failures. For the built-in BRIN opclasses this happens to work, because the summary uses the same data type as the attribute. When the data types match, we can inherit use the compression method specified for the attribute (it's copied into the index descriptor). Otherwise we don't have much choice and have to use the default one. Author: Tomas Vondra Reviewed-by: Justin Pryzby <pryzby@telsasoft.com> Discussion: https://postgr.es/m/e0367f27-392c-321a-7411-a58e1a7e4817%40enterprisedb.com	2021-03-21 00:28:34 +01:00
Tom Lane	e835e89a0f	Fix memory leak when rejecting bogus DH parameters. While back-patching `e0e569e1d`, I noted that there were some other places where we ought to be applying DH_free(); namely, where we load some DH parameters from a file and then reject them as not being sufficiently secure. While it seems really unlikely that anybody would hit these code paths in production, let alone do so repeatedly, let's fix it for consistency. Back-patch to v10 where this code was introduced. Discussion: https://postgr.es/m/16160-18367e56e9a28264@postgresql.org	2021-03-20 12:47:21 -04:00
Tom Lane	f0c2a5bba6	Avoid leaking memory in RestoreGUCState(), and improve comments. RestoreGUCState applied InitializeOneGUCOption to already-live GUC entries, causing any malloc'd subsidiary data to be forgotten. We do want the effect of resetting the GUC to its compiled-in default, and InitializeOneGUCOption seems like the best way to do that, so add code to free any existing subsidiary data beforehand. The interaction between can_skip_gucvar, SerializeGUCState, and RestoreGUCState is way more subtle than their opaque comments would suggest to an unwary reader. Rewrite and enlarge the comments to try to make it clearer what's happening. Remove a long-obsolete assertion in read_nondefault_variables: the behavior of set_config_option hasn't depended on IsInitProcessingMode since `f5d9698a8` installed a better way of controlling it. Although this is fixing a clear memory leak, the leak is quite unlikely to involve any large amount of data, and it can only happen once in the lifetime of a worker process. So it seems unnecessary to take any risk of back-patching. Discussion: https://postgr.es/m/4105247.1616174862@sss.pgh.pa.us	2021-03-19 23:03:17 -04:00
Thomas Munro	61752afb26	Provide recovery_init_sync_method=syncfs. Since commit `2ce439f3` we have opened every file in the data directory and called fsync() at the start of crash recovery. This can be very slow if there are many files, leading to field complaints of systems taking minutes or even hours to begin crash recovery. Provide an alternative method, for Linux only, where we call syncfs() on every possibly different filesystem under the data directory. This is equivalent, but avoids faulting in potentially many inodes from potentially slow storage. The new mode comes with some caveats, described in the documentation, so the default value for the new setting is "fsync", preserving the older behavior. Reported-by: Michael Brown <michael.brown@discourse.org> Reviewed-by: Fujii Masao <masao.fujii@oss.nttdata.com> Reviewed-by: Paul Guo <guopa@vmware.com> Reviewed-by: Bruce Momjian <bruce@momjian.us> Reviewed-by: Justin Pryzby <pryzby@telsasoft.com> Reviewed-by: David Steele <david@pgmasters.net> Discussion: https://postgr.es/m/11bc2bb7-ecb5-3ad0-b39f-df632734cd81%40discourse.org Discussion: https://postgr.es/m/CAEET0ZHGnbXmi8yF3ywsDZvb3m9CbdsGZgfTXscQ6agcbzcZAw%40mail.gmail.com	2021-03-20 12:07:28 +13:00
Tomas Vondra	b822ae13ea	Use lfirst_int in cmp_list_len_contents_asc The function added in `be45be9c33` is comparing integer lists (IntList) by length and contents, but there were two bugs. Firstly, it used intVal() to extract the value, but that's for Value nodes, not for extracting int values from IntList. Secondly, it called it directly on the ListCell, without doing lfirst(). So just do lfirst_int() instead. Interestingly enough, this did not cause any crashes on the buildfarm, but valgrind rightfully complained about it. Discussion: https://postgr.es/m/bf3805a8-d7d1-ae61-fece-761b7ff41ecc@postgresfriends.org	2021-03-20 00:04:25 +01:00
Robert Haas	d00fbdc431	Fix use-after-ReleaseSysCache problem in ATExecAlterColumnType. Introduced by commit `bbe0a81db6`. Per buildfarm member prion.	2021-03-19 17:17:48 -04:00
Robert Haas	bbe0a81db6	Allow configurable LZ4 TOAST compression. There is now a per-column COMPRESSION option which can be set to pglz (the default, and the only option in up until now) or lz4. Or, if you like, you can set the new default_toast_compression GUC to lz4, and then that will be the default for new table columns for which no value is specified. We don't have lz4 support in the PostgreSQL code, so to use lz4 compression, PostgreSQL must be built --with-lz4. In general, TOAST compression means compression of individual column values, not the whole tuple, and those values can either be compressed inline within the tuple or compressed and then stored externally in the TOAST table, so those properties also apply to this feature. Prior to this commit, a TOAST pointer has two unused bits as part of the va_extsize field, and a compessed datum has two unused bits as part of the va_rawsize field. These bits are unused because the length of a varlena is limited to 1GB; we now use them to indicate the compression type that was used. This means we only have bit space for 2 more built-in compresison types, but we could work around that problem, if necessary, by introducing a new vartag_external value for any further types we end up wanting to add. Hopefully, it won't be too important to offer a wide selection of algorithms here, since each one we add not only takes more coding but also adds a build dependency for every packager. Nevertheless, it seems worth doing at least this much, because LZ4 gets better compression than PGLZ with less CPU usage. It's possible for LZ4-compressed datums to leak into composite type values stored on disk, just as it is for PGLZ. It's also possible for LZ4-compressed attributes to be copied into a different table via SQL commands such as CREATE TABLE AS or INSERT .. SELECT. It would be expensive to force such values to be decompressed, so PostgreSQL has never done so. For the same reasons, we also don't force recompression of already-compressed values even if the target table prefers a different compression method than was used for the source data. These architectural decisions are perhaps arguable but revisiting them is well beyond the scope of what seemed possible to do as part of this project. However, it's relatively cheap to recompress as part of VACUUM FULL or CLUSTER, so this commit adjusts those commands to do so, if the configured compression method of the table happens not to match what was used for some column value stored therein. Dilip Kumar. The original patches on which this work was based were written by Ildus Kurbangaliev, and those were patches were based on even earlier work by Nikita Glukhov, but the design has since changed very substantially, since allow a potentially large number of compression methods that could be added and dropped on a running system proved too problematic given some of the architectural issues mentioned above; the choice of which specific compression method to add first is now different; and a lot of the code has been heavily refactored. More recently, Justin Przyby helped quite a bit with testing and reviewing and this version also includes some code contributions from him. Other design input and review from Tomas Vondra, Álvaro Herrera, Andres Freund, Oleg Bartunov, Alexander Korotkov, and me. Discussion: http://postgr.es/m/20170907194236.4cefce96%40wp.localdomain Discussion: http://postgr.es/m/CAFiTN-uUpX3ck%3DK0mLEk-G_kUQY%3DSNOTeqdaNRR9FMdQrHKebw%40mail.gmail.com	2021-03-19 15:10:38 -04:00
Fujii Masao	fd31214075	Fix comments in postmaster.c. Commit `86c23a6eb2` changed the option to specify that postgres will stop all other server processes by sending the signal SIGSTOP, from -s to -T. But previously there were comments incorrectly explaining that SIGSTOP behavior is set by -s option. This commit fixes them. Author: Kyotaro Horiguchi Reviewed-by: Fujii Masao Discussion: https://postgr.es/m/20210316.165141.1400441966284654043.horikyota.ntt@gmail.com	2021-03-19 11:28:54 +09:00
Tom Lane	9bacdf9f53	Don't leak malloc'd error string in libpqrcv_check_conninfo(). We leaked the error report from PQconninfoParse, when there was one. It seems unlikely that real usage patterns would repeat the failure often enough to create serious bloat, but let's back-patch anyway to keep the code similar in all branches. Found via valgrind testing. Back-patch to v10 where this code was added. Discussion: https://postgr.es/m/3816764.1616104288@sss.pgh.pa.us	2021-03-18 22:22:47 -04:00
Tom Lane	377b7a8300	Don't leak malloc'd strings when a GUC setting is rejected. Because guc.c prefers to keep all its string values in malloc'd not palloc'd storage, it has to be more careful than usual to avoid leaks. Error exits out of string GUC hook checks failed to clear the proposed value string, and error exits out of ProcessGUCArray() failed to clear the malloc'd results of ParseLongOption(). Found via valgrind testing. This problem is ancient, so back-patch to all supported branches. Discussion: https://postgr.es/m/3816764.1616104288@sss.pgh.pa.us	2021-03-18 22:22:47 -04:00
Tom Lane	d303849b05	Don't leak compiled regex(es) when an ispell cache entry is dropped. The text search cache mechanisms assume that we can clean up an invalidated dictionary cache entry simply by resetting the associated long-lived memory context. However, that does not work for ispell affixes that make use of regular expressions, because the regex library deals in plain old malloc. Hence, we leaked compiled regex(es) any time we dropped such a cache entry. That could quickly add up, since even a fairly trivial regex can use up tens of kB, and a large one can eat megabytes. Add a memory context callback to ensure that a regex gets freed when its owning cache entry is cleared. Found via valgrind testing. This problem is ancient, so back-patch to all supported branches. Discussion: https://postgr.es/m/3816764.1616104288@sss.pgh.pa.us	2021-03-18 22:22:47 -04:00
Tom Lane	415ffdc220	Don't run RelationInitTableAccessMethod in a long-lived context. Some code paths in this function perform syscache lookups, which can lead to table accesses and possibly leakage of cruft into the caller's context. If said context is CacheMemoryContext, we eventually will have visible bloat. But fixing this is no harder than moving one memory context switch step. (The other callers don't have a problem.) Andres Freund and I independently found this via valgrind testing. Back-patch to v12 where this code was added. Discussion: https://postgr.es/m/20210317023101.anvejcfotwka6gaa@alap3.anarazel.de Discussion: https://postgr.es/m/3816764.1616104288@sss.pgh.pa.us	2021-03-18 22:22:47 -04:00
Tom Lane	28644fac10	Don't leak rd_statlist when a relcache entry is dropped. Although these lists are usually NIL, and even when not empty are unlikely to be large, constant relcache update traffic could eventually result in visible bloat of CacheMemoryContext. Found via valgrind testing. Back-patch to v10 where this field was added. Discussion: https://postgr.es/m/3816764.1616104288@sss.pgh.pa.us	2021-03-18 22:22:47 -04:00
Tom Lane	1d581ce712	Fix misuse of foreach_delete_current(). Our coding convention requires this macro's result to be assigned back to the original List variable. In this usage, since the List could not become empty, there was no actual bug --- but some compilers warned about it. Oversight in `be45be9c3`. Discussion: https://postgr.es/m/35077b31-2d62-1e31-0e2e-ddb52d590b73@enterprisedb.com	2021-03-18 19:24:22 -04:00
Tomas Vondra	be45be9c33	Implement GROUP BY DISTINCT With grouping sets, it's possible that some of the grouping sets are duplicate. This is especially common with CUBE and ROLLUP clauses. For example GROUP BY CUBE (a,b), CUBE (b,c) is equivalent to GROUP BY GROUPING SETS ( (a, b, c), (a, b, c), (a, b, c), (a, b), (a, b), (a, b), (a), (a), (a), (c, a), (c, a), (c, a), (c), (b, c), (b), () ) Some of the grouping sets are calculated multiple times, which is mostly unnecessary. This commit implements a new GROUP BY DISTINCT feature, as defined in the SQL standard, which eliminates the duplicate sets. Author: Vik Fearing Reviewed-by: Erik Rijkers, Georgios Kokolatos, Tomas Vondra Discussion: https://postgr.es/m/bf3805a8-d7d1-ae61-fece-761b7ff41ecc@postgresfriends.org	2021-03-18 18:22:18 +01:00
Tomas Vondra	cd91de0d17	Remove temporary files after backend crash After a crash of a backend using temporary files, the files used to be left behind, on the basis that it might be useful for debugging. But we don't have any reports of anyone actually doing that, and it means the disk usage may grow over time due to repeated backend failures (possibly even hitting ENOSPC). So this behavior is a bit unfortunate, and fixing it required either manual cleanup (deleting files, which is error-prone) or restart of the instance (i.e. service disruption). This implements automatic cleanup of temporary files, controled by a new GUC remove_temp_files_after_crash. By default the files are removed, but it can be disabled to restore the old behavior if needed. Author: Euler Taveira Reviewed-by: Tomas Vondra, Michael Paquier, Anastasia Lubennikova, Thomas Munro Discussion: https://postgr.es/m/CAH503wDKdYzyq7U-QJqGn%3DGm6XmoK%2B6_6xTJ-Yn5WSvoHLY1Ww%40mail.gmail.com	2021-03-18 17:38:28 +01:00
Magnus Hagander	da18d829c2	Fix function name in error hint pg_read_file() is the function that's in core, pg_file_read() is in adminpack. But when using pg_file_read() in adminpack it calls the C level function pg_read_file() in core, which probably threw the original author off. But the error hint should be about the SQL function. Reported-By: Sergei Kornilov Backpatch-through: 11 Discussion: https://postgr.es/m/373021616060475@mail.yandex.ru	2021-03-18 11:22:20 +01:00
Amit Kapila	c8f78b6161	Add a new GUC and a reloption to enable inserts in parallel-mode. Commit `05c8482f7f` added the implementation of parallel SELECT for "INSERT INTO ... SELECT ..." which may incur non-negligible overhead in the additional parallel-safety checks that it performs, even when, in the end, those checks determine that parallelism can't be used. This is normally only ever a problem in the case of when the target table has a large number of partitions. A new GUC option "enable_parallel_insert" is added, to allow insert in parallel-mode. The default is on. In addition to the GUC option, the user may want a mechanism to allow inserts in parallel-mode with finer granularity at table level. The new table option "parallel_insert_enabled" allows this. The default is true. Author: "Hou, Zhijie" Reviewed-by: Greg Nancarrow, Amit Langote, Takayuki Tsunakawa, Amit Kapila Discussion: https://postgr.es/m/CAA4eK1K-cW7svLC2D7DHoGHxdAdg3P37BLgebqBOC2ZLc9a6QQ%40mail.gmail.com Discussion: https://postgr.es/m/CAJcOf-cXnB5cnMKqWEp2E2z7Mvcd04iLVmV=qpFJrR3AcrTS3g@mail.gmail.com	2021-03-18 07:25:27 +05:30
Andres Freund	5f79580ad6	Fix memory lifetime issues of replication slot stats. When accessing replication slot stats, introduced in `9868167500`, pgstat_read_statsfiles() reads the data into newly allocated memory. Unfortunately the current memory context at that point is the callers, leading to leaks and use-after-free dangers. The fix is trivial, explicitly use pgStatLocalContext. There's some potential for further improvements, but that's outside of the scope of this bugfix. No backpatch necessary, feature is only in HEAD. Author: Andres Freund <andres@anarazel.de> Discussion: https://postgr.es/m/20210317230447.c7uc4g3vbs4wi32i@alap3.anarazel.de	2021-03-17 16:21:46 -07:00
Tom Lane	8620a7f6db	Code review for server's handling of "tablespace map" files. While looking at Robert Foggia's report, I noticed a passel of other issues in the same area: * The scheme for backslash-quoting newlines in pathnames is just wrong; it will misbehave if the last ordinary character in a pathname is a backslash. I'm not sure why we're bothering to allow newlines in tablespace paths, but if we're going to do it we should do it without introducing other problems. Hence, backslashes themselves have to be backslashed too. * The author hadn't read the sscanf man page very carefully, because this code would drop any leading whitespace from the path. (I doubt that a tablespace path with leading whitespace could happen in practice; but if we're bothering to allow newlines in the path, it sure seems like leading whitespace is little less implausible.) Using sscanf for the task of finding the first space is overkill anyway. * While I'm not 100% sure what the rationale for escaping both \r and \n is, if the idea is to allow Windows newlines in the file then this code failed, because it'd throw an error if it saw \r followed by \n. * There's no cross-check for an incomplete final line in the map file, which would be a likely apparent symptom of the improper-escaping bug. On the generation end, aside from the escaping issue we have: * If needtblspcmapfile is true then do_pg_start_backup will pass back escaped strings in tablespaceinfo->path values, which no caller wants or is prepared to deal with. I'm not sure if there's a live bug from that, but it looks like there might be (given the dubious assumption that anyone actually has newlines in their tablespace paths). * It's not being very paranoid about the possibility of random stuff in the pg_tblspc directory. IMO we should ignore anything without an OID-like name. The escaping rule change doesn't seem back-patchable: it'll require doubling of backslashes in the tablespace_map file, which is basically a basebackup format change. The odds of that causing trouble are considerably more than the odds of the existing bug causing trouble. The rest of this seems somewhat unlikely to cause problems too, so no back-patch.	2021-03-17 16:18:46 -04:00
Tom Lane	a50e4fd028	Prevent buffer overrun in read_tablespace_map(). Robert Foggia of Trustwave reported that read_tablespace_map() fails to prevent an overrun of its on-stack input buffer. Since the tablespace map file is presumed trustworthy, this does not seem like an interesting security vulnerability, but still we should fix it just in the name of robustness. While here, document that pg_basebackup's --tablespace-mapping option doesn't work with tar-format output, because it doesn't. To make it work, we'd have to modify the tablespace_map file within the tarball sent by the server, which might be possible but I'm not volunteering. (Less-painful solutions would require changing the basebackup protocol so that the source server could adjust the map. That's not very appetizing either.)	2021-03-17 16:10:37 -04:00
Thomas Munro	7f7f25f15e	Revert "Fix race in Parallel Hash Join batch cleanup." This reverts commit `378802e371`. This reverts commit `3b8981b6e1`. Discussion: https://postgr.es/m/CA%2BhUKGJmcqAE3MZeDCLLXa62cWM0AJbKmp2JrJYaJ86bz36LFA%40mail.gmail.com	2021-03-18 01:10:55 +13:00
Michael Paquier	9fd2952cf4	Fix comment in indexing.c `578b229`, that removed support for WITH OIDS, has changed CatalogTupleInsert() to not return an Oid, but one comment was still mentioning that. Author: Vik Fearing Discussion: https://postgr.es/m/fef01975-ed10-3601-7b9e-80ecef72d00b@postgresfriends.org	2021-03-17 18:07:00 +09:00
Peter Eisentraut	e1ae40f381	Small error message improvement	2021-03-17 08:17:33 +01:00
Thomas Munro	378802e371	Update the names of Parallel Hash Join phases. Commit `3048898e` dropped -ING from some wait event names that correspond to barrier phases. Update the phases' names to match. While we're here making cosmetic changes, also rename "DONE" to "FREE". That pairs better with "ALLOCATE", and describes the activity that actually happens in that phase (as we do for the other phases) rather than describing a state. The distinction is clearer after bugfix commit `3b8981b6` split the phase into two. As for the growth barriers, rename their "ALLOCATE" phase to "REALLOCATE", which is probably a better description of what happens then. Also improve the comments about the phases a bit. Discussion: https://postgr.es/m/CA%2BhUKG%2BMDpwF2Eo2LAvzd%3DpOh81wUTsrwU1uAwR-v6OGBB6%2B7g%40mail.gmail.com	2021-03-17 18:43:04 +13:00
Thomas Munro	3b8981b6e1	Fix race in Parallel Hash Join batch cleanup. With very unlucky timing and parallel_leader_participation off, PHJ could attempt to access per-batch state just as it was being freed. There was code intended to prevent that by checking for a cleared pointer, but it was buggy. Fix, by introducing an extra barrier phase. The new phase PHJ_BUILD_RUNNING means that it's safe to access the per-batch state to find a batch to help with, and PHJ_BUILD_DONE means that it is too late. The last to detach will free the array of per-batch state as before, but now it will also atomically advance the phase at the same time, so that late attachers can avoid the hazard, without the data race. This mirrors the way per-batch hash tables are freed (see phases PHJ_BATCH_PROBING and PHJ_BATCH_DONE). Revealed by a one-off build farm failure, where BarrierAttach() failed a sanity check assertion, because the memory had been clobbered by dsa_free(). Back-patch to 11, where the code arrived. Reported-by: Michael Paquier <michael@paquier.xyz> Discussion: https://postgr.es/m/20200929061142.GA29096%40paquier.xyz	2021-03-17 18:05:39 +13:00
Amit Kapila	6b67d72b60	Fix race condition in drop subscription's handling of tablesync slots. Commit `ce0fdbfe97` made tablesync slots permanent and allow Drop Subscription to drop such slots. However, it is possible that before tablesync worker could get the acknowledgment of slot creation, drop subscription stops it and that can lead to a dangling slot on the publisher. Prevent cancel/die interrupts while creating a slot in the tablesync worker. Reported-by: Thomas Munro as per buildfarm Author: Amit Kapila Reviewed-by: Vignesh C, Takamichi Osumi Discussion: https://postgr.es/m/CA+hUKGJG9dWpw1cOQ2nzWU8PHjm=PTraB+KgE5648K9nTfwvxg@mail.gmail.com	2021-03-17 08:15:12 +05:30
Thomas Munro	9e7ccd9ef6	Enable parallelism in REFRESH MATERIALIZED VIEW. Pass CURSOR_OPT_PARALLEL_OK to pg_plan_query() so that parallel plans are considered when running the underlying SELECT query. This wasn't done in commit `e9baa5e9`, which did this for CREATE MATERIALIZED VIEW, because it wasn't yet known to be safe. Since REFRESH always inserts into a freshly created table before later merging or swapping the data into place with separate operations, we can enable such plans here too. Author: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> Reviewed-by: Hou, Zhijie <houzj.fnst@cn.fujitsu.com> Reviewed-by: Luc Vlaming <luc@swarm64.com> Reviewed-by: Thomas Munro <thomas.munro@gmail.com> Discussion: https://postgr.es/m/CALj2ACXg-4hNKJC6nFnepRHYT4t5jJVstYvri%2BtKQHy7ydcr8A%40mail.gmail.com	2021-03-17 15:04:17 +13:00
Peter Geoghegan	fbe4cb3bd4	Fix comment about promising tuples. Oversight in commit `d168b66682`, which added bottom-up index deletion.	2021-03-16 13:38:52 -07:00
Tom Lane	4b12ab18c9	Avoid corner-case memory leak in SSL parameter processing. After reading the root cert list from the ssl_ca_file, immediately install it as client CA list of the new SSL context. That gives the SSL context ownership of the list, so that SSL_CTX_free will free it. This avoids a permanent memory leak if we fail further down in be_tls_init(), which could happen if bogus CRL data is offered. The leak could only amount to something if the CRL parameters get broken after server start (else we'd just quit) and then the server is SIGHUP'd many times without fixing the CRL data. That's rather unlikely perhaps, but it seems worth fixing, if only because the code is clearer this way. While we're here, add some comments about the memory management aspects of this logic. Noted by Jelte Fennema and independently by Andres Freund. Back-patch to v10; before commit `de41869b6` it doesn't matter, since we'd not re-execute this code during SIGHUP. Discussion: https://postgr.es/m/16160-18367e56e9a28264@postgresql.org	2021-03-16 16:03:06 -04:00
Stephen Frost	c6fc50cb40	Use pre-fetching for ANALYZE When we have posix_fadvise() available, we can improve the performance of an ANALYZE by quite a bit by using it to inform the kernel of the blocks that we're going to be asking for. Similar to bitmap index scans, the number of buffers pre-fetched is based off of the maintenance_io_concurrency setting (for the particular tablespace or, if not set, globally, via get_tablespace_maintenance_io_concurrency()). Reviewed-By: Heikki Linnakangas, Tomas Vondra Discussion: https://www.postgresql.org/message-id/VI1PR0701MB69603A433348EDCF783C6ECBF6EF0%40VI1PR0701MB6960.eurprd07.prod.outlook.com	2021-03-16 14:46:48 -04:00
Stephen Frost	94d13d474d	Improve logging of auto-vacuum and auto-analyze When logging auto-vacuum and auto-analyze activity, include the I/O timing if track_io_timing is enabled. Also, for auto-analyze, add the read rate and the dirty rate, similar to how that information has historically been logged for auto-vacuum. Stephen Frost and Jakub Wartak Reviewed-By: Heikki Linnakangas, Tomas Vondra Discussion: https://www.postgresql.org/message-id/VI1PR0701MB69603A433348EDCF783C6ECBF6EF0%40VI1PR0701MB6960.eurprd07.prod.outlook.com	2021-03-16 14:46:48 -04:00
Tom Lane	1ea396362b	Improve logging of bad parameter values in BIND messages. Since commit `ba79cb5dc`, values of bind parameters have been logged during errors in extended query mode. However, we only did that after we'd collected and converted all the parameter values, thus failing to offer any useful localization of invalid-parameter problems. Add a separate callback that's used during parameter collection, and have it print the parameter number, along with the input string if text input format is used. Justin Pryzby and Tom Lane Discussion: https://postgr.es/m/20210104170939.GH9712@telsasoft.com Discussion: https://postgr.es/m/CANfkH5k-6nNt-4cSv1vPB80nq2BZCzhFVR5O4VznYbsX0wZmow@mail.gmail.com	2021-03-16 11:16:41 -04:00
Alvaro Herrera	acb7e4eb6b	Implement pipeline mode in libpq Pipeline mode in libpq lets an application avoid the Sync messages in the FE/BE protocol that are implicit in the old libpq API after each query. The application can then insert Sync at its leisure with a new libpq function PQpipelineSync. This can lead to substantial reductions in query latency. Co-authored-by: Craig Ringer <craig.ringer@enterprisedb.com> Co-authored-by: Matthieu Garrigues <matthieu.garrigues@gmail.com> Co-authored-by: Álvaro Herrera <alvherre@alvh.no-ip.org> Reviewed-by: Andres Freund <andres@anarazel.de> Reviewed-by: Aya Iwata <iwata.aya@jp.fujitsu.com> Reviewed-by: Daniel Vérité <daniel@manitou-mail.org> Reviewed-by: David G. Johnston <david.g.johnston@gmail.com> Reviewed-by: Justin Pryzby <pryzby@telsasoft.com> Reviewed-by: Kirk Jamison <k.jamison@fujitsu.com> Reviewed-by: Michael Paquier <michael.paquier@gmail.com> Reviewed-by: Nikhil Sontakke <nikhils@2ndquadrant.com> Reviewed-by: Vaishnavi Prabakaran <VaishnaviP@fast.au.fujitsu.com> Reviewed-by: Zhihong Yu <zyu@yugabyte.com> Discussion: https://postgr.es/m/CAMsr+YFUjJytRyV4J-16bEoiZyH=4nj+sQ7JP9ajwz=B4dMMZw@mail.gmail.com Discussion: https://postgr.es/m/CAJkzx4T5E-2cQe3dtv2R78dYFvz+in8PY7A8MArvLhs_pg75gg@mail.gmail.com	2021-03-15 18:13:42 -03:00
Fujii Masao	d75288fb27	Make archiver process an auxiliary process. This commit changes WAL archiver process so that it's treated as an auxiliary process and can use shared memory. This is an infrastructure patch required for upcoming shared-memory based stats collector patch series. These patch series basically need any processes including archiver that can report the statistics to access to shared memory. Since this patch itself is useful to simplify the code and when users monitor the status of archiver, it's committed separately in advance. This commit simplifies the code for WAL archiving. For example, previously backends need to signal to archiver via postmaster when they notify archiver that there are some WAL files to archive. On the other hand, this commit removes that signal to postmaster and enables backends to notify archier directly using shared latch. Also, as the side of this change, the information about archiver process becomes viewable at pg_stat_activity view. Author: Kyotaro Horiguchi Reviewed-by: Andres Freund, Álvaro Herrera, Julien Rouhaud, Tomas Vondra, Arthur Zakirov, Fujii Masao Discussion: https://postgr.es/m/20180629.173418.190173462.horiguchi.kyotaro@lab.ntt.co.jp	2021-03-15 13:13:14 +09:00
Peter Geoghegan	0ea71c93a0	Notice that heap page has dead items during VACUUM. Consistently set a flag variable that tracks whether the current heap page has a dead item during lazy vacuum's heap scan. We missed the common case where there is an preexisting (or even a new) LP_DEAD heap line pointer. Also make it clear that the variable might be affected by an existing line pointer, say from an earlier opportunistic pruning operation. This distinction is important because it's the main reason why we can't just use the nearby tups_vacuumed variable instead. No backpatch. In theory failing to set the page level flag variable had no consequences. Currently it is only used to defensively check if a page marked all visible has dead items, which should never happen anyway (if it does then the table must be corrupt). Author: Masahiko Sawada <sawada.mshk@gmail.com> Diagnosed-By: Masahiko Sawada <sawada.mshk@gmail.com> Discussion: https://postgr.es/m/CAD21AoAtZb4+HJT_8RoOXvu4HM-Zd4HKS3YSMCH6+-W=bDyh-w@mail.gmail.com	2021-03-14 18:05:57 -07:00
Amit Kapila	c5be48f092	Improve FK trigger parallel-safety check added by `05c8482f7f`. Commit `05c8482f7f` added special logic related to parallel-safety of FK triggers. This is a bit of a hack and should have instead been done by simply setting appropriate proparallel values on those trigger functions themselves. Suggested-by: Tom Lane Author: Greg Nancarrow Reviewed-by: Amit Kapila Discussion: https://postgr.es/m/2309260.1615485644@sss.pgh.pa.us	2021-03-13 09:20:52 +05:30
Peter Geoghegan	02b5940dbe	Consolidate nbtree VACUUM metapage routines. Simplify _bt_vacuum_needs_cleanup() functions's signature (it only needs a single 'rel' argument now), and move it next to its sibling function in nbtpage.c. I believe that _bt_vacuum_needs_cleanup() was originally located in nbtree.c due to an include dependency issue. That's no longer an issue. Follow-up to commit `9f3665fb`.	2021-03-12 13:11:47 -08:00
Tom Lane	f52c5d6749	Forbid marking an identity column as nullable. GENERATED ALWAYS AS IDENTITY implies NOT NULL, but the code failed to complain if you overrode that with "GENERATED ALWAYS AS IDENTITY NULL". One might think the old behavior was a feature, but it was inconsistent because the outcome varied depending on the order of the clauses, so it seems to have been just an oversight. Per bug #16913 from Pavel Boev. Back-patch to v10 where identity columns were introduced. Vik Fearing (minor tweaks by me) Discussion: https://postgr.es/m/16913-3b5198410f67d8c6@postgresql.org	2021-03-12 11:08:42 -05:00
Thomas Munro	1b88b8908e	Specialize checkpointer sort functions. When sorting a potentially large number of dirty buffers, the checkpointer can benefit from a faster sort routine. One reported improvement on a large buffer pool system was 1.4s -> 0.6s. Reviewed-by: Andres Freund <andres@anarazel.de> Discussion: https://postgr.es/m/CA%2BhUKGJ2-eaDqAum5bxhpMNhvuJmRDZxB_Tow0n-gse%2BHG0Yig%40mail.gmail.com	2021-03-12 23:56:02 +13:00
Amit Kapila	519e4c9ee2	Fix size overflow in calculation introduced by commits `d6ad34f3` and `bea449c6`. Reported-by: Thomas Munro Author: Takayuki Tsunakawa Reviewed-by: Kyotaro Horiguchi Discussion: https://postgr.es/m/CA+hUKG+oPoFizjABt=GXZWTEHx3oev5rAe2scjW2r6F1rguo5w@mail.gmail.com	2021-03-12 15:42:08 +05:30
Amit Kapila	e2cda3c20a	Fix use of relcache TriggerDesc field introduced by commit `05c8482f7f`. The commit added code which used a relcache TriggerDesc field across another cache access, which it shouldn't because the relcache doesn't guarantee it won't get moved. Diagnosed-by: Tom Lane Author: Greg Nancarrow Reviewed-by: Hou Zhijie, Amit Kapila Discussion: https://postgr.es/m/2309260.1615485644@sss.pgh.pa.us	2021-03-12 15:14:41 +05:30
Thomas Munro	57dcc2ef33	Poll postmaster less frequently in recovery. Since commits `9f095299` and `f98b8476` we don't poll the postmaster pipe at all during crash recovery on Linux and FreeBSD, but on other operating systems we were still doing it for every WAL record. Do it less frequently on operating systems where system calls are required, at the cost of delaying exit a bit after postmaster death. This avoids expensive system calls reported to slow down CPU-bound recovery by as much as 10-30%. Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi> Reviewed-by: Fujii Masao <masao.fujii@oss.nttdata.com> Reviewed-by: Michael Paquier <michael@paquier.xyz> Discussion: https://postgr.es/m/CA%2BhUKGK1607VmtrDUHQXrsooU%3Dap4g4R2yaoByWOOA3m8xevUQ%40mail.gmail.com Discussion: https://postgr.es/m/7261eb39-0369-f2f4-1bb5-62f3b6083b5e@iki.fi	2021-03-12 19:45:42 +13:00
Thomas Munro	de829ddf23	Add condition variable for walreceiver shutdown. Use this new CV to wait for walreceiver shutdown without a sleep/poll loop, while also benefiting from standard postmaster death handling. Discussion: https://postgr.es/m/CA%2BhUKGK1607VmtrDUHQXrsooU%3Dap4g4R2yaoByWOOA3m8xevUQ%40mail.gmail.com	2021-03-12 19:45:42 +13:00
Thomas Munro	600f2f50b7	Add condition variable for recovery resume. Replace a sleep loop with a CV, to get a fast reaction time when recovery is resumed or the postmaster exits via standard infrastructure. Unfortunately we still need to wake up every second to perform extra polling during the recovery pause loop. Discussion: https://postgr.es/m/CA%2BhUKGK1607VmtrDUHQXrsooU%3Dap4g4R2yaoByWOOA3m8xevUQ%40mail.gmail.com	2021-03-12 19:45:42 +13:00
Fujii Masao	b82640df00	Send statistics collected during shutdown checkpoint to the stats collector. When shutdown is requested, checkpointer performs checkpoint or restartpoint, and updates the statistics, before it exits. But previously checkpointer didn't send those statistics to the stats collector. Shutdown checkpoint and restartpoint are treated as requested ones instead of scheduled ones, so the number of them are counted in pg_stat_bgwriter.checkpoints_req column. Author: Masahiro Ikeda Reviewed-by: Fujii Masao Discussion: https://postgr.es/m/0509ad67b585a5b86a83d445dfa75392@oss.nttdata.com	2021-03-12 14:23:00 +09:00
Fujii Masao	33394ee6f2	Force to send remaining WAL stats to the stats collector at walwriter exit. In walwriter's main loop, WAL stats message is only sent if enough time has passed since last one was sent to reach PGSTAT_STAT_INTERVAL msecs. This is necessary to avoid overloading to the stats collector. But this can cause recent WAL stats to be unsent when walwriter exits. To ensure that all the WAL stats are sent, this commit makes walwriter force to send remaining WAL stats to the collector when it exits because of shutdown request. Note that those remaining WAL stats can still be unsent when walwriter exits with non-zero exit code (e.g., FATAL error). This is OK because that walwriter exit leads to server crash and subsequent recovery discards all the stats. So there is no need to send remaining stats in that case. Author: Masahiro Ikeda Reviewed-by: Fujii Masao Discussion: https://postgr.es/m/0509ad67b585a5b86a83d445dfa75392@oss.nttdata.com	2021-03-12 13:29:59 +09:00
Thomas Munro	43c6662496	Minor modernization for README.barrier. Itanium is very uncommon and being discontinued. ARM is everywhere. Prefer ARM as an example of an architecture with weak memory ordering.	2021-03-12 15:36:16 +13:00
Peter Geoghegan	7bb97211a5	Save a few cycles during nbtree VACUUM. Avoid calling RelationGetNumberOfBlocks() unnecessarily in the common case where there are no deleted but not yet recycled pages to recycle during a cleanup-only nbtree VACUUM operation. Follow-up to commit `e5d8a999`, which (among other things) taught the "skip full scan" nbtree VACUUM mechanism to only trigger a full index scan when the absolute number of deleted pages in the index is considered excessive.	2021-03-11 14:18:23 -08:00
Peter Geoghegan	effdd3f3b6	Add back vacuum_cleanup_index_scale_factor parameter. Commit `9f3665fb` removed the vacuum_cleanup_index_scale_factor storage parameter. However, that creates dump/reload hazards when moving across major versions. Add back the vacuum_cleanup_index_scale_factor parameter (though not the GUC of the same name) purely to avoid problems when using tools like pg_upgrade. The parameter remains disabled and undocumented. No backpatch to Postgres 13, since vacuum_cleanup_index_scale_factor was only disabled by REL_13_STABLE's version of master branch commit `9f3665fb` in the first place -- the parameter already looks like this on REL_13_STABLE. Discussion: https://postgr.es/m/YEm/a3Ko3nKnBuVq@paquier.xyz	2021-03-11 12:42:46 -08:00
Robert Haas	32fd2b57d7	Be clear about whether a recovery pause has taken effect. Previously, the code and documentation seem to have essentially assumed than a call to pg_wal_replay_pause() would take place immediately, but that's not the case, because we only check for a pause in certain places. This means that a tool that uses this function and then wants to do something else afterward that is dependent on the pause having taken effect doesn't know how long it needs to wait to be sure that no more WAL is going to be replayed. To avoid that, add a new function pg_get_wal_replay_pause_state() which returns either 'not paused', 'paused requested', or 'paused'. After calling pg_wal_replay_pause() the status will immediate change from 'not paused' to 'pause requested'; when the startup process has noticed this, the status will change to 'pause'. For backward compatibility, pg_is_wal_replay_paused() still exists and returns the same thing as before: true if a pause has been requested, whether or not it has taken effect yet; and false if not. The documentation is updated to clarify. To improve the changes that a pause request is quickly confirmed effective, adjust things so that WaitForWALToBecomeAvailable will swiftly reach a call to recoveryPausesHere() when a pause request is made. Dilip Kumar, reviewed by Simon Riggs, Kyotaro Horiguchi, Yugo Nagata, Masahiko Sawada, and Bharath Rupireddy. Discussion: http://postgr.es/m/CAFiTN-vcLLWEm8Zr%3DYK83rgYrT9pbC8VJCfa1kY9vL3AUPfu6g%40mail.gmail.com	2021-03-11 15:07:03 -05:00
Peter Geoghegan	5f8727f5a6	VACUUM ANALYZE: Always update pg_class.reltuples. vacuumlazy.c sometimes fails to update pg_class entries for each index (to ensure that pg_class.reltuples is current), even though analyze.c assumed that that must have happened during VACUUM ANALYZE. There are at least a couple of reasons for this. For example, vacuumlazy.c could fail to update pg_class when the index AM indicated that its statistics are merely an estimate, per the contract for amvacuumcleanup() routines established by commit `e57345975c` back in 2006. Stop assuming that pg_class must have been updated with accurate statistics within VACUUM ANALYZE -- update pg_class for indexes at the same time as the table relation in all cases. That way VACUUM ANALYZE will never fail to keep pg_class.reltuples reasonably accurate. The only downside of this approach (compared to the old approach) is that it might inaccurately set pg_class.reltuples for indexes whose heap relation ends up with the same inaccurate value anyway. This doesn't seem too bad. We already consistently called vac_update_relstats() (to update pg_class) for the heap/table relation twice during any VACUUM ANALYZE -- once in vacuumlazy.c, and once in analyze.c. We now make sure that we call vac_update_relstats() at least once (though often twice) for each index. This is follow up work to commit `9f3665fb`, which dealt with issues in btvacuumcleanup(). Technically this fixes an unrelated issue, though. btvacuumcleanup() no longer provides an accurate num_index_tuples value following commit `9f3665fb` (when there was no btbulkdelete() call during the VACUUM operation in question), but hashvacuumcleanup() has worked in the same way for many years now. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Discussion: https://postgr.es/m/CAH2-WzknxdComjhqo4SUxVFk_Q1171GJO2ZgHZ1Y6pion6u8rA@mail.gmail.com Backpatch: 13-, just like commit `9f3665fb`.	2021-03-10 17:07:57 -08:00
Peter Geoghegan	9f3665fbfc	Don't consider newly inserted tuples in nbtree VACUUM. Remove the entire idea of "stale stats" within nbtree VACUUM (stop caring about stats involving the number of inserted tuples). Also remove the vacuum_cleanup_index_scale_factor GUC/param on the master branch (though just disable them on postgres 13). The vacuum_cleanup_index_scale_factor/stats interface made the nbtree AM partially responsible for deciding when pg_class.reltuples stats needed to be updated. This seems contrary to the spirit of the index AM API, though -- it is not actually necessary for an index AM's bulk delete and cleanup callbacks to provide accurate stats when it happens to be inconvenient. The core code owns that. (Index AMs have the authority to perform or not perform certain kinds of deferred cleanup based on their own considerations, such as page deletion and recycling, but that has little to do with pg_class.reltuples/num_index_tuples.) This issue was fairly harmless until the introduction of the autovacuum_vacuum_insert_threshold feature by commit `b07642db`, which had an undesirable interaction with the vacuum_cleanup_index_scale_factor mechanism: it made insert-driven autovacuums perform full index scans, even though there is no real benefit to doing so. This has been tied to a regression with an append-only insert benchmark [1]. Also have remaining cases that perform a full scan of an index during a cleanup-only nbtree VACUUM indicate that the final tuple count is only an estimate. This prevents vacuumlazy.c from setting the index's pg_class.reltuples in those cases (it will now only update pg_class when vacuumlazy.c had TIDs for nbtree to bulk delete). This arguably fixes an oversight in deduplication-related bugfix commit `48e12913`. [1] https://smalldatum.blogspot.com/2021/01/insert-benchmark-postgres-is-still.html Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Discussion: https://postgr.es/m/CAD21AoA4WHthN5uU6+WScZ7+J_RcEjmcuH94qcoUPuB42ShXzg@mail.gmail.com Backpatch: 13-, where autovacuum_vacuum_insert_threshold was added.	2021-03-10 16:27:01 -08:00
Bruce Momjian	845ac7f847	C comments: improve description of GiST NSN and GistBuildLSN GiST indexes are complex, so adding more details in the code might help someone. Discussion: https://postgr.es/m/20210302164021.GA364@momjian.us	2021-03-10 17:03:10 -05:00
Thomas Munro	d87251048a	Replace buffer I/O locks with condition variables. 1. Backends waiting for buffer I/O are now interruptible. 2. If something goes wrong in a backend that is currently performing I/O, waiting backends no longer wake up until that backend reaches AbortBufferIO() and broadcasts on the CV. Previously, any waiters would wake up (because the I/O lock was automatically released) and then busy-loop until AbortBufferIO() cleared BM_IO_IN_PROGRESS. 3. LWLockMinimallyPadded is removed, as it would now be unused. Author: Robert Haas <robertmhaas@gmail.com> Reviewed-by: Thomas Munro <thomas.munro@gmail.com> Reviewed-by: Julien Rouhaud <rjuju123@gmail.com> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> (earlier version, 2016) Discussion: https://postgr.es/m/CA%2BhUKGJ8nBFrjLuCTuqKN0pd2PQOwj9b_jnsiGFFMDvUxahj_A%40mail.gmail.com Discussion: https://postgr.es/m/CA+Tgmoaj2aPti0yho7FeEf2qt-JgQPRWb0gci_o1Hfr=C56Xng@mail.gmail.com	2021-03-11 10:36:17 +13:00
Tom Lane	c3ffe34863	Avoid creating duplicate cached plans for inherited FK constraints. When a foreign key constraint is applied to a partitioned table, each leaf partition inherits a similar FK constraint. We were processing all of those constraints independently, meaning that in large partitioning trees we'd build up large collections of cached FK-checking query plans. However, in all cases but one, the generated queries are actually identical for all members of the inheritance tree (because, in most cases, the query only mentions the topmost table of the other side of the FK relationship). So we can share a single cached plan among all the partitions, saving memory, not to mention time to build and maintain the cached plans. Keisuke Kuroda and Amit Langote Discussion: https://postgr.es/m/cab4b85d-9292-967d-adf2-be0d803c3e23@nttcom.co.jp_1	2021-03-10 14:22:31 -05:00
Peter Eisentraut	bbaf315309	Add bound check before bsearch() for performance In the current lazy vacuum implementation, some index AMs such as btree indexes call lazy_tid_reaped() for each index tuple during ambulkdelete to check if the index tuple points to the (collected) garbage tuple. In that function, we simply call bsearch(), but we should be able to know the result without bsearch() if the index tuple points to the heap tuple that is out of range of the collected garbage tuples. Therefore, add a simple bound check before resorting to bsearch(). Testing has shown that this can give significant performance benefits. Author: Masahiko Sawada <masahiko.sawada@2ndquadrant.com> Discussion: https://www.postgresql.org/message-id/flat/CA+fd4k76j8jKzJzcx8UqEugvayaMSnQz0iLUt_XgBp-_-bd22A@mail.gmail.com	2021-03-10 15:19:37 +01:00
Peter Eisentraut	1657b37d7c	Small debug message tweak This makes the wording of the delete case match the update case.	2021-03-10 08:16:38 +01:00
Amit Kapila	e4e87a32cc	Fix valgrind issue in commit `05c8482f7f`. Initialize other newly added variables in max_parallel_hazard_context via is_parallel_safe() because we don't check the parallel-safety of target relations in that function. Reported-by: Tom Lane as per buildfarm Author: Amit Kapila Discussion: https://postgr.es/m/2060179.1615347455@sss.pgh.pa.us	2021-03-10 10:06:39 +05:30
Amit Kapila	05c8482f7f	Enable parallel SELECT for "INSERT INTO ... SELECT ...". Parallel SELECT can't be utilized for INSERT in the following cases: - INSERT statement uses the ON CONFLICT DO UPDATE clause - Target table has a parallel-unsafe: trigger, index expression or predicate, column default expression or check constraint - Target table has a parallel-unsafe domain constraint on any column - Target table is a partitioned table with a parallel-unsafe partition key expression or support function The planner is updated to perform additional parallel-safety checks for the cases listed above, for determining whether it is safe to run INSERT in parallel-mode with an underlying parallel SELECT. The planner will consider using parallel SELECT for "INSERT INTO ... SELECT ...", provided nothing unsafe is found from the additional parallel-safety checks, or from the existing parallel-safety checks for SELECT. While checking parallel-safety, we need to check it for all the partitions on the table which can be costly especially when we decide not to use a parallel plan. So, in a separate patch, we will introduce a GUC and or a reloption to enable/disable parallelism for Insert statements. Prior to entering parallel-mode for the execution of INSERT with parallel SELECT, a TransactionId is acquired and assigned to the current transaction state. This is necessary to prevent the INSERT from attempting to assign the TransactionId whilst in parallel-mode, which is not allowed. This approach has a disadvantage in that if the underlying SELECT does not return any rows, then the TransactionId is not used, however that shouldn't happen in practice in many cases. Author: Greg Nancarrow, Amit Langote, Amit Kapila Reviewed-by: Amit Langote, Hou Zhijie, Takayuki Tsunakawa, Antonin Houska, Bharath Rupireddy, Dilip Kumar, Vignesh C, Zhihong Yu, Amit Kapila Tested-by: Tang, Haiying Discussion: https://postgr.es/m/CAJcOf-cXnB5cnMKqWEp2E2z7Mvcd04iLVmV=qpFJrR3AcrTS3g@mail.gmail.com Discussion: https://postgr.es/m/CAJcOf-fAdj=nDKMsRhQzndm-O13NY4dL6xGcEvdX5Xvbbi0V7g@mail.gmail.com	2021-03-10 07:38:58 +05:30
Fujii Masao	ff99918c62	Track total amounts of times spent writing and syncing WAL data to disk. This commit adds new GUC track_wal_io_timing. When this is enabled, the total amounts of time XLogWrite writes and issue_xlog_fsync syncs WAL data to disk are counted in pg_stat_wal. This information would be useful to check how much WAL write and sync affect the performance. Enabling track_wal_io_timing will make the server query the operating system for the current time every time WAL is written or synced, which may cause significant overhead on some platforms. To avoid such additional overhead in the server with track_io_timing enabled, this commit introduces track_wal_io_timing as a separate parameter from track_io_timing. Note that WAL write and sync activity by walreceiver has not been tracked yet. This commit makes the server also track the numbers of times XLogWrite writes and issue_xlog_fsync syncs WAL data to disk, in pg_stat_wal, regardless of the setting of track_wal_io_timing. This counters can be used to calculate the WAL write and sync time per request, for example. Bump PGSTAT_FILE_FORMAT_ID. Bump catalog version. Author: Masahiro Ikeda Reviewed-By: Japin Li, Hayato Kuroda, Masahiko Sawada, David Johnston, Fujii Masao Discussion: https://postgr.es/m/0509ad67b585a5b86a83d445dfa75392@oss.nttdata.com	2021-03-09 16:52:06 +09:00
Michael Paquier	9d2d457009	Add support for more progress reporting in COPY The command (TO or FROM), its type (file, pipe, program or callback), and the number of tuples excluded by a WHERE clause in COPY FROM are added to the progress reporting already available. The column "lines_processed" is renamed to "tuples_processed" to disambiguate the meaning of this column in the cases of CSV and BINARY COPY and to be more consistent with the other catalog progress views. Bump catalog version, again. Author: Matthias van de Meent Reviewed-by: Michael Paquier, Justin Pryzby, Bharath Rupireddy, Josef Šimánek, Tomas Vondra Discussion: https://postgr.es/m/CAEze2WiOcgdH4aQA8NtZq-4dgvnJzp8PohdeKchPkhMY-jWZXA@mail.gmail.com	2021-03-09 14:21:03 +09:00
Michael Paquier	f9264d1524	Remove support for SSL compression PostgreSQL disabled compression as of `e3bdb2d` and the documentation recommends against using it since. Additionally, SSL compression has been disabled in OpenSSL since version 1.1.0, and was disabled in many distributions long before that. The most recent TLS version, TLSv1.3, disallows compression at the protocol level. This commit removes the feature itself, removing support for the libpq parameter sslcompression (parameter still listed for compatibility reasons with existing connection strings, just ignored), and removes the equivalent field in pg_stat_ssl and de facto PgBackendSSLStatus. Note that, on top of removing the ability to activate compression by configuration, compression is actively disabled in both frontend and backend to avoid overrides from local configurations. A TAP test is added for deprecated SSL parameters to check after backwards compatibility. Bump catalog version. Author: Daniel Gustafsson Reviewed-by: Peter Eisentraut, Magnus Hagander, Michael Paquier Discussion: https://postgr.es/m/7E384D48-11C5-441B-9EC3-F7DB1F8518F6@yesql.se	2021-03-09 11:16:47 +09:00
Tom Lane	d4545dc19b	Complain if a function-in-FROM returns a set when it shouldn't. Throw a "function protocol violation" error if a function in FROM tries to return a set though it wasn't marked proretset. Although such cases work at the moment, it doesn't seem like something we want to guarantee will keep working. Besides, there are other negative consequences of not setting the proretset flag, such as potentially bad plans. No back-patch, since if there is any third-party code violating this expectation, people wouldn't appreciate us breaking it in a minor release. Discussion: https://postgr.es/m/1636062.1615141782@sss.pgh.pa.us	2021-03-08 18:54:55 -05:00
Tom Lane	5c06abb9b9	Validate the OID argument of pg_import_system_collations(). "SELECT pg_import_system_collations(0)" caused an assertion failure. With a random nonzero argument --- or indeed with zero, in non-assert builds --- it would happily make pg_collation entries with garbage values of collnamespace. These are harmless as far as I can tell (unless maybe the OID happens to become used for a schema, later on?). In any case this isn't a security issue, since the function is superuser-only. But it seems like a gotcha for unwary DBAs, so let's add a check that the given OID belongs to some schema. Back-patch to v10 where this function was introduced.	2021-03-08 18:21:51 -05:00
Tom Lane	6c20bdb2a2	Further tweak memory management for regex DFAs. Coverity is still unhappy after commit `190c79884`, and after looking closer I think it might be onto something. The callers of newdfa() typically drop out if v->err has been set nonzero, which newdfa() is faithfully doing if it fails. However, what if v->err was already nonzero before we entered newdfa()? Then newdfa() could succeed and the caller would promptly leak its result. I don't think this scenario can actually happen, but the predicate "v->err is always zero when newdfa() is called" seems difficult to be entirely sure of; there's a good deal of code that potentially could get that wrong. It seems better to adjust the callers to directly check for a null result instead of relying on ISERR() tests. This is slightly cheaper than the previous coding anyway. Lacking evidence that there's any real bug, no back-patch.	2021-03-08 16:32:29 -05:00
Amit Kapila	8a812e5106	Track replication origin progress for rollbacks. Commit `1eb6d6527a` allowed to track replica origin replay progress for 2PC but it was not complete. It misses to properly track the progress for rollback prepared especially it missed updating the code for recovery. Additionally, we need to allow tracking it on subscriber nodes where wal_level might not be logical. It is required to track decoding of 2PC which is committed in PG14 (`a271a1b50e`) and also nobody complained about this till now so not backpatching it. Author: Amit Kapila Reviewed-by: Michael Paquier and Ajin Cherian Discussion: https://postgr.es/m/CAA4eK1L-kHmMnSdrRW6UhRbCjR7cgh04c+6psY15qzT6ktcd+g@mail.gmail.com	2021-03-08 07:54:03 +05:30
Heikki Linnakangas	3174d69fb9	Remove server and libpq support for old FE/BE protocol version 2. Protocol version 3 was introduced in PostgreSQL 7.4. There shouldn't be many clients or servers left out there without version 3 support. But as a courtesy, I kept just enough of the old protocol support that we can still send the "unsupported protocol version" error in v2 format, so that old clients can display the message properly. Likewise, libpq still understands v2 ErrorResponse messages when establishing a connection. The impetus to do this now is that I'm working on a patch to COPY FROM, to always prefetch some data. We cannot do that safely with the old protocol, because it requires parsing the input one byte at a time to detect the end-of-copy marker. Reviewed-by: Tom Lane, Alvaro Herrera, John Naylor Discussion: https://www.postgresql.org/message-id/9ec25819-0a8a-d51a-17dc-4150bb3cca3b%40iki.fi	2021-03-04 10:45:55 +02:00
Tom Lane	0a687c8f10	Add trim_array() function. This has been in the SQL spec since 2008. It's a pretty thin wrapper around the array slice functionality, but the spec says we should have it, so here it is. Vik Fearing, reviewed by Dian Fay Discussion: https://postgr.es/m/fc92ce17-9655-8ff1-c62a-4dc4c8ccd815@postgresfriends.org	2021-03-03 16:39:57 -05:00
Peter Eisentraut	e527a99055	Some copy-editing of GUC descriptions	2021-03-03 07:14:35 +01:00
Thomas Munro	8eda3eba30	Use sort_template.h for qsort_tuple() and qsort_ssup(). Replace the Perl code previously used to generate specialized sort functions with sort_template.h. Reviewed-by: Daniel Gustafsson <daniel@yesql.se> Discussion: https://postgr.es/m/CA%2BhUKGJ2-eaDqAum5bxhpMNhvuJmRDZxB_Tow0n-gse%2BHG0Yig%40mail.gmail.com	2021-03-03 17:02:32 +13:00
Amit Kapila	19890a064e	Add option to enable two_phase commits via pg_create_logical_replication_slot. Commit `0aa8a01d04` extends the output plugin API to allow decoding of prepared xacts and allowed the user to enable/disable the two-phase option via pg_logical_slot_get_changes(). This can lead to a problem such that the first time when it gets changes via pg_logical_slot_get_changes() without two_phase option enabled it will not get the prepared even though prepare is after consistent snapshot. Now next time during getting changes, if the two_phase option is enabled it can skip prepare because by that time start decoding point has been moved. So the user will only get commit prepared. Allow to enable/disable this option at the create slot time and default will be false. It will break the existing slots which is fine in a major release. Author: Ajin Cherian Reviewed-by: Amit Kapila and Vignesh C Discussion: https://postgr.es/m/d0f60d60-133d-bf8d-bd70-47784d8fabf3@enterprisedb.com	2021-03-03 07:34:11 +05:30
Peter Geoghegan	5b2f2af3d9	nbtree page deletion: Add leaftopparent assertion. Add documenting assertion. This makes it easier to follow how we maintain the top parent link in target subtree's half-dead/leaf level page.	2021-03-02 14:06:07 -08:00
Peter Geoghegan	3d8d5787a3	Fix nbtree page deletion error messages. Adjust some "can't happen" error messages that assumed that the page deletion target page must be a half-dead page. This assumption was wrong in the case of an internal target page. Simply refer to these pages as the target page instead. Internal pages are never marked half-dead. There is exactly one half-dead page for each subtree undergoing deletion. The half-dead page is also the target subtree's leaf-level page. This has been the case since commit `efada2b8`, which totally overhauled nbtree page deletion.	2021-03-02 13:02:24 -08:00
Tom Lane	d16f8c8e41	Mark default_transaction_read_only as GUC_REPORT. This allows clients to find out the setting at connection time without having to expend a query round trip to do so; which is helpful when trying to identify read/write servers. (One must also look at in_hot_standby, but that's already GUC_REPORT, cf bf8a662c9.) Modifying libpq to make use of this will come soon, but I felt it cleaner to push the server change separately. Haribabu Kommi, Greg Nancarrow, Vignesh C; reviewed at various times by Laurenz Albe, Takayuki Tsunakawa, Peter Smith. Discussion: https://postgr.es/m/CAF3+xM+8-ztOkaV9gHiJ3wfgENTq97QcjXQt+rbFQ6F7oNzt9A@mail.gmail.com	2021-03-02 13:53:54 -05:00
Tom Lane	4604f83fdf	Suppress unnecessary regex subre nodes in a couple more cases. This extends the changes made in commit `cebc1d34e`, teaching parseqatom() to generate fewer or cheaper subre nodes in some edge cases. The case of interest here is a quantified atom that is "messy" only because it has greediness opposite to what preceded it (whereas captures and backrefs are intrinsically messy). In this case we don't need an iteration node, since we don't care where the sub-matches of the quantifier are; and we might also not need a second concatenation node. This seems of only marginal real-world use according to my testing, but I wanted to get it in before wrapping up this series of regex performance fixes. Discussion: https://postgr.es/m/1340281.1613018383@sss.pgh.pa.us	2021-03-02 12:14:14 -05:00
Tom Lane	0c3405cf11	Improve performance of regular expression back-references. In some cases, at the time that we're doing an NFA-based precheck of whether a backref subexpression can match at a particular place in the string, we already know which substring the referenced subexpression matched. If so, we might as well forget about the NFA and just compare the substring; this is faster and it gives an exact rather than approximate answer. In general, this optimization can help while we are prechecking within the second child expression of a concat node, while the capture was within the first child expression; then the substring was saved during cdissect() of the first child and will be available to NFA checks done while cdissect() recurses into the second child. It can help quite a lot if the tree looks like concat / \ capture concat / \ expensive stuff backref as we will be able to avoid recursively dissecting the "expensive stuff" before discovering that the backref isn't satisfied with a particular midpoint that the lower concat node is testing. This doesn't help if the concat tree is left-deep, as the capture node won't get set soon enough (and it's hard to fix that without changing the engine's match behavior). Fortunately, right-deep concat trees are the common case. Patch by me, reviewed by Joel Jacobson Discussion: https://postgr.es/m/661609.1614560029@sss.pgh.pa.us	2021-03-02 11:55:12 -05:00
Tom Lane	4aea704a5b	Fix semantics of regular expression back-references. POSIX defines the behavior of back-references thus: The back-reference expression '\n' shall match the same (possibly empty) string of characters as was matched by a subexpression enclosed between "\(" and "\)" preceding the '\n'. As far as I can see, the back-reference is supposed to consider only the data characters matched by the referenced subexpression. However, because our engine copies the NFA constructed from the referenced subexpression, it effectively enforces any constraints therein, too. As an example, '(^.)\1' ought to match 'xx', or any other string starting with two occurrences of the same character; but in our code it does not, and indeed can't match anything, because the '^' anchor constraint is included in the backref's copied NFA. If POSIX intended that, you'd think they'd mention it. Perl for one doesn't act that way, so it's hard to conclude that this isn't a bug. Fix by modifying the backref's NFA immediately after it's copied from the reference, replacing all constraint arcs by EMPTY arcs so that the constraints are treated as automatically satisfied. This still allows us to enforce matching rules that depend only on the data characters; for example, in '(^\d+).*\1' the NFA matching step will still know that the backref can only match strings of digits. Perhaps surprisingly, this change does not affect the results of any of a rather large corpus of real-world regexes. Nonetheless, I would not consider back-patching it, since it's a clear compatibility break. Patch by me, reviewed by Joel Jacobson Discussion: https://postgr.es/m/661609.1614560029@sss.pgh.pa.us	2021-03-02 11:34:53 -05:00
Michael Paquier	fabde52fab	Simplify code to switch pg_class.relrowsecurity in tablecmds.c The same code pattern was repeated twice to enable or disable ROW LEVEL SECURITY with an ALTER TABLE command. This makes the code slightly cleaner. Author: Justin Pryzby Reviewed-by: Zhihong Yu Discussion: https://postgr.es/m/20210228211854.GC20769@telsasoft.com	2021-03-02 12:30:21 +09:00
Tom Lane	ffd3944ab9	Improve reporting for syntax errors in multi-line JSON data. Point to the specific line where the error was detected; the previous code tended to include several preceding lines as well. Avoid re-scanning the entire input to recompute which line that was. Simplify the logic a bit. Add test cases. Simon Riggs and Hamid Akhtar, reviewed by Daniel Gustafsson and myself Discussion: https://postgr.es/m/CANbhV-EPBnXm3MF_TTWBwwqgn1a1Ghmep9VHfqmNBQ8BT0f+_g@mail.gmail.com	2021-03-01 16:44:17 -05:00
Thomas Munro	bd69ddfcdb	Remove obsolete comment for WaitForProcSignalBarrier(). Commit `814f1d8b` removed the behavior described. Reported-by: Amit Kapila <amit.kapila16@gmail.com>	2021-03-02 09:30:57 +13:00
Thomas Munro	f5a5773a9d	Allow condition variables to be used in interrupt code. Adjust the condition variable sleep loop to work correctly when code reached by its internal CHECK_FOR_INTERRUPTS() call interacts with another condition variable. There are no such cases currently, but a proposed patch would do this. Discussion: https://postgr.es/m/CA+hUKGLdemy2gBm80kz20GTe6hNVwoErE8KwcJk6-U56oStjtg@mail.gmail.com	2021-03-01 17:24:47 +13:00
Thomas Munro	814f1d8bc3	Use condition variables for ProcSignalBarriers. Instead of a poll/sleep loop, use a condition variable for precise wake-up whenever a backend's pss_barrierGeneration advances. Discussion: https://postgr.es/m/CA+hUKGLdemy2gBm80kz20GTe6hNVwoErE8KwcJk6-U56oStjtg@mail.gmail.com	2021-03-01 17:23:43 +13:00
Amit Kapila	8bdb1332eb	Avoid repeated decoding of prepared transactions after a restart. In commit `a271a1b50e`, we allowed decoding at prepare time and the prepare was decoded again if there is a restart after decoding it. It was done that way because we can't distinguish between the cases where we have not decoded the prepare because it was prior to consistent snapshot or we have decoded it earlier but restarted. To distinguish between these two cases, we have introduced an initial_consistent_point at the slot level which is an LSN at which we found a consistent point at the time of slot creation. This is also the point where we have exported a snapshot for the initial copy. So, prepare transaction prior to this point are sent along with commit prepared. This commit bumps SNAPBUILD_VERSION because of change in SnapBuild. It will break existing slots which is fine in a major release. Author: Ajin Cherian, based on idea by Andres Freund Reviewed-by: Amit Kapila and Vignesh C Discussion: https://postgr.es/m/d0f60d60-133d-bf8d-bd70-47784d8fabf3@enterprisedb.com	2021-03-01 09:11:18 +05:30
Thomas Munro	6230912f23	Use FeBeWaitSet for walsender.c. This avoids the need to set up and tear down a fresh WaitEventSet every time we need need to wait. We have to add an explicit exit on postmaster exit (FeBeWaitSet isn't set up to do that automatically), so move the code to do that into a new function to avoid repetition. Reviewed-by: Kyotaro Horiguchi <horikyota.ntt@gmail.com> (earlier version) Discussion: https://postgr.es/m/CA%2BhUKGJAC4Oqao%3DqforhNey20J8CiG2R%3DoBPqvfR0vOJrFysGw%40mail.gmail.com	2021-03-01 16:19:38 +13:00
Thomas Munro	a042ba2ba7	Introduce symbolic names for FeBeWaitSet positions. Previously we used 0 and 1 to refer to the socket and latch in far flung parts of the tree, without any explanation. Also use PGINVALID_SOCKET rather than -1 in a couple of places that didn't already do that. Reviewed-by: Kyotaro Horiguchi <horikyota.ntt@gmail.com> Discussion: https://postgr.es/m/CA%2BhUKGJAC4Oqao%3DqforhNey20J8CiG2R%3DoBPqvfR0vOJrFysGw%40mail.gmail.com	2021-03-01 16:10:16 +13:00
Amit Kapila	b4e3dc7fd4	Update the docs and comments for decoding of prepared xacts. Commit `a271a1b50e` introduced decoding at prepare time in ReorderBuffer. This can lead to deadlock for out-of-core logical replication solutions that uses this feature to build distributed 2PC in case such transactions lock [user] catalog tables exclusively. They need to inform users to not have locks on catalog tables (via explicit LOCK command) in such transactions. Reported-by: Andres Freund Discussion: https://postgr.es/m/20210222222847.tpnb6eg3yiykzpky@alap3.anarazel.de	2021-03-01 08:14:33 +05:30
Thomas Munro	6148656a0b	Use EVFILT_SIGNAL for kqueue latches. Cut down on system calls and other overheads by waiting for SIGURG explicitly with kqueue instead of using a signal handler and self-pipe. Affects *BSD and macOS systems. This leaves only the poll implementation with a signal handler and the traditional self-pipe trick. Discussion: https://postgr.es/m/CA+hUKGJjxPDpzBE0a3hyUywBvaZuC89yx3jK9RFZgfv_KHU7gg@mail.gmail.com	2021-03-01 14:20:04 +13:00
Thomas Munro	6a2a70a020	Use signalfd(2) for epoll latches. Cut down on system calls and other overheads by reading from a signalfd instead of using a signal handler and self-pipe. Affects Linux sytems, and possibly others including illumos that implement the Linux epoll and signalfd interfaces. Reviewed-by: Andres Freund <andres@anarazel.de> Discussion: https://postgr.es/m/CA+hUKGJjxPDpzBE0a3hyUywBvaZuC89yx3jK9RFZgfv_KHU7gg@mail.gmail.com	2021-03-01 14:12:02 +13:00
Thomas Munro	83709a0d5a	Use SIGURG rather than SIGUSR1 for latches. Traditionally, SIGUSR1 has been overloaded for ad-hoc signals, procsignal.c signals and latch.c wakeups. Move that last use over to a new dedicated signal. SIGURG is normally used to report out-of-band socket data, but PostgreSQL doesn't use that facility. The signal handler is now installed in all postmaster children by InitializeLatchSupport(). Those wishing to disconnect from it should call ShutdownLatchSupport(). Future patches will use this separation of signals to avoid the need for a signal handler on some operating systems. Discussion: https://postgr.es/m/CA+hUKGJjxPDpzBE0a3hyUywBvaZuC89yx3jK9RFZgfv_KHU7gg@mail.gmail.com	2021-03-01 12:44:12 +13:00
Thomas Munro	c8f3bc2401	Optimize latches to send fewer signals. Don't send signals to processes that aren't sleeping. Author: Andres Freund <andres@anarazel.de> Discussion: https://postgr.es/m/CA+hUKGJjxPDpzBE0a3hyUywBvaZuC89yx3jK9RFZgfv_KHU7gg@mail.gmail.com	2021-03-01 12:44:12 +13:00
Thomas Munro	d1b90995e8	Remove latch.c workaround for Linux < 2.6.27. Commit `82ebbeb0` added a workaround for systems with no epoll_create1() and EPOLL_CLOEXEC. Linux < 2.6.27 and glibc < 2.9 are long gone. Now seems like a good time to drop the extra code, because otherwise we'd have to add similar already-dead workaround code to new patches using XXX_CLOEXEC flags that arrived in the same kernel release. Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/CA%2BhUKGKL_%3DaO%3Dr30N%3Ds9VoDgTqHpRSzePRbA9dkYO7snc7HsxA%40mail.gmail.com	2021-03-01 11:24:28 +13:00
Alvaro Herrera	25936fd46c	Fix use-after-free bug with AfterTriggersTableData.storeslot AfterTriggerSaveEvent() wrongly allocates the slot in execution-span memory context, whereas the correct thing is to allocate it in a transaction-span context, because that's where the enclosing AfterTriggersTableData instance belongs into. Backpatch to 12 (the test back to 11, where it works well with no code changes, and it's good to have to confirm that the case was previously well supported); this bug seems introduced by commit `ff11e7f4b9`. Reported-by: Bertrand Drouvot <bdrouvot@amazon.com> Author: Amit Langote <amitlangote09@gmail.com> Discussion: https://postgr.es/m/39a71864-b120-5a5c-8cc5-c632b6f16761@amazon.com	2021-02-27 18:09:15 -03:00
David Rowley	977b2c0853	Add missing TidRangeScan readfunc Mistakenly forgotten in `bb437f995`	2021-02-27 23:21:21 +13:00
David Rowley	bb437f995d	Add TID Range Scans to support efficient scanning ranges of TIDs This adds a new executor node named TID Range Scan. The query planner will generate paths for TID Range scans when quals are discovered on base relations which search for ranges on the table's ctid column. These ranges may be open at either end. For example, WHERE ctid >= '(10,0)'; will return all tuples on page 10 and over. To support this, two new optional callback functions have been added to table AM. scan_set_tidrange is used to set the scan range to just the given range of TIDs. scan_getnextslot_tidrange fetches the next tuple in the given range. For AMs were scanning ranges of TIDs would not make sense, these functions can be set to NULL in the TableAmRoutine. The query planner won't generate TID Range Scan Paths in that case. Author: Edmund Horner, David Rowley Reviewed-by: David Rowley, Tomas Vondra, Tom Lane, Andres Freund, Zhihong Yu Discussion: https://postgr.es/m/CAMyN-kB-nFTkF=VA_JPwFNo08S0d-Yk0F741S2B7LDmYAi8eyA@mail.gmail.com	2021-02-27 22:59:36 +13:00
Peter Eisentraut	f4adc41c4f	Enhanced cycle mark values Per SQL:202x draft, in the CYCLE clause of a recursive query, the cycle mark values can be of type boolean and can be omitted, in which case they default to TRUE and FALSE. Reviewed-by: Vik Fearing <vik@postgresfriends.org> Discussion: https://www.postgresql.org/message-id/flat/db80ceee-6f97-9b4a-8ee8-3ba0c58e5be2@2ndquadrant.com	2021-02-27 08:13:24 +01:00
Tom Lane	0fc1af174c	Improve memory management in regex compiler. The previous logic here created a separate pool of arcs for each state, so that the out-arcs of each state were physically stored within it. Perhaps this choice was driven by trying to not include a "from" pointer within each arc; but Spencer gave up on that idea long ago, and it's hard to see what the value is now. The approach turns out to be fairly disastrous in terms of memory consumption, though. In the first place, NFAs built by this engine seem to have about 4 arcs per state on average, with a majority having only one or two out-arcs. So pre-allocating 10 out-arcs for each state is already cause for a factor of two or more bloat. Worse, the NFA optimization phase moves arcs around with abandon. In a large NFA, some of the states will have hundreds of out-arcs, so towards the end of the optimization phase we have a significant number of states whose arc pools have room for hundreds of arcs each, even though only a few of those arcs are in use. We have seen real-world regexes in which this effect bloats the memory requirement by 25X or even more. Hence, get rid of the per-state arc pools in favor of a single arc pool for the whole NFA, with variable-sized allocation batches instead of always asking for 10 at a time. While we're at it, let's batch the allocations of state structs too, to further reduce the malloc traffic. This incidentally allows moveouts() to be optimized in a similar way to moveins(): when moving an arc to another state, it's now valid to just re-link the same arc struct into a different outchain, where before the code invariants required us to make a physically new arc and then free the old one. These changes reduce the regex compiler's typical space consumption for average-size regexes by about a factor of two, and much more for large or complicated regexes. In a large test set of real-world regexes, we formerly had half a dozen cases that failed with "regular expression too complex" due to exceeding the REG_MAX_COMPILE_SPACE limit (about 150MB); we would have had to raise that limit to something close to 400MB to make them work with the old code. Now, none of those cases need more than 13MB to compile. Furthermore, the test set is about 10% faster overall due to less malloc traffic. Discussion: https://postgr.es/m/168861.1614298592@sss.pgh.pa.us	2021-02-26 13:52:10 -05:00
Thomas Munro	8556267b2b	Revert "pg_collation_actual_version() -> pg_collation_current_version()." This reverts commit `9cf184cc05`. Name change less well received than anticipated. Discussion: https://postgr.es/m/afcfb97e-88a1-a540-db95-6c573b93bc2b%40eisentraut.org	2021-02-26 15:29:27 +13:00
Tom Lane	80ca8464fe	Fix list-manipulation bug in WITH RECURSIVE processing. makeDependencyGraphWalker and checkWellFormedRecursionWalker thought they could hold onto a pointer to a list's first cons cell while the list was modified by recursive calls. That was okay when the cons cell was actually separately palloc'd ... but since commit `1cff1b95a`, it's quite unsafe, leading to core dumps or incorrect complaints of faulty WITH nesting. In the field this'd require at least a seven-deep WITH nest to cause an issue, but enabling DEBUG_LIST_MEMORY_USAGE allows the bug to be seen with lesser nesting depths. Per bug #16801 from Alexander Lakhin. Back-patch to v13. Michael Paquier and Tom Lane Discussion: https://postgr.es/m/16801-393c7922143eaa4d@postgresql.org	2021-02-25 20:47:32 -05:00
Peter Geoghegan	2376361839	VACUUM VERBOSE: Count "newly deleted" index pages. Teach VACUUM VERBOSE to report on pages deleted by the _current_ VACUUM operation -- these are newly deleted pages. VACUUM VERBOSE continues to report on the total number of deleted pages in the entire index (no change there). The former is a subset of the latter. The distinction between each category of deleted index page only arises with index AMs where page deletion is supported and is decoupled from page recycling for performance reasons. This is follow-up work to commit `e5d8a999`, which made nbtree store 64-bit XIDs (not 32-bit XIDs) in pages at the point at which they're deleted. Note that the btm_last_cleanup_num_delpages metapage field added by that commit usually gets set to pages_newly_deleted. The exceptions (the scenarios in which they're not equal) all seem to be tricky cases for the implementation (of page deletion and recycling) in general. Author: Peter Geoghegan <pg@bowt.ie> Discussion: https://postgr.es/m/CAH2-WznpdHvujGUwYZ8sihX%3Dd5u-tRYhi-F4wnV2uN2zHpMUXw%40mail.gmail.com	2021-02-25 14:32:18 -08:00
Tom Lane	301ed8812e	Doc: remove src/backend/regex/re_syntax.n. We aren't publishing this file as documentation, and it's been much more haphazardly maintained than the real docs in func.sgml, so let's just drop it. I think the only reason I included it in commit `7bcc6d98f` was that the Berkeley-era sources had had a man page in this directory. Discussion: https://postgr.es/m/4099447.1614186542@sss.pgh.pa.us	2021-02-25 13:33:27 -05:00
Tom Lane	7dc13a0f08	Change regex \D and \W shorthands to always match newlines. Newline is certainly not a digit, nor a word character, so it is sensible that it should match these complemented character classes. Previously, \D and \W acted that way by default, but in newline-sensitive mode ('n' or 'p' flag) they did not match newlines. This behavior was previously forced because explicit complemented character classes don't match newlines in newline-sensitive mode; but as of the previous commit that implementation constraint no longer exists. It seems useful to change this because the primary real-world use for newline-sensitive mode seems to be to match the default behavior of other regex engines such as Perl and Javascript ... and their default behavior is that these match newlines. The old behavior can be kept by writing an explicit complemented character class, i.e. [^[:digit:]] or [^[:word:]]. (This means that \D and \W are not exactly equivalent to those strings, but they weren't anyway.) Discussion: https://postgr.es/m/3220564.1613859619@sss.pgh.pa.us	2021-02-25 13:29:06 -05:00
Tom Lane	2a0af7fe46	Allow complemented character class escapes within regex brackets. The complement-class escapes \D, \S, \W are now allowed within bracket expressions. There is no semantic difficulty with doing that, but the rather hokey macro-expansion-based implementation previously used here couldn't cope. Also, invent "word" as an allowed character class name, thus "\w" is now equivalent to "[[:word:]]" outside brackets, or "[:word:]" within brackets. POSIX allows such implementation-specific extensions, and the same name is used in e.g. bash. One surprising compatibility issue this raises is that constructs such as "[\w-_]" are now disallowed, as our documentation has always said they should be: character classes can't be endpoints of a range. Previously, because \w was just a macro for "[:alnum:]_", such a construct was read as "[[:alnum:]_-_]", so it was accepted so long as the character after "-" was numerically greater than or equal to "_". Some implementation cleanup along the way: * Remove the lexnest() hack, and in consequence clean up wordchrs() to not interact with the lexer. * Fix colorcomplement() to not be O(N^2) in the number of colors involved. * Get rid of useless-as-far-as-I-can-see calls of element() on single-character character element names in brackpart(). element() always maps these to the character itself, and things would be quite broken if it didn't --- should "[a]" match something different than "a" does? Besides, the shortcut path in brackpart() wasn't doing this anyway, making it even more inconsistent. Discussion: https://postgr.es/m/2845172.1613674385@sss.pgh.pa.us Discussion: https://postgr.es/m/3220564.1613859619@sss.pgh.pa.us	2021-02-25 13:00:40 -05:00
Peter Geoghegan	e5d8a99903	Use full 64-bit XIDs in deleted nbtree pages. Otherwise we risk "leaking" deleted pages by making them non-recyclable indefinitely. Commit `6655a729` did the same thing for deleted pages in GiST indexes. That work was used as a starting point here. Stop storing an XID indicating the oldest bpto.xact across all deleted though unrecycled pages in nbtree metapages. There is no longer any reason to care about that condition/the oldest XID. It only ever made sense when wraparound was something _bt_vacuum_needs_cleanup() had to consider. The btm_oldest_btpo_xact metapage field has been repurposed and renamed. It is now btm_last_cleanup_num_delpages, which is used to remember how many non-recycled deleted pages remain from the last VACUUM (in practice its value is usually the precise number of pages that were _newly deleted_ during the specific VACUUM operation that last set the field). The general idea behind storing btm_last_cleanup_num_delpages is to use it to give _some_ consideration to non-recycled deleted pages inside _bt_vacuum_needs_cleanup() -- though never too much. We only really need to avoid leaving a truly excessive number of deleted pages in an unrecycled state forever. We only do this to cover certain narrow cases where no other factor makes VACUUM do a full scan, and yet the index continues to grow (and so actually misses out on recycling existing deleted pages). These metapage changes result in a clear user-visible benefit: We no longer trigger full index scans during VACUUM operations solely due to the presence of only 1 or 2 known deleted (though unrecycled) blocks from a very large index. All that matters now is keeping the costs and benefits in balance over time. Fix an issue that has been around since commit `857f9c36`, which added the "skip full scan of index" mechanism (i.e. the _bt_vacuum_needs_cleanup() logic). The accuracy of btm_last_cleanup_num_heap_tuples accidentally hinged upon _when_ the source value gets stored. We now always store btm_last_cleanup_num_heap_tuples in btvacuumcleanup(). This fixes the issue because IndexVacuumInfo.num_heap_tuples (the source field) is expected to accurately indicate the state of the table _after_ the VACUUM completes inside btvacuumcleanup(). A backpatchable fix cannot easily be extracted from this commit. A targeted fix for the issue will follow in a later commit, though that won't happen today. I (pgeoghegan) have chosen to remove any mention of deleted pages in the documentation of the vacuum_cleanup_index_scale_factor GUC/param, since the presence of deleted (though unrecycled) pages is no longer of much concern to users. The vacuum_cleanup_index_scale_factor description in the docs now seems rather unclear in any case, and it should probably be rewritten in the near future. Perhaps some passing mention of page deletion will be added back at the same time. Bump XLOG_PAGE_MAGIC due to nbtree WAL records using full XIDs now. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Discussion: https://postgr.es/m/CAH2-WznpdHvujGUwYZ8sihX=d5u-tRYhi-F4wnV2uN2zHpMUXw@mail.gmail.com	2021-02-24 18:41:34 -08:00
Amit Kapila	8a4f9522d0	Fix relcache reference leak introduced by `ce0fdbfe97`. Author: Sawada Masahiko Reviewed-by: Amit Kapila Discussion: https://postgr.es/m/CAD21AoA7ZEfsOXQ9HQqMv3QYGsEm2H5Wk5ic5S=mvzDf-3a3SA@mail.gmail.com	2021-02-25 07:48:24 +05:30
Michael Paquier	bcf2667bf6	Fix some typos, grammar and style in docs and comments The portions fixing the documentation are backpatched where needed. Author: Justin Pryzby Discussion: https://postgr.es/m/20210210235557.GQ20012@telsasoft.com backpatch-through: 9.6	2021-02-24 16:13:17 +09:00
Peter Eisentraut	8ec8fe0f31	Message style fix Don't quote type name placeholders.	2021-02-24 07:00:49 +01:00
Alvaro Herrera	5a65eacfdc	Fix confusion in comments about generate_gather_paths `d2d8a229bc` introduced a new function generate_useful_gather_paths to be used as a replacement for generate_gather_paths, but forgot to update a couple of places that referenced the older function. This is possibly not 100% complete (ref. create_ordered_paths), but it's better than not changing anything. Author: "Hou, Zhijie" <houzj.fnst@cn.fujitsu.com> Reviewed-by: Tomas Vondra <tomas.vondra@enterprisedb.com> Discussion: https://postgr.es/m/4ce1d5116fe746a699a6d29858c6a39a@G08CNEXMBPEKD05.g08.fujitsu.local	2021-02-23 20:05:15 -03:00
Alvaro Herrera	8deb6b38dc	Reinstate HEAP_XMAX_LOCK_ONLY\|HEAP_KEYS_UPDATED as allowed Commit `866e24d47d` added an assert that HEAP_XMAX_LOCK_ONLY and HEAP_KEYS_UPDATED cannot appear together, on the faulty assumption that the latter necessarily referred to an update and not a tuple lock; but that's wrong, because SELECT FOR UPDATE can use precisely that combination, as evidenced by the amcheck test case added here. Remove the Assert(), and also patch amcheck's verify_heapam.c to not complain if the combination is found. Also, out of overabundance of caution, update (across all branches) README.tuplock to be more explicit about this. Author: Julien Rouhaud <rjuju123@gmail.com> Reviewed-by: Mahendra Singh Thalor <mahi6run@gmail.com> Reviewed-by: Dilip Kumar <dilipbalaut@gmail.com> Discussion: https://postgr.es/m/20210124061758.GA11756@nol	2021-02-23 17:30:21 -03:00
Tom Lane	3db05e76f9	Suppress compiler warning in new regex match-all detection code. gcc 10 is smart enough to notice that control could reach this "hasmatch[depth]" assignment with depth < 0, but not smart enough to know that that would require a badly broken NFA graph. Change the assert() to a plain runtime test to shut it up. Per report from Andres Freund. Discussion: https://postgr.es/m/20210223173437.b3ywijygsy6q42gq@alap3.anarazel.de	2021-02-23 13:55:34 -05:00
Alvaro Herrera	d9d076222f	VACUUM: ignore indexing operations with CONCURRENTLY As envisioned in commit `c98763bf51`, it is possible for VACUUM to ignore certain transactions that are executing CREATE INDEX CONCURRENTLY and REINDEX CONCURRENTLY for the purposes of computing Xmin; that's because we know those transactions are not going to examine any other tables, and are not going to execute anything else in the same transaction. (Only operations on "safe" indexes can be ignored: those on indexes that are neither partial nor expressional). This is extremely useful in cases where CIC/RC can run for a very long time, because that used to be a significant headache for concurrent vacuuming of other tables. Reviewed-by: Matthias van de Meent <boekewurm+postgres@gmail.com> Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com> Discussion: https://postgr.es/m/20210115133858.GA18931@alvherre.pgsql	2021-02-23 12:15:09 -03:00
Peter Eisentraut	6f6f284c7e	Simplify printing of LSNs Add a macro LSN_FORMAT_ARGS for use in printf-style printing of LSNs. Convert all applicable code to use it. Reviewed-by: Ashutosh Bapat <ashutosh.bapat@enterprisedb.com> Reviewed-by: Kyotaro Horiguchi <horikyota.ntt@gmail.com> Reviewed-by: Michael Paquier <michael@paquier.xyz> Discussion: https://www.postgresql.org/message-id/flat/CAExHW5ub5NaTELZ3hJUCE6amuvqAtsSxc7O+uK7y4t9Rrk23cw@mail.gmail.com	2021-02-23 10:27:02 +01:00
Amit Kapila	ade89ba5f4	Fix an oversight in ReorderBufferFinishPrepared. We don't have anything to decode in a transaction if ReorderBufferTXN doesn't exist by the time we decode the commit prepared. So don't create a new ReorderBufferTXN here. This is an oversight in commit `a271a1b5`. Reported-by: Markus Wanner Discussion: https://postgr.es/m/dbec82e2-dbd7-95a2-c6b6-e488cbbdf853@bluegap.ch	2021-02-23 09:47:41 +05:30
Amit Kapila	bc617a7b1c	Change the error message for logical replication authentication failure. The authentication failure error message wasn't distinguishing whether it is a physical replication or logical replication connection failure and was giving incomplete information on what led to failure in case of logical replication connection. Author: Paul Martinez and Amit Kapila Reviewed-by: Euler Taveira and Amit Kapila Discussion: https://postgr.es/m/CACqFVBYahrAi2OPdJfUA3YCvn3QMzzxZdw0ibSJ8wouWeDtiyQ@mail.gmail.com	2021-02-23 09:11:22 +05:30
Alvaro Herrera	0f5505a881	Remove pointless HeapTupleHeaderIndicatesMovedPartitions calls Pavan Deolasee recently noted that a few of the HeapTupleHeaderIndicatesMovedPartitions calls added by commit `5db6df0c01` are useless, since they are done after comparing t_self with t_ctid. But because t_self can never be set to the magical values that indicate that the tuple moved partition, this can never succeed: if the first test fails (so we know t_self equals t_ctid), necessarily the second test will also fail. So these checks can be removed and no harm is done. There's no bug here, just a code legibility issue. Reported-by: Pavan Deolasee <pavan.deolasee@gmail.com> Discussion: https://postgr.es/m/20200929164411.GA15497@alvherre.pgsql	2021-02-22 16:51:44 -03:00
Alvaro Herrera	6a03369a71	Fix typo	2021-02-22 11:34:05 -03:00
Thomas Munro	beb4480c85	Refactor get_collation_current_version(). The code paths for three different OSes finished up with three different ways of excluding C[.xxx] and POSIX from consideration. Merge them. Reviewed-by: Michael Paquier <michael@paquier.xyz> Discussion: https://postgr.es/m/20210117215940.GE8560%40telsasoft.com	2021-02-22 23:32:16 +13:00
Thomas Munro	9cf184cc05	pg_collation_actual_version() -> pg_collation_current_version(). The new name seems a bit more natural. Discussion: https://postgr.es/m/20210117215940.GE8560%40telsasoft.com	2021-02-22 23:32:16 +13:00
Thomas Munro	0fb0a0503b	Hide internal error for pg_collation_actual_version(<bad OID>). Instead of an unsightly internal "cache lookup failed" message, just return NULL for bad OIDs, as is the convention for other similar things. Reported-by: Justin Pryzby <pryzby@telsasoft.com> Reviewed-by: Michael Paquier <michael@paquier.xyz> Discussion: https://postgr.es/m/20210117215940.GE8560%40telsasoft.com	2021-02-22 23:01:20 +13:00
Fujii Masao	f05ed5a5cf	Initialize atomic variable waitStart in PGPROC, at postmaster startup. Commit `46d6e5f567` added the atomic variable "waitStart" into PGPROC struct, to store the time at which wait for lock acquisition started. Previously this variable was initialized every time each backend started. Instead, this commit makes postmaster initialize it at the startup, to ensure that the variable should be initialized before any use of it. This commit also moves the code to initialize "waitStart" variable for prepare transaction, from TwoPhaseGetDummyProc() to MarkAsPreparingGuts(). Because MarkAsPreparingGuts() is more proper place to do that since it initializes other PGPROC variables. Author: Fujii Masao Reviewed-by: Atsushi Torikoshi Discussion: https://postgr.es/m/1df88660-6f08-cc6e-b7e2-f85296a2bdab@oss.nttdata.com	2021-02-22 18:25:00 +09:00
Peter Eisentraut	efbfb64241	Improve new hash partition bound check error messages For the error message "every hash partition modulus must be a factor of the next larger modulus", add a detail message that shows the particular numbers and existing partition involved. Also comment the code more. Reviewed-by: Amit Langote <amitlangote09@gmail.com> Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi> Discussion: https://www.postgresql.org/message-id/flat/bb9d60b4-aadb-607a-1a9d-fdc3434dddcd%40enterprisedb.com	2021-02-22 08:06:45 +01:00
Michael Paquier	9294264278	Use pgstat_progress_update_multi_param() where possible This commit changes one code path in REINDEX INDEX and one code path in CREATE INDEX CONCURRENTLY to report the progress of each operation using pgstat_progress_update_multi_param() rather than multiple calls to pgstat_progress_update_param(). This has the advantage to make the progress report more consistent to the end-user without impacting the amount of information provided. Author: Bharath Rupireddy Discussion: https://postgr.es/m/CALj2ACV5zW7GxD8D_tyO==bcj6ZktQchEKWKPBOAGKiLhAQo=w@mail.gmail.com	2021-02-22 14:21:40 +09:00
Thomas Munro	db8374d804	Remove outdated reference to RAID spindles. Commit `b09ff536` left behind some outdated advice in the long_desc field of the GUC "effective_io_concurrency". Remove it. Back-patch to 13. Reported-by: Andrew Gierth <andrew@tao11.riddles.org.uk> Reviewed-by: Julien Rouhaud <rjuju123@gmail.com> Discussion: https://postgr.es/m/CA%2BhUKGJyyWqFBxL9gEj-qtjBThGjhAOBE8GBnF8MUJOJ3vrfag%40mail.gmail.com	2021-02-22 14:42:15 +13:00
Tom Lane	190c79884a	Simplify memory management for regex DFAs a little. Coverity complained that functions in regexec.c might leak DFA storage. It's wrong, but this logic is confusing enough that it's not so surprising Coverity couldn't make sense of it. Rewrite in hopes of making it more legible to humans as well as machines.	2021-02-21 20:29:11 -05:00
Tom Lane	ea1268f630	Avoid generating extra subre tree nodes for capturing parentheses. Previously, each pair of capturing parentheses gave rise to a separate subre tree node, whose only function was to identify that we ought to capture the match details for this particular sub-expression. In most cases we don't really need that, since we can perfectly well put a "capture this" annotation on the child node that does the real matching work. As with the two preceding commits, the main value of this is to avoid generating and optimizing an NFA for a tree node that's not really pulling its weight. The chosen data representation only allows one capture annotation per subre node. In the legal-per-spec, but seemingly not very useful, case where there are multiple capturing parens around the exact same bit of the regex (i.e. "((xyz))"), wrap the child node in N-1 capture nodes that act the same as before. We could work harder at that but I'll refrain, pending some evidence that such cases are worth troubling over. In passing, improve the comments in regex.h to say what all the different re_info bits mean. Some of them were pretty obvious but others not so much, so reverse-engineer some documentation. This is part of a patch series that in total reduces the regex engine's runtime by about a factor of four on a large corpus of real-world regexes. Patch by me, reviewed by Joel Jacobson Discussion: https://postgr.es/m/1340281.1613018383@sss.pgh.pa.us	2021-02-20 19:26:41 -05:00
Tom Lane	5810430894	Convert regex engine's subre tree from binary to N-ary style. Instead of having left and right child links in subre structs, have a single child link plus a sibling link. Multiple children of a tree node are now reached by chasing the sibling chain. The beneficiary of this is alternation tree nodes. A regular expression with N (>1) branches is now represented by one alternation node with N children, rather than a tree that includes N alternation nodes as well as N children. While the old representation didn't really cost anything extra at execution time, it was pretty horrid for compilation purposes, because each of the alternation nodes had its own NFA, which we were too stupid not to separately optimize. (To make matters worse, all of those NFAs described the entire alternation pattern, not just the portion of it that one might expect from the tree structure.) We continue to require concatenation nodes to have exactly two children. This data structure is now prepared to support more, but the executor's logic would need some careful redesign, and it's not clear that a lot of benefit could be had. This is part of a patch series that in total reduces the regex engine's runtime by about a factor of four on a large corpus of real-world regexes. Patch by me, reviewed by Joel Jacobson Discussion: https://postgr.es/m/1340281.1613018383@sss.pgh.pa.us	2021-02-20 19:07:45 -05:00
Tom Lane	cebc1d34e5	Fix regex engine to suppress useless concatenation sub-REs. The comment for parsebranch() claims that it avoids generating unnecessary concatenation nodes in the "subre" tree, but it missed some significant cases. Once we've decided that a given atom is "messy" and can't be bundled with the preceding atom(s) of the current regex branch, parseqatom() always generated two new concat nodes, one to concat the messy atom to what follows it in the branch, and an upper node to concatenate the preceding part of the branch to that one. But one or both of these could be unnecessary, if the messy atom is the first, last, or only one in the branch. Improve the code to suppress such useless concat nodes, along with the no-op child nodes representing empty chunks of a branch. Reducing the number of subre tree nodes offers significant savings not only at execution but during compilation, because each subre node has its own NFA that has to be separately optimized. (Maybe someday we'll figure out how to share the optimization work across multiple tree nodes, but it doesn't look easy.) Eliminating upper tree nodes is especially useful because they tend to have larger NFAs. This is part of a patch series that in total reduces the regex engine's runtime by about a factor of four on a large corpus of real-world regexes. Patch by me, reviewed by Joel Jacobson Discussion: https://postgr.es/m/1340281.1613018383@sss.pgh.pa.us	2021-02-20 18:45:29 -05:00
Tom Lane	824bf71902	Recognize "match-all" NFAs within the regex engine. This builds on the previous "rainbow" patch to detect NFAs that will match any string, though possibly with constraints on the string length. This definition is chosen to match constructs such as ".*", ".+", and ".{1,100}". Recognizing such an NFA after the optimization pass is fairly cheap, since we basically just have to verify that all arcs are RAINBOW arcs and count the number of steps to the end state. (Well, there's a bit of complication with pseudo-color arcs for string boundary conditions, but not much.) Once we have these markings, the regex executor functions longest(), shortest(), and matchuntil() don't have to expend per-character work to determine whether a given substring satisfies such an NFA; they just need to check its length against the bounds. Since some matching problems require O(N) invocations of these functions, we've reduced the runtime for an N-character string from O(N^2) to O(N). Of course, this is no help for non-matchall sub-patterns, but those usually have constraints that allow us to avoid needing O(N) substring checks in the first place. It's precisely the unconstrained "match-all" cases that cause the most headaches. This is part of a patch series that in total reduces the regex engine's runtime by about a factor of four on a large corpus of real-world regexes. Patch by me, reviewed by Joel Jacobson Discussion: https://postgr.es/m/1340281.1613018383@sss.pgh.pa.us	2021-02-20 18:31:19 -05:00
Tom Lane	08c0d6ad65	Invent "rainbow" arcs within the regex engine. Some regular expression constructs, most notably the "." match-anything metacharacter, produce a sheaf of parallel NFA arcs covering all possible colors (that is, character equivalence classes). We can make a noticeable improvement in the space and time needed to process large regexes by replacing such cases with a single arc bearing the special color code "RAINBOW". This requires only minor additional complication in places such as pull() and push(). Callers of pg_reg_getoutarcs() must now be prepared for the possibility of seeing a RAINBOW arc. For the one known user, contrib/pg_trgm, that's a net benefit since it cuts the number of arcs to be dealt with, and the handling isn't any different than for other colors that contain too many characters to be dealt with individually. This is part of a patch series that in total reduces the regex engine's runtime by about a factor of four on a large corpus of real-world regexes. Patch by me, reviewed by Joel Jacobson Discussion: https://postgr.es/m/1340281.1613018383@sss.pgh.pa.us	2021-02-20 18:11:56 -05:00
Fujii Masao	8a55cb5ba9	Fix bug in COMMIT AND CHAIN command. This commit fixes COMMIT AND CHAIN command so that it starts new transaction immediately even if savepoints are defined within the transaction to commit. Previously COMMIT AND CHAIN command did not in that case because commit `280a408b48` forgot to make CommitTransactionCommand() handle a transaction chaining when the transaction state was TBLOCK_SUBCOMMIT. Also this commit adds the regression test for COMMIT AND CHAIN command when savepoints are defined. Back-patch to v12 where transaction chaining was added. Reported-by: Arthur Nascimento Author: Fujii Masao Reviewed-by: Arthur Nascimento, Vik Fearing Discussion: https://postgr.es/m/16867-3475744069228158@postgresql.org	2021-02-19 21:57:52 +09:00
Peter Eisentraut	678d0e239b	Update snowball Update to snowball tag v2.1.0. Major changes are new stemmers for Armenian, Serbian, and Yiddish.	2021-02-19 08:10:15 +01:00
Peter Geoghegan	b071a31149	Add nbtree README section on page recycling. Consolidate discussion of how VACUUM places pages in the FSM for recycling by adding a new section that comes after discussion of page deletion. This structure reflects the fact that page recycling is explicitly decoupled from page deletion in Lanin & Shasha's paper. Page recycling in nbtree is an implementation of what the paper calls "the drain technique". This decoupling is an important concept for nbtree VACUUM. Searchers have to detect and recover from concurrent page deletions, but they will never have to reason about concurrent page recycling. Recycling can almost always be thought of as a low level garbage collection operation that asynchronously frees the physical space that backs a logical tree node. Almost all code need only concern itself with logical tree nodes. (Note that "logical tree node" is not currently a term of art in the nbtree code -- this all works implicitly.) This is preparation for an upcoming patch that teaches nbtree VACUUM to remember the details of pages that it deletes on the fly, in local memory. This enables the same VACUUM operation to consider placing its own deleted pages in the FSM later on, when it reaches the end of btvacuumscan().	2021-02-18 21:16:33 -08:00
Tom Lane	b5a66e7353	Fix another ancient bug in parsing of BRE-mode regular expressions. While poking at the regex code, I happened to notice that the bug squashed in commit `afcc8772e` had a sibling: next() failed to return a specific value associated with the '}' token for a "\{m,n\}" quantifier when parsing in basic RE mode. Again, this could result in treating the quantifier as non-greedy, which it never should be in basic mode. For that to happen, the last character before "\}" that sets "nextvalue" would have to set it to zero, or it'd have to have accidentally been zero from the start. The failure can be provoked repeatably with, for example, a bound ending in digit "0". Like the previous patch, back-patch all the way.	2021-02-18 22:38:55 -05:00

1 2 3 4 5 ...

21660 Commits