postgresql

Commit Graph

Author	SHA1	Message	Date
Amit Kapila	8af3c233e4	Clarify the usage of max_replication_slots on the subscriber side. It was not clear in the docs that the max_replication_slots is also used to track replication origins on the subscriber side. Author: Paul Martinez Reviewed-by: Amit Kapila Backpatch-through: 10 where logical replication was introduced Discussion: https://postgr.es/m/CACqFVBZgwCN_pHnW6dMNCrOS7tiHCw6Retf_=U2Vvj3aUSeATw@mail.gmail.com	2021-03-03 12:01:56 +05:30
Peter Eisentraut	e527a99055	Some copy-editing of GUC descriptions	2021-03-03 07:14:35 +01:00
Tom Lane	d422a2a94b	Silence perlcritic warning in commit `ee28cacf6`. Per buildfarm; this fix is from Michael Paquier (vignesh C proposed nearly the same). Discussion: https://postgr.es/m/YD8IZ9OKfUf9X1eF@paquier.xyz	2021-03-02 23:32:43 -05:00
Thomas Munro	8eda3eba30	Use sort_template.h for qsort_tuple() and qsort_ssup(). Replace the Perl code previously used to generate specialized sort functions with sort_template.h. Reviewed-by: Daniel Gustafsson <daniel@yesql.se> Discussion: https://postgr.es/m/CA%2BhUKGJ2-eaDqAum5bxhpMNhvuJmRDZxB_Tow0n-gse%2BHG0Yig%40mail.gmail.com	2021-03-03 17:02:32 +13:00
Thomas Munro	f374f4d664	Use sort_template.h for qsort() and qsort_arg(). Reduce duplication by using the new template. Reviewed-by: Daniel Gustafsson <daniel@yesql.se> Discussion: https://postgr.es/m/CA%2BhUKGJ2-eaDqAum5bxhpMNhvuJmRDZxB_Tow0n-gse%2BHG0Yig%40mail.gmail.com	2021-03-03 17:02:32 +13:00
Thomas Munro	0a1f1d3cac	Add sort_template.h for making sort functions. Move our qsort implementation into a header that can be used to define specialized functions for better performance and reduced duplication. Reviewed-by: Daniel Gustafsson <daniel@yesql.se> Discussion: https://postgr.es/m/CA%2BhUKGJ2-eaDqAum5bxhpMNhvuJmRDZxB_Tow0n-gse%2BHG0Yig%40mail.gmail.com	2021-03-03 17:02:22 +13:00
Amit Kapila	19890a064e	Add option to enable two_phase commits via pg_create_logical_replication_slot. Commit `0aa8a01d04` extends the output plugin API to allow decoding of prepared xacts and allowed the user to enable/disable the two-phase option via pg_logical_slot_get_changes(). This can lead to a problem such that the first time when it gets changes via pg_logical_slot_get_changes() without two_phase option enabled it will not get the prepared even though prepare is after consistent snapshot. Now next time during getting changes, if the two_phase option is enabled it can skip prepare because by that time start decoding point has been moved. So the user will only get commit prepared. Allow to enable/disable this option at the create slot time and default will be false. It will break the existing slots which is fine in a major release. Author: Ajin Cherian Reviewed-by: Amit Kapila and Vignesh C Discussion: https://postgr.es/m/d0f60d60-133d-bf8d-bd70-47784d8fabf3@enterprisedb.com	2021-03-03 07:34:11 +05:30
Tom Lane	ee28cacf61	Extend the abilities of libpq's target_session_attrs parameter. In addition to the existing options of "any" and "read-write", we now support "read-only", "primary", "standby", and "prefer-standby". "read-write" retains its previous meaning of "transactions are read-write by default", and "read-only" inverts that. The other three modes test specifically for hot-standby status, which is not quite the same thing. (Setting default_transaction_read_only on a primary server renders it read-only to this logic, but not a standby.) Furthermore, if talking to a v14 or later server, no extra network round trip is needed to detect the session's status; the GUC_REPORT variables delivered by the server are enough. When talking to an older server, a SHOW or SELECT query is issued to detect session read-only-ness or server hot-standby state, as needed. Haribabu Kommi, Greg Nancarrow, Vignesh C, Tom Lane; reviewed at various times by Laurenz Albe, Takayuki Tsunakawa, Peter Smith. Discussion: https://postgr.es/m/CAF3+xM+8-ztOkaV9gHiJ3wfgENTq97QcjXQt+rbFQ6F7oNzt9A@mail.gmail.com	2021-03-02 20:17:48 -05:00
Michael Paquier	57e6db706e	Add --tablespace option to reindexdb This option provides REINDEX (TABLESPACE) for reindexdb, applying the tablespace value given by the caller to all the REINDEX queries generated. While on it, this commit adds some tests for REINDEX TABLESPACE, with and without CONCURRENTLY, when run on toast indexes and tables. Such operations are not allowed, and toast relation names are not stable enough to be part of the main regression test suite (even if using a PL function with a TRY/CATCH logic, as CONCURRENTLY could not be tested). Author: Michael Paquier Reviewed-by: Mark Dilger, Daniel Gustafsson Discussion: https://postgr.es/m/YDiaDMnzLICqeukl@paquier.xyz	2021-03-03 10:14:21 +09:00
Peter Geoghegan	5b2f2af3d9	nbtree page deletion: Add leaftopparent assertion. Add documenting assertion. This makes it easier to follow how we maintain the top parent link in target subtree's half-dead/leaf level page.	2021-03-02 14:06:07 -08:00
Peter Geoghegan	3d8d5787a3	Fix nbtree page deletion error messages. Adjust some "can't happen" error messages that assumed that the page deletion target page must be a half-dead page. This assumption was wrong in the case of an internal target page. Simply refer to these pages as the target page instead. Internal pages are never marked half-dead. There is exactly one half-dead page for each subtree undergoing deletion. The half-dead page is also the target subtree's leaf-level page. This has been the case since commit `efada2b8`, which totally overhauled nbtree page deletion.	2021-03-02 13:02:24 -08:00
Tom Lane	d16f8c8e41	Mark default_transaction_read_only as GUC_REPORT. This allows clients to find out the setting at connection time without having to expend a query round trip to do so; which is helpful when trying to identify read/write servers. (One must also look at in_hot_standby, but that's already GUC_REPORT, cf bf8a662c9.) Modifying libpq to make use of this will come soon, but I felt it cleaner to push the server change separately. Haribabu Kommi, Greg Nancarrow, Vignesh C; reviewed at various times by Laurenz Albe, Takayuki Tsunakawa, Peter Smith. Discussion: https://postgr.es/m/CAF3+xM+8-ztOkaV9gHiJ3wfgENTq97QcjXQt+rbFQ6F7oNzt9A@mail.gmail.com	2021-03-02 13:53:54 -05:00
Alvaro Herrera	75dbfe4ca7	Use native path separators to pg_ctl in initdb On Windows, CMD.EXE allegedly does not run a command that uses forward slashes, so let's convert the path to use backslashes instead. Backpatch to 10. Author: Nitin Jadhav <nitinjadhavpostgres@gmail.com> Reviewed-by: Juan José Santamaría Flecha <juanjo.santamaria@gmail.com> Discussion: https://postgr.es/m/CAMm1aWaNDuaPYFYMAqDeJrZmPtNvLcJRS++CcZWY8LT6KcoBZw@mail.gmail.com	2021-03-02 15:39:34 -03:00
Tom Lane	4604f83fdf	Suppress unnecessary regex subre nodes in a couple more cases. This extends the changes made in commit `cebc1d34e`, teaching parseqatom() to generate fewer or cheaper subre nodes in some edge cases. The case of interest here is a quantified atom that is "messy" only because it has greediness opposite to what preceded it (whereas captures and backrefs are intrinsically messy). In this case we don't need an iteration node, since we don't care where the sub-matches of the quantifier are; and we might also not need a second concatenation node. This seems of only marginal real-world use according to my testing, but I wanted to get it in before wrapping up this series of regex performance fixes. Discussion: https://postgr.es/m/1340281.1613018383@sss.pgh.pa.us	2021-03-02 12:14:14 -05:00
Tom Lane	0c3405cf11	Improve performance of regular expression back-references. In some cases, at the time that we're doing an NFA-based precheck of whether a backref subexpression can match at a particular place in the string, we already know which substring the referenced subexpression matched. If so, we might as well forget about the NFA and just compare the substring; this is faster and it gives an exact rather than approximate answer. In general, this optimization can help while we are prechecking within the second child expression of a concat node, while the capture was within the first child expression; then the substring was saved during cdissect() of the first child and will be available to NFA checks done while cdissect() recurses into the second child. It can help quite a lot if the tree looks like concat / \ capture concat / \ expensive stuff backref as we will be able to avoid recursively dissecting the "expensive stuff" before discovering that the backref isn't satisfied with a particular midpoint that the lower concat node is testing. This doesn't help if the concat tree is left-deep, as the capture node won't get set soon enough (and it's hard to fix that without changing the engine's match behavior). Fortunately, right-deep concat trees are the common case. Patch by me, reviewed by Joel Jacobson Discussion: https://postgr.es/m/661609.1614560029@sss.pgh.pa.us	2021-03-02 11:55:12 -05:00
Tom Lane	4aea704a5b	Fix semantics of regular expression back-references. POSIX defines the behavior of back-references thus: The back-reference expression '\n' shall match the same (possibly empty) string of characters as was matched by a subexpression enclosed between "\(" and "\)" preceding the '\n'. As far as I can see, the back-reference is supposed to consider only the data characters matched by the referenced subexpression. However, because our engine copies the NFA constructed from the referenced subexpression, it effectively enforces any constraints therein, too. As an example, '(^.)\1' ought to match 'xx', or any other string starting with two occurrences of the same character; but in our code it does not, and indeed can't match anything, because the '^' anchor constraint is included in the backref's copied NFA. If POSIX intended that, you'd think they'd mention it. Perl for one doesn't act that way, so it's hard to conclude that this isn't a bug. Fix by modifying the backref's NFA immediately after it's copied from the reference, replacing all constraint arcs by EMPTY arcs so that the constraints are treated as automatically satisfied. This still allows us to enforce matching rules that depend only on the data characters; for example, in '(^\d+).*\1' the NFA matching step will still know that the backref can only match strings of digits. Perhaps surprisingly, this change does not affect the results of any of a rather large corpus of real-world regexes. Nonetheless, I would not consider back-patching it, since it's a clear compatibility break. Patch by me, reviewed by Joel Jacobson Discussion: https://postgr.es/m/661609.1614560029@sss.pgh.pa.us	2021-03-02 11:34:53 -05:00
Michael Paquier	c5530d8474	Fix duplicated test case in TAP tests of reindexdb The same test for REINDEX (VERBOSE) was done twice, while it is clear that the second test should use --concurrently. Issue introduced in `5dc92b8`, for what looks like a copy-paste mistake. Reviewed-by: Mark Dilger Discussion: https://postgr.es/m/A7AE97EA-F4B0-4CAB-8FFF-3FECD31F9D63@enterprisedb.com Backpatch-through: 12	2021-03-02 13:18:06 +09:00
Michael Paquier	fabde52fab	Simplify code to switch pg_class.relrowsecurity in tablecmds.c The same code pattern was repeated twice to enable or disable ROW LEVEL SECURITY with an ALTER TABLE command. This makes the code slightly cleaner. Author: Justin Pryzby Reviewed-by: Zhihong Yu Discussion: https://postgr.es/m/20210228211854.GC20769@telsasoft.com	2021-03-02 12:30:21 +09:00
Michael Paquier	bd1b8d0ef2	doc: Improve description of data checksums This partially reverts `bcf2667` that got incorrectly merged, and this improves the wording of the documentation that existed before that. Per discussion with Justin Pryzby. Discussion: https://postgr.es/m/20210301004647.GF20769@telsasoft.com	2021-03-02 10:50:13 +09:00
Michael Paquier	8c1b6a186d	doc: Mention archive_command failure handling on signals The behavior is similar to restore_command, which was already documented for the restore part, but not the archive part. Author: Benoit Lobréau Reviewed-by: Julien Rouhaud Discussion: https://postgr.es/m/CAPE8EZ7akCzc1hWohA4AcbmKtHh9rcWAB5MStOeZD2+9jC+hLQ@mail.gmail.com	2021-03-02 10:25:47 +09:00
Tom Lane	ffd3944ab9	Improve reporting for syntax errors in multi-line JSON data. Point to the specific line where the error was detected; the previous code tended to include several preceding lines as well. Avoid re-scanning the entire input to recompute which line that was. Simplify the logic a bit. Add test cases. Simon Riggs and Hamid Akhtar, reviewed by Daniel Gustafsson and myself Discussion: https://postgr.es/m/CANbhV-EPBnXm3MF_TTWBwwqgn1a1Ghmep9VHfqmNBQ8BT0f+_g@mail.gmail.com	2021-03-01 16:44:17 -05:00
Thomas Munro	bd69ddfcdb	Remove obsolete comment for WaitForProcSignalBarrier(). Commit `814f1d8b` removed the behavior described. Reported-by: Amit Kapila <amit.kapila16@gmail.com>	2021-03-02 09:30:57 +13:00
Andres Freund	1e6e404471	Fix recovery test hang in 021_row_visibility.pl on Windows. The psql processes were not explicitly killed (but would eventually exit due to postgres shutting down). For some reason Windows perl doesn't like that, resulting in errors like Warning: unable to close filehandle GEN20 properly: Bad file descriptor during global destruction. The test was introduced in d6734a897e3, so no backpatching necessary.	2021-03-01 11:25:24 -08:00
Thomas Munro	f5a5773a9d	Allow condition variables to be used in interrupt code. Adjust the condition variable sleep loop to work correctly when code reached by its internal CHECK_FOR_INTERRUPTS() call interacts with another condition variable. There are no such cases currently, but a proposed patch would do this. Discussion: https://postgr.es/m/CA+hUKGLdemy2gBm80kz20GTe6hNVwoErE8KwcJk6-U56oStjtg@mail.gmail.com	2021-03-01 17:24:47 +13:00
Thomas Munro	814f1d8bc3	Use condition variables for ProcSignalBarriers. Instead of a poll/sleep loop, use a condition variable for precise wake-up whenever a backend's pss_barrierGeneration advances. Discussion: https://postgr.es/m/CA+hUKGLdemy2gBm80kz20GTe6hNVwoErE8KwcJk6-U56oStjtg@mail.gmail.com	2021-03-01 17:23:43 +13:00
Amit Kapila	8bdb1332eb	Avoid repeated decoding of prepared transactions after a restart. In commit `a271a1b50e`, we allowed decoding at prepare time and the prepare was decoded again if there is a restart after decoding it. It was done that way because we can't distinguish between the cases where we have not decoded the prepare because it was prior to consistent snapshot or we have decoded it earlier but restarted. To distinguish between these two cases, we have introduced an initial_consistent_point at the slot level which is an LSN at which we found a consistent point at the time of slot creation. This is also the point where we have exported a snapshot for the initial copy. So, prepare transaction prior to this point are sent along with commit prepared. This commit bumps SNAPBUILD_VERSION because of change in SnapBuild. It will break existing slots which is fine in a major release. Author: Ajin Cherian, based on idea by Andres Freund Reviewed-by: Amit Kapila and Vignesh C Discussion: https://postgr.es/m/d0f60d60-133d-bf8d-bd70-47784d8fabf3@enterprisedb.com	2021-03-01 09:11:18 +05:30
Thomas Munro	6230912f23	Use FeBeWaitSet for walsender.c. This avoids the need to set up and tear down a fresh WaitEventSet every time we need to wait. We have to add an explicit exit on postmaster exit (FeBeWaitSet isn't set up to do that automatically), so move the code to do that into a new function to avoid repetition. Reviewed-by: Kyotaro Horiguchi <horikyota.ntt@gmail.com> (earlier version) Discussion: https://postgr.es/m/CA%2BhUKGJAC4Oqao%3DqforhNey20J8CiG2R%3DoBPqvfR0vOJrFysGw%40mail.gmail.com	2021-03-01 16:19:38 +13:00
Thomas Munro	a042ba2ba7	Introduce symbolic names for FeBeWaitSet positions. Previously we used 0 and 1 to refer to the socket and latch in far flung parts of the tree, without any explanation. Also use PGINVALID_SOCKET rather than -1 in a couple of places that didn't already do that. Reviewed-by: Kyotaro Horiguchi <horikyota.ntt@gmail.com> Discussion: https://postgr.es/m/CA%2BhUKGJAC4Oqao%3DqforhNey20J8CiG2R%3DoBPqvfR0vOJrFysGw%40mail.gmail.com	2021-03-01 16:10:16 +13:00
Amit Kapila	cf54e04b9e	Update docs of logical replication for commit `ce0fdbfe97`. Forgot to update the logical replication configuration settings page. After commit `ce0fdbfe97`, table synchronization workers also started using replication origins to track the progress and the same should be reflected in docs. Author: Amit Kapila Discussion: https://postgr.es/m/CAA4eK1KkbppndxxRKbaT2sXrLkdPwy44F4pjEZ0EDrVjD9MPjQ@mail.gmail.com	2021-03-01 08:23:41 +05:30
Amit Kapila	b4e3dc7fd4	Update the docs and comments for decoding of prepared xacts. Commit `a271a1b50e` introduced decoding at prepare time in ReorderBuffer. This can lead to deadlock for out-of-core logical replication solutions that use this feature to build distributed 2PC in case such transactions lock [user] catalog tables exclusively. They need to inform users to not have locks on catalog tables (via explicit LOCK command) in such transactions. Reported-by: Andres Freund Discussion: https://postgr.es/m/20210222222847.tpnb6eg3yiykzpky@alap3.anarazel.de	2021-03-01 08:14:33 +05:30
Thomas Munro	6148656a0b	Use EVFILT_SIGNAL for kqueue latches. Cut down on system calls and other overheads by waiting for SIGURG explicitly with kqueue instead of using a signal handler and self-pipe. Affects *BSD and macOS systems. This leaves only the poll implementation with a signal handler and the traditional self-pipe trick. Discussion: https://postgr.es/m/CA+hUKGJjxPDpzBE0a3hyUywBvaZuC89yx3jK9RFZgfv_KHU7gg@mail.gmail.com	2021-03-01 14:20:04 +13:00
Thomas Munro	6a2a70a020	Use signalfd(2) for epoll latches. Cut down on system calls and other overheads by reading from a signalfd instead of using a signal handler and self-pipe. Affects Linux systems, and possibly others including illumos that implement the Linux epoll and signalfd interfaces. Reviewed-by: Andres Freund <andres@anarazel.de> Discussion: https://postgr.es/m/CA+hUKGJjxPDpzBE0a3hyUywBvaZuC89yx3jK9RFZgfv_KHU7gg@mail.gmail.com	2021-03-01 14:12:02 +13:00
Thomas Munro	83709a0d5a	Use SIGURG rather than SIGUSR1 for latches. Traditionally, SIGUSR1 has been overloaded for ad-hoc signals, procsignal.c signals and latch.c wakeups. Move that last use over to a new dedicated signal. SIGURG is normally used to report out-of-band socket data, but PostgreSQL doesn't use that facility. The signal handler is now installed in all postmaster children by InitializeLatchSupport(). Those wishing to disconnect from it should call ShutdownLatchSupport(). Future patches will use this separation of signals to avoid the need for a signal handler on some operating systems. Discussion: https://postgr.es/m/CA+hUKGJjxPDpzBE0a3hyUywBvaZuC89yx3jK9RFZgfv_KHU7gg@mail.gmail.com	2021-03-01 12:44:12 +13:00
Thomas Munro	c8f3bc2401	Optimize latches to send fewer signals. Don't send signals to processes that aren't sleeping. Author: Andres Freund <andres@anarazel.de> Discussion: https://postgr.es/m/CA+hUKGJjxPDpzBE0a3hyUywBvaZuC89yx3jK9RFZgfv_KHU7gg@mail.gmail.com	2021-03-01 12:44:12 +13:00
Thomas Munro	d1b90995e8	Remove latch.c workaround for Linux < 2.6.27. Commit `82ebbeb0` added a workaround for systems with no epoll_create1() and EPOLL_CLOEXEC. Linux < 2.6.27 and glibc < 2.9 are long gone. Now seems like a good time to drop the extra code, because otherwise we'd have to add similar already-dead workaround code to new patches using XXX_CLOEXEC flags that arrived in the same kernel release. Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/CA%2BhUKGKL_%3DaO%3Dr30N%3Ds9VoDgTqHpRSzePRbA9dkYO7snc7HsxA%40mail.gmail.com	2021-03-01 11:24:28 +13:00
Michael Paquier	943eb47880	pgbench: Remove now-dead CState->ecnt The last use of ecnt was in `12788ae`. It was getting incremented after a backend error without any purpose since then, so let's get rid of it. Author: Kota Miyake Reviewed-by: Álvaro Herrera Discussion: https://postgr.es/m/786c3d9fbe067763d899e78c296f9f0f@oss.nttdata.com	2021-02-28 07:50:26 +09:00
Alvaro Herrera	25936fd46c	Fix use-after-free bug with AfterTriggersTableData.storeslot AfterTriggerSaveEvent() wrongly allocates the slot in execution-span memory context, whereas the correct thing is to allocate it in a transaction-span context, because that's where the enclosing AfterTriggersTableData instance belongs into. Backpatch to 12 (the test back to 11, where it works well with no code changes, and it's good to have to confirm that the case was previously well supported); this bug seems introduced by commit `ff11e7f4b9`. Reported-by: Bertrand Drouvot <bdrouvot@amazon.com> Author: Amit Langote <amitlangote09@gmail.com> Discussion: https://postgr.es/m/39a71864-b120-5a5c-8cc5-c632b6f16761@amazon.com	2021-02-27 18:09:15 -03:00
Noah Misch	388b959315	Raise a timeout to 180s, in contrib/test_decoding. Per buildfarm member hornet. The test is new in v14, so no back-patch.	2021-02-27 07:02:56 -08:00
David Rowley	977b2c0853	Add missing TidRangeScan readfunc Mistakenly forgotten in `bb437f995`	2021-02-27 23:21:21 +13:00
David Rowley	bb437f995d	Add TID Range Scans to support efficient scanning ranges of TIDs This adds a new executor node named TID Range Scan. The query planner will generate paths for TID Range scans when quals are discovered on base relations which search for ranges on the table's ctid column. These ranges may be open at either end. For example, WHERE ctid >= '(10,0)'; will return all tuples on page 10 and over. To support this, two new optional callback functions have been added to table AM. scan_set_tidrange is used to set the scan range to just the given range of TIDs. scan_getnextslot_tidrange fetches the next tuple in the given range. For AMs where scanning ranges of TIDs would not make sense, these functions can be set to NULL in the TableAmRoutine. The query planner won't generate TID Range Scan Paths in that case. Author: Edmund Horner, David Rowley Reviewed-by: David Rowley, Tomas Vondra, Tom Lane, Andres Freund, Zhihong Yu Discussion: https://postgr.es/m/CAMyN-kB-nFTkF=VA_JPwFNo08S0d-Yk0F741S2B7LDmYAi8eyA@mail.gmail.com	2021-02-27 22:59:36 +13:00
Peter Eisentraut	f4adc41c4f	Enhanced cycle mark values Per SQL:202x draft, in the CYCLE clause of a recursive query, the cycle mark values can be of type boolean and can be omitted, in which case they default to TRUE and FALSE. Reviewed-by: Vik Fearing <vik@postgresfriends.org> Discussion: https://www.postgresql.org/message-id/flat/db80ceee-6f97-9b4a-8ee8-3ba0c58e5be2@2ndquadrant.com	2021-02-27 08:13:24 +01:00
Tom Lane	4e90052c46	Doc: further clarify libpq's description of connection string URIs. Break the synopsis into named parts to make it less confusing. Make more than zero effort at applying SGML markup. Do a bit of copy-editing of nearby text. The synopsis revision is by Alvaro Herrera and Paul Förster, the rest is my fault. Back-patch to v10 where multi-host connection strings appeared. Discussion: https://postgr.es/m/6E752D6B-487C-463E-B6E2-C32E7FB007EA@gmail.com	2021-02-26 15:24:00 -05:00
Tom Lane	0fc1af174c	Improve memory management in regex compiler. The previous logic here created a separate pool of arcs for each state, so that the out-arcs of each state were physically stored within it. Perhaps this choice was driven by trying to not include a "from" pointer within each arc; but Spencer gave up on that idea long ago, and it's hard to see what the value is now. The approach turns out to be fairly disastrous in terms of memory consumption, though. In the first place, NFAs built by this engine seem to have about 4 arcs per state on average, with a majority having only one or two out-arcs. So pre-allocating 10 out-arcs for each state is already cause for a factor of two or more bloat. Worse, the NFA optimization phase moves arcs around with abandon. In a large NFA, some of the states will have hundreds of out-arcs, so towards the end of the optimization phase we have a significant number of states whose arc pools have room for hundreds of arcs each, even though only a few of those arcs are in use. We have seen real-world regexes in which this effect bloats the memory requirement by 25X or even more. Hence, get rid of the per-state arc pools in favor of a single arc pool for the whole NFA, with variable-sized allocation batches instead of always asking for 10 at a time. While we're at it, let's batch the allocations of state structs too, to further reduce the malloc traffic. This incidentally allows moveouts() to be optimized in a similar way to moveins(): when moving an arc to another state, it's now valid to just re-link the same arc struct into a different outchain, where before the code invariants required us to make a physically new arc and then free the old one. These changes reduce the regex compiler's typical space consumption for average-size regexes by about a factor of two, and much more for large or complicated regexes. In a large test set of real-world regexes, we formerly had half a dozen cases that failed with "regular expression too complex" due to exceeding the REG_MAX_COMPILE_SPACE limit (about 150MB); we would have had to raise that limit to something close to 400MB to make them work with the old code. Now, none of those cases need more than 13MB to compile. Furthermore, the test set is about 10% faster overall due to less malloc traffic. Discussion: https://postgr.es/m/168861.1614298592@sss.pgh.pa.us	2021-02-26 13:52:10 -05:00
Peter Eisentraut	b3a9e9897e	Extend a test case a little This will possibly help a subsequent patch by making sure the notice messages are distinct so that it's clear that they come out in the right order. Author: Fabien COELHO <coelho@cri.ensmp.fr> Discussion: https://www.postgresql.org/message-id/alpine.DEB.2.21.1904240654120.3407%40lancre	2021-02-26 09:11:15 +01:00
Michael Paquier	329784e118	doc: Improve {archive,restore}_command for compressed logs The commands mentioned in the docs with gzip and gunzip did not prefix the archives with ".gz" and used inconsistent paths for the archives, which can be confusing. Reported-by: Philipp Gramzow Reviewed-by: Fujii Masao Discussion: https://postgr.es/m/161397938841.15451.13129264141285167267@wrigleys.postgresql.org	2021-02-26 14:39:03 +09:00
Thomas Munro	8556267b2b	Revert "pg_collation_actual_version() -> pg_collation_current_version()." This reverts commit `9cf184cc05`. Name change less well received than anticipated. Discussion: https://postgr.es/m/afcfb97e-88a1-a540-db95-6c573b93bc2b%40eisentraut.org	2021-02-26 15:29:27 +13:00
Tom Lane	80ca8464fe	Fix list-manipulation bug in WITH RECURSIVE processing. makeDependencyGraphWalker and checkWellFormedRecursionWalker thought they could hold onto a pointer to a list's first cons cell while the list was modified by recursive calls. That was okay when the cons cell was actually separately palloc'd ... but since commit `1cff1b95a`, it's quite unsafe, leading to core dumps or incorrect complaints of faulty WITH nesting. In the field this'd require at least a seven-deep WITH nest to cause an issue, but enabling DEBUG_LIST_MEMORY_USAGE allows the bug to be seen with lesser nesting depths. Per bug #16801 from Alexander Lakhin. Back-patch to v13. Michael Paquier and Tom Lane Discussion: https://postgr.es/m/16801-393c7922143eaa4d@postgresql.org	2021-02-25 20:47:32 -05:00
Peter Geoghegan	2376361839	VACUUM VERBOSE: Count "newly deleted" index pages. Teach VACUUM VERBOSE to report on pages deleted by the _current_ VACUUM operation -- these are newly deleted pages. VACUUM VERBOSE continues to report on the total number of deleted pages in the entire index (no change there). The former is a subset of the latter. The distinction between each category of deleted index page only arises with index AMs where page deletion is supported and is decoupled from page recycling for performance reasons. This is follow-up work to commit `e5d8a999`, which made nbtree store 64-bit XIDs (not 32-bit XIDs) in pages at the point at which they're deleted. Note that the btm_last_cleanup_num_delpages metapage field added by that commit usually gets set to pages_newly_deleted. The exceptions (the scenarios in which they're not equal) all seem to be tricky cases for the implementation (of page deletion and recycling) in general. Author: Peter Geoghegan <pg@bowt.ie> Discussion: https://postgr.es/m/CAH2-WznpdHvujGUwYZ8sihX%3Dd5u-tRYhi-F4wnV2uN2zHpMUXw%40mail.gmail.com	2021-02-25 14:32:18 -08:00
Tom Lane	301ed8812e	Doc: remove src/backend/regex/re_syntax.n. We aren't publishing this file as documentation, and it's been much more haphazardly maintained than the real docs in func.sgml, so let's just drop it. I think the only reason I included it in commit `7bcc6d98f` was that the Berkeley-era sources had had a man page in this directory. Discussion: https://postgr.es/m/4099447.1614186542@sss.pgh.pa.us	2021-02-25 13:33:27 -05:00
Tom Lane	7dc13a0f08	Change regex \D and \W shorthands to always match newlines. Newline is certainly not a digit, nor a word character, so it is sensible that it should match these complemented character classes. Previously, \D and \W acted that way by default, but in newline-sensitive mode ('n' or 'p' flag) they did not match newlines. This behavior was previously forced because explicit complemented character classes don't match newlines in newline-sensitive mode; but as of the previous commit that implementation constraint no longer exists. It seems useful to change this because the primary real-world use for newline-sensitive mode seems to be to match the default behavior of other regex engines such as Perl and Javascript ... and their default behavior is that these match newlines. The old behavior can be kept by writing an explicit complemented character class, i.e. [^[:digit:]] or [^[:word:]]. (This means that \D and \W are not exactly equivalent to those strings, but they weren't anyway.) Discussion: https://postgr.es/m/3220564.1613859619@sss.pgh.pa.us	2021-02-25 13:29:06 -05:00
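
Two of the regex commits listed above (`4aea704a5b` and `7dc13a0f08`) describe their new behavior by comparison with other engines such as Perl. As a rough illustration only, the sketch below uses Python's `re` module, which is a Perl-like engine chosen for this note and is not PostgreSQL's regex code, to show the behavior those messages point to: a back-reference compares only the captured characters, and `\D` and `\W` match a newline. The helper name `matches` is hypothetical.

```python
import re


def matches(pattern, text):
    """Return True if the pattern matches anywhere in the text."""
    return re.search(pattern, text) is not None


# Back-reference: only the captured characters are compared, so the '^'
# inside the group does not have to hold again where \1 is matched.
doubled_first_char = matches(r'(^.)\1', 'xx')
different_chars = matches(r'(^.)\1', 'xy')

# Complemented shorthands: newline is neither a digit nor a word character,
# so both shorthands accept it in this engine.
nondigit_hits_newline = matches(r'\D', '\n')
nonword_hits_newline = matches(r'\W', '\n')

print(doubled_first_char, different_chars)
print(nondigit_hits_newline, nonword_hits_newline)
```

This only mirrors the comparison made in the commit messages; PostgreSQL's newline-sensitive mode ('n' or 'p' flag) has no direct counterpart in this sketch.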

50856 Commits
