Using pgbench in an environment with both PGPORT and PGUSER set would
have caused the generation of a debug log with an incorrect database
name due to an oversight in 412893b. Not specifying user, port and/or
database using the option switches, without their respective environment
variables, generated a log entry with empty strings, which was
rather useless.
This commit fixes this set of issues by simplifying the logic grabbing
the connection information, removing a set of getenv() calls that
emulated what libpq already does. The faulty debug log now directly
uses the information from the libpq connection, and it gets generated
after the connection to the backend is completed, not before it (in the
event of a failure libpq would complain with more information about the
connection attempt so the log is not really useful before anyway).
Author: Kota Miyake
Reviewed-by: Fujii Masao, Michael Paquier
Discussion: https://postgr.es/m/026b3ae6fc339a18394d053c32a4463d@oss.nttdata.com
These can be set in buildenv.pl or a "set" command within a Windows
terminal. The MSVC script vcregress.pl parses the values available in
the environment to build the resulting prove commands, and the parsing
of PROVE_TESTS is able to handle name patterns in the same way as other
platforms.
Not specifying those environment values makes vcregress.pl fall back to
the previous default, with no extra flags for the prove command, and all
the tests run within t/.
Author: Michael Paquier
Reviewed-by: Juan José Santamaría Flecha, Julien Rouhaud
Discussion: https://postgr.es/m/YD9GigwHoL6lFY2y@paquier.xyz
Under windows, psql processes need to be ended explicitly, or the TAP
tests hang. However, the recovery tests were doing this via
IPC::Run::kill_kill(), which causes other major problems on Windows.
We solve this by instead sending '\q' to psql so it quits of its own
accord, and then simply waiting for it. This means we can now run almost
all the recovery tests on all Windows platforms.
Discussion: https://postgr.es/m/20210301200715.tdjpuesfzebpffgn@alap3.anarazel.de
Mistake in f06b1c598254f8adb2b7f51d6a7685618a7fb121: We should only
check the version of the binaries in the target installation. The
source installation can of course be of a different version.
Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Discussion: https://www.postgresql.org/message-id/flat/E1lHNKN-0005IC-V6%40gemulon.postgresql.org
Protocol version 3 was introduced in PostgreSQL 7.4. There shouldn't be
many clients or servers left out there without version 3 support. But as
a courtesy, I kept just enough of the old protocol support that we can
still send the "unsupported protocol version" error in v2 format, so that
old clients can display the message properly. Likewise, libpq still
understands v2 ErrorResponse messages when establishing a connection.
The impetus to do this now is that I'm working on a patch to COPY
FROM, to always prefetch some data. We cannot do that safely with the
old protocol, because it requires parsing the input one byte at a time
to detect the end-of-copy marker.
Reviewed-by: Tom Lane, Alvaro Herrera, John Naylor
Discussion: https://www.postgresql.org/message-id/9ec25819-0a8a-d51a-17dc-4150bb3cca3b%40iki.fi
Feed the desired command to psql via "-c" not stdin, else Perl
may complain that it can't push stdin to an already-failed psql
process, as seen in intermittent buildfarm failures.
Make some minor cosmetic improvements while at it.
Before commit ee28cacf6 we had no tests here that expected failure
to connect, so there seems no need for a back-patch.
Discussion: https://postgr.es/m/CALDaNm2mo8YED=M2ZJKGf1U3F3mw6SaQuLXWCK8rZP6sECYcrA@mail.gmail.com
This expands the binary validation in pg_upgrade with a version
check per binary to ensure that the target cluster installation
only contains binaries from the target version.
In order to reduce duplication, validate_exec is exported from
port.h and the local copy in pg_upgrade is removed.
Author: Daniel Gustafsson <daniel@yesql.se>
Discussion: https://www.postgresql.org/message-id/flat/9328.1552952117@sss.pgh.pa.us
Commit 0aa8a01d04 extends the output plugin API to allow decoding of
prepared xacts and allowed the user to enable/disable the two-phase option
via pg_logical_slot_get_changes(). This can lead to a problem such that
the first time when it gets changes via pg_logical_slot_get_changes()
without two_phase option enabled it will not get the prepared even though
prepare is after consistent snapshot. Now next time during getting changes,
if the two_phase option is enabled it can skip prepare because by that
time start decoding point has been moved. So the user will only get commit
prepared.
Allow to enable/disable this option at the create slot time and default
will be false. It will break the existing slots which is fine in a major
release.
Author: Ajin Cherian
Reviewed-by: Amit Kapila and Vignesh C
Discussion: https://postgr.es/m/d0f60d60-133d-bf8d-bd70-47784d8fabf3@enterprisedb.com
In addition to the existing options of "any" and "read-write", we
now support "read-only", "primary", "standby", and "prefer-standby".
"read-write" retains its previous meaning of "transactions are
read-write by default", and "read-only" inverts that. The other
three modes test specifically for hot-standby status, which is not
quite the same thing. (Setting default_transaction_read_only on
a primary server renders it read-only to this logic, but not a
standby.)
Furthermore, if talking to a v14 or later server, no extra network
round trip is needed to detect the session's status; the GUC_REPORT
variables delivered by the server are enough. When talking to an
older server, a SHOW or SELECT query is issued to detect session
read-only-ness or server hot-standby state, as needed.
Haribabu Kommi, Greg Nancarrow, Vignesh C, Tom Lane; reviewed at
various times by Laurenz Albe, Takayuki Tsunakawa, Peter Smith.
Discussion: https://postgr.es/m/CAF3+xM+8-ztOkaV9gHiJ3wfgENTq97QcjXQt+rbFQ6F7oNzt9A@mail.gmail.com
This option provides REINDEX (TABLESPACE) for reindexdb, applying the
tablespace value given by the caller to all the REINDEX queries
generated.
While on it, this commit adds some tests for REINDEX TABLESPACE, with
and without CONCURRENTLY, when run on toast indexes and tables. Such
operations are not allowed, and toast relation names are not stable
enough to be part of the main regression test suite (even if using a PL
function with a TRY/CATCH logic, as CONCURRENTLY could not be tested).
Author: Michael Paquier
Reviewed-by: Mark Dilger, Daniel Gustafsson
Discussion: https://postgr.es/m/YDiaDMnzLICqeukl@paquier.xyz
Adjust some "can't happen" error messages that assumed that the page
deletion target page must be a half-dead page. This assumption was
wrong in the case of an internal target page. Simply refer to these
pages as the target page instead.
Internal pages are never marked half-dead. There is exactly one
half-dead page for each subtree undergoing deletion. The half-dead page
is also the target subtree's leaf-level page. This has been the case
since commit efada2b8, which totally overhauled nbtree page deletion.
This allows clients to find out the setting at connection time without
having to expend a query round trip to do so; which is helpful when
trying to identify read/write servers. (One must also look at
in_hot_standby, but that's already GUC_REPORT, cf bf8a662c9.)
Modifying libpq to make use of this will come soon, but I felt it
cleaner to push the server change separately.
Haribabu Kommi, Greg Nancarrow, Vignesh C; reviewed at various times
by Laurenz Albe, Takayuki Tsunakawa, Peter Smith.
Discussion: https://postgr.es/m/CAF3+xM+8-ztOkaV9gHiJ3wfgENTq97QcjXQt+rbFQ6F7oNzt9A@mail.gmail.com
On Windows, CMD.EXE allegedly does not run a command that uses forward slashes,
so let's convert the path to use backslashes instead.
Backpatch to 10.
Author: Nitin Jadhav <nitinjadhavpostgres@gmail.com>
Reviewed-by: Juan José Santamaría Flecha <juanjo.santamaria@gmail.com>
Discussion: https://postgr.es/m/CAMm1aWaNDuaPYFYMAqDeJrZmPtNvLcJRS++CcZWY8LT6KcoBZw@mail.gmail.com
This extends the changes made in commit cebc1d34e, teaching
parseqatom() to generate fewer or cheaper subre nodes in some edge
cases. The case of interest here is a quantified atom that is "messy"
only because it has greediness opposite to what preceded it (whereas
captures and backrefs are intrinsically messy). In this case we don't
need an iteration node, since we don't care where the sub-matches of
the quantifier are; and we might also not need a second concatenation
node. This seems of only marginal real-world use according to my
testing, but I wanted to get it in before wrapping up this series of
regex performance fixes.
Discussion: https://postgr.es/m/1340281.1613018383@sss.pgh.pa.us
In some cases, at the time that we're doing an NFA-based precheck
of whether a backref subexpression can match at a particular place
in the string, we already know which substring the referenced
subexpression matched. If so, we might as well forget about the NFA
and just compare the substring; this is faster and it gives an exact
rather than approximate answer.
In general, this optimization can help while we are prechecking within
the second child expression of a concat node, while the capture was
within the first child expression; then the substring was saved during
cdissect() of the first child and will be available to NFA checks done
while cdissect() recurses into the second child. It can help quite a
lot if the tree looks like
concat
/ \
capture concat
/ \
expensive stuff backref
as we will be able to avoid recursively dissecting the "expensive
stuff" before discovering that the backref isn't satisfied with a
particular midpoint that the lower concat node is testing. This
doesn't help if the concat tree is left-deep, as the capture node
won't get set soon enough (and it's hard to fix that without changing
the engine's match behavior). Fortunately, right-deep concat trees
are the common case.
Patch by me, reviewed by Joel Jacobson
Discussion: https://postgr.es/m/661609.1614560029@sss.pgh.pa.us
POSIX defines the behavior of back-references thus:
The back-reference expression '\n' shall match the same (possibly
empty) string of characters as was matched by a subexpression
enclosed between "\(" and "\)" preceding the '\n'.
As far as I can see, the back-reference is supposed to consider only
the data characters matched by the referenced subexpression. However,
because our engine copies the NFA constructed from the referenced
subexpression, it effectively enforces any constraints therein, too.
As an example, '(^.)\1' ought to match 'xx', or any other string
starting with two occurrences of the same character; but in our code
it does not, and indeed can't match anything, because the '^' anchor
constraint is included in the backref's copied NFA. If POSIX intended
that, you'd think they'd mention it. Perl for one doesn't act that
way, so it's hard to conclude that this isn't a bug.
Fix by modifying the backref's NFA immediately after it's copied from
the reference, replacing all constraint arcs by EMPTY arcs so that the
constraints are treated as automatically satisfied. This still allows
us to enforce matching rules that depend only on the data characters;
for example, in '(^\d+).*\1' the NFA matching step will still know
that the backref can only match strings of digits.
Perhaps surprisingly, this change does not affect the results of any
of a rather large corpus of real-world regexes. Nonetheless, I would
not consider back-patching it, since it's a clear compatibility break.
Patch by me, reviewed by Joel Jacobson
Discussion: https://postgr.es/m/661609.1614560029@sss.pgh.pa.us
The same test for REINDEX (VERBOSE) was done twice, while it is clear
that the second test should use --concurrently. Issue introduced in
5dc92b8, for what looks like a copy-paste mistake.
Reviewed-by: Mark Dilger
Discussion: https://postgr.es/m/A7AE97EA-F4B0-4CAB-8FFF-3FECD31F9D63@enterprisedb.com
Backpatch-through: 12
The same code pattern was repeated twice to enable or disable ROW LEVEL
SECURITY with an ALTER TABLE command. This makes the code slightly
cleaner.
Author: Justin Pryzby
Reviewed-by: Zhihong Yu
Discussion: https://postgr.es/m/20210228211854.GC20769@telsasoft.com
Point to the specific line where the error was detected; the
previous code tended to include several preceding lines as well.
Avoid re-scanning the entire input to recompute which line that
was. Simplify the logic a bit. Add test cases.
Simon Riggs and Hamid Akhtar, reviewed by Daniel Gustafsson and myself
Discussion: https://postgr.es/m/CANbhV-EPBnXm3MF_TTWBwwqgn1a1Ghmep9VHfqmNBQ8BT0f+_g@mail.gmail.com
The psql processes were not explicitly killed (but would eventually
exit due postgres shutting down). For some reason windows perl doesn't
like that, resulting in errors like
Warning: unable to close filehandle GEN20 properly: Bad file descriptor during global destruction.
The test was introduced in d6734a897e3, so no backpatching necessary.
In commit a271a1b50e, we allowed decoding at prepare time and the prepare
was decoded again if there is a restart after decoding it. It was done
that way because we can't distinguish between the cases where we have not
decoded the prepare because it was prior to consistent snapshot or we have
decoded it earlier but restarted. To distinguish between these two cases,
we have introduced an initial_consistent_point at the slot level which is
an LSN at which we found a consistent point at the time of slot creation.
This is also the point where we have exported a snapshot for the initial
copy. So, prepare transaction prior to this point are sent along with
commit prepared.
This commit bumps SNAPBUILD_VERSION because of change in SnapBuild. It
will break existing slots which is fine in a major release.
Author: Ajin Cherian, based on idea by Andres Freund
Reviewed-by: Amit Kapila and Vignesh C
Discussion: https://postgr.es/m/d0f60d60-133d-bf8d-bd70-47784d8fabf3@enterprisedb.com
This avoids the need to set up and tear down a fresh WaitEventSet every
time we need need to wait. We have to add an explicit exit on
postmaster exit (FeBeWaitSet isn't set up to do that automatically), so
move the code to do that into a new function to avoid repetition.
Reviewed-by: Kyotaro Horiguchi <horikyota.ntt@gmail.com> (earlier version)
Discussion: https://postgr.es/m/CA%2BhUKGJAC4Oqao%3DqforhNey20J8CiG2R%3DoBPqvfR0vOJrFysGw%40mail.gmail.com
Commit a271a1b50e introduced decoding at prepare time in ReorderBuffer.
This can lead to deadlock for out-of-core logical replication solutions
that uses this feature to build distributed 2PC in case such transactions
lock [user] catalog tables exclusively. They need to inform users to not
have locks on catalog tables (via explicit LOCK command) in such
transactions.
Reported-by: Andres Freund
Discussion: https://postgr.es/m/20210222222847.tpnb6eg3yiykzpky@alap3.anarazel.de
Cut down on system calls and other overheads by waiting for SIGURG
explicitly with kqueue instead of using a signal handler and self-pipe.
Affects *BSD and macOS systems.
This leaves only the poll implementation with a signal handler and the
traditional self-pipe trick.
Discussion: https://postgr.es/m/CA+hUKGJjxPDpzBE0a3hyUywBvaZuC89yx3jK9RFZgfv_KHU7gg@mail.gmail.com
Cut down on system calls and other overheads by reading from a signalfd
instead of using a signal handler and self-pipe. Affects Linux sytems,
and possibly others including illumos that implement the Linux epoll and
signalfd interfaces.
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/CA+hUKGJjxPDpzBE0a3hyUywBvaZuC89yx3jK9RFZgfv_KHU7gg@mail.gmail.com
Traditionally, SIGUSR1 has been overloaded for ad-hoc signals,
procsignal.c signals and latch.c wakeups. Move that last use over to a
new dedicated signal. SIGURG is normally used to report out-of-band
socket data, but PostgreSQL doesn't use that facility.
The signal handler is now installed in all postmaster children by
InitializeLatchSupport(). Those wishing to disconnect from it should
call ShutdownLatchSupport().
Future patches will use this separation of signals to avoid the need for
a signal handler on some operating systems.
Discussion: https://postgr.es/m/CA+hUKGJjxPDpzBE0a3hyUywBvaZuC89yx3jK9RFZgfv_KHU7gg@mail.gmail.com
Commit 82ebbeb0 added a workaround for systems with no epoll_create1()
and EPOLL_CLOEXEC. Linux < 2.6.27 and glibc < 2.9 are long gone. Now
seems like a good time to drop the extra code, because otherwise we'd
have to add similar already-dead workaround code to new patches using
XXX_CLOEXEC flags that arrived in the same kernel release.
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/CA%2BhUKGKL_%3DaO%3Dr30N%3Ds9VoDgTqHpRSzePRbA9dkYO7snc7HsxA%40mail.gmail.com
AfterTriggerSaveEvent() wrongly allocates the slot in execution-span
memory context, whereas the correct thing is to allocate it in
a transaction-span context, because that's where the enclosing
AfterTriggersTableData instance belongs into.
Backpatch to 12 (the test back to 11, where it works well with no code
changes, and it's good to have to confirm that the case was previously
well supported); this bug seems introduced by commit ff11e7f4b9.
Reported-by: Bertrand Drouvot <bdrouvot@amazon.com>
Author: Amit Langote <amitlangote09@gmail.com>
Discussion: https://postgr.es/m/39a71864-b120-5a5c-8cc5-c632b6f16761@amazon.com
This adds a new executor node named TID Range Scan. The query planner
will generate paths for TID Range scans when quals are discovered on base
relations which search for ranges on the table's ctid column. These
ranges may be open at either end. For example, WHERE ctid >= '(10,0)';
will return all tuples on page 10 and over.
To support this, two new optional callback functions have been added to
table AM. scan_set_tidrange is used to set the scan range to just the
given range of TIDs. scan_getnextslot_tidrange fetches the next tuple
in the given range.
For AMs were scanning ranges of TIDs would not make sense, these functions
can be set to NULL in the TableAmRoutine. The query planner won't
generate TID Range Scan Paths in that case.
Author: Edmund Horner, David Rowley
Reviewed-by: David Rowley, Tomas Vondra, Tom Lane, Andres Freund, Zhihong Yu
Discussion: https://postgr.es/m/CAMyN-kB-nFTkF=VA_JPwFNo08S0d-Yk0F741S2B7LDmYAi8eyA@mail.gmail.com
The previous logic here created a separate pool of arcs for each
state, so that the out-arcs of each state were physically stored
within it. Perhaps this choice was driven by trying to not include
a "from" pointer within each arc; but Spencer gave up on that idea
long ago, and it's hard to see what the value is now. The approach
turns out to be fairly disastrous in terms of memory consumption,
though. In the first place, NFAs built by this engine seem to have
about 4 arcs per state on average, with a majority having only one
or two out-arcs. So pre-allocating 10 out-arcs for each state is
already cause for a factor of two or more bloat. Worse, the NFA
optimization phase moves arcs around with abandon. In a large NFA,
some of the states will have hundreds of out-arcs, so towards the
end of the optimization phase we have a significant number of states
whose arc pools have room for hundreds of arcs each, even though only
a few of those arcs are in use. We have seen real-world regexes in
which this effect bloats the memory requirement by 25X or even more.
Hence, get rid of the per-state arc pools in favor of a single arc
pool for the whole NFA, with variable-sized allocation batches
instead of always asking for 10 at a time. While we're at it,
let's batch the allocations of state structs too, to further reduce
the malloc traffic.
This incidentally allows moveouts() to be optimized in a similar
way to moveins(): when moving an arc to another state, it's now
valid to just re-link the same arc struct into a different outchain,
where before the code invariants required us to make a physically
new arc and then free the old one.
These changes reduce the regex compiler's typical space consumption
for average-size regexes by about a factor of two, and much more for
large or complicated regexes. In a large test set of real-world
regexes, we formerly had half a dozen cases that failed with "regular
expression too complex" due to exceeding the REG_MAX_COMPILE_SPACE
limit (about 150MB); we would have had to raise that limit to
something close to 400MB to make them work with the old code. Now,
none of those cases need more than 13MB to compile. Furthermore,
the test set is about 10% faster overall due to less malloc traffic.
Discussion: https://postgr.es/m/168861.1614298592@sss.pgh.pa.us
makeDependencyGraphWalker and checkWellFormedRecursionWalker
thought they could hold onto a pointer to a list's first
cons cell while the list was modified by recursive calls.
That was okay when the cons cell was actually separately
palloc'd ... but since commit 1cff1b95a, it's quite unsafe,
leading to core dumps or incorrect complaints of faulty
WITH nesting.
In the field this'd require at least a seven-deep WITH nest
to cause an issue, but enabling DEBUG_LIST_MEMORY_USAGE
allows the bug to be seen with lesser nesting depths.
Per bug #16801 from Alexander Lakhin. Back-patch to v13.
Michael Paquier and Tom Lane
Discussion: https://postgr.es/m/16801-393c7922143eaa4d@postgresql.org
Teach VACUUM VERBOSE to report on pages deleted by the _current_ VACUUM
operation -- these are newly deleted pages. VACUUM VERBOSE continues to
report on the total number of deleted pages in the entire index (no
change there). The former is a subset of the latter.
The distinction between each category of deleted index page only arises
with index AMs where page deletion is supported and is decoupled from
page recycling for performance reasons.
This is follow-up work to commit e5d8a999, which made nbtree store
64-bit XIDs (not 32-bit XIDs) in pages at the point at which they're
deleted. Note that the btm_last_cleanup_num_delpages metapage field
added by that commit usually gets set to pages_newly_deleted. The
exceptions (the scenarios in which they're not equal) all seem to be
tricky cases for the implementation (of page deletion and recycling) in
general.
Author: Peter Geoghegan <pg@bowt.ie>
Discussion: https://postgr.es/m/CAH2-WznpdHvujGUwYZ8sihX%3Dd5u-tRYhi-F4wnV2uN2zHpMUXw%40mail.gmail.com
We aren't publishing this file as documentation, and it's been
much more haphazardly maintained than the real docs in func.sgml,
so let's just drop it. I think the only reason I included it in
commit 7bcc6d98f was that the Berkeley-era sources had had a man
page in this directory.
Discussion: https://postgr.es/m/4099447.1614186542@sss.pgh.pa.us
Newline is certainly not a digit, nor a word character, so it is
sensible that it should match these complemented character classes.
Previously, \D and \W acted that way by default, but in
newline-sensitive mode ('n' or 'p' flag) they did not match newlines.
This behavior was previously forced because explicit complemented
character classes don't match newlines in newline-sensitive mode;
but as of the previous commit that implementation constraint no
longer exists. It seems useful to change this because the primary
real-world use for newline-sensitive mode seems to be to match the
default behavior of other regex engines such as Perl and Javascript
... and their default behavior is that these match newlines.
The old behavior can be kept by writing an explicit complemented
character class, i.e. [^[:digit:]] or [^[:word:]]. (This means
that \D and \W are not exactly equivalent to those strings, but
they weren't anyway.)
Discussion: https://postgr.es/m/3220564.1613859619@sss.pgh.pa.us
The complement-class escapes \D, \S, \W are now allowed within
bracket expressions. There is no semantic difficulty with doing
that, but the rather hokey macro-expansion-based implementation
previously used here couldn't cope.
Also, invent "word" as an allowed character class name, thus "\w"
is now equivalent to "[[:word:]]" outside brackets, or "[:word:]"
within brackets. POSIX allows such implementation-specific
extensions, and the same name is used in e.g. bash.
One surprising compatibility issue this raises is that constructs
such as "[\w-_]" are now disallowed, as our documentation has always
said they should be: character classes can't be endpoints of a range.
Previously, because \w was just a macro for "[:alnum:]_", such a
construct was read as "[[:alnum:]_-_]", so it was accepted so long as
the character after "-" was numerically greater than or equal to "_".
Some implementation cleanup along the way:
* Remove the lexnest() hack, and in consequence clean up wordchrs()
to not interact with the lexer.
* Fix colorcomplement() to not be O(N^2) in the number of colors
involved.
* Get rid of useless-as-far-as-I-can-see calls of element()
on single-character character element names in brackpart().
element() always maps these to the character itself, and things
would be quite broken if it didn't --- should "[a]" match something
different than "a" does? Besides, the shortcut path in brackpart()
wasn't doing this anyway, making it even more inconsistent.
Discussion: https://postgr.es/m/2845172.1613674385@sss.pgh.pa.us
Discussion: https://postgr.es/m/3220564.1613859619@sss.pgh.pa.us
Otherwise we risk "leaking" deleted pages by making them non-recyclable
indefinitely. Commit 6655a729 did the same thing for deleted pages in
GiST indexes. That work was used as a starting point here.
Stop storing an XID indicating the oldest bpto.xact across all deleted
though unrecycled pages in nbtree metapages. There is no longer any
reason to care about that condition/the oldest XID. It only ever made
sense when wraparound was something _bt_vacuum_needs_cleanup() had to
consider.
The btm_oldest_btpo_xact metapage field has been repurposed and renamed.
It is now btm_last_cleanup_num_delpages, which is used to remember how
many non-recycled deleted pages remain from the last VACUUM (in practice
its value is usually the precise number of pages that were _newly
deleted_ during the specific VACUUM operation that last set the field).
The general idea behind storing btm_last_cleanup_num_delpages is to use
it to give _some_ consideration to non-recycled deleted pages inside
_bt_vacuum_needs_cleanup() -- though never too much. We only really
need to avoid leaving a truly excessive number of deleted pages in an
unrecycled state forever. We only do this to cover certain narrow cases
where no other factor makes VACUUM do a full scan, and yet the index
continues to grow (and so actually misses out on recycling existing
deleted pages).
These metapage changes result in a clear user-visible benefit: We no
longer trigger full index scans during VACUUM operations solely due to
the presence of only 1 or 2 known deleted (though unrecycled) blocks
from a very large index. All that matters now is keeping the costs and
benefits in balance over time.
Fix an issue that has been around since commit 857f9c36, which added the
"skip full scan of index" mechanism (i.e. the _bt_vacuum_needs_cleanup()
logic). The accuracy of btm_last_cleanup_num_heap_tuples accidentally
hinged upon _when_ the source value gets stored. We now always store
btm_last_cleanup_num_heap_tuples in btvacuumcleanup(). This fixes the
issue because IndexVacuumInfo.num_heap_tuples (the source field) is
expected to accurately indicate the state of the table _after_ the
VACUUM completes inside btvacuumcleanup().
A backpatchable fix cannot easily be extracted from this commit. A
targeted fix for the issue will follow in a later commit, though that
won't happen today.
I (pgeoghegan) have chosen to remove any mention of deleted pages in the
documentation of the vacuum_cleanup_index_scale_factor GUC/param, since
the presence of deleted (though unrecycled) pages is no longer of much
concern to users. The vacuum_cleanup_index_scale_factor description in
the docs now seems rather unclear in any case, and it should probably be
rewritten in the near future. Perhaps some passing mention of page
deletion will be added back at the same time.
Bump XLOG_PAGE_MAGIC due to nbtree WAL records using full XIDs now.
Author: Peter Geoghegan <pg@bowt.ie>
Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com>
Discussion: https://postgr.es/m/CAH2-WznpdHvujGUwYZ8sihX=d5u-tRYhi-F4wnV2uN2zHpMUXw@mail.gmail.com
d2d8a229bc introduced a new function generate_useful_gather_paths to
be used as a replacement for generate_gather_paths, but forgot to update
a couple of places that referenced the older function.
This is possibly not 100% complete (ref. create_ordered_paths), but it's
better than not changing anything.
Author: "Hou, Zhijie" <houzj.fnst@cn.fujitsu.com>
Reviewed-by: Tomas Vondra <tomas.vondra@enterprisedb.com>
Discussion: https://postgr.es/m/4ce1d5116fe746a699a6d29858c6a39a@G08CNEXMBPEKD05.g08.fujitsu.local
Commit 866e24d47d added an assert that HEAP_XMAX_LOCK_ONLY and
HEAP_KEYS_UPDATED cannot appear together, on the faulty assumption that
the latter necessarily referred to an update and not a tuple lock; but
that's wrong, because SELECT FOR UPDATE can use precisely that
combination, as evidenced by the amcheck test case added here.
Remove the Assert(), and also patch amcheck's verify_heapam.c to not
complain if the combination is found. Also, out of overabundance of
caution, update (across all branches) README.tuplock to be more explicit
about this.
Author: Julien Rouhaud <rjuju123@gmail.com>
Reviewed-by: Mahendra Singh Thalor <mahi6run@gmail.com>
Reviewed-by: Dilip Kumar <dilipbalaut@gmail.com>
Discussion: https://postgr.es/m/20210124061758.GA11756@nol
gcc 10 is smart enough to notice that control could reach this
"hasmatch[depth]" assignment with depth < 0, but not smart enough
to know that that would require a badly broken NFA graph. Change
the assert() to a plain runtime test to shut it up.
Per report from Andres Freund.
Discussion: https://postgr.es/m/20210223173437.b3ywijygsy6q42gq@alap3.anarazel.de
As envisioned in commit c98763bf51, it is possible for VACUUM to
ignore certain transactions that are executing CREATE INDEX CONCURRENTLY
and REINDEX CONCURRENTLY for the purposes of computing Xmin; that's
because we know those transactions are not going to examine any other
tables, and are not going to execute anything else in the same
transaction. (Only operations on "safe" indexes can be ignored: those
on indexes that are neither partial nor expressional).
This is extremely useful in cases where CIC/RC can run for a very long
time, because that used to be a significant headache for concurrent
vacuuming of other tables.
Reviewed-by: Matthias van de Meent <boekewurm+postgres@gmail.com>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Discussion: https://postgr.es/m/20210115133858.GA18931@alvherre.pgsql
We don't have anything to decode in a transaction if ReorderBufferTXN
doesn't exist by the time we decode the commit prepared. So don't create a
new ReorderBufferTXN here. This is an oversight in commit a271a1b5.
Reported-by: Markus Wanner
Discussion: https://postgr.es/m/dbec82e2-dbd7-95a2-c6b6-e488cbbdf853@bluegap.ch
The authentication failure error message wasn't distinguishing whether
it is a physical replication or logical replication connection failure and
was giving incomplete information on what led to failure in case of logical
replication connection.
Author: Paul Martinez and Amit Kapila
Reviewed-by: Euler Taveira and Amit Kapila
Discussion: https://postgr.es/m/CACqFVBYahrAi2OPdJfUA3YCvn3QMzzxZdw0ibSJ8wouWeDtiyQ@mail.gmail.com
Pavan Deolasee recently noted that a few of the
HeapTupleHeaderIndicatesMovedPartitions calls added by commit
5db6df0c01 are useless, since they are done after comparing t_self
with t_ctid. But because t_self can never be set to the magical values
that indicate that the tuple moved partition, this can never succeed: if
the first test fails (so we know t_self equals t_ctid), necessarily the
second test will also fail.
So these checks can be removed and no harm is done. There's no bug
here, just a code legibility issue.
Reported-by: Pavan Deolasee <pavan.deolasee@gmail.com>
Discussion: https://postgr.es/m/20200929164411.GA15497@alvherre.pgsql
The code paths for three different OSes finished up with three different
ways of excluding C[.xxx] and POSIX from consideration. Merge them.
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/20210117215940.GE8560%40telsasoft.com
Instead of an unsightly internal "cache lookup failed" message, just
return NULL for bad OIDs, as is the convention for other similar things.
Reported-by: Justin Pryzby <pryzby@telsasoft.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/20210117215940.GE8560%40telsasoft.com
Commit 46d6e5f567 added the atomic variable "waitStart" into PGPROC struct,
to store the time at which wait for lock acquisition started. Previously
this variable was initialized every time each backend started. Instead,
this commit makes postmaster initialize it at the startup, to ensure that
the variable should be initialized before any use of it.
This commit also moves the code to initialize "waitStart" variable for
prepare transaction, from TwoPhaseGetDummyProc() to MarkAsPreparingGuts().
Because MarkAsPreparingGuts() is more proper place to do that since
it initializes other PGPROC variables.
Author: Fujii Masao
Reviewed-by: Atsushi Torikoshi
Discussion: https://postgr.es/m/1df88660-6f08-cc6e-b7e2-f85296a2bdab@oss.nttdata.com
For the error message "every hash partition modulus must be a factor
of the next larger modulus", add a detail message that shows the
particular numbers and existing partition involved. Also comment the
code more.
Reviewed-by: Amit Langote <amitlangote09@gmail.com>
Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi>
Discussion: https://www.postgresql.org/message-id/flat/bb9d60b4-aadb-607a-1a9d-fdc3434dddcd%40enterprisedb.com
This commit changes one code path in REINDEX INDEX and one code path
in CREATE INDEX CONCURRENTLY to report the progress of each operation
using pgstat_progress_update_multi_param() rather than
multiple calls to pgstat_progress_update_param(). This has the
advantage to make the progress report more consistent to the end-user
without impacting the amount of information provided.
Author: Bharath Rupireddy
Discussion: https://postgr.es/m/CALj2ACV5zW7GxD8D_tyO==bcj6ZktQchEKWKPBOAGKiLhAQo=w@mail.gmail.com
Coverity complained that functions in regexec.c might leak DFA
storage. It's wrong, but this logic is confusing enough that it's
not so surprising Coverity couldn't make sense of it. Rewrite
in hopes of making it more legible to humans as well as machines.
Previously, each pair of capturing parentheses gave rise to a separate
subre tree node, whose only function was to identify that we ought to
capture the match details for this particular sub-expression. In
most cases we don't really need that, since we can perfectly well
put a "capture this" annotation on the child node that does the real
matching work. As with the two preceding commits, the main value
of this is to avoid generating and optimizing an NFA for a tree node
that's not really pulling its weight.
The chosen data representation only allows one capture annotation
per subre node. In the legal-per-spec, but seemingly not very useful,
case where there are multiple capturing parens around the exact same
bit of the regex (i.e. "((xyz))"), wrap the child node in N-1 capture
nodes that act the same as before. We could work harder at that but
I'll refrain, pending some evidence that such cases are worth troubling
over.
In passing, improve the comments in regex.h to say what all the
different re_info bits mean. Some of them were pretty obvious
but others not so much, so reverse-engineer some documentation.
This is part of a patch series that in total reduces the regex engine's
runtime by about a factor of four on a large corpus of real-world regexes.
Patch by me, reviewed by Joel Jacobson
Discussion: https://postgr.es/m/1340281.1613018383@sss.pgh.pa.us
Instead of having left and right child links in subre structs,
have a single child link plus a sibling link. Multiple children
of a tree node are now reached by chasing the sibling chain.
The beneficiary of this is alternation tree nodes. A regular
expression with N (>1) branches is now represented by one alternation
node with N children, rather than a tree that includes N alternation
nodes as well as N children. While the old representation didn't
really cost anything extra at execution time, it was pretty horrid
for compilation purposes, because each of the alternation nodes had
its own NFA, which we were too stupid not to separately optimize.
(To make matters worse, all of those NFAs described the entire
alternation pattern, not just the portion of it that one might
expect from the tree structure.)
We continue to require concatenation nodes to have exactly two
children. This data structure is now prepared to support more,
but the executor's logic would need some careful redesign, and
it's not clear that a lot of benefit could be had.
This is part of a patch series that in total reduces the regex engine's
runtime by about a factor of four on a large corpus of real-world regexes.
Patch by me, reviewed by Joel Jacobson
Discussion: https://postgr.es/m/1340281.1613018383@sss.pgh.pa.us
The comment for parsebranch() claims that it avoids generating
unnecessary concatenation nodes in the "subre" tree, but it missed
some significant cases. Once we've decided that a given atom is
"messy" and can't be bundled with the preceding atom(s) of the
current regex branch, parseqatom() always generated two new concat
nodes, one to concat the messy atom to what follows it in the branch,
and an upper node to concatenate the preceding part of the branch
to that one. But one or both of these could be unnecessary, if the
messy atom is the first, last, or only one in the branch. Improve
the code to suppress such useless concat nodes, along with the
no-op child nodes representing empty chunks of a branch.
Reducing the number of subre tree nodes offers significant savings
not only at execution but during compilation, because each subre node
has its own NFA that has to be separately optimized. (Maybe someday
we'll figure out how to share the optimization work across multiple
tree nodes, but it doesn't look easy.) Eliminating upper tree nodes
is especially useful because they tend to have larger NFAs.
This is part of a patch series that in total reduces the regex engine's
runtime by about a factor of four on a large corpus of real-world regexes.
Patch by me, reviewed by Joel Jacobson
Discussion: https://postgr.es/m/1340281.1613018383@sss.pgh.pa.us
This builds on the previous "rainbow" patch to detect NFAs that will
match any string, though possibly with constraints on the string length.
This definition is chosen to match constructs such as ".*", ".+", and
".{1,100}". Recognizing such an NFA after the optimization pass is
fairly cheap, since we basically just have to verify that all arcs
are RAINBOW arcs and count the number of steps to the end state.
(Well, there's a bit of complication with pseudo-color arcs for string
boundary conditions, but not much.)
Once we have these markings, the regex executor functions longest(),
shortest(), and matchuntil() don't have to expend per-character work
to determine whether a given substring satisfies such an NFA; they
just need to check its length against the bounds. Since some matching
problems require O(N) invocations of these functions, we've reduced
the runtime for an N-character string from O(N^2) to O(N). Of course,
this is no help for non-matchall sub-patterns, but those usually have
constraints that allow us to avoid needing O(N) substring checks in the
first place. It's precisely the unconstrained "match-all" cases that
cause the most headaches.
This is part of a patch series that in total reduces the regex engine's
runtime by about a factor of four on a large corpus of real-world regexes.
Patch by me, reviewed by Joel Jacobson
Discussion: https://postgr.es/m/1340281.1613018383@sss.pgh.pa.us
Some regular expression constructs, most notably the "." match-anything
metacharacter, produce a sheaf of parallel NFA arcs covering all
possible colors (that is, character equivalence classes). We can make
a noticeable improvement in the space and time needed to process large
regexes by replacing such cases with a single arc bearing the special
color code "RAINBOW". This requires only minor additional complication
in places such as pull() and push().
Callers of pg_reg_getoutarcs() must now be prepared for the possibility
of seeing a RAINBOW arc. For the one known user, contrib/pg_trgm,
that's a net benefit since it cuts the number of arcs to be dealt with,
and the handling isn't any different than for other colors that contain
too many characters to be dealt with individually.
This is part of a patch series that in total reduces the regex engine's
runtime by about a factor of four on a large corpus of real-world regexes.
Patch by me, reviewed by Joel Jacobson
Discussion: https://postgr.es/m/1340281.1613018383@sss.pgh.pa.us
When ON_ERROR_ROLLBACK is enabled, psql releases a temporary savepoint
if it's idle in a valid transaction block after executing a query. But psql
doesn't do that after RELEASE or ROLLBACK is executed because a temporary
savepoint has already been destroyed in that case.
This commit changes psql's ON_ERROR_ROLLBACK so that it doesn't release
a temporary savepoint also when COMMIT AND CHAIN is executed. A temporary
savepoint doesn't need to be released in that case because
COMMIT AND CHAIN also destroys any savepoints defined within the transaction
to commit. Otherwise psql tries to release the savepoint that
COMMIT AND CHAIN has already destroyed and cause an error
"ERROR: savepoint "pg_psql_temporary_savepoint" does not exist".
Back-patch to v12 where transaction chaining was added.
Reported-by: Arthur Nascimento
Author: Arthur Nascimento
Reviewed-by: Fujii Masao, Vik Fearing
Discussion: https://postgr.es/m/16867-3475744069228158@postgresql.org
This commit fixes COMMIT AND CHAIN command so that it starts new transaction
immediately even if savepoints are defined within the transaction to commit.
Previously COMMIT AND CHAIN command did not in that case because
commit 280a408b48 forgot to make CommitTransactionCommand() handle
a transaction chaining when the transaction state was TBLOCK_SUBCOMMIT.
Also this commit adds the regression test for COMMIT AND CHAIN command
when savepoints are defined.
Back-patch to v12 where transaction chaining was added.
Reported-by: Arthur Nascimento
Author: Fujii Masao
Reviewed-by: Arthur Nascimento, Vik Fearing
Discussion: https://postgr.es/m/16867-3475744069228158@postgresql.org
Consolidate discussion of how VACUUM places pages in the FSM for
recycling by adding a new section that comes after discussion of page
deletion. This structure reflects the fact that page recycling is
explicitly decoupled from page deletion in Lanin & Shasha's paper. Page
recycling in nbtree is an implementation of what the paper calls "the
drain technique".
This decoupling is an important concept for nbtree VACUUM. Searchers
have to detect and recover from concurrent page deletions, but they will
never have to reason about concurrent page recycling. Recycling can
almost always be thought of as a low level garbage collection operation
that asynchronously frees the physical space that backs a logical tree
node. Almost all code need only concern itself with logical tree nodes.
(Note that "logical tree node" is not currently a term of art in the
nbtree code -- this all works implicitly.)
This is preparation for an upcoming patch that teaches nbtree VACUUM to
remember the details of pages that it deletes on the fly, in local
memory. This enables the same VACUUM operation to consider placing its
own deleted pages in the FSM later on, when it reaches the end of
btvacuumscan().
While poking at the regex code, I happened to notice that the bug
squashed in commit afcc8772e had a sibling: next() failed to return
a specific value associated with the '}' token for a "\{m,n\}"
quantifier when parsing in basic RE mode. Again, this could result
in treating the quantifier as non-greedy, which it never should be in
basic mode. For that to happen, the last character before "\}" that
sets "nextvalue" would have to set it to zero, or it'd have to have
accidentally been zero from the start. The failure can be provoked
repeatably with, for example, a bound ending in digit "0".
Like the previous patch, back-patch all the way.
Commit 2c8dd05d6c added the atomic variable writtenUpto into
walreceiver's shared memory information. It's initialized only
when walreceiver started up but could be read via pg_stat_wal_receiver
view anytime, i.e., even before it's initialized. In the server built
with --disable-atomics and --disable-spinlocks, this uninitialized
atomic variable read could cause "invalid spinlock number: 0" error.
This commit changed writtenUpto so that it's initialized at
the postmaster startup, to avoid the uninitialized variable read
via pg_stat_wal_receiver and fix the error.
Also this commit moved the read of writtenUpto after the release
of spinlock protecting walreceiver's shared variables. This is
necessary to prevent new spinlock from being taken by atomic
variable read while holding another spinlock, and to shorten
the spinlock duration. This change leads writtenUpto not to be
consistent with the other walreceiver's shared variables protected
by a spinlock. But this is OK because writtenUpto should not be
used for data integrity checks.
Back-patch to v13 where commit 2c8dd05d6c introduced the bug.
Author: Fujii Masao
Reviewed-by: Michael Paquier, Thomas Munro, Andres Freund
Discussion: https://postgr.es/m/7ef8708c-5b6b-edd3-2cf2-7783f1c7c175@oss.nttdata.com
Add another method to specify CRLs, hashed directory method, for both
server and client side. This offers a means for server or libpq to
load only CRLs that are required to verify a certificate. The CRL
directory is specifed by separate GUC variables or connection options
ssl_crl_dir and sslcrldir, alongside the existing ssl_crl_file and
sslcrl, so both methods can be used at the same time.
Author: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/20200731.173911.904649928639357911.horikyota.ntt@gmail.com
Discuss VACUUM's linear scan after discussion of tuple deletion by
VACUUM, but before discussion of page deletion by VACUUM. This
progression is a lot more natural.
Also tweak the wording a little. It seems unnecessary to talk about how
it worked prior to PostgreSQL 8.2.
A cross-partition update on a partitioned table is implemented as a
delete followed by an insert. With foreign partitions, this was however
causing issues, because the FDW and core may disagree on when to enable
batching. postgres_fdw was only allowing batching for plain inserts
(CMD_INSERT) while core was trying to batch the insert component of the
cross-partition update. Fix by restricting core to apply batching only
to plain CMD_INSERT queries.
It's possible to allow batching for cross-partition updates, but that
will require more extensive changes, so better to leave that for a
separate patch.
Author: Amit Langote
Reviewed-by: Tomas Vondra, Takayuki Tsunakawa
Discussion: https://postgr.es/m/20200628151002.7x5laxwpgvkyiu3q@development
Push some hopefully-uncontroversial bits extracted from an upcoming
patch series, to remove non-relevant clutter from the main patches.
In compact(), return immediately after setting REG_ASSERT error;
continuing the loop would just lead to assertion failure below.
(Ask me how I know.)
In parseqatom(), remove assertion that moresubs() did its job.
When moresubs actually did its job, this is redundant with that
function's final assert; but when it failed on OOM, this is an
assertion crash. We could avoid the crash by adding a NOERR()
check before the assertion, but it seems better to subtract code
than add it. (Note that there's a NOERR exit a few lines further
down, and nothing else between here and there requires moresubs
to have succeeded. So we don't really need an extra error exit.)
This is a live bug in assert-enabled builds, but given the very
low likelihood of OOM in moresub's tiny allocation, I don't think
it's worth back-patching.
On the other hand, it seems worthwhile to add an assertion that
our intended v->subs[subno] target is still null by the time we
are ready to insert into it, since there's a recursion in between.
In pg_regexec, ensure we fflush any debug output on the way out,
and try to make MDEBUG messages more uniform and helpful. (In
particular, ensure that all of them are prefixed with the subre's
id number, so one can match up entry and exit reports.)
Add some test cases in test_regex to improve coverage of lookahead
and lookbehind constraints. Adding these now is mainly to establish
that this is indeed the existing behavior.
Discussion: https://postgr.es/m/1340281.1613018383@sss.pgh.pa.us
Several information schema views track dependencies between
functions/procedures and objects used by them. These had not been
implemented so far because PostgreSQL doesn't track objects used in a
function body. However, formally, these also show dependencies used
in parameter default expressions, which PostgreSQL does support and
track. So for the sake of completeness, we might as well add these.
If dependency tracking for function bodies is ever implemented, these
views will automatically work correctly.
Reviewed-by: Erik Rijkers <er@xs4all.nl>
Discussion: https://www.postgresql.org/message-id/flat/ac80fc74-e387-8950-9a31-2560778fc1e3%40enterprisedb.com
ALTER INDEX was able to handle that already. This adds tab completion
for all the remaining commands that support this grammar:
- ALTER FUNCTION
- ALTER PROCEDURE
- ALTER ROUTINE
- ALTER TRIGGER
- ALTER MATERIALIZED VIEW
Author: Ian Lawrence Barwick
Discussion: https://postgr.es/m/CAB8KJ=iypYudXuMOAMOP4BpkaYbXxk=a2cdJppX0e9mJXWtuig@mail.gmail.com
Commit 2f2007fbb did this partially, but there were two remaining
warts. checkcondition_gin handled some uncertain cases by setting
the out-of-band recheck flag, some by returning TS_MAYBE, and some
by doing both. Meanwhile, TS_execute arbitrarily converted a
TS_MAYBE result to TS_YES. Thus, if checkcondition_gin chose to
only return TS_MAYBE, the outcome would be TS_YES with no recheck
flag, potentially resulting in wrong query outputs.
The case where this'd happen is if there were GIN_MAYBE entries
in the indexscan results passed to gin_tsquery_[tri]consistent,
which so far as I can see would only happen if the tidbitmap used
to accumulate indexscan results grew large enough to become lossy.
I initially thought of fixing this by ensuring we always set the
recheck flag as well as returning TS_MAYBE in uncertain cases.
But that errs in the other direction, potentially forcing rechecks
of rows that provably match the query (since the recheck flag
remains set even if TS_execute later finds that the answer must be
TS_YES). Instead, let's get rid of the out-of-band recheck flag
altogether and rely on returning TS_MAYBE. This requires exporting
a version of TS_execute that will actually return the full ternary
result of the evaluation ... but we likely should have done that
to start with.
Unfortunately it doesn't seem practical to add a regression test case
that covers this: the amount of data needed to cause the GIN bitmap to
become lossy results in a longer runtime than I think we want to have
in the tests. (I'm wondering about allowing smaller work_mem settings
to ameliorate that, but it'd be a matter for a separate patch.)
Per bug #16865 from Dimitri Nüscheler. Back-patch to v13 where
the faulty commit came in.
Discussion: https://postgr.es/m/16865-4ffdc3e682e6d75b@postgresql.org
This issue exists from the inception of this code (PG-10) but got exposed
by the recent commit ce0fdbfe97 where we are using origins in tablesync
workers. The problem was that we were sometimes sending the prepare_write
('w') message but then the actual message was not being sent and on the
subscriber side, we always expect a message after prepare_write message
which led to this bug.
I refrained from backpatching this because there is no way in the core
code to hit this prior to commit ce0fdbfe97 and we haven't received any
complaints so far.
Reported-by: Erik Rijkers
Author: Amit Kapila and Vignesh C
Tested-by: Erik Rijkers
Discussion: https://postgr.es/m/1295168140.139428.1613133237154@webmailclassic.xs4all.nl