Session statistics, as introduced by 960869da08, had several shortcomings:
- an additional GetCurrentTimestamp() call that also impaired the accuracy of
the data collected
This can be avoided by passing the current timestamp we already have in
pgstat_report_stat().
- an additional statistics UDP packet sent every 500ms
This is solved by adding the new statistics to PgStat_MsgTabstat.
This is conceptually ugly, because session statistics are not
table statistics. But the struct already contains data unrelated
to tables, so there is not much damage done.
Connections and disconnections are reported in separate messages, which
limits the overhead to two additional messages per session, plus a
slight increase in PgStat_MsgTabstat's size (the same number of table
stats still fit).
- Session time computation could overflow on systems where long is 32 bit.
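(For reference, the statistics in question are the session counters
exposed in pg_stat_database, e.g.
    SELECT sessions, session_time, active_time
    FROM pg_stat_database WHERE datname = current_database();
so this change makes their collection both cheaper and more accurate.)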
Reported-By: Andres Freund <andres@anarazel.de>
Author: Andres Freund <andres@anarazel.de>
Author: Laurenz Albe <laurenz.albe@cybertec.at>
Discussion: https://postgr.es/m/20210801205501.nyxzxoelqoo4x2qc%40alap3.anarazel.de
Backpatch: 14-, where the feature was introduced.
The ProcArrayGroupClearXid function has a parameter named "proc",
but the same name was also used for its local variables. This commit
fixes that variable shadowing, to improve code readability.
Back-patch to all supported versions, to make future back-patching
easier, even though this patch is classified as refactoring only.
Reported-by: Ranier Vilela
Author: Ranier Vilela, Aleksander Alekseev
https://postgr.es/m/CAEudQAqyoTZC670xWi6w-Oe2_Bk1bfu2JzXz6xRfiOUzm7xbyQ@mail.gmail.com
All size_t variables declared in procarray.c actually hold int values.
Let's use int instead of size_t for those variables, which also
reduces -Wsign-compare compiler warnings.
Back-patch to v14, where commit 941697c3c1 added the size_t variables
to procarray.c, to make future back-patching easier, even though this
patch is classified as refactoring only.
Reported-by: Ranier Vilela
Author: Ranier Vilela, Aleksander Alekseev
https://postgr.es/m/CAEudQAqyoTZC670xWi6w-Oe2_Bk1bfu2JzXz6xRfiOUzm7xbyQ@mail.gmail.com
It's possible to execute user-defined SQL in some background processes;
for example, logical replication workers can fire triggers. This opens
the possibility that someone would try to execute LISTEN in such a
context. But since only regular backends ever call
ProcessNotifyInterrupt, no messages would actually be received, and
thus the registered listener would simply prevent the message queue
from being cleaned. Eventually NOTIFY would stop working, which is bad.
Perhaps someday somebody will invent infrastructure to make listening
in a background worker actually useful. In the meantime, forbid it.
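For illustration (hypothetical trigger, not taken from the original
report, and assuming it is enabled for replica sessions), something
like this firing in a logical replication apply worker is what now
draws an error instead of silently pinning the notify queue:
    CREATE FUNCTION do_listen() RETURNS trigger LANGUAGE plpgsql AS $$
    BEGIN
      LISTEN mychan;  -- now rejected when run in a background process
      RETURN NEW;
    END $$;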
Back-patch to v13, which is where we introduced the MyBackendType
variable. It'd be a lot harder to implement the check without that,
and it doesn't seem worth the trouble.
Discussion: https://postgr.es/m/153243441449.1404.2274116228506175596@wrigleys.postgresql.org
Formerly, we sent signals for outgoing NOTIFY messages within
ProcessCompletedNotifies, which was also responsible for sending
relevant ones of those messages to our connected client. It therefore
had to run during the main-loop processing that occurs just before
going idle. This arrangement had two big disadvantages:
* Now that procedures allow intra-command COMMITs, it would be
useful to send NOTIFYs to other sessions immediately at COMMIT
(though, for reasons of wire-protocol stability, we still shouldn't
forward them to our client until end of command).
* Background processes such as replication workers would not send
NOTIFYs at all, since they never execute the client communication
loop. We've had requests to allow triggers running in replication
workers to send NOTIFYs, so that's a problem.
To fix these things, move transmission of outgoing NOTIFY signals
into AtCommit_Notify, where it will happen during CommitTransaction.
Also move the possible call of asyncQueueAdvanceTail there, to
ensure we don't bloat the async SLRU if a background worker sends
many NOTIFYs with no one listening.
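For example (hypothetical procedure), listeners in other sessions now
receive the first notification when the intra-procedure COMMIT happens,
rather than only when the CALL finishes:
    CREATE PROCEDURE notify_twice() LANGUAGE plpgsql AS $$
    BEGIN
      NOTIFY mychan, 'first';
      COMMIT;    -- the signal for 'first' now goes out here
      NOTIFY mychan, 'second';
    END $$;
    CALL notify_twice();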
We can also drop the call of asyncQueueReadAllNotifications,
allowing ProcessCompletedNotifies to go away entirely. That's
because commit 790026972 added a call of ProcessNotifyInterrupt
adjacent to PostgresMain's call of ProcessCompletedNotifies,
and that does its own call of asyncQueueReadAllNotifications,
meaning that we were uselessly doing two such calls (inside two
separate transactions) whenever inbound notify signals coincided
with an outbound notify. We need only set notifyInterruptPending
to ensure that ProcessNotifyInterrupt runs, and we're done.
The existing documentation suggests that custom background workers
should call ProcessCompletedNotifies if they want to send NOTIFY
messages. To avoid an ABI break in the back branches, reduce it
to an empty routine rather than removing it entirely. Removal
will occur in v15.
Although the problems mentioned above have existed for a while,
I don't feel comfortable back-patching this any further than v13.
There was quite a bit of churn in adjacent code between 12 and 13.
At minimum we'd have to also backpatch 51004c717, and a good deal
of other adjustment would also be needed, so the benefit-to-risk
ratio doesn't look attractive.
Per bug #15293 from Michael Powers (and similar gripes from others).
Artur Zakirov and Tom Lane
Discussion: https://postgr.es/m/153243441449.1404.2274116228506175596@wrigleys.postgresql.org
It's possible for us to copy an AlternativeSubPlan expression node
into multiple places, for example the scan quals of several
partition children. Then it's possible that we choose a different
one of the alternatives as optimal in each place. Commit 41efb8340
failed to consider this scenario, so its attempt to remove "unused"
subplans could remove subplans that were still used elsewhere.
Fix by delaying the removal logic until we've examined all the
AlternativeSubPlans in a given query level. (This does assume that
AlternativeSubPlans couldn't get copied to other query levels, but
for the foreseeable future that's fine; cf qual_is_pushdown_safe.)
Per report from Rajkumar Raghuwanshi. Back-patch to v14
where the faulty logic came in.
Discussion: https://postgr.es/m/CAKcux6==O3NNZC3bZ2prRYv3cjm3_Zw1GfzmOjEVqYN4jub2+Q@mail.gmail.com
If an allocation fails within LLVM, it is not safe to call back into
LLVM, as LLVM is not generally safe against exceptions /
stack-unwinding. Thus errors encountered while in LLVM code are
promoted to FATAL. However, llvm_shutdown() called back into LLVM even
in such cases, while llvm_release_context() was careful not to do so.
We cannot generally skip shutting down LLVM, as that can break profiling. But
it's OK to do so if there was an error from within LLVM.
Reported-By: Jelte Fennema <Jelte.Fennema@microsoft.com>
Author: Andres Freund <andres@anarazel.de>
Author: Justin Pryzby <pryzby@telsasoft.com>
Discussion: https://postgr.es/m/AM5PR83MB0178C52CCA0A8DEA0207DC14F7FF9@AM5PR83MB0178.EURPRD83.prod.outlook.com
Backpatch: 11-, where jit was introduced
In d9d8aa9bb9 I added a defensive NULL assignment to protect against a
not-too-smart compiler warning about uninitialized variable use after the
switch. Unfortunately I only did so on master and forgot to adjust that for
14.
Stephen noticed that there actually is a compiler warning :(.
Reported-By: Stephen Frost <sfrost@snowman.net>
Discussion: https://postgr.es/m/20210827224639.GX17906@tamriel.snowman.net
While processing toast changes in logical decoding, we rejigger the
tuple change to point to in-memory toast tuples instead of to on-disk
toast tuples. To keep the memory accounting correct, we were
subtracting the old change size and then, after re-computing the new
tuple, re-adding its size at the end. Now, if any error occurred
before we added the new size, we would release the changes, and that
would update the accounting info by subtracting the size from the
counters again. That second subtraction underflowed, leading to an
assertion failure in assert-enabled builds and to wrong memory
accounting in the reorder buffer otherwise.
Author: Bertrand Drouvot
Reviewed-by: Amit Kapila
Backpatch-through: 13, where memory accounting was introduced
Discussion: https://postgr.es/m/92b0ee65-b8bd-e42d-c082-4f3f4bf12d34@amazon.com
If search_start is greater than the length of the string, we should just
return REG_NOMATCH immediately. (Note that the equality case should
*not* be rejected, since the pattern might be able to match zero
characters.) This guards various internal assumptions that the min of a
range of string positions is not more than the max. Violation of those
assumptions could allow an attempt to fetch string[search_start-1],
possibly causing a crash.
Jaime Casanova pointed out that this situation is reachable with the
new regexp_xxx functions that accept a user-specified start position.
I don't believe it's reachable via any in-core call site in v14 and
below. However, extensions could possibly call pg_regexec with an
out-of-range search_start, so let's back-patch the fix anyway.
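As a sketch of the call shape (using regexp_count, one of the new
functions on master):
    SELECT regexp_count('abc', 'c', 42);
asks for a search starting past the end of the string, and now cleanly
reports no match instead of risking a fetch of string[search_start-1].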
Discussion: https://postgr.es/m/20210911180357.GA6870@ahch-to
We have long forbidden fetching backwards from a NO SCROLL cursor,
but the prohibition didn't extend to cases in which we rewind the
query altogether and then re-fetch forwards. I think the reason is
that this logic was mainly meant to protect plan nodes that can't
be run in the reverse direction. However, re-reading the query output
is problematic if the query is volatile (which includes SELECT FOR
UPDATE, not just queries with volatile functions): the re-read can
produce different results, which confuses the cursor navigation logic
completely. Another reason for disliking this approach is that some
code paths will either fetch backwards or rewind-and-fetch-forwards
depending on the distance to the target row; so that seemingly
identical use-cases may or may not draw the "cursor can only scan
forward" error. Hence, let's clean things up by disallowing rewind
as well as fetch-backwards in a NO SCROLL cursor.
Ordinarily we'd only make such a definitional change in HEAD, but
there is a third reason to consider this change now. Commit ba2c6d6ce
created some new user-visible anomalies for non-scrollable cursors
WITH HOLD, in that navigation in the cursor result got confused if the
cursor had been partially read before committing. The only good way
to resolve those anomalies is to forbid rewinding such a cursor, which
allows removal of the incorrect cursor state manipulations that
ba2c6d6ce added to PersistHoldablePortal.
To minimize the behavioral change in the back branches (including
v14), refuse to rewind a NO SCROLL cursor only when it has a holdStore,
ie has been held over from a previous transaction due to WITH HOLD.
This should avoid breaking most applications that have been sloppy
about whether to declare cursors as scrollable. We'll enforce the
prohibition across-the-board beginning in v15.
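Roughly, the case that is now rejected in the back branches looks like
this (hypothetical session):
    BEGIN;
    DECLARE c NO SCROLL CURSOR WITH HOLD
      FOR SELECT * FROM generate_series(1, 10);
    FETCH 3 FROM c;
    COMMIT;                   -- cursor is persisted into a holdStore
    FETCH ABSOLUTE 1 FROM c;  -- needs a rewind, so it now fails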
Back-patch to v11, as ba2c6d6ce was.
Discussion: https://postgr.es/m/3712911.1631207435@sss.pgh.pa.us
Some plan node types don't react well to being called again after
they've already returned NULL. PortalRunSelect() has long dealt
with this by calling the executor with NoMovementScanDirection
if it sees that we've already run the portal to the end. However,
commit ba2c6d6ce overlooked this point, so that persisting an
already-fully-fetched cursor would fail if it had such a plan.
Per report from Tomas Barton. Back-patch to v11, as the faulty
commit was. (I've omitted a test case because the type of plan
that causes a problem isn't all that stable.)
Discussion: https://postgr.es/m/CAPV2KRjd=ErgVGbvO2Ty20tKTEZZr6cYsYLxgN_W3eAo9pf5sw@mail.gmail.com
We don't allow relations to exceed 2^32-1 blocks, because block
numbers are 32 bits and the last possible block number is reserved
to mean InvalidBlockNumber. There is a check for this in mdextend,
but that's really way too late, because the smgr API requires us to
create a buffer for the block-to-be-added, and we do not want to
have any buffer with blocknum InvalidBlockNumber. (Such a case
can trigger assertions in bufmgr.c, plus I think it might confuse
ReadBuffer's logic for data-past-EOF later on.) So put the check
into ReadBuffer.
Per report from Christoph Berg. It's been like this forever,
so back-patch to all supported branches.
Discussion: https://postgr.es/m/YTn1iTkUYBZfcODk@msg.credativ.de
Previously, walreceiver always closed the currently-open WAL segment
and created its archive notification file only after it had finished
writing out the current segment and then received some WAL data
destined for the next segment. If walreceiver exited just before any
WAL data for the next segment arrived at the standby, it did not
create the archive notification file for the current segment even
though that segment was known to be complete. This behavior could
delay WAL archiving of the segment until a subsequent restartpoint or
checkpoint created the notification file.
To fix the issue, this commit changes walreceiver so that it creates
the archive notification file for the current WAL segment as soon as
that segment is known to be complete, without waiting for the next WAL
data to arrive.
Back-patch to all supported branches.
Reported-by: Kyotaro Horiguchi
Author: Fujii Masao
Reviewed-by: Kyotaro Horiguchi
Discussion: https://postgr.es/m/20200630.165503.1465894182551545886.horikyota.ntt@gmail.com
If we copy data-modifying CTEs from the original query to a replacement
query (from a DO INSTEAD rule), we must set hasModifyingCTE properly
in the replacement query. Failure to do this can cause various
unpleasantness, such as unsafe usage of parallel plans. The code also
neglected to propagate hasRecursive, though that's only cosmetic at
the moment.
A difficulty arises if the rule action is an INSERT...SELECT. We
attach the original query's RTEs and CTEs to the sub-SELECT Query, but
data-modifying CTEs are only allowed to appear in the topmost Query.
For the moment, throw an error in such cases. It would probably be
possible to avoid this error by attaching the CTEs to the top INSERT
Query instead; but that would require a bunch of new code to adjust
ctelevelsup references. Given the narrowness of the use-case, and
the need to back-patch this fix, it does not seem worth the trouble
for now. We can revisit this if we get field complaints.
Per report from Greg Nancarrow. Back-patch to all supported branches.
(The test case added here does not fail before v10, but there are
plenty of places checking top-level hasModifyingCTE in 9.6, so I have
no doubt that this code change is necessary there too.)
Greg Nancarrow and Tom Lane
Discussion: https://postgr.es/m/CAJcOf-f68DT=26YAMz_i0+Au3TcLO5oiHY5=fL6Sfuits6r+_w@mail.gmail.com
Discussion: https://postgr.es/m/CAJcOf-fAdj=nDKMsRhQzndm-O13NY4dL6xGcEvdX5Xvbbi0V7g@mail.gmail.com
Commit 01e658fa74 added hash support for row types. This also added
support for hashing anonymous record types, using the same approach
that the type cache uses for comparison support for record types: It
just reports that it works, but it might fail at run time if a
component type doesn't actually support the operation. We get away
with that for comparison because most types support that. But some
types don't support hashing, so the current state can result in
failures at run time where the planner chooses hashing over sorting,
whereas that previously worked if only sorting was an option.
We do, however, want the record hashing support for path tracking in
recursive unions, and the SEARCH and CYCLE clauses built on that. In
that case, hashing is the only plan option. To enable that, this
commit implements the following approach: the type cache does not
report that hashing is available for the record type, undoing that
part of 01e658fa74. Instead, callers that require hashing no
matter what can override that result themselves. This patch only
touches the callers to make the aforementioned recursive query cases
work, namely the parse analysis of unions, as well as the hash_array()
function.
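For illustration (assuming a component type with btree but no hash
support; money is one such type, I believe), a query like
    SELECT DISTINCT x
    FROM (VALUES (ROW(1::money)), (ROW(2::money))) AS v(x);
could previously fail at run time with "could not identify a hash
function for type money" whenever the planner picked hash aggregation;
it now falls back to sorting.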
Reported-by: Sait Talha Nisanci <sait.nisanci@microsoft.com>
Bug: #17158
Discussion: https://www.postgresql.org/message-id/flat/17158-8a2ba823982537a4%40postgresql.org
Updates/Deletes on a relation were allowed even without a replica
identity after a publication FOR ALL TABLES was defined. This would
later lead to an error on subscribers. The reason was that for such
publications we were not invalidating the relcache, so the publication
information for relations was not getting rebuilt. Similarly, we were
not invalidating the relcache after dropping such a publication, which
would keep prohibiting Updates/Deletes without a replica identity even
once no publication applied anymore.
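For example (hypothetical names):
    CREATE TABLE t (a int);                -- no replica identity
    CREATE PUBLICATION pub FOR ALL TABLES;
    UPDATE t SET a = 1;  -- was wrongly allowed; now rejected because
                         -- the table publishes updates without a
                         -- replica identity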
Author: Vignesh C and Hou Zhijie
Reviewed-by: Hou Zhijie, Kyotaro Horiguchi, Amit Kapila
Backpatch-through: 10, where it was introduced
Discussion: https://postgr.es/m/CALDaNm0pF6zeWqCA8TCe2sDuwFAy8fCqba=nHampCKag-qLixg@mail.gmail.com
timetz_zone() delivered completely wrong answers if the zone was
specified by a dynamic TZ abbreviation, because it failed to account
for the difference between the POSIX conventions for field values in
struct pg_tm and the conventions used in PG-specific datetime code.
As a stopgap fix, just adjust the tm_year and tm_mon fields to match
PG conventions. This is fixed in a different way in HEAD (388e71af8)
but I don't want to back-patch the change of reference point.
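For example (MSK being one such dynamic abbreviation in the standard
tzdata setup):
    SELECT '12:00:00+00'::timetz AT TIME ZONE 'MSK';
previously came back with a wrong UTC offset.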
Discussion: https://postgr.es/m/CAJ7c6TOMG8zSNEZtCn5SPe+cCk3Lfxb71ZaQwT2F4T7PJ_t=KA@mail.gmail.com
Attempting to make hashfloat4() look as much as possible like
hashfloat8(), I'd figured I could replace NaNs with get_float4_nan()
before widening to float8. However, results from protosciurus
and topminnow show that on some platforms that produces a different
bit-pattern from get_float8_nan(), breaking the intent of ce773f230.
Rearrange so that we use the result of get_float8_nan() for all NaN
cases. As before, back-patch.
Previously this was allowed, but the collation effectively vanished
into the ether because of the way lookup_collation() works: you could
not use the collation, nor even drop it. Seems better to give an
error up front than to leave the user wondering why it doesn't work.
(Because this test is in DefineCollation not CreateCollation, it does
not prevent pg_import_system_collations from creating ICU collations,
regardless of the initially-chosen encoding.)
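For example, in a database created with ENCODING 'SQL_ASCII' (which ICU
does not support), this now fails up front:
    CREATE COLLATION german (provider = icu, locale = 'de-DE');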
Per bug #17170 from Andrew Bille. Back-patch to v10 where ICU support
was added.
Discussion: https://postgr.es/m/17170-95845cf3f0a9c36d@postgresql.org
We had a complaint that the postmaster fails to start if the invoking
program closes stdin. That happens because count_usable_fds expects
to be able to dup(0), and if it can't, we conclude there are no free
FDs and go belly-up. So far as I can find, though, there is no other
place in the server that touches stdin, and it's not unreasonable to
expect that a daemon wouldn't use that file.
As a simple improvement, let's dup FD 2 (stderr) instead. Unlike stdin,
it *is* reasonable for us to expect that stderr be open; even if we are
configured not to touch it, common libraries such as libc might try to
write error messages there.
Per gripe from Mario Emmenlauer. Given the lack of previous complaints,
I'm not excited about pushing this into stable branches, but it seems
OK to squeeze it into v14.
Discussion: https://postgr.es/m/48bafc63-c30f-3962-2ded-f2e985d93e86@emmenlauer.de
The IEEE 754 standard allows a wide variety of bit patterns for NaNs,
of which at least two ("NaN" and "-NaN") are pretty easy to produce
from SQL on most machines. This is problematic because our btree
comparison functions deem all NaNs to be equal, but our float hash
functions know nothing about NaNs and will happily produce varying
hash codes for them. That causes unexpected results from queries
that hash a column containing different NaN values. It could also
produce unexpected lookup failures when using a hash index on a
float column, i.e. "WHERE x = 'NaN'" will not find all the rows
it should.
To fix, special-case NaN in the float hash functions, not too much
unlike the existing special case that forces zero and minus zero
to hash the same. I arranged for the most vanilla sort of NaN
(that coming from the C99 NAN constant) to still have the same
hash code as before, to reduce the risk to existing hash indexes.
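For example (both spellings are accepted by float8's input function on
most platforms):
    SELECT 'NaN'::float8 = '-NaN'::float8;                          -- true
    SELECT hashfloat8('NaN'::float8) = hashfloat8('-NaN'::float8);  -- now true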
I dithered about whether to back-patch this into stable branches,
but ultimately decided to do so. It's a clear improvement for
queries that hash internally. If there is anybody who has -NaN
in a hash index, they'd be well advised to re-index after applying
this patch ... but the misbehavior if they don't will not be much
worse than the misbehavior they had before.
Per bug #17172 from Ma Liangzhu.
Discussion: https://postgr.es/m/17172-7505bea9e04e230f@postgresql.org
Until now, when defining extended statistics, everything except a plain
column reference was treated as a complex expression. So for example "a"
was a column reference, but "(a)" would be an expression. In most cases
this does not matter much, but there were a couple of strange
consequences. For example
    CREATE STATISTICS s ON a FROM t;
would fail, because extended stats require at least two columns. But
    CREATE STATISTICS s ON (a) FROM t;
would succeed, because that requirement does not apply to expressions.
Moreover, that statistics object is useless - the optimizer will always
use the regular statistics collected for attribute "a".
So do a bit more work to identify those expressions referencing a single
column, and translate them to a simple column reference. Backpatch to
14, where support for extended statistics on expressions was introduced.
Reported-by: Justin Pryzby
Backpatch-through: 14
Discussion: https://postgr.es/m/20210816013255.GS10479%40telsasoft.com
It doesn't make any sense to report this information, since VACUUM
VERBOSE reports on heap relation truncation directly. This was an
oversight in commit 7ab96cf6, which made VACUUM VERBOSE output a little
more consistent with nearby autovacuum-specific log output. In
passing, adjust comments that describe how this is supposed to work.
Also bring truncation-related VACUUM VERBOSE output in line with the
convention established for VACUUM VERBOSE output by commit f4f4a649.
Author: Peter Geoghegan <pg@bowt.ie>
Backpatch: 14-, where VACUUM VERBOSE's output changed.
The code printing expressions for extended statistics doubled the
parens, producing results like ((a+1)), which is unnecessary and not
consistent with how we print expressions elsewhere.
Fixed by tweaking the code to produce just a single set of parens.
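For illustration (hypothetical object names), the deparsed definition
of an object such as
    CREATE STATISTICS s ON (a + 1), b FROM t;
previously printed the expression as ((a + 1)) and now prints (a + 1).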
Reported by Mark Dilger, fix by me. Backpatch to 14, where support for
extended statistics on expressions was added.
Reported-by: Mark Dilger
Discussion: https://postgr.es/m/20210122040101.GF27167%40telsasoft.com
When an ownership check on an extended statistics object failed, the
code was calling aclcheck_error_type to report the failure, which is
clearly wrong, resulting in cache lookup errors. Fix by calling
aclcheck_error.
This issue exists since the introduction of extended statistics, so
backpatch all the way back to PostgreSQL 10. It went unnoticed because
there were no tests triggering the error, so add one.
Reported-by: Mark Dilger
Backpatch-through: 10, where extended stats were introduced
Discussion: https://postgr.es/m/1F238937-7CC2-4703-A1B1-6DC225B8978A%40enterprisedb.com
When starting to use a query parsetree loaded from the catalogs,
we must begin by applying AcquireRewriteLocks(), to obtain the same
relation locks that the parser would have gotten if the query were
entered interactively, and to do some other cleanup such as dealing
with later-dropped columns. New-style SQL functions are just as
subject to this rule as other stored parsetrees; however, of the
places dealing with such functions, only init_sql_fcache had gotten
the memo. In particular, if we successfully inlined a new-style
set-returning SQL function that contained any relation references,
we'd either get an assertion failure or attempt to use those
relation(s) sans locks.
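A function of roughly this shape (hypothetical, with t an existing
table) exhibits the problem once it gets inlined:
    CREATE FUNCTION f() RETURNS SETOF t
    LANGUAGE sql
    BEGIN ATOMIC
      SELECT * FROM t;
    END;
    SELECT * FROM f();  -- inlining must now acquire the rewrite locks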
I also added AcquireRewriteLocks calls to fmgr_sql_validator and
print_function_sqlbody. Desultory experiments didn't demonstrate any
failures in those, but I suspect that I just didn't try hard enough.
Certainly we don't expect nearby code paths to operate without locks.
On the same logic of it-ought-to-have-the-same-effects-as-the-old-code,
call pg_rewrite_query() in fmgr_sql_validator, too. It's possible
that neither code path there needs to bother with rewriting, but
doing the analysis to prove that is beyond my goals for today.
Per bug #17161 from Alexander Lakhin.
Discussion: https://postgr.es/m/17161-048a1cdff8422800@postgresql.org
Most data-corruption reports mention the location of the problem, but
this one failed to. Add it.
Backpatch all the way back. In 12 and older, also assign the
ERRCODE_DATA_CORRUPTED error code as was done in commit fd6ec93bf8 for
13 and later.
Discussion: https://postgr.es/m/202108191637.oqyzrdtnheir@alvherre.pgsql
In the long-running saga of analyze on partitioned tables, one thing I
missed while reverting 0827e8af70 is the maintenance of the analyze
count and last-analyze time for partitioned tables. This is a mostly
trivial change that enables users to assess the need for invoking
manual ANALYZE on partitioned tables.
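That is, these counters now advance for a partitioned table too
(hypothetical name):
    ANALYZE parted_tab;
    SELECT analyze_count, last_analyze
    FROM pg_stat_user_tables WHERE relname = 'parted_tab';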
This patch, posted by Justin and modified a bit by me (Álvaro), can be
mostly traced back to Hosoya-san, though any problems introduced with
the scissors are mine.
Backpatch to 14, in line with 6f8127b739.
Co-authored-by: Yuzuko Hosoya <yuzukohosoya@gmail.com>
Co-authored-by: Justin Pryzby <pryzby@telsasoft.com>
Co-authored-by: Álvaro Herrera <alvherre@alvh.no-ip.org>
Reported-by: Justin Pryzby <pryzby@telsasoft.com>
Discussion: https://postgr.es/m/20210816222810.GE10479@telsasoft.com
If the system crashed between CREATE TABLESPACE and the next checkpoint,
the result could be some files in the tablespace unexpectedly containing
no rows. Affected files would be those for which the system did not
write WAL; see the wal_skip_threshold documentation. Before v13, a
different set of conditions governed the writing of WAL; see v12's
<sect2 id="populate-pitr">. (The v12 conditions were broader in some
ways and narrower in others.) Users may want to audit non-default
tablespaces for unexpected short files. The bug could have truncated an
index without affecting the associated table, and reindexing the index
would fix that particular problem.
This fixes the bug by making create_tablespace_directories() more like
TablespaceCreateDbspace(). create_tablespace_directories() was
recursively removing tablespace contents, reasoning that WAL redo would
recreate everything removed that way. That assumption holds for other
wal_level values. Under wal_level=minimal, the old approach could
delete files for which no other copy existed. Back-patch to 9.6 (all
supported versions).
Reviewed by Robert Haas and Prabhat Sahu. Reported by Robert Haas.
Discussion: https://postgr.es/m/CA+TgmoaLO9ncuwvr2nN-J4VEP5XyAcy=zKiHxQzBbFRxxGxm0w@mail.gmail.com
Somehow, spgist overlooked the need to call pgstat_count_index_scan().
Hence, pg_stat_all_indexes.idx_scan and equivalent columns never
became nonzero for an SP-GiST index, although the related per-tuple
counters worked fine.
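That is, given an SP-GiST index (hypothetical name), this now advances
once per scan:
    SELECT idx_scan FROM pg_stat_all_indexes
    WHERE indexrelname = 'my_spgist_idx';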
This fix works a bit differently from other index AMs, in that the
counter increment occurs in spgrescan not spggettuple/spggetbitmap.
It looks like this won't make the user-visible semantics noticeably
different, so I won't go to the trouble of introducing an is-this-
the-first-call flag just to make the counter bumps happen in the
same places.
Per bug #17163 from Christian Quest. Back-patch to all supported
versions.
Discussion: https://postgr.es/m/17163-b8c5cc88322a5e92@postgresql.org
When prefetching pages for ANALYZE, we should be using
maintenance_io_concurrency (by calling
get_tablespace_maintenance_io_concurrency(), not
get_tablespace_io_concurrency()).
ANALYZE prefetching was introduced in c6fc50c, so back-patch to 14.
Backpatch-through: 14
Reported-By: Egor Rogov
Discussion: https://postgr.es/m/9beada99-34ce-8c95-fadb-451768d08c64%40postgrespro.ru
Adjust track_io_timing related logging code added by commit 94d13d474d.
Make it consistent with other nearby autovacuum and autoanalyze logging
code by removing logic that suppressed zero millisecond outputs.
log_autovacuum_min_duration log output now reliably shows "read:" and
"write:" millisecond-based values in its report (when track_io_timing is
enabled).
Author: Peter Geoghegan <pg@bowt.ie>
Reviewed-By: Stephen Frost <sfrost@snowman.net>
Discussion: https://postgr.es/m/CAH2-WznW0FNxSVQMSRazAMYNfZ6DR_gr5WE78hc6E1CBkkJpzw@mail.gmail.com
Backpatch: 14-, where the track_io_timing logging was introduced.
This order seems more natural. It starts with details that are
particular to heap and index data structures, and ends with system-level
costs incurred during the autovacuum worker's VACUUM/ANALYZE operation.
Author: Peter Geoghegan <pg@bowt.ie>
Discussion: https://postgr.es/m/CAH2-WzkzxK6ahA9xxsOftRtBX_R0swuHZsvo4QUbak1Bz7hb7Q@mail.gmail.com
Backpatch: 14-, which enhanced the log output in various ways.
Second thoughts about commit 824bf7190: we apply makesearch() to
an NFA after having determined whether it is a MATCHALL pattern.
Prepending ".*" doesn't make it non-MATCHALL, but it does change the
maximum possible match length, and makesearch() failed to update that.
This has no ill effects given the stylized usage of search NFAs, but
it seems like it's better to keep the data structure consistent. In
particular, fixing this allows more honest handling of the MATCHALL
check in matchuntil(): we can now assert that maxmatchall is infinity,
instead of lamely assuming that it should act that way.
In passing, improve the code in dump[c]nfa so that infinite maxmatchall
is printed as "inf" not a magic number.
Pengchengliu reported an assertion failure in a parallel worker while
performing a parallel scan using an overflowed snapshot. The proximate
cause is that TransactionXmin was set to an incorrect value. The
underlying cause is incorrect snapshot handling in parallel.c.
In particular, InitializeParallelDSM() was unconditionally calling
GetTransactionSnapshot(), because I (rhaas) mistakenly thought that
was always retrieving an existing snapshot whereas, at isolation
levels less than REPEATABLE READ, it's actually taking a new one. So
instead do this only at higher isolation levels where there actually
is a single snapshot for the whole transaction.
By itself, this is not a sufficient fix, because we still need to
guarantee that TransactionXmin gets set properly in the workers. The
easiest way to do that seems to be to install the leader's active
snapshot as the transaction snapshot if the leader did not serialize a
transaction snapshot. This doesn't affect the results of future
GetTransactionSnapshot() calls, since those have to take a new snapshot
anyway; what we care about is the side effect of setting TransactionXmin.
Report by Pengchengliu. Patch by Greg Nancarrow, except for some comment
text which I supplied.
Discussion: https://postgr.es/m/002f01d748ac$eaa781a0$bff684e0$@tju.edu.cn
Commit 325f2ec555 introduced pg_class.relrewrite to skip operations on
tables created as part of a heap rewrite during DDL. It links such
transient heaps to the original relation OID via this new field in
pg_class, but forgot to do anything about toast tables. So, logical
decoding was not able to skip operations on internally created toast
tables. This led to an error when we tried to decode the WAL for the
next operation, for which it appeared that there was toast data where
actually there wasn't any.
To fix this, we set pg_class.relrewrite for internally created toast
tables as well, which allows skipping operations on them during
logical decoding.
Author: Bertrand Drouvot
Reviewed-by: David Zhang, Amit Kapila
Backpatch-through: 11, where it was introduced
Discussion: https://postgr.es/m/b5146fb1-ad9e-7d6e-f980-98ed68744a7c@amazon.com
There are two identical error messages about the valid values of the
modulus for a hash partition in the PostgreSQL source code. Commit
0e1275fb07 improved only one of them, so that the ambiguous word
"positive" was avoided there, but forgot to improve the other. This
commit improves the other one too, which also reduces the burden on
translators.
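The message in question is the one drawn, for example, by (assuming t
is a hash-partitioned table):
    CREATE TABLE p PARTITION OF t
      FOR VALUES WITH (MODULUS 0, REMAINDER 0);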
Back-patch to v11, where the error message was introduced.
Author: Kyotaro Horiguchi
Reviewed-by: Fujii Masao
Discussion: https://postgr.es/m/20210819.170315.1413060634876301811.horikyota.ntt@gmail.com
The distance in a phrase operator must be an integer value between zero
and MAXENTRYPOS inclusive. But previously the error message about its
valid values included the information about the upper limit but not
the lower limit (i.e., zero). This commit improves the error message
so that it also includes the information about the lower limit.
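Both bounds can be exercised from SQL; with MAXENTRYPOS being 16384:
    SELECT 'a <0> b'::tsquery;      -- zero distance is valid
    SELECT 'a <16385> b'::tsquery;  -- rejected; the message now states
                                    -- the lower bound as well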
Back-patch to v9.6 where full-text phrase search was supported.
Author: Kyotaro Horiguchi
Reviewed-by: Fujii Masao
Discussion: https://postgr.es/m/20210819.170315.1413060634876301811.horikyota.ntt@gmail.com
Regexps like "(.){0}...\1" drew an "invalid backreference number".
That's not unreasonable on its face, since the capture group will
never be matched if it's iterated zero times. However, other engines
such as Perl's don't complain about this, nor do we throw an error for
related cases such as "(.)|\1", even though that backref can never
succeed either. Also, if the zero-iterations case happens at runtime
rather than compile time --- say, "(x)*...\1" when there's no "x" to
be found --- that's not an error, we just deem the backref to not
match. Making this even less defensible, no error was thrown for
nested cases such as "((.)){0}...\2"; and to add insult to injury,
those cases could result in assertion failures instead. (It seems
that nothing especially bad happened in non-assert builds, though.)
Let's just fix it so that no error is thrown and instead the backref
is deemed to never match, so that compile-time detection of no
iterations behaves the same as run-time detection.
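For instance:
    SELECT regexp_match('xyz', '(.){0}\1');  -- now NULL, not an error
which matches how the run-time zero-iterations case already behaved.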
Per report from Mark Dilger. This appears to be an aboriginal error
in Spencer's library, so back-patch to all supported versions.
Pre-v14, it turns out to also be necessary to back-patch one aspect of
commits cb76fbd7e/00116dee5, namely to create capture-node subREs with
the begin/end states of their subexpressions, not the current lp/rp
of the outer parseqatom invocation. Otherwise delsub complains that
we're trying to disconnect a state from itself. This is a bit scary
but code examination shows that it's safe: in the pre-v14 code, if we
want to wrap iteration around the subexpression, the first thing we do
is overwrite the atom's begin/end fields with new states. So the
bogus values didn't survive long enough to be used for anything, except
if no iteration is required, in which case it doesn't matter.
Discussion: https://postgr.es/m/A099E4A8-4377-4C64-A98C-3DEDDC075502@enterprisedb.com
The current refresh behavior tries to refresh just the added/dropped
publications, but that can lead to removing the wrong tables from the
subscription. We can't refresh just the dropped publication, because
it is quite possible that some of its tables were removed from the
publication before that point, and those would then remain part of the
subscription. Also, there is a chance that tables that were part of
the publication being dropped are also part of another publication, so
we can't remove those either.
So, we decided that by default, add/drop commands will also act like
REFRESH PUBLICATION which means they will refresh all the publications. We
can keep the old behavior for "add publication" but it is better to be
consistent with "drop publication".
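So, for example (hypothetical names):
    ALTER SUBSCRIPTION mysub DROP PUBLICATION pub2;
now refreshes all of mysub's publications, not just pub2; the refresh
can still be skipped with WITH (refresh = false).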
Author: Hou Zhijie
Reviewed-by: Masahiko Sawada, Amit Kapila
Backpatch-through: 14, where it was introduced
Discussion: https://postgr.es/m/OS0PR01MB5716935D4C2CC85A6143073F94EF9@OS0PR01MB5716.jpnprd01.prod.outlook.com
The recursion in cdissect() was careless about clearing match data
for capturing parentheses after rejecting a partial match. This
could allow a later back-reference to succeed when by rights it
should fail for lack of a defined referent.
To fix, think a little more rigorously about what the contract
between different levels of cdissect's recursion needs to be.
With the right spec, we can fix this using fewer rather than more
resets of the match data; the key decision being that a failed
sub-match is now explicitly responsible for clearing any matches
it may have set.
There are enough other cross-checks and optimizations in the code
that it's not especially easy to exhibit this problem; usually, the
match will fail as-expected. Plus, regexps that are even potentially
vulnerable are most likely user errors, since there's just not much
point in writing a back-ref that doesn't always have a referent.
These facts perhaps explain why the issue hasn't been detected,
even though it's almost certainly a couple of decades old.
Discussion: https://postgr.es/m/151435.1629733387@sss.pgh.pa.us
WAL records may span multiple segments, but XLogWrite() does not
wait for the entire record to be written out to disk before
creating archive status files. Instead, as soon as the last WAL page of
the segment is written, the archive status file is created, and the
archiver may process it. If PostgreSQL crashes before it is able to
write and flush the rest of the record (in the next WAL segment), the
wrong version of the first segment file lingers in the archive, which
causes operations such as point-in-time restores to fail.
To fix this, keep track of records that span across segments and ensure
that segments are only marked ready-for-archival once such records have
been completely written to disk.
This has always been wrong, so backpatch all the way back.
Author: Nathan Bossart <bossartn@amazon.com>
Reviewed-by: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Reviewed-by: Ryo Matsumura <matsumura.ryo@fujitsu.com>
Reviewed-by: Andrey Borodin <x4mmm@yandex-team.ru>
Discussion: https://postgr.es/m/CBDDFA01-6E40-46BB-9F98-9340F4379505@amazon.com
In a backup manifest, WAL-Ranges stores the range of WAL that is
required for the backup to be valid. pg_verifybackup would then
internally use pg_waldump for the checks based on this data.
When the timeline where the backup started was greater than 1, with a
history file examined for the manifest data generation, the
calculation of the WAL range for the first timeline to check was
incorrect. The previous logic used as start LSN the start position of
the first timeline, but it needs to use the start LSN of the backup.
This would cause failures with pg_verifybackup, or with any tools
making use of the backup manifests.
This commit adds a test based on a self-promoted node, making it
rather cheap.
Author: Kyotaro Horiguchi
Discussion: https://postgr.es/m/20210818.143031.1867083699202617521.horikyota.ntt@gmail.com
Backpatch-through: 13