Add hardening to the heapam index tuple deletion path to catch TIDs in
index pages that point to a heap item that index tuples should never
point to. The corruption we're trying to catch here is particularly
tricky to detect, since it typically involves "extra" (corrupt) index
tuples, as opposed to the absence of required index tuples in the index.
For example, a heap TID from an index page that turns out to point to an
LP_UNUSED item in the heap page has a good chance of being caught by one
of the new checks. There is a decent chance that the recently fixed
parallel VACUUM bug (see commit 9bacec15) would have been caught had
that particular check been in place for Postgres 14. No backpatch of
this extra hardening for now, though.
Author: Peter Geoghegan <pg@bowt.ie>
Reviewed-By: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/CAH2-Wzk-4_raTzawWGaiqNvkpwDXxv3y1AQhQyUeHfkU=tFCeA@mail.gmail.com
These assertions document (and verify) our high level assumptions about
how pruning can and cannot affect existing items from target heap pages.
For example, one of the new assertions verifies that pruning does not
set a heap-only tuple to LP_DEAD.
Author: Peter Geoghegan <pg@bowt.ie>
Reviewed-By: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/CAH2-Wz=vhvBx1GjF+oueHh8YQcHoQYrMi0F0zFMHEr8yc4sCoA@mail.gmail.com
Commit b4af70cb inverted the return value of the function
parallel_processing_is_safe(), but missed the amvacuumcleanup test.
Index AMs that don't support parallel cleanup at all were affected.
The practical consequences of this bug were not very serious. Hash
indexes are affected, but since they just return the number of blocks
during hashvacuumcleanup anyway, it can't have had much impact.
Author: Masahiko Sawada <sawada.mshk@gmail.com>
Discussion: https://postgr.es/m/CAD21AoA-Em+aeVPmBbL_s1V-ghsJQSxYL-i3JP8nTfPiD1wjKw@mail.gmail.com
Backpatch: 14-, where commit b4af70cb appears.
Commit b4af70cb, which simplified state managed by VACUUM, performed
refactoring of parallel VACUUM in passing. Confusion about the exact
details of the tasks that the leader process is responsible for led to
code that made it possible for parallel VACUUM to miss a subset of the
table's indexes entirely. Specifically, indexes that fell under the
min_parallel_index_scan_size size cutoff were missed. These indexes are
supposed to be vacuumed by the leader (alongside any parallel unsafe
indexes), but weren't vacuumed at all. Affected indexes could easily
end up with duplicate heap TIDs, once heap TIDs were recycled for new
heap tuples. This had generic symptoms that might be seen with almost
any index corruption involving structural inconsistencies between an
index and its table.
To fix, make sure that the parallel VACUUM leader process performs any
required index vacuuming for indexes that happen to be below the size
cutoff. Also document the design of parallel VACUUM with these
below-size-cutoff indexes.
It's unclear how many users might be affected by this bug. There had to
be at least three indexes on the table to hit the bug: a smaller index,
plus at least two additional indexes that themselves exceed the size
cutoff. Cases with just one additional index would not run into
trouble, since the parallel VACUUM cost model requires two
larger-than-cutoff indexes on the table to apply any parallel
processing. Note also that autovacuum was not affected, since it never
uses parallel processing.
Test case based on tests from a larger patch to test parallel VACUUM by
Masahiko Sawada.
Many thanks to Kamigishi Rei for her invaluable help with tracking this
problem down.
Author: Peter Geoghegan <pg@bowt.ie>
Author: Masahiko Sawada <sawada.mshk@gmail.com>
Reported-By: Kamigishi Rei <iijima.yun@koumakan.jp>
Reported-By: Andrew Gierth <andrew@tao11.riddles.org.uk>
Diagnosed-By: Andres Freund <andres@anarazel.de>
Bug: #17245
Discussion: https://postgr.es/m/17245-ddf06aaf85735f36@postgresql.org
Discussion: https://postgr.es/m/20211030023740.qbnsl2xaoh2grq3d@alap3.anarazel.de
Backpatch: 14-, where the refactoring commit appears.
We don't modify any shared state in this function which could cause
problems for any concurrent session. This will make it look similar to the
other updates for the same structure (TransactionState) which avoids
confusion for future readers of code.
Author: Dilip Kumar
Reviewed-by: Amit Kapila
Discussion: https://postgr.es/m/E1mSoYz-0007Fh-D9@gemulon.postgresql.org
Commit 0bead9af48 introduced XLOG_INCLUDE_XID flag to indicate that the
WAL record contains subXID-to-topXID association. It uses that flag later
to mark in CurrentTransactionState that top-xid is logged so that we
should not try to log it again with the next WAL record in the current
subtransaction. However, we can use a localized variable to pass that
information.
In passing, change the related function and variable names to make them
consistent with what the code is actually doing.
Author: Dilip Kumar
Reviewed-by: Alvaro Herrera, Amit Kapila
Discussion: https://postgr.es/m/E1mSoYz-0007Fh-D9@gemulon.postgresql.org
The unicode characters, while in comments and not code, caused MSVC
to emit compiler warning C4819:
The file contains a character that cannot be represented in the
current code page (number). Save the file in Unicode format to
prevent data loss.
Fix by replacing the characters in print.c with descriptive comments
containing the codepoints and symbol names, and remove the character
in brin_bloom.c which was a footnote reference copied from the paper
citation.
Per report from hamerkop in the buildfarm.
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/340E4118-0D0C-4E85-8141-8C40EB22DA3A@yesql.se
Commit d168b66682, which overhauled index deletion, added a
pg_unreachable() to the end of a sort comparator used when sorting heap
TIDs from an index page. This allows the compiler to apply
optimizations that assume that the heap TIDs from the index AM must
always be unique.
That doesn't seem like a good idea now, given recent reports of
corruption involving duplicate TIDs in indexes on Postgres 14. Demote
to an assertion, just in case.
Backpatch: 14-, where index deletion was overhauled.
Comments above _bt_findinsertloc() that talk about LP_DEAD items are now
out of place. We already discuss index tuple deletion at an earlier
point in the same comment block.
Oversight in commit d168b666.
Users sometimes get concerned whe they start the server and it
emits a few messages and then doesn't emit any more messages for
a long time. Generally, what's happening is either that the
system is taking a long time to apply WAL, or it's taking a
long time to reset unlogged relations, or it's taking a long
time to fsync the data directory, but it's not easy to tell
which is the case.
To fix that, add a new 'log_startup_progress_interval' setting,
by default 10s. When an operation that is known to be potentially
long-running takes more than this amount of time, we'll log a
status update each time this interval elapses.
To avoid undesirable log chatter, don't log anything about WAL
replay when in standby mode.
Nitin Jadhav and Robert Haas, reviewed by Amul Sul, Bharath
Rupireddy, Justin Pryzby, Michael Paquier, and Álvaro Herrera.
Discussion: https://postgr.es/m/CA+TgmoaHQrgDFOBwgY16XCoMtXxsrVGFB2jNCvb7-ubuEe1MGg@mail.gmail.com
Discussion: https://postgr.es/m/CAMm1aWaHF7VE69572_OLQ+MgpT5RUiUDgF1x5RrtkJBLdpRj3Q@mail.gmail.com
All the code that runs in the startup process to write WAL records
before that's allowed generally is now consecutive, so there's no
reason to shut the facility to write WAL locally off and then turn
it on again three times in a row.
Unfortunately, this requires a slight kludge in the checkpointer,
which needs to separately enable writing WAL in order to write the
checkpoint record. Because that code might run in the same process
as StartupXLOG() if we are in single-user mode, we must save/restore
the state of the LocalXLogInsertAllowed flag. Hopefully, we'll be
able to eliminate this wart in further refactoring, but it's
not too bad anyway.
Amul Sul, with modifications by me.
Discussion: http://postgr.es/m/CAAJ_b97fysj6sRSQEfOHj-y8Jfd5uPqOgO74qast89B4WfD+TA@mail.gmail.com
This does a better job grouping related operations together, since
all of the WAL records that we need to write prior to allowing WAL
writes generally and written by a single uninterrupted stretch of code.
Since CleanupAfterArchiveRecovery() just (1) runs recovery_end_command,
(2) removes non-parent xlog files, and (3) archives any final partial
segment, this should be safe, because all of those things are pretty
much unrelated to the WAL record written by XLogReportParameters().
Amul Sul, per a suggestion from me
Discussion: http://postgr.es/m/CAAJ_b97fysj6sRSQEfOHj-y8Jfd5uPqOgO74qast89B4WfD+TA@mail.gmail.com
The purpose of commit 8a54e12a38 was to
fix this, and it sufficed when the PREPARE TRANSACTION completed before
the CIC looked for lock conflicts. Otherwise, things still broke. As
before, in a cluster having used CIC while having enabled prepared
transactions, queries that use the resulting index can silently fail to
find rows. It may be necessary to reindex to recover from past
occurrences; REINDEX CONCURRENTLY suffices. Fix this for future index
builds by making CIC wait for arbitrarily-recent prepared transactions
and for ordinary transactions that may yet PREPARE TRANSACTION. As part
of that, have PREPARE TRANSACTION transfer locks to its dummy PGPROC
before it calls ProcArrayClearTransaction(). Back-patch to 9.6 (all
supported versions).
Andrey Borodin, reviewed (in earlier versions) by Andres Freund.
Discussion: https://postgr.es/m/01824242-AA92-4FE9-9BA7-AEBAFFEA3D0C@yandex-team.ru
During a replication slot creation, an ERROR generated in the same
transaction as the one creating a to-be-exported snapshot would have
left the backend in an inconsistent state, as the associated static
export snapshot state was not being reset on transaction abort, but only
on the follow-up command received by the WAL sender that created this
snapshot on replication slot creation. This would trigger inconsistency
failures if this session tried to export again a snapshot, like during
the creation of a replication slot.
Note that a snapshot export cannot happen in a transaction block, so
there is no need to worry resetting this state for subtransaction
aborts. Also, this inconsistent state would very unlikely show up to
users. For example, one case where this could happen is an
out-of-memory error when building the initial snapshot to-be-exported.
Dilip found this problem while poking at a different patch, that caused
an error in this code path for reasons unrelated to HEAD.
Author: Dilip Kumar
Reviewed-by: Michael Paquier, Zhihong Yu
Discussion: https://postgr.es/m/CAFiTN-s0zA1Kj0ozGHwkYkHwa5U0zUE94RSc_g81WrpcETB5=w@mail.gmail.com
Backpatch-through: 9.6
CreateOverwriteContrecordRecord(), UpdateFullPageWrites(),
PerformRecoveryXLogAction(), and CleanupAfterArchiveRecovery()
are moved somewhat later in StartupXLOG(). This is preparatory
work for a future patch that wants to allow recovery to end at one
time and only later start to allow WAL writes. To do that, it's
necessary to separate code that has to do with allowing WAL writes
from other things that need to happen simply because recovery is
ending, such as initializing shared memory data structures that
depend on information that might not be accurate before redo is
complete.
This commit does not achieve that goal, but it is a step in that
direction. For example, there are a few different bits of code that
write things into WAL once we have finished recovery, and with this
change, those bits of code are closer to each other than previously,
with fewer unrelated bits of code interspersed.
Robert Haas and Amul Sul
Discussion: http://postgr.es/m/CAAJ_b97abMuq=470Wahun=aS1PHTSbStHtrjjPaD-C0YQ1AqVw@mail.gmail.com
Create a new function PerformRecoveryXLogAction() and move the
code which either writes an end-of-recovery record or requests a
checkpoint there.
Also create a new function CleanupAfterArchiveRecovery() to
perform a few tasks that we want to do after we've actually exited
archive recovery but before we start accepting new WAL writes.
More refactoring of this file is planned, but this commit is
just straightforward code movement to make StartupXLOG() a
little bit shorter and a little bit easier to understand.
Robert Haas and Amul Sul
Discussion: http://postgr.es/m/CAAJ_b97abMuq=470Wahun=aS1PHTSbStHtrjjPaD-C0YQ1AqVw@mail.gmail.com
Commit 0668719801 changed XLogPageRead() so that it validated the page
header, if invalid page header was found reset the error message and
retried reading the page, to fix the scenario where streaming standby
got stuck at a continuation record. This change hid the error message
about invalid page header, which would make it harder for users to
investigate what the actual issue was found in WAL.
To fix the issue, this commit makes XLogPageRead() report the error
message when invalid page header is found.
When not in standby mode, an invalid page header should cause recovery
to end, not retry reading the page, so XLogPageRead() doesn't need to
validate the page header for the retry. Instead, ReadPageInternal() should
be responsible for the validation in that case. Therefore this commit
changes XLogPageRead() so that if not in standby mode it doesn't validate
the page header for the retry.
Reported-by: Yugo Nagata
Author: Yugo Nagata, Kyotaro Horiguchi
Reviewed-by: Ranier Vilela, Fujii Masao
Discussion: https://postgr.es/m/20210718045505.32f463ed6c227111038d8ae4@sraoss.co.jp
Some specific logic is done at the end of recovery when involving 2PC
transactions:
1) Call RecoverPreparedTransactions(), to recover the state of 2PC
transactions into memory (re-acquire locks, etc.).
2) ShutdownRecoveryTransactionEnvironment(), to move back to normal
operations, mainly cleaning up recovery locks and KnownAssignedXids
(including any 2PC transaction tracked previously).
3) Switch XLogCtl->SharedRecoveryState to RECOVERY_STATE_DONE, which is
the tipping point for any process calling RecoveryInProgress() to check
if the cluster is still in recovery or not.
Any snapshot taken between steps 2) and 3) would be empty, causing any
transaction relying on a snapshot at this point to potentially corrupt
data as there could still be some 2PC transactions to track, with
RecentXmin moving backwards on successive calls to GetSnapshotData() in
the same transaction.
As SharedRecoveryState is the point to take into account to know if it
is safe to discard KnownAssignedXids, this commit moves step 2) after
step 3), so as we can never finish with empty snapshots.
This exists since the introduction of hot standby, so backpatch all the
way down. The window with incorrect snapshots is extremely small, but I
have seen it when running 023_pitr_prepared_xact.pl, as did buildfarm
member fairywren. Thomas Munro also found it independently. Special
thanks to Andres Freund for taking the time to analyze this issue.
Reported-by: Thomas Munro, Michael Paquier
Analyzed-by: Andres Freund
Discussion: https://postgr.es/m/20210422203603.fdnh3fu2mmfp2iov@alap3.anarazel.de
Backpatch-through: 9.6
The "equality implies image equality" opclass infrastructure disallowed
deduplication in system catalog indexes and TOAST indexes before now.
That seemed like the right approach back when the infrastructure was
added by commit 612a1ab7, since ALTER INDEX cannot set deduplicate_items
to 'off' (due to an old implementation restriction). But that decision
now seems arbitrary at best. Remove special case handling implementing
this policy.
No catversion bump, since existing catalog indexes will still work.
Author: Peter Geoghegan <pg@bowt.ie>
Discussion: https://postgr.es/m/CAH2-Wz=rYQHFaJ3WYBdK=xgwxKzaiGMSSrh-ZCREa-pS-7Zjew@mail.gmail.com
Commit ff9f111bce added some test code that's unportable and doesn't
add meaningful coverage. Remove it rather than try and get it to work
everywhere.
While at it, fix a typo in a log message added by the aforementioned
commit.
Backpatch to 14.
Discussion: https://postgr.es/m/3000074.1632947632@sss.pgh.pa.us
Commit 84f5c2908 forgot to consider the possibility that
EnsurePortalSnapshotExists could run inside a subtransaction with
lifespan shorter than the Portal's. In that case, the new active
snapshot would be popped at the end of the subtransaction, leaving
a dangling pointer in the Portal, with mayhem ensuing.
To fix, make sure the ActiveSnapshot stack entry is marked with
the same subtransaction nesting level as the associated Portal.
It's certainly safe to do so since we won't be here at all unless
the stack is empty; hence we can't create an out-of-order stack.
Let's also apply this logic in the case where PortalRunUtility
sets portalSnapshot, just to be sure that path can't cause similar
problems. It's slightly less clear that that path can't create
an out-of-order stack, so add an assertion guarding it.
Report and patch by Bertrand Drouvot (with kibitzing by me).
Back-patch to v11, like the previous commit.
Discussion: https://postgr.es/m/ff82b8c5-77f4-3fe7-6028-fcf3303e82dd@amazon.com
Physical replication always ships WAL segment files to replicas once
they are complete. This is a problem if one WAL record is split across
a segment boundary and the primary server crashes before writing down
the segment with the next portion of the WAL record: WAL writing after
crash recovery would happily resume at the point where the broken record
started, overwriting that record ... but any standby or backup may have
already received a copy of that segment, and they are not rewinding.
This causes standbys to stop following the primary after the latter
crashes:
LOG: invalid contrecord length 7262 at A8/D9FFFBC8
because the standby is still trying to read the continuation record
(contrecord) for the original long WAL record, but it is not there and
it will never be. A workaround is to stop the replica, delete the WAL
file, and restart it -- at which point a fresh copy is brought over from
the primary. But that's pretty labor intensive, and I bet many users
would just give up and re-clone the standby instead.
A fix for this problem was already attempted in commit 515e3d84a0, but
it only addressed the case for the scenario of WAL archiving, so
streaming replication would still be a problem (as well as other things
such as taking a filesystem-level backup while the server is down after
having crashed), and it had performance scalability problems too; so it
had to be reverted.
This commit fixes the problem using an approach suggested by Andres
Freund, whereby the initial portion(s) of the split-up WAL record are
kept, and a special type of WAL record is written where the contrecord
was lost, so that WAL replay in the replica knows to skip the broken
parts. With this approach, we can continue to stream/archive segment
files as soon as they are complete, and replay of the broken records
will proceed across the crash point without a hitch.
Because a new type of WAL record is added, users should be careful to
upgrade standbys first, primaries later. Otherwise they risk the standby
being unable to start if the primary happens to write such a record.
A new TAP test that exercises this is added, but the portability of it
is yet to be seen.
This has been wrong since the introduction of physical replication, so
backpatch all the way back. In stable branches, keep the new
XLogReaderState members at the end of the struct, to avoid an ABI
break.
Author: Álvaro Herrera <alvherre@alvh.no-ip.org>
Reviewed-by: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Reviewed-by: Nathan Bossart <bossartn@amazon.com>
Discussion: https://postgr.es/m/202108232252.dh7uxf6oxwcy@alvherre.pgsql
Discussing the low level issue of nbtree VACUUM and recovery conflicts
in btvacuumpage() now seems inappropriate. The same issue is discussed
in nbtxlog.h, as well as in a comment block above _bt_delitems_vacuum().
The comment block made more sense when it was part of a broader
discussion of nbtree VACUUM "pin scans". These were removed by commit
9f83468b.
_bt_delitems_delete() is no longer the high-level entry point used by
index tuple deletion driven by index tuples whose LP_DEAD bits are set
(now called "simple index tuple deletion"). It became a lower level
routine that's only called by _bt_delitems_delete_check() following
commit d168b66682.
Remove obsolete reference to lazy_vacuum()'s onecall argument. The
function argument was removed by commit 3499df0dee.
Also remove adjoining comment block that introduces the wraparound
failsafe concept. Talking about the failsafe here no longer makes
sense, since lazy_vacuum() (and related functions) are no longer the
only place where the failsafe might be triggered. This has been the
case since commit c242baa4a8 taught VACUUM to consider triggering the
failsafe mechanism during its initial heap scan.
Point out that index tuple deletion generally needs a latestRemovedXid
value for the deletion operation's WAL record. This is bound to be the
most expensive part of the whole deletion operation now that it takes
place up front, during original execution.
This was arguably an oversight in commit 558a9165e0, which moved the
work required to generate these values from index deletion REDO routines
to original execution of index deletion operations.
Checking that an offset number isn't past the end of a heap page's line
pointer array was just a defensive sanity check for HOT-chain traversal
code before commit 3c3b8a4b. It's etrictly necessary now, though. Add
comments that reference the issue to code in heapam that needs to get it
right.
Per suggestion from Alexander Lakhin.
Discussion: https://postgr.es/m/f76a292c-9170-1aef-91a0-59d9443b99a3@gmail.com
It is not appropriate for deduplication to apply single value strategy
when triggered by a bottom-up index deletion pass. This wastes cycles
because later bottom-up deletion passes will overinterpret older
duplicate tuples that deduplication actually just skipped over "by
design". It also makes bottom-up deletion much less effective for low
cardinality indexes that happen to cross a meaningless "index has single
key value per leaf page" threshold.
To fix, slightly narrow the conditions under which deduplication's
single value strategy is considered. We already avoided the strategy
for a unique index, since our high level goal must just be to buy time
for VACUUM to run (not to buy space). We'll now also avoid it when we
just had a bottom-up pass that reported failure. The two cases share
the same high level goal, and already overlapped significantly, so this
approach is quite natural.
Oversight in commit d168b666, which added bottom-up index deletion.
Author: Peter Geoghegan <pg@bowt.ie>
Discussion: https://postgr.es/m/CAH2-WznaOvM+Gyj-JQ0X=JxoMDxctDTYjiEuETdAGbF5EUc3MA@mail.gmail.com
Backpatch: 14-, where bottom-up deletion was introduced.
I noticed that commit 0bead9af48 left this flag undocumented in
XLogSetRecordFlags, which led me to discover that the flag doesn't
actually do what the one comment on it said it does. Improve the
situation by adding some more comments.
Backpatch to 14, where the aforementioned commit appears.
Author: Álvaro Herrera <alvherre@alvh.no-ip.org>
Discussion: https://postgr.es/m/202109212119.c3nhfp64t2ql@alvherre.pgsql
A broken HOT chain is not an unexpected condition, even when the offset
number points past the end of the page's line pointer array.
heap_prune_chain() does not (and never has) treated this condition as
unexpected, so derivative code in heap_index_delete_tuples() shouldn't
do so either.
Oversight in commit 4228817449.
The assertion can probably only fail on Postgres 14 and master. Earlier
releases don't have commit 3c3b8a4b, which taught VACUUM to truncate the
line pointer array of heap pages. Backpatch all the same, just to be
consistent.
Author: Peter Geoghegan <pg@bowt.ie>
Reported-By: Alexander Lakhin <exclusion@gmail.com>
Discussion: https://postgr.es/m/17197-9438f31f46705182@postgresql.org
Backpatch: 12-, just like commit 4228817449.
Formerly, we sent signals for outgoing NOTIFY messages within
ProcessCompletedNotifies, which was also responsible for sending
relevant ones of those messages to our connected client. It therefore
had to run during the main-loop processing that occurs just before
going idle. This arrangement had two big disadvantages:
* Now that procedures allow intra-command COMMITs, it would be
useful to send NOTIFYs to other sessions immediately at COMMIT
(though, for reasons of wire-protocol stability, we still shouldn't
forward them to our client until end of command).
* Background processes such as replication workers would not send
NOTIFYs at all, since they never execute the client communication
loop. We've had requests to allow triggers running in replication
workers to send NOTIFYs, so that's a problem.
To fix these things, move transmission of outgoing NOTIFY signals
into AtCommit_Notify, where it will happen during CommitTransaction.
Also move the possible call of asyncQueueAdvanceTail there, to
ensure we don't bloat the async SLRU if a background worker sends
many NOTIFYs with no one listening.
We can also drop the call of asyncQueueReadAllNotifications,
allowing ProcessCompletedNotifies to go away entirely. That's
because commit 790026972 added a call of ProcessNotifyInterrupt
adjacent to PostgresMain's call of ProcessCompletedNotifies,
and that does its own call of asyncQueueReadAllNotifications,
meaning that we were uselessly doing two such calls (inside two
separate transactions) whenever inbound notify signals coincided
with an outbound notify. We need only set notifyInterruptPending
to ensure that ProcessNotifyInterrupt runs, and we're done.
The existing documentation suggests that custom background workers
should call ProcessCompletedNotifies if they want to send NOTIFY
messages. To avoid an ABI break in the back branches, reduce it
to an empty routine rather than removing it entirely. Removal
will occur in v15.
Although the problems mentioned above have existed for awhile,
I don't feel comfortable back-patching this any further than v13.
There was quite a bit of churn in adjacent code between 12 and 13.
At minimum we'd have to also backpatch 51004c717, and a good deal
of other adjustment would also be needed, so the benefit-to-risk
ratio doesn't look attractive.
Per bug #15293 from Michael Powers (and similar gripes from others).
Artur Zakirov and Tom Lane
Discussion: https://postgr.es/m/153243441449.1404.2274116228506175596@wrigleys.postgresql.org
All the code paths simplified here were already using a boolean or used
an expression that led to zero or one, making the extra bits
unnecessary.
Author: Justin Pryzby
Reviewed-by: Tom Lane, Michael Paquier, Peter Smith
Discussion: https://postgr.es/m/20210428182936.GE27406@telsasoft.com
Attempting to make hashfloat4() look as much as possible like
hashfloat8(), I'd figured I could replace NaNs with get_float4_nan()
before widening to float8. However, results from protosciurus
and topminnow show that on some platforms that produces a different
bit-pattern from get_float8_nan(), breaking the intent of ce773f230.
Rearrange so that we use the result of get_float8_nan() for all NaN
cases. As before, back-patch.
The IEEE 754 standard allows a wide variety of bit patterns for NaNs,
of which at least two ("NaN" and "-NaN") are pretty easy to produce
from SQL on most machines. This is problematic because our btree
comparison functions deem all NaNs to be equal, but our float hash
functions know nothing about NaNs and will happily produce varying
hash codes for them. That causes unexpected results from queries
that hash a column containing different NaN values. It could also
produce unexpected lookup failures when using a hash index on a
float column, i.e. "WHERE x = 'NaN'" will not find all the rows
it should.
To fix, special-case NaN in the float hash functions, not too much
unlike the existing special case that forces zero and minus zero
to hash the same. I arranged for the most vanilla sort of NaN
(that coming from the C99 NAN constant) to still have the same
hash code as before, to reduce the risk to existing hash indexes.
I dithered about whether to back-patch this into stable branches,
but ultimately decided to do so. It's a clear improvement for
queries that hash internally. If there is anybody who has -NaN
in a hash index, they'd be well advised to re-index after applying
this patch ... but the misbehavior if they don't will not be much
worse than the misbehavior they had before.
Per bug #17172 from Ma Liangzhu.
Discussion: https://postgr.es/m/17172-7505bea9e04e230f@postgresql.org
It doesn't make any sense to report this information, since VACUUM
VERBOSE reports on heap relation truncation directly. This was an
oversight in commit 7ab96cf6, which made VACUUM VERBOSE output a little
more consistent with nearby autovacuum-specific log output. Adjust
comments that describe how this is supposed to work in passing.
Also bring truncation-related VACUUM VERBOSE output in line with the
convention established for VACUUM VERBOSE output by commit f4f4a649.
Author: Peter Geoghegan <pg@bowt.ie>
Backpatch: 14-, where VACUUM VERBOSE's output changed.
The field hasn't been used since commit 3d351d91, which redefined
pg_class.reltuples to be -1 before the first VACUUM or ANALYZE.
Also rename a local variable of the same name ("old_rel_pages"). This is
used by relation truncation to represent the original relation size at
the start of the ongoing VACUUM operation. Rename it to orig_rel_pages,
since that's a lot clearer. (This name matches similar nearby code.)
Most data-corruption reports mention the location of the problem, but
this one failed to. Add it.
Backpatch all the way back. In 12 and older, also assign the
ERRCODE_DATA_CORRUPTED error code as was done in commit fd6ec93bf8 for
13 and later.
Discussion: https://postgr.es/m/202108191637.oqyzrdtnheir@alvherre.pgsql
Somehow, spgist overlooked the need to call pgstat_count_index_scan().
Hence, pg_stat_all_indexes.idx_scan and equivalent columns never
became nonzero for an SP-GiST index, although the related per-tuple
counters worked fine.
This fix works a bit differently from other index AMs, in that the
counter increment occurs in spgrescan not spggettuple/spggetbitmap.
It looks like this won't make the user-visible semantics noticeably
different, so I won't go to the trouble of introducing an is-this-
the-first-call flag just to make the counter bumps happen in the
same places.
Per bug #17163 from Christian Quest. Back-patch to all supported
versions.
Discussion: https://postgr.es/m/17163-b8c5cc88322a5e92@postgresql.org