GetWALRecordsInfo() and pg_get_wal_fpi_info() can leak memory across
WAL record iterations. Fix this by using a temporary memory context
that's reset for each WAL record iteraion.
Also a use temporary context for loops in GetXLogSummaryStats(). The
number of iterations is a small constant, so the previous behavior was
not a leak, but fix for clarity (but no need to backport).
Backport GetWALRecordsInfo() change to version
15. pg_get_wal_fpi_info() didn't exist in version 15.
Reported-by: Peter Geoghegan
Author: Bharath Rupireddy
Discussion: https://www.postgresql.org/message-id/CAH2-WznLEJjn7ghmKOABOEZYuJvkTk%3DGKU3m0%2B-XBAH%2BerPiJQ%40mail.gmail.com
Backpatch-through: 15
This function is able to extract the full page images from a range of
records, specified as of input arguments start_lsn and end_lsn. Like
the other functions of this module, an error is returned if using LSNs
that do not reflect real system values. All the FPIs stored in a single
record are extracted.
The module's version is bumped to 1.1.
Author: Bharath Rupireddy
Reviewed-by: Bertrand Drouvot
Discussion: https://postgr.es/m/CALj2ACVCcvzd7WiWvD=6_7NBvVB_r6G0EGSxL4F8vosAi6Se4g@mail.gmail.com
Per discussion, the existing routine name able to initialize a SRF
function with materialize mode is unpopular, so rename it. Equally, the
flags of this function are renamed, as of:
- SRF_SINGLE_USE_EXPECTED -> MAT_SRF_USE_EXPECTED_DESC
- SRF_SINGLE_BLESS -> MAT_SRF_BLESS
The previous function and flags introduced in 9e98583 are kept around
for compatibility purposes, so as any extension code already compiled
with v15 continues to work as-is. The declarations introduced here for
compatibility will be removed from HEAD in a follow-up commit.
The new names have been suggested by Andres Freund and Melanie
Plageman.
Discussion: https://postgr.es/m/20221013194820.ciktb2sbbpw7cljm@awork3.anarazel.de
Backpatch-through: 15
pg_walinspect uses datatype double (double precision floating point
number) for WAL stats percentile calculations and expose them via
float4 (single precision floating point number), which an unnecessary
loss of precision and confusing. Even though, it's harmless that way,
let's use float8 (double precision floating-point number) to be in
sync with what pg_walinspect does internally and what it exposes to
the users. This seems to be the pattern used elsewhere in the code.
Reported-by: Peter Eisentraut
Author: Bharath Rupireddy
Reviewed-by: Peter Eisentraut
Discussion: https://www.postgresql.org/message-id/36ee692b-232f-0484-ce94-dc39d82021ad%40enterprisedb.com
Usage of ReadNextXLogRecord()'s first_record parameter for error
reporting isn't always correct. For instance, in GetWALRecordsInfo()
and GetWalStats(), we're reading multiple records, and first_record
is always passed as the LSN of the first record which is then used
for error reporting for later WAL record read failures. This isn't
correct.
The correct parameter to use for error reports in case of WAL
reading failures is xlogreader->EndRecPtr. This change fixes it.
While on it, removed an unnecessary Assert in pg_walinspect code.
Reported-by: Robert Haas
Author: Bharath Rupireddy
Reviewed-by: Robert Haas
Discussion: https://www.postgresql.org/message-id/CA%2BTgmoZAOGzPUifrcZRjFZ2vbtcw3mp-mN6UgEoEcQg6bY3OVg%40mail.gmail.com
Backpatch-through: 15
This replaces all MemSet() calls with struct initialization where that
is easily and obviously possible. (For example, some cases have to
worry about padding bits, so I left those.)
(The same could be done with appropriate memset() calls, but this
patch is part of an effort to phase out MemSet(), so it doesn't touch
memset() calls.)
Reviewed-by: Ranier Vilela <ranier.vf@gmail.com>
Reviewed-by: Alvaro Herrera <alvherre@alvh.no-ip.org>
Discussion: https://www.postgresql.org/message-id/9847b13c-b785-f4e2-75c3-12ec77a3b05c@enterprisedb.com
Instability in the test for pg_walinspect revealed that
pg_get_wal_records_info_till_end_of_wal(x) would try to decode all the
records with a start LSN earlier than the flush LSN, even though that
might include a partial record at the end of the range. In that case,
read_local_xlog_page_no_wait() would return NULL when it tried to read
past the flush LSN, which would be interpreted as an error by the
caller. That caused a test failure only on a BF animal that had been
restarted recently, but could be expected to happen in the wild quite
easily depending on the alignment of various parameters.
Fix by using private data in read_local_xlog_page_no_wait() to signal
end-of-wal to the caller, so that it can be properly distinguished
from a real error.
Discussion: https://postgr.es/m/Ymd/e5eeZMNAkrXo%40paquier.xyz
Discussion: https://postgr.es/m/111657.1650910309@sss.pgh.pa.us
Authors: Thomas Munro, Bharath Rupireddy.