Commit Graph

1020 Commits

Author SHA1 Message Date
Robert Haas 1fa092913d Don't export basebackup.c's sendTablespace().
Commit 72d422a522 made xlog.c call
sendTablespace() with the 'sizeonly' argument set to true, which
required basebackup.c to export sendTablespace(). However, that's
kind of ugly, so instead defer the call to sendTablespace() until
basebackup.c regains control. That way, it can still be a static
function.

Patch by me, reviewed by Amit Kapila and Kyotaro Horiguchi.

Discussion: http://postgr.es/m/CA+TgmoYq+59SJ2zBbP891ngWPA9fymOqntqYcweSDYXS2a620A@mail.gmail.com
2020-06-17 10:57:34 -04:00
Thomas Munro 57cb806308 Fix locking bugs that could corrupt pg_control.
The redo routines for XLOG_CHECKPOINT_{ONLINE,SHUTDOWN} must acquire
ControlFileLock before modifying ControlFile->checkPointCopy, or the
checkpointer could write out a control file with a bad checksum.

Likewise, XLogReportParameters() must acquire ControlFileLock before
modifying ControlFile and calling UpdateControlFile().

Back-patch to all supported releases.

Author: Nathan Bossart <bossartn@amazon.com>
Author: Fujii Masao <masao.fujii@oss.nttdata.com>
Reviewed-by: Fujii Masao <masao.fujii@oss.nttdata.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Thomas Munro <thomas.munro@gmail.com>
Discussion: https://postgr.es/m/70BF24D6-DC51-443F-B55A-95735803842A%40amazon.com
2020-06-08 13:57:24 +12:00
Tom Lane 5cbfce562f Initial pgindent and pgperltidy run for v13.
Includes some manual cleanup of places that pgindent messed up,
most of which weren't per project style anyway.

Notably, it seems some people didn't absorb the style rules of
commit c9d297751, because there were a bunch of new occurrences
of function calls with a newline just after the left paren, all
with faulty expectations about how the rest of the call would get
indented.
2020-05-14 13:06:50 -04:00
Tom Lane 29c3e2dd5a Collect built-in LWLock tranche names statically, not dynamically.
There is little point in using the LWLockRegisterTranche mechanism for
built-in tranche names.  It wastes cycles, it creates opportunities for
bugs (since failing to register a tranche name is a very hard-to-detect
problem), and the lack of any centralized list of names encourages
sloppy nonconformity in name choices.  Moreover, since we have a
centralized list of the tranches anyway in enum BuiltinTrancheIds, we're
certainly not buying any flexibility in return for these disadvantages.

Hence, nuke all the backend-internal LWLockRegisterTranche calls,
and instead provide a const array of the builtin tranche names.

(I have in mind to change a bunch of these names shortly, but this
patch is just about getting them into one place.)

Discussion: https://postgr.es/m/9056.1589419765@sss.pgh.pa.us
2020-05-14 11:10:31 -04:00
Alvaro Herrera b060dbe000
Rework XLogReader callback system
Code review for 0dc8ead463, prompted by a bug closed by 91c40548d5.

XLogReader's system for opening and closing segments had gotten too
complicated, with callbacks being passed at both the XLogReaderAllocate
level (read_page) as well as at the WALRead level (segment_open).  This
was confusing and hard to follow, so restructure things so that these
callbacks are passed together at XLogReaderAllocate time, and add
another callback to the set (segment_close) to make it a coherent whole.
Also, ensure XLogReaderState is an argument to all the callbacks, so
that they can grab at the ->private data if necessary.

Document the whole arrangement more clearly.

Author: Álvaro Herrera <alvherre@alvh.no-ip.org>
Reviewed-by: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Discussion: https://postgr.es/m/20200422175754.GA19858@alvherre.pgsql
2020-05-08 15:40:11 -04:00
Amit Kapila 69bfaf2e1d Change the display of WAL usage statistics in Explain.
In commit 33e05f89c5, we have added the option to display WAL usage
statistics in Explain and auto_explain.  The display format used two spaces
between each field which is inconsistent with Buffer usage statistics which
is using one space between each field.  Change the format to make WAL usage
statistics consistent with Buffer usage statistics.

This commit also changed the usage of "full page writes" to
"full page images" for WAL usage statistics to make it consistent with
other parts of code and docs.

Author: Julien Rouhaud, Amit Kapila
Reviewed-by: Justin Pryzby, Kyotaro Horiguchi and Amit Kapila
Discussion: https://postgr.es/m/CAB-hujrP8ZfUkvL5OYETipQwA=e3n7oqHFU=4ZLxWS_Cza3kQQ@mail.gmail.com
2020-05-05 08:00:53 +05:30
Peter Eisentraut f057980149 Fix typo
from 303640199d
2020-04-26 13:48:33 +02:00
Michael Paquier 4e87c4836a Fix handling of WAL segments ready to be archived during crash recovery
78ea8b5 has fixed an issue related to the recycling of WAL segments on
standbys depending on archive_mode.  However, it has introduced a
regression with the handling of WAL segments ready to be archived during
crash recovery, causing those files to be recycled without getting
archived.

This commit fixes the regression by tracking in shared memory if a live
cluster is either in crash recovery or archive recovery as the handling
of WAL segments ready to be archived is different in both cases (those
WAL segments should not be removed during crash recovery), and by using
this new shared memory state to decide if a segment can be recycled or
not.  Previously, it was not possible to know if a cluster was in crash
recovery or archive recovery as the shared state was able to track only
if recovery was happening or not, leading to the problem.

A set of TAP tests is added to close the gap here, making sure that WAL
segments ready to be archived are correctly handled when a cluster is in
archive or crash recovery with archive_mode set to "on" or "always", for
both standby and primary.

Reported-by: Benoît Lobréau
Author: Jehan-Guillaume de Rorthais
Reviewed-by: Kyotaro Horiguchi, Fujii Masao, Michael Paquier
Discussion: https://postgr.es/m/20200331172229.40ee00dc@firost
Backpatch-through: 9.5
2020-04-24 08:48:28 +09:00
Amit Kapila ef08ca113f Cosmetic fixups for WAL usage work.
Reported-by: Justin Pryzby and Euler Taveira
Author: Justin Pryzby and Julien Rouhaud
Reviewed-by: Amit Kapila
Discussion: https://postgr.es/m/CAB-hujrP8ZfUkvL5OYETipQwA=e3n7oqHFU=4ZLxWS_Cza3kQQ@mail.gmail.com
2020-04-13 15:31:16 +05:30
Thomas Munro d140f2f3e2 Rationalize GetWalRcv{Write,Flush}RecPtr().
GetWalRcvWriteRecPtr() previously reported the latest *flushed*
location.  Adopt the conventional terminology used elsewhere in the tree
by renaming it to GetWalRcvFlushRecPtr(), and likewise for some related
variables that used the term "received".

Add a new definition of GetWalRcvWriteRecPtr(), which returns the latest
*written* value.  This will allow later patches to use the value for
non-data-integrity purposes, without having to wait for the flush
pointer to advance.

Reviewed-by: Alvaro Herrera <alvherre@2ndquadrant.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/CA%2BhUKGJ4VJN8ttxScUFM8dOKX0BrBiboo5uz1cq%3DAovOddfHpA%40mail.gmail.com
2020-04-08 23:45:09 +12:00
Alexander Korotkov 1aac32df89 Revert 0f5ca02f53
0f5ca02f53 introduces 3 new keywords.  It appears to be too much for relatively
small feature.  Given now we past feature freeze, it's already late for
discussion of the new syntax.  So, revert.

Discussion: https://postgr.es/m/28209.1586294824%40sss.pgh.pa.us
2020-04-08 11:37:27 +03:00
Alvaro Herrera c655077639
Allow users to limit storage reserved by replication slots
Replication slots are useful to retain data that may be needed by a
replication system.  But experience has shown that allowing them to
retain excessive data can lead to the primary failing because of running
out of space.  This new feature allows the user to configure a maximum
amount of space to be reserved using the new option
max_slot_wal_keep_size.  Slots that overrun that space are invalidated
at checkpoint time, enabling the storage to be released.

Author: Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Reviewed-by: Jehan-Guillaume de Rorthais <jgdr@dalibo.com>
Reviewed-by: Álvaro Herrera <alvherre@alvh.no-ip.org>
Discussion: https://postgr.es/m/20170228.122736.123383594.horiguchi.kyotaro@lab.ntt.co.jp
2020-04-07 18:35:00 -04:00
Alexander Korotkov 0f5ca02f53 Implement waiting for given lsn at transaction start
This commit adds following optional clause to BEGIN and START TRANSACTION
commands.

  WAIT FOR LSN lsn [ TIMEOUT timeout ]

New clause pospones transaction start till given lsn is applied on standby.
This clause allows user be sure, that changes previously made on primary would
be visible on standby.

New shared memory struct is used to track awaited lsn per backend.  Recovery
process wakes up backend once required lsn is applied.

Author: Ivan Kartyshov, Anna Akenteva
Reviewed-by: Craig Ringer, Thomas Munro, Robert Haas, Kyotaro Horiguchi
Reviewed-by: Masahiko Sawada, Ants Aasma, Dmitry Ivanov, Simon Riggs
Reviewed-by: Amit Kapila, Alexander Korotkov
Discussion: https://postgr.es/m/0240c26c-9f84-30ea-fca9-93ab2df5f305%40postgrespro.ru
2020-04-07 23:51:10 +03:00
Fujii Masao 4bd0ad9e44 Prevent archive recovery from scanning non-existent WAL files.
Previously when there were multiple timelines listed in the history file
of the recovery target timeline, archive recovery searched all of them,
starting from the newest timeline to the oldest one, to find the segment
to read. That is, archive recovery had to continuously fail scanning
the segment until it reached the timeline that the segment belonged to.
These scans for non-existent segment could be harmful on the recovery
performance especially when archival area was located on the remote
storage and each scan could take a long time.

To address the issue, this commit changes archive recovery so that
it skips scanning the timeline that the segment to read doesn't belong to.

Author: Kyotaro Horiguchi, tweaked a bit by Fujii Masao
Reviewed-by: David Steele, Pavel Suderevsky, Grigory Smolkin
Discussion: https://postgr.es/m/16159-f5a34a3a04dc67e0@postgresql.org
Discussion: https://postgr.es/m/20200129.120222.1476610231001551715.horikyota.ntt@gmail.com
2020-04-08 00:49:29 +09:00
Peter Eisentraut 552fcebff0 Revert "Improve handling of parameter differences in physical replication"
This reverts commit 246f136e76.

That patch wasn't quite complete enough.

Discussion: https://www.postgresql.org/message-id/flat/E1jIpJu-0007Ql-CL%40gemulon.postgresql.org
2020-04-04 09:08:12 +02:00
Amit Kapila df3b181499 Add infrastructure to track WAL usage.
This allows gathering the WAL generation statistics for each statement
execution.  The three statistics that we collect are the number of WAL
records, the number of full page writes and the amount of WAL bytes
generated.

This helps the users who have write-intensive workload to see the impact
of I/O due to WAL.  This further enables us to see approximately what
percentage of overall WAL is due to full page writes.

In the future, we can extend this functionality to allow us to compute the
the exact amount of WAL data due to full page writes.

This patch in itself is just an infrastructure to compute WAL usage data.
The upcoming patches will expose this data via explain, auto_explain,
pg_stat_statements and verbose (auto)vacuum output.

Author: Kirill Bychik, Julien Rouhaud
Reviewed-by: Dilip Kumar, Fujii Masao and Amit Kapila
Discussion: https://postgr.es/m/CAB-hujrP8ZfUkvL5OYETipQwA=e3n7oqHFU=4ZLxWS_Cza3kQQ@mail.gmail.com
2020-04-04 10:02:08 +05:30
Robert Haas 0d8c9c1210 Generate backup manifests for base backups, and validate them.
A manifest is a JSON document which includes (1) the file name, size,
last modification time, and an optional checksum for each file backed
up, (2) timelines and LSNs for whatever WAL will need to be replayed
to make the backup consistent, and (3) a checksum for the manifest
itself. By default, we use CRC-32C when checksumming data files,
because we are trying to detect corruption and user error, not foil an
adversary. However, pg_basebackup and the server-side BASE_BACKUP
command now have options to select a different algorithm, so users
wanting a cryptographic hash function can select SHA-224, SHA-256,
SHA-384, or SHA-512. Users not wanting file checksums at all can
disable them, or disable generating of the backup manifest altogether.
Using a cryptographic hash function in place of CRC-32C consumes
significantly more CPU cycles, which may slow down backups in some
cases.

A new tool called pg_validatebackup can validate a backup against the
manifest. If no checksums are present, it can still check that the
right files exist and that they have the expected sizes. If checksums
are present, it can also verify that each file has the expected
checksum. Additionally, it calls pg_waldump to verify that the
expected WAL files are present and parseable. Only plain format
backups can be validated directly, but tar format backups can be
validated after extracting them.

Robert Haas, with help, ideas, review, and testing from David Steele,
Stephen Frost, Andrew Dunstan, Rushabh Lathia, Suraj Kharage, Tushar
Ahuja, Rajkumar Raghuwanshi, Mark Dilger, Davinder Singh, Jeevan
Chalke, Amit Kapila, Andres Freund, and Noah Misch.

Discussion: http://postgr.es/m/CA+TgmoZV8dw1H2bzZ9xkKwdrk8+XYa+DC9H=F7heO2zna5T6qg@mail.gmail.com
2020-04-03 15:05:59 -04:00
Fujii Masao b0236508d3 Improve the message logged when recovery is paused.
When recovery target is reached and recovery is paused because of
recovery_target_action=pause, executing pg_wal_replay_resume() causes
the standby to promote, i.e., the recovery to end. So, in this case,
the previous message "Execute pg_wal_replay_resume() to continue"
logged was confusing because pg_wal_replay_resume() doesn't cause
the recovery to continue.

This commit improves the message logged when recovery is paused,
and the proper message is output based on what (pg_wal_replay_pause
or recovery_target_action) causes recovery to be paused.

Author: Sergei Kornilov, revised by Fujii Masao
Reviewed-by: Robert Haas
Discussion: https://postgr.es/m/19168211580382043@myt5-b646bde4b8f3.qloud-c.yandex.net
2020-04-01 03:35:13 +09:00
Michael Paquier 616ae3d2b0 Move routine definitions of xlogarchive.c to a new header file
The definitions of the routines defined in xlogarchive.c have been part
of xlog_internal.h which is included by several frontend tools, but all
those routines are only called by the backend.  More cleanup could be
done within xlog_internal.h, but that's already a nice cut.

This will help a follow-up patch for pg_rewind where handling of
restore_command is added for frontends.

Author: Alexey Kondratov, Michael Paquier
Reviewed-by: Álvaro Herrera, Alexander Korotkov
Discussion: https://postgr.es/m/a3acff50-5a0d-9a2c-b3b2-ee36168955c1@postgrespro.ru
2020-03-31 15:33:04 +09:00
Peter Eisentraut 246f136e76 Improve handling of parameter differences in physical replication
When certain parameters are changed on a physical replication primary,
this is communicated to standbys using the XLOG_PARAMETER_CHANGE WAL
record.  The standby then checks whether its own settings are at least
as big as the ones on the primary.  If not, the standby shuts down
with a fatal error.

The correspondence of settings between primary and standby is required
because those settings influence certain shared memory sizings that
are required for processing WAL records that the primary might send.
For example, if the primary sends a prepared transaction, the standby
must have had max_prepared_transaction set appropriately or it won't
be able to process those WAL records.

However, fatally shutting down the standby immediately upon receipt of
the parameter change record might be a bit of an overreaction.  The
resources related to those settings are not required immediately at
that point, and might never be required if the activity on the primary
does not exhaust all those resources.  If we just let the standby roll
on with recovery, it will eventually produce an appropriate error when
those resources are used.

So this patch relaxes this a bit.  Upon receipt of
XLOG_PARAMETER_CHANGE, we still check the settings but only issue a
warning and set a global flag if there is a problem.  Then when we
actually hit the resource issue and the flag was set, we issue another
warning message with relevant information.  At that point we pause
recovery, so a hot standby remains usable.  We also repeat the last
warning message once a minute so it is harder to miss or ignore.

Reviewed-by: Sergei Kornilov <sk@zsrv.org>
Reviewed-by: Masahiko Sawada <masahiko.sawada@2ndquadrant.com>
Reviewed-by: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/4ad69a4c-cc9b-0dfe-0352-8b1b0cd36c7b@2ndquadrant.com
2020-03-30 09:53:45 +02:00
Alvaro Herrera 1e6148032e
Allow walreceiver configuration to change on reload
The parameters primary_conninfo, primary_slot_name and
wal_receiver_create_temp_slot can now be changed with a simple "reload"
signal, no longer requiring a server restart.  This is achieved by
signalling the walreceiver process to terminate and having it start
again with the new values.

Thanks to Andres Freund, Kyotaro Horiguchi, Fujii Masao for discussion.

Author: Sergei Kornilov <sk@zsrv.org>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Álvaro Herrera <alvherre@alvh.no-ip.org>
Discussion: https://postgr.es/m/19513901543181143@sas1-19a94364928d.qloud-c.yandex.net
2020-03-27 19:51:37 -03:00
Alvaro Herrera 092c6936de
Set wal_receiver_create_temp_slot PGC_POSTMASTER
Commit 3297308278 gave walreceiver the ability to create and use a
temporary replication slot, and made it controllable by a GUC (enabled
by default) that can be changed with SIGHUP.  That's useful but has two
problems: one, it's possible to cause the origin server to fill its disk
if the slot doesn't advance in time; and also there's a disconnect
between state passed down via the startup process and GUCs that
walreceiver reads directly.

We handle the first problem by setting the option to disabled by
default.  If the user enables it, its on their head to make sure that
disk doesn't fill up.

We handle the second problem by passing the flag via startup rather than
having walreceiver acquire it directly, and making it PGC_POSTMASTER
(which ensures a walreceiver always has the fresh value).  A future
commit can relax this (to PGC_SIGHUP again) by having the startup
process signal walreceiver to shutdown whenever the value changes.

Author: Sergei Kornilov <sk@zsrv.org>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Álvaro Herrera <alvherre@alvh.no-ip.org>
Discussion: https://postgr.es/m/20200122055510.GH174860@paquier.xyz
2020-03-27 16:20:33 -03:00
Fujii Masao 496ee647ec Prefer standby promotion over recovery pause.
Previously if a promotion was triggered while recovery was paused,
the paused state continued. Also recovery could be paused by executing
pg_wal_replay_pause() even while a promotion was ongoing. That is,
recovery pause had higher priority over a standby promotion.
But this behavior was not desirable because most users basically wanted
the recovery to complete as soon as possible and the server to become
the master when they requested a promotion.

This commit changes recovery so that it prefers a promotion over
recovery pause. That is, if a promotion is triggered while recovery
is paused, the paused state ends and a promotion continues. Also
this commit makes recovery pause functions like pg_wal_replay_pause()
throw an error if they are executed while a promotion is ongoing.

Internally, this commit adds new internal function PromoteIsTriggered()
that returns true if a promotion is triggered. Since the name of
this function and the existing function IsPromoteTriggered() are
confusingly similar, the commit changes the name of IsPromoteTriggered()
to IsPromoteSignaled, as more appropriate name.

Author: Fujii Masao
Reviewed-by: Atsushi Torikoshi, Sergei Kornilov
Discussion: https://postgr.es/m/00c194b2-dbbb-2e8a-5b39-13f14048ef0a@oss.nttdata.com
2020-03-24 12:46:48 +09:00
Fujii Masao b8e20d6dab Add wait events for WAL archive and recovery pause.
This commit introduces new wait events BackupWaitWalArchive and
RecoveryPause. The former is reported while waiting for the WAL files
required for the backup to be successfully archived. The latter is
reported while waiting for recovery in pause state to be resumed.

Author: Fujii Masao
Reviewed-by: Michael Paquier, Atsushi Torikoshi, Robert Haas
Discussion: https://postgr.es/m/f0651f8c-9c96-9f29-0ff9-80414a15308a@oss.nttdata.com
2020-03-24 11:12:21 +09:00
Fujii Masao 1d253bae57 Rename the recovery-related wait events.
This commit renames RecoveryWalAll and RecoveryWalStream wait events to
RecoveryWalStream and RecoveryRetrieveRetryInterval, respectively,
in order to make the names and what they are more consistent. For example,
previously RecoveryWalAll was reported as a wait event while the recovery
was waiting for WAL from a stream, and which was confusing because the name
was very different from the situation where the wait actually could happen.

The names of macro variables for those wait events also are renamed
accordingly.

This commit also changes the category of RecoveryRetrieveRetryInterval to
Timeout from Activity because the wait event is reported while waiting based
on wal_retrieve_retry_interval.

Author: Fujii Masao
Reviewed-by: Kyotaro Horiguchi, Atsushi Torikoshi
Discussion: https://postgr.es/m/124997ee-096a-5d09-d8da-2c7a57d0816e@oss.nttdata.com
2020-03-19 15:32:55 +09:00
Fujii Masao 1429d3f767 Fix comment in xlog.c.
This commit fixes the comment about SharedHotStandbyActive variable.
The comment was apparently copy-and-pasted.

Author: Atsushi Torikoshi
Discussion: https://postgr.es/m/CACZ0uYEjpqZB9wN2Rwc_RMvDybyYqdbkPuDr1NyxJg4f9yGfMw@mail.gmail.com
2020-03-17 12:06:22 +09:00
Peter Eisentraut bf68b79e50 Refactor ps_status.c API
The init_ps_display() arguments were mostly lies by now, so to match
typical usage, just use one argument and let the caller assemble it
from multiple sources if necessary.  The only user of the additional
arguments is BackendInitialize(), which was already doing string
assembly on the caller side anyway.

Remove the second argument of set_ps_display() ("force") and just
handle that in init_ps_display() internally.

BackendInitialize() also used to set the initial status as
"authentication", but that was very far from where authentication
actually happened.  So now it's set to "initializing" and then
"authentication" just before the actual call to
ClientAuthentication().

Reviewed-by: Julien Rouhaud <rjuju123@gmail.com>
Reviewed-by: Kuntal Ghosh <kuntalghosh.2007@gmail.com>
Reviewed-by: Alvaro Herrera <alvherre@2ndquadrant.com>
Discussion: https://www.postgresql.org/message-id/flat/c65e5196-4f04-4ead-9353-6088c19615a3@2ndquadrant.com
2020-03-11 16:38:31 +01:00
Peter Eisentraut aaa3aeddee Remove HAVE_WORKING_LINK
Previously, hard links were not used on Windows and Cygwin, but they
support them just fine in currently supported OS versions, so we can
use them there as well.

Since all supported platforms now support hard links, we can remove
the alternative code paths.

Rename durable_link_or_rename() to durable_rename_excl() to make the
purpose more clear without referencing the implementation details.

Discussion: https://www.postgresql.org/message-id/flat/72fff73f-dc9c-4ef4-83e8-d2e60c98df48%402ndquadrant.com
2020-03-11 11:23:04 +01:00
Fujii Masao 71e0d0a737 Tidy up XLogSource code in xlog.c.
This commit replaces 0 used as an initial value of XLogSource variable,
with XLOG_FROM_ANY. Also this commit changes those variable so that
XLogSource instead of int is used as the type for them. These changes
are for code readability and debugger-friendliness.

Author: Kyotaro Horiguchi
Reviewed-by: Fujii Masao
Discussion: https://postgr.es/m/20200227.124830.2197604521555566121.horikyota.ntt@gmail.com
2020-03-10 09:41:44 +09:00
Fujii Masao ef34ab42a8 Avoid assertion failure with targeted recovery in standby mode.
At the end of recovery, standby mode is turned off to re-fetch the last
valid record from archive or pg_wal. Previously, if recovery target was
reached and standby mode was turned off while the current WAL source
was stream, recovery could try to retrieve WAL file containing the last
valid record unexpectedly from stream even though not in standby mode.
This caused an assertion failure. That is, the assertion test confirms that
WAL file should not be retrieved from stream if standby mode is not true.

This commit moves back the current WAL source to archive if it's stream
even though not in standby mode, to avoid that assertion failure.

This issue doesn't cause the server to crash when built with assertion
disabled. In this case, the attempt to retrieve WAL file from stream not
in standby mode just fails. And then recovery tries to retrieve WAL file
from archive or pg_wal.

Back-patch to all supported branches.

Author: Kyotaro Horiguchi
Reviewed-by: Fujii Masao
Discussion: https://postgr.es/m/20200227.124830.2197604521555566121.horikyota.ntt@gmail.com
2020-03-09 15:32:32 +09:00
Fujii Masao e65497df8f Report progress of streaming base backup.
This commit adds pg_stat_progress_basebackup view that reports
the progress while an application like pg_basebackup is taking
a base backup. This uses the progress reporting infrastructure
added by c16dc1aca5, adding support for streaming base backup.

Bump catversion.

Author: Fujii Masao
Reviewed-by: Kyotaro Horiguchi, Amit Langote, Sergei Kornilov
Discussion: https://postgr.es/m/9ed8b801-8215-1f3d-62d7-65bff53f6e94@oss.nttdata.com
2020-03-03 12:03:43 +09:00
Tom Lane 3d475515a1 Account explicitly for long-lived FDs that are allocated outside fd.c.
The comments in fd.c have long claimed that all file allocations should
go through that module, but in reality that's not always practical.
fd.c doesn't supply APIs for invoking some FD-producing syscalls like
pipe() or epoll_create(); and the APIs it does supply for non-virtual
FDs are mostly insistent on releasing those FDs at transaction end;
and in some cases the actual open() call is in code that can't be made
to use fd.c, such as libpq.

This has led to a situation where, in a modern server, there are likely
to be seven or so long-lived FDs per backend process that are not known
to fd.c.  Since NUM_RESERVED_FDS is only 10, that meant we had *very*
few spare FDs if max_files_per_process is >= the system ulimit and
fd.c had opened all the files it thought it safely could.  The
contrib/postgres_fdw regression test, in particular, could easily be
made to fall over by running it under a restrictive ulimit.

To improve matters, invent functions Acquire/Reserve/ReleaseExternalFD
that allow outside callers to tell fd.c that they have or want to allocate
a FD that's not directly managed by fd.c.  Add calls to track all the
fixed FDs in a standard backend session, so that we are honestly
guaranteeing that NUM_RESERVED_FDS FDs remain unused below the EMFILE
limit in a backend's idle state.  The coding rules for these functions say
that there's no need to call them in code that just allocates one FD over
a fairly short interval; we can dip into NUM_RESERVED_FDS for such cases.
That means that there aren't all that many places where we need to worry.
But postgres_fdw and dblink must use this facility to account for
long-lived FDs consumed by libpq connections.  There may be other places
where it's worth doing such accounting, too, but this seems like enough
to solve the immediate problem.

Internally to fd.c, "external" FDs are limited to max_safe_fds/3 FDs.
(Callers can choose to ignore this limit, but of course it's unwise
to do so except for fixed file allocations.)  I also reduced the limit
on "allocated" files to max_safe_fds/3 FDs (it had been max_safe_fds/2).
Conceivably a smarter rule could be used here --- but in practice,
on reasonable systems, max_safe_fds should be large enough that this
isn't much of an issue, so KISS for now.  To avoid possible regression
in the number of external or allocated files that can be opened,
increase FD_MINFREE and the lower limit on max_files_per_process a
little bit; we now insist that the effective "ulimit -n" be at least 64.

This seems like pretty clearly a bug fix, but in view of the lack of
field complaints, I'll refrain from risking a back-patch.

Discussion: https://postgr.es/m/E1izCmM-0005pV-Co@gemulon.postgresql.org
2020-02-24 17:28:33 -05:00
Peter Eisentraut 79c2385915 Factor out InitControlFile() from BootStrapXLOG()
Right now this only makes BootStrapXLOG() a bit more manageable, but
in the future there may be external callers.

Discussion: https://www.postgresql.org/message-id/e8f86ba5-48f1-a80a-7f1d-b76bcb9c5c47@2ndquadrant.com
2020-02-22 12:09:27 +01:00
Peter Eisentraut 9745f93afc Reformat code comment
Discussion: https://www.postgresql.org/message-id/e8f86ba5-48f1-a80a-7f1d-b76bcb9c5c47@2ndquadrant.com
2020-02-22 12:09:27 +01:00
Peter Eisentraut dc788668bb Fail if recovery target is not reached
Before, if a recovery target is configured, but the archive ended
before the target was reached, recovery would end and the server would
promote without further notice.  That was deemed to be pretty wrong.
With this change, if the recovery target is not reached, it is a fatal
error.

Based-on-patch-by: Leif Gunnar Erlandsen <leif@lako.no>
Reviewed-by: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/993736dd3f1713ec1f63fc3b653839f5@lako.no
2020-01-29 15:58:14 +01:00
Heikki Linnakangas 30012a04a6 Fix randAccess setting in ReadRecord()
Commit 38a957316d got this backwards.

Author: Kyotaro Horiguchi
Discussion: https://www.postgresql.org/message-id/20200128.194408.2260703306774646445.horikyota.ntt@gmail.com
2020-01-28 12:55:30 +02:00
Heikki Linnakangas 38a957316d Refactor XLogReadRecord(), adding XLogBeginRead() function.
The signature of XLogReadRecord() required the caller to pass the starting
WAL position as argument, or InvalidXLogRecPtr to continue reading at the
end of previous record. That's slightly awkward to the callers, as most
of them don't want to randomly jump around in the WAL stream, but start
reading at one position and then read everything from that point onwards.
Remove the 'RecPtr' argument and add a new function XLogBeginRead() to
specify the starting position instead. That's more convenient for the
callers. Also, xlogreader holds state that is reset when you change the
starting position, so having a separate function for doing that feels like
a more natural fit.

This changes XLogFindNextRecord() function so that it doesn't reset the
xlogreader's state to what it was before the call anymore. Instead, it
positions the xlogreader to the found record, like XLogBeginRead().

Reviewed-by: Kyotaro Horiguchi, Alvaro Herrera
Discussion: https://www.postgresql.org/message-id/5382a7a3-debe-be31-c860-cb810c08f366%40iki.fi
2020-01-26 11:39:00 +02:00
Bruce Momjian 7559d8ebfa Update copyrights for 2020
Backpatch-through: update all files in master, backpatch legal files through 9.4
2020-01-01 12:21:45 -05:00
Alvaro Herrera c4dcd9144b Avoid splitting C string literals with \-newline
Using \ is unnecessary and ugly, so remove that.  While at it, stitch
the literals back into a single line: we've long discouraged splitting
error message literals even when they go past the 80 chars line limit,
to improve greppability.

Leave contrib/tablefunc alone.

Discussion: https://postgr.es/m/20191223195156.GA12271@alvherre.pgsql
2019-12-24 12:44:12 -03:00
Robert Haas 303640199d Fix minor problems with non-exclusive backup cleanup.
The previous coding imagined that it could call before_shmem_exit()
when a non-exclusive backup began and then remove the previously-added
handler by calling cancel_before_shmem_exit() when that backup
ended. However, this only works provided that nothing else in the
system has registered a before_shmem_exit() hook in the interim,
because cancel_before_shmem_exit() is documented to remove a callback
only if it is the latest callback registered. It also only works
if nothing can ERROR out between the time that sessionBackupState
is reset and the time that cancel_before_shmem_exit(), which doesn't
seem to be strictly true.

To fix, leave the handler installed for the lifetime of the session,
arrange to install it just once, and teach it to quietly do nothing if
there isn't a non-exclusive backup in process.

This is a bug, but for now I'm not going to back-patch, because the
consequences are minor. It's possible to cause a spurious warning
to be generated, but that doesn't really matter. It's also possible
to trigger an assertion failure, but production builds shouldn't
have assertions enabled.

Patch by me, reviewed by Kyotaro Horiguchi, Michael Paquier (who
preferred a different approach, but got outvoted), Fujii Masao,
and Tom Lane, and with comments by various others.

Discussion: http://postgr.es/m/CA+TgmobMjnyBfNhGTKQEDbqXYE3_rXWpc4CM63fhyerNCes3mA@mail.gmail.com
2019-12-19 09:06:54 -05:00
Michael Paquier 70116493a8 Remove shadow variables linked to RedoRecPtr in xlog.c
This changes the routines in charge of recycling WAL segments past the
last redo LSN to not use anymore "RedoRecPtr" as a local variable, which
is also available in the context of the session as a static declaration,
replacing it with "lastredoptr".  This confusion has been introduced by
d9fadbf, so backpatch down to v11 like the other commit.

Thanks to Tom Lane, Robert Haas, Alvaro Herrera, Mark Dilger and Kyotaro
Horiguchi for the input provided.

Author: Ranier Vilela
Discussion: https://postgr.es/m/MN2PR18MB2927F7B5F690065E1194B258E35D0@MN2PR18MB2927.namprd18.prod.outlook.com
Backpatch-through: 11
2019-12-18 10:11:13 +09:00
Alvaro Herrera 91fca4bb60 Demote variable from global to local
recoveryDelayUntilTime was introduced by commit 36da3cfb45 as a global
because its method of operation was devilishly intrincate.  Commit
c945af80cf removed all that complexity and could have turned it into a
local variable, but didn't.  Do so now.

Discussion: https://postgr.es/m/20191213200751.GA10731@alvherre.pgsql
Reviewed-by: Michaël Paquier, Daniel Gustafsson
2019-12-16 14:23:56 -03:00
Michael Paquier 68ab982906 Fix thinkos from commit 9989d37
Error messages referring to incorrect WAL segment names could have been
generated for a fsync() failure or when creating a new segment at the
end of recovery.
2019-12-03 18:59:09 +09:00
Michael Paquier 9989d37d1c Remove XLogFileNameP() from the tree
XLogFileNameP() is a wrapper routine able to build a palloc'd string for
a WAL segment name, which is used for error string generation.  There
were several code paths where it gets called in a critical section,
where memory allocation is not allowed.  This results in triggering
an assertion failure instead of generating the wanted error message.

Another, more annoying, problem is that if the allocation to generate
the WAL segment name fails on OOM, then the failure would be escalated
to a PANIC.

This removes the routine and all its callers are replaced with a logic
using a fixed-size buffer.  This way, all the existing mistakes are
fixed and future ones are prevented.

Author: Masahiko Sawada
Reviewed-by: Michael Paquier, Álvaro Herrera
Discussion: https://postgr.es/m/CA+fd4k5gC9H4uoWMLg9K_QfNrnkkdEw+-AFveob9YX7z8JnKTA@mail.gmail.com
2019-12-03 15:06:04 +09:00
Peter Eisentraut 2e4db241bf Remove configure --disable-float4-byval
This build option was only useful to maintain compatibility for
version-0 functions, but those are no longer supported, so this option
can be removed.

float4 is now always pass-by-value; the pass-by-reference code path is
completely removed.

Discussion: https://www.postgresql.org/message-id/flat/f3e1e576-2749-bbd7-2d57-3f9dcf75255a@2ndquadrant.com
2019-11-21 18:29:21 +01:00
Amit Kapila 14aec03502 Make the order of the header file includes consistent in backend modules.
Similar to commits 7e735035f2 and dddf4cdc33, this commit makes the order
of header file inclusion consistent for backend modules.

In the passing, removed a couple of duplicate inclusions.

Author: Vignesh C
Reviewed-by: Kuntal Ghosh and Amit Kapila
Discussion: https://postgr.es/m/CALDaNm2Sznv8RR6Ex-iJO6xAdsxgWhCoETkaYX=+9DW3q0QCfA@mail.gmail.com
2019-11-12 08:30:16 +05:30
Peter Eisentraut b85e43feb3 More precise errors from initial pg_control check
Use a separate error message for invalid checkpoint location and
invalid state instead of just "invalid data" for both.

Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://www.postgresql.org/message-id/20191107041630.GK1768@paquier.xyz
2019-11-08 08:03:16 +01:00
Michael Paquier 51970fa8df Fix initialization of fake LSN for unlogged relations
9155580 has changed the value of the first fake LSN for unlogged
relations from 1 to FirstNormalUnloggedLSN (aka 1000), GiST requiring a
non-zero LSN on some pages to allow an interlocking logic to work, but
its value was still initialized to 1 at the beginning of recovery or
after running pg_resetwal.  This fixes the initialization for both code
paths.

Author: Takayuki Tsunakawa
Reviewed-by: Dilip Kumar, Kyotaro Horiguchi, Michael Paquier
Discussion: https://postgr.es/m/OSBPR01MB2503CE851940C17DE44AE3D9FE6F0@OSBPR01MB2503.jpnprd01.prod.outlook.com
Backpatch-through: 12
2019-10-27 13:54:12 +09:00
Fujii Masao 3b0c59ac1c Fix typo in xlog.c.
Author: Fujii Masao
Reviewed-by: Amit Kapila
Discussion: https://postgr.es/m/CAHGQGwH7dtYvOZZ8c0AG5AJwH5pfiRdKaCptY1_RdHy0HYeRfQ@mail.gmail.com
2019-10-24 14:13:36 +09:00
Fujii Masao ec1259e880 Fix failure of archive recovery with recovery_min_apply_delay enabled.
recovery_min_apply_delay parameter is intended for use with streaming
replication deployments. However, the document clearly explains that
the parameter will be honored in all cases if it's specified. So it should
take effect even if in archive recovery. But, previously, archive recovery
with recovery_min_apply_delay enabled always failed, and caused assertion
failure if --enable-caasert is enabled.

The cause of this problem is that; the ownership of recoveryWakeupLatch
that recovery_min_apply_delay uses was taken only when standby mode
is requested. So unowned latch could be used in archive recovery, and
which caused the failure.

This commit changes recovery code so that the ownership of
recoveryWakeupLatch is taken even in archive recovery. Which prevents
archive recovery with recovery_min_apply_delay from failing.

Back-patch to v9.4 where recovery_min_apply_delay was added.

Author: Fujii Masao
Reviewed-by: Michael Paquier
Discussion: https://postgr.es/m/CAHGQGwEyD6HdZLfdWc+95g=VQFPR4zQL4n+yHxQgGEGjaSVheQ@mail.gmail.com
2019-10-18 22:32:18 +09:00