Commit Graph

1077 Commits

Author SHA1 Message Date
Andres Freund a9a4a7ad56 code: replace most remaining uses of 'master'.
Author: Andres Freund
Reviewed-By: David Steele
Discussion: https://postgr.es/m/20200615182235.x7lch5n6kcjq4aue@alap3.anarazel.de
2020-07-08 13:24:35 -07:00
Andres Freund 5e7bbb5286 code: replace 'master' with 'primary' where appropriate.
Also changed "in the primary" to "on the primary", and added a few
"the" before "primary".

Author: Andres Freund
Reviewed-By: David Steele
Discussion: https://postgr.es/m/20200615182235.x7lch5n6kcjq4aue@alap3.anarazel.de
2020-07-08 12:57:23 -07:00
Fujii Masao 654242fd81 Fix incorrect variable datatype.
Since slot_keep_segs indicates the number of WAL segments not LSN,
its datatype should not be XLogRecPtr.

Back-patch to v13 where this issue was added.

Reported-by: Atsushi Torikoshi
Author: Atsushi Torikoshi, tweaked by Fujii Masao
Discussion: https://postgr.es/m/ebd0d674f3e050222238a960cac5251a@oss.nttdata.com
2020-07-08 21:24:34 +09:00
Alvaro Herrera a8aaa0c786
Morph pg_replication_slots.min_safe_lsn to safe_wal_size
The previous definition of the column was almost universally disliked,
so provide this updated definition which is more useful for monitoring
purposes: a large positive value is good, while zero or a negative value
means danger.  This should be operationally more convenient.

Backpatch to 13, where the new column to pg_replication_slots (and the
feature it represents) were added.

Author: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Author: Álvaro Herrera <alvherre@alvh.no-ip.org>
Reported-by: Fujii Masao <masao.fujii@oss.nttdata.com>
Discussion: https://postgr.es/m/9ddfbf8c-2f67-904d-44ed-cf8bc5916228@oss.nttdata.com
2020-07-07 13:08:00 -04:00
Amit Kapila e7b476c657 Remove duplicate check added by commit b2a5545bd6.
As this doesn't cause any harm so we decided to this clean up in HEAD only.

Author: Ádám Balogh
Discussion: https://postgr.es/m/VI1PR0702MB36631BD67559461AFDE1FEEE81920@VI1PR0702MB3663.eurprd07.prod.outlook.com
2020-06-27 09:59:27 +05:30
Alvaro Herrera b8fd4e02c6
Adjust max_slot_wal_keep_size behavior per review
In pg_replication_slot, change output from normal/reserved/lost to
reserved/extended/unreserved/ lost, which better expresses the possible
states particularly near the time where segments are no longer safe but
checkpoint has not run yet.

Under the new definition, reserved means the slot is consuming WAL
that's still under the normal WAL size constraints; extended means it's
consuming WAL that's being protected by wal_keep_segments or the slot
itself, whose size is below max_slot_wal_keep_size; unreserved means the
WAL is no longer safe, but checkpoint has not yet removed those files.
Such as slot is in imminent danger, but can still continue for a little
while and may catch up to the reserved WAL space.

Also, there were some bugs in the calculations used to report the
status; fixed those.

Backpatch to 13.

Reported-by: Fujii Masao <masao.fujii@oss.nttdata.com>
Author: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Reviewed-by: Fujii Masao <masao.fujii@oss.nttdata.com>
Reviewed-by: Álvaro Herrera <alvherre@alvh.no-ip.org>
Discussion: https://postgr.es/m/20200616.120236.1809496990963386593.horikyota.ntt@gmail.com
2020-06-24 14:23:39 -04:00
Alvaro Herrera 368d7f3297
Add parens to ConvertToXSegs macro
The current definition is dangerous.  No bugs exist in our code at
present, but backpatch to 11 nonetheless where it was introduced.

Author: Álvaro Herrera <alvherre@alvh.no-ip.org>
2020-06-24 14:00:37 -04:00
Robert Haas 1fa092913d Don't export basebackup.c's sendTablespace().
Commit 72d422a522 made xlog.c call
sendTablespace() with the 'sizeonly' argument set to true, which
required basebackup.c to export sendTablespace(). However, that's
kind of ugly, so instead defer the call to sendTablespace() until
basebackup.c regains control. That way, it can still be a static
function.

Patch by me, reviewed by Amit Kapila and Kyotaro Horiguchi.

Discussion: http://postgr.es/m/CA+TgmoYq+59SJ2zBbP891ngWPA9fymOqntqYcweSDYXS2a620A@mail.gmail.com
2020-06-17 10:57:34 -04:00
Thomas Munro 57cb806308 Fix locking bugs that could corrupt pg_control.
The redo routines for XLOG_CHECKPOINT_{ONLINE,SHUTDOWN} must acquire
ControlFileLock before modifying ControlFile->checkPointCopy, or the
checkpointer could write out a control file with a bad checksum.

Likewise, XLogReportParameters() must acquire ControlFileLock before
modifying ControlFile and calling UpdateControlFile().

Back-patch to all supported releases.

Author: Nathan Bossart <bossartn@amazon.com>
Author: Fujii Masao <masao.fujii@oss.nttdata.com>
Reviewed-by: Fujii Masao <masao.fujii@oss.nttdata.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Thomas Munro <thomas.munro@gmail.com>
Discussion: https://postgr.es/m/70BF24D6-DC51-443F-B55A-95735803842A%40amazon.com
2020-06-08 13:57:24 +12:00
Tom Lane 5cbfce562f Initial pgindent and pgperltidy run for v13.
Includes some manual cleanup of places that pgindent messed up,
most of which weren't per project style anyway.

Notably, it seems some people didn't absorb the style rules of
commit c9d297751, because there were a bunch of new occurrences
of function calls with a newline just after the left paren, all
with faulty expectations about how the rest of the call would get
indented.
2020-05-14 13:06:50 -04:00
Tom Lane 29c3e2dd5a Collect built-in LWLock tranche names statically, not dynamically.
There is little point in using the LWLockRegisterTranche mechanism for
built-in tranche names.  It wastes cycles, it creates opportunities for
bugs (since failing to register a tranche name is a very hard-to-detect
problem), and the lack of any centralized list of names encourages
sloppy nonconformity in name choices.  Moreover, since we have a
centralized list of the tranches anyway in enum BuiltinTrancheIds, we're
certainly not buying any flexibility in return for these disadvantages.

Hence, nuke all the backend-internal LWLockRegisterTranche calls,
and instead provide a const array of the builtin tranche names.

(I have in mind to change a bunch of these names shortly, but this
patch is just about getting them into one place.)

Discussion: https://postgr.es/m/9056.1589419765@sss.pgh.pa.us
2020-05-14 11:10:31 -04:00
Alvaro Herrera b060dbe000
Rework XLogReader callback system
Code review for 0dc8ead463, prompted by a bug closed by 91c40548d5.

XLogReader's system for opening and closing segments had gotten too
complicated, with callbacks being passed at both the XLogReaderAllocate
level (read_page) as well as at the WALRead level (segment_open).  This
was confusing and hard to follow, so restructure things so that these
callbacks are passed together at XLogReaderAllocate time, and add
another callback to the set (segment_close) to make it a coherent whole.
Also, ensure XLogReaderState is an argument to all the callbacks, so
that they can grab at the ->private data if necessary.

Document the whole arrangement more clearly.

Author: Álvaro Herrera <alvherre@alvh.no-ip.org>
Reviewed-by: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Discussion: https://postgr.es/m/20200422175754.GA19858@alvherre.pgsql
2020-05-08 15:40:11 -04:00
Amit Kapila 69bfaf2e1d Change the display of WAL usage statistics in Explain.
In commit 33e05f89c5, we have added the option to display WAL usage
statistics in Explain and auto_explain.  The display format used two spaces
between each field which is inconsistent with Buffer usage statistics which
is using one space between each field.  Change the format to make WAL usage
statistics consistent with Buffer usage statistics.

This commit also changed the usage of "full page writes" to
"full page images" for WAL usage statistics to make it consistent with
other parts of code and docs.

Author: Julien Rouhaud, Amit Kapila
Reviewed-by: Justin Pryzby, Kyotaro Horiguchi and Amit Kapila
Discussion: https://postgr.es/m/CAB-hujrP8ZfUkvL5OYETipQwA=e3n7oqHFU=4ZLxWS_Cza3kQQ@mail.gmail.com
2020-05-05 08:00:53 +05:30
Peter Eisentraut f057980149 Fix typo
from 303640199d
2020-04-26 13:48:33 +02:00
Michael Paquier 4e87c4836a Fix handling of WAL segments ready to be archived during crash recovery
78ea8b5 has fixed an issue related to the recycling of WAL segments on
standbys depending on archive_mode.  However, it has introduced a
regression with the handling of WAL segments ready to be archived during
crash recovery, causing those files to be recycled without getting
archived.

This commit fixes the regression by tracking in shared memory if a live
cluster is either in crash recovery or archive recovery as the handling
of WAL segments ready to be archived is different in both cases (those
WAL segments should not be removed during crash recovery), and by using
this new shared memory state to decide if a segment can be recycled or
not.  Previously, it was not possible to know if a cluster was in crash
recovery or archive recovery as the shared state was able to track only
if recovery was happening or not, leading to the problem.

A set of TAP tests is added to close the gap here, making sure that WAL
segments ready to be archived are correctly handled when a cluster is in
archive or crash recovery with archive_mode set to "on" or "always", for
both standby and primary.

Reported-by: Benoît Lobréau
Author: Jehan-Guillaume de Rorthais
Reviewed-by: Kyotaro Horiguchi, Fujii Masao, Michael Paquier
Discussion: https://postgr.es/m/20200331172229.40ee00dc@firost
Backpatch-through: 9.5
2020-04-24 08:48:28 +09:00
Amit Kapila ef08ca113f Cosmetic fixups for WAL usage work.
Reported-by: Justin Pryzby and Euler Taveira
Author: Justin Pryzby and Julien Rouhaud
Reviewed-by: Amit Kapila
Discussion: https://postgr.es/m/CAB-hujrP8ZfUkvL5OYETipQwA=e3n7oqHFU=4ZLxWS_Cza3kQQ@mail.gmail.com
2020-04-13 15:31:16 +05:30
Thomas Munro d140f2f3e2 Rationalize GetWalRcv{Write,Flush}RecPtr().
GetWalRcvWriteRecPtr() previously reported the latest *flushed*
location.  Adopt the conventional terminology used elsewhere in the tree
by renaming it to GetWalRcvFlushRecPtr(), and likewise for some related
variables that used the term "received".

Add a new definition of GetWalRcvWriteRecPtr(), which returns the latest
*written* value.  This will allow later patches to use the value for
non-data-integrity purposes, without having to wait for the flush
pointer to advance.

Reviewed-by: Alvaro Herrera <alvherre@2ndquadrant.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/CA%2BhUKGJ4VJN8ttxScUFM8dOKX0BrBiboo5uz1cq%3DAovOddfHpA%40mail.gmail.com
2020-04-08 23:45:09 +12:00
Alexander Korotkov 1aac32df89 Revert 0f5ca02f53
0f5ca02f53 introduces 3 new keywords.  It appears to be too much for relatively
small feature.  Given now we past feature freeze, it's already late for
discussion of the new syntax.  So, revert.

Discussion: https://postgr.es/m/28209.1586294824%40sss.pgh.pa.us
2020-04-08 11:37:27 +03:00
Alvaro Herrera c655077639
Allow users to limit storage reserved by replication slots
Replication slots are useful to retain data that may be needed by a
replication system.  But experience has shown that allowing them to
retain excessive data can lead to the primary failing because of running
out of space.  This new feature allows the user to configure a maximum
amount of space to be reserved using the new option
max_slot_wal_keep_size.  Slots that overrun that space are invalidated
at checkpoint time, enabling the storage to be released.

Author: Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Reviewed-by: Jehan-Guillaume de Rorthais <jgdr@dalibo.com>
Reviewed-by: Álvaro Herrera <alvherre@alvh.no-ip.org>
Discussion: https://postgr.es/m/20170228.122736.123383594.horiguchi.kyotaro@lab.ntt.co.jp
2020-04-07 18:35:00 -04:00
Alexander Korotkov 0f5ca02f53 Implement waiting for given lsn at transaction start
This commit adds following optional clause to BEGIN and START TRANSACTION
commands.

  WAIT FOR LSN lsn [ TIMEOUT timeout ]

New clause pospones transaction start till given lsn is applied on standby.
This clause allows user be sure, that changes previously made on primary would
be visible on standby.

New shared memory struct is used to track awaited lsn per backend.  Recovery
process wakes up backend once required lsn is applied.

Author: Ivan Kartyshov, Anna Akenteva
Reviewed-by: Craig Ringer, Thomas Munro, Robert Haas, Kyotaro Horiguchi
Reviewed-by: Masahiko Sawada, Ants Aasma, Dmitry Ivanov, Simon Riggs
Reviewed-by: Amit Kapila, Alexander Korotkov
Discussion: https://postgr.es/m/0240c26c-9f84-30ea-fca9-93ab2df5f305%40postgrespro.ru
2020-04-07 23:51:10 +03:00
Fujii Masao 4bd0ad9e44 Prevent archive recovery from scanning non-existent WAL files.
Previously when there were multiple timelines listed in the history file
of the recovery target timeline, archive recovery searched all of them,
starting from the newest timeline to the oldest one, to find the segment
to read. That is, archive recovery had to continuously fail scanning
the segment until it reached the timeline that the segment belonged to.
These scans for non-existent segment could be harmful on the recovery
performance especially when archival area was located on the remote
storage and each scan could take a long time.

To address the issue, this commit changes archive recovery so that
it skips scanning the timeline that the segment to read doesn't belong to.

Author: Kyotaro Horiguchi, tweaked a bit by Fujii Masao
Reviewed-by: David Steele, Pavel Suderevsky, Grigory Smolkin
Discussion: https://postgr.es/m/16159-f5a34a3a04dc67e0@postgresql.org
Discussion: https://postgr.es/m/20200129.120222.1476610231001551715.horikyota.ntt@gmail.com
2020-04-08 00:49:29 +09:00
Peter Eisentraut 552fcebff0 Revert "Improve handling of parameter differences in physical replication"
This reverts commit 246f136e76.

That patch wasn't quite complete enough.

Discussion: https://www.postgresql.org/message-id/flat/E1jIpJu-0007Ql-CL%40gemulon.postgresql.org
2020-04-04 09:08:12 +02:00
Amit Kapila df3b181499 Add infrastructure to track WAL usage.
This allows gathering the WAL generation statistics for each statement
execution.  The three statistics that we collect are the number of WAL
records, the number of full page writes and the amount of WAL bytes
generated.

This helps the users who have write-intensive workload to see the impact
of I/O due to WAL.  This further enables us to see approximately what
percentage of overall WAL is due to full page writes.

In the future, we can extend this functionality to allow us to compute the
the exact amount of WAL data due to full page writes.

This patch in itself is just an infrastructure to compute WAL usage data.
The upcoming patches will expose this data via explain, auto_explain,
pg_stat_statements and verbose (auto)vacuum output.

Author: Kirill Bychik, Julien Rouhaud
Reviewed-by: Dilip Kumar, Fujii Masao and Amit Kapila
Discussion: https://postgr.es/m/CAB-hujrP8ZfUkvL5OYETipQwA=e3n7oqHFU=4ZLxWS_Cza3kQQ@mail.gmail.com
2020-04-04 10:02:08 +05:30
Robert Haas 0d8c9c1210 Generate backup manifests for base backups, and validate them.
A manifest is a JSON document which includes (1) the file name, size,
last modification time, and an optional checksum for each file backed
up, (2) timelines and LSNs for whatever WAL will need to be replayed
to make the backup consistent, and (3) a checksum for the manifest
itself. By default, we use CRC-32C when checksumming data files,
because we are trying to detect corruption and user error, not foil an
adversary. However, pg_basebackup and the server-side BASE_BACKUP
command now have options to select a different algorithm, so users
wanting a cryptographic hash function can select SHA-224, SHA-256,
SHA-384, or SHA-512. Users not wanting file checksums at all can
disable them, or disable generating of the backup manifest altogether.
Using a cryptographic hash function in place of CRC-32C consumes
significantly more CPU cycles, which may slow down backups in some
cases.

A new tool called pg_validatebackup can validate a backup against the
manifest. If no checksums are present, it can still check that the
right files exist and that they have the expected sizes. If checksums
are present, it can also verify that each file has the expected
checksum. Additionally, it calls pg_waldump to verify that the
expected WAL files are present and parseable. Only plain format
backups can be validated directly, but tar format backups can be
validated after extracting them.

Robert Haas, with help, ideas, review, and testing from David Steele,
Stephen Frost, Andrew Dunstan, Rushabh Lathia, Suraj Kharage, Tushar
Ahuja, Rajkumar Raghuwanshi, Mark Dilger, Davinder Singh, Jeevan
Chalke, Amit Kapila, Andres Freund, and Noah Misch.

Discussion: http://postgr.es/m/CA+TgmoZV8dw1H2bzZ9xkKwdrk8+XYa+DC9H=F7heO2zna5T6qg@mail.gmail.com
2020-04-03 15:05:59 -04:00
Fujii Masao b0236508d3 Improve the message logged when recovery is paused.
When recovery target is reached and recovery is paused because of
recovery_target_action=pause, executing pg_wal_replay_resume() causes
the standby to promote, i.e., the recovery to end. So, in this case,
the previous message "Execute pg_wal_replay_resume() to continue"
logged was confusing because pg_wal_replay_resume() doesn't cause
the recovery to continue.

This commit improves the message logged when recovery is paused,
and the proper message is output based on what (pg_wal_replay_pause
or recovery_target_action) causes recovery to be paused.

Author: Sergei Kornilov, revised by Fujii Masao
Reviewed-by: Robert Haas
Discussion: https://postgr.es/m/19168211580382043@myt5-b646bde4b8f3.qloud-c.yandex.net
2020-04-01 03:35:13 +09:00
Michael Paquier 616ae3d2b0 Move routine definitions of xlogarchive.c to a new header file
The definitions of the routines defined in xlogarchive.c have been part
of xlog_internal.h which is included by several frontend tools, but all
those routines are only called by the backend.  More cleanup could be
done within xlog_internal.h, but that's already a nice cut.

This will help a follow-up patch for pg_rewind where handling of
restore_command is added for frontends.

Author: Alexey Kondratov, Michael Paquier
Reviewed-by: Álvaro Herrera, Alexander Korotkov
Discussion: https://postgr.es/m/a3acff50-5a0d-9a2c-b3b2-ee36168955c1@postgrespro.ru
2020-03-31 15:33:04 +09:00
Peter Eisentraut 246f136e76 Improve handling of parameter differences in physical replication
When certain parameters are changed on a physical replication primary,
this is communicated to standbys using the XLOG_PARAMETER_CHANGE WAL
record.  The standby then checks whether its own settings are at least
as big as the ones on the primary.  If not, the standby shuts down
with a fatal error.

The correspondence of settings between primary and standby is required
because those settings influence certain shared memory sizings that
are required for processing WAL records that the primary might send.
For example, if the primary sends a prepared transaction, the standby
must have had max_prepared_transaction set appropriately or it won't
be able to process those WAL records.

However, fatally shutting down the standby immediately upon receipt of
the parameter change record might be a bit of an overreaction.  The
resources related to those settings are not required immediately at
that point, and might never be required if the activity on the primary
does not exhaust all those resources.  If we just let the standby roll
on with recovery, it will eventually produce an appropriate error when
those resources are used.

So this patch relaxes this a bit.  Upon receipt of
XLOG_PARAMETER_CHANGE, we still check the settings but only issue a
warning and set a global flag if there is a problem.  Then when we
actually hit the resource issue and the flag was set, we issue another
warning message with relevant information.  At that point we pause
recovery, so a hot standby remains usable.  We also repeat the last
warning message once a minute so it is harder to miss or ignore.

Reviewed-by: Sergei Kornilov <sk@zsrv.org>
Reviewed-by: Masahiko Sawada <masahiko.sawada@2ndquadrant.com>
Reviewed-by: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/4ad69a4c-cc9b-0dfe-0352-8b1b0cd36c7b@2ndquadrant.com
2020-03-30 09:53:45 +02:00
Alvaro Herrera 1e6148032e
Allow walreceiver configuration to change on reload
The parameters primary_conninfo, primary_slot_name and
wal_receiver_create_temp_slot can now be changed with a simple "reload"
signal, no longer requiring a server restart.  This is achieved by
signalling the walreceiver process to terminate and having it start
again with the new values.

Thanks to Andres Freund, Kyotaro Horiguchi, Fujii Masao for discussion.

Author: Sergei Kornilov <sk@zsrv.org>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Álvaro Herrera <alvherre@alvh.no-ip.org>
Discussion: https://postgr.es/m/19513901543181143@sas1-19a94364928d.qloud-c.yandex.net
2020-03-27 19:51:37 -03:00
Alvaro Herrera 092c6936de
Set wal_receiver_create_temp_slot PGC_POSTMASTER
Commit 3297308278 gave walreceiver the ability to create and use a
temporary replication slot, and made it controllable by a GUC (enabled
by default) that can be changed with SIGHUP.  That's useful but has two
problems: one, it's possible to cause the origin server to fill its disk
if the slot doesn't advance in time; and also there's a disconnect
between state passed down via the startup process and GUCs that
walreceiver reads directly.

We handle the first problem by setting the option to disabled by
default.  If the user enables it, its on their head to make sure that
disk doesn't fill up.

We handle the second problem by passing the flag via startup rather than
having walreceiver acquire it directly, and making it PGC_POSTMASTER
(which ensures a walreceiver always has the fresh value).  A future
commit can relax this (to PGC_SIGHUP again) by having the startup
process signal walreceiver to shutdown whenever the value changes.

Author: Sergei Kornilov <sk@zsrv.org>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Álvaro Herrera <alvherre@alvh.no-ip.org>
Discussion: https://postgr.es/m/20200122055510.GH174860@paquier.xyz
2020-03-27 16:20:33 -03:00
Fujii Masao 496ee647ec Prefer standby promotion over recovery pause.
Previously if a promotion was triggered while recovery was paused,
the paused state continued. Also recovery could be paused by executing
pg_wal_replay_pause() even while a promotion was ongoing. That is,
recovery pause had higher priority over a standby promotion.
But this behavior was not desirable because most users basically wanted
the recovery to complete as soon as possible and the server to become
the master when they requested a promotion.

This commit changes recovery so that it prefers a promotion over
recovery pause. That is, if a promotion is triggered while recovery
is paused, the paused state ends and a promotion continues. Also
this commit makes recovery pause functions like pg_wal_replay_pause()
throw an error if they are executed while a promotion is ongoing.

Internally, this commit adds new internal function PromoteIsTriggered()
that returns true if a promotion is triggered. Since the name of
this function and the existing function IsPromoteTriggered() are
confusingly similar, the commit changes the name of IsPromoteTriggered()
to IsPromoteSignaled, as more appropriate name.

Author: Fujii Masao
Reviewed-by: Atsushi Torikoshi, Sergei Kornilov
Discussion: https://postgr.es/m/00c194b2-dbbb-2e8a-5b39-13f14048ef0a@oss.nttdata.com
2020-03-24 12:46:48 +09:00
Fujii Masao b8e20d6dab Add wait events for WAL archive and recovery pause.
This commit introduces new wait events BackupWaitWalArchive and
RecoveryPause. The former is reported while waiting for the WAL files
required for the backup to be successfully archived. The latter is
reported while waiting for recovery in pause state to be resumed.

Author: Fujii Masao
Reviewed-by: Michael Paquier, Atsushi Torikoshi, Robert Haas
Discussion: https://postgr.es/m/f0651f8c-9c96-9f29-0ff9-80414a15308a@oss.nttdata.com
2020-03-24 11:12:21 +09:00
Fujii Masao 1d253bae57 Rename the recovery-related wait events.
This commit renames RecoveryWalAll and RecoveryWalStream wait events to
RecoveryWalStream and RecoveryRetrieveRetryInterval, respectively,
in order to make the names and what they are more consistent. For example,
previously RecoveryWalAll was reported as a wait event while the recovery
was waiting for WAL from a stream, and which was confusing because the name
was very different from the situation where the wait actually could happen.

The names of macro variables for those wait events also are renamed
accordingly.

This commit also changes the category of RecoveryRetrieveRetryInterval to
Timeout from Activity because the wait event is reported while waiting based
on wal_retrieve_retry_interval.

Author: Fujii Masao
Reviewed-by: Kyotaro Horiguchi, Atsushi Torikoshi
Discussion: https://postgr.es/m/124997ee-096a-5d09-d8da-2c7a57d0816e@oss.nttdata.com
2020-03-19 15:32:55 +09:00
Fujii Masao 1429d3f767 Fix comment in xlog.c.
This commit fixes the comment about SharedHotStandbyActive variable.
The comment was apparently copy-and-pasted.

Author: Atsushi Torikoshi
Discussion: https://postgr.es/m/CACZ0uYEjpqZB9wN2Rwc_RMvDybyYqdbkPuDr1NyxJg4f9yGfMw@mail.gmail.com
2020-03-17 12:06:22 +09:00
Peter Eisentraut bf68b79e50 Refactor ps_status.c API
The init_ps_display() arguments were mostly lies by now, so to match
typical usage, just use one argument and let the caller assemble it
from multiple sources if necessary.  The only user of the additional
arguments is BackendInitialize(), which was already doing string
assembly on the caller side anyway.

Remove the second argument of set_ps_display() ("force") and just
handle that in init_ps_display() internally.

BackendInitialize() also used to set the initial status as
"authentication", but that was very far from where authentication
actually happened.  So now it's set to "initializing" and then
"authentication" just before the actual call to
ClientAuthentication().

Reviewed-by: Julien Rouhaud <rjuju123@gmail.com>
Reviewed-by: Kuntal Ghosh <kuntalghosh.2007@gmail.com>
Reviewed-by: Alvaro Herrera <alvherre@2ndquadrant.com>
Discussion: https://www.postgresql.org/message-id/flat/c65e5196-4f04-4ead-9353-6088c19615a3@2ndquadrant.com
2020-03-11 16:38:31 +01:00
Peter Eisentraut aaa3aeddee Remove HAVE_WORKING_LINK
Previously, hard links were not used on Windows and Cygwin, but they
support them just fine in currently supported OS versions, so we can
use them there as well.

Since all supported platforms now support hard links, we can remove
the alternative code paths.

Rename durable_link_or_rename() to durable_rename_excl() to make the
purpose more clear without referencing the implementation details.

Discussion: https://www.postgresql.org/message-id/flat/72fff73f-dc9c-4ef4-83e8-d2e60c98df48%402ndquadrant.com
2020-03-11 11:23:04 +01:00
Fujii Masao 71e0d0a737 Tidy up XLogSource code in xlog.c.
This commit replaces 0 used as an initial value of XLogSource variable,
with XLOG_FROM_ANY. Also this commit changes those variable so that
XLogSource instead of int is used as the type for them. These changes
are for code readability and debugger-friendliness.

Author: Kyotaro Horiguchi
Reviewed-by: Fujii Masao
Discussion: https://postgr.es/m/20200227.124830.2197604521555566121.horikyota.ntt@gmail.com
2020-03-10 09:41:44 +09:00
Fujii Masao ef34ab42a8 Avoid assertion failure with targeted recovery in standby mode.
At the end of recovery, standby mode is turned off to re-fetch the last
valid record from archive or pg_wal. Previously, if recovery target was
reached and standby mode was turned off while the current WAL source
was stream, recovery could try to retrieve WAL file containing the last
valid record unexpectedly from stream even though not in standby mode.
This caused an assertion failure. That is, the assertion test confirms that
WAL file should not be retrieved from stream if standby mode is not true.

This commit moves back the current WAL source to archive if it's stream
even though not in standby mode, to avoid that assertion failure.

This issue doesn't cause the server to crash when built with assertion
disabled. In this case, the attempt to retrieve WAL file from stream not
in standby mode just fails. And then recovery tries to retrieve WAL file
from archive or pg_wal.

Back-patch to all supported branches.

Author: Kyotaro Horiguchi
Reviewed-by: Fujii Masao
Discussion: https://postgr.es/m/20200227.124830.2197604521555566121.horikyota.ntt@gmail.com
2020-03-09 15:32:32 +09:00
Fujii Masao e65497df8f Report progress of streaming base backup.
This commit adds pg_stat_progress_basebackup view that reports
the progress while an application like pg_basebackup is taking
a base backup. This uses the progress reporting infrastructure
added by c16dc1aca5, adding support for streaming base backup.

Bump catversion.

Author: Fujii Masao
Reviewed-by: Kyotaro Horiguchi, Amit Langote, Sergei Kornilov
Discussion: https://postgr.es/m/9ed8b801-8215-1f3d-62d7-65bff53f6e94@oss.nttdata.com
2020-03-03 12:03:43 +09:00
Tom Lane 3d475515a1 Account explicitly for long-lived FDs that are allocated outside fd.c.
The comments in fd.c have long claimed that all file allocations should
go through that module, but in reality that's not always practical.
fd.c doesn't supply APIs for invoking some FD-producing syscalls like
pipe() or epoll_create(); and the APIs it does supply for non-virtual
FDs are mostly insistent on releasing those FDs at transaction end;
and in some cases the actual open() call is in code that can't be made
to use fd.c, such as libpq.

This has led to a situation where, in a modern server, there are likely
to be seven or so long-lived FDs per backend process that are not known
to fd.c.  Since NUM_RESERVED_FDS is only 10, that meant we had *very*
few spare FDs if max_files_per_process is >= the system ulimit and
fd.c had opened all the files it thought it safely could.  The
contrib/postgres_fdw regression test, in particular, could easily be
made to fall over by running it under a restrictive ulimit.

To improve matters, invent functions Acquire/Reserve/ReleaseExternalFD
that allow outside callers to tell fd.c that they have or want to allocate
a FD that's not directly managed by fd.c.  Add calls to track all the
fixed FDs in a standard backend session, so that we are honestly
guaranteeing that NUM_RESERVED_FDS FDs remain unused below the EMFILE
limit in a backend's idle state.  The coding rules for these functions say
that there's no need to call them in code that just allocates one FD over
a fairly short interval; we can dip into NUM_RESERVED_FDS for such cases.
That means that there aren't all that many places where we need to worry.
But postgres_fdw and dblink must use this facility to account for
long-lived FDs consumed by libpq connections.  There may be other places
where it's worth doing such accounting, too, but this seems like enough
to solve the immediate problem.

Internally to fd.c, "external" FDs are limited to max_safe_fds/3 FDs.
(Callers can choose to ignore this limit, but of course it's unwise
to do so except for fixed file allocations.)  I also reduced the limit
on "allocated" files to max_safe_fds/3 FDs (it had been max_safe_fds/2).
Conceivably a smarter rule could be used here --- but in practice,
on reasonable systems, max_safe_fds should be large enough that this
isn't much of an issue, so KISS for now.  To avoid possible regression
in the number of external or allocated files that can be opened,
increase FD_MINFREE and the lower limit on max_files_per_process a
little bit; we now insist that the effective "ulimit -n" be at least 64.

This seems like pretty clearly a bug fix, but in view of the lack of
field complaints, I'll refrain from risking a back-patch.

Discussion: https://postgr.es/m/E1izCmM-0005pV-Co@gemulon.postgresql.org
2020-02-24 17:28:33 -05:00
Peter Eisentraut 79c2385915 Factor out InitControlFile() from BootStrapXLOG()
Right now this only makes BootStrapXLOG() a bit more manageable, but
in the future there may be external callers.

Discussion: https://www.postgresql.org/message-id/e8f86ba5-48f1-a80a-7f1d-b76bcb9c5c47@2ndquadrant.com
2020-02-22 12:09:27 +01:00
Peter Eisentraut 9745f93afc Reformat code comment
Discussion: https://www.postgresql.org/message-id/e8f86ba5-48f1-a80a-7f1d-b76bcb9c5c47@2ndquadrant.com
2020-02-22 12:09:27 +01:00
Peter Eisentraut dc788668bb Fail if recovery target is not reached
Before, if a recovery target is configured, but the archive ended
before the target was reached, recovery would end and the server would
promote without further notice.  That was deemed to be pretty wrong.
With this change, if the recovery target is not reached, it is a fatal
error.

Based-on-patch-by: Leif Gunnar Erlandsen <leif@lako.no>
Reviewed-by: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/993736dd3f1713ec1f63fc3b653839f5@lako.no
2020-01-29 15:58:14 +01:00
Heikki Linnakangas 30012a04a6 Fix randAccess setting in ReadRecord()
Commit 38a957316d got this backwards.

Author: Kyotaro Horiguchi
Discussion: https://www.postgresql.org/message-id/20200128.194408.2260703306774646445.horikyota.ntt@gmail.com
2020-01-28 12:55:30 +02:00
Heikki Linnakangas 38a957316d Refactor XLogReadRecord(), adding XLogBeginRead() function.
The signature of XLogReadRecord() required the caller to pass the starting
WAL position as argument, or InvalidXLogRecPtr to continue reading at the
end of previous record. That's slightly awkward to the callers, as most
of them don't want to randomly jump around in the WAL stream, but start
reading at one position and then read everything from that point onwards.
Remove the 'RecPtr' argument and add a new function XLogBeginRead() to
specify the starting position instead. That's more convenient for the
callers. Also, xlogreader holds state that is reset when you change the
starting position, so having a separate function for doing that feels like
a more natural fit.

This changes XLogFindNextRecord() function so that it doesn't reset the
xlogreader's state to what it was before the call anymore. Instead, it
positions the xlogreader to the found record, like XLogBeginRead().

Reviewed-by: Kyotaro Horiguchi, Alvaro Herrera
Discussion: https://www.postgresql.org/message-id/5382a7a3-debe-be31-c860-cb810c08f366%40iki.fi
2020-01-26 11:39:00 +02:00
Bruce Momjian 7559d8ebfa Update copyrights for 2020
Backpatch-through: update all files in master, backpatch legal files through 9.4
2020-01-01 12:21:45 -05:00
Alvaro Herrera c4dcd9144b Avoid splitting C string literals with \-newline
Using \ is unnecessary and ugly, so remove that.  While at it, stitch
the literals back into a single line: we've long discouraged splitting
error message literals even when they go past the 80 chars line limit,
to improve greppability.

Leave contrib/tablefunc alone.

Discussion: https://postgr.es/m/20191223195156.GA12271@alvherre.pgsql
2019-12-24 12:44:12 -03:00
Robert Haas 303640199d Fix minor problems with non-exclusive backup cleanup.
The previous coding imagined that it could call before_shmem_exit()
when a non-exclusive backup began and then remove the previously-added
handler by calling cancel_before_shmem_exit() when that backup
ended. However, this only works provided that nothing else in the
system has registered a before_shmem_exit() hook in the interim,
because cancel_before_shmem_exit() is documented to remove a callback
only if it is the latest callback registered. It also only works
if nothing can ERROR out between the time that sessionBackupState
is reset and the time that cancel_before_shmem_exit(), which doesn't
seem to be strictly true.

To fix, leave the handler installed for the lifetime of the session,
arrange to install it just once, and teach it to quietly do nothing if
there isn't a non-exclusive backup in process.

This is a bug, but for now I'm not going to back-patch, because the
consequences are minor. It's possible to cause a spurious warning
to be generated, but that doesn't really matter. It's also possible
to trigger an assertion failure, but production builds shouldn't
have assertions enabled.

Patch by me, reviewed by Kyotaro Horiguchi, Michael Paquier (who
preferred a different approach, but got outvoted), Fujii Masao,
and Tom Lane, and with comments by various others.

Discussion: http://postgr.es/m/CA+TgmobMjnyBfNhGTKQEDbqXYE3_rXWpc4CM63fhyerNCes3mA@mail.gmail.com
2019-12-19 09:06:54 -05:00
Michael Paquier 70116493a8 Remove shadow variables linked to RedoRecPtr in xlog.c
This changes the routines in charge of recycling WAL segments past the
last redo LSN to not use anymore "RedoRecPtr" as a local variable, which
is also available in the context of the session as a static declaration,
replacing it with "lastredoptr".  This confusion has been introduced by
d9fadbf, so backpatch down to v11 like the other commit.

Thanks to Tom Lane, Robert Haas, Alvaro Herrera, Mark Dilger and Kyotaro
Horiguchi for the input provided.

Author: Ranier Vilela
Discussion: https://postgr.es/m/MN2PR18MB2927F7B5F690065E1194B258E35D0@MN2PR18MB2927.namprd18.prod.outlook.com
Backpatch-through: 11
2019-12-18 10:11:13 +09:00
Alvaro Herrera 91fca4bb60 Demote variable from global to local
recoveryDelayUntilTime was introduced by commit 36da3cfb45 as a global
because its method of operation was devilishly intrincate.  Commit
c945af80cf removed all that complexity and could have turned it into a
local variable, but didn't.  Do so now.

Discussion: https://postgr.es/m/20191213200751.GA10731@alvherre.pgsql
Reviewed-by: Michaël Paquier, Daniel Gustafsson
2019-12-16 14:23:56 -03:00
Michael Paquier 68ab982906 Fix thinkos from commit 9989d37
Error messages referring to incorrect WAL segment names could have been
generated for a fsync() failure or when creating a new segment at the
end of recovery.
2019-12-03 18:59:09 +09:00
Michael Paquier 9989d37d1c Remove XLogFileNameP() from the tree
XLogFileNameP() is a wrapper routine able to build a palloc'd string for
a WAL segment name, which is used for error string generation.  There
were several code paths where it gets called in a critical section,
where memory allocation is not allowed.  This results in triggering
an assertion failure instead of generating the wanted error message.

Another, more annoying, problem is that if the allocation to generate
the WAL segment name fails on OOM, then the failure would be escalated
to a PANIC.

This removes the routine and all its callers are replaced with a logic
using a fixed-size buffer.  This way, all the existing mistakes are
fixed and future ones are prevented.

Author: Masahiko Sawada
Reviewed-by: Michael Paquier, Álvaro Herrera
Discussion: https://postgr.es/m/CA+fd4k5gC9H4uoWMLg9K_QfNrnkkdEw+-AFveob9YX7z8JnKTA@mail.gmail.com
2019-12-03 15:06:04 +09:00
Peter Eisentraut 2e4db241bf Remove configure --disable-float4-byval
This build option was only useful to maintain compatibility for
version-0 functions, but those are no longer supported, so this option
can be removed.

float4 is now always pass-by-value; the pass-by-reference code path is
completely removed.

Discussion: https://www.postgresql.org/message-id/flat/f3e1e576-2749-bbd7-2d57-3f9dcf75255a@2ndquadrant.com
2019-11-21 18:29:21 +01:00
Amit Kapila 14aec03502 Make the order of the header file includes consistent in backend modules.
Similar to commits 7e735035f2 and dddf4cdc33, this commit makes the order
of header file inclusion consistent for backend modules.

In the passing, removed a couple of duplicate inclusions.

Author: Vignesh C
Reviewed-by: Kuntal Ghosh and Amit Kapila
Discussion: https://postgr.es/m/CALDaNm2Sznv8RR6Ex-iJO6xAdsxgWhCoETkaYX=+9DW3q0QCfA@mail.gmail.com
2019-11-12 08:30:16 +05:30
Peter Eisentraut b85e43feb3 More precise errors from initial pg_control check
Use a separate error message for invalid checkpoint location and
invalid state instead of just "invalid data" for both.

Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://www.postgresql.org/message-id/20191107041630.GK1768@paquier.xyz
2019-11-08 08:03:16 +01:00
Michael Paquier 51970fa8df Fix initialization of fake LSN for unlogged relations
9155580 has changed the value of the first fake LSN for unlogged
relations from 1 to FirstNormalUnloggedLSN (aka 1000), GiST requiring a
non-zero LSN on some pages to allow an interlocking logic to work, but
its value was still initialized to 1 at the beginning of recovery or
after running pg_resetwal.  This fixes the initialization for both code
paths.

Author: Takayuki Tsunakawa
Reviewed-by: Dilip Kumar, Kyotaro Horiguchi, Michael Paquier
Discussion: https://postgr.es/m/OSBPR01MB2503CE851940C17DE44AE3D9FE6F0@OSBPR01MB2503.jpnprd01.prod.outlook.com
Backpatch-through: 12
2019-10-27 13:54:12 +09:00
Fujii Masao 3b0c59ac1c Fix typo in xlog.c.
Author: Fujii Masao
Reviewed-by: Amit Kapila
Discussion: https://postgr.es/m/CAHGQGwH7dtYvOZZ8c0AG5AJwH5pfiRdKaCptY1_RdHy0HYeRfQ@mail.gmail.com
2019-10-24 14:13:36 +09:00
Fujii Masao ec1259e880 Fix failure of archive recovery with recovery_min_apply_delay enabled.
recovery_min_apply_delay parameter is intended for use with streaming
replication deployments. However, the document clearly explains that
the parameter will be honored in all cases if it's specified. So it should
take effect even if in archive recovery. But, previously, archive recovery
with recovery_min_apply_delay enabled always failed, and caused assertion
failure if --enable-caasert is enabled.

The cause of this problem is that; the ownership of recoveryWakeupLatch
that recovery_min_apply_delay uses was taken only when standby mode
is requested. So unowned latch could be used in archive recovery, and
which caused the failure.

This commit changes recovery code so that the ownership of
recoveryWakeupLatch is taken even in archive recovery. Which prevents
archive recovery with recovery_min_apply_delay from failing.

Back-patch to v9.4 where recovery_min_apply_delay was added.

Author: Fujii Masao
Reviewed-by: Michael Paquier
Discussion: https://postgr.es/m/CAHGQGwEyD6HdZLfdWc+95g=VQFPR4zQL4n+yHxQgGEGjaSVheQ@mail.gmail.com
2019-10-18 22:32:18 +09:00
Fujii Masao 9b95a36be8 Make crash recovery ignore recovery_min_apply_delay setting.
In v11 or before, this setting could not take effect in crash recovery
because it's specified in recovery.conf and crash recovery always
starts without recovery.conf. But commit 2dedf4d9a8 integrated
recovery.conf into postgresql.conf and which unexpectedly allowed
this setting to take effect even in crash recovery. This is definitely
not good behavior.

To fix the issue, this commit makes crash recovery always ignore
recovery_min_apply_delay setting.

Back-patch to v12 where the issue was added.

Author: Fujii Masao
Reviewed-by: Michael Paquier
Discussion: https://postgr.es/m/CAHGQGwEyD6HdZLfdWc+95g=VQFPR4zQL4n+yHxQgGEGjaSVheQ@mail.gmail.com
Discussion: https://postgr.es/m/e445616d-023e-a268-8aa1-67b8b335340c@pgmasters.net
2019-10-18 22:24:18 +09:00
Fujii Masao 20961ceaf0 Make crash recovery ignore restore_command and recovery_end_command settings.
In v11 or before, those settings could not take effect in crash recovery
because they are specified in recovery.conf and crash recovery always
starts without recovery.conf. But commit 2dedf4d9a8 integrated
recovery.conf into postgresql.conf and which unexpectedly allowed
those settings to take effect even in crash recovery. This is definitely
not good behavior.

To fix the issue, this commit makes crash recovery always ignore
restore_command and recovery_end_command settings.

Back-patch to v12 where the issue was added.

Author: Fujii Masao
Reviewed-by: Peter Eisentraut
Discussion: https://postgr.es/m/e445616d-023e-a268-8aa1-67b8b335340c@pgmasters.net
2019-10-11 15:47:59 +09:00
Michael Paquier df86e52cac Remove temporary WAL and history files at the end of archive recovery
cbc55da has reworked the order of some actions at the end of archive
recovery.  Unfortunately this overlooked the fact that the startup
process needs to remove RECOVERYXLOG (for temporary WAL segment newly
recovered from archives) and RECOVERYHISTORY (for temporary history
file) at this step, leaving the files around even after recovery ended.

Backpatch to 9.5, like the previous commit.

Author: Sawada Masahiko
Reviewed-by: Fujii Masao, Michael Paquier
Discussion: https://postgr.es/m/CAD21AoBO_eDQub6zojFnWtnmutRBWvYf7=cW4Hsqj+U_R26w3Q@mail.gmail.com
Backpatch-through: 9.5
2019-10-02 15:53:07 +09:00
Fujii Masao 7acf8a876b Make crash recovery ignore recovery target settings.
In v11 or before, recovery target settings could not take effect in
crash recovery because they are specified in recovery.conf and
crash recovery always starts without recovery.conf. But commit
2dedf4d9a8 integrated recovery.conf into postgresql.conf and
which unexpectedly allowed recovery target settings to take effect
even in crash recovery. This is definitely not good behavior.

To fix the issue, this commit makes crash recovery always ignore
recovery target settings.

Back-patch to v12.

Author: Peter Eisentraut
Reviewed-by: Fujii Masao
Discussion: https://postgr.es/m/e445616d-023e-a268-8aa1-67b8b335340c@pgmasters.net
2019-09-30 10:18:15 +09:00
Alvaro Herrera 709d003fbd Rework WAL-reading supporting structs
The state-tracking of WAL reading in various places was pretty messy,
mostly because the ancient physical-replication WAL reading code wasn't
using the XLogReader abstraction.  This led to some untidy code.  Make
it prettier by creating two additional supporting structs,
WALSegmentContext and WALOpenSegment which keep track of WAL-reading
state.  This makes code cleaner, as well as supports more future
cleanup.

Author: Antonin Houska
Reviewed-by: Álvaro Herrera and (older versions) Robert Haas
Discussion: https://postgr.es/m/14984.1554998742@spoje.net
2019-09-24 16:39:53 -03:00
Peter Eisentraut 887248e97e Message style fixes 2019-09-23 13:38:39 +02:00
Peter Eisentraut 48770492c3 Add some const decorations to array constants
Author: Mark G <markg735@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/CAEeOP_YFVeFjq4zDZLDQbLSRFxBiTpwBQHxCNgGd%2Bp5VztTXyQ%40mail.gmail.com
2019-09-17 22:03:00 +02:00
Robert Haas 8b94dab066 Split tuptoaster.c into three separate files.
detoast.c/h contain functions required to detoast a datum, partially
or completely, plus a few other utility functions for examining the
size of toasted datums.

toast_internals.c/h contain functions that are used internally to the
TOAST subsystem but which (mostly) do not need to be accessed from
outside.

heaptoast.c/h contains code that is intrinsically specific to the
heap AM, either because it operates on HeapTuples or is based on the
layout of a heap page.

detoast.c and toast_internals.c are placed in
src/backend/access/common rather than src/backend/access/heap.  At
present, both files still have dependencies on the heap, but that will
be improved in a future commit.

Patch by me, reviewed and tested by Prabhat Sabu, Thomas Munro,
Andres Freund, and Álvaro Herrera.

Discussion: http://postgr.es/m/CA+TgmoZv-=2iWM4jcw5ZhJeL18HF96+W1yJeYrnGMYdkFFnEpQ@mail.gmail.com
2019-09-05 13:15:10 -04:00
Michael Paquier c96581abe4 Fix inconsistencies and typos in the tree, take 11
This fixes various typos in docs and comments, and removes some orphaned
definitions.

Author: Alexander Lakhin
Discussion: https://postgr.es/m/5da8e325-c665-da95-21e0-c8a99ea61fbf@gmail.com
2019-08-19 16:21:39 +09:00
Michael Paquier eb43f3d193 Fix inconsistencies and typos in the tree
This is numbered take 8, and addresses again a set of issues with code
comments, variable names and unreferenced variables.

Author: Alexander Lakhin
Discussion: https://postgr.es/m/b137b5eb-9c95-9c2f-586e-38aba7d59788@gmail.com
2019-07-29 12:28:30 +09:00
Michael Paquier 23bccc823d Fix inconsistencies and typos in the tree
This is numbered take 7, and addresses a set of issues with code
comments, variable names and unreferenced variables.

Author: Alexander Lakhin
Discussion: https://postgr.es/m/dff75442-2468-f74f-568c-6006e141062f@gmail.com
2019-07-22 10:01:50 +09:00
Peter Eisentraut 7e9a4c5c3d Use consistent style for checking return from system calls
Use

    if (something() != 0)
        error ...

instead of just

    if (something)
        error ...

The latter is not incorrect, but it's a bit confusing and not the
common style.

Discussion: https://www.postgresql.org/message-id/flat/5de61b6b-8be9-7771-0048-860328efe027%402ndquadrant.com
2019-07-07 15:28:49 +02:00
Peter Eisentraut 21f428ebde Don't call data type input functions in GUC check hooks
Instead of calling pg_lsn_in() in check_recovery_target_lsn and
timestamptz_in() in check_recovery_target_time, reorganize the
respective code so that we don't raise any errors in the check hooks.
The previous code tried to use PG_TRY/PG_CATCH to handle errors in a
way that is not safe, so now the code contains no ereport() calls and
can operate safely within the GUC error handling system.

Moreover, since the interpretation of the recovery_target_time string
may depend on the time zone, we cannot do the final processing of that
string until all the GUC processing is done.  Instead,
check_recovery_target_time() now does some parsing for syntax
checking, but the actual conversion to a timestamptz value is done
later in the recovery code that uses it.

Reported-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://www.postgresql.org/message-id/flat/20190611061115.njjwkagvxp4qujhp%40alap3.anarazel.de
2019-06-30 10:27:43 +02:00
Amit Kapila 92c4abc736 Fix assorted inconsistencies.
There were a number of issues in the recent commits which include typos,
code and comments mismatch, leftover function declarations.  Fix them.

Reported-by: Alexander Lakhin
Author: Alexander Lakhin, Amit Kapila and Amit Langote
Reviewed-by: Amit Kapila
Discussion: https://postgr.es/m/ef0c0232-0c1d-3a35-63d4-0ebd06e31387@gmail.com
2019-06-08 08:16:38 +05:30
Tom Lane 8255c7a5ee Phase 2 pgindent run for v12.
Switch to 2.1 version of pg_bsd_indent.  This formats
multiline function declarations "correctly", that is with
additional lines of parameter declarations indented to match
where the first line's left parenthesis is.

Discussion: https://postgr.es/m/CAEepm=0P3FeTXRcU5B2W3jv3PgRVZ-kGUXLGfd42FFhUROO3ug@mail.gmail.com
2019-05-22 13:04:48 -04:00
Tom Lane be76af171c Initial pgindent run for v12.
This is still using the 2.0 version of pg_bsd_indent.
I thought it would be good to commit this separately,
so as to document the differences between 2.0 and 2.1 behavior.

Discussion: https://postgr.es/m/16296.1558103386@sss.pgh.pa.us
2019-05-22 12:55:34 -04:00
Fujii Masao b8e2170e40 Fix comment for issue_xlog_fsync().
"segno" is the argument for the function, not "log" and "seg".

Author: Antonin Houska
Discussion: https://postgr.es/m/11863.1558361020@spoje.net
2019-05-21 00:44:00 +09:00
Thomas Munro 3eb77eba5a Refactor the fsync queue for wider use.
Previously, md.c and checkpointer.c were tightly integrated so that
fsync calls could be handed off and processed in the background.
Introduce a system of callbacks and file tags, so that other modules
can hand off fsync work in the same way.

For now only md.c uses the new interface, but other users are being
proposed.  Since there may be use cases that are not strictly SMGR
implementations, use a new function table for sync handlers rather
than extending the traditional SMGR one.

Instead of using a bitmapset of segment numbers for each RelFileNode
in the checkpointer's hash table, make the segment number part of the
key.  This requires sending explicit "forget" requests for every
segment individually when relations are dropped, but suits the file
layout schemes of proposed future users better (ie sparse or high
segment numbers).

Author: Shawn Debnath and Thomas Munro
Reviewed-by: Thomas Munro, Andres Freund
Discussion: https://postgr.es/m/CAEepm=2gTANm=e3ARnJT=n0h8hf88wqmaZxk0JYkxw+b21fNrw@mail.gmail.com
2019-04-04 23:38:38 +13:00
Heikki Linnakangas 9155580fd5 Generate less WAL during GiST, GIN and SP-GiST index build.
Instead of WAL-logging every modification during the build separately,
first build the index without any WAL-logging, and make a separate pass
through the index at the end, to write all pages to the WAL. This
significantly reduces the amount of WAL generated, and is usually also
faster, despite the extra I/O needed for the extra scan through the index.
WAL generated this way is also faster to replay.

For GiST, the LSN-NSN interlock makes this a little tricky. All pages must
be marked with a valid (i.e. non-zero) LSN, so that the parent-child
LSN-NSN interlock works correctly. We now use magic value 1 for that during
index build. Change the fake LSN counter to begin from 1000, so that 1 is
safely smaller than any real or fake LSN. 2 would've been enough for our
purposes, but let's reserve a bigger range, in case we need more special
values in the future.

Author: Anastasia Lubennikova, Andrey V. Lepikhov
Reviewed-by: Heikki Linnakangas, Dmitry Dolgov
2019-04-03 17:03:15 +03:00
Thomas Munro 475861b261 Add wal_recycle and wal_init_zero GUCs.
On at least ZFS, it can be beneficial to create new WAL files every
time and not to bother zero-filling them.  Since it's not clear which
other filesystems might benefit from one or both of those things,
add individual GUCs to control those two behaviors independently and
make only very general statements in the docs.

Author: Jerry Jelinek, with some adjustments by Thomas Munro
Reviewed-by: Alvaro Herrera, Andres Freund, Tomas Vondra, Robert Haas and others
Discussion: https://postgr.es/m/CACPQ5Fo00QR7LNAcd1ZjgoBi4y97%2BK760YABs0vQHH5dLdkkMA%40mail.gmail.com
2019-04-02 14:37:14 +13:00
Peter Eisentraut cc8d415117 Unified logging system for command-line programs
This unifies the various ad hoc logging (message printing, error
printing) systems used throughout the command-line programs.

Features:

- Program name is automatically prefixed.

- Message string does not end with newline.  This removes a common
  source of inconsistencies and omissions.

- Additionally, a final newline is automatically stripped, simplifying
  use of PQerrorMessage() etc., another common source of mistakes.

- I converted error message strings to use %m where possible.

- As a result of the above several points, more translatable message
  strings can be shared between different components and between
  frontends and backend, without gratuitous punctuation or whitespace
  differences.

- There is support for setting a "log level".  This is not meant to be
  user-facing, but can be used internally to implement debug or
  verbose modes.

- Lazy argument evaluation, so no significant overhead if logging at
  some level is disabled.

- Some color in the messages, similar to gcc and clang.  Set
  PG_COLOR=auto to try it out.  Some colors are predefined, but can be
  customized by setting PG_COLORS.

- Common files (common/, fe_utils/, etc.) can handle logging much more
  simply by just using one API without worrying too much about the
  context of the calling program, requiring callbacks, or having to
  pass "progname" around everywhere.

- Some programs called setvbuf() to make sure that stderr is
  unbuffered, even on Windows.  But not all programs did that.  This
  is now done centrally.

Soft goals:

- Reduces vertical space use and visual complexity of error reporting
  in the source code.

- Encourages more deliberate classification of messages.  For example,
  in some cases it wasn't clear without analyzing the surrounding code
  whether a message was meant as an error or just an info.

- Concepts and terms are vaguely aligned with popular logging
  frameworks such as log4j and Python logging.

This is all just about printing stuff out.  Nothing affects program
flow (e.g., fatal exits).  The uses are just too varied to do that.
Some existing code had wrappers that do some kind of print-and-exit,
and I adapted those.

I tried to keep the output mostly the same, but there is a lot of
historical baggage to unwind and special cases to consider, and I
might not always have succeeded.  One significant change is that
pg_rewind used to write all error messages to stdout.  That is now
changed to stderr.

Reviewed-by: Donald Dong <xdong@csumb.edu>
Reviewed-by: Arthur Zakirov <a.zakirov@postgrespro.ru>
Discussion: https://www.postgresql.org/message-id/flat/6a609b43-4f57-7348-6480-bd022f924310@2ndquadrant.com
2019-04-01 20:01:35 +02:00
Robert Haas c900c15269 Warn more strongly about the dangers of exclusive backup mode.
Especially, warn about the hazards of mishandling the backup_label
file.  Adjust a couple of server messages to be more clear about
the hazards associated with removing backup_label files, too.

David Steele and Robert Haas, reviewed by Laurenz Albe, Martín
Marqués, Peter Eisentraut, and Magnus Hagander.

Discussion: http://postgr.es/m/7d85c387-000e-16f0-e00b-50bf83c22127@pgmasters.net
2019-03-29 08:15:16 -04:00
Thomas Munro 2fc7af5e96 Add basic infrastructure for 64 bit transaction IDs.
Instead of inferring epoch progress from xids and checkpoints,
introduce a 64 bit FullTransactionId type and use it to track xid
generation.  This fixes an unlikely bug where the epoch is reported
incorrectly if the range of active xids wraps around more than once
between checkpoints.

The only user-visible effect of this commit is to correct the epoch
used by txid_current() and txid_status(), also visible with
pg_controldata, in those rare circumstances.  It also creates some
basic infrastructure so that later patches can use 64 bit
transaction IDs in more places.

The new type is a struct that we pass by value, as a form of strong
typedef.  This prevents the sort of accidental confusion between
TransactionId and FullTransactionId that would be possible if we
were to use a plain old uint64.

Author: Thomas Munro
Reported-by: Amit Kapila
Reviewed-by: Andres Freund, Tom Lane, Heikki Linnakangas
Discussion: https://postgr.es/m/CAA4eK1%2BMv%2Bmb0HFfWM9Srtc6MVe160WFurXV68iAFMcagRZ0dQ%40mail.gmail.com
2019-03-28 18:12:20 +13:00
Michael Paquier 8b938d36f7 Refactor more code logic to update the control file
ce6afc6 has begun the refactoring work by plugging pg_rewind into a
central routine to update the control file, and left around two extra
copies, with one in xlog.c for the backend and one in pg_resetwal.c.  By
adding an extra option to the central routine in controldata_utils.c to
control if a flush of the control file needs to be done, it is proving
to be straight-forward to make xlog.c and pg_resetwal.c use the central
code path at the condition of moving the wait event tracking there.
Hence, this allows to have only one central code path to update the
control file, shaving the code from the duplicates.

This refactoring actually fixes a problem in pg_resetwal.  Previously,
the control file was first removed before being recreated.  So if a
crash happened between the moment the file was removed and the moment
the file was created, then it would have been possible to not have a
control file anymore in the database folder.

Author: Fabien Coelho
Reviewed-by: Michael Paquier
Discussion: https://postgr.es/m/alpine.DEB.2.21.1903170935210.2506@lancre
2019-03-18 12:59:35 +09:00
Peter Eisentraut 893d6f8a1f Avoid casting away a const 2019-03-16 10:13:03 +01:00
Michael Paquier f2d84a4a6b Adjust error message for partial writes in WAL segments
93473c6 has removed openLogOff, changing on the way the error message
which is used to report partial writes to WAL segments.  The
newly-introduced error message used the offset up to which the write has
happened, keeping always the same total length to write.  This changes
the error message so as the number of bytes left to write are reported.

Reported-by: Michael Paquier
Author: Robert Haas
Discussion: https://postgr.es/m/20190306235251.GA17293@paquier.xyz
2019-03-11 09:31:25 +09:00
Michael Paquier 82a5649fb9 Tighten use of OpenTransientFile and CloseTransientFile
This fixes two sets of issues related to the use of transient files in
the backend:
1) OpenTransientFile() has been used in some code paths with read-write
flags while read-only is sufficient, so switch those calls to be
read-only where necessary.  These have been reported by Joe Conway.
2) When opening transient files, it is up to the caller to close the
file descriptors opened.  In error code paths, CloseTransientFile() gets
called to clean up things before issuing an error.  However in normal
exit paths, a lot of callers of CloseTransientFile() never actually
reported errors, which could leave a file descriptor open without
knowing about it.  This is an issue I complained about a couple of
times, but never had the courage to write and submit a patch, so here we
go.

Note that one frontend code path is impacted by this commit so as an
error is issued when fetching control file data, making backend and
frontend to be treated consistently.

Reported-by: Joe Conway, Michael Paquier
Author: Michael Paquier
Reviewed-by: Álvaro Herrera, Georgios Kokolatos, Joe Conway
Discussion: https://postgr.es/m/20190301023338.GD1348@paquier.xyz
Discussion: https://postgr.es/m/c49b69ec-e2f7-ff33-4f17-0eaa4f2cef27@joeconway.com
2019-03-09 08:50:55 +09:00
Robert Haas 93473c6ac8 Removed unused variable, openLogOff.
Antonin Houska

Discussion: http://postgr.es/m/30413.1551870730@localhost
2019-03-06 09:44:08 -05:00
Michael Paquier ea92368cd1 Move max_wal_senders out of max_connections for connection slot handling
Since its introduction, max_wal_senders is counted as part of
max_connections when it comes to define how many connection slots can be
used for replication connections with a WAL sender context.  This can
lead to confusion for some users, as it could be possible to block a
base backup or replication from happening because other backend sessions
are already taken for other purposes by an application, and
superuser-only connection slots are not a correct solution to handle
that case.

This commit makes max_wal_senders independent of max_connections for its
handling of PGPROC entries in ProcGlobal, meaning that connection slots
for WAL senders are handled using their own free queue, like autovacuum
workers and bgworkers.

One compatibility issue that this change creates is that a standby now
requires to have a value of max_wal_senders at least equal to its
primary.  So, if a standby created enforces the value of
max_wal_senders to be lower than that, then this could break failovers.
Normally this should not be an issue though, as any settings of a
standby are inherited from its primary as postgresql.conf gets normally
copied as part of a base backup, so parameters would be consistent.

Author: Alexander Kukushkin
Reviewed-by: Kyotaro Horiguchi, Petr Jelínek, Masahiko Sawada, Oleksii
Kliukin
Discussion: https://postgr.es/m/CAFh8B=nBzHQeYAu0b8fjK-AF1X4+_p6GRtwG+cCgs6Vci2uRuQ@mail.gmail.com
2019-02-12 10:07:56 +09:00
Michael Paquier c93001b3f9 Adjust comment about timeout when waiting for WAL at recovery
A timeout of 5s is used when waiting for WAL to become available at
recovery so as the startup process is able to react promptly if a
trigger file shows up.  However this missed the fact that the startup
process also relies on the timeout to check periodically the status of
any active WAL receiver.

Discussion: https://postgr.es/m/20190131070956.GE13429@paquier.xyz
2019-02-01 10:46:45 +09:00
Peter Eisentraut 0acb3bc33a Change default of recovery_target_timeline to 'latest'
This is what one usually wants for recovery and almost always wants
for a standby.

Discussion: https://www.postgresql.org/message-id/flat/6dd2c23a-4162-8469-410f-bfe146e28c0c@2ndquadrant.com/
Reviewed-by: David Steele <david@pgmasters.net>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
2019-01-13 10:01:05 +01:00
Bruce Momjian 97c39498e5 Update copyright for 2019
Backpatch-through: certain files through 9.4
2019-01-02 12:44:25 -05:00
Michael Paquier 1707a0d2aa Remove configure switch --disable-strong-random
This removes a portion of infrastructure introduced by fe0a0b5 to allow
compilation of Postgres in environments where no strong random source is
available, meaning that there is no linking to OpenSSL and no
/dev/urandom (Windows having its own CryptoAPI).  No systems shipped
this century lack /dev/urandom, and the buildfarm is actually not
testing this switch at all, so just remove it.  This simplifies
particularly some backend code which included a fallback implementation
using shared memory, and removes a set of alternate regression output
files from pgcrypto.

Author: Michael Paquier
Reviewed-by: Tom Lane
Discussion: https://postgr.es/m/20181230063219.GG608@paquier.xyz
2019-01-01 20:05:51 +09:00
Michael Paquier 8d3b389ec3 Improve comments and logs in do_pg_stop/start_backup
The function name pg_stop_backup() has been included for ages in some
log messages when stopping the backup, which is confusing for base
backups taken with the replication protocol because this function is
never called.  Some other comments and messages in this area are
improved while on it.

The new wording is based on input and suggestions from several people,
all listed below.

Author: Michael Paquier
Reviewed-by: Peter Eisentraut, Álvaro Herrera, Tom Lane
Discussion: https://postgr.es/m/20181221040510.GA12599@paquier.xyz
2019-01-01 08:53:02 +09:00
Peter Eisentraut 60d99797bf Change "checkpoint starting" message to use "wal"
This catches up with the recent renaming of all user-facing mentions
of "xlog" to "wal".

Discussion: https://www.postgresql.org/message-id/flat/20181129084708.GA9562%40msg.credativ.de
2018-12-30 22:23:01 +01:00
Tom Lane 8528e3d849 Fix failure to check for open() or fsync() failures.
While it seems OK to not be concerned about fsync() failure for a
pre-existing signal file, it's not OK to not even check for open()
failure.  This at least causes complaints from static analyzers,
and I think on some platforms passing -1 to fsync() or close() might
trigger assertion-type failures.  Also add (void) casts to make clear
that we're ignoring fsync's result intentionally.

Oversights in commit 2dedf4d9a, noted by Coverity.
2018-12-26 16:08:17 -05:00
Peter Eisentraut 2dedf4d9a8 Integrate recovery.conf into postgresql.conf
recovery.conf settings are now set in postgresql.conf (or other GUC
sources).  Currently, all the affected settings are PGC_POSTMASTER;
this could be refined in the future case by case.

Recovery is now initiated by a file recovery.signal.  Standby mode is
initiated by a file standby.signal.  The standby_mode setting is
gone.  If a recovery.conf file is found, an error is issued.

The trigger_file setting has been renamed to promote_trigger_file as
part of the move.

The documentation chapter "Recovery Configuration" has been integrated
into "Server Configuration".

pg_basebackup -R now appends settings to postgresql.auto.conf and
creates a standby.signal file.

Author: Fujii Masao <masao.fujii@gmail.com>
Author: Simon Riggs <simon@2ndquadrant.com>
Author: Abhijit Menon-Sen <ams@2ndquadrant.com>
Author: Sergei Kornilov <sk@zsrv.org>
Discussion: https://www.postgresql.org/message-id/flat/607741529606767@web3g.yandex.ru/
2018-11-25 16:33:40 +01:00
Thomas Munro cfdf4dc4fc Add WL_EXIT_ON_PM_DEATH pseudo-event.
Users of the WaitEventSet and WaitLatch() APIs can now choose between
asking for WL_POSTMASTER_DEATH and then handling it explicitly, or asking
for WL_EXIT_ON_PM_DEATH to trigger immediate exit on postmaster death.
This reduces code duplication, since almost all callers want the latter.

Repair all code that was previously ignoring postmaster death completely,
or requesting the event but ignoring it, or requesting the event but then
doing an unconditional PostmasterIsAlive() call every time through its
event loop (which is an expensive syscall on platforms for which we don't
have USE_POSTMASTER_DEATH_SIGNAL support).

Assert that callers of WaitLatchXXX() under the postmaster remember to
ask for either WL_POSTMASTER_DEATH or WL_EXIT_ON_PM_DEATH, to prevent
future bugs.

The only process that doesn't handle postmaster death is syslogger.  It
waits until all backends holding the write end of the syslog pipe
(including the postmaster) have closed it by exiting, to be sure to
capture any parting messages.  By using the WaitEventSet API directly
it avoids the new assertion, and as a by-product it may be slightly
more efficient on platforms that have epoll().

Author: Thomas Munro
Reviewed-by: Kyotaro Horiguchi, Heikki Linnakangas, Tom Lane
Discussion: https://postgr.es/m/CAEepm%3D1TCviRykkUb69ppWLr_V697rzd1j3eZsRMmbXvETfqbQ%40mail.gmail.com,
            https://postgr.es/m/CAEepm=2LqHzizbe7muD7-2yHUbTOoF7Q+qkSD5Q41kuhttRTwA@mail.gmail.com
2018-11-23 20:46:34 +13:00
Thomas Munro 9ccdd7f66e PANIC on fsync() failure.
On some operating systems, it doesn't make sense to retry fsync(),
because dirty data cached by the kernel may have been dropped on
write-back failure.  In that case the only remaining copy of the
data is in the WAL.  A subsequent fsync() could appear to succeed,
but not have flushed the data.  That means that a future checkpoint
could apparently complete successfully but have lost data.

Therefore, violently prevent any future checkpoint attempts by
panicking on the first fsync() failure.  Note that we already
did the same for WAL data; this change extends that behavior to
non-temporary data files.

Provide a GUC data_sync_retry to control this new behavior, for
users of operating systems that don't eject dirty data, and possibly
forensic/testing uses.  If it is set to on and the write-back error
was transient, a later checkpoint might genuinely succeed (on a
system that does not throw away buffers on failure); if the error is
permanent, later checkpoints will continue to fail.  The GUC defaults
to off, meaning that we panic.

Back-patch to all supported releases.

There is still a narrow window for error-loss on some operating
systems: if the file is closed and later reopened and a write-back
error occurs in the intervening time, but the inode has the bad
luck to be evicted due to memory pressure before we reopen, we could
miss the error.  A later patch will address that with a scheme
for keeping files with dirty data open at all times, but we judge
that to be too complicated to back-patch.

Author: Craig Ringer, with some adjustments by Thomas Munro
Reported-by: Craig Ringer
Reviewed-by: Robert Haas, Thomas Munro, Andres Freund
Discussion: https://postgr.es/m/20180427222842.in2e4mibx45zdth5%40alap3.anarazel.de
2018-11-19 17:41:26 +13:00
Michael Paquier 285bd0ac4a Remove unnecessary memcpy when reading WAL record fitting on page
When reading a WAL record, its contents are copied into an intermediate
buffer.  However, doing so is not necessary if the record fits fully
into the current page, saving one memcpy for each such record.  The
allocation handling of the intermediate buffer is also now done only
when a record crosses a page boundary, shaving some extra cycles when
reading a WAL record.

Author: Andrey Lepikhov
Reviewed-by: Kyotaro Horiguchi, Heikki Linnakangas
Discussion: https://postgr.es/m/c2ea54dd-a1d3-80eb-ddbf-7e6f258e615e@postgrespro.ru
2018-11-19 10:25:48 +09:00
Thomas Munro c24dcd0cfd Use pg_pread() and pg_pwrite() for data files and WAL.
Cut down on system calls by doing random I/O using offset-based OS
routines where available.  Remove the code for tracking the 'virtual'
seek position.  The only reason left to call FileSeek() was to get
the file's size, so provide a new function FileSize() instead.

Author: Oskari Saarenmaa, Thomas Munro
Reviewed-by: Thomas Munro, Jesper Pedersen, Tom Lane, Alvaro Herrera
Discussion: https://postgr.es/m/CAEepm=02rapCpPR3ZGF2vW=SBHSdFYO_bz_f-wwWJonmA3APgw@mail.gmail.com
Discussion: https://postgr.es/m/b8748d39-0b19-0514-a1b9-4e5a28e6a208%40gmail.com
Discussion: https://postgr.es/m/a86bd200-ebbe-d829-e3ca-0c4474b2fcb7%40ohmu.fi
2018-11-07 09:51:50 +13:00
Andres Freund c4ab62f9ac Fix typo in xlog.c.
Author: Daniel Gustafsson
Discussion: https://postgr.es/m/A6817958-949E-4A5B-895D-FA421B6640C2@yesql.se
2018-10-31 07:50:32 -07:00
Michael Paquier 10074651e3 Add pg_promote function
This function is able to promote a standby with this new SQL-callable
function.  Execution access can be granted to non-superusers so that
failover tools can observe the principle of least privilege.

Catalog version is bumped.

Author: Laurenz Albe
Reviewed-by: Michael Paquier, Masahiko Sawada
Discussion: https://postgr.es/m/6e7c79b3ec916cf49742fb8849ed17cd87aed620.camel@cybertec.at
2018-10-25 09:46:00 +09:00