postgresql/src
Michael Paquier 6d8727f95e Ensure cleanup of orphan archive status files
When a WAL segment is recycled, its ".ready" and ".done" status files
are also removed automatically; however, this is not done in a durable
manner.  Hence, after a subsequent crash, a ".ready" status file may
still be around while its corresponding segment is already gone.
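The status files live in pg_wal/archive_status/ as empty markers named after their segment; a ".ready" marker is renamed to ".done" once the segment is archived. The C sketch below assumes a POSIX system and shows how the inconsistent state described above could be detected. It scans archive_status/ and counts the ".ready" files whose segment no longer exists in pg_wal/. The helper count_orphan_ready_files() is hypothetical and is not PostgreSQL code.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <dirent.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/* Suffix of "ready to archive" status files. */
#define READY_SUFFIX ".ready"

/*
 * Hypothetical helper: count the .ready files in <wal_dir>/archive_status
 * whose segment <wal_dir>/<segment> is missing (ENOENT).
 * Returns -1 if the status directory cannot be opened.
 */
static int count_orphan_ready_files(const char *wal_dir)
{
    char status_dir[1024];
    char seg_path[1024];
    size_t suffix_len = strlen(READY_SUFFIX);
    struct dirent *de;
    struct stat st;
    int orphans = 0;
    DIR *dir;

    assert(wal_dir != NULL);
    snprintf(status_dir, sizeof(status_dir), "%s/archive_status", wal_dir);
    dir = opendir(status_dir);
    if (dir == NULL)
        return -1;

    while ((de = readdir(dir)) != NULL)
    {
        size_t len = strlen(de->d_name);

        /* Only "<segment>.ready" entries are of interest. */
        if (len <= suffix_len ||
            strcmp(de->d_name + len - suffix_len, READY_SUFFIX) != 0)
            continue;

        /* Strip the suffix to rebuild the segment path. */
        snprintf(seg_path, sizeof(seg_path), "%s/%.*s",
                 wal_dir, (int) (len - suffix_len), de->d_name);

        /* A segment that is gone makes this status file an orphan. */
        if (stat(seg_path, &st) != 0 && errno == ENOENT)
            orphans++;
    }
    closedir(dir);
    return orphans;
}
```

Only ENOENT counts as "missing". Any other stat() failure, such as a permission error, is not treated as proof that the segment is gone.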

If the backend reaches such a state, the archive command would most
likely complain about a non-existent segment and keep retrying,
causing WAL segments to bloat pg_wal/ and potentially making Postgres
crash hard when running out of space.

As status files are removed after each individual segment, using
durable_unlink() would not completely close the window either, since a
crash could happen between the moment the WAL segment is recycled and
the moment its status files are removed.  It would also have some
performance impact, because of the additional fsync() calls needed to
make each removal durable.  Doing the cleanup at recovery is not
cost-free either, as it could make crash recovery take longer than
necessary.
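Making a removal durable means more than calling unlink(). The directory entry change only survives a crash once the parent directory has been flushed. The standalone C approximation below assumes a POSIX system where a directory can be opened read-only and fsync()ed, as on Linux. The function name durable_unlink_sketch() is hypothetical. PostgreSQL's own durable_unlink() is in the backend and is not reproduced here. The sketch shows where the extra per-file fsync() mentioned above comes from.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <libgen.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical approximation of a durable unlink: remove the file, then
 * fsync() its parent directory so the removal survives a crash.
 * Returns 0 on success, -1 on failure.
 */
static int durable_unlink_sketch(const char *path)
{
    char parent[1024];
    int fd;
    int rc;

    assert(path != NULL);
    if (unlink(path) != 0)
        return -1;

    /* dirname() may modify its argument, so work on a copy. */
    snprintf(parent, sizeof(parent), "%s", path);
    fd = open(dirname(parent), O_RDONLY);
    if (fd < 0)
        return -1;

    /* This directory fsync() is the extra cost paid for every removal. */
    rc = fsync(fd);
    close(fd);
    return rc;
}
```

Doing this once per recycled segment multiplies the number of fsync() calls. Even then, a crash between recycling and the unlink still leaves the orphan behind.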

So, instead, following an idea from Stephen Frost, make the archiver
aware of orphan status files and remove them on the fly if the
corresponding segment has gone missing.  Removal failures follow a
model close to the one used for archiving WAL segments: multiple
attempts are made before giving up temporarily, and a successful orphan
removal makes the archiver move immediately to the next WAL segment
considered ready for archiving.
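The following C sketch illustrates the on-the-fly cleanup described above, under stated assumptions. Before archiving a segment taken from a ".ready" file, the code checks whether the segment still exists. If it does not, it tries to unlink the orphan status file a bounded number of times, then tells the caller whether to move on to the next segment or give up until a later cycle. The names check_orphan_status() and OrphanCheckResult are hypothetical. The attempt count of 3 and the one-second pause are also assumptions, not values taken from this commit. The archiver's real logic lives in the backend and differs in detail.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/* Assumed retry budget for removing one orphan status file. */
#define MAX_ORPHAN_REMOVE_ATTEMPTS 3

typedef enum
{
    SEGMENT_PRESENT,      /* segment exists: archive it as usual */
    ORPHAN_REMOVED,       /* orphan gone: move on to the next segment */
    ORPHAN_REMOVE_FAILED  /* give up for now and retry in a later cycle */
} OrphanCheckResult;

/* Hypothetical helper: drop <segment>.ready if <segment> is missing. */
static OrphanCheckResult check_orphan_status(const char *wal_dir,
                                             const char *segment)
{
    char seg_path[1024];
    char ready_path[1024];
    struct stat st;

    assert(wal_dir != NULL && segment != NULL);
    snprintf(seg_path, sizeof(seg_path), "%s/%s", wal_dir, segment);

    /* Only a definite ENOENT marks the status file as an orphan. */
    if (stat(seg_path, &st) == 0 || errno != ENOENT)
        return SEGMENT_PRESENT;

    snprintf(ready_path, sizeof(ready_path), "%s/archive_status/%s.ready",
             wal_dir, segment);
    for (int attempt = 1; attempt <= MAX_ORPHAN_REMOVE_ATTEMPTS; attempt++)
    {
        /* Success, or the file is already gone: either way, move on. */
        if (unlink(ready_path) == 0 || errno == ENOENT)
            return ORPHAN_REMOVED;
        fprintf(stderr, "could not remove \"%s\" (attempt %d)\n",
                ready_path, attempt);
        sleep(1);               /* assumed pause between attempts */
    }
    return ORPHAN_REMOVE_FAILED;
}
```

Returning ORPHAN_REMOVED lets the caller go straight to the next ".ready" file without invoking the archive command. This matches the behavior the commit message describes for a successful orphan removal.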

Author: Michael Paquier
Reviewed-by: Nathan Bossart, Andres Freund, Stephen Frost, Kyotaro
Horiguchi
Discussion: https://postgr.es/m/20180928032827.GF1500@paquier.xyz
2018-12-10 15:00:59 +09:00
backend Ensure cleanup of orphan archive status files 2018-12-10 15:00:59 +09:00
bin Cleanup minor pg_dump memory leaks 2018-12-06 11:11:21 -05:00
common Improve our response to invalid format strings, and detect more cases. 2018-12-06 15:08:44 -05:00
fe_utils Fix translation of special characters in psql's LaTeX output modes. 2018-11-26 17:32:51 -05:00
include Add timestamp of last received message from standby to pg_stat_replication 2018-12-09 16:35:06 +09:00
interfaces In PQprint(), write HTML table trailer before closing the output pipe. 2018-12-07 13:11:30 -05:00
makefiles Add PGXS options to control TAP and isolation tests, take two 2018-12-03 09:27:35 +09:00
pl Fix some errhint and errdetail strings missing a period 2018-12-07 07:47:42 +09:00
port Improve our response to invalid format strings, and detect more cases. 2018-12-06 15:08:44 -05:00
template Yet further rethinking of build changes for macOS Mojave. 2018-11-02 18:54:00 -04:00
test Add timestamp of last received message from standby to pg_stat_replication 2018-12-09 16:35:06 +09:00
timezone Sync our copy of the timezone library with IANA release tzcode2018g. 2018-10-31 09:47:53 -04:00
tools Eliminate parallel-make hazard in ecpg/preproc. 2018-12-01 17:19:51 -05:00
tutorial Deduplicate "invalid input syntax" messages for various types. 2018-07-22 14:58:01 -07:00
.gitignore
DEVELOPERS
Makefile Fix partial-build problems introduced by having more generated headers. 2018-04-09 16:42:10 -04:00
Makefile.global.in Remove useless symbol from Makefile.global. 2018-11-06 10:57:51 -05:00
Makefile.shlib Ensure static libraries have correct mod time even if ranlib messes it up. 2018-11-29 15:53:44 -05:00
nls-global.mk nls-global.mk: search build dir for source files, too 2016-06-07 18:55:18 -04:00