Fix data loss in wal_level=minimal crash recovery of CREATE TABLESPACE.

If the system crashed between CREATE TABLESPACE and the next checkpoint,
the result could be some files in the tablespace unexpectedly containing
no rows.  Affected files would be those for which the system did not
write WAL; see the wal_skip_threshold documentation.  Before v13, a
different set of conditions governed the writing of WAL; see v12's
<sect2 id="populate-pitr">.  (The v12 conditions were broader in some
ways and narrower in others.)  Users may want to audit non-default
tablespaces for unexpected short files.  The bug could have truncated an
index without affecting the associated table, and reindexing the index
would fix that particular problem.

This fixes the bug by making create_tablespace_directories() more like
TablespaceCreateDbspace().  create_tablespace_directories() was
recursively removing tablespace contents, reasoning that WAL redo would
recreate everything removed that way.  That assumption holds for other
wal_level values.  Under wal_level=minimal, the old approach could
delete files for which no other copy existed.  Back-patch to 9.6 (all
supported versions).

Reviewed by Robert Haas and Prabhat Sahu.  Reported by Robert Haas.

Discussion: https://postgr.es/m/CA+TgmoaLO9ncuwvr2nN-J4VEP5XyAcy=zKiHxQzBbFRxxGxm0w@mail.gmail.com
This commit is contained in:
Noah Misch 2021-08-27 23:33:23 -07:00
parent 7c1b0f4c35
commit 6ebd2426bd

View File

@ -603,40 +603,36 @@ create_tablespace_directories(const char *location, const Oid tablespaceoid)
location)));
}
if (InRecovery)
{
/*
* Our theory for replaying a CREATE is to forcibly drop the target
* subdirectory if present, and then recreate it. This may be more
* work than needed, but it is simple to implement.
*/
if (stat(location_with_version_dir, &st) == 0 && S_ISDIR(st.st_mode))
{
if (!rmtree(location_with_version_dir, true))
/* If this failed, MakePGDirectory() below is going to error. */
ereport(WARNING,
(errmsg("some useless files may be left behind in old database directory \"%s\"",
location_with_version_dir)));
}
}
/*
* The creation of the version directory prevents more than one tablespace
* in a single location.
* in a single location. This imitates TablespaceCreateDbspace(), but it
* ignores concurrency and missing parent directories. The chmod() would
* have failed in the absence of a parent. pg_tablespace_spcname_index
* prevents concurrency.
*/
if (MakePGDirectory(location_with_version_dir) < 0)
if (stat(location_with_version_dir, &st) < 0)
{
if (errno == EEXIST)
if (errno != ENOENT)
ereport(ERROR,
(errcode(ERRCODE_OBJECT_IN_USE),
errmsg("directory \"%s\" already in use as a tablespace",
(errcode_for_file_access(),
errmsg("could not stat directory \"%s\": %m",
location_with_version_dir)));
else
else if (MakePGDirectory(location_with_version_dir) < 0)
ereport(ERROR,
(errcode_for_file_access(),
errmsg("could not create directory \"%s\": %m",
location_with_version_dir)));
}
else if (!S_ISDIR(st.st_mode))
ereport(ERROR,
(errcode(ERRCODE_WRONG_OBJECT_TYPE),
errmsg("\"%s\" exists but is not a directory",
location_with_version_dir)));
else if (!InRecovery)
ereport(ERROR,
(errcode(ERRCODE_OBJECT_IN_USE),
errmsg("directory \"%s\" already in use as a tablespace",
location_with_version_dir)));
/*
* In recovery, remove old symlink, in case it points to the wrong place.