Fix failure to remove non-first segments of temporary tables.

Commit 4ab5dae94 broke mdunlinkfork's logic for removing additional
segments of a multi-gigabyte table, because it neglected to advance
"segno" after unlinking the first segment, in the code path where it
chooses to unlink that one immediately.  Then the main remove loop
gets ENOENT at segment zero and figures it's done, so we never remove
whatever additional segments might exist.

The main problem here is with large temporary tables, but WAL replay
of a drop of a large regular table would also fail to remove extra
segments.  The third case where this path is taken is for non-main
forks; but I doubt it matters for those since they probably never
exceed 1GB.

The simplest fix is just to increment segno after that unlink().
(Probably this logic could do with a more thorough rethink, but not
with mere hours to go before 15.1 wraps.)

While here, also fix an incautious assumption that
register_forget_request cannot change errno.  I don't think that
that has any really bad consequences, as we'd end up trying to unlink
the zero'th segment either way, but it greatly complicates reasoning
about what could happen here.  Also make a couple of other cosmetic
fixes.

Per bug #17679 from Balazs Szilfai.  Back-patch into v15, as the
faulty patch was.

Discussion: https://postgr.es/m/17679-1095d04450cf6a6e@postgresql.org
This commit is contained in:
Tom Lane 2022-11-07 11:36:45 -05:00
parent 7134af1149
commit 5fe0ab4201

View File

@ -330,11 +330,15 @@ mdunlinkfork(RelFileNodeBackend rnode, ForkNumber forkNum, bool isRedo)
{
if (!RelFileNodeBackendIsTemp(rnode))
{
int save_errno;
/* Prevent other backends' fds from holding on to the disk space */
ret = do_truncate(path);
/* Forget any pending sync requests for the first segment */
save_errno = errno;
register_forget_request(rnode, forkNum, 0 /* first seg */ );
errno = save_errno;
}
else
ret = 0;
@ -347,6 +351,7 @@ mdunlinkfork(RelFileNodeBackend rnode, ForkNumber forkNum, bool isRedo)
ereport(WARNING,
(errcode_for_file_access(),
errmsg("could not remove file \"%s\": %m", path)));
segno++;
}
}
else
@ -359,21 +364,22 @@ mdunlinkfork(RelFileNodeBackend rnode, ForkNumber forkNum, bool isRedo)
* segment later, rather than now.
*
* If we're performing a binary upgrade, the dangers described in the
* header comments for mdunlink() do not exist, since after a crash
* or even a simple ERROR, the upgrade fails and the whole new cluster
* header comments for mdunlink() do not exist, since after a crash or
* even a simple ERROR, the upgrade fails and the whole new cluster
* must be recreated from scratch. And, on the other hand, it is
* important to remove the files from disk immediately, because we
* may be about to reuse the same relfilenode.
* important to remove the files from disk immediately, because we may
* be about to reuse the same relfilenode.
*/
if (!IsBinaryUpgrade)
{
register_unlink_segment(rnode, forkNum, 0 /* first seg */ );
++segno;
segno++;
}
}
/*
* Delete any additional segments.
* Delete any remaining segments (we might or might not have dealt with
* the first one above).
*/
if (ret >= 0)
{