postgresql/src/backend/storage/file/copydir.c

/*-------------------------------------------------------------------------
*
* copydir.c
* copies a directory
*
* Portions Copyright (c) 1996-2023, PostgreSQL Global Development Group
* Portions Copyright (c) 1994, Regents of the University of California
*
* While "xcopy /e /i /q" works fine for copying directories, on Windows XP
* it requires a Window handle which prevents it from working when invoked
* as a service.
*
* IDENTIFICATION
* src/backend/storage/file/copydir.c
*
*-------------------------------------------------------------------------
*/
#include "postgres.h"
#include <fcntl.h>
#include <unistd.h>
#include "common/file_utils.h"
#include "miscadmin.h"
#include "pgstat.h"
#include "storage/copydir.h"
#include "storage/fd.h"
/*
* copydir: copy a directory
*
* If recurse is false, subdirectories are ignored. Anything that's not
* a directory or a regular file is ignored.
*/
void
copydir(const char *fromdir, const char *todir, bool recurse)
{
DIR *xldir;
struct dirent *xlde;
char fromfile[MAXPGPATH * 2];
char tofile[MAXPGPATH * 2];
if (MakePGDirectory(todir) != 0)
ereport(ERROR,
(errcode_for_file_access(),
errmsg("could not create directory \"%s\": %m", todir)));
xldir = AllocateDir(fromdir);
while ((xlde = ReadDir(xldir, fromdir)) != NULL)
{
PGFileType xlde_type;
/* If we got a cancel signal during the copy of the directory, quit */
CHECK_FOR_INTERRUPTS();
if (strcmp(xlde->d_name, ".") == 0 ||
strcmp(xlde->d_name, "..") == 0)
continue;
snprintf(fromfile, sizeof(fromfile), "%s/%s", fromdir, xlde->d_name);
snprintf(tofile, sizeof(tofile), "%s/%s", todir, xlde->d_name);
xlde_type = get_dirent_type(fromfile, xlde, false, ERROR);
if (xlde_type == PGFILETYPE_DIR)
{
/* recurse to handle subdirectories */
if (recurse)
copydir(fromfile, tofile, true);
}
else if (xlde_type == PGFILETYPE_REG)
copy_file(fromfile, tofile);
}
FreeDir(xldir);
/*
* Be paranoid here and fsync all files to ensure the copy is really done.
* But if fsync is disabled, we're done.
*/
if (!enableFsync)
return;
xldir = AllocateDir(todir);
while ((xlde = ReadDir(xldir, todir)) != NULL)
{
if (strcmp(xlde->d_name, ".") == 0 ||
strcmp(xlde->d_name, "..") == 0)
continue;
snprintf(tofile, sizeof(tofile), "%s/%s", todir, xlde->d_name);
/*
* We don't need to sync subdirectories here since the recursive
* copydir will do it before it returns
*/
if (get_dirent_type(tofile, xlde, false, ERROR) == PGFILETYPE_REG)
fsync_fname(tofile, false);
}
FreeDir(xldir);
/*
* It's important to fsync the destination directory itself as individual
* file fsyncs don't guarantee that the directory entry for the file is
* synced. Recent versions of ext4 have made the window much wider but
* it's been true for ext3 and other filesystems in the past.
*/
fsync_fname(todir, true);
}
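
/*
 * For illustration, a minimal sketch of a hypothetical caller of copydir().
 * The variable names and oids below are made up; real callers, such as the
 * file-copy strategy of CREATE DATABASE, build the paths with
 * GetDatabasePath() before handing them to copydir().
 *
 *     char *srcpath = GetDatabasePath(src_dboid, src_tablespace_oid);
 *     char *dstpath = GetDatabasePath(dst_dboid, dst_tablespace_oid);
 *
 *     copydir(srcpath, dstpath, true);
 *
 * The third argument controls whether subdirectories are copied as well.
 */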
/*
* copy one file
*/
void
copy_file(const char *fromfile, const char *tofile)
{
char *buffer;
int srcfd;
int dstfd;
int nbytes;
off_t offset;
off_t flush_offset;
/* Size of copy buffer (read and write requests) */
#define COPY_BUF_SIZE (8 * BLCKSZ)
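/* With the default 8kB BLCKSZ, this works out to 64kB per request. */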
/*
* Size of data flush requests. It seems beneficial on most platforms to
* do this every 1MB or so. But macOS, at least with early releases of
* APFS, is really unfriendly to small mmap/msync requests, so on that
* platform we do it only every 32MB.
*/
#if defined(__darwin__)
#define FLUSH_DISTANCE (32 * 1024 * 1024)
#else
#define FLUSH_DISTANCE (1024 * 1024)
#endif
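
/*
 * For illustration (assuming the default 8kB BLCKSZ): with a 64kB copy
 * buffer, the loop below asks for a flush about once per 16 writes at the
 * 1MB distance, and about once per 512 writes at the 32MB distance used on
 * macOS.
 */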
/* Use palloc to ensure we get a maxaligned buffer */
buffer = palloc(COPY_BUF_SIZE);
/*
* Open the files
*/
srcfd = OpenTransientFile(fromfile, O_RDONLY | PG_BINARY);
if (srcfd < 0)
ereport(ERROR,
(errcode_for_file_access(),
errmsg("could not open file \"%s\": %m", fromfile)));
dstfd = OpenTransientFile(tofile, O_RDWR | O_CREAT | O_EXCL | PG_BINARY);
if (dstfd < 0)
ereport(ERROR,
(errcode_for_file_access(),
errmsg("could not create file \"%s\": %m", tofile)));
/*
* Do the data copying.
*/
flush_offset = 0;
for (offset = 0;; offset += nbytes)
{
/* If we got a cancel signal during the copy of the file, quit */
CHECK_FOR_INTERRUPTS();
/*
* We fsync the files later, but during the copy, flush them every so
* often to avoid spamming the cache and hopefully get the kernel to
* start writing them out before the fsync comes.
*/
if (offset - flush_offset >= FLUSH_DISTANCE)
{
pg_flush_data(dstfd, flush_offset, offset - flush_offset);
flush_offset = offset;
}
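
/*
 * Note: pg_flush_data() is only a hint to the kernel to start writeback of
 * the given range (depending on the platform it may be implemented with
 * sync_file_range(), posix_fadvise(), or msync()); durability still comes
 * from the fsync pass that copydir() performs afterwards.
 */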
pgstat_report_wait_start(WAIT_EVENT_COPY_FILE_READ);
nbytes = read(srcfd, buffer, COPY_BUF_SIZE);
pgstat_report_wait_end();
if (nbytes < 0)
ereport(ERROR,
(errcode_for_file_access(),
errmsg("could not read file \"%s\": %m", fromfile)));
if (nbytes == 0)
break;
errno = 0;
pgstat_report_wait_start(WAIT_EVENT_COPY_FILE_WRITE);
if ((int) write(dstfd, buffer, nbytes) != nbytes)
{
/* if write didn't set errno, assume problem is no disk space */
if (errno == 0)
errno = ENOSPC;
ereport(ERROR,
(errcode_for_file_access(),
errmsg("could not write to file \"%s\": %m", tofile)));
}
pgstat_report_wait_end();
}
if (offset > flush_offset)
pg_flush_data(dstfd, flush_offset, offset - flush_offset);
if (CloseTransientFile(dstfd) != 0)
ereport(ERROR,
(errcode_for_file_access(),
errmsg("could not close file \"%s\": %m", tofile)));
if (CloseTransientFile(srcfd) != 0)
ereport(ERROR,
(errcode_for_file_access(),
errmsg("could not close file \"%s\": %m", fromfile)));
pfree(buffer);
}