/*-------------------------------------------------------------------------
 *
 * copydir.c
 *	  copies a directory
 *
 * Portions Copyright (c) 1996-2019, PostgreSQL Global Development Group
 * Portions Copyright (c) 1994, Regents of the University of California
 *
 * While "xcopy /e /i /q" works fine for copying directories, on Windows XP
 * it requires a Window handle which prevents it from working when invoked
 * as a service.
 *
 * IDENTIFICATION
 *	  src/backend/storage/file/copydir.c
 *
 *-------------------------------------------------------------------------
 */
#include "postgres.h"

#include <fcntl.h>
#include <unistd.h>
#include <sys/stat.h>

#include "storage/copydir.h"
#include "storage/fd.h"
#include "miscadmin.h"
#include "pgstat.h"

/*
 * copydir: copy a directory
 *
 * If recurse is false, subdirectories are ignored.  Anything that's not
 * a directory or a regular file is ignored.
 */
void
copydir(char *fromdir, char *todir, bool recurse)
{
	DIR		   *xldir;
	struct dirent *xlde;
	char		fromfile[MAXPGPATH * 2];
	char		tofile[MAXPGPATH * 2];

	if (MakePGDirectory(todir) != 0)
		ereport(ERROR,
				(errcode_for_file_access(),
				 errmsg("could not create directory \"%s\": %m", todir)));

	xldir = AllocateDir(fromdir);

	while ((xlde = ReadDir(xldir, fromdir)) != NULL)
	{
		struct stat fst;

		/* If we got a cancel signal during the copy of the directory, quit */
		CHECK_FOR_INTERRUPTS();

		if (strcmp(xlde->d_name, ".") == 0 ||
			strcmp(xlde->d_name, "..") == 0)
			continue;

		snprintf(fromfile, sizeof(fromfile), "%s/%s", fromdir, xlde->d_name);
		snprintf(tofile, sizeof(tofile), "%s/%s", todir, xlde->d_name);

		if (lstat(fromfile, &fst) < 0)
			ereport(ERROR,
					(errcode_for_file_access(),
					 errmsg("could not stat file \"%s\": %m", fromfile)));

		if (S_ISDIR(fst.st_mode))
		{
			/* recurse to handle subdirectories */
			if (recurse)
				copydir(fromfile, tofile, true);
		}
		else if (S_ISREG(fst.st_mode))
			copy_file(fromfile, tofile);
	}
	FreeDir(xldir);

	/*
	 * Be paranoid here and fsync all files to ensure the copy is really done.
	 * But if fsync is disabled, we're done.
	 */
	if (!enableFsync)
		return;

	xldir = AllocateDir(todir);

	while ((xlde = ReadDir(xldir, todir)) != NULL)
	{
		struct stat fst;

		if (strcmp(xlde->d_name, ".") == 0 ||
			strcmp(xlde->d_name, "..") == 0)
			continue;

		snprintf(tofile, sizeof(tofile), "%s/%s", todir, xlde->d_name);

		/*
		 * We don't need to sync subdirectories here since the recursive
		 * copydir will do it before it returns
		 */
		if (lstat(tofile, &fst) < 0)
			ereport(ERROR,
					(errcode_for_file_access(),
					 errmsg("could not stat file \"%s\": %m", tofile)));

		if (S_ISREG(fst.st_mode))
			fsync_fname(tofile, false);
	}
	FreeDir(xldir);

	/*
	 * It's important to fsync the destination directory itself as individual
	 * file fsyncs don't guarantee that the directory entry for the file is
	 * synced.  Recent versions of ext4 have made the window much wider but
	 * it's been true for ext3 and other filesystems in the past.
	 */
	fsync_fname(todir, true);
}

/*
 * copy one file
 */
void
copy_file(char *fromfile, char *tofile)
{
	char	   *buffer;
	int			srcfd;
	int			dstfd;
	int			nbytes;
	off_t		offset;
	off_t		flush_offset;

	/* Size of copy buffer (read and write requests) */
#define COPY_BUF_SIZE (8 * BLCKSZ)

	/*
	 * Size of data flush requests.  It seems beneficial on most platforms to
	 * do this every 1MB or so.  But macOS, at least with early releases of
	 * APFS, is really unfriendly to small mmap/msync requests, so there we do
	 * it only every 32MB.
	 */
#if defined(__darwin__)
#define FLUSH_DISTANCE (32 * 1024 * 1024)
#else
#define FLUSH_DISTANCE (1024 * 1024)
#endif

	/* Use palloc to ensure we get a maxaligned buffer */
	buffer = palloc(COPY_BUF_SIZE);

	/*
	 * Open the files
	 */
	srcfd = OpenTransientFile(fromfile, O_RDONLY | PG_BINARY);
	if (srcfd < 0)
		ereport(ERROR,
				(errcode_for_file_access(),
				 errmsg("could not open file \"%s\": %m", fromfile)));

	dstfd = OpenTransientFile(tofile, O_RDWR | O_CREAT | O_EXCL | PG_BINARY);
	if (dstfd < 0)
		ereport(ERROR,
				(errcode_for_file_access(),
				 errmsg("could not create file \"%s\": %m", tofile)));

	/*
	 * Do the data copying.
	 */
	flush_offset = 0;
	for (offset = 0;; offset += nbytes)
	{
		/* If we got a cancel signal during the copy of the file, quit */
		CHECK_FOR_INTERRUPTS();

		/*
		 * We fsync the files later, but during the copy, flush them every so
		 * often to avoid spamming the cache and hopefully get the kernel to
		 * start writing them out before the fsync comes.
		 */
		if (offset - flush_offset >= FLUSH_DISTANCE)
		{
			pg_flush_data(dstfd, flush_offset, offset - flush_offset);
			flush_offset = offset;
		}

		pgstat_report_wait_start(WAIT_EVENT_COPY_FILE_READ);
		nbytes = read(srcfd, buffer, COPY_BUF_SIZE);
		pgstat_report_wait_end();
		if (nbytes < 0)
			ereport(ERROR,
					(errcode_for_file_access(),
					 errmsg("could not read file \"%s\": %m", fromfile)));
		if (nbytes == 0)
			break;
		errno = 0;
		pgstat_report_wait_start(WAIT_EVENT_COPY_FILE_WRITE);
		if ((int) write(dstfd, buffer, nbytes) != nbytes)
		{
			pgstat_report_wait_end();
			/* if write didn't set errno, assume problem is no disk space */
			if (errno == 0)
				errno = ENOSPC;
			ereport(ERROR,
					(errcode_for_file_access(),
					 errmsg("could not write to file \"%s\": %m", tofile)));
		}
		pgstat_report_wait_end();
	}

	if (offset > flush_offset)
		pg_flush_data(dstfd, flush_offset, offset - flush_offset);

	if (CloseTransientFile(dstfd))
		ereport(ERROR,
				(errcode_for_file_access(),
				 errmsg("could not close file \"%s\": %m", tofile)));

	CloseTransientFile(srcfd);

	pfree(buffer);
}