Commit Graph

219 Commits

Author SHA1 Message Date
Robert Haas 249cf070e3 Create and use wait events for read, write, and fsync operations.
Previous commits, notably 53be0b1add and
6f3bd98ebf, made it possible to see from
pg_stat_activity when a backend was stuck waiting for another backend,
but it's also fairly common for a backend to be stuck waiting for an
I/O.  Add wait events for those operations, too.

Rushabh Lathia, with further hacking by me.  Reviewed and tested by
Michael Paquier, Amit Kapila, Rajkumar Raghuwanshi, and Rahila Syed.

Discussion: http://postgr.es/m/CAGPqQf0LsYHXREPAZqYGVkDqHSyjf=KsD=k0GTVPAuzyThh-VQ@mail.gmail.com
2017-03-18 07:43:01 -04:00
Tom Lane f97de05a14 Fix sloppy handling of corner-case errors in fd.c.
Several places in fd.c had badly-thought-through handling of error returns
from lseek() and close().  The fact that those would seldom fail on valid
FDs is probably the reason we've not noticed this up to now; but if they
did fail, we'd get quite confused.

LruDelete and LruInsert actually just Assert'd that lseek never fails,
which is pretty awful on its face.

In LruDelete, we indeed can't throw an error, because that's likely to get
called during error abort and so throwing an error would probably just lead
to an infinite loop.  But by the same token, throwing an error from the
close() right after that was ill-advised, not to mention that it would've
left the LRU state corrupted since we'd already unlinked the VFD from the
list.  I also noticed that really, most of the time, we should know the
current seek position and it shouldn't be necessary to do an lseek here at
all.  As patched, if we don't have a seek position and an lseek attempt
doesn't give us one, we'll close the file but then subsequent re-open
attempts will fail (except in the somewhat-unlikely case that a
FileSeek(SEEK_SET) call comes between and allows us to re-establish a known
target seek position).  This isn't great but it won't result in any state
corruption.

Meanwhile, having an Assert instead of an honest test in LruInsert is
really dangerous: if that lseek failed, a subsequent read or write would
read or write from the start of the file, not where the caller expected,
leading to data corruption.

In both LruDelete and FileClose, if close() fails, just LOG that and mark
the VFD closed anyway.  Possibly leaking an FD is preferable to getting
into an infinite loop or corrupting the VFD list.  Besides, as far as I can
tell from the POSIX spec, it's unspecified whether or not the file has been
closed, so treating it as still open could be the wrong thing anyhow.

I also fixed a number of other places that were being sloppy about
behaving correctly when the seekPos is unknown.

Also, I changed FileSeek to return -1 with EINVAL for the cases where it
detects a bad offset, rather than throwing a hard elog(ERROR).  It seemed
pretty inconsistent that some bad-offset cases would get a failure return
while others got elog(ERROR).  It was missing an offset validity check for
the SEEK_CUR case on a closed file, too.

Back-patch to all supported branches, since all this code is fundamentally
identical in all of them.

Discussion: https://postgr.es/m/2982.1487617365@sss.pgh.pa.us
2017-02-21 17:51:37 -05:00
Bruce Momjian 1d25779284 Update copyright via script for 2017 2017-01-03 13:48:53 -05:00
Robert Haas f82ec32ac3 Rename "pg_xlog" directory to "pg_wal".
"xlog" is not a particularly clear abbreviation for "write-ahead log",
and it sometimes confuses users into believe that the contents of the
"pg_xlog" directory are not critical data, leading to unpleasant
consequences.  So, rename the directory to "pg_wal".

This patch modifies pg_upgrade and pg_basebackup to understand both
the old and new directory layouts; the former is necessary given the
purpose of the tool, while the latter merely avoids an unnecessary
backward-compatibility break.

We may wish to consider renaming other programs, switches, and
functions which still use the old "xlog" naming to also refer to
"wal".  However, that's still under discussion, so let's do just this
much for now.

Discussion: CAB7nPqTeC-8+zux8_-4ZD46V7YPwooeFxgndfsq5Rg8ibLVm1A@mail.gmail.com

Michael Paquier
2016-10-20 11:32:18 -04:00
Tom Lane 95ef43c430 Widen amount-to-flush arguments of FileWriteback and callers.
It's silly to define these counts as narrower than they might someday
need to be.  Also, I believe that the BLCKSZ * nflush calculation in
mdwriteback was capable of overflowing an int.
2016-04-13 18:12:06 -04:00
Tom Lane fa11a09fed Fix assorted portability issues with using msync() for data flushing.
Commit 428b1d6b29 introduced the use of
msync() for flushing dirty data from the kernel's file buffers.  Several
portability issues were overlooked, though:

* Not all implementations of mmap() think that nbytes == 0 means "map
the whole file".  To fix, use lseek() to find out the true length.
Fix callers of pg_flush_data to be aware that nbytes == 0 may result
in trashing the file's seek position.

* Not all implementations of mmap() will accept partial-page mmap
requests.  To fix, round down the length request to whatever sysconf()
says the page size is.  (I think this is OK from a portability standpoint,
because sysconf() is required by SUS v2, and we aren't trying to compile
this part on Windows anyway.  Buildfarm should let us know if not.)

* On 32-bit machines, the file size might exceed the available free
address space, or even exceed what will fit in size_t.  Check for
the latter explicitly to avoid passing a false request size to mmap().
If mmap fails, silently fall through to the next implementation method,
rather than bleating to the postmaster log and giving up.

* mmap'ing directories fails on some platforms, and even if it works,
msync'ing the directory is quite unlikely to help, as for that matter are
the other flush implementations.  In pre_sync_fname(), just skip flush
attempts on directories.

In passing, copy-edit the comments a bit.

Stas Kelvich and myself
2016-04-13 17:17:51 -04:00
Andres Freund e01157500f Include portability/mem.h into fd.c for MAP_FAILED.
Buildfarm members gaur and pademelon are old enough not to know about
MAP_FAILED; which is used in 428b1d6. Include portability/mem.h to fix;
as already done in a bunch of other places.
2016-03-12 12:16:48 -08:00
Andres Freund 428b1d6b29 Allow to trigger kernel writeback after a configurable number of writes.
Currently writes to the main data files of postgres all go through the
OS page cache. This means that some operating systems can end up
collecting a large number of dirty buffers in their respective page
caches.  When these dirty buffers are flushed to storage rapidly, be it
because of fsync(), timeouts, or dirty ratios, latency for other reads
and writes can increase massively.  This is the primary reason for
regular massive stalls observed in real world scenarios and artificial
benchmarks; on rotating disks stalls on the order of hundreds of seconds
have been observed.

On linux it is possible to control this by reducing the global dirty
limits significantly, reducing the above problem. But global
configuration is rather problematic because it'll affect other
applications; also PostgreSQL itself doesn't always generally want this
behavior, e.g. for temporary files it's undesirable.

Several operating systems allow some control over the kernel page
cache. Linux has sync_file_range(2), several posix systems have msync(2)
and posix_fadvise(2). sync_file_range(2) is preferable because it
requires no special setup, whereas msync() requires the to-be-flushed
range to be mmap'ed. For the purpose of flushing dirty data
posix_fadvise(2) is the worst alternative, as flushing dirty data is
just a side-effect of POSIX_FADV_DONTNEED, which also removes the pages
from the page cache.  Thus the feature is enabled by default only on
linux, but can be enabled on all systems that have any of the above
APIs.

While desirable and likely possible this patch does not contain an
implementation for windows.

With the infrastructure added, writes made via checkpointer, bgwriter
and normal user backends can be flushed after a configurable number of
writes. Each of these sources of writes controlled by a separate GUC,
checkpointer_flush_after, bgwriter_flush_after and backend_flush_after
respectively; they're separate because the number of flushes that are
good are separate, and because the performance considerations of
controlled flushing for each of these are different.

A later patch will add checkpoint sorting - after that flushes from the
ckeckpoint will almost always be desirable. Bgwriter flushes are most of
the time going to be random, which are slow on lots of storage hardware.
Flushing in backends works well if the storage and bgwriter can keep up,
but if not it can have negative consequences.  This patch is likely to
have negative performance consequences without checkpoint sorting, but
unfortunately so has sorting without flush control.

Discussion: alpine.DEB.2.10.1506011320000.28433@sto
Author: Fabien Coelho and Andres Freund
2016-03-10 17:04:34 -08:00
Andres Freund 606e0f9841 Introduce durable_rename() and durable_link_or_rename().
Renaming a file using rename(2) is not guaranteed to be durable in face
of crashes; especially on filesystems like xfs and ext4 when mounted
with data=writeback. To be certain that a rename() atomically replaces
the previous file contents in the face of crashes and different
filesystems, one has to fsync the old filename, rename the file, fsync
the new filename, fsync the containing directory.  This sequence is not
generally adhered to currently; which exposes us to data loss risks. To
avoid having to repeat this arduous sequence, introduce
durable_rename(), which wraps all that.

Also add durable_link_or_rename(). Several places use link() (with a
fallback to rename()) to rename a file, trying to avoid replacing the
target file out of paranoia. Some of those rename sequences need to be
durable as well. There seems little reason extend several copies of the
same logic, so centralize the link() callers.

This commit does not yet make use of the new functions; they're used in
a followup commit.

Author: Michael Paquier, Andres Freund
Discussion: 56583BDD.9060302@2ndquadrant.com
Backpatch: All supported branches
2016-03-09 18:53:53 -08:00
Robert Haas 070140ee48 Add some functions to fd.c for the convenience of extensions.
For example, if you want to perform an ioctl() on a file descriptor
opened through the fd.c routines, there's no way to do that without
being able to get at the underlying fd.

KaiGai Kohei
2016-03-08 10:09:50 -05:00
Bruce Momjian ee94300446 Update copyright for 2016
Backpatch certain files through 9.1
2016-01-02 13:33:40 -05:00
Tom Lane 57e1138bcc Remove special cases for ETXTBSY from new fsync'ing logic.
The argument that this is a sufficiently-expected case to be silently
ignored seems pretty thin.  Andres had brought it up back when we were
still considering that most fsync failures should be hard errors, and it
probably would be legit not to fail hard for ETXTBSY --- but the same is
true for EROFS and other cases, which is why we gave up on hard failures.
ETXTBSY is surely not a normal case, so logging the failure seems fine
from here.
2015-05-29 15:11:36 -04:00
Tom Lane d8179b001a Fix fsync-at-startup code to not treat errors as fatal.
Commit 2ce439f337 introduced a rather serious
regression, namely that if its scan of the data directory came across any
un-fsync-able files, it would fail and thereby prevent database startup.
Worse yet, symlinks to such files also caused the problem, which meant that
crash restart was guaranteed to fail on certain common installations such
as older Debian.

After discussion, we agreed that (1) failure to start is worse than any
consequence of not fsync'ing is likely to be, therefore treat all errors
in this code as nonfatal; (2) we should not chase symlinks other than
those that are expected to exist, namely pg_xlog/ and tablespace links
under pg_tblspc/.  The latter restriction avoids possibly fsync'ing a
much larger part of the filesystem than intended, if the user has left
random symlinks hanging about in the data directory.

This commit takes care of that and also does some code beautification,
mainly moving the relevant code into fd.c, which seems a much better place
for it than xlog.c, and making sure that the conditional compilation for
the pre_sync_fname pass has something to do with whether pg_flush_data
works.

I also relocated the call site in xlog.c down a few lines; it seems a
bit silly to be doing this before ValidateXLOGDirectoryStructure().

The similar logic in initdb.c ought to be made to match this, but that
change is noncritical and will be dealt with separately.

Back-patch to all active branches, like the prior commit.

Abhijit Menon-Sen and Tom Lane
2015-05-28 17:33:03 -04:00
Tom Lane 32f628be74 Fix assorted inconsistencies in our calls of readlink().
Ensure that we null-terminate the result string (one place in pg_rewind).
Be paranoid about out-of-range results from readlink() (should not happen,
but there is no good reason for some call sites to be careful about it and
others not).  Consistently use the whole buffer, not sometimes one byte
less.  Ensure we emit an appropriate errcode() in all cases.  Spell the
error messages the same way.

The only serious bug here is the missing null-termination in pg_rewind,
which is new code, so no need for a back-patch.

Abhijit Menon-Sen and Tom Lane
2015-05-28 12:17:22 -04:00
Bruce Momjian 807b9e0dff pgindent run for 9.5 2015-05-23 21:35:49 -04:00
Robert Haas 922de19ef2 Fix error message in pre_sync_fname.
The old one didn't include %m anywhere, and required extra
translation.

Report by Peter Eisentraut. Fix by me. Review by Tom Lane.
2015-05-18 12:53:54 -04:00
Robert Haas 456ff08638 Fix some problems with patch to fsync the data directory.
pg_win32_is_junction() was a typo for pgwin32_is_junction().  open()
was used not only in a two-argument form, which breaks on Windows,
but also where BasicOpenFile() should have been used.

Per reports from Andrew Dunstan and David Rowley.
2015-05-05 09:29:49 -04:00
Robert Haas 2ce439f337 Recursively fsync() the data directory after a crash.
Otherwise, if there's another crash, some writes from after the first
crash might make it to disk while writes from before the crash fail
to make it to disk.  This could lead to data corruption.

Back-patch to all supported versions.

Abhijit Menon-Sen, reviewed by Andres Freund and slightly revised
by me.
2015-05-04 14:13:53 -04:00
Bruce Momjian 4baaf863ec Update copyright for 2015
Backpatch certain files through 9.0
2015-01-06 11:43:47 -05:00
Heikki Linnakangas 2076db2aea Move the backup-block logic from XLogInsert to a new file, xloginsert.c.
xlog.c is huge, this makes it a little bit smaller, which is nice. Functions
related to putting together the WAL record are in xloginsert.c, and the
lower level stuff for managing WAL buffers and such are in xlog.c.

Also move the definition of XLogRecord to a separate header file. This
causes churn in the #includes of all the files that write WAL records, and
redo routines, but it avoids pulling in xlog.h into most places.

Reviewed by Michael Paquier, Alvaro Herrera, Andres Freund and Amit Kapila.
2014-11-06 13:55:36 +02:00
Bruce Momjian 0a78320057 pgindent run for 9.4
This includes removing tabs after periods in C comments, which was
applied to back branches, so this change should not effect backpatching.
2014-05-06 12:12:18 -04:00
Tom Lane 2d00190495 Rationalize common/relpath.[hc].
Commit a730183926 created rather a mess by
putting dependencies on backend-only include files into include/common.
We really shouldn't do that.  To clean it up:

* Move TABLESPACE_VERSION_DIRECTORY back to its longtime home in
catalog/catalog.h.  We won't consider this symbol part of the FE/BE API.

* Push enum ForkNumber from relfilenode.h into relpath.h.  We'll consider
relpath.h as the source of truth for fork numbers, since relpath.c was
already partially serving that function, and anyway relfilenode.h was
kind of a random place for that enum.

* So, relfilenode.h now includes relpath.h rather than vice-versa.  This
direction of dependency is fine.  (That allows most, but not quite all,
of the existing explicit #includes of relpath.h to go away again.)

* Push forkname_to_number from catalog.c to relpath.c, just to centralize
fork number stuff a bit better.

* Push GetDatabasePath from catalog.c to relpath.c; it was rather odd
that the previous commit didn't keep this together with relpath().

* To avoid needing relfilenode.h in common/, redefine the underlying
function (now called GetRelationPath) as taking separate OID arguments,
and make the APIs using RelFileNode or RelFileNodeBackend into macro
wrappers.  (The macros have a potential multiple-eval risk, but none of
the existing call sites have an issue with that; one of them had such a
risk already anyway.)

* Fix failure to follow the directions when "init" fork type was added;
specifically, the errhint in forkname_to_number wasn't updated, and neither
was the SGML documentation for pg_relation_size().

* Fix tablespace-path-too-long check in CreateTableSpace() to account for
fork-name component of maximum-length pathnames.  This requires putting
FORKNAMECHARS into a header file, but it was rather useless (and
actually unreferenced) where it was.

The last couple of items are potentially back-patchable bug fixes,
if anyone is sufficiently excited about them; but personally I'm not.

Per a gripe from Christoph Berg about how include/common wasn't
self-contained.
2014-04-30 17:30:50 -04:00
Bruce Momjian 1494931d73 Remove MinGW readdir/errno bug workaround fixed on 2003-10-10 2014-03-21 13:47:37 -04:00
Bruce Momjian 6f03927fce Properly check for readdir/closedir() failures
Clear errno before calling readdir() and handle old MinGW errno bug
while adding full test coverage for readdir/closedir failures.

Backpatch through 8.4.
2014-03-21 13:45:11 -04:00
Bruce Momjian 886c0be3f6 C comments: remove odd blank lines after #ifdef WIN32 lines 2014-03-13 01:34:42 -04:00
Tom Lane ac4ef637ad Allow use of "z" flag in our printf calls, and use it where appropriate.
Since C99, it's been standard for printf and friends to accept a "z" size
modifier, meaning "whatever size size_t has".  Up to now we've generally
dealt with printing size_t values by explicitly casting them to unsigned
long and using the "l" modifier; but this is really the wrong thing on
platforms where pointers are wider than longs (such as Win64).  So let's
start using "z" instead.  To ensure we can do that on all platforms, teach
src/port/snprintf.c to understand "z", and add a configure test to force
use of that implementation when the platform's version doesn't handle "z".

Having done that, modify a bunch of places that were using the
unsigned-long hack to use "z" instead.  This patch doesn't pretend to have
gotten everyplace that could benefit, but it catches many of them.  I made
an effort in particular to ensure that all uses of the same error message
text were updated together, so as not to increase the number of
translatable strings.

It's possible that this change will result in format-string warnings from
pre-C99 compilers.  We might have to reconsider if there are any popular
compilers that will warn about this; but let's start by seeing what the
buildfarm thinks.

Andres Freund, with a little additional work by me
2014-01-23 17:18:33 -05:00
Bruce Momjian 7e04792a1c Update copyright for 2014
Update all files in head, and files COPYRIGHT and legal.sgml in all back
branches.
2014-01-07 16:05:30 -05:00
Robert Haas cc52d5b33f Expose fsync_fname as a public API.
Andres Freund
2013-09-04 11:15:00 -04:00
Tom Lane 007556bf08 Remove fixed limit on the number of concurrent AllocateFile() requests.
AllocateFile(), AllocateDir(), and some sister routines share a small array
for remembering requests, so that the files can be closed on transaction
failure.  Previously that array had a fixed size, MAX_ALLOCATED_DESCS (32).
While historically that had seemed sufficient, Steve Toutant pointed out
that this meant you couldn't scan more than 32 file_fdw foreign tables in
one query, because file_fdw depends on the COPY code which uses
AllocateFile().  There are probably other cases, or will be in the future,
where this nonconfigurable limit impedes users.

We can't completely remove any such limit, at least not without a lot of
work, since each such request requires a kernel file descriptor and most
platforms limit the number we can have.  (In principle we could
"virtualize" these descriptors, as fd.c already does for the main VFD pool,
but not without an additional layer of overhead and a lot of notational
impact on the calling code.)  But we can at least let the array size be
configurable.  Hence, change the code to allow up to max_safe_fds/2
allocated file requests.  On modern platforms this should allow several
hundred concurrent file_fdw scans, or more if one increases the value of
max_files_per_process.  To go much further than that, we'd need to do some
more work on the data structure, since the current code for closing
requests has potentially O(N^2) runtime; but it should still be all right
for request counts in this range.

Back-patch to 9.1 where contrib/file_fdw was introduced.
2013-06-09 13:46:54 -04:00
Tom Lane 6563fb2b45 Fix fd.c to preserve errno where needed.
PathNameOpenFile failed to ensure that the correct value of errno was
returned to its caller after a failure (because it incorrectly supposed
that free() can never change errno).  In some cases this would result
in a user-visible failure because an expected ENOENT errno was replaced
with something else.  Bogus EINVAL failures have been observed on OS X,
for example.

There were also a couple of places that could mangle an important value
of errno if FDDEBUG was defined.  While the usefulness of that debug
support is highly debatable, we might as well make it safe to use,
so add errno save/restore logic to the DO_DB macro.

Per bug #8167 from Nelson Minar, diagnosed by RhodiumToad.
Back-patch to all supported branches.
2013-05-16 15:04:31 -04:00
Heikki Linnakangas 3d009e45bd Add support for piping COPY to/from an external program.
This includes backend "COPY TO/FROM PROGRAM '...'" syntax, and corresponding
psql \copy syntax. Like with reading/writing files, the backend version is
superuser-only, and in the psql version, the program is run in the client.

In the passing, the psql \copy STDIN/STDOUT syntax is subtly changed: if you
the stdin/stdout is quoted, it's now interpreted as a filename. For example,
"\copy foo from 'stdin'" now reads from a file called 'stdin', not from
standard input. Before this, there was no way to specify a filename called
stdin, stdout, pstdin or pstdout.

This creates a new function in pgport, wait_result_to_str(), which can
be used to convert the exit status of a process, as returned by wait(3),
to a human-readable string.

Etsuro Fujita, reviewed by Amit Kapila.
2013-02-27 18:22:31 +02:00
Alvaro Herrera a730183926 Move relpath() to libpgcommon
This enables non-backend code, such as pg_xlogdump, to use it easily.
The previous location, in src/backend/catalog/catalog.c, made that
essentially impossible because that file depends on many backend-only
facilities; so this needs to live separately.
2013-02-21 22:46:17 -03:00
Heikki Linnakangas 5d6899dbae Fix yet another typo in comment.
Etsuro Fujita
2013-02-20 12:31:26 +02:00
Magnus Hagander c572bfaf39 Fix another typo in a comment
Noted by Thom Brown
2013-02-08 15:42:01 +01:00
Magnus Hagander 733701d274 Fix typo in comment
Etsuro Fujita
2013-02-08 11:45:42 +01:00
Bruce Momjian bd61a623ac Update copyrights for 2013
Fully update git head, and update back branches in ./COPYRIGHT and
legal.sgml files.
2013-01-01 17:15:01 -05:00
Heikki Linnakangas 1f67078ea3 Add OpenTransientFile, with automatic cleanup at end-of-xact.
Files opened with BasicOpenFile or PathNameOpenFile are not automatically
cleaned up on error. That puts unnecessary burden on callers that only want
to keep the file open for a short time. There is AllocateFile, but that
returns a buffered FILE * stream, which in many cases is not the nicest API
to work with. So add function called OpenTransientFile, which returns a
unbuffered fd that's cleaned up like the FILE* returned by AllocateFile().

This plugs a few rare fd leaks in error cases:

1. copy_file() - fixed by by using OpenTransientFile instead of BasicOpenFile
2. XLogFileInit() - fixed by adding close() calls to the error cases. Can't
   use OpenTransientFile here because the fd is supposed to persist over
   transaction boundaries.
3. lo_import/lo_export - fixed by using OpenTransientFile instead of
   PathNameOpenFile.

In addition to plugging those leaks, this replaces many BasicOpenFile() calls
with OpenTransientFile() that were not leaking, because the code meticulously
closed the file on error. That wasn't strictly necessary, but IMHO it's good
for robustness.

The same leaks exist in older versions, but given the rarity of the issues,
I'm not backpatching this. Not yet, anyway - it might be good to backpatch
later, after this mechanism has had some more testing in master branch.
2012-11-27 10:25:50 +02:00
Tom Lane 9bacf0e373 Revert "Use "transient" files for blind writes, take 2".
This reverts commit fba105b109.
That approach had problems with the smgr-level state not tracking what
we really want to happen, and with the VFD-level state not tracking the
smgr-level state very well either.  In consequence, it was still possible
to hold kernel file descriptors open for long-gone tables (as in recent
report from Tore Halset), and yet there were also cases of FDs being closed
undesirably soon.  A replacement implementation will follow.
2012-10-17 12:37:08 -04:00
Alvaro Herrera 45326c5a11 Split resowner.h
This lets files that are mere users of ResourceOwner not automatically
include the headers for stuff that is managed by the resowner mechanism.
2012-08-28 18:02:07 -04:00
Tom Lane 2d46a57ddc Improve copydir() code for the case that fsync is off.
We should avoid calling sync_file_range or posix_fadvise in this case,
since (a) we don't really care if the data gets synced, and might as
well save the kernel calls; (b) at least on Linux we know that the
kernel might block us until it's scheduled the write.

Also, avoid making a useless second traversal of the directory tree
if we're not actually going to call fsync(2) after all.
2012-07-21 20:10:29 -04:00
Tom Lane b966dd6c42 Add fsync capability to initdb, and use sync_file_range() if available.
Historically we have not worried about fsync'ing anything during initdb
(in fact, initdb intentionally passes -F to each backend launch to prevent
it from fsync'ing).  But with filesystems getting more aggressive about
caching data, that's not such a good plan anymore.  Make initdb do a pass
over the finished data directory tree to fsync everything.  For testing
purposes, the -N/--nosync flag can be used to restore the old behavior.

Also, testing shows that on Linux, sync_file_range() is much faster than
posix_fadvise() for hinting to the kernel that an fsync is coming,
apparently because the latter blocks on a rather small request queue while
the former doesn't.  So use this function if available in initdb, and also
in the backend's pg_flush_data() (where it currently will affect only the
speed of CREATE DATABASE's cloning step).

We will later make pg_regress invoke initdb with the --nosync flag
to avoid slowing down cases such as "make check" in contrib.  But
let's not do so until we've shaken out any portability issues in this
patch.

Jeff Davis, reviewed by Andres Freund
2012-07-13 17:16:58 -04:00
Bruce Momjian 927d61eeff Run pgindent on 9.2 source tree in preparation for first 9.3
commit-fest.
2012-06-10 15:20:04 -04:00
Heikki Linnakangas 5762a4d909 Inherit max_safe_fds to child processes in EXEC_BACKEND mode.
Postmaster sets max_safe_fds by testing how many open file descriptors it
can open, and that is normally inherited by all child processes at fork().
Not so on EXEC_BACKEND, ie. Windows, however. Because of that, we
effectively ignored max_files_per_process on Windows, and always assumed
a conservative default of 32 simultaneous open files. That could have an
impact on performance, if you need to access a lot of different files
in a query. After this patch, the value is passed to child processes by
save/restore_backend_variables() among many other global variables.

It has been like this forever, but given the lack of complaints about it,
I'm not backpatching this.
2012-03-29 08:19:11 +03:00
Peter Eisentraut 0e85abd658 Clean up compiler warnings from unused variables with asserts disabled
For those variables only used when asserts are enabled, use a new
macro PG_USED_FOR_ASSERTS_ONLY, which expands to
__attribute__((unused)) when asserts are not enabled.
2012-03-21 23:33:10 +02:00
Magnus Hagander 672614cf21 Prevent logging "failed to stat file: success" for temp files
This was broken in commit bc3347484a, the
addition of statistics counters for temp files.

Reported by Thom Brown
2012-01-28 10:03:26 +01:00
Robert Haas 467ff207f5 Add missing #include, to suppress compiler warning. 2012-01-26 10:16:26 -05:00
Magnus Hagander bc3347484a Track temporary file count and size in pg_stat_database
Add counters for number and size of temporary files used
for spill-to-disk queries for each database to the
pg_stat_database view.

Tomas Vondra, review by Magnus Hagander
2012-01-26 14:41:19 +01:00
Bruce Momjian e126958c2e Update copyright notices for year 2012. 2012-01-01 18:01:58 -05:00
Tom Lane 9473bb96d0 Further thoughts about temp_file_limit patch.
Move FileClose's decrement of temporary_files_size up, so that it will be
executed even if elog() throws an error.  This is reasonable since if the
unlink() fails, the fact the file is still there is not our fault, and we
are going to forget about it anyhow.  So we won't count it against
temp_file_limit anymore.

Update fileSize and temporary_files_size correctly in FileTruncate.
We probably don't have any places that truncate temp files, but fd.c
surely should not assume that.
2011-07-17 15:05:44 -04:00
Tom Lane 23e5b16c71 Add temp_file_limit GUC parameter to constrain temporary file space usage.
The limit is enforced against the total amount of temp file space used by
each session.

Mark Kirkwood, reviewed by Cédric Villemain and Tatsuo Ishii
2011-07-17 14:19:31 -04:00