Commit Graph

326 Commits

Author SHA1 Message Date
Peter Eisentraut ed1e0a6512 Error message capitalisation
per style guidelines

Author: Peter Smith <peter.b.smith@fujitsu.com>
Discussion: https://www.postgresql.org/message-id/flat/CAHut%2BPtzstExQ4%3DvFH%2BWzZ4g4xEx2JA%3DqxussxOdxVEwJce6bw%40mail.gmail.com
2024-01-18 09:35:12 +01:00
Michael Paquier e72a37528d Refactor code checking for file existence
jit.c and dfgr.c had a copy of the same code to check if a file exists
or not, with a twist: jit.c did not check for EACCES when failing the
stat() call for the path whose existence is tested.  This refactored
routine will be used by an upcoming patch.

Reviewed-by: Ashutosh Bapat
Discussion: https://postgr.es/m/ZTiV8tn_MIb_H2rE@paquier.xyz
2024-01-12 12:04:51 +09:00
Bruce Momjian 29275b1d17 Update copyright for 2024
Reported-by: Michael Paquier

Discussion: https://postgr.es/m/ZZKTDPxBBMt3C0J9@paquier.xyz

Backpatch-through: 12
2024-01-03 20:49:05 -05:00
Thomas Munro 871fe4917e Provide vectored variants of FileRead() and FileWrite().
FileReadV() and FileWriteV() adapt pg_preadv() and pg_pwritev() for
fd.c's virtual file descriptors.  The simple FileRead() and FileWrite()
functions are now implemented in terms of the vectored functions, to
avoid code duplication, and they are converted back to the corresponding
simple system calls further down (commit 15c9ac36).  Later work will
make more interesting multi-iovec calls.

The traditional behavior of reporting a "fake" ENOSPC error is
simplified.  It's now always set for non-failing writes, for the benefit
of callers that expect to log a meaningful "%m" if they determine that
the write was short.  (Perhaps we should consider getting rid of that
expectation one day.)

Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi>
Discussion: https://postgr.es/m/CA+hUKGJkOiOCa+mag4BF+zHo7qo=o9CFheB8=g6uT5TUm2gkvA@mail.gmail.com
2023-12-12 13:12:43 +13:00
Michael Paquier 8d9978a717 Apply quotes more consistently to GUC names in logs
Quotes are applied to GUCs in a very inconsistent way across the code
base, with a mix of double quotes or no quotes used.  This commit
removes double quotes around all the GUC names that are obviously
referred to as parameters with non-English words (use of underscore,
mixed case, etc).

This is the result of a discussion with Álvaro Herrera, Nathan Bossart,
Laurenz Albe, Peter Eisentraut, Tom Lane and Daniel Gustafsson.

Author: Peter Smith
Discussion: https://postgr.es/m/CAHut+Pv-kSN8SkxSdoHano_wPubqcg5789ejhCDZAcLFceBR-w@mail.gmail.com
2023-11-30 14:11:45 +09:00
Heikki Linnakangas b8bff07daa Make ResourceOwners more easily extensible.
Instead of having a separate array/hash for each resource kind, use a
single array and hash to hold all kinds of resources. This makes it
possible to introduce new resource "kinds" without having to modify
the ResourceOwnerData struct. In particular, this makes it possible
for extensions to register custom resource kinds.

The old approach was to have a small array of resources of each kind,
and if it fills up, switch to a hash table. The new approach also uses
an array and a hash, but now the array and the hash are used at the
same time. The array is used to hold the recently added resources, and
when it fills up, they are moved to the hash. This keeps the access to
recent entries fast, even when there are a lot of long-held resources.

All the resource-specific ResourceOwnerEnlarge*(),
ResourceOwnerRemember*(), and ResourceOwnerForget*() functions have
been replaced with three generic functions that take resource kind as
argument. For convenience, we still define resource-specific wrapper
macros around the generic functions with the old names, but they are
now defined in the source files that use those resource kinds.

The release callback no longer needs to call ResourceOwnerForget on
the resource being released. ResourceOwnerRelease unregisters the
resource from the owner before calling the callback. That needed some
changes in bufmgr.c and some other files, where releasing the
resources previously always called ResourceOwnerForget.

Each resource kind specifies a release priority, and
ResourceOwnerReleaseAll releases the resources in priority order. To
make that possible, we have to restrict what you can do between
phases. After calling ResourceOwnerRelease(), you are no longer
allowed to remember any more resources in it or to forget any
previously remembered resources by calling ResourceOwnerForget.  There
was one case where that was done previously. At subtransaction commit,
AtEOSubXact_Inval() would handle the invalidation messages and call
RelationFlushRelation(), which temporarily increased the reference
count on the relation being flushed. We now switch to the parent
subtransaction's resource owner before calling AtEOSubXact_Inval(), so
that there is a valid ResourceOwner to temporarily hold that relcache
reference.

Other end-of-xact routines make similar calls to AtEOXact_Inval()
between release phases, but I didn't see any regression test failures
from those, so I'm not sure if they could reach a codepath that needs
remembering extra resources.

There were two exceptions to how the resource leak WARNINGs on commit
were printed previously: llvmjit silently released the context without
printing the warning, and a leaked buffer io triggered a PANIC. Now
everything prints a WARNING, including those cases.

Add tests in src/test/modules/test_resowner.

Reviewed-by: Aleksander Alekseev, Michael Paquier, Julien Rouhaud
Reviewed-by: Kyotaro Horiguchi, Hayato Kuroda, Álvaro Herrera, Zhihong Yu
Reviewed-by: Peter Eisentraut, Andres Freund
Discussion: https://www.postgresql.org/message-id/cbfabeb0-cd3c-e951-a572-19b365ed314d%40iki.fi
2023-11-08 13:30:50 +02:00
Peter Eisentraut 611806cd72 Add trailing commas to enum definitions
Since C99, there can be a trailing comma after the last value in an
enum definition.  A lot of new code has been introducing this style on
the fly.  Some new patches are now taking an inconsistent approach to
this.  Some add the last comma on the fly if they add a new last
value, some are trying to preserve the existing style in each place,
some are even dropping the last comma if there was one.  We could
nudge this all in a consistent direction if we just add the trailing
commas everywhere once.

I omitted a few places where there was a fixed "last" value that will
always stay last.  I also skipped the header files of libpq and ecpg,
in case people want to use those with older compilers.  There were
also a small number of cases where the enum type wasn't used anywhere
(but the enum values were), which ended up confusing pgindent a bit,
so I left those alone.

Discussion: https://www.postgresql.org/message-id/flat/386f8c45-c8ac-4681-8add-e3b0852c1620%40eisentraut.org
2023-10-26 09:20:54 +02:00
Nathan Bossart 8d140c5822 Improve the naming in wal_sync_method code.
* sync_method is renamed to wal_sync_method.

* sync_method_options[] is renamed to wal_sync_method_options[].

* assign_xlog_sync_method() is renamed to assign_wal_sync_method().

* The names of the available synchronization methods are now
  prefixed with "WAL_SYNC_METHOD_" and have been moved into a
  WalSyncMethod enum.

* PLATFORM_DEFAULT_SYNC_METHOD is renamed to
  PLATFORM_DEFAULT_WAL_SYNC_METHOD, and DEFAULT_SYNC_METHOD is
  renamed to DEFAULT_WAL_SYNC_METHOD.

These more descriptive names help distinguish the code for
wal_sync_method from the code for DataDirSyncMethod (e.g., the
recovery_init_sync_method configuration parameter and the
--sync-method option provided by several frontend utilities).  This
change also prevents name collisions between the aforementioned
sets of code.  Since this only improves the naming of internal
identifiers, there should be no behavior change.

Author: Maxim Orlov
Discussion: https://postgr.es/m/CACG%3DezbL1gwE7_K7sr9uqaCGkWhmvRTcTEnm3%2BX1xsRNwbXULQ%40mail.gmail.com
2023-10-13 15:16:45 -05:00
Nathan Bossart 3ed1956719 Make enum for sync methods available to frontend code.
This commit renames RecoveryInitSyncMethod to DataDirSyncMethod and
moves it to common/file_utils.h.  This is preparatory work for a
follow-up commit that will allow specifying the synchronization
method in frontend utilities such as pg_upgrade and pg_basebackup.

Reviewed-by: Michael Paquier
Discussion: https://postgr.es/m/ZN2ZB4afQ2JbR9TA%40paquier.xyz
2023-09-06 16:26:39 -07:00
Peter Eisentraut 4f3514f201 Rename hook functions for debug_io_direct to match variable name.
Commit 319bae9a renamed the GUC.  Rename the check and assign functions
to match, and alphabetize.

Back-patch to 16.

Author: Peter Eisentraut <peter@eisentraut.org>
Discussion: https://postgr.es/m/2769341e-fa28-c2ee-3e4b-53fdcaaf2271%40eisentraut.org
2023-08-24 22:25:49 +12:00
Thomas Munro d0c28601ef Remove wal_sync_method=fsync_writethrough on Windows.
The "fsync" level already flushes drive write caches on Windows (as does
"fdatasync"), so it only confuses matters to have an apparently higher
level that isn't actually different at all.

That leaves "fsync_writethrough" only for macOS, where it actually does
something different.

Reviewed-by: Magnus Hagander <magnus@hagander.net>
Discussion: https://postgr.es/m/CA%2BhUKGJ2CG2SouPv2mca2WCTOJxYumvBARRcKPraFMB6GSEMcA%40mail.gmail.com
2023-07-14 12:30:13 +12:00
Tom Lane b334612b8a Pre-beta2 mechanical code beautification.
Run pgindent and pgperltidy.  It seems we're still some ways
away from all committers doing this automatically.  Now that
we have a buildfarm animal that will whine about poorly-indented
code, we'll try to keep the tree more tidy.

Discussion: https://postgr.es/m/3156045.1687208823@sss.pgh.pa.us
2023-06-20 09:50:43 -04:00
Andres Freund 0d369ac650 fd.c: Retry after EINTR in more places
Starting with 4d330a61bb we can use posix_fallocate() to extend
files. Unfortunately in some situation, e.g. on tmpfs filesystems, EINTR may
be returned. See also 4518c798b2.

To fix, add a retry path to FileFallocate(). In contrast to 4518c798b2 the
amount we extend by is limited and the extending may happen at a high
frequency, so disabling signals does not appear to be the correct path here.

Also add retry paths to other file operations currently lacking them (around
fdatasync(), fsync(), ftruncate(), posix_fadvise(), sync_file_range(),
truncate()) - they are all documented or have been observed to return EINTR.

Even though most of these functions used in the back branches, it does not
seem worth the risk to backpatch - outside of the new-to-16 case of
posix_fallocate() I am not aware of problem reports due to the lack of
retries.

Reported-by: Christoph Berg <myon@debian.org>
Discussion: https://postgr.es/m/ZEZDj1H61ryrmY9o@msg.df7cb.de
Backpatch: -
2023-06-19 14:11:32 -07:00
Thomas Munro 319bae9a8d Rename io_direct to debug_io_direct.
Give the new GUC introduced by d4e71df6 a name that is clearly not
intended for mainstream use quite yet.

Future proposals would drop the prefix only after adding infrastructure
to make it efficient.  Having the switch in the tree sooner is good
because it might lead to new discoveries about the hazards awaiting us
on a wide range of systems, but that name was too enticing and could
lead to cross-version confusion in future, per complaints from Noah and
Justin.

Suggested-by: Noah Misch <noah@leadboat.com>
Reviewed-by: Noah Misch <noah@leadboat.com>
Reviewed-by: Justin Pryzby <pryzby@telsasoft.com> (the idea, not the patch)
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> (ditto)
Discussion: https://postgr.es/m/20230430041106.GA2268796%40rfd.leadboat.com
2023-05-15 10:31:14 +12:00
Michael Paquier 8961cb9a03 Fix typos in comments
The changes done in this commit impact comments with no direct
user-visible changes, with fixes for incorrect function, variable or
structure names.

Author: Alexander Lakhin
Discussion: https://postgr.es/m/e8c38840-596a-83d6-bd8d-cebc51111572@gmail.com
2023-05-02 12:23:08 +09:00
Thomas Munro d4e71df6d7 Add io_direct setting (developer-only).
Provide a way to ask the kernel to use O_DIRECT (or local equivalent)
where available for data and WAL files, to avoid or minimize kernel
caching.  This hurts performance currently and is not intended for end
users yet.  Later proposed work would introduce our own I/O clustering,
read-ahead, etc to replace the facilities the kernel disables with this
option.

The only user-visible change, if the developer-only GUC is not used, is
that this commit also removes the obscure logic that would activate
O_DIRECT for the WAL when wal_sync_method=open_[data]sync and
wal_level=minimal (which also requires max_wal_senders=0).  Those are
non-default and unlikely settings, and this behavior wasn't (correctly)
documented.  The same effect can be achieved with io_direct=wal.

Author: Thomas Munro <thomas.munro@gmail.com>
Author: Andres Freund <andres@anarazel.de>
Author: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>
Reviewed-by: Justin Pryzby <pryzby@telsasoft.com>
Reviewed-by: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>
Discussion: https://postgr.es/m/CA%2BhUKGK1X532hYqJ_MzFWt0n1zt8trz980D79WbjwnT-yYLZpg%40mail.gmail.com
2023-04-08 16:35:07 +12:00
Andres Freund 4d330a61bb Add smgrzeroextend(), FileZero(), FileFallocate()
smgrzeroextend() uses FileFallocate() to efficiently extend files by multiple
blocks. When extending by a small number of blocks, use FileZero() instead, as
using posix_fallocate() for small numbers of blocks is inefficient for some
file systems / operating systems. FileZero() is also used as the fallback for
FileFallocate() on platforms / filesystems that don't support fallocate.

A big advantage of using posix_fallocate() is that it typically won't cause
dirty buffers in the kernel pagecache. So far the most common pattern in our
code is that we smgrextend() a page full of zeroes and put the corresponding
page into shared buffers, from where we later write out the actual contents of
the page. If the kernel, e.g. due to memory pressure or elapsed time, already
wrote back the all-zeroes page, this can lead to doubling the amount of writes
reaching storage.

There are no users of smgrzeroextend() as of this commit. That will follow in
future commits.

Reviewed-by: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi>
Reviewed-by: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Reviewed-by: David Rowley <dgrowleyml@gmail.com>
Reviewed-by: John Naylor <john.naylor@enterprisedb.com>
Discussion: https://postgr.es/m/20221029025420.eplyow6k7tgu6he3@awork3.anarazel.de
2023-04-05 10:06:39 -07:00
Andres Freund f9054b7a7c Fix format code in fd.c debugging infrastructure
These were not sufficiently adjusted in 2d4f1ba6cf.
2023-03-30 10:26:10 -07:00
Thomas Munro 1da569ca1f Don't leak descriptors into subprograms.
Open long-lived data and WAL file descriptors with O_CLOEXEC.  This flag
was introduced by SUSv4 (POSIX.1-2008), and by now all of our target
Unix systems have it.  Our open() implementation for Windows already had
that behavior, so provide a dummy O_CLOEXEC flag on that platform.

For now, callers of open() and the "thin" wrappers in fd.c that deal in
raw descriptors need to pass in O_CLOEXEC explicitly if desired.  This
commit does that for WAL files, and automatically for everything
accessed via VFDs including SMgrRelation and BufFile.  (With more
discussion we might decide to turn it on automatically for the thin
open()-wrappers too to avoid risk of missing places that need it, but
these are typically used for short-lived descriptors where we don't
expect to fork/exec, and it's remotely possible that extensions could be
using these APIs and passing descriptors to subprograms deliberately, so
that hasn't been done here.)

Do the same for sockets and the postmaster pipe with FD_CLOEXEC.  (Later
commits might use modern interfaces to remove these extra fcntl() calls
and more where possible, but we'll need them as a fallback for a couple
of systems, so do it that way in this initial commit.)

With this change, subprograms executed for archiving, copying etc will
no longer have access to the server's descriptors, other than the ones
that we decide to pass down.

Reviewed-by: Andres Freund <andres@anarazel.de> (earlier version)
Discussion: https://postgr.es/m/CA%2BhUKGKb6FsAdQWcRL35KJsftv%2B9zXqQbzwkfRf1i0J2e57%2BhQ%40mail.gmail.com
2023-03-03 10:43:33 +13:00
Thomas Munro a1f45f69bb Remove obsolete coding for early macOS.
Commits 04cad8f7 and 0c088568 supported old macOS systems that didn't
define O_CLOEXEC or O_DSYNC yet, but those arrived in macOS releases
10.7 and 10.6 (respectively), which themselves reached EOL around a
decade ago.  We've already made use of other POSIX features that early
macOS vintages can't compile (for example commits 623cc673, d2e15083).

A later commit will use O_CLOEXEC on POSIX systems so it would be
strange to pretend here that it's optional, and we might as well give
O_DSYNC the same treatment since the reference is also guarded by a test
for a macOS-specific macro, and we know that current Macs have it.

Discussion: https://postgr.es/m/CA%2BhUKGKb6FsAdQWcRL35KJsftv%2B9zXqQbzwkfRf1i0J2e57%2BhQ%40mail.gmail.com
2023-02-22 09:55:43 +13:00
Thomas Munro 72aea955d4 Fix pg_truncate() on Windows.
Commit 57faaf376 added pg_truncate(const char *path, off_t length), but
"length" was ignored under WIN32 and the file was unconditionally
truncated to 0.

There was no live bug, since the only caller passes 0.

Fix, and back-patch to 14 where the function arrived.

Author: Justin Pryzby <pryzby@telsasoft.com>
Discussion: https://postgr.es/m/20230106031652.GR3109%40telsasoft.com
2023-01-06 16:42:47 +13:00
Bruce Momjian c8e1ba736b Update copyright for 2023
Backpatch-through: 11
2023-01-02 15:00:37 -05:00
Peter Eisentraut 2d4f1ba6cf Update types in File API
Make the argument types of the File API match stdio better:

- Change the data buffer to void *, from char *.
- Change FileWrite() data buffer to const on top of that.
- Change amounts to size_t, from int.

In passing, change the FilePrefetch() amount argument from int to
off_t, to match the underlying posix_fadvise().

Discussion: https://www.postgresql.org/message-id/flat/11dda853-bb5b-59ba-a746-e168b1ce4bdb%40enterprisedb.com
2022-12-08 08:58:15 +01:00
Michael Paquier 4ab8c81bd9 Move pg_pwritev_with_retry() to src/common/file_utils.c
This commit moves pg_pwritev_with_retry(), a convenience wrapper of
pg_writev() able to handle partial writes, to common/file_utils.c so
that the frontend code is able to use it.  A first use-case targetted
for this routine is pg_basebackup and pg_receivewal, for the
zero-padding of a newly-initialized WAL segment.  This is used currently
in the backend when the GUC wal_init_zero is enabled (default).

Author: Bharath Rupireddy
Reviewed-by: Nathan Bossart, Thomas Munro
Discussion: https://postgr.es/m/CALj2ACUq7nAb7=bJNbK3yYmp-SZhJcXFR_pLk8un6XgDzDF3OA@mail.gmail.com
2022-10-27 14:39:42 +09:00
Thomas Munro b6d8a60aba Restore pg_pread and friends.
Commits cf112c12 and a0dc8271 were a little too hasty in getting rid of
the pg_ prefixes where we use pread(), pwrite() and vectored variants.

We dropped support for ancient Unixes where we needed to use lseek() to
implement replacements for those, but it turns out that Windows also
changes the current position even when you pass in an offset to
ReadFile() and WriteFile() if the file handle is synchronous, despite
its documentation saying otherwise.

Switching to asynchronous file handles would fix that, but have other
complications.  For now let's just put back the pg_ prefix and add some
comments to highlight the non-standard side-effect, which we can now
describe as Windows-only.

Reported-by: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>
Reviewed-by: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>
Discussion: https://postgr.es/m/20220923202439.GA1156054%40nathanxps13
2022-09-29 13:12:11 +13:00
Amit Kapila 6971a839cc Improve some error messages.
It is not our usual style to use "we" in the error messages.

Author: Kyotaro Horiguchi
Reviewed-By: Amit Kapila
Discussion: https://postgr.es/m/20220914.111507.13049297635620898.horikyota.ntt@gmail.com
2022-09-21 09:43:59 +05:30
Michael Paquier bfb9dfd937 Expand the use of get_dirent_type(), shaving a few calls to stat()/lstat()
Several backend-side loops scanning one or more directories with
ReadDir() (WAL segment recycle/removal in xlog.c, backend-side directory
copy, temporary file removal, configuration file parsing, some logical
decoding logic and some pgtz stuff) already know the type of the entry
being scanned thanks to the dirent structure associated to the entry, on
platforms where we know about DT_REG, DT_DIR and DT_LNK to make the
difference between a regular file, a directory and a symbolic link.

Relying on the direct structure of an entry saves a few system calls to
stat() and lstat() in the loops updated here, shaving some code while on
it.  The logic of the code remains the same, calling stat() or lstat()
depending on if it is necessary to look through symlinks.

Authors: Nathan Bossart, Bharath Rupireddy
Reviewed-by: Andres Freund, Thomas Munro, Michael Paquier
Discussion: https://postgr.es/m/CALj2ACV8n-J-f=yiLUOx2=HrQGPSOZM3nWzyQQvLPcccPXxEdg@mail.gmail.com
2022-09-02 16:58:06 +09:00
Tom Lane 7fed801135 Clean up inconsistent use of fflush().
More than twenty years ago (79fcde48b), we hacked the postmaster
to avoid a core-dump on systems that didn't support fflush(NULL).
We've mostly, though not completely, hewed to that rule ever since.
But such systems are surely gone in the wild, so in the spirit of
cleaning out no-longer-needed portability hacks let's get rid of
multiple per-file fflush() calls in favor of using fflush(NULL).

Also, we were fairly inconsistent about whether to fflush() before
popen() and system() calls.  While we've received no bug reports
about that, it seems likely that at least some of these call sites
are at risk of odd behavior, such as error messages appearing in
an unexpected order.  Rather than expend a lot of brain cells
figuring out which places are at hazard, let's just establish a
uniform coding rule that we should fflush(NULL) before these calls.
A no-op fflush() is surely of trivial cost compared to launching
a sub-process via a shell; while if it's not a no-op then we likely
need it.

Discussion: https://postgr.es/m/2923412.1661722825@sss.pgh.pa.us
2022-08-29 13:55:41 -04:00
Thomas Munro 36b3d52459 Remove configure probe for sys/resource.h and refactor.
<sys/resource.h> is in SUSv2 and is on all targeted Unix systems.  We
have a replacement for getrusage() on Windows, so let's just move its
declarations into src/include/port/win32/sys/resource.h so that we can
use a standard-looking #include.  Also remove an obsolete reference to
CLK_TCK.  Also rename src/port/getrusage.c to win32getrusage.c,
following the convention for Windows-only fallback code.

Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/CA%2BhUKG%2BL_3brvh%3D8e0BW_VfX9h7MtwgN%3DnFHP5o7X2oZucY9dg%40mail.gmail.com
2022-08-14 00:09:47 +12:00
Thomas Munro 5fc88c5d53 Replace pgwin32_is_junction() with lstat().
Now that lstat() reports junction points with S_IFLNK/S_ISLINK(), and
unlink() can unlink them, there is no need for conditional code for
Windows in a few places.  That was expressed by testing for WIN32 or
S_ISLNK, which we can now constant-fold.

The coding around pgwin32_is_junction() was a bit suspect anyway, as we
never checked for errors, and we also know that errors can be spuriously
reported because of transient sharing violations on this OS.  The
lstat()-based code has handling for that.

This also reverts 4fc6b6ee on master only.  That was done because
lstat() didn't previously work for symlinks (junction points), but now
it does.

Tested-by: Andrew Dunstan <andrew@dunslane.net>
Discussion: https://postgr.es/m/CA%2BhUKGLfOOeyZpm5ByVcAt7x5Pn-%3DxGRNCvgiUPVVzjFLtnY0w%40mail.gmail.com
2022-08-06 12:50:59 +12:00
Thomas Munro d2e150831a Remove configure probe for fdatasync.
fdatasync() is in SUSv2, and all targeted Unix systems have it.  We have
a replacement function for Windows.

We retain the probe for the function declaration, which allows us to
supply the mysteriously missing declaration for macOS, and also for
Windows.  No need to keep a HAVE_FDATASYNC macro around.

Also rename src/port/fdatasync.c to win32fdatasync.c since it's only for
Windows.

Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/CA+hUKGJ3LHeP9w5Fgzdr4G8AnEtJ=z=p6hGDEm4qYGEUX5B6fQ@mail.gmail.com
Discussion: https://postgr.es/m/CA%2BhUKGJZJVO%3DiX%2Beb-PXi2_XS9ZRqnn_4URh0NUQOwt6-_51xQ%40mail.gmail.com
2022-08-05 16:37:38 +12:00
Thomas Munro a0dc827112 Simplify replacement code for preadv and pwritev.
preadv() and pwritev() are not standardized by POSIX, but appeared in
NetBSD in 1999 and were adopted by at least OpenBSD, FreeBSD,
DragonFlyBSD, Linux, AIX, illumos and macOS.  We don't use them much
yet, but an active proposal uses them heavily.

In 15, we had two replacement implementations for other OSes: one based
on lseek() + -v function if available for true vector I/O, and the other
based on a loop over p- function.

The former would be an obstacle to hypothetical future multi-threaded
code sharing file descriptors, while the latter would not, since commit
cf112c12.  Furthermore, the number of targeted systems that could
benefit from the former's potential upside has dwindled to just one
niche OS, since macOS added the functions and we de-supported HP-UX.
That doesn't seem like a good trade-off.

Therefore, drop the lseek()-based variant, and also the pg_ prefix now
that the file position portability hazard is gone.

At the time of writing, the only systems in our build farm that lack
native preadv/pwritev and thus use fallback code are:

 * Solaris (but not illumos)
 * macOS before release 11.0
 * Windows

With this commit, the above systems will now use the *same* fallback
code, the version that loops over pread()/pwrite().  Windows already
used that (though a later proposal may include true vector I/O for
Windows), so this decision really only affects Solaris, until it gets
around to adding these system calls.

Also remove some useless includes while here.

Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/CA+hUKGJ3LHeP9w5Fgzdr4G8AnEtJ=z=p6hGDEm4qYGEUX5B6fQ@mail.gmail.com
2022-08-05 14:04:02 +12:00
Thomas Munro cf112c1220 Remove dead pread and pwrite replacement code.
pread() and pwrite() are in SUSv2, and all targeted Unix systems have
them.

Previously, we defined pg_pread and pg_pwrite to emulate these function
with lseek() on old Unixen.  The names with a pg_ prefix were a reminder
of a portability hazard: they might change the current file position.
That hazard is gone, so we can drop the prefixes.

Since the remaining replacement code is Windows-only, move it into
src/port/win32p{read,write}.c, and move the declarations into
src/include/port/win32_port.h.

No need for vestigial HAVE_PREAD, HAVE_PWRITE macros as they were only
used for declarations in port.h which have now moved into win32_port.h.

Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Greg Stark <stark@mit.edu>
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/CA+hUKGJ3LHeP9w5Fgzdr4G8AnEtJ=z=p6hGDEm4qYGEUX5B6fQ@mail.gmail.com
2022-08-05 09:49:21 +12:00
Thomas Munro bdb657edd6 Remove configure probe and related tests for getrlimit.
getrlimit() is in SUSv2 and all targeted systems have it.

Windows doesn't have it.  We could just use #ifndef WIN32, but for a
little more explanation about why we're making things conditional, let's
retain the HAVE_GETRLIMIT macro.  It's defined in port.h for Unix systems.

On systems that have it, it's not necessary to test for RLIMIT_CORE,
RLIMIT_STACK or RLIMIT_NOFILE macros, since SUSv2 requires those and all
targeted systems have them.  Also remove references to a pre-historic
alternative spelling of RLIMIT_NOFILE, and coding that seemed to believe
that Cygwin didn't have it.

Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/CA+hUKGJ3LHeP9w5Fgzdr4G8AnEtJ=z=p6hGDEm4qYGEUX5B6fQ@mail.gmail.com
2022-08-05 09:18:34 +12:00
Michael Paquier eb64ceac7e Remove durable_rename_excl()
A previous commit replaced all the calls to this function with
durable_rename() as of dac1ff3, making it used nowhere in the tree.
Using it in extension code is also risky based on the issues described
in this previous commit, so let's remove it.  This makes possible the
removal of HAVE_WORKING_LINK.

Author: Nathan Bossart
Reviewed-by: Robert Haas, Kyotaro Horiguchi, Michael Paquier
Discussion: https://postgr.es/m/20220407182954.GA1231544@nathanxps13
2022-07-05 12:54:25 +09:00
Michael Paquier 55b5686511 Revert recent changes with durable_rename_excl()
This reverts commits 2c902bb and ccfbd92.  Per buildfarm members
kestrel, rorqual and calliphoridae, the assertions checking that a TLI
history file should not exist when created by a WAL receiver have been
failing, and switching to durable_rename() over durable_rename_excl()
would cause the newest TLI history file to overwrite the existing one.
We need to think harder about such cases, so revert the new logic for
now.

Note that all the failures have been reported in the test
025_stuck_on_old_timeline.

Discussion: https://postgr.es/m/511362.1651116498@sss.pgh.pa.us
2022-04-28 13:08:16 +09:00
Michael Paquier 2c902bbf19 Remove durable_rename_excl()
ccfbd92 has replaced all existing in-core callers of this function in
favor of durable_rename().  durable_rename_excl() is by nature unsafe on
crashes happening at the wrong time, so just remove it.

Author: Nathan Bossart
Reviewed-by: Robert Haas, Kyotaro Horiguchi, Michael Paquier
Discussion: https://postgr.es/m/20220407182954.GA1231544@nathanxps13
2022-04-28 11:10:40 +09:00
David Rowley a00fd066b1 Add missing spaces after single-line comments
Only 1 of 3 of these changes appear to be handled by pgindent. That change
is new to v15.  The remaining two appear to be left alone by pgindent. The
exact reason for that is not 100% clear to me.  It seems related to the
fact that it's a line that contains *only* a single line comment and no
actual code.  It does not seem worth investigating this in too much
detail.  In any case, these do not conform to our usual practices, so fix
them.

Author: Justin Pryzby
Discussion: https://postgr.es/m/20220411020336.GB26620@telsasoft.com
2022-04-14 09:28:56 +12:00
Bruce Momjian 27b77ecf9f Update copyright for 2022
Backpatch-through: 10
2022-01-07 19:04:57 -05:00
Tom Lane 3804539e48 Replace random(), pg_erand48(), etc with a better PRNG API and algorithm.
Standardize on xoroshiro128** as our basic PRNG algorithm, eliminating
a bunch of platform dependencies as well as fundamentally-obsolete PRNG
code.  In addition, this API replacement will ease replacing the
algorithm again in future, should that become necessary.

xoroshiro128** is a few percent slower than the drand48 family,
but it can produce full-width 64-bit random values not only 48-bit,
and it should be much more trustworthy.  It's likely to be noticeably
faster than the platform's random(), depending on which platform you
are thinking about; and we can have non-global state vectors easily,
unlike with random().  It is not cryptographically strong, but neither
are the functions it replaces.

Fabien Coelho, reviewed by Dean Rasheed, Aleksander Alekseev, and myself

Discussion: https://postgr.es/m/alpine.DEB.2.22.394.2105241211230.165418@pseudo
2021-11-28 21:33:07 -05:00
Robert Haas 9ce346eabf Report progress of startup operations that take a long time.
Users sometimes get concerned whe they start the server and it
emits a few messages and then doesn't emit any more messages for
a long time. Generally, what's happening is either that the
system is taking a long time to apply WAL, or it's taking a
long time to reset unlogged relations, or it's taking a long
time to fsync the data directory, but it's not easy to tell
which is the case.

To fix that, add a new 'log_startup_progress_interval' setting,
by default 10s. When an operation that is known to be potentially
long-running takes more than this amount of time, we'll log a
status update each time this interval elapses.

To avoid undesirable log chatter, don't log anything about WAL
replay when in standby mode.

Nitin Jadhav and Robert Haas, reviewed by Amul Sul, Bharath
Rupireddy, Justin Pryzby, Michael Paquier, and Álvaro Herrera.

Discussion: https://postgr.es/m/CA+TgmoaHQrgDFOBwgY16XCoMtXxsrVGFB2jNCvb7-ubuEe1MGg@mail.gmail.com
Discussion: https://postgr.es/m/CAMm1aWaHF7VE69572_OLQ+MgpT5RUiUDgF1x5RrtkJBLdpRj3Q@mail.gmail.com
2021-10-25 11:51:57 -04:00
Tom Lane c95ede41b8 In count_usable_fds(), duplicate stderr not stdin.
We had a complaint that the postmaster fails to start if the invoking
program closes stdin.  That happens because count_usable_fds expects
to be able to dup(0), and if it can't, we conclude there are no free
FDs and go belly-up.  So far as I can find, though, there is no other
place in the server that touches stdin, and it's not unreasonable to
expect that a daemon wouldn't use that file.

As a simple improvement, let's dup FD 2 (stderr) instead.  Unlike stdin,
it *is* reasonable for us to expect that stderr be open; even if we are
configured not to touch it, common libraries such as libc might try to
write error messages there.

Per gripe from Mario Emmenlauer.  Given the lack of previous complaints,
I'm not excited about pushing this into stable branches, but it seems
OK to squeeze it into v14.

Discussion: https://postgr.es/m/48bafc63-c30f-3962-2ded-f2e985d93e86@emmenlauer.de
2021-09-02 18:53:10 -04:00
Amit Kapila dcac5e7ac1 Refactor sharedfileset.c to separate out fileset implementation.
Move fileset related implementation out of sharedfileset.c to allow its
usage by backends that don't want to share filesets among different
processes. After this split, fileset infrastructure is used by both
sharedfileset.c and worker.c for the named temporary files that survive
across transactions.

Author: Dilip Kumar, based on suggestion by Andres Freund
Reviewed-by: Hou Zhijie, Masahiko Sawada, Amit Kapila
Discussion: https://postgr.es/m/E1mCC6U-0004Ik-Fs@gemulon.postgresql.org
2021-08-30 08:48:15 +05:30
Andres Freund 675c945394 Move temporary file cleanup to before_shmem_exit().
As reported by a few OSX buildfarm animals there exist at least one path where
temporary files exist during AtProcExit_Files() processing. As temporary file
cleanup causes pgstat reporting, the assertions added in ee3f8d3d3a caused
failures.

This is not an OSX specific issue, we were just lucky that timing on OSX
reliably triggered the problem.  The known way to cause this is a FATAL error
during perform_base_backup() with a MANIFEST used - adding an elog(FATAL)
after InitializeBackupManifest() reliably reproduces the problem in isolation.

The problem is that the temporary file created in InitializeBackupManifest()
is not cleaned up via resource owner cleanup as WalSndResourceCleanup()
currently is only used for non-FATAL errors. That then allows to reach
AtProcExit_Files() with existing temporary files, causing the assertion
failure.

To fix this problem, move temporary file cleanup to a before_shmem_exit() hook
and add assertions ensuring that no temporary files are created before / after
temporary file management has been initialized / shut down. The cleanest way
to do so seems to be to split fd.c initialization into two, one for plain file
access and one for temporary file access.

Right now there's no need to perform further fd.c cleanup during process exit,
so I just renamed AtProcExit_Files() to BeforeShmemExit_Files(). Alternatively
we could perform another pass through the files to check that no temporary
files exist, but the added assertions seem to provide enough protection
against that.

It might turn out that the assertions added in ee3f8d3d3a will cause too much
noise - in that case we'll have to downgrade them to a WARNING, at least
temporarily.

This commit is not necessarily the best approach to address this issue, but it
should resolve the buildfarm failures. We can revise later.

Author: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/20210807190131.2bm24acbebl4wl6i@alap3.anarazel.de
2021-08-07 19:20:47 -07:00
Thomas Munro 0c08856856 Don't use #if inside function-like macro arguments.
No concrete problem reported, but in the past it's been known to cause
problems on some compilers so let's avoid doing that.

Reported-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/234364.1626704007%40sss.pgh.pa.us
2021-07-20 11:19:13 +12:00
Thomas Munro 04cad8f7bc Adjust commit 2dbe8905 for ancient macOS.
A couple of open flags used in an assertion didn't exist in macOS 10.4.
Per build farm animal prairiedog.  Also add O_EXCL while here (there are
a few more standard flags but they're not relevant and likely to be
missing).
2021-07-19 16:50:21 +12:00
Thomas Munro 2dbe890571 Support direct I/O on macOS.
Macs don't understand O_DIRECT, but they can disable caching with a
separate fcntl() call.  Extend the file opening functions in fd.c to
handle this for us if the caller passes in PG_O_DIRECT.

For now, this affects only WAL data and even then only if you set:

  max_wal_senders=0
  wal_level=minimal

This is not expected to be very useful on its own, but later proposed
patches will make greater use of direct I/O, and it'll be useful for
testing if developers on Macs can see the effects.

Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/CA%2BhUKG%2BADiyyHe0cun2wfT%2BSVnFVqNYPxoO6J9zcZkVO7%2BNGig%40mail.gmail.com
2021-07-19 11:01:01 +12:00
Peter Eisentraut c31833779d Message style improvements 2021-06-28 08:36:44 +02:00
Tom Lane def5b065ff Initial pgindent and pgperltidy run for v14.
Also "make reformat-dat-files".

The only change worthy of note is that pgindent messed up the formatting
of launcher.c's struct LogicalRepWorkerId, which led me to notice that
that struct wasn't used at all anymore, so I just took it out.
2021-05-12 13:14:10 -04:00
Michael Paquier 909b449e00 Fix concurrency issues with WAL segment recycling on Windows
This commit is mostly a revert of aaa3aed, that switched the routine
doing the internal renaming of recycled WAL segments to use on Windows a
combination of CreateHardLinkA() plus unlink() instead of rename().  As
reported by several users of Postgres 13, this is causing concurrency
issues when manipulating WAL segments, mostly in the shape of the
following error:
LOG:  could not rename file "pg_wal/000000XX000000YY000000ZZ":
Permission denied

This moves back to a logic where a single rename() (well, pgrename() for
Windows) is used.  This issue has proved to be hard to hit when I tested
it, facing it only once with an archive_command that was not able to do
its work, so it is environment-sensitive.  The reporters of this issue
have been able to confirm that the situation improved once we switched
back to a single rename().  In order to check things, I have provided to
the reporters a patched build based on 13.2 with aaa3aed reverted, to
test if the error goes away, and an unpatched build of 13.2 to test if
the error still showed up (just to make sure that I did not mess up my
build process).

Extra thanks to Fujii Masao for pointing out what looked like the
culprit commit, and to all the reporters for taking the time to test
what I have sent them.

Reported-by: Andrus, Guy Burgess, Yaroslav Pashinsky, Thomas Trenz
Reviewed-by: Tom Lane, Andres Freund
Discussion: https://postgr.es/m/3861ff1e-0923-7838-e826-094cc9bef737@hot.ee
Discussion: https://postgr.es/m/16874-c3eecd319e36a2bf@postgresql.org
Discussion: https://postgr.es/m/095ccf8d-7f58-d928-427c-b17ace23cae6@burgess.co.nz
Discussion: https://postgr.es/m/16927-67c570d968c99567%40postgresql.org
Discussion: https://postgr.es/m/YFBcRbnBiPdGZvfW@paquier.xyz
Backpatch-through: 13
2021-03-22 14:02:26 +09:00