These opcodes have been around in the AMD world since 2007, and 2008 in
the case of intel. They're supported in GCC and Clang via some __builtin
macros. The opcodes may be unavailable during runtime, in which case we
fall back on a C-based implementation of the code. In order to get the
POPCNT instruction we must pass the -mpopcnt option to the compiler. We
do this only for the pg_bitutils.c file.
David Rowley (with fragments taken from a patch by Thomas Munro)
Discussion: https://postgr.es/m/CAKJS1f9WTAGG1tPeJnD18hiQW5gAk59fQ6WK-vfdAKEHyRg2RA@mail.gmail.com
Using strtod() creates a double-rounding problem; the input decimal
value is first rounded to the nearest double; rounding that to the
nearest float may then give an incorrect result.
An example is that 7.038531e-26 when input via strtod and then rounded
to float4 gives 0xAE43FEp-107 instead of the correct 0xAE43FDp-107.
Values output by earlier PG versions with extra_float_digits=3 should
all be read in with the same values as previously. However, values
supplied by other software using shortest representations could be
mis-read.
On platforms that lack a strtof() entirely, we fall back to the old
incorrect rounding behavior. (As strtof() is required by C99, such
platforms are considered of primarily historical interest.) On VS2013,
some workarounds are used to get correct error handling.
The regression tests now test for the correct input values, so
platforms that lack strtof() will need resultmap entries. An entry for
HP-UX 10 is included (more may be needed).
Reviewed-By: Tom Lane
Discussion: https://postgr.es/m/871s5emitx.fsf@news-spur.riddles.org.uk
Discussion: https://postgr.es/m/87d0owlqpv.fsf@news-spur.riddles.org.uk
While Windows (allegedly) has _configthreadlocale() pretty far back,
it seems MinGW didn't acquire support for that till more recently.
Fortunately, we can use an autoconf probe on that toolchain,
instead of guessing whether it's there. (Hm, I wonder whether Cygwin
will need this also.)
Per buildfarm.
Discussion: https://postgr.es/m/20190121193512.tdmcnic2yjxlufaw@alap3.anarazel.de
ecpglib attempts to force the LC_NUMERIC locale to "C" while reading
server output, to avoid problems with strtod() and related functions.
Historically it's just issued setlocale() calls to do that, but that
has major problems if we're in a threaded application. setlocale()
itself is not required by POSIX to be thread-safe (and indeed is not,
on recent OpenBSD). Moreover, its effects are process-wide, so that
we could cause unexpected results in other threads, or another thread
could change our setting.
On platforms having uselocale(), which is required by POSIX:2008,
we can avoid these problems by using uselocale() instead. Windows
goes its own way as usual, but we can make it safe by using
_configthreadlocale(). Platforms having neither continue to use the
old code, but that should be pretty much nobody among current systems.
This should get back-patched, but let's see what the buildfarm
thinks of it first.
Michael Meskes and Tom Lane; thanks also to Takayuki Tsunakawa.
Discussion: https://postgr.es/m/31420.1547783697@sss.pgh.pa.us
This removes a portion of infrastructure introduced by fe0a0b5 to allow
compilation of Postgres in environments where no strong random source is
available, meaning that there is no linking to OpenSSL and no
/dev/urandom (Windows having its own CryptoAPI). No systems shipped
this century lack /dev/urandom, and the buildfarm is actually not
testing this switch at all, so just remove it. This simplifies
particularly some backend code which included a fallback implementation
using shared memory, and removes a set of alternate regression output
files from pgcrypto.
Author: Michael Paquier
Reviewed-by: Tom Lane
Discussion: https://postgr.es/m/20181230063219.GG608@paquier.xyz
It appears that all platforms that have sys_siglist[] also have
strsignal(), making that fallback case in pg_strsignal() dead code.
Getting rid of it allows dropping a configure test, which seems worth
more than providing textual signal descriptions on whatever platforms
might still hypothetically have use for the fallback case.
Discussion: https://postgr.es/m/25758.1544983503@sss.pgh.pa.us
At least as far back as the 2008 spec, POSIX has defined strsignal(3)
for looking up descriptive strings for signal numbers. We hadn't gotten
the word though, and were still using the crufty old sys_siglist array,
which is in no standard even though most Unixen provide it.
Aside from not being formally standards-compliant, this was just plain
ugly because it involved #ifdef's at every place using the code.
To eliminate the #ifdef's, create a portability function pg_strsignal,
which wraps strsignal(3) if available and otherwise falls back to
sys_siglist[] if available. The set of Unixen with neither API is
probably empty these days, but on any platform with neither, you'll
just get "unrecognized signal". All extant callers print the numeric
signal number too, so no need to work harder than that.
Along the way, upgrade pg_basebackup's child-error-exit reporting
to match the rest of the system.
Discussion: https://postgr.es/m/25758.1544983503@sss.pgh.pa.us
Add another transfer mode --clone to pg_upgrade (besides the existing
--link and the default copy), using special file cloning calls. This
makes the file transfer faster and more space efficient, achieving
speed similar to --link mode without the associated drawbacks.
On Linux, file cloning is supported on Btrfs and XFS (if formatted with
reflink support). On macOS, file cloning is supported on APFS.
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Forward to POSIX pread() and pwrite(), or emulate them if unavailable.
The emulation is not perfect as the file position is changed, so
we'll put pg_ prefixes on the names to minimize the risk of confusion
in future patches that might inadvertently try to mix pread() and read()
on the same file descriptor.
Author: Thomas Munro
Reviewed-by: Tom Lane, Jesper Pedersen
Discussion: https://postgr.es/m/CAEepm=02rapCpPR3ZGF2vW=SBHSdFYO_bz_f-wwWJonmA3APgw@mail.gmail.com
NetBSD-current generates a large number of warnings about "%m" not
being appropriate to use with *printf functions. While that's true
for their native printf, it's surely not true for snprintf.c, so I
think they have misunderstood gcc's definition of the "gnu_printf"
archetype. Nonetheless, choosing "__syslog__" instead silences the
warnings; so teach configure about that.
Since this is only a cosmetic warning issue (and anyway it depends
on previous hacking to be self-consistent), no back-patch.
Discussion: https://postgr.es/m/16785.1539046036@sss.pgh.pa.us
In combination, these changes make our version of snprintf as fast
or faster than most platforms' native snprintf, except for cases
involving floating-point conversion (which we still delegate to
the native sprintf). The speed penalty for a float conversion
is down to around 10% though, much better than before.
Notable changes:
* Rather than always parsing the format twice to see if it contains
instances of %n$, do the extra scan only if we actually find a $.
This obviously wins for non-localized formats, and even when there
is use of %n$, we can avoid scanning text before the first % twice.
* Use strchrnul() if available to find the next %, and emit the
literal text between % escapes as strings rather than char-by-char.
* Create a bespoke function (dopr_outchmulti) for the common case
of emitting N copies of the same character, in place of writing
loops around dopr_outch.
* Simplify construction of the format string for invocations of sprintf
for floats.
* Const-ify some internal functions, and avoid unnecessary use of
pass-by-reference arguments.
Patch by me, reviewed by Andres Freund
Discussion: https://postgr.es/m/11787.1534530779@sss.pgh.pa.us
The method we've traditionally used, of redeclaring strerror_r() to
see if the compiler complains of inconsistent declarations, turns out
not to work reliably because some compilers only report a warning,
not an error. Amazingly, this has gone undetected for years, even
though it certainly breaks our detection of whether strerror_r
succeeded.
Let's instead test whether the compiler will take the result of
strerror_r() as a switch() argument. It's possible this won't
work universally either, but it's the best idea I could come up with
on the spur of the moment.
We should probably back-patch this once the dust settles, but
first let's see what the buildfarm thinks of it.
Discussion: https://postgr.es/m/10877.1537993279@sss.pgh.pa.us
We've spent an awful lot of effort over the years in coping with
platform-specific vagaries of the *printf family of functions. Let's just
forget all that mess and standardize on always using src/port/snprintf.c.
This gets rid of a lot of configure logic, and it will allow a saner
approach to dealing with %m (though actually changing that is left for
a follow-on patch).
Preliminary performance testing suggests that as it stands, snprintf.c is
faster than the native printf functions for some tasks on some platforms,
and slower for other cases. A pending patch will improve that, though
cases with floating-point conversions will doubtless remain slower unless
we want to put a *lot* of effort into that. Still, we've not observed
that *printf is really a performance bottleneck for most workloads, so
I doubt this matters much.
Patch by me, reviewed by Michael Paquier
Discussion: https://postgr.es/m/2975.1526862605@sss.pgh.pa.us
elog.c has long had a private strerror wrapper that handles assorted
possible failures or deficiencies of the platform's strerror. On Windows,
it also knows how to translate Winsock error codes, which the native
strerror does not. Move all this code into src/port/strerror.c and
define strerror() as a macro that invokes it, so that both our frontend
and backend code will have all of this behavior.
I believe this constitutes an actual bug fix on Windows, since AFAICS
our frontend code did not report Winsock error codes properly before this.
However, the main point is to lay the groundwork for implementing %m
in src/port/snprintf.c: the behavior we want %m to have is this one,
not the native strerror's.
Note that this throws away the prior use of src/port/strerror.c,
which was to implement strerror() on platforms lacking it. That's
been dead code for nigh twenty years now, since strerror() was
already required by C89.
We should likewise cause strerror_r to use this behavior, but
I'll tackle that separately.
Patch by me, reviewed by Michael Paquier
Discussion: https://postgr.es/m/2975.1526862605@sss.pgh.pa.us
Previously, pgbench always used select(2) for this purpose, but that's
problematic for very high client counts, because select() can't deal
with file descriptor numbers larger than FD_SETSIZE. It's pretty common
for that to be only 1024 or so, whereas modern OSes can allow many more
open files than that. Using poll(2) would surmount that problem, but it
creates another one: poll()'s timeout resolution is only 1ms, which is
poor enough to cause problems with --rate specifications approaching or
exceeding 1K TPS.
On platforms that have ppoll(2), which includes Linux and recent
FreeBSD, we can use that to avoid the FD_SETSIZE problem without any
loss of timeout resolution. Hence, add configure logic to test for
ppoll(), and use it if available.
This patch introduces an abstraction layer into pgbench that could
be extended to support other kernel event-wait APIs such as kevents.
But actually adding such support is a matter for some future patch.
Doug Rady, reviewed by Robert Haas and Fabien Coelho, and whacked around
a good bit more by me
Discussion: https://postgr.es/m/23D017C9-81B7-484D-8490-FD94DEC4DF59@amazon.com
This reverts commit 3a60c8ff89. Buildfarm
results show that that caused a whole bunch of new warnings on platforms
where gcc believes the local printf to be non-POSIX-compliant. This
problem outweighs the hypothetical-anyway possibility of getting warnings
for misuse of %m. We could use gnu_printf archetype when we've substituted
src/port/snprintf.c, but that brings us right back to the problem of not
getting warnings for %m.
A possible answer is to attack it in the other direction by insisting
that %m support be included in printf's feature set, but that will take
more investigation. In the meantime, revert the previous change, and
update the comment for PGAC_C_PRINTF_ARCHETYPE to more fully explain
what's going on.
Discussion: https://postgr.es/m/2975.1526862605@sss.pgh.pa.us
The elog/ereport family of functions certainly support the %m format spec,
because they implement it "by hand". But elsewhere we have printf wrappers
that might or might not allow it depending on whether the platform's printf
does. (Most non-glibc versions don't, and notably, src/port/snprintf.c
doesn't.) Hence, rather than using the gnu_printf format archetype
interchangeably for all these functions, use it only for elog/ereport.
This will allow us to get compiler warnings for mistakes like the ones
fixed in commit a13b47a59, at least on platforms where printf doesn't
take %m and gcc is correctly configured to know it. (Unfortunately,
that won't happen on Linux, nor on macOS according to my testing.
It remains to be seen what the buildfarm's gcc-on-Windows animals will
think of this, but we may well have to rely on less-popular platforms
to warn us about unportable code of this kind.)
Discussion: https://postgr.es/m/2975.1526862605@sss.pgh.pa.us
During the work of upstreaming my previous patches for gdb and perf
support the API changed. Adapt. Normally this wouldn't necessarily be
something to backpatch, but the previous API wasn't upstream, and at
least the gdb support is quite useful for debugging.
Author: Andres Freund
Backpatch: 11, where LLVM based JIT support was added.
We used to claim to support platforms using 'q' or 'I64' as the printf
length modifier for long long int, by dint of replacing snprintf with
our own code which uses the C99 standard 'll' modifier. But that is
only adequate if we use INT64_MODIFIER only in snprintf-based calls,
not directly with the platform's native printf or fprintf. Which
hasn't been the case for years. We had not noticed, partially because
of inadequate test coverage, and partially because the buildfarm is
almost completely bare of machines that won't take 'll'. The last
one seems to have been frogmouth, which was adjusted recently so that
it will take 'll'. We might as well just give up on the pretense
that anything else works, and save ourselves some configure cycles.
Discussion: https://postgr.es/m/13103.1526749980@sss.pgh.pa.us
Discussion: https://postgr.es/m/24769.1526772680@sss.pgh.pa.us
Ancient HPUX, for one, does this. We hadn't noticed due to the lack
of regression tests that required a working strtoll.
(I was slightly tempted to remove the other historical spelling,
strto[u]q, since it seems we have no buildfarm members testing that case.
But I refrained.)
Discussion: https://postgr.es/m/151935568942.1461.14623890240535309745@wrigleys.postgresql.org
Buildfarm member dromedary is still unhappy about the recently-added
ecpg "long long" tests. The reason turns out to be that it includes
"-ansi" in its CFLAGS, and in their infinite wisdom Apple have decided
to hide the declarations of strtoll/strtoull in C89-compliant builds.
(I find it pretty curious that they hide those function declarations
when you can nonetheless declare a "long long" variable, but anyway
that is their behavior, both on dromedary's obsolete macOS version and
the newest and shiniest.) As a result, gcc assumes these functions
return "int", leading naturally to wrong results.
(Looking at dromedary's past build results, it's evident that this
problem also breaks pg_strtouint64() on 32-bit platforms; but we
evidently have no regression tests that exercise that function with
values above 32 bits.)
To fix, supply declarations for these functions when the platform
provides the functions but not the declarations, using the same type
of mechanism as we use for some other similar cases.
Discussion: https://postgr.es/m/151935568942.1461.14623890240535309745@wrigleys.postgresql.org
ARMv8 introduced special CPU instructions for calculating CRC-32C. Use
them, when available, for speed.
Like with the similar Intel CRC instructions, several factors affect
whether the instructions can be used. The compiler intrinsics for them must
be supported by the compiler, and the instructions must be supported by the
target architecture. If the compilation target architecture does not
support the instructions, but adding "-march=armv8-a+crc" makes them
available, then we compile the code with a runtime check to determine if
the host we're running on supports them or not.
For the runtime check, use glibc getauxval() function. Unfortunately,
that's not very portable, but I couldn't find any more portable way to do
it. If getauxval() is not available, the CRC instructions will still be
used if the target architecture supports them without any additional
compiler flags, but the runtime check will not be available.
Original patch by Yuqi Gu, heavily modified by me. Reviewed by Andres
Freund, Thomas Munro.
Discussion: https://www.postgresql.org/message-id/HE1PR0801MB1323D171938EABC04FFE7FA9E3110%40HE1PR0801MB1323.eurprd08.prod.outlook.com
LLVM will be used for *optional* Just-in-time compilation
support. This commit just adds the configure infrastructure that
detects LLVM.
No documentation has been added for the --with-llvm flag, that'll be
added after the actual supporting code has been added.
Author: Andres Freund
Discussion: https://postgr.es/m/20170901064131.tazjxwus3k2w3ybh@alap3.anarazel.de
Since e3bdb2d926, libpq failed to build on
some platforms because they did not have SSL_clear_options(). Although
mainline OpenSSL introduced SSL_clear_options() after
SSL_OP_NO_COMPRESSION, so the code should have built fine, at least an
old NetBSD version (build farm "coypu" NetBSD 5.1 gcc 4.1.3 PR-20080704
powerpc) has SSL_OP_NO_COMPRESSION but no SSL_clear_options().
So add a configure check for SSL_clear_options(). If we don't find it,
skip the call. That means on such a platform one cannot *enable* SSL
compression if the built-in default is off, but that seems an unlikely
combination anyway and not very interesting in practice.
It seems we can't easily work around the lack of
X509_get_signature_nid(), so revert the previous attempts and just
disable the tls-server-end-point feature if we don't have it.
While ldaptls=1 provides an RFC 4513 conforming way to do LDAP
authentication with TLS encryption, there was an earlier de facto
standard way to do LDAP over SSL called LDAPS. Even though it's not
enshrined in a standard, it's still widely used and sometimes required
by organizations' network policies. There seems to be no reason not to
support it when available in the client library. Therefore, add support
when using OpenLDAP 2.4+ or Windows. It can be configured with
ldapscheme=ldaps or ldapurl=ldaps://...
Add tests for both ways of requesting LDAPS and a test for the
pre-existing ldaptls=1. Modify the 001_auth.pl test for "diagnostic
messages", which was previously relying on the server rejecting
ldaptls=1.
Author: Thomas Munro
Reviewed-By: Peter Eisentraut
Discussion: https://postgr.es/m/CAEepm=1s+pA-LZUjQ-9GQz0Z4rX_eK=DFXAF1nBQ+ROPimuOYQ@mail.gmail.com
It's not easy to get signed integer overflow checks correct and
fast. Therefore abstract the necessary infrastructure into a common
header providing addition, subtraction and multiplication for 16, 32,
64 bit signed integers.
The new macros aren't yet used, but a followup commit will convert
several open coded overflow checks.
Author: Andres Freund, with some code stolen from Greg Stark
Reviewed-By: Robert Haas
Discussion: https://postgr.es/m/20171024103954.ztmatprlglz3rwke@alap3.anarazel.de
Our initial work with int128 neglected alignment considerations, an
oversight that came back to bite us in bug #14897 from Vincent Lachenal.
It is unsurprising that int128 might have a 16-byte alignment requirement;
what's slightly more surprising is that even notoriously lax Intel chips
sometimes enforce that.
Raising MAXALIGN seems out of the question: the costs in wasted disk and
memory space would be significant, and there would also be an on-disk
compatibility break. Nor does it seem very practical to try to allow some
data structures to have more-than-MAXALIGN alignment requirement, as we'd
have to push knowledge of that throughout various code that copies data
structures around.
The only way out of the box is to make type int128 conform to the system's
alignment assumptions. Fortunately, gcc supports that via its
__attribute__(aligned()) pragma; and since we don't currently support
int128 on non-gcc-workalike compilers, we shouldn't be losing any platform
support this way.
Although we could have just done pg_attribute_aligned(MAXIMUM_ALIGNOF) and
called it a day, I did a little bit of extra work to make the code more
portable than that: it will also support int128 on compilers without
__attribute__(aligned()), if the native alignment of their 128-bit-int
type is no more than that of int64.
Add a regression test case that exercises the one known instance of the
problem, in parallel aggregation over a bigint column.
This will need to be back-patched, along with the preparatory commit
91aec93e6. But let's see what the buildfarm makes of it first.
Discussion: https://postgr.es/m/20171110185747.31519.28038@wrigleys.postgresql.org
Unfortunately using 'restrict' plainly causes problems with MSVC,
which supports restrict only as '__restrict'. Defining 'restrict' to
'__restrict' unfortunately causes a conflict with MSVC's usage of
__declspec(restrict) in headers.
Therefore define pg_restrict to the appropriate keyword instead, and
replace existing usages.
This replaces the temporary workaround introduced in 36b4b91ba0.
Author: Andres Freund
Discussion: https://postgr.es/m/2656.1507830907@sss.pgh.pa.us
Will be used in later commits improving performance for a few key
routines where information about aliasing allows for significantly
better code generation.
This allows to use the C99 'restrict' keyword without breaking C89, or
for that matter C++, compilers. If not supported it's defined to be
empty.
Author: Andres Freund
Discussion: https://postgr.es/m/20170914063418.sckdzgjfrsbekae4@alap3.anarazel.de
The previous placement of the fallback implementation in libpgcommon
was problematic, because libpqport functions need strnlen
functionality.
Move replacement into libpgport. Provide strnlen() under its posix
name, instead of pg_strnlen(). Fix stupid configure bug, executing the
test only when compiled with threading support.
Author: Andres Freund
Discussion: https://postgr.es/m/E1e1gR2-0005fB-SI@gemulon.postgresql.org
Upcoming patches are going to address performance issues that involve
slow system provided ntohs/htons etc. To address that expand
pg_bswap.h to provide pg_ntoh{16,32,64}, pg_hton{16,32,64} and
optimize their respective implementations by using compiler intrinsics
for gcc compatible compilers and msvc. Fall back to manual
implementations using shifts etc otherwise.
Additionally remove multiple evaluation hazards from the existing
BSWAP32/64 macros, by replacing them with inline functions when
necessary. In the course of that the naming scheme is changed to
pg_bswap16/32/64.
Author: Andres Freund
Discussion: https://postgr.es/m/20170927172019.gheidqy6xvlxb325@alap3.anarazel.de
On Linux, shared memory segments created with shm_open() are backed by
swap files created in tmpfs. If the swap file needs to be extended,
but there's no tmpfs space left, you get a very unfriendly SIGBUS trap.
To avoid this, force allocation of the full request size when we create
the segment. This adds a few cycles, but none that we wouldn't expend
later anyway, assuming the request isn't hugely bigger than the actual
need.
Make this code #ifdef __linux__, because (a) there's not currently a
reason to think the same problem exists on other platforms, and (b)
applying posix_fallocate() to an FD created by shm_open() isn't very
portable anyway.
Back-patch to 9.4 where the DSM code came in.
Thomas Munro, per a bug report from Amul Sul
Discussion: https://postgr.es/m/1002664500.12301802.1471008223422.JavaMail.yahoo@mail.yahoo.com
These functions are required by SUS v2, which is our minimum baseline
for Unix platforms, and are present on all interesting Windows versions
as well. Even our oldest buildfarm members have them. Thus, we were not
testing the "!USE_WIDE_UPPER_LOWER" code paths, which explains why the bug
fixed in commit e6023ee7f escaped detection. Per discussion, there seems
to be no more real-world value in maintaining this option. Hence, remove
the configure-time tests for wcstombs() and towlower(), remove the
USE_WIDE_UPPER_LOWER symbol, and remove all the !USE_WIDE_UPPER_LOWER code.
There's not actually all that much of the latter, but simplifying the #if
nests is a win in itself.
Discussion: https://postgr.es/m/20170921052928.GA188913@rfd.leadboat.com
For performance reasons a larger segment size than the default 16MB
can be useful. A larger segment size has two main benefits: Firstly,
in setups using archiving, it makes it easier to write scripts that
can keep up with higher amounts of WAL, secondly, the WAL has to be
written and synced to disk less frequently.
But at the same time large segment size are disadvantageous for
smaller databases. So far the segment size had to be configured at
compile time, often making it unrealistic to choose one fitting to a
particularly load. Therefore change it to a initdb time setting.
This includes a breaking changes to the xlogreader.h API, which now
requires the current segment size to be configured. For that and
similar reasons a number of binaries had to be taught how to recognize
the current segment size.
Author: Beena Emerson, editorialized by Andres Freund
Reviewed-By: Andres Freund, David Steele, Kuntal Ghosh, Michael
Paquier, Peter Eisentraut, Robert Hass, Tushar Ahuja
Discussion: https://postgr.es/m/CAOG9ApEAcQ--1ieKbhFzXSQPw_YLmepaa4hNdnY5+ZULpt81Mw@mail.gmail.com
Instead of using a cast to force the constant to be the right width,
assume we can plaster on an L, UL, LL, or ULL suffix as appropriate.
The old approach to this is very hoary, dating from before we were
willing to require compilers to have working int64 types.
This fix makes the PG_INT64_MIN, PG_INT64_MAX, and PG_UINT64_MAX
constants safe to use in preprocessor conditions, where a cast
doesn't work. Other symbolic constants that might be defined using
[U]INT64CONST are likewise safer than before.
Also fix the SIZE_MAX macro to be similarly safe, if we are forced
to provide a definition for that. The test added in commit 2e70d6b5e
happens to do what we want even with the hack "(size_t) -1" definition,
but we could easily get burnt on other tests in future.
Back-patch to all supported branches, like the previous commits.
Discussion: https://postgr.es/m/15883.1504278595@sss.pgh.pa.us
Various bugs can cause crashes, so don't use that function before ICU
53. It will fall back to the code path used for other encodings.
Since we now tie the function availability to an ICU version, we don't
need the configure test anymore. That also resolves the issue that the
test result was previously hardcoded for Windows.
researched by Daniel Verite <daniel@manitou-mail.org>, Peter Geoghegan
<pg@bowt.ie>, Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://www.postgresql.org/message-id/flat/f1438ec6-22aa-4029-9a3b-26f79d330e72%40manitou-mail.org
This reverts commit 81069a9efc.
Buildfarm results suggest that some platforms have versions of pselect(2)
that are not merely non-atomic, but flat out non-functional. Revert the
use-pselect patch to confirm this diagnosis (and exclude the no-SA_RESTART
patch as the source of trouble). If it's so, we should probably look into
blacklisting specific platforms that have broken pselect.
Discussion: https://postgr.es/m/9696.1493072081@sss.pgh.pa.us
Traditionally we've unblocked signals, called select(2), and then blocked
signals again. The code expects that the select() will be cancelled with
EINTR if an interrupt occurs; but there's a race condition, which is that
an already-pending signal will be delivered as soon as we unblock, and then
when we reach select() there will be nothing preventing it from waiting.
This can result in a long delay before we perform any action that
ServerLoop was supposed to have taken in response to the signal. As with
the somewhat-similar symptoms fixed by commit 893902085, the main practical
problem is slow launching of parallel workers. The window for trouble is
usually pretty short, corresponding to one iteration of ServerLoop; but
it's not negligible.
To fix, use pselect(2) in place of select(2) where available, as that's
designed to solve exactly this problem. Where not available, we continue
to use the old way, and are no worse off than before.
pselect(2) has been required by POSIX since about 2001, so most modern
platforms should have it. A bigger portability issue is that some
implementations are said to be non-atomic, ie pselect() isn't really
any different from unblock/select/reblock. Still, we're no worse off
than before on such a platform.
There is talk of rewriting the postmaster to use a WaitEventSet and
not do signal response work in signal handlers, at which point this
could be reverted, since we'd be using a self-pipe to solve the race
condition. But that's not happening before v11 at the earliest.
Back-patch to 9.6. The problem exists much further back, but the
worst symptom arises only in connection with parallel query, so it
does not seem worth taking any portability risks in older branches.
Discussion: https://postgr.es/m/9205.1492833041@sss.pgh.pa.us
poll.h is mandated by Single Unix Spec v2, the usual baseline for
postgres on unix. None of the unixoid buildfarms animals has
sys/poll.h but not poll.h. Therefore there's not much point to test
for sys/poll.h's existence and include it optionally.
Author: Andres Freund, per suggestion from Tom Lane
Discussion: https://postgr.es/m/20505.1492723662@sss.pgh.pa.us
copyObject() is declared to return void *, which allows easily assigning
the result independent of the input, but it loses all type checking.
If the compiler supports typeof or something similar, cast the result to
the input type. This creates a greater amount of type safety. In some
cases, where the result is assigned to a generic type such as Node * or
Expr *, new casts are now necessary, but in general casts are now
unnecessary in the normal case and indicate that something unusual is
happening.
Reviewed-by: Mark Dilger <hornschnorter@gmail.com>
Add a column collprovider to pg_collation that determines which library
provides the collation data. The existing choices are default and libc,
and this adds an icu choice, which uses the ICU4C library.
The pg_locale_t type is changed to a union that contains the
provider-specific locale handles. Users of locale information are
changed to look into that struct for the appropriate handle to use.
Also add a collversion column that records the version of the collation
when it is created, and check at run time whether it is still the same.
This detects potentially incompatible library upgrades that can corrupt
indexes and other structures. This is currently only supported by
ICU-provided collations.
initdb initializes the default collation set as before from the `locale
-a` output but also adds all available ICU locales with a "-x-icu"
appended.
Currently, ICU-provided collations can only be explicitly named
collations. The global database locales are still always libc-provided.
ICU support is enabled by configure --with-icu.
Reviewed-by: Thomas Munro <thomas.munro@enterprisedb.com>
Reviewed-by: Andreas Karlsson <andreas@proxel.se>
We had some AC_CHECK_HEADER tests that were really wastes of cycles,
because the code proceeded to #include those headers unconditionally
anyway, in all or a large majority of cases. The lack of complaints
shows that those headers are available on every platform of interest,
so we might as well let configure run a bit faster by not probing
those headers at all.
I suspect that some of the tests I left alone are equally useless, but
since all the existing #includes of the remaining headers are properly
guarded, I didn't touch them.
Per discussion, the time has come to do this. The handwriting has been
on the wall at least since 9.0 that this would happen someday, whenever
it got to be too much of a burden to support the float-timestamp option.
The triggering factor now is the discovery that there are multiple bugs
in the code that attempts to implement use of integer timestamps in the
replication protocol even when the server is built for float timestamps.
The internal float timestamps leak into the protocol fields in places.
While we could fix the identified bugs, there's a very high risk of
introducing more. Trying to build a wall that would positively prevent
mixing integer and float timestamps is more complexity than we want to
undertake to maintain a long-deprecated option. The fact that these
bugs weren't found through testing also indicates a lack of interest
in float timestamps.
This commit disables configure's --disable-integer-datetimes switch
(it'll still accept --enable-integer-datetimes, though), removes direct
references to USE_INTEGER_DATETIMES, and removes discussion of float
timestamps from the user documentation. A considerable amount of code is
rendered dead by this, but removing that will occur as separate mop-up.
Discussion: https://postgr.es/m/26788.1487455319@sss.pgh.pa.us
The advantage of clock_gettime() is that the API allows the result to
be precise to nanoseconds, not just microseconds as in gettimeofday().
Now that it's routinely possible to do tens of plan node executions
in 1us, we really need more precision than gettimeofday() can offer
for EXPLAIN ANALYZE to accumulate statistics with.
Some research shows that clock_gettime() is available on pretty nearly
every modern Unix-ish platform, and as far as I have been able to test,
it has about the same execution time as gettimeofday(), so there's no
loss in switching over. (By the same token, this doesn't do anything
to fix the fact that we really wish clock readings were faster. But
there's enough win here to justify changing anyway.)
A small side benefit is that on most platforms, we can use CLOCK_MONOTONIC
instead of CLOCK_REALTIME and thereby render EXPLAIN impervious to
concurrent resets of the system clock. (This means that code must not
assume that the contents of struct instr_time have any well-defined
interpretation as timestamps, but really that was true before.)
Some platforms offer nonstandard clock IDs that might be of interest.
This patch knows we should use CLOCK_MONOTONIC_RAW on macOS, because it
provides more precision and is faster to read than their CLOCK_MONOTONIC.
If there turn out to be many more cases where we need special rules, it
might be appropriate to handle the selection of clock ID in configure,
but for the moment that doesn't seem worth the trouble.
Discussion: https://postgr.es/m/31856.1400021891@sss.pgh.pa.us
This adds a new routine, pg_strong_random() for generating random bytes,
for use in both frontend and backend. At the moment, it's only used in
the backend, but the upcoming SCRAM authentication patches need strong
random numbers in libpq as well.
pg_strong_random() is based on, and replaces, the existing implementation
in pgcrypto. It can acquire strong random numbers from a number of sources,
depending on what's available:
- OpenSSL RAND_bytes(), if built with OpenSSL
- On Windows, the native cryptographic functions are used
- /dev/urandom
Unlike the current pgcrypto function, the source is chosen by configure.
That makes it easier to test different implementations, and ensures that
we don't accidentally fall back to a less secure implementation, if the
primary source fails. All of those methods are quite reliable, it would be
pretty surprising for them to fail, so we'd rather find out by failing
hard.
If no strong random source is available, we fall back to using erand48(),
seeded from current timestamp, like PostmasterRandom() was. That isn't
cryptographically secure, but allows us to still work on platforms that
don't have any of the above stronger sources. Because it's not very secure,
the built-in implementation is only used if explicitly requested with
--disable-strong-random.
This replaces the more complicated Fortuna algorithm we used to have in
pgcrypto, which is unfortunate, but all modern platforms have /dev/urandom,
so it doesn't seem worth the maintenance effort to keep that. pgcrypto
functions that require strong random numbers will be disabled with
--disable-strong-random.
Original patch by Magnus Hagander, tons of further work by Michael Paquier
and me.
Discussion: https://www.postgresql.org/message-id/CAB7nPqRy3krN8quR9XujMVVHYtXJ0_60nqgVc6oUk8ygyVkZsA@mail.gmail.com
Discussion: https://www.postgresql.org/message-id/CAB7nPqRWkNYRRPJA7-cF+LfroYV10pvjdz6GNvxk-Eee9FypKA@mail.gmail.com
LibreSSL defines OPENSSL_VERSION_NUMBER to claim that it is version 2.0.0,
but it doesn't have the functions added in OpenSSL 1.1.0. Add autoconf
checks for the individual functions we need, and stop relying on
OPENSSL_VERSION_NUMBER.
Backport to 9.5 and 9.6, like the patch that broke this. In the
back-branches, there are still a few OPENSSL_VERSION_NUMBER checks left,
to check for OpenSSL 0.9.8 or 0.9.7. I left them as they were - LibreSSL
has all those functions, so they work as intended.
Per buildfarm member curculio.
Discussion: <2442.1473957669@sss.pgh.pa.us>
Create a "bsd" auth method that works the same as "password" so far as
clients are concerned, but calls the BSD Authentication service to
check the password. This is currently only available on OpenBSD.
Marisa Emerson, reviewed by Thomas Munro
Commit ac1d794 ("Make idle backends exit if the postmaster dies.")
introduced a regression on, at least, large linux systems. Constantly
adding the same postmaster_alive_fds to the OSs internal datastructures
for implementing poll/select can cause significant contention; leading
to a performance regression of nearly 3x in one example.
This can be avoided by using e.g. linux' epoll, which avoids having to
add/remove file descriptors to the wait datastructures at a high rate.
Unfortunately the current latch interface makes it hard to allocate any
persistent per-backend resources.
Replace, with a backward compatibility layer, WaitLatchOrSocket with a
new WaitEventSet API. Users can allocate such a Set across multiple
calls, and add more than one file-descriptor to wait on. The latter has
been added because there's upcoming postgres features where that will be
helpful.
In addition to the previously existing poll(2), select(2),
WaitForMultipleObjects() implementations also provide an epoll_wait(2)
based implementation to address the aforementioned performance
problem. Epoll is only available on linux, but that is the most likely
OS for machines large enough (four sockets) to reproduce the problem.
To actually address the aforementioned regression, create and use a
long-lived WaitEventSet for FE/BE communication. There are additional
places that would benefit from a long-lived set, but that's a task for
another day.
Thanks to Amit Kapila, who helped make the windows code I blindly wrote
actually work.
Reported-By: Dmitry Vasilyev Discussion:
CAB-SwXZh44_2ybvS5Z67p_CDz=XFn4hNAD=CnMEF+QqkXwFrGg@mail.gmail.com20160114143931.GG10941@awork2.anarazel.de
Previously, we included <xlocale.h> only if necessary to get the definition
of type locale_t. According to notes in PGAC_TYPE_LOCALE_T, this is
important because on some versions of glibc that file supplies an
incompatible declaration of locale_t. (This info may be obsolete, because
on my RHEL6 box that seems to be the *only* definition of locale_t; but
there may still be glibc's in the wild for which it's a live concern.)
It turns out though that on FreeBSD and maybe other BSDen, you can get
locale_t from stdlib.h or locale.h but mbstowcs_l() and friends only from
<xlocale.h>. This was leaving us compiling calls to mbstowcs_l() and
friends with no visible prototype, which causes a warning and could
possibly cause actual trouble, since it's not declared to return int.
Hence, adjust the configure checks so that we'll include <xlocale.h>
either if it's necessary to get type locale_t or if it's necessary to
get a declaration of mbstowcs_l().
Report and patch by Aleksander Alekseev, somewhat whacked around by me.
Back-patch to all supported branches, since we have been using
mbstowcs_l() since 9.1.
Insert sd_notify() calls at server start and stop for integration with
systemd. This allows the use of systemd service units of type "notify",
which greatly simplifies the systemd configuration.
Reviewed-by: Pavel Stěhule <pavel.stehule@gmail.com>
It emerges that libreadline doesn't notice terminal window size change
events unless they occur while collecting input. This is easy to stumble
over if you resize the window while using a pager to look at query output,
but it can be demonstrated without any pager involvement. The symptom is
that queries exceeding one line are misdisplayed during subsequent input
cycles, because libreadline has the wrong idea of the screen dimensions.
The safest, simplest way to fix this is to call rl_reset_screen_size()
just before calling readline(). That causes an extra ioctl(TIOCGWINSZ)
for every command; but since it only happens when reading from a tty, the
performance impact should be negligible. A more valid objection is that
this still leaves a tiny window during entry to readline() wherein delivery
of SIGWINCH will be missed; but the practical consequences of that are
probably negligible. In any case, there doesn't seem to be any good way to
avoid the race, since readline exposes no functions that seem safe to call
from a generic signal handler --- rl_reset_screen_size() certainly isn't.
It turns out that we also need an explicit rl_initialize() call, else
rl_reset_screen_size() dumps core when called before the first readline()
call.
rl_reset_screen_size() is not present in old versions of libreadline,
so we need a configure test for that. (rl_initialize() is present at
least back to readline 4.0, so we won't bother with a test for it.)
We would need a configure test anyway since libedit's emulation of
libreadline doesn't currently include such a function. Fortunately,
libedit seems not to have any corresponding bug.
Merlin Moncure, adjusted a bit by me
This is like BSWAP32, but for 64-bit values. Since we've got two of
them now and they have use cases (like sortsupport) beyond CRCs, move
the definitions to their own header file.
Peter Geoghegan
Remove configure's checks for HAVE_POSIX_SIGNALS, HAVE_SIGPROCMASK, and
HAVE_SIGSETJMP. These APIs are required by the Single Unix Spec v2
(POSIX 1997), which we generally consider to define our minimum required
set of Unix APIs. Moreover, no buildfarm member has reported not having
them since 2012 or before, which means that even if the code is still live
somewhere, it's untested --- and we've made plenty of signal-handling
changes of late. So just take these APIs as given and save the cycles for
configure probes for them.
However, we can't remove as much C code as I'd hoped, because the Windows
port evidently still uses the non-POSIX code paths for signal masking.
Since we're largely emulating these BSD-style APIs for Windows anyway, it
might be a good thing to switch over to POSIX-like notation and thereby
remove a few more #ifdefs. But I'm not in a position to code or test that.
In the meantime, we can at least make things a bit more transparent by
testing for WIN32 explicitly in these places.
C89 requires <signal.h> to define sig_atomic_t, and there is no evidence
in the buildfarm that any supported platforms don't comply. Remove the
configure test to stop wasting build cycles on a purely historical issue.
(Once upon a time, we cared about supporting C89-compliant compilers on
machines with pre-C89 system headers, but that use-case has been dead for
quite a few years.)
I have some other fixes planned in this area, but let's start with this
to see if the buildfarm produces any surprising results.
These are emitted by the new ax_pthread.m4 script version. They are not
used for anything in PostgreSQL, but let's keep the generated header file
up-to-date.
Andres Freund
So far we have worked around the fact that some very old compilers do
not support 'inline' functions by only using inline functions
conditionally (or not at all). Since such compilers are very rare by
now, we have decided to rely on inline functions from 9.6 onwards.
To avoid breaking these old compilers inline is defined away when not
supported. That'll cause "function x defined but not used" type of
warnings, but since nobody develops on such compilers anymore that's
ok.
This change in policy will allow us to more easily employ inline
functions.
I chose to remove code previously conditional on PG_USE_INLINE as it
seemed confusing to have code dependent on a define that's always
defined.
Blacklisting of compilers, like in c53f73879f, now has to be done
differently. A platform template can define PG_FORCE_DISABLE_INLINE to
force inline to be defined empty.
Discussion: 20150701161447.GB30708@awork2.anarazel.de
Modern x86 and x86-64 processors with SSE 4.2 support have special
instructions, crc32b and crc32q, for calculating CRC-32C. They greatly
speed up CRC calculation.
Whether the instructions can be used or not depends on the compiler and the
target architecture. If generation of SSE 4.2 instructions is allowed for
the target (-msse4.2 flag on gcc and clang), use them. If they are not
allowed by default, but the compiler supports the -msse4.2 flag to enable
them, compile just the CRC-32C function with -msse4.2 flag, and check at
runtime whether the processor we're running on supports it. If it doesn't,
fall back to the slicing-by-8 algorithm. (With the common defaults on
current operating systems, the runtime-check variant is what you get in
practice.)
Abhijit Menon-Sen, heavily modified by me, reviewed by Andres Freund.
We will, for the foreseeable future, not expose 128 bit datatypes to
SQL. But being able to use 128bit math will allow us, in a later patch,
to use 128bit accumulators for some aggregates; leading to noticeable
speedups over using numeric.
So far we only detect a gcc/clang extension that supports 128bit math,
but no 128bit literals, and no *printf support. We might want to expand
this in the future to further compilers; if there are any that that
provide similar support.
Discussion: 544BB5F1.50709@proxel.se
Author: Andreas Karlsson, with significant editorializing by me
Reviewed-By: Peter Geoghegan, Oskari Saarenmaa
This speeds up WAL generation and replay. The new algorithm is
significantly faster with large inputs, like full-page images or when
inserting wide rows. It is slower with tiny inputs, i.e. less than 10 bytes
or so, but the speedup with longer inputs more than make up for that. Even
small WAL records at least have 24 byte header in the front.
The output is identical to the current byte-at-a-time computation, so this
does not affect compatibility. The new algorithm is only used for the
CRC-32C variant, not the legacy version used in tsquery or the
"traditional" CRC-32 used in hstore and ltree. Those are not as performance
critical, and are usually only applied over small inputs, so it seems
better to not carry around the extra lookup tables to speed up those rare
cases.
Abhijit Menon-Sen
We had code that supposed that some platforms might offer a nonstandard
version of getpwuid_r() with only four arguments. However, the 5-argument
definition has been standardized at least since the Single Unix Spec v2,
which is our normal reference for what's portable across all Unix-oid
platforms. (What's more, this wasn't the only pre-standardization version
of getpwuid_r(); my old HPUX 10.20 box has still another signature.)
So let's just get rid of the now-useless configure step.
Darwin --enable-nls builds use a substitute setlocale() that may start a
thread. Buildfarm member orangutan experienced BackendList corruption
on account of different postmaster threads executing signal handlers
simultaneously. Furthermore, a multithreaded postmaster risks undefined
behavior from sigprocmask() and fork(). Emit LOG messages about the
problem and its workaround. Back-patch to 9.0 (all supported versions).
Several upcoming performance/scalability improvements require atomic
operations. This new API avoids the need to splatter compiler and
architecture dependent code over all the locations employing atomic
ops.
For several of the potential usages it'd be problematic to maintain
both, a atomics using implementation and one using spinlocks or
similar. In all likelihood one of the implementations would not get
tested regularly under concurrency. To avoid that scenario the new API
provides a automatic fallback of atomic operations to spinlocks. All
properties of atomic operations are maintained. This fallback -
obviously - isn't as fast as just using atomic ops, but it's not bad
either. For one of the future users the atomics ontop spinlocks
implementation was actually slightly faster than the old purely
spinlock using implementation. That's important because it reduces the
fear of regressing older platforms when improving the scalability for
new ones.
The API, loosely modeled after the C11 atomics support, currently
provides 'atomic flags' and 32 bit unsigned integers. If the platform
efficiently supports atomic 64 bit unsigned integers those are also
provided.
To implement atomics support for a platform/architecture/compiler for
a type of atomics 32bit compare and exchange needs to be
implemented. If available and more efficient native support for flags,
32 bit atomic addition, and corresponding 64 bit operations may also
be provided. Additional useful atomic operations are implemented
generically ontop of these.
The implementation for various versions of gcc, msvc and sun studio have
been tested. Additional existing stub implementations for
* Intel icc
* HUPX acc
* IBM xlc
are included but have never been tested. These will likely require
fixes based on buildfarm and user feedback.
As atomic operations also require barriers for some operations the
existing barrier support has been moved into the atomics code.
Author: Andres Freund with contributions from Oskari Saarenmaa
Reviewed-By: Amit Kapila, Robert Haas, Heikki Linnakangas and Álvaro Herrera
Discussion: CA+TgmoYBW+ux5-8Ja=Mcyuy8=VXAnVRHp3Kess6Pn3DMXAPAEA@mail.gmail.com,
20131015123303.GH5300@awork2.anarazel.de,
20131028205522.GI20248@awork2.anarazel.de
We have had INT64_FORMAT and UINT64_FORMAT for a long time, but that's not
good enough if you want something more exotic, like "%20lld".
Abhijit Menon-Sen, per Andres Freund's suggestion.
This refactoring is in preparation for adding support for other SSL
implementations, with no user-visible effects. There are now two #defines,
USE_OPENSSL which is defined when building with OpenSSL, and USE_SSL which
is defined when building with any SSL implementation. Currently, OpenSSL is
the only implementation so the two #defines go together, but USE_SSL is
supposed to be used for implementation-independent code.
The libpq SSL code is changed to use a custom BIO, which does all the raw
I/O, like we've been doing in the backend for a long time. That makes it
possible to use MSG_NOSIGNAL to block SIGPIPE when using SSL, which avoids
a couple of syscall for each send(). Probably doesn't make much performance
difference in practice - the SSL encryption is expensive enough to mask the
effect - but it was a natural result of this refactoring.
Based on a patch by Martijn van Oosterhout from 2006. Briefly reviewed by
Alvaro Herrera, Andreas Karlsson, Jeff Janes.
Apparently we still build against OpenSSL so old that it doesn't
have this function, so add an autoconf check for it to make the
buildfarm happy. If the function doesn't exist, always return
that compression is disabled, since presumably the actual
compression functionality is always missing.
For now, hardcode the function as present on MSVC, since we should
hopefully be well beyond those old versions on that platform.
Almost ten years ago, commit e48322a6d6 broke
the logic in ACX_PTHREAD by looping through all the possible flags rather
than stopping with the first one that would work. This meant that
$acx_pthread_ok was no longer meaningful after the loop; it would usually
be "no", whether or not we'd found working thread flags. The reason nobody
noticed is that Postgres doesn't actually use any of the symbols set up
by the code after the loop. Rather than complicate things some more to
make it work as designed, let's just remove all that dead code, and thereby
save a few cycles in each configure run.
This function is pervasive on free software operating systems; import
NetBSD's implementation. Back-patch to 8.4, like the commit that will
harness it.
Allow the contrib/uuid-ossp extension to be built atop any one of these
three popular UUID libraries. (The extension's name is now arguably a
misnomer, but we'll keep it the same so as not to cause unnecessary
compatibility issues for users.)
We would not normally consider a change like this post-beta1, but the issue
has been forced by our upgrade to autoconf 2.69, whose more rigorous header
checks are causing OSSP's header files to be rejected on some platforms.
It's been foreseen for some time that we'd have to move away from depending
on OSSP UUID due to lack of upstream maintenance, so this is a down payment
on that problem.
While at it, add some simple regression tests, in hopes of catching any
major incompatibilities between the three implementations.
Matteo Beccati, with some further hacking by me
Since C99, it's been standard for printf and friends to accept a "z" size
modifier, meaning "whatever size size_t has". Up to now we've generally
dealt with printing size_t values by explicitly casting them to unsigned
long and using the "l" modifier; but this is really the wrong thing on
platforms where pointers are wider than longs (such as Win64). So let's
start using "z" instead. To ensure we can do that on all platforms, teach
src/port/snprintf.c to understand "z", and add a configure test to force
use of that implementation when the platform's version doesn't handle "z".
Having done that, modify a bunch of places that were using the
unsigned-long hack to use "z" instead. This patch doesn't pretend to have
gotten everyplace that could benefit, but it catches many of them. I made
an effort in particular to ensure that all uses of the same error message
text were updated together, so as not to increase the number of
translatable strings.
It's possible that this change will result in format-string warnings from
pre-C99 compilers. We might have to reconsider if there are any popular
compilers that will warn about this; but let's start by seeing what the
buildfarm thinks.
Andres Freund, with a little additional work by me
krb5 has been deprecated since 8.3, and the recommended way to do
Kerberos authentication is using the GSSAPI authentication method
(which is still fully supported).
libpq retains the ability to identify krb5 authentication, but only
gives an error message about it being unsupported. Since all authentication
is initiated from the backend, there is no need to keep it at all
in the backend.
Defining this symbol causes OS X 10.5 to use a buggy version of readdir(),
which can sometimes fail with EINVAL if the previously-fetched directory
entry has been deleted or renamed. In later OS X versions that bug has
been repaired, but we still don't need the #define because it's on by
default. So this is just an all-around bad idea, and we can do without it.
Remove the use of the following macros, which are obsolescent according
to the Autoconf documentation:
- AC_C_CONST
- AC_C_STRINGIZE
- AC_C_VOLATILE
- AC_FUNC_MEMCMP
asprintf(), aside from not being particularly portable, has a fundamentally
badly-designed API; the psprintf() function that was added in passing in
the previous patch has a much better API choice. Moreover, the NetBSD
implementation that was borrowed for the previous patch doesn't work with
non-C99-compliant vsnprintf, which is something we still have to cope with
on some platforms; and it depends on va_copy which isn't all that portable
either. Get rid of that code in favor of an implementation similar to what
we've used for many years in stringinfo.c. Also, move it into libpgcommon
since it's not really libpgport material.
I think this patch will be enough to turn the buildfarm green again, but
there's still cosmetic work left to do, namely get rid of pg_asprintf()
in favor of using psprintf(). That will come in a followon patch.
Add asprintf(), pg_asprintf(), and psprintf() to simplify string
allocation and composition. Replacement implementations taken from
NetBSD.
Reviewed-by: Álvaro Herrera <alvherre@2ndquadrant.com>
Reviewed-by: Asif Naeem <anaeem.it@gmail.com>
This reverts commit 269e780822
and commit 5b571bb8c8.
Unfortunately, the initial patch had insufficient performance testing,
and resulted in a regression.
Per report by Thom Brown.
This function is more efficient than actually writing out zeroes to
the new file, per microbenchmarks by Jon Nelson. Also, it may reduce
the likelihood of WAL file fragmentation.
Jon Nelson, with review by Andres Freund, Greg Smith and me.
In commit 71450d7fd6, we added code to inform
suitably-intelligent compilers that ereport() doesn't return if the elevel
is ERROR or higher. This patch extends that to elog(), and also fixes a
double-evaluation hazard that the previous commit created in ereport(),
as well as reducing the emitted code size.
The elog() improvement requires the compiler to support __VA_ARGS__, which
should be available in just about anything nowadays since it's required by
C99. But our minimum language baseline is still C89, so add a configure
test for that.
The previous commit assumed that ereport's elevel could be evaluated twice,
which isn't terribly safe --- there are already counterexamples in xlog.c.
On compilers that have __builtin_constant_p, we can use that to protect the
second test, since there's no possible optimization gain if the compiler
doesn't know the value of elevel. Otherwise, use a local variable inside
the macros to prevent double evaluation. The local-variable solution is
inferior because (a) it leads to useless code being emitted when elevel
isn't constant, and (b) it increases the optimization level needed for the
compiler to recognize that subsequent code is unreachable. But it seems
better than not teaching non-gcc compilers about unreachability at all.
Lastly, if the compiler has __builtin_unreachable(), we can use that
instead of abort(), resulting in a noticeable code savings since no
function call is actually emitted. However, it seems wise to do this only
in non-assert builds. In an assert build, continue to use abort(), so that
the behavior will be predictable and debuggable if the "impossible"
happens.
These changes involve making the ereport and elog macros emit do-while
statement blocks not just expressions, which forces small changes in
a few call sites.
Andres Freund, Tom Lane, Heikki Linnakangas
The former name was too likely to conflict with symbols from external
headers; and, as seen in recent buildfarm failures in member spoonbill,
it has now happened at least in plpython.
Get rid of the fundamentally indefensible assumption that "long long int"
exists and is exactly 64 bits wide on every platform Postgres runs on.
Instead let the configure script select the type to use for "pg_int64".
This is a bit of a pain in the rear since we do not want to pollute client
namespace with all the random symbols that pg_config.h defines; instead
we have to create a separate generated header file, "pg_config_ext.h".
But now that the infrastructure is there, we might have the ability to
add some other stuff that's long been wanting in this area.
Currently, the macros only work with fairly recent gcc versions, but there
is room to expand them to other compilers that have comparable features.
Heavily revised and autoconfiscated version of a patch by Andres Freund.
We previously supposed that any given platform would supply both or neither
of these functions, so that one configure test would be sufficient. It now
appears that at least on AIX this is not the case ... which is likely an
AIX bug, but nonetheless we need to cope with it. So use separate tests.
Per bug #6758; thanks to Andrew Hastie for doing the followup testing
needed to confirm what was happening.
Backpatch to 9.1, where we began using these functions.
We had put a test for libxml2's xmlStructuredErrorContext variable in
configure, but of course that doesn't work on Windows builds. The next
best alternative seems to be to test the LIBXML_VERSION symbol provided
by xmlversion.h.
Per report from Talha Bin Rizwan, though this fixes it in a different way
than his proposed patch.
Historically we have not worried about fsync'ing anything during initdb
(in fact, initdb intentionally passes -F to each backend launch to prevent
it from fsync'ing). But with filesystems getting more aggressive about
caching data, that's not such a good plan anymore. Make initdb do a pass
over the finished data directory tree to fsync everything. For testing
purposes, the -N/--nosync flag can be used to restore the old behavior.
Also, testing shows that on Linux, sync_file_range() is much faster than
posix_fadvise() for hinting to the kernel that an fsync is coming,
apparently because the latter blocks on a rather small request queue while
the former doesn't. So use this function if available in initdb, and also
in the backend's pg_flush_data() (where it currently will affect only the
speed of CREATE DATABASE's cloning step).
We will later make pg_regress invoke initdb with the --nosync flag
to avoid slowing down cases such as "make check" in contrib. But
let's not do so until we've shaken out any portability issues in this
patch.
Jeff Davis, reviewed by Andres Freund
All Unix-oid platforms that we currently support should have waitpid(),
since it's in V2 of the Single Unix Spec. Our git history shows that
the wait3 code was added to support NextStep, which we officially dropped
support for as of 9.2. So get rid of the configure test, and simplify the
macro spaghetti in reaper(). Per suggestion from Fujii Masao.
The BSD-ish members of the buildfarm all seem to think removing this
was a bad idea. It looks to me like it resulted in omitting the system
header inclusion necessary to detect the fields of struct tm correctly.
ENABLE_DTRACE unused as of a7b7b07af3
HAVE_ERR_SET_MARK unused as of 4ed4b6c54e
HAVE_FCVT unused as of 4553e1d80f
HAVE_STRUCT_SOCKADDR_UN unused as of b4cea00a1f
HAVE_SYSCONF unused as of f83356c7f5
TM_IN_SYS_TIME never used, obsolescent per Autoconf documentation
In the Fedora variant of MinGW, the openssl libraries have their normal
names, not libeay32 and libssleay32. Adjust configure probes to allow
that, per bug #6486.
Tomasz Ostrowski
The immediate impetus for this is that Noah Misch's patch to elide
unnecessary table and index rebuilds when changing typmod for temporal
types uses it; and this is extracted from that patch, with some
further commentary by me. But it seems logically separate from the
remainder of the patch, so I'm committing it separately; this is not
the first time someone has wanted fls() in the backend and probably
won't be the last.
If we end up using this in more performance-critical spots it may be
worthwhile to add some architecture-specific optimizations to our
src/port version of fls() - e.g. any x86 platform can implement this
using the assembly instruction BSRL. But performance won't matter
a bit for assessing typmod changes, so I'm not worried about that
right now.
Historically we've used the SWPB instruction for TAS() on ARM, but this
is deprecated and not available on ARMv6 and later. Instead, make use
of a GCC builtin if available. We'll still fall back to SWPB if not,
so as not to break existing ports using older GCC versions.
Eventually we might want to try using __sync_lock_test_and_set() on some
other architectures too, but for now that seems to present only risk and
not reward.
Back-patch to all supported versions, since people might want to use any
of them on more recent ARM chips.
Martin Pitt
The hint bit makes for a small but measurable performance improvement
in access to contended spinlocks.
On the other hand, some PPC chips give an illegal-instruction failure.
There doesn't seem to be a completely bulletproof way to tell whether the
hint bit will cause an illegal-instruction failure other than by trying
it; but most if not all 64-bit PPC machines should accept it, so follow
the Linux kernel's lead and assume it's okay to use it in 64-bit builds.
Of course we must also check whether the assembler accepts the command,
since even with a recent CPU the toolchain could be old.
Patch by Manabu Ori, significantly modified by me.
All supported platforms support the C89 standard function atexit()
(SunOS 4 probably being the last one not to), and supporting both
makes the code clumsy.
Original patch by Lars Kanis, reviewed by Nishiyama Tomoaki and tweaked some by me.
This compiler, or at least the latest version of it, is currently broken, and
only passes the regression tests if built with -O0.
glibc renders random() thread-safe by wrapping a futex lock around it;
testing reveals that this limits the performance of pgbench on machines
with many CPU cores. Rather than switching to random_r(), which is
only available on GNU systems and crashes unless you use undocumented
alchemy to initialize the random state properly, switch to our built-in
implementation of erand48(), which is both thread-safe and concurrent.
Since the list of reasons not to use the operating system's erand48()
is getting rather long, rename ours to pg_erand48() (and similarly
for our implementations of lrand48() and srand48()) and just always
use those. We were already doing this on Cygwin anyway, and the
glibc implementation is not quite thread-safe, so pgbench wouldn't
be able to use that either.
Per discussion with Tom Lane.
libxml reports some errors (like invalid xmlns attributes) via the error
handler hook, but still returns a success indicator to the library caller.
This causes us to miss some errors that are important to report. Since the
"generic" error handler hook doesn't know whether the message it's getting
is for an error, warning, or notice, stop using that and instead start
using the "structured" error handler hook, which gets enough information
to be useful.
While at it, arrange to save and restore the error handler hook setting in
each libxml-using function, rather than assuming we can set and forget the
hook. This should improve the odds of working nicely with third-party
libraries that also use libxml.
In passing, volatile-ize some local variables that get modified within
PG_TRY blocks. I noticed this while testing with an older gcc version
than I'd previously tried to compile xml.c with.
Florian Pflug and Tom Lane, with extensive review/testing by Noah Misch
Flexible array members are a C99 feature that avoids "cheating" in the
declaration of variable-length arrays at the end of structs. With
Autoconf support, this should be transparent for older compilers.
We start with one use in gist.h because gcc 4.6 started to raise a
warning there. Over time, it can be expanded to other places in the
source, but they will likely need some review of sizeof and offsetof
usage. The current change in gist.h appears to be safe in this
regard.
It turns out the reason we hadn't found out about the portability issues
with our credential-control-message code is that almost no modern platforms
use that code at all; the ones that used to need it now offer getpeereid(),
which we choose first. The last holdout was NetBSD, and they added
getpeereid() as of 5.0. So far as I can tell, the only live platform on
which that code was being exercised was Debian/kFreeBSD, ie, FreeBSD kernel
with Linux userland --- since glibc doesn't provide getpeereid(), we fell
back to the control message code. However, the FreeBSD kernel provides a
LOCAL_PEERCRED socket parameter that's functionally equivalent to Linux's
SO_PEERCRED. That is both much simpler to use than control messages, and
superior because it doesn't require receiving a message from the other end
at just the right time.
Therefore, add code to use LOCAL_PEERCRED when necessary, and rip out all
the credential-control-message code in the backend. (libpq still has such
code so that it can still talk to pre-9.1 servers ... but eventually we can
get rid of it there too.) Clean up related autoconf probes, too.
This means that libpq's requirepeer parameter now works on exactly the same
platforms where the backend supports peer authentication, so adjust the
documentation accordingly.
These functions should take a pg_locale_t, not a collation OID, and should
call mbstowcs_l/wcstombs_l where available. Where those functions are not
available, temporarily select the correct locale with uselocale().
This change removes the bogus assumption that all locales selectable in
a given database have the same wide-character conversion method; in
particular, the collate.linux.utf8 regression test now passes with
LC_CTYPE=C, so long as the database encoding is UTF8.
I decided to move the char2wchar/wchar2char functions out of mbutils.c and
into pg_locale.c, because they work on wchar_t not pg_wchar_t and thus
don't really belong with the mbutils.c functions. Keeping them where they
were would have required importing pg_locale_t into pg_wchar.h somehow,
which did not seem like a good plan.
This adds collation support for columns and domains, a COLLATE clause
to override it per expression, and B-tree index support.
Peter Eisentraut
reviewed by Pavel Stehule, Itagaki Takahiro, Robert Haas, Noah Misch
Synchronize pg_config.h.in with configure.in (someone must have
forgotten to run autoheader or autoreconf), and clean up some spurious
change in configure introduced by the last commit there.
compilers, by applying a configure check to see if the compiler will accept
an unreferenced "static inline foo ..." function without warnings. It is
believed that such warnings are the only reason not to declare inlined
functions in headers, if the compiler understands "inline" at all.
Kurt Harriman
provide a working 64-bit integer datatype. As recently noted, we've been
broken on such platforms since early in the 8.4 development cycle. Since
it took nearly two years for anyone to even notice, it seems that the
rationale for continuing to support such platforms has reached the point
of non-existence. Rather than thrashing around to try to make it work
again, we'll just admit up front that this no longer works.
Back-patch to 8.4 since that branch is also broken.
We should go around to remove INT64_IS_BUSTED support, but just in HEAD,
so that seems like material for a separate commit.
This is more in keeping with modern practice, and is a first step towards
porting to Win64 (which has sizeof(pointer) > sizeof(long)).
Tsutomu Yamada, Magnus Hagander, Tom Lane
append_history(), if libreadline is new enough to have those functions
(they seem to be present at least since 4.2; but libedit may not have them).
This gives significantly saner behavior when two or more sessions overlap in
their use of the history file; although having two sessions exit at just the
same time is still perilous to your history. The behavior of \s remains
unchanged, ie, overwrite whatever was there.
Per bug #5052 from Marek Wójtowicz.
and extend configure to test for it properly instead of hard-wiring
an assumption that everybody but Windows has the rand48 functions.
(We do cheat to the extent of assuming that probing for erand48 will do
for the entire rand48 family.)
erand48() is unused as of this commit, but a followon patch will cause
GEQO to depend on it.
Andres Freund, additional hacking by Tom
This upgrades the configure infrastructure to the latest Autoconf version.
Some notable news are:
- The workaround for the broken fseeko() test is gone.
- Checking for unknown options is now provided by Autoconf itself.
- Fixes for Mac OS X
we can get some buildfarm feedback about whether that function is still
problematic. (Note that the planned async-preread patch will not really
prove anything one way or the other in buildfarm testing, since it will
be inactive with default GUC settings.)
the * character at the beginning of a pattern, and it does not match
subdomains.
Since this means we no longer need fnmatch, remove the imported implementation
from port, along with the autoconf check for it.
let XLOG_BLCKSZ and XLOG_SEG_SIZE be set via configure. Per a proposal by
Mark Wong, though I thought it better to call the switches after "wal" rather
than "xlog".
support for a nonsegmented mode from md.c. Per recent discussions, there
doesn't seem to be much value in a "never segment" option as opposed to
segmenting with a suitably large segment size. So instead provide a
configure-time switch to set the desired segment size in units of gigabytes.
While at it, expose a configure switch for BLCKSZ as well.
Zdenek Kotala
where Datum is 8 bytes wide. Since this will break old-style C functions
(those still using version 0 calling convention) that have arguments or
results of these types, provide a configure option to disable it and retain
the old pass-by-reference behavior. Likewise, provide a configure option
to disable the recently-committed float4 pass-by-value change.
Zoltan Boszormenyi, plus configurability stuff by me.
than dividing them into 1GB segments as has been our longtime practice. This
requires working support for large files in the operating system; at least for
the time being, it won't be the default.
Zdenek Kotala
- Change configure.in to use Autoconf 2.61 and update generated files.
- Update build system and documentation to support now directory variables
offered by Autoconf 2.61.
- Replace usages of PGAC_CHECK_ALIGNOF by AC_CHECK_ALIGNOF, now available
in Autoconf 2.61.
- Drop our patched version of AC_C_INLINE, as Autoconf now has the change.
OpenSSL libraries --- just don't call them if they're not there. This
might possibly lead to misleading error messages, but we'll just have
to live with that.