The Perl documentation is very clear that stuff calling libperl should
be built with the compiler switches shown by Perl's $Config{ccflags}.
We'd been ignoring that up to now, and mostly getting away with it,
but recent Perl versions contain ABI compatibility cross-checks that
fail on some builds because of this omission. In particular the
sizeof(PerlInterpreter) can come out different due to some fields being
added or removed; which means we have a live ABI hazard that we'd better
fix rather than continuing to sweep it under the rug.
However, it still seems like a bad idea to just absorb $Config{ccflags}
verbatim. In some environments Perl was built with a different compiler
that doesn't even use the same switch syntax. -D switch syntax is pretty
universal though, and absorbing Perl's -D switches really ought to be
enough to fix the problem.
Furthermore, Perl likes to inject stuff like -D_LARGEFILE_SOURCE and
-D_FILE_OFFSET_BITS=64 into $Config{ccflags}, which affect libc ABIs on
platforms where they're relevant. Adopting those seems dangerous too.
It's unclear whether a build wherein Perl and Postgres have different ideas
of sizeof(off_t) etc would work, or whether anyone would care about making
it work. But it's dead certain that having different stdio ABIs in
core Postgres and PL/Perl will not work; we've seen that movie before.
Therefore, let's also ignore -D switches for symbols beginning with
underscore. The symbols that we actually need to import should be the ones
mentioned in perl.h's PL_bincompat_options stanza, and none of those start
with underscore, so this seems likely to work. (If it turns out not to
work everywhere, we could consider intersecting the symbols mentioned in
PL_bincompat_options with the -D switches. But that will be much more
complicated, so let's try this way first.)
This will need to be back-patched, but first let's see what the
buildfarm makes of it.
Ashutosh Sharma, some adjustments by me
Discussion: https://postgr.es/m/CANFyU97OVQ3+Mzfmt3MhuUm5NwPU=-FtbNH5Eb7nZL9ua8=rcA@mail.gmail.com
Previously, UNBOUNDED meant no lower bound when used in the FROM list,
and no upper bound when used in the TO list, which was OK for
single-column range partitioning, but problematic with multiple
columns. For example, an upper bound of (10.0, UNBOUNDED) would not be
collocated with a lower bound of (10.0, UNBOUNDED), thus making it
difficult or impossible to define contiguous multi-column range
partitions in some cases.
Fix this by using MINVALUE and MAXVALUE instead of UNBOUNDED to
represent a partition column that is unbounded below or above
respectively. This syntax removes any ambiguity, and ensures that if
one partition's lower bound equals another partition's upper bound,
then the partitions are contiguous.
Also drop the constraint prohibiting finite values after an unbounded
column, and just document the fact that any values after MINVALUE or
MAXVALUE are ignored. Previously it was necessary to repeat UNBOUNDED
multiple times, which was needlessly verbose.
Note: Forces a post-PG 10 beta2 initdb.
Report by Amul Sul, original patch by Amit Langote with some
additional hacking by me.
Discussion: https://postgr.es/m/CAAJ_b947mowpLdxL3jo3YLKngRjrq9+Ej4ymduQTfYR+8=YAYQ@mail.gmail.com
Doing so was useful in 273c458a2b but
became obsolete when 818fd4a67d caused
postgres.exe to provide the relevant symbols. No other loadable module
links to libpgcommon directly.
In the frontend Makefiles that pull in libpgfeutils, we'd generally
done it like this:
LDFLAGS += -L$(top_builddir)/src/fe_utils -lpgfeutils $(libpq_pgport)
That method is badly broken, as seen in bug #14742 from Chris Ruprecht.
The -L flag for src/fe_utils ends up being placed after whatever random
-L flags are in LDFLAGS already. That puts us at risk of pulling in
libpgfeutils.a from some previous installation rather than the freshly
built one in src/fe_utils. Also, the lack of an "override" is hazardous
if someone tries to specify some LDFLAGS on the make command line.
The correct way to do it is like this:
override LDFLAGS := -L$(top_builddir)/src/fe_utils -lpgfeutils $(libpq_pgport) $(LDFLAGS)
so that libpgfeutils, along with libpq, libpgport, and libpgcommon, are
guaranteed to be pulled in from the build tree and not from any referenced
system directory, because their -L flags will appear first.
In some places we'd been even lazier and done it like this:
LDFLAGS += -L$(top_builddir)/src/fe_utils -lpgfeutils -lpq
which is subtly wrong in an additional way: on platforms where we can't
restrict the symbols exported by libpq.so, it allows libpgfeutils to
latch onto libpgport and libpgcommon symbols from libpq.so, rather than
directly from those static libraries as intended. This carries hazards
like those explained in the comments for the libpq_pgport macro.
In addition to fixing the broken libpgfeutils usages, I tried to
standardize on using $(libpq_pgport) like so:
override LDFLAGS := $(libpq_pgport) $(LDFLAGS)
even where libpgfeutils is not in the picture. This makes no difference
right now but will hopefully discourage future mistakes of the same ilk.
And it's more like the way we handle CPPFLAGS in libpq-using Makefiles.
In passing, just for consistency, make pgbench include PTHREAD_LIBS the
same way everyplace else does, ie just after LIBS rather than in some
random place in the command line. This might have practical effect if
there are -L switches in that macro on some platform.
It looks to me like the MSVC build scripts are not affected by this
error, but someone more familiar with them than I might want to double
check.
Back-patch to 9.6 where libpgfeutils was introduced. In 9.6, the hazard
this error creates is that a reinstallation might link to the prior
installation's copy of libpgfeutils.a and thereby fail to absorb a
minor-version bug fix.
Discussion: https://postgr.es/m/20170714125106.9231.13772@wrigleys.postgresql.org
Traditionally, "pg_ctl start -w" has waited for the server to become
ready to accept connections by attempting a connection once per second.
That has the major problem that connection issues (for instance, a
kernel packet filter blocking traffic) can't be reliably told apart
from server startup issues, and the minor problem that if server startup
isn't quick, we accumulate "the database system is starting up" spam
in the server log. We've hacked around many of the possible connection
issues, but it resulted in ugly and complicated code in pg_ctl.c.
In commit c61559ec3, I changed the probe rate to every tenth of a second.
That prompted Jeff Janes to complain that the log-spam problem had become
much worse. In the ensuing discussion, Andres Freund pointed out that
we could dispense with connection attempts altogether if the postmaster
were changed to report its status in postmaster.pid, which "pg_ctl start"
already relies on being able to read. This patch implements that, teaching
postmaster.c to report a status string into the pidfile at the same
state-change points already identified as being of interest for systemd
status reporting (cf commit 7d17e683f). pg_ctl no longer needs to link
with libpq at all; all its functions now depend on reading server files.
In support of this, teach AddToDataDirLockFile() to allow addition of
postmaster.pid lines in not-necessarily-sequential order. This is needed
on Windows where the SHMEM_KEY line will never be written at all. We still
have the restriction that we don't want to truncate the pidfile; document
the reasons for that a bit better.
Also, fix the pg_ctl TAP tests so they'll notice if "start -w" mode
is broken --- before, they'd just wait out the sixty seconds until
the loop gives up, and then report success anyway. (Yes, I found that
out the hard way.)
While at it, arrange for pg_ctl to not need to #include miscadmin.h;
as a rather low-level backend header, requiring that to be compilable
client-side is pretty dubious. This requires moving the #define's
associated with the pidfile into a new header file, and moving
PG_BACKEND_VERSIONSTR someplace else. For lack of a clearly better
"someplace else", I put it into port.h, beside the declaration of
find_other_exec(), since most users of that macro are passing the value to
find_other_exec(). (initdb still depends on miscadmin.h, but at least
pg_ctl and pg_upgrade no longer do.)
In passing, fix main.c so that PG_BACKEND_VERSIONSTR actually defines the
output of "postgres -V", which remarkably it had never done before.
Discussion: https://postgr.es/m/CAMkU=1xJW8e+CTotojOMBd-yzUvD0e_JZu2xHo=MnuZ4__m7Pg@mail.gmail.com
Don't move parenthesized lines to the left, even if that means they
flow past the right margin.
By default, BSD indent lines up statement continuation lines that are
within parentheses so that they start just to the right of the preceding
left parenthesis. However, traditionally, if that resulted in the
continuation line extending to the right of the desired right margin,
then indent would push it left just far enough to not overrun the margin,
if it could do so without making the continuation line start to the left of
the current statement indent. That makes for a weird mix of indentations
unless one has been completely rigid about never violating the 80-column
limit.
This behavior has been pretty universally panned by Postgres developers.
Hence, disable it with indent's new -lpl switch, so that parenthesized
lines are always lined up with the preceding left paren.
This patch is much less interesting than the first round of indent
changes, but also bulkier, so I thought it best to separate the effects.
Discussion: https://postgr.es/m/E1dAmxK-0006EE-1r@gemulon.postgresql.org
Discussion: https://postgr.es/m/30527.1495162840@sss.pgh.pa.us
Change pg_bsd_indent to follow upstream rules for placement of comments
to the right of code, and remove pgindent hack that caused comments
following #endif to not obey the general rule.
Commit e3860ffa4d wasn't actually using
the published version of pg_bsd_indent, but a hacked-up version that
tried to minimize the amount of movement of comments to the right of
code. The situation of interest is where such a comment has to be
moved to the right of its default placement at column 33 because there's
code there. BSD indent has always moved right in units of tab stops
in such cases --- but in the previous incarnation, indent was working
in 8-space tab stops, while now it knows we use 4-space tabs. So the
net result is that in about half the cases, such comments are placed
one tab stop left of before. This is better all around: it leaves
more room on the line for comment text, and it means that in such
cases the comment uniformly starts at the next 4-space tab stop after
the code, rather than sometimes one and sometimes two tabs after.
Also, ensure that comments following #endif are indented the same
as comments following other preprocessor commands such as #else.
That inconsistency turns out to have been self-inflicted damage
from a poorly-thought-through post-indent "fixup" in pgindent.
This patch is much less interesting than the first round of indent
changes, but also bulkier, so I thought it best to separate the effects.
Discussion: https://postgr.es/m/E1dAmxK-0006EE-1r@gemulon.postgresql.org
Discussion: https://postgr.es/m/30527.1495162840@sss.pgh.pa.us
The new indent version includes numerous fixes thanks to Piotr Stefaniak.
The main changes visible in this commit are:
* Nicer formatting of function-pointer declarations.
* No longer unexpectedly removes spaces in expressions using casts,
sizeof, or offsetof.
* No longer wants to add a space in "struct structname *varname", as
well as some similar cases for const- or volatile-qualified pointers.
* Declarations using PG_USED_FOR_ASSERTS_ONLY are formatted more nicely.
* Fixes bug where comments following declarations were sometimes placed
with no space separating them from the code.
* Fixes some odd decisions for comments following case labels.
* Fixes some cases where comments following code were indented to less
than the expected column 33.
On the less good side, it now tends to put more whitespace around typedef
names that are not listed in typedefs.list. This might encourage us to
put more effort into typedef name collection; it's not really a bug in
indent itself.
There are more changes coming after this round, having to do with comment
indentation and alignment of lines appearing within parentheses. I wanted
to limit the size of the diffs to something that could be reviewed without
one's eyes completely glazing over, so it seemed better to split up the
changes as much as practical.
Discussion: https://postgr.es/m/E1dAmxK-0006EE-1r@gemulon.postgresql.org
Discussion: https://postgr.es/m/30527.1495162840@sss.pgh.pa.us
Update version-checking code and list of switches. Delete obsolete
quasi-support for using GNU indent. Remove a lot of no-longer-needed
workarounds for bugs of the old version, and improve comments for
the hacks that remain. Update run_build() subroutine to fetch the
pg_bsd_indent code from the newly established git repo for it.
In passing, fix pgindent to not overwrite files that require no changes;
this makes it a bit more friendly to run on a built tree.
Adjust relevant documentation.
Remove indent.bsd.patch; it's not relevant anymore (and was obsolete
long ago anyway). Likewise remove pgcppindent, since we're no longer
in the business of shipping C++ code.
Piotr Stefaniak is responsible for most of the algorithmic changes
to the pgindent script; I did the rest.
Discussion: https://postgr.es/m/E1dAmxK-0006EE-1r@gemulon.postgresql.org
Discussion: https://postgr.es/m/30527.1495162840@sss.pgh.pa.us
If a .c or .h file corresponds to a .y or .l file, skip indenting it.
There's no point in reindenting derived files, and these files tend to
confuse pgindent. (Which probably indicates a bug in BSD indent, but
I can't get excited about trying to fix it.)
For the same reasons, add src/backend/utils/fmgrtab.c to the set of
files excluded by src/tools/pgindent/exclude_file_patterns.
The point of doing this is that it makes it safe to run pgindent over
the tree without doing "make maintainer-clean" first. While these are
not the only derived .c/.h files in the tree, they are the only ones
pgindent fails on. Removing that prerequisite step results in one less
way to mess up a pgindent run, and it's necessary if we ever hope to get
to the ease of running pgindent via "make indent".
Some openssl builds put their lib files in a VC subdirectory, others do
not. Cater for both cases.
Backpatch to all live branches.
From an offline discussion with Leonardo Cecchi.
On MSVC builds and on back branches that means removing the hardcoded
--verbose setting. On master for Unix that means removing the empty
setting in the global Makefile so that the value can be acquired from
the environment as well as from the make arguments.
Backpatch to 9.4 where we introduced TAP tests
On Unix this path is detected via the use of xml2-config, but that's not
available on Windows. This means that users building with libxml2 will
no longer need to move things around from the standard libxml2
installation for MSVC builds.
Backpatch to all live branches.
Somehow, we'd missed ever doing this. The consequences aren't too
severe: basically, the timezone library would fall back on its hardwired
notion of the DST transition dates to use for a POSIX-style zone name,
rather than obeying US/Eastern which is the intended behavior. The net
effect would only be to obey current US DST law further back than it
ought to apply; so it's not real surprising that nobody noticed.
David Rowley, per report from Amit Kapila
Discussion: https://postgr.es/m/CAA4eK1LC7CaNhRAQ__C3ht1JVrPzaAXXhEJRnR5L6bfYHiLmWw@mail.gmail.com
Commit eaba54c20c added support for Tcl 8.6 for configure-supported
platforms after verifying that pltcl works without further changes, but
the MSVC tooling wasn't updated accordingly. Update MSVC to match,
restructuring the code to avoid duplicating the logic for every Tcl
version supported.
Backpatch to all live branches, like eaba54c20c. In 9.4 and previous,
change the patch to use backslashes rather than forward, as in the rest
of the file.
Reported by Paresh More, who also tested the patch I provided.
Discussion: https://postgr.es/m/CAAgiCNGVw3ssBtSi3ZNstrz5k00ax=UV+_ZEHUeW_LMSGL2sew@mail.gmail.com
Currently only provision for running the bin checks in a single step is
provided for. Now these tests can be run individually, as well as tests
in other locations (e.g. src.test/recover).
Also provide for suppressing unnecessary temp installs by setting the
NO_TEMP_INSTALL environment variable just as the Makefiles do.
Backpatch to 9.4.
Commit a4777f355 was a shade too mechanical: we don't want to override
MSVC's own definition of _MSC_VER, as that breaks tests on its numerical
value. Per buildfarm.
This used to mean "Visual C++ except in those parts where Borland C++
was supported where it meant one of those". Now that we don't support
Borland C++ anymore, simplify by using _MSC_VER which is the normal way
to detect Visual C++.
An important step of SASLprep normalization, is to convert the string to
Unicode normalization form NFKC. Unicode normalization requires a fairly
large table of character decompositions, which is generated from data
published by the Unicode consortium. The script to generate the table is
put in src/common/unicode, as well test code for the normalization.
A pre-generated version of the tables is included in src/include/common,
so you don't need the code in src/common/unicode to build PostgreSQL, only
if you wish to modify the normalization tables.
The SASLprep implementation depends on the UTF-8 functions from
src/backend/utils/mb/wchar.c. So to use it, you must also compile and link
that. That doesn't change anything for the current users of these
functions, the backend and libpq, as they both already link with wchar.o.
It would be good to move those functions into a separate file in
src/commmon, but I'll leave that for another day.
No documentation changes included, because there is no details on the
SCRAM mechanism in the docs anyway. An overview on that in the protocol
specification would probably be good, even though SCRAM is documented in
detail in RFC5802. I'll write that as a separate patch. An important thing
to mention there is that we apply SASLprep even on invalid UTF-8 strings,
to support other encodings.
Patch by Michael Paquier and me.
Discussion: https://www.postgresql.org/message-id/CAB7nPqSByyEmAVLtEf1KxTRh=PWNKiWKEKQR=e1yGehz=wbymQ@mail.gmail.com
When using integer timestamps, the interval-comparison functions tried
to compute the overall magnitude of an interval as an int64 number of
microseconds. As reported by Frazer McLean, this overflows for intervals
exceeding about 296000 years, which is bad since we nominally allow
intervals many times larger than that. That results in wrong comparison
results, and possibly in corrupted btree indexes for columns containing
such large interval values.
To fix, compute the magnitude as int128 instead. Although some compilers
have native support for int128 calculations, many don't, so create our
own support functions that can do 128-bit addition and multiplication
if the compiler support isn't there. These support functions are designed
with an eye to allowing the int128 code paths in numeric.c to be rewritten
for use on all platforms, although this patch doesn't do that, or even
provide all the int128 primitives that will be needed for it.
Back-patch as far as 9.4. Earlier releases did not guard against overflow
of interval values at all (commit 146604ec4 fixed that), so it seems not
very exciting to worry about overly-large intervals for them.
Before 9.6, we did not assume that unreferenced "static inline" functions
would not draw compiler warnings, so omit functions not directly referenced
by timestamp.c, the only present consumer of int128.h. (We could have
omitted these functions in HEAD too, but since they were written and
debugged on the way to the present patch, and they look likely to be needed
by numeric.c, let's keep them in HEAD.) I did not bother to try to prevent
such warnings in a --disable-integer-datetimes build, though.
Before 9.5, configure will never define HAVE_INT128, so the part of
int128.h that exploits a native int128 implementation is dead code in the
9.4 branch. I didn't bother to remove it, thinking that keeping the file
looking similar in different branches is more useful.
In HEAD only, add a simple test harness for int128.h in src/tools/.
In back branches, this does not change the float-timestamps code path.
That's not subject to the same kind of overflow risk, since it computes
the interval magnitude as float8. (No doubt, when this code was originally
written, overflow was disregarded for exactly that reason.) There is a
precision hazard instead :-(, but we'll avert our eyes from that question,
since no complaints have been reported and that code's deprecated anyway.
Kyotaro Horiguchi and Tom Lane
Discussion: https://postgr.es/m/1490104629.422698.918452336.26FA96B7@webmail.messagingengine.com
The previous change wanted to avoid modifying $_ in grep, but the code
just made the change in a local variable and then lost it. Rewrite the
code using a separate map and grep, which is clearer anyway.
Author: Dagfinn Ilmari Mannsåker <ilmari@ilmari.org>
Fix all perlcritic warnings of severity level 5, except in
src/backend/utils/Gen_dummy_probes.pl, which is automatically generated.
Reviewed-by: Dagfinn Ilmari Mannsåker <ilmari@ilmari.org>
Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
This replaces the old, recursive tree-walk based evaluation, with
non-recursive, opcode dispatch based, expression evaluation.
Projection is now implemented as part of expression evaluation.
This both leads to significant performance improvements, and makes
future just-in-time compilation of expressions easier.
The speed gains primarily come from:
- non-recursive implementation reduces stack usage / overhead
- simple sub-expressions are implemented with a single jump, without
function calls
- sharing some state between different sub-expressions
- reduced amount of indirect/hard to predict memory accesses by laying
out operation metadata sequentially; including the avoidance of
nearly all of the previously used linked lists
- more code has been moved to expression initialization, avoiding
constant re-checks at evaluation time
Future just-in-time compilation (JIT) has become easier, as
demonstrated by released patches intended to be merged in a later
release, for primarily two reasons: Firstly, due to a stricter split
between expression initialization and evaluation, less code has to be
handled by the JIT. Secondly, due to the non-recursive nature of the
generated "instructions", less performance-critical code-paths can
easily be shared between interpreted and compiled evaluation.
The new framework allows for significant future optimizations. E.g.:
- basic infrastructure for to later reduce the per executor-startup
overhead of expression evaluation, by caching state in prepared
statements. That'd be helpful in OLTPish scenarios where
initialization overhead is measurable.
- optimizing the generated "code". A number of proposals for potential
work has already been made.
- optimizing the interpreter. Similarly a number of proposals have
been made here too.
The move of logic into the expression initialization step leads to some
backward-incompatible changes:
- Function permission checks are now done during expression
initialization, whereas previously they were done during
execution. In edge cases this can lead to errors being raised that
previously wouldn't have been, e.g. a NULL array being coerced to a
different array type previously didn't perform checks.
- The set of domain constraints to be checked, is now evaluated once
during expression initialization, previously it was re-built
every time a domain check was evaluated. For normal queries this
doesn't change much, but e.g. for plpgsql functions, which caches
ExprStates, the old set could stick around longer. The behavior
around might still change.
Author: Andres Freund, with significant changes by Tom Lane,
changes by Heikki Linnakangas
Reviewed-By: Tom Lane, Heikki Linnakangas
Discussion: https://postgr.es/m/20161206034955.bh33paeralxbtluv@alap3.anarazel.de
Although it's reasonable to expect that most of these constants will
never change, that does not make it good programming style to hard-code
the value rather than using the RELKIND_FOO macros.
I think I've now gotten all the hard-coded references in C code.
Unfortunately there's no equally convenient way to parameterize
SQL files ...
Discussion: https://postgr.es/m/11145.1488931324@sss.pgh.pa.us
This is the beginning of a collection of SQL-callable functions to
verify the integrity of data files. For now it only contains code to
verify B-Tree indexes.
This adds two SQL-callable functions, validating B-Tree consistency to
a varying degree. Check the, extensive, docs for details.
The goal is to later extend the coverage of the module to further
access methods, possibly including the heap. Once checks for
additional access methods exist, we'll likely add some "dispatch"
functions that cover multiple access methods.
Author: Peter Geoghegan, editorialized by Andres Freund
Reviewed-By: Andres Freund, Tomas Vondra, Thomas Munro,
Anastasia Lubennikova, Robert Haas, Amit Langote
Discussion: CAM3SWZQzLMhMwmBqjzK+pRKXrNUZ4w90wYMUWfkeV8mZ3Debvw@mail.gmail.com
Like Gather, we spawn multiple workers and run the same plan in each
one; however, Gather Merge is used when each worker produces the same
output ordering and we want to preserve that output ordering while
merging together the streams of tuples from various workers. (In a
way, Gather Merge is like a hybrid of Gather and MergeAppend.)
This works out to a win if it saves us from having to perform an
expensive Sort. In cases where only a small amount of data would need
to be sorted, it may actually be faster to use a regular Gather node
and then sort the results afterward, because Gather Merge sometimes
needs to wait synchronously for tuples whereas a pure Gather generally
doesn't. But if this avoids an expensive sort then it's a win.
Rushabh Lathia, reviewed and tested by Amit Kapila, Thomas Munro,
and Neha Sharma, and reviewed and revised by me.
Discussion: http://postgr.es/m/CAGPqQf09oPX-cQRpBKS0Gq49Z+m6KBxgxd_p9gX8CKk_d75HoQ@mail.gmail.com
We customarily #include <netinet/in.h> before <arpa/inet.h>; according
to our git history (cf commit 527f8babc) there used to be platform(s)
where <arpa/inet.h> didn't compile otherwise. That's probably not
really an issue anymore, but since test_ifaddrs.c is the one and only
place in our code that's not following that rule, bring it into line.
Also remove #include <sys/socket.h>, as that's duplicative given that
libpq/ifaddr.h does so (via pqcomm.h).
In passing, add a .gitignore file so nobody accidentally commits the
test_ifaddrs executable, as I nearly did.
I see no particular need to back-patch this, as it's just neatnik-ism
considering we don't build test_ifaddrs by default, or even document
it anywhere.
This introduces a new generic SASL authentication method, similar to the
GSS and SSPI methods. The server first tells the client which SASL
authentication mechanism to use, and then the mechanism-specific SASL
messages are exchanged in AuthenticationSASLcontinue and PasswordMessage
messages. Only SCRAM-SHA-256 is supported at the moment, but this allows
adding more SASL mechanisms in the future, without changing the overall
protocol.
Support for channel binding, aka SCRAM-SHA-256-PLUS is left for later.
The SASLPrep algorithm, for pre-processing the password, is not yet
implemented. That could cause trouble, if you use a password with
non-ASCII characters, and a client library that does implement SASLprep.
That will hopefully be added later.
Authorization identities, as specified in the SCRAM-SHA-256 specification,
are ignored. SET SESSION AUTHORIZATION provides more or less the same
functionality, anyway.
If a user doesn't exist, perform a "mock" authentication, by constructing
an authentic-looking challenge on the fly. The challenge is derived from
a new system-wide random value, "mock authentication nonce", which is
created at initdb, and stored in the control file. We go through these
motions, in order to not give away the information on whether the user
exists, to unauthenticated users.
Bumps PG_CONTROL_VERSION, because of the new field in control file.
Patch by Michael Paquier and Heikki Linnakangas, reviewed at different
stages by Robert Haas, Stephen Frost, David Steele, Aleksander Alekseev,
and many others.
Discussion: https://www.postgresql.org/message-id/CAB7nPqRbR3GmFYdedCAhzukfKrgBLTLtMvENOmPrVWREsZkF8g%40mail.gmail.com
Discussion: https://www.postgresql.org/message-id/CAB7nPqSMXU35g%3DW9X74HVeQp0uvgJxvYOuA4A-A3M%2B0wfEBv-w%40mail.gmail.com
Discussion: https://www.postgresql.org/message-id/55192AFE.6080106@iki.fi
This way both frontend and backends can use them. The functions are taken
from pgcrypto, which now fetches the source files it needs from
src/common/.
A new interface is designed for the SHA2 functions, which allow linking
to either OpenSSL or the in-core stuff taken from KAME as needed.
Michael Paquier, reviewed by Robert Haas.
Discussion: https://www.postgresql.org/message-id/CAB7nPqTGKuTM5jiZriHrNaQeVqp5e_iT3X4BFLWY_HyHxLvySQ%40mail.gmail.com
This makes the handling of operators similar to that of functions and
aggregates.
Rename node FuncWithArgs to ObjectWithArgs, to reflect the expanded use.
Reviewed-by: Jim Nasby <Jim.Nasby@BlueTreble.com>
Reviewed-by: Michael Paquier <michael.paquier@gmail.com>
The new slab allocator needs different per-allocation information than
the classical aset.c. The definition in 58b25e981 wasn't sufficiently
careful on 32 platforms with 8 byte alignment, leading to buildfarm
failures. That's not entirely easy to fix by just adjusting the
definition.
As slab.c doesn't actually need the size part(s) of the common header,
all chunks are equally sized after all, it seems better to instead
reduce the header to the part needed by all allocators, namely which
context an allocation belongs to. That has the advantage of reducing
the overhead of slab allocations, and also allows for more flexibility
in future allocators.
To avoid spreading the logic about accessing a chunk's context around,
centralize it in GetMemoryChunkContext(), which allows to delete a
good number of lines.
A followup commit will revise the mmgr/README portion about
StandardChunkHeader, and more.
Author: Andres Freund
Discussion: https://postgr.es/m/20170228074420.aazv4iw6k562mnxg@alap3.anarazel.de
The default general purpose aset.c style memory context is not a great
choice for allocations that are all going to be evenly sized,
especially when those objects aren't small, and have varying
lifetimes. There tends to be a lot of fragmentation, larger
allocations always directly go to libc rather than have their cost
amortized over several pallocs.
These problems lead to the introduction of ad-hoc slab allocators in
reorderbuffer.c. But it turns out that the simplistic implementation
leads to problems when a lot of objects are allocated and freed, as
aset.c is still the underlying implementation. Especially freeing can
easily run into O(n^2) behavior in aset.c.
While the O(n^2) behavior in aset.c can, and probably will, be
addressed, custom allocators for this behavior are more efficient
both in space and time.
This allocator is for evenly sized allocations, and supports both
cheap allocations and freeing, without fragmenting significantly. It
does so by allocating evenly sized blocks via malloc(), and carves
them into chunks that can be used for allocations. In order to
release blocks to the OS as early as possible, chunks are allocated
from the fullest block that still has free objects, increasing the
likelihood of a block being entirely unused.
A subsequent commit uses this in reorderbuffer.c, but a further
allocator is needed to resolve the performance problems triggering
this work.
There likely are further potentialy uses of this allocator besides
reorderbuffer.c.
There's potential further optimizations of the new slab.c, in
particular the array of freelists could be replaced by a more
intelligent structure - but for now this looks more than good enough.
Author: Tomas Vondra, editorialized by Andres Freund
Reviewed-By: Andres Freund, Petr Jelinek, Robert Haas, Jim Nasby
Discussion: https://postgr.es/m/d15dff83-0b37-28ed-0809-95a5cc7292ad@2ndquadrant.com
Per discussion, the time has come to do this. The handwriting has been
on the wall at least since 9.0 that this would happen someday, whenever
it got to be too much of a burden to support the float-timestamp option.
The triggering factor now is the discovery that there are multiple bugs
in the code that attempts to implement use of integer timestamps in the
replication protocol even when the server is built for float timestamps.
The internal float timestamps leak into the protocol fields in places.
While we could fix the identified bugs, there's a very high risk of
introducing more. Trying to build a wall that would positively prevent
mixing integer and float timestamps is more complexity than we want to
undertake to maintain a long-deprecated option. The fact that these
bugs weren't found through testing also indicates a lack of interest
in float timestamps.
This commit disables configure's --disable-integer-datetimes switch
(it'll still accept --enable-integer-datetimes, though), removes direct
references to USE_INTEGER_DATETIMES, and removes discussion of float
timestamps from the user documentation. A considerable amount of code is
rendered dead by this, but removing that will occur as separate mop-up.
Discussion: https://postgr.es/m/26788.1487455319@sss.pgh.pa.us
It didn't take long at all for me to become irritated that the original
choice of name for this script resulted in "warning" showing up in several
places in build logs, because I tend to grep for that. Change the script
name to avoid that.
Versions of flex before 2.5.36 might generate code that results in an
"unused variable" warning, when using %option reentrant. Historically
we've worked around that by specifying -Wno-error, but that's an
unsatisfying solution. The official "fix" for this was just to insert a
dummy reference to the variable, so write a small perl script that edits
the generated C code similarly.
The MSVC side of this is untested, but the buildfarm should soon reveal
if I broke that.
Discussion: https://postgr.es/m/25456.1487437842@sss.pgh.pa.us
This isn't exposed to the optimizer or the executor yet; we'll add
support for those things in a separate patch. But this puts the
basic mechanism in place: several processes can attach to a parallel
btree index scan, and each one will get a subset of the tuples that
would have been produced by a non-parallel scan. Each index page
becomes the responsibility of a single worker, which then returns
all of the TIDs on that page.
Rahila Syed, Amit Kapila, Robert Haas, reviewed and tested by
Anastasia Lubennikova, Tushar Ahuja, and Haribabu Kommi.
This module was intended to ease migrations of applications that used
the pre-8.3 version of text search to the in-core version introduced
in that release. However, since all pre-8.3 releases of the database
have been out of support for more than 5 years at this point, we
expect that few people are depending on it at this point. If some
people still need it, nothing prevents it from being maintained as a
separate extension, outside of core.
Discussion: http://postgr.es/m/CA+Tgmob5R8aDHiFRTQsSJbT1oreKg2FOSBrC=2f4tqEH3dOMAg@mail.gmail.com
This patch doesn't actually make any index AM parallel-aware, but it
provides the necessary functions at the AM layer to do so.
Rahila Syed, Amit Kapila, Robert Haas
Gen_fmgrtab.pl creates a new file fmgrprotos.h, which contains
prototypes for all functions registered in pg_proc.h. This avoids
having to manually maintain these prototypes across a random variety of
header files. It also automatically enforces a correct function
signature, and since there are warnings about missing prototypes, it
will detect functions that are defined but not registered in
pg_proc.h (or otherwise used).
Reviewed-by: Pavel Stehule <pavel.stehule@gmail.com>
The different actions in pg_ctl had different defaults for -w and -W,
mostly for historical reasons. Most users will want the -w behavior, so
make that the default.
Remove the -w option in most example and test code, so avoid confusion
and reduce verbosity. pg_upgrade is not touched, so it can continue to
work with older installations.
Reviewed-by: Beena Emerson <memissemerson@gmail.com>
Reviewed-by: Ryan Murphy <ryanfmurphy@gmail.com>
Various example and test code used -m fast explicitly, but since it's
the default, this can be omitted now or should be replaced by a better
example.
pg_upgrade is not touched, so it can continue to operate with older
installations.
Table partitioning is like table inheritance and reuses much of the
existing infrastructure, but there are some important differences.
The parent is called a partitioned table and is always empty; it may
not have indexes or non-inherited constraints, since those make no
sense for a relation with no data of its own. The children are called
partitions and contain all of the actual data. Each partition has an
implicit partitioning constraint. Multiple inheritance is not
allowed, and partitioning and inheritance can't be mixed. Partitions
can't have extra columns and may not allow nulls unless the parent
does. Tuples inserted into the parent are automatically routed to the
correct partition, so tuple-routing ON INSERT triggers are not needed.
Tuple routing isn't yet supported for partitions which are foreign
tables, and it doesn't handle updates that cross partition boundaries.
Currently, tables can be range-partitioned or list-partitioned. List
partitioning is limited to a single column, but range partitioning can
involve multiple columns. A partitioning "column" can be an
expression.
Because table partitioning is less general than table inheritance, it
is hoped that it will be easier to reason about properties of
partitions, and therefore that this will serve as a better foundation
for a variety of possible optimizations, including query planner
optimizations. The tuple routing based which this patch does based on
the implicit partitioning constraints is an example of this, but it
seems likely that many other useful optimizations are also possible.
Amit Langote, reviewed and tested by Robert Haas, Ashutosh Bapat,
Amit Kapila, Rajkumar Raghuwanshi, Corey Huinker, Jaime Casanova,
Rushabh Lathia, Erik Rijkers, among others. Minor revisions by me.
This adds a new routine, pg_strong_random() for generating random bytes,
for use in both frontend and backend. At the moment, it's only used in
the backend, but the upcoming SCRAM authentication patches need strong
random numbers in libpq as well.
pg_strong_random() is based on, and replaces, the existing implementation
in pgcrypto. It can acquire strong random numbers from a number of sources,
depending on what's available:
- OpenSSL RAND_bytes(), if built with OpenSSL
- On Windows, the native cryptographic functions are used
- /dev/urandom
Unlike the current pgcrypto function, the source is chosen by configure.
That makes it easier to test different implementations, and ensures that
we don't accidentally fall back to a less secure implementation, if the
primary source fails. All of those methods are quite reliable, it would be
pretty surprising for them to fail, so we'd rather find out by failing
hard.
If no strong random source is available, we fall back to using erand48(),
seeded from current timestamp, like PostmasterRandom() was. That isn't
cryptographically secure, but allows us to still work on platforms that
don't have any of the above stronger sources. Because it's not very secure,
the built-in implementation is only used if explicitly requested with
--disable-strong-random.
This replaces the more complicated Fortuna algorithm we used to have in
pgcrypto, which is unfortunate, but all modern platforms have /dev/urandom,
so it doesn't seem worth the maintenance effort to keep that. pgcrypto
functions that require strong random numbers will be disabled with
--disable-strong-random.
Original patch by Magnus Hagander, tons of further work by Michael Paquier
and me.
Discussion: https://www.postgresql.org/message-id/CAB7nPqRy3krN8quR9XujMVVHYtXJ0_60nqgVc6oUk8ygyVkZsA@mail.gmail.com
Discussion: https://www.postgresql.org/message-id/CAB7nPqRWkNYRRPJA7-cF+LfroYV10pvjdz6GNvxk-Eee9FypKA@mail.gmail.com
Programmers discovered decades ago that it was useful to have a simple
interface for allocating and freeing memory, which is why malloc() and
free() were invented. Unfortunately, those handy tools don't work
with dynamic shared memory segments because those are specific to
PostgreSQL and are not necessarily mapped at the same address in every
cooperating process. So invent our own allocator instead. This makes
it possible for processes cooperating as part of parallel query
execution to allocate and free chunks of memory without having to
reserve them prior to the start of execution. It could also be used
for longer lived objects; for example, we could consider storing data
for pg_stat_statements or the stats collector in shared memory using
these interfaces, rather than writing them to files. Basically,
anything that needs shared memory but can't predict in advance how
much it's going to need might find this useful.
Thomas Munro and Robert Haas. The original code (of mine) on which
Thomas based his work was actually designed to be a new backend-local
memory allocator for PostgreSQL, but that hasn't gone anywhere - or
not yet, anyway. Thomas took that work and performed major
refactoring and extensive modifications to make it work with dynamic
shared memory, including the addition of appropriate locking.
Discussion: CA+TgmobkeWptGwiNa+SGFWsTLzTzD-CeLz0KcE-y6LFgoUus4A@mail.gmail.com
Discussion: CAEepm=1z5WLuNoJ80PaCvz6EtG9dN0j-KuHcHtU6QEfcPP5-qA@mail.gmail.com
This is intended as infrastructure for a full-fledged allocator for
dynamic shared memory. The interface looks a bit like a real
allocator, but only supports allocating and freeing memory in
multiples of the 4kB page size. Further, to free memory, you must
know the size of the span you wish to free, in pages. While these are
make it unsuitable as an allocator in and of itself, it still serves
as very useful scaffolding for a full-fledged allocator.
Robert Haas and Thomas Munro. This code is mostly the same as my 2014
submission, but Thomas fixed quite a few bugs and made some changes to
the interface.
Discussion: CA+TgmobkeWptGwiNa+SGFWsTLzTzD-CeLz0KcE-y6LFgoUus4A@mail.gmail.com
Discussion: CAEepm=1z5WLuNoJ80PaCvz6EtG9dN0j-KuHcHtU6QEfcPP5-qA@mail.gmail.com
In each case, absence of a trailing newline would itself constitute a
PostgreSQL bug. Therefore, this slightly enhances the changed tests.
This works around a bug that last appeared in Perl 5.8.8, fixing
src/test/modules/test_pg_dump when run against that version. Commit
e7293e3271 worked around the bug, but the
subsequent addition of test_pg_dump introduced affected code. As that
commit had shown, slight increases in pattern complexity can suppress
the bug. This commit edits qr/foo$/m patterns too complex to encounter
the bug today, for style consistency and robustness against unrelated
pattern changes. Back-patch to 9.6, where test_pg_dump was introduced.
As of this writing, a fresh MSYS installation includes an affected Perl
5.8.8. The Perl 5.8.8 in Red Hat Enterprise Linux 5.11 carries a patch
that renders it unaffected, but the Perl 5.8.5 of Red Hat Enterprise
Linux 4.4 is affected.
This reverts commit 9e083fd468. That was a
few bricks shy of a load:
* Query cancel stopped working
* Buildfarm member pademelon stopped working, because the box doesn't have
/dev/urandom nor /dev/random.
This clearly needs some more discussion, and a quite different patch, so
revert for now.
This adds a new routine, pg_strong_random() for generating random bytes,
for use in both frontend and backend. At the moment, it's only used in
the backend, but the upcoming SCRAM authentication patches need strong
random numbers in libpq as well.
pg_strong_random() is based on, and replaces, the existing implementation
in pgcrypto. It can acquire strong random numbers from a number of sources,
depending on what's available:
- OpenSSL RAND_bytes(), if built with OpenSSL
- On Windows, the native cryptographic functions are used
- /dev/urandom
- /dev/random
Original patch by Magnus Hagander, with further work by Michael Paquier
and me.
Discussion: <CAB7nPqRy3krN8quR9XujMVVHYtXJ0_60nqgVc6oUk8ygyVkZsA@mail.gmail.com>
The more efficient hashtable speeds up hash-aggregations with more than
a few hundred groups significantly. Improvements of over 120% have been
measured.
Due to the the different hash table queries that not fully
determined (e.g. GROUP BY without ORDER BY) may change their result
order.
The conversion is largely straight-forward, except that, due to the
static element types of simplehash.h type hashes, the additional data
some users store in elements (e.g. the per-group working data for hash
aggregaters) is now stored in TupleHashEntryData->additional. The
meaning of BuildTupleHashTable's entrysize (renamed to additionalsize)
has been changed to only be about the additionally stored size. That
size is only used for the initial sizing of the hash-table.
Reviewed-By: Tomas Vondra
Discussion: <20160727004333.r3e2k2y6fvk2ntup@alap3.anarazel.de>
dynahash.c hash tables aren't quite fast enough for some
use-cases. There are several reasons for lacking performance:
- the use of chaining for collision handling makes them cache
inefficient, that's especially an issue when the tables get bigger.
- as the element sizes for dynahash are only determined at runtime,
offset computations are somewhat expensive
- hash and element comparisons are indirect function calls, causing
unnecessary pipeline stalls
- it's two level structure has some benefits (somewhat natural
partitioning), but increases the number of indirections
to fix several of these the hash tables have to be adjusted to the
individual use-case at compile-time. C unfortunately doesn't provide a
good way to do compile code generation (like e.g. c++'s templates for
all their weaknesses do). Thus the somewhat ugly approach taken here is
to allow for code generation using a macro-templatized header file,
which generates functions and types based on a prefix and other
parameters.
Later patches use this infrastructure to use such hash tables for
tidbitmap.c (bitmap scans) and execGrouping.c (hash aggregation,
...). In queries where these use up a large fraction of the time, this
has been measured to lead to performance improvements of over 100%.
There are other cases where this could be useful (e.g. catcache.c).
The hash table design chosen is a variant of linear open-addressing. The
biggest disadvantage of simple linear addressing schemes are highly
variable lookup times due to clustering, and deletions leaving a lot of
tombstones around. To address these issues a variant of "robin hood"
hashing is employed. Robin hood hashing optimizes chaining lengths by
moving elements close to their optimal bucket ("rich" elements), out of
the way if a to-be-inserted element is further away from its optimal
position (i.e. it's "poor"). While that can make insertions slower, the
average lookup performance is a lot better, and higher fill factors can
be used in a still performant manner. To avoid tombstones - which
normally solve the issue that a deleted node's presence is relevant to
determine whether a lookup needs to continue looking or is done -
buckets following a deleted element are shifted backwards, unless
they're empty or already at their optimal position.
There's further possible improvements that can be made to this
implementation. Amongst others:
- Use distance as a termination criteria during searches. This is
generally a good idea, but I've been able to see the overhead of
distance calculations in some cases.
- Consider combining the 'empty' status into the hashvalue, and enforce
storing the hashvalue. That could, in some cases, increase memory
density and remove a few instructions.
- Experiment further with the, very conservatively choosen, fillfactor.
- Make maximum size of hashtable configurable, to allow storing very
very large tables. That'd require 64bit hash values to be more common
than now, though.
- some smaller memcpy calls could be optimized to copy larger chunks
But since the new implementation is already considerably faster than
dynahash it seem sensible to start using it.
Reviewed-By: Tomas Vondra
Discussion: <20160727004333.r3e2k2y6fvk2ntup@alap3.anarazel.de>
Just turning the crank on the project started in commit d51924be8.
These cases turn out to be exact subsets of the boilerplate needed
for hstore_plpython.
Discussion: <2652.1475512158@sss.pgh.pa.us>
Previously, on most platforms, we allowed hstore_plpython's references
to hstore and plpython to be unresolved symbols at link time, trusting
the dynamic linker to resolve them when the module is loaded. This
has a number of problems, the worst being that the dynamic linker
does not know where the references come from and can do nothing but
fail if those other modules haven't been loaded. We've more or less
gotten away with that for the limited use-case of datatype transform
modules, but even there, it requires some awkward hacks, most recently
commit 83c249200.
Instead, let's not treat these references as linker-resolvable at all,
but use function pointers that are manually filled in by the module's
_PG_init function. There are few enough contact points that this
doesn't seem unmaintainable, at least for these use-cases. (Note that
the same technique wouldn't work at all for decoupling from libpython
itself, but fortunately that's just a standard shared library and can
be linked to normally.)
This is an initial patch that just converts hstore_plpython. If the
buildfarm doesn't find any fatal problems, I'll work on the other
transform modules soon.
Tom Lane, per an idea of Andres Freund's.
Discussion: <2652.1475512158@sss.pgh.pa.us>
We weren't terribly consistent about whether to call Apple's OS "OS X"
or "Mac OS X", and the former is probably confusing to people who aren't
Apple users. Now that Apple has rebranded it "macOS", follow their lead
to establish a consistent naming pattern. Also, avoid the use of the
ancient project name "Darwin", except as the port code name which does not
seem desirable to change. (In short, this patch touches documentation and
comments, but no actual code.)
I didn't touch contrib/start-scripts/osx/, either. I suspect those are
obsolete and due for a rewrite, anyway.
I dithered about whether to apply this edit to old release notes, but
those were responsible for quite a lot of the inconsistencies, so I ended
up changing them too. Anyway, Apple's being ahistorical about this,
so why shouldn't we be?
Solution.pm mistakenly believed that the xml option requires the xslt
option, when actually the dependency is the other way around; and it
believed that libxml requires libiconv, which is not necessarily so,
so we shouldn't enforce it here. Fix the option cross-checking logic.
Also, since AddProject already takes care of adding libxml and libxslt
include and library dependencies to every project, there's no need
for the custom code that did that in mkvcbuild. While at it, let's
handle the similar dependencies for uuid in a similar fashion.
Given the lack of field complaints about these overly strict build
dependency requirements, there seems no need for a back-patch.
Michael Paquier
Discussion: <CAB7nPqR0+gpu3mRQvFjf-V-bMxmiSJ6NpTg9_WzVDL+a31cV2g@mail.gmail.com>
Until now, it used the current working directory. This makes it safe
for simultaneous invocations of gendef.pl, with different target
directories, to run from a single current working directory, such as
$(top_srcdir). The MSVC build system will soon rely on this.
Christian Ullrich, reviewed by Michael Paquier.
Commit b2cbced9e instituted a policy of referring to the timezone database
as the "IANA timezone database" in our user-facing documentation.
Propagate that wording into a couple of places that were still using "zic"
to refer to the database, which is definitely not right (zic is the
compilation tool, not the data).
Back-patch, not because this is very important in itself, but because
we routinely cherry-pick updates to the tznames files and I don't want
to risk future merge failures.
When building libpq, ip.c and md5.c were symlinked or copied from
src/backend/libpq into src/interfaces/libpq, but now that we have a
directory specifically for routines that are shared between the server and
client binaries, src/common/, move them there.
Some routines in ip.c were only used in the backend. Keep those in
src/backend/libpq, but rename to ifaddr.c to avoid confusion with the file
that's now in common.
Fix the comment in src/common/Makefile to reflect how libpq actually links
those files.
There are two more files that libpq symlinks directly from src/backend:
encnames.c and wchar.c. I don't feel compelled to move those right now,
though.
Patch by Michael Paquier, with some changes by me.
Discussion: <69938195-9c76-8523-0af8-eb718ea5b36e@iki.fi>
Up to now we've manually adjusted these numbers in several different
Makefiles at the start of each development cycle. While that's not
much work, it's easily forgotten, so let's get rid of it by setting
the SO_MINOR_VERSION values directly from $(MAJORVERSION).
In the case of libpq, this dev cycle's value of SO_MINOR_VERSION happens
to be "10" anyway, so this switch is transparent. For ecpg's shared
libraries, this will result in skipping one or two minor version numbers
between v9.6 and v10, which seems like no big problem; and it was a bit
inconsistent that they didn't have equal minor version numbers anyway.
Discussion: <21969.1471287988@sss.pgh.pa.us>
Once upon a time, it made sense for the ecpg preprocessor to have its
own version number, because it used a manually-maintained grammar that
wasn't always in sync with the core grammar. But those days are
thankfully long gone, leaving only a maintenance nuisance behind.
Let's use the PG v10 version numbering changeover as an excuse to get
rid of the ecpg version number and just have ecpg identify itself by
PG_VERSION. From the user's standpoint, ecpg will go from "4.12" in
the 9.6 branch to "10" in the 10 branch, so there's no failure of
monotonicity.
Discussion: <1471332659.4410.67.camel@postgresql.org>
Prior to commit 7882c3b0b9, it was
possible to use LWLocks within DSM segments, but that commit broke
this use case by switching from a doubly linked list to a circular
linked list. Switch back, using a new bit of general infrastructure
for maintaining lists of PGPROCs.
Thomas Munro, reviewed by me.
This is a good bit more complicated than the average new-version stamping
commit, because it includes various adjustments in pursuit of changing
from three-part to two-part version numbers. It's likely some further
work will be needed around that change; but this is enough to get through
the regression tests, at least in Unix builds.
Peter Eisentraut and Tom Lane
Wrap the perltidy invocation into a shell script to reduce the risk of
copy-and-paste errors. Include removal of *.bak files in the script,
so they don't accidentally get committed. Improve the directions in
the README file.
Due to simplistic quoting and confusion of database names with conninfo
strings, roles with the CREATEDB or CREATEROLE option could escalate to
superuser privileges when a superuser next ran certain maintenance
commands. The new coding rule for PQconnectdbParams() calls, documented
at conninfo_array_parse(), is to pass expand_dbname=true and wrap
literal database names in a trivial connection string. Escape
zero-length values in appendConnStrVal(). Back-patch to 9.1 (all
supported versions).
Nathan Bossart, Michael Paquier, and Noah Misch. Reviewed by Peter
Eisentraut. Reported by Nathan Bossart.
Security: CVE-2016-5424
To ensure that "make installcheck" can be used safely against an existing
installation, we need to be careful about what global object names
(database, role, and tablespace names) we use; otherwise we might
accidentally clobber important objects. There's been a weak consensus that
test databases should have names including "regression", and that test role
names should start with "regress_", but we didn't have any particular rule
about tablespace names; and neither of the other rules was followed with
any consistency either.
This commit moves us a long way towards having a hard-and-fast rule that
regression test databases must have names including "regression", and that
test role and tablespace names must start with "regress_". It's not
completely there because I did not touch some test cases in rolenames.sql
that test creation of special role names like "session_user". That will
require some rethinking of exactly what we want to test, whereas the intent
of this patch is just to hit all the cases in which the needed renamings
are cosmetic.
There is no enforcement mechanism in this patch either, but if we don't
add one we can expect that the tests will soon be violating the convention
again. Again, that's not such a cosmetic change and it will require
discussion. (But I did use a quick-hack enforcement patch to find these
cases.)
Discussion: <16638.1468620817@sss.pgh.pa.us>
Change assorted places in our Perl code that did things like
system("prog $path/file");
to do it more like
system('prog', "$path/file");
which is safe against spaces and other special characters in the path
variable. The latter was already the prevailing style, but a few bits
of code hadn't gotten this memo. Back-patch to 9.4 as relevant.
Michael Paquier, Kyotaro Horiguchi
Discussion: <20160704.160213.111134711.horiguchi.kyotaro@lab.ntt.co.jp>
The new pg_check_visible() and pg_check_frozen() functions can be used to
verify that the visibility map bits for a relation's data pages match the
actual state of the tuples on those pages.
Amit Kapila and Robert Haas, reviewed (in earlier versions) by Andres
Freund. Additional testing help by Thomas Munro.
Mostly these are just comments but there are a few in documentation
and a handful in code and tests. Hopefully this doesn't cause too much
unnecessary pain for backpatching. I relented from some of the most
common like "thru" for that reason. The rest don't seem numerous
enough to cause problems.
Thanks to Kevin Lyda's tool https://pypi.python.org/pypi/misspellings
The test_pg_dump extension doesn't have a C component, so we need
to exclude it from the MSVC build system trying to figure out how
to build it.
Also add a "MODULES" line to the Makefile, as test_extensions has.
Might not be necessary, but seems good to keep things consistent.
Lastly, remove the 'installcheck' line from test_pg_dump, as that
was causing redefinition errors, at least on my box. This also
makes test_pg_dump consistent with how commit_ts is set up.
This time, use the buildfarm-supplied contents for this file, instead
of trying to update it by eyeballing the pgindent output.
Per discussion with Tom and Bruce.
This has the inverse effect of --master-only. It's needed to help find
cases where a commit should not be described in major release notes
because it was back-patched into older branches, though not at the same
time as the HEAD commit.
Adjust the way we detect the locale. As a result the minumum Windows
version supported by VS2015 and later is Windows Vista. Add some tweaks
to remove new compiler warnings. Remove documentation references to the
now obsolete msysGit.
Michael Paquier, somewhat edited by me, reviewed by Christian Ullrich.
Backpatch to 9.5
In addition to adding new typedefs, I also re-sorted the file so that
various entries add piecemeal, mostly or entirely by me, were alphabetized
the same way as other entries in the file.
In commit c0b050192, Andres introduced the idea of including one-line
commit references in our major release notes. Teach git_changelog to
emit a (lightly adapted) version of that format, so that we don't
have to laboriously add it to the notes after the fact. The default
output isn't changed, since I anticipate still using that for minor
release notes.
It was incorrectly declared as a volatile pointer to a non-volatile
structure. Eliminate the OldSnapshotControl struct definition; it
is really not needed. Pointed out by Tom Lane.
While at it, add OldSnapshotControlData to pgindent's list of
structures.
Pinning/Unpinning a buffer is a very frequent operation; especially in
read-mostly cache resident workloads. Benchmarking shows that in various
scenarios the spinlock protecting a buffer header's state becomes a
significant bottleneck. The problem can be reproduced with pgbench -S on
larger machines, but can be considerably worse for queries which touch
the same buffers over and over at a high frequency (e.g. nested loops
over a small inner table).
To allow atomic operations to be used, cram BufferDesc's flags,
usage_count, buf_hdr_lock, refcount into a single 32bit atomic variable;
that allows to manipulate them together using 32bit compare-and-swap
operations. This requires reducing MAX_BACKENDS to 2^18-1 (which could
be lifted by using a 64bit field, but it's not a realistic configuration
atm).
As not all operations can easily implemented in a lockfree manner,
implement the previous buf_hdr_lock via a flag bit in the atomic
variable. That way we can continue to lock the header in places where
it's needed, but can get away without acquiring it in the more frequent
hot-paths. There's some additional operations which can be done without
the lock, but aren't in this patch; but the most important places are
covered.
As bufmgr.c now essentially re-implements spinlocks, abstract the delay
logic from s_lock.c into something more generic. It now has already two
users, and more are coming up; there's a follupw patch for lwlock.c at
least.
This patch is based on a proof-of-concept written by me, which Alexander
Korotkov made into a fully working patch; the committed version is again
revised by me. Benchmarking and testing has, amongst others, been
provided by Dilip Kumar, Alexander Korotkov, Robert Haas.
On a large x86 system improvements for readonly pgbench, with a high
client count, of a factor of 8 have been observed.
Author: Alexander Korotkov and Andres Freund
Discussion: 2400449.GjM57CE0Yg@dinodell
The buildfarm showed failure for Windows MSVC builds due to this
omission. This might not be the only problem with the Makefile for
this feature, but hopefully this will get it past the immediate
problem.
Fix suggested by Tom Lane
Most of what is produced by the detailed verbosity level is of no
interest at all, so switch to the normal level for more usable output.
Christian Ullrich
Backpatch to all live branches
Previously synchronous replication offered only the ability to confirm
that all changes made by a transaction had been transferred to at most
one synchronous standby server.
This commit extends synchronous replication so that it supports multiple
synchronous standby servers. It enables users to consider one or more
standby servers as synchronous, and increase the level of transaction
durability by ensuring that transaction commits wait for replies from
all of those synchronous standbys.
Multiple synchronous standby servers are configured in
synchronous_standby_names which is extended to support new syntax of
'num_sync ( standby_name [ , ... ] )', where num_sync specifies
the number of synchronous standbys that transaction commits need to
wait for replies from and standby_name is the name of a standby
server.
The syntax of 'standby_name [ , ... ]' which was used in 9.5 or before
is also still supported. It's the same as new syntax with num_sync=1.
This commit doesn't include "quorum commit" feature which was discussed
in pgsql-hackers. Synchronous standbys are chosen based on their priorities.
synchronous_standby_names determines the priority of each standby for
being chosen as a synchronous standby. The standbys whose names appear
earlier in the list are given higher priority and will be considered as
synchronous. Other standby servers appearing later in this list
represent potential synchronous standbys.
The regression test for multiple synchronous standbys is not included
in this commit. It should come later.
Authors: Sawada Masahiko, Beena Emerson, Michael Paquier, Fujii Masao
Reviewed-By: Kyotaro Horiguchi, Amit Kapila, Robert Haas, Simon Riggs,
Amit Langote, Thomas Munro, Sameer Thakur, Suraj Kharage, Abhijit Menon-Sen,
Rajeev Rastogi
Many thanks to the various individuals who were involved in
discussing and developing this feature.
This introduces a new dependency type which marks an object as depending
on an extension, such that if the extension is dropped, the object
automatically goes away; and also, if the database is dumped, the object
is included in the dump output. Currently the grammar supports this for
indexes, triggers, materialized views and functions only, although the
utility code is generic so adding support for more object types is a
matter of touching the parser rules only.
Author: Abhijit Menon-Sen
Reviewed-by: Alexander Korotkov, Álvaro Herrera
Discussion: http://www.postgresql.org/message-id/20160115062649.GA5068@toroid.org
We don't really want to encourage people to write numeric SQLSTATEs in
programs; that's unreadable and error-prone. Copy plpgsql's infrastructure
for converting between SQLSTATEs and exception names shown in Appendix A,
and modify examples in tests and documentation to do it that way.
This completes (at least for now) the project of getting rid of ad-hoc
linkages among the src/bin/ subdirectories. Everything they share is now
in src/fe_utils/ and is included from a static library at link time.
A side benefit is that we can restore the FLEX_NO_BACKUP check for
psqlscanslash.l. We might need to think of another way to do that check
if we ever need to build two lexers with that property in the same source
directory, but there's no foreseeable reason to need that.
Per discussion, we want to create a static library and put the stuff into
it that until now has been shared across src/bin/ directories by ad-hoc
methods like symlinking a source file. This commit creates the library and
populates it with a couple of files that contain the widely-useful portions
of pg_dump's dumputils.c file. dumputils.c survives, because it has some
stuff that didn't seem appropriate for fe_utils, but it's significantly
smaller and is no longer referenced from any other directory.
Follow-on patches will move more stuff into fe_utils.
The Mkvcbuild.pm hacking here is just a best guess; we'll see how the
buildfarm likes it.
Now that we have src/common/ for code shared between frontend and backend,
we can get rid of (most of) the klugy ways that the keyword table and
keyword lookup code were formerly shared between different uses.
This is a first step towards a more general plan of getting rid of
special-purpose kluges for sharing code in src/bin/.
I chose to merge kwlookup.c back into keywords.c, as it once was, and
always has been so far as keywords.h is concerned. We could have
kept them separate, but there is noplace that uses ScanKeywordLookup
without also wanting access to the backend's keyword list, so there
seems little point.
ecpg is still a bit weird, but at least now the trickiness is documented.
I think that the MSVC build script should require no adjustments beyond
what's done here ... but we'll soon find out.
Commit ac1d794 ("Make idle backends exit if the postmaster dies.")
introduced a regression on, at least, large linux systems. Constantly
adding the same postmaster_alive_fds to the OSs internal datastructures
for implementing poll/select can cause significant contention; leading
to a performance regression of nearly 3x in one example.
This can be avoided by using e.g. linux' epoll, which avoids having to
add/remove file descriptors to the wait datastructures at a high rate.
Unfortunately the current latch interface makes it hard to allocate any
persistent per-backend resources.
Replace, with a backward compatibility layer, WaitLatchOrSocket with a
new WaitEventSet API. Users can allocate such a Set across multiple
calls, and add more than one file-descriptor to wait on. The latter has
been added because there's upcoming postgres features where that will be
helpful.
In addition to the previously existing poll(2), select(2),
WaitForMultipleObjects() implementations also provide an epoll_wait(2)
based implementation to address the aforementioned performance
problem. Epoll is only available on linux, but that is the most likely
OS for machines large enough (four sockets) to reproduce the problem.
To actually address the aforementioned regression, create and use a
long-lived WaitEventSet for FE/BE communication. There are additional
places that would benefit from a long-lived set, but that's a task for
another day.
Thanks to Amit Kapila, who helped make the windows code I blindly wrote
actually work.
Reported-By: Dmitry Vasilyev Discussion:
CAB-SwXZh44_2ybvS5Z67p_CDz=XFn4hNAD=CnMEF+QqkXwFrGg@mail.gmail.com20160114143931.GG10941@awork2.anarazel.de
Previously latches for windows and unix had been implemented in
different files. A later patch introduce an expanded wait
infrastructure, keeping the implementation separate would introduce too
much duplication.
This basically just moves the functions, without too much change. The
reason to keep this separate is that it allows blame to continue working
a little less badly; and to make review a tiny bit easier.
Discussion: 20160114143931.GG10941@awork2.anarazel.de
After the previous fix in 6f1f34c9 msvc ended up looking for psqlscan.c
in the wrong directory.
David's fix just forces the path to be adjusted. That's not a
particularly pretty fix, but it hopefully will make the buildfarm green
again.
Author: David Rowley
Discussion: CAKJS1f_9CCi_t+LEgV5GWoCj3wjavcMoDc5qfcf_A0UwpQoPoA@mail.gmail.com
pgbench now needs to use src/bin/psql/psqlscan.l, but it's not very clear
how to fit that into the MSVC build system. If this doesn't work I'm going
to need some help from somebody who actually understands those scripts ...
Modern Perl has removed psed from its core distribution, so it might not
be readily available on some build platforms. We therefore replace its
use with a Perl script generated by s2p, which is equivalent to the sed
script. The latter is retained for non-MSVC builds to avoid creating a
new hard dependency on Perl for non-Windows tarball builds.
Backpatch to all live branches.
Michael Paquier and me.
This gets us to a point where psqlscan.l can be used by other frontend
programs for the same purpose psql uses it for, ie to detect when it's
collected a complete SQL command from input that is divided across
line boundaries. Moreover, other programs can supply their own lexers
for backslash commands of their own choosing. A follow-on patch will
use this in pgbench.
The end result here is roughly the same as in Kyotaro Horiguchi's
0001-Make-SQL-parser-part-of-psqlscan-independent-from-ps.patch, although
the details of the method for switching between lexers are quite different.
Basically, in this patch we share the entire PsqlScanState, YY_BUFFER_STATE
stack, *and* yyscan_t between different lexers. The only thing we need
to do to switch to a different lexer is to make sure the start_state is
valid for the new lexer. This works because flex doesn't keep any other
persistent state that depends on the specific lexing tables generated for
a particular .l file. (We are assuming that both lexers are built with
the same flex version, or at least versions that are compatible with
respect to the contents of yyscan_t; but that doesn't seem likely to
be a big problem in practice, considering how slowly flex changes.)
Aside from being more efficient than Horiguchi-san's original solution,
this avoids possible corner-case changes in semantics: the original code
was capable of popping the input buffer stack while still staying in
backslash-related parsing states. I'm not sure that that equates to any
useful user-visible behaviors, but I'm not sure it doesn't either, so
I'm loath to assume that we only need to consider the topmost buffer when
parsing a backslash command.
I've attempted to update the MSVC build scripts for the added .l file,
but will rely on the buildfarm to see if I missed anything.
Kyotaro Horiguchi and Tom Lane
Up to now checkpoints were written in the order they're in the
BufferDescriptors. That's nearly random in a lot of cases, which
performs badly on rotating media, but even on SSDs it causes slowdowns.
To avoid that, sort checkpoints before writing them out. We currently
sort by tablespace, relfilenode, fork and block number.
One of the major reasons that previously wasn't done, was fear of
imbalance between tablespaces. To address that balance writes between
tablespaces.
The other prime concern was that the relatively large allocation to sort
the buffers in might fail, preventing checkpoints from happening. Thus
pre-allocate the required memory in shared memory, at server startup.
This particularly makes it more efficient to have checkpoint flushing
enabled, because that'll often result in a lot of writes that can be
coalesced into one flush.
Discussion: alpine.DEB.2.10.1506011320000.28433@sto
Author: Fabien Coelho and Andres Freund
Currently writes to the main data files of postgres all go through the
OS page cache. This means that some operating systems can end up
collecting a large number of dirty buffers in their respective page
caches. When these dirty buffers are flushed to storage rapidly, be it
because of fsync(), timeouts, or dirty ratios, latency for other reads
and writes can increase massively. This is the primary reason for
regular massive stalls observed in real world scenarios and artificial
benchmarks; on rotating disks stalls on the order of hundreds of seconds
have been observed.
On linux it is possible to control this by reducing the global dirty
limits significantly, reducing the above problem. But global
configuration is rather problematic because it'll affect other
applications; also PostgreSQL itself doesn't always generally want this
behavior, e.g. for temporary files it's undesirable.
Several operating systems allow some control over the kernel page
cache. Linux has sync_file_range(2), several posix systems have msync(2)
and posix_fadvise(2). sync_file_range(2) is preferable because it
requires no special setup, whereas msync() requires the to-be-flushed
range to be mmap'ed. For the purpose of flushing dirty data
posix_fadvise(2) is the worst alternative, as flushing dirty data is
just a side-effect of POSIX_FADV_DONTNEED, which also removes the pages
from the page cache. Thus the feature is enabled by default only on
linux, but can be enabled on all systems that have any of the above
APIs.
While desirable and likely possible this patch does not contain an
implementation for windows.
With the infrastructure added, writes made via checkpointer, bgwriter
and normal user backends can be flushed after a configurable number of
writes. Each of these sources of writes controlled by a separate GUC,
checkpointer_flush_after, bgwriter_flush_after and backend_flush_after
respectively; they're separate because the number of flushes that are
good are separate, and because the performance considerations of
controlled flushing for each of these are different.
A later patch will add checkpoint sorting - after that flushes from the
ckeckpoint will almost always be desirable. Bgwriter flushes are most of
the time going to be random, which are slow on lots of storage hardware.
Flushing in backends works well if the storage and bgwriter can keep up,
but if not it can have negative consequences. This patch is likely to
have negative performance consequences without checkpoint sorting, but
unfortunately so has sorting without flush control.
Discussion: alpine.DEB.2.10.1506011320000.28433@sto
Author: Fabien Coelho and Andres Freund
Python's allocator does some low-level tricks for efficiency;
unfortunately they trigger valgrind errors. Those tricks can be disabled
making instrumentation easier; but few people testing postgres will have
such a build of python. So add broad suppressions of the resulting
errors.
See also https://svn.python.org/projects/python/trunk/Misc/README.valgrind
This possibly will suppress valid errors, but without it it's basically
impossible to use valgrind with plpython code.
Author: Andres Freund
Backpatch: 9.4, where we started to maintain valgrind suppressions
Add four new SQL accessible functions: pg_control_system(),
pg_control_checkpoint(), pg_control_recovery(), and pg_control_init()
which expose a subset of the control file data.
Along the way move the code to read and validate the control file to
src/common, where it can be shared by the new backend functions
and the original pg_controldata frontend program.
Patch by me, significant input, testing, and review by Michael Paquier.
This makes the flag more visible for testers using the default file as a
template, increasing the likelyhood that the test suite will be run.
Also have the flag be displayed in the fake "configure" output, if set.
This patch is two new lines only, but perltidy decides to shift things
around which makes it appear a bit bigger.
Author: Michaël Paquier
Reviewed-by: Craig Ringer
Discussion: https://www.postgresql.org/message-id/CAB7nPqRet6UAP2APhZAZw%3DVhJ6w-Q-gGLdZkrOqFgd2vc9-ZDw%40mail.gmail.com