This reverts commit 9e083fd468. That was a
few bricks shy of a load:
* Query cancel stopped working
* Buildfarm member pademelon stopped working, because the box doesn't have
/dev/urandom nor /dev/random.
This clearly needs some more discussion, and a quite different patch, so
revert for now.
This adds a new routine, pg_strong_random() for generating random bytes,
for use in both frontend and backend. At the moment, it's only used in
the backend, but the upcoming SCRAM authentication patches need strong
random numbers in libpq as well.
pg_strong_random() is based on, and replaces, the existing implementation
in pgcrypto. It can acquire strong random numbers from a number of sources,
depending on what's available:
- OpenSSL RAND_bytes(), if built with OpenSSL
- On Windows, the native cryptographic functions are used
- /dev/urandom
- /dev/random
Original patch by Magnus Hagander, with further work by Michael Paquier
and me.
Discussion: <CAB7nPqRy3krN8quR9XujMVVHYtXJ0_60nqgVc6oUk8ygyVkZsA@mail.gmail.com>
The more efficient hashtable speeds up hash-aggregations with more than
a few hundred groups significantly. Improvements of over 120% have been
measured.
Due to the the different hash table queries that not fully
determined (e.g. GROUP BY without ORDER BY) may change their result
order.
The conversion is largely straight-forward, except that, due to the
static element types of simplehash.h type hashes, the additional data
some users store in elements (e.g. the per-group working data for hash
aggregaters) is now stored in TupleHashEntryData->additional. The
meaning of BuildTupleHashTable's entrysize (renamed to additionalsize)
has been changed to only be about the additionally stored size. That
size is only used for the initial sizing of the hash-table.
Reviewed-By: Tomas Vondra
Discussion: <20160727004333.r3e2k2y6fvk2ntup@alap3.anarazel.de>
dynahash.c hash tables aren't quite fast enough for some
use-cases. There are several reasons for lacking performance:
- the use of chaining for collision handling makes them cache
inefficient, that's especially an issue when the tables get bigger.
- as the element sizes for dynahash are only determined at runtime,
offset computations are somewhat expensive
- hash and element comparisons are indirect function calls, causing
unnecessary pipeline stalls
- it's two level structure has some benefits (somewhat natural
partitioning), but increases the number of indirections
to fix several of these the hash tables have to be adjusted to the
individual use-case at compile-time. C unfortunately doesn't provide a
good way to do compile code generation (like e.g. c++'s templates for
all their weaknesses do). Thus the somewhat ugly approach taken here is
to allow for code generation using a macro-templatized header file,
which generates functions and types based on a prefix and other
parameters.
Later patches use this infrastructure to use such hash tables for
tidbitmap.c (bitmap scans) and execGrouping.c (hash aggregation,
...). In queries where these use up a large fraction of the time, this
has been measured to lead to performance improvements of over 100%.
There are other cases where this could be useful (e.g. catcache.c).
The hash table design chosen is a variant of linear open-addressing. The
biggest disadvantage of simple linear addressing schemes are highly
variable lookup times due to clustering, and deletions leaving a lot of
tombstones around. To address these issues a variant of "robin hood"
hashing is employed. Robin hood hashing optimizes chaining lengths by
moving elements close to their optimal bucket ("rich" elements), out of
the way if a to-be-inserted element is further away from its optimal
position (i.e. it's "poor"). While that can make insertions slower, the
average lookup performance is a lot better, and higher fill factors can
be used in a still performant manner. To avoid tombstones - which
normally solve the issue that a deleted node's presence is relevant to
determine whether a lookup needs to continue looking or is done -
buckets following a deleted element are shifted backwards, unless
they're empty or already at their optimal position.
There's further possible improvements that can be made to this
implementation. Amongst others:
- Use distance as a termination criteria during searches. This is
generally a good idea, but I've been able to see the overhead of
distance calculations in some cases.
- Consider combining the 'empty' status into the hashvalue, and enforce
storing the hashvalue. That could, in some cases, increase memory
density and remove a few instructions.
- Experiment further with the, very conservatively choosen, fillfactor.
- Make maximum size of hashtable configurable, to allow storing very
very large tables. That'd require 64bit hash values to be more common
than now, though.
- some smaller memcpy calls could be optimized to copy larger chunks
But since the new implementation is already considerably faster than
dynahash it seem sensible to start using it.
Reviewed-By: Tomas Vondra
Discussion: <20160727004333.r3e2k2y6fvk2ntup@alap3.anarazel.de>
Just turning the crank on the project started in commit d51924be8.
These cases turn out to be exact subsets of the boilerplate needed
for hstore_plpython.
Discussion: <2652.1475512158@sss.pgh.pa.us>
Previously, on most platforms, we allowed hstore_plpython's references
to hstore and plpython to be unresolved symbols at link time, trusting
the dynamic linker to resolve them when the module is loaded. This
has a number of problems, the worst being that the dynamic linker
does not know where the references come from and can do nothing but
fail if those other modules haven't been loaded. We've more or less
gotten away with that for the limited use-case of datatype transform
modules, but even there, it requires some awkward hacks, most recently
commit 83c249200.
Instead, let's not treat these references as linker-resolvable at all,
but use function pointers that are manually filled in by the module's
_PG_init function. There are few enough contact points that this
doesn't seem unmaintainable, at least for these use-cases. (Note that
the same technique wouldn't work at all for decoupling from libpython
itself, but fortunately that's just a standard shared library and can
be linked to normally.)
This is an initial patch that just converts hstore_plpython. If the
buildfarm doesn't find any fatal problems, I'll work on the other
transform modules soon.
Tom Lane, per an idea of Andres Freund's.
Discussion: <2652.1475512158@sss.pgh.pa.us>
We weren't terribly consistent about whether to call Apple's OS "OS X"
or "Mac OS X", and the former is probably confusing to people who aren't
Apple users. Now that Apple has rebranded it "macOS", follow their lead
to establish a consistent naming pattern. Also, avoid the use of the
ancient project name "Darwin", except as the port code name which does not
seem desirable to change. (In short, this patch touches documentation and
comments, but no actual code.)
I didn't touch contrib/start-scripts/osx/, either. I suspect those are
obsolete and due for a rewrite, anyway.
I dithered about whether to apply this edit to old release notes, but
those were responsible for quite a lot of the inconsistencies, so I ended
up changing them too. Anyway, Apple's being ahistorical about this,
so why shouldn't we be?
Solution.pm mistakenly believed that the xml option requires the xslt
option, when actually the dependency is the other way around; and it
believed that libxml requires libiconv, which is not necessarily so,
so we shouldn't enforce it here. Fix the option cross-checking logic.
Also, since AddProject already takes care of adding libxml and libxslt
include and library dependencies to every project, there's no need
for the custom code that did that in mkvcbuild. While at it, let's
handle the similar dependencies for uuid in a similar fashion.
Given the lack of field complaints about these overly strict build
dependency requirements, there seems no need for a back-patch.
Michael Paquier
Discussion: <CAB7nPqR0+gpu3mRQvFjf-V-bMxmiSJ6NpTg9_WzVDL+a31cV2g@mail.gmail.com>
Until now, it used the current working directory. This makes it safe
for simultaneous invocations of gendef.pl, with different target
directories, to run from a single current working directory, such as
$(top_srcdir). The MSVC build system will soon rely on this.
Christian Ullrich, reviewed by Michael Paquier.
Commit b2cbced9e instituted a policy of referring to the timezone database
as the "IANA timezone database" in our user-facing documentation.
Propagate that wording into a couple of places that were still using "zic"
to refer to the database, which is definitely not right (zic is the
compilation tool, not the data).
Back-patch, not because this is very important in itself, but because
we routinely cherry-pick updates to the tznames files and I don't want
to risk future merge failures.
When building libpq, ip.c and md5.c were symlinked or copied from
src/backend/libpq into src/interfaces/libpq, but now that we have a
directory specifically for routines that are shared between the server and
client binaries, src/common/, move them there.
Some routines in ip.c were only used in the backend. Keep those in
src/backend/libpq, but rename to ifaddr.c to avoid confusion with the file
that's now in common.
Fix the comment in src/common/Makefile to reflect how libpq actually links
those files.
There are two more files that libpq symlinks directly from src/backend:
encnames.c and wchar.c. I don't feel compelled to move those right now,
though.
Patch by Michael Paquier, with some changes by me.
Discussion: <69938195-9c76-8523-0af8-eb718ea5b36e@iki.fi>
Up to now we've manually adjusted these numbers in several different
Makefiles at the start of each development cycle. While that's not
much work, it's easily forgotten, so let's get rid of it by setting
the SO_MINOR_VERSION values directly from $(MAJORVERSION).
In the case of libpq, this dev cycle's value of SO_MINOR_VERSION happens
to be "10" anyway, so this switch is transparent. For ecpg's shared
libraries, this will result in skipping one or two minor version numbers
between v9.6 and v10, which seems like no big problem; and it was a bit
inconsistent that they didn't have equal minor version numbers anyway.
Discussion: <21969.1471287988@sss.pgh.pa.us>
Once upon a time, it made sense for the ecpg preprocessor to have its
own version number, because it used a manually-maintained grammar that
wasn't always in sync with the core grammar. But those days are
thankfully long gone, leaving only a maintenance nuisance behind.
Let's use the PG v10 version numbering changeover as an excuse to get
rid of the ecpg version number and just have ecpg identify itself by
PG_VERSION. From the user's standpoint, ecpg will go from "4.12" in
the 9.6 branch to "10" in the 10 branch, so there's no failure of
monotonicity.
Discussion: <1471332659.4410.67.camel@postgresql.org>
Prior to commit 7882c3b0b9, it was
possible to use LWLocks within DSM segments, but that commit broke
this use case by switching from a doubly linked list to a circular
linked list. Switch back, using a new bit of general infrastructure
for maintaining lists of PGPROCs.
Thomas Munro, reviewed by me.
This is a good bit more complicated than the average new-version stamping
commit, because it includes various adjustments in pursuit of changing
from three-part to two-part version numbers. It's likely some further
work will be needed around that change; but this is enough to get through
the regression tests, at least in Unix builds.
Peter Eisentraut and Tom Lane
Wrap the perltidy invocation into a shell script to reduce the risk of
copy-and-paste errors. Include removal of *.bak files in the script,
so they don't accidentally get committed. Improve the directions in
the README file.
Due to simplistic quoting and confusion of database names with conninfo
strings, roles with the CREATEDB or CREATEROLE option could escalate to
superuser privileges when a superuser next ran certain maintenance
commands. The new coding rule for PQconnectdbParams() calls, documented
at conninfo_array_parse(), is to pass expand_dbname=true and wrap
literal database names in a trivial connection string. Escape
zero-length values in appendConnStrVal(). Back-patch to 9.1 (all
supported versions).
Nathan Bossart, Michael Paquier, and Noah Misch. Reviewed by Peter
Eisentraut. Reported by Nathan Bossart.
Security: CVE-2016-5424
To ensure that "make installcheck" can be used safely against an existing
installation, we need to be careful about what global object names
(database, role, and tablespace names) we use; otherwise we might
accidentally clobber important objects. There's been a weak consensus that
test databases should have names including "regression", and that test role
names should start with "regress_", but we didn't have any particular rule
about tablespace names; and neither of the other rules was followed with
any consistency either.
This commit moves us a long way towards having a hard-and-fast rule that
regression test databases must have names including "regression", and that
test role and tablespace names must start with "regress_". It's not
completely there because I did not touch some test cases in rolenames.sql
that test creation of special role names like "session_user". That will
require some rethinking of exactly what we want to test, whereas the intent
of this patch is just to hit all the cases in which the needed renamings
are cosmetic.
There is no enforcement mechanism in this patch either, but if we don't
add one we can expect that the tests will soon be violating the convention
again. Again, that's not such a cosmetic change and it will require
discussion. (But I did use a quick-hack enforcement patch to find these
cases.)
Discussion: <16638.1468620817@sss.pgh.pa.us>
Change assorted places in our Perl code that did things like
system("prog $path/file");
to do it more like
system('prog', "$path/file");
which is safe against spaces and other special characters in the path
variable. The latter was already the prevailing style, but a few bits
of code hadn't gotten this memo. Back-patch to 9.4 as relevant.
Michael Paquier, Kyotaro Horiguchi
Discussion: <20160704.160213.111134711.horiguchi.kyotaro@lab.ntt.co.jp>
The new pg_check_visible() and pg_check_frozen() functions can be used to
verify that the visibility map bits for a relation's data pages match the
actual state of the tuples on those pages.
Amit Kapila and Robert Haas, reviewed (in earlier versions) by Andres
Freund. Additional testing help by Thomas Munro.
Mostly these are just comments but there are a few in documentation
and a handful in code and tests. Hopefully this doesn't cause too much
unnecessary pain for backpatching. I relented from some of the most
common like "thru" for that reason. The rest don't seem numerous
enough to cause problems.
Thanks to Kevin Lyda's tool https://pypi.python.org/pypi/misspellings
The test_pg_dump extension doesn't have a C component, so we need
to exclude it from the MSVC build system trying to figure out how
to build it.
Also add a "MODULES" line to the Makefile, as test_extensions has.
Might not be necessary, but seems good to keep things consistent.
Lastly, remove the 'installcheck' line from test_pg_dump, as that
was causing redefinition errors, at least on my box. This also
makes test_pg_dump consistent with how commit_ts is set up.
This time, use the buildfarm-supplied contents for this file, instead
of trying to update it by eyeballing the pgindent output.
Per discussion with Tom and Bruce.
This has the inverse effect of --master-only. It's needed to help find
cases where a commit should not be described in major release notes
because it was back-patched into older branches, though not at the same
time as the HEAD commit.
Adjust the way we detect the locale. As a result the minumum Windows
version supported by VS2015 and later is Windows Vista. Add some tweaks
to remove new compiler warnings. Remove documentation references to the
now obsolete msysGit.
Michael Paquier, somewhat edited by me, reviewed by Christian Ullrich.
Backpatch to 9.5
In addition to adding new typedefs, I also re-sorted the file so that
various entries add piecemeal, mostly or entirely by me, were alphabetized
the same way as other entries in the file.
In commit c0b050192, Andres introduced the idea of including one-line
commit references in our major release notes. Teach git_changelog to
emit a (lightly adapted) version of that format, so that we don't
have to laboriously add it to the notes after the fact. The default
output isn't changed, since I anticipate still using that for minor
release notes.
It was incorrectly declared as a volatile pointer to a non-volatile
structure. Eliminate the OldSnapshotControl struct definition; it
is really not needed. Pointed out by Tom Lane.
While at it, add OldSnapshotControlData to pgindent's list of
structures.
Pinning/Unpinning a buffer is a very frequent operation; especially in
read-mostly cache resident workloads. Benchmarking shows that in various
scenarios the spinlock protecting a buffer header's state becomes a
significant bottleneck. The problem can be reproduced with pgbench -S on
larger machines, but can be considerably worse for queries which touch
the same buffers over and over at a high frequency (e.g. nested loops
over a small inner table).
To allow atomic operations to be used, cram BufferDesc's flags,
usage_count, buf_hdr_lock, refcount into a single 32bit atomic variable;
that allows to manipulate them together using 32bit compare-and-swap
operations. This requires reducing MAX_BACKENDS to 2^18-1 (which could
be lifted by using a 64bit field, but it's not a realistic configuration
atm).
As not all operations can easily implemented in a lockfree manner,
implement the previous buf_hdr_lock via a flag bit in the atomic
variable. That way we can continue to lock the header in places where
it's needed, but can get away without acquiring it in the more frequent
hot-paths. There's some additional operations which can be done without
the lock, but aren't in this patch; but the most important places are
covered.
As bufmgr.c now essentially re-implements spinlocks, abstract the delay
logic from s_lock.c into something more generic. It now has already two
users, and more are coming up; there's a follupw patch for lwlock.c at
least.
This patch is based on a proof-of-concept written by me, which Alexander
Korotkov made into a fully working patch; the committed version is again
revised by me. Benchmarking and testing has, amongst others, been
provided by Dilip Kumar, Alexander Korotkov, Robert Haas.
On a large x86 system improvements for readonly pgbench, with a high
client count, of a factor of 8 have been observed.
Author: Alexander Korotkov and Andres Freund
Discussion: 2400449.GjM57CE0Yg@dinodell
The buildfarm showed failure for Windows MSVC builds due to this
omission. This might not be the only problem with the Makefile for
this feature, but hopefully this will get it past the immediate
problem.
Fix suggested by Tom Lane
Most of what is produced by the detailed verbosity level is of no
interest at all, so switch to the normal level for more usable output.
Christian Ullrich
Backpatch to all live branches
Previously synchronous replication offered only the ability to confirm
that all changes made by a transaction had been transferred to at most
one synchronous standby server.
This commit extends synchronous replication so that it supports multiple
synchronous standby servers. It enables users to consider one or more
standby servers as synchronous, and increase the level of transaction
durability by ensuring that transaction commits wait for replies from
all of those synchronous standbys.
Multiple synchronous standby servers are configured in
synchronous_standby_names which is extended to support new syntax of
'num_sync ( standby_name [ , ... ] )', where num_sync specifies
the number of synchronous standbys that transaction commits need to
wait for replies from and standby_name is the name of a standby
server.
The syntax of 'standby_name [ , ... ]' which was used in 9.5 or before
is also still supported. It's the same as new syntax with num_sync=1.
This commit doesn't include "quorum commit" feature which was discussed
in pgsql-hackers. Synchronous standbys are chosen based on their priorities.
synchronous_standby_names determines the priority of each standby for
being chosen as a synchronous standby. The standbys whose names appear
earlier in the list are given higher priority and will be considered as
synchronous. Other standby servers appearing later in this list
represent potential synchronous standbys.
The regression test for multiple synchronous standbys is not included
in this commit. It should come later.
Authors: Sawada Masahiko, Beena Emerson, Michael Paquier, Fujii Masao
Reviewed-By: Kyotaro Horiguchi, Amit Kapila, Robert Haas, Simon Riggs,
Amit Langote, Thomas Munro, Sameer Thakur, Suraj Kharage, Abhijit Menon-Sen,
Rajeev Rastogi
Many thanks to the various individuals who were involved in
discussing and developing this feature.
This introduces a new dependency type which marks an object as depending
on an extension, such that if the extension is dropped, the object
automatically goes away; and also, if the database is dumped, the object
is included in the dump output. Currently the grammar supports this for
indexes, triggers, materialized views and functions only, although the
utility code is generic so adding support for more object types is a
matter of touching the parser rules only.
Author: Abhijit Menon-Sen
Reviewed-by: Alexander Korotkov, Álvaro Herrera
Discussion: http://www.postgresql.org/message-id/20160115062649.GA5068@toroid.org
We don't really want to encourage people to write numeric SQLSTATEs in
programs; that's unreadable and error-prone. Copy plpgsql's infrastructure
for converting between SQLSTATEs and exception names shown in Appendix A,
and modify examples in tests and documentation to do it that way.
This completes (at least for now) the project of getting rid of ad-hoc
linkages among the src/bin/ subdirectories. Everything they share is now
in src/fe_utils/ and is included from a static library at link time.
A side benefit is that we can restore the FLEX_NO_BACKUP check for
psqlscanslash.l. We might need to think of another way to do that check
if we ever need to build two lexers with that property in the same source
directory, but there's no foreseeable reason to need that.
Per discussion, we want to create a static library and put the stuff into
it that until now has been shared across src/bin/ directories by ad-hoc
methods like symlinking a source file. This commit creates the library and
populates it with a couple of files that contain the widely-useful portions
of pg_dump's dumputils.c file. dumputils.c survives, because it has some
stuff that didn't seem appropriate for fe_utils, but it's significantly
smaller and is no longer referenced from any other directory.
Follow-on patches will move more stuff into fe_utils.
The Mkvcbuild.pm hacking here is just a best guess; we'll see how the
buildfarm likes it.
Now that we have src/common/ for code shared between frontend and backend,
we can get rid of (most of) the klugy ways that the keyword table and
keyword lookup code were formerly shared between different uses.
This is a first step towards a more general plan of getting rid of
special-purpose kluges for sharing code in src/bin/.
I chose to merge kwlookup.c back into keywords.c, as it once was, and
always has been so far as keywords.h is concerned. We could have
kept them separate, but there is noplace that uses ScanKeywordLookup
without also wanting access to the backend's keyword list, so there
seems little point.
ecpg is still a bit weird, but at least now the trickiness is documented.
I think that the MSVC build script should require no adjustments beyond
what's done here ... but we'll soon find out.
Commit ac1d794 ("Make idle backends exit if the postmaster dies.")
introduced a regression on, at least, large linux systems. Constantly
adding the same postmaster_alive_fds to the OSs internal datastructures
for implementing poll/select can cause significant contention; leading
to a performance regression of nearly 3x in one example.
This can be avoided by using e.g. linux' epoll, which avoids having to
add/remove file descriptors to the wait datastructures at a high rate.
Unfortunately the current latch interface makes it hard to allocate any
persistent per-backend resources.
Replace, with a backward compatibility layer, WaitLatchOrSocket with a
new WaitEventSet API. Users can allocate such a Set across multiple
calls, and add more than one file-descriptor to wait on. The latter has
been added because there's upcoming postgres features where that will be
helpful.
In addition to the previously existing poll(2), select(2),
WaitForMultipleObjects() implementations also provide an epoll_wait(2)
based implementation to address the aforementioned performance
problem. Epoll is only available on linux, but that is the most likely
OS for machines large enough (four sockets) to reproduce the problem.
To actually address the aforementioned regression, create and use a
long-lived WaitEventSet for FE/BE communication. There are additional
places that would benefit from a long-lived set, but that's a task for
another day.
Thanks to Amit Kapila, who helped make the windows code I blindly wrote
actually work.
Reported-By: Dmitry Vasilyev Discussion:
CAB-SwXZh44_2ybvS5Z67p_CDz=XFn4hNAD=CnMEF+QqkXwFrGg@mail.gmail.com20160114143931.GG10941@awork2.anarazel.de
Previously latches for windows and unix had been implemented in
different files. A later patch introduce an expanded wait
infrastructure, keeping the implementation separate would introduce too
much duplication.
This basically just moves the functions, without too much change. The
reason to keep this separate is that it allows blame to continue working
a little less badly; and to make review a tiny bit easier.
Discussion: 20160114143931.GG10941@awork2.anarazel.de
After the previous fix in 6f1f34c9 msvc ended up looking for psqlscan.c
in the wrong directory.
David's fix just forces the path to be adjusted. That's not a
particularly pretty fix, but it hopefully will make the buildfarm green
again.
Author: David Rowley
Discussion: CAKJS1f_9CCi_t+LEgV5GWoCj3wjavcMoDc5qfcf_A0UwpQoPoA@mail.gmail.com
pgbench now needs to use src/bin/psql/psqlscan.l, but it's not very clear
how to fit that into the MSVC build system. If this doesn't work I'm going
to need some help from somebody who actually understands those scripts ...
Modern Perl has removed psed from its core distribution, so it might not
be readily available on some build platforms. We therefore replace its
use with a Perl script generated by s2p, which is equivalent to the sed
script. The latter is retained for non-MSVC builds to avoid creating a
new hard dependency on Perl for non-Windows tarball builds.
Backpatch to all live branches.
Michael Paquier and me.
This gets us to a point where psqlscan.l can be used by other frontend
programs for the same purpose psql uses it for, ie to detect when it's
collected a complete SQL command from input that is divided across
line boundaries. Moreover, other programs can supply their own lexers
for backslash commands of their own choosing. A follow-on patch will
use this in pgbench.
The end result here is roughly the same as in Kyotaro Horiguchi's
0001-Make-SQL-parser-part-of-psqlscan-independent-from-ps.patch, although
the details of the method for switching between lexers are quite different.
Basically, in this patch we share the entire PsqlScanState, YY_BUFFER_STATE
stack, *and* yyscan_t between different lexers. The only thing we need
to do to switch to a different lexer is to make sure the start_state is
valid for the new lexer. This works because flex doesn't keep any other
persistent state that depends on the specific lexing tables generated for
a particular .l file. (We are assuming that both lexers are built with
the same flex version, or at least versions that are compatible with
respect to the contents of yyscan_t; but that doesn't seem likely to
be a big problem in practice, considering how slowly flex changes.)
Aside from being more efficient than Horiguchi-san's original solution,
this avoids possible corner-case changes in semantics: the original code
was capable of popping the input buffer stack while still staying in
backslash-related parsing states. I'm not sure that that equates to any
useful user-visible behaviors, but I'm not sure it doesn't either, so
I'm loath to assume that we only need to consider the topmost buffer when
parsing a backslash command.
I've attempted to update the MSVC build scripts for the added .l file,
but will rely on the buildfarm to see if I missed anything.
Kyotaro Horiguchi and Tom Lane
Up to now checkpoints were written in the order they're in the
BufferDescriptors. That's nearly random in a lot of cases, which
performs badly on rotating media, but even on SSDs it causes slowdowns.
To avoid that, sort checkpoints before writing them out. We currently
sort by tablespace, relfilenode, fork and block number.
One of the major reasons that previously wasn't done, was fear of
imbalance between tablespaces. To address that balance writes between
tablespaces.
The other prime concern was that the relatively large allocation to sort
the buffers in might fail, preventing checkpoints from happening. Thus
pre-allocate the required memory in shared memory, at server startup.
This particularly makes it more efficient to have checkpoint flushing
enabled, because that'll often result in a lot of writes that can be
coalesced into one flush.
Discussion: alpine.DEB.2.10.1506011320000.28433@sto
Author: Fabien Coelho and Andres Freund
Currently writes to the main data files of postgres all go through the
OS page cache. This means that some operating systems can end up
collecting a large number of dirty buffers in their respective page
caches. When these dirty buffers are flushed to storage rapidly, be it
because of fsync(), timeouts, or dirty ratios, latency for other reads
and writes can increase massively. This is the primary reason for
regular massive stalls observed in real world scenarios and artificial
benchmarks; on rotating disks stalls on the order of hundreds of seconds
have been observed.
On linux it is possible to control this by reducing the global dirty
limits significantly, reducing the above problem. But global
configuration is rather problematic because it'll affect other
applications; also PostgreSQL itself doesn't always generally want this
behavior, e.g. for temporary files it's undesirable.
Several operating systems allow some control over the kernel page
cache. Linux has sync_file_range(2), several posix systems have msync(2)
and posix_fadvise(2). sync_file_range(2) is preferable because it
requires no special setup, whereas msync() requires the to-be-flushed
range to be mmap'ed. For the purpose of flushing dirty data
posix_fadvise(2) is the worst alternative, as flushing dirty data is
just a side-effect of POSIX_FADV_DONTNEED, which also removes the pages
from the page cache. Thus the feature is enabled by default only on
linux, but can be enabled on all systems that have any of the above
APIs.
While desirable and likely possible this patch does not contain an
implementation for windows.
With the infrastructure added, writes made via checkpointer, bgwriter
and normal user backends can be flushed after a configurable number of
writes. Each of these sources of writes controlled by a separate GUC,
checkpointer_flush_after, bgwriter_flush_after and backend_flush_after
respectively; they're separate because the number of flushes that are
good are separate, and because the performance considerations of
controlled flushing for each of these are different.
A later patch will add checkpoint sorting - after that flushes from the
ckeckpoint will almost always be desirable. Bgwriter flushes are most of
the time going to be random, which are slow on lots of storage hardware.
Flushing in backends works well if the storage and bgwriter can keep up,
but if not it can have negative consequences. This patch is likely to
have negative performance consequences without checkpoint sorting, but
unfortunately so has sorting without flush control.
Discussion: alpine.DEB.2.10.1506011320000.28433@sto
Author: Fabien Coelho and Andres Freund
Python's allocator does some low-level tricks for efficiency;
unfortunately they trigger valgrind errors. Those tricks can be disabled
making instrumentation easier; but few people testing postgres will have
such a build of python. So add broad suppressions of the resulting
errors.
See also https://svn.python.org/projects/python/trunk/Misc/README.valgrind
This possibly will suppress valid errors, but without it it's basically
impossible to use valgrind with plpython code.
Author: Andres Freund
Backpatch: 9.4, where we started to maintain valgrind suppressions
Add four new SQL accessible functions: pg_control_system(),
pg_control_checkpoint(), pg_control_recovery(), and pg_control_init()
which expose a subset of the control file data.
Along the way move the code to read and validate the control file to
src/common, where it can be shared by the new backend functions
and the original pg_controldata frontend program.
Patch by me, significant input, testing, and review by Michael Paquier.
This makes the flag more visible for testers using the default file as a
template, increasing the likelyhood that the test suite will be run.
Also have the flag be displayed in the fake "configure" output, if set.
This patch is two new lines only, but perltidy decides to shift things
around which makes it appear a bit bigger.
Author: Michaël Paquier
Reviewed-by: Craig Ringer
Discussion: https://www.postgresql.org/message-id/CAB7nPqRet6UAP2APhZAZw%3DVhJ6w-Q-gGLdZkrOqFgd2vc9-ZDw%40mail.gmail.com
Architecture reference material specifies this order, and s_lock.h
inline assembly agrees. The former order failed to provide mutual
exclusion to lwlock.c and perhaps to other clients. The two xlc
buildfarm members, hornet and mandrill, have failed sixteen times with
duplicate key errors involving pg_class_oid_index or pg_type_oid_index.
Back-patch to 9.5, where commit b64d92f1a5
introduced atomics.
Reviewed by Andres Freund and Tom Lane.
Move and refactor the underlying code for the pg_config client
application to src/common in support of sharing it with a new
system information SRF called pg_config() which makes the same
information available via SQL. Additionally wrap the SRF with a
new system view, as called pg_config.
Patch by me with extensive input and review by Michael Paquier
and additional review by Alvaro Herrera.
The previous RequestAddinLWLocks() method had several disadvantages.
First, the locks would be in the main tranche; we've recently decided
that it's useful for LWLocks used for separate purposes to have
separate tranche IDs. Second, there wasn't any correlation between
what code called RequestAddinLWLocks() and what code called
LWLockAssign(); when multiple modules are in use, it could become
quite difficult to troubleshoot problems where LWLockAssign() ran out
of locks. To fix, create a concept of named LWLock tranches which
can be used either by extension or by core code.
Amit Kapila and Robert Haas
This will enable PL/Java to be cleanly compiled, as dynloader.h is a
requirement.
Report by Chapman Flack
Patch by Michael Paquier
Backpatch through 9.1
This patch reduces pg_am to just two columns, a name and a handler
function. All the data formerly obtained from pg_am is now provided
in a C struct returned by the handler function. This is similar to
the designs we've adopted for FDWs and tablesample methods. There
are multiple advantages. For one, the index AM's support functions
are now simple C functions, making them faster to call and much less
error-prone, since the C compiler can now check function signatures.
For another, this will make it far more practical to define index access
methods in installable extensions.
A disadvantage is that SQL-level code can no longer see attributes
of index AMs; in particular, some of the crosschecks in the opr_sanity
regression test are no longer possible from SQL. We've addressed that
by adding a facility for the index AM to perform such checks instead.
(Much more could be done in that line, but for now we're content if the
amvalidate functions more or less replace what opr_sanity used to do.)
We might also want to expose some sort of reporting functionality, but
this patch doesn't do that.
Alexander Korotkov, reviewed by Petr Jelínek, and rather heavily
editorialized on by me.
pg_ctl is using isatty() to verify whether the process is running in a
terminal, and if not it sends its output to Windows' Event Log ... which
does the wrong thing when the output has been redirected to a pipe, as
reported in bug #13592.
To fix, make pg_ctl use the code we already have to detect service-ness:
in the master branch, move src/backend/port/win32/security.c to src/port
(with suitable tweaks so that it runs properly in backend and frontend
environments); pg_ctl already has access to pgport so it Just Works. In
older branches, that's likely to cause trouble, so instead duplicate the
required code in pg_ctl.c.
Author: Michael Paquier
Bug report and diagnosis: Egon Kocjan
Backpatch: all supported branches
The need for this is shown by the files it missed in Bruce's recent run.
I fixed it so that it will actually adjust the case when needed.
In passing, also make it skip .po files, since those will just get
overwritten anyway from the translation repository.
As of commit d5563d7df, psql -c no longer implies -X, but not all of
our regression testing scripts had gotten that memo.
To ensure consistency of results across different developers, make
sure that *all* invocations of psql in all scripts in our tree
use -X, even where this is not what previously happened.
Michael Paquier and Tom Lane
Previously, each subroutine in initdb fired up its own standalone backend
session. Over time we'd grown as many as fifteen of these sessions,
and the cumulative startup and shutdown work for them was getting pretty
noticeable. Combining things so that all these steps share a single
backend session cuts a good 10% off the total runtime of initdb, more
if you're not fsync'ing.
The main stumbling block to doing this before was that some of the sessions
were run with -j and some not. The improved definition of -j mode
implemented by my previous commit makes it possible to fix that by running
all the post-bootstrap steps with -j; we just have to use double instead of
single newlines to end command strings. (This is only absolutely necessary
around the VACUUM and CREATE DATABASE steps, since those can't be run in a
transaction block. But it seems best to make them all use double newlines
so that the commands remain separate for error-reporting purposes.)
A minor disadvantage is that since initdb can't tell how much of its
output the backend has executed, we can no longer have the per-step
progress reporting initdb used to print. But things are fast enough
nowadays that that's not really all that useful anyway.
In passing, add more const decoration to some of the static arrays in
initdb.c.
Move the content lock directly into the BufferDesc, so that locking and
pinning a buffer touches only one cache line rather than two. Adjust
the definition of BufferDesc slightly so that this doesn't make the
BufferDesc any larger than one cache line (at least on platforms where
a spinlock is only 1 or 2 bytes).
We can't fit the I/O locks into the BufferDesc and stay within one
cache line, so move those to a completely separate tranche. This
leaves a relatively limited number of LWLocks in the main tranche, so
increase the padding of those remaining locks to a full cache line,
rather than allowing adjacent locks to share a cache line, hopefully
reducing false sharing.
Performance testing shows that these changes make little difference
on laptop-class machines, but help significantly on larger servers,
especially those with more than 2 sockets.
Andres Freund, originally based on an earlier patch by Simon Riggs.
Review and cosmetic adjustments (including heavy rewriting of the
comments) by me.
The target is now named 'bincheck' rather than 'tapcheck' so that it
reflects what is checked instead of the test mechanism. Some of the
logic is improved, making it easier to add further sets of TAP based
tests in future. Also, the environment setting logic is imrpoved.
As discussed on -hackers a couple of months ago.
Commit 4a4e6893aa, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
Without CASCADE, if an extension has an unfullfilled dependency on
another extension, CREATE EXTENSION ERRORs out with "required extension
... is not installed". That is annoying, especially when that dependency
is an implementation detail of the extension, rather than something the
extension's user can make sense of.
In addition to CASCADE this also includes a small set of regression
tests around CREATE EXTENSION.
Author: Petr Jelinek, editorialized by Michael Paquier, Andres Freund
Reviewed-By: Michael Paquier, Andres Freund, Jeff Janes
Discussion: 557E0520.3040800@2ndquadrant.com
A Gather executor node runs any number of copies of a plan in an equal
number of workers and merges all of the results into a single tuple
stream. It can also run the plan itself, if the workers are
unavailable or haven't started up yet. It is intended to work with
the Partial Seq Scan node which will be added in future commits.
It could also be used to implement parallel query of a different sort
by itself, without help from Partial Seq Scan, if the single_copy mode
is used. In that mode, a worker executes the plan, and the parallel
leader does not, merely collecting the worker's results. So, a Gather
node could be inserted into a plan to split the execution of that plan
across two processes. Nested Gather nodes aren't currently supported,
but we might want to add support for that in the future.
There's nothing in the planner to actually generate Gather nodes yet,
so it's not quite time to break out the champagne. But we're getting
close.
Amit Kapila. Some designs suggestions were provided by me, and I also
reviewed the patch. Single-copy mode, documentation, and other minor
changes also by me.
The fact that multixact truncations are not WAL logged has caused a fair
share of problems. Amongst others it requires to do computations during
recovery while the database is not in a consistent state, delaying
truncations till checkpoints, and handling members being truncated, but
offset not.
We tried to put bandaids on lots of these issues over the last years,
but it seems time to change course. Thus this patch introduces WAL
logging for multixact truncations.
This allows:
1) to perform the truncation directly during VACUUM, instead of delaying it
to the checkpoint.
2) to avoid looking at the offsets SLRU for truncation during recovery,
we can just use the master's values.
3) simplify a fair amount of logic to keep in memory limits straight,
this has gotten much easier
During the course of fixing this a bunch of additional bugs had to be
fixed:
1) Data was not purged from memory the member's SLRU before deleting
segments. This happened to be hard or impossible to hit due to the
interlock between checkpoints and truncation.
2) find_multixact_start() relied on SimpleLruDoesPhysicalPageExist - but
that doesn't work for offsets that haven't yet been flushed to
disk. Add code to flush the SLRUs to fix. Not pretty, but it feels
slightly safer to only make decisions based on actual on-disk state.
3) find_multixact_start() could be called concurrently with a truncation
and thus fail. Via SetOffsetVacuumLimit() that could lead to a round
of emergency vacuuming. The problem remains in
pg_get_multixact_members(), but that's quite harmless.
For now this is going to only get applied to 9.5+, leaving the issues in
the older branches in place. It is quite possible that we need to
backpatch at a later point though.
For the case this gets backpatched we need to handle that an updated
standby may be replaying WAL from a not-yet upgraded primary. We have to
recognize that situation and use "old style" truncation (i.e. looking at
the SLRUs) during WAL replay. In contrast to before, this now happens in
the startup process, when replaying a checkpoint record, instead of the
checkpointer. Doing truncation in the restartpoint is incorrect, they
can happen much later than the original checkpoint, thereby leading to
wraparound. To avoid "multixact_redo: unknown op code 48" errors
standbys would have to be upgraded before primaries.
A later patch will bump the WAL page magic, and remove the legacy
truncation codepaths. Legacy truncation support is just included to make
a possible future backpatch easier.
Discussion: 20150621192409.GA4797@alap3.anarazel.de
Reviewed-By: Robert Haas, Alvaro Herrera, Thomas Munro
Backpatch: 9.5 for now
The shm_mq mechanism was built to send error (and notice) messages and
tuples between backends. However, shm_mq itself only deals in raw
bytes. Since commit 2bd9e412f9, we have
had infrastructure for one message to redirect protocol messages to a
queue and for another backend to parse them and do useful things with
them. This commit introduces a somewhat analogous facility for tuples
by adding a new type of DestReceiver, DestTupleQueue, which writes
each tuple generated by a query into a shm_mq, and a new
TupleQueueFunnel facility which reads raw tuples out of the queue and
reconstructs the HeapTuple format expected by the executor.
The TupleQueueFunnel abstraction supports reading from multiple tuple
streams at the same time, but only in round-robin fashion. Someone
could imaginably want other policies, but this should be good enough
to meet our short-term needs related to parallel query, and we can
always extend it later.
This also makes one minor addition to the shm_mq API that didn'
seem worth breaking out as a separate patch.
Extracted from Amit Kapila's parallel sequential scan patch. This
code was originally written by me, and then it was revised by Amit,
and then it was revised some more by me.
Naming the individual lwlocks seems like something that may be useful
for other types of debugging, monitoring, or instrumentation output,
but this commit just implements it for the specific case of
trace_lwlocks.
Patch by me, reviewed by Amit Kapila and Kyotaro Horiguchi
The reverted changes did not narrow the semantic gap between the MSVC
build system and the GNU make build system. For targets old and new
that run multiple suites (contribcheck, modulescheck, tapcheck), restore
vcregress.pl to mimicking "make -k" rather than the "make -S" default.
Lack of "-k" would be more burdensome than lack of "-S". Keep changes
reflecting contemporary changes to the GNU make build system, and keep
updates to Makefile parsing. Keep the loss of --psqldir in "check" and
"ecpgcheck" targets; it had been a no-op when used alongside
--temp-install. No log message mentioned any of the reverted changes.
Based on a germ by Michael Paquier. Back-patch to 9.5.
This code relied on knowing exactly where in the source tree temporary
installations might appear. A reasonable hacker may not think to update
this code when adding use of a temporary installation, making it
fragile. Observe that commit 9fa8b0ee90
broke it unnoticed, and commit dcae5facca
fixed it unnoticed. Back-patch to 9.5 only; use of temporary
installations is unlikely to change in released versions.
On Windows, use listen_address=127.0.0.1 to allow TCP connections. We were
already using "pg_regress --config-auth" to set up HBA appropriately. The
standard_initdb helper function now sets up the server's
unix_socket_directories or listen_addresses in the config file, so that
they don't need to be specified in the pg_ctl command line anymore. That
way, the pg_ctl invocations in test programs don't need to differ between
Windows and Unix.
Add another helper function to configure the server's pg_hba.conf to allow
replication connections. The configuration is done similarly to "pg_regress
--config-auth": trust on domain sockets on Unix, and SSPI authentication on
Windows.
Replace calls to "cat" and "touch" programs with built-in perl code, as
those programs don't normally exist on Windows.
Add instructions in the docs on how to install IPC::Run on Windows. Adjust
vcregress.pl to not replace PERL5LIB completely in vcregress.pl, because
otherwise cannot install IPC::Run in a non-standard location easily.
Michael Paquier, reviewed by Noah Misch, some additional tweaking by me.
Build.bat and vcregress.bat got similar treatment years ago. I'm not sure
why install.bat wasn't treated at the same time, but it seems like a good
idea anyway.
The immediate problem with the old install.bat was that it had quoting
issues, and wouldn't work if the target directory's name contained spaces.
This fixes that problem.
This reverts commit 16304a0134, except
for its changes in src/port/snprintf.c; as well as commit
cac18a76bb which is no longer needed.
Fujii Masao reported that the previous commit caused failures in psql on
OS X, since if one exits the pager program early while viewing a query
result, psql sees an EPIPE error from fprintf --- and the wrapper function
thought that was reason to panic. (It's a bit surprising that the same
does not happen on Linux.) Further discussion among the security list
concluded that the risk of other such failures was far too great, and
that the one-size-fits-all approach to error handling embodied in the
previous patch is unlikely to be workable.
This leaves us again exposed to the possibility of the type of failure
envisioned in CVE-2015-3166. However, that failure mode is strictly
hypothetical at this point: there is no concrete reason to believe that
an attacker could trigger information disclosure through the supposed
mechanism. In the first place, the attack surface is fairly limited,
since so much of what the backend does with format strings goes through
stringinfo.c or psprintf(), and those already had adequate defenses.
In the second place, even granting that an unprivileged attacker could
control the occurrence of ENOMEM with some precision, it's a stretch to
believe that he could induce it just where the target buffer contains some
valuable information. So we concluded that the risk of non-hypothetical
problems induced by the patch greatly outweighs the security risks.
We will therefore revert, and instead undertake closer analysis to
identify specific calls that may need hardening, rather than attempt a
universal solution.
We have kept the portion of the previous patch that improved snprintf.c's
handling of errors when it calls the platform's sprintf(). That seems to
be an unalloyed improvement.
Security: CVE-2015-3166
All known standard library implementations of these functions can fail
with ENOMEM. A caller neglecting to check for failure would experience
missing output, information exposure, or a crash. Check return values
within wrappers and code, currently just snprintf.c, that bypasses the
wrappers. The wrappers do not return after an error, so their callers
need not check. Back-patch to 9.0 (all supported versions).
Popular free software standard library implementations do take pains to
bypass malloc() in simple cases, but they risk ENOMEM for floating point
numbers, positional arguments, large field widths, and large precisions.
No specification demands such caution, so this commit regards every call
to a printf family function as a potential threat.
Injecting the wrappers implicitly is a compromise between patch scope
and design goals. I would prefer to edit each call site to name a
wrapper explicitly. libpq and the ECPG libraries would, ideally, convey
errors to the caller rather than abort(). All that would be painfully
invasive for a back-patched security fix, hence this compromise.
Security: CVE-2015-3166
Currently regression tests for python 3 are disabled on MSVC, and these
tests fail with python 3, too, so we have some work to do to enable
both. Meanwhile, all the buildfarm hosts seem to be building with python
2 anyway, so this at least gets us some coverage.
Original patch from Michael Paquier, significantly modified by me.
With this patch the MSVC build and installation will work correctly with
the transforms. However the python transform tests for hstore and ltree
are still disabled pending some further adjustments.
Michael Paquier with some tweaks from me.
Switching the Windows build scripts to use forward slashes instead of
backslashes has caused a couple of issues in VC builds:
- The file tree list was not correctly generated, build script
generating vcproj file missing tree dependencies when listing items in
Filter.
- VC builds do not accept file paths with forward slashes, perhaps it
could be possible to use a Condition but it seems safer to simply
enforce the file paths to use backslashes in the vcproj files.
- chkpass had an unneeded dependency with libpgport and libpgcommon to
make build succeed but actually it is not necessary as crypt.c is
already listed for this project and should be replaced with a fake name
as it is a unique file.
Michael Paquier
This makes it possible to run some stages of these build scripts on
non-Windows systems. That way, we can more easily test whether file
moves or makefile changes might break the MSVC build.
Peter Eisentraut and Michael Paquier
The majority practice is to add -DFRONTEND in directories building files
that are, at other times, built for the backend. Some directories
lacking that property added a noise -DFRONTEND in one build system.
Remove the excess flags, for consistency.
Each of the libraries incorporates src/port files, which often check
FRONTEND. Build systems disagreed on whether to build libpgtypes this
way. Only libecpg incorporates files that rely on it today. Back-patch
to 9.0 (all supported versions) to forestall surprises.
Before, make check-world would create a new temporary installation for
each test suite, which is slow and wasteful. Instead, we now create one
test installation that is used by all test suites that are part of a
make run.
The management of the temporary installation is removed from pg_regress
and handled in the makefiles. This allows for better control, and
unifies the code with that of test suites not run through pg_regress.
review and msvc support by Michael Paquier <michael.paquier@gmail.com>
more review by Fabien Coelho <coelho@cri.ensmp.fr>
These modules have to be installed so that the testing module can access
them. (We don't have that yet, but will soon have it.)
Author: Michael Paquier
Reviewed by: Andrew Dunstan
Modern x86 and x86-64 processors with SSE 4.2 support have special
instructions, crc32b and crc32q, for calculating CRC-32C. They greatly
speed up CRC calculation.
Whether the instructions can be used or not depends on the compiler and the
target architecture. If generation of SSE 4.2 instructions is allowed for
the target (-msse4.2 flag on gcc and clang), use them. If they are not
allowed by default, but the compiler supports the -msse4.2 flag to enable
them, compile just the CRC-32C function with -msse4.2 flag, and check at
runtime whether the processor we're running on supports it. If it doesn't,
fall back to the slicing-by-8 algorithm. (With the common defaults on
current operating systems, the runtime-check variant is what you get in
practice.)
Abhijit Menon-Sen, heavily modified by me, reviewed by Andres Freund.
Now that we use CRC-32C in WAL and the control file, the "traditional" and
"legacy" CRC-32 variants are not used in any frontend programs anymore.
Move the code for those back from src/common to src/backend/utils/hash.
Also move the slicing-by-8 implementation (back) to src/port. This is in
preparation for next patch that will add another implementation that uses
Intel SSE 4.2 instructions to calculate CRC-32C, where available.
This is a long-standing inconsistency that was probably just missed when
we got 64 bit MSVC builds. This brings the platform into line with all
other systems.
As with initdb these programs need to run with a restricted token, and
if they don't pg_upgrade will fail when run as a user with Adminstrator
privileges.
Backpatch to all live branches. On the development branch the code is
reorganized so that the restricted token code is now in a single
location. On the stable bramches a less invasive change is made by
simply copying the relevant code to pg_upgrade.c and pg_resetxlog.c.
Patches and bug report from Muhammad Asif Naeem, reviewed by Michael
Paquier, slightly edited by me.
It worked in my Windows VM with VS2013, but buildfarm animal mastodon,
running MSVC 2005, was not happy. Amit Kapila also reported a similar error
earlier in his environment. Let's see if this helps.
Earlier versions of this tool were available (and still are) on github.
Thanks to Michael Paquier, Alvaro Herrera, Peter Eisentraut, Amit Kapila,
and Satoshi Nagayasu for review.
Instead of copying xlogreader.c and *desc.c files into the source directory,
build them where they are. That's what we do for other binaries that need to
compile and link in files from elsewhere in the source tree.
The commit history suggests that it was done this way because of issues with
older versions of MSVC. I think this should work, but we'll see if the
buildfarm complains.
Since commit cb4a3b04 we were already doing this for the Cygwin/mingw
toolchains, but MSVC had not been updated to do it. At Install.pm time,
the Makefile (or GNUmakefile) is inspected, and if a line matching
SO_MAJOR_VERSION is found (indicating a shared library is being built),
then files with the .dll extension are set to be installed in bin/
rather than lib/, while files with .lib extension are installed in lib/.
This makes the MSVC toolchain up to date with cygwin/mingw.
This removes ad-hoc hacks that were copying files into bin/ or lib/
manually (libpq.dll in particular was already being copied into bin).
So while this is a rather ugly kludge, it's still cleaner than what was
there before.
Author: Michael Paquier
Reviewed by: Asif Naeem
Previously, you could do \set variable operand1 operator operand2, but
nothing more complicated. Now, you can \set variable expression, which
makes it much simpler to do multi-step calculations here. This also
adds support for the modulo operator (%), with the same semantics as in
C.
Robert Haas and Fabien Coelho, reviewed by Álvaro Herrera and
Stephen Frost
Code like
open(P, "cl /? 2>&1 |") || die "cl command not found";
does not actually catch any errors, because the exit status of the
command before the pipe is ignored. The fix is to look at $?.
This also gave the opportunity to clean up the logic of this code a bit.
The meta data of PGLZ symbolized by PGLZ_Header is removed, to make
the compression and decompression code independent on the backend-only
varlena facility. PGLZ_Header is being used to store some meta data
related to the data being compressed like the raw length of the uncompressed
record or some varlena-related data, making it unpluggable once PGLZ is
stored in src/common as it contains some backend-only code paths with
the management of varlena structures. The APIs of PGLZ are reworked
at the same time to do only compression and decompression of buffers
without the meta-data layer, simplifying its use for a more general usage.
On-disk format is preserved as well, so there is no incompatibility with
previous major versions of PostgreSQL for TOAST entries.
Exposing compression and decompression APIs of pglz makes possible its
use by extensions and contrib modules. Especially this commit is required
for upcoming WAL compression feature so that the WAL reader facility can
decompress the WAL data by using pglz_decompress.
Michael Paquier, reviewed by me.
Backpatch to 9.3 where src/common was introduce, because a bugfix that
needs to be backpatched, requires the function. Earlier branches will
have to duplicate the code.
Windows versions later than Windows Server 2003 map "localhost" to ::1.
Account for that in the generated pg_hba.conf, fixing another oversight
in commit f6dc6dd5ba. Back-patch to 9.0,
like that commit.
David Rowley and Noah Misch
This reverts commit 60838df922.
That change needs a bit more thought to be workable. In view of
the potentially machine-dependent stuff that went in today,
we need all of the buildfarm to be testing those other changes.
Exposing compression and decompression APIs of pglz makes possible its
use by extensions and contrib modules. pglz_decompress contained a call
to elog to emit an error message in case of corrupted data. This function
is changed to return a status code to let its callers return an error instead.
This commit is required for upcoming WAL compression feature so that
the WAL reader facility can decompress the WAL data by using pglz_decompress.
Michael Paquier
Back-patch to 9.0 (all supported versions). This is mere
future-proofing in the context of the master branch, but commit
f6dc6dd5ba requires it of older branches.
Use SSPI authentication to allow connections exclusively from the OS
user that launched the test suite. This closes on Windows the
vulnerability that commit be76a6d39e
closed on other platforms. Users of "make installcheck" or custom test
harnesses can run "pg_regress --config-auth=DATADIR" to activate the
same authentication configuration that "make check" would use.
Back-patch to 9.0 (all supported versions).
Security: CVE-2014-0067
The old pattern would match files with strange extensions like *.ry or
*.lpp. Refactor it to only include files with known extensions, and to make
it more readable.
Per Andrew Dunstan's suggestion.
These comments don't seem to have been touched in a long time. Make them
describe the current implementation rather than what was here last century,
and be a bit more explicit about the unreferenced-typedefs issue.
pg_atomic_init_u64 (indirectly) uses compare/exchange to guarantee
atomic writes on platforms where compare/exchange is available, but
64bit writes aren't atomic (yes, those exist). That leads to a
harmless read of the initial value of variable.
As 'ALTER TABLESPACE .. MOVE ALL' really didn't change the tablespace
but instead changed objects inside tablespaces, it made sense to
rework the syntax and supporting functions to operate under the
'ALTER (TABLE|INDEX|MATERIALIZED VIEW)' syntax and to be in
tablecmds.c.
Pointed out by Alvaro, who also suggested the new syntax.
Back-patch to 9.4.
In support of this, have the MSVC build follow GNU make in preferring
GNUmakefile over Makefile when a directory contains both.
Michael Paquier, reviewed by MauMau.
This refactoring is in preparation for adding support for other SSL
implementations, with no user-visible effects. There are now two #defines,
USE_OPENSSL which is defined when building with OpenSSL, and USE_SSL which
is defined when building with any SSL implementation. Currently, OpenSSL is
the only implementation so the two #defines go together, but USE_SSL is
supposed to be used for implementation-independent code.
The libpq SSL code is changed to use a custom BIO, which does all the raw
I/O, like we've been doing in the backend for a long time. That makes it
possible to use MSG_NOSIGNAL to block SIGPIPE when using SSL, which avoids
a couple of syscall for each send(). Probably doesn't make much performance
difference in practice - the SSL encryption is expensive enough to mask the
effect - but it was a natural result of this refactoring.
Based on a patch by Martijn van Oosterhout from 2006. Briefly reviewed by
Alvaro Herrera, Andreas Karlsson, Jeff Janes.
ws2_32 is the new version of the library that should be used, as
it contains the require functionality from wsock32 as well as some
more (which is why some binaries were already using ws2_32).
Michael Paquier, reviewed by MauMau
Prominent binaries already had this metadata. A handful of minor
binaries, such as pg_regress.exe, still lack it; efforts to eliminate
such exceptions are welcome.
Michael Paquier, reviewed by MauMau.
Unlike "make" itself, the MSVC build process recognized a continuation
even with whitespace after the backslash. (Due to a typo, some code
sites accepted the letter "s" instead of whitespace). Also, it would
consume any number of newlines following a single backslash. This is
mere cleanup; those behaviors were unlikely to cause bugs.
Achieve this by consistently using four-argument Solution::AddProject()
calls. Remove ad hoc Makefile parsing made redundant by doing that.
Michael Paquier and Noah Misch, reviewed by MauMau.
Catch up with commit b8cc8f94730610c0189aa82dfec4ae6ce9b13e34's
introduction of the HAVE_UUID_OSSP symbol to the principal build
process. Back-patch to 9.4, where that commit appeared.
This function is pervasive on free software operating systems; import
NetBSD's implementation. Back-patch to 8.4, like the commit that will
harness it.
The recent addition of regression tests to uuid-ossp exposed the fact
that the MSVC build system wasn't being consistent about whether it was
building/testing that contrib module, ie, it would try to test the module
even when it hadn't built it. The same hazard was latent for sslinfo.
For the moment I just copied the more up-to-date logic from point A to
point B, but this is screaming for refactoring.
Per buildfarm results.
config.pl and buildenv.pl can be used to customize build settings when
using MSVC. They should never get committed into the common source tree.
Back-patch to 9.0; it looks like the rules were different in 8.4.
Michael Paquier
It's easy to forget using SYSTEMQUOTEs when constructing command strings
for system() or popen(). Even if we fix all the places missing it now, it is
bound to be forgotten again in the future. Introduce wrapper functions that
do the the extra quoting for you, and get rid of SYSTEMQUOTEs in all the
callers.
We previosly used SYSTEMQUOTEs in all the hard-coded command strings, and
this doesn't change the behavior of those. But user-supplied commands, like
archive_command, restore_command, COPY TO/FROM PROGRAM calls, as well as
pgbench's \shell, will now gain an extra pair of quotes. That is desirable,
but if you have existing scripts or config files that include an extra
pair of quotes, those might need to be adjusted.
Reviewed by Amit Kapila and Tom Lane
Now that we're pretty much feature-frozen, it's time to update the checks
on system catalog foreign-key references.
(It looks like we missed doing this altogether for 9.3. Sigh.)
This has probably been broken for quite a long time. Buildfarm member
currawong's current results suggest that it's been broken since 9.1, so
backpatch this to that branch.
This only supports Python 2 - I will handle Python 3 separately, but
this is a fairly simple fix.
In order for this to work, walsenders need the optional ability to
connect to a database, so the "replication" keyword now allows true
or false, for backward-compatibility, and the new value "database"
(which causes the "dbname" parameter to be respected).
walsender needs to loop not only when idle but also when sending
decoded data to the user and when waiting for more xlog data to decode.
This means that there are now three separate loops inside walsender.c;
although some refactoring has been done here, this is still a bit ugly.
Andres Freund, with contributions from Álvaro Herrera, and further
review by me.
This feature, building on previous commits, allows the write-ahead log
stream to be decoded into a series of logical changes; that is,
inserts, updates, and deletes and the transactions which contain them.
It is capable of handling decoding even across changes to the schema
of the effected tables. The output format is controlled by a
so-called "output plugin"; an example is included. To make use of
this in a real replication system, the output plugin will need to be
modified to produce output in the format appropriate to that system,
and to perform filtering.
Currently, information can be extracted from the logical decoding
system only via SQL; future commits will add the ability to stream
changes via walsender.
Andres Freund, with review and other contributions from many other
people, including Álvaro Herrera, Abhijit Menon-Sen, Peter Gheogegan,
Kevin Grittner, Robert Haas, Heikki Linnakangas, Fujii Masao, Abhijit
Menon-Sen, Michael Paquier, Simon Riggs, Craig Ringer, and Steve
Singer.
The ASLR in Windows 8/Windows 2012 can break PostgreSQL's shared memory. It
doesn't fail every time (which is explained by the Random part in ASLR), but
can fail with errors abut failing to reserve shared memory region.
MauMau, reviewed by Craig Ringer
This should make the MSVC build act more like builds for other platforms,
i.e. backend global variables will be automatically available to loadable
libraries without need for explicit PGDLLIMPORT marking.
Craig Ringer
Providing this information as plain text was doubtless worth the trouble
ten years ago, but it seems likely that hardly anyone reads it in this
format anymore. And the effort required to maintain these files (in the
form of extra-complex markup rules in the relevant parts of the SGML
documentation) is significant. So, let's stop doing that and rely solely
on the other documentation formats.
Per discussion, the plain-text INSTALL instructions might still be worth
their keep, so we continue to generate that file.
Rather than remove HISTORY and src/test/regress/README from distribution
tarballs entirely, replace them with simple stub files that tell the reader
where to find the relevant documentation. This is mainly to avoid possibly
breaking packaging recipes that expect these files to exist.
Back-patch to all supported branches, because simplifying the markup
requirements for release notes won't help much unless we do it in all
branches.
Replication slots are a crash-safe data structure which can be created
on either a master or a standby to prevent premature removal of
write-ahead log segments needed by a standby, as well as (with
hot_standby_feedback=on) pruning of tuples whose removal would cause
replication conflicts. Slots have some advantages over existing
techniques, as explained in the documentation.
In a few places, we refer to the type of replication slots introduced
by this patch as "physical" slots, because forthcoming patches for
logical decoding will also have slots, but with somewhat different
properties.
Andres Freund and Robert Haas
This makes it possible to store lwlocks as part of some other data
structure in the main shared memory segment, or in a dynamic shared
memory segment. There is still a main LWLock array and this patch does
not move anything out of it, but it provides necessary infrastructure
for doing that in the future.
This change is likely to increase the size of LWLockPadded on some
platforms, especially 32-bit platforms where it was previously only
16 bytes.
Patch by me. Review by Andres Freund and KaiGai Kohei.
In the MSVC build system we've never separated krb5 from gss,
and always built them both. Since the removal of native krb5
support, this parameter only controls GSSAPI, so rename it
accordingly.
krb5 has been deprecated since 8.3, and the recommended way to do
Kerberos authentication is using the GSSAPI authentication method
(which is still fully supported).
libpq retains the ability to identify krb5 authentication, but only
gives an error message about it being unsupported. Since all authentication
is initiated from the backend, there is no need to keep it at all
in the backend.
This adds a 'MOVE' sub-command to ALTER TABLESPACE which allows moving sets of
objects from one tablespace to another. This can be extremely handy and avoids
a lot of error-prone scripting. ALTER TABLESPACE ... MOVE will only move
objects the user owns, will notify the user if no objects were found, and can
be used to move ALL objects or specific types of objects (TABLES, INDEXES, or
MATERIALIZED VIEWS).
Previously, lookups of non-existent user names could return "Success";
it will now return "User does not exist" by resetting errno. This also
centralizes the user name lookup code in libpgport.
Report and analysis by Nicolas Marchildon; patch by me
When wal_level=logical, we'll log columns from the old tuple as
configured by the REPLICA IDENTITY facility added in commit
07cacba983. This makes it possible
a properly-configured logical replication solution to correctly
follow table updates even if they change the chosen key columns,
or, with REPLICA IDENTITY FULL, even if the table has no key at
all. Note that updates which do not modify the replica identity
column won't log anything extra, making the choice of a good key
(i.e. one that will rarely be changed) important to performance
when wal_level=logical is configured.
Each insert, update, or delete to a catalog table will also log
the CMIN and/or CMAX values of stamped by the current transaction.
This is necessary because logical decoding will require access to
historical snapshots of the catalog in order to decode some data
types, and the CMIN/CMAX values that we may need in order to judge
row visibility may have been overwritten by the time we need them.
Andres Freund, reviewed in various versions by myself, Heikki
Linnakangas, KONDO Mitsumasa, and many others.
Many committers seem to now be using a work flow in which back-patched
commits are timestamped minutes or even hours apart in different branches
(most likely because they commit in one branch before starting work on
the next one). git_changelog was failing to merge its reports in such
cases, so increase the max time it's willing to merge commits across.
I considered getting rid of the limit altogether, but that produces
some odd results in terms of how the merged commit gets sorted relative
to unrelated commits.
The C and POSIX standards state that strncpy's behavior is undefined when
source and destination areas overlap. While it remains dubious whether any
implementations really misbehave when the pointers are exactly equal, some
platforms are now starting to force the issue by complaining when an
undefined call occurs. (In particular OS X 10.9 has been seen to dump core
here, though the exact set of circumstances needed to trigger that remain
elusive. Similar behavior can be expected to be optional on Linux and
other platforms in the near future.) So tweak the code to explicitly do
nothing when nothing need be done.
Back-patch to all active branches. In HEAD, this also lets us get rid of
an exception in valgrind.supp.
Per discussion of a report from Matthias Schmitt.
asprintf(), aside from not being particularly portable, has a fundamentally
badly-designed API; the psprintf() function that was added in passing in
the previous patch has a much better API choice. Moreover, the NetBSD
implementation that was borrowed for the previous patch doesn't work with
non-C99-compliant vsnprintf, which is something we still have to cope with
on some platforms; and it depends on va_copy which isn't all that portable
either. Get rid of that code in favor of an implementation similar to what
we've used for many years in stringinfo.c. Also, move it into libpgcommon
since it's not really libpgport material.
I think this patch will be enough to turn the buildfarm green again, but
there's still cosmetic work left to do, namely get rid of pg_asprintf()
in favor of using psprintf(). That will come in a followon patch.
Continuing 63f32f3416, libpgcommon should
depend on libpgport, but not vice versa. But wait_result_to_str() in
wait_error.c depends on pstrdup() in libpgcommon. So move exec.c and
wait_error.c from libpgport to libpgcommon. Also switch the link order
in the place that's actually used by the failing ecpg builds.
The function declarations have been left in port.h for now. That should
perhaps be separated sometime.
Commit 95ef6a3448 removed the
ability to create rules on an individual column as of 7.3, but
left some residual code which has since been useless. This cleans
up that dead code without any change in behavior other than
dropping the useless column from the catalog.
Update emacs.samples with new configuration snippets that match pgindent
et al. formatting more accurately and follow Emacs Lisp best practices
better.
Add .dir-locals.el with a subset of that configuration for casual
editing and viewing.
Reviewed-by: Dimitri Fontaine <dimitri@2ndQuadrant.fr>
Reviewed-by: Noah Misch <noah@leadboat.com>
Treat TOAST index just the same as normal one and get the OID
of TOAST index from pg_index but not pg_class.reltoastidxid.
This change allows us to handle multiple TOAST indexes, and
which is required infrastructure for upcoming
REINDEX CONCURRENTLY feature.
Patch by Michael Paquier, reviewed by Andres Freund and me.
Valgrind "client requests" in aset.c and mcxt.c teach Valgrind and its
Memcheck tool about the PostgreSQL allocator. This makes Valgrind
roughly as sensitive to memory errors involving palloc chunks as it is
to memory errors involving malloc chunks. Further client requests in
PageAddItem() and printtup() verify that all bits being added to a
buffer page or furnished to an output function are predictably-defined.
Those tests catch failures of C-language functions to fully initialize
the bits of a Datum, which in turn stymie optimizations that rely on
_equalConst(). Define the USE_VALGRIND symbol in pg_config_manual.h to
enable these additions. An included "suppression file" silences nominal
errors we don't plan to fix.
Reviewed in earlier versions by Peter Geoghegan and Korry Douglas.
New infrastructure is added which creates a set number of workers
(threads on Windows, forked processes on Unix). Jobs are then
handed out to these workers by the master process as needed.
pg_restore is adjusted to use this new infrastructure in place of the
old setup which created a new worker for each step on the fly. Parallel
dumps acquire a snapshot clone in order to stay consistent, if
available.
The parallel option is selected by the -j / --jobs command line
parameter of pg_dump.
Joachim Wieland, lightly editorialized by Andrew Dunstan.
This appears to cause some intermittent file system problems
on Windows 8. Instead, set up the old data directory in its
intended final location to start with.
I had thought we weren't using this version of pqsignal() at all on
Windows, but that's wrong --- initdb is using it (and coping with the
POSIX-ish semantics of bare signal() :-(). So allow the file to be
built in WIN32+FRONTEND case, and add it to the MSVC build logic.
The previous commit didn't work on MSVC editions earlier than
Visual Studio 2011, apparently. This works by copying files into the
contrib directory, and making provision to clean them up, which should
work on all editions.
This enables non-backend code, such as pg_xlogdump, to use it easily.
The previous location, in src/backend/catalog/catalog.c, made that
essentially impossible because that file depends on many backend-only
facilities; so this needs to live separately.
The previous order of steps didn't literally work, because git clean
-fdx would delete the downloaded typedefs.list. Also, pgindent needs to
be called with a path when one is in at the top of the build tree.
libpgcommon is a new static library to allow sharing code among the
various frontend programs and backend; this lets us eliminate duplicate
implementations of common routines. We avoid libpgport, because that's
intended as a place for porting issues; per discussion, it seems better
to keep them separate.
The first use case, and the only implemented by this patch, is pg_malloc
and friends, which many frontend programs were already using.
At the same time, we can use this to provide palloc emulation functions
for the frontend; this way, some palloc-using files in the backend can
also be used by the frontend cleanly. To do this, we change palloc() in
the backend to be a function instead of a macro on top of
MemoryContextAlloc(). This was previously believed to cause loss of
performance, but this implementation has been tweaked by Tom and Andres
so that on modern compilers it provides a slight improvement over the
previous one.
This lets us clean up some places that were already with
localized hacks.
Most of the pg_malloc/palloc changes in this patch were authored by
Andres Freund. Zoltán Böszörményi also independently provided a form of
that. libpgcommon infrastructure was authored by Álvaro.
This ensure the version number increases over time. The first three digits
in the version number is still set to the actual PostgreSQL version
number, but the last one is intended to be an ever increasing build number,
which previosly failed when it changed between 1, 2 and 3 digits long values.
Noted by Deepak
Get rid of the fundamentally indefensible assumption that "long long int"
exists and is exactly 64 bits wide on every platform Postgres runs on.
Instead let the configure script select the type to use for "pg_int64".
This is a bit of a pain in the rear since we do not want to pollute client
namespace with all the random symbols that pg_config.h defines; instead
we have to create a separate generated header file, "pg_config_ext.h".
But now that the infrastructure is there, we might have the ability to
add some other stuff that's long been wanting in this area.
This makes the naming inside plpgsql consistent and distinguishes the
file from the backend's gram.y file. It will also allow easier
refactoring of the bison make rules later on.
This script is a bit slow, but still it only takes a fraction of the time
the bison run does, so the overhead doesn't seem intolerable. And we
definitely need some mechanical aid here, because people keep missing
the need to add new keywords to the appropriate keyword-list production.
While at it, I moved check_keywords.pl from src/tools into
src/backend/parser where it's actually used, and did some very minor
cleanup on the script.
This was removed in commit cd00406774,
we're not quite sure why, but there have been reports of crashes due
to AS Perl being built with it when we are not, and it certainly
seems like the right thing to do. There is still some uncertainty
as to why it sometimes fails and sometimes doesn't.
Original patch from Owais Khani, substantially reworked and
extended by Andrew Dunstan.
We seem to have a rough policy that our Perl scripts should work with
Perl 5.8, so make this one do so. Main change is to not use the newfangled
\h character class in regexes; "[ \t]" is a serviceable replacement.
This includes fixing the MSVC copy of ecpg/preproc's version info, which
seems to have been overlooked repeatedly. Can't we fix that so there are
not two copies??
The header file is needed by any module that wants to use the PL/pgSQL
instrumentation plugin interface. Most notably, the pldebugger plugin needs
this. With this patch, it can be built using pgxs, without having the full
server source tree available.
This makes it much more convenient to build tools for Postgres that are
separately compiled and require a matching CRC implementation.
To prevent multiple copies of the CRC polynomial tables being introduced
into the postgres binaries, they are now included in the static library
libpgport that is mainly meant for replacement system functions. That
seems like a bit of a kludge, but there's no better place.
This cleans up building of the tools pg_controldata and pg_resetxlog,
which previously had to build their own copies of pg_crc.o.
In the future, external programs that need access to the CRC tables can
include the tables directly from the new header file pg_crc_tables.h.
Daniel Farina, reviewed by Abhijit Menon-Sen and Tom Lane
Per recent work by Peter Geoghegan, it's significantly faster to
tuplesort on a single sortkey if ApplySortComparator is inlined into
quicksort rather reached via a function pointer. It's also faster
in general to have a version of quicksort which is specialized for
sorting SortTuple objects rather than objects of arbitrary size and
type. This requires a couple of additional copies of the quicksort
logic, which in this patch are generate using a Perl script. There
might be some benefit in adding further specializations here too,
but thus far it's not clear that those gains are worth their weight
in code footprint.
This fixes a longstanding but up to now benign bug in the way pg_dumpall
was built. The bug was exposed by recent code adjustments. The Makefile
does not use $(OBJS) to build pg_dumpall, so this fix removes their source
files from the pg_dumpall object and adds in the one source file it
consequently needs.
distro version of perl.
David Wheeler and Alex Hunsaker.
Backpatch to 9.1 where it applies cleanly. A simple workaround is available for earlier
branches, and further effort doesn't seem warranted.
We should generally use left-recursion not right-recursion to parse lists.
Bison hasn't got any built-in way to check for this type of inefficiency,
and I didn't find anything on the net in a quick search, so I wrote a
little Perl script to do it. Add to src/tools/ so we don't have to
re-invent this wheel next time we wonder if we're doing anything stupid.
Currently, the only place that seems to need fixing is plpgsql's stmt_else
production, so the problem doesn't appear to be common enough to warrant
trying to include such a test in our standard build process. If we did
want to do that, we'd need a way to ignore some false positives, such as
a_expr := '-' a_expr
Add option for parallel streaming of the transaction log while a
base backup is running, to get the logfiles before the server has
removed them.
Also add a tool called pg_receivexlog, which streams the transaction
log into files, creating a log archive without having to wait for
segments to complete, thus decreasing the window of data loss without
having to waste space using archive_timeout. This works best in
combination with archive_command - suggested usage docs etc coming later.
Use the CommitDate not the AuthorDate, as the former is representative of
the order in which things went into the main repository, and the latter
isn't very; we now have instances where the AuthorDate is as much as a
month before the patch really went in. Also, get rid of the "commit order
inversions" heuristic, which turns out not to do anything very desirable.
Instead we just print commits in strict timestamp order, interpreting the
"timestamp" of a merged commit as its timestamp on the newest branch it
appears in. This fixes some cases where very ancient commits were being
printed relatively early in the report.
on Windows. ecpglib doesn't link with libpgport, but picks and compiles
the .c files it needs individually. To cope with that, move the setlocale()
wrapper from chklocale.c to a separate setlocale.c file, and include that
in ecpglib.
their own.
Avoid compile problems with defines being redefined after the removal of
the #if blocks.
Change script to use shell functions for simplicity.
In the process, remove almost all knowledge of individual .y and .l files,
and instead get invocation settings from the relevant make files.
The exception is plpgsql's gram.y, which has a target with a different
name. It is hoped that this will make the scripts more future-proof,
so that they won't require adjustment every time we add a new .l or .y
file.
The logic is also notably less tortured than that forced on us
by the idiosyncrasies of the Windows command processor.
The .bat files are kept as thin wrappers for the perl scripts.
The old .bat file wasn't working for reasons that are unclear, and
which it did not seem worth the trouble to ascertain.
The new perl script has been tested and is known to work.
Soon it will be tested regularly on the buildfarm.
The .bat file is kept as a simple wrapper for the perl script.
The MSVC compiler complains if a macro is called with less arguments
than its definition provides for. flex generates a macro with one
argument for yywrap, but only supplies the argument for reentrant
scanners, so we remove the useless argument in the non-reentrant
case to silence the warning.
Teach the program and script to deal with OID-array referencing columns,
which we now have several of. Also, modify the recommended usage process
to specify that the program should be run against the regression database
rather than template1. This lets it find numerous joins that cannot be
found in template1 because the relevant catalogs are entirely empty.
Together these changes add seventeen formerly-missed cases to the oidjoins
regression test.
Since both tarballs and git now result in a "postgresql" directory
rather than a "pgsql" directory, adjust the example configuration to
look for the former.
... for some value of "properly". Instead of overriding REGRESS_OPTS,
set the variables ENCODING and NO_LOCALE, which is more expressive and
allows overriding by the user. Fix vcregress.pl to handle that.
The original scheme for this was to symlink plpython.$DLSUFFIX to
plpython2.$DLSUFFIX, but that doesn't work on Windows, and only
accidentally failed to fail because of the way that CREATE LANGUAGE created
or didn't create new C functions. My changes of yesterday exposed the
weakness of that approach. To fix, get rid of the symlink and make
pg_pltemplate show what's really going on.
This mostly just involves creating control, install, and
update-from-unpackaged scripts for them. However, I had to adjust plperl
and plpython to not share the same support functions between variants,
because we can't put the same function into multiple extensions.
catversion bump forced due to new contents of pg_pltemplate, and because
initdb now installs plpgsql as an extension not a bare language.
Add support for regression testing these as extensions not bare
languages.
Fix a couple of other issues that popped up while testing this: my initial
hack at pg_dump binary-upgrade support didn't work right, and we don't want
an extra schema permissions test after all.
Documentation changes still to come, but I'm committing now to see
whether the MSVC build scripts need work (likely they do).
This provides a separate exception class for each error code that the
backend defines, as well as the ability to get the SQLSTATE from the
exception object.
Jan Urbański, reviewed by Steve Singer
Until now, our Serializable mode has in fact been what's called Snapshot
Isolation, which allows some anomalies that could not occur in any
serialized ordering of the transactions. This patch fixes that using a
method called Serializable Snapshot Isolation, based on research papers by
Michael J. Cahill (see README-SSI for full references). In Serializable
Snapshot Isolation, transactions run like they do in Snapshot Isolation,
but a predicate lock manager observes the reads and writes performed and
aborts transactions if it detects that an anomaly might occur. This method
produces some false positives, ie. it sometimes aborts transactions even
though there is no anomaly.
To track reads we implement predicate locking, see storage/lmgr/predicate.c.
Whenever a tuple is read, a predicate lock is acquired on the tuple. Shared
memory is finite, so when a transaction takes many tuple-level locks on a
page, the locks are promoted to a single page-level lock, and further to a
single relation level lock if necessary. To lock key values with no matching
tuple, a sequential scan always takes a relation-level lock, and an index
scan acquires a page-level lock that covers the search key, whether or not
there are any matching keys at the moment.
A predicate lock doesn't conflict with any regular locks or with another
predicate locks in the normal sense. They're only used by the predicate lock
manager to detect the danger of anomalies. Only serializable transactions
participate in predicate locking, so there should be no extra overhead for
for other transactions.
Predicate locks can't be released at commit, but must be remembered until
all the transactions that overlapped with it have completed. That means that
we need to remember an unbounded amount of predicate locks, so we apply a
lossy but conservative method of tracking locks for committed transactions.
If we run short of shared memory, we overflow to a new "pg_serial" SLRU
pool.
We don't currently allow Serializable transactions in Hot Standby mode.
That would be hard, because even read-only transactions can cause anomalies
that wouldn't otherwise occur.
Serializable isolation mode now means the new fully serializable level.
Repeatable Read gives you the old Snapshot Isolation level that we have
always had.
Kevin Grittner and Dan Ports, reviewed by Jeff Davis, Heikki Linnakangas and
Anssi Kääriäinen
This fixes make distprep, and seems more robust in other ways as well.
Some special handling is required because errcodes.txt is needed by
some stuff in src/port, but just by src/backend as is the case for the
other generated headers.
While I'm at it, fix a few other things that were overlooked in the
original patch.
src/pl/plpgsql/src/plerrcodes.h, src/include/utils/errcodes.h, and a
big chunk of errcodes.sgml are now automatically generated from a single
file, src/backend/utils/errcodes.txt.
Jan Urbański, reviewed by Tom Lane.
This tool makes it possible to do the pg_start_backup/
copy files/pg_stop_backup step in a single command.
There are still some steps to be done before this is a
complete backup solution, such as the ability to stream
the required WAL logs, but it's still usable, and
could do with some buildfarm coverage.
In passing, make the checkpoint request optionally
fast instead of hardcoding it.
Magnus Hagander, reviewed by Fujii Masao and Dimitri Fontaine
Makes it easier to parse mainly the BASE_BACKUP command
with it's options, and avoids having to manually deal
with quoted identifiers in the label (previously broken),
and makes it easier to add new commands and options in
the future.
In passing, refactor the case statement in the walsender
to put each command in it's own function.
This is slower than the original coding but avoids the problem of
including files in an unpredictable order. Aside from being more
trustworthy, we can get rid of some exclusions that were formerly
made for what turn out to be ordering or re-inclusion problems.
I also modified it to include libpq's exported files in the check.
ecpg should be included as well, but I'm unclear on which ecpg .h
files are meant to be included by clients.
Replace for loops in makefiles with proper dependencies. Parallel
make can now span across directories. Also, make -k and make -q work
properly.
GNU make 3.80 or newer is now required.
Look only at the non-localized part of the output from "vcbuild /?",
which is used to determine the version of Visual Studio in use. Different
languages seem to localize different amounts of the string, but we assume
the part "Microsoft Visual C++" won't be modified.
1. Resurrect the behavior where old commits on master will have Branch:
labels for branches sprouted after the commit was made. I'm still
dubious about this mode, but if you want it, say --post-date or -p.
2. Annotate the Branch: labels with the release or branch in which the
commit was publicly released. For example, on a release branch you could
see
Branch: REL8_3_STABLE Release: REL8_3_2 [92c3a8004] 2008-03-29 00:15:37 +0000
showing that the fix was released in 8.3.2. Commits on master will
usually instead have notes like
Branch: master Release: REL8_4_BR [6fc9d4272] 2008-03-29 00:15:28 +0000
showing that this commit is ancestral to release branches 8.4 and later.
If no Release: marker appears, the commit hasn't yet made it into any
release.
3. Add support for release branches older than 7.4.
4. The implementation is improved by running git log on each branch only
back to where the branch sprouts from master. This saves a good deal
of time (about 50% of the runtime when generating the complete history).
We generate the post-date-mode tags via a direct understanding that
they should be applied to master commits made before the branch sprouted,
rather than backing into them via matching (which isn't any too
reliable when people used identical log messages for successive commits).
1. Don't assume there's only one candidate match; check them all and use the
one with the closest timestamp. Avoids funny output when someone makes
several successive commits with the same log message, as certain people
have been known to do.
2. When the same commit (with the same SHA1) is reachable from multiple
branch tips, don't report it for all the branches; instead report it only
for the first such branch. Given our development practices, this case
arises only for commits that occurred before a given branch split off from
master. The original coding blamed old commits on *all* the branches,
which isn't terribly useful; the new coding blames such a commit only on
master.
1. Don't forget the last (oldest) commit on the oldest branch.
2. When considering which commit to print next, if two alternatives have
the same "distortion" score (which is actually the normal case, since
generally the "distortion" is 0), then choose the later timestamp to
print first. I don't know where Robert got the idea to ignore timestamps
and sort by branch age, but it wasn't a good idea: the resulting ordering
of commits was just plain bizarre anywhere that some branches had many
fewer commits than others, which is the typical situation for us.
Avoid depending on Date::Calc, which isn't in a basic Perl installation,
when we can equally well use Time::Local which is. Also fix the parsing
of timestamps to take heed of the timezone. (It looks like cvs2git emitted
all commit timestamps with zone GMT, so this refinement might've looked
unnecessary when looking at converted data; but it's needed now.)
Fix parsing of message bodies so that blank lines that may or may not get
emitted by "git log" aren't confused with real data. This avoids strange
formatting of the oldest commit on a branch.
Check child-process exit status, so that we actually notice if "git log"
fails, and so that we don't accumulate zombie children.
This script is intended to substitute for cvs2cl in generating release
notes and scrutinizing what got back-patched to which branches.
Script by me. Support for --since by Alex Hunsaker.
wait until it is set. Latches can be used to reliably wait until a signal
arrives, which is hard otherwise because signals don't interrupt select()
on some platforms, and even when they do, there's race conditions.
On Unix, latches use the so called self-pipe trick under the covers to
implement the sleep until the latch is set, without race conditions. On
Windows, Windows events are used.
Use the new latch abstraction to sleep in walsender, so that as soon as
a transaction finishes, walsender is woken up to immediately send the WAL
to the standby. This reduces the latency between master and standby, which
is good.
Preliminary work by Fujii Masao. The latch implementation is by me, with
helpful comments from many people.
linking both executables and shared libraries, and we add on LDFLAGS_EX when
linking executables or LDFLAGS_SL when linking shared libraries. This
provides a significantly cleaner way of dealing with link-time switches than
the former behavior. Also, make sure that the various platform-specific
%.so: %.o rules incorporate LDFLAGS and LDFLAGS_SL; most of them missed that
before. (I did not add these variables for the platforms that invoke $(LD)
directly, however. It's not clear if we can do that safely, since for the
most part we assume these variables use CC command-line syntax.)
Per gripe from Aaron Swenson and subsequent investigation.
prefix, instead of assuming it will always be following the default layout.
All information we need is not available on Windows, but the number of
assumptions are at least fewer this way than before.
Based on suggestions from James William Pye.
and allow using config.pl to override the defaults. config.pl is removed from
the repository, so changes there will no longer show up when doing diff, and
will not prevent switching branches and such things.
config.pl would normally be used to override single values, but if an
old-style config.pl is read, it will override the entire default configuration,
making it backwards compatible.
symbols both using __declspec(dllexport) (via the PGDLLIMPORT macro) and using
full-dll-export. This works without warning on Win32, but not on Win64.
In passing, fix the fact that the framework could never deal with more than
one disbled linker warning - because MSVC wants commas between linker warnings,
and semicolons between compiler warnings...
pg_attribute, by having genbki.pl derive the information from the various
catalog header files. This greatly simplifies modification of the
"bootstrapped" catalogs.
This patch finally kills genbki.sh and Gen_fmgrtab.sh; we now rely entirely on
Perl scripts for those build steps. To avoid creating a Perl build dependency
where there was not one before, the output files generated by these scripts
are now treated as distprep targets, ie, they will be built and shipped in
tarballs. But you will need a reasonably modern Perl (probably at least
5.6) if you want to build from a CVS pull.
The changes to the MSVC build process are untested, and may well break ---
we'll soon find out from the buildfarm.
John Naylor, based on ideas from Robert Haas and others
directly. This was a lot of trouble, but should be worth it in terms of
not having to keep the plpgsql lexer in step with core anymore. In addition
the handling of keywords is significantly better-structured, allowing us to
de-reserve a number of words that plpgsql formerly treated as reserved.
hand-assigned rowtype OIDs, even when they are not "bootstrapped" catalogs
that have handmade type rows in pg_type.h. Give pg_database such an OID.
Restore the availability of C macros for the rowtype OIDs of the bootstrapped
catalogs. (These macros are now in the individual catalogs' .h files,
though, not in pg_type.h.)
This commit doesn't do anything especially useful by itself, but it's
necessary infrastructure for reverting some ill-considered changes in
relcache.c.
Test coverage support now covers the entire source tree, including
contrib, instead of just src/backend. In a related but independent
development, the commands make coverage and make coverage-html can be run
in any directory.
This turned out to be much easier than feared. Besides a few ad hoc fixes
to pass the make target down the tree, change all affected makefiles to
list their directories in the SUBDIRS variable, changed from variants like
DIRS and WANTED_DIRS. MSVC build fix was attempted as well.
and extend configure to test for it properly instead of hard-wiring
an assumption that everybody but Windows has the rand48 functions.
(We do cheat to the extent of assuming that probing for erand48 will do
for the entire rand48 family.)
erand48() is unused as of this commit, but a followon patch will cause
GEQO to depend on it.
Andres Freund, additional hacking by Tom
This alters various incidental uses of C++ key words to use other similar
identifiers, so that a C++ compiler won't choke outright. You still
(probably) need extern "C" { }; around the inclusion of backend headers.
based on a patch by Kurt Harriman <harriman@acm.org>
Also add a script cpluspluscheck to check for C++ compatibility in the
future. As of right now, this passes without error for me.
keyword lists in gram.y and kwlist.h. It checks that all lists are in
alphabetical order, and that all keywords present in gram.y are listed
in kwlist.h in the right category, and that all keywords in kwlist.h are
also in gram.y. What's still missing is to check that all keywords
defined with "%token <keyword>" in gram.y are present in one of the
keyword lists in gram.y.
Also, if linked against other versions than the default MSVCRT library
(for example the MSVC build which links against MSVCRT80), also update
the cache in the default MSVCRT at the same time.
This should fix the issues with setting LC_MESSAGES on the MSVC build.
Original patch from Hiroshi Inoue and Hiroshi Saito, much rewritten
by me.
the * character at the beginning of a pattern, and it does not match
subdomains.
Since this means we no longer need fnmatch, remove the imported implementation
from port, along with the autoconf check for it.
the postgres.bki file during build, because we want that file to be entirely
platform- and configuration-independent; else it can't safely be put into
/usr/share on multiarch machines. We can do the substitution during initdb,
instead. FLOAT4PASSBYVAL and FLOAT8PASSBYVAL are new breakage as of 8.4,
while the NAMEDATALEN hazard has been there all along but I guess no one
tripped over it. Noticed while trying to build "universal" OS X binaries.
in pg_proc. Also make it not emit duplicate extern declarations, and make it
a bit more bulletproof in some other small ways. Likewise fix the equally
hard-wired, and utterly undocumented, knowledge in the MSVC build scripts.
For testing purposes and perhaps other uses in future, pull out that portion
of the MSVC scripts into a standalone perl script equivalent to
Gen_fmgrtab.sh, and make it generate actually identical output, rather than
just more-or-less-the-same output.
Motivated by looking at Pavel's variadic function patch. Whether or not
that gets accepted, we can be sure that pg_proc's column set will change
again in the future; it's time to not have to deal with this gotcha.
"make all", and then reference them there during the actual tests. This
makes the handling of these files more parallel to that of regress.so,
and in particular simplifies use of the regression tests outside the
original build tree. The PGDG and Red Hat RPMs have been doing this via
patches for a very long time. Inclusion of the change in core was requested
by Jørgen Austvik of Sun, and I can't see any reason not to.
I attempted to fix the MSVC scripts for this too, but they may need
further tweaking ...
any hardcoding of those options. Along the way, reorder the expression used to
calculate RELSEG_SIZE to make it slightly clearer. For now wal_segsize is only
allowed to have a value of 1 on Windows - we can relax that when we get full
large file support in the backend.
where Datum is 8 bytes wide. Since this will break old-style C functions
(those still using version 0 calling convention) that have arguments or
results of these types, provide a configure option to disable it and retain
the old pass-by-reference behavior. Likewise, provide a configure option
to disable the recently-committed float4 pass-by-value change.
Zoltan Boszormenyi, plus configurability stuff by me.
data structures and backend internal APIs. This solves problems we've seen
recently with inconsistent layout of pg_control between machines that have
32-bit time_t and those that have already migrated to 64-bit time_t. Also,
we can get out from under the problem that Windows' Unix-API emulation is not
consistent about the width of time_t.
There are a few remaining places where local time_t variables are used to hold
the current or recent result of time(NULL). I didn't bother changing these
since they do not affect any cross-module APIs and surely all platforms will
have 64-bit time_t before overflow becomes an actual risk. time_t should
be avoided for anything visible to extension modules, however.
in .bat simply did not work, and it called them in the wrong order,
some several times, and some not at all. So this unrolls all subroutine
calls.
This should fix the issues with clean deleting the wrong files reported
by Dave Page.
While at it, add the "clean dist" option to act like "make distclean",
and no longer remove the flex/bison output files by default. This shuold
fix the problem reported by Pavel Golub in bug #3909.
*just* libpq ... its not perfect, as it pulls in more files then is
necessarily required to build, but as it is, it requires one simple patch
to configure.in in order to work ...
Tested on FreeBSD ... patch for configure.in hasn't been applied, but
putting the script in place so that it doesn't get lost ...
categories, as per discussion. asciiword (formerly lword) is still
ASCII-letters-only, and numword (formerly word) is still the most general
mixed-alpha-and-digits case. But word (formerly nlword) is now
any-group-of-letters-with-at-least-one-non-ASCII, rather than all-non-ASCII as
before. This is no worse than before for parsing mixed Russian/English text,
which seems to have been the design center for the original coding; and it
should simplify matters for parsing most European languages. In particular
it will not be necessary for any language to accept strings containing digits
as being regular "words". The hyphenated-word categories are adjusted
similarly.