Commit Graph

37850 Commits

Author SHA1 Message Date
Alvaro Herrera 0e5680f473 Grab heavyweight tuple lock only before sleeping
We were trying to acquire the lock even when we were subsequently
not sleeping in some other transaction, which opens us up unnecessarily
to deadlocks.  In particular, this is troublesome if an update tries to
lock an updated version of a tuple and finds itself doing EvalPlanQual
update chain walking; more than two sessions doing this concurrently
will find themselves sleeping on each other because the HW tuple lock
acquisition in heap_lock_tuple called from EvalPlanQualFetch races with
the same tuple lock being acquired in heap_update -- one of these
sessions sleeps on the other one to finish while holding the tuple lock,
and the other one sleeps on the tuple lock.

Per trouble report from Andrew Sackville-West in
http://www.postgresql.org/message-id/20140731233051.GN17765@andrew-ThinkPad-X230

His scenario can be simplified down to a relatively simple
isolationtester spec file which I don't include in this commit; the
reason is that the current isolationtester is not able to deal with more
than one blocked session concurrently and it blocks instead of raising
the expected deadlock.  In the future, if we improve isolationtester, it
would be good to include the spec file in the isolation schedule.  I
posted it in
http://www.postgresql.org/message-id/20141212205254.GC1768@alvh.no-ip.org

Hat tip to Mark Kirkwood, who helped diagnose the trouble.
2014-12-26 13:52:27 -03:00
Noah Misch 8d9cb0bc48 Have config_sspi_auth() permit IPv6 localhost connections.
Windows versions later than Windows Server 2003 map "localhost" to ::1.
Account for that in the generated pg_hba.conf, fixing another oversight
in commit f6dc6dd5ba.  Back-patch to 9.0,
like that commit.

David Rowley and Noah Misch
2014-12-25 13:52:03 -05:00
Andres Freund 740a4ec7f4 Blindly fix a dtrace probe in lwlock.c for a removed local variable.
Per buildfarm member locust.
2014-12-25 19:48:46 +01:00
Tom Lane 966115c305 Temporarily revert "Move pg_lzcompress.c to src/common."
This reverts commit 60838df922.
That change needs a bit more thought to be workable.  In view of
the potentially machine-dependent stuff that went in today,
we need all of the buildfarm to be testing those other changes.
2014-12-25 13:22:55 -05:00
Andres Freund d72731a704 Lockless StrategyGetBuffer clock sweep hot path.
StrategyGetBuffer() has proven to be a bottleneck in a number of
buffer acquisition heavy workloads. To some degree this has already
been alleviated by 5d7962c6, but it still can be quite a heavy
bottleneck.  The problem is that in unfortunate usage patterns a
single StrategyGetBuffer() call will have to look at a large number of
buffers - in turn making it likely that the process will be put to
sleep while still holding the spinlock.

Replace most of the usage of the buffer_strategy_lock spinlock for the
clock sweep by a atomic nextVictimBuffer variable. That variable,
modulo NBuffers, is the current hand of the clock sweep. The buffer
clock-sweep then only needs to acquire the spinlock after a
wraparound. And even then only in the process that did the wrapping
around. That alleviates nearly all the contention on the relevant
spinlock, although significant contention on the cacheline can still
exist.

Reviewed-By: Robert Haas and Amit Kapila

Discussion: 20141010160020.GG6670@alap3.anarazel.de,
    20141027133218.GA2639@awork2.anarazel.de
2014-12-25 18:26:25 +01:00
Andres Freund ab5194e6f6 Improve LWLock scalability.
The old LWLock implementation had the problem that concurrent lock
acquisitions required exclusively acquiring a spinlock. Often that
could lead to acquirers waiting behind the spinlock, even if the
actual LWLock was free.

The new implementation doesn't acquire the spinlock when acquiring the
lock itself. Instead the new atomic operations are used to atomically
manipulate the state. Only the waitqueue, used solely in the slow
path, is still protected by the spinlock. Check lwlock.c's header for
an explanation about the used algorithm.

For some common workloads on larger machines this can yield
significant performance improvements. Particularly in read mostly
workloads.

Reviewed-By: Amit Kapila and Robert Haas
Author: Andres Freund

Discussion: 20130926225545.GB26663@awork2.anarazel.de
2014-12-25 17:24:30 +01:00
Andres Freund 7882c3b0b9 Convert the PGPROC->lwWaitLink list into a dlist instead of open coding it.
Besides being shorter and much easier to read it changes the logic in
LWLockRelease() to release all shared lockers when waking up any. This
can yield some significant performance improvements - and the fairness
isn't really much worse than before, as we always allowed new shared
lockers to jump the queue.
2014-12-25 17:24:30 +01:00
Andres Freund 570bd2b3fd Add capability to suppress CONTEXT: messages to elog machinery.
Hiding context messages usually is not a good idea - except for rather
verbose debugging/development utensils like LOG_DEBUG. There the
amount of repeated context messages just bloat the log without adding
information.
2014-12-25 17:24:30 +01:00
Fujii Masao 4a5593197b Remove duplicate include of slot.h.
Back-patch to 9.4, where this problem was added.
2014-12-25 22:47:53 +09:00
Fujii Masao 60838df922 Move pg_lzcompress.c to src/common.
Exposing compression and decompression APIs of pglz makes possible its
use by extensions and contrib modules. pglz_decompress contained a call
to elog to emit an error message in case of corrupted data. This function
is changed to return a status code to let its callers return an error instead.

This commit is required for upcoming WAL compression feature so that
the WAL reader facility can decompress the WAL data by using pglz_decompress.

Michael Paquier
2014-12-25 20:46:14 +09:00
Tom Lane 5b89473d87 Add CST (China Standard Time) to our lists of timezone abbreviations.
For some reason this seems to have been missed when the lists in
src/timezone/tznames/ were first constructed.  We can't put it in Default
because of the conflict with US CST, but we should certainly list it among
the alternative entries in Asia.txt.  (I checked for other oversights, but
all the other abbreviations that are in current use according to the IANA
files seem to be accounted for.)  Noted while responding to bug #12326.
2014-12-24 16:35:23 -05:00
Andrew Dunstan 3f37b6c316 Fix installcheck case for tap tests 2014-12-24 10:31:36 -05:00
Bruce Momjian 83bcc70459 pgbench: remove odd trailing period in init progress output 2014-12-24 09:21:09 -05:00
Fujii Masao 3b6ca123b5 Remove unused fields from ReindexStmt.
fe263d1 changed the REINDEX logic so that those fields are not used at all,
but forgot to remove them.

Sawada Masahiko
2014-12-24 21:40:47 +09:00
Andres Freund cd5ebe1edd Suppress MSVC warning in typeStringToTypeName function.
MSVC doesn't realize ereport(ERROR) doesn't return.

David Rowley
2014-12-24 12:30:08 +01:00
Tom Lane 3e22753559 Remove failing collation case from object_address regression test.
Per buildfarm, this test case does not yield consistent results.
I don't think it's useful enough to figure out a workaround, either.
2014-12-23 16:55:51 -05:00
Alvaro Herrera a609d96778 Revert "Use a bitmask to represent role attributes"
This reverts commit 1826987a46.

The overall design was deemed unacceptable, in discussion following the
previous commit message; we might find some parts of it still
salvageable, but I don't want to be on the hook for fixing it, so let's
wait until we have a new patch.
2014-12-23 15:35:49 -03:00
Alvaro Herrera d7ee82e50f Add SQL-callable pg_get_object_address
This allows access to get_object_address from SQL, which is useful to
obtain OID addressing information from data equivalent to that emitted
by the parser.  This is necessary infrastructure of a project to let
replication systems propagate object dropping events to remote servers,
where the schema might be different than the server originating the
DROP.

This patch also adds support for OBJECT_DEFAULT to get_object_address;
that is, it is now possible to refer to a column's default value.

Catalog version bumped due to the new function.

Reviewed by Stephen Frost, Heikki Linnakangas, Robert Haas, Andres
Freund, Abhijit Menon-Sen, Adam Brightwell.
2014-12-23 15:31:29 -03:00
Alvaro Herrera 1826987a46 Use a bitmask to represent role attributes
The previous representation using a boolean column for each attribute
would not scale as well as we want to add further attributes.

Extra auxilliary functions are added to go along with this change, to
make up for the lost convenience of access of the old representation.

Catalog version bumped due to change in catalogs and the new functions.

Author: Adam Brightwell, minor tweaks by Álvaro
Reviewed by: Stephen Frost, Andres Freund, Álvaro Herrera
2014-12-23 10:22:09 -03:00
Alvaro Herrera 7eca575d1c get_object_address: separate domain constraints from table constraints
Apart from enabling comments on domain constraints, this enables a
future project to replicate object dropping to remote servers: with the
current mechanism there's no way to distinguish between the two types of
constraints, so there's no way to know what to drop.

Also added support for the domain constraint comments in psql's \dd and
pg_dump.

Catalog version bumped due to the change in ObjectType enum.
2014-12-23 09:06:44 -03:00
Peter Eisentraut 584e35d17c Change local_preload_libraries to PGC_USERSET
This allows it to be used with ALTER ROLE SET.

Although the old setting of PGC_BACKEND prevented changes after session
start, after discussion it was more useful to allow ALTER ROLE SET
instead and just document that changes during a session have no effect.
This is similar to how session_preload_libraries works already.

An alternative would be to change things to allow PGC_BACKEND and
PGC_SU_BACKEND settings to be changed by ALTER ROLE SET.  But that might
need further research (e.g., log_connections would probably not work).

based on patch by Kyotaro Horiguchi
2014-12-22 23:05:46 -05:00
Andrew Dunstan 2a3f2743f2 Further tidy up on json aggregate documentation 2014-12-22 18:30:46 -05:00
Andrew Dunstan b2d145bff0 Fix documentation of argument type of json_agg and jsonb_agg
json_agg was originally designed to aggregate records. However, it soon
became clear that it is useful for aggregating all kinds of values and
that's what we have on 9.3 and 9.4, and in head for it and jsonb_agg.
The documentation suggested otherwise, so this fixes it.
2014-12-22 14:12:06 -05:00
Heikki Linnakangas 955557ddcc Move rbtree.c from src/backend/utils/misc to src/backend/lib.
We have other general-purpose data structures in src/backend/lib, so it
seems like a better home for the red-black tree as well.
2014-12-22 17:52:08 +02:00
Heikki Linnakangas 7f0dccaed6 Turn much of the btree_gin macros into real functions.
This makes the functions much nicer to read and edit, and also makes
debugging easier.
2014-12-22 17:11:53 +02:00
Heikki Linnakangas e7032610f7 Use a pairing heap for the priority queue in kNN-GiST searches.
This performs slightly better, uses less memory, and needs slightly less
code in GiST, than the Red-Black tree previously used.

Reviewed by Peter Geoghegan
2014-12-22 12:05:57 +02:00
Tom Lane 699300a146 Docs: clarify treatment of variadic functions with zero variadic arguments.
Explain that you have to use "VARIADIC ARRAY[]" to pass an empty array
to a variadic parameter position.  This was already implicit in the text
but it seems better to spell it out.

Per a suggestion from David Johnston, though I didn't use his proposed
wording.  Back-patch to all supported branches.
2014-12-21 15:30:57 -05:00
Heikki Linnakangas 2ef6c66a2b Fix file descriptor leak at end of recovery.
XLogFileInit() returns a file descriptor, which needs to be closed. The leak
was short-lived, since the startup process exits shortly afterwards, but it
was clearly a bug, nevertheless.

Per Coverity report.
2014-12-21 21:51:59 +02:00
Bruce Momjian cfb64fde7b doc: Adjust wording of ALTER TABLESPACE restriction
Report by Noah Misch
2014-12-19 20:56:03 -05:00
Alvaro Herrera 0ee98d1cbf pg_event_trigger_dropped_objects: add behavior flags
Add "normal" and "original" flags as output columns to the
pg_event_trigger_dropped_objects() function.  With this it's possible to
distinguish which objects, among those listed, need to be explicitely
referenced when trying to replicate a deletion.

This is necessary so that the list of objects can be pruned to the
minimum necessary to replicate the DROP command in a remote server that
might have slightly different schema (for instance, TOAST tables and
constraints with different names and such.)

Catalog version bumped due to change of function definition.

Reviewed by: Abhijit Menon-Sen, Stephen Frost, Heikki Linnakangas,
Robert Haas.
2014-12-19 15:00:45 -03:00
Heikki Linnakangas 5c805d0a81 Fix timestamp in end-of-recovery WAL records.
We used time(null) to set a TimestampTz field, which gave bogus results.
Noticed while looking at pg_xlogdump output.

Backpatch to 9.3 and above, where the fast promotion was introduced.
2014-12-19 17:04:20 +02:00
Andres Freund 37de8de9e3 Prevent potentially hazardous compiler/cpu reordering during lwlock release.
In LWLockRelease() (and in 9.4+ LWLockUpdateVar()) we release enqueued
waiters using PGSemaphoreUnlock(). As there are other sources of such
unlocks backends only wake up if MyProc->lwWaiting is set to false;
which is only done in the aforementioned functions.

Before this commit there were dangers because the store to lwWaitLink
could become visible before the store to lwWaitLink. This could both
happen due to compiler reordering (on most compilers) and on some
platforms due to the CPU reordering stores.

The possible consequence of this is that a backend stops waiting
before lwWaitLink is set to NULL. If that backend then tries to
acquire another lock and has to wait there the list could become
corrupted once the lwWaitLink store is finally performed.

Add a write memory barrier to prevent that issue.

Unfortunately the barrier support has been only added in 9.2. Given
that the issue has not knowingly been observed in praxis it seems
sufficient to prohibit compiler reordering using volatile for 9.0 and
9.1. Actual problems due to compiler reordering are more likely
anyway.

Discussion: 20140210134625.GA15246@awork2.anarazel.de
2014-12-19 14:29:52 +01:00
Andres Freund 9959abb012 Define Assert() et al to ((void)0) to avoid pedantic warnings.
gcc's -Wempty-body warns about the current usage when compiling
postgres without --enable-cassert.
2014-12-19 14:27:45 +01:00
Tom Lane 5b51683589 Improve documentation about CASE and constant subexpressions.
The possibility that constant subexpressions of a CASE might be evaluated
at planning time was touched on in 9.17.1 (CASE expressions), but it really
ought to be explained in 4.2.14 (Expression Evaluation Rules) which is the
primary discussion of such topics.  Add text and an example there, and
revise the <note> under CASE to link there.

Back-patch to all supported branches, since it's acted like this for a
long time (though 9.2+ is probably worse because of its more aggressive
use of constant-folding via replanning of nominally-prepared statements).
Pre-9.4, also back-patch text added in commit 0ce627d4 about CASE versus
aggregate functions.

Tom Lane and David Johnston, per discussion of bug #12273.
2014-12-18 16:39:20 -05:00
Alvaro Herrera cd6e66572b Use %u to print out BlockNumber variables
Per Tom Lane
2014-12-18 17:59:00 -03:00
Alvaro Herrera 35192f0626 Have VACUUM log number of skipped pages due to pins
Author: Jim Nasby, some kibitzing by Heikki Linnankangas.
Discussion leading to current behavior and precise wording fueled by
thoughts from Robert Haas and Andres Freund.
2014-12-18 17:18:33 -03:00
Tom Lane 4a14f13a0a Improve hash_create's API for selecting simple-binary-key hash functions.
Previously, if you wanted anything besides C-string hash keys, you had to
specify a custom hashing function to hash_create().  Nearly all such
callers were specifying tag_hash or oid_hash; which is tedious, and rather
error-prone, since a caller could easily miss the opportunity to optimize
by using hash_uint32 when appropriate.  Replace this with a design whereby
callers using simple binary-data keys just specify HASH_BLOBS and don't
need to mess with specific support functions.  hash_create() itself will
take care of optimizing when the key size is four bytes.

This nets out saving a few hundred bytes of code space, and offers
a measurable performance improvement in tidbitmap.c (which was not
exploiting the opportunity to use hash_uint32 for its 4-byte keys).
There might be some wins elsewhere too, I didn't analyze closely.

In future we could look into offering a similar optimized hashing function
for 8-byte keys.  Under this design that could be done in a centralized
and machine-independent fashion, whereas getting it right for keys of
platform-dependent sizes would've been notationally painful before.

For the moment, the old way still works fine, so as not to break source
code compatibility for loadable modules.  Eventually we might want to
remove tag_hash and friends from the exported API altogether, since there's
no real need for them to be explicitly referenced from outside dynahash.c.

Teodor Sigaev and Tom Lane
2014-12-18 13:36:36 -05:00
Heikki Linnakangas ba94518aad Change how first WAL segment on new timeline after promotion is created.
Two changes:

1. When copying a WAL segment from old timeline to create the first segment
on the new timeline, only copy up to the point where the timeline switch
happens, and zero-fill the rest. This avoids corner cases where we might
think that the copied WAL from the previous timeline belong to the new
timeline.

2. If the timeline switch happens at a segment boundary, don't copy the
whole old segment to the new timeline. It's pointless, because it's 100%
identical to the old segment.
2014-12-18 20:23:03 +02:00
Fujii Masao 38628db8d8 Add memory barriers for PgBackendStatus.st_changecount protocol.
st_changecount protocol needs the memory barriers to ensure that
the apparent order of execution is as it desires. Otherwise,
for example, the CPU might rearrange the code so that st_changecount
is incremented twice before the modification on a machine with
weak memory ordering. This surprising result can lead to bugs.

This commit introduces the macros to load and store st_changecount
with the memory barriers. These are called before and after
PgBackendStatus entries are modified or copied into private memory,
in order to prevent CPU from reordering PgBackendStatus access.

Per discussion on pgsql-hackers, we decided not to back-patch this
to 9.4 or before until we get an actual bug report about this.

Patch by me. Review by Robert Haas.
2014-12-18 23:07:51 +09:00
Fujii Masao 19e065c049 Ensure variables live across calls in generate_series(numeric, numeric).
In generate_series_step_numeric(), the variables "start_num"
and "stop_num" may be potentially freed until the next call.
So they should be put in the location which can survive across calls.
But previously they were not, and which could cause incorrect
behavior of generate_series(numeric, numeric). This commit fixes
this problem by copying them on multi_call_memory_ctx.

Andrew Gierth
2014-12-18 21:13:52 +09:00
Fujii Masao ccf292cd2e Update .gitignore for config.cache.
Also add a comment about why regreesion.* aren't listed in .gitignore.

Jim Nasby
2014-12-18 19:56:42 +09:00
Andres Freund 72950dc1d0 Adjust valgrind suppression to the changes in 2c03216d83.
CRC computation is now done in XLogRecordAssemble.
2014-12-18 10:45:57 +01:00
Noah Misch 43b56171b1 Recognize Makefile line continuations in fetchRegressOpts().
Back-patch to 9.0 (all supported versions).  This is mere
future-proofing in the context of the master branch, but commit
f6dc6dd5ba requires it of older branches.
2014-12-18 03:55:17 -05:00
Fujii Masao 26674c923d Remove odd blank line in comment.
Etsuro Fujita
2014-12-18 17:33:38 +09:00
Andres Freund c303e9e7e5 Fix (re-)starting from a basebackup taken off a standby after a failure.
When starting up from a basebackup taken off a standby extra logic has
to be applied to compute the point where the data directory is
consistent. Normal base backups use a WAL record for that purpose, but
that isn't possible on a standby.

That logic had a error check ensuring that the cluster's control file
indicates being in recovery. Unfortunately that check was too strict,
disregarding the fact that the control file could also indicate that
the cluster was shut down while in recovery.

That's possible when the a cluster starting from a basebackup is shut
down before the backup label has been removed. When everything goes
well that's a short window, but when either restore_command or
primary_conninfo isn't configured correctly the window can get much
wider. That's because inbetween reading and unlinking the label we
restore the last checkpoint from WAL which can need additional WAL.

To fix simply also allow starting when the control file indicates
"shutdown in recovery". There's nicer fixes imaginable, but they'd be
more invasive.

Backpatch to 9.2 where support for taking basebackups from standbys
was added.
2014-12-18 08:47:27 +01:00
Noah Misch 40c598fa15 Fix previous commit for TAP test suites in VPATH builds.
Per buildfarm member crake.  Back-patch to 9.4, where the TAP suites
were introduced.
2014-12-18 01:24:57 -05:00
Noah Misch f6dc6dd5ba Lock down regression testing temporary clusters on Windows.
Use SSPI authentication to allow connections exclusively from the OS
user that launched the test suite.  This closes on Windows the
vulnerability that commit be76a6d39e
closed on other platforms.  Users of "make installcheck" or custom test
harnesses can run "pg_regress --config-auth=DATADIR" to activate the
same authentication configuration that "make check" would use.
Back-patch to 9.0 (all supported versions).

Security: CVE-2014-0067
2014-12-17 22:48:40 -05:00
Tom Lane fc2ac1fb41 Allow CHECK constraints to be placed on foreign tables.
As with NOT NULL constraints, we consider that such constraints are merely
reports of constraints that are being enforced by the remote server (or
other underlying storage mechanism).  Their only real use is to allow
planner optimizations, for example in constraint-exclusion checks.  Thus,
the code changes here amount to little more than removal of the error that
was formerly thrown for applying CHECK to a foreign table.

(In passing, do a bit of cleanup of the ALTER FOREIGN TABLE reference page,
which had accumulated some weird decisions about ordering etc.)

Shigeru Hanada and Etsuro Fujita, reviewed by Kyotaro Horiguchi and
Ashutosh Bapat.
2014-12-17 17:00:53 -05:00
Heikki Linnakangas ce01548d4f Clarify the regexp used to detect source files in MSVC builds.
The old pattern would match files with strange extensions like *.ry or
*.lpp. Refactor it to only include files with known extensions, and to make
it more readable.

Per Andrew Dunstan's suggestion.
2014-12-17 21:55:26 +02:00
Tom Lane c340494235 Fix another poorly worded error message.
Spotted by Álvaro Herrera.
2014-12-17 13:22:07 -05:00