Commit Graph

40261 Commits

Author SHA1 Message Date
Tom Lane f73b2bbbdc Fix poorly thought-through code from commit 5c3c3cd0a3.
It's not entirely clear to me whether PyString_AsString can return
null (looks like the answer might vary between Python 2 and 3).
But in any case, this code's attempt to cope with the possibility
was quite broken, because pstrdup() neither allows a null argument
nor ever returns a null.

Moreover, the code below this point assumes that "message" is a
palloc'd string, which would not be the case for a dgettext result.

Fix both problems by doing the pstrdup step separately.
2016-04-11 00:28:44 -04:00
Tom Lane 074050f16a pg_dump: add missing "destroyPQExpBuffer(query)" in dumpForeignServer().
Coverity complained about this resource leak (why now, I don't know,
since it's been like that a long time).  Our general policy in pg_dump
is that PQExpBuffers are worth cleaning up, so do it here too.  But
don't bother with a back-patch, because it seems unlikely that very
many databases contain enough FOREIGN SERVER objects to notice.
2016-04-11 00:00:08 -04:00
Tom Lane 1630f5b92a Add comment about intentional fallthrough in switch.
Coverity complained about an apparent missing "break" in a switch
added by bb140506df.  The human-readable comments are pretty
clear that this is intentional, but add a standard /* FALL THRU */
comment to make it clear to tools too.
2016-04-10 23:52:34 -04:00
Tom Lane 5306df2831 Clean up foreign-key caching code in planner.
Coverity complained that the code added by 015e88942a lacked an
error check for SearchSysCache1 failures, which it should have.  But
the code was pretty duff in other ways too, including failure to think
about whether it could really cope with arrays of different lengths.
2016-04-10 23:47:30 -04:00
Tom Lane 7e3bb08038 Fix access-to-already-freed-memory issue in plpython's error handling.
PLy_elog() could attempt to access strings that Python had already freed,
because the strings that PLy_get_spi_error_data() returns are simply
pointers into storage associated with the error "val" PyObject.  That's
fine at the instant PLy_get_spi_error_data() returns them, but just after
that PLy_traceback() intentionally releases the only refcount on that
object, allowing it to be freed --- so that the strings we pass to
ereport() are dangling pointers.

In principle this could result in garbage output or a coredump.  In
practice, I think the risk is pretty low, because there are no Python
operations between where we decrement that refcount and where we use the
strings (and copy them into PG storage), and thus no reason for Python
to recycle the storage.  Still, it's clearly hazardous, and it leads to
Valgrind complaints when running under a Valgrind that hasn't been
lobotomized to ignore Python memory allocations.

The code was a mess anyway: we fetched the error data out of Python
(clearing Python's error indicator) with PyErr_Fetch, examined it, pushed
it back into Python with PyErr_Restore (re-setting the error indicator),
then immediately pulled it back out with another PyErr_Fetch.  Just to
confuse matters even more, there were some gratuitous-and-yet-hazardous
PyErr_Clear calls in the "examine" step, and we didn't get around to doing
PyErr_NormalizeException until after the second PyErr_Fetch, making it even
less clear which object was being manipulated where and whether we still
had a refcount on it.  (If PyErr_NormalizeException did substitute a
different "val" object, it's possible that the problem could manifest for
real, because then we'd be doing assorted Python stuff with no refcount
on the object we have string pointers into.)

So, rearrange all that into some semblance of sanity, and don't decrement
the refcount on the Python error objects until the end of PLy_elog().
In HEAD, I failed to resist the temptation to reformat some messy bits
from 5c3c3cd0a3 along the way.

Back-patch as far as 9.2, because the code is substantially the same
that far back.  I believe that 9.1 has the bug as well; but the code
around it is rather different and I don't want to take a chance on
breaking something for what seems a low-probability problem.
2016-04-10 23:16:10 -04:00
Andres Freund 008608b9d5 Avoid the use of a separate spinlock to protect a LWLock's wait queue.
Previously we used a spinlock, in adition to the atomically manipulated
->state field, to protect the wait queue. But it's pretty simple to
instead perform the locking using a flag in state.

Due to 6150a1b0 BufferDescs, on platforms (like PPC) with > 1 byte
spinlocks, increased their size above 64byte. As 64 bytes are the size
we pad allocated BufferDescs to, this can increase false sharing;
causing performance problems in turn. Together with the previous commit
this reduces the size to <= 64 bytes on all common platforms.

Author: Andres Freund
Discussion: CAA4eK1+ZeB8PMwwktf+3bRS0Pt4Ux6Rs6Aom0uip8c6shJWmyg@mail.gmail.com
    20160327121858.zrmrjegmji2ymnvr@alap3.anarazel.de
2016-04-10 20:12:32 -07:00
Andres Freund 48354581a4 Allow Pin/UnpinBuffer to operate in a lockfree manner.
Pinning/Unpinning a buffer is a very frequent operation; especially in
read-mostly cache resident workloads. Benchmarking shows that in various
scenarios the spinlock protecting a buffer header's state becomes a
significant bottleneck. The problem can be reproduced with pgbench -S on
larger machines, but can be considerably worse for queries which touch
the same buffers over and over at a high frequency (e.g. nested loops
over a small inner table).

To allow atomic operations to be used, cram BufferDesc's flags,
usage_count, buf_hdr_lock, refcount into a single 32bit atomic variable;
that allows to manipulate them together using 32bit compare-and-swap
operations. This requires reducing MAX_BACKENDS to 2^18-1 (which could
be lifted by using a 64bit field, but it's not a realistic configuration
atm).

As not all operations can easily implemented in a lockfree manner,
implement the previous buf_hdr_lock via a flag bit in the atomic
variable. That way we can continue to lock the header in places where
it's needed, but can get away without acquiring it in the more frequent
hot-paths.  There's some additional operations which can be done without
the lock, but aren't in this patch; but the most important places are
covered.

As bufmgr.c now essentially re-implements spinlocks, abstract the delay
logic from s_lock.c into something more generic. It now has already two
users, and more are coming up; there's a follupw patch for lwlock.c at
least.

This patch is based on a proof-of-concept written by me, which Alexander
Korotkov made into a fully working patch; the committed version is again
revised by me.  Benchmarking and testing has, amongst others, been
provided by Dilip Kumar, Alexander Korotkov, Robert Haas.

On a large x86 system improvements for readonly pgbench, with a high
client count, of a factor of 8 have been observed.

Author: Alexander Korotkov and Andres Freund
Discussion: 2400449.GjM57CE0Yg@dinodell
2016-04-10 20:12:32 -07:00
Tom Lane cf223c3bf5 Improve contrib/bloom regression test using code coverage info.
Originally, this test created a 100000-row test table, which made it
run rather slowly compared to other contrib tests.  Investigation with
gcov showed that we got no further improvement in code coverage after
the first 700 or so rows, making the large table 99% a waste of time.
Cut it back to 2000 rows to fix the runtime problem and still leave
some headroom for testing behaviors that may appear later.

A closer look at the gcov results showed that the main coverage
omissions in contrib/bloom occurred because the test never filled more
than one entry in the notFullPage array; which is unsurprising because
it exercised index cleanup only in the scenario of complete table
deletion, allowing every page in the index to become deleted rather
than not-full.  Add testing that allows the not-full path to be
exercised as well.

Also, test the amvalidate function, because blvalidate.c had zero
coverage without that, and besides it's a good idea to check for
mistakes in the bloom opclass definitions.
2016-04-10 13:12:24 -04:00
Alvaro Herrera bd905a0d04 Fix possible NULL dereference in ExecAlterObjectDependsStmt
I used the wrong variable here.  Doesn't make a difference today because
the only plausible caller passes a non-NULL variable, but someday it
will be wrong, and even today's correctness is subtle: the caller that
does pass a NULL is never invoked because of object type constraints.
Surely not a condition to rely on.

Noted by Coverity
2016-04-10 11:03:35 -03:00
Tom Lane 660d5fb856 Further minor improvement in generic_xlog.c: always say REGBUF_STANDARD.
Since we're requiring pages handled by generic_xlog.c to be standard
format, specify REGBUF_STANDARD when doing a full-page image, so that
xloginsert.c can compress out the "hole" between pd_lower and pd_upper.
Given the current API in which this path will be taken only for a newly
initialized page, the hole is likely to be particularly large in such
cases, so that this oversight could easily be performance-significant.
I don't notice any particular change in the runtime of contrib/bloom's
regression test, though.
2016-04-10 00:24:28 -04:00
Tom Lane 68689c66ef Micro-optimize GenericXLogFinish().
Make the inner comparison loops of computeDelta() as tight as possible by
pulling considerations of valid and invalid ranges out of the inner loops,
and extending a match or non-match detection as far as possible before
deciding what to do next.  To keep this tractable, give up the possibility
of merging fragments across the pd_lower to pd_upper gap.  The fraction of
pages where that could happen (ie, there are 4 or fewer bytes in the gap,
*and* data changes immediately adjacent to it on both sides) is too small
to be worth spending cycles on.

Also, avoid two BLCKSZ-length memcpy()s by computing the delta before
moving data into the target buffer, instead of after.  This doesn't save
nearly as many cycles as being tenser about computeDelta(), but it still
seems worth doing.

On my machine, this patch cuts a full 40% off the runtime of
contrib/bloom's regression test.
2016-04-09 19:30:56 -04:00
Tom Lane c7a141a986 Fix PL/Python ereport() test to work on Python 2.3.
Per buildfarm.

Pavel Stehule
2016-04-09 16:44:54 -04:00
Tom Lane 08e785436f Get rid of GenericXLogUnregister().
This routine is unsafe as implemented, because it invalidates the page
image pointers returned by previous GenericXLogRegister() calls.

Rather than complicate the API or the implementation to avoid that,
let's just get rid of it; the use-case for having it seems much
too thin to justify a lot of work here.

While at it, do some wordsmithing on the SGML docs for generic WAL.
2016-04-09 16:39:30 -04:00
Tom Lane 80cf18910c Get rid of blinsert()'s use of GenericXLogUnregister().
That routine is dangerous, and unnecessary once we get rid of this
one caller.

In passing, fix failure to clean up temp memory context, or switch
back to caller's context, during slowest exit path.
2016-04-09 15:39:14 -04:00
Tom Lane db03cf375d Code review/prettification for generic_xlog.c.
Improve commentary, use more specific names for the delta fields,
const-ify pointer arguments where possible, avoid assuming that
initializing only the first element of a local array will guarantee
that the remaining elements end up as we need them.  (I think that
code in generic_redo actually worked, but only because InvalidBuffer
is zero; this is a particularly ugly way of depending on that ...)
2016-04-09 15:02:19 -04:00
Tom Lane 2dd318d277 Run pgindent on generic_xlog.c.
This code desperately needs some micro-optimization, and I'd like it
to be formatted a bit more nicely while I work on it.
2016-04-09 13:33:33 -04:00
Kevin Grittner 381200be4b Fix typo in C comment. 2016-04-09 09:07:42 -05:00
Kevin Grittner 56dffb5a73 Turn special page pointer validation to static inline function
Inclusion of multiple macros inside another macro was pushing MSVC
past its size liimit.  Reported by buildfarm.
2016-04-09 08:17:22 -05:00
Alvaro Herrera 1ff3f420d4 Move \crosstabview regression tests to a separate file
It cannot run in the same parallel group as misc, because it creates a
table which is unpredictably visible in that test.

Per buildfarm member crake.
2016-04-08 23:42:24 -03:00
Alvaro Herrera c09b18f21c Support \crosstabview in psql
\crosstabview is a completely different way to display results from a
query: instead of a vertical display of rows, the data values are placed
in a grid where the column and row headers come from the data itself,
similar to a spreadsheet.

The sort order of the horizontal header can be specified by using
another column in the query, and the vertical header determines its
ordering from the order in which they appear in the query.

This only allows displaying a single value in each cell.  If more than
one value correspond to the same cell, an error is thrown.  Merging of
values can be done in the query itself, if necessary.  This may be
revisited in the future.

Author: Daniel Verité
Reviewed-by: Pavel Stehule, Dean Rasheed
2016-04-08 20:23:18 -03:00
Kevin Grittner 279d86afdb Add snapshot_too_old to NSVC @contrib_excludes
The buildfarm showed failure for Windows MSVC builds due to this
omission.  This might not be the only problem with the Makefile for
this feature, but hopefully this will get it past the immediate
problem.

Fix suggested by Tom Lane
2016-04-08 17:22:21 -05:00
Andres Freund c1ddd2361f Expose more out/readfuncs support functions.
Previously bcac23d exposed a subset of support functions, namely the
ones Kaigai found useful. In
20160304193704.elq773pyg5fyl3mi@alap3.anarazel.de I mentioned that
there's some functions missing to use the facility in an external
project.

To avoid having to add functions piecemeal, add all the functions which
are used to define READ_* and WRITE_* macros; users of the extensible
node functionality are likely to need these. Additionally expose
outDatum(), which doesn't have it's own WRITE_ macro, as it needs
information from the embedding struct.

Discussion: 20160304193704.elq773pyg5fyl3mi@alap3.anarazel.de
2016-04-08 14:26:36 -07:00
Stephen Frost 7a542700df Create default roles
This creates an initial set of default roles which administrators may
use to grant access to, historically, superuser-only functions.  Using
these roles instead of granting superuser access reduces the number of
superuser roles required for a system.  Documention for each of the
default roles has been added to user-manag.sgml.

Bump catversion to 201604082, as we had a commit that bumped it to
201604081 and another that set it back to 201604071...

Reviews by José Luis Tallón and Robert Haas
2016-04-08 16:56:27 -04:00
Stephen Frost 293007898d Reserve the "pg_" namespace for roles
This will prevent users from creating roles which begin with "pg_" and
will check for those roles before allowing an upgrade using pg_upgrade.

This will allow for default roles to be provided at initdb time.

Reviews by José Luis Tallón and Robert Haas
2016-04-08 16:56:27 -04:00
Stephen Frost fa6075e551 Fix improper usage of 'dump' bitmap
Now that 'dump' is a bitmap, we can't simply set it to 'true'.

Noticed while debugging the prior issue.
2016-04-08 16:30:02 -04:00
Kevin Grittner 848ef42bb8 Add the "snapshot too old" feature
This feature is controlled by a new old_snapshot_threshold GUC.  A
value of -1 disables the feature, and that is the default.  The
value of 0 is just intended for testing.  Above that it is the
number of minutes a snapshot can reach before pruning and vacuum
are allowed to remove dead tuples which the snapshot would
otherwise protect.  The xmin associated with a transaction ID does
still protect dead tuples.  A connection which is using an "old"
snapshot does not get an error unless it accesses a page modified
recently enough that it might not be able to produce accurate
results.

This is similar to the Oracle feature, and we use the same SQLSTATE
and error message for compatibility.
2016-04-08 14:36:30 -05:00
Kevin Grittner 8b65cf4c5e Modify BufferGetPage() to prepare for "snapshot too old" feature
This patch is a no-op patch which is intended to reduce the chances
of failures of omission once the functional part of the "snapshot
too old" patch goes in.  It adds parameters for snapshot, relation,
and an enum to specify whether the snapshot age check needs to be
done for the page at this point.  This initial patch passes NULL
for the first two new parameters and BGP_NO_SNAPSHOT_TEST for the
third.  The follow-on patch will change the places where the test
needs to be made.
2016-04-08 14:30:10 -05:00
Stephen Frost 689f9a0588 In dumpTable, re-instate the skipping logic
Pretty sure I removed this based on some incorrect thinking that it was
no longer possible to reach this point for a table which will not be
dumped, but that's clearly wrong.

Pointed out on IRC by Erik Rijkers.
2016-04-08 15:00:44 -04:00
Teodor Sigaev 8b99edefca Revert CREATE INDEX ... INCLUDING ...
It's not ready yet, revert two commits
690c543550 - unstable test output
386e3d7609 - patch itself
2016-04-08 21:52:13 +03:00
Magnus Hagander 35e2e357cb Add authentication parameters compat_realm and upn_usename for SSPI
These parameters are available for SSPI authentication only, to make
it possible to make it behave more like "normal gssapi", while
making it possible to maintain compatibility.

compat_realm is on by default, but can be turned off to make the
authentication use the full Kerberos realm instead of the NetBIOS name.

upn_username is off by default, and can be turned on to return the users
Kerberos UPN rather than the SAM-compatible name (a user in Active
Directory can have both a legacy SAM-compatible username and a new
Kerberos one. Normally they are the same, but not always)

Author: Christian Ullrich
Reviewed by: Robbie Harwood, Alvaro Herrera, me
2016-04-08 20:28:38 +02:00
Teodor Sigaev cb0c8cbf31 Fix possible use of uninitialised value in ts_headline()
Found during investigation of failure of skink buildfarm member and its
valgrind report.

Backpatch to all supported branches
2016-04-08 21:25:14 +03:00
Tom Lane 690c543550 Fix unstable regression test output.
Output order from the pg_indexes view might vary depending on the
phase of the moon, so add ORDER BY to ensure stable results of tests
added by commit 386e3d7609.
Per buildfarm.
2016-04-08 14:15:20 -04:00
Peter Eisentraut 7c7d4fddab Distrust external OpenSSL clients; clear err queue
OpenSSL has an unfortunate tendency to mix per-session state error
handling with per-thread error handling.  This can cause problems when
programs that link to libpq with OpenSSL enabled have some other use of
OpenSSL; without care, one caller of OpenSSL may cause problems for the
other caller.  Backend code might similarly be affected, for example
when a third party extension independently uses OpenSSL without taking
the appropriate precautions.

To fix, don't trust other users of OpenSSL to clear the per-thread error
queue.  Instead, clear the entire per-thread queue ahead of certain I/O
operations when it appears that there might be trouble (these I/O
operations mostly need to call SSL_get_error() to check for success,
which relies on the queue being empty).  This is slightly aggressive,
but it's pretty clear that the other callers have a very dubious claim
to ownership of the per-thread queue.  Do this is both frontend and
backend code.

Finally, be more careful about clearing our own error queue, so as to
not cause these problems ourself.  It's possibly that control previously
did not always reach SSLerrmessage(), where ERR_get_error() was supposed
to be called to clear the queue's earliest code.  Make sure
ERR_get_error() is always called, so as to spare other users of OpenSSL
the possibility of similar problems caused by libpq (as opposed to
problems caused by a third party OpenSSL library like PHP's OpenSSL
extension).  Again, do this is both frontend and backend code.

See bug #12799 and https://bugs.php.net/bug.php?id=68276

Based on patches by Dave Vitek and Peter Eisentraut.

From: Peter Geoghegan <pg@bowt.ie>
2016-04-08 14:11:56 -04:00
Tom Lane 34c33a1f00 Add BSD authentication method.
Create a "bsd" auth method that works the same as "password" so far as
clients are concerned, but calls the BSD Authentication service to
check the password.  This is currently only available on OpenBSD.

Marisa Emerson, reviewed by Thomas Munro
2016-04-08 13:52:06 -04:00
Robert Haas af025eed53 Add combine functions for various floating-point aggregates.
This allows parallel aggregation to use them.  It may seem surprising
that we use float8_combine for both float4_accum and float8_accum
transition functions, but that's because those functions differ only
in the type of the non-transition-state argument.

Haribabu Kommi, reviewed by David Rowley and Tomas Vondra
2016-04-08 13:47:06 -04:00
Teodor Sigaev 38627f6878 Fix output of regression test of contrib/tsearch2
Just forget to add in 1ec4c7c055
2016-04-08 20:37:12 +03:00
Teodor Sigaev 1ec4c7c055 Restore original tsquery operation numbering.
As noticed by Tom Lane changing operation's number in commit
bb140506df causes on-disk format incompatibility.
Revert to previous numbering, that is reason to add special array to store
priorities of operation. Also it reverts order of tsquery to previous.

Author: Dmitry Ivanov
2016-04-08 20:11:30 +03:00
Andrew Dunstan 76a1c97bf2 Silence warning from modern perl about unescaped braces 2016-04-08 12:50:30 -04:00
Teodor Sigaev 386e3d7609 CREATE INDEX ... INCLUDING (column[, ...])
Now indexes (but only B-tree for now) can contain "extra" column(s) which
doesn't participate in index structure, they are just stored in leaf
tuples. It allows to use index only scan by using single index instead
of two or more indexes.

Author: Anastasia Lubennikova with minor editorializing by me
Reviewers: David Rowley, Peter Geoghegan, Jeff Janes
2016-04-08 19:45:59 +03:00
Peter Eisentraut 339025c68f Replace printf format %i by %d
see also ce8d7bb644
2016-04-08 12:42:58 -04:00
Andrew Dunstan 01a07e6c11 Turn down MSVC compiler verbosity
Most of what is produced by the detailed verbosity level is of no
interest at all, so switch to the normal level for more usable output.

Christian Ullrich

Backpatch to all live branches
2016-04-08 12:37:20 -04:00
Peter Eisentraut 8b737f9084 Fix printf format 2016-04-08 12:34:33 -04:00
Tom Lane 93c301fc4f Fix multiple bugs in tablespace symlink removal.
Don't try to examine S_ISLNK(st.st_mode) after a failed lstat().
It's undefined.

Also, if the lstat() reported ENOENT, we do not wish that to be a hard
error, but the code might nonetheless treat it as one (giving an entirely
misleading error message, too) depending on luck-of-the-draw as to what
S_ISLNK() returned.

Don't throw error for ENOENT from rmdir(), either.  (We're not really
expecting ENOENT because we just stat'd the file successfully; but
if we're going to allow ENOENT in the symlink code path, surely the
directory code path should too.)

Generate an appropriate errcode for its-the-wrong-type-of-file complaints.
(ERRCODE_SYSTEM_ERROR doesn't seem appropriate, and failing to write
errcode() around it certainly doesn't work, and not writing an errcode
at all is not per project policy.)

Valgrind noticed the undefined S_ISLNK result; the other problems emerged
while reading the code in the area.

All of this appears to have been introduced in 8f15f74a44.
Back-patch to 9.5 where that commit appeared.
2016-04-08 12:31:53 -04:00
Robert Haas 752b948dfc Document which aggregates support partial mode.
David Rowley, reviewed by Tomas Vondra
2016-04-08 12:13:07 -04:00
Teodor Sigaev 5c3c3cd0a3 Enhanced custom error in PLPythonu
Patch adds a new, more rich,  way to emit error message or exception from
PL/Pythonu code.

Author: Pavel Stehule
Reviewers: Catalin Iacob, Peter Eisentraut, Jim Nasby
2016-04-08 18:33:06 +03:00
Andres Freund 5364b357fb Increase maximum number of clog buffers.
Benchmarking has shown that the current number of clog buffers limits
scalability. We've previously increased the number in 33aaa139, but
that's not sufficient with a large number of clients.

We've benchmarked the cost of increasing the limit by benchmarking worst
case scenarios; testing showed that 128 buffers don't cause a
regression, even in contrived scenarios, whereas 256 does

There are a number of more complex patches flying around to address
various clog scalability problems, but this is simple enough that we can
get it into 9.6; and is beneficial even after those patches have been
applied.

It is a bit unsatisfactory to increase this in small steps every few
releases, but a better solution seems to require a rewrite of slru.c;
not something done quickly.

Author: Amit Kapila and Andres Freund
Discussion: CAA4eK1+-=18HOrdqtLXqOMwZDbC_15WTyHiFruz7BvVArZPaAw@mail.gmail.com
2016-04-08 08:25:59 -07:00
Robert Haas 25fe8b5f1a Add a 'parallel_degree' reloption.
The code that estimates what parallel degree should be uesd for the
scan of a relation is currently rather stupid, so add a parallel_degree
reloption that can be used to override the planner's rather limited
judgement.

Julien Rouhaud, reviewed by David Rowley, James Sewell, Amit Kapila,
and me.  Some further hacking by me.
2016-04-08 11:14:56 -04:00
Robert Haas b0b64f6505 Attempt to fix breakage due to declaration following code.
Per Tom Lane and the buildfarm.
2016-04-08 10:52:56 -04:00
Peter Eisentraut 2f1d2b7a75 Set PAM_RHOST item for PAM authentication
The PAM_RHOST item is set to the remote IP address or host name and can
be used by PAM modules.  A pg_hba.conf option is provided to choose
between IP address and resolved host name.

From: Grzegorz Sampolski <grzsmp@gmail.com>
Reviewed-by: Haribabu Kommi <kommi.haribabu@gmail.com>
2016-04-08 10:48:44 -04:00
Teodor Sigaev 4e55b3f033 Rename comparePos() to compareWordEntryPos()
Rename comparePos() to compareWordEntryPos() to prevent export of too
generic name.

Per gripe from Tom Lane.
2016-04-08 12:04:15 +03:00