Commit Graph

15693 Commits

Author SHA1 Message Date
Fujii Masao 31b6606c48 Make concurrent refresh check early that there is a unique index on matview.
In REFRESH MATERIALIZED VIEW command, CONCURRENTLY option is only
allowed if there is at least one unique index with no WHERE clause on
one or more columns of the matview. Previously, concurrent refresh
checked the existence of a unique index on the matview after filling
the data to new snapshot, i.e., after calling refresh_matview_datafill().
So, when there was no unique index, we could need to wait a long time
before we detected that and got the error. It was a waste of time.

To eliminate such wasting time, this commit changes concurrent refresh
so that it checks the existence of a unique index at the beginning of
the refresh operation, i.e., before starting any time-consuming jobs.
If CONCURRENTLY option is not allowed due to lack of a unique index,
concurrent refresh can immediately detect it and emit an error.

Author: Masahiko Sawada
Reviewed-by: Michael Paquier, Fujii Masao
2016-02-16 02:15:44 +09:00
Tom Lane 9b92e76f7b Make GetLockStatusData's header comment resemble reality.
The API spec for this function was changed completely (and for the better)
by commit 3cba8999b3, but it didn't bother
with anything as mundane as updating the comments.
2016-02-13 15:42:31 -05:00
Joe Conway 59a884e985 Change delimiter used for display of NextXID
NextXID has been rendered in the form of a pg_lsn even though it
really is not. This can cause confusion, so change the format from
%u/%u to %u:%u, per discussion on hackers.

Complaint by me, patch by me and Bruce, reviewed by Michael Paquier
and Alvaro. Applied to HEAD only.

Author: Joe Conway, Bruce Momjian
Reviewed-by: Michael Paquier, Alvaro Herrera
Backpatch-through: master
2016-02-12 14:23:59 -08:00
Robert Haas bcac23de73 Introduce extensible node types.
An extensible node is always tagged T_Extensible, but the extnodename
field identifies it more specifically; it may also include arbitrary
private data.  Extensible nodes can be copied, tested for equality,
serialized, and deserialized, but the core system doesn't know
anything about them otherwise.  Some extensions may find it useful to
include these nodes in fdw_private or custom_private lists in lieu of
arm-wrestling their data into a format that the core code can
understand.

Along the way, so as not to burden the authors of such extensible
node types too much, expose the functions for writing serialized
tokens, and for serializing and deserializing bitmapsets.

KaiGai Kohei, per a design suggested by me.  Reviewed by Andres Freund
and by me, and further edited by me.
2016-02-12 09:38:11 -05:00
Robert Haas 63461a63f9 Make builtin lwlock tranche names consistent.
Previously, we had a mix of styles.

Amit Kapila
2016-02-12 08:07:11 -05:00
Tom Lane f144f73242 Refactor check_functional_grouping() to use get_primary_key_attnos().
If we ever get around to allowing functional dependency to be proven
from other things besides simple primary keys, this code will need to
be rethought, but that was true anyway.  In the meantime, we might as
well not have two very-similar routines for scanning pg_constraint.

David Rowley, reviewed by Julien Rouhaud
2016-02-11 17:52:03 -05:00
Tom Lane d4c3a156cb Remove GROUP BY columns that are functionally dependent on other columns.
If a GROUP BY clause includes all columns of a non-deferred primary key,
as well as other columns of the same relation, those other columns are
redundant and can be dropped from the grouping; the pkey is enough to
ensure that each row of the table corresponds to a separate group.
Getting rid of the excess columns will reduce the cost of the sorting or
hashing needed to implement GROUP BY, and can indeed remove the need for
a sort step altogether.

This seems worth testing for since many query authors are not aware of
the GROUP-BY-primary-key exception to the rule about queries not being
allowed to reference non-grouped-by columns in their targetlists or
HAVING clauses.  Thus, redundant GROUP BY items are not uncommon.  Also,
we can make the test pretty cheap in most queries where it won't help
by not looking up a rel's primary key until we've found that at least
two of its columns are in GROUP BY.

David Rowley, reviewed by Julien Rouhaud
2016-02-11 17:34:59 -05:00
Tom Lane 72eee410d4 Move pg_constraint.h function declarations to new file pg_constraint_fn.h.
A pending patch requires exporting a function returning Bitmapset from
catalog/pg_constraint.c.  As things stand, that would mean including
nodes/bitmapset.h in pg_constraint.h, which might be hazardous for the
client-side includability of that header.  It's not entirely clear whether
any client-side code needs to include pg_constraint.h, but it seems prudent
to assume that there is some such code somewhere.  Therefore, split off the
function definitions into a new file pg_constraint_fn.h, similarly to what
we've done for some other catalog header files.
2016-02-11 15:51:28 -05:00
Tom Lane 2564be360a Fix typo in comment. 2016-02-11 15:20:14 -05:00
Tom Lane d18643c4a6 Shift the responsibility for emitting "database system is shut down".
Historically this message has been emitted at the end of ShutdownXLOG().
That's not an insane place for it in a standalone backend, but in the
postmaster environment we've grown a fair amount of stuff that happens
later, including archiver/walsender shutdown, stats collector shutdown,
etc.  Recent buildfarm experimentation showed that on slower machines
there could be many seconds' delay between finishing ShutdownXLOG() and
actual postmaster exit.  That's fairly confusing, both for testing
purposes and for DBAs.  Hence, move the code that prints this message
into UnlinkLockFiles(), so that it comes out just after we remove the
postmaster's pidfile.  That is a more appropriate definition of "is shut
down" from the point of view of "pg_ctl stop", for example.  In general,
removing the pidfile should be the last externally-visible action of
either a postmaster or a standalone backend; compare commit
d73d14c271 for instance.  So this seems
like a reasonably future-proof approach.
2016-02-11 14:14:22 -05:00
Robert Haas c319991bca Use separate lwlock tranches for buffer, lock, and predicate lock managers.
This finishes the work - spread across many commits over the last
several months - of putting each type of lock other than the named
individual locks into a separate tranche.

Amit Kapila
2016-02-11 14:07:33 -05:00
Teodor Sigaev 07d25a964b Improve error reporting in format()
Clarify invalid format conversion type error message and add hint.

Author: Jim Nasby
2016-02-11 18:11:11 +03:00
Robert Haas a455878d99 Rename PGPROC fields related to group XID clearing again.
Commit 0e141c0fbb introduced a new
facility to reduce ProcArrayLock contention by clearing several XIDs
from the ProcArray under a single lock acquisition.  The names
initially chosen were deemed not to be very good choices, so commit
4aec49899e renamed them.  But now it
seems like we still didn't get it right.  A pending patch wants to
add similar infrastructure for batching CLOG updates, so the names
need to be clear enough to allow a new set of structure members with
a related purpose.

Amit Kapila
2016-02-11 08:55:24 -05:00
Tom Lane 51e78ab4ff Avoid use of sscanf() to parse ispell dictionary files.
It turns out that on FreeBSD-derived platforms (including OS X), the
*scanf() family of functions is pretty much brain-dead about multibyte
characters.  In particular it will apply isspace() to individual bytes
of input even when those bytes are part of a multibyte character, thus
allowing false recognition of a field-terminating space.

We appear to have little alternative other than instituting a coding
rule that *scanf() is not to be used if the input string might contain
multibyte characters.  (There was some discussion of relying on "%ls",
but that probably just moves the portability problem somewhere else,
and besides it doesn't fully prevent BSD *scanf() from using isspace().)

This patch is a down payment on that: it gets rid of use of sscanf()
to parse ispell dictionary files, which are certainly at great risk
of having a problem.  The code is cleaner this way anyway, though
a bit longer.

In passing, improve a few comments.

Report and patch by Artur Zakirov, reviewed and somewhat tweaked by me.
Back-patch to all supported branches.
2016-02-10 19:30:11 -05:00
Tom Lane c5e9b77127 Revert "Temporarily make pg_ctl and server shutdown a whole lot chattier."
This reverts commit 3971f64843 and a
couple of followon debugging commits; I think we've learned what we can
from them.
2016-02-10 16:01:04 -05:00
Robert Haas 79a7ff0fe5 Code cleanup in the wake of recent LWLock refactoring.
As of commit c1772ad922, there's no
longer any way of requesting additional LWLocks in the main tranche,
so we don't need NumLWLocks() or LWLockAssign() any more.  Also,
some of the allocation counters that we had previously aren't needed
any more either.

Amit Kapila
2016-02-10 09:58:09 -05:00
Tom Lane 41d505a7ff Add still more chattiness in server shutdown.
Further investigation says that there may be some slow operations after
we've finished ShutdownXLOG(), so add some more log messages to try to
isolate that.  This is all temporary code too.
2016-02-09 19:36:30 -05:00
Tom Lane 7351e18286 Add more chattiness in server shutdown.
Early returns from the buildfarm show that there's a bit of a gap in the
logging I added in 3971f64843b02e4a: the portion of CreateCheckPoint()
after CheckPointGuts() can take a fair amount of time.  Add a few more
log messages in that section of code.  This too shall be reverted later.
2016-02-09 11:21:46 -05:00
Tom Lane 3971f64843 Temporarily make pg_ctl and server shutdown a whole lot chattier.
This is a quick hack, due to be reverted when its purpose has been served,
to try to gather information about why some of the buildfarm critters
regularly fail with "postmaster does not shut down" complaints.  Maybe they
are just really overloaded, but maybe something else is going on.  Hence,
instrument pg_ctl to print the current time when it starts waiting for
postmaster shutdown and when it gives up, and add a lot of logging of the
current time in the server's checkpoint and shutdown code paths.

No attempt has been made to make this pretty.  I'm not even totally sure
if it will build on Windows, but we'll soon find out.
2016-02-08 18:43:11 -05:00
Tom Lane 0231f83856 Re-pgindent varlena.c.
Just to make sure previous commit worked ...
2016-02-08 15:17:40 -05:00
Tom Lane 58e797216f Rename typedef "string" to "VarString".
Since pgindent treats typedef names as global, the original coding of
b47b4dbf68 would have had rather nasty effects on the formatting
of other files in which "string" is used as a variable or field name.
Use a less generic name for this typedef, and rename some other
identifiers to match.

Peter Geoghegan, per gripe from me
2016-02-08 15:15:56 -05:00
Tom Lane 3bb3f42f37 Fix some regex issues with out-of-range characters and large char ranges.
Previously, our regex code defined CHR_MAX as 0xfffffffe, which is a
bad choice because it is outside the range of type "celt" (int32).
Characters approaching that limit could lead to infinite loops in logic
such as "for (c = a; c <= b; c++)" where c is of type celt but the
range bounds are chr.  Such loops will work safely only if CHR_MAX+1
is representable in celt, since c must advance to beyond b before the
loop will exit.

Fortunately, there seems no reason not to restrict CHR_MAX to 0x7ffffffe.
It's highly unlikely that Unicode will ever assign codes that high, and
none of our other backend encodings need characters beyond that either.

In addition to modifying the macro, we have to explicitly enforce character
range restrictions on the values of \u, \U, and \x escape sequences, else
the limit is trivially bypassed.

Also, the code for expanding case-independent character ranges in bracket
expressions had a potential integer overflow in its calculation of the
number of characters it could generate, which could lead to allocating too
small a character vector and then overwriting memory.  An attacker with the
ability to supply arbitrary regex patterns could easily cause transient DOS
via server crashes, and the possibility for privilege escalation has not
been ruled out.

Quite aside from the integer-overflow problem, the range expansion code was
unnecessarily inefficient in that it always produced a result consisting of
individual characters, abandoning the knowledge that we had a range to
start with.  If the input range is large, this requires excessive memory.
Change it so that the original range is reported as-is, and then we add on
any case-equivalent characters that are outside that range.  With this
approach, we can bound the number of individual characters allowed without
sacrificing much.  This patch allows at most 100000 individual characters,
which I believe to be more than the number of case pairs existing in
Unicode, so that the restriction will never be hit in practice.

It's still possible for range() to take awhile given a large character code
range, so also add statement-cancel detection to its loop.  The downstream
function dovec() also lacked cancel detection, and could take a long time
given a large output from range().

Per fuzz testing by Greg Stark.  Back-patch to all supported branches.

Security: CVE-2016-0773
2016-02-08 10:25:40 -05:00
Andres Freund a6897efab9 Fix overeager pushdown of HAVING clauses when grouping sets are used.
In 61444bfb we started to allow HAVING clauses to be fully pushed down
into WHERE, even when grouping sets are in use. That turns out not to
work correctly, because grouping sets can "produce" NULLs, meaning that
filtering in WHERE and HAVING can have different results, even when no
aggregates or volatile functions are involved.

Instead only allow pushdown of empty grouping sets.

It'd be nice to do better, but the exact mechanics of deciding which
cases are safe are still being debated. It's important to give correct
results till we find a good solution, and such a solution might not be
appropriate for backpatching anyway.

Bug: #13863
Reported-By: 'wrb'
Diagnosed-By: Dean Rasheed
Author: Andrew Gierth
Reviewed-By: Dean Rasheed and Andres Freund
Discussion: 20160113183558.12989.56904@wrigleys.postgresql.org
Backpatch: 9.5, where grouping sets were introduced
2016-02-08 11:03:31 +01:00
Tom Lane cc2ca9319a Fix deparsing of ON CONFLICT arbiter WHERE clauses.
The parser doesn't allow qualification of column names appearing in
these clauses, but ruleutils.c would sometimes qualify them, leading
to dump/reload failures.  Per bug #13891 from Onder Kalaci.

(In passing, make stanzas in ruleutils.c that save/restore varprefix
more consistent.)

Peter Geoghegan
2016-02-07 14:57:24 -05:00
Tom Lane f867ce5518 ExecHashRemoveNextSkewBucket must physically copy tuples to main hashtable.
Commit 45f6240a8f added an assumption in ExecHashIncreaseNumBatches
and ExecHashIncreaseNumBuckets that they could find all tuples in the main
hash table by iterating over the "dense storage" introduced by that patch.
However, ExecHashRemoveNextSkewBucket continued its old practice of simply
re-linking deleted skew tuples into the main table's hashchains.  Hence,
such tuples got lost during any subsequent increase in nbatch or nbuckets,
and would never get joined, as reported in bug #13908 from Seth P.

I (tgl) think that the aforesaid commit has got multiple design issues
and should be reworked rather completely; but there is no time for that
right now, so band-aid the problem by making ExecHashRemoveNextSkewBucket
physically copy deleted skew tuples into the "dense storage" arena.

The added test case is able to exhibit the problem by means of fooling the
planner with a WHERE condition that it will underestimate the selectivity
of, causing the initial nbatch estimate to be too small.

Tomas Vondra and Tom Lane.  Thanks to David Johnston for initial
investigation into the bug report.
2016-02-07 12:29:32 -05:00
Robert Haas 7c944bd903 Introduce a new GUC force_parallel_mode for testing purposes.
When force_parallel_mode = true, we enable the parallel mode restrictions
for all queries for which this is believed to be safe.  For the subset of
those queries believed to be safe to run entirely within a worker, we spin
up a worker and run the query there instead of running it in the
original process.  When force_parallel_mode = regress, make additional
changes to allow the regression tests to run cleanly even though parallel
workers have been injected under the hood.

Taken together, this facilitates both better user testing and better
regression testing of the parallelism code.

Robert Haas, with help from Amit Kapila and Rushabh Lathia.
2016-02-07 11:41:33 -05:00
Robert Haas a1c1af2a1f Introduce group locking to prevent parallel processes from deadlocking.
For locking purposes, we now regard heavyweight locks as mutually
non-conflicting between cooperating parallel processes.  There are some
possible pitfalls to this approach that are not to be taken lightly,
but it works OK for now and can be changed later if we find a better
approach.  Without this, it's very easy for parallel queries to
silently self-deadlock if the user backend holds strong relation locks.

Robert Haas, with help from Amit Kapila.  Thanks to Noah Misch and
Andres Freund for extensive discussion of possible issues with this
approach.
2016-02-07 10:16:13 -05:00
Tom Lane aa2387e2fd Improve speed of timestamp/time/date output functions.
It seems that sprintf(), at least in glibc's version, is unreasonably slow
compared to hand-rolled code for printing integers.  Replacing most uses of
sprintf() in the datetime.c output functions with special-purpose code
turns out to give more than a 2X speedup in COPY of a table with a single
timestamp column; which is pretty impressive considering all the other
logic in that code path.

David Rowley and Andres Freund, reviewed by Peter Geoghegan and myself
2016-02-06 23:11:28 -05:00
Tom Lane b921aeb167 Fix comment block trashed by pgindent.
Looks like I put the protective dashes in the wrong place in f4e4b32743.
2016-02-06 15:13:36 -05:00
Tom Lane be11f8400d Improve HJDEBUG code a bit.
Commit 30d7ae3c76 introduced an HJDEBUG
stanza that probably didn't compile at the time, and definitely doesn't
compile now, because it refers to a nonexistent variable.  It doesn't seem
terribly useful anyway, so just get rid of it.

While I'm fooling with it, use %z modifier instead of the obsolete hack of
casting size_t to unsigned long, and include the HashJoinTable's address in
each printout so that it's possible to distinguish the activities of
multiple hashjoins occurring in one query.

Noted while trying to use HJDEBUG to investigate bug #13908.  Back-patch
to 9.5, because code that doesn't compile is certainly not very helpful.
2016-02-06 15:05:23 -05:00
Noah Misch 41baee7a93 Comment on dead code in AtAbort_Portals() and AtSubAbort_Portals().
Reviewed by Tom Lane and Robert Haas.
2016-02-05 20:23:40 -05:00
Noah Misch f4aa3a18a2 Force certain "pljava" custom GUCs to be PGC_SUSET.
Future PL/Java versions will close CVE-2016-0766 by making these GUCs
PGC_SUSET.  This PostgreSQL change independently mitigates that PL/Java
vulnerability, helping sites that update PostgreSQL more frequently than
PL/Java.  Back-patch to 9.1 (all supported versions).
2016-02-05 20:22:51 -05:00
Robert Haas e98fd78607 Fix typo in comment.
Michael Paquier
2016-02-05 08:11:00 -05:00
Robert Haas e0e7b8fa22 Remove parallel-safety check from GetExistingLocalJoinPath.
Commit a104a017fc has this check because
I added it to the submitted patch before commit, but that was entirely
wrongheaded, as explained to me by Ashutosh Bapat, who also wrote this
patch.
2016-02-05 08:07:38 -05:00
Robert Haas 63f39b9148 Fix small goof in comment.
Peter Geoghegan
2016-02-05 08:04:48 -05:00
Tom Lane 6819514fca Add num_nulls() and num_nonnulls() to count NULL arguments.
An example use-case is "CHECK(num_nonnulls(a,b,c) = 1)" to assert that
exactly one of a,b,c isn't NULL.  The functions are variadic, so they
can also be pressed into service to count the number of null or nonnull
elements in an array.

Marko Tiikkaja, reviewed by Pavel Stehule
2016-02-04 23:03:37 -05:00
Robert Haas 9418d79a76 When modifying a foreign table, initialize tableoid field properly.
Failure to do this can cause AFTER ROW triggers or RETURNING expressions
that reference this field to misbehave.

Etsuro Fujita, reviewed by Thom Brown
2016-02-04 21:17:53 -05:00
Peter Eisentraut f8003e07f9 Improve error message 2016-02-04 20:41:32 -05:00
Robert Haas a104a017fc Add some additional core functions to support join pushdown for FDWs.
GetExistingLocalJoinPath() is useful for handling EvalPlanQual rechecks
properly, and GetUserMappingById() is needed to make sure you're using
the right credentials.

Shigeru Hanada, Etsuro Fujita, Ashutosh Bapat, Robert Haas
2016-02-04 17:05:09 -05:00
Robert Haas c1772ad922 Change the way that LWLocks for extensions are allocated.
The previous RequestAddinLWLocks() method had several disadvantages.
First, the locks would be in the main tranche; we've recently decided
that it's useful for LWLocks used for separate purposes to have
separate tranche IDs.  Second, there wasn't any correlation between
what code called RequestAddinLWLocks() and what code called
LWLockAssign(); when multiple modules are in use, it could become
quite difficult to troubleshoot problems where LWLockAssign() ran out
of locks.  To fix, create a concept of named LWLock tranches which
can be used either by extension or by core code.

Amit Kapila and Robert Haas
2016-02-04 16:43:04 -05:00
Robert Haas b47b4dbf68 Extend sortsupport for text to more opclasses.
Have varlena.c expose an interface that allows the char(n), bytea, and
bpchar types to piggyback on a now-generalized SortSupport for text.
This pushes a little more knowledge of the bpchar/char(n) type into
varlena.c than might be preferred, but that seems like the approach
that creates least friction.  Also speed things up for index builds
that use text_pattern_ops or varchar_pattern_ops.

This patch does quite a bit of renaming, but it seems likely to be
worth it, so as to avoid future confusion about the fact that this code
is now more generally used than the old names might have suggested.

Peter Geoghegan, reviewed by Álvaro Herrera and Andreas Karlsson,
with small tweaks by me.
2016-02-03 14:29:53 -05:00
Robert Haas 69d34408e5 Allow parallel custom and foreign scans.
This patch doesn't put the new infrastructure to use anywhere, and
indeed it's not clear how it could ever be used for something like
postgres_fdw which has to send an SQL query and wait for a reply,
but there might be FDWs or custom scan providers that are CPU-bound,
so let's give them a way to join club parallel.

KaiGai Kohei, reviewed by me.
2016-02-03 12:49:46 -05:00
Robert Haas f2305d40ec Remove CustomPath's TextOutCustomPath method.
You can't really do anything useful with this in the form it currently
exists; among other problems, there's no way to reread whatever
information might be produced when the path is output.  Work is
underway to replace this with a more useful and more general system of
extensible nodes, but let's start by getting rid of this bit.

Extracted from a larger patch by KaiGai Kohei.
2016-02-03 10:38:50 -05:00
Tom Lane e6ecc93a17 Fix IsValidJsonNumber() to notice trailing non-alphanumeric garbage.
Commit e09996ff8d was one brick shy of a load: it didn't insist
that the detected JSON number be the whole of the supplied string.
This allowed inputs such as "2016-01-01" to be misdetected as valid JSON
numbers.  Per bug #13906 from Dmitry Ryabov.

In passing, be more wary of zero-length input (I'm not sure this can
happen given current callers, but better safe than sorry), and do some
minor cosmetic cleanup.
2016-02-03 01:39:48 -05:00
Peter Eisentraut 7d17e683fc Add support for systemd service notifications
Insert sd_notify() calls at server start and stop for integration with
systemd.  This allows the use of systemd service units of type "notify",
which greatly simplifies the systemd configuration.

Reviewed-by: Pavel Stěhule <pavel.stehule@gmail.com>
2016-02-02 21:04:29 -05:00
Peter Eisentraut ac7238dc0f Improve error reporting when location specified by postgres -D does not exist
Previously, the first error seen would be that postgresql.conf does not
exist.  But for the case where the whole directory does not exist, give
an error message about that, together with a hint for how to create one.
2016-02-02 21:03:19 -05:00
Alvaro Herrera 3cb5867b7d Don't test for system columns on join relations
create_foreignscan_plan needs to know whether any system columns are
requested from a relation (this flag is needed by ForeignNext during
execution).  However, for join relations this is a pointless test,
because it's not possible to request system columns from them, so
remove the check.

Author: Etsuro Fujita
Discussion: http://www.postgresql.org/message-id/56AA0FC5.9000207@lab.ntt.co.jp
Reviewed-by: David Rowley, Robert Haas
2016-02-02 19:20:02 +01:00
Teodor Sigaev f25d07d99f Fix lossy KNN GiST when ordering operator returns non-float8 value.
KNN GiST with recheck flag should return to executor the same type as ordering
operator, GiST detects this type by looking to return type of function which
implements ordering operator. But occasionally detecting code works after
replacing ordering operator function to distance support function.
Distance support function always returns float8, so, detecting code get float8
instead of actual return type of ordering operator.

Built-in opclasses don't have ordering operator which doesn't return
non-float8 value, so, tests are impossible here, at least now.

Backpatch to 9.5 where lozzy KNN was introduced.

Author: Alexander Korotkov
Report by: Artur Zakirov
2016-02-02 15:20:33 +03:00
Robert Haas 7191ce8bea Make all built-in lwlock tranche IDs fixed.
This makes the values more stable, which seems like a good thing for
anybody who needs to look at at them.

Alexander Korotkov and Amit Kapila
2016-02-02 06:45:55 -05:00
Magnus Hagander e51ab85cd9 Fix typos in comments
Author: Michael Paquier
2016-02-01 11:43:48 +01:00
Heikki Linnakangas 61ce1e8f15 Fix misspelled function name in comment. 2016-02-01 10:10:24 +02:00
Peter Eisentraut 9217bf3961 Fix whitespace 2016-01-30 15:58:20 -05:00
Robert Haas 2251179e6a Migrate replication slot I/O locks into a separate tranche.
This is following in a long train of similar changes and for the same
reasons - see b319356f0e and
fe702a7b3f inter alia.

Author: Amit Kapila
Reviewed-by: Alexander Korotkov, Robert Haas
2016-01-29 09:45:38 -05:00
Robert Haas b319356f0e Migrate PGPROC's backendLock into PGPROC itself, using a new tranche.
Previously, each PGPROC's backendLock was part of the main tranche,
and the PGPROC just contained a pointer.  Now, the actual LWLock is
part of the PGPROC.

As with previous, similar patches, this makes it significantly easier
to identify these lwlocks in LWLOCK_STATS or Trace_lwlocks output
and improves modularity.

Author: Ildus Kurbangaliev
Reviewed-by: Amit Kapila, Robert Haas
2016-01-29 08:14:28 -05:00
Robert Haas fbe5a3fb73 Only try to push down foreign joins if the user mapping OIDs match.
Previously, the foreign join pushdown infrastructure left the question
of security entirely up to individual FDWs, but it would be easy for
a foreign data wrapper to inadvertently open up subtle security holes
that way.  So, make it the core code's job to determine which user
mapping OID is relevant, and don't attempt join pushdown unless it's
the same for all relevant relations.

Per a suggestion from Tom Lane.  Shigeru Hanada and Ashutosh Bapat,
reviewed by Etsuro Fujita and KaiGai Kohei, with some further
changes by me.
2016-01-28 14:05:36 -05:00
Robert Haas 96198d94cb Avoid multiple foreign server connections when all use same user mapping.
Previously, postgres_fdw's connection cache was keyed by user OID and
server OID, but this can lead to multiple connections when it's not
really necessary.  In particular, if all relevant users are mapped to
the public user mapping, then their connection options are certainly
the same, so one connection can be used for all of them.

While we're cleaning things up here, drop the "server" argument to
GetConnection(), which isn't really needed.  This saves a few cycles
because callers no longer have to look this up; the function itself
does, but only when establishing a new connection, not when reusing
an existing one.

Ashutosh Bapat, with a few small changes by me.
2016-01-28 12:05:19 -05:00
Fujii Masao 62e2ddd4ca Fix typos in comments and doc
overriden -> overridden

The misspelling in create_extension.sgml was introduced in b67aaf2,
so no need to backpatch.
2016-01-28 16:47:36 +09:00
Fujii Masao 7f46eaf035 Add gin_clean_pending_list function to clean up GIN pending list
This function cleans up the pending list of the GIN index by
moving entries in it to the main GIN data structure in bulk.
It returns the number of pages cleaned up from the pending list.

This function is useful, for example, when the pending list
needs to be cleaned up *quickly* to improve the performance of
the search using GIN index. VACUUM can do the same thing, too,
but it may take days to run on a large table.

Jeff Janes,
reviewed by Julien Rouhaud, Jaime Casanova, Alvaro Herrera and me.

Discussion: CAMkU=1x8zFkpfnozXyt40zmR3Ub_kHu58LtRmwHUKRgQss7=iQ@mail.gmail.com
2016-01-28 12:57:52 +09:00
Robert Haas eaf7b1f643 Assert that create_unique_path returns non-NULL.
Per off-list discussion with Tom Lane and Michael Paquier, Coverity
gets unhappy if this is not done.
2016-01-27 22:03:18 -05:00
Tom Lane b8682a7155 Fix startup so that log prefix %h works for the log_connections message.
We entirely randomly chose to initialize port->remote_host just after
printing the log_connections message, when we could perfectly well do it
just before, allowing %h and %r to work for that message.  Per gripe from
Artem Tomyuk.
2016-01-26 15:38:33 -05:00
Tom Lane cc988fbb0b Improve ResourceOwners' behavior for large numbers of owned objects.
The original coding was quite fast so long as objects were always
released in reverse order of addition; otherwise, it degenerated into
O(N^2) behavior due to searching for the array element to delete.
Improve matters by switching to hashed storage when the number of
objects of a given type exceeds 64.  (The cutover point is open to
discussion, of course, but some simple performance testing suggests
that hashing has enough overhead to be a loser below there.)

Also, refactor resowner.c so that we don't need N copies of the array
management code.  Since all the resource IDs the code currently needs
to deal with are either pointers or integers, it seems sufficient to
create a one-size-fits-all infrastructure in which everything is
converted to a Datum for storage.

Aleksander Alekseev, reviewed by Stas Kelvich, further fixes by me
2016-01-26 15:20:30 -05:00
Simon Riggs 1129c2b0ad Correct comment in GetConflictingVirtualXIDs()
We use Share lock because it is safe to do so.
2016-01-24 10:22:11 -08:00
Tom Lane 00347575e2 Yet further adjust degree-based trig functions for more portability.
Buildfarm member cockatiel is still saying that cosd(60) isn't 0.5.
What seems likely is that the subexpression (1.0 - cos(x)) isn't being
rounded to double width before more arithmetic is done on it, so force
that by storing it into a variable.
2016-01-24 12:53:03 -05:00
Tom Lane 360f67d31a Still further adjust degree-based trig functions for more portability.
Indeed, the non-static declaration foreseen in my previous commit message
is necessary.  Per Noah Misch.
2016-01-23 18:12:54 -05:00
Tom Lane 65abaab547 Further adjust degree-based trig functions for more portability.
The last round didn't do it.  Per Noah Misch, the problem on at least
some machines is that the compiler pre-evaluates trig functions having
constant arguments using code slightly different from what will be used
at runtime.  Therefore, we must prevent the compiler from seeing constant
arguments to any of the libm trig functions used in this code.

The method used here might still fail if init_degree_constants() gets
inlined into the call sites.  That probably won't happen given the large
number of call sites; but if it does, we could probably fix it by making
init_degree_constants() non-static.  I'll avoid that till proven
necessary, though.
2016-01-23 16:17:31 -05:00
Tom Lane 73193d82d7 Adjust degree-based trig functions for more portability.
The buildfarm isn't very happy with the results of commit e1bd684a34.
To try to get the expected exact results everywhere:

* Replace M_PI / 180 subexpressions with a precomputed constant, so that
the compiler can't decide to rearrange that division with an adjacent
operation.  Hopefully this will fix failures to get exactly 0.5 from
sind(30) and cosd(60).

* Add scaling to ensure that tand(45) and cotd(45) give exactly 1; there
was nothing particularly guaranteeing that before.

* Replace minus zero by zero when tand() or cotd() would output that;
many machines did so for tand(180) and cotd(270), but not all.  We could
alternatively deem both results valid, but that doesn't seem likely to
be what users will want.
2016-01-23 11:26:07 -05:00
Tom Lane e1bd684a34 Add trigonometric functions that work in degrees.
The implementations go to some lengths to deliver exact results for values
where an exact result can be expected, such as sind(30) = 0.5 exactly.

Dean Rasheed, reviewed by Michael Paquier
2016-01-22 15:46:22 -05:00
Tom Lane fd5200c3dc Improve cross-platform consistency of Inf/NaN handling in trig functions.
Ensure that the trig functions return NaN for NaN input regardless of what
the underlying C library functions might do.  Also ensure that an error
is thrown for Inf (or otherwise out-of-range) input, except for atan/atan2
which should accept it.

All these behaviors should now conform to the POSIX spec; previously, all
our popular platforms deviated from that in one case or another.

The main remaining platform dependency here is whether the C library might
choose to throw a domain error for sin/cos/tan inputs that are large but
less than infinity.  (Doing so is not unreasonable, since once a single
unit-in-the-last-place exceeds PI, there can be no significance at all in
the result; however there doesn't seem to be any suggestion in POSIX that
such an error is allowed.)  We will report such errors if they are reported
via "errno", but not if they are reported via "fetestexcept" which is the
other mechanism sanctioned by POSIX.  Some preliminary experiments with
fetestexcept indicated that it might also report errors we could do
without, such as complaining about underflow at an unreasonably large
threshold.  So let's skip that complexity for now.

Dean Rasheed, reviewed by Michael Paquier
2016-01-22 14:50:51 -05:00
Tom Lane a396144ac0 Remove new coupling between NAMEDATALEN and MAX_LEVENSHTEIN_STRLEN.
Commit e529cd4ffa introduced an Assert requiring NAMEDATALEN to be
less than MAX_LEVENSHTEIN_STRLEN, which has been 255 for a long time.
Since up to that instant we had always allowed NAMEDATALEN to be
substantially more than that, this was ill-advised.

It's debatable whether we need MAX_LEVENSHTEIN_STRLEN at all (versus
putting a CHECK_FOR_INTERRUPTS into the loop), or whether it has to be
so tight; but this patch takes the narrower approach of just not applying
the MAX_LEVENSHTEIN_STRLEN limit to calls from the parser.

Trusting the parser for this seems reasonable, first because the strings
are limited to NAMEDATALEN which is unlikely to be hugely more than 256,
and second because the maximum distance is tightly constrained by
MAX_FUZZY_DISTANCE (though we'd forgotten to make use of that limit in one
place).  That means the cost is not really O(mn) but more like O(max(m,n)).

Relaxing the limit for user-supplied calls is left for future research;
given the lack of complaints to date, it doesn't seem very high priority.

In passing, fix confusion between lengths-in-bytes and lengths-in-chars
in comments and error messages.

Per gripe from Kevin Day; solution suggested by Robert Haas.  Back-patch
to 9.5 where the unwanted restriction was introduced.
2016-01-22 11:53:06 -05:00
Tom Lane 647d87c56a Make extract() do something more reasonable with infinite datetimes.
Historically, extract() just returned zero for any case involving an
infinite timestamp[tz] input; even cases in which the unit name was
invalid.  This is not very sensible.  Instead, return infinity or
-infinity as appropriate when the requested field is one that is
monotonically increasing (e.g, year, epoch), or NULL when it is not
(e.g., day, hour).  Also, throw the expected errors for bad unit names.

BACKWARDS INCOMPATIBLE CHANGE

Vitaly Burovoy, reviewed by Vik Fearing
2016-01-21 22:26:20 -05:00
Tom Lane d9b9289c83 Suppress compiler warning.
Given the limited range of i, these shifts should not cause any
problem, but that apparently doesn't stop some compilers from
whining about them.

David Rowley
2016-01-21 21:14:07 -05:00
Tom Lane be44ed27b8 Improve index AMs' opclass validation procedures.
The amvalidate functions added in commit 65c5fcd353 were on the
crude side.  Improve them in a few ways:

* Perform signature checking for operators and support functions.

* Apply more thorough checks for missing operators and functions,
where possible.

* Instead of reporting problems as ERRORs, report most problems as INFO
messages and make the amvalidate function return FALSE.  This allows
more than one problem to be discovered per run.

* Report object names rather than OIDs, and work a bit harder on making
the messages understandable.

Also, remove a few more opr_sanity regression test queries that are
now superseded by the amvalidate checks.
2016-01-21 19:47:15 -05:00
Tom Lane b99551832e Add defenses against putting expanded objects into Const nodes.
Putting a reference to an expanded-format value into a Const node would be
a bad idea for a couple of reasons.  It'd be possible for the supposedly
immutable Const to change value, if something modified the referenced
variable ... in fact, if the Const's reference were R/W, any function that
has the Const as argument might itself change it at runtime.  Also, because
datumIsEqual() is pretty simplistic, the Const might fail to compare equal
to other Consts that it should compare equal to, notably including copies
of itself.  This could lead to unexpected planner behavior, such as "could
not find pathkey item to sort" errors or inferior plans.

I have not been able to find any way to get an expanded value into a Const
within the existing core code; but Paul Ramsey was able to trigger the
problem by writing a datatype input function that returns an expanded
value.

The best fix seems to be to establish a rule that varlena values being
placed into Const nodes should be passed through pg_detoast_datum().
That will do nothing (and cost little) in normal cases, but it will flatten
expanded values and thereby avoid the above problems.  Also, it will
convert short-header or compressed values into canonical format, which will
avoid possible unexpected lack-of-equality issues for those cases too.
And it provides a last-ditch defense against putting a toasted value into
a Const, which we already knew was dangerous, cf commit 2b0c86b665.
(In the light of this discussion, I'm no longer sure that that commit
provided 100% protection against such cases, but this fix should do it.)

The test added in commit 65c3d05e18 to catch datatype input functions
with unstable results would fail for functions that returned expanded
values; but it seems a bit uncharitable to deem a result unstable just
because it's expressed in expanded form, so revise the coding so that we
check for bitwise equality only after applying pg_detoast_datum().  That's
a sufficient condition anyway given the new rule about detoasting when
forming a Const.

Back-patch to 9.5 where the expanded-object facility was added.  It's
possible that this should go back further; but in the absence of clear
evidence that there's any live bug in older branches, I'll refrain for now.
2016-01-21 12:56:08 -05:00
Fujii Masao 38710a374e Remove unused argument from ginInsertCleanup()
It's an oversight in commit dc943ad.
2016-01-22 01:22:56 +09:00
Simon Riggs c80b31d557 Refactor headers to split out standby defs
Jeff Janes
2016-01-20 18:51:34 -08:00
Simon Riggs 978b2f65aa Speedup 2PC by skipping two phase state files in normal path
2PC state info is written only to WAL at PREPARE, then read back from WAL at
COMMIT PREPARED/ABORT PREPARED. Prepared transactions that live past one bufmgr
checkpoint cycle will be written to disk in the same form as previously. Crash
recovery path is not altered. Measured performance gains of 50-100% for short
2PC transactions by completely avoiding writing files and fsyncing. Other
optimizations still available, further patches in related areas expected.

Stas Kelvich and heavily edited by Simon Riggs

Based upon earlier ideas and patches by Michael Paquier and Heikki Linnakangas,
a concrete example of how Postgres-XC has fed back ideas into PostgreSQL.

Reviewed by Michael Paquier, Jeff Janes and Andres Freund
Performance testing by Jesper Pedersen
2016-01-20 18:40:44 -08:00
Simon Riggs 422a55a687 Refactor to create generic WAL page read callback
Previously we didn’t have a generic WAL page read callback function,
surprisingly. Logical decoding has logical_read_local_xlog_page(), which was
actually generic, so move that to xlogfunc.c and rename to
read_local_xlog_page().
Maintain logical_read_local_xlog_page() so existing callers still work.

As requested by Michael Paquier, Alvaro Herrera and Andres Freund
2016-01-20 17:18:58 -08:00
Robert Haas 45be99f8cd Support parallel joins, and make related improvements.
The core innovation of this patch is the introduction of the concept
of a partial path; that is, a path which if executed in parallel will
generate a subset of the output rows in each process.  Gathering a
partial path produces an ordinary (complete) path.  This allows us to
generate paths for parallel joins by joining a partial path for one
side (which at the baserel level is currently always a Partial Seq
Scan) to an ordinary path on the other side.  This is subject to
various restrictions at present, especially that this strategy seems
unlikely to be sensible for merge joins, so only nested loops and
hash joins paths are generated.

This also allows an Append node to be pushed below a Gather node in
the case of a partitioned table.

Testing revealed that early versions of this patch made poor decisions
in some cases, which turned out to be caused by the fact that the
original cost model for Parallel Seq Scan wasn't very good.  So this
patch tries to make some modest improvements in that area.

There is much more to be done in the area of generating good parallel
plans in all cases, but this seems like a useful step forward.

Patch by me, reviewed by Dilip Kumar and Amit Kapila.
2016-01-20 14:40:26 -05:00
Robert Haas a7de3dc5c3 Support multi-stage aggregation.
Aggregate nodes now have two new modes: a "partial" mode where they
output the unfinalized transition state, and a "finalize" mode where
they accept unfinalized transition states rather than individual
values as input.

These new modes are not used anywhere yet, but they will be necessary
for parallel aggregation.  The infrastructure also figures to be
useful for cases where we want to aggregate local data and remote
data via the FDW interface, and want to bring back partial aggregates
from the remote side that can then be combined with locally generated
partial aggregates to produce the final value.  It may also be useful
even when neither FDWs nor parallelism are in play, as explained in
the comments in nodeAgg.c.

David Rowley and Simon Riggs, reviewed by KaiGai Kohei, Heikki
Linnakangas, Haribabu Kommi, and me.
2016-01-20 13:46:50 -05:00
Bruce Momjian 216d568432 Properly install dynloader.h on MSVC builds
This will enable PL/Java to be cleanly compiled, as dynloader.h is a
requirement.

Report by Chapman Flack

Patch by Michael Paquier

Backpatch through 9.1
2016-01-19 23:30:29 -05:00
Alvaro Herrera 948c97958b Add two HyperLogLog functions
New functions initHyperLogLogError() and freeHyperLogLog() simplify
using this module from elsewhere.

Author: Tomáš Vondra
Review: Peter Geoghegan
2016-01-19 17:40:15 -03:00
Tom Lane 9ff60273e3 Fix assorted inconsistencies in GiST opclass support function declarations.
The conventions specified by the GiST SGML documentation were widely
ignored.  For example, the strategy-number argument for "consistent" and
"distance" functions is specified to be a smallint, but most of the
built-in support functions declared it as an integer, and for that matter
the core code passed it using Int32GetDatum not Int16GetDatum.  None of
that makes any real difference at runtime, but it's quite confusing for
newcomers to the code, and it makes it very hard to write an amvalidate()
function that checks support function signatures.  So let's try to instill
some consistency here.

Another similar issue is that the "query" argument is not of a single
well-defined type, but could have different types depending on the strategy
(corresponding to search operators with different righthand-side argument
types).  Some of the functions threw up their hands and declared the query
argument as being of "internal" type, which surely isn't right ("any" would
have been more appropriate); but the majority position seemed to be to
declare it as being of the indexed data type, corresponding to a search
operator with both input types the same.  So I've specified a convention
that that's what to do always.

Also, the result of the "union" support function actually must be of the
index's storage type, but the documentation suggested declaring it to
return "internal", and some of the functions followed that.  Standardize
on telling the truth, instead.

Similarly, standardize on declaring the "same" function's inputs as
being of the storage type, not "internal".

Also, somebody had forgotten to add the "recheck" argument to both
the documentation of the "distance" support function and all of their
SQL declarations, even though the C code was happily using that argument.
Clean that up too.

Fix up some other omissions in the docs too, such as documenting that
union's second input argument is vestigial.

So far as the errors in core function declarations go, we can just fix
pg_proc.h and bump catversion.  Adjusting the erroneous declarations in
contrib modules is more debatable: in principle any change in those
scripts should involve an extension version bump, which is a pain.
However, since these changes are purely cosmetic and make no functional
difference, I think we can get away without doing that.
2016-01-19 12:04:36 -05:00
Tom Lane 49b4950650 Add explicit cast to amcostestimate call.
My compiler doesn't complain here, but David Rowley's does ...
2016-01-17 22:56:16 -05:00
Tom Lane 65c5fcd353 Restructure index access method API to hide most of it at the C level.
This patch reduces pg_am to just two columns, a name and a handler
function.  All the data formerly obtained from pg_am is now provided
in a C struct returned by the handler function.  This is similar to
the designs we've adopted for FDWs and tablesample methods.  There
are multiple advantages.  For one, the index AM's support functions
are now simple C functions, making them faster to call and much less
error-prone, since the C compiler can now check function signatures.
For another, this will make it far more practical to define index access
methods in installable extensions.

A disadvantage is that SQL-level code can no longer see attributes
of index AMs; in particular, some of the crosschecks in the opr_sanity
regression test are no longer possible from SQL.  We've addressed that
by adding a facility for the index AM to perform such checks instead.
(Much more could be done in that line, but for now we're content if the
amvalidate functions more or less replace what opr_sanity used to do.)
We might also want to expose some sort of reporting functionality, but
this patch doesn't do that.

Alexander Korotkov, reviewed by Petr Jelínek, and rather heavily
editorialized on by me.
2016-01-17 19:36:59 -05:00
Tom Lane 8d290c8ec6 Re-pgindent a few files.
In preparation for landing index AM interface changes.
2016-01-17 19:13:18 -05:00
Robert Haas 23c2dd03d5 Fix spelling mistakes.
Same patch submitted independently by David Rowley and Peter Geoghegan.
2016-01-14 23:16:40 -05:00
Tom Lane a923af382c Fix build_grouping_chain() to not clobber its input lists.
There's no good reason for stomping on the input data; it makes the logic
in this function no simpler, in fact probably the reverse.  And it makes
it impossible to separate path generation from plan generation, as I'm
working towards doing; that will require more than one traversal of these
lists.
2016-01-14 11:51:57 -05:00
Magnus Hagander 6a61d1ff9d Properly close token in sspi authentication
We can never leak more than one token, but we shouldn't do that. We
don't bother closing it in the error paths since the process will
exit shortly anyway.

Christian Ullrich
2016-01-14 13:06:03 +01:00
Simon Riggs e63bb4549a Add new user fn pg_current_xlog_flush_location()
Tomas Vondra, reviewed by Michael Paquier and Amit Kapila
Minor edits by me
2016-01-12 07:54:52 +00:00
Simon Riggs 1e29e6324c Maintain local LogwrtResult consistently
Teach GetFlushRecPtr() to update LogwrtResult cache as performed by all other
functions in xlog.c
2016-01-12 07:33:20 +00:00
Robert Haas 950ab82c3d Remove obsolete comment.
Noted while reviewing a question from Dickson S. Guedes.
2016-01-10 21:35:33 -05:00
Tom Lane 820bdccc1b Remove a useless PG_GETARG_DATUM() call from jsonb_build_array.
This loop uselessly fetched the argument after the one it's currently
looking at.  No real harm is done since we couldn't possibly fetch off
the end of memory, but it's confusing to the reader.

Also remove a duplicate (and therefore confusing) PG_ARGISNULL check in
jsonb_build_object.

I happened to notice these things while trolling for missed null-arg
checks earlier today.  Back-patch to 9.5, not because there is any
real bug, but just because 9.5 and HEAD are still in sync in this
file and we might as well keep them so.

In passing, re-pgindent.
2016-01-09 17:39:45 -05:00
Tom Lane 26d538dc93 Clean up some lack-of-STRICT issues in the core code, too.
A scan for missed proisstrict markings in the core code turned up
these functions:

brin_summarize_new_values
pg_stat_reset_single_table_counters
pg_stat_reset_single_function_counters
pg_create_logical_replication_slot
pg_create_physical_replication_slot
pg_drop_replication_slot

The first three of these take OID, so a null argument will normally look
like a zero to them, resulting in "ERROR: could not open relation with OID
0" for brin_summarize_new_values, and no action for the pg_stat_reset_XXX
functions.  The other three will dump core on a null argument, though this
is mitigated by the fact that they won't do so until after checking that
the caller is superuser or has rolreplication privilege.

In addition, the pg_logical_slot_get/peek[_binary]_changes family was
intentionally marked nonstrict, but failed to make nullness checks on all
the arguments; so again a null-pointer-dereference crash is possible but
only for superusers and rolreplication users.

Add the missing ARGISNULL checks to the latter functions, and mark the
former functions as strict in pg_proc.  Make that change in the back
branches too, even though we can't force initdb there, just so that
installations initdb'd in future won't have the issue.  Since none of these
bugs rise to the level of security issues (and indeed the pg_stat_reset_XXX
functions hardly misbehave at all), it seems sufficient to do this.

In addition, fix some order-of-operations oddities in the slot_get_changes
family, mostly cosmetic, but not the part that moves the function's last
few operations into the PG_TRY block.  As it stood, there was significant
risk for an error to exit without clearing historical information from
the system caches.

The slot_get_changes bugs go back to 9.4 where that code was introduced.
Back-patch appropriate subsets of the pg_proc changes into all active
branches, as well.
2016-01-09 16:58:32 -05:00
Simon Riggs b602842613 Revoke change to rmgr desc of btree vacuum
Per discussion with Andres Freund
2016-01-09 18:31:08 +00:00
Simon Riggs 687f2cd7a0 Avoid pin scan for replay of XLOG_BTREE_VACUUM
Replay of XLOG_BTREE_VACUUM during Hot Standby was previously thought to require
complex interlocking that matched the requirements on the master. This required
an O(N) operation that became a significant problem with large indexes, causing
replication delays of seconds or in some cases minutes while the
XLOG_BTREE_VACUUM was replayed.

This commit skips the “pin scan” that was previously required, by observing in
detail when and how it is safe to do so, with full documentation. The pin scan
is skipped only in replay; the VACUUM code path on master is not touched here.

The current commit still performs the pin scan for toast indexes, though this
can also be avoided if we recheck scans on toast indexes. Later patch will
address this.

No tests included. Manual tests using an additional patch to view WAL records
and their timing have shown the change in WAL records and their handling has
successfully reduced replication delay.
2016-01-09 10:10:08 +00:00
Tom Lane a54676acad Marginal cleanup of GROUPING SETS code in grouping_planner().
Improve comments and make it a shade less messy.  I think we might want
to move all of this somewhere else later, but it needs to be more
readable first.

In passing, re-pgindent the file, affecting some recently-added comments
concerning parallel query planning.
2016-01-07 20:32:35 -05:00
Tom Lane c44d013835 Delay creation of subplan tlist until after create_plan().
Once upon a time it was necessary for grouping_planner() to determine
the tlist it wanted from the scan/join plan subtree before it called
query_planner(), because query_planner() would actually make a Plan using
that.  But we refactored things a long time ago to delay construction of
the Plan tree till later, so there's no need to build that tlist until
(and indeed unless) we're ready to plaster it onto the Plan.  The only
thing query_planner() cares about is what Vars are going to be needed for
the tlist, and it can perfectly well get that by looking at the real tlist
rather than some masticated version.

Well, actually, there is one minor glitch in that argument, which is that
make_subplanTargetList also adds Vars appearing only in HAVING to the
tlist it produces.  So now we have to account for HAVING explicitly in
build_base_rel_tlists.  But that just adds a few lines of code, and
I doubt it moves the needle much on processing time; we might be doing
pull_var_clause() twice on the havingQual, but before we had it scanning
dummy tlist entries instead.

This is a very small down payment on rationalizing grouping_planner
enough so it can be refactored.
2016-01-07 20:23:57 -05:00
Alvaro Herrera b1a9bad9e7 pgstat: add WAL receiver status view & SRF
This new view provides insight into the state of a running WAL receiver
in a HOT standby node.
The information returned includes the PID of the WAL receiver process,
its status (stopped, starting, streaming, etc), start LSN and TLI, last
received LSN and TLI, timestamp of last message send and receipt, latest
end-of-WAL LSN and time, and the name of the slot (if any).

Access to the detailed data is only granted to superusers; others only
get the PID.

Author: Michael Paquier
Reviewer: Haribabu Kommi
2016-01-07 16:21:19 -03:00
Tom Lane 6b1a837f69 Remove vestigial CHECK_FOR_INTERRUPTS call.
Commit e710b65c inserted code in md5_crypt_verify to disable and later
re-enable interrupts, with a CHECK_FOR_INTERRUPTS call as part of the
second step, to process any interrupts that had been held off.  Commit
6647248e removed the interrupt disable/re-enable code, but left behind
the CHECK_FOR_INTERRUPTS, even though this is now an entirely random,
pointless place for one.  md5_crypt_verify doesn't run long enough to
need such a check, and if it did, this would still be the wrong place
to put one.
2016-01-07 11:26:54 -05:00
Tom Lane 5e0b5dcab6 Provide more detail in postmaster log for password authentication failures.
We tell people to examine the postmaster log if they're unsure why they are
getting auth failures, but actually only a few relatively-uncommon failure
cases were given their own log detail messages in commit 64e43c59b8.
Expand on that so that every failure case detected within md5_crypt_verify
gets a specific log detail message.  This should cover pretty much every
ordinary password auth failure cause.

So far I've not noticed user demand for a similar level of auth detail
for the other auth methods, but sooner or later somebody might want to
work on them.  This is not that patch, though.
2016-01-07 11:19:33 -05:00
Alvaro Herrera a967613911 Windows: Make pg_ctl reliably detect service status
pg_ctl is using isatty() to verify whether the process is running in a
terminal, and if not it sends its output to Windows' Event Log ... which
does the wrong thing when the output has been redirected to a pipe, as
reported in bug #13592.

To fix, make pg_ctl use the code we already have to detect service-ness:
in the master branch, move src/backend/port/win32/security.c to src/port
(with suitable tweaks so that it runs properly in backend and frontend
environments); pg_ctl already has access to pgport so it Just Works.  In
older branches, that's likely to cause trouble, so instead duplicate the
required code in pg_ctl.c.

Author: Michael Paquier
Bug report and diagnosis: Egon Kocjan
Backpatch: all supported branches
2016-01-07 11:59:08 -03:00
Tom Lane 4bf87169cc Comment typo fix.
Per Amit Langote.
2016-01-06 11:06:51 -05:00
Alvaro Herrera abb1733922 Add scale(numeric)
Author: Marko Tiikkaja
2016-01-05 19:02:13 -03:00
Tom Lane 419400c5da Remove some ancient and unmaintained encoding-conversion test cruft.
In commit 921191912c I claimed that we weren't testing encoding
conversion functions, but further poking around reveals that we did
have an equivalent though hard-wired set of tests in conversion.sql.
AFAICS there is no advantage to doing it like that as compared to letting
the catalog contents drive the test, so let the opr_sanity addition stand
and remove the now-redundant tests in conversion.sql.

Also, remove some infrastructure in src/backend/utils/mb/conversion_procs
for building conversion.sql's list of tests.  That was unmaintained, and
had not corresponded to the actual contents of conversion.sql since 2007
or perhaps even further back.
2016-01-05 16:43:40 -05:00
Tom Lane ea0d494dae Make the to_reg*() functions accept text not cstring.
Using cstring as the input type was a poor decision, because that's not
really a full-fledged type.  In particular, it lacks implicit coercions
from text or varchar, meaning that usages like to_regproc('foo'||'bar')
wouldn't work; basically the only case that did work without explicit
casting was a simple literal constant argument.

The lack of field complaints about this suggests that hardly anyone
is using these functions, so hopefully fixing it won't cause much of
a compatibility problem.  They've only been there since 9.4, anyway.

Petr Korobeinikov
2016-01-05 13:02:43 -05:00
Alvaro Herrera efa318bcfa Make pg_shseclabel available in early backend startup
While the in-core authentication mechanism doesn't need to access
pg_shseclabel at all, it's reasonable to think that an authentication
hook will want to look at the label for the role logging in, or for rows
in other catalogs used during the authentication phase of startup.

Catalog version bumped, because this changes the "is nailed" status for
pg_shseclabel.

Author: Adam Brightwell
2016-01-05 14:50:53 -03:00
Tom Lane 5d35438273 Adjust behavior of row_security GUC to match the docs.
Some time back we agreed that row_security=off should not be a way to
bypass RLS entirely, but only a way to get an error if it was being
applied.  However, the code failed to act that way for table owners.
Per discussion, this is a must-fix bug for 9.5.0.

Adjust the logic in rls.c to behave as expected; also, modify the
error message to be more consistent with the new interpretation.
The regression tests need minor corrections as well.  Also update
the comments about row_security in ddl.sgml to be correct.  (The
official description of the GUC in config.sgml is already correct.)

I failed to resist the temptation to do some other very minor
cleanup as well, such as getting rid of a duplicate extern declaration.
2016-01-04 12:21:41 -05:00
Tom Lane b0cadc08fe Fix regrole and regnamespace output functions to do quoting, too.
We discussed this but somehow failed to implement it...
2016-01-04 01:53:24 -05:00
Tom Lane fb1227af67 Fix regrole and regnamespace types to honor quoting like other reg* types.
Aside from any consistency arguments, this is logically necessary because
the I/O functions for these types also handle numeric OID values.  Without
a quoting rule it is impossible to distinguish numeric OIDs from role or
namespace names that happen to contain only digits.

Also change the to_regrole and to_regnamespace functions to dequote their
arguments.  While not logically essential, this seems like a good idea
since the other to_reg* functions do it.  Anyone who really wants raw
lookup of an uninterpreted name can fall back on the time-honored solution
of (SELECT oid FROM pg_namespace WHERE nspname = whatever).

Report and patch by Jim Nasby, reviewed by Michael Paquier
2016-01-04 01:03:53 -05:00
Tom Lane f47b602df8 Fix bogus lock release in RemovePolicyById and RemoveRoleFromObjectPolicy.
Can't release the AccessExclusiveLock on the target table until commit.
Otherwise there is a race condition whereby other backends might service
our cache invalidation signals before they can actually see the updated
catalog rows.

Just to add insult to injury, RemovePolicyById was closing the rel (with
incorrect lock drop) and then passing the now-dangling rel pointer to
CacheInvalidateRelcache.  Probably the reason this doesn't fall over on
CLOBBER_CACHE buildfarm members is that some outer level of the DROP logic
is still holding the rel open ... but it'd have bit us on the arse
eventually, no doubt.
2016-01-03 20:53:35 -05:00
Tom Lane 939d10cd87 Guard against null arguments in binary_upgrade_create_empty_extension().
The CHECK_IS_BINARY_UPGRADE macro is not sufficient security protection
if we're going to dereference pass-by-reference arguments before it.

But in any case we really need to explicitly check PG_ARGISNULL for all
the arguments of a non-strict function, not only the ones we expect null
values for.

Oversight in commits 30982be4e5 and
f92fc4c95d.  Found by Andreas Seltenreich.
(The other usages in pg_upgrade_support.c seem safe.)
2016-01-03 16:26:38 -05:00
Tom Lane 90e61df813 Fix treatment of *lpNumberOfBytesRecvd == 0: that's a completion condition.
pgwin32_recv() has treated a non-error return of zero bytes from WSARecv()
as being a reason to block ever since the current implementation was
introduced in commit a4c40f140d.  However, so far as one can tell
from Microsoft's documentation, that is just wrong: what it means is
graceful connection closure (in stream protocols) or receipt of a
zero-length message (in message protocols), and neither case should result
in blocking here.  The only reason the code worked at all was that control
then fell into the retry loop, which did *not* treat zero bytes specially,
so we'd get out after only wasting some cycles.  But as of 9.5 we do not
normally reach the retry loop and so the bug is exposed, as reported by
Shay Rojansky and diagnosed by Andres Freund.

Remove the unnecessary test on the byte count, and rearrange the code
in the retry loop so that it looks identical to the initial sequence.

Back-patch to 9.5.  The code is wrong all the way back, AFAICS, but
since it's relatively harmless in earlier branches we'll leave it alone.
2016-01-03 13:56:29 -05:00
Tom Lane 7157fe80f4 Fix overly-strict assertions in spgtextproc.c.
spg_text_inner_consistent is capable of reconstructing an empty string
to pass down to the next index level; this happens if we have an empty
string coming in, no prefix, and a dummy node label.  (In practice, what
is needed to trigger that is insertion of a whole bunch of empty-string
values.)  Then, we will arrive at the next level with in->level == 0
and a non-NULL (but zero length) in->reconstructedValue, which is valid
but the Assert tests weren't expecting it.

Per report from Andreas Seltenreich.  This has no impact in non-Assert
builds, so should not be a problem in production, but back-patch to
all affected branches anyway.

In passing, remove a couple of useless variable initializations and
shorten the code by not duplicating DatumGetPointer() calls.
2016-01-02 16:24:50 -05:00
Bruce Momjian ee94300446 Update copyright for 2016
Backpatch certain files through 9.1
2016-01-02 13:33:40 -05:00
Noah Misch dfcd9cb302 Cover heap_page_prune_opt()'s cleanup lock tactic in README.
Jeff Janes, reviewed by Jim Nasby.
2016-01-01 21:52:22 -05:00
Tom Lane c7e27becd2 Teach flatten_reloptions() to quote option values safely.
flatten_reloptions() supposed that it didn't really need to do anything
beyond inserting commas between reloption array elements.  However, in
principle the value of a reloption could be nearly anything, since the
grammar allows a quoted string there.  Any restrictions on it would come
from validity checking appropriate to the particular option, if any.

A reloption value that isn't a simple identifier or number could thus lead
to dump/reload failures due to syntax errors in CREATE statements issued
by pg_dump.  We've gotten away with not worrying about this so far with
the core-supported reloptions, but extensions might allow reloption values
that cause trouble, as in bug #13840 from Kouhei Sutou.

To fix, split the reloption array elements explicitly, and then convert
any value that doesn't look like a safe identifier to a string literal.
(The details of the quoting rule could be debated, but this way is safe
and requires little code.)  While we're at it, also quote reloption names
if they're not safe identifiers; that may not be a likely problem in the
field, but we might as well try to be bulletproof here.

It's been like this for a long time, so back-patch to all supported
branches.

Kouhei Sutou, adjusted some by me
2016-01-01 15:27:53 -05:00
Tom Lane 3c93a60f60 Add some more defenses against silly estimates to gincostestimate().
A report from Andy Colson showed that gincostestimate() was not being
nearly paranoid enough about whether to believe the statistics it finds in
the index metapage.  The problem is that the metapage stats (other than the
pending-pages count) are only updated by VACUUM, and in the worst case
could still reflect the index's original empty state even when it has grown
to many entries.  We attempted to deal with that by scaling up the stats to
match the current index size, but if nEntries is zero then scaling it up
still gives zero.  Moreover, the proportion of pages that are entry pages
vs. data pages vs. pending pages is unlikely to be estimated very well by
scaling if the index is now orders of magnitude larger than before.

We can improve matters by expanding the use of the rule-of-thumb estimates
I introduced in commit 7fb008c5ee59b040: if the index has grown by more
than a cutoff amount (here set at 4X growth) since VACUUM, then use the
rule-of-thumb numbers instead of scaling.  This might not be exactly right
but it seems much less likely to produce insane estimates.

I also improved both the scaling estimate and the rule-of-thumb estimate
to account for numPendingPages, since it's reasonable to expect that that
is accurate in any case, and certainly pages that are in the pending list
are not either entry or data pages.

As a somewhat separate issue, adjust the estimation equations that are
concerned with extra fetches for partial-match searches.  These equations
suppose that a fraction partialEntries / numEntries of the entry and data
pages will be visited as a consequence of a partial-match search.  Now,
it's physically impossible for that fraction to exceed one, but our
estimate of partialEntries is mostly bunk, and our estimate of numEntries
isn't exactly gospel either, so we could arrive at a silly value.  In the
example presented by Andy we were coming out with a value of 100, leading
to insane cost estimates.  Clamp the fraction to one to avoid that.

Like the previous patch, back-patch to all supported branches; this
problem can be demonstrated in one form or another in all of them.
2016-01-01 13:42:21 -05:00
Noah Misch 3cd1ba147e Fix comments about WAL rule "write xlog before data" versus pg_multixact.
Recovery does not achieve its goal of zeroing all pg_multixact entries
whose accompanying WAL records never reached disk.  Remove that claim
and justify its expendability.  Detail the need for TrimMultiXact(),
which has little in common with the TrimCLOG() rationale.  Merge two
tightly-related comments.  Stop presenting pg_multixact as specific to
heap_lock_tuple(); PostgreSQL 9.3 extended its use to heap_update().

Noticed while investigating a report from Andres Freund.
2016-01-01 01:46:46 -05:00
Tom Lane 0dab5ef39b Fix ALTER OPERATOR to update dependencies properly.
Fix an oversight in commit 321eed5f0f7563a0: replacing an operator's
selectivity functions needs to result in a corresponding update in
pg_depend.  We have a function that can handle that, but it was not
called by AlterOperator().

To fix this without enlarging pg_operator.h's #include list beyond
what clients can safely include, split off the function definitions
into a new file pg_operator_fn.h, similarly to what we've done for
some other catalog header files.  It's not entirely clear whether
any client-side code needs to include pg_operator.h, but it seems
prudent to assume that there is some such code somewhere.
2015-12-31 17:37:31 -05:00
Tom Lane e5d06f2b12 Dept of second thoughts: the !scan_all exit mustn't increase scanned_pages.
In the extreme edge case where contended pages are the only ones that
escape being scanned, the previous commit would have allowed us to think
that relfrozenxid could be advanced, which is exactly wrong.
2015-12-30 17:32:23 -05:00
Tom Lane e842908233 Avoid useless truncation attempts during VACUUM.
VACUUM can skip heap pages altogether when there's a run of consecutive
pages that are all-visible according to the visibility map.  This causes it
to not update its nonempty_pages count, just as if those pages were empty,
which means that at the end we will think they are candidates for deletion.
Thus, we may take the table's AccessExclusive lock only to find that no
pages are really truncatable.  This usually causes no real problems on a
master server, thanks to the lock being acquired only conditionally; but on
hot-standby servers, the same lock must be acquired unconditionally which
can result in unnecessary query cancellations.

To improve matters, force examination of the table's last page whenever
we reach there with a nonempty_pages count that would allow a truncation
attempt.  If it's not empty, we'll advance nonempty_pages and thereby
prevent the truncation attempt.

If we are unable to acquire cleanup lock on that page, there's no need to
force it, unless we're doing an anti-wraparound vacuum.  We can just check
for tuples with a shared buffer lock and then give up.  (When we are doing
an anti-wraparound vacuum, and decide it's okay to skip the page because it
contains no freezable tuples, this patch still improves matters because
nonempty_pages is properly updated, which it was not before.)

Since only the last page is special-cased in this way, we might attempt a
truncation that will release many fewer pages than the normal heuristic
would suggest; at worst, only one page would be truncated.  But that seems
all right, because the situation won't repeat during the next vacuum.
The real problem with the old logic is that the useless truncation attempt
happens every time we vacuum, so long as the state of the last few dozen
pages doesn't change.

This is a longstanding deficiency, but since the consequences aren't very
severe in most scenarios, I'm not going to risk a back-patch.

Jeff Janes and Tom Lane
2015-12-30 17:13:15 -05:00
Tom Lane efe4c9d704 Add some comments about division of labor between rewriter and planner.
The rationale for the way targetlist processing is done wasn't clearly
stated anywhere, and I for one had forgotten some of the details.  Having
just painfully re-learned them, add some breadcrumbs for the next person.
2015-12-29 18:50:35 -05:00
Tom Lane fd19525756 Put back one copyObject() in rewriteTargetView().
Commit 6f8cb1e234 tried to centralize rewriteTargetView's copying
of a target view's Query struct.  However, it ignored the fact that the
jointree->quals field was used twice.  This only accidentally failed to
fail immediately because the same ChangeVarNodes mutation is applied in
both cases, so that we end up with logically identical expression trees
for both uses (and, as the code stands, the second ChangeVarNodes call
actually does nothing).  However, we end up linking *physically*
identical expression trees into both an RTE's securityQuals list and
the WithCheckOption list.  That's pretty dangerous, mainly because
prepsecurity.c is utterly cavalier about further munging such structures
without copying them first.

There may be no live bug in HEAD as a consequence of the fact that we apply
preprocess_expression in between here and prepsecurity.c, and that will
make a copy of the tree anyway.  Or it may just be that the regression
tests happen to not trip over it.  (I noticed this only because things
fell over pretty badly when I tried to relocate the planner's call of
expand_security_quals to before expression preprocessing.)  In any case
it's very fragile because if anyone tried to make the securityQuals and
WithCheckOption trees diverge before we reach preprocess_expression, it
would not work.  The fact that the current code will preprocess
securityQuals and WithCheckOptions lists at completely different times in
different query levels does nothing to increase my trust that that can't
happen.

In view of the fact that 9.5.0 is almost upon us and the aforesaid commit
has seen exactly zero field testing, the prudent course is to make an extra
copy of the quals so that the behavior is not different from what has been
in the field during beta.
2015-12-29 16:45:47 -05:00
Joe Conway 241448b23a Rename (new|old)estCommitTs to (new|old)estCommitTsXid
The variables newestCommitTs and oldestCommitTs sound as if they are
timestamps, but in fact they are the transaction Ids that correspond
to the newest and oldest timestamps rather than the actual timestamps.
Rename these variables to reflect that they are actually xids: to wit
newestCommitTsXid and oldestCommitTsXid respectively. Also modify
related code in a similar fashion, particularly the user facing output
emitted by pg_controldata and pg_resetxlog.

Complaint and patch by me, review by Tom Lane and Alvaro Herrera.
Backpatch to 9.5 where these variables were first introduced.
2015-12-28 12:34:11 -08:00
Tom Lane fec1ad94df Include typmod when complaining about inherited column type mismatches.
MergeAttributes() rejects cases where columns to be merged have the same
type but different typmod, which is correct; but the error message it
printed didn't show either typmod, which is unhelpful.  Changing this
requires using format_type_with_typemod() in place of TypeNameToString(),
which will have some minor side effects on the way some type names are
printed, but on balance this is an improvement: the old code sometimes
printed one type according to one set of rules and the other type according
to the other set, which could be confusing in its own way.

Oddly, there were no regression test cases covering any of this behavior,
so add some.

Complaint and fix by Amit Langote
2015-12-26 13:41:29 -05:00
Tom Lane 3d2b31e30e Fix brin_summarize_new_values() to check index type and ownership.
brin_summarize_new_values() did not check that the passed OID was for
an index at all, much less that it was a BRIN index, and would fail in
obscure ways if it wasn't (possibly damaging data first?).  It also
lacked any permissions test; by analogy to VACUUM, we should only allow
the table's owner to summarize.

Noted by Jeff Janes, fix by Michael Paquier and me
2015-12-26 12:56:09 -05:00
Robert Haas bc7fcab5e3 Read from the same worker repeatedly until it returns no tuple.
The original coding read tuples from workers in round-robin fashion,
but performance testing shows that it works much better to read enough
to empty one queue before moving on to the next.  I believe the
reason for this is that, with the old approach, we could easily wake
up a worker repeatedly to write only one new tuple into the shm_mq
each time.  With this approach, by the time the process gets scheduled,
it has a decent chance of being able to fill the entire buffer in
one go.

Patch by me.  Dilip Kumar helped with performance testing.
2015-12-23 14:06:52 -05:00
Robert Haas 51d152f18e Change Gather not to use a physical tlist.
This should have been part of the original commit, but was missed.
Pushing data between processes is expensive, so we definitely want
to project away unneeded columns here, just as we do for other nodes
like Sort and Hash that care about the volume of data.
2015-12-23 13:41:06 -05:00
Peter Eisentraut 30c0c4bf12 Remove unnecessary escaping in C character literals
'\"' is more commonly written simply as '"'.
2015-12-22 22:43:46 -05:00
Tom Lane 6efbded6e4 Allow omitting one or both boundaries in an array slice specifier.
Omitted boundaries represent the upper or lower limit of the corresponding
array subscript.  This allows simpler specification of many common
use-cases.

(Revised version of commit 9246af6799)

YUriy Zhuravlev
2015-12-22 21:05:29 -05:00
Robert Haas 0ba3f3bc65 Comment improvements for abbreviated keys.
Peter Geoghegan and Robert Haas
2015-12-22 13:57:18 -05:00
Robert Haas ccd8f97922 postgres_fdw: Consider requesting sorted data so we can do a merge join.
When use_remote_estimate is enabled, consider adding ORDER BY to the
query we sending to the remote server so that we can use that ordered
data for a merge join.  Commit f18c944b61
arranges to push down the query pathkeys, which seems like the case
mostly likely to be a win, but testing shows this can sometimes win,
too.

For a regular table, we know which indexes are present and therefore
test whether the ordering provided by each such index is useful.  Here,
we take the opposite approach: guess what orderings would be useful if
they could be generated cheaply, and then ask the remote side what those
will cost.

Ashutosh Bapat, with very substantial cosmetic revisions by me.  Also
reviewed by Rushabh Lathia.
2015-12-22 13:46:40 -05:00
Stephen Frost 6f8cb1e234 Make viewquery a copy in rewriteTargetView()
Rather than expect the Query returned by get_view_query() to be
read-only and then copy bits and pieces of it out, simply copy the
entire structure when we get it.  This addresses an issue where
AcquireRewriteLocks, which is called by acquireLocksOnSubLinks(),
scribbles on the parsetree passed in, which was actually an entry
in relcache, leading to segfaults with certain view definitions.
This also future-proofs us a bit for anyone adding more code to this
path.

The acquireLocksOnSubLinks() was added in commit c3e0ddd40.

Back-patch to 9.3 as that commit was.
2015-12-21 10:34:14 -05:00
Teodor Sigaev bbbd807097 Revert 9246af6799 because
I miss too much. Patch is returned to commitfest process.
2015-12-18 21:35:22 +03:00
Robert Haas 6e7b335930 Remove duplicate word.
Kyotaro Horiguchi
2015-12-18 12:43:52 -05:00
Robert Haas 2bdfcb52c5 Fix TupleQueueReaderNext not to ignore its nowait argument.
This was a silly goof on my (rhaas's) part.

Report and fix by Rushabh Lathia.
2015-12-18 12:37:43 -05:00
Robert Haas 4496226782 Fix copy-and-paste error in logical decoding callback.
This could result in the error context misidentifying where the error
actually occurred.

Craig Ringer
2015-12-18 12:17:35 -05:00
Robert Haas 9a51698bae Fix typo in comment.
Amit Langote
2015-12-18 12:03:15 -05:00
Teodor Sigaev 9246af6799 Allow to omit boundaries in array subscript
Allow to omiy lower or upper or both boundaries in array subscript
for selecting slice of array.

Author: YUriy Zhuravlev
2015-12-18 15:18:58 +03:00
Tom Lane c4a8812cf6 Use just one standalone-backend session for initdb's post-bootstrap steps.
Previously, each subroutine in initdb fired up its own standalone backend
session.  Over time we'd grown as many as fifteen of these sessions,
and the cumulative startup and shutdown work for them was getting pretty
noticeable.  Combining things so that all these steps share a single
backend session cuts a good 10% off the total runtime of initdb, more
if you're not fsync'ing.

The main stumbling block to doing this before was that some of the sessions
were run with -j and some not.  The improved definition of -j mode
implemented by my previous commit makes it possible to fix that by running
all the post-bootstrap steps with -j; we just have to use double instead of
single newlines to end command strings.  (This is only absolutely necessary
around the VACUUM and CREATE DATABASE steps, since those can't be run in a
transaction block.  But it seems best to make them all use double newlines
so that the commands remain separate for error-reporting purposes.)

A minor disadvantage is that since initdb can't tell how much of its
output the backend has executed, we can no longer have the per-step
progress reporting initdb used to print.  But things are fast enough
nowadays that that's not really all that useful anyway.

In passing, add more const decoration to some of the static arrays in
initdb.c.
2015-12-17 19:38:21 -05:00
Tom Lane 66d947b9d3 Adjust behavior of single-user -j mode for better initdb error reporting.
Previously, -j caused the entire input file to be read in and executed as
a single command string.  That's undesirable, not least because any error
causes the entire file to be regurgitated as the "failing query".  Some
experimentation suggests a better rule: end the command string when we see
a semicolon immediately followed by two newlines, ie, an empty line after
a query.  This serves nicely to break up the existing examples such as
information_schema.sql and system_views.sql.  A limitation is that it's
no longer possible to write such a sequence within a string literal or
multiline comment in a file meant to be read with -j; but there are no
instances of such a problem within the data currently used by initdb.
(If someone does make such a mistake in future, it'll be obvious because
they'll get an unterminated-literal or unterminated-comment syntax error.)
Other than that, there shouldn't be any negative consequences; you're not
forced to end statements that way, it's just a better idea in most cases.

In passing, remove src/include/tcop/tcopdebug.h, which is dead code
because it's not included anywhere, and hasn't been for more than
ten years.  One of the debug-support symbols it purported to describe
has been unreferenced for at least the same amount of time, and the
other is removed by this commit on the grounds that it was useless:
forcing -j mode all the time would have broken initdb.  The lack of
complaints about that, or about the missing inclusion, shows that
no one has tried to use TCOP_DONTUSENEWLINE in many years.
2015-12-17 19:34:15 -05:00
Alvaro Herrera 756e7b4c9d Rework internals of changing a type's ownership
This is necessary so that REASSIGN OWNED does the right thing with
composite types, to wit, that it also alters ownership of the type's
pg_class entry -- previously, the pg_class entry remained owned by the
original user, which caused later other failures such as the new owner's
inability to use ALTER TYPE to rename an attribute of the affected
composite.  Also, if the original owner is later dropped, the pg_class
entry becomes owned by a non-existant user which is bogus.

To fix, create a new routine AlterTypeOwner_oid which knows whether to
pass the request to ATExecChangeOwner or deal with it directly, and use
that in shdepReassignOwner rather than calling AlterTypeOwnerInternal
directly.  AlterTypeOwnerInternal is now simpler in that it only
modifies the pg_type entry and recurses to handle a possible array type;
higher-level tasks are handled by either AlterTypeOwner directly or
AlterTypeOwner_oid.

I took the opportunity to add a few more objects to the test rig for
REASSIGN OWNED, so that more cases are exercised.  Additional ones could
be added for superuser-only-ownable objects (such as FDWs and event
triggers) but I didn't want to push my luck by adding a new superuser to
the tests on a backpatchable bug fix.

Per bug #13666 reported by Chris Pacejo.

Backpatch to 9.5.

(I would back-patch this all the way back, except that it doesn't apply
cleanly in 9.4 and earlier because 59367fdf9 wasn't backpatched.  If we
decide that we need this in earlier branches too, we should backpatch
both.)
2015-12-17 14:25:41 -03:00
Robert Haas b648b70342 Speed up CREATE INDEX CONCURRENTLY's TID sort.
Encode TIDs as 64-bit integers to speed up comparisons.  This seems to
speed things up on all platforms, but is even more beneficial when
8-byte integers are passed by value.

Peter Geoghegan.  Design suggestions and review by Tom Lane.  Review
also by Simon Riggs and by me.
2015-12-16 15:23:45 -05:00
Robert Haas f27a6b15e6 Mark CHECK constraints declared NOT VALID valid if created with table.
FOREIGN KEY constraints have behaved this way for a long time, but for
some reason the behavior of CHECK constraints has been inconsistent up
until now.

Amit Langote and Amul Sul, with assorted tweaks by me.
2015-12-16 07:43:56 -05:00
Robert Haas 049469e7e7 Teach mdnblocks() not to create zero-length files.
It's entirely surprising that mdnblocks() has the side effect of
creating new files on disk, so let's make it not do that.  One
consequence of the old behavior is that, if running on a damaged
cluster that is missing a file, mdnblocks() can recreate the file
and allow a subsequent _mdfd_getseg() for a higher segment to succeed.
This happens because, while mdnblocks() stops when it finds a segment
that is shorter than 1GB, _mdfd_getseg() has no such check, and thus
the empty file created by mdnblocks() can allow it to continue its
traversal and find higher-numbered segments which remain.

It might be a good idea for _mdfd_getseg() to actually verify that
each segment it finds is exactly 1GB before proceeding to the next
one, but that would involve some additional system calls, so for
now I'm just doing this much.

Patch by me, per off-list analysis by Kevin Grittner and Rahila Syed.
Review by Andres Freund.
2015-12-15 13:57:45 -05:00
Robert Haas 6150a1b08a Move buffer I/O and content LWLocks out of the main tranche.
Move the content lock directly into the BufferDesc, so that locking and
pinning a buffer touches only one cache line rather than two.  Adjust
the definition of BufferDesc slightly so that this doesn't make the
BufferDesc any larger than one cache line (at least on platforms where
a spinlock is only 1 or 2 bytes).

We can't fit the I/O locks into the BufferDesc and stay within one
cache line, so move those to a completely separate tranche.  This
leaves a relatively limited number of LWLocks in the main tranche, so
increase the padding of those remaining locks to a full cache line,
rather than allowing adjacent locks to share a cache line, hopefully
reducing false sharing.

Performance testing shows that these changes make little difference
on laptop-class machines, but help significantly on larger servers,
especially those with more than 2 sockets.

Andres Freund, originally based on an earlier patch by Simon Riggs.
Review and cosmetic adjustments (including heavy rewriting of the
comments) by me.
2015-12-15 13:32:54 -05:00
Robert Haas 3fed417452 Provide a way to predefine LWLock tranche IDs.
It's a bit cumbersome to use LWLockNewTrancheId(), because the returned
value needs to be shared between backends so that each backend can call
LWLockRegisterTranche() with the correct ID.  So, for built-in tranches,
use a hard-coded value instead.

This is motivated by an upcoming patch adding further built-in tranches.

Andres Freund and Robert Haas
2015-12-15 11:48:19 -05:00
Stephen Frost e5e11c8cca Collect the global OR of hasRowSecurity flags for plancache
We carry around information about if a given query has row security or
not to allow the plancache to use that information to invalidate a
planned query in the event that the environment changes.

Previously, the flag of one of the subqueries was simply being copied
into place to indicate if the query overall included RLS components.
That's wrong as we need the global OR of all subqueries.  Fix by
changing the code to match how fireRIRules works, which is results
in OR'ing all of the flags.

Noted by Tom.

Back-patch to 9.5 where RLS was introduced.
2015-12-14 20:05:43 -05:00
Alvaro Herrera 0d8f3d5d11 Add missing CHECK_FOR_INTERRUPTS in lseg_inside_poly
Apparently, there are bugs in this code that cause it to loop endlessly.
That bug still needs more research, but in the meantime it's clear that
the loop is missing a check for interrupts so that it can be cancelled
timely.

Backpatch to 9.1 -- this has been missing since 49475aab8d.
2015-12-14 16:44:40 -03:00
Andres Freund cca705a5d9 Fix bug in SetOffsetVacuumLimit() triggered by find_multixact_start() failure.
Previously, if find_multixact_start() failed, SetOffsetVacuumLimit() would
install 0 into MultiXactState->offsetStopLimit if it previously succeeded.
Luckily, there are no known cases where find_multixact_start() will return
an error in 9.5 and above. But if it were to happen, for example due to
filesystem permission issues, it'd be somewhat bad: GetNewMultiXactId()
could continue allocating mxids even if close to a wraparound, or it could
erroneously stop allocating mxids, even if no wraparound is looming.  The
wrong value would be corrected the next time SetOffsetVacuumLimit() is
called, or by a restart.

Reported-By: Noah Misch, although this is not his preferred fix
Discussion: 20151210140450.GA22278@alap3.anarazel.de
Backpatch: 9.5, where the bug was introduced as part of 4f627f
2015-12-14 11:34:16 +01:00