Commit Graph

15762 Commits

Author SHA1 Message Date
Robert Haas
6c878a7553 Avoid server crash when worker registration fails at execution time.
The previous coding attempts to destroy the DSM in this case, but
child nodes might have stored data there and still be holding onto
pointers in this case.  So don't do that.

Also, free the reader array instead of leaking it.

Extracted from two different patch versions both by Amit Kapila.
2015-11-20 13:03:39 -05:00
Robert Haas
74d0d5f3eb Fix typo in comment.
Amit Langote
2015-11-19 16:45:39 -05:00
Robert Haas
fea2b642fd Remove numbers from incorrectly-numbered list.
Reported by Andres Freund.
2015-11-19 16:45:13 -05:00
Robert Haas
bc4996e61b Make ALTER .. SET SCHEMA do nothing, instead of throwing an ERROR.
This was already true for CREATE EXTENSION, but historically has not
been true for other object types.  Therefore, this is a backward
incompatibility.  Per discussion on pgsql-hackers, everyone seems to
agree that the new behavior is better.

Marti Raudsepp, reviewed by Haribabu Kommi and myself
2015-11-19 10:49:25 -05:00
Robert Haas
7907a949ab Fix incomplete set_foreignscan_references handling for fdw_recheck_quals
KaiGai Kohei
2015-11-18 22:12:21 -05:00
Andres Freund
d3c8ac114f Remove function names from some elog() calls in heapam.c.
At least one of the names was, due to a function renaming late in the
development of ON CONFLICT, wrong. Since including function names in
error messages is against the message style guide anyway, remove them
from the messages.

Discussion: CAM3SWZT8paz=usgMVHm0XOETkQvzjRtAUthATnmaHQQY0obnGw@mail.gmail.com
Backpatch: 9.5, where ON CONFLICT was introduced
2015-11-19 01:37:58 +01:00
Robert Haas
166b61a88e Avoid aggregating worker instrumentation multiple times.
Amit Kapila, per design ideas from me.
2015-11-18 12:35:25 -05:00
Robert Haas
adeee97486 Fix dumb bug in tqueue.c
When I wrote this code originally, the intention was to recompute the
remapinfo only when the tupledesc changes.  This presumably only
happens once per query, but I copied the design pattern from other
DestReceivers.  However, due to a silly oversight on my part,
tqueue->tupledesc never got set, leading to recomputation for every
tuple.

This should improve the performance of parallel scans that return a
significant number of tuples.

Report by Amit Kapila; patch by me, reviewed by him.
2015-11-18 08:25:33 -05:00
Tom Lane
5f10b7a604 Fix possible internal overflow in numeric division.
div_var_fast() postpones propagating carries in the same way as mul_var(),
so it has the same corner-case overflow risk we fixed in 246693e5ae,
namely that the size of the carries has to be accounted for when setting
the threshold for executing a carry propagation step.  We've not devised
a test case illustrating the brokenness, but the required fix seems clear
enough.  Like the previous fix, back-patch to all active branches.

Dean Rasheed
2015-11-17 15:46:47 -05:00
Peter Eisentraut
c5ec406412 Message style fix
from Euler Taveira
2015-11-17 06:53:07 -05:00
Peter Eisentraut
5db837d3f2 Message improvements 2015-11-16 21:39:23 -05:00
Robert Haas
e93b62985f Remove volatile qualifiers from bufmgr.c and freelist.c
Prior to commit 0709b7ee72, access to
variables within a spinlock-protected critical section had to be done
through a volatile pointer, but that should no longer be necessary.

Review by Andres Freund
2015-11-16 18:50:06 -05:00
Tom Lane
8004953b5a Speed up ruleutils' name de-duplication code, and fix overlength-name case.
Since commit 11e131854f, ruleutils.c has
attempted to ensure that each RTE in a query or plan tree has a unique
alias name.  However, the code that was added for this could be quite slow,
even as bad as O(N^3) if N identical RTE names must be replaced, as noted
by Jeff Janes.  Improve matters by building a transient hash table within
set_rtable_names.  The hash table in itself reduces the cost of detecting a
duplicate from O(N) to O(1), and we can save another factor of N by storing
the number of de-duplicated names already created for each entry, so that
we don't have to re-try names already created.  This way is probably a bit
slower overall for small range tables, but almost by definition, such cases
should not be a performance problem.

In principle the same problem applies to the column-name-de-duplication
code; but in practice that seems to be less of a problem, first because
N is limited since we don't support extremely wide tables, and second
because duplicate column names within an RTE are fairly rare, so that in
practice the cost is more like O(N^2) not O(N^3).  It would be very much
messier to fix the column-name code, so for now I've left that alone.

An independent problem in the same area was that the de-duplication code
paid no attention to the identifier length limit, and would happily produce
identifiers that were longer than NAMEDATALEN and wouldn't be unique after
truncation to NAMEDATALEN.  This could result in dump/reload failures, or
perhaps even views that silently behaved differently than before.  We can
fix that by shortening the base name as needed.  Fix it for both the
relation and column name cases.

In passing, check for interrupts in set_rtable_names, just in case it's
still slow enough to be an issue.

Back-patch to 9.3 where this code was introduced.
2015-11-16 13:45:17 -05:00
Robert Haas
179c97bf58 Remove accidentally-committed debugging code.
Amit Kapila
2015-11-15 18:07:57 -05:00
Tom Lane
7745bc352a Fix ruleutils.c's dumping of whole-row Vars in ROW() and VALUES() contexts.
Normally ruleutils prints a whole-row Var as "foo.*".  We already knew that
that doesn't work at top level of a SELECT list, because the parser would
treat the "*" as a directive to expand the reference into separate columns,
not a whole-row Var.  However, Joshua Yanovski points out in bug #13776
that the same thing happens at top level of a ROW() construct; and some
nosing around in the parser shows that the same is true in VALUES().
Hence, apply the same workaround already devised for the SELECT-list case,
namely to add a forced cast to the appropriate rowtype in these cases.
(The alternative of just printing "foo" was rejected because it is
difficult to avoid ambiguity against plain columns named "foo".)

Back-patch to all supported branches.
2015-11-15 14:41:09 -05:00
Tom Lane
7d9a4737c2 Improve type numeric's calculations for ln(), log(), exp(), pow().
Set the "rscales" for intermediate-result calculations to ensure that
suitable numbers of significant digits are maintained throughout.  The
previous coding hadn't thought this through in any detail, and as a result
could deliver results with many inaccurate digits, or in the worst cases
even fail with divide-by-zero errors as a result of losing all nonzero
digits of intermediate results.

In exp_var(), get rid entirely of the logic that separated the calculation
into integer and fractional parts: that was neither accurate nor
particularly fast.  The existing range-reduction method of dividing by 2^n
can be applied across the full input range instead of only 0..1, as long as
we are careful to set an appropriate rscale for each step.

Also fix the logic in mul_var() for shortening the calculation when the
caller asks for fewer output digits than an exact calculation would
require.  This bug doesn't affect simple multiplications since that code
path asks for an exact result, but it does contribute to accuracy issues
in the transcendental math functions.

In passing, improve performance of mul_var() a bit by forcing the shorter
input to be on the left, thus reducing the number of iterations of the
outer loop and probably also reducing the number of carry-propagation
steps needed.

This is arguably a bug fix, but in view of the lack of field complaints,
it does not seem worth the risk of back-patching.

Dean Rasheed
2015-11-14 14:55:46 -05:00
Bruce Momjian
e57646e962 Fix spelling error in postgresql.conf
Report by Greg Clough
2015-11-14 14:00:17 -05:00
Robert Haas
fe702a7b3f Move each SLRU's lwlocks to a separate tranche.
This makes it significantly easier to identify these lwlocks in
LWLOCK_STATS or Trace_lwlocks output.  It's also arguably better
from a modularity standpoint, since lwlock.c no longer needs to
know anything about the LWLock needs of the higher-level SLRU
facility.

Ildus Kurbangaliev, reviewd by Álvaro Herrera and by me.
2015-11-12 14:59:09 -05:00
Robert Haas
ac1d7945f8 Make idle backends exit if the postmaster dies.
Letting backends continue to run if the postmaster has exited prevents
PostgreSQL from being restarted, which in many environments is
catastrophic.  Worse, if some other backend crashes, we no longer have
any protection against shared memory corruption.  So, arrange for them
to exit instead.  We don't want to expend many cycles on this, but
including postmaster death in the set of things that we wait for when
a backend is idle seems cheap enough.

Rajeev Rastogi and Robert Haas
2015-11-12 09:00:33 -05:00
Robert Haas
a05dc4d7fd Provide readfuncs support for custom scans.
Commit a0d9f6e434 added this support for
all other plan node types; this fills in the gap.

Since TextOutCustomScan complicates this and is pretty well useless,
remove it.

KaiGai Kohei, with some modifications by me.
2015-11-12 07:40:31 -05:00
Tom Lane
da3751c8ea Be more noisy about "wrong number of nailed relations" initfile problems.
In commit 5d1ff6bd55 I added some logic to
relcache.c to try to ensure that the regression tests would fail if we
made a mistake about which relations belong in the relcache init files.
I'm quite sure I tested that, but I must have done so only for the
non-shared-catalog case, because a report from Adam Brightwell showed that
the regression tests still pass just fine if we bollix the shared-catalog
init file in the way this code was supposed to catch.  The reason is that
that file gets loaded before we do client authentication, so the WARNING
is not sent to the client, only to the postmaster log, where it's far too
easily missed.

The least Rube Goldbergian answer to this is to put an Assert(false)
after the elog(WARNING).  That will certainly get developers' attention,
while not breaking production builds' ability to recover from corner
cases with similar symptoms.

Since this is only of interest to developers, there seems no need for
a back-patch, even though the previous commit went into all branches.
2015-11-11 13:39:21 -05:00
Robert Haas
80558c1f5a Generate parallel sequential scan plans in simple cases.
Add a new flag, consider_parallel, to each RelOptInfo, indicating
whether a plan for that relation could conceivably be run inside of
a parallel worker.  Right now, we're pretty conservative: for example,
it might be possible to defer applying a parallel-restricted qual
in a worker, and later do it in the leader, but right now we just
don't try to parallelize access to that relation.  That's probably
the right decision in most cases, anyway.

Using the new flag, generate parallel sequential scan plans for plain
baserels, meaning that we now have parallel sequential scan in
PostgreSQL.  The logic here is pretty unsophisticated right now: the
costing model probably isn't right in detail, and we can't push joins
beneath Gather nodes, so the number of plans that can actually benefit
from this is pretty limited right now.  Lots more work is needed.
Nevertheless, it seems time to enable this functionality so that all
this code can actually be tested easily by users and developers.

Note that, if you wish to test this functionality, it will be
necessary to set max_parallel_degree to a value greater than the
default of 0.  Once a few more loose ends have been tidied up here, we
might want to consider changing the default value of this GUC, but
I'm leaving it alone for now.

Along the way, fix a bug in cost_gather: the previous coding thought
that a Gather node's transfer overhead should be costed on the basis of
the relation size rather than the number of tuples that actually need
to be passed off to the leader.

Patch by me, reviewed in earlier versions by Amit Kapila.
2015-11-11 09:02:52 -05:00
Robert Haas
f0661c4e8c Make sequential scans parallel-aware.
In addition, this path fills in a number of missing bits and pieces in
the parallel infrastructure.  Paths and plans now have a parallel_aware
flag indicating whether whatever parallel-aware logic they have should
be engaged.  It is believed that we will need this flag for a number of
path/plan types, not just sequential scans, which is why the flag is
generic rather than part of the SeqScan structures specifically.
Also, execParallel.c now gives parallel nodes a chance to initialize
their PlanState nodes from the DSM during parallel worker startup.

Amit Kapila, with a fair amount of adjustment by me.  Review of previous
patch versions by Haribabu Kommi and others.
2015-11-11 08:57:52 -05:00
Robert Haas
f764ecd81b Add outfuncs.c support for GatherPath.
I dunno how commit 3bd909b220 missed
this, but it evidently did.
2015-11-11 06:29:03 -05:00
Tom Lane
b05ae27e9a Add missing "static" qualifier.
Per buildfarm member pademelon.
2015-11-10 18:24:18 -05:00
Robert Haas
5c90a2ffdd Comment update.
Adjust to account for 5fc4c26db5.

Etsuro Fujita
2015-11-09 13:48:50 -05:00
Robert Haas
bf3d015631 Fix rebasing mistake in nodeGather.c
The patches committed as 6e71dd7ce9
and 3a1f8611f2 were developed in
parallel but dependent on each other in a way that I failed to
notice.

This patch to fix the problem was prepared by Amit Kapila.
2015-11-09 10:49:24 -05:00
Robert Haas
89ff5c7f75 Add a dummy return statement to TupleQueueRemap.
This is unreachable for multiple reasons, but per Amit Kapila the
Windows compiler he is using still thinks we can get there.
2015-11-09 10:45:32 -05:00
Andres Freund
f3a764b0da Set replication origin when decoding commit records.
By accident the replication origin was not set properly in
DecodeCommit(). That's bad because the origin is passed to the output
plugins origin filter, and accessible from the output plugin via
ReorderBufferTXN->origin_id.  Accessing the origin of individual changes
worked before the fix, which is why this wasn't notices earlier.

Reported-By: Craig Ringer
Author: Craig Ringer
Discussion: CAMsr+YFhBJLp=qfSz3-J+0P1zLkE8zNXM2otycn20QRMx380gw@mail.gmail.com
Backpatch: 9.5, where replication origins where introduced
2015-11-09 00:03:35 +01:00
Robert Haas
fba60e573e Remove set-but-not-used variables.
Reported by both Peter Eisentraunt and Kevin Grittner.
2015-11-07 20:25:32 -05:00
Tom Lane
c5e86ea932 Add "xid <> xid" and "xid <> int4" operators.
The corresponding "=" operators have been there a long time, and not
having their negators is a bit of a nuisance.

Michael Paquier
2015-11-07 16:40:15 -05:00
Tom Lane
a43b4ab111 Fix enforcement of restrictions inside regexp lookaround constraints.
Lookahead and lookbehind constraints aren't allowed to contain backrefs,
and parentheses within them are always considered non-capturing.  Or so
says the manual.  But the regexp parser forgot about these rules once
inside a parenthesized subexpression, so that constructs like (\w)(?=(\1))
were accepted (but then not correctly executed --- a case like this acted
like (\w)(?=\w), without any enforcement that the two \w's match the same
text).  And in (?=((foo))) the innermost parentheses would be counted as
capturing parentheses, though no text would ever be captured for them.

To fix, properly pass down the "type" argument to the recursive invocation
of parse().

Back-patch to all supported branches; it was agreed that silent
misexecution of such patterns is worse than throwing an error, even though
new errors in minor releases are generally not desirable.
2015-11-07 12:43:24 -05:00
Robert Haas
8d7396e509 Try to convince gcc that TupleQueueRemap never falls off the end.
Without this, MacOS gcc version 4.2.1 isn't convinced.
2015-11-06 23:04:21 -05:00
Robert Haas
6e71dd7ce9 Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend.  Transferring such tuples without modification between two
cooperating backends does not work.  This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves.  The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender.  That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.

Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues.  This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was.  typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
2015-11-06 16:58:45 -05:00
Robert Haas
cbb82e370d Remove unnecessary cast in previous commit.
Noted by Kyotaro Horiguchi, who also reviewed the previous patch, but
I failed to notice his review before committing.
2015-11-06 12:17:31 -05:00
Robert Haas
a76ef15d9f Add sort support routine for the UUID data type.
This introduces a simple encoding scheme to produce abbreviated keys:
pack as many bytes of each UUID as will fit into a Datum.  On
little-endian machines, a byteswap is also performed; the abbreviated
comparator can therefore just consist of a simple 3-way unsigned integer
comparison.

The purpose of this change is to speed up sorting data on a column
of type UUID.

Peter Geoghegan
2015-11-06 12:14:35 -05:00
Stephen Frost
5644419b3d Set include_realm=1 default in parse_hba_line
With include_realm=1 being set down in parse_hba_auth_opt, if multiple
options are passed on the pg_hba line, such as:

host all     all    0.0.0.0/0    gss include_realm=0 krb_realm=XYZ.COM

We would mistakenly reset include_realm back to 1.  Instead, we need to
set include_realm=1 up in parse_hba_line, prior to parsing any of the
additional options.

Discovered by Jeff McCormick during testing.

Bug introduced by 9a08841.

Back-patch to 9.5
2015-11-06 11:18:27 -05:00
Robert Haas
8a1fab36ab pg_size_pretty: Format negative values similar to positive ones.
Previously, negative values were always displayed in bytes, regardless
of how large they were.

Adrian Vondendriesch, reviewed by Julien Rouhaud and myself
2015-11-06 11:03:02 -05:00
Tom Lane
b23af45875 Fix erroneous hash calculations in gin_extract_jsonb_path().
The jsonb_path_ops code calculated hash values inconsistently in some cases
involving nested arrays and objects.  This would result in queries possibly
not finding entries that they should find, when using a jsonb_path_ops GIN
index for the search.  The problem cases involve JSONB values that contain
both scalars and sub-objects at the same nesting level, for example an
array containing both scalars and sub-arrays.  To fix, reset the current
stack->hash after processing each value or sub-object, not before; and
don't try to be cute about the outermost level's initial hash.

Correcting this means that existing jsonb_path_ops indexes may now be
inconsistent with the new hash calculation code.  The symptom is the same
--- searches not finding entries they should find --- but the specific
rows affected are likely to be different.  Users will need to REINDEX
jsonb_path_ops indexes to make sure that all searches work as expected.

Per bug #13756 from Daniel Cheng.  Back-patch to 9.4 where the faulty
logic was introduced.
2015-11-05 18:15:48 -05:00
Robert Haas
64b2e7ad91 Pass extra data to bgworkers, and use this to fix parallel contexts.
Up until now, the total amount of data that could be passed to a
background worker at startup was one datum, which can be a small as
4 bytes on some systems.  That's enough to pass a dsm_handle or an
array index, but not much else.  Add a bgw_extra flag to the
BackgroundWorker struct, allowing up to 128 bytes to be passed to
a new worker on any platform.

Use this to fix a problem I recently discovered with the parallel
context machinery added in 9.5: the master assigns each worker an
array index, and each worker subsequently assigns itself an array
index, and there's nothing to guarantee that the two sets of indexes
match, leading to chaos.

Normally, I would not back-patch the change to add bgw_extra, since it
is basically a feature addition.  However, since 9.5 is still in beta
and there seems to be no other sensible way to repair the broken
parallel context machinery, back-patch to 9.5.  Existing background
worker code can ignore the bgw_extra field without a problem, but
might need to be recompiled since the structure size has changed.

Report and patch by me.  Review by Amit Kapila.
2015-11-05 12:13:56 -05:00
Tom Lane
59464bd6f9 Improve implementation of GEQO's init_tour() function.
Rather than filling a temporary array and then copying values to the
output array, we can generate the required random permutation in-place
using the Fisher-Yates shuffle algorithm.  This is shorter as well as
more efficient than before.  It's pretty unlikely that anyone would
notice a speed improvement, but shorter code is better.

Nathan Wagner, edited a bit by me
2015-11-05 10:46:14 -05:00
Peter Eisentraut
7bd099d511 Update spelling of COPY options
The preferred spelling was changed from FORCE QUOTE to FORCE_QUOTE and
the like, but some code was still referring to the old spellings.
2015-11-04 21:01:26 -05:00
Tom Lane
d894941663 Allow postgres_fdw to ship extension funcs/operators for remote execution.
The user can whitelist specified extension(s) in the foreign server's
options, whereupon we will treat immutable functions and operators of those
extensions as candidates to be sent for remote execution.

Whitelisting an extension in this way basically promises that the extension
exists on the remote server and behaves compatibly with the local instance.
We have no way to prove that formally, so we have to rely on the user to
get it right.  But this seems like something that people can usually get
right in practice.

We might in future allow functions and operators to be whitelisted
individually, but extension granularity is a very convenient special case,
so it got done first.

The patch as-committed lacks any regression tests, which is unfortunate,
but introducing dependencies on other extensions for testing purposes
would break "make installcheck" scenarios, which is worse.  I have some
ideas about klugy ways around that, but it seems like material for a
separate patch.  For the moment, leave the problem open.

Paul Ramsey, hacked up a bit more by me
2015-11-03 18:42:18 -05:00
Robert Haas
ee44cb7566 Improve comments about abbreviation abort.
Peter Geoghegan
2015-11-03 14:11:49 -05:00
Robert Haas
4efe26cbd3 shm_mq: Third attempt at fixing nowait behavior in shm_mq_receive.
Commit a1480ec1d3 purported to fix the
problems with commit b2ccb5f4e6, but it
didn't completely fix them.  The problem is that the checks were
performed in the wrong order, leading to a race condition.  If the
sender attached, sent a message, and detached after the receiver
called shm_mq_get_sender and before the receiver called
shm_mq_counterparty_gone, we'd incorrectly return SHM_MQ_DETACHED
before all messages were read.  Repair by reversing the order of
operations, and add a long comment explaining why this new logic is
(hopefully) correct.
2015-11-03 09:12:52 -05:00
Robert Haas
0279f62fdc Correct tiny inaccuracy in strxfrm cache comment.
Peter Geoghegan
2015-11-03 08:32:22 -05:00
Tom Lane
620ac88d6f Remove some more dead Alpha-specific code. 2015-11-02 19:37:51 -05:00
Robert Haas
1efc7e5382 Fix problems with ParamListInfo serialization mechanism.
Commit d1b7c1ffe7 introduced a mechanism
for serializing a ParamListInfo structure to be passed to a parallel
worker.  However, this mechanism failed to handle external expanded
values, as pointed out by Noah Misch.  Repair.

Moreover, plpgsql_param_fetch requires adjustment because the
serialization mechanism needs it to skip evaluating unused parameters
just as we would do when it is called from copyParamList, but params
== estate->paramLI in that case.  To fix, make the bms_is_member test
in that function unconditional.

Finally, have setup_param_list set a new ParamListInfo field,
paramMask, to the parameters actually used in the expression, so that
we don't try to fetch those that are not needed when serializing a
parameter list.  This isn't necessary for correctness, but it makes
the performance of the parallel executor code comparable to what we
do for cases involving cursors.

Design suggestions and extensive review by Noah Misch.  Patch by me.
2015-11-02 18:11:29 -05:00
Kevin Grittner
585e2a3b1a Fix serialization anomalies due to race conditions on INSERT.
On insert the CheckForSerializableConflictIn() test was performed
before the page(s) which were going to be modified had been locked
(with an exclusive buffer content lock).  If another process
acquired a relation SIReadLock on the heap and scanned to a page on
which an insert was going to occur before the page was so locked,
a rw-conflict would be missed, which could allow a serialization
anomaly to be missed.  The window between the check and the page
lock was small, so the bug was generally not noticed unless there
was high concurrency with multiple processes inserting into the
same table.

This was reported by Peter Bailis as bug #11732, by Sean Chittenden
as bug #13667, and by others.

The race condition was eliminated in heap_insert() by moving the
check down below the acquisition of the buffer lock, which had been
the very next statement.  Because of the loop locking and unlocking
multiple buffers in heap_multi_insert() a check was added after all
inserts were completed.  The check before the start of the inserts
was left because it might avoid a large amount of work to detect a
serialization anomaly before performing the all of the inserts and
the related WAL logging.

While investigating this bug, other SSI bugs which were even harder
to hit in practice were noticed and fixed, an unnecessary check
(covered by another check, so redundant) was removed from
heap_update(), and comments were improved.

Back-patch to all supported branches.

Kevin Grittner and Thomas Munro
2015-10-31 14:43:34 -05:00
Tom Lane
12c9a04008 Implement lookbehind constraints in our regular-expression engine.
A lookbehind constraint is like a lookahead constraint in that it consumes
no text; but it checks for existence (or nonexistence) of a match *ending*
at the current point in the string, rather than one *starting* at the
current point.  This is a long-requested feature since it exists in many
other regex libraries, but Henry Spencer had never got around to
implementing it in the code we use.

Just making it work is actually pretty trivial; but naive copying of the
logic for lookahead constraints leads to code that often spends O(N^2) time
to scan an N-character string, because we have to run the match engine
from string start to the current probe point each time the constraint is
checked.  In typical use-cases a lookbehind constraint will be written at
the start of the regex and hence will need to be checked at every character
--- so O(N^2) work overall.  To fix that, I introduced a third copy of the
core DFA matching loop, paralleling the existing longest() and shortest()
loops.  This version, matchuntil(), can suspend and resume matching given
a couple of pointers' worth of storage space.  So we need only run it
across the string once, stopping at each interesting probe point and then
resuming to advance to the next one.

I also put in an optimization that simplifies one-character lookahead and
lookbehind constraints, such as "(?=x)" or "(?<!\w)", into AHEAD and BEHIND
constraints, which already existed in the engine.  This avoids the overhead
of the LACON machinery entirely for these rather common cases.

The net result is that lookbehind constraints run a factor of three or so
slower than Perl's for multi-character constraints, but faster than Perl's
for one-character constraints ... and they work fine for variable-length
constraints, which Perl gives up on entirely.  So that's not bad from a
competitive perspective, and there's room for further optimization if
anyone cares.  (In reality, raw scan rate across a large input string is
probably not that big a deal for Postgres usage anyway; so I'm happy if
it's linear.)
2015-10-30 19:14:19 -04:00
Robert Haas
3a1f8611f2 Update parallel executor support to reuse the same DSM.
Commit b0b0d84b3d purported to make it
possible to relaunch workers using the same parallel context, but it had
an unpleasant race condition: we might reinitialize after the workers
have sent their last control message but before they have dettached the
DSM, leaving to crashes.  Repair by introducing a new ParallelContext
operation, ReinitializeParallelDSM.

Adjust execParallel.c to use this new support, so that we can rescan a
Gather node by relaunching workers but without needing to recreate the
DSM.

Amit Kapila, with some adjustments by me.  Extracted from latest parallel
sequential scan patch.
2015-10-30 10:44:54 +01:00
Robert Haas
c6baec92fc Fix typo in bgworker.c 2015-10-30 10:35:33 +01:00
Peter Eisentraut
a8d585c091 Message style improvements
Message style, plurals, quoting, spelling, consistency with similar
messages
2015-10-28 20:38:36 -04:00
Robert Haas
d455651624 Add missing serial comma, for consistency.
Amit Langote, per Etsuro Fujita
2015-10-28 12:19:14 +01:00
Robert Haas
9dcce7123e Fix incorrect message in ATWrongRelkindError.
Mistake introduced by commit 3bf3ab8c56.

Etsuro Fujita
2015-10-28 11:47:19 +01:00
Robert Haas
8538a63070 Make Gather node projection-capable.
The original Gather code failed to mark a Gather node as not able to
do projection, but it couldn't, even though it did call initialize its
projection info via ExecAssignProjectionInfo.  There doesn't seem to
be any good reason for this node not to have projection capability,
so clean things up so that it does.  Without this, plans using Gather
nodes might need to carry extra Result nodes to do projection.
2015-10-28 00:27:58 +01:00
Alvaro Herrera
21a4e4a4c9 Fix BRIN free space computations
A bug in the original free space computation made it possible to
return a page which wasn't actually able to fit the item.  Since the
insertion code isn't prepared to deal with PageAddItem failing, a PANIC
resulted ("failed to add BRIN tuple [to new page]").  Add a macro to
encapsulate the correct computation, and use it in
brin_getinsertbuffer's callers before calling that routine, to raise an
early error.

I became aware of the possiblity of a problem in this area while working
on ccc4c07499.  There's no archived discussion about it, but it's
easy to reproduce a problem in the unpatched code with something like

CREATE TABLE t (a text);
CREATE INDEX ti ON t USING brin (a) WITH (pages_per_range=1);

for length in `seq 8000 8196`
do
	psql -f - <<EOF
TRUNCATE TABLE t;
INSERT INTO t VALUES ('z'), (repeat('a', $length));
EOF
done

Backpatch to 9.5, where BRIN was introduced.
2015-10-27 18:17:55 -03:00
Alvaro Herrera
531d21b75f Cleanup commit timestamp module activaction, again
Further tweak commit_ts.c so that on a standby the state is completely
consistent with what that in the master, rather than behaving
differently in the cases that the settings differ.  Now in standby and
master the module should always be active or inactive in lockstep.

Author: Petr Jelínek, with some further tweaks by Álvaro Herrera.

Backpatch to 9.5, where commit timestamps were introduced.

Discussion: http://www.postgresql.org/message-id/5622BF9D.2010409@2ndquadrant.com
2015-10-27 15:06:50 -03:00
Alvaro Herrera
0cd836a4e8 Measure string lengths only once
Bernd Helmle complained that CreateReplicationSlot() was assigning the
same value to the same variable twice, so we could remove one of them.
Code inspection reveals that we can actually remove both assignments:
according to the author the assignment was there for beauty of the
strlen line only, and another possible fix to that is to put the strlen
in its own line, so do that.

To be consistent within the file, refactor all duplicated strlen()
calls, which is what we do elsewhere in the backend anyway.  In
basebackup.c, snprintf already returns the right length; no need for
strlen afterwards.

Backpatch to 9.4, where replication slots were introduced, to keep code
identical.  Some of this is older, but the patch doesn't apply cleanly
and it's only of cosmetic value anyway.

Discussion: http://www.postgresql.org/message-id/BE2FD71DEA35A2287EA5F018@eje.credativ.lan
2015-10-27 13:20:40 -03:00
Robert Haas
a1480ec1d3 shm_mq: Repair breakage from previous commit.
If the counterparty writes some data into the queue and then detaches,
it's wrong to return SHM_MQ_DETACHED right away.  If we do that, we
fail to read whatever was written.
2015-10-22 22:01:11 -04:00
Robert Haas
872101bede Add two missing cases to ATWrongRelkindError.
This way, we produce a better error message if someone tries to do
something like ALTER INDEX .. ALTER COLUMN .. SET STORAGE.

Amit Langote
2015-10-22 17:00:53 -04:00
Robert Haas
b2ccb5f4e6 shm_mq: Fix failure to notice a dead counterparty when nowait is used.
The shm_mq mechanism was intended to optionally notice when the process
on the other end of the queue fails to attach to the queue.  It does
this by allowing the user to pass a BackgroundWorkerHandle; if the
background worker in question is launched and dies without attaching
to the queue, then we know it never will.  This logic works OK in
blocking mode, but when called with nowait = true we fail to notice
that this has happened due to an asymmetry in the logic.  Repair.

Reported off-list by Rushabh Lathia.  Patch by me.
2015-10-22 16:33:30 -04:00
Robert Haas
31ba62ce32 Fix typos in comments.
CharSyam
2015-10-22 14:52:23 -04:00
Tom Lane
d371bebd3d Remove redundant CREATEUSER/NOCREATEUSER options in CREATE ROLE et al.
Once upon a time we did not have a separate CREATEROLE privilege, and
CREATEUSER effectively meant SUPERUSER.  When we invented CREATEROLE
(in 8.1) we also added SUPERUSER so as to have a less confusing keyword
for this role property.  However, we left CREATEUSER in place as a
deprecated synonym for SUPERUSER, because of backwards-compatibility
concerns.  It's still there and is still confusing people, as for example
in bug #13694 from Justin Catterson.  9.6 will be ten years or so later,
which surely ought to be long enough to end the deprecation and just
remove these old keywords.  Hence, do so.
2015-10-22 09:34:03 -07:00
Robert Haas
bde39eed0c Fix a couple of bugs in recent parallelism-related commits.
Commit 816e336f12 added the wrong error
check to async.c; sending restrictions is restricted to the leader,
not altogether unsafe.

Commit 3bd909b220 added ExecShutdownNode
to traverse the planstate tree and call shutdown functions, but made
a Gather node, the only node that actually has such a function, abort
the tree traversal, which is wrong.
2015-10-22 10:49:20 -04:00
Robert Haas
1a219fa15b Add header comments to execParallel.c and nodeGather.c.
Patch by me, per a note from Simon Riggs.  Reviewed by Amit Kapila
and Amit Langote.
2015-10-22 10:37:24 -04:00
Tom Lane
d435542583 Fix incorrect translation of minus-infinity datetimes for json/jsonb.
Commit bda76c1c8c caused both plus and
minus infinity to be rendered as "infinity", which is not only wrong
but inconsistent with the pre-9.4 behavior of to_json().  Fix that by
duplicating the coding in date_out/timestamp_out/timestamptz_out more
closely.  Per bug #13687 from Stepan Perlov.  Back-patch to 9.4, like
the previous commit.

In passing, also re-pgindent json.c, since it had gotten a bit messed up by
recent patches (and I was already annoyed by indentation-related problems
in back-patching this fix ...)
2015-10-20 11:07:04 -07:00
Robert Haas
dc486fb969 Remove duplicate word.
Amit Langote
2015-10-20 10:29:19 -04:00
Robert Haas
84ef9c596e Put back ssl_renegotiation_limit parameter, but only allow 0.
Per a report from Shay Rojansky, Npgsql sends ssl_renegotiation_limit=0
in the startup packet because it does not support renegotiation; other
clients which have not attempted to support renegotiation might well
behave similarly.  The recent removal of this parameter forces them to
break compatibility with either current PostgreSQL versions, or
previous ones.  Per discussion, the best solution is to accept the
parameter but only allow a value of 0.

Shay Rojansky, edited a little by me.
2015-10-20 09:56:04 -04:00
Robert Haas
5be94a9eb1 Be a bit more rigorous about how we cache strcoll and strxfrm results.
Commit 0e57b4d8bd contained some clever
logic that attempted to make sure that we couldn't get confused about
whether the last thing we cached was a strcoll() result or a strxfrm()
result, but it wasn't quite clever enough, because we can perform
further abbreviations after having already performed some comparisons.
Introduce an explicit flag in the hopes of making this watertight.

Peter Geoghegan, reviewed by me.
2015-10-20 09:27:50 -04:00
Robert Haas
d53f808e7e Remove obsolete comment.
Peter Geoghegan
2015-10-20 09:15:13 -04:00
Tom Lane
9f1e642d50 Fix incorrect handling of lookahead constraints in pg_regprefix().
pg_regprefix was doing nothing with lookahead constraints, which would
be fine if it were the right kind of nothing, but it isn't: we have to
terminate our search for a fixed prefix, not just pretend the LACON arc
isn't there.  Otherwise, if the current state has both a LACON outarc and a
single plain-color outarc, we'd falsely conclude that the color represents
an addition to the fixed prefix, and generate an extracted index condition
that restricts the indexscan too much.  (See added regression test case.)

Terminating the search is conservative: we could traverse the LACON arc
(thus assuming that the constraint can be satisfied at runtime) and then
examine the outarcs of the linked-to state.  But that would be a lot more
work than it seems worth, because writing a LACON followed by a single
plain character is a pretty silly thing to do.

This makes a difference only in rather contrived cases, but it's a bug,
so back-patch to all supported branches.
2015-10-19 13:54:53 -07:00
Robert Haas
ee7ca559fc Add a C API for parallel heap scans.
Using this API, one backend can set up a ParallelHeapScanDesc to
which multiple backends can then attach.  Each tuple in the relation
will be returned to exactly one of the scanning backends.  Only
forward scans are supported, and rescans must be carefully
coordinated.

This is not exposed to the planner or executor yet.

The original version of this code was written by me.  Amit Kapila
reviewed it, tested it, and improved it, including adding support for
synchronized scans, per review comments from Jeff Davis.  Extensive
testing of this and related patches was performed by Haribabu Kommi.
Final cleanup of this patch by me.
2015-10-16 17:33:18 -04:00
Robert Haas
b0b0d84b3d Allow a parallel context to relaunch workers.
This may allow some callers to avoid the overhead involved in tearing
down a parallel context and then setting up a new one, which means
releasing the DSM and then allocating and populating a new one.  I
suspect we'll want to revise the Gather node to make use of this new
capability, but even if not it may be useful elsewhere and requires
very little additional code.
2015-10-16 17:18:05 -04:00
Tom Lane
afdfcd3f76 Miscellaneous cleanup of regular-expression compiler.
Revert our previous addition of "all" flags to copyins() and copyouts();
they're no longer needed, and were never anything but an unsightly hack.

Improve a couple of infelicities in the REG_DEBUG code for dumping
the NFA data structure, including adding code to count the total
number of states and arcs.

Add a couple of missed error checks.

Add some more documentation in the README file, and some regression tests
illustrating cases that exceeded the state-count limit and/or took
unreasonable amounts of time before this set of patches.

Back-patch to all supported branches.
2015-10-16 15:55:59 -04:00
Tom Lane
538b3b8b35 Improve memory-usage accounting in regular-expression compiler.
This code previously counted the number of NFA states it created, and
complained if a limit was exceeded, so as to prevent bizarre regex patterns
from consuming unreasonable time or memory.  That's fine as far as it went,
but the code paid no attention to how many arcs linked those states.  Since
regexes can be contrived that have O(N) states but will need O(N^2) arcs
after fixempties() processing, it was still possible to blow out memory,
and take a long time doing it too.  To fix, modify the bookkeeping to count
space used by both states and arcs.

I did not bother with including the "color map" in the accounting; it
can only grow to a few megabytes, which is not a lot in comparison to
what we're allowing for states+arcs (about 150MB on 64-bit machines
or half that on 32-bit machines).

Looking at some of the larger real-world regexes captured in the Tcl
regression test suite suggests that the most that is likely to be needed
for regexes found in the wild is under 10MB, so I believe that the current
limit has enough headroom to make it okay to keep it as a hard-wired limit.

In connection with this, redefine REG_ETOOBIG as meaning "regular
expression is too complex"; the previous wording of "nfa has too many
states" was already somewhat inapropos because of the error code's use
for stack depth overrun, and it was not very user-friendly either.

Back-patch to all supported branches.
2015-10-16 15:55:59 -04:00
Tom Lane
6a7153661d Improve performance of pullback/pushfwd in regular-expression compiler.
The previous coding would create a new intermediate state every time it
wanted to interchange the ordering of two constraint arcs.  Certain regex
features such as \Y can generate large numbers of parallel constraint arcs,
and if we needed to reorder the results of that, we created unreasonable
numbers of intermediate states.  To improve matters, keep a list of
already-created intermediate states associated with the state currently
being considered by the outer loop; we can re-use such states to place all
the new arcs leading to the same destination or source.

I also took the trouble to redefine push() and pull() to have a less risky
API: they no longer delete any state or arc that the caller might possibly
have a pointer to, except for the specifically-passed constraint arc.
This reduces the risk of re-introducing the same type of error seen in
the failed patch for CVE-2007-4772.

Back-patch to all supported branches.
2015-10-16 15:55:59 -04:00
Tom Lane
f5b7d103bc Improve performance of fixempties() pass in regular-expression compiler.
The previous coding took something like O(N^4) time to fully process a
chain of N EMPTY arcs.  We can't really do much better than O(N^2) because
we have to insert about that many arcs, but we can do lots better than
what's there now.  The win comes partly from using mergeins() to amortize
de-duplication of arcs across multiple source states, and partly from
exploiting knowledge of the ordering of arcs for each state to avoid
looking at arcs we don't need to consider during the scan.  We do have
to be a bit careful of the possible reordering of arcs introduced by
the sort-merge coding of the previous commit, but that's not hard to
deal with.

Back-patch to all supported branches.
2015-10-16 15:55:59 -04:00
Tom Lane
579840ca05 Fix O(N^2) performance problems in regular-expression compiler.
Change the singly-linked in-arc and out-arc lists to be doubly-linked,
so that arc deletion is constant time rather than having worst-case time
proportional to the number of other arcs on the connected states.

Modify the bulk arc transfer operations copyins(), copyouts(), moveins(),
moveouts() so that they use a sort-and-merge algorithm whenever there's
more than a small number of arcs to be copied or moved.  The previous
method is O(N^2) in the number of arcs involved, because it performs
duplicate checking independently for each copied arc.  The new method may
change the ordering of existing arcs for the destination state, but nothing
really cares about that.

Provide another bulk arc copying method mergeins(), which is unused as
of this commit but is needed for the next one.  It basically is like
copyins(), but the source arcs might not all come from the same state.

Replace the O(N^2) bubble-sort algorithm used in carcsort() with a qsort()
call.

These changes greatly improve the performance of regex compilation for
large or complex regexes, at the cost of extra space for arc storage during
compilation.  The original tradeoff was probably fine when it was made, but
now we care more about speed and less about memory consumption.

Back-patch to all supported branches.
2015-10-16 15:55:59 -04:00
Tom Lane
48789c5d23 Fix regular-expression compiler to handle loops of constraint arcs.
It's possible to construct regular expressions that contain loops of
constraint arcs (that is, ^ $ AHEAD BEHIND or LACON arcs).  There's no use
in fully traversing such a loop at execution, since you'd just end up in
the same NFA state without having consumed any input.  Worse, such a loop
leads to infinite looping in the pullback/pushfwd stage of compilation,
because we keep pushing or pulling the same constraints around the loop
in a vain attempt to move them to the pre or post state.  Such looping was
previously recognized in CVE-2007-4772; but the fix only handled the case
of trivial single-state loops (that is, a constraint arc leading back to
its source state) ... and not only that, it was incorrect even for that
case, because it broke the admittedly-not-very-clearly-stated API contract
of the pull() and push() subroutines.  The first two regression test cases
added by this commit exhibit patterns that result in assertion failures
because of that (though there seem to be no ill effects in non-assert
builds).  The other new test cases exhibit multi-state constraint loops;
in an unpatched build they will run until the NFA state-count limit is
exceeded.

To fix, remove the code added for CVE-2007-4772, and instead create a
general-purpose constraint-loop-breaking phase of regex compilation that
executes before we do pullback/pushfwd.  Since we never need to traverse
a constraint loop fully, we can just break the loop at any chosen spot,
if we add clone states that can replicate any sequence of arc transitions
that would've traversed just part of the loop.

Also add some commentary clarifying why we have to have all these
machinations in the first place.

This class of problems has been known for some time --- we had a report
from Marc Mamin about two years ago, for example, and there are related
complaints in the Tcl bug tracker.  I had discussed a fix of this kind
off-list with Henry Spencer, but didn't get around to doing something
about it until the issue was rediscovered by Greg Stark recently.

Back-patch to all supported branches.
2015-10-16 15:55:58 -04:00
Robert Haas
d53e3d5fe0 Remove volatile qualifiers from proc.c and procarray.c
Prior to commit 0709b7ee72, access to
variables within a spinlock-protected critical section had to be done
through a volatile pointer, but that should no longer be necessary.

Michael Paquier
2015-10-16 14:20:36 -04:00
Robert Haas
430008b5a7 Remove volatile qualifiers from dynahash.c, shmem.c, and sinvaladt.c
Prior to commit 0709b7ee72, access to
variables within a spinlock-protected critical section had to be done
through a volatile pointer, but that should no longer be necessary.

Thomas Munro
2015-10-16 14:14:15 -04:00
Robert Haas
a53c06a13e Prohibit parallel query when the isolation level is serializable.
In order for this to be safe, the code which hands true serializability
will need to taught that the SIRead locks taken by a parallel worker
pertain to the same transaction as those taken by the parallel leader.
Some further changes may be needed as well.  Until the necessary
adaptations are made, don't generate parallel plans in serializable
mode, and if a previously-generated parallel plan is used after
serializable mode has been activated, run it serially.

This fixes a bug in commit 7aea8e4f2d.
2015-10-16 11:58:27 -04:00
Robert Haas
bfc78d7196 Rewrite interaction of parallel mode with parallel executor support.
In the previous coding, before returning from ExecutorRun, we'd shut
down all parallel workers.  This was dead wrong if ExecutorRun was
called with a non-zero tuple count; it had the effect of truncating
the query output.  To fix, give ExecutePlan control over whether to
enter parallel mode, and have it refuse to do so if the tuple count
is non-zero.  Rewrite the Gather logic so that it can cope with being
called outside parallel mode.

Commit 7aea8e4f2d is largely to blame
for this problem, though this patch modifies some subsequently-committed
code which relied on the guarantees it purported to make.
2015-10-16 11:56:02 -04:00
Robert Haas
816e336f12 Mark more functions parallel-restricted or parallel-unsafe.
Commit 7aea8e4f2d was overoptimistic
about the degree of safety associated with running various functions
in parallel mode.  Functions that take a table name or OID as an
argument are at least parallel-restricted, because the table might be
temporary, and we currently don't allow parallel workers to touch
temporary tables.  Functions that take a query as an argument are
outright unsafe, because the query could be anything, including a
parallel-unsafe query.

Also, the queue of pending notifications is backend-private, so adding
to it from a worker doesn't behave correctly.  We could fix this by
transferring the worker's queue of pending notifications to the master
during worker cleanup, but that seems like more trouble than it's
worth for now.  In addition to adjusting the pg_proc.h markings, also
add an explicit check for this in async.c.
2015-10-16 11:49:31 -04:00
Robert Haas
82b37765c7 Fix a problem with parallel workers being unable to restore role.
check_role() tries to verify that the user has permission to become the
requested role, but this is inappropriate in a parallel worker, which
needs to exactly recreate the master's authorization settings.  So skip
the check in that case.

This fixes a bug in commit 924bcf4f16.
2015-10-16 11:37:19 -04:00
Robert Haas
6de6d96d97 Invalidate caches after cranking up a parallel worker transaction.
Starting a parallel worker transaction changes our notion of which XIDs
are in-progress or committed, and our notion of the current command
counter ID.  Therefore, our view of these caches prior to starting
this transaction may no longer valid.  Defend against that by clearing
them.

This fixes a bug in commit 924bcf4f16.
2015-10-16 11:31:23 -04:00
Robert Haas
94b4f7e2a6 Tighten up application of parallel mode checks.
Commit 924bcf4f16 failed to enforce
parallel mode checks during the commit of a parallel worker, because
we exited parallel mode prior to ending the transaction so that we
could pop the active snapshot.  Re-establish parallel mode during
parallel worker commit.  Without this, it's far too easy for unsafe
actions during the pre-commit sequence to crash the server instead of
hitting the error checks as intended.

Just to be extra paranoid, adjust a couple of the sanity checks in
xact.c to check not only IsInParallelMode() but also
IsParallelWorker().
2015-10-16 09:59:57 -04:00
Robert Haas
423ec0877f Transfer current command counter ID to parallel workers.
Commit 924bcf4f16 correctly forbade
parallel workers to modify the command counter while in parallel mode,
but it inexplicably neglected to actually transfer the current command
counter from leader to workers.  This can result in the workers seeing
a different set of tuples from the leader, which is bad.  Repair.
2015-10-16 09:58:42 -04:00
Robert Haas
2ad5c27bb5 Don't send protocol messages to a shm_mq that no longer exists.
Commit 2bd9e412f9 introduced a mechanism
for relaying protocol messages from a background worker to another
backend via a shm_mq.  However, there was no provision for shutting
down the communication channel.  Therefore, a protocol message sent
late in the shutdown sequence, such as a DEBUG message resulting from
cranking up log_min_messages, could crash the server.  To fix, install
an on_dsm_detach callback that disables sending messages to the shm_mq
when the associated DSM is detached.
2015-10-16 09:42:33 -04:00
Tom Lane
3587cbc34f Fix NULL handling in datum_to_jsonb().
The function failed to adhere to its specification that the "tcategory"
argument should not be examined when the input value is NULL.  This
resulted in a crash in some cases.  Per bug #13680 from Boyko Yordanov.

In passing, re-pgindent some recent changes in jsonb.c, and fix a rather
ungrammatical comment.

Diagnosis and patch by Michael Paquier, cosmetic changes by me
2015-10-15 13:46:09 -04:00
Robert Haas
08fbad0afd Revert "Have dtrace depend on object files directly, not objfiles.txt"
This reverts commit 7353782853.  Per
report from Tom Lane, this breaks parallel builds.
2015-10-15 13:16:03 -04:00
Robert Haas
5fc4c26db5 Allow FDWs to push down quals without breaking EvalPlanQual rechecks.
This fixes a long-standing bug which was discovered while investigating
the interaction between the new join pushdown code and the EvalPlanQual
machinery: if a ForeignScan appears on the inner side of a paramaterized
nestloop, an EPQ recheck would re-return the original tuple even if
it no longer satisfied the pushed-down quals due to changed parameter
values.

This fix adds a new member to ForeignScan and ForeignScanState and a
new argument to make_foreignscan, and requires changes to FDWs which
push down quals to populate that new argument with a list of quals they
have chosen to push down.  Therefore, I'm only back-patching to 9.5,
even though the bug is not new in 9.5.

Etsuro Fujita, reviewed by me and by Kyotaro Horiguchi.
2015-10-15 13:00:40 -04:00
Alvaro Herrera
817588bc2b Fix bogus comments
Author: Amit Langote
2015-10-15 12:20:11 -03:00
Robert Haas
7353782853 Have dtrace depend on object files directly, not objfiles.txt
Per Mark Johnston, this resolves a build error on FreeBSD related
to the fact that dtrace is modifying the generated object files
under the hood.  Consequently, without this, dtrace gets reinvoked
at install time because the object files have been updated.  This
is a pretty hacky fix, but it shouldn't hurt anything, and it's
not clear that it's worth expending any more effort for a feature
that not too many people are using.

Patch by Mark Johnston.  This is arguably back-patchable as a bug
fix to the build system, but I'm not certain enough of the
consequences to try that.  Let's see what the buildfarm (and
our packagers) think of this change on master first.
2015-10-13 15:39:58 -04:00
Robert Haas
b8dd19af50 Improve INSERT .. ON CONFLICT error message.
Peter Geoghegan, reviewed by me.
2015-10-13 15:33:07 -04:00
Tom Lane
869f693a36 On Windows, ensure shared memory handle gets closed if not being used.
Postmaster child processes that aren't supposed to be attached to shared
memory were not bothering to close the shared memory mapping handle they
inherit from the postmaster process.  That's mostly harmless, since the
handle vanishes anyway when the child process exits -- but the syslogger
process, if used, doesn't get killed and restarted during recovery from a
backend crash.  That meant that Windows doesn't see the shared memory
mapping as becoming free, so it doesn't delete it and the postmaster is
unable to create a new one, resulting in failure to recover from crashes
whenever logging_collector is turned on.

Per report from Dmitry Vasilyev.  It's a bit astonishing that we'd not
figured this out long ago, since it's been broken from the very beginnings
of out native Windows support; probably some previously-unexplained trouble
reports trace to this.

A secondary problem is that on Cygwin (perhaps only in older versions?),
exec() may not detach from the shared memory segment after all, in which
case these child processes did remain attached to shared memory, posing
the risk of an unexpected shared memory clobber if they went off the rails
somehow.  That may be a long-gone bug, but we can deal with it now if it's
still live, by detaching within the infrastructure introduced here to deal
with closing the handle.

Back-patch to all supported branches.

Tom Lane and Amit Kapila
2015-10-13 11:21:33 -04:00
Noah Misch
7732d49ca2 Use JsonbIteratorToken consistently in automatic variable declarations.
Many functions stored JsonbIteratorToken values in variables of other
integer types.  Also, standardize order relative to other declarations.
Expect compilers to generate the same code before and after this change.
2015-10-11 23:53:35 -04:00
Robert Haas
0e57b4d8bd Speed up text sorts where the same strings occur multiple times.
Cache strxfrm() blobs across calls made to the text SortSupport
abbreviation routine.  This can speed up sorting if the same string
needs to be abbreviated many times in a row.

Also, cache the result of the previous strcoll() comparison, so that
if we're asked to compare the same strings agin, we do need to call
strcoll() again.

Perhaps surprisingly, these optimizations don't seem to hurt even when
they don't help.  memcmp() is really cheap compared to strcoll() or
strxfrm().

Peter Geoghegan, reviewed by me.
2015-10-09 19:03:44 -04:00
Robert Haas
bfb54ff15a Make abbreviated key comparisons for text a bit cheaper.
If we do some byte-swapping while abbreviating, we can do comparisons
using integer arithmetic rather than memcmp.

Peter Geoghegan, reviewed and slightly revised by me.
2015-10-09 15:06:06 -04:00
Robert Haas
db0f6cad48 Remove set_latch_on_sigusr1 flag.
This flag has proven to be a recipe for bugs, and it doesn't seem like
it can really buy anything in terms of performance.  So let's just
*always* set the process latch when we receive SIGUSR1 instead of
trying to do it only when needed.

Per my recent proposal on pgsql-hackers.
2015-10-09 14:31:04 -04:00
Stephen Frost
b7aac36245 Handle append_rel_list in expand_security_qual
During expand_security_quals, we take the security barrier quals on an
RTE and create a subquery which evaluates the quals.  During this, we
have to replace any variables in the outer query which refer to the
original RTE with references to the columns from the subquery.

We need to also perform that replacement for any Vars in the
append_rel_list.

Only backpatching to 9.5 as we only go through this process in 9.4 for
auto-updatable security barrier views, which UNION ALL queries aren't.

Discovered by Haribabu Kommi

Patch by Dean Rasheed
2015-10-09 10:49:02 -04:00
Tom Lane
94f5246ce1 Fix uninitialized-variable bug.
For some reason, neither of the compilers I usually use noticed the
uninitialized-variable problem I introduced in commit 7e2a18a916.
That's hardly a good enough excuse though.  Committing with brown paper bag
on head.

In addition to putting the operations in the right order, move the
declaration of "now" inside the loop; there's no need for it to be
outside, and that does wake up older gcc enough to notice any similar
future problem.

Back-patch to 9.4; earlier versions lack the time-to-SIGKILL stanza
so there's no bug.
2015-10-09 09:12:03 -05:00
Tom Lane
7e2a18a916 Perform an immediate shutdown if the postmaster.pid file is removed.
The postmaster now checks every minute or so (worst case, at most two
minutes) that postmaster.pid is still there and still contains its own PID.
If not, it performs an immediate shutdown, as though it had received
SIGQUIT.

The original goal behind this change was to ensure that failed buildfarm
runs would get fully cleaned up, even if the test scripts had left a
postmaster running, which is not an infrequent occurrence.  When the
buildfarm script removes a test postmaster's $PGDATA directory, its next
check on postmaster.pid will fail and cause it to exit.  Previously, manual
intervention was often needed to get rid of such orphaned postmasters,
since they'd block new test postmasters from obtaining the expected socket
address.

However, by checking postmaster.pid and not something else, we can provide
additional robustness: manual removal of postmaster.pid is a frequent DBA
mistake, and now we can at least limit the damage that will ensue if a new
postmaster is started while the old one is still alive.

Back-patch to all supported branches, since we won't get the desired
improvement in buildfarm reliability otherwise.
2015-10-06 17:15:52 -04:00
Robert Haas
8f6bb851bd Remove more volatile qualifiers.
Prior to commit 0709b7ee72, access to
variables within a spinlock-protected critical section had to be done
through a volatile pointer, but that should no longer be necessary.
This continues work begun in df4077cda2
and 6ba4ecbf47.

Thomas Munro and Michael Paquier
2015-10-06 15:45:02 -04:00
Bruce Momjian
b943f502b7 Have CREATE TABLE LIKE add OID column if any LIKEd table has one
Also, process constraints for LIKEd tables at the end so an OID column
can be referenced in a constraint.

Report by Tom Lane
2015-10-05 21:19:16 -04:00
Bruce Momjian
28b3a3d41a to_number(): allow 'V' to divide by 10^(the number of digits)
to_char('V') already multiplied in a similar manner.

Report by Jeremy Lowery
2015-10-05 21:03:38 -04:00
Bruce Momjian
2d87eedc1d to_char(): Do not count negative sign as a digit for time values
For time masks, like HH24, MI, SS, CC, MM, do not count the negative
sign as part of the zero-padding length specified by the mask, e.g. have
to_char('-4 years'::interval, 'YY') return '-04', not '-4'.

Report by Craig Ringer
2015-10-05 20:51:46 -04:00
Noah Misch
5976097c0f Prevent stack overflow in query-type functions.
The tsquery, ltxtquery and query_int data types have a common ancestor.
Having acquired check_stack_depth() calls independently, each was
missing at least one call.  Back-patch to 9.0 (all supported versions).
2015-10-05 10:06:30 -04:00
Noah Misch
30cb12881d Prevent stack overflow in container-type functions.
A range type can name another range type as its subtype, and a record
type can bear a column of another record type.  Consequently, functions
like range_cmp() and record_recv() are recursive.  Functions at risk
include operator family members and referents of pg_type regproc
columns.  Treat as recursive any such function that looks up and calls
the same-purpose function for a record column type or the range subtype.
Back-patch to 9.0 (all supported versions).

An array type's element type is never itself an array type, so array
functions are unaffected.  Recursion depth proportional to array
dimensionality, found in array_dim_to_jsonb(), is fine thanks to MAXDIM.
2015-10-05 10:06:29 -04:00
Noah Misch
08fa47c485 Prevent stack overflow in json-related functions.
Sufficiently-deep recursion heretofore elicited a SIGSEGV.  If an
application constructs PostgreSQL json or jsonb values from arbitrary
user input, application users could have exploited this to terminate all
active database connections.  That applies to 9.3, where the json parser
adopted recursive descent, and later versions.  Only row_to_json() and
array_to_json() were at risk in 9.2, both in a non-security capacity.
Back-patch to 9.2, where the json type was introduced.

Oskari Saarenmaa, reviewed by Michael Paquier.

Security: CVE-2015-5289
2015-10-05 10:06:29 -04:00
Stephen Frost
2ca9d5445c Apply SELECT policies in INSERT/UPDATE+RETURNING
Similar to 7d8db3e, given that INSERT+RETURNING requires SELECT rights
on the table, apply the SELECT policies as WCOs to the tuples being
inserted.  Apply the same logic to UPDATE+RETURNING.

Back-patch to 9.5 where RLS was added.
2015-10-05 07:55:13 -04:00
Stephen Frost
4158cc3793 Do not write out WCOs in Query
The WithCheckOptions list in Query are only populated during rewrite and
do not need to be written out or read in as part of a Query structure.

Further, move WithCheckOptions to the bottom and add comments to clarify
that it is only populated during rewrite.

Back-patch to 9.5 with a catversion bump, as we are still in alpha.
2015-10-05 07:38:58 -04:00
Andres Freund
2596d705bd Re-Align *_freeze_max_age reloption limits with corresponding GUC limits.
In 020235a575 I lowered the autovacuum_*freeze_max_age minimums to
allow for easier testing of wraparounds. I did not touch the
corresponding per-table limits. While those don't matter for the purpose
of wraparound, it seems more consistent to lower them as well.

It's noteworthy that the previous reloption lower limit for
autovacuum_multixact_freeze_max_age was too high by one magnitude, even
before 020235a575.

Discussion: 26377.1443105453@sss.pgh.pa.us
Backpatch: back to 9.0 (in parts), like the prior patch
2015-10-05 11:53:43 +02:00
Stephen Frost
088c83363a ALTER TABLE .. FORCE ROW LEVEL SECURITY
To allow users to force RLS to always be applied, even for table owners,
add ALTER TABLE .. FORCE ROW LEVEL SECURITY.

row_security=off overrides FORCE ROW LEVEL SECURITY, to ensure pg_dump
output is complete (by default).

Also add SECURITY_NOFORCE_RLS context to avoid data corruption when
ALTER TABLE .. FORCE ROW SECURITY is being used. The
SECURITY_NOFORCE_RLS security context is used only during referential
integrity checks and is only considered in check_enable_rls() after we
have already checked that the current user is the owner of the relation
(which should always be the case during referential integrity checks).

Back-patch to 9.5 where RLS was added.
2015-10-04 21:05:08 -04:00
Tom Lane
f2fc98fb8e Further twiddling of nodeHash.c hashtable sizing calculation.
On reflection, the submitted patch didn't really work to prevent the
request size from exceeding MaxAllocSize, because of the fact that we'd
happily round nbuckets up to the next power of 2 after we'd limited it to
max_pointers.  The simplest way to enforce the limit correctly is to
round max_pointers down to a power of 2 when it isn't one already.

(Note that the constraint to INT_MAX / 2, if it were doing anything useful
at all, is properly applied after that.)
2015-10-04 15:55:07 -04:00
Tom Lane
a31e64d065 Fix some issues in new hashtable size calculations in nodeHash.c.
Limit the size of the hashtable pointer array to not more than
MaxAllocSize, per reports from Kouhei Kaigai and others of "invalid memory
alloc request size" failures.  There was discussion of allowing the array
to get larger than that by using the "huge" palloc API, but so far no proof
that that is actually a good idea, and at this point in the 9.5 cycle major
changes from old behavior don't seem like the way to go.

Fix a rather serious secondary bug in the new code, which was that it
didn't ensure nbuckets remained a power of 2 when recomputing it for the
multiple-batch case.

Clean up sloppy division of labor between ExecHashIncreaseNumBuckets and
its sole call site.
2015-10-04 14:06:50 -04:00
Andrew Dunstan
1edd4ec831 Disallow invalid path elements in jsonb_set
Null path elements and, where the object is an array, invalid integer
elements now cause an error.

Incorrect behaviour noted by Thom Brown, patch from Dmitry Dolgov.

Backpatch to 9.5 where jsonb_set was introduced
2015-10-04 13:28:16 -04:00
Peter Eisentraut
6390c8c654 Group cluster_name and update_process_title settings together 2015-10-04 12:29:36 -04:00
Noah Misch
3cb0a7e75a Make BYPASSRLS behave like superuser RLS bypass.
Specifically, make its effect independent from the row_security GUC, and
make it affect permission checks pertinent to views the BYPASSRLS role
owns.  The row_security GUC thereby ceases to change successful-query
behavior; it can only make a query fail with an error.  Back-patch to
9.5, where BYPASSRLS was introduced.
2015-10-03 20:19:57 -04:00
Andres Freund
b67aaf21e8 Add CASCADE support for CREATE EXTENSION.
Without CASCADE, if an extension has an unfullfilled dependency on
another extension, CREATE EXTENSION ERRORs out with "required extension
... is not installed". That is annoying, especially when that dependency
is an implementation detail of the extension, rather than something the
extension's user can make sense of.

In addition to CASCADE this also includes a small set of regression
tests around CREATE EXTENSION.

Author: Petr Jelinek, editorialized by Michael Paquier, Andres Freund
Reviewed-By: Michael Paquier, Andres Freund, Jeff Janes
Discussion: 557E0520.3040800@2ndquadrant.com
2015-10-03 18:23:40 +02:00
Tom Lane
bf686796a0 Add missing "static" specifier.
Per buildfarm (pademelon, at least, doesn't like this).
2015-10-03 10:59:42 -04:00
Andres Freund
920218cbc0 Improve errhint() about replication slot naming restrictions.
The existing hint talked about "may only contain letters", but the
actual requirement is more strict: only lower case letters are allowed.

Reported-By: Rushabh Lathia
Author: Rushabh Lathia
Discussion: AGPqQf2x50qcwbYOBKzb4x75sO_V3g81ZsA8+Ji9iN5t_khFhQ@mail.gmail.com
Backpatch: 9.4-, where replication slots were added
2015-10-03 15:29:08 +02:00
Andres Freund
ad22783792 Fix several bugs related to ON CONFLICT's EXCLUDED pseudo relation.
Four related issues:

1) attnos/varnos/resnos for EXCLUDED were out of sync when a column
   after one dropped in the underlying relation was referenced.
2) References to whole-row variables (i.e. EXCLUDED.*) lead to errors.
3) It was possible to reference system columns in the EXCLUDED pseudo
   relations, even though they would not have valid contents.
4) References to EXCLUDED were rewritten by the RLS machinery, as
   EXCLUDED was treated as if it were the underlying relation.

To fix the first two issues, generate the excluded targetlist with
dropped columns in mind and add an entry for whole row
variables. Instead of unconditionally adding a wholerow entry we could
pull up the expression if needed, but doing it unconditionally seems
simpler. The wholerow entry is only really needed for ruleutils/EXPLAIN
support anyway.

The remaining two issues are addressed by changing the EXCLUDED RTE to
have relkind = composite. That fits with EXCLUDED not actually being a
real relation, and allows to treat it differently in the relevant
places. scanRTEForColumn now skips looking up system columns when the
RTE has a composite relkind; fireRIRrules() already had a corresponding
check, thereby preventing RLS expansion on EXCLUDED.

Also add tests for these issues, and improve a few comments around
excluded handling in setrefs.c.

Reported-By: Peter Geoghegan, Geoff Winkless
Author: Andres Freund, Amit Langote, Peter Geoghegan
Discussion: CAEzk6fdzJ3xYQZGbcuYM2rBd2BuDkUksmK=mY9UYYDugg_GgZg@mail.gmail.com,
   CAM3SWZS+CauzbiCEcg-GdE6K6ycHE_Bz6Ksszy8AoixcMHOmsA@mail.gmail.com
Backpatch: 9.5, where ON CONFLICT was introduced
2015-10-03 15:12:10 +02:00
Tom Lane
2e8cfcf4ea Add recursion depth protection to LIKE matching.
Since MatchText() recurses, it could in principle be driven to stack
overflow, although quite a long pattern would be needed.
2015-10-02 15:00:51 -04:00
Tom Lane
b63fc28776 Add recursion depth protections to regular expression matching.
Some of the functions in regex compilation and execution recurse, and
therefore could in principle be driven to stack overflow.  The Tcl crew
has seen this happen in practice in duptraverse(), though their fix was
to put in a hard-wired limit on the number of recursive levels, which is
not too appetizing --- fortunately, we have enough infrastructure to check
the actually available stack.  Greg Stark has also seen it in other places
while fuzz testing on a machine with limited stack space.  Let's put guards
in to prevent crashes in all these places.

Since the regex code would leak memory if we simply threw elog(ERROR),
we have to introduce an API that checks for stack depth without throwing
such an error.  Fortunately that's not difficult.
2015-10-02 14:51:58 -04:00
Tom Lane
f2c4ffc330 Fix potential infinite loop in regular expression execution.
In cfindloop(), if the initial call to shortest() reports that a
zero-length match is possible at the current search start point, but then
it is unable to construct any actual match to that, it'll just loop around
with the same start point, and thus make no progress.  We need to force the
start point to be advanced.  This is safe because the loop over "begin"
points has already tried and failed to match starting at "close", so there
is surely no need to try that again.

This bug was introduced in commit e2bd904955,
wherein we allowed continued searching after we'd run out of match
possibilities, but evidently failed to think hard enough about exactly
where we needed to search next.

Because of the way this code works, such a match failure is only possible
in the presence of backrefs --- otherwise, shortest()'s judgment that a
match is possible should always be correct.  That probably explains how
come the bug has escaped detection for several years.

The actual fix is a one-liner, but I took the trouble to add/improve some
comments related to the loop logic.

After fixing that, the submitted test case "()*\1" didn't loop anymore.
But it reported failure, though it seems like it ought to match a
zero-length string; both Tcl and Perl think it does.  That seems to be from
overenthusiastic optimization on my part when I rewrote the iteration match
logic in commit 173e29aa5d: we can't just
"declare victory" for a zero-length match without bothering to set match
data for capturing parens inside the iterator node.

Per fuzz testing by Greg Stark.  The first part of this is a bug in all
supported branches, and the second part is a bug since 9.2 where the
iteration rewrite happened.
2015-10-02 14:26:36 -04:00
Tom Lane
9fe8fe9c9e Add some more query-cancel checks to regular expression matching.
Commit 9662143f0c added infrastructure to
allow regular-expression operations to be terminated early in the event
of SIGINT etc.  However, fuzz testing by Greg Stark disclosed that there
are still cases where regex compilation could run for a long time without
noticing a cancel request.  Specifically, the fixempties() phase never
adds new states, only new arcs, so it doesn't hit the cancel check I'd put
in newstate().  Add one to newarc() as well to cover that.

Some experimentation of my own found that regex execution could also run
for a long time despite a pending cancel.  We'd put a high-level cancel
check into cdissect(), but there was none inside the core text-matching
routines longest() and shortest().  Ordinarily those inner loops are very
very fast ... but in the presence of lookahead constraints, not so much.
As a compromise, stick a cancel check into the stateset cache-miss
function, which is enough to guarantee a cancel check at least once per
lookahead constraint test.

Making this work required more attention to error handling throughout the
regex executor.  Henry Spencer had apparently originally intended longest()
and shortest() to be incapable of incurring errors while running, so
neither they nor their subroutines had well-defined error reporting
behaviors.  However, that was already broken by the lookahead constraint
feature, since lacon() can surely suffer an out-of-memory failure ---
which, in the code as it stood, might never be reported to the user at all,
but just silently be treated as a non-match of the lookahead constraint.
Normalize all that by inserting explicit error tests as needed.  I took the
opportunity to add some more comments to the code, too.

Back-patch to all supported branches, like the previous patch.
2015-10-02 13:45:39 -04:00
Alvaro Herrera
e06b2e1d2e Don't disable commit_ts in standby if enabled locally
Bug noticed by Fujii Masao
2015-10-02 12:49:01 -03:00
Peter Eisentraut
87c2b517ac Fix message punctuation according to style guide 2015-10-01 21:39:56 -04:00
Alvaro Herrera
f12e814b88 Fix commit_ts for standby
Module initialization was still not completely correct after commit
6b61955135, per crash report from Takashi Ohnishi.  To fix, instead of
trying to monkey around with the value of the GUC setting directly, add
a separate boolean flag that enables the feature on a standby, but only
for the startup (recovery) process, when it sees that its master server
has the feature enabled.
Discussion: http://www.postgresql.org/message-id/ca44c6c7f9314868bdc521aea4f77cbf@MP-MSGSS-MBX004.msg.nttdata.co.jp

Also change the deactivation routine to delete all segment files rather
than leaving the last one around.  (This doesn't need separate
WAL-logging, because on recovery we execute the same deactivation
routine anyway.)

In passing, clean up the code structure somewhat, particularly so that
xlog.c doesn't know so much about when to activate/deactivate the
feature.

Thanks to Fujii Masao for testing and Petr Jelínek for off-list discussion.

Back-patch to 9.5, where commit_ts was introduced.
2015-10-01 15:06:55 -03:00
Tom Lane
21995d3f6d Fix documentation error in commit 8703059c6b.
Etsuro Fujita spotted a thinko in the README commentary.
2015-10-01 10:32:11 -04:00
Robert Haas
286a3a68dc Fix readfuncs/outfuncs problems in last night's Gather patch.
KaiGai Kohei, with one correction by me.
2015-10-01 09:19:26 -04:00
Tom Lane
5884b92a84 Fix errors in commit a04bb65f70.
Not a lot of commentary needed here really.
2015-09-30 23:37:26 -04:00
Tom Lane
07e4d03fb4 Improve LISTEN startup time when there are many unread notifications.
If some existing listener is far behind, incoming new listener sessions
would start from that session's read pointer and then need to advance over
many already-committed notification messages, which they have no interest
in.  This was expensive in itself and also thrashed the pg_notify SLRU
buffers a lot more than necessary.  We can improve matters considerably
in typical scenarios, without much added cost, by starting from the
furthest-ahead read pointer, not the furthest-behind one.  We do have to
consider only sessions in our own database when doing this, which requires
an extra field in the data structure, but that's a pretty small cost.

Back-patch to 9.0 where the current LISTEN/NOTIFY logic was introduced.

Matt Newell, slightly adjusted by me
2015-09-30 23:32:43 -04:00
Robert Haas
3bd909b220 Add a Gather executor node.
A Gather executor node runs any number of copies of a plan in an equal
number of workers and merges all of the results into a single tuple
stream.  It can also run the plan itself, if the workers are
unavailable or haven't started up yet.  It is intended to work with
the Partial Seq Scan node which will be added in future commits.

It could also be used to implement parallel query of a different sort
by itself, without help from Partial Seq Scan, if the single_copy mode
is used.  In that mode, a worker executes the plan, and the parallel
leader does not, merely collecting the worker's results.  So, a Gather
node could be inserted into a plan to split the execution of that plan
across two processes.  Nested Gather nodes aren't currently supported,
but we might want to add support for that in the future.

There's nothing in the planner to actually generate Gather nodes yet,
so it's not quite time to break out the champagne.  But we're getting
close.

Amit Kapila.  Some designs suggestions were provided by me, and I also
reviewed the patch.  Single-copy mode, documentation, and other minor
changes also by me.
2015-09-30 19:23:36 -04:00
Robert Haas
227d57f358 Don't dump core when destroying an unused ParallelContext.
If a transaction or subtransaction creates a ParallelContext but ends
without calling InitializeParallelDSM, the previous code would
seg fault.  Fix that.
2015-09-30 18:36:31 -04:00
Stephen Frost
7d8db3e8f3 Include policies based on ACLs needed
When considering which policies should be included, rather than look at
individual bits of the query (eg: if a RETURNING clause exists, or if a
WHERE clause exists which is referencing the table, or if it's a
FOR SHARE/UPDATE query), consider any case where we've determined
the user needs SELECT rights on the relation while doing an UPDATE or
DELETE to be a case where we apply SELECT policies, and any case where
we've deteremind that the user needs UPDATE rights on the relation while
doing a SELECT to be a case where we apply UPDATE policies.

This simplifies the logic and addresses concerns that a user could use
UPDATE or DELETE with a WHERE clauses to determine if rows exist, or
they could use SELECT .. FOR UPDATE to lock rows which they are not
actually allowed to modify through UPDATE policies.

Use list_append_unique() to avoid adding the same quals multiple times,
as, on balance, the cost of checking when adding the quals will almost
always be cheaper than keeping them and doing busywork for each tuple
during execution.

Back-patch to 9.5 where RLS was added.
2015-09-30 07:39:24 -04:00
Tom Lane
6057f61b4d Small improvements in comments in async.c.
We seem to have lost a line somewhere along the way in the comment block
that discusses async.c's locks, because it suddenly refers to "both locks"
without previously having mentioned more than one.  Add a sentence to make
that read more sanely.  Also, refer to the "pos of the slowest backend"
not the "tail of the slowest backend", since we have no per-backend value
called "tail".
2015-09-29 22:07:16 -04:00
Alvaro Herrera
6b61955135 Code review for transaction commit timestamps
There are three main changes here:

1. No longer cause a start failure in a standby if the feature is
disabled in postgresql.conf but enabled in the master.  This reverts one
part of commit 4f3924d9cd43; what we keep is the ability of the standby
to activate/deactivate the module (which includes creating and removing
segments as appropriate) during replay of such actions in the master.

2. Replay WAL records affecting commitTS even if the feature is
disabled.  This means the standby will always have the same state as the
master after replay.

3. Have COMMIT PREPARE record the transaction commit time as well.  We
were previously only applying it in the normal transaction commit path.

Author: Petr Jelínek
Discussion: http://www.postgresql.org/message-id/CAHGQGwHereDzzzmfxEBYcVQu3oZv6vZcgu1TPeERWbDc+gQ06g@mail.gmail.com
Discussion: http://www.postgresql.org/message-id/CAHGQGwFuzfO4JscM9LCAmCDCxp_MfLvN4QdB+xWsS-FijbjTYQ@mail.gmail.com

Additionally, I cleaned up nearby code related to replication origins,
which I found a bit hard to follow, and fixed a couple of typos.

Backpatch to 9.5, where this code was introduced.

Per bug reports from Fujii Masao and subsequent discussion.
2015-09-29 14:40:56 -03:00
Robert Haas
758fcfdc01 Comment update for join pushdown.
Etsuro Fujita
2015-09-29 07:42:30 -04:00
Robert Haas
d1b7c1ffe7 Parallel executor support.
This code provides infrastructure for a parallel leader to start up
parallel workers to execute subtrees of the plan tree being executed
in the master.  User-supplied parameters from ParamListInfo are passed
down, but PARAM_EXEC parameters are not.  Various other constructs,
such as initplans, subplans, and CTEs, are also not currently shared.
Nevertheless, there's enough here to support a basic implementation of
parallel query, and we can lift some of the current restrictions as
needed.

Amit Kapila and Robert Haas
2015-09-28 21:55:57 -04:00
Alvaro Herrera
17f5831c81 Fix "sesssion" typo
It was introduced alongside replication origins, by commit
5aa2350426, so backpatch to 9.5.

Pointed out by Fujii Masao
2015-09-28 19:13:42 -03:00
Alvaro Herrera
590e2d12f0 COPY: use pg_plan_query() instead of planner()
While at it, trim the includes list in copy.c.  The planner headers
cannot be removed, but there are a few others that are not of any use.
2015-09-28 15:14:08 -03:00
Andres Freund
617db3a2d8 Fix ON CONFLICT DO UPDATE for tables with oids.
When taking the UPDATE path in an INSERT .. ON CONFLICT .. UPDATE tables
with oids were not supported. The tuple generated by the update target
list was projected without space for an oid - a simple oversight.

Reported-By: Peter Geoghegan
Author: Andres Freund
Backpatch: 9.5, where ON CONFLICT was introduced
2015-09-28 19:29:44 +02:00
Robert Haas
f40792a93c Use LOCKBIT_ON() instead of a bit shift in a few places.
We do this mostly everywhere, so it seems just as well to do it here,
too.

Thomas Munro
2015-09-28 10:57:15 -04:00
Andres Freund
aa29c1ccd9 Remove legacy multixact truncation support.
In 9.5 and master there is no need to support legacy truncation. This is
just committed separately to make it easier to backpatch the WAL logged
multixact truncation to 9.3 and 9.4 if we later decide to do so.

I bumped master's magic from 0xD086 to 0xD088 and 9.5's from 0xD085 to
0xD087 to avoid 9.5 reusing a value that has been in use on master while
keeping the numbers increasing between major versions.

Discussion: 20150621192409.GA4797@alap3.anarazel.de
Backpatch: 9.5
2015-09-26 19:04:25 +02:00
Andres Freund
4f627f8973 Rework the way multixact truncations work.
The fact that multixact truncations are not WAL logged has caused a fair
share of problems. Amongst others it requires to do computations during
recovery while the database is not in a consistent state, delaying
truncations till checkpoints, and handling members being truncated, but
offset not.

We tried to put bandaids on lots of these issues over the last years,
but it seems time to change course. Thus this patch introduces WAL
logging for multixact truncations.

This allows:
1) to perform the truncation directly during VACUUM, instead of delaying it
   to the checkpoint.
2) to avoid looking at the offsets SLRU for truncation during recovery,
   we can just use the master's values.
3) simplify a fair amount of logic to keep in memory limits straight,
   this has gotten much easier

During the course of fixing this a bunch of additional bugs had to be
fixed:
1) Data was not purged from memory the member's SLRU before deleting
   segments. This happened to be hard or impossible to hit due to the
   interlock between checkpoints and truncation.
2) find_multixact_start() relied on SimpleLruDoesPhysicalPageExist - but
   that doesn't work for offsets that haven't yet been flushed to
   disk. Add code to flush the SLRUs to fix. Not pretty, but it feels
   slightly safer to only make decisions based on actual on-disk state.
3) find_multixact_start() could be called concurrently with a truncation
   and thus fail. Via SetOffsetVacuumLimit() that could lead to a round
   of emergency vacuuming. The problem remains in
   pg_get_multixact_members(), but that's quite harmless.

For now this is going to only get applied to 9.5+, leaving the issues in
the older branches in place. It is quite possible that we need to
backpatch at a later point though.

For the case this gets backpatched we need to handle that an updated
standby may be replaying WAL from a not-yet upgraded primary. We have to
recognize that situation and use "old style" truncation (i.e. looking at
the SLRUs) during WAL replay. In contrast to before, this now happens in
the startup process, when replaying a checkpoint record, instead of the
checkpointer. Doing truncation in the restartpoint is incorrect, they
can happen much later than the original checkpoint, thereby leading to
wraparound.  To avoid "multixact_redo: unknown op code 48" errors
standbys would have to be upgraded before primaries.

A later patch will bump the WAL page magic, and remove the legacy
truncation codepaths. Legacy truncation support is just included to make
a possible future backpatch easier.

Discussion: 20150621192409.GA4797@alap3.anarazel.de
Reviewed-By: Robert Haas, Alvaro Herrera, Thomas Munro
Backpatch: 9.5 for now
2015-09-26 19:04:25 +02:00
Tom Lane
2abfd9d5e9 Second try at fixing O(N^2) problem in foreign key references.
This replaces ill-fated commit 5ddc72887a,
which was reverted because it broke active uses of FK cache entries.  In
this patch, we still do nothing more to invalidatable cache entries than
mark them as needing revalidation, so we won't break active uses.  To keep
down the overhead of InvalidateConstraintCacheCallBack(), keep a list of
just the currently-valid cache entries.  (The entries are large enough that
some added space for list links doesn't seem like a big problem.)  This
would still be O(N^2) when there are many valid entries, though, so when
the list gets too long, just force the "sinval reset" behavior to remove
everything from the list.  I set the threshold at 1000 entries, somewhat
arbitrarily.  Possibly that could be fine-tuned later.  Another item for
future study is whether it's worth adding reference counting so that we
could safely remove invalidated entries.  As-is, problem cases are likely
to end up with large and mostly invalid FK caches.

Like the previous attempt, backpatch to 9.3.

Jan Wieck and Tom Lane
2015-09-25 13:16:30 -04:00
Tom Lane
39df0f150c Allow planner to use expression-index stats for function calls in WHERE.
Previously, a function call appearing at the top level of WHERE had a
hard-wired selectivity estimate of 0.3333333, a kludge conveniently dated
in the source code itself to July 1992.  The expectation at the time was
that somebody would soon implement estimator support functions analogous
to those for operators; but no such code has appeared, nor does it seem
likely to in the near future.  We do have an alternative solution though,
at least for immutable functions on single relations: creating an
expression index on the function call will allow ANALYZE to gather stats
about the function's selectivity.  But the code in clause_selectivity()
failed to make use of such data even if it exists.

Refactor so that that will happen.  I chose to make it try this technique
for any clause type for which clause_selectivity() doesn't have a special
case, not just functions.  To avoid adding unnecessary overhead in the
common case where we don't learn anything new, make selfuncs.c provide an
API that hooks directly to examine_variable() and then var_eq_const(),
rather than the previous coding which laboriously constructed an OpExpr
only so that it could be expensively deconstructed again.

I preserved the behavior that the default estimate for a function call
is 0.3333333.  (For any other expression node type, it's 0.5, as before.)
I had originally thought to make the default be 0.5 across the board, but
changing a default estimate that's survived for twenty-three years seems
like something not to do without a lot more testing than I care to put
into it right now.

Per a complaint from Jehan-Guillaume de Rorthais.  Back-patch into 9.5,
but not further, at least for the moment.
2015-09-24 18:35:46 -04:00
Robert Haas
9f1255ac85 Don't zero opfuncid when reading nodes.
The comments here stated that this was just in case we ever had an
ALTER OPERATOR command that could remap an operator to a different
function.  But those comments have been here for a long time, and no
such command has come about.  In the absence of such a feature,
forcing the pg_proc OID to be looked up again each time we reread a
stored rule or similar is just a waste of cycles.  Moreover, parallel
query needs a way to reread the exact same node tree that was written
out, not one that has been slightly stomped on.  So just get rid of
this for now.

Per discussion with Tom Lane.
2015-09-24 11:36:29 -04:00
Andres Freund
020235a575 Lower *_freeze_max_age minimum values.
The old minimum values are rather large, making it time consuming to
test related behaviour. Additionally the current limits, especially for
multixacts, can be problematic in space-constrained systems. 10000000
multixacts can contain a lot of members.

Since there's no good reason for the current limits, lower them a good
bit. Setting them to 0 would be a bad idea, triggering endless vacuums,
so still retain a limit.

While at it fix autovacuum_multixact_freeze_max_age to refer to
multixact.c instead of varsup.c.

Reviewed-By: Robert Haas
Discussion: CA+TgmoYmQPHcrc3GSs7vwvrbTkbcGD9Gik=OztbDGGrovkkEzQ@mail.gmail.com
Backpatch: back to 9.0 (in parts)
2015-09-24 14:53:32 +02:00
Tom Lane
82e1ba7fd6 Make ANALYZE compute basic statistics even for types with no "=" operator.
Previously, ANALYZE simply ignored columns of datatypes that have neither
a btree nor hash opclass (which means they have no recognized equality
operator).  Without a notion of equality, we can't identify most-common
values nor estimate the number of distinct values.  But we can still
count nulls and compute the average physical column width, and those
stats might be of value.  Moreover there are some tools out there that
don't work so well if rows are missing from pg_statistic.  So let's
add suitable logic for this case.

While this is arguably a bug fix, it also has the potential to change
query plans, and the gain seems not worth taking a risk of that in
stable branches.  So back-patch into 9.5 but not further.

Oleksandr Shulgin, rewritten a bit by me.
2015-09-23 18:26:49 -04:00
Robert Haas
a0d9f6e434 Add readfuncs.c support for plan nodes.
For parallel query, we need to be able to pass a Plan to a worker, so
that it knows what it's supposed to do.  We could invent our own way
of serializing plans for that purpose, but piggybacking on the
existing node infrastructure seems like a much better idea.

Initially, we'll probably only support a limited number of nodes
within parallel workers, but this commit adds support for everything
in plannodes.h except CustomScan, because doing it all at once seems
easier than doing it piecemeal, and it makes testing this code easier,
too.  CustomScan is excluded because making that work requires a
larger rework of that facility.

Amit Kapila, reviewed and slightly revised by me.
2015-09-23 11:51:50 -04:00
Robert Haas
4fe6f72bda Print a MergeJoin's mergeNullsFirst array as bool, not int.
It's declared as being an array of bool, but it's printed
differently from the way bool and arrays of bool are handled
elsewhere.

Patch by Amit Kapila.  Anomaly noted independently by Amit Kapila
and KaiGai Kohei.
2015-09-23 10:53:29 -04:00
Teodor Sigaev
dc943ad952 Allow autoanalyze to add pages deleted from pending list to FSM
Commit e956808328 introduces adding pages
to FSM for ordinary insert, but autoanalyze was able just cleanup
pending list without adding to FSM.

Also fix double call of IndexFreeSpaceMapVacuum() during ginvacuumcleanup()

Report from Fujii Masao
Patch by me
Review by Jeff Janes
2015-09-23 15:33:51 +03:00
Robert Haas
262e56bcae Teach planstate_tree_walker about custom scans.
This logic was missing from ExplainPreScanNode, from which I derived
planstate_tree_walker.  But it shouldn't be missing, especially not
from a generic walker function, so add it.

KaiGai Kohei
2015-09-22 21:42:00 -04:00
Andres Freund
98d5b084d2 Correct value of LW_SHARED_MASK.
The previous wrong value lead to wrong LOCK_DEBUG output, never showing
any shared lock holders.

Reported-By: Alexander Korotkov
Discussion: CAPpHfdsPmWqz9FB0AnxJrwp1=KLF0n=-iB+QvR0Q8GSmpFVdUQ@mail.gmail.com
Backpatch: 9.5, where the bug was introduced.
2015-09-22 11:14:14 +02:00
Tom Lane
246693e5ae Fix possible internal overflow in numeric multiplication.
mul_var() postpones propagating carries until it risks overflow in its
internal digit array.  However, the logic failed to account for the
possibility of overflow in the carry propagation step, allowing wrong
results to be generated in corner cases.  We must slightly reduce the
when-to-propagate-carries threshold to avoid that.

Discovered and fixed by Dean Rasheed, with small adjustments by me.

This has been wrong since commit d72f6c7503,
so back-patch to all supported branches.
2015-09-21 12:11:32 -04:00
Noah Misch
7f11724bd6 Remove the SECURITY_ROW_LEVEL_DISABLED security context bit.
This commit's parent made superfluous the bit's sole usage.  Referential
integrity checks have long run as the subject table's owner, and that
now implies RLS bypass.  Safe use of the bit was tricky, requiring
strict control over the SQL expressions evaluating therein.  Back-patch
to 9.5, where the bit was introduced.

Based on a patch by Stephen Frost.
2015-09-20 20:47:17 -04:00
Noah Misch
537bd178c7 Remove the row_security=force GUC value.
Every query of a single ENABLE ROW SECURITY table has two meanings, with
the row_security GUC selecting between them.  With row_security=force
available, every function author would have been advised to either set
the GUC locally or test both meanings.  Non-compliance would have
threatened reliability and, for SECURITY DEFINER functions, security.
Authors already face an obligation to account for search_path, and we
should not mimic that example.  With this change, only BYPASSRLS roles
need exercise the aforementioned care.  Back-patch to 9.5, where the
row_security GUC was introduced.

Since this narrows the domain of pg_db_role_setting.setconfig and
pg_proc.proconfig, one might bump catversion.  A row_security=force
setting in one of those columns will elicit a clear message, so don't.
2015-09-20 20:45:41 -04:00
Tom Lane
ba51774d87 Be more wary about partially-valid LOCALLOCK data in RemoveLocalLock().
RemoveLocalLock() must consider the possibility that LockAcquireExtended()
failed to palloc the initial space for a locallock's lockOwners array.
I had evidently meant to cope with this hazard when the code was originally
written (commit 1785acebf2), but missed that
the pfree needed to be protected with an if-test.  Just to make sure things
are left in a clean state, reset numLockOwners as well.

Per low-memory testing by Andreas Seltenreich.  Back-patch to all supported
branches.
2015-09-20 16:48:44 -04:00
Peter Eisentraut
4a1e15e4a9 Add missing serial comma 2015-09-18 22:41:42 -04:00
Peter Eisentraut
f2dd10613e Remove trailing slashes from directories in find command
BSD find is not very smart and ends up writing double slashes into the
output in those cases.  Also, xgettext is not very smart and splits the
file names incorrectly in those cases, resulting in slightly incorrect
file names being written into the POT file.
2015-09-18 22:06:54 -04:00
Robert Haas
4a4e6893aa Glue layer to connect the executor to the shm_mq mechanism.
The shm_mq mechanism was built to send error (and notice) messages and
tuples between backends.  However, shm_mq itself only deals in raw
bytes.  Since commit 2bd9e412f9, we have
had infrastructure for one message to redirect protocol messages to a
queue and for another backend to parse them and do useful things with
them.  This commit introduces a somewhat analogous facility for tuples
by adding a new type of DestReceiver, DestTupleQueue, which writes
each tuple generated by a query into a shm_mq, and a new
TupleQueueFunnel facility which reads raw tuples out of the queue and
reconstructs the HeapTuple format expected by the executor.

The TupleQueueFunnel abstraction supports reading from multiple tuple
streams at the same time, but only in round-robin fashion.  Someone
could imaginably want other policies, but this should be good enough
to meet our short-term needs related to parallel query, and we can
always extend it later.

This also makes one minor addition to the shm_mq API that didn'
seem worth breaking out as a separate patch.

Extracted from Amit Kapila's parallel sequential scan patch.  This
code was originally written by me, and then it was revised by Amit,
and then it was revised some more by me.
2015-09-18 21:56:58 -04:00
Andrew Dunstan
c00c3249e3 Cache argument type information in json(b) aggregate functions.
These functions have been looking up type info for every row they
process. Instead of doing that we only look them up the first time
through and stash the information in the aggregate state object.

Affects json_agg, json_object_agg, jsonb_agg and jsonb_object_agg.

There is plenty more work to do in making these more efficient,
especially the jsonb functions, but this is a virtually cost free
improvement that can be done right away.

Backpatch to 9.5 where the jsonb variants were introduced.
2015-09-18 14:39:39 -04:00
Tom Lane
d9c0c728af Fix low-probability memory leak in regex execution.
After an internal failure in shortest() or longest() while pinning down the
exact location of a match, find() forgot to free the DFA structure before
returning.  This is pretty unlikely to occur, since we just successfully
ran the "search" variant of the DFA; but it could happen, and it would
result in a session-lifespan memory leak since this code uses malloc()
directly.  Problem seems to have been aboriginal in Spencer's library,
so back-patch all the way.

In passing, correct a thinko in a comment I added awhile back about the
meaning of the "ntree" field.

I happened across these issues while comparing our code to Tcl's version
of the library.
2015-09-18 13:55:17 -04:00
Teodor Sigaev
d63a1720fa Add header forgotten in 213335c145
Report from Peter Eisentraut
2015-09-18 14:32:09 +03:00
Teodor Sigaev
9acb9007de Fix oversight in tsearch type check
Use IsBinaryCoercible() method instead of custom
is_expected_type/is_text_type functions which was introduced when tsearch2
was moved into core.

Per report by David E. Wheeler
Analysis by Tom Lane
Patch by me
2015-09-17 19:50:51 +03:00
Robert Haas
8dd401aa07 Add new function planstate_tree_walker.
ExplainPreScanNode knows how to iterate over a generic tree of plan
states; factor that logic out into a separate walker function so that
other code, such as upcoming patches for parallel query, can also use
it.

Patch by me, reviewed by Tom Lane.
2015-09-17 11:27:06 -04:00
Teodor Sigaev
22f519c92a Fix bug introduced by microvacuum for GiST
Commit 013ebc0a7b introduces microvacuum for
GiST, deletetion of tuple marked LP_DEAD uses IndexPageMultiDelete while
recovery code uses IndexPageTupleDelete in loop. This causes a difference
in offset numbers of tuples to delete. Patch introduces usage of
IndexPageMultiDelete in GiST except gistplacetopage() where only one tuple is
deleted at once. That also slightly improve performance, because
IndexPageMultiDelete is more effective.

Patch changes WAL format, so bump wal page magic.

Bug report from Jeff Janes
Diagnostic and patch by Anastasia Lubennikova and me
2015-09-17 14:22:37 +03:00
Robert Haas
7aea8e4f2d Determine whether it's safe to attempt a parallel plan for a query.
Commit 924bcf4f16 introduced a framework
for parallel computation in PostgreSQL that makes most but not all
built-in functions safe to execute in parallel mode.  In order to have
parallel query, we'll need to be able to determine whether that query
contains functions (either built-in or user-defined) that cannot be
safely executed in parallel mode.  This requires those functions to be
labeled, so this patch introduces an infrastructure for that.  Some
functions currently labeled as safe may need to be revised depending on
how pending issues related to heavyweight locking under paralllelism
are resolved.

Parallel plans can't be used except for the case where the query will
run to completion.  If portal execution were suspended, the parallel
mode restrictions would need to remain in effect during that time, but
that might make other queries fail.  Therefore, this patch introduces
a framework that enables consideration of parallel plans only when it
is known that the plan will be run to completion.  This probably needs
some refinement; for example, at bind time, we do not know whether a
query run via the extended protocol will be execution to completion or
run with a limited fetch count.  Having the client indicate its
intentions at bind time would constitute a wire protocol break.  Some
contexts in which parallel mode would be safe are not adjusted by this
patch; the default is not to try parallel plans except from call sites
that have been updated to say that such plans are OK.

This commit doesn't introduce any parallel paths or plans; it just
provides a way to determine whether they could potentially be used.
I'm committing it on the theory that the remaining parallel sequential
scan patches will also get committed to this release, hopefully in the
not-too-distant future.

Robert Haas and Amit Kapila.  Reviewed (in earlier versions) by Noah
Misch.
2015-09-16 15:38:47 -04:00
Tom Lane
b44d92b67b Sync regex code with Tcl 8.6.4.
Sync our regex code with upstream changes since last time we did this,
which was Tcl 8.5.11 (see commit 08fd6ff37f).

The only functional change here is to disbelieve that an octal escape is
three digits long if it would exceed \377.  That's a bug fix, but it's
a minor one and could change the interpretation of working regexes, so
don't back-patch.

In addition to that, s/INFINITY/DUPINF/ to eliminate the risk of collisions
with <math.h>'s macro, and s/LOCAL/NOPROP/ because that also seems like
an unnecessarily collision-prone macro name.

There were some other cosmetic changes in their copy that I did not adopt,
notably a rather half-hearted attempt at renaming some of the C functions
in a more verbose style.  (I'm not necessarily against the concept, but
renaming just a few functions in the package is not an improvement.)
2015-09-16 15:25:25 -04:00
Stephen Frost
4f3b2a8883 Enforce ALL/SELECT policies in RETURNING for RLS
For the UPDATE/DELETE RETURNING case, filter the records which are not
visible to the user through ALL or SELECT policies from those considered
for the UPDATE or DELETE.  This is similar to how the GRANT system
works, which prevents RETURNING unless the caller has SELECT rights on
the relation.

Per discussion with Robert, Dean, Tom, and Kevin.

Back-patch to 9.5 where RLS was introduced.
2015-09-15 15:49:31 -04:00
Stephen Frost
22eaf35c1d RLS refactoring
This refactors rewrite/rowsecurity.c to simplify the handling of the
default deny case (reducing the number of places where we check for and
add the default deny policy from three to one) by splitting up the
retrival of the policies from the application of them.

This also allowed us to do away with the policy_id field.  A policy_name
field was added for WithCheckOption policies and is used in error
reporting, when available.

Patch by Dean Rasheed, with various mostly cosmetic changes by me.

Back-patch to 9.5 where RLS was introduced to avoid unnecessary
differences, since we're still in alpha, per discussion with Robert.
2015-09-15 15:49:31 -04:00
Tom Lane
3d9e8db9e5 Revert "Fix an O(N^2) problem in foreign key references".
Commit 5ddc72887a does not actually work
because it will happily blow away ri_constraint_cache entries that are
in active use in outer call levels.  In any case, it's a very ugly,
brute-force solution to the problem of limiting the cache size.
Revert until it can be redesigned.
2015-09-15 11:09:15 -04:00
Fujii Masao
10fbb79f1a Improve log messages related to tablespace_map file
This patch changes the log message which is logged when the server
successfully renames backup_label file to *.old but fails to rename
tablespace_map file during the shutdown. Previously the WARNING
message "online backup mode was not canceled" was logged in that case.
However this message is confusing because the backup mode is treated
as canceled whenever backup_label is successfully renamed. So this
commit makes the server log the message "online backup mode canceled"
in that case.

Also this commit changes errdetail messages so that they follow the
error message style guide.

Back-patch to 9.5 where tablespace_map file is introduced.

Original patch by Amit Kapila, heavily modified by me.
2015-09-15 23:21:51 +09:00
Andrew Dunstan
e7e3ac2d51 Fix the fastpath rule for jsonb_concat with an empty operand.
To prevent perverse results, we now only return the other operand if
it's not scalar, and if both operands are of the same kind (array or
object).

Original bug complaint and patch from Oskari Saarenmaa, extended by me
to cover the cases of different kinds of jsonb.

Backpatch to 9.5 where jsonb_concat was introduced.
2015-09-13 17:06:45 -04:00
Peter Eisentraut
b2ae8f1e35 Update SQL features list 2015-09-12 00:08:18 -04:00
Robert Haas
2ccc4e972e Fix build problems in commit aa65de042f.
The previous way didn't work for vpath builds, and make distprep was
busted too.

Reported off-list by Andres Freund.
2015-09-11 14:56:17 -04:00
Alvaro Herrera
5cd6538345 Add missing ReleaseBuffer call in BRIN revmap code
I think this particular branch is actually dead, but the analysis to
prove that is not trivial, so instead take the weasel way.

Reported by Jinyu Zhang
Backpatch to 9.5, where BRIN was introduced.
2015-09-11 15:29:46 -03:00
Kevin Grittner
5ddc72887a Fix an O(N^2) problem in foreign key references.
Commit 45ba424f improved foreign key lookups during bulk updates
when the FK value does not change.  When restoring a schema dump
from a database with many (say 100,000) foreign keys, this cache
would grow very big and every ALTER TABLE command was causing an
InvalidateConstraintCacheCallBack(), which uses a sequential hash
table scan.  This could cause a severe performance regression in
restoring a schema dump (including during pg_upgrade).

The patch uses a heuristic method of detecting when the hash table
should be destroyed and recreated.
InvalidateConstraintCacheCallBack() adds the current size of the
hash table to a counter.  When that sum reaches 1,000,000, the hash
table is flushed.  This fixes the regression without noticeable
harm to the bulk update use case.

Jan Wieck
Backpatch to 9.3 where the performance regression was introduced.
2015-09-11 13:06:51 -05:00
Robert Haas
aa65de042f When trace_lwlocks is used, identify individual lwlocks by name.
Naming the individual lwlocks seems like something that may be useful
for other types of debugging, monitoring, or instrumentation output,
but this commit just implements it for the specific case of
trace_lwlocks.

Patch by me, reviewed by Amit Kapila and Kyotaro Horiguchi
2015-09-11 14:01:39 -04:00
Tom Lane
87efbc2be1 Fix setrefs.c comment properly.
The "typo" alleged in commit 1e460d4bd was actually a comment that was
correct when written, but I missed updating it in commit b5282aa89.
Use a slightly less specific (and hopefully more future-proof) description
of what is collected.  Back-patch to 9.2 where that commit appeared, and
revert the comment to its then-entirely-correct state before that.
2015-09-10 10:23:56 -04:00
Stephen Frost
1e460d4bd6 Fix typo in setrefs.c
We're adding OIDs, not TIDs, to invalItems.

Pointed out by Etsuro Fujita.

Back-patch to all supported branches.
2015-09-10 09:22:03 -04:00
Tom Lane
91cf3135b9 Fix minor bug in regexp makesearch() function.
The list-wrangling here was done wrong, allowing the same state to get
put into the list twice.  The following loop then would clone it twice.
The second clone would wind up with no inarcs, so that there was no
observable misbehavior AFAICT, but a useless state in the finished NFA
isn't an especially good thing.
2015-09-09 20:14:58 -04:00
Teodor Sigaev
223936e226 Fix oversight in 013ebc0a7b commit
Declaration of varibale inside ÓÝ×Õ
2015-09-09 19:21:16 +03:00
Teodor Sigaev
013ebc0a7b Microvacuum for GIST
Mark index tuple as dead if it's pointed by kill_prior_tuple during
ordinary (search) scan and remove it during insert process if there is no
enough space for new tuple to insert. This improves select performance
because index will not return tuple marked as dead and improves insert
performance because it reduces number of page split.

Anastasia Lubennikova <a.lubennikova@postgrespro.ru> with
 minor editorialization by me
2015-09-09 18:43:37 +03:00
Fujii Masao
96f6a0cb41 Remove files signaling a standby promotion request at postmaster startup
This commit makes postmaster forcibly remove the files signaling
a standby promotion request. Otherwise, the existence of those files
can trigger a promotion too early, whether a user wants that or not.

This removal of files is usually unnecessary because they can exist
only during a few moments during a standby promotion. However
there is a race condition: if pg_ctl promote is executed and creates
the files during a promotion, the files can stay around even after
the server is brought up to new master. Then, if new standby starts
by using the backup taken from that master, the files can exist
at the server startup and should be removed in order to avoid
an unexpected promotion.

Back-patch to 9.1 where promote signal file was introduced.

Problem reported by Feike Steenbergen.
Original patch by Michael Paquier, modified by me.

Discussion: 20150528100705.4686.91426@wrigleys.postgresql.org
2015-09-09 22:51:44 +09:00
Stephen Frost
c3e0ddd403 Lock all relations referred to in updatable views
Even views considered "simple" enough to be automatically updatable may
have mulitple relations involved (eg: in a where clause).  We need to
make sure and lock those relations when rewriting the query.

Back-patch to 9.3 where updatable views were added.

Pointed out by Andres, patch thanks to Dean Rasheed.
2015-09-08 17:02:49 -04:00
Fujii Masao
043113e798 Add gin_fuzzy_search_limit to postgresql.conf.sample.
This was forgotten in 8a3631f (commit that originally added the parameter)
and 0ca9907 (commit that added the documentation later that year).

Back-patch to all supported versions.
2015-09-09 02:25:50 +09:00
Alvaro Herrera
1aba62ec63 Allow per-tablespace effective_io_concurrency
Per discussion, nowadays it is possible to have tablespaces that have
wildly different I/O characteristics from others.  Setting different
effective_io_concurrency parameters for those has been measured to
improve performance.

Author: Julien Rouhaud
Reviewed by: Andres Freund
2015-09-08 12:51:42 -03:00
Jeff Davis
b1e1862a12 Coordinate log_line_prefix options 'm' and 'n' to share a timeval.
Commit f828654e introduced the 'n' option, but it invoked
gettimeofday() independently of the 'm' option. If both options were
in use (or multiple 'n' options), or if 'n' was in use along with
csvlog, then the reported times could be different for the same log
message.

To fix, initialize a global variable with gettimeofday() once per log
message, and use that for both formats.

Don't bother coordinating the time for the 't' option, which has much
lower resolution.

Per complaint by Alvaro Herrera.
2015-09-07 15:40:49 -07:00
Jeff Davis
f828654e10 Add log_line_prefix option 'n' for Unix epoch.
Prints time as Unix epoch with milliseconds.

Tomas Vondra, reviewed by Fabien Coelho.
2015-09-07 13:46:31 -07:00
Teodor Sigaev
e26692248a Make GIN's cleanup pending list process interruptable
Cleanup process could be called by ordinary insert/update and could take a lot
of time. Add vacuum_delay_point() to make this process interruptable. Under
vacuum this call will also throttle a vacuum process to decrease system load,
called from insert/update it will not throttle, and that reduces a latency.

Backpatch for all supported branches.

Jeff Janes <jeff.janes@gmail.com>
2015-09-07 17:16:29 +03:00
Teodor Sigaev
e956808328 Add pages deleted from pending list to FSM
Add pages deleted from GIN's pending list during cleanup to free space map
immediately. Clean up process could be initiated by ordinary insert but adding
page to FSM might occur only at vacuum. On some workload like never-vacuumed
insert-only tables it could cause a huge bloat.

Jeff Janes <jeff.janes@gmail.com>
2015-09-07 16:24:01 +03:00
Magnus Hagander
643beffe8f Support RADIUS passwords up to 128 characters
Previous limit was 16 characters, due to lack of support for multiple passes
of encryption.

Marko Tiikkaja
2015-09-06 14:31:53 +02:00
Andres Freund
c314ead5be Add ability to reserve WAL upon slot creation via replication protocol.
Since 6fcd885 it is possible to immediately reserve WAL when creating a
slot via pg_create_physical_replication_slot(). Extend the replication
protocol to allow that as well.

Although, in contrast to the SQL interface, it is possible to update the
reserved location via the replication interface, it is still useful
being able to reserve upon creation there. Otherwise the logic in
ReplicationSlotReserveWal() has to be repeated in slot employing
clients.

Author: Michael Paquier
Discussion: CAB7nPqT0Wc1W5mdYGeJ_wbutbwNN+3qgrFR64avXaQCiJMGaYA@mail.gmail.com
2015-09-06 13:30:57 +02:00
Greg Stark
258ee1b635 Move DTK_ISODOW DTK_DOW and DTK_DOY to be type UNITS rather than
RESERV. RESERV is meant for tokens like "now" and having them in that
category throws errors like these when used as an input date:

stark=# SELECT 'doy'::timestamptz;
ERROR:  unexpected dtype 33 while parsing timestamptz "doy"
LINE 1: SELECT 'doy'::timestamptz;
               ^
stark=# SELECT 'dow'::timestamptz;
ERROR:  unexpected dtype 32 while parsing timestamptz "dow"
LINE 1: SELECT 'dow'::timestamptz;
               ^

Found by LLVM's Libfuzzer
2015-09-06 03:35:56 +01:00
Tom Lane
9270d8db9a Fix CreateTableSpace() so it will compile without HAVE_SYMLINK.
This has been broken since 9.3 (commit 82b1b213ca to be exact),
which suggests that nobody is any longer using a Windows build system that
doesn't provide a symlink emulation.  Still, it's wrong on its own terms,
so repair.

YUriy Zhuravlev
2015-09-05 16:15:38 -04:00
Heikki Linnakangas
c80b5f66c6 Fix misc typos.
Oskari Saarenmaa. Backpatch to stable branches where applicable.
2015-09-05 11:35:49 +03:00
Tatsuo Ishii
c39f5674df Fix brin index summarizing while vacuuming.
If the number of heap blocks is not multiples of pages per range, the
summarizing produces wrong summary information for the last brin index
tuple while vacuuming.

Problem reported by Tatsuo Ishii and fixed by Amit Langote.

Discussion at "[HACKERS] BRIN INDEX value (message id :20150903.174935.1946402199422994347.t-ishii@sraoss.co.jp)
Backpatched to 9.5 in which brin index was added.
2015-09-05 09:19:25 +09:00
Tom Lane
c5454f99c4 Fix subtransaction cleanup after an outer-subtransaction portal fails.
Formerly, we treated only portals created in the current subtransaction as
having failed during subtransaction abort.  However, if the error occurred
while running a portal created in an outer subtransaction (ie, a cursor
declared before the last savepoint), that has to be considered broken too.

To allow reliable detection of which ones those are, add a bookkeeping
field to struct Portal that tracks the innermost subtransaction in which
each portal has actually been executed.  (Without this, we'd end up
failing portals containing functions that had called the subtransaction,
thereby breaking plpgsql exception blocks completely.)

In addition, when we fail an outer-subtransaction Portal, transfer its
resources into the subtransaction's resource owner, so that they're
released early in cleanup of the subxact.  This fixes a problem reported by
Jim Nasby in which a function executed in an outer-subtransaction cursor
could cause an Assert failure or crash by referencing a relation created
within the inner subtransaction.

The proximate cause of the Assert failure is that AtEOSubXact_RelationCache
assumed it could blow away a relcache entry without first checking that the
entry had zero refcount.  That was a bad idea on its own terms, so add such
a check there, and to the similar coding in AtEOXact_RelationCache.  This
provides an independent safety measure in case there are still ways to
provoke the situation despite the Portal-level changes.

This has been broken since subtransactions were invented, so back-patch
to all supported branches.

Tom Lane and Michael Paquier
2015-09-04 13:37:14 -04:00
Robert Haas
4aec49899e Assorted code review for recent ProcArrayLock patch.
Post-commit review by Andres Freund discovered a couple of concurrency
bugs in the original patch: specifically, if the leader cleared a
follower's XID before it reached PGSemaphoreLock, the semaphore would be
left in the wrong state; and if another process did PGSemaphoreUnlock
for some unrelated reason, we might resume execution before the fact
that our XID was cleared was globally visible.

Also, improve the wording of some comments, rename nextClearXidElem
to firstClearXidElem in PROC_HDR for clarity, and drop some volatile
qualifiers that aren't necessary.

Amit Kapila, reviewed and slightly revised by me.
2015-09-03 13:19:15 -04:00
Fujii Masao
1ea5ce5c5f Document that max_worker_processes must be high enough in standby.
The setting values of some parameters including max_worker_processes
must be equal to or higher than the values on the master. However,
previously max_worker_processes was not listed as such parameter
in the document. So this commit adds it to that list.

Back-patch to 9.4 where max_worker_processes was added.
2015-09-03 22:30:16 +09:00
Teodor Sigaev
30bb26b5e0 Allow usage of huge maintenance_work_mem for GIN build.
Currently, in-memory posting list during GIN build process is limited 1GB
because of using repalloc. The patch replaces call of repalloc to repalloc_huge.
It increases limit of posting list from 180 millions
(1GB / sizeof(ItemPointerData)) to 4 billions limited by maxcount/count fields
in GinEntryAccumulator and subsequent calls. Check added.

Also, fix accounting of allocatedMemory during build to prevent integer
overflow with maintenance_work_mem > 4GB.

Robert Abraham <robert.abraham86@googlemail.com> with additions by me
2015-09-02 20:08:58 +03:00
Robert Haas
8a02b3d732 Allow notifications to bgworkers without database connections.
Previously, if one background worker registered another background
worker and set bgw_notify_pid while for the second background worker,
it would not receive notifications from the postmaster unless, at the
time the "parent" was registered, BGWORKER_BACKEND_DATABASE_CONNECTION
was set.

To fix, instead instead of including only those background workers that
requested database connections in the postmater's BackendList, include
them all.  There doesn't seem to be any reason not do this, and indeed
it removes a significant amount of duplicated code.  The other option
is to make PostmasterMarkPIDForWorkerNotify look at BackgroundWorkerList
in addition to BackendList, but that adds more code duplication instead
of getting rid of it.

Patch by me.  Review and testing by Ashutosh Bapat.
2015-09-01 15:30:19 -04:00
Tom Lane
123c9d2fc1 Clean up icc + ia64 situation.
Some googling turned up multiple sources saying that older versions of icc
do not accept gcc-compatible asm blocks on IA64, though asm does work on
x86[_64].  This is apparently fixed as of icc version 12.0 or so, but that
doesn't help us much; if we have to carry the extra implementation anyway,
we may as well just use it for icc rather than add a compiler version test.

Hence, revert commit 2c713d6ea2 (though I
separated the icc code from the gcc code completely, producing what seems
cleaner code).  Document the state of affairs more explicitly, both in
s_lock.h and postgres.c, and make some cosmetic adjustments around the
IA64 code in s_lock.h.
2015-08-31 18:10:04 -04:00
Tom Lane
f333204bbc Actually, it's not that hard to merge the Windows pqsignal code ...
... just need to typedef sigset_t and provide sigemptyset/sigfillset,
which are easy enough.
2015-08-31 15:52:56 -04:00
Tom Lane
2c713d6ea2 Remove theoretically-unnecessary special case for icc.
Intel's icc is generally able to swallow asm blocks written for gcc.
We have a few places that don't seem to know that, though.  Experiment
with removing the special case for icc in ia64_get_bsp(); if the buildfarm
likes this, I'll try more cleanup.  This is a good test case because it
involves a "stop" notation that seems like it might not be very portable.
2015-08-31 14:43:10 -04:00
Tom Lane
a65e086453 Remove support for Unix systems without the POSIX signal APIs.
Remove configure's checks for HAVE_POSIX_SIGNALS, HAVE_SIGPROCMASK, and
HAVE_SIGSETJMP.  These APIs are required by the Single Unix Spec v2
(POSIX 1997), which we generally consider to define our minimum required
set of Unix APIs.  Moreover, no buildfarm member has reported not having
them since 2012 or before, which means that even if the code is still live
somewhere, it's untested --- and we've made plenty of signal-handling
changes of late.  So just take these APIs as given and save the cycles for
configure probes for them.

However, we can't remove as much C code as I'd hoped, because the Windows
port evidently still uses the non-POSIX code paths for signal masking.
Since we're largely emulating these BSD-style APIs for Windows anyway, it
might be a good thing to switch over to POSIX-like notation and thereby
remove a few more #ifdefs.  But I'm not in a position to code or test that.
In the meantime, we can at least make things a bit more transparent by
testing for WIN32 explicitly in these places.
2015-08-31 12:56:10 -04:00
Stephen Frost
2ba9e2b778 Ensure locks are acquired on RLS-added relations
During fireRIRrules(), get_row_security_policies can add to
securityQuals and withCheckOptions.  Make sure to lock any relations
added at that point and before firing RIR rules on those expressions.

Back-patch to 9.5 where RLS was added.
2015-08-28 11:39:37 -04:00
Andres Freund
c0f0d8097b Clarify what some historic terms in rewriteHandler.c mean.
Discussion: 20150827131352.GF2435@awork2.anarazel.de
2015-08-28 16:27:58 +02:00
Tom Lane
8a7d070181 Speed up HeapTupleSatisfiesMVCC() by replacing the XID-in-progress test.
Rather than consulting TransactionIdIsInProgress to see if an in-doubt
transaction is still running, consult XidInMVCCSnapshot.  That requires
the same or fewer cycles as TransactionIdIsInProgress, and what's far
more important, it does not access shared data structures (at least in the
no-subxip-overflow case) so it incurs no contention.  Furthermore, we would
have had to check XidInMVCCSnapshot anyway before deciding that we were
allowed to see the tuple.

There should never be a case where XidInMVCCSnapshot says a transaction is
done while TransactionIdIsInProgress says it's still running.  The other
way around is quite possible though.  The result of that difference is that
HeapTupleSatisfiesMVCC will no longer set hint bits on tuples whose source
transactions recently finished but are still running according to our
snapshot.  The main cost of delaying the hint-bit setting is that repeated
visits to a just-committed tuple, by transactions none of which have
snapshots new enough to see the source transaction as done, will each
execute TransactionIdIsCurrentTransactionId, which they need not have done
before.  However, that's normally just a small overhead, and no contention
costs are involved; so it seems well worth the benefit of removing
TransactionIdIsInProgress calls during the life of the source transaction.

The core idea for this patch is due to Jeff Janes, who also did the legwork
proving its performance benefits.  His original proposal was to swap the
order of TransactionIdIsInProgress and XidInMVCCSnapshot calls in some
cases within HeapTupleSatisfiesMVCC.  That was a bit messy though.
The idea that we could dispense with calling TransactionIdIsInProgress
altogether was mine, as is the final patch.
2015-08-26 18:19:07 -04:00
Tom Lane
7b5ef8f2d0 Limit the verbosity of memory context statistics dumps.
We had a report from Stefan Kaltenbrunner of a case in which postmaster
log files overran available disk space because multiple backends spewed
enormous context stats dumps upon hitting an out-of-memory condition.
Given the lack of similar reports, this isn't a common problem, but it
still seems worth doing something about.  However, we don't want to just
blindly truncate the output, because that might prevent diagnosis of OOM
problems.  What seems like a workable compromise is to limit the dump to
100 child contexts per parent, and summarize the space used within any
additional child contexts.  That should help because practical cases where
the dump gets long will typically be huge numbers of siblings under the
same parent context; while the additional debugging value from seeing
details about individual siblings beyond 100 will not be large, we hope.
Anyway it doesn't take much code or memory space to do this, so let's try
it like this and see how things go.

Since the summarization mechanism requires passing totals back up anyway,
I took the opportunity to add a "grand total" line to the end of the
printout.
2015-08-25 13:09:48 -04:00
Tom Lane
aad663a0b4 Reduce number of bytes examined by convert_one_string_to_scalar().
Previously, convert_one_string_to_scalar() would examine up to 20 bytes of
the input string, producing a scalar conversion with theoretical precision
far greater than is of any possible use considering the other limitations
on the accuracy of the resulting selectivity estimate.  (I think this
choice might pre-date the caller-level logic that strips any common prefix
of the strings; before that, there could have been value in scanning the
strings far enough to use all the precision available in a double.)

Aside from wasting cycles to little purpose, this choice meant that the
"denom" variable could grow to as much as 256^21 = 3.74e50, which could
overflow in some non-IEEE float arithmetics.  While we don't really support
any machines with non-IEEE arithmetic anymore, this still seems like quite
an unnecessary platform dependency.  Limit the scan to 12 bytes instead,
thus limiting "denom" to 256^13 = 2.03e31, a value more likely to be
computable everywhere.

Per testing by Greg Stark, which showed overflow failures in our standard
regression tests on VAX.
2015-08-23 15:15:47 -04:00
Tom Lane
44ed65a545 Avoid use of float arithmetic in bipartite_match.c.
Since the distances used in this algorithm are small integers (not more
than the size of the U set, in fact), there is no good reason to use float
arithmetic for them.  Use short ints instead: they're smaller, faster, and
require no special portability assumptions.

Per testing by Greg Stark, which disclosed that the code got into an
infinite loop on VAX for lack of IEEE-style float infinities.  We don't
really care all that much whether Postgres can run on a VAX anymore,
but there seems sufficient reason to change this code anyway.

In passing, make a few other small adjustments to make the code match
usual Postgres coding style a bit better.
2015-08-23 13:02:18 -04:00
Kevin Grittner
5956b7f9e8 Fix typo in C comment.
Merlin Moncure
Backpatch to 9.5, where the misspelling was introduced
2015-08-23 10:38:57 -05:00
Peter Eisentraut
b386271594 Improve whitespace 2015-08-22 21:54:35 -04:00
Tom Lane
6e5d9f278c Avoid O(N^2) behavior when enlarging SPI tuple table in spi_printtup().
For no obvious reason, spi_printtup() was coded to enlarge the tuple
pointer table by just 256 slots at a time, rather than doubling the size at
each reallocation, as is our usual habit.  For very large SPI results, this
makes for O(N^2) time spent in repalloc(), which of course soon comes to
dominate the runtime.  Use the standard doubling approach instead.

This is a longstanding performance bug, so back-patch to all active
branches.

Neil Conway
2015-08-21 20:32:11 -04:00
Alvaro Herrera
e68be16b0d Do not allow *timestamp to be passed as NULL
The code had bugs that would cause crashes if NULL was passed as that
argument (originally intended to mean not to bother returning its
value), and after inspection it turns out that nothing seems interested
in the case that *ts is NULL anyway.  Therefore, remove the partial
checks intended to support that case.

Author: Michael Paquier
though I didn't include a proposed Assert.

Backpatch to 9.5.
2015-08-21 14:36:54 -03:00
Alvaro Herrera
8c3d63c521 Remove ExecGetScanType function
This became unused in a191a169d6.
2015-08-21 14:11:58 -03:00
Tom Lane
09b3d27256 Allow record_in() and record_recv() to work for transient record types.
If we have the typmod that identifies a registered record type, there's no
reason that record_in() should refuse to perform input conversion for it.
Now, in direct SQL usage, record_in() will always be passed typmod = -1
with type OID RECORDOID, because no typmodin exists for type RECORD, so the
case can't arise.  However, some InputFunctionCall users such as PLs may be
able to supply the right typmod, so we should allow this to support them.

Note: the previous coding and comment here predate commit 59c016aa9f.
There has been no case since 8.1 in which the passed type OID wouldn't be
valid; and if it weren't, this error message wouldn't be apropos anyway.
Better to let lookup_rowtype_tupdesc complain about it.

Back-patch to 9.1, as this is necessary for my upcoming plpython fix.
I'm committing it separately just to make it a bit more visible in the
commit history.
2015-08-21 11:19:33 -04:00
Stephen Frost
3c99788797 Rename 'cmd' to 'cmd_name' in CreatePolicyStmt
To avoid confusion, rename CreatePolicyStmt's 'cmd' to 'cmd_name',
parse_policy_command's 'cmd' to 'polcmd', and AlterPolicy's 'cmd_datum'
to 'polcmd_datum', per discussion with Noah and as a follow-up to his
correction of copynodes/equalnodes handling of the CreatePolicyStmt
'cmd' field.

Back-patch to 9.5 where the CreatePolicyStmt was introduced, as we
are still only in alpha.
2015-08-21 08:22:22 -04:00
Stephen Frost
7ec8296e70 In AlterRole, make bypassrls an int
When reworking bypassrls in AlterRole to operate the same way the other
attribute handling is done, I missed that the variable was incorrectly a
bool rather than an int.  This meant that on platforms with an unsigned
char, we could end up with incorrect behavior during ALTER ROLE.

Pointed out by Andres thanks to tests he did changing our bool to be the
one from stdbool.h which showed this and a number of other issues.

Add regression tests to test CREATE/ALTER role for the various role
attributes.  Arrange to leave roles behind for testing pg_dumpall, but
none which have the LOGIN attribute.

Back-patch to 9.5 where the AlterRole bug exists.
2015-08-21 08:22:22 -04:00
Kevin Grittner
1cac8c9820 Fix bug in calculations of hash join buckets.
Commit 8cce08f168 used a left-shift
on a literal of 1 that could (in large allocations) be shifted by
31 or more bits.  This was assigned to a local variable that was
already declared to be a long to protect against overruns of int,
but the literal in this shift needs to be declared long to allow it
to work correctly in some compilers.

Backpatch to 9.5, where the bug was introduced.

Report and patch by KaiGai Kohei, slighly modified based on
discussion.
2015-08-19 08:20:55 -05:00
Andres Freund
e95126cf04 Don't use function definitions looking like old-style ones.
This fixes a bunch of somewhat pedantic warnings with new
compilers. Since by far the majority of other functions definitions use
the (void) style it just seems to be consistent to do so as well in the
remaining few places.
2015-08-15 17:25:00 +02:00
Andres Freund
f9dec81a54 Correct type of waitMode variable in ExecInsertIndexTuples().
It was a bool, even though it should be CEOUC_WAIT_MODE. That's unlikely
to have a negative effect with the current definition of bool (char),
but it's definitely wrong.

Discussion: 20150812084351.GD8470@awork2.anarazel.de
Backpatch: 9.5, where ON CONFLICT was merged
2015-08-15 17:11:42 +02:00
Andres Freund
6c772c7453 Don't use 'bool' as a struct member name in help_config.c.
Doing so doesn't work if bool is a macro rather than a typedef.

Although c.h spends some effort to support configurations where bool is
a preexisting macro, help_config.c has existed this way since
2003 (b700a6), and there have not been any reports of
problems. Backpatch anyway since this is as riskless as it gets.

Discussion: 20150812084351.GD8470@awork2.anarazel.de
Backpatch: 9.0-master
2015-08-15 16:32:38 +02:00
Noah Misch
ec79978dd0 Encoding PG_UHC is code page 949.
This fixes presentation of non-ASCII messages to the Windows event log
and console in rare cases involving Korean locale.  Processes like the
postmaster and checkpointer, but not processes attached to databases,
were affected.  Back-patch to 9.4, where MessageEncoding was introduced.
The problem exists in all supported versions, but this change has no
effect in the absence of the code recognizing PG_UHC MessageEncoding.

Noticed while investigating bug #13427 from Dmitri Bourlatchkov.
2015-08-14 20:23:13 -04:00
Noah Misch
43adc7a714 Restore old pgwin32_message_to_UTF16() behavior outside transactions.
Commit 49c817eab7 replaced with a hard
error the dubious pg_do_encoding_conversion() behavior when outside a
transaction.  Reintroduce the historic soft failure locally within
pgwin32_message_to_UTF16().  This fixes errors when writing messages in
less-common encodings to the Windows event log or console.  Back-patch
to 9.4, where the aforementioned commit first appeared.

Per bug #13427 from Dmitri Bourlatchkov.
2015-08-14 20:23:09 -04:00
Simon Riggs
47167b7907 Reduce lock levels for ALTER TABLE SET autovacuum storage options
Reduce lock levels down to ShareUpdateExclusiveLock for all autovacuum-related
relation options when setting them using ALTER TABLE.

Add infrastructure to allow varying lock levels for relation options in later
patches. Setting multiple options together uses the highest lock level required
for any option. Works for both main and toast tables.

Fabrízio Mello, reviewed by Michael Paquier, mild edit and additional regression
tests from myself
2015-08-14 14:19:28 +01:00
Alvaro Herrera
fcbf455842 Fix unitialized variables
As complained by clang, reported by Andres Freund.  Brown paper bag bug
in ccc4c07499.

Add some comments, too.

Backpatch to 9.5, like that one.
2015-08-13 00:12:07 -03:00
Tom Lane
cfe30a72fa Undo mistaken tightening in join_is_legal().
One of the changes I made in commit 8703059c6b turns out not to have
been such a good idea: we still need the exception in join_is_legal() that
allows a join if both inputs already overlap the RHS of the special join
we're checking.  Otherwise we can miss valid plans, and might indeed fail
to find a plan at all, as in recent report from Andreas Seltenreich.

That code was added way back in commit c17117649b, but I failed to
include a regression test case then; my bad.  Put it back with a better
explanation, and a test this time.  The logic does end up a bit different
than before though: I now believe it's appropriate to make this check
first, thereby allowing such a case whether or not we'd consider the
previous SJ(s) to commute with this one.  (Presumably, we already decided
they did; but it was confusing to have this consideration in the middle
of the code that was handling the other case.)

Back-patch to all active branches, like the previous patch.
2015-08-12 21:19:03 -04:00
Alvaro Herrera
ccc4c07499 Close some holes in BRIN page assignment
In some corner cases, it is possible for the BRIN index relation to be
extended by brin_getinsertbuffer but the new page not be used
immediately for anything by its callers; when this happens, the page is
initialized and the FSM is updated (by brin_getinsertbuffer) with the
info about that page, but these actions are not WAL-logged.  A later
index insert/update can use the page, but since the page is already
initialized, the initialization itself is not WAL-logged then either.
Replay of this sequence of events causes recovery to fail altogether.

There is a related corner case within brin_getinsertbuffer itself, in
which we extend the relation to put a new index tuple there, but later
find out that we cannot do so, and do not return the buffer; the page
obtained from extension is not even initialized.  The resulting page is
lost forever.

To fix, shuffle the code so that initialization is not the
responsibility of brin_getinsertbuffer anymore, in normal cases;
instead, the initialization is done by its callers (brin_doinsert and
brin_doupdate) once they're certain that the page is going to be used.
When either those functions determine that the new page cannot be used,
before bailing out they initialize the page as an empty regular page,
enter it in FSM and WAL-log all this.  This way, the page is usable for
future index insertions, and WAL replay doesn't find trying to insert
tuples in pages whose initialization didn't make it to the WAL.  The
same strategy is used in brin_getinsertbuffer when it cannot return the
new page.

Additionally, add a new step to vacuuming so that all pages of the index
are scanned; whenever an uninitialized page is found, it is initialized
as empty and WAL-logged.  This closes the hole that the relation is
extended but the system crashes before anything is WAL-logged about it.
We also take this opportunity to update the FSM, in case it has gotten
out of date.

Thanks to Heikki Linnakangas for finding the problem that kicked some
additional analysis of BRIN page assignment code.

Backpatch to 9.5, where BRIN was introduced.

Discussion: https://www.postgresql.org/message-id/20150723204810.GY5596@postgresql.org
2015-08-12 14:20:38 -03:00
Andres Freund
a4b059fdde Remove duplicated assignment in pg_create_physical_replication_slot.
Reported-By: Gurjeet Singh
2015-08-12 17:35:50 +02:00
Andres Freund
d25fbf9f3e Fix two off-by-one errors in bufmgr.c.
In 4b4b680c I passed a buffer index number (starting from 0) instead of
a proper Buffer id (which start from 1 for shared buffers) in two
places.

This wasn't noticed so far as one of those locations isn't compiled at
all (PrintPinnedBufs) and the other one (InvalidBuffer) requires a
unlikely, but possible, set of circumstances to trigger a symptom.

To reduce the likelihood of such incidents a bit also convert existing
open coded mappings from buffer descriptors to buffer ids with
BufferDescriptorGetBuffer().

Author: Qingqing Zhou
Reported-By: Qingqing Zhou
Discussion: CAJjS0u2ai9ooUisKtkV8cuVUtEkMTsbK8c7juNAjv8K11zeCQg@mail.gmail.com
Backpatch: 9.5 where the private ref count infrastructure was introduced
2015-08-12 17:35:50 +02:00
Tom Lane
8a0258c318 Fix some possible low-memory failures in regexp compilation.
newnfa() failed to set the regex error state when malloc() fails.
Several places in regcomp.c failed to check for an error after calling
subre().  Each of these mistakes could lead to null-pointer-dereference
crashes in memory-starved backends.

Report and patch by Andreas Seltenreich.  Back-patch to all branches.
2015-08-12 00:48:11 -04:00
Tom Lane
68fa28f771 Postpone extParam/allParam calculations until the very end of planning.
Until now we computed these Param ID sets at the end of subquery_planner,
but that approach depends on subquery_planner returning a concrete Plan
tree.  We would like to switch over to returning one or more Paths for a
subquery, and in that representation the necessary details aren't fully
fleshed out (not to mention that we don't really want to do this work for
Paths that end up getting discarded).  Hence, refactor so that we can
compute the param ID sets at the end of planning, just before
set_plan_references is run.

The main change necessary to make this work is that we need to capture
the set of outer-level Param IDs available to the current query level
before exiting subquery_planner, since the outer levels' plan_params lists
are transient.  (That's not going to pose a problem for returning Paths,
since all the work involved in producing that data is part of expression
preprocessing, which will continue to happen before Paths are produced.)
On the plus side, this change gets rid of several existing kluges.

Eventually I'd like to get rid of SS_finalize_plan altogether in favor of
doing this work during set_plan_references, but that will require some
complex rejiggering because SS_finalize_plan needs to visit subplans and
initplans before the main plan.  So leave that idea for another day.
2015-08-11 23:48:37 -04:00
Alvaro Herrera
4901b2f495 Don't include rel.h when relcache.h is sufficient
Trivial change to reduce exposure of rel.h.
2015-08-11 13:03:14 -03:00
Andres Freund
6fcd88511f Allow pg_create_physical_replication_slot() to reserve WAL.
When creating a physical slot it's often useful to immediately reserve
the current WAL position instead of only doing after the first feedback
message arrives. That e.g. allows slots to guarantee that all the WAL
for a base backup will be available afterwards.

Logical slots already have to reserve WAL during creation, so generalize
that logic into being usable for both physical and logical slots.

Catversion bump because of the new parameter.

Author: Gurjeet Singh
Reviewed-By: Andres Freund
Discussion: CABwTF4Wh_dBCzTU=49pFXR6coR4NW1ynb+vBqT+Po=7fuq5iCw@mail.gmail.com
2015-08-11 12:34:31 +02:00
Andres Freund
093d0c83c1 Introduce macros determining if a replication slot is physical or logical.
These make the code a bit easier to read, and make it easier to add a
more explicit notion of a slot's type at some point in the future.

Author: Gurjeet Singh
Discussion: CABwTF4Wh_dBCzTU=49pFXR6coR4NW1ynb+vBqT+Po=7fuq5iCw@mail.gmail.com
2015-08-11 12:32:48 +02:00
Andres Freund
3b425b7c02 Minor cleanups in slot related code.
Fix a bunch of typos, and remove two superflous includes.

Author: Gurjeet Singh
Discussion: CABwTF4Wh_dBCzTU=49pFXR6coR4NW1ynb+vBqT+Po=7fuq5iCw@mail.gmail.com
Backpatch: 9.4
2015-08-11 12:32:48 +02:00
Tom Lane
4200a92862 Further mucking with PlaceHolderVar-related restrictions on join order.
Commit 85e5e222b1 turns out not to have taken
care of all cases of the partially-evaluatable-PlaceHolderVar problem found
by Andreas Seltenreich's fuzz testing.  I had set it up to check for risky
PHVs only in the event that we were making a star-schema-based exception to
the param_source_rels join ordering heuristic.  However, it turns out that
the problem can occur even in joins that satisfy the param_source_rels
heuristic, in which case allow_star_schema_join() isn't consulted.
Refactor so that we check for risky PHVs whenever the proposed join has
any remaining parameterization.

Back-patch to 9.2, like the previous patch (except for the regression test
case, which only works back to 9.3 because it uses LATERAL).

Note that this discovery implies that problems of this sort could've
occurred in 9.2 and up even before the star-schema patch; though I've not
tried to prove that experimentally.
2015-08-10 17:18:17 -04:00
Andres Freund
3f811c2d6f Add confirmed_flush column to pg_replication_slots.
There's no reason not to expose both restart_lsn and confirmed_flush
since they have rather distinct meanings. The former is the oldest WAL
still required and valid for both physical and logical slots, whereas
the latter is the location up to which a logical slot's consumer has
confirmed receiving data. Most of the time a slot will require older
WAL (i.e. restart_lsn) than the confirmed
position (i.e. confirmed_flush_lsn).

Author: Marko Tiikkaja, editorialized by me
Discussion: 559D110B.1020109@joh.to
2015-08-10 13:28:18 +02:00
Andres Freund
5c4b25acce Fix copy & paste mistake in pg_get_replication_slots().
XLogRecPtr was compared with InvalidTransactionId instead of
InvalidXLogRecPtr. As both are defined to the same value this doesn't
cause any actual problems, but it's still wrong.

Backpatch: 9.4-master, bug was introduced in 9.4
2015-08-10 13:28:18 +02:00
Tom Lane
1e3e1ae266 Remove gram.y's precedence declaration for OVERLAPS.
The allowed syntax for OVERLAPS, viz "row OVERLAPS row", is sufficiently
constrained that we don't actually need a precedence declaration for
OVERLAPS; indeed removing this declaration does not change the generated
gram.c file at all.  Let's remove it to avoid confusion about whether
OVERLAPS has precedence or not.  If we ever generalize what we allow for
OVERLAPS, we might need to put back a precedence declaration for it,
but we might want some other level than what it has today --- and leaving
the declaration there would just risk confusion about whether that would
be an incompatible change.

Likewise, remove OVERLAPS from the documentation's precedence table.

Per discussion with Noah Misch.  Back-patch to 9.5 where we hacked up some
nearby precedence decisions.
2015-08-09 19:01:04 -04:00
Tom Lane
89db83922a Further adjustments to PlaceHolderVar removal.
A new test case from Andreas Seltenreich showed that we were still a bit
confused about removing PlaceHolderVars during join removal.  Specifically,
remove_rel_from_query would remove a PHV that was used only underneath
the removable join, even if the place where it's used was the join partner
relation and not the join clause being deleted.  This would lead to a
"too late to create a new PlaceHolderInfo" error later on.  We can defend
against that by checking ph_eval_at to see if the PHV could possibly be
getting used at some partner rel.

Also improve some nearby LATERAL-related logic.  I decided that the check
on ph_lateral needed to take precedence over the check on ph_needed, in
case there's a lateral reference underneath the join being considered.
(That may be impossible, but I'm not convinced of it, and it's easy enough
to defend against the case.)  Also, I realized that remove_rel_from_query's
logic for updating LateralJoinInfos is dead code, because we don't build
those at all until after join removal.

Back-patch to 9.3.  Previous versions didn't have the LATERAL issues, of
course, and they also didn't attempt to remove PlaceHolderInfos during join
removal.  (I'm starting to wonder if changing that was really such a great
idea.)
2015-08-07 14:13:50 -04:00
Robert Haas
846f8c9483 Fix attach-related race condition in shm_mq_send_bytes.
Spotted by Antonin Houska.
2015-08-07 10:04:07 -04:00
Andres Freund
4eda0a6470 Don't include low level locking code from frontend code.
Some frontend code like e.g. pg_xlogdump or pg_resetxlog, has to use
backend headers. Unfortunately until now that code includes most of the
locking code. It's generally not nice to expose such low level details,
but de6fd1c898 made that a hard problem. We fall back to defining
'inline' away if the compiler doesn't support it - that can cause linker
errors like on buildfarm animal pademelon if a inline function
references backend only code.

To fix that problem separate definitions from lock.h that are required
from frontend code into lockdefs.h and use it in the relevant
places. I've only removed the minimal amount of necessary definitions
for now - it might turn out that we want more for other reasons.

To avoid such details being exposed again put some checks against being
included from frontend code into atomics.h, lock.h, lwlock.h and
s_lock.h. It's otherwise fairly easy to indirectly include these
headers.

Discussion: 20150806070902.GE12214@awork2.anarazel.de
2015-08-07 15:10:56 +02:00
Andres Freund
18e8613564 Address points made in post-commit review of replication origins.
Amit reviewed the replication origins patch and made some good
points. Address them. This fixes typos in error messages, docs and
comments and adds a missing error check (although in a
should-never-happen scenario).

Discussion: CAA4eK1JqUBVeWWKwUmBPryFaje4190ug0y-OAUHWQ6tD83V4xg@mail.gmail.com
Backpatch: 9.5, where replication origins were introduced.
2015-08-07 15:09:05 +02:00
Tom Lane
bab163e121 Fix old oversight in join removal logic.
Commit 9e7e29c75a introduced an Assert that
join removal didn't reduce the eval_at set of any PlaceHolderVar to empty.
At first glance it looks like join_is_removable ensures that's true --- but
actually, the loop in join_is_removable skips PlaceHolderVars that are not
referenced above the join due to be removed.  So, if we don't want any
empty eval_at sets, the right thing to do is to delete any now-unreferenced
PlaceHolderVars from the data structure entirely.

Per fuzz testing by Andreas Seltenreich.  Back-patch to 9.3 where the
aforesaid Assert was added.
2015-08-06 22:14:27 -04:00
Tom Lane
cde35cf4ae Fix eclass_useful_for_merging to give valid results for appendrel children.
Formerly, this function would always return "true" for an appendrel child
relation, because it would think that the appendrel parent was a potential
join target for the child.  In principle that should only lead to some
inefficiency in planning, but fuzz testing by Andreas Seltenreich disclosed
that it could lead to "could not find pathkey item to sort" planner errors
in odd corner cases.  Specifically, we would think that all columns of a
child table's multicolumn index were interesting pathkeys, causing us to
generate a MergeAppend path that sorts by all the columns.  However, if any
of those columns weren't actually used above the level of the appendrel,
they would not get added to that rel's targetlist, which would result in
being unable to resolve the MergeAppend's sort keys against its targetlist
during createplan.c.

Backpatch to 9.3.  In older versions, columns of an appendrel get added
to its targetlist even if they're not mentioned above the scan level,
so that the failure doesn't occur.  It might be worth back-patching this
fix to older versions anyway, but I'll refrain for the moment.
2015-08-06 20:14:53 -04:00
Tom Lane
8703059c6b Further fixes for degenerate outer join clauses.
Further testing revealed that commit f69b4b9495 was still a few
bricks shy of a load: minor tweaking of the previous test cases resulted
in the same wrong-outer-join-order problem coming back.  After study
I concluded that my previous changes in make_outerjoininfo() were just
accidentally masking the problem, and should be reverted in favor of
forcing syntactic join order whenever an upper outer join's predicate
doesn't mention a lower outer join's LHS.  This still allows the
chained-outer-joins style that is the normally optimizable case.

I also tightened things up some more in join_is_legal().  It seems to me
on review that what's really happening in the exception case where we
ignore a mismatched special join is that we're allowing the proposed join
to associate into the RHS of the outer join we're comparing it to.  As
such, we should *always* insist that the proposed join be a left join,
which eliminates a bunch of rather dubious argumentation.  The case where
we weren't enforcing that was the one that was already known buggy anyway
(it had a violatable Assert before the aforesaid commit) so it hardly
deserves a lot of deference.

Back-patch to all active branches, like the previous patch.  The added
regression test case failed in all branches back to 9.1, and I think it's
only an unrelated change in costing calculations that kept 9.0 from
choosing a broken plan.
2015-08-06 15:35:46 -04:00
Robert Haas
df0a67f754 Fix incorrect calculation in shm_mq_receive.
If some, but not all, of the length word has already been read, and the
next attempt to read sees exactly the number of bytes needed to complete
the length word, or fewer, then we'll incorrectly read less than all of
the available data.

Antonin Houska
2015-08-06 13:25:45 -04:00
Robert Haas
0e141c0fbb Reduce ProcArrayLock contention by removing backends in batches.
When a write transaction commits, it must clear its XID advertised via
the ProcArray, which requires that we hold ProcArrayLock in exclusive
mode in order to prevent concurrent processes running GetSnapshotData
from seeing inconsistent results.  When many processes try to commit
at once, ProcArrayLock must change hands repeatedly, with each
concurrent process trying to commit waking up to acquire the lock in
turn.  To make things more efficient, when more than one backend is
trying to commit a write transaction at the same time, have just one
of them acquire ProcArrayLock in exclusive mode and clear the XIDs of
all processes in the group.  Benchmarking reveals that this is much
more efficient at very high client counts.

Amit Kapila, heavily revised by me, with some review also from Pavan
Deolasee.
2015-08-06 12:02:12 -04:00
Noah Misch
b8fe12a836 Reconcile nodes/*funcs.c with recent work.
A few of the discrepancies had semantic significance, but I did not
track down the resulting user-visible bugs, if any.  Back-patch to 9.5,
where all but one discrepancy appeared.  The _equalCreateEventTrigStmt()
situation dates to 9.3 but does not affect semantics.

catversion bump due to readfuncs.c field order changes.
2015-08-05 20:44:27 -04:00
Alvaro Herrera
2834855cb9 Fix BRIN to use SnapshotAny during summarization
For correctness of summarization results, it is critical that the
snapshot used during the summarization scan is able to see all tuples
that are live to all transactions -- including tuples inserted or
deleted by in-progress transactions.  Otherwise, it would be possible
for a transaction to insert a tuple, then idle for a long time while a
concurrent transaction executes summarization of the range: this would
result in the inserted value not being considered in the summary.
Previously we were trying to use a MVCC snapshot in conjunction with
adding a "placeholder" tuple in the index: the snapshot would see all
committed tuples, and the placeholder tuple would catch insertions by
any new inserters.  The hole is that prior insertions by transactions
that are still in progress by the time the MVCC snapshot was taken were
ignored.

Kevin Grittner reported this as a bogus error message during vacuum with
default transaction isolation mode set to repeatable read (because the
error report mentioned a function name not being invoked during), but
the problem is larger than that.

To fix, tweak IndexBuildHeapRangeScan to have a new mode that behaves
the way we need using SnapshotAny visibility rules.  This change
simplifies the BRIN code a bit, mainly by removing large comments that
were mistaken.  Instead, rely on the SnapshotAny semantics to provide
what it needs.  (The business about a placeholder tuple needs to remain:
that covers the case that a transaction inserts a a tuple in a page that
summarization already scanned.)

Discussion: https://www.postgresql.org/message-id/20150731175700.GX2441@postgresql.org

In passing, remove a couple of unused declarations from brin.h and
reword a comment to be proper English.  This part submitted by Kevin
Grittner.

Backpatch to 9.5, where BRIN was introduced.
2015-08-05 16:20:50 -03:00
Tom Lane
6af9ee4c8c Make real sure we don't reassociate joins into or out of SEMI/ANTI joins.
Per the discussion in optimizer/README, it's unsafe to reassociate anything
into or out of the RHS of a SEMI or ANTI join.  An example from Piotr
Stefaniak showed that join_is_legal() wasn't sufficiently enforcing this
rule, so lock it down a little harder.

I couldn't find a reasonably simple example of the optimizer trying to
do this, so no new regression test.  (Piotr's example involved the random
search in GEQO accidentally trying an invalid case and triggering a sanity
check way downstream in clause selectivity estimation, which did not seem
like a sequence of events that would be useful to memorialize in a
regression test as-is.)

Back-patch to all active branches.
2015-08-05 14:39:29 -04:00
Andres Freund
de6fd1c898 Rely on inline functions even if that causes warnings in older compilers.
So far we have worked around the fact that some very old compilers do
not support 'inline' functions by only using inline functions
conditionally (or not at all). Since such compilers are very rare by
now, we have decided to rely on inline functions from 9.6 onwards.

To avoid breaking these old compilers inline is defined away when not
supported. That'll cause "function x defined but not used" type of
warnings, but since nobody develops on such compilers anymore that's
ok.

This change in policy will allow us to more easily employ inline
functions.

I chose to remove code previously conditional on PG_USE_INLINE as it
seemed confusing to have code dependent on a define that's always
defined.

Blacklisting of compilers, like in c53f73879f, now has to be done
differently. A platform template can define PG_FORCE_DISABLE_INLINE to
force inline to be defined empty.

Discussion: 20150701161447.GB30708@awork2.anarazel.de
2015-08-05 18:19:52 +02:00
Andres Freund
a855118be3 Fix debug message output when connecting to a logical slot.
Previously the message erroneously printed the same LSN twice as the
assignment to the start_lsn variable was before the message. Correct
that.

Reported-By: Marko Tiikkaja
Author: Marko Tiikkaja
Backpatch: 9.5, where logical decoding was introduced
2015-08-05 13:26:01 +02:00
Tom Lane
8ea3e7a75c Fix bogus "out of memory" reports in tuplestore.c.
The tuplesort/tuplestore memory management logic assumed that the chunk
allocation overhead for its memtuples array could not increase when
increasing the array size.  This is and always was true for tuplesort,
but we (I, I think) blindly copied that logic into tuplestore.c without
noticing that the assumption failed to hold for the much smaller array
elements used by tuplestore.  Given rather small work_mem, this could
result in an improper complaint about "unexpected out-of-memory situation",
as reported by Brent DeSpain in bug #13530.

The easiest way to fix this is just to increase tuplestore's initial
array size so that the assumption holds.  Rather than relying on magic
constants, though, let's export a #define from aset.c that represents
the safe allocation threshold, and make tuplestore's calculation depend
on that.

Do the same in tuplesort.c to keep the logic looking parallel, even though
tuplesort.c isn't actually at risk at present.  This will keep us from
breaking it if we ever muck with the allocation parameters in aset.c.

Back-patch to all supported versions.  The error message doesn't occur
pre-9.3, not so much because the problem can't happen as because the
pre-9.3 tuplestore code neglected to check for it.  (The chance of
trouble is a great deal larger as of 9.3, though, due to changes in the
array-size-increasing strategy.)  However, allowing LACKMEM() to become
true unexpectedly could still result in less-than-desirable behavior,
so let's patch it all the way back.
2015-08-04 18:18:46 -04:00
Tom Lane
85e5e222b1 Fix a PlaceHolderVar-related oversight in star-schema planning patch.
In commit b514a7460d, I changed the planner
so that it would allow nestloop paths to remain partially parameterized,
ie the inner relation might need parameters from both the current outer
relation and some upper-level outer relation.  That's fine so long as we're
talking about distinct parameters; but the patch also allowed creation of
nestloop paths for cases where the inner relation's parameter was a
PlaceHolderVar whose eval_at set included the current outer relation and
some upper-level one.  That does *not* work.

In principle we could allow such a PlaceHolderVar to be evaluated at the
lower join node using values passed down from the upper relation along with
values from the join's own outer relation.  However, nodeNestloop.c only
supports simple Vars not arbitrary expressions as nestloop parameters.
createplan.c is also a few bricks shy of being able to handle such cases;
it misplaces the PlaceHolderVar parameters in the plan tree, which is why
the visible symptoms of this bug are "plan should not reference subplan's
variable" and "failed to assign all NestLoopParams to plan nodes" planner
errors.

Adding the necessary complexity to make this work doesn't seem like it
would be repaid in significantly better plans, because in cases where such
a PHV exists, there is probably a corresponding join order constraint that
would allow a good plan to be found without using the star-schema exception.
Furthermore, adding complexity to nodeNestloop.c would create a run-time
penalty even for plans where this whole consideration is irrelevant.
So let's just reject such paths instead.

Per fuzz testing by Andreas Seltenreich; the added regression test is based
on his example query.  Back-patch to 9.2, like the previous patch.
2015-08-04 14:55:50 -04:00
Robert Haas
369342cf70 Cap wal_buffers to avoid a server crash when it's set very large.
It must be possible to multiply wal_buffers by XLOG_BLCKSZ without
overflowing int, or calculations in StartupXLOG will go badly wrong
and crash the server.  Avoid that by imposing a maximum value on
wal_buffers.  This will be just under 2GB, assuming the usual value
for XLOG_BLCKSZ.

Josh Berkus, per an analysis by Andrew Gierth.
2015-08-04 12:58:54 -04:00
Robert Haas
a6a2357820 Update comment to match behavior of latest code.
Peter Geoghegan
2015-08-04 11:45:29 -04:00
Heikki Linnakangas
804163bc25 Share transition state between different aggregates when possible.
If there are two different aggregates in the query with same inputs, and
the aggregates have the same initial condition and transition function,
only calculate the state value once, and only call the final functions
separately. For example, AVG(x) and SUM(x) aggregates have the same
transition function, which accumulates the sum and number of input tuples.
For a query like "SELECT AVG(x), SUM(x) FROM x", we can therefore
accumulate the state function only once, which gives a nice speedup.

David Rowley, reviewed and edited by me.
2015-08-04 17:53:10 +03:00
Stephen Frost
dee0200f02 RLS: Keep deny policy when only restrictive exist
Only remove the default deny policy when a permissive policy exists
(either from the hook or defined by the user).  If only restrictive
policies exist then no rows will be visible, as restrictive policies
shouldn't make rows visible.  To address this requirement, a single
"USING (true)" permissive policy can be created.

Update the test_rls_hooks regression tests to create the necessary
"USING (true)" permissive policy.

Back-patch to 9.5 where RLS was added.

Per discussion with Dean.
2015-08-03 15:32:49 -04:00
Fujii Masao
dd85acf0c4 Make recovery rename tablespace_map to *.old if backup_label is not present.
If tablespace_map file is present without backup_label file, there is
no use of such file.  There is no harm in retaining it, but it is better
to get rid of the map file so that we don't have any redundant file
in data directory and it will avoid any sort of confusion. It seems
prudent though to just rename the file out of the way rather than
delete it completely, also we ignore any error that occurs in rename
operation as even if map file is present without backup_label file,
it is harmless.

Back-patch to 9.5 where tablespace_map file was introduced.

Amit Kapila, reviewed by Robert Haas, Alvaro Herrera and me.
2015-08-03 23:04:41 +09:00
Tom Lane
09cecdf285 Fix a number of places that produced XX000 errors in the regression tests.
It's against project policy to use elog() for user-facing errors, or to
omit an errcode() selection for errors that aren't supposed to be "can't
happen" cases.  Fix all the violations of this policy that result in
ERRCODE_INTERNAL_ERROR log entries during the standard regression tests,
as errors that can reliably be triggered from SQL surely should be
considered user-facing.

I also looked through all the files touched by this commit and fixed
other nearby problems of the same ilk.  I do not claim to have fixed
all violations of the policy, just the ones in these files.

In a few places I also changed existing ERRCODE choices that didn't
seem particularly appropriate; mainly replacing ERRCODE_SYNTAX_ERROR
by something more specific.

Back-patch to 9.5, but no further; changing ERRCODE assignments in
stable branches doesn't seem like a good idea.
2015-08-02 23:49:19 -04:00
Tom Lane
13bba02271 Avoid calling memcpy() with a NULL source pointer and count == 0.
As in commit 0a52d378b0, avoid doing something that has undefined
results according to the C standard, even though in practice there does
not seem to be any problem with it.

This fixes two places in numeric.c that demonstrably could call memcpy()
with such arguments.  I looked through that file and didn't see any other
places with similar hazards; this is not to claim that there are not such
places in other files.

Per report from Piotr Stefaniak.  Back-patch to 9.5 which is where the
previous commit was added.  We're more or less setting a precedent that
we will not worry about this type of issue in pre-9.5 branches unless
someone demonstrates a problem in the field.
2015-08-02 15:48:31 -04:00
Tom Lane
d73d14c271 Fix incorrect order of lock file removal and failure to close() sockets.
Commit c9b0cbe98b accidentally broke the
order of operations during postmaster shutdown: it resulted in removing
the per-socket lockfiles after, not before, postmaster.pid.  This creates
a race-condition hazard for a new postmaster that's started immediately
after observing that postmaster.pid has disappeared; if it sees the
socket lockfile still present, it will quite properly refuse to start.
This error appears to be the explanation for at least some of the
intermittent buildfarm failures we've seen in the pg_upgrade test.

Another problem, which has been there all along, is that the postmaster
has never bothered to close() its listen sockets, but has just allowed them
to close at process death.  This creates a different race condition for an
incoming postmaster: it might be unable to bind to the desired listen
address because the old postmaster is still incumbent.  This might explain
some odd failures we've seen in the past, too.  (Note: this is not related
to the fact that individual backends don't close their client communication
sockets.  That behavior is intentional and is not changed by this patch.)

Fix by adding an on_proc_exit function that closes the postmaster's ports
explicitly, and (in 9.3 and up) reshuffling the responsibility for where
to unlink the Unix socket files.  Lock file unlinking can stay where it
is, but teach it to unlink the lock files in reverse order of creation.
2015-08-02 14:55:03 -04:00
Heikki Linnakangas
358cde320b Fix race condition that lead to WALInsertLock deadlock with commit_delay.
If a call to WaitForXLogInsertionsToFinish() returned a value in the middle
of a page, and another backend then started to insert a record to the same
page, and then you called WaitXLogInsertionsToFinish() again, the second
call might return a smaller value than the first call. The problem was in
GetXLogBuffer(), which always updated the insertingAt value to the
beginning of the requested page, not the actual requested location. Because
of that, the second call might return a xlog pointer to the beginning of
the page, while the first one returned a later position on the same page.
XLogFlush() performs two calls to WaitXLogInsertionsToFinish() in
succession, and holds WALWriteLock on the second call, which can deadlock
if the second call to WaitXLogInsertionsToFinish() blocks.

Reported by Spiros Ioannou. Backpatch to 9.4, where the more scalable
WALInsertLock mechanism, and this bug, was introduced.
2015-08-02 20:08:10 +03:00
Andres Freund
a4b09af3e9 Micro optimize LWLockAttemptLock() a bit.
LWLockAttemptLock pointlessly read the lock's state in every loop
iteration, even though pg_atomic_compare_exchange_u32() returns the old
value. Instead do that only once before the loop iteration.

Additionally there's no need to have the expected_state variable,
old_state mostly had the same value anyway.

Noticed-By: Heikki Linnakangas
Backpatch: 9.5, no reason to let the branches diverge at this point
2015-08-02 18:41:23 +02:00
Andres Freund
7039760114 Fix issues around the "variable" support in the lwlock infrastructure.
The lwlock scalability work introduced two race conditions into the
lwlock variable support provided for xlog.c. First, and harmlessly on
most platforms, it set/read the variable without the spinlock in some
places. Secondly, due to the removal of the spinlock, it was possible
that a backend missed changes to the variable's state if it changed in
the wrong moment because checking the lock's state, the variable's state
and the queuing are not protected by a single spinlock acquisition
anymore.

To fix first move resetting the variable's from LWLockAcquireWithVar to
WALInsertLockRelease, via a new function LWLockReleaseClearVar. That
prevents issues around waiting for a variable's value to change when a
new locker has acquired the lock, but not yet set the value. Secondly
re-check that the variable hasn't changed after enqueing, that prevents
the issue that the lock has been released and already re-acquired by the
time the woken up backend checks for the lock's state.

Reported-By: Jeff Janes
Analyzed-By: Heikki Linnakangas
Reviewed-By: Heikki Linnakangas
Discussion: 5592DB35.2060401@iki.fi
Backpatch: 9.5, where the lwlock scalability went in
2015-08-02 18:41:23 +02:00
Tom Lane
f69b4b9495 Fix some planner issues with degenerate outer join clauses.
An outer join clause that didn't actually reference the RHS (perhaps only
after constant-folding) could confuse the join order enforcement logic,
leading to wrong query results.  Also, nested occurrences of such things
could trigger an Assertion that on reflection seems incorrect.

Per fuzz testing by Andreas Seltenreich.  The practical use of such cases
seems thin enough that it's not too surprising we've not heard field
reports about it.

This has been broken for a long time, so back-patch to all active branches.
2015-08-01 20:57:41 -04:00
Tom Lane
dea1491ffb Teach predtest.c that "foo" implies "foo IS NOT NULL".
Per complaint from Peter Holzer.  It's useful to cover this special case,
since for a boolean variable "foo", earlier parts of the planner will have
reduced variants like "foo = true" to just "foo", and thus we may fail
to recognize the applicability of a partial index with predicate
"foo IS NOT NULL".

Back-patch to 9.5, but not further; given the lack of previous complaints
this doesn't seem like behavior to change in stable branches.
2015-08-01 14:31:46 -04:00
Tom Lane
a6492ff897 Fix an oversight in checking whether a join with LATERAL refs is legal.
In many cases, we can implement a semijoin as a plain innerjoin by first
passing the righthand-side relation through a unique-ification step.
However, one of the cases where this does NOT work is where the RHS has
a LATERAL reference to the LHS; that makes the RHS dependent on the LHS
so that unique-ification is meaningless.  joinpath.c understood this,
and so would not generate any join paths of this kind ... but join_is_legal
neglected to check for the case, so it would think that we could do it.
The upshot would be a "could not devise a query plan for the given query"
failure once we had failed to generate any join paths at all for the bogus
join pair.

Back-patch to 9.3 where LATERAL was added.
2015-07-31 19:26:33 -04:00
Alvaro Herrera
c81276241b Fix broken assertion in BRIN code
The code was assuming that any NULL value in scan keys was due to IS
NULL or IS NOT NULL, but it turns out to be possible to get them with
other operators too, if they are used in contrived-enough ways.  Easiest
way out of the problem seems to check explicitely for the IS NOT NULL
flag, instead of assuming it must be set if the IS NULL flag is not set,
when a null scan key is found; if neither flag is set, follow the lead
of other index AMs and assume that all indexable operators must be
strict, and thus the query is never satisfiable.

Also, add a comment to try and lure some future hacker into improving
analysis of scan keys in brin.

Per report from Andreas Seltenreich; diagnosis by Tom Lane.
Backpatch to 9.5.

Discussion: http://www.postgresql.org/message-id/20646.1437919632@sss.pgh.pa.us
2015-07-30 15:07:19 -03:00
Joe Conway
1e15b21229 Use appropriate command type when retrieving relation's policies.
When retrieving policies, if not working on the root target relation,
we actually want the relation's SELECT policies, regardless of
the top level query command type. For example in UPDATE t1...FROM t2
we need to apply t1's UPDATE policies and t2's SELECT policies.
Previously top level query command type was applied to all relations,
which was wrong. Add some regression coverage to ensure we don't
violate this principle in the future.

Report and patch by Dean Rasheed. Cherry picked from larger refactoring
patch and tweaked by me. Back-patched to 9.5 where RLS was introduced.
2015-07-30 09:38:15 -07:00
Tom Lane
8693ebe37d Avoid some zero-divide hazards in the planner.
Although I think on all modern machines floating division by zero
results in Infinity not SIGFPE, we still don't want infinities
running around in the planner's costing estimates; too much risk
of that leading to insane behavior.

grouping_planner() failed to consider the possibility that final_rel
might be known dummy and hence have zero rowcount.  (I wonder if it
would be better to set a rows estimate of 1 for dummy relations?
But at least in the back branches, changing this convention seems
like a bad idea, so I'll leave that for another day.)

Make certain that get_variable_numdistinct() produces a nonzero result.
The case that can be shown to be broken is with stadistinct < 0.0 and
small ntuples; we did not prevent the result from rounding to zero.
For good luck I applied clamp_row_est() to all the nonconstant return
values.

In ExecChooseHashTableSize(), Assert that we compute positive nbuckets
and nbatch.  I know of no reason to think this isn't the case, but it
seems like a good safety check.

Per reports from Piotr Stefaniak.  Back-patch to all active branches.
2015-07-30 12:11:23 -04:00
Andrew Dunstan
2cd40adb85 Add IF NOT EXISTS processing to ALTER TABLE ADD COLUMN
Fabrízio de Royes Mello, reviewed by Payal Singh, Alvaro Herrera and
Michael Paquier.
2015-07-29 21:30:00 -04:00
Joe Conway
632cd9f892 Create new ParseExprKind for use by policy expressions.
Policy USING and WITH CHECK expressions were using EXPR_KIND_WHERE for
parse analysis, which results in inappropriate ERROR messages when
the expression contains unsupported constructs such as aggregates.
Create a new ParseExprKind called EXPR_KIND_POLICY and tailor the
related messages to fit.

Reported by Noah Misch. Reviewed by Dean Rasheed, Alvaro Herrera,
and Robert Haas. Back-patch to 9.5 where RLS was introduced.
2015-07-29 15:40:24 -07:00
Robert Haas
f04ce31475 Fix incorrect comment.
Amit Langote
2015-07-29 16:47:12 -04:00
Joe Conway
efe72a82aa Add missing post create and alter hooks to policy objects.
AlterPolicy() and CreatePolicy() lacked their respective hook invocations.
Noted by Noah Misch, review by Dean Rasheed. Back-patch to 9.5 where
RLS was introduced.
2015-07-29 09:47:49 -07:00
Andres Freund
3bc9356ddd Remove outdated comment in LWLockDequeueSelf's header.
Noticed-By: Robert Haas
Backpatch: 9.5, where the function was added
2015-07-29 10:13:10 +02:00
Heikki Linnakangas
a309ebd6b9 Fix typo in comment.
Amit Langote
2015-07-29 10:55:43 +03:00
Tom Lane
2c698f438a Suppress "variable may be used uninitialized" warning.
Also re-pgindent, just because I'm a neatnik.
2015-07-28 19:55:59 -04:00
Joe Conway
d824e2800f Disallow converting a table to a view if row security is present.
When DefineQueryRewrite() is about to convert a table to a view, it checks
the table for features unavailable to views.  For example, it rejects tables
having triggers.  It omits to reject tables having relrowsecurity or a
pg_policy record. Fix that. To faciliate the repair, invent
relation_has_policies() which indicates the presence of policies on a
relation even when row security is disabled for that relation.

Reported by Noah Misch. Patch by me, review by Stephen Frost. Back-patch
to 9.5 where RLS was introduced.
2015-07-28 16:24:01 -07:00
Joe Conway
f781a0f1d8 Create a pg_shdepend entry for each role in TO clause of policies.
CreatePolicy() and AlterPolicy() omit to create a pg_shdepend entry for
each role in the TO clause. Fix this by creating a new shared dependency
type called SHARED_DEPENDENCY_POLICY and assigning it to each role.

Reported by Noah Misch. Patch by me, reviewed by Alvaro Herrera.
Back-patch to 9.5 where RLS was introduced.
2015-07-28 16:01:53 -07:00
Andrew Dunstan
6d10f4e9d7 Only adjust negative indexes in json_get up to the length of the path.
The previous code resulted in memory access beyond the path bounds. The
cure is to move it into a code branch that checks the value of lex_level
is within the correct bounds.

Bug reported and diagnosed by Piotr Stefaniak.
2015-07-28 17:54:13 -04:00
Tom Lane
d8f15c95be Reduce chatter from signaling of autovacuum workers.
Don't print a WARNING if we get ESRCH from a kill() that's attempting
to cancel an autovacuum worker.  It's possible (and has been seen in the
buildfarm) that the worker is already gone by the time we are able to
execute the kill, in which case the failure is harmless.  About the only
plausible reason for reporting such cases would be to help debug corrupted
lock table contents, but this is hardly likely to be the most important
symptom if that happens.  Moreover issuing a WARNING might scare users
more than is warranted.

Also, since sending a signal to an autovacuum worker is now entirely a
routine thing, and the worker will log the query cancel on its end anyway,
reduce the message saying we're doing that from LOG to DEBUG1 level.

Very minor cosmetic cleanup as well.

Since the main practical reason for doing this is to avoid unnecessary
buildfarm failures, back-patch to all active branches.
2015-07-28 17:34:23 -04:00
Joe Conway
7b4bfc87d5 Plug RLS related information leak in pg_stats view.
The pg_stats view is supposed to be restricted to only show rows
about tables the user can read. However, it sometimes can leak
information which could not otherwise be seen when row level security
is enabled. Fix that by not showing pg_stats rows to users that would
be subject to RLS on the table the row is related to. This is done
by creating/using the newly introduced SQL visible function,
row_security_active().

Along the way, clean up three call sites of check_enable_rls(). The second
argument of that function should only be specified as other than
InvalidOid when we are checking as a different user than the current one,
as in when querying through a view. These sites were passing GetUserId()
instead of InvalidOid, which can cause the function to return incorrect
results if the current user has the BYPASSRLS privilege and row_security
has been set to OFF.

Additionally fix a bug causing RI Trigger error messages to unintentionally
leak information when RLS is enabled, and other minor cleanup and
improvements. Also add WITH (security_barrier) to the definition of pg_stats.

Bumped CATVERSION due to new SQL functions and pg_stats view definition.

Back-patch to 9.5 where RLS was introduced. Reported by Yaroslav.
Patch by Joe Conway and Dean Rasheed with review and input by
Michael Paquier and Stephen Frost.
2015-07-28 13:21:22 -07:00
Andres Freund
426746b930 Remove ssl renegotiation support.
While postgres' use of SSL renegotiation is a good idea in theory, it
turned out to not work well in practice. The specification and openssl's
implementation of it have lead to several security issues. Postgres' use
of renegotiation also had its share of bugs.

Additionally OpenSSL has a bunch of bugs around renegotiation, reported
and open for years, that regularly lead to connections breaking with
obscure error messages. We tried increasingly complex workarounds to get
around these bugs, but we didn't find anything complete.

Since these connection breakages often lead to hard to debug problems,
e.g. spuriously failing base backups and significant latency spikes when
synchronous replication is used, we have decided to change the default
setting for ssl renegotiation to 0 (disabled) in the released
backbranches and remove it entirely in 9.5 and master.

Author: Andres Freund
Discussion: 20150624144148.GQ4797@alap3.anarazel.de
Backpatch: 9.5 and master, 9.0-9.4 get a different patch
2015-07-28 22:06:31 +02:00
Robert Haas
6f2871f12e Centralize decision-making about where to get a backend's PGPROC.
This code was originally written as part of parallel query effort, but
it seems to have independent value, because if we make one decision
about where to get a PGPROC when we allocate and then put it back on a
different list at backend-exit time, bad things happen.  This isn't
just a theoretical risk; we fixed an actual problem of this type in
commit e280c630a8.
2015-07-28 14:51:57 -04:00
Tom Lane
95f4e59c32 Remove an unsafe Assert, and explain join_clause_is_movable_into() better.
join_clause_is_movable_into() is approximate, in the sense that it might
sometimes return "false" when actually it would be valid to push the given
join clause down to the specified level.  This is okay ... but there was
an Assert in get_joinrel_parampathinfo() that's only safe if the answers
are always exact.  Comment out the Assert, and add a bunch of commentary
to clarify what's going on.

Per fuzz testing by Andreas Seltenreich.  The added regression test is
a pretty silly query, but it's based on his crasher example.

Back-patch to 9.2 where the faulty logic was introduced.
2015-07-28 13:20:39 -04:00
Heikki Linnakangas
5e65f45c6e Another attempt at fixing memory leak in xlogreader.
max_block_id is also reset between reading records.

Michael Paquier
2015-07-28 09:09:36 +03:00
Stephen Frost
3d5cb31c9a Improve RLS handling in copy.c
To avoid a race condition where the relation being COPY'd could be
changed into a view or otherwise modified, keep the original lock
on the relation.  Further, fully qualify the relation when building
the query up.

Also remove the poorly thought-out Assert() and check the entire
relationOids list as, post-RLS, there can certainly be multiple
relations involved and the planner does not guarantee their ordering.

Per discussion with Noah and Andres.

Back-patch to 9.5 where RLS was introduced.
2015-07-27 16:48:26 -04:00
Tom Lane
4c8f8ffaca Further code review for pg_stat_ssl patch.
Fix additional bogosity in commit 9029f4b374.  Include the
BackendSslStatusBuffer in the BackendStatusShmemSize calculation,
avoid ugly and error-prone casts to char* and back, put related
code stanzas into a consistent order (and fix a couple of previous
instances of that sin).  All cosmetic except for the size oversight.
2015-07-27 16:29:14 -04:00
Tom Lane
7d791ed49b Fix pointer-arithmetic thinko in pg_stat_ssl patch.
Nasty memory-stomp bug in commit 9029f4b374.  It's not apparent how
this survived even cursory testing :-(.  Per report from Peter Holzer.
2015-07-27 15:58:46 -04:00
Heikki Linnakangas
820d1ced1b Don't assume that PageIsEmpty() returns true on an all-zeros page.
It does currently, and I don't see us changing that any time soon, but we
don't make that assumption anywhere else.

Per Tom Lane's suggestion. Backpatch to 9.2, like the previous patch that
added this assumption.
2015-07-27 18:54:09 +03:00
Heikki Linnakangas
61a65c53bd Fix memory leak in xlogreader facility.
XLogReaderFree failed to free the per-block data buffers, when they
happened to not be used by the latest read WAL record.

Michael Paquier. Backpatch to 9.5, where the per-block buffers were added.
2015-07-27 18:29:31 +03:00
Heikki Linnakangas
334445179c Reuse all-zero pages in GIN.
In GIN, an all-zeros page would be leaked forever, and never reused. Just
add them to the FSM in vacuum, and they will be reinitialized when grabbed
from the FSM. On master and 9.5, attempting to access the page's opaque
struct also caused an assertion failure, although that was otherwise
harmless.

Reported by Jeff Janes. Backpatch to all supported versions.
2015-07-27 12:30:26 +03:00
Heikki Linnakangas
023430abf7 Fix handling of all-zero pages in SP-GiST vacuum.
SP-GiST initialized an all-zeros page at vacuum, but that was not
WAL-logged, which is not safe. You might get a torn page write, when it gets
flushed to disk, and end-up with a half-initialized index page. To fix,
leave it in the all-zeros state, and add it to the FSM. It will be
initialized when reused. Also don't set the page-deleted flag when recycling
an empty page. That was also not WAL-logged, and a torn write of that would
cause the page to have an invalid checksum.

Backpatch to 9.2, where SP-GiST indexes were added.
2015-07-27 12:28:21 +03:00
Heikki Linnakangas
65c384c5ab Avoid calling PageGetSpecialPointer() on an all-zeros page.
That was otherwise harmless, but tripped the new assertion in
PageGetSpecialPointer().

Reported by Amit Langote. Backpatch to 9.5, where the assertion was added.
2015-07-27 12:24:27 +03:00
Heikki Linnakangas
e3a9a194b7 Remove false comment about speculative insertion.
There is no full discussion of speculative insertions in the executor
README. There is a high-level explanation in execIndexing.c, but it doesn't
seem necessary to refer it from here.

Peter Geoghegan
2015-07-27 11:46:11 +03:00
Tom Lane
fca8e59c1c Fix oversight in flattening of subqueries with empty FROM.
I missed a restriction that commit f4abd0241d
should have enforced: we can't pull up an empty-FROM subquery if it's under
an outer join, because then we'd need to wrap its output columns in
PlaceHolderVars.  As the code currently stands, the PHVs end up with empty
relid sets, which doesn't work (and is correctly caught by an Assert).

It's possible that this could be fixed by assigning the PHVs the relid
sets of the parent FromExpr/JoinExpr, but getting that to work is more
complication than I care to add right now; indeed it's likely that
we'll never bother, since pulling up empty-FROM subqueries is a rather
marginal optimization anyway.

Per report from Andreas Seltenreich.  Back-patch to 9.5 where the faulty
code was added.
2015-07-26 17:44:27 -04:00
Tom Lane
358eaa01bf Make entirely-dummy appendrels get marked as such in set_append_rel_size.
The planner generally expects that the estimated rowcount of any relation
is at least one row, *unless* it has been proven empty by constraint
exclusion or similar mechanisms, which is marked by installing a dummy path
as the rel's cheapest path (cf. IS_DUMMY_REL).  When I split up
allpaths.c's processing of base rels into separate set_base_rel_sizes and
set_base_rel_pathlists steps, the intention was that dummy rels would get
marked as such during the "set size" step; this is what justifies an Assert
in indxpath.c's get_loop_count that other relations should either be dummy
or have positive rowcount.  Unfortunately I didn't get that quite right
for append relations: if all the child rels have been proven empty then
set_append_rel_size would come up with a rowcount of zero, which is
correct, but it didn't then do set_dummy_rel_pathlist.  (We would have
ended up with the right state after set_append_rel_pathlist, but that's
too late, if we generate indexpaths for some other rel first.)

In addition to fixing the actual bug, I installed an Assert enforcing this
convention in set_rel_size; that then allows simplification of a couple
of now-redundant tests for zero rowcount in set_append_rel_size.

Also, to cover the possibility that third-party FDWs have been careless
about not returning a zero rowcount estimate, apply clamp_row_est to
whatever an FDW comes up with as the rows estimate.

Per report from Andreas Seltenreich.  Back-patch to 9.2.  Earlier branches
did not have the separation between set_base_rel_sizes and
set_base_rel_pathlists steps, so there was no intermediate state where an
appendrel would have had inconsistent rowcount and pathlist.  It's possible
that adding the Assert to set_rel_size would be a good idea in older
branches too; but since they're not under development any more, it's likely
not worth the trouble.
2015-07-26 16:19:08 -04:00
Andres Freund
159cff58cf Check the relevant index element in ON CONFLICT unique index inference.
ON CONFLICT unique index inference had a thinko that could affect cases
where the user-supplied inference clause required that an attribute
match a particular (user specified) collation and/or opclass.

infer_collation_opclass_match() has to check for opclass and/or
collation matches and that the attribute is in the list of attributes or
expressions known to be in the definition of the index under
consideration. The bug was that these two conditions weren't necessarily
evaluated for the same index attribute.

Author: Peter Geoghegan
Discussion: CAM3SWZR4uug=WvmGk7UgsqHn2MkEzy9YU-+8jKGO4JPhesyeWg@mail.gmail.com
Backpatch: 9.5, where ON CONFLICT was introduced
2015-07-26 18:20:41 +02:00
Andres Freund
faab14ecb8 Fix flattening of nested grouping sets.
Previously nested grouping set specifications accidentally weren't
flattened, but instead contained the nested specification as a element
in the outer list.

Fix this by, as actually documented in comments, concatenating the
nested set specification into the outer one. Also add tests to prevent
this from breaking again.

Author: Andrew Gierth, with tests from Jeevan Chalke
Reported-By: Jeevan Chalke
Discussion: CAM2+6=V5YvuxB+EyN4iH=GbD-XTA435TCNvnDFSD--YvXs+pww@mail.gmail.com
Backpatch: 9.5, where grouping sets were introduced
2015-07-26 16:50:29 +02:00
Andres Freund
61444bfb80 Allow to push down clauses from HAVING to WHERE when grouping sets are used.
Previously we disallowed pushing down quals to WHERE in the presence of
grouping sets. That's overly restrictive.

We now instead copy quals to WHERE if applicable, leaving the
one in HAVING in place. That's because, at that stage of the planning
process, it's nontrivial to determine if it's safe to remove the one in
HAVING.

Author: Andrew Gierth
Discussion: 874mkt3l59.fsf@news-spur.riddles.org.uk
Backpatch: 9.5, where grouping sets were introduced. This isn't exactly
    a bugfix, but it seems better to keep the branches in sync at this point.
2015-07-26 16:50:20 +02:00
Andres Freund
e6d8cb77c0 Recognize GROUPING() as a aggregate expression.
Previously GROUPING() was not recognized as a aggregate expression,
erroneously allowing the planner to move it from HAVING to WHERE.

Author: Jeevan Chalke
Reviewed-By: Andrew Gierth
Discussion: CAM2+6=WG9omG5rFOMAYBweJxmpTaapvVp5pCeMrE6BfpCwr4Og@mail.gmail.com
Backpatch: 9.5, where grouping sets were introduced
2015-07-26 16:50:02 +02:00
Andres Freund
144666f65b Build column mapping for grouping sets in all required cases.
The previous coding frequently failed to fail because for one it's
unusual to have rollup clauses with one column, and for another
sometimes the wrong mapping didn't cause obvious problems.

Author: Jeevan Chalke
Reviewed-By: Andrew Gierth
Discussion: CAM2+6=W=9=hQOipH0HAPbkun3Z3TFWij_EiHue0_6UX=oR=1kw@mail.gmail.com
Backpatch: 9.5, where grouping sets were introduced
2015-07-26 16:46:27 +02:00
Tom Lane
d9476b8380 Dodge portability issue (apparent compiler bug) in new tablesample code.
Some of the older OS X critters in the buildfarm are failing regression,
with symptoms showing that a request for 100% sampling in BERNOULLI or
SYSTEM methods actually gets only around 50% of the table.  gdb revealed
that the computation of the "cutoff" number was producing 0x7FFFFFFF
rather than the expected 0x100000000.  Inspecting the assembly code,
it looks like gcc is trying to use lrint() instead of rint() and then
fumbling the conversion from long double to uint64.  This seems like a
clear compiler bug, but assigning the intermediate result into a plain
double variable works around it, so let's just do that.  (Another idea
would be to give up one bit of hash width so that we don't need to use
a uint64 cutoff, but let's see if this is enough.)
2015-07-25 19:42:32 -04:00
Tom Lane
dd7a8f66ed Redesign tablesample method API, and do extensive code review.
The original implementation of TABLESAMPLE modeled the tablesample method
API on index access methods, which wasn't a good choice because, without
specialized DDL commands, there's no way to build an extension that can
implement a TSM.  (Raw inserts into system catalogs are not an acceptable
thing to do, because we can't undo them during DROP EXTENSION, nor will
pg_upgrade behave sanely.)  Instead adopt an API more like procedural
language handlers or foreign data wrappers, wherein the only SQL-level
support object needed is a single handler function identified by having
a special return type.  This lets us get rid of the supporting catalog
altogether, so that no custom DDL support is needed for the feature.

Adjust the API so that it can support non-constant tablesample arguments
(the original coding assumed we could evaluate the argument expressions at
ExecInitSampleScan time, which is undesirable even if it weren't outright
unsafe), and discourage sampling methods from looking at invisible tuples.
Make sure that the BERNOULLI and SYSTEM methods are genuinely repeatable
within and across queries, as required by the SQL standard, and deal more
honestly with methods that can't support that requirement.

Make a full code-review pass over the tablesample additions, and fix
assorted bugs, omissions, infelicities, and cosmetic issues (such as
failure to put the added code stanzas in a consistent ordering).
Improve EXPLAIN's output of tablesample plans, too.

Back-patch to 9.5 so that we don't have to support the original API
in production.
2015-07-25 14:39:00 -04:00
Joe Conway
b26e3d660d Make RLS work with UPDATE ... WHERE CURRENT OF
UPDATE ... WHERE CURRENT OF would not work in conjunction with
RLS. Arrange to allow the CURRENT OF expression to be pushed down.
Issue noted by Peter Geoghegan. Patch by Dean Rasheed. Back patch
to 9.5 where RLS was introduced.
2015-07-24 12:55:30 -07:00
Andrew Dunstan
d9a356ff2e Fix treatment of nulls in jsonb_agg and jsonb_object_agg
The wrong is_null flag was being passed to datum_to_json. Also, null
object key values are not permitted, and this was not being checked
for. Add regression tests covering these cases, and also add those tests
to the json set, even though it was doing the right thing.

Fixes bug #13514, initially diagnosed by Tom Lane.
2015-07-24 09:40:46 -04:00
Andres Freund
c1ca3a19df Fix bug around assignment expressions containing indirections.
Handling of assigned-to expressions with indirection (e.g. set f1[1] =
3) was broken for ON CONFLICT DO UPDATE.  The problem was that
ParseState was consulted to determine if an INSERT-appropriate or
UPDATE-appropriate behavior should be used when transforming expressions
with indirections. When the wrong path was taken the old row was
substituted with NULL, leading to wrong results..

To fix remove p_is_update and only use p_is_insert to decide how to
transform the assignment expression, and uset p_is_insert while parsing
the on conflict statement. This isn't particularly pretty, but it's not
any worse than before.

Author: Peter Geoghegan, slightly edited by me
Discussion: CAM3SWZS8RPvA=KFxADZWw3wAHnnbxMxDzkEC6fNaFc7zSm411w@mail.gmail.com
Backpatch: 9.5, where the feature was introduced
2015-07-24 11:52:07 +02:00
Heikki Linnakangas
766dcfb16c Fix off-by-one error in calculating subtrans/multixact truncation point.
If there were no subtransactions (or multixacts) active, we would calculate
the oldestxid == next xid. That's correct, but if next XID happens to be
on the next pg_subtrans (pg_multixact) page, the page does not exist yet,
and SimpleLruTruncate will produce an "apparent wraparound" warning. The
warning is harmless in this case, but looks very alarming to users.

Backpatch to all supported versions. Patch and analysis by Thomas Munro.
2015-07-23 01:29:59 +03:00
Tom Lane
46d0a9bfac Fix add_rte_to_flat_rtable() for recent feature additions.
The TABLESAMPLE and row security patches each overlooked this function,
though their errors of omission were opposite: RLS failed to zero out the
securityQuals field, leading to wasteful copying of useless expression
trees in finished plans, while TABLESAMPLE neglected to add a comment
saying that it intentionally *isn't* deleting the tablesample subtree.
There probably should be a similar comment about ctename, too.

Back-patch as appropriate.
2015-07-21 20:03:58 -04:00
Tom Lane
434873806a Fix some oversights in BRIN patch.
Remove HeapScanDescData.rs_initblock, which wasn't being used for anything
in the final version of the patch.

Fix IndexBuildHeapScan so that it supports syncscan again; the patch
broke synchronous scanning for index builds by forcing rs_startblk
to zero even when the caller did not care about that and had asked
for syncscan.

Add some commentary and usage defenses to heap_setscanlimits().

Fix heapam so that asking for rs_numblocks == 0 does what you would
reasonably expect.  As coded it amounted to requesting a whole-table
scan, because those "--x <= 0" tests on an unsigned variable would
behave surprisingly.
2015-07-21 13:38:24 -04:00
Alvaro Herrera
149b1dd840 Fix omission of OCLASS_TRANSFORM in object_classes[]
This was forgotten in cac7658205 (and its fixup ad89a5d115).  Since it
seems way too easy to miss this, this commit also introduces a mechanism
to enforce that the array is consistent with the enum.

Problem reported independently by Robert Haas and Jaimin Pan.
Patches proposed by Jaimin Pan, Jim Nasby, Michael Paquier and myself,
though I didn't use any of these and instead went with a cleaner
approach suggested by Tom Lane.

Backpatch to 9.5.

Discussion:
https://www.postgresql.org/message-id/CA+Tgmoa6SgDaxW_n_7SEhwBAc=mniYga+obUj5fmw4rU9_mLvA@mail.gmail.com
https://www.postgresql.org/message-id/29788.1437411581@sss.pgh.pa.us
2015-07-21 13:20:53 +02:00
Heikki Linnakangas
eb11de8ff5 Sanity-check that a page zeroed by redo routine is marked with WILL_INIT.
There was already a sanity-check in the other direction: if a page was
marked with WILL_INIT, it had to be initialized by the redo routine. It's
not strictly necessary for correctness that a page is marked with WILL_INIT
if it's going to be initialized at redo, but it's a missed optimization if
nothing else.

Fix a few instances of this issue in SP-GiST, where a block in WAL record
was not marked with WILL_INIT, but was in fact always initialized at redo.
We were creating a full-page image of the page unnecessarily in those
cases.

Backpatch to 9.5, where the new WILL_INIT flag was added.
2015-07-20 22:34:01 +03:00
Alvaro Herrera
e52b690cf5 Don't handle PUBLIC/NONE separately
Since those role specifiers are checked in the grammar, there's no need
for the old checks to remain in place after 31eae6028e.  Remove them.

Backpatch to 9.5.

Noted and patch by Jeevan Chalke
2015-07-20 18:47:15 +02:00
Alvaro Herrera
8d90736924 Improve BRIN documentation somewhat
This removes some info about support procedures being used, which was
obsoleted by commit db5f98ab4f, as well as add some more documentation
on how to create new opclasses using the Minmax infrastructure.
(Hopefully we can get something similar for Inclusion as well.)

In passing, fix some obsolete mentions of "mmtuples" in source code
comments.

Backpatch to 9.5, where BRIN was introduced.
2015-07-20 12:16:40 +02:00
Andrew Dunstan
9aa663463b Remove dead code.
Defect noticed by Coverity.
2015-07-19 13:19:38 -04:00
Tom Lane
576a95b3a1 Make WaitLatchOrSocket's timeout detection more robust.
In the previous coding, timeout would be noticed and reported only when
poll() or socket() returned zero (or the equivalent behavior on Windows).
Ordinarily that should work well enough, but it seems conceivable that we
could get into a state where poll() always returns a nonzero value --- for
example, if it is noticing a condition on one of the file descriptors that
we do not think is reason to exit the loop.  If that happened, we'd be in a
busy-wait loop that would fail to terminate even when the timeout expires.

We can make this more robust at essentially no cost, by deciding to exit
of our own accord if we compute a zero or negative time-remaining-to-wait.
Previously the code noted this but just clamped the time-remaining to zero,
expecting that we'd detect timeout on the next loop iteration.

Back-patch to 9.2.  While 9.1 had a version of WaitLatchOrSocket, it was
primitive compared to later versions, and did not guarantee reliable
detection of timeouts anyway.  (Essentially, this is a refinement of
commit 3e7fdcffd6, which was back-patched only as far as 9.2.)
2015-07-18 11:47:13 -04:00
Andrew Dunstan
e02d44b8a7 Support JSON negative array subscripts everywhere
Previously, there was an inconsistency across json/jsonb operators that
operate on datums containing JSON arrays -- only some operators
supported negative array count-from-the-end subscripting.  Specifically,
only a new-to-9.5 jsonb deletion operator had support (the new "jsonb -
integer" operator).  This inconsistency seemed likely to be
counter-intuitive to users.  To fix, allow all places where the user can
supply an integer subscript to accept a negative subscript value,
including path-orientated operators and functions, as well as other
extraction operators.  This will need to be called out as an
incompatibility in the 9.5 release notes, since it's possible that users
are relying on certain established extraction operators changed here
yielding NULL in the event of a negative subscript.

For the json type, this requires adding a way of cheaply getting the
total JSON array element count ahead of time when parsing arrays with a
negative subscript involved, necessitating an ad-hoc lex and parse.
This is followed by a "conversion" from a negative subscript to its
equivalent positive-wise value using the count.  From there on, it's as
if a positive-wise value was originally provided.

Note that there is still a minor inconsistency here across jsonb
deletion operators.  Unlike the aforementioned new "-" deletion operator
that accepts an integer on its right hand side, the new "#-" path
orientated deletion variant does not throw an error when it appears like
an array subscript (input that could be recognized by as an integer
literal) is being used on an object, which is wrong-headed.  The reason
for not being stricter is that it could be the case that an object pair
happens to have a key value that looks like an integer; in general,
these two possibilities are impossible to differentiate with rhs path
text[] argument elements.  However, we still don't allow the "#-"
path-orientated deletion operator to perform array-style subscripting.
Rather, we just return the original left operand value in the event of a
negative subscript (which seems analogous to how the established
"jsonb/json #> text[]" path-orientated operator may yield NULL in the
event of an invalid subscript).

In passing, make SetArrayPath() stricter about not accepting cases where
there is trailing non-numeric garbage bytes rather than a clean NUL
byte.  This means, for example, that strings like "10e10" are now not
accepted as an array subscript of 10 by some new-to-9.5 path-orientated
jsonb operators (e.g. the new #- operator).  Finally, remove dead code
for jsonb subscript deletion; arguably, this should have been done in
commit b81c7b409.

Peter Geoghegan and Andrew Dunstan
2015-07-17 21:13:47 -04:00
Robert Haas
a04bb65f70 Add new function pg_notification_queue_usage.
This tells you what fraction of NOTIFY's queue is currently filled.

Brendan Jurd, reviewed by Merlin Moncure and Gurjeet Singh.  A few
further tweaks by me.
2015-07-17 09:12:03 -04:00
Tom Lane
9d6077abf9 Fix a low-probability crash in our qsort implementation.
It's standard for quicksort implementations, after having partitioned the
input into two subgroups, to recurse to process the smaller partition and
then handle the larger partition by iterating.  This method guarantees
that no more than log2(N) levels of recursion can be needed.  However,
Bentley and McIlroy argued that checking to see which partition is smaller
isn't worth the cycles, and so their code doesn't do that but just always
recurses on the left partition.  In most cases that's fine; but with
worst-case input we might need O(N) levels of recursion, and that means
that qsort could be driven to stack overflow.  Such an overflow seems to
be the only explanation for today's report from Yiqing Jin of a SIGSEGV
in med3_tuple while creating an index of a couple billion entries with a
very large maintenance_work_mem setting.  Therefore, let's spend the few
additional cycles and lines of code needed to choose the smaller partition
for recursion.

Also, fix up the qsort code so that it properly uses size_t not int for
some intermediate values representing numbers of items.  This would only
be a live risk when sorting more than INT_MAX bytes (in qsort/qsort_arg)
or tuples (in qsort_tuple), which I believe would never happen with any
caller in the current core code --- but perhaps it could happen with
call sites in third-party modules?  In any case, this is trouble waiting
to happen, and the corrected code is probably if anything shorter and
faster than before, since it removes sign-extension steps that had to
happen when converting between int and size_t.

In passing, move a couple of CHECK_FOR_INTERRUPTS() calls so that it's
not necessary to preserve the value of "r" across them, and prettify
the output of gen_qsort_tuple.pl a little.

Back-patch to all supported branches.  The odds of hitting this issue
are probably higher in 9.4 and up than before, due to the new ability
to allocate sort workspaces exceeding 1GB, but there's no good reason
to believe that it's impossible to crash older branches this way.
2015-07-16 22:57:46 -04:00
Magnus Hagander
828df727a6 Fix spelling error
David Rowley
2015-07-16 10:31:58 +03:00
Magnus Hagander
64c9d8a6c8 Fix copy/past error in comment
David Christensen
2015-07-16 10:28:44 +03:00
Noah Misch
bcd7c41206 AIX: Link the postgres executable with -Wl,-brtllib.
This allows PostgreSQL modules and their dependencies to have undefined
symbols, resolved at runtime.  Perl module shared objects rely on that
in Perl 5.8.0 and later.  This fixes the crash when PL/PerlU loads such
modules, as the hstore_plperl test suite does.  Module authors can link
using -Wl,-G to permit undefined symbols; by default, linking will fail
as it has.  Back-patch to 9.0 (all supported versions).
2015-07-15 21:00:26 -04:00
Heikki Linnakangas
d5c0495cd4 Fix event trigger support for the new ALTER OPERATOR command.
Also, the lock on pg_operator should not be released until end of
transaction.
2015-07-14 19:50:18 +03:00
Heikki Linnakangas
321eed5f0f Add ALTER OPERATOR command, for changing selectivity estimator functions.
Other options cannot be changed, as it's not totally clear if cached plans
would need to be invalidated if one of the other options change. Selectivity
estimator functions only change plan costs, not correctness of plans, so
those should be safe.

Original patch by Uriy Zhuravlev, heavily edited by me.
2015-07-14 18:17:55 +03:00
Heikki Linnakangas
e42375fc81 Retain comments on indexes and constraints at ALTER TABLE ... TYPE ...
When a column's datatype is changed, ATExecAlterColumnType() rebuilds all
the affected indexes and constraints, and the comments from the old
indexes/constraints were not carried over.

To fix, create a synthetic COMMENT ON command in the work queue, to re-add
any comments on constraints. For indexes, there's a comment field in
IndexStmt that is used.

This fixes bug #13126, reported by Kirill Simonov. Original patch by
Michael Paquier, reviewed by Petr Jelinek and me. This bug is present in
all versions, but only backpatch to 9.5. Given how minor the issue is, it
doesn't seem worth the work and risk to backpatch further than that.
2015-07-14 11:40:22 +03:00
Heikki Linnakangas
1ab9faaecb Reformat code in ATPostAlterTypeParse.
The code in ATPostAlterTypeParse was very deeply indented, mostly because
there were two nested switch-case statements, which add a lot of
indentation. Use if-else blocks instead, to make the code less indented
and more readable.

This is in preparation for next patch that makes some actualy changes to
the function. These cosmetic parts have been separated to make it easier
to see the real changes in the other patch.
2015-07-14 11:38:08 +03:00
Andres Freund
3ed26e5f87 For consistency add a pfree to ON CONFLICT set_plan_refs code.
Backpatch to 9.5 where ON CONFLICT was introduced.

Author: Peter Geoghegan
2015-07-12 22:18:57 +02:00
Tom Lane
0a0fe2ff6e Add now-required #include.
Fixes compiler warning induced by 808ea8fc7b.
2015-07-11 23:34:41 -04:00
Joe Conway
808ea8fc7b Add assign_expr_collations() to CreatePolicy() and AlterPolicy().
As noted by Noah Misch, CreatePolicy() and AlterPolicy() omit to call
assign_expr_collations() on the node trees. Fix the omission and add
his test case to the rowsecurity regression test.
2015-07-11 14:19:31 -07:00
Tom Lane
45811be94e Fix postmaster's handling of a startup-process crash.
Ordinarily, a failure (unexpected exit status) of the startup subprocess
should be considered fatal, so the postmaster should just close up shop
and quit.  However, if we sent the startup process a SIGQUIT or SIGKILL
signal, the failure is hardly "unexpected", and we should attempt restart;
this is necessary for recovery from ordinary backend crashes in hot-standby
scenarios.  I attempted to implement the latter rule with a two-line patch
in commit 442231d7f7, but it now emerges that
that patch was a few bricks shy of a load: it failed to distinguish the
case of a signaled startup process from the case where the new startup
process crashes before reaching database consistency.  That resulted in
infinitely respawning a new startup process only to have it crash again.

To handle this properly, we really must track whether we have sent the
*current* startup process a kill signal.  Rather than add yet another
ad-hoc boolean to the postmaster's state, I chose to unify this with the
existing RecoveryError flag into an enum tracking the startup process's
state.  That seems more consistent with the postmaster's general state
machine design.

Back-patch to 9.0, like the previous patch.
2015-07-09 13:22:22 -04:00
Fujii Masao
c2e5f4d1c1 Make wal_compression PGC_SUSET rather than PGC_USERSET.
When enabling wal_compression, there is a risk to leak data similarly to
the BREACH and CRIME attacks on SSL where the compression ratio of
a full page image gives a hint of what is the existing data of this page.
This vulnerability is quite cumbersome to exploit in practice, but doable.

So this patch makes wal_compression PGC_SUSET in order to prevent
non-superusers from enabling it and exploiting the vulnerability while
DBA thinks the risk very seriously and disables it in postgresql.conf.

Back-patch to 9.5 where wal_compression was introduced.
2015-07-09 22:30:52 +09:00
Noah Misch
bfb4cf12ab Add .gitignore entries for AIX-specific intermediate build artifacts. 2015-07-08 20:44:22 -04:00
Noah Misch
be8b06c364 Revoke support for strxfrm() that write past the specified array length.
This formalizes a decision implicit in commit
4ea51cdfe8 and adds clean detection of
affected systems.  Vendor updates are available for each such known bug.
Back-patch to 9.5, where the aforementioned commit first appeared.
2015-07-08 20:44:21 -04:00
Andres Freund
b2f6f749c7 Fix logical decoding bug leading to inefficient reopening of files.
When spilling transaction data to disk a simple typo caused the output
file to be closed and reopened for every serialized change. That happens
to not have a huge impact on linux, which is why it probably wasn't
noticed so far, but on windows that appears to trigger actual disk
writes after every change. Not fun.

The bug fortunately does not have any impact besides speed. A change
could end up being in the wrong segment (last instead of next), but
since we read all files to the end, that's just ugly, not really
problematic. It's not a problem to upgrade, since transaction spill
files do not persist across restarts.

Bug: #13484
Reported-By: Olivier Gosseaume
Discussion: 20150703090217.1190.63940@wrigleys.postgresql.org

Backpatch to 9.4, where logical decoding was added.
2015-07-07 13:12:46 +02:00
Joe Conway
02eac01f91 Make RLS related error messages more consistent and compliant.
Also updated regression expected output to match. Noted and patch by Daniele Varrazzo.
2015-07-06 19:16:53 -07:00
Heikki Linnakangas
8e33fc1784 Call getsockopt() on the correct socket.
We're interested in the buffer size of the socket that's connected to the
client, not the one that's listening for new connections. It happened to
work, as default buffer size is the same on both, but it was clearly not
wrong.

Spotted by Tom Lane
2015-07-06 16:36:48 +03:00
Heikki Linnakangas
4f33621f3f Don't set SO_SNDBUF on recent Windows versions that have a bigger default.
It's unnecessary to set it if the default is higher in the first place.
Furthermore, setting SO_SNDBUF disables the so-called "dynamic send
buffering" feature, which hurts performance further. This can be seen
especially when the network between the client and the server has high
latency.

Chen Huajun
2015-07-06 16:10:58 +03:00
Tom Lane
ac50f84866 Fix misuse of TextDatumGetCString().
"TextDatumGetCString(PG_GETARG_TEXT_P(x))" is formally wrong: a text*
is not a Datum.  Although this coding will accidentally fail to fail on
all known platforms, it risks leaking memory if a detoast step is needed,
unlike "TextDatumGetCString(PG_GETARG_DATUM(x))" which is what's used
elsewhere.  Make pg_get_object_address() fall in line with other uses.

Noted while reviewing two-arg current_setting() patch.
2015-07-02 17:02:08 -04:00
Tom Lane
10fb48d66d Add an optional missing_ok argument to SQL function current_setting().
This allows convenient checking for existence of a GUC from SQL, which is
particularly useful when dealing with custom variables.

David Christensen, reviewed by Jeevan Chalke
2015-07-02 16:41:07 -04:00
Heikki Linnakangas
7261172430 Remove obsolete heap_formtuple/modifytuple/deformtuple functions.
These variants used the old-style 'n'/' ' NULL indicators. The new-style
functions have been available since version 8.1. That should be long enough
that if there is still any old external code using these functions, they
can just switch to the new functions without worrying about backwards
compatibility

Peter Geoghegan
2015-07-02 21:21:23 +03:00
Heikki Linnakangas
f92d6a540a Use appendStringInfoString/Char et al where appropriate.
Patch by David Rowley. Backpatch to 9.5, as some of the calls were new in
9.5, and keeping the code in sync with master makes future backpatching
easier.
2015-07-02 12:36:03 +03:00
Tom Lane
1e24cf645d Don't leave pg_hba and pg_ident data lying around in running backends.
Free the contexts holding this data after we're done using it, by the
expedient of attaching them to the PostmasterContext which we were
already taking care to delete (and where, indeed, this data used to live
before commits e5e2fc842c and 7c45e3a3c6).  This saves a
probably-usually-negligible amount of space per running backend.  It also
avoids leaving potentially-security-sensitive data lying around in memory
in processes that don't need it.  You'd have to be unusually paranoid to
think that that amounts to a live security bug, so I've not gone so far as
to forcibly zero the memory; but there surely isn't a good reason to keep
this data around.

Arguably this is a memory management bug in the aforementioned commits,
but it doesn't seem important enough to back-patch.
2015-07-01 18:55:39 -04:00
Tom Lane
d7c19d6855 Make sampler_random_fract() actually obey its API contract.
This function is documented to return a value in the range (0,1),
which is what its predecessor anl_random_fract() did.  However, the
new version depends on pg_erand48() which returns a value in [0,1).
The possibility of returning zero creates hazards of division by zero
or trying to compute log(0) at some call sites, and it might well
break third-party modules using anl_random_fract() too.  So let's
change it to never return zero.  Spotted by Coverity.

Michael Paquier, cosmetically adjusted by me
2015-07-01 18:07:48 -04:00
Fujii Masao
8217370864 Make XLogFileCopy() look the same as in 9.4.
XLogFileCopy() was changed heavily in commit de76884. However it was
partially reverted in commit 7abc685 and most of those changes to
XLogFileCopy() were no longer needed. Then commit 7cbee7c removed
those unnecessary code, but XLogFileCopy() looked different in master
and 9.4 though the contents are almost the same.

This patch makes XLogFileCopy() look the same in master and back-branches,
which makes back-patching easier, per discussion on pgsql-hackers.
Back-patch to 9.5.

Discussion: 55760844.7090703@iki.fi

Michael Paquier
2015-07-01 10:54:47 +09:00
Tom Lane
131926a52d Remove useless check for NULL subexpression.
Coverity rightly gripes that it's silly to have a test here when
the adjacent ExecEvalExpr() would choke on a NULL expression pointer.

Petr Jelinek
2015-06-30 12:53:54 -04:00
Heikki Linnakangas
fdf28853ae Don't call PageGetSpecialPointer() on page until it's been initialized.
After calling XLogInitBufferForRedo(), the page might be all-zeros if it was
not in page cache already. btree_xlog_unlink_page initialized the page
correctly, but it called PageGetSpecialPointer before initializing it, which
would lead to a corrupt page at WAL replay, if the unlinked page is not in
page cache.

Backpatch to 9.4, the bug came with the rewrite of B-tree page deletion.
2015-06-30 13:41:30 +03:00
Robert Haas
b48ecf862b In bttext_abbrev_convert, move pfree to the right place.
Without this, we might access memory that's already been freed, or
leak memory if in the C locale.

Peter Geoghegan
2015-06-29 23:53:05 -04:00
Heikki Linnakangas
47fe4d25d5 Initialize GIN metapage correctly when replaying metapage-update WAL record.
I broke this with my WAL format refactoring patch. Before that, the metapage
was read from disk, and modified in-place regardless of the LSN. That was
always a bit silly, as there's no need to read the old page version from
disk disk when we're overwriting it anyway. So that was changed in 9.5, but
I failed to add a GinInitPage call to initialize the page-headers correctly.
Usually you wouldn't notice, because the metapage is already in the page
cache and is not zeroed.

One way to reproduce this is to perform a VACUUM on an already vacuumed
table (so that the vacuum has no real work to do), immediately after a
checkpoint, and then perform an immediate shutdown. After recovery, the
page headers of the metapage will be incorrectly all-zeroes.

Reported by Jeff Janes
2015-06-30 00:06:00 +03:00
Tom Lane
cbc8d65639 Code + docs review for escaping of option values (commit 11a020eb6).
Avoid memory leak from incorrect choice of how to free a StringInfo
(resetStringInfo doesn't do it).  Now that pg_split_opts doesn't scribble
on the optstr, mark that as "const" for clarity.  Attach the commentary in
protocol.sgml to the right place, and add documentation about the
user-visible effects of this change on postgres' -o option and libpq's
PGOPTIONS option.
2015-06-29 12:42:52 -04:00
Peter Eisentraut
c5e5d444de Translation updates
Source-Git-URL: git://git.postgresql.org/git/pgtranslation/messages.git
Source-Git-Hash: fb7e72f46cfafa1b5bfe4564d9686d63a1e6383f
2015-06-28 23:56:55 -04:00
Tom Lane
2bdc51a294 Run the C portions of guc-file.l through pgindent.
Yeah, I know, pretty anal-retentive of me.  But we oughta find some
way to automate this for the .y and .l files.
2015-06-28 20:49:35 -04:00
Tom Lane
62d16c7fc5 Improve design and implementation of pg_file_settings view.
As first committed, this view reported on the file contents as they were
at the last SIGHUP event.  That's not as useful as reporting on the current
contents, and what's more, it didn't work right on Windows unless the
current session had serviced at least one SIGHUP.  Therefore, arrange to
re-read the files when pg_show_all_settings() is called.  This requires
only minor refactoring so that we can pass changeVal = false to
set_config_option() so that it won't actually apply any changes locally.

In addition, add error reporting so that errors that would prevent the
configuration files from being loaded, or would prevent individual settings
from being applied, are visible directly in the view.  This makes the view
usable for pre-testing whether edits made in the config files will have the
desired effect, before one actually issues a SIGHUP.

I also added an "applied" column so that it's easy to identify entries that
are superseded by later entries; this was the main use-case for the original
design, but it seemed unnecessarily hard to use for that.

Also fix a 9.4.1 regression that allowed multiple entries for a
PGC_POSTMASTER variable to cause bogus complaints in the postmaster log.
(The issue here was that commit bf007a27ac unintentionally reverted
3e3f65973a, which suppressed any duplicate entries within
ParseConfigFp.  However, since the original coding of the pg_file_settings
view depended on such suppression *not* happening, we couldn't have fixed
this issue now without first doing something with pg_file_settings.
Now we suppress duplicates by marking them "ignored" within
ProcessConfigFileInternal, which doesn't hide them in the view.)

Lesser changes include:

Drive the view directly off the ConfigVariable list, instead of making a
basically-equivalent second copy of the data.  There's no longer any need
to hang onto the data permanently, anyway.

Convert show_all_file_settings() to do its work in one call and return a
tuplestore; this avoids risks associated with assuming that the GUC state
will hold still over the course of query execution.  (I think there were
probably latent bugs here, though you might need something like a cursor
on the view to expose them.)

Arrange to run SIGHUP processing in a short-lived memory context, to
forestall process-lifespan memory leaks.  (There is one known leak in this
code, in ProcessConfigDirectory; it seems minor enough to not be worth
back-patching a specific fix for.)

Remove mistaken assignment to ConfigFileLineno that caused line counting
after an include_dir directive to be completely wrong.

Add missed failure check in AlterSystemSetConfigFile().  We don't really
expect ParseConfigFp() to fail, but that's not an excuse for not checking.
2015-06-28 18:06:14 -04:00
Heikki Linnakangas
d661532e27 Also trigger restartpoints based on max_wal_size on standby.
When archive recovery and restartpoints were initially introduced,
checkpoint_segments was ignored on the grounds that the files restored from
archive don't consume any space in the recovery server. That was changed in
later releases, but even then it was arguably a feature rather than a bug,
as performing restartpoints as often as checkpoints during normal operation
might be excessive, but you might nevertheless not want to waste a lot of
space for pre-allocated WAL by setting checkpoint_segments to a high value.
But now that we have separate min_wal_size and max_wal_size settings, you
can bound WAL usage with max_wal_size, and still avoid consuming excessive
space usage by setting min_wal_size to a lower value, so that argument is
moot.

There are still some issues with actually limiting the space usage to
max_wal_size: restartpoints in recovery can only start after seeing the
checkpoint record, while a checkpoint starts flushing buffers as soon as
the redo-pointer is set. Restartpoint is paced to happen at the same
leisurily speed, determined by checkpoint_completion_target, as checkpoints,
but because they are started later, max_wal_size can be exceeded by upto
one checkpoint cycle's worth of WAL, depending on
checkpoint_completion_target. But that seems better than not trying at all,
and max_wal_size is a soft limit anyway.

The documentation already claimed that max_wal_size is obeyed in recovery,
so this just fixes the behaviour to match the docs. However, add some
weasel-words there to mention that max_wal_size may well be exceeded by
some amount in recovery.
2015-06-29 00:09:10 +03:00
Heikki Linnakangas
a32c3ec893 Promote the assertion that XLogBeginInsert() is not called twice into ERROR.
Seems like cheap insurance for WAL bugs. A spurious call to
XLogBeginInsert() in itself would be fairly harmless, but if there is any
data registered and the insertion is not completed/cancelled properly, there
is a risk that the data ends up in a wrong WAL record.

Per Jeff Janes's suggestion.
2015-06-28 22:30:39 +03:00
Heikki Linnakangas
a45c70acf3 Fix double-XLogBeginInsert call in GIN page splits.
If data checksums or wal_log_hints is on, and a GIN page is split, the code
to find a new, empty, block was called after having already called
XLogBeginInsert(). That causes an assertion failure or PANIC, if finding the
new block involves updating a FSM page that had not been modified since last
checkpoint, because that update is WAL-logged, which calls XLogBeginInsert
again. Nested XLogBeginInsert calls are not supported.

To fix, rearrange GIN code so that XLogBeginInsert is called later, after
finding the victim buffers.

Reported by Jeff Janes.
2015-06-28 22:16:21 +03:00
Heikki Linnakangas
cb2acb1081 Add missing_ok option to the SQL functions for reading files.
This makes it possible to use the functions without getting errors, if there
is a chance that the file might be removed or renamed concurrently.
pg_rewind needs to do just that, although this could be useful for other
purposes too. (The changes to pg_rewind to use these functions will come in
a separate commit.)

The read_binary_file() function isn't very well-suited for extensions.c's
purposes anymore, if it ever was. So bite the bullet and make a copy of it
in extension.c, tailored for that use case. This seems better than the
accidental code reuse, even if it's a some more lines of code.

Michael Paquier, with plenty of kibitzing by me.
2015-06-28 21:35:46 +03:00
Kevin Grittner
cca8ba9529 Fix comment for GetCurrentIntegerTimestamp().
The unit of measure is microseconds, not milliseconds.

Backpatch to 9.3 where the function and its comment were added.
2015-06-28 12:43:59 -05:00
Tom Lane
0a52d378b0 Avoid passing NULL to memcmp() in lookups of zero-argument functions.
A few places assumed they could pass NULL for the argtypes array when
looking up functions known to have zero arguments.  At first glance
it seems that this should be safe enough, since memcmp() is surely not
allowed to fetch any bytes if its count argument is zero.  However,
close reading of the C standard says that such calls have undefined
behavior, so we'd probably best avoid it.

Since the number of places doing this is quite small, and some other
places looking up zero-argument functions were already passing dummy
arrays, let's standardize on the latter solution rather than hacking
the function lookup code to avoid calling memcmp() in these cases.
I also added Asserts to catch any future violations of the new rule.

Given the utter lack of any evidence that this actually causes any
problems in the field, I don't feel a need to back-patch this change.

Per report from Piotr Stefaniak, though this is not his patch.
2015-06-27 17:47:39 -04:00
Heikki Linnakangas
7845db2aa7 Fix typo in comment
Etsuro Fujita
2015-06-27 10:17:42 +03:00
Simon Riggs
66fbcb0d2e Avoid hot standby cancels from VAC FREEZE
VACUUM FREEZE generated false cancelations of standby queries on an
otherwise idle master. Caused by an off-by-one error on cutoff_xid
which goes back to original commit.

Backpatch to all versions 9.0+

Analysis and report by Marco Nenciarini

Bug fix by Simon Riggs
2015-06-27 00:41:47 +01:00
Alvaro Herrera
7d60b2af34 Fix DDL command collection for TRANSFORM
Commit b488c580ae, which added the DDL command collection feature,
neglected to update the code that commit cac7658205 had previously
added two weeks earlier for the TRANSFORM feature.

Reported by Michael Paquier.
2015-06-26 18:17:54 -03:00
Alvaro Herrera
4028222468 Fix BRIN xlog replay
There was a confusion about which block number to use when storing an
item's pointer in the revmap -- the revmap page's blkno was being used,
not the data page's blkno.

Spotted-by: Jeff Janes
2015-06-26 18:13:05 -03:00
Robert Haas
8f15f74a44 Be more conservative about removing tablespace "symlinks".
Don't apply rmtree(), which will gleefully remove an entire subtree,
and don't even apply unlink() unless it's symlink or a directory,
the only things that we expect to find.

Amit Kapila, with minor tweaks by me, per extensive discussions
involving Andrew Dunstan, Fujii Masao, and Heikki Linnakangas,
at least some of whom also reviewed the code.
2015-06-26 15:53:13 -04:00
Robert Haas
9043ef390f Don't warn about creating temporary or unlogged hash indexes.
Warning people that no WAL-logging will be done doesn't make sense
in this case.

Michael Paquier
2015-06-26 11:37:32 -04:00
Robert Haas
91118f1a59 Reduce log level for background worker events from LOG to DEBUG1.
Per discussion, LOG is just too chatty for something that will happen
as routinely as this.

Pavel Stehule
2015-06-26 11:23:32 -04:00
Andres Freund
1b468a131b Fix the fallback memory barrier implementation to be reentrant.
This was essentially "broken" since 0c8eda62; but until more
recently (14e8803f) barriers usage in signal handlers was infrequent.

The failure to be reentrant was noticed because the test_shm_mq, which
uses memory barriers at a high frequency, occasionally got stuck on some
solaris buildfarm animals. Turns out, those machines use sun studio
12.1, which doesn't yet have efficient memory barrier support. A machine
with a newer sun studio did not fail.  Forcing the barrier fallback to
be used on x86 allows to reproduce the problem.

The new fallback is to use kill(PostmasterPid, 0) based on the theory
that that'll always imply a barrier due to checking the liveliness of
PostmasterPid on systems old enough to need fallback support. It's hard
to come up with a good and performant fallback.

I'm not backpatching this for now - the problem isn't active in the back
branches, and we haven't backpatched barrier changes for
now. Additionally master looks entirely different than the back branches
due to the new atomics abstraction. It seems better to let this rest in
master, where the non-reentrancy actively causes a problem, and then
consider backpatching.

Found-By: Robert Haas
Discussion: 55626265.3060800@dunslane.net
2015-06-26 17:00:38 +02:00
Robert Haas
5ca611841b Improve handling of CustomPath/CustomPlan(State) children.
Allow CustomPath to have a list of paths, CustomPlan a list of plans,
and CustomPlanState a list of planstates known to the core system, so
that custom path/plan providers can more reasonably use this
infrastructure for nodes with multiple children.

KaiGai Kohei, per a design suggestion from Tom Lane, with some
further kibitzing by me.
2015-06-26 09:40:47 -04:00
Heikki Linnakangas
4b8e24b9ad Fix a couple of bugs with wal_log_hints.
1. Replay of the WAL record for setting a bit in the visibility map
contained an assertion that a full-page image of that record type can only
occur with checksums enabled. But it can also happen with wal_log_hints, so
remove the assertion. Unlike checksums, wal_log_hints can be changed on the
fly, so it would be complicated to figure out if it was enabled at the time
that the WAL record was generated.

2. wal_log_hints has the same effect on the locking needed to read the LSN
of a page as data checksums. BufferGetLSNAtomic() didn't get the memo.

Backpatch to 9.4, where wal_log_hints was added.
2015-06-26 12:38:24 +03:00
Robert Haas
f7bb7f0625 Allow background workers to connect to no particular database.
The documentation claims that this is supported, but it didn't
actually work.  Fix that.

Reported by Pavel Stehule; patch by me.
2015-06-25 15:52:13 -04:00
Tom Lane
5d1ff6bd55 Fix the logic for putting relations into the relcache init file.
Commit f3b5565dd4 was a couple of bricks shy
of a load; specifically, it missed putting pg_trigger_tgrelid_tgname_index
into the relcache init file, because that index is not used by any
syscache.  However, we have historically nailed that index into cache for
performance reasons.  The upshot was that load_relcache_init_file always
decided that the init file was busted and silently ignored it, resulting
in a significant hit to backend startup speed.

To fix, reinstantiate RelationIdIsInInitFile() as a wrapper around
RelationSupportsSysCache(), which can know about additional relations
that should be in the init file despite being unknown to syscache.c.

Also install some guards against future mistakes of this type: make
write_relcache_init_file Assert that all nailed relations get written to
the init file, and make load_relcache_init_file emit a WARNING if it takes
the "wrong number of nailed relations" exit path.  Now that we remove the
init files during postmaster startup, that case should never occur in the
field, even if we are starting a minor-version update that added or removed
rels from the nailed set.  So the warning shouldn't ever be seen by end
users, but it will show up in the regression tests if somebody breaks this
logic.

Back-patch to all supported branches, like the previous commit.
2015-06-25 14:39:05 -04:00
Robert Haas
51d0fe5d56 Update get_relation_info comment.
Thomas Munro
2015-06-23 10:09:53 -04:00
Tom Lane
2cb9ec1bcb Improve inheritance_planner()'s performance for large inheritance sets.
Commit c03ad5602f introduced a planner
performance regression for UPDATE/DELETE on large inheritance sets.
It required copying the append_rel_list (which is of size proportional to
the number of inherited tables) once for each inherited table, thus
resulting in O(N^2) time and memory consumption.  While it's difficult to
avoid that in general, the extra work only has to be done for
append_rel_list entries that actually reference subquery RTEs, which
inheritance-set entries will not.  So we can buy back essentially all of
the loss in cases without subqueries in FROM; and even for those, the added
work is mainly proportional to the number of UNION ALL subqueries.

Back-patch to 9.2, like the previous commit.

Tom Lane and Dean Rasheed, per a complaint from Thomas Munro.
2015-06-22 18:53:27 -04:00
Alvaro Herrera
ad89a5d115 Add transforms to pg_get_object_address and friends
This was missed when transforms were added by commit cac7658205.

Extracted from a larger patch
Author: Michael Paquier
2015-06-21 16:08:49 -03:00
Andres Freund
667912aee6 Improve multixact emergency autovacuum logic.
Previously autovacuum was not necessarily triggered if space in the
members slru got tight. The first problem was that the signalling was
tied to values in the offsets slru, but members can advance much
faster. Thats especially a problem if old sessions had been around that
previously prevented the multixact horizon to increase. Secondly the
skipping logic doesn't work if the database was restarted after
autovacuum was triggered - that knowledge is not preserved across
restart. This is especially a problem because it's a common
panic-reaction to restart the database if it gets slow to
anti-wraparound vacuums.

Fix the first problem by separating the logic for members from
offsets. Trigger autovacuum whenever a multixact crosses a segment
boundary, as the current member offset increases in irregular values, so
we can't use a simple modulo logic as for offsets.  Add a stopgap for
the second problem, by signalling autovacuum whenver ERRORing out
because of boundaries.

Discussion: 20150608163707.GD20772@alap3.anarazel.de

Backpatch into 9.3, where it became more likely that multixacts wrap
around.
2015-06-21 18:57:28 +02:00
Andres Freund
90231cd518 Add missing check for wal_debug GUC.
9a20a9b2 added a new elog(), enabled when WAL_DEBUG is defined. The
other WAL_DEBUG dependant messages check for the wal_debug GUC, but this
one did not. While at it replace 'upto' with 'up to'.

Discussion: 20150610110253.GF3832@alap3.anarazel.de

Backpatch to 9.4, the first release containing 9a20a9b2.
2015-06-21 18:37:09 +02:00
Noah Misch
f0a264a362 Fix failure to copy setlocale() return value.
POSIX permits setlocale() calls to invalidate any previous setlocale()
return values, but commit 5f538ad004
neglected to account for setlocale(LC_CTYPE, NULL) doing so.  The effect
was to set the LC_CTYPE environment variable to an unintended value.
pg_perm_setlocale() sets this variable to assist PL/Perl; without it,
Perl would undo PostgreSQL's locale settings.  The known-affected
configurations are 32-bit, release builds using Visual Studio 2012 or
Visual Studio 2013.  Visual Studio 2010 is unaffected, as were all
buildfarm-attested configurations.  In principle, this bug could leave
the wrong LC_CTYPE in effect after PL/Perl use, which could in turn
facilitate problems like corrupt tsvector datums.  No known platform
experiences that consequence, because PL/Perl on Windows does not use
this environment variable.

The bug has been user-visible, as early postmaster failure, on systems
with Windows ANSI code page set to CP936 for "Chinese (Simplified, PRC)"
and probably on systems using other multibyte code pages.
(SetEnvironmentVariable() rejects values containing character data not
valid under the Windows ANSI code page.)  Back-patch to 9.4, where the
faulty commit first appeared.

Reported by Didi Hu and 林鹏程.  Reviewed by Tom Lane, though this fix
strategy was not his first choice.
2015-06-20 12:09:29 -04:00
Noah Misch
1f2a378de4 Revert "Detect setlocale(LC_CTYPE, NULL) clobbering previous return values."
This reverts commit b76e76be46.  The
buildfarm yielded no related failures.
2015-06-20 12:08:48 -04:00
Alvaro Herrera
3c400a3f2b Fix thinko in comment (launcher -> worker) 2015-06-20 11:45:59 -03:00
Tom Lane
48913db887 In immediate shutdown, postmaster should not exit till children are gone.
This adjusts commit 82233ce7ea so that the
postmaster does not exit until all its child processes have exited, even
if the 5-second timeout elapses and we have to send SIGKILL.  There is no
great value in having the postmaster process quit sooner, and doing so can
mislead onlookers into thinking that the cluster is fully terminated when
actually some child processes still survive.

This effect might explain recent test failures on buildfarm member hamster,
wherein we failed to restart a cluster just after shutting it down with
"pg_ctl stop -m immediate".

I also did a bit of code review/beautification, including fixing a faulty
use of the Max() macro on a volatile expression.

Back-patch to 9.4.  In older branches, the postmaster never waited for
children to exit during immediate shutdowns, and changing that would be
too much of a behavioral change.
2015-06-19 14:23:39 -04:00
Alvaro Herrera
da1a9d0f5b Clamp autovacuum launcher sleep time to 5 minutes
This avoids the problem that it might go to sleep for an unreasonable
amount of time in unusual conditions like the server clock moving
backwards an unreasonable amount of time.

(Simply moving the server clock forward again doesn't solve the problem
unless you wake up the autovacuum launcher manually, say by sending it
SIGHUP).

Per trouble report from Prakash Itnal in
https://www.postgresql.org/message-id/CAHC5u79-UqbapAABH2t4Rh2eYdyge0Zid-X=Xz-ZWZCBK42S0Q@mail.gmail.com

Analyzed independently by Haribabu Kommi and Tom Lane.
2015-06-19 12:44:36 -03:00
Tom Lane
be87143fe9 Fix bogus range_table_mutator() logic for RangeTblEntry.tablesample.
Must make a copy of the TableSampleClause node; the previous coding
modified the input data structure in-place.

Petr Jelinek
2015-06-19 11:41:56 -04:00
Robert Haas
ed16f73c57 Fix corner case in autovacuum-forcing logic for multixact wraparound.
Since find_multixact_start() relies on SimpleLruDoesPhysicalPageExist(),
and that function looks only at the on-disk state, it's possible for it
to fail to find a page that exists in the in-memory SLRU that has not
been written yet.  If that happens, SetOffsetVacuumLimit() will
erroneously decide to force emergency autovacuuming immediately.

We should probably fix find_multixact_start() to consider the data
cached in memory as well as on the on-disk state, but that's no excuse
for SetOffsetVacuumLimit() to be stupid about the case where it can
no longer read the value after having previously succeeded in doing so.

Report by Andres Freund.
2015-06-19 11:28:30 -04:00
Noah Misch
b76e76be46 Detect setlocale(LC_CTYPE, NULL) clobbering previous return values.
POSIX permits setlocale() calls to invalidate any previous setlocale()
return values.  Commit 5f538ad004
neglected to account for that.  In advance of fixing that bug, switch to
failing hard on affected configurations.  This is a planned temporary
commit to assay buildfarm-represented configurations.
2015-06-17 08:13:33 -04:00
Andrew Dunstan
2271d002d5 Fix "path" infrastructure bug affecting jsonb_set()
jsonb_set() and other clients of the setPathArray() utility function
could get spurious results when an array integer subscript is provided
that is not within the range of int.

To fix, ensure that the value returned by strtol() within setPathArray()
is within the range of int;  when it isn't, assume an invalid input in
line with existing, similar cases.  The path-orientated operators that
appeared in PostgreSQL 9.3 and 9.4 do not call setPathArray(), and
already independently take this precaution, so no change there.

Peter Geoghegan
2015-06-12 19:26:03 -04:00
Tom Lane
b00982344a Improve error message and hint for ALTER COLUMN TYPE can't-cast failure.
We already tried to improve this once, but the "improved" text was rather
off-target if you had provided a USING clause.  Also, it seems helpful
to provide the exact text of a suggested USING clause, so users can just
copy-and-paste it when needed.  Per complaint from Keith Rarick and a
suggestion from Merlin Moncure.

Back-patch to 9.2 where the current wording was adopted.
2015-06-12 11:54:03 -04:00
Fujii Masao
b5fe62038f Make postmaster restart archiver soon after it dies, even during recovery.
After the archiver dies, postmaster tries to start a new one immediately.
But previously this could happen only while server was running normally
even though archiving was enabled always (i.e., archive_mode was set to
always). So the archiver running during recovery could not restart soon
after it died. This is an oversight in commit ffd3774.

This commit changes reaper(), postmaster's signal handler to cleanup
after a child process dies, so that it tries to a new archiver even during
recovery if necessary.

Patch by me. Review by Alvaro Herrera.
2015-06-12 23:11:51 +09:00
Fujii Masao
091c02a958 Fix alphabetization in catalogs.sgml.
System catalogs and views should be listed alphabetically
in catalog.sgml, but only pg_file_settings view not.

This patch also fixes typos in pg_file_settings comments.
2015-06-12 12:59:29 +09:00
Peter Eisentraut
385522c7dc Fix typo 2015-06-10 21:30:17 -04:00
Tom Lane
f6e9cbfd91 Report more information if pg_perm_setlocale() fails at startup.
We don't know why a few Windows users have seen this fail, but the
taciturnity of the error message certainly isn't helping debug it.
Let's at least find out which LC category isn't working.
2015-06-09 13:37:08 -04:00
Alvaro Herrera
94232c909d Fix typos
tablesapce -> tablespace
there -> their

These were introduced in 72d422a52, so no need to backpatch.
2015-06-08 15:37:42 -03:00