Commit Graph

17799 Commits

Author SHA1 Message Date
Robert Haas 81c5e46c49 Introduce 64-bit hash functions with a 64-bit seed.
This will be useful for hash partitioning, which needs a way to seed
the hash functions to avoid problems such as a hash index on a hash
partitioned table clumping all values into a small portion of the
bucket space; it's also useful for anything that wants a 64-bit hash
value rather than a 32-bit hash value.

Just in case somebody wants a 64-bit hash value that is compatible
with the existing 32-bit hash values, make the low 32-bits of the
64-bit hash value match the 32-bit hash value when the seed is 0.

Robert Haas and Amul Sul

Discussion: http://postgr.es/m/CA+Tgmoafx2yoJuhCQQOL5CocEi-w_uG4S2xT0EtgiJnPGcHW3g@mail.gmail.com
2017-08-31 22:21:21 -04:00
Tom Lane 2d44c58c79 Avoid memory leaks when a GatherMerge node is rescanned.
Rescanning a GatherMerge led to leaking some memory in the executor's
query-lifespan context, because most of the node's working data structures
were simply abandoned and rebuilt from scratch.  In practice, this might
never amount to much, given the cost of relaunching worker processes ---
but it's still pretty messy, so let's fix it.

We can rearrange things so that the tuple arrays are simply cleared and
reused, and we don't need to rebuild the TupleTableSlots either, just
clear them.  One small complication is that because we might get a
different number of workers on each iteration, we can't keep the old
convention that the leader's gm_slots[] entry is the last one; the leader
might clobber a TupleTableSlot that we need for a worker in a future
iteration.  Hence, adjust the logic so that the leader has slot 0 always,
while the active workers have slots 1..n.

Back-patch to v10 to keep all the existing versions of nodeGatherMerge.c
in sync --- because of the renumbering of the slots, there would otherwise
be a very large risk that any future backpatches in this module would
introduce bugs.

Discussion: https://postgr.es/m/8670.1504192177@sss.pgh.pa.us
2017-08-31 16:21:05 -04:00
Robert Haas 30833ba154 Expand partitioned tables in PartDesc order.
Previously, we expanded the inheritance hierarchy in the order in
which find_all_inheritors had locked the tables, but that turns out
to block quite a bit of useful optimization.  For example, a
partition-wise join can't count on two tables with matching bounds
to get expanded in the same order.

Where possible, this change results in expanding partitioned tables in
*bound* order.  Bound order isn't well-defined for a list-partitioned
table with a null-accepting partition or for a list-partitioned table
where the bounds for a single partition are interleaved with other
partitions.  However, when expansion in bound order is possible, it
opens up further opportunities for optimization, such as
strength-reducing MergeAppend to Append when the expansion order
matches the desired sort order.

Patch by me, with cosmetic revisions by Ashutosh Bapat.

Discussion: http://postgr.es/m/CA+TgmoZrKj7kEzcMSum3aXV4eyvvbh9WD=c6m=002WMheDyE3A@mail.gmail.com
2017-08-31 15:50:18 -04:00
Tom Lane 6708e447ef Clean up shm_mq cleanup.
The logic around shm_mq_detach was a few bricks shy of a load, because
(contrary to the comments for shm_mq_attach) all it did was update the
shared shm_mq state.  That left us leaking a bit of process-local
memory, but much worse, the on_dsm_detach callback for shm_mq_detach
was still armed.  That means that whenever we ultimately detach from
the DSM segment, we'd run shm_mq_detach again for already-detached,
possibly long-dead queues.  This accidentally fails to fail today,
because we only ever re-use a shm_mq's memory for another shm_mq, and
multiple detach attempts on the last such shm_mq are fairly harmless.
But it's gonna bite us someday, so let's clean it up.

To do that, change shm_mq_detach's API so it takes a shm_mq_handle
not the underlying shm_mq.  This makes the callers simpler in most
cases anyway.  Also fix a few places in parallel.c that were just
pfree'ing the handle structs rather than doing proper cleanup.

Back-patch to v10 because of the risk that the revenant shm_mq_detach
callbacks would cause a live bug sometime.  Since this is an API
change, it's too late to do it in 9.6.  (We could make a variant
patch that preserves API, but I'm not excited enough to do that.)

Discussion: https://postgr.es/m/8670.1504192177@sss.pgh.pa.us
2017-08-31 15:10:24 -04:00
Tom Lane 04e9678614 Code review for nodeGatherMerge.c.
Comment the fields of GatherMergeState, and organize them a bit more
sensibly.  Comment GMReaderTupleBuffer more usefully too.  Improve
assorted other comments that were obsolete or just not very good English.

Get rid of the use of a GMReaderTupleBuffer for the leader process;
that was confusing, since only the "done" field was used, and that
in a way redundant with need_to_scan_locally.

In gather_merge_init, avoid calling load_tuple_array for
already-known-exhausted workers.  I'm not sure if there's a live bug there,
but the case is unlikely to be well tested due to timing considerations.

Remove some useless code, such as duplicating the tts_isempty test done by
TupIsNull.

Remove useless initialization of ps.qual, replacing that with an assertion
that we have no qual to check.  (If we did, the code would fail to check
it.)

Avoid applying heap_copytuple to a null tuple.  While that fails to crash,
it's confusing and it makes the code less legible not more so IMO.

Propagate a couple of these changes into nodeGather.c, as well.

Back-patch to v10, partly because of the possibility that the
gather_merge_init change is fixing a live bug, but mostly to keep
the branches in sync to ease future bug fixes.
2017-08-30 17:21:08 -04:00
Tom Lane 41b0dd987d Separate reinitialization of shared parallel-scan state from ExecReScan.
Previously, the parallel executor logic did reinitialization of shared
state within the ExecReScan code for parallel-aware scan nodes.  This is
problematic, because it means that the ExecReScan call has to occur
synchronously (ie, during the parent Gather node's ReScan call).  That is
swimming very much against the tide so far as the ExecReScan machinery is
concerned; the fact that it works at all today depends on a lot of fragile
assumptions, such as that no plan node between Gather and a parallel-aware
scan node is parameterized.  Another objection is that because ExecReScan
might be called in workers as well as the leader, hacky extra tests are
needed in some places to prevent unwanted shared-state resets.

Hence, let's separate this code into two functions, a ReInitializeDSM
call and the ReScan call proper.  ReInitializeDSM is called only in
the leader and is guaranteed to run before we start new workers.
ReScan is returned to its traditional function of resetting only local
state, which means that ExecReScan's usual habits of delaying or
eliminating child rescan calls are safe again.

As with the preceding commit 7df2c1f8d, it doesn't seem to be necessary
to make these changes in 9.6, which is a good thing because the FDW and
CustomScan APIs are impacted.

Discussion: https://postgr.es/m/CAA4eK1JkByysFJNh9M349u_nNjqETuEnY_y1VUc_kJiU0bxtaQ@mail.gmail.com
2017-08-30 13:18:16 -04:00
Tom Lane 7df2c1f8da Force rescanning of parallel-aware scan nodes below a Gather[Merge].
The ExecReScan machinery contains various optimizations for postponing
or skipping rescans of plan subtrees; for example a HashAgg node may
conclude that it can re-use the table it built before, instead of
re-reading its input subtree.  But that is wrong if the input contains
a parallel-aware table scan node, since the portion of the table scanned
by the leader process is likely to vary from one rescan to the next.
This explains the timing-dependent buildfarm failures we saw after
commit a2b70c89c.

The established mechanism for showing that a plan node's output is
potentially variable is to mark it as depending on some runtime Param.
Hence, to fix this, invent a dummy Param (one that has a PARAM_EXEC
parameter number, but carries no actual value) associated with each Gather
or GatherMerge node, mark parallel-aware nodes below that node as dependent
on that Param, and arrange for ExecReScanGather[Merge] to flag that Param
as changed whenever the Gather[Merge] node is rescanned.

This solution breaks an undocumented assumption made by the parallel
executor logic, namely that all rescans of nodes below a Gather[Merge]
will happen synchronously during the ReScan of the top node itself.
But that's fundamentally contrary to the design of the ExecReScan code,
and so was doomed to fail someday anyway (even if you want to argue
that the bug being fixed here wasn't a failure of that assumption).
A follow-on patch will address that issue.  In the meantime, the worst
that's expected to happen is that given very bad timing luck, the leader
might have to do all the work during a rescan, because workers think
they have nothing to do, if they are able to start up before the eventual
ReScan of the leader's parallel-aware table scan node has reset the
shared scan state.

Although this problem exists in 9.6, there does not seem to be any way
for it to manifest there.  Without GatherMerge, it seems that a plan tree
that has a rescan-short-circuiting node below Gather will always also
have one above it that will short-circuit in the same cases, preventing
the Gather from being rescanned.  Hence we won't take the risk of
back-patching this change into 9.6.  But v10 needs it.

Discussion: https://postgr.es/m/CAA4eK1JkByysFJNh9M349u_nNjqETuEnY_y1VUc_kJiU0bxtaQ@mail.gmail.com
2017-08-30 09:29:55 -04:00
Robert Haas bf11e7ee2e Propagate sort instrumentation from workers back to leader.
Up until now, when parallel query was used, no details about the
sort method or space used by the workers were available; details
were shown only for any sorting done by the leader.  Fix that.

Commit 1177ab1dab forced the test case
added by commit 1f6d515a67 to run
without parallelism; now that we have this infrastructure, allow
that again, with a little tweaking to make it pass with and without
force_parallel_mode.

Robert Haas and Tom Lane

Discussion: http://postgr.es/m/CA+Tgmoa2VBZW6S8AAXfhpHczb=Rf6RqQ2br+zJvEgwJ0uoD_tQ@mail.gmail.com
2017-08-29 13:26:33 -04:00
Robert Haas 3452dc5240 Push tuple limits through Gather and Gather Merge.
If we only need, say, 10 tuples in total, then we certainly don't need
more than 10 tuples from any single process.  Pushing down the limit
lets workers exit early when possible.  For Gather Merge, there is
an additional benefit: a Sort immediately below the Gather Merge can
be done as a bounded sort if there is an applicable limit.

Robert Haas and Tom Lane

Discussion: http://postgr.es/m/CA+TgmoYa3QKKrLj5rX7UvGqhH73G1Li4B-EKxrmASaca2tFu9Q@mail.gmail.com
2017-08-29 13:16:55 -04:00
Tom Lane 3f4c7917b3 Code review for pushing LIMIT through subqueries.
Minor improvements for commit 1f6d515a6.  We do not need the (rather
expensive) test for SRFs in the targetlist, because since v10 any
such SRFs would appear in separate ProjectSet nodes.  Also, make the
code look more like the existing cases by turning it into a simple
recursion --- the argument that there might be some performance
benefit to contorting the code seems unfounded to me, especially since
any good compiler should turn the tail-recursion into iteration anyway.

Discussion: http://postgr.es/m/CADE5jYLuugnEEUsyW6Q_4mZFYTxHxaVCQmGAsF0yiY8ZDggi-w@mail.gmail.com
2017-08-25 09:05:17 -04:00
Andres Freund d7694fc148 Consolidate the function pointer types used by dshash.c.
Commit 8c0d7bafad introduced dshash with hash
and compare functions like DynaHash's, and also variants that take a user
data pointer instead of size.  Simplify the interface by merging them into
a single pair of function pointer types that take both size and a user data
pointer.

Since it is anticipated that memcmp and tag_hash behavior will be a common
requirement, provide wrapper functions dshash_memcmp and dshash_memhash that
conform to the new function types.

Author: Thomas Munro
Reviewed-By: Andres Freund
Discussion: https://postgr.es/m/20170823054644.efuzftxjpfi6wwqs%40alap3.anarazel.de
2017-08-24 17:01:36 -07:00
Andres Freund 4569715bd6 Fix unlikely shared memory leak after failure in dshash_create().
Tidy-up for commit 8c0d7bafad, based on a
complaint from Andres Freund.

Author: Thomas Munro
Reviewed-By: Andres Freund
Discussion: https://postgr.es/m/20170823054644.efuzftxjpfi6wwqs%40alap3.anarazel.de
2017-08-24 16:58:30 -07:00
Andres Freund 20fbf25533 Fix harmless thinko in dsa.c.
Commit 16be2fd100 added DSA_ALLOC_HUGE,
DSA_ALLOC_ZERO and DSA_ALLOC_NO_OOM which have the same numerical
values and meanings as the similarly named MCXT_... macros.  In one
place we accidentally used MCXT_ALLOC_NO_OOM when DSA_ALLOC_NO_OOM is
wanted, so tidy that up.

Author: Thomas Munro
Discussion: http://postgr.es/m/CAEepm=2AimHxVkkxnMfQvbZMkXy0uKbVa0-D38c5-qwrCm4CMQ@mail.gmail.com
Backpatch: 10, where dsa was introduced.
2017-08-24 15:07:40 -07:00
Peter Eisentraut 1e1b01cd16 Fix outdated comment
Author: Thomas Munro <thomas.munro@enterprisedb.com>
2017-08-23 14:19:35 -04:00
Peter Eisentraut 237a0b87b1 Improve plural handling in error message
This does not use the normal plural handling, because no numbers appear
in the actual message.
2017-08-23 13:56:59 -04:00
Peter Eisentraut 85f4d6393d Tweak some SCRAM error messages and code comments
Clarify/correct some error messages, fix up some code comments that
confused SASL and SCRAM, and other minor fixes.  No changes in
functionality.
2017-08-23 12:29:38 -04:00
Peter Eisentraut 580ddcec39 Fix translation marker
This was erroneously removed in
55a70a023c.
2017-08-23 09:56:38 -04:00
Andres Freund 8c0d7bafad Hash tables backed by DSA shared memory.
Add general purpose chaining hash tables for DSA memory.  Unlike
DynaHash in shared memory mode, these hash tables can grow as
required, and cope with being mapped into different addresses in
different backends.

There is a wide range of potential users for such a hash table, though
it's very likely the interface will need to evolve as we come to
understand the needs of different kinds of users.  E.g support for
iterators and incremental resizing is planned for later commits and
the details of the callback signatures are likely to change.

Author: Thomas Munro
Reviewed-By: John Gorman, Andres Freund, Dilip Kumar, Robert Haas
Discussion:
	https://postgr.es/m/CAEepm=3d8o8XdVwYT6O=bHKsKAM2pu2D6sV1S_=4d+jStVCE7w@mail.gmail.com
	https://postgr.es/m/CAEepm=0ZtQ-SpsgCyzzYpsXS6e=kZWqk3g5Ygn3MDV7A8dabUA@mail.gmail.com
2017-08-22 22:43:07 -07:00
Andres Freund 35ea75632a Refactor typcache.c's record typmod hash table.
Previously, tuple descriptors were stored in chains keyed by a fixed size
array of OIDs.  That meant there were effectively two levels of collision
chain -- one inside and one outside the hash table.  Instead, let dynahash.c
look after conflicts for us by supplying a proper hash and equal function
pair.

This is a nice cleanup on its own, but also simplifies followup
changes allowing blessed TupleDescs to be shared between backends
participating in parallel query.

Author: Thomas Munro
Reviewed-By: Andres Freund
Discussion: https://postgr.es/m/CAEepm%3D34GVhOL%2BarUx56yx7OPk7%3DqpGsv3CpO54feqjAwQKm5g%40mail.gmail.com
2017-08-22 16:11:54 -07:00
Peter Eisentraut 2bfd1b1ee5 Don't install ICU collation keyword variants
Users can still create them themselves.  Instead, document Unicode TR 35
collation options for ICU, so users can create all this themselves.

Reviewed-by: Peter Geoghegan <pg@bowt.ie>
2017-08-21 19:21:07 -04:00
Peter Eisentraut 51e225da30 Expand set of predefined ICU locales
Install language+region combinations even if they are not distinct from
the language's base locale.  This gives better long-term stability of
the set of predefined locales and makes the predefined locales less
implementation-dependent and more practical for users.

Reviewed-by: Peter Geoghegan <pg@bowt.ie>
2017-08-21 19:21:07 -04:00
Robert Haas 1f6d515a67 Push limit through subqueries to underlying sort, where possible.
Douglas Doole, reviewed by Ashutosh Bapat and by me.  Minor formatting
change by me.

Discussion: http://postgr.es/m/CADE5jYLuugnEEUsyW6Q_4mZFYTxHxaVCQmGAsF0yiY8ZDggi-w@mail.gmail.com
2017-08-21 14:19:44 -04:00
Robert Haas 79ccd7cbd5 pg_prewarm: Add automatic prewarm feature.
Periodically while the server is running, and at shutdown, write out a
list of blocks in shared buffers.  When the server reaches consistency
-- unfortunatey, we can't do it before that point without breaking
things -- reload those blocks into any still-unused shared buffers.

Mithun Cy and Robert Haas, reviewed and tested by Beena Emerson,
Amit Kapila, Jim Nasby, and Rafia Sabih.

Discussion: http://postgr.es/m/CAD__OugubOs1Vy7kgF6xTjmEqTR4CrGAv8w+ZbaY_+MZeitukw@mail.gmail.com
2017-08-21 14:17:39 -04:00
Noah Misch 66ed3829df Inject $(ICU_LIBS) regardless of platform.
It appeared in a conditional that excludes AIX, Cygwin and MinGW.  Give
ICU support a chance to work on those platforms.  Back-patch to v10,
where ICU support was introduced.
2017-08-20 21:22:18 -07:00
Andres Freund c6293249dc Partially flatten struct tupleDesc so that it can be used in DSM.
TupleDesc's attributes were already stored in contiguous memory after the
struct.  Go one step further and get rid of the array of pointers to
attributes so that they can be stored in shared memory mapped at different
addresses in each backend.  This won't work for TupleDescs with contraints
and defaults, since those point to other objects, but for many purposes
only attributes are needed.

Author: Thomas Munro
Reviewed-By: Andres Freund
Discussion: https://postgr.es/m/CAEepm=0ZtQ-SpsgCyzzYpsXS6e=kZWqk3g5Ygn3MDV7A8dabUA@mail.gmail.com
2017-08-20 11:19:12 -07:00
Andres Freund 2cd7084524 Change tupledesc->attrs[n] to TupleDescAttr(tupledesc, n).
This is a mechanical change in preparation for a later commit that
will change the layout of TupleDesc.  Introducing a macro to abstract
the details of where attributes are stored will allow us to change
that in separate step and revise it in future.

Author: Thomas Munro, editorialized by Andres Freund
Reviewed-By: Andres Freund
Discussion: https://postgr.es/m/CAEepm=0ZtQ-SpsgCyzzYpsXS6e=kZWqk3g5Ygn3MDV7A8dabUA@mail.gmail.com
2017-08-20 11:19:07 -07:00
Peter Eisentraut 24620fc52b Fix creation of ICU comments for keyword variants
It would create the comment referring to the keyword-less parent
locale.  This was broken in ddb5fdc068.
2017-08-18 23:02:28 -04:00
Robert Haas c4b841ba6a Fix interaction of triggers, partitioning, and EXPLAIN ANALYZE.
Add a new EState member es_leaf_result_relations, so that the trigger
code knows about ResultRelInfos created by tuple routing.  Also make
sure ExplainPrintTriggers knows about partition-related
ResultRelInfos.

Etsuro Fujita, reviewed by Amit Langote

Discussion: http://postgr.es/m/57163e18-8e56-da83-337a-22f2c0008051@lab.ntt.co.jp
2017-08-18 13:01:05 -04:00
Robert Haas 54cde0c4c0 Don't lock tables in RelationGetPartitionDispatchInfo.
Instead, lock them in the caller using find_all_inheritors so that
they get locked in the standard order, minimizing deadlock risks.

Also in RelationGetPartitionDispatchInfo, avoid opening tables which
are not partitioned; there's no need.

Amit Langote, reviewed by Ashutosh Bapat and Amit Khandekar

Discussion: http://postgr.es/m/91b36fa1-c197-b72f-ca6e-56c593bae68c@lab.ntt.co.jp
2017-08-17 15:43:09 -04:00
Robert Haas ecfe59e50f Refactor validation of new partitions a little bit.
Move some logic that is currently in ATExecAttachPartition to
separate functions to facilitate future code reuse.

Ashutosh Bapat and Jeevan Ladhe

Discussion: http://postgr.es/m/CA+Tgmobbnamyvii0pRdg9pp_jLHSUvq7u5SiRrVV0tEFFU58Tg@mail.gmail.com
2017-08-17 14:49:45 -04:00
Robert Haas 1e56883a52 Attempt to clarify comments related to force_parallel_mode.
Per discussion with Tom Lane.

Discussion: http://postgr.es/m/28589.1502902172@sss.pgh.pa.us
2017-08-17 14:09:14 -04:00
Tom Lane a2b70c89ca Fix ExecReScanGatherMerge.
Not surprisingly, since it'd never ever been tested, ExecReScanGatherMerge
didn't work.  Fix it, and add a regression test case to exercise it.

Amit Kapila

Discussion: https://postgr.es/m/CAA4eK1JkByysFJNh9M349u_nNjqETuEnY_y1VUc_kJiU0bxtaQ@mail.gmail.com
2017-08-17 13:49:22 -04:00
Tom Lane 963af96920 Add missing "static" marker.
Per pademelon.
2017-08-17 11:17:39 -04:00
Heikki Linnakangas dcd052c8d2 Fix pg_atomic_u64 initialization.
As Andres pointed out, pg_atomic_init_u64 must be used to initialize an
atomic variable, before it can be accessed with the actual atomic ops.
Trying to use pg_atomic_write_u64 on an uninitialized variable leads to a
failure with the fallback implementation that uses a spinlock.

Discussion: https://www.postgresql.org/message-id/20170816191346.d3ke5tpshhco4bnd%40alap3.anarazel.de
2017-08-17 00:48:44 +03:00
Tom Lane 2b74303637 Make the planner assume that the entries in a VALUES list are distinct.
Previously, if we had to estimate the number of distinct values in a
VALUES column, we fell back on the default behavior used whenever we lack
statistics, which effectively is that there are Min(# of entries, 200)
distinct values.  This can be very badly off with a large VALUES list,
as noted by Jeff Janes.

We could consider actually running an ANALYZE-like scan on the VALUES,
but that seems unduly expensive, and anyway it could not deliver reliable
info if the entries are not all constants.  What seems like a better choice
is to assume that the values are all distinct.  This will sometimes be just
as wrong as the old code, but it seems more likely to be more nearly right
in many common cases.  Also, it is more consistent with what happens in
some related cases, for example WHERE x = ANY(ARRAY[1,2,3,...,n]) and
WHERE x = ANY(VALUES (1),(2),(3),...,(n)) now are estimated similarly.

This was discussed some time ago, but consensus was it'd be better
to slip it in at the start of a development cycle not near the end.
(It should've gone into v10, really, but I forgot about it.)

Discussion: https://postgr.es/m/CAMkU=1xHkyPa8VQgGcCNg3RMFFvVxUdOpus1gKcFuvVi0w6Acg@mail.gmail.com
2017-08-16 15:37:20 -04:00
Heikki Linnakangas ac883ac453 Fix shm_toc.c to always return buffer-aligned memory.
Previously, if you passed a non-aligned size to shm_toc_create(), the
memory returned by shm_toc_allocate() would be similarly non-aligned.
This was exposed by commit 3cda10f41b, which allocated structs containing
a pg_atomic_uint64 field with shm_toc_allocate(). On systems with
MAXIMUM_ALIGNOF = 4, such structs still need to be 8-bytes aligned, but
the memory returned by shm_toc_allocate() was only 4-bytes aligned.

It's quite bogus that we abuse BUFFERALIGN to align the structs for
pg_atomic_uint64. It doesn't really have anything to do with buffers. But
that's a separate issue.

This ought to fix the buildfarm failures on 32-bit x86 systems.

Discussion: https://www.postgresql.org/message-id/7e0a73a5-0df9-1859-b8ae-9acf122dc38d@iki.fi
2017-08-16 21:52:38 +03:00
Peter Eisentraut 9b5140fb50 Correct representation of foreign tables in information schema
tables.table_type is supposed to be 'FOREIGN' rather than 'FOREIGN
TABLE' according to the SQL standard.
2017-08-16 11:03:33 -04:00
Heikki Linnakangas 3cda10f41b Use atomic ops to hand out pages to scan in parallel scan.
With a lot of CPUs, the spinlock that protects the current scan location
in a parallel scan can become a bottleneck. Use an atomic fetch-and-add
instruction instead.

David Rowley

Discussion: https://www.postgresql.org/message-id/CAKJS1f9tgsPhqBcoPjv9_KUPZvTLCZ4jy%3DB%3DbhqgaKn7cYzm-w@mail.gmail.com
2017-08-16 16:18:41 +03:00
Heikki Linnakangas 0c504a80cf Remove dedicated B-tree root-split record types.
Since commit 40dae7ec53, which changed the way b-tree page splitting
works, there has been no difference in the handling of root, and non-root
split WAL records. We don't need to distinguish them anymore

If you're worried about the loss of debugging information, note that
usually a root split record will normally be followed by a WAL record to
create the new root page. The root page will also have the BTP_ROOT flag
set on the page itself, and there is a pointer to it from the metapage.

Author: Aleksander Alekseev
Discussion: https://www.postgresql.org/message-id/20170406122116.GA11081@e733.localdomain
2017-08-16 12:24:40 +03:00
Peter Eisentraut 77d05706be Fix up some misusage of appendStringInfo() and friends
Change to appendStringInfoChar() or appendStringInfoString() where those
can be used.

Author: David Rowley <david.rowley@2ndquadrant.com>
Reviewed-by: Ashutosh Bapat <ashutosh.bapat@enterprisedb.com>
2017-08-15 23:34:39 -04:00
Peter Eisentraut 4d4c891715 Initialize replication_slot_catalog_xmin in procarray
Although not confirmed and probably rare, if the newly allocated memory
is not already zero, this could possibly have caused some problems.

Also reorder the initializations slightly so they match the order of the
struct definition.

Author: Wong, Yi Wen <yiwong@amazon.com>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
2017-08-15 21:05:21 -04:00
Peter Eisentraut 0659465caa Include foreign tables in information_schema.table_privileges
This appears to have been an omission in the original commit
0d692a0dc9.  All related information_schema views already include
foreign tables.

Reported-by: Nicolas Thauvin <nicolas.thauvin@dalibo.com>
2017-08-15 19:27:22 -04:00
Alvaro Herrera 31ae1638ce Simplify autovacuum work-item implementation
The initial implementation of autovacuum work-items used a dynamic
shared memory area (DSA).  However, it's argued that dynamic shared
memory is not portable enough, so we cannot rely on it being supported
everywhere; at the same time, autovacuum work-items are now a critical
part of the server, so it's not acceptable that they don't work in the
cases where dynamic shared memory is disabled.  Therefore, let's fall
back to a simpler implementation of work-items that just uses
autovacuum's main shared memory segment for storage.

Discussion: https://postgr.es/m/CA+TgmobQVbz4K_+RSmiM9HeRKpy3vS5xnbkL95gSEnWijzprKQ@mail.gmail.com
2017-08-15 18:14:07 -03:00
Peter Eisentraut 70b573b267 Fix logical replication protocol comparison logic
Since we currently only have one protocol, this doesn't make much of a
difference other than the error message.

Author: Yugo Nagata <nagata@sraoss.co.jp>
2017-08-15 16:21:19 -04:00
Peter Eisentraut e42351ae07 Simplify some code in logical replication launcher
Avoid unnecessary locking calls when a subscription is disabled.

Author: Yugo Nagata <nagata@sraoss.co.jp>
2017-08-15 15:13:06 -04:00
Tom Lane 4867d7f62f Avoid out-of-memory in a hash join with many duplicate inner keys.
The executor is capable of splitting buckets during a hash join if
too much memory is being used by a small number of buckets.  However,
this only helps if a bucket's population is actually divisible; if
all the hash keys are alike, the tuples still end up in the same
new bucket.  This can result in an OOM failure if there are enough
inner keys with identical hash values.  The planner's cost estimates
will bias it against choosing a hash join in such situations, but not
by so much that it will never do so.  To mitigate the OOM hazard,
explicitly estimate the hash bucket space needed by just the inner
side's most common value, and if that would exceed work_mem then
add disable_cost to the hash cost estimate.

This approach doesn't account for the possibility that two or more
common values would share the same hash value.  On the other hand,
work_mem is normally a fairly conservative bound, so that eating
two or more times that much space is probably not going to kill us.

If we have no stats about the inner side, ignore this consideration.
There was some discussion of making a conservative assumption, but that
would effectively result in disabling hash join whenever we lack stats,
which seems like an overreaction given how seldom the problem manifests
in the field.

Per a complaint from David Hinkle.  Although this could be viewed
as a bug fix, the lack of similar complaints weighs against back-
patching; indeed we waited for v11 because it seemed already rather
late in the v10 cycle to be making plan choice changes like this one.

Discussion: https://postgr.es/m/32013.1487271761@sss.pgh.pa.us
2017-08-15 14:05:53 -04:00
Alvaro Herrera d9a622cee1 Fix error handling path in autovacuum launcher
The original code (since 00e6a16d01) was assuming aborting the
transaction in autovacuum launcher was sufficient to release all
resources, but in reality the launcher runs quite a lot of code out of
any transactions.  Re-introduce individual cleanup calls to make abort
more robust.

Reported-by: Robert Haas
Discussion: https://postgr.es/m/CA+TgmobQVbz4K_+RSmiM9HeRKpy3vS5xnbkL95gSEnWijzprKQ@mail.gmail.com
2017-08-15 13:35:12 -03:00
Robert Haas e139f1953f Assorted preparatory refactoring for partition-wise join.
Instead of duplicating the logic to search for a matching
ParamPathInfo in multiple places, factor it out into a separate
function.

Pass only the relevant bits of the PartitionKey to
partition_bounds_equal instead of the whole thing, because
partition-wise join will want to call this without having a
PartitionKey available.

Adjust allow_star_schema_join and calc_nestloop_required_outer
to take relevant Relids rather than the entire Path, because
partition-wise join will want to call it with the top-parent
relids to determine whether a child join is allowable.

Ashutosh Bapat.  Review and testing of the larger patch set of which
this is a part by Amit Langote, Rajkumar Raghuwanshi, Rafia Sabih,
Thomas Munro, Dilip Kumar, and me.

Discussion: http://postgr.es/m/CA+TgmobQK80vtXjAsPZWWXd7c8u13G86gmuLupN+uUJjA+i4nA@mail.gmail.com
2017-08-15 12:30:38 -04:00
Tom Lane 00418c6124 Simplify plpgsql's check for simple expressions.
plpgsql wants to recognize expressions that it can execute directly
via ExecEvalExpr() instead of going through the full SPI machinery.
Originally the test for this consisted of recursively groveling through
the post-planning expression tree to see if it contained only nodes that
plpgsql recognized as safe.  That was a major maintenance headache, since
it required updating plpgsql every time we added any kind of expression
node.  It was also kind of expensive, so over time we added various
pre-planning checks to try to short-circuit having to do that.
Robert Haas pointed out that as of the SRF-processing changes in v10,
particularly the addition of Query.hasTargetSRFs, there really isn't
any reason to make the recursive scan at all: the initial checks cover
everything we really care about.  We do have to make sure that those
checks agree with what inline_function() considers, so that inlining
of a function that formerly wasn't inlined can't cause an expression
considered simple to become non-simple.

Hence, delete the recursive function exec_simple_check_node(), and tweak
those other tests to more exactly agree with inline_function().  Adjust
some comments and function naming to match.

Discussion: https://postgr.es/m/CA+TgmoZGZpwdEV2FQWaVxA_qZXsQE1DAS5Fu8fwxXDNvfndiUQ@mail.gmail.com
2017-08-15 12:28:39 -04:00
Tom Lane f3a4d7e7c2 Distinguish wait-for-connection from wait-for-write-ready on Windows.
The API for WaitLatch and friends followed the Unix convention in which
waiting for a socket connection to complete is identical to waiting for
the socket to accept a write.  While Windows provides a select(2)
emulation that agrees with that, the native WaitForMultipleObjects API
treats them as quite different --- and for some bizarre reason, it will
report a not-yet-connected socket as write-ready.  libpq itself has so
far escaped dealing with this because it waits with select(), but in
libpqwalreceiver.c we want to wait using WaitLatchOrSocket.  The semantics
mismatch resulted in replication connection failures on Windows, but only
for remote connections (apparently, localhost connections complete
immediately, or at least too fast for anyone to have noticed the problem
in single-machine testing).

To fix, introduce an additional WL_SOCKET_CONNECTED wait flag for
WaitLatchOrSocket, which is identical to WL_SOCKET_WRITEABLE on
non-Windows, but results in waiting for FD_CONNECT events on Windows.

Ideally, we would also distinguish the two conditions in the API for
PQconnectPoll(), but changing that API at this point seems infeasible.
Instead, cheat by checking for PQstatus() == CONNECTION_STARTED to
determine that we're still waiting for the connection to complete.
(This is a cheat mainly because CONNECTION_STARTED is documented as an
internal state rather than something callers should rely on.  Perhaps
we ought to change the documentation ... but this patch doesn't.)

Per reports from Jobin Augustine and Igor Neyman.  Back-patch to v10
where commit 1e8a85009 exposed this longstanding shortcoming.

Andres Freund, minor fix and some code review/beautification by me

Discussion: https://postgr.es/m/CAHBggj8g2T+ZDcACZ2FmzX9CTxkWjKBsHd6NkYB4i9Ojf6K1Fw@mail.gmail.com
2017-08-15 11:07:57 -04:00
Robert Haas 480f1f4329 Teach adjust_appendrel_attrs(_multilevel) to do multiple translations.
Currently, child relations are always base relations, so when we
translate parent relids to child relids, we only need to translate
a singler relid.  However, the proposed partition-wise join feature
will create child joins, which will mean we need to translate a set
of parent relids to the corresponding child relids.  This is
preliminary refactoring to make that possible.

Ashutosh Bapat.  Review and testing of the larger patch set of which
this is a part by Amit Langote, Rajkumar Raghuwanshi, Rafia Sabih,
Thomas Munro, Dilip Kumar, and me.  Some adjustments, mostly
cosmetic, by me.

Discussion: http://postgr.es/m/CA+TgmobQK80vtXjAsPZWWXd7c8u13G86gmuLupN+uUJjA+i4nA@mail.gmail.com
2017-08-15 10:49:06 -04:00
Robert Haas d57929afc7 Avoid unnecessary single-child Append nodes.
Before commit d3cc37f1d8, an inheritance parent
whose only children were temp tables of other sessions would end up
as a simple scan of the parent; but with that commit, we end up with
an Append node, per a report from Ashutosh Bapat.  Tweak the logic
so that we go back to the old way, and update the function header
comment for partitioning while we're at it.

Ashutosh Bapat, reviewed by Amit Langote and adjusted by me.

Discussion: http://postgr.es/m/CAFjFpReWJr1yTkHU=OqiMBmcYCMoSW3VPR39RBuQ_ovwDFBT5Q@mail.gmail.com
2017-08-15 09:16:33 -04:00
Robert Haas 1295a77788 Add missing call to ExecReScanGatherMerge.
Amit Kapila

Discussion: http://postgr.es/m/CAA4eK1KeQWZOoDmDmGMwuqzPW9JhRS+ditQVFdAfGjNmMZzqMQ@mail.gmail.com
2017-08-15 08:06:36 -04:00
Tom Lane 21d304dfed Final pgindent + perltidy run for v10. 2017-08-14 17:29:33 -04:00
Tom Lane 5b6289c1e0 Handle elog(FATAL) during ROLLBACK more robustly.
Stress testing by Andreas Seltenreich disclosed longstanding problems that
occur if a FATAL exit (e.g. due to receipt of SIGTERM) occurs while we are
trying to execute a ROLLBACK of an already-failed transaction.  In such a
case, xact.c is in TBLOCK_ABORT state, so that AbortOutOfAnyTransaction
would skip AbortTransaction and go straight to CleanupTransaction.  This
led to an assert failure in an assert-enabled build (due to the ROLLBACK's
portal still having a cleanup hook) or without assertions, to a FATAL exit
complaining about "cannot drop active portal".  The latter's not
disastrous, perhaps, but it's messy enough to want to improve it.

We don't really want to run all of AbortTransaction in this code path.
The minimum required to clean up the open portal safely is to do
AtAbort_Memory and AtAbort_Portals.  It seems like a good idea to
do AtAbort_Memory unconditionally, to be entirely sure that we are
starting with a safe CurrentMemoryContext.  That means that if the
main loop in AbortOutOfAnyTransaction does nothing, we need an extra
step at the bottom to restore CurrentMemoryContext = TopMemoryContext,
which I chose to do by invoking AtCleanup_Memory.  This'll result in
calling AtCleanup_Memory twice in many of the paths through this function,
but that seems harmless and reasonably inexpensive.

The original motivation for the assertion in AtCleanup_Portals was that
we wanted to be sure that any user-defined code executed as a consequence
of the cleanup hook runs during AbortTransaction not CleanupTransaction.
That still seems like a valid concern, and now that we've seen one case
of the assertion firing --- which means that exactly that would have
happened in a production build --- let's replace the Assert with a runtime
check.  If we see the cleanup hook still set, we'll emit a WARNING and
just drop the hook unexecuted.

This has been like this a long time, so back-patch to all supported
branches.

Discussion: https://postgr.es/m/877ey7bmun.fsf@ansel.ydns.eu
2017-08-14 15:43:20 -04:00
Peter Eisentraut 7f1bb1d734 Fix typo
Author: Masahiko Sawada <sawada.mshk@gmail.com>
2017-08-14 13:53:05 -04:00
Tom Lane 004a9702e0 Remove AtEOXact_CatCache().
The sole useful effect of this function, to check that no catcache
entries have positive refcounts at transaction end, has really been
obsolete since we introduced ResourceOwners in PG 8.1.  We reduced the
checks to assertions years ago, so that the function was a complete
no-op in production builds.  There have been previous discussions about
removing it entirely, but consensus up to now was that it had some small
value as a cross-check for bugs in the ResourceOwner logic.

However, it now emerges that it's possible to trigger these assertions
if you hit an assert-enabled backend with SIGTERM during a call to
SearchCatCacheList, because that function temporarily increases the
refcounts of entries it's intending to add to a catcache list construct.
In a normal ERROR scenario, the extra refcounts are cleaned up by
SearchCatCacheList's PG_CATCH block; but in a FATAL exit we do a
transaction abort and exit without ever executing PG_CATCH handlers.

There's a case to be made that this is a generic hazard and we should
consider restructuring elog(FATAL) handling so that pending PG_CATCH
handlers do get run.  That's pretty scary though: it could easily create
more problems than it solves.  Preliminary stress testing by Andreas
Seltenreich suggests that there are not many live problems of this ilk,
so we rejected that idea.

There are more-localized ways to fix the problem; the most principled
one would be to use PG_ENSURE_ERROR_CLEANUP instead of plain PG_TRY.
But adding cycles to SearchCatCacheList isn't very appealing.  We could
also weaken the assertions in AtEOXact_CatCache in some more or less
ad-hoc way, but that just makes its raison d'etre even less compelling.
In the end, the most reasonable solution seems to be to just remove
AtEOXact_CatCache altogether, on the grounds that it's not worth trying
to fix it.  It hasn't found any bugs for us in many years.

Per report from Jeevan Chalke.  Back-patch to all supported branches.

Discussion: https://postgr.es/m/CAM2+6=VEE30YtRQCZX7_sCFsEpoUkFBV1gZazL70fqLn8rcvBA@mail.gmail.com
2017-08-13 16:15:14 -04:00
Alvaro Herrera 2336f84284 Reword comment for clarity
Reported by Masahiko Sawada
Discussion: https://postgr.es/m/CAD21AoB+ycZ2z-4Ye=6MfQ_r0aV5r6cvVPw4kOyPdp6bHqQoBQ@mail.gmail.com
2017-08-12 23:26:35 -04:00
Peter Eisentraut a1ef920e27 Remove uses of "slave" in replication contexts
This affects mostly code comments, some documentation, and tests.
Official APIs already used "standby".
2017-08-10 22:55:41 -04:00
Robert Haas bb5d6e80b1 Improve the error message when creating an empty range partition.
The previous message didn't mention the name of the table or the
bounds.  Put the table name in the primary error message and the
bounds in the detail message.

Amit Langote, changed slightly by me.  Suggestions on the exac
phrasing from Tom Lane, David G. Johnston, and Dean Rasheed.

Discussion: http://postgr.es/m/CA+Tgmoae6bpwVa-1BMaVcwvCCeOoJ5B9Q9-RHWo-1gJxfPBZ5Q@mail.gmail.com
2017-08-10 13:46:56 -04:00
Robert Haas ec99dd5aee Remove incorrect assertion in clog.c
We must advance the oldest XID that can be safely looked up in clog
*before* truncating CLOG, and the oldest XID that can't be reused
*after* truncating CLOG.  This assertion, and the accompanying
comment, are confused; remove them.

Reported by Neha Sharma.

Discussion: http://postgr.es/m/CANiYTQumC3T=UMBMd1Hor=5XWZYuCEQBioL3ug0YtNQCMMT5wQ@mail.gmail.com
2017-08-10 11:20:57 -04:00
Tom Lane 749c7c4170 Fix handling of container types in find_composite_type_dependencies.
find_composite_type_dependencies correctly found columns that are of
the specified type, and columns that are of arrays of that type, but
not columns that are domains or ranges over the given type, its array
type, etc.  The most general way to handle this seems to be to assume
that any type that is directly dependent on the specified type can be
treated as a container type, and processed recursively (allowing us
to handle nested cases such as ranges over domains over arrays ...).
Since a type's array type already has such a dependency, we can drop
the existing special case for the array type.

The very similar logic in get_rels_with_domain was likewise a few
bricks shy of a load, as it supposed that a directly dependent type
could *only* be a sub-domain.  This is already wrong for ranges over
domains, and it'll someday be wrong for arrays over domains.

Add test cases illustrating the problems, and back-patch to all
supported branches.

Discussion: https://postgr.es/m/15268.1502309024@sss.pgh.pa.us
2017-08-09 17:03:09 -04:00
Tom Lane 9bf4068cc3 Fix datumSerialize infrastructure to not crash on non-varlena data.
Commit 1efc7e538 did a poor job of emulating existing logic for touching
Datums that might be expanded-object pointers.  It didn't check for typlen
being -1 first, which meant it could crash on fixed-length pass-by-ref
values, and probably on cstring values as well.  It also didn't use
DatumGetPointer before VARATT_IS_EXTERNAL_EXPANDED, which while currently
harmless is not according to documentation nor prevailing style.

I also think the lack of any explanation as to why datumSerialize makes
these particular nonobvious choices is pretty awful, so fix that.

Per report from Jarred Ward.  Back-patch to 9.6 where this code came in.

Discussion: https://postgr.es/m/6F61E6D2-2F5E-4794-9479-A429BE1CEA4B@simple.com
2017-08-08 19:18:22 -04:00
Alvaro Herrera 77d2c00af7 Reword some unclear comments 2017-08-08 18:48:01 -04:00
Alvaro Herrera f5d54ef97a Fix typo in comment 2017-08-08 18:34:25 -04:00
Alvaro Herrera b2c95a3798 Fix replication origin-related race conditions
Similar to what was fixed in commit 9915de6c1c for replication slots,
but this time it's related to replication origins: DROP SUBSCRIPTION
attempts to drop the replication origin, but that fails if the
replication worker process hasn't yet marked it unused.  This causes
failures in the buildfarm:
ERROR:  could not drop replication origin with OID 1, in use by PID 34069

Like the aforementioned commit, fix by having the process running DROP
SUBSCRIPTION sleep until the worker marks the the replication origin
struct as free.  This uses a condition variable on each replication
origin shmem state struct, so that the session trying to drop can sleep
and expect to be awakened by the process keeping the origin open.

Also fix a SGML markup in the previous commit.

Discussion: https://postgr.es/m/20170808001433.rozlseaf4m2wkw3n@alvherre.pgsql
2017-08-08 16:07:46 -04:00
Alvaro Herrera 030273b7ea Fix inadequacies in recently added wait events
In commit 9915de6c1c, we introduced a new wait point for replication
slots and incorrectly labelled it as wait event PG_WAIT_LOCK.  That's
wrong, so invent an appropriate new wait event instead, and document it
properly.

While at it, fix numerous other problems in the vicinity:
- two different walreceiver wait events were being mixed up in a single
  wait event (which wasn't documented either); split it out so that they
  can be distinguished, and document the new events properly.

- ParallelBitmapPopulate was documented but didn't exist.

- ParallelBitmapScan was not documented (I think this should be called
  "ParallelBitmapScanInit" instead.)

- Logical replication wait events weren't documented

- various symbols had been added in dartboard order in various places.
  Put them in alphabetical order instead, as was originally intended.

Discussion: https://postgr.es/m/20170808181131.mu4fjepuh5m75cyq@alvherre.pgsql
2017-08-08 15:37:44 -04:00
Peter Eisentraut cdc47d1f39 Update SQL features list 2017-08-07 14:30:24 -04:00
Peter Eisentraut f7668b2b35 Translation updates
Source-Git-URL: git://git.postgresql.org/git/pgtranslation/messages.git
Source-Git-Hash: 1a0b5e655d7871506c2b1c7ba562c2de6b6a55de
2017-08-07 13:55:34 -04:00
Peter Eisentraut fca17a933b Fix local/remote attribute mix-up in logical replication
This would lead to failures if local and remote tables have a different
column order.  The tests previously didn't catch that because they only
tested the initial data copy.  So add another test that exercises the
apply worker.

Author: Petr Jelinek <petr.jelinek@2ndquadrant.com>
2017-08-07 10:49:08 -04:00
Peter Eisentraut 0e58455dd4 Fix handling of dropped columns in logical replication
The relation attribute map was not initialized for dropped columns,
leading to errors later on.

Author: Petr Jelinek <petr.jelinek@2ndquadrant.com>
Reported-by: Scott Milliken <scott@deltaex.com>
Bug: #14769
2017-08-07 10:28:35 -04:00
Tom Lane 8d9881911f Require update permission for the large object written by lo_put().
lo_put() surely should require UPDATE permission, the same as lowrite(),
but it failed to check for that, as reported by Chapman Flack.  Oversight
in commit c50b7c09d; backpatch to 9.4 where that was introduced.

Tom Lane and Michael Paquier

Security: CVE-2017-7548
2017-08-07 10:19:19 -04:00
Noah Misch e568e1eee4 Again match pg_user_mappings to information_schema.user_mapping_options.
Commit 3eefc51053 claimed to make
pg_user_mappings enforce the qualifications user_mapping_options had
been enforcing, but its removal of a longstanding restriction left them
distinct when the current user is the subject of a mapping yet has no
server privileges.  user_mapping_options emits no rows for such a
mapping, but pg_user_mappings includes full umoptions.  Change
pg_user_mappings to show null for umoptions.  Back-patch to 9.2, like
the above commit.

Reviewed by Tom Lane.  Reported by Jeff Janes.

Security: CVE-2017-7547
2017-08-07 07:09:28 -07:00
Heikki Linnakangas bf6b9e9444 Don't allow logging in with empty password.
Some authentication methods allowed it, others did not. In the client-side,
libpq does not even try to authenticate with an empty password, which makes
using empty passwords hazardous: an administrator might think that an
account with an empty password cannot be used to log in, because psql
doesn't allow it, and not realize that a different client would in fact
allow it. To clear that confusion and to be be consistent, disallow empty
passwords in all authentication methods.

All the authentication methods that used plaintext authentication over the
wire, except for BSD authentication, already checked that the password
received from the user was not empty. To avoid forgetting it in the future
again, move the check to the recv_password_packet function. That only
forbids using an empty password with plaintext authentication, however.
MD5 and SCRAM need a different fix:

* In stable branches, check that the MD5 hash stored for the user does not
not correspond to an empty string. This adds some overhead to MD5
authentication, because the server needs to compute an extra MD5 hash, but
it is not noticeable in practice.

* In HEAD, modify CREATE and ALTER ROLE to clear the password if an empty
string, or a password hash that corresponds to an empty string, is
specified. The user-visible behavior is the same as in the stable branches,
the user cannot log in, but it seems better to stop the empty password from
entering the system in the first place. Secondly, it is fairly expensive to
check that a SCRAM hash doesn't correspond to an empty string, because
computing a SCRAM hash is much more expensive than an MD5 hash by design,
so better avoid doing that on every authentication.

We could clear the password on CREATE/ALTER ROLE also in stable branches,
but we would still need to check at authentication time, because even if we
prevent empty passwords from being stored in pg_authid, there might be
existing ones there already.

Reported by Jeroen van der Ham, Ben de Graaff and Jelte Fennema.

Security: CVE-2017-7546
2017-08-07 17:03:42 +03:00
Peter Eisentraut 86524f0387 Fix function name in code comment
Reported-by: Peter Geoghegan <pg@bowt.ie>
2017-08-07 09:49:55 -04:00
Peter Eisentraut ad2ca3cba6 Improve wording of subscription refresh debug messages
Reported-by: Yugo Nagata <nagata@sraoss.co.jp>
2017-08-07 09:40:12 -04:00
Peter Eisentraut 6f81306e4d Downgrade subscription refresh messages to DEBUG1
The NOTICE messages about tables being added or removed during
subscription refresh would be incorrect and possibly confusing if the
transaction rolls back, so silence them but keep them available for
debugging.

Discussion: https://www.postgresql.org/message-id/CAD21AoAvaXizc2h7aiNyK_i0FQSa-tmhpdOGwbhh7Jy544Ad4Q%40mail.gmail.com
2017-08-07 09:16:03 -04:00
Andres Freund 5af4456a56 Fix thinko introduced in 2bef06d516 et al.
The callers for GetOldestSafeDecodingTransactionId() all inverted the
argument for the argument introduced in 2bef06d516. Luckily this
appears to be inconsequential for the moment, as we wait for
concurrent in-progress transaction when assembling a
snapshot. Additionally this could only make a difference when adding a
second logical slot, because only a pre-existing slot could cause an
issue by lowering the returned xid dangerously much.

Reported-By: Antonin Houska
Discussion: https://postgr.es/m/32704.1496993134@localhost
Backport: 9.4-, where 2bef06d516 was backpatched to.
2017-08-06 14:20:55 -07:00
Tom Lane e9f4ac1389 Suppress unused-variable warnings when building with ICU 4.2.
Tidy-up for commit eccead9ed.
2017-08-05 11:48:43 -04:00
Robert Haas 52f8a59dd9 Make pg_stop_backup's wait_for_archive flag work on standbys.
Previously, it had no effect.  Now, if archive_mode=always, it will
work, and if not, you'll get a warning.

Masahiko Sawada, Michael Paquier, and Robert Haas.  The patch as
submitted also changed the behavior so that we would write and remove
history files on standbys, but that seems like material for a separate
patch to me.

Discussion: http://postgr.es/m/CAD21AoC2Xw6M=ZJyejq_9d_iDkReC_=rpvQRw5QsyzKQdfYpkw@mail.gmail.com
2017-08-05 10:49:26 -04:00
Peter Eisentraut eccead9ed4 Add support for ICU 4.2
Supporting ICU 4.2 seems useful because it ships with CentOS 6.

Versions before ICU 4.6 don't support pkg-config, so document an
installation method without using pkg-config.

In ICU 4.2, ucol_getKeywordsForLocale() sometimes returns values that
will not be accepted by uloc_toLanguageTag().  Skip loading keyword
variants in that version.

Reported-by: Victor Wagner <vitus@wagner.pp.ru>
2017-08-05 09:32:42 -04:00
Robert Haas f85f88bcc2 Fix bug in deciding whether to scan newly-attached partition.
If the table being attached had different attribute numbers than the
parent, the old code could incorrectly decide it needed to be scanned.

Amit Langote, reviewed by Ashutosh Bapat

Discussion: http://postgr.es/m/CA+TgmobexgbBr2+Utw-pOMw9uxaBRKRjMW_-mmzKKx9PejPLMg@mail.gmail.com
2017-08-04 22:01:37 -04:00
Peter Eisentraut 7e174fa793 Only kill sync workers at commit time in subscription DDL
This allows a transaction abort to avoid killing those workers.

Author: Petr Jelinek <petr.jelinek@2ndquadrant.com>
2017-08-04 21:17:47 -04:00
Robert Haas ff98a5e1e4 hash: Immediately after a bucket split, try to clean the old bucket.
If it works, then we won't be storing two copies of all the tuples
that were just moved.  If not, VACUUM will still take care of it
eventually.  Per a report from AP and analysis from Amit Kapila, it
seems that a bulk load can cause splits fast enough that VACUUM won't
deal with the problem in time to prevent bloat.

Amit Kapila; I rewrote the comment.

Discussion: http://postgr.es/m/20170704105728.mwb72jebfmok2nm2@zip.com.au
2017-08-04 19:33:01 -04:00
Tom Lane c30f1770a9 Apply ALTER ... SET NOT NULL recursively in ALTER ... ADD PRIMARY KEY.
If you do ALTER COLUMN SET NOT NULL against an inheritance parent table,
it will recurse to mark all the child columns as NOT NULL as well.  This
is necessary for consistency: if the column is labeled NOT NULL then
reading it should never produce nulls.

However, that didn't happen in the case where ALTER ... ADD PRIMARY KEY
marks a target column NOT NULL that wasn't before.  That was questionable
from the beginning, and now Tushar Ahuja points out that it can lead to
dump/restore failures in some cases.  So let's make that case recurse too.

Although this is meant to fix a bug, it's enough of a behavioral change
that I'm pretty hesitant to back-patch, especially in view of the lack
of similar field complaints.  It doesn't seem to be too late to put it
into v10 though.

Michael Paquier, editorialized on slightly by me

Discussion: https://postgr.es/m/b8794d6a-38f0-9d7c-ad4b-e85adf860fc9@enterprisedb.com
2017-08-04 11:45:18 -04:00
Tom Lane 97d3a0b090 Disallow SSL session tickets.
We don't actually support session tickets, since we do not create an SSL
session identifier.  But it seems that OpenSSL will issue a session ticket
on-demand anyway, which will then fail when used.  This results in
reconnection failures when using ticket-aware client-side SSL libraries
(such as the Npgsql .NET driver), as reported by Shay Rojansky.

To fix, just tell OpenSSL not to issue tickets.  At some point in the
far future, we might consider enabling tickets instead.  But the security
implications of that aren't entirely clear; and besides it would have
little benefit except for very short-lived database connections, which is
Something We're Bad At anyhow.  It would take a lot of other work to get
to a point where that would really be an exciting thing to do.

While at it, also tell OpenSSL not to use a session cache.  This doesn't
really do anything, since a backend would never populate the cache anyway,
but it might gain some micro-efficiencies and/or reduce security
exposures.

Patch by me, per discussion with Heikki Linnakangas and Shay Rojansky.
Back-patch to all supported versions.

Discussion: https://postgr.es/m/CADT4RqBU8N-csyZuzaook-c795dt22Zcwg1aHWB6tfVdAkodZA@mail.gmail.com
2017-08-04 11:07:10 -04:00
Peter Eisentraut b374481221 Further unify ROLE and USER command grammar rules
ALTER USER ... SET did not support all the syntax variants of ALTER ROLE
...  SET.  Fix that, and to avoid further deviations of this kind, unify
many the grammar rules for ROLE/USER/GROUP commands.

Reported-by: Pavel Golub <pavel@microolap.com>
2017-08-03 20:34:45 -04:00
Robert Haas 972b6ec20b Fix lock upgrade hazard in ATExecAttachPartition.
Amit Langote

Discussion: http://postgr.es/m/CAFjFpReT_kq_uwU_B8aWDxR7jNGE=P0iELycdq5oupi=xSQTOw@mail.gmail.com
2017-08-03 14:21:00 -04:00
Robert Haas 583df3b5c5 Code beautification for ATExecAttachPartition.
Amit Langote

Discussion: http://postgr.es/m/CAFjFpReT_kq_uwU_B8aWDxR7jNGE=P0iELycdq5oupi=xSQTOw@mail.gmail.com
2017-08-03 14:19:59 -04:00
Robert Haas 86705aa8c3 Allow a foreign table CHECK constraint to be initially NOT VALID.
For a table, the constraint can be considered validated immediately,
because the table must be empty.  But for a foreign table this is
not necessarily the case.

Fixes a bug in commit f27a6b15e6.

Amit Langote, with some changes by me.

Discussion: http://postgr.es/m/d2b7419f-4a71-cf86-cc99-bfd0f359a1ea@lab.ntt.co.jp
2017-08-03 13:24:48 -04:00
Robert Haas 12a34f59bf Improve ExecModifyTable comments.
Some of these comments wrongly implied that only an AFTER ROW trigger
will cause a 'wholerow' attribute to be present for a foreign table,
but a BEFORE ROW trigger can have the same effect.  Others implied
that it would always be present for a foreign table, but that's not
true either.

Etsuro Fujita and Robert Haas

Discussion: http://postgr.es/m/10026bc7-1403-ef85-9e43-c6100c1cc0e3@lab.ntt.co.jp
2017-08-03 12:47:00 -04:00
Robert Haas 610e8ebb0f Teach map_partition_varattnos to handle whole-row expressions.
Otherwise, partitioned tables with RETURNING expressions or subject
to a WITH CHECK OPTION do not work properly.

Amit Langote, reviewed by Amit Khandekar and Etsuro Fujita.  A few
comment changes by me.

Discussion: http://postgr.es/m/9a39df80-871e-6212-0684-f93c83be4097@lab.ntt.co.jp
2017-08-03 11:21:29 -04:00
Tom Lane 9d4e566999 Remove broken and useless entry-count printing in HASH_DEBUG code.
init_htab(), with #define HASH_DEBUG, prints a bunch of hashtable
parameters.  It used to also print nentries, but commit 44ca4022f changed
that to "hash_get_num_entries(hctl)", which is wrong (the parameter should
be "hashp").

Rather than correct the coding, though, let's just remove that field from
the printout.  The table must be empty, since we just finished building
it, so expensively calculating the number of entries is rather pointless.
Moreover hash_get_num_entries makes assumptions (about not needing locks)
which we could do without in debugging code.

Noted by Choi Doo-Won in bug #14764.  Back-patch to 9.6 where the
faulty code was introduced.

Discussion: https://postgr.es/m/20170802032353.8424.12274@wrigleys.postgresql.org
2017-08-02 12:17:08 -04:00
Peter Eisentraut cf65201833 Get a snapshot before COPY in table sync
This fixes a crash if the local table has a function index and the
function makes non-immutable calls.

Reported-by: Scott Milliken <scott@deltaex.com>
Author: Masahiko Sawada <sawada.mshk@gmail.com>
2017-08-02 11:34:42 -04:00
Tom Lane f352f91cbf Remove duplicate setting of SSL_OP_SINGLE_DH_USE option.
Commit c0a15e07c moved the setting of OpenSSL's SSL_OP_SINGLE_DH_USE option
into a new subroutine initialize_dh(), but forgot to remove it from where
it was.  SSL_CTX_set_options() is a trivial function, amounting indeed to
just "ctx->options |= op", hence there's no reason to contort the code or
break separation of concerns to avoid calling it twice.  So separating the
DH setup from disabling of old protocol versions is a good change, but we
need to finish the job.

Noted while poking into the question of SSL session tickets.
2017-08-02 11:28:49 -04:00
Peter Eisentraut 41cefbb6db Fix OBJECT_TYPE/OBJECT_DOMAIN confusion
This doesn't have a significant impact except that now SECURITY LABEL ON
DOMAIN rejects types that are not domains.

Reported-by: 高增琦 <pgf00a@gmail.com>
2017-08-02 10:40:32 -04:00
Tom Lane 514f613293 Second try at getting useful errors out of newlocale/_create_locale.
The early buildfarm returns for commit 1e165d05f are pretty awful:
not only does Windows not return a useful error, but it looks like
a lot of Unix-ish platforms don't either.  Given the number of
different errnos seen so far, guess that what's really going on is
that some newlocale() implementations fail to set errno at all.
Hence, let's try zeroing errno just before newlocale() and then
if it's still zero report as though it's ENOENT.  That should cover
the Windows case too.

It's clear that we'll have to drop the regression test case, unless
we want to maintain a separate expected-file for platforms without
HAVE_LOCALE_T.  But I'll leave it there awhile longer to see if this
actually improves matters or not.

Discussion: https://postgr.es/m/CAKKotZS-wcDcofXDCH=sidiuajE+nqHn2CGjLLX78anyDmi3gQ@mail.gmail.com
2017-08-01 17:17:20 -04:00
Tom Lane 1e165d05fe Try to deliver a sane message for _create_locale() failure on Windows.
We were just printing errno, which is certainly not gonna work on
Windows.  Now, it's not entirely clear from Microsoft's documentation
whether _create_locale() adheres to standard Windows error reporting
conventions, but let's assume it does and try to map the GetLastError
result to an errno.  If this turns out not to work, probably the best
thing to do will be to assume the error is always ENOENT on Windows.

This is a longstanding bug, but given the lack of previous field
complaints, I'm not excited about back-patching it.

Per report from Murtuza Zabuawala.

Discussion: https://postgr.es/m/CAKKotZS-wcDcofXDCH=sidiuajE+nqHn2CGjLLX78anyDmi3gQ@mail.gmail.com
2017-08-01 16:11:51 -04:00
Tom Lane f97256570f Allow creation of C/POSIX collations without depending on libc behavior.
Most of our collations code has special handling for the locale names
"C" and "POSIX", allowing those collations to be used whether or not
the system libraries think those locale names are valid, or indeed
whether said libraries even have any locale support.  But we missed
handling things that way in CREATE COLLATION.  This meant you couldn't
clone the C/POSIX collations, nor explicitly define a new collation
using those locale names, unless the libraries allow it.  That's pretty
pointless, as well as being a violation of pg_newlocale_from_collation's
API specification.

The practical effect of this change is quite limited: it allows creating
such collations even on platforms that don't HAVE_LOCALE_T, and it allows
making "POSIX" collation objects on Windows, which before this would only
let you make "C" collation objects.  Hence, even though this is a bug fix
IMO, it doesn't seem worth the trouble to back-patch.

In passing, suppress the DROP CASCADE detail messages at the end of the
collation regression test.  I'm surprised we've never been bit by
message ordering issues there.

Per report from Murtuza Zabuawala.

Discussion: https://postgr.es/m/CAKKotZS-wcDcofXDCH=sidiuajE+nqHn2CGjLLX78anyDmi3gQ@mail.gmail.com
2017-08-01 13:51:05 -04:00
Dean Rasheed 4de6216877 Comment fix for partition_rbound_cmp().
This was an oversight in d363d42.

Beena Emerson
2017-08-01 09:40:45 +01:00
Peter Eisentraut f40254a799 Fix typo
Author: Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp>
2017-07-31 17:08:14 -04:00
Heikki Linnakangas c0a15e07cd Always use 2048 bit DH parameters for OpenSSL ephemeral DH ciphers.
1024 bits is considered weak these days, but OpenSSL always passes 1024 as
the key length to the tmp_dh callback. All the code to handle other key
lengths is, in fact, dead.

To remedy those issues:

* Only include hard-coded 2048-bit parameters.
* Set the parameters directly with SSL_CTX_set_tmp_dh(), without the
  callback
* The name of the file containing the DH parameters is now a GUC. This
  replaces the old hardcoded "dh1024.pem" filename. (The files for other
  key lengths, dh512.pem, dh2048.pem, etc. were never actually used.)

This is not a new problem, but it doesn't seem worth the risk and churn to
backport. If you care enough about the strength of the DH parameters on
old versions, you can create custom DH parameters, with as many bits as you
wish, and put them in the "dh1024.pem" file.

Per report by Nicolas Guini and Damian Quiroga. Reviewed by Michael Paquier.

Discussion: https://www.postgresql.org/message-id/CAMxBoUyjOOautVozN6ofzym828aNrDjuCcOTcCquxjwS-L2hGQ@mail.gmail.com
2017-07-31 22:36:09 +03:00
Tatsuo Ishii 393d47ed0f Add missing comment in postgresql.conf.
current_source requires to restart server to reflect the new
value. Per Yugo Nagata and Masahiko Sawada.

Back patched to 9.2 and beyond.
2017-07-31 11:24:51 +09:00
Tatsuo Ishii 8b015dd723 Add missing comment in postgresql.conf.
dynamic_shared_memory_type requires to restart server to reflect
the new value. Per Yugo Nagata and Masahiko Sawada.

Back pached to 9.4 and beyond.
2017-07-31 11:06:37 +09:00
Tatsuo Ishii 9fe63092b5 Add missing comment in postgresql.conf.
max_logical_replication_workers requires to restart server to reflect
the new value. Per Yugo Nagata. Minor editing by me.
2017-07-31 10:46:32 +09:00
Andres Freund cc9f08b6b8 Move ExecProcNode from dispatch to function pointer based model.
This allows us to add stack-depth checks the first time an executor
node is called, and skip that overhead on following
calls. Additionally it yields a nice speedup.

While it'd probably have been a good idea to have that check all
along, it has become more important after the new expression
evaluation framework in b8d7f053c5 - there's no stack depth
check in common paths anymore now. We previously relied on
ExecEvalExpr() being executed somewhere.

We should move towards that model for further routines, but as this is
required for v10, it seems better to only do the necessary (which
already is quite large).

Author: Andres Freund, Tom Lane
Reported-By: Julien Rouhaud
Discussion:
    https://postgr.es/m/22833.1490390175@sss.pgh.pa.us
    https://postgr.es/m/b0af9eaa-130c-60d0-9e4e-7a135b1e0c76@dalibo.com
2017-07-30 16:18:21 -07:00
Andres Freund d47cfef711 Move interrupt checking from ExecProcNode() to executor nodes.
In a followup commit ExecProcNode(), and especially the large switch
it contains, will largely be replaced by a function pointer directly
to the correct node. The node functions will then get invoked by a
thin inline function wrapper. To avoid having to include miscadmin.h
in headers - CHECK_FOR_INTERRUPTS() - move the interrupt checks into
the individual executor routines.

While looking through all executor nodes, I noticed a number of
arguably missing interrupt checks, add these too.

Author: Andres Freund, Tom Lane
Reviewed-By: Tom Lane
Discussion:
    https://postgr.es/m/22833.1490390175@sss.pgh.pa.us
2017-07-30 16:06:42 -07:00
Alvaro Herrera 5e3254f086 Update copyright in recently added files 2017-07-26 18:17:18 -04:00
Alvaro Herrera 459c64d322 Fix concurrent locking of tuple update chain
If several sessions are concurrently locking a tuple update chain with
nonconflicting lock modes using an old snapshot, and they all succeed,
it may happen that some of them fail because of restarting the loop (due
to a concurrent Xmax change) and getting an error in the subsequent pass
while trying to obtain a tuple lock that they already have in some tuple
version.

This can only happen with very high concurrency (where a row is being
both updated and FK-checked by multiple transactions concurrently), but
it's been observed in the field and can have unpleasant consequences
such as an FK check failing to see a tuple that definitely exists:
    ERROR:  insert or update on table "child_table" violates foreign key constraint "fk_constraint_name"
    DETAIL:  Key (keyid)=(123456) is not present in table "parent_table".
(where the key is observably present in the table).

Discussion: https://postgr.es/m/20170714210011.r25mrff4nxjhmf3g@alvherre.pgsql
2017-07-26 17:24:16 -04:00
Alvaro Herrera c28e4f4dc6 Remove obsolete comments about functional dependencies
Initial submitted versions of the functional dependencies patch ignored
row groups that were smaller than a configured size.  However, that
consideration was removed in late stages of the patch just before
commit, but some comments referring to it remained.  Remove them to
avoid confusion.

Author: Atsushi Torikoshi
Discussion: https://postgr.es/m/7cfb23fc-4493-9c02-5da9-e505fd0115d2@lab.ntt.co.jp
2017-07-26 11:40:39 -04:00
Alvaro Herrera 9915de6c1c Fix race conditions in replication slot operations
It is relatively easy to get a replication slot to look as still active
while one process is in the process of getting rid of it; when some
other process tries to "acquire" the slot, it would fail with an error
message of "replication slot XYZ is active for PID N".

The error message in itself is fine, except that when the intention is
to drop the slot, it is unhelpful: the useful behavior would be to wait
until the slot is no longer acquired, so that the drop can proceed.  To
implement this, we use a condition variable so that slot acquisition can
be told to wait on that condition variable if the slot is already
acquired, and we make any change in active_pid broadcast a signal on the
condition variable.  Thus, as soon as the slot is released, the drop
will proceed properly.

Reported by: Tom Lane
Discussion: https://postgr.es/m/11904.1499039688@sss.pgh.pa.us
Authors: Petr Jelínek, Álvaro Herrera
2017-07-25 13:26:49 -04:00
Robert Haas 4132dbec69 Fix partitioning crashes during error reporting.
In various places where we reverse-map a tuple before calling
ExecBuildSlotValueDescription, we neglected to ensure that the
slot descriptor matched the tuple stored in it.

Amit Langote and Amit Khandekar, reviewed by Etsuro Fujita

Discussion: http://postgr.es/m/CAJ3gD9cqpP=WvJj=dv1ONkPWjy8ZuUaOM4_x86i3uQPas=0_jg@mail.gmail.com
2017-07-24 18:08:08 -04:00
Tom Lane e2c8100e60 Fix race condition in predicate-lock init code in EXEC_BACKEND builds.
Trading a little too heavily on letting the code path be the same whether
we were creating shared data structures or only attaching to them,
InitPredicateLocks() inserted the "scratch" PredicateLockTargetHash entry
unconditionally.  This is just wrong if we're in a postmaster child,
which would only reach this code in EXEC_BACKEND builds.  Most of the
time, the hash_search(HASH_ENTER) call would simply report that the
entry already existed, causing no visible effect since the code did not
bother to check for that possibility.  However, if this happened while
some other backend had transiently removed the "scratch" entry, then
that other backend's eventual RestoreScratchTarget would suffer an
assert failure; this appears to be the explanation for a recent failure
on buildfarm member culicidae.  In non-assert builds, there would be
no visible consequences there either.  But nonetheless this is a pretty
bad bug for EXEC_BACKEND builds, for two reasons:

1. Each new backend would perform the hash_search(HASH_ENTER) call
without holding any lock that would prevent concurrent access to the
PredicateLockTargetHash hash table.  This creates a low but certainly
nonzero risk of corruption of that hash table.

2. In the event that the race condition occurred, by reinserting the
scratch entry too soon, we were defeating the entire purpose of the
scratch entry, namely to guarantee that transaction commit could move
hash table entries around with no risk of out-of-memory failure.
The odds of an actual OOM failure are quite low, but not zero, and if
it did happen it would again result in corruption of the hash table.

The user-visible symptoms of such corruption are a little hard to predict,
but would presumably amount to misbehavior of SERIALIZABLE transactions
that'd require a crash or postmaster restart to fix.

To fix, just skip the hash insertion if IsUnderPostmaster.  I also
inserted a bunch of assertions that the expected things happen
depending on whether IsUnderPostmaster is true.  That might be overkill,
since most comparable code in other functions isn't quite that paranoid,
but once burnt twice shy.

In passing, also move a couple of lines to places where they seemed
to make more sense.

Diagnosis of problem by Thomas Munro, patch by me.  Back-patch to
all supported branches.

Discussion: https://postgr.es/m/10593.1500670709@sss.pgh.pa.us
2017-07-24 16:45:58 -04:00
Robert Haas 7086be6e36 When WCOs are present, disable direct foreign table modification.
If the user modifies a view that has CHECK OPTIONs and this gets
translated into a modification to an underlying relation which happens
to be a foreign table, the check options should be enforced.  In the
normal code path, that was happening properly, but it was not working
properly for "direct" modification because the whole operation gets
pushed to the remote side in that case and we never have an option to
enforce the constraint against individual tuples.  Fix by disabling
direct modification when there is a need to enforce CHECK OPTIONs.

Etsuro Fujita, reviewed by Kyotaro Horiguchi and by me.

Discussion: http://postgr.es/m/f8a48f54-6f02-9c8a-5250-9791603171ee@lab.ntt.co.jp
2017-07-24 15:57:24 -04:00
Tom Lane b4af9e3f37 Ensure that pg_get_ruledef()'s output matches pg_get_viewdef()'s.
Various cases involving renaming of view columns are handled by having
make_viewdef pass down the view's current relation tupledesc to
get_query_def, which then takes care to use the column names from the
tupledesc for the output column names of the SELECT.  For some reason
though, we'd missed teaching make_ruledef to do similarly when it is
printing an ON SELECT rule, even though this is exactly the same case.
The results from pg_get_ruledef would then be different and arguably wrong.
In particular, this breaks pre-v10 versions of pg_dump, which in some
situations would define views by means of emitting a CREATE RULE ... ON
SELECT command.  Third-party tools might not be happy either.

In passing, clean up some crufty code in make_viewdef; we'd apparently
modernized the equivalent code in make_ruledef somewhere along the way,
and missed this copy.

Per report from Gilles Darold.  Back-patch to all supported versions.

Discussion: https://postgr.es/m/ec05659a-40ff-4510-fc45-ca9d965d0838@dalibo.com
2017-07-24 15:16:31 -04:00
Tom Lane 278cb43411 Be more consistent about errors for opfamily member lookup failures.
Add error checks in some places that were calling get_opfamily_member
or get_opfamily_proc and just assuming that the call could never fail.
Also, standardize the wording for such errors in some other places.

None of these errors are expected in normal use, hence they're just
elog not ereport.  But they may be handy for diagnosing omissions in
custom opclasses.

Rushabh Lathia found the oversight in RelationBuildPartitionKey();
I found the others by grepping for all callers of these functions.

Discussion: https://postgr.es/m/CAGPqQf2R9Nk8htpv0FFi+FP776EwMyGuORpc9zYkZKC8sFQE3g@mail.gmail.com
2017-07-24 11:23:27 -04:00
Tom Lane ab2324fd46 Improve comments about partitioned hash table freelists.
While I couldn't find any live bugs in commit 44ca4022f, the comments
seemed pretty far from adequate; in particular it was not made plain that
"borrowing" entries from other freelists is critical for correctness.
Try to improve the commentary.  A couple of very minor code style
tweaks, as well.

Discussion: https://postgr.es/m/10593.1500670709@sss.pgh.pa.us
2017-07-22 18:02:26 -04:00
Alvaro Herrera de38489b92 Fix typo in comment
Commit fd31cd2651 renamed the variable to skipping_blocks, but forgot
to update this comment.

Noticed while inspecting code.
2017-07-21 20:08:53 -04:00
Teodor Sigaev 7e1fb4c59e Fix double shared memory allocation.
SLRU buffer lwlocks are allocated twice by oversight in commit
fe702a7b3f where that locks were moved to
separate tranche. The bug doesn't have user-visible effects except small
overspending of shared memory.

Backpatch to 9.6 where it was introduced.

Alexander Korotkov with small editorization by me.
2017-07-21 13:31:20 +03:00
Dean Rasheed d363d42bb9 Use MINVALUE/MAXVALUE instead of UNBOUNDED for range partition bounds.
Previously, UNBOUNDED meant no lower bound when used in the FROM list,
and no upper bound when used in the TO list, which was OK for
single-column range partitioning, but problematic with multiple
columns. For example, an upper bound of (10.0, UNBOUNDED) would not be
collocated with a lower bound of (10.0, UNBOUNDED), thus making it
difficult or impossible to define contiguous multi-column range
partitions in some cases.

Fix this by using MINVALUE and MAXVALUE instead of UNBOUNDED to
represent a partition column that is unbounded below or above
respectively. This syntax removes any ambiguity, and ensures that if
one partition's lower bound equals another partition's upper bound,
then the partitions are contiguous.

Also drop the constraint prohibiting finite values after an unbounded
column, and just document the fact that any values after MINVALUE or
MAXVALUE are ignored. Previously it was necessary to repeat UNBOUNDED
multiple times, which was needlessly verbose.

Note: Forces a post-PG 10 beta2 initdb.

Report by Amul Sul, original patch by Amit Langote with some
additional hacking by me.

Discussion: https://postgr.es/m/CAAJ_b947mowpLdxL3jo3YLKngRjrq9+Ej4ymduQTfYR+8=YAYQ@mail.gmail.com
2017-07-21 09:20:47 +01:00
Tom Lane eb145fdfea Fix dumping of outer joins with empty qual lists.
Normally, a JoinExpr would have empty "quals" only if it came from CROSS
JOIN syntax.  However, it's possible to get to this state by specifying
NATURAL JOIN between two tables with no common column names, and there
might be other ways too.  The code previously printed no ON clause if
"quals" was empty; that's right for CROSS JOIN but syntactically invalid
if it's some type of outer join.  Fix by printing ON TRUE in that case.

This got broken by commit 2ffa740be, which stopped using NATURAL JOIN
syntax in ruleutils output due to its brittleness in the face of
column renamings.  Back-patch to 9.3 where that commit appeared.

Per report from Tushar Ahuja.

Discussion: https://postgr.es/m/98b283cd-6dda-5d3f-f8ac-87db8c76a3da@enterprisedb.com
2017-07-20 11:29:36 -04:00
Tom Lane 3cb29c42f9 Add static assertions about pg_control fitting into one disk sector.
When pg_control was first designed, sizeof(ControlFileData) was small
enough that a comment seemed like plenty to document the assumption that
it'd fit into one disk sector.  Now it's nearly 300 bytes, raising the
possibility that somebody would carelessly add enough stuff to create
a problem.  Let's add a StaticAssertStmt() to ensure that the situation
doesn't pass unnoticed if it ever occurs.

While at it, rename PG_CONTROL_SIZE to PG_CONTROL_FILE_SIZE to make it
clearer what that symbol means, and convert the existing runtime
comparisons of sizeof(ControlFileData) vs. PG_CONTROL_FILE_SIZE to be
static asserts --- we didn't have that technology when this code was
first written.

Discussion: https://postgr.es/m/9192.1500490591@sss.pgh.pa.us
2017-07-19 16:16:57 -04:00
Tom Lane 04a2c7f412 Improve make_tsvector() to handle empty input, and simplify its callers.
It seemed a bit silly that each caller of make_tsvector() was laboriously
special-casing the situation where no lexemes were found, when it would
be easy and much more bullet-proof to make make_tsvector() handle that.
2017-07-18 13:13:47 -04:00
Tom Lane b4c6d31c0b Fix serious performance problems in json(b) to_tsvector().
In an off-list followup to bug #14745, Bob Jones complained that
to_tsvector() on a 2MB jsonb value took an unreasonable amount of
time and space --- enough to draw the wrath of the OOM killer on
his machine.  On my machine, his example proved to require upwards
of 18 seconds and 4GB, which seemed pretty bogus considering that
to_tsvector() on the same data treated as text took just a couple
hundred msec and 10 or so MB.

On investigation, the problem is that the implementation scans each
string element of the json(b) and converts it to tsvector separately,
then applies tsvector_concat() to join those separate tsvectors.
The unreasonable memory usage came from leaking every single one of
the transient tsvectors --- but even without that mistake, this is an
O(N^2) or worse algorithm, because tsvector_concat() has to repeatedly
process the words coming from earlier elements.

We can fix it by accumulating all the lexeme data and applying
make_tsvector() just once.  As a side benefit, that also makes the
desired adjustment of lexeme positions far cheaper, because we can
just tweak the running "pos" counter between JSON elements.

In passing, try to make the explanation of that tweak more intelligible.
(I didn't think that a barely-readable comment far removed from the
actual code was helpful.)  And do some minor other code beautification.
2017-07-18 12:45:51 -04:00
Robert Haas c85ec643ff Reverse-convert row types in ExecWithCheckOptions.
Just as we already do in ExecConstraints, and for the same reason:
to improve the quality of error messages.

Etsuro Fujita, reviewed by Amit Langote

Discussion: http://postgr.es/m/56e0baa8-e458-2bbb-7936-367f7d832e43@lab.ntt.co.jp
2017-07-17 21:56:31 -04:00
Robert Haas f81a91db4d Use a real RT index when setting up partition tuple routing.
Before, we always used a dummy value of 1, but that's not right when
the partitioned table being modified is inside of a WITH clause
rather than part of the main query.

Amit Langote, reported and reviewd by Etsuro Fujita, with a comment
change by me.

Discussion: http://postgr.es/m/ee12f648-8907-77b5-afc0-2980bcb0aa37@lab.ntt.co.jp
2017-07-17 21:29:45 -04:00
Robert Haas 09c2e7cd2f hash: Fix write-ahead logging bugs related to init forks.
One, logging for CREATE INDEX was oblivious to the fact that when
an unlogged table is created, *only* operations on the init fork
should be logged.

Two, init fork buffers need to be flushed after they are written;
otherwise, a filesystem-level copy following recovery may do the
wrong thing.  (There may be a better fix for this issue than the
one used here, but this is transposed from the similar logic already
present in XLogReadBufferForRedoExtended, and a broader refactoring
after beta2 seems inadvisable.)

Amit Kapila, reviewed by Ashutosh Sharma, Kyotaro Horiguchi,
and Michael Paquier

Discussion: http://postgr.es/m/CAA4eK1JpcMsEtOL_J7WODumeEfyrPi7FPYHeVdS7fyyrCrgp4w@mail.gmail.com
2017-07-17 12:03:35 -04:00
Tom Lane de2af6e001 Improve comments for execExpr.c's handling of FieldStore subexpressions.
Given this code's general eagerness to use subexpressions' output variables
as temporary workspace, it's not exactly clear that it is safe for
FieldStore to tell a newval subexpression that it can write into the same
variable that is being supplied as a potential input.  Document the chain
of assumptions needed for that to be safe.
2017-07-15 16:57:43 -04:00
Tom Lane e9b64824a0 Improve comments for execExpr.c's isAssignmentIndirectionExpr().
I got confused about why this function doesn't need to recursively
search the expression tree for a CaseTestExpr node.  After figuring
that out, add a comment to save the next person some time.
2017-07-15 14:03:39 -04:00
Tom Lane decb08ebdf Code review for NextValueExpr expression node type.
Add missing infrastructure for this node type, notably in ruleutils.c where
its lack could demonstrably cause EXPLAIN to fail.  Add outfuncs/readfuncs
support.  (outfuncs support is useful today for debugging purposes.  The
readfuncs support may never be needed, since at present it would only
matter for parallel query and NextValueExpr should never appear in a
parallelizable query; but it seems like a bad idea to have a primnode type
that isn't fully supported here.)  Teach planner infrastructure that
NextValueExpr is a volatile, parallel-unsafe, non-leaky expression node
with cost cpu_operator_cost.  Given its limited scope of usage, there
*might* be no live bug today from the lack of that knowledge, but it's
certainly going to bite us on the rear someday.  Teach pg_stat_statements
about the new node type, too.

While at it, also teach cost_qual_eval() that MinMaxExpr, SQLValueFunction,
XmlExpr, and CoerceToDomain should be charged as cpu_operator_cost.
Failing to do this for SQLValueFunction was an oversight in my commit
0bb51aa96.  The others are longer-standing oversights, but no time like the
present to fix them.  (In principle, CoerceToDomain could have cost much
higher than this, but it doesn't presently seem worth trying to examine the
domain's constraints here.)

Modify execExprInterp.c to execute NextValueExpr as an out-of-line
function; it seems quite unlikely to me that it's worth insisting that
it be inlined in all expression eval methods.  Besides, providing the
out-of-line function doesn't stop anyone from inlining if they want to.

Adjust some places where NextValueExpr support had been inserted with the
aid of a dartboard rather than keeping it in the same order as elsewhere.

Discussion: https://postgr.es/m/23862.1499981661@sss.pgh.pa.us
2017-07-14 15:25:43 -04:00
Tom Lane a3ca72ae9a Fix dumping of FUNCTION RTEs that contain non-function-call expressions.
The grammar will only accept something syntactically similar to a function
call in a function-in-FROM expression.  However, there are various ways
to input something that ruleutils.c won't deparse that way, potentially
leading to a view or rule that fails dump/reload.  Fix by inserting a
dummy CAST around anything that isn't going to deparse as a function
(which is one of the ways to get something like that in there in the
first place).

In HEAD, also make use of the infrastructure added by this to avoid
emitting unnecessary parentheses in CREATE INDEX deparsing.  I did
not change that in back branches, thinking that people might find it
to be unexpected/unnecessary behavioral change.

In HEAD, also fix incorrect logic for when to add extra parens to
partition key expressions.  Somebody apparently thought they could
get away with simpler logic than pg_get_indexdef_worker has, but
they were wrong --- a counterexample is PARTITION BY LIST ((a[1])).
Ignoring the prettyprint flag for partition expressions isn't exactly
a nice solution anyway.

This has been broken all along, so back-patch to all supported branches.

Discussion: https://postgr.es/m/10477.1499970459@sss.pgh.pa.us
2017-07-13 19:25:03 -04:00
Heikki Linnakangas 74fc83869e Fix race between GetNewTransactionId and GetOldestActiveTransactionId.
The race condition goes like this:

1. GetNewTransactionId advances nextXid e.g. from 100 to 101
2. GetOldestActiveTransactionId reads the new nextXid, 101
3. GetOldestActiveTransactionId loops through the proc array. There are no
   active XIDs there, so it returns 101 as the oldest active XID.
4. GetNewTransactionid stores XID 100 to MyPgXact->xid

So, GetOldestActiveTransactionId returned XID 101, even though 100 only
just started and is surely still running.

This would be hard to hit in practice, and even harder to spot any ill
effect if it happens. GetOldestActiveTransactionId is only used when
creating a checkpoint in a master server, and the race condition can only
happen on an online checkpoint, as there are no backends running during a
shutdown checkpoint. The oldestActiveXid value of an online checkpoint is
only used when starting up a hot standby server, to determine the starting
point where pg_subtrans is initialized from. For the race condition to
happen, there must be no other XIDs in the proc array that would hold back
the oldest-active XID value, which means that the missed XID must be a top
transaction's XID. However, pg_subtrans is not used for top XIDs, so I
believe an off-by-one error is in fact inconsequential. Nevertheless, let's
fix it, as it's clearly wrong and the fix is simple.

This has been wrong ever since hot standby was introduced, so backport to
all supported versions.

Discussion: https://www.postgresql.org/message-id/e7258662-82b6-7a45-56d4-99b337a32bf7@iki.fi
2017-07-13 15:47:02 +03:00
Tom Lane bc2d716ad0 Fix ruleutils.c for domain-over-array cases, too.
Further investigation shows that ruleutils isn't quite up to speed either
for cases where we have a domain-over-array: it needs to be prepared to
look past a CoerceToDomain at the top level of field and element
assignments, else it decompiles them incorrectly.  Potentially this would
result in failure to dump/reload a rule, if it looked like the one in the
new test case.  (I also added a test for EXPLAIN; that output isn't broken,
but clearly we need more test coverage here.)

Like commit b1cb32fb6, this bug is reachable in cases we already support,
so back-patch all the way.
2017-07-12 18:00:04 -04:00
Heikki Linnakangas da11977de9 Reduce memory usage of tsvector type analyze function.
compute_tsvector_stats() detoasted and kept in memory every tsvector value
in the sample, but that can be a lot of memory. The original bug report
described a case using over 10 gigabytes, with statistics target of 10000
(the maximum).

To fix, allocate a separate copy of just the lexemes that we keep around,
and free the detoasted tsvector values as we go. This adds some palloc/pfree
overhead, when you have a lot of distinct lexemes in the sample, but it's
better than running out of memory.

Fixes bug #14654 reported by James C. Reviewed by Tom Lane. Backport to
all supported versions.

Discussion: https://www.postgresql.org/message-id/20170514200602.1451.46797@wrigleys.postgresql.org
2017-07-12 22:06:13 +03:00
Tom Lane 512f67c8d0 Avoid integer overflow while sifting-up a heap in tuplesort.c.
If the number of tuples in the heap exceeds approximately INT_MAX/2,
this loop's calculation "2*i+1" could overflow, resulting in a crash.
Fix it by using unsigned int rather than int for the relevant local
variables; that shouldn't cost anything extra on any popular hardware.
Per bug #14722 from Sergey Koposov.

Original patch by Sergey Koposov, modified by me per a suggestion
from Heikki Linnakangas to use unsigned int not int64.

Back-patch to 9.4, where tuplesort.c grew the ability to sort as many
as INT_MAX tuples in-memory (commit 263865a48).

Discussion: https://postgr.es/m/20170629161637.1478.93109@wrigleys.postgresql.org
2017-07-12 13:24:16 -04:00
Heikki Linnakangas ca906f68f2 Fix variable and type name in comment.
Kyotaro Horiguchi

Discussion: https://www.postgresql.org/message-id/20170711.163441.241981736.horiguchi.kyotaro@lab.ntt.co.jp
2017-07-12 17:07:35 +03:00
Heikki Linnakangas 49a3360209 Fix ordering of operations in SyncRepWakeQueue to avoid assertion failure.
Commit 14e8803f1 removed the locking in SyncRepWaitForLSN, but that
introduced a race condition, where SyncRepWaitForLSN might see
syncRepState already set to SYNC_REP_WAIT_COMPLETE, but the process was
not yet removed from the queue. That tripped the assertion, that the
process should no longer be in the uqeue. Reorder the operations in
SyncRepWakeQueue to remove the process from the queue first, and update
syncRepState only after that, and add a memory barrier in between to make
sure the operations are made visible to other processes in that order.

Fixes bug #14721 reported by Const Zhang. Analysis and fix by Thomas Munro.
Backpatch down to 9.5, where the locking was removed.

Discussion: https://www.postgresql.org/message-id/20170629023623.1480.26508%40wrigleys.postgresql.org
2017-07-12 15:30:52 +03:00
Tom Lane b1cb32fb62 Fix multiple assignments to a column of a domain type.
We allow INSERT and UPDATE commands to assign to the same column more than
once, as long as the assignments are to subfields or elements rather than
the whole column.  However, this failed when the target column was a domain
over array rather than plain array.  Fix by teaching process_matched_tle()
to look through CoerceToDomain nodes, and add relevant test cases.

Also add a group of test cases exercising domains over array of composite.
It's doubtless accidental that CREATE DOMAIN allows this case while not
allowing straight domain over composite; but it does, so we'd better make
sure we don't break it.  (I could not find any documentation mentioning
either side of that, so no doc changes.)

It's been like this for a long time, so back-patch to all supported
branches.

Discussion: https://postgr.es/m/4206.1499798337@sss.pgh.pa.us
2017-07-11 16:48:59 -04:00
Alvaro Herrera 6c774caf0e Translation updates
Source-Git-URL: git://git.postgresql.org/git/pgtranslation/messages.git
Source-Git-Hash: c5a8de3653bb1af6b0eb41cc6bf090c5522df52b
2017-07-10 11:53:55 -04:00
Tom Lane 45e004fb4e On Windows, retry process creation if we fail to reserve shared memory.
We've heard occasional reports of backend launch failing because
pgwin32_ReserveSharedMemoryRegion() fails, indicating that something
has already used that address space in the child process.  It's not
very clear what, given that we disable ASLR in Windows builds, but
suspicion falls on antivirus products.  It'd be better if we didn't
have to disable ASLR, anyway.  So let's try to ameliorate the problem
by retrying the process launch after such a failure, up to 100 times.

Patch by me, based on previous work by Amit Kapila and others.
This is a longstanding issue, so back-patch to all supported branches.

Discussion: https://postgr.es/m/CAA4eK1+R6hSx6t_yvwtx+NRzneVp+MRqXAdGJZChcau8Uij-8g@mail.gmail.com
2017-07-10 11:00:09 -04:00
Andrew Gierth 1add0b15f1 Fix COPY's handling of transition tables with indexes.
Commit c46c0e5202 failed to pass the
TransitionCaptureState object to ExecARInsertTriggers() in the case
where it's using heap_multi_insert and there are indexes.  Repair.

Thomas Munro, from a report by David Fetter
Discussion: https://postgr.es/m/20170708084213.GA14720%40fetter.org
2017-07-10 11:40:08 +01:00
Tom Lane ec4073f641 Avoid unreferenced-function warning on low-functionality platforms.
On platforms lacking both locale_t and ICU, collationcmds.c failed
to make any use of its static function is_all_ascii(), thus probably
drawing a compiler warning.  Oversight in my commit ddb5fdc06.
Per buildfarm member gaur.
2017-07-08 12:42:25 -04:00
Alvaro Herrera 46e9151942 Fix typo
Noticed while reviewing code.
2017-07-07 17:10:36 -04:00
Teodor Sigaev 31b8db8e6c Fix potential data corruption during freeze
Fix oversight in 3b97e6823b bug fix. Bitwise AND is used instead of OR and
it cleans all bits in t_infomask heap tuple field.

Backpatch to 9.3
2017-07-06 17:18:55 +03:00
Dean Rasheed f1dae097f2 Clarify the contract of partition_rbound_cmp().
partition_rbound_cmp() is intended to compare range partition bounds
in a way such that if all the bound values are equal but one is an
upper bound and one is a lower bound, the upper bound is treated as
smaller than the lower bound. This particular ordering is required by
RelationBuildPartitionDesc() when building the PartitionBoundInfoData,
so that it can consistently keep only the upper bounds when upper and
lower bounds coincide.

Update the function comment to make that clearer.

Also, fix a (currently unreachable) corner-case bug -- if the bound
values coincide and they contain unbounded values, fall through to the
lower-vs-upper comparison code, rather than immediately returning
0. Currently it is not possible to define coincident upper and lower
bounds containing unbounded columns, but that may change in the
future, so code defensively.

Discussion: https://postgr.es/m/CAAJ_b947mowpLdxL3jo3YLKngRjrq9+Ej4ymduQTfYR+8=YAYQ@mail.gmail.com
2017-07-06 12:46:08 +01:00
Dean Rasheed c03911d945 Simplify the logic checking new range partition bounds.
The previous logic, whilst not actually wrong, was overly complex and
involved doing two binary searches, where only one was really
necessary. This simplifies that logic and improves the comments.

One visible change is that if the new partition overlaps multiple
existing partitions, the error message now always reports the overlap
with the first existing partition (the one with the lowest
bounds). The old code would sometimes report the clash with the first
partition and sometimes with the last one.

Original patch idea from Amit Langote, substantially rewritten by me.

Discussion: https://postgr.es/m/CAAJ_b947mowpLdxL3jo3YLKngRjrq9+Ej4ymduQTfYR+8=YAYQ@mail.gmail.com
2017-07-06 09:58:06 +01:00
Peter Eisentraut d80e73f229 Fix output of char node fields
WRITE_CHAR_FIELD() didn't do any escaping, so that for example a zero
byte would cause the whole output string to be truncated.  To fix, pass
the char through outToken(), so it is escaped like a string.  Adjust the
reading side to handle this.
2017-07-05 10:51:54 -04:00
Peter Eisentraut cb9079cd51 Improve subscription locking
This avoids "tuple concurrently updated" errors when a ALTER or DROP
SUBSCRIPTION writes to pg_subscription_rel at the same time as a worker.

Author: Petr Jelinek <petr.jelinek@2ndquadrant.com>
2017-07-03 22:47:06 -04:00
Heikki Linnakangas b93827c745 Treat clean shutdown of an SSL connection same as the non-SSL case.
If the client closes an SSL connection, treat it the same as EOF on a
non-SSL connection. In particular, don't write a message in the log about
that.

Michael Paquier.

Discussion: https://www.postgresql.org/message-id/CAB7nPqSfyVV42Q2acFo%3DvrvF2gxoZAMJLAPq3S3KkjhZAYi7aw@mail.gmail.com
2017-07-03 14:51:51 +03:00
Peter Eisentraut d8b3c81335 Refine memory allocation in ICU conversions
The simple calculations done to estimate the size of the output buffers
for ucnv_fromUChars() and ucnv_toUChars() could overflow int32_t for
large strings.  To avoid that, go the long way and run the function
first without an output buffer to get the correct output buffer size
requirement.
2017-07-01 23:08:37 -04:00
Tom Lane f32678c016 Reduce delay for last logicalrep feedback message when master goes idle.
The regression tests contain numerous cases where we do some activity on a
master server and then wait till the slave has ack'd flushing its copy of
that transaction.  Because WAL flush on the slave is asynchronous to the
logicalrep worker process, the worker cannot send such a feedback message
during the LogicalRepApplyLoop iteration where it processes the last data
from the master.  In the previous coding, the feedback message would come
out only when the loop's WaitLatchOrSocket call returned WL_TIMEOUT.  That
requires one full second of delay (NAPTIME_PER_CYCLE); and to add insult
to injury, it could take more than that if the WaitLatchOrSocket was
interrupted a few times by latch-setting events.

In reality we can expect the slave's walwriter process to have flushed the
WAL data after, more or less, WalWriterDelay (typically 200ms).  Hence,
if there are unacked transactions pending, make the wait delay only that
long rather than the full NAPTIME_PER_CYCLE.  Also, move one of the
send_feedback() calls into the loop main line, so that we'll check for the
need to send feedback even if we were woken by a latch event and not either
socket data or timeout.

It's not clear how much this matters for production purposes, but
it's definitely helpful for testing.

Discussion: https://postgr.es/m/30864.1498861103@sss.pgh.pa.us
2017-07-01 12:15:51 -04:00
Tom Lane 799f8bc76a Shorten timeouts while waiting for logicalrep worker slot attach/detach.
When waiting for a logical replication worker process to start or stop,
we have to busy-wait until we see it add or remove itself from the
LogicalRepWorker slot in shared memory.  Those loops were using a
one-second delay between checks, but on any reasonably modern machine, it
doesn't take more than a couple of msec for a worker to spawn or shut down.
Reduce the loop delays to 10ms to avoid wasting quite so much time in the
related regression tests.

In principle, a better solution would be to fix things so that the waiting
process can be awakened via its latch at the right time.  But that seems
considerably more invasive, which is undesirable for a post-beta fix.
Worker start/stop performance likely isn't of huge interest anyway for
production purposes, so we might not ever get around to it.

In passing, rearrange the second wait loop in logicalrep_worker_stop()
so that the lock is held at the top of the loop, thus saving one lock
acquisition/release per call, and making it look more like the other loop.

Discussion: https://postgr.es/m/30864.1498861103@sss.pgh.pa.us
2017-07-01 11:59:44 -04:00
Peter Eisentraut ef74e03ef6 Fix UPDATE of GENERATED ALWAYS identity columns
The bug would previously prevent the update of any column in a table
with identity columns, rather than just the actual identity column.

Reported-by: zam6ak@gmail.com
Bug: #14718
2017-06-30 23:44:17 -04:00
Alvaro Herrera 572d6ee6d4 Fix locking in WAL receiver/sender shmem state structs
In WAL receiver and WAL server, some accesses to their corresponding
shared memory control structs were done without holding any kind of
lock, which could lead to inconsistent and possibly insecure results.

In walsender, fix by clarifying the locking rules and following them
correctly, as documented in the new comment in walsender_private.h;
namely that some members can be read in walsender itself without a lock,
because the only writes occur in the same process.  The rest of the
struct requires spinlock for accesses, as usual.

In walreceiver, fix by always holding spinlock while accessing the
struct.

While there is potentially a problem in all branches, it is minor in
stable ones.  This only became a real problem in pg10 because of quorum
commit in synchronous replication (commit 3901fd70cc), and a potential
security problem in walreceiver because a superuser() check was removed
by default monitoring roles (commit 25fff40798).  Thus, no backpatch.

In passing, clean up some leftover braces which were used to create
unconditional blocks.  Once upon a time these were used for
volatile-izing accesses to those shmem structs, which is no longer
required.  Many other occurrences of this pattern remain.

Author: Michaël Paquier
Reported-by: Michaël Paquier
Reviewed-by: Masahiko Sawada, Kyotaro Horiguchi, Thomas Munro,
	Robert Haas
Discussion: https://postgr.es/m/CAB7nPqTWYqtzD=LN_oDaf9r-hAjUEPAy0B9yRkhcsLdRN8fzrw@mail.gmail.com
2017-06-30 18:06:33 -04:00
Peter Eisentraut b295cc3b9a Fix typo in comment
Author: Masahiko Sawada <sawada.mshk@gmail.com>
2017-06-30 15:54:39 -04:00
Tom Lane 1f201a818a Fix race conditions and missed wakeups in syncrep worker signaling.
When a sync worker is waiting for the associated apply worker to notice
that it's in SYNCWAIT state, wait_for_worker_state_change() would just
patiently wait for that to happen.  This generally required waiting for
the 1-second timeout in LogicalRepApplyLoop to elapse.  Kicking the worker
via its latch makes things significantly snappier.

While at it, fix race conditions that could potentially result in crashes:
we can *not* call logicalrep_worker_wakeup_ptr() once we've released the
LogicalRepWorkerLock, because worker->proc might've been reset to NULL
after we do that (indeed, there's no really solid reason to believe that
the LogicalRepWorker slot even belongs to the same worker anymore).
In logicalrep_worker_wakeup(), we can just move the wakeup inside the
lock scope.  In process_syncing_tables_for_apply(), a bit more code
rearrangement is needed.

Also improve some nearby comments.
2017-06-30 14:57:14 -04:00
Peter Eisentraut da8f26ec4e Fix typo in comment
Author: Amit Langote <Langote_Amit_f8@lab.ntt.co.jp>
2017-06-30 14:48:43 -04:00
Tom Lane 609fa63db6 Check for error during PQendcopy.
Oversight in commit 78c8c8143; noted while nosing around the
walreceiver startup/shutdown code.
2017-06-30 12:22:33 -04:00
Tom Lane fca85f8ef1 Fix walsender to exit promptly if client requests shutdown.
It's possible for WalSndWaitForWal to be asked to wait for WAL that doesn't
exist yet.  That's fine, in fact it's the normal situation if we're caught
up; but when the client requests shutdown we should not keep waiting.
The previous coding could wait indefinitely if the source server was idle.

In passing, improve the rather weak comments in this area, and slightly
rearrange some related code for better readability.

Back-patch to 9.4 where this code was introduced.

Discussion: https://postgr.es/m/14154.1498781234@sss.pgh.pa.us
2017-06-30 12:00:15 -04:00
Peter Eisentraut 13a57710db Prohibit creating ICU collation with different ctype
ICU does not support "collate" and "ctype" being different, so the
collctype catalog column is ignored.  But for catalog neatness, ensure
that they are the same.
2017-06-30 11:24:00 -04:00
Robert Haas 7d4a1838ef Add missing period to comment.
Masahiko Sawada

Discussion: http://postgr.es/m/CAD21AoA0jjXXhqK6Ym3jZNoUdVhXFyTkWTTTsVSr1vPuKcjsjA@mail.gmail.com
2017-06-30 10:01:45 -04:00
Peter Eisentraut 54baa48139 Copy collencoding in CREATE COLLATION / FROM
This command used to compute the collencoding entry like when a
completely new collation is created.  But for example when copying the
"C" collation, this would then result in a collation that has a
collencoding entry for the current database encoding rather than -1,
thus not making an exact copy.  This has probably no practical impact,
but making this change keeps the catalog contents neat.

Reported-by: Tom Lane <tgl@sss.pgh.pa.us>
2017-06-30 08:50:39 -04:00
Tom Lane f13ea95f9e Change pg_ctl to detect server-ready by watching status in postmaster.pid.
Traditionally, "pg_ctl start -w" has waited for the server to become
ready to accept connections by attempting a connection once per second.
That has the major problem that connection issues (for instance, a
kernel packet filter blocking traffic) can't be reliably told apart
from server startup issues, and the minor problem that if server startup
isn't quick, we accumulate "the database system is starting up" spam
in the server log.  We've hacked around many of the possible connection
issues, but it resulted in ugly and complicated code in pg_ctl.c.

In commit c61559ec3, I changed the probe rate to every tenth of a second.
That prompted Jeff Janes to complain that the log-spam problem had become
much worse.  In the ensuing discussion, Andres Freund pointed out that
we could dispense with connection attempts altogether if the postmaster
were changed to report its status in postmaster.pid, which "pg_ctl start"
already relies on being able to read.  This patch implements that, teaching
postmaster.c to report a status string into the pidfile at the same
state-change points already identified as being of interest for systemd
status reporting (cf commit 7d17e683f).  pg_ctl no longer needs to link
with libpq at all; all its functions now depend on reading server files.

In support of this, teach AddToDataDirLockFile() to allow addition of
postmaster.pid lines in not-necessarily-sequential order.  This is needed
on Windows where the SHMEM_KEY line will never be written at all.  We still
have the restriction that we don't want to truncate the pidfile; document
the reasons for that a bit better.

Also, fix the pg_ctl TAP tests so they'll notice if "start -w" mode
is broken --- before, they'd just wait out the sixty seconds until
the loop gives up, and then report success anyway.  (Yes, I found that
out the hard way.)

While at it, arrange for pg_ctl to not need to #include miscadmin.h;
as a rather low-level backend header, requiring that to be compilable
client-side is pretty dubious.  This requires moving the #define's
associated with the pidfile into a new header file, and moving
PG_BACKEND_VERSIONSTR someplace else.  For lack of a clearly better
"someplace else", I put it into port.h, beside the declaration of
find_other_exec(), since most users of that macro are passing the value to
find_other_exec().  (initdb still depends on miscadmin.h, but at least
pg_ctl and pg_upgrade no longer do.)

In passing, fix main.c so that PG_BACKEND_VERSIONSTR actually defines the
output of "postgres -V", which remarkably it had never done before.

Discussion: https://postgr.es/m/CAMkU=1xJW8e+CTotojOMBd-yzUvD0e_JZu2xHo=MnuZ4__m7Pg@mail.gmail.com
2017-06-28 17:31:32 -04:00
Andrew Gierth 8c55244ae3 Fix transition tables for ON CONFLICT.
We now disallow having triggers with both transition tables and ON
INSERT OR UPDATE (which was a PG extension to the spec anyway),
because in this case it's not at all clear how the transition tables
should work for an INSERT ... ON CONFLICT query.  Separate ON INSERT
and ON UPDATE triggers with transition tables are allowed, and the
transition tables for these reflect only the inserted and only the
updated tuples respectively.

Patch by Thomas Munro

Discussion: https://postgr.es/m/CAEepm%3D11KHQ0JmETJQihSvhZB5mUZL2xrqHeXbCeLhDiqQ39%3Dw%40mail.gmail.com
2017-06-28 19:00:55 +01:00
Andrew Gierth c46c0e5202 Fix transition tables for wCTEs.
The original coding didn't handle this case properly; each separate
DML substatement needs its own set of transitions.

Patch by Thomas Munro

Discussion: https://postgr.es/m/CAL9smLCDQ%3D2o024rBgtD4WihzX8B3C6u_oSQ2K3%2BR5grJrV0bg%40mail.gmail.com
2017-06-28 18:59:01 +01:00
Andrew Gierth 501ed02cf6 Fix transition tables for partition/inheritance.
We disallow row-level triggers with transition tables on child tables.
Transition tables for triggers on the parent table contain only those
columns present in the parent.  (We can't mix tuple formats in a
single transition table.)

Patch by Thomas Munro

Discussion: https://postgr.es/m/CA%2BTgmoZzTBBAsEUh4MazAN7ga%3D8SsMC-Knp-6cetts9yNZUCcg%40mail.gmail.com
2017-06-28 18:55:03 +01:00
Tom Lane 99255d73c0 Second try at fixing tcp_keepalives_idle option on Solaris.
Buildfarm evidence shows that TCP_KEEPALIVE_THRESHOLD doesn't exist
after all on Solaris < 11.  This means we need to take positive action to
prevent the TCP_KEEPALIVE code path from being taken on that platform.
I've chosen to limit it with "&& defined(__darwin__)", since it's unclear
that anyone else would follow Apple's precedent of spelling the symbol
that way.

Also, follow a suggestion from Michael Paquier of eliminating code
duplication by defining a couple of intermediate symbols for the
socket option.

In passing, make some effort to reduce the number of translatable messages
by replacing "setsockopt(foo) failed" with "setsockopt(%s) failed", etc,
throughout the affected files.  And update relevant documentation so
that it doesn't claim to provide an exhaustive list of the possible
socket option names.

Like the previous commit (f0256c774), back-patch to all supported branches.

Discussion: https://postgr.es/m/20170627163757.25161.528@wrigleys.postgresql.org
2017-06-28 12:30:16 -04:00
Tom Lane f0256c774d Support tcp_keepalives_idle option on Solaris.
Turns out that the socket option for this is named TCP_KEEPALIVE_THRESHOLD,
at least according to the tcp(7P) man page for Solaris 11.  (But since that
text refers to "SunOS", it's likely pretty ancient.)  It appears that the
symbol TCP_KEEPALIVE does get defined on that platform, but it doesn't
seem to represent a valid protocol-level socket option.  This leads to
bleats in the postmaster log, and no tcp_keepalives_idle functionality.

Per bug #14720 from Andrey Lizenko, as well as an earlier report from
Dhiraj Chawla that nobody had followed up on.  The issue's been there
since we added the TCP_KEEPALIVE code path in commit 5acd417c8, so
back-patch to all supported branches.

Discussion: https://postgr.es/m/20170627163757.25161.528@wrigleys.postgresql.org
2017-06-27 18:47:57 -04:00
Tom Lane 9c7dc89282 Re-allow SRFs and window functions within sub-selects within aggregates.
check_agg_arguments_walker threw an error upon seeing a SRF or window
function, but that is too aggressive: if the function is within a
sub-select then it's perfectly fine.  I broke the SRF case in commit
0436f6bde by copying the logic for window functions ... but that was
broken too, and had been since commit eaccfded9.

Repair both cases in HEAD, and the window function case back to 9.3.
9.2 gets this right.
2017-06-27 17:51:11 -04:00
Tom Lane e5d494d78c Don't lose walreceiver start requests due to race condition in postmaster.
When a walreceiver dies, the startup process will notice that and send
a PMSIGNAL_START_WALRECEIVER signal to the postmaster, asking for a new
walreceiver to be launched.  There's a race condition, which at least
in HEAD is very easy to hit, whereby the postmaster might see that
signal before it processes the SIGCHLD from the walreceiver process.
In that situation, sigusr1_handler() just dropped the start request
on the floor, reasoning that it must be redundant.  Eventually, after
10 seconds (WALRCV_STARTUP_TIMEOUT), the startup process would make a
fresh request --- but that's a long time if the connection could have
been re-established almost immediately.

Fix it by setting a state flag inside the postmaster that we won't
clear until we do launch a walreceiver.  In cases where that results
in an extra walreceiver launch, it's up to the walreceiver to realize
it's unwanted and go away --- but we have, and need, that logic anyway
for the opposite race case.

I came across this through investigating unexpected delays in the
src/test/recovery TAP tests: it manifests there in test cases where
a master server is stopped and restarted while leaving streaming
slaves active.

This logic has been broken all along, so back-patch to all supported
branches.

Discussion: https://postgr.es/m/21344.1498494720@sss.pgh.pa.us
2017-06-26 17:31:56 -04:00
Tom Lane ad1b5c842b Ignore old stats file timestamps when starting the stats collector.
The stats collector disregards inquiry messages that bear a cutoff_time
before when it last wrote the relevant stats file.  That's fine, but at
startup when it reads the "permanent" stats files, it absorbed their
timestamps as if they were the times at which the corresponding temporary
stats files had been written.  In reality, of course, there's no data
out there at all.  This led to disregarding inquiry messages soon after
startup if the postmaster had been shut down and restarted within less
than PGSTAT_STAT_INTERVAL; which is a pretty common scenario, both for
testing and in the field.  Requesting backends would hang for 10 seconds
and then report failure to read statistics, unless they got bailed out
by some other backend coming along and making a newer request within
that interval.

I came across this through investigating unexpected delays in the
src/test/recovery TAP tests: it manifests there because the autovacuum
launcher hangs for 10 seconds when it can't get statistics at startup,
thus preventing a second shutdown from occurring promptly.  We might
want to do some things in the autovac code to make it less prone to
getting stuck that way, but this change is a good bug fix regardless.

In passing, also fix pgstat_read_statsfiles() to ensure that it
re-zeroes its global stats variables if they are corrupted by a
short read from the stats file.  (Other reads in that function
go into temp variables, so that the issue doesn't arise.)

This has been broken since we created the separation between permanent
and temporary stats files in 8.4, so back-patch to all supported branches.

Discussion: https://postgr.es/m/16860.1498442626@sss.pgh.pa.us
2017-06-26 16:17:05 -04:00
Tom Lane 5efccc1cb4 Avoid useless "x = ANY(ARRAY[])" test for empty partition list.
This arises in practice if the partition only admits NULL values.

Jeevan Ladhe

Discussion: https://postgr.es/m/CAOgcT0OChrN--uuqH6wG6Z8+nxnCWJ+2Q-uhnK4KOANdRRxuAw@mail.gmail.com
2017-06-26 10:43:20 -04:00
Tom Lane 00c5e511b9 Minor code review for parse_phrase_operator().
Fix its header comment, which described the old behavior of the <N>
phrase distance operator; we missed updating that in commit 028350f61.
Also, reset errno before strtol() call, to defend against the possibility
that it was already ERANGE at entry.  (The lack of complaints says that
it generally isn't, but this is at least a latent bug.)  Very minor
stylistic improvements as well.

Victor Drobny noted the obsolete comment, I noted the errno issue.
Back-patch to 9.6 where this code was added, just in case the errno
issue is a live bug in some cases.

Discussion: https://postgr.es/m/2b5382fdff9b1f79d5eb2c99c4d2cbe2@postgrespro.ru
2017-06-26 10:31:10 -04:00
Tom Lane ddb5fdc068 Further hacking on ICU collation creation and usage.
pg_import_system_collations() refused to create any ICU collations if
the current database's encoding didn't support ICU.  This is wrongheaded:
initdb must initialize pg_collation in an encoding-independent way
since it might be used in other databases with different encodings.
The reason for the restriction seems to be that get_icu_locale_comment()
used icu_from_uchar() to convert the UChar-format display name, and that
unsurprisingly doesn't know what to do in unsupported encodings.
But by the same token that the initial catalog contents must be
encoding-independent, we can't allow non-ASCII characters in the comment
strings.  So we don't really need icu_from_uchar() here: just check for
Unicode codes outside the ASCII range, and if there are none, the format
conversion is trivial.  If there are some, we can simply not install the
comment.  (In my testing, this affects only Norwegian Bokmål, which has
given us trouble before.)

For paranoia's sake, also check for non-ASCII characters in ICU locale
names, and skip such locales, as we do for libc locales.  I don't
currently have a reason to believe that this will ever reject anything,
but then again the libc maintainers should have known better too.

With just the import changes, ICU collations can be found in pg_collation
in databases with unsupported encodings.  This resulted in more or less
clean failures at runtime, but that's not how things act for unsupported
encodings with libc collations.  Make it work the same as our traditional
behavior for libc collations by having collation lookup take into account
whether is_encoding_supported_by_icu().

Adjust documentation to match.  Also, expand Table 23.1 to show which
encodings are supported by ICU.

catversion bump because of likely change in pg_collation/pg_description
initial contents in ICU-enabled builds.

Discussion: https://postgr.es/m/20c74bc3-d6ca-243d-1bbc-12f17fa4fe9a@gmail.com
2017-06-24 13:54:23 -04:00
Simon Riggs a15b47df35 Fix typo in comment in SerializeSnapshot
Author: Masahiko Sawada
2017-06-24 13:51:26 +01:00
Simon Riggs 829f12e269 Revert 1f30295eab
Reported-by: Tom Lane
2017-06-24 13:03:55 +01:00
Tom Lane d1fcc62298 Fix incorrect buffer-length argument to uloc_getDisplayName().
The maxResultSize argument of uloc_getDisplayName is the number of
UChars in the output buffer, not the number of bytes.  In principle
this could result in a stack smash, although at least in my Fedora 25
install there are no ICU locales with display names long enough to
overrun the buffer.  But it's easily proven to be wrong by reducing
the length of displayname to around 20, whereupon a stack smash
does happen.

(This is a rather scary bug, because the same mistake could easily
have been made in other places; but in a quick code search looking
at uses of UChar I could not find any other instances.)
2017-06-23 16:00:55 -04:00
Peter Eisentraut 08859bb5c2 Fix replication with replica identity full
The comparison with the target rows on the subscriber side was done with
datumIsEqual(), which can have false negatives.  For instance, it didn't
work reliably for text columns.  So use the equality operator provided
by the type cache instead.

Also add more user documentation about replica identity requirements.

Reported-by: Tatsuo Ishii <ishii@sraoss.co.jp>
2017-06-23 15:40:17 -04:00
Tom Lane 0b13b2a771 Rethink behavior of pg_import_system_collations().
Marco Atzeri reported that initdb would fail if "locale -a" reported
the same locale name more than once.  All previous versions of Postgres
implicitly de-duplicated the results of "locale -a", but the rewrite
to move the collation import logic into C had lost that property.
It had also lost the property that locale names matching built-in
collation names were silently ignored.

The simplest way to fix this is to make initdb run the function in
if-not-exists mode, which means that there's no real use-case for
non if-not-exists mode; we might as well just drop the boolean argument
and simplify the function's definition to be "add any collations not
already known".  This change also gets rid of some odd corner cases
caused by the fact that aliases were added in if-not-exists mode even
if the function argument said otherwise.

While at it, adjust the behavior so that pg_import_system_collations()
doesn't spew "collation foo already exists, skipping" messages during a
re-run; that's completely unhelpful, especially since there are often
hundreds of them.  And make it return a count of the number of collations
it did add, which seems like it might be helpful.

Also, re-integrate the previous coding's property that it would make a
deterministic selection of which alias to use if there were conflicting
possibilities.  This would only come into play if "locale -a" reports
multiple equivalent locale names, say "de_DE.utf8" and "de_DE.UTF-8",
but that hardly seems out of the question.

In passing, fix incorrect behavior in pg_import_system_collations()'s
ICU code path: it neglected CommandCounterIncrement, which would result
in failures if ICU returns duplicate names, and it would try to create
comments even if a new collation hadn't been created.

Also, reorder operations in initdb so that the 'ucs_basic' collation
is created before calling pg_import_system_collations() not after.
This prevents a failure if "locale -a" were to report a locale named
that.  There's no reason to think that that ever happens in the wild,
but the old coding would have survived it, so let's be equally robust.

Discussion: https://postgr.es/m/20c74bc3-d6ca-243d-1bbc-12f17fa4fe9a@gmail.com
2017-06-23 14:19:58 -04:00
Simon Riggs 9ea3c64124 Improve replication lag interpolation after idle period
After sitting idle and fully replayed for a while and then encountering
a new burst of WAL activity, we interpolate between an ancient sample and the
not-yet-reached one for the new traffic. That produced a corner case report
of lag after receiving first new reply from standby, which might sometimes
be a large spike.

Correct this by resetting last_read time and handle that new case.

Author: Thomas Munro
2017-06-23 18:58:46 +01:00
Tom Lane b6159202c9 Fix memory leakage in ICU encoding conversion, and other code review.
Callers of icu_to_uchar() neglected to pfree the result string when done
with it.  This results in catastrophic memory leaks in varstr_cmp(),
because of our prevailing assumption that btree comparison functions don't
leak memory.  For safety, make all the call sites clean up leaks, though
I suspect that we could get away without it in formatting.c.  I audited
callers of icu_from_uchar() as well, but found no places that seemed to
have a comparable issue.

Add function API specifications for icu_to_uchar() and icu_from_uchar();
the lack of any thought-through specification is perhaps not unrelated
to the existence of this bug in the first place.  Fix icu_to_uchar()
to guarantee a nul-terminated result; although no existing caller appears
to care, the fact that it would have been nul-terminated except in
extreme corner cases seems ideally designed to bite someone on the rear
someday.  Fix ucnv_fromUChars() destCapacity argument --- in the worst
case, that could perhaps have led to a non-nul-terminated result, too.
Fix icu_from_uchar() to have a more reasonable definition of the function
result --- no callers are actually paying attention, so this isn't a live
bug, but it's certainly sloppily designed.  Const-ify icu_from_uchar()'s
input string for consistency.

That is not the end of what needs to be done to these functions, but
it's as much as I have the patience for right now.

Discussion: https://postgr.es/m/1955.1498181798@sss.pgh.pa.us
2017-06-23 12:22:06 -04:00
Alvaro Herrera da2322883b Fix typos in README.dependencies
There was a logic error in a formula, reported by Atsushi Torokoshi.
Ashutosh Bapat furthermore recommended to change notation for a variable
that was re-using a letter from a previous formula, though his proposed
patch contained a small error in attributing what the new letter is for.
Also, instead of his proposed d' I ended up using e, to avoid confusing
the reader with quotes which are used differently in the explaining
prose.

Bugs appeared in commit 2686ee1b7c.

Reported-by: Atsushi Torikoshi, Ashutosh Bapat
Discussion: https://postgr.es/m/CAFjFpRd03YojT4wyuDcjhCfYuygfWfnt68XGn2CKv=rcjRCtTA@mail.gmail.com
2017-06-22 17:12:27 -04:00
Alvaro Herrera 82c1507e30 Fix typo in comment
Once upon a time, WAL pointers could be NULL, but no longer.  We talk about
"valid" now.

Reported-by: Amit Langote
Discussion: https://postgr.es/m/33e9617d-27f1-eee8-3311-e27af98eaf2b@lab.ntt.co.jp
2017-06-22 16:42:38 -04:00
Robert Haas 6af9f1bd4b Document partitioned_rels in create_modifytable_path header comment.
Etsuro Fujita, slightly adjusted by me.

Discussion: http://postgr.es/m/e87c4a6d-23d7-5e7c-e8db-44ed418eb5d1@lab.ntt.co.jp
2017-06-22 13:52:50 -04:00
Alvaro Herrera a4f06606a3 Fix autovacuum launcher attachment to its DSA
The autovacuum launcher doesn't actually do anything with its DSA other
than creating it and attaching to it, but it's been observed that after
longjmp'ing to the standard error handling block (for example after
getting SIGINT) the autovacuum enters an infinite loop reporting that it
cannot attach to its DSA anymore (which is correct, because it's already
attached to it.)  Fix by only attempting to attach if not already
attached.

I introduced this bug together with BRIN autosummarization in
7526e10224.

Reported-by: Yugo Nagata.
Author: Thomas Munro.  I added the comment to go with it.
Discussion: https://postgr.es/m/20170621211538.0c9eae73.nagata@sraoss.co.jp
2017-06-22 13:50:26 -04:00
Robert Haas 2a6db5eba6 Update out-of-date comment in vacuumlazy.c
Commit 15c121b3ed seems to have
overlooked the need to trim this part of the comment.

Pavan Deolasee

Discussion: http://postgr.es/m/CABOikdPq_9+cWRNZ0RLKTwuZyj=uL85X=Usifa-CbPee1ZCM5A@mail.gmail.com
2017-06-22 13:38:53 -04:00
Alvaro Herrera 5dfd564b10 Fix IF NOT EXISTS in CREATE STATISTICS
I misplaced the IF NOT EXISTS clause in commit 7b504eb282, before the
word STATISTICS.  Put it where it belongs.

Patch written independently by Amit Langote and myself.  I adopted his
submitted test case with a slight edit also.

Reported-by: Bruno Wolff III
Discussion: https://postgr.es/m/20170621004237.GB8337@wolff.to
2017-06-22 13:17:08 -04:00
Robert Haas 1300276042 Update comment to account for table partitioning.
Ashutosh Bapat and Amit Langote

Discussion: http://postgr.es/m/CAFjFpRcG_NaAv6cDHD-9VfGdvB8maAtSfB=fTQr5+kxP2_sXzg@mail.gmail.com
2017-06-22 10:53:37 -04:00
Magnus Hagander f0415a30e0 Fix typo in comment
Author: Masahiko Sawada
2017-06-22 15:37:30 +02:00
Andres Freund fb886c153b Fix possibility of creating a "phantom" segment after promotion.
When promoting a standby just after a XLOG_SWITCH record was replayed,
and next segment(s) are already are locally available (via walsender,
restore_command + trigger/recovery target), that segment could
accidentally be recycled onto the past of the new timeline.  Later
checkpointer would create a .ready file for it, assuming there was an
error during creation, and it would get archived.  That causes trouble
if another standby is later brought up from a basebackup from before
the timeline creation, because it would try to read the
segment, because XLogFileReadAnyTLI just tries all possible timelines,
which doesn't have valid contents.  Thus replay would fail.

The problem, if already occurred, can be fixed by removing the segment
and/or having restore_command filter it out.

The reason for the creation of such "phantom" segments was, that after
an XLOG_SWITCH record the EndOfLog variable points to the beginning of
the next segment, and RemoveXlogFile() used XLByteToPrevSeg().
Normally RemoveXlogFile() doing so is harmless, because the last
segment will still exist preventing InstallXLogFileSegment() from
causing harm, but just after promotion there's no previous segment on
the new timeline.

Fix that by using XLByteToSeg() instead of XLByteToPrevSeg().

Author: Andres Freund
Reported-By: Greg Burek
Discussion: https://postgr.es/m/20170619073026.zcwpe6mydsaz5ygd@alap3.anarazel.de
Backpatch: 9.2-, bug older than all supported versions
2017-06-21 14:14:45 -07:00
Tom Lane 780b3a4c43 Manually un-break a few URLs that pgindent used to insist on splitting.
These will no longer get re-split by pgindent runs, so it's worth cleaning
them up now.

Discussion: https://postgr.es/m/E1dAmxK-0006EE-1r@gemulon.postgresql.org
Discussion: https://postgr.es/m/30527.1495162840@sss.pgh.pa.us
2017-06-21 16:02:08 -04:00
Tom Lane 382ceffdf7 Phase 3 of pgindent updates.
Don't move parenthesized lines to the left, even if that means they
flow past the right margin.

By default, BSD indent lines up statement continuation lines that are
within parentheses so that they start just to the right of the preceding
left parenthesis.  However, traditionally, if that resulted in the
continuation line extending to the right of the desired right margin,
then indent would push it left just far enough to not overrun the margin,
if it could do so without making the continuation line start to the left of
the current statement indent.  That makes for a weird mix of indentations
unless one has been completely rigid about never violating the 80-column
limit.

This behavior has been pretty universally panned by Postgres developers.
Hence, disable it with indent's new -lpl switch, so that parenthesized
lines are always lined up with the preceding left paren.

This patch is much less interesting than the first round of indent
changes, but also bulkier, so I thought it best to separate the effects.

Discussion: https://postgr.es/m/E1dAmxK-0006EE-1r@gemulon.postgresql.org
Discussion: https://postgr.es/m/30527.1495162840@sss.pgh.pa.us
2017-06-21 15:35:54 -04:00
Tom Lane c7b8998ebb Phase 2 of pgindent updates.
Change pg_bsd_indent to follow upstream rules for placement of comments
to the right of code, and remove pgindent hack that caused comments
following #endif to not obey the general rule.

Commit e3860ffa4d wasn't actually using
the published version of pg_bsd_indent, but a hacked-up version that
tried to minimize the amount of movement of comments to the right of
code.  The situation of interest is where such a comment has to be
moved to the right of its default placement at column 33 because there's
code there.  BSD indent has always moved right in units of tab stops
in such cases --- but in the previous incarnation, indent was working
in 8-space tab stops, while now it knows we use 4-space tabs.  So the
net result is that in about half the cases, such comments are placed
one tab stop left of before.  This is better all around: it leaves
more room on the line for comment text, and it means that in such
cases the comment uniformly starts at the next 4-space tab stop after
the code, rather than sometimes one and sometimes two tabs after.

Also, ensure that comments following #endif are indented the same
as comments following other preprocessor commands such as #else.
That inconsistency turns out to have been self-inflicted damage
from a poorly-thought-through post-indent "fixup" in pgindent.

This patch is much less interesting than the first round of indent
changes, but also bulkier, so I thought it best to separate the effects.

Discussion: https://postgr.es/m/E1dAmxK-0006EE-1r@gemulon.postgresql.org
Discussion: https://postgr.es/m/30527.1495162840@sss.pgh.pa.us
2017-06-21 15:19:25 -04:00
Peter Eisentraut f669c09989 Restart logical replication launcher when killed
Author: Yugo Nagata <nagata@sraoss.co.jp>
2017-06-21 15:15:29 -04:00
Tom Lane e3860ffa4d Initial pgindent run with pg_bsd_indent version 2.0.
The new indent version includes numerous fixes thanks to Piotr Stefaniak.
The main changes visible in this commit are:

* Nicer formatting of function-pointer declarations.
* No longer unexpectedly removes spaces in expressions using casts,
  sizeof, or offsetof.
* No longer wants to add a space in "struct structname *varname", as
  well as some similar cases for const- or volatile-qualified pointers.
* Declarations using PG_USED_FOR_ASSERTS_ONLY are formatted more nicely.
* Fixes bug where comments following declarations were sometimes placed
  with no space separating them from the code.
* Fixes some odd decisions for comments following case labels.
* Fixes some cases where comments following code were indented to less
  than the expected column 33.

On the less good side, it now tends to put more whitespace around typedef
names that are not listed in typedefs.list.  This might encourage us to
put more effort into typedef name collection; it's not really a bug in
indent itself.

There are more changes coming after this round, having to do with comment
indentation and alignment of lines appearing within parentheses.  I wanted
to limit the size of the diffs to something that could be reviewed without
one's eyes completely glazing over, so it seemed better to split up the
changes as much as practical.

Discussion: https://postgr.es/m/E1dAmxK-0006EE-1r@gemulon.postgresql.org
Discussion: https://postgr.es/m/30527.1495162840@sss.pgh.pa.us
2017-06-21 14:39:04 -04:00
Tom Lane 9ef2dbefc7 Final pgindent run with old pg_bsd_indent (version 1.3).
This is just to have a clean basis for comparison with the results of
the new version (which will indeed end up reverting some of these
changes...)

Discussion: https://postgr.es/m/E1dAmxK-0006EE-1r@gemulon.postgresql.org
Discussion: https://postgr.es/m/30527.1495162840@sss.pgh.pa.us
2017-06-21 14:09:24 -04:00
Dean Rasheed bcbf392ec8 Prevent table partitions from being turned into views.
A table partition must be a table, not a view, so don't allow a
"_RETURN" rule to be added that would convert an existing table
partition into a view.

Amit Langote

Discussion: https://postgr.es/m/CAEZATCVzFcAjZwC1bTFvJ09skB_sgkF4SwPKMywev-XTnimp9Q%40mail.gmail.com
2017-06-21 10:43:17 +01:00
Heikki Linnakangas ba1f017069 Fix typo in comment.
Etsuro Fujita
2017-06-21 11:55:07 +03:00
Peter Eisentraut 15c91568cf Fix typo in code comment
Author: Masahiko Sawada <sawada.mshk@gmail.com>
2017-06-20 14:32:01 -04:00
Tom Lane a69dfe5f40 Don't downcase entries within shared_preload_libraries et al.
load_libraries(), which processes the various xxx_preload_libraries GUCs,
was parsing them using SplitIdentifierString() which isn't really
appropriate for values that could be path names: it downcases unquoted
text, and it doesn't allow embedded whitespace unless quoted.
Use SplitDirectoriesString() instead.  That also allows us to simplify
load_libraries() a bit, since canonicalize_path() is now done for it.

While this definitely seems like a bug fix, it has the potential to
break configuration settings that accidentally worked before because
of the downcasing behavior.  Also, there's an easy workaround for the
bug, namely to double-quote troublesome text.  Hence, no back-patch.

QL Zhuo, tweaked a bit by me

Discussion: https://postgr.es/m/CAB-oJtxHVDc3H+Km3CjB9mY1VDzuyaVH_ZYSz7iXcRqCtb93Ew@mail.gmail.com
2017-06-20 13:03:29 -04:00
Peter Eisentraut a2141c42f9 Tweak publication fetching in psql
Viewing a table with \d in psql also shows the publications at table is
in.  If a publication is concurrently dropped, this shows an error,
because the view pg_publication_tables internally uses
pg_get_publication_tables(), which uses a catalog snapshot.  This can be
particularly annoying if a for-all-tables publication is concurrently
dropped.

To avoid that, write the query in psql differently.  Expose the function
pg_relation_is_publishable() to SQL and write the query using that.
That still has a risk of being affected by concurrent catalog changes,
but in this case it would be a table drop that causes problems, and then
the psql \d command wouldn't be interesting anymore anyway.

Reported-by: Tom Lane <tgl@sss.pgh.pa.us>
2017-06-20 12:35:02 -04:00
Tom Lane d8e6b84bd2 Avoid regressions in foreign-key-based selectivity estimates.
David Rowley found that the "use the smallest per-column selectivity"
heuristic applied in some cases by get_foreign_key_join_selectivity()
was badly off if the FK columns are independent, producing estimates
much worse than we got before that code was added in 9.6.

One case where that heuristic was used was for LEFT and FULL outer joins
with the referenced rel on the outside of the join.  But we should not
really need to special-case those here.  eqjoinsel() never has had such a
special case; the correction is applied by calc_joinrel_size_estimate()
instead.  Let's just estimate such cases like inner joins and rely on that
later adjustment.  (I think there was something of a thinko here, in that
the comments seem to be thinking about the selectivity as defined for
semi/anti joins; but that shouldn't apply to left/full joins.)  Add a
regression test exercising such a case to show that this is sane in
at least some cases.

The other case where we used that heuristic was for SEMI/ANTI outer joins,
either if the referenced rel was on the outside, or if it was on the inside
but was part of a join within the RHS.  In either case, the FK doesn't give
us a lot of traction towards estimating the selectivity.  To ensure that
we don't have regressions from what happened before 9.6, let's punt by
ignoring the FK in such cases and applying the traditional selectivity
calculation.  (We might be able to improve on that later, but for now
I just want to be sure it's not worse than 9.5.)

Report and patch by David Rowley, simplified a bit by me.  Back-patch
to 9.6 where this code was added.

Discussion: https://postgr.es/m/CAKJS1f8NO8oCDcxrteohG6O72uU1saEVT9qX=R8pENr5QWerXw@mail.gmail.com
2017-06-19 15:33:41 -04:00
Andres Freund 3bdea167eb Fix leaking of small spilled subtransactions during logical decoding.
When, during logical decoding, a transaction gets too big, it's
contents get spilled to disk. Not just the top-transaction gets
spilled, but *also* all of its subtransactions, even if they're not
that large themselves.  Unfortunately we didn't clean up
such small spilled subtransactions from disk.

Fix that, by keeping better track of whether a transaction has been
spilled to disk.

Author: Andres Freund
Reported-By: Dmitriy Sarafannikov, Fabrízio de Royes Mello
Discussion:
    https://postgr.es/m/1457621358.355011041@f382.i.mail.ru
    https://postgr.es/m/CAFcNs+qNMhNYii4nxpO6gqsndiyxNDYV0S=JNq0v_sEE+9PHXg@mail.gmail.com
Backpatch: 9.4-, where logical decoding was introduced
2017-06-18 19:12:56 -07:00
Peter Eisentraut 033370179a Set statement timestamp in apply worker
This ensures that triggers can see an up-to-date timestamp.

Reported-by: Konstantin Evteev <konst583@gmail.com>
2017-06-17 08:54:21 -04:00
Magnus Hagander bb1f8f9e5b Fix typos in comments
Author: Daniel Gustafsson <daniel@yesql.se>
2017-06-17 10:17:28 +02:00
Peter Eisentraut 94da2a6a9a Use RangeVarGetRelidExtended() in AlterSequence()
This allows us to combine the opening and the ownership check.

Reported-by: Robert Haas <robertmhaas@gmail.com>
2017-06-16 10:24:50 -04:00
Peter Eisentraut 41839b7abc Fix ICU collation use on Windows
Windows uses a separate code path for libc locales.  The code previously
ended up there also if an ICU collation should be used, leading to a
crash.

Reported-by: Ashutosh Sharma <ashu.coek88@gmail.com>
2017-06-16 10:08:54 -04:00
Heikki Linnakangas 30681c830d Fix dependency, when changing a function's argument/return type.
When a new base type is created using the old-style procedure of first
creating the input/output functions with "opaque" in place of the base
type, the "opaque" argument/return type is changed to the final base type,
on CREATE TYPE. However, we did not create a pg_depend record when doing
that, so the functions were left not depending on the type.

Fixes bug #14706, reported by Karen Huddleston.

Discussion: https://www.postgresql.org/message-id/20170614232259.1424.82774@wrigleys.postgresql.org
2017-06-16 11:33:12 +03:00
Noah Misch 39ac55918f Reconcile nodes/*funcs.c with PostgreSQL 10 work.
The _equalTableFunc() omission of coltypmods has semantic significance,
but I did not track down resulting user-visible bugs, if any.  The other
changes are cosmetic only, affecting order.  catversion bump due to
readfuncs.c field order change.
2017-06-16 00:16:11 -07:00
Tom Lane a3bed62d44 Fix low-probability leaks of PGresult objects in the backend.
We had three occurrences of essentially the same coding pattern
wherein we tried to retrieve a query result from a libpq connection
without blocking.  In the case where PQconsumeInput failed (typically
indicating a lost connection), all three loops simply gave up and
returned, forgetting to clear any previously-collected PGresult
object.  Since those are malloc'd not palloc'd, the oversight results
in a process-lifespan memory leak.

One instance, in libpqwalreceiver, is of little significance because
the walreceiver process would just quit anyway if its connection fails.
But we might as well fix it.

The other two instances, in postgres_fdw, are somewhat more worrisome
because at least in principle the scenario could be repeated, allowing
the amount of memory leaked to build up to something worth worrying
about.  Moreover, in these cases the loops contain CHECK_FOR_INTERRUPTS
calls, as well as other calls that could potentially elog(ERROR),
providing another way to exit without having cleared the PGresult.
Here we need to add PG_TRY logic similar to what exists in quite a
few other places in postgres_fdw.

Coverity noted the libpqwalreceiver bug; I found the other two cases
by checking all calls of PQconsumeInput.

Back-patch to all supported versions as appropriate (9.2 lacks
postgres_fdw, so this is really quite unexciting for that branch).

Discussion: https://postgr.es/m/22620.1497486981@sss.pgh.pa.us
2017-06-15 15:03:52 -04:00
Alvaro Herrera 3ab7912c18 Rename function for consistency
Avoid using prefix "staext" when everything else uses "statext".

Author: Kyotaro HORIGUCHI
Discussion: https://postgr.es/m/20170615.140041.165731947.horiguchi.kyotaro@lab.ntt.co.jp
2017-06-15 11:44:33 -04:00
Robert Haas f32d57fd70 Fix problems related to RangeTblEntry members enrname and enrtuples.
Commit 18ce3a4ab2 failed to update
the comments in parsenodes.h for the new members, and made only
incomplete updates to src/backend/nodes

Thomas Munro, per a report from Noah Misch.

Discussion: http://postgr.es/m/20170611062525.GA1628882@rfd.leadboat.com
2017-06-14 16:19:46 -04:00
Andres Freund 6c2003f8a1 Don't force-assign transaction id when exporting a snapshot.
Previously we required every exported transaction to have an xid
assigned. That was used to check that the exporting transaction is
still running, which in turn is needed to guarantee that that
necessary rows haven't been removed in between exporting and importing
the snapshot.

The exported xid caused unnecessary problems with logical decoding,
because slot creation has to wait for all concurrent xid to finish,
which in turn serializes concurrent slot creation.   It also
prohibited snapshots to be exported on hot-standby replicas.

Instead export the virtual transactionid, which avoids the unnecessary
serialization and the inability to export snapshots on standbys. This
changes the file name of the exported snapshot, but since we never
documented what that one means, that seems ok.

Author: Petr Jelinek, slightly editorialized by me
Reviewed-By: Andres Freund
Discussion: https://postgr.es/m/f598b4b8-8cd7-0d54-0939-adda763d8c34@2ndquadrant.com
2017-06-14 11:57:21 -07:00
Peter Eisentraut b6966d4627 Use DEFACLOBJ_ macros in error message instead of hardcoding 2017-06-14 14:44:24 -04:00
Robert Haas b08df9cab7 Teach predtest.c about CHECK clauses to fix partitioning bugs.
In a CHECK clause, a null result means true, whereas in a WHERE clause
it means false.  predtest.c provided different functions depending on
which set of semantics applied to the predicate being proved, but had
no option to control what a null meant in the clauses provided as
axioms.  Add one.

Use that in the partitioning code when figuring out whether the
validation scan on a new partition can be skipped.  Rip out the
old logic that attempted (not very successfully) to compensate
for the absence of the necessary support in predtest.c.

Ashutosh Bapat and Robert Haas, reviewed by Amit Langote and
incorporating feedback from Tom Lane.

Discussion: http://postgr.es/m/CAFjFpReT_kq_uwU_B8aWDxR7jNGE=P0iELycdq5oupi=xSQTOw@mail.gmail.com
2017-06-14 13:13:11 -04:00
Alvaro Herrera e90ceeaa49 Avoid bogus TwoPhaseState locking sequences
The optimized code in 728bd991c3 contains a few invalid locking
sequences.  To wit, the original code would try to acquire an lwlock
that it already holds.  Avoid this by moving lock acquisitions to
higher-level code, and install appropriate assertions in low-level that
the correct mode is held.

Authors: Michael Paquier, Álvaro Herrera
Reported-By: chuanting wang
Bug: #14680
Discussion: https://postgr.es/m/20170531033228.1487.10124@wrigleys.postgresql.org
2017-06-14 11:29:05 -04:00
Tom Lane 8e72239e9d Fix no-longer-valid shortcuts in expression_returns_set().
expression_returns_set() used to short-circuit its recursion upon
seeing certain node types, such as DistinctExpr, that it knew the
executor did not support set-valued arguments for.  That was never
inherent, though, just a reflection of laziness in execQual.c.
With the new implementation of SRFs there is no reason to think
that any scalar-valued expression node could not have a set-valued
subexpression, except for AggRefs and WindowFuncs where we know there
is a parser check rejecting it.  And indeed, the shortcut causes
unexpected failures for cases such as a SRF underneath DistinctExpr,
because the planner stops looking for SRFs too soon.

Discussion: https://postgr.es/m/5259.1497044025@sss.pgh.pa.us
2017-06-14 11:10:05 -04:00
Tom Lane a571c7f661 Fix violations of CatalogTupleInsert/Update/Delete abstraction.
In commits 2f5c9d9c9 and ab0289651 we invented an abstraction layer
to insulate catalog manipulations from direct heap update calls.
But evidently some patches that hadn't landed in-tree at that point
didn't get the memo completely.  Fix a couple of direct calls to
simple_heap_delete to use CatalogTupleDelete instead; these appear
to have been added in commits 7c4f52409 and 7b504eb28.  This change is
purely cosmetic ATM, but there's no point in having an abstraction layer
if we allow random code to break it.

Masahiko Sawada and Tom Lane

Discussion: https://postgr.es/m/CAD21AoDOPRSVcwbnCN3Y1n_68ATyTspsU6=ygtHz_uY0VcdZ8A@mail.gmail.com
2017-06-14 10:26:46 -04:00
Dean Rasheed f356ec5744 Teach RemoveRoleFromObjectPolicy() about partitioned tables.
Table partitioning, introduced in commit f0e44751d7, added a new
relkind - RELKIND_PARTITIONED_TABLE. Update
RemoveRoleFromObjectPolicy() to handle it, otherwise DROP OWNED BY
will fail if the role has any RLS policies referring to partitioned
tables.

Dean Rasheed, reviewed by Amit Langote.

Discussion: https://postgr.es/m/CAEZATCUnNOKN8sLML9jUzxecALWpEXK3a3W7y0PgFR4%2Buhgc%3Dg%40mail.gmail.com
2017-06-14 08:43:40 +01:00
Tom Lane 0436f6bde8 Disallow set-returning functions inside CASE or COALESCE.
When we reimplemented SRFs in commit 69f4b9c85, our initial choice was
to allow the behavior to vary from historical practice in cases where a
SRF call appeared within a conditional-execution construct (currently,
only CASE or COALESCE).  But that was controversial to begin with, and
subsequent discussion has resulted in a consensus that it's better to
throw an error instead of executing the query differently from before,
so long as we can provide a reasonably clear error message and a way to
rewrite the query.

Hence, add a parser mechanism to allow detection of such cases during
parse analysis.  The mechanism just requires storing, in the ParseState,
a pointer to the set-returning FuncExpr or OpExpr most recently emitted
by parse analysis.  Then the parsing functions for CASE and COALESCE can
detect the presence of a SRF in their arguments by noting whether this
pointer changes while analyzing their arguments.  Furthermore, if it does,
it provides a suitable error cursor location for the complaint.  (This
means that if there's more than one SRF in the arguments, the error will
point at the last one to be analyzed not the first.  While connoisseurs of
parsing behavior might find that odd, it's unlikely the average user would
ever notice.)

While at it, we can also provide more specific error messages than before
about some pre-existing restrictions, such as no-SRFs-within-aggregates.
Also, reject at parse time cases where a NULLIF or IS DISTINCT FROM
construct would need to return a set.  We've never supported that, but the
restriction is depended on in more subtle ways now, so it seems wise to
detect it at the start.

Also, provide some documentation about how to rewrite a SRF-within-CASE
query using a custom wrapper SRF.

It turns out that the information_schema.user_mapping_options view
contained an instance of exactly the behavior we're now forbidding; but
rewriting it makes it more clear and safer too.

initdb forced because of user_mapping_options change.

Patch by me, with error message suggestions from Alvaro Herrera and
Andres Freund, pursuant to a complaint from Regina Obe.

Discussion: https://postgr.es/m/000001d2d5de$d8d66170$8a832450$@pcorp.us
2017-06-13 23:46:39 -04:00
Tom Lane 651902deb1 Re-run pgindent.
This is just to have a clean base state for testing of Piotr Stefaniak's
latest version of FreeBSD indent.  I fixed up a couple of places where
pgindent would have changed format not-nicely.  perltidy not included.

Discussion: https://postgr.es/m/VI1PR03MB119959F4B65F000CA7CD9F6BF2CC0@VI1PR03MB1199.eurprd03.prod.outlook.com
2017-06-13 13:05:59 -04:00
Robert Haas 096f1ccd52 Always initialize PartitionBoundInfoData's null_index.
This doesn't actually matter at present, because the current code
never consults null_index for range partitions.  However, leaving
it uninitialized is still a bad idea, so let's not do that.

Amul Sul, reviewed by Ashutosh Bapat

Discussion: http://postgr.es/m/CAAJ_b94AkEzcx+12ySCnbMDX7=UdF4BjnoBGfMQbB0RNSTo3Ng@mail.gmail.com
2017-06-13 12:39:20 -04:00
Dean Rasheed b6263cd851 Teach relation_is_updatable() about partitioned tables.
Table partitioning, introduced in commit f0e44751d7, added a new
relkind - RELKIND_PARTITIONED_TABLE. Update relation_is_updatable() to
handle it. Specifically, partitioned tables and simple views built on
top of them are updatable.

This affects the SQL-callable functions pg_relation_is_updatable() and
pg_column_is_updatable(), and the views information_schema.views and
information_schema.columns.

Dean Rasheed, reviewed by Ashutosh Bapat.

Discussion: https://postgr.es/m/CAEZATCXnbiFkMXgF4Ez1pmM2c-tS1z33bSq7OGbw7QQhHov%2B6Q%40mail.gmail.com
2017-06-13 17:30:36 +01:00
Robert Haas ee252f074b Fix failure to remove dependencies when a partition is detached.
Otherwise, dropping the partitioned table will automatically drop
any previously-detached children, which would be unfortunate.

Ashutosh Bapat and Rahila Syed, reviewed by Amit Langote and by me.

Discussion: http://postgr.es/m/CAFjFpRdOwHuGj45i25iLQ4QituA0uH6RuLX1h5deD4KBZJ25yg@mail.gmail.com
2017-06-13 11:51:42 -04:00
Tom Lane b74701043e In initdb, defend against assignment of NULL values to not-null columns.
Previously, you could write _null_ in a BKI DATA line for a column that's
supposed to be NOT NULL and initdb would let it pass, probably breaking
subsequent accesses to the row.  No doubt the original coding overlooked
this simple sanity check because in the beginning we didn't have any way
to mark catalog columns NOT NULL at initdb time.
2017-06-13 10:54:43 -04:00
Peter Eisentraut f2a886104a Fix typo
Author: Masahiko Sawada <sawada.mshk@gmail.com>
2017-06-13 10:54:03 -04:00
Peter Eisentraut 88c6cff8e7 Improve code comments
Author: Erik Rijkers <er@xs4all.nl>
2017-06-13 10:43:36 -04:00
Peter Eisentraut 17082a88ea Prevent copying default collation
This will not have the desired effect and might lead to crashes when the
copied collation is used.

Reported-by: Tom Lane <tgl@sss.pgh.pa.us>
2017-06-13 08:49:41 -04:00
Tom Lane 78a030a441 Fix confusion about number of subplans in partitioned INSERT setup.
ExecInitModifyTable() thought there was a plan per partition, but no,
there's only one.  The problem had escaped detection so far because there
would only be visible misbehavior if there were a SubPlan (not an InitPlan)
in the quals being duplicated for each partition.  However, valgrind
detected a bogus memory access in test cases added by commit 4f7a95be2,
and investigation of that led to discovery of the bug.  The additional
test case added here crashes without the patch.

Patch by Amit Langote, test case by me.

Discussion: https://postgr.es/m/10974.1497227727@sss.pgh.pa.us
2017-06-12 23:29:53 -04:00
Tom Lane 7332c3cbb3 Assert that we don't invent relfilenodes or type OIDs in binary upgrade.
During pg_upgrade's restore run, all relfilenode choices should be
overridden by commands in the dump script.  If we ever find ourselves
choosing a relfilenode in the ordinary way, someone blew it.  Likewise for
pg_type OIDs.  Since pg_upgrade might well succeed anyway, if there happens
not to be a conflict during the regression test run, we need assertions
here to keep us on the straight and narrow.

We might someday be able to remove the assertion in GetNewRelFileNode,
if pg_upgrade is rewritten to remove its assumption that old and new
relfilenodes always match.  But it's hard to see how to get rid of the
pg_type OID constraint, since those OIDs are embedded in user tables
in some cases.

Back-patch as far as 9.5, because of the risk of back-patches breaking
something here even if it works in HEAD.  I'd prefer to go back further,
but 9.4 fails both assertions due to get_rel_infos()'s use of a temporary
table.  We can't use the later-branch solution of a CTE for compatibility
reasons (cf commit 5d16332e9), and it doesn't seem worth inventing some
other way to do the query.  (I did check, by dint of changing the Asserts
to elog(WARNING), that there are no other cases of unwanted OID assignments
during 9.4's regression test run.)

Discussion: https://postgr.es/m/19785.1497215827@sss.pgh.pa.us
2017-06-12 20:04:32 -04:00
Tom Lane a475e46634 Fix ALTER SEQUENCE OWNED BY to not rewrite the sequence relation.
It's not necessary for it to do that, since OWNED BY requires only ordinary
catalog updates and doesn't affect future sequence values.  And pg_upgrade
needs to use OWNED BY without having it change the sequence's relfilenode.
Commit 3d79013b9 broke this by making all forms of ALTER SEQUENCE change
the relfilenode; that seems to be the explanation for the hard-to-reproduce
buildfarm failures we've been seeing since then.

Discussion: https://postgr.es/m/19785.1497215827@sss.pgh.pa.us
2017-06-12 16:57:31 -04:00
Peter Eisentraut 94c2ed0ebe Add ICU_CFLAGS to global CPPFLAGS
The original code only added ICU_CFLAGS to the backend build.  But it is
also needed for building external modules that include pg_locale.h.  So
add it to the global CPPFLAGS.  (This is only relevant if ICU is not in
a compiler default path, so it apparently hasn't bitten many.)
2017-06-12 15:57:22 -04:00
Peter Eisentraut 7f28a7946a Remove "synchronized table states" notice message
It appears to be more confusing than useful.

Reported-by: Jeff Janes <jeff.janes@gmail.com>
2017-06-12 11:42:06 -04:00
Peter Eisentraut 253504fb9f Fix build of ICU support in Windows
and also any platform that does not have locale_t but enabled ICU.

Author: Ashutosh Sharma <ashu.coek88@gmail.com>
2017-06-12 10:28:37 -04:00
Peter Eisentraut ddd7b22b22 Stop table sync workers when subscription relation entry is removed
When a table sync worker is in waiting state and the subscription table
entry is removed because of a concurrent subscription refresh, the
worker could be left orphaned.  To avoid that, explicitly stop the
worker when the pg_subscription_rel entry is removed.

Reported-by: Masahiko Sawada <sawada.mshk@gmail.com>
2017-06-12 08:53:37 -04:00
Tom Lane 51893985d3 Handle unqualified SEQUENCE NAME options properly in parse_utilcmd.c.
generateSerialExtraStmts() was sloppy about handling the case where
SEQUENCE NAME is given with a not-schema-qualified name.  It was generating
a CreateSeqStmt with an unqualified sequence name, and an AlterSeqStmt
whose "owned_by" DefElem contained a T_String Value with a null string
pointer in the schema-name position.  The generated nextval() argument was
also underqualified.  This accidentally failed to fail at runtime, but only
so long as the current default creation namespace at runtime is the right
namespace.  That's bogus; the parse-time transformation is supposed to be
inserting the right schema name in all cases, so as to avoid any possible
skew in that selection.  I'm not sure this could fail in pg_dump's usage,
but it's still wrong; we have had real bugs in this area before adopting
the policy that parse_utilcmd.c should generate only fully-qualified
auxiliary commands.  A slightly lesser problem, which is what led me to
notice this in the first place, is that pprint() dumped core on the
AlterSeqStmt because of the bogus T_String.

Noted while poking into the open problem with ALTER SEQUENCE breaking
pg_upgrade.
2017-06-11 19:00:01 -04:00
Joe Conway 4f7a95be2c Apply RLS policies to partitioned tables.
The new partitioned table capability added a new relkind, namely
RELKIND_PARTITIONED_TABLE. Update fireRIRrules() to apply RLS
policies on RELKIND_PARTITIONED_TABLE as it does RELKIND_RELATION.

In addition, add RLS regression test coverage for partitioned tables.

Issue raised by Fakhroutdinov Evgenievich and patch by Mike Palmiotto.
Regression test editorializing by me.

Discussion: https://postgr.es/m/flat/20170601065959.1486.69906@wrigleys.postgresql.org
2017-06-11 08:51:18 -07:00
Peter Eisentraut e11e24b1ed Formatting improvements in config file samples 2017-06-09 14:38:33 -04:00
Peter Eisentraut 8c9387c55e Update code comments
Author: Neha Khatri <nehakhatri5@gmail.com>
2017-06-09 14:04:22 -04:00
Peter Eisentraut dabbe8d564 Fix typo
Author: Masahiko Sawada <sawada.mshk@gmail.com>
2017-06-09 11:40:08 -04:00
Peter Eisentraut 8dc7c33812 Improve tablesync behavior with concurrent changes
When a table is removed from a subscription before the tablesync worker
could start, this would previously result in an error when reading
pg_subscription_rel.  Now we just ignore this.

Author: Masahiko Sawada <sawada.mshk@gmail.com>
2017-06-09 09:20:54 -04:00
Andres Freund 2c48f5db64 Use standard interrupt handling in logical replication launcher.
Previously the exit handling was only able to exit from within the
main loop, and not from within the backend code it calls.  Fix that by
using the standard die() SIGTERM handler, and adding the necessary
CHECK_FOR_INTERRUPTS() call.

This requires adding yet another process-type-specific branch to
ProcessInterrupts(), which hints that we probably should generalize
that handling.  But that's work for another day.

Author: Petr Jelinek
Reviewed-By: Andres Freund
Discussion: https://postgr.es/m/fe072153-babd-3b5d-8052-73527a6eb657@2ndquadrant.com
2017-06-08 15:38:50 -07:00
Andres Freund 5fd56b9f5b Again report a useful error message when walreceiver's connection closes.
Since 7c4f52409a (merged in v10), a shutdown master is reported as
  FATAL:  unexpected result after CommandComplete: server closed the connection unexpectedly
by walsender. It used to be
  LOG:  replication terminated by primary server
  FATAL:  could not send end-of-streaming message to primary: no COPY in progress
while the old message clearly is not perfect, it's definitely better
than what's reported now.

The change comes from the attempt to handle finished COPYs without
erroring out, needed for the new logical replication, which wasn't
needed before.

There's probably better ways to handle this, but for now just
explicitly check for a closed connection.

Author: Petr Jelinek
Reviewed-By: Andres Freund
Discussion: https://postgr.es/m/f7c7dd08-855c-e4ed-41f4-d064a6c0665a@2ndquadrant.com
Backpatch: -
2017-06-08 14:51:43 -07:00
Heikki Linnakangas e3df8f8b93 Improve authentication error messages.
Most of the improvements were in the new SCRAM code:

* In SCRAM protocol violation messages, use errdetail to provide the
  details.

* If pg_backend_random() fails, throw an ERROR rather than just LOG. We
  shouldn't continue authentication if we can't generate a random nonce.

* Use ereport() rather than elog() for the "invalid SCRAM verifier"
  messages. They shouldn't happen, if everything works, but it's not
  inconceivable that someone would have invalid scram verifiers in
  pg_authid, e.g. if a broken client application was used to generate the
  verifier.

But this change applied to old code:

* Use ERROR rather than COMMERROR for protocol violation errors. There's
  no reason to not tell the client what they did wrong. The client might be
  confused already, so that it cannot read and display the error correctly,
  but let's at least try. In the "invalid password packet size" case, we
  used to actually continue with authentication anyway, but that is now a
  hard error.

Patch by Michael Paquier and me. Thanks to Daniel Varrazzo for spotting
the typo in one of the messages that spurred the discussion and these
larger changes.

Discussion: https://www.postgresql.org/message-id/CA%2Bmi_8aZYLhuyQi1Jo0hO19opNZ2OEATEOM5fKApH7P6zTOZGg%40mail.gmail.com
2017-06-08 19:54:22 +03:00
Peter Eisentraut 644ea35fc1 Fix updating of pg_subscription_rel from workers
A logical replication worker should not insert new rows into
pg_subscription_rel, only update existing rows, so that there are no
races if a concurrent refresh removes rows.  Adjust the API to be able
to choose that behavior.

Author: Masahiko Sawada <sawada.mshk@gmail.com>
Reported-by: tushar <tushar.ahuja@enterprisedb.com>
2017-06-07 13:49:14 -04:00
Robert Haas 15ce775faa Prevent BEFORE triggers from violating partitioning constraints.
Since tuple-routing implicitly checks the partitioning constraints
at least for the levels of the partitioning hierarchy it traverses,
there's normally no need to revalidate the partitioning constraint
after performing tuple routing.  However, if there's a BEFORE trigger
on the target partition, it could modify the tuple, causing the
partitioning constraint to be violated.  Catch that case.

Also, instead of checking the root table's partition constraint after
tuple-routing, check it beforehand.  Otherwise, the rules for when
the partitioning constraint gets checked get too complicated, because
you sometimes have to check part of the constraint but not all of it.
This effectively reverts commit 39162b2030
in favor of a different approach altogether.

Report by me.  Initial debugging by Jeevan Ladhe.  Patch by Amit
Langote, reviewed by me.

Discussion: http://postgr.es/m/CA+Tgmoa9DTgeVOqopieV8d1QRpddmP65aCdxyjdYDoEO5pS5KA@mail.gmail.com
2017-06-07 12:50:45 -04:00
Peter Eisentraut d4bfc06e29 Consistently use subscription name as application name
The logical replication apply worker uses the subscription name as
application name, except for table sync.  This was incorrectly set to
use the replication slot name, which might be different, in one case.
Also add a comment why the other case is different.
2017-06-06 22:11:22 -04:00
Andres Freund 9206ced1dc Clean up latch related code.
The larger part of this patch replaces usages of MyProc->procLatch
with MyLatch.  The latter works even early during backend startup,
where MyProc->procLatch doesn't yet.  While the affected code
shouldn't run in cases where it's not initialized, it might get copied
into places where it might.  Using MyLatch is simpler and a bit faster
to boot, so there's little point to stick with the previous coding.

While doing so I noticed some weaknesses around newly introduced uses
of latches that could lead to missed events, and an omitted
CHECK_FOR_INTERRUPTS() call in worker_spi.

As all the actual bugs are in v10 code, there doesn't seem to be
sufficient reason to backpatch this.

Author: Andres Freund
Discussion:
    https://postgr.es/m/20170606195321.sjmenrfgl2nu6j63@alap3.anarazel.de
    https://postgr.es/m/20170606210405.sim3yl6vpudhmufo@alap3.anarazel.de
Backpatch: -
2017-06-06 16:13:00 -07:00
Peter Eisentraut e3a815d2fa Improve handover logic between sync and apply workers
Make apply busy wait check the catalog instead of shmem state to ensure
that next transaction will see the expected table synchronization state.

Also make the handover always go through same set of steps to make the
overall process easier to understand and debug.

Author: Petr Jelinek <petr.jelinek@2ndquadrant.com>
Tested-by: Mark Kirkwood <mark.kirkwood@catalyst.net.nz>
Tested-by: Erik Rijkers <er@xs4all.nl>
2017-06-06 14:41:04 -04:00
Robert Haas 3106829513 Use NIL rather than NULL to represent an empty list.
Just to be tidy.

Amit Langote

Discussion: http://postgr.es/m/9297f80f-e4ab-7dda-33d4-8580bab6d634@lab.ntt.co.jp
2017-06-06 11:21:22 -04:00
Robert Haas 2186b608b3 Clean up partcollation handling for OID 0.
Consistent with what we do for indexes, we shouldn't try to record
dependencies on collation OID 0 or the default collation OID (which
is pinned).  Also, the fact that indcollation and partcollation can
contain zero OIDs when the data type is not collatable should be
documented.

Amit Langote, per a complaint from me.

Discussion: http://postgr.es/m/CA+Tgmoba5mtPgM3NKfG06vv8na5gGbVOj0h4zvivXQwLw8wXXQ@mail.gmail.com
2017-06-06 11:07:20 -04:00
Andres Freund c1abe6c786 Wire up query cancel interrupt for walsender backends.
This allows to cancel commands run over replication connections. While
it might have some use before v10, it has become important now that
normal SQL commands are allowed in database connected walsender
connections.

Author: Petr Jelinek
Reviewed-By: Andres Freund, Michael Paquier
Discussion: https://postgr.es/m/7966f454-7cd7-2b0c-8b70-cdca9d5a8c97@2ndquadrant.com
2017-06-05 19:18:16 -07:00
Andres Freund 6e1dd2773e Unify SIGHUP handling between normal and walsender backends.
Because walsender and normal backends share the same main loop it's
problematic to have two different flag variables, set in signal
handlers, indicating a pending configuration reload.  Only certain
walsender commands reach code paths checking for the
variable (START_[LOGICAL_]REPLICATION, CREATE_REPLICATION_SLOT
... LOGICAL, notably not base backups).

This is a bug present since the introduction of walsender, but has
gotten worse in releases since then which allow walsender to do more.

A later patch, not slated for v10, will similarly unify SIGHUP
handling in other types of processes as well.

Author: Petr Jelinek, Andres Freund
Reviewed-By: Michael Paquier
Discussion: https://postgr.es/m/20170423235941.qosiuoyqprq4nu7v@alap3.anarazel.de
Backpatch: 9.2-, bug is present since 9.0
2017-06-05 19:18:16 -07:00
Andres Freund c6c3334364 Prevent possibility of panics during shutdown checkpoint.
When the checkpointer writes the shutdown checkpoint, it checks
afterwards whether any WAL has been written since it started and
throws a PANIC if so.  At that point, only walsenders are still
active, so one might think this could not happen, but walsenders can
also generate WAL, for instance in BASE_BACKUP and logical decoding
related commands (e.g. via hint bits).  So they can trigger this panic
if such a command is run while the shutdown checkpoint is being
written.

To fix this, divide the walsender shutdown into two phases.  First,
checkpointer, itself triggered by postmaster, sends a
PROCSIG_WALSND_INIT_STOPPING signal to all walsenders.  If the backend
is idle or runs an SQL query this causes the backend to shutdown, if
logical replication is in progress all existing WAL records are
processed followed by a shutdown.  Otherwise this causes the walsender
to switch to the "stopping" state. In this state, the walsender will
reject any further replication commands. The checkpointer begins the
shutdown checkpoint once all walsenders are confirmed as
stopping. When the shutdown checkpoint finishes, the postmaster sends
us SIGUSR2. This instructs walsender to send any outstanding WAL,
including the shutdown checkpoint record, wait for it to be replicated
to the standby, and then exit.

Author: Andres Freund, based on an earlier patch by Michael Paquier
Reported-By: Fujii Masao, Andres Freund
Reviewed-By: Michael Paquier
Discussion: https://postgr.es/m/20170602002912.tqlwn4gymzlxpvs2@alap3.anarazel.de
Backpatch: 9.4, where logical decoding was introduced
2017-06-05 19:18:15 -07:00
Andres Freund 47fd420fb4 Have walsenders participate in procsignal infrastructure.
The non-participation in procsignal was a problem for both changes in
master, e.g. parallelism not working for normal statements run in
walsender backends, and older branches, e.g. recovery conflicts and
catchup interrupts not working for logical decoding walsenders.

This commit thus replaces the previous WalSndXLogSendHandler with
procsignal_sigusr1_handler.  In branches since db0f6cad48 that can
lead to additional SetLatch calls, but that only rarely seems to make
a difference.

Author: Andres Freund
Reviewed-By: Michael Paquier
Discussion: https://postgr.es/m/20170421014030.fdzvvvbrz4nckrow@alap3.anarazel.de
Backpatch: 9.4, earlier commits don't seem to benefit sufficiently
2017-06-05 19:18:15 -07:00
Andres Freund 703f148e98 Revert "Prevent panic during shutdown checkpoint"
This reverts commit 086221cf6b, which
was made to master only.

The approach implemented in the above commit has some issues.  While
those could easily be fixed incrementally, doing so would make
backpatching considerably harder, so instead first revert this patch.

Discussion: https://postgr.es/m/20170602002912.tqlwn4gymzlxpvs2@alap3.anarazel.de
2017-06-05 19:18:15 -07:00
Peter Eisentraut 41a21bf9b4 Don't set application_name in logical replication workers
This was bothering some people because it's not the intended use of
application_name and it makes the default view of pg_stat_activity
bulky.
2017-06-05 22:16:02 -04:00
Peter Eisentraut 9907b55ceb Fix ALTER SUBSCRIPTION grammar ambiguity
There was a grammar ambiguity between SET PUBLICATION name REFRESH and
SET PUBLICATION SKIP REFRESH, because SKIP is not a reserved word.  To
resolve that, fold the refresh choice into the WITH options.  Refreshing
is the default now.

Reported-by: tushar <tushar.ahuja@enterprisedb.com>
2017-06-05 21:43:25 -04:00
Peter Eisentraut 06bfb801c7 Ignore WL_POSTMASTER_DEATH latch event in single user mode
Otherwise code that uses this will abort with an assertion failure,
because postmaster_alive_fds are not initialized.

Reported-by: tushar <tushar.ahuja@enterprisedb.com>
2017-06-05 21:03:35 -04:00
Tom Lane 3e60c6f723 Code review for shm_toc.h/.c.
Declare the toc_nentry field as uint32 not Size.  Since shm_toc_lookup()
reads the field without any lock, it has to be atomically readable, and
we do not assume that for fields wider than 32 bits.  Performance would
be impossibly bad for entry counts approaching 2^32 anyway, so there is
no need to try to preserve maximum width here.

This is probably an academic issue, because even if reading int64 isn't
atomic, the high order half would never change in practice.  Still, it's
a coding rule violation, so let's fix it.

Adjust some other not-terribly-well-chosen data types too, and copy-edit
some comments.  Make shm_toc_attach's Asserts consistent with
shm_toc_create's.

None of this looks to be a live bug, so no need for back-patch.

Discussion: https://postgr.es/m/16984.1496679541@sss.pgh.pa.us
2017-06-05 14:50:59 -04:00
Tom Lane d466335064 Don't be so trusting that shm_toc_lookup() will always succeed.
Given the possibility of race conditions and so on, it seems entirely
unsafe to just assume that shm_toc_lookup() always finds the key it's
looking for --- but that was exactly what all but one call site were
doing.  To fix, add a "bool noError" argument, similarly to what we
have in many other functions, and throw an error on an unexpected
lookup failure.  Remove now-redundant Asserts that a rather random
subset of call sites had.

I doubt this will throw any light on buildfarm member lorikeet's
recent failures, because if an unnoticed lookup failure were involved,
you'd kind of expect a null-pointer-dereference crash rather than the
observed symptom.  But you never know ... and this is better coding
practice even if it never catches anything.

Discussion: https://postgr.es/m/9697.1496675981@sss.pgh.pa.us
2017-06-05 12:05:42 -04:00
Heikki Linnakangas af51fea039 Fix typo in error message.
Daniele Varrazzo

Discussion: https://www.postgresql.org/message-id/CA+mi_8bqY5THP8hLKKSdMEr5GCz6M=hD6_uLbvFeyEBfwqUxeA@mail.gmail.com
2017-06-05 11:38:26 +03:00
Tom Lane e7941a9766 Replace over-optimistic Assert in partitioning code with a runtime test.
get_partition_parent felt that it could simply Assert that systable_getnext
found a tuple.  This is unlike any other caller of that function, and it's
unsafe IMO --- in fact, the reason I noticed it was that the Assert failed.
(OK, I was working with known-inconsistent catalog contents, but I wasn't
expecting the DB to fall over quite that violently.  The behavior in a
non-assert-enabled build wouldn't be very nice, either.)  Fix it to do what
other callers do, namely an actual runtime-test-and-elog.

Also, standardize the wording of elog messages that are complaining about
unexpected failure of systable_getnext.  90% of them say "could not find
tuple for <object>", so make the remainder do likewise.  Many of the
holdouts were using the phrasing "cache lookup failed", which is outright
misleading since no catcache search is involved.
2017-06-04 16:20:03 -04:00
Tom Lane 9db7d47f90 #ifdef out assorted unused GEQO code.
I'd always assumed that backend/optimizer/geqo/'s remarkably poor
showing on code coverage metrics was because we weren't exercising
it much in the regression tests.  But it turns out that a good chunk
of the problem is that there's a bunch of code that is physically
unreachable (because the calls to it are #ifdef'd out in geqo_main.c)
but is being built anyway.  Making the called code have #if guards
similar to the calling code saves a couple of kilobytes of executable
size and should make the coverage numbers more reflective of reality.

It's arguable that we should just delete all the unused recombination
mechanisms altogether, but I didn't feel a need to go that far today.
2017-06-04 13:34:05 -04:00
Tom Lane 0d18852666 Disallow CREATE INDEX if table is already in use in current session.
If we allow this, whatever outer command has the table open will not know
about the new index and may fail to update it as needed, as shown in a
report from Laurenz Albe.  We already had such a prohibition in place for
ALTER TABLE, but the CREATE INDEX syntax missed the check.

Fixing it requires an API change for DefineIndex(), which conceivably
would break third-party extensions if we were to back-patch it.  Given
how long this problem has existed without being noticed, fixing it in
the back branches doesn't seem worth that risk.

Discussion: https://postgr.es/m/A737B7A37273E048B164557ADEF4A58B53A4DC9A@ntex2010i.host.magwien.gv.at
2017-06-04 12:02:41 -04:00
Alvaro Herrera 55a70a023c Assorted translatable string fixes
Mark our rusage reportage string translatable; remove quotes from type
names; unify formatting of very similar messages.
2017-06-04 11:41:16 -04:00
Tom Lane 5936d25f81 Remove dead variables.
Commit 512c7356b left a couple of variables unused except for being set.
My compiler didn't whine about this, but some buildfarm members did.
2017-06-03 20:35:52 -04:00
Tom Lane 512c7356b6 Fix <> and pattern-NOT-match estimators to handle nulls correctly.
These estimators returned 1 minus the corresponding equality/match
estimate, which is incorrect: we need to subtract off the fraction
of nulls in the column, since those are neither equal nor not equal
to the comparison value.  The error only becomes obvious if the
nullfrac is large, but it could be very bad in a mostly-nulls
column, as reported in bug #14676 from Marko Tiikkaja.

To fix the <> case, refactor eqsel() and neqsel() to call a common
support routine, which can be made to account for nullfrac correctly.
The pattern-match cases were already factored that way, and it was
simply an oversight that patternsel() wasn't subtracting off nullfrac.

neqjoinsel() has a similar problem, but since we're elsewhere discussing
changing its behavior entirely, I left it alone for now.

This is a very longstanding bug, but I'm hesitant to back-patch a fix for
it.  Given the lack of prior complaints, such cases must not come up often,
so it's probably not worth the risk of destabilizing plans in stable
branches.

Discussion: https://postgr.es/m/20170529153847.4275.95416@wrigleys.postgresql.org
2017-06-03 14:36:25 -04:00
Tom Lane 23886581b5 Fix old corner-case logic error in final_cost_nestloop().
When costing a nestloop with stop-at-first-inner-match semantics, and a
non-indexscan inner path, final_cost_nestloop() wants to charge the full
scan cost of the inner rel at least once, with additional scans charged
at inner_rescan_run_cost which might be less.  However the logic for
doing this effectively assumed that outer_matched_rows is at least 1.
If it's zero, which is not unlikely for a small outer rel, we ended up
charging inner_run_cost plus N times inner_rescan_run_cost, as much as
double the correct charge for an outer rel with only one row that
we're betting won't be matched.  (Unless the inner rel is materialized,
in which case it has very small inner_rescan_run_cost and the cost
is not so far off what it should have been.)

The upshot of this was that the planner had a tendency to select plans
that failed to make effective use of the stop-at-first-inner-match
semantics, and that might have Materialize nodes in them even when the
predicted number of executions of the Materialize subplan was only 1.
This was not so obvious before commit 9c7f5229a, because the case only
arose in connection with semi/anti joins where there's not freedom to
reverse the join order.  But with the addition of unique-inner joins,
it could result in some fairly bad planning choices, as reported by
Teodor Sigaev.  Indeed, some of the test cases added by that commit
have plans that look dubious on closer inspection, and are changed
by this patch.

Fix the logic to ensure that we don't charge for too many inner scans.
I chose to adjust it so that the full-freight scan cost is associated
with an unmatched outer row if possible, not a matched one, since that
seems like a better model of what would happen at runtime.

This is a longstanding bug, but given the lesser impact in back branches,
and the lack of field complaints, I won't risk a back-patch.

Discussion: https://postgr.es/m/CAKJS1f-LzkUsFxdJ_-Luy38orQ+AdEXM5o+vANR+-pHAWPSecg@mail.gmail.com
2017-06-03 13:48:15 -04:00
Peter Eisentraut 66b84fa82f Receive invalidation messages correctly in tablesync worker
We didn't accept any invalidation messages until the whole sync process
had finished (because it flattens all the remote transactions in the
single one).  So the sync worker didn't learn about subscription
changes/drop until it has finished.  This could lead to "orphaned" sync
workers.

Author: Petr Jelinek <petr.jelinek@2ndquadrant.com>
Reported-by: Masahiko Sawada <sawada.mshk@gmail.com>
2017-06-03 11:40:05 -04:00
Peter Eisentraut 3c9bc2157a Make tablesync worker exit when apply dies while it was waiting for it
This avoids "orphaned" sync workers.

This was caused by a thinko in wait_for_sync_status_change.

Author: Petr Jelinek <petr.jelinek@2ndquadrant.com>
Reported-by: Masahiko Sawada <sawada.mshk@gmail.com>
2017-06-03 09:20:56 -04:00
Andres Freund 34aebcf42a Allow parallelism in COPY (query) TO ...;
Previously this was not allowed, as copy.c didn't set the
CURSOR_OPT_PARALLEL_OK flag when planning the query. Set it.

While the lack of parallel query for COPY isn't strictly speaking a
bug, it does prevent parallelism from being used in a facility
commonly used to run long running queries. Thus backpatch to 9.6.

Author: Andres Freund
Discussion: https://postgr.es/m/20170531231958.ihanapplorptykzm@alap3.anarazel.de
Backpatch: 9.6, where parallelism was introduced.
2017-06-02 19:11:15 -07:00
Peter Eisentraut 420a0392ef Remove replication slot name check from ReplicationSlotAcquire()
When trying to access a replication slot that is supposed to already
exist, we don't need to check the naming rules again.  If the slot
does not exist, we will then get a "does not exist" error message, which
is generally more useful from the perspective of an end user.
2017-06-02 15:16:57 -04:00
Peter Eisentraut 9fcf670c2e Fix signal handling in logical replication workers
The logical replication worker processes now use the normal die()
handler for SIGTERM and CHECK_FOR_INTERRUPTS() instead of custom code.
One problem before was that the apply worker would not exit promptly
when a subscription was dropped, which could lead to deadlocks.

Author: Petr Jelinek <petr.jelinek@2ndquadrant.com>
Reported-by: Masahiko Sawada <sawada.mshk@gmail.com>
2017-06-02 14:49:23 -04:00
Magnus Hagander acbd8375e9 Fix copy/paste mistake in comment
Amit Langote
2017-06-02 11:18:24 +02:00
Magnus Hagander 483373979b Fix typo in comment
Masahiko Sawada
2017-06-02 09:40:54 +02:00
Peter Eisentraut 6812330f1c Reorganize logical replication worker disconnect code
Move the walrcv_disconnect() calls into the before_shmem_exit handler.
This makes sure the call is always made even during exit by signal, it
saves some duplicate code, and it makes the logic more similar to
walreceiver.c.

Author: Petr Jelinek <petr.jelinek@2ndquadrant.com>
2017-06-01 23:16:20 -04:00
Andres Freund 665104557f Modify sequence catalog tuple before invoking post alter hook.
This seems to have been broken in the commit (1753b1b027) that
moved the sequence definition into pg_sequence.

Author: Andres Freund
Discussion: https://postgr.es/m/20170601000716.qxg7c46ukkiljjb3@alap3.anarazel.de
Backpatch: Bug is in master/v10 only
2017-06-01 14:19:33 -07:00
Andres Freund 3d79013b97 Make ALTER SEQUENCE, including RESTART, fully transactional.
Previously the changes to the "data" part of the sequence, i.e. the
one containing the current value, were not transactional, whereas the
definition, including minimum and maximum value were.  That leads to
odd behaviour if a schema change is rolled back, with the potential
that out-of-bound sequence values can be returned.

To avoid the issue create a new relfilenode fork whenever ALTER
SEQUENCE is executed, similar to how TRUNCATE ... RESTART IDENTITY
already is already handled.

This commit also makes ALTER SEQUENCE RESTART transactional, as it
seems to be too confusing to have some forms of ALTER SEQUENCE behave
transactionally, some forms not.  This way setval() and nextval() are
not transactional, but DDL is, which seems to make sense.

This commit also rolls back parts of the changes made in 3d092fe540
and f8dc1985f as they're now not needed anymore.

Author: Andres Freund
Discussion: https://postgr.es/m/20170522154227.nvafbsm62sjpbxvd@alap3.anarazel.de
Backpatch: Bug is in master/v10 only
2017-06-01 14:19:33 -07:00
Robert Haas 814573e6c4 Restore accidentally-removed line.
Commit 88e66d193f is to blame.

Masahiko Sawada

Discussion: http://postgr.es/m/CAD21AoAXeb7O4hgg+efs8JT_SxpR4doAH5c5s-Z5WoRLstBZJA@mail.gmail.com
2017-05-31 14:24:22 -04:00
Tom Lane 54e839fe29 Sort syscache identifiers into alphabetical order.
Not much point in having a convention about this if we don't enforce it.

Mark Dilger

Discussion: https://postgr.es/m/7F67FBEF-C3B3-404E-8EC6-E02ACB15D894@gmail.com
2017-05-30 18:47:13 -04:00
Alvaro Herrera b4da9d0e1e brin: Don't crash on auto-summarization
We were trying to free a pointer into a shared buffer, which never
works; and we were failing to release the buffer lock appropriately.
Fix those omissions.

While at it, improve documentation for brinGetTupleForHeapBlock, the
inadequacy of which evidently caused these bugs in the first place.

Reported independently by Zhou Digoal (bug #14668) and Alexander Sosna.

Discussion: https://postgr.es/m/8c31c11b-6adb-228d-22c2-4ace89fc9209@credativ.de
Discussion: https://postgr.es/m/20170524063323.29941.46339@wrigleys.postgresql.org
2017-05-30 18:17:09 -04:00
Alvaro Herrera e6785a5ca1 Fix wording in amvalidate error messages
Remove some gratuituous message differences by making the AM name
previously embedded in each message be a %s instead.  While at it, get
rid of terminology that's unclear and unnecessary in one message.

Discussion: https://postgr.es/m/20170523001557.bq2hbq7hxyvyw62q@alvherre.pgsql
2017-05-30 15:45:42 -04:00
Tom Lane 80f583ffe9 Fix omission of locations in outfuncs/readfuncs partitioning node support.
We could have limped along without this for v10, which was my intention
when I annotated the bug in commit 76a3df6e5.  But consensus is that it's
better to fix it now and take the cost of a post-beta1 initdb (which is
needed because these node types are stored in pg_class.relpartbound).

Since we're forcing initdb anyway, take the opportunity to make the node
type identification strings match the node struct names, instead of being
randomly different from them.

Discussion: https://postgr.es/m/E1dFBEX-0004wt-8t@gemulon.postgresql.org
2017-05-30 11:32:41 -04:00
Tom Lane d5cb3bab56 Fix improper quoting of format_type_be() output.
Per our message style guidelines, error messages incorporating the
results of format_type_be() and its siblings should not add quotes
around those results, because those functions already add quotes
at need.  Fix a few places that hadn't gotten that memo.
2017-05-29 21:48:26 -04:00
Tom Lane 68cff231e3 Make edge-case behavior of jsonb_populate_record match json_populate_record
json_populate_record throws an error if asked to convert a JSON scalar
or array into a composite type.  jsonb_populate_record was returning
a record full of NULL fields instead.  It seems better to make it
throw an error for this case as well.

Nikita Glukhov

Discussion: https://postgr.es/m/fbd1d566-bba0-a3de-d6d0-d3b1d7c24ff2@postgrespro.ru
2017-05-29 19:29:42 -04:00
Tom Lane e45c5be99d Fix thinko in JsObjectSize() macro.
The macro gave the wrong answers for a JsObject with is_json == 0:
it would return 1 if jsonb_cont == NULL, or if that wasn't NULL,
it would return 1 for any non-zero size.

We could fix that, but the only use of this macro at present is in the
JsObjectIsEmpty() macro, so it seems simpler and clearer to get rid of
JsObjectSize() and put corrected logic into JsObjectIsEmpty().

Thinko in commit cf35346e8, so no need for back-patch.

Nikita Glukhov

Discussion: https://postgr.es/m/fbd1d566-bba0-a3de-d6d0-d3b1d7c24ff2@postgrespro.ru
2017-05-29 18:51:56 -04:00
Tom Lane ce50945295 Allow NumericOnly to be "+ FCONST".
The NumericOnly grammar production accepted ICONST, + ICONST, - ICONST,
FCONST, and - FCONST, but for some reason not + FCONST.  This led to
strange inconsistencies like

regression=# set random_page_cost = +4;
SET
regression=# set random_page_cost = 4000000000;
SET
regression=# set random_page_cost = +4000000000;
ERROR:  syntax error at or near "4000000000"

(because 4000000000 is too large to be an ICONST).  While there's
no actual functional reason to need to write a "+", if we allow
it for integers it seems like we should allow it for numerics too.

It's been like that forever, so back-patch to all supported branches.

Discussion: https://postgr.es/m/30908.1496006184@sss.pgh.pa.us
2017-05-29 15:19:07 -04:00
Tom Lane dced55dafe More code review for get_qual_for_list().
Avoid trashing the input PartitionBoundSpec; while that might be safe for
current callers, it's certainly trouble waiting to happen.  In the same
vein, make sure that all of the result data structure is freshly palloc'd,
rather than some of it being pointers into the input data structures
(which we don't know the lifespans of).

Simplify the logic for tacking on IS NULL or IS NOT NULL conditions some
more; commit 85c2b9a15 left a lot on the table there.  And rearrange the
construction of the nodes into (what seems to me) a more logical order.

In passing, make sure that get_qual_for_range() also returns a freshly
palloc'd structure, since there's no value in having that guarantee for
only one kind of partitioning.  And improve some comments there.

Jeevan Ladhe, with further tweaking by me

Discussion: https://postgr.es/m/CAOgcT0MAcYoMs93W80iTUf_dP36=1mZQzeUk+nnwY_-qWDrCfw@mail.gmail.com
2017-05-29 14:24:28 -04:00
Magnus Hagander 917d91285f Fix typo in comment
Masahiko Sawada
2017-05-29 16:29:19 +02:00
Tom Lane 76a3df6e5e Code review focused on new node types added by partitioning support.
Fix failure to check that we got a plain Const from const-simplification of
a coercion request.  This is the cause of bug #14666 from Tian Bing: there
is an int4 to money cast, but it's only stable not immutable (because of
dependence on lc_monetary), resulting in a FuncExpr that the code was
miserably unequipped to deal with, or indeed even to notice that it was
failing to deal with.  Add test cases around this coercion behavior.

In view of the above, sprinkle the code liberally with castNode() macros,
in hope of catching the next such bug a bit sooner.  Also, change some
functions that were randomly declared to take Node* to take more specific
pointer types.  And change some struct fields that were declared Node*
but could be given more specific types, allowing removal of assorted
explicit casts.

Place PARTITION_MAX_KEYS check a bit closer to the code it's protecting.
Likewise check only-one-key-for-list-partitioning restriction in a less
random place.

Avoid not-per-project-style usages like !strcmp(...).

Fix assorted failures to avoid scribbling on the input of parse
transformation.  I'm not sure how necessary this is, but it's entirely
silly for these functions to be expending cycles to avoid that and not
getting it right.

Add guards against partitioning on system columns.

Put backend/nodes/ support code into an order that matches handling
of these node types elsewhere.

Annotate the fact that somebody added location fields to PartitionBoundSpec
and PartitionRangeDatum but forgot to handle them in
outfuncs.c/readfuncs.c.  This is fairly harmless for production purposes
(since readfuncs.c would just substitute -1 anyway) but it's still bogus.
It's not worth forcing a post-beta1 initdb just to fix this, but if we
have another reason to force initdb before 10.0, we should go back and
clean this up.

Contrariwise, somebody added location fields to PartitionElem and
PartitionSpec but forgot to teach exprLocation() about them.

Consolidate duplicative code in transformPartitionBound().

Improve a couple of error messages.

Improve assorted commentary.

Re-pgindent the files touched by this patch; this affects a few comment
blocks that must have been added quite recently.

Report: https://postgr.es/m/20170524024550.29935.14396@wrigleys.postgresql.org
2017-05-28 23:20:28 -04:00
Tom Lane 94aced8cd0 Move autogenerated array types out of the way during ALTER ... RENAME.
Commit 9aa3c782c added code to allow CREATE TABLE/CREATE TYPE to not fail
when the desired type name conflicts with an autogenerated array type, by
dint of renaming the array type out of the way.  But I (tgl) overlooked
that the same case arises in ALTER TABLE/TYPE RENAME.  Fix that too.
Back-patch to all supported branches.

Report and patch by Vik Fearing, modified a bit by me

Discussion: https://postgr.es/m/0f4ade49-4f0b-a9a3-c120-7589f01d1eb8@2ndquadrant.com
2017-05-26 15:16:59 -04:00
Heikki Linnakangas 505b5d2f86 Abort authentication if the client selected an invalid SASL mechanism.
Previously, the server would log an error, but then try to continue with
SCRAM-SHA-256 anyway.

Michael Paquier

Discussion: https://www.postgresql.org/message-id/CAB7nPqR0G5aF2_kc_LH29knVqwvmBc66TF5DicvpGVdke68nKw@mail.gmail.com
2017-05-25 08:50:47 -04:00
Peter Eisentraut 073ce405d6 Fix table syncing with different column order
Logical replication supports replicating between tables with different
column order.  But this failed for the initial table sync because of a
logic error in how the column list for the internal COPY command was
composed.  Fix that and also add a test.

Also fix a minor omission in the column name mapping cache.  When
creating the mapping list, it would not skip locally dropped columns.
So if a remote column had the same name as a locally dropped
column (...pg.dropped...), then the expected error would not occur.
2017-05-24 19:40:30 -04:00
Peter Eisentraut 92ecb148e5 Improve logical replication worker log messages
Reduce some redundant messages to DEBUG1.  Be clearer about the
distinction between apply workers and table synchronization workers.
Add subscription and table name where possible.

Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
2017-05-24 18:57:56 -04:00
Robert Haas 85c2b9a15a Code review of get_qual_for_list.
We need not consider the case where both nulltest1 and nulltest2 are
NULL; the partition either accepts nulls or it does not.

Jeevan Ladhe.  I added an assertion.
2017-05-24 16:45:58 -04:00
Tom Lane 9ae2661fe1 Tighten checks for whitespace in functions that parse identifiers etc.
This patch replaces isspace() calls with scanner_isspace() in functions
that are likely to be presented with non-ASCII input.  isspace() has
the small advantage that it will correctly recognize no-break space
in single-byte encodings (such as LATIN1); but it cannot work successfully
for any multibyte character, and depending on platform it might return
false positive results for some fragments of multibyte characters.  That's
disastrous for functions that are trying to discard whitespace between
valid strings, as noted in bug #14662 from Justin Muise.  Even treating
no-break space as whitespace is pretty questionable for the usages touched
here, because the core scanner would think it is an identifier character.

Affected functions are parse_ident(), parseNameAndArgTypes (underlying
regprocedurein() and siblings), SplitIdentifierString (used for parsing
GUCs and options that are qualified names or lists of names), and
SplitDirectoriesString (used for parsing GUCs that are lists of
directories).

All the functions adjusted here are parsing SQL identifiers and similar
constructs, so it's reasonable to insist that their definition of
whitespace match the core scanner.  So we can hope that this won't cause
many backwards-compatibility problems.  I've left alone isspace() calls
in places that aren't really expecting any non-ASCII input characters,
such as float8in().

Back-patch to all supported branches.

Discussion: https://postgr.es/m/10129.1495302480@sss.pgh.pa.us
2017-05-24 15:28:34 -04:00
Magnus Hagander 312bac54cc Fix typo in comment
Author: Masahiko Sawada
2017-05-22 09:10:02 +02:00
Tom Lane d761fe2182 Fix precision and rounding issues in money multiplication and division.
The cash_div_intX functions applied rint() to the result of the division.
That's not merely useless (because the result is already an integer) but
it causes precision loss for values larger than 2^52 or so, because of
the forced conversion to float8.

On the other hand, the cash_mul_fltX functions neglected to apply rint() to
their multiplication results, thus possibly causing off-by-one outputs.

Per C standard, arithmetic between any integral value and a float value is
performed in float format.  Thus, cash_mul_flt4 and cash_div_flt4 produced
answers good to only about six digits, even when the float value is exact.
We can improve matters noticeably by widening the float inputs to double.
(It's tempting to consider using "long double" arithmetic if available,
but that's probably too much of a stretch for a back-patched fix.)

Also, document that cash_div_intX operators truncate rather than round.

Per bug #14663 from Richard Pistole.  Back-patch to all supported branches.

Discussion: https://postgr.es/m/22403.1495223615@sss.pgh.pa.us
2017-05-21 13:05:16 -04:00
Tom Lane 5c837ddd70 Rethink flex flags for syncrep_scanner.l.
Using flex's -i switch to achieve case-insensitivity is not a very safe
practice, because the scanner's behavior may then depend on the locale
that flex was invoked in.  In the particular example at hand, that's
not academic: the possible matches for "FIRST" will be different in a
Turkish locale than elsewhere.  Do it the hard way instead, as our
other scanners do.

Also, drop use of -b -CF -p, because this scanner is only used when
parsing the contents of a GUC variable.  That's not done often, and
the amount of text to be parsed can be expected to be trivial, so
prioritizing scanner speed over code size seems like quite the wrong
tradeoff.  Using flex's default optimization options reduces the
size of syncrep_gram.o by more than 50%.

The case-insensitivity problem is new in HEAD (cf commit 3901fd70c).
The poor choice of optimization flags exists also in 9.6, but it doesn't
seem important enough to back-patch.

Discussion: https://postgr.es/m/24403.1495225931@sss.pgh.pa.us
2017-05-19 18:05:20 -04:00
Peter Eisentraut e807d8b163 Fix mistake in error message
Reported-by: tushar <tushar.ahuja@enterprisedb.com>
Author: Dilip Kumar <dilipbalaut@gmail.com>
2017-05-19 16:30:02 -04:00
Robert Haas b522759508 Copy partitioned_rels lists to avoid shared substructure.
Otherwise, set_plan_refs() can get applied to the same list
multiple times through different references, leading to chaos.

Amit Langote, Dilip Kumar, and Robert Haas, reviewed by Ashutosh
Bapat.  Original report by Sveinn Sveinsson.

Discussion: http://postgr.es/m/20170517141151.1435.79890@wrigleys.postgresql.org
2017-05-19 15:26:05 -04:00
Tom Lane cf5389f5b5 Fix misspelled struct tag.
This was evidently intended to match the struct's typedef name,
but it didn't quite.  Noted while testing find_typedefs.
2017-05-19 15:05:58 -04:00
Robert Haas ac8d7e1b83 Fix corruption of tableElts list by MergeAttributes().
Since commit e7b3349a8a, MergeAttributes
destructively modifies the input List, to which the caller's
CreateStmt still points.  One may wonder whether this was already a
bug, but commit f0e44751d7 made things
noticeably worse by adding additional destructive modifications so
that the caller's List might, in the case of creation a partitioned
table, no longer even be structurally valid.  Restore the status quo
ante by assigning the return value of MergeAttributes back to
stmt->tableElts in the caller.

In most of the places where DefineRelation is called, it doesn't
matter what stmt->tableElts points to here or whether it's valid or
not, because the caller doesn't use the statement for anything after
DefineRelation returns anyway.  However, ProcessUtilitySlow passes it
to EventTriggerCollectSimpleCommand, and that function tries to invoke
copyObject on it.  If any of the CreateStmt's substructure is invalid
at that point, undefined behavior will result.

One might wonder whether this whole area needs further revision -
perhaps DefineRelation() ought not to be destructively modifying the
caller-provided CreateStmt at all.  However, that would be a behavior
change for any event triggers using C code to inspect the CreateStmt,
so for now, just fix the crash.

Report by Amit Langote, who provided a somewhat different patch for it.

Discussion: http://postgr.es/m/bf6a39a7-100a-74bd-1156-3c16a1429d88@lab.ntt.co.jp
2017-05-19 15:02:16 -04:00
Peter Eisentraut 7f17ae0ad0 Fix argument name differences
Different names were used between function declaration and definition.
2017-05-19 14:47:56 -04:00
Heikki Linnakangas 866490a6b7 Fix compilation with --with-bsd-auth.
Commit 8d3b9cce81 added extra arguments to the sendAuthRequest function,
but neglected this caller inside #ifdef USE_BSD_AUTH.

Per report from Pierre-Emmanuel André.

Discussion: https://www.postgresql.org/message-id/20170519090336.whzmjzrsap6ktbgg@digipea.digitick.local
2017-05-19 12:21:55 +03:00
Heikki Linnakangas 94884e1c27 Make slab allocator work on platforms with MAXIMUM_ALIGNOF < sizeof(int).
Notably, m68k only needs 2-byte alignment. Per report from Christoph Berg.

Discussion: https://www.postgresql.org/message-id/20170517193957.fwntkgi6epuso5l2@msg.df7cb.de
2017-05-18 22:22:13 +03:00
Robert Haas 3ec76ff1f2 Don't explicitly mark range partitioning columns NOT NULL.
This seemed like a good idea originally because there's no way to mark
a range partition as accepting NULL, but that now seems more like a
current limitation than something we want to lock down for all time.
For example, there's a proposal to add the notion of a default
partition which accepts all rows not otherwise routed, which directly
conflicts with the idea that a range-partitioned table should never
allow nulls anywhere.  So let's change this while we still can, by
putting the NOT NULL test into the partition constraint instead of
changing the column properties.

Amit Langote and Robert Haas, reviewed by Amit Kapila

Discussion: http://postgr.es/m/8e2dd63d-c6fb-bb74-3c2b-ed6d63629c9d@lab.ntt.co.jp
2017-05-18 13:49:31 -04:00
Heikki Linnakangas 2df537e43f Fix typo in comment.
Daniel Gustafsson
2017-05-18 10:33:16 +03:00
Peter Eisentraut 6234569851 Improve CREATE SUBSCRIPTION option parsing
When creating a subscription with slot_name = NONE, we failed to check
that also create_slot = false and enabled = false were set.  This
created an invalid subscription and could later lead to a crash if a
NULL slot name was accessed.  Add more checks around that for
robustness.

Reported-by: tushar <tushar.ahuja@enterprisedb.com>
2017-05-17 20:47:37 -04:00
Bruce Momjian ce55481032 Post-PG 10 beta1 pgperltidy run 2017-05-17 19:01:23 -04:00
Bruce Momjian a6fd7b7a5f Post-PG 10 beta1 pgindent run
perltidy run not included.
2017-05-17 16:31:56 -04:00
Robert Haas b2e4399baa Code review for make_partition_op_expr.
It's better to use the actual keynum here rather than 0, because
someday someone might try to make list partitioning work with
multiple partitioning columns.

Jeevan Ladhe

Discussion: http://postgr.es/m/CAOgcT0M6-mx+dSX47JGJuJP1CKr4XssBFVmKNETt0OZYWpFr+w@mail.gmail.com
2017-05-17 14:31:48 -04:00
Robert Haas 236d6d462d Remove redundant has_null member from PartitionBoundInfoData.
Jeevan Ladhe, with some changes by me.

Discussion: http://postgr.es/m/CAOgcT0NZ_30-pjBpW2OgneV1ammArHkZDZ8B_KFC3q+_Xb2H9A@mail.gmail.com
2017-05-17 12:50:01 -04:00
Peter Eisentraut 3db22794b7 Add more tests for CREATE SUBSCRIPTION
Add some tests for parsing different option combinations.  Fix some of
the resulting error messages for recent changes in option naming.

Author: Masahiko Sawada <sawada.mshk@gmail.com>
2017-05-17 12:24:48 -04:00
Peter Eisentraut 944dc0f9ce Check relkind of tables in CREATE/ALTER SUBSCRIPTION
We used to only check for a supported relkind on the subscriber during
replication, which is needed to ensure that the setup is valid and we
don't crash.  But it's also useful to tell the user immediately when
CREATE or ALTER SUBSCRIPTION is executed that the relation being added
to the subscription is not of a supported relkind.

Author: Petr Jelinek <petr.jelinek@2ndquadrant.com>
Reported-by: tushar <tushar.ahuja@enterprisedb.com>
2017-05-16 22:57:16 -04:00
Tom Lane c079673dcb Preventive maintenance in advance of pgindent run.
Reformat various places in which pgindent will make a mess, and
fix a few small violations of coding style that I happened to notice
while perusing the diffs from a pgindent dry run.

There is one actual bug fix here: the need-to-enlarge-the-buffer code
path in icu_convert_case was obviously broken.  Perhaps it's unreachable
in our usage?  Or maybe this is just sadly undertested.
2017-05-16 20:36:35 -04:00
Tom Lane ddd243584a Fix leakage of memory context header in find_all_inheritors().
Commit 827d6f977 contained the same misunderstanding of hash_create's API
as commit 090010f2e.  As in 5d00b764c, remove the unnecessary layer of
memory context.  (This bug is less significant than the other one, since
the extra context would be under a relatively short-lived context, but
it's still a bug.)
2017-05-16 19:33:31 -04:00
Tom Lane 8b0b6303e9 Try to ensure that stats collector's receive buffer size is at least 100KB.
Since commit 4e37b3e15, buildfarm member frogmouth has been failing
occasionally with symptoms indicating that some expected stats data is
getting dropped.  The reason that that commit changed the behavior seems
probably to be that more data is getting shoved at the collector in a short
span of time.  In current sources, the stats test's first session sends
about 9KB of data while exiting, which is probably the same as what was
sent just before wait_for_stats() in the previous test design.  But now,
the test's second session is starting up concurrently, and it sends another
2KB (presumably reflecting its initial catalog accesses).  Since frogmouth
is running on Windows XP, which reputedly has a default socket receive
buffer size of only 8KB, it is not very surprising if this has put us over
the threshold where the receive buffer can overflow and drop messages.

The same mechanism could very easily explain the intermittent stats test
failures we've been seeing for years, since background processes such
as the bgwriter will sometimes send data concurrently with all this, and
could thus cause occasional buffer overflows.

Hence, insert some code into pgstat_init() to increase the stats socket's
receive buffer size to 100KB if it's less than that.  (On failure, emit a
LOG message, but keep going.)  Modern systems seem to have default sizes
in the range of 100KB-250KB, but older platforms don't.  I couldn't find
any platforms that wouldn't accept 100KB, so in theory this won't cause
any portability problems.

If this is successful at reducing the buildfarm failure rate in HEAD,
we should back-patch it, because it's certain that similar buffer overflows
happen in the field on platforms with small buffer sizes.  Going forward,
there might be an argument for trying to increase the buffer size even
more, but let's take a baby step first.

Discussion: https://postgr.es/m/22173.1494788088@sss.pgh.pa.us
2017-05-16 15:24:52 -04:00
Robert Haas 59f40566ca Fix relcache leak when row triggers on partitions are fired by COPY.
Thomas Munro, reviewed by Amit Langote

Discussion: http://postgr.es/m/CAEepm=15Jss-yhFApuKzxcoCuFnb8TR8iQiWMjG=CLYPx48QLw@mail.gmail.com
2017-05-16 12:46:32 -04:00
Robert Haas 0ad226f2ae Add missing apostrophe.
Masahiko Sawada

Discussion: http://postgr.es/m/CAD21AoAzaR_XV7j7Wk9-QYXaFoT8H4egKwXvFY63wc8Lw2C9cg@mail.gmail.com
2017-05-15 15:41:15 -04:00
Peter Eisentraut b1ff33fd9b Add assertion to quiet Coverity 2017-05-15 13:59:58 -04:00
Peter Eisentraut 82d24bab75 Translation updates
Source-Git-URL: git://git.postgresql.org/git/pgtranslation/messages.git
Source-Git-Hash: 398beeef4921df0956f917becd7b5669d2a8a5c4
2017-05-15 12:19:54 -04:00
Tom Lane 12590c5d33 Fix unsafe reference into relcache in constructed CommentStmt.
The CommentStmt made by RebuildConstraintComment() has to pstrdup the
relation name, else it will contain a dangling pointer after that
relcache entry is flushed.  (I'm less sure that pstrdup'ing conname
is necessary, but let's be safe.)  Failure to do this leads to weird
errors or crashes, as reported by Marko Elezovic.

Bug introduced by commit e42375fc8, so back-patch to 9.5 as that was.

Fix by David Rowley, regression test by Michael Paquier

Discussion: https://postgr.es/m/DB6PR03MB30775D58E732D4EB0C13725B9AE00@DB6PR03MB3077.eurprd03.prod.outlook.com
2017-05-15 11:33:44 -04:00
Peter Eisentraut f8dc1985fd Fix ALTER SEQUENCE locking
In 1753b1b027, the pg_sequence system
catalog was introduced.  This made sequence metadata changes
transactional, while the actual sequence values are still behaving
nontransactionally.  This requires some refinement in how ALTER
SEQUENCE, which operates on both, locks the sequence and the catalog.

The main problems were:

- Concurrent ALTER SEQUENCE causes "tuple concurrently updated" error,
  caused by updates to pg_sequence catalog.

- Sequence WAL writes and catalog updates are not protected by same
  lock, which could lead to inconsistent recovery order.

- nextval() disregarding uncommitted ALTER SEQUENCE changes.

To fix, nextval() and friends now lock the sequence using
RowExclusiveLock instead of AccessShareLock.  ALTER SEQUENCE locks the
sequence using ShareRowExclusiveLock.  This means that nextval() and
ALTER SEQUENCE block each other, and ALTER SEQUENCE on the same sequence
blocks itself.  (This was already the case previously for the OWNER TO,
RENAME, and SET SCHEMA variants.)  Also, rearrange some code so that the
entire AlterSequence is protected by the lock on the sequence.

As an exception, use reduced locking for ALTER SEQUENCE ... RESTART.
Since that is basically a setval(), it does not require the full locking
of other ALTER SEQUENCE actions.  So check whether we are only running a
RESTART and run with less locking if so.

Reviewed-by: Michael Paquier <michael.paquier@gmail.com>
Reported-by: Jason Petersen <jason@citusdata.com>
Reported-by: Andres Freund <andres@anarazel.de>
2017-05-15 10:19:57 -04:00
Tom Lane 5d00b764cd Make pgstat tabstat lookup hash table less fragile.
Code review for commit 090010f2e.

Fix cases where an elog(ERROR) partway through a function would leave the
persistent data structures in a corrupt state.  pgstat_report_stat got this
wrong by invalidating PgStat_TableEntry structs before removing hashtable
entries pointing to them, and get_tabstat_entry got it wrong by ignoring
the possibility of palloc failure after it had already created a hashtable
entry.

Also, avoid leaking a memory context per transaction, which the previous
code did through misunderstanding hash_create's API.  We do not need to
create a context to hold the hash table; hash_create will do that.
(The leak wasn't that large, amounting to only a memory context header
per iteration, but it's still surprising that nobody noticed it yet.)
2017-05-14 22:52:49 -04:00
Robert Haas edbe2a2936 Attempt to fix compiler warning.
Per a report from Tom Lane, newer versions of gcc apparently think
that partexprs_item_saved can be used uninitialized.  Try to convince
them otherwise.
2017-05-14 20:59:28 -04:00
Tom Lane e84c019598 Fix maintenance hazards caused by ill-considered use of default: cases.
Remove default cases from assorted switches over ObjectClass and some
related enum types, so that we'll get compiler warnings when someone
adds a new enum value without accounting for it in all these places.

In passing, re-order some switch cases as needed to match the declaration
of enum ObjectClass.  OK, that's just neatnik-ism, but I dislike code
that looks like it was assembled with the help of a dartboard.

Discussion: https://postgr.es/m/20170512221010.nglatgt5azzdxjlj@alvherre.pgsql
2017-05-14 13:32:59 -04:00
Tom Lane b5b0db19b8 Fix handling of extended statistics during ALTER COLUMN TYPE.
ALTER COLUMN TYPE on a column used by a statistics object fails since
commit 928c4de30, because the relevant switch in ATExecAlterColumnType
is unprepared for columns to have dependencies from OCLASS_STATISTIC_EXT
objects.

Although the existing types of extended statistics don't actually need us
to do any work for a column type change, it seems completely indefensible
that that assumption is hidden behind the failure of an unrelated module
to contain any code for the case.  Hence, create and call an API function
in statscmds.c where the assumption can be explained, and where we could
add code to deal with the problem when it inevitably becomes real.

Also, the reason this wasn't handled before, neither for extended stats
nor for the last half-dozen new OCLASS kinds :-(, is that the default:
in that switch suppresses compiler warnings, allowing people to miss the
need to consider it when adding an OCLASS.  We don't really need a default
because surely getObjectClass should only return valid values of the enum;
so remove it, and add the missed OCLASS entries where they should be.

Discussion: https://postgr.es/m/20170512221010.nglatgt5azzdxjlj@alvherre.pgsql
2017-05-14 12:22:25 -04:00
Tom Lane f674743487 Remove no-longer-needed fields of Hash plan nodes.
skewColType/skewColTypmod are no longer used in the wake of commit
9aab83fc5, and seem unlikely to be wanted in future, so let's drop 'em.

Discussion: https://postgr.es/m/16364.1494520862@sss.pgh.pa.us
2017-05-14 11:07:40 -04:00
Tom Lane f04c9a6146 Standardize terminology for pg_statistic_ext entries.
Consistently refer to such an entry as a "statistics object", not just
"statistics" or "extended statistics".  Previously we had a mismash of
terms, accompanied by utter confusion as to whether the term was
singular or plural.  That's not only grating (at least to the ear of
a native English speaker) but could be outright misleading, eg in error
messages that seemed to be referring to multiple objects where only one
could be meant.

This commit fixes the code and a lot of comments (though I may have
missed a few).  I also renamed two new SQL functions,
pg_get_statisticsextdef -> pg_get_statisticsobjdef
pg_statistic_ext_is_visible -> pg_statistics_obj_is_visible
to conform better with this terminology.

I have not touched the SGML docs other than fixing those function
names; the docs certainly need work but it seems like a separable task.

Discussion: https://postgr.es/m/22676.1494557205@sss.pgh.pa.us
2017-05-14 10:55:01 -04:00
Andres Freund 524dbc1433 Avoid superfluous work for commits during logical slot creation.
Before 955a684e04 logical decoding snapshot maintenance needed to
cope with transactions it might not have seen in their entirety. For
such transactions we'd to assume they modified the catalog (could have
happened before we were watching), and thus a new snapshot had to be
built, and distributed to concurrently running transactions.

That's problematic because building a new snapshot isn't that cheap ,
especially as the the array of committed transactions needs to be
sorted.  When creating a slot on a server with a lot of transactions,
this could make logical slot creation infeasibly expensive.

After 955a684e04 there's no need to deal with transaction that
aren't guaranteed to be fully observable.  That allows to avoid
building snapshots for transactions that haven't modified catalog,
even before reaching consistency.

While this isn't necessarily a bugfix, slot creation being impossible
in some production workloads, is severe enough to warrant
backpatching.

Author: Andres Freund, based on a quite different patch from Petr Jelinek
Analyzed-By: Petr Jelinek
Reviewed-By: Petr Jelinek
Discussion: https://postgr.es/m/f37e975c-908f-858e-707f-058d3b1eb214@2ndquadrant.com
Backpatch: 9.4-, where logical decoding has been introduced
2017-05-13 15:06:40 -07:00
Andres Freund 955a684e04 Fix race condition leading to hanging logical slot creation.
The snapshot assembly during the creation of logical slots relied
waiting for transactions in xl_running_xacts to end, by checking for
their commit/abort records.  Unfortunately, despite locking, it is
possible to see an xl_running_xact record listing transactions as
ready, that have already WAL-logged an commit/abort record, as the
locking just prevents the ProcArray to be adjusted, and the commit
record has to be logged first.

That lead to either delayed or hanging snapshot creation, because
snapbuild.c would wait "forever" to see commit/abort records for some
transactions.  That hang resolved only if a xl_running_xacts record
without any running transactions happened to be logged, far from
certain on a busy server.

It's impractical to prevent that via more heavyweight locking, the
likelihood of deadlocks and significantly increased contention would
be too big.

Instead change the initial snapshot creation to be solely based on
tracking the oldest running transaction via
xl_running_xacts->oldestRunningXid - that actually ends up
significantly simplifying the code.  That has two disadvantages:
1) Because we cannot fully "trust" the contents of xl_running_xacts,
   we cannot use it to build the initial snapshot.  Instead we have to
   wait twice for all running transactions to finish.
2) Previously a slot, unless the race occurred, could be created when
   the all transaction perceived as running based on commit/abort
   records, now we have to wait for the next xl_running_xacts record.
To address that, trigger logging new xl_running_xacts record from
within snapbuild.c exactly when necessary.

Unfortunately snabuild.c's SnapBuild is stored on disk, one of the
stupider ideas of a certain Mr Freund, so we can't change it in a
minor release.  As this is going to be backpatched, we have to hack
around a bit to keep on-disk compatibility.  A later commit will
rejigger that on master.

Author: Andres Freund, based on a quite different patch from Petr Jelinek
Analyzed-By: Petr Jelinek
Reviewed-By: Petr Jelinek
Discussion: https://postgr.es/m/f37e975c-908f-858e-707f-058d3b1eb214@2ndquadrant.com
Backpatch: 9.4-, where logical decoding has been introduced
2017-05-13 14:21:00 -07:00
Tom Lane 9aab83fc50 Redesign get_attstatsslot()/free_attstatsslot() for more safety and speed.
The mess cleaned up in commit da0759600 is clear evidence that it's a
bug hazard to expect the caller of get_attstatsslot()/free_attstatsslot()
to provide the correct type OID for the array elements in the slot.
Moreover, we weren't even getting any performance benefit from that,
since get_attstatsslot() was extracting the real type OID from the array
anyway.  So we ought to get rid of that requirement; indeed, it would
make more sense for get_attstatsslot() to pass back the type OID it found,
in case the caller isn't sure what to expect, which is likely in binary-
compatible-operator cases.

Another problem with the current implementation is that if the stats array
element type is pass-by-reference, we incur a palloc/memcpy/pfree cycle
for each element.  That seemed acceptable when the code was written because
we were targeting O(10) array sizes --- but these days, stats arrays are
almost always bigger than that, sometimes much bigger.  We can save a
significant number of cycles by doing one palloc/memcpy/pfree of the whole
array.  Indeed, in the now-probably-common case where the array is toasted,
that happens anyway so this method is basically free.  (Note: although the
catcache code will inline any out-of-line toasted values, it doesn't
decompress them.  At the other end of the size range, it doesn't expand
short-header datums either.  In either case, DatumGetArrayTypeP would have
to make a copy.  We do end up using an extra array copy step if the element
type is pass-by-value and the array length is neither small enough for a
short header nor large enough to have suffered compression.  But that
seems like a very acceptable price for winning in pass-by-ref cases.)

Hence, redesign to take these insights into account.  While at it,
convert to an API in which we fill a struct rather than passing a bunch
of pointers to individual output arguments.  That will make it less
painful if we ever want further expansion of what get_attstatsslot can
pass back.

It's certainly arguable that this is new development and not something to
push post-feature-freeze.  However, I view it as primarily bug-proofing
and therefore something that's better to have sooner not later.  Since
we aren't quite at beta phase yet, let's put it in.

Discussion: https://postgr.es/m/16364.1494520862@sss.pgh.pa.us
2017-05-13 15:14:39 -04:00
Robert Haas 1848b73d45 Teach \d+ to show partitioning constraints.
The fact that we didn't have this in the first place is likely why
the problem fixed by f8bffe9e6d
escaped detection.

Patch by Amit Langote, reviewed and slightly adjusted by me.

Discussion: http://postgr.es/m/CA+TgmoYWnV2GMnYLG-Czsix-E1WGAbo4D+0tx7t9NdfYBDMFsA@mail.gmail.com
2017-05-13 12:04:53 -04:00
Robert Haas f8bffe9e6d Fix multi-column range partitioning constraints.
The old logic was just plain wrong.

Report by Olaf Gawenda.  Patch by Amit Langote, reviewed by
Beena Emerson and by me.  Minor adjustments by me also.
2017-05-13 11:36:41 -04:00
Alvaro Herrera d99d58cdc8 Complete tab completion for DROP STATISTICS
Tab-completing DROP STATISTICS would only work if you started writing
the schema name containing the statistics object, because the visibility
clause was missing.  To add it, we need to add SQL-callable support for
testing visibility of a statistics object, like all other object types
already have.

Discussion: https://postgr.es/m/22676.1494557205@sss.pgh.pa.us
2017-05-13 01:05:48 -03:00
Tom Lane 2df5d46555 Avoid searching for callback functions in CallSyscacheCallbacks().
We have now grown enough registerable syscache-invalidation callback
functions that the original assumption that there would be few of them
is causing performance problems.  In particular, let's fix things so that
CallSyscacheCallbacks doesn't have to search the whole array to find
which callback(s) to invoke for a given cache ID.  Preserve the original
behavior that callbacks are called in order of registration, just in
case there's someplace that depends on that (which I doubt).

In support of this, export the number of syscaches from syscache.h.
People could have found that out anyway from the enum, but adding a
#define makes that much safer.

This provides a useful additional speedup in Mathieu Fenniak's
logical-decoding test case, although we're reaching the point of
diminishing returns there.  I think any further improvement will have
to come from reducing the number of cache invalidations that are
triggered in the first place.  Still, we can hope that this change
gives some incremental benefit for all invalidation scenarios.

Back-patch to 9.4 where logical decoding was introduced.

Discussion: https://postgr.es/m/CAHoiPjzea6N0zuCi=+f9v_j94nfsy6y8SU7-=bp4=7qw6_i=Rg@mail.gmail.com
2017-05-12 19:05:27 -04:00
Tom Lane 8085a4f751 Reduce initial size of RelfilenodeMapHash.
A test case provided by Mathieu Fenniak shows that hash_seq_search'ing
this hashtable can consume a very significant amount of overhead during
logical decoding, which triggers frequent cache invalidation.  Testing
suggests that the actual population of the hashtable is often no more
than a few dozen entries, so we can cut the overhead just by dropping
the initial number of buckets down from 1024 --- I chose to cut it to 64.
(In situations where we do have a significant number of entries, we
shouldn't get any real penalty from doing this, as the dynahash.c code
will resize the hashtable automatically.)

This gives a further factor-of-two savings in Mathieu's test case.
That may be overly optimistic for real-world benefit, as real cases
may have larger average table populations, but it's hard to see it
turning into a net negative for any workload.

Back-patch to 9.4 where relfilenodemap.c was introduced.

Discussion: https://postgr.es/m/CAHoiPjzea6N0zuCi=+f9v_j94nfsy6y8SU7-=bp4=7qw6_i=Rg@mail.gmail.com
2017-05-12 18:30:17 -04:00
Alvaro Herrera 5e2af609e1 getObjectDescription: support extended statistics
This was missed in 7b504eb282.

Remove the "default:" clause in the switch, to avoid this problem in the
future.  Other switches involving the same enum should probably be
changed in the same way, but are not touched by this patch.

Discussion: https://postgr.es/m/20170512204800.iqt2uwyx3c32j45r@alvherre.pgsql
2017-05-12 19:22:50 -03:00
Tom Lane 50ee1c7462 Avoid searching for the target catcache in CatalogCacheIdInvalidate.
A test case provided by Mathieu Fenniak shows that the initial search for
the target catcache in CatalogCacheIdInvalidate consumes a very significant
amount of overhead in cases where cache invalidation is triggered but has
little useful work to do.  There is no good reason for that search to exist
at all, as the index array maintained by syscache.c allows direct lookup of
the catcache from its ID.  We just need a frontend function in syscache.c,
matching the division of labor for most other cache-accessing operations.

While there's more that can be done in this area, this patch alone reduces
the runtime of Mathieu's example by 2X.  We can hope that it offers some
useful benefit in other cases too, although usually cache invalidation
overhead is not such a striking fraction of the total runtime.

Back-patch to 9.4 where logical decoding was introduced.  It might be
worth going further back, but presently the only case we know of where
cache invalidation is really a significant burden is in logical decoding.
Also, older branches have fewer catcaches, reducing the possible benefit.

(Note: although this nominally changes catcache's API, we have always
documented CatalogCacheIdInvalidate as a private function, so I would
have little sympathy for an external module calling it directly.  So
backpatching should be fine.)

Discussion: https://postgr.es/m/CAHoiPjzea6N0zuCi=+f9v_j94nfsy6y8SU7-=bp4=7qw6_i=Rg@mail.gmail.com
2017-05-12 18:17:29 -04:00
Tom Lane 928c4de309 Fix dependencies for extended statistics objects.
A stats object ought to have a dependency on each individual column
it reads, not the entire table.  Doing this honestly lets us get rid
of the hard-wired logic in RemoveStatisticsExt, which seems to have
been misguidedly modeled on RemoveStatistics; and it will be far easier
to extend to multiple tables later.

Also, add overlooked dependency on owner, and make the dependency on
schema be NORMAL like every other such dependency.

There remains some unfinished work here, which is to allow statistics
objects to be extension members.  That takes more effort than just
adding the dependency call, though, so I left it out for now.

initdb forced because this changes the set of pg_depend records that
should exist for a statistics object.

Discussion: https://postgr.es/m/22676.1494557205@sss.pgh.pa.us
2017-05-12 16:26:31 -04:00
Alvaro Herrera bc085205c8 Change CREATE STATISTICS syntax
Previously, we had the WITH clause in the middle of the command, where
you'd specify both generic options as well as statistic types.  Few
people liked this, so this commit changes it to remove the WITH keyword
from that clause and makes it accept statistic types only.  (We
currently don't have any generic options, but if we invent in the
future, we will gain a new WITH clause, probably at the end of the
command).

Also, the column list is now specified without parens, which makes the
whole command look more similar to a SELECT command.  This change will
let us expand the command to supporting expressions (not just columns
names) as well as multiple tables and their join conditions.

Tom added lots of code comments and fixed some parts of the CREATE
STATISTICS reference page, too; more changes in this area are
forthcoming.  He also fixed a potential problem in the alter_generic
regression test, reducing verbosity on a cascaded drop to avoid
dependency on message ordering, as we do in other tests.

Tom also closed a security bug: we documented that table ownership was
required in order to create a statistics object on it, but didn't
actually implement it.

Implement tab-completion for statistics objects.  This can stand some
more improvement.

Authors: Alvaro Herrera, with lots of cleanup by Tom Lane
Discussion: https://postgr.es/m/20170420212426.ltvgyhnefvhixm6i@alvherre.pgsql
2017-05-12 14:59:35 -03:00
Peter Eisentraut d496a65790 Standardize "WAL location" terminology
Other previously used terms were "WAL position" or "log position".
2017-05-12 13:51:27 -04:00
Peter Eisentraut c1a7f64b4a Replace "transaction log" with "write-ahead log"
This makes documentation and error messages match the renaming of "xlog"
to "wal" in APIs and file naming.
2017-05-12 11:52:43 -04:00
Peter Eisentraut b807f59828 Rework the options syntax for logical replication commands
For CREATE/ALTER PUBLICATION/SUBSCRIPTION, use similar option style as
other statements that use a WITH clause for options.

Author: Petr Jelinek <petr.jelinek@2ndquadrant.com>
2017-05-12 08:57:49 -04:00
Simon Riggs 024711bb54 Lag tracking for logical replication
Lag tracking is called for each commit, but we introduce
a pacing delay to ensure we don't swamp the lag tracker.

Author: Petr Jelinek, with minor pacing delay code from me
2017-05-12 10:50:56 +01:00
Tom Lane 596a7c8df7 Increase MAX_SYSCACHE_CALLBACKS to provide more room for extensions.
Increase from the historical value of 32 to 64.  We are up to 31 callers
of CacheRegisterSyscacheCallback() in HEAD, so if they were all to be
exercised in one process that would leave only one slot for add-on modules.
It's probably not possible for that to happen, but still we clearly need
more daylight here.  (At some point it might be worth making the array
dynamically resizable; but since we've never heard a complaint of "out of
syscache_callback_list slots" happening in the field, I doubt it's worth
it yet.)

Back-patch as far as 9.4, which is where we increased the companion limit
MAX_RELCACHE_CALLBACKS (cf commit f01d1ae3a).  It's not as urgent in
released branches, which have only a couple dozen call sites in core, but
it still seems that somebody might hit the limit before these branches die.

Discussion: https://postgr.es/m/12184.1494450131@sss.pgh.pa.us
2017-05-11 14:51:21 -04:00
Tom Lane d10c626de4 Rename WAL-related functions and views to use "lsn" not "location".
Per discussion, "location" is a rather vague term that could refer to
multiple concepts.  "LSN" is an unambiguous term for WAL locations and
should be preferred.  Some function names, view column names, and function
output argument names used "lsn" already, but others used "location",
as well as yet other terms such as "wal_position".  Since we've already
renamed a lot of things in this area from "xlog" to "wal" for v10,
we may as well incur a bit more compatibility pain and make these names
all consistent.

David Rowley, minor additional docs hacking by me

Discussion: https://postgr.es/m/CAKJS1f8O0njDKe8ePFQ-LK5-EjwThsDws6ohJ-+c6nWK+oUxtg@mail.gmail.com
2017-05-11 11:49:59 -04:00
Alvaro Herrera b66adb7b0c Revert "Permit dump/reload of not-too-large >1GB tuples"
This reverts commits fa2fa99552 and 42f50cb8fa.

While the functionality that was intended to be provided by these
commits is desired, the patch didn't actually solve as many of the
problematic situations as we hoped, and it created a bunch of its own
problems.  Since we're going to require more extensive changes soon for
other reasons and users have been working around these problems for a
long time already, there is no point in spending effort in fixing this
halfway measure.

Per complaint from Tom Lane.
Discussion: https://postgr.es/m/21407.1484606922@sss.pgh.pa.us

(Commit fa2fa99552 had already been reverted in branches 9.5 as
f858524ee4 and 9.6 as e9e44a0953, so this touches master only.
Commit 42f50cb8fa was not present in the older branches.)
2017-05-10 18:41:27 -03:00
Robert Haas 622c82279d Avoid theoretical infinite loop loading relcache partition key.
Amit Langote, per report from 甄明洋

Discussion: http://postgr.es/m/57bd1e1.1886.15bd7b79cee.Coremail.18612389267@yeah.net
2017-05-09 23:53:35 -04:00
Robert Haas a5775991bb Remove no-longer-needed compatibility code for hash indexes.
Because commit ea69a0dead bumped the
HASH_VERSION, we don't need to worry about PostgreSQL 10 seeing
bucket pages from earlier versions.

Amit Kapila

Discussion: http://postgr.es/m/CAA4eK1LAo4DGwh+mi-G3U8Pj1WkBBeFL38xdCnUHJv1z4bZFkQ@mail.gmail.com
2017-05-09 23:44:21 -04:00
Robert Haas df1a4eba94 Fix typos in comments.
Etsuro Fujita

Discussion: http://postgr.es/m/968d99bf-0fa8-085b-f0a1-a379f8d661ff@lab.ntt.co.jp
2017-05-09 23:40:08 -04:00
Robert Haas 9e6104c667 Prohibit transition tables on views and foreign tables.
Thomas Munro, per off-list report from Prabhat Sabu.  Changes
to the message wording for consistency with the existing
relkind check for partitioned tables by me.

Discussion: http://postgr.es/m/CAEepm=2xJFFpGM+N=gpWx-9Nft2q1oaFZX07_y23AHCrJQLt0g@mail.gmail.com
2017-05-09 23:34:02 -04:00
Robert Haas 29fd3d9da0 Don't permit transition tables with TRUNCATE triggers.
Prior to this prohibition, such a trigger caused a crash.

Thomas Munro, per a report from Neha Sharma.  I added a
regression test.

Discussion: http://postgr.es/m/CAEepm=0VR5W-N38eTkO_FqJbGqQ_ykbBRmzmvHyxDhy1p=0Csw@mail.gmail.com
2017-05-09 23:24:23 -04:00
Robert Haas 304007d9f1 Pass EXEC_FLAG_REWIND when initializing a tuplestore scan.
Since a rescan is possible, we must be able to rewind.

Thomas Munro, per a report from Prabhat Sabu

Discussion: http://postgr.es/m/CAEepm=2=Uv5fm=exqL+ygBxaO+-tgmC=o+63H4zYAXi9HtXf1w@mail.gmail.com
2017-05-09 23:13:21 -04:00
Robert Haas 3439f84475 Disallow finite partition bound following earlier UNBOUNDED column.
Amit Langote, per an observation by me.

Discussion: http://postgr.es/m/CA+TgmoYWnV2GMnYLG-Czsix-E1WGAbo4D+0tx7t9NdfYBDMFsA@mail.gmail.com
2017-05-09 22:41:12 -04:00
Peter Eisentraut 489b96e80b Improve memory use in logical replication apply
Previously, the memory used by the logical replication apply worker for
processing messages would never be freed, so that could end up using a
lot of memory.  To improve that, change the existing ApplyContext memory
context to ApplyMessageContext and reset that after every
message (similar to MessageContext used elsewhere).  For consistency of
naming, rename the ApplyCacheContext to ApplyContext.

Author: Stas Kelvich <s.kelvich@postgrespro.ru>
2017-05-09 14:51:49 -04:00
Peter Eisentraut 013c1178fd Remove the NODROP SLOT option from DROP SUBSCRIPTION
It turned out this approach had problems, because a DROP command should
not have any options other than CASCADE and RESTRICT.  Instead, always
attempt to drop the slot if there is one configured, but also add an
ALTER SUBSCRIPTION action to set the slot to NONE.

Author: Petr Jelinek <petr.jelinek@2ndquadrant.com>
Reported-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://www.postgresql.org/message-id/29431.1493730652@sss.pgh.pa.us
2017-05-09 10:20:42 -04:00
Tom Lane da07596006 Further patch rangetypes_selfuncs.c's statistics slot management.
Values in a STATISTIC_KIND_RANGE_LENGTH_HISTOGRAM slot are float8,
not of the type of the column the statistics are for.

This bug is at least partly the fault of sloppy specification comments
for get_attstatsslot()/free_attstatsslot(): the type OID they want is that
of the stavalues entries, not of the underlying column.  (I double-checked
other callers and they seem to get this right.)  Adjust the comments to be
more correct.

Per buildfarm.

Security: CVE-2017-7484
2017-05-08 15:03:14 -04:00
Peter Eisentraut fe974cc5a6 Check connection info string in ALTER SUBSCRIPTION
Previously it would allow an invalid connection string to be set.

Author: Petr Jelinek <petr.jelinek@2ndquadrant.com>
Reported-by: tushar <tushar.ahuja@enterprisedb.com>
2017-05-08 14:01:00 -04:00
Peter Eisentraut 9a591c1bcc Fix statistics reporting in logical replication workers
This new arrangement ensures that statistics are reported right after
commit of transactions.  The previous arrangement didn't get this quite
right and could lead to assertion failures.

Author: Petr Jelinek <petr.jelinek@2ndquadrant.com>
Reported-by: Erik Rijkers <er@xs4all.nl>
2017-05-08 12:10:22 -04:00
Tom Lane b6576e5914 Fix possibly-uninitialized variable.
Oversight in e2d4ef8de et al (my fault not Peter's).  Per buildfarm.

Security: CVE-2017-7484
2017-05-08 11:18:40 -04:00
Noah Misch 3eefc51053 Match pg_user_mappings limits to information_schema.user_mapping_options.
Both views replace the umoptions field with NULL when the user does not
meet qualifications to see it.  They used different qualifications, and
pg_user_mappings documented qualifications did not match its implemented
qualifications.  Make its documentation and implementation match those
of user_mapping_options.  One might argue for stronger qualifications,
but these have long, documented tenure.  pg_user_mappings has always
exhibited this problem, so back-patch to 9.2 (all supported versions).

Michael Paquier and Feike Steenbergen.  Reviewed by Jeff Janes.
Reported by Andrew Wheelwright.

Security: CVE-2017-7486
2017-05-08 07:24:24 -07:00
Peter Eisentraut e2d4ef8de8 Add security checks to selectivity estimation functions
Some selectivity estimation functions run user-supplied operators over
data obtained from pg_statistic without security checks, which allows
those operators to leak pg_statistic data without having privileges on
the underlying tables.  Fix by checking that one of the following is
satisfied: (1) the user has table or column privileges on the table
underlying the pg_statistic data, or (2) the function implementing the
user-supplied operator is leak-proof.  If neither is satisfied, planning
will proceed as if there are no statistics available.

At least one of these is satisfied in most cases in practice.  The only
situations that are negatively impacted are user-defined or
not-leak-proof operators on a security-barrier view.

Reported-by: Robert Haas <robertmhaas@gmail.com>
Author: Peter Eisentraut <peter_e@gmx.net>
Author: Tom Lane <tgl@sss.pgh.pa.us>

Security: CVE-2017-7484
2017-05-08 09:26:32 -04:00
Heikki Linnakangas eb61136dc7 Remove support for password_encryption='off' / 'plain'.
Storing passwords in plaintext hasn't been a good idea for a very long
time, if ever. Now seems like a good time to finally forbid it, since we're
messing with this in PostgreSQL 10 anyway.

Remove the CREATE/ALTER USER UNENCRYPTED PASSSWORD 'foo' syntax, since
storing passwords unencrypted is no longer supported. ENCRYPTED PASSWORD
'foo' is still accepted, but ENCRYPTED is now just a noise-word, it does
the same as just PASSWORD 'foo'.

Likewise, remove the --unencrypted option from createuser, but accept
--encrypted as a no-op for backward compatibility. AFAICS, --encrypted was
a no-op even before this patch, because createuser encrypted the password
before sending it to the server even if --encrypted was not specified. It
added the ENCRYPTED keyword to the SQL command, but since the password was
already in encrypted form, it didn't make any difference. The documentation
was not clear on whether that was intended or not, but it's moot now.

Also, while password_encryption='on' is still accepted as an alias for
'md5', it is now marked as hidden, so that it is not listed as an accepted
value in error hints, for example. That's not directly related to removing
'plain', but it seems better this way.

Reviewed by Michael Paquier

Discussion: https://www.postgresql.org/message-id/16e9b768-fd78-0b12-cfc1-7b6b7f238fde@iki.fi
2017-05-08 11:26:07 +03:00
Simon Riggs 1f30295eab Remove poorly worded and duplicated comment
Move line of code to avoid need for duplicated comment

Brought to attention by Masahiko Sawada
2017-05-08 08:49:28 +01:00
Heikki Linnakangas 0186ded546 Fix memory leaks if random salt generation fails.
In the backend, this is just to silence coverity warnings, but in the
frontend, it's a genuine leak, even if extremely rare.

Spotted by Coverity, patch by Michael Paquier.
2017-05-07 19:58:21 +03:00
Stephen Frost aa5d3c0b3f RLS: Fix ALL vs. SELECT+UPDATE policy usage
When we add the SELECT-privilege based policies to the RLS with check
options (such as for an UPDATE statement, or when we have INSERT ...
RETURNING), we need to be sure and use the 'USING' case if the policy is
actually an 'ALL' policy (which could have both a USING clause and an
independent WITH CHECK clause).

This could result in policies acting differently when built using ALL
(when the ALL had both USING and WITH CHECK clauses) and when building
the policies independently as SELECT and UPDATE policies.

Fix this by adding an explicit boolean to add_with_check_options() to
indicate when the USING policy should be used, even if the policy has
both USING and WITH CHECK policies on it.

Reported by: Rod Taylor

Back-patch to 9.5 where RLS was introduced.
2017-05-06 21:46:35 -04:00
Andres Freund b58c433ef9 Fix duplicated words in comment.
Reported-By: Peter Geoghegan
Discussion: https://postgr.es/m/CAH2-Wzn3rY2N0gTWndaApD113T+O8L6oz8cm7_F3P8y4awdoOg@mail.gmail.com
Backpatch: no, only present in master
2017-05-06 17:03:45 -07:00
Andres Freund e6c44eef55 Fix off-by-one possibly leading to skipped XLOG_RUNNING_XACTS records.
Since 6ef2eba3f5 ("Skip checkpoints, archiving on idle systems."),
GetLastImportantRecPtr() is used to avoid performing superfluous
checkpoints, xlog switches, running-xact records when the system is
idle.  Unfortunately the check concerning running-xact records had a
off-by-one error, leading to such records being potentially skipped
when only a single record has been inserted since the last
running-xact record.

An alternative approach would have been to change
GetLastImportantRecPtr()'s definition to point to the end of records,
but that would make the checkpoint code more complicated.

Author: Andres Freund
Discussion: https://postgr.es/m/20170505012447.wsrympaxnfis6ojt@alap3.anarazel.de
Backpatch: no, code only present in master
2017-05-06 16:55:07 -07:00
Tom Lane b3a47cdfd6 Suppress compiler warning about unportable pointer value.
Setting a pointer value to "0xdeadbeef" draws a warning from some
compilers, and for good reason.  Be less cute and just set it to NULL.

In passing make some other cosmetic adjustments nearby.

Discussion: https://postgr.es/m/CAJrrPGdW3EkU-CRobvVKYf3fJuBdgWyuGeAbNzAQ4yBh+bfb_Q@mail.gmail.com
2017-05-05 12:46:04 -04:00
Peter Eisentraut 086221cf6b Prevent panic during shutdown checkpoint
When the checkpointer writes the shutdown checkpoint, it checks
afterwards whether any WAL has been written since it started and throws
a PANIC if so.  At that point, only walsenders are still active, so one
might think this could not happen, but walsenders can also generate WAL,
for instance in BASE_BACKUP and certain variants of
CREATE_REPLICATION_SLOT.  So they can trigger this panic if such a
command is run while the shutdown checkpoint is being written.

To fix this, divide the walsender shutdown into two phases.  First, the
postmaster sends a SIGUSR2 signal to all walsenders.  The walsenders
then put themselves into the "stopping" state.  In this state, they
reject any new commands.  (For simplicity, we reject all new commands,
so that in the future we do not have to track meticulously which
commands might generate WAL.)  The checkpointer waits for all walsenders
to reach this state before proceeding with the shutdown checkpoint.
After the shutdown checkpoint is done, the postmaster sends
SIGINT (previously unused) to the walsenders.  This triggers the
existing shutdown behavior of sending out the shutdown checkpoint record
and then terminating.

Author: Michael Paquier <michael.paquier@gmail.com>
Reported-by: Fujii Masao <masao.fujii@gmail.com>
2017-05-05 10:31:42 -04:00
Heikki Linnakangas e6e9c4da3a Misc cleanup of SCRAM code.
* Remove is_scram_verifier() function. It was unused.
* Fix sanitize_char() function, used in error messages on protocol
  violations, to print bytes >= 0x7F correctly.
* Change spelling of scram_MockSalt() function to be more consistent with
  the surroundings.
* Change a few more references to "server proof" to "server signature" that
  I missed in commit d981074c24.
2017-05-05 10:01:44 +03:00
Heikki Linnakangas 344a113079 Don't use SCRAM-specific "e=invalid-proof" on invalid password.
Instead, send the same FATAL message as with other password-based
authentication mechanisms. This gives a more user-friendly message:

psql: FATAL:  password authentication failed for user "test"

instead of:

psql: error received from server in SASL exchange: invalid-proof

Even before this patch, the server sent that FATAL message, after the
SCRAM-specific "e=invalid-proof" message. But libpq would stop at the
SCRAM error message, and not process the ErrorResponse that would come
after that. We could've taught libpq to check for an ErrorResponse after
failed authentication, but it's simpler to modify the server to send only
the ErrorResponse. The SCRAM specification allows for aborting the
authentication at any point, using an application-defined error mechanism,
like PostgreSQL's ErrorResponse. Using the e=invalid-proof message is
optional.

Reported by Jeff Janes.

Discussion: https://www.postgresql.org/message-id/CAMkU%3D1w3jQ53M1OeNfN8Cxd9O%2BA_9VONJivTbYoYRRdRsLT6vA@mail.gmail.com
2017-05-05 10:01:41 +03:00
Tom Lane 3f074845a8 Fix pfree-of-already-freed-tuple when rescanning a GiST index-only scan.
GiST's getNextNearest() function attempts to pfree the previously-returned
tuple if any (that is, scan->xs_hitup in HEAD, or scan->xs_itup in older
branches).  However, if we are rescanning a plan node after ending a
previous scan early, those tuple pointers could be pointing to garbage,
because they would be pointing into the scan's pageDataCxt or queueCxt
which has been reset.  In a debug build this reliably results in a crash,
although I think it might sometimes accidentally fail to fail in
production builds.

To fix, clear the pointer field anyplace we reset a context it might
be pointing into.  This may be overkill --- I think probably only the
queueCxt case is involved in this bug, so that resetting in gistrescan()
would be sufficient --- but dangling pointers are generally bad news,
so let's avoid them.

Another plausible answer might be to just not bother with the pfree in
getNextNearest().  The reconstructed tuples would go away anyway in the
context resets, and I'm far from convinced that freeing them a bit earlier
really saves anything meaningful.  I'll stick with the original logic in
this patch, but if we find more problems in the same area we should
consider that approach.

Per bug #14641 from Denis Smirnov.  Back-patch to 9.5 where this
logic was introduced.

Discussion: https://postgr.es/m/20170504072034.24366.57688@wrigleys.postgresql.org
2017-05-04 13:59:39 -04:00
Peter Eisentraut 0de791ed76 Fix cursor_to_xml in tableforest false mode
It only produced <row> elements but no wrapping <table> element.

By contrast, cursor_to_xmlschema produced a schema that is now correct
but did not previously match the XML data produced by cursor_to_xml.

In passing, also fix a minor misunderstanding about moving cursors in
the tests related to this.

Reported-by: filip@jirsak.org
Based-on-patch-by: Thomas Munro <thomas.munro@enterprisedb.com>
2017-05-03 21:41:10 -04:00
Heikki Linnakangas 8f8b9be51f Add PQencryptPasswordConn function to libpq, use it in psql and createuser.
The new function supports creating SCRAM verifiers, in addition to md5
hashes. The algorithm is chosen based on password_encryption, by default.

This fixes the issue reported by Jeff Janes, that there was previously
no way to create a SCRAM verifier with "\password".

Michael Paquier and me

Discussion: https://www.postgresql.org/message-id/CAMkU%3D1wfBgFPbfAMYZQE78p%3DVhZX7nN86aWkp0QcCp%3D%2BKxZ%3Dbg%40mail.gmail.com
2017-05-03 11:19:07 +03:00
Tom Lane 23c6eb0336 Remove create_singleton_array(), hard-coding the case in its sole caller.
create_singleton_array() was not really as useful as we perhaps thought
when we added it.  It had never accreted more than one call site, and is
only saving a dozen lines of code at that one, which is considerably less
bulk than the function itself.  Moreover, because of its insistence on
using the caller's fn_extra cache space, it's arguably a coding hazard.
text_to_array_internal() does not currently use fn_extra in any other way,
but if it did it would be subtly broken, since the conflicting fn_extra
uses could be needed within a single query, in the seldom-tested case that
the field separator varies during the query.  The same objection seems
likely to apply to any other potential caller.

The replacement code is a bit uglier, because it hardwires knowledge of
the storage parameters of type TEXT, but it's not like we haven't got
dozens or hundreds of other places that do the same.  Uglier seems like
a good tradeoff for smaller, faster, and safer.

Per discussion with Neha Khatri.

Discussion: https://postgr.es/m/CAFO0U+_fS5SRhzq6uPG+4fbERhoA9N2+nPrtvaC9mmeWivxbsA@mail.gmail.com
2017-05-02 20:41:37 -04:00
Tom Lane 9209e07605 Ensure commands in extension scripts see the results of preceding DDL.
Due to a missing CommandCounterIncrement() call, parsing of a non-utility
command in an extension script would not see the effects of the immediately
preceding DDL command, unless that command's execution ends with
CommandCounterIncrement() internally ... which some do but many don't.
Report by Philippe Beaudoin, diagnosis by Julien Rouhaud.

Rather remarkably, this bug has evaded detection since extensions were
invented, so back-patch to all supported branches.

Discussion: https://postgr.es/m/2cf7941e-4e41-7714-3de8-37b1a8f74dff@free.fr
2017-05-02 18:06:09 -04:00
Alvaro Herrera 93bbeec6a2 extstats: change output functions to emit valid JSON
Manipulating extended statistics is more convenient as JSON than the
current ad-hoc format, so let's change before it's too late.

Discussion: https://postgr.es/m/20170420193828.k3fliiock5hdnehn@alvherre.pgsql
2017-05-02 18:49:32 -03:00
Robert Haas 0d1e1f0ea4 Fix typos in comments.
Etsuro Fujita

Discussion: http://postgr.es/m/00e88999-684d-d79a-70e4-908c937a0126@lab.ntt.co.jp
2017-05-02 14:47:46 -04:00
Peter Eisentraut 3d092fe540 Avoid unnecessary catalog updates in ALTER SEQUENCE
ALTER SEQUENCE can do nontransactional changes to the sequence (RESTART
clause) and transactional updates to the pg_sequence catalog (most other
clauses).  When just calling RESTART, the code would still needlessly do
a catalog update without any changes.  This would entangle that
operation in the concurrency issues of a catalog update (causing either
locking or concurrency errors, depending on how that issue is to be
resolved).

Fix by keeping track during options parsing whether a catalog update is
needed, and skip it if not.

Reported-by: Jason Petersen <jason@citusdata.com>
2017-05-02 10:41:48 -04:00
Magnus Hagander 34fc616738 Change hot_standby default value to 'on'
This goes together with the changes made to enable replication on the
sending side by default (wal_level, max_wal_senders etc) by making the
receiving stadby node also enable it by default.

Huong Dangminh
2017-05-02 11:12:30 +02:00
Peter Eisentraut a99448ab45 Don't wake up logical replication launcher unnecessarily
In CREATE SUBSCRIPTION, only wake up the launcher when the subscription
is enabled.

Author: Fujii Masao <masao.fujii@gmail.com>
2017-05-01 22:50:32 -04:00
Tom Lane 54affb41e7 Improve function header comment for create_singleton_array().
Mentioning the caller is neither future-proof nor an adequate substitute
for giving an API specification.  Per gripe from Neha Khatri, though
I changed the patch around some.

Discussion: https://postgr.es/m/CAFO0U+_fS5SRhzq6uPG+4fbERhoA9N2+nPrtvaC9mmeWivxbsA@mail.gmail.com
2017-05-01 15:31:41 -04:00
Tom Lane 92a43e4857 Reduce semijoins with unique inner relations to plain inner joins.
If the inner relation can be proven unique, that is it can have no more
than one matching row for any row of the outer query, then we might as
well implement the semijoin as a plain inner join, allowing substantially
more freedom to the planner.  This is a form of outer join strength
reduction, but it can't be implemented in reduce_outer_joins() because
we don't have enough info about the individual relations at that stage.
Instead do it much like remove_useless_joins(): once we've built base
relations, we can make another pass over the SpecialJoinInfo list and
get rid of any entries representing reducible semijoins.

This is essentially a followon to the inner-unique patch (commit 9c7f5229a)
and makes use of the proof machinery that that patch created.  We need only
minor refactoring of innerrel_is_unique's API to support this usage.

Per performance complaint from Teodor Sigaev.

Discussion: https://postgr.es/m/f994fc98-389f-4a46-d1bc-c42e05cb43ed@sigaev.ru
2017-05-01 14:53:42 -04:00
Tom Lane 2057a58d16 Fix mis-optimization of semijoins with more than one LHS relation.
The inner-unique patch (commit 9c7f5229a) supposed that if we're
considering a JOIN_UNIQUE_INNER join path, we can always set inner_unique
for the join, because the inner path produced by create_unique_path should
be unique relative to the outer relation.  However, that's true only if
we're considering joining to the whole outer relation --- otherwise we may
be applying only some of the join quals, and so the inner path might be
non-unique from the perspective of this join.  Adjust the test to only
believe that we can set inner_unique if we have the whole semijoin LHS on
the outer side.

There is more that can be done in this area, but this commit is only
intended to provide the minimal fix needed to get correct plans.

Per report from Teodor Sigaev.  Thanks to David Rowley for preliminary
investigation.

Discussion: https://postgr.es/m/f994fc98-389f-4a46-d1bc-c42e05cb43ed@sigaev.ru
2017-05-01 14:39:11 -04:00
Peter Eisentraut 9414e41ea7 Fix logical replication launcher wake up and reset
After the logical replication launcher was told to wake up at
commit (for example, by a CREATE SUBSCRIPTION command), the flag to wake
up was not reset, so it would be woken up at every following commit as
well.  So fix that by resetting the flag.

Also, we don't need to wake up anything if the transaction was rolled
back.  Just reset the flag in that case.

Author: Masahiko Sawada <sawada.mshk@gmail.com>
Reported-by: Fujii Masao <masao.fujii@gmail.com>
2017-05-01 10:18:09 -04:00
Robert Haas e180c8aa8c Fire per-statement triggers on partitioned tables.
Even though no actual tuples are ever inserted into a partitioned
table (the actual tuples are in the partitions, not the partitioned
table itself), we still need to have a ResultRelInfo for the
partitioned table, or per-statement triggers won't get fired.

Amit Langote, per a report from Rajkumar Raghuwanshi.  Reviewed by me.

Discussion: http://postgr.es/m/CAKcux6%3DwYospCRY2J4XEFuVy0L41S%3Dfic7rmkbsU-GXhhSbmBg%40mail.gmail.com
2017-05-01 08:23:01 -04:00
Tom Lane 12d11432b4 Fix possible null pointer dereference or invalid warning message.
Thinko in commit de4389712: this warning message references the wrong
"LogicalRepWorker *" variable.  This would often result in a core dump,
but if it didn't, the message would show the wrong subscription OID.

In passing, adjust the message text to format a subscription OID
similarly to how that's done elsewhere in the function; and fix
grammatical issues in some nearby messages.

Per Coverity testing.
2017-04-30 12:21:02 -04:00
Robert Haas 6a4dda44e0 Fix VALIDATE CONSTRAINT to consider NO INHERIT attribute.
Currently, trying to validate a NO INHERIT constraint on the parent will
search for the constraint in child tables (where it is not supposed to
exist), wrongly causing a "constraint does not exist" error.

Amit Langote, per a report from Hans Buschmann.

Discussion: http://postgr.es/m/20170421184012.24362.19@wrigleys.postgresql.org
2017-04-28 14:48:38 -04:00
Robert Haas 5e1ccd4844 In load_relcache_init_file, initialize rd_pdcxt.
Oversight noted by Gao Zeng Qi.

Discussion: http://postgr.es/m/CAFmBtr1N3-SbepJbnGpaYp=jw-FvWMnYY7-bTtRgvjvbyB8YJA@mail.gmail.com
2017-04-28 14:05:13 -04:00
Robert Haas c1e0e7e1d7 Speed up dropping tables with many partitions.
We need to lock the parent, but we don't need a relcache entry
for it.

Gao Zeng Qi, reviewed by Amit Langote

Discussion: http://postgr.es/m/CAFmBtr0ukqJjRJEhPWL5wt4rNMrJUUxggVAGXPR3SyYh3E+HDQ@mail.gmail.com
2017-04-28 14:02:24 -04:00
Robert Haas 504c2205ab Fix crash when partitioned column specified twice.
Amit Langote, reviewed by Beena Emerson

Discussion: http://postgr.es/m/6ed23d3d-c09d-4cbc-3628-0a8a32f750f4@lab.ntt.co.jp
2017-04-28 13:52:17 -04:00
Peter Eisentraut e3cf708016 Wait between tablesync worker restarts
Before restarting a tablesync worker for the same relation, wait
wal_retrieve_retry_interval (currently 5s by default).  This avoids
restarting failing workers in a tight loop.

We keep the last start times in a hash table last_start_times that is
separate from the table_states list, because that list is cleared out on
syscache invalidation, which happens whenever a table finishes syncing.
The hash table is kept until all tables have finished syncing.

A future project might be to unify these two and keep everything in one
data structure, but for now this is a less invasive change to accomplish
the original purpose.

For the test suite, set wal_retrieve_retry_interval to its minimum
value, to not increase the test suite run time.

Reviewed-by: Petr Jelinek <petr.jelinek@2ndquadrant.com>
Reported-by: Masahiko Sawada <sawada.mshk@gmail.com>
2017-04-28 13:47:46 -04:00
Heikki Linnakangas d981074c24 Misc SCRAM code cleanups.
* Move computation of SaltedPassword to a separate function from
  scram_ClientOrServerKey(). This saves a lot of cycles in libpq, by
  computing SaltedPassword only once per authentication. (Computing
  SaltedPassword is expensive by design.)

* Split scram_ClientOrServerKey() into two functions. Improves
  readability, by making the calling code less verbose.

* Rename "server proof" to "server signature", to better match the
  nomenclature used in RFC 5802.

* Rename SCRAM_SALT_LEN to SCRAM_DEFAULT_SALT_LEN, to make it more clear
  that the salt can be of any length, and the constant only specifies how
  long a salt we use when we generate a new verifier. Also rename
  SCRAM_ITERATIONS_DEFAULT to SCRAM_DEFAULT_ITERATIONS, for consistency.

These things caught my eye while working on other upcoming changes.
2017-04-28 15:22:38 +03:00
Stephen Frost b9a3ef55b2 Remove unnecessairly duplicated gram.y productions
Declarative partitioning duplicated the TypedTableElement productions,
evidently to remove the need to specify WITH OPTIONS when creating
partitions.  Instead, simply make WITH OPTIONS optional in the
TypedTableElement production and remove all of the duplicate
PartitionElement-related productions.  This change simplifies the
syntax and makes WITH OPTIONS optional when adding defaults, constraints
or storage parameters to columns when creating either typed tables or
partitions.

Also update pg_dump to no longer include WITH OPTIONS, since it's not
necessary, and update the documentation to reflect that WITH OPTIONS is
now optional.
2017-04-27 20:14:39 -04:00
Andres Freund ab9c43381e Don't build full initial logical decoding snapshot if NOEXPORT_SNAPSHOT.
Earlier commits (56e19d938d and 2bef06d516) make it cheaper to
create a logical slot if not exporting the initial snapshot.  If
NOEXPORT_SNAPSHOT is specified, we can skip the overhead, not just
when creating a slot via sql (which can't export snapshots).  As
NOEXPORT_SNAPSHOT has only recently been introduced, this shouldn't be
backpatched.
2017-04-27 15:52:31 -07:00
Andres Freund 56e19d938d Don't use on-disk snapshots for exported logical decoding snapshot.
Logical decoding stores historical snapshots on disk, so that logical
decoding can restart without having to reconstruct a snapshot from
scratch (for which the resources are not guaranteed to be present
anymore).  These serialized snapshots were also used when creating a
new slot via the walsender interface, which can export a "full"
snapshot (i.e. one that can read all tables, not just catalog ones).

The problem is that the serialized snapshots are only useful for
catalogs and not for normal user tables.  Thus the use of such a
serialized snapshot could result in an inconsistent snapshot being
exported, which could lead to queries returning wrong data.  This
would only happen if logical slots are created while another logical
slot already exists.

Author: Petr Jelinek
Reviewed-By: Andres Freund
Discussion: https://postgr.es/m/f37e975c-908f-858e-707f-058d3b1eb214@2ndquadrant.com
Backport: 9.4, where logical decoding was introduced.
2017-04-27 15:29:15 -07:00