Commit Graph

17897 Commits

Author SHA1 Message Date
Tom Lane 3afd75eaac Remove dubious micro-optimization in ckpt_buforder_comparator().
It seems incorrect to assume that the list of CkptSortItems can never
contain duplicate page numbers: concurrent activity could result in some
page getting dropped from a low-numbered buffer and later loaded into a
high-numbered buffer while BufferSync is scanning the buffer pool.
If that happened, the comparator would give self-inconsistent results,
potentially confusing qsort().  Saving one comparison step is not worth
possibly getting the sort wrong.

So far as I can tell, nothing would actually go wrong given our current
implementation of qsort().  It might get a bit slower than expected
if there were a large number of duplicates of one value, but that's
surely a probability-epsilon case.  Still, the comment is wrong,
and if we ever switched to another sort implementation it might be
less forgiving.

In passing, avoid casting away const-ness of the argument pointers;
I've not seen any compiler complaints from that, but it seems likely
that some compilers would not like it.

Back-patch to 9.6 where this code came in, just in case I've underestimated
the possible consequences.

Discussion: https://postgr.es/m/18437.1515607610@sss.pgh.pa.us
2018-01-10 15:50:54 -05:00
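As an illustration of the point made in commit 3afd75eaac above, here is a minimal sketch of a qsort() comparator that reports equality for duplicate keys and keeps its arguments const.  The SortItem struct and its field names are hypothetical stand-ins, not the real CkptSortItem definition.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a checkpoint sort item; fields are illustrative. */
typedef struct SortItem
{
    unsigned int tablespace;
    unsigned int relation;
    unsigned int block;
    int          buf_id;
} SortItem;

/*
 * Comparator that returns 0 when two items name the same page, rather than
 * assuming duplicates cannot occur.  The const qualifiers are preserved.
 */
static int
sort_item_cmp(const void *pa, const void *pb)
{
    const SortItem *a = (const SortItem *) pa;
    const SortItem *b = (const SortItem *) pb;

    assert(a != NULL && b != NULL);
    if (a->tablespace != b->tablespace)
        return (a->tablespace < b->tablespace) ? -1 : 1;
    if (a->relation != b->relation)
        return (a->relation < b->relation) ? -1 : 1;
    if (a->block != b->block)
        return (a->block < b->block) ? -1 : 1;
    return 0;                   /* same page identity: report equality */
}
```

With this shape, cmp(a, b) and cmp(b, a) agree for duplicates, which is what qsort() requires of a comparator.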
Robert Haas 2fd58096f0 Add missing "return" statement to accumulate_append_subpath.
Without this, Parallel Append can end up with extra children.

Report by Rajkumar Raghuwanshi.  Fix by Amit Khandekar.  Brown
paper bag bug by me.

Discussion: http://postgr.es/m/CAKcux6mBF-NiddyEe9LwymoUC5+wh8bQJ=uk2gGkOE+L8cv=LA@mail.gmail.com
2018-01-10 11:21:20 -05:00
Peter Eisentraut b3617cdfbb Move portal pinning from PL/pgSQL to SPI
PL/pgSQL "pins" internally generated (unnamed) portals so that user code
cannot close them by guessing their names.  This logic is also useful in
other languages and really for any code.  So move that logic into SPI.
An unnamed portal obtained through SPI_cursor_open() and related
functions is now automatically pinned, and SPI_cursor_close()
automatically unpins a portal that is pinned.

In the core distribution, this affects PL/Perl and PL/Python, preventing
users from manually closing cursors created by spi_query and
plpy.cursor, respectively.  (PL/Tcl does not currently offer any cursor
functionality.)

Reviewed-by: Andrew Dunstan <andrew.dunstan@2ndquadrant.com>
2018-01-10 10:20:51 -05:00
Peter Eisentraut acc67ffd0a Give more accurate error message for dropping pinned portal
The previous code gave the same error message for attempting to drop
pinned and active portals, but those are separate states, so give
separate error messages.
2018-01-10 09:22:07 -05:00
Andres Freund 69c3936a14 Expression evaluation based aggregate transition invocation.
Previously aggregate transition and combination functions were invoked
by special case code in nodeAgg.c, evaluating input and filters
separately using the expression evaluation machinery. That turns out
to not be great for performance for several reasons:

- repeated expression evaluations have some cost
- the transition function invocations are poorly predicted, as
  commonly there are multiple aggregates in a query, resulting in the
  same call-stack invoking different functions.
- filter and input computation had to be done separately
- the special case code made it hard to implement JITing of the whole
  transition function invocation

Address this by building one large expression that computes input,
evaluates filters, and invokes transition functions.

This leads to moderate speedups in queries bottlenecked by aggregate
computations, and enables large speedups for similar cases once JITing
is done.

There's potential for further improvement:
- It'd be nice if we could simplify the somewhat expensive
  aggstate->all_pergroups lookups.
- right now there's still an advance_transition_function invocation in
  nodeAgg.c, leading to some code duplication.

Author: Andres Freund
Discussion: https://postgr.es/m/20170901064131.tazjxwus3k2w3ybh@alap3.anarazel.de
2018-01-09 13:25:38 -08:00
Alvaro Herrera 272c2ab9fd Change some bogus PageGetLSN calls to BufferGetLSNAtomic
As src/backend/access/transam/README says, PageGetLSN may only be called
by processes holding either exclusive lock on buffer, or a shared lock
on buffer plus buffer header lock.  Therefore any place that only holds
a shared buffer lock must use BufferGetLSNAtomic instead of PageGetLSN,
which internally obtains buffer header lock prior to reading the LSN.

A few callsites failed to comply with this rule.  This was detected by
running all tests under a new (not committed) assertion that verifies
PageGetLSN locking contract.  All but one of the callsites that failed
the assertion are fixed by this patch.  Remaining callsites were
inspected manually and determined not to need any change.

The exception (unfixed callsite) is in TestForOldSnapshot, which only
has a Page argument, making it impossible to access the corresponding
Buffer from it.  Fixing that seems a much larger patch that will have to
be done separately; and that's just as well, since it was only
introduced in 9.6 and other bugs are much older.

Some of these bugs are ancient; backpatch all the way back to 9.3.

Authors: Jacob Champion, Asim Praveen, Ashwin Agrawal
Reviewed-by: Michaël Paquier
Discussion: https://postgr.es/m/CABAq_6GXgQDVu3u12mK9O5Xt5abBZWQ0V40LZCE+oUf95XyNFg@mail.gmail.com
2018-01-09 17:06:31 -03:00
Andrew Dunstan 11b623dd0a Implement TZH and TZM timestamp format patterns
These are compatible with Oracle and required for the datetime template
language for jsonpath in an upcoming patch.

Nikita Glukhov and Andrew Dunstan, reviewed by Pavel Stehule.
2018-01-09 14:25:05 -05:00
Peter Eisentraut a77dd53f30 Remove PortalGetQueryDesc()
After having gotten rid of PortalGetHeapMemory(), there seems little
reason to keep one Portal access macro around that offers no actual
abstraction and isn't consistently used anyway.

Reviewed-by: Andrew Dunstan <andrew.dunstan@2ndquadrant.com>
Reviewed-by: Alvaro Herrera <alvherre@alvh.no-ip.org>
2018-01-09 13:47:56 -05:00
Peter Eisentraut 0f7c49e855 Update portal-related memory context names and API
Rename PortalMemory to TopPortalContext, to avoid confusion with
PortalContext and align naming with similar top-level memory contexts.

Rename PortalData's "heap" field to portalContext.  The "heap" naming
seems quite antiquated and confusing.  Also get rid of the
PortalGetHeapMemory() macro and access the field directly, which we do
for other portal fields, so this abstraction doesn't buy anything.

Reviewed-by: Andrew Dunstan <andrew.dunstan@2ndquadrant.com>
Reviewed-by: Alvaro Herrera <alvherre@alvh.no-ip.org>
2018-01-09 13:47:56 -05:00
Tom Lane 3cb1b2a880 Rewrite list_qsort() to avoid trashing its input list.
The initial implementation of list_qsort(), from commit ab7271677,
re-used the ListCells of the input list while not touching the List
header.  This meant that anybody who still had a pointer to the
original header would now be in possession of a corrupted list,
a problem that seems sure to bite us eventually.

One possible solution is to re-use the original List header as well,
giving the function the semantics of update-in-place.  However, that
doesn't seem like a very good idea either given the way that the
function is used in the planner: create_path functions aren't normally
supposed to modify their input lists.  It doesn't look like there would
be a problem today, but it's not hard to foresee a time when modifying
a list of Paths in-place could have side-effects on some other append
path.

On the whole, and in view of the likelihood that this function might
be used in other contexts in the future, it seems best to get rid of
the micro-optimization of re-using the input list cells.  Just build
a new list.

Discussion: https://postgr.es/m/16912.1515449066@sss.pgh.pa.us
2018-01-09 13:25:53 -05:00
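The "just build a new list" approach from commit 3cb1b2a880 above can be sketched as follows.  This is an illustration only: the Cell type and the function names are hypothetical and much simpler than PostgreSQL's List API.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical singly linked list cell, not PostgreSQL's ListCell. */
typedef struct Cell
{
    int          value;
    struct Cell *next;
} Cell;

/* Prepend a value and return the new head. */
static Cell *
cell_push(Cell *head, int value)
{
    Cell *c = malloc(sizeof(Cell));

    assert(c != NULL);
    c->value = value;
    c->next = head;
    return c;
}

static int
int_cmp(const void *a, const void *b)
{
    int x = *(const int *) a;
    int y = *(const int *) b;

    return (x > y) - (x < y);
}

/* Return a freshly allocated sorted list; the input cells are not modified. */
static Cell *
list_sorted_copy(const Cell *head)
{
    size_t      n = 0;
    size_t      i;
    int        *vals;
    Cell       *result = NULL;
    const Cell *c;

    for (c = head; c != NULL; c = c->next)
        n++;
    if (n == 0)
        return NULL;

    vals = malloc(n * sizeof(int));
    assert(vals != NULL);
    for (c = head, i = 0; c != NULL; c = c->next)
        vals[i++] = c->value;
    qsort(vals, n, sizeof(int), int_cmp);

    /* Push from largest to smallest so the new list ends up ascending. */
    for (i = n; i > 0; i--)
        result = cell_push(result, vals[i - 1]);
    free(vals);
    return result;
}
```

Anyone still holding a pointer to the original list sees it unchanged, at the cost of allocating new cells.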
Tom Lane 624e440a47 Improve the heuristic for ordering child paths of a parallel append.
Commit ab7271677 introduced code that attempts to order the child
scans of a Parallel Append node in a way that will minimize execution
time, based on total cost and startup cost.  However, it failed to
think hard about what to do when estimated costs are exactly equal;
a case that's particularly likely to occur when comparing on startup
cost.  In such a case the ordering of the child paths would be left
to the whims of qsort, an algorithm that isn't even stable.

We can improve matters by applying the rule used elsewhere in the
planner: if total costs are equal, sort on startup cost, and
vice versa.  When both cost estimates are exactly equal, rather
than letting qsort do something unpredictable, sort based on the
child paths' relids, which should typically result in sorting in
inheritance order.  (The latter provision requires inventing a
qsort-style comparator for bitmapsets, but maybe we'll have use
for that for other reasons in future.)

This results in a few plan changes in the select_parallel test,
but those all look more reasonable than before, when the actual
underlying cost numbers are taken into account.

Discussion: https://postgr.es/m/4944.1515446989@sss.pgh.pa.us
2018-01-09 13:07:52 -05:00
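The tie-breaking rule described in commit 624e440a47 above can be sketched as a qsort() comparator.  The ChildPath struct is hypothetical, a single integer stands in for the relids bitmapset, and the ascending sort direction is an arbitrary choice for the example.  Only the total-cost-first variant is shown.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, simplified child path; relid stands in for a relids set. */
typedef struct ChildPath
{
    double total_cost;
    double startup_cost;
    int    relid;
} ChildPath;

/*
 * Compare on total cost, then startup cost, then relid, so that the result
 * never depends on how qsort() happens to order equal elements.
 */
static int
child_path_cmp(const void *pa, const void *pb)
{
    const ChildPath *a = (const ChildPath *) pa;
    const ChildPath *b = (const ChildPath *) pb;

    assert(a != NULL && b != NULL);
    if (a->total_cost != b->total_cost)
        return (a->total_cost < b->total_cost) ? -1 : 1;
    if (a->startup_cost != b->startup_cost)
        return (a->startup_cost < b->startup_cost) ? -1 : 1;
    /* Both costs tie exactly: fall back to a deterministic key. */
    return (a->relid > b->relid) - (a->relid < b->relid);
}
```

Because the last key is unique per child in this sketch, the comparator defines a total order and the instability of qsort() no longer matters.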
Tom Lane 80259d4dbf While waiting for a condition variable, detect postmaster death.
The general assumption for postmaster child processes is that they
should just exit(1), reasonably promptly, if the postmaster disappears.
condition_variable.c neglected this consideration and could be left
waiting forever, if the counterpart process it is waiting for has
done the right thing and exited.

We had some discussion of adjusting the WaitEventSet API to make it
harder to make this type of mistake in future; but for the moment,
and for v10, let's make this narrow fix.

Discussion: https://postgr.es/m/20412.1515456143@sss.pgh.pa.us
2018-01-09 12:34:57 -05:00
Tom Lane 8a906204ae Fix race condition during replication origin drop.
replorigin_drop() misunderstood the API for condition variables: it
had ConditionVariablePrepareToSleep and ConditionVariableCancelSleep
inside its test-and-sleep loop, rather than outside the loop as
intended.  The net effect is a narrow race-condition window wherein,
if the process using a replication slot releases it immediately after
replorigin_drop() releases the ReplicationOriginLock, replorigin_drop()
would get into the condition variable's wait list too late and then
wait indefinitely for a signal that won't come.

Because there's a different CV for each replication slot, we can't
just move the ConditionVariablePrepareToSleep call to above the
test-and-sleep loop.  What we can do, in the wake of commit 13db3b936,
is drop the ConditionVariablePrepareToSleep call entirely.  This fix
depends on that commit because (at least in principle) the slot matching
the target replication origin might move around, so that once in a blue
moon successive loop iterations might involve different CVs.  We can now
cope with such a scenario, at the cost of an extra trip through the
retry loop.

(There are ways we could fix this bug without depending on that commit,
but they're all a lot more complicated than this way.)

While at it, upgrade the rather skimpy comments in this function.

Back-patch to v10 where this code came in.

Discussion: https://postgr.es/m/19947.1515455433@sss.pgh.pa.us
2018-01-09 12:09:30 -05:00
Tom Lane 13db3b9363 Allow ConditionVariable[PrepareTo]Sleep to auto-switch between CVs.
The original coding here insisted that callers manually cancel any prepared
sleep for one condition variable before starting a sleep on another one.
While that's not a huge burden today, it seems like a gotcha that will bite
us in future if the use of condition variables increases; anything we can
do to make the use of this API simpler and more robust is attractive.
Hence, allow these functions to automatically switch their attention to
a different CV when required.  This is safe for the same reason it was OK
for commit aced5a92b to let a broadcast operation cancel any prepared CV
sleep: whenever we return to the other test-and-sleep loop, we will
automatically re-prepare that CV, paying at most an extra test of that
loop's exit condition.

Back-patch to v10 where condition variables were introduced.  Ordinarily
we would probably not back-patch a change like this, but since it does not
invalidate any coding pattern that was legal before, it seems safe enough.
Furthermore, there's an open bug in replorigin_drop() for which the
simplest fix requires this.  Even if we chose to fix that in some more
complicated way, the hazard would remain that we might back-patch some
other bug fix that requires this behavior.

Patch by me, reviewed by Thomas Munro.

Discussion: https://postgr.es/m/2437.1515368316@sss.pgh.pa.us
2018-01-09 11:39:10 -05:00
Robert Haas 921059bd66 Don't allow VACUUM VERBOSE ANALYZE VERBOSE.
There are plans to extend the syntax for ANALYZE, so we need to break
the link between VacuumStmt and AnalyzeStmt.  But apart from that, the
syntax above is undocumented and, if discovered by users, might give
the impression that the VERBOSE option for VACUUM differs from the
VERBOSE option for ANALYZE, which it does not.

Nathan Bossart, reviewed by Michael Paquier and Masahiko Sawada

Discussion: http://postgr.es/m/D3FC73E2-9B1A-4DB4-8180-55F57D116B4E@amazon.com
2018-01-09 10:20:48 -05:00
Robert Haas 63008b19ee Fix comment.
RELATION_IS_OTHER_TEMP is tested in the caller, not here.

Discussion: http://postgr.es/m/5A5438E4.3090709@lab.ntt.co.jp
2018-01-09 09:40:31 -05:00
Tom Lane e35dba475a Cosmetic improvements in condition_variable.[hc].
Clarify a bunch of comments.

Discussion: https://postgr.es/m/CAEepm=0NWKehYw7NDoUSf8juuKOPRnCyY3vuaSvhrEWsOTAa3w@mail.gmail.com
2018-01-08 18:28:03 -05:00
Tom Lane ea8e1bbc53 Improve error detection capability in proclists.
Previously, although the initial state of a proclist_node is expected
to be next == prev == 0, proclist_delete_offset would reset nodes to
next == prev == INVALID_PGPROCNO when removing them from a list.
This is the same state that a node in a singleton list has, so that
it's impossible to distinguish not-in-a-list from in-a-list.  Change
proclist_delete_offset to reset removed nodes to next == prev == 0,
making it possible to distinguish those cases, and then add Asserts
to the list add and delete functions checking that the supplied node
is not (add) or is (delete) in a list at entry.  Also tighten assertions about the node
being in the particular list (not some other one) where it is possible
to check that in O(1) time.

In ConditionVariablePrepareToSleep, since we don't expect the process's
cvWaitLink to already be in a list, remove the more-or-less-useless
proclist_contains check; we'd rather have proclist_push_tail's new
assertion fire if that happens.

Improve various comments related to proclists, too.

Patch by me, reviewed by Thomas Munro.  This isn't back-patchable, since
there could theoretically be inlined copies of proclist_delete_offset in
third-party modules.  But it's only improving debuggability anyway.

Discussion: https://postgr.es/m/CAEepm=0NWKehYw7NDoUSf8juuKOPRnCyY3vuaSvhrEWsOTAa3w@mail.gmail.com
2018-01-08 18:07:04 -05:00
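The idea in commit ea8e1bbc53 above, resetting removed nodes to next == prev == 0 so that "not in a list" differs from "alone in a list", can be sketched like this.  The types, names, and INVALID_IDX value are hypothetical and do not match the real proclist code.

```c
#include <assert.h>
#include <stdbool.h>

#define INVALID_IDX (-1)        /* assumed end-of-list marker */

typedef struct Node
{
    int next;
    int prev;
} Node;

typedef struct ListHead
{
    int head;
    int tail;
} ListHead;

/* A zeroed node is in no list; a singleton has INVALID_IDX in both links. */
static bool
node_in_some_list(const Node *n)
{
    return !(n->next == 0 && n->prev == 0);
}

static void
list_push_tail(ListHead *list, Node *nodes, int idx)
{
    Node *n = &nodes[idx];

    assert(!node_in_some_list(n));
    if (list->head == INVALID_IDX)
    {
        list->head = list->tail = idx;
        n->next = n->prev = INVALID_IDX;
    }
    else
    {
        n->prev = list->tail;
        n->next = INVALID_IDX;
        nodes[list->tail].next = idx;
        list->tail = idx;
    }
}

static void
list_delete(ListHead *list, Node *nodes, int idx)
{
    Node *n = &nodes[idx];

    assert(node_in_some_list(n));
    if (n->prev == INVALID_IDX)
        list->head = n->next;
    else
        nodes[n->prev].next = n->next;
    if (n->next == INVALID_IDX)
        list->tail = n->prev;
    else
        nodes[n->next].prev = n->prev;
    n->next = n->prev = 0;      /* back to the initial not-in-a-list state */
}
```

Deleting a node twice, or adding one that is already linked, now trips an assertion instead of silently corrupting the list.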
Tom Lane eeb3c2df42 Back off chattiness in RemovePgTempFiles().
In commit 561885db0, as part of normalizing RemovePgTempFiles's error
handling, I removed its behavior of silently ignoring ENOENT failures
during directory opens.  Thomas Munro points out that this is a bad idea at
the top level, because we don't create pgsql_tmp directories until needed.
Thus this coding could produce LOG messages in perfectly normal situations,
which isn't what I intended.  Restore the suppression of ENOENT logging,
but only at top level --- it would still be unexpected for a nested temp
directory to disappear between seeing it in the parent directory and
opening it.

Discussion: https://postgr.es/m/CAEepm=2y06SehAkTnd5sU_eVqdv5P-=Srt1y5vYNQk6yVDVaPw@mail.gmail.com
2018-01-07 20:40:40 -05:00
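The behavior restored by commit eeb3c2df42 above, staying quiet about a missing directory only where that is expected, might look roughly like the sketch below.  The function name and the missing_ok flag are assumptions for illustration and are not taken from RemovePgTempFiles itself; the sketch uses POSIX opendir().

```c
#include <assert.h>
#include <dirent.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/*
 * Open a directory for scanning.  If missing_ok is true, a nonexistent
 * directory (ENOENT) is silently ignored; any other failure is reported.
 * Returns true if a message was logged.
 */
static bool
scan_temp_dir(const char *path, bool missing_ok)
{
    DIR *dir;

    assert(path != NULL);
    dir = opendir(path);
    if (dir == NULL)
    {
        if (errno == ENOENT && missing_ok)
            return false;       /* normal case at top level: nothing to do */
        fprintf(stderr, "could not open directory \"%s\": %s\n",
                path, strerror(errno));
        return true;
    }
    /* ... temp file removal would go here ... */
    closedir(dir);
    return false;
}
```

A caller would pass missing_ok = true only for the top-level directory, and false for nested directories whose disappearance would be unexpected.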
Simon Riggs 6271fceb8a Add TIMELINE to backup_label file
Allows new test to confirm timelines match

Author: Michael Paquier
Reviewed-by: David Steele
2018-01-06 12:24:19 +00:00
Simon Riggs 6668a54eb8 Default monitoring roles - errata
25fff40798 introduced default monitoring roles. Apply these corrections:

* Allow access to pg_stat_get_wal_senders()
  by role pg_read_all_stats

* Correct comment in pg_stat_get_wal_receiver()
  to show it is no longer superuser-only.

Author: Feike Steenbergen
Reviewed-by: Michael Paquier

Apply to HEAD, then later backpatch to 10
2018-01-06 11:48:21 +00:00
Tom Lane ccf312a448 Remove return values of ConditionVariableSignal/Broadcast.
In the wake of commit aced5a92b, the semantics of these results are
a bit squishy: we can tell whether we signaled some other process(es),
but we do not know which ones were real waiters versus mere sentinels
for ConditionVariableBroadcast operations.  It does not help much that
ConditionVariableBroadcast will attempt to pass on the signal to the
next real waiter, because (a) there might not be one, and (b) that will
only happen a while later, anyway.  So these results could overstate how
much effect the calls really had.

However, no existing caller of either function pays any attention to its
result value, so it seems reasonable to just define that as a required
property of a correct algorithm.  To encourage correctness and save some
tiny number of cycles, change both functions to return void.

Patch by me, per an observation by Thomas Munro.  No back-patch, since
if any third parties happen to be using these functions, they might not
appreciate an API break in a minor release.

Discussion: https://postgr.es/m/CAEepm=0NWKehYw7NDoUSf8juuKOPRnCyY3vuaSvhrEWsOTAa3w@mail.gmail.com
2018-01-05 20:33:26 -05:00
Tom Lane 3cac0ec859 Reorder steps in ConditionVariablePrepareToSleep for more safety.
In the admittedly-very-unlikely case that AddWaitEventToSet fails,
ConditionVariablePrepareToSleep would error out after already having
set cv_sleep_target, which is probably bad, and after having already
set cv_wait_event_set, which is very bad.  Transaction abort might or
might not clean up cv_sleep_target properly; but there is nothing
that would be aware that the WaitEventSet wasn't fully constructed,
so that all future condition variable sleeps would be broken.
We can easily guard against these hazards with slight restructuring.

Back-patch to v10 where condition_variable.c was introduced.

Discussion: https://postgr.es/m/CAEepm=0NWKehYw7NDoUSf8juuKOPRnCyY3vuaSvhrEWsOTAa3w@mail.gmail.com
2018-01-05 19:42:49 -05:00
Tom Lane aced5a92bf Rewrite ConditionVariableBroadcast() to avoid live-lock.
The original implementation of ConditionVariableBroadcast was, per its
self-description, "the dumbest way possible".  Thomas Munro found out
it was a bit too dumb.  An awakened process may immediately re-queue
itself, if the specific condition it's waiting for is not yet satisfied.
If this happens before ConditionVariableBroadcast is able to see the wait
queue as empty, then ConditionVariableBroadcast will re-awaken the same
process, repeating the cycle.  Given unlucky timing this back-and-forth
can repeat indefinitely; loops lasting thousands of seconds have been
seen in testing.

To fix, add our own process to the end of the wait queue to serve as a
sentinel, and exit the broadcast loop once our process is not there
anymore.  There are various special considerations described in the
comments, the principal disadvantage being that wakers can no longer
be sure whether they awakened a real waiter or just a sentinel.  But in
practice nobody pays attention to the result of ConditionVariableSignal
or ConditionVariableBroadcast anyway, so that problem seems hypothetical.

Back-patch to v10 where condition_variable.c was introduced.

Tom Lane and Thomas Munro

Discussion: https://postgr.es/m/CAEepm=0NWKehYw7NDoUSf8juuKOPRnCyY3vuaSvhrEWsOTAa3w@mail.gmail.com
2018-01-05 19:21:30 -05:00
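The sentinel technique from commit aced5a92bf above can be illustrated with a single-threaded simulation.  This is only a sketch under strong assumptions: there is no real concurrency, the queue is a small ring buffer, and every woken waiter immediately re-queues itself (the worst case the commit describes).  All names are hypothetical.

```c
#include <assert.h>

#define QUEUE_CAP 16
#define SENTINEL  (-1)          /* stands for the broadcaster's own entry */

typedef struct WaitQueue
{
    int items[QUEUE_CAP];
    int head;
    int count;
} WaitQueue;

static void
queue_push(WaitQueue *q, int id)
{
    assert(q->count < QUEUE_CAP);
    q->items[(q->head + q->count) % QUEUE_CAP] = id;
    q->count++;
}

static int
queue_pop(WaitQueue *q)
{
    int id;

    assert(q->count > 0);
    id = q->items[q->head];
    q->head = (q->head + 1) % QUEUE_CAP;
    q->count--;
    return id;
}

/*
 * Wake every waiter that was queued when the broadcast started, and stop
 * once our own sentinel entry comes off the queue.  Returns the number of
 * waiters woken.
 */
static int
broadcast_with_sentinel(WaitQueue *q)
{
    int woken = 0;

    queue_push(q, SENTINEL);
    for (;;)
    {
        int id = queue_pop(q);

        if (id == SENTINEL)
            break;              /* everything ahead of us has been woken */
        woken++;
        queue_push(q, id);      /* waiter's condition not met: it re-queues */
    }
    return woken;
}
```

A loop that instead ran until the queue looked empty would never finish in this scenario, since the waiters keep re-queuing; the sentinel bounds the work to the waiters present at the start.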
Robert Haas 19c47e7c82 Factor error generation out of ExecPartitionCheck.
At present, we always raise an ERROR if the partition constraint
is violated, but a pending patch for UPDATE tuple routing will
consider instead moving the tuple to the correct partition.
Refactor to make that simpler.

Amit Khandekar, reviewed by Amit Langote, David Rowley, and me.

Discussion: http://postgr.es/m/CAJ3gD9cue54GbEzfV-61nyGpijvjZgCcghvLsB0_nL8Nm8HzCA@mail.gmail.com
2018-01-05 15:22:33 -05:00
Alvaro Herrera df9f682c7b Fix failure to delete spill files of aborted transactions
Logical decoding's reorderbuffer.c may spill transaction files to disk
when transactions are large.  These are supposed to be removed when they
become "too old" by xid; but file removal requires the boundary LSNs of
the transaction to be known.  The final_lsn is only set when we see the
commit or abort record for the transaction, but nothing sets the value
for transactions that crash, so the removal code misbehaves -- in
assertion-enabled builds, it crashes by a failed assertion.

To fix, modify the final_lsn of transactions that don't have a value
set, to the LSN of the very latest change in the transaction.  This
causes the spilled files to be removed appropriately.

Author: Atsushi Torikoshi
Reviewed-by: Kyotaro HORIGUCHI, Craig Ringer, Masahiko Sawada
Discussion: https://postgr.es/m/54e4e488-186b-a056-6628-50628e4e4ebc@lab.ntt.co.jp
2018-01-05 12:17:10 -03:00
Peter Eisentraut 054e8c6cdb Another attempt at fixing build with various OpenSSL versions
It seems we can't easily work around the lack of
X509_get_signature_nid(), so revert the previous attempts and just
disable the tls-server-end-point feature if we don't have it.
2018-01-04 19:09:27 -05:00
Peter Eisentraut 1834c1e432 Add missing includes
<openssl/x509.h> is necessary to look into the X509 struct, used by
ac3ff8b1d8.
2018-01-04 17:56:09 -05:00
Robert Haas ef6087ee5f Minor preparatory refactoring for UPDATE row movement.
Generalize is_partition_attr to has_partition_attrs and make it
accessible from outside tablecmds.c.  Change map_partition_varattnos
to clarify that it can be used for mapping between any two relations
in a partitioning hierarchy, not just parent -> child.

Amit Khandekar, reviewed by Amit Langote, David Rowley, and me.
Some comment changes by me.

Discussion: http://postgr.es/m/CAJ3gD9fWfxgKC+PfJZF3hkgAcNOy-LpfPxVYitDEXKHjeieWQQ@mail.gmail.com
2018-01-04 16:25:49 -05:00
Peter Eisentraut ac3ff8b1d8 Fix build with older OpenSSL versions
Apparently, X509_get_signature_nid() is only in fairly new OpenSSL
versions, so use the lower-level interface it is built on instead.
2018-01-04 16:22:06 -05:00
Robert Haas cc6337d2fe Simplify and encapsulate tuple routing support code.
Instead of having ExecSetupPartitionTupleRouting return multiple out
parameters, have it return a pointer to a structure containing all of
those different things.  Also, provide and use a cleanup function,
ExecCleanupTupleRouting, instead of cleaning up all of the resources
allocated by ExecSetupPartitionTupleRouting individually.

Amit Khandekar, reviewed by Amit Langote, David Rowley, and me

Discussion: http://postgr.es/m/CAJ3gD9fWfxgKC+PfJZF3hkgAcNOy-LpfPxVYitDEXKHjeieWQQ@mail.gmail.com
2018-01-04 15:48:15 -05:00
Peter Eisentraut d3fb72ea6d Implement channel binding tls-server-end-point for SCRAM
This adds a second standard channel binding type for SCRAM.  It is
mainly intended for third-party clients that cannot implement
tls-unique, for example JDBC.

Author: Michael Paquier <michael.paquier@gmail.com>
2018-01-04 15:29:50 -05:00
Peter Eisentraut f3049a603a Refactor channel binding code to fetch cbind_data only when necessary
As things stand now, channel binding data is fetched from OpenSSL and
saved into the SCRAM exchange context for any SSL connection attempted
for a SCRAM authentication, resulting in data fetched but not used if no
channel binding is used or if a different channel binding type is used
than what the data is here for.

Refactor the code in such a way that binding data is fetched from the
SSL stack only when a specific channel binding is used for both the
frontend and the backend.  In order to achieve that, save the libpq
connection context directly in the SCRAM exchange state, and add a
dependency to SSL in the low-level SCRAM routines.

This makes the interface in charge of initializing the SCRAM context
cleaner as all its data comes from either PGconn* (for frontend) or
Port* (for the backend).

Author: Michael Paquier <michael.paquier@gmail.com>
2018-01-04 13:55:12 -05:00
Peter Eisentraut 3ad2afc2e9 Define LDAPS_PORT if it's missing and disable implicit LDAPS on Windows
Some versions of Windows don't define LDAPS_PORT.

Also, Windows' ldap_sslinit() is documented to use LDAPS even if you
said secure=0 when the port number happens to be 636 or 3269.  Let's
avoid using the port number to imply that you want LDAPS, so that
connection strings have the same meaning on Windows and Unix.

Author: Thomas Munro
Discussion: https://postgr.es/m/CAEepm%3D23B7GV4AUz3MYH1TKpTv030VHxD2Sn%2BLYWDv8d-qWxww%40mail.gmail.com
2018-01-04 10:34:41 -05:00
Robert Haas c759395617 Code review for Parallel Append.
- Remove unnecessary #include mistakenly added in execnodes.h.
- Fix mistake in comment in choose_next_subplan_for_leader.
- Adjust row estimates in cost_append for a possibly-different
  parallel divisor.
- Clamp row estimates in cost_append after operations that may
  not produce integers.

Amit Kapila, with cosmetic adjustments by me.

Discussion: http://postgr.es/m/CAA4eK1+qcbeai3coPpRW=GFCzFeLUsuY4T-AKHqMjxpEGZBPQg@mail.gmail.com
2018-01-04 07:56:09 -05:00
Tom Lane 47c6772eb7 Clean up tupdesc.c for recent changes.
TupleDescCopy needs to have the same effects as CreateTupleDescCopy in
that, since it doesn't copy constraints, it should clear the per-attribute
fields associated with them.  Oversight in commit cc5f81366.

Since TupleDescCopy has already established the presumption that it
can just flat-copy the entire attribute array in one go, propagate
that approach into CreateTupleDescCopy and CreateTupleDescCopyConstr.
(I'm suspicious that this would lead to valgrind complaints if we
had any trailing padding in the struct, but we do not, and anyway
fixing that seems like a job for a separate commit.)

Add some better comments.

Thomas Munro, reviewed by Vik Fearing, some additional hacking by me

Discussion: https://postgr.es/m/CAEepm=0NvOGZ8B6GbQyQe2C_c2m3LKJ9w=8OMBaYRLgZ_Gw6Nw@mail.gmail.com
2018-01-03 17:53:41 -05:00
Alvaro Herrera bab2969867 Fix typo
Author: Dagfinn Ilmari Mannsåker
Discussion: https://postgr.es/m/d8jefpk4jtd.fsf@dalvik.ping.uio.no
2018-01-03 19:12:06 -03:00
Alvaro Herrera 3c27944fb2 Make XactLockTableWait work for transactions that are not yet self-locked
XactLockTableWait assumed that its xid argument has already added itself
to the lock table.  That assumption led to another assumption that if
locking the xid has succeeded but the xid is reported as still in
progress, then the input xid must have been a subtransaction.

These assumptions hold true for the original uses of this code in
locking related to on-disk tuples, but they break down in logical
replication slot snapshot building -- in particular, when a standby
snapshot logged contains an xid that's already in ProcArray but not yet
in the lock table.  This leads to assertion failures that can be
reproduced all the way back to 9.4, when logical decoding was
introduced.

To fix, change SubTransGetParent to SubTransGetTopmostTransaction which
has a slightly different API: it returns the argument Xid if there is no
parent, and it goes all the way to the top instead of moving up the
levels one by one.  Also, to avoid busy-waiting, add a 1ms sleep to give
the other process time to register itself in the lock table.

For consistency, change ConditionalXactLockTableWait the same way.

Author: Petr Jelínek
Discussion: https://postgr.es/m/1B3E32D8-FCF4-40B4-AEF9-5C0E3AC57969@postgrespro.ru
Reported-by: Konstantin Knizhnik
Diagnosed-by: Stas Kelvich, Petr Jelínek
Reviewed-by: Andres Freund, Robert Haas
2018-01-03 17:26:20 -03:00
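The API difference described in commit 3c27944fb2 above, one step up versus all the way to the top, can be sketched with a plain parent array.  The Xid type, the INVALID_XID value, and the array representation are assumptions made for this example and are not how pg_subtrans is stored.

```c
#include <assert.h>

typedef unsigned int Xid;

#define INVALID_XID 0           /* assumed marker for "no parent" */

/* One level up: returns INVALID_XID when xid has no parent. */
static Xid
get_parent(const Xid *parent_of, Xid xid)
{
    assert(parent_of != 0);
    return parent_of[xid];
}

/*
 * All the way up: follows parent links until a transaction without a
 * parent is reached, and returns the argument itself if it has none.
 */
static Xid
get_topmost(const Xid *parent_of, Xid xid)
{
    Xid cur = xid;

    assert(parent_of != 0);
    while (parent_of[cur] != INVALID_XID)
        cur = parent_of[cur];
    return cur;
}
```

With the second form, a caller gets a usable xid whether or not the input was a subtransaction, so it does not need to infer subtransaction status from a failed wait.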
Tom Lane 6fcde24063 Fix some minor errors in new PHJ code.
Correct ExecParallelHashTuplePrealloc's estimate of whether the
space_allowed limit is exceeded.  Be more consistent about tuples that
are exactly HASH_CHUNK_THRESHOLD in size (they're "small", not "large").
Neither of these things explains the current buildfarm unhappiness, but
they're still bugs.

Thomas Munro, per gripe by me

Discussion: https://postgr.es/m/CAEepm=34PDuR69kfYVhmZPgMdy8pSA-MYbpesEN1SR+2oj3Y+w@mail.gmail.com
2018-01-03 12:53:49 -05:00
Tom Lane 3decd150a2 Teach eval_const_expressions() to handle some more cases.
Add some infrastructure (mostly macros) to make it easier to write
typical cases for constant-expression simplification.  Add simplification
processing for ArrayRef, RowExpr, and ScalarArrayOpExpr node types,
which formerly went unsimplified even if all their inputs were constants.
Also teach it to simplify FieldSelect from a composite constant.
Make use of the new infrastructure to reduce the amount of code needed
for the existing ArrayExpr and ArrayCoerceExpr cases.

One existing test case changes output as a result of the fact that
RowExpr can now be folded to a constant.  All the new code is exercised
by existing test cases according to gcov, so I feel no need to add
additional tests.

Tom Lane, reviewed by Dmitry Dolgov

Discussion: https://postgr.es/m/3be3b82c-e29c-b674-2163-bf47d98817b1@iki.fi
2018-01-03 12:35:09 -05:00
Peter Eisentraut 35c0754fad Allow ldaps when using ldap authentication
While ldaptls=1 provides an RFC 4513 conforming way to do LDAP
authentication with TLS encryption, there was an earlier de facto
standard way to do LDAP over SSL called LDAPS.  Even though it's not
enshrined in a standard, it's still widely used and sometimes required
by organizations' network policies.  There seems to be no reason not to
support it when available in the client library.  Therefore, add support
when using OpenLDAP 2.4+ or Windows.  It can be configured with
ldapscheme=ldaps or ldapurl=ldaps://...

Add tests for both ways of requesting LDAPS and a test for the
pre-existing ldaptls=1.  Modify the 001_auth.pl test for "diagnostic
messages", which was previously relying on the server rejecting
ldaptls=1.

Author: Thomas Munro
Reviewed-By: Peter Eisentraut
Discussion: https://postgr.es/m/CAEepm=1s+pA-LZUjQ-9GQz0Z4rX_eK=DFXAF1nBQ+ROPimuOYQ@mail.gmail.com
2018-01-03 10:11:26 -05:00
Bruce Momjian 9d4649ca49 Update copyright for 2018
Backpatch-through: certain files through 9.3
2018-01-02 23:30:12 -05:00
Andres Freund f9ccf92e16 Simplify representation of aggregate transition values a bit.
Previously aggregate transition values for hash and other forms of
aggregation (i.e. sort and no group by) were represented
differently. Hash based aggregation used a grouping set indexed array
pointing to an array of transition values, whereas other forms of
aggregation used one flattened array with the index being computed out
of grouping set and transition offsets.

That made upcoming changes hard, so represent both as grouping set
indexed array of per-group data.

As a nice side-effect this also makes aggregation slightly faster,
because computing offsets with `transno + (setno * numTrans)` turns
out not to be that cheap (too big for x86 lea for example).

Author: Andres Freund
Discussion: https://postgr.es/m/20171128003121.nmxbm2ounxzb6n2t@alap3.anarazel.de
2018-01-02 18:23:37 -08:00
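The two layouts contrasted in commit f9ccf92e16 above can be sketched as follows.  PerGroup and both function names are hypothetical; the sketch only shows the addressing difference between a flattened array and a grouping-set-indexed array of per-group arrays.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical per-group transition state. */
typedef struct PerGroup
{
    long trans_value;
} PerGroup;

/* Flattened layout: one array, index computed from set and transition. */
static size_t
flat_index(int setno, int transno, int num_trans)
{
    return (size_t) transno + ((size_t) setno * (size_t) num_trans);
}

/* Unified layout: pergroups[setno] points at that set's per-transition array. */
static PerGroup **
alloc_pergroups(int num_sets, int num_trans)
{
    PerGroup **pergroups = malloc((size_t) num_sets * sizeof(PerGroup *));
    int        setno;

    assert(pergroups != NULL);
    for (setno = 0; setno < num_sets; setno++)
    {
        pergroups[setno] = calloc((size_t) num_trans, sizeof(PerGroup));
        assert(pergroups[setno] != NULL);
    }
    return pergroups;
}
```

In the second layout a lookup is pergroups[setno][transno], two indexed loads with no multiplication by the number of transitions.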
Tom Lane 5dc692f78d Ensure proper alignment of tuples in HashMemoryChunkData buffers.
The previous coding relied (without any documentation) on the data[]
member of HashMemoryChunkData being at a MAXALIGN'ed offset.  If it
was not, the tuples would not be maxaligned either, leading to failures
on alignment-picky machines.  While there seems to be no live bug on any
platform we support, this is clearly pretty fragile: any addition to or
rearrangement of the fields in HashMemoryChunkData could break it.
Let's remove the hazard by getting rid of the data[] member and instead
using pointer arithmetic with an explicitly maxalign'ed offset.
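
A rough sketch of the idea, assuming an 8-byte maximum alignment. All SKETCH_* names and the struct are hypothetical; the real header layout and macro names may differ.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed maximum alignment; a real build derives this at configure time. */
#define SKETCH_MAXALIGN_OF 8

/* Round len up to the next multiple of SKETCH_MAXALIGN_OF. */
#define SKETCH_MAXALIGN(len) \
    (((uintptr_t) (len) + (SKETCH_MAXALIGN_OF - 1)) & \
     ~((uintptr_t) (SKETCH_MAXALIGN_OF - 1)))

/* Hypothetical chunk header with no trailing data[] member. */
typedef struct SketchChunk
{
    int         ntuples;
    size_t      maxlen;
    size_t      used;
    struct SketchChunk *next;
} SketchChunk;

/* Offset of the tuple area: header size rounded up to a maxaligned boundary. */
#define SKETCH_CHUNK_HEADER_SIZE SKETCH_MAXALIGN(sizeof(SketchChunk))

/* Start of the tuple storage that follows the header in the same allocation. */
static char *
sketch_chunk_data(SketchChunk *chunk)
{
    return (char *) chunk + SKETCH_CHUNK_HEADER_SIZE;
}
```

Because the offset is computed explicitly, adding or reordering header fields cannot silently misalign the tuple area.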

Discussion: https://postgr.es/m/14483.1514938129@sss.pgh.pa.us
2018-01-02 21:23:06 -05:00
Alvaro Herrera 54eff5311d Fix deadlock hazard in CREATE INDEX CONCURRENTLY
Multiple sessions doing CREATE INDEX CONCURRENTLY simultaneously are
supposed to be able to work in parallel, as evidenced by fixes in commit
c3d09b3bd2 specifically to support this case.  In reality, one of the
sessions would be aborted by a mysterious "deadlock detected" error.

Jeff Janes diagnosed that this is because of leftover snapshots used for
system catalog scans -- this was broken by 8aa3e47510 keeping track of
(registering) the catalog snapshot.  To fix the deadlocks, it's enough
to de-register that snapshot prior to waiting.

Backpatch to 9.4, which introduced MVCC catalog scans.

Include an isolationtester spec that 8 out of 10 times reproduces the
deadlock with the unpatched code for me (Álvaro).

Author: Jeff Janes
Diagnosed-by: Jeff Janes
Reported-by: Jeremy Finzel
Discussion: https://postgr.es/m/CAMa1XUhHjCv8Qkx0WOr1Mpm_R4qxN26EibwCrj0Oor2YBUFUTg%40mail.gmail.com
2018-01-02 19:16:16 -03:00
Peter Eisentraut 438036264a Don't cast between GinNullCategory and bool
The original idea was that we could use an isNull-style bool array
directly as a GinNullCategory array.  However, the existing code already
acknowledges that that doesn't really work, because of the possibility
that bool as currently defined can have arbitrary bit patterns for true
values.  So it has to loop through the nullFlags array to set each bool
value to an acceptable value.  But if we are looping through the whole
array anyway, we might as well build a proper GinNullCategory array
instead and abandon the type casting.  That makes the code much safer in
case bool is ever changed to something else.
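
A minimal sketch of building the category array explicitly rather than casting. The SKETCH_* names are stand-ins for the GIN definitions, and only two category values are modelled.

```c
#include <stdbool.h>

/* Stand-in for GinNullCategory; only two of its values are sketched. */
typedef signed char SketchNullCategory;

#define SKETCH_CAT_NORM_KEY 0   /* ordinary, non-null key */
#define SKETCH_CAT_NULL_KEY 1   /* null key */

/* Translate an isNull-style bool array into a proper category array. */
static void
sketch_build_categories(const bool *nullFlags,
                        SketchNullCategory *categories, int n)
{
    for (int i = 0; i < n; i++)
        categories[i] = nullFlags[i] ? SKETCH_CAT_NULL_KEY
                                     : SKETCH_CAT_NORM_KEY;
}
```

The loop costs the same single pass as normalizing the bool values in place, but the result no longer depends on how bool is represented.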

Reviewed-by: Michael Paquier <michael.paquier@gmail.com>
2018-01-02 12:20:56 -05:00
Andres Freund 93ea78b17c Fix EXPLAIN ANALYZE output for Parallel Hash.
In a race case, EXPLAIN ANALYZE could fail to display correct nbatch
and size information.  Refactor so that participants report only on
batches they worked on rather than trying to report on all of them,
and teach explain.c to consider the HashInstrumentation object from
all participants instead of picking the first one it can find.  This
should fix an occasional build farm failure in the "join" regression
test.

Author: Thomas Munro
Reviewed-By: Andres Freund
Discussion: https://postgr.es/m/30219.1514428346%40sss.pgh.pa.us
2018-01-01 14:38:23 -08:00
Andres Freund b40933101c Perform slot validity checks in a separate pass over expression.
This reduces code duplication a bit, but the primary benefit is that it
makes JITing expression evaluation easier. When doing so we can't, as
previously done in the interpreted case, really change opcode without
recompiling. Nor do we want to just carry around unnecessary branches to
avoid re-checking over and over.

As a minor side-effect this makes ExecEvalStepOp() O(log(N)) rather
than O(N).

Author: Andres Freund
Discussion: https://postgr.es/m/20170901064131.tazjxwus3k2w3ybh@alap3.anarazel.de
2017-12-29 12:45:25 -08:00
Andres Freund 4717fdb14c Rely on executor utils to build targetlist for DML RETURNING.
This is useful because it gets rid of the sole direct user of
ExecAssignResultType(). A future commit will likely make use of that
and combine creating the targetlist with the initialization of the
result slot. But it seems like good code hygiene anyway.

Author: Andres Freund
Discussion: https://postgr.es/m/20170901064131.tazjxwus3k2w3ybh@alap3.anarazel.de
2017-12-29 12:26:29 -08:00
Magnus Hagander d02974e32e Properly set base backup backends to active in pg_stat_activity
When walsenders were included in pg_stat_activity, only the ones
actually streaming WAL were listed as active when they were active. In
particular, the connections sending base backups were listed as being
idle. Which means that a regular pg_basebackup would show up with one
active and one idle connection, when both were active.

This patch updates to set all walsenders to active when they are
(including those doing very fast things like IDENTIFY_SYSTEM), and then
back to idle. Details about exactly what they are doing are available in
pg_stat_replication.

Patch by me, review by Michael Paquier and David Steele.
2017-12-29 16:28:32 +01:00
Simon Riggs 48c9f49265 Fix race condition when changing synchronous_standby_names
A momentary window exists when synchronous_standby_names
changes that allows commands issued after the change to
continue to act as async until the change becomes visible.
Remove the race by using a more appropriate test in syncrep.c.

Author: Asim Rama Praveen and Ashwin Agrawal
Reported-by: Xin Zhang, Ashwin Agrawal, and Asim Rama Praveen
Reviewed-by: Michael Paquier, Masahiko Sawada
2017-12-29 14:30:33 +00:00
Simon Riggs 2958a672b1 Extend near-wraparound hints to include replication slots
Author: Feike Steenbergen
Reviewed-by: Michael Paquier
2017-12-29 14:01:25 +00:00
Andres Freund f83040c62a Fix rare assertion failure in parallel hash join.
When a backend runs out of inner tuples to hash, it should detach from
grow_batch_barrier only after it has flushed all batches to disk and
merged counters, not before.  Otherwise a concurrent backend in
ExecParallelHashIncreaseNumBatches() could stop waiting for this
backend and try to read tuples before they have been written.  This
commit reorders those operations and should fix the assertion failures
seen occasionally on the build farm since commit
1804284042.

Author: Thomas Munro
Discussion: https://postgr.es/m/E1eRwXy-0004IK-TO%40gemulon.postgresql.org
2017-12-28 02:41:53 -08:00
Alvaro Herrera be2343221f Protect against hypothetical memory leaks in RelationGetPartitionKey
Also, fix a comment that commit 8a0596cb65 made obsolete.

Reported-by: Robert Haas
Discussion: http://postgr.es/m/CA+TgmoYbpuUUUp2GhYNwWm0qkah39spiU7uOiNXLz20ASfKYoA@mail.gmail.com
2017-12-27 18:06:14 -03:00
Robert Haas 62d02f39e7 Fix race-under-concurrency in PathNameCreateTemporaryDir.
Thomas Munro

Discussion: http://postgr.es/m/CAEepm=1Vp1e3KtftLtw4B60ZV9teNeKu6HxoaaBptQMsRWjJbQ@mail.gmail.com
2017-12-27 10:56:14 -08:00
Teodor Sigaev ad337c76b6 Update relation's stats in pg_class during vacuum full.
Hash indexes depend on estimates of the number of tuples and pages in a
relation; an incorrect value can cause the index to grow significantly.
Vacuum full recreates the heap and reindexes all indexes before updating the
stats.  The patch fixes that, so indexes will see correct values.

Backpatch to v10 only, because earlier versions don't have a usable hash index
and hash index growth is the only user-visible symptom.

Author: Amit Kapila
Reviewed-by: Ashutosh Sharma, me
Discussion: https://www.postgresql.org/message-id/flat/20171115232922.5tomkxnw3iq6jsg7@inml.weebeastie.net
2017-12-27 18:25:37 +03:00
Teodor Sigaev ff963b393c Add polygon opclass for SP-GiST
The polygon opclass uses the compress method feature of SP-GiST added earlier.
For now it is the only operator class that uses this feature.  SP-GiST actually
indexes the bounding boxes of input polygons, so some of the supported
operations are lossy.  The opclass reuses most methods of the corresponding
SP-GiST opclass over boxes and treats bounding boxes as points in 4D space.

Bump catalog version.

Authors: Nikita Glukhov, Alexander Korotkov with minor editing by me
Reviewed-By: all authors + Darafei Praliaskouski
Discussion: https://www.postgresql.org/message-id/flat/54907069.1030506@sigaev.ru
2017-12-25 18:59:38 +03:00
Andres Freund 4e2970f880 Fix assert with side effects in the new PHJ code.
Instead of asserting, the assert just set the value to what it was
supposed to test...

Per coverity.
2017-12-24 02:57:55 -08:00
Tom Lane c4c2885cbb Fix UNION/INTERSECT/EXCEPT over no columns.
Since 9.4, we've allowed the syntax "select union select" and variants
of that.  However, the planner wasn't expecting a no-column set operation
and ended up treating the set operation as if it were UNION ALL.

Turns out it's trivial to fix in v10 and later; we just need to be careful
about not generating a Sort node with no sort keys.  However, since a weird
corner case like this is never going to be exercised by developers, we'd
better have thorough regression tests if we want to consider it supported.

Per report from Victor Yegorov.

Discussion: https://postgr.es/m/CAGnEbojGJrRSOgJwNGM7JSJZpVAf8xXcVPbVrGdhbVEHZ-BUMw@mail.gmail.com
2017-12-22 12:08:06 -05:00
Teodor Sigaev 854823fa33 Add optional compression method to SP-GiST
This patch allows the type of the column and the type of the value stored in
leaf tuples of SP-GiST to differ.  The main application of the feature is to
transform a complex column type into a simpler indexed type, or to truncate
values that are too long; the transformation may be lossy.  A simple example:
polygons are converted to their bounding boxes; that opclass follows.

Authors: me, Heikki Linnakangas, Alexander Korotkov, Nikita Glukhov
Reviewed-By: all authors + Darafei Praliaskouski
Discussions:
https://www.postgresql.org/message-id/5447B3FF.2080406@sigaev.ru
https://www.postgresql.org/message-id/flat/54907069.1030506@sigaev.ru#54907069.1030506@sigaev.ru
2017-12-22 13:33:16 +03:00
Alvaro Herrera 9373baa0f7 Minor edits to catalog files and scripts
This fixes a few typos and small mistakes; it also cleans a few
minor stylistic issues.  The biggest functional change is that
Gen_fmgrtab.pl no longer knows the OID of language 'internal'.

Author: John Naylor
Discussion: https://postgr.es/m/CAJVSVGXAkwbk-A9QHHHf00N905kKisyQbaYwKqaRpze_gPXGfg@mail.gmail.com
2017-12-21 19:07:32 -03:00
Robert Haas cce1ecfc77 Adjust assertion in GetCurrentCommandId.
currentCommandIdUsed is only used to skip redundant increments of the
command counter, and CommandCounterIncrement() is categorically denied
under parallelism anyway.  Therefore, it's OK for
GetCurrentCommandId() to mark the counter value used, as long as it
happens in the leader, not a worker.

Prior to commit e9baa5e9fa, the slightly
incorrect check didn't matter, but now it does.  A test case added by
commit 1804284042 uncovered the problem
by accident; it caused failures with force_parallel_mode=on/regress.

Report and review by Andres Freund.  Patch by me.

Discussion: http://postgr.es/m/20171221143106.5lhtygohvmazli3x@alap3.anarazel.de
2017-12-21 13:19:59 -05:00
Tom Lane 6719b238e8 Rearrange execution of PARAM_EXTERN Params for plpgsql's benefit.
This patch does three interrelated things:

* Create a new expression execution step type EEOP_PARAM_CALLBACK
and add the infrastructure needed for add-on modules to generate that.
As discussed, the best control mechanism for that seems to be to add
another hook function to ParamListInfo, which will be called by
ExecInitExpr if it's supplied and a PARAM_EXTERN Param is found.
For stand-alone expressions, we add a new entry point to allow the
ParamListInfo to be specified directly, since it can't be retrieved
from the parent plan node's EState.

* Redesign the API for the ParamListInfo paramFetch hook so that the
ParamExternData array can be entirely virtual.  This also lets us get rid
of ParamListInfo.paramMask, instead leaving it to the paramFetch hook to
decide which param IDs should be accessible or not.  plpgsql_param_fetch
was already doing the identical masking check, so having callers do it too
seemed redundant.  While I was at it, I added a "speculative" flag to
paramFetch that the planner can specify as TRUE to avoid unwanted failures.
This solves an ancient problem for plpgsql that it couldn't provide values
of non-DTYPE_VAR variables to the planner for fear of triggering premature
"record not assigned yet" or "field not found" errors during planning.

* Rework plpgsql to get rid of the need for "unshared" parameter lists,
by dint of turning the single ParamListInfo per estate into a nearly
read-only data structure that doesn't instantiate any per-variable data.
Instead, the paramFetch hook controls access to per-variable data and can
make the right decisions on the fly, replacing the cases that we used to
need multiple ParamListInfos for.  This might perhaps have been a
performance loss on its own, but by using a paramCompile hook we can
bypass plpgsql_param_fetch entirely during normal query execution.
(It's now only called when, eg, we copy the ParamListInfo into a cursor
portal.  copyParamList() or SerializeParamList() effectively instantiate
the virtual parameter array as a simple physical array without a
paramFetch hook, which is what we want in those cases.)  This allows
reverting most of commit 6c82d8d1f, though I kept the cosmetic
code-consolidation aspects of that (eg the assign_simple_var function).

Performance testing shows this to be at worst a break-even change,
and it can provide wins ranging up to 20% in test cases involving
accesses to fields of "record" variables.  The fact that values of
such variables can now be exposed to the planner might produce wins
in some situations, too, but I've not pursued that angle.

In passing, remove the "parent" pointer from the arguments to
ExecInitExprRec and related functions, instead storing that pointer in a
transient field in ExprState.  The ParamListInfo pointer for a stand-alone
expression is handled the same way; we'd otherwise have had to add
yet another recursively-passed-down argument in expression compilation.

Discussion: https://postgr.es/m/32589.1513706441@sss.pgh.pa.us
2017-12-21 12:57:45 -05:00
Alvaro Herrera 8a0596cb65 Get rid of copy_partition_key
That function currently exists to avoid leaking memory in
CacheMemoryContext in case of trouble while the partition key is being
built, but there's a better way: allocate everything in a memcxt that
goes away if the current (sub)transaction fails, and once the partition
key is built and no further errors can occur, make the memcxt permanent
by making it a child of CacheMemoryContext.

Reviewed-by: Tom Lane
Discussion: https://postgr.es/m/20171027172730.eh2domlkpn4ja62m@alvherre.pgsql
2017-12-21 14:21:39 -03:00
Alvaro Herrera 9ef6aba1d3 Fix typo 2017-12-21 13:36:52 -03:00
Tom Lane c98c35cd08 Avoid putting build-location-dependent strings into generated files.
Various Perl scripts we use to generate files were in the habit of
printing things like "generated by $0" into their output files.
That looks like a fine idea at first glance, but it results in
non-reproducible output, because in VPATH builds $0 won't be just
the name of the script file, but a full path for it.  We'd prefer
that you get identical results whether using VPATH or not, so this
is a bad thing.

Some of these places also printed their input file name(s), causing
an additional hazard of the same type.

Hence, establish a policy that thou shalt not print $0, nor input file
pathnames, into output files (they're still allowed in error messages,
though).  Instead just write the script name verbatim.  While we are at
it, we can make these annotations more useful by giving the script's
full relative path name within the PG source tree, eg instead of
Gen_fmgrtab.pl let's print src/backend/utils/Gen_fmgrtab.pl.

Not all of the changes made here actually affect any files shipped
in finished tarballs today, but it seems best to apply the policy
everyplace so that nobody copies unsafe code into places where it
could matter.

Christoph Berg and Tom Lane

Discussion: https://postgr.es/m/20171215102223.GB31812@msg.df7cb.de
2017-12-21 10:57:06 -05:00
Robert Haas 59d1e2b95a Cancel CV sleep during subtransaction abort.
Generally, error recovery paths that need to do things like
LWLockReleaseAll and pgstat_report_wait_end also need to call
ConditionVariableCancelSleep, but AbortSubTransaction was missed.

Since subtransaction abort might destroy the DSM segment that
contains the ConditionVariable stored in cv_sleep_target, this
can result in a crash for anything using condition variables.

Reported and diagnosed by Andres Freund.

Discussion: http://postgr.es/m/20171221110048.rxk6464azzl5t2fi@alap3.anarazel.de
2017-12-21 09:24:30 -05:00
Andres Freund 1804284042 Add parallel-aware hash joins.
Introduce parallel-aware hash joins that appear in EXPLAIN plans as Parallel
Hash Join with Parallel Hash.  While hash joins could already appear in
parallel queries, they were previously always parallel-oblivious and had a
partial subplan only on the outer side, meaning that the work of the inner
subplan was duplicated in every worker.

After this commit, the planner will consider using a partial subplan on the
inner side too, using the Parallel Hash node to divide the work over the
available CPU cores and combine its results in shared memory.  If the join
needs to be split into multiple batches in order to respect work_mem, then
workers process different batches as much as possible and then work together
on the remaining batches.

The advantages of a parallel-aware hash join over a parallel-oblivious hash
join used in a parallel query are that it:

 * avoids wasting memory on duplicated hash tables
 * avoids wasting disk space on duplicated batch files
 * divides the work of building the hash table over the CPUs

One disadvantage is that there is some communication between the participating
CPUs which might outweigh the benefits of parallelism in the case of small
hash tables.  This is avoided by the planner's existing reluctance to supply
partial plans for small scans, but it may be necessary to estimate
synchronization costs in future if that situation changes.  Another is that
outer batch 0 must be written to disk if multiple batches are required.

A potential future advantage of parallel-aware hash joins is that right and
full outer joins could be supported, since there is a single set of matched
bits for each hashtable, but that is not yet implemented.

A new GUC enable_parallel_hash is defined to control the feature, defaulting
to on.

Author: Thomas Munro
Reviewed-By: Andres Freund, Robert Haas
Tested-By: Rafia Sabih, Prabhat Sahu
Discussion:
    https://postgr.es/m/CAEepm=2W=cOkiZxcg6qiFQP-dHUe09aqTrEMM7yJDrHMhDv_RA@mail.gmail.com
    https://postgr.es/m/CAEepm=37HKyJ4U6XOLi=JgfSHM3o6B-GaeO-6hkOmneTDkH+Uw@mail.gmail.com
2017-12-21 00:43:41 -08:00
Robert Haas f94eec490b When passing query strings to workers, pass the terminating \0.
Otherwise, when the query string is read, we might read trailing garbage
beyond the end, unless there happens to be a \0 there by good luck.
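
The general pattern, as a minimal sketch: copy strlen() + 1 bytes so that the terminator travels with the string. The function name is hypothetical, and the caller is assumed to have sized the destination for the string plus its terminator.

```c
#include <string.h>

/*
 * Copy a query string into a pre-sized buffer, including its terminating
 * \0.  Returns the number of bytes copied.
 */
static size_t
sketch_copy_query(char *dest, const char *query)
{
    size_t      len = strlen(query) + 1;    /* +1 carries the \0 */

    memcpy(dest, query, len);
    return len;
}
```

Copying only strlen() bytes would leave the reader dependent on whatever byte already sits after the string in the destination.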

Report and patch by Thomas Munro. Reviewed by Rafia Sabih.

Discussion: http://postgr.es/m/CAEepm=2SJs7X+_vx8QoDu8d1SMEOxtLhxxLNzZun_BvNkuNhrw@mail.gmail.com
2017-12-20 17:26:50 -05:00
Robert Haas 8526bcb2df Try again to fix accumulation of parallel worker instrumentation.
When a Gather or Gather Merge node is started and stopped multiple
times, accumulate instrumentation data only once, at the end, instead
of after each execution, to avoid recording inflated totals.
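
A toy model of why the timing matters, under the assumption that each worker's counters keep accumulating across executions. The names are hypothetical and the real instrumentation structures carry far more than one counter.

```c
/* Hypothetical stand-in for per-node instrumentation. */
typedef struct SketchInstr
{
    long        ntuples;
} SketchInstr;

/* One execution by a worker: its counter keeps growing across executions. */
static void
sketch_worker_run(SketchInstr *worker, long tuples)
{
    worker->ntuples += tuples;
}

/* Fold all worker counters into the leader's total. */
static void
sketch_retrieve(SketchInstr *total, const SketchInstr *workers, int nworkers)
{
    for (int i = 0; i < nworkers; i++)
        total->ntuples += workers[i].ntuples;
}
```

With one worker processing 10 tuples in each of two executions, calling sketch_retrieve() once at the end yields 20, whereas calling it after each execution adds 10 and then 20, for an inflated 30.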

Commit 778e78ae9f, the previous attempt
at a fix, instead reset the state after every execution, which worked
for the general instrumentation data but had problems for the additional
instrumentation specific to Sort and Hash nodes.

Report by hubert depesz lubaczewski.  Analysis and fix by Amit Kapila,
following a design proposal from Thomas Munro, with a comment tweak
by me.

Discussion: http://postgr.es/m/20171127175631.GA405@depesz.com
2017-12-19 12:21:56 -05:00
Robert Haas 38fc54703e Re-fix wrong costing of Sort under Gather Merge.
Commit dc02c7bca4 changed this call
to create_sort_path() to take -1 rather than limit_tuples because,
at that time, there was no way for a Sort beneath a Gather Merge
to become a top-N sort.

Later, commit 3452dc5240 provided
a way for a Sort beneath a Gather Merge to become a top-N sort,
but failed to revert the previous commit in the process.  Do that.

Report and analysis by Jeff Janes; patch by Thomas Munro; review by
Amit Kapila and by me.

Discussion: http://postgr.es/m/CAEepm=1BWtC34vUroA0Uqjw02MaqdUrW+d6WD85_k8SLyPiKHQ@mail.gmail.com
2017-12-19 10:42:17 -05:00
Andres Freund ab9e0e718a Add shared tuplestores.
SharedTuplestore allows multiple participants to write into it and
then read the tuples back from it in parallel.  Each reader receives
partial results.

For now it always uses disk files, but other buffering policies and
other kinds of scans (ie each reader receives complete results) may be
useful in future.

The upcoming parallel hash join feature will use this facility.

Author: Thomas Munro
Reviewed-By: Peter Geoghegan, Andres Freund, Robert Haas
Discussion: https://postgr.es/m/CAEepm=2W=cOkiZxcg6qiFQP-dHUe09aqTrEMM7yJDrHMhDv_RA@mail.gmail.com
2017-12-18 14:23:19 -08:00
Peter Eisentraut 25d532698d Move SCRAM-related name definitions to scram-common.h
Mechanism names for SCRAM and channel binding names have been included
in scram.h by the libpq frontend code, and this header references a set
of routines which are only used by the backend.  scram-common.h is on
the contrary usable by both the backend and libpq, so getting those
names from there seems more reasonable.

Author: Michael Paquier <michael.paquier@gmail.com>
2017-12-18 16:59:48 -05:00
Fujii Masao 56a95ee511 Fix bug in cancellation of non-exclusive backup to avoid assertion failure.
Previously an assertion failure occurred when pg_stop_backup() for
non-exclusive backup was aborted while it's waiting for WAL files to
be archived. This assertion failure happened in do_pg_abort_backup()
which was called when a non-exclusive backup was canceled.
do_pg_abort_backup() assumes that there is at least one non-exclusive
backup running when it's called. But pg_stop_backup() can be canceled
even after it marks the end of non-exclusive backup (e.g.,
during waiting for WAL archiving). This broke the assumption that
do_pg_abort_backup() relies on, which caused an assertion failure.

This commit changes do_pg_abort_backup() so that it does nothing
when non-exclusive backup has been already marked as completed.
That is, the assumption is also changed, and do_pg_abort_backup()
now can handle even the case where it's called when there is
no running backup.

Backpatch to 9.6 where SQL-callable non-exclusive backup was added.

Author: Masahiko Sawada and Michael Paquier
Reviewed-By: Robert Haas and Fujii Masao
Discussion: https://www.postgresql.org/message-id/CAD21AoD2L1Fu2c==gnVASMyFAAaq3y-AQ2uEVj-zTCGFFjvmDg@mail.gmail.com
2017-12-19 03:46:14 +09:00
Robert Haas fd7c0fa732 Fix crashes on plans with multiple Gather (Merge) nodes.
es_query_dsa turns out to be broken by design, because it supposes
that there is only one DSA for the whole query, whereas there is
actually one per Gather (Merge) node.  For now, work around that
problem by setting and clearing the pointer around the sections of
code that might need it.  It's probably a better idea to get rid of
es_query_dsa altogether in favor of having each node keep track
individually of which DSA is relevant, but that seems like more than
we would want to back-patch.

Thomas Munro, reviewed and tested by Andreas Seltenreich, Amit
Kapila, and by me.

Discussion: http://postgr.es/m/CAEepm=1U6as=brnVvMNixEV2tpi8NuyQoTmO8Qef0-VV+=7MDA@mail.gmail.com
2017-12-18 12:22:31 -05:00
Magnus Hagander 7731c32087 Fix typo on comment
Author: David Rowley
2017-12-18 11:24:55 +01:00
Tom Lane b31a9d7dd3 Suppress compiler warning about no function return value.
Compilers that don't know that ereport(ERROR) doesn't return
complained about the new coding in scanint8() introduced by
commit 101c7ee3e.  Tweak coding to avoid the warning.
Per buildfarm.
2017-12-17 00:41:41 -05:00
Andres Freund 699bf7d05c Perform a lot more sanity checks when freezing tuples.
The previous commit has shown that the sanity checks around freezing
aren't strong enough. Strengthening them seems especially important
because the existence of the bug has caused corruption that we don't
want to make even worse during future vacuum cycles.

The errors are emitted with ereport rather than elog, despite being
"should never happen" messages, so a proper error code is emitted. To
avoid superfluous translations, mark messages as internal.

Author: Andres Freund and Alvaro Herrera
Reviewed-By: Alvaro Herrera, Michael Paquier
Discussion: https://postgr.es/m/20171102112019.33wb7g5wp4zpjelu@alap3.anarazel.de
Backpatch: 9.3-
2017-12-14 18:20:47 -08:00
Andres Freund 9c2f0a6c3c Fix pruning of locked and updated tuples.
Previously it was possible that a tuple was not pruned during vacuum,
even though its update xmax (i.e. the updating xid in a multixact with
both key share lockers and an updater) was below the cutoff horizon.

As the freezing code assumed, rightly so, that that's not supposed to
happen, xmax would be preserved (as a member of a new multixact or
xmax directly). That causes two problems: For one the tuple is below
the xmin horizon, which can cause problems if the clog is truncated or
once there's an xid wraparound. The bigger problem is that that will
break HOT chains, which in turn can lead to two kinds of breakage: First,
failing index lookups, which in turn can e.g. lead to constraints being
violated. Second, future hot prunes / vacuums can end up making
invisible tuples visible again. There are other harmful scenarios.

Fix the problem by recognizing that tuples can be DEAD instead of
RECENTLY_DEAD, even if the multixactid has alive members, if the
update_xid is below the xmin horizon. That's safe because newer
versions of the tuple will contain the locking xids.

A followup commit will harden the code somewhat against future similar
bugs and already corrupted data.

Author: Andres Freund, with changes by Alvaro Herrera
Reported-By: Daniel Wood
Analyzed-By: Andres Freund, Alvaro Herrera, Robert Haas, Peter
   Geoghegan, Daniel Wood, Yi Wen Wong, Michael Paquier
Reviewed-By: Alvaro Herrera, Robert Haas, Michael Paquier
Discussion:
    https://postgr.es/m/E5711E62-8FDF-4DCA-A888-C200BF6B5742@amazon.com
    https://postgr.es/m/20171102112019.33wb7g5wp4zpjelu@alap3.anarazel.de
Backpatch: 9.3-
2017-12-14 18:20:47 -08:00
Andrew Dunstan 0fedb4ea69 Fix walsender timeouts when decoding a large transaction
The logical slots have a fast code path for sending data so as not to
impose too high a per message overhead. The fast path skips checks for
interrupts and timeouts. However, the existing coding failed to consider
the fact that a transaction with a large number of changes may take a
very long time to be processed and sent to the client. This causes the
walsender to ignore interrupts for potentially a long time and more
importantly it will result in the walsender being killed due to
timeout at the end of such a transaction.

This commit changes the fast path to also check for interrupts and only
allows calling the fast path when the last keepalive check happened less
than half the walsender timeout ago. Otherwise the slower code path will
be taken.
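
The fast-path condition can be sketched roughly as follows. The function name and the millisecond units are assumptions for illustration, as is treating a non-positive timeout as "disabled".

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Decide whether the fast path may be used.  All times are in
 * milliseconds.
 */
static bool
sketch_can_use_fast_path(int64_t now_ms, int64_t last_check_ms,
                         int64_t timeout_ms)
{
    if (timeout_ms <= 0)
        return true;            /* assumption: timeout disabled */

    /* Fast path only if the last keepalive check was recent enough. */
    return (now_ms - last_check_ms) < timeout_ms / 2;
}
```

With a 60 second timeout, a check 100 ms ago permits the fast path, while a check 30 seconds or more ago forces the slower path that handles keepalives.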

Backpatched to 9.4

Petr Jelinek, reviewed by Kyotaro HORIGUCHI, Yura Sokolov, Craig
Ringer and Robert Haas.

Discussion: https://postgr.es/m/e082a56a-fd95-a250-3bae-0fff93832510@2ndquadrant.com
2017-12-14 11:13:14 -05:00
Andres Freund 538d114f6d Allow executor nodes to change their ExecProcNode function.
In order for executor nodes to be able to change their ExecProcNode function
after ExecInitNode() has finished, provide ExecSetExecProcNode().  This allows
any wrapper functions that only execProcnode.c knows about to be reinstalled.
The motivation for wanting to change ExecProcNode after ExecInitNode() has
finished is that it is not known until later whether parallel query is
available, so if a parallel variant is to be installed then ExecInitNode()
is too soon to decide.

Author: Thomas Munro
Reviewed-By: Andres Freund
Discussion: https://postgr.es/m/CAEepm=09rr65VN+cAV5FgyM_z=D77Xy8Fuc9CDDDYbq3pQUezg@mail.gmail.com
2017-12-13 15:47:01 -08:00
Andres Freund 923e8dee88 Add defenses against pre-crash files to BufFileOpenShared().
Crash restarts currently don't clean up temporary files, as a debugging aid.
If a left-over file happens to have the same name as a segment file we're
trying to create, we'll just truncate and reuse it, but there is a problem:
BufFileOpenShared() determines how many segment files exist by trying to open
.0, .1, .2, ... until it finds no more files.  It might be confused by a junk
file that has the next segment number.  To defend against that, make sure we
always create a gap after the end file by unlinking the following name if it
exists.  Also make it an error to try to open a BufFile that doesn't exist
(has no segment 0), so as not to encourage the development of client code
that depends on an interface that we can't reliably provide.
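
The defense can be modelled without touching a real filesystem. In this sketch a bool array stands in for "segment file N exists on disk"; every name is hypothetical and nothing here is the actual BufFile code.

```c
#include <stdbool.h>

#define SKETCH_MAX_SEGMENTS 8

/* Count segments by probing .0, .1, .2, ... until one is missing. */
static int
sketch_count_segments(const bool *exists)
{
    int         n = 0;

    while (n < SKETCH_MAX_SEGMENTS && exists[n])
        n++;
    return n;
}

/* Create segment segno, then force a gap by removing any leftover successor. */
static void
sketch_create_segment(bool *exists, int segno)
{
    exists[segno] = true;
    if (segno + 1 < SKETCH_MAX_SEGMENTS)
        exists[segno + 1] = false;  /* models unlinking the next name */
}

/* Open: returns the segment count, or -1 if there is no segment 0. */
static int
sketch_open(const bool *exists)
{
    int         n = sketch_count_segments(exists);

    return (n == 0) ? -1 : n;
}
```

If a junk ".2" file survives a crash, creating segments 0 and 1 removes it, so a later open counts two segments rather than three.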

Author: Thomas Munro
Reviewed-By: Andres Freund
Discussion: https://postgr.es/m/CAEepm%3D2jhCbC_GFQJaaDhWxLB4EXtT3vVd5czuRNaqF5CWSTog%40mail.gmail.com
2017-12-13 13:27:41 -08:00
Robert Haas 884a60840c Fix parallel index scan hang with deleted or half-dead pages.
The previous coding forgot to release the scan before seizing
it again, leading to a lockup.

Report by Patrick Hemmer.  Diagnosis by Thomas Munro.  Patch by
Amit Kapila.

Discussion: http://postgr.es/m/CAEepm=2xZUcOGP9V0O_G0=2P2wwXwPrkF=upWTCJSisUxMnuSg@mail.gmail.com
2017-12-13 16:15:44 -05:00
Robert Haas 1d6fb35ad6 Revert "Fix accumulation of parallel worker instrumentation."
This reverts commit 2c09a5c12a.  Per
further discussion, that doesn't seem to be the best possible fix.

Discussion: http://postgr.es/m/CAA4eK1LW2aFKzY3=vwvc=t-juzPPVWP2uT1bpx_MeyEqnM+p8g@mail.gmail.com
2017-12-13 15:19:28 -05:00
Tom Lane 9fa6f00b13 Rethink MemoryContext creation to improve performance.
This patch makes a number of interrelated changes to reduce the overhead
involved in creating/deleting memory contexts.  The key ideas are:

* Include the AllocSetContext header of an aset.c context in its first
malloc request, rather than allocating it separately in TopMemoryContext.
This means that we now always create an initial or "keeper" block in an
aset, even if it never receives any allocation requests.

* Create freelists in which we can save and recycle recently-destroyed
asets (this idea is due to Robert Haas).

* In the common case where the name of a context is a constant string,
just store a pointer to it in the context header, rather than copying
the string.

The first change eliminates a palloc/pfree cycle per context, and
also avoids bloat in TopMemoryContext, at the price that creating
a context now involves a malloc/free cycle even if the context never
receives any allocations.  That would be a loser for some common
usage patterns, but recycling short-lived contexts via the freelist
eliminates that pain.

Avoiding copying constant strings not only saves strlen() and strcpy()
overhead, but is an essential part of the freelist optimization because
it makes the context header size constant.  Currently we make no
attempt to use the freelist for contexts with non-constant names.
(Perhaps someday we'll need to think harder about that, but in current
usage, most contexts with custom names are long-lived anyway.)

The freelist management in this initial commit is pretty simplistic,
and we might want to refine it later --- but in common workloads that
will never matter because the freelists will never get full anyway.

To create a context with a non-constant name, one is now required to
call AllocSetContextCreateExtended and specify the MEMCONTEXT_COPY_NAME
option.  AllocSetContextCreate becomes a wrapper macro, and it includes
a test that will complain about non-string-literal context name
parameters on gcc and similar compilers.

An unfortunate side effect of making AllocSetContextCreate a macro is
that one is now *required* to use the size parameter abstraction macros
(ALLOCSET_DEFAULT_SIZES and friends) with it; the pre-9.6 habit of
writing out individual size parameters no longer works unless you
switch to AllocSetContextCreateExtended.

Internally to the memory-context-related modules, the context creation
APIs are simplified, removing the rather baroque original design whereby
a context-type module called mcxt.c which then called back into the
context-type module.  That saved a bit of code duplication, but not much,
and it prevented context-type modules from exercising control over the
allocation of context headers.

In passing, I converted the test-and-elog validation of aset size
parameters into Asserts to save a few more cycles.  The original thought
was that callers might compute size parameters on the fly, but in practice
nobody does that, so it's useless to expend cycles on checking those
numbers in production builds.

Also, mark the memory context method-pointer structs "const",
just for cleanliness.

Discussion: https://postgr.es/m/2264.1512870796@sss.pgh.pa.us
2017-12-13 13:55:16 -05:00
Peter Eisentraut 3d8874224f Fix crash when using CALL on an aggregate
Author: Ashutosh Bapat <ashutosh.bapat@enterprisedb.com>
Reported-by: Rushabh Lathia <rushabh.lathia@gmail.com>
2017-12-13 10:37:48 -05:00
Andres Freund 8e211f5391 Add float.h include to int8.c, for isnan().
port.h redirects isnan() to _isnan() on Windows, which in turn is
provided by float.h rather than math.h. Therefore include float.h
as well.

Per buildfarm.
2017-12-12 23:32:43 -08:00
Andres Freund f512a6e132 Consistently use PG_INT(16|32|64)_(MIN|MAX).
Per buildfarm animal woodlouse.
2017-12-12 18:19:13 -08:00
Andres Freund 101c7ee3ee Use new overflow aware integer operations.
A previous commit added inline functions that provide fast(er) and
correct overflow checks for signed integer math. Use them in a
significant portion of backend code.  There's more to touch in both
backend and frontend code, but these were the easily identifiable
cases.

The old overflow checks are noticeable in integer-heavy workloads.

A secondary benefit is that getting rid of overflow checks that rely
on signed integer overflow wrapping around will allow us to get rid
of -fwrapv in the future; that flag in turn slows down other code.
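
For illustration only, an overflow-aware addition in this style might
look like the sketch below.  The helper name toy_add_int_overflow is
hypothetical, and the sketch assumes the compiler builtin
__builtin_add_overflow exists on gcc 5 or later and on clang, with a
portable fallback that never relies on wraparound.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper: on success store a + b in *result and return
 * false; on overflow return true, in which case the content of *result
 * should not be relied upon.
 */
bool
toy_add_int_overflow(int a, int b, int *result)
{
	assert(result != NULL);
#if defined(__clang__) || (defined(__GNUC__) && __GNUC__ >= 5)
	/* Compiler builtin: typically compiles to an add plus a flag test. */
	return __builtin_add_overflow(a, b, result);
#else
	/* Portable fallback: check the bounds before adding. */
	if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b))
		return true;
	*result = a + b;
	return false;
#endif
}
```

A caller would then write something like
"if (toy_add_int_overflow(x, y, &sum)) report an error" instead of
adding first and inspecting the sign of the result afterwards.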

Author: Andres Freund
Discussion: https://postgr.es/m/20171024103954.ztmatprlglz3rwke@alap3.anarazel.de
2017-12-12 16:55:37 -08:00
Robert Haas 95b52351fe Remove obsolete comment.
Commit 8b304b8b72 removed replacement
selection, but left behind this comment text.  The optimization to
which the comment refers is not relevant without replacement
selection, because if we had so few tuples as to require only one
tape, we would have just completed the sort in memory.

Peter Geoghegan

Discussion: http://postgr.es/m/CAH2-WznqupLA8CMjp+vqzoe0yXu0DYYbQSNZxmgN76tLnAOZ_w@mail.gmail.com
2017-12-12 19:33:50 -05:00
Robert Haas d329dc2ea4 Remove bug from OPTIMIZER_DEBUG code for partition-wise join.
Etsuro Fujita, reviewed by Ashutosh Bapat

Discussion: http://postgr.es/m/5A2A60E6.6000008@lab.ntt.co.jp
2017-12-12 10:52:15 -05:00
Peter Eisentraut 4034db215b Fix comment
Reported-by: Noah Misch <noah@leadboat.com>
2017-12-11 16:37:39 -05:00
Tom Lane 7eb16ab17d Fix corner-case coredump in _SPI_error_callback().
I noticed that _SPI_execute_plan initially sets spierrcontext.arg = NULL,
and only fills it in some time later.  If an error were to happen in
between, _SPI_error_callback would try to dereference the null pointer.
This is unlikely --- there's not much between those points except
push-snapshot calls --- but it's clearly not impossible.  Tweak the
callback to do nothing if the pointer isn't set yet.
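
The shape of the fix can be sketched as follows.  This is a toy model
with hypothetical names (ToyErrorContext, toy_error_callback, toy_fire),
not the SPI code: the callback simply returns when its argument has not
been filled in yet.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an error-context stack entry. */
typedef struct ToyErrorContext
{
	void		(*callback) (void *arg);
	void	   *arg;			/* NULL until the caller fills it in */
} ToyErrorContext;

static int	toy_reports = 0;	/* counts context lines "emitted" */

/* Error callback: add context only if the argument is set. */
void
toy_error_callback(void *arg)
{
	const char *query = (const char *) arg;

	if (query == NULL)
		return;					/* not set yet, so do nothing */
	toy_reports++;				/* a real callback would report the query */
}

/* Simulate an error being raised while this context entry is active. */
void
toy_fire(const ToyErrorContext *cxt)
{
	assert(cxt != NULL && cxt->callback != NULL);
	cxt->callback(cxt->arg);
}
```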

It's been like this for a while, so back-patch to all supported branches.
2017-12-11 16:34:28 -05:00
Robert Haas 01a0ca1bed Improve comment about PartitionBoundInfoData.
Ashutosh Bapat, per discussion with Julien Rouhaud, who also
reviewed this patch.

Discussion: http://postgr.es/m/CAFjFpReBR3ftK9C23LLCZY_TDXhhjB_dgE-L9+mfTnA=gkvdvQ@mail.gmail.com
2017-12-11 12:52:15 -05:00
Magnus Hagander d8f632caec Fix typo
Reported by Robins Tharakan
2017-12-09 11:40:31 +01:00
Peter Eisentraut 005ac298b1 Prohibit identity columns on typed tables and partitions
Those cases currently crash and supporting them is more work than
originally thought, so we'll just prohibit these scenarios for now.

Author: Michael Paquier <michael.paquier@gmail.com>
Reviewed-by: Amit Langote <Langote_Amit_f8@lab.ntt.co.jp>
Reported-by: Мансур Галиев <gomer94@yandex.ru>
Bug: #14866
2017-12-08 12:13:04 -05:00
Peter Eisentraut af9f8b7ca3 Fix mistake in comment
Reported-by: Masahiko Sawada <sawada.mshk@gmail.com>
2017-12-08 11:23:36 -05:00
Peter Eisentraut 2d2d06b7e2 Apply identity sequence values on COPY
A COPY into a table should apply identity sequence values just like it
does for ordinary defaults.  This was previously forgotten, leading to
null values being inserted, which in turn would fail because identity
columns have not-null constraints.

Author: Michael Paquier <michael.paquier@gmail.com>
Reported-by: Steven Winfield <steven.winfield@cantabcapital.com>
Bug: #14952
2017-12-08 09:18:18 -05:00
Robert Haas 28724fd90d Report failure to start a background worker.
When a worker is flagged as BGW_NEVER_RESTART and we fail to start it,
or if it is not marked BGW_NEVER_RESTART but is terminated before
startup succeeds, what BgwHandleStatus should be reported?  The
previous code really hadn't considered this possibility (as indicated
by the comments which ignore it completely) and would typically return
BGWH_NOT_YET_STARTED, but that's not a good answer, because then
there's no way for code using GetBackgroundWorkerPid() to tell the
difference between a worker that has not started but will start
later and a worker that has not started and will never be started.
So, when this case happens, return BGWH_STOPPED instead.  Update the
comments to reflect this.

The preceding fix by itself is insufficient to fix the problem,
because the old code also didn't send a notification to the process
identified in bgw_notify_pid when startup failed.  That might've
been technically correct under the theory that the status of the
worker was BGWH_NOT_YET_STARTED, because the status would indeed not
change when the worker failed to start, but now that we're more
usefully reporting BGWH_STOPPED, a notification is needed.

Without these fixes, code which starts background workers and then
uses the recommended APIs to wait for those background workers to
start would hang indefinitely if the postmaster failed to fork a
worker.
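
To see why the reported status matters to a waiting caller, consider
the toy model below.  It does not use the real background worker API;
ToyStatus, the two probe functions and toy_wait_for_startup are
hypothetical, and the poll limit exists only so the sketch terminates
(a real waiter would sleep until notified and keep waiting).

```c
#include <assert.h>

/* Hypothetical mirror of the three statuses discussed above. */
typedef enum ToyStatus
{
	TOY_STARTED,
	TOY_NOT_YET_STARTED,
	TOY_STOPPED
} ToyStatus;

typedef ToyStatus (*ToyProbe) (void);

/* Old behavior for a worker that could not be forked. */
ToyStatus
toy_probe_old(void)
{
	return TOY_NOT_YET_STARTED;
}

/* New behavior for the same failed worker. */
ToyStatus
toy_probe_new(void)
{
	return TOY_STOPPED;
}

/*
 * Poll until the worker leaves the "not yet started" state.  Returns the
 * number of polls used, or -1 if max_polls ran out, which stands in for
 * "would have waited forever".
 */
int
toy_wait_for_startup(ToyProbe probe, int max_polls, ToyStatus *status)
{
	int			polls;

	assert(max_polls > 0);
	for (polls = 1; polls <= max_polls; polls++)
	{
		*status = probe();
		if (*status != TOY_NOT_YET_STARTED)
			return polls;
	}
	return -1;
}
```

With toy_probe_new the wait ends on the first poll and the caller can
report the failure; with toy_probe_old it never gets an answer.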

Amit Kapila and Robert Haas

Discussion: http://postgr.es/m/CAA4eK1KDfKkvrjxsKJi3WPyceVi3dH1VCkbTJji2fuwKuB=3uw@mail.gmail.com
2017-12-06 08:58:27 -05:00
Robert Haas 9c64ddd414 Fix Parallel Append crash.
Reported by Tom Lane and the buildfarm.

Amul Sul and Amit Khandekar

Discussion: http://postgr.es/m/17868.1512519318@sss.pgh.pa.us
Discussion: http://postgr.es/m/CAJ3gD9cJQ4d-XhmZ6BqM9rMM2KDBfpkdgOAb4+psz56uBuMQ_A@mail.gmail.com
2017-12-06 08:42:50 -05:00