Commit Graph

18654 Commits

Author SHA1 Message Date
Tom Lane 0d38e4ebb7 Fix minor error message style guide violation.
No periods at the ends of primary error messages, please.

Daniel Gustafsson

Discussion: https://postgr.es/m/43E004C0-18C6-42B4-A313-003B43EB0571@yesql.se
2018-09-19 17:06:40 -04:00
Tom Lane 8f0de712c3 Don't ignore locktable-full failures in StandbyAcquireAccessExclusiveLock.
Commit 37c54863c removed the code in StandbyAcquireAccessExclusiveLock
that checked the return value of LockAcquireExtended.  That created a
bug, because it's still passing reportMemoryError = false to
LockAcquireExtended, meaning that LOCKACQUIRE_NOT_AVAIL will be returned
if we're out of shared memory for the lock table.

In such a situation, the startup process would believe it had acquired an
exclusive lock even though it hadn't, with potentially dire consequences.

To fix, just drop the use of reportMemoryError = false, which allows us
to simplify the call into a plain LockAcquire().  It's unclear that the
locktable-full situation arises often enough that it's worth having a
better recovery method than crash-and-restart.  (I strongly suspect that
the only reason the code path existed at all was that it was relatively
simple to do in the pre-37c54863c implementation.  But now it's not.)

LockAcquireExtended's reportMemoryError parameter is now dead code and
could be removed.  I refrained from doing so, however, because there
was some interest in resurrecting the behavior if we do get reports of
locktable-full failures in the field.  Also, it seems unwise to remove
the parameter concurrently with shipping commit f868a8143, which added a
parameter; if there are any third-party callers of LockAcquireExtended,
we want them to get a wrong-number-of-parameters compile error rather
than a possibly-silent misinterpretation of its last parameter.

Back-patch to 9.6 where the bug was introduced.

Discussion: https://postgr.es/m/6202.1536359835@sss.pgh.pa.us
2018-09-19 12:43:51 -04:00
Alexander Korotkov 2a6368343f Add support for nearest-neighbor (KNN) searches to SP-GiST
Currently, KNN searches were supported only by GiST.  SP-GiST also capable to
support them.  This commit implements that support.  SP-GiST scan stack is
replaced with queue, which serves as stack if no ordering is specified.  KNN
support is provided for three SP-GIST opclasses: quad_point_ops, kd_point_ops
and poly_ops (catversion is bumped).  Some common parts between GiST and SP-GiST
KNNs are extracted into separate functions.

Discussion: https://postgr.es/m/570825e8-47d0-4732-2bf6-88d67d2d51c8%40postgrespro.ru
Author: Nikita Glukhov, Alexander Korotkov based on GSoC work by Vlad Sterzhanov
Review: Andrey Borodin, Alexander Korotkov
2018-09-19 01:54:10 +03:00
Tom Lane d0cfc3d6a4 Add a debugging option to stress-test outfuncs.c and readfuncs.c.
In the normal course of operation, query trees will be serialized only if
they are stored as views or rules; and plan trees will be serialized only
if they get passed to parallel-query workers.  This leaves an awful lot of
opportunity for bugs/oversights to not get detected, as indeed we've just
been reminded of the hard way.

To improve matters, this patch adds a new compile option
WRITE_READ_PARSE_PLAN_TREES, which is modeled on the longstanding option
COPY_PARSE_PLAN_TREES; but instead of passing all parse and plan trees
through copyObject, it passes them through nodeToString + stringToNode.
Enabling this option in a buildfarm animal or two will catch problems
at least for cases that are exercised by the regression tests.

A small problem with this idea is that readfuncs.c historically has
discarded location fields, on the reasonable grounds that parse
locations in a retrieved view are not relevant to the current query.
But doing that in WRITE_READ_PARSE_PLAN_TREES breaks pg_stat_statements,
and it could cause problems for future improvements that might try to
report error locations at runtime.  To fix that, provide a variant
behavior in readfuncs.c that makes it restore location fields when
told to.

In passing, const-ify the string arguments of stringToNode and its
subsidiary functions, just because it annoyed me that they weren't
const already.

Discussion: https://postgr.es/m/17114.1537138992@sss.pgh.pa.us
2018-09-18 17:11:54 -04:00
Tom Lane db1071d4ee Fix some minor issues exposed by outfuncs/readfuncs testing.
A test patch to pass parse and plan trees through outfuncs + readfuncs
exposed several issues that need to be fixed to get clean matches:

Query.withCheckOptions failed to get copied; it's intentionally ignored
by outfuncs/readfuncs on the grounds that it'd always be NIL anyway in
stored rules.  This seems less than future-proof, and it's not even
saving very much, so just undo the decision and treat the field like
all others.

Several places that convert a view RTE into a subquery RTE, or similar
manipulations, failed to clear out fields that were specific to the
original RTE type and should be zero in a subquery RTE.  Since readfuncs.c
will leave such fields as zero, equalfuncs.c thinks the nodes are different
leading to a reported mismatch.  It seems like a good idea to clear out the
no-longer-needed fields, even though in principle nothing should look at
them; the node ought to be indistinguishable from how it would look if
we'd built a new node instead of scribbling on the old one.

BuildOnConflictExcludedTargetlist randomly set the resname of some
TargetEntries to "" not NULL.  outfuncs/readfuncs don't distinguish those
cases, and so the string will read back in as NULL ... but equalfuncs.c
does distinguish.  Perhaps we ought to try to make things more consistent
in this area --- but it's just useless extra code space for
BuildOnConflictExcludedTargetlist to not use NULL here, so I fixed it for
now by making it do that.

catversion bumped because the change in handling of Query.withCheckOptions
affects stored rules.

Discussion: https://postgr.es/m/17114.1537138992@sss.pgh.pa.us
2018-09-18 15:08:28 -04:00
Tom Lane 09991e5a47 Fix some probably-minor oversights in readfuncs.c.
The system expects TABLEFUNC RTEs to have coltypes, coltypmods, and
colcollations lists, but outfuncs doesn't dump them and readfuncs doesn't
restore them.  This doesn't cause obvious failures, because the only things
that look at those fields are expandRTE() and get_rte_attribute_type(),
which are mostly used during parse analysis, before anything would've
passed the parsetree through outfuncs/readfuncs.  But expandRTE() is used
in build_physical_tlist(), which means that that function will return a
wrong answer for a TABLEFUNC RTE that came from a view.  Very accidentally,
this doesn't cause serious problems, because what it will return is NIL
which callers will interpret as "couldn't build a physical tlist because
of dropped columns".  So you still get a plan that works, though it's
marginally less efficient than it could be.  There are also some other
expandRTE() calls associated with transformation of whole-row Vars in
the planner.  I have been unable to exhibit misbehavior from that, and
it may be unreachable in any case that anyone would care about ... but
I'm not entirely convinced, so this seems like something we should back-
patch a fix for.  Fortunately, we can fix it without forcing a change
of stored rules and a catversion bump, because we can just copy these
lists from the subsidiary TableFunc object.

readfuncs.c was also missing support for NamedTuplestoreScan plan nodes.
This accidentally fails to break parallel query because a query using
a named tuplestore would never be considered parallel-safe anyway.
However, project policy since parallel query came in is that all plan
node types should have outfuncs/readfuncs support, so this is clearly
an oversight that should be repaired.

Noted while fooling around with a patch to test outfuncs/readfuncs more
thoroughly.  That exposed some other issues too, but these are the only
ones that seem worth back-patching.

Back-patch to v10 where both of these features came in.

Discussion: https://postgr.es/m/17114.1537138992@sss.pgh.pa.us
2018-09-18 13:02:27 -04:00
Thomas Munro 422952ee78 Allow DSM allocation to be interrupted.
Chris Travers reported that the startup process can repeatedly try to
cancel a backend that is in a posix_fallocate()/EINTR loop and cause it
to loop forever.  Teach the retry loop to give up if an interrupt is
pending.  Don't actually check for interrupts in that loop though,
because a non-local exit would skip some clean-up code in the caller.

Back-patch to 9.4 where DSM was added (and posix_fallocate() was later
back-patched).

Author: Chris Travers
Reviewed-by: Ildar Musin, Murat Kabilov, Oleksii Kliukin
Tested-by: Oleksii Kliukin
Discussion: https://postgr.es/m/CAN-RpxB-oeZve_J3SM_6%3DHXPmvEG%3DHX%2B9V9pi8g2YR7YW0rBBg%40mail.gmail.com
2018-09-18 22:56:36 +12:00
Michael Paquier 1d6fbc38d9 Refactor routines for subscription and publication lookups
Those routines gain a missing_ok argument, allowing a caller to get a
NULL result instead of an error if set to true.  This is part of a
larger refactoring effort for objectaddress.c where trying to check for
non-existing objects does not result in cache lookup failures.

Author: Michael Paquier
Reviewed-by: Aleksander Alekseev, Álvaro Herrera
Discussion: https://postgr.es/m/CAB7nPqSZxrSmdHK-rny7z8mi=EAFXJ5J-0RbzDw6aus=wB5azQ@mail.gmail.com
2018-09-18 12:00:18 +09:00
Tom Lane 07a3af0ff8 Fix parsetree representation of XMLTABLE(XMLNAMESPACES(DEFAULT ...)).
The original coding for XMLTABLE thought it could represent a default
namespace by a T_String Value node with a null string pointer.  That's
not okay, though; in particular outfuncs.c/readfuncs.c are not on board
with such a representation, meaning you'll get a null pointer crash
if you try to store a view or rule containing this construct.

To fix, change the parsetree representation so that we have a NULL
list element, instead of a bogus Value node.

This isn't really a functional limitation since default XML namespaces
aren't yet implemented in the executor; you'd just get "DEFAULT
namespace is not supported" anyway.  But crashes are not nice, so
back-patch to v10 where this syntax was added.  Ordinarily we'd consider
a parsetree representation change to be un-backpatchable; but since
existing releases would crash on the way to storing such constructs,
there can't be any existing views/rules to be incompatible with.

Per report from Andrey Lepikhov.

Discussion: https://postgr.es/m/3690074f-abd2-56a9-144a-aa5545d7a291@postgrespro.ru
2018-09-17 13:16:32 -04:00
Tom Lane 3844adbf3c Add outfuncs.c support for RawStmt nodes.
I noticed while poking at a report from Andrey Lepikhov that the
recent addition of RawStmt nodes at the top of raw parse trees
makes it impossible to print any raw parse trees whatsoever,
because outfuncs.c doesn't know RawStmt and hence fails to descend
into it.

While we generally lack outfuncs.c support for utility statements,
there is reasonably complete support for what you can find in a
raw SELECT statement.  It was not my intention to make that all
dead code ... so let's add support for RawStmt.

Back-patch to v10 where RawStmt appeared.
2018-09-16 13:02:47 -04:00
Tom Lane 8f32bacc00 In v11, disable JIT by default (it's still enabled by default in HEAD).
Per discussion, JIT isn't quite mature enough to ship enabled-by-default.

I failed to resist the temptation to do a bunch of copy-editing on the
related documentation.  Also, clean up some inconsistencies in which
section of config.sgml the JIT GUCs are documented in vs. what guc.c
and postgresql.config.sample had.

Discussion: https://postgr.es/m/20180914222657.mw25esrzbcnu6qlu@alap3.anarazel.de
2018-09-15 17:24:35 -04:00
Tom Lane 1f4a920b73 Fix failure with initplans used conditionally during EvalPlanQual rechecks.
The EvalPlanQual machinery assumes that any initplans (that is,
uncorrelated sub-selects) used during an EPQ recheck would have already
been evaluated during the main query; this is implicit in the fact that
execPlan pointers are not copied into the EPQ estate's es_param_exec_vals.
But it's possible for that assumption to fail, if the initplan is only
reached conditionally.  For example, a sub-select inside a CASE expression
could be reached during a recheck when it had not been previously, if the
CASE test depends on a column that was just updated.

This bug is old, appearing to date back to my rewrite of EvalPlanQual in
commit 9f2ee8f28, but was not detected until Kyle Samson reported a case.

To fix, force all not-yet-evaluated initplans used within the EPQ plan
subtree to be evaluated at the start of the recheck, before entering the
EPQ environment.  This could be inefficient, if such an initplan is
expensive and goes unused again during the recheck --- but that's piling
one layer of improbability atop another.  It doesn't seem worth adding
more complexity to prevent that, at least not in the back branches.

It was convenient to use the new-in-v11 ExecEvalParamExecParams function
to implement this, but I didn't like either its name or the specifics of
its API, so revise that.

Back-patch all the way.  Rather than rewrite the patch to avoid depending
on bms_next_member() in the oldest branches, I chose to back-patch that
function into 9.4 and 9.3.  (This isn't the first time back-patches have
needed that, and it exhausted my patience.)  I also chose to back-patch
some test cases added by commits 71404af2a and 342a1ffa2 into 9.4 and 9.3,
so that the 9.x versions of eval-plan-qual.spec are all the same.

Andrew Gierth diagnosed the problem and contributed the added test cases,
though the actual code changes are by me.

Discussion: https://postgr.es/m/A033A40A-B234-4324-BE37-272279F7B627@tripadvisor.com
2018-09-15 13:42:33 -04:00
Alvaro Herrera 6b78231d91 Move PartitionDispatchData struct definition to execPartition.c
There's no reason to expose the struct definition, so don't.

Author: Amit Langote <Langote_Amit_f8@lab.ntt.co.jp>
Discussion: https://postgr.es/m/d3fa24c1-bc65-7133-81df-6474387ccc4f@lab.ntt.co.jp
2018-09-14 19:06:57 -03:00
Alvaro Herrera 20bef2c311 Fix ALTER/TYPE on columns referenced by FKs in partitioned tables
When ALTER TABLE ... SET DATA TYPE affects a column referenced by
constraints and indexes, it drop those constraints and indexes and
recreates them afterwards, so that the definitions match the new data
type.  The original code did this by dropping one object at a time
(commit 077db40fa1 of May 2004), which worked fine because the
dependencies between the objects were pretty straightforward, and
ordering the objects in a specific way was enough to make this work.
However, when there are foreign key constraints in partitioned tables,
the dependencies are no longer so straightforward, and we were getting
errors when attempted:
  ERROR:  cache lookup failed for constraint 16398

This can be fixed by doing all the drops in one pass instead, using
performMultipleDeletions (introduced by df18c51f29 of Aug 2006).  With
this change we can also remove the code to carefully order the list of
objects to be deleted.

Reported-by: Rajkumar Raghuwanshi <rajkumar.raghuwanshi@enterprisedb.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/CAKcux6nWS_m+s=1Udk_U9B+QY7pA-Ac58qR5BdUfOyrwnWHDew@mail.gmail.com
2018-09-14 13:41:20 -03:00
Andrew Gierth 728202b63c Order active window clauses for greater reuse of Sort nodes.
By sorting the active window list lexicographically by the sort clause
list but putting longer clauses before shorter prefixes, we generate
more chances to elide Sort nodes when building the path.

Author: Daniel Gustafsson (with some editorialization by me)
Reviewed-by: Alexander Kuzmenkov, Masahiko Sawada, Tom Lane
Discussion: https://postgr.es/m/124A7F69-84CD-435B-BA0E-2695BE21E5C2%40yesql.se
2018-09-14 17:35:42 +01:00
Amit Kapila 75f9c4ca5a Don't allow LIMIT/OFFSET clause within sub-selects to be pushed to workers.
Allowing sub-select containing LIMIT/OFFSET in workers can lead to
inconsistent results at the top-level as there is no guarantee that the
row order will be fully deterministic.  The fix is to prohibit pushing
LIMIT/OFFSET within sub-selects to workers.

Reported-by: Andrew Fletcher
Bug: 15324
Author: Amit Kapila
Reviewed-by: Dilip Kumar
Backpatch-through: 9.6
Discussion: https://postgr.es/m/153417684333.10284.11356259990921828616@wrigleys.postgresql.org
2018-09-14 09:36:30 +05:30
Michael Paquier 28a8fa984c Improve autovacuum logging for aggressive and anti-wraparound runs
A log message was being generated when log_min_duration is reached for
autovacuum on a given relation to indicate if it was an aggressive run,
and missed the point of mentioning if it is doing an anti-wrapround
run.  The log message generated is improved so as one, both or no extra
details are added depending on the option set.

Author: Sergei Kornilov
Reviewed-by: Masahiko Sawada, Michael Paquier
Discussion: https://postgr.es/m/11587951532155118@sas1-19a94364928d.qloud-c.yandex.net
2018-09-14 07:35:39 +09:00
Amit Kapila bc153c941d Attach FPI to the first record after full_page_writes is turned on.
XLogInsert fails to attach a required FPI to the first record after
full_page_writes is turned on by the last checkpoint.  This bug got
introduced in 9.5 due to code rearrangement in commits 2c03216d83 and
2076db2aea.  Fix it by ensuring that XLogInsertRecord performs a
recomputation when the given record is generated with FPW as off but
found that the flag has been turned on while actually inserting the
record.

Reported-by: Kyotaro Horiguchi
Author: Kyotaro Horiguchi
Reviewed-by: Amit Kapila
Backpatch-through: 9.5 where this problem was introduced
Discussion: https://postgr.es/m/20180420.151043.74298611.horiguchi.kyotaro@lab.ntt.co.jp
2018-09-13 15:32:50 +05:30
Michael Paquier 514a731ddc Simplify static function in extension.c
An extra argument for the filename defining the extension script
location was present, aimed at being used for error reporting, but has
never been used.  This was around since extensions have been added in
d9572c4.

Author: Yugo Nagata
Reviewed-by: Tatsuo Ishii
Discussion: https://postgr.es/m/20180907180504.1ff19e1675bb44a67e9c7ab1@sraoss.co.jp
2018-09-13 16:56:57 +09:00
Peter Eisentraut e5f1bb92cf Simplify index tuple descriptor initialization
We have two code paths for initializing the tuple descriptor for a new
index: For a normal index, we copy the tuple descriptor from the table
and reset a number of fields that are not applicable to indexes.  For an
expression index, we make a blank tuple descriptor and fill in the
needed fields based on the provided expressions.  As pg_attribute has
grown over time, the number of fields that we need to reset in the first
case is now bigger than the number of fields we actually want to copy,
so it's sensible to do it the other way around: Make a blank descriptor
and copy just the fields we need.  This also allows more code sharing
between the two branches, and it avoids having to touch this code for
almost every unrelated change to the pg_attribute structure.

Reviewed-by: Arthur Zakirov <a.zakirov@postgrespro.ru>
2018-09-13 08:22:03 +02:00
Andrew Gierth b7f6bcbffc Repair bug in regexp split performance improvements.
Commit c8ea87e4b introduced a temporary conversion buffer for
substrings extracted during regexp splits. Unfortunately the code that
sized it was failing to ignore the effects of ignored degenerate
regexp matches, so for regexp_split_* calls it could under-size the
buffer in such cases.

Fix, and add some regression test cases (though those will only catch
the bug if run in a multibyte encoding).

Backpatch to 9.3 as the faulty code was.

Thanks to the PostGIS project, Regina Obe and Paul Ramsey for the
report (via IRC) and assistance in analysis. Patch by me.
2018-09-12 19:31:06 +01:00
Tom Lane fedc97cdfd Remove ruleutils.c's special case for BIT [VARYING] literals.
Up to now, get_const_expr() insisted on prefixing BIT and VARBIT
literals with 'B'.  That's not really necessary, because we always
append explicit-cast syntax to identify the constant's type.
Moreover, it's subtly wrong for VARBIT, because the parser will
interpret B'...' as '...'::"bit"; see make_const() which explicitly
assigns type BITOID for a T_BitString literal.  So what had been
a simple VARBIT literal is reconstructed as ('...'::"bit")::varbit,
which is not the same thing, at least not before constant folding.
This results in odd differences after dump/restore, as complained
of by the patch submitter, and it could result in actual failures in
partitioning or inheritance DDL operations (see commit 542320c2b,
which repaired similar misbehaviors for some other data types).

Fixing it is pretty easy: just remove the special case and let the
default code path handle these types.  We could have kept the special
case for BIT only, but there seems little point in that.

Like the previous patch, I judge that back-patching this into stable
branches wouldn't be a good idea.  However, it seems not quite too
late for v11, so let's fix it there.

Paul Guo, reviewed by Davy Machado and John Naylor, minor adjustments
by me

Discussion: https://postgr.es/m/CABQrizdTra=2JEqA6+Ms1D1k1Kqw+aiBBhC9TreuZRX2JzxLAA@mail.gmail.com
2018-09-11 16:32:25 -04:00
Andrew Gierth 500d49794f Repair double-free in SP-GIST rescan (bug #15378)
spgrescan would first reset traversalCxt, and then traverse a
potentially non-empty stack containing pointers to traversalValues
which had been allocated in those contexts, freeing them a second
time. This bug originates in commit ccd6eb49a where traversalValue was
introduced.

Repair by traversing the stack before the context reset; this isn't
ideal, since it means doing retail pfree in a context that's about to
be reset, but the freeing of a stack entry is also done in other
places in the code during the scan so it's not worth trying to
refactor it further. Regression test added.

Backpatch to 9.6 where the problem was introduced.

Per bug #15378; analysis and patch by me, originally from a report on
IRC by user velix; see also PostGIS ticket #4174; review by Alexander
Korotkov.

Discussion: https://postgr.es/m/153663176628.23136.11901365223750051490@wrigleys.postgresql.org
2018-09-11 18:14:19 +01:00
Alexander Korotkov cf98467242 Improve behavior of to_timestamp()/to_date() functions
to_timestamp()/to_date() functions were introduced mainly for Oracle
compatibility, and became very popular among PostgreSQL users.  However, some
behavior of to_timestamp()/to_date() functions are both incompatible with Oracle
and confusing for our users.  This behavior is related to handling of spaces and
separators in non FX (fixed format) mode.  This commit reworks this behavior
making less confusing, better documented and more compatible with Oracle.

Nevertheless, there are still following incompatibilities with Oracle.
1) We don't insist that there are no format string patterns unmatched to
   input string.
2) In FX mode we don't insist space and separators in format string to exactly
   match input string.
3) When format string patterns are divided by mix of spaces and separators, we
   don't distinguish them, while Oracle takes into account only last group of
   spaces/separators.

Discussion: https://postgr.es/m/1873520224.1784572.1465833145330.JavaMail.yahoo%40mail.yahoo.com
Author: Artur Zakirov, Alexander Korotkov, Liudmila Mantrova
Review: Amul Sul, Robert Haas, Tom Lane, Dmitry Dolgov, David G. Johnston
2018-09-09 21:19:51 +03:00
Alexander Korotkov 5f08accdad Fix past pd_upper write in ginRedoRecompress()
ginRedoRecompress() replays actions over compressed segments of posting list
in-place.  However, it might lead to write past pg_upper, because intermediate
state during playing the changes can take more space than both original state
and final state.  This commit fixes that by refuse from in-place modification.
Instead page tail is copied once modification is started, and then it's used
as the source of original segments.  Backpatch to 9.4 where posting list
compression was introduced.

Reported-by: Sivasubramanian Ramasubramanian
Discussion: https://postgr.es/m/1536091151804.6588%40amazon.com
Author: Alexander Korotkov based on patch from and ideas by Sivasubramanian Ramasubramanian
Review: Sivasubramanian Ramasubramanian
Backpatch-through: 9.4
2018-09-09 21:19:29 +03:00
Tom Lane ff47d4bf1f Work around stdbool problem in dfmgr.c.
Commit 842cb9fa6 refactored things so that dfmgr.c includes <dlfcn.h>,
which before that had only been directly included in platform-specific
stub files.  It turns out that on macOS, <dlfcn.h> includes <stdbool.h>,
and that causes problems on platforms where _Bool is not char-sized ...
which happens to include the PPC versions of macOS.  Work around it
much as we have in plperl.h, by #undef'ing bool after including the
problematic file, but only if we're not using stdbool-style booleans.

Discussion: https://postgr.es/m/E1fxqjl-0003YS-NS@gemulon.postgresql.org
2018-09-09 12:41:27 -04:00
Tom Lane ed0cdf0e05 Install a check for mis-linking of src/port and src/common functions.
On ELF-based platforms (and maybe others?) it's possible for a shared
library, when dynamically loaded into the backend, to call the backend
versions of src/port and src/common functions rather than the frontend
versions that are actually linked into the shlib.  This is definitely
not what we want, because the frontend versions often behave slightly
differently.  Up to now it's been "slight" enough that nobody noticed;
but with the addition of SCRAM support functions in src/common, we're
observing crashes due to the difference between palloc and malloc
memory allocation rules, as reported in bug #15367 from Jeremy Evans.

The purpose of this patch is to create a direct test for this type of
mis-linking, so that we know whether any given platform requires extra
measures to prevent using the wrong functions.  If the test fails, it
will lead to connection failures in the contrib/postgres_fdw regression
test.  At the moment, *BSD platforms using ELF format are known to have
the problem and can be expected to fail; but we need to know whether
anything else does, and we need a reliable ongoing check for future
platforms.

Actually fixing the problem will be the subject of later commit(s).

Discussion: https://postgr.es/m/153626613985.23143.4743626885618266803@wrigleys.postgresql.org
2018-09-09 12:23:23 -04:00
Tom Lane f47f314801 Minor cleanup/future-proofing for pg_saslprep().
Ensure that pg_saslprep() initializes its output argument to NULL in
all failure paths, and then remove the redundant initialization that
some (not all) of its callers did.  This does not fix any live bug,
but it reduces the odds of future bugs of omission.

Also add a comment about why the existing failure-path coding is
adequate.

Back-patch so as to keep the function's API consistent across branches,
again to forestall future bug introduction.

Patch by me, reviewed by Michael Paquier

Discussion: https://postgr.es/m/16558.1536407783@sss.pgh.pa.us
2018-09-08 18:20:36 -04:00
Michael Paquier 9226a3b89b Remove duplicated words split across lines in comments
This has been detected using some interesting tricks with sed, and the
method used is mentioned in details in the discussion below.

Author: Justin Pryzby
Discussion: https://postgr.es/m/20180908013109.GB15350@telsasoft.com
2018-09-08 12:24:19 -07:00
Tom Lane 361844fe56 Save/restore SPI's global variables in SPI_connect() and SPI_finish().
This patch removes two sources of interference between nominally
independent functions when one SPI-using function calls another,
perhaps without knowing that it does so.

Chapman Flack pointed out that xml.c's query_to_xml_internal() expects
SPI_tuptable and SPI_processed to stay valid across datatype output
function calls; but it's possible that such a call could involve
re-entrant use of SPI.  It seems likely that there are similar hazards
elsewhere, if not in the core code then in third-party SPI users.
Previously SPI_finish() reset SPI's API globals to zeroes/nulls, which
would typically make for a crash in such a situation.  Restoring them
to the values they had at SPI_connect() seems like a considerably more
useful behavior, and it still meets the design goal of not leaving any
dangling pointers to tuple tables of the function being exited.

Also, cause SPI_connect() to reset these variables to zeroes/nulls after
saving them.  This prevents interference in the opposite direction: it's
possible that a SPI-using function that's only ever been tested standalone
contains assumptions that these variables start out as zeroes.  That was
the case as long as you were the outermost SPI user, but not so much for
an inner user.  Now it's consistent.

Report and fix suggestion by Chapman Flack, actual patch by me.
Back-patch to all supported branches.

Discussion: https://postgr.es/m/9fa25bef-2e4f-1c32-22a4-3ad0723c4a17@anastigmatix.net
2018-09-07 20:09:57 -04:00
Tom Lane f510412df3 Limit depth of forced recursion for CLOBBER_CACHE_RECURSIVELY.
It's somewhat surprising that we got away with this before.  (Actually,
since nobody tests this routinely AFAIK, it might've been broken for
awhile.  But it's definitely broken in the wake of commit f868a8143.)
It seems sufficient to limit the forced recursion to a small number
of levels.

Back-patch to all supported branches, like the preceding patch.

Discussion: https://postgr.es/m/12259.1532117714@sss.pgh.pa.us
2018-09-07 18:13:29 -04:00
Tom Lane f868a8143a Fix longstanding recursion hazard in sinval message processing.
LockRelationOid and sibling routines supposed that, if our session already
holds the lock they were asked to acquire, they could skip calling
AcceptInvalidationMessages on the grounds that we must have already read
any remote sinval messages issued against the relation being locked.
This is normally true, but there's a critical special case where it's not:
processing inside AcceptInvalidationMessages might attempt to access system
relations, resulting in a recursive call to acquire a relation lock.

Hence, if the outer call had acquired that same system catalog lock, we'd
fall through, despite the possibility that there's an as-yet-unread sinval
message for that system catalog.  This could, for example, result in
failure to access a system catalog or index that had just been processed
by VACUUM FULL.  This is the explanation for buildfarm failures we've been
seeing intermittently for the past three months.  The bug is far older
than that, but commits a54e1f158 et al added a new recursion case within
AcceptInvalidationMessages that is apparently easier to hit than any
previous case.

To fix this, we must not skip calling AcceptInvalidationMessages until
we have *finished* a call to it since acquiring a relation lock, not
merely acquired the lock.  (There's already adequate logic inside
AcceptInvalidationMessages to deal with being called recursively.)
Fortunately, we can implement that at trivial cost, by adding a flag
to LOCALLOCK hashtable entries that tracks whether we know we have
completed such a call.

There is an API hazard added by this patch for external callers of
LockAcquire: if anything is testing for LOCKACQUIRE_ALREADY_HELD,
it might be fooled by the new return code LOCKACQUIRE_ALREADY_CLEAR
into thinking the lock wasn't already held.  This should be a fail-soft
condition, though, unless something very bizarre is being done in
response to the test.

Also, I added an additional output argument to LockAcquireExtended,
assuming that that probably isn't called by any outside code given
the very limited usefulness of its additional functionality.

Back-patch to all supported branches.

Discussion: https://postgr.es/m/12259.1532117714@sss.pgh.pa.us
2018-09-07 18:04:54 -04:00
Michael Paquier 8582b4d044 Improve handling of corrupted two-phase state files at recovery
When a corrupted two-phase state file is found by WAL replay, be it for
crash recovery or archive recovery, then the file is simply skipped and
a WARNING is logged to the user, causing the transaction to be silently
lost.  Facing an on-disk WAL file which is corrupted is as likely to
happen as what is stored in WAL records, but WAL records are already
able to fail hard if there is a CRC mismatch.  On-disk two-phase state
files, on the contrary, are simply ignored if corrupted.  Note that when
restoring the initial two-phase data state at recovery, files newer than
the horizon XID are discarded hence no files present in pg_twophase/
should be torned and have been made durable by a previous checkpoint, so
recovery should never see any corrupted two-phase state file by design.

The situation got better since 978b2f6 which has added two-phase state
information directly in WAL instead of using on-disk files, so the risk
is limited to two-phase transactions which live across at least one
checkpoint for long periods.  Backups having legit two-phase state files
on-disk could also lose silently transactions when restored if things
get corrupted.

This behavior exists since two-phase commit has been introduced, no
back-patch is done for now per the lack of complaints about this
problem.

Author: Michael Paquier
Discussion: https://postgr.es/m/20180709050309.GM1467@paquier.xyz
2018-09-07 11:00:16 -07:00
Peter Eisentraut 98afa68d93 Use C99 designated initializers for some structs
These are just a few particularly egregious cases that were hard to read
and write, and error prone because of many similar adjacent types.

Discussion: https://www.postgresql.org/message-id/flat/4c9f01be-9245-2148-b569-61a8562ef190%402ndquadrant.com
2018-09-07 11:40:03 +02:00
Peter Eisentraut 842cb9fa62 Refactor dlopen() support
Nowadays, all platforms except Windows and older HP-UX have standard
dlopen() support.  So having a separate implementation per platform
under src/backend/port/dynloader/ is a bit excessive.  Instead, treat
dlopen() like other library functions that happen to be missing
sometimes and put a replacement implementation under src/port/.

Discussion: https://www.postgresql.org/message-id/flat/e11a49cb-570a-60b7-707d-7084c8de0e61%402ndquadrant.com#54e735ae37476a121abb4e33c2549b03
2018-09-06 11:33:04 +02:00
Tom Lane 54b01b9293 Remove no-longer-used variable.
Oversight in 2fbdf1b38.  Per buildfarm.
2018-09-05 14:29:58 -04:00
Alvaro Herrera 2fbdf1b38b Simplify partitioned table creation vs. relcache
In the original code, we were storing the pg_inherits row for a
partitioned table too early: enough that we had a hack for relcache to
avoid falling flat on its face while reading such a partial entry.  If
we finish the pg_class creation first and *then* store the pg_inherits
entry, we don't need that hack.

Also recognize that pg_class.relpartbound is not marked NOT NULL and
therefore it's entirely possible to read null values, so having only
Assert() protection isn't enough.  Change those to if/elog tests
instead.  This qualifies as a robustness fix, so backpatch to pg11.

In passing, remove one access that wasn't actually needed, and reword
one message to be like all the others that check for the same thing.

Reviewed-by: Amit Langote
Discussion: https://postgr.es/m/20180903213916.hh6wasnrdg6xv2ud@alvherre.pgsql
2018-09-05 14:36:13 -03:00
Michael Paquier d6e98ebe37 Improve some error message strings and errcodes
This makes a bit less work for translators, by unifying error strings a
bit more with what the rest of the code does, this time for three error
strings in autoprewarm and one in base backup code.

After some code review of slot.c, some file-access errcodes are reported
but lead to an incorrect internal error, while corrupted data makes the
most sense, similarly to the previous work done in e41d0a1.  Also,
after calling rmtree(), a WARNING gets reported, which is a duplicate of
what the internal call report, so make the code more consistent with all
other code paths calling this function.

Author: Michael Paquier
Discussion: https://postgr.es/m/20180902200747.GC1343@paquier.xyz
2018-09-04 11:06:04 -07:00
Tom Lane 17b7c302b5 Fully enforce uniqueness of constraint names.
It's been true for a long time that we expect names of table and domain
constraints to be unique among the constraints of that table or domain.
However, the enforcement of that has been pretty haphazard, and it missed
some corner cases such as creating a CHECK constraint and then an index
constraint of the same name (as per recent report from André Hänsel).
Also, due to the lack of an actual unique index enforcing this, duplicates
could be created through race conditions.

Moreover, the code that searches pg_constraint has been quite inconsistent
about how to handle duplicate names if one did occur: some places checked
and threw errors if there was more than one match, while others just
processed the first match they came to.

To fix, create a unique index on (conrelid, contypid, conname).  Since
either conrelid or contypid is zero, this will separately enforce
uniqueness of constraint names among constraints of any one table and any
one domain.  (If we ever implement SQL assertions, and put them into this
catalog, more thought might be needed.  But it'd be at least as reasonable
to put them into a new catalog; having overloaded this one catalog with
two kinds of constraints was a mistake already IMO.)  This index can replace
the existing non-unique index on conrelid, though we need to keep the one
on contypid for query performance reasons.

Having done that, we can simplify the logic in various places that either
coped with duplicates or neglected to, as well as potentially improve
lookup performance when searching for a constraint by name.

Also, as per our usual practice, install a preliminary check so that you
get something more friendly than a unique-index violation report in the
case complained of by André.  And teach ChooseIndexName to avoid choosing
autogenerated names that would draw such a failure.

While it's not possible to make such a change in the back branches,
it doesn't seem quite too late to put this into v11, so do so.

Discussion: https://postgr.es/m/0c1001d4428f$0942b430$1bc81c90$@webkr.de
2018-09-04 13:45:35 -04:00
Amit Kapila 14e9b2a752 Prohibit pushing subqueries containing window function calculation to
workers.

Allowing window function calculation in workers leads to inconsistent
results because if the input row ordering is not fully deterministic, the
output of window functions might vary across workers.  The fix is to treat
them as parallel-restricted.

In the passing, improve the coding pattern in max_parallel_hazard_walker
so that it has a chain of mutually-exclusive if ... else if ... else if
... else if ... IsA tests.

Reported-by: Marko Tiikkaja
Bug: 15324
Author: Amit Kapila
Reviewed-by: Tom Lane
Backpatch-through: 9.6
Discussion: https://postgr.es/m/CAL9smLAnfPJCDUUG4ckX2iznj53V7VSMsYefzZieN93YxTNOcw@mail.gmail.com
2018-09-04 10:28:08 +05:30
Amit Kapila 7c9e19ca9a During the split, set checksum on an empty hash index page.
On a split, we allocate a new splitpoint's worth of bucket pages wherein
we initialize the last page with zeros which is fine, but we forgot to set
the checksum for that last page.

We decided to back-patch this fix till 10 because we don't have an easy
way to test it in prior versions.  Another reason is that the hash-index
code is changed heavily in 10, so it is not advisable to push the fix
without testing it in prior versions.

Author: Amit Kapila
Reviewed-by: Yugo Nagata
Backpatch-through: 10
Discussion: https://postgr.es/m/5d03686d-727c-dbf8-0064-bf8b97ffe850@2ndquadrant.com
2018-09-04 08:35:42 +05:30
Alvaro Herrera c076f3d74a Remove pg_constraint.conincluding
This column was added in commit 8224de4f42 ("Indexes with INCLUDE
columns and their support in B-tree") to ease writing the ruleutils.c
supporting code for that feature, but it turns out to be unnecessary --
we can do the same thing with just one more syscache lookup.

Even the documentation for the new column being removed in this commit
is awkward.

Discussion: https://postgr.es/m/20180902165018.33otxftp3olgtu4t@alvherre.pgsql
2018-09-03 12:59:26 -03:00
Tomas Vondra 4ddd8f5f55 Fix memory leak in TRUNCATE decoding
When decoding a TRUNCATE record, the relids array was being allocated in
the main ReorderBuffer memory context, but not released with the change
resulting in a memory leak.

The array was also ignored when serializing/deserializing the change,
assuming all the information is stored in the change itself.  So when
spilling the change to disk, we've only we have serialized only the
pointer to the relids array.  Thanks to never releasing the array,
the pointer however remained valid even after loading the change back
to memory, preventing an actual crash.

This fixes both the memory leak and (de)serialization.  The relids array
is still allocated in the main ReorderBuffer memory context (none of the
existing ones seems like a good match, and adding an extra context seems
like an overkill).  The allocation is wrapped in a new ReorderBuffer API
functions, to keep the details within reorderbuffer.c, just like the
other ReorderBufferGet methods do.

Author: Tomas Vondra
Discussion: https://www.postgresql.org/message-id/flat/66175a41-9342-2845-652f-1bd4c3ee50aa%402ndquadrant.com
Backpatch: 11, where decoding of TRUNCATE was introduced
2018-09-03 02:10:24 +02:00
Michael Paquier caa0c6ceba Fix initial sync of slot parent directory when restoring status
At the beginning of recovery, information from replication slots is
recovered from disk to memory.  In order to ensure the durability of the
information, the status file as well as its parent directory are
synced.  It happens that the sync on the parent directory was done
directly using the status file path, which is logically incorrect, and
the current code has been doing a sync on the same object twice in a
row.

Reported-by: Konstantin Knizhnik
Diagnosed-by: Konstantin Knizhnik
Author: Michael Paquier
Discussion: https://postgr.es/m/9eb1a6d5-b66f-2640-598d-c5ea46b8f68a@postgrespro.ru
Backpatch-through: 9.4-
2018-09-02 12:40:30 -07:00
Tom Lane 44cac93464 Avoid using potentially-under-aligned page buffers.
There's a project policy against using plain "char buf[BLCKSZ]" local
or static variables as page buffers; preferred style is to palloc or
malloc each buffer to ensure it is MAXALIGN'd.  However, that policy's
been ignored in an increasing number of places.  We've apparently got
away with it so far, probably because (a) relatively few people use
platforms on which misalignment causes core dumps and/or (b) the
variables chance to be sufficiently aligned anyway.  But this is not
something to rely on.  Moreover, even if we don't get a core dump,
we might be paying a lot of cycles for misaligned accesses.

To fix, invent new union types PGAlignedBlock and PGAlignedXLogBlock
that the compiler must allocate with sufficient alignment, and use
those in place of plain char arrays.

I used these types even for variables where there's no risk of a
misaligned access, since ensuring proper alignment should make
kernel data transfers faster.  I also changed some places where
we had been palloc'ing short-lived buffers, for coding style
uniformity and to save palloc/pfree overhead.

Since this seems to be a live portability hazard (despite the lack
of field reports), back-patch to all supported versions.

Patch by me; thanks to Michael Paquier for review.

Discussion: https://postgr.es/m/1535618100.1286.3.camel@credativ.de
2018-09-01 15:27:17 -04:00
Alexander Korotkov ec74369931 Implement "pg_ctl logrotate" command
Currently there are two ways to trigger log rotation in logging collector
process: call pg_rotate_logfile() SQL-function or send SIGUSR1 signal directly
to logging collector process.  However, it's nice to have more suitable way
for external tools to do that, which wouldn't require SQL connection or
knowledge of logging collector pid.  This commit implements triggering log
rotation by "pg_ctl logrotate" command.

Discussion: https://postgr.es/m/20180416.115435.28153375.horiguchi.kyotaro%40lab.ntt.co.jp
Author: Kyotaro Horiguchi, Alexander Kuzmenkov, Alexander Korotkov
2018-09-01 19:46:49 +03:00
Noah Misch ab0ed6153a Ignore server-side delays when enforcing wal_sender_timeout.
Healthy clients of servers having poor I/O performance, such as
buildfarm members hamster and tern, saw unexpected timeouts.  That
disagreed with documentation.  This fix adds one gettimeofday() call
whenever ProcessRepliesIfAny() finds no client reply messages.
Back-patch to 9.4; the bug's symptom is rare and mild, and the code all
moved between 9.3 and 9.4.

Discussion: https://postgr.es/m/20180826034600.GA1105084@rfd.leadboat.com
2018-08-31 22:59:58 -07:00
Michael Paquier c186ba135e Ensure correct minimum consistent point on standbys
Startup process has improved its calculation of incorrect minimum
consistent point in 8d68ee6, which ensures that all WAL available gets
replayed when doing crash recovery, and has introduced an incorrect
calculation of the minimum recovery point for non-startup processes,
which can cause incorrect page references on a standby when for example
the background writer flushed a couple of pages on-disk but was not
updating the control file to let a subsequent crash recovery replay to
where it should have.

The only case where this has been reported to be a problem is when a
standby needs to calculate the latest removed xid when replaying a btree
deletion record, so one would need connections on a standby that happen
just after recovery has thought it reached a consistent point.  Using a
background worker which is started after the consistent point is reached
would be the easiest way to get into problems if it connects to a
database.  Having clients which attempt to connect periodically could
also be a problem, but the odds of seeing this problem are much lower.

The fix used is pretty simple, as the idea is to give access to the
minimum recovery point written in the control file to non-startup
processes so as they use a reference, while the startup process still
initializes its own references of the minimum consistent point so as the
original problem with incorrect page references happening post-promotion
with a crash do not show up.

Reported-by: Alexander Kukushkin
Diagnosed-by: Alexander Kukushkin
Author: Michael Paquier
Reviewed-by: Kyotaro Horiguchi, Alexander Kukushkin
Discussion: https://postgr.es/m/153492341830.1368.3936905691758473953@wrigleys.postgresql.org
Backpatch-through: 9.3
2018-08-31 11:03:40 -07:00
Etsuro Fujita 7cfdc77023 Disable support for partitionwise joins in problematic cases.
Commit f49842d, which added support for partitionwise joins, built the
child's tlist by applying adjust_appendrel_attrs() to the parent's.  So in
the case where the parent's included a whole-row Var for the parent, the
child's contained a ConvertRowtypeExpr.  To cope with that, that commit
added code to the planner, such as setrefs.c, but some code paths still
assumed that the tlist for a scan (or join) rel would only include Vars
and PlaceHolderVars, which was true before that commit, causing errors:

* When creating an explicit sort node for an input path for a mergejoin
  path for a child join, prepare_sort_from_pathkeys() threw the 'could not
  find pathkey item to sort' error.
* When deparsing a relation participating in a pushed down child join as a
  subquery in contrib/postgres_fdw, get_relation_column_alias_ids() threw
  the 'unexpected expression in subquery output' error.
* When performing set_plan_references() on a local join plan generated by
  contrib/postgres_fdw for EvalPlanQual support for a pushed down child
  join, fix_join_expr() threw the 'variable not found in subplan target
  lists' error.

To fix these, two approaches have been proposed: one by Ashutosh Bapat and
one by me.  While the former keeps building the child's tlist with a
ConvertRowtypeExpr, the latter builds it with a whole-row Var for the
child not to violate the planner assumption, and tries to fix it up later,
But both approaches need more work, so refuse to generate partitionwise
join paths when whole-row Vars are involved, instead.  We don't need to
handle ConvertRowtypeExprs in the child's tlists for now, so this commit
also removes the changes to the planner.

Previously, partitionwise join computed attr_needed data for each child
separately, and built the child join's tlist using that data, which also
required an extra step for adding PlaceHolderVars to that tlist, but it
would be more efficient to build it from the parent join's tlist through
the adjust_appendrel_attrs() transformation.  So this commit builds that
list that way, and simplifies build_joinrel_tlist() and placeholder.c as
well as part of set_append_rel_size() to basically what they were before
partitionwise join went in.

Back-patch to PG11 where partitionwise join was introduced.

Report by Rajkumar Raghuwanshi.  Analysis by Ashutosh Bapat, who also
provided some of regression tests.  Patch by me, reviewed by Robert Haas.

Discussion: https://postgr.es/m/CAKcux6ktu-8tefLWtQuuZBYFaZA83vUzuRd7c1YHC-yEWyYFpg@mail.gmail.com
2018-08-31 20:34:06 +09:00
Etsuro Fujita 2e39f69b66 Remove extra word from src/backend/optimizer/README 2018-08-31 16:40:17 +09:00