Commit Graph

31610 Commits

Author SHA1 Message Date
Robert Haas b022923556 Revise API for partition_rbound_cmp/partition_rbound_datum_cmp.
Instead of passing the PartitionKey, pass just the required bits of
it.  This allows these functions to be used without needing the
PartitionKey to be available, which is important for several
pending patches.

Ashutosh Bapat, reviewed by Amit Langote, with a comment tweak
by me.

Discussion: http://postgr.es/m/3d835ed1-36ab-f06d-0ce8-a76a2bbf7677@lab.ntt.co.jp
Discussion: http://postgr.es/m/b4d88995-094b-320c-b614-2282fae0bf6c@lab.ntt.co.jp
2018-02-23 08:43:52 -05:00
Peter Eisentraut 76b6aa41f4 Support parameters in CALL
To support parameters in CALL, move the parse analysis of the procedure
and arguments into the global transformation phase, so that the parser
hooks can be applied.  And then at execution time pass the parameters
from ProcessUtility on to ExecuteCallStmt.
2018-02-22 21:36:48 -05:00
Robert Haas a6a80134e3 Remove extra words.
Thomas Munro

Discussion: http://postgr.es/m/CAEepm=2x3NUSPed6=-wDYs39KtUU5Dw3mK_NAMWps+18FmkApQ@mail.gmail.com
2018-02-22 18:06:30 -05:00
Peter Eisentraut abcba7001e Fix perlcritic warnings 2018-02-22 15:13:57 -05:00
Peter Eisentraut 10cfce34c0 Add user-callable SHA-2 functions
Add the user-callable functions sha224, sha256, sha384, sha512.  We
already had these in the C code to support SCRAM, but there was no test
coverage outside of the SCRAM tests.  Adding these as user-callable
functions allows writing some tests.  Also, we have a user-callable md5
function but no more modern alternative, which led to wide use of md5 as
a general-purpose hash function, which leads to occasional complaints
about using md5.

Also mark the existing md5 functions as leak-proof.

Reviewed-by: Michael Paquier <michael@paquier.xyz>
2018-02-22 11:34:53 -05:00
Robert Haas edd44738bc Be lazier about partition tuple routing.
It's not necessary to fully initialize the executor data structures
for partitions to which no tuples are ever routed.  Consider, for
example, an INSERT statement that inserts only one row: it only cares
about the partition to which that one row is routed.  The new function
ExecInitPartitionInfo performs the initialization in question only
when a particular partition is about to receive a tuple. This includes
creating, validating, and saving a pointer to the ResultRelInfo,
setting up for speculative insertions, translating WCOs and
initializing the resulting expressions, translating returning lists
and building the appropriate projection information, and setting up a
tuple conversion map.

One thing that's not deferred is locking the child partitions; that
seems desirable but would need more thought.  Still, testing shows
that this makes single-row inserts significantly faster on a table
with many partitions without harming the bulk-insert case.

Amit Langote, reviewed by Etsuro Fujita, with a few changes by me

Discussion: http://postgr.es/m/8975331d-d961-cbdd-f862-fdd3d97dc2d0@lab.ntt.co.jp
2018-02-22 10:55:54 -05:00
Robert Haas 810e7e264a Remove extra word from comment.
Etsuro Fujita

Discussion: http://postgr.es/m/5A8EAF74.5010905@lab.ntt.co.jp
2018-02-22 10:08:03 -05:00
Robert Haas de6428afe1 Avoid another valgrind complaint about write() of uninitalized bytes.
Peter Geoghegan, per buildfarm member skink and Andres Freund

Discussion: http://postgr.es/m/20180221053426.gp72lw67yfpzkw7a@alap3.anarazel.de
2018-02-22 09:28:12 -05:00
Robert Haas 9a5c4f58f3 Try to stabilize EXPLAIN output in partition_check test.
Commit 7d8ac9814b adjusted these
tests in the hope of preserving the plan shape, but I failed to
notice that the three partitions were, on my local machine, choosing
two different plan shapes.  This is probably related to the fact
that all three tables have exactly the same row count.  Try to
improve the situation by making pht1_e about half as large as
the other two.

Per Tom Lane and the buildfarm.

Discussion: http://postgr.es/m/25380.1519277713@sss.pgh.pa.us
2018-02-22 08:51:00 -05:00
Robert Haas 7d8ac9814b Charge cpu_tuple_cost * 0.5 for Append and MergeAppend nodes.
Previously, Append didn't charge anything at all, and MergeAppend
charged only cpu_operator_cost, about half the value used here.  This
change might make MergeAppend plans slightly more likely to be chosen
than before, since this commit increases the assumed cost for Append
-- with default values -- by 0.005 per tuple but MergeAppend by only
0.0025 per tuple.  Since the comparisons required by MergeAppend are
costed separately, it's not clear why MergeAppend needs to be
otherwise more expensive than Append, so hopefully this is OK.

Prior to partition-wise join, it didn't really matter whether or not
an Append node had any cost of its own, because every plan had to use
the same number of Append or MergeAppend nodes and in the same places.
Only the relative cost of Append vs. MergeAppend made a difference.
Now, however, it is possible to avoid some of the Append nodes using a
partition-wise join, so it's worth making an effort.  Pending patches
for partition-wise aggregate care too, because an Append of Aggregate
nodes will incur the Append overhead fewer times than an Aggregate
over an Append.  Although in most cases this change will favor the use
of partition-wise techniques, it does the opposite when the join
cardinality is greater than the sum of the input cardinalities.  Since
this situation arises in an existing regression test, I [rhaas]
adjusted it to keep the overall plan shape approximately the same.

Jeevan Chalke, per a suggestion from David Rowley.  Reviewed by
Ashutosh Bapat.  Some changes by me.  The larger patch series of which
this patch is a part was also reviewed and tested by Antonin Houska,
Rajkumar Raghuwanshi, David Rowley, Dilip Kumar, Konstantin Knizhnik,
Pascal Legrand, Rafia Sabih, and me.

Discussion: http://postgr.es/m/CAKJS1f9UXdk6ZYyqbJnjFO9a9hyHKGW7B=ZRh-rxy9qxfPA5Gw@mail.gmail.com
2018-02-21 23:09:27 -05:00
Tom Lane 38b41f182a Repair pg_upgrade's failure to preserve relfrozenxid for matviews.
This oversight led to data corruption in matviews, manifesting as
"could not access status of transaction" before our most recent releases,
and "found xmin from before relfrozenxid" errors since then.

The proximate cause of the problem seems to have been confusion between
the task of preserving dropped-column status and the task of preserving
frozenxid status.  Those are required for distinct sets of relkinds,
and the reasoning was entirely undocumented in the source code.  In hopes
of forestalling future errors of the same kind, try to improve the
commentary in this area.

In passing, also improve the remarkably unhelpful comments around
pg_upgrade's set_frozenxids().  That's not actually buggy AFAICS,
but good luck figuring out what it does from the old comments.

Per report from Claudio Freire.  It appears that bug #14852 from Alexey
Ermakov is an earlier report of the same issue, and there may be other
cases that we failed to identify at the time.

Patch by me based on analysis by Andres Freund.  The bug dates back
to the introduction of matviews, so back-patch to all supported branches.

Discussion: https://postgr.es/m/CAGTBQpbrY9CdRGGhyBZ9yqY4jWaGC85rUF4X+R7d-aim=mBNsw@mail.gmail.com
Discussion: https://postgr.es/m/20171013115320.28049.86457@wrigleys.postgresql.org
2018-02-21 18:40:24 -05:00
Andres Freund 4c0ec9ee28 Use platform independent type for TupleTableSlot->tts_off.
Previously tts_off was, for unknown reasons, of type long. For one
that's unnecessary as tuples are restricted in length, for another
long would be a bad choice of type even if that weren't the case, as
it's not reliably wider than an int. Also HeapTupleHeader->t_len is a
uint32.

This is split off from a larger patch implementing JITed tuple
deforming. Seems like an independent improvement, as tiny as it is.

Author: Andres Freund
2018-02-20 15:12:52 -08:00
Peter Eisentraut c2ff42c6c1 Error message improvement 2018-02-20 17:58:27 -05:00
Tom Lane 3486bcf9e8 Fix pg_dump's logic for eliding sequence limits that match the defaults.
The previous coding here applied atoi() to strings that could represent
values too large to fit in an int.  If the overflowed value happened to
match one of the cases it was looking for, it would drop that limit
value from the output, leading to incorrect restoration of the sequence.

Avoid the unsafe behavior, and also make the logic cleaner by explicitly
calculating the default min/max values for the appropriate kind of
sequence.

Reported and patched by Alexey Bashtanov, though I whacked his patch
around a bit.  Back-patch to v10 where the faulty logic was added.

Discussion: https://postgr.es/m/cb85a9a5-946b-c7c4-9cf2-6cd6e25d7a33@imap.cc
2018-02-20 11:23:42 -05:00
Magnus Hagander 9a44a26b65 Fix typo
Author: Masahiko Sawada
2018-02-20 12:03:18 +01:00
Alvaro Herrera 6f1d723b63 Fix crash in pg_replication_slot_advance
We were trying to use a LSN variable after releasing its containing slot
structure.

Reported by: tushar
Author: amul sul
Reviewed-by: Petr Jelinek, Masahiko Sawada
Discussion: https://postgr.es/m/94ba999c-f76a-0423-6523-b8d531dfe4c7@enterprisedb.com
2018-02-19 22:25:27 -03:00
Tom Lane 159efe4af4 Fix misbehavior of CTE-used-in-a-subplan during EPQ rechecks.
An updating query that reads a CTE within an InitPlan or SubPlan could get
incorrect results if it updates rows that are concurrently being modified.
This is caused by CteScanNext supposing that nothing inside its recursive
ExecProcNode call could change which read pointer is selected in the CTE's
shared tuplestore.  While that's normally true because of scoping
considerations, it can break down if an EPQ plan tree gets built during the
call, because EvalPlanQualStart builds execution trees for all subplans
whether they're going to be used during the recheck or not.  And it seems
like a pretty shaky assumption anyway, so let's just reselect our own read
pointer here.

Per bug #14870 from Andrei Gorita.  This has been broken since CTEs were
implemented, so back-patch to all supported branches.

Discussion: https://postgr.es/m/20171024155358.1471.82377@wrigleys.postgresql.org
2018-02-19 16:00:31 -05:00
Alvaro Herrera 4108a28d3a Fix expected output 2018-02-19 17:56:43 -03:00
Alvaro Herrera eb7ed3f306 Allow UNIQUE indexes on partitioned tables
If we restrict unique constraints on partitioned tables so that they
must always include the partition key, then our standard approach to
unique indexes already works --- each unique key is forced to exist
within a single partition, so enforcing the unique restriction in each
index individually is enough to have it enforced globally.  Therefore we
can implement unique indexes on partitions by simply removing a few
restrictions (and adding others.)

Discussion: https://postgr.es/m/20171222212921.hi6hg6pem2w2t36z@alvherre.pgsql
Discussion: https://postgr.es/m/20171229230607.3iib6b62fn3uaf47@alvherre.pgsql
Reviewed-by: Simon Riggs, Jesper Pedersen, Peter Eisentraut, Jaime
	Casanova, Amit Langote
2018-02-19 17:40:00 -03:00
Tom Lane 524d64ea8e Remove bogus "extern" annotations on function definitions.
While this is not illegal C, project style is to put "extern" only on
declarations not definitions.

David Rowley

Discussion: https://postgr.es/m/CAKJS1f9RKLWXcMBQhvDYhmsMEo+ALuNgA-NE+AX5Uoke9DJ2Xg@mail.gmail.com
2018-02-19 12:07:44 -05:00
Tom Lane 8c44802b6e Remove redundant initialization of a local variable.
In what was doubtless a typo, commit bf6c614a2 introduced a duplicate
initialization of a local variable.  This made Coverity unhappy, as well
as pretty much anybody reading the code.  We don't even have a real use
for the local variable, so just remove it.
2018-02-18 23:32:56 -05:00
Peter Eisentraut ebf6049ebe Fix StaticAssertExpr() under C++
The previous code didn't compile, because static_assert() must end with
a semicolon.  To fix, wrap it in a block, similar to the C code.
2018-02-18 22:28:11 -05:00
Peter Eisentraut 2e1d1ebdff Remove redundant function declaration 2018-02-18 22:28:11 -05:00
Peter Eisentraut 97a804cb2b Message style fix 2018-02-18 17:16:11 -05:00
Peter Eisentraut 1a1adb215c Move function comment to the right place 2018-02-17 20:45:28 -05:00
Peter Eisentraut 7923118c16 Minor comment fix 2018-02-17 20:45:02 -05:00
Alvaro Herrera a26116c6cb Refactor format_type APIs to be more modular
Introduce a new format_type_extended, with a flags bitmask argument that
can modify the default behavior.  A few compatibility and readability
wrappers remain:
	format_type_be
	format_type_be_qualified
	format_type_with_typemod
while format_type_with_typemod_qualified, which had a single caller, is
removed.

Author: Michael Paquier, some revisions by me
Discussion: 20180213035107.GA2915@paquier.xyz
2018-02-17 19:02:15 -03:00
Alvaro Herrera cef60043dd Mention trigger name in trigger test
This makes it more explicit exactly what is going on, for further
proposed behavior changes.

Discussion: https://postgr.es/m/20180214212624.hm7of76flesodamf@alvherre.pgsql
2018-02-17 13:18:34 -03:00
Andres Freund ad7dbee368 Allow tupleslots to have a fixed tupledesc, use in executor nodes.
The reason for doing so is that it will allow expression evaluation to
optimize based on the underlying tupledesc. In particular it will
allow to JIT tuple deforming together with the expression itself.

For that expression initialization needs to be moved after the
relevant slots are initialized - mostly unproblematic, except in the
case of nodeWorktablescan.c.

After doing so there's no need for ExecAssignResultType() and
ExecAssignResultTypeFromTL() anymore, as all former callers have been
converted to create a slot with a fixed descriptor.

When creating a slot with a fixed descriptor, tts_values/isnull can be
allocated together with the main slot, reducing allocation overhead
and increasing cache density a bit.

Author: Andres Freund
Discussion: https://postgr.es/m/20171206093717.vqdxe5icqttpxs3p@alap3.anarazel.de
2018-02-16 21:17:38 -08:00
Andres Freund bf6c614a2f Do execGrouping.c via expression eval machinery, take two.
This has a performance benefit on own, although not hugely so. The
primary benefit is that it will allow for to JIT tuple deforming and
comparator invocations.

Large parts of this were previously committed (773aec7aa), but the
commit contained an omission around cross-type comparisons and was
thus reverted.

Author: Andres Freund
Discussion: https://postgr.es/m/20171129080934.amqqkke2zjtekd4t@alap3.anarazel.de
2018-02-16 14:38:13 -08:00
Peter Eisentraut ad9a274778 Fix crash when canceling parallel query
elog(FATAL) would end up calling PortalCleanup(), which would call
executor shutdown code, which could fail and crash, especially under
parallel query.  This was introduced by
8561e4840c, which did not want to mark an
active portal as failed by a normal transaction abort anymore.  But we
do need to do that for an elog(FATAL) exit.  Introduce a variable
shmem_exit_inprogress similar to the existing proc_exit_inprogress, so
we can tell whether we are in the FATAL exit scenario.

Reported-by: Andres Freund <andres@anarazel.de>
2018-02-16 16:21:24 -05:00
Tom Lane 49bff412ed Remove some inappropriate #includes.
Other header files should never #include postgres.h (nor postgres_fe.h,
nor c.h), per project policy.  Also, there's no need for any backend .c
file to explicitly include elog.h or palloc.h, because postgres.h pulls
those in already.

Extracted from a larger patch by Kyotaro Horiguchi.  The rest of the
removals he suggests require more study, but these are no-brainers.

Discussion: https://postgr.es/m/20180215.200447.209320006.horiguchi.kyotaro@lab.ntt.co.jp
2018-02-16 12:14:08 -05:00
Peter Eisentraut 2fb1abaeb0 Rename enable_partition_wise_join to enable_partitionwise_join
Discussion: https://www.postgresql.org/message-id/flat/ad24e4f4-6481-066e-e3fb-6ef4a3121882%402ndquadrant.com
2018-02-16 10:33:59 -05:00
Magnus Hagander f8437c819a Fix typo in comment 2018-02-16 12:46:41 +01:00
Andres Freund 2a41507dab Revert "Do execGrouping.c via expression eval machinery."
This reverts commit 773aec7aa9.

There's an unresolved issue in the reverted commit: It only creates
one comparator function, but in for the nodeSubplan.c case we need
more (c.f. FindTupleHashEntry vs LookupTupleHashEntry calls in
nodeSubplan.c).

This isn't too difficult to fix, but it's not entirely trivial
either. The fact that the issue only causes breakage on 32bit systems
shows that the current test coverage isn't that great.  To avoid
turning half the buildfarm red till those two issues are addressed,
revert.
2018-02-15 22:39:18 -08:00
Andres Freund 773aec7aa9 Do execGrouping.c via expression eval machinery.
This has a performance benefit on own, although not hugely so. The
primary benefit is that it will allow for to JIT tuple deforming and
comparator invocations.

Author: Andres Freund
Discussion: https://postgr.es/m/20171129080934.amqqkke2zjtekd4t@alap3.anarazel.de
2018-02-15 21:55:31 -08:00
Tom Lane 51db0d18fb Fix plpgsql to enforce domain checks when returning a NULL domain value.
If a plpgsql function is declared to return a domain type, and the domain's
constraints forbid a null value, it was nonetheless possible to return
NULL, because we didn't bother to check the constraints for a null result.
I'd noticed this while fooling with domains-over-composite, but had not
gotten around to fixing it immediately.

Add a regression test script exercising this and various other domain
cases, largely borrowed from the plpython_types test.

Although this is clearly a bug fix, I'm not sure whether anyone would
thank us for changing the behavior in stable branches, so I'm inclined
not to back-patch.
2018-02-15 16:25:19 -05:00
Tom Lane 51940f9760 Cast to void in StaticAssertExpr, not its callers.
Seems a bit silly that many (in fact all, as of today) uses of
StaticAssertExpr would need to cast it to void to avoid warnings from
pickier compilers.  Let's just do the cast right in the macro, instead.

In passing, change StaticAssertExpr to StaticAssertStmt in one
place where that seems more apropos.

Discussion: https://postgr.es/m/16161.1518715186@sss.pgh.pa.us
2018-02-15 13:41:30 -05:00
Tom Lane 03c5a00ea3 Move the extern declaration for ExceptionalCondition into c.h.
This is the logical conclusion of our decision to support Assert()
in both frontend and backend code: it should be possible to use that
after including just c.h.  But as things were arranged before, if
you wanted to use Assert() in code that might be compiled for either
environment, you had to include postgres.h for the backend case.
Let's simplify that.

Per buildfarm, some of whose members started throwing warnings after
commit 0c62356cc added an Assert in src/port/snprintf.c.

It's possible that some other src/port files that use the stanza

#ifndef FRONTEND
#include "postgres.h"
#else
#include "postgres_fe.h"
#endif

could now be simplified to just say '#include "c.h"'.  I have not
tested for that, though, and it'd be unlikely to apply for more
than a small number of them.
2018-02-14 19:43:33 -05:00
Tom Lane cbadba8dd6 Revert "Stabilize output of new regression test case".
This effectively reverts commit 9edc97b71 (although the test is now
in a different place and has different contents).  We don't need that
hack anymore, because since commit 4b93f5799, this test case no longer
throws an error and so there's no displayed CONTEXT that could vary
depending on CLOBBER_CACHE_ALWAYS.  The underlying unstable-output
problem isn't really gone, of course, but it no longer manifests here.
2018-02-14 18:42:14 -05:00
Tom Lane feb1cc5593 Stabilize new plpgsql_record regression tests.
The buildfarm's CLOBBER_CACHE_ALWAYS animals aren't happy with some
of the test cases added in commit 4b93f5799.  There are two different
problems:

* In two places, a different CONTEXT stack is shown because the error
is detected in a different place, due to recompiling an expression
from scratch rather than re-using a previously cached plan for it.
I fixed these via the expedient of hiding the CONTEXT stack altogether.

* In one place, a test expected to fail (because a cached plan hadn't
been updated) actually succeeds (because the forced recompile makes
it good).  I couldn't think of a simple workaround for this, so I've
just commented out that test step altogether.

I have hopes of improving things enough that both of these kluges can
be reverted eventually.  The first one is the same kind of problem
previously discussed at
https://postgr.es/m/31545.1512924904@sss.pgh.pa.us
but there was insufficient agreement about how to fix it, so we
just hacked around the output instability (commit 9edc97b71).
The second issue should be fixed by allowing the plan to be rebuilt
when a type conflict is detected.  But for today, let's just make the
buildfarm green again.
2018-02-14 18:17:59 -05:00
Andres Freund 6d7dc53500 Return implementation defined value if pg_$op_s$bit_overflow overflows.
Some older compilers otherwise sometimes complain about undefined
values, even though the return value should not be used in the
overflow case.  We assume that any decent compiler will optimize away
the unnecessary assignment in performance critical cases.

We do not want to restrain the returned value to a specific value,
e.g. 0 or the wrapped-around value, because some fast ways to
implement overflow detecting math do not easily allow for
that (e.g. msvc intrinsics).  As the function documentation already
documents the returned value in case of intrinsics to be
implementation defined, no documentation has to be updated.

Per complaint from Tom Lane and his buildfarm member prairiedog.

Author: Andres Freund
Discussion: https://postgr.es/m/18169.1513958454@sss.pgh.pa.us
2018-02-14 14:23:57 -08:00
Tom Lane 9a725f7b5c Silence assorted "variable may be used uninitialized" warnings.
All of these are false positives, but in each case a fair amount of
analysis is needed to see that, and it's not too surprising that not all
compilers are smart enough.  (In particular, in the logtape.c case, a
compiler lacking the knowledge provided by the Assert would almost surely
complain, so that this warning will be seen in any non-assert build.)

Some of these are of long standing while others are pretty recent,
but it only seems worth fixing them in HEAD.

Jaime Casanova, tweaked a bit by me

Discussion: https://postgr.es/m/CAJGNTeMcYAMJdPAom52dppLMtF-UnEZi0dooj==75OEv1EoBZA@mail.gmail.com
2018-02-14 16:06:49 -05:00
Tom Lane 0c62356cc8 Add an assertion that we don't pass NULL to snprintf("%s").
Per commit e748e902d, we appear to have little or no coverage in the
buildfarm of machines that will dump core when asked to printf a
null string pointer.  Let's try to improve that situation by adding
an assertion that will make src/port/snprintf.c behave that way.
Since it's just an assertion, it won't break anything in production
builds, but it will help developers find this type of oversight.

Note that while our buildfarm coverage of machines that use that
snprintf implementation is pretty thin on the Unix side (apparently
amounting only to gaur/pademelon), all of the MSVC critters use it.

Discussion: https://postgr.es/m/156b989dbc6fe7c4d3223cf51da61195@postgrespro.ru
2018-02-14 15:06:01 -05:00
Tom Lane e748e902de Fix broken logic for reporting PL/Python function names in errcontext.
plpython_error_callback() reported the name of the function associated
with the topmost PL/Python execution context.  This was not merely
wrong if there were nested PL/Python contexts, but it risked a core
dump if the topmost one is an inline code block rather than a named
function.  That will have proname = NULL, and so we were passing a NULL
pointer to snprintf("%s").  It seems that none of the PL/Python-testing
machines in the buildfarm will dump core for that, but some platforms do,
as reported by Marina Polyakova.

Investigation finds that there actually is an existing regression test
that used to prove that the behavior was wrong, though apparently no one
had noticed that it was printing the wrong function name.  It stopped
showing the problem in 9.6 when we adjusted psql to not print CONTEXT
by default for NOTICE messages.  The problem is masked (if your platform
avoids the core dump) in error cases, because PL/Python will throw away
the originally generated error info in favor of a new traceback produced
at the outer level.

Repair by using ErrorContextCallback.arg to pass the correct context to
the error callback.  Add a regression test illustrating correct behavior.

Back-patch to all supported branches, since they're all broken this way.

Discussion: https://postgr.es/m/156b989dbc6fe7c4d3223cf51da61195@postgrespro.ru
2018-02-14 14:47:18 -05:00
Tom Lane f9263006d8 Support CONSTANT/NOT NULL/initial value for plpgsql composite variables.
These features were never implemented previously for composite or record
variables ... not that the documentation admitted it, so there's no doc
updates here.

This also fixes some issues concerning enforcing DOMAIN NOT NULL
constraints against plpgsql variables, although I'm not sure that
that topic is completely dealt with.

I created a new plpgsql test file for these features, and moved the
one relevant existing test case into that file.

Tom Lane, reviewed by Daniel Gustafsson

Discussion: https://postgr.es/m/18362.1514605650@sss.pgh.pa.us
2018-02-13 22:15:08 -05:00
Tom Lane fd333bc763 Speed up plpgsql trigger startup by introducing "promises".
Over the years we've accreted quite a few special variables that are
predefined in plpgsql trigger functions.  The cost of initializing these
variables to their defined values turns out to be a significant part of
the runtime of simple triggers; but, undoubtedly, most real-world triggers
never examine the values of most of these variables.

To improve matters, invent the notion of a variable that has a "promise"
attached to it, specifying which of the predetermined values should be
assigned to the variable if anything ever reads it.  This eliminates all
the unneeded startup overhead, in return for a small penalty on accesses
to these variables.

Tom Lane, reviewed by Pavel Stehule

Discussion: https://postgr.es/m/11986.1514407114@sss.pgh.pa.us
2018-02-13 19:20:37 -05:00
Tom Lane 40301c1c8b Speed up plpgsql function startup by doing fewer pallocs.
Previously, copy_plpgsql_datum did a separate palloc for each variable
needing instance-local storage.  In simple benchmarks this made for a
noticeable fraction of the total runtime.  Improve it by precalculating
the space needed for all of a function's variables and doing just one
palloc for all of them.

In passing, remove PLPGSQL_DTYPE_EXPR from the list of plpgsql "datum"
types, since in fact it has nothing in common with the others, and there
is noplace that needs to discriminate on the basis of dtype between an
expression and any type of datum.  And add comments clarifying which
datum struct fields are generic and which aren't.

Tom Lane, reviewed by Pavel Stehule

Discussion: https://postgr.es/m/11986.1514407114@sss.pgh.pa.us
2018-02-13 19:10:43 -05:00
Tom Lane 4b93f57999 Make plpgsql use its DTYPE_REC code paths for composite-type variables.
Formerly, DTYPE_REC was used only for variables declared as "record";
variables of named composite types used DTYPE_ROW, which is faster for
some purposes but much less flexible.  In particular, the ROW code paths
are entirely incapable of dealing with DDL-caused changes to the number
or data types of the columns of a row variable, once a particular plpgsql
function has been parsed for the first time in a session.  And, since the
stored representation of a ROW isn't a tuple, there wasn't any easy way
to deal with variables of domain-over-composite types, since the domain
constraint checking code would expect the value to be checked to be a
tuple.  A lesser, but still real, annoyance is that ROW format cannot
represent a true NULL composite value, only a row of per-field NULL
values, which is not exactly the same thing.

Hence, switch to using DTYPE_REC for all composite-typed variables,
whether "record", named composite type, or domain over named composite
type.  DTYPE_ROW remains but is used only for its native purpose, to
represent a fixed-at-compile-time list of variables, for instance the
targets of an INTO clause.

To accomplish this without taking significant performance losses, introduce
infrastructure that allows storing composite-type variables as "expanded
objects", similar to the "expanded array" infrastructure introduced in
commit 1dc5ebc90.  A composite variable's value is thereby kept (most of
the time) in the form of separate Datums, so that field accesses and
updates are not much more expensive than they were in the ROW format.
This holds the line, more or less, on performance of variables of named
composite types in field-access-intensive microbenchmarks, and makes
variables declared "record" perform much better than before in similar
tests.  In addition, the logic involved with enforcing composite-domain
constraints against updates of individual fields is in the expanded
record infrastructure not plpgsql proper, so that it might be reusable
for other purposes.

In further support of this, introduce a typcache feature for assigning a
unique-within-process identifier to each distinct tuple descriptor of
interest; in particular, DDL alterations on composite types result in a new
identifier for that type.  This allows very cheap detection of the need to
refresh tupdesc-dependent data.  This improves on the "tupDescSeqNo" idea
I had in commit 687f096ea: that assigned identifying sequence numbers to
successive versions of individual composite types, but the numbers were not
unique across different types, nor was there support for assigning numbers
to registered record types.

In passing, allow plpgsql functions to accept as well as return type
"record".  There was no good reason for the old restriction, and it
was out of step with most of the other PLs.

Tom Lane, reviewed by Pavel Stehule

Discussion: https://postgr.es/m/8962.1514399547@sss.pgh.pa.us
2018-02-13 18:52:21 -05:00
Peter Eisentraut 7a32ac8a66 Add procedure support to pg_get_functiondef
This also makes procedures work in psql's \ef and \sf commands.

Reported-by: Pavel Stehule <pavel.stehule@gmail.com>
2018-02-13 15:13:44 -05:00
Peter Eisentraut 7cd56f218d Add tests for pg_get_functiondef 2018-02-13 15:13:44 -05:00
Peter Eisentraut a7b8f0661d Fix typo 2018-02-13 15:13:44 -05:00
Peter Eisentraut b4e2ada347 In LDAP test, restart after pg_hba.conf changes
Instead of issuing a reload after pg_hba.conf changes between test
cases, run a full restart.  With a reload, an error in the new
pg_hba.conf is ignored and the tests will continue to run with the old
settings, invalidating the subsequent test cases.  With a restart, a
faulty pg_hba.conf will lead to the test being aborted, which is what
we'd rather want.
2018-02-13 09:12:45 -05:00
Peter Eisentraut ebdb42a0d6 Fix typo
Author: Masahiko Sawada <sawada.mshk@gmail.com>
2018-02-12 22:39:52 -05:00
Alvaro Herrera 8237f27b50 get_relid_attribute_name is dead, long live get_attname
The modern way is to use a missing_ok argument instead of two separate
almost-identical routines, so do that.

Author: Michaël Paquier
Reviewed-by: Álvaro Herrera
Discussion: https://postgr.es/m/20180201063212.GE6398@paquier.xyz
2018-02-12 19:33:15 -03:00
Robert Haas 88ef48c1cc Fix parallel index builds for dynamic_shared_memory_type=none.
The previous code failed to realize that this setting effectively
disables parallelism, and would crash if it decided to attempt
parallelism anyway.  Instead, treat it as a disabling condition.

Kyotaro Horiguchi, who also reported the issue. Reviewed by Michael
Paquier and Peter Geoghegan.

Discussion: http://postgr.es/m/20180209.170635.256350357.horiguchi.kyotaro@lab.ntt.co.jp
2018-02-12 12:55:12 -05:00
Bruce Momjian 91389228a1 psql: give ^D hint for \q in place where \q is ignored
Also add comment on why exit/quit are not documented.

Discussion: https://postgr.es/m/20180202053928.GA13472@momjian.us
2018-02-12 01:27:06 -05:00
Tom Lane 5c9f2564fa Fix assorted errors in pg_dump's handling of extended statistics objects.
pg_dump supposed that a stats object necessarily shares the same schema
as its underlying table, and that it doesn't have a separate owner.
These things may have been true during early development of the feature,
but they are not true as of v10 release.

Failure to track the object's schema separately turns out to have only
limited consequences, because pg_get_statisticsobjdef() always schema-
qualifies the target object name in the generated CREATE STATISTICS command
(a decision out of step with the rest of ruleutils.c, but I digress).
Therefore the restored object would be in the right schema, so that the
only problem is that the TOC entry would be mislabeled as to schema.  That
could lead to wrong decisions for schema-selective restores, for example.

The ownership issue is a bit more serious: not only was the TOC entry
potentially mislabeled as to owner, but pg_dump didn't bother to issue an
ALTER OWNER command at all, so that after restore the stats object would
continue to be owned by the restoring superuser.

A final point is that decisions as to whether to dump a stats object or
not were driven by whether the underlying table was dumped or not.  While
that's not wrong on its face, it won't scale nicely to the planned future
extension to cross-table statistics.  Moreover, that design decision comes
out of the view of stats objects as being auxiliary to a particular table,
like a rule or trigger, which is exactly where the above problems came
from.  Since we're now treating stats objects more like independent objects
in their own right, they ought to behave like standalone objects for this
purpose too.  So change to using the generic selectDumpableObject() logic
for them (which presently amounts to "dump if containing schema is to be
dumped").

Along the way to fixing this, restructure so that getExtendedStatistics
collects the identity info (only) for all extended stats objects in one
query, and then for each object actually being dumped, we retrieve the
definition in dumpStatisticsExt.  This is necessary to ensure that
schema-qualification in the generated CREATE STATISTICS command happens
with respect to the search path that pg_dump will now be using at restore
time (ie, the schema the stats object is in, not that of the underlying
table).  It's probably also significantly faster in the typical scenario
where only a minority of tables have extended stats.

Back-patch to v10 where extended stats were introduced.

Discussion: https://postgr.es/m/18272.1518328606@sss.pgh.pa.us
2018-02-11 13:24:15 -05:00
Tom Lane d02d4a6d4f Avoid premature free of pass-by-reference CALL arguments.
Prematurely freeing the EState used to evaluate CALL arguments led, in some
cases, to passing dangling pointers to the procedure.  This was masked in
trivial cases because the argument pointers would point to Const nodes in
the original expression tree, and in some other cases because the result
value would end up in the standalone ExprContext rather than in memory
belonging to the EState --- but that wasn't exactly high quality
programming either, because the standalone ExprContext was never
explicitly freed, breaking assorted API contracts.

In addition, using a separate EState for each argument was just silly.

So let's use just one EState, and one ExprContext, and make the latter
belong to the former rather than be standalone, and clean up the EState
(and hence the ExprContext) post-call.

While at it, improve the function's commentary a bit.

Discussion: https://postgr.es/m/29173.1518282748@sss.pgh.pa.us
2018-02-10 13:37:12 -05:00
Tom Lane 65b1d76785 Fix oversight in CALL argument handling, and do some minor cleanup.
CALL statements cannot support sub-SELECTs in the arguments of the called
procedure, since they just use ExecEvalExpr to evaluate such arguments.
Teach transformSubLink() to reject the case, as it already does for other
contexts in which subqueries are not supported.

In passing, s/EXPR_KIND_CALL/EXPR_KIND_CALL_ARGUMENT/ to make that enum
symbol line up more closely with the phrasing of the error messages it is
associated with.  And fix someone's weak grasp of English grammar in the
preceding EXPR_KIND_PARTITION_EXPRESSION addition.  Also update an
incorrect comment in resolve_unique_index_expr (possibly it was correct
when written, but nowadays transformExpr definitely does reject SRFs here).

Per report from Pavel Stehule --- but this resolves only one of the bugs
he mentions.

Discussion: https://postgr.es/m/CAFj8pRDxOwPPzpA8i+AQeDQFj7bhVw-dR2==rfWZ3zMGkm568Q@mail.gmail.com
2018-02-10 13:05:14 -05:00
Robert Haas 935dee9ad5 Mark assorted GUC variables as PGDLLIMPORT.
This makes life easier for extension authors.

Metin Doslu

Discussion: http://postgr.es/m/CAL1dPcfa45o1dC-c4t-48v0OZE6oy4ChJhObrtkK8mzNfXqDTA@mail.gmail.com
2018-02-09 15:54:45 -05:00
Robert Haas be42015fcc Clear stmt_timeout_active if we disable_all_timeouts.
Otherwise, we can end up with the flag set when the timeout is
actually disabled, leading to misbehavior.  Commit
f8e5f156b3 introduced this bug.

Reported by Peter Eisentraut.  Analysis and fix by Thomas Munro,
tweaked by me.

Discussion: http://postgr.es/m/6a909374-2602-7136-8c70-397330a418f3@2ndquadrant.com
2018-02-09 15:48:18 -05:00
Robert Haas b78d0160da Fix incorrect method name in comment.
Atsushi Torikoshi

Discussion: http://postgr.es/m/1b056262-4bc0-a982-c899-bb67a0a7fd52@lab.ntt.co.jp
2018-02-08 14:35:54 -05:00
Robert Haas e44dd84325 Avoid listing the same ResultRelInfo in more than one EState list.
Doing so causes EXPLAIN ANALYZE to show trigger statistics multiple
times.  Commit 2f17844104 seems to
be to blame for this.

Amit Langote, revieed by Amit Khandekar, Etsuro Fujita, and me.
2018-02-08 14:29:05 -05:00
Robert Haas 88fdc70060 Fix possible infinite loop with Parallel Append.
When the previously-chosen plan was non-partial, all pa_finished
flags for partial plans are now set, and pa_next_plan has not yet
been set to INVALID_SUBPLAN_INDEX, the previous code could go into
an infinite loop.

Report by Rajkumar Raghuwanshi.  Patch by Amit Khandekar and me.
Review by Kyotaro Horiguchi.

Discussion: http://postgr.es/m/CAJ3gD9cf43z78qY=U=H0HvOEN341qfRO-vLpnKPSviHeWgJQ5w@mail.gmail.com
2018-02-08 12:31:48 -05:00
Peter Eisentraut b3a101eff0 Refine SSL tests test name reporting
Instead of using the psql/libpq connection string as the displayed test
name and relying on "notes" and source code comments to explain the
tests, give the tests self-explanatory names, like we do elsewhere.

Reviewed-by: Michael Paquier <michael.paquier@gmail.com>
Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
2018-02-08 09:57:10 -05:00
Peter Eisentraut 7c44b75a2a Make new triggers tests more robust
Add explicit collation on the trigger name to avoid locale dependencies.
Also restrict the tables selected, to avoid interference from
concurrently running tests.
2018-02-07 14:57:19 -05:00
Peter Eisentraut 32ff269117 Add more information_schema columns
- table_constraints.enforced
- triggers.action_order
- triggers.action_reference_old_table
- triggers.action_reference_new_table

Reviewed-by: Michael Paquier <michael.paquier@gmail.com>
2018-02-07 10:08:02 -05:00
Robert Haas b98a7cd58f Update out-of-date comment in StartupXLOG.
Commit 4b0d28de06 should have updated
this comment, but did not.

Thomas Munro

Discussion: http://postgr.es/m/CAEepm=0iJ8aqQcF9ij2KerAkuHF3SwrVTzjMdm1H4w++nfBf9A@mail.gmail.com
2018-02-07 08:48:04 -05:00
Robert Haas 4815dfa10f Remove prototype for fmgr() function, which no longer exists.
Commit 5ded4bd214 removed the code
for this function, but neglected to remove the prototype and
associated comments.

Dagfinn Ilmari Mannsåker

Discussion: http://postgr.es/m/d8j4lmuxjzk.fsf@dalvik.ping.uio.no
2018-02-07 08:42:36 -05:00
Tom Lane 0a459cec96 Support all SQL:2011 options for window frame clauses.
This patch adds the ability to use "RANGE offset PRECEDING/FOLLOWING"
frame boundaries in window functions.  We'd punted on that back in the
original patch to add window functions, because it was not clear how to
do it in a reasonably data-type-extensible fashion.  That problem is
resolved here by adding the ability for btree operator classes to provide
an "in_range" support function that defines how to add or subtract the
RANGE offset value.  Factoring it this way also allows the operator class
to avoid overflow problems near the ends of the datatype's range, if it
wishes to expend effort on that.  (In the committed patch, the integer
opclasses handle that issue, but it did not seem worth the trouble to
avoid overflow failures for datetime types.)

The patch includes in_range support for the integer_ops opfamily
(int2/int4/int8) as well as the standard datetime types.  Support for
other numeric types has been requested, but that seems like suitable
material for a follow-on patch.

In addition, the patch adds GROUPS mode which counts the offset in
ORDER-BY peer groups rather than rows, and it adds the frame_exclusion
options specified by SQL:2011.  As far as I can see, we are now fully
up to spec on window framing options.

Existing behaviors remain unchanged, except that I changed the errcode
for a couple of existing error reports to meet the SQL spec's expectation
that negative "offset" values should be reported as SQLSTATE 22013.

Internally and in relevant parts of the documentation, we now consistently
use the terminology "offset PRECEDING/FOLLOWING" rather than "value
PRECEDING/FOLLOWING", since the term "value" is confusingly vague.

Oliver Ford, reviewed and whacked around some by me

Discussion: https://postgr.es/m/CAGMVOdu9sivPAxbNN0X+q19Sfv9edEPv=HibOJhB14TJv_RCQg@mail.gmail.com
2018-02-07 00:06:56 -05:00
Robert Haas 2320945731 Fix incorrect grammar.
Etsuro Fujita

Discussion: http://postgr.es/m/5A7981EA.8020201@lab.ntt.co.jp
2018-02-06 15:50:13 -05:00
Robert Haas 9fafa413ac Avoid valgrind complaint about write() of uninitalized bytes.
LogicalTapeFreeze() may write out its first block when it is dirty but
not full, and then immediately read the first block back in from its
BufFile as a BLCKSZ-width block.  This can only occur in rare cases
where very few tuples were written out, which is currently only
possible with parallel external tuplesorts.  To avoid valgrind
complaints, tell it to treat the tail of logtape.c's buffer as
defined.

Commit 9da0cc3528 exposed this problem
but did not create it.  LogicalTapeFreeze() has always tended to write
out some amount of garbage bytes, but previously never wrote less than
one block of data in total, so the problem was masked.

Per buildfarm members lousyjack and skink.

Peter Geoghegan, based on a suggestion from Tom Lane and me.  Some
comment revisions by me.
2018-02-06 14:24:57 -05:00
Tom Lane 3785f7eee3 Doc: move info for btree opclass implementors into main documentation.
Up to now, useful info for writing a new btree opclass has been buried
in the backend's nbtree/README file.  Let's move it into the SGML docs,
in preparation for extending it with info about "in_range" functions
in the upcoming window RANGE patch.

To do this, I chose to create a new chapter for btree indexes in Part VII
(Internals), parallel to the chapters that exist for the newer index AMs.
This is a pretty short chapter as-is.  At some point somebody might care
to flesh it out with more detail about btree internals, but that is
beyond the scope of my ambition for today.

Discussion: https://postgr.es/m/23141.1517874668@sss.pgh.pa.us
2018-02-06 13:52:27 -05:00
Robert Haas f069c91a57 Fix possible crash in partition-wise join.
The previous code assumed that we'd always succeed in creating
child-joins for a joinrel for which partition-wise join was considered,
but that's not guaranteed, at least in the case where dummy rels
are involved.

Ashutosh Bapat, with some wordsmithing by me.

Discussion: http://postgr.es/m/CAFjFpRf8=uyMYYfeTBjWDMs1tR5t--FgOe2vKZPULxxdYQ4RNw@mail.gmail.com
2018-02-05 17:31:57 -05:00
Tom Lane a926eb84e0 Ensure that all temp files made during pg_upgrade are non-world-readable.
pg_upgrade has always attempted to ensure that the transient dump files
it creates are inaccessible except to the owner.  However, refactoring
in commit 76a7650c4 broke that for the file containing "pg_dumpall -g"
output; since then, that file was protected according to the process's
default umask.  Since that file may contain role passwords (hopefully
encrypted, but passwords nonetheless), this is a particularly unfortunate
oversight.  Prudent users of pg_upgrade on multiuser systems would
probably run it under a umask tight enough that the issue is moot, but
perhaps some users are depending only on pg_upgrade's umask changes to
protect their data.

To fix this in a future-proof way, let's just tighten the umask at
process start.  There are no files pg_upgrade needs to write at a
weaker security level; and if there were, transiently relaxing the
umask around where they're created would be a safer approach.

Report and patch by Tom Lane; the idea for the fix is due to Noah Misch.
Back-patch to all supported branches.

Security: CVE-2018-1053
2018-02-05 10:58:27 -05:00
Tom Lane 3492a0af0b Fix RelationBuildPartitionKey's processing of partition key expressions.
Failure to advance the list pointer while reading partition expressions
from a list results in invoking an input function with inappropriate data,
possibly leading to crashes or, with carefully crafted input, disclosure
of arbitrary backend memory.

Bug discovered independently by Álvaro Herrera and David Rowley.
This patch is by Álvaro but owes something to David's proposed fix.
Back-patch to v10 where the issue was introduced.

Security: CVE-2018-1052
2018-02-05 10:37:30 -05:00
Tom Lane 05d0f13f07 Skip setting up shared instrumentation for Hash node if not needed.
We don't need to set up the shared space for hash join instrumentation data
if instrumentation hasn't been requested.  Let's follow the example of the
similar Sort node code and save a few cycles by skipping that when we can.

This reverts commit d59ff4ab3 and instead allows us to use the safer choice
of passing noError = false to shm_toc_lookup in ExecHashInitializeWorker,
since if we reach that call there should be a TOC entry to be found.

Thomas Munro

Discussion: https://postgr.es/m/E1ehkoZ-0005uW-43%40gemulon.postgresql.org
2018-02-04 22:14:07 -05:00
Tom Lane d59ff4ab31 Fix another instance of unsafe coding for shm_toc_lookup failure.
One or another author of commit 5bcf389ec seems to have thought that
computing an offset from a NULL pointer would yield another NULL pointer.
There may possibly be architectures where that works, but common machines
don't work like that.  Per a quick code review of places calling
shm_toc_lookup and not using noError = false.
2018-02-02 18:32:05 -05:00
Tom Lane 957ff087c8 Be more wary about shm_toc_lookup failure.
Commit 445dbd82a basically missed the point of commit d46633506,
which was that we shouldn't allow shm_toc_lookup() failure to lead
to a core dump or assertion crash, because the odds of such a
failure should never be considered negligible.  It's correct that
we can't expect the PARALLEL_KEY_ERROR_QUEUE TOC entry to be there
if we have no workers.  But if we have no workers, we're not going
to do anything in this function with the lookup result anyway,
so let's just skip it.  That lets the code use the easy-to-prove-safe
noError=false case, rather than anything requiring effort to review.

Back-patch to v10, like the previous commit.

Discussion: https://postgr.es/m/3647.1517601675@sss.pgh.pa.us
2018-02-02 18:26:07 -05:00
Peter Eisentraut 533c5d8bdd Fix application of identity values in some cases
Investigation of 2d2d06b7e2 revealed that
identity values were not applied in some further cases, including
logical replication subscribers, VALUES RTEs, and ALTER TABLE ... ADD
COLUMN.  To fix all that, apply the identity column expression in
build_column_default() instead of repeating the same logic at each call
site.

For ALTER TABLE ... ADD COLUMN ... IDENTITY, the previous coding
completely ignored that existing rows for the new column should have
values filled in from the identity sequence.  The coding using
build_column_default() fails for this because the sequence ownership
isn't registered until after ALTER TABLE, and we can't do it before
because we don't have the column in the catalog yet.  So we specially
remember in ColumnDef the sequence name that we decided on and build a
custom NextValueExpr using that.

Reviewed-by: Michael Paquier <michael.paquier@gmail.com>
2018-02-02 14:39:10 -05:00
Robert Haas 9da0cc3528 Support parallel btree index builds.
To make this work, tuplesort.c and logtape.c must also support
parallelism, so this patch adds that infrastructure and then applies
it to the particular case of parallel btree index builds.  Testing
to date shows that this can often be 2-3x faster than a serial
index build.

The model for deciding how many workers to use is fairly primitive
at present, but it's better than not having the feature.  We can
refine it as we get more experience.

Peter Geoghegan with some help from Rushabh Lathia.  While Heikki
Linnakangas is not an author of this patch, he wrote other patches
without which this feature would not have been possible, and
therefore the release notes should possibly credit him as an author
of this feature.  Reviewed by Claudio Freire, Heikki Linnakangas,
Thomas Munro, Tels, Amit Kapila, me.

Discussion: http://postgr.es/m/CAM3SWZQKM=Pzc=CAHzRixKjp2eO5Q0Jg1SoFQqeXFQ647JiwqQ@mail.gmail.com
Discussion: http://postgr.es/m/CAH2-Wz=AxWqDoVvGU7dq856S4r6sJAj6DBn7VMtigkB33N5eyg@mail.gmail.com
2018-02-02 13:32:44 -05:00
Robert Haas 9aef173163 Refactor code for partition bound searching
Remove partition_bound_cmp() and partition_bound_bsearch(), whose
void * argument could be, depending on the situation, of any of
three different types: PartitionBoundSpec *, PartitionRangeBound *,
Datum *.

Instead, introduce separate bound-searching functions for each
situation: partition_list_bsearch, partition_range_bsearch,
partition_range_datum_bsearch, and partition_hash_bsearch.  This
requires duplicating the code for binary search, but it makes the
code much more type safe, involves fewer branches at runtime, and
at least in my opinion, is much easier to understand.

Along the way, add an option to partition_range_datum_bsearch
allowing the number of keys to be specified, so that we can search
for partitions based on a prefix of the full list of partition
keys.  This is important for pending work to improve partition
pruning.

Amit Langote, per a suggestion from me.

Discussion: http://postgr.es/m/CA+TgmoaVLDLc8=YESRwD32gPhodU_ELmXyKs77gveiYp+JE4vQ@mail.gmail.com
2018-02-02 09:32:44 -05:00
Robert Haas 9222c0d9ed Add new function WaitForParallelWorkersToAttach.
Once this function has been called, we know that all workers have
started and attached to their error queues -- so if any of them
subsequently exit uncleanly, we'll be sure to throw an ERROR promptly.
Otherwise, users of the ParallelContext machinery must be careful not
to wait forever for a worker that has failed to start.  Parallel query
manages to work without needing this for reasons explained in new
comments added by this patch, but it's a useful primitive for other
parallel operations, such as the pending patch to make creating a
btree index run in parallel.

Amit Kapila, revised by me.  Additional review by Peter Geoghegan.

Discussion: http://postgr.es/m/CAA4eK1+e2MzyouF5bg=OtyhDSX+=Ao=3htN=T-r_6s3gCtKFiw@mail.gmail.com
2018-02-02 09:00:59 -05:00
Robert Haas ad25a6b1f2 Fix possible failure to mark hash metapage dirty.
Report and suggested fix by Lixian Zou.  Amit Kapila put it
in the form of a patch and reviewed.

Discussion: http://postgr.es/m/151739848647.1239.12528851873396651946@wrigleys.postgresql.org
2018-02-01 15:23:45 -05:00
Bruce Momjian df9f599bc6 psql: Add quit/help behavior/hint, for other tool portability
Issuing 'quit'/'exit' in an empty psql buffer exits psql.  Issuing
'quit'/'exit' in a non-empty psql buffer alone on a line with no prefix
whitespace issues a hint on how to exit.

Also add similar 'help' hints for 'help' in a non-empty psql buffer.

Reported-by: Everaldo Canuto

Discussion: https://postgr.es/m/flat/CALVFHFb-C_5_94hueWg6Dd0zu7TfbpT7hzsh9Zf0DEDOSaAnfA%40mail.gmail.com

Author: original author Robert Haas, modified by me
2018-02-01 08:30:43 -05:00
Robert Haas 22757960bb Fix typo: colums -> columns.
Along the way, also fix code indentation.

Alexander Lakhin, reviewed by Michael Paquier

Discussion: http://postgr.es/m/45c44aa7-7cfa-7f3b-83fd-d8300677fdda@gmail.com
2018-01-31 16:45:37 -05:00
Robert Haas 3ccdc6f9a5 Fix list partition constraints for partition keys of array type.
The old code generated always generated a constraint of the form
col = ANY(ARRAY[val1, val2, ...]), but that's invalid when col is an
array type.  Instead, generate col = val when there's only one value,
col = val1 OR col = val2 OR ... when there are multiple values and
col is of array type, and the old form when there are multiple values
and col is not of an array type.

As a side benefit, this makes constraint exclusion able to prune
a list partition declared to accept a single Boolean value, which
didn't work before.

Amit Langote, reviewed by Etsuro Fujita

Discussion: http://postgr.es/m/97267195-e235-89d1-a41a-c110198dfce9@lab.ntt.co.jp
2018-01-31 15:43:11 -05:00
Peter Eisentraut f75a959155 Refactor client-side SSL certificate checking code
Separate the parts specific to the SSL library from the general logic.

The previous code structure was

open_client_SSL()
calls verify_peer_name_matches_certificate()
calls verify_peer_name_matches_certificate_name()
calls wildcard_certificate_match()

and was completely in fe-secure-openssl.c.  The new structure is

open_client_SSL() [openssl]
calls pq_verify_peer_name_matches_certificate() [generic]
calls pgtls_verify_peer_name_matches_certificate_guts() [openssl]
calls openssl_verify_peer_name_matches_certificate_name() [openssl]
calls pq_verify_peer_name_matches_certificate_name() [generic]
calls wildcard_certificate_match() [generic]

Move the generic functions into a new file fe-secure-common.c, so the
calls generally go fe-connect.c -> fe-secure.c -> fe-secure-${impl}.c ->
fe-secure-common.c, although there is a bit of back-and-forth between
the last two.

Reviewed-by: Michael Paquier <michael.paquier@gmail.com>
2018-01-30 22:56:24 -05:00
Peter Eisentraut 38d485fdaa Fix up references to scram-sha-256
pg_hba_file_rules erroneously reported this as scram-sha256.  Fix that.

To avoid future errors and confusion, also adjust documentation links
and internal symbols to have a separator between "sha" and "256".

Reported-by: Christophe Courtois <christophe.courtois@dalibo.com>
Author: Michael Paquier <michael.paquier@gmail.com>
2018-01-30 16:50:30 -05:00
Peter Eisentraut a044378ce2 Add some noreturn attributes to help static analyzers 2018-01-29 20:44:35 -05:00
Peter Eisentraut 07e524d3e9 Silence complaint about dead assignment
The preferred place for "placate compiler" assignments is after
elog(ERROR), not before it.  Otherwise, scan-build complains about a
dead assignment.
2018-01-29 20:43:43 -05:00
Andres Freund c12693d8f3 Introduce ExecQualAndReset() helper.
It's a common task to evaluate a qual and reset the corresponding
expression context. Currently that requires storing the result of the
qual eval, resetting the context, and then reacting on the result. As
that's awkward several places only reset the context next time through
a node. That's not great, so introduce a helper that evaluates and
resets.

It's a bit ugly that it currently uses MemoryContextReset() instead of
ResetExprContext(), but that seems easier than reordering all of
executor.h.

Author: Andres Freund
Discussion: https://postgr.es/m/20180109222544.f7loxrunqh3xjl5f@alap3.anarazel.de
2018-01-29 12:19:12 -08:00
Tom Lane 97d4445a03 Save a few bytes by removing useless last argument to SearchCatCacheList.
There's never any value in giving a fully specified cache key to
SearchCatCacheList: you might as well call SearchCatCache instead,
since there could be only one match.  So the maximum useful number of
key arguments is one less than the supported number of key columns.
We might as well remove the useless extra argument and save some few
bytes per call site, as well as a cycle or so per call.

I believe the reason it was coded like this is that originally, callers
had to write out all the dummy arguments in each call, and so it seemed
less confusing if SearchCatCache and SearchCatCacheList took the same
number of key arguments.  But since commit e26c539e9, callers only write
their live arguments explicitly, making that a non-factor; and there's
surely been enough time for third-party modules to adapt to that coding
style.  So this is only an ABI break not an API break for callers.

Per discussion with Oliver Ford, this might also make it less confusing
how to use SearchCatCacheList correctly.

Discussion: https://postgr.es/m/27788.1517069693@sss.pgh.pa.us
2018-01-29 15:13:17 -05:00
Andres Freund fc96c69425 Initialize unused ExprEvalStep fields.
ExecPushExprSlots didn't initialize ExprEvalStep's resvalue/resnull
steps as it didn't use them. That caused wrong valgrind warnings for
an upcoming patch, so zero-intialize.

Also zero-initialize all scratch ExprEvalStep's allocated on the
stack, to avoid issues with similar future omissions of non-critial
data.
2018-01-29 12:01:07 -08:00
Andres Freund ab9f2c429d Prevent growth of simplehash tables when they're "too empty".
In cases where simplehash tables where filled with either a lot of
conflicting hash-values, or values that hash to consecutive
values (i.e. build "chains") the growth heuristics in
d4c62a6b62 could trigger rather
explosively.

To fix that, address some of the reasons (see previous commit) of why
the growth heuristics where needed, and only allow growth when the
table isn't too empty. While that means there's a few cases of bad
input that can be slower, that seems a lot better than running very
quickly out of memory.

Author: Tomas Vondra and Andres Freund, with additional input by
    Thomas Munro, Tom Lane Todd A. Cook
Reported-By: Todd A. Cook, Tomas Vondra, Thomas Munro
Discussion: https://postgr.es/m/20171127185700.1470.20362@wrigleys.postgresql.org
Backpatch: 10, where simplehash was introduced
2018-01-29 11:24:57 -08:00
Andres Freund c068f87723 Improve bit perturbation in TupleHashTableHash.
The changes in b81b5a96f4 did not fully
address the issue, because the bit-mixing of the IV into the final
hash-key didn't prevent clustering in the input-data survive in the
output data.

This didn't cause a lot of problems because of the additional growth
conditions added d4c62a6b62. But as we
want to rein those in due to explosive growth in some edges, this
needs to be fixed.

Author: Andres Freund
Discussion: https://postgr.es/m/20171127185700.1470.20362@wrigleys.postgresql.org
Backpatch: 10, where simplehash was introduced
2018-01-29 11:24:57 -08:00
Tom Lane 15be274601 Avoid misleading psql password prompt when username is multiply specified.
When a password is needed, cases such as
	psql -d "postgresql://alice@localhost/testdb" -U bob
would incorrectly prompt for "Password for user bob: ", when actually the
connection will be attempted with username alice.  The priority order of
which name to use isn't that important here, but the misleading prompt is.

When we are prompting for a password after initial connection failure,
we can fix this reliably by looking at PQuser(conn) to see how libpq
interpreted the connection arguments.  But when we're doing a forced
password prompt because of a -W switch, we can't use that solution.
Fortunately, because the main use of -W is for noninteractive situations,
it's less critical to produce a helpful prompt in such cases.  I made
the startup prompt for -W just say "Password: " all the time, rather
than expending extra code on trying to identify which username to use.
In the case of a \c command (after -W has been given), there's already
logic in do_connect that determines whether the "dbname" is a connstring
or URI, so we can avoid lobotomizing the prompt except in cases that
are actually dubious.  (We could do similarly in startup.c if anyone
complains, but for now it seems not worthwhile, especially since that
would still be only a partial solution.)

Per bug #15025 from Akos Vandra.  Although this is arguably a bug fix,
it doesn't seem worth back-patching.  The case where it matters seems
like a very corner-case usage, and someone might complain that we'd
changed the behavior of -W in a minor release.

Discussion: https://postgr.es/m/20180123130013.7407.24749@wrigleys.postgresql.org
2018-01-29 12:57:09 -05:00
Tom Lane 35a528062c Add stack-overflow guards in set-operation planning.
create_plan_recurse lacked any stack depth check.  This is not per
our normal coding rules, but I'd supposed it was safe because earlier
planner processing is more complex and presumably should eat more
stack.  But bug #15033 from Andrew Grossman shows this isn't true,
at least not for queries having the form of a many-thousand-way
INTERSECT stack.

Further testing showed that recurse_set_operations is also capable
of being crashed in this way, since it likewise will recurse to the
bottom of a parsetree before calling any support functions that
might themselves contain any stack checks.  However, its stack
consumption is only perhaps a third of create_plan_recurse's.

It's possible that this particular problem with create_plan_recurse can
only manifest in 9.6 and later, since before that we didn't build a Path
tree for set operations.  But having seen this example, I now have no
faith in the proposition that create_plan_recurse doesn't need a stack
check, so back-patch to all supported branches.

Discussion: https://postgr.es/m/20180127050845.28812.58244@wrigleys.postgresql.org
2018-01-28 13:39:07 -05:00
Bruce Momjian 010123e144 C includes: Reorder C includes in partition.c
Discussion: https://postgr.es/m/5A69AA50.2060600@lab.ntt.co.jp

Author: Etsuro Fujita
2018-01-27 23:05:52 -05:00
Tom Lane 41fc04ff91 Update time zone data files to tzdata release 2018c.
DST law changes in Brazil, Sao Tome and Principe.  Historical corrections
for Bolivia, Japan, and South Sudan.  The "US/Pacific-New" zone has been
removed (it was only a link to America/Los_Angeles anyway).
2018-01-27 16:42:28 -05:00
Tom Lane 2e668c522e Avoid crash during EvalPlanQual recheck of an inner indexscan.
Commit 09529a70b changed nodeIndexscan.c and nodeIndexonlyscan.c to
postpone initialization of the indexscan proper until the first tuple
fetch.  It overlooked the question of mark/restore behavior, which means
that if some caller attempts to mark the scan before the first tuple fetch,
you get a null pointer dereference.

The only existing user of mark/restore is nodeMergejoin.c, which (somewhat
accidentally) will never attempt to set a mark before the first inner tuple
unless the inner child node is a Material node.  Hence the case can't arise
normally, so it seems sufficient to document the assumption at both ends.
However, during an EvalPlanQual recheck, ExecScanFetch doesn't call
IndexNext but just returns the jammed-in test tuple.  Therefore, if we're
doing a recheck in a plan tree with a mergejoin with inner indexscan,
it's possible to reach ExecIndexMarkPos with iss_ScanDesc still null,
as reported by Guo Xiang Tan in bug #15032.

Really, when there's a test tuple supplied during an EPQ recheck, touching
the index at all is the wrong thing: rather, the behavior of mark/restore
ought to amount to saving and restoring the es_epqScanDone flag.  We can
avoid finding a place to actually save the flag, for the moment, because
given the assumption that no caller will set a mark before fetching a
tuple, es_epqScanDone must always be set by the time we try to mark.
So the actual behavior change required is just to not reach the index
access if a test tuple is supplied.

The set of plan node types that need to consider this issue are those
that support EPQ test tuples (i.e., call ExecScan()) and also support
mark/restore; which is to say, IndexScan, IndexOnlyScan, and perhaps
CustomScan.  It's tempting to try to fix the problem in one place by
teaching ExecMarkPos() itself about EPQ; but ExecMarkPos supports some
plan types that aren't Scans, and also it seems risky to make assumptions
about what a CustomScan wants to do here.  Also, the most likely future
change here is to decide that we do need to support marks placed before
the first tuple, which would require additional work in IndexScan and
IndexOnlyScan in any case.  Hence, fix the EPQ issue in nodeIndexscan.c
and nodeIndexonlyscan.c, accepting the small amount of code duplicated
thereby, and leave it to CustomScan providers to fix this bug if they
have it.

Back-patch to v10 where commit 09529a70b came in.  In earlier branches,
the index_markpos() call is a waste of cycles when EPQ is active, but
no more than that, so it doesn't seem appropriate to back-patch further.

Discussion: https://postgr.es/m/20180126074932.3098.97815@wrigleys.postgresql.org
2018-01-27 13:52:24 -05:00
Tom Lane fb8697b31a Avoid unnecessary use of pg_strcasecmp for already-downcased identifiers.
We have a lot of code in which option names, which from the user's
viewpoint are logically keywords, are passed through the grammar as plain
identifiers, and then matched to string literals during command execution.
This approach avoids making words into lexer keywords unnecessarily.  Some
places matched these strings using plain strcmp, some using pg_strcasecmp.
But the latter should be unnecessary since identifiers would have been
downcased on their way through the parser.  Aside from any efficiency
concerns (probably not a big factor), the lack of consistency in this area
creates a hazard of subtle bugs due to different places coming to different
conclusions about whether two option names are the same or different.
Hence, standardize on using strcmp() to match any option names that are
expected to have been fed through the parser.

This does create a user-visible behavioral change, which is that while
formerly all of these would work:
	alter table foo set (fillfactor = 50);
	alter table foo set (FillFactor = 50);
	alter table foo set ("fillfactor" = 50);
	alter table foo set ("FillFactor" = 50);
now the last case will fail because that double-quoted identifier is
different from the others.  However, none of our documentation says that
you can use a quoted identifier in such contexts at all, and we should
discourage doing so since it would break if we ever decide to parse such
constructs as true lexer keywords rather than poor man's substitutes.
So this shouldn't create a significant compatibility issue for users.

Daniel Gustafsson, reviewed by Michael Paquier, small changes by me

Discussion: https://postgr.es/m/29405B24-564E-476B-98C0-677A29805B84@yesql.se
2018-01-26 18:25:14 -05:00
Robert Haas 9fd8b7d632 Factor some code out of create_grouping_paths.
This is preparatory refactoring to prepare the way for partition-wise
aggregate, which will reuse the new subroutines for child grouping
rels.  It also does not seem like a bad idea on general principle,
as the function was getting pretty long.

Jeevan Chalke.  The larger patch series of which this patch is a part
was reviewed and tested by Antonin Houska, Rajkumar Raghuwanshi,
Ashutosh Bapat, David Rowley, Dilip Kumar, Konstantin Knizhnik,
Pascal Legrand, and me.  Some cosmetic changes by me.

Discussion: http://postgr.es/m/CAM2+6=V64_xhstVHie0Rz=KPEQnLJMZt_e314P0jaT_oJ9MR8A@mail.gmail.com
2018-01-26 15:03:12 -05:00
Tom Lane 4971d2a322 Remove the obsolete WITH clause of CREATE FUNCTION.
This clause was superseded by SQL-standard syntax back in 7.3.
We've kept it around for backwards-compatibility purposes ever since;
but 15 years seems like long enough for that, especially seeing that
there are undocumented weirdnesses in how it interacts with the
SQL-standard syntax for specifying the same options.

Michael Paquier, per an observation by Daniel Gustafsson;
some small cosmetic adjustments to nearby code by me.

Discussion: https://postgr.es/m/20180115022748.GB1724@paquier.xyz
2018-01-26 12:25:44 -05:00
Peter Eisentraut c1869542b3 Use abstracted SSL API in server connection log messages
The existing "connection authorized" server log messages used OpenSSL
API calls directly, even though similar abstracted API calls exist.
Change to use the latter instead.

Change the function prototype for the functions that return the TLS
version and the cipher to return const char * directly instead of
copying into a buffer.  That makes them slightly easier to use.

Add bits= to the message.  psql shows that, so we might as well show the
same information on the client and server.

Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Reviewed-by: Michael Paquier <michael.paquier@gmail.com>
2018-01-26 09:50:46 -05:00
Peter Eisentraut a6ef00b5c3 Remove byte-masking macros for Datum conversion macros
As the comment there stated, these were needed for old-style
user-defined functions, but since we removed support for those, we don't
need this anymore.

Reviewed-by: Michael Paquier <michael.paquier@gmail.com>
2018-01-26 08:46:27 -05:00
Tom Lane 1368e92e16 Support --no-comments in pg_dump, pg_dumpall, pg_restore.
We have switches already to suppress other subsidiary object properties,
such as ACLs, security labels, ownership, and tablespaces, so just on
the grounds of symmetry we should allow suppressing comments as well.
Also, commit 0d4e6ed30 added a positive reason to have this feature,
i.e. to allow obtaining the old behavior of selective pg_restore should
anyone desire that.

Recent commits have removed the cases where pg_dump emitted comments on
built-in objects that the restoring user might not have privileges to
comment on, so the original primary motivation for this feature is gone,
but it still seems at least somewhat useful in its own right.

Robins Tharakan, reviewed by Fabrízio Mello

Discussion: https://postgr.es/m/CAEP4nAx22Z4ch74oJGzr5RyyjcyUSbpiFLyeYXX8pehfou92ug@mail.gmail.com
2018-01-25 15:27:24 -05:00
Tom Lane bb415675d8 Add missing "static" markers.
Per buildfarm.
2018-01-25 14:32:28 -05:00
Tom Lane 0d4e6ed308 Clean up some aspects of pg_dump/pg_restore item-selection logic.
Ensure that CREATE DATABASE and related commands are issued when, and
only when, --create is specified.  Previously there were scenarios
where using selective-dump switches would prevent --create from having
any effect.  For example, it would fail to do anything in pg_restore
if the archive file had been made by a selective dump, because there
would be no TOC entry for the database.

Since we don't issue \connect either if we don't issue CREATE DATABASE,
this could result in unexpectedly restoring objects into the wrong
database.

Also fix pg_restore's selective restore logic so that when an object is
selected to be restored, we also restore its ACL, comment, and security
label if any.  Previously there was no way to get the latter properties
except through tedious mucking about with a -L file.  If, for some
reason, you don't want these properties, you can match the old behavior
by adding --no-acl etc.

While at it, try to make _tocEntryRequired() a little better organized
and better documented.

Discussion: https://postgr.es/m/32668.1516848577@sss.pgh.pa.us
2018-01-25 14:26:15 -05:00
Alvaro Herrera 05fb5d6619 Ignore partitioned indexes where appropriate
get_relation_info() was too optimistic about opening indexes in
partitioned tables, which would raise errors when any queries were
planned on such tables.  Fix by ignoring any indexes of the partitioned
kind.

CLUSTER (and ALTER TABLE CLUSTER ON) had a similar problem.  Fix by
disallowing these commands in partitioned tables.

Fallout from 8b08f7d482.
2018-01-25 16:12:15 -03:00
Tom Lane 5955d93419 Improve pg_dump's handling of "special" built-in objects.
We had some pretty ad-hoc handling of the public schema and the plpgsql
extension, which are both presumed to exist in template0 but might be
modified or deleted in the database being dumped.

Up to now, by default pg_dump would emit a CREATE EXTENSION IF NOT EXISTS
command as well as a COMMENT command for plpgsql.  The usefulness of the
former is questionable, and the latter caused annoying errors in
non-superuser dump/restore scenarios.  Let's instead install a rule that
built-in extensions (identified by having low-numbered OIDs) are not to be
dumped.  We were doing it that way already in binary-upgrade mode, so this
just makes regular mode behave the same.  It remains true that if someone
has installed a non-default ACL on the plpgsql language, that will get
dumped thanks to the pg_init_privs mechanism.  This is more consistent with
the handling of built-in objects of other kinds.

Also, change the very ad-hoc mechanism that was used to avoid dumping
creation and comment commands for the public schema.  Instead of hardwiring
a test in _printTocEntry(), make use of the DUMP_COMPONENT_ infrastructure
to mark that schema up-front about what we want to do with it.  This has
the visible effect that the public schema won't be mentioned in the output
at all, except for updating its ACL if it has a non-default ACL.
Previously, while it was normally not mentioned, --clean mode would drop
and recreate it, again causing headaches for non-superuser usage.  This
change likewise makes the public schema less special and more like other
built-in objects.

If plpgsql, or the public schema, has been removed entirely in the source
DB, that situation won't be reproduced in the destination ... but that
was true before.

Discussion: https://postgr.es/m/29048.1516812451@sss.pgh.pa.us
2018-01-25 13:54:42 -05:00
Peter Eisentraut 0b5e33f667 Remove use of byte-masking macros in record_image_cmp
These were introduced in 4cbb646334, but
after further analysis and testing, they should not be necessary and
probably weren't the part of that commit that fixed anything.

Reviewed-by: Michael Paquier <michael.paquier@gmail.com>
2018-01-25 09:41:19 -05:00
Peter Eisentraut 4a3fdbdf76 Allow spaces in connection strings in SSL tests
Connection strings can have items with spaces in them, wrapped in
quotes.  The tests however ran a SELECT '$connstr' upon connection which
broke on the embedded quotes.  Use dollar quotes on the connstr to
protect against this.  This was hit during the development of the macOS
Secure Transport patch, but is independent of it.

Author: Daniel Gustafsson <daniel@yesql.se>
2018-01-25 09:14:24 -05:00
Robert Haas 945f71db84 Avoid referencing off the end of subplan_partition_offsets.
Report by buildfarm member skink and Tom Lane.  Analysis by me.
Patch by Amit Khandekar.

Discussion: http://postgr.es/m/CAJ3gD9fVA1iXQYhfqHP5n_TEd4U9=V8TL_cc-oKRnRmxgdvJrQ@mail.gmail.com
2018-01-24 16:34:51 -05:00
Peter Eisentraut a61116da8b Add tests for record_image_eq and record_image_cmp
record_image_eq was covered a bit by the materialized view code that it
is meant to support, but record_image_cmp was not tested at all.

While we're here, add more tests to record_eq and record_cmp as well,
for symmetry.

Reviewed-by: Michael Paquier <michael.paquier@gmail.com>
2018-01-24 13:23:57 -05:00
Tom Lane 434e6e1484 Improve implementation of pg_attribute_always_inline.
Avoid compiler warnings on MSVC (which doesn't want to see both
__forceinline and inline) and ancient GCC (which doesn't have
__attribute__((always_inline))).

Don't force inline-ing when building at -O0, as the programmer is probably
hoping for exact source-to-object-line correspondence in that case.
(For the moment this only works for GCC; maybe we can extend it later.)

Make pg_attribute_always_inline be syntactically a drop-in replacement
for inline, rather than an additional wart.

And improve the comments.

Thomas Munro and Michail Nikolaev, small tweaks by me

Discussion: https://postgr.es/m/32278.1514863068@sss.pgh.pa.us
Discussion: https://postgr.es/m/CANtu0oiYp74brgntKOxgg1FK5+t8uQ05guSiFU6FYz_5KUhr6Q@mail.gmail.com
2018-01-23 23:07:13 -05:00
Tom Lane bb94ce4d26 Teach reparameterize_path() to handle AppendPaths.
If we're inside a lateral subquery, there may be no unparameterized paths
for a particular child relation of an appendrel, in which case we *must*
be able to create similarly-parameterized paths for each other child
relation, else the planner will fail with "could not devise a query plan
for the given query".  This means that there are situations where we'd
better be able to reparameterize at least one path for each child.

This calls into question the assumption in reparameterize_path() that
it can just punt if it feels like it.  However, the only case that is
known broken right now is where the child is itself an appendrel so that
all its paths are AppendPaths.  (I think possibly I disregarded that in
the original coding on the theory that nested appendrels would get folded
together --- but that only happens *after* reparameterize_path(), so it's
not excused from handling a child AppendPath.)  Given that this code's been
like this since 9.3 when LATERAL was introduced, it seems likely we'd have
heard of other cases by now if there were a larger problem.

Per report from Elvis Pranskevichus.  Back-patch to 9.3.

Discussion: https://postgr.es/m/5981018.zdth1YWmNy@hammer.magicstack.net
2018-01-23 16:50:34 -05:00
Alvaro Herrera 95be5ce1bc Remove unnecessary include
autovacuum.c no longer needs dsa.h, since commit 31ae1638ce.
Author: Masahiko Sawada
Discussion: https://postgr.es/m/CAD21AoCWvYyXrvdANSHWWWEWJH5TeAWAkJ_2gqrHhukG+OBo1g@mail.gmail.com
2018-01-23 15:22:13 -03:00
Peter Eisentraut f9bbd46adb pgbench: Remove accidental garbage in test file
Author: Fabien COELHO <coelho@cri.ensmp.fr>
2018-01-23 12:31:01 -05:00
Robert Haas 28e04155f1 Update obsolete sentence in README.parallel.
Since 9.6, heavyweight locking is not an abstract and unhandled
concern of the parallel machinery, but rather something to which
we have a specific approach.
2018-01-23 11:22:47 -05:00
Robert Haas 2badb5afb8 Report an ERROR if a parallel worker fails to start properly.
Commit 28724fd90d fixed things so that
if a background worker fails to start due to fork() failure or because
it is terminated before startup succeeds, BGWH_STOPPED will be
reported.  However, that only helps if the code that uses the
background worker machinery notices the change in status, and the code
in parallel.c did not.

To fix that, do two things.  First, make sure that when a worker
exits, it triggers the leader to read from error queues.  That way, if
a worker which has attached to an error queue exits uncleanly, the
leader is sure to throw some error, either the contents of the
ErrorResponse sent by the worker, or "lost connection to parallel
worker" if it exited without sending one.  To cover the case where
the worker never starts up in the first place or exits before
attaching to the error queue, the ParallelContext now keeps track
of which workers have sent at least one message via the error
queue.  A worker which sends no messages by the time the parallel
operation finishes will be checked to see whether it exited before
attaching to the error queue; if so, a new error message, "parallel
worker failed to initialize", will be reported.  If not, we'll
continue to wait until it either starts up and exits cleanly, starts
up and exits uncleanly, or fails to start, and then take the
appropriate action.

Patch by me, reviewed by Amit Kapila.

Discussion: http://postgr.es/m/CA+TgmoYnBgXgdTu6wk5YPdWhmgabYc9nY_pFLq=tB=FSLYkD8Q@mail.gmail.com
2018-01-23 11:03:03 -05:00
Tom Lane 160a4f62ee In pg_dump, force reconnection after issuing ALTER DATABASE SET command(s).
The folly of not doing this was exposed by the buildfarm: in some cases,
the GUC settings applied through ALTER DATABASE SET may be essential to
interpreting the reloaded data correctly.  Another argument why we can't
really get away with the scheme proposed in commit b3f840120 is that it
cannot work for parallel restore: even if the parent process manages to
hang onto the previous GUC state, worker processes would see the state
post-ALTER-DATABASE.  (Perhaps we could have dodged that bullet by
delaying DATABASE PROPERTIES restoration to the end of the run, but
that does nothing for the data semantics problem.)

This leaves us with no solution for the default_transaction_read_only issue
that commit 4bd371f6f intended to work around, other than "you gotta remove
such settings before dumping/upgrading".  However, in view of the fact that
parallel restore broke that hack years ago and no one has noticed, it's
fair to question how many people care.  I'm unexcited about adding a large
dollop of new complexity to handle that corner case.

This would be a one-liner fix, except it turns out that ReconnectToServer
tries to optimize away "redundant" reconnections.  While that may have been
valuable when coded, a quick survey of current callers shows that there are
no cases where that's actually useful, so just remove that check.  While at
it, remove the function's useless return value.

Discussion: https://postgr.es/m/12453.1516655001@sss.pgh.pa.us
2018-01-23 10:55:16 -05:00
Peter Eisentraut 1c2183403b Extract common bits from OpenSSL implementation
Some things in be-secure-openssl.c and fe-secure-openssl.c were not
actually specific to OpenSSL but could also be used by other
implementations.  In order to avoid copy-and-pasting, move some of that
code to common files.
2018-01-23 07:11:39 -05:00
Peter Eisentraut f966101d19 Move SSL API comments to header files
Move the documentation of the SSL API calls are supposed to do into the
headers files, instead of keeping them in the files for the OpenSSL
implementation.  That way, they don't have to be duplicated or be
inconsistent when other implementations are added.
2018-01-23 07:11:39 -05:00
Peter Eisentraut 573bd08b99 Move EDH support to common files
The EDH support is not really specific to the OpenSSL implementation, so
move the support and documentation comments to common files.
2018-01-23 07:11:38 -05:00
Peter Eisentraut 7404e77cc1 Split out documentation of SSL parameters into their own section
Split the "Authentication and Security" section into two separate
sections "Authentication" and "SSL".  The latter part has gotten much
longer over time, and doesn't primarily have to do with authentication.

Also, the row_security parameter was inconsistently categorized, so
clean that up while we're here.
2018-01-23 07:11:38 -05:00
Peter Eisentraut f5da5683a8 Add installcheck support to more test suites
Several of the test suites under src/test/ were missing an installcheck
target.
2018-01-23 07:11:38 -05:00
Tom Lane b3f8401205 Move handling of database properties from pg_dumpall into pg_dump.
This patch rearranges the division of labor between pg_dump and pg_dumpall
so that pg_dump itself handles all properties attached to a single
database.  Notably, a database's ACL (GRANT/REVOKE status) and local GUC
settings established by ALTER DATABASE SET and ALTER ROLE IN DATABASE SET
can be dumped and restored by pg_dump.  This is a long-requested
improvement.

"pg_dumpall -g" will now produce only role- and tablespace-related output,
nothing about individual databases.  The total output of a regular
pg_dumpall run remains the same.

pg_dump (or pg_restore) will restore database-level properties only when
creating the target database with --create.  This applies not only to
ACLs and GUCs but to the other database properties it already handled,
that is database comments and security labels.  This is more consistent
and useful, but does represent an incompatibility in the behavior seen
without --create.

(This change makes the proposed patch to have pg_dump use "COMMENT ON
DATABASE CURRENT_DATABASE" unnecessary, since there is no case where
the command is issued that we won't know the true name of the database.
We might still want that patch as a feature in its own right, but pg_dump
no longer needs it.)

pg_dumpall with --clean will now drop and recreate the "postgres" and
"template1" databases in the target cluster, allowing their locale and
encoding settings to be changed if necessary, and providing a cleaner
way to set nondefault tablespaces for them than we had before.  This
means that such a script must now always be started in the "postgres"
database; the order of drops and reconnects will not work otherwise.
Without --clean, the script will not adjust any database-level properties
of those two databases (including their comments, ACLs, and security
labels, which it formerly would try to set).

Another minor incompatibility is that the CREATE DATABASE commands in a
pg_dumpall script will now always specify locale and encoding settings.
Formerly those would be omitted if they matched the cluster's default.
While that behavior had some usefulness in some migration scenarios,
it also posed a significant hazard of unwanted locale/encoding changes.
To migrate to another locale/encoding, it's now necessary to use pg_dump
without --create to restore into a database with the desired settings.

Commit 4bd371f6f's hack to emit "SET default_transaction_read_only = off"
is gone: we now dodge that problem by the expedient of not issuing ALTER
DATABASE SET commands until after reconnecting to the target database.
Therefore, such settings won't apply during the restore session.

In passing, improve some shaky grammar in the docs, and add a note pointing
out that pg_dumpall's output can't be expected to load without any errors.
(Someday we might want to fix that, but this is not that patch.)

Haribabu Kommi, reviewed at various times by Andreas Karlsson,
Vaishnavi Prabakaran, and Robert Haas; further hacking by me.

Discussion: https://postgr.es/m/CAJrrPGcUurV0eWTeXODwsOYFN=Ekq36t1s0YnFYUNzsmRfdAyA@mail.gmail.com
2018-01-22 14:09:09 -05:00
Tom Lane d6c84667d1 Reorder code in pg_dump to dump comments etc in a uniform order.
Most of the code in pg_dump dumps an object's comment, security label,
and ACL auxiliary TOC entries, in that order, immediately after the
object's main TOC entry, and at least dumpComment's API spec says this
isn't optional.  dumpDatabase was significantly violating that when
in binary-upgrade mode, by inserting totally unrelated stuff between.
Also, dumpForeignDataWrapper and dumpForeignServer were being randomly
inconsistent.  Reorder code so everybody does it the same.

This may be future-proofing us against some code growing a requirement for
such auxiliary entries to be adjacent to their main entry.  But for now
it's just neatnik-ism, so I see no need for back-patch.

Discussion: https://postgr.es/m/21714.1516553459@sss.pgh.pa.us
2018-01-22 12:37:11 -05:00
Peter Eisentraut f498704346 PL/Python: Fix tests for older Python versions
Commit 8561e4840c neglected to handle
older Python versions that don't support the "with" statement.  So write
the tests in a way that older versions can handle as well.
2018-01-22 12:09:52 -05:00
Tom Lane 2b792ab094 Make pg_dump's ACL, sec label, and comment entries reliably identifiable.
_tocEntryRequired() expects that it can identify ACL, SECURITY LABEL,
and COMMENT TOC entries that are for large objects by seeing whether
the tag for them starts with "LARGE OBJECT ".  While that works fine
for actual large objects, which are indeed tagged that way, it's
subject to false positives unless every such entry's tag starts with an
appropriate type ID.  And in fact it does not work for ACLs, because
up to now we customarily tagged those entries with just the bare name
of the object.  This means that an ACL for an object named
"LARGE OBJECT something" would be misclassified as data not schema,
with undesirable results in a schema-only or data-only dump ---
although pg_upgrade seems unaffected, due to the special case for
binary-upgrade mode further down in _tocEntryRequired().

We can fix this by changing all the dumpACL calls to use the label
strings already in use for comments and security labels, which do
follow the convention of starting with an object type indicator.

Well, mostly they follow it.  dumpDatabase() got it wrong, using
just the bare database name for those purposes, so that a database
named "LARGE OBJECT something" would similarly be subject to having
its comment or security label dropped or included when not wanted.
Bring that into line too.  (Note that up to now, database ACLs have
not been processed by pg_dump, so that this issue doesn't affect them.)

_tocEntryRequired() itself is not free of fault: it was overly liberal
about matching object tags to "LARGE OBJECT " in binary-upgrade mode.
This looks like it is probably harmless because there would be no data
component to strip anyway in that mode, but at best it's trouble
waiting to happen, so tighten that up too.

The possible misclassification of SECURITY LABEL entries for databases is
in principle a security problem, but the opportunities for actual exploits
seem too narrow to be interesting.  The other cases seem like just bugs,
since an object owner can change its ACL or comment for himself, he needn't
try to trick someone else into doing it by choosing a strange name.

This has been broken since per-large-object TOC entries were introduced
in 9.0, so back-patch to all supported branches.

Discussion: https://postgr.es/m/21714.1516553459@sss.pgh.pa.us
2018-01-22 12:06:18 -05:00
Peter Eisentraut 8561e4840c Transaction control in PL procedures
In each of the supplied procedural languages (PL/pgSQL, PL/Perl,
PL/Python, PL/Tcl), add language-specific commit and rollback
functions/commands to control transactions in procedures in that
language.  Add similar underlying functions to SPI.  Some additional
cleanup so that transaction commit or abort doesn't blow away data
structures still used by the procedure call.  Add execution context
tracking to CALL and DO statements so that transaction control commands
can only be issued in top-level procedure and block calls, not function
calls or other procedure or block calls.

- SPI

Add a new function SPI_connect_ext() that is like SPI_connect() but
allows passing option flags.  The only option flag right now is
SPI_OPT_NONATOMIC.  A nonatomic SPI connection can execute transaction
control commands, otherwise it's not allowed.  This is meant to be
passed down from CALL and DO statements which themselves know in which
context they are called.  A nonatomic SPI connection uses different
memory management.  A normal SPI connection allocates its memory in
TopTransactionContext.  For nonatomic connections we use PortalContext
instead.  As the comment in SPI_connect_ext() (previously SPI_connect())
indicates, one could potentially use PortalContext in all cases, but it
seems safest to leave the existing uses alone, because this stuff is
complicated enough already.

SPI also gets new functions SPI_start_transaction(), SPI_commit(), and
SPI_rollback(), which can be used by PLs to implement their transaction
control logic.

- portalmem.c

Some adjustments were made in the code that cleans up portals at
transaction abort.  The portal code could already handle a command
*committing* a transaction and continuing (e.g., VACUUM), but it was not
quite prepared for a command *aborting* a transaction and continuing.

In AtAbort_Portals(), remove the code that marks an active portal as
failed.  As the comment there already predicted, this doesn't work if
the running command wants to keep running after transaction abort.  And
it's actually not necessary, because pquery.c is careful to run all
portal code in a PG_TRY block and explicitly runs MarkPortalFailed() if
there is an exception.  So the code in AtAbort_Portals() is never used
anyway.

In AtAbort_Portals() and AtCleanup_Portals(), we need to be careful not
to clean up active portals too much.  This mirrors similar code in
PreCommit_Portals().

- PL/Perl

Gets new functions spi_commit() and spi_rollback()

- PL/pgSQL

Gets new commands COMMIT and ROLLBACK.

Update the PL/SQL porting example in the documentation to reflect that
transactions are now possible in procedures.

- PL/Python

Gets new functions plpy.commit and plpy.rollback.

- PL/Tcl

Gets new commands commit and rollback.

Reviewed-by: Andrew Dunstan <andrew.dunstan@2ndquadrant.com>
2018-01-22 08:43:06 -05:00
Magnus Hagander 1cc4f536ef Support huge pages on Windows
Add support for huge pages (called large pages on Windows) to the
Windows build.

This (probably) breaks compatibility with Windows versions prior to
Windows 2003 or Windows Vista.

Authors: Takayuki Tsunakawa and Thomas Munro
Reviewed by: Magnus Hagander, Amit Kapila
2018-01-21 15:40:46 +01:00
Magnus Hagander 5c15a54e85 Fix wording of "hostaddrs"
The field is still called "hostaddr", so make sure references use
"hostaddr values" instead.

Author: Michael Paquier <michael.paquier@gmail.com>
2018-01-21 13:41:52 +01:00
Peter Eisentraut 918e02a221 Improve type conversion of SPI_processed in Python
The previous code converted SPI_processed to a Python float if it didn't
fit into a Python int.  But Python longs have unlimited precision, so
use that instead in all cases.

As in eee50a8d4c, we use the Python
LongLong API unconditionally for simplicity.

Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
2018-01-20 08:02:01 -05:00
Tom Lane 96102a32a3 Suppress possibly-uninitialized-variable warnings.
Apparently, Peter's compiler has faith that the switch test values here
could never not be valid values of their enums.  Mine does not, and
I tend to agree with it.
2018-01-19 22:16:25 -05:00
Peter Eisentraut eee50a8d4c PL/Python: Simplify PLyLong_FromInt64
We don't actually need two code paths, one for 32 bits and one for 64
bits.  Since the existing code already assumed that "long long" is
available, we can just use PyLong_FromLongLong() for 64 bits as well.
In Python 2.5 and later, PyLong_FromLong() and PyLong_FromLongLong() use
the same code, so there will be no difference for 64-bit platforms.  In
Python 2.4, the code is different, but performance testing showed no
noticeable difference in PL/Python, and that Python version is ancient
anyway.

Discussion: https://www.postgresql.org/message-id/0a02203c-e157-55b2-464e-6087066a1849@2ndquadrant.com
2018-01-19 17:22:38 -05:00
Robert Haas 2f17844104 Allow UPDATE to move rows between partitions.
When an UPDATE causes a row to no longer match the partition
constraint, try to move it to a different partition where it does
match the partition constraint.  In essence, the UPDATE is split into
a DELETE from the old partition and an INSERT into the new one.  This
can lead to surprising behavior in concurrency scenarios because
EvalPlanQual rechecks won't work as they normally did; the known
problems are documented.  (There is a pending patch to improve the
situation further, but it needs more review.)

Amit Khandekar, reviewed and tested by Amit Langote, David Rowley,
Rajkumar Raghuwanshi, Dilip Kumar, Amul Sul, Thomas Munro, Álvaro
Herrera, Amit Kapila, and me.  A few final revisions by me.

Discussion: http://postgr.es/m/CAJ3gD9do9o2ccQ7j7+tSgiE1REY65XRiMb=yJO3u3QhyP8EEPQ@mail.gmail.com
2018-01-19 15:33:06 -05:00
Alvaro Herrera 7f17fd6fc7 Fix CompareIndexInfo's attnum comparisons
When an index column is an expression, it makes no sense to compare its
attribute numbers.

This seems to account for remaining buildfarm fallout from 8b08f7d482.
At least, it solves the issue in my local 32bit VM -- let's see what the
rest thinks.
2018-01-19 16:56:42 -03:00
Peter Eisentraut 8b9e9644dc Replace AclObjectKind with ObjectType
AclObjectKind was basically just another enumeration for object types,
and we already have a preferred one for that.  It's only used in
aclcheck_error.  By using ObjectType instead, we can also give some more
precise error messages, for example "index" instead of "relation".

Reviewed-by: Michael Paquier <michael.paquier@gmail.com>
2018-01-19 14:01:15 -05:00
Peter Eisentraut 2c6f37ed62 Replace GrantObjectType with ObjectType
There used to be a lot of different *Type and *Kind symbol groups to
address objects within different commands, most of which have been
replaced by ObjectType, starting with
b256f24264.  But this conversion was never
done for the ACL commands until now.

This change ends up being just a plain replacement of the types and
symbols, without any code restructuring needed, except deleting some now
redundant code.

Reviewed-by: Michael Paquier <michael.paquier@gmail.com>
Reviewed-by: Stephen Frost <sfrost@snowman.net>
2018-01-19 14:01:14 -05:00
Alvaro Herrera 42b5856038 Fix pg_dump version comparison
I missed a '0' in the version number string ...

Per buildfarm member crake.
2018-01-19 13:24:54 -03:00
Alvaro Herrera 189d0ff588 Fix regression tests for better stability
Per buildfarm
2018-01-19 12:31:34 -03:00
Alvaro Herrera 8b08f7d482 Local partitioned indexes
When CREATE INDEX is run on a partitioned table, create catalog entries
for an index on the partitioned table (which is just a placeholder since
the table proper has no data of its own), and recurse to create actual
indexes on the existing partitions; create them in future partitions
also.

As a convenience gadget, if the new index definition matches some
existing index in partitions, these are picked up and used instead of
creating new ones.  Whichever way these indexes come about, they become
attached to the index on the parent table and are dropped alongside it,
and cannot be dropped on isolation unless they are detached first.

To support pg_dump'ing these indexes, add commands
    CREATE INDEX ON ONLY <table>
(which creates the index on the parent partitioned table, without
recursing) and
    ALTER INDEX ATTACH PARTITION
(which is used after the indexes have been created individually on each
partition, to attach them to the parent index).  These reconstruct prior
database state exactly.

Reviewed-by: (in alphabetical order) Peter Eisentraut, Robert Haas, Amit
	Langote, Jesper Pedersen, Simon Riggs, David Rowley
Discussion: https://postgr.es/m/20171113170646.gzweigyrgg6pwsg4@alvherre.pgsql
2018-01-19 11:49:22 -03:00
Alvaro Herrera 1ef61ddce9 Fix StoreCatalogInheritance1 to use 32bit inhseqno
For no apparent reason, this function was using a 16bit-wide inhseqno
value, rather than the correct 32 bit width which is what is stored in
the pg_inherits catalog.  This becomes evident if you try to create a
table with more than 65535 parents, because this error appears:

ERROR:  duplicate key value violates unique constraint «pg_inherits_relid_seqno_index»
DETAIL:  Key (inhrelid, inhseqno)=(329371, 0) already exists.

Needless to say, having so many parents is an uncommon situations, which
explains why this error has never been reported despite being having
been introduced with the Postgres95 1.01 sources in commit d31084e9d111:
https://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/backend/commands/creatinh.c;hb=d31084e9d111#l349

Backpatch all the way back.

David Rowley noticed this while reviewing a patch of mine.
Discussion: https://postgr.es/m/CAKJS1f8Dn7swSEhOWwzZzssW7747YB=2Hi+T7uGud40dur69-g@mail.gmail.com
2018-01-19 10:17:54 -03:00
Robert Haas 29d58fd3ad Transfer state pertaining to pending REINDEX operations to workers.
This will allow the pending patch for parallel CREATE INDEX to work
on system catalogs, and to provide the same level of protection
against use of user indexes while they are being rebuilt that we
have for non-parallel CREATE INDEX.

Patch by me, reviewed by Peter Geoghegan.

Discussion: http://postgr.es/m/CA+TgmoYN-YQU9JsGQcqFLovZ-C+Xgp1_xhJQad=cunGG-_p5gg@mail.gmail.com
Discussion: http://postgr.es/m/CAH2-Wzkv4UNkXYhqQRqk-u9rS7h5c-4cCW+EqQ8K_WSeS43aZg@mail.gmail.com
2018-01-19 07:48:54 -05:00
Simon Riggs 4e54dd2e0a Fix typo in recent commit
Typo in 9c7d06d606

Reported-by: Masahiko Sawada
2018-01-19 06:36:17 +00:00
Peter Eisentraut a228e44ce4 Update comment
The "callback" that this comment was referring to was removed by commit
c0a15e07cd, so update to match the current
code.
2018-01-18 19:36:34 -05:00
Peter Eisentraut 958c7ae0b7 Fix typo and improve punctuation 2018-01-18 13:00:49 -05:00
Peter Eisentraut 77216cae47 Add tests for session_replication_role
This was hardly tested at all.  The trigger case was lightly tested by
the logical replication tests, but rules and event triggers were not
tested at all.
2018-01-18 11:24:07 -05:00
Bruce Momjian f033462d8f Reorder C includes
Reorder header files in joinrels.c and pathnode.c in alphabetical order,
removing unnecessary ones.

Author: Etsuro Fujita
2018-01-17 18:10:05 -05:00
Tom Lane dca48d145e Remove useless lookup of root partitioned rel in ExecInitModifyTable().
node->partitioned_rels is only set in UPDATE/DELETE cases, but
ExecInitModifyTable only uses its "rel" variable in INSERT cases,
so the extra logic to find the root rel is just a waste of complexity
and cycles.

Etsuro Fujita, reviewed by Amit Langote

Discussion: https://postgr.es/m/93cf9816-2f7d-0f67-8ed2-4a4e497a6ab8@lab.ntt.co.jp
2018-01-17 14:44:15 -05:00
Simon Riggs 9c7d06d606 Ability to advance replication slots
Ability to advance both physical and logical replication slots using a
new user function pg_replication_slot_advance().

For logical advance that means records are consumed as fast as possible
and changes are not given to output plugin for sending. Makes 2nd phase
(after we reached SNAPBUILD_FULL_SNAPSHOT) of replication slot creation
faster, especially when there are big transactions as the reorder buffer
does not have to deal with data changes and does not have to spill to
disk.

Author: Petr Jelinek
Reviewed-by: Simon Riggs
2018-01-17 11:38:34 +00:00
Andrew Dunstan 585e166e46 Fix compiler warnings due to commit cc4feded 2018-01-17 03:33:02 -05:00
Andrew Dunstan cc4feded0a Centralize json and jsonb handling of datetime types
The creates a single function JsonEncodeDateTime which will format these
data types in an efficient and consistent manner. This will be all the
more important when we come to jsonpath so we don't have to implement yet
more code doing the same thing in two more places.

This also extends the code to handle time and timetz types which were
not previously handled specially. This requires exposing the time2tm and
timetz2tm functions.

Patch from Nikita Glukhov
2018-01-16 19:07:13 -05:00
Peter Eisentraut d91da5eced Remove useless use of bit-masking macros
In this case, the macros SET_8_BYTES(), GET_8_BYTES(), SET_4_BYTES(),
GET_4_BYTES() are no-ops, so we can just remove them.

The plan is to perhaps remove them from the source code altogether, so
we'll start here.

Discussion: https://www.postgresql.org/message-id/5d51721a-69ef-2053-9172-599b539f0628@2ndquadrant.com
2018-01-16 17:12:16 -05:00
Michael Meskes 649aeb123f Cope with indicator arrays that do not have the correct length.
Patch by: "Rader, David" <davidr@openscg.com>
2018-01-13 14:57:49 +01:00
Tom Lane 680d540502 Avoid unnecessary failure in SELECT concurrent with ALTER NO INHERIT.
If a query against an inheritance tree runs concurrently with an ALTER
TABLE that's disinheriting one of the tree members, it's possible to get
a "could not find inherited attribute" error because after obtaining lock
on the removed member, make_inh_translation_list sees that its columns
have attinhcount=0 and decides they aren't the columns it's looking for.

An ideal fix, perhaps, would avoid including such a just-removed member
table in the query at all; but there seems no way to accomplish that
without adding expensive catalog rechecks or creating a likelihood of
deadlocks.  Instead, let's just drop the check on attinhcount.  In this
way, a query that's included a just-disinherited child will still
succeed, which is not a completely unreasonable behavior.

This problem has existed for a long time, so back-patch to all supported
branches.  Also add an isolation test verifying related behaviors.

Patch by me; the new isolation test is based on Kyotaro Horiguchi's work.

Discussion: https://postgr.es/m/20170626.174612.23936762.horiguchi.kyotaro@lab.ntt.co.jp
2018-01-12 15:46:37 -05:00
Tom Lane 90947674fc Fix incorrect handling of subquery pullup in the presence of grouping sets.
If we flatten a subquery whose target list contains constants or
expressions, when those output columns are used in GROUPING SET columns,
the planner was capable of doing the wrong thing by merging a pulled-up
expression into the surrounding expression during const-simplification.
Then the late processing that attempts to match subexpressions to grouping
sets would fail to match those subexpressions to grouping sets, with the
effect that they'd not go to null when expected.

To fix, wrap such subquery outputs in PlaceHolderVars, ensuring that
they preserve their separate identity throughout the planner's expression
processing.  This is a bit of a band-aid, because the wrapper defeats
const-simplification even in places where it would be safe to allow.
But a nicer fix would likely be too invasive to back-patch, and the
consequences of the missed optimizations probably aren't large in most
cases.

Back-patch to 9.5 where grouping sets were introduced.

Heikki Linnakangas, with small mods and better test cases by me;
additional review by Andrew Gierth

Discussion: https://postgr.es/m/7dbdcf5c-b5a6-ef89-4958-da212fe10176@iki.fi
2018-01-12 12:24:50 -05:00
Michael Meskes ca4587f3f9 Fix parsing of compatibility mode argument.
Patch by Ashutosh Sharma <ashu.coek88@gmail.com>
2018-01-12 16:00:43 +01:00
Alvaro Herrera 49c784ece7 Remove hard-coded schema knowledge about pg_attribute from genbki.pl
Add the ability to label a column's default value in the catalog header,
and implement this for pg_attribute.  A new function in Catalog.pm is
used to fill in a tuple with defaults.  The build process will complain
loudly if a catalog entry is incomplete,

Commit 8137f2c323 labeled variable length columns for the C preprocessor.
Expose that label to genbki.pl so we can exclude those columns from schema
macros in a general fashion. Also, format schema macro entries according
to their types.

This means slightly less code maintenance, but more importantly it's a
proving ground for mechanisms intended to be used in later commits.

While at it, I (Álvaro) couldn't resist making some changes in
genbki.pl: rename some functions to actually indicate their purpose
instead of actively misleading onlookers; and don't iterate on the whole
of pg_type to find the entry for each catalog row, using a hash instead
of an array.

Author: John Naylor, some changes by Álvaro Herrera
Discussion: https://postgr.es/m/CAJVSVGVJHwD8sfDfZW9TbCHWKf=C1YDRM-rF=2JenRU_y+VcFg@mail.gmail.com
2018-01-12 11:21:42 -03:00
Bruce Momjian bdb70c12b3 C comment: fix "the the" mentions in C comments
Reported-by: Christoph Dreis

Discussion: https://postgr.es/m/007e01d3519e$2734ca10$759e5e30$@freenet.de

Author: Christoph Dreis
2018-01-11 21:50:21 -05:00
Peter Eisentraut bbd3363e12 Refactor subscription tests to use PostgresNode's wait_for_catchup
This was nearly the same code.  Extend wait_for_catchup to allow waiting
for pg_current_wal_lsn() and use that in the subscription tests.  Also
change one use in the pg_rewind tests to use this.

Also remove some broken code in wait_for_catchup and
wait_for_slot_catchup.  The error message in case the waiting failed
wanted to show the current LSN, but the way it was written never
worked.  So since nobody ever cared, just remove it.

Reviewed-by: Michael Paquier <michael.paquier@gmail.com>
2018-01-11 13:35:38 -05:00
Tom Lane 4d41b2e092 Add QueryEnvironment to ExplainOneQuery_hook's parameter list.
This should have been done in commit 18ce3a4ab, which added that parameter
to ExplainOneQuery, but it was overlooked.  This makes it impossible for
a user of the hook to pass the queryEnv down to ExplainOnePlan.

It's too late to change this API in v10, I suppose, but fortunately
passing NULL to ExplainOnePlan will work in nearly all interesting
cases in v10.  That might not be true forever, so we'd better fix it.

Tatsuro Yamada, reviewed by Thomas Munro

Discussion: https://postgr.es/m/890e8dd9-c1c7-a422-6892-874f5eaee048@lab.ntt.co.jp
2018-01-11 12:16:18 -05:00
Peter Eisentraut 9e945f8626 Fix Latin spelling
"c.f." should be "cf.".
2018-01-11 08:32:01 -05:00
Peter Eisentraut 70d6226e4f Use portal pinning in PL/Perl and PL/Python
PL/pgSQL "pins" internally generated portals so that user code cannot
close them by guessing their names.  Add this functionality to PL/Perl
and PL/Python as well, preventing users from manually closing cursors
created by spi_query and plpy.cursor, respectively.  (PL/Tcl does not
currently offer any cursor functionality.)
2018-01-10 17:10:32 -05:00
Peter Eisentraut 5115854170 Add tests for PL/pgSQL returning unnamed portals as refcursor
Existing tests only covered returning explicitly named portals as
refcursor.  The unnamed cursor case was recently broken without a test
failing.
2018-01-10 16:39:13 -05:00
Peter Eisentraut b48b2f8793 Revert "Move portal pinning from PL/pgSQL to SPI"
This reverts commit b3617cdfbb.

This broke returning unnamed cursors from PL/pgSQL functions.
Apparently, there are no test cases for this.
2018-01-10 16:01:17 -05:00
Tom Lane 3afd75eaac Remove dubious micro-optimization in ckpt_buforder_comparator().
It seems incorrect to assume that the list of CkptSortItems can never
contain duplicate page numbers: concurrent activity could result in some
page getting dropped from a low-numbered buffer and later loaded into a
high-numbered buffer while BufferSync is scanning the buffer pool.
If that happened, the comparator would give self-inconsistent results,
potentially confusing qsort().  Saving one comparison step is not worth
possibly getting the sort wrong.

So far as I can tell, nothing would actually go wrong given our current
implementation of qsort().  It might get a bit slower than expected
if there were a large number of duplicates of one value, but that's
surely a probability-epsilon case.  Still, the comment is wrong,
and if we ever switched to another sort implementation it might be
less forgiving.

In passing, avoid casting away const-ness of the argument pointers;
I've not seen any compiler complaints from that, but it seems likely
that some compilers would not like it.

Back-patch to 9.6 where this code came in, just in case I've underestimated
the possible consequences.

Discussion: https://postgr.es/m/18437.1515607610@sss.pgh.pa.us
2018-01-10 15:50:54 -05:00
Robert Haas 2fd58096f0 Add missing "return" statement to accumulate_append_subpath.
Without this, Parallel Append can end up with extra children.

Report by Rajkumar Raghuwanshi.  Fix by Amit Khandekar.  Brown
paper bag bug by me.

Discussion: http://postgr.es/m/CAKcux6mBF-NiddyEe9LwymoUC5+wh8bQJ=uk2gGkOE+L8cv=LA@mail.gmail.com
2018-01-10 11:21:20 -05:00
Peter Eisentraut b3617cdfbb Move portal pinning from PL/pgSQL to SPI
PL/pgSQL "pins" internally generated (unnamed) portals so that user code
cannot close them by guessing their names.  This logic is also useful in
other languages and really for any code.  So move that logic into SPI.
An unnamed portal obtained through SPI_cursor_open() and related
functions is now automatically pinned, and SPI_cursor_close()
automatically unpins a portal that is pinned.

In the core distribution, this affects PL/Perl and PL/Python, preventing
users from manually closing cursors created by spi_query and
plpy.cursor, respectively.  (PL/Tcl does not currently offer any cursor
functionality.)

Reviewed-by: Andrew Dunstan <andrew.dunstan@2ndquadrant.com>
2018-01-10 10:20:51 -05:00
Peter Eisentraut acc67ffd0a Give more accurate error message for dropping pinned portal
The previous code gave the same error message for attempting to drop
pinned and active portals, but those are separate states, so give
separate error messages.
2018-01-10 09:22:07 -05:00
Teodor Sigaev d16c2de624 Fix allowing of leading zero on exponents in pgbench test results
Commit bc7fa0c15c accidentally lost fixes of
0aa1d489ea commit.

Thanks to Thomas Munro
2018-01-10 11:33:37 +03:00
Bruce Momjian fccaea4549 Remove outdated/removed Win32 URLs in C comments
Reported-by: Ashutosh Sharma
2018-01-09 18:33:21 -05:00
Andres Freund 69c3936a14 Expression evaluation based aggregate transition invocation.
Previously aggregate transition and combination functions were invoked
by special case code in nodeAgg.c, evaluating input and filters
separately using the expression evaluation machinery. That turns out
to not be great for performance for several reasons:

- repeated expression evaluations have some cost
- the transition functions invocations are poorly predicted, as
  commonly there are multiple aggregates in a query, resulting in the
  same call-stack invoking different functions.
- filter and input computation had to be done separately
- the special case code made it hard to implement JITing of the whole
  transition function invocation

Address this by building one large expression that computes input,
evaluates filters, and invokes transition functions.

This leads to moderate speedups in queries bottlenecked by aggregate
computations, and enables large speedups for similar cases once JITing
is done.

There's potential for further improvement:
- It'd be nice if we could simplify the somewhat expensive
  aggstate->all_pergroups lookups.
- right now there's still an advance_transition_function invocation in
  nodeAgg.c, leading to some code duplication.

Author: Andres Freund
Discussion: https://postgr.es/m/20170901064131.tazjxwus3k2w3ybh@alap3.anarazel.de
2018-01-09 13:25:38 -08:00
Alvaro Herrera 272c2ab9fd Change some bogus PageGetLSN calls to BufferGetLSNAtomic
As src/backend/access/transam/README says, PageGetLSN may only be called
by processes holding either exclusive lock on buffer, or a shared lock
on buffer plus buffer header lock.  Therefore any place that only holds
a shared buffer lock must use BufferGetLSNAtomic instead of PageGetLSN,
which internally obtains buffer header lock prior to reading the LSN.

A few callsites failed to comply with this rule.  This was detected by
running all tests under a new (not committed) assertion that verifies
PageGetLSN locking contract.  All but one of the callsites that failed
the assertion are fixed by this patch.  Remaining callsites were
inspected manually and determined not to need any change.

The exception (unfixed callsite) is in TestForOldSnapshot, which only
has a Page argument, making it impossible to access the corresponding
Buffer from it.  Fixing that seems a much larger patch that will have to
be done separately; and that's just as well, since it was only
introduced in 9.6 and other bugs are much older.

Some of these bugs are ancient; backpatch all the way back to 9.3.

Authors: Jacob Champion, Asim Praveen, Ashwin Agrawal
Reviewed-by: Michaël Paquier
Discussion: https://postgr.es/m/CABAq_6GXgQDVu3u12mK9O5Xt5abBZWQ0V40LZCE+oUf95XyNFg@mail.gmail.com
2018-01-09 17:06:31 -03:00
Andrew Dunstan 11b623dd0a Implement TZH and TZM timestamp format patterns
These are compatible with Oracle and required for the datetime template
language for jsonpath in an upcoming patch.

Nikita Glukhov and Andrew Dunstan, reviewed by Pavel Stehule.
2018-01-09 14:25:05 -05:00
Peter Eisentraut a77dd53f30 Remove PortalGetQueryDesc()
After having gotten rid of PortalGetHeapMemory(), there seems little
reason to keep one Portal access macro around that offers no actual
abstraction and isn't consistently used anyway.

Reviewed-by: Andrew Dunstan <andrew.dunstan@2ndquadrant.com>
Reviewed-by: Alvaro Herrera <alvherre@alvh.no-ip.org>
2018-01-09 13:47:56 -05:00
Peter Eisentraut 0f7c49e855 Update portal-related memory context names and API
Rename PortalMemory to TopPortalContext, to avoid confusion with
PortalContext and align naming with similar top-level memory contexts.

Rename PortalData's "heap" field to portalContext.  The "heap" naming
seems quite antiquated and confusing.  Also get rid of the
PortalGetHeapMemory() macro and access the field directly, which we do
for other portal fields, so this abstraction doesn't buy anything.

Reviewed-by: Andrew Dunstan <andrew.dunstan@2ndquadrant.com>
Reviewed-by: Alvaro Herrera <alvherre@alvh.no-ip.org>
2018-01-09 13:47:56 -05:00
Tom Lane 3cb1b2a880 Rewrite list_qsort() to avoid trashing its input list.
The initial implementation of list_qsort(), from commit ab7271677,
re-used the ListCells of the input list while not touching the List
header.  This meant that anybody who still had a pointer to the
original header would now be in possession of a corrupted list,
a problem that seems sure to bite us eventually.

One possible solution is to re-use the original List header as well,
giving the function the semantics of update-in-place.  However, that
doesn't seem like a very good idea either given the way that the
function is used in the planner: create_path functions aren't normally
supposed to modify their input lists.  It doesn't look like there would
be a problem today, but it's not hard to foresee a time when modifying
a list of Paths in-place could have side-effects on some other append
path.

On the whole, and in view of the likelihood that this function might
be used in other contexts in the future, it seems best to get rid of
the micro-optimization of re-using the input list cells.  Just build
a new list.

Discussion: https://postgr.es/m/16912.1515449066@sss.pgh.pa.us
2018-01-09 13:25:53 -05:00
Tom Lane 624e440a47 Improve the heuristic for ordering child paths of a parallel append.
Commit ab7271677 introduced code that attempts to order the child
scans of a Parallel Append node in a way that will minimize execution
time, based on total cost and startup cost.  However, it failed to
think hard about what to do when estimated costs are exactly equal;
a case that's particularly likely to occur when comparing on startup
cost.  In such a case the ordering of the child paths would be left
to the whims of qsort, an algorithm that isn't even stable.

We can improve matters by applying the rule used elsewhere in the
planner: if total costs are equal, sort on startup cost, and
vice versa.  When both cost estimates are exactly equal, rather
than letting qsort do something unpredictable, sort based on the
child paths' relids, which should typically result in sorting in
inheritance order.  (The latter provision requires inventing a
qsort-style comparator for bitmapsets, but maybe we'll have use
for that for other reasons in future.)

This results in a few plan changes in the select_parallel test,
but those all look more reasonable than before, when the actual
underlying cost numbers are taken into account.

Discussion: https://postgr.es/m/4944.1515446989@sss.pgh.pa.us
2018-01-09 13:07:52 -05:00
Tom Lane 80259d4dbf While waiting for a condition variable, detect postmaster death.
The general assumption for postmaster child processes is that they
should just exit(1), reasonably promptly, if the postmaster disappears.
condition_variable.c neglected this consideration and could be left
waiting forever, if the counterpart process it is waiting for has
done the right thing and exited.

We had some discussion of adjusting the WaitEventSet API to make it
harder to make this type of mistake in future; but for the moment,
and for v10, let's make this narrow fix.

Discussion: https://postgr.es/m/20412.1515456143@sss.pgh.pa.us
2018-01-09 12:34:57 -05:00
Peter Eisentraut c3d41ccf59 Fix ssl tests for when tls-server-end-point is not supported
Add a function to TestLib that allows us to check pg_config.h and then
decide the expected test outcome based on that.

Author: Michael Paquier <michael.paquier@gmail.com>
2018-01-09 12:28:49 -05:00
Tom Lane 8a906204ae Fix race condition during replication origin drop.
replorigin_drop() misunderstood the API for condition variables: it
had ConditionVariablePrepareToSleep and ConditionVariableCancelSleep
inside its test-and-sleep loop, rather than outside the loop as
intended.  The net effect is a narrow race-condition window wherein,
if the process using a replication slot releases it immediately after
replorigin_drop() releases the ReplicationOriginLock, replorigin_drop()
would get into the condition variable's wait list too late and then
wait indefinitely for a signal that won't come.

Because there's a different CV for each replication slot, we can't
just move the ConditionVariablePrepareToSleep call to above the
test-and-sleep loop.  What we can do, in the wake of commit 13db3b936,
is drop the ConditionVariablePrepareToSleep call entirely.  This fix
depends on that commit because (at least in principle) the slot matching
the target replication origin might move around, so that once in a blue
moon successive loop iterations might involve different CVs.  We can now
cope with such a scenario, at the cost of an extra trip through the
retry loop.

(There are ways we could fix this bug without depending on that commit,
but they're all a lot more complicated than this way.)

While at it, upgrade the rather skimpy comments in this function.

Back-patch to v10 where this code came in.

Discussion: https://postgr.es/m/19947.1515455433@sss.pgh.pa.us
2018-01-09 12:09:30 -05:00
Tom Lane 13db3b9363 Allow ConditionVariable[PrepareTo]Sleep to auto-switch between CVs.
The original coding here insisted that callers manually cancel any prepared
sleep for one condition variable before starting a sleep on another one.
While that's not a huge burden today, it seems like a gotcha that will bite
us in future if the use of condition variables increases; anything we can
do to make the use of this API simpler and more robust is attractive.
Hence, allow these functions to automatically switch their attention to
a different CV when required.  This is safe for the same reason it was OK
for commit aced5a92b to let a broadcast operation cancel any prepared CV
sleep: whenever we return to the other test-and-sleep loop, we will
automatically re-prepare that CV, paying at most an extra test of that
loop's exit condition.

Back-patch to v10 where condition variables were introduced.  Ordinarily
we would probably not back-patch a change like this, but since it does not
invalidate any coding pattern that was legal before, it seems safe enough.
Furthermore, there's an open bug in replorigin_drop() for which the
simplest fix requires this.  Even if we chose to fix that in some more
complicated way, the hazard would remain that we might back-patch some
other bug fix that requires this behavior.

Patch by me, reviewed by Thomas Munro.

Discussion: https://postgr.es/m/2437.1515368316@sss.pgh.pa.us
2018-01-09 11:39:10 -05:00
Robert Haas 921059bd66 Don't allow VACUUM VERBOSE ANALYZE VERBOSE.
There are plans to extend the syntax for ANALYZE, so we need to break
the link between VacuumStmt and AnalyzeStmt.  But apart from that, the
syntax above is undocumented and, if discovered by users, might give
the impression that the VERBOSE option for VACUUM differs from the
verbose option from ANALYZE, which it does not.

Nathan Bossart, reviewed by Michael Paquier and Masahiko Sawada

Discussion: http://postgr.es/m/D3FC73E2-9B1A-4DB4-8180-55F57D116B4E@amazon.com
2018-01-09 10:20:48 -05:00
Teodor Sigaev bc7fa0c15c Improve scripting language in pgbench
Added:
 - variable now might contain integer, double, boolean and null values
 - functions ln, exp
 - logical AND/OR/NOT
 - bitwise AND/OR/NOT/XOR
 - bit right/left shift
 - comparison operators
 - IS [NOT] (NULL|TRUE|FALSE)
 - conditional choice (in form of when/case/then)

New operations and functions allow to implement more complicated test scenario.

Author: Fabien Coelho with minor editorization by me
Reviewed-By: Pavel Stehule, Jeevan Ladhe, me
Discussion: https://www.postgresql.org/message-id/flat/alpine.DEB.2.10.1604030742390.31618@sto
2018-01-09 18:02:04 +03:00
Robert Haas 63008b19ee Fix comment.
RELATION_IS_OTHER_TEMP is tested in the caller, not here.

Discussion: http://postgr.es/m/5A5438E4.3090709@lab.ntt.co.jp
2018-01-09 09:40:31 -05:00
Bruce Momjian d25ee30031 pg_upgrade: prevent check on live cluster from generating error
Previously an inaccurate but harmless error was generated when running
--check on a live server before reporting the servers as compatible.
The fix is to split error reporting and exit control in the exec_prog()
API.

Reported-by: Daniel Westermann

Backpatch-through: 10
2018-01-08 22:43:51 -05:00
Tom Lane e35dba475a Cosmetic improvements in condition_variable.[hc].
Clarify a bunch of comments.

Discussion: https://postgr.es/m/CAEepm=0NWKehYw7NDoUSf8juuKOPRnCyY3vuaSvhrEWsOTAa3w@mail.gmail.com
2018-01-08 18:28:03 -05:00
Tom Lane ea8e1bbc53 Improve error detection capability in proclists.
Previously, although the initial state of a proclist_node is expected
to be next == prev == 0, proclist_delete_offset would reset nodes to
next == prev == INVALID_PGPROCNO when removing them from a list.
This is the same state that a node in a singleton list has, so that
it's impossible to distinguish not-in-a-list from in-a-list.  Change
proclist_delete_offset to reset removed nodes to next == prev == 0,
making it possible to distinguish those cases, and then add Asserts
to the list add and delete functions that the supplied node isn't
or is in a list at entry.  Also tighten assertions about the node
being in the particular list (not some other one) where it is possible
to check that in O(1) time.

In ConditionVariablePrepareToSleep, since we don't expect the process's
cvWaitLink to already be in a list, remove the more-or-less-useless
proclist_contains check; we'd rather have proclist_push_tail's new
assertion fire if that happens.

Improve various comments related to proclists, too.

Patch by me, reviewed by Thomas Munro.  This isn't back-patchable, since
there could theoretically be inlined copies of proclist_delete_offset in
third-party modules.  But it's only improving debuggability anyway.

Discussion: https://postgr.es/m/CAEepm=0NWKehYw7NDoUSf8juuKOPRnCyY3vuaSvhrEWsOTAa3w@mail.gmail.com
2018-01-08 18:07:04 -05:00
Tom Lane eeb3c2df42 Back off chattiness in RemovePgTempFiles().
In commit 561885db0, as part of normalizing RemovePgTempFiles's error
handling, I removed its behavior of silently ignoring ENOENT failures
during directory opens.  Thomas Munro points out that this is a bad idea at
the top level, because we don't create pgsql_tmp directories until needed.
Thus this coding could produce LOG messages in perfectly normal situations,
which isn't what I intended.  Restore the suppression of ENOENT logging,
but only at top level --- it would still be unexpected for a nested temp
directory to disappear between seeing it in the parent directory and
opening it.

Discussion: https://postgr.es/m/CAEepm=2y06SehAkTnd5sU_eVqdv5P-=Srt1y5vYNQk6yVDVaPw@mail.gmail.com
2018-01-07 20:40:40 -05:00
Simon Riggs 6271fceb8a Add TIMELINE to backup_label file
Allows new test to confirm timelines match

Author: Michael Paquier
Reviewed-by: David Steele
2018-01-06 12:24:19 +00:00
Simon Riggs 6668a54eb8 Default monitoring roles - errata
25fff40798 introduced
default monitoring roles. Apply these corrections:

* Allow access to pg_stat_get_wal_senders()
  by role pg_read_all_stats

* Correct comment in pg_stat_get_wal_receiver()
  to show it is no longer superuser-only.

Author: Feike Steenbergen
Reviewed-by: Michael Paquier

Apply to HEAD, then later backpatch to 10
2018-01-06 11:48:21 +00:00
Tom Lane ccf312a448 Remove return values of ConditionVariableSignal/Broadcast.
In the wake of commit aced5a92b, the semantics of these results are
a bit squishy: we can tell whether we signaled some other process(es),
but we do not know which ones were real waiters versus mere sentinels
for ConditionVariableBroadcast operations.  It does not help much that
ConditionVariableBroadcast will attempt to pass on the signal to the
next real waiter, because (a) there might not be one, and (b) that will
only happen awhile later, anyway.  So these results could overstate how
much effect the calls really had.

However, no existing caller of either function pays any attention to its
result value, so it seems reasonable to just define that as a required
property of a correct algorithm.  To encourage correctness and save some
tiny number of cycles, change both functions to return void.

Patch by me, per an observation by Thomas Munro.  No back-patch, since
if any third parties happen to be using these functions, they might not
appreciate an API break in a minor release.

Discussion: https://postgr.es/m/CAEepm=0NWKehYw7NDoUSf8juuKOPRnCyY3vuaSvhrEWsOTAa3w@mail.gmail.com
2018-01-05 20:33:26 -05:00
Tom Lane 3cac0ec859 Reorder steps in ConditionVariablePrepareToSleep for more safety.
In the admittedly-very-unlikely case that AddWaitEventToSet fails,
ConditionVariablePrepareToSleep would error out after already having
set cv_sleep_target, which is probably bad, and after having already
set cv_wait_event_set, which is very bad.  Transaction abort might or
might not clean up cv_sleep_target properly; but there is nothing
that would be aware that the WaitEventSet wasn't fully constructed,
so that all future condition variable sleeps would be broken.
We can easily guard against these hazards with slight restructuring.

Back-patch to v10 where condition_variable.c was introduced.

Discussion: https://postgr.es/m/CAEepm=0NWKehYw7NDoUSf8juuKOPRnCyY3vuaSvhrEWsOTAa3w@mail.gmail.com
2018-01-05 19:42:49 -05:00
Tom Lane aced5a92bf Rewrite ConditionVariableBroadcast() to avoid live-lock.
The original implementation of ConditionVariableBroadcast was, per its
self-description, "the dumbest way possible".  Thomas Munro found out
it was a bit too dumb.  An awakened process may immediately re-queue
itself, if the specific condition it's waiting for is not yet satisfied.
If this happens before ConditionVariableBroadcast is able to see the wait
queue as empty, then ConditionVariableBroadcast will re-awaken the same
process, repeating the cycle.  Given unlucky timing this back-and-forth
can repeat indefinitely; loops lasting thousands of seconds have been
seen in testing.

To fix, add our own process to the end of the wait queue to serve as a
sentinel, and exit the broadcast loop once our process is not there
anymore.  There are various special considerations described in the
comments, the principal disadvantage being that wakers can no longer
be sure whether they awakened a real waiter or just a sentinel.  But in
practice nobody pays attention to the result of ConditionVariableSignal
or ConditionVariableBroadcast anyway, so that problem seems hypothetical.

Back-patch to v10 where condition_variable.c was introduced.

Tom Lane and Thomas Munro

Discussion: https://postgr.es/m/CAEepm=0NWKehYw7NDoUSf8juuKOPRnCyY3vuaSvhrEWsOTAa3w@mail.gmail.com
2018-01-05 19:21:30 -05:00
Robert Haas 19c47e7c82 Factor error generation out of ExecPartitionCheck.
At present, we always raise an ERROR if the partition constraint
is violated, but a pending patch for UPDATE tuple routing will
consider instead moving the tuple to the correct partition.
Refactor to make that simpler.

Amit Khandekar, reviewed by Amit Langote, David Rowley, and me.

Discussion: http://postgr.es/m/CAJ3gD9cue54GbEzfV-61nyGpijvjZgCcghvLsB0_nL8Nm8HzCA@mail.gmail.com
2018-01-05 15:22:33 -05:00
Bruce Momjian 84a6f63e32 pg_upgrade: remove C comment
Revert another part of 959ee6d267 .

Backpatch-through: 10
2018-01-05 14:49:36 -05:00