Commit Graph

34295 Commits

Author SHA1 Message Date
Tom Lane 93f03dad82 Make BufFileCreateTemp() ensure that temp tablespaces are set up.
If PrepareTempTablespaces() has never been called in the current
transaction, OpenTemporaryFile() will fall back to using the default
tablespace, which is a bug if the user wanted temp files placed elsewhere.
gistInitBuildBuffers() appears to have this disease already, and it
seems like an easy trap for future coders to fall into.

We discussed other ways to close this gap, but none of them are prettier
or more reliable than just having BufFileCreateTemp do it.  In particular,
having fd.c do this creates layering issues that we could do without.

Per suggestion from Melanie Plageman.  Arguably this is a bug fix, but
nobody seems very excited about back-patching, so change in HEAD only.

Discussion: https://postgr.es/m/CAAKRu_YwzjuGAmmaw4-8XO=OVFGR1QhY_Pq-t3wjb9ribBJb_Q@mail.gmail.com
2019-05-18 13:51:16 -04:00
Tom Lane d307954a7d "A void function may not return a value".
Per buildfarm.
2019-05-18 00:40:39 -04:00
Andres Freund 147e3722f7 tableam: Avoid relying on relation size to determine validity of tids.
Instead add a tableam callback to do so. To avoid adding per
validation overhead, pass a scan to tuple_tid_valid. In heap's case
we'd otherwise incurred a RelationGetNumberOfBlocks() call for each
tid - which'd have added noticable overhead to nodeTidscan.c.

Author: Andres Freund
Reviewed-By: Ashwin Agrawal
Discussion: https://postgr.es/m/20190515185447.gno2jtqxyktylyvs@alap3.anarazel.de
2019-05-17 18:56:55 -07:00
Andres Freund 7f44ede594 tableam: Don't assume that every AM uses md.c style storage.
Previously various parts of the code routed size requests through
RelationGetNumberOfBlocks[InFork]. That works if md.c is used by the
AM, but not otherwise.

Add a tableam callback to return the size of the table. As not every
AM will use postgres' BLCKSZ, have it return bytes, and have
RelationGetNumberOfBlocksInFork() round the byte size up into blocks.

To allow code outside of the AM to determine the actual relation size
map InvalidForkNumber the total size of a relation, as not every AM
might just need the postgres defined forks.

A few users of RelationGetNumberOfBlocks() ought to be converted away
from that. One case, the use of it to determine whether a tid is
valid, will be fixed in a follow up commit. Others will have to wait
for v13.

Author: Andres Freund
Discussion: https://postgr.es/m/20190423225201.3bbv6tbqzkb5w7cw@alap3.anarazel.de
2019-05-17 18:56:47 -07:00
Tom Lane 6630ccad7a Restructure creation of run-time pruning steps.
Previously, gen_partprune_steps() always built executor pruning steps
using all suitable clauses, including those containing PARAM_EXEC
Params.  This meant that the pruning steps were only completely safe
for executor run-time (scan start) pruning.  To prune at executor
startup, we had to ignore the steps involving exec Params.  But this
doesn't really work in general, since there may be logic changes
needed as well --- for example, pruning according to the last operator's
btree strategy is the wrong thing if we're not applying that operator.
The rules embodied in gen_partprune_steps() and its minions are
sufficiently complicated that tracking their incremental effects in
other logic seems quite impractical.

Short of a complete redesign, the only safe fix seems to be to run
gen_partprune_steps() twice, once to create executor startup pruning
steps and then again for run-time pruning steps.  We can save a few
cycles however by noting during the first scan whether we rejected
any clauses because they involved exec Params --- if not, we don't
need to do the second scan.

In support of this, refactor the internal APIs in partprune.c to make
more use of passing information in the GeneratePruningStepsContext
struct, rather than as separate arguments.

This is, I hope, the last piece of our response to a bug report from
Alan Jackson.  Back-patch to v11 where this code came in.

Discussion: https://postgr.es/m/FAD28A83-AC73-489E-A058-2681FA31D648@tvsquared.com
2019-05-17 19:44:34 -04:00
Michael Paquier 6ba500cae6 Fix regression test outputs
75445c1 has caused various failures in tests across the tree after
updating some error messages, so fix the newly-expected output.

Author: Michael Paquier
Reviewed-by: Tom Lane
Discussion: https://postgr.es/m/8332.1558048838@sss.pgh.pa.us
2019-05-17 09:40:02 +09:00
Alvaro Herrera 75445c1515 More message style fixes
Discussion: https://postgr.es/m/20190515183005.GA26486@alvherre.pgsql
2019-05-16 19:14:31 -04:00
Peter Geoghegan 3f58cc6dd8 Remove extra nbtree half-dead internal page check.
It's not safe for nbtree VACUUM to attempt to delete a target page whose
right sibling is already half-dead, since that would fail the
cross-check when VACUUM attempts to re-find a downlink to the right
sibling in the parent page.  Logic to prevent this from happening was
added by commit 8da3183780, which addressed a bug in the overhaul of
page deletion that went into PostgreSQL 9.4 (commit efada2b8e9).
VACUUM was made to check the right sibling page, and back off when it
happened to be half-dead already.

However, it is only truly necessary to do the right sibling check on the
leaf level, since that transitively determines if the deletion target's
parent's right sibling page is itself undergoing deletion.  Remove the
internal page level check, and add a comment explaining why the leaf
level check alone suffices.

The extra check is also unnecessary due to the fact that internal pages
that are marked half-dead are generally considered corrupt.  Commit
efada2b8e9 established the principle that there should never be
half-dead internal pages (internal pages pending deletion are possible,
but that status is never directly represented in the internal page).
VACUUM will complain about corruption when it encounters half-dead
internal pages, so VACUUM is bound to raise an error one way or another
when an nbtree index has a half-dead internal page (contrib/amcheck will
also report that the page is corrupt).

It's possible that a pg_upgrade'd 9.3 database will still have half-dead
internal pages, so it may seem like there is an argument for leaving the
check in place to reliably get a cleaner error message that advises the
user to REINDEX.  However, leaf pages are also deleted in the first
phase of deletion prior to PostgreSQL 9.4, so I believe we won't even
attempt to re-find the parent page anyway (we won't have the fully
deleted leaf page as the right sibling of our target page, so we won't
even try to find a downlink for it).

Discussion: https://postgr.es/m/CAH2-Wzm_ntmqJjWLRyKzimFmFvk+BnVAvUpaA4s1h9Ja58woaQ@mail.gmail.com
2019-05-16 15:11:58 -07:00
Tom Lane 3922f10646 Fix bogus logic for combining range-partitioned columns during pruning.
gen_prune_steps_from_opexps's notion of how to do this was overly
complicated and underly correct.

Per discussion of a report from Alan Jackson (though this fixes only one
aspect of that problem).  Back-patch to v11 where this code came in.

Amit Langote

Discussion: https://postgr.es/m/FAD28A83-AC73-489E-A058-2681FA31D648@tvsquared.com
2019-05-16 16:25:43 -04:00
Tom Lane 4b1fcb43d0 Fix partition pruning to treat stable comparison operators properly.
Cross-type comparison operators in a btree or hash opclass might be
only stable not immutable (this is true of timestamp vs. timestamptz
for example).  partprune.c ignored this possibility and would perform
plan-time pruning with them anyway, possibly leading to wrong answers
if the environment changed between planning and execution.

To fix, teach gen_partprune_steps() to do things differently when
creating plan-time pruning steps vs. run-time pruning steps.
analyze_partkey_exprs() also needs an extra check, which is rather
annoying but now is not the time to restructure things enough to
avoid that.

While at it, simplify the logic for the plan-time case a little
by insisting that the comparison value be a Const and nothing else.
This relies on the assumption that eval_const_expressions will have
reduced any immutable expression to a Const; which is not quite
100% true, but certainly any case that comes up often enough to be
interesting should have simplification logic there.

Also improve a bunch of inadequate/obsolete/wrong comments.

Per discussion of a report from Alan Jackson (though this fixes only one
aspect of that problem).  Back-patch to v11 where this code came in.

David Rowley, with some further hacking by me

Discussion: https://postgr.es/m/FAD28A83-AC73-489E-A058-2681FA31D648@tvsquared.com
2019-05-16 11:58:21 -04:00
Peter Geoghegan 489e431ba5 Remove obsolete nbtree insertion comment.
Remove a Berkeley-era comment above _bt_insertonpg() that admonishes the
reader to grok Lehman and Yao's paper before making any changes.  This
made a certain amount of sense back when _bt_insertonpg() was
responsible for most of the things that are now spread across
_bt_insertonpg(), _bt_findinsertloc(), _bt_insert_parent(), and
_bt_split(), but it doesn't work like that anymore.

I believe that this comment alludes to the need to "couple" or "crab"
buffer locks as we ascend the tree as page splits cascade upwards.  The
nbtree README already explains this in detail, which seems sufficient.
Besides, the changes to page splits made by commit 40dae7ec53 altered
the exact details of how buffer locks are retained during splits; Lehman
and Yao's original algorithm seems to release the lock on the left child
page/buffer slightly earlier than _bt_insertonpg()/_bt_insert_parent()
can.
2019-05-15 16:53:11 -07:00
Tom Lane 8a0f0ad540 Remove no-longer-used typedef.
struct ClonedConstraint is no longer needed, so delete it.

Discussion: https://postgr.es/m/18102.1557947143@sss.pgh.pa.us
2019-05-15 17:26:52 -04:00
Peter Geoghegan 7505da2f45 Reverse order of newitem nbtree candidate splits.
Commit fab25024, which taught nbtree to choose candidate split points
more carefully, had _bt_findsplitloc() record all possible split points
in an initial pass over a page that is about to be split.  The order
that candidate split points were processed and stored in was assumed to
match the offset number order of split points on an imaginary version of
the page that contains the same items as the original, but also fits
newitem (the item that provoked the split precisely because it didn't
fit).

However, the order of split points in the final array was not quite what
was expected: the split point that makes newitem the firstright item
came after the split point that makes newitem the lastleft item -- not
before.  As a result, _bt_findsplitloc() could get confused about the
leftmost and rightmost tuples among all possible split points recorded
for the page.  This seems to have no appreciable impact on the quality
of the final split point chosen by _bt_findsplitloc(), but it's still
wrong.

To fix, switch the order in which newitem candidate splits are recorded
in.  This also makes it possible to describe candidate split points in
terms of which pair of adjoining tuples enclose the split point within
_bt_findsplitloc(), making it clearer why it's generally safe for
_bt_split() to expect lastleft and firstright tuples.
2019-05-15 12:22:07 -07:00
Andres Freund aa4b8c61d2 Handle table_complete_speculative's succeeded argument as documented.
For some reason both callsite and the implementation for heapam had
the meaning inverted (i.e. succeeded == true was passed in case of
conflict). That's confusing.

I (Andres) briefly pondered whether it'd be better to rename
table_complete_speculative's argument to 'bool specConflict' or such,
but decided not to. The 'complete' in the function name for me makes
`succeeded` sound a bit better.

Reported-By: Ashwin Agrawal, Melanie Plageman, Heikki Linnakangas
Discussion:
   https://postgr.es/m/CALfoeitk7-TACwYv3hCw45FNPjkA86RfXg4iQ5kAOPhR+F1Y4w@mail.gmail.com
   https://postgr.es/m/97673451-339f-b21e-a781-998d06b1067c@iki.fi
2019-05-14 12:19:32 -07:00
Andres Freund 08e2edc076 Add isolation test for INSERT ON CONFLICT speculative insertion failure.
This path previously was not reliably covered. There was some
heuristic coverage via insert-conflict-toast.spec, but that test is
not deterministic, and only tested for a somewhat specific bug.

Backpatch, as this is a complicated and otherwise untested code
path. Unfortunately 9.5 cannot handle two waiting sessions, and thus
cannot execute this test.

Triggered by a conversion with Melanie Plageman.

Author: Andres Freund
Discussion: https://postgr.es/m/CAAKRu_a7hbyrk=wveHYhr4LbcRnRCG=yPUVoQYB9YO1CdUBE9Q@mail.gmail.com
Backpatch: 9.5-
2019-05-14 11:51:29 -07:00
Tom Lane 6d2fba3189 Fix "make clean" to clean out junk files left behind after ssl tests.
We .gitignore'd this junk, but we didn't actually remove it.
2019-05-14 14:28:33 -04:00
Tom Lane fc9a62af3f Move logging.h and logging.c from src/fe_utils/ to src/common/.
The original placement of this module in src/fe_utils/ is ill-considered,
because several src/common/ modules have dependencies on it, meaning that
libpgcommon and libpgfeutils now have mutual dependencies.  That makes it
pointless to have distinct libraries at all.  The intended design is that
libpgcommon is lower-level than libpgfeutils, so only dependencies from
the latter to the former are acceptable.

We already have the precedent that fe_memutils and a couple of other
modules in src/common/ are frontend-only, so it's not stretching anything
out of whack to treat logging.c as a frontend-only module in src/common/.
To the extent that such modules help provide a common frontend/backend
environment for the rest of common/ to use, it's a reasonable design.
(logging.c does not yet provide an ereport() emulation, but one can
dream.)

Hence, move these files over, and revert basically all of the build-system
changes made by commit cc8d41511.  There are no places that need to grow
new dependencies on libpgcommon, further reinforcing the idea that this
is the right solution.

Discussion: https://postgr.es/m/a912ffff-f6e4-778a-c86a-cf5c47a12933@2ndquadrant.com
2019-05-14 14:20:10 -04:00
Tom Lane 53ddefbaf8 Remove pg_rewind's private logging.h/logging.c files.
The existence of these files became rather confusing with the
introduction of a widely-known logging.h header in commit cc8d41511.
(Indeed, there's already some duplicative #includes here, perhaps
betraying such confusion.)  The only thing left in them, after that
commit, is a progress-reporting function that's neither general-purpose
nor tied in any way to other logging infrastructure.  Hence, let's just
move that function to pg_rewind.c, and get rid of the separate files.

Discussion: https://postgr.es/m/3971.1557787914@sss.pgh.pa.us
2019-05-14 13:11:23 -04:00
Tom Lane 7c850320d8 Fix SQL-style substring() to have spec-compliant greediness behavior.
SQL's regular-expression substring() function is defined to have a
pattern argument that's separated into three subpatterns by escape-
double-quote markers; the function result is the part of the input
matching the second subpattern.  The standard makes it clear that
if there is ambiguity about how to match the input to the subpatterns,
the first and third subpatterns should be taken to match the smallest
possible amount of text (i.e., they're "non greedy", in the terms of
our regex code).  We were not doing it that way: the first subpattern
would eat the largest possible amount of text, causing the function
result to be shorter than what the spec requires.

Fix that by attaching explicit greediness quantifiers to the
subpatterns.  (This depends on the regex fix in commit 8a29ed053;
before that, this didn't reliably change the regex engine's behavior.)

Also, by adding parentheses around each subpattern, we ensure that
"|" (OR) in the subpatterns behave sanely.  Previously, "|" in the
first or third subpatterns didn't work.

This patch also makes the function throw error if you write more than
two escape-double-quote markers, and do something sane if you write
just one, and document that behavior.  Previously, an odd number of
markers led to a confusing complaint about unbalanced parentheses,
while extra pairs of markers were just ignored.  (Note that the spec
requires exactly two markers, but we've historically allowed there
to be none, and this patch preserves the old behavior for that case.)

In passing, adjust some substring() test cases that didn't really
prove what they said they were testing for: they used patterns
that didn't match the data string, so that the output would be
NULL whether or not the function was really strict.

Although this is certainly a bug fix, changing the behavior in back
branches seems undesirable: applications could perhaps be depending on
the old behavior, since it's not obviously wrong unless you read the
spec very closely.  Hence, no back-patch.

Discussion: https://postgr.es/m/5bb27a41-350d-37bf-901e-9d26f5592dd0@charter.net
2019-05-14 11:27:31 -04:00
Tom Lane fb489e4b31 In bootstrap mode, use default signal handling for SIGINT etc.
Previously, the code pointed the standard process-termination signals
to postgres.c's die().  That would typically result in an attempt to
execute a transaction abort, which is not possible in bootstrap mode,
leading to PANIC.  This choice seems to be a leftover from an old code
structure in which the same signal-assignment code was used for many
sorts of auxiliary processes, including interactive standalone
backends.  It's not very sensible for bootstrap mode, which has no
interest in either interactivity or continuing after an error.  We can
get better behavior with less effort by just letting normal process
termination happen, after which the parent initdb process will clean up.

This is basically cosmetic in any case, since initdb will react the
same way whether bootstrap dies on a signal or abort().  Given the
lack of previous complaints, I don't feel a need to back-patch,
even though the behavior is old.

Discussion: https://postgr.es/m/3850b11a.5121.16aaf827e4a.Coremail.thunder1@126.com
2019-05-14 10:22:28 -04:00
Peter Eisentraut 037165ca95 Update SQL features/conformance information to SQL:2016 2019-05-14 15:44:37 +02:00
Peter Eisentraut eb3a1376c9 Update information_schema for SQL:2016
This is mainly a light renumbering to match the sections in the
standard.
2019-05-14 15:44:37 +02:00
Heikki Linnakangas 22251686f0 Detect internal GiST page splits correctly during index build.
As we descend the GiST tree during insertion, we modify any downlinks on
the way down to include the new tuple we're about to insert (if they don't
cover it already). Modifying an existing downlink might cause an internal
page to split, if the new downlink tuple is larger than the old one. If
that happens, we need to back up to the parent and re-choose a page to
insert to. We used to detect that situation, thanks to the NSN-LSN
interlock normally used to detect concurrent page splits, but that got
broken by commit 9155580fd5. With that commit, we now use a dummy constant
LSN value for every page during index build, so the LSN-NSN interlock no
longer works. I thought that was OK because there can't be any other
backends modifying the index during index build, but missed that the
insertion itself can modify the page we're inserting to. The consequence
was that we would sometimes insert the new tuple to an incorrect page, one
whose downlink doesn't cover the new tuple.

To fix, add a flag to the stack that keeps track of the state while
descending tree, to indicate that a page was split, and that we need to
retry the descend from the parent.

Thomas Munro first reported that the contrib/intarray regression test was
failing occasionally on the buildfarm after commit 9155580fd5. The failure
was intermittent, because the gistchoose() function is not deterministic,
and would only occasionally create the right circumstances for this bug to
cause the failure.

Patch by Anastasia Lubennikova, with some changes by me to make it work
correctly also when the internal page split also causes the "grandparent"
to be split.

Discussion: https://www.postgresql.org/message-id/CA%2BhUKGJRzLo7tZExWfSbwM3XuK7aAK7FhdBV0FLkbUG%2BW0v0zg%40mail.gmail.com
2019-05-14 13:18:44 +03:00
Heikki Linnakangas e95d550bbb Fix comment on when HOT update is possible.
The conditions listed in this comment have changed several times, and at
some point the thing that the "if so" referred to was negated.

The text was OK up to 9.6. It was differently wrong in v10, v11 and
master, so fix in all those versions.
2019-05-14 13:06:28 +03:00
Etsuro Fujita 7d9eca59cf Fix typo. 2019-05-14 16:05:37 +09:00
Michael Paquier 7e19929ea2 Fix duplicated words in comments
Author: Stephen Amell
Discussion: https://postgr.es/m/539fa271-21b3-777e-a468-d96cffe9c768@gmail.com
2019-05-14 09:37:35 +09:00
Peter Geoghegan ae7291acbc Standardize ItemIdData terminology.
The term "item pointer" should not be used to refer to ItemIdData
variables, since that is needlessly ambiguous.  Only
ItemPointerData/ItemPointer variables should be called item pointers.

To fix, establish the convention that ItemIdData variables should always
be referred to either as "item identifiers" or "line pointers".  The
term "item identifier" already predominates in docs and translatable
messages, and so should be the preferred alternative there.

Discussion: https://postgr.es/m/CAH2-Wz=c=MZQjUzde3o9+2PLAPuHTpVZPPdYxN=E4ndQ2--8ew@mail.gmail.com
2019-05-13 15:53:39 -07:00
Tom Lane 32ebb35128 Fix logical replication's ideas about which type OIDs are built-in.
Only hand-assigned type OIDs should be presumed to match across different
PG servers; those assigned during genbki.pl or during initdb are likely
to change due to addition or removal of unrelated objects.

This means that the cutoff should be FirstGenbkiObjectId (in HEAD)
or FirstBootstrapObjectId (before that), not FirstNormalObjectId.
Compare postgres_fdw's is_builtin() test.

It's likely that this error has no observable consequence in a
normally-functioning system, since ATM the only affected type OIDs are
system catalog rowtypes and information_schema types, which would not
typically be interesting for logical replication.  But you could
probably break it if you tried hard, so back-patch.

Discussion: https://postgr.es/m/15150.1557257111@sss.pgh.pa.us
2019-05-13 17:23:00 -04:00
Tom Lane e34ee993fb Improve commentary about hack in is_publishable_class().
The FirstNormalObjectId test here is a kluge that needs to go away,
but the only substitute we can think of is to add a column to pg_class,
which will take more work than can be handled right now.  Add some
commentary in the meanwhile.

Discussion: https://postgr.es/m/15150.1557257111@sss.pgh.pa.us
2019-05-13 17:05:48 -04:00
Peter Geoghegan 9b42e71376 Don't leave behind junk nbtree pages during split.
Commit 8fa30f906b reduced the elevel of a number of "can't happen"
_bt_split() errors from PANIC to ERROR.  At the same time, the new right
page buffer for the split could continue to be acquired well before the
critical section.  This was possible because it was relatively
straightforward to make sure that _bt_split() could not throw an error,
with a few specific exceptions.  The exceptional cases were safe because
they involved specific, well understood errors, making it possible to
consistently zero the right page before actually raising an error using
elog().  There was no danger of leaving around a junk page, provided
_bt_split() stuck to this coding rule.

Commit 8224de4f, which introduced INCLUDE indexes, added code to make
_bt_split() truncate away non-key attributes.  This happened at a point
that broke the rule around zeroing the right page in _bt_split().  If
truncation failed (perhaps due to palloc() failure), that would result
in an errant right page buffer with junk contents.  This could confuse
VACUUM when it attempted to delete the page, and should be avoided on
general principle.

To fix, reorganize _bt_split() so that truncation occurs before the new
right page buffer is even acquired.  A junk page/buffer will not be left
behind if _bt_nonkey_truncate()/_bt_truncate() raise an error.

Discussion: https://postgr.es/m/CAH2-WzkcWT_-NH7EeL=Az4efg0KCV+wArygW8zKB=+HoP=VWMw@mail.gmail.com
Backpatch: 11-, where INCLUDE indexes were introduced.
2019-05-13 10:27:59 -07:00
Robert Haas 221b377f09 Improve comment for att_isnull.
The comment implies that a 1 in the null bitmap indicates a null value,
but actually a 0 in the null bitmap indicates a null value. Try to
be more clear.

Patch by me; proposed wording reviewed by Alvaro Herrera and Tom Lane.

Discussion: http://postgr.es/m/CA+TgmobHOP8r6cG+UnsDFMrS30-m=jRrCBhgw-nFkn0k9QnFsg@mail.gmail.com
2019-05-13 13:13:24 -04:00
Tom Lane ddf927fb13 Fix misuse of an integer as a bool.
pgtls_read_pending is declared to return bool, but what the underlying
SSL_pending function returns is a count of available bytes.

This is actually somewhat harmless if we're using C99 bools, but in
the back branches it's a live bug: if the available-bytes count happened
to be a multiple of 256, it would get converted to a zero char value.
On machines where char is signed, counts of 128 and up could misbehave
as well.  The net effect is that when using SSL, libpq might block
waiting for data even though some has already been received.

Broken by careless refactoring in commit 4e86f1b16, so back-patch
to 9.5 where that came in.

Per bug #15802 from David Binderman.

Discussion: https://postgr.es/m/15802-f0911a97f0346526@postgresql.org
2019-05-13 10:53:19 -04:00
Michael Paquier 1171dbde2d Fix incorrect return value in JSON equality function for scalars
equalsJsonbScalarValue() uses a boolean as return type, however for one
code path -1 gets returned, which is confusing.  The origin of the
confusion is visibly that this code got copy-pasted from
compareJsonbScalarValue() since it has been introduced in d1d50bf.

No backpatch, as this is only cosmetic.

Author: Rikard Falkeborn
Discussion: https://postgr.es/m/CADRDgG7mJnek6HNW13f+LF6V=6gag9PM+P7H5dnyWZAv49aBGg@mail.gmail.com
2019-05-13 09:11:50 +09:00
Tom Lane 8a29ed0530 Fix misoptimization of "{1,1}" quantifiers in regular expressions.
A bounded quantifier with m = n = 1 might be thought a no-op.  But
according to our documentation (which traces back to Henry Spencer's
original man page) it still imposes greediness, or non-greediness in the
case of the non-greedy variant "{1,1}?", on whatever it's attached to.

This turns out not to work though, because parseqatom() optimizes away
the m = n = 1 case without regard for whether it's supposed to change
the greediness of the argument RE.

We can fix this by just not applying the optimization when the greediness
needs to change; the subsequent general cases handle it fine.

The three cases in which we can still apply the optimization are
(a) no quantifier, or quantifier does not impose a preference;
(b) atom has no greediness property, implying it cannot match a
variable amount of text anyway; or
(c) quantifier's greediness is same as atom's.
Note that in most cases where one of these applies, we'd have exited
earlier in the "not a messy case" fast path.  I think it's now only
possible to get to the optimization when the atom involves capturing
parentheses or a non-top-level backref.

Back-patch to all supported branches.  I'd ordinarily be hesitant to
put a subtle behavioral change into back branches, but in this case
it's very hard to see a reason why somebody would write "{1,1}?" unless
they're trying to get the documented change-of-greediness behavior.

Discussion: https://postgr.es/m/5bb27a41-350d-37bf-901e-9d26f5592dd0@charter.net
2019-05-12 18:53:38 -04:00
Noah Misch d02768ddd1 Fail pgwin32_message_to_UTF16() for SQL_ASCII messages.
The function had been interpreting SQL_ASCII messages as UTF8, throwing
an error when they were invalid UTF8.  The new behavior is consistent
with pg_do_encoding_conversion().  This affects LOG_DESTINATION_STDERR
and LOG_DESTINATION_EVENTLOG, which will send untranslated bytes to
write() and ReportEventA().  On buildfarm member bowerbird, enabling
log_connections caused an error whenever the role name was not valid
UTF8.  Back-patch to 9.4 (all supported versions).

Discussion: https://postgr.es/m/20190512015615.GD1124997@rfd.leadboat.com
2019-05-12 10:33:05 -07:00
Tom Lane 85ccb6899c Rearrange pgstat_bestart() to avoid failures within its critical section.
We long ago decided to design the shared PgBackendStatus data structure to
minimize the cost of writing status updates, which means that writers just
have to increment the st_changecount field twice.  That isn't hooked into
any sort of resource management mechanism, which means that if something
were to throw error between the two increments, the st_changecount field
would be left odd indefinitely.  That would cause readers to lock up.
Now, since it's also a bad idea to leave the field odd for longer than
absolutely necessary (because readers will spin while we have it set),
the expectation was that we'd treat these segments like spinlock critical
sections, with only short, more or less straight-line, code in them.

That was fine as originally designed, but commit 9029f4b37 broke it
by inserting a significant amount of non-straight-line code into
pgstat_bestart(), code that is very capable of throwing errors, not to
mention taking a significant amount of time during which readers will spin.
We have a report from Neeraj Kumar of readers actually locking up, which
I suspect was due to an encoding conversion error in X509_NAME_to_cstring,
though conceivably it was just a garden-variety OOM failure.

Subsequent commits have loaded even more dubious code into pgstat_bestart's
critical section (and commit fc70a4b0d deserves some kind of booby prize
for managing to miss the critical section entirely, although the negative
consequences seem minimal given that the PgBackendStatus entry should be
seen by readers as inactive at that point).

The right way to fix this mess seems to be to compute all these values
into a local copy of the process' PgBackendStatus struct, and then just
copy the data back within the critical section proper.  This plan can't
be implemented completely cleanly because of the struct's heavy reliance
on out-of-line strings, which we must initialize separately within the
critical section.  But still, the critical section is far smaller and
safer than it was before.

In hopes of forestalling future errors of the same ilk, rename the
macros for st_changecount management to make it more apparent that
the writer-side macros create a critical section.  And to prevent
the worst consequences if we nonetheless manage to mess it up anyway,
adjust those macros so that they really are a critical section, ie
they now bump CritSectionCount.  That doesn't add much overhead, and
it guarantees that if we do somehow throw an error while the counter
is odd, it will lead to PANIC and a database restart to reset shared
memory.

Back-patch to 9.5 where the problem was introduced.

In HEAD, also fix an oversight in commit b0b39f72b: it failed to teach
pgstat_read_current_status to copy st_gssstatus data from shared memory to
local memory.  Hence, subsequent use of that data within the transaction
would potentially see changing data that it shouldn't see.

Discussion: https://postgr.es/m/CAPR3Wj5Z17=+eeyrn_ZDG3NQGYgMEOY6JV6Y-WRRhGgwc16U3Q@mail.gmail.com
2019-05-11 21:27:29 -04:00
Noah Misch 54c2ecb567 Honor TEMP_CONFIG in TAP suites.
The buildfarm client uses TEMP_CONFIG to implement its extra_config
setting.  Except for stats_temp_directory, extra_config now applies to
TAP suites; extra_config values seen in the past month are compatible
with this.  Back-patch to 9.6, where PostgresNode was introduced, so the
buildfarm can rely on it sooner.

Reviewed by Andrew Dunstan and Tom Lane.

Discussion: https://postgr.es/m/20181229021950.GA3302966@rfd.leadboat.com
2019-05-11 00:22:38 -07:00
Michael Paquier e51bad8fb4 Fix error reporting in reindexdb
When failing to reindex a table or an index, reindexdb would generate an
extra error message related to a database failure, which is misleading.

Backpatch all the way down, as this has been introduced by 85e9a5a0.

Discussion: https://postgr.es/m/CAOBaU_Yo61RwNO3cW6WVYWwH7EYMPuexhKqufb2nFGOdunbcHw@mail.gmail.com
Author: Julien Rouhaud
Reviewed-by: Daniel Gustafsson, Álvaro Herrera, Tom Lane, Michael
Paquier
Backpatch-through: 9.4
2019-05-11 13:00:54 +09:00
Andres Freund 5997a8f4d7 Remove reindex_catalog test from test schedules.
As none of the approaches for avoiding the deadlock issues seem
promising enough, and all the expected reindex related changes have
been made, apply 60c2951e1b to master as well.

Discussion: https://postgr.es/m/4622.1556982247@sss.pgh.pa.us
2019-05-10 12:44:31 -07:00
Tom Lane 610747d86e Cope with EINVAL and EIDRM shmat() failures in PGSharedMemoryAttach.
There's a very old race condition in our code to see whether a pre-existing
shared memory segment is still in use by a conflicting postmaster: it's
possible for the other postmaster to remove the segment in between our
shmctl() and shmat() calls.  It's a narrow window, and there's no risk
unless both postmasters are using the same port number, but that's possible
during parallelized "make check" tests.  (Note that while the TAP tests
take some pains to choose a randomized port number, pg_regress doesn't.)
If it does happen, we treated that as an unexpected case and errored out.

To fix, allow EINVAL to be treated as segment-not-present, and the same
for EIDRM on Linux.  AFAICS, the considerations here are basically
identical to the checks for acceptable shmctl() failures, so I documented
and coded it that way.

While at it, adjust PGSharedMemoryAttach's API to remove its undocumented
dependency on UsedShmemSegAddr in favor of passing the attach address
explicitly.  This makes it easier to be sure we're using a null shmaddr
when probing for segment conflicts (thus avoiding questions about what
EINVAL means).  I don't think there was a bug there, but it required
fragile assumptions about the state of UsedShmemSegAddr during
PGSharedMemoryIsInUse.

Commit c09850992 may have made this failure more probable by applying
the conflicting-segment tests more often.  Hence, back-patch to all
supported branches, as that was.

Discussion: https://postgr.es/m/22224.1557340366@sss.pgh.pa.us
2019-05-10 14:56:41 -04:00
Michael Paquier 752f06443f Fix and improve description of locktag types in lock.h
The description of the lock type for speculative insertions was
incorrect, being copy-pasted from another one.

As discussed, also move the description for all the fields of lock tag
types from the structure listing lock tag types to the set of macros
setting each LOCKTAG.

Author: John Naylor
Discussion: https://postgr.es/m/CACPNZCtA0-ybaC4fFfaDq_8p_TUOLvGxZH9Dm-=TMHZJarBa7Q@mail.gmail.com
2019-05-10 09:35:27 +09:00
Michael Paquier 508300e2e1 Improve and fix some error handling for REINDEX INDEX/TABLE CONCURRENTLY
This improves the user experience when it comes to restrict several
flavors of REINDEX CONCURRENTLY.  First, for INDEX, remove a restriction
on shared relations as we already check after catalog relations.  Then,
for TABLE, add a proper error message when attempting to run the command
on system catalogs.  The code path of CREATE INDEX CONCURRENTLY already
complains about that, but if a REINDEX is issued then then the error
generated is confusing.

While on it, add more tests to check restrictions on catalog indexes and
on toast table/index for catalogs.  Some error messages are improved,
with wording suggestion coming from Tom Lane.

Reported-by: Tom Lane
Author: Michael Paquier
Reviewed-by: Tom Lane
Discussion: https://postgr.es/m/23694.1556806002@sss.pgh.pa.us
2019-05-10 08:18:46 +09:00
Tom Lane 24c19e9f66 Repair issues with faulty generation of merge-append plans.
create_merge_append_plan failed to honor the CP_EXACT_TLIST flag:
it would generate the expected targetlist but then it felt free to
add resjunk sort targets to it.  This demonstrably leads to assertion
failures in v11 and HEAD, and it's probably just accidental that we
don't see the same in older branches.  I've not looked into whether
there would be any real-world consequences in non-assert builds.
In HEAD, create_append_plan has sprouted the same problem, so fix
that too (although we do not have any test cases that seem able to
reach that bug).  This is an oversight in commit 3fc6e2d7f which
invented the CP_EXACT_TLIST flag, so back-patch to 9.6 where that
came in.

convert_subquery_pathkeys would create pathkeys for subquery output
values if they match any EquivalenceClass known in the outer query
and are available in the subquery's syntactic targetlist.  However,
the second part of that condition is wrong, because such values might
not appear in the subquery relation's reltarget list, which would
mean that they couldn't be accessed above the level of the subquery
scan.  We must check that they appear in the reltarget list, instead.
This can lead to dropping knowledge about the subquery's sort
ordering, but I believe it's okay, because any sort key that the
outer query actually has any interest in would appear in the
reltarget list.

This second issue is of very long standing, but right now there's no
evidence that it causes observable problems before 9.6, so I refrained
from back-patching further than that.  We can revisit that choice if
somebody finds a way to make it cause problems in older branches.
(Developing useful test cases for these issues is really problematic;
fixing convert_subquery_pathkeys removes the only known way to exhibit
the create_merge_append_plan bug, and neither of the test cases added
by this patch causes a problem in all branches, even when considering
the issues separately.)

The second issue explains bug #15795 from Suresh Kumar R ("could not
find pathkey item to sort" with nested DISTINCT queries).  I stumbled
across the first issue while investigating that.

Discussion: https://postgr.es/m/15795-fadb56c8e44ee73c@postgresql.org
2019-05-09 16:53:05 -04:00
Etsuro Fujita edbcbe277d postgres_fdw: Fix cost estimation for aggregate pushdown.
In commit 7012b132d0, which added support for aggregate pushdown in
postgres_fdw, the expense of evaluating the final scan/join target
computed by make_group_input_target() was not accounted for at all in
costing aggregate pushdown paths with local statistics.  The right fix
for this would be to have a separate upper stage to adjust the final
scan/join relation (see comments for apply_scanjoin_target_to_paths());
but for now, fix by adding the tlist eval cost when costing aggregate
pushdown paths with local statistics.

Apply this to HEAD only to avoid destabilizing existing plan choices.

Author: Etsuro Fujita
Reviewed-By: Antonin Houska
Discussion: https://postgr.es/m/5C66A056.60007%40lab.ntt.co.jp
2019-05-09 18:39:23 +09:00
Thomas Munro 47a338cfcd Fix SxactGlobalXmin tracking.
Commit bb16aba50 broke the code that maintains SxactGlobalXmin.  It
could get stuck when a well-timed READ ONLY transaction runs.  If
SxactGlobalXmin stops advancing, transactions on the
FinishedSerializableTransactions queue are never cleaned up, so
resources are effectively leaked.  Revert that hunk of the commit.

Also revert another similar hunk that was probably harmless, but
unnecessary and unjustified, relating to the DOOMED flag in case of
RO_SAFE early release.

Author: Thomas Munro
Reported-by: Tom Lane
Discussion: https://postgr.es/m/16170.1557251214%40sss.pgh.pa.us
2019-05-09 20:32:26 +12:00
Peter Eisentraut cd805f46d8 pg_controldata: Add common gettext flags
So it picks up strings in pg_log_* calls.  This was forgotten when it
was added to all other relevant subdirectories.
2019-05-09 09:16:59 +02:00
Peter Eisentraut 02daece4ab Fix grammar in error message 2019-05-09 09:16:59 +02:00
Tom Lane 2d7d946cd3 Clean up the behavior and API of catalog.c's is-catalog-relation tests.
The right way for IsCatalogRelation/Class to behave is to return true
for OIDs less than FirstBootstrapObjectId (not FirstNormalObjectId),
without any of the ad-hoc fooling around with schema membership.

The previous code was wrong because (1) it claimed that
information_schema tables were not catalog relations but their toast
tables were, which is silly; and (2) if you dropped and recreated
information_schema, which is a supported operation, the behavior
changed.  That's even sillier.  With this definition, "catalog
relations" are exactly the ones traceable to the postgres.bki data,
which seems like what we want.

With this simplification, we don't actually need access to the pg_class
tuple to identify a catalog relation; we only need its OID.  Hence,
replace IsCatalogClass with "IsCatalogRelationOid(oid)".  But keep
IsCatalogRelation as a convenience function.

This allows fixing some arguably-wrong semantics in contrib/sepgsql and
ReindexRelationConcurrently, which were using an IsSystemNamespace test
where what they really should be using is IsCatalogRelationOid.  The
previous coding failed to protect toast tables of system catalogs, and
also was not on board with the general principle that user-created tables
do not become catalogs just by virtue of being renamed into pg_catalog.
We can also get rid of a messy hack in ReindexMultipleTables.

While we're at it, also rename IsSystemNamespace to IsCatalogNamespace,
because the previous name invited confusion with the more expansive
semantics used by IsSystemRelation/Class.

Also improve the comments in catalog.c.

There are a few remaining places in replication-related code that are
special-casing OIDs below FirstNormalObjectId.  I'm inclined to think
those are wrong too, and if there should be any special case it should
just extend to FirstBootstrapObjectId.  But first we need to debate
whether a FOR ALL TABLES publication should include information_schema.

Discussion: https://postgr.es/m/21697.1557092753@sss.pgh.pa.us
Discussion: https://postgr.es/m/15150.1557257111@sss.pgh.pa.us
2019-05-08 23:27:38 -04:00
Michael Paquier 3ae3c18b36 Fix error status of vacuumdb when multiple jobs are used
When running a batch of VACUUM or ANALYZE commands on a given database,
there were cases where it is possible to have vacuumdb not report an
error where it actually should, leading to incorrect status results.

Author: Julien Rouhaud
Reviewed-by: Amit Kapila, Michael Paquier
Discussion: https://postgr.es/m/CAOBaU_ZuTwz7CtqLYJ1Ouuh272bTQPLN8b1bAPk0bCBm4PDMTQ@mail.gmail.com
Backpatch-through: 9.5
2019-05-09 10:29:10 +09:00
Peter Geoghegan d95e36dc38 Remove obsolete nbtree split REDO routine comment.
Commit dd299df818, which added suffix truncation to nbtree, simplified
the WAL record format used by page splits.  It became necessary to
explicitly WAL-log the new high key for the left half of a split in all
cases, which relieved the REDO routine from having to reconstruct a new
high key for the left page by copying the first item from the right
page.  Remove a comment that referred to the previous practice.
2019-05-08 12:47:20 -07:00
Alvaro Herrera 61639816b8 Fix error messages
Some messages related to foreign servers were reporting the server name
without quotes, or not at all; our style is to have all names be quoted,
and the server name already appears quoted in a few other messages, so
just add quotes and make them all consistent.

Remove an extra "s" in other messages (typos introduced by myself in
f56f8f8da6).
2019-05-08 13:20:16 -04:00
Peter Eisentraut add85ead4a Fix table lock levels for REINDEX INDEX CONCURRENTLY
REINDEX CONCURRENTLY locks tables with ShareUpdateExclusiveLock rather
than the ShareLock used by a plain REINDEX.  However,
RangeVarCallbackForReindexIndex() was not updated for that and still
used the ShareLock only.  This would lead to lock upgrades later,
leading to possible deadlocks.

Reported-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://www.postgresql.org/message-id/flat/20190430151735.wi52sxjvxsjvaxxt%40alap3.anarazel.de
2019-05-08 14:30:23 +02:00
Thomas Munro 8efe710d9c Probe only 127.0.0.1 when looking for ports on Unix.
Commit c0985099, later adjusted by commit 4ab02e81, probed 0.0.0.0
in addition to 127.0.0.1, for the benefit of Windows build farm
animals.  It isn't really useful on Unix systems, and turned out to
be a bit inconvenient to users of some corporate firewall software.
Switch back to probing just 127.0.0.1 on non-Windows systems.

Back-patch to 9.6, like the earlier changes.

Discussion: https://postgr.es/m/CA%2BhUKG%2B21EPwfgs4m%2BtqyRtbVqkOUvP8QQ8sWk9%2Bh55Aub1H3A%40mail.gmail.com
2019-05-08 22:02:47 +12:00
Etsuro Fujita b7434dc007 Add missing periods to comments. 2019-05-08 16:49:09 +09:00
Heikki Linnakangas 256be1050c Remove leftover reference to old "flat file" mechanism in a comment.
The flat file mechanism was removed in PostgreSQL 9.0.
2019-05-08 09:32:34 +03:00
Peter Geoghegan d65b5ccad6 Correct obsolete nbtsort.c minimum key comment.
It is no longer possible under any circumstances for nbtree code to
reconstruct a strict lower bound key (parent page's pivot tuple key) for
a right sibling page by retrieving the first item in the right sibling
page.
2019-05-07 21:42:12 -07:00
Alexander Korotkov e5f9786317 Add jsonpath_encoding_1.out changes missed in 29ceacc3f9
Reported-by: Tom Lane
Discussion: https://postgr.es/m/14305.1557268259%40sss.pgh.pa.us
2019-05-08 01:55:31 +03:00
Alexander Korotkov 29ceacc3f9 Improve error reporting in jsonpath
This commit contains multiple improvements to error reporting in jsonpath
including but not limited to getting rid of following things:

 * definition of error messages in macros,
 * errdetail() when valueable information could fit to errmsg(),
 * word "singleton" which is not properly explained anywhere,
 * line breaks in error messages.

Reported-by: Tom Lane
Discussion: https://postgr.es/m/14890.1555523005%40sss.pgh.pa.us
Author: Alexander Korotkov
Reviewed-by: Tom Lane
2019-05-08 01:02:59 +03:00
Fujii Masao b84dbc8eb8 Add TRUNCATE parameter to VACUUM.
This commit adds new parameter to VACUUM command, TRUNCATE,
which specifies that VACUUM should attempt to truncate off
any empty pages at the end of the table and allow the disk space
for the truncated pages to be returned to the operating system.

This parameter, if specified, overrides the vacuum_truncate
reloption. If neither the reloption nor the VACUUM option is
used, the default is true, as before.

Author: Fujii Masao
Reviewed-by: Julien Rouhaud, Masahiko Sawada
Discussion: https://postgr.es/m/CAD21AoD+qtrSDL=GSma4Wd3kLYLeRC0hPna-YAdkDeV4z156vg@mail.gmail.com
2019-05-08 02:10:33 +09:00
Magnus Hagander 98719af6c2 Fix typos and clarify a comment
Author: Daniel Gustafsson <daniel@yesql.se>
2019-05-07 18:26:09 +02:00
Tom Lane 8d0ddccec6 Avoid "invalid memory alloc request size" while reading pg_stat_activity.
On a 64-bit machine, if you set track_activity_query_size and
max_connections such that their product exceeds 1GB, shared memory
setup will still succeed (given enough RAM), but attempts to read
pg_stat_activity fail with "invalid memory alloc request size".
Work around that by using MemoryContextAllocHuge to allocate the
local copy of the activity strings.  Using the "huge" API costs us
nothing extra in normal cases, and it seems better than throwing
an error and/or explaining to people why they can't do this.

This situation seems insanely profligate today, but who knows what
people will consider normal in ten or twenty years?  So let's fix it
in HEAD but not worry about a back-patch.

Per report from James Tomson.

Discussion: https://postgr.es/m/1CFDCCD6-B268-48D8-85C8-400D2790B2C3@pushd.com
2019-05-07 11:41:37 -04:00
Amit Kapila 7db0cde6b5 Revert "Avoid the creation of the free space map for small heap relations".
This feature was using a process local map to track the first few blocks
in the relation.  The map was reset each time we get the block with enough
freespace.  It was discussed that it would be better to track this map on
a per-relation basis in relcache and then invalidate the same whenever
vacuum frees up some space in the page or when FSM is created.  The new
design would be better both in terms of API design and performance.

List of commits reverted, in reverse chronological order:

06c8a5090e  Improve code comments in b0eaa4c51b.
13e8643bfc  During pg_upgrade, conditionally skip transfer of FSMs.
6f918159a9  Add more tests for FSM.
9c32e4c350  Clear the local map when not used.
29d108cdec  Update the documentation for FSM behavior..
08ecdfe7e5  Make FSM test portable.
b0eaa4c51b  Avoid creation of the free space map for small heap relations.

Discussion: https://postgr.es/m/20190416180452.3pm6uegx54iitbt5@alap3.anarazel.de
2019-05-07 09:30:24 +05:30
Michael Paquier af82f95abb Remove some code related to 7.3 and older servers from tools of src/bin/
This code was broken as of 582edc3, and is most likely not used anymore.
Note that pg_dump supports servers down to 8.0, and psql has code to
support servers down to 7.4.

Author: Julien Rouhaud
Reviewed-by: Tom Lane
Discussion: https://postgr.es/m/CAOBaU_Y5y=zo3+2gf+2NJC1pvMYPcbRXoQaPXx=U7+C8Qh4CzQ@mail.gmail.com
2019-05-07 09:39:39 +09:00
Alvaro Herrera a1ec7402e9 Revert "Make pg_dump emit ATTACH PARTITION instead of PARTITION OF"
... and fallout (from branches 10, 11 and master).  The change was
ill-considered, and it broke a few normal use cases; since we don't have
time to fix it, we'll try again after this week's minor releases.

Reported-by: Rushabh Lathia
Discussion: https://postgr.es/m/CAGPqQf0iQV=PPOv2Btog9J9AwOQp6HmuVd6SbGTR_v3Zp2XT1w@mail.gmail.com
2019-05-06 12:23:49 -04:00
Michael Paquier 91248608a6 Add tests for error message generation in partition tuple routing
This adds extra tests for the error message generated for partition
tuple routing in the executor, using more than three levels of
partitioning including partitioned tables with no partitions.  These
tests have been added to fix CVE-2019-10129 on REL_11_STABLE.  HEAD has
no active bugs in this area, but it lacked coverage.

Author: Michael Paquier
Reviewed-by: Noah Misch
Security: CVE-2019-10129
2019-05-06 21:44:24 +09:00
Dean Rasheed a0905056fd Use checkAsUser for selectivity estimator checks, if it's set.
In examine_variable() and examine_simple_variable(), when checking the
user's table and column privileges to determine whether to grant
access to the pg_statistic data, use checkAsUser for the privilege
checks, if it's set. This will be the case if we're accessing the
table via a view, to indicate that we should perform privilege checks
as the view owner rather than the current user.

This change makes this planner check consistent with the check in the
executor, so the planner will be able to make use of statistics if the
table is accessible via the view. This fixes a performance regression
introduced by commit e2d4ef8de8, which affects queries against
non-security barrier views in the case where the user doesn't have
privileges on the underlying table, but the view owner does.

Note that it continues to provide the same safeguards controlling
access to pg_statistic for direct table access (in which case
checkAsUser won't be set) and for security barrier views, because of
the nearby checks on rte->security_barrier and rte->securityQuals.

Back-patch to all supported branches because e2d4ef8de8 was.

Dean Rasheed, reviewed by Jonathan Katz and Stephen Frost.
2019-05-06 11:54:32 +01:00
Dean Rasheed 1aebfbea83 Fix security checks for selectivity estimation functions with RLS.
In commit e2d4ef8de8, security checks were added to prevent
user-supplied operators from running over data from pg_statistic
unless the user has table or column privileges on the table, or the
operator is leakproof. For a table with RLS, however, checking for
table or column privileges is insufficient, since that does not
guarantee that the user has permission to view all of the column's
data.

Fix this by also checking for securityQuals on the RTE, and insisting
that the operator be leakproof if there are any. Thus the
leakproofness check will only be skipped if there are no securityQuals
and the user has table or column privileges on the table -- i.e., only
if we know that the user has access to all the data in the column.

Back-patch to 9.5 where RLS was added.

Dean Rasheed, reviewed by Jonathan Katz and Stephen Frost.

Security: CVE-2019-10130
2019-05-06 11:38:43 +01:00
Tom Lane bd5e8b627b Bring pg_nextoid()'s error messages into line with message style guide.
Noticed while reviewing nearby code.  Given all the disclaimers about
this not being meant as user-facing code, I wonder whether we should
make these non-translatable?  But in any case there's little excuse
for them not to be good English.
2019-05-05 17:06:53 -04:00
Tom Lane 9691aa72e2 Fix style violations in syscache lookups.
Project style is to check the success of SearchSysCacheN and friends
by applying HeapTupleIsValid to the result.  A tiny minority of calls
creatively did it differently.  Bring them into line with the rest.

This is just cosmetic, since HeapTupleIsValid is indeed just a null
check at the moment ... but that may not be true forever, and in any
case it puts a mental burden on readers who may wonder why these
call sites are not like the rest.

Back-patch to v11 just to keep the branches in sync.  (The bulk of these
errors seem to have originated in v11 or v12, though a few are old.)

Per searching to see if anyplace else had made the same error
repaired in 62148c352.
2019-05-05 13:10:07 -04:00
Tom Lane 62148c3520 Add check for syscache lookup failure in update_relispartition().
Omitted in commit 05b38c7e6 (though it looks like the original blame
belongs to 9e9befac4).  A failure is admittedly unlikely, but if it
did happen, SIGSEGV is not the approved method of reporting it.

Per Coverity.  Back-patch to v11 where the broken code originated.
2019-05-05 12:44:32 -04:00
Michael Paquier 84e4570da9 Fix set of issues with memory-allocation system calls in frontend code
Like the backend, the frontend has wrappers on top of malloc() and such
whose use is recommended.  Particularly, it is possible to do memory
allocation without issuing an error.  Some binaries missed the use of
those wrappers, so let's fix the gap for consistency.

This also fixes two latent bugs:
- In pg_dump/pg_dumpall when parsing an ACL item, on an out-of-memory
error for strdup(), the code considered the failure as a ACL parsing
problem instead of an actual OOM.
- In pg_waldump, an OOM when building the target directory string would
cause a crash.

Author: Daniel Gustafsson
Discussion: https://postgr.es/m/gY0y9xenfoBPc-Tufsr2Zg-MmkrJslm0Tw_CMg4p_j58-k_PXNC0klMdkKQkg61BkXC9_uWo-DcUzfxnHqpkpoR5jjVZrPHqKYikcHIiONhg=@yesql.se
2019-05-04 16:32:19 +09:00
Noah Misch 34ff542a71 MSVC: Build ~35% faster by calling dumpbin just once per directory.
Peifeng Qiu

Discussion: https://postgr.es/m/CABmtVJiKXQjast0dQD-8KAtfm8XmyYxo-4Dc7+M+fBr8JRTqkw@mail.gmail.com
2019-05-03 21:56:47 -07:00
Peter Geoghegan 7b37f4b02e Correct more obsolete nbtree page split comments.
Commit 3f342839 corrected obsolete comments about buffer locks at the
main _bt_insert_parent() call site, but missed similar obsolete comments
above _bt_insert_parent() itself.  Both sets of comments were rendered
obsolete by commit 40dae7ec53, which made the nbtree page split
algorithm more robust.  Fix the comments that were missed the first time
around now.

In passing, refine a related _bt_insert_parent() comment about
re-finding the parent page to insert new downlink.
2019-05-03 13:34:45 -07:00
Tom Lane f884dca495 Remove RelationSetIndexList().
In the wake of commit f912d7dec, RelationSetIndexList isn't used any
more.  It was always a horrid wart, so getting rid of it is very nice.
We can also convert rd_indexvalid back to a plain boolean.

Discussion: https://postgr.es/m/28926.1556664156@sss.pgh.pa.us
2019-05-03 10:26:14 -04:00
Tom Lane f912d7dec2 Fix reindexing of pg_class indexes some more.
Commits 3dbb317d3 et al failed under CLOBBER_CACHE_ALWAYS testing.
Investigation showed that to reindex pg_class_oid_index, we must
suppress accesses to the index (via SetReindexProcessing) before we call
RelationSetNewRelfilenode, or at least before we do CommandCounterIncrement
therein; otherwise, relcache reloads happening within the CCI may try to
fetch pg_class rows using the index's new relfilenode value, which is as
yet an empty file.

Of course, the point of 3dbb317d3 was that that ordering didn't work
either, because then RelationSetNewRelfilenode's own update of the index's
pg_class row cannot access the index, should it need to.

There are various ways we might have got around that, but Andres Freund
came up with a brilliant solution: for a mapped index, we can really just
skip the pg_class update altogether.  The only fields it was actually
changing were relpages etc, but it was just setting them to zeroes which
is useless make-work.  (Correct new values will be installed at the end
of index build.)  All pg_class indexes are mapped and probably always will
be, so this eliminates the problem by removing work rather than adding it,
always a pleasant outcome.  Having taught RelationSetNewRelfilenode to do
it that way, we can revert the code reordering in reindex_index.  (But
I left the moved setup code where it was; there seems no reason why it
has to run without use of the old index.  If you're trying to fix a
busted pg_class index, you'll have had to disable system index use
altogether to get this far.)

Moreover, this means we don't need RelationSetIndexList at all, because
reindex_relation's hacking to make "REINDEX TABLE pg_class" work is
likewise now unnecessary.  We'll leave that code in place in the back
branches, but a follow-on patch will remove it in HEAD.

In passing, do some minor cleanup for commit 5c1560606 (in HEAD only),
notably removing a duplicate newrnode assignment.

Patch by me, using a core idea due to Andres Freund.  Back-patch to all
supported branches, as 3dbb317d3 was.

Discussion: https://postgr.es/m/28926.1556664156@sss.pgh.pa.us
2019-05-02 19:11:28 -04:00
Alvaro Herrera 2bf372a4ae heap_prepare_freeze_tuple: Simplify coding
Commit d2599ecfcc introduced some contorted, confused code around:
readers would think that it's possible for HeapTupleHeaderGetXmin return
a non-frozen value for some frozen tuples, which would be disastrous.
There's no actual bug, but it seems better to make it clearer.

Per gripe from Tom Lane and Andres Freund.
Discussion: https://postgr.es/m/30116.1555430496@sss.pgh.pa.us
2019-05-02 16:14:08 -04:00
Peter Geoghegan 6dd86c269d Fix nbtsort.c's page space accounting.
Commit dd299df818, which made heap TID a tiebreaker nbtree index
column, introduced new rules on page space management to make suffix
truncation safe.  In general, suffix truncation needs to have a small
amount of extra space available on the new left page when splitting a
leaf page.  This is needed in case it turns out that truncation cannot
even "truncate away the heap TID column", resulting in a
larger-than-firstright leaf high key with an explicit heap TID
representation.

Despite all this, CREATE INDEX/nbtsort.c did not account for the
possible need for extra heap TID space on leaf pages when deciding
whether or not a new item could fit on current page.  This could lead to
"failed to add item to the index page" errors when CREATE
INDEX/nbtsort.c tried to finish off a leaf page that lacked space for a
larger-than-firstright leaf high key (it only had space for firstright
tuple, which was just short of what was needed following "truncation").

Several conditions needed to be met all at once for CREATE INDEX to
fail.  The problem was in the hard limit on what will fit on a page,
which tends to be masked by the soft fillfactor-wise limit.  The easiest
way to recreate the problem seems to be a CREATE INDEX on a low
cardinality text column, with tuples that are of non-uniform width,
using a fillfactor of 100.

To fix, bring nbtsort.c in line with nbtsplitloc.c, which already
pessimistically assumes that all leaf page splits will have high keys
that have a heap TID appended.

Reported-By: Andreas Joseph Krogh
Discussion: https://postgr.es/m/VisenaEmail.c5.3ee7fe277d514162.16a6d785bea@tc7-visena
2019-05-02 12:33:35 -07:00
Robert Haas dd69597988 Fix some problems with VACUUM (INDEX_CLEANUP FALSE).
The new nleft_dead_tuples and nleft_dead_itemids fields are confusing
and do not seem like the correct way forward.  One of them is tested
via an assertion that can fail, as it has already done on buildfarm
member topminnow.  Remove the assertion and the fields.

Change the logic for the case where a tuple is not initially pruned
by heap_page_prune but later diagnosed HEAPTUPLE_DEAD by
HeapTupleSatisfiesVacuum.  Previously, tupgone = true was set in
that case, which leads to treating the tuple as one that will be
removed.  In a normal vacuum, that's OK, because we'll remove
index entries for it and then the second heap pass will remove the
tuple itself, but when index cleanup is disabled, those things
don't happen, so we must instead treat it as a recently-dead
tuple that we have voluntarily chosen to keep.

Report and analysis by Tom Lane.  This patch loosely based on one
from Masahiko Sawada, but I changed most of it.
2019-05-02 10:07:13 -04:00
Magnus Hagander 659e53498c Fix union for pgstat message types
The message type for temp files and for checksum failures were missing
from the union. Due to the coding style used there was no compiler error
when this happend. So change the code to actively use the union thereby
producing a compiler error if the same mistake happens again, suggested
by Tom Lane.

Author: Julien Rouhaud
Reported-By: Tomas Vondra
Discussion: https://postgr.es/m/20190430163328.zd4rrlnbvgaqlcdz@development
2019-05-01 12:30:44 +02:00
Andres Freund 809c9b48f4 Run catalog reindexing test from 3dbb317d32 serially, to avoid deadlocks.
The tests turn out to cause deadlocks in some circumstances. Fairly
reproducibly so with -DRELCACHE_FORCE_RELEASE
-DCATCACHE_FORCE_RELEASE.  Some of the deadlocks may be hard to fix
without disproportionate measures, but others probably should be fixed
- but not in 12.

We discussed removing the new tests until we can fix the issues
underlying the deadlocks, but results from buildfarm animal
markhor (which runs with CLOBBER_CACHE_ALWAYS) indicates that there
might be a more severe, as of yet undiagnosed, issue (including on
stable branches) with reindexing catalogs. The failure is:
ERROR: could not read block 0 in file "base/16384/28025": read only 0 of 8192 bytes
Therefore it seems advisable to keep the tests.

It's not certain that running the tests in isolation removes the risk
of deadlocks. It's possible that additional locks are needed to
protect against a concurrent auto-analyze or such.

Per discussion with Tom Lane.

Discussion: https://postgr.es/m/28926.1556664156@sss.pgh.pa.us
Backpatch: 9.4-, like 3dbb317d3
2019-04-30 17:45:32 -07:00
Andres Freund 4b40d40b30 Fix unused variable compiler warning in !debug builds.
Introduced in 3dbb317d3.  Fix by using the new local variable in more
places.

Reported-By: Bruce Momjian (off-list)
Backpatch: 9.4-, like 3dbb317d3
2019-04-30 17:45:32 -07:00
Andres Freund a0b5bb6e02 Improve comment spelling and style in llvmjit_deform.c.
Author: Justin Pryzby
Discussion:
    https://postgr.es/m/20190408141828.GE10080@telsasoft.com
    https://postgr.es/m/20181127184133.GM10913@telsasoft.com
2019-04-30 16:20:07 -07:00
Andres Freund 3a48005b00 Improve code inferring length of bitmap for JITed tuple deforming.
While discussing comment improvements (see next commit) by Justin
Pryzby, Tom complained about a few details of the logic to infer the
length of the NULL bitmap when building the JITed tuple deforming
function. That bitmap allows to avoid checking the tuple header's
natts, a check which often causes a pipeline stall

Improvements:
a) As long as missing columns aren't taken into account, we can
   continue to infer the length of the NULL bitmap from NOT NULL
   columns following it. Previously we stopped at the first missing
   column.  It's unlikely to matter much in practice, but the
   alternative would have been to document why we stop.
b) For robustness reasons it seems better to also check against
   attisdropped - RemoveAttributeById() sets attnotnull to false, but
   an additional check is trivial.
c) Improve related comments

Discussion: https://postgr.es/m/20637.1555957068@sss.pgh.pa.us
Backpatch: -
2019-04-30 16:20:07 -07:00
Tom Lane e03ff73969 Clean up handling of constraint_exclusion and enable_partition_pruning.
The interaction of these parameters was a bit confused/confusing,
and in fact v11 entirely misses the opportunity to apply partition
constraints when a partition is accessed directly (rather than
indirectly from its parent).

In HEAD, establish the principle that enable_partition_pruning controls
partition pruning and nothing else.  When accessing a partition via its
parent, we do partition pruning (if enabled by enable_partition_pruning)
and then there is no need to consider partition constraints in the
constraint_exclusion logic.  When accessing a partition directly, its
partition constraints are applied by the constraint_exclusion logic,
only if constraint_exclusion = on.

In v11, we can't have such a clean division of these GUCs' effects,
partly because we don't want to break compatibility too much in a
released branch, and partly because the clean coding requires
inheritance_planner to have applied partition pruning to a partitioned
target table, which it doesn't in v11.  However, we can tweak things
enough to cover the missed case, which seems like a good idea since
it's potentially a performance regression from v10.  This patch keeps
v11's previous behavior in which enable_partition_pruning overrides
constraint_exclusion for an inherited target table, though.

In HEAD, also teach relation_excluded_by_constraints that it's okay to use
inheritable constraints when trying to prune a traditional inheritance
tree.  This might not be thought worthy of effort given that that feature
is semi-deprecated now, but we have enough infrastructure that it only
takes a couple more lines of code to do it correctly.

Amit Langote and Tom Lane

Discussion: https://postgr.es/m/9813f079-f16b-61c8-9ab7-4363cab28d80@lab.ntt.co.jp
Discussion: https://postgr.es/m/29069.1555970894@sss.pgh.pa.us
2019-04-30 15:03:50 -04:00
Alvaro Herrera 9f8b717a80 Message style fixes 2019-04-30 10:33:37 -04:00
Alvaro Herrera 9a83afecb7 Widen tuple counter variables from long to int64
Mistake in ab0dfc961b6a; progress reporting would have wrapped around
for indexes created with more than 2^31 tuples.

Reported-by: Peter Geoghegan
Discussion: https://postgr.es/m/CAH2-Wz=WbNxc5ob5NJ9yqo2RMJ0q4HXDS30GVCobeCvC9A1L9A@mail.gmail.com
2019-04-30 10:27:38 -04:00
Andres Freund 3dbb317d32 Fix potential assertion failure when reindexing a pg_class index.
When reindexing individual indexes on pg_class it was possible to
either trigger an assertion failure:
TRAP: FailedAssertion("!(!ReindexIsProcessingIndex(((index)->rd_id)))

That's because reindex_index() called SetReindexProcessing() - which
enables an asserts ensuring no index insertions happen into the index
- before calling RelationSetNewRelfilenode(). That not correct for
indexes on pg_class, because RelationSetNewRelfilenode() updates the
relevant pg_class row, which needs to update the indexes.

The are two reasons this wasn't noticed earlier. Firstly the bug
doesn't trigger when reindexing all of pg_class, as reindex_relation
has code "hiding" all yet-to-be-reindexed indexes. Secondly, the bug
only triggers when the the update to pg_class doesn't turn out to be a
HOT update - otherwise there's no index insertion to trigger the
bug. Most of the time there's enough space, making this bug hard to
trigger.

To fix, move RelationSetNewRelfilenode() to before the
SetReindexProcessing() (and, together with some other code, to outside
of the PG_TRY()).

To make sure the error checking intended by SetReindexProcessing() is
more robust, modify CatalogIndexInsert() to check
ReindexIsProcessingIndex() even when the update is a HOT update.

Also add a few regression tests for REINDEXing of system catalogs.

The last two improvements would have prevented some of the issues
fixed in 5c1560606d from being introduced in the first place.

Reported-By: Michael Paquier
Diagnosed-By: Tom Lane and Andres Freund
Author: Andres Freund
Reviewed-By: Tom Lane
Discussion: https://postgr.es/m/20190418011430.GA19133@paquier.xyz
Backpatch: 9.4-, the bug is present in all branches
2019-04-29 19:42:08 -07:00
Andres Freund 5c1560606d Fix several recently introduced issues around handling new relation forks.
Most of these stem from d25f519107 "tableam: relation creation, VACUUM
FULL/CLUSTER, SET TABLESPACE.".

1) To pass data to the relation_set_new_filenode()
   RelationSetNewRelfilenode() was made to update RelationData.rd_rel
   directly. That's not OK however, as it makes the relcache entries
   temporarily inconsistent. Which among other scenarios is a problem
   if a REINDEX targets an index on pg_class - the
   CatalogTupleUpdate() in RelationSetNewRelfilenode().  Presumably
   that was introduced because other places in the code do so - while
   those aren't "good practice" they don't appear to be actively
   buggy (e.g. because system tables may not be targeted).

   I (Andres) should have caught this while reviewing and signficantly
   evolving the code in that commit, mea culpa.

   Fix that by instead passing in the new RelFileNode as separate
   argument to relation_set_new_filenode() and rely on the relcache to
   update the catalog entry. Also revert that the
   RelationMapUpdateMap() call was changed to immediate, and undo some
   other more unnecessary changes.

2) Document that the relation_set_new_filenode cannot rely on the
   whole relcache entry to be valid. It might be worthwhile to
   refactor the code to never have to rely on that, but given the way
   heap_create() is currently coded, that'd be a large change.

3) ATExecSetTableSpace() shouldn't do FlushRelationBuffers() itself. A
   table AM might not use shared buffers at all. Move to
   index_copy_data() and heapam_relation_copy_data().

4) heapam_relation_set_new_filenode() previously sometimes accessed
   rel->rd_rel->relpersistence rather than the `persistence`
   argument. Code movement mistake.

5) Previously heapam_relation_set_new_filenode() re-opened the smgr
   relation to create the init for, if necesary. Instead have
   RelationCreateStorage() return the SMgrRelation and use it to
   create the init fork.

6) Add a note about the danger of modifying the relcache directly to
   ATExecSetTableSpace() - it's currently not a bug because there's a
   check ERRORing for catalog tables.

Regression tests and assertion improvements that together trigger the
bug described in 1) will be added in a later commit, as there is a
related bug on all branches.

Reported-By: Michael Paquier
Diagnosed-By: Tom Lane and Andres Freund
Author: Andres Freund
Reviewed-By: Tom Lane
Discussion: https://postgr.es/m/20190418011430.GA19133@paquier.xyz
2019-04-29 19:28:05 -07:00
Peter Geoghegan 9ee7414ed0 Remove obsolete _bt_insert_parent() comment.
Remove a comment that refers to a coding practice that was fully removed
by commit a8b8f4db, which introduced MarkBufferDirty().  It looks like
the comment was even obsolete before then, since it concerns
write-ordering dependencies with synchronous buffer writes.
2019-04-29 14:14:38 -07:00
Tom Lane a1a789eb5a In walreceiver, don't try to do ereport() in a signal handler.
This is quite unsafe, even for the case of ereport(FATAL) where we won't
return control to the interrupted code, and despite this code's use of
a flag to restrict the areas where we'd try to do it.  It's possible
for example that we interrupt malloc or free while that's holding a lock
that's meant to protect against cross-thread interference.  Then, any
attempt to do malloc or free within ereport() will result in a deadlock,
preventing the walreceiver process from exiting in response to SIGTERM.
We hypothesize that this explains some hard-to-reproduce failures seen
in the buildfarm.

Hence, get rid of the immediate-exit code in WalRcvShutdownHandler,
as well as the logic associated with WalRcvImmediateInterruptOK.
Instead, we need to take care that potentially-blocking operations
in the walreceiver's data transmission logic (libpqwalreceiver.c)
will respond reasonably promptly to the process's latch becoming
set and then call ProcessWalRcvInterrupts.  Much of the needed code
for that was already present in libpqwalreceiver.c.  I refactored
things a bit so that all the uses of PQgetResult use latch-aware
waiting, but didn't need to do much more.

These changes should be enough to ensure that libpqwalreceiver.c
will respond promptly to SIGTERM whenever it's waiting to receive
data.  In principle, it could block for a long time while waiting
to send data too, and this patch does nothing to guard against that.
I think that that hazard is mostly theoretical though: such blocking
should occur only if we fill the kernel's data transmission buffers,
and we don't generally send enough data to make that happen without
waiting for input.  If we find out that the hazard isn't just
theoretical, we could fix it by using PQsetnonblocking, but that
would require more ticklish changes than I care to make now.

This is a bug fix, but it seems like too big a change to push into
the back branches without much more testing than there's time for
right now.  Perhaps we'll back-patch once we have more confidence
in the change.

Patch by me; thanks to Thomas Munro for review.

Discussion: https://postgr.es/m/20190416070119.GK2673@paquier.xyz
2019-04-29 12:26:07 -04:00
Michael Paquier 9c592896d9 Fix some typos
Author: Daniel Gustafsson
Discussion: https://postgr.es/m/42kEeWei6VxLGh12QbR08hiI5Pm-c3XgbK7qj393PSttEhVbnnQoFXHKzXjPRZLUpndWAfHIuZuUqGZBzyXadmEUCSqm9xphWur_I8vESMA=@yesql.se
2019-04-29 23:52:42 +09:00
Alvaro Herrera ffbce803e6 Message fixes 2019-04-29 10:05:45 -04:00
Peter Eisentraut cd3e27464c Fix potential catalog corruption with temporary identity columns
If a temporary table with an identity column and ON COMMIT DROP is
created in a single-statement transaction (not useful, but allowed),
it would leave the catalog corrupted.  We need to add a
CommandCounterIncrement() so that PreCommit_on_commit_actions() sees
the created dependency between table and sequence and can clean it
up.

The analogous and more useful case of doing this in a transaction
block already runs some CommandCounterIncrement() before it gets to
the on-commit cleanup, so it wasn't a problem in practical use.

Several locations for placing the new CommandCounterIncrement() call
were discussed.  This patch places it at the end of
standard_ProcessUtility().  That would also help if other commands
were to create catalog entries that some on-commit action would like
to see.

Bug: #15631
Reported-by: Serge Latyntsev <dnsl48@gmail.com>
Author: Peter Eisentraut <peter.eisentraut@2ndquadrant.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
2019-04-29 08:49:03 +02:00
Tom Lane c3f67ed6e4 Do pre-release housekeeping on catalog data, and fix jsonpath send/recv.
Run renumber_oids.pl to move high-numbered OIDs down, as per pre-beta
tasks specified by RELEASE_CHANGES.  (The only change is 8394 -> 3428.)

Also run reformat_dat_file.pl while I'm here.

While looking at the reformat diffs, I chanced to notice that type
jsonpath had typsend and typreceive = '-', which surely is not the
intention given that jsonpath_send and jsonpath_recv exist.
Fix that.  It's safe to assume that these functions have never been
tested :-(.  I didn't try, but somebody should.
2019-04-28 17:16:50 -04:00
Noah Misch 90e7f31773 Use preprocessor conditions compatible with Emacs indent.
Emacs wrongly indented hundreds of subsequent lines.
2019-04-28 12:56:53 -07:00
Tom Lane e481d26285 Clean up minor warnings from buildfarm.
Be more consistent about use of XXXGetDatum macros in new jsonpath
code.  This is mostly to avoid having code that looks randomly
different from everyplace else that's doing the exact same thing.

In pg_regress.c, avoid an unreferenced-function warning from
compilers that don't understand pg_attribute_unused().  Putting
the function inside the same #ifdef as its only caller is more
straightforward coding anyway.

In be-secure-openssl.c, avoid use of pg_attribute_unused() on a label.
That's pretty creative, but there's no good reason to suppose that
it's portable, and there's absolutely no need to use goto's here in the
first place.  (This wasn't actually causing any buildfarm complaints,
but it's new code in v12 so it has no portability track record.)
2019-04-28 12:45:55 -04:00
Tom Lane 48f8fa9ce0 Portability fix for zic.c.
Missed an inttypes.h dependency in previous patch.  Per buildfarm.
2019-04-26 21:20:55 -04:00
Tom Lane acb897b806 Sync our copy of the timezone library with IANA release tzcode2019a.
This corrects a small bug in zic that caused it to output an incorrect
year-2440 transition in the Africa/Casablanca zone.

More interestingly, zic has grown a "-r" option that limits the range of
zone transitions that it will put into the output files.  That might be
useful to people who don't like the weird GMT offsets that tzdb likes
to use for very old dates.  It appears that for dates before the cutoff
time specified with -r, zic will use the zone's standard-time offset
as of the cutoff time.  So for example one might do

	make install ZIC_OPTIONS='-r @-1893456000'

to cause all dates before 1910-01-01 to be treated as though 1910
standard time prevailed indefinitely far back.  (Don't blame me for
the unfriendly way of specifying the cutoff time --- it's seconds
since or before the Unix epoch.  You can use extract(epoch ...)
to calculate it.)

As usual, back-patch to all supported branches.
2019-04-26 19:46:26 -04:00
Tom Lane d312de3fc0 Update time zone data files to tzdata release 2019a.
DST law changes in Palestine and Metlakatla.
Historical corrections for Israel.

Etc/UCT is now a backward-compatibility link to Etc/UTC, instead
of being a separate zone that generates the abbreviation "UCT",
which nowadays is typically a typo.  Postgres will still accept
"UCT" as an input zone name, but it won't output it.
2019-04-26 17:56:26 -04:00
Tom Lane c01eb619a8 Apply stopgap fix for bug #15672.
Fix DefineIndex so that it doesn't attempt to pass down a to-be-reused
index relfilenode to a child index creation, and fix TryReuseIndex
to not think that reuse is sensible for a partitioned index.

In v11, this fixes a problem where ALTER TABLE on a partitioned table
could assign the same relfilenode to several different child indexes,
causing very nasty catalog corruption --- in fact, attempting to DROP
the partitioned table then leads not only to a database crash, but to
inability to restart because the same crash will recur during WAL replay.

Either of these two changes would be enough to prevent the failure, but
since neither action could possibly be sane, let's put in both changes
for future-proofing.

In HEAD, no such bug manifests, but that's just an accidental consequence
of having changed the pg_class representation of partitioned indexes to
have relfilenode = 0.  Both of these changes still seem like smart
future-proofing.

This is only a stop-gap because the code for ALTER TABLE on a partitioned
table with a no-op type change still leaves a great deal to be desired.
As the added regression tests show, it gets things wrong for comments on
child indexes/constraints, and it is regenerating child indexes it doesn't
have to.  However, fixing those problems will take more work which may not
get back-patched into v11.  We need a fix for the corruption problem now.

Per bug #15672 from Jianing Yang.

Patch by me, regression test cases based on work by Amit Langote,
who also did a lot of the investigative work.

Discussion: https://postgr.es/m/15672-b9fa7db32698269f@postgresql.org
2019-04-26 17:18:07 -04:00
Alvaro Herrera 7fcdb5e002 pg_dump: store unused attribs as NULL instead of '\0'
Commit f831d4accd changed pg_dump to emit (and pg_restore to
understand) NULLs for unused members in ArchiveEntry structs, as a side
effect of some code beautification.  That broke pg_restore of dumps
generated with older pg_dump, however, so it was reverted in
19455c9f56.  Since the archiver version number has been bumped in
3b925e905d, we can put it back.

Author: Dmitry Dolgov
Discussion: https://postgr.es/m/CA+q6zcXx0XHqLsFJLaUU2j5BDiBAHig=YRoBC_YVq7VJGvzBEA@mail.gmail.com
2019-04-26 12:05:17 -04:00
Alvaro Herrera 05b38c7e63 Fix partitioned index attachment
When an existing index in a partition is attached to a new index on
its parent, we forgot to set the "relispartition" flag correctly, which
meant that it was not possible to find the index in various operations,
such as adding a foreign key constraint that references that partitioned
table.  One of four places that was assigning the parent index was
forgetting to do that, so fix by shifting responsibility of updating the
flag to the routine that changes the parent.

Author: Amit Langote, Álvaro Herrera
Reported-by: Hubert "depesz" Lubaczewski
Discussion: https://postgr.es/m/CA+HiwqHMsRtRYRWYTWavKJ8x14AFsv7bmAV46mYwnfD3vy8goQ@mail.gmail.com
2019-04-25 11:22:29 -04:00
Fujii Masao c247ae0922 Fix file path in comment. 2019-04-25 23:49:37 +09:00
Fujii Masao 978b032d1f Fix function names in comments.
Commit 3eb77eba5a renamed some functions, but forgot to
update some comments referencing to those functions.
This commit fixes those function names in the comments.

Kyotaro Horiguchi
2019-04-25 23:43:48 +09:00
Alvaro Herrera 87259588d0 Fix tablespace inheritance for partitioned rels
Commit ca4103025d left a few loose ends.  The most important one
(broken pg_dump output) is already fixed by virtue of commit
3b23552ad8, but some things remained:

* When ALTER TABLE rewrites tables, the indexes must remain in the
  tablespace they were originally in.  This didn't work because
  index recreation during ALTER TABLE runs manufactured SQL (yuck),
  which runs afoul of default_tablespace in competition with the parent
  relation tablespace.  To fix, reset default_tablespace to the empty
  string temporarily, and add the TABLESPACE clause as appropriate.

* Setting a partitioned rel's tablespace to the database default is
  confusing; if it worked, it would direct the partitions to that
  tablespace regardless of default_tablespace.  But in reality it does
  not work, and making it work is a larger project.  Therefore, throw
  an error when this condition is detected, to alert the unwary.

Add some docs and tests, too.

Author: Álvaro Herrera
Discussion: https://postgr.es/m/CAKJS1f_1c260nOt_vBJ067AZ3JXptXVRohDVMLEBmudX1YEx-A@mail.gmail.com
2019-04-25 10:31:32 -04:00
Alvaro Herrera 3b23552ad8 Make pg_dump emit ATTACH PARTITION instead of PARTITION OF
Using PARTITION OF can result in column ordering being changed from the
database being dumped, if the partition uses a column layout different
from the parent's.  It's not pg_dump's job to editorialize on table
definitions, so this is not acceptable; back-patch all the way back to
pg10, where partitioned tables where introduced.

This change also ensures that partitions end up in the correct
tablespace, if different from the parent's; this is an oversight in
ca4103025d (in pg12 only).  Partitioned indexes (in pg11) don't have
this problem, because they're already created as independent indexes and
attached to their parents afterwards.

This change also has the advantage that the partition is restorable from
the dump (as a standalone table) even if its parent table isn't
restored.

Author: David Rowley
Reviewed-by: Álvaro Herrera
Discussion: https://postgr.es/m/CAKJS1f_1c260nOt_vBJ067AZ3JXptXVRohDVMLEBmudX1YEx-A@mail.gmail.com
Discussion: https://postgr.es/m/20190423185007.GA27954@alvherre.pgsql
2019-04-24 15:30:37 -04:00
Tom Lane 0fae846232 Fix some minor postmaster-state-machine issues.
In sigusr1_handler, don't ignore PMSIGNAL_ADVANCE_STATE_MACHINE based
on pmState.  The restriction is unnecessary (PostmasterStateMachine
should work in any state), not future-proof (since it makes too many
assumptions about why the signal might be sent), and broken even today
because a race condition can make it necessary to respond to the signal
in PM_WAIT_READONLY state.  The race condition seems unlikely, but
if it did happen, a hot-standby postmaster could fail to shut down
after receiving a smart-shutdown request.

In MaybeStartWalReceiver, don't clear the WalReceiverRequested flag
if the fork attempt fails.  Leaving it set allows us to try
again in future iterations of the postmaster idle loop.  (The startup
process would eventually send a fresh request signal, but this change
may allow us to retry the fork sooner.)

Remove an obsolete comment and unnecessary test in
PostmasterStateMachine's handling of PM_SHUTDOWN_2 state.  It's not
possible to have a live walreceiver in that state, and AFAICT has not
been possible since commit 5e85315ea.  This isn't a live bug, but the
false comment is quite confusing to readers.

In passing, rearrange sigusr1_handler's CheckPromoteSignal tests so that
we don't uselessly perform stat() calls that we're going to ignore the
results of.

Add some comments clarifying the behavior of MaybeStartWalReceiver;
I very nearly rearranged it in a way that'd reintroduce the race
condition fixed in e5d494d78.  Mea culpa for not commenting that
properly at the time.

Back-patch to all supported branches.  The PMSIGNAL_ADVANCE_STATE_MACHINE
change is the only one of even minor significance, but we might as well
keep this code in sync across branches.

Discussion: https://postgr.es/m/9001.1556046681@sss.pgh.pa.us
2019-04-24 14:15:44 -04:00
Alvaro Herrera 0a999e1290 Unify error messages
... for translatability purposes.
2019-04-24 09:26:13 -04:00
Andres Freund fdc7efcc30 Allow pg_class xid & multixid horizons to not be set.
This allows table AMs that don't need these horizons. This was already
documented in the tableam relation_set_new_filenode callback, but an
assert prevented if from actually working (the test AM code contained
the change itself). Defang the asserts in the general code, and move
the stronger ones into heap AM.

Relatedly, after CLUSTER/VACUUM, we'd always assign a relfrozenxid /
relminmxid. Change the table_relation_copy_for_cluster() interface to
allow the AM to overwrite the horizons that get set on the pg_class
entry.  This'd also in the future allow AMs like heap to compute a
relfrozenxid during rewrite that's the table's actual minimum rather
than a pre-determined value.  Arguably it'd have been better to move
the whole computation / setting of those values into the callback, but
it seems likely that for other reasons it'd be better to be able to
use one value to vacuum/cluster multiple tables (e.g. a toast's
horizon shouldn't be different than the table's).

Reported-By: Heikki Linnakangas
Author: Andres Freund
Discussion: https://postgr.es/m/9a7fb9cc-2419-5db7-8840-ddc10c93f122@iki.fi
2019-04-23 21:42:12 -07:00
Tom Lane 7ad1cd31bf Repair assorted issues in locale data extraction.
cache_locale_time (extraction of LC_TIME-related info) had never been
taught the lessons we previously learned about extraction of info related
to LC_MONETARY and LC_NUMERIC.  Specifically, commit 95a777c61 taught
PGLC_localeconv() that data coming out of localeconv() was in an encoding
determined by the relevant locale, but we didn't realize that there's a
similar issue with strftime().  And commit a4930e7ca hardened
PGLC_localeconv() against errors occurring partway through, but failed
to do likewise for cache_locale_time().  So, rearrange the latter
function to perform encoding conversion and not risk failure while
it's got the locales set to temporary values.

This time around I also changed PGLC_localeconv() to treat it as FATAL
if it can't restore the previous settings of the locale values.  There
is no reason (except possibly OOM) for that to fail, and proceeding with
the wrong locale values seems like a seriously bad idea --- especially
on Windows where we have to also temporarily change LC_CTYPE.  Also,
protect against the possibility that we can't identify the codeset
reported for LC_MONETARY or LC_NUMERIC; rather than just failing,
try to validate the data without conversion.

The user-visible symptom this fixes is that if LC_TIME is set to a locale
name that implies an encoding different from the database encoding,
non-ASCII localized day and month names would be retrieved in the wrong
encoding, leading to either unexpected encoding-conversion error reports
or wrong output from to_char().  The other possible failure modes are
unlikely enough that we've not seen reports of them, AFAIK.

The encoding conversion problems do not manifest on Windows, since
we'd already created special-case code to handle that issue there.

Per report from Juan José Santamaría Flecha.  Back-patch to all
supported versions.

Juan José Santamaría Flecha and Tom Lane

Discussion: https://postgr.es/m/CAC+AXB22So5aZm2vZe+MChYXec7gWfr-n-SK-iO091R0P_1Tew@mail.gmail.com
2019-04-23 18:51:30 -04:00
Tom Lane e0fb4c9d01 Remove useless comment.
Commit e439c6f0c removed IndexStmt.relationId, but not the comment
that had been added to explain it.  Said comment was therefore
very confusing.
2019-04-23 17:17:26 -04:00
Peter Geoghegan 9b10926263 Prevent O(N^2) unique index insertion edge case.
Commit dd299df8 made nbtree treat heap TID as a tiebreaker column,
establishing the principle that there is only one correct location (page
and page offset number) for every index tuple, no matter what.
Insertions of tuples into non-unique indexes proceed as if heap TID
(scan key's scantid) is just another user-attribute value, but
insertions into unique indexes are more delicate.  The TID value in
scantid must initially be omitted to ensure that the unique index
insertion visits every leaf page that duplicates could be on.  The
scantid is set once again after unique checking finishes successfully,
which can force _bt_findinsertloc() to step right one or more times, to
locate the leaf page that the new tuple must be inserted on.

Stepping right within _bt_findinsertloc() was assumed to occur no more
frequently than stepping right within _bt_check_unique(), but there was
one important case where that assumption was incorrect: inserting a
"duplicate" with NULL values.  Since _bt_check_unique() didn't do any
real work in this case, it wasn't appropriate for _bt_findinsertloc() to
behave as if it was finishing off a conventional unique insertion, where
any existing physical duplicate must be dead or recently dead.
_bt_findinsertloc() might have to grovel through a substantial portion
of all of the leaf pages in the index to insert a single tuple, even
when there were no dead tuples.

To fix, treat insertions of tuples with NULLs into a unique index as if
they were insertions into a non-unique index: never unset scantid before
calling _bt_search() to descend the tree, and bypass _bt_check_unique()
entirely.  _bt_check_unique() is no longer responsible for incoming
tuples with NULL values.

Discussion: https://postgr.es/m/CAH2-Wzm08nr+JPx4jMOa9CGqxWYDQ-_D4wtPBiKghXAUiUy-nQ@mail.gmail.com
2019-04-23 10:33:57 -07:00
Tom Lane f4a3fdfbdc Avoid order-of-execution problems with ALTER TABLE ADD PRIMARY KEY.
Up to now, DefineIndex() was responsible for adding attnotnull constraints
to the columns of a primary key, in any case where it hadn't been
convenient for transformIndexConstraint() to mark those columns as
is_not_null.  It (or rather its minion index_check_primary_key) did this
by executing an ALTER TABLE SET NOT NULL command for the target table.

The trouble with this solution is that if we're creating the index due
to ALTER TABLE ADD PRIMARY KEY, and the outer ALTER TABLE has additional
sub-commands, the inner ALTER TABLE's operations executed at the wrong
time with respect to the outer ALTER TABLE's operations.  In particular,
the inner ALTER would perform a validation scan at a point where the
table's storage might be inconsistent with its catalog entries.  (This is
on the hairy edge of being a security problem, but AFAICS it isn't one
because the inner scan would only be interested in the tuples' null
bitmaps.)  This can result in unexpected failures, such as the one seen
in bug #15580 from Allison Kaptur.

To fix, let's remove the attempt to do SET NOT NULL from DefineIndex(),
reducing index_check_primary_key's role to verifying that the columns are
already not null.  (It shouldn't ever see such a case, but it seems wise
to keep the check for safety.)  Instead, make transformIndexConstraint()
generate ALTER TABLE SET NOT NULL subcommands to be executed ahead of
the ADD PRIMARY KEY operation in every case where it can't force the
column to be created already-not-null.  This requires only minor surgery
in parse_utilcmd.c, and it makes for a much more satisfying spec for
transformIndexConstraint(): it's no longer having to take it on faith
that someone else will handle addition of NOT NULL constraints.

To make that work, we have to move the execution of AT_SetNotNull into
an ALTER pass that executes ahead of AT_PASS_ADD_INDEX.  I moved it to
AT_PASS_COL_ATTRS, and put that after AT_PASS_ADD_COL to avoid failure
when the column is being added in the same command.  This incidentally
fixes a bug in the only previous usage of AT_PASS_COL_ATTRS, for
AT_SetIdentity: it didn't work either for a newly-added column.

Playing around with this exposed a separate bug in ALTER TABLE ONLY ...
ADD PRIMARY KEY for partitioned tables.  The intent of the ONLY modifier
in that context is to prevent doing anything that would require holding
lock for a long time --- but the implied SET NOT NULL would recurse to
the child partitions, and do an expensive validation scan for any child
where the column(s) were not already NOT NULL.  To fix that, invent a
new ALTER subcommand AT_CheckNotNull that just insists that a child
column be already NOT NULL, and apply that, not AT_SetNotNull, when
recursing to children in this scenario.  This results in a slightly laxer
definition of ALTER TABLE ONLY ... SET NOT NULL for partitioned tables,
too: that command will now work as long as all children are already NOT
NULL, whereas before it just threw up its hands if there were any
partitions.

In passing, clean up the API of generateClonedIndexStmt(): remove a
useless argument, ensure that the output argument is not left undefined,
update the header comment.

A small side effect of this change is that no-such-column errors in ALTER
TABLE ADD PRIMARY KEY now produce a different message that includes the
table name, because they are now detected by the SET NOT NULL step which
has historically worded its error that way.  That seems fine to me, so
I didn't make any effort to avoid the wording change.

The basic bug #15580 is of very long standing, and these other bugs
aren't new in v12 either.  However, this is a pretty significant change
in the way ALTER TABLE ADD PRIMARY KEY works.  On balance it seems best
not to back-patch, at least not till we get some more confidence that
this patch has no new bugs.

Patch by me, but thanks to Jie Zhang for a preliminary version.

Discussion: https://postgr.es/m/15580-d1a6de5a3d65da51@postgresql.org
Discussion: https://postgr.es/m/1396E95157071C4EBBA51892C5368521017F2E6E63@G08CNEXMBPEKD02.g08.fujitsu.local
2019-04-23 12:25:27 -04:00
Tom Lane c06e3550dc Don't request pretty-printed output from xmlNodeDump().
xml.c passed format = 1 to xmlNodeDump(), resulting in sometimes getting
extra whitespace (newlines + spaces) in the output.  We don't really want
that, first because whitespace might be semantically significant in some
XML uses, and second because it happens only very inconsistently.  Only
one case in our regression tests is affected.

This potentially affects the results of xpath() and the XMLTABLE construct,
when emitting nodeset values.

Note that the older code in contrib/xml2 doesn't do this; it seems
to have been an aboriginal bad decision in commit ea3b212fe.

While this definitely seems like a bug to me, the small number of
complaints to date argues against back-patching a behavioral change.
Hence, fix in HEAD only, at least for now.

Per report from Jean-Marc Voillequin.

Discussion: https://postgr.es/m/1EC8157EB499BF459A516ADCF135ADCE3A23A9CA@LON-WGMSX712.ad.moodys.net
2019-04-23 10:51:07 -04:00
Michael Paquier ccae190b91 Fix detection of passwords hashed with MD5 or SCRAM-SHA-256
This commit fixes a couple of issues related to the way password
verifiers hashed with MD5 or SCRAM-SHA-256 are detected, leading to
being able to store in catalogs passwords which do not follow the
supported hash formats:
- A MD5-hashed entry was checked based on if its header uses "md5" and
if the string length matches what is expected.  Unfortunately the code
never checked if the hash only used hexadecimal characters, as reported
by Tom Lane.
- A SCRAM-hashed entry was checked based on only its header, which
should be "SCRAM-SHA-256$", but it never checked for any fields
afterwards, as reported by Jonathan Katz.

Backpatch down to v10, which is where SCRAM has been introduced, and
where password verifiers in plain format have been removed.

Author: Jonathan Katz
Reviewed-by: Tom Lane, Michael Paquier
Discussion: https://postgr.es/m/016deb6b-1f0a-8e9f-1833-a8675b170aa9@postgresql.org
Backpatch-through: 10
2019-04-23 15:43:21 +09:00
Andres Freund b5f58cf213 Convert gist to compute page level xid horizon on primary.
Due to parallel development, gist added the missing conflict
information in c952eae52a, while 558a9165e0 moved that computation
to the primary for the index types that already had it.  Thus adapt
gist to also compute on the primary, using
index_compute_xid_horizon_for_tuples() instead of its own copy of the
logic.

This also adds pg_waldump support for XLOG_GIST_DELETE records, which
previously was not properly present.

Bumps WAL version.

Author: Andres Freund
Discussion: https://postgr.es/m/20190406050243.bszosdg4buvabfrt@alap3.anarazel.de
2019-04-22 14:28:30 -07:00
Tomas Vondra d08c44f7a4 Fix mvdistinct and dependencies size calculations
The formulas used to calculate size while (de)serializing mvndistinct
and functional dependencies were based on offset() of the structs. But
that is incorrect, because the structures are not copied directly, we
we copy the individual fields directly.

At the moment this works fine, because there is no alignment padding
on any platform we support. But it might break if we ever added some
fields into any of the structs, for example. It's also confusing.

Fixed by reworking the macros to directly sum sizes of serialized
fields. The macros are now useful only for serialiation, so there is
no point in keeping them in the public header file. So make them
private by moving them to the .c files.

Also adds a couple more asserts to check the serialization, and fixes
an incorrect allocation of MVDependency instead of (MVDependency *).

Reported-By: Tom Lane
Discussion: https://postgr.es/m/29785.1555365602@sss.pgh.pa.us
2019-04-21 20:23:34 +02:00
Stephen Frost eb882a1b71 GSSAPI: Improve documentation and tests
The GSSAPI encryption patch neglected to update the protocol
documentation to describe how to set up a GSSAPI encrypted connection
from a client to the server, so fix that by adding the appropriate
documentation to protocol.sgml.

The tests added for encryption support were overly long and couldn't be
run in parallel due to race conditions; this was largely because each
test was setting up its own KDC to perform the tests.  Instead, merge
the authentication tests and the encryption tests into the original
test, where we only create one KDC to run the tests with.  Also, have
the tests check what the server's opinion is of the connection and if it
was GSS authenticated or encrypted using the pg_stat_gssapi view.

In passing, fix the libpq label for GSSENC-Mode to be consistent with
the "PGGSSENCMODE" environment variable.

Missing protocol documentation pointed out by Michael Paquier.
Issues with the tests pointed out by Tom Lane and Peter Eisentraut.

Refactored tests and added documentation by me.

Reviewed by Robbie Harwood (protocol documentation) and Michael Paquier
(rework of the tests).
2019-04-19 21:22:22 -04:00
Andres Freund b8b94ea129 Fix slot type issue for fuzzy distance index scan over out-of-core table AM.
For amcanreorderby scans the nodeIndexscan.c's reorder queue holds
heap tuples, but the underlying table likely does not. Before this fix
we'd return different types of slots, depending on whether the tuple
came from the reorder queue, or from the index + table.

While that could be fixed by signalling that the node doesn't return a
fixed type of slot, it seems better to instead remove the separate
slot for the reorder queue, and use ExecForceStoreHeapTuple() to store
tuples from the queue. It's not particularly common to need
reordering, after all.

This reverts most of the iss_ReorderQueueSlot related changes to
nodeIndexscan.c made in 1a0586de36, except that now
ExecForceStoreHeapTuple() is used instead of ExecStoreHeapTuple().

Noticed when testing zheap against the in-core version of tableam.

Author: Andres Freund
2019-04-19 11:42:37 -07:00
Andres Freund 88e6ad3054 Fix two memory leaks around force-storing tuples in slots.
As reported by Tom, when ExecStoreMinimalTuple() had to perform a
conversion to store the minimal tuple in the slot, it forgot to
respect the shouldFree flag, and leaked the tuple into the current
memory context if true.  Fix that by freeing the tuple in that case.

Looking at the relevant code made me (Andres) realize that not having
the shouldFree parameter to ExecForceStoreHeapTuple() was a bad
idea. Some callers had to locally implement the necessary logic, and
in one case it was missing, creating a potential per-group leak in
non-hashed aggregation.

The choice to not free the tuple in ExecComputeStoredGenerated() is
not pretty, but not introduced by this commit - I'll start a separate
discussion about it.

Reported-By: Tom Lane
Discussion: https://postgr.es/m/366.1555382816@sss.pgh.pa.us
2019-04-19 11:39:56 -07:00
Tom Lane 4d5840cea9 Fix problems with auto-held portals.
HoldPinnedPortals() did things in the wrong order: it must not mark
a portal autoHeld until it's been successfully held.  Otherwise,
a failure while persisting the portal results in a server crash
because we think the portal is in a good state when it's not.

Also add a check that portal->status is READY before attempting to
hold a pinned portal.  We have such a check before the only other
use of HoldPortal(), so it seems unwise not to check it here.

Lastly, rethink the responsibility for where to call HoldPinnedPortals.
The comment for it imagined that it was optional for any individual PL
to call it or not, but that cannot be the case: if some outer level of
procedure has a pinned portal, failing to persist it when an inner
procedure commits is going to be trouble.  Let's have SPI do it instead
of the individual PLs.  That's not a complete solution, since in theory
a PL might not be using SPI to perform commit/rollback, but such a PL
is going to have to be aware of lots of related requirements anyway.
(This change doesn't cause an API break for any external PLs that might
be calling HoldPinnedPortals per the old regime, because calling it
twice during a commit or rollback sequence won't hurt.)

Per bug #15703 from Julian Schauder.  Back-patch to v11 where this code
came in.

Discussion: https://postgr.es/m/15703-c12c5bc0ea34ba26@postgresql.org
2019-04-19 11:20:37 -04:00
Michael Paquier 148266fa35 Fix collection of typos and grammar mistakes in docs and comments
Author: Justin Pryzby
Discussion: https://postgr.es/m/20190330224333.GQ5815@telsasoft.com
2019-04-19 16:57:40 +09:00
Michael Paquier 5a9323eab6 Remove dependency to pageinspect in recovery tests
If contrib/pageinspect is not installed, this causes the test checking
the minimum recovery point to fail.  The point is that the dependency
with pageinspect is not really necessary as the test does also all
checks with an offline cluster by scanning directly the on-disk pages,
which is enough for the purpose of the test.

Per complaint from Tom Lane.

Author: Michael Paquier
Discussion: https://postgr.es/m/17806.1555566345@sss.pgh.pa.us
2019-04-19 15:51:23 +09:00
Andres Freund 75e03eabea Fix potential use-after-free for BEFORE UPDATE row triggers on non-core AMs.
When such a trigger returns the old row version, it naturally get
stored in the slot for the trigger result. When a table AMs doesn't
store HeapTuples internally, ExecBRUpdateTriggers() frees the old row
version passed to triggers - but before this fix it might still be
referenced by the slot holding the new tuple.

Noticed when running the out-of-core zheap AM against the in-core
version of tableam.

Author: Andres Freund
2019-04-18 17:53:54 -07:00
Peter Eisentraut bb385c4fb0 Fix handling of temp and unlogged tables in FOR ALL TABLES publications
If a FOR ALL TABLES publication exists, temporary and unlogged tables
are ignored for publishing changes.  But CheckCmdReplicaIdentity()
would still check in that case that such a table has a replica
identity set before accepting updates.  To fix, have
GetRelationPublicationActions() return that such a table publishes no
actions.

Discussion: https://www.postgresql.org/message-id/f3f151f7-c4dd-1646-b998-f60bd6217dd3@2ndquadrant.com
2019-04-18 08:55:55 +02:00
Andres Freund 4d01835927 pg_dump: Remove stray option parsing support for -o.
I (Andres) missed this in 578b229718, the removal of WITH OIDS
support.

Author: Daniel Verite
Discussion: https://postgr.es/m/f06e9735-3717-4904-8c95-47d0b9c3bb10@manitou-mail.org
2019-04-17 17:28:02 -07:00
Alvaro Herrera 421a2c4832 Tie loose ends in psql's new \dP command
* Remove one unnecessary pg_class join in SQL command.  Not needed,
  because we use a regclass cast instead.

* Doc: refer to "partitioned relations" rather than specifically tables,
  since indexes are also displayed.

* Rename "On table" column to "Table", for consistency with \di.

Author: Justin Pryzby
Discussion: https://postgr.es/m/20190407212525.GB10080@telsasoft.com
2019-04-17 18:38:49 -04:00
Alvaro Herrera b036982db7 psql: display tablespace for partitioned indexes
Nothing was shown previously.
2019-04-17 18:17:43 -04:00
Bruce Momjian fb9c475597 postgresql.conf.sample: add proper defaults for include actions
Previously, include actions include_dir, include_if_exists, and include
listed commented-out values which were not the defaults, which is
inconsistent with other entries.  Instead, replace them with '', which
is the default value.

Reported-by: Emanuel Araújo

Discussion: https://postgr.es/m/CAMuTAkYMx6Q27wpELDR3_v9aG443y7ZjeXu15_+1nGUjhMWOJA@mail.gmail.com

Backpatch-through: 9.4
2019-04-17 18:12:10 -04:00
Tom Lane 1a75c1d0c5 Fix unportable code in pgbench.
The buildfarm points out that UINT64_FORMAT might not work with sscanf;
it's calibrated for our printf implementation, which might not agree
with the platform-supplied sscanf.  Fall back to just accepting an
unsigned long, which is already more than the documentation promises.

Oversight in e6c3ba7fb; back-patch to v11, as that was.
2019-04-17 17:30:29 -04:00
Tom Lane 8cde7f4948 Fix assorted minor bogosity in GSSAPI transport error messages.
I noted that some buildfarm members were complaining about %ld being
used to format values that are (probably) declared size_t.  Use %zu
instead, and insert a cast just in case some versions of the GSSAPI
API declare the length field differently.  While at it, clean up
gratuitous differences in wording of equivalent messages, show
the complained-of length in all relevant messages not just some,
include trailing newline where needed, adjust random deviations
from project-standard code layout and message style, etc.
2019-04-17 17:06:50 -04:00
Tom Lane b4f96d69ad Minor jsonpath fixes.
Restore missed "make clean" rule, fix misspelling.

John Naylor

Discussion: https://postgr.es/m/CACPNZCt5B8jDCCGQiFoSuqmg-za_NCy4QDioBTLaNRih9+-bXg@mail.gmail.com
2019-04-17 13:37:00 -04:00
Magnus Hagander 252b707bc4 Return NULL for checksum failures if checksums are not enabled
Returning 0 could falsely indicate that there is no problem. NULL
correctly indicates that there is no information about potential
problems.

Also return 0 as numbackends instead of NULL for shared objects (as no
connection can be made to a shared object only).

Author: Julien Rouhaud <rjuju123@gmail.com>
Reviewed-by: Robert Treat <rob@xzilla.net>
2019-04-17 13:51:48 +02:00
Michael Paquier 9010156445 Fix thinko introduced by 82a5649 in slot.c
When saving a replication slot, failing to close the temporary path used
to save the slot information is considered as a failure and reported as
such.  However the code forgot to leave immediately as other failure
paths do.

Noticed while looking up at this area of the code for another patch.
2019-04-17 10:01:22 +09:00
Michael Paquier 47ac2033d4 Simplify some ERROR paths clearing wait events and transient files
Transient files and wait events get normally cleaned up when seeing an
exception (be it in the context of a transaction for a backend or
another process like the checkpointer), hence there is little point in
complicating error code paths to do this work.  This shaves a bit of
code, and removes some extra handling with errno which needed to be
preserved during the cleanup steps done.

Reported-by: Masahiko Sawada
Author: Michael Paquier
Reviewed-by: Tom Lane, Masahiko Sawada
Discussion: https://postgr.es/m/CAD21AoDhHYVq5KkXfkaHhmjA-zJYj-e4teiRAJefvXuKJz1tKQ@mail.gmail.com
2019-04-17 09:51:45 +09:00
Michael Paquier a6dcf9df4d Rework handling of invalid indexes with REINDEX CONCURRENTLY
Per discussion with others, allowing REINDEX INDEX CONCURRENTLY to work
for invalid indexes when working directly on them can have a lot of
value to unlock situations with invalid indexes without having to use a
dance involving DROP INDEX followed by an extra CREATE INDEX
CONCURRENTLY (which would not work for indexes with constraint
dependency anyway).  This also does not create extra bloat on the
relation involved as this works on individual indexes, so let's enable
it.

Note that REINDEX TABLE CONCURRENTLY still bypasses invalid indexes as
we don't want to bloat the number of indexes defined on a relation in
the event of multiple and successive failures of REINDEX CONCURRENTLY.

More regression tests are added to cover those behaviors, using an
invalid index created with CREATE INDEX CONCURRENTLY.

Reported-by: Dagfinn Ilmari Mannsåker, Álvaro Herrera
Author: Michael Paquier
Reviewed-by: Peter Eisentraut, Dagfinn Ilmari Mannsåker
Discussion: https://postgr.es/m/20190411134947.GA22043@alvherre.pgsql
2019-04-17 09:33:51 +09:00
Michael Paquier 5ed4b123b6 Remove duplicate assignment when initializing logical decoder context
The private data in the WAL reader is already getting set when
allocating it.

Author: Antonin Houska
Reviewed-by: Tom Lane
Discussion: https://postgr.es/m/30563.1555329094@localhost
2019-04-16 15:08:38 +09:00
Noah Misch e12a472612 Don't write to stdin of a test process that could have already exited.
Instead, close that stdin.  Per buildfarm member conchuela.  Back-patch
to 9.6, where the test was introduced.

Discussion: https://postgr.es/m/26478.1555373328@sss.pgh.pa.us
2019-04-15 18:13:44 -07:00
Tom Lane dde7fb7836 Use [FLEXIBLE_ARRAY_MEMBER] not [1] in MultiSortSupportData.
This struct seems to have not gotten the word about preferred
coding style for variable-length arrays.
2019-04-15 19:32:44 -04:00
Tomas Vondra dbb984128e Convert pre-existing stats_ext tests to new style
The regression tests added in commit 7300a69950 test cardinality
estimates using a function that extracts the interesting pieces
from the EXPLAIN output, instead of testing the whole plan. That
seems both easier to understand and less fragile, so this applies
the same approach to pre-existing tests of ndistinct coefficients
and functional dependencies.

Discussion: https://postgr.es/m/dfdac334-9cf2-2597-fb27-f0fb3753f435@2ndquadrant.com
2019-04-16 00:02:22 +02:00
Tomas Vondra 3824ca30d1 Fix pg_mcv_list deserialization
The memcpy() was copying type OIDs in the wrong direction, so the
deserialized MCV list always had them as 0. This is mostly harmless
except when printing the data in pg_mcv_list_items(), in which case
it reported

    ERROR:  cache lookup failed for type 0

Also added a simple regression test for pg_mcv_list_items() function,
printing a single-item MCV list.

Reported-By: Dean Rasheed
Discussion: https://postgr.es/m/CAEZATCX6T0iDTTZrqyec4Cd6b4yuL7euu4=rQRXaVBAVrUi1Cg@mail.gmail.com
2019-04-16 00:01:39 +02:00
Tom Lane 4b40e44f07 Fix failure with textual partition hash keys.
Commit 5e1963fb7 overlooked two places in partbounds.c that now
need to pass a collation identifier to the hash functions for
a partition key column.

Amit Langote, per report from Jesper Pedersen

Discussion: https://postgr.es/m/a620f85a-42ab-e0f3-3337-b04b97e2e2f5@redhat.com
2019-04-15 16:47:09 -04:00
Tom Lane 47169c2550 Avoid possible regression test instability in timestamp.sql.
Concurrent autovacuum could result in a change in the order of the
live rows in timestamp_tbl.  While this would not happen with the
default autovacuum parameters, it's fairly easy to hit if
autovacuum_vacuum_threshold is made small enough to allow autovac
to decide to process this table.  That's a stumbling block for trying
to exercise autovacuum aggressively using the core regression tests.

To fix, replace an unqualified DELETE with a TRUNCATE.  There's a
similar DELETE just above (and no order-sensitive queries between),
so this doesn't lose any test coverage and might indeed be argued
to improve it.

Discussion: https://postgr.es/m/17428.1555348950@sss.pgh.pa.us
2019-04-15 16:20:01 -04:00
Alexander Korotkov 1e87198182 Fix division by zero in _bt_vacuum_needs_cleanup()
Checks inside _bt_vacuum_needs_cleanup() allow division by zero to happen when
metad->btm_last_cleanup_num_heap_tuples == 0.  This commit adjusts the
expression so that no division by zero might happen.

Reported-by: Piotr Stefaniak
Discussion: https://postgr.es/m/DB8PR03MB5931C41F7787A95313F08322F22A0%40DB8PR03MB5931.eurprd03.prod.outlook.com
Reviewed-by: Masahiko Sawada
Backpatch-through: 11
2019-04-15 20:20:43 +03:00
Etsuro Fujita 3a45321a49 Fix thinko in ExecCleanupTupleRouting().
Commit 3f2393edef changed ExecCleanupTupleRouting() so that it skipped
cleaning up subplan resultrels before calling EndForeignInsert(), but
that would cause an issue: when those resultrels were foreign tables,
the FDWs would fail to shut down.  Repair by skipping it after calling
EndForeignInsert() as before.

Author: Etsuro Fujita
Reviewed-by: David Rowley and Amit Langote
Discussion: https://postgr.es/m/5CAF3B8F.2090905@lab.ntt.co.jp
2019-04-15 19:01:09 +09:00
Peter Eisentraut abb9c63b2c Unbreak index optimization for LIKE on bytea
The same code is used to handle both text and bytea, but bytea is not
collation-aware, so we shouldn't call get_collation_isdeterministic()
in that case, since that will error out with an invalid collation.

Reported-by: Jeevan Chalke <jeevan.chalke@enterprisedb.com>
Discussion: https://www.postgresql.org/message-id/flat/CAM2%2B6%3DWaf3qJ1%3DyVTUH8_yG-SC0xcBMY%2BSFLhvKKNnWNXSUDBw%40mail.gmail.com
2019-04-15 09:29:17 +02:00
Michael Paquier c34677fdaa Fix SHOW ALL command for non-superusers with replication connection
Since Postgres 10, SHOW commands can be triggered with replication
connections in a WAL sender context, however it missed that a
transaction context is needed for syscache lookups.  This commit makes
sure that the syscache lookups can happen correctly by setting a
transaction context when running SHOW commands in a WAL sender.

Superuser-only parameters can be displayed using SHOW commands not only
to superusers, but also to members of system role pg_read_all_settings,
which requires a syscache lookup to check if the connected role is a
member of this system role or not, or the instance crashes.  Superusers
do not need to check the syscache so it worked correctly in this case.

New tests are added to cover this issue.

Reported-by: Alexander Kukushkin
Author: Michael Paquier
Reviewed-by: Álvaro Herrera
Discussion: https://postgr.es/m/15734-2daa8761eeed8e20@postgresql.org
Backpatch-through: 10
2019-04-15 12:34:32 +09:00
Noah Misch 4ab02e8156 Test both 0.0.0.0 and 127.0.0.x addresses to find a usable port.
Commit c098509927 changed
PostgresNode::get_new_node() to probe 0.0.0.0 instead of 127.0.0.1, but
the new test was less effective for Windows native Perl.  This increased
the failure rate of buildfarm members bowerbird and jacana.  Instead,
test 0.0.0.0 and concrete addresses.  This restores the old level of
defense, but the algorithm is still subject to its longstanding time of
check to time of use race condition.  Back-patch to 9.6, like the
previous change.

Discussion: https://postgr.es/m/GrdLgAdUK9FdyZg8VIcTDKVOkys122ZINEb3CjjoySfGj2KyPiMKTh1zqtRp0TAD7FJ27G-OBB3eplxIB5GhcQH5o8zzGZfp0MuJaXJxVxk=@yesql.se
2019-04-14 20:02:19 -07:00
Michael Paquier d9f543e9e9 Switch TAP tests of pg_rewind to use non-superuser role, take two
Up to now the tests of pg_rewind have been using a superuser for all its
tests (which is the default of many tests actually, and something that
ought to be reviewed) when involving an online source server, still it
is possible to use a non-superuser role to do that as long as this role
is granted permissions to execute all the source-side functions used for
the rewind.  This is possible since v11, and was already documented as
of bfc8068.

PostgresNode::init is extended so as callers of this routine can add
extra options to configure the authentication of a new node, which gets
used by this commit, and allows the tests to work properly on Windows
where SSPI is used.

This will allow to catch up easily any change in pg_rewind if the tool
begins to use more backend-side functions, so as the properties
introduced by v11 are kept.

Per suggestion from Peter Eisentraut.

Author: Michael Paquier
Reviewed-by: Magnus Hagander
Discussion: https://postgr.es/m/20190411041336.GM2728@paquier.xyz
2019-04-14 18:47:51 +09:00
Noah Misch 9daefff122 MSYS: Translate REGRESS_SHLIB to a Windows file name.
Per buildfarm member jacana.  Back-patch to v11; earlier branches skip
the affected test under msys.

Discussion: https://postgr.es/m/GrdLgAdUK9FdyZg8VIcTDKVOkys122ZINEb3CjjoySfGj2KyPiMKTh1zqtRp0TAD7FJ27G-OBB3eplxIB5GhcQH5o8zzGZfp0MuJaXJxVxk=@yesql.se
2019-04-14 00:42:34 -07:00
Noah Misch 947a35014f When Perl "kill(9, ...)" fails, try "pg_ctl kill".
Per buildfarm member jacana, the former fails under msys Perl 5.8.8.
Back-patch to 9.6, like the code in question.

Discussion: https://postgr.es/m/GrdLgAdUK9FdyZg8VIcTDKVOkys122ZINEb3CjjoySfGj2KyPiMKTh1zqtRp0TAD7FJ27G-OBB3eplxIB5GhcQH5o8zzGZfp0MuJaXJxVxk=@yesql.se
2019-04-13 11:09:27 -07:00
Tom Lane 5f1433ac5e Prevent memory leaks associated with relcache rd_partcheck structures.
The original coding of generate_partition_qual() just copied the list
of predicate expressions into the global CacheMemoryContext, making it
effectively impossible to clean up when the owning relcache entry is
destroyed --- the relevant code in RelationDestroyRelation() only managed
to free the topmost List header :-(.  This resulted in a session-lifespan
memory leak whenever a table partition's relcache entry is rebuilt.
Fortunately, that's not normally a large data structure, and rebuilds
shouldn't occur all that often in production situations; but this is
still a bug worth fixing back to v10 where the code was introduced.

To fix, put the cached expression tree into its own small memory context,
as we do with other complicated substructures of relcache entries.
Also, deal more honestly with the case that a partition has an empty
partcheck list; while that probably isn't a case that's very interesting
for production use, it's legal.

In passing, clarify comments about how partitioning-related relcache
data structures are managed, and add some Asserts that we're not leaking
old copies when we overwrite these data fields.

Amit Langote and Tom Lane

Discussion: https://postgr.es/m/7961.1552498252@sss.pgh.pa.us
2019-04-13 13:22:26 -04:00
Noah Misch c098509927 Consistently test for in-use shared memory.
postmaster startup scrutinizes any shared memory segment recorded in
postmaster.pid, exiting if that segment matches the current data
directory and has an attached process.  When the postmaster.pid file was
missing, a starting postmaster used weaker checks.  Change to use the
same checks in both scenarios.  This increases the chance of a startup
failure, in lieu of data corruption, if the DBA does "kill -9 `head -n1
postmaster.pid` && rm postmaster.pid && pg_ctl -w start".  A postmaster
will no longer stop if shmat() of an old segment fails with EACCES.  A
postmaster will no longer recycle segments pertaining to other data
directories.  That's good for production, but it's bad for integration
tests that crash a postmaster and immediately delete its data directory.
Such a test now leaks a segment indefinitely.  No "make check-world"
test does that.  win32_shmem.c already avoided all these problems.  In
9.6 and later, enhance PostgresNode to facilitate testing.  Back-patch
to 9.4 (all supported versions).

Reviewed (in earlier versions) by Daniel Gustafsson and Kyotaro HORIGUCHI.

Discussion: https://postgr.es/m/20190408064141.GA2016666@rfd.leadboat.com
2019-04-12 22:36:38 -07:00
Michael Paquier db8db624e8 Revert "Switch TAP tests of pg_rewind to use a role with minimal permissions"
This reverts commit d4e2a84, which added a new user with limited
permissions to run the TAP tests of pg_rewind.  Buildfarm machine
members on Windows jacana and bowerbird have been complaining about
that, the new role not being able to run the rewind because SSPI is not
configured to allow it.

Fixing the test requires passing down directly the new user to
pg_regress with --create-role so as SSPI can work properly.

Reported-by: Andrew Dunstan
Discussion: https://postgr.es/m/3cd43d33-f415-cc41-ade3-7230ab15b2c9@2ndQuadrant.com
2019-04-13 13:20:21 +09:00
Magnus Hagander 77bd49adba Show shared object statistics in pg_stat_database
This adds a row to the pg_stat_database view with datoid 0 and datname
NULL for those objects that are not in a database. This was added
particularly for checksums, but we were already tracking more satistics
for these objects, just not returning it.

Also add a checksum_last_failure column that holds the timestamptz of
the last checksum failure that occurred in a database (or in a
non-dataabase file), if any.

Author: Julien Rouhaud <rjuju123@gmail.com>
2019-04-12 14:04:50 +02:00
Peter Eisentraut ef6f30fe77 Fix REINDEX CONCURRENTLY of partitions
In case of a partition index, when swapping the old and new index, we
also need to attach the new index as a partition and detach the old
one.  Also, to handle partition indexes, we not only need to change
dependencies referencing the index, but also dependencies of the index
referencing something else.  The previous code did this only
specifically for a constraint, but we also need to do this for
partitioned indexes.  So instead write a generic function that does it
for all dependencies.

Author: Michael Paquier <michael@paquier.xyz>
Author: Peter Eisentraut <peter.eisentraut@2ndquadrant.com>
Discussion: https://www.postgresql.org/message-id/flat/DF4PR8401MB11964EDB77C860078C343BEBEE5A0%40DF4PR8401MB1196.NAMPRD84.PROD.OUTLOOK.COM#154df1fedb735190a773481765f7b874
2019-04-12 08:36:05 +02:00
Thomas Munro f7feb020c3 Fix GetNewTransactionId()'s interaction with xidVacLimit.
Commit ad308058 switched to returning a FullTransactionId, but
failed to load the potentially updated value in the case where
xidVacLimit is reached and we release and reacquire the lock.
Repair, closing bug #15727.

While reviewing that commit, also fix the size computation used
by EstimateTransactionStateSize() and switch to the mul_size()
macro traditionally used in such expressions.

Author: Thomas Munro
Reported-by: Roman Zharkov
Discussion: https://postgr.es/m/15727-0be246e7d852d229%40postgresql.org
2019-04-12 16:47:50 +12:00
Michael Paquier d87ab88686 Fix typos in reloptions.c
Author: Kirk Jamison
Discussion: https://postgr.es/m/D09B13F772D2274BB348A310EE3027C6493463@g01jpexmbkw24
2019-04-12 12:56:38 +09:00
Michael Paquier d4e2a843e6 Switch TAP tests of pg_rewind to use a role with minimal permissions
Up to now the tests of pg_rewind have been using a superuser for all the
tests (which is the default of many tests actually, and something that
ought to be reviewed) when involving an online source server, still it
is possible to use a non-superuser role to do that as long as this role
is granted permissions to execute all the source-side functions used for
the rewind.  This is possible since v11, and was already documented as
of bfc8068.

This will allow to catch up easily any change in pg_rewind if the tool
begins to use more backend-side functions, so as the properties
introduced by v11 are kept.

Per suggestion from Peter Eisentraut.

Author: Michael Paquier
Reviewed-by: Magnus Hagander
Discussion: https://postgr.es/m/20190411041336.GM2728@paquier.xyz
2019-04-12 10:46:43 +09:00
Michael Paquier d527fda621 Fix more strcmp() calls using boolean-like comparisons for result checks
Such calls can confuse the reader as strcmp() uses an integer as result.
The places patched here have been spotted by Thomas Munro, David Rowley
and myself.

Author: Michael Paquier
Reviewed-by: David Rowley
Discussion: https://postgr.es/m/20190411021946.GG2728@paquier.xyz
2019-04-12 10:16:49 +09:00
Tom Lane 798070ec05 Re-order some regression test scripts for more parallelism.
Move the strings, numerology, insert, insert_conflict, select and
errors tests to be parts of nearby parallel groups, instead of
executing by themselves.  (Moving "select" required adjusting the
constraints test, which uses a table named "tmp" as select also
does.  There don't seem to be any other conflicts.)

Move psql and stats_ext to the next parallel group, where the rules
test also has a long runtime.  To make it safe to run stats_ext in
parallel with rules, I adjusted the latter to only dump views/rules
from the pg_catalog and public schemas, which was what it was doing
anyway.  stats_ext makes some views in a transient schema, which now
will not affect rules.

Reorder serial_schedule to match parallel_schedule.

Discussion: https://postgr.es/m/735.1554935715@sss.pgh.pa.us
2019-04-11 18:16:50 -04:00
Tom Lane 5874c70557 Speed up sort-order-comparison tests in create_index_spgist.
This test script verifies that KNN searches of an SP-GiST index
produce the same sort order as a seqscan-and-sort.  The FULL JOINs
used for that are exceedingly slow, however.  Investigation shows
that the problem is that the initial join is on the rank() values,
and we have a lot of duplicates due to the data set containing 1000
duplicate points.  We're therefore going to produce 1000000 join
rows that have to be thrown away again by the join filter.

We can improve matters by using row_number() instead of rank(),
so that the initial join keys are unique.  The catch is that
that makes the results sensitive to the sorting of rows with
equal distances from the reference point.  That doesn't matter
for the actually-equal points, but as luck would have it, the
data set also contains two distinct points that have identical
distances to the origin.  So those two rows could legitimately
appear in either order, causing unwanted output from the check
queries.

However, it doesn't seem like it's the job of this test to
check whether the <-> operator correctly computes distances;
its charter is just to verify that SP-GiST emits the values
in distance order.  So we can dodge the indeterminacy problem
by having the check only compare row numbers and distances
not the actual point values.

This change reduces the run time of create_index_spgist by a good
three-quarters, on my machine, with ensuing beneficial effects on
the runtime of create_index (thanks to interactions with CREATE
INDEX CONCURRENTLY tests in the latter).  I see a net improvement
of more than 2X in the runtime of their parallel test group.

Discussion: https://postgr.es/m/735.1554935715@sss.pgh.pa.us
2019-04-11 17:01:35 -04:00
Tom Lane 385d396b80 Split up a couple of long-running regression test scripts.
The point of this change is to increase the potential for parallelism
while running the core regression tests.  Most people these days are
using parallel testing modes on multi-core machines, so we might as
well try a bit harder to keep multiple cores busy.  Hence, a test that
runs much longer than others in its parallel group is a candidate to
be sub-divided.

In this patch, create_index.sql and join.sql are split up.
I haven't changed the content of the tests in any way, just
moved them.

I moved create_index.sql's SP-GiST-related tests into a new script
create_index_spgist, and moved its btree multilevel page deletion test
over to the existing script btree_index.  (btree_index is a more natural
home for that test, and it's shorter than others in its parallel group,
so this doesn't hurt total runtime of that group.)  There might be
room for more aggressive splitting of create_index, but this is enough
to improve matters considerably.

Likewise, I moved join.sql's "exercises for the hash join code" into
a new file join_hash.  Those exercises contributed three-quarters of
the script's runtime.  Which might well be excessive ... but for the
moment, I'm satisfied with shoving them into a different parallel
group, where they can share runtime with the roughly-equally-lengthy
gist test.

(Note for anybody following along at home: there are interesting
interactions between the runtimes of create_index and anything running
in parallel with it, because the tests of CREATE INDEX CONCURRENTLY
in that file will repeatedly block waiting for concurrent transactions
to commit.  As committed in this patch, create_index and
create_index_spgist have roughly equal runtimes, but that's mostly an
artifact of forced synchronization of the CONCURRENTLY tests; when run
serially, create_index is much faster.  A followup patch will reduce
the runtime of create_index_spgist and thereby also create_index.)

Discussion: https://postgr.es/m/735.1554935715@sss.pgh.pa.us
2019-04-11 16:15:54 -04:00
Tom Lane 6726d8d476 Move plpgsql error-trapping tests to a new module-specific test file.
The test for statement timeout has a 2-second timeout, which was only
moderately annoying when it was written, but nowadays it contributes
a pretty significant chunk of the elapsed time needed to run the core
regression tests on a fast machine.  We can improve this situation by
pushing the test into a plpgsql-specific test file instead of having
it in a core regression test.  That's a clean win when considering
just the core tests.  Even when considering check-world or a buildfarm
test run, we should come out ahead because the core tests get run
more times in those sequences.

Furthermore, since the plpgsql tests aren't currently parallelized,
it seems likely that the timing problems reflected in commit f1e671a0b
(which increased that timeout from 1 sec to 2) will be much less severe
in this context.  Hence, let's try cutting the timeout back to 1 second
in hopes of a further win for check-world.  We can undo that if
buildfarm experience proves it to be a bad idea.

To give the new test file some modicum of intellectual coherency,
I moved the surrounding tests related to error-trapping along with
the statement timeout test proper.  Those other tests don't run long
enough to have any particular bearing on test-runtime considerations.
The tests are the same as before, except with minor adjustments to
not depend on an externally-created table.

Discussion: https://postgr.es/m/735.1554935715@sss.pgh.pa.us
2019-04-11 15:09:28 -04:00
Michael Meskes ed16ba3248 Fix off-by-one check that can lead to a memory overflow in ecpg.
Patch by Liu Huailing <liuhuailing@cn.fujitsu.com>
2019-04-11 20:56:17 +02:00
Tom Lane 4aaa3b5cf1 Remove duplicative polygon SP-GiST sequencing test.
Code coverage comparisons confirm that the tests using
quad_poly_tbl_ord_seq1/quad_poly_tbl_ord_idx1 hit no code
paths not also covered by the similar tests using
quad_poly_tbl_ord_seq2/quad_poly_tbl_ord_idx2.  Since these
test cases are pretty expensive, they need to contribute more
than zero benefit.

In passing, make quad_poly_tbl_ord_seq2 a temp table, since
there seems little reason to keep it around after the test.

Discussion: https://postgr.es/m/735.1554935715@sss.pgh.pa.us
2019-04-11 14:35:47 -04:00
Tom Lane f72d9a5e7d Remove redundant and ineffective test for btree insertion fast path.
indexing.sql's test for this feature was added along with the
feature in commit 2b2727343.  However, shortly later that test was
rendered ineffective by commit 074251db6, which limited when the
optimization would be applied, so that the test didn't test it.
Since then, commit dd299df81 added new tests (in btree_index.sql)
that actually do test the feature.  Code coverage comparisons
confirm that this test sequence adds no meaningful coverage, and
it's rather expensive, accounting for nearly half of the runtime
of indexing.sql according to my measurements.  So let's remove it.

Per advice from Peter Geoghegan.

Discussion: https://postgr.es/m/735.1554935715@sss.pgh.pa.us
2019-04-11 13:15:59 -04:00
Alvaro Herrera 65d857d92c Fix declaration after statement
This style is frowned upon.  I inadvertently introduced one in commit
fe0e0b4fc7.  (My compiler does not complain about it, even though
-Wdeclaration-after-statement is specified.  Weird.)

Author: Masahiko Sawada
2019-04-10 22:31:40 -04:00
Tom Lane 4cae471d1b Fix backwards test in operator_precedence_warning logic.
Warnings about unary minus might have been wrong.  It's a bit
surprising that nobody noticed yet ... probably the precedence-warning
feature hasn't really been used much in the field.

Rikard Falkeborn

Discussion: https://postgr.es/m/CADRDgG6fzA8A2oeygUw4=o7ywo4kvz26NxCSgpq22nMD73Bx4Q@mail.gmail.com
2019-04-10 19:02:21 -04:00
Peter Eisentraut 765525c8c2 pg_restore: Make not verbose by default
This was accidentally changed in
cc8d415117.

Reported-by: Christoph Berg <myon@debian.org>
2019-04-10 11:45:00 +02:00
Amit Kapila bdf35744bd Avoid counting transaction stats for parallel worker cooperating
transaction.

The transaction that is initiated by the parallel worker to cooperate
with the actual transaction started by the main backend to complete the
query execution should not be counted as a separate transaction.  The
other internal transactions started and committed by the parallel worker
are still counted as separate transactions as we that is what we do in
other places like autovacuum.

This will partially fix the bloat in transaction stats due to additional
transactions performed by parallel workers.  For a complete fix, we need to
decide how we want to show all the transactions that are started internally
for various operations and that is a matter of separate patch.

Reported-by: Haribabu Kommi
Author: Haribabu Kommi
Reviewed-by: Amit Kapila, Jamison Kirk and Rahila Syed
Backpatch-through: 9.6
Discussion: https://postgr.es/m/CAJrrPGc9=jKXuScvNyQ+VNhO0FZk7LLAShAJRyZjnedd2D61EQ@mail.gmail.com
2019-04-10 08:24:15 +05:30
Thomas Munro d614aae02e Improve comment in sync.h.
Per off-list complaint from Andres Freund.
2019-04-10 12:49:49 +12:00
Thomas Munro 255044889d Fix typos. 2019-04-10 09:21:06 +12:00
Tom Lane 9476131278 Prevent inlining of multiply-referenced CTEs with outer recursive refs.
This has to be prevented because inlining would result in multiple
self-references, which we don't support (and in fact that's disallowed
by the SQL spec, see statements about linearly vs. nonlinearly
recursive queries).  Bug fix for commit 608b167f9.

Per report from Yaroslav Schekin (via Andrew Gierth)

Discussion: https://postgr.es/m/87wolmg60q.fsf@news-spur.riddles.org.uk
2019-04-09 15:47:35 -04:00
Alvaro Herrera 4dba0f6dc4 Fix typo 2019-04-09 13:00:12 -04:00
Alvaro Herrera fe0e0b4fc7 Fix memory leak in pgbench
Commit 25ee70511e introduced a memory leak in pgbench: some PGresult
structs were not being freed during error bailout, because we're now
doing more PQgetResult() calls than previously.  Since there's more
cleanup code outside the discard_response() routine than in it, refactor
the cleanup code, removing the routine.

This has little effect currently, since we abandon processing after
hitting errors, but if we ever get further pgbench features (such as
testing for serializable transactions), it'll matter.

Per Coverity.

Reviewed-by: Michaël Paquier
2019-04-09 12:46:34 -04:00
Tom Lane a2418f9e23 Test some more cases with partitioned tables in EvalPlanQual.
We weren't testing anything involving EPQ on UPDATEs that move tuples
into different partitions.  Depending on the implementation,
it might be that these cases aren't actually very interesting ...
but given our thin coverage of EPQ in general, I think it's a good
idea to have a test case.

Amit Langote, minor tweak by me

Discussion: https://postgr.es/m/7889df35-ad1a-691a-00e3-4d4b18f364e3@lab.ntt.co.jp
2019-04-09 11:43:03 -04:00
Noah Misch ba3fb5d4fb Define WIN32_STACK_RLIMIT throughout win32 and cygwin builds.
The MSVC build system already did this, and commit
617dc6d299 used it in a second file.
Back-patch to 9.4, like that commit.

Discussion: https://postgr.es/m/CAA8=A7_1SWc3+3Z=-utQrQFOtrj_DeohRVt7diA2tZozxsyUOQ@mail.gmail.com
2019-04-09 08:25:39 -07:00
Peter Eisentraut 9efe068e48 Replace tabs with spaces in one .sql file
Let's at least keep this consistent within the same file.
2019-04-09 15:54:37 +02:00
Heikki Linnakangas 16954e22e2 Fix example in comment.
Author: Adrien Nayrat
2019-04-09 08:33:42 +03:00
Noah Misch 617dc6d299 Avoid "could not reattach" by providing space for concurrent allocation.
We've long had reports of intermittent "could not reattach to shared
memory" errors on Windows.  Buildfarm member dory fails that way when
PGSharedMemoryReAttach() execution overlaps with creation of a thread
for the process's "default thread pool".  Fix that by providing a second
region to receive asynchronous allocations that would otherwise intrude
into UsedShmemSegAddr.  In pgwin32_ReserveSharedMemoryRegion(), stop
trying to free reservations landing at incorrect addresses; the caller's
next step has been to terminate the affected process.  Back-patch to 9.4
(all supported versions).

Reviewed by Tom Lane.  He also did much of the prerequisite research;
see commit bcbf2346d6.

Discussion: https://postgr.es/m/20190402135442.GA1173872@rfd.leadboat.com
2019-04-08 21:39:00 -07:00
Andres Freund 6421011ea2 tableam: comment and formatting fixes.
Author: Heikki Linnakangas
Discussion: https://postgr.es/m/9a7fb9cc-2419-5db7-8840-ddc10c93f122@iki.fi
2019-04-08 16:24:36 -07:00
Tom Lane 45f8eaa8e3 Fix improper interaction of FULL JOINs with lateral references.
join_is_legal() needs to reject forming certain outer joins in cases
where that would lead the planner down a blind alley.  However, it
mistakenly supposed that the way to handle full joins was to treat them
as applying the same constraints as for left joins, only to both sides.
That doesn't work, as shown in bug #15741 from Anthony Skorski: given
a lateral reference out of a join that's fully enclosed by a full join,
the code would fail to believe that any join ordering is legal, resulting
in errors like "failed to build any N-way joins".

However, we don't really need to consider full joins at all for this
purpose, because we effectively force them to be evaluated in syntactic
order, and that order is always legal for lateral references.  Hence,
get rid of this broken logic for full joins and just ignore them instead.

This seems to have been an oversight in commit 7e19db0c0.
Back-patch to all supported branches, as that was.

Discussion: https://postgr.es/m/15741-276f1f464b3f40eb@postgresql.org
2019-04-08 16:09:26 -04:00
Tom Lane a8cb8f1246 Fix EvalPlanQualStart to handle partitioned result rels correctly.
The es_root_result_relations array needs to be shallow-copied in the
same way as the main es_result_relations array, else EPQ rechecks on
partitioned result relations fail, as seen in bug #15677 from
Norbert Benkocs.

Amit Langote, isolation test case added by me

Discussion: https://postgr.es/m/15677-0bf089579b4cd02d@postgresql.org
Discussion: https://postgr.es/m/19321.1554567786@sss.pgh.pa.us
2019-04-08 12:20:22 -04:00
Fujii Masao 119dcfad98 Add vacuum_truncate reloption.
vacuum_truncate controls whether vacuum tries to truncate off
any empty pages at the end of the table. Previously vacuum always
tried to do the truncation. However, the truncation could cause
some problems; for example, ACCESS EXCLUSIVE lock needs to
be taken on the table during the truncation and can cause
the query cancellation on the standby even if hot_standby_feedback
is true. Setting this reloption to false can be helpful to avoid
such problems.

Author: Tsunakawa Takayuki
Reviewed-By: Julien Rouhaud, Masahiko Sawada, Michael Paquier, Kirk Jamison and Fujii Masao
Discussion: https://postgr.es/m/CAHGQGwE5UqFqSq1=kV3QtTUtXphTdyHA-8rAj4A=Y+e4kyp3BQ@mail.gmail.com
2019-04-08 16:43:57 +09:00
Andres Freund 4c9e1bd0a3 Reset memory context once per tuple in validateForeignKeyConstraint.
When using tableam ExecFetchSlotHeapTuple() might return a separately
allocated tuple. We could use the shouldFree argument to explicitly
free it, but it seems more robust to to protect

Also add a CHECK_FOR_INTERRUPTS() after each tuple. It's likely that
each AM has (heap does) a CFI somewhere in the relevant path, but it
seems more robust to have one in validateForeignKeyConstraint()
itself.

Note that this only affects the cases that couldn't be optimized to be
verified with a query.

Author: Andres Freund
Reviewed-By: Tom Lane (in an earlier version)
Discussion:
    https://postgr.es/m/19030.1554574075@sss.pgh.pa.us
    https://postgr.es/m/CAKJS1f_SHKcPYMsi39An5aUjhAcEMZb6Cx1Sj1QWEWSiKJkBVQ@mail.gmail.com
    https://postgr.es/m/20180711185628.mrvl46bjgk2uxoki@alap3.anarazel.de
2019-04-07 22:42:42 -07:00
Andres Freund 41f5e04aec Fix a number of issues around modifying a previously updated row.
This commit fixes three, unfortunately related, issues:

1) Since 5db6df0c01, the introduction of DML via tableam, it was
   possible to trigger "ERROR: unexpected table_lock_tuple status: 1"
   when updating a row that was previously updated in the same
   transaction - but only when the previously updated row was before
   updated in a concurrent transaction (and READ COMMITTED was
   used). The reason for that was that that case simply wasn't
   expected. Fixing that lead to:

2) Even before the above commit, there were error checks (introduced
   in 6868ed7491) preventing a row being updated by different
   commands within the same statement (say in a function called by an
   UPDATE) - but that check wasn't performed when the row was first
   updated in a concurrent transaction - instead the second update was
   silently skipped in that case. After this change we throw the same
   error as we'd without the concurrent transaction.

3) The error messages (introduced in 6868ed7491) preventing such
   updates emitted the same error message for both DELETE and
   UPDATE ("tuple to be updated was already modified by an operation
   triggered by the current command"). While that could be changed
   separately, it made it hard to write tests that verify the correct
   correct behavior of the code.

This commit changes heap's implementation of table_lock_tuple() to
return TM_SelfModified instead of TM_Invisible (previously loosely
modeled after EvalPlanQualFetch), and teaches nodeModifyTable.c to
handle that in response to table_lock_tuple() and not just in response
to table_(delete|update).

Additionally it fixes the wrong error message (see 3 above). The
comment for table_lock_tuple() is also adjusted to state that
TM_Deleted won't return information in TM_FailureData - it'll not
always be available.

This also adds tests to ensure that DELETE/UPDATE correctly error out
when affecting a row that concurrently was modified by another
transaction.

Author: Andres Freund
Reported-By: Tom Lane, when investigating a bug bug fix to another bug
    by Amit Langote
Discussion: https://postgr.es/m/19321.1554567786@sss.pgh.pa.us
2019-04-07 22:14:47 -07:00
Michael Paquier 964bae4d84 Add more tests for partition tuple routing with dropped attributes
As bug #15733 has proved, we are lacking coverage for partition tuple
routing with dropped attributes when involving three levels of
partitioning or more.  There was only an active bug in this area for
v11, and HEAD is proving to handle those scenarios fine, still it lacked
some coverage for the previous problem.

Author: Amit Langote, Michael Paquier
Discussion: https://postgr.es/m/15733-7692379e310b80ec@postgresql.org
2019-04-08 13:44:55 +09:00
Tom Lane 80a96e066e Avoid fetching past the end of the indoption array.
pg_get_indexdef_worker carelessly fetched indoption entries even for
non-key index columns that don't have one.  99.999% of the time this
would be harmless, since the code wouldn't examine the value ... but
some fine day this will be a fetch off the end of memory, resulting
in SIGSEGV.

Detected through valgrind testing.  Odd that the buildfarm's valgrind
critters haven't noticed.
2019-04-07 18:19:16 -04:00
Alvaro Herrera 1c5d9270e3 psql \dP: list partitioned tables and indexes
The new command lists partitioned relations (tables and/or indexes),
possibly with their sizes, possibly including partitioned partitions;
their parents (if not top-level); if indexes show the tables they belong
to; and their descriptions.

While there are various possible improvements to this, having it in this
form is already a great improvement over not having any way to obtain
this report.

Author: Pavel Stěhule, with help from Mathias Brossard, Amit Langote and
	Justin Pryzby.
Reviewed-by: Amit Langote, Mathias Brossard, Melanie Plageman,
	Michaël Paquier, Álvaro Herrera
2019-04-07 15:07:21 -04:00
Tom Lane 159970bcad Clean up side-effects of commits ab5fcf2b0 et al.
Before those commits, partitioning-related code in the executor could
assume that ModifyTableState.resultRelInfo[] contains only leaf partitions.
However, now a fully-pruned update results in a dummy ModifyTable that
references the root partitioned table, and that breaks some stuff.

In v11, this led to an assertion or core dump in the tuple routing code.
Fix by disabling tuple routing, since we don't need that anyway.
(I chose to do that in HEAD as well for safety, even though the problem
doesn't manifest in HEAD as it stands.)

In v10, this confused ExecInitModifyTable's decision about whether it
needed to close the root table.  But we can get rid of that altogether
by being smarter about where to find the root table.

Note that since the referenced commits haven't shipped yet, this
isn't fixing any bug the field has seen.

Amit Langote, per a report from me

Discussion: https://postgr.es/m/20710.1554582479@sss.pgh.pa.us
2019-04-07 12:54:22 -04:00
Peter Eisentraut 03f9e5cba0 Report progress of REINDEX operations
This uses the same infrastructure that the CREATE INDEX progress
reporting uses.  Add a column to pg_stat_progress_create_index to
report the OID of the index being worked on.  This was not necessary
for CREATE INDEX, but it's useful for REINDEX.

Also edit the phase descriptions a bit to be more consistent with the
source code comments.

Discussion: https://www.postgresql.org/message-id/ef6a6757-c36a-9e81-123f-13b19e36b7d7%402ndquadrant.com
2019-04-07 12:35:29 +02:00
Peter Eisentraut 106f2eb664 Cast pg_stat_progress_cluster.cluster_index_relid to oid
It's tracked internally as bigint, but when presented to the user it
should be oid.
2019-04-07 10:31:32 +02:00
Tom Lane 46e3442c9e Fix failures in validateForeignKeyConstraint's slow path.
The foreign-key-checking loop in ATRewriteTables failed to ignore
relations without storage (e.g., partitioned tables), unlike the
initial loop.  This accidentally worked as long as RI_Initial_Check
succeeded, which it does in most practical cases (including all the
ones exercised in the existing regression tests :-().  However, if
that failed, as for instance when there are permissions issues,
then we entered the slow fire-the-trigger-on-each-tuple path.
And that would try to read from the referencing relation, and fail
if it lacks storage.

A second problem, recently introduced in HEAD, was that this loop
had been broken by sloppy refactoring for the tableam API changes.

Repair both issues, and add a regression test case so we have some
coverage on this code path.  Back-patch as needed to v11.

(It looks like this code could do with additional bulletproofing,
but let's get a working test case in place first.)

Hadi Moshayedi, Tom Lane, Andres Freund

Discussion: https://postgr.es/m/CAK=1=WrnNmBbe5D9sm3t0a6dnAq3cdbF1vXY816j1wsMqzC8bw@mail.gmail.com
Discussion: https://postgr.es/m/19030.1554574075@sss.pgh.pa.us
Discussion: https://postgr.es/m/20190325180405.jytoehuzkeozggxx%40alap3.anarazel.de
2019-04-06 15:09:09 -04:00
Michael Paquier 249d649996 Add support TCP user timeout in libpq and the backend server
Similarly to the set of parameters for keepalive, a connection parameter
for libpq is added as well as a backend GUC, called tcp_user_timeout.

Increasing the TCP user timeout is useful to allow a connection to
survive extended periods without end-to-end connection, and decreasing
it allows application to fail faster.  By default, the parameter is 0,
which makes the connection use the system default, and follows a logic
close to the keepalive parameters in its handling.  When connecting
through a Unix-socket domain, the parameters have no effect.

Author: Ryohei Nagaura
Reviewed-by: Fabien Coelho, Robert Haas, Kyotaro Horiguchi, Kirk
Jamison, Mikalai Keida, Takayuki Tsunakawa, Andrei Yahorau
Discussion: https://postgr.es/m/EDA4195584F5064680D8130B1CA91C45367328@G01JPEXMBYT04
2019-04-06 15:23:37 +09:00
Tom Lane 959d00e9db Use Append rather than MergeAppend for scanning ordered partitions.
If we need ordered output from a scan of a partitioned table, but
the ordering matches the partition ordering, then we don't need to
use a MergeAppend to combine the pre-ordered per-partition scan
results: a plain Append will produce the same results.  This
both saves useless comparison work inside the MergeAppend proper,
and allows us to start returning tuples after istarting up just
the first child node not all of them.

However, all is not peaches and cream, because if some of the
child nodes have high startup costs then there will be big
discontinuities in the tuples-returned-versus-elapsed-time curve.
The planner's cost model cannot handle that (yet, anyway).
If we model the Append's startup cost as being just the first
child's startup cost, we may drastically underestimate the cost
of fetching slightly more tuples than are available from the first
child.  Since we've had bad experiences with over-optimistic choices
of "fast start" plans for ORDER BY LIMIT queries, that seems scary.
As a klugy workaround, set the startup cost estimate for an ordered
Append to be the sum of its children's startup costs (as MergeAppend
would).  This doesn't really describe reality, but it's less likely
to cause a bad plan choice than an underestimated startup cost would.
In practice, the cases where we really care about this optimization
will have child plans that are IndexScans with zero startup cost,
so that the overly conservative estimate is still just zero.

David Rowley, reviewed by Julien Rouhaud and Antonin Houska

Discussion: https://postgr.es/m/CAKJS1f-hAqhPLRk_RaSFTgYxd=Tz5hA7kQ2h4-DhJufQk8TGuw@mail.gmail.com
2019-04-05 19:20:43 -04:00
Alvaro Herrera 9f06d79ef8 Add facility to copy replication slots
This allows the user to create duplicates of existing replication slots,
either logical or physical, and even changing properties such as whether
they are temporary or the output plugin used.

There are multiple uses for this, such as initializing multiple replicas
using the slot for one base backup; when doing investigation of logical
replication issues; and to select a different output plugins.

Author: Masahiko Sawada
Reviewed-by: Michael Paquier, Andres Freund, Petr Jelinek
Discussion: https://postgr.es/m/CAD21AoAm7XX8y_tOPP6j4Nzzch12FvA1wPqiO690RCk+uYVstg@mail.gmail.com
2019-04-05 18:05:18 -03:00
Thomas Munro de2b38419c Wake up interested backends when a checkpoint fails.
Commit c6c9474a switched to condition variables instead of sleep
loops to notify backends of checkpoint start and stop, but forgot
to broadcast in case of checkpoint failure.

Author: Thomas Munro
Discussion: https://postgr.es/m/CA%2BhUKGJKbCd%2B_K%2BSEBsbHxVT60SG0ivWHHAdvL0bLTUt2xpA2w%40mail.gmail.com
2019-04-06 09:31:48 +13:00
Tom Lane 478cacb50e Ensure consistent name matching behavior in processSQLNamePattern().
Prior to v12, if you used a collation-sensitive regex feature in a
pattern handled by processSQLNamePattern() (for instance, \d '\\w+'
in psql), the behavior you got matched the database's default collation.
Since commit 586b98fdf you'd usually get C-collation behavior, because
the catalog "name"-type columns are now marked as COLLATE "C".  Add
explicit COLLATE specifications to restore the prior behavior.

(Note for whoever writes the v12 release notes: the need for this shows
that while 586b98fdf preserved pre-v12 behavior of "name" columns for
simple comparison operators, it changed the behavior of regex operators
on those columns.  Although this patch fixes it for pattern matches
generated by our own tools, user-written queries will still be affected.
So we'd better mention this issue as a compatibility item.)

Daniel Vérité

Discussion: https://postgr.es/m/701e51f0-0ec0-4e70-a365-1958d66dd8d2@manitou-mail.org
2019-04-05 12:59:57 -04:00
Peter Eisentraut edda32ee25 Fix compiler warning
Rewrite get_attgenerated() to avoid compiler warning if the compiler
does not recognize that elog(ERROR) does not return.

Reported-by: David Rowley <david.rowley@2ndquadrant.com>
2019-04-05 09:23:07 +02:00
Noah Misch 82150a05be Revert "Consistently test for in-use shared memory."
This reverts commits 2f932f71d9,
16ee6eaf80 and
6f0e190056.  The buildfarm has revealed
several bugs.  Back-patch like the original commits.

Discussion: https://postgr.es/m/20190404145319.GA1720877@rfd.leadboat.com
2019-04-05 00:00:52 -07:00
Thomas Munro 794c543b17 Fix bugs in mdsyncfiletag().
Commit 3eb77eba moved a _mdfd_getseg() call from mdsync() into a new
callback function mdsyncfiletag(), but didn't get the arguments quite
right.  Without the EXTENSION_DONT_CHECK_SIZE flag we fail to open a
segment if lower-numbered segments have been truncated, and it wants
a block number rather than a segment number.

While comparing with the older coding, also remove an unnecessary
clobbering of errno, and adjust the code in mdunlinkfiletag() to
ressemble the original code from mdpostckpt() more closely instead
of using an unnecessary call to smgropen().

Author: Thomas Munro
Discussion: https://postgr.es/m/CA%2BhUKGL%2BYLUOA0eYiBXBfwW%2BbH5kFgh94%3DgQH0jHEJ-t5Y91wQ%40mail.gmail.com
2019-04-05 17:41:58 +13:00
Stephen Frost c46c85d459 Handle errors during GSSAPI startup better
There was some confusion over the format of the error message returned
from the server during GSSAPI startup; specifically, it was expected
that a length would be returned when, in reality, at this early stage in
the startup sequence, no length is returned from the server as part of
an error message.

Correct the client-side code for dealing with error messages sent by the
server during startup by simply reading what's available into our
buffer, after we've discovered it's an error message, and then reporting
back what was returned.

In passing, also add in documentation of the environment variable
PGGSSENCMODE which was missed previously, and adjust the code to look
for the PGGSSENCMODE variable (the environment variable change was
missed in the prior GSSMODE -> GSSENCMODE commit).

Error-handling issue discovered by Peter Eisentraut, the rest were items
discovered during testing of the error handling.
2019-04-04 22:52:42 -04:00
Andres Freund 57a7a3adfe Remove unused struct member, enforce multi_insert callback presence.
Author: David Rowley, Andres Freund
Discussion: https://postgr.es/m/CAKJS1f9=9phmm66diAji4gvHnWSrK7BGFoNct+mEUT_c8pPOjw@mail.gmail.com
2019-04-04 17:39:39 -07:00
Andres Freund ea97e440b8 Harden tableam against nonexistant / wrong kind of AMs.
Previously it was allowed to set default_table_access_method to an
empty string. That makes sense for default_tablespace, where that was
copied from, as it signals falling back to the database's default
tablespace. As there is no equivalent for table AMs, forbid that.

Also make sure to throw a usable error when creating a table using an
index AM, by using get_am_type_oid() to implement get_table_am_oid()
instead of a separate copy. Previously we'd error out only later, in
GetTableAmRoutine().

Thirdly remove GetTableAmRoutineByAmId() - it was only used in an
earlier version of 8586bf7ed8.

Add tests for the above (some for index AMs as well).
2019-04-04 17:39:39 -07:00
Peter Geoghegan 344b7e11bb Add test coverage for rootdescend verification.
Commit c1afd175, which added support for rootdescend verification to
amcheck, added only minimal regression test coverage.  Address this by
making sure that rootdescend verification is run on a multi-level index.
In passing, simplify some of the regression tests that exercise
multi-level nbtree page deletion.

Both issues spotted while rereviewing coverage of the nbtree patch
series using gcov.
2019-04-04 17:25:35 -07:00
Andres Freund 86b85044e8 tableam: Add table_multi_insert() and revamp/speed-up COPY FROM buffering.
This adds table_multi_insert(), and converts COPY FROM, the only user
of heap_multi_insert, to it.

A simple conversion of COPY FROM use slots would have yielded a
slowdown when inserting into a partitioned table for some
workloads. Different partitions might need different slots (both slot
types and their descriptors), and dropping / creating slots when
there's constant partition changes is measurable.

Thus instead revamp the COPY FROM buffering for partitioned tables to
allow to buffer inserts into multiple tables, flushing only when
limits are reached across all partition buffers. By only dropping
slots when there've been inserts into too many different partitions,
the aforementioned overhead is gone. By allowing larger batches, even
when there are frequent partition changes, we actuall speed such cases
up significantly.

By using slots COPY of very narrow rows into unlogged / temporary
might slow down very slightly (due to the indirect function calls).

Author: David Rowley, Andres Freund, Haribabu Kommi
Discussion:
    https://postgr.es/m/20180703070645.wchpu5muyto5n647@alap3.anarazel.de
    https://postgr.es/m/20190327054923.t3epfuewxfqdt22e@alap3.anarazel.de
2019-04-04 16:28:18 -07:00
Tom Lane 7bac3acab4 Add a "SQLSTATE-only" error verbosity option to libpq and psql.
This is intended for use mostly in test scripts for external tools,
which could do without cross-PG-version variations in error message
wording.  Of course, the SQLSTATE isn't guaranteed stable either, but
it should be more so than the error message text.

Note: there's a bit of an ABI change for libpq here, but it seems
OK because if somebody compiles against a newer version of libpq-fe.h,
and then tries to pass PQERRORS_SQLSTATE to PQsetErrorVerbosity()
of an older libpq library, it will be accepted and then act like
PQERRORS_DEFAULT, thanks to the way the tests in pqBuildErrorMessage3
have historically been phrased.  That seems acceptable.

Didier Gautheron, reviewed by Dagfinn Ilmari Mannsåker

Discussion: https://postgr.es/m/CAJRYxuKyj4zA+JGVrtx8OWAuBfE-_wN4sUMK4H49EuPed=mOBw@mail.gmail.com
2019-04-04 17:22:02 -04:00
Alvaro Herrera 413ccaa74d pg_restore: Require "-f -" to mean stdout
The previous convention that stdout was selected by default when nothing
is specified was just too error-prone.

After a suggestion from Andrew Gierth.
Author: Euler Taveira
Reviewed-by: Yoshikazu Imai, José Arthur Benetasso Villanova
Discussion: https://postgr.es/m/87sgwrmhdv.fsf@news-spur.riddles.org.uk
2019-04-04 16:54:15 -03:00
Tom Lane 9c703c169a Make queries' locking of indexes more consistent.
The assertions added by commit b04aeb0a0 exposed that there are some
code paths wherein the executor will try to open an index without
holding any lock on it.  We do have some lock on the index's table,
so it seems likely that there's no fatal problem with this (for
instance, the index couldn't get dropped from under us).  Still,
it's bad practice and we should fix it.

To do so, remove the optimizations in ExecInitIndexScan and friends
that tried to avoid taking a lock on an index belonging to a target
relation, and just take the lock always.  In non-bug cases, this
will result in no additional shared-memory access, since we'll find
in the local lock table that we already have a lock of the desired
type; hence, no significant performance degradation should occur.

Also, adjust the planner and executor so that the type of lock taken
on an index is always identical to the type of lock taken for its table,
by relying on the recently added RangeTblEntry.rellockmode field.
This avoids some corner cases where that might not have been true
before (possibly resulting in extra locking overhead), and prevents
future maintenance issues from having multiple bits of logic that
all needed to be in sync.  In addition, this change removes all core
calls to ExecRelationIsTargetRelation, which avoids a possible O(N^2)
startup penalty for queries with large numbers of target relations.
(We'd probably remove that function altogether, were it not that we
advertise it as something that FDWs might want to use.)

Also adjust some places in selfuncs.c to not take any lock on indexes
they are transiently opening, since we can assume that plancat.c
did that already.

In passing, change gin_clean_pending_list() to take RowExclusiveLock
not AccessShareLock on its target index.  Although it's not clear that
that's actually a bug, it seemed very strange for a function that's
explicitly going to modify the index to use only AccessShareLock.

David Rowley, reviewed by Julien Rouhaud and Amit Langote,
a bit of further tweaking by me

Discussion: https://postgr.es/m/19465.1541636036@sss.pgh.pa.us
2019-04-04 15:12:58 -04:00
Robert Haas a96c41feec Allow VACUUM to be run with index cleanup disabled.
This commit adds a new reloption, vacuum_index_cleanup, which
controls whether index cleanup is performed for a particular
relation by default.  It also adds a new option to the VACUUM
command, INDEX_CLEANUP, which can be used to override the
reloption.  If neither the reloption nor the VACUUM option is
used, the default is true, as before.

Masahiko Sawada, reviewed and tested by Nathan Bossart, Alvaro
Herrera, Kyotaro Horiguchi, Darafei Praliaskouski, and me.
The wording of the documentation is mostly due to me.

Discussion: http://postgr.es/m/CAD21AoAt5R3DNUZSjOoXDUY=naYPUOuffVsRzuTYMz29yLzQCA@mail.gmail.com
2019-04-04 15:04:43 -04:00
Peter Geoghegan 74eb2176bf Invalidate binary search bounds consistently.
_bt_check_unique() failed to invalidate binary search bounds in the
event of a live conflict following commit e5adcb78.  This resulted in
problems after waiting for the conflicting xact to commit or abort.  The
subsequent call to _bt_check_unique() would restore the initial binary
search bounds, rather than starting a new search.  Fix by explicitly
invalidating bounds when it becomes clear that there is a live conflict
that insertion will have to wait to resolve.

Ashutosh Sharma, with a few additional tweaks by me.

Author: Ashutosh Sharma
Reported-By: Ashutosh Sharma
Diagnosed-By: Ashutosh Sharma
Discussion: https://postgr.es/m/CAE9k0PnQp-qr-UYKMSCzdC2FBzdE4wKP41hZrZvvP26dKLonLg@mail.gmail.com
2019-04-04 09:38:08 -07:00
Stephen Frost 87e16db5eb Move the be_gssapi_get_* prototypes
The be_gssapi_get_* prototypes were put close to similar ones for SSL-
but a bit too close since that meant they ended up only being included
for SSL-enabled builds.  Move those to be under ENABLE_GSS instead.

Pointed out by Tom.
2019-04-04 11:11:46 -04:00
Thomas Munro 3eb77eba5a Refactor the fsync queue for wider use.
Previously, md.c and checkpointer.c were tightly integrated so that
fsync calls could be handed off and processed in the background.
Introduce a system of callbacks and file tags, so that other modules
can hand off fsync work in the same way.

For now only md.c uses the new interface, but other users are being
proposed.  Since there may be use cases that are not strictly SMGR
implementations, use a new function table for sync handlers rather
than extending the traditional SMGR one.

Instead of using a bitmapset of segment numbers for each RelFileNode
in the checkpointer's hash table, make the segment number part of the
key.  This requires sending explicit "forget" requests for every
segment individually when relations are dropped, but suits the file
layout schemes of proposed future users better (ie sparse or high
segment numbers).

Author: Shawn Debnath and Thomas Munro
Reviewed-by: Thomas Munro, Andres Freund
Discussion: https://postgr.es/m/CAEepm=2gTANm=e3ARnJT=n0h8hf88wqmaZxk0JYkxw+b21fNrw@mail.gmail.com
2019-04-04 23:38:38 +13:00
Noah Misch 6f0e190056 Silence -Wimplicit-fallthrough in sysv_shmem.c.
Commit 2f932f71d9 added code that elicits
a warning on buildfarm member flaviventris.  Back-patch to 9.4, like
that commit.

Reported by Andres Freund.

Discussion: https://postgr.es/m/20190404020057.galelv7by75ekqrh@alap3.anarazel.de
2019-04-03 23:23:35 -07:00
Noah Misch 16ee6eaf80 Make src/test/recovery/t/017_shm.pl safe for concurrent execution.
Buildfarm members idiacanthus and komodoensis, which share a host, both
executed this test in the same second.  That failed.  Back-patch to 9.6,
where the test first appeared.

Discussion: https://postgr.es/m/20190404020543.GA1319573@rfd.leadboat.com
2019-04-03 23:16:37 -07:00
Michael Paquier 92c76021ae Improve readability of some tests in strings.sql
c251336 has added some tests to check if a toast relation should be
empty or not, hardcoding the toast relation name when calling
pg_relation_size().  pg_class.reltoastrelid offers the same information,
so simplify the tests to use that.

Reviewed-by: Daniel Gustafsson
Discussion: https://postgr.es/m/20190403065949.GH3298@paquier.xyz
2019-04-04 10:24:56 +09:00
Andres Freund b73c3a1196 tableam: basic documentation.
This adds documentation about the user oriented parts of table access
methods (i.e. the default_table_access_method GUC and the USING clause
for CREATE TABLE etc), adds a basic chapter about the table access
method interface, and adds a note to storage.sgml that it's contents
don't necessarily apply for non-builtin AMs.

Author: Haribabu Kommi and Andres Freund
Discussion: https://postgr.es/m/20180703070645.wchpu5muyto5n647@alap3.anarazel.de
2019-04-03 17:40:29 -07:00
Noah Misch ab9ed9be23 Assert that pgwin32_signal_initialize() has been called early enough.
Before the pgwin32_signal_initialize() call, the backend version of
pg_usleep() has no effect.  No in-tree code falls afoul of that today,
but temporary commit 23078689a9 did so.

Discussion: https://postgr.es/m/20190402135442.GA1173872@rfd.leadboat.com
2019-04-03 17:11:16 -07:00
Noah Misch f433394e48 Handle USE_MODULE_DB for all tests able to use an installed postmaster.
When $(MODULES) and $(MODULE_big) are empty, derive the database name
from the first element of $(REGRESS) instead of using a constant string.
When deriving the database name from $(MODULES), use its first element
instead of the entire list; the earlier approach would fail if any
multi-module directory had $(REGRESS) tests.  Treat isolation suites and
src/pl correspondingly.  Under USE_MODULE_DB=1, installcheck-world and
check-world no longer reuse any database name in a given postmaster.
Buildfarm members axolotl, mandrill and frogfish saw spurious "is being
accessed by other users" failures that would not have happened without
database name reuse.  (The CountOtherDBBackends() 5s deadline expired
during DROP DATABASE; a backend for an earlier test suite had used the
same database name and had not yet exited.)  Back-patch to 9.4 (all
supported versions), except bits pertaining to isolation suites.

Concept reviewed by Andrew Dunstan, Andres Freund and Tom Lane.

Discussion: https://postgr.es/m/20190401135213.GE891537@rfd.leadboat.com
2019-04-03 17:06:01 -07:00
Noah Misch 2f932f71d9 Consistently test for in-use shared memory.
postmaster startup scrutinizes any shared memory segment recorded in
postmaster.pid, exiting if that segment matches the current data
directory and has an attached process.  When the postmaster.pid file was
missing, a starting postmaster used weaker checks.  Change to use the
same checks in both scenarios.  This increases the chance of a startup
failure, in lieu of data corruption, if the DBA does "kill -9 `head -n1
postmaster.pid` && rm postmaster.pid && pg_ctl -w start".  A postmaster
will no longer recycle segments pertaining to other data directories.
That's good for production, but it's bad for integration tests that
crash a postmaster and immediately delete its data directory.  Such a
test now leaks a segment indefinitely.  No "make check-world" test does
that.  win32_shmem.c already avoided all these problems.  In 9.6 and
later, enhance PostgresNode to facilitate testing.  Back-patch to 9.4
(all supported versions).

Reviewed by Daniel Gustafsson and Kyotaro HORIGUCHI.

Discussion: https://postgr.es/m/20130911033341.GD225735@tornado.leadboat.com
2019-04-03 17:03:46 -07:00
Tomas Vondra ea569d64ac Add SETTINGS option to EXPLAIN, to print modified settings.
Query planning is affected by a number of configuration options, and it
may be crucial to know which of those options were set to non-default
values.  With this patch you can say EXPLAIN (SETTINGS ON) to include
that information in the query plan.  Only options affecting planning,
with values different from the built-in default are printed.

This patch also adds auto_explain.log_settings option, providing the
same capability in auto_explain module.

Author: Tomas Vondra
Reviewed-by: Rafia Sabih, John Naylor
Discussion: https://postgr.es/m/e1791b4c-df9c-be02-edc5-7c8874944be0@2ndquadrant.com
2019-04-04 00:04:31 +02:00
Alvaro Herrera d1f04b96b9 Tweak docs for log_statement_sample_rate
Author: Justin Pryzby, partly after a suggestion from Masahiko Sawada
Discussion: https://postgr.es/m/20190328135918.GA27808@telsasoft.com
Discussion: https://postgr.es/m/CAD21AoB9+y8N4+Fan-ne-_7J5yTybPttxeVKfwUocKp4zT1vNQ@mail.gmail.com
2019-04-03 18:56:56 -03:00
Alvaro Herrera 799e220346 Log all statements from a sample of transactions
This is useful to obtain a view of the different transaction types in an
application, regardless of the durations of the statements each runs.

Author: Adrien Nayrat
Reviewed-by: Masahiko Sawada, Hayato Kuroda, Andres Freund
2019-04-03 18:43:59 -03:00
Tom Lane d8c0bd9fef Remove now-unnecessary thread pointer arguments in pgbench.
Not required after nuking the zipfian thread-local cache.

Also add a comment about hazardous pointer punning in threadRun(),
and avoid using "thread" to refer to the threads array as a whole.

Fabien Coelho and Tom Lane, per suggestion from Alvaro Herrera

Discussion: https://postgr.es/m/alpine.DEB.2.21.1904032126060.7997@lancre
2019-04-03 17:16:09 -04:00
Tomas Vondra c50b3158bf Reduce overhead of pg_mcv_list (de)serialization
Commit ea4e1c0e8f resolved issues with memory alignment in serialized
pg_mcv_list values, but it required copying data to/from the varlena
buffer during serialization and deserialization.  As the MCV lits may
be fairly large, the overhead (memory consumption, CPU usage) can get
rather significant too.

This change tweaks the serialization format so that the alignment is
correct with respect to the varlena value, and so the parts may be
accessed directly without copying the data.

Catversion bump, as it affects existing pg_statistic_ext data.
2019-04-03 21:23:40 +02:00
Stephen Frost b0b39f72b9 GSSAPI encryption support
On both the frontend and backend, prepare for GSSAPI encryption
support by moving common code for error handling into a separate file.
Fix a TODO for handling multiple status messages in the process.
Eliminate the OIDs, which have not been needed for some time.

Add frontend and backend encryption support functions.  Keep the
context initiation for authentication-only separate on both the
frontend and backend in order to avoid concerns about changing the
requested flags to include encryption support.

In postmaster, pull GSSAPI authorization checking into a shared
function.  Also share the initiator name between the encryption and
non-encryption codepaths.

For HBA, add "hostgssenc" and "hostnogssenc" entries that behave
similarly to their SSL counterparts.  "hostgssenc" requires either
"gss", "trust", or "reject" for its authentication.

Similarly, add a "gssencmode" parameter to libpq.  Supported values are
"disable", "require", and "prefer".  Notably, negotiation will only be
attempted if credentials can be acquired.  Move credential acquisition
into its own function to support this behavior.

Add a simple pg_stat_gssapi view similar to pg_stat_ssl, for monitoring
if GSSAPI authentication was used, what principal was used, and if
encryption is being used on the connection.

Finally, add documentation for everything new, and update existing
documentation on connection security.

Thanks to Michael Paquier for the Windows fixes.

Author: Robbie Harwood, with changes to the read/write functions by me.
Reviewed in various forms and at different times by: Michael Paquier,
   Andres Freund, David Steele.
Discussion: https://www.postgresql.org/message-id/flat/jlg1tgq1ktm.fsf@thriss.redhat.com
2019-04-03 15:02:33 -04:00
Alvaro Herrera 5f6fc34af5 Copy name when cloning FKs recurses to partitions
We were passing a string owned by a syscache entry, which was released
before recursing.  Fix by pstrdup'ing the string.

Per buildfarm member prion.
2019-04-03 15:35:54 -03:00
Alvaro Herrera f56f8f8da6 Support foreign keys that reference partitioned tables
Previously, while primary keys could be made on partitioned tables, it
was not possible to define foreign keys that reference those primary
keys.  Now it is possible to do that.

Author: Álvaro Herrera
Reviewed-by: Amit Langote, Jesper Pedersen
Discussion: https://postgr.es/m/20181102234158.735b3fevta63msbj@alvherre.pgsql
2019-04-03 14:40:21 -03:00
Heikki Linnakangas 9155580fd5 Generate less WAL during GiST, GIN and SP-GiST index build.
Instead of WAL-logging every modification during the build separately,
first build the index without any WAL-logging, and make a separate pass
through the index at the end, to write all pages to the WAL. This
significantly reduces the amount of WAL generated, and is usually also
faster, despite the extra I/O needed for the extra scan through the index.
WAL generated this way is also faster to replay.

For GiST, the LSN-NSN interlock makes this a little tricky. All pages must
be marked with a valid (i.e. non-zero) LSN, so that the parent-child
LSN-NSN interlock works correctly. We now use magic value 1 for that during
index build. Change the fake LSN counter to begin from 1000, so that 1 is
safely smaller than any real or fake LSN. 2 would've been enough for our
purposes, but let's reserve a bigger range, in case we need more special
values in the future.

Author: Anastasia Lubennikova, Andrey V. Lepikhov
Reviewed-by: Heikki Linnakangas, Dmitry Dolgov
2019-04-03 17:03:15 +03:00
Alvaro Herrera 5f768045a1 Correctly initialize newly added struct member
Valgrind was rightly complaining that IndexVacuumInfo->report_progress
(added by commit ab0dfc961b) was not being initialized in some code
paths.  Repair.

Per buildfarm member lousyjack.
2019-04-03 09:58:47 -03:00
Alvaro Herrera e8abf97af7 Prevent use of uninitialized variable
Per buildfarm member longfin.
2019-04-02 16:03:26 -03:00
Alvaro Herrera 11074f26bc Update expected output for modified catalog definition
Pilot error in previous commit
2019-04-02 15:45:45 -03:00
Alvaro Herrera ab0dfc961b Report progress of CREATE INDEX operations
This uses the progress reporting infrastructure added by c16dc1aca5,
adding support for CREATE INDEX and CREATE INDEX CONCURRENTLY.

There are two pieces to this: one is index-AM-agnostic, and the other is
AM-specific.  The latter is fairly elaborate for btrees, including
reportage for parallel index builds and the separate phases that btree
index creation uses; other index AMs, which are much simpler in their
building procedures, have simplistic reporting only, but that seems
sufficient, at least for non-concurrent builds.

The index-AM-agnostic part is fairly complete, providing insight into
the CONCURRENTLY wait phases as well as block-based progress during the
index validation table scan.  (The index validation index scan requires
patching each AM, which has not been included here.)

Reviewers: Rahila Syed, Pavan Deolasee, Tatsuro Yamada
Discussion: https://postgr.es/m/20181220220022.mg63bhk26zdpvmcj@alvherre.pgsql
2019-04-02 15:18:08 -03:00
Stephen Frost 4d0e994eed Add support for partial TOAST decompression
When asked for a slice of a TOAST entry, decompress enough to return the
slice instead of decompressing the entire object.

For use cases where the slice is at, or near, the beginning of the entry,
this avoids a lot of unnecessary decompression work.

This changes the signature of pglz_decompress() by adding a boolean to
indicate if it's ok for the call to finish before consuming all of the
source or destination buffers.

Author: Paul Ramsey
Reviewed-By: Rafia Sabih, Darafei Praliaskouski, Regina Obe
Discussion: https://postgr.es/m/CACowWR07EDm7Y4m2kbhN_jnys%3DBBf9A6768RyQdKm_%3DNpkcaWg%40mail.gmail.com
2019-04-02 12:35:32 -04:00
Etsuro Fujita d50d172e51 postgres_fdw: Perform the (FINAL, NULL) upperrel operations remotely.
The upper-planner pathification allows FDWs to arrange to push down
different types of upper-stage operations to the remote side.  This
commit teaches postgres_fdw to do it for the (FINAL, NULL) upperrel,
which is responsible for doing LockRows, LIMIT, and/or ModifyTable.
This provides the ability for postgres_fdw to handle SELECT commands
so that it 1) skips the LockRows step (if any) (note that this is
safe since it performs early locking) and 2) pushes down the LIMIT
and/or OFFSET restrictions (if any) to the remote side.  This doesn't
handle the INSERT/UPDATE/DELETE cases.

Author: Etsuro Fujita
Reviewed-By: Antonin Houska and Jeff Janes
Discussion: https://postgr.es/m/87pnz1aby9.fsf@news-spur.riddles.org.uk
2019-04-02 20:30:45 +09:00
Etsuro Fujita aef65db676 Refactor create_limit_path() to share cost adjustment code with FDWs.
This is in preparation for an upcoming commit.

Author: Etsuro Fujita
Reviewed-By: Antonin Houska and Jeff Janes
Discussion: https://postgr.es/m/87pnz1aby9.fsf@news-spur.riddles.org.uk
2019-04-02 19:55:12 +09:00
Dean Rasheed e2d28c0f40 Perform RLS subquery checks as the right user when going via a view.
When accessing a table with RLS via a view, the RLS checks are
performed as the view owner. However, the code neglected to propagate
that to any subqueries in the RLS checks. Fix that by calling
setRuleCheckAsUser() for all RLS policy quals and withCheckOption
checks for RTEs with RLS.

Back-patch to 9.5 where RLS was added.

Per bug #15708 from daurnimator.

Discussion: https://postgr.es/m/15708-d65cab2ce9b1717a@postgresql.org
2019-04-02 08:13:59 +01:00
Michael Paquier 280e5f1405 Add progress reporting to pg_checksums
This adds a new option to pg_checksums called -P/--progress, showing
every second some information about the computation state of an
operation for --check and --enable (--disable only updates the control
file and is quick).  This requires a pre-scan of the data folder so as
the total size of checksummable items can be calculated, and then it
gets compared to the amount processed.

Similarly to what is done for pg_rewind and pg_basebackup, the
information printed in the progress report consists of the current
amount of data computed and the total amount of data to compute.  This
could be extended later on.

Author: Michael Banck, Bernd Helmle
Reviewed-by: Fabien Coelho, Michael Paquier
Discussion: https://postgr.es/m/1535719851.1286.17.camel@credativ.de
2019-04-02 10:58:07 +09:00
Thomas Munro 475861b261 Add wal_recycle and wal_init_zero GUCs.
On at least ZFS, it can be beneficial to create new WAL files every
time and not to bother zero-filling them.  Since it's not clear which
other filesystems might benefit from one or both of those things,
add individual GUCs to control those two behaviors independently and
make only very general statements in the docs.

Author: Jerry Jelinek, with some adjustments by Thomas Munro
Reviewed-by: Alvaro Herrera, Andres Freund, Tomas Vondra, Robert Haas and others
Discussion: https://postgr.es/m/CACPQ5Fo00QR7LNAcd1ZjgoBi4y97%2BK760YABs0vQHH5dLdkkMA%40mail.gmail.com
2019-04-02 14:37:14 +13:00
Andres Freund d45e401586 tableam: Add table_finish_bulk_insert().
This replaces the previous calls of heap_sync() in places using
bulk-insert. By passing in the flags used for bulk-insert the AM can
decide (first at insert time and then during the finish call) which of
the optimizations apply to it, and what operations are necessary to
finish a bulk insert operation.

Also change HEAP_INSERT_* flags to TABLE_INSERT, and rename hi_options
to ti_options.

These changes are made even in copy.c, which hasn't yet been converted
to tableam. There's no harm in doing so.

Author: Andres Freund
Discussion: https://postgr.es/m/20180703070645.wchpu5muyto5n647@alap3.anarazel.de
2019-04-01 14:41:42 -07:00
Tom Lane 26a76cb640 Restrict pgbench's zipfian parameter to ensure good performance.
Remove the code that supported zipfian distribution parameters less
than 1.0, as it had undocumented performance hazards, and it's not
clear that the case is useful enough to justify either fixing or
documenting those hazards.

Also, since the code path for parameter > 1.0 could perform badly
for values very close to 1.0, establish a minimum allowed value
of 1.001.  This solution seems superior to the previous vague
documentation warning about small values not performing well.

Fabien Coelho, per a gripe from Tomas Vondra

Discussion: https://postgr.es/m/b5e172e9-ad22-48a3-86a3-589afa20e8f7@2ndquadrant.com
2019-04-01 17:37:34 -04:00
Thomas Munro 4fd05bb55b Fix deadlock in heap_compute_xid_horizon_for_tuples().
We can't call code that uses syscache while we hold buffer locks
on a catalog relation.  If passed such a relation, just fall back
to the general effective_io_concurrency GUC rather than trying to
look up the containing tablespace's IO concurrency setting.

We might find a better way to control prefetching in follow-up
work, but for now this is enough to avoid the deadlock introduced
by commit 558a9165e0.

Reviewed-by: Andres Freund
Diagnosed-by: Peter Geoghegan
Discussion: https://postgr.es/m/CA%2BhUKGLCwPF0S4Mk7S8qw%2BDK0Bq65LueN9rofAA3HHSYikW-Zw%40mail.gmail.com
Discussion: https://postgr.es/m/962831d8-c18d-180d-75fb-8b842e3a2742%40chrullrich.net
2019-04-02 09:29:49 +13:00
Tom Lane 12d46ac392 Improve documentation about our XML functionality.
Add a section explaining how our XML features depart from current
versions of the SQL standard.  Update and clarify the descriptions
of some XML functions.

Chapman Flack, reviewed by Ryan Lambert

Discussion: https://postgr.es/m/5BD1284C.1010305@anastigmatix.net
Discussion: https://postgr.es/m/5C81F8C0.6090901@anastigmatix.net
Discussion: https://postgr.es/m/CAN-V+g-6JqUQEQZ55Q3toXEN6d5Ez5uvzL4VR+8KtvJKj31taw@mail.gmail.com
2019-04-01 16:20:22 -04:00
Tom Lane b2b819019f Add volatile qualifier missed in commit 2e616dee9.
Noted by Pavel Stehule

Discussion: https://postgr.es/m/CAFj8pRAaGO5FX7bnP3E=mRssoK8y5T78x7jKy-vDiyS68L888Q@mail.gmail.com
2019-04-01 14:37:25 -04:00
Peter Eisentraut cc8d415117 Unified logging system for command-line programs
This unifies the various ad hoc logging (message printing, error
printing) systems used throughout the command-line programs.

Features:

- Program name is automatically prefixed.

- Message string does not end with newline.  This removes a common
  source of inconsistencies and omissions.

- Additionally, a final newline is automatically stripped, simplifying
  use of PQerrorMessage() etc., another common source of mistakes.

- I converted error message strings to use %m where possible.

- As a result of the above several points, more translatable message
  strings can be shared between different components and between
  frontends and backend, without gratuitous punctuation or whitespace
  differences.

- There is support for setting a "log level".  This is not meant to be
  user-facing, but can be used internally to implement debug or
  verbose modes.

- Lazy argument evaluation, so no significant overhead if logging at
  some level is disabled.

- Some color in the messages, similar to gcc and clang.  Set
  PG_COLOR=auto to try it out.  Some colors are predefined, but can be
  customized by setting PG_COLORS.

- Common files (common/, fe_utils/, etc.) can handle logging much more
  simply by just using one API without worrying too much about the
  context of the calling program, requiring callbacks, or having to
  pass "progname" around everywhere.

- Some programs called setvbuf() to make sure that stderr is
  unbuffered, even on Windows.  But not all programs did that.  This
  is now done centrally.

Soft goals:

- Reduces vertical space use and visual complexity of error reporting
  in the source code.

- Encourages more deliberate classification of messages.  For example,
  in some cases it wasn't clear without analyzing the surrounding code
  whether a message was meant as an error or just an info.

- Concepts and terms are vaguely aligned with popular logging
  frameworks such as log4j and Python logging.

This is all just about printing stuff out.  Nothing affects program
flow (e.g., fatal exits).  The uses are just too varied to do that.
Some existing code had wrappers that do some kind of print-and-exit,
and I adapted those.

I tried to keep the output mostly the same, but there is a lot of
historical baggage to unwind and special cases to consider, and I
might not always have succeeded.  One significant change is that
pg_rewind used to write all error messages to stdout.  That is now
changed to stderr.

Reviewed-by: Donald Dong <xdong@csumb.edu>
Reviewed-by: Arthur Zakirov <a.zakirov@postgrespro.ru>
Discussion: https://www.postgresql.org/message-id/flat/6a609b43-4f57-7348-6480-bd022f924310@2ndquadrant.com
2019-04-01 20:01:35 +02:00
Alexander Korotkov b4cc19ab01 Throw error in jsonb_path_match() when result is not single boolean
jsonb_path_match() checks if jsonb document matches jsonpath query.  Therefore,
jsonpath query should return single boolean.  Currently, if result of jsonpath
is not a single boolean, NULL is returned independently whether silent mode
is on or off.  But that appears to be wrong when silent mode is off.  This
commit makes jsonb_path_match() throw an error in this case.

Author: Nikita Glukhov
2019-04-01 18:09:20 +03:00
Alexander Korotkov 2e643501e5 Restrict some cases in parsing numerics in jsonpath
Jsonpath now accepts integers with leading zeroes and floats starting with
a dot.  However, SQL standard requires to follow JSON specification, which
doesn't allow none of these cases.  Our json[b] datatypes also restrict that.
So, restrict it in jsonpath altogether.

Author: Nikita Glukhov
2019-04-01 18:09:09 +03:00
Alexander Korotkov 0a02e2ae02 GIN support for @@ and @? jsonpath operators
This commit makes existing GIN operator classes jsonb_ops and json_path_ops
support "jsonb @@ jsonpath" and "jsonb @? jsonpath" operators.  Basic idea is
to extract statements of following form out of jsonpath.

 key1.key2. ... .keyN = const

The rest of jsonpath is rechecked from heap.

Catversion is bumped.

Discussion: https://postgr.es/m/fcc6fc6a-b497-f39a-923d-aa34d0c588e8%402ndQuadrant.com
Author: Nikita Glukhov, Alexander Korotkov
Reviewed-by: Jonathan Katz, Pavel Stehule
2019-04-01 18:08:52 +03:00
Peter Eisentraut 7241911782 Catch syntax error in generated column definition
The syntax

    GENERATED BY DEFAULT AS (expr)

is not allowed but we have to accept it in the grammar to avoid
shift/reduce conflicts because of the similar syntax for identity
columns.  The existing code just ignored this, incorrectly.  Add an
explicit error check and a bespoke error message.

Reported-by: Justin Pryzby <pryzby@telsasoft.com>
2019-04-01 10:46:37 +02:00
Michael Paquier 4ae7f02b03 Fix thinko in allocation call during MVC list deserialization
Spotted by Coverity.
2019-04-01 14:16:27 +09:00
Noah Misch 5a907404b5 Update HINT for pre-existing shared memory block.
One should almost always terminate an old process, not use a manual
removal tool like ipcrm.  Removal of the ipcclean script eleven years
ago (39627b1ae6) and its non-replacement
corroborate that manual shm removal is now a niche goal.  Back-patch to
9.4 (all supported versions).

Reviewed by Daniel Gustafsson and Kyotaro HORIGUCHI.

Discussion: https://postgr.es/m/20180812064815.GB2301738@rfd.leadboat.com
2019-03-31 19:32:48 -07:00
Andres Freund bfbcad478f tableam: bitmap table scan.
This moves bitmap heap scan support to below an optional tableam
callback. It's optional as the whole concept of bitmap heapscans is
fairly block specific.

This basically moves the work previously done in bitgetpage() into the
new scan_bitmap_next_block callback, and the direct poking into the
buffer done in BitmapHeapNext() into the new scan_bitmap_next_tuple()
callback.

The abstraction is currently somewhat leaky because
nodeBitmapHeapscan.c's prefetching and visibilitymap based logic
remains - it's likely that we'll later have to move more into the
AM. But it's not trivial to do so without introducing a significant
amount of code duplication between the AMs, so that's a project for
later.

Note that now nodeBitmapHeapscan.c and the associated node types are a
bit misnamed. But it's not clear whether renaming wouldn't be a cure
worse than the disease. Either way, that'd be best done in a separate
commit.

Author: Andres Freund
Reviewed-By: Robert Haas (in an older version)
Discussion: https://postgr.es/m/20180703070645.wchpu5muyto5n647@alap3.anarazel.de
2019-03-31 18:37:57 -07:00
Andres Freund 73c954d248 tableam: sample scan.
This moves sample scan support to below tableam. It's not optional as
there is, in contrast to e.g. bitmap heap scans, no alternative way to
perform tablesample queries. If an AM can't deal with the block based
API, it will have to throw an ERROR.

The tableam callbacks for this are block based, but given the current
TsmRoutine interface, that seems to be required.

The new interface doesn't require TsmRoutines to perform visibility
checks anymore - that requires the TsmRoutine to know details about
the AM, which we want to avoid.  To continue to allow taking the
returned number of tuples account SampleScanState now has a donetuples
field (which previously e.g. existed in SystemRowsSamplerData), which
is only incremented after the visibility check succeeds.

Author: Andres Freund
Discussion: https://postgr.es/m/20180703070645.wchpu5muyto5n647@alap3.anarazel.de
2019-03-31 18:37:57 -07:00
Andres Freund 4bb50236eb tableam: Formatting and other minor cleanups.
The superflous heapam_xlog.h includes were reported by Peter
Geoghegan.
2019-03-31 18:16:53 -07:00
Peter Geoghegan 76a39f2295 Fix nbtree high key "continuescan" row compare bug.
Commit 29b64d1d mishandled skipping over truncated high key attributes
during row comparisons.  The row comparison key matching loop would loop
forever when a truncated attribute was encountered for a row compare
subkey.  Fix by following the example of other code in the loop: advance
the current subkey, or break out of the loop when the last subkey is
reached.

Add test coverage for the relevant _bt_check_rowcompare() code path.
The new test case is somewhat tied to nbtree implementation details,
which isn't ideal, but seems unavoidable.
2019-03-31 17:24:04 -07:00
Tom Lane 8fba397f0c Add test case exercising formerly-unreached code in inheritance_planner.
There was some debate about whether the code I'd added to remap
AppendRelInfos obtained from the initial SELECT planning run is
actually necessary.  Add a test case demonstrating that it is.

Discussion: https://postgr.es/m/23831.1553873385@sss.pgh.pa.us
2019-03-31 15:49:06 -04:00
Tom Lane 9fd4de119c Compute root->qual_security_level in a less random place.
We can set this up once and for all in subquery_planner's initial survey
of the flattened rangetable, rather than incrementally adjusting it in
build_simple_rel.  The previous approach made it rather hard to reason
about exactly when the value would be available, and we were definitely
using it in some places before the final value was computed.

Noted while fooling around with Amit Langote's patch to delay creation
of inheritance child rels.  That didn't break this code, but it made it
even more fragile, IMO.
2019-03-31 13:47:41 -04:00
Michael Paquier 2aa6e331ea Skip redundant anti-wraparound vacuums
An anti-wraparound vacuum has to be by definition aggressive as it needs
to work on all the pages of a relation.  However it can happen that due
to some concurrent activity an anti-wraparound vacuum is marked as
non-aggressive, which makes it redundant with a previous run, and
it is actually useless as an anti-wraparound vacuum should process all
the pages of a relation.  This commit makes such vacuums to be skipped.

An anti-wraparound vacuum not aggressive can be found easily by mixing
low values of autovacuum_freeze_max_age (to control anti-wraparound) and
autovacuum_freeze_table_age (to control the aggressiveness).

28a8fa9 has added some extra logging printing all the possible
combinations of anti-wraparound and aggressive vacuums, which now gets
simplified as an anti-wraparound vacuum also non-aggressive gets
skipped.

Per discussion mainly between Andrew Dunstan, Robert Haas, Álvaro
Herrera, Kyotaro Horiguchi, Masahiko Sawada, and myself.

Author: Kyotaro Horiguchi, Michael Paquier
Reviewed-by: Andrew Dunstan, Álvaro Herrera
Discussion: https://postgr.es/m/20180914153554.562muwr3uwujno75@alvherre.pgsql
2019-03-31 22:59:12 +09:00
Andrew Dunstan 47b3c26642 Have pg_upgrade's Makefile honor NO_TEMP_INSTALL
Backpatch to 9.5, when pg_upgrade's location changed.

Discussion: https://postgr.es/m/5506b8fa-7dad-8483-053c-7ca7ef04f01a@2ndQuadrant.com
2019-03-31 08:19:05 -04:00
Andres Freund 696d78469f tableam: Move heap specific logic from estimate_rel_size below tableam.
This just moves the table/matview[/toast] determination of relation
size to a callback, and uses a copy of the existing logic to implement
that callback for heap.

It probably would make sense to also move the index specific logic
into a callback, so the metapage handling (and probably more) can be
index specific. But that's a separate task.

Author: Andres Freund
Discussion: https://postgr.es/m/20180703070645.wchpu5muyto5n647@alap3.anarazel.de
2019-03-30 19:26:36 -07:00
Andres Freund 737a292b5d tableam: VACUUM and ANALYZE support.
This is a relatively straightforward move of the current
implementation to sit below tableam. As the current analyze sampling
implementation is pretty inherently block based, the tableam analyze
interface is as well. It might make sense to generalize that at some
point, but that seems like a larger project that shouldn't be
undertaken at the same time as the introduction of tableam.

Author: Andres Freund
Discussion: https://postgr.es/m/20180703070645.wchpu5muyto5n647@alap3.anarazel.de
2019-03-30 19:25:58 -07:00
Tomas Vondra 0f5493fdf1 Fix typo
Author: John Naylor
2019-03-31 03:29:58 +02:00
Tom Lane 428b260f87 Speed up planning when partitions can be pruned at plan time.
Previously, the planner created RangeTblEntry and RelOptInfo structs
for every partition of a partitioned table, even though many of them
might later be deemed uninteresting thanks to partition pruning logic.
This incurred significant overhead when there are many partitions.
Arrange to postpone creation of these data structures until after
we've processed the query enough to identify restriction quals for
the partitioned table, and then apply partition pruning before not
after creation of each partition's data structures.  In this way
we need not open the partition relations at all for partitions that
the planner has no real interest in.

For queries that can be proven at plan time to access only a small
number of partitions, this patch improves the practical maximum
number of partitions from under 100 to perhaps a few thousand.

Amit Langote, reviewed at various times by Dilip Kumar, Jesper Pedersen,
Yoshikazu Imai, and David Rowley

Discussion: https://postgr.es/m/9d7c5112-cb99-6a47-d3be-cf1ee6862a1d@lab.ntt.co.jp
2019-03-30 18:58:55 -04:00
Tomas Vondra ad3107b973 Fix compiler warnings in multivariate MCV code
Compiler warnings were observed on gcc 3.4.6 (on gaur).

The assert is unnecessary, as the indexes are uint16 and so always >= 0.

Reported-by: Tom Lane
2019-03-30 18:43:16 +01:00
Tomas Vondra ea4e1c0e8f Additional fixes of memory alignment in pg_mcv_list code
Commit d85e0f366a tried to fix memory alignment issues in serialization
and deserialization of pg_mcv_list values, but it was a few bricks shy.
The arrays of uint16 indexes in serialized items was not aligned, and
the both the values and isnull flags were using the same pointer.

Per investigation by Tom Lane on gaur.
2019-03-30 18:34:59 +01:00
Tom Lane 7ad6498fd5 Avoid crash in partitionwise join planning under GEQO.
While trying to plan a partitionwise join, we may be faced with cases
where one or both input partitions for a particular segment of the join
have been pruned away.  In HEAD and v11, this is problematic because
earlier processing didn't bother to make a pruned RelOptInfo fully
valid.  With an upcoming patch to make partition pruning more efficient,
this'll be even more problematic because said RelOptInfo won't exist at
all.

The existing code attempts to deal with this by retroactively making the
RelOptInfo fully valid, but that causes crashes under GEQO because join
planning is done in a short-lived memory context.  In v11 we could
probably have fixed this by switching to the planner's main context
while fixing up the RelOptInfo, but that idea doesn't scale well to the
upcoming patch.  It would be better not to mess with the base-relation
data structures during join planning, anyway --- that's just a recipe
for order-of-operations bugs.

In many cases, though, we don't actually need the child RelOptInfo,
because if the input is certainly empty then the join segment's result
is certainly empty, so we can skip making a join plan altogether.  (The
existing code ultimately arrives at the same conclusion, but only after
doing a lot more work.)  This approach works except when the pruned-away
partition is on the nullable side of a LEFT, ANTI, or FULL join, and the
other side isn't pruned.  But in those cases the existing code leaves a
lot to be desired anyway --- the correct output is just the result of
the unpruned side of the join, but we were emitting a useless outer join
against a dummy Result.  Pending somebody writing code to handle that
more nicely, let's just abandon the partitionwise-join optimization in
such cases.

When the modified code skips making a join plan, it doesn't make a
join RelOptInfo either; this requires some upper-level code to
cope with nulls in part_rels[] arrays.  We would have had to have
that anyway after the upcoming patch.

Back-patch to v11 since the crash is demonstrable there.

Discussion: https://postgr.es/m/8305.1553884377@sss.pgh.pa.us
2019-03-30 12:48:32 -04:00
Peter Eisentraut fc22b6623b Generated columns
This is an SQL-standard feature that allows creating columns that are
computed from expressions rather than assigned, similar to a view or
materialized view but on a column basis.

This implements one kind of generated column: stored (computed on
write).  Another kind, virtual (computed on read), is planned for the
future, and some room is left for it.

Reviewed-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Pavel Stehule <pavel.stehule@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/b151f851-4019-bdb1-699e-ebab07d2f40a@2ndquadrant.com
2019-03-30 08:15:57 +01:00
Peter Eisentraut 6b8b5364dd Small code simplification for REINDEX CONCURRENTLY
This was left over from an earlier code structure.
2019-03-30 07:16:24 +01:00
Peter Geoghegan 9c7fb7e6d8 Tweak some nbtree-related code comments. 2019-03-29 12:29:05 -07:00
Tomas Vondra d85e0f366a Fix memory alignment in pg_mcv_list serialization
Blind attempt at fixing ia64, hppa an sparc builds.

The serialized representation of MCV lists did not enforce proper memory
alignment for internal fields, resulting in deserialization issues on
platforms that are more sensitive to this (ia64, sparc and hppa).

This forces a catalog version bump, because the layout of serialized
pg_mcv_list changes.

Broken since 7300a699.
2019-03-29 19:06:38 +01:00
Andres Freund d3a5fc17eb Show table access methods as such in psql's \dA.
Previously we didn't display a type for table access methods.

Author: Haribabu Kommi
Discussion: CAJrrPGeeYOqP3hkZyohDx_8dot4zvPuPMDBmhJ=iC85cTBNeYw@mail.gmail.com
2019-03-29 08:59:40 -07:00
Andres Freund ffa8444ce4 tableam: Comment fixes.
Author: Haribabu Kommi
Discussion: CAJrrPGeeYOqP3hkZyohDx_8dot4zvPuPMDBmhJ=iC85cTBNeYw@mail.gmail.com
2019-03-29 08:17:26 -07:00
Robert Haas 41b54ba78e Allow existing VACUUM options to take a Boolean argument.
This makes VACUUM work more like EXPLAIN already does without changing
the meaning of any commands that already work.  It is intended to
facilitate the addition of future VACUUM options that may take
non-Boolean parameters or that default to false.

Masahiko Sawada, reviewed by me.

Discussion: http://postgr.es/m/CA+TgmobpYrXr5sUaEe_T0boabV0DSm=utSOZzwCUNqfLEEm8Mw@mail.gmail.com
Discussion: http://postgr.es/m/CAD21AoBaFcKBAeL5_++j+Vzir2vBBcF4juW7qH8b3HsQY=Q6+w@mail.gmail.com
2019-03-29 08:22:49 -04:00
Robert Haas c900c15269 Warn more strongly about the dangers of exclusive backup mode.
Especially, warn about the hazards of mishandling the backup_label
file.  Adjust a couple of server messages to be more clear about
the hazards associated with removing backup_label files, too.

David Steele and Robert Haas, reviewed by Laurenz Albe, Martín
Marqués, Peter Eisentraut, and Magnus Hagander.

Discussion: http://postgr.es/m/7d85c387-000e-16f0-e00b-50bf83c22127@pgmasters.net
2019-03-29 08:15:16 -04:00
Peter Eisentraut bb76134b08 Fix incorrect code in new REINDEX CONCURRENTLY code
The previous code was adding pointers to transient variables to a
list, but by the time the list was read, the variable might be gone,
depending on the compiler.  Fix it by making copies in the proper
memory context.
2019-03-29 10:53:40 +01:00
Peter Eisentraut 5dc92b844e REINDEX CONCURRENTLY
This adds the CONCURRENTLY option to the REINDEX command.  A REINDEX
CONCURRENTLY on a specific index creates a new index (like CREATE
INDEX CONCURRENTLY), then renames the old index away and the new index
in place and adjusts the dependencies, and then drops the old
index (like DROP INDEX CONCURRENTLY).  The REINDEX command also has
the capability to run its other variants (TABLE, DATABASE) with the
CONCURRENTLY option (but not SYSTEM).

The reindexdb command gets the --concurrently option.

Author: Michael Paquier, Andreas Karlsson, Peter Eisentraut
Reviewed-by: Andres Freund, Fujii Masao, Jim Nasby, Sergei Kornilov
Discussion: https://www.postgresql.org/message-id/flat/60052986-956b-4478-45ed-8bd119e9b9cf%402ndquadrant.com#74948a1044c56c5e817a5050f554ddee
2019-03-29 08:26:33 +01:00
Andres Freund d25f519107 tableam: relation creation, VACUUM FULL/CLUSTER, SET TABLESPACE.
This moves the responsibility for:
- creating the storage necessary for a relation, including creating a
  new relfilenode for a relation with existing storage
- non-transactional truncation of a relation
- VACUUM FULL / CLUSTER's rewrite of a table
below tableam.

This is fairly straight forward, with a bit of complexity smattered in
to move the computation of xid / multixid horizons below the AM, as
they don't make sense for every table AM.

Author: Andres Freund
Discussion: https://postgr.es/m/20180703070645.wchpu5muyto5n647@alap3.anarazel.de
2019-03-28 20:01:43 -07:00
Thomas Munro 7e69323bf7 Fix typo.
Author: Masahiko Sawada
2019-03-29 10:03:58 +13:00
Andres Freund 46bcd2af18 Fix a few comment copy & pastos. 2019-03-28 13:42:37 -07:00
Tomas Vondra 62bf0fb35c Fix deserialization of pg_mcv_list values
There were multiple issues in deserialization of pg_mcv_list values.

Firstly, the data is loaded from syscache, but the deserialization was
performed after ReleaseSysCache(), at which point the data might have
already disappeared.  Fixed by moving the calls in statext_mcv_load,
and using the same NULL-handling code as existing stats.

Secondly, the deserialized representation used pointers into the
serialized representation.  But that is also unsafe, because the data
may disappear at any time.  Fixed by reworking and simplifying the
deserialization code to always copy all the data.

And thirdly, when deserializing values for types passed by value, the
code simply did memcpy(d,s,typlen) which however does not work on
bigendian machines.  Fixed by using fetch_att/store_att_byval.
2019-03-28 20:03:14 +01:00
Thomas Munro ad308058cc Use FullTransactionId for the transaction stack.
Provide GetTopFullTransactionId() and GetCurrentFullTransactionId().
The intended users of these interfaces are access methods that use
xids for visibility checks but don't want to have to go back and
"freeze" existing references some time later before the 32 bit xid
counter wraps around.

Use a new struct to serialize the transaction state for parallel
query, because FullTransactionId doesn't fit into the previous
serialization scheme very well.

Author: Thomas Munro
Reviewed-by: Heikki Linnakangas
Discussion: https://postgr.es/m/CAA4eK1%2BMv%2Bmb0HFfWM9Srtc6MVe160WFurXV68iAFMcagRZ0dQ%40mail.gmail.com
2019-03-28 18:24:43 +13:00
Thomas Munro 2fc7af5e96 Add basic infrastructure for 64 bit transaction IDs.
Instead of inferring epoch progress from xids and checkpoints,
introduce a 64 bit FullTransactionId type and use it to track xid
generation.  This fixes an unlikely bug where the epoch is reported
incorrectly if the range of active xids wraps around more than once
between checkpoints.

The only user-visible effect of this commit is to correct the epoch
used by txid_current() and txid_status(), also visible with
pg_controldata, in those rare circumstances.  It also creates some
basic infrastructure so that later patches can use 64 bit
transaction IDs in more places.

The new type is a struct that we pass by value, as a form of strong
typedef.  This prevents the sort of accidental confusion between
TransactionId and FullTransactionId that would be possible if we
were to use a plain old uint64.

Author: Thomas Munro
Reported-by: Amit Kapila
Reviewed-by: Andres Freund, Tom Lane, Heikki Linnakangas
Discussion: https://postgr.es/m/CAA4eK1%2BMv%2Bmb0HFfWM9Srtc6MVe160WFurXV68iAFMcagRZ0dQ%40mail.gmail.com
2019-03-28 18:12:20 +13:00
Andres Freund 2a96909a4a tableam: Support for an index build's initial table scan(s).
To support building indexes over tables of different AMs, the scans to
do so need to be routed through the table AM.  While moving a fair
amount of code, nearly all the changes are just moving code to below a
callback.

Currently the range based interface wouldn't make much sense for non
block based table AMs. But that seems aceptable for now.

Author: Andres Freund
Discussion: https://postgr.es/m/20180703070645.wchpu5muyto5n647@alap3.anarazel.de
2019-03-27 19:59:06 -07:00
Tomas Vondra a63b29a1de Minor improvements for the multivariate MCV lists
The MCV build should always call get_mincount_for_mcv_list(), as the
there is no other logic to decide whether the MCV list represents all
the data. So just remove the (ngroups > nitems) condition.

Also, when building MCV lists, the number of items was limited by the
statistics target (i.e. up to 10000). But when deserializing the MCV
list, a different value (8192) was used to check the input, causing
an error.  Simply ensure that the same value is used in both places.

This should have been included in 7300a69950, but I forgot to include it
in that commit.
2019-03-27 20:07:41 +01:00
Tomas Vondra 7300a69950 Add support for multivariate MCV lists
Introduce a third extended statistic type, supported by the CREATE
STATISTICS command - MCV lists, a generalization of the statistic
already built and used for individual columns.

Compared to the already supported types (n-distinct coefficients and
functional dependencies), MCV lists are more complex, include column
values and allow estimation of much wider range of common clauses
(equality and inequality conditions, IS NULL, IS NOT NULL etc.).
Similarly to the other types, a new pseudo-type (pg_mcv_list) is used.

Author: Tomas Vondra
Reviewed-by: Dean Rasheed, David Rowley, Mark Dilger, Alvaro Herrera
Discussion: https://postgr.es/m/dfdac334-9cf2-2597-fb27-f0fb3753f435@2ndquadrant.com
2019-03-27 18:32:18 +01:00
Tom Lane 333ed246c6 Avoid passing query tlist around separately from root->processed_tlist.
In the dim past, the planner kept the fully-processed version of the query
targetlist (the result of preprocess_targetlist) in grouping_planner's
local variable "tlist", and only grudgingly passed it to individual other
routines as needed.  Later we discovered a need to still have it available
after grouping_planner finishes, and invented the root->processed_tlist
field for that purpose, but it wasn't used internally to grouping_planner;
the tlist was still being passed around separately in the same places as
before.

Now comes a proposed patch to allow appendrel expansion to add entries
to the processed tlist, well after preprocess_targetlist has finished
its work.  To avoid having to pass around the tlist explicitly, it's
proposed to allow appendrel expansion to modify root->processed_tlist.
That makes aliasing the tlist with assorted parameters and local
variables really scary.  It would accidentally work as long as the
tlist is initially nonempty, because then the List header won't move
around, but it's not exactly hard to think of ways for that to break.
Aliased values are poor programming practice anyway.

Hence, get rid of local variables and parameters that can be identified
with root->processed_tlist, in favor of just using that field directly.
And adjust comments to match.  (Some of the new comments speak as though
it's already possible for appendrel expansion to modify the tlist; that's
not true yet, but will happen in a later patch.)

Discussion: https://postgr.es/m/9d7c5112-cb99-6a47-d3be-cf1ee6862a1d@lab.ntt.co.jp
2019-03-27 12:57:49 -04:00
Alvaro Herrera 9938d11633 pgbench: doExecuteCommand -> executeMetaCommand
The new function is only in charge of meta commands, not SQL commands.
This change makes the code a little clearer: now all the state changes
are effected by advanceConnectionState.  It also removes one indent
level, which makes the diff look bulkier than it really is.

Author: Fabien Coelho
Reviewed-by: Kirk Jamison
Discussion: https://postgr.es/m/alpine.DEB.2.21.1811240904500.12627@lancre
2019-03-27 12:21:02 -03:00
Tom Lane a51cc7e9e6 Suppress uninitialized-variable warning.
Apparently Andres' compiler is smart enough to see that hpage
must be initialized before use ... but mine isn't.
2019-03-27 11:10:42 -04:00
Michael Paquier ecfed4a122 Improve error handling of column references in expression transformation
Column references are not allowed in default expressions and partition
bound expressions, and are restricted as such once the transformation of
their expressions is done.  However, trying to use more complex column
references can lead to confusing error messages.  For example, trying to
use a two-field column reference name for default expressions and
partition bounds leads to "missing FROM-clause entry for table", which
makes no sense in their respective context.

In order to make the errors generated more useful, this commit adds more
verbose messages when transforming column references depending on the
context.  This has a little consequence though: for example an
expression using an aggregate with a column reference as argument would
cause an error to be generated for the column reference, while the
aggregate was the problem reported before this commit because column
references get transformed first.

The confusion exists for default expressions for a long time, and the
problem is new as of v12 for partition bounds.  Still per the lack of
complaints on the matter no backpatch is done.

The patch has been written by Amit Langote and me, and Tom Lane has
provided the improvement of the documentation for default expressions on
the CREATE TABLE page.

Author: Amit Langote, Michael Paquier
Reviewed-by: Tom Lane
Discussion: https://postgr.es/m/20190326020853.GM2558@paquier.xyz
2019-03-27 21:04:25 +09:00
Thomas Munro d2fd7f74ee Fix off-by-one error in txid_status().
The transaction ID returned by GetNextXidAndEpoch() is in the future,
so we can't attempt to access its status or we might try to read a
CLOG page that doesn't exist.  The > vs >= confusion probably stemmed
from the choice of a variable name containing the word "last" instead
of "next", so fix that too.

Back-patch to 10 where the function arrived.

Author: Thomas Munro
Discussion: https://postgr.es/m/CA%2BhUKG%2Buua_BV5cyfsioKVN2d61Lukg28ECsWTXKvh%3DBtN2DPA%40mail.gmail.com
2019-03-27 21:30:04 +13:00
Michael Paquier 1983af8e89 Switch some palloc/memset calls to palloc0
Some code paths have been doing some allocations followed by an
immediate memset() to initialize the allocated area with zeros, this is
a bit overkill as there are already interfaces to do both things in one
call.

Author: Daniel Gustafsson
Reviewed-by: Michael Paquier
Discussion: https://postgr.es/m/vN0OodBPkKs7g2Z1uyk3CUEmhdtspHgYCImhlmSxv1Xn6nY1ZnaaGHL8EWUIQ-NEv36tyc4G5-uA3UXUF2l4sFXtK_EQgLN1hcgunlFVKhA=@yesql.se
2019-03-27 12:02:50 +09:00
Michael Paquier 5bde1651bb Switch function current_schema[s]() to be parallel-unsafe
When invoked for the first time in a session, current_schema() and
current_schemas() can finish by creating a temporary schema.  Currently
those functions are parallel-safe, however if for a reason or another
they get launched across multiple parallel workers, they would fail when
attempting to create a temporary schema as temporary contexts are not
supported in this case.

The original issue has been spotted by buildfarm members crake and
lapwing, after commit c5660e0 has introduced the first regression tests
based on current_schema() in the tree.  After that, 396676b has
introduced a workaround to avoid parallel plans but that was not
completely right either.

Catversion is bumped.

Author: Michael Paquier
Reviewed-by: Daniel Gustafsson
Discussion: https://postgr.es/m/20190118024618.GF1883@paquier.xyz
2019-03-27 11:35:12 +09:00
Tomas Vondra 6ca015f9f0 Track unowned relations in doubly-linked list
Relations dropped in a single transaction are tracked in a list of
unowned relations.  With large number of dropped relations this resulted
in poor performance at the end of a transaction, when the relations are
removed from the singly linked list one by one.

Commit b4166911 attempted to address this issue (particularly when it
happens during recovery) by removing the relations in a reverse order,
resulting in O(1) lookups in the list of unowned relations.  This did
not work reliably, though, and it was possible to trigger the O(N^2)
behavior in various ways.

Instead of trying to remove the relations in a specific order with
respect to the linked list, which seems rather fragile, switch to a
regular doubly linked.  That allows us to remove relations cheaply no
matter where in the list they are.

As b4166911 was a bugfix, backpatched to all supported versions, do the
same thing here.

Reviewed-by: Alvaro Herrera
Discussion: https://www.postgresql.org/message-id/flat/80c27103-99e4-1d0c-642c-d9f3b94aaa0a%402ndquadrant.com
Backpatch-through: 9.4
2019-03-27 02:39:39 +01:00
Andres Freund 558a9165e0 Compute XID horizon for page level index vacuum on primary.
Previously the xid horizon was only computed during WAL replay. That
had two major problems:
1) It relied on knowing what the table pointed to looks like. That was
   easy enough before the introducing of tableam (we knew it had to be
   heap, although some trickery around logging the heap relfilenodes
   was required). But to properly handle table AMs we need
   per-database catalog access to look up the AM handler, which
   recovery doesn't allow.
2) Not knowing the xid horizon also makes it hard to support logical
   decoding on standbys. When on a catalog table, we need to be able
   to conflict with slots that have an xid horizon that's too old. But
   computing the horizon by visiting the heap only works once
   consistency is reached, but we always need to be able to detect
   conflicts.

There's also a secondary problem, in that the current method performs
redundant work on every standby. But that's counterbalanced by
potentially computing the value when not necessary (either because
there's no standby, or because there's no connected backends).

Solve 1) and 2) by moving computation of the xid horizon to the
primary and by involving tableam in the computation of the horizon.

To address the potentially increased overhead, increase the efficiency
of the xid horizon computation for heap by sorting the tids, and
eliminating redundant buffer accesses. When prefetching is available,
additionally perform prefetching of buffers.  As this is more of a
maintenance task, rather than something routinely done in every read
only query, we add an arbitrary 10 to the effective concurrency -
thereby using IO concurrency, when not globally enabled.  That's
possibly not the perfect formula, but seems good enough for now.

Bumps WAL format, as latestRemovedXid is now part of the records, and
the heap's relfilenode isn't anymore.

Author: Andres Freund, Amit Khandekar, Robert Haas
Reviewed-By: Robert Haas
Discussion:
    https://postgr.es/m/20181212204154.nsxf3gzqv3gesl32@alap3.anarazel.de
    https://postgr.es/m/20181214014235.dal5ogljs3bmlq44@alap3.anarazel.de
    https://postgr.es/m/20180703070645.wchpu5muyto5n647@alap3.anarazel.de
2019-03-26 16:52:54 -07:00
Alvaro Herrera 126d631222 Fix partitioned index creation bug with dropped columns
ALTER INDEX .. ATTACH PARTITION fails if the partitioned table where the
index is defined contains more dropped columns than its partition, with
this message:
  ERROR:  incorrect attribute map
The cause was that one caller of CompareIndexInfo was passing the number
of attributes of the partition rather than the parent, which confused
the length check.  Repair.

This can cause pg_upgrade to fail when used on such a database.  Leave
some more objects around after regression tests, so that the case is
detected by pg_upgrade test suite.

Remove some spurious empty lines noticed while looking for other cases
of the same problem.

Discussion: https://postgr.es/m/20190326213924.GA2322@alvherre.pgsql
2019-03-26 20:19:28 -03:00
Tom Lane 53bcf5e3db Build "other rels" of appendrel baserels in a separate step.
Up to now, otherrel RelOptInfos were built at the same time as baserel
RelOptInfos, thanks to recursion in build_simple_rel().  However,
nothing in query_planner's preprocessing cares at all about otherrels,
only baserels, so we don't really need to build them until just before
we enter make_one_rel.  This has two benefits:

* create_lateral_join_info did a lot of extra work to propagate
lateral-reference information from parents to the correct children.
But if we delay creation of the children till after that, it's
trivial (and much harder to break, too).

* Since we have all the restriction quals correctly assigned to
parent appendrels by this point, it'll be possible to do plan-time
pruning and never make child RelOptInfos at all for partitions that
can be pruned away.  That's not done here, but will be later on.

Amit Langote, reviewed at various times by Dilip Kumar, Jesper Pedersen,
Yoshikazu Imai, and David Rowley

Discussion: https://postgr.es/m/9d7c5112-cb99-6a47-d3be-cf1ee6862a1d@lab.ntt.co.jp
2019-03-26 18:21:10 -04:00
Tom Lane 8994cc6ffc Add ORDER BY to more ICU regression test cases.
Commit c77e12208 didn't fully fix the problem.  Per buildfarm
and local testing.
2019-03-26 17:46:04 -04:00
Tom Lane 7c366ac969 Fix oversight in data-type change for autovacuum_vacuum_cost_delay.
Commit caf626b2c missed that the relevant reloptions entry needs
to be moved from the intRelOpts[] array to realRelOpts[].
Somewhat surprisingly, it seems to work anyway, perhaps because
the desired default and limit values are all integers.  We ought
to have either a simpler data structure or better cross-checking
here, but that's for another patch.

Nikolay Shaplov

Discussion: https://postgr.es/m/4861742.12LTaSB3sv@x200m
2019-03-26 13:32:38 -04:00
Alvaro Herrera 1d21ba8a9b psql: Schema-qualify typecast in one \d query
Bug introduced in my commit bc87f22ef6
2019-03-26 13:06:41 -03:00
Tom Lane e8d5dd6be7 Get rid of duplicate child RTE for a partitioned table.
We've been creating duplicate RTEs for partitioned tables just
because we do so for regular inheritance parent tables.  But unlike
regular-inheritance parents which are themselves regular tables
and thus need to be scanned, partitioned tables don't need the
extra RTE.

This makes the conditions for building a child RTE the same as those
for building an AppendRelInfo, allowing minor simplification in
expand_single_inheritance_child.  Since the planner's actual processing
is driven off the AppendRelInfo list, nothing much changes beyond that,
we just have one fewer useless RTE entry.

Amit Langote, reviewed and hacked a bit by me

Discussion: https://postgr.es/m/9d7c5112-cb99-6a47-d3be-cf1ee6862a1d@lab.ntt.co.jp
2019-03-26 12:03:27 -04:00
Alvaro Herrera 1af25ca0c2 Improve psql's \d display of foreign key constraints
When used on a partition containing foreign keys coming from one of its
ancestors, \d would (rather unhelpfully) print the details about the
pg_constraint row in the partition.  This becomes a bit frustrating when
the user tries things like dropping the FK in the partition; instead,
show the details for the foreign key on the table where it is defined.

Also, when a table is referenced by a foreign key on a partitioned
table, we would show multiple "Referenced by" lines, one for each
partition, which gets unwieldy pretty fast.  Modify that so that it
shows only one line for the ancestor partitioned table where the FK is
defined.

Discussion: https://postgr.es/m/20181204143834.ym6euxxxi5aeqdpn@alvherre.pgsql
Reviewed-by: Tom Lane, Amit Langote, Peter Eisentraut
2019-03-26 11:14:34 -03:00
Peter Eisentraut c8c885b7a5 Fix misplaced const
These instances were apparently trying to carry the const qualifier
from the arguments through the complex casts, but for that the const
qualifier was misplaced.
2019-03-26 09:23:08 +01:00
Andres Freund 2ac1b2b175 Remove heap_hot_search().
After 71bdc99d0d, "tableam: Add helper for indexes to check if a
corresponding table tuples exist." there's no in-core user left. As
there's unlikely to be an external user, and such an external user
could easily be adjusted to use table_index_fetch_tuple_check(),
remove heap_hot_search().

Per complaint from Peter Geoghegan

Author: Andres Freund
Discussion: https://postgr.es/m/CAH2-Wzn0Oq4ftJrTqRAsWy2WGjv0QrJcwoZ+yqWsF_Z5vjUBFw@mail.gmail.com
2019-03-25 19:04:41 -07:00
Michael Paquier cdde886d36 Fix crash when using partition bound expressions
Since 7c079d7, partition bounds are able to use generalized expression
syntax when processed, treating "minvalue" and "maxvalue" as specific
cases as they get passed down for transformation as a column references.

The checks for infinite bounds in range expressions have been lax
though, causing crashes when trying to use column reference names with
more than one field.  Here is an example causing a crash:
CREATE TABLE list_parted (a int) PARTITION BY LIST (a);
CREATE TABLE part_list_crash PARTITION OF list_parted
  FOR VALUES IN (somename.somename);

Note that the creation of the second relation should fail as partition
bounds cannot have column references in their expressions, so when
finding an expression which does not match the expected infinite bounds,
then this commit lets the generic transformation machinery check after
it.  The error message generated in this case references as well a
missing RTE, which is confusing.  This problem will be treated
separately as it impacts as well default expressions for some time, and
for now only the cases where a crash can happen are fixed.

While on it, extend the set of regression tests in place for list
partition bounds and add an extra set for range partition bounds.

Reported-by: Alexander Lakhin
Author: Michael Paquier
Reviewed-by: Amit Langote
Discussion: https://postgr.es/m/15668-0377b1981aa1a393@postgresql.org
2019-03-26 10:09:14 +09:00
Andres Freund 2e3da03e9e tableam: Add table_get_latest_tid, to wrap heap_get_latest_tid.
This primarily is to allow WHERE CURRENT OF to continue to work as it
currently does. It's not clear to me that these semantics make sense
for every AM, but it works for the in-core heap, and the out of core
zheap. We can refine it further at a later point if necessary.

Author: Andres Freund
Discussion: https://postgr.es/m/20180703070645.wchpu5muyto5n647@alap3.anarazel.de
2019-03-25 17:14:48 -07:00
Andres Freund 71bdc99d0d tableam: Add helper for indexes to check if a corresponding table tuples exist.
This is, likely exclusively, useful to verify that conflicts detected
in a unique index are with live tuples, rather than dead ones.

Author: Andres Freund
Discussion: https://postgr.es/m/20180703070645.wchpu5muyto5n647@alap3.anarazel.de
2019-03-25 16:52:55 -07:00
Thomas Munro aa1419e63f Add MacPorts support to src/test/ldap tests.
Previously the test knew how to find an OpenLDAP installation at the
paths used by Homebrew.  Add the MacPorts paths too.

Author: Thomas Munro
Reviewed-by: Tom Lane
Discussion: https://postgr.es/m/CA%2BhUKGKrjGS7sO4jc53gp3qipCtEvThtdP_%3DzoixgX5ZBq4Nbw%40mail.gmail.com
2019-03-26 11:44:18 +13:00
Tom Lane f7111f72d2 Improve planner's selectivity estimates for inequalities on CTID.
We were getting just DEFAULT_INEQ_SEL for comparisons such as
"ctid >= constant", but it's possible to do a lot better if we don't
mind some assumptions about the table's tuple density being reasonably
uniform.  There are already assumptions much like that elsewhere in
the planner, so that hardly seems like much of an objection.

Extracted from a patch set that also proposes to introduce a special
executor node type for such queries.  Not sure if that's going to make
it into v12, but improving the selectivity estimate is useful
independently of that.

Edmund Horner, reviewed by David Rowley

Discussion: https://postgr.es/m/CAMyN-kB-nFTkF=VA_JPwFNo08S0d-Yk0F741S2B7LDmYAi8eyA@mail.gmail.com
2019-03-25 18:42:52 -04:00
Tom Lane 8edd0e7946 Suppress Append and MergeAppend plan nodes that have a single child.
If there's only one child relation, the Append or MergeAppend isn't
doing anything useful, and can be elided.  It does have a purpose
during planning though, which is to serve as a buffer between parent
and child Var numbering.  Therefore we keep it all the way through
to setrefs.c, and get rid of it only after fixing references in the
plan level(s) above it.  This works largely the same as setrefs.c's
ancient hack to get rid of no-op SubqueryScan nodes, and can even
share some code with that.

Note the change to make setrefs.c use apply_tlist_labeling rather than
ad-hoc code.  This has the effect of propagating the child's resjunk
and ressortgroupref labels, which formerly weren't propagated when
removing a SubqueryScan.  Doing that is demonstrably necessary for
the [Merge]Append cases, and seems harmless for SubqueryScan, if only
because trivial_subqueryscan is afraid to collapse cases where the
resjunk marking differs.  (I suspect that restriction could now be
removed, though it's unclear that it'd make any new matches possible,
since the outer query can't have references to a child resjunk column.)

David Rowley, reviewed by Alvaro Herrera and Tomas Vondra

Discussion: https://postgr.es/m/CAKJS1f_7u8ATyJ1JGTMHFoKDvZdeF-iEBhs+sM_SXowOr9cArg@mail.gmail.com
2019-03-25 15:42:35 -04:00
Peter Geoghegan f21668f328 Add "split after new tuple" nbtree optimization.
Add additional heuristics to the algorithm for locating an optimal split
location.  New logic identifies localized monotonically increasing
values in indexes with multiple columns.  When this insertion pattern is
detected, page splits split just after the new item that provoked a page
split (or apply leaf fillfactor in the style of a rightmost page split).
This optimization is a variation of the long established leaf fillfactor
optimization used during rightmost page splits.

50/50 page splits are only appropriate with a pattern of truly random
insertions, where the average space utilization ends up at 65% - 70%.
Without this patch, affected cases have leaf pages that are no more than
about 50% full on average.  Future insertions can never make use of the
free space left behind.  With this patch, affected cases have leaf pages
that are about 90% full on average (assuming a fillfactor of 90).

Localized monotonically increasing insertion patterns are presumed to be
fairly common in real-world applications.  There is a fair amount of
anecdotal evidence for this.  Both pg_depend system catalog indexes
(pg_depend_depender_index and pg_depend_reference_index) are at least
20% smaller after the regression tests are run when the optimization is
available.  Furthermore, many of the indexes created by a fair use
implementation of TPC-C for Postgres are consistently about 40% smaller
when the optimization is available.

Note that even pg_upgrade'd v3 indexes make use of this optimization.

Author: Peter Geoghegan
Reviewed-By: Heikki Linnakangas
Discussion: https://postgr.es/m/CAH2-WzkpKeZJrXvR_p7VSY1b-s85E3gHyTbZQzR0BkJ5LrWF_A@mail.gmail.com
2019-03-25 09:44:25 -07:00
Tom Lane f7ff0ae842 Further code review for new integerset code.
Mostly cosmetic adjustments, but I added a more reliable method of
detecting whether an iteration is in progress.
2019-03-25 12:23:48 -04:00
Heikki Linnakangas 9f75e37723 Refactor code to print pgbench progress reports.
threadRun() function is very long and deeply-nested. Extract the code to
print progress reports to a separate function, to make it slightly easier
to read.

Author: Fabien Coelho
Discussion: https://www.postgresql.org/message-id/alpine.DEB.2.21.1903101225270.17271%40lancre
2019-03-25 18:07:29 +02:00
Robert Haas 5857be907d Fix use of wrong datatype with sizeof().
OID and int are the same size, but they are not the same thing.

David Rowley

Discussion: http://postgr.es/m/CAKJS1f_MhS++XngkTvWL9X1v8M5t-0N0B-R465yHQY=TmNV0Ew@mail.gmail.com
2019-03-25 11:28:06 -04:00
Alvaro Herrera 25ee70511e pgbench: Remove \cset
Partial revert of commit 6260cc550b, "pgbench: add \cset and \gset
commands".

While \gset is widely considered a useful and necessary tool for user-
defined benchmarks, \cset does not have as much value, and its
implementation was considered "not to be up to project standards"
(though I, Álvaro, can't quite understand exactly how).  Therefore,
remove \cset.

Author: Fabien Coelho
Discussion: https://postgr.es/m/alpine.DEB.2.21.1903230716030.18811@lancre
Discussion: https://postgr.es/m/201901101900.mv7zduch6sad@alvherre.pgsql
2019-03-25 12:16:07 -03:00
Robert Haas 6f97457e0d Add progress reporting for CLUSTER and VACUUM FULL.
This uses the same progress reporting infrastructure added in commit
c16dc1aca5 and extends it to these
additional cases.  We lack the ability to track the internal progress
of sorts and index builds so the information reported is
coarse-grained for some parts of the operation, but it still seems
like a significant improvement over having nothing at all.

Tatsuro Yamada, reviewed by Thomas Munro, Masahiko Sawada, Michael
Paquier, Jeff Janes, Alvaro Herrera, Rafia Sabih, and by me.  A fair
amount of polishing also by me.

Discussion: http://postgr.es/m/59A77072.3090401@lab.ntt.co.jp
2019-03-25 10:59:04 -04:00
Alexander Korotkov 1d88a75c42 Get rid of backtracking in jsonpath_scan.l
Non-backtracking flex parsers work faster than backtracking ones.  So, this
commit gets rid of backtracking in jsonpath_scan.l.  That required explicit
handling of some cases as well as manual backtracking for some cases.  More
regression tests for numerics are added.

Discussion: https://mail.google.com/mail/u/0?ik=a20b091faa&view=om&permmsgid=msg-f%3A1628425344167939063
Author: John Naylor, Nikita Gluknov, Alexander Korotkov
2019-03-25 15:43:56 +03:00
Alexander Korotkov 8b17298f0b Cosmetic changes for jsonpath_gram.y and jsonpath_scan.l
This commit include formatting improvements, renamings and comments.  Also,
it makes jsonpath_scan.l be more uniform with other our lexers.  Firstly,
states names are renamed to more short alternatives.  Secondly, <INITIAL>
prefix removed from the rules.  Corresponding rules are moved to the tail, so
they would anyway work only in initial state.

Author: Alexander Korotkov
Reviewed-by: John Naylor
2019-03-25 15:42:51 +03:00
Heikki Linnakangas d303122eab Clean up the Simple-8b encoder code.
Coverity complained that simple8b_encode() might read beyond the end of
the 'diffs' array, in the loop to encode the integers. That was a false
positive, because we never get into the loop in modes 0 or 1, and the
array is large enough for all the other modes. But I admit it's very
subtle, so it's not surprising that Coverity didn't see it, and it's not
very obvious to humans either. Refactor it, so that the second loop
re-computes the differences, instead of carrying them over from the first
loop in the 'diffs' array. This way, the 'diffs' array is not needed
anymore. It makes no measurable difference in performance, and seems more
straightforward this way.

Also, improve the comments in simple8b_encode(): fix the comment about its
return value that was flat-out wrong, and explain the condition when it
returns EMPTY_CODEWORD better.

In the passing, move the 'selector' from the codeword's low bits to the
high bits. It doesn't matter much, but looking at the original paper, and
googling around for other Simple-8b implementations, that's how it's
usually done.

Per Coverity, and Tom Lane's report off-list.
2019-03-25 11:39:51 +02:00
Peter Eisentraut 148cf5f462 Align timestamps in pg_regress output
This way the timestamps line up in a mix of "ok" and "FAILED" output.

Author: Christoph Berg <christoph.berg@credativ.de>
Discussion: https://www.postgresql.org/message-id/20190321115059.GF2687%40msg.df7cb.de
2019-03-25 10:02:24 +01:00
Peter Eisentraut 481018f280 Add macro to cast away volatile without allowing changes to underlying type
This adds unvolatize(), which works just like unconstify() but for volatile.

Discussion: https://www.postgresql.org/message-id/flat/7a5cbea7-b8df-e910-0f10-04014bcad701%402ndquadrant.com
2019-03-25 09:37:03 +01:00
Andres Freund 9a8ee1dc65 tableam: Add and use table_fetch_row_version().
This is essentially the tableam version of heapam_fetch(),
i.e. fetching a tuple identified by a tid, performing visibility
checks.

Note that this different from table_index_fetch_tuple(), which is for
index lookups. It therefore has to handle a tid pointing to an earlier
version of a tuple if the AM uses an optimization like heap's HOT. Add
comments to that end.

This commit removes the stats_relation argument from heap_fetch, as
it's been unused for a long time.

Author: Andres Freund
Reviewed-By: Haribabu Kommi
Discussion: https://postgr.es/m/20180703070645.wchpu5muyto5n647@alap3.anarazel.de
2019-03-25 00:17:59 -07:00
Peter Eisentraut c77e12208c Add ORDER BY to regression test case
Apparently, the output order is different on different endianness, per
build farm member snapper.
2019-03-25 08:15:38 +01:00
Andres Freund 919e48b943 tableam: Use in CREATE TABLE AS and CREATE MATERIALIZED VIEW.
Previously those directly performed a heap_insert(). Use
table_insert() instead.  The input slot of those routines is not of
the target relation - we could fix that by copying if necessary, but
that'd not be beneficial for performance. As those codepaths don't
access any AM specific tuple fields (say xmin/xmax), there's no need
to use an AM specific slot.

Author: Andres Freund
Reviewed-By: Haribabu Kommi
Discussion: https://postgr.es/m/20180703070645.wchpu5muyto5n647@alap3.anarazel.de
2019-03-24 18:58:37 -07:00
Tom Lane 940311e4bb Un-hide most cascaded-drop details in regression test results.
Now that the ordering of DROP messages ought to be stable everywhere,
we should not need these kluges of hiding DETAIL output just to avoid
unstable ordering.  Hiding it's not great for test coverage, so
let's undo that where possible.

In a small number of places, it's necessary to leave it in, for
example because the output might include a variable pg_temp_nnn
schema name.  I also left things alone in places where the details
would depend on other regression test scripts, e.g. plpython_drop.sql.

Perhaps buildfarm experience will show this to be a bad idea,
but if so I'd like to know why.

Discussion: https://postgr.es/m/E1h6eep-0001Mw-Vd@gemulon.postgresql.org
2019-03-24 19:15:37 -04:00
Tom Lane af6550d344 Sort dependent objects before reporting them in DROP ROLE.
Commit 8aa9dd74b didn't quite finish the job in this area after all,
because DROP ROLE has a code path distinct from DROP OWNED BY, and
it was still reporting dependent objects in whatever order the index
scan returned them in.

Buildfarm experience shows that index ordering of equal-keyed objects is
significantly less stable than before in the wake of using heap TIDs as
tie-breakers.  So if we try to hide the unstable ordering by suppressing
DETAIL reports, we're just going to end up having to do that for every
DROP that reports multiple objects.  That's not great from a coverage
or problem-detection standpoint, and it's something we'll inevitably
forget in future patches, leading to more iterations of fixing-an-
unstable-result.  So let's just bite the bullet and sort here too.

Discussion: https://postgr.es/m/E1h6eep-0001Mw-Vd@gemulon.postgresql.org
2019-03-24 18:17:53 -04:00
Peter Geoghegan 59ab3be9e4 Remove dead code from nbtsplitloc.c.
It doesn't make sense to consider the possibility that there will only
be one candidate split point when choosing among split points to find
the split with the lowest penalty.  This is a vestige of an earlier
version of the patch that became commit fab25024.

Issue spotted while rereviewing coverage of the nbtree patch series
using gcov.
2019-03-24 12:28:58 -07:00
Michael Paquier 276d2e6c2d Make current_logfiles use permissions assigned to files in data directory
Since its introduction in 19dc233c, current_logfiles has been assigned
the same permissions as a log file, which can be enforced with
log_file_mode.  This setup can lead to incompatibility problems with
group access permissions as current_logfiles is not located in the log
directory, but at the root of the data folder.  Hence, if group
permissions are used but log_file_mode is more restrictive, a backup
with a user in the group having read access could fail even if the log
directory is located outside of the data folder.

Per discussion with the folks mentioned below, we have concluded that
current_logfiles should not be treated as a log file as it only stores
metadata related to log files, and that it should use the same
permissions as all other files in the data directory.  This solution has
the merit to be simple and fixes all the interaction problems between
group access and log_file_mode.

Author: Haribabu Kommi
Reviewed-by: Stephen Frost, Robert Haas, Tom Lane, Michael Paquier
Discussion: https://postgr.es/m/CAJrrPGcEotF1P7AWoeQyD3Pqr-0xkQg_Herv98DjbaMj+naozw@mail.gmail.com
Backpatch-through: 11, where group access has been added.
2019-03-24 21:00:35 +09:00
Peter Eisentraut 280a408b48 Transaction chaining
Add command variants COMMIT AND CHAIN and ROLLBACK AND CHAIN, which
start new transactions with the same transaction characteristics as the
just finished one, per SQL standard.

Support for transaction chaining in PL/pgSQL is also added.  This
functionality is especially useful when running COMMIT in a loop in
PL/pgSQL.

Reviewed-by: Fabien COELHO <coelho@cri.ensmp.fr>
Discussion: https://www.postgresql.org/message-id/flat/28536681-324b-10dc-ade8-ab46f7645a5a@2ndquadrant.com
2019-03-24 11:33:02 +01:00
Andres Freund b2db277057 Remove spurious return.
Per buildfarm member anole.

Author: Andres Freund
2019-03-23 21:09:39 -07:00
Andres Freund 5db6df0c01 tableam: Add tuple_{insert, delete, update, lock} and use.
This adds new, required, table AM callbacks for insert/delete/update
and lock_tuple. To be able to reasonably use those, the EvalPlanQual
mechanism had to be adapted, moving more logic into the AM.

Previously both delete/update/lock call-sites and the EPQ mechanism had
to have awareness of the specific tuple format to be able to fetch the
latest version of a tuple. Obviously that needs to be abstracted
away. To do so, move the logic that find the latest row version into
the AM. lock_tuple has a new flag argument,
TUPLE_LOCK_FLAG_FIND_LAST_VERSION, that forces it to lock the last
version, rather than the current one.  It'd have been possible to do
so via a separate callback as well, but finding the last version
usually also necessitates locking the newest version, making it
sensible to combine the two. This replaces the previous use of
EvalPlanQualFetch().  Additionally HeapTupleUpdated, which previously
signaled either a concurrent update or delete, is now split into two,
to avoid callers needing AM specific knowledge to differentiate.

The move of finding the latest row version into tuple_lock means that
encountering a row concurrently moved into another partition will now
raise an error about "tuple to be locked" rather than "tuple to be
updated/deleted" - which is accurate, as that always happens when
locking rows. While possible slightly less helpful for users, it seems
like an acceptable trade-off.

As part of this commit HTSU_Result has been renamed to TM_Result, and
its members been expanded to differentiated between updating and
deleting. HeapUpdateFailureData has been renamed to TM_FailureData.

The interface to speculative insertion is changed so nodeModifyTable.c
does not have to set the speculative token itself anymore. Instead
there's a version of tuple_insert, tuple_insert_speculative, that
performs the speculative insertion (without requiring a flag to signal
that fact), and the speculative insertion is either made permanent
with table_complete_speculative(succeeded = true) or aborted with
succeeded = false).

Note that multi_insert is not yet routed through tableam, nor is
COPY. Changing multi_insert requires changes to copy.c that are large
enough to better be done separately.

Similarly, although simpler, CREATE TABLE AS and CREATE MATERIALIZED
VIEW are also only going to be adjusted in a later commit.

Author: Andres Freund and Haribabu Kommi
Discussion:
    https://postgr.es/m/20180703070645.wchpu5muyto5n647@alap3.anarazel.de
    https://postgr.es/m/20190313003903.nwvrxi7rw3ywhdel@alap3.anarazel.de
    https://postgr.es/m/20160812231527.GA690404@alvherre.pgsql
2019-03-23 19:55:57 -07:00
Tom Lane f778e537a0 Remove inadequate check for duplicate "xml" PI.
I failed to think about PIs starting with "xml".  We don't really
need this check at all, so just take it out.  Oversight in
commit 8d1dadb25 et al.
2019-03-23 17:40:19 -04:00
Tom Lane 4870dce37f Ensure xmloption = content while restoring pg_dump output.
In combination with the previous commit, this ensures that valid XML
data can always be dumped and reloaded, whether it is "document"
or "content".

Discussion: https://postgr.es/m/CAN-V+g-6JqUQEQZ55Q3toXEN6d5Ez5uvzL4VR+8KtvJKj31taw@mail.gmail.com
2019-03-23 16:51:37 -04:00
Tom Lane 8d1dadb25b Accept XML documents when xmloption = content, as required by SQL:2006+.
Previously we were using the SQL:2003 definition, which doesn't allow
this, but that creates a serious dump/restore gotcha: there is no
setting of xmloption that will allow all valid XML data.  Hence,
switch to the 2006 definition.

Since libxml doesn't accept <!DOCTYPE> directives in the mode we
use for CONTENT parsing, the implementation is to detect <!DOCTYPE>
in the input and switch to DOCUMENT parsing mode.  This should not
cost much, because <!DOCTYPE> should be close to the front of the
input if it's there at all.  It's possible that this causes the
error messages for malformed input to be slightly different than
they were before, if said input includes <!DOCTYPE>; but that does
not seem like a big problem.

In passing, buy back a few cycles in parsing of large XML documents
by not doing strlen() of the whole input in parse_xml_decl().

Back-patch because dump/restore failures are not nice.  This change
shouldn't break any cases that worked before, so it seems safe to
back-patch.

Chapman Flack (revised a bit by me)

Discussion: https://postgr.es/m/CAN-V+g-6JqUQEQZ55Q3toXEN6d5Ez5uvzL4VR+8KtvJKj31taw@mail.gmail.com
2019-03-23 16:51:37 -04:00
Peter Geoghegan 05f110cc0b Suppress DETAIL output from an event_trigger test.
Suppress 3 lines of unstable DETAIL output from a DROP ROLE statement in
event_trigger.sql.  This is further cleanup for commit dd299df8.

Note that the event_trigger test instability issue is very similar to
the recently suppressed foreign_data test instability issue.  Both
issues involve DETAIL output for a DROP ROLE statement that needed to be
changed as part of dd299df8.

Per buildfarm member macaque.
2019-03-23 13:49:53 -07:00
Peter Geoghegan 29b64d1de7 Add nbtree high key "continuescan" optimization.
Teach nbtree forward index scans to check the high key before moving to
the right sibling page in the hope of finding that it isn't actually
necessary to do so.  The new check may indicate that the scan definitely
cannot find matching tuples to the right, ending the scan immediately.
We already opportunistically force a similar "continuescan orientated"
key check of the final non-pivot tuple when it's clear that it cannot be
returned to the scan due to being dead-to-all.  The new high key check
is complementary.

The new approach for forward scans is more effective than checking the
final non-pivot tuple, especially with composite indexes and non-unique
indexes.  The improvements to the logic for picking a split point added
by commit fab25024 make it likely that relatively dissimilar high keys
will appear on a page.  A distinguishing key value that can only appear
on non-pivot tuples on the right sibling page will often be present in
leaf page high keys.

Since forcing the final item to be key checked no longer makes any
difference in the case of forward scans, the existing extra key check is
now only used for backwards scans.  Backward scans continue to
opportunistically check the final non-pivot tuple, which is actually the
first non-pivot tuple on the page (not the last).

Note that even pg_upgrade'd v3 indexes make use of this optimization.

Author: Peter Geoghegan, Heikki Linnakangas
Reviewed-By: Heikki Linnakangas
Discussion: https://postgr.es/m/CAH2-WzkOmUduME31QnuTFpimejuQoiZ-HOf0pOWeFZNhTMctvA@mail.gmail.com
2019-03-23 11:01:53 -07:00
Michael Paquier 4ba96d1b82 Improve format of code and some error messages in pg_checksums
This makes the code more consistent with the surroundings.

Author: Fabrízio de Royes Mello
Discussion: https://postgr.es/m/CAFcNs+pXb_35r5feMU3-dWsWxXU=Yjq+spUsthFyGFbT0QcaKg@mail.gmail.com
2019-03-23 21:56:43 +09:00
Tom Lane fb50d3f03f Add unreachable "break" to satisfy -Wimplicit-fallthrough.
gcc is a bit pickier about this than perhaps it should be.

Discussion: https://postgr.es/m/E1h6zzT-0003ft-DD@gemulon.postgresql.org
2019-03-23 01:32:58 -04:00
Andres Freund cdcffe2263 Expand EPQ tests for UPDATEs and DELETEs
Previously there was basically no coverage for UPDATEs encountering
deleted rows, and no coverage for DELETE having to perform EPQ. That's
problematic for an upcoming commit in which EPQ is tought to integrate
with tableams.  Also, there was no test for UPDATE to encounter a row
UPDATEd into another partition.

Author: Andres Freund
2019-03-22 19:55:23 -07:00
Michael Paquier e0090c8690 Add option -N/--no-sync to pg_checksums
This is an option consistent with what pg_dump, pg_rewind and
pg_basebackup provide which is useful for leveraging the I/O effort when
testing things, not to be used in a production environment.

Author: Michael Paquier
Reviewed-by: Michael Banck, Fabien Coelho, Sergei Kornilov
Discussion: https://postgr.es/m/20181221201616.GD4974@nighthawk.caipicrew.dd-dns.de
2019-03-23 08:37:36 +09:00
Peter Eisentraut 7b084b3831 Revert "Add gitignore entries for jsonpath_gram.h"
This reverts commit 4e274a043f.

These files aren't actually built anymore since 550b9d26f.
2019-03-23 00:19:34 +01:00
Michael Paquier ed308d7837 Add options to enable and disable checksums in pg_checksums
An offline cluster can now work with more modes in pg_checksums:
- --enable enables checksums in a cluster, updating all blocks with a
correct checksum, and updating the control file at the end.
- --disable disables checksums in a cluster, updating only the control
file.
- --check is an extra option able to verify checksums for a cluster, and
the default used if no mode is specified.

When running --enable or --disable, the data folder gets fsync'd for
durability, and then it is followed by a control file update and flush
to keep the operation consistent should the tool be interrupted, killed
or the host unplugged.  If no mode is specified in the options, then
--check is used for compatibility with older versions of pg_checksums
(named pg_verify_checksums in v11 where it was introduced).

Author: Michael Banck, Michael Paquier
Reviewed-by: Fabien Coelho, Magnus Hagander, Sergei Kornilov
Discussion: https://postgr.es/m/20181221201616.GD4974@nighthawk.caipicrew.dd-dns.de
2019-03-23 08:12:55 +09:00
Peter Eisentraut 87914e708a Make subscription collation test work independent of locale
We need to set the database to UTF8 encoding so that the test can use
Unicode escapes.
2019-03-22 23:33:31 +01:00
Peter Eisentraut 4e274a043f Add gitignore entries for jsonpath_gram.h 2019-03-22 23:19:30 +01:00
Tom Lane 734308a220 Rearrange make_partitionedrel_pruneinfo to avoid work when we can't prune.
Postpone most of the effort of constructing PartitionedRelPruneInfos
until after we have found out whether run-time pruning is needed at all.
This costs very little duplicated effort (basically just an extra
find_base_rel() call per partition) and saves quite a bit when we
can't do run-time pruning.

Also, merge the first loop (for building relid_subpart_map) into
the second loop, since we don't need the map to be valid during
that loop.

Amit Langote

Discussion: https://postgr.es/m/9d7c5112-cb99-6a47-d3be-cf1ee6862a1d@lab.ntt.co.jp
2019-03-22 14:56:12 -04:00
Peter Geoghegan 09963cedce Go back to suppressing foreign_data DETAIL test output.
This is almost a straight revert of commit fff518d, which itself was a
revert of 7d3bf73ac.

It turns out that commit 8aa9dd74, which sorted dependent objects before
deletion in DROP OWNED BY, was not sufficient to make all remaining
unstable DETAIL output stable.  Unstable DETAIL output from DROP ROLE
was not affected, because that happens to use a different code path.  It
doesn't seem worthwhile to fix the other code path at this time.

Discussion: https://postgr.es/m/6226.1553274783@sss.pgh.pa.us
2019-03-22 11:34:28 -07:00
Tom Lane c8151e6423 Don't copy PartitionBoundInfo in set_relation_partition_info.
I (tgl) remain dubious that it's a good idea for PartitionDirectory
to hold a pin on a relcache entry throughout planning, rather than
copying the data or using some kind of refcount scheme.  However, it's
certainly the responsibility of the PartitionDirectory code to ensure
that what it's handing back is a stable data structure, not that of
its caller.  So this is a pretty clear oversight in commit 898e5e329,
and one that can cost a lot of performance when there are many
partitions.

Amit Langote (extracted from a much larger patch set)

Discussion: https://postgr.es/m/CA+TgmoY3bRmGB6-DUnoVy5fJoreiBJ43rwMrQRCdPXuKt4Ykaw@mail.gmail.com
Discussion: https://postgr.es/m/9d7c5112-cb99-6a47-d3be-cf1ee6862a1d@lab.ntt.co.jp
2019-03-22 14:16:58 -04:00
Heikki Linnakangas b5fd4972a3 Fix yet more portability bugs in integerset and its tests.
There were more large constants that needed UINT64CONST. And one variable
was declared as "int", when it needed to be uint64. These bugs were only
visible on 32-bit systems; clearly I should've tested on one, given that
this code does a lot of work with 64-bit integers.

Also, in the test "huge distances" test, the code created some values with
random distances between them, but the test logic didn't take into account
the possibility that the random distance was exactly 1. That never actually
happens with the seed we're using, but let's be tidy.
2019-03-22 17:59:19 +02:00
Peter Eisentraut 638db07814 Fix ICU tests for older ICU versions
Change the tests to use old-style ICU locale specifications so that
they can run on older ICU versions.
2019-03-22 14:40:56 +01:00
Heikki Linnakangas c477c68c8f More portability fixes for integerset tests.
Use UINT64CONST for large constants.
2019-03-22 14:57:35 +02:00
Heikki Linnakangas 32f8ddf7e1 Make printf format strings in test_integerset portable.
Use UINT64_FORMAT for printing uint64s.
2019-03-22 14:42:33 +02:00
Heikki Linnakangas 608c5f4347 Make the integerset test more verbose.
Buildfarm member 'woodlouse' failed one of the tests, and I'm not sure
which test failed. Better to print the names of the tests, so that it
will appear in the regression.diffs on failure.
2019-03-22 14:32:53 +02:00
Heikki Linnakangas d1b9ee4e44 Fix bug in the GiST vacuum's 2nd stage.
We mustn't assume that the IndexVacuumInfo pointer passed to bulkdelete()
stage is still valid in the vacuumcleanup() stage.

Per very pink buildfarm.
2019-03-22 14:11:46 +02:00
Heikki Linnakangas 7df159a620 Delete empty pages during GiST VACUUM.
To do this, we scan GiST two times. In the first pass we make note of
empty leaf pages and internal pages. At second pass we scan through
internal pages, looking for downlinks to the empty pages.

Deleting internal pages is still not supported, like in nbtree, the last
child of an internal page is never deleted. That means that if you have a
workload where new keys are always inserted to different area than where
old keys are removed, the index will still grow without bound. But the rate
of growth will be an order of magnitude slower than before.

Author: Andrey Borodin
Discussion: https://www.postgresql.org/message-id/B1E4DF12-6CD3-4706-BDBD-BF3283328F60@yandex-team.ru
2019-03-22 13:21:45 +02:00
Heikki Linnakangas df816f6ad5 Add IntegerSet, to hold large sets of 64-bit ints efficiently.
The set is implemented as a B-tree, with a compact representation at leaf
items, using Simple-8b algorithm, so that clusters of nearby values use
less memory.

The IntegerSet isn't used for anything yet, aside from the test code, but
we have two patches in the works that would benefit from this: A patch to
allow GiST vacuum to delete empty pages, and a patch to reduce heap
VACUUM's memory usage, by storing the list of dead TIDs more efficiently
and lifting the 1 GB limit on its size.

This includes a unit test module, in src/test/modules/test_integerset.
It can be used to verify correctness, as a regression test, but if you run
it manully, it can also print memory usage and execution time of some of
the tests.

Author: Heikki Linnakangas, Andrey Borodin
Reviewed-by: Julien Rouhaud
Discussion: https://www.postgresql.org/message-id/b5e82599-1966-5783-733c-1a947ddb729f@iki.fi
2019-03-22 13:21:45 +02:00
Peter Eisentraut 5e1963fb76 Collations with nondeterministic comparison
This adds a flag "deterministic" to collations.  If that is false,
such a collation disables various optimizations that assume that
strings are equal only if they are byte-wise equal.  That then allows
use cases such as case-insensitive or accent-insensitive comparisons
or handling of strings with different Unicode normal forms.

This functionality is only supported with the ICU provider.  At least
glibc doesn't appear to have any locales that work in a
nondeterministic way, so it's not worth supporting this for the libc
provider.

The term "deterministic comparison" in this context is from Unicode
Technical Standard #10
(https://unicode.org/reports/tr10/#Deterministic_Comparison).

This patch makes changes in three areas:

- CREATE COLLATION DDL changes and system catalog changes to support
  this new flag.

- Many executor nodes and auxiliary code are extended to track
  collations.  Previously, this code would just throw away collation
  information, because the eventually-called user-defined functions
  didn't use it since they only cared about equality, which didn't
  need collation information.

- String data type functions that do equality comparisons and hashing
  are changed to take the (non-)deterministic flag into account.  For
  comparison, this just means skipping various shortcuts and tie
  breakers that use byte-wise comparison.  For hashing, we first need
  to convert the input string to a canonical "sort key" using the ICU
  analogue of strxfrm().

Reviewed-by: Daniel Verite <daniel@manitou-mail.org>
Reviewed-by: Peter Geoghegan <pg@bowt.ie>
Discussion: https://www.postgresql.org/message-id/flat/1ccc668f-4cbc-0bef-af67-450b47cdfee7@2ndquadrant.com
2019-03-22 12:12:43 +01:00
Michael Paquier 2ab6d28d23 Fix crash with pg_partition_root
Trying to call the function with the top-most parent of a partition tree
was leading to a crash.  In this case the correct result is to return
the top-most parent itself.

Reported-by: Álvaro Herrera
Author: Michael Paquier
Reviewed-by: Amit Langote
Discussion: https://postgr.es/m/20190322032612.GA323@alvherre.pgsql
2019-03-22 17:27:38 +09:00
Peter Geoghegan fff518d051 Revert "Suppress DETAIL output from a foreign_data test."
This should be superseded by commit 8aa9dd74.
2019-03-21 15:33:13 -07:00
Alvaro Herrera 03ae9d59bd Catversion bump announced in previous commit but forgotten 2019-03-21 18:43:41 -03:00
Alvaro Herrera 7e7c57bbb2 Fix dependency recording bug for partitioned PKs
When DefineIndex recurses to create constraints on partitions, it needs
to use the value returned by index_constraint_create to set up partition
dependencies.  However, in the course of fixing the DEPENDENCY_INTERNAL_AUTO
mess, commit 1d92a0c9f7 introduced some code to that function that
clobbered the return value, causing the recorded OID to be of the wrong
object.  Close examination of pg_depend after creating the tables leads
to indescribable objects :-( My sin (in commit bdc3d7fa23, while
preparing for DDL deparsing in event triggers) was to use a variable
name for the return value that's typically used for throwaway objects in
dependency-setting calls ("referenced").  Fix by changing the variable
names to match extended practice (the return value is "myself" rather
than "referenced".)

The pg_upgrade test notices the problem (in an indirect way: the pg_dump
outputs are in different order), but only if you create the objects in a
specific way that wasn't being used in the existing tests.  Add a stanza
to leave some objects around that shows the bug.

Catversion bump because preexisting databases might have bogus pg_depend
entries.

Discussion: https://postgr.es/m/20190318204235.GA30360@alvherre.pgsql
2019-03-21 18:34:29 -03:00
Tom Lane bfb456c1b9 Improve error reporting for DROP FUNCTION/PROCEDURE/AGGREGATE/ROUTINE.
These commands allow the argument type list to be omitted if there is
just one object that matches by name.  However, if that syntax was
used with DROP IF EXISTS and there was more than one match, you got
a "function ... does not exist, skipping" notice message rather than a
truthful complaint about the ambiguity.  This was basically due to
poor factorization and a rats-nest of logic, so refactor the relevant
lookup code to make it cleaner.

Note that this amounts to narrowing the scope of which sorts of
error conditions IF EXISTS will bypass.  Per discussion, we only
intend it to skip no-such-object cases, not multiple-possible-matches
cases.

Per bug #15572 from Ash Marath.  Although this definitely seems like
a bug, it's not clear that people would thank us for changing the
behavior in minor releases, so no back-patch.

David Rowley, reviewed by Julien Rouhaud and Pavel Stehule

Discussion: https://postgr.es/m/15572-ed1b9ed09503de8a@postgresql.org
2019-03-21 11:52:08 -04:00
Thomas Munro 0f086f84ad Add DNS SRV support for LDAP server discovery.
LDAP servers can be advertised on a network with RFC 2782 DNS SRV
records.  The OpenLDAP command-line tools automatically try to find
servers that way, if no server name is provided by the user.  Teach
PostgreSQL to do the same using OpenLDAP's support functions, when
building with OpenLDAP.

For now, we assume that HAVE_LDAP_INITIALIZE (an OpenLDAP extension
available since OpenLDAP 2.0 and also present in Apple LDAP) implies
that you also have ldap_domain2hostlist() (which arrived in the same
OpenLDAP version and is also present in Apple LDAP).

Author: Thomas Munro
Reviewed-by: Daniel Gustafsson
Discussion: https://postgr.es/m/CAEepm=2hAnSfhdsd6vXsM6VZVN0br-FbAZ-O+Swk18S5HkCP=A@mail.gmail.com
2019-03-21 15:28:17 +13:00
Tom Lane 8aa9dd74b3 Sort the dependent objects before deletion in DROP OWNED BY.
This finishes a task we left undone in commit f1ad067fc, by extending
the delete-in-descending-OID-order rule to deletions triggered by
DROP OWNED BY.  We've coped with machine-dependent deletion orders
one time too many, and the new issues caused by Peter G's recent
nbtree hacking seem like the last straw.

Discussion: https://postgr.es/m/E1h6eep-0001Mw-Vd@gemulon.postgresql.org
2019-03-20 18:06:29 -04:00
Alvaro Herrera a6da004715 Add index_get_partition convenience function
This new function simplifies some existing coding, as well as supports
future patches.

Discussion: https://postgr.es/m/201901222145.t6wws6t6vrcu@alvherre.pgsql
Reviewed-by: Amit Langote, Jesper Pedersen
2019-03-20 18:18:50 -03:00
Peter Geoghegan 3d0dcc5c7f Fix spurious compiler warning in nbtxlog.c.
Cleanup from commit dd299df8.

Per complaint from Tom Lane.
2019-03-20 14:04:35 -07:00
Peter Geoghegan 7d3bf73ac4 Suppress DETAIL output from a foreign_data test.
Unstable sort order related to changes to nbtree from commit dd299df8
can cause two lines of DETAIL output to be in opposite-of-expected
order.  Suppress the output using the same VERBOSITY hack that is used
elsewhere in the foreign_data tests.

Note that the same foreign_data.out DETAIL output was mechanically
updated by commit dd299df8.  Only a few such changes were required,
though.

Per buildfarm member batfish.

Discussion: https://postgr.es/m/CAH2-WzkCQ_MtKeOpzozj7QhhgP1unXsK8o9DMAFvDqQFEPpkYQ@mail.gmail.com
2019-03-20 13:38:38 -07:00
Alvaro Herrera 815b20ae0c Restore RI trigger sanity check
I unnecessarily removed this check in 3de241dba8 because I
misunderstood what the final representation of constraints across a
partitioning hierarchy was to be.  Put it back (in both branches).

Discussion: https://postgr.es/m/201901222145.t6wws6t6vrcu@alvherre.pgsql
2019-03-20 17:28:43 -03:00
Peter Geoghegan fab2502433 Consider secondary factors during nbtree splits.
Teach nbtree to give some consideration to how "distinguishing"
candidate leaf page split points are.  This should not noticeably affect
the balance of free space within each half of the split, while still
making suffix truncation truncate away significantly more attributes on
average.

The logic for choosing a leaf split point now uses a fallback mode in
the case where the page is full of duplicates and it isn't possible to
find even a minimally distinguishing split point.  When the page is full
of duplicates, the split should pack the left half very tightly, while
leaving the right half mostly empty.  Our assumption is that logical
duplicates will almost always be inserted in ascending heap TID order
with v4 indexes.  This strategy leaves most of the free space on the
half of the split that will likely be where future logical duplicates of
the same value need to be placed.

The number of cycles added is not very noticeable.  This is important
because deciding on a split point takes place while at least one
exclusive buffer lock is held.  We avoid using authoritative insertion
scankey comparisons to save cycles, unlike suffix truncation proper.  We
use a faster binary comparison instead.

Note that even pg_upgrade'd v3 indexes make use of these optimizations.
Benchmarking has shown that even v3 indexes benefit, despite the fact
that suffix truncation will only truncate non-key attributes in INCLUDE
indexes.  Grouping relatively similar tuples together is beneficial in
and of itself, since it reduces the number of leaf pages that must be
accessed by subsequent index scans.

Author: Peter Geoghegan
Reviewed-By: Heikki Linnakangas
Discussion: https://postgr.es/m/CAH2-WzmmoLNQOj9mAD78iQHfWLJDszHEDrAzGTUMG3mVh5xWPw@mail.gmail.com
2019-03-20 10:12:19 -07:00
Peter Geoghegan dd299df818 Make heap TID a tiebreaker nbtree index column.
Make nbtree treat all index tuples as having a heap TID attribute.
Index searches can distinguish duplicates by heap TID, since heap TID is
always guaranteed to be unique.  This general approach has numerous
benefits for performance, and is prerequisite to teaching VACUUM to
perform "retail index tuple deletion".

Naively adding a new attribute to every pivot tuple has unacceptable
overhead (it bloats internal pages), so suffix truncation of pivot
tuples is added.  This will usually truncate away the "extra" heap TID
attribute from pivot tuples during a leaf page split, and may also
truncate away additional user attributes.  This can increase fan-out,
especially in a multi-column index.  Truncation can only occur at the
attribute granularity, which isn't particularly effective, but works
well enough for now.  A future patch may add support for truncating
"within" text attributes by generating truncated key values using new
opclass infrastructure.

Only new indexes (BTREE_VERSION 4 indexes) will have insertions that
treat heap TID as a tiebreaker attribute, or will have pivot tuples
undergo suffix truncation during a leaf page split (on-disk
compatibility with versions 2 and 3 is preserved).  Upgrades to version
4 cannot be performed on-the-fly, unlike upgrades from version 2 to
version 3.  contrib/amcheck continues to work with version 2 and 3
indexes, while also enforcing stricter invariants when verifying version
4 indexes.  These stricter invariants are the same invariants described
by "3.1.12 Sequencing" from the Lehman and Yao paper.

A later patch will enhance the logic used by nbtree to pick a split
point.  This patch is likely to negatively impact performance without
smarter choices around the precise point to split leaf pages at.  Making
these two mostly-distinct sets of enhancements into distinct commits
seems like it might clarify their design, even though neither commit is
particularly useful on its own.

The maximum allowed size of new tuples is reduced by an amount equal to
the space required to store an extra MAXALIGN()'d TID in a new high key
during leaf page splits.  The user-facing definition of the "1/3 of a
page" restriction is already imprecise, and so does not need to be
revised.  However, there should be a compatibility note in the v12
release notes.

Author: Peter Geoghegan
Reviewed-By: Heikki Linnakangas, Alexander Korotkov
Discussion: https://postgr.es/m/CAH2-WzkVb0Kom=R+88fDFb=JSxZMFvbHVC6Mn9LJ2n=X=kS-Uw@mail.gmail.com
2019-03-20 10:04:01 -07:00
Peter Geoghegan e5adcb789d Refactor nbtree insertion scankeys.
Use dedicated struct to represent nbtree insertion scan keys.  Having a
dedicated struct makes the difference between search type scankeys and
insertion scankeys a lot clearer, and simplifies the signature of
several related functions.  This is based on a suggestion by Andrey
Lepikhov.

Streamline how unique index insertions cache binary search progress.
Cache the state of in-progress binary searches within _bt_check_unique()
for later instead of having callers avoid repeating the binary search in
an ad-hoc manner.  This makes it easy to add a new optimization:
_bt_check_unique() now falls out of its loop immediately in the common
case where it's already clear that there couldn't possibly be a
duplicate.

The new _bt_check_unique() scheme makes it a lot easier to manage cached
binary search effort afterwards, from within _bt_findinsertloc().  This
is needed for the upcoming patch to make nbtree tuples unique by
treating heap TID as a final tiebreaker column.  Unique key binary
searches need to restore lower and upper bounds.  They cannot simply
continue to use the >= lower bound as the offset to insert at, because
the heap TID tiebreaker column must be used in comparisons for the
restored binary search (unlike the original _bt_check_unique() binary
search, where scankey's heap TID column must be omitted).

Author: Peter Geoghegan, Heikki Linnakangas
Reviewed-By: Heikki Linnakangas, Andrey Lepikhov
Discussion: https://postgr.es/m/CAH2-WzmE6AhUdk9NdWBf4K3HjWXZBX3+umC7mH7+WDrKcRtsOw@mail.gmail.com
2019-03-20 09:30:57 -07:00
Alexander Korotkov 550b9d26f8 Get rid of jsonpath_gram.h and jsonpath_scanner.h
Jsonpath grammar and scanner are both quite small.  It doesn't worth complexity
to compile them separately.  This commit makes grammar and scanner be compiled
at once.  Therefore, jsonpath_gram.h and jsonpath_gram.h are no longer needed.
This commit also does some reorganization of code in jsonpath_gram.y.

Discussion: https://postgr.es/m/d47b2023-3ecb-5f04-d253-d557547cf74f%402ndQuadrant.com
2019-03-20 11:13:34 +03:00
Alexander Korotkov 641fde2523 Remove ambiguity for jsonb_path_match() and jsonb_path_exists()
There are 2-arguments and 4-arguments versions of jsonb_path_match() and
jsonb_path_exists().  But 4-arguments versions have optional 3rd and 4th
arguments, that leads to ambiguity.  In the same time 2-arguments versions are
needed only for @@ and @? operators.  So, rename 2-arguments versions to
remove the ambiguity.

Catversion is bumped.
2019-03-20 10:30:56 +03:00
Tom Lane 1f39a1c064 Restructure libpq's handling of send failures.
Originally, if libpq got a failure (e.g., ECONNRESET) while trying to
send data to the server, it would just report that and wash its hands
of the matter.  It was soon found that that wasn't a very pleasant way
of coping with server-initiated disconnections, so we introduced a hack
(pqHandleSendFailure) in the code that sends queries to make it peek
ahead for server error reports before reporting the send failure.

It now emerges that related cases can occur during connection setup;
in particular, as of TLS 1.3 it's unsafe to assume that SSL connection
failures will be reported by SSL_connect rather than during our first
send attempt.  We could have fixed that in a hacky way by applying
pqHandleSendFailure after a startup packet send failure, but
(a) pqHandleSendFailure explicitly disclaims suitability for use in any
state except query startup, and (b) the problem still potentially exists
for other send attempts in libpq.

Instead, let's fix this in a more general fashion by eliminating
pqHandleSendFailure altogether, and instead arranging to postpone
all reports of send failures in libpq until after we've made an
attempt to read and process server messages.  The send failure won't
be reported at all if we find a server message or detect input EOF.

(Note: this removes one of the reasons why libpq typically overwrites,
rather than appending to, conn->errorMessage: pqHandleSendFailure needed
that behavior so that the send failure report would be replaced if we
got a server message or read failure report.  Eventually I'd like to get
rid of that overwrite behavior altogether, but today is not that day.
For the moment, pqSendSome is assuming that its callees will overwrite
not append to conn->errorMessage.)

Possibly this change should get back-patched someday; but it needs
testing first, so let's not consider that till after v12 beta.

Discussion: https://postgr.es/m/CAEepm=2n6Nv+5tFfe8YnkUm1fXgvxR0Mm1FoD+QKG-vLNGLyKg@mail.gmail.com
2019-03-19 16:20:28 -04:00
Alexander Korotkov 5e28b778bf Rename typedef in jsonpath_gram.y from "string" to "JsonPathString"
Reason is the same as in 75c57058b0.
2019-03-19 21:01:10 +03:00
Peter Geoghegan 1009920aaa Tweak nbtsearch.c function prototype order.
nbtsearch.c's static function prototypes were slightly out of order.
Make the order consistent with static function definition order.
2019-03-19 09:59:05 -07:00
Tom Lane 0dfe3d0ef5 Make checkpoint requests more robust.
Commit 6f6a6d8b1 introduced a delay of up to 2 seconds if we're trying
to request a checkpoint but the checkpointer hasn't started yet (or,
much less likely, our kill() call fails).  However buildfarm experience
shows that that's not quite enough for slow or heavily-loaded machines.
There's no good reason to assume that the checkpointer won't start
eventually, so we may as well make the timeout much longer, say 60 sec.

However, if the caller didn't say CHECKPOINT_WAIT, it seems like a bad
idea to be waiting at all, much less for as long as 60 sec.  We can
remove the need for that, and make this whole thing more robust, by
adjusting the code so that the existence of a pending checkpoint
request is clear from the contents of shared memory, and making sure
that the checkpointer process will notice it at startup even if it did
not get a signal.  In this way there's no need for a non-CHECKPOINT_WAIT
call to wait at all; if it can't send the signal, it can nonetheless
assume that the checkpointer will eventually service the request.

A potential downside of this change is that "kill -INT" on the checkpointer
process is no longer enough to trigger a checkpoint, should anyone be
relying on something so hacky.  But there's no obvious reason to do it
like that rather than issuing a plain old CHECKPOINT command, so we'll
assume that nobody is.  There doesn't seem to be a way to preserve this
undocumented quasi-feature without introducing race conditions.

Since a principal reason for messing with this is to prevent intermittent
buildfarm failures, back-patch to all supported branches.

Discussion: https://postgr.es/m/27830.1552752475@sss.pgh.pa.us
2019-03-19 12:49:27 -04:00
Peter Eisentraut 28988a84cf Reorder LOCALLOCK structure members to compact the size
Save 8 bytes (on x86-64) by filling up padding holes.

Author: Takayuki Tsunakawa <tsunakawa.takay@jp.fujitsu.com>
Discussion: https://www.postgresql.org/message-id/20190219001639.ft7kxir2iz644alf@alap3.anarazel.de
2019-03-19 14:07:08 +01:00
Alexander Korotkov 75c57058b0 Rename typedef in jsonpath_scan.l from "keyword" to "JsonPathKeyword"
Typedef name should be both unique and non-intersect with variable names
across all the sources.  That makes both pg_indent and debuggers happy.

Discussion: https://postgr.es/m/23865.1552936099%40sss.pgh.pa.us
2019-03-19 13:40:55 +03:00
Peter Eisentraut 590a87025b Ignore attempts to add TOAST table to shared or catalog tables
Running ALTER TABLE on any table will check if a TOAST table needs to be
added.  On shared tables, this would previously fail, thus effectively
disabling ALTER TABLE for those tables.  On (non-shared) system
catalogs, on the other hand, it would add a TOAST table, even though we
don't really want TOAST tables on some system catalogs.  In some cases,
it would also fail with an error "AccessExclusiveLock required to add
toast table.", depending on what locks the ALTER TABLE actions had
already taken.

So instead, just ignore attempts to add TOAST tables to such tables,
outside of bootstrap mode, pretending they don't need one.

This allows running ALTER TABLE on such tables without messing up the
TOAST situation.  Legitimate uses for ALTER TABLE on system catalogs
include setting reloptions (say, fillfactor or autovacuum settings).

(All this still requires allow_system_table_mods, which is independent
of this.)

Discussion: https://www.postgresql.org/message-id/flat/e49f825b-fb25-0bc8-8afc-d5ad895c7975@2ndquadrant.com
2019-03-19 11:15:50 +01:00
Peter Eisentraut e537ac5182 Fix whitespace 2019-03-19 10:28:34 +01:00
Peter Eisentraut 1f050c08f9 Fix bug in support for collation attributes on older ICU versions
Unrecognized attribute names are supposed to be ignored.  But the code
would error out on an unrecognized attribute value even if it did not
recognize the attribute name.  So unrecognized attributes wouldn't
really be ignored unless the value happened to be one that matched a
recognized value.  This would break some important cases where the
attribute would be processed by ucol_open() directly.  Fix that and
add a test case.

The restructured code should also avoid compiler warnings about
initializing a UColAttribute value to -1, because the type might be an
unsigned enum.  (reported by Andres Freund)
2019-03-19 09:37:46 +01:00
Robert Haas 53680c116c Fix copyfuncs/equalfuncs support for VacuumStmt.
Commit 6776142a07 failed to do this,
and the buildfarm broke.

Patch by me, per advice from Tom Lane and Michael Paquier.

Discussion: http://postgr.es/m/13988.1552960403@sss.pgh.pa.us
2019-03-18 23:21:36 -04:00
Andrew Gierth 01bde4fa4c Implement OR REPLACE option for CREATE AGGREGATE.
Aggregates have acquired a dozen or so optional attributes in recent
years for things like parallel query and moving-aggregate mode; the
lack of an OR REPLACE option to add or change these for an existing
agg makes extension upgrades gratuitously hard. Rectify.
2019-03-19 01:16:50 +00:00
Tom Lane f2004f19ed Fix memory leak in printtup.c.
Commit f2dec34e1 changed things so that printtup's output stringinfo
buffer was allocated outside the per-row temporary context, not inside
it.  This creates a need to free that buffer explicitly when the temp
context is freed, but that was overlooked.  In most cases, this is all
happening inside a portal or executor context that will go away shortly
anyhow, but that's not always true.  Notably, the stringinfo ends up
getting leaked when JDBC uses row-at-a-time fetches.  For a query
that returns wide rows, that adds up after awhile.

Per bug #15700 from Matthias Otterbach.  Back-patch to v11 where the
faulty code was added.

Discussion: https://postgr.es/m/15700-8c408321a87d56bb@postgresql.org
2019-03-18 17:54:41 -04:00
Robert Haas 6776142a07 Revise parse tree representation for VACUUM and ANALYZE.
Like commit f41551f61f, this aims
to make it easier to add non-Boolean options to VACUUM (or, in
this case, to ANALYZE).  Instead of building up a bitmap of
options directly in the parser, build up a list of DefElem
objects and let ExecVacuum() sort it out; right now, we make
no use of the fact that a DefElem can carry an associated value,
but it will be easy to make that change in the future.

Masahiko Sawada

Discussion: http://postgr.es/m/CAD21AoATE4sn0jFFH3NcfUZXkU2BMbjBWB_kDj-XWYA-LXDcQA@mail.gmail.com
2019-03-18 15:14:52 -04:00
Robert Haas f41551f61f Fold vacuum's 'int options' parameter into VacuumParams.
Many places need both, so this allows a few functions to take one
fewer parameter.  More importantly, as soon as we add a VACUUM
option that takes a non-Boolean parameter, we need to replace
'int options' with a struct, and it seems better to think
of adding more fields to VacuumParams rather than passing around
both VacuumParams and a separate struct as well.

Patch by me, reviewed by Masahiko Sawada

Discussion: http://postgr.es/m/CA+Tgmob6g6-s50fyv8E8he7APfwCYYJ4z0wbZC2yZeSz=26CYQ@mail.gmail.com
2019-03-18 13:57:33 -04:00
Peter Eisentraut 1ffa59a85c Fix optimization of foreign-key on update actions
In RI_FKey_pk_upd_check_required(), we check among other things
whether the old and new key are equal, so that we don't need to run
cascade actions when nothing has actually changed.  This was using the
equality operator.  But the effect of this is that if a value in the
primary key is changed to one that "looks" different but compares as
equal, the update is not propagated.  (Examples are float -0 and 0 and
case-insensitive text.)  This appears to violate the SQL standard, and
it also behaves inconsistently if in a multicolumn key another key is
also updated that would cause the row to compare as not equal.

To fix, if we are looking at the PK table in ri_KeysEqual(), then do a
bytewise comparison similar to record_image_eq() instead of using the
equality operators.  This only makes a difference for ON UPDATE
CASCADE, but for consistency we treat all changes to the PK the same.  For
the FK table, we continue to use the equality operators.

Discussion: https://www.postgresql.org/message-id/flat/3326fc2e-bc02-d4c5-e3e5-e54da466e89a@2ndquadrant.com
2019-03-18 17:19:21 +01:00
Peter Eisentraut fb5806533f Remove unused macro
It has never been used.
2019-03-18 09:13:08 +01:00
Alexander Korotkov a0478b6998 Revert 4178d8b91c
As it was agreed to worsen the code readability.

Discussion: https://postgr.es/m/ecfcfb5f-3233-eaa9-0c83-07056fb49a83%402ndquadrant.com
2019-03-18 09:54:29 +03:00
Michael Paquier 8b938d36f7 Refactor more code logic to update the control file
ce6afc6 has begun the refactoring work by plugging pg_rewind into a
central routine to update the control file, and left around two extra
copies, with one in xlog.c for the backend and one in pg_resetwal.c.  By
adding an extra option to the central routine in controldata_utils.c to
control if a flush of the control file needs to be done, it is proving
to be straight-forward to make xlog.c and pg_resetwal.c use the central
code path at the condition of moving the wait event tracking there.
Hence, this allows to have only one central code path to update the
control file, shaving the code from the duplicates.

This refactoring actually fixes a problem in pg_resetwal.  Previously,
the control file was first removed before being recreated.  So if a
crash happened between the moment the file was removed and the moment
the file was created, then it would have been possible to not have a
control file anymore in the database folder.

Author: Fabien Coelho
Reviewed-by: Michael Paquier
Discussion: https://postgr.es/m/alpine.DEB.2.21.1903170935210.2506@lancre
2019-03-18 12:59:35 +09:00
Michael Paquier a7eadaaaaf Fix pg_rewind when rewinding new database with tables included
This fixes an issue introduced by 266b6ac, which has added filters to
exclude file patterns on the target and source data directories to
reduce the number of files transferred.  Filters get applied to both
the target and source data files, and include pg_internal.init which is
present for each database once relations are created on it.  However, if
the target differed from the source with at least one new database with
relations, the rewind would fail due to the exclusion filters applied on
the target files, causing pg_internal.init to still be present on the
target database folder, while its contents should have been completely
removed so as there is nothing remaining inside at the time of the
folder deletion.

Applying exclusion filters on the source files is fine, because this way
the amount of data copied from the source to the target is reduced.  And
actually, not applying the filters on the target is what pg_rewind
should do, because this causes such files to be automatically removed
during the rewind on the target.  Exclusion filters apply to paths which
are removed or recreated automatically at startup, so removing all those
files on the target during the rewind is a win.

The existing set of TAP tests already stresses the rewind of databases,
but it did not include any tables on those newly-created databases.
Creating extra tables in this case is enough to reproduce the failure,
so the existing tests are extended to close the gap.

Reported-by: Mithun Cy
Author: Michael Paquier
Discussion: https://postgr.es/m/CADq3xVYt6_pO7ZzmjOqPgY9HWsL=kLd-_tNyMtdfjKqEALDyTA@mail.gmail.com
Backpatch-through: 11
2019-03-18 10:34:45 +09:00
Michael Paquier fa33956595 Error out in pg_checksums on incompatible block size
pg_checksums is compiled with a given block size and has a hard
dependency to it per the way checksums are calculated via
checksum_impl.h, and trying to use the tool on a data folder which has
not the same block size would result in incorrect checksum calculations
and/or block read errors, meaning that the data folder is corrupted.
This is harmless as checksums are only checked now, but very confusing
for the user so issue an error properly if the block size used at
compilation and the block size used in the data folder do not match.

Reported-by: Sergei Kornilov
Author: Michael Banck, Michael Paquier
Reviewed-by: Fabien Coelho, Magnus Hagander
Discussion: https://postgr.es/m/20190317054657.GA3357@paquier.xyz
ackpatch-through: 11
2019-03-18 09:11:52 +09:00
Alexander Korotkov 4178d8b91c Beautify initialization of JsonValueList and JsonLikeRegexContext
Instead of tricky assignment to {0} introduce special macros, which
explicitly initialize every field.
2019-03-17 12:58:26 +03:00
Alexander Korotkov aa1b7f3866 Apply const qualifier to keywords of jsonpath_scan.l
Discussion: https://postgr.es/m/CAEeOP_a-Pfy%3DU9-f%3DgQ0AsB8FrxrC8xCTVq%2BeO71-2VoWP5cag%40mail.gmail.com
Author: Mark G
2019-03-17 12:50:38 +03:00
Alexander Korotkov c183a07f27 Remove some make rules added in 142c400d72
Because they fail build of jsonpath_scan.c.
2019-03-17 11:16:42 +03:00
Alexander Korotkov 142c400d72 Fix make rules for jsonpath grammar making them similar to SQL grammar
Reported-by: Jeff Janes, Tom Lane
Discussion: https://postgr.es/m/CAMkU%3D1w1qBvoW82ZTFpAKae027R-2OHw-m6ALe0VQRNAFueBVA%40mail.gmail.com
2019-03-17 10:55:52 +03:00
Peter Eisentraut b8f9a2a69a Add support for collation attributes on older ICU versions
Starting in ICU 54, collation customization attributes can be
specified in the locale string, for example
"@colStrength=primary;colCaseLevel=yes".  Add support for this for
older ICU versions as well, by adding some minimal parsing of the
attributes in the locale string and calling ucol_setAttribute() on
them.  This is essentially what never ICU versions do internally in
ucol_open().  This was we can offer this functionality in a consistent
way in all ICU versions supported by PostgreSQL.

Also add some tests for ICU collation customization.

Reported-by: Daniel Verite <daniel@manitou-mail.org>
Discussion: https://www.postgresql.org/message-id/0270ebd4-f67c-8774-1a5a-91adfb9bb41f@2ndquadrant.com
2019-03-17 08:47:15 +01:00
Alexander Korotkov 042162d628 Fix compiler warning in jsonpath_exec.c
Warning was observed in gcc 4.4.6, gcc 4.4.7 and probably others.

Reported-by: Tom Lane
Discussion: https://postgr.es/m/25151.1552751426%40sss.pgh.pa.us
2019-03-17 10:12:06 +03:00
Peter Eisentraut 0176eb210e Remove another unnecessary application_name specification in test
see 8e93a516e6
2019-03-16 22:38:59 +01:00
Tom Lane c43ecdee0f Further adjust the tests for the hyperbolic functions.
It looks like we can leave in most of the test cases for Infinity/NaN
inputs, but buildfarm member jacana gets the wrong answer for acosh(Inf).
It's not worth carrying a variant expected file for that, so just disable
that one test.

Discussion: https://postgr.es/m/E1h3nUY-0000sM-Vf@gemulon.postgresql.org
2019-03-16 15:50:13 -04:00
Tom Lane 20f7c3d560 Suppress -Wimplicit-fallthrough warnings in new jsonpath code.
Per buildfarm.  See commit 41c912cad for precedent.
2019-03-16 12:34:46 -04:00
Amit Kapila f27314ff9a Update copyright year in files added by 1bb5e78218. 2019-03-16 16:00:38 +05:30
Alexander Korotkov 16d489b0fe Numeric error suppression in jsonpath
Add support of numeric error suppression to jsonpath as it's required by
standard.  This commit doesn't use PG_TRY()/PG_CATCH() in order to implement
that.  Instead, it provides internal versions of numeric functions used, which
support error suppression.

Discussion: https://postgr.es/m/fcc6fc6a-b497-f39a-923d-aa34d0c588e8%402ndQuadrant.com
Author: Alexander Korotkov, Nikita Glukhov
Reviewed-by: Tomas Vondra
2019-03-16 12:21:19 +03:00
Alexander Korotkov 72b6460336 Partial implementation of SQL/JSON path language
SQL 2016 standards among other things contains set of SQL/JSON features for
JSON processing inside of relational database.  The core of SQL/JSON is JSON
path language, allowing access parts of JSON documents and make computations
over them.  This commit implements partial support JSON path language as
separate datatype called "jsonpath".  The implementation is partial because
it's lacking datetime support and suppression of numeric errors.  Missing
features will be added later by separate commits.

Support of SQL/JSON features requires implementation of separate nodes, and it
will be considered in subsequent patches.  This commit includes following
set of plain functions, allowing to execute jsonpath over jsonb values:

 * jsonb_path_exists(jsonb, jsonpath[, jsonb, bool]),
 * jsonb_path_match(jsonb, jsonpath[, jsonb, bool]),
 * jsonb_path_query(jsonb, jsonpath[, jsonb, bool]),
 * jsonb_path_query_array(jsonb, jsonpath[, jsonb, bool]).
 * jsonb_path_query_first(jsonb, jsonpath[, jsonb, bool]).

This commit also implements "jsonb @? jsonpath" and "jsonb @@ jsonpath", which
are wrappers over jsonpath_exists(jsonb, jsonpath) and jsonpath_predicate(jsonb,
jsonpath) correspondingly.  These operators will have an index support
(implemented in subsequent patches).

Catversion bumped, to add new functions and operators.

Code was written by Nikita Glukhov and Teodor Sigaev, revised by me.
Documentation was written by Oleg Bartunov and Liudmila Mantrova.  The work
was inspired by Oleg Bartunov.

Discussion: https://postgr.es/m/fcc6fc6a-b497-f39a-923d-aa34d0c588e8%402ndQuadrant.com
Author: Nikita Glukhov, Teodor Sigaev, Alexander Korotkov, Oleg Bartunov, Liudmila Mantrova
Reviewed-by: Tomas Vondra, Andrew Dunstan, Pavel Stehule, Alexander Korotkov
2019-03-16 12:16:48 +03:00
Peter Eisentraut 893d6f8a1f Avoid casting away a const 2019-03-16 10:13:03 +01:00
Peter Eisentraut 8e93a516e6 Don't propagate PGAPPNAME through pg_ctl in tests
When libpq is loaded in the server (for instance, by
libpqwalreceiver), it may use libpq environment variables set in the
postmaster environment for connection parameter defaults.  This has
some confusing effects in our test suites.  For example, the TAP test
infrastructure sets PGAPPNAME to allow identifying clients in the
server log.  But this environment variable is also inherited by
temporary servers started with pg_ctl and is then in turn used by
libpqwalreceiver as the application_name for connecting to remote
servers where it then shows up in pg_stat_replication and is relevant
for things like synchronous_standby_names.  Replication already has a
suitable default for application_name, and overriding that
accidentally then requires the individual test cases to re-override
that, which is all very confusing and unnecessary.

To fix, unset PGAPPNAME temporarily before running pg_ctl start or
restart in the tests.

More comprehensive approaches like unsetting all environment variables
in pg_ctl were considered but might be too complicated to achieve
portably.

The now unnecessary re-overriding of application_name by test cases is
also removed.

Reviewed-by: Noah Misch <noah@leadboat.com>
Discussion: https://www.postgresql.org/message-id/flat/33383613-690e-6f1b-d5ba-4957ff40f6ce@2ndquadrant.com
2019-03-16 08:58:40 +01:00
Michael Meskes c21d6033f7 Use correct connection name variable in ecpglib.
Fixed-by: Kuroda-san <kuroda.hayato@jp.fujitsu.com>
2019-03-16 04:01:06 +01:00
Amit Kapila 06c8a5090e Improve code comments in b0eaa4c51b.
Author: John Naylor
Discussion: https://postgr.es/m/CACPNZCswjyGJxTT=mxHgK=Z=mJ9uJ4WEx_UO=bNwpR_i0EaHHg@mail.gmail.com
2019-03-16 06:55:56 +05:30
Tom Lane d3f48dfae4 Further reduce memory footprint of CLOBBER_CACHE_ALWAYS testing.
Some buildfarm members using CLOBBER_CACHE_ALWAYS have been having OOM
problems of late.  Commit 2455ab488 addressed this problem by recovering
space transiently used within RelationBuildPartitionDesc, but it turns
out that leaves quite a lot on the table, because other subroutines of
RelationBuildDesc also leak memory like mad.  Let's move the temp-context
management into RelationBuildDesc so that leakage from the other
subroutines is also recovered.

I examined this issue by arranging for postgres.c to dump the size of
MessageContext just before resetting it in each command cycle, and
then running the update.sql regression test (which is one of the two
that are seeing buildfarm OOMs) with and without CLOBBER_CACHE_ALWAYS.
Before 2455ab488, the peak space usage with CCA was as much as 250MB.
That patch got it down to ~80MB, but with this patch it's about 0.5MB,
and indeed the space usage now seems nearly indistinguishable from a
non-CCA build.

RelationBuildDesc's traditional behavior of not worrying about leaking
transient data is of many years' standing, so I'm pretty hesitant to
change that without more evidence that it'd be useful in a normal build.
(So far as I can see, non-CCA memory consumption is about the same with
or without this change, whuch if anything suggests that it isn't useful.)
Hence, configure the patch so that we recover space only when
CLOBBER_CACHE_ALWAYS or CLOBBER_CACHE_RECURSIVELY is defined.  However,
that choice can be overridden at compile time, in case somebody would
like to do some performance testing and try to develop evidence for
changing that decision.

It's possible that we ought to back-patch this change, but in the
absence of back-branch OOM problems in the buildfarm, I'm not in
a hurry to do that.

Discussion: https://postgr.es/m/CA+TgmoY3bRmGB6-DUnoVy5fJoreiBJ43rwMrQRCdPXuKt4Ykaw@mail.gmail.com
2019-03-15 13:46:26 -04:00
Peter Eisentraut aefcc2bba2 PL/Tcl: Improve trigger tests organization
The trigger tests for PL/Tcl were spread aroud pltcl_setup.sql and
pltcl_queries.sql, mixed with other tests, which makes them hard to
follow and edit.  Move all the trigger-related pieces to a new file
pltcl_trigger.sql.  This also makes the test setup more similar to
plperl and plpython.
2019-03-15 12:42:07 +01:00
Peter Eisentraut 69039fda83 Add walreceiver API to get remote server version
Add a separate walreceiver API function walrcv_server_version() to get
the version of the remote server, instead of doing it as part of
walrcv_identify_system().  This allows the server version to be
available even for uses that don't call IDENTIFY_SYSTEM, and it seems
cleaner anyway.

This is for an upcoming patch, not currently used.

Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://www.postgresql.org/message-id/20190115071359.GF1433@paquier.xyz
2019-03-15 10:16:26 +01:00
Michael Paquier 4e197bf195 Fix typo related to to_tsvector() in tests of json and jsonb
Author: Sho Kato
Reviewed-by: Kyotaro Horiguchi
Discussion: https://postgr.es/m/25C1C6B2E7BE044889E4FE8643A58BA963E1D03D@G01JPEXMBKW03
2019-03-15 16:20:11 +09:00
Thomas Munro bb16aba50c Enable parallel query with SERIALIZABLE isolation.
Previously, the SERIALIZABLE isolation level prevented parallel query
from being used.  Allow the two features to be used together by
sharing the leader's SERIALIZABLEXACT with parallel workers.

An extra per-SERIALIZABLEXACT LWLock is introduced to make it safe to
share, and new logic is introduced to coordinate the early release
of the SERIALIZABLEXACT required for the SXACT_FLAG_RO_SAFE
optimization, as follows:

The first backend to observe the SXACT_FLAG_RO_SAFE flag (set by
some other transaction) will 'partially release' the SERIALIZABLEXACT,
meaning that the conflicts and locks it holds are released, but the
SERIALIZABLEXACT itself will remain active because other backends
might still have a pointer to it.

Whenever any backend notices the SXACT_FLAG_RO_SAFE flag, it clears
its own MySerializableXact variable and frees local resources so that
it can skip SSI checks for the rest of the transaction.  In the
special case of the leader process, it transfers the SERIALIZABLEXACT
to a new variable SavedSerializableXact, so that it can be completely
released at the end of the transaction after all workers have exited.

Remove the serializable_okay flag added to CreateParallelContext() by
commit 9da0cc35, because it's now redundant.

Author: Thomas Munro
Reviewed-by: Haribabu Kommi, Robert Haas, Masahiko Sawada, Kevin Grittner
Discussion: https://postgr.es/m/CAEepm=0gXGYhtrVDWOTHS8SQQy_=S9xo+8oCxGLWZAOoeJ=yzQ@mail.gmail.com
2019-03-15 17:47:04 +13:00
Amit Kapila 13e8643bfc During pg_upgrade, conditionally skip transfer of FSMs.
If a heap on the old cluster has 4 pages or fewer, and the old cluster
was PG v11 or earlier, don't copy or link the FSM. This will shrink
space usage for installations with large numbers of small tables.

This will allow pg_upgrade to take advantage of commit b0eaa4c51b where
we have avoided creation of the free space map for small heap relations.

Author: John Naylor
Reviewed-by: Amit Kapila
Discussion: https://postgr.es/m/CACPNZCu4cOdm3uGnNEGXivy7Gz8UWyQjynDpdkPGabQ18_zK6g%40mail.gmail.com
2019-03-15 08:25:57 +05:30
Peter Eisentraut 2fadf24e24 Reorder identity regression test
The previous test order had the effect that if something was wrong
with the identity functionality, the create_table_like test would
likely fail or crash first, which is confusing.  Reorder so that the
identity test comes before create_table_like.
2019-03-15 00:21:30 +01:00
Tom Lane de57004799 Fix some oversights in commit 2455ab488.
The idea was to generate all the junk in a destroyable subcontext rather
than leaking it in the caller's context, but partition_bounds_create was
still being called in the caller's context, allowing plenty of scope for
leakage.  Also, get_rel_relkind() was still being called in the rel's
rd_pdcxt, creating a risk of session-lifespan memory wastage.

Simplify the logic a bit while at it.  Also, reduce rd_pdcxt to
ALLOCSET_SMALL_SIZES, since it seems likely to not usually be big.

Probably something like this needs to be back-patched into v11,
but for now let's get some buildfarm testing on this.

Discussion: https://postgr.es/m/15943.1552601288@sss.pgh.pa.us
2019-03-14 18:36:33 -04:00
Peter Eisentraut 61dc407893 Improve code comment 2019-03-14 22:44:21 +01:00
Peter Eisentraut 8bee36708f Remove unused #include 2019-03-14 22:03:14 +01:00
Peter Eisentraut b13a913607 Add BKI_DEFAULT to pg_class.relrewrite
This column is always 0 on disk, so it doesn't have to be tracked
separately for each entry.
2019-03-14 21:25:39 +01:00
Tom Lane 0a9d7e1f6d Ensure dummy paths have correct required_outer if rel is parameterized.
The assertions added by commits 34ea1ab7f et al found another problem:
set_dummy_rel_pathlist and mark_dummy_rel were failing to label
the dummy paths they create with the correct outer_relids, in case
the relation is necessarily parameterized due to having lateral
references in its tlist.  It's likely that this has no user-visible
consequences in production builds, at the moment; but still an assertion
failure is a bad thing, so back-patch the fix.

Per bug #15694 from Roman Zharkov (via Alexander Lakhin)
and an independent report by Tushar Ahuja.

Discussion: https://postgr.es/m/15694-74f2ca97e7044f7f@postgresql.org
Discussion: https://postgr.es/m/7d72ab20-c725-3ce2-f99d-4e64dd8a0de6@enterprisedb.com
2019-03-14 12:16:36 -04:00
Robert Haas 2455ab4884 Defend against leaks into RelationBuildPartitionDesc.
In normal builds, this isn't very important, because the leaks go
into fairly short-lived contexts, but under CLOBBER_CACHE_ALWAYS,
this can result in leaking hundreds of megabytes into MessageContext,
which probably explains recent failures on hyrax.

This may or may not be the best long-term strategy for dealing
with this leak, but we can change it later if we come up with
something better.  For now, do this to make the buildfarm green
again (hopefully).  Commit 898e5e3290
seems to have exacerbated this problem for reasons that are not
quite clear, but I don't believe it's actually the cause.

Discussion: http://postgr.es/m/CA+TgmoY3bRmGB6-DUnoVy5fJoreiBJ43rwMrQRCdPXuKt4Ykaw@mail.gmail.com
2019-03-14 12:14:47 -04:00
Peter Eisentraut c6ff0b892c Refactor ParamListInfo initialization
There were six copies of identical nontrivial code.  Put it into a
function.
2019-03-14 13:30:09 +01:00
Michael Paquier 6eebfdc38b Fix thinko when bumping on temporary directories in pg_checksums
This fixes an oversight from 5c99513.  This has no actual consequence as
PG_TEMP_FILE_PREFIX and PG_TEMP_FILES_DIR have the same value so when
bumping on a temporary path the directory scan was still moving on to
the next entry instead of skipping the rest of the scan, but let's keep
the logic correct.

Author: Michael Banck
Reviewed-by: Kyotaro Horiguchi
Discussion: https://postgr.es/m/20190314.115417.58230569.horiguchi.kyotaro@lab.ntt.co.jp
Backpatch-through: 11
2019-03-14 14:14:49 +09:00
Tom Lane 401b87a24f Sync commentary in transam.h and bki.sgml.
Commit a6417078c missed updating some comments in transam.h about
reservation of high OIDs for development purposes.  Also tamp down
an over-optimistic comment there about how easy it'd be to change
FirstNormalObjectId.

Earlier, commit 09568ec3d failed to update bki.sgml for the split
between genbki.pl-assigned OIDs and those assigned during initdb.

Also fix genbki.pl so that it will complain if it overruns
that split.  It's possible that doing so would have no very bad
consequences, but that's no excuse for not detecting it.
2019-03-14 00:23:40 -04:00
Michael Paquier 364298be22 Fix race condition in recently-added TAP test for recovery consistency
A couple of queries are run on the primary to create and fill in a test
table, which gets checked on the standby afterwards.  However the test
was not waiting for the confirmation that the necessary records have
been replayed on the standby, leading to spurious failures.

Per buildfarm member loach.  Thanks to Thomas Munro for the report and
Tom Lane for the failure analysis.

Discussion: https://postgr.es/m/CA+hUKGLUpqG52xtriUz5RpmeKPoEfNxNc-CginG+Cx+X2-Ycew@mail.gmail.com
2019-03-14 12:41:45 +09:00
Tom Lane c015f853bf Adjust the tests for the hyperbolic functions.
Preliminary results from the buildfarm suggest that no platform gets
commit c6f153dcf's test cases wrong by more than one or two units in
the last place, so setting extra_float_digits = 0 should be plenty
to hide the cross-platform variations.

Also, add tests for Infinity/NaN inputs.  I think it highly likely
that we'll end up removing these again, rather than adding code to
make ancient platforms conform.  But it seems useful to find out
just how many platforms have such issues before we make a decision.

Discussion: https://postgr.es/m/E1h3nUY-0000sM-Vf@gemulon.postgresql.org
2019-03-13 21:05:33 -04:00
Tom Lane c6f153dcfe Rethink how to test the hyperbolic functions.
The initial commit tried to test them on trivial cases such as 0,
reasoning that we shouldn't hit any portability issues that way.
The buildfarm immediately proved that hope ill-founded, and anyway
it's not a great testing scheme because it doesn't prove that we're
even calling the right library function for each SQL function.

Instead, let's test them at inputs such as 1 (or something within
the valid range, as needed), so that each function should produce
a different output.

As committed, this is just about certain to show portability
failures, because it's very unlikely that every platform computes
these functions the same as mine down to the last bit.  However,
I want to put it through a buildfarm cycle this way, so that
we can see how big the variations are.  The plan is to add
"set extra_float_digits = -1", or whatever we need in order to
hide the variations; but first we need data.

Discussion: https://postgr.es/m/E1h3nUY-0000sM-Vf@gemulon.postgresql.org
2019-03-13 18:13:45 -04:00
Thomas Munro c6c9474aaf Use condition variables to wait for checkpoints.
Previously we used a polling/sleeping loop to wait for checkpoints
to begin and end, which leads to up to a couple hundred milliseconds
of needless thumb-twiddling.  Use condition variables instead.

Author: Thomas Munro
Reviewed-by: Andres Freund
Discussion: https://postgr.es/m/CA%2BhUKGLY7sDe%2Bbg1K%3DbnEzOofGoo4bJHYh9%2BcDCXJepb6DQmLw%40mail.gmail.com
2019-03-14 10:59:33 +13:00
Robert Haas 5655565c07 Revert setting client_min_messages to 'debug1' in new tests.
The buildfarm doesn't like this, because some buildfarm members have
log_statement = 'all'.  We could change the log level of the messages
instead, but Tom doesn't like that.  So let's do this instead, at
least for now.

Patch by Sergei Kornilov, applied here in reverse.

Discussion: http://postgr.es/m/2123251552490241@myt6-fe24916a5562.qloud-c.yandex.net
2019-03-13 13:18:25 -04:00
Peter Eisentraut f177660ab0 Include all columns in default names for foreign key constraints
When creating a name for a foreign key constraint when none is
specified, use all column names instead of only the first one, similar
to how it is already done for index names.

Author: Paul Martinez <hellopfm@gmail.com>
Reviewed-by: Peter Eisentraut <peter.eisentraut@2ndquadrant.com>
Discussion: https://www.postgresql.org/message-id/flat/CAF+2_SFjky6XRfLNRXpkG97W6PRbOO_mjAxqXzAAimU=c7w7_A@mail.gmail.com
2019-03-13 14:25:42 +01:00
Robert Haas bbb96c3704 Allow ALTER TABLE .. SET NOT NULL to skip provably unnecessary scans.
If existing CHECK or NOT NULL constraints preclude the presence
of nulls, we need not look to see whether any are present.

Sergei Kornilov, reviewed by Stephen Frost, Ildar Musin, David Rowley,
and by me.

Discussion: http://postgr.es/m/81911511895540@web58j.yandex.ru
2019-03-13 08:55:00 -04:00
Michael Paquier b0825d28ea Add TAP test to check consistency of minimum recovery LSN
c186ba13 has fixed an issue related to the updates of the minimum
recovery LSN across multiple processes on standbys, but we never really
had a test case able to reliably check its logic.

This commit introduces a new test case to close the gap, and is designed
to check the consistency of data based on the minimum recovery point set
by either the startup process or the checkpointer for both an offline
cluster (by looking at the on-disk page headers) and an online cluster
(using pageinspect).

Note that with c186ba13 reverted, this test fails badly for both the
online and offline cases, as designed.

Author: Michael Paquier, Andrew Gierth
Reviewed-by: Andrew Gierth, Georgios Kokolatos, Arthur Zakirov
Discussion: https://postgr.es/m/20181108044525.GA17482@paquier.xyz
2019-03-13 14:58:24 +09:00
Michael Paquier 6dd263cfaa Rename pg_verify_checksums to pg_checksums
The current tool name is too restrictive and focuses only on verifying
checksums.  As more options to control checksums for an offline cluster
are planned to be added, switch to a more generic name.  Documentation
as well as all past references to the tool are updated.

Author: Michael Paquier
Reviewed-by: Michael Banck, Fabien Coelho, Seigei Kornilov
Discussion: https://postgr.es/m/20181221201616.GD4974@nighthawk.caipicrew.dd-dns.de
2019-03-13 10:43:20 +09:00
Michael Paquier c9ae7f704c Fix cross-version compatibility checks of pg_verify_checksums
pg_verify_checksums performs a read of the control file, and the data it
fetches should be from a data folder compatible with the major version
of Postgres the binary has been compiled with, but we never actually
checked that compatibility.

Reported-by: Sergei Kornilov
Author: Michael Paquier
Reviewed-by: Sergei Kornilov
Discussion: https://postgr.es/m/155231347133.16480.11453587097036807558.pgcf@coridan.postgresql.org
Backpatch-through: 11
2019-03-13 09:51:02 +09:00
Peter Geoghegan 3f34283973 Correct obsolete nbtree page split comment.
Commit 40dae7ec53, which made the nbtree page split algorithm more
robust, made _bt_insert_parent() only unlock the right child of the
parent page before inserting a new downlink into the parent.  Update a
comment from the Berkeley days claiming that both left and right child
pages are unlocked before the new downlink actually gets inserted.

The claim that it is okay to release both locks early based on Lehman
and Yao's say-so never made much sense.  Lehman and Yao must sometimes
"couple" buffer locks across a pair of internal pages when relocating a
downlink, unlike the corresponding code within _bt_getstack().
2019-03-12 16:40:05 -07:00
Tom Lane f1d85aa98e Add support for hyperbolic functions, as well as log10().
The SQL:2016 standard adds support for the hyperbolic functions
sinh(), cosh(), and tanh().  POSIX has long required libm to
provide those functions as well as their inverses asinh(),
acosh(), atanh().  Hence, let's just expose the libm functions
to the SQL level.  As with the trig functions, we only implement
versions for float8, not numeric.

For the moment, we'll assume that all platforms actually do have
these functions; if experience teaches otherwise, some autoconf
effort may be needed.

SQL:2016 also adds support for base-10 logarithm, but with the
function name log10(), whereas the name we've long used is log().
Add aliases named log10() for the float8 and numeric versions.

Lætitia Avrot

Discussion: https://postgr.es/m/CAB_COdguG22LO=rnxDQ2DW1uzv8aQoUzyDQNJjrR4k00XSgm5w@mail.gmail.com
2019-03-12 15:55:09 -04:00
Tom Lane 3aa0395d4e Remove remaining hard-wired OID references in the initial catalog data.
In the v11-era commits that taught genbki.pl to resolve symbolic
OID references in the initial catalog data, we didn't bother to
make every last reference symbolic; some of the catalogs have so
few initial rows that it didn't seem worthwhile.

However, the new project policy that OIDs assigned by new patches
should be automatically renumberable changes this calculus.
A patch that wants to add a row in one of these catalogs would have
a problem when the OID it assigns gets renumbered.  Hence, do the
mop-up work needed to make all OID references in initial data be
symbolic, and establish an associated project policy that we'll
never again write a hard-wired OID reference there.

No catversion bump since the contents of postgres.bki aren't
actually changed by this commit.

Discussion: https://postgr.es/m/CAH2-WzmMTGMcPuph4OvsO7Ykut0AOCF_i-=eaochT0dd2BN9CQ@mail.gmail.com
2019-03-12 12:30:35 -04:00
Tom Lane a6417078c4 Create a script that can renumber manually-assigned OIDs.
This commit adds a Perl script renumber_oids.pl, which can reassign a
range of manually-assigned OIDs to someplace else by modifying OID
fields of the catalog *.dat files and OID-assigning macros in the
catalog *.h files.

Up to now, we've encouraged new patches that need manually-assigned
OIDs to use OIDs just above the range of existing OIDs.  Predictably,
this leads to patches stepping on each others' toes, as whichever
one gets committed first creates an OID conflict that other patch
author(s) have to resolve manually.  With the availability of
renumber_oids.pl, we can eliminate a lot of this hassle.
The new project policy, therefore, is:

* Encourage new patches to use high OIDs (the documentation suggests
choosing a block of OIDs at random in 8000..9999).

* After feature freeze in each development cycle, run renumber_oids.pl
to move all such OIDs down to lower numbers, thus freeing the high OID
range for the next development cycle.

This plan should greatly reduce the risk of OID collisions between
concurrently-developed patches.  Also, if such a collision happens
anyway, we have the option to resolve it without much effort by doing
an off-schedule OID renumbering to get the first-committed patch out
of the way.  Or a patch author could use renumber_oids.pl to change
their patch's assignments without much pain.

This approach does put a premium on not hard-wiring any OID values
in places where renumber_oids.pl and genbki.pl can't fix them.
Project practice in that respect seems to be pretty good already,
but a follow-on patch will sand down some rough edges.

John Naylor and Tom Lane, per an idea of Peter Geoghegan's

Discussion: https://postgr.es/m/CAH2-WzmMTGMcPuph4OvsO7Ykut0AOCF_i-=eaochT0dd2BN9CQ@mail.gmail.com
2019-03-12 10:50:48 -04:00
Etsuro Fujita b5afdde6a7 Fix testing of parallel-safety of scan/join target.
In commit 960df2a971 ("Correctly assess parallel-safety of tlists when
SRFs are used."), the testing of scan/join target was done incorrectly,
which caused a plan-quality problem.  Backpatch through to v11 where
the aforementioned commit went in, since this is a regression from v10.

Author: Etsuro Fujita
Reviewed-by: Robert Haas and Tom Lane
Discussion: https://postgr.es/m/5C75303E.8020303@lab.ntt.co.jp
2019-03-12 16:21:57 +09:00
Amit Kapila 6f918159a9 Add more tests for FSM.
In commit b0eaa4c51b, we left out a test that used a vacuum to remove dead
rows as the behavior of test was not predictable.  This test has been
rewritten to use fillfactor instead to control free space.  Since we no
longer need to remove dead rows as part of the test, put the fsm regression
test in a parallel group.

Author: John Naylor
Reviewed-by: Amit Kapila
Discussion: https://postgr.es/m/CAA4eK1L=qWp_bJ5aTc9+fy4Ewx2LPaLWY-RbR4a60g_rupCKnQ@mail.gmail.com
2019-03-12 08:14:28 +05:30
Michael Paquier ce6afc6823 Add routine able to update the control file to src/common/
This adds a new routine to src/common/ which is compatible with both the
frontend and backend code, able to update the control file's contents.
This is now getting used only by pg_rewind, but some upcoming patches
which add more control on checksums for offline instances will make use
of it.  This could also get used more by the backend as xlog.c has its
own flavor of the same logic with some wait events and an additional
flush phase before closing the opened file descriptor, but this is let
as separate work.

Author: Michael Banck, Michael Paquier
Reviewed-by: Fabien Coelho, Sergei Kornilov
Discussion: https://postgr.es/m/20181221201616.GD4974@nighthawk.caipicrew.dd-dns.de
2019-03-12 10:03:33 +09:00
Tom Lane 1a83a80a2f Allow fractional input values for integer GUCs, and improve rounding logic.
Historically guc.c has just refused examples like set work_mem = '30.1GB',
but it seems more useful for it to take that and round off the value to
some reasonable approximation of what the user said.  Just rounding to
the parameter's native unit would work, but it would lead to rather
silly-looking settings, such as 31562138kB for this example.  Instead
let's round to the nearest multiple of the next smaller unit (if any),
producing 30822MB.

Also, do the units conversion math in floating point and round to integer
(if needed) only at the end.  This produces saner results for inputs that
aren't exact multiples of the parameter's native unit, and removes another
difference in the behavior for integer vs. float parameters.

In passing, document the ability to use hex or octal input where it
ought to be documented.

Discussion: https://postgr.es/m/1798.1552165479@sss.pgh.pa.us
2019-03-11 19:13:55 -04:00
Andres Freund 32b8f0b033 Remove spurious return.
Per buildfarm member anole.

Author: Andres Freund
2019-03-11 15:04:00 -07:00
Tom Lane d9c5e9629b Give up on testing guc.c's behavior for "infinity" inputs.
Further buildfarm testing shows that on the machines that are failing
ac75959cd's test case, what we're actually getting from strtod("-infinity")
is a syntax error (endptr == value) not ERANGE at all.  This test case
is not worth carrying two sets of expected output for, so just remove it,
and revert commit b212245f9's misguided attempt to work around the platform
dependency.

Discussion: https://postgr.es/m/E1h33xk-0001Og-Gs@gemulon.postgresql.org
2019-03-11 17:53:09 -04:00
Andres Freund 8cacea7a72 Ensure sufficient alignment for ParallelTableScanDescData in BTShared.
Previously ParallelTableScanDescData was just a member in BTShared,
but after c2fe139c2 that doesn't guarantee sufficient alignment as
specific AMs might (are likely to) need atomic variables in the
struct.

One might think that MAXALIGNing would be sufficient, but as a
comment in shm_toc_allocate() explains, that's not enough. For now,
copy the hack described there.

For parallel sequential scans no such change is needed, as its
allocations go through shm_toc_allocate().

An alternative approach would have been to allocate the parallel scan
descriptor in a separate TOC entry, but there seems little benefit in
doing so.

Per buildfarm member dromedary.

Author: Andres Freund
Discussion: https://postgr.es/m/20190311203126.ty5gbfz42gjbm6i6@alap3.anarazel.de
2019-03-11 14:26:43 -07:00
Andres Freund c2fe139c20 tableam: Add and use scan APIs.
Too allow table accesses to be not directly dependent on heap, several
new abstractions are needed. Specifically:

1) Heap scans need to be generalized into table scans. Do this by
   introducing TableScanDesc, which will be the "base class" for
   individual AMs. This contains the AM independent fields from
   HeapScanDesc.

   The previous heap_{beginscan,rescan,endscan} et al. have been
   replaced with a table_ version.

   There's no direct replacement for heap_getnext(), as that returned
   a HeapTuple, which is undesirable for a other AMs. Instead there's
   table_scan_getnextslot().  But note that heap_getnext() lives on,
   it's still used widely to access catalog tables.

   This is achieved by new scan_begin, scan_end, scan_rescan,
   scan_getnextslot callbacks.

2) The portion of parallel scans that's shared between backends need
   to be able to do so without the user doing per-AM work. To achieve
   that new parallelscan_{estimate, initialize, reinitialize}
   callbacks are introduced, which operate on a new
   ParallelTableScanDesc, which again can be subclassed by AMs.

   As it is likely that several AMs are going to be block oriented,
   block oriented callbacks that can be shared between such AMs are
   provided and used by heap. table_block_parallelscan_{estimate,
   intiialize, reinitialize} as callbacks, and
   table_block_parallelscan_{nextpage, init} for use in AMs. These
   operate on a ParallelBlockTableScanDesc.

3) Index scans need to be able to access tables to return a tuple, and
   there needs to be state across individual accesses to the heap to
   store state like buffers. That's now handled by introducing a
   sort-of-scan IndexFetchTable, which again is intended to be
   subclassed by individual AMs (for heap IndexFetchHeap).

   The relevant callbacks for an AM are index_fetch_{end, begin,
   reset} to create the necessary state, and index_fetch_tuple to
   retrieve an indexed tuple.  Note that index_fetch_tuple
   implementations need to be smarter than just blindly fetching the
   tuples for AMs that have optimizations similar to heap's HOT - the
   currently alive tuple in the update chain needs to be fetched if
   appropriate.

   Similar to table_scan_getnextslot(), it's undesirable to continue
   to return HeapTuples. Thus index_fetch_heap (might want to rename
   that later) now accepts a slot as an argument. Core code doesn't
   have a lot of call sites performing index scans without going
   through the systable_* API (in contrast to loads of heap_getnext
   calls and working directly with HeapTuples).

   Index scans now store the result of a search in
   IndexScanDesc->xs_heaptid, rather than xs_ctup->t_self. As the
   target is not generally a HeapTuple anymore that seems cleaner.

To be able to sensible adapt code to use the above, two further
callbacks have been introduced:

a) slot_callbacks returns a TupleTableSlotOps* suitable for creating
   slots capable of holding a tuple of the AMs
   type. table_slot_callbacks() and table_slot_create() are based
   upon that, but have additional logic to deal with views, foreign
   tables, etc.

   While this change could have been done separately, nearly all the
   call sites that needed to be adapted for the rest of this commit
   also would have been needed to be adapted for
   table_slot_callbacks(), making separation not worthwhile.

b) tuple_satisfies_snapshot checks whether the tuple in a slot is
   currently visible according to a snapshot. That's required as a few
   places now don't have a buffer + HeapTuple around, but a
   slot (which in heap's case internally has that information).

Additionally a few infrastructure changes were needed:

I) SysScanDesc, as used by systable_{beginscan, getnext} et al. now
   internally uses a slot to keep track of tuples. While
   systable_getnext() still returns HeapTuples, and will so for the
   foreseeable future, the index API (see 1) above) now only deals with
   slots.

The remainder, and largest part, of this commit is then adjusting all
scans in postgres to use the new APIs.

Author: Andres Freund, Haribabu Kommi, Alvaro Herrera
Discussion:
    https://postgr.es/m/20180703070645.wchpu5muyto5n647@alap3.anarazel.de
    https://postgr.es/m/20160812231527.GA690404@alvherre.pgsql
2019-03-11 12:46:41 -07:00
Andrew Dunstan a478415281 pgbench: increase the maximum number of variables/arguments
pgbench's arbitrary limit of 10 arguments for SQL statements or
metacommands is far too low. Increase it to 256.

This results in a very modest increase in memory usage, not enough to
worry about.

The maximum includes the SQL statement or metacommand. This is reflected
in the comments and revised TAP tests.

Simon Riggs and Dagfinn Ilmari Mannsåker with some light editing by me.
Reviewed by: David Rowley and Fabien Coelho

Discussion: https://postgr.es/m/CANP8+jJiMJOAf-dLoHuR-8GENiK+eHTY=Omw38Qx7j2g0NDTXA@mail.gmail.com
2019-03-11 13:18:37 -04:00
Amit Kapila a6e48da088 Fix typos in commit 8586bf7ed8.
Author: Amit Kapila
Discussion: https://postgr.es/m/CAA4eK1KNv1Mg2krf4E9ssWFnE=8A9mZ1VbVywXBZTFSzb+wP2g@mail.gmail.com
2019-03-11 09:58:46 -07:00
Alvaro Herrera af38498d4c Move hash_any prototype from access/hash.h to utils/hashutils.h
... as well as its implementation from backend/access/hash/hashfunc.c to
backend/utils/hash/hashfn.c.

access/hash is the place for the hash index AM, not really appropriate
for generic facilities, which is what hash_any is; having things the old
way meant that anything using hash_any had to include the AM's include
file, pointlessly polluting its namespace with unrelated, unnecessary
cruft.

Also move the HTEqual strategy number to access/stratnum.h from
access/hash.h.

To avoid breaking third-party extension code, add an #include
"utils/hashutils.h" to access/hash.h.  (An easily removed line by
committers who enjoy their asbestos suits to protect them from angry
extension authors.)

Discussion: https://postgr.es/m/201901251935.ser5e4h6djt2@alvherre.pgsql
2019-03-11 13:17:50 -03:00
Tom Lane b212245f96 In guc.c, ignore ERANGE errors from strtod().
Instead, just proceed with the infinity or zero result that it should
return for overflow/underflow.  This avoids a platform dependency,
in that various versions of strtod are inconsistent about whether they
signal ERANGE for a value that's specified as infinity.

It's possible this won't be enough to remove the buildfarm failures
we're seeing from ac75959cd, in which case I'll take out the infinity
test case that commit added.  But first let's see if we can fix it.

Discussion: https://postgr.es/m/E1h33xk-0001Og-Gs@gemulon.postgresql.org
2019-03-11 11:25:26 -04:00
Michael Meskes 08cecfaf60 Fix potential memory access violation in ecpg if filename of include file is
shorter than 2 characters.

Patch by: "Wu, Fei" <wufei.fnst@cn.fujitsu.com>
2019-03-11 16:11:16 +01:00
Michael Meskes 98bdaab0d9 Fix ecpglib regression that made it impossible to close a cursor that was
opened in a prepared statement.

Patch by: "Kuroda, Hayato" <kuroda.hayato@jp.fujitsu.com>
2019-03-11 16:00:13 +01:00
Peter Eisentraut 3c06715447 Remove unused macro
Use was removed in 25ca5a9a54.
2019-03-11 09:29:50 +01:00
Peter Eisentraut 27f3dea648 psql: Add documentation URL to \help output
Add a link to the specific command's reference web page to the bottom
of its \help output.

Discussion: https://www.postgresql.org/message-id/flat/40179bd0-fa7d-4108-1991-a20ae9ad5667%402ndquadrant.com
2019-03-11 09:11:37 +01:00
Michael Paquier f2d84a4a6b Adjust error message for partial writes in WAL segments
93473c6 has removed openLogOff, changing on the way the error message
which is used to report partial writes to WAL segments.  The
newly-introduced error message used the offset up to which the write has
happened, keeping always the same total length to write.  This changes
the error message so as the number of bytes left to write are reported.

Reported-by: Michael Paquier
Author: Robert Haas
Discussion: https://postgr.es/m/20190306235251.GA17293@paquier.xyz
2019-03-11 09:31:25 +09:00
Tom Lane cbccac371c Reduce the default value of autovacuum_vacuum_cost_delay to 2ms.
This is a better way to implement the desired change of increasing
autovacuum's default resource consumption.

Discussion: https://postgr.es/m/28720.1552101086@sss.pgh.pa.us
2019-03-10 15:16:21 -04:00
Tom Lane 52985e4fea Revert "Increase the default vacuum_cost_limit from 200 to 2000"
This reverts commit bd09503e63.

Per discussion, it seems like what we should do instead is to
reduce the default value of autovacuum_vacuum_cost_delay by the
same factor.  That's functionally equivalent as long as the
platform can accurately service the smaller delay request, which
should be true on anything released in the last 10 years or more.
And smaller, more-closely-spaced delays are better in terms of
providing a steady I/O load.

Discussion: https://postgr.es/m/28720.1552101086@sss.pgh.pa.us
2019-03-10 15:05:25 -04:00
Tom Lane caf626b2cd Convert [autovacuum_]vacuum_cost_delay into floating-point GUCs.
This change makes it possible to specify sub-millisecond delays,
which work well on most modern platforms, though that was not true
when the cost-delay feature was designed.

To support this without breaking existing configuration entries,
improve guc.c to allow floating-point GUCs to have units.  Also,
allow "us" (microseconds) as an input/output unit for time-unit GUCs.
(It's not allowed as a base unit, at least not yet.)

Likewise change the autovacuum_vacuum_cost_delay reloption to be
floating-point; this forces a catversion bump because the layout of
StdRdOptions changes.

This patch doesn't in itself change the default values or allowed
ranges for these parameters, and it should not affect the behavior
for any already-allowed setting for them.

Discussion: https://postgr.es/m/1798.1552165479@sss.pgh.pa.us
2019-03-10 15:01:39 -04:00
Tom Lane 28a65fc360 Include GUC's unit, if it has one, in out-of-range error messages.
This should reduce confusion in cases where we've applied a units
conversion, so that the number being reported (and the quoted range
limits) are in some other units than what the user gave in the
setting we're rejecting.

Some of the changes here assume that float GUCs can have units,
which isn't true just yet, but will be shortly.

Discussion: https://postgr.es/m/3811.1552169665@sss.pgh.pa.us
2019-03-10 13:18:17 -04:00
Tom Lane ac75959cdc Disallow NaN as a value for floating-point GUCs.
None of the code that uses GUC values is really prepared for them to
hold NaN, but parse_real() didn't have any defense against accepting
such a value.  Treat it the same as a syntax error.

I haven't attempted to analyze the exact consequences of setting any
of the float GUCs to NaN, but since they're quite unlikely to be good,
this seems like a back-patchable bug fix.

Note: we don't need an explicit test for +-Infinity because those will
be rejected by existing range checks.  I added a regression test for
that in HEAD, but not older branches because the spelling of the value
in the error message will be platform-dependent in branches where we
don't always use port/snprintf.c.

Discussion: https://postgr.es/m/1798.1552165479@sss.pgh.pa.us
2019-03-10 12:59:16 -04:00
Alvaro Herrera 203749a8a6 pg_upgrade: Ignore TOAST for partitioned tables
Since partitioned tables in pg12 do not have toast tables, trying to set
the toast OID confuses pg_upgrade.  Have pg_dump omit those values to
avoid the problem.

Per Andres Freund and buildfarm members crake and snapper
Discussion: https://postgr.es/m/20190306204104.yle5jfbnqkcwykni@alap3.anarazel.de
2019-03-10 13:20:58 -03:00
Alexander Korotkov f2e403803f Support for INCLUDE attributes in GiST indexes
Similarly to B-tree, GiST index access method gets support of INCLUDE
attributes.  These attributes aren't used for tree navigation and aren't
present in non-leaf pages.  But they are present in leaf pages and can be
fetched during index-only scan.

The point of having INCLUDE attributes in GiST indexes is slightly different
from the point of having them in B-tree.  The main point of INCLUDE attributes
in B-tree is to define UNIQUE constraint over part of attributes enabled for
index-only scan.  In GiST the main point of INCLUDE attributes is to use
index-only scan for attributes, whose data types don't have GiST opclasses.

Discussion: https://postgr.es/m/73A1A452-AD5F-40D4-BD61-978622FF75C1%40yandex-team.ru
Author: Andrey Borodin, with small changes by me
Reviewed-by: Andreas Karlsson
2019-03-10 11:37:17 +03:00
Tom Lane a0b7626268 Simplify release-note links to back branches.
Now that https://www.postgresql.org/docs/release/ is populated,
replace the stopgap text we had under "Prior Releases" with
a pointer to that archive.

Discussion: https://postgr.es/m/e0f09c9a-bd2b-862a-d379-601dfabc8969@postgresql.org
2019-03-09 18:42:39 -05:00
Magnus Hagander 0516c61b75 Add new clientcert hba option verify-full
This allows a login to require both that the cn of the certificate
matches (like authentication type cert) *and* that another
authentication method (such as password or kerberos) succeeds as well.

The old value of clientcert=1 maps to the new clientcert=verify-ca,
clientcert=0 maps to the new clientcert=no-verify, and the new option
erify-full will add the validation of the CN.

Author: Julian Markwort, Marius Timmer
Reviewed by: Magnus Hagander, Thomas Munro
2019-03-09 12:19:47 -08:00
Magnus Hagander 6b9e875f72 Track block level checksum failures in pg_stat_database
This adds a column that counts how many checksum failures have occurred
on files belonging to a specific database. Both checksum failures
during normal backend processing and those created when a base backup
detects a checksum failure are counted.

Author: Magnus Hagander
Reviewed by: Julien Rouhaud
2019-03-09 10:47:30 -08:00
Noah Misch 3c5926301a Avoid some table rewrites for ALTER TABLE .. SET DATA TYPE timestamp.
When the timezone is UTC, timestamptz and timestamp are binary coercible
in both directions.  See b8a18ad485 and
c22ecc6562 for the previous attempt in
this problem space.  Skip the table rewrite; for now, continue to
needlessly rewrite any index on an affected column.

Reviewed by Simon Riggs and Tom Lane.

Discussion: https://postgr.es/m/20190226061450.GA1665944@rfd.leadboat.com
2019-03-08 20:16:27 -08:00
Michael Paquier 82a5649fb9 Tighten use of OpenTransientFile and CloseTransientFile
This fixes two sets of issues related to the use of transient files in
the backend:
1) OpenTransientFile() has been used in some code paths with read-write
flags while read-only is sufficient, so switch those calls to be
read-only where necessary.  These have been reported by Joe Conway.
2) When opening transient files, it is up to the caller to close the
file descriptors opened.  In error code paths, CloseTransientFile() gets
called to clean up things before issuing an error.  However in normal
exit paths, a lot of callers of CloseTransientFile() never actually
reported errors, which could leave a file descriptor open without
knowing about it.  This is an issue I complained about a couple of
times, but never had the courage to write and submit a patch, so here we
go.

Note that one frontend code path is impacted by this commit so as an
error is issued when fetching control file data, making backend and
frontend to be treated consistently.

Reported-by: Joe Conway, Michael Paquier
Author: Michael Paquier
Reviewed-by: Álvaro Herrera, Georgios Kokolatos, Joe Conway
Discussion: https://postgr.es/m/20190301023338.GD1348@paquier.xyz
Discussion: https://postgr.es/m/c49b69ec-e2f7-ff33-4f17-0eaa4f2cef27@joeconway.com
2019-03-09 08:50:55 +09:00
Alvaro Herrera 2e616dee9e Fix crash with old libxml2
Certain libxml2 versions (such as the 2.7.6 commonly seen in older
distributions, but apparently only on x86_64) contain a bug that causes
xmlCopyNode, when called on a XML_DOCUMENT_NODE, to return a node that
xmlFreeNode crashes on.  Arrange to call xmlFreeDoc instead of
xmlFreeNode for those nodes.

Per buildfarm members lapwing and grison.

Author: Pavel Stehule, light editing by Álvaro.
Discussion: https://postgr.es/m/20190308024436.GA2374@alvherre.pgsql
2019-03-08 19:13:25 -03:00
Tom Lane 1b76168da7 Reformat catalog .dat files.
Test run for my previous commit; cleans up formatting issues in some
other recent commits.
2019-03-08 12:01:27 -05:00
Tom Lane 27aaf6eff4 Minor improvements for reformat_dat_file.pl.
Use Getopt::Long in preference to hand-rolled option parsing code.

Also, remove "-I .../backend/catalog" switch from the Makefile
invocations.  That's been unnecessary for some time, and leaving it
there gives the false impression it's needed in manual invocations.

John Naylor (extracted from a larger but more controversial patch)

Discussion: https://postgr.es/m/CACPNZCsHdcQN2jQ1=ptbi1Co2Nj3aHgRCUMk62=ThgWNabPY+Q@mail.gmail.com
2019-03-08 11:48:49 -05:00
Michael Paquier beeb8e2e07 Fix compatibility of pg_basebackup -R with 11 and older versions
When 2dedf4d9 has integrated recovery.conf into postgresql.conf, it also
changed pg_basebackup -R in the way recovery configuration is
generated.  However this implementation forgot the fact that
pg_basebackup needs to keep compatibility with older server versions as
well.

Reported-by: Devrim Gündüz
Author: Sergei Kornilov, Michael Paquier
Discussion: https://postgr.es/m/3458f7cd12d74acd90180a671c8d5a081d60e162.camel@gunduz.org
2019-03-08 10:17:23 +09:00
Alvaro Herrera 251cf2e27b Fix minor deficiencies in XMLTABLE, xpath(), xmlexists()
Correctly process nodes of more types than previously.  In some cases,
nodes were being ignored (nothing was output); in other cases, trying to
return them resulted in errors about unrecognized nodes.  In yet other
cases, necessary escaping (of XML special characters) was not being
done.  Fix all those (as far as the authors could find) and add
regression tests cases verifying the new behavior.

I (Álvaro) was of two minds about backpatching these changes.  They do
seem bugfixes that would benefit most users of the affected functions;
but on the other hand it would change established behavior in minor
releases, so it seems prudent not to.

Authors: Pavel Stehule, Markus Winand, Chapman Flack
Discussion:
   https://postgr.es/m/CAFj8pRA6J25CtAZ2TuRvxK3gat7-bBUYh0rfE2yM7Hj9GD14Dg@mail.gmail.com
   https://postgr.es/m/8BDB0627-2105-4564-AA76-7849F028B96E@winand.at

The elephant in the room as pointed out by Chapman Flack, not fixed in
this commit, is that we still have XMLTABLE operating on XPath 1.0
instead of the standard-mandated XQuery (or even its subset XPath 2.0).
Fixing that is a major undertaking, however.
2019-03-07 18:16:34 -03:00
Tom Lane 1d33858406 Fix handling of targetlist SRFs when scan/join relation is known empty.
When we introduced separate ProjectSetPath nodes for application of
set-returning functions in v10, we inadvertently broke some cases where
we're supposed to recognize that the result of a subquery is known to be
empty (contain zero rows).  That's because IS_DUMMY_REL was just looking
for a childless AppendPath without allowing for a ProjectSetPath being
possibly stuck on top.  In itself, this didn't do anything much worse
than produce slightly worse plans for some corner cases.

Then in v11, commit 11cf92f6e rearranged things to allow the scan/join
targetlist to be applied directly to partial paths before they get
gathered.  But it inserted a short-circuit path for dummy relations
that was a little too short: it failed to insert a ProjectSetPath node
at all for a targetlist containing set-returning functions, resulting in
bogus "set-valued function called in context that cannot accept a set"
errors, as reported in bug #15669 from Madelaine Thibaut.

The best way to fix this mess seems to be to reimplement IS_DUMMY_REL
so that it drills down through any ProjectSetPath nodes that might be
there (and it seems like we'd better allow for ProjectionPath as well).

While we're at it, make it look at rel->pathlist not cheapest_total_path,
so that it gives the right answer independently of whether set_cheapest
has been done lately.  That dependency looks pretty shaky in the context
of code like apply_scanjoin_target_to_paths, and even if it's not broken
today it'd certainly bite us at some point.  (Nastily, unsafe use of the
old coding would almost always work; the hazard comes down to possibly
looking through a dangling pointer, and only once in a blue moon would
you find something there that resulted in the wrong answer.)

It now looks like it was a mistake for IS_DUMMY_REL to be a macro: if
there are any extensions using it, they'll continue to use the old
inadequate logic until they're recompiled, after which they'll fail
to load into server versions predating this fix.  Hopefully there are
few such extensions.

Having fixed IS_DUMMY_REL, the special path for dummy rels in
apply_scanjoin_target_to_paths is unnecessary as well as being wrong,
so we can just drop it.

Also change a few places that were testing for partitioned-ness of a
planner relation but not using IS_PARTITIONED_REL for the purpose; that
seems unsafe as well as inconsistent, plus it required an ugly hack in
apply_scanjoin_target_to_paths.

In passing, save a few cycles in apply_scanjoin_target_to_paths by
skipping processing of pre-existing paths for partitioned rels,
and do some cosmetic cleanup and comment adjustment in that function.

I renamed IS_DUMMY_PATH to IS_DUMMY_APPEND with the intention of breaking
any code that might be using it, since in almost every case that would
be wrong; IS_DUMMY_REL is what to be using instead.

In HEAD, also make set_dummy_rel_pathlist static (since it's no longer
used from outside allpaths.c), and delete is_dummy_plan, since it's no
longer used anywhere.

Back-patch as appropriate into v11 and v10.

Tom Lane and Julien Rouhaud

Discussion: https://postgr.es/m/15669-02fb3296cca26203@postgresql.org
2019-03-07 14:22:13 -05:00
Robert Haas 898e5e3290 Allow ATTACH PARTITION with only ShareUpdateExclusiveLock.
We still require AccessExclusiveLock on the partition itself, because
otherwise an insert that violates the newly-imposed partition
constraint could be in progress at the same time that we're changing
that constraint; only the lock level on the parent relation is
weakened.

To make this safe, we have to cope with (at least) three separate
problems. First, relevant DDL might commit while we're in the process
of building a PartitionDesc.  If so, find_inheritance_children() might
see a new partition while the RELOID system cache still has the old
partition bound cached, and even before invalidation messages have
been queued.  To fix that, if we see that the pg_class tuple seems to
be missing or to have a null relpartbound, refetch the value directly
from the table. We can't get the wrong value, because DETACH PARTITION
still requires AccessExclusiveLock throughout; if we ever want to
change that, this will need more thought. In testing, I found it quite
difficult to hit even the null-relpartbound case; the race condition
is extremely tight, but the theoretical risk is there.

Second, successive calls to RelationGetPartitionDesc might not return
the same answer.  The query planner will get confused if lookup up the
PartitionDesc for a particular relation does not return a consistent
answer for the entire duration of query planning.  Likewise, query
execution will get confused if the same relation seems to have a
different PartitionDesc at different times.  Invent a new
PartitionDirectory concept and use it to ensure consistency.  This
ensures that a single invocation of either the planner or the executor
sees the same view of the PartitionDesc from beginning to end, but it
does not guarantee that the planner and the executor see the same
view.  Since this allows pointers to old PartitionDesc entries to
survive even after a relcache rebuild, also postpone removing the old
PartitionDesc entry until we're certain no one is using it.

For the most part, it seems to be OK for the planner and executor to
have different views of the PartitionDesc, because the executor will
just ignore any concurrently added partitions which were unknown at
plan time; those partitions won't be part of the inheritance
expansion, but invalidation messages will trigger replanning at some
point.  Normally, this happens by the time the very next command is
executed, but if the next command acquires no locks and executes a
prepared query, it can manage not to notice until a new transaction is
started.  We might want to tighten that up, but it's material for a
separate patch.  There would still be a small window where a query
that started just after an ATTACH PARTITION command committed might
fail to notice its results -- but only if the command starts before
the commit has been acknowledged to the user. All in all, the warts
here around serializability seem small enough to be worth accepting
for the considerable advantage of being able to add partitions without
a full table lock.

Although in general the consequences of new partitions showing up
between planning and execution are limited to the query not noticing
the new partitions, run-time partition pruning will get confused in
that case, so that's the third problem that this patch fixes.
Run-time partition pruning assumes that indexes into the PartitionDesc
are stable between planning and execution.  So, add code so that if
new partitions are added between plan time and execution time, the
indexes stored in the subplan_map[] and subpart_map[] arrays within
the plan's PartitionedRelPruneInfo get adjusted accordingly.  There
does not seem to be a simple way to generalize this scheme to cope
with partitions that are removed, mostly because they could then get
added back again with different bounds, but it works OK for added
partitions.

This code does not try to ensure that every backend participating in
a parallel query sees the same view of the PartitionDesc.  That
currently doesn't matter, because we never pass PartitionDesc
indexes between backends.  Each backend will ignore the concurrently
added partitions which it notices, and it doesn't matter if different
backends are ignoring different sets of concurrently added partitions.
If in the future that matters, for example because we allow writes in
parallel query and want all participants to do tuple routing to the same
set of partitions, the PartitionDirectory concept could be improved to
share PartitionDescs across backends.  There is a draft patch to
serialize and restore PartitionDescs on the thread where this patch
was discussed, which may be a useful place to start.

Patch by me.  Thanks to Alvaro Herrera, David Rowley, Simon Riggs,
Amit Langote, and Michael Paquier for discussion, and to Alvaro
Herrera for some review.

Discussion: http://postgr.es/m/CA+Tgmobt2upbSocvvDej3yzokd7AkiT+PvgFH+a9-5VV1oJNSQ@mail.gmail.com
Discussion: http://postgr.es/m/CA+TgmoZE0r9-cyA-aY6f8WFEROaDLLL7Vf81kZ8MtFCkxpeQSw@mail.gmail.com
Discussion: http://postgr.es/m/CA+TgmoY13KQZF-=HNTrt9UYWYx3_oYOQpu9ioNT49jGgiDpUEA@mail.gmail.com
2019-03-07 11:13:12 -05:00
Alvaro Herrera eaaa5986ad Fix the BY {REF,VALUE} clause of XMLEXISTS/XMLTABLE
This clause is used to indicate the passing mode of a XML document, but
we were doing it wrong: we accepted BY REF and ignored it, and rejected
BY VALUE as a syntax error.  The reality, however, is that documents are
always passed BY VALUE, so rejecting that clause was silly.  Change
things so that we accept BY VALUE.

BY REF continues to be accepted, and continues to be ignored.

Author: Chapman Flack
Reviewed-by: Pavel Stehule
Discussion: https://postgr.es/m/5C297BB7.9070509@anastigmatix.net
2019-03-07 11:20:35 -03:00
Alvaro Herrera cb706ec4b6 Add missing <limits.h>
Per buildfarm
2019-03-07 10:08:29 -03:00
Alvaro Herrera 7e413a0f82 pg_dump: allow multiple rows per insert
This is useful to speed up loading data in a different database engine.

Authors: Surafel Temesgen and David Rowley.  Lightly edited by Álvaro.
Reviewed-by: Fabien Coelho
Discussion: https://postgr.es/m/CALAY4q9kumSdnRBzvRJvSRf2+BH20YmSvzqOkvwpEmodD-xv6g@mail.gmail.com
2019-03-07 09:34:17 -03:00
Thomas Munro 42210524cc Remove useless header inclusion. 2019-03-07 20:02:22 +13:00
Thomas Munro 91595f9d49 Drop the vestigial "smgr" type.
Before commit 3fa2bb31 this type appeared in the catalogs to
select which of several block storage mechanisms each relation
used.

New features under development propose to revive the concept of
different block storage managers for new kinds of data accessed
via bufmgr.c, but don't need to put references to them in the
catalogs.  So, avoid useless maintenance work on this type by
dropping it.  Update some regression tests that were referencing
it where any type would do.

Discussion: https://postgr.es/m/CA%2BhUKG%2BDE0mmiBZMtZyvwWtgv1sZCniSVhXYsXkvJ_Wo%2B83vvw%40mail.gmail.com
2019-03-07 15:44:04 +13:00
Andres Freund 277cb78983 Don't reuse slots between root and partition in ON CONFLICT ... UPDATE.
Until now the the slot to store the conflicting tuple, and the result
of the ON CONFLICT SET, where reused between partitions. That
necessitated changing slots descriptor when switching partitions.

Besides the overhead of switching descriptors on a slot (which
requires memory allocations and prevents JITing), that's importantly
also problematic for tableam. There individual partitions might belong
to different tableams, needing different kinds of slots.

In passing also fix ExecOnConflictUpdate to clear the existing slot at
exit. Otherwise that slot could continue to hold a pin till the query
ends, which could be far too long if the input data set is large, and
there's no further conflicts. While previously also problematic, it's
now more important as there will be more such slots when partitioned.

Author: Andres Freund
Reviewed-By: Robert Haas, David Rowley
Discussion: https://postgr.es/m/20180703070645.wchpu5muyto5n647@alap3.anarazel.de
2019-03-06 15:43:33 -08:00
Andres Freund d16a74c20c Fix equalfuncs for accessMethod addition in 8586bf7ed8.
In a complete brown paper bag moment, I forgot to include equalfuncs
in my previous fix of copy/out/readfuncs.  Thanks Tom for noticing.

Discussion: https://postgr.es/m/1659.1551903210@sss.pgh.pa.us
2019-03-06 13:04:09 -08:00
Andrew Dunstan 342cb650e0 Don't log incomplete startup packet if it's empty
This will stop logging cases where, for example, a monitor opens a
connection and immediately closes it. If the packet contains any data an
incomplete packet will still be logged.

Author: Tom Lane

Discussion: https://postgr.es/m/a1379a72-2958-1ed0-ef51-09a21219b155@2ndQuadrant.com
2019-03-06 15:36:41 -05:00
Andres Freund b172342321 Fix copy/out/readfuncs for accessMethod addition in 8586bf7ed8.
This includes a catversion bump, as IntoClause is theoretically
speaking part of storable rules. In practice I don't think that can
happen, but there's no reason to be stingy here.

Per buildfarm member calliphoridae.
2019-03-06 11:55:28 -08:00
Andres Freund 863aa55624 Fix collation dependency in test introduced in 8586bf7ed8, take 2.
Per buildfarm.  This time I hopefully actually made sure to get all
the cases...
2019-03-06 11:27:14 -08:00
Andres Freund 836f634522 Fix collation dependency in test introduced in 8586bf7ed8.
Per buildfarm.
2019-03-06 11:06:01 -08:00
Andres Freund 3b925e905d tableam: Add pg_dump support.
This adds pg_dump support for table AMs in a similar manner to how
tablespaces are handled. That is, instead of specifying the AM for
every CREATE TABLE etc, emit SET default_table_access_method
statements. That makes it easier to change the AM for all/most tables
in a dump, and allows restore to succeed even if some AM is not
available.

This increases the dump archive version, as a tables/matview's AM
needs to be tracked therein.

Author: Dimitri Dolgov, Andres Freund
Discussion:
    https://postgr.es/m/20180703070645.wchpu5muyto5n647@alap3.anarazel.de
    https://postgr.es/m/20190304234700.w5tmhducs5wxgzls@alap3.anarazel.de
2019-03-06 09:54:38 -08:00
Andres Freund 8586bf7ed8 tableam: introduce table AM infrastructure.
This introduces the concept of table access methods, i.e. CREATE
  ACCESS METHOD ... TYPE TABLE and
  CREATE TABLE ... USING (storage-engine).
No table access functionality is delegated to table AMs as of this
commit, that'll be done in following commits.

Subsequent commits will incrementally abstract table access
functionality to be routed through table access methods. That change
is too large to be reviewed & committed at once, so it'll be done
incrementally.

Docs will be updated at the end, as adding them incrementally would
likely make them less coherent, and definitely is a lot more work,
without a lot of benefit.

Table access methods are specified similar to index access methods,
i.e. pg_am.amhandler returns, as INTERNAL, a pointer to a struct with
callbacks. In contrast to index AMs that struct needs to live as long
as a backend, typically that's achieved by just returning a pointer to
a constant struct.

Psql's \d+ now displays a table's access method. That can be disabled
with HIDE_TABLEAM=true, which is mainly useful so regression tests can
be run against different AMs.  It's quite possible that this behaviour
still needs to be fine tuned.

For now it's not allowed to set a table AM for a partitioned table, as
we've not resolved how partitions would inherit that. Disallowing
allows us to introduce, if we decide that's the way forward, such a
behaviour without a compatibility break.

Catversion bumped, to add the heap table AM and references to it.

Author: Haribabu Kommi, Andres Freund, Alvaro Herrera, Dimitri Golgov and others
Discussion:
    https://postgr.es/m/20180703070645.wchpu5muyto5n647@alap3.anarazel.de
    https://postgr.es/m/20160812231527.GA690404@alvherre.pgsql
    https://postgr.es/m/20190107235616.6lur25ph22u5u5av@alap3.anarazel.de
    https://postgr.es/m/20190304234700.w5tmhducs5wxgzls@alap3.anarazel.de
2019-03-06 09:54:38 -08:00
Andres Freund f217761856 Fix bug in clearing of virtual tuple slot.
I broke/typoed this in 4da597edf1. Astonishingly this mostly
doesn't cause breakage, except when trying to change the tuple
descriptor of a slot (because TTS_FLAG_FIXED is assumed to be set).

Author: Andres Freund
2019-03-06 09:54:38 -08:00
Robert Haas 93473c6ac8 Removed unused variable, openLogOff.
Antonin Houska

Discussion: http://postgr.es/m/30413.1551870730@localhost
2019-03-06 09:44:08 -05:00
Andrew Dunstan bd09503e63 Increase the default vacuum_cost_limit from 200 to 2000
The original 200 default value was set back in f425b605f4 when the cost
delay settings were first added.  Hardware has improved quite a bit since
then and we've also made improvements such as sorting buffers during
checkpoints (9cd00c457e) which should result in less random writes.

This low default value was reportedly causing problems with badly
configured servers and in the absence of a native method to remove
excessive bloat from tables without incurring an AccessExclusiveLock, this
often made cleaning up the damage caused by badly configured auto-vacuums
difficult.

It seems more likely that someone will notice that auto-vacuum is running
too quickly than too slowly, so let's go all out and multiple the default
value for the setting by 10.  With the default vacuum_cost_page_dirty and
autovacuum_vacuum_cost_delay (assuming a page size of 8192 bytes), this
allows autovacuum a theoretical maximum dirty write rate of around 39MB/s
instead of just 3.9MB/s.

Author: David Rowley

Discussion: https://postgr.es/m/CAKJS1f_YbXC2qTMPyCbmsPiKvZYwpuQNQMohiRXLj1r=8_rYvw@mail.gmail.com
2019-03-06 09:10:12 -05:00
Michael Paquier ff9bff0a85 Teach SKIP_LOCKED to psql tab completion of VACUUM and ANALYZE
This was missing since 803b130, which has introduced the option for the
user-facing VACUUM and ANALYZE.

Author: Masahiko Sawada
Discussion: https://postgr.es/m/CAD21AoD2TMdTxRhZ7WSp940V82_OAyPmgHnbi25UbbArLgA92Q@mail.gmail.com
2019-03-06 14:42:52 +09:00
Andrew Dunstan e988878f85 Fix pgbench TAP test failure with funky file names (redux)
This test fails if the containing directory contains a funny character
such as a space or some perl metacharacter. To avoid that, we check for
files names using readdir and a regex, rather than using a glob pattern.

Discussion: https://postgr.es/m/CAM6_UM6dGdU39PKAC24T+HD9ouy0jLN9vH6163K8QEEzr__iZw@mail.gmail.com

Author: Fabien COELHO
Reviewed-by: Raúl Marín Rodríguez
2019-03-05 10:46:35 -05:00
Peter Eisentraut 9b1384dd6a Remove duplicate macro
The original commit appears to have accidentally introduced a
duplicate definition.  Keep only one of them.
2019-03-05 15:02:56 +01:00
Heikki Linnakangas fe280694d0 Scan GiST indexes in physical order during VACUUM.
Scanning an index in physical order is faster than walking it in logical
order, because sequential I/O is faster than random I/O. The idea and code
structure is borrowed from B-tree vacuum code.

Patch by Andrey Borodin, with changes by me. Based on early work by
Konstantin Kuznetsov, although the patch has been rewritten multiple times
since his original version.

Discussion: https://www.postgresql.org/message-id/1B9FAC6F-FA19-4A24-8C1B-F4F574844892%40yandex-team.ru
2019-03-05 15:19:48 +02:00
Peter Geoghegan 35bc0ec7c8 Note case where nbtree VACUUM finishes splits.
The nbtree README claims that VACUUM can never finish interrupted page
splits by design.  That isn't entirely accurate, though.  Note an
exception to the general rule.

Discussion: https://postgr.es/m/CAH2-Wz=_Xvv8byzK_LvY4ci76OgsHCQzoKF7We8yG9waO7j6rA@mail.gmail.com
2019-03-04 17:57:36 -08:00
Andrew Dunstan 1638623f34 Disable dump_connstr test on Msys2
For some reason the dump test with names with high bits set fails on
Msys2 (although not Msys1). Disable the tests for now, so that other
tests can run.
2019-03-04 17:35:54 -05:00
Andrew Dunstan 4eff1e9f0b Allow recovery tests to run on Windows as an admin user
This is the only test that fails when run as an admin user. The reason
is that when Postgres is started via pg_ctl its admin privileges are
lowered. However, this test called 'postgres -D datadir' directly,
resulting in a failure. Replace that by calling pg_ctl and then checking
the result for the expected failure, and the logfile for the expected
error message.
2019-03-04 15:54:02 -05:00
Peter Geoghegan 72c7c4e386 Correct obsolete nbtree page split WAL comment.
Commit 2c03216d83, which revamped the WAL record format, failed to
update a comment referencing the old API.  Update the comment.
2019-03-04 12:32:40 -08:00
Alvaro Herrera b96f6b1948 pg_partition_ancestors
Adds another introspection feature for partitioning, necessary for
further psql patches.

Reviewed-by: Michaël Paquier
Discussion: https://postgr.es/m/20190226222757.GA31622@alvherre.pgsql
2019-03-04 16:14:29 -03:00
Alvaro Herrera d12fbe2f8e Test partition functions with legacy inheritance children, too
It's worth immortalizing this behavior, per discussion.

Discussion: https://postgr.es/m/20190228193203.GA26151@alvherre.pgsql
2019-03-04 15:52:41 -03:00
Peter Eisentraut 278584b526 Remove volatile from latch API
This was no longer useful since the latch functions use memory
barriers already, which are also compiler barriers, and volatile does
not help with cross-process access.

Discussion: https://www.postgresql.org/message-id/flat/20190218202511.qsfpuj5sy4dbezcw%40alap3.anarazel.de#18783c27d73e9e40009c82f6e0df0974
2019-03-04 11:30:41 +01:00
Michael Paquier 754b90f657 Fix error handling of readdir() port implementation on first file lookup
The implementation of readdir() in src/port/ which gets used by MSVC has
been added in 399a36a, and since the beginning it considers all errors
on the first file lookup as ENOENT, setting errno accordingly and
letting the routine caller think that the directory is empty.  While
this is normally enough for the case of the backend, this can confuse
callers of this routine on Windows as all errors would map to the same
behavior.  So, for example, even permission errors would be thought as
having an empty directory, while there could be contents in it.

This commit changes the error handling so as readdir() gets a behavior
similar to native implementations: force errno=0 when seeing
ERROR_FILE_NOT_FOUND as error and consider other errors as plain
failures.

While looking at the patch, I noticed that MinGW does not enforce
errno=0 when looking at the first file, but it gets enforced on the next
file lookups.  A comment related to that was incorrect in the code.

Reported-by: Yuri Kurenkov
Diagnosed-by: Yuri Kurenkov, Grigory Smolkin
Author:	Konstantin Knizhnik
Reviewed-by: Andrew Dunstan, Michael Paquier
Discussion: https://postgr.es/m/2cad7829-8d66-e39c-b937-ac825db5203d@postgrespro.ru
Backpatch-through: 9.4
2019-03-04 09:49:06 +09:00
Andrew Dunstan 5bd9160f27 fix thinko in logrotate test 2019-03-03 19:15:47 -05:00
Andrew Dunstan d611175e53 Don't do pg_ctl logrotate test on Windows
The test crashes and burns quite badly, for some reason, but even if it
didn't it wouldn't work, since Windows doesn't let you rename a file
held by a running process.
2019-03-03 18:19:44 -05:00
Tom Lane 80b9e9c466 Improve performance of index-only scans with many index columns.
StoreIndexTuple was a loop over index_getattr, which is O(N^2)
if the index columns are variable-width, and the performance
impact is already quite visible at ten columns.  The obvious
move is to replace that with a call to index_deform_tuple ...
but that's *also* a loop over index_getattr.  Improve it to
be essentially a clone of heap_deform_tuple.

(There are a few other places that loop over all index columns
with index_getattr, and perhaps should be changed likewise,
but most of them don't seem performance-critical.  Anyway, the
rest would mostly only be interested in the index key columns,
which there aren't likely to be so many of.  Wide index tuples
are a new thing with INCLUDE.)

Konstantin Knizhnik

Discussion: https://postgr.es/m/e06b2d27-04fc-5c0e-bb8c-ecd72aa24959@postgrespro.ru
2019-03-03 16:57:14 -05:00
Andrew Dunstan 78b408a20a Avoid accidental wildcard expansion in msys shell
Commit f092de05 added a test for pg_dumpall --exclude-database including
the wildcard pattern '*dump*' which matches some files in the source
directory. The test library on msys uses the shell which expands this
and thus the program gets incorrect arguments. This doesn't happen if
the pattern doesn't match any files, so here the pattern is set to
'*dump_test*' which is such a pattern.

Per buildfarm animal jacana.
2019-03-03 11:48:12 -05:00
Dean Rasheed ed4653db8c Further fixing for multi-row VALUES lists for updatable views.
Previously, rewriteTargetListIU() generated a list of attribute
numbers from the targetlist, which were passed to rewriteValuesRTE(),
which expected them to contain the same number of entries as there are
columns in the VALUES RTE, and to be in the same order. That was fine
when the target relation was a table, but for an updatable view it
could be broken in at least three different ways ---
rewriteTargetListIU() could insert additional targetlist entries for
view columns with defaults, the view columns could be in a different
order from the columns of the underlying base relation, and targetlist
entries could be merged together when assigning to elements of an
array or composite type. As a result, when recursing to the base
relation, the list of attribute numbers generated from the rewritten
targetlist could no longer be relied upon to match the columns of the
VALUES RTE. We got away with that prior to 41531e42d3 because it used
to always be the case that rewriteValuesRTE() did nothing for the
underlying base relation, since all DEFAULTS had already been replaced
when it was initially invoked for the view, but that was incorrect
because it failed to apply defaults from the base relation.

Fix this by examining the targetlist entries more carefully and
picking out just those that are simple Vars referencing the VALUES
RTE. That's sufficient for the purposes of rewriteValuesRTE(), which
is only responsible for dealing with DEFAULT items in the VALUES
RTE. Any DEFAULT item in the VALUES RTE that doesn't have a matching
simple-Var-assignment in the targetlist is an error which we complain
about, but in theory that ought to be impossible.

Additionally, move this code into rewriteValuesRTE() to give a clearer
separation of concerns between the 2 functions. There is no need for
rewriteTargetListIU() to know about the details of the VALUES RTE.

While at it, fix the comment for rewriteValuesRTE() which claimed that
it doesn't support array element and field assignments --- that hasn't
been true since a3c7a993d5 (9.6 and later).

Back-patch to all supported versions, with minor differences for the
pre-9.6 branches, which don't support array element and field
assignments to the same column in multi-row VALUES lists.

Reviewed by Amit Langote.

Discussion: https://postgr.es/m/15623-5d67a46788ec8b7f@postgresql.org
2019-03-03 10:51:13 +00:00
Michael Paquier 3422955735 Consider only relations part of partition trees in partition functions
This changes the partition functions so as tables and indexes which are
not part of partition trees are handled the same way as what is done for
undefined objects and unsupported relkinds: pg_partition_tree() returns
no rows and pg_partition_root() returns a NULL result.  Hence,
partitioned tables, partitioned indexes and relations whose flag
pg_class.relispartition is set are considered as valid objects to
process.

Previously, tables and indexes not included in a partition tree were
processed the same way as a partition or a partitioned table, which
caused the functions to return inconsistent results for inherited
tables, especially when inheriting from multiple tables.

Reported-by: Álvaro Herrera
Author: Amit Langote, Michael Paquier
Reviewed-by: Tom Lane
Discussion: https://postgr.es/m/20190228193203.GA26151@alvherre.pgsql
2019-03-02 18:18:59 +09:00
Andres Freund 70b9bda65f Use a virtual rather than a heap slot in two places where that suffices.
Author: Andres Freund
Discussion: https://postgr.es/m/20180703070645.wchpu5muyto5n647@alap3.anarazel.de
2019-03-01 17:26:43 -08:00
Tom Lane 3396138a6d Check we don't misoptimize a NOT IN where the subquery returns no rows.
Future-proofing against a common mistake in attempts to optimize NOT IN.
We don't have such an optimization right now, but attempts to do so
are in the works, and some of 'em are buggy.  Add a regression test case
covering the point.

David Rowley

Discussion: https://postgr.es/m/CAKJS1f90E9agVZryVyUpbHQbjTt5ExqS2Fsodmt5_A7E_cEyVA@mail.gmail.com
2019-03-01 17:57:20 -05:00
Tom Lane 65ce07e020 Teach optimizer's predtest.c more things about ScalarArrayOpExpr.
In particular, make it possible to prove/refute "x IS NULL" and
"x IS NOT NULL" predicates from a clause involving a ScalarArrayOpExpr
even when we are unable or unwilling to deconstruct the expression
into an AND/OR tree.  This avoids a former unexpected degradation of
plan quality when the size of an ARRAY[] expression or array constant
exceeded the arbitrary MAX_SAOP_ARRAY_SIZE limit.  For IS-NULL proofs,
we don't really care about the values of the individual array elements;
at most, we care whether there are any, and for some common cases we
needn't even know that.

The main user-visible effect of this is to let the optimizer recognize
applicability of partial indexes with "x IS NOT NULL" predicates to
queries with "x IN (array)" clauses in some cases where it previously
failed to recognize that.  The structure of predtest.c is such that a
bunch of related proofs will now also succeed, but they're probably
much less useful in the wild.

James Coleman, reviewed by David Rowley

Discussion: https://postgr.es/m/CAAaqYe8yKSvzbyu8w-dThRs9aTFMwrFxn_BkTYeXgjqe3CbNjg@mail.gmail.com
2019-03-01 17:14:17 -05:00
Peter Eisentraut aad21d4c3c Fix whitespace 2019-03-01 20:56:53 +01:00
Andrew Dunstan 97b6f2eb75 Remove tests for pg_dumpall --exclude-database missing argument
It turns out that different getopt implementations spell the error for
missing arguments different ways. This test is of fairly marginal
value, so instead of trying to keep up  with the different error
messages just remove the test.
2019-03-01 14:14:50 -05:00
Andres Freund ad0bda5d24 Store tuples for EvalPlanQual in slots, rather than as HeapTuples.
For the upcoming pluggable table access methods it's quite
inconvenient to store tuples as HeapTuples, as that'd require
converting tuples from a their native format into HeapTuples. Instead
use slots to manage epq tuples.

To fit into that scheme, change the foreign data wrapper callback
RefetchForeignRow, to store the tuple in a slot. Insist on using the
caller provided slot, so it conveniently can be stored in the
corresponding EPQ slot.  As there is no in core user of
RefetchForeignRow, that change was done blindly, but we plan to test
that soon.

To avoid duplicating that work for row locks, move row locks to just
directly use the EPQ slots - it previously temporarily stored tuples
in LockRowsState.lr_curtuples, but that doesn't seem beneficial, given
we'd possibly end up with a significant number of additional slots.

The behaviour of es_epqTupleSet[rti -1] is now checked by
es_epqTupleSlot[rti -1] != NULL, as that is distinguishable from a
slot containing an empty tuple.

Author: Andres Freund, Haribabu Kommi, Ashutosh Bapat
Discussion: https://postgr.es/m/20180703070645.wchpu5muyto5n647@alap3.anarazel.de
2019-03-01 10:37:57 -08:00
Andrew Dunstan 6cbdbd9e8d Add extra descriptive headings in pg_dumpall
Headings are added for the User Configurations and Databases sections,
and for each user configuration and database in the output.

Author: Fabien Coelho
Discussion: https://postgr.es/m/alpine.DEB.2.21.1812272222130.32444@lancre
2019-03-01 11:38:54 -05:00
Andrew Dunstan f092de0503 Add --exclude-database option to pg_dumpall
This option functions similarly to pg_dump's --exclude-table option, but
for database names. The option can be given once, and the argument can
be a pattern including wildcard characters.

Author: Andrew Dunstan.
Reviewd-by: Fabien Coelho and Michael Paquier
Discussion: https://postgr.es/m/43a54a47-4aa7-c70e-9ca6-648f436dd6e6@2ndQuadrant.com
2019-03-01 10:47:44 -05:00
Amit Kapila 9c32e4c350 Clear the local map when not used.
After commit b0eaa4c51b, we use a local map of pages to find the required
space for small relations.  We do clear this map when we have found a block
with enough free space, when we extend the relation, or on transaction
abort so that it can be used next time.  However, we miss to clear it when
we didn't find any pages to try from the map which leads to an assertion
failure when we later tried to use it after relation extension.

In the passing, I have improved some comments in this area.

Reported-by: Tom Lane based on buildfarm results
Author: Amit Kapila
Reviewed-by: John Naylor
Tested-by: Kuntal Ghosh
Discussion: https://postgr.es/m/32368.1551114120@sss.pgh.pa.us
2019-03-01 07:38:47 +05:30
Michael Paquier 0f3cdf873e Make pg_partition_tree return no rows on unsupported and undefined objects
The function was tweaked so as it returned one row full of NULLs when
working on an unsupported relkind or an undefined object as of cc53123,
and after discussion with Amit and Álvaro it looks more natural to make
it return no rows.

Author: Michael Paquier
Reviewed-by: Álvaro Herrera, Amit Langote
Discussion: https://postgr.es/m/20190227184808.GA17357@alvherre.pgsql
2019-03-01 09:07:07 +09:00
Andres Freund 253655116b Don't superfluously materialize slot after DELETE from an FDW.
Previously that was needed to safely store the table oid, but after
b8d71745ea that's not necessary anymore.

Author: Andres Freund
2019-02-28 14:54:12 -08:00
Andres Freund 8f0577386e Don't force materializing when copying a buffer tuple table slot.
After 5408e233f0 it's not necessary to force materializing the
target slot when copying from one buffer slot to another. Previously
that was required because the HeapTupleData portion of the source slot
wasn't guaranteed to stay valid long enough, but now we can simply
copy that part into the destination slot's tupdata.

Author: Andres Freund
2019-02-28 14:54:12 -08:00
Joe Conway 4598a99cf2 Make get_controlfile not leak file descriptors
When backend functions were added to expose controldata via SQL,
reading of pg_control was consolidated under src/common so that
both frontend and backend could share the same code. That move
from frontend-only to shared frontend-backend failed to recognize
the risk (and coding standards violation) of using a bare open().
In particular, it risked leaking file descriptors if transient
errors occurred while reading the file. Fix that by using
OpenTransientFile() instead in the backend case, which is
purpose-built for this type of usage.

Since there have been no complaints from the field, and an intermittent
failure low risk, no backpatch. Hard failure would of course be bad, but
in that case these functions are probably the least of your worries.

Author: Joe Conway
Reviewed-By: Michael Paquier
Reported by: Michael Paquier
Discussion: https://postgr.es/m/20190227074728.GA15710@paquier.xyz
2019-02-28 15:57:40 -05:00
Andres Freund f414abd62d Allow buffer tuple table slots to materialize after ExecStoreVirtualTuple().
While not common, it can be useful to store a virtual tuple into a
buffer tuple table slot, and then materialize that slot. So far we've
asserted out, which surprisingly wasn't a problem for anything in
core. But that seems fragile, and it also breaks redis_fdw after
ff11e7f4b9.

Thus, allow materializing a virtual tuple stored in a buffer tuple
table slot.

Author: Andres Freund
Discussion:
    https://postgr.es/m/20190227181621.xholonj7ff7ohxsg@alap3.anarazel.de
    https://postgr.es/m/20180703070645.wchpu5muyto5n647@alap3.anarazel.de
2019-02-28 12:28:03 -08:00
Alvaro Herrera 19455c9f56 pg_dump: Fix ArchiveEntry handling of some empty values
Commit f831d4acc changed what pg_dump emits for some empty fields: they
were output as empty strings before, NULL pointer afterwards.  That
makes old pg_restore unable to work (crash) with such files, which is
unacceptable.  Return to the original representation by explicitly
setting those struct members to "" where needed; remove some no longer
needed checks for NULL input.

We can declutter the code a little by returning to NULLs when we next
update the archive version, so add a note to remind us later.

Discussion: https://postgr.es/m/20190225074539.az6j3u464cvsoxh6@depesz.com
Reported-by: hubert depesz lubaczewski
Author: Dmitry Dolgov
2019-02-28 17:19:44 -03:00
Peter Eisentraut 3f61999cc9 Merge near-duplicate code in RI triggers
Merge ri_setnull and ri_setdefault into one function ri_set.  These
functions were to a large part identical.

This is a continuation in spirit of
4797f9b519.

Author: Corey Huinker <corey.huinker@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/0ccdd3e1-10b0-dd05-d8a7-183507c11eb1%402ndquadrant.com
2019-02-28 20:35:55 +01:00
Tom Lane c94fb8e8ac Standardize some more loops that chase down parallel lists.
We have forboth() and forthree() macros that simplify iterating
through several parallel lists, but not everyplace that could
reasonably use those was doing so.  Also invent forfour() and
forfive() macros to do the same for four or five parallel lists,
and use those where applicable.

The immediate motivation for doing this is to reduce the number
of ad-hoc lnext() calls, to reduce the footprint of a WIP patch.
However, it seems like good cleanup and error-proofing anyway;
the places that were combining forthree() with a manually iterated
loop seem particularly illegible and bug-prone.

There was some speculation about restructuring related parsetree
representations to reduce the need for parallel list chasing of
this sort.  Perhaps that's a win, or perhaps not, but in any case
it would be considerably more invasive than this patch; and it's
not particularly related to my immediate goal of improving the
List infrastructure.  So I'll leave that question for another day.

Patch by me; thanks to David Rowley for review.

Discussion: https://postgr.es/m/11587.1550975080@sss.pgh.pa.us
2019-02-28 14:25:01 -05:00
Peter Eisentraut 0a271df705 Clean up some variable names in ri_triggers.c
There was a mix of old_slot/oldslot, new_slot/newslot.  Since we've
changed everything from row to slot, we might as well take this
opportunity to clean this up.

Also update some more comments for the slot change.
2019-02-28 15:29:00 +01:00
Peter Eisentraut 554ebf6878 Compact for loops
Declare loop variable in for loop, for readability and to save space.

Reviewed-by: Corey Huinker <corey.huinker@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/0ccdd3e1-10b0-dd05-d8a7-183507c11eb1%402ndquadrant.com
2019-02-28 11:06:33 +01:00
Peter Eisentraut 05d604718f Reduce comments
Reduce the vertical space used by comments in ri_triggers.c, making
the file longer and more tedious to read than it needs to be.  Update
some comments to use a more common style.

Reviewed-by: Corey Huinker <corey.huinker@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/0ccdd3e1-10b0-dd05-d8a7-183507c11eb1%402ndquadrant.com
2019-02-28 11:06:33 +01:00
Peter Eisentraut 0afdecc1e5 Remove unnecessary unused MATCH PARTIAL code
ri_triggers.c spends a lot of space catering to a not-yet-implemented
MATCH PARTIAL option.  An actual implementation would probably not use
the existing code structure anyway, so let's just simplify this for
now.

First, have ri_FetchConstraintInfo() check that riinfo->confmatchtype
is valid.  Then we don't have to repeat that everywhere.

In the various referential action functions, we don't need to pay
attention to the match type at all right now, so remove all that code.
A future MATCH PARTIAL implementation would probably have some
conditions added to the present code, but it won't need an entirely
separate switch branch in each case.

In RI_FKey_fk_upd_check_required(), reorganize the code to make it
much simpler.

Reviewed-by: Corey Huinker <corey.huinker@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/0ccdd3e1-10b0-dd05-d8a7-183507c11eb1%402ndquadrant.com
2019-02-28 11:06:33 +01:00
Peter Eisentraut 22c0d52b8d Update comment
for ff11e7f4b9
2019-02-28 10:43:00 +01:00
Michael Paquier b3a156858a Improve documentation of data_sync_retry
Reflecting an updated parameter value requires a server restart, which
was not mentioned in the documentation and in postgresql.conf.sample.

Reported-by: Thomas Poty
Discussion: https://postgr.es/m/15659-0cd812f13027a2d8@postgresql.org
2019-02-28 11:02:11 +09:00
Michael Paquier 87c346a35e Fix SCRAM authentication via SSL when mixing versions of OpenSSL
When using a libpq client linked with OpenSSL 1.0.1 or older to connect
to a backend linked with OpenSSL 1.0.2 or newer, the server would send
SCRAM-SHA-256-PLUS and SCRAM-SHA-256 as valid mechanisms for the SASL
exchange, and the client would choose SCRAM-SHA-256-PLUS even if it does
not support channel binding, leading to a confusing error.  In this
case, what the client ought to do is switch to SCRAM-SHA-256 so as the
authentication can move on and succeed.

So for a SCRAM authentication over SSL, here are all the cases present
and how we deal with them using libpq:
1) Server supports channel binding, it sends SCRAM-SHA-256-PLUS and
SCRAM-SHA-256 as allowed mechanisms.
1-1) Client supports channel binding, chooses SCRAM-SHA-256-PLUS.
1-2) Client does not support channel binding, chooses SCRAM-SHA-256.
2) Server does not support channel binding, sends SCRAM-SHA-256 as
allowed mechanism.
2-1) Client supports channel binding, still it has no choice but to
choose SCRAM-SHA-256.
2-2) Client does not support channel binding, it chooses SCRAM-SHA-256.
In all these scenarios the connection should succeed, and the one which
was handled incorrectly prior this commit is 1-2), causing the
connection attempt to fail because client chose SCRAM-SHA-256-PLUS over
SCRAM-SHA-256.

Reported-by: Hugh Ranalli
Diagnosed-by: Peter Eisentraut
Author: Michael Paquier
Reviewed-by: Peter Eisentraut
Discussion: https://postgr.es/m/CAAhbUMO89SqUk-5mMY+OapgWf-twF2NA5sCucbHEzMfGbvcepA@mail.gmail.com
Backpatch-through: 11
2019-02-28 09:40:28 +09:00
Andres Freund 5963b29e03 Initialize variable to silence compiler warning.
After ff11e7f4b9 Tom's compiler warns about accessing a potentially
uninitialized rInfo. That's not actually possible, but it's
understandable the compiler would get this wrong. NULL initialize too.

Reported-By: Tom Lane
Discussion: https://postgr.es/m/11199.1551285318@sss.pgh.pa.us
2019-02-27 09:14:34 -08:00
Peter Eisentraut 538ecc17c4 Set cluster_name for PostgresNode.pm instances
This can help identifying test instances more easily at run time, and
it also provides some minimal test coverage for the cluster_name
feature.

Reviewed-by: Euler Taveira <euler@timbira.com.br>
Discussion: https://www.postgresql.org/message-id/flat/1257eaee-4874-e791-e83a-46720c72cac7@2ndquadrant.com
2019-02-27 10:59:25 +01:00
Peter Eisentraut 6ae578a91e Set fallback_application_name for a walreceiver to cluster_name
By default, the fallback_application_name for a physical walreceiver
is "walreceiver".  This means that multiple standbys cannot be
distinguished easily on a primary, for example in pg_stat_activity or
synchronous_standby_names.

If cluster_name is set, use that for fallback_application_name in the
walreceiver.  (If it's not set, it remains "walreceiver".)  If someone
set cluster_name to identify their instance, we might as well use that
by default to identify the node remotely as well.  It's still possible
to specify another application_name in primary_conninfo explicitly.

Reviewed-by: Euler Taveira <euler@timbira.com.br>
Discussion: https://www.postgresql.org/message-id/flat/1257eaee-4874-e791-e83a-46720c72cac7@2ndquadrant.com
2019-02-27 10:59:25 +01:00
Michael Paquier 414a9d3cf3 Fix memory leak when inserting tuple at relation creation for CTAS
The leak has been introduced by 763f2ed which has addressed the problem
for transient tables, and forgot CREATE TABLE AS which shares a similar
logic when receiving a new tuple to store into the newly-created
relation.

Author: Jeff Janes
Discussion: https://postgr.es/m/CAMkU=1xZXtz3mziPEPD2Fubbas4G2RWkZm5HHABtfKVcbu1=Sg@mail.gmail.com
2019-02-27 14:14:06 +09:00
Andres Freund ff11e7f4b9 Use slots in trigger infrastructure, except for the actual invocation.
In preparation for abstracting table storage, convert trigger.c to
track tuples in slots. Which also happens to make code calling
triggers simpler.

As the calling interface for triggers themselves is not changed in
this patch, HeapTuples still are extracted from the slot at that
time. But that's handled solely inside trigger.c, not visible to
callers. It's quite likely that we'll want to revise the external
trigger interface, but that's a separate large project.

As part of this work the slots used for old/new/return tuples are
moved from EState into ResultRelInfo, as different updated tables
might need different slots. The slots are now also now created
on-demand, which is good both from an efficiency POV, but also makes
the modifying code simpler.

Author: Andres Freund, Amit Khandekar and Ashutosh Bapat
Discussion: https://postgr.es/m/20180703070645.wchpu5muyto5n647@alap3.anarazel.de
2019-02-26 20:31:38 -08:00
Andres Freund b8d71745ea Store table oid and tuple's tid in tuple slots directly.
After the introduction of tuple table slots all table AMs need to
support returning the table oid of the tuple stored in a slot created
by said AM. It does not make sense to re-implement that in every AM,
therefore move handling of table OIDs into the TupleTableSlot
structure itself.  It's possible that we, at a later date, might want
to get rid of HeapTupleData.t_tableOid entirely, but doing so before
the abstractions for table AMs are integrated turns out to be too
hard, so delay that for now.

Similarly, every AM needs to support the concept of a tuple
identifier (tid / item pointer) for its tuples. It's quite possible
that we'll generalize the exact form of a tid at a future point (to
allow for things like index organized tables), but for now many parts
of the code know about tids, so there's not much point in abstracting
tids away. Therefore also move into slot (rather than providing API to
set/get the tid associated with the tuple in a slot).

Once table AM includes insert/updating/deleting tuples, the
responsibility to set the correct tid after such an action will move
into that. After that change, code doing such modifications, should
not have to deal with HeapTuples directly anymore.

Author: Andres Freund, Haribabu Kommi and Ashutosh Bapat
Discussion: https://postgr.es/m/20180703070645.wchpu5muyto5n647@alap3.anarazel.de
2019-02-26 20:31:16 -08:00
Andres Freund 5408e233f0 Allow to use HeapTupleData embedded in [Buffer]HeapTupleTableSlot.
That avoids having to care about the lifetime of the
HeapTupleHeaderData passed to ExecStore[Buffer]HeapTuple(). That
doesn't make a huge difference for a plain HeapTupleTableSlot, but for
BufferHeapTupleTableSlot it can be a significant advantage, avoiding
the need to materialize slots where it's inconvenient to provide a
HeapTupleData with appropriate lifetime to point to the on-disk tuple.

It's quite possible that we'll want to add support functions for
constructing HeapTuples using that embedded HeapTupleData, but for now
callers do so themselves.

Author: Andres Freund
Discussion: https://postgr.es/m/20180703070645.wchpu5muyto5n647@alap3.anarazel.de
2019-02-26 18:15:59 -08:00
Andres Freund 8aa02b52db Add ExecStorePinnedBufferHeapTuple.
This allows to avoid an unnecessary pin/unpin cycle when storing a
tuple in an already pinned buffer into a slot, when the pin isn't
further needed at the call site.

Only a single caller for now (to ensure coverage), but upcoming
patches will increase use of the new function.

Author: Andres Freund
Discussion: https://postgr.es/m/20180703070645.wchpu5muyto5n647@alap3.anarazel.de
2019-02-26 17:59:01 -08:00
Robert Haas f4b6341d5f Change lock acquisition order in expand_inherited_rtentry.
Previously, this function acquired locks in the order using
find_all_inheritors(), which locks the children of each table that it
processes in ascending OID order, and which processes the inheritance
hierarchy as a whole in a breadth-first fashion.  Now, it processes
the inheritance hierarchy in a depth-first fashion, and at each level
it proceeds in the order in which tables appear in the PartitionDesc.
If table inheritance rather than table partitioning is used, the old
order is preserved.

This change moves the locking of any given partition much closer to
the code that actually expands that partition.  This seems essential
if we ever want to allow concurrent DDL to add or remove partitions,
because if the set of partitions can change, we must use the same data
to decide which partitions to lock as we do to decide which partitions
to expand; otherwise, we might expand a partition that we haven't
locked.  It should hopefully also facilitate efforts to postpone
inheritance expansion or locking for performance reasons, because
there's really no way to postpone locking some partitions if
we're blindly locking them all using find_all_inheritors().

The only downside of this change which is known to me is that it
further deviates from the principle that we should always lock the
inheritance hierarchy in find_all_inheritors() order to avoid deadlock
risk.  However, we've already crossed that bridge in commit
9eefba181f and there are futher patches
pending that make similar changes, so this isn't really giving up
anything that we haven't surrendered already -- and it seems entirely
worth it, given the performance benefits some of those changes seem
likely to bring.

Patch by me; thanks to David Rowley for discussion of these issues.

Discussion: http://postgr.es/m/CAKJS1f_eEYVEq5tM8sm1k-HOwG0AyCPwX54XG9x4w0zy_N4Q_Q@mail.gmail.com
Discussion: http://postgr.es/m/CA+TgmoZUwPf_uanjF==gTGBMJrn8uCq52XYvAEorNkLrUdoawg@mail.gmail.com
2019-02-26 12:22:57 -05:00
Michael Meskes 42ccbe4351 Free memory in ecpg bytea regression test.
While not really a problem it's easier to run tools like valgrind against it
when fixed.
2019-02-26 11:59:35 +01:00
Michael Meskes 0cc0507940 Hopefully fixing memory handling issues in ecpglib that Coverity found. 2019-02-26 10:56:54 +01:00
Michael Paquier 6e52209eb1 Simplify some code in pg_rewind when syncing target directory
9a4059d simplified the flush of target data folder when finishing
processing, and could have done a bit more.

Discussion: https://postgr.es/m/20190131064759.GA13429@paquier.xyz
2019-02-26 16:08:24 +09:00
Peter Geoghegan 2ab23445bc Remove unneeded argument from _bt_getstackbuf().
_bt_getstackbuf() is called at exactly two points following commit
efada2b8e9 (one call site is concerned with page splits, while the
other is concerned with page deletion).  The parent buffer returned by
_bt_getstackbuf() is write-locked in both cases.  Remove the 'access'
argument and make _bt_getstackbuf() assume that callers require a
write-lock.
2019-02-25 17:47:43 -08:00
Peter Geoghegan 067786cea0 Correct obsolete nbtree page deletion comment.
Commit efada2b8e9, which made the nbtree page deletion algorithm more
robust, removed _bt_getstackbuf() calls from _bt_pagedel().  It failed
to update a comment that referenced the earlier approach.  Update the
comment to explain that the _bt_getstackbuf() page deletion call site
mirrors the only other remaining _bt_getstackbuf() call site, which is
reached during page splits.
2019-02-25 16:54:18 -08:00
Peter Eisentraut b6926dee0e psql: Remove obsolete code
The check in create_help.pl for a null end tag (</>) has been obsolete
since the conversion from SGML to XML, since XML does not allow that
anymore.
2019-02-25 12:00:29 +01:00
Peter Eisentraut bc09d5e4cc Remove unnecessary use of PROCEDURAL
Remove some unnecessary, legacy-looking use of the PROCEDURAL keyword
before LANGUAGE.  We mostly don't use this anymore, so some of these
look a bit old.

There is still some use in pg_dump, which is harder to remove because
it's baked into the archive format, so I'm not touching that.

Discussion: https://www.postgresql.org/message-id/2330919b-62d9-29ac-8de3-58c024fdcb96@2ndquadrant.com
2019-02-25 08:38:59 +01:00
Michael Paquier effe7d9552 Make release of 2PC identifier and locks consistent in COMMIT PREPARED
When preparing a transaction in two-phase commit, a dummy PGPROC entry
holding the GID used for the transaction is registered, which gets
released once COMMIT PREPARED is run.  Prior releasing its shared memory
state, all the locks taken in the prepared transaction are released
using a dedicated set of callbacks (pgstat and multixact having similar
callbacks), which may cause the locks to be released before the GID is
set free.

Hence, there is a small window where lock conflicts could happen, for
example:
- Transaction A releases its locks, still holding its GID in shared
memory.
- Transaction B held a lock which conflicted with locks of transaction
A.
- Transaction B continues its processing, reusing the same GID as
transaction A.
- Transaction B fails because of a conflicting GID, already in use by
transaction A.

This commit changes the shared memory state release so as post-commit
callbacks and predicate lock cleanup happen consistently with the shared
memory state cleanup for the dummy PGPROC entry.  The race window is
small and 2PC had this issue from the start, so no backpatch is done.
On top if that fixes discussed involved ABI breakages, which are not
welcome in stable branches.

Reported-by: Oleksii Kliukin, Ildar Musin
Diagnosed-by: Oleksii Kliukin, Ildar Musin
Author: Michael Paquier
Reviewed-by: Masahiko Sawada, Oleksii Kliukin
Discussion: https://postgr.es/m/BF9B38A4-2BFF-46E8-BA87-A2D00A8047A6@hintbits.com
2019-02-25 14:19:34 +09:00
Thomas Munro 29ddb548f6 Fix inconsistent out-of-memory error reporting in dsa.c.
Commit 16be2fd1 introduced the flag DSA_ALLOC_NO_OOM to control whether
the DSA allocator would raise an error or return InvalidDsaPointer on
failure to allocate.  One edge case was not handled correctly: if we
fail to allocate an internal "span" object for a large allocation, we
would always return InvalidDsaPointer regardless of the flag; a caller
not expecting that could then dereference a null pointer.

This is a plausible explanation for a one-off report of a segfault.

Remove a redundant pair of braces so that all three stanzas that handle
DSA_ALLOC_NO_OOM match in style, for visual consistency.

While fixing inconsistencies, if FreePageManagerGet() can't supply the
pages that our book-keeping says it should be able to supply, then we
should always report a FATAL error.  Previously we treated that as a
regular allocation failure in one code path, but as a FATAL condition
in another.

Back-patch to 10, where dsa.c landed.

Author: Thomas Munro
Reported-by: Jakub Glapa
Discussion: https://postgr.es/m/CAEepm=2oPqXxyWQ-1o60tpOLrwkw=VpgNXqqF1VN2EyO9zKGQw@mail.gmail.com
2019-02-25 11:11:40 +13:00