Commit Graph

19542 Commits

Author SHA1 Message Date
Alexander Korotkov da24961e9e Fix declarations of couple jsonpath functions
Make jsonb_path_query_array() and jsonb_path_query_first() use
PG_FUNCTION_ARGS macro instead of its expansion.
2019-05-19 07:45:42 +03:00
Tom Lane 93f03dad82 Make BufFileCreateTemp() ensure that temp tablespaces are set up.
If PrepareTempTablespaces() has never been called in the current
transaction, OpenTemporaryFile() will fall back to using the default
tablespace, which is a bug if the user wanted temp files placed elsewhere.
gistInitBuildBuffers() appears to have this disease already, and it
seems like an easy trap for future coders to fall into.

We discussed other ways to close this gap, but none of them are prettier
or more reliable than just having BufFileCreateTemp do it.  In particular,
having fd.c do this creates layering issues that we could do without.

Per suggestion from Melanie Plageman.  Arguably this is a bug fix, but
nobody seems very excited about back-patching, so change in HEAD only.

Discussion: https://postgr.es/m/CAAKRu_YwzjuGAmmaw4-8XO=OVFGR1QhY_Pq-t3wjb9ribBJb_Q@mail.gmail.com
2019-05-18 13:51:16 -04:00
Tom Lane d307954a7d "A void function may not return a value".
Per buildfarm.
2019-05-18 00:40:39 -04:00
Andres Freund 147e3722f7 tableam: Avoid relying on relation size to determine validity of tids.
Instead add a tableam callback to do so. To avoid adding per
validation overhead, pass a scan to tuple_tid_valid. In heap's case
we'd otherwise incurred a RelationGetNumberOfBlocks() call for each
tid - which'd have added noticable overhead to nodeTidscan.c.

Author: Andres Freund
Reviewed-By: Ashwin Agrawal
Discussion: https://postgr.es/m/20190515185447.gno2jtqxyktylyvs@alap3.anarazel.de
2019-05-17 18:56:55 -07:00
Andres Freund 7f44ede594 tableam: Don't assume that every AM uses md.c style storage.
Previously various parts of the code routed size requests through
RelationGetNumberOfBlocks[InFork]. That works if md.c is used by the
AM, but not otherwise.

Add a tableam callback to return the size of the table. As not every
AM will use postgres' BLCKSZ, have it return bytes, and have
RelationGetNumberOfBlocksInFork() round the byte size up into blocks.

To allow code outside of the AM to determine the actual relation size
map InvalidForkNumber the total size of a relation, as not every AM
might just need the postgres defined forks.

A few users of RelationGetNumberOfBlocks() ought to be converted away
from that. One case, the use of it to determine whether a tid is
valid, will be fixed in a follow up commit. Others will have to wait
for v13.

Author: Andres Freund
Discussion: https://postgr.es/m/20190423225201.3bbv6tbqzkb5w7cw@alap3.anarazel.de
2019-05-17 18:56:47 -07:00
Tom Lane 6630ccad7a Restructure creation of run-time pruning steps.
Previously, gen_partprune_steps() always built executor pruning steps
using all suitable clauses, including those containing PARAM_EXEC
Params.  This meant that the pruning steps were only completely safe
for executor run-time (scan start) pruning.  To prune at executor
startup, we had to ignore the steps involving exec Params.  But this
doesn't really work in general, since there may be logic changes
needed as well --- for example, pruning according to the last operator's
btree strategy is the wrong thing if we're not applying that operator.
The rules embodied in gen_partprune_steps() and its minions are
sufficiently complicated that tracking their incremental effects in
other logic seems quite impractical.

Short of a complete redesign, the only safe fix seems to be to run
gen_partprune_steps() twice, once to create executor startup pruning
steps and then again for run-time pruning steps.  We can save a few
cycles however by noting during the first scan whether we rejected
any clauses because they involved exec Params --- if not, we don't
need to do the second scan.

In support of this, refactor the internal APIs in partprune.c to make
more use of passing information in the GeneratePruningStepsContext
struct, rather than as separate arguments.

This is, I hope, the last piece of our response to a bug report from
Alan Jackson.  Back-patch to v11 where this code came in.

Discussion: https://postgr.es/m/FAD28A83-AC73-489E-A058-2681FA31D648@tvsquared.com
2019-05-17 19:44:34 -04:00
Alvaro Herrera 75445c1515 More message style fixes
Discussion: https://postgr.es/m/20190515183005.GA26486@alvherre.pgsql
2019-05-16 19:14:31 -04:00
Peter Geoghegan 3f58cc6dd8 Remove extra nbtree half-dead internal page check.
It's not safe for nbtree VACUUM to attempt to delete a target page whose
right sibling is already half-dead, since that would fail the
cross-check when VACUUM attempts to re-find a downlink to the right
sibling in the parent page.  Logic to prevent this from happening was
added by commit 8da3183780, which addressed a bug in the overhaul of
page deletion that went into PostgreSQL 9.4 (commit efada2b8e9).
VACUUM was made to check the right sibling page, and back off when it
happened to be half-dead already.

However, it is only truly necessary to do the right sibling check on the
leaf level, since that transitively determines if the deletion target's
parent's right sibling page is itself undergoing deletion.  Remove the
internal page level check, and add a comment explaining why the leaf
level check alone suffices.

The extra check is also unnecessary due to the fact that internal pages
that are marked half-dead are generally considered corrupt.  Commit
efada2b8e9 established the principle that there should never be
half-dead internal pages (internal pages pending deletion are possible,
but that status is never directly represented in the internal page).
VACUUM will complain about corruption when it encounters half-dead
internal pages, so VACUUM is bound to raise an error one way or another
when an nbtree index has a half-dead internal page (contrib/amcheck will
also report that the page is corrupt).

It's possible that a pg_upgrade'd 9.3 database will still have half-dead
internal pages, so it may seem like there is an argument for leaving the
check in place to reliably get a cleaner error message that advises the
user to REINDEX.  However, leaf pages are also deleted in the first
phase of deletion prior to PostgreSQL 9.4, so I believe we won't even
attempt to re-find the parent page anyway (we won't have the fully
deleted leaf page as the right sibling of our target page, so we won't
even try to find a downlink for it).

Discussion: https://postgr.es/m/CAH2-Wzm_ntmqJjWLRyKzimFmFvk+BnVAvUpaA4s1h9Ja58woaQ@mail.gmail.com
2019-05-16 15:11:58 -07:00
Tom Lane 3922f10646 Fix bogus logic for combining range-partitioned columns during pruning.
gen_prune_steps_from_opexps's notion of how to do this was overly
complicated and underly correct.

Per discussion of a report from Alan Jackson (though this fixes only one
aspect of that problem).  Back-patch to v11 where this code came in.

Amit Langote

Discussion: https://postgr.es/m/FAD28A83-AC73-489E-A058-2681FA31D648@tvsquared.com
2019-05-16 16:25:43 -04:00
Tom Lane 4b1fcb43d0 Fix partition pruning to treat stable comparison operators properly.
Cross-type comparison operators in a btree or hash opclass might be
only stable not immutable (this is true of timestamp vs. timestamptz
for example).  partprune.c ignored this possibility and would perform
plan-time pruning with them anyway, possibly leading to wrong answers
if the environment changed between planning and execution.

To fix, teach gen_partprune_steps() to do things differently when
creating plan-time pruning steps vs. run-time pruning steps.
analyze_partkey_exprs() also needs an extra check, which is rather
annoying but now is not the time to restructure things enough to
avoid that.

While at it, simplify the logic for the plan-time case a little
by insisting that the comparison value be a Const and nothing else.
This relies on the assumption that eval_const_expressions will have
reduced any immutable expression to a Const; which is not quite
100% true, but certainly any case that comes up often enough to be
interesting should have simplification logic there.

Also improve a bunch of inadequate/obsolete/wrong comments.

Per discussion of a report from Alan Jackson (though this fixes only one
aspect of that problem).  Back-patch to v11 where this code came in.

David Rowley, with some further hacking by me

Discussion: https://postgr.es/m/FAD28A83-AC73-489E-A058-2681FA31D648@tvsquared.com
2019-05-16 11:58:21 -04:00
Peter Geoghegan 489e431ba5 Remove obsolete nbtree insertion comment.
Remove a Berkeley-era comment above _bt_insertonpg() that admonishes the
reader to grok Lehman and Yao's paper before making any changes.  This
made a certain amount of sense back when _bt_insertonpg() was
responsible for most of the things that are now spread across
_bt_insertonpg(), _bt_findinsertloc(), _bt_insert_parent(), and
_bt_split(), but it doesn't work like that anymore.

I believe that this comment alludes to the need to "couple" or "crab"
buffer locks as we ascend the tree as page splits cascade upwards.  The
nbtree README already explains this in detail, which seems sufficient.
Besides, the changes to page splits made by commit 40dae7ec53 altered
the exact details of how buffer locks are retained during splits; Lehman
and Yao's original algorithm seems to release the lock on the left child
page/buffer slightly earlier than _bt_insertonpg()/_bt_insert_parent()
can.
2019-05-15 16:53:11 -07:00
Peter Geoghegan 7505da2f45 Reverse order of newitem nbtree candidate splits.
Commit fab25024, which taught nbtree to choose candidate split points
more carefully, had _bt_findsplitloc() record all possible split points
in an initial pass over a page that is about to be split.  The order
that candidate split points were processed and stored in was assumed to
match the offset number order of split points on an imaginary version of
the page that contains the same items as the original, but also fits
newitem (the item that provoked the split precisely because it didn't
fit).

However, the order of split points in the final array was not quite what
was expected: the split point that makes newitem the firstright item
came after the split point that makes newitem the lastleft item -- not
before.  As a result, _bt_findsplitloc() could get confused about the
leftmost and rightmost tuples among all possible split points recorded
for the page.  This seems to have no appreciable impact on the quality
of the final split point chosen by _bt_findsplitloc(), but it's still
wrong.

To fix, switch the order in which newitem candidate splits are recorded
in.  This also makes it possible to describe candidate split points in
terms of which pair of adjoining tuples enclose the split point within
_bt_findsplitloc(), making it clearer why it's generally safe for
_bt_split() to expect lastleft and firstright tuples.
2019-05-15 12:22:07 -07:00
Andres Freund aa4b8c61d2 Handle table_complete_speculative's succeeded argument as documented.
For some reason both callsite and the implementation for heapam had
the meaning inverted (i.e. succeeded == true was passed in case of
conflict). That's confusing.

I (Andres) briefly pondered whether it'd be better to rename
table_complete_speculative's argument to 'bool specConflict' or such,
but decided not to. The 'complete' in the function name for me makes
`succeeded` sound a bit better.

Reported-By: Ashwin Agrawal, Melanie Plageman, Heikki Linnakangas
Discussion:
   https://postgr.es/m/CALfoeitk7-TACwYv3hCw45FNPjkA86RfXg4iQ5kAOPhR+F1Y4w@mail.gmail.com
   https://postgr.es/m/97673451-339f-b21e-a781-998d06b1067c@iki.fi
2019-05-14 12:19:32 -07:00
Tom Lane 7c850320d8 Fix SQL-style substring() to have spec-compliant greediness behavior.
SQL's regular-expression substring() function is defined to have a
pattern argument that's separated into three subpatterns by escape-
double-quote markers; the function result is the part of the input
matching the second subpattern.  The standard makes it clear that
if there is ambiguity about how to match the input to the subpatterns,
the first and third subpatterns should be taken to match the smallest
possible amount of text (i.e., they're "non greedy", in the terms of
our regex code).  We were not doing it that way: the first subpattern
would eat the largest possible amount of text, causing the function
result to be shorter than what the spec requires.

Fix that by attaching explicit greediness quantifiers to the
subpatterns.  (This depends on the regex fix in commit 8a29ed053;
before that, this didn't reliably change the regex engine's behavior.)

Also, by adding parentheses around each subpattern, we ensure that
"|" (OR) in the subpatterns behave sanely.  Previously, "|" in the
first or third subpatterns didn't work.

This patch also makes the function throw error if you write more than
two escape-double-quote markers, and do something sane if you write
just one, and document that behavior.  Previously, an odd number of
markers led to a confusing complaint about unbalanced parentheses,
while extra pairs of markers were just ignored.  (Note that the spec
requires exactly two markers, but we've historically allowed there
to be none, and this patch preserves the old behavior for that case.)

In passing, adjust some substring() test cases that didn't really
prove what they said they were testing for: they used patterns
that didn't match the data string, so that the output would be
NULL whether or not the function was really strict.

Although this is certainly a bug fix, changing the behavior in back
branches seems undesirable: applications could perhaps be depending on
the old behavior, since it's not obviously wrong unless you read the
spec very closely.  Hence, no back-patch.

Discussion: https://postgr.es/m/5bb27a41-350d-37bf-901e-9d26f5592dd0@charter.net
2019-05-14 11:27:31 -04:00
Tom Lane fb489e4b31 In bootstrap mode, use default signal handling for SIGINT etc.
Previously, the code pointed the standard process-termination signals
to postgres.c's die().  That would typically result in an attempt to
execute a transaction abort, which is not possible in bootstrap mode,
leading to PANIC.  This choice seems to be a leftover from an old code
structure in which the same signal-assignment code was used for many
sorts of auxiliary processes, including interactive standalone
backends.  It's not very sensible for bootstrap mode, which has no
interest in either interactivity or continuing after an error.  We can
get better behavior with less effort by just letting normal process
termination happen, after which the parent initdb process will clean up.

This is basically cosmetic in any case, since initdb will react the
same way whether bootstrap dies on a signal or abort().  Given the
lack of previous complaints, I don't feel a need to back-patch,
even though the behavior is old.

Discussion: https://postgr.es/m/3850b11a.5121.16aaf827e4a.Coremail.thunder1@126.com
2019-05-14 10:22:28 -04:00
Peter Eisentraut 037165ca95 Update SQL features/conformance information to SQL:2016 2019-05-14 15:44:37 +02:00
Peter Eisentraut eb3a1376c9 Update information_schema for SQL:2016
This is mainly a light renumbering to match the sections in the
standard.
2019-05-14 15:44:37 +02:00
Heikki Linnakangas 22251686f0 Detect internal GiST page splits correctly during index build.
As we descend the GiST tree during insertion, we modify any downlinks on
the way down to include the new tuple we're about to insert (if they don't
cover it already). Modifying an existing downlink might cause an internal
page to split, if the new downlink tuple is larger than the old one. If
that happens, we need to back up to the parent and re-choose a page to
insert to. We used to detect that situation, thanks to the NSN-LSN
interlock normally used to detect concurrent page splits, but that got
broken by commit 9155580fd5. With that commit, we now use a dummy constant
LSN value for every page during index build, so the LSN-NSN interlock no
longer works. I thought that was OK because there can't be any other
backends modifying the index during index build, but missed that the
insertion itself can modify the page we're inserting to. The consequence
was that we would sometimes insert the new tuple to an incorrect page, one
whose downlink doesn't cover the new tuple.

To fix, add a flag to the stack that keeps track of the state while
descending tree, to indicate that a page was split, and that we need to
retry the descend from the parent.

Thomas Munro first reported that the contrib/intarray regression test was
failing occasionally on the buildfarm after commit 9155580fd5. The failure
was intermittent, because the gistchoose() function is not deterministic,
and would only occasionally create the right circumstances for this bug to
cause the failure.

Patch by Anastasia Lubennikova, with some changes by me to make it work
correctly also when the internal page split also causes the "grandparent"
to be split.

Discussion: https://www.postgresql.org/message-id/CA%2BhUKGJRzLo7tZExWfSbwM3XuK7aAK7FhdBV0FLkbUG%2BW0v0zg%40mail.gmail.com
2019-05-14 13:18:44 +03:00
Heikki Linnakangas e95d550bbb Fix comment on when HOT update is possible.
The conditions listed in this comment have changed several times, and at
some point the thing that the "if so" referred to was negated.

The text was OK up to 9.6. It was differently wrong in v10, v11 and
master, so fix in all those versions.
2019-05-14 13:06:28 +03:00
Etsuro Fujita 7d9eca59cf Fix typo. 2019-05-14 16:05:37 +09:00
Michael Paquier 7e19929ea2 Fix duplicated words in comments
Author: Stephen Amell
Discussion: https://postgr.es/m/539fa271-21b3-777e-a468-d96cffe9c768@gmail.com
2019-05-14 09:37:35 +09:00
Peter Geoghegan ae7291acbc Standardize ItemIdData terminology.
The term "item pointer" should not be used to refer to ItemIdData
variables, since that is needlessly ambiguous.  Only
ItemPointerData/ItemPointer variables should be called item pointers.

To fix, establish the convention that ItemIdData variables should always
be referred to either as "item identifiers" or "line pointers".  The
term "item identifier" already predominates in docs and translatable
messages, and so should be the preferred alternative there.

Discussion: https://postgr.es/m/CAH2-Wz=c=MZQjUzde3o9+2PLAPuHTpVZPPdYxN=E4ndQ2--8ew@mail.gmail.com
2019-05-13 15:53:39 -07:00
Tom Lane 32ebb35128 Fix logical replication's ideas about which type OIDs are built-in.
Only hand-assigned type OIDs should be presumed to match across different
PG servers; those assigned during genbki.pl or during initdb are likely
to change due to addition or removal of unrelated objects.

This means that the cutoff should be FirstGenbkiObjectId (in HEAD)
or FirstBootstrapObjectId (before that), not FirstNormalObjectId.
Compare postgres_fdw's is_builtin() test.

It's likely that this error has no observable consequence in a
normally-functioning system, since ATM the only affected type OIDs are
system catalog rowtypes and information_schema types, which would not
typically be interesting for logical replication.  But you could
probably break it if you tried hard, so back-patch.

Discussion: https://postgr.es/m/15150.1557257111@sss.pgh.pa.us
2019-05-13 17:23:00 -04:00
Tom Lane e34ee993fb Improve commentary about hack in is_publishable_class().
The FirstNormalObjectId test here is a kluge that needs to go away,
but the only substitute we can think of is to add a column to pg_class,
which will take more work than can be handled right now.  Add some
commentary in the meanwhile.

Discussion: https://postgr.es/m/15150.1557257111@sss.pgh.pa.us
2019-05-13 17:05:48 -04:00
Peter Geoghegan 9b42e71376 Don't leave behind junk nbtree pages during split.
Commit 8fa30f906b reduced the elevel of a number of "can't happen"
_bt_split() errors from PANIC to ERROR.  At the same time, the new right
page buffer for the split could continue to be acquired well before the
critical section.  This was possible because it was relatively
straightforward to make sure that _bt_split() could not throw an error,
with a few specific exceptions.  The exceptional cases were safe because
they involved specific, well understood errors, making it possible to
consistently zero the right page before actually raising an error using
elog().  There was no danger of leaving around a junk page, provided
_bt_split() stuck to this coding rule.

Commit 8224de4f, which introduced INCLUDE indexes, added code to make
_bt_split() truncate away non-key attributes.  This happened at a point
that broke the rule around zeroing the right page in _bt_split().  If
truncation failed (perhaps due to palloc() failure), that would result
in an errant right page buffer with junk contents.  This could confuse
VACUUM when it attempted to delete the page, and should be avoided on
general principle.

To fix, reorganize _bt_split() so that truncation occurs before the new
right page buffer is even acquired.  A junk page/buffer will not be left
behind if _bt_nonkey_truncate()/_bt_truncate() raise an error.

Discussion: https://postgr.es/m/CAH2-WzkcWT_-NH7EeL=Az4efg0KCV+wArygW8zKB=+HoP=VWMw@mail.gmail.com
Backpatch: 11-, where INCLUDE indexes were introduced.
2019-05-13 10:27:59 -07:00
Michael Paquier 1171dbde2d Fix incorrect return value in JSON equality function for scalars
equalsJsonbScalarValue() uses a boolean as return type, however for one
code path -1 gets returned, which is confusing.  The origin of the
confusion is visibly that this code got copy-pasted from
compareJsonbScalarValue() since it has been introduced in d1d50bf.

No backpatch, as this is only cosmetic.

Author: Rikard Falkeborn
Discussion: https://postgr.es/m/CADRDgG7mJnek6HNW13f+LF6V=6gag9PM+P7H5dnyWZAv49aBGg@mail.gmail.com
2019-05-13 09:11:50 +09:00
Tom Lane 8a29ed0530 Fix misoptimization of "{1,1}" quantifiers in regular expressions.
A bounded quantifier with m = n = 1 might be thought a no-op.  But
according to our documentation (which traces back to Henry Spencer's
original man page) it still imposes greediness, or non-greediness in the
case of the non-greedy variant "{1,1}?", on whatever it's attached to.

This turns out not to work though, because parseqatom() optimizes away
the m = n = 1 case without regard for whether it's supposed to change
the greediness of the argument RE.

We can fix this by just not applying the optimization when the greediness
needs to change; the subsequent general cases handle it fine.

The three cases in which we can still apply the optimization are
(a) no quantifier, or quantifier does not impose a preference;
(b) atom has no greediness property, implying it cannot match a
variable amount of text anyway; or
(c) quantifier's greediness is same as atom's.
Note that in most cases where one of these applies, we'd have exited
earlier in the "not a messy case" fast path.  I think it's now only
possible to get to the optimization when the atom involves capturing
parentheses or a non-top-level backref.

Back-patch to all supported branches.  I'd ordinarily be hesitant to
put a subtle behavioral change into back branches, but in this case
it's very hard to see a reason why somebody would write "{1,1}?" unless
they're trying to get the documented change-of-greediness behavior.

Discussion: https://postgr.es/m/5bb27a41-350d-37bf-901e-9d26f5592dd0@charter.net
2019-05-12 18:53:38 -04:00
Noah Misch d02768ddd1 Fail pgwin32_message_to_UTF16() for SQL_ASCII messages.
The function had been interpreting SQL_ASCII messages as UTF8, throwing
an error when they were invalid UTF8.  The new behavior is consistent
with pg_do_encoding_conversion().  This affects LOG_DESTINATION_STDERR
and LOG_DESTINATION_EVENTLOG, which will send untranslated bytes to
write() and ReportEventA().  On buildfarm member bowerbird, enabling
log_connections caused an error whenever the role name was not valid
UTF8.  Back-patch to 9.4 (all supported versions).

Discussion: https://postgr.es/m/20190512015615.GD1124997@rfd.leadboat.com
2019-05-12 10:33:05 -07:00
Tom Lane 85ccb6899c Rearrange pgstat_bestart() to avoid failures within its critical section.
We long ago decided to design the shared PgBackendStatus data structure to
minimize the cost of writing status updates, which means that writers just
have to increment the st_changecount field twice.  That isn't hooked into
any sort of resource management mechanism, which means that if something
were to throw error between the two increments, the st_changecount field
would be left odd indefinitely.  That would cause readers to lock up.
Now, since it's also a bad idea to leave the field odd for longer than
absolutely necessary (because readers will spin while we have it set),
the expectation was that we'd treat these segments like spinlock critical
sections, with only short, more or less straight-line, code in them.

That was fine as originally designed, but commit 9029f4b37 broke it
by inserting a significant amount of non-straight-line code into
pgstat_bestart(), code that is very capable of throwing errors, not to
mention taking a significant amount of time during which readers will spin.
We have a report from Neeraj Kumar of readers actually locking up, which
I suspect was due to an encoding conversion error in X509_NAME_to_cstring,
though conceivably it was just a garden-variety OOM failure.

Subsequent commits have loaded even more dubious code into pgstat_bestart's
critical section (and commit fc70a4b0d deserves some kind of booby prize
for managing to miss the critical section entirely, although the negative
consequences seem minimal given that the PgBackendStatus entry should be
seen by readers as inactive at that point).

The right way to fix this mess seems to be to compute all these values
into a local copy of the process' PgBackendStatus struct, and then just
copy the data back within the critical section proper.  This plan can't
be implemented completely cleanly because of the struct's heavy reliance
on out-of-line strings, which we must initialize separately within the
critical section.  But still, the critical section is far smaller and
safer than it was before.

In hopes of forestalling future errors of the same ilk, rename the
macros for st_changecount management to make it more apparent that
the writer-side macros create a critical section.  And to prevent
the worst consequences if we nonetheless manage to mess it up anyway,
adjust those macros so that they really are a critical section, ie
they now bump CritSectionCount.  That doesn't add much overhead, and
it guarantees that if we do somehow throw an error while the counter
is odd, it will lead to PANIC and a database restart to reset shared
memory.

Back-patch to 9.5 where the problem was introduced.

In HEAD, also fix an oversight in commit b0b39f72b: it failed to teach
pgstat_read_current_status to copy st_gssstatus data from shared memory to
local memory.  Hence, subsequent use of that data within the transaction
would potentially see changing data that it shouldn't see.

Discussion: https://postgr.es/m/CAPR3Wj5Z17=+eeyrn_ZDG3NQGYgMEOY6JV6Y-WRRhGgwc16U3Q@mail.gmail.com
2019-05-11 21:27:29 -04:00
Tom Lane 610747d86e Cope with EINVAL and EIDRM shmat() failures in PGSharedMemoryAttach.
There's a very old race condition in our code to see whether a pre-existing
shared memory segment is still in use by a conflicting postmaster: it's
possible for the other postmaster to remove the segment in between our
shmctl() and shmat() calls.  It's a narrow window, and there's no risk
unless both postmasters are using the same port number, but that's possible
during parallelized "make check" tests.  (Note that while the TAP tests
take some pains to choose a randomized port number, pg_regress doesn't.)
If it does happen, we treated that as an unexpected case and errored out.

To fix, allow EINVAL to be treated as segment-not-present, and the same
for EIDRM on Linux.  AFAICS, the considerations here are basically
identical to the checks for acceptable shmctl() failures, so I documented
and coded it that way.

While at it, adjust PGSharedMemoryAttach's API to remove its undocumented
dependency on UsedShmemSegAddr in favor of passing the attach address
explicitly.  This makes it easier to be sure we're using a null shmaddr
when probing for segment conflicts (thus avoiding questions about what
EINVAL means).  I don't think there was a bug there, but it required
fragile assumptions about the state of UsedShmemSegAddr during
PGSharedMemoryIsInUse.

Commit c09850992 may have made this failure more probable by applying
the conflicting-segment tests more often.  Hence, back-patch to all
supported branches, as that was.

Discussion: https://postgr.es/m/22224.1557340366@sss.pgh.pa.us
2019-05-10 14:56:41 -04:00
Michael Paquier 508300e2e1 Improve and fix some error handling for REINDEX INDEX/TABLE CONCURRENTLY
This improves the user experience when it comes to restrict several
flavors of REINDEX CONCURRENTLY.  First, for INDEX, remove a restriction
on shared relations as we already check after catalog relations.  Then,
for TABLE, add a proper error message when attempting to run the command
on system catalogs.  The code path of CREATE INDEX CONCURRENTLY already
complains about that, but if a REINDEX is issued then then the error
generated is confusing.

While on it, add more tests to check restrictions on catalog indexes and
on toast table/index for catalogs.  Some error messages are improved,
with wording suggestion coming from Tom Lane.

Reported-by: Tom Lane
Author: Michael Paquier
Reviewed-by: Tom Lane
Discussion: https://postgr.es/m/23694.1556806002@sss.pgh.pa.us
2019-05-10 08:18:46 +09:00
Tom Lane 24c19e9f66 Repair issues with faulty generation of merge-append plans.
create_merge_append_plan failed to honor the CP_EXACT_TLIST flag:
it would generate the expected targetlist but then it felt free to
add resjunk sort targets to it.  This demonstrably leads to assertion
failures in v11 and HEAD, and it's probably just accidental that we
don't see the same in older branches.  I've not looked into whether
there would be any real-world consequences in non-assert builds.
In HEAD, create_append_plan has sprouted the same problem, so fix
that too (although we do not have any test cases that seem able to
reach that bug).  This is an oversight in commit 3fc6e2d7f which
invented the CP_EXACT_TLIST flag, so back-patch to 9.6 where that
came in.

convert_subquery_pathkeys would create pathkeys for subquery output
values if they match any EquivalenceClass known in the outer query
and are available in the subquery's syntactic targetlist.  However,
the second part of that condition is wrong, because such values might
not appear in the subquery relation's reltarget list, which would
mean that they couldn't be accessed above the level of the subquery
scan.  We must check that they appear in the reltarget list, instead.
This can lead to dropping knowledge about the subquery's sort
ordering, but I believe it's okay, because any sort key that the
outer query actually has any interest in would appear in the
reltarget list.

This second issue is of very long standing, but right now there's no
evidence that it causes observable problems before 9.6, so I refrained
from back-patching further than that.  We can revisit that choice if
somebody finds a way to make it cause problems in older branches.
(Developing useful test cases for these issues is really problematic;
fixing convert_subquery_pathkeys removes the only known way to exhibit
the create_merge_append_plan bug, and neither of the test cases added
by this patch causes a problem in all branches, even when considering
the issues separately.)

The second issue explains bug #15795 from Suresh Kumar R ("could not
find pathkey item to sort" with nested DISTINCT queries).  I stumbled
across the first issue while investigating that.

Discussion: https://postgr.es/m/15795-fadb56c8e44ee73c@postgresql.org
2019-05-09 16:53:05 -04:00
Etsuro Fujita edbcbe277d postgres_fdw: Fix cost estimation for aggregate pushdown.
In commit 7012b132d0, which added support for aggregate pushdown in
postgres_fdw, the expense of evaluating the final scan/join target
computed by make_group_input_target() was not accounted for at all in
costing aggregate pushdown paths with local statistics.  The right fix
for this would be to have a separate upper stage to adjust the final
scan/join relation (see comments for apply_scanjoin_target_to_paths());
but for now, fix by adding the tlist eval cost when costing aggregate
pushdown paths with local statistics.

Apply this to HEAD only to avoid destabilizing existing plan choices.

Author: Etsuro Fujita
Reviewed-By: Antonin Houska
Discussion: https://postgr.es/m/5C66A056.60007%40lab.ntt.co.jp
2019-05-09 18:39:23 +09:00
Thomas Munro 47a338cfcd Fix SxactGlobalXmin tracking.
Commit bb16aba50 broke the code that maintains SxactGlobalXmin.  It
could get stuck when a well-timed READ ONLY transaction runs.  If
SxactGlobalXmin stops advancing, transactions on the
FinishedSerializableTransactions queue are never cleaned up, so
resources are effectively leaked.  Revert that hunk of the commit.

Also revert another similar hunk that was probably harmless, but
unnecessary and unjustified, relating to the DOOMED flag in case of
RO_SAFE early release.

Author: Thomas Munro
Reported-by: Tom Lane
Discussion: https://postgr.es/m/16170.1557251214%40sss.pgh.pa.us
2019-05-09 20:32:26 +12:00
Tom Lane 2d7d946cd3 Clean up the behavior and API of catalog.c's is-catalog-relation tests.
The right way for IsCatalogRelation/Class to behave is to return true
for OIDs less than FirstBootstrapObjectId (not FirstNormalObjectId),
without any of the ad-hoc fooling around with schema membership.

The previous code was wrong because (1) it claimed that
information_schema tables were not catalog relations but their toast
tables were, which is silly; and (2) if you dropped and recreated
information_schema, which is a supported operation, the behavior
changed.  That's even sillier.  With this definition, "catalog
relations" are exactly the ones traceable to the postgres.bki data,
which seems like what we want.

With this simplification, we don't actually need access to the pg_class
tuple to identify a catalog relation; we only need its OID.  Hence,
replace IsCatalogClass with "IsCatalogRelationOid(oid)".  But keep
IsCatalogRelation as a convenience function.

This allows fixing some arguably-wrong semantics in contrib/sepgsql and
ReindexRelationConcurrently, which were using an IsSystemNamespace test
where what they really should be using is IsCatalogRelationOid.  The
previous coding failed to protect toast tables of system catalogs, and
also was not on board with the general principle that user-created tables
do not become catalogs just by virtue of being renamed into pg_catalog.
We can also get rid of a messy hack in ReindexMultipleTables.

While we're at it, also rename IsSystemNamespace to IsCatalogNamespace,
because the previous name invited confusion with the more expansive
semantics used by IsSystemRelation/Class.

Also improve the comments in catalog.c.

There are a few remaining places in replication-related code that are
special-casing OIDs below FirstNormalObjectId.  I'm inclined to think
those are wrong too, and if there should be any special case it should
just extend to FirstBootstrapObjectId.  But first we need to debate
whether a FOR ALL TABLES publication should include information_schema.

Discussion: https://postgr.es/m/21697.1557092753@sss.pgh.pa.us
Discussion: https://postgr.es/m/15150.1557257111@sss.pgh.pa.us
2019-05-08 23:27:38 -04:00
Peter Geoghegan d95e36dc38 Remove obsolete nbtree split REDO routine comment.
Commit dd299df818, which added suffix truncation to nbtree, simplified
the WAL record format used by page splits.  It became necessary to
explicitly WAL-log the new high key for the left half of a split in all
cases, which relieved the REDO routine from having to reconstruct a new
high key for the left page by copying the first item from the right
page.  Remove a comment that referred to the previous practice.
2019-05-08 12:47:20 -07:00
Alvaro Herrera 61639816b8 Fix error messages
Some messages related to foreign servers were reporting the server name
without quotes, or not at all; our style is to have all names be quoted,
and the server name already appears quoted in a few other messages, so
just add quotes and make them all consistent.

Remove an extra "s" in other messages (typos introduced by myself in
f56f8f8da6).
2019-05-08 13:20:16 -04:00
Peter Eisentraut add85ead4a Fix table lock levels for REINDEX INDEX CONCURRENTLY
REINDEX CONCURRENTLY locks tables with ShareUpdateExclusiveLock rather
than the ShareLock used by a plain REINDEX.  However,
RangeVarCallbackForReindexIndex() was not updated for that and still
used the ShareLock only.  This would lead to lock upgrades later,
leading to possible deadlocks.

Reported-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://www.postgresql.org/message-id/flat/20190430151735.wi52sxjvxsjvaxxt%40alap3.anarazel.de
2019-05-08 14:30:23 +02:00
Heikki Linnakangas 256be1050c Remove leftover reference to old "flat file" mechanism in a comment.
The flat file mechanism was removed in PostgreSQL 9.0.
2019-05-08 09:32:34 +03:00
Peter Geoghegan d65b5ccad6 Correct obsolete nbtsort.c minimum key comment.
It is no longer possible under any circumstances for nbtree code to
reconstruct a strict lower bound key (parent page's pivot tuple key) for
a right sibling page by retrieving the first item in the right sibling
page.
2019-05-07 21:42:12 -07:00
Alexander Korotkov 29ceacc3f9 Improve error reporting in jsonpath
This commit contains multiple improvements to error reporting in jsonpath
including but not limited to getting rid of following things:

 * definition of error messages in macros,
 * errdetail() when valueable information could fit to errmsg(),
 * word "singleton" which is not properly explained anywhere,
 * line breaks in error messages.

Reported-by: Tom Lane
Discussion: https://postgr.es/m/14890.1555523005%40sss.pgh.pa.us
Author: Alexander Korotkov
Reviewed-by: Tom Lane
2019-05-08 01:02:59 +03:00
Fujii Masao b84dbc8eb8 Add TRUNCATE parameter to VACUUM.
This commit adds new parameter to VACUUM command, TRUNCATE,
which specifies that VACUUM should attempt to truncate off
any empty pages at the end of the table and allow the disk space
for the truncated pages to be returned to the operating system.

This parameter, if specified, overrides the vacuum_truncate
reloption. If neither the reloption nor the VACUUM option is
used, the default is true, as before.

Author: Fujii Masao
Reviewed-by: Julien Rouhaud, Masahiko Sawada
Discussion: https://postgr.es/m/CAD21AoD+qtrSDL=GSma4Wd3kLYLeRC0hPna-YAdkDeV4z156vg@mail.gmail.com
2019-05-08 02:10:33 +09:00
Magnus Hagander 98719af6c2 Fix typos and clarify a comment
Author: Daniel Gustafsson <daniel@yesql.se>
2019-05-07 18:26:09 +02:00
Tom Lane 8d0ddccec6 Avoid "invalid memory alloc request size" while reading pg_stat_activity.
On a 64-bit machine, if you set track_activity_query_size and
max_connections such that their product exceeds 1GB, shared memory
setup will still succeed (given enough RAM), but attempts to read
pg_stat_activity fail with "invalid memory alloc request size".
Work around that by using MemoryContextAllocHuge to allocate the
local copy of the activity strings.  Using the "huge" API costs us
nothing extra in normal cases, and it seems better than throwing
an error and/or explaining to people why they can't do this.

This situation seems insanely profligate today, but who knows what
people will consider normal in ten or twenty years?  So let's fix it
in HEAD but not worry about a back-patch.

Per report from James Tomson.

Discussion: https://postgr.es/m/1CFDCCD6-B268-48D8-85C8-400D2790B2C3@pushd.com
2019-05-07 11:41:37 -04:00
Amit Kapila 7db0cde6b5 Revert "Avoid the creation of the free space map for small heap relations".
This feature was using a process local map to track the first few blocks
in the relation.  The map was reset each time we get the block with enough
freespace.  It was discussed that it would be better to track this map on
a per-relation basis in relcache and then invalidate the same whenever
vacuum frees up some space in the page or when FSM is created.  The new
design would be better both in terms of API design and performance.

List of commits reverted, in reverse chronological order:

06c8a5090e  Improve code comments in b0eaa4c51b.
13e8643bfc  During pg_upgrade, conditionally skip transfer of FSMs.
6f918159a9  Add more tests for FSM.
9c32e4c350  Clear the local map when not used.
29d108cdec  Update the documentation for FSM behavior..
08ecdfe7e5  Make FSM test portable.
b0eaa4c51b  Avoid creation of the free space map for small heap relations.

Discussion: https://postgr.es/m/20190416180452.3pm6uegx54iitbt5@alap3.anarazel.de
2019-05-07 09:30:24 +05:30
Dean Rasheed a0905056fd Use checkAsUser for selectivity estimator checks, if it's set.
In examine_variable() and examine_simple_variable(), when checking the
user's table and column privileges to determine whether to grant
access to the pg_statistic data, use checkAsUser for the privilege
checks, if it's set. This will be the case if we're accessing the
table via a view, to indicate that we should perform privilege checks
as the view owner rather than the current user.

This change makes this planner check consistent with the check in the
executor, so the planner will be able to make use of statistics if the
table is accessible via the view. This fixes a performance regression
introduced by commit e2d4ef8de8, which affects queries against
non-security barrier views in the case where the user doesn't have
privileges on the underlying table, but the view owner does.

Note that it continues to provide the same safeguards controlling
access to pg_statistic for direct table access (in which case
checkAsUser won't be set) and for security barrier views, because of
the nearby checks on rte->security_barrier and rte->securityQuals.

Back-patch to all supported branches because e2d4ef8de8 was.

Dean Rasheed, reviewed by Jonathan Katz and Stephen Frost.
2019-05-06 11:54:32 +01:00
Dean Rasheed 1aebfbea83 Fix security checks for selectivity estimation functions with RLS.
In commit e2d4ef8de8, security checks were added to prevent
user-supplied operators from running over data from pg_statistic
unless the user has table or column privileges on the table, or the
operator is leakproof. For a table with RLS, however, checking for
table or column privileges is insufficient, since that does not
guarantee that the user has permission to view all of the column's
data.

Fix this by also checking for securityQuals on the RTE, and insisting
that the operator be leakproof if there are any. Thus the
leakproofness check will only be skipped if there are no securityQuals
and the user has table or column privileges on the table -- i.e., only
if we know that the user has access to all the data in the column.

Back-patch to 9.5 where RLS was added.

Dean Rasheed, reviewed by Jonathan Katz and Stephen Frost.

Security: CVE-2019-10130
2019-05-06 11:38:43 +01:00
Tom Lane bd5e8b627b Bring pg_nextoid()'s error messages into line with message style guide.
Noticed while reviewing nearby code.  Given all the disclaimers about
this not being meant as user-facing code, I wonder whether we should
make these non-translatable?  But in any case there's little excuse
for them not to be good English.
2019-05-05 17:06:53 -04:00
Tom Lane 9691aa72e2 Fix style violations in syscache lookups.
Project style is to check the success of SearchSysCacheN and friends
by applying HeapTupleIsValid to the result.  A tiny minority of calls
creatively did it differently.  Bring them into line with the rest.

This is just cosmetic, since HeapTupleIsValid is indeed just a null
check at the moment ... but that may not be true forever, and in any
case it puts a mental burden on readers who may wonder why these
call sites are not like the rest.

Back-patch to v11 just to keep the branches in sync.  (The bulk of these
errors seem to have originated in v11 or v12, though a few are old.)

Per searching to see if anyplace else had made the same error
repaired in 62148c352.
2019-05-05 13:10:07 -04:00
Tom Lane 62148c3520 Add check for syscache lookup failure in update_relispartition().
Omitted in commit 05b38c7e6 (though it looks like the original blame
belongs to 9e9befac4).  A failure is admittedly unlikely, but if it
did happen, SIGSEGV is not the approved method of reporting it.

Per Coverity.  Back-patch to v11 where the broken code originated.
2019-05-05 12:44:32 -04:00
Peter Geoghegan 7b37f4b02e Correct more obsolete nbtree page split comments.
Commit 3f342839 corrected obsolete comments about buffer locks at the
main _bt_insert_parent() call site, but missed similar obsolete comments
above _bt_insert_parent() itself.  Both sets of comments were rendered
obsolete by commit 40dae7ec53, which made the nbtree page split
algorithm more robust.  Fix the comments that were missed the first time
around now.

In passing, refine a related _bt_insert_parent() comment about
re-finding the parent page to insert new downlink.
2019-05-03 13:34:45 -07:00
Tom Lane f884dca495 Remove RelationSetIndexList().
In the wake of commit f912d7dec, RelationSetIndexList isn't used any
more.  It was always a horrid wart, so getting rid of it is very nice.
We can also convert rd_indexvalid back to a plain boolean.

Discussion: https://postgr.es/m/28926.1556664156@sss.pgh.pa.us
2019-05-03 10:26:14 -04:00
Tom Lane f912d7dec2 Fix reindexing of pg_class indexes some more.
Commits 3dbb317d3 et al failed under CLOBBER_CACHE_ALWAYS testing.
Investigation showed that to reindex pg_class_oid_index, we must
suppress accesses to the index (via SetReindexProcessing) before we call
RelationSetNewRelfilenode, or at least before we do CommandCounterIncrement
therein; otherwise, relcache reloads happening within the CCI may try to
fetch pg_class rows using the index's new relfilenode value, which is as
yet an empty file.

Of course, the point of 3dbb317d3 was that that ordering didn't work
either, because then RelationSetNewRelfilenode's own update of the index's
pg_class row cannot access the index, should it need to.

There are various ways we might have got around that, but Andres Freund
came up with a brilliant solution: for a mapped index, we can really just
skip the pg_class update altogether.  The only fields it was actually
changing were relpages etc, but it was just setting them to zeroes which
is useless make-work.  (Correct new values will be installed at the end
of index build.)  All pg_class indexes are mapped and probably always will
be, so this eliminates the problem by removing work rather than adding it,
always a pleasant outcome.  Having taught RelationSetNewRelfilenode to do
it that way, we can revert the code reordering in reindex_index.  (But
I left the moved setup code where it was; there seems no reason why it
has to run without use of the old index.  If you're trying to fix a
busted pg_class index, you'll have had to disable system index use
altogether to get this far.)

Moreover, this means we don't need RelationSetIndexList at all, because
reindex_relation's hacking to make "REINDEX TABLE pg_class" work is
likewise now unnecessary.  We'll leave that code in place in the back
branches, but a follow-on patch will remove it in HEAD.

In passing, do some minor cleanup for commit 5c1560606 (in HEAD only),
notably removing a duplicate newrnode assignment.

Patch by me, using a core idea due to Andres Freund.  Back-patch to all
supported branches, as 3dbb317d3 was.

Discussion: https://postgr.es/m/28926.1556664156@sss.pgh.pa.us
2019-05-02 19:11:28 -04:00
Alvaro Herrera 2bf372a4ae heap_prepare_freeze_tuple: Simplify coding
Commit d2599ecfcc introduced some contorted, confused code around:
readers would think that it's possible for HeapTupleHeaderGetXmin return
a non-frozen value for some frozen tuples, which would be disastrous.
There's no actual bug, but it seems better to make it clearer.

Per gripe from Tom Lane and Andres Freund.
Discussion: https://postgr.es/m/30116.1555430496@sss.pgh.pa.us
2019-05-02 16:14:08 -04:00
Peter Geoghegan 6dd86c269d Fix nbtsort.c's page space accounting.
Commit dd299df818, which made heap TID a tiebreaker nbtree index
column, introduced new rules on page space management to make suffix
truncation safe.  In general, suffix truncation needs to have a small
amount of extra space available on the new left page when splitting a
leaf page.  This is needed in case it turns out that truncation cannot
even "truncate away the heap TID column", resulting in a
larger-than-firstright leaf high key with an explicit heap TID
representation.

Despite all this, CREATE INDEX/nbtsort.c did not account for the
possible need for extra heap TID space on leaf pages when deciding
whether or not a new item could fit on current page.  This could lead to
"failed to add item to the index page" errors when CREATE
INDEX/nbtsort.c tried to finish off a leaf page that lacked space for a
larger-than-firstright leaf high key (it only had space for firstright
tuple, which was just short of what was needed following "truncation").

Several conditions needed to be met all at once for CREATE INDEX to
fail.  The problem was in the hard limit on what will fit on a page,
which tends to be masked by the soft fillfactor-wise limit.  The easiest
way to recreate the problem seems to be a CREATE INDEX on a low
cardinality text column, with tuples that are of non-uniform width,
using a fillfactor of 100.

To fix, bring nbtsort.c in line with nbtsplitloc.c, which already
pessimistically assumes that all leaf page splits will have high keys
that have a heap TID appended.

Reported-By: Andreas Joseph Krogh
Discussion: https://postgr.es/m/VisenaEmail.c5.3ee7fe277d514162.16a6d785bea@tc7-visena
2019-05-02 12:33:35 -07:00
Robert Haas dd69597988 Fix some problems with VACUUM (INDEX_CLEANUP FALSE).
The new nleft_dead_tuples and nleft_dead_itemids fields are confusing
and do not seem like the correct way forward.  One of them is tested
via an assertion that can fail, as it has already done on buildfarm
member topminnow.  Remove the assertion and the fields.

Change the logic for the case where a tuple is not initially pruned
by heap_page_prune but later diagnosed HEAPTUPLE_DEAD by
HeapTupleSatisfiesVacuum.  Previously, tupgone = true was set in
that case, which leads to treating the tuple as one that will be
removed.  In a normal vacuum, that's OK, because we'll remove
index entries for it and then the second heap pass will remove the
tuple itself, but when index cleanup is disabled, those things
don't happen, so we must instead treat it as a recently-dead
tuple that we have voluntarily chosen to keep.

Report and analysis by Tom Lane.  This patch loosely based on one
from Masahiko Sawada, but I changed most of it.
2019-05-02 10:07:13 -04:00
Magnus Hagander 659e53498c Fix union for pgstat message types
The message type for temp files and for checksum failures were missing
from the union. Due to the coding style used there was no compiler error
when this happend. So change the code to actively use the union thereby
producing a compiler error if the same mistake happens again, suggested
by Tom Lane.

Author: Julien Rouhaud
Reported-By: Tomas Vondra
Discussion: https://postgr.es/m/20190430163328.zd4rrlnbvgaqlcdz@development
2019-05-01 12:30:44 +02:00
Andres Freund 4b40d40b30 Fix unused variable compiler warning in !debug builds.
Introduced in 3dbb317d3.  Fix by using the new local variable in more
places.

Reported-By: Bruce Momjian (off-list)
Backpatch: 9.4-, like 3dbb317d3
2019-04-30 17:45:32 -07:00
Andres Freund a0b5bb6e02 Improve comment spelling and style in llvmjit_deform.c.
Author: Justin Pryzby
Discussion:
    https://postgr.es/m/20190408141828.GE10080@telsasoft.com
    https://postgr.es/m/20181127184133.GM10913@telsasoft.com
2019-04-30 16:20:07 -07:00
Andres Freund 3a48005b00 Improve code inferring length of bitmap for JITed tuple deforming.
While discussing comment improvements (see next commit) by Justin
Pryzby, Tom complained about a few details of the logic to infer the
length of the NULL bitmap when building the JITed tuple deforming
function. That bitmap allows to avoid checking the tuple header's
natts, a check which often causes a pipeline stall

Improvements:
a) As long as missing columns aren't taken into account, we can
   continue to infer the length of the NULL bitmap from NOT NULL
   columns following it. Previously we stopped at the first missing
   column.  It's unlikely to matter much in practice, but the
   alternative would have been to document why we stop.
b) For robustness reasons it seems better to also check against
   attisdropped - RemoveAttributeById() sets attnotnull to false, but
   an additional check is trivial.
c) Improve related comments

Discussion: https://postgr.es/m/20637.1555957068@sss.pgh.pa.us
Backpatch: -
2019-04-30 16:20:07 -07:00
Tom Lane e03ff73969 Clean up handling of constraint_exclusion and enable_partition_pruning.
The interaction of these parameters was a bit confused/confusing,
and in fact v11 entirely misses the opportunity to apply partition
constraints when a partition is accessed directly (rather than
indirectly from its parent).

In HEAD, establish the principle that enable_partition_pruning controls
partition pruning and nothing else.  When accessing a partition via its
parent, we do partition pruning (if enabled by enable_partition_pruning)
and then there is no need to consider partition constraints in the
constraint_exclusion logic.  When accessing a partition directly, its
partition constraints are applied by the constraint_exclusion logic,
only if constraint_exclusion = on.

In v11, we can't have such a clean division of these GUCs' effects,
partly because we don't want to break compatibility too much in a
released branch, and partly because the clean coding requires
inheritance_planner to have applied partition pruning to a partitioned
target table, which it doesn't in v11.  However, we can tweak things
enough to cover the missed case, which seems like a good idea since
it's potentially a performance regression from v10.  This patch keeps
v11's previous behavior in which enable_partition_pruning overrides
constraint_exclusion for an inherited target table, though.

In HEAD, also teach relation_excluded_by_constraints that it's okay to use
inheritable constraints when trying to prune a traditional inheritance
tree.  This might not be thought worthy of effort given that that feature
is semi-deprecated now, but we have enough infrastructure that it only
takes a couple more lines of code to do it correctly.

Amit Langote and Tom Lane

Discussion: https://postgr.es/m/9813f079-f16b-61c8-9ab7-4363cab28d80@lab.ntt.co.jp
Discussion: https://postgr.es/m/29069.1555970894@sss.pgh.pa.us
2019-04-30 15:03:50 -04:00
Alvaro Herrera 9f8b717a80 Message style fixes 2019-04-30 10:33:37 -04:00
Alvaro Herrera 9a83afecb7 Widen tuple counter variables from long to int64
Mistake in ab0dfc961b6a; progress reporting would have wrapped around
for indexes created with more than 2^31 tuples.

Reported-by: Peter Geoghegan
Discussion: https://postgr.es/m/CAH2-Wz=WbNxc5ob5NJ9yqo2RMJ0q4HXDS30GVCobeCvC9A1L9A@mail.gmail.com
2019-04-30 10:27:38 -04:00
Andres Freund 3dbb317d32 Fix potential assertion failure when reindexing a pg_class index.
When reindexing individual indexes on pg_class it was possible to
either trigger an assertion failure:
TRAP: FailedAssertion("!(!ReindexIsProcessingIndex(((index)->rd_id)))

That's because reindex_index() called SetReindexProcessing() - which
enables an asserts ensuring no index insertions happen into the index
- before calling RelationSetNewRelfilenode(). That not correct for
indexes on pg_class, because RelationSetNewRelfilenode() updates the
relevant pg_class row, which needs to update the indexes.

The are two reasons this wasn't noticed earlier. Firstly the bug
doesn't trigger when reindexing all of pg_class, as reindex_relation
has code "hiding" all yet-to-be-reindexed indexes. Secondly, the bug
only triggers when the the update to pg_class doesn't turn out to be a
HOT update - otherwise there's no index insertion to trigger the
bug. Most of the time there's enough space, making this bug hard to
trigger.

To fix, move RelationSetNewRelfilenode() to before the
SetReindexProcessing() (and, together with some other code, to outside
of the PG_TRY()).

To make sure the error checking intended by SetReindexProcessing() is
more robust, modify CatalogIndexInsert() to check
ReindexIsProcessingIndex() even when the update is a HOT update.

Also add a few regression tests for REINDEXing of system catalogs.

The last two improvements would have prevented some of the issues
fixed in 5c1560606d from being introduced in the first place.

Reported-By: Michael Paquier
Diagnosed-By: Tom Lane and Andres Freund
Author: Andres Freund
Reviewed-By: Tom Lane
Discussion: https://postgr.es/m/20190418011430.GA19133@paquier.xyz
Backpatch: 9.4-, the bug is present in all branches
2019-04-29 19:42:08 -07:00
Andres Freund 5c1560606d Fix several recently introduced issues around handling new relation forks.
Most of these stem from d25f519107 "tableam: relation creation, VACUUM
FULL/CLUSTER, SET TABLESPACE.".

1) To pass data to the relation_set_new_filenode()
   RelationSetNewRelfilenode() was made to update RelationData.rd_rel
   directly. That's not OK however, as it makes the relcache entries
   temporarily inconsistent. Which among other scenarios is a problem
   if a REINDEX targets an index on pg_class - the
   CatalogTupleUpdate() in RelationSetNewRelfilenode().  Presumably
   that was introduced because other places in the code do so - while
   those aren't "good practice" they don't appear to be actively
   buggy (e.g. because system tables may not be targeted).

   I (Andres) should have caught this while reviewing and signficantly
   evolving the code in that commit, mea culpa.

   Fix that by instead passing in the new RelFileNode as separate
   argument to relation_set_new_filenode() and rely on the relcache to
   update the catalog entry. Also revert that the
   RelationMapUpdateMap() call was changed to immediate, and undo some
   other more unnecessary changes.

2) Document that the relation_set_new_filenode cannot rely on the
   whole relcache entry to be valid. It might be worthwhile to
   refactor the code to never have to rely on that, but given the way
   heap_create() is currently coded, that'd be a large change.

3) ATExecSetTableSpace() shouldn't do FlushRelationBuffers() itself. A
   table AM might not use shared buffers at all. Move to
   index_copy_data() and heapam_relation_copy_data().

4) heapam_relation_set_new_filenode() previously sometimes accessed
   rel->rd_rel->relpersistence rather than the `persistence`
   argument. Code movement mistake.

5) Previously heapam_relation_set_new_filenode() re-opened the smgr
   relation to create the init for, if necesary. Instead have
   RelationCreateStorage() return the SMgrRelation and use it to
   create the init fork.

6) Add a note about the danger of modifying the relcache directly to
   ATExecSetTableSpace() - it's currently not a bug because there's a
   check ERRORing for catalog tables.

Regression tests and assertion improvements that together trigger the
bug described in 1) will be added in a later commit, as there is a
related bug on all branches.

Reported-By: Michael Paquier
Diagnosed-By: Tom Lane and Andres Freund
Author: Andres Freund
Reviewed-By: Tom Lane
Discussion: https://postgr.es/m/20190418011430.GA19133@paquier.xyz
2019-04-29 19:28:05 -07:00
Peter Geoghegan 9ee7414ed0 Remove obsolete _bt_insert_parent() comment.
Remove a comment that refers to a coding practice that was fully removed
by commit a8b8f4db, which introduced MarkBufferDirty().  It looks like
the comment was even obsolete before then, since it concerns
write-ordering dependencies with synchronous buffer writes.
2019-04-29 14:14:38 -07:00
Tom Lane a1a789eb5a In walreceiver, don't try to do ereport() in a signal handler.
This is quite unsafe, even for the case of ereport(FATAL) where we won't
return control to the interrupted code, and despite this code's use of
a flag to restrict the areas where we'd try to do it.  It's possible
for example that we interrupt malloc or free while that's holding a lock
that's meant to protect against cross-thread interference.  Then, any
attempt to do malloc or free within ereport() will result in a deadlock,
preventing the walreceiver process from exiting in response to SIGTERM.
We hypothesize that this explains some hard-to-reproduce failures seen
in the buildfarm.

Hence, get rid of the immediate-exit code in WalRcvShutdownHandler,
as well as the logic associated with WalRcvImmediateInterruptOK.
Instead, we need to take care that potentially-blocking operations
in the walreceiver's data transmission logic (libpqwalreceiver.c)
will respond reasonably promptly to the process's latch becoming
set and then call ProcessWalRcvInterrupts.  Much of the needed code
for that was already present in libpqwalreceiver.c.  I refactored
things a bit so that all the uses of PQgetResult use latch-aware
waiting, but didn't need to do much more.

These changes should be enough to ensure that libpqwalreceiver.c
will respond promptly to SIGTERM whenever it's waiting to receive
data.  In principle, it could block for a long time while waiting
to send data too, and this patch does nothing to guard against that.
I think that that hazard is mostly theoretical though: such blocking
should occur only if we fill the kernel's data transmission buffers,
and we don't generally send enough data to make that happen without
waiting for input.  If we find out that the hazard isn't just
theoretical, we could fix it by using PQsetnonblocking, but that
would require more ticklish changes than I care to make now.

This is a bug fix, but it seems like too big a change to push into
the back branches without much more testing than there's time for
right now.  Perhaps we'll back-patch once we have more confidence
in the change.

Patch by me; thanks to Thomas Munro for review.

Discussion: https://postgr.es/m/20190416070119.GK2673@paquier.xyz
2019-04-29 12:26:07 -04:00
Peter Eisentraut cd3e27464c Fix potential catalog corruption with temporary identity columns
If a temporary table with an identity column and ON COMMIT DROP is
created in a single-statement transaction (not useful, but allowed),
it would leave the catalog corrupted.  We need to add a
CommandCounterIncrement() so that PreCommit_on_commit_actions() sees
the created dependency between table and sequence and can clean it
up.

The analogous and more useful case of doing this in a transaction
block already runs some CommandCounterIncrement() before it gets to
the on-commit cleanup, so it wasn't a problem in practical use.

Several locations for placing the new CommandCounterIncrement() call
were discussed.  This patch places it at the end of
standard_ProcessUtility().  That would also help if other commands
were to create catalog entries that some on-commit action would like
to see.

Bug: #15631
Reported-by: Serge Latyntsev <dnsl48@gmail.com>
Author: Peter Eisentraut <peter.eisentraut@2ndquadrant.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
2019-04-29 08:49:03 +02:00
Noah Misch 90e7f31773 Use preprocessor conditions compatible with Emacs indent.
Emacs wrongly indented hundreds of subsequent lines.
2019-04-28 12:56:53 -07:00
Tom Lane e481d26285 Clean up minor warnings from buildfarm.
Be more consistent about use of XXXGetDatum macros in new jsonpath
code.  This is mostly to avoid having code that looks randomly
different from everyplace else that's doing the exact same thing.

In pg_regress.c, avoid an unreferenced-function warning from
compilers that don't understand pg_attribute_unused().  Putting
the function inside the same #ifdef as its only caller is more
straightforward coding anyway.

In be-secure-openssl.c, avoid use of pg_attribute_unused() on a label.
That's pretty creative, but there's no good reason to suppose that
it's portable, and there's absolutely no need to use goto's here in the
first place.  (This wasn't actually causing any buildfarm complaints,
but it's new code in v12 so it has no portability track record.)
2019-04-28 12:45:55 -04:00
Tom Lane c01eb619a8 Apply stopgap fix for bug #15672.
Fix DefineIndex so that it doesn't attempt to pass down a to-be-reused
index relfilenode to a child index creation, and fix TryReuseIndex
to not think that reuse is sensible for a partitioned index.

In v11, this fixes a problem where ALTER TABLE on a partitioned table
could assign the same relfilenode to several different child indexes,
causing very nasty catalog corruption --- in fact, attempting to DROP
the partitioned table then leads not only to a database crash, but to
inability to restart because the same crash will recur during WAL replay.

Either of these two changes would be enough to prevent the failure, but
since neither action could possibly be sane, let's put in both changes
for future-proofing.

In HEAD, no such bug manifests, but that's just an accidental consequence
of having changed the pg_class representation of partitioned indexes to
have relfilenode = 0.  Both of these changes still seem like smart
future-proofing.

This is only a stop-gap because the code for ALTER TABLE on a partitioned
table with a no-op type change still leaves a great deal to be desired.
As the added regression tests show, it gets things wrong for comments on
child indexes/constraints, and it is regenerating child indexes it doesn't
have to.  However, fixing those problems will take more work which may not
get back-patched into v11.  We need a fix for the corruption problem now.

Per bug #15672 from Jianing Yang.

Patch by me, regression test cases based on work by Amit Langote,
who also did a lot of the investigative work.

Discussion: https://postgr.es/m/15672-b9fa7db32698269f@postgresql.org
2019-04-26 17:18:07 -04:00
Alvaro Herrera 05b38c7e63 Fix partitioned index attachment
When an existing index in a partition is attached to a new index on
its parent, we forgot to set the "relispartition" flag correctly, which
meant that it was not possible to find the index in various operations,
such as adding a foreign key constraint that references that partitioned
table.  One of four places that was assigning the parent index was
forgetting to do that, so fix by shifting responsibility of updating the
flag to the routine that changes the parent.

Author: Amit Langote, Álvaro Herrera
Reported-by: Hubert "depesz" Lubaczewski
Discussion: https://postgr.es/m/CA+HiwqHMsRtRYRWYTWavKJ8x14AFsv7bmAV46mYwnfD3vy8goQ@mail.gmail.com
2019-04-25 11:22:29 -04:00
Fujii Masao 978b032d1f Fix function names in comments.
Commit 3eb77eba5a renamed some functions, but forgot to
update some comments referencing to those functions.
This commit fixes those function names in the comments.

Kyotaro Horiguchi
2019-04-25 23:43:48 +09:00
Alvaro Herrera 87259588d0 Fix tablespace inheritance for partitioned rels
Commit ca4103025d left a few loose ends.  The most important one
(broken pg_dump output) is already fixed by virtue of commit
3b23552ad8, but some things remained:

* When ALTER TABLE rewrites tables, the indexes must remain in the
  tablespace they were originally in.  This didn't work because
  index recreation during ALTER TABLE runs manufactured SQL (yuck),
  which runs afoul of default_tablespace in competition with the parent
  relation tablespace.  To fix, reset default_tablespace to the empty
  string temporarily, and add the TABLESPACE clause as appropriate.

* Setting a partitioned rel's tablespace to the database default is
  confusing; if it worked, it would direct the partitions to that
  tablespace regardless of default_tablespace.  But in reality it does
  not work, and making it work is a larger project.  Therefore, throw
  an error when this condition is detected, to alert the unwary.

Add some docs and tests, too.

Author: Álvaro Herrera
Discussion: https://postgr.es/m/CAKJS1f_1c260nOt_vBJ067AZ3JXptXVRohDVMLEBmudX1YEx-A@mail.gmail.com
2019-04-25 10:31:32 -04:00
Tom Lane 0fae846232 Fix some minor postmaster-state-machine issues.
In sigusr1_handler, don't ignore PMSIGNAL_ADVANCE_STATE_MACHINE based
on pmState.  The restriction is unnecessary (PostmasterStateMachine
should work in any state), not future-proof (since it makes too many
assumptions about why the signal might be sent), and broken even today
because a race condition can make it necessary to respond to the signal
in PM_WAIT_READONLY state.  The race condition seems unlikely, but
if it did happen, a hot-standby postmaster could fail to shut down
after receiving a smart-shutdown request.

In MaybeStartWalReceiver, don't clear the WalReceiverRequested flag
if the fork attempt fails.  Leaving it set allows us to try
again in future iterations of the postmaster idle loop.  (The startup
process would eventually send a fresh request signal, but this change
may allow us to retry the fork sooner.)

Remove an obsolete comment and unnecessary test in
PostmasterStateMachine's handling of PM_SHUTDOWN_2 state.  It's not
possible to have a live walreceiver in that state, and AFAICT has not
been possible since commit 5e85315ea.  This isn't a live bug, but the
false comment is quite confusing to readers.

In passing, rearrange sigusr1_handler's CheckPromoteSignal tests so that
we don't uselessly perform stat() calls that we're going to ignore the
results of.

Add some comments clarifying the behavior of MaybeStartWalReceiver;
I very nearly rearranged it in a way that'd reintroduce the race
condition fixed in e5d494d78.  Mea culpa for not commenting that
properly at the time.

Back-patch to all supported branches.  The PMSIGNAL_ADVANCE_STATE_MACHINE
change is the only one of even minor significance, but we might as well
keep this code in sync across branches.

Discussion: https://postgr.es/m/9001.1556046681@sss.pgh.pa.us
2019-04-24 14:15:44 -04:00
Alvaro Herrera 0a999e1290 Unify error messages
... for translatability purposes.
2019-04-24 09:26:13 -04:00
Andres Freund fdc7efcc30 Allow pg_class xid & multixid horizons to not be set.
This allows table AMs that don't need these horizons. This was already
documented in the tableam relation_set_new_filenode callback, but an
assert prevented if from actually working (the test AM code contained
the change itself). Defang the asserts in the general code, and move
the stronger ones into heap AM.

Relatedly, after CLUSTER/VACUUM, we'd always assign a relfrozenxid /
relminmxid. Change the table_relation_copy_for_cluster() interface to
allow the AM to overwrite the horizons that get set on the pg_class
entry.  This'd also in the future allow AMs like heap to compute a
relfrozenxid during rewrite that's the table's actual minimum rather
than a pre-determined value.  Arguably it'd have been better to move
the whole computation / setting of those values into the callback, but
it seems likely that for other reasons it'd be better to be able to
use one value to vacuum/cluster multiple tables (e.g. a toast's
horizon shouldn't be different than the table's).

Reported-By: Heikki Linnakangas
Author: Andres Freund
Discussion: https://postgr.es/m/9a7fb9cc-2419-5db7-8840-ddc10c93f122@iki.fi
2019-04-23 21:42:12 -07:00
Tom Lane 7ad1cd31bf Repair assorted issues in locale data extraction.
cache_locale_time (extraction of LC_TIME-related info) had never been
taught the lessons we previously learned about extraction of info related
to LC_MONETARY and LC_NUMERIC.  Specifically, commit 95a777c61 taught
PGLC_localeconv() that data coming out of localeconv() was in an encoding
determined by the relevant locale, but we didn't realize that there's a
similar issue with strftime().  And commit a4930e7ca hardened
PGLC_localeconv() against errors occurring partway through, but failed
to do likewise for cache_locale_time().  So, rearrange the latter
function to perform encoding conversion and not risk failure while
it's got the locales set to temporary values.

This time around I also changed PGLC_localeconv() to treat it as FATAL
if it can't restore the previous settings of the locale values.  There
is no reason (except possibly OOM) for that to fail, and proceeding with
the wrong locale values seems like a seriously bad idea --- especially
on Windows where we have to also temporarily change LC_CTYPE.  Also,
protect against the possibility that we can't identify the codeset
reported for LC_MONETARY or LC_NUMERIC; rather than just failing,
try to validate the data without conversion.

The user-visible symptom this fixes is that if LC_TIME is set to a locale
name that implies an encoding different from the database encoding,
non-ASCII localized day and month names would be retrieved in the wrong
encoding, leading to either unexpected encoding-conversion error reports
or wrong output from to_char().  The other possible failure modes are
unlikely enough that we've not seen reports of them, AFAIK.

The encoding conversion problems do not manifest on Windows, since
we'd already created special-case code to handle that issue there.

Per report from Juan José Santamaría Flecha.  Back-patch to all
supported versions.

Juan José Santamaría Flecha and Tom Lane

Discussion: https://postgr.es/m/CAC+AXB22So5aZm2vZe+MChYXec7gWfr-n-SK-iO091R0P_1Tew@mail.gmail.com
2019-04-23 18:51:30 -04:00
Peter Geoghegan 9b10926263 Prevent O(N^2) unique index insertion edge case.
Commit dd299df8 made nbtree treat heap TID as a tiebreaker column,
establishing the principle that there is only one correct location (page
and page offset number) for every index tuple, no matter what.
Insertions of tuples into non-unique indexes proceed as if heap TID
(scan key's scantid) is just another user-attribute value, but
insertions into unique indexes are more delicate.  The TID value in
scantid must initially be omitted to ensure that the unique index
insertion visits every leaf page that duplicates could be on.  The
scantid is set once again after unique checking finishes successfully,
which can force _bt_findinsertloc() to step right one or more times, to
locate the leaf page that the new tuple must be inserted on.

Stepping right within _bt_findinsertloc() was assumed to occur no more
frequently than stepping right within _bt_check_unique(), but there was
one important case where that assumption was incorrect: inserting a
"duplicate" with NULL values.  Since _bt_check_unique() didn't do any
real work in this case, it wasn't appropriate for _bt_findinsertloc() to
behave as if it was finishing off a conventional unique insertion, where
any existing physical duplicate must be dead or recently dead.
_bt_findinsertloc() might have to grovel through a substantial portion
of all of the leaf pages in the index to insert a single tuple, even
when there were no dead tuples.

To fix, treat insertions of tuples with NULLs into a unique index as if
they were insertions into a non-unique index: never unset scantid before
calling _bt_search() to descend the tree, and bypass _bt_check_unique()
entirely.  _bt_check_unique() is no longer responsible for incoming
tuples with NULL values.

Discussion: https://postgr.es/m/CAH2-Wzm08nr+JPx4jMOa9CGqxWYDQ-_D4wtPBiKghXAUiUy-nQ@mail.gmail.com
2019-04-23 10:33:57 -07:00
Tom Lane f4a3fdfbdc Avoid order-of-execution problems with ALTER TABLE ADD PRIMARY KEY.
Up to now, DefineIndex() was responsible for adding attnotnull constraints
to the columns of a primary key, in any case where it hadn't been
convenient for transformIndexConstraint() to mark those columns as
is_not_null.  It (or rather its minion index_check_primary_key) did this
by executing an ALTER TABLE SET NOT NULL command for the target table.

The trouble with this solution is that if we're creating the index due
to ALTER TABLE ADD PRIMARY KEY, and the outer ALTER TABLE has additional
sub-commands, the inner ALTER TABLE's operations executed at the wrong
time with respect to the outer ALTER TABLE's operations.  In particular,
the inner ALTER would perform a validation scan at a point where the
table's storage might be inconsistent with its catalog entries.  (This is
on the hairy edge of being a security problem, but AFAICS it isn't one
because the inner scan would only be interested in the tuples' null
bitmaps.)  This can result in unexpected failures, such as the one seen
in bug #15580 from Allison Kaptur.

To fix, let's remove the attempt to do SET NOT NULL from DefineIndex(),
reducing index_check_primary_key's role to verifying that the columns are
already not null.  (It shouldn't ever see such a case, but it seems wise
to keep the check for safety.)  Instead, make transformIndexConstraint()
generate ALTER TABLE SET NOT NULL subcommands to be executed ahead of
the ADD PRIMARY KEY operation in every case where it can't force the
column to be created already-not-null.  This requires only minor surgery
in parse_utilcmd.c, and it makes for a much more satisfying spec for
transformIndexConstraint(): it's no longer having to take it on faith
that someone else will handle addition of NOT NULL constraints.

To make that work, we have to move the execution of AT_SetNotNull into
an ALTER pass that executes ahead of AT_PASS_ADD_INDEX.  I moved it to
AT_PASS_COL_ATTRS, and put that after AT_PASS_ADD_COL to avoid failure
when the column is being added in the same command.  This incidentally
fixes a bug in the only previous usage of AT_PASS_COL_ATTRS, for
AT_SetIdentity: it didn't work either for a newly-added column.

Playing around with this exposed a separate bug in ALTER TABLE ONLY ...
ADD PRIMARY KEY for partitioned tables.  The intent of the ONLY modifier
in that context is to prevent doing anything that would require holding
lock for a long time --- but the implied SET NOT NULL would recurse to
the child partitions, and do an expensive validation scan for any child
where the column(s) were not already NOT NULL.  To fix that, invent a
new ALTER subcommand AT_CheckNotNull that just insists that a child
column be already NOT NULL, and apply that, not AT_SetNotNull, when
recursing to children in this scenario.  This results in a slightly laxer
definition of ALTER TABLE ONLY ... SET NOT NULL for partitioned tables,
too: that command will now work as long as all children are already NOT
NULL, whereas before it just threw up its hands if there were any
partitions.

In passing, clean up the API of generateClonedIndexStmt(): remove a
useless argument, ensure that the output argument is not left undefined,
update the header comment.

A small side effect of this change is that no-such-column errors in ALTER
TABLE ADD PRIMARY KEY now produce a different message that includes the
table name, because they are now detected by the SET NOT NULL step which
has historically worded its error that way.  That seems fine to me, so
I didn't make any effort to avoid the wording change.

The basic bug #15580 is of very long standing, and these other bugs
aren't new in v12 either.  However, this is a pretty significant change
in the way ALTER TABLE ADD PRIMARY KEY works.  On balance it seems best
not to back-patch, at least not till we get some more confidence that
this patch has no new bugs.

Patch by me, but thanks to Jie Zhang for a preliminary version.

Discussion: https://postgr.es/m/15580-d1a6de5a3d65da51@postgresql.org
Discussion: https://postgr.es/m/1396E95157071C4EBBA51892C5368521017F2E6E63@G08CNEXMBPEKD02.g08.fujitsu.local
2019-04-23 12:25:27 -04:00
Tom Lane c06e3550dc Don't request pretty-printed output from xmlNodeDump().
xml.c passed format = 1 to xmlNodeDump(), resulting in sometimes getting
extra whitespace (newlines + spaces) in the output.  We don't really want
that, first because whitespace might be semantically significant in some
XML uses, and second because it happens only very inconsistently.  Only
one case in our regression tests is affected.

This potentially affects the results of xpath() and the XMLTABLE construct,
when emitting nodeset values.

Note that the older code in contrib/xml2 doesn't do this; it seems
to have been an aboriginal bad decision in commit ea3b212fe.

While this definitely seems like a bug to me, the small number of
complaints to date argues against back-patching a behavioral change.
Hence, fix in HEAD only, at least for now.

Per report from Jean-Marc Voillequin.

Discussion: https://postgr.es/m/1EC8157EB499BF459A516ADCF135ADCE3A23A9CA@LON-WGMSX712.ad.moodys.net
2019-04-23 10:51:07 -04:00
Michael Paquier ccae190b91 Fix detection of passwords hashed with MD5 or SCRAM-SHA-256
This commit fixes a couple of issues related to the way password
verifiers hashed with MD5 or SCRAM-SHA-256 are detected, leading to
being able to store in catalogs passwords which do not follow the
supported hash formats:
- A MD5-hashed entry was checked based on if its header uses "md5" and
if the string length matches what is expected.  Unfortunately the code
never checked if the hash only used hexadecimal characters, as reported
by Tom Lane.
- A SCRAM-hashed entry was checked based on only its header, which
should be "SCRAM-SHA-256$", but it never checked for any fields
afterwards, as reported by Jonathan Katz.

Backpatch down to v10, which is where SCRAM has been introduced, and
where password verifiers in plain format have been removed.

Author: Jonathan Katz
Reviewed-by: Tom Lane, Michael Paquier
Discussion: https://postgr.es/m/016deb6b-1f0a-8e9f-1833-a8675b170aa9@postgresql.org
Backpatch-through: 10
2019-04-23 15:43:21 +09:00
Andres Freund b5f58cf213 Convert gist to compute page level xid horizon on primary.
Due to parallel development, gist added the missing conflict
information in c952eae52a, while 558a9165e0 moved that computation
to the primary for the index types that already had it.  Thus adapt
gist to also compute on the primary, using
index_compute_xid_horizon_for_tuples() instead of its own copy of the
logic.

This also adds pg_waldump support for XLOG_GIST_DELETE records, which
previously was not properly present.

Bumps WAL version.

Author: Andres Freund
Discussion: https://postgr.es/m/20190406050243.bszosdg4buvabfrt@alap3.anarazel.de
2019-04-22 14:28:30 -07:00
Tomas Vondra d08c44f7a4 Fix mvdistinct and dependencies size calculations
The formulas used to calculate size while (de)serializing mvndistinct
and functional dependencies were based on offset() of the structs. But
that is incorrect, because the structures are not copied directly, we
we copy the individual fields directly.

At the moment this works fine, because there is no alignment padding
on any platform we support. But it might break if we ever added some
fields into any of the structs, for example. It's also confusing.

Fixed by reworking the macros to directly sum sizes of serialized
fields. The macros are now useful only for serialiation, so there is
no point in keeping them in the public header file. So make them
private by moving them to the .c files.

Also adds a couple more asserts to check the serialization, and fixes
an incorrect allocation of MVDependency instead of (MVDependency *).

Reported-By: Tom Lane
Discussion: https://postgr.es/m/29785.1555365602@sss.pgh.pa.us
2019-04-21 20:23:34 +02:00
Andres Freund b8b94ea129 Fix slot type issue for fuzzy distance index scan over out-of-core table AM.
For amcanreorderby scans the nodeIndexscan.c's reorder queue holds
heap tuples, but the underlying table likely does not. Before this fix
we'd return different types of slots, depending on whether the tuple
came from the reorder queue, or from the index + table.

While that could be fixed by signalling that the node doesn't return a
fixed type of slot, it seems better to instead remove the separate
slot for the reorder queue, and use ExecForceStoreHeapTuple() to store
tuples from the queue. It's not particularly common to need
reordering, after all.

This reverts most of the iss_ReorderQueueSlot related changes to
nodeIndexscan.c made in 1a0586de36, except that now
ExecForceStoreHeapTuple() is used instead of ExecStoreHeapTuple().

Noticed when testing zheap against the in-core version of tableam.

Author: Andres Freund
2019-04-19 11:42:37 -07:00
Andres Freund 88e6ad3054 Fix two memory leaks around force-storing tuples in slots.
As reported by Tom, when ExecStoreMinimalTuple() had to perform a
conversion to store the minimal tuple in the slot, it forgot to
respect the shouldFree flag, and leaked the tuple into the current
memory context if true.  Fix that by freeing the tuple in that case.

Looking at the relevant code made me (Andres) realize that not having
the shouldFree parameter to ExecForceStoreHeapTuple() was a bad
idea. Some callers had to locally implement the necessary logic, and
in one case it was missing, creating a potential per-group leak in
non-hashed aggregation.

The choice to not free the tuple in ExecComputeStoredGenerated() is
not pretty, but not introduced by this commit - I'll start a separate
discussion about it.

Reported-By: Tom Lane
Discussion: https://postgr.es/m/366.1555382816@sss.pgh.pa.us
2019-04-19 11:39:56 -07:00
Tom Lane 4d5840cea9 Fix problems with auto-held portals.
HoldPinnedPortals() did things in the wrong order: it must not mark
a portal autoHeld until it's been successfully held.  Otherwise,
a failure while persisting the portal results in a server crash
because we think the portal is in a good state when it's not.

Also add a check that portal->status is READY before attempting to
hold a pinned portal.  We have such a check before the only other
use of HoldPortal(), so it seems unwise not to check it here.

Lastly, rethink the responsibility for where to call HoldPinnedPortals.
The comment for it imagined that it was optional for any individual PL
to call it or not, but that cannot be the case: if some outer level of
procedure has a pinned portal, failing to persist it when an inner
procedure commits is going to be trouble.  Let's have SPI do it instead
of the individual PLs.  That's not a complete solution, since in theory
a PL might not be using SPI to perform commit/rollback, but such a PL
is going to have to be aware of lots of related requirements anyway.
(This change doesn't cause an API break for any external PLs that might
be calling HoldPinnedPortals per the old regime, because calling it
twice during a commit or rollback sequence won't hurt.)

Per bug #15703 from Julian Schauder.  Back-patch to v11 where this code
came in.

Discussion: https://postgr.es/m/15703-c12c5bc0ea34ba26@postgresql.org
2019-04-19 11:20:37 -04:00
Michael Paquier 148266fa35 Fix collection of typos and grammar mistakes in docs and comments
Author: Justin Pryzby
Discussion: https://postgr.es/m/20190330224333.GQ5815@telsasoft.com
2019-04-19 16:57:40 +09:00
Andres Freund 75e03eabea Fix potential use-after-free for BEFORE UPDATE row triggers on non-core AMs.
When such a trigger returns the old row version, it naturally get
stored in the slot for the trigger result. When a table AMs doesn't
store HeapTuples internally, ExecBRUpdateTriggers() frees the old row
version passed to triggers - but before this fix it might still be
referenced by the slot holding the new tuple.

Noticed when running the out-of-core zheap AM against the in-core
version of tableam.

Author: Andres Freund
2019-04-18 17:53:54 -07:00
Peter Eisentraut bb385c4fb0 Fix handling of temp and unlogged tables in FOR ALL TABLES publications
If a FOR ALL TABLES publication exists, temporary and unlogged tables
are ignored for publishing changes.  But CheckCmdReplicaIdentity()
would still check in that case that such a table has a replica
identity set before accepting updates.  To fix, have
GetRelationPublicationActions() return that such a table publishes no
actions.

Discussion: https://www.postgresql.org/message-id/f3f151f7-c4dd-1646-b998-f60bd6217dd3@2ndquadrant.com
2019-04-18 08:55:55 +02:00
Bruce Momjian fb9c475597 postgresql.conf.sample: add proper defaults for include actions
Previously, include actions include_dir, include_if_exists, and include
listed commented-out values which were not the defaults, which is
inconsistent with other entries.  Instead, replace them with '', which
is the default value.

Reported-by: Emanuel Araújo

Discussion: https://postgr.es/m/CAMuTAkYMx6Q27wpELDR3_v9aG443y7ZjeXu15_+1nGUjhMWOJA@mail.gmail.com

Backpatch-through: 9.4
2019-04-17 18:12:10 -04:00
Tom Lane 8cde7f4948 Fix assorted minor bogosity in GSSAPI transport error messages.
I noted that some buildfarm members were complaining about %ld being
used to format values that are (probably) declared size_t.  Use %zu
instead, and insert a cast just in case some versions of the GSSAPI
API declare the length field differently.  While at it, clean up
gratuitous differences in wording of equivalent messages, show
the complained-of length in all relevant messages not just some,
include trailing newline where needed, adjust random deviations
from project-standard code layout and message style, etc.
2019-04-17 17:06:50 -04:00
Tom Lane b4f96d69ad Minor jsonpath fixes.
Restore missed "make clean" rule, fix misspelling.

John Naylor

Discussion: https://postgr.es/m/CACPNZCt5B8jDCCGQiFoSuqmg-za_NCy4QDioBTLaNRih9+-bXg@mail.gmail.com
2019-04-17 13:37:00 -04:00
Magnus Hagander 252b707bc4 Return NULL for checksum failures if checksums are not enabled
Returning 0 could falsely indicate that there is no problem. NULL
correctly indicates that there is no information about potential
problems.

Also return 0 as numbackends instead of NULL for shared objects (as no
connection can be made to a shared object only).

Author: Julien Rouhaud <rjuju123@gmail.com>
Reviewed-by: Robert Treat <rob@xzilla.net>
2019-04-17 13:51:48 +02:00
Michael Paquier 9010156445 Fix thinko introduced by 82a5649 in slot.c
When saving a replication slot, failing to close the temporary path used
to save the slot information is considered as a failure and reported as
such.  However the code forgot to leave immediately as other failure
paths do.

Noticed while looking up at this area of the code for another patch.
2019-04-17 10:01:22 +09:00
Michael Paquier 47ac2033d4 Simplify some ERROR paths clearing wait events and transient files
Transient files and wait events get normally cleaned up when seeing an
exception (be it in the context of a transaction for a backend or
another process like the checkpointer), hence there is little point in
complicating error code paths to do this work.  This shaves a bit of
code, and removes some extra handling with errno which needed to be
preserved during the cleanup steps done.

Reported-by: Masahiko Sawada
Author: Michael Paquier
Reviewed-by: Tom Lane, Masahiko Sawada
Discussion: https://postgr.es/m/CAD21AoDhHYVq5KkXfkaHhmjA-zJYj-e4teiRAJefvXuKJz1tKQ@mail.gmail.com
2019-04-17 09:51:45 +09:00
Michael Paquier a6dcf9df4d Rework handling of invalid indexes with REINDEX CONCURRENTLY
Per discussion with others, allowing REINDEX INDEX CONCURRENTLY to work
for invalid indexes when working directly on them can have a lot of
value to unlock situations with invalid indexes without having to use a
dance involving DROP INDEX followed by an extra CREATE INDEX
CONCURRENTLY (which would not work for indexes with constraint
dependency anyway).  This also does not create extra bloat on the
relation involved as this works on individual indexes, so let's enable
it.

Note that REINDEX TABLE CONCURRENTLY still bypasses invalid indexes as
we don't want to bloat the number of indexes defined on a relation in
the event of multiple and successive failures of REINDEX CONCURRENTLY.

More regression tests are added to cover those behaviors, using an
invalid index created with CREATE INDEX CONCURRENTLY.

Reported-by: Dagfinn Ilmari Mannsåker, Álvaro Herrera
Author: Michael Paquier
Reviewed-by: Peter Eisentraut, Dagfinn Ilmari Mannsåker
Discussion: https://postgr.es/m/20190411134947.GA22043@alvherre.pgsql
2019-04-17 09:33:51 +09:00
Michael Paquier 5ed4b123b6 Remove duplicate assignment when initializing logical decoder context
The private data in the WAL reader is already getting set when
allocating it.

Author: Antonin Houska
Reviewed-by: Tom Lane
Discussion: https://postgr.es/m/30563.1555329094@localhost
2019-04-16 15:08:38 +09:00
Tomas Vondra 3824ca30d1 Fix pg_mcv_list deserialization
The memcpy() was copying type OIDs in the wrong direction, so the
deserialized MCV list always had them as 0. This is mostly harmless
except when printing the data in pg_mcv_list_items(), in which case
it reported

    ERROR:  cache lookup failed for type 0

Also added a simple regression test for pg_mcv_list_items() function,
printing a single-item MCV list.

Reported-By: Dean Rasheed
Discussion: https://postgr.es/m/CAEZATCX6T0iDTTZrqyec4Cd6b4yuL7euu4=rQRXaVBAVrUi1Cg@mail.gmail.com
2019-04-16 00:01:39 +02:00
Tom Lane 4b40e44f07 Fix failure with textual partition hash keys.
Commit 5e1963fb7 overlooked two places in partbounds.c that now
need to pass a collation identifier to the hash functions for
a partition key column.

Amit Langote, per report from Jesper Pedersen

Discussion: https://postgr.es/m/a620f85a-42ab-e0f3-3337-b04b97e2e2f5@redhat.com
2019-04-15 16:47:09 -04:00