Commit Graph

18739 Commits

Author SHA1 Message Date
Amit Kapila ccc84a956b Match the buffer usage tracking for leader and worker backends.
In the leader backend, we don't track the buffer usage for ExecutorStart
phase whereas in worker backend we track it for ExecutorStart phase as
well.  This leads to different value for buffer usage stats for the
parallel and non-parallel query.  Change the code so that worker backend
also starts tracking buffer usage after ExecutorStart.

Author: Amit Kapila and Robert Haas
Reviewed-by: Robert Haas and Andres Freund
Backpatch-through: 9.6 where this code was introduced
Discussion: https://postgr.es/m/86137f17-1dfb-42f9-7421-82fd786b04a1@anayrat.info
2018-08-03 09:11:37 +05:30
Tom Lane 1c2cb2744b Fix run-time partition pruning for appends with multiple source rels.
The previous coding here supposed that if run-time partitioning applied to
a particular Append/MergeAppend plan, then all child plans of that node
must be members of a single partitioning hierarchy.  This is totally wrong,
since an Append could be formed from a UNION ALL: we could have multiple
hierarchies sharing the same Append, or child plans that aren't part of any
hierarchy.

To fix, restructure the related plan-time and execution-time data
structures so that we can have a separate list or array for each
partitioning hierarchy.  Also track subplans that are not part of any
hierarchy, and make sure they don't get pruned.

Per reports from Phil Florent and others.  Back-patch to v11, since
the bug originated there.

David Rowley, with a lot of cosmetic adjustments by me; thanks also
to Amit Langote for review.

Discussion: https://postgr.es/m/HE1PR03MB17068BB27404C90B5B788BCABA7B0@HE1PR03MB1706.eurprd03.prod.outlook.com
2018-08-01 19:42:52 -04:00
Alvaro Herrera c40489e449 Fix logical replication slot initialization
This was broken in commit 9c7d06d606, which inadvertently gave the
wrong value to fast_forward in one StartupDecodingContext call.  Fix by
flipping the value.  Add a test for the obvious error, namely trying to
initialize a replication slot with an nonexistent output plugin.

While at it, move the CreateDecodingContext call earlier, so that any
errors are reported before sending the CopyBoth message.

Author: Dave Cramer <davecramer@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/CADK3HHLVkeRe1v4P02-5hj55H3_yJg3AEtpXyEY5T3wuzO2jSg@mail.gmail.com
2018-08-01 17:47:15 -04:00
Alvaro Herrera 91bc213d90 Fix unnoticed variable shadowing in previous commit
Per buildfarm.
2018-08-01 17:04:57 -04:00
Alvaro Herrera 1c9bb02d8e Fix per-tuple memory leak in partition tuple routing
Some operations were being done in a longer-lived memory context,
causing intra-query leaks.  It's not noticeable unless you're doing a
large COPY, but if you are, it eats enough memory to cause a problem.

Co-authored-by: Kohei KaiGai <kaigai@heterodb.com>
Co-authored-by: Amit Langote <Langote_Amit_f8@lab.ntt.co.jp>
Co-authored-by: Álvaro Herrera <alvherre@alvh.no-ip.org>
Discussion: https://postgr.es/m/CAOP8fzYtVFWZADq4c=KoTAqgDrHWfng+AnEPEZccyxqxPVbbWQ@mail.gmail.com
2018-08-01 16:29:15 -04:00
Peter Eisentraut 0d5f05cde0 Allow multi-inserts during COPY into a partitioned table
CopyFrom allows multi-inserts to be used for non-partitioned tables, but
this was disabled for partitioned tables.  The reason for this appeared
to be that the tuple may not belong to the same partition as the
previous tuple did.  Not allowing multi-inserts here greatly slowed down
imports into partitioned tables.  These could take twice as long as a
copy to an equivalent non-partitioned table.  It seems wise to do
something about this, so this change allows the multi-inserts by
flushing the so-far inserted tuples to the partition when the next tuple
does not belong to the same partition, or when the buffer fills.  This
improves performance when the next tuple in the stream commonly belongs
to the same partition as the previous tuple.

In cases where the target partition changes on every tuple, using
multi-inserts slightly slows the performance.  To get around this we
track the average size of the batches that have been inserted and
adaptively enable or disable multi-inserts based on the size of the
batch.  Some testing was done and the regression only seems to exist
when the average size of the insert batch is close to 1, so let's just
enable multi-inserts when the average size is at least 1.3.  More
performance testing might reveal a better number for, this, but since
the slowdown was only 1-2% it does not seem critical enough to spend too
much time calculating it.  In any case it may depend on other factors
rather than just the size of the batch.

Allowing multi-inserts for partitions required a bit of work around the
per-tuple memory contexts as we must flush the tuples when the next
tuple does not belong the same partition.  In which case there is no
good time to reset the per-tuple context, as we've already built the new
tuple by this time.  In order to work around this we maintain two
per-tuple contexts and just switch between them every time the partition
changes and reset the old one.  This does mean that the first of each
batch of tuples is not allocated in the same memory context as the
others, but that does not matter since we only reset the context once
the previous batch has been inserted.

Author: David Rowley <david.rowley@2ndquadrant.com>
Reviewed-by: Melanie Plageman <melanieplageman@gmail.com>
2018-08-01 10:23:09 +02:00
Tom Lane f3eb76b399 Further fixes for quoted-list GUC values in pg_dump and ruleutils.c.
Commits 742869946 et al turn out to be a couple bricks shy of a load.
We were dumping the stored values of GUC_LIST_QUOTE variables as they
appear in proconfig or setconfig catalog columns.  However, although that
quoting rule looks a lot like SQL-identifier double quotes, there are two
critical differences: empty strings ("") are legal, and depending on which
variable you're considering, values longer than NAMEDATALEN might be valid
too.  So the current technique fails altogether on empty-string list
entries (as reported by Steven Winfield in bug #15248) and it also risks
truncating file pathnames during dump/reload of GUC values that are lists
of pathnames.

To fix, split the stored value without any downcasing or truncation,
and then emit each element as a SQL string literal.

This is a tad annoying, because we now have three copies of the
comma-separated-string splitting logic in varlena.c as well as a fourth
one in dumputils.c.  (Not to mention the randomly-different-from-those
splitting logic in libpq...)  I looked at unifying these, but it would
be rather a mess unless we're willing to tweak the API definitions of
SplitIdentifierString, SplitDirectoriesString, or both.  That might be
worth doing in future; but it seems pretty unsafe for a back-patched
bug fix, so for now accept the duplication.

Back-patch to all supported branches, as the previous fix was.

Discussion: https://postgr.es/m/7585.1529435872@sss.pgh.pa.us
2018-07-31 13:00:14 -04:00
Tom Lane 6574f19127 Remove dead code left behind by 1b6801051. 2018-07-30 19:11:02 -04:00
Alvaro Herrera d25d45e4d9 Verify range bounds to bms_add_range when necessary
Now that the bms_add_range boundary protections are gone, some
alternative ones are needed in a few places.

Author: Amit Langote <Langote_Amit_f8@lab.ntt.co.jp>
Discussion: https://postgr.es/m/3437ccf8-a144-55ff-1e2f-fc16b437823b@lab.ntt.co.jp
2018-07-30 18:45:39 -04:00
Alvaro Herrera 1b68010518 Change bms_add_range to be a no-op for empty ranges
In commit 84940644de, bms_add_range was added with an API to fail with
an error if an empty range was specified.  This seems arbitrary and
unhelpful, so turn that case into a no-op instead.  Callers that require
further verification on the arguments or result can apply them by
themselves.

This fixes the bug that partition pruning throws an API error for a case
involving the default partition of a default partition, as in the
included test case.

Reported-by: Rajkumar Raghuwanshi <rajkumar.raghuwanshi@enterprisedb.com>
Diagnosed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/16590.1532622503@sss.pgh.pa.us
2018-07-30 18:44:33 -04:00
Alvaro Herrera 4f10e7ea7b Set ActiveSnapshot when logically replaying inserts
Input functions for the inserted tuples may require a snapshot, when
they are replayed by native logical replication.  An example is a domain
with a constraint using a SQL-language function, which prior to this
commit failed to apply on the subscriber side.

Reported-by: Mai Peng <maily.peng@webedia-group.com>
Co-authored-by: Minh-Quan TRAN <qtran@itscaro.me>
Co-authored-by: Álvaro Herrera <alvherre@alvh.no-ip.org>
Discussion: https://postgr.es/m/4EB4BD78-BFC3-4D04-B8DA-D53DF7160354@webedia-group.com
Discussion: https://postgr.es/m/153211336163.1404.11721804383024050689@wrigleys.postgresql.org
2018-07-30 16:30:07 -04:00
Peter Eisentraut 98efa76fe3 Add ssl_library preset parameter
This allows querying the SSL implementation used on the server side.
It's analogous to using PQsslAttribute(conn, "library") in libpq.

Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
2018-07-30 13:46:27 +02:00
Tomas Vondra ab87b8fedc Mark variable used only in assertion with PG_USED_FOR_ASSERTS_ONLY
Perpendicular lines always intersect, so the line_interpt_line() return
value in line_closept_point() was used only in an assertion, triggering
compiler warnings in non-assert builds.
2018-07-29 23:08:00 +02:00
Tomas Vondra 74294c7301 Restore handling of -0 in the C field of lines in line_construct().
Commit a7dc63d904 inadvertedly removed this bit originally introduced
by 43fe90f66a, causing regression test failures on some platforms,
due to producing {1,-1,-0} instead of {1,-1,0}.
2018-07-29 21:11:05 +02:00
Michael Paquier 9f7ba88aa4 Fix two oversights from 9ebe0572 which refactored cluster_rel
The recheck option became a no-op as ClusterOption failed to set proper
values for each element.  There was a second code path where local
options got overwritten.

Both issues have been spotted by Coverity.
2018-07-29 22:00:42 +09:00
Noah Misch e09144e6ce Document security implications of qualified names.
Commit 5770172cb0 documented secure schema
usage, and that advice suffices for using unqualified names securely.
Document, in typeconv-func primarily, the additional issues that arise
with qualified names.  Back-patch to 9.3 (all supported versions).

Reviewed by Jonathan S. Katz.

Discussion: https://postgr.es/m/20180721012446.GA1840594@rfd.leadboat.com
2018-07-28 20:08:01 -07:00
Tomas Vondra 6bf0bc842b Provide separate header file for built-in float types
Some data types under adt/ have separate header files, but most simple
ones do not, and their public functions are defined in builtins.h.  As
the patches improving geometric types will require making additional
functions public, this seems like a good opportunity to create a header
for floats types.

Commit 1acf757255 made _cmp functions public to solve NaN issues locally
for GiST indexes.  This patch reworks it in favour of a more widely
applicable API.  The API uses inline functions, as they are easier to
use compared to macros, and avoid double-evaluation hazards.

Author: Emre Hasegeli
Reviewed-by: Kyotaro Horiguchi

Discussion: https://www.postgresql.org/message-id/CAE2gYzxF7-5djV6-cEvqQu-fNsnt%3DEqbOURx7ZDg%2BVv6ZMTWbg%40mail.gmail.com
2018-07-29 03:30:48 +02:00
Tomas Vondra a7dc63d904 Refactor geometric functions and operators
The primary goal of this patch is to eliminate duplicate code and share
code between different geometric data types more often, to prepare the
ground for additional patches.  Until now the code reuse was limited,
probably because the simpler types (line and point) were implemented
after the more complex ones.

The changes are quite extensive and can be summarised as:

* Eliminate SQL-level function calls.
* Re-use more functions to implement others.
* Unify internal function names and signatures.
* Remove private functions from geo_decls.h.
* Replace should-not-happen checks with assertions.
* Add comments describe for various functions.
* Remove some unreachable code.
* Define delimiter symbols of line datatype like the other ones.
* Remove the GEODEBUG macro and printf() calls.
* Unify code style of a few oddly formatted lines.

While the goal was to cause minimal user-visible changes, it was not
possible to keep the original behavior in all cases - for example when
handling NaN values, or when reusing code makes the functions return
consistent results.

Author: Emre Hasegeli
Reviewed-by: Kyotaro Horiguchi, me

Discussion: https://www.postgresql.org/message-id/CAE2gYzxF7-5djV6-cEvqQu-fNsnt%3DEqbOURx7ZDg%2BVv6ZMTWbg%40mail.gmail.com
2018-07-29 02:36:29 +02:00
Alexander Korotkov d2086b08b0 Reduce path length for locking leaf B-tree pages during insertion
In our B-tree implementation appropriate leaf page for new tuple
insertion is acquired using _bt_search() function.  This function always
returns leaf page locked in shared mode.  In order to obtain exclusive
lock, caller have to relock the page.

This commit makes _bt_search() function lock leaf page immediately in
exclusive mode when needed.  That removes unnecessary relock and, in
turn reduces lock contention for B-tree leaf pages.  Our experiments
on multi-core systems showed acceleration up to 4.5 times in corner
case.

Discussion: https://postgr.es/m/CAPpHfduAMDFMNYTCN7VMBsFg_hsf0GqiqXnt%2BbSeaJworwFoig%40mail.gmail.com
Author: Alexander Korotkov
Reviewed-by: Yoshikazu Imai, Simon Riggs, Peter Geoghegan
2018-07-28 00:31:40 +03:00
Alvaro Herrera 8a9b72c3ea Fix grammar in README.tuplock
Author: Brad DeJong
Discussion: https://postgr.es/m/CAJnrtnxrA4FqZi0Z6kGPQKMiZkWv2xxgSDQ+hv1jDrf8WCKjjw@mail.gmail.com
2018-07-27 10:56:30 -04:00
Robert Haas 3e32109049 Use key and partdesc from PartitionDispatch where possible.
Instead of repeatedly fishing the data out of the relcache entry,
let's use the version that we cached in the PartitionDispatch.  We
could alternatively rip out the PartitionDispatch fields altogether,
but it doesn't make much sense to have them and not use them; before
this patch, partdesc was set but altogether unused.  Amit Langote and
I both thought using them was a litle better than removing them, so
this patch takes that approach.

Discussion: http://postgr.es/m/CA+TgmobFnxcaW-Co-XO8=yhJ5pJXoNkCj6Z7jm9Mwj9FGv-D7w@mail.gmail.com
2018-07-27 09:40:52 -04:00
Amit Kapila 8ce29bb4f0 Fix the buffer release order for parallel index scans.
During parallel index scans, if the current page to be read is deleted, we
skip it and try to get the next page for a scan without releasing the buffer
lock on the current page.  To get the next page, sometimes it needs to wait
for another process to complete its scan and advance it to the next page.
Now, it is quite possible that the master backend has errored out before
advancing the scan and issued a termination signal for all workers.  The
workers failed to notice the termination request during wait because the
interrupts are held due to buffer lock on the previous page.  This lead to
all workers being stuck.

The fix is to release the buffer lock on current page before trying to get
the next page.  We are already doing same in backward scans, but missed
it for forward scans.

Reported-by: Victor Yegorov
Bug: 15290
Diagnosed-by: Thomas Munro and Amit Kapila
Author: Amit Kapila
Reviewed-by: Thomas Munro
Tested-By: Thomas Munro and Victor Yegorov
Backpatch-through: 10 where parallel index scans were introduced
Discussion: https://postgr.es/m/153228422922.1395.1746424054206154747@wrigleys.postgresql.org
2018-07-27 10:53:00 +05:30
Tom Lane 662d12aea1 Avoid crash in eval_const_expressions if a Param's type changes.
Since commit 6719b238e it's been possible for the values of plpgsql
record field variables to be exposed to the planner as Params.
(Before that, plpgsql never supplied values for such variables during
planning, so that the problematic code wasn't reached.)  Other places
that touch potentially-type-mutable Params either cope gracefully or
do runtime-test-and-ereport checks that the type is what they expect.
But eval_const_expressions() just had an Assert, meaning that it either
failed the assertion or risked crashes due to using an incompatible
value.

In this case, rather than throwing an ereport immediately, we can just
not perform a const-substitution in case of a mismatch.  This seems
important for the same reason that the Param fetch was speculative:
we might not actually reach this part of the expression at runtime.

Test case will follow in a separate commit.

Patch by me, pursuant to bug report from Andrew Gierth.
Back-patch to v11 where the previous commit appeared.

Discussion: https://postgr.es/m/87wotkfju1.fsf@news-spur.riddles.org.uk
2018-07-26 16:08:45 -04:00
Andres Freund 3acc4acd9b LLVMJIT: Release JIT context after running ExprContext shutdown callbacks.
Due to inlining it previously was possible that an ExprContext's
shutdown callback pointed to a JITed function. As the JIT context
previously was shut down before the shutdown callbacks were called,
that could lead to segfaults.  Fix the ordering.

Reported-By: Dmitry Dolgov
Author: Andres Freund
Discussion: https://postgr.es/m/CA+q6zcWO7CeAJtHBxgcHn_hj+PenM=tvG0RJ93X1uEJ86+76Ug@mail.gmail.com
Backpatch: 11-, where JIT compilation was added
2018-07-25 16:31:49 -07:00
Andres Freund bcafa263ec LLVMJIT: Check for 'noinline' attribute in recursively inlined functions.
Previously the attribute was only checked for external functions
inlined, not "static" functions that had to be inlined as
dependencies.

This isn't really a bug, but makes debugging a bit harder. The new
behaviour also makes more sense. Therefore backpatch.

Author: Andres Freund
Backpatch: 11-, where JIT compilation was added
2018-07-25 16:23:59 -07:00
Thomas Munro 2d30675952 Pad semaphores to avoid false sharing.
In a USE_UNNAMED_SEMAPHORES build, the default on Linux and FreeBSD
since commit ecb0d20a, we have an array of sem_t objects.  This
turned out to reduce performance compared to the previous default
USE_SYSV_SEMAPHORES on an 8 socket system.  Testing showed that the
lost performance could be regained by padding the array elements so
that they have their own cache lines.  This matches what we do for
similar hot arrays (see LWLockPadded, WALInsertLockPadded).

Back-patch to 10, where unnamed semaphores were adopted as the default
semaphore interface on those operating systems.

Author: Thomas Munro
Reviewed-by: Andres Freund
Reported-by: Mithun Cy
Tested-by: Mithun Cy, Tom Lane, Thomas Munro
Discussion: https://postgr.es/m/CAD__OugYDM3O%2BdyZnnZSbJprSfsGFJcQ1R%3De59T3hcLmDug4_w%40mail.gmail.com
2018-07-25 11:00:29 +12:00
Andres Freund b2bb3dc0e0 Defend against some potential spurious compiler warnings in 86eaf208e.
Author: David Rowley
Discussion: https://postgr.es/m/CAKJS1f-AbCFeFU92GZZYqNOVRnPtUwczSYmR2NHCyf9uHUnNiw@mail.gmail.com
2018-07-24 10:10:22 -07:00
Michael Paquier 9ebe0572ce Refactor cluster_rel() to handle more options
This extends cluster_rel() in such a way that more options can be added
in the future, which will reduce the amount of chunk code for an
upcoming SKIP_LOCKED aimed for VACUUM.  As VACUUM FULL is a different
flavor of CLUSTER, we want to make that extensible to ease integration.

This only reworks the API and its callers, without providing anything
user-facing.  Two options are present now: verbose mode and relation
recheck when doing the cluster command work across multiple
transactions.  This could be used as well as a base to extend the
grammar of CLUSTER later on.

Author: Michael Paquier
Reviewed-by: Nathan Bossart
Discussion: https://postgr.es/m/20180723031058.GE2854@paquier.xyz
2018-07-24 11:37:32 +09:00
Michael Paquier d9fadbf131 Fix calculation for WAL segment recycling and removal
Commit 4b0d28de06 has removed the prior checkpoint and related
facilities but has left WAL recycling based on the LSN of the prior
checkpoint, which causes incorrect calculations for WAL removal and
recycling for max_wal_size and min_wal_size.  This commit changes things
so as the base calculation point is the last checkpoint generated.

Reported-by: Kyotaro Horiguchi
Author: Kyotaro Horiguchi
Reviewed-by: Michael Paquier
Discussion: https://postgr.es/m/20180723.135748.42558387.horiguchi.kyotaro@lab.ntt.co.jp
Backpatch: 11-, where the prior checkpoint has been removed.
2018-07-24 10:32:56 +09:00
Thomas Munro 1bc180cd2a Use setproctitle_fast() to update the ps status, if available.
FreeBSD has introduced a faster variant of setproctitle().  Use it,
where available.

Author: Thomas Munro
Discussion: https://postgr.es/m/CAEepm=1wKMTi81uodJ=1KbJAz5WedOg=cr8ewEXrUFeaxWEgww@mail.gmail.com
2018-07-24 13:09:22 +12:00
Andres Freund e9a9843e13 LLVMJIT: Adapt to API changes in gdb and perf support.
During the work of upstreaming my previous patches for gdb and perf
support the API changed. Adapt.  Normally this wouldn't necessarily be
something to backpatch, but the previous API wasn't upstream, and at
least the gdb support is quite useful for debugging.

Author: Andres Freund
Backpatch: 11, where LLVM based JIT support was added.
2018-07-22 21:13:34 -07:00
Andres Freund a38b833a7c LLVMJIT: Fix LLVM build for LLVM > 7.
The location of LLVMAddPromoteMemoryToRegisterPass moved.

Author: Andres Freund
Backpatch: 11, where LLVM based JIT support was added.
2018-07-22 21:13:02 -07:00
Andres Freund 1307bc3d45 Reset context at the tail end of JITed EEOP_AGG_PLAIN_TRANS.
While no negative consequences are currently known, it's clearly wrong
to not reset the context in one of the branches.

Reported-By: Dmitry Dolgov
Author: Dmitry Dolgov
Discussion: https://postgr.es/m/CAGPqQf165-=+Drw3Voim7M5EjHT1zwPF9BQRjLFQzCzYnNZEiQ@mail.gmail.com
Backpatch: 11-, where JIT compilation support was added
2018-07-22 20:31:22 -07:00
Michael Paquier e41d0a1090 Add proper errcodes to new error messages for read() failures
Those would use the default ERRCODE_INTERNAL_ERROR, but for foreseeable
failures an errcode ought to be set, ERRCODE_DATA_CORRUPTED making the
most sense here.

While on the way, fix one errcode_for_file_access missing in origin.c
since the code has been created, and remove one assignment of errno to 0
before calling read(), as this was around to fit with what was present
before 811b6e36 where errno would not be set when not enough bytes are
read.  I have noticed the first one, and Tom has pinged me about the
second one.

Author: Michael Paquier
Reported-by: Tom Lane
Discussion: https://postgr.es/m/27265.1531925836@sss.pgh.pa.us
2018-07-23 09:37:36 +09:00
Michael Paquier 56df07bb9e Make more consistent some error messages for file-related operations
Some error messages which report something about a file operation use
as well context which is already provided within the path being worked
on, making things rather duplicated.  This creates more work for
translators, and does not actually bring clarity.

More could be done, however in a lot of cases the context used is
actually useful, still that patch gets down things with a good cut.

Author: Michael Paquier
Reviewed-by: Kyotaro Horiguchi, Tom Lane
Discussion: https://postgr.es/m/20180718044711.GA8565@paquier.xyz
2018-07-23 09:19:12 +09:00
Andres Freund 6b4d860311 Fix JITed EEOP_AGG_INIT_TRANS, which missed some state.
The JIT compiled implementation missed maintaining
AggState->{current_set,curaggcontext}. That could lead to trouble
because the transition value could be allocated in the wrong context.

Reported-By: Rushabh Lathia
Diagnosed-By: Dmitry Dolgov
Author: Dmitry Dolgov, with minor changes by me
Discussion: https://postgr.es/m/CAGPqQf165-=+Drw3Voim7M5EjHT1zwPF9BQRjLFQzCzYnNZEiQ@mail.gmail.com
Backpatch: 11-, where JIT compilation support was added
2018-07-22 17:00:41 -07:00
Andres Freund 86eaf208ea Hand code string to integer conversion for performance.
As benchmarks show, using libc's string-to-integer conversion is
pretty slow. At least part of the reason for that is that strtol[l]
have to be more generic than what largely is required inside pg.

This patch considerably speeds up int2/int4 input (int8 already was
already using hand-rolled code).

Most of the existing pg_atoi callers have been converted. But as one
requires pg_atoi's custom delimiter functionality, and as it seems
likely that there's external pg_atoi users, it seems sensible to just
keep pg_atoi around.

Author: Andres Freund
Reviewed-By: Robert Haas
Discussion: https://postgr.es/m/20171208214437.qgn6zdltyq5hmjpk@alap3.anarazel.de
2018-07-22 14:58:23 -07:00
Andres Freund 3522d0eaba Deduplicate "invalid input syntax" messages for various types.
Previously a lot of the error messages referenced the type in the
error message itself. That requires that the message is translated
separately for each type.

Note that currently a few smallint cases continue to reference the
integer, rather than smallint, type. A later patch will create a
separate routine for 16bit input.

Author: Andres Freund
Discussion: https://postgr.es/m/20180707200158.wpqkd7rjr4jxq5g7@alap3.anarazel.de
2018-07-22 14:58:01 -07:00
Michael Paquier 96cdeae07f Add toast tables to most system catalogs
It has been project policy to create toast tables only for those catalogs
that might reasonably need one.  Since this judgment call can change over
time, just create one for every catalog, as this can be useful when
creating rather-long entries in catalogs, with recent examples being in
the shape of policy expressions or customly-formatted SCRAM verifiers.

To prevent circular dependencies and to avoid adding complexity to VACUUM
FULL logic, exclude pg_class, pg_attribute, and pg_index.  Also, to
prevent pg_upgrade from seeing a non-empty new cluster, exclude
pg_largeobject and pg_largeobject_metadata from the set as large object
data is handled as user data.  Those relations have no reason to use a
toast table anyway.

Author: Joe Conway, John Naylor
Reviewed-by: Michael Paquier, Tom Lane
Discussion: https://postgr.es/m/84ddff04-f122-784b-b6c5-3536804495f8@joeconway.com
2018-07-20 07:43:41 +09:00
Tom Lane 2409716755 Remove undocumented restriction against duplicate partition key columns.
transformPartitionSpec rejected duplicate simple partition columns
(e.g., "PARTITION BY RANGE (x,x)") but paid no attention to expression
columns, resulting in inconsistent behavior.  Worse, cases like
"PARTITION BY RANGE (x,(x))") were accepted but would then result in
dump/reload failures, since the expression (x) would get simplified
to a plain column later.

There seems no better reason for this restriction than there was for
the one against duplicate included index columns (cf commit 701fd0bbc),
so let's just remove it.

Back-patch to v10 where this code was added.

Report and patch by Yugo Nagata.

Discussion: https://postgr.es/m/20180712165939.36b12aff.nagata@sraoss.co.jp
2018-07-19 15:41:46 -04:00
Tom Lane 028e3da294 Fix pg_get_indexdef()'s behavior for included index columns.
The multi-argument form of pg_get_indexdef() failed to print anything when
asked to print a single index column that is an included column rather than
a key column.  This seems an unintentional result of someone having tried
to take a short-cut and use the attrsOnly flag for two different purposes.
To fix, split said flag into two flags, attrsOnly which suppresses
non-attribute info, and keysOnly which suppresses included columns.
Add a test case using psql's \d command, which relies on that function.

(It's mighty tempting at this point to replace pg_get_indexdef_worker's
mess of boolean flag arguments with a single bitmask-of-flags argument,
which would allow making the call sites much more self-documenting.
But I refrained for the moment.)

Discussion: https://postgr.es/m/21724.1531943735@sss.pgh.pa.us
2018-07-19 14:53:48 -04:00
Alvaro Herrera 1573995f55 Rewrite comments in replication slot advance implementation
The code added by 9c7d06d606 was a bit obscure; clarify that by
rewriting the comments.  Lack of clarity has already caused bugs, so
it's a worthy goal.

Co-authored-by: Arseny Sher <a.sher@postgrespro.ru>
Co-authored-by: Michaël Paquier <michael@paquier.xyz>
Co-authored-by: Álvaro Herrera <alvherre@alvh.no-ip.org>
Reviewed-by: Petr Jelínek <petr.jelinek@2ndquadrant.com>
Discussion: https://postgr.es/m/87y3fgoyrn.fsf@ars-thinkpad
2018-07-19 14:15:44 -04:00
Alexander Korotkov 309765fa1e Fix handling of empty uncompressed posting list pages in GIN
PostgreSQL 9.4 introduces posting list compression in GIN.  This feature
supports online upgrade, so that after pg_upgrade uncompressed posting
lists are compressed on-the-fly.  Underlying code appears to always
expect at least one item on uncompressed posting list page.  But there
could be completely empty pages, because VACUUM never deletes leftmost
and rightmost pages from posting trees.  This commit fixes that.

Reported-by: Sivasubramanian Ramasubramanian
Discussion: https://postgr.es/m/1531867212836.63354%40amazon.com
Author: Sivasubramanian Ramasubramanian, Alexander Korotkov
Backpatch-through: 9.4
2018-07-19 21:04:17 +03:00
Heikki Linnakangas 99fdebaf2d Rephrase a few comments for clarity.
I was confused by what "intended to be parallel serially" meant, until
Robert Haas and David G. Johnston explained it. Rephrase the comment to
make it more clear, using David's suggested wording.

Discussion: https://www.postgresql.org/message-id/1fec9022-41e8-e484-70ce-2179b08c2092%40iki.fi
2018-07-19 16:08:09 +03:00
Heikki Linnakangas 405cb356d6 Fix comment.
This comment was copy-pasted from nodeAppend.c to nodeMergeAppend.c, but
while committing 5220bb7533, I modified wrong copy of it.

Spotted by David Rowley
2018-07-19 15:39:06 +03:00
Heikki Linnakangas 5220bb7533 Expand run-time partition pruning to work with MergeAppend
This expands the support for the run-time partition pruning which was added
for Append in 499be013de to also allow unneeded subnodes of a MergeAppend
to be removed.

Author: David Rowley
Discussion: https://www.postgresql.org/message-id/CAKJS1f_F_V8D7Wu-HVdnH7zCUxhoGK8XhLLtd%3DCu85qDZzXrgg%40mail.gmail.com
2018-07-19 13:49:43 +03:00
Michael Paquier b33ef397a1 Fix print of Path nodes when using OPTIMIZER_DEBUG
GatherMergePath (introduced in 10) and CustomPath (introduced in 9.5)
have gone missing.  The order of the Path nodes was inconsistent with
what is listed in nodes.h, so make the order consistent at the same time
to ease future checks and additions.

Author: Sawada Masahiko
Reviewed-by: Michael Paquier
Discussion: https://postgr.es/m/CAD21AoBQMLoc=ohH-oocuAPsELrmk8_EsRJjOyR8FQLZkbE0wA@mail.gmail.com
2018-07-19 09:54:39 +09:00
Michael Paquier c6598b8b05 Fix re-parameterize of MergeAppendPath
Instead of MergeAppendPath, MergeAppend nodes were considered.  This
code is not covered by any tests now, which should be addressed at some
point.

This is an oversight from f49842d, which introduced partition-wise joins
in v11, so back-patch down to that.

Author: Michael Paquier
Reviewed-by: Ashutosh Bapat
Discussion: https://postgr.es/m/20180718062202.GC8565@paquier.xyz
2018-07-19 09:01:57 +09:00
Tom Lane 701fd0bbc9 Drop the rule against included index columns duplicating key columns.
The initial version of the included-index-column feature stated that
included columns couldn't be the same as any key column of the index.
While it'd be pretty silly to do that, since the included column would be
entirely redundant, we've never prohibited redundant index columns before
so it's not very consistent to do so here.  Moreover, the prohibition
was itself badly implemented, so that it failed to reject columns that
were effectively identical but not spelled quite alike, as reported by
Aditya Toshniwal.

(Moreover, it's not hard to imagine that for some non-btree index types,
such cases would be non-silly anyhow: the index might use a lossy
representation for key columns but be able to support retrieval of the
original form of included columns.)

Hence, let's just drop the prohibition.

In passing, do some copy-editing on the documentation for the
included-column feature.

Yugo Nagata; documentation and test corrections by me

Discussion: https://postgr.es/m/CAM9w-_mhBCys4fQNfaiQKTRrVWtoFrZ-wXmDuE9Nj5y-Y7aDKQ@mail.gmail.com
2018-07-18 14:43:03 -04:00
Tom Lane 3cb646264e Use a ResourceOwner to track buffer pins in all cases.
Historically, we've allowed auxiliary processes to take buffer pins without
tracking them in a ResourceOwner.  However, that creates problems for error
recovery.  In particular, we've seen multiple reports of assertion crashes
in the startup process when it gets an error while holding a buffer pin,
as for example if it gets ENOSPC during a write.  In a non-assert build,
the process would simply exit without releasing the pin at all.  We've
gotten away with that so far just because a failure exit of the startup
process translates to a database crash anyhow; but any similar behavior
in other aux processes could result in stuck pins and subsequent problems
in vacuum.

To improve this, institute a policy that we must *always* have a resowner
backing any attempt to pin a buffer, which we can enforce just by removing
the previous special-case code in resowner.c.  Add infrastructure to make
it easy to create a process-lifespan AuxProcessResourceOwner and clear
out its contents at appropriate times.  Replace existing ad-hoc resowner
management in bgwriter.c and other aux processes with that.  (Thus, while
the startup process gains a resowner where it had none at all before, some
other aux process types are replacing an ad-hoc resowner with this code.)
Also use the AuxProcessResourceOwner to manage buffer pins taken during
StartupXLOG and ShutdownXLOG, even when those are being run in a bootstrap
process or a standalone backend rather than a true auxiliary process.

In passing, remove some other ad-hoc resource owner creations that had
gotten cargo-culted into various other places.  As far as I can tell
that was all unnecessary, and if it had been necessary it was incomplete,
due to lacking any provision for clearing those resowners later.
(Also worth noting in this connection is that a process that hasn't called
InitBufferPoolBackend has no business accessing buffers; so there's more
to do than just add the resowner if we want to touch buffers in processes
not covered by this patch.)

Although this fixes a very old bug, no back-patch, because there's no
evidence of any significant problem in non-assert builds.

Patch by me, pursuant to a report from Justin Pryzby.  Thanks to
Robert Haas and Kyotaro Horiguchi for reviews.

Discussion: https://postgr.es/m/20180627233939.GA10276@telsasoft.com
2018-07-18 12:15:16 -04:00
Heikki Linnakangas 6b387179ba Fix misc typos, mostly in comments.
A collection of typos I happened to spot while reading code, as well as
grepping for common mistakes.

Backpatch to all supported versions, as applicable, to avoid conflicts
when backporting other commits in the future.
2018-07-18 16:17:32 +03:00
Michael Paquier 94019c879a Fix more portability issues with casts to Size when using off_t
This should tame the beast, as there are no other places where off_t is
used in the new error messages.

Reported again by longfin, which complained about walsender.c while I
spotted the other two ones while double-checking.
2018-07-18 09:51:53 +09:00
Michael Paquier 8bd064f0c7 Fix casting in error message for two-phase file
This error from from 811b6e3, which causes compilation warnings with OSX
10.3 and clang.

Reported by Tom Lane, via buildfarm member longfin.
2018-07-18 09:11:34 +09:00
Michael Paquier 811b6e36a9 Rework error messages around file handling
Some error messages related to file handling are using the code path
context to define their state.  For example, 2PC-related errors are
referring to "two-phase status files", or "relation mapping file" is
used for catalog-to-filenode mapping, however those prove to be
difficult to translate, and are not more helpful than just referring to
the path of the file being worked on.  So simplify all those error
messages by just referring to files with their path used.  In some
cases, like the manipulation of WAL segments, the context is actually
helpful so those are kept.

Calls to the system function read() have also been rather inconsistent
with their error handling sometimes not reporting the number of bytes
read, and some other code paths trying to use an errno which has not
been set.  The in-core functions are using a more consistent pattern
with this patch, which checks for both errno if set or if an
inconsistent read is happening.

So as to care about pluralization when reading an unexpected number of
byte(s), "could not read: read %d of %zu" is used as error message, with
%d field being the output result of read() and %zu the expected size.
This simplifies the work of translators with less variations of the same
message.

Author: Michael Paquier
Reviewed-by: Álvaro Herrera
Discussion: https://postgr.es/m/20180520000522.GB1603@paquier.xyz
2018-07-18 08:01:23 +09:00
Alvaro Herrera 1c04d4beea Revise BuildIndexValueDescription to simplify it
Getting a pg_index tuple from syscache when the open index relation is
available is pointless -- just use the one from relcache.

Noticed while reviewing code for cb9db2ab06.

No backpatch.
2018-07-16 20:20:15 -04:00
Alvaro Herrera cb9db2ab06 Fix ALTER TABLE...SET STATS error message for included columns
The existing error message was complaining that the column is not an
expression, which is not correct.  Introduce a suitable wording
variation and a test.

Co-authored-by: Yugo Nagata <nagata@sraoss.co.jp>
Discussion: https://postgr.es/m/20180628182803.e4632d5a.nagata@sraoss.co.jp
Reviewed-by: Álvaro Herrera <alvherre@alvh.no-ip.org>
2018-07-16 20:00:39 -04:00
Alvaro Herrera e353389d24 Fix partition pruning with IS [NOT] NULL clauses
The original code was unable to prune partitions that could not possibly
contain NULL values, when the query specified less than all columns in a
multicolumn partition key.  Reorder the if-tests so that it is, and add
more commentary and regression tests.

Reported-by: Ashutosh Bapat <ashutosh.bapat@enterprisedb.com>
Co-authored-by: Dilip Kumar <dilipbalaut@gmail.com>
Co-authored-by: Amit Langote <Langote_Amit_f8@lab.ntt.co.jp>
Co-authored-by: Álvaro Herrera <alvherre@alvh.no-ip.org>
Reviewed-by: Ashutosh Bapat <ashutosh.bapat@enterprisedb.com>
Reviewed-by: amul sul <sulamul@gmail.com>
Discussion: https://postgr.es/m/CAFjFpRc7qjLUfXLVBBC_HAnx644sjTYM=qVoT3TJ840HPbsTXw@mail.gmail.com
2018-07-16 18:38:59 -04:00
Robert Haas 32df1c9afa Add subtransaction handling for table synchronization workers.
Since the old logic was completely unaware of subtransactions, a
change made in a subsequently-aborted subtransaction would still cause
workers to be stopped at toplevel transaction commit.  Fix that by
managing a stack of worker lists rather than just one.

Amit Khandekar and Robert Haas

Discussion: http://postgr.es/m/CAJ3gD9eaG_mWqiOTA2LfAug-VRNn1hrhf50Xi1YroxL37QkZNg@mail.gmail.com
2018-07-16 17:33:22 -04:00
Peter Eisentraut f7cb2842bf Add plan_cache_mode setting
This allows overriding the choice of custom or generic plan.

Author: Pavel Stehule <pavel.stehule@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/CAFj8pRAGLaiEm8ur5DWEBo7qHRWTk9HxkuUAz00CZZtJj-LkCA%40mail.gmail.com
2018-07-16 13:35:41 +02:00
Peter Eisentraut a06e56b247 doc: Update redirecting links
Update links that resulted in redirects.  Most are changes from http to
https, but there are also some other minor edits.  (There are still some
redirects where the target URL looks less elegant than the one we
currently have.  I have left those as is.)
2018-07-16 10:48:05 +02:00
Tom Lane 1007b0a126 Fix hashjoin costing mistake introduced with inner_unique optimization.
In final_cost_hashjoin(), commit 9c7f5229a allowed inner_unique cases
to follow a code path previously used only for SEMI/ANTI joins; but it
neglected to fix an if-test within that path that assumed SEMI and ANTI
were the only possible cases.  This resulted in a wrong value for
hashjointuples, and an ensuing bad cost estimate, for inner_unique normal
joins.  Fortunately, for inner_unique normal joins we can assume the number
of joined tuples is the same as for a SEMI join; so there's no need for
more code, we just have to invert the test to check for ANTI not SEMI.

It turns out that in two contrib tests in which commit 9c7f5229a
changed the plan expected for a query, the change was actually wrong
and induced by this estimation error, not by any real improvement.
Hence this patch also reverts those changes.

Per report from RK Korlapati.  Backpatch to v10 where the error was
introduced.

David Rowley

Discussion: https://postgr.es/m/CA+SNy03bhq0fodsfOkeWDCreNjJVjsdHwUsb7AG=jpe0PtZc_g@mail.gmail.com
2018-07-14 11:59:12 -04:00
Tom Lane 4984784f83 Fix crash in json{b}_populate_recordset() and json{b}_to_recordset().
As of commit 37a795a60, populate_recordset_worker() tried to pass back
(as rsi.setDesc) a tupdesc that it also had cached in its fn_extra.
But the core executor would free the passed-back tupdesc, risking a
crash if the function were called again in the same query.  The safest
and least invasive way to fix that is to make an extra tupdesc copy
to pass back.

While at it, I failed to resist the temptation to get rid of unnecessary
get_fn_expr_argtype() calls here and in populate_record_worker().

Per report from Dmitry Dolgov; thanks to Michael Paquier and
Andrew Gierth for investigation and discussion.

Discussion: https://postgr.es/m/CA+q6zcWzN9ztCfR47ZwgTr1KLnuO6BAY6FurxXhovP4hxr+yOQ@mail.gmail.com
2018-07-13 14:16:54 -04:00
Heikki Linnakangas 42f70cd9c3 Improve performance of tuple conversion map generation
Previously convert_tuples_by_name_map naively performed a search of each
outdesc column starting at the first column in indesc and searched each
indesc column until a match was found.  When partitioned tables had many
columns this could result in slow generation of the tuple conversion maps.
For INSERT and UPDATE statements that touched few rows, this could mean a
very large overhead indeed.

We can do a bit better with this loop.  It's quite likely that the columns
in partitioned tables and their partitions are in the same order, so it
makes sense to start searching for each column outer column at the inner
column position 1 after where the previous match was found (per idea from
Alexander Kuzmenkov). This makes the best case search O(N) instead of
O(N^2).  The worst case is still O(N^2), but it seems unlikely that would
happen.

Likewise, in the planner, make_inh_translation_list's search for the
matching column could often end up falling back on an O(N^2) type search.
This commit also improves that by first checking the column that follows
the previous match, instead of the column with the same attnum.  If we
fail to match here we fallback on the syscache's hashtable lookup.

Author: David Rowley
Reviewed-by: Alexander Kuzmenkov
Discussion: https://www.postgresql.org/message-id/CAKJS1f9-wijVgMdRp6_qDMEQDJJ%2BA_n%3DxzZuTmLx5Fz6cwf%2B8A%40mail.gmail.com
2018-07-13 19:54:05 +03:00
Tom Lane 130beba36d Fix inadequate buffer locking in FSM and VM page re-initialization.
When reading an existing FSM or VM page that was found to be corrupt by the
buffer manager, the code applied PageInit() to reinitialize the page, but
did so without any locking.  There is thus a hazard that two backends might
concurrently do PageInit, which in itself would still be OK, but the slower
one might then zero over subsequent data changes applied by the faster one.
Even that is unlikely to be fatal; but it's not desirable, so add locking
to prevent it.

This does not add any locking overhead in the normal code path where the
page is OK.  It's not immediately obvious that that's safe, but I believe
it is, for reasons explained in the added comments.

Problem noted by R P Asim.  It's been like this for a long time, so
back-patch to all supported branches.

Discussion: https://postgr.es/m/CANXE4Te4G0TGq6cr0-TvwP0H4BNiK_-hB5gHe8mF+nz0mcYfMQ@mail.gmail.com
2018-07-13 11:53:10 -04:00
Peter Eisentraut 3884072329 Prohibit transaction commands in security definer procedures
Starting and aborting transactions in security definer procedures
doesn't work.  StartTransaction() insists that the security context
stack is empty, so this would currently cause a crash, and
AbortTransaction() resets it.  This could be made to work by
reorganizing the code, but right now we just prohibit it.

Reported-by: amul sul <sulamul@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/CAAJ_b96Gupt_LFL7uNyy3c50-wbhA68NUjiK5%3DrF6_w%3Dpq_T%3DQ%40mail.gmail.com
2018-07-13 10:41:32 +02:00
Thomas Munro e8d9caa436 Accept invalidation messages in InitializeSessionUserId().
If the authentication method modified the system catalogs through a
separate database connection (say, to create a new role on the fly),
make sure syscache sees the changes before we try to find the user.

Author: Thomas Munro
Reviewed-by: Tom Lane, Andres Freund
Discussion: https://postgr.es/m/CAEepm%3D3_h0_cgmz5PMyab4xk_OFrg6G5VCN%3DnF4chFXM9iFOqA%40mail.gmail.com
2018-07-13 16:19:15 +12:00
Michael Paquier ce89ad0fa0 Fix argument of pg_create_logical_replication_slot for slot name
All attributes and arguments using a slot name map to the data type
"name", but this function has been using "text".  This is cosmetic, as
even if text is used then the slot name would be truncated to 64
characters anyway and stored as such.  The documentation already said
so and the function already assumed that the argument was of this type
when fetching its value.

Bump catalog version.

Author: Sawada Masahiko
Discussion: https://postgr.es/m/CAD21AoADYz_-eAqH5AVFaCaojcRgwpo9PW=u8kgTMys63oB8Cw@mail.gmail.com
2018-07-13 09:32:12 +09:00
Michael Paquier 5fc1008e8a Clean up temporary WAL segments after an instance crash
Temporary WAL segments are created in pg_wal and named as xlogtemp.pid
before being renamed to the real deal when creating a new segment.  If
an instance crashes after the temporary segment is created and before
the rename is done, then the server would finish with unremovable data.

After an instance crash, scan pg_wal and remove any such segments.  With
repetitive unlucky crashes this would contribute to disk bloat and
presents risks of ENOSPC especially with max_wal_size close to the
maximum allowed.

Author: Michael Paquier
Reviewed-by: Yugo Nagata, Heikki Linnakangas
Discussion: https://postgr.es/m/20180514054955.GF1528@paquier.xyz
2018-07-13 06:43:20 +09:00
Peter Eisentraut 5e6e2c8773 Reset shmem_exit_inprogress after shmem_exit()
In ad9a274778, shmem_exit_inprogress was
introduced.  But we need to reset it after shmem_exit(), because unlike
the similar proc_exit(), shmem_exit() can also be called for cleanup
when the process will not exit.

Reported-by: Andrew Gierth <andrew@tao11.riddles.org.uk>
2018-07-12 20:22:17 +02:00
Alvaro Herrera cd073d8f70 Fix FK checks of TRUNCATE involving partitioned tables
When truncating a table that is referenced by foreign keys in
partitioned tables, the check to ensure the referencing table are also
truncated spuriously failed.  This is because it was relying on
relhastriggers as a proxy for the table having FKs, and that's wrong for
partitioned tables.  Fix it to consider such tables separately.  There
may be a better way ... but this code is pretty inefficient already.

Author: Álvaro Herrera <alvherre@alvh.no-ip.org>
Reviewed-by: Michael Paquiër <michael@paquier.xyz>
Discussion: https://postgr.es/m/20180711000624.zmeizicibxeehhsg@alvherre.pgsql
2018-07-12 12:09:08 -04:00
Peter Eisentraut 8e599897ca Improve two error messages 2018-07-12 12:35:59 +02:00
Amit Kapila 40ca70ebcc Allow using the updated tuple while moving it to a different partition.
An update that causes the tuple to be moved to a different partition was
missing out on re-constructing the to-be-updated tuple, based on the latest
tuple in the update chain.  Instead, it's simply deleting the latest tuple
and inserting a new tuple in the new partition based on the old tuple.
Commit 2f17844104 didn't consider this case, so some of the updates were
getting lost.

In passing, change the argument order for output parameter in ExecDelete
and add some commentary about it.

Reported-by: Pavan Deolasee
Author: Amit Khandekar, with minor changes by me
Reviewed-by: Dilip Kumar, Amit Kapila and Alvaro Herrera
Backpatch-through: 11
Discussion: https://postgr.es/m/CAJ3gD9fRbEzDqdeDq1jxqZUb47kJn+tQ7=Bcgjc8quqKsDViKQ@mail.gmail.com
2018-07-12 12:51:39 +05:30
Michael Paquier edc6b41bd4 Rename VACOPT_NOWAIT to VACOPT_SKIP_LOCKED
When it comes to SELECT ... FOR or LOCK, NOWAIT means to not wait for
something to happen, and issue an error.  SKIP LOCKED means to not wait
for something to happen but to move on without issuing an error.  The
internal option of autovacuum and autoanalyze mentioned above, used only
when wraparound is not involved was named NOWAIT, but behaves like SKIP
LOCKED which is confusing.

Author: Nathan Bossart
Discussion: https://postgr.es/m/20180307050345.GA3095@paquier.xyz
2018-07-12 14:28:28 +09:00
Michael Paquier 6551f3daa2 Add assertion in expand_vacuum_rel() for non-autovacuum path
The code path where the assertion is added helps to check that
autovacuum always includes a relation OID when doing a vacuum on it.
Extracted from a larger patch set to add support for SKIP LOCKED with
manual VACUUM commands.

Author: Nathan Bossart
Discussion: https://postgr.es/m/9EF7EBE4-720D-4CF1-9D0E-4403D7E92990@amazon.com
2018-07-12 13:50:17 +09:00
Michael Paquier 9a7b7adc13 Make logical WAL sender report streaming state appropriately
WAL senders sending logically-decoded data fail to properly report in
"streaming" state when starting up, hence as long as one extra record is
not replayed, such WAL senders would remain in a "catchup" state, which
is inconsistent with the physical cousin.

This can be easily reproduced by for example using pg_recvlogical and
restarting the upstream server.  The TAP tests have been slightly
modified to detect the failure and strengthened so as future tests also
make sure that a node is in streaming state when waiting for its
catchup.

Backpatch down to 9.4 where this code has been introduced.

Reported-by: Sawada Masahiko
Author: Simon Riggs, Sawada Masahiko
Reviewed-by: Petr Jelinek, Michael Paquier, Vaishnavi Prabakaran
Discussion: https://postgr.es/m/CAD21AoB2ZbCCqOx=bgKMcLrAvs1V0ZMqzs7wBTuDySezTGtMZA@mail.gmail.com
2018-07-12 10:19:35 +09:00
Tom Lane 57cd2b6e6d Fix create_scan_plan's handling of sortgrouprefs for physical tlists.
We should only run apply_pathtarget_labeling_to_tlist if CP_LABEL_TLIST
was specified, because only in that case has use_physical_tlist checked
that the labeling will succeed; otherwise we may get an "ORDER/GROUP BY
expression not found in targetlist" error.  (This subsumes the previous
test about gating_clauses, because we reset "flags" to zero earlier
if there are gating clauses to apply.)

The only known case in which a failure can occur is with a ProjectSet
path directly atop a table scan path, although it seems likely that there
are other cases or will be such in future.  This means that the failure
is currently only visible in the v10 branch: 9.6 didn't have ProjectSet,
while in v11 and HEAD, apply_scanjoin_target_to_paths for some weird
reason is using create_projection_path not apply_projection_to_path,
masking the problem because there's a ProjectionPath in between.

Nonetheless this code is clearly wrong on its own terms, so back-patch
to 9.6 where this logic was introduced.

Per report from Regina Obe.

Discussion: https://postgr.es/m/001501d40f88$75186950$5f493bf0$@pcorp.us
2018-07-11 15:25:28 -04:00
Tom Lane ff4f889164 Fix bugs with degenerate window ORDER BY clauses in GROUPS/RANGE mode.
nodeWindowAgg.c failed to cope with the possibility that no ordering
columns are defined in the window frame for GROUPS mode or RANGE OFFSET
mode, leading to assertion failures or odd errors, as reported by Masahiko
Sawada and Lukas Eder.  In RANGE OFFSET mode, an ordering column is really
required, so add an Assert about that.  In GROUPS mode, the code would
work, except that the node initialization code wasn't in sync with the
execution code about when to set up tuplestore read pointers and spare
slots.  Fix the latter for consistency's sake (even though I think the
changes described below make the out-of-sync cases unreachable for now).

Per SQL spec, a single ordering column is required for RANGE OFFSET mode,
and at least one ordering column is required for GROUPS mode.  The parser
enforced the former but not the latter; add a check for that.

We were able to reach the no-ordering-column cases even with fully spec
compliant queries, though, because the planner would drop partitioning
and ordering columns from the generated plan if they were redundant with
earlier columns according to the redundant-pathkey logic, for instance
"PARTITION BY x ORDER BY y" in the presence of a "WHERE x=y" qual.
While in principle that's an optimization that could save some pointless
comparisons at runtime, it seems unlikely to be meaningful in the real
world.  I think this behavior was not so much an intentional optimization
as a side-effect of an ancient decision to construct the plan node's
ordering-column info by reverse-engineering the PathKeys of the input
path.  If we give up redundant-column removal then it takes very little
code to generate the plan node info directly from the WindowClause,
ensuring that we have the expected number of ordering columns in all
cases.  (If anyone does complain about this, the planner could perhaps
be taught to remove redundant columns only when it's safe to do so,
ie *not* in RANGE OFFSET mode.  But I doubt anyone ever will.)

With these changes, the WindowAggPath.winpathkeys field is not used for
anything anymore, so remove it.

The test cases added here are not actually very interesting given the
removal of the redundant-column-removal logic, but they would represent
important corner cases if anyone ever tries to put that back.

Tom Lane and Masahiko Sawada.  Back-patch to v11 where RANGE OFFSET
and GROUPS modes were added.

Discussion: https://postgr.es/m/CAD21AoDrWqycq-w_+Bx1cjc+YUhZ11XTj9rfxNiNDojjBx8Fjw@mail.gmail.com
Discussion: https://postgr.es/m/153086788677.17476.8002640580496698831@wrigleys.postgresql.org
2018-07-11 12:07:20 -04:00
Alexander Korotkov edf59c40dd Fix more wrong paths in header comments
It appears that there are more files, whose header comment paths are
wrong.  So, fix those paths.  No backpatching per proposal of Tom Lane.

Discussion: https://postgr.es/m/CAPpHfdsJyYbOj59MOQL%2B4XxdcomLSLfLqBtAvwR%2BpsCqj3ELdQ%40mail.gmail.com
2018-07-11 17:57:04 +03:00
Alvaro Herrera f2c587067a Rethink how to get float.h in old Windows API for isnan/isinf
We include <float.h> in every place that needs isnan(), because MSVC
used to require it.  However, since MSVC 2013 that's no longer necessary
(cf. commit cec8394b5c), so we can retire the inclusion to a
version-specific stanza in win32_port.h, where it doesn't need to
pollute random .c files.  The header is of course still needed in a few
places for other reasons.

I (Álvaro) removed float.h from a few more files than in Emre's original
patch.  This doesn't break the build in my system, but we'll see what
the buildfarm has to say about it all.

Author: Emre Hasegeli
Discussion: https://postgr.es/m/CAE2gYzyc0+5uG+Cd9-BSL7NKC8LSHLNg1Aq2=8ubjnUwut4_iw@mail.gmail.com
2018-07-11 09:11:48 -04:00
Alexander Korotkov a01d0fa1d8 Fix wrong file path in header comment
Header comment of shm_mq.c was mistakenly specifying path to shm_mq.h.
It was introduced in ec9037df.  So, theoretically it could be
backpatched to 9.4, but it doesn't seem to worth it.
2018-07-11 13:16:46 +03:00
Thomas Munro f98b8476cd Use signals for postmaster death on FreeBSD.
Use FreeBSD 11.2's new support for detecting parent process death to
make PostmasterIsAlive() very cheap, as was done for Linux in an
earlier commit.

Author: Thomas Munro
Discussion: https://postgr.es/m/7261eb39-0369-f2f4-1bb5-62f3b6083b5e@iki.fi
2018-07-11 13:14:07 +12:00
Thomas Munro 9f09529952 Use signals for postmaster death on Linux.
Linux provides a way to ask for a signal when your parent process dies.
Use that to make PostmasterIsAlive() very cheap.

Based on a suggestion from Andres Freund.

Author: Thomas Munro, Heikki Linnakangas
Reviewed-By: Michael Paquier
Discussion: https://postgr.es/m/7261eb39-0369-f2f4-1bb5-62f3b6083b5e%40iki.fi
Discussion: https://postgr.es/m/20180411002643.6buofht4ranhei7k%40alap3.anarazel.de
2018-07-11 12:47:06 +12:00
Michael Paquier 56a7147213 Block replication slot advance for these not yet reserving WAL
Such replication slots are physical slots freshly created without WAL
being reserved, which is the default behavior, which have not been used
yet as WAL consumption resources to retain WAL.  This prevents advancing
a slot to a position older than any WAL available, which could falsify
calculations for WAL segment recycling.

This also cleans up a bit the code, as ReplicationSlotRelease() would be
called on ERROR, and improves error messages.

Reported-by: Kyotaro Horiguchi
Author: Michael Paquier
Reviewed-by: Andres Freund, Álvaro Herrera, Kyotaro Horiguchi
Discussion: https://postgr.es/m/20180626071305.GH31353@paquier.xyz
2018-07-11 08:56:24 +09:00
Alvaro Herrera b6e3a3a492 Better handle pseudotypes as partition keys
We fail to handle polymorphic types properly when they are used as
partition keys: we were unnecessarily adding a RelabelType node on top,
which confuses code examining the nodes.  In particular, this makes
predtest.c-based partition pruning not to work, and ruleutils.c to emit
expressions that are uglier than needed.  Fix it by not adding RelabelType
when not needed.

In master/11 the new pruning code is separate so it doesn't suffer from
this problem, since we already fixed it (in essentially the same way) in
e5dcbb88a1, which also added a few tests; back-patch those tests to
pg10 also.  But since UPDATE/DELETE still uses predtest.c in pg11, this
change improves partitioning for those cases too.  Add tests for this.
The ruleutils.c behavior change is relevant in pg11/master too.

Co-authored-by: Amit Langote <Langote_Amit_f8@lab.ntt.co.jp>
Co-authored-by: Álvaro Herrera <alvherre@alvh.no-ip.org>
Reviewed-by: Álvaro Herrera <alvherre@alvh.no-ip.org>
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Discussion: https://postgr.es/m/54745d13-7ed4-54ac-97d8-ea1eec95ae25@lab.ntt.co.jp
2018-07-10 15:19:40 -04:00
Peter Eisentraut bcbd940806 Remove dynamic_shared_memory_type=none
PostgreSQL nowadays offers some kind of dynamic shared memory feature on
all supported platforms.  Having the choice of "none" prevents us from
relying on DSM in core features.  So this patch removes the choice of
"none".

Author: Kyotaro Horiguchi <horiguchi.kyotaro@lab.ntt.co.jp>
2018-07-10 18:35:24 +02:00
Tom Lane 0905fe8911 Avoid emitting a bogus WAL record when recycling an all-zero btree page.
Commit fafa374f2 caused _bt_getbuf() to possibly emit a WAL record for
a page that it was about to recycle.  However, it failed to distinguish
all-zero pages from dead pages, which is important because only the
latter have valid btpo.xact values, or indeed any special space at all.
Recycling an all-zero page with XLogStandbyInfoActive() enabled therefore
led to an Assert failure, or to emission of a WAL record containing a
bogus cutoff XID, which might lead to unnecessary query cancellations
on hot standby servers.

Per reports from Antonin Houska and 自己.  Amit Kapila was first to
propose this fix, and Robert Haas, myself, and Kyotaro Horiguchi
reviewed it at various times.

This is an old bug, so back-patch to all supported branches.

Discussion: https://postgr.es/m/2628.1474272158@localhost
Discussion: https://postgr.es/m/48875502.f4a0.1635f0c27b0.Coremail.zoulx1982@163.com
2018-07-09 19:26:19 -04:00
Alvaro Herrera a22445ff0b Flip argument order in XLogSegNoOffsetToRecPtr
Commit fc49e24fa6 added an input argument after the existing output
argument.  Flip those.

Author: Álvaro Herrera <alvherre@alvh.no-ip.org>
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/20180708182345.imdgovmkffgtihhk@alvherre.pgsql
2018-07-09 14:33:38 -04:00
Peter Eisentraut ec67b89816 Add UtilityReturnsTuples() support for CALL
This ensures that prepared statements for CALL can return tuples.
2018-07-09 13:58:08 +02:00
Michael Paquier cbc55da556 Rework order of end-of-recovery actions to delay timeline history write
A critical failure in some of the end-of-recovery actions before the
end-of-recovery record is written can cause PostgreSQL to react
inconsistently with the rest of the cluster in the event of a crash
before the final record is written.  Two such failures are for example
an error while processing a two-phase state files or when operating on
recovery.conf.  With this commit, the failures are still considered
FATAL, but the write of the timeline history file is delayed as much as
possible so as the window between the moment the file is written and the
end-of-recovery record is generated gets minimized. This way, in the
event of a crash or a failure, the new timeline decided at promotion
will not seem taken by other nodes in the cluster.  It is not really
possible to reduce to zero this window, hence one could still see
failures if a crash happens between the history file write and the
end-of-recovery record, so any future code should be careful when
adding new end-of-recovery actions.  The original report from Magnus
Hagander mentioned a renamed recovery.conf as original end-of-recovery
failure which caused a timeline to be seen as taken but the subsequent
processing on the now-missing recovery.conf cause the startup process to
issue stop on FATAL, which at follow-up startup made the system
inconsistent because of on-disk changes which already happened.

Processing of two-phase state files still needs some work as corrupted
entries are simply ignored now.  This is left as a future item and this
commit fixes the original complain.

Reported-by: Magnus Hagander
Author: Heikki Linnakangas
Reviewed-by: Alexander Korotkov, Michael Paquier, David Steele
Discussion: https://postgr.es/m/CABUevEz09XY2EevA2dLjPCY-C5UO4Hq=XxmXLmF6ipNFecbShQ@mail.gmail.com
2018-07-09 10:22:34 +09:00
Peter Geoghegan e915fed291 Correct obsolete unique index insertion comment.
Commit bc292937ae failed to update a comment about unique index
checking.  _bt_insertonpg() is no longer responsible for finding an
insertion location while preventing conflicting insertions.
2018-07-08 10:50:13 -07:00
Michael Paquier 677da8c15d Use access() to check file existence in GetNewRelFileNode()
Previous code used BasicOpenFile() and close() just to check for a file
collision, while there is no need to hold open a file descriptor but
that's an overkill here.

Author: Paul Guo
Reviewed-by: Peter Eisentraut, Michael Paquier
Discussion: https://postgr.es/m/CABQrizcUtiHaquxK=d4etBX8GF9kbZB50Nt1gO9_aN-e9SptyQ@mail.gmail.com
2018-07-08 18:53:20 +09:00
Peter Eisentraut 0903bbdad2 Add separate error message for procedure does not exist
While we probably don't want to split up all error messages into
function and procedure variants, this one is a very prominent one, so
it's helpful to be more specific here.
2018-07-07 11:17:04 +02:00
Peter Eisentraut 2e78c5b522 Fix assert in nested SQL procedure call
When executing CALL in PL/pgSQL, we need to set a snapshot before
invoking the to-be-called procedure.  Otherwise, the to-be-called
procedure might end up running without a snapshot.  For LANGUAGE SQL
procedures, this would result in an assertion failure.  (For most other
languages, this is usually not a problem, because those use SPI and SPI
sets snapshots in most cases.)  Setting the snapshot restores the
behavior of how CALL worked when it was handled as a generic SQL
statement in PL/pgSQL (exec_stmt_execsql()).

This change revealed another problem:  In SPI_commit(), we popped the
active snapshot before committing the transaction, to avoid "snapshot %p
still active" errors.  However, there is no particular reason why only
at most one snapshot should be on the stack.  So change this to pop all
active snapshots instead of only one.
2018-07-06 23:25:44 +02:00
Peter Eisentraut e34ec13620 Allow CALL with polymorphic type arguments
In order to be able to resolve polymorphic types, we need to set fn_expr
before invoking the procedure.
2018-07-06 22:40:16 +02:00
Alvaro Herrera 0ce5cf2ef2 Allow replication slots to be dropped in single-user mode
Starting with commit 9915de6c1c, replication slot drop uses a
condition variable sleep to wait until the current user of the slot goes
away.  This is more user friendly than the previous behavior of erroring
out if the slot is in use, but it fails with a not-for-user-consumption
error message in single-user mode; plus, if you're using single-user
mode because you don't want to start the server in the regular mode
(say, disk is full and WAL won't recycle because of the slot), it's
inconvenient.

Fix by skipping the cond variable sleep in single-user mode, since
there can't be anybody to wait for anyway.

Reported-by: tushar <tushar.ahuja@enterprisedb.com>
Author: Álvaro Herrera <alvherre@alvh.no-ip.org>
Discussion: https://postgr.es/m/3b2f809f-326c-38dd-7a9e-897f957a4eb1@enterprisedb.com
2018-07-06 16:38:30 -04:00
Andrew Dunstan 8fb68aa265 Print DEBUG2 like that rather than as DEBUG
DEBUG is an alias for DEBUG2, but we want DEBUG2 to show in the settings
no matter how it was spelled.

Takeshi Ideriha

Discussion: https://postgr.es/m/4E72940DA2BF16479384A86D54D0988A5678EC03@G01JPEXMBKW04
2018-07-06 07:29:12 -04:00
Alvaro Herrera 3ca966c06f logical decoding: beware of an unset specinsert change
Coverity complains that there is no protection in the code (at least in
non-assertion-enabled builds) against speculative insertion failing to
follow the expected protocol.  Add an elog(ERROR) for the case.
2018-07-05 17:42:37 -04:00
Michael Paquier 3c64dcb1e3 Prevent references to invalid relation pages after fresh promotion
If a standby crashes after promotion before having completed its first
post-recovery checkpoint, then the minimal recovery point which marks
the LSN position where the cluster is able to reach consistency may be
set to a position older than the first end-of-recovery checkpoint while
all the WAL available should be replayed.  This leads to the instance
thinking that it contains inconsistent pages, causing a PANIC and a hard
instance crash even if all the WAL available has not been replayed for
certain sets of records replayed.  When in crash recovery,
minRecoveryPoint is expected to always be set to InvalidXLogRecPtr,
which forces the recovery to replay all the WAL available, so this
commit makes sure that the local copy of minRecoveryPoint from the
control file is initialized properly and stays as it is while crash
recovery is performed.  Once switching to archive recovery or if crash
recovery finishes, then the local copy minRecoveryPoint can be safely
updated.

Pavan Deolasee has reported and diagnosed the failure in the first
place, and the base fix idea to rely on the local copy of
minRecoveryPoint comes from Kyotaro Horiguchi, which has been expanded
into a full-fledged patch by me.  The test included in this commit has
been written by Álvaro Herrera and Pavan Deolasee, which I have modified
to make it faster and more reliable with sleep phases.

Backpatch down to all supported versions where the bug appears, aka 9.3
which is where the end-of-recovery checkpoint is not run by the startup
process anymore.  The test gets easily supported down to 10, still it
has been tested on all branches.

Reported-by: Pavan Deolasee
Diagnosed-by: Pavan Deolasee
Reviewed-by: Pavan Deolasee, Kyotaro Horiguchi
Author: Michael Paquier, Kyotaro Horiguchi, Pavan Deolasee, Álvaro
Herrera
Discussion: https://postgr.es/m/CABOikdPOewjNL=05K5CbNMxnNtXnQjhTx2F--4p4ruorCjukbA@mail.gmail.com
2018-07-05 10:46:18 +09:00
Andres Freund 249126e761 Use context with correct lifetime in hypothetical_dense_rank_final.
The query lifetime expression context created in
hypothetical_dense_rank_final() was buggily allocated in the calling
memory context. I (Andres) broke that in bf6c614a2f.

Reported-By: Rajkumar Raghuwanshi
Author: Amit Langote
Discussion:  https://postgr.es/m/CAKcux6kmzWmur5HhA_aU6gYVFu0RLQdgJJ+aC9SLdcOvBSrpfA@mail.gmail.com
Backpatch: 11-
2018-07-04 17:36:01 -07:00
Andres Freund 3a01f68e35 Check for interrupts inside the nbtree page deletion code.
When deleting pages the nbtree code has to walk through siblings of a
tree node. When those sibling links are corrupted that can lead to
endless loops - which are currently not interruptible.  This is
especially problematic if autovacuum is repeatedly blocked on such
indexes, as it can be hard to get out of that situation without
resorting to single user mode.

Thus add interrupt checks to appropriate places in such
loops. Unfortunately in one of the cases it's it's not easy to do so.

Between 9.3 and 9.4 the page deletion (and page split) code changed
significantly. Before it was significantly less robust against
interruptions. Therefore don't backpatch to 9.3.

Author: Andres Freund
Discussion: https://postgr.es/m/20180627191629.wkunw2qbibnvlz53@alap3.anarazel.de
Backpatch: 9.4-
2018-07-04 14:58:25 -07:00
Fujii Masao b41669118c Improve the performance of relation deletes during recovery.
When multiple relations are deleted at the same transaction,
the files of those relations are deleted by one call to smgrdounlinkall(),
which leads to scan whole shared_buffers only one time. OTOH,
previously, during recovery, smgrdounlink() (not smgrdounlinkall()) was
called for each file to delete, which led to scan shared_buffers
multiple times. Obviously this could cause to increase the WAL replay
time very much especially when shared_buffers was huge.

To alleviate this situation, this commit changes the recovery so that
it also calls smgrdounlinkall() only one time to delete multiple
relation files.

This is just fix for oversight of commit 279628a0a7, not new feature.
So, per discussion on pgsql-hackers, we concluded to backpatch this
to all supported versions.

Author: Fujii Masao
Reviewed-by: Michael Paquier, Andres Freund, Thomas Munro, Kyotaro Horiguchi, Takayuki Tsunakawa
Discussion: https://postgr.es/m/CAHGQGwHVQkdfDqtvGVkty+19cQakAydXn1etGND3X0PHbZ3+6w@mail.gmail.com
2018-07-05 02:23:46 +09:00
Michael Paquier fc057b2b8f Remove dead code for temporary relations in partition planning
Since recent commit 1c7c317c, temporary relations cannot be mixed with
permanent relations within the same partition tree, and the same counts
for temporary relations created by other sessions, which the planner
simply discarded.  Instead be paranoid and issue an error, as those
should be blocked at definition time, at least for now.

At the same time, a test case is added to stress what has been moved
when expand_partitioned_rtentry gets called recursively but bumps on a
partitioned relation with no partitions which should be handled the same
way as the non-inheritance case.  This code may be reworked in a close
future, and covering this code path will limit surprises.

Reported-by: David Rowley
Author: David Rowley
Reviewed-by: Amit Langote, Robert Haas, Michael Paquier
Discussion: https://postgr.es/m/CAKJS1f_HyV1txn_4XSdH5EOhBMYaCwsXyAj6bHXk9gOu4JKsbw@mail.gmail.com
2018-07-04 10:37:40 +09:00
Michael Paquier c55de5e512 Add wait event for fsync of WAL segments
This has been visibly a forgotten spot in the first implementation of
wait events for I/O added by 249cf07, and what has been missing is a
fsync call for WAL segments which is a wrapper reacting on the value of
GUC wal_sync_method.

Reported-by: Konstantin Knizhnik
Author: Konstantin Knizhnik
Reviewed-by: Craig Ringer, Michael Paquier
Discussion: https://postgr.es/m/4a243897-0ad8-f471-aa40-242591f2476e@postgrespro.ru
2018-07-02 22:19:46 +09:00
Michael Paquier c072e80337 Correct function name in comment of logical decoding code
Reported-by: Dave Cramer
Author: Euler Taveira
Discussion: https://postgr.es/m/CADK3HHKnPGJDLhjOFBY6+70Wd14iEH8c2GKw7UrOuUHp_GNFrA@mail.gmail.com
2018-07-02 13:30:12 +09:00
Andrew Dunstan 1e9c858090 pgindent run prior to branching 2018-06-30 12:25:49 -04:00
Alvaro Herrera 41372071df Fix crash when ALTER TABLE recreates indexes on partitions
The skip_build flag was not being passed correctly when recursing to
indexes on partitions, leading to attempts to rebuild indexes when they
were not yet ready to be rebuilt.

Reported-by: Rajkumar Raghuwanshi
Discussion: https://postgr.es/m/CAKcux6mxNCGsgATwf5CGMF8g4WSupCXicCVMeKUTuWbyxHOMsQ@mail.gmail.com
2018-06-29 11:49:30 -04:00
Michael Paquier dad5f8a3d5 Make capitalization of term "OpenSSL" more consistent
This includes code comments and documentation.  No backpatch as this is
cosmetic even if there are documentation changes which are user-facing.

Author: Daniel Gustafsson
Discussion: https://postgr.es/m/BB89928E-2BC7-489E-A5E4-6D204B3954CF@yesql.se
2018-06-29 09:45:44 +09:00
Amit Kapila 2e61c50785 Fix thinko in comments.
A slot can not be stored in a tuple but it's vice versa.

Reported-by: Ashutosh Bapat
Author: Ashutosh Bapat
Reviewed-by: Amit Kapila
Discussion: https://postgr.es/m/CAFjFpRcHhNhXdegyJv3KKDWrwO1_NB_KYZM_ZSDeMOZaL1A5jQ@mail.gmail.com
2018-06-27 18:05:24 +05:30
Andres Freund 986070872f Remove duplicated return statement from llvmjit code.
The duplicated return clearly doesn't make sense / isn't
reachable. Likely introduced by me (Andres), while revising the code.

Author: Rushabh Lathia
Discussion: https://postgr.es/m/CAGPqQf2raxWOcbuTP36M1rEF3=Rfo7oD29K3psdyHMeE5swBRg@mail.gmail.com
2018-06-26 23:16:50 -07:00
Amit Kapila 8121ab88e7 Cosmetic improvements for faster column addition.
Changed the name of few structure members for the sake of clarity and
removed spurious whitespace.

Reported-by: Amit Kapila
Author: Amit Kapila, based on suggestion by Andrew Dunstan
Reviewed-by: Alvaro Herrera
Discussion: https://postgr.es/m/CAA4eK1K2znsFpC+NQ9A4vxT4uDxADN4RmvHX0L6Y=aHVo9gB4Q@mail.gmail.com
2018-06-27 08:16:13 +05:30
Alvaro Herrera f49a80c481 Fix "base" snapshot handling in logical decoding
Two closely related bugs are fixed.  First, xmin of logical slots was
advanced too early.  During xl_running_xacts processing, xmin of the
slot was set to the oldest running xid in the record, but that's wrong:
actually, snapshots which will be used for not-yet-replayed transactions
might consider older txns as running too, so we need to keep xmin back
for them.  The problem wasn't noticed earlier because DDL which allows
to delete tuple (set xmax) while some another not-yet-committed
transaction looks at it is pretty rare, if not unique: e.g. all forms of
ALTER TABLE which change schema acquire ACCESS EXCLUSIVE lock
conflicting with any inserts. The included test case (test_decoding's
oldest_xmin) uses ALTER of a composite type, which doesn't have such
interlocking.

To deal with this, we must be able to quickly retrieve oldest xmin
(oldest running xid among all assigned snapshots) from ReorderBuffer. To
fix, add another list of ReorderBufferTXNs to the reorderbuffer, where
transactions are sorted by base-snapshot-LSN.  This is slightly
different from the existing (sorted by first-LSN) list, because a
transaction can have an earlier LSN but a later Xmin, if its first
record does not obtain an xmin (eg. xl_xact_assignment).  Note this new
list doesn't fully replace the existing txn list: we still need that one
to prevent WAL recycling.

The second issue concerns SnapBuilder snapshots and subtransactions.
SnapBuildDistributeNewCatalogSnapshot never assigned a snapshot to a
transaction that is known to be a subtxn, which is good in the common
case that the top-level transaction already has one (no point in doing
so), but a bug otherwise.  To fix, arrange to transfer the snapshot from
the subtxn to its top-level txn as soon as the kinship gets known.
test_decoding's snapshot_transfer verifies this.

Also, fix a minor memory leak: refcount of toplevel's old base snapshot
was not decremented when the snapshot is transferred from child.

Liberally sprinkle code comments, and rewrite a few existing ones.  This
part is my (Álvaro's) contribution to this commit, as I had to write all
those comments in order to understand the existing code and Arseny's
patch.

Reported-by: Arseny Sher <a.sher@postgrespro.ru>
Diagnosed-by: Arseny Sher <a.sher@postgrespro.ru>
Co-authored-by: Arseny Sher <a.sher@postgrespro.ru>
Co-authored-by: Álvaro Herrera <alvherre@alvh.no-ip.org>
Reviewed-by: Antonin Houska <ah@cybertec.at>
Discussion: https://postgr.es/m/87lgdyz1wj.fsf@ars-thinkpad
2018-06-26 16:48:10 -04:00
Alexander Korotkov 4d54543efa Fix upper limit for vacuum_cleanup_index_scale_factor
6ca33a88 sets upper limit for vacuum_cleanup_index_scale_factor to
DBL_MAX.  DBL_MAX appears to be platform-dependent. That causes
many buildfarm animals to fail, because we check boundaries of
vacuum_cleanup_index_scale_factor in regression tests.

This commit changes upper limit from DBL_MAX to just "large enough"
limit, which was arbitrary selected as 1e10.

Author: Alexander Korotkov
Reported-by: Tom Lane, Darafei Praliaskouski
Discussion: https://postgr.es/m/CAPpHfdvewmr4PcpRjrkstoNn1n2_6dL-iHRB21CCfZ0efZdBTg%40mail.gmail.com
Discussion: https://postgr.es/m/CAC8Q8tLYFOpKNaPS_E7V8KtPdE%3D_TnAn16t%3DA3LuL%3DXjfOO-BQ%40mail.gmail.com
2018-06-26 21:55:59 +03:00
Peter Geoghegan aefb0a382c Correct a comment on logtape.c's leader tape.
randomAccess parallel tuplesorts are disallowed because the leader would
try to write to its own leader tape, not because the leader would try to
write to a worker tape directly.

Cleanup from commit 9da0cc3528.
2018-06-26 11:16:20 -07:00
Peter Geoghegan cdc2693a11 Remove obsolete comment block in nbtsort.c.
Building a new nbtree index through incremental insertions would always
be slower than our actual approach of sorting using tuplesort,
assembling leaf pages from tuplesort output, and writing and WAL-logging
whole pages.  Remove a comment block from the Berkeley days claiming
that incremental insertions might be slightly faster with presorted
input.

Discussion: https://postgr.es/m/CAH2-WzmKs4mLAoFgJ3yHMRYc849efc=dw+pNRb3NEog2oJoCNw@mail.gmail.com
2018-06-26 10:08:44 -07:00
Alvaro Herrera 040da42367 Enable failure to rename a partitioned index
Concurrently with partitioned index development (commit 8b08f7d482),
the code to handle failure to rename indexes was refactored (commit
8b9e9644dc).  Turns out that that particular case was untested, which
naturally led it to be broken.  Add tests and the missing code line.

Co-authored-by: David Rowley <dgrowley@gmail.com>
Co-authored-by: Álvaro Herrera <alvherre@alvh.no-ip.org>
Reported-by: Rajkumar Raghuwanshi <rajkumar.raghuwanshi@enterprisedb.com>
Discussion: https://postgr.es/m/CAKcux6mfYMS3OX0ywjOiWiGSEKhJf-1zdeTceHFbd0mScUzU5A@mail.gmail.com
2018-06-26 11:54:45 -04:00
Alvaro Herrera 7d872c91a3 Allow direct lookups of AppendRelInfo by child relid
find_appinfos_by_relids had quite a large overhead when the number of
items in the append_rel_list was high, as it had to trawl through the
append_rel_list looking for AppendRelInfos belonging to the given
childrelids.  Since there can only be a single AppendRelInfo for each
child rel, it seems much better to store an array in PlannerInfo which
indexes these by child relid, making the function O(1) rather than O(N).
This function was only called once inside the planner, so just replace
that call with a lookup to the new array.  find_childrel_appendrelinfo
is now unused and thus removed.

This fixes a planner performance regression new to v11 reported by
Thomas Reiss.

Author: David Rowley
Reported-by: Thomas Reiss
Reviewed-by: Ashutosh Bapat
Reviewed-by: Álvaro Herrera
Discussion: https://postgr.es/m/94dd7a4b-5e50-0712-911d-2278e055c622@dalibo.com
2018-06-26 10:35:26 -04:00
Alexander Korotkov 6ca33a885b Increase upper limit for vacuum_cleanup_index_scale_factor
Upper limits for vacuum_cleanup_index_scale_factor GUC and reloption
were initially set to 100.0 in 857f9c36.  However, after further
discussion, it appears that some users like to disable B-tree cleanup
index scan completely (assuming there are no deleted pages).

vacuum_cleanup_index_scale_factor is used barely to protect against
stalled index statistics.  And after detailed consideration it appears
that risk of stalled index statistics is low.  And it would be nice to
allow advanced users setting higher values of
vacuum_cleanup_index_scale_factor.  So, set upper limit for these
GUC and reloption to DBL_MAX.

Author: Alexander Korotkov
Reviewed-by: Masahiko Sawada
Discussion: https://postgr.es/m/CAC8Q8tJCb%3DgxhzcV7T6ctx7PY-Ux1oA-AsTJc6cAVNsQiYcCzA%40mail.gmail.com
2018-06-26 15:00:51 +03:00
Thomas Munro a40cff8956 Move RecoveryLockList into a hash table.
Standbys frequently need to release all locks held by a given xid.
Instead of searching one big list linearly, let's create one list
per xid and put them in a hash table, so we can find what we need
in O(1) time.

Earlier analysis and a prototype were done by David Rowley, though
this isn't his patch.

Back-patch all the way.

Author: Thomas Munro
Diagnosed-by: David Rowley, Andres Freund
Reviewed-by: Andres Freund, Tom Lane, Robert Haas
Discussion: https://postgr.es/m/CAEepm%3D1mL0KiQ2KJ4yuPpLGX94a4Ns_W6TL4EGRouxWibu56pA%40mail.gmail.com
Discussion: https://postgr.es/m/CAKJS1f9vJ841HY%3DwonnLVbfkTWGYWdPN72VMxnArcGCjF3SywA%40mail.gmail.com
2018-06-26 18:45:45 +12:00
Alvaro Herrera 322548a8ab Update obsolete comments
Commit 9fab40ad32 removed some pre-allocating logic in
reorderbuffer.c, but left outdated comments in place.  Repair.

Author: Álvaro Herrera
2018-06-25 15:45:48 -04:00
Peter Eisentraut 299addd592 Translation updates
Source-Git-URL: https://git.postgresql.org/git/pgtranslation/messages.git
Source-Git-Hash: 884f33d735870f94357820800840af3e93ff4628
2018-06-25 12:37:18 +02:00
Michael Paquier 6cb3372411 Address set of issues with errno handling
System calls mixed up in error code paths are causing two issues which
several code paths have not correctly handled:
1) For write() calls, sometimes the system may return less bytes than
what has been written without errno being set.  Some paths were careful
enough to consider that case, and assumed that errno should be set to
ENOSPC, other calls missed that.
2) errno generated by a system call is overwritten by other system calls
which may succeed once an error code path is taken, causing what is
reported to the user to be incorrect.

This patch uses the brute-force approach of correcting all those code
paths.  Some refactoring could happen in the future, but this is let as
future work, which is not targeted for back-branches anyway.

Author: Michael Paquier
Reviewed-by: Ashutosh Sharma
Discussion: https://postgr.es/m/20180622061535.GD5215@paquier.xyz
2018-06-25 11:19:05 +09:00
Alvaro Herrera 475be5e790 When index recurses to a partition, map columns numbers
Two out of three code paths were mapping column numbers correctly if a
partition had different column numbers than parent table, but the most
commonly used one (recursing in CREATE INDEX to a new index on a
partition) failed to map attribute numbers in expressions.  Oddly
enough, attnums in WHERE clauses are already handled correctly
everywhere.

Reported-by: Amit Langote
Author: Amit Langote
Discussion: https://postgr.es/m/dce1fda4-e0f0-94c9-6abb-f5956a98c057@lab.ntt.co.jp
Reviewed-by: Álvaro Herrera
2018-06-22 16:45:48 -04:00
Robert Haas c6f28af5d7 Avoid generating bogus paths with partitionwise aggregate.
Previously, if some or all partitions had no partially aggregated path,
we would still try to generate a partially aggregated path for the
parent, leading to assertion failures or wrong answers.

Report by Rajkumar Raghuwanshi.  Patch by Jeevan Chalke, reviewed
by Ashutosh Bapat.  A few changes by me.

Discussion: http://postgr.es/m/CAKcux6=q4+Mw8gOOX16ef6ZMFp9Cve7KWFstUsrDa4GiFaXGUQ@mail.gmail.com
2018-06-22 09:20:19 -04:00
Andrew Dunstan 2448adf29c Allow for pg_upgrade of attributes with missing values
Commit 16828d5c02 neglected to do this, so upgraded databases would
silently get null instead of the specified default in rows without the
attribute defined.

A new binary upgrade function is provided to perform this and pg_dump is
adjusted to output a call to the function if required in binary upgrade
mode.

Also included is code to drop missing attribute values for dropped
columns. That way if the type is later dropped the missing value won't
have a dangling reference to the type.

Finally the regression tests are adjusted to ensure that there is a row
with a missing value so that this code is exercised in upgrade testing.

Catalog version unfortunately bumped.

Regression test changes from Tom Lane.
Remainder from me, reviewed by Tom Lane, Andres Freund, Alvaro Herrera

Discussion: https://postgr.es/m/19987.1529420110@sss.pgh.pa.us
2018-06-22 08:42:36 -04:00
Alexander Korotkov 9a994e37e0 Fixes for vacuum_cleanup_index_scale_factor GUC option
vacuum_cleanup_index_scale_factor was located in autovacuum group of
GUCs.  However, it affects not only autovacuum, but also manually run
VACUUM.  It appears that "client connection defaults" group of GUCs
is more appropriate for vacuum_cleanup_index_scale_factor, because
vacuum_*_age options are already located there.

Also, vacuum_cleanup_index_scale_factor was missed in
postgresql.conf.sample.  So, add it there with appropriate comment.

Author: Masahiko Sawada with minor editorization by me
Discussion: https://postgr.es/m/CAD21AoArsoXMLKudXSKN679FRzs6oubEchM53bHwn8Tp%3D2boNg%40mail.gmail.com
2018-06-22 12:26:21 +03:00
Michael Paquier 0aa5e65ab4 Fix typo in comment of commit_ts.c for incorrect reference to CLOG
Author: Shao Bret
2018-06-22 13:30:26 +09:00
Amit Kapila 98d476a965 Improve coding pattern in Parallel Append code.
The create_append_path code didn't consider that list_concat will
modify it's first argument leading to inconsistent traversal of
resulting list.  In practice, it won't lead to any user-visible bug
but changing it for making the code behave consistently.

Reported-by: Tom Lane
Author: Tom Lane
Reviewed-by: Amit Khandekar and Amit Kapila
Discussion: https://postgr.es/m/32365.1528994120@sss.pgh.pa.us
2018-06-22 08:43:36 +05:30
Tom Lane ec4719cd15 Fix partial aggregation for variance(int4) and related aggregates.
A typo in numeric_poly_combine caused bogus results for queries using
it, but of course would only manifest if parallel aggregation is
performed.  Reported by Rajkumar Raghuwanshi.

David Rowley did the diagnosis and the fix; I editorialized rather
heavily on his regression test additions.

Back-patch to v10 where the breakage was introduced (by 9cca11c91).

Discussion: https://postgr.es/m/CAKcux6nU4E2x8nkSBpLOT2DPvQ5LviJ3SGyAN6Sz7qDH4G4+Pw@mail.gmail.com
2018-06-21 16:18:39 -04:00
Alvaro Herrera e474c2b7e4 Set correct context for XPath evaluation
According to the SQL standard, the context of XMLTABLE's XPath
row_expression is the document node of the XML input document, not the
root node.  This becomes visible when a relative path rather than
absolute is used as row expression.  Absolute paths is what was used in
original tests and docs (and the most common form used in examples
throughout the interwebs), which explains why this wasn't noticed
before.

Other functions such as xpath() and xpath_exists() also have this
problem.  While not specified by the SQL standard, it would be pretty
odd to leave those functions to behave differently than XMLTABLE, so
change them too.  However, this is a backwards-incompatible change.

No backpatch, out of fear of breaking code depending on the original
broken behavior.

Author: Markus Winand
Reported-By: Markus Winand
Reviewed-by: Álvaro Herrera
Discussion: https://postgr.es/m/0684A598-002C-42A2-AE12-F024A324EAE4@winand.at
2018-06-21 15:56:11 -04:00
Tom Lane 07e5a21352 Fix mishandling of sortgroupref labels while splitting SRF targetlists.
split_pathtarget_at_srfs() neglected to worry about sortgroupref labels
in the intermediate PathTargets it constructs.  I think we'd supposed
that their labeling didn't matter, but it does at least for the case that
GroupAggregate/GatherMerge nodes appear immediately under the ProjectSet
step(s).  This results in "ERROR: ORDER/GROUP BY expression not found in
targetlist" during create_plan(), as reported by Rajkumar Raghuwanshi.

To fix, make this logic track the sortgroupref labeling of expressions,
not just their contents.  This also restores the pre-v10 behavior that
separate GROUP BY expressions will be kept distinct even if they are
textually equal().

Discussion: https://postgr.es/m/CAKcux6=1_Ye9kx8YLBPmJs_xE72PPc6vNi5q2AOHowMaCWjJ2w@mail.gmail.com
2018-06-21 10:58:42 -04:00
Alvaro Herrera b7f0be9a7e Accept TEXT and CDATA nodes in XMLTABLE's column_expression.
Column expressions that match TEXT or CDATA nodes must return the
contents of the nodes themselves, not the content of non-existing
children (i.e. the empty string).

Author: Markus Winand
Reported-by: Markus Winand
Reviewed-by: Álvaro Herrera
Discussion: https://postgr.es/m/0684A598-002C-42A2-AE12-F024A324EAE4@winand.at
2018-06-20 12:58:12 -04:00
Alvaro Herrera 8f97af60d1 Consistently use the term 'partitioned rel' in partprune comments
We were using 'partition rel' in a few places, which is quite confusing.

Author: Amit Langote
Reviewed-by: David Rowley
Reviewed-by: Michaël Paquier
Discussion: https://postgr.es/m/fd256561-31a2-4b7e-cd84-d8241e7ebc3f@lab.ntt.co.jp
2018-06-20 11:43:01 -04:00
Amit Kapila 403318b71f Don't consider parallel append for parallel unsafe paths.
Commit ab72716778 allowed Parallel Append paths to be generated for a
relation that is not parallel safe.  Prevent that from happening.

Initial analysis by Tom Lane.

Reported-by: Rajkumar Raghuwanshi
Author: Amit Kapila and Rajkumar Raghuwanshi
Reviewed-by: Amit Khandekar and Robert Haas
Discussion:https://postgr.es/m/CAKcux6=tPJ6nJ08r__nU_pmLQiC0xY15Fn0HvG1Cprsjdd9s_Q@mail.gmail.com
2018-06-20 07:51:42 +05:30
Michael Paquier 1c7c317cd9 Clarify use of temporary tables within partition trees
Since their introduction, partition trees have been a bit lossy
regarding temporary relations.  Inheritance trees respect the following
patterns:
1) a child relation can be temporary if the parent is permanent.
2) a child relation can be temporary if the parent is temporary.
3) a child relation cannot be permanent if the parent is temporary.
4) The use of temporary relations also imply that when both parent and
child need to be from the same sessions.

Partitions share many similar patterns with inheritance, however the
handling of the partition bounds make the situation a bit tricky for
case 1) as the partition code bases a lot of its lookup code upon
PartitionDesc which does not really look after relpersistence.  This
causes for example a temporary partition created by session A to be
visible by another session B, preventing this session B to create an
extra partition which overlaps with the temporary one created by A with
a non-intuitive error message.  There could be use-cases where mixing
permanent partitioned tables with temporary partitions make sense, but
that would be a new feature.  Partitions respect 2), 3) and 4) already.

It is a bit depressing to see those error checks happening in
MergeAttributes() whose purpose is different, but that's left as future
refactoring work.

Back-patch down to 10, which is where partitioning has been introduced,
except that default partitions do not apply there.  Documentation also
includes limitations related to the use of temporary tables with
partition trees.

Reported-by: David Rowley
Author: Amit Langote, Michael Paquier
Reviewed-by: Ashutosh Bapat, Amit Langote, Michael Paquier
Discussion: https://postgr.es/m/CAKJS1f94Ojk0og9GMkRHGt8wHTW=ijq5KzJKuoBoqWLwSVwGmw@mail.gmail.com
2018-06-20 10:42:25 +09:00
Tom Lane 45e98ee730 Remove obsolete prohibition on function name matching a column name.
ProcedureCreate formerly threw an error if the function to be created
has one argument of composite type and the function name matches some
column of the composite type.  This was a (very non-bulletproof) defense
against creating situations where f(x) and x.f are ambiguous.  But we
don't really need such a defense in the wake of commit b97a3465d, which
allows us to deal with such situations fairly cleanly.  This behavior
also created a dump-and-reload hazard, since a function might be
rejected if a conflicting column name had been added to the input
composite type later.  Hence, let's just drop the check.

Discussion: https://postgr.es/m/CAOW5sYa3Wp7KozCuzjOdw6PiOYPi6D=VvRybtH2S=2C0SVmRmA@mail.gmail.com
2018-06-18 11:57:33 -04:00
Tom Lane b97a3465d7 Consider syntactic form when disambiguating function vs column reference.
Postgres has traditionally considered the syntactic forms f(x) and x.f
to be equivalent, allowing tricks such as writing a function and then
using it as though it were a computed-on-demand column.  However, our
behavior when both interpretations are feasible left something to be
desired: we always chose the column interpretation.  This could lead
to very surprising results, as in a recent bug report from Neil Conway.
It also created a dump-and-reload hazard, since what was a function
call in a dumped view could get interpreted as a column reference
at reload, if a matching column name had been added to the underlying
table since the view was created.

What seems better, in ambiguous situations, is to prefer the choice
matching the syntactic form of the reference.  This seems much less
astonishing in general, and it fixes the dump/reload hazard.

Although this could be called a bug fix, there have been few complaints
and there's some small risk of breaking applications that depend on the
old behavior, so no back-patch.  It does seem reasonable to slip it
into v11, though.

Discussion: https://postgr.es/m/CAOW5sYa3Wp7KozCuzjOdw6PiOYPi6D=VvRybtH2S=2C0SVmRmA@mail.gmail.com
2018-06-18 11:39:33 -04:00
Michael Paquier 70b4f82a4b Prevent hard failures of standbys caused by recycled WAL segments
When a standby's WAL receiver stops reading WAL from a WAL stream, it
writes data to the current WAL segment without having priorily zero'ed
the page currently written to, which can cause the WAL reader to read
junk data from a past recycled segment and then it would try to get a
record from it.  While sanity checks in place provide most of the
protection needed, in some rare circumstances, with chances increasing
when a record header crosses a page boundary, then the startup process
could fail violently on an allocation failure, as follows:
FATAL:  invalid memory alloc request size XXX

This is confusing for the user and also unhelpful as this requires in
the worst case a manual restart of the instance, impacting potentially
the availability of the cluster, and this also makes WAL data look like
it is in a corrupted state.

The chances of seeing failures are higher if the connection between the
standby and its root node is unstable, causing WAL pages to be written
in the middle.  A couple of approaches have been discussed, like
zero-ing  new WAL pages within the WAL receiver itself but this has the
disadvantage of impacting performance of any existing instances as this
breaks the sequential writes done by the WAL receiver.  This commit
deals with the problem with a more simple approach, which has no
performance impact without reducing the detection of the problem: if a
record is found with a length higher than 1GB for backends, then do not
try any allocation and report a soft failure which will force the
standby to retry reading WAL.  It could be possible that the allocation
call passes and that an unnecessary amount of memory is allocated,
however follow-up checks on records would just fail, making this
allocation short-lived anyway.

This patch owes a great deal to Tsunakawa Takayuki for reporting the
failure first, and then discussing a couple of potential approaches to
the problem.

Backpatch down to 9.5, which is where palloc_extended has been
introduced.

Reported-by: Tsunakawa Takayuki
Reviewed-by: Tsunakawa Takayuki
Author: Michael Paquier
Discussion: https://postgr.es/m/0A3221C70F24FB45833433255569204D1F8B57AD@G01JPEXMBYT05
2018-06-18 10:43:27 +09:00
Tom Lane 9b53d96684 Suppress -Wshift-negative-value warnings.
Clean up four places that result in compiler warnings when using recent
gcc with this warning class enabled (as seen on buildfarm members
calliphoridae, skink, and others).  In all these places, this is purely
cosmetic, because the shift distance could not be large enough to risk
a change of sign, so there's no chance of implementation-dependent
behavior.  Still, it's easy enough to avoid the warning by casting the
shifted value to unsigned, so let's do that.

Patch HEAD only, this isn't worth a back-patch.
2018-06-17 16:15:11 -04:00
Tom Lane 0dcf68e5a1 Fix some minor error-checking oversights in ParseFuncOrColumn().
Recent additions to ParseFuncOrColumn to make it reject non-procedure
functions in CALL were neither adequate nor documented.  Reorganize
the code to ensure uniform results for all the cases that should be
rejected.  Also, use ERRCODE_WRONG_OBJECT_TYPE for this case as well
as the converse case of a procedure in a non-CALL context.  The
original coding used ERRCODE_UNDEFINED_FUNCTION which seems wrong,
and is certainly inconsistent with the adjacent wrong-kind-of-routine
errors.

This reorganization also causes the checks for aggregate decoration with
a non-aggregate function to be made in the FUNCDETAIL_COERCION case;
that they were not is a long-standing oversight.

Discussion: https://postgr.es/m/14497.1529089235@sss.pgh.pa.us
2018-06-16 14:11:14 -04:00
Simon Riggs 15378c1a15 Remove AELs from subxids correctly on standby
Issues relate only to subtransactions that hold AccessExclusiveLocks
when replayed on standby.

Prior to PG10, aborting subtransactions that held an
AccessExclusiveLock failed to release the lock until top level commit or
abort. 49bff5300d fixed that.

However, 49bff5300d also introduced a similar bug where subtransaction
commit would fail to release an AccessExclusiveLock, leaving the lock to
be removed sometimes early and sometimes late. This commit fixes
that bug also. Backpatch to PG10 needed.

Tested by observation. Note need for multi-node isolationtester to improve
test coverage for this and other HS cases.

Reported-by: Simon Riggs
Author: Simon Riggs
2018-06-16 14:03:29 +01:00
Tatsuo Ishii 1cfdb1cb0e Fix memory leak in BufFileCreateShared().
Also this commit unifies some duplicated code in makeBufFile() and
BufFileOpenShared() into new function makeBufFileCommon().

Author: Antonin Houska
Reviewed-By: Thomas Munro, Tatsuo Ishii
Discussion: https://postgr.es/m/16139.1529049566%40localhost
2018-06-16 14:21:08 +09:00
Alvaro Herrera ff03112bdc Fix off-by-one bug in XactLogCommitRecord
Commit 1eb6d6527a introduced zeroed alignment bytes in the GID field
of commit/abort WAL records.  Fixup commit cf5a189059 later changed
that representation into a regular cstring with a single terminating
zero byte, but it also introduced an off-by-one mistake.  Fix that.

Author: Nikhil Sontakke
Reported-by: Nikhil Sontakke
Discussion: https://postgr.es/m/CAMGcDxey6dG1DP34_tJMoWPcp5sPJUAL4K5CayUUXLQSx2GQpA@mail.gmail.com
2018-06-15 15:00:41 -04:00
Tatsuo Ishii 969274d813 Fix memory leak.
Memory is allocated twice for "file" and "files" variables in
BufFileOpenShared().

Author: Antonin Houska
Discussion: https://postgr.es/m/11329.1529045692%40localhost
2018-06-15 16:32:59 +09:00
Alvaro Herrera 74da7cda31 Fail BRIN control functions during recovery explicitly
They already fail anyway, but prior to this patch they raise an ugly
error message about a lock that cannot be acquired.  This just improves
the message.

Author: Masahiko Sawada
Reported-by: Masahiko Sawada
Discussion: https://postgr.es/m/CAD21AoBZau4g4_NUf3BKNd=CdYK+xaPdtJCzvOC1TxGdTiJx_Q@mail.gmail.com
Reviewed-by: Kuntal Ghosh, Alexander Korotkov, Simon Riggs, Michaël Paquier, Álvaro Herrera
2018-06-14 12:51:32 -04:00
Simon Riggs dc878ffedf Remove spurious code comments in standby related code
GetRunningTransactionData() suggested that subxids were not worth
optimizing away if overflowed, yet they have already been removed
for that case.

Changes to LogAccessExclusiveLock() API forgot to remove the
prior comment when it was copied to LockAcquire().
2018-06-14 12:17:51 +01:00
Simon Riggs 802bde87ba Remove cut-off bug from RunningTransactionData
32ac7a118f tried to fix a Hot Standby issue
reported by Greg Stark, but in doing so caused
a different bug to appear, noted by Andres Freund.

Revoke the core changes from 32ac7a118f,
leaving in its place a minor change in code
ordering and comments to explain for the future.
2018-06-14 12:02:41 +01:00
Tom Lane 91781335ed Code review for match_clause_to_partition_key().
Fix inconsistent decisions about NOMATCH vs UNSUPPORTED result codes.
If we're going to cater for partkeys that have the same expression and
different collations, surely we should also support partkeys with the
same expression and different opclasses.

Clean up shaky handling of commuted opclauses, eg checking the wrong
operator to see what its negator is.  This wouldn't cause any actual
bugs given a sane opclass definition, but it doesn't seem helpful to
expend more code to be less correct.

Improve handling of null elements in ScalarArrayOp arrays: in the
"op ALL" case, we can conclude they result in an unsatisfiable clause.

Minor cosmetic changes and comment improvements.
2018-06-13 16:10:30 -04:00
Tom Lane 19832753f1 Fix some ill-chosen names for globally-visible partition support functions.
"compute_hash_value" is particularly gratuitously generic, but IMO
all of these ought to have names clearly related to partitioning.
2018-06-13 13:18:02 -04:00
Tom Lane e23bae82cf Fix up run-time partition pruning's use of relcache's partition data.
The previous coding saved pointers into the partitioned table's relcache
entry, but then closed the relcache entry, causing those pointers to
nominally become dangling.  Actual trouble would be seen in the field
only if a relcache flush occurred mid-query, but that's hardly out of
the question.

While we could fix this by copying all the data in question at query
start, it seems better to just hold the relcache entry open for the
whole query.

While at it, improve the handling of support-function lookups: do that
once per query not once per pruning test.  There's still something to be
desired here, in that we fail to exploit the possibility of caching data
across queries in the fn_extra fields of the relcache's FmgrInfo structs,
which could happen if we just used those structs in-place rather than
copying them.  However, combining that with the possibility of per-query
lookups of cross-type comparison functions seems to require changes in the
APIs of a lot of the pruning support functions, so it's too invasive to
consider as part of this patch.  A win would ensue only for complex
partition key data types (e.g. arrays), so it may not be worth the
trouble.

David Rowley and Tom Lane

Discussion: https://postgr.es/m/17850.1528755844@sss.pgh.pa.us
2018-06-13 12:03:26 -04:00
Andres Freund a54e1f1587 Fix bugs in vacuum of shared rels, by keeping their relcache entries current.
When vacuum processes a relation it uses the corresponding relcache
entry's relfrozenxid / relminmxid as a cutoff for when to remove
tuples etc. Unfortunately for nailed relations (i.e. critical system
catalogs) bugs could frequently lead to the corresponding relcache
entry being stale.

This set of bugs could cause actual data corruption as vacuum would
potentially not remove the correct row versions, potentially reviving
them at a later point.  After 699bf7d05c some corruptions in this vein
were prevented, but the additional error checks could also trigger
spuriously. Examples of such errors are:
  ERROR: found xmin ... from before relfrozenxid ...
and
  ERROR: found multixact ... from before relminmxid ...
To be caused by this bug the errors have to occur on system catalog
tables.

The two bugs are:

1) Invalidations for nailed relations were ignored, based on the
   theory that the relcache entry for such tables doesn't
   change. Which is largely true, except for fields like relfrozenxid
   etc.  This means that changes to relations vacuumed in other
   sessions weren't picked up by already existing sessions.  Luckily
   autovacuum doesn't have particularly longrunning sessions.

2) For shared *and* nailed relations, the shared relcache init file
   was never invalidated while running.  That means that for such
   tables (e.g. pg_authid, pg_database) it's not just already existing
   sessions that are affected, but even new connections are as well.
   That explains why the reports usually were about pg_authid et. al.

To fix 1), revalidate the rd_rel portion of a relcache entry when
invalid. This implies a bit of extra complexity to deal with
bootstrapping, but it's not too bad.  The fix for 2) is simpler,
simply always remove both the shared and local init files.

Author: Andres Freund
Reviewed-By: Alvaro Herrera
Discussion:
    https://postgr.es/m/20180525203736.crkbg36muzxrjj5e@alap3.anarazel.de
    https://postgr.es/m/CAMa1XUhKSJd98JW4o9StWPrfS=11bPgG+_GDMxe25TvUY4Sugg@mail.gmail.com
    https://postgr.es/m/CAKMFJucqbuoDRfxPDX39WhA3vJyxweRg_zDVXzncr6+5wOguWA@mail.gmail.com
    https://postgr.es/m/CAGewt-ujGpMLQ09gXcUFMZaZsGJC98VXHEFbF-tpPB0fB13K+A@mail.gmail.com
Backpatch: 9.3-
2018-06-12 11:13:21 -07:00
Peter Eisentraut 8a07ebb3c1 Convert debug message from ereport to elog 2018-06-12 11:33:39 -04:00
Tom Lane bdc643e5e4 Fix access to just-closed relcache entry.
It might be impossible for this to cause a problem in non-debug builds,
since there'd be no opportunity for the relcache entry to get recycled
before the fetch.  It blows up nicely with -DRELCACHE_FORCE_RELEASE plus
valgrind, though.

Evidently introduced by careless refactoring in commit f0e44751d.
Back-patch accordingly.

Discussion: https://postgr.es/m/27543.1528758304@sss.pgh.pa.us
2018-06-11 19:18:04 -04:00
Michael Paquier f8795d2ec8 Fix oversight from 9e149c8 with spin-lock handling
Calling an external function while a pin-lock is held is a bad idea as
those are designed to be short-lived.  The stress of a first commit into
a large git history may contribute to that.

Reported-by: Andres Freund
Discussion: https://postgr.es/m/20180611164952.vmxdpdpirdtkdsz6@alap3.anarazel.de
2018-06-12 06:52:34 +09:00
Tom Lane 69025c5a07 Improve ExecFindInitialMatchingSubPlans's subplan renumbering logic.
We don't need two passes if we scan child partitions before parents,
as that way the children's present_parts are up to date before they're
needed.  I (tgl) think there's actually a bug being fixed here, for the
case of an intermediate partitioned table with no direct leaf children,
but haven't attempted to construct a test case to prove it.

David Rowley

Discussion: https://postgr.es/m/CAKJS1f-6GODRNgEtdPxCnAPme2h2hTztB6LmtfdmcYAAOE0kQg@mail.gmail.com
2018-06-11 17:35:53 -04:00
Tom Lane 4e23236403 Improve commentary about run-time partition pruning data structures.
No code changes except for a couple of new Asserts.

David Rowley and Tom Lane

Discussion: https://postgr.es/m/CAKJS1f-6GODRNgEtdPxCnAPme2h2hTztB6LmtfdmcYAAOE0kQg@mail.gmail.com
2018-06-11 17:35:53 -04:00
Alvaro Herrera 5b0c7e2f75 Don't needlessly check the partition contraint twice
Starting with commit f0e44751d7, ExecConstraints was in charge of
running the partition constraint; commit 19c47e7c82 modified that so
that caller could request to skip that checking depending on some
conditions, but that commit and 15ce775faa together introduced a small
bug there which caused ExecInsert to request skipping the constraint
check but have this not be honored -- in effect doing the check twice.
This could have been fixed in a very small patch, but on further
analysis of the involved function and its callsites, it turns out to be
simpler to give the responsibility of checking the partition constraint
fully to the caller, and return ExecConstraints to its original
(pre-partitioning) shape where it only checked tuple descriptor-related
constraints.  Each caller must do partition constraint checking on its
own schedule, which is more convenient after commit 2f17844104 anyway.

Reported-by: David Rowley
Author: David Rowley, Álvaro Herrera
Reviewed-by: Amit Langote, Amit Khandekar, Simon Riggs
Discussion: https://postgr.es/m/CAKJS1f8w8+awsxgea8wt7_UX8qzOQ=Tm1LD+U1fHqBAkXxkW2w@mail.gmail.com
2018-06-11 17:12:16 -04:00
Tom Lane be3d90026a Fix run-time partition pruning code to handle NULL values properly.
The previous coding just ignored pruning constraints that compare a
partition key to a null-valued expression.  This is silly, since really
what we can do there is conclude that all partitions are rejected: the
pruning operator is known strict so the comparison must always fail.

This also fixes the logic to not ignore constisnull for a Const comparison
value.  That's probably an unreachable case, since the planner would
normally have simplified away a strict operator with a constant-null input.
But this code has no business assuming that.

David Rowley, per a gripe from me

Discussion: https://postgr.es/m/26279.1528670981@sss.pgh.pa.us
2018-06-11 12:08:15 -04:00
Peter Eisentraut 387543f7bd Make new error code name match SQL standard more closely
Discussion: https://www.postgresql.org/message-id/dff3d555-bea4-ac24-29b2-29521b9d08e8%402ndquadrant.com
2018-06-11 11:15:28 -04:00
Michael Paquier f731cfa94c Fix a couple of bugs with replication slot advancing feature
A review of the code has showed up a couple of issues fixed by this
commit:
- Physical slots have been using the confirmed LSN position as a start
comparison point which is always 0/0, instead use the restart LSN
position (logical slots need to use the confirmed LSN position, which
was correct).
- The actual slot update was incorrect for both physical and logical
slots.  Physical slots need to use their restart_lsn as base comparison
point (confirmed_flush was used because of previous point), and logical
slots need to begin reading WAL from restart_lsn (confirmed_flush was
used as well), while confirmed_flush is compiled depending on the
decoding context and record read, and is the LSN position returned back
to the caller.
- Never return 0/0 if a slot cannot be advanced.  This way, if a slot is
advanced while the activity is idle, then the same position is returned
to the caller over and over without raising an error.  Instead return
the LSN the slot has been advanced to.  With repetitive calls, the same
position is returned hence caller can directly monitor the difference in
progress in bytes by doing simply LSN difference calculations, which
should be monotonic.

Note that as the slot is owned by the backend advancing it, then the
read of those fields is fine lock-less, while updates need to happen
while the slot mutex is held, so fix that on the way as well.  Other
locks for in-memory data of replication slots have been already fixed
previously.

Some of those issues have been pointed out by Petr and Simon during the
patch, while I noticed some of them after looking at the code.  This
also visibly takes of a recently-discovered bug causing assertion
failures which can be triggered by a two-step slot forwarding which
first advanced the slot to a WAL page boundary and secondly advanced it
to the latest position, say 'FF/FFFFFFF' to make sure that the newest
LSN is used as forward point.  It would have been nice to drop a test
for that, but the set of operators working on pg_lsn limits it, so this
is left for a future exercise.

Author: Michael Paquier
Reviewed-by: Petr Jelinek, Simon Riggs
Discussion: https://postgr.es/m/CANP8+jLyS=X-CAk59BJnsxKQfjwrmKicHQykyn52Qj-Q=9GLCw@mail.gmail.com
Discussion: https://www.postgresql.org/message-id/2840048a-1184-417a-9da8-3299d207a1d7%40postgrespro.ru
2018-06-11 09:26:13 +09:00
Tom Lane 321f648a31 Assorted cosmetic cleanup of run-time-partition-pruning code.
Use "subplan" rather than "subnode" to refer to the child plans of
a partitioning Append; this seems a bit more specific and hence
clearer.  Improve assorted comments.  No non-cosmetic changes.

David Rowley and Tom Lane

Discussion: https://postgr.es/m/CAFj8pRBjrufA3ocDm8o4LPGNye9Y+pm1b9kCwode4X04CULG3g@mail.gmail.com
2018-06-10 18:24:34 -04:00
Tom Lane 939449de0e Relocate partition pruning structs to a saner place.
These struct definitions were originally dropped into primnodes.h,
which is a poor choice since that's mainly intended for primitive
expression node types; these are not in that category.  What they
are is auxiliary info in Plan trees, so move them to plannodes.h.

For consistency, also relocate some related code that was apparently
placed with the aid of a dartboard.

There's no interesting code changes in this commit, just reshuffling.

David Rowley and Tom Lane

Discussion: https://postgr.es/m/CAFj8pRBjrufA3ocDm8o4LPGNye9Y+pm1b9kCwode4X04CULG3g@mail.gmail.com
2018-06-10 16:30:14 -04:00
Tom Lane 73b7f48f78 Improve run-time partition pruning to handle any stable expression.
The initial coding of the run-time-pruning feature only coped with cases
where the partition key(s) are compared to Params.  That is a bit silly;
we can allow it to work with any non-Var-containing stable expression, as
long as we take special care with expressions containing PARAM_EXEC Params.
The code is hardly any longer this way, and it's considerably clearer
(IMO at least).  Per gripe from Pavel Stehule.

David Rowley, whacked around a bit by me

Discussion: https://postgr.es/m/CAFj8pRBjrufA3ocDm8o4LPGNye9Y+pm1b9kCwode4X04CULG3g@mail.gmail.com
2018-06-10 15:22:32 -04:00
Michael Paquier 9e149c847f Fix and document lock handling for in-memory replication slot data
While debugging issues on HEAD for the new slot forwarding feature of
Postgres 11, some monitoring of the code surrounding in-memory slot data
has proved that the lock handling may cause inconsistent data to be read
by read-only callers of slot functions, particularly
pg_get_replication_slots() which fetches data for the system view
pg_replication_slots, or modules looking directly at slot information.

The code paths involved in those problems concern logical decoding
initialization (down to 9.4) and WAL reservation for slots (new as of
10).

A set of comments documenting all the lock handlings, particularly the
dependency with LW locks for slots and the in_use flag as well as the
internal mutex lock is added, based on a suggested by Simon Riggs.

Some of the fixed code exists down to 9.4 where WAL decoding has been
introduced, but as those race conditions are really unlikely going to
happen as those concern code paths for slot and decoding creation, just
fix the problem on HEAD.

Author: Michael Paquier

Discussion: https://postgr.es/m/20180528085747.GA27845@paquier.xyz
2018-06-10 19:39:26 +09:00
Thomas Munro 86a2218eb0 Limit Parallel Hash's bucket array to MaxAllocSize.
Make sure that we don't exceed MaxAllocSize when increasing the number of
buckets.  Perhaps later we'll remove that limit and use DSA_ALLOC_HUGE, but
for now just prevent further increases like the non-parallel code.  This
change avoids the error from bug report #15225.

Author: Thomas Munro
Reviewed-By: Tom Lane
Reported-by: Frits Jalvingh
Discussion: https://postgr.es/m/152802081668.26724.16985037679312485972%40wrigleys.postgresql.org
2018-06-10 20:30:25 +12:00
Peter Geoghegan f6b95ff434 Fix typo in JIT README.
Author: Daniel Gustafsson
Discussion: https://postgr.es/m/3747D478-41F9-439F-8074-AC81A5C76346@yesql.se
2018-06-09 09:33:53 -07:00
Alvaro Herrera 0c8910a0ca Teach SHOW ALL to honor pg_read_all_settings membership
Also, fix the pg_settings view to display source filename and line
number when invoked by a pg_read_all_settings member.  This addition by
me (Álvaro).

Also, fix wording of the comment in GetConfigOption regarding the
restriction it implements, renaming the parameter for extra clarity.
Noted by Michaël.

These were all oversight in commit 25fff40798fc; backpatch to pg10,
where that commit first appeared.

Author: Laurenz Albe
Reviewed-by: Michaël Paquier, Álvaro Herrera
Discussion: https://postgr.es/m/1519917758.6586.8.camel@cybertec.at
2018-06-08 16:19:05 -04:00
Peter Eisentraut acad8b409a Fix typo 2018-06-08 11:55:12 -04:00
Peter Eisentraut 25cf4ed1dc Add missing serial commas 2018-06-07 23:37:09 -04:00
Simon Riggs 32ac7a118f Exclude VACUUMs from RunningXactData
GetRunningTransactionData() should ignore VACUUM procs because in some
cases they are assigned xids. This could lead to holding back xmin via
the route of passing the xid to standby and then having that hold back
xmin on master via feedback.

Backpatch to 9.1 needed, but will only do so on supported versions.
Backpatch once proven on the buildfarm.

Reported-by: Greg Stark
Author: Simon Riggs
Reviewed-by: Amit Kapila
Discussion: https://postgr.es/m/CANP8+jJBYt=4PpTfiPb0UrH1_iPhzsxKH5Op_Wec634F0ohnAw@mail.gmail.com
2018-06-07 20:38:12 +01:00
Magnus Hagander 848b1f3e35 Fix typo in README
Author: Daniel Gustafsson <daniel@yesql.se>
2018-06-07 14:40:38 +02:00
Heikki Linnakangas 57f06a7611 Fix obsolete comment.
The 'orig_slot' argument was removed in commit c0a8ae7be3, but that
commit forgot to update the comment.

Author: Amit Langote
Discussion: https://www.postgresql.org/message-id/194ac4bf-7b4a-c887-bf26-bc1a85ea995a@lab.ntt.co.jp
2018-06-07 09:56:22 +03:00
Alvaro Herrera eee381ef5e Fix function code in error report
This bug causes a lseek() failure to be reported as a "could not open"
failure in the error message, muddling bug reports.  I introduced this
copy-and-pasteo in commit 78e1220104.

Noticed while reviewing code for bug report #15221, from lily liang.  In
version 10 the affected function is only used by multixact.c and
commit_ts, and only in corner-case circumstances, neither of which are
involved in the reported bug (a pg_subtrans failure.)

Author: Álvaro Herrera
2018-06-06 14:48:08 -04:00
Peter Eisentraut 3f85c62d9e Fix spurious non-ASCII bytes 2018-06-04 16:17:34 -04:00
Peter Eisentraut 2241ad1556 Fix typo 2018-06-04 15:34:42 -04:00
Noah Misch ef31095000 Reconcile nodes/*funcs.c with PostgreSQL 11 work.
This covers new fields in two outfuncs.c functions having no readfuncs.c
counterpart.  Thus, this changes only debugging output.
2018-05-31 16:07:13 -07:00
Teodor Sigaev 08186dc05b Move _bt_upgrademetapage() into critical section.
Any changes on page should be done in critical section, so move
_bt_upgrademetapage into critical section. Improve comment. Found by Amit
Kapila during post-commit review of 857f9c36.

Author: Amit Kapila
2018-05-30 19:45:39 +03:00
Peter Eisentraut 3c9cf06945 Initialize new jsonb iterator to zero
Use palloc0() instead of palloc() to create a new JsonbIterator.
Otherwise, the isScalar field is sometimes not initialized.  There is
probably no impact in practice, but it's cleaner this way and it avoids
future problems.
2018-05-28 23:53:43 -04:00
Andrew Dunstan f963f80970 Avoid use of unportable hex constant in convutils.pm
Discussion: https://postgr.es/m/5a6d6de8-cff8-1ffb-946c-ccf381800ea1@2ndQuadrant.com
2018-05-27 10:41:19 -04:00
Andrew Dunstan 3a7cc727c7 Don't fall off the end of perl functions
This complies with the perlcritic policy
Subroutines::RequireFinalReturn, which is a severity 4 policy. Since we
only currently check at severity level 5, the policy is raised to that
level until we move to level 4 or lower, so that any new infringements
will be caught.

A small cosmetic piece of tidying of the pgperlcritic script is
included.

Mike Blackwell

Discussion: https://postgr.es/m/CAESHdJpfFm_9wQnQ3koY3c91FoRQsO-fh02za9R3OEMndOn84A@mail.gmail.com
2018-05-27 09:08:42 -04:00
Tom Lane b86b7bfa3e Improve English wording of some other getObjectDescription() messages.
Print columns as "column C of <relation>" rather than "<relation> column
C".  This seems to read noticeably better in English, as evidenced by the
regression test output changes, and the code change also makes it possible
for translators to adjust the phrase order in other languages.

Also change the output for OCLASS_DEFAULT from "default for %s" to
"default value for %s".  This seems to read better and is also more
consistent with the output of, for instance, getObjectTypeDescription().

Kyotaro Horiguchi, per a complaint from me

Discussion: https://postgr.es/m/20180522.182020.114074746.horiguchi.kyotaro@lab.ntt.co.jp
2018-05-24 14:01:10 -04:00
Tom Lane 7c89eb750d Improve translatability of some getObjectDescription() messages.
Refactor some cases in getObjectDescription so that the translator has
more control over phrase order in the translated messages.  This doesn't
cause any changes in the English results.  (I was sorely tempted to
reorder "... belonging to role %s in schema %s" into "... in schema %s
belonging to role %s", but refrained.)

In principle we could back-patch this, but since translators have not
complained about these cases previously, it seems better not to thrash
the translatable strings in back branches.

Kyotaro Horiguchi, tweaked a bit by me

Discussion: https://postgr.es/m/20180522.182020.114074746.horiguchi.kyotaro@lab.ntt.co.jp
2018-05-24 13:20:16 -04:00
Tom Lane 1a31baf61e Fix objectaddress.c code for publication relations.
getObjectDescription and getObjectIdentity failed to schema-qualify
the name of the published table, which is bad in getObjectDescription and
unforgivable in getObjectIdentity.  Actually, getObjectIdentity failed to
emit the table's name at all unless "objname" output is requested, which
accidentally works for some (all?) extant callers but is clearly not the
intended API.  Somebody had also not gotten the memo that the output of
getObjectIdentity is not to be translated.

To fix getObjectDescription, I made it call getRelationDescription, which
required refactoring the translatable string for the case, but is more
future-proof in case we ever publish relations that aren't plain tables.
While at it, I made the English output look like "publication of table X
in publication Y"; the added "of" seems to me to make it read much better.

Back-patch to v10 where publications were introduced.

Discussion: https://postgr.es/m/20180522.182020.114074746.horiguchi.kyotaro@lab.ntt.co.jp
2018-05-24 12:38:55 -04:00
Tom Lane 056f52d9c3 Properly schema-qualify additional object types in getObjectDescription().
Collations, conversions, extended statistics objects (in >= v10),
and all four types of text search objects have schema-qualified names.
getObjectDescription() ignored that and would emit just the base name of
the object, potentially producing wrong or at least highly misleading
output.  Fix it to add the schema name whenever the object is not "visible"
in the current search path, as is the rule for other schema-qualifiable
object types.

Although in common situations the output won't change, this seems to me
(tgl) to be a bug worthy of back-patching, hence do so.

Kyotaro Horiguchi, per a complaint from me

Discussion: https://postgr.es/m/20180522.182020.114074746.horiguchi.kyotaro@lab.ntt.co.jp
2018-05-24 12:07:41 -04:00
Tom Lane 1d96c1b91a Fix incorrect ordering of operations in pg_resetwal and pg_rewind.
Commit c37b3d08c dropped its added GetDataDirectoryCreatePerm call into
the wrong place in pg_resetwal.c, namely after the chdir to DataDir.
That broke invocations using a relative path, as reported by Tushar Ahuja.
We could have left it where it was and changed the argument to be ".",
but that'd result in a rather confusing error message in event of a
failure, so re-ordering seems like a better solution.

Similarly reorder operations in pg_rewind.c.  The issue there is that
it doesn't seem like a good idea to do any actual operations before the
not-root check (on Unix) or the restricted token acquisition (on Windows).
I don't know that this is an actual bug, but I'm definitely not convinced
that it isn't, either.

Assorted other code review for c37b3d08c and da9b580d8: fix some
misspelled or otherwise badly worded comments, put the #include for
<sys/stat.h> where it actually belongs, etc.

Discussion: https://postgr.es/m/aeb9c3a7-3c3f-a57f-1a18-c8d4fcdc2a1f@enterprisedb.com
2018-05-23 10:59:55 -04:00
Heikki Linnakangas b06d8e58b5 Accept "B" in all memory-unit GUCs, and improve error messages.
Commit 6e7baa3227 added support for "B" unit, for specifying config options
in bytes. However, it was only accepted in GUC_UNIT_BYTE settings,
wal_segment_size and track_activity_query_size, and not e.g. in work_mem.
This patch makes it consistent, so that "B" accepted in all the same
contexts where "kB", "MB", and so forth are accepted.

Add "B" to the list of accepted units in the error hint, along with "kB",
"MB", etc.

Add an entry in the conversion table for "TB" to "B" conversion. A terabyte
is out of range for any GUC_UNIT_BYTE option, so you always get an "out of
range" error with that, but without it, you get a confusing error message
that claims that "TB" is not an accepted unit, with a hint that nevertheless
lists "TB" as an accepted unit.

Reviewed-by: Alexander Korotkov, Andres Freund
Discussion: https://www.postgresql.org/message-id/1bfe7f4a-7e22-aa6e-7b37-f4d222ed2d67@iki.fi
2018-05-23 10:22:26 +03:00
Tom Lane ecac23511e Widen COPY FROM's current-line-number counter from 32 to 64 bits.
Because the code for the HEADER option skips a line when this counter
is zero, a very long COPY FROM WITH HEADER operation would drop a line
every 2^32 lines.  A lesser but still unfortunate problem is that errors
would show a wrong input line number for errors occurring beyond the
2^31'st input line.  While such large input streams seemed impractical
when this code was first written, they're not any more.  Widening the
counter (and some associated variables) to uint64 should be enough to
prevent problems for the foreseeable future.

David Rowley

Discussion: https://postgr.es/m/CAKJS1f88yh-6wwEfO6QLEEvH3BEugOq2QX1TOja0vCauoynmOQ@mail.gmail.com
2018-05-22 13:32:52 -04:00
Heikki Linnakangas 17f188cf00 Add missing files to src/backend/lib/README.
The README lists all the files available in the directory, along with short
descriptions of each, but a few newly added ones were missing. While we're
at it, reorder the list into alphabetical order.

Author: Takeshi Ideriha
Discussion: https://www.postgresql.org/message-id/4E72940DA2BF16479384A86D54D0988A56793487@G01JPEXMBKW04
2018-05-22 13:25:28 +03:00
Heikki Linnakangas a0b37684ba Fix typo in comment. 2018-05-22 11:18:16 +03:00
Peter Eisentraut 4f72ca14de Update SQL features list 2018-05-21 15:29:22 -04:00
Peter Eisentraut 917a68f010 Translation updates
Source-Git-URL: git://git.postgresql.org/git/pgtranslation/messages.git
Source-Git-Hash: 3a5a71cccad5c68e01008e9e3a4f06930197a05e
2018-05-21 12:29:52 -04:00
Andrew Gierth 1da162e1f5 Fix SQL:2008 FETCH FIRST syntax to allow parameters.
OFFSET <x> ROWS FETCH FIRST <y> ROWS ONLY syntax is supposed to accept
<simple value specification>, which includes parameters as well as
literals. When this syntax was added all those years ago, it was done
inconsistently, with <x> and <y> being different subsets of the
standard syntax.

Rectify that by making <x> and <y> accept the same thing, and allowing
either a (signed) numeric literal or a c_expr there, which allows for
parameters, variables, and parenthesized arbitrary expressions.

Per bug #15200 from Lukas Eder.

Backpatch all the way, since this has been broken from the start.

Discussion: https://postgr.es/m/877enz476l.fsf@news-spur.riddles.org.uk
Discussion: http://postgr.es/m/152647780335.27204.16895288237122418685@wrigleys.postgresql.org
2018-05-21 17:27:08 +01:00
Tom Lane f755a152d4 Improve spelling of new FINALFUNC_MODIFY aggregate attribute.
I'd used SHARABLE as a value originally, but Peter Eisentraut points out
that dictionaries agree that SHAREABLE is the preferred spelling.
Run around and change that before it's too late.

Discussion: https://postgr.es/m/d2e1afd4-659c-50d6-1b20-7cfd3675e909@2ndquadrant.com
2018-05-21 11:41:42 -04:00
Tom Lane 81256cd05f Fix unsafe usage of strerror(errno) within ereport().
This is the converse of the unsafe-usage-of-%m problem: the reason
ereport/elog provide that format code is mainly to dodge the hazard
of errno getting changed before control reaches functions within the
arguments of the macro.  I only found one instance of this hazard,
but it's been there since 9.4 :-(.
2018-05-21 00:32:28 -04:00
Tom Lane c6e846446d printf("%lf") is not portable, so omit the "l".
The "l" (ell) width spec means something in the corresponding scanf usage,
but not here.  While modern POSIX says that applying "l" to "f" and other
floating format specs is a no-op, SUSv2 says it's undefined.  Buildfarm
experience says that some old compilers emit warnings about it, and at
least one old stdio implementation (mingw's "ANSI" option) actually
produces wrong answers and/or crashes.

Discussion: https://postgr.es/m/21670.1526769114@sss.pgh.pa.us
Discussion: https://postgr.es/m/c085e1da-0d64-1c15-242d-c921f32e0d5c@dunslane.net
2018-05-20 11:40:54 -04:00
Tom Lane e7a808f947 Assorted minor cleanups for bootstrap-data Perl scripts.
FindDefinedSymbol was intended to take an array of possible include
paths, but it never actually worked correctly for any but the first
array element.  Since there's no use-case for more than one path
anyway, let's just simplify this code and its callers by redefining
it as taking only one include path.

Minor other code-beautification without functional effects, except
that in one place we format the output as pgindent would do.

John Naylor

Discussion: https://postgr.es/m/CAJVSVGXM_n32hTTkircW4_K1LQFsJNb6xjs0pAP4QC0ZpyJfPQ@mail.gmail.com
2018-05-19 16:04:47 -04:00
Stephen Frost e2b83ff556 Fix for globals.c- c.h must come first
Commit da9b580 mistakenly put a system header before postgres.h (which
includes c.h).  That can cause portability issues and broke (at least)
builds with older Windows compilers.

Discovered by Mark Dilger.

Discussion: https://postgr.es/m/BF04A27A-D132-4927-A80A-BAD18695E954@gmail.com
2018-05-18 21:20:27 -04:00
Robert Haas b949bbcb7e Further adjust comment in get_partition_dispatch_recurse.
In editing 09b12d52db I made it wrong;
fix that and try to more clearly explain the situation.

Patch by me, reviewed by David Rowley and Amit Langote

Discussion: http://postgr.es/m/CA+TgmobAq+mA5hzm0a5OS38qQY5758DDDGqa3sBJN4hvir-H9w@mail.gmail.com
2018-05-18 16:11:22 -04:00
Magnus Hagander cfb758b6d9 Fix error message on short read of pg_control
Instead of saying "error: success", indicate that we got a working read
but it was too short.
2018-05-18 17:54:18 +02:00
Peter Eisentraut 9effb63e0d Message wording and pluralization improvements 2018-05-17 23:05:27 -04:00
Tom Lane d1fc750b51 Make numeric power() handle NaNs according to the modern POSIX spec.
In commit 6bdf1303b, we ensured that power()/^ for float8 would honor
the NaN behaviors specified by POSIX standards released in this century,
ie NaN ^ 0 = 1 and 1 ^ NaN = 1.  However, numeric_power() was not
touched and continued to follow the once-common behavior that every
case involving NaN input produces NaN.  For consistency, let's switch
the numeric behavior to the modern spec in the same release that ensures
that behavior for float8.

(Note that while 6bdf1303b was initially back-patched, we later undid
that, concluding that any behavioral change should appear only in v11.)

Discussion: https://postgr.es/m/10898.1526421338@sss.pgh.pa.us
2018-05-17 11:10:50 -04:00
Tom Lane 2efc924180 Detoast plpgsql variables if they might live across a transaction boundary.
Up to now, it's been safe for plpgsql to store TOAST pointers in its
variables because the ActiveSnapshot for whatever query called the plpgsql
function will surely protect such TOAST values from being vacuumed away,
even if the owning table rows are committed dead.  With the introduction of
procedures, that assumption is no longer good in "non atomic" executions
of plpgsql code.  We adopt the slightly brute-force solution of detoasting
all TOAST pointers at the time they are stored into variables, if we're in
a non-atomic context, just in case the owning row goes away.

Some care is needed to avoid long-term memory leaks, since plpgsql tends
to run with CurrentMemoryContext pointing to its call-lifespan context,
but we shouldn't assume that no memory is leaked by heap_tuple_fetch_attr.
In plpgsql proper, we can do the detoasting work in the "eval_mcontext".

Most of the code thrashing here is due to the need to add this capability
to expandedrecord.c as well as plpgsql proper.  In expandedrecord.c,
we can't assume that the caller's context is short-lived, so make use of
the short-term sub-context that was already invented for checking domain
constraints.  In view of this repurposing, it seems good to rename that
variable and associated code from "domain_check_cxt" to "short_term_cxt".

Peter Eisentraut and Tom Lane

Discussion: https://postgr.es/m/5AC06865.9050005@anastigmatix.net
2018-05-16 14:56:52 -04:00
Tom Lane a11b3bd37f Fix misprocessing of equivalence classes involving record_eq().
canonicalize_ec_expression() is supposed to agree with coerce_type() as to
whether a RelabelType should be inserted to make a subexpression be valid
input for the operators of a given opclass.  However, it did the wrong
thing with named-composite-type inputs to record_eq(): it put in a
RelabelType to RECORDOID, which the parser doesn't.  In some cases this was
harmless because all code paths involving a particular equivalence class
did the same thing, but in other cases this would result in failing to
recognize a composite-type expression as being a member of an equivalence
class that it actually is a member of.  The most obvious bad effect was to
fail to recognize that an index on a composite column could provide the
sort order needed for a mergejoin on that column, as reported by Teodor
Sigaev.  I think there might be other, subtler, cases that result in
misoptimization.  It also seems possible that an unwanted RelabelType
would sometimes get into an emitted plan --- but because record_eq and
friends don't examine the declared type of their input expressions, that
would not create any visible problems.

To fix, just treat RECORDOID as if it were a polymorphic type, which in
some sense it is.  We might want to consider formalizing that a bit more
someday, but for the moment this seems to be the only place where an
IsPolymorphicType() test ought to include RECORDOID as well.

This has been broken for a long time, so back-patch to all supported
branches.

Discussion: https://postgr.es/m/a6b22369-e3bf-4d49-f59d-0c41d3551e81@sigaev.ru
2018-05-16 13:46:23 -04:00
Robert Haas 7fc7dac1a7 Pass the correct PlannerInfo to PlanForeignModify/PlanDirectModify.
Previously, we passed the toplevel PlannerInfo, but we actually want
to pass the relevant subroot.  One problem with passing the toplevel
PlannerInfo is that the FDW which wants to push down an UPDATE or
DELETE against a join won't find the relevant joinrel there.
As of commit 1bc0100d27, postgres_fdw
tries to do exactly this and can be made to fail an assertion as a
result.

It's possible that this should be regarded as a bug fix and
back-patched to earlier releases, but for lack of a test case that
fails in earlier releases, no back-patch for now.

Etsuro Fujita, reviewed by Amit Langote.

Discussion: http://postgr.es/m/5AF43E02.30000@lab.ntt.co.jp
2018-05-16 11:32:38 -04:00
Robert Haas 09b12d52db Improve comment in get_partition_dispatch_recurse.
David Rowley, reviewed by Amit Langote, and revised a bit by me.

Discussion: http://postgr.es/m/CAKJS1f9yyimYyFzbHM4EwE+tkj4jvrHqSH0H4S4Kbas=UFpc9Q@mail.gmail.com
2018-05-16 10:47:57 -04:00
Tom Lane 05ca21b878 Fix type checking for support functions of parallel VARIADIC aggregates.
The impact of VARIADIC on the combine/serialize/deserialize support
functions of an aggregate wasn't thought through carefully.  There is
actually no impact, because variadicity isn't passed through to these
functions (and it doesn't seem like it would need to be).  However,
lookup_agg_function was mistakenly told to check things as though it were
passed through.  The net result was that it was impossible to declare an
aggregate that had both VARIADIC input and parallelism support functions.

In passing, fix a runtime check in nodeAgg.c for the combine function's
strictness to make its error message agree with the creation-time check.
The previous message was actually backwards, and it doesn't seem like
there's a good reason to have two versions of this message text anyway.

Back-patch to 9.6 where parallel aggregation was introduced.

Alexey Bashtanov; message fix by me

Discussion: https://postgr.es/m/f86dde87-fef4-71eb-0480-62754aaca01b@imap.cc
2018-05-15 15:06:53 -04:00
Alvaro Herrera 4eaa537275 Don't allow partitioned index on foreign-table partitions
Creating indexes on foreign tables is already forbidden, but local
partitioned indexes (commit 8b08f7d482) forgot to check for them.  Add
a preliminary check to prevent wasting time.

Another school of thought says to allow the index to be created if it's
not a unique index; but it's possible to do better in the future (enable
indexing of foreign tables, somehow), so we avoid painting ourselves in
a corner by rejecting all cases, to avoid future grief (a.k.a. backward
incompatible changes).

Reported-by: Arseny Sher
Author: Amit Langote, Álvaro Herrera
Discussion: https://postgr.es/m/87sh71cakz.fsf@ars-thinkpad
2018-05-14 13:23:07 -04:00
Magnus Hagander fc2a41e23e Fix file paths in comments
Author: Daniel Gustafsson <daniel@yesql.se>
2018-05-14 18:59:43 +02:00
Teodor Sigaev 8e12f4a250 Various improvements of skipping index scan during vacuum technics
- Change vacuum_cleanup_index_scale_factor GUC to PGC_USERSET.
  vacuum_cleanup_index_scale_factor GUC was defined as PGC_SIGHUP.  But this
  GUC affects not only autovacuum.  So it might be useful to change it from user
  session in order to influence manually runned VACUUM.
- Add missing tab-complete support for vacuum_cleanup_index_scale_factor
  reloption.
- Fix condition for B-tree index cleanup.
  Zero value of vacuum_cleanup_index_scale_factor means that user wants B-tree
  index cleanup to be never skipped.
- Documentation and comment improvements

Authors: Justin Pryzby, Alexander Korotkov, Liudmila Mantrova
Reviewed by: all authors and Robert Haas
Discussion: https://www.postgresql.org/message-id/flat/20180502023025.GD7631%40telsasoft.com
2018-05-10 13:31:47 +03:00
Tom Lane 234bb985c5 Update time zone data files to tzdata release 2018e.
DST law changes in North Korea.  Redefinition of "daylight savings" in
Ireland, as well as for some past years in Namibia and Czechoslovakia.
Additional historical corrections for Czechoslovakia.

With this change, the IANA database models Irish timekeeping as following
"standard time" in summer, and "daylight savings" in winter, so that the
daylight savings offset is one hour behind standard time not one hour
ahead.  This does not change their UTC offset (+1:00 in summer, 0:00 in
winter) nor their timezone abbreviations (IST in summer, GMT in winter),
though now "IST" is more correctly read as "Irish Standard Time" not "Irish
Summer Time".  However, the "is_dst" column in the pg_timezone_names view
will now be true in winter and false in summer for the Europe/Dublin zone.

Similar changes were made for Namibia between 1994 and 2017, and for
Czechoslovakia between 1946 and 1947.

So far as I can find, no Postgres internal logic cares about which way
tm_isdst is reported; in particular, since commit b2cbced9e we do not
rely on it to decide how to interpret ambiguous timestamps during DST
transitions.  So I don't think this change will affect any Postgres
behavior other than the timezone-view outputs.

Discussion: https://postgr.es/m/30996.1525445902@sss.pgh.pa.us
2018-05-09 13:56:22 -04:00
Alvaro Herrera d758d9702e Fix assorted partition pruning bugs
match_clause_to_partition_key failed to consider COERCION_PATH_ARRAYCOERCE
cases in scalar-op-array expressions, so it was possible to crash the
server easily.  To handle this case properly (ie. prune partitions) we
would need to run a bit of executor code during planning.  Maybe it can
be improved, but for now let's just not crash.  Add a test case that
used to trigger the crash.
Author: Michaël Paquier

match_clause_to_partition_key failed to indicate that operators that
don't have a commutator in a btree opclass are unsupported.  It is
possible for this to cause a crash later if such an operator is used in
a scalar-op-array expression.  Add a test case that used to the crash.
Author: Amit Langote

One caller of gen_partprune_steps_internal in
match_clause_to_partition_key was too optimistic about the former never
returning an empty step list.  Rid it of its innocence.  (Having fixed
the bug above, I no longer know how to exploit this, so no test case for
it, but it remained a bug.)  Revise code flow a little bit, for
succintness.
Author: Álvaro Herrera

Reported-by: Marina Polyakova
Reviewed-by: Michaël Paquier
Reviewed-by: Amit Langote
Reviewed-by: Álvaro Herrera
Discussion: https://postgr.es/m/ff8f9bfa485ff961d6bb43e54120485b@postgrespro.ru
2018-05-09 11:27:04 -03:00
Andrew Dunstan 35361ee788 Restrict vertical tightness to parentheses in Perl code
The vertical tightness settings collapse vertical whitespace between
opening and closing brackets (parentheses, square brakets and braces).
This can make data structures in particular harder to read, and is not
very consistent with our style in non-Perl code. This patch restricts
that setting to parentheses only, and reformats all the perl code
accordingly. Not applying this to parentheses has some unfortunate
effects, so the consensus is to keep the setting for parentheses and not
for the others.

The diff for this patch does highlight some places where structures
should have trailing commas. They can be added manually, as there is no
automatic tool to do so.

Discussion: https://postgr.es/m/a2f2b87c-56be-c070-bfc0-36288b4b41c1@2ndQuadrant.com
2018-05-09 10:14:46 -04:00
Andrew Dunstan 286bb240e1 perltidy some recent code changes before changing perltidy settings 2018-05-09 10:05:35 -04:00
Alvaro Herrera d1e2cac5ff Make gen_partprune_steps static
There's no need to export this function, so don't.  Michaël didn't
actually write the patch, but we list him as first author because with a
trivial one like this, intellectual authorship is as important (if not
more) as bit shovelling.

Author: Michaël Paquier, Amit Langote
Discussion: https://postgr.es/m/c91299c4-199b-0f16-339b-a29d6d2a39ee@lab.ntt.co.jp
2018-05-09 10:40:25 -03:00
Alvaro Herrera c775fb9e18 Remove useless 'default' clause
Author: Michael Paquier
Reviewed-by: Amit Langote
Reviewed-by: Álvaro Herrera
Discussion: https://postgr.es/m/20180424012042.GD1570@paquier.xyz
Discussion: https://postgr.es/m/20180509061039.GC11897@paquier.xyz
2018-05-09 10:33:55 -03:00
Teodor Sigaev cb5d942959 Improve jsonb cast error message
Initial variant of error message didn't follow style of another casting error
messages and wasn't informative. Per gripe from Robert Haas.

Reviewer: Tom Lane
Discussion: https://www.postgresql.org/message-id/flat/CA%2BTgmob08StTV9yu04D0idRFNMh%2BUoyKax5Otvrix7rEZC8rMw%40mail.gmail.com#CA+Tgmob08StTV9yu04D0idRFNMh+UoyKax5Otvrix7rEZC8rMw@mail.gmail.com
2018-05-09 13:23:16 +03:00
Peter Eisentraut 831f5d11ec Refine error messages
"JSON" when not referring to a data type should be upper case.
2018-05-08 14:36:31 -04:00
Tom Lane 3a675f729e Count heap tuples in non-SnapshotAny path in IndexBuildHeapRangeScan().
Brown-paper-bag bug in commit 7c91a0364: when we rearranged the placement
of "reltuples += 1" statements, we missed including one in this code path.

The net effect of that was that CREATE INDEX CONCURRENTLY would set the
table's pg_class.reltuples to zero, as would index builds done during
bootstrap mode.  (It seems like parallel index builds ought to fail
similarly, but they don't, perhaps because reltuples is computed in some
other way.  You certainly couldn't figure that out from the abysmally
underdocumented parallelism code in this area.)

I was led to this by wondering why initdb seemed to have slowed down as
a result of 7c91a0364, as is evident in the buildfarm's timing history.
The reason is that every system catalog with indexes had pg_class.reltuples
= 0 after bootstrap, causing the planner to make some terrible choices for
queries in the post-bootstrap steps.  On my workstation, this fix causes
the runtime of "initdb -N" to drop from ~2.0 sec to ~1.4 sec, which is
almost though not quite back to where it was in v10.  That's not much of
a deal for production use perhaps, but it makes a noticeable difference
for buildfarm and "make check-world" runs, which do a lot of initdbs.
2018-05-08 00:20:19 -04:00
Andrew Dunstan d2c1512ac4 Clean up some perlcritic warnings
In Catalog.pm, mark eval of a string instead of a block as allowed.
Disallow perlcritic completely in Gen_dummy_probes.pl, as it's
generated code.
Protect a couple of lines in plperl code from  perltidy, so that the
annotation for perlcritic stays on the same line as the construct it
would otherwise object to.
2018-05-07 15:35:32 -04:00
Tom Lane 513ff52e81 Suppress compiler warnings when building with --enable-dtrace.
Most versions of "dtrace -h" drop const qualifiers from the declarations
of probe functions (though macOS gets it right).  This causes compiler
warnings when we pass in pointers to const.  Repair by extending our
existing post-processing of the probes.h file.  To do so, assume that all
"char *" arguments should be "const char *"; that seems reasonably safe.

Thomas Munro

Discussion: https://postgr.es/m/CAEepm=2j1pWSruQJqJ91ZDzD8w9ZZDsM4j2C6x75C-VryWg-_w@mail.gmail.com
2018-05-07 13:44:09 -04:00
Tom Lane d160882a17 Fix bootstrap parser so that its keywords are unreserved words.
Mark Dilger pointed out that the bootstrap parser does not allow
any of its keywords to appear as column values unless they're quoted,
and proposed dealing with that by quoting such values in genbki.pl.
Looking closer, though, we also have that problem with respect to table,
column, and type names appearing in the .bki file: the parser would fail
if any of those matched any of its keywords.  While so far there have
been no conflicts (that I've heard of), this seems like a booby trap
waiting to catch somebody.  Rather than clutter genbki.pl with enough
quoting logic to handle all that, let's make the bootstrap parser grow
up a little bit and treat its keywords as unreserved.

Experimentation shows that it's fairly easy to do so with the exception
of _null_, which I don't have a big problem with keeping as a reserved
word.  The only change needed is that we can't have the "close" command
take an optional table name: it has to either require or forbid the
table name to avoid shift/reduce conflicts.  genbki.pl has historically
always included the table name, so I took that option.

The implementation has bootscanner.l passing forward the string value
of each keyword, in case bootparse.y needs that.  This avoids needing to
know the precise spelling of each keyword in bootparse.y, which is good
because that's not always obvious from the token name.

Discussion: https://postgr.es/m/3024FC91-DB6D-4732-B31C-DF772DF039A0@gmail.com
2018-05-05 16:23:07 -04:00
Tom Lane cb3e9e40bc Put in_range_float4_float8's work in-line.
In commit 8b29e88cd, I'd dithered about whether to make
in_range_float4_float8 be a standalone copy of the float in-range logic
or have it punt to in_range_float8_float8.  I went with the latter, which
saves code space though at the cost of performance and readability.

However, it emerges that this tickles a compiler or hardware bug on
buildfarm member opossum.  Test results from commit 55e0e4581 show
conclusively that widening a float4 NaN to float8 produces Inf, not NaN,
on that machine; which accounts perfectly for the window RANGE test
failures it's been showing.  We can dodge this problem by making
in_range_float4_float8 be an independent function, so that it checks
for NaN inputs before widening them.

Ordinarily I'd not be very excited about working around such obviously
broken functionality; but given that this was a judgment call to begin
with, I don't mind reversing it.
2018-05-05 13:21:50 -04:00
Heikki Linnakangas 0668719801 Fix scenario where streaming standby gets stuck at a continuation record.
If a continuation record is split so that its first half has already been
removed from the master, and is only present in pg_wal, and there is a
recycled WAL segment in the standby server that looks like it would
contain the second half, recovery would get stuck. The code in
XLogPageRead() incorrectly started streaming at the beginning of the
WAL record, even if we had already read the first page.

Backpatch to 9.4. In principle, older versions have the same problem, but
without replication slots, there was no straightforward mechanism to
prevent the master from recycling old WAL that was still needed by standby.
Without such a mechanism, I think it's reasonable to assume that there's
enough slack in how many old segments are kept around to not run into this,
or you have a WAL archive.

Reported by Jonathon Nelson. Analysis and patch by Kyotaro HORIGUCHI, with
some extra comments by me.

Discussion: https://www.postgresql.org/message-id/CACJqAM3xVz0JY1XFDKPP%2BJoJAjoGx%3DGNuOAshEDWCext7BFvCQ%40mail.gmail.com
2018-05-05 01:34:53 +03:00
Alvaro Herrera d2599ecfcc Don't mark pages all-visible spuriously
Dan Wood diagnosed a long-standing problem that pages containing tuples
that are locked by multixacts containing live lockers may spuriously end
up as candidates for getting their all-visible flag set.  This has the
long-term effect that multixacts remain unfrozen; this may previously
pass undetected, but since commit XYZ it would be reported as
  "ERROR: found multixact 134100944 from before relminmxid 192042633"
because when a later vacuum tries to freeze the page it detects that a
multixact that should have gotten frozen, wasn't.

Dan proposed a (correct) patch that simply sets a variable to its
correct value, after a bogus initialization.  But, per discussion, it
seems better coding to avoid the bogus initializations altogether, since
they could give rise to more bugs later.  Therefore this fix rewrites
the logic a little bit to avoid depending on the bogus initializations.

This bug was part of a family introduced in 9.6 by commit a892234f830e;
later, commit 38e9f90a22 fixed most of them, but this one was
unnoticed.

Authors: Dan Wood, Pavan Deolasee, Álvaro Herrera
Reviewed-by: Masahiko Sawada, Pavan Deolasee, Álvaro Herrera
Discussion: https://postgr.es/m/84EBAC55-F06D-4FBE-A3F3-8BDA093CE3E3@amazon.com
2018-05-04 18:24:45 -03:00
Tom Lane 59cb323053 Fix precedence problem in new Perl code.
I think this bit of commit 1f1cd9b5d didn't do quite what I meant :-(
2018-05-04 09:46:35 -04:00
Teodor Sigaev 2a9e04f0a8 Don't truncate away non-key attributes for leftmost downlinks.
nbtsort.c does not need to truncate away non-key attributes for the
minimum key of the leftmost page on a level, since this is only used to
build a minus infinity downlink for the level's leftmost page.
Truncating away non-key attributes in advance of truncating away all
attributes in _bt_sortaddtup() does not affect the correctness of CREATE
INDEX, but it is misleading.

Author: Peter Geoghegan
Discussion: https://www.postgresql.org/message-id/CAH2-WzkAS2M3ussHG-s_Av=Zo6dPjOxyu5fNRkYnxQV+YzGQ4w@mail.gmail.com
2018-05-04 12:38:23 +03:00
Teodor Sigaev 0bef1c0678 Re-think predicate locking on GIN indexes.
The principle behind the locking was not very well thought-out, and not
documented. Add a section in the README to explain how it's supposed to
work, and change the code so that it actually works that way.

This fixes two bugs:

1. If fast update was turned on concurrently, subsequent inserts to the
   pending list would not conflict with predicate locks that were acquired
   earlier, on entry pages. The included 'predicate-gin-fastupdate' test
   demonstrates that. To fix, make all scans acquire a predicate lock on
   the metapage. That lock represents a scan of the pending list, whether
   or not there is a pending list at the moment. Forget about the
   optimization to skip locking/checking for locks, when fastupdate=off.
2. If a scan finds no match, it still needs to lock the entry page. The
   point of predicate locks is to lock the gabs between values, whether
   or not there is a match. The included 'predicate-gin-nomatch' test
   tests that case.

In addition to those two bug fixes, this removes some unnecessary locking,
following the principle laid out in the README. Because all items in
a posting tree have the same key value, a lock on the posting tree root is
enough to cover all the items. (With a very large posting tree, it would
possibly be better to lock the posting tree leaf pages instead, so that a
"skip scan" with a query like "A & B", you could avoid unnecessary conflict
if a new tuple is inserted with A but !B. But let's keep this simple.)

Also, some spelling  fixes.

Author: Heikki Linnakangas with some editorization by me
Review: Andrey Borodin, Alexander Korotkov
Discussion: https://www.postgresql.org/message-id/0b3ad2c2-2692-62a9-3a04-5724f2af9114@iki.fi
2018-05-04 11:27:50 +03:00
Tom Lane 1f1cd9b5dd Avoid overwriting unchanged output files in genbki.pl and Gen_fmgrtab.pl.
If a particular output file already exists with the contents it should
have, leave it alone, so that its mod timestamp is not advanced.

In builds using --enable-depend, this can avoid the need to recompile .c
files whose included files didn't actually change.  It's not clear whether
it saves much of anything for users of ccache; but the cost of doing the
file comparisons seems to be negligible, so we might as well do it.

For developers using the MSVC toolchain, this will create a regression:
msvc/Solution.pm will sometimes run genbki.pl or Gen_fmgrtab.pl
unnecessarily.  I'll look into fixing that separately.

Discussion: https://postgr.es/m/16925.1525376229@sss.pgh.pa.us
2018-05-03 18:06:45 -04:00
Tom Lane 9bf28f96c7 Rearrange makefile rules for running Gen_fmgrtab.pl.
Make these rules look more like the ones associated with genbki.pl,
to wit:

* Use a stamp file to record when we last ran the script, instead of
relying on the timestamps of the individual output files.

* Take the knowledge out of backend/Makefile and put it in utils/Makefile
where it belongs.  I moved down the handling of errcodes.h and probes.h
too, although those continue to be built by separate processes.

In itself, this is just much-needed cleanup with little practical effect.
However, by decoupling these makefile rules from the timestamps of the
generated header files, we open the door to not advancing those timestamps
unnecessarily, which will be taken advantage of by the next commit.

msvc/Solution.pm should be taught to do things similarly, but I'll leave
that for another commit.

Discussion: https://postgr.es/m/16925.1525376229@sss.pgh.pa.us
2018-05-03 17:54:18 -04:00
Teodor Sigaev 8f9be261f4 Add HOLD_INTERRUPTS section into FinishPreparedTransaction.
If an interrupt arrives in the middle of FinishPreparedTransaction
and any callback decide to call CHECK_FOR_INTERRUPTS (e.g.
RemoveTwoPhaseFile can write a warning with ereport, which checks for
interrupts) then it's possible to leave current GXact undeleted.

Backpatch to all supported branches

Stas Kelvich

Discussion: ihttps://www.postgresql.org/message-id/3AD85097-A3F3-4EBA-99BD-C38EDF8D2949@postgrespro.ru
2018-05-03 20:08:29 +03:00
Tom Lane a7a7387575 Further improve code for probing the availability of ARM CRC instructions.
Andrew Gierth pointed out that commit 1c72ec6f4 would yield the wrong
answer on big-endian ARM systems, because the data being CRC'd would be
different.  To fix that, and avoid the rather unsightly hard-wired
constant, simply compare the hardware and software implementations'
results.

While we're at it, also log the resulting decision at DEBUG1, and error
out if the hw and sw results unexpectedly differ.  Also, since this
file must compile for both frontend and backend, avoid incorrect
dependencies on backend-only headers.

In passing, add a comment to postmaster.c about when the CRC function
pointer will get initialized.

Thomas Munro, based on complaints from Andrew Gierth and Tom Lane

Discussion: https://postgr.es/m/HE1PR0801MB1323D171938EABC04FFE7FA9E3110@HE1PR0801MB1323.eurprd08.prod.outlook.com
2018-05-03 11:32:57 -04:00
Peter Eisentraut 30c66e77be Fix SPI error cleanup and memory leak
Since the SPI stack has been moved from TopTransactionContext to
TopMemoryContext, setting _SPI_stack to NULL in AtEOXact_SPI() leaks
memory.  In fact, we don't need to do that anymore: We just leave the
allocated stack around for the next SPI use.

Also, refactor the SPI cleanup so that it is run both at transaction end
and when returning to the main loop on an exception.  The latter is
necessary when a procedure calls a COMMIT or ROLLBACK command that
itself causes an error.
2018-05-03 08:39:15 -04:00
Tom Lane fbb2e9a030 Fix assorted compiler warnings seen in the buildfarm.
Failure to use DatumGetFoo/FooGetDatum macros correctly, or at all,
causes some warnings about sign conversion.  This is just cosmetic
at the moment but in principle it's a type violation, so clean up
the instances I could find.

autoprewarm.c and sharedfileset.c contained code that unportably
assumed that pid_t is the same size as int.  We've variously dealt
with this by casting pid_t to int or to unsigned long for printing
purposes; I went with the latter.

Fix uninitialized-variable warning in RestoreGUCState.  This is
a live bug in some sense, but of no great significance given that
nobody is very likely to care what "line number" is associated with
a GUC that hasn't got a source file recorded.
2018-05-02 15:52:54 -04:00
Tom Lane 447dbf7aa7 Fix bogus code for extracting extended-statistics data from syscache.
statext_dependencies_load and statext_ndistinct_load were not up to snuff,
in addition to being randomly different from each other.  In detail:

* Deserialize the fetched bytea value before releasing the syscache
entry, not after.  This mistake causes visible regression test failures
when running with -DCATCACHE_FORCE_RELEASE.  Since it's not exposed by
-DCLOBBER_CACHE_ALWAYS, I think there may be no production hazard here
at present, but it's at least a latent bug.

* Use DatumGetByteaPP not DatumGetByteaP to save a detoasting cycle
for short stats values; the deserialize function has to be, and is,
prepared for short-header values since its other caller uses PP.

* Use a test-and-elog for null stats values in both functions, rather
than a test-and-elog in one case and an Assert in the other.  Perhaps
Asserts would be sufficient in both cases, but I don't see a good
argument for them being different.

* Minor cosmetic changes to make these functions more visibly alike.

Backpatch to v10 where this code came in.

Amit Langote, minor additional hacking by me

Discussion: https://postgr.es/m/1349aabb-3a1f-6675-9fc0-65e2ce7491dd@lab.ntt.co.jp
2018-05-02 12:23:00 -04:00
Heikki Linnakangas 445e31bdc7 Fix some sloppiness in the new BufFileSize() and BufFileAppend() functions.
There were three related issues:

* BufFileAppend() incorrectly reset the seek position on the 'source' file.
  As a result, if you had called BufFileRead() on the file before calling
  BufFileAppend(), it got confused, and subsequent calls would read/write
  at wrong position.

* BufFileSize() did not work with files opened with BufFileOpenShared().

* FileGetSize() only worked on temporary files.

To fix, change the way BufFileSize() works so that it works on shared
files. Remove FileGetSize() altogether, as it's no longer needed. Remove
buffilesize from TapeShare struct, as the leader process can simply call
BufFileSize() to get the tape's size, there's no need to pass it through
shared memory anymore.

Discussion: https://www.postgresql.org/message-id/CAH2-WznEDYe_NZXxmnOfsoV54oFkTdMy7YLE2NPBLuttO96vTQ@mail.gmail.com
2018-05-02 17:23:13 +03:00
Andres Freund 2993435dba Further -Wimplicit-fallthrough cleanup.
Tom's earlier commit in 41c912cad1 didn't update a few cases that
are only encountered with the non-standard --with-llvm config
flag. Additionally there's also one case that appears to be a
deficiency in gcc's (up to trunk as of a few days ago) detection of
"fallthrough" comments - changing the placement slightly fixes that.

Author: Andres Freund
Discussion: https://postgr.es/m/20180502003239.wfnqu7ekz7j7imm4@alap3.anarazel.de
2018-05-01 19:53:48 -07:00
Tom Lane b2328bf62b Fix some assorted compiler warnings on Windows.
Don't overflow the result type of constant expressions.  Don't negate
unsigned types.  Define HAVE_STDBOOL_H for Visual C++ 2013 and later.

Thomas Munro
Reviewed-By: Michael Paquier and Tom Lane

Discussion: https://postgr.es/m/CAEepm%3D3%3DTDYEXUEcHpEx%2BTwc31wo7PA0oBAiNt6sWmq93MW02A%40mail.gmail.com
2018-05-01 19:38:26 -04:00
Tom Lane 41c912cad1 Clean up warnings from -Wimplicit-fallthrough.
Recent gcc can warn about switch-case fall throughs that are not
explicitly labeled as intentional.  This seems like a good thing,
so clean up the warnings exposed thereby by labeling all such
cases with comments that gcc will recognize.

In files that already had one or more suitable comments, I generally
matched the existing style of those.  Otherwise I went with
/* FALLTHROUGH */, which is one of the spellings approved at the
more-restrictive-than-default level -Wimplicit-fallthrough=4.
(At the default level you can also spell it /* FALL ?THRU */,
and it's not picky about case.  What you can't do is include
additional text in the same comment, so some existing comments
containing versions of this aren't good enough.)

Testing with gcc 8.0.1 (Fedora 28's current version), I found that
I also had to put explicit "break"s after elog(ERROR) or ereport(ERROR);
apparently, for this purpose gcc doesn't recognize that those don't
return.  That seems like possibly a gcc bug, but it's fine because
in most places we did that anyway; so this amounts to a visit from the
style police.

Discussion: https://postgr.es/m/15083.1525207729@sss.pgh.pa.us
2018-05-01 19:35:08 -04:00
Robert Haas 37a3058bc7 Fix interaction of foreign tuple routing with remote triggers.
Without these fixes, changes to the inserted tuple made by remote
triggers are ignored when building local RETURNING tuples.

In the core code, call ExecInitRoutingInfo at a later point from
within ExecInitPartitionInfo so that the FDW callback gets invoked
after the returning list has been built.  But move CheckValidResultRel
out of ExecInitRoutingInfo so that it can happen at an earlier stage.

In postgres_fdw, refactor assorted deparsing functions to work with
the RTE rather than the PlannerInfo, which saves us having to
construct a fake PlannerInfo in cases where we don't have a real one.
Then, we can pass down a constructed RTE that yields the correct
deparse result when no real one exists.  Unfortunately, this
necessitates a hack that understands how the core code manages RT
indexes for update tuple routing, which is ugly, but we don't have a
better idea right now.

Original report, analysis, and patch by Etsuro Fujita.  Heavily
refactored by me.  Then worked over some more by Amit Langote.

Discussion: http://postgr.es/m/5AD4882B.10002@lab.ntt.co.jp
2018-05-01 13:21:46 -04:00
Tom Lane bcbf2346d6 Remove investigative code for can't-reattach-to-shared-memory errors.
Revert commits 23078689a, 73042b8d1, ce07aff48, f7df8043f, 6ba0cc4bd,
eb16011f4, 68e7e973d, 63ca350ef.  We still have a problem here, but
somebody who's actually a Windows developer will need to spend time
on it.

Discussion: https://postgr.es/m/25495.1524517820@sss.pgh.pa.us
2018-05-01 13:06:31 -04:00
Tom Lane 23078689a9 Does it help to wait before reattaching?
Revert the map/unmap dance I tried in commit 73042b8d1; that helps
not at all.

Instead, speculate that the unwanted allocation is being done on
another thread, and thus timing variations explain the apparent
unpredictability.  Temporarily add a 1-second sleep before the
VirtualFree call, in hopes that any such other threads will
quiesce and not jog our elbow.

This is obviously not a desirable long-term fix, but as a means of
investigation it seems useful.

Discussion: https://postgr.es/m/25495.1524517820@sss.pgh.pa.us
2018-04-30 20:09:31 -04:00
Tom Lane 73042b8d13 Map and unmap the shared memory block before risking VirtualFree.
The idea here is to get Windows' userspace infrastructure to allocate
whatever space it needs for MapViewOfFileEx() before we release the
locked-down space that we want to map the shared memory block into.

This is a fairly brute-force attempt, and would likely (for example)
fail with large shared memory on 32-bit Windows.  We could perhaps
ameliorate that by mapping only part of the shared memory block in
this way, but for the moment I just want to see if this approach
will fix dory's problem.

Discussion: https://postgr.es/m/25495.1524517820@sss.pgh.pa.us
2018-04-30 17:07:14 -04:00
Tom Lane ce07aff48f Further effort at preventing memory map dump from affecting the results.
Rather than elog'ing immediately, push the map data into a preallocated
StringInfo.  Perhaps this will prevent some of the mid-operation
allocations that are evidently happening now.

Discussion: https://postgr.es/m/25495.1524517820@sss.pgh.pa.us
2018-04-30 16:19:51 -04:00
Peter Eisentraut c5679256e9 Write error messages about duplicate OIDs to stderr 2018-04-30 14:18:46 -04:00
Peter Eisentraut 33a1c2145c Remove "Generating" output from catalog scripts
So by default, they don't output anything if everything is well.

Discussion: https://www.postgresql.org/message-id/867f8a1a-6cf0-d835-78d8-0844e4936241%402ndquadrant.com
2018-04-30 14:18:07 -04:00
Peter Eisentraut 92e1583b43 Don't do logical replication of TRUNCATE of zero tables
When due to publication configuration, a TRUNCATE change ends up with
zero tables to be published, don't send the message out, just skip it.
It's not wrong, but obviously useless overhead.
2018-04-30 13:49:20 -04:00
Tom Lane f7df8043f0 Remove Windows module-list-dumping code.
This code is evidently allocating memory and thus confusing matters
even more.  Let's see whether we can learn anything with
just VirtualQuery.

Discussion: https://postgr.es/m/25495.1524517820@sss.pgh.pa.us
2018-04-30 13:20:13 -04:00
Tom Lane 6ba0cc4bd3 Dump full memory maps around failing Windows reattach code.
This morning's results from buildfarm member dory make it pretty
clear that something is getting mapped into the just-freed space,
but not what that something is.  Replace my minimalistic probes
with a full dump of the process address space and module space,
based on Noah's work at
<20170403065106.GA2624300%40tornado.leadboat.com>

This is all (probably) to get reverted once we have fixed the
problem, but for now we need information.

Discussion: https://postgr.es/m/25495.1524517820@sss.pgh.pa.us
2018-04-30 11:16:21 -04:00
Tom Lane eb16011f4c Get still more info about Windows can't-reattach-to-shared-memory errors.
After some thought about the info captured so far, it seems possible
that MapViewOfFileEx is itself causing some DLL to get loaded into
the space just freed by VirtualFree.  The previous commit here didn't
capture enough info to really prove the case for that, so let's add
one more VirtualQuery in between those steps.  Also, be sure to
capture the post-Map state before we emit any log entries, just in
case elog() is invoking some code not previously loaded.

Discussion: https://postgr.es/m/25495.1524517820@sss.pgh.pa.us
2018-04-29 20:41:19 -04:00
Tom Lane 6bdf1303b3 Avoid wrong results for power() with NaN input on more platforms.
Buildfarm results show that the modern POSIX rule that 1 ^ NaN = 1 is not
honored on *BSD until relatively recently, and really old platforms don't
believe that NaN ^ 0 = 1 either.  (This is unsurprising, perhaps, since
SUSv2 doesn't require either behavior.)  In hopes of getting to platform
independent behavior, let's deal with all the NaN-input cases explicitly
in dpow().

Note that numeric_power() doesn't know either of these special cases.
But since that behavior is platform-independent, I think it should be
addressed separately, and probably not back-patched.

Discussion: https://postgr.es/m/75DB81BEEA95B445AE6D576A0A5C9E936A73E741@BPXM05GP.gisp.nec.co.jp
2018-04-29 18:15:16 -04:00
Tom Lane 68e7e973d2 Get more info about Windows can't-reattach-to-shared-memory errors.
Commit 63ca350ef neglected to probe the state of things *before*
the VirtualFree call, which now looks like it might be interesting.

Discussion: https://postgr.es/m/25495.1524517820@sss.pgh.pa.us
2018-04-29 16:02:45 -04:00