Commit Graph

18436 Commits

Author SHA1 Message Date
Michael Paquier c072e80337 Correct function name in comment of logical decoding code
Reported-by: Dave Cramer
Author: Euler Taveira
Discussion: https://postgr.es/m/CADK3HHKnPGJDLhjOFBY6+70Wd14iEH8c2GKw7UrOuUHp_GNFrA@mail.gmail.com
2018-07-02 13:30:12 +09:00
Andrew Dunstan 1e9c858090 pgindent run prior to branching 2018-06-30 12:25:49 -04:00
Alvaro Herrera 41372071df Fix crash when ALTER TABLE recreates indexes on partitions
The skip_build flag was not being passed correctly when recursing to
indexes on partitions, leading to attempts to rebuild indexes when they
were not yet ready to be rebuilt.

Reported-by: Rajkumar Raghuwanshi
Discussion: https://postgr.es/m/CAKcux6mxNCGsgATwf5CGMF8g4WSupCXicCVMeKUTuWbyxHOMsQ@mail.gmail.com
2018-06-29 11:49:30 -04:00
Michael Paquier dad5f8a3d5 Make capitalization of term "OpenSSL" more consistent
This includes code comments and documentation.  No backpatch as this is
cosmetic even if there are documentation changes which are user-facing.

Author: Daniel Gustafsson
Discussion: https://postgr.es/m/BB89928E-2BC7-489E-A5E4-6D204B3954CF@yesql.se
2018-06-29 09:45:44 +09:00
Amit Kapila 2e61c50785 Fix thinko in comments.
A slot can not be stored in a tuple but it's vice versa.

Reported-by: Ashutosh Bapat
Author: Ashutosh Bapat
Reviewed-by: Amit Kapila
Discussion: https://postgr.es/m/CAFjFpRcHhNhXdegyJv3KKDWrwO1_NB_KYZM_ZSDeMOZaL1A5jQ@mail.gmail.com
2018-06-27 18:05:24 +05:30
Andres Freund 986070872f Remove duplicated return statement from llvmjit code.
The duplicated return clearly doesn't make sense / isn't
reachable. Likely introduced by me (Andres), while revising the code.

Author: Rushabh Lathia
Discussion: https://postgr.es/m/CAGPqQf2raxWOcbuTP36M1rEF3=Rfo7oD29K3psdyHMeE5swBRg@mail.gmail.com
2018-06-26 23:16:50 -07:00
Amit Kapila 8121ab88e7 Cosmetic improvements for faster column addition.
Changed the name of few structure members for the sake of clarity and
removed spurious whitespace.

Reported-by: Amit Kapila
Author: Amit Kapila, based on suggestion by Andrew Dunstan
Reviewed-by: Alvaro Herrera
Discussion: https://postgr.es/m/CAA4eK1K2znsFpC+NQ9A4vxT4uDxADN4RmvHX0L6Y=aHVo9gB4Q@mail.gmail.com
2018-06-27 08:16:13 +05:30
Alvaro Herrera f49a80c481 Fix "base" snapshot handling in logical decoding
Two closely related bugs are fixed.  First, xmin of logical slots was
advanced too early.  During xl_running_xacts processing, xmin of the
slot was set to the oldest running xid in the record, but that's wrong:
actually, snapshots which will be used for not-yet-replayed transactions
might consider older txns as running too, so we need to keep xmin back
for them.  The problem wasn't noticed earlier because DDL which allows
to delete tuple (set xmax) while some another not-yet-committed
transaction looks at it is pretty rare, if not unique: e.g. all forms of
ALTER TABLE which change schema acquire ACCESS EXCLUSIVE lock
conflicting with any inserts. The included test case (test_decoding's
oldest_xmin) uses ALTER of a composite type, which doesn't have such
interlocking.

To deal with this, we must be able to quickly retrieve oldest xmin
(oldest running xid among all assigned snapshots) from ReorderBuffer. To
fix, add another list of ReorderBufferTXNs to the reorderbuffer, where
transactions are sorted by base-snapshot-LSN.  This is slightly
different from the existing (sorted by first-LSN) list, because a
transaction can have an earlier LSN but a later Xmin, if its first
record does not obtain an xmin (eg. xl_xact_assignment).  Note this new
list doesn't fully replace the existing txn list: we still need that one
to prevent WAL recycling.

The second issue concerns SnapBuilder snapshots and subtransactions.
SnapBuildDistributeNewCatalogSnapshot never assigned a snapshot to a
transaction that is known to be a subtxn, which is good in the common
case that the top-level transaction already has one (no point in doing
so), but a bug otherwise.  To fix, arrange to transfer the snapshot from
the subtxn to its top-level txn as soon as the kinship gets known.
test_decoding's snapshot_transfer verifies this.

Also, fix a minor memory leak: refcount of toplevel's old base snapshot
was not decremented when the snapshot is transferred from child.

Liberally sprinkle code comments, and rewrite a few existing ones.  This
part is my (Álvaro's) contribution to this commit, as I had to write all
those comments in order to understand the existing code and Arseny's
patch.

Reported-by: Arseny Sher <a.sher@postgrespro.ru>
Diagnosed-by: Arseny Sher <a.sher@postgrespro.ru>
Co-authored-by: Arseny Sher <a.sher@postgrespro.ru>
Co-authored-by: Álvaro Herrera <alvherre@alvh.no-ip.org>
Reviewed-by: Antonin Houska <ah@cybertec.at>
Discussion: https://postgr.es/m/87lgdyz1wj.fsf@ars-thinkpad
2018-06-26 16:48:10 -04:00
Alexander Korotkov 4d54543efa Fix upper limit for vacuum_cleanup_index_scale_factor
6ca33a88 sets upper limit for vacuum_cleanup_index_scale_factor to
DBL_MAX.  DBL_MAX appears to be platform-dependent. That causes
many buildfarm animals to fail, because we check boundaries of
vacuum_cleanup_index_scale_factor in regression tests.

This commit changes upper limit from DBL_MAX to just "large enough"
limit, which was arbitrary selected as 1e10.

Author: Alexander Korotkov
Reported-by: Tom Lane, Darafei Praliaskouski
Discussion: https://postgr.es/m/CAPpHfdvewmr4PcpRjrkstoNn1n2_6dL-iHRB21CCfZ0efZdBTg%40mail.gmail.com
Discussion: https://postgr.es/m/CAC8Q8tLYFOpKNaPS_E7V8KtPdE%3D_TnAn16t%3DA3LuL%3DXjfOO-BQ%40mail.gmail.com
2018-06-26 21:55:59 +03:00
Peter Geoghegan aefb0a382c Correct a comment on logtape.c's leader tape.
randomAccess parallel tuplesorts are disallowed because the leader would
try to write to its own leader tape, not because the leader would try to
write to a worker tape directly.

Cleanup from commit 9da0cc3528.
2018-06-26 11:16:20 -07:00
Peter Geoghegan cdc2693a11 Remove obsolete comment block in nbtsort.c.
Building a new nbtree index through incremental insertions would always
be slower than our actual approach of sorting using tuplesort,
assembling leaf pages from tuplesort output, and writing and WAL-logging
whole pages.  Remove a comment block from the Berkeley days claiming
that incremental insertions might be slightly faster with presorted
input.

Discussion: https://postgr.es/m/CAH2-WzmKs4mLAoFgJ3yHMRYc849efc=dw+pNRb3NEog2oJoCNw@mail.gmail.com
2018-06-26 10:08:44 -07:00
Alvaro Herrera 040da42367 Enable failure to rename a partitioned index
Concurrently with partitioned index development (commit 8b08f7d482),
the code to handle failure to rename indexes was refactored (commit
8b9e9644dc).  Turns out that that particular case was untested, which
naturally led it to be broken.  Add tests and the missing code line.

Co-authored-by: David Rowley <dgrowley@gmail.com>
Co-authored-by: Álvaro Herrera <alvherre@alvh.no-ip.org>
Reported-by: Rajkumar Raghuwanshi <rajkumar.raghuwanshi@enterprisedb.com>
Discussion: https://postgr.es/m/CAKcux6mfYMS3OX0ywjOiWiGSEKhJf-1zdeTceHFbd0mScUzU5A@mail.gmail.com
2018-06-26 11:54:45 -04:00
Alvaro Herrera 7d872c91a3 Allow direct lookups of AppendRelInfo by child relid
find_appinfos_by_relids had quite a large overhead when the number of
items in the append_rel_list was high, as it had to trawl through the
append_rel_list looking for AppendRelInfos belonging to the given
childrelids.  Since there can only be a single AppendRelInfo for each
child rel, it seems much better to store an array in PlannerInfo which
indexes these by child relid, making the function O(1) rather than O(N).
This function was only called once inside the planner, so just replace
that call with a lookup to the new array.  find_childrel_appendrelinfo
is now unused and thus removed.

This fixes a planner performance regression new to v11 reported by
Thomas Reiss.

Author: David Rowley
Reported-by: Thomas Reiss
Reviewed-by: Ashutosh Bapat
Reviewed-by: Álvaro Herrera
Discussion: https://postgr.es/m/94dd7a4b-5e50-0712-911d-2278e055c622@dalibo.com
2018-06-26 10:35:26 -04:00
Alexander Korotkov 6ca33a885b Increase upper limit for vacuum_cleanup_index_scale_factor
Upper limits for vacuum_cleanup_index_scale_factor GUC and reloption
were initially set to 100.0 in 857f9c36.  However, after further
discussion, it appears that some users like to disable B-tree cleanup
index scan completely (assuming there are no deleted pages).

vacuum_cleanup_index_scale_factor is used barely to protect against
stalled index statistics.  And after detailed consideration it appears
that risk of stalled index statistics is low.  And it would be nice to
allow advanced users setting higher values of
vacuum_cleanup_index_scale_factor.  So, set upper limit for these
GUC and reloption to DBL_MAX.

Author: Alexander Korotkov
Reviewed-by: Masahiko Sawada
Discussion: https://postgr.es/m/CAC8Q8tJCb%3DgxhzcV7T6ctx7PY-Ux1oA-AsTJc6cAVNsQiYcCzA%40mail.gmail.com
2018-06-26 15:00:51 +03:00
Thomas Munro a40cff8956 Move RecoveryLockList into a hash table.
Standbys frequently need to release all locks held by a given xid.
Instead of searching one big list linearly, let's create one list
per xid and put them in a hash table, so we can find what we need
in O(1) time.

Earlier analysis and a prototype were done by David Rowley, though
this isn't his patch.

Back-patch all the way.

Author: Thomas Munro
Diagnosed-by: David Rowley, Andres Freund
Reviewed-by: Andres Freund, Tom Lane, Robert Haas
Discussion: https://postgr.es/m/CAEepm%3D1mL0KiQ2KJ4yuPpLGX94a4Ns_W6TL4EGRouxWibu56pA%40mail.gmail.com
Discussion: https://postgr.es/m/CAKJS1f9vJ841HY%3DwonnLVbfkTWGYWdPN72VMxnArcGCjF3SywA%40mail.gmail.com
2018-06-26 18:45:45 +12:00
Alvaro Herrera 322548a8ab Update obsolete comments
Commit 9fab40ad32 removed some pre-allocating logic in
reorderbuffer.c, but left outdated comments in place.  Repair.

Author: Álvaro Herrera
2018-06-25 15:45:48 -04:00
Peter Eisentraut 299addd592 Translation updates
Source-Git-URL: https://git.postgresql.org/git/pgtranslation/messages.git
Source-Git-Hash: 884f33d735870f94357820800840af3e93ff4628
2018-06-25 12:37:18 +02:00
Michael Paquier 6cb3372411 Address set of issues with errno handling
System calls mixed up in error code paths are causing two issues which
several code paths have not correctly handled:
1) For write() calls, sometimes the system may return less bytes than
what has been written without errno being set.  Some paths were careful
enough to consider that case, and assumed that errno should be set to
ENOSPC, other calls missed that.
2) errno generated by a system call is overwritten by other system calls
which may succeed once an error code path is taken, causing what is
reported to the user to be incorrect.

This patch uses the brute-force approach of correcting all those code
paths.  Some refactoring could happen in the future, but this is let as
future work, which is not targeted for back-branches anyway.

Author: Michael Paquier
Reviewed-by: Ashutosh Sharma
Discussion: https://postgr.es/m/20180622061535.GD5215@paquier.xyz
2018-06-25 11:19:05 +09:00
Alvaro Herrera 475be5e790 When index recurses to a partition, map columns numbers
Two out of three code paths were mapping column numbers correctly if a
partition had different column numbers than parent table, but the most
commonly used one (recursing in CREATE INDEX to a new index on a
partition) failed to map attribute numbers in expressions.  Oddly
enough, attnums in WHERE clauses are already handled correctly
everywhere.

Reported-by: Amit Langote
Author: Amit Langote
Discussion: https://postgr.es/m/dce1fda4-e0f0-94c9-6abb-f5956a98c057@lab.ntt.co.jp
Reviewed-by: Álvaro Herrera
2018-06-22 16:45:48 -04:00
Robert Haas c6f28af5d7 Avoid generating bogus paths with partitionwise aggregate.
Previously, if some or all partitions had no partially aggregated path,
we would still try to generate a partially aggregated path for the
parent, leading to assertion failures or wrong answers.

Report by Rajkumar Raghuwanshi.  Patch by Jeevan Chalke, reviewed
by Ashutosh Bapat.  A few changes by me.

Discussion: http://postgr.es/m/CAKcux6=q4+Mw8gOOX16ef6ZMFp9Cve7KWFstUsrDa4GiFaXGUQ@mail.gmail.com
2018-06-22 09:20:19 -04:00
Andrew Dunstan 2448adf29c Allow for pg_upgrade of attributes with missing values
Commit 16828d5c02 neglected to do this, so upgraded databases would
silently get null instead of the specified default in rows without the
attribute defined.

A new binary upgrade function is provided to perform this and pg_dump is
adjusted to output a call to the function if required in binary upgrade
mode.

Also included is code to drop missing attribute values for dropped
columns. That way if the type is later dropped the missing value won't
have a dangling reference to the type.

Finally the regression tests are adjusted to ensure that there is a row
with a missing value so that this code is exercised in upgrade testing.

Catalog version unfortunately bumped.

Regression test changes from Tom Lane.
Remainder from me, reviewed by Tom Lane, Andres Freund, Alvaro Herrera

Discussion: https://postgr.es/m/19987.1529420110@sss.pgh.pa.us
2018-06-22 08:42:36 -04:00
Alexander Korotkov 9a994e37e0 Fixes for vacuum_cleanup_index_scale_factor GUC option
vacuum_cleanup_index_scale_factor was located in autovacuum group of
GUCs.  However, it affects not only autovacuum, but also manually run
VACUUM.  It appears that "client connection defaults" group of GUCs
is more appropriate for vacuum_cleanup_index_scale_factor, because
vacuum_*_age options are already located there.

Also, vacuum_cleanup_index_scale_factor was missed in
postgresql.conf.sample.  So, add it there with appropriate comment.

Author: Masahiko Sawada with minor editorization by me
Discussion: https://postgr.es/m/CAD21AoArsoXMLKudXSKN679FRzs6oubEchM53bHwn8Tp%3D2boNg%40mail.gmail.com
2018-06-22 12:26:21 +03:00
Michael Paquier 0aa5e65ab4 Fix typo in comment of commit_ts.c for incorrect reference to CLOG
Author: Shao Bret
2018-06-22 13:30:26 +09:00
Amit Kapila 98d476a965 Improve coding pattern in Parallel Append code.
The create_append_path code didn't consider that list_concat will
modify it's first argument leading to inconsistent traversal of
resulting list.  In practice, it won't lead to any user-visible bug
but changing it for making the code behave consistently.

Reported-by: Tom Lane
Author: Tom Lane
Reviewed-by: Amit Khandekar and Amit Kapila
Discussion: https://postgr.es/m/32365.1528994120@sss.pgh.pa.us
2018-06-22 08:43:36 +05:30
Tom Lane ec4719cd15 Fix partial aggregation for variance(int4) and related aggregates.
A typo in numeric_poly_combine caused bogus results for queries using
it, but of course would only manifest if parallel aggregation is
performed.  Reported by Rajkumar Raghuwanshi.

David Rowley did the diagnosis and the fix; I editorialized rather
heavily on his regression test additions.

Back-patch to v10 where the breakage was introduced (by 9cca11c91).

Discussion: https://postgr.es/m/CAKcux6nU4E2x8nkSBpLOT2DPvQ5LviJ3SGyAN6Sz7qDH4G4+Pw@mail.gmail.com
2018-06-21 16:18:39 -04:00
Alvaro Herrera e474c2b7e4 Set correct context for XPath evaluation
According to the SQL standard, the context of XMLTABLE's XPath
row_expression is the document node of the XML input document, not the
root node.  This becomes visible when a relative path rather than
absolute is used as row expression.  Absolute paths is what was used in
original tests and docs (and the most common form used in examples
throughout the interwebs), which explains why this wasn't noticed
before.

Other functions such as xpath() and xpath_exists() also have this
problem.  While not specified by the SQL standard, it would be pretty
odd to leave those functions to behave differently than XMLTABLE, so
change them too.  However, this is a backwards-incompatible change.

No backpatch, out of fear of breaking code depending on the original
broken behavior.

Author: Markus Winand
Reported-By: Markus Winand
Reviewed-by: Álvaro Herrera
Discussion: https://postgr.es/m/0684A598-002C-42A2-AE12-F024A324EAE4@winand.at
2018-06-21 15:56:11 -04:00
Tom Lane 07e5a21352 Fix mishandling of sortgroupref labels while splitting SRF targetlists.
split_pathtarget_at_srfs() neglected to worry about sortgroupref labels
in the intermediate PathTargets it constructs.  I think we'd supposed
that their labeling didn't matter, but it does at least for the case that
GroupAggregate/GatherMerge nodes appear immediately under the ProjectSet
step(s).  This results in "ERROR: ORDER/GROUP BY expression not found in
targetlist" during create_plan(), as reported by Rajkumar Raghuwanshi.

To fix, make this logic track the sortgroupref labeling of expressions,
not just their contents.  This also restores the pre-v10 behavior that
separate GROUP BY expressions will be kept distinct even if they are
textually equal().

Discussion: https://postgr.es/m/CAKcux6=1_Ye9kx8YLBPmJs_xE72PPc6vNi5q2AOHowMaCWjJ2w@mail.gmail.com
2018-06-21 10:58:42 -04:00
Alvaro Herrera b7f0be9a7e Accept TEXT and CDATA nodes in XMLTABLE's column_expression.
Column expressions that match TEXT or CDATA nodes must return the
contents of the nodes themselves, not the content of non-existing
children (i.e. the empty string).

Author: Markus Winand
Reported-by: Markus Winand
Reviewed-by: Álvaro Herrera
Discussion: https://postgr.es/m/0684A598-002C-42A2-AE12-F024A324EAE4@winand.at
2018-06-20 12:58:12 -04:00
Alvaro Herrera 8f97af60d1 Consistently use the term 'partitioned rel' in partprune comments
We were using 'partition rel' in a few places, which is quite confusing.

Author: Amit Langote
Reviewed-by: David Rowley
Reviewed-by: Michaël Paquier
Discussion: https://postgr.es/m/fd256561-31a2-4b7e-cd84-d8241e7ebc3f@lab.ntt.co.jp
2018-06-20 11:43:01 -04:00
Amit Kapila 403318b71f Don't consider parallel append for parallel unsafe paths.
Commit ab72716778 allowed Parallel Append paths to be generated for a
relation that is not parallel safe.  Prevent that from happening.

Initial analysis by Tom Lane.

Reported-by: Rajkumar Raghuwanshi
Author: Amit Kapila and Rajkumar Raghuwanshi
Reviewed-by: Amit Khandekar and Robert Haas
Discussion:https://postgr.es/m/CAKcux6=tPJ6nJ08r__nU_pmLQiC0xY15Fn0HvG1Cprsjdd9s_Q@mail.gmail.com
2018-06-20 07:51:42 +05:30
Michael Paquier 1c7c317cd9 Clarify use of temporary tables within partition trees
Since their introduction, partition trees have been a bit lossy
regarding temporary relations.  Inheritance trees respect the following
patterns:
1) a child relation can be temporary if the parent is permanent.
2) a child relation can be temporary if the parent is temporary.
3) a child relation cannot be permanent if the parent is temporary.
4) The use of temporary relations also imply that when both parent and
child need to be from the same sessions.

Partitions share many similar patterns with inheritance, however the
handling of the partition bounds make the situation a bit tricky for
case 1) as the partition code bases a lot of its lookup code upon
PartitionDesc which does not really look after relpersistence.  This
causes for example a temporary partition created by session A to be
visible by another session B, preventing this session B to create an
extra partition which overlaps with the temporary one created by A with
a non-intuitive error message.  There could be use-cases where mixing
permanent partitioned tables with temporary partitions make sense, but
that would be a new feature.  Partitions respect 2), 3) and 4) already.

It is a bit depressing to see those error checks happening in
MergeAttributes() whose purpose is different, but that's left as future
refactoring work.

Back-patch down to 10, which is where partitioning has been introduced,
except that default partitions do not apply there.  Documentation also
includes limitations related to the use of temporary tables with
partition trees.

Reported-by: David Rowley
Author: Amit Langote, Michael Paquier
Reviewed-by: Ashutosh Bapat, Amit Langote, Michael Paquier
Discussion: https://postgr.es/m/CAKJS1f94Ojk0og9GMkRHGt8wHTW=ijq5KzJKuoBoqWLwSVwGmw@mail.gmail.com
2018-06-20 10:42:25 +09:00
Tom Lane 45e98ee730 Remove obsolete prohibition on function name matching a column name.
ProcedureCreate formerly threw an error if the function to be created
has one argument of composite type and the function name matches some
column of the composite type.  This was a (very non-bulletproof) defense
against creating situations where f(x) and x.f are ambiguous.  But we
don't really need such a defense in the wake of commit b97a3465d, which
allows us to deal with such situations fairly cleanly.  This behavior
also created a dump-and-reload hazard, since a function might be
rejected if a conflicting column name had been added to the input
composite type later.  Hence, let's just drop the check.

Discussion: https://postgr.es/m/CAOW5sYa3Wp7KozCuzjOdw6PiOYPi6D=VvRybtH2S=2C0SVmRmA@mail.gmail.com
2018-06-18 11:57:33 -04:00
Tom Lane b97a3465d7 Consider syntactic form when disambiguating function vs column reference.
Postgres has traditionally considered the syntactic forms f(x) and x.f
to be equivalent, allowing tricks such as writing a function and then
using it as though it were a computed-on-demand column.  However, our
behavior when both interpretations are feasible left something to be
desired: we always chose the column interpretation.  This could lead
to very surprising results, as in a recent bug report from Neil Conway.
It also created a dump-and-reload hazard, since what was a function
call in a dumped view could get interpreted as a column reference
at reload, if a matching column name had been added to the underlying
table since the view was created.

What seems better, in ambiguous situations, is to prefer the choice
matching the syntactic form of the reference.  This seems much less
astonishing in general, and it fixes the dump/reload hazard.

Although this could be called a bug fix, there have been few complaints
and there's some small risk of breaking applications that depend on the
old behavior, so no back-patch.  It does seem reasonable to slip it
into v11, though.

Discussion: https://postgr.es/m/CAOW5sYa3Wp7KozCuzjOdw6PiOYPi6D=VvRybtH2S=2C0SVmRmA@mail.gmail.com
2018-06-18 11:39:33 -04:00
Michael Paquier 70b4f82a4b Prevent hard failures of standbys caused by recycled WAL segments
When a standby's WAL receiver stops reading WAL from a WAL stream, it
writes data to the current WAL segment without having priorily zero'ed
the page currently written to, which can cause the WAL reader to read
junk data from a past recycled segment and then it would try to get a
record from it.  While sanity checks in place provide most of the
protection needed, in some rare circumstances, with chances increasing
when a record header crosses a page boundary, then the startup process
could fail violently on an allocation failure, as follows:
FATAL:  invalid memory alloc request size XXX

This is confusing for the user and also unhelpful as this requires in
the worst case a manual restart of the instance, impacting potentially
the availability of the cluster, and this also makes WAL data look like
it is in a corrupted state.

The chances of seeing failures are higher if the connection between the
standby and its root node is unstable, causing WAL pages to be written
in the middle.  A couple of approaches have been discussed, like
zero-ing  new WAL pages within the WAL receiver itself but this has the
disadvantage of impacting performance of any existing instances as this
breaks the sequential writes done by the WAL receiver.  This commit
deals with the problem with a more simple approach, which has no
performance impact without reducing the detection of the problem: if a
record is found with a length higher than 1GB for backends, then do not
try any allocation and report a soft failure which will force the
standby to retry reading WAL.  It could be possible that the allocation
call passes and that an unnecessary amount of memory is allocated,
however follow-up checks on records would just fail, making this
allocation short-lived anyway.

This patch owes a great deal to Tsunakawa Takayuki for reporting the
failure first, and then discussing a couple of potential approaches to
the problem.

Backpatch down to 9.5, which is where palloc_extended has been
introduced.

Reported-by: Tsunakawa Takayuki
Reviewed-by: Tsunakawa Takayuki
Author: Michael Paquier
Discussion: https://postgr.es/m/0A3221C70F24FB45833433255569204D1F8B57AD@G01JPEXMBYT05
2018-06-18 10:43:27 +09:00
Tom Lane 9b53d96684 Suppress -Wshift-negative-value warnings.
Clean up four places that result in compiler warnings when using recent
gcc with this warning class enabled (as seen on buildfarm members
calliphoridae, skink, and others).  In all these places, this is purely
cosmetic, because the shift distance could not be large enough to risk
a change of sign, so there's no chance of implementation-dependent
behavior.  Still, it's easy enough to avoid the warning by casting the
shifted value to unsigned, so let's do that.

Patch HEAD only, this isn't worth a back-patch.
2018-06-17 16:15:11 -04:00
Tom Lane 0dcf68e5a1 Fix some minor error-checking oversights in ParseFuncOrColumn().
Recent additions to ParseFuncOrColumn to make it reject non-procedure
functions in CALL were neither adequate nor documented.  Reorganize
the code to ensure uniform results for all the cases that should be
rejected.  Also, use ERRCODE_WRONG_OBJECT_TYPE for this case as well
as the converse case of a procedure in a non-CALL context.  The
original coding used ERRCODE_UNDEFINED_FUNCTION which seems wrong,
and is certainly inconsistent with the adjacent wrong-kind-of-routine
errors.

This reorganization also causes the checks for aggregate decoration with
a non-aggregate function to be made in the FUNCDETAIL_COERCION case;
that they were not is a long-standing oversight.

Discussion: https://postgr.es/m/14497.1529089235@sss.pgh.pa.us
2018-06-16 14:11:14 -04:00
Simon Riggs 15378c1a15 Remove AELs from subxids correctly on standby
Issues relate only to subtransactions that hold AccessExclusiveLocks
when replayed on standby.

Prior to PG10, aborting subtransactions that held an
AccessExclusiveLock failed to release the lock until top level commit or
abort. 49bff5300d fixed that.

However, 49bff5300d also introduced a similar bug where subtransaction
commit would fail to release an AccessExclusiveLock, leaving the lock to
be removed sometimes early and sometimes late. This commit fixes
that bug also. Backpatch to PG10 needed.

Tested by observation. Note need for multi-node isolationtester to improve
test coverage for this and other HS cases.

Reported-by: Simon Riggs
Author: Simon Riggs
2018-06-16 14:03:29 +01:00
Tatsuo Ishii 1cfdb1cb0e Fix memory leak in BufFileCreateShared().
Also this commit unifies some duplicated code in makeBufFile() and
BufFileOpenShared() into new function makeBufFileCommon().

Author: Antonin Houska
Reviewed-By: Thomas Munro, Tatsuo Ishii
Discussion: https://postgr.es/m/16139.1529049566%40localhost
2018-06-16 14:21:08 +09:00
Alvaro Herrera ff03112bdc Fix off-by-one bug in XactLogCommitRecord
Commit 1eb6d6527a introduced zeroed alignment bytes in the GID field
of commit/abort WAL records.  Fixup commit cf5a189059 later changed
that representation into a regular cstring with a single terminating
zero byte, but it also introduced an off-by-one mistake.  Fix that.

Author: Nikhil Sontakke
Reported-by: Nikhil Sontakke
Discussion: https://postgr.es/m/CAMGcDxey6dG1DP34_tJMoWPcp5sPJUAL4K5CayUUXLQSx2GQpA@mail.gmail.com
2018-06-15 15:00:41 -04:00
Tatsuo Ishii 969274d813 Fix memory leak.
Memory is allocated twice for "file" and "files" variables in
BufFileOpenShared().

Author: Antonin Houska
Discussion: https://postgr.es/m/11329.1529045692%40localhost
2018-06-15 16:32:59 +09:00
Alvaro Herrera 74da7cda31 Fail BRIN control functions during recovery explicitly
They already fail anyway, but prior to this patch they raise an ugly
error message about a lock that cannot be acquired.  This just improves
the message.

Author: Masahiko Sawada
Reported-by: Masahiko Sawada
Discussion: https://postgr.es/m/CAD21AoBZau4g4_NUf3BKNd=CdYK+xaPdtJCzvOC1TxGdTiJx_Q@mail.gmail.com
Reviewed-by: Kuntal Ghosh, Alexander Korotkov, Simon Riggs, Michaël Paquier, Álvaro Herrera
2018-06-14 12:51:32 -04:00
Simon Riggs dc878ffedf Remove spurious code comments in standby related code
GetRunningTransactionData() suggested that subxids were not worth
optimizing away if overflowed, yet they have already been removed
for that case.

Changes to LogAccessExclusiveLock() API forgot to remove the
prior comment when it was copied to LockAcquire().
2018-06-14 12:17:51 +01:00
Simon Riggs 802bde87ba Remove cut-off bug from RunningTransactionData
32ac7a118f tried to fix a Hot Standby issue
reported by Greg Stark, but in doing so caused
a different bug to appear, noted by Andres Freund.

Revoke the core changes from 32ac7a118f,
leaving in its place a minor change in code
ordering and comments to explain for the future.
2018-06-14 12:02:41 +01:00
Tom Lane 91781335ed Code review for match_clause_to_partition_key().
Fix inconsistent decisions about NOMATCH vs UNSUPPORTED result codes.
If we're going to cater for partkeys that have the same expression and
different collations, surely we should also support partkeys with the
same expression and different opclasses.

Clean up shaky handling of commuted opclauses, eg checking the wrong
operator to see what its negator is.  This wouldn't cause any actual
bugs given a sane opclass definition, but it doesn't seem helpful to
expend more code to be less correct.

Improve handling of null elements in ScalarArrayOp arrays: in the
"op ALL" case, we can conclude they result in an unsatisfiable clause.

Minor cosmetic changes and comment improvements.
2018-06-13 16:10:30 -04:00
Tom Lane 19832753f1 Fix some ill-chosen names for globally-visible partition support functions.
"compute_hash_value" is particularly gratuitously generic, but IMO
all of these ought to have names clearly related to partitioning.
2018-06-13 13:18:02 -04:00
Tom Lane e23bae82cf Fix up run-time partition pruning's use of relcache's partition data.
The previous coding saved pointers into the partitioned table's relcache
entry, but then closed the relcache entry, causing those pointers to
nominally become dangling.  Actual trouble would be seen in the field
only if a relcache flush occurred mid-query, but that's hardly out of
the question.

While we could fix this by copying all the data in question at query
start, it seems better to just hold the relcache entry open for the
whole query.

While at it, improve the handling of support-function lookups: do that
once per query not once per pruning test.  There's still something to be
desired here, in that we fail to exploit the possibility of caching data
across queries in the fn_extra fields of the relcache's FmgrInfo structs,
which could happen if we just used those structs in-place rather than
copying them.  However, combining that with the possibility of per-query
lookups of cross-type comparison functions seems to require changes in the
APIs of a lot of the pruning support functions, so it's too invasive to
consider as part of this patch.  A win would ensue only for complex
partition key data types (e.g. arrays), so it may not be worth the
trouble.

David Rowley and Tom Lane

Discussion: https://postgr.es/m/17850.1528755844@sss.pgh.pa.us
2018-06-13 12:03:26 -04:00
Andres Freund a54e1f1587 Fix bugs in vacuum of shared rels, by keeping their relcache entries current.
When vacuum processes a relation it uses the corresponding relcache
entry's relfrozenxid / relminmxid as a cutoff for when to remove
tuples etc. Unfortunately for nailed relations (i.e. critical system
catalogs) bugs could frequently lead to the corresponding relcache
entry being stale.

This set of bugs could cause actual data corruption as vacuum would
potentially not remove the correct row versions, potentially reviving
them at a later point.  After 699bf7d05c some corruptions in this vein
were prevented, but the additional error checks could also trigger
spuriously. Examples of such errors are:
  ERROR: found xmin ... from before relfrozenxid ...
and
  ERROR: found multixact ... from before relminmxid ...
To be caused by this bug the errors have to occur on system catalog
tables.

The two bugs are:

1) Invalidations for nailed relations were ignored, based on the
   theory that the relcache entry for such tables doesn't
   change. Which is largely true, except for fields like relfrozenxid
   etc.  This means that changes to relations vacuumed in other
   sessions weren't picked up by already existing sessions.  Luckily
   autovacuum doesn't have particularly longrunning sessions.

2) For shared *and* nailed relations, the shared relcache init file
   was never invalidated while running.  That means that for such
   tables (e.g. pg_authid, pg_database) it's not just already existing
   sessions that are affected, but even new connections are as well.
   That explains why the reports usually were about pg_authid et. al.

To fix 1), revalidate the rd_rel portion of a relcache entry when
invalid. This implies a bit of extra complexity to deal with
bootstrapping, but it's not too bad.  The fix for 2) is simpler,
simply always remove both the shared and local init files.

Author: Andres Freund
Reviewed-By: Alvaro Herrera
Discussion:
    https://postgr.es/m/20180525203736.crkbg36muzxrjj5e@alap3.anarazel.de
    https://postgr.es/m/CAMa1XUhKSJd98JW4o9StWPrfS=11bPgG+_GDMxe25TvUY4Sugg@mail.gmail.com
    https://postgr.es/m/CAKMFJucqbuoDRfxPDX39WhA3vJyxweRg_zDVXzncr6+5wOguWA@mail.gmail.com
    https://postgr.es/m/CAGewt-ujGpMLQ09gXcUFMZaZsGJC98VXHEFbF-tpPB0fB13K+A@mail.gmail.com
Backpatch: 9.3-
2018-06-12 11:13:21 -07:00
Peter Eisentraut 8a07ebb3c1 Convert debug message from ereport to elog 2018-06-12 11:33:39 -04:00
Tom Lane bdc643e5e4 Fix access to just-closed relcache entry.
It might be impossible for this to cause a problem in non-debug builds,
since there'd be no opportunity for the relcache entry to get recycled
before the fetch.  It blows up nicely with -DRELCACHE_FORCE_RELEASE plus
valgrind, though.

Evidently introduced by careless refactoring in commit f0e44751d.
Back-patch accordingly.

Discussion: https://postgr.es/m/27543.1528758304@sss.pgh.pa.us
2018-06-11 19:18:04 -04:00
Michael Paquier f8795d2ec8 Fix oversight from 9e149c8 with spin-lock handling
Calling an external function while a pin-lock is held is a bad idea as
those are designed to be short-lived.  The stress of a first commit into
a large git history may contribute to that.

Reported-by: Andres Freund
Discussion: https://postgr.es/m/20180611164952.vmxdpdpirdtkdsz6@alap3.anarazel.de
2018-06-12 06:52:34 +09:00