Commit Graph

35795 Commits

Author SHA1 Message Date
Fujii Masao b5310e4ff6 Remove non-fast promotion.
When fast promotion was supported in 9.3, non-fast promotion became
undocumented feature and it's basically not available for ordinary users.
However we decided not to remove non-fast promotion at that moment,
to leave it for a release or two for debugging purpose or as an emergency
method because fast promotion might have some issues, and then to
remove it later. Now, several versions were released since that decision
and there is no longer reason to keep supporting non-fast promotion.
Therefore this commit removes non-fast promotion.

Author: Fujii Masao
Reviewed-by: Hamid Akhtar, Kyotaro Horiguchi
Discussion: https://postgr.es/m/76066434-648f-f567-437b-54853b43398f@oss.nttdata.com
2020-07-29 21:24:26 +09:00
Jeff Davis 9878b643f3 HashAgg: use better cardinality estimate for recursive spilling.
Use HyperLogLog to estimate the group cardinality in a spilled
partition. This estimate is used to choose the number of partitions if
we recurse.

The previous behavior was to use the number of tuples in a spilled
partition as the estimate for the number of groups, which lead to
overpartitioning. That could cause the number of batches to be much
higher than expected (with each batch being very small), which made it
harder to interpret EXPLAIN ANALYZE results.

Reviewed-by: Peter Geoghegan
Discussion: https://postgr.es/m/a856635f9284bc36f7a77d02f47bbb6aaf7b59b3.camel@j-davis.com
Backpatch-through: 13
2020-07-28 23:16:28 -07:00
Michael Paquier f2130e77da Fix incorrect print format in json.c
Oid is unsigned, so %u needs to be used and not %d.  The code path
involved here is not normally reachable, so no backpatch is done.

Author: Justin Pryzby
Discussion: https://postgr.es/m/20200728015523.GA27308@telsasoft.com
2020-07-29 14:44:32 +09:00
Thomas Munro cb04ad4985 Move syncscan.c to src/backend/access/common.
Since the tableam.c code needs to make use of the syncscan.c routines
itself, and since other block-oriented AMs might also want to use it one
day, it didn't make sense for it to live under src/backend/access/heap.

Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/CA%2BhUKGLCnG%3DNEAByg6bk%2BCT9JZD97Y%3DAxKhh27Su9FeGWOKvDg%40mail.gmail.com
2020-07-29 16:59:33 +12:00
Peter Geoghegan c49c74d192 Rename another "hash_mem" local variable.
Missed by my commit 564ce621.

Backpatch: 13-, where disk-based hash aggregation was introduced.
2020-07-28 17:59:16 -07:00
Peter Geoghegan b1d79127ed Correct obsolete UNION hash aggs comment.
Oversight in commit 1f39bce0, which added disk-based hash aggregation.

Backpatch: 13-, where disk-based hash aggregation was introduced.
2020-07-28 17:14:07 -07:00
David Rowley 0e3e1c4e1c Make EXPLAIN ANALYZE of HashAgg more similar to Hash Join
There were various unnecessary differences between Hash Agg's EXPLAIN
ANALYZE output and Hash Join's.  Here we modify the Hash Agg output so
that it's better aligned to Hash Join's.

The following changes have been made:
1. Start batches counter at 1 instead of 0.
2. Always display the "Batches" property, even when we didn't spill to
   disk.
3. Use the text "Batches" instead of "HashAgg Batches" for text format.
4. Use the text "Memory Usage" instead of "Peak Memory Usage" for text
   format.
5. Include "Batches" before "Memory Usage" in both text and non-text
   formats.

In passing also modify the "Planned Partitions" property so that we show
it regardless of if the value is 0 or not for non-text EXPLAIN formats.
This was pointed out by Justin Pryzby and probably should have been part
of 40efbf870.

Reviewed-by: Justin Pryzby, Jeff Davis
Discussion: https://postgr.es/m/CAApHDvrshRnA6C0VFnu7Fb9TVvgGo80PUMm5+2DiaS1gEkPvtw@mail.gmail.com
Backpatch-through: 13, where HashAgg batching was introduced
2020-07-29 11:42:21 +12:00
Amit Kapila 45fdc9738b Extend the logical decoding output plugin API with stream methods.
This adds seven methods to the output plugin API, adding support for
streaming changes of large in-progress transactions.

* stream_start
* stream_stop
* stream_abort
* stream_commit
* stream_change
* stream_message
* stream_truncate

Most of this is a simple extension of the existing methods, with
the semantic difference that the transaction (or subtransaction)
is incomplete and may be aborted later (which is something the
regular API does not really need to deal with).

This also extends the 'test_decoding' plugin, implementing these
new stream methods.

The stream_start/start_stop are used to demarcate a chunk of changes
streamed for a particular toplevel transaction.

This commit simply adds these new APIs and the upcoming patch to "allow
the streaming mode in ReorderBuffer" will use these APIs.

Author: Tomas Vondra, Dilip Kumar, Amit Kapila
Reviewed-by: Amit Kapila
Tested-by: Neha Sharma and Mahendra Singh Thalor
Discussion: https://postgr.es/m/688b0b7f-2f6c-d827-c27b-216a8e3ea700@2ndquadrant.com
2020-07-28 08:09:44 +05:30
Etsuro Fujita 13838740f6 Fix some issues with step generation in partition pruning.
In the case of range partitioning, get_steps_using_prefix() assumes that
the passed-in prefix list contains at least one clause for each of the
partition keys earlier than one specified in the passed-in
step_lastkeyno, but the caller (ie, gen_prune_steps_from_opexps())
didn't take it into account, which led to a server crash or incorrect
results when the list contained no clauses for such partition keys, as
reported in bug #16500 and #16501 from Kobayashi Hisanori.  Update the
caller to call that function only when the list created there contains
at least one clause for each of the earlier partition keys in the case
of range partitioning.

While at it, fix some other issues:

* The list to pass to get_steps_using_prefix() is allowed to contain
  multiple clauses for the same partition key, as described in the
  comment for that function, but that function actually assumed that the
  list contained just a single clause for each of middle partition keys,
  which led to an assertion failure when the list contained multiple
  clauses for such partition keys.  Update that function to match the
  comment.
* In the case of hash partitioning, partition keys are allowed to be
  NULL, in which case the list to pass to get_steps_using_prefix()
  contains no clauses for NULL partition keys, but that function treats
  that case as like the case of range partitioning, which led to the
  assertion failure.  Update the assertion test to take into account
  NULL partition keys in the case of hash partitioning.
* Fix a typo in a comment in get_steps_using_prefix_recurse().
* gen_partprune_steps() failed to detect self-contradiction from
  strict-qual clauses and an IS NULL clause for the same partition key
  in some cases, producing incorrect partition-pruning steps, which led
  to incorrect results of partition pruning, but didn't cause any
  user-visible problems fortunately, as the self-contradiction is
  detected later in the query planning.  Update that function to detect
  the self-contradiction.

Per bug #16500 and #16501 from Kobayashi Hisanori.  Patch by me, initial
diagnosis for the reported issue and review by Dmitry Dolgov.
Back-patch to v11, where partition pruning was introduced.

Discussion: https://postgr.es/m/16500-d1613f2a78e1e090%40postgresql.org
Discussion: https://postgr.es/m/16501-5234a9a0394f6754%40postgresql.org
2020-07-28 11:00:00 +09:00
Peter Geoghegan bcbf9446a2 Remove hashagg_avoid_disk_plan GUC.
Note: This GUC was originally named enable_hashagg_disk when it appeared
in commit 1f39bce0, which added disk-based hash aggregation.  It was
subsequently renamed in commit 92c58fd9.

Author: Peter Geoghegan
Reviewed-By: Jeff Davis, Álvaro Herrera
Discussion: https://postgr.es/m/9d9d1e1252a52ea1bad84ea40dbebfd54e672a0f.camel%40j-davis.com
Backpatch: 13-, where disk-based hash aggregation was introduced.
2020-07-27 17:53:19 -07:00
Michael Paquier e971357961 Fix handling of structure for bytea data type in ECPG
Some code paths dedicated to bytea used the structure for varchar.  This
did not lead to any actual bugs, as bytea and varchar have the same
definition, but it could become a trap if one of these definitions
changes for a new feature or a bug fix.

Issue introduced by 050710b.

Author: Shenhao Wang
Reviewed-by: Vignesh C, Michael Paquier
Discussion: https://postgr.es/m/07ac7dee1efc44f99d7f53a074420177@G08CNEXMBPEKD06.g08.fujitsu.local
Backpatch-through: 12
2020-07-27 10:28:06 +09:00
Jeff Davis 200f6100a9 Fix LookupTupleHashEntryHash() pipeline-stall issue.
Refactor hash lookups in nodeAgg.c to improve performance.

Author: Andres Freund and Jeff Davis
Discussion: https://postgr.es/m/20200612213715.op4ye4q7gktqvpuo%40alap3.anarazel.de
Backpatch-through: 13
2020-07-26 15:09:46 -07:00
David Rowley 56788d2156 Allocate consecutive blocks during parallel seqscans
Previously we would allocate blocks to parallel workers during a parallel
sequential scan 1 block at a time.  Since other workers were likely to
request a block before a worker returns for another block number to work
on, this could lead to non-sequential I/O patterns in each worker which
could cause the operating system's readahead to perform poorly or not at
all.

Here we change things so that we allocate consecutive "chunks" of blocks
to workers and have them work on those until they're done, at which time
we allocate another chunk for the worker.  The size of these chunks is
based on the size of the relation.

Initial patch here was by Thomas Munro which showed some good improvements
just having a fixed chunk size of 64 blocks with a simple ramp-down near
the end of the scan. The revisions of the patch to make the chunk size
based on the relation size and the adjusted ramp-down in powers of two was
done by me, along with quite extensive benchmarking to determine the
optimal chunk sizes.

For the most part, benchmarks have shown significant performance
improvements for large parallel sequential scans on Linux, FreeBSD and
Windows using SSDs.  It's less clear how this affects the performance of
cloud providers.  Tests done so far are unable to obtain stable enough
performance to provide meaningful benchmark results.  It is possible that
this could cause some performance regressions on more obscure filesystems,
so we may need to later provide users with some ability to get something
closer to the old behavior.  For now, let's leave that until we see that
it's really required.

Author: Thomas Munro, David Rowley
Reviewed-by: Ranier Vilela, Soumyadeep Chakraborty, Robert Haas
Reviewed-by: Amit Kapila, Kirk Jamison
Discussion: https://postgr.es/m/CA+hUKGJ_EErDv41YycXcbMbCBkztA34+z1ts9VQH+ACRuvpxig@mail.gmail.com
2020-07-26 21:02:45 +12:00
Michael Paquier 11a68e4b53 Tweak behavior of pg_stat_activity.leader_pid
The initial implementation of leader_pid in pg_stat_activity added by
b025f32 took the approach to strictly print what a PGPROC entry
includes.  In short, if a backend has been involved in parallel query at
least once, leader_pid would remain set as long as the backend is alive.
For a parallel group leader, this means that the field would always be
set after it participated at least once in parallel query, and after
more discussions this could be confusing if using for example a
connection pooler.

This commit changes the data printed so as leader_pid becomes always
NULL for a parallel group leader, showing up a non-NULL value only for
the parallel workers, and actually as long as a parallel query is
running as workers are shut down once the query has completed.

This does not change the definition of any catalog, so no catalog bump
is needed.  Per discussion with Justin Pryzby, Álvaro Herrera, Julien
Rouhaud and me.

Discussion: https://postgr.es/m/20200721035145.GB17300@paquier.xyz
Backpatch-through: 13
2020-07-26 16:32:11 +09:00
Noah Misch 15e4419722 Remove optimization for RAND_poll() failing.
The loop to generate seed data will exit on RAND_status(), so we don't
need to handle the case of RAND_poll() failing separately.  Failures
here are rare, so this a code cleanup, essentially.

Daniel Gustafsson, reviewed by David Steele and Michael Paquier.

Discussion: https://postgr.es/m/9B038FA5-23E8-40D0-B932-D515E1D8F66A@yesql.se
2020-07-25 14:50:59 -07:00
Noah Misch ce4939ff70 Use RAND_poll() for seeding randomness after fork().
OpenSSL deprecated RAND_cleanup(), and OpenSSL 1.1.0 made it into a
no-op.  Replace it with RAND_poll(), per an OpenSSL community
recommendation.  While this has no user-visible consequences under
OpenSSL defaults, it might help under non-default settings.

Daniel Gustafsson, reviewed by David Steele and Michael Paquier.

Discussion: https://postgr.es/m/9B038FA5-23E8-40D0-B932-D515E1D8F66A@yesql.se
2020-07-25 14:50:59 -07:00
Tom Lane 0a0727ccfc Improve performance of binary COPY FROM through better buffering.
At least on Linux and macOS, fread() turns out to have far higher
per-call overhead than one could wish.  Reading 64KB of data at a time
and then parceling it out with our own memcpy logic makes binary COPY
from a file significantly faster --- around 30% in simple testing for
cases with narrow text columns (on Linux ... even more on macOS).

In binary COPY from frontend, there's no per-call fread(), and this
patch introduces an extra layer of memcpy'ing, but it still manages
to eke out a small win.  Apparently, the control-logic overhead in
CopyGetData() is enough to be worth avoiding for small fetches.

Bharath Rupireddy and Amit Langote, reviewed by Vignesh C,
cosmetic tweaks by me

Discussion: https://postgr.es/m/CALj2ACU5Bz06HWLwqSzNMN=Gupoj6Rcn_QVC+k070V4em9wu=A@mail.gmail.com
2020-07-25 16:34:35 -04:00
Tom Lane 8a37951eeb Mark built-in coercion functions as leakproof where possible.
Making these leakproof seems helpful since (for example) if you have a
function f(int8) that is leakproof, you don't want it to effectively
become non-leakproof when you apply it to an int4 or int2 column.
But that's what happens today, since the implicit up-coercion will
not be leakproof.

Most of the coercion functions that visibly can't throw errors are
functions that convert numeric datatypes to other, wider ones.
Notable is that float4_numeric and float8_numeric can be marked
leakproof; before commit a57d312a7 they could not have been.
I also marked the functions that coerce strings to "name" as leakproof;
that's okay today because they truncate silently, but if we ever
reconsidered that behavior then they could no longer be leakproof.

I desisted from marking rtrim1() as leakproof; it appears so right now,
but the code seems a little too complex and perhaps subject to change,
since it's shared with other SQL functions.

Discussion: https://postgr.es/m/459322.1595607431@sss.pgh.pa.us
2020-07-25 12:54:58 -04:00
Amit Kapila 2a2494229a Fix buffer usage stats for nodes above Gather Merge.
Commit 85c9d347 addressed a similar problem for Gather and Gather
Merge nodes but forgot to account for nodes above parallel nodes.  This
still works for nodes above Gather node because we shut down the workers
for Gather node as soon as there are no more tuples.  We can do a similar
thing for Gather Merge as well but it seems better to account for stats
during nodes shutdown after completing the execution.

Reported-by: Stéphane Lorek, Jehan-Guillaume de Rorthais
Author: Jehan-Guillaume de Rorthais <jgdr@dalibo.com>
Reviewed-by: Amit Kapila
Backpatch-through: 10, where it was introduced
Discussion: https://postgr.es/m/20200718160206.584532a2@firost
2020-07-25 10:20:39 +05:30
Tom Lane 79d6d1a277 Replace TS_execute's TS_EXEC_CALC_NOT flag with TS_EXEC_SKIP_NOT.
It's fairly silly that ignoring NOT subexpressions is TS_execute's
default behavior.  It's wrong on its face and it encourages errors
of omission.  Moreover, the only two remaining callers that aren't
specifying CALC_NOT are in ts_headline calculations, and it's very
arguable that those are bugs: if you've specified "!foo" in your
query, why would you want to get a headline that includes "foo"?

Hence, rip that out and change the default behavior to be to calculate
NOT accurately.  As a concession to the slim chance that there is still
somebody somewhere who needs the incorrect behavior, provide a new
SKIP_NOT flag to explicitly request that.

Back-patch into v13, mainly because it seems better to change this
at the same time as the previous commit's rejiggering of TS_execute
related APIs.  Any outside callers affected by this change are
probably also affected by that one.

Discussion: https://postgr.es/m/CALT9ZEE-aLotzBg-pOp2GFTesGWVYzXA3=mZKzRDa_OKnLF7Mg@mail.gmail.com
2020-07-24 15:43:56 -04:00
Tom Lane 2f2007fbb2 Fix assorted bugs by changing TS_execute's callback API to ternary logic.
Text search sometimes failed to find valid matches, for instance
'!crew:A'::tsquery might fail to locate 'crew:1B'::tsvector during
an index search.  The root of the issue is that TS_execute's callback
functions were not changed to use ternary (yes/no/maybe) reporting
when we made the search logic itself do so.  It's somewhat annoying
to break that API, but on the other hand we now see that any code
using plain boolean logic is almost certainly broken since the
addition of phrase search.  There seem to be very few outside callers
of this code anyway, so we'll just break them intentionally to get
them to adapt.

This allows removal of tsginidx.c's private re-implementation of
TS_execute, since that's now entirely duplicative.  It's also no
longer necessary to avoid use of CALC_NOT in tsgistidx.c, since
the underlying callbacks can now do something reasonable.

Back-patch into v13.  We can't change this in stable branches,
but it seems not quite too late to fix it in v13.

Tom Lane and Pavel Borisov

Discussion: https://postgr.es/m/CALT9ZEE-aLotzBg-pOp2GFTesGWVYzXA3=mZKzRDa_OKnLF7Mg@mail.gmail.com
2020-07-24 15:26:51 -04:00
Peter Eisentraut 25244b8972 Rename configure.in to configure.ac
The new name has been preferred by Autoconf for a long time.  Future
versions of Autoconf will warn about the old name.

Discussion: https://www.postgresql.org/message-id/flat/e796c185-5ece-8569-248f-dd3799701be1%402ndquadrant.com
2020-07-24 10:42:08 +02:00
Thomas Munro 42dee8b8e3 Fix error message.
Remove extra space.  Back-patch to all releases, like commit 7897e3bb.

Author: Lu, Chenyang <lucy.fnst@cn.fujitsu.com>
Discussion: https://postgr.es/m/795d03c6129844d3803e7eea48f5af0d%40G08CNEXMBPEKD04.g08.fujitsu.local
2020-07-23 21:10:49 +12:00
Amit Kapila c55040ccd0 WAL Log invalidations at command end with wal_level=logical.
When wal_level=logical, write invalidations at command end into WAL so
that decoding can use this information.

This patch is required to allow the streaming of in-progress transactions
in logical decoding.  The actual work to allow streaming will be committed
as a separate patch.

We still add the invalidations to the cache and write them to WAL at
commit time in RecordTransactionCommit(). This uses the existing
XLOG_INVALIDATIONS xlog record type, from the RM_STANDBY_ID resource
manager (see LogStandbyInvalidations for details).

So existing code relying on those invalidations (e.g. redo) does not need
to be changed.

The invalidations written at command end uses a new xlog record type
XLOG_XACT_INVALIDATIONS, from RM_XACT_ID resource manager. See
LogLogicalInvalidations for details.

These new xlog records are ignored by existing redo procedures, which
still rely on the invalidations written to commit records.

The invalidations are decoded and accumulated in top-transaction, and then
executed during replay.  This obviates the need to decode the
invalidations as part of a commit record.

Bump XLOG_PAGE_MAGIC, since this introduces XLOG_XACT_INVALIDATIONS.

Author: Dilip Kumar, Tomas Vondra, Amit Kapila
Reviewed-by: Amit Kapila
Tested-by: Neha Sharma and Mahendra Singh Thalor
Discussion: https://postgr.es/m/688b0b7f-2f6c-d827-c27b-216a8e3ea700@2ndquadrant.com
2020-07-23 08:34:48 +05:30
Tom Lane a57d312a77 Support infinity and -infinity in the numeric data type.
Add infinities that behave the same as they do in the floating-point
data types.  Aside from any intrinsic usefulness these may have,
this closes an important gap in our ability to convert floating
values to numeric and/or replace float-based APIs with numeric.

The new values are represented by bit patterns that were formerly
not used (although old code probably would take them for NaNs).
So there shouldn't be any pg_upgrade hazard.

Patch by me, reviewed by Dean Rasheed and Andrew Gierth

Discussion: https://postgr.es/m/606717.1591924582@sss.pgh.pa.us
2020-07-22 19:19:44 -04:00
Thomas Munro a5073871ea Fix conversion table generator scripts.
convutils.pm used implicit conversion of undefined value to integer
zero.  Some of conversion scripts are susceptible to regexp greediness.
Fix, avoiding whitespace changes in the output.  Also update ICU URLs
that moved.

No need to back-patch, because the output of these scripts is also in
the source tree so we shouldn't need to rerun them on back-branches.

Author: Kyotaro Horiguchi <horikyoga.ntt@gmail.com>
Discussion: https://postgr.es/m/CA%2BhUKGJ7SEGLbj%3D%3DTQCcyKRA9aqj8%2B6L%3DexSq1y25TA%3DWxLziQ%40mail.gmail.com
2020-07-22 16:50:03 +12:00
Michael Paquier e47c2602aa Fix comment in sha2.h
An incorrect reference to SHA-1 was present.

Author: Daniel Gustafsson
Discussion: https://postgr.es/m/FE26C953-FA87-4BB9-9105-AA1F8705B0D0@yesql.se
2020-07-22 10:16:21 +09:00
Tom Lane bd0d893aa7 neqjoinsel must now pass through collation to eqjoinsel.
Since commit 044c99bc5, eqjoinsel passes the passed-in collation
to any operators it invokes.  However, neqjoinsel failed to pass
on whatever collation it got, so that if we invoked a
collation-dependent operator via that code path, we'd get "could not
determine which collation to use for string comparison" or the like.

Per report from Justin Pryzby.  Back-patch to v12, like the previous
commit.

Discussion: https://postgr.es/m/20200721191606.GL5748@telsasoft.com
2020-07-21 19:41:03 -04:00
Peter Geoghegan 4a70f829d8 Add nbtree Valgrind buffer lock checks.
Holding just a buffer pin (with no buffer lock) on an nbtree buffer/page
provides very weak guarantees, especially compared to heapam, where it's
often safe to read a page while only holding a buffer pin.  This commit
has Valgrind enforce the following rule: it is never okay to access an
nbtree buffer without holding both a pin and a lock on the buffer.

A draft version of this patch detected questionable code that was
cleaned up by commits fa7ff642 and 7154aa16.  The code in question used
to access an nbtree buffer page's special/opaque area with no buffer
lock (only a buffer pin).  This practice (which isn't obviously unsafe)
is hereby formally disallowed in nbtree.  There doesn't seem to be any
reason to allow it, and banning it keeps things simple for Valgrind.

The new checks are implemented by adding custom nbtree client requests
(located in LockBuffer() wrapper functions); these requests are
"superimposed" on top of the generic bufmgr.c Valgrind client requests
added by commit 1e0dfd16.  No custom resource management cleanup code is
needed to undo the effects of marking buffers as non-accessible under
this scheme.

Author: Peter Geoghegan
Reviewed-By: Anastasia Lubennikova, Georgios Kokolatos
Discussion: https://postgr.es/m/CAH2-WzkLgyN3zBvRZ1pkNJThC=xi_0gpWRUb_45eexLH1+k2_Q@mail.gmail.com
2020-07-21 15:50:58 -07:00
Tom Lane 670c0a1d47 Weaken type-OID-matching checks in array_recv and record_recv.
Rather than always insisting on an exact match of the type OID in the
data to the element type or column type we expect, complain only when
both OIDs fall within the manually-assigned range.  This acknowledges
the reality that user-defined types don't have stable OIDs, while
still preserving some of the mistake-detection value of the old test.

(It's not entirely clear whether to error if one OID is manually
assigned and the other isn't.  But perhaps that case could arise in
cross-version cases where a former extension type has been imported
into core, so I let it pass.)

This change allows us to remove the prohibition on binary transfer
of user-defined arrays and composites in the recently-landed support
for binary logical replication (commit 9de77b545).  We can just
unconditionally drop that check, since if the client has asked for
binary transfer it must be >= v14 and must have this change.

Discussion: https://postgr.es/m/CADK3HH+R3xMn=8t3Ct+uD+qJ1KD=Hbif5NFMJ+d5DkoCzp6Vgw@mail.gmail.com
2020-07-21 15:19:46 -04:00
Tom Lane fc032bed2f Be more careful about marking catalog columns NOT NULL by default.
The bug fixed in commit 72eab84a5 would not have occurred if initdb
had a less surprising rule about which columns should be marked
NOT NULL by default.  Let's make that rule be strictly that the
column must be fixed-width and its predecessors must be fixed-width
and NOT NULL, removing the hacky and unsafe exceptions for oidvector
and int2vector.

Since we do still want all existing oidvector and int2vector columns
to be marked NOT NULL, we have to put BKI_FORCE_NOT_NULL labels on
them.  But making this less magic and more documented seems like a
good idea, even if it's a shade more verbose.

I didn't bump catversion since the initial catalog contents are
not actually changed by this patch.  Note however that the
contents of postgres.bki do change, and feeding an old copy of
that to a new backend will produce wrong results.

Discussion: https://postgr.es/m/204760.1595181800@sss.pgh.pa.us
2020-07-21 13:03:48 -04:00
Tom Lane 3e66019f15 Assert that we don't insert nulls into attnotnull catalog columns.
The executor checks for this error, and so does the bootstrap catalog
loader, but we never checked for it in retail catalog manipulations.
The folly of that has now been exposed, so let's add assertions
checking it.  Checking in CatalogTupleInsert[WithInfo] and
CatalogTupleUpdate[WithInfo] should be enough to cover this.

Back-patch to v10; the aforesaid functions didn't exist before that,
and it didn't seem worth adapting the patch to the oldest branches.
But given the risk of JIT crashes, I think we certainly need this
as far back as v11.

Pre-v13, we have to explicitly exclude pg_subscription.subslotname
and pg_subscription_rel.srsublsn from the checks, since they are
mismarked.  (Even if we change our mind about applying BKI_FORCE_NULL
in the branch tips, it doesn't seem wise to have assertions that
would fire in existing databases.)

Discussion: https://postgr.es/m/298837.1595196283@sss.pgh.pa.us
2020-07-21 12:38:08 -04:00
Michael Paquier c273d9d8ce Rework tab completion of COPY and \copy in psql
This corrects and simplifies $subject in a number of ways:
- Remove from the completion the pre-9.0 grammar still supported for
compatibility purposes.  This simplifies the code, and allows to extend
it more easily with new patterns.
- Add completion for the options of FORMAT within a WITH clause.
- Complete WHERE and WITH clauses correctly depending on if TO or FROM
are used, WHERE being only available with COPY FROM.

Author: Vignesh C, Michael Paquier
Reviewed-by: Ahsan Hadi
Discussion: https://postgr.es/m/CALDaNm3zWr=OmxeNqOqfT=uZTSdam_j-gkX94CL8eTNfgUtf6A@mail.gmail.com
2020-07-21 12:05:07 +09:00
Tom Lane a4faef8f8f Fix some corner cases for window ranges with infinite offsets.
Many situations where the offset is infinity were not handled sanely.
We should generally allow the val versus base +/- offset comparison to
proceed according to the normal rules of IEEE arithmetic; however, we
must do something special for the corner cases where base +/- offset
would produce NaN due to subtracting two like-signed infinities.
That corresponds to asking which values infinitely precede +inf or
infinitely follow -inf, which should certainly be true of any finite
value or of the opposite-signed infinity.  After some discussion it
seems that the best decision is to make it true of the same-signed
infinity as well, ie, just return constant TRUE if the calculation
would produce a NaN.

(We could write this with a bit less code by subtracting anyway,
and then checking for a NaN result.  However, I prefer this
formulation because it'll be easier to transpose into numeric.c.)

Although this seems like clearly a bug fix with respect to finite
values, it is less obviously correct for infinite values.  Between
that and the fact that the whole issue only arises for very strange
window specifications (e.g. RANGE BETWEEN 'inf' PRECEDING AND 'inf'
PRECEDING), I'll desist from back-patching.

Noted by Dean Rasheed.

Discussion: https://postgr.es/m/3393130.1594925893@sss.pgh.pa.us
2020-07-20 22:03:18 -04:00
Tom Lane 4fb6aeb4f6 Make floating-point "NaN / 0" return NaN instead of raising an error.
This is more consistent with the IEEE 754 spec and our treatment of
NaNs elsewhere; in particular, the case has always acted that way in
"numeric" arithmetic.

Noted by Dean Rasheed.

Discussion: https://postgr.es/m/3421746.1594927785@sss.pgh.pa.us
2020-07-20 19:44:45 -04:00
Peter Geoghegan 6ca7cd89a2 Assert that buffer is pinned in LockBuffer().
Strengthen the LockBuffer() assertion that verifies BufferIsValid() by
making it verify BufferIsPinned() instead.  Do the same in nearby
related functions.

There is probably not much chance that anybody will try to lock a buffer
that is not already pinned, but we might as well make sure of that.
2020-07-20 16:03:38 -07:00
Tom Lane 0fa0b487b5 Correctly mark pg_subscription_rel.srsublsn as nullable.
The code has always set this column to NULL when it's not valid,
but the catalog header's description failed to reflect that,
as did the SGML docs, as did some of the code.  To prevent future
coding errors of the same ilk, let's hide the field from C code
as though it were variable-length (which, in a sense, it is).

As with commit 72eab84a5, we can only fix this cleanly in HEAD
and v13; the problem extends further back but we'll need some
klugery in the released branches.

Discussion: https://postgr.es/m/367660.1595202498@sss.pgh.pa.us
2020-07-20 14:55:56 -04:00
Tom Lane d5daae47db Fix construction of updated-columns bitmap in logical replication.
Commit b9c130a1f failed to apply the publisher-to-subscriber column
mapping while checking which columns were updated.  Perhaps less
significantly, it didn't exclude dropped columns either.  This could
result in an incorrect updated-columns bitmap and thus wrong decisions
about whether to fire column-specific triggers on the subscriber while
applying updates.  In HEAD (since commit 9de77b545), it could also
result in accesses off the end of the colstatus array, as detected by
buildfarm member skink.  Fix the logic, and adjust 003_constraints.pl
so that the problem is exposed in unpatched code.

In HEAD, also add some assertions to check that we don't access off
the ends of these newly variable-sized arrays.

Back-patch to v10, as b9c130a1f was.

Discussion: https://postgr.es/m/CAH2-Wz=79hKQ4++c5A060RYbjTHgiYTHz=fw6mptCtgghH2gJA@mail.gmail.com
2020-07-20 13:40:16 -04:00
Fujii Masao c3fe108c02 Rename wal_keep_segments to wal_keep_size.
max_slot_wal_keep_size that was added in v13 and wal_keep_segments are
the GUC parameters to specify how much WAL files to retain for
the standby servers. While max_slot_wal_keep_size accepts the number of
bytes of WAL files, wal_keep_segments accepts the number of WAL files.
This difference of setting units between those similar parameters could
be confusing to users.

To alleviate this situation, this commit renames wal_keep_segments to
wal_keep_size, and make users specify the WAL size in it instead of
the number of WAL files.

There was also the idea to rename max_slot_wal_keep_size to
max_slot_wal_keep_segments, in the discussion. But we have been moving
away from measuring in segments, for example, checkpoint_segments was
replaced by max_wal_size. So we concluded to rename wal_keep_segments
to wal_keep_size.

Back-patch to v13 where max_slot_wal_keep_size was added.

Author: Fujii Masao
Reviewed-by: Álvaro Herrera, Kyotaro Horiguchi, David Steele
Discussion: https://postgr.es/m/574b4ea3-e0f9-b175-ead2-ebea7faea855@oss.nttdata.com
2020-07-20 13:30:18 +09:00
Amit Kapila 0bead9af48 Immediately WAL-log subtransaction and top-level XID association.
The logical decoding infrastructure needs to know which top-level
transaction the subxact belongs to, in order to decode all the
changes. Until now that might be delayed until commit, due to the
caching (GPROC_MAX_CACHED_SUBXIDS), preventing features requiring
incremental decoding.

So we also write the assignment info into WAL immediately, as part
of the next WAL record (to minimize overhead) only when wal_level=logical.
We can not remove the existing XLOG_XACT_ASSIGNMENT WAL as that is
required for avoiding overflow in the hot standby snapshot.

Bump XLOG_PAGE_MAGIC, since this introduces XLR_BLOCK_ID_TOPLEVEL_XID.

Author: Tomas Vondra, Dilip Kumar, Amit Kapila
Reviewed-by: Amit Kapila
Tested-by: Neha Sharma and Mahendra Singh Thalor
Discussion: https://postgr.es/m/688b0b7f-2f6c-d827-c27b-216a8e3ea700@2ndquadrant.com
2020-07-20 08:48:26 +05:30
Fujii Masao d05b172a76 Add generic_plans and custom_plans fields into pg_prepared_statements.
There was no easy way to find how many times generic and custom plans
have been executed for a prepared statement. This commit exposes those
numbers of times in pg_prepared_statements view.

Author: Atsushi Torikoshi, Kyotaro Horiguchi
Reviewed-by: Tatsuro Yamada, Masahiro Ikeda, Fujii Masao
Discussion: https://postgr.es/m/CACZ0uYHZ4M=NZpofH6JuPHeX=__5xcDELF8hT8_2T+R55w4RQw@mail.gmail.com
2020-07-20 11:55:50 +09:00
Amit Kapila 044dc7b964 Fix minor typo in nodeIncrementalSort.c.
Author: Vignesh C
Reviewed-by: James Coleman
Backpatch-through: 13, where it was introduced
Discussion: https://postgr.es/m/CALDaNm0WjZqRvdeL59ZfYH0o4mLbKQ23jm-bnjXcFzgpANx55g@mail.gmail.com
2020-07-20 07:45:26 +05:30
Peter Geoghegan a766d6ca22 Avoid harmless Valgrind no-buffer-pin errors.
Valgrind builds with assertions enabled sometimes perform a
theoretically unsafe page access inside an assertion in
heapam_tuple_lock().  This happened when the eval-plan-qual isolation
test ran one of the permutations added by commit a2418f9e23.

Avoid complaints from Valgrind by moving the assertion ever so slightly.
This is minor cleanup for commit 1e0dfd16, which added Valgrind buffer
access instrumentation.

No backpatch, since this only happens within an assertion, and seems
very unlikely to cause any real problems even with assert-enabled
builds.
2020-07-19 16:12:51 -07:00
Peter Geoghegan 46ef520b95 Mark buffers as defined to Valgrind consistently.
Make PinBuffer() mark buffers as defined to Valgrind unconditionally,
including when the buffer header spinlock must be acquired.  Failure to
handle that case could lead to false positive reports from Valgrind.

This theoretically creates a risk that we'll mark buffers defined even
when external callers don't end up with a buffer pin.  That seems
perfectly acceptable, though, since in general we make no guarantees
about buffers that are unsafe to access being reliably marked as unsafe.

Oversight in commit 1e0dfd16, which added valgrind buffer access
instrumentation.
2020-07-19 09:46:44 -07:00
Tom Lane 72eab84a56 Correctly mark pg_subscription.subslotname as nullable.
Due to the layout of this catalog, subslotname has to be explicitly
marked BKI_FORCE_NULL, else initdb will default to the assumption
that it's non-nullable.  Since, in fact, CREATE/ALTER SUBSCRIPTION
will store null values there, the existing marking is just wrong,
and has been since this catalog was invented.

We haven't noticed because not much in the system actually depends
on attnotnull being truthful.  However, JIT'ed tuple deconstruction
does depend on that in some cases, allowing crashes or wrong answers
in queries that inspect pg_subscription.  Commit 9de77b545 quite
accidentally exposed this on the buildfarm members that force JIT
activation.

Back-patch to v13.  The problem goes further back, but we cannot
force initdb in released branches, so some klugier solution will
be needed there.  Before working on that, push this simple fix
to try to get the buildfarm back to green.

Discussion: https://postgr.es/m/4118109.1595096139@sss.pgh.pa.us
2020-07-19 12:37:23 -04:00
Peter Eisentraut 4d3db13621 Define OPENSSL_API_COMPAT
This avoids deprecation warnings from newer OpenSSL versions (3.0.0 in
particular).

Discussion: https://www.postgresql.org/message-id/flat/FEF81714-D479-4512-839B-C769D2605F8A%40yesql.se
2020-07-19 12:14:42 +02:00
Tom Lane 9b14280b20 Fix replication/worker_internal.h to compile without other headers.
This header hasn't changed recently, so the fact that it now fails
headerscheck/cpluspluscheck testing must be due to changes in what
it includes.  Probably f21916791 is to blame, but I didn't try to
verify that.

Discussion: https://postgr.es/m/3699703.1595016554@sss.pgh.pa.us
2020-07-18 14:58:18 -04:00
Tom Lane 9de77b5453 Allow logical replication to transfer data in binary format.
This patch adds a "binary" option to CREATE/ALTER SUBSCRIPTION.
When that's set, the publisher will send data using the data type's
typsend function if any, rather than typoutput.  This is generally
faster, if slightly less robust.

As committed, we won't try to transfer user-defined array or composite
types in binary, for fear that type OIDs won't match at the subscriber.
This might be changed later, but it seems like fit material for a
follow-on patch.

Dave Cramer, reviewed by Daniel Gustafsson, Petr Jelinek, and others;
adjusted some by me

Discussion: https://postgr.es/m/CADK3HH+R3xMn=8t3Ct+uD+qJ1KD=Hbif5NFMJ+d5DkoCzp6Vgw@mail.gmail.com
2020-07-18 12:44:51 -04:00
Amit Kapila f41fbee7e7 Adjust minor comment in reorderbuffer.c.
Author: Dave Cramer
Reviewed-by: David G. Johnston
Discussion: https://postgr.es/m/CADK3HHL8do4Fp1bsymgNasx375njV3AR7zY3UgYwzbL_Dx-n2Q@mail.gmail.com
2020-07-18 09:57:23 +05:30
Amit Kapila df7c5cb16e Fix comments in reorderbuffer.c.
Author: Dave Cramer
Reviewed-by: David G. Johnston
Discussion: https://postgr.es/m/CADK3HHL8do4Fp1bsymgNasx375njV3AR7zY3UgYwzbL_Dx-n2Q@mail.gmail.com
2020-07-18 09:47:38 +05:30