Commit Graph

35412 Commits

Author SHA1 Message Date
David Rowley f0705bb628 Add functions to calculate the next power of 2
There are many areas in the code where we need to determine the next
highest power of 2 of a given number.  We tend to always do that in an
ad-hoc way each time, generally with some tight for loop which performs a
bitshift left once per loop and goes until it finds a number above the
given number.

Here we add two generic functions which make use of the existing
pg_leftmost_one_pos* functions which, when available, will allow us to
calculate the next power of 2 without any looping.

Here we don't add any code which uses these new functions. That will be
done in follow-up commits.

Author: David Fetter, with some minor adjustments by me
Reviewed-by: John Naylor, Jesse Zhang
Discussion: https://postgr.es/m/20200114173553.GE32763%40fetter.org
2020-04-08 16:22:52 +12:00
Tom Lane 7a5d74b7dd Put back mistakenly removed #include.
In commit 4dbcb3f84 I removed some code from parse_coerce.c, and also
removed some apparently-no-longer-needed #includes.  But removing
datum.h broke some not-compiled-by-default code.

Discussion: https://postgr.es/m/20200407205436.pyjhddw5bn5upvsu@development
2020-04-08 00:10:16 -04:00
Alvaro Herrera 9e9abed746 Remove testing for precise LSN/reserved bytes in new TAP test
Trying to ensure that a slot's restart_lsn or amount of reserved bytes
exactly match some specific values seems unnecessary, and fragile as
shown by failures in multiple buildfarm members.

Discussion: https://postgr.es/m/20200407232602.GA21559@alvherre.pgsql
2020-04-07 23:28:27 -04:00
Thomas Munro 3985b600f5 Support PrefetchBuffer() in recovery.
Provide PrefetchSharedBuffer(), a variant that takes SMgrRelation, for
use in recovery.  Rename LocalPrefetchBuffer() to PrefetchLocalBuffer()
for consistency.

Add a return value to all of these.  In recovery, tolerate and report
missing files, so we can handle relations unlinked before crash recovery
began.  Also report cache hits and misses, so that callers can do faster
buffer lookups and better I/O accounting.

Reviewed-by: Alvaro Herrera <alvherre@2ndquadrant.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/CA%2BhUKGJ4VJN8ttxScUFM8dOKX0BrBiboo5uz1cq%3DAovOddfHpA%40mail.gmail.com
2020-04-08 14:56:57 +12:00
Tom Lane 981643dcdb Allow partitionwise join to handle nested FULL JOIN USING cases.
This case didn't work because columns merged by FULL JOIN USING are
represented in the parse tree by COALESCE expressions, and the logic
for recognizing a partitionable join failed to match upper-level join
clauses to such expressions.  To fix, synthesize suitable COALESCE
expressions and add them to the nullable_partexprs lists.  This is
pretty ugly and brute-force, but it gets the job done.  (I have
ambitions of rethinking the way outer-join output Vars are
represented, so maybe that will provide a cleaner solution someday.
For now, do this.)

Amit Langote, reviewed by Justin Pryzby, Richard Guo, and myself

Discussion: https://postgr.es/m/CA+HiwqG2WVUGmLJqtR0tPFhniO=H=9qQ+Z3L_ZC+Y3-EVQHFGg@mail.gmail.com
2020-04-07 22:12:14 -04:00
Etsuro Fujita c8434d64ce Allow partitionwise joins in more cases.
Previously, the partitionwise join technique only allowed partitionwise
join when input partitioned tables had exactly the same partition
bounds.  This commit extends the technique to some cases where the tables
have different partition bounds, using an advanced partition-matching
algorithm introduced by this commit.  The algorithm checks whether every
partition of each input partitioned table matches at most one partition
of the other input partitioned table.  In that case the join between the
tables can be broken down into joins between the matching partitions, so
the algorithm produces the pairs of matching partitions, plus the
partition bounds for the join relation, to allow partitionwise join for
computing the join.  Currently, the algorithm
works for list-partitioned and range-partitioned tables, but not
hash-partitioned tables.  See comments in partition_bounds_merge().

Ashutosh Bapat and Etsuro Fujita, most of regression tests by Rajkumar
Raghuwanshi, some of the tests by Mark Dilger and Amul Sul, reviewed by
Dmitry Dolgov and Amul Sul, with additional review at various points by
Ashutosh Bapat, Mark Dilger, Robert Haas, Antonin Houska, Amit Langote,
Justin Pryzby, and Tomas Vondra

Discussion: https://postgr.es/m/CAFjFpRdjQvaUEV5DJX3TW6pU5eq54NCkadtxHX2JiJG_GvbrCA@mail.gmail.com
2020-04-08 10:25:00 +09:00
Tom Lane 41a194f491 Fix circle_in to accept "(x,y),r" as it's advertised to do.
Our documentation describes four allowed input syntaxes for circles,
but the regression tests tried only three ... with predictable
consequences.  Remarkably, this has been wrong since the circle
datatype was added in 1997, but nobody noticed till now.

David Zhang, with some help from me

Discussion: https://postgr.es/m/332c47fa-d951-7574-b5cc-a8f7f7201202@highgo.ca
2020-04-07 20:50:28 -04:00
Andres Freund 75848bc744 snapshot scalability: Move delayChkpt from PGXACT to PGPROC.
The goal of separating hotly accessed per-backend data from PGPROC
into PGXACT is to make accesses fast (GetSnapshotData() in
particular). But delayChkpt is not actually accessed frequently; only
when starting a checkpoint. As it is frequently modified (multiple
times in the course of a single transaction), storing it in the same
cacheline as hotly accessed data unnecessarily dirties a contended
cacheline.

Therefore move delayChkpt to PGPROC.

This is part of a larger series of patches intending to improve
GetSnapshotData() scalability. It is committed and pushed separately,
as it is independently beneficial (small but measurable win, limited
by the other frequent modifications of PGXACT).

Author: Andres Freund
Reviewed-By: Robert Haas, Thomas Munro, David Rowley
Discussion: https://postgr.es/m/20200301083601.ews6hz5dduc3w2se@alap3.anarazel.de
2020-04-07 17:36:23 -07:00
Tomas Vondra 2b88fdde30 Track SLRU page hits in SimpleLruReadPage_ReadOnly
SLRU page hits were tracked only in SimpleLruReadPage, but that's not
enough because we may hit the page in SimpleLruReadPage_ReadOnly in
which case we don't call SimpleLruReadPage at all.

Reported-by: Kuntal Ghosh
Discussion: https://postgr.es/m/20200119143707.gyinppnigokesjok@development
2020-04-08 02:15:47 +02:00
Andres Freund 91c40548d5 Fix XLogReader FD leak that makes backends unusable after 2PC usage.
Before the fix every 2PC commit/abort leaked a file descriptor. As the
files are opened using BasicOpenFile(), that quickly leads to the
backend running out of file descriptors.

Once enough 2PC commits/aborts have leaked FDs, any I/O in the backend
will fail with "Too many open files", as BasicOpenFilePerm() will
already have caused all open files known to fd.c to be closed.

The leak causing the problem at hand is not a consequence of 0dc8ead46,
but is only exacerbated by it. Previously most XLogPageReadCB
callbacks used static variables to cache one open file, but after the
commit the cache is private to each XLogReader instance. There never
was infrastructure to close FDs at the time of XLogReaderFree, but the
way XLogReader was used limited the leak to one FD.

This commit just closes the FD during XLogReaderFree() if the FD is
stored in XLogReaderState.seg.ws_segno. This may not be the way to
solve this medium/long term, but at least unbreaks 2PC.

Discussion: https://postgr.es/m/20200406025651.fpzdb5yyb7qyhqko@alap3.anarazel.de
2020-04-07 17:03:04 -07:00
Alvaro Herrera 7e2ffb3885 Appease perlcritic
Food for the gods must always be found somehow, even when the land starves.
2020-04-07 19:09:55 -04:00
Peter Geoghegan 60cbd7751c Remove nbtree BTreeTupleSetAltHeapTID() function.
Since heap TID is supposed to be just another key attribute to the
implementation, it doesn't make much sense to have separate
BTreeTupleSetNAtts() and BTreeTupleSetAltHeapTID() functions.  Merge the
two functions together.  This slightly simplifies _bt_truncate().
2020-04-07 15:56:52 -07:00
Alvaro Herrera c655077639 Allow users to limit storage reserved by replication slots
Replication slots are useful to retain data that may be needed by a
replication system.  But experience has shown that allowing them to
retain excessive data can lead to the primary failing because of running
out of space.  This new feature allows the user to configure a maximum
amount of space to be reserved using the new option
max_slot_wal_keep_size.  Slots that overrun that space are invalidated
at checkpoint time, enabling the storage to be released.

Author: Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Reviewed-by: Jehan-Guillaume de Rorthais <jgdr@dalibo.com>
Reviewed-by: Álvaro Herrera <alvherre@alvh.no-ip.org>
Discussion: https://postgr.es/m/20170228.122736.123383594.horiguchi.kyotaro@lab.ntt.co.jp
2020-04-07 18:35:00 -04:00
Tom Lane b63c293bcb Allow psql's \g and \gx commands to transiently change \pset options.
We invented \gx to allow the "\pset expanded" flag to be forced on
for the duration of one command output, but that turns out to not
be nearly enough to satisfy the demand for variant output formats.
Hence, make it possible to change any pset option(s) for the duration
of a single command output, by writing "option=value ..." inside
parentheses, for example
	\g (format=csv csv_fieldsep='\t') somefile

\gx can now be understood as a shorthand for including expanded=on
inside the parentheses.

Patch by me, expanding on a proposal by Pavel Stehule

Discussion: https://postgr.es/m/CAFj8pRBx9OnBPRJVtfA5ycUpySge-XootAXAsv_4rrkHxJ8eRg@mail.gmail.com
2020-04-07 17:46:29 -04:00
Alexander Korotkov 0f5ca02f53 Implement waiting for given lsn at transaction start
This commit adds the following optional clause to the BEGIN and START
TRANSACTION commands.

  WAIT FOR LSN lsn [ TIMEOUT timeout ]

The new clause postpones the start of the transaction until the given LSN has
been applied on the standby.  This lets the user be sure that changes previously
made on the primary are visible on the standby.

A new shared memory struct is used to track the awaited LSN per backend.  The
recovery process wakes up a backend once its required LSN has been applied.

Author: Ivan Kartyshov, Anna Akenteva
Reviewed-by: Craig Ringer, Thomas Munro, Robert Haas, Kyotaro Horiguchi
Reviewed-by: Masahiko Sawada, Ants Aasma, Dmitry Ivanov, Simon Riggs
Reviewed-by: Amit Kapila, Alexander Korotkov
Discussion: https://postgr.es/m/0240c26c-9f84-30ea-fca9-93ab2df5f305%40postgrespro.ru
2020-04-07 23:51:10 +03:00
Alvaro Herrera 357889eb17 Support FETCH FIRST WITH TIES
WITH TIES is an option to the FETCH FIRST N ROWS clause (the SQL
standard's spelling of LIMIT), where you additionally get rows that
compare equal to the last of those N rows by the columns in the
mandatory ORDER BY clause.

There was a proposal by Andrew Gierth to implement this functionality in
a more powerful way that would yield more features, but the other patch
had not been finished at this time, so we decided to use this one for
now in the spirit of incremental development.

Author: Surafel Temesgen <surafel3000@gmail.com>
Reviewed-by: Álvaro Herrera <alvherre@alvh.no-ip.org>
Reviewed-by: Tomas Vondra <tomas.vondra@2ndquadrant.com>
Discussion: https://postgr.es/m/CALAY4q9ky7rD_A4vf=FVQvCGngm3LOes-ky0J6euMrg=_Se+ag@mail.gmail.com
Discussion: https://postgr.es/m/87o8wvz253.fsf@news-spur.riddles.org.uk
2020-04-07 16:22:13 -04:00
Tom Lane 26a944cf29 Adjust bytea get_bit/set_bit to use int8 not int4 for bit numbering.
Since the existing bit number argument can't exceed INT32_MAX, it's
not possible for these functions to manipulate bits beyond the first
256MB of a bytea value.  Lift that restriction by redeclaring the
bit number arguments as int8 (which requires a catversion bump,
hence is not back-patchable).

The similarly-named functions for bit/varbit don't really have a
problem because we restrict those types to at most VARBITMAXLEN bits;
hence leave them alone.

While here, extend the encode/decode functions in utils/adt/encode.c
to allow dealing with values wider than 1GB.  This is not a live bug
or restriction in current usage, because no input could be more than
1GB, and since none of the encoders can expand a string more than 4X,
the result size couldn't overflow uint32.  But it might be desirable
to support more in future, so make the input length values size_t
and the potential-output-length values uint64.

Also add some test cases to improve the miserable code coverage
of these functions.

Movead Li, editorialized some by me; also reviewed by Ashutosh Bapat

Discussion: https://postgr.es/m/20200312115135445367128@highgo.ca
2020-04-07 15:57:58 -04:00
Tomas Vondra 9c74ceb20b Remove debugging elog from pgstat_recv_resetslrucounter
Reported-by: Thomas Munro
2020-04-07 19:20:20 +02:00
Tomas Vondra d22782a539 Minor improvements in Incremental Sort explain
Some places still used "Maximum" instead of "Peak" when displaying info
about sort space, so fix that. Also, add a comment clarifying why it's
correct to check the number of full/prefix sort groups.

Author: James Coleman
Discussion: https://postgr.es/m/CAPpHfds1waRZ=NOmueYq0sx1ZSCnt+5QJvizT8ndT2=etZEeAQ@mail.gmail.com
2020-04-07 18:25:13 +02:00
Fujii Masao 4bd0ad9e44 Prevent archive recovery from scanning non-existent WAL files.
Previously when there were multiple timelines listed in the history file
of the recovery target timeline, archive recovery searched all of them,
starting from the newest timeline to the oldest one, to find the segment
to read.  That is, archive recovery kept failing to find the segment until it
reached the timeline that the segment belonged to.  These scans for non-existent
segments could hurt recovery performance, especially when the archival area was
located on remote storage and each scan could take a long time.

To address the issue, this commit changes archive recovery so that
it skips scanning the timeline that the segment to read doesn't belong to.

Author: Kyotaro Horiguchi, tweaked a bit by Fujii Masao
Reviewed-by: David Steele, Pavel Suderevsky, Grigory Smolkin
Discussion: https://postgr.es/m/16159-f5a34a3a04dc67e0@postgresql.org
Discussion: https://postgr.es/m/20200129.120222.1476610231001551715.horikyota.ntt@gmail.com
2020-04-08 00:49:29 +09:00
Tomas Vondra ba3e76cc57 Consider Incremental Sort paths at additional places
Commit d2d8a229bc introduced Incremental Sort, but it was considered
only in create_ordered_paths() as an alternative to regular Sort. There
are many other places that require sorted input and might benefit from
considering Incremental Sort too.

This patch modifies a number of those places, but not all. The concern
is that just adding Incremental Sort to any place that already adds
Sort may increase the number of paths considered, negatively affecting
planning time, without any benefit. So we've taken a more conservative
approach, based on analysis of which places do affect a set of queries
that did seem practical. This means some less common queries may not
benefit from Incremental Sort yet.

Author: Tomas Vondra
Reviewed-by: James Coleman
Discussion: https://postgr.es/m/CAPpHfds1waRZ=NOmueYq0sx1ZSCnt+5QJvizT8ndT2=etZEeAQ@mail.gmail.com
2020-04-07 16:43:22 +02:00
Tom Lane c7654f6a37 Fix representation of SORT_TYPE_STILL_IN_PROGRESS.
It turns out that the code did indeed rely on a zeroed
TuplesortInstrumentation.sortMethod field to indicate
"this worker never did anything", although it seems the
issue only comes up during certain race-condition-y cases.

Hence, rearrange the TuplesortMethod enum to restore
SORT_TYPE_STILL_IN_PROGRESS to having the value zero,
and add some comments reinforcing that that isn't optional.

Also future-proof a loop over the possible values of the enum.
sizeof(bits32) happened to be the correct limit value,
but only by purest coincidence.

Per buildfarm and local investigation.

Discussion: https://postgr.es/m/12222.1586223974@sss.pgh.pa.us
2020-04-06 22:22:13 -04:00
Thomas Munro 4c04be9b05 Introduce xid8-based functions to replace txid_XXX.
The txid_XXX family of fmgr functions exposes 64 bit transaction IDs to
users as int8.  Now that we have an SQL type xid8 for FullTransactionId,
define a new set of functions including pg_current_xact_id() and
pg_current_snapshot() based on that.  Keep the old functions around too,
for now.

It's a bit sneaky to use the same C functions for both, but since the
binary representation is identical except for the signedness of the
type, and since older functions are the ones using the wrong signedness,
and since we'll presumably drop the older ones after a reasonable period
of time, it seems reasonable to switch to FullTransactionId internally
and share the code for both.

Reviewed-by: Fujii Masao <masao.fujii@oss.nttdata.com>
Reviewed-by: Takao Fujii <btfujiitkp@oss.nttdata.com>
Reviewed-by: Yoshikazu Imai <imai.yoshikazu@fujitsu.com>
Reviewed-by: Mark Dilger <mark.dilger@enterprisedb.com>
Discussion: https://postgr.es/m/20190725000636.666m5mad25wfbrri%40alap3.anarazel.de
2020-04-07 12:04:32 +12:00
Thomas Munro aeec457de8 Add SQL type xid8 to expose FullTransactionId to users.
Similar to xid, but 64 bits wide.  This new type is suitable for use in
various system views and administration functions.

Reviewed-by: Fujii Masao <masao.fujii@oss.nttdata.com>
Reviewed-by: Takao Fujii <btfujiitkp@oss.nttdata.com>
Reviewed-by: Yoshikazu Imai <imai.yoshikazu@fujitsu.com>
Reviewed-by: Mark Dilger <mark.dilger@enterprisedb.com>
Discussion: https://postgr.es/m/20190725000636.666m5mad25wfbrri%40alap3.anarazel.de
2020-04-07 12:03:59 +12:00
Tomas Vondra 4bea576b03 Use INT64_FORMAT when formatting int64 values in explain
Per report from lapwing.
2020-04-07 01:16:57 +02:00
Tomas Vondra 23ba3b5ee2 Fix failures in incremental_sort due to number of workers
The last test in incremental_sort suite prints a parallel plan, but some
of the buildfarm animals have custom max_parallel_workers_per_gather
values, causing failures. Fixed by setting the GUC to an explicit value.

Discussion: https://postgr.es/m/CAPpHfds1waRZ=NOmueYq0sx1ZSCnt+5QJvizT8ndT2=etZEeAQ@mail.gmail.com
2020-04-07 00:02:07 +02:00
Peter Geoghegan ce2cee0ade Fix nbtree kill_prior_tuple posting list assert.
An assertion added by commit 0d861bbb checked that _bt_killitems() only
processes a BTScanPosItem whose heap TID is contained in a posting list
tuple when its page offset number still matches what is on the page
(i.e. when it matches the posting list tuple's current offset number).
This was only correct in the common case where the page can't have
changed since we first read it.  It was not correct in cases where we
don't drop the buffer pin (and don't need to verify the page hasn't
changed using its LSN).  The latter category includes scans involving
unlogged tables, and scans that use a non-MVCC snapshot, per the logic
originally introduced by commit 2ed5b87f.

The assertion still seems helpful.  Fix it by taking cases where the
page may have been concurrently modified into account.

Reported-By: Anastasia Lubennikova, Alexander Lakhin
Discussion: https://postgr.es/m/c4e38e9a-0f9c-8e53-e639-adf343f94472@postgrespro.ru
2020-04-06 14:46:33 -07:00
Tomas Vondra 7d6d82a524 Fix show_incremental_sort_info with force_parallel_mode
When executed with force_parallel_mode=regress, the function was exiting
too early and thus failed to print the worker stats. Fixed by making it
more like show_sort_info.

Discussion: https://postgr.es/m/CAPpHfds1waRZ=NOmueYq0sx1ZSCnt+5QJvizT8ndT2=etZEeAQ@mail.gmail.com
2020-04-06 23:19:13 +02:00
Tomas Vondra d2d8a229bc Implement Incremental Sort
Incremental Sort is an optimized variant of multikey sort for cases when
the input is already sorted by a prefix of the requested sort keys. For
example when the relation is already sorted by (key1, key2) and we need
to sort it by (key1, key2, key3) we can simply split the input rows into
groups having equal values in (key1, key2), and only sort/compare the
remaining column key3.

This has a number of benefits:

- Reduced memory consumption, because only a single group (determined by
  values in the sorted prefix) needs to be kept in memory. This may also
  eliminate the need to spill to disk.

- Lower startup cost, because Incremental Sort produces results after each
  prefix group, which is beneficial for plans where startup cost matters
  (for example, queries with a LIMIT clause).

We consider both Sort and Incremental Sort, and decide based on costing.

The implemented algorithm operates in two different modes:

- Fetching a minimum number of tuples without check of equality on the
  prefix keys, and sorting on all columns when safe.

- Fetching all tuples for a single prefix group and then sorting by
  comparing only the remaining (non-prefix) keys.

We always start in the first mode, and employ a heuristic to switch into
the second mode if we believe it's beneficial - the goal is to minimize
the number of unnecessary comparisons while keeping memory consumption
below work_mem.

This is a very old patch series. The idea was originally proposed by
Alexander Korotkov back in 2013, and then revived in 2017. In 2018 the
patch was taken over by James Coleman, who wrote and rewrote most of the
current code.

There were many reviewers/contributors since 2013 - I've done my best to
pick the most active ones, and listed them in this commit message.

Author: James Coleman, Alexander Korotkov
Reviewed-by: Tomas Vondra, Andreas Karlsson, Marti Raudsepp, Peter Geoghegan, Robert Haas, Thomas Munro, Antonin Houska, Andres Freund, Alexander Kuzmenkov
Discussion: https://postgr.es/m/CAPpHfdscOX5an71nHd8WSUH6GNOCf=V7wgDaTXdDd9=goN-gfA@mail.gmail.com
Discussion: https://postgr.es/m/CAPpHfds1waRZ=NOmueYq0sx1ZSCnt+5QJvizT8ndT2=etZEeAQ@mail.gmail.com
2020-04-06 21:35:10 +02:00
Tom Lane 3c8553547b Re-stabilize infinite_recurse() test case.
Since commit 8f59f6b9c0, CLOBBER_CACHE_ALWAYS buildfarm members have
been failing this test case because the error message now sometimes
includes an error cursor position.  It seems largely just luck that
that never happened before, and there are likely to be more ways it
could happen in future.  Hence, rather than trying to prevent it,
adjust the test script to suppress that component of the report.

At some point we might need to back-patch this, but refrain until
there's a demonstrated need.  (We'd need a different fix before v12,
anyway, since VERBOSITY=sqlstate is a recent thing.)

Tom Lane and Andres Freund

Discussion: https://postgr.es/m/30675.1586111599@sss.pgh.pa.us
2020-04-06 12:00:37 -04:00
Peter Eisentraut f1ac27bfda Add logical replication support to replicate into partitioned tables
Mainly, this adds support code in logical/worker.c for applying
replicated operations whose target is a partitioned table to its
relevant partitions.

Author: Amit Langote <amitlangote09@gmail.com>
Reviewed-by: Rafia Sabih <rafia.pghackers@gmail.com>
Reviewed-by: Peter Eisentraut <peter.eisentraut@2ndquadrant.com>
Reviewed-by: Petr Jelinek <petr@2ndquadrant.com>
Discussion: https://www.postgresql.org/message-id/flat/CA+HiwqH=Y85vRK3mOdjEkqFK+E=ST=eQiHdpj43L=_eJMOOznQ@mail.gmail.com
2020-04-06 15:15:52 +02:00
Amit Kapila b7ce6de93b Allow autovacuum to log WAL usage statistics.
This commit allows autovacuum to log WAL usage statistics added by commit
df3b181499.

Author: Julien Rouhaud
Reviewed-by: Dilip Kumar and Amit Kapila
Discussion: https://postgr.es/m/CAB-hujrP8ZfUkvL5OYETipQwA=e3n7oqHFU=4ZLxWS_Cza3kQQ@mail.gmail.com
2020-04-06 16:24:51 +05:30
Michael Paquier 8ef9451f58 Refactor cluster.c to use new routine get_index_isclustered()
This new cache lookup routine has been introduced in a40caf5, and more
code paths can directly use it.

Note that in cluster_rel(), the code was returning immediately if the
tuple's entry in pg_index for the clustered index was not valid.  This
commit changes the code so as a lookup error is raised instead,
something that could not happen from the start as we check for the
existence of the index beforehand, while holding an exclusive lock on
the parent table.

Author: Justin Pryzby
Reviewed-by: Álvaro Herrera, Michael Paquier
Discussion: https://postgr.es/m/20200202161718.GI13621@telsasoft.com
2020-04-06 11:44:23 +09:00
Amit Kapila 33e05f89c5 Add the option to report WAL usage in EXPLAIN and auto_explain.
This commit adds a new option WAL similar to existing option BUFFERS in the
EXPLAIN command.  This option includes information on WAL record
generation, tracked since commit df3b181499, in the EXPLAIN output.

This also allows the WAL usage information to be displayed via
the auto_explain module.  A new parameter auto_explain.log_wal controls
whether WAL usage statistics are printed when an execution plan is logged.
This parameter has no effect unless auto_explain.log_analyze is enabled.

Author: Julien Rouhaud
Reviewed-by: Dilip Kumar and Amit Kapila
Discussion: https://postgr.es/m/CAB-hujrP8ZfUkvL5OYETipQwA=e3n7oqHFU=4ZLxWS_Cza3kQQ@mail.gmail.com
2020-04-06 08:02:15 +05:30
Michael Paquier a40caf5f86 Preserve clustered index after rewrites with ALTER TABLE
A table rewritten by ALTER TABLE would lose tracking of an index usable
for CLUSTER.  This setting is tracked by pg_index.indisclustered and is
controlled by ALTER TABLE, so some extra work was needed to restore it
properly.  Note that ALTER TABLE only marks the index that can be used
for clustering, and does not do the actual operation.

Author: Amit Langote, Justin Pryzby
Reviewed-by: Ibrar Ahmed, Michael Paquier
Discussion: https://postgr.es/m/20200202161718.GI13621@telsasoft.com
Backpatch-through: 9.5
2020-04-06 11:03:49 +09:00
Andres Freund fc3f4453a2 Recompute stack base in forked postmaster children.
This is for the benefit of running postgres under the rr
debugger.  Under rr, signal handlers that run while a syscall is
active use an alternative stack.  As e.g. bgworkers are started from
within signal handlers, the forked backend then has a different stack
base than the postmaster.  Previously that led to those processes
triggering spurious "stack depth limit exceeded" errors.

Discussion: https://postgr.es/m/20200327182217.ubrrl32lyfhxfwk5@alap3.anarazel.de
2020-04-05 18:23:30 -07:00
Andres Freund f946069e68 Use TransactionXmin instead of RecentGlobalXmin in heap_abort_speculative().
There's a very low risk that RecentGlobalXmin could be far enough in
the past to be older than relfrozenxid, or even wrapped
around. Luckily the consequences of that having happened wouldn't be
too bad - the page wouldn't be pruned for a while.

Avoid that risk by using TransactionXmin instead. As that's announced
via MyPgXact->xmin, it is protected against wrapping around (see code
comments for details around relfrozenxid).

Author: Andres Freund
Discussion: https://postgr.es/m/20200328213023.s4eyijhdosuc4vcj@alap3.anarazel.de
Backpatch: 9.5-
2020-04-05 17:47:30 -07:00
Andres Freund 549a3e23c3 Fix recently introduced typo.
Reported-By: David Rowley
2020-04-05 12:03:09 -07:00
Peter Eisentraut a9d9bdd3ad Save errno across LWLockRelease() calls
Fixup for "Drop slot's LWLock before returning from SaveSlotToPath()"

Reported-by: Michael Paquier <michael@paquier.xyz>
2020-04-05 10:02:00 +02:00
Tom Lane 18d85e9b8a Further improve stability fix for partition_aggregate test.
Commit 7cb0a423f overlooked that the multi-level partition test table
pagg_tab_ml still had an exactly even row split at its upper level of
partitioning, so that some of the sub-aggregation plan steps still had
exactly equal costs, leading to plan instability.  Tweak the partition
boundaries some more to make the row distribution unequal at both
levels.  This leads to more changes in the "expected" plan order than
the previous round, but it seems fine.  (Actually, I'm surprised that
this didn't affect even more plans in this test: looking at the
underlying costs shows that some of the parallel plan groups are
*not* getting sorted by cost.  Bug?)

Per buildfarm member lousyjack,
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=lousyjack&dt=2020-04-04%2021%3A03%3A04

Discussion: https://postgr.es/m/24467.1585838693@sss.pgh.pa.us
2020-04-05 00:53:28 -04:00
Noah Misch 70de4e950c Add perl2host call missing from a new test file.
Oversight in today's commit c6b92041d3.
Per buildfarm member jacana.

Discussion: http://postgr.es/m/20200404223212.GC3442685@rfd.leadboat.com
2020-04-04 15:45:45 -07:00
Tom Lane 07871d40c7 Remove bogus Assert, add some regression test cases showing why.
Commit 77ec5affb added an assertion to enforce_generic_type_consistency
that boils down to "if the function result is polymorphic, there must be
at least one polymorphic argument".  This should be true for user-created
functions, but there are built-in functions for which it's not true, as
pointed out by Jaime Casanova.  Hence, go back to the old behavior of
leaving the return type alone.  There's only a limited amount of stuff
you can do with such a function result, but it does work to some extent;
add some regression test cases to ensure we don't break that again.

Discussion: https://postgr.es/m/CAJGNTeMbhtsCUZgJJ8h8XxAJbK7U2ipsX8wkHRtZRz-NieT8RA@mail.gmail.com
2020-04-04 18:03:30 -04:00
Noah Misch c6b92041d3 Skip WAL for new relfilenodes, under wal_level=minimal.
Until now, only selected bulk operations (e.g. COPY) did this.  If a
given relfilenode received both a WAL-skipping COPY and a WAL-logged
operation (e.g. INSERT), recovery could lose tuples from the COPY.  See
src/backend/access/transam/README section "Skipping WAL for New
RelFileNode" for the new coding rules.  Maintainers of table access
methods should examine that section.

To maintain data durability, just before commit, we choose between an
fsync of the relfilenode and copying its contents to WAL.  A new GUC,
wal_skip_threshold, guides that choice.  If this change slows a workload
that creates small, permanent relfilenodes under wal_level=minimal, try
adjusting wal_skip_threshold.  Users setting a timeout on COMMIT may
need to adjust that timeout, and log_min_duration_statement analysis
will reflect time consumption moving to COMMIT from commands like COPY.

Internally, this requires a reliable determination of whether
RollbackAndReleaseCurrentSubTransaction() would unlink a relation's
current relfilenode.  Introduce rd_firstRelfilenodeSubid.  Amend the
specification of rd_createSubid such that the field is zero when a new
rel has an old rd_node.  Make relcache.c retain entries for certain
dropped relations until end of transaction.

Bump XLOG_PAGE_MAGIC, since this introduces XLOG_GIST_ASSIGN_LSN.
Future servers accept older WAL, so this bump is discretionary.

Kyotaro Horiguchi, reviewed (in earlier, similar versions) by Robert
Haas.  Heikki Linnakangas and Michael Paquier implemented earlier
designs that materially clarified the problem.  Reviewed, in earlier
designs, by Andrew Dunstan, Andres Freund, Alvaro Herrera, Tom Lane,
Fujii Masao, and Simon Riggs.  Reported by Martijn van Oosterhout.

Discussion: https://postgr.es/m/20150702220524.GA9392@svana.org
2020-04-04 12:25:34 -07:00
Peter Eisentraut 552fcebff0 Revert "Improve handling of parameter differences in physical replication"
This reverts commit 246f136e76.

That patch wasn't quite complete enough.

Discussion: https://www.postgresql.org/message-id/flat/E1jIpJu-0007Ql-CL%40gemulon.postgresql.org
2020-04-04 09:08:12 +02:00
Amit Kapila df3b181499 Add infrastructure to track WAL usage.
This allows gathering the WAL generation statistics for each statement
execution.  The three statistics that we collect are the number of WAL
records, the number of full page writes and the amount of WAL bytes
generated.

This helps users with write-intensive workloads see the impact
of I/O due to WAL.  This further enables us to see approximately what
percentage of overall WAL is due to full page writes.

In the future, we can extend this functionality to allow us to compute
the exact amount of WAL data due to full page writes.

This patch in itself is just an infrastructure to compute WAL usage data.
The upcoming patches will expose this data via explain, auto_explain,
pg_stat_statements and verbose (auto)vacuum output.

Author: Kirill Bychik, Julien Rouhaud
Reviewed-by: Dilip Kumar, Fujii Masao and Amit Kapila
Discussion: https://postgr.es/m/CAB-hujrP8ZfUkvL5OYETipQwA=e3n7oqHFU=4ZLxWS_Cza3kQQ@mail.gmail.com
2020-04-04 10:02:08 +05:30
Jeff Davis 0588ee63aa Include chunk overhead in hash table entry size estimate.
Don't try to be precise about it, just use a constant 16 bytes of
chunk overhead. Being smarter would require knowing the memory context
where the chunk will be allocated, which is not known by all callers.
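
A sketch of the resulting estimate: the aligned payload plus a flat 16-byte chunk overhead. The 8-byte alignment and the helper name estimate_entry_size are assumptions for illustration.

```c
#include <stddef.h>

/* Assumption: 8-byte maximum alignment, standing in for MAXALIGN(). */
#define SKETCH_MAXALIGN(len) (((size_t) (len) + 7) & ~((size_t) 7))

/* Flat per-chunk allocator overhead, as described in the commit message. */
#define SKETCH_CHUNK_OVERHEAD 16

/* Hypothetical helper: estimated memory for one hash table entry. */
static size_t
estimate_entry_size(size_t payload_bytes)
{
    return SKETCH_MAXALIGN(payload_bytes) + SKETCH_CHUNK_OVERHEAD;
}
```

A constant keeps the estimate independent of which memory context (and therefore which allocator) will hold the chunk.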

Discussion: https://postgr.es/m/20200325220936.il3ni2fj2j2b45y5@alap3.anarazel.de
2020-04-03 20:07:58 -07:00
Robert Haas 3e0d80fd8d Fix resource management bug with replication=database.
Commit 0d8c9c1210 allowed BASE_BACKUP to
acquire a ResourceOwner without a transaction so that the backup
manifest functionality could use a BufFile, but it overlooked the fact
that when a walsender is used with replication=database, it might have
a transaction in progress, because in that mode, SQL and replication
commands can be mixed.  Try to fix things up so that the two cleanup
mechanisms don't conflict.

Per buildfarm member serinus, which triggered the problem when
CREATE_REPLICATION_SLOT failed from inside a transaction.  It passed
on the subsequent run, so evidently the failure doesn't happen every
time.
2020-04-03 22:28:37 -04:00
Robert Haas db1531cae0 Be more careful about time_t vs. pg_time_t in basebackup.c.
lapwing is complaining about a call to pg_gmtime, saying that
it "expected 'const pg_time_t *' but argument is of type 'time_t *'".
I at first thought that the problem had something to do with const,
but Thomas Munro suggested that it might be just because time_t
and pg_time_t are different identifiers. lapwing is i686 rather than
x86_64, and pg_time_t is always int64, so that seems like a good
guess.

There is other code that just casts time_t to pg_time_t without
any conversion function, so try that approach here.
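
A sketch of the cast approach: copy the time_t value into a variable of the 64-bit type and pass that variable's address, rather than passing a time_t pointer. sketch_pg_time_t and seconds_since_epoch are hypothetical stand-ins for pg_time_t and pg_gmtime().

```c
#include <stdint.h>
#include <time.h>

/* Stand-in for pg_time_t, which the commit notes is always int64. */
typedef int64_t sketch_pg_time_t;

/* Hypothetical API that, like pg_gmtime(), takes a pointer to the 64-bit type. */
static int64_t
seconds_since_epoch(const sketch_pg_time_t *t)
{
    return *t;
}

static int64_t
stamp_now(time_t now)
{
    /*
     * On i686, time_t and pg_time_t can differ, so &now has the wrong
     * pointee type.  Cast the value into a pg_time_t variable instead.
     */
    sketch_pg_time_t pgnow = (sketch_pg_time_t) now;

    return seconds_since_epoch(&pgnow);
}
```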

Introduced in commit 0d8c9c1210.
2020-04-03 20:18:47 -04:00
Robert Haas 9f8f881caa pg_validatebackup: Fix 'make clean' to remove tmp_check.
Report by Tom Lane.

Discussion: http://postgr.es/m/22394.1585951968@sss.pgh.pa.us
2020-04-03 19:51:18 -04:00
Robert Haas 19c0422ad0 pg_validatebackup: Adjust TAP tests to undo permissions change.
It may be necessary to go further and remove this test altogether,
but I'm going to try this fix first. It's not clear, at least to
me, exactly how this is breaking buildfarm members, but it appears
to be doing so.
2020-04-03 19:01:59 -04:00
Robert Haas 460314db08 pg_validatebackup: Also use perl2host in TAP tests.
Second try at getting the buildfarm to be happy with 003_corruption.pl
as added by commit 0d8c9c1210.

Per suggestion from Álvaro Herrera.

Discussion: http://postgr.es/m/20200403205412.GA8279@alvherre.pgsql
2020-04-03 17:18:23 -04:00
Tom Lane 0568e7a2a4 Cosmetic improvements for code related to partitionwise join.
Move have_partkey_equi_join and match_expr_to_partition_keys to
relnode.c, since they're used only there.  Refactor
build_joinrel_partition_info to split out the code that fills the
joinrel's partition key lists; this doesn't have any non-cosmetic
impact, but it seems like a useful separation of concerns.
Improve assorted nearby comments.

Amit Langote, with a little further editorialization by me

Discussion: https://postgr.es/m/CA+HiwqG2WVUGmLJqtR0tPFhniO=H=9qQ+Z3L_ZC+Y3-EVQHFGg@mail.gmail.com
2020-04-03 17:00:35 -04:00
Robert Haas 21dc48840c pg_validatebackup: Use tempdir_short in TAP tests.
The buildfarm is very unhappy right now because TAP test
003_corruption.pl uses TestLib::tempdir to generate the name of
a temporary directory that is used as a tablespace name, and
this results in a 'symbolic link target too long' error message
on many of the buildfarm machines, but not on my machine.

It appears that other people have run into similar problems in
the past and that TestLib::tempdir_short was the solution, so
let's try using that instead.
2020-04-03 15:40:35 -04:00
Robert Haas 87e3004340 pg_validatebackup: Adjust TAP tests to placate perlcritic.
It seems that we have a policy that every Perl subroutine should
end with an explicit "return", so add explicit "return"
statements to all the new subroutines added by my prior
commit 0d8c9c1210.

Per buildfarm.
2020-04-03 15:28:59 -04:00
Robert Haas 0d8c9c1210 Generate backup manifests for base backups, and validate them.
A manifest is a JSON document which includes (1) the file name, size,
last modification time, and an optional checksum for each file backed
up, (2) timelines and LSNs for whatever WAL will need to be replayed
to make the backup consistent, and (3) a checksum for the manifest
itself. By default, we use CRC-32C when checksumming data files,
because we are trying to detect corruption and user error, not foil an
adversary. However, pg_basebackup and the server-side BASE_BACKUP
command now have options to select a different algorithm, so users
wanting a cryptographic hash function can select SHA-224, SHA-256,
SHA-384, or SHA-512. Users not wanting file checksums at all can
disable them, or disable generation of the backup manifest altogether.
Using a cryptographic hash function in place of CRC-32C consumes
significantly more CPU cycles, which may slow down backups in some
cases.
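
For reference, CRC-32C is the Castagnoli CRC: reflected polynomial 0x82F63B78, initial value and final XOR 0xFFFFFFFF. The bitwise sketch below is a compact illustration of the algorithm, not the server's implementation, which may use lookup tables or hardware CRC instructions.

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32C: simple and portable, but slow compared to table-driven code. */
static uint32_t
crc32c(const void *data, size_t len)
{
    const unsigned char *p = data;
    uint32_t crc = 0xFFFFFFFFu;

    while (len--)
    {
        crc ^= *p++;
        for (int k = 0; k < 8; k++)
            /* Shift right; XOR in the polynomial when the low bit was set. */
            crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
    }
    return crc ^ 0xFFFFFFFFu;
}
```

The standard check value for CRC-32C over the ASCII string "123456789" is 0xE3069283.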

A new tool called pg_validatebackup can validate a backup against the
manifest. If no checksums are present, it can still check that the
right files exist and that they have the expected sizes. If checksums
are present, it can also verify that each file has the expected
checksum. Additionally, it calls pg_waldump to verify that the
expected WAL files are present and parseable. Only plain format
backups can be validated directly, but tar format backups can be
validated after extracting them.

Robert Haas, with help, ideas, review, and testing from David Steele,
Stephen Frost, Andrew Dunstan, Rushabh Lathia, Suraj Kharage, Tushar
Ahuja, Rajkumar Raghuwanshi, Mark Dilger, Davinder Singh, Jeevan
Chalke, Amit Kapila, Andres Freund, and Noah Misch.

Discussion: http://postgr.es/m/CA+TgmoZV8dw1H2bzZ9xkKwdrk8+XYa+DC9H=F7heO2zna5T6qg@mail.gmail.com
2020-04-03 15:05:59 -04:00
Fujii Masao ce77abe63c Include information on buffer usage during planning phase, in EXPLAIN output, take two.
When the BUFFERS option is enabled, EXPLAIN includes information on
buffer usage for each plan node in its output. In addition, this
commit makes EXPLAIN also include information on buffer usage during
the planning phase. This feature makes it easier to identify cases
where a lot of buffer access happens during planning.

This commit revives the original commit ed7a509571 that was reverted by
commit 19db23bcbd. The original commit had to be reverted because
it caused the regression test failure on the buildfarm members prion and
dory. But since commit c0885c4c30 got rid of the cause of the test failure,
the original commit can be safely introduced again.

Author: Julien Rouhaud, slightly revised by Fujii Masao
Reviewed-by: Justin Pryzby
Discussion: https://postgr.es/m/16109-26a1a88651e90608@postgresql.org
2020-04-04 03:13:17 +09:00
Tom Lane e41955faf0 Fix bugs in gin_fuzzy_search_limit processing.
entryGetItem()'s three code paths each contained bugs associated
with filtering the entries for gin_fuzzy_search_limit.

The posting-tree path failed to advance "advancePast" after having
decided to filter an item.  If we ran out of items on the current
page and needed to advance to the next, what would actually happen
is that entryLoadMoreItems() would re-load the same page.  Eventually,
the random dropItem() test would accept one of the same items it'd
previously rejected, and we'd move on --- but it could take a while
with small gin_fuzzy_search_limit.  To add insult to injury, this
case would inevitably cause entryLoadMoreItems() to decide it needed
to re-descend from the root, making things even slower.

The posting-list path failed to implement gin_fuzzy_search_limit
filtering at all, so that all entries in the posting list would
be returned.

The bitmap-result path used a "gotitem" variable that it failed to
update in the one place where it'd actually make a difference, ie
at the one "continue" statement.  I think this was unreachable in
practice, because if we'd looped around then it shouldn't be the
case that the entries on the new page are before advancePast.
Still, the "gotitem" variable was contributing nothing to either
clarity or correctness, so get rid of it.

Refactor all three loops so that the termination conditions are
more alike and less unreadable.

The code coverage report showed that we had no coverage at all for
the re-descend-from-root code path in entryLoadMoreItems(), which
seems like a very bad thing, so add a test case that exercises it.
We also had exactly no coverage for gin_fuzzy_search_limit, so add a
simplistic test case that at least hits those code paths a little bit.

Back-patch to all supported branches.

Adé Heyward and Tom Lane

Discussion: https://postgr.es/m/CAEknJCdS-dE1Heddptm7ay2xTbSeADbkaQ8bU2AXRCVC2LdtKQ@mail.gmail.com
2020-04-03 13:15:45 -04:00
Fujii Masao c0885c4c30 Improve stability of explain regression test.
The explain regression test runs EXPLAIN commands via the function
that filters unstable outputs. To produce more stable test output,
this commit improves the function so that it also filters out text-mode
Buffers lines. This is necessary because text-mode Buffers lines vary
depending on the system state.

This improvement will get rid of the regression test failure that
the commit ed7a509571 caused on the buildfarm members prion and
dory because of the instability of Buffers lines.

Author: Fujii Masao
Reviewed-by: Tom Lane
Discussion: https://postgr.es/m/20200403025751.GB1759@paquier.xyz
2020-04-04 01:26:39 +09:00
Robert Haas 3031440e98 pg_waldump: Don't call XLogDumpDisplayStats() if -q is specified.
Commit ac44367efb introduced this
problem.

Report and fix by Fujii Masao.

Discussion: http://postgr.es/m/d332b8f0-0c72-3cd6-6945-7a86a503662a@oss.nttdata.com
2020-04-03 11:58:58 -04:00
Robert Haas c12e43a2e0 Add checksum helper functions.
These functions make it easier to write code that wants to compute a
checksum for some data while allowing the user to configure the type
of checksum that gets used.
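
A sketch of the user-configurable part: map an algorithm name to an enum value and report the digest length for each type. The names come from the manifest commit (NONE, CRC32C, SHA-224/256/384/512); the enum, the case-insensitive matching, and both functions are hypothetical, not the helpers this commit adds.

```c
#include <stdbool.h>
#include <stddef.h>
#include <strings.h>    /* strcasecmp() (POSIX) */

/* Hypothetical enum covering the algorithms named for backup manifests. */
typedef enum
{
    CKSUM_NONE,
    CKSUM_CRC32C,
    CKSUM_SHA224,
    CKSUM_SHA256,
    CKSUM_SHA384,
    CKSUM_SHA512
} ChecksumType;

static const struct
{
    const char *name;
    ChecksumType type;
} checksum_names[] = {
    {"NONE", CKSUM_NONE}, {"CRC32C", CKSUM_CRC32C},
    {"SHA224", CKSUM_SHA224}, {"SHA256", CKSUM_SHA256},
    {"SHA384", CKSUM_SHA384}, {"SHA512", CKSUM_SHA512},
};

/* Map a user-supplied name (case-insensitively) to a checksum type. */
static bool
parse_checksum_type(const char *name, ChecksumType *type)
{
    for (size_t i = 0; i < sizeof(checksum_names) / sizeof(checksum_names[0]); i++)
    {
        if (strcasecmp(name, checksum_names[i].name) == 0)
        {
            *type = checksum_names[i].type;
            return true;
        }
    }
    return false;
}

/* Digest length in bytes; callers can size output buffers from this. */
static int
checksum_length(ChecksumType type)
{
    switch (type)
    {
        case CKSUM_NONE:   return 0;
        case CKSUM_CRC32C: return 4;
        case CKSUM_SHA224: return 28;
        case CKSUM_SHA256: return 32;
        case CKSUM_SHA384: return 48;
        case CKSUM_SHA512: return 64;
    }
    return -1;
}
```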

This is another piece of infrastructure for the upcoming patch to add
backup manifests.

Patch written from scratch by me, but it is similar to previous work
by Rushabh Lathia and Suraj Kharage. Suraj also reviewed this version
off-list. Advice on how not to break Windows from Davinder Singh.

Discussion: http://postgr.es/m/CA+TgmoZV8dw1H2bzZ9xkKwdrk8+XYa+DC9H=F7heO2zna5T6qg@mail.gmail.com
Discussion: http://postgr.es/m/CA+TgmoZRTBiPyvQEwV79PU1ePTtSEo2UeVncrkJMbn1sU1gnRA@mail.gmail.com
2020-04-03 11:52:43 -04:00
Tom Lane 6dd9f35779 Fix bogus CALLED_AS_TRIGGER() defenses.
contrib/lo's lo_manage() thought it could use
trigdata->tg_trigger->tgname in its error message about
not being called as a trigger.  That naturally led to a core dump.

unique_key_recheck() figured it could Assert that fcinfo->context
is a TriggerData node in advance of having checked that it's
being called as a trigger.  That's harmless in production builds,
and perhaps not that easy to reach in any case, but it's logically
wrong.

The first of these per bug #16340 from William Crowell;
the second from manual inspection of other CALLED_AS_TRIGGER
call sites.

Back-patch the lo.c change to all supported branches, the
other to v10 where the thinko crept in.

Discussion: https://postgr.es/m/16340-591c7449dc7c8c47@postgresql.org
2020-04-03 11:24:56 -04:00
Fujii Masao 19db23bcbd Revert "Include information on buffer usage during planning phase, in EXPLAIN output."
This reverts commit ed7a509571.

Per buildfarm member prion.
2020-04-03 12:20:42 +09:00
Fujii Masao 18808f8c89 Add wait events for recovery conflicts.
This commit introduces new wait events RecoveryConflictSnapshot and
RecoveryConflictTablespace. The former is reported while waiting for
recovery conflict resolution on a vacuum cleanup. The latter is reported
while waiting for recovery conflict resolution on dropping tablespace.

Also this commit changes the code so that the wait event Lock is reported
while waiting in ResolveRecoveryConflictWithVirtualXIDs() for recovery
conflict resolution on a lock. The Lock wait event is normally reported
during such a wait, but previously it was not reported when the wait
happened in ResolveRecoveryConflictWithVirtualXIDs().

Author: Masahiko Sawada
Reviewed-by: Fujii Masao
Discussion: https://postgr.es/m/CA+fd4k4mXWTwfQLS3RPwGr4xnfAEs1ysFfgYHvmmoUgv6Zxvmg@mail.gmail.com
2020-04-03 12:15:56 +09:00
Michael Paquier 9d8ef98800 Add support for \aset in pgbench
This option is similar to \gset, except that it is able to store all
results from combined SQL queries into separate variables.  If a query
returns multiple rows, the last result is stored and if a query returns
no rows, nothing is stored.

While on it, add a TAP test for \gset to check for a failure when a
query returns multiple rows.

Author: Fabien Coelho
Reviewed-by: Ibrar Ahmed, Michael Paquier
Discussion: https://postgr.es/m/alpine.DEB.2.21.1904081914200.2529@lancre
2020-04-03 11:45:15 +09:00
Fujii Masao ed7a509571 Include information on buffer usage during planning phase, in EXPLAIN output.
When the BUFFERS option is enabled, EXPLAIN includes information on
buffer usage for each plan node in its output. In addition, this
commit makes EXPLAIN also include information on buffer usage during
the planning phase. This feature makes it easier to identify cases
where a lot of buffer access happens during planning.

Author: Julien Rouhaud, slightly revised by Fujii Masao
Reviewed-by: Justin Pryzby
Discussion: https://postgr.es/m/16109-26a1a88651e90608@postgresql.org
2020-04-03 11:27:09 +09:00
Robert Haas ac44367efb pg_waldump: Add a --quiet option.
The primary motivation for this change is that it will be used by the
upcoming patch to add backup manifests, but it also seems to have some
potential more general use.

Andres Freund and Robert Haas

Discussion: http://postgr.es/m/20200330020814.nspra4mvby42yoa4@alap3.anarazel.de
2020-04-02 20:25:04 -04:00
Tom Lane 7cb0a423f9 Improve stability fix for partition_aggregate test.
Instead of disabling autovacuum on these test tables, adjust the
partition boundaries so that the child partitions are not all the
same size.  That should cause the planner to use a predictable
ordering of the per-partition scan nodes even in cases where
autovacuum causes the rowcount estimates to be off a bit.
Moreover, this also lets these tests show that the planner does
properly order the tables in descending size order, something
that wasn't being proven before.

The pagg_tab1 and pagg_tab2 partitions are still all the same
size, but that should be fine, because those tables are so small
that (1) autovacuum won't fire on them, and (2) even if it did,
it couldn't change the reltuples value --- with only one page,
it can't see just part of the relation.

Discussion: https://postgr.es/m/24467.1585838693@sss.pgh.pa.us
2020-04-02 19:43:51 -04:00
Tom Lane 0b34e7d307 Improve user control over truncation of logged bind-parameter values.
This patch replaces the boolean GUC log_parameters_on_error introduced
by commit ba79cb5dc with an integer log_parameter_max_length_on_error,
adding the ability to specify how many bytes to trim each logged
parameter value to.  (The previous coding hard-wired that choice at
64 bytes.)

In addition, add a new parameter log_parameter_max_length that provides
similar control over truncation of query parameters that are logged in
response to statement-logging options, as opposed to errors.  Previous
releases always logged such parameters in full, possibly causing log
bloat.

For backwards compatibility with prior releases,
log_parameter_max_length defaults to -1 (log in full), while
log_parameter_max_length_on_error defaults to 0 (no logging).
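
The three regimes (-1 logs in full, 0 logs nothing, N trims to N bytes) can be sketched as below. format_logged_param is hypothetical; it works on bytes and so assumes a single-byte encoding (real code must not split a multibyte character), and the "..." truncation marker and the lack of quoting/escaping are simplifications.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Returns false when the setting suppresses logging of the value entirely. */
static bool
format_logged_param(const char *value, int max_len, char *out, size_t outsz)
{
    size_t len = strlen(value);

    if (max_len == 0)
        return false;                                   /* 0: no logging */
    if (max_len < 0 || len <= (size_t) max_len)
        snprintf(out, outsz, "%s", value);              /* -1 or short: in full */
    else
        snprintf(out, outsz, "%.*s...", max_len, value); /* trim to max_len bytes */
    return true;
}
```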

Per discussion, log_parameter_max_length is SUSET since the DBA should
control routine logging behavior, but log_parameter_max_length_on_error
is USERSET because it also affects errcontext data sent back to the
client.

Alexey Bashtanov, editorialized a little by me

Discussion: https://postgr.es/m/b10493cc-a399-a03a-67c7-068f2791ee50@imap.cc
2020-04-02 15:04:51 -04:00
David Rowley cefb82d49e Attempt to stabilize partitionwise_aggregate test
In b07642dbc, we added code to trigger autovacuums based on the number of
INSERTs into a table. This seems to have caused some destabilization of
the regression tests. Likely this is due to an autovacuum triggering
mid-test and (per theory from Tom Lane) one of the test's queries causes
autovacuum to skip some number of pages, resulting in the reltuples
estimate changing.

The failure that this is attempting to fix is around the order of subnodes
in an Append. Since the planner orders these according to the subnode
cost, it's possible that a small change in the reltuples value changes
the subnode's cost enough that it swaps position with one of its fellow
subnodes.

The failure here only seems to occur on slower buildfarm machines. In this
case, lousyjack, which seems to have taken over 8 minutes to run just
the partitionwise_aggregate test. Such a slow run would increase the
chances that the autovacuum launcher would trigger a vacuum mid-test.
Faster machines run this test in sub-second time, so they have a much
smaller window for an autovacuum to trigger.

Here we fix this by disabling autovacuum on all tables created in the test.

Additionally, this reverts the change made in the
partitionwise_aggregate test in 2dc16efed.

Discussion: https://postgr.es/m/22297.1585797192@sss.pgh.pa.us
2020-04-02 21:26:54 +13:00
Peter Eisentraut 2991ac5fc9 Add SQL functions for Unicode normalization
This adds SQL expressions NORMALIZE() and IS NORMALIZED to convert and
check Unicode normal forms, per SQL standard.

To support fast IS NORMALIZED tests, we pull in a new data file
DerivedNormalizationProps.txt from Unicode and build a lookup table
from that, using techniques similar to ones already used for other
Unicode data.  make update-unicode will keep it up to date.  We only
build and use these tables for the NFC and NFKC forms, because they
are too big for NFD and NFKD and the improvement is not significant
enough there.
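
A sketch of a fast IS NORMALIZED test for NFC: binary-search a sorted table of code points with their NFC quick-check values (Yes/No/Maybe), then scan the string. The three table entries are illustrative samples only, not a copy of the generated table, and this simplified scan omits the canonical-combining-class ordering check of the full quick-check algorithm in UAX #15. All names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

typedef enum { QC_YES, QC_NO, QC_MAYBE } QuickCheck;

typedef struct { uint32_t codepoint; QuickCheck nfc_qc; } QcEntry;

/* Illustrative sample, sorted by code point; absent code points mean Yes. */
static const QcEntry nfc_qc_table[] = {
    {0x0300, QC_MAYBE},   /* COMBINING GRAVE ACCENT */
    {0x0340, QC_NO},      /* COMBINING GRAVE TONE MARK */
    {0x212B, QC_NO},      /* ANGSTROM SIGN */
};

static QuickCheck
nfc_quick_check(uint32_t cp)
{
    size_t lo = 0;
    size_t hi = sizeof(nfc_qc_table) / sizeof(nfc_qc_table[0]);

    while (lo < hi)
    {
        size_t mid = lo + (hi - lo) / 2;

        if (nfc_qc_table[mid].codepoint == cp)
            return nfc_qc_table[mid].nfc_qc;
        if (nfc_qc_table[mid].codepoint < cp)
            lo = mid + 1;
        else
            hi = mid;
    }
    return QC_YES;
}

/* NO anywhere decides "not normalized"; MAYBE requires a full normalization. */
static QuickCheck
nfc_quick_check_string(const uint32_t *cps, size_t n)
{
    QuickCheck result = QC_YES;

    for (size_t i = 0; i < n; i++)
    {
        QuickCheck qc = nfc_quick_check(cps[i]);

        if (qc == QC_NO)
            return QC_NO;
        if (qc == QC_MAYBE)
            result = QC_MAYBE;
    }
    return result;
}
```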

Reviewed-by: Daniel Verite <daniel@manitou-mail.org>
Reviewed-by: Andreas Karlsson <andreas@proxel.se>
Discussion: https://www.postgresql.org/message-id/flat/c1909f27-c269-2ed9-12f8-3ab72c8caf7a@2ndquadrant.com
2020-04-02 08:56:27 +02:00
Peter Eisentraut c6e0edad46 Add some comments to some SQL features
Otherwise, it could be confusing to a reader that some of these
well-publicized features are simply listed as unsupported without
further explanation.
2020-04-02 07:52:20 +02:00
Thomas Munro 37b3794dfc Add maintenance_io_concurrency to postgresql.conf.sample.
New GUC from commit fc34b0d9.
2020-04-02 16:50:36 +13:00
Amit Kapila 3a5e22138a Allow parallel vacuum to accumulate buffer usage.
Commit 40d964ec99 allowed the VACUUM command to process indexes in parallel but
forgot to accumulate the buffer usage stats of parallel workers.  This
allows the leader backend to accumulate buffer usage stats of all the parallel
workers.

Reported-by: Julien Rouhaud
Author: Sawada Masahiko
Reviewed-by: Dilip Kumar, Amit Kapila and Julien Rouhaud
Discussion: https://postgr.es/m/20200328151721.GB12854@nol
2020-04-02 08:04:58 +05:30
Fujii Masao 17e0328224 Allow pg_stat_statements to track planning statistics.
This commit makes pg_stat_statements support a new GUC,
pg_stat_statements.track_planning. If this option is enabled,
pg_stat_statements tracks the planning statistics of the statements,
e.g., the number of times the statement was planned, the total time
spent planning the statement, etc. This feature is useful for finding
statements that take a long time to plan. Previously, because
pg_stat_statements tracked only the execution statistics, it could
not be used for that purpose.

The planning and execution statistics are stored separately, at the
end of each phase, so there is not always a one-to-one relationship
between them. For example, if the statement is successfully planned
but fails in the execution phase, only its planning statistics are stored.
As a result, users may see pg_stat_statements results that differ from
those of the previous version. To avoid this,
pg_stat_statements.track_planning needs to be disabled.

This commit bumps the version of pg_stat_statements to 1.8
since it changes the definition of pg_stat_statements function.

Author: Julien Rouhaud, Pascal Legrand, Thomas Munro, Fujii Masao
Reviewed-by: Sergei Kornilov, Tomas Vondra, Yoshikazu Imai, Haribabu Kommi, Tom Lane
Discussion: https://postgr.es/m/CAHGQGwFx_=DO-Gu-MfPW3VQ4qC7TfVdH2zHmvZfrGv6fQ3D-Tw@mail.gmail.com
Discussion: https://postgr.es/m/CAEepm=0e59Y_6Q_YXYCTHZkqOc6H2pJ54C_Xe=VFu50Aqqp_sA@mail.gmail.com
Discussion: https://postgr.es/m/DB6PR0301MB21352F6210E3B11934B0DCC790B00@DB6PR0301MB2135.eurprd03.prod.outlook.com
2020-04-02 11:20:19 +09:00
Tomas Vondra 28cac71bd3 Collect statistics about SLRU caches
There are a number of SLRU caches used to access important data like clog,
commit timestamps, multixact, asynchronous notifications, etc. Until now
we had no easy way to monitor these shared caches, compute hit ratios,
number of reads/writes etc.

This commit extends the statistics collector to track this information
for a predefined list of SLRUs, and also introduces a new system view
pg_stat_slru displaying the data.

The list of built-in SLRUs is fixed, but additional SLRUs may be defined
in extensions. Unfortunately, there's no suitable registry of SLRUs, so
this patch simply defines a fixed list of SLRUs with entries for the
built-in ones and one entry for all additional SLRUs. Extensions adding
their own SLRU are fairly rare, so this seems acceptable.
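
A sketch of the fixed-list approach: known SLRU names map to their own slots, and any other name (e.g. one defined by an extension) falls into a shared "other" slot. The names in the list are hypothetical placeholders, not necessarily the ones pg_stat_slru reports, and the counter fields are illustrative.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical fixed list; the last slot collects every unlisted SLRU. */
static const char *const slru_names[] = {
    "clog", "commit_timestamp", "multixact_offsets", "multixact_members",
    "notify", "subtrans", "other"
};
#define SLRU_NUM (sizeof(slru_names) / sizeof(slru_names[0]))

typedef struct { long blocks_hit, blocks_read, blocks_written; } SlruCounters;

static size_t
slru_index(const char *name)
{
    for (size_t i = 0; i + 1 < SLRU_NUM; i++)
        if (strcmp(name, slru_names[i]) == 0)
            return i;
    return SLRU_NUM - 1;    /* extension-defined SLRUs share the "other" entry */
}

/* Hit ratio as a monitoring tool might compute it from the counters. */
static double
slru_hit_ratio(const SlruCounters *c)
{
    long total = c->blocks_hit + c->blocks_read;

    return total > 0 ? (double) c->blocks_hit / (double) total : 0.0;
}
```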

This patch only allows monitoring of SLRUs, not tuning. The SLRU sizes
are still fixed (hard-coded in the code) and it's not entirely clear
which of the SLRUs might need a GUC to tune size. In a way, allowing us
to determine that is one of the goals of this patch.

Bump catversion as the patch introduces new functions and system view.

Author: Tomas Vondra
Reviewed-by: Alvaro Herrera
Discussion: https://www.postgresql.org/message-id/flat/20200119143707.gyinppnigokesjok@development
2020-04-02 02:34:21 +02:00
Tom Lane 501b018799 Check equality semantics for unique indexes on partitioned tables.
We require the partition key to be a subset of the set of columns
being made unique, so that physically-separate indexes on the different
partitions are sufficient to enforce the uniqueness constraint.

The existing code checked that the listed columns appear, but did not
inquire into the index semantics, which is a serious oversight given
that different index opclasses might enforce completely different
notions of uniqueness.

Ideally, perhaps, we'd just match the partition key opfamily to the
index opfamily.  But hash partitioning uses hash opfamilies which we
can't directly match to btree opfamilies.  Hence, look up the equality
operator in each family, and accept if it's the same operator.  This
should be okay in a fairly general sense, since the equality operator
ought to precisely represent the opfamily's notion of uniqueness.
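
The rule can be sketched over a flattened view where each column carries the equality operator looked up in its opfamily: every partition key column must appear among the index columns with the same equality operator. KeyCol and unique_index_covers_partkey are hypothetical; real code looks the operators up in the catalogs, and the OID values in the usage test are arbitrary.

```c
#include <stdbool.h>
#include <stddef.h>

typedef unsigned int Oid;   /* stand-in for PostgreSQL's Oid */

/* Hypothetical column descriptor: attribute number plus opfamily equality op. */
typedef struct { int attnum; Oid eq_op; } KeyCol;

static bool
unique_index_covers_partkey(const KeyCol *partkey, size_t nkeys,
                            const KeyCol *idxcols, size_t ncols)
{
    for (size_t i = 0; i < nkeys; i++)
    {
        bool found = false;

        for (size_t j = 0; j < ncols; j++)
        {
            /* Same column AND the same notion of equality. */
            if (idxcols[j].attnum == partkey[i].attnum &&
                idxcols[j].eq_op == partkey[i].eq_op)
            {
                found = true;
                break;
            }
        }
        if (!found)
            return false;
    }
    return true;
}
```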

A remaining weak spot is that we don't have a cross-index-AM notion of
which opfamily member is "equality".  But we know which one to use for
hash and btree AMs, and those are the only two that are relevant here
at present.  (Any non-core AMs that know how to enforce equality are
out of luck, for now.)

Back-patch to v11 where this feature was introduced.

Guancheng Luo, revised a bit by me

Discussion: https://postgr.es/m/D9C3CEF7-04E8-47A1-8300-CA1DCD5ED40D@gmail.com
2020-04-01 14:49:49 -04:00
Tom Lane a80818605e Improve selectivity estimation for assorted match-style operators.
Quite a few matching operators such as JSONB's @> used "contsel" and
"contjoinsel" as their selectivity estimators.  That was a bad idea,
because (a) contsel is only a stub, yielding a fixed default estimate,
and (b) that default is 0.001, meaning we estimate these operators as
five times more selective than equality, which is surely pretty silly.

There's a good model for improving this in ltree's ltreeparentsel():
for any "var OP constant" query, we can try applying the operator
to all of the column's MCV and histogram values, taking the latter
as being a random sample of the non-MCV values.  That code is
actually 100% generic, except for the question of exactly what
default selectivity ought to be plugged in when we don't have stats.
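
A simplified sketch of that generic estimate for "var OP constant": sum the frequencies of MCVs that satisfy the operator, apply the fraction of histogram entries that satisfy it to the remaining non-null, non-MCV population, and fall back to a default when there are no stats. Integer values and all names are hypothetical; the real code works on Datums and has additional rules (for example about small histograms) that this sketch omits.

```c
#include <stdbool.h>
#include <stddef.h>

typedef bool (*MatchFn)(int value, int constant);

/* Example operator for the usage check below: value >= constant. */
static bool
int_ge(int value, int constant)
{
    return value >= constant;
}

static double
generic_match_selectivity(MatchFn op, int constant,
                          const int *mcv_vals, const double *mcv_freqs, size_t n_mcv,
                          const int *hist, size_t n_hist,
                          double nullfrac, double default_sel)
{
    double mcv_sel = 0.0;
    double mcv_total = 0.0;
    double rest_sel = default_sel;

    if (n_mcv == 0 && n_hist == 0)
        return default_sel;     /* no stats: use the estimator's default */

    /* Exact contribution of the most-common values. */
    for (size_t i = 0; i < n_mcv; i++)
    {
        mcv_total += mcv_freqs[i];
        if (op(mcv_vals[i], constant))
            mcv_sel += mcv_freqs[i];
    }

    /* Treat histogram entries as a random sample of the non-MCV values. */
    if (n_hist > 0)
    {
        size_t hits = 0;

        for (size_t i = 0; i < n_hist; i++)
            if (op(hist[i], constant))
                hits++;
        rest_sel = (double) hits / (double) n_hist;
    }

    return mcv_sel + rest_sel * (1.0 - nullfrac - mcv_total);
}
```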

Hence, migrate the guts of ltreeparentsel() into the core code, provide
wrappers "matchingsel" and "matchingjoinsel" with a more-appropriate
default estimate, and use those for the non-geometric operators that
formerly used contsel (mostly JSONB containment operators and tsquery
matching).

Also apply this code to some match-like operators in hstore, ltree, and
pg_trgm, including the former users of ltreeparentsel as well as ones
that improperly used contsel.  Since commit 911e70207 just created new
versions of those extensions that we haven't released yet, we can sneak
this change into those new versions instead of having to create an
additional generation of update scripts.

Patch by me, reviewed by Alexey Bashtanov

Discussion: https://postgr.es/m/12237.1582833074@sss.pgh.pa.us
2020-04-01 10:32:33 -04:00
Peter Eisentraut d8653f4687 Refactor code to look up local replication tuple
This unifies some duplicate code.

Author: Amit Langote <amitlangote09@gmail.com>
Discussion: https://www.postgresql.org/message-id/CA+HiwqFjYE5anArxvkjr37AQMd52L-LZtz9Ld2QrLQ3YfcYhTw@mail.gmail.com
2020-04-01 15:34:41 +02:00
Michael Paquier 8d84dd0012 Fix crash in psql when attempting to reuse old connection
In a psql session, if the connection to the server is abruptly cut, the
referenced connection would become NULL as of CheckConnection().  This
could cause a hard crash with psql if attempting to connect by reusing
the past connection's data because of a null-pointer dereference with
either PQhost() or PQdb().  This issue is fixed by making sure that no
reuse of the past connection is done if it does not exist.

Issue has been introduced by 6e5f8d4, so backpatch down to 12.

Reported-by: Hugh Wang
Author: Michael Paquier
Reviewed-by: Álvaro Herrera, Tom Lane
Discussion: https://postgr.es/m/16330-b34835d83619e25d@postgresql.org
Backpatch-through: 12
2020-04-01 14:45:45 +09:00
Amit Kapila 2401d93718 Fix coverity complaint about commit 40d964ec99.
Coverity complained that dividing integer expressions and then
converting the integer quotient to type "double" would lose the fractional
part.  Casting one of the operands of the expression to double should
fix the report.
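
The problem and the fix in miniature (hypothetical function names): the first version divides in integer arithmetic and only then converts, while the second converts an operand first so the division happens in floating point.

```c
/* Integer division truncates before the conversion, losing the fraction. */
static double
ratio_truncated(int num, int den)
{
    return (double) (num / den);
}

/* Converting one operand first makes the division happen in floating point. */
static double
ratio_exact(int num, int den)
{
    return (double) num / den;
}
```

For example, 7 and 2 give 3.0 with the first function and 3.5 with the second.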

Author: Mahendra Singh Thalor
Reviewed-by: Amit Kapila
Discussion: https://postgr.es/m/20200329224818.6phnhv7o2q2rfovf@alap3.anarazel.de
2020-04-01 09:28:13 +05:30
Bruce Momjian 08481eedd1 psql: do file completion for \gx
This was missed when the feature was added.

Reported-by: Vik Fearing

Discussion: https://postgr.es/m/eca20529-0b06-b493-ee38-f071a75dcd5b@postgresfriends.org

Backpatch-through: 10
2020-03-31 23:01:34 -04:00
Michael Paquier a7e8ece41c Add -c/--restore-target-wal to pg_rewind
pg_rewind needs to copy from the source cluster to the target cluster a
set of relation blocks changed from the previous checkpoint where WAL
forked up to the end of WAL on the target.  Building this list of
relation blocks requires a range of WAL segments that may not be present
anymore on the target's pg_wal, causing pg_rewind to fail.  It is
possible to work around this issue by manually copying the WAL segments
needed, but this may lead to some extra and actually useless work.

This commit introduces a new option allowing pg_rewind to use a
restore_command while doing the rewind by grabbing the parameter value
of restore_command from the target cluster configuration.  This allows
the rewind operation to be more reliable, restoring from the archives
only the WAL segments needed by the rewind.

In order to be able to do that, a new routine is added to src/common/ to
allow frontend tools to restore files from archives using an
already-built restore command.  This version is simpler than the
backend equivalent as there is no need to handle the non-recovery case.
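
Building a restore command means expanding restore_command's escapes: %f becomes the WAL file name, %p the destination path, and %% a literal percent sign. The sketch below is hypothetical (it is not the src/common routine); it copies any other % sequence, such as %r, through unchanged as a simplification, and the example paths in the usage check are made up.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Returns false if the expanded command would not fit in out. */
static bool
build_restore_command(const char *tmpl, const char *xlogfname,
                      const char *xlogpath, char *out, size_t outsz)
{
    size_t n = 0;

    if (outsz == 0)
        return false;

    for (const char *sp = tmpl; *sp; sp++)
    {
        char single[2] = {0, 0};
        const char *piece;
        size_t plen;

        if (sp[0] == '%' && sp[1] == 'f')
        {
            piece = xlogfname;      /* %f: WAL file name */
            sp++;
        }
        else if (sp[0] == '%' && sp[1] == 'p')
        {
            piece = xlogpath;       /* %p: where to put the restored file */
            sp++;
        }
        else if (sp[0] == '%' && sp[1] == '%')
        {
            single[0] = '%';        /* %%: literal percent sign */
            piece = single;
            sp++;
        }
        else
        {
            single[0] = *sp;        /* ordinary character */
            piece = single;
        }

        plen = strlen(piece);
        if (n + plen >= outsz)      /* keep room for the terminating NUL */
            return false;
        memcpy(out + n, piece, plen);
        n += plen;
    }
    out[n] = '\0';
    return true;
}
```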

Author: Alexey Kondratov
Reviewed-by: Andrey Borodin, Andres Freund, Alvaro Herrera, Alexander
Korotkov, Michael Paquier
Discussion: https://postgr.es/m/a3acff50-5a0d-9a2c-b3b2-ee36168955c1@postgrespro.ru
2020-04-01 10:57:03 +09:00
Peter Geoghegan 7dbe290da4 Add CREATE INDEX deduplication assertions.
Add two assertions that verify the assumptions about posting list tuple
space accounting and suffix truncation made within nbtsort.c.
2020-03-31 14:38:39 -07:00
Tom Lane fe3036527a Fix race condition in statext_store().
Must hold some lock on the pg_statistic_ext_data catalog *before*
we look up the tuple we aim to replace.  Otherwise a concurrent
VACUUM FULL or similar operation could move it to a different TID,
leaving us trying to replace the wrong tuple.

Back-patch to v12 where this got broken.

Credit goes to Dean Rasheed; I'm just doing the clerical work.

Discussion: https://postgr.es/m/CAEZATCU0zHMDiQV0g8P2U+YSP9C1idUPrn79DajsbonwkN0xvQ@mail.gmail.com
2020-03-31 17:06:22 -04:00
Tom Lane 0936d1b6ff Still another try at stabilizing stats_ext test results.
The stats_ext test is not expecting that autovacuum will touch
any of its tables; an expectation falsified by commit b07642dbc.
Although I'm suspicious that there's something else going on that
makes extended stats estimates not 100% reproducible, it's pretty
easy to demonstrate that there are places in this test that fail
if an autovacuum updates the table's stats unexpectedly.

Hence, revert the band-aid changes made by 2dc16efed and 24566b359
in favor of summarily disabling autovacuum for all the tables that
this test checks estimated rowcounts for.

Also remove an evidently obsolete comment at the head of the test.

Discussion: https://postgr.es/m/15012.1585623298@sss.pgh.pa.us
2020-03-31 16:09:25 -04:00
Fujii Masao b0236508d3 Improve the message logged when recovery is paused.
When the recovery target is reached and recovery is paused because of
recovery_target_action=pause, executing pg_wal_replay_resume() causes
the standby to promote, i.e., recovery ends. So, in this case, the
previously logged message "Execute pg_wal_replay_resume() to continue"
was confusing, because pg_wal_replay_resume() doesn't cause recovery
to continue.

This commit improves the message logged when recovery is paused,
and the proper message is output based on what (pg_wal_replay_pause
or recovery_target_action) causes recovery to be paused.

Author: Sergei Kornilov, revised by Fujii Masao
Reviewed-by: Robert Haas
Discussion: https://postgr.es/m/19168211580382043@myt5-b646bde4b8f3.qloud-c.yandex.net
2020-04-01 03:35:13 +09:00
Bruce Momjian 051fd5e0f9 Allow ecpg to be built stand-alone, allow parallel libpq make
This change defines SHLIB_PREREQS for the libpgport dependency, rather
than using a makefile rule.  This was broken in PG 12.

Reported-by: Filip Janus

Discussion: https://postgr.es/m/E5Dc85EGUY4wyG8cjAU0qoEdCJxGK_qhW1s9qSuYq9A@mail.gmail.com

Author: Dagfinn Ilmari Mannsåker (for libpq)

Backpatch-through: 12
2020-03-31 14:17:32 -04:00
Tom Lane 82e8018522 Teach pg_ls_dir_files() to ignore ENOENT failures from stat().
Buildfarm experience shows that this function can fail with ENOENT
if some other process unlinks a file between when we read the directory
entry and when we try to stat() it.  The problem is old but we had
not noticed it until 085b6b667 added regression test coverage.

To fix, just ignore ENOENT failures.  There is one other case that
this might hide: a symlink that points to nowhere.  That seems okay
though, at least better than erroring.
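
The pattern can be sketched in POSIX C. This is an illustration only:
count_regular_files() is a hypothetical helper, not the actual
pg_ls_dir_files() code, and it treats every error other than ENOENT as
fatal:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <dirent.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/*
 * Hypothetical helper: count regular files in dirpath.  An entry that
 * disappears between readdir() and stat() (ENOENT) is skipped, which
 * also skips dangling symlinks; any other failure returns -1.
 */
static int
count_regular_files(const char *dirpath)
{
    DIR        *dir = opendir(dirpath);
    struct dirent *de;
    int         count = 0;

    if (dir == NULL)
        return -1;

    while ((de = readdir(dir)) != NULL)
    {
        char        path[4096];
        struct stat st;

        if (strcmp(de->d_name, ".") == 0 || strcmp(de->d_name, "..") == 0)
            continue;

        snprintf(path, sizeof(path), "%s/%s", dirpath, de->d_name);

        if (stat(path, &st) < 0)
        {
            if (errno == ENOENT)
                continue;       /* unlinked concurrently: just ignore it */
            closedir(dir);
            return -1;          /* any other failure is a real error */
        }

        if (S_ISREG(st.st_mode))
            count++;
    }

    closedir(dir);
    return count;
}
```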

Back-patch to v10 where this function was added, since the regression
test cases were too.

Discussion: https://postgr.es/m/20200308173103.GC1357@telsasoft.com
2020-03-31 12:57:55 -04:00
Alexander Korotkov 02a5786df2 Improve error reporting in opclasscmds.c
This commit improves the error reporting introduced by 911e702077.  It puts
the argument of errmsg() on a single line, so that the error text is easier
to grep for in the source.  It also improves the wording of errhint().
2020-03-31 17:51:57 +03:00
Magnus Hagander 087d3d0583 Fix assorted typos
Author: Daniel Gustafsson <daniel@yesql.se>
2020-03-31 16:00:06 +02:00
Peter Eisentraut de3bbfcc96 Fix INSERT OVERRIDING USER VALUE behavior
The original implementation disallowed using OVERRIDING USER VALUE on
identity columns defined as GENERATED ALWAYS, which is not per
standard.  So allow that now.

Expand documentation and tests around this.

Author: Dean Rasheed <dean.a.rasheed@gmail.com>
Reviewed-by: Peter Eisentraut <peter.eisentraut@2ndquadrant.com>
Reviewed-by: Vik Fearing <vik@postgresfriends.org>
Discussion: https://www.postgresql.org/message-id/flat/CAEZATCVrh2ufCwmzzM%3Dk_OfuLhTTPBJCdFkimst2kry4oHepuQ%40mail.gmail.com
2020-03-31 08:50:39 +02:00
Michael Paquier 616ae3d2b0 Move routine definitions of xlogarchive.c to a new header file
The definitions of the routines defined in xlogarchive.c have been part
of xlog_internal.h which is included by several frontend tools, but all
those routines are only called by the backend.  More cleanup could be
done within xlog_internal.h, but that's already a nice cut.

This will help a follow-up patch for pg_rewind where handling of
restore_command is added for frontends.

Author: Alexey Kondratov, Michael Paquier
Reviewed-by: Álvaro Herrera, Alexander Korotkov
Discussion: https://postgr.es/m/a3acff50-5a0d-9a2c-b3b2-ee36168955c1@postgrespro.ru
2020-03-31 15:33:04 +09:00
Peter Eisentraut fc8c3bdde2 Update SQL features
Set T653 to supported.  This has always been possible.
2020-03-31 08:25:03 +02:00
Amit Kapila ef75140fe7 Avoid calls to RelationGetRelationName() and RelationGetNamespace() in vacuum code.
After commit b61d161c14, during vacuum, we cache the relation name and
relation namespace in the local structure LVRelStats so that we can use
them in an error callback function.  We can use this cached information
to avoid calls to RelationGetRelationName(), RelationGetNamespace() and
get_namespace_name().  This is mainly for consistency in the vacuum code
path, but it also avoids the extra syscache lookup done in
get_namespace_name().

Author: Justin Pryzby
Reviewed-by: Amit Kapila
Discussion: https://www.postgresql.org/message-id/20191120210600.GC30362@telsasoft.com
2020-03-31 09:34:49 +05:30
Peter Geoghegan f01157e2ac Further simplify nbtree high key truncation.
Commit 7c2dbc69 reorganized _bt_truncate() in a way that enables a
further simplification that I (pgeoghegan) missed:  Since we mark the
tuple that is returned to the caller as a pivot tuple before the point
where its heap TID is set as of 7c2dbc69, it is possible to use the high
level BTreeTupleGetHeapTID() inline function to get an item pointer.  Do
it that way now.  This approach is clearer and more maintainable.
2020-03-30 17:34:12 -07:00
Michael Paquier dd9ac7d5d8 Revert "Skip redundant anti-wraparound vacuums"
This reverts commit 2aa6e33, which added a fast path to skip
anti-wraparound and non-aggressive autovacuum jobs (such jobs make no
sense, as anti-wraparound implies aggressive).  With a cluster having a
large number of relations, a portion of them heavily updated, this
could cause autovacuum to lock down, with autovacuum workers repeatedly
attempting those jobs on the same relations of the same database, only
to have them skipped each time.  This lock down can be resolved with a
manual VACUUM FREEZE.

Justin King has reported one environment where the issue happened, and
Julien Rouhaud and I have been able to reproduce it in a second
environment.  With a very aggressive autovacuum_freeze_max_age,
triggering those jobs with pgbench takes only minutes, but hitting
the lock down is a lot harder (my local tests failed to do that).

Note that anti-wraparound and non-aggressive jobs can only be triggered
on a subset of shared catalogs:
- pg_auth_members
- pg_authid
- pg_database
- pg_replication_origin
- pg_shseclabel
- pg_subscription
- pg_tablespace
While the lock down has been possible since v12, the root cause of those
jobs is a much older issue, which needs more analysis.

Bonus thanks to Andres Freund for the discussion.

Reported-by: Justin King
Discussion: https://postgr.es/m/CAE39h22zPLrkH17GrkDgAYL3kbjvySYD1io+rtnAUFnaJJVS4g@mail.gmail.com
Backpatch-through: 12
2020-03-31 08:27:47 +09:00
Peter Geoghegan 7c2dbc691c Refactor nbtree high key truncation.
Simplify _bt_truncate(), the routine that generates truncated leaf page
high keys.  Remove a micro-optimization that avoided a second palloc0()
call (this was used when a heap TID was needed in the final pivot tuple,
though only when the index happened to not be an INCLUDE index).

Removing this dubious micro-optimization allows _bt_truncate() to use
the index_truncate_tuple() indextuple.c utility routine in all cases.
This was already the common case.

This commit is a HEAD-only follow up to bugfix commit 4b42a899.
2020-03-30 15:52:39 -07:00
Andres Freund d4b34f60c5 Deduplicate PageIsNew() check in lazy_scan_heap().
The recheck isn't needed anymore, as RelationGetBufferForTuple() now
extends the relation with RBM_ZERO_AND_LOCK. Previously we needed to
handle the fact that relation extension extended the relation and then
separately acquired a lock on the page - while expecting that the page
is empty.

Reported-By: Ranier Vilela
Discussion: https://postgr.es/m/CAEudQArA_=J0D5T258xsCY6Xtf6wiH4b=QDPDgVS+WZUN10WDw@mail.gmail.com
2020-03-30 13:56:40 -07:00
Alexander Korotkov 364bdd0b41 Fix missing SP-GiST support in 911e702077
911e702077 failed to set amoptsprocnum for SP-GiST.  This commit fixes
that.
2020-03-30 23:45:03 +03:00
Alexander Korotkov 851b14b0c6 Remove rudiments of supporting procnum == 0 from 911e702077
Early versions of the opclass options patch used support procedure zero as the
opclass options procedure.  This commit removes the remnants of that, which
were committed in 911e702077.  It also implements correct handling of
amoptsprocnum == 0.
2020-03-30 23:43:25 +03:00
Peter Geoghegan 4b42a89938 Consistently truncate non-key suffix columns.
INCLUDE indexes failed to have their non-key attributes physically
truncated away in certain rare cases.  This led to physically larger
pivot tuples that contained useless non-key attribute values.  The
impact on users should be negligible, but this is still clearly a
regression (Postgres 11 supports INCLUDE indexes, and yet was not
affected).

The bug appeared in commit dd299df8, which introduced "true" suffix
truncation of key attributes.

Discussion: https://postgr.es/m/CAH2-Wz=E8pkV9ivRSFHtv812H5ckf8s1-yhx61_WrJbKccGcrQ@mail.gmail.com
Backpatch: 12-, where "true" suffix truncation was introduced.
2020-03-30 12:03:59 -07:00
Alexander Korotkov 911e702077 Implement operator class parameters
PostgreSQL provides a set of template index access methods, where opclasses
have much freedom in the semantics of indexing.  These index AMs are GiST, GIN,
SP-GiST and BRIN.  In them, opclasses define the representation of keys, the
operations on them and the supported search strategies.  So it's natural that
opclasses may face some tradeoffs, which require a user-side decision.  This
commit implements opclass parameters, allowing users to set values that tell
the opclass how to index a particular dataset.

This commit doesn't introduce new storage in system catalog.  Instead it uses
pg_attribute.attoptions, which is used for table column storage options but
unused for index attributes.

To avoid changing the signature of each opclass support function, we
implement a unified way to pass options to opclass support functions.  Options
are set in fn_expr as a constant bytea expression.  This is possible because
opclass support functions are executed outside of expressions, so fn_expr is
unused for them.

This commit comes with some examples of opclass options usage.  We parametrize
the signature length in GiST.  That applies to multiple opclasses: tsvector_ops,
gist__intbig_ops, gist_ltree_ops, gist__ltree_ops, gist_trgm_ops and
gist_hstore_ops.  We also parametrize the maximum number of integer ranges for
gist__int_ops.  However, the main future use of this feature is expected to be
json, where users would be able to specify how to index particular parts of
json values.

Catversion is bumped.

Discussion: https://postgr.es/m/d22c3a18-31c7-1879-fc11-4c1ce2f5e5af%40postgrespro.ru
Author: Nikita Glukhov, revised by me
Reviewed-by: Nikolay Shaplov, Robert Haas, Tom Lane, Tomas Vondra, Alvaro Herrera
2020-03-30 19:17:23 +03:00
Peter Eisentraut 1d53432ff9 Allow using Unix-domain sockets on Windows in tests
The test suites currently don't use Unix-domain sockets on Windows.
This optionally allows enabling that by setting the environment
variable PG_TEST_USE_UNIX_SOCKETS.

This should currently be considered experimental.  In particular,
pg_regress.c contains some comments that the cleanup code for
Unix-domain sockets doesn't work correctly under Windows, which hasn't
been a problem until now.  But it's good enough for locally
supervised testing of the functionality.

Reviewed-by: Andrew Dunstan <andrew.dunstan@2ndquadrant.com>
Discussion: https://www.postgresql.org/message-id/flat/54bde68c-d134-4eb8-5bd3-8af33b72a010@2ndquadrant.com
2020-03-30 17:35:29 +02:00
Tom Lane 8c49454caa Be more careful about extracting encoding from locale strings on Windows.
GetLocaleInfoEx() can fail on strings that setlocale() was perfectly
happy with.  A common way for that to happen is if the locale string
is actually a Unix-style string, say "et_EE.UTF-8".  In that case,
what's after the dot is an encoding name, not a Windows codepage number;
blindly treating it as a codepage number led to failure, with a fairly
silly error message.  Hence, check to see if what's after the dot is
all digits, and if not, treat it as a literal encoding name rather than
a codepage number.  This will do the right thing with many Unix-style
locale strings, and produce a more sensible error message otherwise.

Somewhat independently of that, treat a zero (CP_ACP) result from
GetLocaleInfoEx() as meaning that we must use UTF-8 encoding.
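
A minimal sketch of the "all digits?" test on the text after the dot,
in portable C.  The enum and function names are hypothetical, and real
code must still map the result to a PostgreSQL encoding (including the
CP_ACP case above):

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical classification of the part after the dot in a locale name. */
typedef enum
{
    CODESET_NONE,               /* no ".suffix" at all, e.g. "C" */
    CODESET_CODEPAGE,           /* all digits, e.g. "English_United States.1252" */
    CODESET_NAME                /* anything else, e.g. "et_EE.UTF-8" */
} CodesetKind;

/*
 * Assumes the last dot separates the codeset; modifiers such as "@euro"
 * are not handled in this sketch.
 */
static CodesetKind
classify_codeset(const char *locale, const char **suffix)
{
    const char *dot;
    const char *p;

    assert(locale != NULL && suffix != NULL);

    dot = strrchr(locale, '.');
    if (dot == NULL || dot[1] == '\0')
    {
        *suffix = NULL;
        return CODESET_NONE;
    }

    *suffix = dot + 1;
    for (p = *suffix; *p != '\0'; p++)
    {
        if (!isdigit((unsigned char) *p))
            return CODESET_NAME;        /* treat as a literal encoding name */
    }
    return CODESET_CODEPAGE;            /* treat as a Windows codepage number */
}
```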

Back-patch to all supported branches.

Juan José Santamaría Flecha

Discussion: https://postgr.es/m/24905.1585445371@sss.pgh.pa.us
2020-03-30 11:14:58 -04:00
David Rowley 24566b359d Attempt to fix unstable regression tests, take 2
Following up on 2dc16efed, petalura has suffered some additional
failures in stats_ext which again appear to be around the timing of an
autovacuum during the test, causing instability in the row estimates.

Again, let's fix this by explicitly performing a VACUUM on the table
and not leave it to happen by chance of an autovacuum pass.

Discussion: https://postgr.es/m/CAApHDvok5hmXr%2BbUbJe7%2B2sQzWo4B_QzSk7RKFR9fP6BjYXx5g%40mail.gmail.com
2020-03-30 23:41:11 +13:00
Fujii Masao 64638ccba3 Report waiting via PS while recovery is waiting for buffer pin in hot standby.
Previously while the startup process was waiting for the recovery conflict
with snapshot, tablespace or lock to be resolved, waiting was reported in
PS display, but not in the case of recovery conflict with buffer pin.
This commit makes the startup process in hot standby report waiting via PS
while waiting for the conflicts with other backends holding buffer pins to
be resolved.

Author: Masahiko Sawada
Reviewed-by: Fujii Masao
Discussion: https://postgr.es/m/CA+fd4k4mXWTwfQLS3RPwGr4xnfAEs1ysFfgYHvmmoUgv6Zxvmg@mail.gmail.com
2020-03-30 17:35:03 +09:00
Peter Eisentraut 246f136e76 Improve handling of parameter differences in physical replication
When certain parameters are changed on a physical replication primary,
this is communicated to standbys using the XLOG_PARAMETER_CHANGE WAL
record.  The standby then checks whether its own settings are at least
as big as the ones on the primary.  If not, the standby shuts down
with a fatal error.

The correspondence of settings between primary and standby is required
because those settings influence certain shared memory sizings that
are required for processing WAL records that the primary might send.
For example, if the primary sends a prepared transaction, the standby
must have had max_prepared_transactions set appropriately or it won't
be able to process those WAL records.

However, fatally shutting down the standby immediately upon receipt of
the parameter change record might be a bit of an overreaction.  The
resources related to those settings are not required immediately at
that point, and might never be required if the activity on the primary
does not exhaust all those resources.  If we just let the standby roll
on with recovery, it will eventually produce an appropriate error when
those resources are used.

So this patch relaxes this a bit.  Upon receipt of
XLOG_PARAMETER_CHANGE, we still check the settings but only issue a
warning and set a global flag if there is a problem.  Then when we
actually hit the resource issue and the flag was set, we issue another
warning message with relevant information.  At that point we pause
recovery, so a hot standby remains usable.  We also repeat the last
warning message once a minute so it is harder to miss or ignore.
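
A minimal sketch of the "warn and set a flag" half of this behavior.
The struct, the flag, the helper names and the warning text are all
hypothetical, only three of the relevant settings are shown, and the
later pause-on-use logic is omitted:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical subset of settings the standby must match or exceed. */
typedef struct
{
    int         max_connections;
    int         max_prepared_transactions;
    int         max_locks_per_transaction;
} ResourceSettings;

/* Hypothetical global flag, consulted later when a resource runs out. */
static bool insufficient_settings = false;

static void
check_one(const char *name, int standby, int primary, int *nbad)
{
    if (standby < primary)
    {
        /* Warn instead of shutting down with a fatal error. */
        fprintf(stderr, "WARNING: insufficient setting for parameter %s "
                "(%d is lower than the primary's %d)\n",
                name, standby, primary);
        insufficient_settings = true;
        (*nbad)++;
    }
}

/* Called on receipt of XLOG_PARAMETER_CHANGE; returns the problem count. */
static int
check_parameter_change(const ResourceSettings *standby,
                       const ResourceSettings *primary)
{
    int         nbad = 0;

    check_one("max_connections", standby->max_connections,
              primary->max_connections, &nbad);
    check_one("max_prepared_transactions", standby->max_prepared_transactions,
              primary->max_prepared_transactions, &nbad);
    check_one("max_locks_per_transaction", standby->max_locks_per_transaction,
              primary->max_locks_per_transaction, &nbad);
    return nbad;
}
```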

Reviewed-by: Sergei Kornilov <sk@zsrv.org>
Reviewed-by: Masahiko Sawada <masahiko.sawada@2ndquadrant.com>
Reviewed-by: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/4ad69a4c-cc9b-0dfe-0352-8b1b0cd36c7b@2ndquadrant.com
2020-03-30 09:53:45 +02:00
Peter Eisentraut a01e1b8b9d Add new part SQL/MDA to information_schema.sql_parts
2020-03-30 08:55:55 +02:00
Fujii Masao 6aba63ef3e Allow the planner-related functions and hook to accept the query string.
This commit adds query_string argument into the planner-related functions
and hook and allows us to pass the query string to them.

Currently nothing uses the passed query string. But the upcoming patch
for the planning counters will add a planner hook function to
pg_stat_statements, and that function will need the query string. So this
change will be necessary for that patch.

Also this change is useful for some extensions that want to use the query
string in their planner hook function.

Author: Pascal Legrand, Julien Rouhaud
Reviewed-by: Yoshikazu Imai, Tom Lane, Fujii Masao
Discussion: https://postgr.es/m/CAOBaU_bU1m3_XF5qKYtSj1ua4dxd=FWDyh2SH4rSJAUUfsGmAQ@mail.gmail.com
Discussion: https://postgr.es/m/1583789487074-0.post@n3.nabble.com
2020-03-30 13:51:05 +09:00
Fujii Masao 4a539a25eb Expose BufferUsageAccumDiff().
Previously pg_stat_statements calculated the difference of buffer counters
with its own code, even though BufferUsageAccumDiff() had the same code.
This commit exposes BufferUsageAccumDiff() and makes pg_stat_statements
use it for the calculation, in order to simplify the code.

This change will also be useful for the upcoming patch adding planning
counters to pg_stat_statements, because that patch will add another place
that calculates the difference of buffer counters, which can easily be
done by using BufferUsageAccumDiff().
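
The accumulate-difference idea (dst += add - sub, field by field) can be
sketched as below.  PostgreSQL's real BufferUsage has more counters and
timing fields; this three-field struct and the helper name are
hypothetical simplifications:

```c
#include <assert.h>
#include <stdint.h>

/* Simplified, hypothetical stand-in for PostgreSQL's BufferUsage. */
typedef struct
{
    int64_t     shared_blks_hit;
    int64_t     shared_blks_read;
    int64_t     temp_blks_written;
} BufferUsageSketch;

/* dst += (add - sub) for every counter. */
static void
buffer_usage_accum_diff(BufferUsageSketch *dst,
                        const BufferUsageSketch *add,
                        const BufferUsageSketch *sub)
{
    dst->shared_blks_hit += add->shared_blks_hit - sub->shared_blks_hit;
    dst->shared_blks_read += add->shared_blks_read - sub->shared_blks_read;
    dst->temp_blks_written += add->temp_blks_written - sub->temp_blks_written;
}
```

A caller snapshots the counters before some work, snapshots them again
afterwards, and accumulates the difference into a zeroed struct, which is
why one shared helper serves both execution and planning statistics.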

Author: Julien Rouhaud
Reviewed-by: Fujii Masao
Discussion: https://postgr.es/m/bdfee4e0-a304-2498-8da5-3cb52c0a193e@oss.nttdata.com
2020-03-30 12:15:26 +09:00
Amit Kapila b61d161c14 Introduce vacuum errcontext to display additional information.
The additional information displayed will be the block number for errors
occurring while processing the heap, and the index name for errors
occurring while processing an index.

This will help us diagnose problems that occur during a vacuum.  For
example, if we get an error while vacuuming due to corruption (caused
either by bad hardware or by some bug), it can help us identify the
block in the heap and/or the affected index.

It sets up an error context callback to display additional information
with the error.  During the different phases of vacuum (heap scan, heap
vacuum, index vacuum, index cleanup, heap truncate), we update the error
context callback to display appropriate information.  We could extend it
to a more granular level, such as adding phases for FSM operations or for
prefetching blocks while truncating.  However, I felt that this would
require adding many more error callback function calls and could make the
code a bit complex, so those are left for now.
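
In PostgreSQL such a callback is registered by pushing an
ErrorContextCallback onto error_context_stack.  The sketch below covers
only the formatting side, assuming a hypothetical VacuumErrInfo struct
that vacuum updates as it moves between phases; the message wording is
illustrative and may not match the committed strings:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical vacuum phases, mirroring the ones listed above. */
typedef enum
{
    PHASE_SCAN_HEAP,
    PHASE_VACUUM_HEAP,
    PHASE_VACUUM_INDEX,
    PHASE_INDEX_CLEANUP,
    PHASE_TRUNCATE
} VacuumPhase;

/* Hypothetical state read by the error context callback. */
typedef struct
{
    VacuumPhase phase;
    const char *relnamespace;
    const char *relname;
    const char *indname;        /* only meaningful in the index phases */
    unsigned    blkno;          /* block number, or new size when truncating */
} VacuumErrInfo;

/* Format the extra context line for the current phase. */
static void
vacuum_error_context(const VacuumErrInfo *info, char *buf, size_t len)
{
    switch (info->phase)
    {
        case PHASE_SCAN_HEAP:
            snprintf(buf, len, "while scanning block %u of relation \"%s.%s\"",
                     info->blkno, info->relnamespace, info->relname);
            break;
        case PHASE_VACUUM_HEAP:
            snprintf(buf, len, "while vacuuming block %u of relation \"%s.%s\"",
                     info->blkno, info->relnamespace, info->relname);
            break;
        case PHASE_VACUUM_INDEX:
            snprintf(buf, len, "while vacuuming index \"%s\" of relation \"%s.%s\"",
                     info->indname, info->relnamespace, info->relname);
            break;
        case PHASE_INDEX_CLEANUP:
            snprintf(buf, len, "while cleaning up index \"%s\" of relation \"%s.%s\"",
                     info->indname, info->relnamespace, info->relname);
            break;
        case PHASE_TRUNCATE:
            snprintf(buf, len, "while truncating relation \"%s.%s\" to %u blocks",
                     info->relnamespace, info->relname, info->blkno);
            break;
    }
}
```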

Author: Justin Pryzby, with a few changes by Amit Kapila
Reviewed-by: Alvaro Herrera, Amit Kapila, Andres Freund, Michael Paquier
and Sawada Masahiko
Discussion: https://www.postgresql.org/message-id/20191120210600.GC30362@telsasoft.com
2020-03-30 07:33:38 +05:30
Peter Eisentraut 9cedb16660 pg_regress: Observe TMPDIR
Put the temporary socket directory under TMPDIR, if that environment
variable is set, instead of the hardcoded /tmp.

This allows running the tests if there is no /tmp at all (for example
on Windows, although running the tests with Unix-domain sockets is not
enabled on Windows yet).  We also use TMPDIR everywhere else /tmp is
hardcoded, so this makes the behavior consistent.
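
A minimal sketch of the lookup, assuming a POSIX environment.  The
helper names and the "pg_regress-XXXXXX" template are hypothetical, not
copied from pg_regress.c:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Base directory for temporary files: $TMPDIR if set, else /tmp. */
static const char *
temp_base_dir(void)
{
    const char *tmpdir = getenv("TMPDIR");

    return tmpdir != NULL ? tmpdir : "/tmp";
}

/* Build a mkdtemp()-style template under that directory; -1 if too long. */
static int
build_socket_dir_template(char *buf, size_t len)
{
    int         n = snprintf(buf, len, "%s/pg_regress-XXXXXX", temp_base_dir());

    return (n >= 0 && (size_t) n < len) ? 0 : -1;
}
```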

Reviewed-by: Andrew Dunstan <andrew.dunstan@2ndquadrant.com>
Discussion: https://www.postgresql.org/message-id/flat/54bde68c-d134-4eb8-5bd3-8af33b72a010@2ndquadrant.com
2020-03-29 09:25:40 +02:00
Peter Eisentraut b79911dc8c Update SQL features
Change F181 to supported.  It requires that an embedded C program can
be split across multiple files, which ECPG easily supports.
2020-03-29 08:56:41 +02:00
David Rowley 2dc16efedc Attempt to fix unstable regression tests
b07642dbc added code to trigger autovacuums based on the number of
inserts into a table. This seems to have caused some regression test
results to destabilize. I suspect this is due to autovacuum triggering a
vacuum sometime after the test's ANALYZE run and perhaps reltuples is
ending up being set to a slightly different value as a result.

Attempt to resolve this by running a VACUUM ANALYZE on the affected table
instead of just ANALYZE. pg_class.reltuples will still get set to whatever
ANALYZE chooses, but we should no longer get a subsequent autovacuum
overriding that.

The overhead this adds to each test's runtime seems small enough not to
worry about. I measure 3-4% on stats_ext and can't measure any change in
partition_aggregate.

I'm unable to recreate the issue locally, so this is a bit of a blind
fix.

Discussion: https://postgr.es/m/CAApHDvpWmpqYrKwwDQyeDq8dAyK7GMNaxDhrG69CkSuXoEg%2BVg%40mail.gmail.com
2020-03-29 19:36:20 +13:00
Peter Geoghegan a7b9d24e4e Make deduplication use number of key attributes.
Use IndexRelationGetNumberOfKeyAttributes() rather than
IndexRelationGetNumberOfAttributes() when determining whether or not two
index tuples are suitable for merging together into a single posting
list tuple.  This is a little bit tidier.  It brings affected code in
nbtdedup.c a little closer to similar, related code in nbtsplitloc.c.
2020-03-28 20:25:03 -07:00
Andres Freund 42750b08d9 Ensure snapshot is registered within ScanPgRelation().
In 9.4 I added support to use a historical snapshot in
ScanPgRelation(), while adding logical decoding. Unfortunately a
conflict with the concurrent removal of SnapshotNow was incorrectly
resolved, leading to an unregistered snapshot being used.

It is not correct to use an unregistered (or non-active) snapshot for
anything non-trivial, because catalog invalidations can cause the
snapshot to be invalidated.

Luckily it seems unlikely to actively cause problems in practice, as
ScanPgRelation() requires that we already have a lock on the relation,
we only look for a single row, and we don't appear to rely on the
result's TID being correct. However, it is clearly wrong, and potential
negative consequences would likely be hard to find. So it seems worth
backpatching the fix, even without a concrete hazard.

Discussion: https://postgr.es/m/20200229052459.wzhqnbhrriezg4v2@alap3.anarazel.de
Backpatch: 9.5-
2020-03-28 12:26:46 -07:00
Jeff Davis 7351bfeda3 Fix costing for disk-based hash aggregation.
Report and suggestions from Richard Guo and Tomas Vondra.

Discussion: https://postgr.es/m/CAMbWs4_W8fYbAn8KxgidAaZHON_Oo08OYn9ze=7remJymLqo5g@mail.gmail.com
2020-03-28 12:07:49 -07:00
Dean Rasheed 4083f445c0 Improve the performance and accuracy of numeric sqrt() and ln().
Instead of using Newton's method to compute numeric square roots, use
the Karatsuba square root algorithm, which performs better for numbers
of all sizes. In practice, this is 3-5 times faster for inputs with
just a few digits and up to around 10 times faster for larger inputs.

Also, the new algorithm guarantees that the final digit of the result
is correctly rounded, since it computes an integer square root with
truncation, containing at least 1 extra decimal digit before rounding.
The former algorithm would occasionally round the wrong way because
it rounded both the intermediate and final results.
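
The rounding argument can be illustrated on machine integers.  The
sketch below is not the Karatsuba algorithm and does not use NumericVar;
it uses a simple bit-by-bit integer square root on uint64_t and only
shows the "truncate with one extra digit, then round once" step.  With
y = sqrt(x) * 10^digits and s = floor(10y), (s + 5) / 10 equals
floor(y + 0.5), so the last digit is rounded correctly; rounding an
intermediate result first (e.g. 1.4449 to 1.445, then to 1.45) can go
the wrong way:

```c
#include <assert.h>
#include <stdint.h>

/* Truncated integer square root, floor(sqrt(n)), using only integer ops. */
static uint64_t
isqrt_u64(uint64_t n)
{
    uint64_t    res = 0;
    uint64_t    bit = (uint64_t) 1 << 62;   /* highest power of 4 in range */

    while (bit > n)
        bit >>= 2;

    while (bit != 0)
    {
        if (n >= res + bit)
        {
            n -= res + bit;
            res = (res >> 1) + bit;
        }
        else
            res >>= 1;
        bit >>= 2;
    }
    return res;
}

/*
 * sqrt(x) rounded half-up to 'digits' decimal places, returned scaled by
 * 10^digits.  The caller must keep x * 10^(2 * (digits + 1)) below 2^64.
 */
static uint64_t
sqrt_rounded(uint64_t x, int digits)
{
    uint64_t    scale = 1;

    for (int i = 0; i <= digits; i++)
        scale *= 10;                        /* scale = 10^(digits + 1) */

    /* floor(sqrt(x) * 10^(digits + 1)): truncated, one extra digit */
    uint64_t    s = isqrt_u64(x * scale * scale);

    return (s + 5) / 10;                    /* the single rounding step */
}
```

For example, sqrt_rounded(5, 4) computes the truncated value 223606 and
rounds it once to 22361, i.e. 2.2361.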

In addition, arrange for sqrt_var() to explicitly support negative
rscale values (rounding before the decimal point). This allows the
argument reduction phase of ln_var() to be optimised for large inputs,
since it only needs to compute square roots with a few more digits
than the final ln() result, rather than computing all the digits
before the decimal point. For very large inputs, this can be many
thousands of times faster.

In passing, optimise div_var_fast() in a couple of places where it was
doing unnecessary work.

Patch by me, reviewed by Tom Lane and Tels.

Discussion: https://postgr.es/m/CAEZATCV1A7+jD3P30Zu31KjaxeSEyOn3v9d6tYegpxcq3cQu-g@mail.gmail.com
2020-03-28 14:37:53 +00:00
Peter Eisentraut 8f3ec75de4 Enable Unix-domain sockets support on Windows
As of Windows 10 version 1803, Unix-domain sockets are supported on
Windows.  But it's not automatically detected by configure because it
looks for struct sockaddr_un and Windows doesn't define that.  So we
just make our own definition on Windows and override the configure
result.

Set DEFAULT_PGSOCKET_DIR to empty on Windows so by default no
Unix-domain socket is used, because there is no good standard
location.

In pg_upgrade, we have to do some extra tweaking to preserve the
existing behavior of not using Unix-domain sockets on Windows.  Adding
support would be desirable, but it needs further work, in particular a
way to select whether to use Unix-domain sockets from the command-line
or with a run-time test.

The pg_upgrade test script needs a fix.  The previous code passed
"localhost" to postgres -k, which only happened to work because
Windows used to ignore the -k argument value altogether.  We instead
need to pass an empty string to get the desired effect.

The test suites will continue to not use Unix-domain sockets on
Windows.  This requires a small tweak in pg_regress.c.  The TAP tests
don't need to be changed because they decide by the operating system
rather than HAVE_UNIX_SOCKETS.

Reviewed-by: Andrew Dunstan <andrew.dunstan@2ndquadrant.com>
Discussion: https://www.postgresql.org/message-id/flat/54bde68c-d134-4eb8-5bd3-8af33b72a010@2ndquadrant.com
2020-03-28 15:01:01 +01:00
Dean Rasheed 87779aa474 Prevent functional dependency estimates from exceeding column estimates.
Formerly we applied a functional dependency "a => b with dependency
degree f" using the formula

  P(a,b) = P(a) * [f + (1-f)*P(b)]

This leads to the possibility that the combined selectivity P(a,b)
could exceed P(b), which is not ideal. The addition of support for IN
and OR clauses (commits 8f321bd16c and ccaa3569f5) would seem to make
this more likely, since the user-supplied values in such clauses are
not necessarily compatible with the functional dependency.

Mitigate this by using the formula

  P(a,b) = f * Min(P(a), P(b)) + (1-f) * P(a) * P(b)

instead, which guarantees that the combined selectivity is no greater than
each column's individual selectivity. Logically, this modifies the
part of the formula that accounts for dependent rows to handle cases
where P(a) > P(b), whilst not changing the second term which accounts
for independent rows.
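
The two formulas can be compared directly.  For example, with
P(a) = 0.5, P(b) = 0.1 and f = 0.8, the old formula gives
0.5 * (0.8 + 0.2 * 0.1) = 0.41, well above P(b), while the new one gives
0.8 * 0.1 + 0.2 * 0.05 = 0.09.  The function names are hypothetical:

```c
#include <assert.h>

static double
min2(double x, double y)
{
    return x < y ? x : y;
}

/* Former formula: P(a,b) = P(a) * [f + (1-f)*P(b)] */
static double
dep_sel_old(double pa, double pb, double f)
{
    return pa * (f + (1.0 - f) * pb);
}

/* New formula: P(a,b) = f * Min(P(a), P(b)) + (1-f) * P(a) * P(b) */
static double
dep_sel_new(double pa, double pb, double f)
{
    return f * min2(pa, pb) + (1.0 - f) * pa * pb;
}
```

With f = 1 the new formula reduces to Min(P(a), P(b)), and with f = 0 to
the independence estimate P(a) * P(b).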

Additionally, this refactors the way that functional dependencies are
applied, so now dependencies_clauselist_selectivity() estimates both
the implying clauses and the implied clauses for each functional
dependency (formerly only the implied clauses were estimated), and now
all clauses for each attribute are taken into account (formerly only
one clause for each implied attribute was estimated). This removes the
previously built-in assumption that only equality clauses will be
seen, which is no longer true, and opens up the possibility of
applying functional dependencies to more general clauses.

Patch by me, reviewed by Tomas Vondra.

Discussion: https://postgr.es/m/CAEZATCXaNFZyOhR4XXAfkvj1tibRBEjje6ZbXwqWUB_tqbH%3Drw%40mail.gmail.com
Discussion: https://postgr.es/m/20200318002946.6dvblukm3cfmgir2%40development
2020-03-28 12:48:34 +00:00
Peter Eisentraut 145cb16d3b Cleanup in SQL features files
Feature C011 was still listed in sql_feature_packages.txt but had been
removed from sql_features.txt, so also remove from the former.
2020-03-28 08:46:18 +01:00
David Rowley b07642dbcd Trigger autovacuum based on number of INSERTs
Traditionally autovacuum has only ever invoked a worker based on the
estimated number of dead tuples in a table and for anti-wraparound
purposes. For the latter, with certain classes of tables such as
insert-only tables, anti-wraparound vacuums could be the first vacuum that
the table ever receives. This could often lead to autovacuum workers being
busy for extended periods of time due to having to potentially freeze
every page in the table. This could be particularly bad for very large
tables. New clusters, or recently pg_restored clusters could suffer even
more as many large tables may have the same relfrozenxid, which could
result in large numbers of tables requiring an anti-wraparound vacuum all
at once.

Here we aim to reduce the work required by anti-wraparound and aggressive
vacuums in general, by triggering autovacuum when the table has received
enough INSERTs. This is controlled by adding two new GUCs and reloptions:
autovacuum_vacuum_insert_threshold and
autovacuum_vacuum_insert_scale_factor. These work exactly the same as the
existing scale factor and threshold controls, only base themselves off the
number of inserts since the last vacuum, rather than the number of dead
tuples. New controls were added rather than reusing the existing
controls, to allow these new vacuums to be tuned independently and perhaps
even completely disabled altogether, which can be done by setting
autovacuum_vacuum_insert_threshold to -1.
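
A minimal sketch of the trigger condition, assuming it mirrors the
existing dead-tuple check (threshold + scale_factor * reltuples); the
function name is hypothetical.  With the defaults of 1000 and 0.2, a
table with 10000 tuples would be vacuumed after more than 3000 inserts:

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Vacuum once the inserts since the last vacuum exceed
 * insert_threshold + insert_scale_factor * reltuples.  A negative
 * threshold (-1) disables insert-triggered vacuums entirely.
 */
static bool
needs_insert_vacuum(double reltuples, double inserts_since_vacuum,
                    int insert_threshold, double insert_scale_factor)
{
    double      limit;

    if (insert_threshold < 0)
        return false;           /* insert-triggered vacuums disabled */

    limit = (double) insert_threshold + insert_scale_factor * reltuples;
    return inserts_since_vacuum > limit;
}
```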

We make no attempt to skip index cleanup operations on these vacuums as
they may trigger for an insert-mostly table which continually doesn't have
enough dead tuples to trigger an autovacuum for the purpose of removing
those dead tuples. If we were to skip cleaning the indexes in this case,
then it is possible for the index(es) to become bloated over time.

There are additional benefits to triggering autovacuums based on inserts,
as tables which never contain enough dead tuples to trigger an autovacuum
are now more likely to receive a vacuum, which can mark more of the table
as "all-visible" and encourage the query planner to make use of Index Only
Scans.

Currently, we still obey vacuum_freeze_min_age when triggering these new
autovacuums based on INSERTs. For large insert-only tables, it may be
beneficial to lower the table's autovacuum_freeze_min_age so that tuples
are eligible to be frozen sooner. Here we've opted not to zero that for
these types of vacuums, since the table may just be insert-mostly and we
may otherwise freeze tuples that are still destined to be updated or
removed in the near future.

There was some debate as to what exactly the new scale factor and threshold
should default to. For now, these are set to 0.2 and 1000, respectively.
There may be some motivation to adjust these before the release.

Author: Laurenz Albe, Darafei Praliaskouski
Reviewed-by: Alvaro Herrera, Masahiko Sawada, Chris Travers, Andres Freund, Justin Pryzby
Discussion: https://postgr.es/m/CAC8Q8t%2Bj36G_bLF%3D%2B0iMo6jGNWnLnWb1tujXuJr-%2Bx8ZCCTqoQ%40mail.gmail.com
2020-03-28 19:20:12 +13:00
Peter Geoghegan 9945ad6e90 Justify nbtree page split locking in code comment.
Delaying unlocking the right child page until after the point that the
left child's parent page has been refound is no longer truly necessary.
Commit 40dae7ec made nbtree tolerant of interrupted page splits.  VACUUM
was taught to avoid deleting a page that happens to be the right half of
an incomplete split.  As long as page splits don't unlock the left child
page until the end of the second/final phase, it should be safe to
unlock the right child page earlier (at the end of the first phase).

It probably isn't actually useful to release the right child's lock
earlier like this (it probably won't improve performance).  Even still,
pointing out that it ought to be safe to do so should make it easier to
understand the overall design.
2020-03-27 16:44:52 -07:00
Alvaro Herrera 1e6148032e
Allow walreceiver configuration to change on reload
The parameters primary_conninfo, primary_slot_name and
wal_receiver_create_temp_slot can now be changed with a simple "reload"
signal, no longer requiring a server restart.  This is achieved by
signalling the walreceiver process to terminate and having it start
again with the new values.

Thanks to Andres Freund, Kyotaro Horiguchi, Fujii Masao for discussion.

Author: Sergei Kornilov <sk@zsrv.org>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Álvaro Herrera <alvherre@alvh.no-ip.org>
Discussion: https://postgr.es/m/19513901543181143@sas1-19a94364928d.qloud-c.yandex.net
2020-03-27 19:51:37 -03:00
Alvaro Herrera 092c6936de
Set wal_receiver_create_temp_slot PGC_POSTMASTER
Commit 3297308278 gave walreceiver the ability to create and use a
temporary replication slot, and made it controllable by a GUC (enabled
by default) that can be changed with SIGHUP.  That's useful but has two
problems: one, it's possible to cause the origin server to fill its disk
if the slot doesn't advance in time; and also there's a disconnect
between state passed down via the startup process and GUCs that
walreceiver reads directly.

We handle the first problem by setting the option to disabled by
default.  If the user enables it, it's on them to make sure that
disk doesn't fill up.

We handle the second problem by passing the flag via startup rather than
having walreceiver acquire it directly, and making it PGC_POSTMASTER
(which ensures a walreceiver always has the fresh value).  A future
commit can relax this (to PGC_SIGHUP again) by having the startup
process signal walreceiver to shutdown whenever the value changes.

Author: Sergei Kornilov <sk@zsrv.org>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Álvaro Herrera <alvherre@alvh.no-ip.org>
Discussion: https://postgr.es/m/20200122055510.GH174860@paquier.xyz
2020-03-27 16:20:33 -03:00
Tom Lane fbc7a71608 Rearrange validity checks for plpgsql "simple" expressions.
Buildfarm experience shows what probably should've occurred to me before:
if a cache flush occurs partway through building a generic plan, then
the plansource may have is_valid = false even though the plan is valid.
We need to accept this case, use the generated plan, and then try to
replan the next time.  We can't try to replan immediately, because that
would produce an infinite loop in CLOBBER_CACHE_ALWAYS builds; moreover
it's really overkill.  (We can assume that the plan is valid, it's just
possibly a bit stale.  Note that the pre-existing code behaved this way,
and the non-simple-expression code paths do too.)  Conversely, not using
the generated plan would drop us into the not-a-simple-expression code
path, which is bad for performance and would also cause regression-test
failures due to visibly different error-reporting behavior.

Hence, refactor the validity-check functions so that the initial check
and recheck cases can react differently to plansource->is_valid.
This makes their usage a bit simpler, too.
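
A minimal model of the two checks, assuming a plan source reduced to an
`is_valid` flag and a version number.  `PlanSourceSketch` and
`get_plan_sketch()` are hypothetical names, not plpgsql's, and a cache
flush arriving during planning is simulated by a parameter.  The initial
check accepts a freshly built plan even with is_valid = false, while a
later recheck treats the same flag as a reason to replan.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a cached plan source. */
typedef struct PlanSourceSketch
{
    bool is_valid;      /* cleared when a cache flush invalidates the plan */
    int  plan_version;  /* 0 = no plan built yet */
} PlanSourceSketch;

static int builds = 0;

/* Return the plan to use for one evaluation.  flush_during_build
 * simulates a cache flush arriving while the plan is being built. */
static int
get_plan_sketch(PlanSourceSketch *ps, bool flush_during_build)
{
    /* Recheck of an existing plan: here is_valid == false means "replan". */
    if (ps->plan_version != 0 && ps->is_valid)
        return ps->plan_version;

    /* No plan yet, or the recheck failed: build a new one. */
    ps->plan_version = ++builds;
    ps->is_valid = !flush_during_build;

    /* Initial check on the fresh plan: deliberately ignores is_valid.
     * Replanning right away could loop forever if every build sees a
     * flush (as in CLOBBER_CACHE_ALWAYS builds); the plan is usable, at
     * worst a bit stale, and the cleared flag makes the next call replan. */
    return ps->plan_version;
}
```

Because the fresh plan is returned rather than rebuilt on the spot, each
call builds at most one plan, which is what keeps the model from looping.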

Discussion: https://postgr.es/m/7072.1585332104@sss.pgh.pa.us
2020-03-27 14:47:34 -04:00
Peter Eisentraut 8d1b9648c5 Update SQL features
Change F311 to supported.  This was already accomplished when
subfeature F311-04 (WITH CHECK OPTION) was added, but the top-level
feature wasn't updated at the time.
2020-03-27 08:36:08 +01:00
Tom Lane 8f59f6b9c0 Improve performance of "simple expressions" in PL/pgSQL.
For relatively simple expressions (say, "x + 1" or "x > 0"), plpgsql's
management overhead exceeds the cost of evaluating the expression.
This patch substantially improves that situation, providing roughly
2X speedup for such trivial expressions.

First, add infrastructure in the plancache to allow fast re-validation
of cached plans that contain no table access, and hence need no locks.
Teach plpgsql to use this infrastructure for expressions that it's
already deemed "simple" (which in particular will never contain table
references).

The fast path still requires checking that search_path hasn't changed,
so provide a fast path for OverrideSearchPathMatchesCurrent by
counting changes that have occurred to the active search path in the
current session.  This is simplistic but seems enough for now, seeing
that PushOverrideSearchPath is not used in any performance-critical
cases.
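
The change-counting idea can be sketched as a generation counter: every
change to the active search path bumps a session-wide counter, and a
cached expression remembers the value it was validated against, so the
fast-path check is a single integer comparison instead of a comparison
of schema lists.  All names below (`search_path_generation`,
`set_search_path_sketch`, `CachedExprSketch`) are hypothetical and do
not mirror the actual namespace.c code; counter overflow is ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical session state: the active search path plus a counter that
 * is incremented on every change to it. */
static char     active_search_path[256] = "public";
static uint64_t search_path_generation = 1;

/* Hypothetical setter: any change to the path bumps the generation. */
static void
set_search_path_sketch(const char *path)
{
    strncpy(active_search_path, path, sizeof(active_search_path) - 1);
    active_search_path[sizeof(active_search_path) - 1] = '\0';
    search_path_generation++;
}

/* Hypothetical cached expression: remembers the generation it was
 * validated against. */
typedef struct CachedExprSketch
{
    uint64_t validated_generation;
} CachedExprSketch;

/* Called after (re)validating the expression. */
static void
mark_validated(CachedExprSketch *expr)
{
    expr->validated_generation = search_path_generation;
}

/* Fast path: true if no search-path change happened since validation.
 * False only means "do the full check", never "definitely different". */
static bool
search_path_unchanged(const CachedExprSketch *expr)
{
    return expr->validated_generation == search_path_generation;
}
```

The counter is deliberately conservative: setting the path to its
current value still bumps it, which merely sends the next check down the
slow path rather than producing a wrong answer.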

Second, manage the refcounts on simple expressions' cached plans using
a transaction-lifespan resource owner, so that we only need to take
and release an expression's refcount once per transaction, not once per
expression evaluation.  The management of this resource owner exactly
parallels the existing management of plpgsql's simple-expression EState.
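
A minimal sketch of the once-per-transaction refcounting, assuming a
plan reduced to a counter and a fixed-size array standing in for the
transaction-lifespan resource owner.  `PlanSketch`, `SimpleExprSketch`,
`eval_simple_expr_sketch()` and `end_xact_sketch()` are hypothetical
names; the real code uses PostgreSQL's ResourceOwner machinery.

```c
#include <assert.h>

/* Hypothetical cached plan guarded by a reference count. */
typedef struct PlanSketch
{
    int refcount;
} PlanSketch;

/* Hypothetical simple expression; held_in_xact records which transaction
 * (if any) already holds a reference on the plan. */
typedef struct SimpleExprSketch
{
    PlanSketch *plan;
    unsigned    held_in_xact;   /* 0 means "no reference held" */
} SimpleExprSketch;

/* Hypothetical transaction-lifespan owner: the expressions that took a
 * plan reference during the current transaction. */
#define MAX_TRACKED 16
static SimpleExprSketch *xact_owned[MAX_TRACKED];
static int      xact_owned_count = 0;
static unsigned current_xact = 1;

static void
eval_simple_expr_sketch(SimpleExprSketch *expr)
{
    if (expr->held_in_xact != current_xact)
    {
        /* First evaluation in this transaction: take one reference and
         * record it with the transaction-lifespan owner. */
        assert(xact_owned_count < MAX_TRACKED);
        expr->plan->refcount++;
        expr->held_in_xact = current_xact;
        xact_owned[xact_owned_count++] = expr;
    }
    /* ... evaluate using expr->plan; no refcount traffic on this path ... */
}

/* End of transaction: release every reference the owner collected. */
static void
end_xact_sketch(void)
{
    for (int i = 0; i < xact_owned_count; i++)
    {
        xact_owned[i]->plan->refcount--;
        xact_owned[i]->held_in_xact = 0;
    }
    xact_owned_count = 0;
    current_xact++;
}
```

However many times the expression runs inside one transaction, the
refcount moves only twice: up on the first evaluation and down at
transaction end.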

Add some regression tests covering this area, in particular verifying
that expression caching doesn't break semantics for search_path changes.

Patch by me, but it owes something to previous work by Amit Langote,
who recognized that getting rid of plancache-related overhead would
be a useful thing to do here.  Also thanks to Andres Freund for review.

Discussion: https://postgr.es/m/CAFj8pRDRVfLdAxsWeVLzCAbkLFZhW549K+67tpOc-faC8uH8zw@mail.gmail.com
2020-03-26 18:58:57 -04:00
Tom Lane 86e5badd22 Ensure that plpgsql cleans up cleanly during parallel-worker exit.
plpgsql_xact_cb ought to treat events XACT_EVENT_PARALLEL_COMMIT and
XACT_EVENT_PARALLEL_ABORT like XACT_EVENT_COMMIT and XACT_EVENT_ABORT
respectively, since its goal is to do process-local cleanup.  This
oversight caused plpgsql's end-of-transaction cleanup to not get done
in parallel workers.  Since a parallel worker will exit just after the
transaction cleanup, the effects of this are limited.  I couldn't find
any case in the core code with user-visible effects, but perhaps there
are some in extensions.  In any case it's wrong, so let's fix it before
it bites us, not after.
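
The shape of the fix can be sketched as a callback whose switch groups
each parallel event with its ordinary counterpart.  The enum below is a
local stand-in that copies only a few XactEvent names from xact.h; the
counters and `xact_cb_sketch()` are hypothetical placeholders for
plpgsql's actual cleanup, not its code.

```c
#include <assert.h>
#include <stddef.h>

/* Local stand-in for some XactEvent values; the real enum in
 * PostgreSQL's xact.h has more members. */
typedef enum
{
    XACT_EVENT_COMMIT,
    XACT_EVENT_PARALLEL_COMMIT,
    XACT_EVENT_ABORT,
    XACT_EVENT_PARALLEL_ABORT,
    XACT_EVENT_PRE_COMMIT
} XactEvent;

/* Hypothetical counters standing in for process-local cleanup work. */
static int commit_cleanups = 0;
static int abort_cleanups = 0;

/* Process-local cleanup must also run in parallel workers, so each
 * parallel event takes the same branch as its non-parallel twin. */
static void
xact_cb_sketch(XactEvent event, void *arg)
{
    (void) arg;
    switch (event)
    {
        case XACT_EVENT_COMMIT:
        case XACT_EVENT_PARALLEL_COMMIT:
            commit_cleanups++;
            break;
        case XACT_EVENT_ABORT:
        case XACT_EVENT_PARALLEL_ABORT:
            abort_cleanups++;
            break;
        default:
            /* other events need no process-local cleanup in this sketch */
            break;
    }
}
```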

In passing, add some comments around the handling of expression
evaluation resources in DO blocks.  There's no live bug there, but it's
quite unobvious what's happening; at least I thought so.  This isn't
related to the other issue, except that I found both things while poking
at expression-evaluation performance.

Back-patch the plpgsql_xact_cb fix to 9.5 where those event types
were introduced, and the DO-block commentary to v11 where DO blocks
gained the ability to issue COMMIT/ROLLBACK.

Discussion: https://postgr.es/m/10353.1585247879@sss.pgh.pa.us
2020-03-26 18:06:55 -04:00
Magnus Hagander eff5b245df Document that pg_checksums exists in checksums README
Author: Daniel Gustafsson <daniel@yesql.se>
2020-03-26 15:05:54 +01:00
Peter Eisentraut 49bf81536e Drop slot's LWLock before returning from SaveSlotToPath()
When SaveSlotToPath() is called with elevel=LOG, the early exits didn't
release the slot's io_in_progress_lock.

This could result in a walsender being stuck on the lock forever.  A
possible way to get into this situation is if the offending code paths
are triggered in a low disk space situation.

Author: Pavan Deolasee <pavan.deolasee@2ndquadrant.com>
Reported-by: Craig Ringer <craig@2ndquadrant.com>
Discussion: https://www.postgresql.org/message-id/flat/56a138c5-de61-f553-7e8f-6789296de785%402ndquadrant.com
2020-03-26 13:29:20 +01:00
Tom Lane 958aa438aa Further fixes for ssl_passphrase_callback test module.
The Makefile should set TAP_TESTS = 1, not implement the infrastructure
for itself.  For one thing, it missed the appropriate "make clean"
steps.  For another, the buildfarm isn't running this test because
it wasn't hooked into "make installcheck" either.
2020-03-25 22:05:27 -04:00
Andrew Dunstan e984fb341f Don't listen to localhost in ssl_passphrase_callback test
Commit 896fcdb230 contained an unnecessary setting that listened to
localhost. Since the test doesn't actually try to make an SSL connection
to the database this isn't required. Moreover, it's a security hole.

Per gripe from Tom Lane.
2020-03-25 21:14:14 -04:00
Tom Lane 13c98bdfc4 Fix assorted portability issues in commit 896fcdb23.
Some platforms require libssl to be linked explicitly in the new
SSL test module.  Borrow contrib/sslinfo's code for that.

Since src/test/modules/Makefile now has a variable SUBDIRS list,
it needs to follow the ALWAYS_SUBDIRS protocol for that (cf.
comments in Makefile.global.in).

Blindly try to fix MSVC build failures by adding PGDLLIMPORT.
2020-03-25 19:37:30 -04:00
Andrew Dunstan 896fcdb230 Provide a TLS init hook
The default hook function sets the default password callback function.
In order to allow preloaded libraries to have an opportunity to override
the default, TLS initialization is now delayed slightly until after
shared preloaded libraries have been loaded.
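
The hook pattern can be sketched without OpenSSL: a global function
pointer starts out pointing at a default that installs the default
passphrase callback, a preloaded library may overwrite the pointer, and
TLS initialization calls through it only afterwards.  Every name here
(`FakeSslContext`, `tls_init_hook`, `my_tls_init`, and so on) is
hypothetical, and the passphrase callback is simplified to two arguments
rather than OpenSSL's real four-argument signature.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in for an SSL context: only the passphrase callback. */
typedef int (*passphrase_cb)(char *buf, int size);
typedef struct FakeSslContext { passphrase_cb passwd_cb; } FakeSslContext;

/* Hypothetical hook type. */
typedef void (*tls_init_hook_type)(FakeSslContext *ctx, bool is_server_start);

static int
default_passphrase_cb(char *buf, int size)
{
    (void) buf; (void) size;
    return 0;                       /* no passphrase supplied */
}

static void
default_tls_init(FakeSslContext *ctx, bool is_server_start)
{
    (void) is_server_start;
    ctx->passwd_cb = default_passphrase_cb;
}

/* The server initializes the hook to the default ... */
static tls_init_hook_type tls_init_hook = default_tls_init;

/* ... and runs TLS setup only after preloaded libraries had their chance. */
static void
tls_init_sketch(FakeSslContext *ctx, bool is_server_start)
{
    tls_init_hook(ctx, is_server_start);
}

/* What a hypothetical preloaded library might install. */
static int
my_passphrase_cb(char *buf, int size)
{
    const char *secret = "decoded-secret";  /* e.g. de-obfuscated here */
    int len = (int) strlen(secret);

    if (len >= size)
        return 0;
    memcpy(buf, secret, (size_t) len + 1);
    return len;
}

static void
my_tls_init(FakeSslContext *ctx, bool is_server_start)
{
    (void) is_server_start;
    ctx->passwd_cb = my_passphrase_cb;
}

static void
library_load_sketch(void)
{
    tls_init_hook = my_tls_init;
}
```

The ordering is the point: if `tls_init_sketch()` ran before
`library_load_sketch()`, the context would keep the default callback.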

A test module is provided which contains a trivial example that decodes
an obfuscated password for an SSL certificate.

Author: Andrew Dunstan
Reviewed By: Andreas Karlsson, Asaba Takanori
Discussion: https://postgr.es/m/04116472-818b-5859-1d74-3d995aab2252@2ndQuadrant.com
2020-03-25 17:13:17 -04:00
Alvaro Herrera ffd398021c
pg_dump new test: Change order of arguments
Some getopt_long implementations don't like to have a non-option
argument before option arguments, so put the database name as the
last argument.

Per buildfarm member hoverfly.
2020-03-25 15:15:32 -03:00
Alvaro Herrera 2f9eb31320
pg_dump: Allow dumping data of specific foreign servers
The new command-line switch --include-foreign-data=PATTERN lets the user
specify foreign servers from which to dump foreign table data.  This can
be refined by further inclusion/exclusion switches, so that the user has
full control over which tables to dump.

A limitation is that this doesn't work in combination with parallel
dumps, for implementation reasons.  This might be lifted in the future,
but requires shuffling some code around.

Author: Luis Carril <luis.carril@swarm64.com>
Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Reviewed-by: Surafel Temesgen <surafel3000@gmail.com>
Reviewed-by: vignesh C <vignesh21@gmail.com>
Reviewed-by: Álvaro Herrera <alvherre@2ndQuadrant.com>
Discussion: https://postgr.es/m/LEJPR01MB0185483C0079D2F651B16231E7FC0@LEJPR01MB0185.DEUPRD01.PROD.OUTLOOK.DE
2020-03-25 13:19:31 -03:00
Tom Lane bda6dedbea Go back to returning int from ereport auxiliary functions.
This reverts the parts of commit 17a28b0364
that changed ereport's auxiliary functions from returning dummy integer
values to returning void.  It turns out that a minority of compilers
complain (not entirely unreasonably) about constructs such as

	(condition) ? errdetail(...) : 0

if errdetail() returns void rather than int.  We could update those
call sites to say "(void) 0" perhaps, but the expectation for this
patch set was that ereport callers would not have to change anything.
And this aspect of the patch set was already the most invasive and
least compelling part of it, so let's just drop it.
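
A small illustration of why the dummy int return matters, using a
hypothetical `errdetail_sketch()` in place of the real errdetail().  The
C standard's constraints on `?:` allow both operands to be arithmetic or
both to be void, but not one of each, so the construct below is only
valid because the auxiliary function returns int.

```c
#include <assert.h>
#include <stdbool.h>

static int details_added = 0;

/* Hypothetical auxiliary function modelled on errdetail(): it records
 * something as a side effect and returns a dummy int. */
static int
errdetail_sketch(const char *msg)
{
    (void) msg;
    details_added++;
    return 0;               /* dummy value; callers ignore it */
}

static void
report_sketch(bool verbose)
{
    /* Both arms are int, so this is valid C.  Were errdetail_sketch()
     * declared void, the arms would be void and int, which the
     * conditional operator does not permit. */
    (void) (verbose ? errdetail_sketch("extra info") : 0);
}
```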

Per buildfarm.

Discussion: https://postgr.es/m/CA+fd4k6N8EjNvZpM8nme+y+05mz-SM8Z_BgkixzkA34R+ej0Kw@mail.gmail.com
2020-03-25 11:57:36 -04:00
Peter Eisentraut f5817595a7 Define EXEC_BACKEND in pg_config_manual.h
For unclear reasons, it was defined in a separate location, which makes
it more cumbersome to override for testing, and it also did not have
any prominent documentation.  Move to pg_config_manual.h, where
similar things are already collected.

The previous definition on the command-line had the effect of defining
it to the value 1, but now that we don't need that anymore we just
define it to empty, to simplify manual editing a bit.
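
Why the empty definition is harmless can be shown in a few lines,
assuming (as the message implies) that code only tests whether the
macro is defined.  The snippet defines EXEC_BACKEND locally for
illustration; it is not the contents of pg_config_manual.h.

```c
#include <assert.h>

/* Empty definition, as in a hand-edited header.  Compiling with
 * -DEXEC_BACKEND on the command line would instead define it as 1. */
#define EXEC_BACKEND

/* #ifdef and defined() only ask whether the macro exists, so they behave
 * the same for the empty form and for the value 1.  A value test such as
 * "#if EXEC_BACKEND" would become a compile error with the empty form. */
#ifdef EXEC_BACKEND
static const int exec_backend_ifdef = 1;
#else
static const int exec_backend_ifdef = 0;
#endif

#if defined(EXEC_BACKEND)
static const int exec_backend_defined = 1;
#else
static const int exec_backend_defined = 0;
#endif
```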

Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://www.postgresql.org/message-id/flat/b7053ba8-b008-5335-31de-2fe4fe41ef0f%402ndquadrant.com
2020-03-25 14:31:14 +01:00
Peter Eisentraut e8b1774fc2 Update SQL features
The name of E182 was changed in SQL:2011.

Also, we can change it to supported because all it requires is one
embedded language to be supported, which we do.
2020-03-25 08:46:41 +01:00
Thomas Munro 352f6f2df6 Add collation versions for Windows.
On Vista and later, use GetNLSVersionEx() to request collation version
information.

Reviewed-by: Juan José Santamaría Flecha <juanjo.santamaria@gmail.com>
Discussion: https://postgr.es/m/CA%2BhUKGJvqup3s%2BJowVTcacZADO6dOhfdBmvOPHLS3KXUJu41Jw%40mail.gmail.com
2020-03-25 16:04:32 +13:00
Thomas Munro 382a821907 Allow NULL version for individual collations.
Remove the documented restriction that collation providers must either
return NULL for all collations or non-NULL for all collations.

Use NULL for glibc collations like "C.UTF-8", which might otherwise lead
future proposed commits to force unnecessary index rebuilds.

Reviewed-by: Peter Eisentraut <peter.eisentraut@2ndquadrant.com>
Discussion: https://postgr.es/m/CA%2BhUKGJvqup3s%2BJowVTcacZADO6dOhfdBmvOPHLS3KXUJu41Jw%40mail.gmail.com
2020-03-25 15:53:24 +13:00
Jeff Davis dd8e19132a Consider disk-based hash aggregation to implement DISTINCT.
Correct oversight in 1f39bce0. If enable_hashagg_disk=true, we should
consider hash aggregation for DISTINCT when applicable.
2020-03-24 18:30:04 -07:00
Jeff Davis 3649133b14 Avoid allocating unnecessary zero-sized array.
If there are no aggregates, there is no need to allocate an array of
zero AggStatePerGroupData elements.
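
A minimal sketch of the guard, assuming a malloc-based allocator and a
hypothetical `PerGroupData` standing in for AggStatePerGroupData (the
real executor code allocates with palloc).  With malloc, skipping the
call also avoids malloc(0), which may return either NULL or a unique
pointer.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical per-group state standing in for AggStatePerGroupData. */
typedef struct PerGroupData
{
    long transValue;
    int  isNull;
} PerGroupData;

/* Allocate per-group state only when there is at least one aggregate. */
static PerGroupData *
alloc_pergroup(int numaggs)
{
    if (numaggs == 0)
        return NULL;        /* nothing to allocate */
    return calloc((size_t) numaggs, sizeof(PerGroupData));
}
```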
2020-03-24 18:30:04 -07:00
Peter Geoghegan b150a76793 Fix nbtree deduplication README commentary.
Descriptions of some aspects of how deduplication works were unclear in
a couple of places.
2020-03-24 14:58:27 -07:00
Andres Freund 112b006fe7 logical decoding: Remove TODO about unnecessary optimization.
Measurements show, and intuition agrees, that there are currently no
known cases where adding a fastpath to avoid allocating / ordering a
heap for a single transaction is worthwhile.

Author: Dilip Kumar
Discussion: https://postgr.es/m/CAFiTN-sp701wvzvnLQJGk7JDqrFM8f--97-ihbwkU8qvn=p8nw@mail.gmail.com
2020-03-24 12:15:03 -07:00
Peter Eisentraut f15ace7935 Fix compiler warning on Cygwin
bf68b79e50 introduced a compiler warning about an unused variable
on Cygwin.
2020-03-24 19:31:02 +01:00
Tom Lane 17a28b0364 Improve the internal implementation of ereport().
Change all the auxiliary error-reporting routines to return void,
now that we no longer need to pretend they are passing something
useful to errfinish().  While this probably doesn't save anything
significant at the machine-code level, it allows detection of some
additional types of mistakes.

Pass the error location details (__FILE__, __LINE__, PG_FUNCNAME_MACRO)
to errfinish not errstart.  This shaves a few cycles off the case where
errstart decides we're not going to emit anything.

Re-implement elog() as a trivial wrapper around ereport(), removing
the separate support infrastructure it used to have.  Aside from
getting rid of some now-surplus code, this means that elog() now
really does have exactly the same semantics as ereport(), in particular
that it can skip evaluation work if the message is not to be emitted.
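
A heavily simplified model of elog() as a wrapper around ereport(),
assuming a single numeric level threshold.  The `_sketch` macros and
functions are hypothetical, not PostgreSQL's definitions.  The point is
that the arguments, including any expensive ones, are evaluated only
after errstart decides the message will be emitted, and the location
details go to errfinish on that same path.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical levels and threshold. */
#define DEBUG1 10
#define LOG    15
static int min_level = LOG;

static int  expensive_calls = 0;
static char last_message[128];

static bool errstart_sketch(int elevel) { return elevel >= min_level; }

static int
errmsg_sketch(const char *fmt, ...)
{
    va_list ap;

    va_start(ap, fmt);
    vsnprintf(last_message, sizeof last_message, fmt, ap);
    va_end(ap);
    return 0;
}

/* Location details are passed here, i.e. only when emitting. */
static void
errfinish_sketch(const char *file, int line)
{
    (void) file; (void) line;
}

#define ereport_sketch(elevel, ...) \
    do { \
        if (errstart_sketch(elevel)) \
            __VA_ARGS__, errfinish_sketch(__FILE__, __LINE__); \
    } while (0)

/* elog() as a trivial wrapper around ereport(). */
#define elog_sketch(elevel, ...) \
    ereport_sketch(elevel, errmsg_sketch(__VA_ARGS__))

static const char *
expensive_description(void)
{
    expensive_calls++;
    return "costly detail";
}
```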

Andres Freund and Tom Lane

Discussion: https://postgr.es/m/CA+fd4k6N8EjNvZpM8nme+y+05mz-SM8Z_BgkixzkA34R+ej0Kw@mail.gmail.com
2020-03-24 12:08:48 -04:00
Tom Lane e3a87b4991 Re-implement the ereport() macro using __VA_ARGS__.
Now that we require C99, we can depend on __VA_ARGS__ to work, and
revising ereport() to use it has several significant benefits:

* The extra parentheses around the auxiliary function calls are now
optional.  Aside from being a bit less ugly, this removes a common
gotcha for new contributors, because in some cases the compiler errors
you got from forgetting them were unintelligible.

* The auxiliary function calls are now evaluated as a comma expression
list rather than as extra arguments to errfinish().  This means that
compilers can be expected to warn about no-op expressions in the list,
allowing detection of several other common mistakes such as forgetting
to add errmsg(...) when converting an elog() call to ereport().

* Unlike the situation with extra function arguments, comma expressions
are guaranteed to be evaluated left-to-right, so this removes platform
dependency in the order of the auxiliary function calls.  While that
dependency hasn't caused us big problems in the past, this change does
allow dropping some rather shaky assumptions around errcontext() domain
handling.
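
The three points can be checked with a toy macro of the same shape,
assuming hypothetical `_sketch` helpers that append a tag to a log so
the call order is visible.  Both the old parenthesized style and the new
bare style expand to a valid comma expression, and the comma operator
sequences the calls left to right, whereas the evaluation order of
ordinary function arguments is unspecified.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

static char call_log[64];

/* Hypothetical auxiliary call: records its tag in call order. */
static int
aux_sketch(const char *tag)
{
    strncat(call_log, tag, sizeof call_log - strlen(call_log) - 1);
    return 0;
}

static bool errstart_sketch(int elevel) { return elevel > 0; }
static void errfinish_sketch(void) { aux_sketch("F"); }

/* The auxiliary calls form a comma expression evaluated before
 * errfinish_sketch(), instead of being extra arguments to it. */
#define ereport_sketch(elevel, ...) \
    do { \
        if (errstart_sketch(elevel)) \
            __VA_ARGS__, errfinish_sketch(); \
    } while (0)
```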

There's no intention to make wholesale changes of existing ereport
calls, but as proof-of-concept this patch removes the extra parens
from a couple of calls in postgres.c.

While new code can be written either way, code intended to be
back-patched will need to use extra parens for a while yet.  It seems
worth back-patching this change into v12, so as to reduce the window
where we have to be careful about that by one year.  Hence, this patch
is careful to preserve ABI compatibility; a followup HEAD-only patch
will make some additional simplifications.

Andres Freund and Tom Lane

Discussion: https://postgr.es/m/CA+fd4k6N8EjNvZpM8nme+y+05mz-SM8Z_BgkixzkA34R+ej0Kw@mail.gmail.com
2020-03-24 11:49:00 -04:00
Peter Eisentraut cef27ae01a Fix compiler warning
A variable was unused in non-assert builds.  Simplify the code to
avoid the issue.

Reported-by: Erik Rijkers <er@xs4all.nl>
2020-03-24 16:02:01 +01:00