Commit Graph

21735 Commits

Author SHA1 Message Date
Tom Lane 9bacdf9f53 Don't leak malloc'd error string in libpqrcv_check_conninfo().
We leaked the error report from PQconninfoParse, when there was
one.  It seems unlikely that real usage patterns would repeat
the failure often enough to create serious bloat, but let's
back-patch anyway to keep the code similar in all branches.

Found via valgrind testing.
Back-patch to v10 where this code was added.

Discussion: https://postgr.es/m/3816764.1616104288@sss.pgh.pa.us
2021-03-18 22:22:47 -04:00
Tom Lane 377b7a8300 Don't leak malloc'd strings when a GUC setting is rejected.
Because guc.c prefers to keep all its string values in malloc'd
not palloc'd storage, it has to be more careful than usual to
avoid leaks.  Error exits out of string GUC hook checks failed
to clear the proposed value string, and error exits out of
ProcessGUCArray() failed to clear the malloc'd results of
ParseLongOption().

Found via valgrind testing.
This problem is ancient, so back-patch to all supported branches.

Discussion: https://postgr.es/m/3816764.1616104288@sss.pgh.pa.us
2021-03-18 22:22:47 -04:00
Tom Lane d303849b05 Don't leak compiled regex(es) when an ispell cache entry is dropped.
The text search cache mechanisms assume that we can clean up
an invalidated dictionary cache entry simply by resetting the
associated long-lived memory context.  However, that does not work
for ispell affixes that make use of regular expressions, because
the regex library deals in plain old malloc.  Hence, we leaked
compiled regex(es) any time we dropped such a cache entry.  That
could quickly add up, since even a fairly trivial regex can use up
tens of kB, and a large one can eat megabytes.  Add a memory context
callback to ensure that a regex gets freed when its owning cache
entry is cleared.

Found via valgrind testing.
This problem is ancient, so back-patch to all supported branches.

Discussion: https://postgr.es/m/3816764.1616104288@sss.pgh.pa.us
2021-03-18 22:22:47 -04:00
Tom Lane 415ffdc220 Don't run RelationInitTableAccessMethod in a long-lived context.
Some code paths in this function perform syscache lookups, which
can lead to table accesses and possibly leakage of cruft into
the caller's context.  If said context is CacheMemoryContext,
we eventually will have visible bloat.  But fixing this is no
harder than moving one memory context switch step.  (The other
callers don't have a problem.)

Andres Freund and I independently found this via valgrind testing.
Back-patch to v12 where this code was added.

Discussion: https://postgr.es/m/20210317023101.anvejcfotwka6gaa@alap3.anarazel.de
Discussion: https://postgr.es/m/3816764.1616104288@sss.pgh.pa.us
2021-03-18 22:22:47 -04:00
Tom Lane 28644fac10 Don't leak rd_statlist when a relcache entry is dropped.
Although these lists are usually NIL, and even when not empty
are unlikely to be large, constant relcache update traffic could
eventually result in visible bloat of CacheMemoryContext.

Found via valgrind testing.
Back-patch to v10 where this field was added.

Discussion: https://postgr.es/m/3816764.1616104288@sss.pgh.pa.us
2021-03-18 22:22:47 -04:00
Tom Lane 1d581ce712 Fix misuse of foreach_delete_current().
Our coding convention requires this macro's result to be assigned
back to the original List variable.  In this usage, since the
List could not become empty, there was no actual bug --- but
some compilers warned about it.  Oversight in be45be9c3.

Discussion: https://postgr.es/m/35077b31-2d62-1e31-0e2e-ddb52d590b73@enterprisedb.com
2021-03-18 19:24:22 -04:00
Tomas Vondra be45be9c33 Implement GROUP BY DISTINCT
With grouping sets, it's possible that some of the grouping sets are
duplicate.  This is especially common with CUBE and ROLLUP clauses. For
example GROUP BY CUBE (a,b), CUBE (b,c) is equivalent to

  GROUP BY GROUPING SETS (
    (a, b, c),
    (a, b, c),
    (a, b, c),
    (a, b),
    (a, b),
    (a, b),
    (a),
    (a),
    (a),
    (c, a),
    (c, a),
    (c, a),
    (c),
    (b, c),
    (b),
    ()
  )

Some of the grouping sets are calculated multiple times, which is mostly
unnecessary.  This commit implements a new GROUP BY DISTINCT feature, as
defined in the SQL standard, which eliminates the duplicate sets.

Author: Vik Fearing
Reviewed-by: Erik Rijkers, Georgios Kokolatos, Tomas Vondra
Discussion: https://postgr.es/m/bf3805a8-d7d1-ae61-fece-761b7ff41ecc@postgresfriends.org
2021-03-18 18:22:18 +01:00
Tomas Vondra cd91de0d17 Remove temporary files after backend crash
After a crash of a backend using temporary files, the files used to be
left behind, on the basis that it might be useful for debugging. But we
don't have any reports of anyone actually doing that, and it means the
disk usage may grow over time due to repeated backend failures (possibly
even hitting ENOSPC). So this behavior is a bit unfortunate, and fixing
it required either manual cleanup (deleting files, which is error-prone)
or restart of the instance (i.e. service disruption).

This implements automatic cleanup of temporary files, controled by a new
GUC remove_temp_files_after_crash. By default the files are removed, but
it can be disabled to restore the old behavior if needed.

Author: Euler Taveira
Reviewed-by: Tomas Vondra, Michael Paquier, Anastasia Lubennikova, Thomas Munro
Discussion: https://postgr.es/m/CAH503wDKdYzyq7U-QJqGn%3DGm6XmoK%2B6_6xTJ-Yn5WSvoHLY1Ww%40mail.gmail.com
2021-03-18 17:38:28 +01:00
Magnus Hagander da18d829c2 Fix function name in error hint
pg_read_file() is the function that's in core, pg_file_read() is in
adminpack. But when using pg_file_read() in adminpack it calls the *C*
level function pg_read_file() in core, which probably threw the original
author off. But the error hint should be about the SQL function.

Reported-By: Sergei Kornilov
Backpatch-through: 11
Discussion: https://postgr.es/m/373021616060475@mail.yandex.ru
2021-03-18 11:22:20 +01:00
Amit Kapila c8f78b6161 Add a new GUC and a reloption to enable inserts in parallel-mode.
Commit 05c8482f7f added the implementation of parallel SELECT for
"INSERT INTO ... SELECT ..." which may incur non-negligible overhead in
the additional parallel-safety checks that it performs, even when, in the
end, those checks determine that parallelism can't be used. This is
normally only ever a problem in the case of when the target table has a
large number of partitions.

A new GUC option "enable_parallel_insert" is added, to allow insert in
parallel-mode. The default is on.

In addition to the GUC option, the user may want a mechanism to allow
inserts in parallel-mode with finer granularity at table level. The new
table option "parallel_insert_enabled" allows this. The default is true.

Author: "Hou, Zhijie"
Reviewed-by: Greg Nancarrow, Amit Langote, Takayuki Tsunakawa, Amit Kapila
Discussion: https://postgr.es/m/CAA4eK1K-cW7svLC2D7DHoGHxdAdg3P37BLgebqBOC2ZLc9a6QQ%40mail.gmail.com
Discussion: https://postgr.es/m/CAJcOf-cXnB5cnMKqWEp2E2z7Mvcd04iLVmV=qpFJrR3AcrTS3g@mail.gmail.com
2021-03-18 07:25:27 +05:30
Andres Freund 5f79580ad6 Fix memory lifetime issues of replication slot stats.
When accessing replication slot stats, introduced in 9868167500,
pgstat_read_statsfiles() reads the data into newly allocated
memory. Unfortunately the current memory context at that point is the
callers, leading to leaks and use-after-free dangers.

The fix is trivial, explicitly use pgStatLocalContext. There's some
potential for further improvements, but that's outside of the scope of
this bugfix.

No backpatch necessary, feature is only in HEAD.

Author: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/20210317230447.c7uc4g3vbs4wi32i@alap3.anarazel.de
2021-03-17 16:21:46 -07:00
Tom Lane 8620a7f6db Code review for server's handling of "tablespace map" files.
While looking at Robert Foggia's report, I noticed a passel of
other issues in the same area:

* The scheme for backslash-quoting newlines in pathnames is just
wrong; it will misbehave if the last ordinary character in a pathname
is a backslash.  I'm not sure why we're bothering to allow newlines
in tablespace paths, but if we're going to do it we should do it
without introducing other problems.  Hence, backslashes themselves
have to be backslashed too.

* The author hadn't read the sscanf man page very carefully, because
this code would drop any leading whitespace from the path.  (I doubt
that a tablespace path with leading whitespace could happen in
practice; but if we're bothering to allow newlines in the path, it
sure seems like leading whitespace is little less implausible.)  Using
sscanf for the task of finding the first space is overkill anyway.

* While I'm not 100% sure what the rationale for escaping both \r and
\n is, if the idea is to allow Windows newlines in the file then this
code failed, because it'd throw an error if it saw \r followed by \n.

* There's no cross-check for an incomplete final line in the map file,
which would be a likely apparent symptom of the improper-escaping
bug.

On the generation end, aside from the escaping issue we have:

* If needtblspcmapfile is true then do_pg_start_backup will pass back
escaped strings in tablespaceinfo->path values, which no caller wants
or is prepared to deal with.  I'm not sure if there's a live bug from
that, but it looks like there might be (given the dubious assumption
that anyone actually has newlines in their tablespace paths).

* It's not being very paranoid about the possibility of random stuff
in the pg_tblspc directory.  IMO we should ignore anything without an
OID-like name.

The escaping rule change doesn't seem back-patchable: it'll require
doubling of backslashes in the tablespace_map file, which is basically
a basebackup format change.  The odds of that causing trouble are
considerably more than the odds of the existing bug causing trouble.
The rest of this seems somewhat unlikely to cause problems too,
so no back-patch.
2021-03-17 16:18:46 -04:00
Tom Lane a50e4fd028 Prevent buffer overrun in read_tablespace_map().
Robert Foggia of Trustwave reported that read_tablespace_map()
fails to prevent an overrun of its on-stack input buffer.
Since the tablespace map file is presumed trustworthy, this does
not seem like an interesting security vulnerability, but still
we should fix it just in the name of robustness.

While here, document that pg_basebackup's --tablespace-mapping option
doesn't work with tar-format output, because it doesn't.  To make it
work, we'd have to modify the tablespace_map file within the tarball
sent by the server, which might be possible but I'm not volunteering.
(Less-painful solutions would require changing the basebackup protocol
so that the source server could adjust the map.  That's not very
appetizing either.)
2021-03-17 16:10:37 -04:00
Thomas Munro 7f7f25f15e Revert "Fix race in Parallel Hash Join batch cleanup."
This reverts commit 378802e371.
This reverts commit 3b8981b6e1.

Discussion: https://postgr.es/m/CA%2BhUKGJmcqAE3MZeDCLLXa62cWM0AJbKmp2JrJYaJ86bz36LFA%40mail.gmail.com
2021-03-18 01:10:55 +13:00
Michael Paquier 9fd2952cf4 Fix comment in indexing.c
578b229, that removed support for WITH OIDS, has changed
CatalogTupleInsert() to not return an Oid, but one comment was still
mentioning that.

Author: Vik Fearing
Discussion: https://postgr.es/m/fef01975-ed10-3601-7b9e-80ecef72d00b@postgresfriends.org
2021-03-17 18:07:00 +09:00
Peter Eisentraut e1ae40f381 Small error message improvement 2021-03-17 08:17:33 +01:00
Thomas Munro 378802e371 Update the names of Parallel Hash Join phases.
Commit 3048898e dropped -ING from some wait event names that correspond
to barrier phases.  Update the phases' names to match.

While we're here making cosmetic changes, also rename "DONE" to "FREE".
That pairs better with "ALLOCATE", and describes the activity that
actually happens in that phase (as we do for the other phases) rather
than describing a state.  The distinction is clearer after bugfix commit
3b8981b6 split the phase into two.  As for the growth barriers, rename
their "ALLOCATE" phase to "REALLOCATE", which is probably a better
description of what happens then.  Also improve the comments about
the phases a bit.

Discussion: https://postgr.es/m/CA%2BhUKG%2BMDpwF2Eo2LAvzd%3DpOh81wUTsrwU1uAwR-v6OGBB6%2B7g%40mail.gmail.com
2021-03-17 18:43:04 +13:00
Thomas Munro 3b8981b6e1 Fix race in Parallel Hash Join batch cleanup.
With very unlucky timing and parallel_leader_participation off, PHJ
could attempt to access per-batch state just as it was being freed.
There was code intended to prevent that by checking for a cleared
pointer, but it was buggy.

Fix, by introducing an extra barrier phase.  The new phase
PHJ_BUILD_RUNNING means that it's safe to access the per-batch state to
find a batch to help with, and PHJ_BUILD_DONE means that it is too late.
The last to detach will free the array of per-batch state as before, but
now it will also atomically advance the phase at the same time, so that
late attachers can avoid the hazard, without the data race.  This
mirrors the way per-batch hash tables are freed (see phases
PHJ_BATCH_PROBING and PHJ_BATCH_DONE).

Revealed by a one-off build farm failure, where BarrierAttach() failed a
sanity check assertion, because the memory had been clobbered by
dsa_free().

Back-patch to 11, where the code arrived.

Reported-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/20200929061142.GA29096%40paquier.xyz
2021-03-17 18:05:39 +13:00
Amit Kapila 6b67d72b60 Fix race condition in drop subscription's handling of tablesync slots.
Commit ce0fdbfe97 made tablesync slots permanent and allow Drop
Subscription to drop such slots. However, it is possible that before
tablesync worker could get the acknowledgment of slot creation, drop
subscription stops it and that can lead to a dangling slot on the
publisher. Prevent cancel/die interrupts while creating a slot in the
tablesync worker.

Reported-by: Thomas Munro as per buildfarm
Author: Amit Kapila
Reviewed-by: Vignesh C, Takamichi Osumi
Discussion: https://postgr.es/m/CA+hUKGJG9dWpw1cOQ2nzWU8PHjm=PTraB+KgE5648K9nTfwvxg@mail.gmail.com
2021-03-17 08:15:12 +05:30
Thomas Munro 9e7ccd9ef6 Enable parallelism in REFRESH MATERIALIZED VIEW.
Pass CURSOR_OPT_PARALLEL_OK to pg_plan_query() so that parallel plans
are considered when running the underlying SELECT query.  This wasn't
done in commit e9baa5e9, which did this for CREATE MATERIALIZED VIEW,
because it wasn't yet known to be safe.

Since REFRESH always inserts into a freshly created table before later
merging or swapping the data into place with separate operations, we can
enable such plans here too.

Author: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>
Reviewed-by: Hou, Zhijie <houzj.fnst@cn.fujitsu.com>
Reviewed-by: Luc Vlaming <luc@swarm64.com>
Reviewed-by: Thomas Munro <thomas.munro@gmail.com>
Discussion: https://postgr.es/m/CALj2ACXg-4hNKJC6nFnepRHYT4t5jJVstYvri%2BtKQHy7ydcr8A%40mail.gmail.com
2021-03-17 15:04:17 +13:00
Peter Geoghegan fbe4cb3bd4 Fix comment about promising tuples.
Oversight in commit d168b66682, which added bottom-up index deletion.
2021-03-16 13:38:52 -07:00
Tom Lane 4b12ab18c9 Avoid corner-case memory leak in SSL parameter processing.
After reading the root cert list from the ssl_ca_file, immediately
install it as client CA list of the new SSL context.  That gives the
SSL context ownership of the list, so that SSL_CTX_free will free it.
This avoids a permanent memory leak if we fail further down in
be_tls_init(), which could happen if bogus CRL data is offered.

The leak could only amount to something if the CRL parameters get
broken after server start (else we'd just quit) and then the server
is SIGHUP'd many times without fixing the CRL data.  That's rather
unlikely perhaps, but it seems worth fixing, if only because the
code is clearer this way.

While we're here, add some comments about the memory management
aspects of this logic.

Noted by Jelte Fennema and independently by Andres Freund.
Back-patch to v10; before commit de41869b6 it doesn't matter,
since we'd not re-execute this code during SIGHUP.

Discussion: https://postgr.es/m/16160-18367e56e9a28264@postgresql.org
2021-03-16 16:03:06 -04:00
Stephen Frost c6fc50cb40 Use pre-fetching for ANALYZE
When we have posix_fadvise() available, we can improve the performance
of an ANALYZE by quite a bit by using it to inform the kernel of the
blocks that we're going to be asking for.  Similar to bitmap index
scans, the number of buffers pre-fetched is based off of the
maintenance_io_concurrency setting (for the particular tablespace or,
if not set, globally, via get_tablespace_maintenance_io_concurrency()).

Reviewed-By: Heikki Linnakangas, Tomas Vondra
Discussion: https://www.postgresql.org/message-id/VI1PR0701MB69603A433348EDCF783C6ECBF6EF0%40VI1PR0701MB6960.eurprd07.prod.outlook.com
2021-03-16 14:46:48 -04:00
Stephen Frost 94d13d474d Improve logging of auto-vacuum and auto-analyze
When logging auto-vacuum and auto-analyze activity, include the I/O
timing if track_io_timing is enabled.  Also, for auto-analyze, add the
read rate and the dirty rate, similar to how that information has
historically been logged for auto-vacuum.

Stephen Frost and Jakub Wartak

Reviewed-By: Heikki Linnakangas, Tomas Vondra
Discussion: https://www.postgresql.org/message-id/VI1PR0701MB69603A433348EDCF783C6ECBF6EF0%40VI1PR0701MB6960.eurprd07.prod.outlook.com
2021-03-16 14:46:48 -04:00
Tom Lane 1ea396362b Improve logging of bad parameter values in BIND messages.
Since commit ba79cb5dc, values of bind parameters have been logged
during errors in extended query mode.  However, we only did that after
we'd collected and converted all the parameter values, thus failing to
offer any useful localization of invalid-parameter problems.  Add a
separate callback that's used during parameter collection, and have it
print the parameter number, along with the input string if text input
format is used.

Justin Pryzby and Tom Lane

Discussion: https://postgr.es/m/20210104170939.GH9712@telsasoft.com
Discussion: https://postgr.es/m/CANfkH5k-6nNt-4cSv1vPB80nq2BZCzhFVR5O4VznYbsX0wZmow@mail.gmail.com
2021-03-16 11:16:41 -04:00
Alvaro Herrera acb7e4eb6b
Implement pipeline mode in libpq
Pipeline mode in libpq lets an application avoid the Sync messages in
the FE/BE protocol that are implicit in the old libpq API after each
query.  The application can then insert Sync at its leisure with a new
libpq function PQpipelineSync.  This can lead to substantial reductions
in query latency.

Co-authored-by: Craig Ringer <craig.ringer@enterprisedb.com>
Co-authored-by: Matthieu Garrigues <matthieu.garrigues@gmail.com>
Co-authored-by: Álvaro Herrera <alvherre@alvh.no-ip.org>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Aya Iwata <iwata.aya@jp.fujitsu.com>
Reviewed-by: Daniel Vérité <daniel@manitou-mail.org>
Reviewed-by: David G. Johnston <david.g.johnston@gmail.com>
Reviewed-by: Justin Pryzby <pryzby@telsasoft.com>
Reviewed-by: Kirk Jamison <k.jamison@fujitsu.com>
Reviewed-by: Michael Paquier <michael.paquier@gmail.com>
Reviewed-by: Nikhil Sontakke <nikhils@2ndquadrant.com>
Reviewed-by: Vaishnavi Prabakaran <VaishnaviP@fast.au.fujitsu.com>
Reviewed-by: Zhihong Yu <zyu@yugabyte.com>

Discussion: https://postgr.es/m/CAMsr+YFUjJytRyV4J-16bEoiZyH=4nj+sQ7JP9ajwz=B4dMMZw@mail.gmail.com
Discussion: https://postgr.es/m/CAJkzx4T5E-2cQe3dtv2R78dYFvz+in8PY7A8MArvLhs_pg75gg@mail.gmail.com
2021-03-15 18:13:42 -03:00
Fujii Masao d75288fb27 Make archiver process an auxiliary process.
This commit changes WAL archiver process so that it's treated as
an auxiliary process and can use shared memory. This is an infrastructure
patch required for upcoming shared-memory based stats collector patch
series. These patch series basically need any processes including archiver
that can report the statistics to access to shared memory. Since this patch
itself is useful to simplify the code and when users monitor the status of
archiver, it's committed separately in advance.

This commit simplifies the code for WAL archiving. For example, previously
backends need to signal to archiver via postmaster when they notify
archiver that there are some WAL files to archive. On the other hand,
this commit removes that signal to postmaster and enables backends to
notify archier directly using shared latch.

Also, as the side of this change, the information about archiver process
becomes viewable at pg_stat_activity view.

Author: Kyotaro Horiguchi
Reviewed-by: Andres Freund, Álvaro Herrera, Julien Rouhaud, Tomas Vondra, Arthur Zakirov, Fujii Masao
Discussion: https://postgr.es/m/20180629.173418.190173462.horiguchi.kyotaro@lab.ntt.co.jp
2021-03-15 13:13:14 +09:00
Peter Geoghegan 0ea71c93a0 Notice that heap page has dead items during VACUUM.
Consistently set a flag variable that tracks whether the current heap
page has a dead item during lazy vacuum's heap scan.  We missed the
common case where there is an preexisting (or even a new) LP_DEAD heap
line pointer.

Also make it clear that the variable might be affected by an existing
line pointer, say from an earlier opportunistic pruning operation.  This
distinction is important because it's the main reason why we can't just
use the nearby tups_vacuumed variable instead.

No backpatch.  In theory failing to set the page level flag variable had
no consequences.  Currently it is only used to defensively check if a
page marked all visible has dead items, which should never happen anyway
(if it does then the table must be corrupt).

Author: Masahiko Sawada <sawada.mshk@gmail.com>
Diagnosed-By: Masahiko Sawada <sawada.mshk@gmail.com>
Discussion: https://postgr.es/m/CAD21AoAtZb4+HJT_8RoOXvu4HM-Zd4HKS3YSMCH6+-W=bDyh-w@mail.gmail.com
2021-03-14 18:05:57 -07:00
Amit Kapila c5be48f092 Improve FK trigger parallel-safety check added by 05c8482f7f.
Commit 05c8482f7f added special logic related to parallel-safety of FK
triggers. This is a bit of a hack and should have instead been done by
simply setting appropriate proparallel values on those trigger functions
themselves.

Suggested-by: Tom Lane
Author: Greg Nancarrow
Reviewed-by: Amit Kapila
Discussion: https://postgr.es/m/2309260.1615485644@sss.pgh.pa.us
2021-03-13 09:20:52 +05:30
Peter Geoghegan 02b5940dbe Consolidate nbtree VACUUM metapage routines.
Simplify _bt_vacuum_needs_cleanup() functions's signature (it only needs
a single 'rel' argument now), and move it next to its sibling function
in nbtpage.c.

I believe that _bt_vacuum_needs_cleanup() was originally located in
nbtree.c due to an include dependency issue.  That's no longer an issue.

Follow-up to commit 9f3665fb.
2021-03-12 13:11:47 -08:00
Tom Lane f52c5d6749 Forbid marking an identity column as nullable.
GENERATED ALWAYS AS IDENTITY implies NOT NULL, but the code failed
to complain if you overrode that with "GENERATED ALWAYS AS IDENTITY
NULL".  One might think the old behavior was a feature, but it was
inconsistent because the outcome varied depending on the order of
the clauses, so it seems to have been just an oversight.

Per bug #16913 from Pavel Boev.  Back-patch to v10 where identity
columns were introduced.

Vik Fearing (minor tweaks by me)

Discussion: https://postgr.es/m/16913-3b5198410f67d8c6@postgresql.org
2021-03-12 11:08:42 -05:00
Thomas Munro 1b88b8908e Specialize checkpointer sort functions.
When sorting a potentially large number of dirty buffers, the
checkpointer can benefit from a faster sort routine.  One reported
improvement on a large buffer pool system was 1.4s -> 0.6s.

Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/CA%2BhUKGJ2-eaDqAum5bxhpMNhvuJmRDZxB_Tow0n-gse%2BHG0Yig%40mail.gmail.com
2021-03-12 23:56:02 +13:00
Amit Kapila 519e4c9ee2 Fix size overflow in calculation introduced by commits d6ad34f3 and bea449c6.
Reported-by: Thomas Munro
Author: Takayuki Tsunakawa
Reviewed-by: Kyotaro Horiguchi
Discussion: https://postgr.es/m/CA+hUKG+oPoFizjABt=GXZWTEHx3oev5rAe2scjW2r6F1rguo5w@mail.gmail.com
2021-03-12 15:42:08 +05:30
Amit Kapila e2cda3c20a Fix use of relcache TriggerDesc field introduced by commit 05c8482f7f.
The commit added code which used a relcache TriggerDesc field across
another cache access, which it shouldn't because the relcache doesn't
guarantee it won't get moved.

Diagnosed-by: Tom Lane
Author: Greg Nancarrow
Reviewed-by: Hou Zhijie, Amit Kapila
Discussion: https://postgr.es/m/2309260.1615485644@sss.pgh.pa.us
2021-03-12 15:14:41 +05:30
Thomas Munro 57dcc2ef33 Poll postmaster less frequently in recovery.
Since commits 9f095299 and f98b8476 we don't poll the postmaster
pipe at all during crash recovery on Linux and FreeBSD, but on other
operating systems we were still doing it for every WAL record.  Do it
less frequently on operating systems where system calls are required, at
the cost of delaying exit a bit after postmaster death.  This avoids
expensive system calls reported to slow down CPU-bound recovery by as
much as 10-30%.

Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi>
Reviewed-by: Fujii Masao <masao.fujii@oss.nttdata.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/CA%2BhUKGK1607VmtrDUHQXrsooU%3Dap4g4R2yaoByWOOA3m8xevUQ%40mail.gmail.com
Discussion: https://postgr.es/m/7261eb39-0369-f2f4-1bb5-62f3b6083b5e@iki.fi
2021-03-12 19:45:42 +13:00
Thomas Munro de829ddf23 Add condition variable for walreceiver shutdown.
Use this new CV to wait for walreceiver shutdown without a sleep/poll
loop, while also benefiting from standard postmaster death handling.

Discussion: https://postgr.es/m/CA%2BhUKGK1607VmtrDUHQXrsooU%3Dap4g4R2yaoByWOOA3m8xevUQ%40mail.gmail.com
2021-03-12 19:45:42 +13:00
Thomas Munro 600f2f50b7 Add condition variable for recovery resume.
Replace a sleep loop with a CV, to get a fast reaction time when
recovery is resumed or the postmaster exits via standard infrastructure.
Unfortunately we still need to wake up every second to perform extra
polling during the recovery pause loop.

Discussion: https://postgr.es/m/CA%2BhUKGK1607VmtrDUHQXrsooU%3Dap4g4R2yaoByWOOA3m8xevUQ%40mail.gmail.com
2021-03-12 19:45:42 +13:00
Fujii Masao b82640df00 Send statistics collected during shutdown checkpoint to the stats collector.
When shutdown is requested, checkpointer performs checkpoint or
restartpoint, and updates the statistics, before it exits. But previously
checkpointer didn't send those statistics to the stats collector.

Shutdown checkpoint and restartpoint are treated as requested ones
instead of scheduled ones, so the number of them are counted in
pg_stat_bgwriter.checkpoints_req column.

Author: Masahiro Ikeda
Reviewed-by: Fujii Masao
Discussion: https://postgr.es/m/0509ad67b585a5b86a83d445dfa75392@oss.nttdata.com
2021-03-12 14:23:00 +09:00
Fujii Masao 33394ee6f2 Force to send remaining WAL stats to the stats collector at walwriter exit.
In walwriter's main loop, WAL stats message is only sent if enough time
has passed since last one was sent to reach PGSTAT_STAT_INTERVAL msecs.
This is necessary to avoid overloading to the stats collector. But this
can cause recent WAL stats to be unsent when walwriter exits.

To ensure that all the WAL stats are sent, this commit makes walwriter
force to send remaining WAL stats to the collector when it exits because
of shutdown request. Note that those remaining WAL stats can still be
unsent when walwriter exits with non-zero exit code (e.g., FATAL error).
This is OK because that walwriter exit leads to server crash and
subsequent recovery discards all the stats. So there is no need to send
remaining stats in that case.

Author: Masahiro Ikeda
Reviewed-by: Fujii Masao
Discussion: https://postgr.es/m/0509ad67b585a5b86a83d445dfa75392@oss.nttdata.com
2021-03-12 13:29:59 +09:00
Thomas Munro 43c6662496 Minor modernization for README.barrier.
Itanium is very uncommon and being discontinued.  ARM is everywhere.
Prefer ARM as an example of an architecture with weak memory ordering.
2021-03-12 15:36:16 +13:00
Peter Geoghegan 7bb97211a5 Save a few cycles during nbtree VACUUM.
Avoid calling RelationGetNumberOfBlocks() unnecessarily in the common
case where there are no deleted but not yet recycled pages to recycle
during a cleanup-only nbtree VACUUM operation.

Follow-up to commit e5d8a999, which (among other things) taught the
"skip full scan" nbtree VACUUM mechanism to only trigger a full index
scan when the absolute number of deleted pages in the index is
considered excessive.
2021-03-11 14:18:23 -08:00
Peter Geoghegan effdd3f3b6 Add back vacuum_cleanup_index_scale_factor parameter.
Commit 9f3665fb removed the vacuum_cleanup_index_scale_factor storage
parameter.  However, that creates dump/reload hazards when moving across
major versions.

Add back the vacuum_cleanup_index_scale_factor parameter (though not the
GUC of the same name) purely to avoid problems when using tools like
pg_upgrade.  The parameter remains disabled and undocumented.

No backpatch to Postgres 13, since vacuum_cleanup_index_scale_factor was
only disabled by REL_13_STABLE's version of master branch commit
9f3665fb in the first place -- the parameter already looks like this on
REL_13_STABLE.

Discussion: https://postgr.es/m/YEm/a3Ko3nKnBuVq@paquier.xyz
2021-03-11 12:42:46 -08:00
Robert Haas 32fd2b57d7 Be clear about whether a recovery pause has taken effect.
Previously, the code and documentation seem to have essentially
assumed than a call to pg_wal_replay_pause() would take place
immediately, but that's not the case, because we only check for a
pause in certain places. This means that a tool that uses this
function and then wants to do something else afterward that is
dependent on the pause having taken effect doesn't know how long it
needs to wait to be sure that no more WAL is going to be replayed.

To avoid that, add a new function pg_get_wal_replay_pause_state()
which returns either 'not paused', 'paused requested', or 'paused'.
After calling pg_wal_replay_pause() the status will immediate change
from 'not paused' to 'pause requested'; when the startup process
has noticed this, the status will change to 'pause'.  For backward
compatibility, pg_is_wal_replay_paused() still exists and returns
the same thing as before: true if a pause has been requested,
whether or not it has taken effect yet; and false if not.
The documentation is updated to clarify.

To improve the changes that a pause request is quickly confirmed
effective, adjust things so that WaitForWALToBecomeAvailable will
swiftly reach a call to recoveryPausesHere() when a pause request
is made.

Dilip Kumar, reviewed by Simon Riggs, Kyotaro Horiguchi, Yugo Nagata,
Masahiko Sawada, and Bharath Rupireddy.

Discussion: http://postgr.es/m/CAFiTN-vcLLWEm8Zr%3DYK83rgYrT9pbC8VJCfa1kY9vL3AUPfu6g%40mail.gmail.com
2021-03-11 15:07:03 -05:00
Peter Geoghegan 5f8727f5a6 VACUUM ANALYZE: Always update pg_class.reltuples.
vacuumlazy.c sometimes fails to update pg_class entries for each index
(to ensure that pg_class.reltuples is current), even though analyze.c
assumed that that must have happened during VACUUM ANALYZE.  There are
at least a couple of reasons for this.  For example, vacuumlazy.c could
fail to update pg_class when the index AM indicated that its statistics
are merely an estimate, per the contract for amvacuumcleanup() routines
established by commit e57345975c back in 2006.

Stop assuming that pg_class must have been updated with accurate
statistics within VACUUM ANALYZE -- update pg_class for indexes at the
same time as the table relation in all cases.  That way VACUUM ANALYZE
will never fail to keep pg_class.reltuples reasonably accurate.

The only downside of this approach (compared to the old approach) is
that it might inaccurately set pg_class.reltuples for indexes whose heap
relation ends up with the same inaccurate value anyway.  This doesn't
seem too bad.  We already consistently called vac_update_relstats() (to
update pg_class) for the heap/table relation twice during any VACUUM
ANALYZE -- once in vacuumlazy.c, and once in analyze.c.  We now make
sure that we call vac_update_relstats() at least once (though often
twice) for each index.

This is follow up work to commit 9f3665fb, which dealt with issues in
btvacuumcleanup().  Technically this fixes an unrelated issue, though.
btvacuumcleanup() no longer provides an accurate num_index_tuples value
following commit 9f3665fb (when there was no btbulkdelete() call during
the VACUUM operation in question), but hashvacuumcleanup() has worked in
the same way for many years now.

Author: Peter Geoghegan <pg@bowt.ie>
Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com>
Discussion: https://postgr.es/m/CAH2-WzknxdComjhqo4SUxVFk_Q1171GJO2ZgHZ1Y6pion6u8rA@mail.gmail.com
Backpatch: 13-, just like commit 9f3665fb.
2021-03-10 17:07:57 -08:00
Peter Geoghegan 9f3665fbfc Don't consider newly inserted tuples in nbtree VACUUM.
Remove the entire idea of "stale stats" within nbtree VACUUM (stop
caring about stats involving the number of inserted tuples).  Also
remove the vacuum_cleanup_index_scale_factor GUC/param on the master
branch (though just disable them on postgres 13).

The vacuum_cleanup_index_scale_factor/stats interface made the nbtree AM
partially responsible for deciding when pg_class.reltuples stats needed
to be updated.  This seems contrary to the spirit of the index AM API,
though -- it is not actually necessary for an index AM's bulk delete and
cleanup callbacks to provide accurate stats when it happens to be
inconvenient.  The core code owns that.  (Index AMs have the authority
to perform or not perform certain kinds of deferred cleanup based on
their own considerations, such as page deletion and recycling, but that
has little to do with pg_class.reltuples/num_index_tuples.)

This issue was fairly harmless until the introduction of the
autovacuum_vacuum_insert_threshold feature by commit b07642db, which had
an undesirable interaction with the vacuum_cleanup_index_scale_factor
mechanism: it made insert-driven autovacuums perform full index scans,
even though there is no real benefit to doing so.  This has been tied to
a regression with an append-only insert benchmark [1].

Also have remaining cases that perform a full scan of an index during a
cleanup-only nbtree VACUUM indicate that the final tuple count is only
an estimate.  This prevents vacuumlazy.c from setting the index's
pg_class.reltuples in those cases (it will now only update pg_class when
vacuumlazy.c had TIDs for nbtree to bulk delete).  This arguably fixes
an oversight in deduplication-related bugfix commit 48e12913.

[1] https://smalldatum.blogspot.com/2021/01/insert-benchmark-postgres-is-still.html

Author: Peter Geoghegan <pg@bowt.ie>
Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com>
Discussion: https://postgr.es/m/CAD21AoA4WHthN5uU6+WScZ7+J_RcEjmcuH94qcoUPuB42ShXzg@mail.gmail.com
Backpatch: 13-, where autovacuum_vacuum_insert_threshold was added.
2021-03-10 16:27:01 -08:00
Bruce Momjian 845ac7f847 C comments: improve description of GiST NSN and GistBuildLSN
GiST indexes are complex, so adding more details in the code might help
someone.

Discussion: https://postgr.es/m/20210302164021.GA364@momjian.us
2021-03-10 17:03:10 -05:00
Thomas Munro d87251048a Replace buffer I/O locks with condition variables.
1.  Backends waiting for buffer I/O are now interruptible.

2.  If something goes wrong in a backend that is currently performing
I/O, waiting backends no longer wake up until that backend reaches
AbortBufferIO() and broadcasts on the CV.  Previously, any waiters would
wake up (because the I/O lock was automatically released) and then
busy-loop until AbortBufferIO() cleared BM_IO_IN_PROGRESS.

3.  LWLockMinimallyPadded is removed, as it would now be unused.

Author: Robert Haas <robertmhaas@gmail.com>
Reviewed-by: Thomas Munro <thomas.munro@gmail.com>
Reviewed-by: Julien Rouhaud <rjuju123@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> (earlier version, 2016)
Discussion: https://postgr.es/m/CA%2BhUKGJ8nBFrjLuCTuqKN0pd2PQOwj9b_jnsiGFFMDvUxahj_A%40mail.gmail.com
Discussion: https://postgr.es/m/CA+Tgmoaj2aPti0yho7FeEf2qt-JgQPRWb0gci_o1Hfr=C56Xng@mail.gmail.com
2021-03-11 10:36:17 +13:00
Tom Lane c3ffe34863 Avoid creating duplicate cached plans for inherited FK constraints.
When a foreign key constraint is applied to a partitioned table, each
leaf partition inherits a similar FK constraint.  We were processing all
of those constraints independently, meaning that in large partitioning
trees we'd build up large collections of cached FK-checking query plans.
However, in all cases but one, the generated queries are actually
identical for all members of the inheritance tree (because, in most
cases, the query only mentions the topmost table of the other side of
the FK relationship).  So we can share a single cached plan among all
the partitions, saving memory, not to mention time to build and maintain
the cached plans.

Keisuke Kuroda and Amit Langote

Discussion: https://postgr.es/m/cab4b85d-9292-967d-adf2-be0d803c3e23@nttcom.co.jp_1
2021-03-10 14:22:31 -05:00
Peter Eisentraut bbaf315309 Add bound check before bsearch() for performance
In the current lazy vacuum implementation, some index AMs such as
btree indexes call lazy_tid_reaped() for each index tuple during
ambulkdelete to check if the index tuple points to the (collected)
garbage tuple.  In that function, we simply call bsearch(), but we
should be able to know the result without bsearch() if the index tuple
points to the heap tuple that is out of range of the collected garbage
tuples.  Therefore, add a simple bound check before resorting to
bsearch().  Testing has shown that this can give significant
performance benefits.

Author: Masahiko Sawada <masahiko.sawada@2ndquadrant.com>
Discussion: https://www.postgresql.org/message-id/flat/CA+fd4k76j8jKzJzcx8UqEugvayaMSnQz0iLUt_XgBp-_-bd22A@mail.gmail.com
2021-03-10 15:19:37 +01:00
Peter Eisentraut 1657b37d7c Small debug message tweak
This makes the wording of the delete case match the update case.
2021-03-10 08:16:38 +01:00
Amit Kapila e4e87a32cc Fix valgrind issue in commit 05c8482f7f.
Initialize other newly added variables in max_parallel_hazard_context via
is_parallel_safe() because we don't check the parallel-safety of target
relations in that function.

Reported-by: Tom Lane as per buildfarm
Author: Amit Kapila
Discussion: https://postgr.es/m/2060179.1615347455@sss.pgh.pa.us
2021-03-10 10:06:39 +05:30
Amit Kapila 05c8482f7f Enable parallel SELECT for "INSERT INTO ... SELECT ...".
Parallel SELECT can't be utilized for INSERT in the following cases:
- INSERT statement uses the ON CONFLICT DO UPDATE clause
- Target table has a parallel-unsafe: trigger, index expression or
  predicate, column default expression or check constraint
- Target table has a parallel-unsafe domain constraint on any column
- Target table is a partitioned table with a parallel-unsafe partition key
  expression or support function

The planner is updated to perform additional parallel-safety checks for
the cases listed above, for determining whether it is safe to run INSERT
in parallel-mode with an underlying parallel SELECT. The planner will
consider using parallel SELECT for "INSERT INTO ... SELECT ...", provided
nothing unsafe is found from the additional parallel-safety checks, or
from the existing parallel-safety checks for SELECT.

While checking parallel-safety, we need to check it for all the partitions
on the table which can be costly especially when we decide not to use a
parallel plan. So, in a separate patch, we will introduce a GUC and or a
reloption to enable/disable parallelism for Insert statements.

Prior to entering parallel-mode for the execution of INSERT with parallel
SELECT, a TransactionId is acquired and assigned to the current
transaction state. This is necessary to prevent the INSERT from attempting
to assign the TransactionId whilst in parallel-mode, which is not allowed.
This approach has a disadvantage in that if the underlying SELECT does not
return any rows, then the TransactionId is not used, however that
shouldn't happen in practice in many cases.

Author: Greg Nancarrow, Amit Langote, Amit Kapila
Reviewed-by: Amit Langote, Hou Zhijie, Takayuki Tsunakawa, Antonin Houska, Bharath Rupireddy, Dilip Kumar, Vignesh C, Zhihong Yu, Amit Kapila
Tested-by: Tang, Haiying
Discussion: https://postgr.es/m/CAJcOf-cXnB5cnMKqWEp2E2z7Mvcd04iLVmV=qpFJrR3AcrTS3g@mail.gmail.com
Discussion: https://postgr.es/m/CAJcOf-fAdj=nDKMsRhQzndm-O13NY4dL6xGcEvdX5Xvbbi0V7g@mail.gmail.com
2021-03-10 07:38:58 +05:30
Fujii Masao ff99918c62 Track total amounts of times spent writing and syncing WAL data to disk.
This commit adds new GUC track_wal_io_timing. When this is enabled,
the total amounts of time XLogWrite writes and issue_xlog_fsync syncs
WAL data to disk are counted in pg_stat_wal. This information would be
useful to check how much WAL write and sync affect the performance.

Enabling track_wal_io_timing will make the server query the operating
system for the current time every time WAL is written or synced,
which may cause significant overhead on some platforms. To avoid such
additional overhead in the server with track_io_timing enabled,
this commit introduces track_wal_io_timing as a separate parameter from
track_io_timing.

Note that WAL write and sync activity by walreceiver has not been tracked yet.

This commit makes the server also track the numbers of times XLogWrite
writes and issue_xlog_fsync syncs WAL data to disk, in pg_stat_wal,
regardless of the setting of track_wal_io_timing. This counters can be
used to calculate the WAL write and sync time per request, for example.

Bump PGSTAT_FILE_FORMAT_ID.

Bump catalog version.

Author: Masahiro Ikeda
Reviewed-By: Japin Li, Hayato Kuroda, Masahiko Sawada, David Johnston, Fujii Masao
Discussion: https://postgr.es/m/0509ad67b585a5b86a83d445dfa75392@oss.nttdata.com
2021-03-09 16:52:06 +09:00
Michael Paquier 9d2d457009 Add support for more progress reporting in COPY
The command (TO or FROM), its type (file, pipe, program or callback),
and the number of tuples excluded by a WHERE clause in COPY FROM are
added to the progress reporting already available.

The column "lines_processed" is renamed to "tuples_processed" to
disambiguate the meaning of this column in the cases of CSV and BINARY
COPY and to be more consistent with the other catalog progress views.

Bump catalog version, again.

Author: Matthias van de Meent
Reviewed-by: Michael Paquier, Justin Pryzby, Bharath Rupireddy, Josef
Šimánek, Tomas Vondra
Discussion: https://postgr.es/m/CAEze2WiOcgdH4aQA8NtZq-4dgvnJzp8PohdeKchPkhMY-jWZXA@mail.gmail.com
2021-03-09 14:21:03 +09:00
Michael Paquier f9264d1524 Remove support for SSL compression
PostgreSQL disabled compression as of e3bdb2d and the documentation
recommends against using it since.  Additionally, SSL compression has
been disabled in OpenSSL since version 1.1.0, and was disabled in many
distributions long before that.  The most recent TLS version, TLSv1.3,
disallows compression at the protocol level.

This commit removes the feature itself, removing support for the libpq
parameter sslcompression (parameter still listed for compatibility
reasons with existing connection strings, just ignored), and removes
the equivalent field in pg_stat_ssl and de facto PgBackendSSLStatus.

Note that, on top of removing the ability to activate compression by
configuration, compression is actively disabled in both frontend and
backend to avoid overrides from local configurations.

A TAP test is added for deprecated SSL parameters to check after
backwards compatibility.

Bump catalog version.

Author: Daniel Gustafsson
Reviewed-by: Peter Eisentraut, Magnus Hagander, Michael Paquier
Discussion:  https://postgr.es/m/7E384D48-11C5-441B-9EC3-F7DB1F8518F6@yesql.se
2021-03-09 11:16:47 +09:00
Tom Lane d4545dc19b Complain if a function-in-FROM returns a set when it shouldn't.
Throw a "function protocol violation" error if a function in FROM
tries to return a set though it wasn't marked proretset.  Although
such cases work at the moment, it doesn't seem like something we
want to guarantee will keep working.  Besides, there are other
negative consequences of not setting the proretset flag, such as
potentially bad plans.

No back-patch, since if there is any third-party code violating
this expectation, people wouldn't appreciate us breaking it in
a minor release.

Discussion: https://postgr.es/m/1636062.1615141782@sss.pgh.pa.us
2021-03-08 18:54:55 -05:00
Tom Lane 5c06abb9b9 Validate the OID argument of pg_import_system_collations().
"SELECT pg_import_system_collations(0)" caused an assertion failure.
With a random nonzero argument --- or indeed with zero, in non-assert
builds --- it would happily make pg_collation entries with garbage
values of collnamespace.  These are harmless as far as I can tell
(unless maybe the OID happens to become used for a schema, later on?).
In any case this isn't a security issue, since the function is
superuser-only.  But it seems like a gotcha for unwary DBAs, so let's
add a check that the given OID belongs to some schema.

Back-patch to v10 where this function was introduced.
2021-03-08 18:21:51 -05:00
Tom Lane 6c20bdb2a2 Further tweak memory management for regex DFAs.
Coverity is still unhappy after commit 190c79884, and after looking
closer I think it might be onto something.  The callers of newdfa()
typically drop out if v->err has been set nonzero, which newdfa()
is faithfully doing if it fails.  However, what if v->err was already
nonzero before we entered newdfa()?  Then newdfa() could succeed and
the caller would promptly leak its result.

I don't think this scenario can actually happen, but the predicate
"v->err is always zero when newdfa() is called" seems difficult to be
entirely sure of; there's a good deal of code that potentially could
get that wrong.

It seems better to adjust the callers to directly check for a null
result instead of relying on ISERR() tests.  This is slightly cheaper
than the previous coding anyway.

Lacking evidence that there's any real bug, no back-patch.
2021-03-08 16:32:29 -05:00
Amit Kapila 8a812e5106 Track replication origin progress for rollbacks.
Commit 1eb6d6527a allowed to track replica origin replay progress for 2PC
but it was not complete. It misses to properly track the progress for
rollback prepared especially it missed updating the code for recovery.
Additionally, we need to allow tracking it on subscriber nodes where
wal_level might not be logical.

It is required to track decoding of 2PC which is committed in PG14
(a271a1b50e) and also nobody complained about this till now so not
backpatching it.

Author: Amit Kapila
Reviewed-by: Michael Paquier and Ajin Cherian
Discussion: https://postgr.es/m/CAA4eK1L-kHmMnSdrRW6UhRbCjR7cgh04c+6psY15qzT6ktcd+g@mail.gmail.com
2021-03-08 07:54:03 +05:30
Heikki Linnakangas 3174d69fb9 Remove server and libpq support for old FE/BE protocol version 2.
Protocol version 3 was introduced in PostgreSQL 7.4. There shouldn't be
many clients or servers left out there without version 3 support. But as
a courtesy, I kept just enough of the old protocol support that we can
still send the "unsupported protocol version" error in v2 format, so that
old clients can display the message properly. Likewise, libpq still
understands v2 ErrorResponse messages when establishing a connection.

The impetus to do this now is that I'm working on a patch to COPY
FROM, to always prefetch some data. We cannot do that safely with the
old protocol, because it requires parsing the input one byte at a time
to detect the end-of-copy marker.

Reviewed-by: Tom Lane, Alvaro Herrera, John Naylor
Discussion: https://www.postgresql.org/message-id/9ec25819-0a8a-d51a-17dc-4150bb3cca3b%40iki.fi
2021-03-04 10:45:55 +02:00
Tom Lane 0a687c8f10 Add trim_array() function.
This has been in the SQL spec since 2008.  It's a pretty thin
wrapper around the array slice functionality, but the spec
says we should have it, so here it is.

Vik Fearing, reviewed by Dian Fay

Discussion: https://postgr.es/m/fc92ce17-9655-8ff1-c62a-4dc4c8ccd815@postgresfriends.org
2021-03-03 16:39:57 -05:00
Peter Eisentraut e527a99055 Some copy-editing of GUC descriptions 2021-03-03 07:14:35 +01:00
Thomas Munro 8eda3eba30 Use sort_template.h for qsort_tuple() and qsort_ssup().
Replace the Perl code previously used to generate specialized sort
functions with sort_template.h.

Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Discussion: https://postgr.es/m/CA%2BhUKGJ2-eaDqAum5bxhpMNhvuJmRDZxB_Tow0n-gse%2BHG0Yig%40mail.gmail.com
2021-03-03 17:02:32 +13:00
Amit Kapila 19890a064e Add option to enable two_phase commits via pg_create_logical_replication_slot.
Commit 0aa8a01d04 extends the output plugin API to allow decoding of
prepared xacts and allowed the user to enable/disable the two-phase option
via pg_logical_slot_get_changes(). This can lead to a problem such that
the first time when it gets changes via pg_logical_slot_get_changes()
without two_phase option enabled it will not get the prepared even though
prepare is after consistent snapshot. Now next time during getting changes,
if the two_phase option is enabled it can skip prepare because by that
time start decoding point has been moved. So the user will only get commit
prepared.

Allow to enable/disable this option at the create slot time and default
will be false. It will break the existing slots which is fine in a major
release.

Author: Ajin Cherian
Reviewed-by: Amit Kapila and Vignesh C
Discussion: https://postgr.es/m/d0f60d60-133d-bf8d-bd70-47784d8fabf3@enterprisedb.com
2021-03-03 07:34:11 +05:30
Peter Geoghegan 5b2f2af3d9 nbtree page deletion: Add leaftopparent assertion.
Add documenting assertion.  This makes it easier to follow how we
maintain the top parent link in target subtree's half-dead/leaf level
page.
2021-03-02 14:06:07 -08:00
Peter Geoghegan 3d8d5787a3 Fix nbtree page deletion error messages.
Adjust some "can't happen" error messages that assumed that the page
deletion target page must be a half-dead page.  This assumption was
wrong in the case of an internal target page.  Simply refer to these
pages as the target page instead.

Internal pages are never marked half-dead.  There is exactly one
half-dead page for each subtree undergoing deletion.  The half-dead page
is also the target subtree's leaf-level page.  This has been the case
since commit efada2b8, which totally overhauled nbtree page deletion.
2021-03-02 13:02:24 -08:00
Tom Lane d16f8c8e41 Mark default_transaction_read_only as GUC_REPORT.
This allows clients to find out the setting at connection time without
having to expend a query round trip to do so; which is helpful when
trying to identify read/write servers.  (One must also look at
in_hot_standby, but that's already GUC_REPORT, cf bf8a662c9.)
Modifying libpq to make use of this will come soon, but I felt it
cleaner to push the server change separately.

Haribabu Kommi, Greg Nancarrow, Vignesh C; reviewed at various times
by Laurenz Albe, Takayuki Tsunakawa, Peter Smith.

Discussion: https://postgr.es/m/CAF3+xM+8-ztOkaV9gHiJ3wfgENTq97QcjXQt+rbFQ6F7oNzt9A@mail.gmail.com
2021-03-02 13:53:54 -05:00
Tom Lane 4604f83fdf Suppress unnecessary regex subre nodes in a couple more cases.
This extends the changes made in commit cebc1d34e, teaching
parseqatom() to generate fewer or cheaper subre nodes in some edge
cases.  The case of interest here is a quantified atom that is "messy"
only because it has greediness opposite to what preceded it (whereas
captures and backrefs are intrinsically messy).  In this case we don't
need an iteration node, since we don't care where the sub-matches of
the quantifier are; and we might also not need a second concatenation
node.  This seems of only marginal real-world use according to my
testing, but I wanted to get it in before wrapping up this series of
regex performance fixes.

Discussion: https://postgr.es/m/1340281.1613018383@sss.pgh.pa.us
2021-03-02 12:14:14 -05:00
Tom Lane 0c3405cf11 Improve performance of regular expression back-references.
In some cases, at the time that we're doing an NFA-based precheck
of whether a backref subexpression can match at a particular place
in the string, we already know which substring the referenced
subexpression matched.  If so, we might as well forget about the NFA
and just compare the substring; this is faster and it gives an exact
rather than approximate answer.

In general, this optimization can help while we are prechecking within
the second child expression of a concat node, while the capture was
within the first child expression; then the substring was saved during
cdissect() of the first child and will be available to NFA checks done
while cdissect() recurses into the second child.  It can help quite a
lot if the tree looks like

              concat
             /      \
      capture        concat
                    /      \
     expensive stuff        backref

as we will be able to avoid recursively dissecting the "expensive
stuff" before discovering that the backref isn't satisfied with a
particular midpoint that the lower concat node is testing.  This
doesn't help if the concat tree is left-deep, as the capture node
won't get set soon enough (and it's hard to fix that without changing
the engine's match behavior).  Fortunately, right-deep concat trees
are the common case.

Patch by me, reviewed by Joel Jacobson

Discussion: https://postgr.es/m/661609.1614560029@sss.pgh.pa.us
2021-03-02 11:55:12 -05:00
Tom Lane 4aea704a5b Fix semantics of regular expression back-references.
POSIX defines the behavior of back-references thus:

    The back-reference expression '\n' shall match the same (possibly
    empty) string of characters as was matched by a subexpression
    enclosed between "\(" and "\)" preceding the '\n'.

As far as I can see, the back-reference is supposed to consider only
the data characters matched by the referenced subexpression.  However,
because our engine copies the NFA constructed from the referenced
subexpression, it effectively enforces any constraints therein, too.
As an example, '(^.)\1' ought to match 'xx', or any other string
starting with two occurrences of the same character; but in our code
it does not, and indeed can't match anything, because the '^' anchor
constraint is included in the backref's copied NFA.  If POSIX intended
that, you'd think they'd mention it.  Perl for one doesn't act that
way, so it's hard to conclude that this isn't a bug.

Fix by modifying the backref's NFA immediately after it's copied from
the reference, replacing all constraint arcs by EMPTY arcs so that the
constraints are treated as automatically satisfied.  This still allows
us to enforce matching rules that depend only on the data characters;
for example, in '(^\d+).*\1' the NFA matching step will still know
that the backref can only match strings of digits.

Perhaps surprisingly, this change does not affect the results of any
of a rather large corpus of real-world regexes.  Nonetheless, I would
not consider back-patching it, since it's a clear compatibility break.

Patch by me, reviewed by Joel Jacobson

Discussion: https://postgr.es/m/661609.1614560029@sss.pgh.pa.us
2021-03-02 11:34:53 -05:00
Michael Paquier fabde52fab Simplify code to switch pg_class.relrowsecurity in tablecmds.c
The same code pattern was repeated twice to enable or disable ROW LEVEL
SECURITY with an ALTER TABLE command.  This makes the code slightly
cleaner.

Author: Justin Pryzby
Reviewed-by: Zhihong Yu
Discussion: https://postgr.es/m/20210228211854.GC20769@telsasoft.com
2021-03-02 12:30:21 +09:00
Tom Lane ffd3944ab9 Improve reporting for syntax errors in multi-line JSON data.
Point to the specific line where the error was detected; the
previous code tended to include several preceding lines as well.
Avoid re-scanning the entire input to recompute which line that
was.  Simplify the logic a bit.  Add test cases.

Simon Riggs and Hamid Akhtar, reviewed by Daniel Gustafsson and myself

Discussion: https://postgr.es/m/CANbhV-EPBnXm3MF_TTWBwwqgn1a1Ghmep9VHfqmNBQ8BT0f+_g@mail.gmail.com
2021-03-01 16:44:17 -05:00
Thomas Munro bd69ddfcdb Remove obsolete comment for WaitForProcSignalBarrier().
Commit 814f1d8b removed the behavior described.

Reported-by: Amit Kapila <amit.kapila16@gmail.com>
2021-03-02 09:30:57 +13:00
Thomas Munro f5a5773a9d Allow condition variables to be used in interrupt code.
Adjust the condition variable sleep loop to work correctly when code
reached by its internal CHECK_FOR_INTERRUPTS() call interacts with
another condition variable.

There are no such cases currently, but a proposed patch would do this.

Discussion: https://postgr.es/m/CA+hUKGLdemy2gBm80kz20GTe6hNVwoErE8KwcJk6-U56oStjtg@mail.gmail.com
2021-03-01 17:24:47 +13:00
Thomas Munro 814f1d8bc3 Use condition variables for ProcSignalBarriers.
Instead of a poll/sleep loop, use a condition variable for precise
wake-up whenever a backend's pss_barrierGeneration advances.

Discussion: https://postgr.es/m/CA+hUKGLdemy2gBm80kz20GTe6hNVwoErE8KwcJk6-U56oStjtg@mail.gmail.com
2021-03-01 17:23:43 +13:00
Amit Kapila 8bdb1332eb Avoid repeated decoding of prepared transactions after a restart.
In commit a271a1b50e, we allowed decoding at prepare time and the prepare
was decoded again if there is a restart after decoding it. It was done
that way because we can't distinguish between the cases where we have not
decoded the prepare because it was prior to consistent snapshot or we have
decoded it earlier but restarted. To distinguish between these two cases,
we have introduced an initial_consistent_point at the slot level which is
an LSN at which we found a consistent point at the time of slot creation.
This is also the point where we have exported a snapshot for the initial
copy. So, prepare transaction prior to this point are sent along with
commit prepared.

This commit bumps SNAPBUILD_VERSION because of change in SnapBuild. It
will break existing slots which is fine in a major release.

Author: Ajin Cherian, based on idea by Andres Freund
Reviewed-by: Amit Kapila and Vignesh C
Discussion: https://postgr.es/m/d0f60d60-133d-bf8d-bd70-47784d8fabf3@enterprisedb.com
2021-03-01 09:11:18 +05:30
Thomas Munro 6230912f23 Use FeBeWaitSet for walsender.c.
This avoids the need to set up and tear down a fresh WaitEventSet every
time we need need to wait.  We have to add an explicit exit on
postmaster exit (FeBeWaitSet isn't set up to do that automatically), so
move the code to do that into a new function to avoid repetition.

Reviewed-by: Kyotaro Horiguchi <horikyota.ntt@gmail.com> (earlier version)
Discussion: https://postgr.es/m/CA%2BhUKGJAC4Oqao%3DqforhNey20J8CiG2R%3DoBPqvfR0vOJrFysGw%40mail.gmail.com
2021-03-01 16:19:38 +13:00
Thomas Munro a042ba2ba7 Introduce symbolic names for FeBeWaitSet positions.
Previously we used 0 and 1 to refer to the socket and latch in far flung
parts of the tree, without any explanation.  Also use PGINVALID_SOCKET
rather than -1 in a couple of places that didn't already do that.

Reviewed-by: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Discussion: https://postgr.es/m/CA%2BhUKGJAC4Oqao%3DqforhNey20J8CiG2R%3DoBPqvfR0vOJrFysGw%40mail.gmail.com
2021-03-01 16:10:16 +13:00
Amit Kapila b4e3dc7fd4 Update the docs and comments for decoding of prepared xacts.
Commit a271a1b50e introduced decoding at prepare time in ReorderBuffer.
This can lead to deadlock for out-of-core logical replication solutions
that uses this feature to build distributed 2PC in case such transactions
lock [user] catalog tables exclusively. They need to inform users to not
have locks on catalog tables (via explicit LOCK command) in such
transactions.

Reported-by: Andres Freund
Discussion: https://postgr.es/m/20210222222847.tpnb6eg3yiykzpky@alap3.anarazel.de
2021-03-01 08:14:33 +05:30
Thomas Munro 6148656a0b Use EVFILT_SIGNAL for kqueue latches.
Cut down on system calls and other overheads by waiting for SIGURG
explicitly with kqueue instead of using a signal handler and self-pipe.
Affects *BSD and macOS systems.

This leaves only the poll implementation with a signal handler and the
traditional self-pipe trick.

Discussion: https://postgr.es/m/CA+hUKGJjxPDpzBE0a3hyUywBvaZuC89yx3jK9RFZgfv_KHU7gg@mail.gmail.com
2021-03-01 14:20:04 +13:00
Thomas Munro 6a2a70a020 Use signalfd(2) for epoll latches.
Cut down on system calls and other overheads by reading from a signalfd
instead of using a signal handler and self-pipe.  Affects Linux sytems,
and possibly others including illumos that implement the Linux epoll and
signalfd interfaces.

Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/CA+hUKGJjxPDpzBE0a3hyUywBvaZuC89yx3jK9RFZgfv_KHU7gg@mail.gmail.com
2021-03-01 14:12:02 +13:00
Thomas Munro 83709a0d5a Use SIGURG rather than SIGUSR1 for latches.
Traditionally, SIGUSR1 has been overloaded for ad-hoc signals,
procsignal.c signals and latch.c wakeups.  Move that last use over to a
new dedicated signal.  SIGURG is normally used to report out-of-band
socket data, but PostgreSQL doesn't use that facility.

The signal handler is now installed in all postmaster children by
InitializeLatchSupport().  Those wishing to disconnect from it should
call ShutdownLatchSupport().

Future patches will use this separation of signals to avoid the need for
a signal handler on some operating systems.

Discussion: https://postgr.es/m/CA+hUKGJjxPDpzBE0a3hyUywBvaZuC89yx3jK9RFZgfv_KHU7gg@mail.gmail.com
2021-03-01 12:44:12 +13:00
Thomas Munro c8f3bc2401 Optimize latches to send fewer signals.
Don't send signals to processes that aren't sleeping.

Author: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/CA+hUKGJjxPDpzBE0a3hyUywBvaZuC89yx3jK9RFZgfv_KHU7gg@mail.gmail.com
2021-03-01 12:44:12 +13:00
Thomas Munro d1b90995e8 Remove latch.c workaround for Linux < 2.6.27.
Commit 82ebbeb0 added a workaround for systems with no epoll_create1()
and EPOLL_CLOEXEC.  Linux < 2.6.27 and glibc < 2.9 are long gone.  Now
seems like a good time to drop the extra code, because otherwise we'd
have to add similar already-dead workaround code to new patches using
XXX_CLOEXEC flags that arrived in the same kernel release.

Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/CA%2BhUKGKL_%3DaO%3Dr30N%3Ds9VoDgTqHpRSzePRbA9dkYO7snc7HsxA%40mail.gmail.com
2021-03-01 11:24:28 +13:00
Alvaro Herrera 25936fd46c
Fix use-after-free bug with AfterTriggersTableData.storeslot
AfterTriggerSaveEvent() wrongly allocates the slot in execution-span
memory context, whereas the correct thing is to allocate it in
a transaction-span context, because that's where the enclosing
AfterTriggersTableData instance belongs into.

Backpatch to 12 (the test back to 11, where it works well with no code
changes, and it's good to have to confirm that the case was previously
well supported); this bug seems introduced by commit ff11e7f4b9.

Reported-by: Bertrand Drouvot <bdrouvot@amazon.com>
Author: Amit Langote <amitlangote09@gmail.com>
Discussion: https://postgr.es/m/39a71864-b120-5a5c-8cc5-c632b6f16761@amazon.com
2021-02-27 18:09:15 -03:00
David Rowley 977b2c0853 Add missing TidRangeScan readfunc
Mistakenly forgotten in bb437f995
2021-02-27 23:21:21 +13:00
David Rowley bb437f995d Add TID Range Scans to support efficient scanning ranges of TIDs
This adds a new executor node named TID Range Scan.  The query planner
will generate paths for TID Range scans when quals are discovered on base
relations which search for ranges on the table's ctid column.  These
ranges may be open at either end. For example, WHERE ctid >= '(10,0)';
will return all tuples on page 10 and over.

To support this, two new optional callback functions have been added to
table AM.  scan_set_tidrange is used to set the scan range to just the
given range of TIDs.  scan_getnextslot_tidrange fetches the next tuple
in the given range.

For AMs were scanning ranges of TIDs would not make sense, these functions
can be set to NULL in the TableAmRoutine.  The query planner won't
generate TID Range Scan Paths in that case.

Author: Edmund Horner, David Rowley
Reviewed-by: David Rowley, Tomas Vondra, Tom Lane, Andres Freund, Zhihong Yu
Discussion: https://postgr.es/m/CAMyN-kB-nFTkF=VA_JPwFNo08S0d-Yk0F741S2B7LDmYAi8eyA@mail.gmail.com
2021-02-27 22:59:36 +13:00
Peter Eisentraut f4adc41c4f Enhanced cycle mark values
Per SQL:202x draft, in the CYCLE clause of a recursive query, the
cycle mark values can be of type boolean and can be omitted, in which
case they default to TRUE and FALSE.

Reviewed-by: Vik Fearing <vik@postgresfriends.org>
Discussion: https://www.postgresql.org/message-id/flat/db80ceee-6f97-9b4a-8ee8-3ba0c58e5be2@2ndquadrant.com
2021-02-27 08:13:24 +01:00
Tom Lane 0fc1af174c Improve memory management in regex compiler.
The previous logic here created a separate pool of arcs for each
state, so that the out-arcs of each state were physically stored
within it.  Perhaps this choice was driven by trying to not include
a "from" pointer within each arc; but Spencer gave up on that idea
long ago, and it's hard to see what the value is now.  The approach
turns out to be fairly disastrous in terms of memory consumption,
though.  In the first place, NFAs built by this engine seem to have
about 4 arcs per state on average, with a majority having only one
or two out-arcs.  So pre-allocating 10 out-arcs for each state is
already cause for a factor of two or more bloat.  Worse, the NFA
optimization phase moves arcs around with abandon.  In a large NFA,
some of the states will have hundreds of out-arcs, so towards the
end of the optimization phase we have a significant number of states
whose arc pools have room for hundreds of arcs each, even though only
a few of those arcs are in use.  We have seen real-world regexes in
which this effect bloats the memory requirement by 25X or even more.

Hence, get rid of the per-state arc pools in favor of a single arc
pool for the whole NFA, with variable-sized allocation batches
instead of always asking for 10 at a time.  While we're at it,
let's batch the allocations of state structs too, to further reduce
the malloc traffic.

This incidentally allows moveouts() to be optimized in a similar
way to moveins(): when moving an arc to another state, it's now
valid to just re-link the same arc struct into a different outchain,
where before the code invariants required us to make a physically
new arc and then free the old one.

These changes reduce the regex compiler's typical space consumption
for average-size regexes by about a factor of two, and much more for
large or complicated regexes.  In a large test set of real-world
regexes, we formerly had half a dozen cases that failed with "regular
expression too complex" due to exceeding the REG_MAX_COMPILE_SPACE
limit (about 150MB); we would have had to raise that limit to
something close to 400MB to make them work with the old code.  Now,
none of those cases need more than 13MB to compile.  Furthermore,
the test set is about 10% faster overall due to less malloc traffic.

Discussion: https://postgr.es/m/168861.1614298592@sss.pgh.pa.us
2021-02-26 13:52:10 -05:00
Thomas Munro 8556267b2b Revert "pg_collation_actual_version() -> pg_collation_current_version()."
This reverts commit 9cf184cc05.  Name
change less well received than anticipated.

Discussion: https://postgr.es/m/afcfb97e-88a1-a540-db95-6c573b93bc2b%40eisentraut.org
2021-02-26 15:29:27 +13:00
Tom Lane 80ca8464fe Fix list-manipulation bug in WITH RECURSIVE processing.
makeDependencyGraphWalker and checkWellFormedRecursionWalker
thought they could hold onto a pointer to a list's first
cons cell while the list was modified by recursive calls.
That was okay when the cons cell was actually separately
palloc'd ... but since commit 1cff1b95a, it's quite unsafe,
leading to core dumps or incorrect complaints of faulty
WITH nesting.

In the field this'd require at least a seven-deep WITH nest
to cause an issue, but enabling DEBUG_LIST_MEMORY_USAGE
allows the bug to be seen with lesser nesting depths.

Per bug #16801 from Alexander Lakhin.  Back-patch to v13.

Michael Paquier and Tom Lane

Discussion: https://postgr.es/m/16801-393c7922143eaa4d@postgresql.org
2021-02-25 20:47:32 -05:00
Peter Geoghegan 2376361839 VACUUM VERBOSE: Count "newly deleted" index pages.
Teach VACUUM VERBOSE to report on pages deleted by the _current_ VACUUM
operation -- these are newly deleted pages.  VACUUM VERBOSE continues to
report on the total number of deleted pages in the entire index (no
change there).  The former is a subset of the latter.

The distinction between each category of deleted index page only arises
with index AMs where page deletion is supported and is decoupled from
page recycling for performance reasons.

This is follow-up work to commit e5d8a999, which made nbtree store
64-bit XIDs (not 32-bit XIDs) in pages at the point at which they're
deleted.  Note that the btm_last_cleanup_num_delpages metapage field
added by that commit usually gets set to pages_newly_deleted.  The
exceptions (the scenarios in which they're not equal) all seem to be
tricky cases for the implementation (of page deletion and recycling) in
general.

Author: Peter Geoghegan <pg@bowt.ie>
Discussion: https://postgr.es/m/CAH2-WznpdHvujGUwYZ8sihX%3Dd5u-tRYhi-F4wnV2uN2zHpMUXw%40mail.gmail.com
2021-02-25 14:32:18 -08:00
Tom Lane 301ed8812e Doc: remove src/backend/regex/re_syntax.n.
We aren't publishing this file as documentation, and it's been
much more haphazardly maintained than the real docs in func.sgml,
so let's just drop it.  I think the only reason I included it in
commit 7bcc6d98f was that the Berkeley-era sources had had a man
page in this directory.

Discussion: https://postgr.es/m/4099447.1614186542@sss.pgh.pa.us
2021-02-25 13:33:27 -05:00
Tom Lane 7dc13a0f08 Change regex \D and \W shorthands to always match newlines.
Newline is certainly not a digit, nor a word character, so it is
sensible that it should match these complemented character classes.
Previously, \D and \W acted that way by default, but in
newline-sensitive mode ('n' or 'p' flag) they did not match newlines.

This behavior was previously forced because explicit complemented
character classes don't match newlines in newline-sensitive mode;
but as of the previous commit that implementation constraint no
longer exists.  It seems useful to change this because the primary
real-world use for newline-sensitive mode seems to be to match the
default behavior of other regex engines such as Perl and Javascript
... and their default behavior is that these match newlines.

The old behavior can be kept by writing an explicit complemented
character class, i.e. [^[:digit:]] or [^[:word:]].  (This means
that \D and \W are not exactly equivalent to those strings, but
they weren't anyway.)

Discussion: https://postgr.es/m/3220564.1613859619@sss.pgh.pa.us
2021-02-25 13:29:06 -05:00
Tom Lane 2a0af7fe46 Allow complemented character class escapes within regex brackets.
The complement-class escapes \D, \S, \W are now allowed within
bracket expressions.  There is no semantic difficulty with doing
that, but the rather hokey macro-expansion-based implementation
previously used here couldn't cope.

Also, invent "word" as an allowed character class name, thus "\w"
is now equivalent to "[[:word:]]" outside brackets, or "[:word:]"
within brackets.  POSIX allows such implementation-specific
extensions, and the same name is used in e.g. bash.

One surprising compatibility issue this raises is that constructs
such as "[\w-_]" are now disallowed, as our documentation has always
said they should be: character classes can't be endpoints of a range.
Previously, because \w was just a macro for "[:alnum:]_", such a
construct was read as "[[:alnum:]_-_]", so it was accepted so long as
the character after "-" was numerically greater than or equal to "_".

Some implementation cleanup along the way:

* Remove the lexnest() hack, and in consequence clean up wordchrs()
to not interact with the lexer.

* Fix colorcomplement() to not be O(N^2) in the number of colors
involved.

* Get rid of useless-as-far-as-I-can-see calls of element()
on single-character character element names in brackpart().
element() always maps these to the character itself, and things
would be quite broken if it didn't --- should "[a]" match something
different than "a" does?  Besides, the shortcut path in brackpart()
wasn't doing this anyway, making it even more inconsistent.

Discussion: https://postgr.es/m/2845172.1613674385@sss.pgh.pa.us
Discussion: https://postgr.es/m/3220564.1613859619@sss.pgh.pa.us
2021-02-25 13:00:40 -05:00
Peter Geoghegan e5d8a99903 Use full 64-bit XIDs in deleted nbtree pages.
Otherwise we risk "leaking" deleted pages by making them non-recyclable
indefinitely.  Commit 6655a729 did the same thing for deleted pages in
GiST indexes.  That work was used as a starting point here.

Stop storing an XID indicating the oldest bpto.xact across all deleted
though unrecycled pages in nbtree metapages.  There is no longer any
reason to care about that condition/the oldest XID.  It only ever made
sense when wraparound was something _bt_vacuum_needs_cleanup() had to
consider.

The btm_oldest_btpo_xact metapage field has been repurposed and renamed.
It is now btm_last_cleanup_num_delpages, which is used to remember how
many non-recycled deleted pages remain from the last VACUUM (in practice
its value is usually the precise number of pages that were _newly
deleted_ during the specific VACUUM operation that last set the field).

The general idea behind storing btm_last_cleanup_num_delpages is to use
it to give _some_ consideration to non-recycled deleted pages inside
_bt_vacuum_needs_cleanup() -- though never too much.  We only really
need to avoid leaving a truly excessive number of deleted pages in an
unrecycled state forever.  We only do this to cover certain narrow cases
where no other factor makes VACUUM do a full scan, and yet the index
continues to grow (and so actually misses out on recycling existing
deleted pages).

These metapage changes result in a clear user-visible benefit: We no
longer trigger full index scans during VACUUM operations solely due to
the presence of only 1 or 2 known deleted (though unrecycled) blocks
from a very large index.  All that matters now is keeping the costs and
benefits in balance over time.

Fix an issue that has been around since commit 857f9c36, which added the
"skip full scan of index" mechanism (i.e. the _bt_vacuum_needs_cleanup()
logic).  The accuracy of btm_last_cleanup_num_heap_tuples accidentally
hinged upon _when_ the source value gets stored.  We now always store
btm_last_cleanup_num_heap_tuples in btvacuumcleanup().  This fixes the
issue because IndexVacuumInfo.num_heap_tuples (the source field) is
expected to accurately indicate the state of the table _after_ the
VACUUM completes inside btvacuumcleanup().

A backpatchable fix cannot easily be extracted from this commit.  A
targeted fix for the issue will follow in a later commit, though that
won't happen today.

I (pgeoghegan) have chosen to remove any mention of deleted pages in the
documentation of the vacuum_cleanup_index_scale_factor GUC/param, since
the presence of deleted (though unrecycled) pages is no longer of much
concern to users.  The vacuum_cleanup_index_scale_factor description in
the docs now seems rather unclear in any case, and it should probably be
rewritten in the near future.  Perhaps some passing mention of page
deletion will be added back at the same time.

Bump XLOG_PAGE_MAGIC due to nbtree WAL records using full XIDs now.

Author: Peter Geoghegan <pg@bowt.ie>
Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com>
Discussion: https://postgr.es/m/CAH2-WznpdHvujGUwYZ8sihX=d5u-tRYhi-F4wnV2uN2zHpMUXw@mail.gmail.com
2021-02-24 18:41:34 -08:00
Amit Kapila 8a4f9522d0 Fix relcache reference leak introduced by ce0fdbfe97.
Author: Sawada Masahiko
Reviewed-by: Amit Kapila
Discussion: https://postgr.es/m/CAD21AoA7ZEfsOXQ9HQqMv3QYGsEm2H5Wk5ic5S=mvzDf-3a3SA@mail.gmail.com
2021-02-25 07:48:24 +05:30
Michael Paquier bcf2667bf6 Fix some typos, grammar and style in docs and comments
The portions fixing the documentation are backpatched where needed.

Author: Justin Pryzby
Discussion: https://postgr.es/m/20210210235557.GQ20012@telsasoft.com
backpatch-through: 9.6
2021-02-24 16:13:17 +09:00
Peter Eisentraut 8ec8fe0f31 Message style fix
Don't quote type name placeholders.
2021-02-24 07:00:49 +01:00
Alvaro Herrera 5a65eacfdc
Fix confusion in comments about generate_gather_paths
d2d8a229bc introduced a new function generate_useful_gather_paths to
be used as a replacement for generate_gather_paths, but forgot to update
a couple of places that referenced the older function.

This is possibly not 100% complete (ref. create_ordered_paths), but it's
better than not changing anything.

Author: "Hou, Zhijie" <houzj.fnst@cn.fujitsu.com>
Reviewed-by: Tomas Vondra <tomas.vondra@enterprisedb.com>
Discussion: https://postgr.es/m/4ce1d5116fe746a699a6d29858c6a39a@G08CNEXMBPEKD05.g08.fujitsu.local
2021-02-23 20:05:15 -03:00
Alvaro Herrera 8deb6b38dc
Reinstate HEAP_XMAX_LOCK_ONLY|HEAP_KEYS_UPDATED as allowed
Commit 866e24d47d added an assert that HEAP_XMAX_LOCK_ONLY and
HEAP_KEYS_UPDATED cannot appear together, on the faulty assumption that
the latter necessarily referred to an update and not a tuple lock; but
that's wrong, because SELECT FOR UPDATE can use precisely that
combination, as evidenced by the amcheck test case added here.

Remove the Assert(), and also patch amcheck's verify_heapam.c to not
complain if the combination is found.  Also, out of overabundance of
caution, update (across all branches) README.tuplock to be more explicit
about this.

Author: Julien Rouhaud <rjuju123@gmail.com>
Reviewed-by: Mahendra Singh Thalor <mahi6run@gmail.com>
Reviewed-by: Dilip Kumar <dilipbalaut@gmail.com>
Discussion: https://postgr.es/m/20210124061758.GA11756@nol
2021-02-23 17:30:21 -03:00
Tom Lane 3db05e76f9 Suppress compiler warning in new regex match-all detection code.
gcc 10 is smart enough to notice that control could reach this
"hasmatch[depth]" assignment with depth < 0, but not smart enough
to know that that would require a badly broken NFA graph.  Change
the assert() to a plain runtime test to shut it up.

Per report from Andres Freund.

Discussion: https://postgr.es/m/20210223173437.b3ywijygsy6q42gq@alap3.anarazel.de
2021-02-23 13:55:34 -05:00
Alvaro Herrera d9d076222f
VACUUM: ignore indexing operations with CONCURRENTLY
As envisioned in commit c98763bf51, it is possible for VACUUM to
ignore certain transactions that are executing CREATE INDEX CONCURRENTLY
and REINDEX CONCURRENTLY for the purposes of computing Xmin; that's
because we know those transactions are not going to examine any other
tables, and are not going to execute anything else in the same
transaction.  (Only operations on "safe" indexes can be ignored: those
on indexes that are neither partial nor expressional).

This is extremely useful in cases where CIC/RC can run for a very long
time, because that used to be a significant headache for concurrent
vacuuming of other tables.

Reviewed-by: Matthias van de Meent <boekewurm+postgres@gmail.com>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Discussion: https://postgr.es/m/20210115133858.GA18931@alvherre.pgsql
2021-02-23 12:15:09 -03:00
Peter Eisentraut 6f6f284c7e Simplify printing of LSNs
Add a macro LSN_FORMAT_ARGS for use in printf-style printing of LSNs.
Convert all applicable code to use it.

Reviewed-by: Ashutosh Bapat <ashutosh.bapat@enterprisedb.com>
Reviewed-by: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://www.postgresql.org/message-id/flat/CAExHW5ub5NaTELZ3hJUCE6amuvqAtsSxc7O+uK7y4t9Rrk23cw@mail.gmail.com
2021-02-23 10:27:02 +01:00
Amit Kapila ade89ba5f4 Fix an oversight in ReorderBufferFinishPrepared.
We don't have anything to decode in a transaction if ReorderBufferTXN
doesn't exist by the time we decode the commit prepared. So don't create a
new ReorderBufferTXN here. This is an oversight in commit a271a1b5.

Reported-by: Markus Wanner
Discussion: https://postgr.es/m/dbec82e2-dbd7-95a2-c6b6-e488cbbdf853@bluegap.ch
2021-02-23 09:47:41 +05:30
Amit Kapila bc617a7b1c Change the error message for logical replication authentication failure.
The authentication failure error message wasn't distinguishing whether
it is a physical replication or logical replication connection failure and
was giving incomplete information on what led to failure in case of logical
replication connection.

Author: Paul Martinez and Amit Kapila
Reviewed-by: Euler Taveira and Amit Kapila
Discussion: https://postgr.es/m/CACqFVBYahrAi2OPdJfUA3YCvn3QMzzxZdw0ibSJ8wouWeDtiyQ@mail.gmail.com
2021-02-23 09:11:22 +05:30
Alvaro Herrera 0f5505a881
Remove pointless HeapTupleHeaderIndicatesMovedPartitions calls
Pavan Deolasee recently noted that a few of the
HeapTupleHeaderIndicatesMovedPartitions calls added by commit
5db6df0c01 are useless, since they are done after comparing t_self
with t_ctid.  But because t_self can never be set to the magical values
that indicate that the tuple moved partition, this can never succeed: if
the first test fails (so we know t_self equals t_ctid), necessarily the
second test will also fail.

So these checks can be removed and no harm is done.  There's no bug
here, just a code legibility issue.

Reported-by: Pavan Deolasee <pavan.deolasee@gmail.com>
Discussion: https://postgr.es/m/20200929164411.GA15497@alvherre.pgsql
2021-02-22 16:51:44 -03:00
Alvaro Herrera 6a03369a71
Fix typo 2021-02-22 11:34:05 -03:00
Thomas Munro beb4480c85 Refactor get_collation_current_version().
The code paths for three different OSes finished up with three different
ways of excluding C[.xxx] and POSIX from consideration.  Merge them.

Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/20210117215940.GE8560%40telsasoft.com
2021-02-22 23:32:16 +13:00
Thomas Munro 9cf184cc05 pg_collation_actual_version() -> pg_collation_current_version().
The new name seems a bit more natural.

Discussion: https://postgr.es/m/20210117215940.GE8560%40telsasoft.com
2021-02-22 23:32:16 +13:00
Thomas Munro 0fb0a0503b Hide internal error for pg_collation_actual_version(<bad OID>).
Instead of an unsightly internal "cache lookup failed" message, just
return NULL for bad OIDs, as is the convention for other similar things.

Reported-by: Justin Pryzby <pryzby@telsasoft.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/20210117215940.GE8560%40telsasoft.com
2021-02-22 23:01:20 +13:00
Fujii Masao f05ed5a5cf Initialize atomic variable waitStart in PGPROC, at postmaster startup.
Commit 46d6e5f567 added the atomic variable "waitStart" into PGPROC struct,
to store the time at which wait for lock acquisition started. Previously
this variable was initialized every time each backend started. Instead,
this commit makes postmaster initialize it at the startup, to ensure that
the variable should be initialized before any use of it.

This commit also moves the code to initialize "waitStart" variable for
prepare transaction, from TwoPhaseGetDummyProc() to MarkAsPreparingGuts().
Because MarkAsPreparingGuts() is more proper place to do that since
it initializes other PGPROC variables.

Author: Fujii Masao
Reviewed-by: Atsushi Torikoshi
Discussion: https://postgr.es/m/1df88660-6f08-cc6e-b7e2-f85296a2bdab@oss.nttdata.com
2021-02-22 18:25:00 +09:00
Peter Eisentraut efbfb64241 Improve new hash partition bound check error messages
For the error message "every hash partition modulus must be a factor
of the next larger modulus", add a detail message that shows the
particular numbers and existing partition involved.  Also comment the
code more.

Reviewed-by: Amit Langote <amitlangote09@gmail.com>
Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi>
Discussion: https://www.postgresql.org/message-id/flat/bb9d60b4-aadb-607a-1a9d-fdc3434dddcd%40enterprisedb.com
2021-02-22 08:06:45 +01:00
Michael Paquier 9294264278 Use pgstat_progress_update_multi_param() where possible
This commit changes one code path in REINDEX INDEX and one code path
in CREATE INDEX CONCURRENTLY to report the progress of each operation
using pgstat_progress_update_multi_param() rather than
multiple calls to pgstat_progress_update_param().  This has the
advantage to make the progress report more consistent to the end-user
without impacting the amount of information provided.

Author: Bharath Rupireddy
Discussion: https://postgr.es/m/CALj2ACV5zW7GxD8D_tyO==bcj6ZktQchEKWKPBOAGKiLhAQo=w@mail.gmail.com
2021-02-22 14:21:40 +09:00
Thomas Munro db8374d804 Remove outdated reference to RAID spindles.
Commit b09ff536 left behind some outdated advice in the long_desc field
of the GUC "effective_io_concurrency".  Remove it.

Back-patch to 13.

Reported-by: Andrew Gierth <andrew@tao11.riddles.org.uk>
Reviewed-by: Julien Rouhaud <rjuju123@gmail.com>
Discussion: https://postgr.es/m/CA%2BhUKGJyyWqFBxL9gEj-qtjBThGjhAOBE8GBnF8MUJOJ3vrfag%40mail.gmail.com
2021-02-22 14:42:15 +13:00
Tom Lane 190c79884a Simplify memory management for regex DFAs a little.
Coverity complained that functions in regexec.c might leak DFA
storage.  It's wrong, but this logic is confusing enough that it's
not so surprising Coverity couldn't make sense of it.  Rewrite
in hopes of making it more legible to humans as well as machines.
2021-02-21 20:29:11 -05:00
Tom Lane ea1268f630 Avoid generating extra subre tree nodes for capturing parentheses.
Previously, each pair of capturing parentheses gave rise to a separate
subre tree node, whose only function was to identify that we ought to
capture the match details for this particular sub-expression.  In
most cases we don't really need that, since we can perfectly well
put a "capture this" annotation on the child node that does the real
matching work.  As with the two preceding commits, the main value
of this is to avoid generating and optimizing an NFA for a tree node
that's not really pulling its weight.

The chosen data representation only allows one capture annotation
per subre node.  In the legal-per-spec, but seemingly not very useful,
case where there are multiple capturing parens around the exact same
bit of the regex (i.e. "((xyz))"), wrap the child node in N-1 capture
nodes that act the same as before.  We could work harder at that but
I'll refrain, pending some evidence that such cases are worth troubling
over.

In passing, improve the comments in regex.h to say what all the
different re_info bits mean.  Some of them were pretty obvious
but others not so much, so reverse-engineer some documentation.

This is part of a patch series that in total reduces the regex engine's
runtime by about a factor of four on a large corpus of real-world regexes.

Patch by me, reviewed by Joel Jacobson

Discussion: https://postgr.es/m/1340281.1613018383@sss.pgh.pa.us
2021-02-20 19:26:41 -05:00
Tom Lane 5810430894 Convert regex engine's subre tree from binary to N-ary style.
Instead of having left and right child links in subre structs,
have a single child link plus a sibling link.  Multiple children
of a tree node are now reached by chasing the sibling chain.

The beneficiary of this is alternation tree nodes.  A regular
expression with N (>1) branches is now represented by one alternation
node with N children, rather than a tree that includes N alternation
nodes as well as N children.  While the old representation didn't
really cost anything extra at execution time, it was pretty horrid
for compilation purposes, because each of the alternation nodes had
its own NFA, which we were too stupid not to separately optimize.
(To make matters worse, all of those NFAs described the entire
alternation pattern, not just the portion of it that one might
expect from the tree structure.)

We continue to require concatenation nodes to have exactly two
children.  This data structure is now prepared to support more,
but the executor's logic would need some careful redesign, and
it's not clear that a lot of benefit could be had.

This is part of a patch series that in total reduces the regex engine's
runtime by about a factor of four on a large corpus of real-world regexes.

Patch by me, reviewed by Joel Jacobson

Discussion: https://postgr.es/m/1340281.1613018383@sss.pgh.pa.us
2021-02-20 19:07:45 -05:00
Tom Lane cebc1d34e5 Fix regex engine to suppress useless concatenation sub-REs.
The comment for parsebranch() claims that it avoids generating
unnecessary concatenation nodes in the "subre" tree, but it missed
some significant cases.  Once we've decided that a given atom is
"messy" and can't be bundled with the preceding atom(s) of the
current regex branch, parseqatom() always generated two new concat
nodes, one to concat the messy atom to what follows it in the branch,
and an upper node to concatenate the preceding part of the branch
to that one.  But one or both of these could be unnecessary, if the
messy atom is the first, last, or only one in the branch.  Improve
the code to suppress such useless concat nodes, along with the
no-op child nodes representing empty chunks of a branch.

Reducing the number of subre tree nodes offers significant savings
not only at execution but during compilation, because each subre node
has its own NFA that has to be separately optimized.  (Maybe someday
we'll figure out how to share the optimization work across multiple
tree nodes, but it doesn't look easy.)  Eliminating upper tree nodes
is especially useful because they tend to have larger NFAs.

This is part of a patch series that in total reduces the regex engine's
runtime by about a factor of four on a large corpus of real-world regexes.

Patch by me, reviewed by Joel Jacobson

Discussion: https://postgr.es/m/1340281.1613018383@sss.pgh.pa.us
2021-02-20 18:45:29 -05:00
Tom Lane 824bf71902 Recognize "match-all" NFAs within the regex engine.
This builds on the previous "rainbow" patch to detect NFAs that will
match any string, though possibly with constraints on the string length.
This definition is chosen to match constructs such as ".*", ".+", and
".{1,100}".  Recognizing such an NFA after the optimization pass is
fairly cheap, since we basically just have to verify that all arcs
are RAINBOW arcs and count the number of steps to the end state.
(Well, there's a bit of complication with pseudo-color arcs for string
boundary conditions, but not much.)

Once we have these markings, the regex executor functions longest(),
shortest(), and matchuntil() don't have to expend per-character work
to determine whether a given substring satisfies such an NFA; they
just need to check its length against the bounds.  Since some matching
problems require O(N) invocations of these functions, we've reduced
the runtime for an N-character string from O(N^2) to O(N).  Of course,
this is no help for non-matchall sub-patterns, but those usually have
constraints that allow us to avoid needing O(N) substring checks in the
first place.  It's precisely the unconstrained "match-all" cases that
cause the most headaches.

This is part of a patch series that in total reduces the regex engine's
runtime by about a factor of four on a large corpus of real-world regexes.

Patch by me, reviewed by Joel Jacobson

Discussion: https://postgr.es/m/1340281.1613018383@sss.pgh.pa.us
2021-02-20 18:31:19 -05:00
Tom Lane 08c0d6ad65 Invent "rainbow" arcs within the regex engine.
Some regular expression constructs, most notably the "." match-anything
metacharacter, produce a sheaf of parallel NFA arcs covering all
possible colors (that is, character equivalence classes).  We can make
a noticeable improvement in the space and time needed to process large
regexes by replacing such cases with a single arc bearing the special
color code "RAINBOW".  This requires only minor additional complication
in places such as pull() and push().

Callers of pg_reg_getoutarcs() must now be prepared for the possibility
of seeing a RAINBOW arc.  For the one known user, contrib/pg_trgm,
that's a net benefit since it cuts the number of arcs to be dealt with,
and the handling isn't any different than for other colors that contain
too many characters to be dealt with individually.

This is part of a patch series that in total reduces the regex engine's
runtime by about a factor of four on a large corpus of real-world regexes.

Patch by me, reviewed by Joel Jacobson

Discussion: https://postgr.es/m/1340281.1613018383@sss.pgh.pa.us
2021-02-20 18:11:56 -05:00
Fujii Masao 8a55cb5ba9 Fix bug in COMMIT AND CHAIN command.
This commit fixes COMMIT AND CHAIN command so that it starts new transaction
immediately even if savepoints are defined within the transaction to commit.
Previously COMMIT AND CHAIN command did not in that case because
commit 280a408b48 forgot to make CommitTransactionCommand() handle
a transaction chaining when the transaction state was TBLOCK_SUBCOMMIT.

Also this commit adds the regression test for COMMIT AND CHAIN command
when savepoints are defined.

Back-patch to v12 where transaction chaining was added.

Reported-by: Arthur Nascimento
Author: Fujii Masao
Reviewed-by: Arthur Nascimento, Vik Fearing
Discussion: https://postgr.es/m/16867-3475744069228158@postgresql.org
2021-02-19 21:57:52 +09:00
Peter Eisentraut 678d0e239b Update snowball
Update to snowball tag v2.1.0.  Major changes are new stemmers for
Armenian, Serbian, and Yiddish.
2021-02-19 08:10:15 +01:00
Peter Geoghegan b071a31149 Add nbtree README section on page recycling.
Consolidate discussion of how VACUUM places pages in the FSM for
recycling by adding a new section that comes after discussion of page
deletion.  This structure reflects the fact that page recycling is
explicitly decoupled from page deletion in Lanin & Shasha's paper.  Page
recycling in nbtree is an implementation of what the paper calls "the
drain technique".

This decoupling is an important concept for nbtree VACUUM.  Searchers
have to detect and recover from concurrent page deletions, but they will
never have to reason about concurrent page recycling.  Recycling can
almost always be thought of as a low level garbage collection operation
that asynchronously frees the physical space that backs a logical tree
node.  Almost all code need only concern itself with logical tree nodes.
(Note that "logical tree node" is not currently a term of art in the
nbtree code -- this all works implicitly.)

This is preparation for an upcoming patch that teaches nbtree VACUUM to
remember the details of pages that it deletes on the fly, in local
memory.  This enables the same VACUUM operation to consider placing its
own deleted pages in the FSM later on, when it reaches the end of
btvacuumscan().
2021-02-18 21:16:33 -08:00
Tom Lane b5a66e7353 Fix another ancient bug in parsing of BRE-mode regular expressions.
While poking at the regex code, I happened to notice that the bug
squashed in commit afcc8772e had a sibling: next() failed to return
a specific value associated with the '}' token for a "\{m,n\}"
quantifier when parsing in basic RE mode.  Again, this could result
in treating the quantifier as non-greedy, which it never should be in
basic mode.  For that to happen, the last character before "\}" that
sets "nextvalue" would have to set it to zero, or it'd have to have
accidentally been zero from the start.  The failure can be provoked
repeatably with, for example, a bound ending in digit "0".

Like the previous patch, back-patch all the way.
2021-02-18 22:38:55 -05:00
Fujii Masao 614b7f18b3 Fix "invalid spinlock number: 0" error in pg_stat_wal_receiver.
Commit 2c8dd05d6c added the atomic variable writtenUpto into
walreceiver's shared memory information. It's initialized only
when walreceiver started up but could be read via pg_stat_wal_receiver
view anytime, i.e., even before it's initialized. In the server built
with --disable-atomics and --disable-spinlocks, this uninitialized
atomic variable read could cause "invalid spinlock number: 0" error.

This commit changed writtenUpto so that it's initialized at
the postmaster startup, to avoid the uninitialized variable read
via pg_stat_wal_receiver and fix the error.

Also this commit moved the read of writtenUpto after the release
of spinlock protecting walreceiver's shared variables. This is
necessary to prevent new spinlock from being taken by atomic
variable read while holding another spinlock, and to shorten
the spinlock duration. This change leads writtenUpto not to be
consistent with the other walreceiver's shared variables protected
by a spinlock. But this is OK because writtenUpto should not be
used for data integrity checks.

Back-patch to v13 where commit 2c8dd05d6c introduced the bug.

Author: Fujii Masao
Reviewed-by: Michael Paquier, Thomas Munro, Andres Freund
Discussion: https://postgr.es/m/7ef8708c-5b6b-edd3-2cf2-7783f1c7c175@oss.nttdata.com
2021-02-18 23:28:15 +09:00
Peter Eisentraut f5465fade9 Allow specifying CRL directory
Add another method to specify CRLs, hashed directory method, for both
server and client side.  This offers a means for server or libpq to
load only CRLs that are required to verify a certificate.  The CRL
directory is specifed by separate GUC variables or connection options
ssl_crl_dir and sslcrldir, alongside the existing ssl_crl_file and
sslcrl, so both methods can be used at the same time.

Author: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/20200731.173911.904649928639357911.horikyota.ntt@gmail.com
2021-02-18 07:59:10 +01:00
Peter Geoghegan 128dd901a5 nbtree README: move VACUUM linear scan section.
Discuss VACUUM's linear scan after discussion of tuple deletion by
VACUUM, but before discussion of page deletion by VACUUM.  This
progression is a lot more natural.

Also tweak the wording a little.  It seems unnecessary to talk about how
it worked prior to PostgreSQL 8.2.
2021-02-17 21:13:15 -08:00
Tomas Vondra 927f453a94 Fix tuple routing to initialize batching only for inserts
A cross-partition update on a partitioned table is implemented as a
delete followed by an insert. With foreign partitions, this was however
causing issues, because the FDW and core may disagree on when to enable
batching.  postgres_fdw was only allowing batching for plain inserts
(CMD_INSERT) while core was trying to batch the insert component of the
cross-partition update.  Fix by restricting core to apply batching only
to plain CMD_INSERT queries.

It's possible to allow batching for cross-partition updates, but that
will require more extensive changes, so better to leave that for a
separate patch.

Author: Amit Langote
Reviewed-by: Tomas Vondra, Takayuki Tsunakawa
Discussion: https://postgr.es/m/20200628151002.7x5laxwpgvkyiu3q@development
2021-02-18 00:03:45 +01:00
Tom Lane 4e703d6719 Make some minor improvements in the regex code.
Push some hopefully-uncontroversial bits extracted from an upcoming
patch series, to remove non-relevant clutter from the main patches.

In compact(), return immediately after setting REG_ASSERT error;
continuing the loop would just lead to assertion failure below.
(Ask me how I know.)

In parseqatom(), remove assertion that moresubs() did its job.
When moresubs actually did its job, this is redundant with that
function's final assert; but when it failed on OOM, this is an
assertion crash.  We could avoid the crash by adding a NOERR()
check before the assertion, but it seems better to subtract code
than add it.  (Note that there's a NOERR exit a few lines further
down, and nothing else between here and there requires moresubs
to have succeeded.  So we don't really need an extra error exit.)
This is a live bug in assert-enabled builds, but given the very
low likelihood of OOM in moresub's tiny allocation, I don't think
it's worth back-patching.

On the other hand, it seems worthwhile to add an assertion that
our intended v->subs[subno] target is still null by the time we
are ready to insert into it, since there's a recursion in between.

In pg_regexec, ensure we fflush any debug output on the way out,
and try to make MDEBUG messages more uniform and helpful.  (In
particular, ensure that all of them are prefixed with the subre's
id number, so one can match up entry and exit reports.)

Add some test cases in test_regex to improve coverage of lookahead
and lookbehind constraints.  Adding these now is mainly to establish
that this is indeed the existing behavior.

Discussion: https://postgr.es/m/1340281.1613018383@sss.pgh.pa.us
2021-02-17 12:24:23 -05:00
Peter Eisentraut f40c6969d0 Routine usage information schema tables
Several information schema views track dependencies between
functions/procedures and objects used by them.  These had not been
implemented so far because PostgreSQL doesn't track objects used in a
function body.  However, formally, these also show dependencies used
in parameter default expressions, which PostgreSQL does support and
track.  So for the sake of completeness, we might as well add these.
If dependency tracking for function bodies is ever implemented, these
views will automatically work correctly.

Reviewed-by: Erik Rijkers <er@xs4all.nl>
Discussion: https://www.postgresql.org/message-id/flat/ac80fc74-e387-8950-9a31-2560778fc1e3%40enterprisedb.com
2021-02-17 18:16:06 +01:00
Peter Eisentraut 0e392fcc0d Use errmsg_internal for debug messages
An inconsistent set of debug-level messages was not using
errmsg_internal(), thus uselessly exposing the messages to translation
work.  Fix those.
2021-02-17 11:33:25 +01:00
Tom Lane 38bb3aef35 Convert tsginidx.c's GIN indexing logic to fully ternary operation.
Commit 2f2007fbb did this partially, but there were two remaining
warts.  checkcondition_gin handled some uncertain cases by setting
the out-of-band recheck flag, some by returning TS_MAYBE, and some
by doing both.  Meanwhile, TS_execute arbitrarily converted a
TS_MAYBE result to TS_YES.  Thus, if checkcondition_gin chose to
only return TS_MAYBE, the outcome would be TS_YES with no recheck
flag, potentially resulting in wrong query outputs.

The case where this'd happen is if there were GIN_MAYBE entries
in the indexscan results passed to gin_tsquery_[tri]consistent,
which so far as I can see would only happen if the tidbitmap used
to accumulate indexscan results grew large enough to become lossy.

I initially thought of fixing this by ensuring we always set the
recheck flag as well as returning TS_MAYBE in uncertain cases.
But that errs in the other direction, potentially forcing rechecks
of rows that provably match the query (since the recheck flag
remains set even if TS_execute later finds that the answer must be
TS_YES).  Instead, let's get rid of the out-of-band recheck flag
altogether and rely on returning TS_MAYBE.  This requires exporting
a version of TS_execute that will actually return the full ternary
result of the evaluation ... but we likely should have done that
to start with.

Unfortunately it doesn't seem practical to add a regression test case
that covers this: the amount of data needed to cause the GIN bitmap to
become lossy results in a longer runtime than I think we want to have
in the tests.  (I'm wondering about allowing smaller work_mem settings
to ameliorate that, but it'd be a matter for a separate patch.)

Per bug #16865 from Dimitri Nüscheler.  Back-patch to v13 where
the faulty commit came in.

Discussion: https://postgr.es/m/16865-4ffdc3e682e6d75b@postgresql.org
2021-02-16 12:07:14 -05:00
Amit Kapila f672df5fdd Remove the unnecessary PrepareWrite in pgoutput.
This issue exists from the inception of this code (PG-10) but got exposed
by the recent commit ce0fdbfe97 where we are using origins in tablesync
workers. The problem was that we were sometimes sending the prepare_write
('w') message but then the actual message was not being sent and on the
subscriber side, we always expect a message after prepare_write message
which led to this bug.

I refrained from backpatching this because there is no way in the core
code to hit this prior to commit ce0fdbfe97 and we haven't received any
complaints so far.

Reported-by: Erik Rijkers
Author: Amit Kapila and Vignesh C
Tested-by: Erik Rijkers
Discussion: https://postgr.es/m/1295168140.139428.1613133237154@webmailclassic.xs4all.nl
2021-02-16 07:26:50 +05:30
Andres Freund 8001cb77ee Fix heap_page_prune() parameter order confusion introduced in dc7420c2c9.
Both luckily and unluckily the passed values meant the same for all
types. Luckily because that meant my confusion caused no harm,
unluckily because otherwise the compiler might have warned...

In passing, synchronize parameter names between definition and
declaration.

Reported-By: Peter Geoghegan <pg@bowt.ie>
Author: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/CAH2-Wz=L=nBoepQdH9b5Qd0nMvepFT2CnT6sjWvvpOXa=K8HVQ@mail.gmail.com
2021-02-15 17:12:12 -08:00
Andres Freund a975ff4980 Remove backwards compat ugliness in snapbuild.c.
In 955a684e04 we fixed a bug in initial snapshot creation. In the
course of which several members of struct SnapBuild were obsoleted. As
SnapBuild is serialized to disk we couldn't change the memory layout.

Unfortunately I subsequently forgot about removing the backward compat
gunk, but luckily Heikki just reminded me.

This commit bumps SNAPBUILD_VERSION, therefore breaking existing
slots (which is fine in a major release).

Author: Andres Freund
Reminded-By: Heikki Linnakangas <hlinnaka@iki.fi>
Discussion: https://postgr.es/m/c94be044-818f-15e3-1ad3-7a7ae2dfed0a@iki.fi
2021-02-15 16:57:47 -08:00
Tom Lane 0e52903128 Simplify loop logic in nodeIncrementalSort.c.
The inner loop in switchToPresortedPrefixMode() can be implemented
as a conventional integer-counter for() loop, removing a couple of
redundant boolean state variables.  The old logic here was a remnant
of earlier development, but as things now stand there's no reason
for extra complexity.

Also, annotate the test case added by 82e0e2930 to explain why it
manages to hit the corner case fixed in that commit, and add an
EXPLAIN to verify that it's creating an incremental-sort plan.

Back-patch to v13, like the previous patch.

James Coleman and Tom Lane

Discussion: https://postgr.es/m/16846-ae49f51ac379a4cb@postgresql.org
2021-02-15 10:17:58 -05:00
Heikki Linnakangas 54e51dcde0 Make ExecGetInsertedCols() and friends more robust and improve comments.
If ExecGetInsertedCols(), ExecGetUpdatedCols() or ExecGetExtraUpdatedCols()
were called with a ResultRelInfo that's not in the range table and isn't a
partition routing target, the functions would dereference a NULL pointer,
relinfo->ri_RootResultRelInfo. Such ResultRelInfos are created when firing
RI triggers in tables that are not modified directly. None of the current
callers of these functions pass such relations, so this isn't a live bug,
but let's make them more robust.

Also update comment in ResultRelInfo; after commit 6214e2b228,
ri_RangeTableIndex is zero for ResultRelInfos created for partition tuple
routing.

Noted by Coverity. Backpatch down to v11, like commit 6214e2b228.

Reviewed-by: Tom Lane, Amit Langote
2021-02-15 09:28:08 +02:00
Fujii Masao 46d6e5f567 Display the time when the process started waiting for the lock, in pg_locks, take 2
This commit adds new column "waitstart" into pg_locks view. This column
reports the time when the server process started waiting for the lock
if the lock is not held. This information is useful, for example, when
examining the amount of time to wait on a lock by subtracting
"waitstart" in pg_locks from the current time, and identify the lock
that the processes are waiting for very long.

This feature uses the current time obtained for the deadlock timeout
timer as "waitstart" (i.e., the time when this process started waiting
for the lock). Since getting the current time newly can cause overhead,
we reuse the already-obtained time to avoid that overhead.

Note that "waitstart" is updated without holding the lock table's
partition lock, to avoid the overhead by additional lock acquisition.
This can cause "waitstart" in pg_locks to become NULL for a very short
period of time after the wait started even though "granted" is false.
This is OK in practice because we can assume that users are likely to
look at "waitstart" when waiting for the lock for a long time.

The first attempt of this patch (commit 3b733fcd04) caused the buildfarm
member "rorqual" (built with --disable-atomics --disable-spinlocks) to report
the failure of the regression test. It was reverted by commit 890d2182a2.
The cause of this failure was that the atomic variable for "waitstart"
in the dummy process entry created at the end of prepare transaction was
not initialized. This second attempt fixes that issue.

Bump catalog version.

Author: Atsushi Torikoshi
Reviewed-by: Ian Lawrence Barwick, Robert Haas, Justin Pryzby, Fujii Masao
Discussion: https://postgr.es/m/a96013dc51cdc56b2a2b84fa8a16a993@oss.nttdata.com
2021-02-15 15:13:37 +09:00
Peter Geoghegan 7cde6b13a9 Adjust lazy_scan_heap() accounting comments.
Explain which particular LP_DEAD line pointers get accounted for by the
tups_vacuumed variable.
2021-02-14 19:28:37 -08:00
Thomas Munro f900a79ecd Default to wal_sync_method=fdatasync on FreeBSD.
FreeBSD 13 gained O_DSYNC, which would normally cause wal_sync_method to
choose open_datasync as its default value.  That may not be a good
choice for all systems, and performs worse than fdatasync in some
scenarios.  Let's preserve the existing default behavior for now.

Like commit 576477e73c, which did the same for Linux, back-patch to all
supported releases.

Discussion: https://postgr.es/m/CA%2BhUKGLsAMXBQrCxCXoW-JsUYmdOL8ALYvaX%3DCrHqWxm-nWbGA%40mail.gmail.com
2021-02-15 16:04:59 +13:00
Amit Kapila d9b0767bec Fix the warnings introduced in commit ce0fdbfe97.
Author: Amit Kapila
Reviewed-by: Tom Lane
Discussion: https://postgr.es/m/1610789.1613170207@sss.pgh.pa.us
2021-02-15 07:28:02 +05:30
Thomas Munro 637668fb1d Hold interrupts while running dsm_detach() callbacks.
While cleaning up after a parallel query or parallel index creation that
created temporary files, we could be interrupted by a statement timeout.
The error handling path would then fail to clean up the files when it
ran dsm_detach() again, because the callback was already popped off the
list.  Prevent this hazard by holding interrupts while the cleanup code
runs.

Thanks to Heikki Linnakangas for this suggestion, and also to Kyotaro
Horiguchi, Masahiko Sawada, Justin Pryzby and Tom Lane for discussion of
this and earlier ideas on how to fix the problem.

Back-patch to all supported releases.

Reported-by: Justin Pryzby <pryzby@telsasoft.com>
Discussion: https://postgr.es/m/20191212180506.GR2082@telsasoft.com
2021-02-15 14:27:33 +13:00
Michael Paquier b83dcf7928 Add result size as argument of pg_cryptohash_final() for overflow checks
With its current design, a careless use of pg_cryptohash_final() could
would result in an out-of-bound write in memory as the size of the
destination buffer to store the result digest is not known to the
cryptohash internals, without the caller knowing about that.  This
commit adds a new argument to pg_cryptohash_final() to allow such sanity
checks, and implements such defenses.

The internals of SCRAM for HMAC could be tightened a bit more, but as
everything is based on SCRAM_KEY_LEN with uses particular to this code
there is no need to complicate its interface more than necessary, and
this comes back to the refactoring of HMAC in core.  Except that, this
minimizes the uses of the existing DIGEST_LENGTH variables, relying
instead on sizeof() for the result sizes.  In ossp-uuid, this also makes
the code more defensive, as it already relied on dce_uuid_t being at
least the size of a MD5 digest.

This is in philosophy similar to cfc40d3 for base64.c and aef8948 for
hex.c.

Reported-by: Ranier Vilela
Author: Michael Paquier, Ranier Vilela
Reviewed-by: Kyotaro Horiguchi
Discussion: https://postgr.es/m/CAEudQAoqEGmcff3J4sTSV-R_16Monuz-UpJFbf_dnVH=APr02Q@mail.gmail.com
2021-02-15 10:18:34 +09:00
Tom Lane 2dd6733108 Minor fixes to improve regex debugging code.
When REG_DEBUG is defined, ensure that an un-filled "struct cnfa"
is all-zeroes, not just that it has nstates == 0.  This is mainly
so that looking at "struct subre" structs in gdb doesn't distract
one with a lot of garbage fields during regex compilation.

Adjust some places that print debug output to have suitable fflush
calls afterwards.

In passing, correct an erroneous ancient comment: the concatenation
subre-s created by parsebranch() have op == '.' not ','.

Noted while fooling around with some regex performance improvements.
2021-02-14 19:53:42 -05:00
Thomas Munro c7ecd6af01 ReadNewTransactionId() -> ReadNextTransactionId().
The new name conveys the effect better, is more consistent with similar
functions ReadNextMultiXactId(), ReadNextFullTransactionId(), and
matches the name of the variable that it reads.

Reported-by: Peter Geoghegan <pg@bowt.ie>
Discussion: https://postgr.es/m/CAH2-WzmVR4SakBXQUdhhPpMf1aYvZCnna5%3DHKa7DAgEmBAg%2B8g%40mail.gmail.com
2021-02-15 13:17:02 +13:00
Bruce Momjian 8facf1ea00 README/C-comment: document GiST's NSN value 2021-02-13 13:50:49 -05:00
Tom Lane ae4867ec74 Avoid divide-by-zero in regex_selectivity() with long fixed prefix.
Given a regex pattern with a very long fixed prefix (approaching 500
characters), the result of pow(FIXED_CHAR_SEL, fixed_prefix_len) can
underflow to zero.  Typically the preceding selectivity calculation
would have underflowed as well, so that we compute 0/0 and get NaN.
In released branches this leads to an assertion failure later on.
That doesn't happen in HEAD, for reasons I've not explored yet,
but it's surely still a bug.

To fix, just skip the division when the pow() result is zero, so
that we'll (most likely) return a zero selectivity estimate.  In
the edge cases where "sel" didn't yet underflow, perhaps this
isn't desirable, but I'm not sure that the case is worth spending
a lot of effort on.  The results of regex_selectivity_sub() are
barely worth the electrons they're written on anyway :-(

Per report from Alexander Lakhin.  Back-patch to all supported versions.

Discussion: https://postgr.es/m/6de0a0c3-ada9-cd0c-3e4e-2fa9964b41e3@gmail.com
2021-02-12 16:26:47 -05:00
Amit Kapila ce0fdbfe97 Allow multiple xacts during table sync in logical replication.
For the initial table data synchronization in logical replication, we use
a single transaction to copy the entire table and then synchronize the
position in the stream with the main apply worker.

There are multiple downsides of this approach: (a) We have to perform the
entire copy operation again if there is any error (network breakdown,
error in the database operation, etc.) while we synchronize the WAL
position between tablesync worker and apply worker; this will be onerous
especially for large copies, (b) Using a single transaction in the
synchronization-phase (where we can receive WAL from multiple
transactions) will have the risk of exceeding the CID limit, (c) The slot
will hold the WAL till the entire sync is complete because we never commit
till the end.

This patch solves all the above downsides by allowing multiple
transactions during the tablesync phase. The initial copy is done in a
single transaction and after that, we commit each transaction as we
receive. To allow recovery after any error or crash, we use a permanent
slot and origin to track the progress. The slot and origin will be removed
once we finish the synchronization of the table. We also remove slot and
origin of tablesync workers if the user performs DROP SUBSCRIPTION .. or
ALTER SUBSCRIPTION .. REFERESH and some of the table syncs are still not
finished.

The commands ALTER SUBSCRIPTION ... REFRESH PUBLICATION and
ALTER SUBSCRIPTION ... SET PUBLICATION ... with refresh option as true
cannot be executed inside a transaction block because they can now drop
the slots for which we have no provision to rollback.

This will also open up the path for logical replication of 2PC
transactions on the subscriber side. Previously, we can't do that because
of the requirement of maintaining a single transaction in tablesync
workers.

Bump catalog version due to change of state in the catalog
(pg_subscription_rel).

Author: Peter Smith, Amit Kapila, and Takamichi Osumi
Reviewed-by: Ajin Cherian, Petr Jelinek, Hou Zhijie and Amit Kapila
Discussion: https://postgr.es/m/CAA4eK1KHJxaZS-fod-0fey=0tq3=Gkn4ho=8N4-5HWiCfu0H1A@mail.gmail.com
2021-02-12 07:41:51 +05:30
Peter Geoghegan 3063eb1759 Remove obsolete IndexBulkDeleteResult stats field.
The pages_removed field is no longer used for anything.  It hasn't been
possible for an index to physically shrink since old-style VACUUM FULL
was removed by commit 0a469c87.
2021-02-11 16:49:41 -08:00
Tom Lane 69036aafb9 Simplify jsonfuncs.c code by using strtoint() not strtol().
Explicitly testing for INT_MIN and INT_MAX isn't particularly good
style; it's tedious and may draw useless compiler warnings on
machines where int and long are the same width.  We invented
strtoint() precisely for this usage, so use that instead.

While here, remove gratuitous variations in the way the tests for
did-strtoint-succeed were spelled.  Also, avoid attempting to
negate INT_MIN; that would probably work given that the result
is implicitly cast to uint32, but I think it's nominally undefined
behavior.

Per gripe from Ranier Vilela, though this isn't his proposed patch.

Discussion: https://postgr.es/m/CAEudQAqge3QfzoBRhe59QrB_5g+NmQUj2QpzqZ9Nc7QepXGAEw@mail.gmail.com
2021-02-11 12:49:22 -05:00
Tom Lane d4c746516b Remove no-longer-used RTE argument of markVarForSelectPriv().
In the wake of c028faf2a, this is no longer needed.  I left it
out of that patch since the API change would be undesirable in
a released branch; but there's no reason not to do it in HEAD.
2021-02-11 11:23:25 -05:00
Peter Eisentraut 4ad5611055 Fix lack of message pluralization 2021-02-10 11:35:45 +01:00
Michael Paquier 092b785fad Simplify code related to compilation of SSL and OpenSSL
This commit makes more generic some comments and code related to the
compilation with OpenSSL and SSL in general to ease the addition of more
SSL implementations in the future.  In libpq, some OpenSSL-only code is
moved under USE_OPENSSL and not USE_SSL.

While on it, make a comment more consistent in libpq-fe.h.

Author: Daniel Gustafsson
Discussion: https://postgr.es/m/5382CB4A-9CF3-4145-BA46-C802615935E0@yesql.se
2021-02-10 15:28:19 +09:00
Michael Paquier bd12080980 Preserve pg_attribute.attstattarget across REINDEX CONCURRENTLY
For an index, attstattarget can be updated using ALTER INDEX SET
STATISTICS.  This data was lost on the new index after REINDEX
CONCURRENTLY.

The update of this field is done when the old and new indexes are
swapped to make the fix back-patchable.  Another approach we could look
after in the long-term is to change index_create() to pass the wanted
values of attstattarget when creating the new relation, but, as this
would cause an ABI breakage this can be done only on HEAD.

Reported-by: Ronan Dunklau
Author: Michael Paquier
Reviewed-by: Ronan Dunklau, Tomas Vondra
Discussion: https://postgr.es/m/16628084.uLZWGnKmhe@laptop-ronand
Backpatch-through: 12
2021-02-10 13:06:48 +09:00
Amit Kapila cd142e032e Make pg_replication_origin_drop safe against concurrent drops.
Currently, we get the origin id from the name and then drop the origin by
taking ExclusiveLock on ReplicationOriginRelationId. So, two concurrent
sessions can get the id from the name at the same time and then when they
try to drop the origin, one of the sessions will get the either
"tuple concurrently deleted" or "cache lookup failed for replication
origin ..".

To prevent this race condition we do the entire operation under lock. This
obviates the need for replorigin_drop() API and we have removed it so if
any extension authors are using it they need to instead use
replorigin_drop_by_name. See it's usage in pg_replication_origin_drop().

Author: Peter Smith
Reviewed-by: Amit Kapila, Euler Taveira, Petr Jelinek, and Alvaro
Herrera
Discussion: https://www.postgresql.org/message-id/CAHut%2BPuW8DWV5fskkMWWMqzt-x7RPcNQOtJQBp6SdwyRghCk7A%40mail.gmail.com
2021-02-10 07:17:09 +05:30
Peter Geoghegan 31c7fb41e2 Fix obsolete FSM remarks in nbtree README.
The free space map has used a dedicated relation fork rather than shared
memory segments for over a decade.
2021-02-09 11:36:51 -08:00
Fujii Masao 890d2182a2 Revert "Display the time when the process started waiting for the lock, in pg_locks."
This reverts commit 3b733fcd04.

Per buildfarm members prion and rorqual.
2021-02-09 18:30:40 +09:00
Fujii Masao 3b733fcd04 Display the time when the process started waiting for the lock, in pg_locks.
This commit adds new column "waitstart" into pg_locks view. This column
reports the time when the server process started waiting for the lock
if the lock is not held. This information is useful, for example, when
examining the amount of time to wait on a lock by subtracting
"waitstart" in pg_locks from the current time, and identify the lock
that the processes are waiting for very long.

This feature uses the current time obtained for the deadlock timeout
timer as "waitstart" (i.e., the time when this process started waiting
for the lock). Since getting the current time newly can cause overhead,
we reuse the already-obtained time to avoid that overhead.

Note that "waitstart" is updated without holding the lock table's
partition lock, to avoid the overhead by additional lock acquisition.
This can cause "waitstart" in pg_locks to become NULL for a very short
period of time after the wait started even though "granted" is false.
This is OK in practice because we can assume that users are likely to
look at "waitstart" when waiting for the lock for a long time.

Bump catalog version.

Author: Atsushi Torikoshi
Reviewed-by: Ian Lawrence Barwick, Robert Haas, Justin Pryzby, Fujii Masao
Discussion: https://postgr.es/m/a96013dc51cdc56b2a2b84fa8a16a993@oss.nttdata.com
2021-02-09 18:10:19 +09:00
Michael Paquier 7cb3048f38 Add option PROCESS_TOAST to VACUUM
This option controls if toast tables associated with a relation are
vacuumed or not when running a manual VACUUM.  It was already possible
to trigger a manual VACUUM on a toast relation without processing its
main relation, but a manual vacuum on a main relation always forced a
vacuum on its toast table.  This is useful in scenarios where the level
of bloat or transaction age of the main and toast relations differs a
lot.

This option is an extension of the existing VACOPT_SKIPTOAST that was
used by autovacuum to control if toast relations should be skipped or
not.  This internal flag is renamed to VACOPT_PROCESS_TOAST for
consistency with the new option.

A new option switch, called --no-process-toast, is added to vacuumdb.

Author: Nathan Bossart
Reviewed-by: Kirk Jamison, Michael Paquier, Justin Pryzby
Discussion: https://postgr.es/m/BA8951E9-1524-48C5-94AF-73B1F0D7857F@amazon.com
2021-02-09 14:13:57 +09:00
Tom Lane c028faf2a6 Fix mishandling of column-level SELECT privileges for join aliases.
scanNSItemForColumn, expandNSItemAttrs, and ExpandSingleTable would
pass the wrong RTE to markVarForSelectPriv when dealing with a join
ParseNamespaceItem: they'd pass the join RTE, when what we need to
mark is the base table that the join column came from.  The end
result was to not fill the base table's selectedCols bitmap correctly,
resulting in an understatement of the set of columns that are read
by the query.  The executor would still insist on there being at
least one selectable column; but with a correctly crafted query,
a user having SELECT privilege on just one column of a table would
nonetheless be allowed to read all its columns.

To fix, make markRTEForSelectPriv fetch the correct RTE for itself,
ignoring the possibly-mismatched RTE passed by the caller.  Later,
we'll get rid of some now-unused RTE arguments, but that risks
API breaks so we won't do it in released branches.

This problem was introduced by commit 9ce77d75c, so back-patch
to v13 where that came in.  Thanks to Sven Klemm for reporting
the problem.

Security: CVE-2021-20229
2021-02-08 10:14:09 -05:00
Heikki Linnakangas 6214e2b228 Fix permission checks on constraint violation errors on partitions.
If a cross-partition UPDATE violates a constraint on the target partition,
and the columns in the new partition are in different physical order than
in the parent, the error message can reveal columns that the user does not
have SELECT permission on. A similar bug was fixed earlier in commit
804b6b6db4.

The cause of the bug is that the callers of the
ExecBuildSlotValueDescription() function got confused when constructing
the list of modified columns. If the tuple was routed from a parent, we
converted the tuple to the parent's format, but the list of modified
columns was grabbed directly from the child's RTE entry.

ExecUpdateLockMode() had a similar issue. That lead to confusion on which
columns are key columns, leading to wrong tuple lock being taken on tables
referenced by foreign keys, when a row is updated with INSERT ON CONFLICT
UPDATE. A new isolation test is added for that corner case.

With this patch, the ri_RangeTableIndex field is no longer set for
partitions that don't have an entry in the range table. Previously, it was
set to the RTE entry of the parent relation, but that was confusing.

NOTE: This modifies the ResultRelInfo struct, replacing the
ri_PartitionRoot field with ri_RootResultRelInfo. That's a bit risky to
backpatch, because it breaks any extensions accessing the field. The
change that ri_RangeTableIndex is not set for partitions could potentially
break extensions, too. The ResultRelInfos are visible to FDWs at least,
and this patch required small changes to postgres_fdw. Nevertheless, this
seem like the least bad option. I don't think these fields widely used in
extensions; I don't think there are FDWs out there that uses the FDW
"direct update" API, other than postgres_fdw. If there is, you will get a
compilation error, so hopefully it is caught quickly.

Backpatch to 11, where support for both cross-partition UPDATEs, and unique
indexes on partitioned tables, were added.

Reviewed-by: Amit Langote
Security: CVE-2021-3393
2021-02-08 11:01:51 +02:00
Peter Geoghegan 617fffee8a Rename removable xid function for consistency.
GlobalVisIsRemovableFullXid() is now GlobalVisCheckRemovableFullXid().
This is consistent with the general convention for FullTransactionId
equivalents of functions that deal with TransactionId values.  It now
matches the nearby GlobalVisCheckRemovableXid() function, which performs
the same check for callers that use TransactionId values.

Oversight in commit dc7420c2c9.

Discussion: https://postgr.es/m/CAH2-Wzmes12jFNDcVgpU89Vp=r6uLFrE-MT0fjSWGsE70UiNaA@mail.gmail.com
2021-02-07 10:11:14 -08:00
Tom Lane d1d2979852 Revert "Propagate CTE property flags when copying a CTE list into a rule."
This reverts commit ed29089633 and
equivalent back-branch commits.  The issue is subtler than I thought,
and it's far from new, so just before a release deadline is no time
to be fooling with it.  We'll consider what to do at a bit more
leisure.

Discussion: https://postgr.es/m/CAJcOf-fAdj=nDKMsRhQzndm-O13NY4dL6xGcEvdX5Xvbbi0V7g@mail.gmail.com
2021-02-07 12:54:08 -05:00
Tom Lane ed29089633 Propagate CTE property flags when copying a CTE list into a rule.
rewriteRuleAction() neglected this step, although it was careful to
propagate other similar flags such as hasSubLinks or hasRowSecurity.
Omitting to transfer hasRecursive is just cosmetic at the moment,
but omitting hasModifyingCTE is a live bug, since the executor
certainly looks at that.

The proposed test case only fails back to v10, but since the executor
examines hasModifyingCTE in 9.x as well, I suspect that a test case
could be devised that fails in older branches.  Given the nearness
of the release deadline, though, I'm not going to spend time looking
for a better test.

Report and patch by Greg Nancarrow, cosmetic changes by me

Discussion: https://postgr.es/m/CAJcOf-fAdj=nDKMsRhQzndm-O13NY4dL6xGcEvdX5Xvbbi0V7g@mail.gmail.com
2021-02-06 19:28:39 -05:00
Tom Lane dd705a039f Disallow converting an inheritance child table to a view.
Generally, members of inheritance trees must be plain tables (or,
in more recent versions, foreign tables).  ALTER TABLE INHERIT
rejects creating an inheritance relationship that has a view at
either end.  When DefineQueryRewrite attempts to convert a relation
to a view, it already had checks prohibiting doing so for partitioning
parents or children as well as traditional-inheritance parents ...
but it neglected to check that a traditional-inheritance child wasn't
being converted.  Since the planner assumes that any inheritance
child is a table, this led to making plans that tried to do a physical
scan on a view, causing failures (or even crashes, in recent versions).

One could imagine trying to support such a case by expanding the view
normally, but since the rewriter runs before the planner does
inheritance expansion, it would take some very fundamental refactoring
to make that possible.  There are probably a lot of other parts of the
system that don't cope well with such a situation, too.  For now,
just forbid it.

Per bug #16856 from Yang Lin.  Back-patch to all supported branches.
(In versions before v10, this includes back-patching the portion of
commit 501ed02cf that added has_superclass().  Perhaps the lack of
that infrastructure partially explains the missing check.)

Discussion: https://postgr.es/m/16856-0363e05c6e1612fd@postgresql.org
2021-02-06 15:17:01 -05:00
Michael Paquier f7400823c3 Clarify some comments around SharedRecoveryState in xlog.c
SharedRecoveryState has been switched from a boolean to an enum as of
commit 4e87c48, but some comments still referred to it as a boolean.

Author: Amul Sul
Reviewed-by: Dilip Kumar, Kyotaro Horiguchi
Discussion: https://postgr.es/m/CAAJ_b97Hf+1SXnm8jySpO+Fhm+-VKFAAce1T_cupUYtnE3Nxig
2021-02-06 10:27:55 +09:00
Heikki Linnakangas c444472af5 Fix backslash-escaping multibyte chars in COPY FROM.
If a multi-byte character is escaped with a backslash in TEXT mode input,
and the encoding is one of the client-only encodings where the bytes after
the first one can have an ASCII byte "embedded" in the char, we didn't
skip the character correctly. After a backslash, we only skipped the first
byte of the next character, so if it was a multi-byte character, we would
try to process its second byte as if it was a separate character. If it
was one of the characters with special meaning, like '\n', '\r', or
another '\\', that would cause trouble.

One such exmple is the byte sequence '\x5ca45c2e666f6f' in Big5 encoding.
That's supposed to be [backslash][two-byte character][.][f][o][o], but
because the second byte of the two-byte character is 0x5c, we incorrectly
treat it as another backslash. And because the next character is a dot, we
parse it as end-of-copy marker, and throw an "end-of-copy marker corrupt"
error.

Backpatch to all supported versions.

Reviewed-by: John Naylor, Kyotaro Horiguchi
Discussion: https://www.postgresql.org/message-id/a897f84f-8dca-8798-3139-07da5bb38728%40iki.fi
2021-02-05 11:14:56 +02:00
Tom Lane 0ff865fbe5 Fix bug in HashAgg's selective-column-spilling logic.
Commit 230230223 taught nodeAgg.c that, when spilling tuples from
memory in an oversized hash aggregation, it only needed to spill
input columns referenced in the node's tlist and quals.  Unfortunately,
that's wrong: we also have to save the grouping columns.  The error
is masked in common cases because the grouping columns also appear
in the tlist, but that's not necessarily true.  The main category
of plans where it's not true seem to come from semijoins ("WHERE
outercol IN (SELECT innercol FROM innertable)") where the innercol
needs an implicit promotion to make it comparable to the outercol.
The grouping column will be "innercol::promotedtype", but that
expression appears nowhere in the Agg node's own tlist and quals;
only the bare "innercol" is found in the tlist.

I spent quite a bit of time looking for a suitable regression test
case for this, without much success.  If the number of distinct
values of the innercol is large enough to make spilling happen,
the planner tends to prefer a non-HashAgg plan, at least for
problem sizes that are reasonable to use in the regression tests.
So, no new regression test.  However, this patch does demonstrably
fix the originally-reported test case.

Per report from s.p.e (at) gmx-topmail.de.  Backpatch to v13
where the troublesome code came in.

Discussion: https://postgr.es/m/trinity-1c565d44-159f-488b-a518-caf13883134f-1611835701633@3c-app-gmx-bap78
2021-02-04 23:01:37 -05:00
Tom Lane 82e0e29308 Fix YA incremental sort bug.
switchToPresortedPrefixMode() did the wrong thing if it detected
a batch boundary just at the last tuple of a fullsort group.

The initially-reported symptom was a "retrieved too many tuples in a
bounded sort" error, but the test case added here just silently gives
the wrong answer without this patch.

I (tgl) am not really happy about committing this patch without review
from the incremental-sort authors, but they seem AWOL and we are hard
against a release deadline.  This does demonstrably make some cases
better, anyway.

Per bug #16846 from Yoran Heling.  Back-patch to v13 where incremental
sort was introduced.

Neil Chen

Discussion: https://postgr.es/m/16846-ae49f51ac379a4cb@postgresql.org
2021-02-04 19:12:14 -05:00
Peter Geoghegan c34787f910 Harden nbtree page deletion.
Add some additional defensive checks in the second phase of index
deletion to detect and report index corruption during VACUUM, and to
avoid having VACUUM become stuck in more cases.  The code is still not
robust in the presence of a circular chain of sibling links, though it's
not clear whether that really matters.  This is follow-up work to commit
3a01f68e.

The new defensive checks rely on the assumption that there can be no
more than one VACUUM operation running for an index at any given time.
Remove an old comment suggesting that multiple concurrent VACUUMs need
to be considered here.  This concern now seems highly unlikely to have
any real validity, since we clearly rely on the same assumption in
several other places.  For example, there are much more recent comments
that appear in the same function (added by commit efada2b8e9) that make
the same assumption.

Also add a CHECK_FOR_INTERRUPTS() to the relevant code path.  Contrary
to comments added by commit 3a01f68e, it is actually possible to handle
interrupts here, at least in the common case where processing takes
place at the leaf level.  We only hold a pin on leafbuf/target page when
stepping right at the leaf level.

No backpatch due to the lack of complaints following hardening added to
the same area by commit 3a01f68e.
2021-02-04 15:42:36 -08:00
Heikki Linnakangas 2f86ab305e Fix small error in COPY FROM progress reporting.
The # of bytes processed was accumulated slightly incorrectly. After
loading more data to the input buffer, we added the number of bytes in
the buffer to the sum. But in case of multi-byte characters or escapes,
there can be a few unprocessed bytes left over from previous load in the
buffer. Those bytes got counted twice.
2021-02-04 17:40:33 +02:00
Peter Eisentraut 3c78e0569c Refactor Windows error message for easier translation
In the error messages referring to the user right "Lock pages in
memory", this is a term from the Windows OS, so it should be
translated in accordance with the OS localization.  Refactor the error
messages so this is easier and clearer.  Also fix the capitalization
to match the existing capitalization in the OS.
2021-02-04 13:31:13 +01:00
Michael Paquier 5128483d06 Ensure unlinking of old index file with REINDEX (TABLESPACE)
The original versions of the patch included this part, but a mismerge
from my side has made this piece go missing.  Oversight in c5b28604.
2021-02-04 17:16:47 +09:00
Michael Paquier fc749bc704 Clarify comment in tablesync.c
Author: Peter Smith
Reviewed-by: Amit Kapila, Michael Paquier, Euler Taveira
Discussion: https://postgr.es/m/CAHut+Pt9_T6pWar0FLtPsygNmme8HPWPdGUyZ_8mE1Yvjdf0ZA@mail.gmail.com
2021-02-04 16:02:31 +09:00
Michael Paquier c5b286047c Add TABLESPACE option to REINDEX
This patch adds the possibility to move indexes to a new tablespace
while rebuilding them.  Both the concurrent and the non-concurrent cases
are supported, and the following set of restrictions apply:
- When using TABLESPACE with a REINDEX command that targets a
partitioned table or index, all the indexes of the leaf partitions are
moved to the new tablespace.  The tablespace references of the non-leaf,
partitioned tables in pg_class.reltablespace are not changed. This
requires an extra ALTER TABLE SET TABLESPACE.
- Any index on a toast table rebuilt as part of a parent table is kept
in its original tablespace.
- The operation is forbidden on system catalogs, including trying to
directly move a toast relation with REINDEX.  This results in an error
if doing REINDEX on a single object.  REINDEX SCHEMA, DATABASE and
SYSTEM skip system relations when TABLESPACE is used.

Author: Alexey Kondratov, Michael Paquier, Justin Pryzby
Reviewed-by: Álvaro Herrera, Michael Paquier
Discussion: https://postgr.es/m/8a8f5f73-00d3-55f8-7583-1375ca8f6a91@postgrespro.ru
2021-02-04 14:34:20 +09:00
Tom Lane 9624321ec5 Avoid crash when rolling back within a prepared statement.
If a portal is used to run a prepared CALL or DO statement that
contains a ROLLBACK, PortalRunMulti fails because the portal's
statement list gets cleared by the rollback.  (Since the grammar
doesn't allow CALL/DO in PREPARE, the only easy way to get to this is
via extended query protocol, which treats all inputs as prepared
statements.)  It's difficult to avoid resetting the portal early
because of resource-management issues, so work around this by teaching
PortalRunMulti to be wary of portal->stmts having suddenly become NIL.

The crash has only been seen to occur in v13 and HEAD (as a
consequence of commit 1cff1b95a having added an extra touch of
portal->stmts).  But even before that, the code involved touching a
List that the portal no longer has any claim on.  In the test case at
hand, the List will still exist because of another refcount on the
cached plan; but I'm far from convinced that it's impossible for the
cached plan to have been dropped by the time control gets back to
PortalRunMulti.  Hence, backpatch to v11 where nested transactions
were added.

Thomas Munro and Tom Lane, per bug #16811 from James Inform

Discussion: https://postgr.es/m/16811-c1b599b2c6c2d622@postgresql.org
2021-02-03 19:38:43 -05:00
Tom Lane ba0faf81c6 Remove special BKI_LOOKUP magic for namespace and role OIDs.
Now that commit 62f34097c attached BKI_LOOKUP annotation to all the
namespace and role OID columns in the catalogs, there's no real reason
to have the magic PGNSP and PGUID symbols.  Get rid of them in favor
of implementing those lookups according to genbki.pl's normal pattern.

This means that in the catalog headers, BKI_DEFAULT(PGNSP) becomes
BKI_DEFAULT(pg_catalog), which seems a lot more transparent.
BKI_DEFAULT(PGUID) becomes BKI_DEFAULT(POSTGRES), which is perhaps
less so; but you can look into pg_authid.dat to discover that
POSTGRES is the nonce name for the bootstrap superuser.

This change also means that if we ever need cross-references in the
initial catalog data to any of the other built-in roles besides
POSTGRES, or to some other built-in schema besides pg_catalog,
we can just do it.

No catversion bump here, as there's no actual change in the contents
of postgres.bki.

Discussion: https://postgr.es/m/3240355.1612129197@sss.pgh.pa.us
2021-02-03 12:01:48 -05:00
Tom Lane 62f34097c8 Build in some knowledge about foreign-key relationships in the catalogs.
This follows in the spirit of commit dfb75e478, which created primary
key and uniqueness constraints to improve the visibility of constraints
imposed on the system catalogs.  While our catalogs contain many
foreign-key-like relationships, they don't quite follow SQL semantics,
in that the convention for an omitted reference is to write zero not
NULL.  Plus, we have some cases in which there are arrays each of whose
elements is supposed to be an FK reference; SQL has no way to model that.
So we can't create actual foreign key constraints to describe the
situation.  Nonetheless, we can collect and use knowledge about these
relationships.

This patch therefore adds annotations to the catalog header files to
declare foreign-key relationships.  (The BKI_LOOKUP annotations cover
simple cases, but we weren't previously distinguishing which such
columns are allowed to contain zeroes; we also need new markings for
multi-column FK references.)  Then, Catalog.pm and genbki.pl are
taught to collect this information into a table in a new generated
header "system_fk_info.h".  The only user of that at the moment is
a new SQL function pg_get_catalog_foreign_keys(), which exposes the
table to SQL.  The oidjoins regression test is rewritten to use
pg_get_catalog_foreign_keys() to find out which columns to check.
Aside from removing the need for manual maintenance of that test
script, this allows it to cover numerous relationships that were not
checked by the old implementation based on findoidjoins.  (As of this
commit, 217 relationships are checked by the test, versus 181 before.)

Discussion: https://postgr.es/m/3240355.1612129197@sss.pgh.pa.us
2021-02-02 17:11:55 -05:00
Peter Eisentraut 1d71f3c83c Improve confusing variable names
The prototype calls the second argument of
pgstat_progress_update_multi_param() "index", and some callers name
their local variable that way.  But when the surrounding code deals
with index relations, this is confusing, and in at least one case
shadowed another variable that is referring to an index relation.
Adjust those call sites to have clearer local variable naming, similar
to existing callers in indexcmds.c.
2021-02-02 09:20:22 +01:00
Michael Paquier 4ad31bb2ef Remove unused column atttypmod from initial tablesync query
The initial tablesync done by logical replication used a query to fetch
the information of a relation's columns that included atttypmod, but it
was left unused.  This was added by 7c4f524.

Author: Euler Taveira
Reviewed-by: Önder Kalacı, Amit Langote, Japin Li
Discussion: https://postgr.es/m/CAHE3wggb715X+mK_DitLXF25B=jE6xyNCH4YOwM860JR7HarGQ@mail.gmail.com
2021-02-02 13:59:23 +09:00
Tom Lane f003a7522b Remove [Merge]AppendPath.partitioned_rels.
It turns out that the calculation of [Merge]AppendPath.partitioned_rels
in allpaths.c is faulty and sometimes omits relevant non-leaf partitions,
allowing an assertion added by commit a929e17e5a to trigger.  Rather
than fix that, it seems better to get rid of those fields altogether.
We don't really need the info until create_plan time, and calculating
it once for the selected plan should be cheaper than calculating it
for each append path we consider.

The preceding two commits did away with all use of the partitioned_rels
values; this commit just mechanically removes the fields and the code
that calculated them.

Discussion: https://postgr.es/m/87sg8tqhsl.fsf@aurora.ydns.eu
Discussion: https://postgr.es/m/CAJKUy5gCXDSmFs2c=R+VGgn7FiYcLCsEFEuDNNLGfoha=pBE_g@mail.gmail.com
2021-02-01 14:43:54 -05:00
Tom Lane 5076f88bc9 Remove incidental dependencies on partitioned_rels lists.
It turns out that the calculation of [Merge]AppendPath.partitioned_rels
in allpaths.c is faulty and sometimes omits relevant non-leaf partitions,
allowing an assertion added by commit a929e17e5a to trigger.  Rather
than fix that, it seems better to get rid of those fields altogether.
We don't really need the info until create_plan time, and calculating
it once for the selected plan should be cheaper than calculating it
for each append path we consider.

This patch undoes a couple of very minor uses of the partitioned_rels
values.

createplan.c was testing for nil-ness to optimize away the preparatory
work for make_partition_pruneinfo().  That is worth doing if the check
is nigh free, but it's not worth going to any great lengths to avoid.

create_append_path() was testing for nil-ness as part of deciding how
to set up ParamPathInfo for an AppendPath.  I replaced that with a
check for the appendrel's parent rel being partitioned.  That's not
quite the same thing but should cover most cases.  If we note any
interesting loss of optimizations, we can dumb this down to just
always use the more expensive method when the parent is a baserel.

Discussion: https://postgr.es/m/87sg8tqhsl.fsf@aurora.ydns.eu
Discussion: https://postgr.es/m/CAJKUy5gCXDSmFs2c=R+VGgn7FiYcLCsEFEuDNNLGfoha=pBE_g@mail.gmail.com
2021-02-01 14:34:59 -05:00
Tom Lane fb2d645dd5 Revise make_partition_pruneinfo to not use its partitioned_rels input.
It turns out that the calculation of [Merge]AppendPath.partitioned_rels
in allpaths.c is faulty and sometimes omits relevant non-leaf partitions,
allowing an assertion added by commit a929e17e5a to trigger.  Rather
than fix that, it seems better to get rid of those fields altogether.
We don't really need the info until create_plan time, and calculating
it once for the selected plan should be cheaper than calculating it
for each append path we consider.

As a first step, teach make_partition_pruneinfo to collect the relevant
partitioned tables for itself.  It's not hard to do so by traversing
from child tables up to parents using the AppendRelInfo links.

While here, make some minor stylistic improvements; mainly, don't use
the "Relids" alias for bitmapsets that are not identities of any
relation considered by the planner.  Try to document the logic better,
too.

No backpatch, as there does not seem to be a live problem before
a929e17e5a.  Also no new regression test; the code where the bug
was will be gone at the end of this patch series, so it seems a
bit pointless to memorialize the issue.

Tom Lane and David Rowley, per reports from Andreas Seltenreich
and Jaime Casanova.

Discussion: https://postgr.es/m/87sg8tqhsl.fsf@aurora.ydns.eu
Discussion: https://postgr.es/m/CAJKUy5gCXDSmFs2c=R+VGgn7FiYcLCsEFEuDNNLGfoha=pBE_g@mail.gmail.com
2021-02-01 14:05:51 -05:00
Peter Eisentraut 3696a600e2 SEARCH and CYCLE clauses
This adds the SQL standard feature that adds the SEARCH and CYCLE
clauses to recursive queries to be able to do produce breadth- or
depth-first search orders and detect cycles.  These clauses can be
rewritten into queries using existing syntax, and that is what this
patch does in the rewriter.

Reviewed-by: Vik Fearing <vik@postgresfriends.org>
Reviewed-by: Pavel Stehule <pavel.stehule@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/db80ceee-6f97-9b4a-8ee8-3ba0c58e5be2@2ndquadrant.com
2021-02-01 14:32:51 +01:00
Alexander Korotkov bb513b364b Get rid of unnecessary memory allocation in jsonb_subscript_assign()
Current code allocates memory for JsonbValue, but it could be placed locally.
2021-02-01 14:06:02 +03:00
Michael Paquier fe61df7f82 Introduce --with-ssl={openssl} as a configure option
This is a replacement for the existing --with-openssl, extending the
logic to make easier the addition of new SSL libraries.  The grammar is
chosen to be similar to --with-uuid, where multiple values can be
chosen, with "openssl" as the only supported value for now.

The original switch, --with-openssl, is kept for compatibility.

Author: Daniel Gustafsson, Michael Paquier
Reviewed-by: Jacob Champion
Discussion: https://postgr.es/m/FAB21FC8-0F62-434F-AA78-6BD9336D630A@yesql.se
2021-02-01 19:19:44 +09:00
Tom Lane 7c5d57caed Fix portability issue in new jsonbsubs code.
On machines where sizeof(Datum) > sizeof(Oid) (that is, any 64-bit
platform), the previous coding would compute a misaligned
workspace->index pointer if nupper is odd.  Architectures where
misaligned access is a hard no-no would then fail.  This appears
to explain why thorntail is unhappy but other buildfarm members
are not.
2021-02-01 02:03:59 -05:00
Alexander Korotkov aa6e46daf5 Throw error when assigning jsonb scalar instead of a composite object
During the jsonb subscripting assignment, the provided path might assume an
object or an array where the source jsonb has a scalar value.  Initial
subscripting assignment logic will skip such an update operation with no
message shown.  This commit makes it throw an error to indicate this type
of situation.

Discussion: https://postgr.es/m/CA%2Bq6zcV8qvGcDXurwwgUbwACV86Th7G80pnubg42e-p9gsSf%3Dg%40mail.gmail.com
Discussion: https://postgr.es/m/CA%2Bq6zcX3mdxGCgdThzuySwH-ApyHHM-G4oB1R0fn0j2hZqqkLQ%40mail.gmail.com
Discussion: https://postgr.es/m/CA%2Bq6zcVDuGBv%3DM0FqBYX8DPebS3F_0KQ6OVFobGJPM507_SZ_w%40mail.gmail.com
Discussion: https://postgr.es/m/CA%2Bq6zcVovR%2BXY4mfk-7oNk-rF91gH0PebnNfuUjuuDsyHjOcVA%40mail.gmail.com
Author: Dmitry Dolgov
Reviewed-by: Tom Lane, Arthur Zakirov, Pavel Stehule, Dian M Fay
Reviewed-by: Andrew Dunstan, Chapman Flack, Merlin Moncure, Peter Geoghegan
Reviewed-by: Alvaro Herrera, Jim Nasby, Josh Berkus, Victor Wagner
Reviewed-by: Aleksander Alekseev, Robert Haas, Oleg Bartunov
2021-01-31 23:51:06 +03:00
Alexander Korotkov 81fcc72e66 Filling array gaps during jsonb subscripting
This commit introduces two new flags for jsonb assignment:

* JB_PATH_FILL_GAPS: Appending array elements on the specified position, gaps
  are filled with nulls (similar to the JavaScript behavior).  This mode also
  instructs to   create the whole path in a jsonb object if some part of the
  path (more than just the last element) is not present.

* JB_PATH_CONSISTENT_POSITION: Assigning keeps array positions consistent by
  preventing prepending of elements.

Both flags are used only in jsonb subscripting assignment.

Initially proposed by Nikita Glukhov based on polymorphic subscripting
patch, but transformed into an independent change.

Discussion: https://postgr.es/m/CA%2Bq6zcV8qvGcDXurwwgUbwACV86Th7G80pnubg42e-p9gsSf%3Dg%40mail.gmail.com
Discussion: https://postgr.es/m/CA%2Bq6zcX3mdxGCgdThzuySwH-ApyHHM-G4oB1R0fn0j2hZqqkLQ%40mail.gmail.com
Discussion: https://postgr.es/m/CA%2Bq6zcVDuGBv%3DM0FqBYX8DPebS3F_0KQ6OVFobGJPM507_SZ_w%40mail.gmail.com
Discussion: https://postgr.es/m/CA%2Bq6zcVovR%2BXY4mfk-7oNk-rF91gH0PebnNfuUjuuDsyHjOcVA%40mail.gmail.com
Author: Dmitry Dolgov
Reviewed-by: Tom Lane, Arthur Zakirov, Pavel Stehule, Dian M Fay
Reviewed-by: Andrew Dunstan, Chapman Flack, Merlin Moncure, Peter Geoghegan
Reviewed-by: Alvaro Herrera, Jim Nasby, Josh Berkus, Victor Wagner
Reviewed-by: Aleksander Alekseev, Robert Haas, Oleg Bartunov
2021-01-31 23:51:01 +03:00
Alexander Korotkov 676887a3b0 Implementation of subscripting for jsonb
Subscripting for jsonb does not support slices, does not have a limit for the
number of subscripts, and an assignment expects a replace value to have jsonb
type.  There is also one functional difference between assignment via
subscripting and assignment via jsonb_set().  When an original jsonb container
is NULL, the subscripting replaces it with an empty jsonb and proceeds with
an assignment.

For the sake of code reuse, we rearrange some parts of jsonb functionality
to allow the usage of the same functions for jsonb_set and assign subscripting
operation.

The original idea belongs to Oleg Bartunov.

Catversion is bumped.

Discussion: https://postgr.es/m/CA%2Bq6zcV8qvGcDXurwwgUbwACV86Th7G80pnubg42e-p9gsSf%3Dg%40mail.gmail.com
Discussion: https://postgr.es/m/CA%2Bq6zcX3mdxGCgdThzuySwH-ApyHHM-G4oB1R0fn0j2hZqqkLQ%40mail.gmail.com
Discussion: https://postgr.es/m/CA%2Bq6zcVDuGBv%3DM0FqBYX8DPebS3F_0KQ6OVFobGJPM507_SZ_w%40mail.gmail.com
Discussion: https://postgr.es/m/CA%2Bq6zcVovR%2BXY4mfk-7oNk-rF91gH0PebnNfuUjuuDsyHjOcVA%40mail.gmail.com
Author: Dmitry Dolgov
Reviewed-by: Tom Lane, Arthur Zakirov, Pavel Stehule, Dian M Fay
Reviewed-by: Andrew Dunstan, Chapman Flack, Merlin Moncure, Peter Geoghegan
Reviewed-by: Alvaro Herrera, Jim Nasby, Josh Berkus, Victor Wagner
Reviewed-by: Aleksander Alekseev, Robert Haas, Oleg Bartunov
2021-01-31 23:50:40 +03:00
Peter Geoghegan dc43492e46 Remove unused _bt_delitems_delete() argument.
The latestRemovedXid values used by nbtree deletion operations are
determined by _bt_delitems_delete()'s caller, so there is no reason to
pass a separate heapRel argument.

Oversight in commit d168b66682.
2021-01-31 10:10:55 -08:00
Alexander Korotkov 0c4f355c6a Fix parsing of complex morphs to tsquery
When to_tsquery() or websearch_to_tsquery() meet a complex morph containing
multiple words residing adjacent position, these words are connected
with OP_AND operator.  That leads to surprising results.  For instace,
both websearch_to_tsquery('"pg_class pg"') and to_tsquery('pg_class <-> pg')
produce '( pg & class ) <-> pg' tsquery.  This tsquery requires
'pg' and 'class' words to reside on the same position and doesn't match
to to_tsvector('pg_class pg').  It appears to be ridiculous behavior, which
needs to be fixed.

This commit makes to_tsquery() or websearch_to_tsquery() connect words
residing adjacent position with OP_PHRASE.  Therefore, now those words are
normally chained with other OP_PHRASE operator.  The examples of above now
produces 'pg <-> class <-> pg' tsquery, which matches to
to_tsvector('pg_class pg').

Another effect of this commit is that complex morph word positions now need to
match the tsvector even if there is no surrounding OP_PHRASE.  This behavior
change generally looks like an improvement but making this commit not
backpatchable.

Reported-by: Barry Pederson
Bug: #16592
Discussion: https://postgr.es/m/16592-70b110ff9731c07d@postgresql.org
Discussion: https://postgr.es/m/CAPpHfdv0EzVhf6CWfB1_TTZqXV_2Sn-jSY3zSd7ePH%3D-%2B1V2DQ%40mail.gmail.com
Author: Alexander Korotkov
Reviewed-by: Tom Lane, Neil Chen
2021-01-31 20:14:29 +03:00
Peter Eisentraut dfb75e478c Add primary keys and unique constraints to system catalogs
For those system catalogs that have a unique indexes, make a primary
key and unique constraint, using ALTER TABLE ... PRIMARY KEY/UNIQUE
USING INDEX.

This can be helpful for GUI tools that look for a primary key, and it
might in the future allow declaring foreign keys, for making schema
diagrams.

The constraint creation statements are automatically created by
genbki.pl from DECLARE_UNIQUE_INDEX directives.  To specify which one
of the available unique indexes is the primary key, use the new
directive DECLARE_UNIQUE_INDEX_PKEY instead.  By convention, we
usually make a catalog's OID column its primary key, if it has one.

Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://www.postgresql.org/message-id/flat/dc5f44d9-5ec1-a596-0251-dadadcdede98@2ndquadrant.com
2021-01-30 19:44:29 +01:00
Peter Eisentraut 6aaaa76bb4 Allow GRANTED BY clause in normal GRANT and REVOKE statements
The SQL standard allows a GRANTED BY clause on GRANT and
REVOKE (privilege) statements that can specify CURRENT_USER or
CURRENT_ROLE.  In PostgreSQL, both of these are the default behavior.
Since we already have all the parsing support for this for the
GRANT (role) statement, we might as well add basic support for this
for the privilege variant as well.  This allows us to check off SQL
feature T332.  In the future, perhaps more interesting things could be
done with this, too.

Reviewed-by: Simon Riggs <simon@2ndquadrant.com>
Discussion: https://www.postgresql.org/message-id/flat/f2feac44-b4c5-f38f-3699-2851d6a76dc9@2ndquadrant.com
2021-01-30 09:45:11 +01:00
Noah Misch 7da83415e5 Revive "snapshot too old" with wal_level=minimal and SET TABLESPACE.
Given a permanent relation rewritten in the current transaction, the
old_snapshot_threshold mechanism assumed the relation had never been
subject to early pruning.  Hence, a query could fail to report "snapshot
too old" when the rewrite followed an early truncation.  ALTER TABLE SET
TABLESPACE is probably the only rewrite mechanism capable of exposing
this bug.  REINDEX sets indcheckxmin, avoiding the problem.  CLUSTER has
zeroed page LSNs since before old_snapshot_threshold existed, so
old_snapshot_threshold has never cooperated with it.  ALTER TABLE
... SET DATA TYPE makes the table look empty to every past snapshot,
which is strictly worse.  Back-patch to v13, where commit
c6b92041d3 broke this.

Kyotaro Horiguchi and Noah Misch

Discussion: https://postgr.es/m/20210113.160705.2225256954956139776.horikyota.ntt@gmail.com
2021-01-30 00:12:18 -08:00
Noah Misch 360bd2321b Fix error with CREATE PUBLICATION, wal_level=minimal, and new tables.
CREATE PUBLICATION has failed spuriously when applied to a permanent
relation created or rewritten in the current transaction.  Make the same
change to another site having the same semantic intent; the second
instance has no user-visible consequences.  Back-patch to v13, where
commit c6b92041d3 broke this.

Kyotaro Horiguchi

Discussion: https://postgr.es/m/20210113.160705.2225256954956139776.horikyota.ntt@gmail.com
2021-01-30 00:11:38 -08:00
Noah Misch 8a54e12a38 Fix CREATE INDEX CONCURRENTLY for simultaneous prepared transactions.
In a cluster having used CREATE INDEX CONCURRENTLY while having enabled
prepared transactions, queries that use the resulting index can silently
fail to find rows.  Fix this for future CREATE INDEX CONCURRENTLY by
making it wait for prepared transactions like it waits for ordinary
transactions.  This expands the VirtualTransactionId structure domain to
admit prepared transactions.  It may be necessary to reindex to recover
from past occurrences.  Back-patch to 9.5 (all supported versions).

Andrey Borodin, reviewed (in earlier versions) by Tom Lane and Michael
Paquier.

Discussion: https://postgr.es/m/2E712143-97F7-4890-B470-4A35142ABC82@yandex-team.ru
2021-01-30 00:00:27 -08:00
Michael Paquier 24843297a9 Adjust comments of CheckRelationTableSpaceMove() and SetRelationTableSpace()
4c9c359, that introduced those two functions, has been overoptimistic on
the point that only ShareUpdateExclusiveLock would be required when
moving a relation to a new tablespace.  AccessExclusiveLock is a
requirement, but ShareUpdateExclusiveLock may be used under specific
conditions like REINDEX CONCURRENTLY where waits on past transactions
make the operation safe even with a lower-level lock.  The current code
does only the former, so update the existing comments to reflect that.

Once a REINDEX (TABLESPACE) is introduced, those comments would require
an extra refresh to mention their new use case.

While on it, fix an incorrect variable name.

Per discussion with Álvaro Herrera.

Discussion: https://postgr.es/m/20210127140741.GA14174@alvherre.pgsql
2021-01-29 13:59:18 +09:00
Thomas Munro 514b411a2b Retire pg_standby.
pg_standby was useful more than a decade ago, but now it is obsolete.
It has been proposed that we retire it many times.  Now seems like a
good time to finally do it, because "waiting restore commands"
are incompatible with a proposed recovery prefetching feature.

Discussion: https://postgr.es/m/20201029024412.GP5380%40telsasoft.com
Author: Justin Pryzby <pryzby@telsasoft.com>
Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi>
Reviewed-by: Peter Eisentraut <peter.eisentraut@enterprisedb.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Fujii Masao <masao.fujii@oss.nttdata.com>
2021-01-29 14:09:41 +13:00
Tom Lane 1046dbedde Silence another gcc 11 warning.
Per buildfarm and local experimentation, bleeding-edge gcc isn't
convinced that the MemSet in reorder_function_arguments() is safe.
Shut it up by adding an explicit check that pronargs isn't negative,
and by changing MemSet to memset.  (It appears that either change is
enough to quiet the warning at -O2, but let's do both to be sure.)
2021-01-28 17:19:16 -05:00
Alvaro Herrera 6f5c8a8ec2
Remove bogus restriction from BEFORE UPDATE triggers
In trying to protect the user from inconsistent behavior, commit
487e9861d0 "Enable BEFORE row-level triggers for partitioned tables"
tried to prevent BEFORE UPDATE FOR EACH ROW triggers from moving the row
from one partition to another.  However, it turns out that the
restriction is wrong in two ways: first, it fails spuriously, preventing
valid situations from working, as in bug #16794; and second, they don't
protect from any misbehavior, because tuple routing would cope anyway.

Fix by removing that restriction.

We keep the same restriction on BEFORE INSERT FOR EACH ROW triggers,
though.  It is valid and useful there.  In the future we could remove it
by having tuple reroute work for inserts as it does for updates.

Backpatch to 13.

Author: Álvaro Herrera <alvherre@alvh.no-ip.org>
Reported-by: Phillip Menke <pg@pmenke.de>
Discussion: https://postgr.es/m/16794-350a655580fbb9ae@postgresql.org
2021-01-28 16:56:07 -03:00
Tom Lane 1d9351a87c Fix hash partition pruning with asymmetric partition sets.
perform_pruning_combine_step() was not taught about the number of
partition indexes used in hash partitioning; more embarrassingly,
get_matching_hash_bounds() also had it wrong.  These errors are masked
in the common case where all the partitions have the same modulus
and no partition is missing.  However, with missing or unequal-size
partitions, we could erroneously prune some partitions that need
to be scanned, leading to silently wrong query answers.

While a minimal-footprint fix for this could be to export
get_partition_bound_num_indexes and make the incorrect functions use it,
I'm of the opinion that that function should never have existed in the
first place.  It's not reasonable data structure design that
PartitionBoundInfoData lacks any explicit record of the length of
its indexes[] array.  Perhaps that was all right when it could always
be assumed equal to ndatums, but something should have been done about
it as soon as that stopped being true.  Putting in an explicit
"nindexes" field makes both partition_bounds_equal() and
partition_bounds_copy() simpler, safer, and faster than before,
and removes explicit knowledge of the number-of-partition-indexes
rules from some other places too.

This change also makes get_hash_partition_greatest_modulus obsolete.
I left that in place in case any external code uses it, but no core
code does anymore.

Per bug #16840 from Michał Albrycht.  Back-patch to v11 where the
hash partitioning code came in.  (In the back branches, add the new
field at the end of PartitionBoundInfoData to minimize ABI risks.)

Discussion: https://postgr.es/m/16840-571a22976f829ad4@postgresql.org
2021-01-28 13:41:55 -05:00
Heikki Linnakangas 6c5576075b Add direct conversion routines between EUC_TW and Big5.
Conversions between EUC_TW and Big5 were previously implemented by
converting the whole input to MIC first, and then from MIC to the target
encoding. Implement functions to convert directly between the two.

The reason to do this now is that I'm working on a patch that will change
the conversion function signature so that if the input is invalid, we
convert as much as we can and return the number of bytes successfully
converted. That's not possible if we use an intermediary format, because
if an error happens in the intermediary -> final conversion, we lose track
of the location of the invalid character in the original input. Avoiding
the intermediate step makes the conversions faster, too.

Reviewed-by: John Naylor
Discussion: https://www.postgresql.org/message-id/b9e3167f-f84b-7aa4-5738-be578a4db924%40iki.fi
2021-01-28 14:53:03 +02:00
Heikki Linnakangas b80e10638e Add mbverifystr() functions specific to each encoding.
This makes pg_verify_mbstr() function faster, by allowing more efficient
encoding-specific implementations. All the implementations included in
this commit are pretty naive, they just call the same encoding-specific
verifychar functions that were used previously, but that already gives a
performance boost because the tight character-at-a-time loop is simpler.

Reviewed-by: John Naylor
Discussion: https://www.postgresql.org/message-id/e7861509-3960-538a-9025-b75a61188e01@iki.fi
2021-01-28 14:40:07 +02:00
Andrew Gierth a3367aa3c4 Don't add bailout adjustment for non-strict deserialize calls.
When building aggregate expression steps, strict checks need a bailout
jump for when a null value is encountered, so there is a list of steps
that require later adjustment. Adding entries to that list for steps
that aren't actually strict would be harmless, except that there is an
Assert which catches them. This leads to spurious errors on asserts
builds, for data sets that trigger parallel aggregation of an
aggregate with a non-strict deserialization function (no such
aggregates exist in the core system).

Repair by not adding the adjustment entry when it's not needed.

Backpatch back to 11 where the code was introduced.

Per a report from Darafei (Komzpa) of the PostGIS project; analysis
and patch by me.

Discussion: https://postgr.es/m/87mty7peb3.fsf@news-spur.riddles.org.uk
2021-01-28 10:53:10 +00:00
Michael Paquier f854c69a5b Refactor SQL functions of SHA-2 in cryptohashfuncs.c
The same code pattern was repeated four times when compiling a SHA-2
hash.  This refactoring has the advantage to issue a compilation warning
if a new value is added to pg_cryptohash_type, so as anybody doing an
addition in this area would need to consider if support for a new SQL
function is needed or not.

Author: Sehrope Sarkuni, Michael Paquier
Discussion: https://postgr.es/m/YA7DvLRn2xnTgsMc@paquier.xyz
2021-01-28 16:13:26 +09:00
Peter Geoghegan e19594c5c0 Reduce the default value of vacuum_cost_page_miss.
When commit f425b605 introduced cost based vacuum delays back in 2004,
the defaults reflected then-current trends in hardware, as well as
certain historical limitations in PostgreSQL.  There have been enormous
improvements in both areas since that time.  The cost limit GUC defaults
finally became much more representative of current trends following
commit cbccac37, which decreased autovacuum_vacuum_cost_delay's default
by 10x for PostgreSQL 12 (it went from 20ms to only 2ms).

The relative costs have shifted too.  This should also be accounted for
by the defaults.  More specifically, the relative importance of avoiding
dirtying pages within VACUUM has greatly increased, primarily due to
main memory capacity scaling and trends in flash storage.  Within
Postgres itself, improvements like sequential access during index
vacuuming (at least in nbtree and GiST indexes) have also been
contributing factors.

To reflect all this, decrease the default of vacuum_cost_page_miss to 2.
Since the default of vacuum_cost_page_dirty remains 20, dirtying a page
is now considered 10x "costlier" than a page miss by default.

Author: Peter Geoghegan <pg@bowt.ie>
Discussion: https://postgr.es/m/CAH2-WzmLPFnkWT8xMjmcsm7YS3+_Qi3iRWAb2+_Bc8UhVyHfuA@mail.gmail.com
2021-01-27 15:11:13 -08:00
Robert Haas 69059d3b2f In TrimCLOG(), don't reset XactCtl->shared->latest_page_number.
Since the CLOG page number is not recorded directly in the checkpoint
record, we have to use ShmemVariableCache->nextXid to figure out the
latest CLOG page number at the start of recovery. However, as recovery
progresses, replay of CLOG/EXTEND records will update our notion of
the latest page number, and we should rely on that being accurate
rather than recomputing the value based on an updated notion of
nextXid. ShmemVariableCache->nextXid is only an approximation
during recovery anyway, whereas CLOG/EXTEND records are an
authoritative representation of how the SLRU has been updated.

Commit 0fcc2decd4 makes this
simplification possible, as before that change clog_redo() might
have injected a bogus value here, and we'd want to get rid of
that before entering normal running.

Patch by me, reviewed by Heikki Linnakangas.

Discussion: http://postgr.es/m/CA+TgmoZYig9+AQodhF5sRXuKkJ=RgFDugLr3XX_dz_F-p=TwTg@mail.gmail.com
2021-01-27 15:52:34 -05:00
Robert Haas 0fcc2decd4 In clog_redo(), don't set XactCtl->shared->latest_page_number.
The comment is no longer accurate, and hasn't been entirely accurate
since Hot Standby was introduced. The original idea here was that
StartupCLOG() wouldn't be called until the end of recovery and
therefore this value would be uninitialized when this code is reached,
but Hot Standby made that true only when hot_standby=off, and commit
1f113abdf8 means that this value is now
always initialized before replay even starts.

The original purpose of this code was to bypass the sanity check
in SimpleLruTruncate(), which will no longer occur: now, if something
is wrong, that sanity check might trip during recovery. That's
probably a good thing, because in the current code base
latest_page_number should always be initialized and therefore we
expect that the sanity check should pass. If it doesn't, something
has gone wrong, and complaining about it is appropriate.

Patch by me, reviewed by Heikki Linnakangas.

Discussion: http://postgr.es/m/CA+TgmoZYig9+AQodhF5sRXuKkJ=RgFDugLr3XX_dz_F-p=TwTg@mail.gmail.com
2021-01-27 13:11:30 -05:00
Robert Haas 1f113abdf8 Move StartupCLOG() calls to just after we initialize ShmemVariableCache.
Previously, the hot_standby=off code path did this at end of recovery,
while the hot_standby=on code path did it at the beginning of recovery.
It's better to do this in only one place because (a) it's simpler,
(b) StartupCLOG() is trivial so trying to postpone the work isn't
useful, and (c) this will make it possible to simplify some other
logic.

Patch by me, reviewed by Heikki Linnakangas.

Discussion: http://postgr.es/m/CA+TgmoZYig9+AQodhF5sRXuKkJ=RgFDugLr3XX_dz_F-p=TwTg@mail.gmail.com
2021-01-27 12:20:46 -05:00
Peter Geoghegan e42b3c3bd6 Fix GiST index deletion assert issue.
Avoid calling heap_index_delete_tuples() with an empty deltids array to
avoid an assertion failure.

This issue was arguably an oversight in commit b5f58cf2, though the
failing assert itself was added by my recent commit d168b666.  No
backpatch, though, since the oversight is harmless in the back branches.

Author: Peter Geoghegan <pg@bowt.ie>
Reported-By: Jaime Casanova <jcasanov@systemguards.com.ec>
Discussion: https://postgr.es/m/CAJKUy5jscES84n3puE=sYngyF+zpb4wv8UMtuLnLPv5z=6yyNw@mail.gmail.com
2021-01-26 23:24:37 -08:00
Michael Paquier 4c9c359d38 Refactor code in tablecmds.c to check and process tablespace moves
Two code paths of tablecmds.c (for relations with storage and without
storage) use the same logic to check if the move of a relation to a
new tablespace is allowed or not and to update pg_class.reltablespace
and pg_class.relfilenode.

A potential TABLESPACE clause for REINDEX, CLUSTER and VACUUM FULL needs
similar checks to make sure that nothing is moved around in illegal ways
(no mapped relations, shared relations only in pg_global, no move of
temp tables owned by other backends).

This reorganizes the existing code of ALTER TABLE so as all this logic
is controlled by two new routines that can be reused for the other
commands able to move relations across tablespaces, limiting the number
of code paths in need of the same protections.  This also removes some
code that was duplicated for tables with and without storage for ALTER
TABLE.

Author: Alexey Kondratov, Michael Paquier
Discussion: https://postgr.es/m/YA+9mAMWYLXJMVPL@paquier.xyz
2021-01-27 11:54:16 +09:00
Tom Lane d5a83d79c9 Rethink recently-added SPI interfaces.
SPI_execute_with_receiver and SPI_cursor_parse_open_with_paramlist are
new in v14 (cf. commit 2f48ede08).  Before they can get out the door,
let's change their APIs to follow the practice recently established by
SPI_prepare_extended etc: shove all optional arguments into a struct
that callers are supposed to pre-zero.  The hope is to allow future
addition of more options without either API breakage or a continuing
proliferation of new SPI entry points.  With that in mind, choose
slightly more generic names for them: SPI_execute_extended and
SPI_cursor_parse_open respectively.

Discussion: https://postgr.es/m/CAFj8pRCLPdDAETvR7Po7gC5y_ibkn_-bOzbeJb39WHms01194Q@mail.gmail.com
2021-01-26 16:37:12 -05:00
Tom Lane ee895a655c Improve performance of repeated CALLs within plpgsql procedures.
This patch essentially is cleaning up technical debt left behind
by the original implementation of plpgsql procedures, particularly
commit d92bc83c4.  That patch (or more precisely, follow-on patches
fixing its worst bugs) forced us to re-plan CALL and DO statements
each time through, if we're in a non-atomic context.  That wasn't
for any fundamental reason, but just because use of a saved plan
requires having a ResourceOwner to hold a reference count for the
plan, and we had no suitable resowner at hand, nor would the
available APIs support using one if we did.  While it's not that
expensive to create a "plan" for CALL/DO, the cycles do add up
in repeated executions.

This patch therefore makes the following API changes:

* GetCachedPlan/ReleaseCachedPlan are modified to let the caller
specify which resowner to use to pin the plan, rather than forcing
use of CurrentResourceOwner.

* spi.c gains a "SPI_execute_plan_extended" entry point that lets
callers say which resowner to use to pin the plan.  This borrows the
idea of an options struct from the recently added SPI_prepare_extended,
hopefully allowing future options to be added without more API breaks.
This supersedes SPI_execute_plan_with_paramlist (which I've marked
deprecated) as well as SPI_execute_plan_with_receiver (which is new
in v14, so I just took it out altogether).

* I also took the opportunity to remove the crude hack of letting
plpgsql reach into SPI private data structures to mark SPI plans as
"no_snapshot".  It's better to treat that as an option of
SPI_prepare_extended.

Now, when running a non-atomic procedure or DO block that contains
any CALL or DO commands, plpgsql creates a ResourceOwner that
will be used to pin the plans of the CALL/DO commands.  (In an
atomic context, we just use CurrentResourceOwner, as before.)
Having done this, we can just save CALL/DO plans normally,
whether or not they are used across transaction boundaries.
This seems to be good for something like 2X speedup of a CALL
of a trivial procedure with a few simple argument expressions.
By restricting the creation of an extra ResourceOwner like this,
there's essentially zero penalty in cases that can't benefit.

Pavel Stehule, with some further hacking by me

Discussion: https://postgr.es/m/CAFj8pRCLPdDAETvR7Po7gC5y_ibkn_-bOzbeJb39WHms01194Q@mail.gmail.com
2021-01-25 22:28:29 -05:00
Andres Freund 55ef8555f0 Fix two typos in snapbuild.c.
Reported-by: Heikki Linnakangas <hlinnaka@iki.fi>
Discussion: https://postgr.es/m/c94be044-818f-15e3-1ad3-7a7ae2dfed0a@iki.fi
2021-01-25 12:15:10 -08:00
Tom Lane 07d46fceb4 Fix broken ruleutils support for function TRANSFORM clauses.
I chanced to notice that this dumped core due to a faulty Assert.
To add insult to injury, the output has been misformatted since v11.
Obviously we need some regression testing here.

Discussion: https://postgr.es/m/d1cc628c-3953-4209-957b-29427acc38c8@www.fastmail.com
2021-01-25 13:03:43 -05:00
Robert Haas d18e75664a Remove CheckpointLock.
Up until now, we've held this lock when performing a checkpoint or
restartpoint, but commit 076a055acf back
in 2004 and commit 7e48b77b1c from 2009,
taken together, have removed all need for this. In the present code,
there's only ever one process entitled to attempt a checkpoint: either
the checkpointer, during normal operation, or the postmaster, during
single-user operation. So, we don't need the lock.

One possible concern in making this change is that it means that
a substantial amount of code where HOLD_INTERRUPTS() was previously
in effect due to the preceding LWLockAcquire() will now be
running without that. This could mean that ProcessInterrupts()
gets called in places from which it didn't before. However, this
seems unlikely to do very much, because the checkpointer doesn't
have any signal mapped to die(), so it's not clear how,
for example, ProcDiePending = true could happen in the first
place. Similarly with ClientConnectionLost and recovery conflicts.

Also, if there are any such problems, we might want to fix them
rather than reverting this, since running lots of code with
interrupt handling suspended is generally bad.

Patch by me, per an inquiry by Amul Sul. Review by Tom Lane
and Michael Paquier.

Discussion: http://postgr.es/m/CAAJ_b97XnBBfYeSREDJorFsyoD1sHgqnNuCi=02mNQBUMnA=FA@mail.gmail.com
2021-01-25 12:34:38 -05:00
David Rowley 16dfe253e3 Fix hypothetical bug in heap backward scans
Both heapgettup() and heapgettup_pagemode() incorrectly set the first page
to scan in a backward scan in which the number of pages to scan was
specified by heap_setscanlimits().  The code incorrectly started the scan
at the end of the relation when startBlk was 0, or otherwise at
startBlk - 1, neither of which is correct when only scanning a subset of
pages.

The fix here checks if heap_setscanlimits() has changed the number of
pages to scan and if so we set the first page to scan as the final page in
the specified range during backward scans.

Proper adjustment of this code was forgotten when heap_setscanlimits() was
added in 7516f5259 back in 9.5.  However, practice, nowhere in core code
performs backward scans after having used heap_setscanlimits(), yet, it is
possible an extension uses the heap functions in this way, hence
backpatch.

An upcoming patch does use heap_setscanlimits() with backward scans, so
this must be fixed before that can go in.

Author: David Rowley
Discussion: https://postgr.es/m/CAApHDvpGc9h0_oVD2CtgBcxCS1N-qDYZSeBRnUh+0CWJA9cMaA@mail.gmail.com
Backpatch-through: 9.5, all supported versions
2021-01-25 19:52:18 +13:00
Amit Kapila 40ab64c1ec Fix ALTER PUBLICATION...DROP TABLE behavior.
Commit 69bd60672 fixed the initialization of streamed transactions for
RelationSyncEntry. It forgot to initialize the publication actions while
invalidating the RelationSyncEntry due to which even though the relation
is dropped from a particular publication we still publish its changes. Fix
it by initializing pubactions when entry got invalidated.

Author: Japin Li and Bharath Rupireddy
Reviewed-by: Amit Kapila
Discussion: https://postgr.es/m/CALj2ACV+0UFpcZs5czYgBpujM9p0Hg1qdOZai_43OU7bqHU_xw@mail.gmail.com
2021-01-25 07:39:29 +05:30
Tomas Vondra 39b66a91bd Fix COPY FREEZE with CLOBBER_CACHE_ALWAYS
This adds code omitted from commit 7db0cd2145 by accident, which had
two consequences. Firstly, only rows inserted by heap_multi_insert were
frozen as expected when running COPY FREEZE, while heap_insert left
rows unfrozen. That however includes rows in TOAST tables, so a lot of
data might have been left unfrozen. Secondly, page might have been left
partially empty after relcache invalidation.

This addresses both of those issues.

Discussion: https://postgr.es/m/CABOikdN-ptGv0mZntrK2Q8OtfUuAjqaYMGmkdU1dCKFtUxVLrg@mail.gmail.com
2021-01-24 01:08:11 +01:00
Tom Lane 7cd9765f9b Re-allow DISTINCT in pl/pgsql expressions.
I'd omitted this from the grammar in commit c9d529848, figuring that
it wasn't worth supporting.  However we already have one complaint,
so it seems that judgment was wrong.  It doesn't require a huge
amount of code, so add it back.  (I'm still drawing the line at
UNION/INTERSECT/EXCEPT though: those'd require an unreasonable
amount of grammar refactoring, and the single-result-row restriction
makes them near useless anyway.)

Also rethink the documentation: this behavior is a property of
all pl/pgsql expressions, not just assignments.

Discussion: https://postgr.es/m/20210122134106.e94c5cd7@mail.verfriemelt.org
2021-01-22 16:26:22 -05:00
Peter Eisentraut 09418bed67 Remove bogus tracepoint
Calls to LWLockWaitForVar() fired the TRACE_POSTGRESQL_LWLOCK_ACQUIRE
tracepoint, but LWLockWaitForVar() never actually acquires the LWLock.
(Probably a copy/paste bug in 68a2e52bbaf.)  Remove it.

Author: Craig Ringer <craig.ringer@enterprisedb.com>
Discussion: https://www.postgresql.org/message-id/flat/CAGRY4nxJo+-HCC2i5H93ttSZ4gZO-FSddCwvkb-qAfQ1zdXd1w@mail.gmail.com
2021-01-22 11:58:21 +01:00
Michael Paquier af0e79c8f4 Move SSL information callback earlier to capture more information
The callback for retrieving state change information during connection
setup was only installed when the connection was mostly set up, and
thus didn't provide much information and missed all the details related
to the handshake.

This also extends the callback with SSL_state_string_long() to print
more information about the state change within the SSL object handled.

While there, fix some comments which were incorrectly referring to the
callback and its previous location in fe-secure.c.

Author: Daniel Gustafsson
Discussion: https://postgr.es/m/232CF476-94E1-42F1-9408-719E2AEC5491@yesql.se
2021-01-22 09:26:27 +09:00
Tom Lane 55dc86eca7 Fix pull_varnos' miscomputation of relids set for a PlaceHolderVar.
Previously, pull_varnos() took the relids of a PlaceHolderVar as being
equal to the relids in its contents, but that fails to account for the
possibility that we have to postpone evaluation of the PHV due to outer
joins.  This could result in a malformed plan.  The known cases end up
triggering the "failed to assign all NestLoopParams to plan nodes"
sanity check in createplan.c, but other symptoms may be possible.

The right value to use is the join level we actually intend to evaluate
the PHV at.  We can get that from the ph_eval_at field of the associated
PlaceHolderInfo.  However, there are some places that call pull_varnos()
before the PlaceHolderInfos have been created; in that case, fall back
to the conservative assumption that the PHV will be evaluated at its
syntactic level.  (In principle this might result in missing some legal
optimization, but I'm not aware of any cases where it's an issue in
practice.)  Things are also a bit ticklish for calls occurring during
deconstruct_jointree(), but AFAICS the ph_eval_at fields should have
reached their final values by the time we need them.

The main problem in making this work is that pull_varnos() has no
way to get at the PlaceHolderInfos.  We can fix that easily, if a
bit tediously, in HEAD by passing it the planner "root" pointer.
In the back branches that'd cause an unacceptable API/ABI break for
extensions, so leave the existing entry points alone and add new ones
with the additional parameter.  (If an old entry point is called and
encounters a PHV, it'll fall back to using the syntactic level,
again possibly missing some valid optimization.)

Back-patch to v12.  The computation is surely also wrong before that,
but it appears that we cannot reach a bad plan thanks to join order
restrictions imposed on the subquery that the PlaceHolderVar came from.
The error only became reachable when commit 4be058fe9 allowed trivial
subqueries to be collapsed out completely, eliminating their join order
restrictions.

Per report from Stephan Springl.

Discussion: https://postgr.es/m/171041.1610849523@sss.pgh.pa.us
2021-01-21 15:37:23 -05:00
Tomas Vondra 920f853dc9 Fix initialization of FDW batching in ExecInitModifyTable
ExecInitModifyTable has to initialize batching for all result relations,
not just the first one. Furthermore, when junk filters were necessary,
the pointer pointed past the mtstate->resultRelInfo array.

Per reports from multiple non-x86 animals (florican, locust, ...).

Discussion: https://postgr.es/m/20200628151002.7x5laxwpgvkyiu3q@development
2021-01-21 03:34:32 +01:00
Tomas Vondra b663a41363 Implement support for bulk inserts in postgres_fdw
Extends the FDW API to allow batching inserts into foreign tables. That
is usually much more efficient than inserting individual rows, due to
high latency for each round-trip to the foreign server.

It was possible to implement something similar in the regular FDW API,
but it was inconvenient and there were issues with reporting the number
of actually inserted rows etc. This extends the FDW API with two new
functions:

* GetForeignModifyBatchSize - allows the FDW picking optimal batch size

* ExecForeignBatchInsert - inserts a batch of rows at once

Currently, only INSERT queries support batching. Support for DELETE and
UPDATE may be added in the future.

This also implements batching for postgres_fdw. The batch size may be
specified using "batch_size" option both at the server and table level.

The initial patch version was written by me, but it was rewritten and
improved in many ways by Takayuki Tsunakawa.

Author: Takayuki Tsunakawa
Reviewed-by: Tomas Vondra, Amit Langote
Discussion: https://postgr.es/m/20200628151002.7x5laxwpgvkyiu3q@development
2021-01-20 23:57:27 +01:00
Heikki Linnakangas 6b4d3046f4 Fix bug in detecting concurrent page splits in GiST insert
In commit 9eb5607e69, I got the condition on checking for split or
deleted page wrong: I used && instead of ||. The comment correctly said
"concurrent split _or_ deletion".

As a result, GiST insertion could miss a concurrent split, and insert to
wrong page. Duncan Sands demonstrated this with a test script that did a
lot of concurrent inserts.

Backpatch to v12, where this was introduced. REINDEX is required to fix
indexes that were affected by this bug.

Backpatch-through: 12
Reported-by: Duncan Sands
Discussion: https://www.postgresql.org/message-id/a9690483-6c6c-3c82-c8ba-dc1a40848f11%40deepbluecap.com
2021-01-20 11:58:03 +02:00
Michael Paquier 21378e1fef Fix ALTER DEFAULT PRIVILEGES with duplicated objects
Specifying duplicated objects in this command would lead to unique
constraint violations in pg_default_acl or "tuple already updated by
self" errors.  Similarly to GRANT/REVOKE, increment the command ID after
each subcommand processing to allow this case to work transparently.

A regression test is added by tweaking one of the existing queries of
privileges.sql to stress this case.

Reported-by: Andrus
Author: Michael Paquier
Reviewed-by: Álvaro Herrera
Discussion: https://postgr.es/m/ae2a7dc1-9d71-8cba-3bb9-e4cb7eb1f44e@hot.ee
Backpatch-through: 9.5
2021-01-20 11:38:17 +09:00
Tom Lane a0efda88a6 Remove faulty support for MergeAppend plan with WHERE CURRENT OF.
Somebody extended search_plan_tree() to treat MergeAppend exactly
like Append, which is 100% wrong, because unlike Append we can't
assume that only one input node is actively returning tuples.
Hence a cursor using a MergeAppend across a UNION ALL or inheritance
tree could falsely match a WHERE CURRENT OF query at a row that
isn't actually the cursor's current output row, but coincidentally
has the same TID (in a different table) as the current output row.

Delete the faulty code; this means that such a case will now return
an error like 'cursor "foo" is not a simply updatable scan of table
"bar"', instead of silently misbehaving.  Users should not find that
surprising though, as the same cursor query could have failed that way
already depending on the chosen plan.  (It would fail like that if the
sort were done with an explicit Sort node instead of MergeAppend.)

Expand the clearly-inadequate commentary to be more explicit about
what this code is doing, in hopes of forestalling future mistakes.

It's been like this for awhile, so back-patch to all supported
branches.

Discussion: https://postgr.es/m/482865.1611075182@sss.pgh.pa.us
2021-01-19 13:25:33 -05:00
Amit Kapila ed43677e20 pgindent worker.c.
This is a leftover from commit 0926e96c49. Changing this separately
because this file is being modified for upcoming patch logical replication
of 2PC.

Author: Peter Smith
Discussion: https://postgr.es/m/CAHut+Ps+EgG8KzcmAyAgBUi_vuTps6o9ZA8DG6SdnO0-YuOhPQ@mail.gmail.com
2021-01-19 08:10:13 +05:30
Tom Lane 60661bbf2d Avoid crash with WHERE CURRENT OF and a custom scan plan.
execCurrent.c's search_plan_tree() assumed that ForeignScanStates
and CustomScanStates necessarily have a valid ss_currentRelation.
This is demonstrably untrue for postgres_fdw's remote join and
remote aggregation plans, and non-leaf custom scans might not have
an identifiable scan relation either.  Avoid crashing by ignoring
such nodes when the field is null.

This solution will lead to errors like 'cursor "foo" is not a
simply updatable scan of table "bar"' in cases where maybe we
could have allowed WHERE CURRENT OF to work.  That's not an issue
for postgres_fdw's usages, since joins or aggregations would render
WHERE CURRENT OF invalid anyway.  But an otherwise-transparent
upper level custom scan node might find this annoying.  When and if
someone cares to expend work on such a scenario, we could invent a
custom-scan-provider callback to determine what's safe.

Report and patch by David Geier, commentary by me.  It's been like
this for awhile, so back-patch to all supported branches.

Discussion: https://postgr.es/m/0253344d-9bdd-11c4-7f0d-d88c02cd7991@swarm64.com
2021-01-18 18:32:30 -05:00
Tom Lane 3fd80c728d Narrow the scope of a local variable.
This is better style and more symmetrical with the other if-branch.
This likely should have been included in 9de77b545 (which created
the opportunity), but it was overlooked.

Japin Li

Discussion: https://postgr.es/m/MEYP282MB16699FA4A7CD57EB250E871FB6A40@MEYP282MB1669.AUSP282.PROD.OUTLOOK.COM
2021-01-18 15:55:01 -05:00
Tom Lane a6cf3df4eb Add bytea equivalents of ltrim() and rtrim().
We had bytea btrim() already, but for some reason not the other two.

Joel Jacobson

Discussion: https://postgr.es/m/d10cd5cd-a901-42f1-b832-763ac6f7ff3a@www.fastmail.com
2021-01-18 15:11:32 -05:00
Robert Haas a3ed4d1efe Allow for error or refusal while absorbing a ProcSignalBarrier.
Previously, the per-barrier-type functions tasked with absorbing
them were expected to always succeed and never throw an error.
However, that's a bit inconvenient. Further study has revealed that
there are realistic cases where it might not be possible to absorb
a ProcSignalBarrier without terminating the transaction, or even
the whole backend. Similarly, for some barrier types, there might
be other reasons where it's not reasonably possible to absorb the
barrier at certain points in the code, so provide a way for a
per-barrier-type function to reject absorbing the barrier.

Unfortunately, there's still no committed code making use of this
infrastructure; hopefully, we'll get there. :-(

Patch by me, reviewed by Andres Freund and Amul Sul.

Discussion: http://postgr.es/m/20200908182005.xya7wetdh3pndzim@alap3.anarazel.de
Discussion: http://postgr.es/m/CA+Tgmob56Pk1-5aTJdVPCWFHon7me4M96ENpGe9n_R4JUjjhZA@mail.gmail.com
2021-01-18 12:09:52 -05:00
Peter Eisentraut 15251c0a60 Pause recovery for insufficient parameter settings
When certain parameters are changed on a physical replication primary,
this is communicated to standbys using the XLOG_PARAMETER_CHANGE WAL
record.  The standby then checks whether its own settings are at least
as big as the ones on the primary.  If not, the standby shuts down
with a fatal error.

This patch changes this behavior for hot standbys to pause recovery at
that point instead.  That allows read traffic on the standby to
continue while database administrators figure out next steps.  When
recovery is unpaused, the server shuts down (as before).  The idea is
to fix the parameters while recovery is paused and then restart when
there is a maintenance window.

Reviewed-by: Sergei Kornilov <sk@zsrv.org>
Discussion: https://www.postgresql.org/message-id/flat/4ad69a4c-cc9b-0dfe-0352-8b1b0cd36c7b@2ndquadrant.com
2021-01-18 09:04:04 +01:00
Michael Paquier a3dc926009 Refactor option handling of CLUSTER, REINDEX and VACUUM
This continues the work done in b5913f6.  All the options of those
commands are changed to use hex values rather than enums to reduce the
risk of compatibility bugs when introducing new options.  Each option
set is moved into a new structure that can be extended with more
non-boolean options (this was already the case of VACUUM).  The code of
REINDEX is restructured so as manual REINDEX commands go through a
single routine from utility.c, like VACUUM, to ease the allocation
handling of option parameters when a command needs to go through
multiple transactions.

This can be used as a base infrastructure for future patches related to
those commands, including reindex filtering and tablespace support.

Per discussion with people mentioned below, as well as Alvaro Herrera
and Peter Eisentraut.

Author: Michael Paquier, Justin Pryzby
Reviewed-by: Alexey Kondratov, Justin Pryzby
Discussion: https://postgr.es/m/X8riynBLwxAD9uKk@paquier.xyz
2021-01-18 14:03:10 +09:00
Tomas Vondra 7db0cd2145 Set PD_ALL_VISIBLE and visibility map bits in COPY FREEZE
Make sure COPY FREEZE marks the pages as PD_ALL_VISIBLE and updates the
visibility map. Until now we only marked individual tuples as frozen,
but page-level flags were not updated, so the first VACUUM after the
COPY FREEZE had to rewrite the whole table.

This is a fairly old patch, and multiple people worked on it. The first
version was written by Jeff Janes, and then reworked by Pavan Deolasee
and Anastasia Lubennikova.

Author: Anastasia Lubennikova, Pavan Deolasee, Jeff Janes
Reviewed-by: Kuntal Ghosh, Jeff Janes, Tomas Vondra, Masahiko Sawada,
             Andres Freund, Ibrar Ahmed, Robert Haas, Tatsuro Ishii,
             Darafei Praliaskouski
Discussion: https://postgr.es/m/CABOikdN-ptGv0mZntrK2Q8OtfUuAjqaYMGmkdU1dCKFtUxVLrg@mail.gmail.com
Discussion: https://postgr.es/m/CAMkU%3D1w3osJJ2FneELhhNRLxfZitDgp9FPHee08NT2FQFmz_pQ%40mail.gmail.com
2021-01-17 22:28:26 +01:00
Magnus Hagander 960869da08 Add pg_stat_database counters for sessions and session time
This add counters for number of sessions, the different kind of session
termination types, and timers for how much time is spent in active vs
idle in a database to pg_stat_database.

Internally this also renames the parameter "force" to disconnect. This
was the only use-case for the parameter before, so repurposing it to
this mroe narrow usecase makes things cleaner than inventing something
new.

Author: Laurenz Albe
Reviewed-By: Magnus Hagander, Soumyadeep Chakraborty, Masahiro Ikeda
Discussion: https://postgr.es/m/b07e1f9953701b90c66ed368656f2aef40cac4fb.camel@cybertec.at
2021-01-17 13:52:31 +01:00
Noah Misch 6db992833c Prevent excess SimpleLruTruncate() deletion.
Every core SLRU wraps around.  With the exception of pg_notify, the wrap
point can fall in the middle of a page.  Account for this in the
PagePrecedes callback specification and in SimpleLruTruncate()'s use of
said callback.  Update each callback implementation to fit the new
specification.  This changes SerialPagePrecedesLogically() from the
style of asyncQueuePagePrecedes() to the style of CLOGPagePrecedes().
(Whereas pg_clog and pg_serial share a key space, pg_serial is nothing
like pg_notify.)  The bug fixed here has the same symptoms and user
followup steps as 592a589a04.  Back-patch
to 9.5 (all supported versions).

Reviewed by Andrey Borodin and (in earlier versions) by Tom Lane.

Discussion: https://postgr.es/m/20190202083822.GC32531@gust.leadboat.com
2021-01-16 12:21:35 -08:00
Amit Kapila c95765f476 Remove unnecessary pstrdup in fetch_table_list.
The result of TextDatumGetCString is already palloc'ed so we don't need to
allocate memory for it again. We decide not to backpatch it as there
doesn't seem to be any case where it can create a meaningful leak.

Author: Zhijie Hou
Reviewed-by: Daniel Gustafsson
Discussion: https://postgr.es/m/229fed2eb8c54c71a96ccb99e516eb12@G08CNEXMBPEKD05.g08.fujitsu.local
2021-01-16 10:15:32 +05:30
Tomas Vondra c9a0dc3486 Disallow CREATE STATISTICS on system catalogs
Add a check that CREATE STATISTICS does not add extended statistics on
system catalogs, similarly to indexes etc.  It can be overriden using
the allow_system_table_mods GUC.

This bug exists since 7b504eb282, adding the extended statistics, so
backpatch all the way back to PostgreSQL 10.

Author: Tomas Vondra
Reported-by: Dean Rasheed
Backpatch-through: 10
Discussion: https://postgr.es/m/CAEZATCXAPrrOKwEsyZKQ4uzzJQWBCt6QAvOcgqRGdWwT1zb%2BrQ%40mail.gmail.com
2021-01-15 23:31:22 +01:00
Alvaro Herrera f9900df5f9
Avoid spurious wait in concurrent reindex
This is like commit c98763bf51, but for REINDEX CONCURRENTLY.  To wit:
this flags indicates that the current process is safe to ignore for the
purposes of waiting for other snapshots, when doing CREATE INDEX
CONCURRENTLY and REINDEX CONCURRENTLY.  This helps two processes doing
either of those things not deadlock, and also avoids spurious waits.

Author: Álvaro Herrera <alvherre@alvh.no-ip.org>
Reviewed-by: Dmitry Dolgov <9erthalion6@gmail.com>
Reviewed-by: Hamid Akhtar <hamid.akhtar@gmail.com>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Discussion: https://postgr.es/m/20201130195439.GA24598@alvherre.pgsql
2021-01-15 10:31:42 -03:00
Fujii Masao 2ad78a87f0 Fix calculation of how much shared memory is required to store a TOC.
Commit ac883ac453 refactored shm_toc_estimate() but changed its calculation
of shared memory size for TOC incorrectly. Previously this could cause too
large memory to be allocated.

Back-patch to v11 where the bug was introduced.

Author: Takayuki Tsunakawa
Discussion: https://postgr.es/m/TYAPR01MB2990BFB73170E2C4921E2C4DFEA80@TYAPR01MB2990.jpnprd01.prod.outlook.com
2021-01-15 12:44:17 +09:00
Michael Paquier 5ae1572993 Fix O(N^2) stat() calls when recycling WAL segments
The counter tracking the last segment number recycled was getting
initialized when recycling one single segment, while it should be used
across a full cycle of segments recycled to prevent useless checks
related to entries already recycled.

This performance issue has been introduced by b2a5545, and it was first
implemented in 61b86142.

No backpatch is done per the lack of field complaints.

Reported-by: Andres Freund, Thomas Munro
Author: Michael Paquier
Reviewed-By: Andres Freund
Discussion: https://postgr.es/m/20170621211016.eln6cxxp3jrv7m4m@alap3.anarazel.de
Discussion: https://postgr.es/m/CA+hUKG+DRiF9z1_MU4fWq+RfJMxP7zjoptfcmuCFPeO4JM2iVg@mail.gmail.com
2021-01-15 10:33:13 +09:00
Alvaro Herrera ebfe2dbd6b
Prevent drop of tablespaces used by partitioned relations
When a tablespace is used in a partitioned relation (per commits
ca4103025d in pg12 for tables and 33e6c34c32 in pg11 for indexes),
it is possible to drop the tablespace, potentially causing various
problems.  One such was reported in bug #16577, where a rewriting ALTER
TABLE causes a server crash.

Protect against this by using pg_shdepend to keep track of tablespaces
when used for relations that don't keep physical files; we now abort a
tablespace if we see that the tablespace is referenced from any
partitioned relations.

Backpatch this to 11, where this problem has been latent all along.  We
don't try to create pg_shdepend entries for existing partitioned
indexes/tables, but any ones that are modified going forward will be
protected.

Note slight behavior change: when trying to drop a tablespace that
contains both regular tables as well as partitioned ones, you'd
previously get ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE and now you'll
get ERRCODE_DEPENDENT_OBJECTS_STILL_EXIST.  Arguably, the latter is more
correct.

It is possible to add protecting pg_shdepend entries for existing
tables/indexes, by doing
  ALTER TABLE ONLY some_partitioned_table SET TABLESPACE pg_default;
  ALTER TABLE ONLY some_partitioned_table SET TABLESPACE original_tablespace;
for each partitioned table/index that is not in the database default
tablespace.  Because these partitioned objects do not have storage, no
file needs to be actually moved, so it shouldn't take more time than
what's required to acquire locks.

This query can be used to search for such relations:
SELECT ... FROM pg_class WHERE relkind IN ('p', 'I') AND reltablespace <> 0

Reported-by: Alexander Lakhin <exclusion@gmail.com>
Discussion: https://postgr.es/m/16577-881633a9f9894fd5@postgresql.org
Author: Álvaro Herrera <alvherre@alvh.no-ip.org>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
2021-01-14 15:32:14 -03:00
Fujii Masao fef5b47f6b Ensure that a standby is able to follow a primary on a newer timeline.
Commit 709d003fbd refactored WAL-reading code, but accidentally caused
WalSndSegmentOpen() to fail to follow a timeline switch while reading from
a historic timeline. This issue caused a standby to fail to follow a primary
on a newer timeline when WAL archiving is enabled.

If there is a timeline switch within the segment, WalSndSegmentOpen() should
read from the WAL segment belonging to the new timeline. But previously
since it failed to follow a timeline switch, it tried to read the WAL segment
with old timeline. When WAL archiving is enabled, that WAL segment with
old timeline doesn't exist because it's renamed to .partial. This leads
a primary to have tried to read non-existent WAL segment, and which caused
replication to faill with the error "ERROR:  requested WAL segment ... has
 already been removed".

This commit fixes WalSndSegmentOpen() so that it's able to follow a timeline
switch, to ensure that a standby is able to follow a primary on a newer
timeline even when WAL archiving is enabled.

This commit also adds the regression test to check whether a standby is
able to follow a primary on a newer timeline when WAL archiving is enabled.

Back-patch to v13 where the bug was introduced.

Reported-by: Kyotaro Horiguchi
Author: Kyotaro Horiguchi, tweaked by Fujii Masao
Reviewed-by:  Alvaro Herrera, Fujii Masao
Discussion: https://postgr.es/m/20201209.174314.282492377848029776.horikyota.ntt@gmail.com
2021-01-14 12:27:11 +09:00
Michael Paquier aef8948f38 Rework refactoring of hex and encoding routines
This commit addresses some issues with c3826f83 that moved the hex
decoding routine to src/common/:
- The decoding function lacked overflow checks, so when used for
security-related features it was an open door to out-of-bound writes if
not carefully used that could remain undetected.  Like the base64
routines already in src/common/ used by SCRAM, this routine is reworked
to check for overflows by having the size of the destination buffer
passed as argument, with overflows checked before doing any writes.
- The encoding routine was missing.  This is moved to src/common/ and
it gains the same overflow checks as the decoding part.

On failure, the hex routines of src/common/ issue an error as per the
discussion done to make them usable by frontend tools, but not by shared
libraries.  Note that this is why ECPG is left out of this commit, and
it still includes a duplicated logic doing hex encoding and decoding.

While on it, this commit uses better variable names for the source and
destination buffers in the existing escape and base64 routines in
encode.c and it makes them more robust to overflow detection.  The
previous core code issued a FATAL after doing out-of-bound writes if
going through the SQL functions, which would be enough to detect
problems when working on changes that impacted this area of the
code.  Instead, an error is issued before doing an out-of-bound write.
The hex routines were being directly called for bytea conversions and
backup manifests without such sanity checks.  The current calls happen
to not have any problems, but careless uses of such APIs could easily
lead to CVE-class bugs.

Author: Bruce Momjian, Michael Paquier
Reviewed-by: Sehrope Sarkuni
Discussion: https://postgr.es/m/20201231003557.GB22199@momjian.us
2021-01-14 11:13:24 +09:00
Thomas Munro 0d56acfbaa Move our p{read,write}v replacements into their own files.
macOS's ranlib issued a warning about an empty pread.o file with the
previous arrangement, on systems new enough to require no replacement
functions.  Let's go back to using configure's AC_REPLACE_FUNCS system
to build and include each .o in the library only if it's needed, which
requires moving the *v() functions to their own files.

Also move the _with_retry() wrapper to a more permanent home.

Reported-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/1283127.1610554395%40sss.pgh.pa.us
2021-01-14 11:16:59 +13:00
Peter Geoghegan d168b66682 Enhance nbtree index tuple deletion.
Teach nbtree and heapam to cooperate in order to eagerly remove
duplicate tuples representing dead MVCC versions.  This is "bottom-up
deletion".  Each bottom-up deletion pass is triggered lazily in response
to a flood of versions on an nbtree leaf page.  This usually involves a
"logically unchanged index" hint (these are produced by the executor
mechanism added by commit 9dc718bd).

The immediate goal of bottom-up index deletion is to avoid "unnecessary"
page splits caused entirely by version duplicates.  It naturally has an
even more useful effect, though: it acts as a backstop against
accumulating an excessive number of index tuple versions for any given
_logical row_.  Bottom-up index deletion complements what we might now
call "top-down index deletion": index vacuuming performed by VACUUM.
Bottom-up index deletion responds to the immediate local needs of
queries, while leaving it up to autovacuum to perform infrequent clean
sweeps of the index.  The overall effect is to avoid certain
pathological performance issues related to "version churn" from UPDATEs.

The previous tableam interface used by index AMs to perform tuple
deletion (the table_compute_xid_horizon_for_tuples() function) has been
replaced with a new interface that supports certain new requirements.
Many (perhaps all) of the capabilities added to nbtree by this commit
could also be extended to other index AMs.  That is left as work for a
later commit.

Extend deletion of LP_DEAD-marked index tuples in nbtree by adding logic
to consider extra index tuples (that are not LP_DEAD-marked) for
deletion in passing.  This increases the number of index tuples deleted
significantly in many cases.  The LP_DEAD deletion process (which is now
called "simple deletion" to clearly distinguish it from bottom-up
deletion) won't usually need to visit any extra table blocks to check
these extra tuples.  We have to visit the same table blocks anyway to
generate a latestRemovedXid value (at least in the common case where the
index deletion operation's WAL record needs such a value).

Testing has shown that the "extra tuples" simple deletion enhancement
increases the number of index tuples deleted with almost any workload
that has LP_DEAD bits set in leaf pages.  That is, it almost never fails
to delete at least a few extra index tuples.  It helps most of all in
cases that happen to naturally have a lot of delete-safe tuples.  It's
not uncommon for an individual deletion operation to end up deleting an
order of magnitude more index tuples compared to the old naive approach
(e.g., custom instrumentation of the patch shows that this happens
fairly often when the regression tests are run).

Add a further enhancement that augments simple deletion and bottom-up
deletion in indexes that make use of deduplication: Teach nbtree's
_bt_delitems_delete() function to support granular TID deletion in
posting list tuples.  It is now possible to delete individual TIDs from
posting list tuples provided the TIDs have a tableam block number of a
table block that gets visited as part of the deletion process (visiting
the table block can be triggered directly or indirectly).  Setting the
LP_DEAD bit of a posting list tuple is still an all-or-nothing thing,
but that matters much less now that deletion only needs to start out
with the right _general_ idea about which index tuples are deletable.

Bump XLOG_PAGE_MAGIC because xl_btree_delete changed.

No bump in BTREE_VERSION, since there are no changes to the on-disk
representation of nbtree indexes.  Indexes built on PostgreSQL 12 or
PostgreSQL 13 will automatically benefit from bottom-up index deletion
(i.e. no reindexing required) following a pg_upgrade.  The enhancement
to simple deletion is available with all B-Tree indexes following a
pg_upgrade, no matter what PostgreSQL version the user upgrades from.

Author: Peter Geoghegan <pg@bowt.ie>
Reviewed-By: Heikki Linnakangas <hlinnaka@iki.fi>
Reviewed-By: Victor Yegorov <vyegorov@gmail.com>
Discussion: https://postgr.es/m/CAH2-Wzm+maE3apHB8NOtmM=p-DO65j2V5GzAWCOEEuy3JZgb2g@mail.gmail.com
2021-01-13 09:21:32 -08:00