Commit Graph

21961 Commits

Author SHA1 Message Date
Amit Kapila 4b82ed6eca Fix dangling pointer reference in stream_cleanup_files.
We can't access the entry after it is removed from dynahash.

Author: Peter Smith
Discussion: https://postgr.es/m/CAHut+Ps-pL++f6CJwPx2+vUqXuew=Xt-9Bi-6kCyxn+Fwi2M7w@mail.gmail.com
2021-03-23 09:43:33 +05:30
Tomas Vondra a5f002ad9a Use correct spelling of statistics kind
A couple error messages and comments used 'statistic kind', not the
correct 'statistics kind'. Fix and backpatch all the way back to 10,
where extended statistics were introduced.

Backpatch-through: 10
2021-03-23 05:01:35 +01:00
Fujii Masao 1e3e8b51bd Change the type of WalReceiverWaitStart wait event from Client to IPC.
Previously the type of this wait event was Client. But while this
wait event is being reported, walreceiver process is waiting for
the startup process to set initial data for streaming replication.
It's not waiting for any activity on a socket connected to a user
application or walsender. So this commit changes the type for
WalReceiverWaitStart wait event to IPC.

Author: Fujii Masao
Reviewed-by: Kyotaro Horiguchi
Discussion: https://postgr.es/m/cdacc27c-37ff-f1a4-20e2-ce19933abfcc@oss.nttdata.com
2021-03-23 10:09:42 +09:00
Bruce Momjian 95d77149c5 Add macro RelationIsPermanent() to report relation permanence
Previously, to check relation permanence, the Relation's Form_pg_class
structure member relpersistence was compared to the value
RELPERSISTENCE_PERMANENT ("p"). This commit adds the macro
RelationIsPermanent() and is used in appropirate places to simplify the
code.  This matches other RelationIs* macros.

This macro will be used in more places in future cluster file encryption
patches.

Discussion: https://postgr.es/m/20210318153134.GH20766@tamriel.snowman.net
2021-03-22 20:23:52 -04:00
Tomas Vondra 8e4b332e88 Optimize allocations in bringetbitmap
The bringetbitmap function allocates memory for various purposes, which
may be quite expensive, depending on the number of scan keys. Instead of
allocating them separately, allocate one bit chunk of memory an carve it
into smaller pieces as needed - all the pieces have the same lifespan,
and it saves quite a bit of CPU and memory overhead.

Author: Tomas Vondra <tomas.vondra@postgresql.org>
Reviewed-by: Alvaro Herrera <alvherre@alvh.no-ip.org>
Reviewed-by: Mark Dilger <hornschnorter@gmail.com>
Reviewed-by: Alexander Korotkov <aekorotkov@gmail.com>
Reviewed-by: Masahiko Sawada <masahiko.sawada@enterprisedb.com>
Reviewed-by: John Naylor <john.naylor@enterprisedb.com>
Discussion: https://postgr.es/m/c1138ead-7668-f0e1-0638-c3be3237e812@2ndquadrant.com
2021-03-23 00:47:09 +01:00
Tomas Vondra 72ccf55cb9 Move IS [NOT] NULL handling from BRIN support functions
The handling of IS [NOT] NULL clauses is independent of an opclass, and
most of the code was exactly the same in both minmax and inclusion. So
instead move the code from support procedures to the AM.

This simplifies the code - especially the support procedures - quite a
bit, as they don't need to care about NULL values and flags at all. It
also means the IS [NOT] NULL clauses can be evaluated without invoking
the support procedure.

Author: Tomas Vondra <tomas.vondra@postgresql.org>
Author: Nikita Glukhov <n.gluhov@postgrespro.ru>
Reviewed-by: Nikita Glukhov <n.gluhov@postgrespro.ru>
Reviewed-by: Mark Dilger <hornschnorter@gmail.com>
Reviewed-by: Alexander Korotkov <aekorotkov@gmail.com>
Reviewed-by: Masahiko Sawada <masahiko.sawada@enterprisedb.com>
Reviewed-by: John Naylor <john.naylor@enterprisedb.com>
Discussion: https://postgr.es/m/c1138ead-7668-f0e1-0638-c3be3237e812@2ndquadrant.com
2021-03-23 00:45:42 +01:00
Tomas Vondra a1c649d889 Pass all scan keys to BRIN consistent function at once
This commit changes how we pass scan keys to BRIN consistent function.
Instead of passing them one by one, we now pass all scan keys for a
given attribute at once. That makes the consistent function a bit more
complex, as it has to loop through the keys, but it does allow more
elaborate opclasses that can use multiple keys to eliminate ranges much
more effectively.

The existing BRIN opclasses (minmax, inclusion) don't really benefit
from this change. The primary purpose is to allow future opclases to
benefit from seeing all keys at once.

This does change the BRIN API, because the signature of the consistent
function changes (a new parameter with number of scan keys). So this
breaks existing opclasses, and will require supporting two variants of
the code for different PostgreSQL versions. We've considered supporting
two variants of the consistent, but we've decided not to do that.
Firstly, there's another patch that moves handling of NULL values from
the opclass, which means the opclasses need to be updated anyway.
Secondly, we're not aware of any out-of-core BRIN opclasses, so it does
not seem worth the extra complexity.

Bump catversion, because of pg_proc changes.

Author: Tomas Vondra <tomas.vondra@postgresql.org>
Reviewed-by: Alvaro Herrera <alvherre@alvh.no-ip.org>
Reviewed-by: Mark Dilger <hornschnorter@gmail.com>
Reviewed-by: Alexander Korotkov <aekorotkov@gmail.com>
Reviewed-by: John Naylor <john.naylor@enterprisedb.com>
Reviewed-by: Nikita Glukhov <n.gluhov@postgrespro.ru>
Discussion: https://postgr.es/m/c1138ead-7668-f0e1-0638-c3be3237e812@2ndquadrant.com
2021-03-23 00:45:03 +01:00
Tomas Vondra bfa2cee784 Move bsearch_arg to src/port
Until now the bsearch_arg function was used only in extended statistics
code, so it was defined in that code.  But we already have qsort_arg in
src/port, so let's move it next to it.
2021-03-23 00:11:22 +01:00
Tom Lane 063dd37ebc Short-circuit slice requests that are for more than the object's size.
substring(), and perhaps other callers, isn't careful to pass a
slice length that is no more than the datum's true size.  Since
toast_decompress_datum_slice's children will palloc the requested
slice length, this can waste memory.  Also, close study of the liblz4
documentation suggests that it is dependent on the caller to not ask
for more than the correct amount of decompressed data; this squares
with observed misbehavior with liblz4 1.8.3.  Avoid these problems
by switching to the normal full-decompression code path if the
slice request is >= datum's decompressed size.

Tom Lane and Dilip Kumar

Discussion: https://postgr.es/m/507597.1616370729@sss.pgh.pa.us
2021-03-22 14:01:20 -04:00
Tom Lane aeb1631ed2 Mostly-cosmetic adjustments of TOAST-related macros.
The authors of bbe0a81db hadn't quite got the idea that macros named
like SOMETHING_4B_C were only meant for internal endianness-related
details in postgres.h.  Choose more legible names for macros that are
intended to be used elsewhere.  Rearrange postgres.h a bit to clarify
the separation between those internal macros and ones intended for
wider use.

Also, avoid using the term "rawsize" for true decompressed size;
we've used "extsize" for that, because "rawsize" generally denotes
total Datum size including header.  This choice seemed particularly
unfortunate in tests that were comparing one of these meanings to
the other.

This patch includes a couple of not-purely-cosmetic changes: be
sure that the shifts aligning compression methods are unsigned
(not critical today, but will be when compression method 2 exists),
and fix broken definition of VARATT_EXTERNAL_GET_COMPRESSION (now
VARATT_EXTERNAL_GET_COMPRESS_METHOD), whose callers worked only
accidentally.

Discussion: https://postgr.es/m/574197.1616428079@sss.pgh.pa.us
2021-03-22 13:43:10 -04:00
Robert Haas a4d5284a10 Error on invalid TOAST compression in CREATE or ALTER TABLE.
The previous coding treated an invalid compression method name as
equivalent to the default, which is certainly not right.

Justin Pryzby

Discussion: http://postgr.es/m/20210321235544.GD4203@telsasoft.com
2021-03-22 10:57:08 -04:00
Robert Haas 226e2be387 More code cleanup for configurable TOAST compression.
Remove unused macro. Fix confusion about whether a TOAST compression
method is identified by an OID or a char.

Justin Pryzby

Discussion: http://postgr.es/m/20210321235544.GD4203@telsasoft.com
2021-03-22 09:21:37 -04:00
Michael Paquier 909b449e00 Fix concurrency issues with WAL segment recycling on Windows
This commit is mostly a revert of aaa3aed, that switched the routine
doing the internal renaming of recycled WAL segments to use on Windows a
combination of CreateHardLinkA() plus unlink() instead of rename().  As
reported by several users of Postgres 13, this is causing concurrency
issues when manipulating WAL segments, mostly in the shape of the
following error:
LOG:  could not rename file "pg_wal/000000XX000000YY000000ZZ":
Permission denied

This moves back to a logic where a single rename() (well, pgrename() for
Windows) is used.  This issue has proved to be hard to hit when I tested
it, facing it only once with an archive_command that was not able to do
its work, so it is environment-sensitive.  The reporters of this issue
have been able to confirm that the situation improved once we switched
back to a single rename().  In order to check things, I have provided to
the reporters a patched build based on 13.2 with aaa3aed reverted, to
test if the error goes away, and an unpatched build of 13.2 to test if
the error still showed up (just to make sure that I did not mess up my
build process).

Extra thanks to Fujii Masao for pointing out what looked like the
culprit commit, and to all the reporters for taking the time to test
what I have sent them.

Reported-by: Andrus, Guy Burgess, Yaroslav Pashinsky, Thomas Trenz
Reviewed-by: Tom Lane, Andres Freund
Discussion: https://postgr.es/m/3861ff1e-0923-7838-e826-094cc9bef737@hot.ee
Discussion: https://postgr.es/m/16874-c3eecd319e36a2bf@postgresql.org
Discussion: https://postgr.es/m/095ccf8d-7f58-d928-427c-b17ace23cae6@burgess.co.nz
Discussion: https://postgr.es/m/16927-67c570d968c99567%40postgresql.org
Discussion: https://postgr.es/m/YFBcRbnBiPdGZvfW@paquier.xyz
Backpatch-through: 13
2021-03-22 14:02:26 +09:00
Michael Paquier 595b9cba2a Fix timeline assignment in checkpoints with 2PC transactions
Any transactions found as still prepared by a checkpoint have their
state data read from the WAL records generated by PREPARE TRANSACTION
before being moved into their new location within pg_twophase/.  While
reading such records, the WAL reader uses the callback
read_local_xlog_page() to read a page, that is shared across various
parts of the system.  This callback, since 1148e22a, has introduced an
update of ThisTimeLineID when reading a record while in recovery, which
is potentially helpful in the context of cascading WAL senders.

This update of ThisTimeLineID interacts badly with the checkpointer if a
promotion happens while some 2PC data is read from its record, as, by
changing ThisTimeLineID, any follow-up WAL records would be written to
an timeline older than the promoted one.  This results in consistency
issues.  For instance, a subsequent server restart would cause a failure
in finding a valid checkpoint record, resulting in a PANIC, for
instance.

This commit changes the code reading the 2PC data to reset the timeline
once the 2PC record has been read, to prevent messing up with the static
state of the checkpointer.  It would be tempting to do the same thing
directly in read_local_xlog_page().  However, based on the discussion
that has led to 1148e22a, users may rely on the updates of
ThisTimeLineID when a WAL record page is read in recovery, so changing
this callback could break some cases that are working currently.

A TAP test reproducing the issue is added, relying on a PITR to
precisely trigger a promotion with a prepared transaction still
tracked.

Per discussion with Heikki Linnakangas, Kyotaro Horiguchi, Fujii Masao
and myself.

Author: Soumyadeep Chakraborty, Jimmy Yih, Kevin Yeap
Discussion: https://postgr.es/m/CAE-ML+_EjH_fzfq1F3RJ1=XaaNG=-Jz-i3JqkNhXiLAsM3z-Ew@mail.gmail.com
Backpatch-through: 10
2021-03-22 08:30:53 +09:00
Tom Lane ac897c4834 Fix assorted silliness in ATExecSetCompression().
It's not okay to scribble directly on a syscache entry.
Nor to continue accessing said entry after releasing it.

Also get rid of not-used local variables.

Per valgrind testing.
2021-03-21 18:43:07 -04:00
Peter Geoghegan 9dd963ae25 Recycle nbtree pages deleted during same VACUUM.
Maintain a simple array of metadata about pages that were deleted during
nbtree VACUUM's current btvacuumscan() call.  Use this metadata at the
end of btvacuumscan() to attempt to place newly deleted pages in the FSM
without further delay.  It might not yet be safe to place any of the
pages in the FSM by then (they may not be deemed recyclable), but we
have little to lose and plenty to gain by trying.  In practice there is
a very good chance that this will work out when vacuuming larger
indexes, where scanning the index naturally takes quite a while.

This commit doesn't change the page recycling invariants; it merely
improves the efficiency of page recycling within the confines of the
existing design.  Recycle safety is a part of nbtree's implementation of
what Lanin & Shasha call "the drain technique".  The design happens to
use transaction IDs (they're stored in deleted pages), but that in
itself doesn't align the cutoff for recycle safety to any of the
XID-based cutoffs used by VACUUM (e.g., OldestXmin).  All that matters
is whether or not _other_ backends might be able to observe various
inconsistencies in the tree structure (that they cannot just detect and
recover from by moving right).  Recycle safety is purely a question of
maintaining the consistency (or the apparent consistency) of a physical
data structure.

Note that running a simple serial test case involving a large range
DELETE followed by a VACUUM VERBOSE will probably show that any newly
deleted nbtree pages are not yet reusable/recyclable.  This is expected
in the absence of even one concurrent XID assignment.  It is an old
implementation restriction.  In practice it's unlikely to be the thing
that makes recycling remain unsafe, at least with larger indexes, where
recycling newly deleted pages during the same VACUUM actually matters.

An important high-level goal of this commit (as well as related recent
commits e5d8a999 and 9f3665fb) is to make expensive deferred cleanup
operations in index AMs rare in general.  If index vacuuming frequently
depends on the next VACUUM operation finishing off work that the current
operation started, then the general behavior of index vacuuming is hard
to predict.  This is relevant to ongoing work that adds a vacuumlazy.c
mechanism to skip index vacuuming in certain cases.  Anything that makes
the real world behavior of index vacuuming simpler and more linear will
also make top-down modeling in vacuumlazy.c more robust.

Author: Peter Geoghegan <pg@bowt.ie>
Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com>
Discussion: https://postgr.es/m/CAH2-Wzk76_P=67iUscb1UN44-gyZL-KgpsXbSxq_bdcMa7Q+wQ@mail.gmail.com
2021-03-21 15:25:39 -07:00
Tom Lane 9fb9691a88 Suppress various new compiler warnings.
Compilers that don't understand that elog(ERROR) doesn't return
issued warnings here.  In the cases in libpq_pipeline.c, we were
not exactly helping things by failing to mark pg_fatal() as noreturn.

Per buildfarm.
2021-03-21 11:50:43 -04:00
Peter Eisentraut 96ae658e62 Move lwlock-release probe back where it belongs
The documentation specifically states that lwlock-release fires before
any released waiters have been awakened.  It worked that way until
ab5194e6f6, where is seems to have been
misplaced accidentally.  Move it back where it belongs.

Author: Craig Ringer <craig.ringer@enterprisedb.com>
Discussion: https://www.postgresql.org/message-id/CAGRY4nwxKUS_RvXFW-ugrZBYxPFFM5kjwKT5O+0+Stuga5b4+Q@mail.gmail.com
2021-03-21 08:02:30 +01:00
Tomas Vondra 882b2cdc08 Use valid compression method in brin_form_tuple
When compressing the BRIN summary, we can't simply use the compression
method from the indexed attribute.  The summary may use a different data
type, e.g. fixed-length attribute may have varlena summary, leading to
compression failures.  For the built-in BRIN opclasses this happens to
work, because the summary uses the same data type as the attribute.

When the data types match, we can inherit use the compression method
specified for the attribute (it's copied into the index descriptor).
Otherwise we don't have much choice and have to use the default one.

Author: Tomas Vondra
Reviewed-by: Justin Pryzby <pryzby@telsasoft.com>
Discussion: https://postgr.es/m/e0367f27-392c-321a-7411-a58e1a7e4817%40enterprisedb.com
2021-03-21 00:28:34 +01:00
Tom Lane e835e89a0f Fix memory leak when rejecting bogus DH parameters.
While back-patching e0e569e1d, I noted that there were some other
places where we ought to be applying DH_free(); namely, where we
load some DH parameters from a file and then reject them as not
being sufficiently secure.  While it seems really unlikely that
anybody would hit these code paths in production, let alone do
so repeatedly, let's fix it for consistency.

Back-patch to v10 where this code was introduced.

Discussion: https://postgr.es/m/16160-18367e56e9a28264@postgresql.org
2021-03-20 12:47:21 -04:00
Tom Lane f0c2a5bba6 Avoid leaking memory in RestoreGUCState(), and improve comments.
RestoreGUCState applied InitializeOneGUCOption to already-live
GUC entries, causing any malloc'd subsidiary data to be forgotten.
We do want the effect of resetting the GUC to its compiled-in
default, and InitializeOneGUCOption seems like the best way to do
that, so add code to free any existing subsidiary data beforehand.

The interaction between can_skip_gucvar, SerializeGUCState, and
RestoreGUCState is way more subtle than their opaque comments
would suggest to an unwary reader.  Rewrite and enlarge the
comments to try to make it clearer what's happening.

Remove a long-obsolete assertion in read_nondefault_variables: the
behavior of set_config_option hasn't depended on IsInitProcessingMode
since f5d9698a8 installed a better way of controlling it.

Although this is fixing a clear memory leak, the leak is quite unlikely
to involve any large amount of data, and it can only happen once in the
lifetime of a worker process.  So it seems unnecessary to take any
risk of back-patching.

Discussion: https://postgr.es/m/4105247.1616174862@sss.pgh.pa.us
2021-03-19 23:03:17 -04:00
Thomas Munro 61752afb26 Provide recovery_init_sync_method=syncfs.
Since commit 2ce439f3 we have opened every file in the data directory
and called fsync() at the start of crash recovery.  This can be very
slow if there are many files, leading to field complaints of systems
taking minutes or even hours to begin crash recovery.

Provide an alternative method, for Linux only, where we call syncfs() on
every possibly different filesystem under the data directory.  This is
equivalent, but avoids faulting in potentially many inodes from
potentially slow storage.

The new mode comes with some caveats, described in the documentation, so
the default value for the new setting is "fsync", preserving the older
behavior.

Reported-by: Michael Brown <michael.brown@discourse.org>
Reviewed-by: Fujii Masao <masao.fujii@oss.nttdata.com>
Reviewed-by: Paul Guo <guopa@vmware.com>
Reviewed-by: Bruce Momjian <bruce@momjian.us>
Reviewed-by: Justin Pryzby <pryzby@telsasoft.com>
Reviewed-by: David Steele <david@pgmasters.net>
Discussion: https://postgr.es/m/11bc2bb7-ecb5-3ad0-b39f-df632734cd81%40discourse.org
Discussion: https://postgr.es/m/CAEET0ZHGnbXmi8yF3ywsDZvb3m9CbdsGZgfTXscQ6agcbzcZAw%40mail.gmail.com
2021-03-20 12:07:28 +13:00
Tomas Vondra b822ae13ea Use lfirst_int in cmp_list_len_contents_asc
The function added in be45be9c33 is comparing integer lists (IntList) by
length and contents, but there were two bugs.  Firstly, it used intVal()
to extract the value, but that's for Value nodes, not for extracting int
values from IntList.  Secondly, it called it directly on the ListCell,
without doing lfirst().  So just do lfirst_int() instead.

Interestingly enough, this did not cause any crashes on the buildfarm,
but valgrind rightfully complained about it.

Discussion: https://postgr.es/m/bf3805a8-d7d1-ae61-fece-761b7ff41ecc@postgresfriends.org
2021-03-20 00:04:25 +01:00
Robert Haas d00fbdc431 Fix use-after-ReleaseSysCache problem in ATExecAlterColumnType.
Introduced by commit bbe0a81db6.

Per buildfarm member prion.
2021-03-19 17:17:48 -04:00
Robert Haas bbe0a81db6 Allow configurable LZ4 TOAST compression.
There is now a per-column COMPRESSION option which can be set to pglz
(the default, and the only option in up until now) or lz4. Or, if you
like, you can set the new default_toast_compression GUC to lz4, and
then that will be the default for new table columns for which no value
is specified. We don't have lz4 support in the PostgreSQL code, so
to use lz4 compression, PostgreSQL must be built --with-lz4.

In general, TOAST compression means compression of individual column
values, not the whole tuple, and those values can either be compressed
inline within the tuple or compressed and then stored externally in
the TOAST table, so those properties also apply to this feature.

Prior to this commit, a TOAST pointer has two unused bits as part of
the va_extsize field, and a compessed datum has two unused bits as
part of the va_rawsize field. These bits are unused because the length
of a varlena is limited to 1GB; we now use them to indicate the
compression type that was used. This means we only have bit space for
2 more built-in compresison types, but we could work around that
problem, if necessary, by introducing a new vartag_external value for
any further types we end up wanting to add. Hopefully, it won't be
too important to offer a wide selection of algorithms here, since
each one we add not only takes more coding but also adds a build
dependency for every packager. Nevertheless, it seems worth doing
at least this much, because LZ4 gets better compression than PGLZ
with less CPU usage.

It's possible for LZ4-compressed datums to leak into composite type
values stored on disk, just as it is for PGLZ. It's also possible for
LZ4-compressed attributes to be copied into a different table via SQL
commands such as CREATE TABLE AS or INSERT .. SELECT.  It would be
expensive to force such values to be decompressed, so PostgreSQL has
never done so. For the same reasons, we also don't force recompression
of already-compressed values even if the target table prefers a
different compression method than was used for the source data.  These
architectural decisions are perhaps arguable but revisiting them is
well beyond the scope of what seemed possible to do as part of this
project.  However, it's relatively cheap to recompress as part of
VACUUM FULL or CLUSTER, so this commit adjusts those commands to do
so, if the configured compression method of the table happens not to
match what was used for some column value stored therein.

Dilip Kumar. The original patches on which this work was based were
written by Ildus Kurbangaliev, and those were patches were based on
even earlier work by Nikita Glukhov, but the design has since changed
very substantially, since allow a potentially large number of
compression methods that could be added and dropped on a running
system proved too problematic given some of the architectural issues
mentioned above; the choice of which specific compression method to
add first is now different; and a lot of the code has been heavily
refactored.  More recently, Justin Przyby helped quite a bit with
testing and reviewing and this version also includes some code
contributions from him. Other design input and review from Tomas
Vondra, Álvaro Herrera, Andres Freund, Oleg Bartunov, Alexander
Korotkov, and me.

Discussion: http://postgr.es/m/20170907194236.4cefce96%40wp.localdomain
Discussion: http://postgr.es/m/CAFiTN-uUpX3ck%3DK0mLEk-G_kUQY%3DSNOTeqdaNRR9FMdQrHKebw%40mail.gmail.com
2021-03-19 15:10:38 -04:00
Fujii Masao fd31214075 Fix comments in postmaster.c.
Commit 86c23a6eb2 changed the option to specify that postgres will
stop all other server processes by sending the signal SIGSTOP,
from -s to -T. But previously there were comments incorrectly
explaining that SIGSTOP behavior is set by -s option. This commit
fixes them.

Author: Kyotaro Horiguchi
Reviewed-by: Fujii Masao
Discussion: https://postgr.es/m/20210316.165141.1400441966284654043.horikyota.ntt@gmail.com
2021-03-19 11:28:54 +09:00
Tom Lane 9bacdf9f53 Don't leak malloc'd error string in libpqrcv_check_conninfo().
We leaked the error report from PQconninfoParse, when there was
one.  It seems unlikely that real usage patterns would repeat
the failure often enough to create serious bloat, but let's
back-patch anyway to keep the code similar in all branches.

Found via valgrind testing.
Back-patch to v10 where this code was added.

Discussion: https://postgr.es/m/3816764.1616104288@sss.pgh.pa.us
2021-03-18 22:22:47 -04:00
Tom Lane 377b7a8300 Don't leak malloc'd strings when a GUC setting is rejected.
Because guc.c prefers to keep all its string values in malloc'd
not palloc'd storage, it has to be more careful than usual to
avoid leaks.  Error exits out of string GUC hook checks failed
to clear the proposed value string, and error exits out of
ProcessGUCArray() failed to clear the malloc'd results of
ParseLongOption().

Found via valgrind testing.
This problem is ancient, so back-patch to all supported branches.

Discussion: https://postgr.es/m/3816764.1616104288@sss.pgh.pa.us
2021-03-18 22:22:47 -04:00
Tom Lane d303849b05 Don't leak compiled regex(es) when an ispell cache entry is dropped.
The text search cache mechanisms assume that we can clean up
an invalidated dictionary cache entry simply by resetting the
associated long-lived memory context.  However, that does not work
for ispell affixes that make use of regular expressions, because
the regex library deals in plain old malloc.  Hence, we leaked
compiled regex(es) any time we dropped such a cache entry.  That
could quickly add up, since even a fairly trivial regex can use up
tens of kB, and a large one can eat megabytes.  Add a memory context
callback to ensure that a regex gets freed when its owning cache
entry is cleared.

Found via valgrind testing.
This problem is ancient, so back-patch to all supported branches.

Discussion: https://postgr.es/m/3816764.1616104288@sss.pgh.pa.us
2021-03-18 22:22:47 -04:00
Tom Lane 415ffdc220 Don't run RelationInitTableAccessMethod in a long-lived context.
Some code paths in this function perform syscache lookups, which
can lead to table accesses and possibly leakage of cruft into
the caller's context.  If said context is CacheMemoryContext,
we eventually will have visible bloat.  But fixing this is no
harder than moving one memory context switch step.  (The other
callers don't have a problem.)

Andres Freund and I independently found this via valgrind testing.
Back-patch to v12 where this code was added.

Discussion: https://postgr.es/m/20210317023101.anvejcfotwka6gaa@alap3.anarazel.de
Discussion: https://postgr.es/m/3816764.1616104288@sss.pgh.pa.us
2021-03-18 22:22:47 -04:00
Tom Lane 28644fac10 Don't leak rd_statlist when a relcache entry is dropped.
Although these lists are usually NIL, and even when not empty
are unlikely to be large, constant relcache update traffic could
eventually result in visible bloat of CacheMemoryContext.

Found via valgrind testing.
Back-patch to v10 where this field was added.

Discussion: https://postgr.es/m/3816764.1616104288@sss.pgh.pa.us
2021-03-18 22:22:47 -04:00
Tom Lane 1d581ce712 Fix misuse of foreach_delete_current().
Our coding convention requires this macro's result to be assigned
back to the original List variable.  In this usage, since the
List could not become empty, there was no actual bug --- but
some compilers warned about it.  Oversight in be45be9c3.

Discussion: https://postgr.es/m/35077b31-2d62-1e31-0e2e-ddb52d590b73@enterprisedb.com
2021-03-18 19:24:22 -04:00
Tomas Vondra be45be9c33 Implement GROUP BY DISTINCT
With grouping sets, it's possible that some of the grouping sets are
duplicate.  This is especially common with CUBE and ROLLUP clauses. For
example GROUP BY CUBE (a,b), CUBE (b,c) is equivalent to

  GROUP BY GROUPING SETS (
    (a, b, c),
    (a, b, c),
    (a, b, c),
    (a, b),
    (a, b),
    (a, b),
    (a),
    (a),
    (a),
    (c, a),
    (c, a),
    (c, a),
    (c),
    (b, c),
    (b),
    ()
  )

Some of the grouping sets are calculated multiple times, which is mostly
unnecessary.  This commit implements a new GROUP BY DISTINCT feature, as
defined in the SQL standard, which eliminates the duplicate sets.

Author: Vik Fearing
Reviewed-by: Erik Rijkers, Georgios Kokolatos, Tomas Vondra
Discussion: https://postgr.es/m/bf3805a8-d7d1-ae61-fece-761b7ff41ecc@postgresfriends.org
2021-03-18 18:22:18 +01:00
Tomas Vondra cd91de0d17 Remove temporary files after backend crash
After a crash of a backend using temporary files, the files used to be
left behind, on the basis that it might be useful for debugging. But we
don't have any reports of anyone actually doing that, and it means the
disk usage may grow over time due to repeated backend failures (possibly
even hitting ENOSPC). So this behavior is a bit unfortunate, and fixing
it required either manual cleanup (deleting files, which is error-prone)
or restart of the instance (i.e. service disruption).

This implements automatic cleanup of temporary files, controled by a new
GUC remove_temp_files_after_crash. By default the files are removed, but
it can be disabled to restore the old behavior if needed.

Author: Euler Taveira
Reviewed-by: Tomas Vondra, Michael Paquier, Anastasia Lubennikova, Thomas Munro
Discussion: https://postgr.es/m/CAH503wDKdYzyq7U-QJqGn%3DGm6XmoK%2B6_6xTJ-Yn5WSvoHLY1Ww%40mail.gmail.com
2021-03-18 17:38:28 +01:00
Magnus Hagander da18d829c2 Fix function name in error hint
pg_read_file() is the function that's in core, pg_file_read() is in
adminpack. But when using pg_file_read() in adminpack it calls the *C*
level function pg_read_file() in core, which probably threw the original
author off. But the error hint should be about the SQL function.

Reported-By: Sergei Kornilov
Backpatch-through: 11
Discussion: https://postgr.es/m/373021616060475@mail.yandex.ru
2021-03-18 11:22:20 +01:00
Amit Kapila c8f78b6161 Add a new GUC and a reloption to enable inserts in parallel-mode.
Commit 05c8482f7f added the implementation of parallel SELECT for
"INSERT INTO ... SELECT ..." which may incur non-negligible overhead in
the additional parallel-safety checks that it performs, even when, in the
end, those checks determine that parallelism can't be used. This is
normally only ever a problem in the case of when the target table has a
large number of partitions.

A new GUC option "enable_parallel_insert" is added, to allow insert in
parallel-mode. The default is on.

In addition to the GUC option, the user may want a mechanism to allow
inserts in parallel-mode with finer granularity at table level. The new
table option "parallel_insert_enabled" allows this. The default is true.

Author: "Hou, Zhijie"
Reviewed-by: Greg Nancarrow, Amit Langote, Takayuki Tsunakawa, Amit Kapila
Discussion: https://postgr.es/m/CAA4eK1K-cW7svLC2D7DHoGHxdAdg3P37BLgebqBOC2ZLc9a6QQ%40mail.gmail.com
Discussion: https://postgr.es/m/CAJcOf-cXnB5cnMKqWEp2E2z7Mvcd04iLVmV=qpFJrR3AcrTS3g@mail.gmail.com
2021-03-18 07:25:27 +05:30
Andres Freund 5f79580ad6 Fix memory lifetime issues of replication slot stats.
When accessing replication slot stats, introduced in 9868167500,
pgstat_read_statsfiles() reads the data into newly allocated
memory. Unfortunately the current memory context at that point is the
callers, leading to leaks and use-after-free dangers.

The fix is trivial, explicitly use pgStatLocalContext. There's some
potential for further improvements, but that's outside of the scope of
this bugfix.

No backpatch necessary, feature is only in HEAD.

Author: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/20210317230447.c7uc4g3vbs4wi32i@alap3.anarazel.de
2021-03-17 16:21:46 -07:00
Tom Lane 8620a7f6db Code review for server's handling of "tablespace map" files.
While looking at Robert Foggia's report, I noticed a passel of
other issues in the same area:

* The scheme for backslash-quoting newlines in pathnames is just
wrong; it will misbehave if the last ordinary character in a pathname
is a backslash.  I'm not sure why we're bothering to allow newlines
in tablespace paths, but if we're going to do it we should do it
without introducing other problems.  Hence, backslashes themselves
have to be backslashed too.

* The author hadn't read the sscanf man page very carefully, because
this code would drop any leading whitespace from the path.  (I doubt
that a tablespace path with leading whitespace could happen in
practice; but if we're bothering to allow newlines in the path, it
sure seems like leading whitespace is little less implausible.)  Using
sscanf for the task of finding the first space is overkill anyway.

* While I'm not 100% sure what the rationale for escaping both \r and
\n is, if the idea is to allow Windows newlines in the file then this
code failed, because it'd throw an error if it saw \r followed by \n.

* There's no cross-check for an incomplete final line in the map file,
which would be a likely apparent symptom of the improper-escaping
bug.

On the generation end, aside from the escaping issue we have:

* If needtblspcmapfile is true then do_pg_start_backup will pass back
escaped strings in tablespaceinfo->path values, which no caller wants
or is prepared to deal with.  I'm not sure if there's a live bug from
that, but it looks like there might be (given the dubious assumption
that anyone actually has newlines in their tablespace paths).

* It's not being very paranoid about the possibility of random stuff
in the pg_tblspc directory.  IMO we should ignore anything without an
OID-like name.

The escaping rule change doesn't seem back-patchable: it'll require
doubling of backslashes in the tablespace_map file, which is basically
a basebackup format change.  The odds of that causing trouble are
considerably more than the odds of the existing bug causing trouble.
The rest of this seems somewhat unlikely to cause problems too,
so no back-patch.
2021-03-17 16:18:46 -04:00
Tom Lane a50e4fd028 Prevent buffer overrun in read_tablespace_map().
Robert Foggia of Trustwave reported that read_tablespace_map()
fails to prevent an overrun of its on-stack input buffer.
Since the tablespace map file is presumed trustworthy, this does
not seem like an interesting security vulnerability, but still
we should fix it just in the name of robustness.

While here, document that pg_basebackup's --tablespace-mapping option
doesn't work with tar-format output, because it doesn't.  To make it
work, we'd have to modify the tablespace_map file within the tarball
sent by the server, which might be possible but I'm not volunteering.
(Less-painful solutions would require changing the basebackup protocol
so that the source server could adjust the map.  That's not very
appetizing either.)
2021-03-17 16:10:37 -04:00
Thomas Munro 7f7f25f15e Revert "Fix race in Parallel Hash Join batch cleanup."
This reverts commit 378802e371.
This reverts commit 3b8981b6e1.

Discussion: https://postgr.es/m/CA%2BhUKGJmcqAE3MZeDCLLXa62cWM0AJbKmp2JrJYaJ86bz36LFA%40mail.gmail.com
2021-03-18 01:10:55 +13:00
Michael Paquier 9fd2952cf4 Fix comment in indexing.c
578b229, that removed support for WITH OIDS, has changed
CatalogTupleInsert() to not return an Oid, but one comment was still
mentioning that.

Author: Vik Fearing
Discussion: https://postgr.es/m/fef01975-ed10-3601-7b9e-80ecef72d00b@postgresfriends.org
2021-03-17 18:07:00 +09:00
Peter Eisentraut e1ae40f381 Small error message improvement 2021-03-17 08:17:33 +01:00
Thomas Munro 378802e371 Update the names of Parallel Hash Join phases.
Commit 3048898e dropped -ING from some wait event names that correspond
to barrier phases.  Update the phases' names to match.

While we're here making cosmetic changes, also rename "DONE" to "FREE".
That pairs better with "ALLOCATE", and describes the activity that
actually happens in that phase (as we do for the other phases) rather
than describing a state.  The distinction is clearer after bugfix commit
3b8981b6 split the phase into two.  As for the growth barriers, rename
their "ALLOCATE" phase to "REALLOCATE", which is probably a better
description of what happens then.  Also improve the comments about
the phases a bit.

Discussion: https://postgr.es/m/CA%2BhUKG%2BMDpwF2Eo2LAvzd%3DpOh81wUTsrwU1uAwR-v6OGBB6%2B7g%40mail.gmail.com
2021-03-17 18:43:04 +13:00
Thomas Munro 3b8981b6e1 Fix race in Parallel Hash Join batch cleanup.
With very unlucky timing and parallel_leader_participation off, PHJ
could attempt to access per-batch state just as it was being freed.
There was code intended to prevent that by checking for a cleared
pointer, but it was buggy.

Fix, by introducing an extra barrier phase.  The new phase
PHJ_BUILD_RUNNING means that it's safe to access the per-batch state to
find a batch to help with, and PHJ_BUILD_DONE means that it is too late.
The last to detach will free the array of per-batch state as before, but
now it will also atomically advance the phase at the same time, so that
late attachers can avoid the hazard, without the data race.  This
mirrors the way per-batch hash tables are freed (see phases
PHJ_BATCH_PROBING and PHJ_BATCH_DONE).

Revealed by a one-off build farm failure, where BarrierAttach() failed a
sanity check assertion, because the memory had been clobbered by
dsa_free().

Back-patch to 11, where the code arrived.

Reported-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/20200929061142.GA29096%40paquier.xyz
2021-03-17 18:05:39 +13:00
Amit Kapila 6b67d72b60 Fix race condition in drop subscription's handling of tablesync slots.
Commit ce0fdbfe97 made tablesync slots permanent and allow Drop
Subscription to drop such slots. However, it is possible that before
tablesync worker could get the acknowledgment of slot creation, drop
subscription stops it and that can lead to a dangling slot on the
publisher. Prevent cancel/die interrupts while creating a slot in the
tablesync worker.

Reported-by: Thomas Munro as per buildfarm
Author: Amit Kapila
Reviewed-by: Vignesh C, Takamichi Osumi
Discussion: https://postgr.es/m/CA+hUKGJG9dWpw1cOQ2nzWU8PHjm=PTraB+KgE5648K9nTfwvxg@mail.gmail.com
2021-03-17 08:15:12 +05:30
Thomas Munro 9e7ccd9ef6 Enable parallelism in REFRESH MATERIALIZED VIEW.
Pass CURSOR_OPT_PARALLEL_OK to pg_plan_query() so that parallel plans
are considered when running the underlying SELECT query.  This wasn't
done in commit e9baa5e9, which did this for CREATE MATERIALIZED VIEW,
because it wasn't yet known to be safe.

Since REFRESH always inserts into a freshly created table before later
merging or swapping the data into place with separate operations, we can
enable such plans here too.

Author: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>
Reviewed-by: Hou, Zhijie <houzj.fnst@cn.fujitsu.com>
Reviewed-by: Luc Vlaming <luc@swarm64.com>
Reviewed-by: Thomas Munro <thomas.munro@gmail.com>
Discussion: https://postgr.es/m/CALj2ACXg-4hNKJC6nFnepRHYT4t5jJVstYvri%2BtKQHy7ydcr8A%40mail.gmail.com
2021-03-17 15:04:17 +13:00
Peter Geoghegan fbe4cb3bd4 Fix comment about promising tuples.
Oversight in commit d168b66682, which added bottom-up index deletion.
2021-03-16 13:38:52 -07:00
Tom Lane 4b12ab18c9 Avoid corner-case memory leak in SSL parameter processing.
After reading the root cert list from the ssl_ca_file, immediately
install it as client CA list of the new SSL context.  That gives the
SSL context ownership of the list, so that SSL_CTX_free will free it.
This avoids a permanent memory leak if we fail further down in
be_tls_init(), which could happen if bogus CRL data is offered.

The leak could only amount to something if the CRL parameters get
broken after server start (else we'd just quit) and then the server
is SIGHUP'd many times without fixing the CRL data.  That's rather
unlikely perhaps, but it seems worth fixing, if only because the
code is clearer this way.

While we're here, add some comments about the memory management
aspects of this logic.

Noted by Jelte Fennema and independently by Andres Freund.
Back-patch to v10; before commit de41869b6 it doesn't matter,
since we'd not re-execute this code during SIGHUP.

Discussion: https://postgr.es/m/16160-18367e56e9a28264@postgresql.org
2021-03-16 16:03:06 -04:00
Stephen Frost c6fc50cb40 Use pre-fetching for ANALYZE
When we have posix_fadvise() available, we can improve the performance
of an ANALYZE by quite a bit by using it to inform the kernel of the
blocks that we're going to be asking for.  Similar to bitmap index
scans, the number of buffers pre-fetched is based off of the
maintenance_io_concurrency setting (for the particular tablespace or,
if not set, globally, via get_tablespace_maintenance_io_concurrency()).

Reviewed-By: Heikki Linnakangas, Tomas Vondra
Discussion: https://www.postgresql.org/message-id/VI1PR0701MB69603A433348EDCF783C6ECBF6EF0%40VI1PR0701MB6960.eurprd07.prod.outlook.com
2021-03-16 14:46:48 -04:00
Stephen Frost 94d13d474d Improve logging of auto-vacuum and auto-analyze
When logging auto-vacuum and auto-analyze activity, include the I/O
timing if track_io_timing is enabled.  Also, for auto-analyze, add the
read rate and the dirty rate, similar to how that information has
historically been logged for auto-vacuum.

Stephen Frost and Jakub Wartak

Reviewed-By: Heikki Linnakangas, Tomas Vondra
Discussion: https://www.postgresql.org/message-id/VI1PR0701MB69603A433348EDCF783C6ECBF6EF0%40VI1PR0701MB6960.eurprd07.prod.outlook.com
2021-03-16 14:46:48 -04:00
Tom Lane 1ea396362b Improve logging of bad parameter values in BIND messages.
Since commit ba79cb5dc, values of bind parameters have been logged
during errors in extended query mode.  However, we only did that after
we'd collected and converted all the parameter values, thus failing to
offer any useful localization of invalid-parameter problems.  Add a
separate callback that's used during parameter collection, and have it
print the parameter number, along with the input string if text input
format is used.

Justin Pryzby and Tom Lane

Discussion: https://postgr.es/m/20210104170939.GH9712@telsasoft.com
Discussion: https://postgr.es/m/CANfkH5k-6nNt-4cSv1vPB80nq2BZCzhFVR5O4VznYbsX0wZmow@mail.gmail.com
2021-03-16 11:16:41 -04:00
Alvaro Herrera acb7e4eb6b
Implement pipeline mode in libpq
Pipeline mode in libpq lets an application avoid the Sync messages in
the FE/BE protocol that are implicit in the old libpq API after each
query.  The application can then insert Sync at its leisure with a new
libpq function PQpipelineSync.  This can lead to substantial reductions
in query latency.

Co-authored-by: Craig Ringer <craig.ringer@enterprisedb.com>
Co-authored-by: Matthieu Garrigues <matthieu.garrigues@gmail.com>
Co-authored-by: Álvaro Herrera <alvherre@alvh.no-ip.org>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Aya Iwata <iwata.aya@jp.fujitsu.com>
Reviewed-by: Daniel Vérité <daniel@manitou-mail.org>
Reviewed-by: David G. Johnston <david.g.johnston@gmail.com>
Reviewed-by: Justin Pryzby <pryzby@telsasoft.com>
Reviewed-by: Kirk Jamison <k.jamison@fujitsu.com>
Reviewed-by: Michael Paquier <michael.paquier@gmail.com>
Reviewed-by: Nikhil Sontakke <nikhils@2ndquadrant.com>
Reviewed-by: Vaishnavi Prabakaran <VaishnaviP@fast.au.fujitsu.com>
Reviewed-by: Zhihong Yu <zyu@yugabyte.com>

Discussion: https://postgr.es/m/CAMsr+YFUjJytRyV4J-16bEoiZyH=4nj+sQ7JP9ajwz=B4dMMZw@mail.gmail.com
Discussion: https://postgr.es/m/CAJkzx4T5E-2cQe3dtv2R78dYFvz+in8PY7A8MArvLhs_pg75gg@mail.gmail.com
2021-03-15 18:13:42 -03:00
Fujii Masao d75288fb27 Make archiver process an auxiliary process.
This commit changes WAL archiver process so that it's treated as
an auxiliary process and can use shared memory. This is an infrastructure
patch required for upcoming shared-memory based stats collector patch
series. These patch series basically need any processes including archiver
that can report the statistics to access to shared memory. Since this patch
itself is useful to simplify the code and when users monitor the status of
archiver, it's committed separately in advance.

This commit simplifies the code for WAL archiving. For example, previously
backends need to signal to archiver via postmaster when they notify
archiver that there are some WAL files to archive. On the other hand,
this commit removes that signal to postmaster and enables backends to
notify archier directly using shared latch.

Also, as the side of this change, the information about archiver process
becomes viewable at pg_stat_activity view.

Author: Kyotaro Horiguchi
Reviewed-by: Andres Freund, Álvaro Herrera, Julien Rouhaud, Tomas Vondra, Arthur Zakirov, Fujii Masao
Discussion: https://postgr.es/m/20180629.173418.190173462.horiguchi.kyotaro@lab.ntt.co.jp
2021-03-15 13:13:14 +09:00
Peter Geoghegan 0ea71c93a0 Notice that heap page has dead items during VACUUM.
Consistently set a flag variable that tracks whether the current heap
page has a dead item during lazy vacuum's heap scan.  We missed the
common case where there is an preexisting (or even a new) LP_DEAD heap
line pointer.

Also make it clear that the variable might be affected by an existing
line pointer, say from an earlier opportunistic pruning operation.  This
distinction is important because it's the main reason why we can't just
use the nearby tups_vacuumed variable instead.

No backpatch.  In theory failing to set the page level flag variable had
no consequences.  Currently it is only used to defensively check if a
page marked all visible has dead items, which should never happen anyway
(if it does then the table must be corrupt).

Author: Masahiko Sawada <sawada.mshk@gmail.com>
Diagnosed-By: Masahiko Sawada <sawada.mshk@gmail.com>
Discussion: https://postgr.es/m/CAD21AoAtZb4+HJT_8RoOXvu4HM-Zd4HKS3YSMCH6+-W=bDyh-w@mail.gmail.com
2021-03-14 18:05:57 -07:00
Amit Kapila c5be48f092 Improve FK trigger parallel-safety check added by 05c8482f7f.
Commit 05c8482f7f added special logic related to parallel-safety of FK
triggers. This is a bit of a hack and should have instead been done by
simply setting appropriate proparallel values on those trigger functions
themselves.

Suggested-by: Tom Lane
Author: Greg Nancarrow
Reviewed-by: Amit Kapila
Discussion: https://postgr.es/m/2309260.1615485644@sss.pgh.pa.us
2021-03-13 09:20:52 +05:30
Peter Geoghegan 02b5940dbe Consolidate nbtree VACUUM metapage routines.
Simplify _bt_vacuum_needs_cleanup() functions's signature (it only needs
a single 'rel' argument now), and move it next to its sibling function
in nbtpage.c.

I believe that _bt_vacuum_needs_cleanup() was originally located in
nbtree.c due to an include dependency issue.  That's no longer an issue.

Follow-up to commit 9f3665fb.
2021-03-12 13:11:47 -08:00
Tom Lane f52c5d6749 Forbid marking an identity column as nullable.
GENERATED ALWAYS AS IDENTITY implies NOT NULL, but the code failed
to complain if you overrode that with "GENERATED ALWAYS AS IDENTITY
NULL".  One might think the old behavior was a feature, but it was
inconsistent because the outcome varied depending on the order of
the clauses, so it seems to have been just an oversight.

Per bug #16913 from Pavel Boev.  Back-patch to v10 where identity
columns were introduced.

Vik Fearing (minor tweaks by me)

Discussion: https://postgr.es/m/16913-3b5198410f67d8c6@postgresql.org
2021-03-12 11:08:42 -05:00
Thomas Munro 1b88b8908e Specialize checkpointer sort functions.
When sorting a potentially large number of dirty buffers, the
checkpointer can benefit from a faster sort routine.  One reported
improvement on a large buffer pool system was 1.4s -> 0.6s.

Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/CA%2BhUKGJ2-eaDqAum5bxhpMNhvuJmRDZxB_Tow0n-gse%2BHG0Yig%40mail.gmail.com
2021-03-12 23:56:02 +13:00
Amit Kapila 519e4c9ee2 Fix size overflow in calculation introduced by commits d6ad34f3 and bea449c6.
Reported-by: Thomas Munro
Author: Takayuki Tsunakawa
Reviewed-by: Kyotaro Horiguchi
Discussion: https://postgr.es/m/CA+hUKG+oPoFizjABt=GXZWTEHx3oev5rAe2scjW2r6F1rguo5w@mail.gmail.com
2021-03-12 15:42:08 +05:30
Amit Kapila e2cda3c20a Fix use of relcache TriggerDesc field introduced by commit 05c8482f7f.
The commit added code which used a relcache TriggerDesc field across
another cache access, which it shouldn't because the relcache doesn't
guarantee it won't get moved.

Diagnosed-by: Tom Lane
Author: Greg Nancarrow
Reviewed-by: Hou Zhijie, Amit Kapila
Discussion: https://postgr.es/m/2309260.1615485644@sss.pgh.pa.us
2021-03-12 15:14:41 +05:30
Thomas Munro 57dcc2ef33 Poll postmaster less frequently in recovery.
Since commits 9f095299 and f98b8476 we don't poll the postmaster
pipe at all during crash recovery on Linux and FreeBSD, but on other
operating systems we were still doing it for every WAL record.  Do it
less frequently on operating systems where system calls are required, at
the cost of delaying exit a bit after postmaster death.  This avoids
expensive system calls reported to slow down CPU-bound recovery by as
much as 10-30%.

Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi>
Reviewed-by: Fujii Masao <masao.fujii@oss.nttdata.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/CA%2BhUKGK1607VmtrDUHQXrsooU%3Dap4g4R2yaoByWOOA3m8xevUQ%40mail.gmail.com
Discussion: https://postgr.es/m/7261eb39-0369-f2f4-1bb5-62f3b6083b5e@iki.fi
2021-03-12 19:45:42 +13:00
Thomas Munro de829ddf23 Add condition variable for walreceiver shutdown.
Use this new CV to wait for walreceiver shutdown without a sleep/poll
loop, while also benefiting from standard postmaster death handling.

Discussion: https://postgr.es/m/CA%2BhUKGK1607VmtrDUHQXrsooU%3Dap4g4R2yaoByWOOA3m8xevUQ%40mail.gmail.com
2021-03-12 19:45:42 +13:00
Thomas Munro 600f2f50b7 Add condition variable for recovery resume.
Replace a sleep loop with a CV, to get a fast reaction time when
recovery is resumed or the postmaster exits via standard infrastructure.
Unfortunately we still need to wake up every second to perform extra
polling during the recovery pause loop.

Discussion: https://postgr.es/m/CA%2BhUKGK1607VmtrDUHQXrsooU%3Dap4g4R2yaoByWOOA3m8xevUQ%40mail.gmail.com
2021-03-12 19:45:42 +13:00
Fujii Masao b82640df00 Send statistics collected during shutdown checkpoint to the stats collector.
When shutdown is requested, checkpointer performs checkpoint or
restartpoint, and updates the statistics, before it exits. But previously
checkpointer didn't send those statistics to the stats collector.

Shutdown checkpoint and restartpoint are treated as requested ones
instead of scheduled ones, so the number of them are counted in
pg_stat_bgwriter.checkpoints_req column.

Author: Masahiro Ikeda
Reviewed-by: Fujii Masao
Discussion: https://postgr.es/m/0509ad67b585a5b86a83d445dfa75392@oss.nttdata.com
2021-03-12 14:23:00 +09:00
Fujii Masao 33394ee6f2 Force to send remaining WAL stats to the stats collector at walwriter exit.
In walwriter's main loop, WAL stats message is only sent if enough time
has passed since last one was sent to reach PGSTAT_STAT_INTERVAL msecs.
This is necessary to avoid overloading to the stats collector. But this
can cause recent WAL stats to be unsent when walwriter exits.

To ensure that all the WAL stats are sent, this commit makes walwriter
force to send remaining WAL stats to the collector when it exits because
of shutdown request. Note that those remaining WAL stats can still be
unsent when walwriter exits with non-zero exit code (e.g., FATAL error).
This is OK because that walwriter exit leads to server crash and
subsequent recovery discards all the stats. So there is no need to send
remaining stats in that case.

Author: Masahiro Ikeda
Reviewed-by: Fujii Masao
Discussion: https://postgr.es/m/0509ad67b585a5b86a83d445dfa75392@oss.nttdata.com
2021-03-12 13:29:59 +09:00
Thomas Munro 43c6662496 Minor modernization for README.barrier.
Itanium is very uncommon and being discontinued.  ARM is everywhere.
Prefer ARM as an example of an architecture with weak memory ordering.
2021-03-12 15:36:16 +13:00
Peter Geoghegan 7bb97211a5 Save a few cycles during nbtree VACUUM.
Avoid calling RelationGetNumberOfBlocks() unnecessarily in the common
case where there are no deleted but not yet recycled pages to recycle
during a cleanup-only nbtree VACUUM operation.

Follow-up to commit e5d8a999, which (among other things) taught the
"skip full scan" nbtree VACUUM mechanism to only trigger a full index
scan when the absolute number of deleted pages in the index is
considered excessive.
2021-03-11 14:18:23 -08:00
Peter Geoghegan effdd3f3b6 Add back vacuum_cleanup_index_scale_factor parameter.
Commit 9f3665fb removed the vacuum_cleanup_index_scale_factor storage
parameter.  However, that creates dump/reload hazards when moving across
major versions.

Add back the vacuum_cleanup_index_scale_factor parameter (though not the
GUC of the same name) purely to avoid problems when using tools like
pg_upgrade.  The parameter remains disabled and undocumented.

No backpatch to Postgres 13, since vacuum_cleanup_index_scale_factor was
only disabled by REL_13_STABLE's version of master branch commit
9f3665fb in the first place -- the parameter already looks like this on
REL_13_STABLE.

Discussion: https://postgr.es/m/YEm/a3Ko3nKnBuVq@paquier.xyz
2021-03-11 12:42:46 -08:00
Robert Haas 32fd2b57d7 Be clear about whether a recovery pause has taken effect.
Previously, the code and documentation seem to have essentially
assumed than a call to pg_wal_replay_pause() would take place
immediately, but that's not the case, because we only check for a
pause in certain places. This means that a tool that uses this
function and then wants to do something else afterward that is
dependent on the pause having taken effect doesn't know how long it
needs to wait to be sure that no more WAL is going to be replayed.

To avoid that, add a new function pg_get_wal_replay_pause_state()
which returns either 'not paused', 'paused requested', or 'paused'.
After calling pg_wal_replay_pause() the status will immediate change
from 'not paused' to 'pause requested'; when the startup process
has noticed this, the status will change to 'pause'.  For backward
compatibility, pg_is_wal_replay_paused() still exists and returns
the same thing as before: true if a pause has been requested,
whether or not it has taken effect yet; and false if not.
The documentation is updated to clarify.

To improve the changes that a pause request is quickly confirmed
effective, adjust things so that WaitForWALToBecomeAvailable will
swiftly reach a call to recoveryPausesHere() when a pause request
is made.

Dilip Kumar, reviewed by Simon Riggs, Kyotaro Horiguchi, Yugo Nagata,
Masahiko Sawada, and Bharath Rupireddy.

Discussion: http://postgr.es/m/CAFiTN-vcLLWEm8Zr%3DYK83rgYrT9pbC8VJCfa1kY9vL3AUPfu6g%40mail.gmail.com
2021-03-11 15:07:03 -05:00
Peter Geoghegan 5f8727f5a6 VACUUM ANALYZE: Always update pg_class.reltuples.
vacuumlazy.c sometimes fails to update pg_class entries for each index
(to ensure that pg_class.reltuples is current), even though analyze.c
assumed that that must have happened during VACUUM ANALYZE.  There are
at least a couple of reasons for this.  For example, vacuumlazy.c could
fail to update pg_class when the index AM indicated that its statistics
are merely an estimate, per the contract for amvacuumcleanup() routines
established by commit e57345975c back in 2006.

Stop assuming that pg_class must have been updated with accurate
statistics within VACUUM ANALYZE -- update pg_class for indexes at the
same time as the table relation in all cases.  That way VACUUM ANALYZE
will never fail to keep pg_class.reltuples reasonably accurate.

The only downside of this approach (compared to the old approach) is
that it might inaccurately set pg_class.reltuples for indexes whose heap
relation ends up with the same inaccurate value anyway.  This doesn't
seem too bad.  We already consistently called vac_update_relstats() (to
update pg_class) for the heap/table relation twice during any VACUUM
ANALYZE -- once in vacuumlazy.c, and once in analyze.c.  We now make
sure that we call vac_update_relstats() at least once (though often
twice) for each index.

This is follow up work to commit 9f3665fb, which dealt with issues in
btvacuumcleanup().  Technically this fixes an unrelated issue, though.
btvacuumcleanup() no longer provides an accurate num_index_tuples value
following commit 9f3665fb (when there was no btbulkdelete() call during
the VACUUM operation in question), but hashvacuumcleanup() has worked in
the same way for many years now.

Author: Peter Geoghegan <pg@bowt.ie>
Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com>
Discussion: https://postgr.es/m/CAH2-WzknxdComjhqo4SUxVFk_Q1171GJO2ZgHZ1Y6pion6u8rA@mail.gmail.com
Backpatch: 13-, just like commit 9f3665fb.
2021-03-10 17:07:57 -08:00
Peter Geoghegan 9f3665fbfc Don't consider newly inserted tuples in nbtree VACUUM.
Remove the entire idea of "stale stats" within nbtree VACUUM (stop
caring about stats involving the number of inserted tuples).  Also
remove the vacuum_cleanup_index_scale_factor GUC/param on the master
branch (though just disable them on postgres 13).

The vacuum_cleanup_index_scale_factor/stats interface made the nbtree AM
partially responsible for deciding when pg_class.reltuples stats needed
to be updated.  This seems contrary to the spirit of the index AM API,
though -- it is not actually necessary for an index AM's bulk delete and
cleanup callbacks to provide accurate stats when it happens to be
inconvenient.  The core code owns that.  (Index AMs have the authority
to perform or not perform certain kinds of deferred cleanup based on
their own considerations, such as page deletion and recycling, but that
has little to do with pg_class.reltuples/num_index_tuples.)

This issue was fairly harmless until the introduction of the
autovacuum_vacuum_insert_threshold feature by commit b07642db, which had
an undesirable interaction with the vacuum_cleanup_index_scale_factor
mechanism: it made insert-driven autovacuums perform full index scans,
even though there is no real benefit to doing so.  This has been tied to
a regression with an append-only insert benchmark [1].

Also have remaining cases that perform a full scan of an index during a
cleanup-only nbtree VACUUM indicate that the final tuple count is only
an estimate.  This prevents vacuumlazy.c from setting the index's
pg_class.reltuples in those cases (it will now only update pg_class when
vacuumlazy.c had TIDs for nbtree to bulk delete).  This arguably fixes
an oversight in deduplication-related bugfix commit 48e12913.

[1] https://smalldatum.blogspot.com/2021/01/insert-benchmark-postgres-is-still.html

Author: Peter Geoghegan <pg@bowt.ie>
Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com>
Discussion: https://postgr.es/m/CAD21AoA4WHthN5uU6+WScZ7+J_RcEjmcuH94qcoUPuB42ShXzg@mail.gmail.com
Backpatch: 13-, where autovacuum_vacuum_insert_threshold was added.
2021-03-10 16:27:01 -08:00
Bruce Momjian 845ac7f847 C comments: improve description of GiST NSN and GistBuildLSN
GiST indexes are complex, so adding more details in the code might help
someone.

Discussion: https://postgr.es/m/20210302164021.GA364@momjian.us
2021-03-10 17:03:10 -05:00
Thomas Munro d87251048a Replace buffer I/O locks with condition variables.
1.  Backends waiting for buffer I/O are now interruptible.

2.  If something goes wrong in a backend that is currently performing
I/O, waiting backends no longer wake up until that backend reaches
AbortBufferIO() and broadcasts on the CV.  Previously, any waiters would
wake up (because the I/O lock was automatically released) and then
busy-loop until AbortBufferIO() cleared BM_IO_IN_PROGRESS.

3.  LWLockMinimallyPadded is removed, as it would now be unused.

Author: Robert Haas <robertmhaas@gmail.com>
Reviewed-by: Thomas Munro <thomas.munro@gmail.com>
Reviewed-by: Julien Rouhaud <rjuju123@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> (earlier version, 2016)
Discussion: https://postgr.es/m/CA%2BhUKGJ8nBFrjLuCTuqKN0pd2PQOwj9b_jnsiGFFMDvUxahj_A%40mail.gmail.com
Discussion: https://postgr.es/m/CA+Tgmoaj2aPti0yho7FeEf2qt-JgQPRWb0gci_o1Hfr=C56Xng@mail.gmail.com
2021-03-11 10:36:17 +13:00
Tom Lane c3ffe34863 Avoid creating duplicate cached plans for inherited FK constraints.
When a foreign key constraint is applied to a partitioned table, each
leaf partition inherits a similar FK constraint.  We were processing all
of those constraints independently, meaning that in large partitioning
trees we'd build up large collections of cached FK-checking query plans.
However, in all cases but one, the generated queries are actually
identical for all members of the inheritance tree (because, in most
cases, the query only mentions the topmost table of the other side of
the FK relationship).  So we can share a single cached plan among all
the partitions, saving memory, not to mention time to build and maintain
the cached plans.

Keisuke Kuroda and Amit Langote

Discussion: https://postgr.es/m/cab4b85d-9292-967d-adf2-be0d803c3e23@nttcom.co.jp_1
2021-03-10 14:22:31 -05:00
Peter Eisentraut bbaf315309 Add bound check before bsearch() for performance
In the current lazy vacuum implementation, some index AMs such as
btree indexes call lazy_tid_reaped() for each index tuple during
ambulkdelete to check if the index tuple points to the (collected)
garbage tuple.  In that function, we simply call bsearch(), but we
should be able to know the result without bsearch() if the index tuple
points to the heap tuple that is out of range of the collected garbage
tuples.  Therefore, add a simple bound check before resorting to
bsearch().  Testing has shown that this can give significant
performance benefits.

Author: Masahiko Sawada <masahiko.sawada@2ndquadrant.com>
Discussion: https://www.postgresql.org/message-id/flat/CA+fd4k76j8jKzJzcx8UqEugvayaMSnQz0iLUt_XgBp-_-bd22A@mail.gmail.com
2021-03-10 15:19:37 +01:00
Peter Eisentraut 1657b37d7c Small debug message tweak
This makes the wording of the delete case match the update case.
2021-03-10 08:16:38 +01:00
Amit Kapila e4e87a32cc Fix valgrind issue in commit 05c8482f7f.
Initialize other newly added variables in max_parallel_hazard_context via
is_parallel_safe() because we don't check the parallel-safety of target
relations in that function.

Reported-by: Tom Lane as per buildfarm
Author: Amit Kapila
Discussion: https://postgr.es/m/2060179.1615347455@sss.pgh.pa.us
2021-03-10 10:06:39 +05:30
Amit Kapila 05c8482f7f Enable parallel SELECT for "INSERT INTO ... SELECT ...".
Parallel SELECT can't be utilized for INSERT in the following cases:
- INSERT statement uses the ON CONFLICT DO UPDATE clause
- Target table has a parallel-unsafe: trigger, index expression or
  predicate, column default expression or check constraint
- Target table has a parallel-unsafe domain constraint on any column
- Target table is a partitioned table with a parallel-unsafe partition key
  expression or support function

The planner is updated to perform additional parallel-safety checks for
the cases listed above, for determining whether it is safe to run INSERT
in parallel-mode with an underlying parallel SELECT. The planner will
consider using parallel SELECT for "INSERT INTO ... SELECT ...", provided
nothing unsafe is found from the additional parallel-safety checks, or
from the existing parallel-safety checks for SELECT.

While checking parallel-safety, we need to check it for all the partitions
on the table which can be costly especially when we decide not to use a
parallel plan. So, in a separate patch, we will introduce a GUC and or a
reloption to enable/disable parallelism for Insert statements.

Prior to entering parallel-mode for the execution of INSERT with parallel
SELECT, a TransactionId is acquired and assigned to the current
transaction state. This is necessary to prevent the INSERT from attempting
to assign the TransactionId whilst in parallel-mode, which is not allowed.
This approach has a disadvantage in that if the underlying SELECT does not
return any rows, then the TransactionId is not used, however that
shouldn't happen in practice in many cases.

Author: Greg Nancarrow, Amit Langote, Amit Kapila
Reviewed-by: Amit Langote, Hou Zhijie, Takayuki Tsunakawa, Antonin Houska, Bharath Rupireddy, Dilip Kumar, Vignesh C, Zhihong Yu, Amit Kapila
Tested-by: Tang, Haiying
Discussion: https://postgr.es/m/CAJcOf-cXnB5cnMKqWEp2E2z7Mvcd04iLVmV=qpFJrR3AcrTS3g@mail.gmail.com
Discussion: https://postgr.es/m/CAJcOf-fAdj=nDKMsRhQzndm-O13NY4dL6xGcEvdX5Xvbbi0V7g@mail.gmail.com
2021-03-10 07:38:58 +05:30
Fujii Masao ff99918c62 Track total amounts of times spent writing and syncing WAL data to disk.
This commit adds new GUC track_wal_io_timing. When this is enabled,
the total amounts of time XLogWrite writes and issue_xlog_fsync syncs
WAL data to disk are counted in pg_stat_wal. This information would be
useful to check how much WAL write and sync affect the performance.

Enabling track_wal_io_timing will make the server query the operating
system for the current time every time WAL is written or synced,
which may cause significant overhead on some platforms. To avoid such
additional overhead in the server with track_io_timing enabled,
this commit introduces track_wal_io_timing as a separate parameter from
track_io_timing.

Note that WAL write and sync activity by walreceiver has not been tracked yet.

This commit makes the server also track the numbers of times XLogWrite
writes and issue_xlog_fsync syncs WAL data to disk, in pg_stat_wal,
regardless of the setting of track_wal_io_timing. This counters can be
used to calculate the WAL write and sync time per request, for example.

Bump PGSTAT_FILE_FORMAT_ID.

Bump catalog version.

Author: Masahiro Ikeda
Reviewed-By: Japin Li, Hayato Kuroda, Masahiko Sawada, David Johnston, Fujii Masao
Discussion: https://postgr.es/m/0509ad67b585a5b86a83d445dfa75392@oss.nttdata.com
2021-03-09 16:52:06 +09:00
Michael Paquier 9d2d457009 Add support for more progress reporting in COPY
The command (TO or FROM), its type (file, pipe, program or callback),
and the number of tuples excluded by a WHERE clause in COPY FROM are
added to the progress reporting already available.

The column "lines_processed" is renamed to "tuples_processed" to
disambiguate the meaning of this column in the cases of CSV and BINARY
COPY and to be more consistent with the other catalog progress views.

Bump catalog version, again.

Author: Matthias van de Meent
Reviewed-by: Michael Paquier, Justin Pryzby, Bharath Rupireddy, Josef
Šimánek, Tomas Vondra
Discussion: https://postgr.es/m/CAEze2WiOcgdH4aQA8NtZq-4dgvnJzp8PohdeKchPkhMY-jWZXA@mail.gmail.com
2021-03-09 14:21:03 +09:00
Michael Paquier f9264d1524 Remove support for SSL compression
PostgreSQL disabled compression as of e3bdb2d and the documentation
recommends against using it since.  Additionally, SSL compression has
been disabled in OpenSSL since version 1.1.0, and was disabled in many
distributions long before that.  The most recent TLS version, TLSv1.3,
disallows compression at the protocol level.

This commit removes the feature itself, removing support for the libpq
parameter sslcompression (parameter still listed for compatibility
reasons with existing connection strings, just ignored), and removes
the equivalent field in pg_stat_ssl and de facto PgBackendSSLStatus.

Note that, on top of removing the ability to activate compression by
configuration, compression is actively disabled in both frontend and
backend to avoid overrides from local configurations.

A TAP test is added for deprecated SSL parameters to check after
backwards compatibility.

Bump catalog version.

Author: Daniel Gustafsson
Reviewed-by: Peter Eisentraut, Magnus Hagander, Michael Paquier
Discussion:  https://postgr.es/m/7E384D48-11C5-441B-9EC3-F7DB1F8518F6@yesql.se
2021-03-09 11:16:47 +09:00
Tom Lane d4545dc19b Complain if a function-in-FROM returns a set when it shouldn't.
Throw a "function protocol violation" error if a function in FROM
tries to return a set though it wasn't marked proretset.  Although
such cases work at the moment, it doesn't seem like something we
want to guarantee will keep working.  Besides, there are other
negative consequences of not setting the proretset flag, such as
potentially bad plans.

No back-patch, since if there is any third-party code violating
this expectation, people wouldn't appreciate us breaking it in
a minor release.

Discussion: https://postgr.es/m/1636062.1615141782@sss.pgh.pa.us
2021-03-08 18:54:55 -05:00
Tom Lane 5c06abb9b9 Validate the OID argument of pg_import_system_collations().
"SELECT pg_import_system_collations(0)" caused an assertion failure.
With a random nonzero argument --- or indeed with zero, in non-assert
builds --- it would happily make pg_collation entries with garbage
values of collnamespace.  These are harmless as far as I can tell
(unless maybe the OID happens to become used for a schema, later on?).
In any case this isn't a security issue, since the function is
superuser-only.  But it seems like a gotcha for unwary DBAs, so let's
add a check that the given OID belongs to some schema.

Back-patch to v10 where this function was introduced.
2021-03-08 18:21:51 -05:00
Tom Lane 6c20bdb2a2 Further tweak memory management for regex DFAs.
Coverity is still unhappy after commit 190c79884, and after looking
closer I think it might be onto something.  The callers of newdfa()
typically drop out if v->err has been set nonzero, which newdfa()
is faithfully doing if it fails.  However, what if v->err was already
nonzero before we entered newdfa()?  Then newdfa() could succeed and
the caller would promptly leak its result.

I don't think this scenario can actually happen, but the predicate
"v->err is always zero when newdfa() is called" seems difficult to be
entirely sure of; there's a good deal of code that potentially could
get that wrong.

It seems better to adjust the callers to directly check for a null
result instead of relying on ISERR() tests.  This is slightly cheaper
than the previous coding anyway.

Lacking evidence that there's any real bug, no back-patch.
2021-03-08 16:32:29 -05:00
Amit Kapila 8a812e5106 Track replication origin progress for rollbacks.
Commit 1eb6d6527a allowed to track replica origin replay progress for 2PC
but it was not complete. It misses to properly track the progress for
rollback prepared especially it missed updating the code for recovery.
Additionally, we need to allow tracking it on subscriber nodes where
wal_level might not be logical.

It is required to track decoding of 2PC which is committed in PG14
(a271a1b50e) and also nobody complained about this till now so not
backpatching it.

Author: Amit Kapila
Reviewed-by: Michael Paquier and Ajin Cherian
Discussion: https://postgr.es/m/CAA4eK1L-kHmMnSdrRW6UhRbCjR7cgh04c+6psY15qzT6ktcd+g@mail.gmail.com
2021-03-08 07:54:03 +05:30
Heikki Linnakangas 3174d69fb9 Remove server and libpq support for old FE/BE protocol version 2.
Protocol version 3 was introduced in PostgreSQL 7.4. There shouldn't be
many clients or servers left out there without version 3 support. But as
a courtesy, I kept just enough of the old protocol support that we can
still send the "unsupported protocol version" error in v2 format, so that
old clients can display the message properly. Likewise, libpq still
understands v2 ErrorResponse messages when establishing a connection.

The impetus to do this now is that I'm working on a patch to COPY
FROM, to always prefetch some data. We cannot do that safely with the
old protocol, because it requires parsing the input one byte at a time
to detect the end-of-copy marker.

Reviewed-by: Tom Lane, Alvaro Herrera, John Naylor
Discussion: https://www.postgresql.org/message-id/9ec25819-0a8a-d51a-17dc-4150bb3cca3b%40iki.fi
2021-03-04 10:45:55 +02:00
Tom Lane 0a687c8f10 Add trim_array() function.
This has been in the SQL spec since 2008.  It's a pretty thin
wrapper around the array slice functionality, but the spec
says we should have it, so here it is.

Vik Fearing, reviewed by Dian Fay

Discussion: https://postgr.es/m/fc92ce17-9655-8ff1-c62a-4dc4c8ccd815@postgresfriends.org
2021-03-03 16:39:57 -05:00
Peter Eisentraut e527a99055 Some copy-editing of GUC descriptions 2021-03-03 07:14:35 +01:00
Thomas Munro 8eda3eba30 Use sort_template.h for qsort_tuple() and qsort_ssup().
Replace the Perl code previously used to generate specialized sort
functions with sort_template.h.

Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Discussion: https://postgr.es/m/CA%2BhUKGJ2-eaDqAum5bxhpMNhvuJmRDZxB_Tow0n-gse%2BHG0Yig%40mail.gmail.com
2021-03-03 17:02:32 +13:00
Amit Kapila 19890a064e Add option to enable two_phase commits via pg_create_logical_replication_slot.
Commit 0aa8a01d04 extends the output plugin API to allow decoding of
prepared xacts and allowed the user to enable/disable the two-phase option
via pg_logical_slot_get_changes(). This can lead to a problem such that
the first time when it gets changes via pg_logical_slot_get_changes()
without two_phase option enabled it will not get the prepared even though
prepare is after consistent snapshot. Now next time during getting changes,
if the two_phase option is enabled it can skip prepare because by that
time start decoding point has been moved. So the user will only get commit
prepared.

Allow to enable/disable this option at the create slot time and default
will be false. It will break the existing slots which is fine in a major
release.

Author: Ajin Cherian
Reviewed-by: Amit Kapila and Vignesh C
Discussion: https://postgr.es/m/d0f60d60-133d-bf8d-bd70-47784d8fabf3@enterprisedb.com
2021-03-03 07:34:11 +05:30
Peter Geoghegan 5b2f2af3d9 nbtree page deletion: Add leaftopparent assertion.
Add documenting assertion.  This makes it easier to follow how we
maintain the top parent link in target subtree's half-dead/leaf level
page.
2021-03-02 14:06:07 -08:00
Peter Geoghegan 3d8d5787a3 Fix nbtree page deletion error messages.
Adjust some "can't happen" error messages that assumed that the page
deletion target page must be a half-dead page.  This assumption was
wrong in the case of an internal target page.  Simply refer to these
pages as the target page instead.

Internal pages are never marked half-dead.  There is exactly one
half-dead page for each subtree undergoing deletion.  The half-dead page
is also the target subtree's leaf-level page.  This has been the case
since commit efada2b8, which totally overhauled nbtree page deletion.
2021-03-02 13:02:24 -08:00
Tom Lane d16f8c8e41 Mark default_transaction_read_only as GUC_REPORT.
This allows clients to find out the setting at connection time without
having to expend a query round trip to do so; which is helpful when
trying to identify read/write servers.  (One must also look at
in_hot_standby, but that's already GUC_REPORT, cf bf8a662c9.)
Modifying libpq to make use of this will come soon, but I felt it
cleaner to push the server change separately.

Haribabu Kommi, Greg Nancarrow, Vignesh C; reviewed at various times
by Laurenz Albe, Takayuki Tsunakawa, Peter Smith.

Discussion: https://postgr.es/m/CAF3+xM+8-ztOkaV9gHiJ3wfgENTq97QcjXQt+rbFQ6F7oNzt9A@mail.gmail.com
2021-03-02 13:53:54 -05:00
Tom Lane 4604f83fdf Suppress unnecessary regex subre nodes in a couple more cases.
This extends the changes made in commit cebc1d34e, teaching
parseqatom() to generate fewer or cheaper subre nodes in some edge
cases.  The case of interest here is a quantified atom that is "messy"
only because it has greediness opposite to what preceded it (whereas
captures and backrefs are intrinsically messy).  In this case we don't
need an iteration node, since we don't care where the sub-matches of
the quantifier are; and we might also not need a second concatenation
node.  This seems of only marginal real-world use according to my
testing, but I wanted to get it in before wrapping up this series of
regex performance fixes.

Discussion: https://postgr.es/m/1340281.1613018383@sss.pgh.pa.us
2021-03-02 12:14:14 -05:00
Tom Lane 0c3405cf11 Improve performance of regular expression back-references.
In some cases, at the time that we're doing an NFA-based precheck
of whether a backref subexpression can match at a particular place
in the string, we already know which substring the referenced
subexpression matched.  If so, we might as well forget about the NFA
and just compare the substring; this is faster and it gives an exact
rather than approximate answer.

In general, this optimization can help while we are prechecking within
the second child expression of a concat node, while the capture was
within the first child expression; then the substring was saved during
cdissect() of the first child and will be available to NFA checks done
while cdissect() recurses into the second child.  It can help quite a
lot if the tree looks like

              concat
             /      \
      capture        concat
                    /      \
     expensive stuff        backref

as we will be able to avoid recursively dissecting the "expensive
stuff" before discovering that the backref isn't satisfied with a
particular midpoint that the lower concat node is testing.  This
doesn't help if the concat tree is left-deep, as the capture node
won't get set soon enough (and it's hard to fix that without changing
the engine's match behavior).  Fortunately, right-deep concat trees
are the common case.

Patch by me, reviewed by Joel Jacobson

Discussion: https://postgr.es/m/661609.1614560029@sss.pgh.pa.us
2021-03-02 11:55:12 -05:00
Tom Lane 4aea704a5b Fix semantics of regular expression back-references.
POSIX defines the behavior of back-references thus:

    The back-reference expression '\n' shall match the same (possibly
    empty) string of characters as was matched by a subexpression
    enclosed between "\(" and "\)" preceding the '\n'.

As far as I can see, the back-reference is supposed to consider only
the data characters matched by the referenced subexpression.  However,
because our engine copies the NFA constructed from the referenced
subexpression, it effectively enforces any constraints therein, too.
As an example, '(^.)\1' ought to match 'xx', or any other string
starting with two occurrences of the same character; but in our code
it does not, and indeed can't match anything, because the '^' anchor
constraint is included in the backref's copied NFA.  If POSIX intended
that, you'd think they'd mention it.  Perl for one doesn't act that
way, so it's hard to conclude that this isn't a bug.

Fix by modifying the backref's NFA immediately after it's copied from
the reference, replacing all constraint arcs by EMPTY arcs so that the
constraints are treated as automatically satisfied.  This still allows
us to enforce matching rules that depend only on the data characters;
for example, in '(^\d+).*\1' the NFA matching step will still know
that the backref can only match strings of digits.

Perhaps surprisingly, this change does not affect the results of any
of a rather large corpus of real-world regexes.  Nonetheless, I would
not consider back-patching it, since it's a clear compatibility break.

Patch by me, reviewed by Joel Jacobson

Discussion: https://postgr.es/m/661609.1614560029@sss.pgh.pa.us
2021-03-02 11:34:53 -05:00
Michael Paquier fabde52fab Simplify code to switch pg_class.relrowsecurity in tablecmds.c
The same code pattern was repeated twice to enable or disable ROW LEVEL
SECURITY with an ALTER TABLE command.  This makes the code slightly
cleaner.

Author: Justin Pryzby
Reviewed-by: Zhihong Yu
Discussion: https://postgr.es/m/20210228211854.GC20769@telsasoft.com
2021-03-02 12:30:21 +09:00
Tom Lane ffd3944ab9 Improve reporting for syntax errors in multi-line JSON data.
Point to the specific line where the error was detected; the
previous code tended to include several preceding lines as well.
Avoid re-scanning the entire input to recompute which line that
was.  Simplify the logic a bit.  Add test cases.

Simon Riggs and Hamid Akhtar, reviewed by Daniel Gustafsson and myself

Discussion: https://postgr.es/m/CANbhV-EPBnXm3MF_TTWBwwqgn1a1Ghmep9VHfqmNBQ8BT0f+_g@mail.gmail.com
2021-03-01 16:44:17 -05:00
Thomas Munro bd69ddfcdb Remove obsolete comment for WaitForProcSignalBarrier().
Commit 814f1d8b removed the behavior described.

Reported-by: Amit Kapila <amit.kapila16@gmail.com>
2021-03-02 09:30:57 +13:00
Thomas Munro f5a5773a9d Allow condition variables to be used in interrupt code.
Adjust the condition variable sleep loop to work correctly when code
reached by its internal CHECK_FOR_INTERRUPTS() call interacts with
another condition variable.

There are no such cases currently, but a proposed patch would do this.

Discussion: https://postgr.es/m/CA+hUKGLdemy2gBm80kz20GTe6hNVwoErE8KwcJk6-U56oStjtg@mail.gmail.com
2021-03-01 17:24:47 +13:00
Thomas Munro 814f1d8bc3 Use condition variables for ProcSignalBarriers.
Instead of a poll/sleep loop, use a condition variable for precise
wake-up whenever a backend's pss_barrierGeneration advances.

Discussion: https://postgr.es/m/CA+hUKGLdemy2gBm80kz20GTe6hNVwoErE8KwcJk6-U56oStjtg@mail.gmail.com
2021-03-01 17:23:43 +13:00
Amit Kapila 8bdb1332eb Avoid repeated decoding of prepared transactions after a restart.
In commit a271a1b50e, we allowed decoding at prepare time and the prepare
was decoded again if there is a restart after decoding it. It was done
that way because we can't distinguish between the cases where we have not
decoded the prepare because it was prior to consistent snapshot or we have
decoded it earlier but restarted. To distinguish between these two cases,
we have introduced an initial_consistent_point at the slot level which is
an LSN at which we found a consistent point at the time of slot creation.
This is also the point where we have exported a snapshot for the initial
copy. So, prepare transaction prior to this point are sent along with
commit prepared.

This commit bumps SNAPBUILD_VERSION because of change in SnapBuild. It
will break existing slots which is fine in a major release.

Author: Ajin Cherian, based on idea by Andres Freund
Reviewed-by: Amit Kapila and Vignesh C
Discussion: https://postgr.es/m/d0f60d60-133d-bf8d-bd70-47784d8fabf3@enterprisedb.com
2021-03-01 09:11:18 +05:30
Thomas Munro 6230912f23 Use FeBeWaitSet for walsender.c.
This avoids the need to set up and tear down a fresh WaitEventSet every
time we need need to wait.  We have to add an explicit exit on
postmaster exit (FeBeWaitSet isn't set up to do that automatically), so
move the code to do that into a new function to avoid repetition.

Reviewed-by: Kyotaro Horiguchi <horikyota.ntt@gmail.com> (earlier version)
Discussion: https://postgr.es/m/CA%2BhUKGJAC4Oqao%3DqforhNey20J8CiG2R%3DoBPqvfR0vOJrFysGw%40mail.gmail.com
2021-03-01 16:19:38 +13:00
Thomas Munro a042ba2ba7 Introduce symbolic names for FeBeWaitSet positions.
Previously we used 0 and 1 to refer to the socket and latch in far flung
parts of the tree, without any explanation.  Also use PGINVALID_SOCKET
rather than -1 in a couple of places that didn't already do that.

Reviewed-by: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Discussion: https://postgr.es/m/CA%2BhUKGJAC4Oqao%3DqforhNey20J8CiG2R%3DoBPqvfR0vOJrFysGw%40mail.gmail.com
2021-03-01 16:10:16 +13:00
Amit Kapila b4e3dc7fd4 Update the docs and comments for decoding of prepared xacts.
Commit a271a1b50e introduced decoding at prepare time in ReorderBuffer.
This can lead to deadlock for out-of-core logical replication solutions
that uses this feature to build distributed 2PC in case such transactions
lock [user] catalog tables exclusively. They need to inform users to not
have locks on catalog tables (via explicit LOCK command) in such
transactions.

Reported-by: Andres Freund
Discussion: https://postgr.es/m/20210222222847.tpnb6eg3yiykzpky@alap3.anarazel.de
2021-03-01 08:14:33 +05:30
Thomas Munro 6148656a0b Use EVFILT_SIGNAL for kqueue latches.
Cut down on system calls and other overheads by waiting for SIGURG
explicitly with kqueue instead of using a signal handler and self-pipe.
Affects *BSD and macOS systems.

This leaves only the poll implementation with a signal handler and the
traditional self-pipe trick.

Discussion: https://postgr.es/m/CA+hUKGJjxPDpzBE0a3hyUywBvaZuC89yx3jK9RFZgfv_KHU7gg@mail.gmail.com
2021-03-01 14:20:04 +13:00
Thomas Munro 6a2a70a020 Use signalfd(2) for epoll latches.
Cut down on system calls and other overheads by reading from a signalfd
instead of using a signal handler and self-pipe.  Affects Linux sytems,
and possibly others including illumos that implement the Linux epoll and
signalfd interfaces.

Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/CA+hUKGJjxPDpzBE0a3hyUywBvaZuC89yx3jK9RFZgfv_KHU7gg@mail.gmail.com
2021-03-01 14:12:02 +13:00
Thomas Munro 83709a0d5a Use SIGURG rather than SIGUSR1 for latches.
Traditionally, SIGUSR1 has been overloaded for ad-hoc signals,
procsignal.c signals and latch.c wakeups.  Move that last use over to a
new dedicated signal.  SIGURG is normally used to report out-of-band
socket data, but PostgreSQL doesn't use that facility.

The signal handler is now installed in all postmaster children by
InitializeLatchSupport().  Those wishing to disconnect from it should
call ShutdownLatchSupport().

Future patches will use this separation of signals to avoid the need for
a signal handler on some operating systems.

Discussion: https://postgr.es/m/CA+hUKGJjxPDpzBE0a3hyUywBvaZuC89yx3jK9RFZgfv_KHU7gg@mail.gmail.com
2021-03-01 12:44:12 +13:00
Thomas Munro c8f3bc2401 Optimize latches to send fewer signals.
Don't send signals to processes that aren't sleeping.

Author: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/CA+hUKGJjxPDpzBE0a3hyUywBvaZuC89yx3jK9RFZgfv_KHU7gg@mail.gmail.com
2021-03-01 12:44:12 +13:00
Thomas Munro d1b90995e8 Remove latch.c workaround for Linux < 2.6.27.
Commit 82ebbeb0 added a workaround for systems with no epoll_create1()
and EPOLL_CLOEXEC.  Linux < 2.6.27 and glibc < 2.9 are long gone.  Now
seems like a good time to drop the extra code, because otherwise we'd
have to add similar already-dead workaround code to new patches using
XXX_CLOEXEC flags that arrived in the same kernel release.

Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/CA%2BhUKGKL_%3DaO%3Dr30N%3Ds9VoDgTqHpRSzePRbA9dkYO7snc7HsxA%40mail.gmail.com
2021-03-01 11:24:28 +13:00
Alvaro Herrera 25936fd46c
Fix use-after-free bug with AfterTriggersTableData.storeslot
AfterTriggerSaveEvent() wrongly allocates the slot in execution-span
memory context, whereas the correct thing is to allocate it in
a transaction-span context, because that's where the enclosing
AfterTriggersTableData instance belongs into.

Backpatch to 12 (the test back to 11, where it works well with no code
changes, and it's good to have to confirm that the case was previously
well supported); this bug seems introduced by commit ff11e7f4b9.

Reported-by: Bertrand Drouvot <bdrouvot@amazon.com>
Author: Amit Langote <amitlangote09@gmail.com>
Discussion: https://postgr.es/m/39a71864-b120-5a5c-8cc5-c632b6f16761@amazon.com
2021-02-27 18:09:15 -03:00
David Rowley 977b2c0853 Add missing TidRangeScan readfunc
Mistakenly forgotten in bb437f995
2021-02-27 23:21:21 +13:00
David Rowley bb437f995d Add TID Range Scans to support efficient scanning ranges of TIDs
This adds a new executor node named TID Range Scan.  The query planner
will generate paths for TID Range scans when quals are discovered on base
relations which search for ranges on the table's ctid column.  These
ranges may be open at either end. For example, WHERE ctid >= '(10,0)';
will return all tuples on page 10 and over.

To support this, two new optional callback functions have been added to
table AM.  scan_set_tidrange is used to set the scan range to just the
given range of TIDs.  scan_getnextslot_tidrange fetches the next tuple
in the given range.

For AMs were scanning ranges of TIDs would not make sense, these functions
can be set to NULL in the TableAmRoutine.  The query planner won't
generate TID Range Scan Paths in that case.

Author: Edmund Horner, David Rowley
Reviewed-by: David Rowley, Tomas Vondra, Tom Lane, Andres Freund, Zhihong Yu
Discussion: https://postgr.es/m/CAMyN-kB-nFTkF=VA_JPwFNo08S0d-Yk0F741S2B7LDmYAi8eyA@mail.gmail.com
2021-02-27 22:59:36 +13:00
Peter Eisentraut f4adc41c4f Enhanced cycle mark values
Per SQL:202x draft, in the CYCLE clause of a recursive query, the
cycle mark values can be of type boolean and can be omitted, in which
case they default to TRUE and FALSE.

Reviewed-by: Vik Fearing <vik@postgresfriends.org>
Discussion: https://www.postgresql.org/message-id/flat/db80ceee-6f97-9b4a-8ee8-3ba0c58e5be2@2ndquadrant.com
2021-02-27 08:13:24 +01:00
Tom Lane 0fc1af174c Improve memory management in regex compiler.
The previous logic here created a separate pool of arcs for each
state, so that the out-arcs of each state were physically stored
within it.  Perhaps this choice was driven by trying to not include
a "from" pointer within each arc; but Spencer gave up on that idea
long ago, and it's hard to see what the value is now.  The approach
turns out to be fairly disastrous in terms of memory consumption,
though.  In the first place, NFAs built by this engine seem to have
about 4 arcs per state on average, with a majority having only one
or two out-arcs.  So pre-allocating 10 out-arcs for each state is
already cause for a factor of two or more bloat.  Worse, the NFA
optimization phase moves arcs around with abandon.  In a large NFA,
some of the states will have hundreds of out-arcs, so towards the
end of the optimization phase we have a significant number of states
whose arc pools have room for hundreds of arcs each, even though only
a few of those arcs are in use.  We have seen real-world regexes in
which this effect bloats the memory requirement by 25X or even more.

Hence, get rid of the per-state arc pools in favor of a single arc
pool for the whole NFA, with variable-sized allocation batches
instead of always asking for 10 at a time.  While we're at it,
let's batch the allocations of state structs too, to further reduce
the malloc traffic.

This incidentally allows moveouts() to be optimized in a similar
way to moveins(): when moving an arc to another state, it's now
valid to just re-link the same arc struct into a different outchain,
where before the code invariants required us to make a physically
new arc and then free the old one.

These changes reduce the regex compiler's typical space consumption
for average-size regexes by about a factor of two, and much more for
large or complicated regexes.  In a large test set of real-world
regexes, we formerly had half a dozen cases that failed with "regular
expression too complex" due to exceeding the REG_MAX_COMPILE_SPACE
limit (about 150MB); we would have had to raise that limit to
something close to 400MB to make them work with the old code.  Now,
none of those cases need more than 13MB to compile.  Furthermore,
the test set is about 10% faster overall due to less malloc traffic.

Discussion: https://postgr.es/m/168861.1614298592@sss.pgh.pa.us
2021-02-26 13:52:10 -05:00
Thomas Munro 8556267b2b Revert "pg_collation_actual_version() -> pg_collation_current_version()."
This reverts commit 9cf184cc05.  Name
change less well received than anticipated.

Discussion: https://postgr.es/m/afcfb97e-88a1-a540-db95-6c573b93bc2b%40eisentraut.org
2021-02-26 15:29:27 +13:00
Tom Lane 80ca8464fe Fix list-manipulation bug in WITH RECURSIVE processing.
makeDependencyGraphWalker and checkWellFormedRecursionWalker
thought they could hold onto a pointer to a list's first
cons cell while the list was modified by recursive calls.
That was okay when the cons cell was actually separately
palloc'd ... but since commit 1cff1b95a, it's quite unsafe,
leading to core dumps or incorrect complaints of faulty
WITH nesting.

In the field this'd require at least a seven-deep WITH nest
to cause an issue, but enabling DEBUG_LIST_MEMORY_USAGE
allows the bug to be seen with lesser nesting depths.

Per bug #16801 from Alexander Lakhin.  Back-patch to v13.

Michael Paquier and Tom Lane

Discussion: https://postgr.es/m/16801-393c7922143eaa4d@postgresql.org
2021-02-25 20:47:32 -05:00
Peter Geoghegan 2376361839 VACUUM VERBOSE: Count "newly deleted" index pages.
Teach VACUUM VERBOSE to report on pages deleted by the _current_ VACUUM
operation -- these are newly deleted pages.  VACUUM VERBOSE continues to
report on the total number of deleted pages in the entire index (no
change there).  The former is a subset of the latter.

The distinction between each category of deleted index page only arises
with index AMs where page deletion is supported and is decoupled from
page recycling for performance reasons.

This is follow-up work to commit e5d8a999, which made nbtree store
64-bit XIDs (not 32-bit XIDs) in pages at the point at which they're
deleted.  Note that the btm_last_cleanup_num_delpages metapage field
added by that commit usually gets set to pages_newly_deleted.  The
exceptions (the scenarios in which they're not equal) all seem to be
tricky cases for the implementation (of page deletion and recycling) in
general.

Author: Peter Geoghegan <pg@bowt.ie>
Discussion: https://postgr.es/m/CAH2-WznpdHvujGUwYZ8sihX%3Dd5u-tRYhi-F4wnV2uN2zHpMUXw%40mail.gmail.com
2021-02-25 14:32:18 -08:00
Tom Lane 301ed8812e Doc: remove src/backend/regex/re_syntax.n.
We aren't publishing this file as documentation, and it's been
much more haphazardly maintained than the real docs in func.sgml,
so let's just drop it.  I think the only reason I included it in
commit 7bcc6d98f was that the Berkeley-era sources had had a man
page in this directory.

Discussion: https://postgr.es/m/4099447.1614186542@sss.pgh.pa.us
2021-02-25 13:33:27 -05:00
Tom Lane 7dc13a0f08 Change regex \D and \W shorthands to always match newlines.
Newline is certainly not a digit, nor a word character, so it is
sensible that it should match these complemented character classes.
Previously, \D and \W acted that way by default, but in
newline-sensitive mode ('n' or 'p' flag) they did not match newlines.

This behavior was previously forced because explicit complemented
character classes don't match newlines in newline-sensitive mode;
but as of the previous commit that implementation constraint no
longer exists.  It seems useful to change this because the primary
real-world use for newline-sensitive mode seems to be to match the
default behavior of other regex engines such as Perl and Javascript
... and their default behavior is that these match newlines.

The old behavior can be kept by writing an explicit complemented
character class, i.e. [^[:digit:]] or [^[:word:]].  (This means
that \D and \W are not exactly equivalent to those strings, but
they weren't anyway.)

Discussion: https://postgr.es/m/3220564.1613859619@sss.pgh.pa.us
2021-02-25 13:29:06 -05:00
Tom Lane 2a0af7fe46 Allow complemented character class escapes within regex brackets.
The complement-class escapes \D, \S, \W are now allowed within
bracket expressions.  There is no semantic difficulty with doing
that, but the rather hokey macro-expansion-based implementation
previously used here couldn't cope.

Also, invent "word" as an allowed character class name, thus "\w"
is now equivalent to "[[:word:]]" outside brackets, or "[:word:]"
within brackets.  POSIX allows such implementation-specific
extensions, and the same name is used in e.g. bash.

One surprising compatibility issue this raises is that constructs
such as "[\w-_]" are now disallowed, as our documentation has always
said they should be: character classes can't be endpoints of a range.
Previously, because \w was just a macro for "[:alnum:]_", such a
construct was read as "[[:alnum:]_-_]", so it was accepted so long as
the character after "-" was numerically greater than or equal to "_".

Some implementation cleanup along the way:

* Remove the lexnest() hack, and in consequence clean up wordchrs()
to not interact with the lexer.

* Fix colorcomplement() to not be O(N^2) in the number of colors
involved.

* Get rid of useless-as-far-as-I-can-see calls of element()
on single-character character element names in brackpart().
element() always maps these to the character itself, and things
would be quite broken if it didn't --- should "[a]" match something
different than "a" does?  Besides, the shortcut path in brackpart()
wasn't doing this anyway, making it even more inconsistent.

Discussion: https://postgr.es/m/2845172.1613674385@sss.pgh.pa.us
Discussion: https://postgr.es/m/3220564.1613859619@sss.pgh.pa.us
2021-02-25 13:00:40 -05:00
Peter Geoghegan e5d8a99903 Use full 64-bit XIDs in deleted nbtree pages.
Otherwise we risk "leaking" deleted pages by making them non-recyclable
indefinitely.  Commit 6655a729 did the same thing for deleted pages in
GiST indexes.  That work was used as a starting point here.

Stop storing an XID indicating the oldest bpto.xact across all deleted
though unrecycled pages in nbtree metapages.  There is no longer any
reason to care about that condition/the oldest XID.  It only ever made
sense when wraparound was something _bt_vacuum_needs_cleanup() had to
consider.

The btm_oldest_btpo_xact metapage field has been repurposed and renamed.
It is now btm_last_cleanup_num_delpages, which is used to remember how
many non-recycled deleted pages remain from the last VACUUM (in practice
its value is usually the precise number of pages that were _newly
deleted_ during the specific VACUUM operation that last set the field).

The general idea behind storing btm_last_cleanup_num_delpages is to use
it to give _some_ consideration to non-recycled deleted pages inside
_bt_vacuum_needs_cleanup() -- though never too much.  We only really
need to avoid leaving a truly excessive number of deleted pages in an
unrecycled state forever.  We only do this to cover certain narrow cases
where no other factor makes VACUUM do a full scan, and yet the index
continues to grow (and so actually misses out on recycling existing
deleted pages).

These metapage changes result in a clear user-visible benefit: We no
longer trigger full index scans during VACUUM operations solely due to
the presence of only 1 or 2 known deleted (though unrecycled) blocks
from a very large index.  All that matters now is keeping the costs and
benefits in balance over time.

Fix an issue that has been around since commit 857f9c36, which added the
"skip full scan of index" mechanism (i.e. the _bt_vacuum_needs_cleanup()
logic).  The accuracy of btm_last_cleanup_num_heap_tuples accidentally
hinged upon _when_ the source value gets stored.  We now always store
btm_last_cleanup_num_heap_tuples in btvacuumcleanup().  This fixes the
issue because IndexVacuumInfo.num_heap_tuples (the source field) is
expected to accurately indicate the state of the table _after_ the
VACUUM completes inside btvacuumcleanup().

A backpatchable fix cannot easily be extracted from this commit.  A
targeted fix for the issue will follow in a later commit, though that
won't happen today.

I (pgeoghegan) have chosen to remove any mention of deleted pages in the
documentation of the vacuum_cleanup_index_scale_factor GUC/param, since
the presence of deleted (though unrecycled) pages is no longer of much
concern to users.  The vacuum_cleanup_index_scale_factor description in
the docs now seems rather unclear in any case, and it should probably be
rewritten in the near future.  Perhaps some passing mention of page
deletion will be added back at the same time.

Bump XLOG_PAGE_MAGIC due to nbtree WAL records using full XIDs now.

Author: Peter Geoghegan <pg@bowt.ie>
Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com>
Discussion: https://postgr.es/m/CAH2-WznpdHvujGUwYZ8sihX=d5u-tRYhi-F4wnV2uN2zHpMUXw@mail.gmail.com
2021-02-24 18:41:34 -08:00
Amit Kapila 8a4f9522d0 Fix relcache reference leak introduced by ce0fdbfe97.
Author: Sawada Masahiko
Reviewed-by: Amit Kapila
Discussion: https://postgr.es/m/CAD21AoA7ZEfsOXQ9HQqMv3QYGsEm2H5Wk5ic5S=mvzDf-3a3SA@mail.gmail.com
2021-02-25 07:48:24 +05:30
Michael Paquier bcf2667bf6 Fix some typos, grammar and style in docs and comments
The portions fixing the documentation are backpatched where needed.

Author: Justin Pryzby
Discussion: https://postgr.es/m/20210210235557.GQ20012@telsasoft.com
backpatch-through: 9.6
2021-02-24 16:13:17 +09:00
Peter Eisentraut 8ec8fe0f31 Message style fix
Don't quote type name placeholders.
2021-02-24 07:00:49 +01:00
Alvaro Herrera 5a65eacfdc
Fix confusion in comments about generate_gather_paths
d2d8a229bc introduced a new function generate_useful_gather_paths to
be used as a replacement for generate_gather_paths, but forgot to update
a couple of places that referenced the older function.

This is possibly not 100% complete (ref. create_ordered_paths), but it's
better than not changing anything.

Author: "Hou, Zhijie" <houzj.fnst@cn.fujitsu.com>
Reviewed-by: Tomas Vondra <tomas.vondra@enterprisedb.com>
Discussion: https://postgr.es/m/4ce1d5116fe746a699a6d29858c6a39a@G08CNEXMBPEKD05.g08.fujitsu.local
2021-02-23 20:05:15 -03:00
Alvaro Herrera 8deb6b38dc
Reinstate HEAP_XMAX_LOCK_ONLY|HEAP_KEYS_UPDATED as allowed
Commit 866e24d47d added an assert that HEAP_XMAX_LOCK_ONLY and
HEAP_KEYS_UPDATED cannot appear together, on the faulty assumption that
the latter necessarily referred to an update and not a tuple lock; but
that's wrong, because SELECT FOR UPDATE can use precisely that
combination, as evidenced by the amcheck test case added here.

Remove the Assert(), and also patch amcheck's verify_heapam.c to not
complain if the combination is found.  Also, out of overabundance of
caution, update (across all branches) README.tuplock to be more explicit
about this.

Author: Julien Rouhaud <rjuju123@gmail.com>
Reviewed-by: Mahendra Singh Thalor <mahi6run@gmail.com>
Reviewed-by: Dilip Kumar <dilipbalaut@gmail.com>
Discussion: https://postgr.es/m/20210124061758.GA11756@nol
2021-02-23 17:30:21 -03:00
Tom Lane 3db05e76f9 Suppress compiler warning in new regex match-all detection code.
gcc 10 is smart enough to notice that control could reach this
"hasmatch[depth]" assignment with depth < 0, but not smart enough
to know that that would require a badly broken NFA graph.  Change
the assert() to a plain runtime test to shut it up.

Per report from Andres Freund.

Discussion: https://postgr.es/m/20210223173437.b3ywijygsy6q42gq@alap3.anarazel.de
2021-02-23 13:55:34 -05:00
Alvaro Herrera d9d076222f
VACUUM: ignore indexing operations with CONCURRENTLY
As envisioned in commit c98763bf51, it is possible for VACUUM to
ignore certain transactions that are executing CREATE INDEX CONCURRENTLY
and REINDEX CONCURRENTLY for the purposes of computing Xmin; that's
because we know those transactions are not going to examine any other
tables, and are not going to execute anything else in the same
transaction.  (Only operations on "safe" indexes can be ignored: those
on indexes that are neither partial nor expressional).

This is extremely useful in cases where CIC/RC can run for a very long
time, because that used to be a significant headache for concurrent
vacuuming of other tables.

Reviewed-by: Matthias van de Meent <boekewurm+postgres@gmail.com>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Discussion: https://postgr.es/m/20210115133858.GA18931@alvherre.pgsql
2021-02-23 12:15:09 -03:00
Peter Eisentraut 6f6f284c7e Simplify printing of LSNs
Add a macro LSN_FORMAT_ARGS for use in printf-style printing of LSNs.
Convert all applicable code to use it.

Reviewed-by: Ashutosh Bapat <ashutosh.bapat@enterprisedb.com>
Reviewed-by: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://www.postgresql.org/message-id/flat/CAExHW5ub5NaTELZ3hJUCE6amuvqAtsSxc7O+uK7y4t9Rrk23cw@mail.gmail.com
2021-02-23 10:27:02 +01:00
Amit Kapila ade89ba5f4 Fix an oversight in ReorderBufferFinishPrepared.
We don't have anything to decode in a transaction if ReorderBufferTXN
doesn't exist by the time we decode the commit prepared. So don't create a
new ReorderBufferTXN here. This is an oversight in commit a271a1b5.

Reported-by: Markus Wanner
Discussion: https://postgr.es/m/dbec82e2-dbd7-95a2-c6b6-e488cbbdf853@bluegap.ch
2021-02-23 09:47:41 +05:30
Amit Kapila bc617a7b1c Change the error message for logical replication authentication failure.
The authentication failure error message wasn't distinguishing whether
it is a physical replication or logical replication connection failure and
was giving incomplete information on what led to failure in case of logical
replication connection.

Author: Paul Martinez and Amit Kapila
Reviewed-by: Euler Taveira and Amit Kapila
Discussion: https://postgr.es/m/CACqFVBYahrAi2OPdJfUA3YCvn3QMzzxZdw0ibSJ8wouWeDtiyQ@mail.gmail.com
2021-02-23 09:11:22 +05:30
Alvaro Herrera 0f5505a881
Remove pointless HeapTupleHeaderIndicatesMovedPartitions calls
Pavan Deolasee recently noted that a few of the
HeapTupleHeaderIndicatesMovedPartitions calls added by commit
5db6df0c01 are useless, since they are done after comparing t_self
with t_ctid.  But because t_self can never be set to the magical values
that indicate that the tuple moved partition, this can never succeed: if
the first test fails (so we know t_self equals t_ctid), necessarily the
second test will also fail.

So these checks can be removed and no harm is done.  There's no bug
here, just a code legibility issue.

Reported-by: Pavan Deolasee <pavan.deolasee@gmail.com>
Discussion: https://postgr.es/m/20200929164411.GA15497@alvherre.pgsql
2021-02-22 16:51:44 -03:00
Alvaro Herrera 6a03369a71
Fix typo 2021-02-22 11:34:05 -03:00
Thomas Munro beb4480c85 Refactor get_collation_current_version().
The code paths for three different OSes finished up with three different
ways of excluding C[.xxx] and POSIX from consideration.  Merge them.

Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/20210117215940.GE8560%40telsasoft.com
2021-02-22 23:32:16 +13:00
Thomas Munro 9cf184cc05 pg_collation_actual_version() -> pg_collation_current_version().
The new name seems a bit more natural.

Discussion: https://postgr.es/m/20210117215940.GE8560%40telsasoft.com
2021-02-22 23:32:16 +13:00
Thomas Munro 0fb0a0503b Hide internal error for pg_collation_actual_version(<bad OID>).
Instead of an unsightly internal "cache lookup failed" message, just
return NULL for bad OIDs, as is the convention for other similar things.

Reported-by: Justin Pryzby <pryzby@telsasoft.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/20210117215940.GE8560%40telsasoft.com
2021-02-22 23:01:20 +13:00
Fujii Masao f05ed5a5cf Initialize atomic variable waitStart in PGPROC, at postmaster startup.
Commit 46d6e5f567 added the atomic variable "waitStart" into PGPROC struct,
to store the time at which wait for lock acquisition started. Previously
this variable was initialized every time each backend started. Instead,
this commit makes postmaster initialize it at the startup, to ensure that
the variable should be initialized before any use of it.

This commit also moves the code to initialize "waitStart" variable for
prepare transaction, from TwoPhaseGetDummyProc() to MarkAsPreparingGuts().
Because MarkAsPreparingGuts() is more proper place to do that since
it initializes other PGPROC variables.

Author: Fujii Masao
Reviewed-by: Atsushi Torikoshi
Discussion: https://postgr.es/m/1df88660-6f08-cc6e-b7e2-f85296a2bdab@oss.nttdata.com
2021-02-22 18:25:00 +09:00
Peter Eisentraut efbfb64241 Improve new hash partition bound check error messages
For the error message "every hash partition modulus must be a factor
of the next larger modulus", add a detail message that shows the
particular numbers and existing partition involved.  Also comment the
code more.

Reviewed-by: Amit Langote <amitlangote09@gmail.com>
Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi>
Discussion: https://www.postgresql.org/message-id/flat/bb9d60b4-aadb-607a-1a9d-fdc3434dddcd%40enterprisedb.com
2021-02-22 08:06:45 +01:00
Michael Paquier 9294264278 Use pgstat_progress_update_multi_param() where possible
This commit changes one code path in REINDEX INDEX and one code path
in CREATE INDEX CONCURRENTLY to report the progress of each operation
using pgstat_progress_update_multi_param() rather than
multiple calls to pgstat_progress_update_param().  This has the
advantage to make the progress report more consistent to the end-user
without impacting the amount of information provided.

Author: Bharath Rupireddy
Discussion: https://postgr.es/m/CALj2ACV5zW7GxD8D_tyO==bcj6ZktQchEKWKPBOAGKiLhAQo=w@mail.gmail.com
2021-02-22 14:21:40 +09:00
Thomas Munro db8374d804 Remove outdated reference to RAID spindles.
Commit b09ff536 left behind some outdated advice in the long_desc field
of the GUC "effective_io_concurrency".  Remove it.

Back-patch to 13.

Reported-by: Andrew Gierth <andrew@tao11.riddles.org.uk>
Reviewed-by: Julien Rouhaud <rjuju123@gmail.com>
Discussion: https://postgr.es/m/CA%2BhUKGJyyWqFBxL9gEj-qtjBThGjhAOBE8GBnF8MUJOJ3vrfag%40mail.gmail.com
2021-02-22 14:42:15 +13:00
Tom Lane 190c79884a Simplify memory management for regex DFAs a little.
Coverity complained that functions in regexec.c might leak DFA
storage.  It's wrong, but this logic is confusing enough that it's
not so surprising Coverity couldn't make sense of it.  Rewrite
in hopes of making it more legible to humans as well as machines.
2021-02-21 20:29:11 -05:00
Tom Lane ea1268f630 Avoid generating extra subre tree nodes for capturing parentheses.
Previously, each pair of capturing parentheses gave rise to a separate
subre tree node, whose only function was to identify that we ought to
capture the match details for this particular sub-expression.  In
most cases we don't really need that, since we can perfectly well
put a "capture this" annotation on the child node that does the real
matching work.  As with the two preceding commits, the main value
of this is to avoid generating and optimizing an NFA for a tree node
that's not really pulling its weight.

The chosen data representation only allows one capture annotation
per subre node.  In the legal-per-spec, but seemingly not very useful,
case where there are multiple capturing parens around the exact same
bit of the regex (i.e. "((xyz))"), wrap the child node in N-1 capture
nodes that act the same as before.  We could work harder at that but
I'll refrain, pending some evidence that such cases are worth troubling
over.

In passing, improve the comments in regex.h to say what all the
different re_info bits mean.  Some of them were pretty obvious
but others not so much, so reverse-engineer some documentation.

This is part of a patch series that in total reduces the regex engine's
runtime by about a factor of four on a large corpus of real-world regexes.

Patch by me, reviewed by Joel Jacobson

Discussion: https://postgr.es/m/1340281.1613018383@sss.pgh.pa.us
2021-02-20 19:26:41 -05:00
Tom Lane 5810430894 Convert regex engine's subre tree from binary to N-ary style.
Instead of having left and right child links in subre structs,
have a single child link plus a sibling link.  Multiple children
of a tree node are now reached by chasing the sibling chain.

The beneficiary of this is alternation tree nodes.  A regular
expression with N (>1) branches is now represented by one alternation
node with N children, rather than a tree that includes N alternation
nodes as well as N children.  While the old representation didn't
really cost anything extra at execution time, it was pretty horrid
for compilation purposes, because each of the alternation nodes had
its own NFA, which we were too stupid not to separately optimize.
(To make matters worse, all of those NFAs described the entire
alternation pattern, not just the portion of it that one might
expect from the tree structure.)

We continue to require concatenation nodes to have exactly two
children.  This data structure is now prepared to support more,
but the executor's logic would need some careful redesign, and
it's not clear that a lot of benefit could be had.

This is part of a patch series that in total reduces the regex engine's
runtime by about a factor of four on a large corpus of real-world regexes.

Patch by me, reviewed by Joel Jacobson

Discussion: https://postgr.es/m/1340281.1613018383@sss.pgh.pa.us
2021-02-20 19:07:45 -05:00
Tom Lane cebc1d34e5 Fix regex engine to suppress useless concatenation sub-REs.
The comment for parsebranch() claims that it avoids generating
unnecessary concatenation nodes in the "subre" tree, but it missed
some significant cases.  Once we've decided that a given atom is
"messy" and can't be bundled with the preceding atom(s) of the
current regex branch, parseqatom() always generated two new concat
nodes, one to concat the messy atom to what follows it in the branch,
and an upper node to concatenate the preceding part of the branch
to that one.  But one or both of these could be unnecessary, if the
messy atom is the first, last, or only one in the branch.  Improve
the code to suppress such useless concat nodes, along with the
no-op child nodes representing empty chunks of a branch.

Reducing the number of subre tree nodes offers significant savings
not only at execution but during compilation, because each subre node
has its own NFA that has to be separately optimized.  (Maybe someday
we'll figure out how to share the optimization work across multiple
tree nodes, but it doesn't look easy.)  Eliminating upper tree nodes
is especially useful because they tend to have larger NFAs.

This is part of a patch series that in total reduces the regex engine's
runtime by about a factor of four on a large corpus of real-world regexes.

Patch by me, reviewed by Joel Jacobson

Discussion: https://postgr.es/m/1340281.1613018383@sss.pgh.pa.us
2021-02-20 18:45:29 -05:00
Tom Lane 824bf71902 Recognize "match-all" NFAs within the regex engine.
This builds on the previous "rainbow" patch to detect NFAs that will
match any string, though possibly with constraints on the string length.
This definition is chosen to match constructs such as ".*", ".+", and
".{1,100}".  Recognizing such an NFA after the optimization pass is
fairly cheap, since we basically just have to verify that all arcs
are RAINBOW arcs and count the number of steps to the end state.
(Well, there's a bit of complication with pseudo-color arcs for string
boundary conditions, but not much.)

Once we have these markings, the regex executor functions longest(),
shortest(), and matchuntil() don't have to expend per-character work
to determine whether a given substring satisfies such an NFA; they
just need to check its length against the bounds.  Since some matching
problems require O(N) invocations of these functions, we've reduced
the runtime for an N-character string from O(N^2) to O(N).  Of course,
this is no help for non-matchall sub-patterns, but those usually have
constraints that allow us to avoid needing O(N) substring checks in the
first place.  It's precisely the unconstrained "match-all" cases that
cause the most headaches.

This is part of a patch series that in total reduces the regex engine's
runtime by about a factor of four on a large corpus of real-world regexes.

Patch by me, reviewed by Joel Jacobson

Discussion: https://postgr.es/m/1340281.1613018383@sss.pgh.pa.us
2021-02-20 18:31:19 -05:00
Tom Lane 08c0d6ad65 Invent "rainbow" arcs within the regex engine.
Some regular expression constructs, most notably the "." match-anything
metacharacter, produce a sheaf of parallel NFA arcs covering all
possible colors (that is, character equivalence classes).  We can make
a noticeable improvement in the space and time needed to process large
regexes by replacing such cases with a single arc bearing the special
color code "RAINBOW".  This requires only minor additional complication
in places such as pull() and push().

Callers of pg_reg_getoutarcs() must now be prepared for the possibility
of seeing a RAINBOW arc.  For the one known user, contrib/pg_trgm,
that's a net benefit since it cuts the number of arcs to be dealt with,
and the handling isn't any different than for other colors that contain
too many characters to be dealt with individually.

This is part of a patch series that in total reduces the regex engine's
runtime by about a factor of four on a large corpus of real-world regexes.

Patch by me, reviewed by Joel Jacobson

Discussion: https://postgr.es/m/1340281.1613018383@sss.pgh.pa.us
2021-02-20 18:11:56 -05:00
Fujii Masao 8a55cb5ba9 Fix bug in COMMIT AND CHAIN command.
This commit fixes COMMIT AND CHAIN command so that it starts new transaction
immediately even if savepoints are defined within the transaction to commit.
Previously COMMIT AND CHAIN command did not in that case because
commit 280a408b48 forgot to make CommitTransactionCommand() handle
a transaction chaining when the transaction state was TBLOCK_SUBCOMMIT.

Also this commit adds the regression test for COMMIT AND CHAIN command
when savepoints are defined.

Back-patch to v12 where transaction chaining was added.

Reported-by: Arthur Nascimento
Author: Fujii Masao
Reviewed-by: Arthur Nascimento, Vik Fearing
Discussion: https://postgr.es/m/16867-3475744069228158@postgresql.org
2021-02-19 21:57:52 +09:00
Peter Eisentraut 678d0e239b Update snowball
Update to snowball tag v2.1.0.  Major changes are new stemmers for
Armenian, Serbian, and Yiddish.
2021-02-19 08:10:15 +01:00
Peter Geoghegan b071a31149 Add nbtree README section on page recycling.
Consolidate discussion of how VACUUM places pages in the FSM for
recycling by adding a new section that comes after discussion of page
deletion.  This structure reflects the fact that page recycling is
explicitly decoupled from page deletion in Lanin & Shasha's paper.  Page
recycling in nbtree is an implementation of what the paper calls "the
drain technique".

This decoupling is an important concept for nbtree VACUUM.  Searchers
have to detect and recover from concurrent page deletions, but they will
never have to reason about concurrent page recycling.  Recycling can
almost always be thought of as a low level garbage collection operation
that asynchronously frees the physical space that backs a logical tree
node.  Almost all code need only concern itself with logical tree nodes.
(Note that "logical tree node" is not currently a term of art in the
nbtree code -- this all works implicitly.)

This is preparation for an upcoming patch that teaches nbtree VACUUM to
remember the details of pages that it deletes on the fly, in local
memory.  This enables the same VACUUM operation to consider placing its
own deleted pages in the FSM later on, when it reaches the end of
btvacuumscan().
2021-02-18 21:16:33 -08:00
Tom Lane b5a66e7353 Fix another ancient bug in parsing of BRE-mode regular expressions.
While poking at the regex code, I happened to notice that the bug
squashed in commit afcc8772e had a sibling: next() failed to return
a specific value associated with the '}' token for a "\{m,n\}"
quantifier when parsing in basic RE mode.  Again, this could result
in treating the quantifier as non-greedy, which it never should be in
basic mode.  For that to happen, the last character before "\}" that
sets "nextvalue" would have to set it to zero, or it'd have to have
accidentally been zero from the start.  The failure can be provoked
repeatably with, for example, a bound ending in digit "0".

Like the previous patch, back-patch all the way.
2021-02-18 22:38:55 -05:00
Fujii Masao 614b7f18b3 Fix "invalid spinlock number: 0" error in pg_stat_wal_receiver.
Commit 2c8dd05d6c added the atomic variable writtenUpto into
walreceiver's shared memory information. It's initialized only
when walreceiver started up but could be read via pg_stat_wal_receiver
view anytime, i.e., even before it's initialized. In the server built
with --disable-atomics and --disable-spinlocks, this uninitialized
atomic variable read could cause "invalid spinlock number: 0" error.

This commit changed writtenUpto so that it's initialized at
the postmaster startup, to avoid the uninitialized variable read
via pg_stat_wal_receiver and fix the error.

Also this commit moved the read of writtenUpto after the release
of spinlock protecting walreceiver's shared variables. This is
necessary to prevent new spinlock from being taken by atomic
variable read while holding another spinlock, and to shorten
the spinlock duration. This change leads writtenUpto not to be
consistent with the other walreceiver's shared variables protected
by a spinlock. But this is OK because writtenUpto should not be
used for data integrity checks.

Back-patch to v13 where commit 2c8dd05d6c introduced the bug.

Author: Fujii Masao
Reviewed-by: Michael Paquier, Thomas Munro, Andres Freund
Discussion: https://postgr.es/m/7ef8708c-5b6b-edd3-2cf2-7783f1c7c175@oss.nttdata.com
2021-02-18 23:28:15 +09:00
Peter Eisentraut f5465fade9 Allow specifying CRL directory
Add another method to specify CRLs, hashed directory method, for both
server and client side.  This offers a means for server or libpq to
load only CRLs that are required to verify a certificate.  The CRL
directory is specifed by separate GUC variables or connection options
ssl_crl_dir and sslcrldir, alongside the existing ssl_crl_file and
sslcrl, so both methods can be used at the same time.

Author: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/20200731.173911.904649928639357911.horikyota.ntt@gmail.com
2021-02-18 07:59:10 +01:00
Peter Geoghegan 128dd901a5 nbtree README: move VACUUM linear scan section.
Discuss VACUUM's linear scan after discussion of tuple deletion by
VACUUM, but before discussion of page deletion by VACUUM.  This
progression is a lot more natural.

Also tweak the wording a little.  It seems unnecessary to talk about how
it worked prior to PostgreSQL 8.2.
2021-02-17 21:13:15 -08:00
Tomas Vondra 927f453a94 Fix tuple routing to initialize batching only for inserts
A cross-partition update on a partitioned table is implemented as a
delete followed by an insert. With foreign partitions, this was however
causing issues, because the FDW and core may disagree on when to enable
batching.  postgres_fdw was only allowing batching for plain inserts
(CMD_INSERT) while core was trying to batch the insert component of the
cross-partition update.  Fix by restricting core to apply batching only
to plain CMD_INSERT queries.

It's possible to allow batching for cross-partition updates, but that
will require more extensive changes, so better to leave that for a
separate patch.

Author: Amit Langote
Reviewed-by: Tomas Vondra, Takayuki Tsunakawa
Discussion: https://postgr.es/m/20200628151002.7x5laxwpgvkyiu3q@development
2021-02-18 00:03:45 +01:00
Tom Lane 4e703d6719 Make some minor improvements in the regex code.
Push some hopefully-uncontroversial bits extracted from an upcoming
patch series, to remove non-relevant clutter from the main patches.

In compact(), return immediately after setting REG_ASSERT error;
continuing the loop would just lead to assertion failure below.
(Ask me how I know.)

In parseqatom(), remove assertion that moresubs() did its job.
When moresubs actually did its job, this is redundant with that
function's final assert; but when it failed on OOM, this is an
assertion crash.  We could avoid the crash by adding a NOERR()
check before the assertion, but it seems better to subtract code
than add it.  (Note that there's a NOERR exit a few lines further
down, and nothing else between here and there requires moresubs
to have succeeded.  So we don't really need an extra error exit.)
This is a live bug in assert-enabled builds, but given the very
low likelihood of OOM in moresub's tiny allocation, I don't think
it's worth back-patching.

On the other hand, it seems worthwhile to add an assertion that
our intended v->subs[subno] target is still null by the time we
are ready to insert into it, since there's a recursion in between.

In pg_regexec, ensure we fflush any debug output on the way out,
and try to make MDEBUG messages more uniform and helpful.  (In
particular, ensure that all of them are prefixed with the subre's
id number, so one can match up entry and exit reports.)

Add some test cases in test_regex to improve coverage of lookahead
and lookbehind constraints.  Adding these now is mainly to establish
that this is indeed the existing behavior.

Discussion: https://postgr.es/m/1340281.1613018383@sss.pgh.pa.us
2021-02-17 12:24:23 -05:00
Peter Eisentraut f40c6969d0 Routine usage information schema tables
Several information schema views track dependencies between
functions/procedures and objects used by them.  These had not been
implemented so far because PostgreSQL doesn't track objects used in a
function body.  However, formally, these also show dependencies used
in parameter default expressions, which PostgreSQL does support and
track.  So for the sake of completeness, we might as well add these.
If dependency tracking for function bodies is ever implemented, these
views will automatically work correctly.

Reviewed-by: Erik Rijkers <er@xs4all.nl>
Discussion: https://www.postgresql.org/message-id/flat/ac80fc74-e387-8950-9a31-2560778fc1e3%40enterprisedb.com
2021-02-17 18:16:06 +01:00
Peter Eisentraut 0e392fcc0d Use errmsg_internal for debug messages
An inconsistent set of debug-level messages was not using
errmsg_internal(), thus uselessly exposing the messages to translation
work.  Fix those.
2021-02-17 11:33:25 +01:00
Tom Lane 38bb3aef35 Convert tsginidx.c's GIN indexing logic to fully ternary operation.
Commit 2f2007fbb did this partially, but there were two remaining
warts.  checkcondition_gin handled some uncertain cases by setting
the out-of-band recheck flag, some by returning TS_MAYBE, and some
by doing both.  Meanwhile, TS_execute arbitrarily converted a
TS_MAYBE result to TS_YES.  Thus, if checkcondition_gin chose to
only return TS_MAYBE, the outcome would be TS_YES with no recheck
flag, potentially resulting in wrong query outputs.

The case where this'd happen is if there were GIN_MAYBE entries
in the indexscan results passed to gin_tsquery_[tri]consistent,
which so far as I can see would only happen if the tidbitmap used
to accumulate indexscan results grew large enough to become lossy.

I initially thought of fixing this by ensuring we always set the
recheck flag as well as returning TS_MAYBE in uncertain cases.
But that errs in the other direction, potentially forcing rechecks
of rows that provably match the query (since the recheck flag
remains set even if TS_execute later finds that the answer must be
TS_YES).  Instead, let's get rid of the out-of-band recheck flag
altogether and rely on returning TS_MAYBE.  This requires exporting
a version of TS_execute that will actually return the full ternary
result of the evaluation ... but we likely should have done that
to start with.

Unfortunately it doesn't seem practical to add a regression test case
that covers this: the amount of data needed to cause the GIN bitmap to
become lossy results in a longer runtime than I think we want to have
in the tests.  (I'm wondering about allowing smaller work_mem settings
to ameliorate that, but it'd be a matter for a separate patch.)

Per bug #16865 from Dimitri Nüscheler.  Back-patch to v13 where
the faulty commit came in.

Discussion: https://postgr.es/m/16865-4ffdc3e682e6d75b@postgresql.org
2021-02-16 12:07:14 -05:00
Amit Kapila f672df5fdd Remove the unnecessary PrepareWrite in pgoutput.
This issue exists from the inception of this code (PG-10) but got exposed
by the recent commit ce0fdbfe97 where we are using origins in tablesync
workers. The problem was that we were sometimes sending the prepare_write
('w') message but then the actual message was not being sent and on the
subscriber side, we always expect a message after prepare_write message
which led to this bug.

I refrained from backpatching this because there is no way in the core
code to hit this prior to commit ce0fdbfe97 and we haven't received any
complaints so far.

Reported-by: Erik Rijkers
Author: Amit Kapila and Vignesh C
Tested-by: Erik Rijkers
Discussion: https://postgr.es/m/1295168140.139428.1613133237154@webmailclassic.xs4all.nl
2021-02-16 07:26:50 +05:30
Andres Freund 8001cb77ee Fix heap_page_prune() parameter order confusion introduced in dc7420c2c9.
Both luckily and unluckily the passed values meant the same for all
types. Luckily because that meant my confusion caused no harm,
unluckily because otherwise the compiler might have warned...

In passing, synchronize parameter names between definition and
declaration.

Reported-By: Peter Geoghegan <pg@bowt.ie>
Author: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/CAH2-Wz=L=nBoepQdH9b5Qd0nMvepFT2CnT6sjWvvpOXa=K8HVQ@mail.gmail.com
2021-02-15 17:12:12 -08:00
Andres Freund a975ff4980 Remove backwards compat ugliness in snapbuild.c.
In 955a684e04 we fixed a bug in initial snapshot creation. In the
course of which several members of struct SnapBuild were obsoleted. As
SnapBuild is serialized to disk we couldn't change the memory layout.

Unfortunately I subsequently forgot about removing the backward compat
gunk, but luckily Heikki just reminded me.

This commit bumps SNAPBUILD_VERSION, therefore breaking existing
slots (which is fine in a major release).

Author: Andres Freund
Reminded-By: Heikki Linnakangas <hlinnaka@iki.fi>
Discussion: https://postgr.es/m/c94be044-818f-15e3-1ad3-7a7ae2dfed0a@iki.fi
2021-02-15 16:57:47 -08:00
Tom Lane 0e52903128 Simplify loop logic in nodeIncrementalSort.c.
The inner loop in switchToPresortedPrefixMode() can be implemented
as a conventional integer-counter for() loop, removing a couple of
redundant boolean state variables.  The old logic here was a remnant
of earlier development, but as things now stand there's no reason
for extra complexity.

Also, annotate the test case added by 82e0e2930 to explain why it
manages to hit the corner case fixed in that commit, and add an
EXPLAIN to verify that it's creating an incremental-sort plan.

Back-patch to v13, like the previous patch.

James Coleman and Tom Lane

Discussion: https://postgr.es/m/16846-ae49f51ac379a4cb@postgresql.org
2021-02-15 10:17:58 -05:00
Heikki Linnakangas 54e51dcde0 Make ExecGetInsertedCols() and friends more robust and improve comments.
If ExecGetInsertedCols(), ExecGetUpdatedCols() or ExecGetExtraUpdatedCols()
were called with a ResultRelInfo that's not in the range table and isn't a
partition routing target, the functions would dereference a NULL pointer,
relinfo->ri_RootResultRelInfo. Such ResultRelInfos are created when firing
RI triggers in tables that are not modified directly. None of the current
callers of these functions pass such relations, so this isn't a live bug,
but let's make them more robust.

Also update comment in ResultRelInfo; after commit 6214e2b228,
ri_RangeTableIndex is zero for ResultRelInfos created for partition tuple
routing.

Noted by Coverity. Backpatch down to v11, like commit 6214e2b228.

Reviewed-by: Tom Lane, Amit Langote
2021-02-15 09:28:08 +02:00
Fujii Masao 46d6e5f567 Display the time when the process started waiting for the lock, in pg_locks, take 2
This commit adds new column "waitstart" into pg_locks view. This column
reports the time when the server process started waiting for the lock
if the lock is not held. This information is useful, for example, when
examining the amount of time to wait on a lock by subtracting
"waitstart" in pg_locks from the current time, and identify the lock
that the processes are waiting for very long.

This feature uses the current time obtained for the deadlock timeout
timer as "waitstart" (i.e., the time when this process started waiting
for the lock). Since getting the current time newly can cause overhead,
we reuse the already-obtained time to avoid that overhead.

Note that "waitstart" is updated without holding the lock table's
partition lock, to avoid the overhead by additional lock acquisition.
This can cause "waitstart" in pg_locks to become NULL for a very short
period of time after the wait started even though "granted" is false.
This is OK in practice because we can assume that users are likely to
look at "waitstart" when waiting for the lock for a long time.

The first attempt of this patch (commit 3b733fcd04) caused the buildfarm
member "rorqual" (built with --disable-atomics --disable-spinlocks) to report
the failure of the regression test. It was reverted by commit 890d2182a2.
The cause of this failure was that the atomic variable for "waitstart"
in the dummy process entry created at the end of prepare transaction was
not initialized. This second attempt fixes that issue.

Bump catalog version.

Author: Atsushi Torikoshi
Reviewed-by: Ian Lawrence Barwick, Robert Haas, Justin Pryzby, Fujii Masao
Discussion: https://postgr.es/m/a96013dc51cdc56b2a2b84fa8a16a993@oss.nttdata.com
2021-02-15 15:13:37 +09:00
Peter Geoghegan 7cde6b13a9 Adjust lazy_scan_heap() accounting comments.
Explain which particular LP_DEAD line pointers get accounted for by the
tups_vacuumed variable.
2021-02-14 19:28:37 -08:00
Thomas Munro f900a79ecd Default to wal_sync_method=fdatasync on FreeBSD.
FreeBSD 13 gained O_DSYNC, which would normally cause wal_sync_method to
choose open_datasync as its default value.  That may not be a good
choice for all systems, and performs worse than fdatasync in some
scenarios.  Let's preserve the existing default behavior for now.

Like commit 576477e73c, which did the same for Linux, back-patch to all
supported releases.

Discussion: https://postgr.es/m/CA%2BhUKGLsAMXBQrCxCXoW-JsUYmdOL8ALYvaX%3DCrHqWxm-nWbGA%40mail.gmail.com
2021-02-15 16:04:59 +13:00
Amit Kapila d9b0767bec Fix the warnings introduced in commit ce0fdbfe97.
Author: Amit Kapila
Reviewed-by: Tom Lane
Discussion: https://postgr.es/m/1610789.1613170207@sss.pgh.pa.us
2021-02-15 07:28:02 +05:30
Thomas Munro 637668fb1d Hold interrupts while running dsm_detach() callbacks.
While cleaning up after a parallel query or parallel index creation that
created temporary files, we could be interrupted by a statement timeout.
The error handling path would then fail to clean up the files when it
ran dsm_detach() again, because the callback was already popped off the
list.  Prevent this hazard by holding interrupts while the cleanup code
runs.

Thanks to Heikki Linnakangas for this suggestion, and also to Kyotaro
Horiguchi, Masahiko Sawada, Justin Pryzby and Tom Lane for discussion of
this and earlier ideas on how to fix the problem.

Back-patch to all supported releases.

Reported-by: Justin Pryzby <pryzby@telsasoft.com>
Discussion: https://postgr.es/m/20191212180506.GR2082@telsasoft.com
2021-02-15 14:27:33 +13:00
Michael Paquier b83dcf7928 Add result size as argument of pg_cryptohash_final() for overflow checks
With its current design, a careless use of pg_cryptohash_final() could
would result in an out-of-bound write in memory as the size of the
destination buffer to store the result digest is not known to the
cryptohash internals, without the caller knowing about that.  This
commit adds a new argument to pg_cryptohash_final() to allow such sanity
checks, and implements such defenses.

The internals of SCRAM for HMAC could be tightened a bit more, but as
everything is based on SCRAM_KEY_LEN with uses particular to this code
there is no need to complicate its interface more than necessary, and
this comes back to the refactoring of HMAC in core.  Except that, this
minimizes the uses of the existing DIGEST_LENGTH variables, relying
instead on sizeof() for the result sizes.  In ossp-uuid, this also makes
the code more defensive, as it already relied on dce_uuid_t being at
least the size of a MD5 digest.

This is in philosophy similar to cfc40d3 for base64.c and aef8948 for
hex.c.

Reported-by: Ranier Vilela
Author: Michael Paquier, Ranier Vilela
Reviewed-by: Kyotaro Horiguchi
Discussion: https://postgr.es/m/CAEudQAoqEGmcff3J4sTSV-R_16Monuz-UpJFbf_dnVH=APr02Q@mail.gmail.com
2021-02-15 10:18:34 +09:00
Tom Lane 2dd6733108 Minor fixes to improve regex debugging code.
When REG_DEBUG is defined, ensure that an un-filled "struct cnfa"
is all-zeroes, not just that it has nstates == 0.  This is mainly
so that looking at "struct subre" structs in gdb doesn't distract
one with a lot of garbage fields during regex compilation.

Adjust some places that print debug output to have suitable fflush
calls afterwards.

In passing, correct an erroneous ancient comment: the concatenation
subre-s created by parsebranch() have op == '.' not ','.

Noted while fooling around with some regex performance improvements.
2021-02-14 19:53:42 -05:00
Thomas Munro c7ecd6af01 ReadNewTransactionId() -> ReadNextTransactionId().
The new name conveys the effect better, is more consistent with similar
functions ReadNextMultiXactId(), ReadNextFullTransactionId(), and
matches the name of the variable that it reads.

Reported-by: Peter Geoghegan <pg@bowt.ie>
Discussion: https://postgr.es/m/CAH2-WzmVR4SakBXQUdhhPpMf1aYvZCnna5%3DHKa7DAgEmBAg%2B8g%40mail.gmail.com
2021-02-15 13:17:02 +13:00
Bruce Momjian 8facf1ea00 README/C-comment: document GiST's NSN value 2021-02-13 13:50:49 -05:00
Tom Lane ae4867ec74 Avoid divide-by-zero in regex_selectivity() with long fixed prefix.
Given a regex pattern with a very long fixed prefix (approaching 500
characters), the result of pow(FIXED_CHAR_SEL, fixed_prefix_len) can
underflow to zero.  Typically the preceding selectivity calculation
would have underflowed as well, so that we compute 0/0 and get NaN.
In released branches this leads to an assertion failure later on.
That doesn't happen in HEAD, for reasons I've not explored yet,
but it's surely still a bug.

To fix, just skip the division when the pow() result is zero, so
that we'll (most likely) return a zero selectivity estimate.  In
the edge cases where "sel" didn't yet underflow, perhaps this
isn't desirable, but I'm not sure that the case is worth spending
a lot of effort on.  The results of regex_selectivity_sub() are
barely worth the electrons they're written on anyway :-(

Per report from Alexander Lakhin.  Back-patch to all supported versions.

Discussion: https://postgr.es/m/6de0a0c3-ada9-cd0c-3e4e-2fa9964b41e3@gmail.com
2021-02-12 16:26:47 -05:00
Amit Kapila ce0fdbfe97 Allow multiple xacts during table sync in logical replication.
For the initial table data synchronization in logical replication, we use
a single transaction to copy the entire table and then synchronize the
position in the stream with the main apply worker.

There are multiple downsides of this approach: (a) We have to perform the
entire copy operation again if there is any error (network breakdown,
error in the database operation, etc.) while we synchronize the WAL
position between tablesync worker and apply worker; this will be onerous
especially for large copies, (b) Using a single transaction in the
synchronization-phase (where we can receive WAL from multiple
transactions) will have the risk of exceeding the CID limit, (c) The slot
will hold the WAL till the entire sync is complete because we never commit
till the end.

This patch solves all the above downsides by allowing multiple
transactions during the tablesync phase. The initial copy is done in a
single transaction and after that, we commit each transaction as we
receive. To allow recovery after any error or crash, we use a permanent
slot and origin to track the progress. The slot and origin will be removed
once we finish the synchronization of the table. We also remove slot and
origin of tablesync workers if the user performs DROP SUBSCRIPTION .. or
ALTER SUBSCRIPTION .. REFERESH and some of the table syncs are still not
finished.

The commands ALTER SUBSCRIPTION ... REFRESH PUBLICATION and
ALTER SUBSCRIPTION ... SET PUBLICATION ... with refresh option as true
cannot be executed inside a transaction block because they can now drop
the slots for which we have no provision to rollback.

This will also open up the path for logical replication of 2PC
transactions on the subscriber side. Previously, we can't do that because
of the requirement of maintaining a single transaction in tablesync
workers.

Bump catalog version due to change of state in the catalog
(pg_subscription_rel).

Author: Peter Smith, Amit Kapila, and Takamichi Osumi
Reviewed-by: Ajin Cherian, Petr Jelinek, Hou Zhijie and Amit Kapila
Discussion: https://postgr.es/m/CAA4eK1KHJxaZS-fod-0fey=0tq3=Gkn4ho=8N4-5HWiCfu0H1A@mail.gmail.com
2021-02-12 07:41:51 +05:30
Peter Geoghegan 3063eb1759 Remove obsolete IndexBulkDeleteResult stats field.
The pages_removed field is no longer used for anything.  It hasn't been
possible for an index to physically shrink since old-style VACUUM FULL
was removed by commit 0a469c87.
2021-02-11 16:49:41 -08:00
Tom Lane 69036aafb9 Simplify jsonfuncs.c code by using strtoint() not strtol().
Explicitly testing for INT_MIN and INT_MAX isn't particularly good
style; it's tedious and may draw useless compiler warnings on
machines where int and long are the same width.  We invented
strtoint() precisely for this usage, so use that instead.

While here, remove gratuitous variations in the way the tests for
did-strtoint-succeed were spelled.  Also, avoid attempting to
negate INT_MIN; that would probably work given that the result
is implicitly cast to uint32, but I think it's nominally undefined
behavior.

Per gripe from Ranier Vilela, though this isn't his proposed patch.

Discussion: https://postgr.es/m/CAEudQAqge3QfzoBRhe59QrB_5g+NmQUj2QpzqZ9Nc7QepXGAEw@mail.gmail.com
2021-02-11 12:49:22 -05:00
Tom Lane d4c746516b Remove no-longer-used RTE argument of markVarForSelectPriv().
In the wake of c028faf2a, this is no longer needed.  I left it
out of that patch since the API change would be undesirable in
a released branch; but there's no reason not to do it in HEAD.
2021-02-11 11:23:25 -05:00
Peter Eisentraut 4ad5611055 Fix lack of message pluralization 2021-02-10 11:35:45 +01:00
Michael Paquier 092b785fad Simplify code related to compilation of SSL and OpenSSL
This commit makes more generic some comments and code related to the
compilation with OpenSSL and SSL in general to ease the addition of more
SSL implementations in the future.  In libpq, some OpenSSL-only code is
moved under USE_OPENSSL and not USE_SSL.

While on it, make a comment more consistent in libpq-fe.h.

Author: Daniel Gustafsson
Discussion: https://postgr.es/m/5382CB4A-9CF3-4145-BA46-C802615935E0@yesql.se
2021-02-10 15:28:19 +09:00
Michael Paquier bd12080980 Preserve pg_attribute.attstattarget across REINDEX CONCURRENTLY
For an index, attstattarget can be updated using ALTER INDEX SET
STATISTICS.  This data was lost on the new index after REINDEX
CONCURRENTLY.

The update of this field is done when the old and new indexes are
swapped to make the fix back-patchable.  Another approach we could look
after in the long-term is to change index_create() to pass the wanted
values of attstattarget when creating the new relation, but, as this
would cause an ABI breakage this can be done only on HEAD.

Reported-by: Ronan Dunklau
Author: Michael Paquier
Reviewed-by: Ronan Dunklau, Tomas Vondra
Discussion: https://postgr.es/m/16628084.uLZWGnKmhe@laptop-ronand
Backpatch-through: 12
2021-02-10 13:06:48 +09:00
Amit Kapila cd142e032e Make pg_replication_origin_drop safe against concurrent drops.
Currently, we get the origin id from the name and then drop the origin by
taking ExclusiveLock on ReplicationOriginRelationId. So, two concurrent
sessions can get the id from the name at the same time and then when they
try to drop the origin, one of the sessions will get the either
"tuple concurrently deleted" or "cache lookup failed for replication
origin ..".

To prevent this race condition we do the entire operation under lock. This
obviates the need for replorigin_drop() API and we have removed it so if
any extension authors are using it they need to instead use
replorigin_drop_by_name. See it's usage in pg_replication_origin_drop().

Author: Peter Smith
Reviewed-by: Amit Kapila, Euler Taveira, Petr Jelinek, and Alvaro
Herrera
Discussion: https://www.postgresql.org/message-id/CAHut%2BPuW8DWV5fskkMWWMqzt-x7RPcNQOtJQBp6SdwyRghCk7A%40mail.gmail.com
2021-02-10 07:17:09 +05:30
Peter Geoghegan 31c7fb41e2 Fix obsolete FSM remarks in nbtree README.
The free space map has used a dedicated relation fork rather than shared
memory segments for over a decade.
2021-02-09 11:36:51 -08:00
Fujii Masao 890d2182a2 Revert "Display the time when the process started waiting for the lock, in pg_locks."
This reverts commit 3b733fcd04.

Per buildfarm members prion and rorqual.
2021-02-09 18:30:40 +09:00
Fujii Masao 3b733fcd04 Display the time when the process started waiting for the lock, in pg_locks.
This commit adds new column "waitstart" into pg_locks view. This column
reports the time when the server process started waiting for the lock
if the lock is not held. This information is useful, for example, when
examining the amount of time to wait on a lock by subtracting
"waitstart" in pg_locks from the current time, and identify the lock
that the processes are waiting for very long.

This feature uses the current time obtained for the deadlock timeout
timer as "waitstart" (i.e., the time when this process started waiting
for the lock). Since getting the current time newly can cause overhead,
we reuse the already-obtained time to avoid that overhead.

Note that "waitstart" is updated without holding the lock table's
partition lock, to avoid the overhead by additional lock acquisition.
This can cause "waitstart" in pg_locks to become NULL for a very short
period of time after the wait started even though "granted" is false.
This is OK in practice because we can assume that users are likely to
look at "waitstart" when waiting for the lock for a long time.

Bump catalog version.

Author: Atsushi Torikoshi
Reviewed-by: Ian Lawrence Barwick, Robert Haas, Justin Pryzby, Fujii Masao
Discussion: https://postgr.es/m/a96013dc51cdc56b2a2b84fa8a16a993@oss.nttdata.com
2021-02-09 18:10:19 +09:00
Michael Paquier 7cb3048f38 Add option PROCESS_TOAST to VACUUM
This option controls if toast tables associated with a relation are
vacuumed or not when running a manual VACUUM.  It was already possible
to trigger a manual VACUUM on a toast relation without processing its
main relation, but a manual vacuum on a main relation always forced a
vacuum on its toast table.  This is useful in scenarios where the level
of bloat or transaction age of the main and toast relations differs a
lot.

This option is an extension of the existing VACOPT_SKIPTOAST that was
used by autovacuum to control if toast relations should be skipped or
not.  This internal flag is renamed to VACOPT_PROCESS_TOAST for
consistency with the new option.

A new option switch, called --no-process-toast, is added to vacuumdb.

Author: Nathan Bossart
Reviewed-by: Kirk Jamison, Michael Paquier, Justin Pryzby
Discussion: https://postgr.es/m/BA8951E9-1524-48C5-94AF-73B1F0D7857F@amazon.com
2021-02-09 14:13:57 +09:00
Tom Lane c028faf2a6 Fix mishandling of column-level SELECT privileges for join aliases.
scanNSItemForColumn, expandNSItemAttrs, and ExpandSingleTable would
pass the wrong RTE to markVarForSelectPriv when dealing with a join
ParseNamespaceItem: they'd pass the join RTE, when what we need to
mark is the base table that the join column came from.  The end
result was to not fill the base table's selectedCols bitmap correctly,
resulting in an understatement of the set of columns that are read
by the query.  The executor would still insist on there being at
least one selectable column; but with a correctly crafted query,
a user having SELECT privilege on just one column of a table would
nonetheless be allowed to read all its columns.

To fix, make markRTEForSelectPriv fetch the correct RTE for itself,
ignoring the possibly-mismatched RTE passed by the caller.  Later,
we'll get rid of some now-unused RTE arguments, but that risks
API breaks so we won't do it in released branches.

This problem was introduced by commit 9ce77d75c, so back-patch
to v13 where that came in.  Thanks to Sven Klemm for reporting
the problem.

Security: CVE-2021-20229
2021-02-08 10:14:09 -05:00
Heikki Linnakangas 6214e2b228 Fix permission checks on constraint violation errors on partitions.
If a cross-partition UPDATE violates a constraint on the target partition,
and the columns in the new partition are in different physical order than
in the parent, the error message can reveal columns that the user does not
have SELECT permission on. A similar bug was fixed earlier in commit
804b6b6db4.

The cause of the bug is that the callers of the
ExecBuildSlotValueDescription() function got confused when constructing
the list of modified columns. If the tuple was routed from a parent, we
converted the tuple to the parent's format, but the list of modified
columns was grabbed directly from the child's RTE entry.

ExecUpdateLockMode() had a similar issue. That lead to confusion on which
columns are key columns, leading to wrong tuple lock being taken on tables
referenced by foreign keys, when a row is updated with INSERT ON CONFLICT
UPDATE. A new isolation test is added for that corner case.

With this patch, the ri_RangeTableIndex field is no longer set for
partitions that don't have an entry in the range table. Previously, it was
set to the RTE entry of the parent relation, but that was confusing.

NOTE: This modifies the ResultRelInfo struct, replacing the
ri_PartitionRoot field with ri_RootResultRelInfo. That's a bit risky to
backpatch, because it breaks any extensions accessing the field. The
change that ri_RangeTableIndex is not set for partitions could potentially
break extensions, too. The ResultRelInfos are visible to FDWs at least,
and this patch required small changes to postgres_fdw. Nevertheless, this
seem like the least bad option. I don't think these fields widely used in
extensions; I don't think there are FDWs out there that uses the FDW
"direct update" API, other than postgres_fdw. If there is, you will get a
compilation error, so hopefully it is caught quickly.

Backpatch to 11, where support for both cross-partition UPDATEs, and unique
indexes on partitioned tables, were added.

Reviewed-by: Amit Langote
Security: CVE-2021-3393
2021-02-08 11:01:51 +02:00
Peter Geoghegan 617fffee8a Rename removable xid function for consistency.
GlobalVisIsRemovableFullXid() is now GlobalVisCheckRemovableFullXid().
This is consistent with the general convention for FullTransactionId
equivalents of functions that deal with TransactionId values.  It now
matches the nearby GlobalVisCheckRemovableXid() function, which performs
the same check for callers that use TransactionId values.

Oversight in commit dc7420c2c9.

Discussion: https://postgr.es/m/CAH2-Wzmes12jFNDcVgpU89Vp=r6uLFrE-MT0fjSWGsE70UiNaA@mail.gmail.com
2021-02-07 10:11:14 -08:00
Tom Lane d1d2979852 Revert "Propagate CTE property flags when copying a CTE list into a rule."
This reverts commit ed29089633 and
equivalent back-branch commits.  The issue is subtler than I thought,
and it's far from new, so just before a release deadline is no time
to be fooling with it.  We'll consider what to do at a bit more
leisure.

Discussion: https://postgr.es/m/CAJcOf-fAdj=nDKMsRhQzndm-O13NY4dL6xGcEvdX5Xvbbi0V7g@mail.gmail.com
2021-02-07 12:54:08 -05:00
Tom Lane ed29089633 Propagate CTE property flags when copying a CTE list into a rule.
rewriteRuleAction() neglected this step, although it was careful to
propagate other similar flags such as hasSubLinks or hasRowSecurity.
Omitting to transfer hasRecursive is just cosmetic at the moment,
but omitting hasModifyingCTE is a live bug, since the executor
certainly looks at that.

The proposed test case only fails back to v10, but since the executor
examines hasModifyingCTE in 9.x as well, I suspect that a test case
could be devised that fails in older branches.  Given the nearness
of the release deadline, though, I'm not going to spend time looking
for a better test.

Report and patch by Greg Nancarrow, cosmetic changes by me

Discussion: https://postgr.es/m/CAJcOf-fAdj=nDKMsRhQzndm-O13NY4dL6xGcEvdX5Xvbbi0V7g@mail.gmail.com
2021-02-06 19:28:39 -05:00
Tom Lane dd705a039f Disallow converting an inheritance child table to a view.
Generally, members of inheritance trees must be plain tables (or,
in more recent versions, foreign tables).  ALTER TABLE INHERIT
rejects creating an inheritance relationship that has a view at
either end.  When DefineQueryRewrite attempts to convert a relation
to a view, it already had checks prohibiting doing so for partitioning
parents or children as well as traditional-inheritance parents ...
but it neglected to check that a traditional-inheritance child wasn't
being converted.  Since the planner assumes that any inheritance
child is a table, this led to making plans that tried to do a physical
scan on a view, causing failures (or even crashes, in recent versions).

One could imagine trying to support such a case by expanding the view
normally, but since the rewriter runs before the planner does
inheritance expansion, it would take some very fundamental refactoring
to make that possible.  There are probably a lot of other parts of the
system that don't cope well with such a situation, too.  For now,
just forbid it.

Per bug #16856 from Yang Lin.  Back-patch to all supported branches.
(In versions before v10, this includes back-patching the portion of
commit 501ed02cf that added has_superclass().  Perhaps the lack of
that infrastructure partially explains the missing check.)

Discussion: https://postgr.es/m/16856-0363e05c6e1612fd@postgresql.org
2021-02-06 15:17:01 -05:00
Michael Paquier f7400823c3 Clarify some comments around SharedRecoveryState in xlog.c
SharedRecoveryState has been switched from a boolean to an enum as of
commit 4e87c48, but some comments still referred to it as a boolean.

Author: Amul Sul
Reviewed-by: Dilip Kumar, Kyotaro Horiguchi
Discussion: https://postgr.es/m/CAAJ_b97Hf+1SXnm8jySpO+Fhm+-VKFAAce1T_cupUYtnE3Nxig
2021-02-06 10:27:55 +09:00
Heikki Linnakangas c444472af5 Fix backslash-escaping multibyte chars in COPY FROM.
If a multi-byte character is escaped with a backslash in TEXT mode input,
and the encoding is one of the client-only encodings where the bytes after
the first one can have an ASCII byte "embedded" in the char, we didn't
skip the character correctly. After a backslash, we only skipped the first
byte of the next character, so if it was a multi-byte character, we would
try to process its second byte as if it was a separate character. If it
was one of the characters with special meaning, like '\n', '\r', or
another '\\', that would cause trouble.

One such exmple is the byte sequence '\x5ca45c2e666f6f' in Big5 encoding.
That's supposed to be [backslash][two-byte character][.][f][o][o], but
because the second byte of the two-byte character is 0x5c, we incorrectly
treat it as another backslash. And because the next character is a dot, we
parse it as end-of-copy marker, and throw an "end-of-copy marker corrupt"
error.

Backpatch to all supported versions.

Reviewed-by: John Naylor, Kyotaro Horiguchi
Discussion: https://www.postgresql.org/message-id/a897f84f-8dca-8798-3139-07da5bb38728%40iki.fi
2021-02-05 11:14:56 +02:00
Tom Lane 0ff865fbe5 Fix bug in HashAgg's selective-column-spilling logic.
Commit 230230223 taught nodeAgg.c that, when spilling tuples from
memory in an oversized hash aggregation, it only needed to spill
input columns referenced in the node's tlist and quals.  Unfortunately,
that's wrong: we also have to save the grouping columns.  The error
is masked in common cases because the grouping columns also appear
in the tlist, but that's not necessarily true.  The main category
of plans where it's not true seem to come from semijoins ("WHERE
outercol IN (SELECT innercol FROM innertable)") where the innercol
needs an implicit promotion to make it comparable to the outercol.
The grouping column will be "innercol::promotedtype", but that
expression appears nowhere in the Agg node's own tlist and quals;
only the bare "innercol" is found in the tlist.

I spent quite a bit of time looking for a suitable regression test
case for this, without much success.  If the number of distinct
values of the innercol is large enough to make spilling happen,
the planner tends to prefer a non-HashAgg plan, at least for
problem sizes that are reasonable to use in the regression tests.
So, no new regression test.  However, this patch does demonstrably
fix the originally-reported test case.

Per report from s.p.e (at) gmx-topmail.de.  Backpatch to v13
where the troublesome code came in.

Discussion: https://postgr.es/m/trinity-1c565d44-159f-488b-a518-caf13883134f-1611835701633@3c-app-gmx-bap78
2021-02-04 23:01:37 -05:00
Tom Lane 82e0e29308 Fix YA incremental sort bug.
switchToPresortedPrefixMode() did the wrong thing if it detected
a batch boundary just at the last tuple of a fullsort group.

The initially-reported symptom was a "retrieved too many tuples in a
bounded sort" error, but the test case added here just silently gives
the wrong answer without this patch.

I (tgl) am not really happy about committing this patch without review
from the incremental-sort authors, but they seem AWOL and we are hard
against a release deadline.  This does demonstrably make some cases
better, anyway.

Per bug #16846 from Yoran Heling.  Back-patch to v13 where incremental
sort was introduced.

Neil Chen

Discussion: https://postgr.es/m/16846-ae49f51ac379a4cb@postgresql.org
2021-02-04 19:12:14 -05:00
Peter Geoghegan c34787f910 Harden nbtree page deletion.
Add some additional defensive checks in the second phase of index
deletion to detect and report index corruption during VACUUM, and to
avoid having VACUUM become stuck in more cases.  The code is still not
robust in the presence of a circular chain of sibling links, though it's
not clear whether that really matters.  This is follow-up work to commit
3a01f68e.

The new defensive checks rely on the assumption that there can be no
more than one VACUUM operation running for an index at any given time.
Remove an old comment suggesting that multiple concurrent VACUUMs need
to be considered here.  This concern now seems highly unlikely to have
any real validity, since we clearly rely on the same assumption in
several other places.  For example, there are much more recent comments
that appear in the same function (added by commit efada2b8e9) that make
the same assumption.

Also add a CHECK_FOR_INTERRUPTS() to the relevant code path.  Contrary
to comments added by commit 3a01f68e, it is actually possible to handle
interrupts here, at least in the common case where processing takes
place at the leaf level.  We only hold a pin on leafbuf/target page when
stepping right at the leaf level.

No backpatch due to the lack of complaints following hardening added to
the same area by commit 3a01f68e.
2021-02-04 15:42:36 -08:00
Heikki Linnakangas 2f86ab305e Fix small error in COPY FROM progress reporting.
The # of bytes processed was accumulated slightly incorrectly. After
loading more data to the input buffer, we added the number of bytes in
the buffer to the sum. But in case of multi-byte characters or escapes,
there can be a few unprocessed bytes left over from previous load in the
buffer. Those bytes got counted twice.
2021-02-04 17:40:33 +02:00
Peter Eisentraut 3c78e0569c Refactor Windows error message for easier translation
In the error messages referring to the user right "Lock pages in
memory", this is a term from the Windows OS, so it should be
translated in accordance with the OS localization.  Refactor the error
messages so this is easier and clearer.  Also fix the capitalization
to match the existing capitalization in the OS.
2021-02-04 13:31:13 +01:00
Michael Paquier 5128483d06 Ensure unlinking of old index file with REINDEX (TABLESPACE)
The original versions of the patch included this part, but a mismerge
from my side has made this piece go missing.  Oversight in c5b28604.
2021-02-04 17:16:47 +09:00
Michael Paquier fc749bc704 Clarify comment in tablesync.c
Author: Peter Smith
Reviewed-by: Amit Kapila, Michael Paquier, Euler Taveira
Discussion: https://postgr.es/m/CAHut+Pt9_T6pWar0FLtPsygNmme8HPWPdGUyZ_8mE1Yvjdf0ZA@mail.gmail.com
2021-02-04 16:02:31 +09:00
Michael Paquier c5b286047c Add TABLESPACE option to REINDEX
This patch adds the possibility to move indexes to a new tablespace
while rebuilding them.  Both the concurrent and the non-concurrent cases
are supported, and the following set of restrictions apply:
- When using TABLESPACE with a REINDEX command that targets a
partitioned table or index, all the indexes of the leaf partitions are
moved to the new tablespace.  The tablespace references of the non-leaf,
partitioned tables in pg_class.reltablespace are not changed. This
requires an extra ALTER TABLE SET TABLESPACE.
- Any index on a toast table rebuilt as part of a parent table is kept
in its original tablespace.
- The operation is forbidden on system catalogs, including trying to
directly move a toast relation with REINDEX.  This results in an error
if doing REINDEX on a single object.  REINDEX SCHEMA, DATABASE and
SYSTEM skip system relations when TABLESPACE is used.

Author: Alexey Kondratov, Michael Paquier, Justin Pryzby
Reviewed-by: Álvaro Herrera, Michael Paquier
Discussion: https://postgr.es/m/8a8f5f73-00d3-55f8-7583-1375ca8f6a91@postgrespro.ru
2021-02-04 14:34:20 +09:00
Tom Lane 9624321ec5 Avoid crash when rolling back within a prepared statement.
If a portal is used to run a prepared CALL or DO statement that
contains a ROLLBACK, PortalRunMulti fails because the portal's
statement list gets cleared by the rollback.  (Since the grammar
doesn't allow CALL/DO in PREPARE, the only easy way to get to this is
via extended query protocol, which treats all inputs as prepared
statements.)  It's difficult to avoid resetting the portal early
because of resource-management issues, so work around this by teaching
PortalRunMulti to be wary of portal->stmts having suddenly become NIL.

The crash has only been seen to occur in v13 and HEAD (as a
consequence of commit 1cff1b95a having added an extra touch of
portal->stmts).  But even before that, the code involved touching a
List that the portal no longer has any claim on.  In the test case at
hand, the List will still exist because of another refcount on the
cached plan; but I'm far from convinced that it's impossible for the
cached plan to have been dropped by the time control gets back to
PortalRunMulti.  Hence, backpatch to v11 where nested transactions
were added.

Thomas Munro and Tom Lane, per bug #16811 from James Inform

Discussion: https://postgr.es/m/16811-c1b599b2c6c2d622@postgresql.org
2021-02-03 19:38:43 -05:00
Tom Lane ba0faf81c6 Remove special BKI_LOOKUP magic for namespace and role OIDs.
Now that commit 62f34097c attached BKI_LOOKUP annotation to all the
namespace and role OID columns in the catalogs, there's no real reason
to have the magic PGNSP and PGUID symbols.  Get rid of them in favor
of implementing those lookups according to genbki.pl's normal pattern.

This means that in the catalog headers, BKI_DEFAULT(PGNSP) becomes
BKI_DEFAULT(pg_catalog), which seems a lot more transparent.
BKI_DEFAULT(PGUID) becomes BKI_DEFAULT(POSTGRES), which is perhaps
less so; but you can look into pg_authid.dat to discover that
POSTGRES is the nonce name for the bootstrap superuser.

This change also means that if we ever need cross-references in the
initial catalog data to any of the other built-in roles besides
POSTGRES, or to some other built-in schema besides pg_catalog,
we can just do it.

No catversion bump here, as there's no actual change in the contents
of postgres.bki.

Discussion: https://postgr.es/m/3240355.1612129197@sss.pgh.pa.us
2021-02-03 12:01:48 -05:00
Tom Lane 62f34097c8 Build in some knowledge about foreign-key relationships in the catalogs.
This follows in the spirit of commit dfb75e478, which created primary
key and uniqueness constraints to improve the visibility of constraints
imposed on the system catalogs.  While our catalogs contain many
foreign-key-like relationships, they don't quite follow SQL semantics,
in that the convention for an omitted reference is to write zero not
NULL.  Plus, we have some cases in which there are arrays each of whose
elements is supposed to be an FK reference; SQL has no way to model that.
So we can't create actual foreign key constraints to describe the
situation.  Nonetheless, we can collect and use knowledge about these
relationships.

This patch therefore adds annotations to the catalog header files to
declare foreign-key relationships.  (The BKI_LOOKUP annotations cover
simple cases, but we weren't previously distinguishing which such
columns are allowed to contain zeroes; we also need new markings for
multi-column FK references.)  Then, Catalog.pm and genbki.pl are
taught to collect this information into a table in a new generated
header "system_fk_info.h".  The only user of that at the moment is
a new SQL function pg_get_catalog_foreign_keys(), which exposes the
table to SQL.  The oidjoins regression test is rewritten to use
pg_get_catalog_foreign_keys() to find out which columns to check.
Aside from removing the need for manual maintenance of that test
script, this allows it to cover numerous relationships that were not
checked by the old implementation based on findoidjoins.  (As of this
commit, 217 relationships are checked by the test, versus 181 before.)

Discussion: https://postgr.es/m/3240355.1612129197@sss.pgh.pa.us
2021-02-02 17:11:55 -05:00
Peter Eisentraut 1d71f3c83c Improve confusing variable names
The prototype calls the second argument of
pgstat_progress_update_multi_param() "index", and some callers name
their local variable that way.  But when the surrounding code deals
with index relations, this is confusing, and in at least one case
shadowed another variable that is referring to an index relation.
Adjust those call sites to have clearer local variable naming, similar
to existing callers in indexcmds.c.
2021-02-02 09:20:22 +01:00
Michael Paquier 4ad31bb2ef Remove unused column atttypmod from initial tablesync query
The initial tablesync done by logical replication used a query to fetch
the information of a relation's columns that included atttypmod, but it
was left unused.  This was added by 7c4f524.

Author: Euler Taveira
Reviewed-by: Önder Kalacı, Amit Langote, Japin Li
Discussion: https://postgr.es/m/CAHE3wggb715X+mK_DitLXF25B=jE6xyNCH4YOwM860JR7HarGQ@mail.gmail.com
2021-02-02 13:59:23 +09:00
Tom Lane f003a7522b Remove [Merge]AppendPath.partitioned_rels.
It turns out that the calculation of [Merge]AppendPath.partitioned_rels
in allpaths.c is faulty and sometimes omits relevant non-leaf partitions,
allowing an assertion added by commit a929e17e5a to trigger.  Rather
than fix that, it seems better to get rid of those fields altogether.
We don't really need the info until create_plan time, and calculating
it once for the selected plan should be cheaper than calculating it
for each append path we consider.

The preceding two commits did away with all use of the partitioned_rels
values; this commit just mechanically removes the fields and the code
that calculated them.

Discussion: https://postgr.es/m/87sg8tqhsl.fsf@aurora.ydns.eu
Discussion: https://postgr.es/m/CAJKUy5gCXDSmFs2c=R+VGgn7FiYcLCsEFEuDNNLGfoha=pBE_g@mail.gmail.com
2021-02-01 14:43:54 -05:00
Tom Lane 5076f88bc9 Remove incidental dependencies on partitioned_rels lists.
It turns out that the calculation of [Merge]AppendPath.partitioned_rels
in allpaths.c is faulty and sometimes omits relevant non-leaf partitions,
allowing an assertion added by commit a929e17e5a to trigger.  Rather
than fix that, it seems better to get rid of those fields altogether.
We don't really need the info until create_plan time, and calculating
it once for the selected plan should be cheaper than calculating it
for each append path we consider.

This patch undoes a couple of very minor uses of the partitioned_rels
values.

createplan.c was testing for nil-ness to optimize away the preparatory
work for make_partition_pruneinfo().  That is worth doing if the check
is nigh free, but it's not worth going to any great lengths to avoid.

create_append_path() was testing for nil-ness as part of deciding how
to set up ParamPathInfo for an AppendPath.  I replaced that with a
check for the appendrel's parent rel being partitioned.  That's not
quite the same thing but should cover most cases.  If we note any
interesting loss of optimizations, we can dumb this down to just
always use the more expensive method when the parent is a baserel.

Discussion: https://postgr.es/m/87sg8tqhsl.fsf@aurora.ydns.eu
Discussion: https://postgr.es/m/CAJKUy5gCXDSmFs2c=R+VGgn7FiYcLCsEFEuDNNLGfoha=pBE_g@mail.gmail.com
2021-02-01 14:34:59 -05:00
Tom Lane fb2d645dd5 Revise make_partition_pruneinfo to not use its partitioned_rels input.
It turns out that the calculation of [Merge]AppendPath.partitioned_rels
in allpaths.c is faulty and sometimes omits relevant non-leaf partitions,
allowing an assertion added by commit a929e17e5a to trigger.  Rather
than fix that, it seems better to get rid of those fields altogether.
We don't really need the info until create_plan time, and calculating
it once for the selected plan should be cheaper than calculating it
for each append path we consider.

As a first step, teach make_partition_pruneinfo to collect the relevant
partitioned tables for itself.  It's not hard to do so by traversing
from child tables up to parents using the AppendRelInfo links.

While here, make some minor stylistic improvements; mainly, don't use
the "Relids" alias for bitmapsets that are not identities of any
relation considered by the planner.  Try to document the logic better,
too.

No backpatch, as there does not seem to be a live problem before
a929e17e5a.  Also no new regression test; the code where the bug
was will be gone at the end of this patch series, so it seems a
bit pointless to memorialize the issue.

Tom Lane and David Rowley, per reports from Andreas Seltenreich
and Jaime Casanova.

Discussion: https://postgr.es/m/87sg8tqhsl.fsf@aurora.ydns.eu
Discussion: https://postgr.es/m/CAJKUy5gCXDSmFs2c=R+VGgn7FiYcLCsEFEuDNNLGfoha=pBE_g@mail.gmail.com
2021-02-01 14:05:51 -05:00
Peter Eisentraut 3696a600e2 SEARCH and CYCLE clauses
This adds the SQL standard feature that adds the SEARCH and CYCLE
clauses to recursive queries to be able to do produce breadth- or
depth-first search orders and detect cycles.  These clauses can be
rewritten into queries using existing syntax, and that is what this
patch does in the rewriter.

Reviewed-by: Vik Fearing <vik@postgresfriends.org>
Reviewed-by: Pavel Stehule <pavel.stehule@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/db80ceee-6f97-9b4a-8ee8-3ba0c58e5be2@2ndquadrant.com
2021-02-01 14:32:51 +01:00
Alexander Korotkov bb513b364b Get rid of unnecessary memory allocation in jsonb_subscript_assign()
Current code allocates memory for JsonbValue, but it could be placed locally.
2021-02-01 14:06:02 +03:00
Michael Paquier fe61df7f82 Introduce --with-ssl={openssl} as a configure option
This is a replacement for the existing --with-openssl, extending the
logic to make easier the addition of new SSL libraries.  The grammar is
chosen to be similar to --with-uuid, where multiple values can be
chosen, with "openssl" as the only supported value for now.

The original switch, --with-openssl, is kept for compatibility.

Author: Daniel Gustafsson, Michael Paquier
Reviewed-by: Jacob Champion
Discussion: https://postgr.es/m/FAB21FC8-0F62-434F-AA78-6BD9336D630A@yesql.se
2021-02-01 19:19:44 +09:00
Tom Lane 7c5d57caed Fix portability issue in new jsonbsubs code.
On machines where sizeof(Datum) > sizeof(Oid) (that is, any 64-bit
platform), the previous coding would compute a misaligned
workspace->index pointer if nupper is odd.  Architectures where
misaligned access is a hard no-no would then fail.  This appears
to explain why thorntail is unhappy but other buildfarm members
are not.
2021-02-01 02:03:59 -05:00
Alexander Korotkov aa6e46daf5 Throw error when assigning jsonb scalar instead of a composite object
During the jsonb subscripting assignment, the provided path might assume an
object or an array where the source jsonb has a scalar value.  Initial
subscripting assignment logic will skip such an update operation with no
message shown.  This commit makes it throw an error to indicate this type
of situation.

Discussion: https://postgr.es/m/CA%2Bq6zcV8qvGcDXurwwgUbwACV86Th7G80pnubg42e-p9gsSf%3Dg%40mail.gmail.com
Discussion: https://postgr.es/m/CA%2Bq6zcX3mdxGCgdThzuySwH-ApyHHM-G4oB1R0fn0j2hZqqkLQ%40mail.gmail.com
Discussion: https://postgr.es/m/CA%2Bq6zcVDuGBv%3DM0FqBYX8DPebS3F_0KQ6OVFobGJPM507_SZ_w%40mail.gmail.com
Discussion: https://postgr.es/m/CA%2Bq6zcVovR%2BXY4mfk-7oNk-rF91gH0PebnNfuUjuuDsyHjOcVA%40mail.gmail.com
Author: Dmitry Dolgov
Reviewed-by: Tom Lane, Arthur Zakirov, Pavel Stehule, Dian M Fay
Reviewed-by: Andrew Dunstan, Chapman Flack, Merlin Moncure, Peter Geoghegan
Reviewed-by: Alvaro Herrera, Jim Nasby, Josh Berkus, Victor Wagner
Reviewed-by: Aleksander Alekseev, Robert Haas, Oleg Bartunov
2021-01-31 23:51:06 +03:00
Alexander Korotkov 81fcc72e66 Filling array gaps during jsonb subscripting
This commit introduces two new flags for jsonb assignment:

* JB_PATH_FILL_GAPS: Appending array elements on the specified position, gaps
  are filled with nulls (similar to the JavaScript behavior).  This mode also
  instructs to   create the whole path in a jsonb object if some part of the
  path (more than just the last element) is not present.

* JB_PATH_CONSISTENT_POSITION: Assigning keeps array positions consistent by
  preventing prepending of elements.

Both flags are used only in jsonb subscripting assignment.

Initially proposed by Nikita Glukhov based on polymorphic subscripting
patch, but transformed into an independent change.

Discussion: https://postgr.es/m/CA%2Bq6zcV8qvGcDXurwwgUbwACV86Th7G80pnubg42e-p9gsSf%3Dg%40mail.gmail.com
Discussion: https://postgr.es/m/CA%2Bq6zcX3mdxGCgdThzuySwH-ApyHHM-G4oB1R0fn0j2hZqqkLQ%40mail.gmail.com
Discussion: https://postgr.es/m/CA%2Bq6zcVDuGBv%3DM0FqBYX8DPebS3F_0KQ6OVFobGJPM507_SZ_w%40mail.gmail.com
Discussion: https://postgr.es/m/CA%2Bq6zcVovR%2BXY4mfk-7oNk-rF91gH0PebnNfuUjuuDsyHjOcVA%40mail.gmail.com
Author: Dmitry Dolgov
Reviewed-by: Tom Lane, Arthur Zakirov, Pavel Stehule, Dian M Fay
Reviewed-by: Andrew Dunstan, Chapman Flack, Merlin Moncure, Peter Geoghegan
Reviewed-by: Alvaro Herrera, Jim Nasby, Josh Berkus, Victor Wagner
Reviewed-by: Aleksander Alekseev, Robert Haas, Oleg Bartunov
2021-01-31 23:51:01 +03:00
Alexander Korotkov 676887a3b0 Implementation of subscripting for jsonb
Subscripting for jsonb does not support slices, does not have a limit for the
number of subscripts, and an assignment expects a replace value to have jsonb
type.  There is also one functional difference between assignment via
subscripting and assignment via jsonb_set().  When an original jsonb container
is NULL, the subscripting replaces it with an empty jsonb and proceeds with
an assignment.

For the sake of code reuse, we rearrange some parts of jsonb functionality
to allow the usage of the same functions for jsonb_set and assign subscripting
operation.

The original idea belongs to Oleg Bartunov.

Catversion is bumped.

Discussion: https://postgr.es/m/CA%2Bq6zcV8qvGcDXurwwgUbwACV86Th7G80pnubg42e-p9gsSf%3Dg%40mail.gmail.com
Discussion: https://postgr.es/m/CA%2Bq6zcX3mdxGCgdThzuySwH-ApyHHM-G4oB1R0fn0j2hZqqkLQ%40mail.gmail.com
Discussion: https://postgr.es/m/CA%2Bq6zcVDuGBv%3DM0FqBYX8DPebS3F_0KQ6OVFobGJPM507_SZ_w%40mail.gmail.com
Discussion: https://postgr.es/m/CA%2Bq6zcVovR%2BXY4mfk-7oNk-rF91gH0PebnNfuUjuuDsyHjOcVA%40mail.gmail.com
Author: Dmitry Dolgov
Reviewed-by: Tom Lane, Arthur Zakirov, Pavel Stehule, Dian M Fay
Reviewed-by: Andrew Dunstan, Chapman Flack, Merlin Moncure, Peter Geoghegan
Reviewed-by: Alvaro Herrera, Jim Nasby, Josh Berkus, Victor Wagner
Reviewed-by: Aleksander Alekseev, Robert Haas, Oleg Bartunov
2021-01-31 23:50:40 +03:00
Peter Geoghegan dc43492e46 Remove unused _bt_delitems_delete() argument.
The latestRemovedXid values used by nbtree deletion operations are
determined by _bt_delitems_delete()'s caller, so there is no reason to
pass a separate heapRel argument.

Oversight in commit d168b66682.
2021-01-31 10:10:55 -08:00
Alexander Korotkov 0c4f355c6a Fix parsing of complex morphs to tsquery
When to_tsquery() or websearch_to_tsquery() meet a complex morph containing
multiple words residing adjacent position, these words are connected
with OP_AND operator.  That leads to surprising results.  For instace,
both websearch_to_tsquery('"pg_class pg"') and to_tsquery('pg_class <-> pg')
produce '( pg & class ) <-> pg' tsquery.  This tsquery requires
'pg' and 'class' words to reside on the same position and doesn't match
to to_tsvector('pg_class pg').  It appears to be ridiculous behavior, which
needs to be fixed.

This commit makes to_tsquery() or websearch_to_tsquery() connect words
residing adjacent position with OP_PHRASE.  Therefore, now those words are
normally chained with other OP_PHRASE operator.  The examples of above now
produces 'pg <-> class <-> pg' tsquery, which matches to
to_tsvector('pg_class pg').

Another effect of this commit is that complex morph word positions now need to
match the tsvector even if there is no surrounding OP_PHRASE.  This behavior
change generally looks like an improvement but making this commit not
backpatchable.

Reported-by: Barry Pederson
Bug: #16592
Discussion: https://postgr.es/m/16592-70b110ff9731c07d@postgresql.org
Discussion: https://postgr.es/m/CAPpHfdv0EzVhf6CWfB1_TTZqXV_2Sn-jSY3zSd7ePH%3D-%2B1V2DQ%40mail.gmail.com
Author: Alexander Korotkov
Reviewed-by: Tom Lane, Neil Chen
2021-01-31 20:14:29 +03:00
Peter Eisentraut dfb75e478c Add primary keys and unique constraints to system catalogs
For those system catalogs that have a unique indexes, make a primary
key and unique constraint, using ALTER TABLE ... PRIMARY KEY/UNIQUE
USING INDEX.

This can be helpful for GUI tools that look for a primary key, and it
might in the future allow declaring foreign keys, for making schema
diagrams.

The constraint creation statements are automatically created by
genbki.pl from DECLARE_UNIQUE_INDEX directives.  To specify which one
of the available unique indexes is the primary key, use the new
directive DECLARE_UNIQUE_INDEX_PKEY instead.  By convention, we
usually make a catalog's OID column its primary key, if it has one.

Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://www.postgresql.org/message-id/flat/dc5f44d9-5ec1-a596-0251-dadadcdede98@2ndquadrant.com
2021-01-30 19:44:29 +01:00
Peter Eisentraut 6aaaa76bb4 Allow GRANTED BY clause in normal GRANT and REVOKE statements
The SQL standard allows a GRANTED BY clause on GRANT and
REVOKE (privilege) statements that can specify CURRENT_USER or
CURRENT_ROLE.  In PostgreSQL, both of these are the default behavior.
Since we already have all the parsing support for this for the
GRANT (role) statement, we might as well add basic support for this
for the privilege variant as well.  This allows us to check off SQL
feature T332.  In the future, perhaps more interesting things could be
done with this, too.

Reviewed-by: Simon Riggs <simon@2ndquadrant.com>
Discussion: https://www.postgresql.org/message-id/flat/f2feac44-b4c5-f38f-3699-2851d6a76dc9@2ndquadrant.com
2021-01-30 09:45:11 +01:00
Noah Misch 7da83415e5 Revive "snapshot too old" with wal_level=minimal and SET TABLESPACE.
Given a permanent relation rewritten in the current transaction, the
old_snapshot_threshold mechanism assumed the relation had never been
subject to early pruning.  Hence, a query could fail to report "snapshot
too old" when the rewrite followed an early truncation.  ALTER TABLE SET
TABLESPACE is probably the only rewrite mechanism capable of exposing
this bug.  REINDEX sets indcheckxmin, avoiding the problem.  CLUSTER has
zeroed page LSNs since before old_snapshot_threshold existed, so
old_snapshot_threshold has never cooperated with it.  ALTER TABLE
... SET DATA TYPE makes the table look empty to every past snapshot,
which is strictly worse.  Back-patch to v13, where commit
c6b92041d3 broke this.

Kyotaro Horiguchi and Noah Misch

Discussion: https://postgr.es/m/20210113.160705.2225256954956139776.horikyota.ntt@gmail.com
2021-01-30 00:12:18 -08:00
Noah Misch 360bd2321b Fix error with CREATE PUBLICATION, wal_level=minimal, and new tables.
CREATE PUBLICATION has failed spuriously when applied to a permanent
relation created or rewritten in the current transaction.  Make the same
change to another site having the same semantic intent; the second
instance has no user-visible consequences.  Back-patch to v13, where
commit c6b92041d3 broke this.

Kyotaro Horiguchi

Discussion: https://postgr.es/m/20210113.160705.2225256954956139776.horikyota.ntt@gmail.com
2021-01-30 00:11:38 -08:00
Noah Misch 8a54e12a38 Fix CREATE INDEX CONCURRENTLY for simultaneous prepared transactions.
In a cluster having used CREATE INDEX CONCURRENTLY while having enabled
prepared transactions, queries that use the resulting index can silently
fail to find rows.  Fix this for future CREATE INDEX CONCURRENTLY by
making it wait for prepared transactions like it waits for ordinary
transactions.  This expands the VirtualTransactionId structure domain to
admit prepared transactions.  It may be necessary to reindex to recover
from past occurrences.  Back-patch to 9.5 (all supported versions).

Andrey Borodin, reviewed (in earlier versions) by Tom Lane and Michael
Paquier.

Discussion: https://postgr.es/m/2E712143-97F7-4890-B470-4A35142ABC82@yandex-team.ru
2021-01-30 00:00:27 -08:00
Michael Paquier 24843297a9 Adjust comments of CheckRelationTableSpaceMove() and SetRelationTableSpace()
4c9c359, that introduced those two functions, has been overoptimistic on
the point that only ShareUpdateExclusiveLock would be required when
moving a relation to a new tablespace.  AccessExclusiveLock is a
requirement, but ShareUpdateExclusiveLock may be used under specific
conditions like REINDEX CONCURRENTLY where waits on past transactions
make the operation safe even with a lower-level lock.  The current code
does only the former, so update the existing comments to reflect that.

Once a REINDEX (TABLESPACE) is introduced, those comments would require
an extra refresh to mention their new use case.

While on it, fix an incorrect variable name.

Per discussion with Álvaro Herrera.

Discussion: https://postgr.es/m/20210127140741.GA14174@alvherre.pgsql
2021-01-29 13:59:18 +09:00
Thomas Munro 514b411a2b Retire pg_standby.
pg_standby was useful more than a decade ago, but now it is obsolete.
It has been proposed that we retire it many times.  Now seems like a
good time to finally do it, because "waiting restore commands"
are incompatible with a proposed recovery prefetching feature.

Discussion: https://postgr.es/m/20201029024412.GP5380%40telsasoft.com
Author: Justin Pryzby <pryzby@telsasoft.com>
Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi>
Reviewed-by: Peter Eisentraut <peter.eisentraut@enterprisedb.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Fujii Masao <masao.fujii@oss.nttdata.com>
2021-01-29 14:09:41 +13:00
Tom Lane 1046dbedde Silence another gcc 11 warning.
Per buildfarm and local experimentation, bleeding-edge gcc isn't
convinced that the MemSet in reorder_function_arguments() is safe.
Shut it up by adding an explicit check that pronargs isn't negative,
and by changing MemSet to memset.  (It appears that either change is
enough to quiet the warning at -O2, but let's do both to be sure.)
2021-01-28 17:19:16 -05:00
Alvaro Herrera 6f5c8a8ec2
Remove bogus restriction from BEFORE UPDATE triggers
In trying to protect the user from inconsistent behavior, commit
487e9861d0 "Enable BEFORE row-level triggers for partitioned tables"
tried to prevent BEFORE UPDATE FOR EACH ROW triggers from moving the row
from one partition to another.  However, it turns out that the
restriction is wrong in two ways: first, it fails spuriously, preventing
valid situations from working, as in bug #16794; and second, they don't
protect from any misbehavior, because tuple routing would cope anyway.

Fix by removing that restriction.

We keep the same restriction on BEFORE INSERT FOR EACH ROW triggers,
though.  It is valid and useful there.  In the future we could remove it
by having tuple reroute work for inserts as it does for updates.

Backpatch to 13.

Author: Álvaro Herrera <alvherre@alvh.no-ip.org>
Reported-by: Phillip Menke <pg@pmenke.de>
Discussion: https://postgr.es/m/16794-350a655580fbb9ae@postgresql.org
2021-01-28 16:56:07 -03:00
Tom Lane 1d9351a87c Fix hash partition pruning with asymmetric partition sets.
perform_pruning_combine_step() was not taught about the number of
partition indexes used in hash partitioning; more embarrassingly,
get_matching_hash_bounds() also had it wrong.  These errors are masked
in the common case where all the partitions have the same modulus
and no partition is missing.  However, with missing or unequal-size
partitions, we could erroneously prune some partitions that need
to be scanned, leading to silently wrong query answers.

While a minimal-footprint fix for this could be to export
get_partition_bound_num_indexes and make the incorrect functions use it,
I'm of the opinion that that function should never have existed in the
first place.  It's not reasonable data structure design that
PartitionBoundInfoData lacks any explicit record of the length of
its indexes[] array.  Perhaps that was all right when it could always
be assumed equal to ndatums, but something should have been done about
it as soon as that stopped being true.  Putting in an explicit
"nindexes" field makes both partition_bounds_equal() and
partition_bounds_copy() simpler, safer, and faster than before,
and removes explicit knowledge of the number-of-partition-indexes
rules from some other places too.

This change also makes get_hash_partition_greatest_modulus obsolete.
I left that in place in case any external code uses it, but no core
code does anymore.

Per bug #16840 from Michał Albrycht.  Back-patch to v11 where the
hash partitioning code came in.  (In the back branches, add the new
field at the end of PartitionBoundInfoData to minimize ABI risks.)

Discussion: https://postgr.es/m/16840-571a22976f829ad4@postgresql.org
2021-01-28 13:41:55 -05:00
Heikki Linnakangas 6c5576075b Add direct conversion routines between EUC_TW and Big5.
Conversions between EUC_TW and Big5 were previously implemented by
converting the whole input to MIC first, and then from MIC to the target
encoding. Implement functions to convert directly between the two.

The reason to do this now is that I'm working on a patch that will change
the conversion function signature so that if the input is invalid, we
convert as much as we can and return the number of bytes successfully
converted. That's not possible if we use an intermediary format, because
if an error happens in the intermediary -> final conversion, we lose track
of the location of the invalid character in the original input. Avoiding
the intermediate step makes the conversions faster, too.

Reviewed-by: John Naylor
Discussion: https://www.postgresql.org/message-id/b9e3167f-f84b-7aa4-5738-be578a4db924%40iki.fi
2021-01-28 14:53:03 +02:00
Heikki Linnakangas b80e10638e Add mbverifystr() functions specific to each encoding.
This makes pg_verify_mbstr() function faster, by allowing more efficient
encoding-specific implementations. All the implementations included in
this commit are pretty naive, they just call the same encoding-specific
verifychar functions that were used previously, but that already gives a
performance boost because the tight character-at-a-time loop is simpler.

Reviewed-by: John Naylor
Discussion: https://www.postgresql.org/message-id/e7861509-3960-538a-9025-b75a61188e01@iki.fi
2021-01-28 14:40:07 +02:00
Andrew Gierth a3367aa3c4 Don't add bailout adjustment for non-strict deserialize calls.
When building aggregate expression steps, strict checks need a bailout
jump for when a null value is encountered, so there is a list of steps
that require later adjustment. Adding entries to that list for steps
that aren't actually strict would be harmless, except that there is an
Assert which catches them. This leads to spurious errors on asserts
builds, for data sets that trigger parallel aggregation of an
aggregate with a non-strict deserialization function (no such
aggregates exist in the core system).

Repair by not adding the adjustment entry when it's not needed.

Backpatch back to 11 where the code was introduced.

Per a report from Darafei (Komzpa) of the PostGIS project; analysis
and patch by me.

Discussion: https://postgr.es/m/87mty7peb3.fsf@news-spur.riddles.org.uk
2021-01-28 10:53:10 +00:00
Michael Paquier f854c69a5b Refactor SQL functions of SHA-2 in cryptohashfuncs.c
The same code pattern was repeated four times when compiling a SHA-2
hash.  This refactoring has the advantage to issue a compilation warning
if a new value is added to pg_cryptohash_type, so as anybody doing an
addition in this area would need to consider if support for a new SQL
function is needed or not.

Author: Sehrope Sarkuni, Michael Paquier
Discussion: https://postgr.es/m/YA7DvLRn2xnTgsMc@paquier.xyz
2021-01-28 16:13:26 +09:00
Peter Geoghegan e19594c5c0 Reduce the default value of vacuum_cost_page_miss.
When commit f425b605 introduced cost based vacuum delays back in 2004,
the defaults reflected then-current trends in hardware, as well as
certain historical limitations in PostgreSQL.  There have been enormous
improvements in both areas since that time.  The cost limit GUC defaults
finally became much more representative of current trends following
commit cbccac37, which decreased autovacuum_vacuum_cost_delay's default
by 10x for PostgreSQL 12 (it went from 20ms to only 2ms).

The relative costs have shifted too.  This should also be accounted for
by the defaults.  More specifically, the relative importance of avoiding
dirtying pages within VACUUM has greatly increased, primarily due to
main memory capacity scaling and trends in flash storage.  Within
Postgres itself, improvements like sequential access during index
vacuuming (at least in nbtree and GiST indexes) have also been
contributing factors.

To reflect all this, decrease the default of vacuum_cost_page_miss to 2.
Since the default of vacuum_cost_page_dirty remains 20, dirtying a page
is now considered 10x "costlier" than a page miss by default.

Author: Peter Geoghegan <pg@bowt.ie>
Discussion: https://postgr.es/m/CAH2-WzmLPFnkWT8xMjmcsm7YS3+_Qi3iRWAb2+_Bc8UhVyHfuA@mail.gmail.com
2021-01-27 15:11:13 -08:00
Robert Haas 69059d3b2f In TrimCLOG(), don't reset XactCtl->shared->latest_page_number.
Since the CLOG page number is not recorded directly in the checkpoint
record, we have to use ShmemVariableCache->nextXid to figure out the
latest CLOG page number at the start of recovery. However, as recovery
progresses, replay of CLOG/EXTEND records will update our notion of
the latest page number, and we should rely on that being accurate
rather than recomputing the value based on an updated notion of
nextXid. ShmemVariableCache->nextXid is only an approximation
during recovery anyway, whereas CLOG/EXTEND records are an
authoritative representation of how the SLRU has been updated.

Commit 0fcc2decd4 makes this
simplification possible, as before that change clog_redo() might
have injected a bogus value here, and we'd want to get rid of
that before entering normal running.

Patch by me, reviewed by Heikki Linnakangas.

Discussion: http://postgr.es/m/CA+TgmoZYig9+AQodhF5sRXuKkJ=RgFDugLr3XX_dz_F-p=TwTg@mail.gmail.com
2021-01-27 15:52:34 -05:00
Robert Haas 0fcc2decd4 In clog_redo(), don't set XactCtl->shared->latest_page_number.
The comment is no longer accurate, and hasn't been entirely accurate
since Hot Standby was introduced. The original idea here was that
StartupCLOG() wouldn't be called until the end of recovery and
therefore this value would be uninitialized when this code is reached,
but Hot Standby made that true only when hot_standby=off, and commit
1f113abdf8 means that this value is now
always initialized before replay even starts.

The original purpose of this code was to bypass the sanity check
in SimpleLruTruncate(), which will no longer occur: now, if something
is wrong, that sanity check might trip during recovery. That's
probably a good thing, because in the current code base
latest_page_number should always be initialized and therefore we
expect that the sanity check should pass. If it doesn't, something
has gone wrong, and complaining about it is appropriate.

Patch by me, reviewed by Heikki Linnakangas.

Discussion: http://postgr.es/m/CA+TgmoZYig9+AQodhF5sRXuKkJ=RgFDugLr3XX_dz_F-p=TwTg@mail.gmail.com
2021-01-27 13:11:30 -05:00
Robert Haas 1f113abdf8 Move StartupCLOG() calls to just after we initialize ShmemVariableCache.
Previously, the hot_standby=off code path did this at end of recovery,
while the hot_standby=on code path did it at the beginning of recovery.
It's better to do this in only one place because (a) it's simpler,
(b) StartupCLOG() is trivial so trying to postpone the work isn't
useful, and (c) this will make it possible to simplify some other
logic.

Patch by me, reviewed by Heikki Linnakangas.

Discussion: http://postgr.es/m/CA+TgmoZYig9+AQodhF5sRXuKkJ=RgFDugLr3XX_dz_F-p=TwTg@mail.gmail.com
2021-01-27 12:20:46 -05:00
Peter Geoghegan e42b3c3bd6 Fix GiST index deletion assert issue.
Avoid calling heap_index_delete_tuples() with an empty deltids array to
avoid an assertion failure.

This issue was arguably an oversight in commit b5f58cf2, though the
failing assert itself was added by my recent commit d168b666.  No
backpatch, though, since the oversight is harmless in the back branches.

Author: Peter Geoghegan <pg@bowt.ie>
Reported-By: Jaime Casanova <jcasanov@systemguards.com.ec>
Discussion: https://postgr.es/m/CAJKUy5jscES84n3puE=sYngyF+zpb4wv8UMtuLnLPv5z=6yyNw@mail.gmail.com
2021-01-26 23:24:37 -08:00
Michael Paquier 4c9c359d38 Refactor code in tablecmds.c to check and process tablespace moves
Two code paths of tablecmds.c (for relations with storage and without
storage) use the same logic to check if the move of a relation to a
new tablespace is allowed or not and to update pg_class.reltablespace
and pg_class.relfilenode.

A potential TABLESPACE clause for REINDEX, CLUSTER and VACUUM FULL needs
similar checks to make sure that nothing is moved around in illegal ways
(no mapped relations, shared relations only in pg_global, no move of
temp tables owned by other backends).

This reorganizes the existing code of ALTER TABLE so as all this logic
is controlled by two new routines that can be reused for the other
commands able to move relations across tablespaces, limiting the number
of code paths in need of the same protections.  This also removes some
code that was duplicated for tables with and without storage for ALTER
TABLE.

Author: Alexey Kondratov, Michael Paquier
Discussion: https://postgr.es/m/YA+9mAMWYLXJMVPL@paquier.xyz
2021-01-27 11:54:16 +09:00
Tom Lane d5a83d79c9 Rethink recently-added SPI interfaces.
SPI_execute_with_receiver and SPI_cursor_parse_open_with_paramlist are
new in v14 (cf. commit 2f48ede08).  Before they can get out the door,
let's change their APIs to follow the practice recently established by
SPI_prepare_extended etc: shove all optional arguments into a struct
that callers are supposed to pre-zero.  The hope is to allow future
addition of more options without either API breakage or a continuing
proliferation of new SPI entry points.  With that in mind, choose
slightly more generic names for them: SPI_execute_extended and
SPI_cursor_parse_open respectively.

Discussion: https://postgr.es/m/CAFj8pRCLPdDAETvR7Po7gC5y_ibkn_-bOzbeJb39WHms01194Q@mail.gmail.com
2021-01-26 16:37:12 -05:00
Tom Lane ee895a655c Improve performance of repeated CALLs within plpgsql procedures.
This patch essentially is cleaning up technical debt left behind
by the original implementation of plpgsql procedures, particularly
commit d92bc83c4.  That patch (or more precisely, follow-on patches
fixing its worst bugs) forced us to re-plan CALL and DO statements
each time through, if we're in a non-atomic context.  That wasn't
for any fundamental reason, but just because use of a saved plan
requires having a ResourceOwner to hold a reference count for the
plan, and we had no suitable resowner at hand, nor would the
available APIs support using one if we did.  While it's not that
expensive to create a "plan" for CALL/DO, the cycles do add up
in repeated executions.

This patch therefore makes the following API changes:

* GetCachedPlan/ReleaseCachedPlan are modified to let the caller
specify which resowner to use to pin the plan, rather than forcing
use of CurrentResourceOwner.

* spi.c gains a "SPI_execute_plan_extended" entry point that lets
callers say which resowner to use to pin the plan.  This borrows the
idea of an options struct from the recently added SPI_prepare_extended,
hopefully allowing future options to be added without more API breaks.
This supersedes SPI_execute_plan_with_paramlist (which I've marked
deprecated) as well as SPI_execute_plan_with_receiver (which is new
in v14, so I just took it out altogether).

* I also took the opportunity to remove the crude hack of letting
plpgsql reach into SPI private data structures to mark SPI plans as
"no_snapshot".  It's better to treat that as an option of
SPI_prepare_extended.

Now, when running a non-atomic procedure or DO block that contains
any CALL or DO commands, plpgsql creates a ResourceOwner that
will be used to pin the plans of the CALL/DO commands.  (In an
atomic context, we just use CurrentResourceOwner, as before.)
Having done this, we can just save CALL/DO plans normally,
whether or not they are used across transaction boundaries.
This seems to be good for something like 2X speedup of a CALL
of a trivial procedure with a few simple argument expressions.
By restricting the creation of an extra ResourceOwner like this,
there's essentially zero penalty in cases that can't benefit.

Pavel Stehule, with some further hacking by me

Discussion: https://postgr.es/m/CAFj8pRCLPdDAETvR7Po7gC5y_ibkn_-bOzbeJb39WHms01194Q@mail.gmail.com
2021-01-25 22:28:29 -05:00
Andres Freund 55ef8555f0 Fix two typos in snapbuild.c.
Reported-by: Heikki Linnakangas <hlinnaka@iki.fi>
Discussion: https://postgr.es/m/c94be044-818f-15e3-1ad3-7a7ae2dfed0a@iki.fi
2021-01-25 12:15:10 -08:00
Tom Lane 07d46fceb4 Fix broken ruleutils support for function TRANSFORM clauses.
I chanced to notice that this dumped core due to a faulty Assert.
To add insult to injury, the output has been misformatted since v11.
Obviously we need some regression testing here.

Discussion: https://postgr.es/m/d1cc628c-3953-4209-957b-29427acc38c8@www.fastmail.com
2021-01-25 13:03:43 -05:00
Robert Haas d18e75664a Remove CheckpointLock.
Up until now, we've held this lock when performing a checkpoint or
restartpoint, but commit 076a055acf back
in 2004 and commit 7e48b77b1c from 2009,
taken together, have removed all need for this. In the present code,
there's only ever one process entitled to attempt a checkpoint: either
the checkpointer, during normal operation, or the postmaster, during
single-user operation. So, we don't need the lock.

One possible concern in making this change is that it means that
a substantial amount of code where HOLD_INTERRUPTS() was previously
in effect due to the preceding LWLockAcquire() will now be
running without that. This could mean that ProcessInterrupts()
gets called in places from which it didn't before. However, this
seems unlikely to do very much, because the checkpointer doesn't
have any signal mapped to die(), so it's not clear how,
for example, ProcDiePending = true could happen in the first
place. Similarly with ClientConnectionLost and recovery conflicts.

Also, if there are any such problems, we might want to fix them
rather than reverting this, since running lots of code with
interrupt handling suspended is generally bad.

Patch by me, per an inquiry by Amul Sul. Review by Tom Lane
and Michael Paquier.

Discussion: http://postgr.es/m/CAAJ_b97XnBBfYeSREDJorFsyoD1sHgqnNuCi=02mNQBUMnA=FA@mail.gmail.com
2021-01-25 12:34:38 -05:00
David Rowley 16dfe253e3 Fix hypothetical bug in heap backward scans
Both heapgettup() and heapgettup_pagemode() incorrectly set the first page
to scan in a backward scan in which the number of pages to scan was
specified by heap_setscanlimits().  The code incorrectly started the scan
at the end of the relation when startBlk was 0, or otherwise at
startBlk - 1, neither of which is correct when only scanning a subset of
pages.

The fix here checks if heap_setscanlimits() has changed the number of
pages to scan and if so we set the first page to scan as the final page in
the specified range during backward scans.

Proper adjustment of this code was forgotten when heap_setscanlimits() was
added in 7516f5259 back in 9.5.  However, practice, nowhere in core code
performs backward scans after having used heap_setscanlimits(), yet, it is
possible an extension uses the heap functions in this way, hence
backpatch.

An upcoming patch does use heap_setscanlimits() with backward scans, so
this must be fixed before that can go in.

Author: David Rowley
Discussion: https://postgr.es/m/CAApHDvpGc9h0_oVD2CtgBcxCS1N-qDYZSeBRnUh+0CWJA9cMaA@mail.gmail.com
Backpatch-through: 9.5, all supported versions
2021-01-25 19:52:18 +13:00
Amit Kapila 40ab64c1ec Fix ALTER PUBLICATION...DROP TABLE behavior.
Commit 69bd60672 fixed the initialization of streamed transactions for
RelationSyncEntry. It forgot to initialize the publication actions while
invalidating the RelationSyncEntry due to which even though the relation
is dropped from a particular publication we still publish its changes. Fix
it by initializing pubactions when entry got invalidated.

Author: Japin Li and Bharath Rupireddy
Reviewed-by: Amit Kapila
Discussion: https://postgr.es/m/CALj2ACV+0UFpcZs5czYgBpujM9p0Hg1qdOZai_43OU7bqHU_xw@mail.gmail.com
2021-01-25 07:39:29 +05:30
Tomas Vondra 39b66a91bd Fix COPY FREEZE with CLOBBER_CACHE_ALWAYS
This adds code omitted from commit 7db0cd2145 by accident, which had
two consequences. Firstly, only rows inserted by heap_multi_insert were
frozen as expected when running COPY FREEZE, while heap_insert left
rows unfrozen. That however includes rows in TOAST tables, so a lot of
data might have been left unfrozen. Secondly, page might have been left
partially empty after relcache invalidation.

This addresses both of those issues.

Discussion: https://postgr.es/m/CABOikdN-ptGv0mZntrK2Q8OtfUuAjqaYMGmkdU1dCKFtUxVLrg@mail.gmail.com
2021-01-24 01:08:11 +01:00
Tom Lane 7cd9765f9b Re-allow DISTINCT in pl/pgsql expressions.
I'd omitted this from the grammar in commit c9d529848, figuring that
it wasn't worth supporting.  However we already have one complaint,
so it seems that judgment was wrong.  It doesn't require a huge
amount of code, so add it back.  (I'm still drawing the line at
UNION/INTERSECT/EXCEPT though: those'd require an unreasonable
amount of grammar refactoring, and the single-result-row restriction
makes them near useless anyway.)

Also rethink the documentation: this behavior is a property of
all pl/pgsql expressions, not just assignments.

Discussion: https://postgr.es/m/20210122134106.e94c5cd7@mail.verfriemelt.org
2021-01-22 16:26:22 -05:00
Peter Eisentraut 09418bed67 Remove bogus tracepoint
Calls to LWLockWaitForVar() fired the TRACE_POSTGRESQL_LWLOCK_ACQUIRE
tracepoint, but LWLockWaitForVar() never actually acquires the LWLock.
(Probably a copy/paste bug in 68a2e52bbaf.)  Remove it.

Author: Craig Ringer <craig.ringer@enterprisedb.com>
Discussion: https://www.postgresql.org/message-id/flat/CAGRY4nxJo+-HCC2i5H93ttSZ4gZO-FSddCwvkb-qAfQ1zdXd1w@mail.gmail.com
2021-01-22 11:58:21 +01:00
Michael Paquier af0e79c8f4 Move SSL information callback earlier to capture more information
The callback for retrieving state change information during connection
setup was only installed when the connection was mostly set up, and
thus didn't provide much information and missed all the details related
to the handshake.

This also extends the callback with SSL_state_string_long() to print
more information about the state change within the SSL object handled.

While there, fix some comments which were incorrectly referring to the
callback and its previous location in fe-secure.c.

Author: Daniel Gustafsson
Discussion: https://postgr.es/m/232CF476-94E1-42F1-9408-719E2AEC5491@yesql.se
2021-01-22 09:26:27 +09:00
Tom Lane 55dc86eca7 Fix pull_varnos' miscomputation of relids set for a PlaceHolderVar.
Previously, pull_varnos() took the relids of a PlaceHolderVar as being
equal to the relids in its contents, but that fails to account for the
possibility that we have to postpone evaluation of the PHV due to outer
joins.  This could result in a malformed plan.  The known cases end up
triggering the "failed to assign all NestLoopParams to plan nodes"
sanity check in createplan.c, but other symptoms may be possible.

The right value to use is the join level we actually intend to evaluate
the PHV at.  We can get that from the ph_eval_at field of the associated
PlaceHolderInfo.  However, there are some places that call pull_varnos()
before the PlaceHolderInfos have been created; in that case, fall back
to the conservative assumption that the PHV will be evaluated at its
syntactic level.  (In principle this might result in missing some legal
optimization, but I'm not aware of any cases where it's an issue in
practice.)  Things are also a bit ticklish for calls occurring during
deconstruct_jointree(), but AFAICS the ph_eval_at fields should have
reached their final values by the time we need them.

The main problem in making this work is that pull_varnos() has no
way to get at the PlaceHolderInfos.  We can fix that easily, if a
bit tediously, in HEAD by passing it the planner "root" pointer.
In the back branches that'd cause an unacceptable API/ABI break for
extensions, so leave the existing entry points alone and add new ones
with the additional parameter.  (If an old entry point is called and
encounters a PHV, it'll fall back to using the syntactic level,
again possibly missing some valid optimization.)

Back-patch to v12.  The computation is surely also wrong before that,
but it appears that we cannot reach a bad plan thanks to join order
restrictions imposed on the subquery that the PlaceHolderVar came from.
The error only became reachable when commit 4be058fe9 allowed trivial
subqueries to be collapsed out completely, eliminating their join order
restrictions.

Per report from Stephan Springl.

Discussion: https://postgr.es/m/171041.1610849523@sss.pgh.pa.us
2021-01-21 15:37:23 -05:00
Tomas Vondra 920f853dc9 Fix initialization of FDW batching in ExecInitModifyTable
ExecInitModifyTable has to initialize batching for all result relations,
not just the first one. Furthermore, when junk filters were necessary,
the pointer pointed past the mtstate->resultRelInfo array.

Per reports from multiple non-x86 animals (florican, locust, ...).

Discussion: https://postgr.es/m/20200628151002.7x5laxwpgvkyiu3q@development
2021-01-21 03:34:32 +01:00
Tomas Vondra b663a41363 Implement support for bulk inserts in postgres_fdw
Extends the FDW API to allow batching inserts into foreign tables. That
is usually much more efficient than inserting individual rows, due to
high latency for each round-trip to the foreign server.

It was possible to implement something similar in the regular FDW API,
but it was inconvenient and there were issues with reporting the number
of actually inserted rows etc. This extends the FDW API with two new
functions:

* GetForeignModifyBatchSize - allows the FDW picking optimal batch size

* ExecForeignBatchInsert - inserts a batch of rows at once

Currently, only INSERT queries support batching. Support for DELETE and
UPDATE may be added in the future.

This also implements batching for postgres_fdw. The batch size may be
specified using "batch_size" option both at the server and table level.

The initial patch version was written by me, but it was rewritten and
improved in many ways by Takayuki Tsunakawa.

Author: Takayuki Tsunakawa
Reviewed-by: Tomas Vondra, Amit Langote
Discussion: https://postgr.es/m/20200628151002.7x5laxwpgvkyiu3q@development
2021-01-20 23:57:27 +01:00
Heikki Linnakangas 6b4d3046f4 Fix bug in detecting concurrent page splits in GiST insert
In commit 9eb5607e69, I got the condition on checking for split or
deleted page wrong: I used && instead of ||. The comment correctly said
"concurrent split _or_ deletion".

As a result, GiST insertion could miss a concurrent split, and insert to
wrong page. Duncan Sands demonstrated this with a test script that did a
lot of concurrent inserts.

Backpatch to v12, where this was introduced. REINDEX is required to fix
indexes that were affected by this bug.

Backpatch-through: 12
Reported-by: Duncan Sands
Discussion: https://www.postgresql.org/message-id/a9690483-6c6c-3c82-c8ba-dc1a40848f11%40deepbluecap.com
2021-01-20 11:58:03 +02:00
Michael Paquier 21378e1fef Fix ALTER DEFAULT PRIVILEGES with duplicated objects
Specifying duplicated objects in this command would lead to unique
constraint violations in pg_default_acl or "tuple already updated by
self" errors.  Similarly to GRANT/REVOKE, increment the command ID after
each subcommand processing to allow this case to work transparently.

A regression test is added by tweaking one of the existing queries of
privileges.sql to stress this case.

Reported-by: Andrus
Author: Michael Paquier
Reviewed-by: Álvaro Herrera
Discussion: https://postgr.es/m/ae2a7dc1-9d71-8cba-3bb9-e4cb7eb1f44e@hot.ee
Backpatch-through: 9.5
2021-01-20 11:38:17 +09:00
Tom Lane a0efda88a6 Remove faulty support for MergeAppend plan with WHERE CURRENT OF.
Somebody extended search_plan_tree() to treat MergeAppend exactly
like Append, which is 100% wrong, because unlike Append we can't
assume that only one input node is actively returning tuples.
Hence a cursor using a MergeAppend across a UNION ALL or inheritance
tree could falsely match a WHERE CURRENT OF query at a row that
isn't actually the cursor's current output row, but coincidentally
has the same TID (in a different table) as the current output row.

Delete the faulty code; this means that such a case will now return
an error like 'cursor "foo" is not a simply updatable scan of table
"bar"', instead of silently misbehaving.  Users should not find that
surprising though, as the same cursor query could have failed that way
already depending on the chosen plan.  (It would fail like that if the
sort were done with an explicit Sort node instead of MergeAppend.)

Expand the clearly-inadequate commentary to be more explicit about
what this code is doing, in hopes of forestalling future mistakes.

It's been like this for awhile, so back-patch to all supported
branches.

Discussion: https://postgr.es/m/482865.1611075182@sss.pgh.pa.us
2021-01-19 13:25:33 -05:00
Amit Kapila ed43677e20 pgindent worker.c.
This is a leftover from commit 0926e96c49. Changing this separately
because this file is being modified for upcoming patch logical replication
of 2PC.

Author: Peter Smith
Discussion: https://postgr.es/m/CAHut+Ps+EgG8KzcmAyAgBUi_vuTps6o9ZA8DG6SdnO0-YuOhPQ@mail.gmail.com
2021-01-19 08:10:13 +05:30
Tom Lane 60661bbf2d Avoid crash with WHERE CURRENT OF and a custom scan plan.
execCurrent.c's search_plan_tree() assumed that ForeignScanStates
and CustomScanStates necessarily have a valid ss_currentRelation.
This is demonstrably untrue for postgres_fdw's remote join and
remote aggregation plans, and non-leaf custom scans might not have
an identifiable scan relation either.  Avoid crashing by ignoring
such nodes when the field is null.

This solution will lead to errors like 'cursor "foo" is not a
simply updatable scan of table "bar"' in cases where maybe we
could have allowed WHERE CURRENT OF to work.  That's not an issue
for postgres_fdw's usages, since joins or aggregations would render
WHERE CURRENT OF invalid anyway.  But an otherwise-transparent
upper level custom scan node might find this annoying.  When and if
someone cares to expend work on such a scenario, we could invent a
custom-scan-provider callback to determine what's safe.

Report and patch by David Geier, commentary by me.  It's been like
this for awhile, so back-patch to all supported branches.

Discussion: https://postgr.es/m/0253344d-9bdd-11c4-7f0d-d88c02cd7991@swarm64.com
2021-01-18 18:32:30 -05:00
Tom Lane 3fd80c728d Narrow the scope of a local variable.
This is better style and more symmetrical with the other if-branch.
This likely should have been included in 9de77b545 (which created
the opportunity), but it was overlooked.

Japin Li

Discussion: https://postgr.es/m/MEYP282MB16699FA4A7CD57EB250E871FB6A40@MEYP282MB1669.AUSP282.PROD.OUTLOOK.COM
2021-01-18 15:55:01 -05:00
Tom Lane a6cf3df4eb Add bytea equivalents of ltrim() and rtrim().
We had bytea btrim() already, but for some reason not the other two.

Joel Jacobson

Discussion: https://postgr.es/m/d10cd5cd-a901-42f1-b832-763ac6f7ff3a@www.fastmail.com
2021-01-18 15:11:32 -05:00
Robert Haas a3ed4d1efe Allow for error or refusal while absorbing a ProcSignalBarrier.
Previously, the per-barrier-type functions tasked with absorbing
them were expected to always succeed and never throw an error.
However, that's a bit inconvenient. Further study has revealed that
there are realistic cases where it might not be possible to absorb
a ProcSignalBarrier without terminating the transaction, or even
the whole backend. Similarly, for some barrier types, there might
be other reasons where it's not reasonably possible to absorb the
barrier at certain points in the code, so provide a way for a
per-barrier-type function to reject absorbing the barrier.

Unfortunately, there's still no committed code making use of this
infrastructure; hopefully, we'll get there. :-(

Patch by me, reviewed by Andres Freund and Amul Sul.

Discussion: http://postgr.es/m/20200908182005.xya7wetdh3pndzim@alap3.anarazel.de
Discussion: http://postgr.es/m/CA+Tgmob56Pk1-5aTJdVPCWFHon7me4M96ENpGe9n_R4JUjjhZA@mail.gmail.com
2021-01-18 12:09:52 -05:00
Peter Eisentraut 15251c0a60 Pause recovery for insufficient parameter settings
When certain parameters are changed on a physical replication primary,
this is communicated to standbys using the XLOG_PARAMETER_CHANGE WAL
record.  The standby then checks whether its own settings are at least
as big as the ones on the primary.  If not, the standby shuts down
with a fatal error.

This patch changes this behavior for hot standbys to pause recovery at
that point instead.  That allows read traffic on the standby to
continue while database administrators figure out next steps.  When
recovery is unpaused, the server shuts down (as before).  The idea is
to fix the parameters while recovery is paused and then restart when
there is a maintenance window.

Reviewed-by: Sergei Kornilov <sk@zsrv.org>
Discussion: https://www.postgresql.org/message-id/flat/4ad69a4c-cc9b-0dfe-0352-8b1b0cd36c7b@2ndquadrant.com
2021-01-18 09:04:04 +01:00
Michael Paquier a3dc926009 Refactor option handling of CLUSTER, REINDEX and VACUUM
This continues the work done in b5913f6.  All the options of those
commands are changed to use hex values rather than enums to reduce the
risk of compatibility bugs when introducing new options.  Each option
set is moved into a new structure that can be extended with more
non-boolean options (this was already the case of VACUUM).  The code of
REINDEX is restructured so as manual REINDEX commands go through a
single routine from utility.c, like VACUUM, to ease the allocation
handling of option parameters when a command needs to go through
multiple transactions.

This can be used as a base infrastructure for future patches related to
those commands, including reindex filtering and tablespace support.

Per discussion with people mentioned below, as well as Alvaro Herrera
and Peter Eisentraut.

Author: Michael Paquier, Justin Pryzby
Reviewed-by: Alexey Kondratov, Justin Pryzby
Discussion: https://postgr.es/m/X8riynBLwxAD9uKk@paquier.xyz
2021-01-18 14:03:10 +09:00
Tomas Vondra 7db0cd2145 Set PD_ALL_VISIBLE and visibility map bits in COPY FREEZE
Make sure COPY FREEZE marks the pages as PD_ALL_VISIBLE and updates the
visibility map. Until now we only marked individual tuples as frozen,
but page-level flags were not updated, so the first VACUUM after the
COPY FREEZE had to rewrite the whole table.

This is a fairly old patch, and multiple people worked on it. The first
version was written by Jeff Janes, and then reworked by Pavan Deolasee
and Anastasia Lubennikova.

Author: Anastasia Lubennikova, Pavan Deolasee, Jeff Janes
Reviewed-by: Kuntal Ghosh, Jeff Janes, Tomas Vondra, Masahiko Sawada,
             Andres Freund, Ibrar Ahmed, Robert Haas, Tatsuro Ishii,
             Darafei Praliaskouski
Discussion: https://postgr.es/m/CABOikdN-ptGv0mZntrK2Q8OtfUuAjqaYMGmkdU1dCKFtUxVLrg@mail.gmail.com
Discussion: https://postgr.es/m/CAMkU%3D1w3osJJ2FneELhhNRLxfZitDgp9FPHee08NT2FQFmz_pQ%40mail.gmail.com
2021-01-17 22:28:26 +01:00
Magnus Hagander 960869da08 Add pg_stat_database counters for sessions and session time
This add counters for number of sessions, the different kind of session
termination types, and timers for how much time is spent in active vs
idle in a database to pg_stat_database.

Internally this also renames the parameter "force" to disconnect. This
was the only use-case for the parameter before, so repurposing it to
this mroe narrow usecase makes things cleaner than inventing something
new.

Author: Laurenz Albe
Reviewed-By: Magnus Hagander, Soumyadeep Chakraborty, Masahiro Ikeda
Discussion: https://postgr.es/m/b07e1f9953701b90c66ed368656f2aef40cac4fb.camel@cybertec.at
2021-01-17 13:52:31 +01:00
Noah Misch 6db992833c Prevent excess SimpleLruTruncate() deletion.
Every core SLRU wraps around.  With the exception of pg_notify, the wrap
point can fall in the middle of a page.  Account for this in the
PagePrecedes callback specification and in SimpleLruTruncate()'s use of
said callback.  Update each callback implementation to fit the new
specification.  This changes SerialPagePrecedesLogically() from the
style of asyncQueuePagePrecedes() to the style of CLOGPagePrecedes().
(Whereas pg_clog and pg_serial share a key space, pg_serial is nothing
like pg_notify.)  The bug fixed here has the same symptoms and user
followup steps as 592a589a04.  Back-patch
to 9.5 (all supported versions).

Reviewed by Andrey Borodin and (in earlier versions) by Tom Lane.

Discussion: https://postgr.es/m/20190202083822.GC32531@gust.leadboat.com
2021-01-16 12:21:35 -08:00
Amit Kapila c95765f476 Remove unnecessary pstrdup in fetch_table_list.
The result of TextDatumGetCString is already palloc'ed so we don't need to
allocate memory for it again. We decide not to backpatch it as there
doesn't seem to be any case where it can create a meaningful leak.

Author: Zhijie Hou
Reviewed-by: Daniel Gustafsson
Discussion: https://postgr.es/m/229fed2eb8c54c71a96ccb99e516eb12@G08CNEXMBPEKD05.g08.fujitsu.local
2021-01-16 10:15:32 +05:30
Tomas Vondra c9a0dc3486 Disallow CREATE STATISTICS on system catalogs
Add a check that CREATE STATISTICS does not add extended statistics on
system catalogs, similarly to indexes etc.  It can be overriden using
the allow_system_table_mods GUC.

This bug exists since 7b504eb282, adding the extended statistics, so
backpatch all the way back to PostgreSQL 10.

Author: Tomas Vondra
Reported-by: Dean Rasheed
Backpatch-through: 10
Discussion: https://postgr.es/m/CAEZATCXAPrrOKwEsyZKQ4uzzJQWBCt6QAvOcgqRGdWwT1zb%2BrQ%40mail.gmail.com
2021-01-15 23:31:22 +01:00
Alvaro Herrera f9900df5f9
Avoid spurious wait in concurrent reindex
This is like commit c98763bf51, but for REINDEX CONCURRENTLY.  To wit:
this flags indicates that the current process is safe to ignore for the
purposes of waiting for other snapshots, when doing CREATE INDEX
CONCURRENTLY and REINDEX CONCURRENTLY.  This helps two processes doing
either of those things not deadlock, and also avoids spurious waits.

Author: Álvaro Herrera <alvherre@alvh.no-ip.org>
Reviewed-by: Dmitry Dolgov <9erthalion6@gmail.com>
Reviewed-by: Hamid Akhtar <hamid.akhtar@gmail.com>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Discussion: https://postgr.es/m/20201130195439.GA24598@alvherre.pgsql
2021-01-15 10:31:42 -03:00
Fujii Masao 2ad78a87f0 Fix calculation of how much shared memory is required to store a TOC.
Commit ac883ac453 refactored shm_toc_estimate() but changed its calculation
of shared memory size for TOC incorrectly. Previously this could cause too
large memory to be allocated.

Back-patch to v11 where the bug was introduced.

Author: Takayuki Tsunakawa
Discussion: https://postgr.es/m/TYAPR01MB2990BFB73170E2C4921E2C4DFEA80@TYAPR01MB2990.jpnprd01.prod.outlook.com
2021-01-15 12:44:17 +09:00
Michael Paquier 5ae1572993 Fix O(N^2) stat() calls when recycling WAL segments
The counter tracking the last segment number recycled was getting
initialized when recycling one single segment, while it should be used
across a full cycle of segments recycled to prevent useless checks
related to entries already recycled.

This performance issue has been introduced by b2a5545, and it was first
implemented in 61b86142.

No backpatch is done per the lack of field complaints.

Reported-by: Andres Freund, Thomas Munro
Author: Michael Paquier
Reviewed-By: Andres Freund
Discussion: https://postgr.es/m/20170621211016.eln6cxxp3jrv7m4m@alap3.anarazel.de
Discussion: https://postgr.es/m/CA+hUKG+DRiF9z1_MU4fWq+RfJMxP7zjoptfcmuCFPeO4JM2iVg@mail.gmail.com
2021-01-15 10:33:13 +09:00
Alvaro Herrera ebfe2dbd6b
Prevent drop of tablespaces used by partitioned relations
When a tablespace is used in a partitioned relation (per commits
ca4103025d in pg12 for tables and 33e6c34c32 in pg11 for indexes),
it is possible to drop the tablespace, potentially causing various
problems.  One such was reported in bug #16577, where a rewriting ALTER
TABLE causes a server crash.

Protect against this by using pg_shdepend to keep track of tablespaces
when used for relations that don't keep physical files; we now abort a
tablespace if we see that the tablespace is referenced from any
partitioned relations.

Backpatch this to 11, where this problem has been latent all along.  We
don't try to create pg_shdepend entries for existing partitioned
indexes/tables, but any ones that are modified going forward will be
protected.

Note slight behavior change: when trying to drop a tablespace that
contains both regular tables as well as partitioned ones, you'd
previously get ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE and now you'll
get ERRCODE_DEPENDENT_OBJECTS_STILL_EXIST.  Arguably, the latter is more
correct.

It is possible to add protecting pg_shdepend entries for existing
tables/indexes, by doing
  ALTER TABLE ONLY some_partitioned_table SET TABLESPACE pg_default;
  ALTER TABLE ONLY some_partitioned_table SET TABLESPACE original_tablespace;
for each partitioned table/index that is not in the database default
tablespace.  Because these partitioned objects do not have storage, no
file needs to be actually moved, so it shouldn't take more time than
what's required to acquire locks.

This query can be used to search for such relations:
SELECT ... FROM pg_class WHERE relkind IN ('p', 'I') AND reltablespace <> 0

Reported-by: Alexander Lakhin <exclusion@gmail.com>
Discussion: https://postgr.es/m/16577-881633a9f9894fd5@postgresql.org
Author: Álvaro Herrera <alvherre@alvh.no-ip.org>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
2021-01-14 15:32:14 -03:00
Fujii Masao fef5b47f6b Ensure that a standby is able to follow a primary on a newer timeline.
Commit 709d003fbd refactored WAL-reading code, but accidentally caused
WalSndSegmentOpen() to fail to follow a timeline switch while reading from
a historic timeline. This issue caused a standby to fail to follow a primary
on a newer timeline when WAL archiving is enabled.

If there is a timeline switch within the segment, WalSndSegmentOpen() should
read from the WAL segment belonging to the new timeline. But previously
since it failed to follow a timeline switch, it tried to read the WAL segment
with old timeline. When WAL archiving is enabled, that WAL segment with
old timeline doesn't exist because it's renamed to .partial. This leads
a primary to have tried to read non-existent WAL segment, and which caused
replication to faill with the error "ERROR:  requested WAL segment ... has
 already been removed".

This commit fixes WalSndSegmentOpen() so that it's able to follow a timeline
switch, to ensure that a standby is able to follow a primary on a newer
timeline even when WAL archiving is enabled.

This commit also adds the regression test to check whether a standby is
able to follow a primary on a newer timeline when WAL archiving is enabled.

Back-patch to v13 where the bug was introduced.

Reported-by: Kyotaro Horiguchi
Author: Kyotaro Horiguchi, tweaked by Fujii Masao
Reviewed-by:  Alvaro Herrera, Fujii Masao
Discussion: https://postgr.es/m/20201209.174314.282492377848029776.horikyota.ntt@gmail.com
2021-01-14 12:27:11 +09:00
Michael Paquier aef8948f38 Rework refactoring of hex and encoding routines
This commit addresses some issues with c3826f83 that moved the hex
decoding routine to src/common/:
- The decoding function lacked overflow checks, so when used for
security-related features it was an open door to out-of-bound writes if
not carefully used that could remain undetected.  Like the base64
routines already in src/common/ used by SCRAM, this routine is reworked
to check for overflows by having the size of the destination buffer
passed as argument, with overflows checked before doing any writes.
- The encoding routine was missing.  This is moved to src/common/ and
it gains the same overflow checks as the decoding part.

On failure, the hex routines of src/common/ issue an error as per the
discussion done to make them usable by frontend tools, but not by shared
libraries.  Note that this is why ECPG is left out of this commit, and
it still includes a duplicated logic doing hex encoding and decoding.

While on it, this commit uses better variable names for the source and
destination buffers in the existing escape and base64 routines in
encode.c and it makes them more robust to overflow detection.  The
previous core code issued a FATAL after doing out-of-bound writes if
going through the SQL functions, which would be enough to detect
problems when working on changes that impacted this area of the
code.  Instead, an error is issued before doing an out-of-bound write.
The hex routines were being directly called for bytea conversions and
backup manifests without such sanity checks.  The current calls happen
to not have any problems, but careless uses of such APIs could easily
lead to CVE-class bugs.

Author: Bruce Momjian, Michael Paquier
Reviewed-by: Sehrope Sarkuni
Discussion: https://postgr.es/m/20201231003557.GB22199@momjian.us
2021-01-14 11:13:24 +09:00
Thomas Munro 0d56acfbaa Move our p{read,write}v replacements into their own files.
macOS's ranlib issued a warning about an empty pread.o file with the
previous arrangement, on systems new enough to require no replacement
functions.  Let's go back to using configure's AC_REPLACE_FUNCS system
to build and include each .o in the library only if it's needed, which
requires moving the *v() functions to their own files.

Also move the _with_retry() wrapper to a more permanent home.

Reported-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/1283127.1610554395%40sss.pgh.pa.us
2021-01-14 11:16:59 +13:00
Peter Geoghegan d168b66682 Enhance nbtree index tuple deletion.
Teach nbtree and heapam to cooperate in order to eagerly remove
duplicate tuples representing dead MVCC versions.  This is "bottom-up
deletion".  Each bottom-up deletion pass is triggered lazily in response
to a flood of versions on an nbtree leaf page.  This usually involves a
"logically unchanged index" hint (these are produced by the executor
mechanism added by commit 9dc718bd).

The immediate goal of bottom-up index deletion is to avoid "unnecessary"
page splits caused entirely by version duplicates.  It naturally has an
even more useful effect, though: it acts as a backstop against
accumulating an excessive number of index tuple versions for any given
_logical row_.  Bottom-up index deletion complements what we might now
call "top-down index deletion": index vacuuming performed by VACUUM.
Bottom-up index deletion responds to the immediate local needs of
queries, while leaving it up to autovacuum to perform infrequent clean
sweeps of the index.  The overall effect is to avoid certain
pathological performance issues related to "version churn" from UPDATEs.

The previous tableam interface used by index AMs to perform tuple
deletion (the table_compute_xid_horizon_for_tuples() function) has been
replaced with a new interface that supports certain new requirements.
Many (perhaps all) of the capabilities added to nbtree by this commit
could also be extended to other index AMs.  That is left as work for a
later commit.

Extend deletion of LP_DEAD-marked index tuples in nbtree by adding logic
to consider extra index tuples (that are not LP_DEAD-marked) for
deletion in passing.  This increases the number of index tuples deleted
significantly in many cases.  The LP_DEAD deletion process (which is now
called "simple deletion" to clearly distinguish it from bottom-up
deletion) won't usually need to visit any extra table blocks to check
these extra tuples.  We have to visit the same table blocks anyway to
generate a latestRemovedXid value (at least in the common case where the
index deletion operation's WAL record needs such a value).

Testing has shown that the "extra tuples" simple deletion enhancement
increases the number of index tuples deleted with almost any workload
that has LP_DEAD bits set in leaf pages.  That is, it almost never fails
to delete at least a few extra index tuples.  It helps most of all in
cases that happen to naturally have a lot of delete-safe tuples.  It's
not uncommon for an individual deletion operation to end up deleting an
order of magnitude more index tuples compared to the old naive approach
(e.g., custom instrumentation of the patch shows that this happens
fairly often when the regression tests are run).

Add a further enhancement that augments simple deletion and bottom-up
deletion in indexes that make use of deduplication: Teach nbtree's
_bt_delitems_delete() function to support granular TID deletion in
posting list tuples.  It is now possible to delete individual TIDs from
posting list tuples provided the TIDs have a tableam block number of a
table block that gets visited as part of the deletion process (visiting
the table block can be triggered directly or indirectly).  Setting the
LP_DEAD bit of a posting list tuple is still an all-or-nothing thing,
but that matters much less now that deletion only needs to start out
with the right _general_ idea about which index tuples are deletable.

Bump XLOG_PAGE_MAGIC because xl_btree_delete changed.

No bump in BTREE_VERSION, since there are no changes to the on-disk
representation of nbtree indexes.  Indexes built on PostgreSQL 12 or
PostgreSQL 13 will automatically benefit from bottom-up index deletion
(i.e. no reindexing required) following a pg_upgrade.  The enhancement
to simple deletion is available with all B-Tree indexes following a
pg_upgrade, no matter what PostgreSQL version the user upgrades from.

Author: Peter Geoghegan <pg@bowt.ie>
Reviewed-By: Heikki Linnakangas <hlinnaka@iki.fi>
Reviewed-By: Victor Yegorov <vyegorov@gmail.com>
Discussion: https://postgr.es/m/CAH2-Wzm+maE3apHB8NOtmM=p-DO65j2V5GzAWCOEEuy3JZgb2g@mail.gmail.com
2021-01-13 09:21:32 -08:00
Peter Geoghegan 9dc718bdf2 Pass down "logically unchanged index" hint.
Add an executor aminsert() hint mechanism that informs index AMs that
the incoming index tuple (the tuple that accompanies the hint) is not
being inserted by execution of an SQL statement that logically modifies
any of the index's key columns.

The hint is received by indexes when an UPDATE takes place that does not
apply an optimization like heapam's HOT (though only for indexes where
all key columns are logically unchanged).  Any index tuple that receives
the hint on insert is expected to be a duplicate of at least one
existing older version that is needed for the same logical row.  Related
versions will typically be stored on the same index page, at least
within index AMs that apply the hint.

Recognizing the difference between MVCC version churn duplicates and
true logical row duplicates at the index AM level can help with cleanup
of garbage index tuples.  Cleanup can intelligently target tuples that
are likely to be garbage, without wasting too many cycles on less
promising tuples/pages (index pages with little or no version churn).

This is infrastructure for an upcoming commit that will teach nbtree to
perform bottom-up index deletion.  No index AM actually applies the hint
just yet.

Author: Peter Geoghegan <pg@bowt.ie>
Reviewed-By: Victor Yegorov <vyegorov@gmail.com>
Discussion: https://postgr.es/m/CAH2-Wz=CEKFa74EScx_hFVshCOn6AA5T-ajFASTdzipdkLTNQQ@mail.gmail.com
2021-01-13 08:11:00 -08:00
Fujii Masao 39b03690b5 Log long wait time on recovery conflict when it's resolved.
This is a follow-up of the work done in commit 0650ff2303. This commit
extends log_recovery_conflict_waits so that a log message is produced
also when recovery conflict has already been resolved after deadlock_timeout
passes, i.e., when the startup process finishes waiting for recovery
conflict after deadlock_timeout. This is useful in investigating how long
recovery conflicts prevented the recovery from applying WAL.

Author: Fujii Masao
Reviewed-by: Kyotaro Horiguchi, Bertrand Drouvot
Discussion: https://postgr.es/m/9a60178c-a853-1440-2cdc-c3af916cff59@amazon.com
2021-01-13 22:59:17 +09:00
Amit Kapila ee1b38f659 Fix memory leak in SnapBuildSerialize.
The memory for the snapshot was leaked while serializing it to disk during
logical decoding. This memory will be freed only once walsender stops
streaming the changes. This can lead to a huge memory increase when master
logs Standby Snapshot too frequently say when the user is trying to create
many replication slots.

Reported-by: funnyxj.fxj@alibaba-inc.com
Diagnosed-by: funnyxj.fxj@alibaba-inc.com
Author: Amit Kapila
Backpatch-through: 9.5
Discussion: https://postgr.es/m/033ab54c-6393-42ee-8ec9-2b399b5d8cde.funnyxj.fxj@alibaba-inc.com
2021-01-13 08:19:50 +05:30
Amit Kapila bea449c635 Optimize DropRelFileNodesAllBuffers() for recovery.
Similar to commit d6ad34f341, this patch optimizes
DropRelFileNodesAllBuffers() by avoiding the complete buffer pool scan and
instead find the buffers to be invalidated by doing lookups in the
BufMapping table.

This optimization helps operations where the relation files need to be
removed like Truncate, Drop, Abort of Create Table, etc.

Author: Kirk Jamison
Reviewed-by: Kyotaro Horiguchi, Takayuki Tsunakawa, and Amit Kapila
Tested-By: Haiying Tang
Discussion: https://postgr.es/m/OSBPR01MB3207DCA7EC725FDD661B3EDAEF660@OSBPR01MB3207.jpnprd01.prod.outlook.com
2021-01-13 07:46:11 +05:30
Michael Paquier fce7d0e6ef Fix routine name in comment of catcache.c
Author: Bharath Rupireddy
Discussion: https://postgr.es/m/CALj2ACUDXLAkf_XxQO9tAUtnTNGi3Lmd8fANd+vBJbcHn1HoWA@mail.gmail.com
2021-01-13 10:32:21 +09:00
Alvaro Herrera c6c4b37395
Invent struct ReindexIndexInfo
This struct is used by ReindexRelationConcurrently to keep track of the
relations to process.  This saves having to obtain some data repeatedly,
and has future uses as well.

Reviewed-by: Dmitry Dolgov <9erthalion6@gmail.com>
Reviewed-by: Hamid Akhtar <hamid.akhtar@gmail.com>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Discussion: https://postgr.es/m/20201130195439.GA24598@alvherre.pgsql
2021-01-12 17:05:06 -03:00
Alvaro Herrera a3e51a36b7
Fix thinko in comment
This comment has been wrong since its introduction in commit
2c03216d83.

Author: Masahiko Sawada <sawada.mshk@gmail.com>
Discussion: https://postgr.es/m/CAD21AoAzz6qipFJBbGEaHmyWxvvNDp8httbwLR9tUQWaTjUs2Q@mail.gmail.com
2021-01-12 11:48:45 -03:00
Amit Kapila 044aa9e70e Fix relation descriptor leak.
We missed closing the relation descriptor while sending changes via the
root of partitioned relations during logical replication.

Author: Amit Langote and Mark Zhao
Reviewed-by: Amit Kapila and Ashutosh Bapat
Backpatch-through: 13, where it was introduced
Discussion: https://postgr.es/m/tencent_41FEA657C206F19AB4F406BE9252A0F69C06@qq.com
Discussion: https://postgr.es/m/tencent_6E296D2F7D70AFC90D83353B69187C3AA507@qq.com
2021-01-12 08:19:39 +05:30
Amit Kapila d6ad34f341 Optimize DropRelFileNodeBuffers() for recovery.
The recovery path of DropRelFileNodeBuffers() is optimized so that
scanning of the whole buffer pool can be avoided when the number of
blocks to be truncated in a relation is below a certain threshold. For
such cases, we find the buffers by doing lookups in BufMapping table.
This improves the performance by more than 100 times in many cases
when several small tables (tested with 1000 relations) are truncated
and where the server is configured with a large value of shared
buffers (greater than equal to 100GB).

This optimization helps cases (a) when vacuum or autovacuum truncated off
any of the empty pages at the end of a relation, or (b) when the relation is
truncated in the same transaction in which it was created.

This commit introduces a new API smgrnblocks_cached which returns a cached
value for the number of blocks in a relation fork. This helps us to determine
the exact size of relation which is required to apply this optimization. The
exact size is required to ensure that we don't leave any buffer for the
relation being dropped as otherwise the background writer or checkpointer
can lead to a PANIC error while flushing buffers corresponding to files that
don't exist.

Author: Kirk Jamison based on ideas by Amit Kapila
Reviewed-by: Kyotaro Horiguchi, Takayuki Tsunakawa, and Amit Kapila
Tested-By: Haiying Tang
Discussion: https://postgr.es/m/OSBPR01MB3207DCA7EC725FDD661B3EDAEF660@OSBPR01MB3207.jpnprd01.prod.outlook.com
2021-01-12 07:45:40 +05:30
Tom Lane 4edf96846a Rethink SQLSTATE code for ERRCODE_IDLE_SESSION_TIMEOUT.
Move it to class 57 (Operator Intervention), which seems like a
better choice given that from the client's standpoint it behaves
a heck of a lot like, e.g., ERRCODE_ADMIN_SHUTDOWN.

In a green field I'd put ERRCODE_IDLE_IN_TRANSACTION_SESSION_TIMEOUT
here as well.  But that's been around for a few years, so it's
probably too late to change its SQLSTATE code.

Discussion: https://postgr.es/m/763A0689-F189-459E-946F-F0EC4458980B@hotmail.com
2021-01-11 14:53:42 -05:00
Thomas Munro ce6a71fa53 Use vectored I/O to fill new WAL segments.
Instead of making many block-sized write() calls to fill a new WAL file
with zeroes, make a smaller number of pwritev() calls (or various
emulations).  The actual number depends on the OS's IOV_MAX, which
PG_IOV_MAX currently caps at 32.  That means we'll write 256kB per call
on typical systems.  We may want to tune the number later with more
experience.

Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/CA%2BhUKGJA%2Bu-220VONeoREBXJ9P3S94Y7J%2BkqCnTYmahvZJwM%3Dg%40mail.gmail.com
2021-01-11 15:28:31 +13:00
Tom Lane afcc8772ed Fix ancient bug in parsing of BRE-mode regular expressions.
brenext(), when parsing a '*' quantifier, forgot to return any "value"
for the token; per the equivalent case in next(), it should return
value 1 to indicate that greedy rather than non-greedy behavior is
wanted.  The result is that the compiled regexp could behave like 'x*?'
rather than the intended 'x*', if we were unlucky enough to have
a zero in v->nextvalue at this point.  That seems to happen with some
reliability if we have '.*' at the beginning of a BRE-mode regexp,
although that depends on the initial contents of a stack-allocated
struct, so it's not guaranteed to fail.

Found by Alexander Lakhin using valgrind testing.  This bug seems
to be aboriginal in Spencer's code, so back-patch all the way.

Discussion: https://postgr.es/m/16814-6c5e3edd2bdf0d50@postgresql.org
2021-01-08 12:16:00 -05:00
Tom Lane b8d0cda533 Further second thoughts about idle_session_timeout patch.
On reflection, the order of operations in PostgresMain() is wrong.
These timeouts ought to be shut down before, not after, we do the
post-command-read CHECK_FOR_INTERRUPTS, to guarantee that any
timeout error will be detected there rather than at some ill-defined
later point (possibly after having wasted a lot of work).

This is really an error in the original idle_in_transaction_timeout
patch, so back-patch to 9.6 where that was introduced.
2021-01-07 11:45:23 -05:00
Fujii Masao 0650ff2303 Add GUC to log long wait times on recovery conflicts.
This commit adds GUC log_recovery_conflict_waits that controls whether
a log message is produced when the startup process is waiting longer than
deadlock_timeout for recovery conflicts. This is useful in determining
if recovery conflicts prevent the recovery from applying WAL.

Note that currently a log message is produced only when recovery conflict
has not been resolved yet even after deadlock_timeout passes, i.e.,
only when the startup process is still waiting for recovery conflict
even after deadlock_timeout.

Author: Bertrand Drouvot, Masahiko Sawada
Reviewed-by: Alvaro Herrera, Kyotaro Horiguchi, Fujii Masao
Discussion: https://postgr.es/m/9a60178c-a853-1440-2cdc-c3af916cff59@amazon.com
2021-01-08 00:47:03 +09:00
Tom Lane 9486e7b666 Improve commentary in timeout.c.
On re-reading I realized that I'd missed one race condition in the new
timeout code.  It's safe, but add a comment explaining it.

Discussion: https://postgr.es/m/CA+hUKG+o6pbuHBJSGnud=TadsuXySWA7CCcPgCt2QE9F6_4iHQ@mail.gmail.com
2021-01-06 22:09:16 -05:00
Tom Lane 9877374bef Add idle_session_timeout.
This GUC variable works much like idle_in_transaction_session_timeout,
in that it kills sessions that have waited too long for a new client
query.  But it applies when we're not in a transaction, rather than
when we are.

Li Japin, reviewed by David Johnston and Hayato Kuroda, some
fixes by me

Discussion: https://postgr.es/m/763A0689-F189-459E-946F-F0EC4458980B@hotmail.com
2021-01-06 18:28:52 -05:00
Tom Lane 09cf1d5226 Improve timeout.c's handling of repeated timeout set/cancel.
A very common usage pattern is that we set a timeout that we don't
expect to reach, cancel it after a little bit, and later repeat.
With the original implementation of timeout.c, this results in one
setitimer() call per timeout set or cancel.  We can do a lot better
by being lazy about changing the timeout interrupt request, namely:
(1) never cancel the outstanding interrupt, even when we have no
active timeout events;
(2) if we need to set an interrupt, but there already is one pending
at or before the required time, leave it alone.  When the interrupt
happens, the signal handler will reschedule it at whatever time is
then needed.

For example, with a one-second setting for statement_timeout, this
method results in having to interact with the kernel only a little
more than once a second, no matter how many statements we execute
in between.  The mainline code might never call setitimer() at all
after the first time, while each time the signal handler fires,
it sees that the then-pending request is most of a second away,
and that's when it sets the next interrupt request for.  Each
mainline timeout-set request after that will observe that the time
it wants is past the pending interrupt request time, and do nothing.

This also works pretty well for cases where a few different timeout
lengths are in use, as long as none of them are very short.  But
that describes our usage well.

Idea and original patch by Thomas Munro; I fixed a race condition
and improved the comments.

Discussion: https://postgr.es/m/CA+hUKG+o6pbuHBJSGnud=TadsuXySWA7CCcPgCt2QE9F6_4iHQ@mail.gmail.com
2021-01-06 18:28:52 -05:00
Tomas Vondra 8a4f618e7a Report progress of COPY commands
This commit introduces a view pg_stat_progress_copy, reporting progress
of COPY commands.  This allows rough estimates how far a running COPY
progressed, with the caveat that the total number of bytes may not be
available in some cases (e.g. when the input comes from the client).

Author: Josef Šimánek
Reviewed-by: Fujii Masao, Bharath Rupireddy, Vignesh C, Matthias van de Meent
Discussion: https://postgr.es/m/CAFp7QwqMGEi4OyyaLEK9DR0+E+oK3UtA4bEjDVCa4bNkwUY2PQ@mail.gmail.com
Discussion: https://postgr.es/m/CAFp7Qwr6_FmRM6pCO0x_a0mymOfX_Gg+FEKet4XaTGSW=LitKQ@mail.gmail.com
2021-01-06 21:51:06 +01:00
Peter Eisentraut 4656e3d668 Replace CLOBBER_CACHE_ALWAYS with run-time GUC
Forced cache invalidation (CLOBBER_CACHE_ALWAYS) has been impractical
to use for testing in PostgreSQL because it's so slow and because it's
toggled on/off only at build time.  It is helpful when hunting bugs in
any code that uses the sycache/relcache because causes cache
invalidations to be injected whenever it would be possible for an
invalidation to occur, whether or not one was really pending.

Address this by providing run-time control over cache clobber
behaviour using the new debug_invalidate_system_caches_always GUC.
Support is not compiled in at all unless assertions are enabled or
CLOBBER_CACHE_ENABLED is explicitly defined at compile time.  It
defaults to 0 if compiled in, so it has negligible effect on assert
build performance by default.

When support is compiled in, test code can now set
debug_invalidate_system_caches_always=1 locally to a backend to test
specific queries, functions, extensions, etc.  Or tests can toggle it
globally for a specific test case while retaining normal performance
during test setup and teardown.

For backwards compatibility with existing test harnesses and scripts,
debug_invalidate_system_caches_always defaults to 1 if
CLOBBER_CACHE_ALWAYS is defined, and to 3 if CLOBBER_CACHE_RECURSIVE
is defined.

CLOBBER_CACHE_ENABLED is now visible in pg_config_manual.h, as is the
related RECOVER_RELATION_BUILD_MEMORY setting for the relcache.

Author: Craig Ringer <craig.ringer@2ndquadrant.com>
Discussion: https://www.postgresql.org/message-id/flat/CAMsr+YF=+ctXBZj3ywmvKNUjWpxmuTuUKuv-rgbHGX5i5pLstQ@mail.gmail.com
2021-01-06 10:46:44 +01:00
Fujii Masao 8900b5a9d5 Detect the deadlocks between backends and the startup process.
The deadlocks that the recovery conflict on lock is involved in can
happen between hot-standby backends and the startup process.
If a backend takes an access exclusive lock on the table and which
finally triggers the deadlock, that deadlock can be detected
as expected. On the other hand, previously, if the startup process
took an access exclusive lock and which finally triggered the deadlock,
that deadlock could not be detected and could remain even after
deadlock_timeout passed. This is a bug.

The cause of this bug was that the code for handling the recovery
conflict on lock didn't take care of deadlock case at all. It assumed
that deadlocks involving the startup process and backends were able
to be detected by the deadlock detector invoked within backends.
But this assumption was incorrect. The startup process also should
have invoked the deadlock detector if necessary.

To fix this bug, this commit makes the startup process invoke
the deadlock detector if deadlock_timeout is reached while handling
the recovery conflict on lock. Specifically, in that case, the startup
process requests all the backends holding the conflicting locks to
check themselves for deadlocks.

Back-patch to v9.6. v9.5 has also this bug, but per discussion we decided
not to back-patch the fix to v9.5. Because v9.5 doesn't have some
infrastructure codes (e.g., 37c54863cf) that this bug fix patch depends on.
We can apply those codes for the back-patch, but since the next minor
version release is the final one for v9.5, it's risky to do that. If we
unexpectedly introduce new bug to v9.5 by the back-patch, there is no
chance to fix that. We determined that the back-patch to v9.5 would give
more risk than gain.

Author: Fujii Masao
Reviewed-by: Bertrand Drouvot, Masahiko Sawada, Kyotaro Horiguchi
Discussion: https://postgr.es/m/4041d6b6-cf24-a120-36fa-1294220f8243@oss.nttdata.com
2021-01-06 12:39:18 +09:00
Amit Kapila e02e840ff7 Fix typos in decode.c and logical.c.
Per report by Ajin Cherian in email:
https://postgr.es/m/CAFPTHDYnRKDvzgDxoMn_CKqXA-D0MtrbyJvfvjBsO4G=UHDXkg@mail.gmail.com
2021-01-06 08:56:19 +05:30
Tom Lane bf8a662c9a Introduce a new GUC_REPORT setting "in_hot_standby".
Aside from being queriable via SHOW, this value is sent to the client
immediately at session startup, and again later on if the server gets
promoted to primary during the session.  The immediate report will be
used in an upcoming patch to avoid an extra round trip when trying to
connect to a primary server.

Haribabu Kommi, Greg Nancarrow, Tom Lane; reviewed at various times
by Laurenz Albe, Takayuki Tsunakawa, Peter Smith.

Discussion: https://postgr.es/m/CAF3+xM+8-ztOkaV9gHiJ3wfgENTq97QcjXQt+rbFQ6F7oNzt9A@mail.gmail.com
2021-01-05 16:18:05 -05:00
Dean Rasheed fead67c24a Add an explicit cast to double when using fabs().
Commit bc43b7c2c0 used fabs() directly on an int variable, which
apparently requires an explicit cast on some platforms.

Per buildfarm.
2021-01-05 11:52:42 +00:00
Dean Rasheed bc43b7c2c0 Fix numeric_power() when the exponent is INT_MIN.
In power_var_int(), the computation of the number of significant
digits to use in the computation used log(Abs(exp)), which isn't safe
because Abs(exp) returns INT_MIN when exp is INT_MIN. Use fabs()
instead of Abs(), so that the exponent is cast to a double before the
absolute value is taken.

Back-patch to 9.6, where this was introduced (by 7d9a4737c2).

Discussion: https://postgr.es/m/CAEZATCVd6pMkz=BrZEgBKyqqJrt2xghr=fNc8+Z=5xC6cgWrWA@mail.gmail.com
2021-01-05 11:15:28 +00:00
Peter Geoghegan 83e3239ee7 Standardize one aspect of rmgr desc output.
Bring heap and hash rmgr desc output in line with nbtree and GiST desc
output by using the name latestRemovedXid for all fields that output the
contents of the latestRemovedXid field from the WAL record's C struct
(stop using local variants).

This seems like a clear improvement because latestRemovedXid is a symbol
name that already appears across many different source files, and so is
probably much more recognizable.

Discussion: https://postgr.es/m/CAH2-Wzkt_Rs4VqPSCk87nyjPAAEmWL8STU9zgET_83EF5YfrLw@mail.gmail.com
2021-01-04 19:46:11 -08:00
Amit Kapila cd357c7629 Fix typo in origin.c.
Author: Peter Smith
Discussion: https://postgr.es/m/CAHut+PsReyuvww_Fn1NN_Vsv0wBP1bnzuhzRFr_2=y1nNZrG7w@mail.gmail.com
2021-01-05 08:05:08 +05:30
Amit Kapila 9da2224ea2 Fix typo in reorderbuffer.c.
Author: Zhijie Hou
Reviewed-by: Sawada Masahiko
Discussion: https://postgr.es/m/ba88bb58aaf14284abca16aec04bf279@G08CNEXMBPEKD05.g08.fujitsu.local
2021-01-05 07:56:40 +05:30
Thomas Munro 034510c820 Replace remaining uses of "whitelist".
Instead describe the action that the list effects, or just use "list"
where the meaning is obvious from context.

Author: Dagfinn Ilmari Mannsåker <ilmari@ilmari.org>
Discussion: https://postgr.es/m/20200615182235.x7lch5n6kcjq4aue%40alap3.anarazel.de
2021-01-05 14:00:16 +13:00
Thomas Munro c0d4f6d897 Rename "enum blacklist" to "uncommitted enums".
We agreed to remove this terminology and use something more descriptive.

Discussion: https://postgr.es/m/20200615182235.x7lch5n6kcjq4aue%40alap3.anarazel.de
2021-01-05 12:38:48 +13:00
Tom Lane 4bd3fad80e Fix integer-overflow corner cases in substring() functions.
If the substring start index and length overflow when added together,
substring() misbehaved, either throwing a bogus "negative substring
length" error on a case that should succeed, or failing to complain that
a negative length is negative (and instead returning the whole string,
in most cases).  Unsurprisingly, the text, bytea, and bit variants of
the function all had this issue.  Rearrange the logic to ensure that
negative lengths are always rejected, and add an overflow check to
handle the other case.

Also install similar guards into detoast_attr_slice() (nee
heap_tuple_untoast_attr_slice()), since it's far from clear that
no other code paths leading to that function could pass it values
that would overflow.

Patch by myself and Pavel Stehule, per bug #16804 from Rafi Shamim.

Back-patch to v11.  While these bugs are old, the common/int.h
infrastructure for overflow-detecting arithmetic didn't exist before
commit 4d6ad3125, and it doesn't seem like these misbehaviors are bad
enough to justify developing a standalone fix for the older branches.

Discussion: https://postgr.es/m/16804-f4eeeb6c11ba71d4@postgresql.org
2021-01-04 18:32:44 -05:00
Tom Lane 1c1cbe279b Rethink the "read/write parameter" mechanism in pl/pgsql.
Performance issues with the preceding patch to re-implement array
element assignment within pl/pgsql led me to realize that the read/write
parameter mechanism is misdesigned.  Instead of requiring the assignment
source expression to be such that *all* its references to the target
variable could be passed as R/W, we really want to identify *one*
reference to the target variable to be passed as R/W, allowing any other
ones to be passed read/only as they would be by default.  As long as the
R/W reference is a direct argument to the top-level (hence last to be
executed) function in the expression, there is no harm in R/O references
being passed to other lower parts of the expression.  Nor is there any
use-case for more than one argument of the top-level function being R/W.

Hence, rewrite that logic to identify one single Param that references
the target variable, and make only that Param pass a read/write
reference, not any other Params referencing the target variable.

Discussion: https://postgr.es/m/4165684.1607707277@sss.pgh.pa.us
2021-01-04 12:39:27 -05:00
Tom Lane c9d5298485 Re-implement pl/pgsql's expression and assignment parsing.
Invent new RawParseModes that allow the core grammar to handle
pl/pgsql expressions and assignments directly, and thereby get rid
of a lot of hackery in pl/pgsql's parser.  This moves a good deal
of knowledge about pl/pgsql into the core code: notably, we have to
invent a CoercionContext that matches pl/pgsql's (rather dubious)
historical behavior for assignment coercions.  That's getting away
from the original idea of pl/pgsql as an arm's-length extension of
the core, but really we crossed that bridge a long time ago.

The main advantage of doing this is that we can now use the core
parser to generate FieldStore and/or SubscriptingRef nodes to handle
assignments to pl/pgsql variables that are records or arrays.  That
fixes a number of cases that had never been implemented in pl/pgsql
assignment, such as nested records and array slicing, and it allows
pl/pgsql assignment to support the datatype-specific subscripting
behaviors introduced in commit c7aba7c14.

There are cosmetic benefits too: when a syntax error occurs in a
pl/pgsql expression, the error report no longer includes the confusing
"SELECT" keyword that used to get prefixed to the expression text.
Also, there seem to be some small speed gains.

Discussion: https://postgr.es/m/4165684.1607707277@sss.pgh.pa.us
2021-01-04 11:52:00 -05:00
Tom Lane 844fe9f159 Add the ability for the core grammar to have more than one parse target.
This patch essentially allows gram.y to implement a family of related
syntax trees, rather than necessarily always parsing a list of SQL
statements.  raw_parser() gains a new argument, enum RawParseMode,
to say what to do.  As proof of concept, add a mode that just parses
a TypeName without any other decoration, and use that to greatly
simplify typeStringToTypeName().

In addition, invent a new SPI entry point SPI_prepare_extended() to
allow SPI users (particularly plpgsql) to get at this new functionality.
In hopes of making this the last variant of SPI_prepare(), set up its
additional arguments as a struct rather than direct arguments, and
promise that future additions to the struct can default to zero.
SPI_prepare_cursor() and SPI_prepare_params() can perhaps go away at
some point.

Discussion: https://postgr.es/m/4165684.1607707277@sss.pgh.pa.us
2021-01-04 11:03:22 -05:00
Michael Paquier b49154b3b7 Simplify some comments in xml.c
Author: Justin Pryzby
Discussion: https://postgr.es/m/X/Ff7jfnvJUab013@paquier.xyz
2021-01-04 19:47:58 +09:00
Amit Kapila a271a1b50e Allow decoding at prepare time in ReorderBuffer.
This patch allows PREPARE-time decoding of two-phase transactions (if the
output plugin supports this capability), in which case the transactions
are replayed at PREPARE and then committed later when COMMIT PREPARED
arrives.

Now that we decode the changes before the commit, the concurrent aborts
may cause failures when the output plugin consults catalogs (both system
and user-defined).

We detect such failures with a special sqlerrcode
ERRCODE_TRANSACTION_ROLLBACK introduced by commit 7259736a6e and stop
decoding the remaining changes. Then we rollback the changes when rollback
prepared is encountered.

Author: Ajin Cherian and Amit Kapila based on previous work by Nikhil Sontakke and Stas Kelvich
Reviewed-by: Amit Kapila, Peter Smith, Sawada Masahiko, Arseny Sher, and Dilip Kumar
Tested-by: Takamichi Osumi
Discussion:
https://postgr.es/m/02DA5F5E-CECE-4D9C-8B4B-418077E2C010@postgrespro.ru
https://postgr.es/m/CAMGcDxeqEpWj3fTXwqhSwBdXd2RS9jzwWscO-XbeCfso6ts3+Q@mail.gmail.com
2021-01-04 08:34:50 +05:30
Bruce Momjian ca3b37487b Update copyright for 2021
Backpatch-through: 9.5
2021-01-02 13:06:25 -05:00
Peter Geoghegan 32d6287d2e Get heap page max offset with buffer lock held.
On further reflection it seems better to call PageGetMaxOffsetNumber()
after acquiring a buffer lock on the page.  This shouldn't really
matter, but doing it this way is cleaner.

Follow-up to commit 42288174.

Backpatch: 12-, just like commit 42288174
2020-12-30 17:21:42 -08:00
Peter Geoghegan 4228817449 Fix index deletion latestRemovedXid bug.
The logic for determining the latest removed XID for the purposes of
generating recovery conflicts in REDO routines was subtly broken.  It
failed to follow links from HOT chains, and so failed to consider all
relevant heap tuple headers in some cases.

To fix, expand the loop that deals with LP_REDIRECT line pointers to
also deal with HOT chains.  The new version of the loop is loosely based
on a similar loop from heap_prune_chain().

The impact of this bug is probably quite limited, since the horizon code
necessarily deals with heap tuples that are pointed to by LP_DEAD-set
index tuples.  The process of setting LP_DEAD index tuples (e.g. within
the kill_prior_tuple mechanism) is highly correlated with opportunistic
pruning of pointed-to heap tuples.  Plus the question of generating a
recovery conflict usually comes up some time after index tuple LP_DEAD
bits were initially set, unlike heap pruning, where a latestRemovedXid
is generated at the point of the pruning operation (heap pruning has no
deferred "would-be page split" style processing that produces conflicts
lazily).

Only backpatch to Postgres 12, the first version where this logic runs
during original execution (following commit 558a9165e0).  The index
latestRemovedXid mechanism has had the same bug since it first appeared
over 10 years ago (in commit a760893d), but backpatching to all
supported versions now seems like a bad idea on balance.  Running the
new improved code during recovery seems risky, especially given the lack
of complaints from the field.

Author: Peter Geoghegan <pg@bowt.ie>
Discussion: https://postgr.es/m/CAH2-Wz=Eib393+HHcERK_9MtgNS7Ew1HY=RDC_g6GL46zM5C6Q@mail.gmail.com
Backpatch: 12-
2020-12-30 16:29:05 -08:00
Alexander Korotkov 16d531a30a Refactor multirange_in()
This commit preserves the logic of multirange_in() but makes it more clear
what's going on.  Also, this commit fixes the compiler warning spotted by the
buildfarm.

Reported-by: Tom Lane
Discussion: https://postgr.es/m/2246043.1609290699%40sss.pgh.pa.us
2020-12-30 21:17:34 +03:00
Tom Lane 7ca37fb040 Use setenv() in preference to putenv().
Since at least 2001 we've used putenv() and avoided setenv(), on the
grounds that the latter was unportable and not in POSIX.  However,
POSIX added it that same year, and by now the situation has reversed:
setenv() is probably more portable than putenv(), since POSIX now
treats the latter as not being a core function.  And setenv() has
cleaner semantics too.  So, let's reverse that old policy.

This commit adds a simple src/port/ implementation of setenv() for
any stragglers (we have one in the buildfarm, but I'd not be surprised
if that code is never used in the field).  More importantly, extend
win32env.c to also support setenv().  Then, replace usages of putenv()
with setenv(), and get rid of some ad-hoc implementations of setenv()
wannabees.

Also, adjust our src/port/ implementation of unsetenv() to follow the
POSIX spec that it returns an error indicator, rather than returning
void as per the ancient BSD convention.  I don't feel a need to make
all the call sites check for errors, but the portability stub ought
to match real-world practice.

Discussion: https://postgr.es/m/2065122.1609212051@sss.pgh.pa.us
2020-12-30 12:56:06 -05:00
Alexander Korotkov 62097a4cc8 Fix selectivity estimation @> (anymultirange, anyrange) operator
Attempt to get selectivity estimation for @> (anymultirange, anyrange) operator
caused an error in buildfarm, because this operator was missed in switch()
of calc_hist_selectivity().  Fix that and also make regression tests reliably
check that selectivity estimation for (multi)ranges doesn't fall.  Previously,
whether we test selectivity estimation for (multi)ranges depended on
whether autovacuum managed to gather concurrently to the test.

Reported-by: Michael Paquier
Discussion: https://postgr.es/m/X%2BwmgjRItuvHNBeV%40paquier.xyz
2020-12-30 20:31:15 +03:00
Tom Lane 860fe27ee1 Fix up usage of krb_server_keyfile GUC parameter.
secure_open_gssapi() installed the krb_server_keyfile setting as
KRB5_KTNAME unconditionally, so long as it's not empty.  However,
pg_GSS_recvauth() only installed it if KRB5_KTNAME wasn't set already,
leading to a troubling inconsistency: in theory, clients could see
different sets of server principal names depending on whether they
use GSSAPI encryption.  Always using krb_server_keyfile seems like
the right thing, so make both places do that.  Also fix up
secure_open_gssapi()'s lack of a check for setenv() failure ---
it's unlikely, surely, but security-critical actions are no place
to be sloppy.

Also improve the associated documentation.

This patch does nothing about secure_open_gssapi()'s use of setenv(),
and indeed causes pg_GSS_recvauth() to use it too.  That's nominally
against project portability rules, but since this code is only built
with --with-gssapi, I do not feel a need to do something about this
in the back branches.  A fix will be forthcoming for HEAD though.

Back-patch to v12 where GSSAPI encryption was introduced.  The
dubious behavior in pg_GSS_recvauth() goes back further, but it
didn't have anything to be inconsistent with, so let it be.

Discussion: https://postgr.es/m/2187460.1609263156@sss.pgh.pa.us
2020-12-30 11:38:42 -05:00
Michael Paquier e665769e6d Sanitize IF NOT EXISTS in EXPLAIN for CTAS and matviews
IF NOT EXISTS was ignored when specified in an EXPLAIN query for CREATE
MATERIALIZED VIEW or CREATE TABLE AS.  Hence, if this clause was
specified, the caller would get a failure if the relation already
exists instead of a success with a NOTICE message.

This commit makes the behavior of IF NOT EXISTS in EXPLAIN consistent
with the non-EXPLAIN'd DDL queries, preventing a failure with IF NOT
EXISTS if the relation to-be-created already exists.  The skip is done
before the SELECT query used for the relation is planned or executed,
and a "dummy" plan is generated instead depending on the format used by
EXPLAIN.

Author: Bharath Rupireddy
Reviewed-by: Zhijie Hou, Michael Paquier
Discussion: https://postgr.es/m/CALj2ACVa3oJ9O_wcGd+FtHWZds04dEKcakxphGz5POVgD4wC7Q@mail.gmail.com
2020-12-30 21:23:24 +09:00
Amit Kapila 0aa8a01d04 Extend the output plugin API to allow decoding of prepared xacts.
This adds six methods to the output plugin API, adding support for
streaming changes of two-phase transactions at prepare time.

* begin_prepare
* filter_prepare
* prepare
* commit_prepared
* rollback_prepared
* stream_prepare

Most of this is a simple extension of the existing methods, with the
semantic difference that the transaction is not yet committed and maybe
aborted later.

Until now two-phase transactions were translated into regular transactions
on the subscriber, and the GID was not forwarded to it. None of the
two-phase commands were communicated to the subscriber.

This patch provides the infrastructure for logical decoding plugins to be
informed of two-phase commands Like PREPARE TRANSACTION, COMMIT PREPARED
and ROLLBACK PREPARED commands with the corresponding GID.

This also extends the 'test_decoding' plugin, implementing these new
methods.

This commit simply adds these new APIs and the upcoming patch to "allow
the decoding at prepare time in ReorderBuffer" will use these APIs.

Author: Ajin Cherian and Amit Kapila based on previous work by Nikhil Sontakke and Stas Kelvich
Reviewed-by: Amit Kapila, Peter Smith, Sawada Masahiko, and Dilip Kumar
Discussion:
https://postgr.es/m/02DA5F5E-CECE-4D9C-8B4B-418077E2C010@postgrespro.ru
https://postgr.es/m/CAMGcDxeqEpWj3fTXwqhSwBdXd2RS9jzwWscO-XbeCfso6ts3+Q@mail.gmail.com
2020-12-30 16:17:26 +05:30
Tom Lane 1f9158ba48 Suppress log spam from multiple reports of SIGQUIT shutdown.
When the postmaster sends SIGQUIT to its children, there's no real
need for all the children to log that fact; the postmaster already
made a log entry about it, so adding perhaps dozens or hundreds of
child-process log entries adds nothing of value.  So, let's introduce
a new ereport level to specify "WARNING, but never send to log" and
use that for these messages.

Such a change wouldn't have been desirable before commit 7e784d1dc,
because if someone manually SIGQUIT's a backend, we *do* want to log
that.  But now we can tell the difference between a signal that was
issued by the postmaster and one that was not with reasonable
certainty.

While we're here, also clear error_context_stack before ereport'ing,
to prevent error callbacks from being invoked in the signal-handler
context.  This should reduce the odds of getting hung up while trying
to notify the client.

Per a suggestion from Andres Freund.

Discussion: https://postgr.es/m/20201225230331.hru3u6obyy6j53tk@alap3.anarazel.de
2020-12-29 18:02:38 -05:00
Alexander Korotkov db6335b5b1 Add support of multirange matching to the existing range GiST indexes
6df7a9698b has introduced a set of operators between ranges and multiranges.
Existing GiST indexes for ranges could easily support majority of them.
This commit adds support for new operators to the existing range GiST indexes.
New operators resides the same strategy numbers as existing ones.  Appropriate
check function is determined using the subtype.

Catversion is bumped.
2020-12-29 23:36:43 +03:00
Alexander Korotkov d1d61a8b23 Improve the signature of internal multirange functions
There is a set of *_internal() functions exposed in
include/utils/multirangetypes.h.  This commit improves the signatures of these
functions in two ways.
 * Add const qualifies where applicable.
 * Replace multirange typecache argument with range typecache argument.
   Multirange typecache was used solely to find the range typecache.  At the
   same time, range typecache is easier for the caller to find.
2020-12-29 23:35:38 +03:00
Alexander Korotkov 4d7684cc75 Implement operators for checking if the range contains a multirange
We have operators for checking if the multirange contains a range but don't
have the opposite.  This commit improves completeness of the operator set by
adding two new operators: @> (anyrange,anymultirange) and
<@(anymultirange,anyrange).

Catversion is bumped.
2020-12-29 23:35:33 +03:00
Alexander Korotkov a5b81b6f00 Fix bugs in comparison functions for multirange_bsearch_match()
Two functions multirange_range_overlaps_bsearch_comparison() and
multirange_range_contains_bsearch_comparison() contain bugs of returning -1
instead of 1.  This commit fixes these bugs and adds corresponding regression
tests.
2020-12-29 23:35:26 +03:00
Tom Lane 3995c42498 Improve log messages related to pg_hba.conf not matching a connection.
Include details on whether GSS encryption has been activated;
since we added "hostgssenc" type HBA entries, that's relevant info.

Kyotaro Horiguchi and Tom Lane.  Back-patch to v12 where
GSS encryption was introduced.

Discussion: https://postgr.es/m/e5b0b6ed05764324a2f3fe7acfc766d5@smhi.se
2020-12-28 17:58:58 -05:00
Tom Lane 622ae4621e Fix assorted issues in backend's GSSAPI encryption support.
Unrecoverable errors detected by GSSAPI encryption can't just be
reported with elog(ERROR) or elog(FATAL), because attempting to
send the error report to the client is likely to lead to infinite
recursion or loss of protocol sync.  Instead make this code do what
the SSL encryption code has long done, which is to just report any
such failure to the server log (with elevel COMMERROR), then pretend
we've lost the connection by returning errno = ECONNRESET.

Along the way, fix confusion about whether message translation is done
by pg_GSS_error() or its callers (the latter should do it), and make
the backend version of that function work more like the frontend
version.

Avoid allocating the port->gss struct until it's needed; we surely
don't need to allocate it in the postmaster.

Improve logging of "connection authorized" messages with GSS enabled.
(As part of this, I back-patched the code changes from dc11f31a1.)

Make BackendStatusShmemSize() account for the GSS-related space that
will be allocated by CreateSharedBackendStatus().  This omission
could possibly cause out-of-shared-memory problems with very high
max_connections settings.

Remove arbitrary, pointless restriction that only GSS authentication
can be used on a GSS-encrypted connection.

Improve documentation; notably, document the fact that libpq now
prefers GSS encryption over SSL encryption if both are possible.

Per report from Mikael Gustavsson.  Back-patch to v12 where
this code was introduced.

Discussion: https://postgr.es/m/e5b0b6ed05764324a2f3fe7acfc766d5@smhi.se
2020-12-28 17:44:17 -05:00
Michael Paquier 643428c54b Fix inconsistent code with shared invalidations of snapshots
The code in charge of processing a single invalidation message has been
using since 568d413 the structure for relation mapping messages.  This
had fortunately no consequence as both locate the database ID at the
same location, but it could become a problem in the future if this area
of the code changes.

Author: Konstantin Knizhnik
Discussion: https://postgr.es/m/8044c223-4d3a-2cdb-42bf-29940840ce94@postgrespro.ru
Backpatch-through: 9.5
2020-12-28 22:16:49 +09:00
Bruce Momjian 3187ef7c46 Revert "Add key management system" (978f869b99) & later commits
The patch needs test cases, reorganization, and cfbot testing.
Technically reverts commits 5c31afc49d..e35b2bad1a (exclusive/inclusive)
and 08db7c63f3..ccbe34139b.

Reported-by: Tom Lane, Michael Paquier

Discussion: https://postgr.es/m/E1ktAAG-0002V2-VB@gemulon.postgresql.org
2020-12-27 21:37:42 -05:00
Jeff Davis 05c0258966 Fix bug #16784 in Disk-based Hash Aggregation.
Before processing tuples, agg_refill_hash_table() was setting all
pergroup pointers to NULL to signal to advance_aggregates() that it
should not attempt to advance groups that had spilled.

The problem was that it also set the pergroups for sorted grouping
sets to NULL, which caused rescanning to fail.

Instead, change agg_refill_hash_table() to only set the pergroups for
hashed grouping sets to NULL; and when compiling the expression, pass
doSort=false.

Reported-by: Alexander Lakhin
Discussion: https://postgr.es/m/16784-7ff169bf2c3d1588%40postgresql.org
Backpatch-through: 13
2020-12-26 17:25:30 -08:00
Bruce Momjian ba6725df36 auth commands: list specific commands to install in Makefile
Previously I used Makefile functions.

Backpatch-through: master
2020-12-26 09:25:05 -05:00
Bruce Momjian d7602afa2e Add scripts for retrieving the cluster file encryption key
Scripts are passphrase, direct, AWS, and two Yubikey ones.

Backpatch-through: master
2020-12-26 01:19:09 -05:00
Bruce Momjian 300e430c76 Allow ssl_passphrase_command to prompt the terminal
Previously the command could not access the terminal for a passphrase.

Backpatch-through: master
2020-12-25 20:41:06 -05:00
Noah Misch 08db7c63f3 Invalidate acl.c caches when pg_authid changes.
This makes existing sessions reflect "ALTER ROLE ... [NO]INHERIT" as
quickly as they have been reflecting "GRANT role_name".  Back-patch to
9.5 (all supported versions).

Reviewed by Nathan Bossart.

Discussion: https://postgr.es/m/20201221095028.GB3777719@rfd.leadboat.com
2020-12-25 10:41:59 -08:00
Bruce Momjian 978f869b99 Add key management system
This adds a key management system that stores (currently) two data
encryption keys of length 128, 192, or 256 bits.  The data keys are
AES256 encrypted using a key encryption key, and validated via GCM
cipher mode.  A command to obtain the key encryption key must be
specified at initdb time, and will be run at every database server
start.  New parameters allow a file descriptor open to the terminal to
be passed.  pg_upgrade support has also been added.

Discussion: https://postgr.es/m/CA+fd4k7q5o6Nc_AaX6BcYM9yqTbC6_pnH-6nSD=54Zp6NBQTCQ@mail.gmail.com
Discussion: https://postgr.es/m/20201202213814.GG20285@momjian.us

Author: Masahiko Sawada, me, Stephen Frost
2020-12-25 10:19:44 -05:00
Bruce Momjian c3826f831e move hex_decode() to /common so it can be called from frontend
This allows removal of a copy of hex_decode() from ecpg, and will be
used by the soon-to-be added pg_alterckey command.

Backpatch-through: master
2020-12-24 17:25:48 -05:00
Tom Lane 7519bd16d1 Fix race condition between shutdown and unstarted background workers.
If a database shutdown (smart or fast) is commanded between the time
some process decides to request a new background worker and the time
that the postmaster can launch that worker, then nothing happens
because the postmaster won't launch any bgworkers once it's exited
PM_RUN state.  This is fine ... unless the requesting process is
waiting for that worker to finish (or even for it to start); in that
case the requestor is stuck, and only manual intervention will get us
to the point of being able to shut down.

To fix, cancel pending requests for workers when the postmaster sends
shutdown (SIGTERM) signals, and similarly cancel any new requests that
arrive after that point.  (We can optimize things slightly by only
doing the cancellation for workers that have waiters.)  To fit within
the existing bgworker APIs, the "cancel" is made to look like the
worker was started and immediately stopped, causing deregistration of
the bgworker entry.  Waiting processes would have to deal with
premature worker exit anyway, so this should introduce no bugs that
weren't there before.  We do have a side effect that registration
records for restartable bgworkers might disappear when theoretically
they should have remained in place; but since we're shutting down,
that shouldn't matter.

Back-patch to v10.  There might be value in putting this into 9.6
as well, but the management of bgworkers is a bit different there
(notably see 8ff518699) and I'm not convinced it's worth the effort
to validate the patch for that branch.

Discussion: https://postgr.es/m/661570.1608673226@sss.pgh.pa.us
2020-12-24 17:00:43 -05:00
Tom Lane 7e784d1dc1 Improve client error messages for immediate-stop situations.
Up to now, if the DBA issued "pg_ctl stop -m immediate", the message
sent to clients was the same as for a crash-and-restart situation.
This is confusing, not least because the message claims that the
database will soon be up again, something we have no business
predicting.

Improve things so that we can generate distinct messages for the two
cases (and also recognize an ad-hoc SIGQUIT, should somebody try that).
To do that, add a field to pmsignal.c's shared memory data structure
that the postmaster sets just before broadcasting SIGQUIT to its
children.  No interlocking seems to be necessary; the intervening
signal-sending and signal-receipt should sufficiently serialize accesses
to the field.  Hence, this isn't any riskier than the existing usages
of pmsignal.c.

We might in future extend this idea to improve other
postmaster-to-children signal scenarios, although none of them
currently seem to be as badly overloaded as SIGQUIT.

Discussion: https://postgr.es/m/559291.1608587013@sss.pgh.pa.us
2020-12-24 12:58:32 -05:00
Michael Paquier 90fbf7c57d Fix typos and grammar in docs and comments
This fixes several areas of the documentation and some comments in
matters of style, grammar, or even format.

Author: Justin Pryzby
Discussion: https://postgr.es/m/20201222041153.GK30237@telsasoft.com
2020-12-24 17:05:49 +09:00
Michael Paquier 6db27037b9 Fix portability issues with parsing of recovery_target_xid
The parsing of this parameter has been using strtoul(), which is not
portable across platforms.  On most Unix platforms, unsigned long has a
size of 64 bits, while on Windows it is 32 bits.  It is common in
recovery scenarios to rely on the output of txid_current() or even the
newer pg_current_xact_id() to get a transaction ID for setting up
recovery_target_xid.  The value returned by those functions includes the
epoch in the computed result, which would cause strtoul() to fail where
unsigned long has a size of 32 bits once the epoch is incremented.

WAL records and 2PC data include only information about 32-bit XIDs and
it is not possible to have XIDs across more than one epoch, so
discarding the high bits from the transaction ID set has no impact on
recovery.  On the contrary, the use of strtoul() prevents a consistent
behavior across platforms depending on the size of unsigned long.

This commit changes the parsing of recovery_target_xid to use
pg_strtouint64() instead, available down to 9.6.  There is one TAP test
stressing recovery with recovery_target_xid, where a tweak based on
pg_reset{xlog,wal} is added to bump the XID epoch so as this change gets
tested, as per an idea from Alexander Lakhin.

Reported-by: Alexander Lakhin
Discussion: https://postgr.es/m/16780-107fd0c0385b1035@postgresql.org
Backpatch-through: 9.6
2020-12-23 12:51:22 +09:00
Tomas Vondra 1ca2eb1031 Improve find_em_expr_usable_for_sorting_rel comment
Clarify the relationship between find_em_expr_usable_for_sorting_rel and
prepare_sort_from_pathkeys, i.e. what restrictions need to be shared
between those two places.

Author: James Coleman
Reviewed-by: Tomas Vondra
Backpatch-through: 13
Discussion: https://postgr.es/m/CAAaqYe8cK3g5CfLC4w7bs%3DhC0mSksZC%3DH5M8LSchj5e5OxpTAg%40mail.gmail.com
2020-12-22 02:00:51 +01:00
Tomas Vondra 9aff4dc01f Don't search for volatile expr in find_em_expr_usable_for_sorting_rel
While prepare_sort_from_pathkeys has to be concerned about matching up
a volatile expression to the proper tlist entry, we don't need to do
that in find_em_expr_usable_for_sorting_rel becausee such a sort will
have to be postponed anyway.

Author: James Coleman
Reviewed-by: Tomas Vondra
Backpatch-through: 13
Discussion: https://postgr.es/m/CAAaqYe8cK3g5CfLC4w7bs%3DhC0mSksZC%3DH5M8LSchj5e5OxpTAg%40mail.gmail.com
2020-12-21 20:05:11 +01:00
Tomas Vondra fac1b470a9 Disallow SRFs when considering sorts below Gather Merge
While we do allow SRFs in ORDER BY, scan/join processing should not
consider such cases - such sorts should only happen via final Sort atop
a ProjectSet. So make sure we don't try adding such sorts below Gather
Merge, just like we do for expressions that are volatile and/or not
parallel safe.

Backpatch to PostgreSQL 13, where this code was introduced as part of
the Incremental Sort patch.

Author: James Coleman
Reviewed-by: Tomas Vondra
Backpatch-through: 13
Discussion: https://postgr.es/m/CAAaqYe8cK3g5CfLC4w7bs=hC0mSksZC=H5M8LSchj5e5OxpTAg@mail.gmail.com
Discussion: https://postgr.es/m/295524.1606246314%40sss.pgh.pa.us
2020-12-21 19:36:22 +01:00
Tom Lane ff5d5611c0 Remove "invalid concatenation of jsonb objects" error case.
The jsonb || jsonb operator arbitrarily rejected certain combinations
of scalar and non-scalar inputs, while being willing to concatenate
other combinations.  This was of course quite undocumented.  Rather
than trying to document it, let's just remove the restriction,
creating a uniform rule that unless we are handling an object-to-object
concatenation, non-array inputs are converted to one-element arrays,
resulting in an array-to-array concatenation.  (This does not change
the behavior for any case that didn't throw an error before.)

Per complaint from Joel Jacobson.  Back-patch to all supported branches.

Discussion: https://postgr.es/m/163099.1608312033@sss.pgh.pa.us
2020-12-21 13:11:50 -05:00
Tomas Vondra 86b7cca72d Check parallel safety in generate_useful_gather_paths
Commit ebb7ae839d ensured we ignore pathkeys with volatile expressions
when considering adding a sort below a Gather Merge. Turns out we need
to care about parallel safety of the pathkeys too, otherwise we might
try sorting e.g. on results of a correlated subquery (as demonstrated
by a report from Luis Roberto).

Initial investigation by Tom Lane, patch by James Coleman. Backpatch
to 13, where the code was instroduced (as part of Incremental Sort).

Reported-by: Luis Roberto
Author: James Coleman
Reviewed-by: Tomas Vondra
Backpatch-through: 13
Discussion: https://postgr.es/m/622580997.37108180.1604080457319.JavaMail.zimbra%40siscobra.com.br
Discussion: https://postgr.es/m/CAAaqYe8cK3g5CfLC4w7bs=hC0mSksZC=H5M8LSchj5e5OxpTAg@mail.gmail.com
2020-12-21 18:29:49 +01:00
Tomas Vondra f4a3c0b062 Consider unsorted paths in generate_useful_gather_paths
generate_useful_gather_paths used to skip unsorted paths (without any
pathkeys), but that is unnecessary - the later code actually can handle
such paths just fine by adding a Sort node. This is clearly a thinko,
preventing construction of useful plans.

Backpatch to 13, where Incremental Sort was introduced.

Author: James Coleman
Reviewed-by: Tomas Vondra
Backpatch-through: 13
Discussion: https://postgr.es/m/CAAaqYe8cK3g5CfLC4w7bs=hC0mSksZC=H5M8LSchj5e5OxpTAg@mail.gmail.com
2020-12-21 18:10:20 +01:00
Alexander Korotkov 29f8f54676 Fix compiler warning in multirange_constructor0()
Discussion: https://postgr.es/m/X%2BBP8XE0UpIB6Yvh%40paquier.xyz
Author: Michael Paquier
2020-12-21 14:25:32 +03:00
Michael Paquier 93e8ff8701 Refactor logic to check for ASCII-only characters in string
The same logic was present for collation commands, SASLprep and
pgcrypto, so this removes some code.

Author: Michael Paquier
Reviewed-by: Stephen Frost, Heikki Linnakangas
Discussion: https://postgr.es/m/X9womIn6rne6Gud2@paquier.xyz
2020-12-21 09:37:11 +09:00
Alexander Korotkov 4e1ee79e31 Fix typalign in rangetypes statistics
6df7a9698b introduces multirange types, whose typanalyze function shares
infrastructure with range types typanalyze function.  Since 6df7a9698b,
information about type gathered by statistics is filled from typcache.
But typalign is mistakenly always set to double.  This commit fixes this
oversight.
2020-12-21 00:31:11 +03:00
Tom Lane ed6329cfa9 Avoid memcpy() with same source and destination in pgstat_recv_replslot.
Same type of issue as in commit 53d4f5fef and earlier fixes; also
found by apparently-more-picky-than-the-buildfarm valgrind testing.
This one is an oversight in commit 986816750.  Since that's new in
HEAD, no need for a back-patch.
2020-12-20 12:38:32 -05:00
Alexander Korotkov 11072e8693 Fix compiler warning introduced in 6df7a9698b 2020-12-20 16:27:01 +03:00
Alexander Korotkov 6df7a9698b Multirange datatypes
Multiranges are basically sorted arrays of non-overlapping ranges with
set-theoretic operations defined over them.

Since v14, each range type automatically gets a corresponding multirange
datatype.  There are both manual and automatic mechanisms for naming multirange
types.  Once can specify multirange type name using multirange_type_name
attribute in CREATE TYPE.  Otherwise, a multirange type name is generated
automatically.  If the range type name contains "range" then we change that to
"multirange".  Otherwise, we add "_multirange" to the end.

Implementation of multiranges comes with a space-efficient internal
representation format, which evades extra paddings and duplicated storage of
oids.  Altogether this format allows fetching a particular range by its index
in O(n).

Statistic gathering and selectivity estimation are implemented for multiranges.
For this purpose, stored multirange is approximated as union range without gaps.
This field will likely need improvements in the future.

Catversion is bumped.

Discussion: https://postgr.es/m/CALNJ-vSUpQ_Y%3DjXvTxt1VYFztaBSsWVXeF1y6gTYQ4bOiWDLgQ%40mail.gmail.com
Discussion: https://postgr.es/m/a0b8026459d1e6167933be2104a6174e7d40d0ab.camel%40j-davis.com#fe7218c83b08068bfffb0c5293eceda0
Author: Paul Jungwirth, revised by me
Reviewed-by: David Fetter, Corey Huinker, Jeff Davis, Pavel Stehule
Reviewed-by: Alvaro Herrera, Tom Lane, Isaac Morland, David G. Johnston
Reviewed-by: Zhihong Yu, Alexander Korotkov
2020-12-20 07:20:33 +03:00
Amit Kapila 20659fd8e5 Update comment atop of ReorderBufferQueueMessage().
The comments atop of this function describes behaviour in case of a
transactional WAL message only, but it accepts both transactional and
non-transactional WAL messages. Update the comments to describe
behaviour in case of non-transactional WAL message as well.

Ashutosh Bapat, rephrased by Amit Kapila
Discussion: https://postgr.es/m/CAGEoWWTTzNzHOi8bj0wfAo1siGi-YEh6wqH1oaz4DrkTJ6HbTQ@mail.gmail.com
2020-12-19 10:08:46 +05:30
Tom Lane 53d4f5fef0 Avoid memcpy() with same source and destination during relmapper init.
A narrow reading of the C standard says that memcpy(x,x,n) is undefined,
although it's hard to envision an implementation that would really
misbehave.  However, analysis tools such as valgrind might whine about
this; accordingly, let's band-aid relmapper.c to not do it.

See also 5b630501e, d3f4e8a8a, ad7b48ea0, and other similar fixes.
Apparently, none of those folk tried valgrinding initdb?  This has been
like this for long enough that I'm surprised it hasn't been reported
before.

Back-patch, just in case anybody wants to use a back branch on a platform
that complains about this; we back-patched those earlier fixes too.

Discussion: https://postgr.es/m/161790.1608310142@sss.pgh.pa.us
2020-12-18 15:46:44 -05:00
Fujii Masao 00f690a239 Revert "Get rid of the dedicated latch for signaling the startup process".
Revert ac22929a26, as well as the followup fix 113d3591b8. Because it broke
the assumption that the startup process waiting for the recovery conflict
on buffer pin should be waken up only by buffer unpin or the timeout enabled
in ResolveRecoveryConflictWithBufferPin(). It caused, for example,
SIGHUP signal handler or walreceiver process to wake that startup process
up unnecessarily frequently.

Additionally, add the comments about why that dedicated latch that
the reverted patch tried to get rid of should not be removed.

Thanks to Kyotaro Horiguchi for the discussion.

Author: Fujii Masao
Discussion: https://postgr.es/m/d8c0c608-021b-3c73-fffd-3240829ee986@oss.nttdata.com
2020-12-17 18:06:51 +09:00
Peter Geoghegan 41ddc27f66 Remove obsolete btrescan() comment.
"Ordering stuff" refered to a _bt_first() call to _bt_orderkeys().
However, the _bt_orderkeys() function was renamed to
_bt_preprocess_keys() by commit fa5c8a055a.

_bt_preprocess_keys() is directly referenced just after the removed
comment already, which seems sufficient.
2020-12-15 15:55:07 -08:00
Alvaro Herrera a18422a3ad
Remove useless variable stores
Mistakenly introduced in 4cbe3ac3e867; bug repaired in 148e632c05 but
the stores were accidentally.
2020-12-15 19:51:16 -03:00
Tomas Vondra 6bc2769832 Error out when Gather Merge input is not sorted
To build Gather Merge path, the input needs to be sufficiently sorted.
Ensuring this is the responsibility of the code constructing the paths,
but create_gather_merge_plan tried to handle unsorted paths by adding
an explicit Sort. In light of the recent issues related to Incremental
Sort, this is rather fragile. Some of the expressions may be volatile
or parallel unsafe, in which case we can't add the Sort here.

We could do more checks and add the Sort in at least some cases, but
it seems cleaner to just error out and make it clear this is a bug in
code constructing those paths.

Author: James Coleman
Reviewed-by: Tomas Vondra
Discussion: https://postgr.es/m/CAAaqYe8cK3g5CfLC4w7bs%3DhC0mSksZC%3DH5M8LSchj5e5OxpTAg%40mail.gmail.com
Discussion: https://postgr.es/m/CAJGNTeNaxpXgBVcRhJX%2B2vSbq%2BF2kJqGBcvompmpvXb7pq%2BoFA%40mail.gmail.com
2020-12-15 23:19:41 +01:00
Tom Lane b3817f5f77 Improve hash_create()'s API for some added robustness.
Invent a new flag bit HASH_STRINGS to specify C-string hashing, which
was formerly the default; and add assertions insisting that exactly
one of the bits HASH_STRINGS, HASH_BLOBS, and HASH_FUNCTION be set.
This is in hopes of preventing recurrences of the type of oversight
fixed in commit a1b8aa1e4 (i.e., mistakenly omitting HASH_BLOBS).

Also, when HASH_STRINGS is specified, insist that the keysize be
more than 8 bytes.  This is a heuristic, but it should catch
accidental use of HASH_STRINGS for integer or pointer keys.
(Nearly all existing use-cases set the keysize to NAMEDATALEN or
more, so there's little reason to think this restriction should
be problematic.)

Tweak hash_create() to insist that the HASH_ELEM flag be set, and
remove the defaults it had for keysize and entrysize.  Since those
defaults were undocumented and basically useless, no callers
omitted HASH_ELEM anyway.

Also, remove memset's zeroing the HASHCTL parameter struct from
those callers that had one.  This has never been really necessary,
and while it wasn't a bad coding convention it was confusing that
some callers did it and some did not.  We might as well save a few
cycles by standardizing on "not".

Also improve the documentation for hash_create().

In passing, improve reinit.c's usage of a hash table by storing
the key as a binary Oid rather than a string; and, since that's
a temporary hash table, allocate it in CurrentMemoryContext for
neatness.

Discussion: https://postgr.es/m/590625.1607878171@sss.pgh.pa.us
2020-12-15 11:38:53 -05:00
Jeff Davis a58db3aa10 Revert "Cannot use WL_SOCKET_WRITEABLE without WL_SOCKET_READABLE."
This reverts commit 3a9e64aa0d.

Commit 4bad60e3 fixed the root of the problem that 3a9e64aa worked
around.

This enables proper pipelining of commands after terminating
replication, eliminating an undocumented limitation.

Discussion: https://postgr.es/m/3d57bc29-4459-578b-79cb-7641baf53c57%40iki.fi
Backpatch-through: 9.5
2020-12-14 23:47:30 -08:00
Michael Paquier df9274adf3 Add some checkpoint/restartpoint status to ps display
This is done for end-of-recovery and shutdown checkpoints/restartpoints
(end-of-recovery restartpoints don't exist) rather than all types of
checkpoints, in cases where it may not be possible to rely on
pg_stat_activity to get a status from the startup or checkpointer
processes.

For example, at the end of a crash recovery, this is useful to know if a
checkpoint is running in the startup process, while previously the ps
display may only show some information about "recovering" something,
that can be confusing while a checkpoint runs.

Author: Justin Pryzby
Reviewed-by: Nathan Bossart, Kirk Jamison, Fujii Masao, Michael Paquier
Discussion: https://postgr.es/m/20200818225238.GP17022@telsasoft.com
2020-12-14 11:53:58 +09:00
Noah Misch a1b8aa1e4e Use HASH_BLOBS for xidhash.
This caused BufFile errors on buildfarm member sungazer, and SIGSEGV was
possible.  Conditions for reaching those symptoms were more frequent on
big-endian systems.

Discussion: https://postgr.es/m/20201129214441.GA691200@rfd.leadboat.com
2020-12-12 21:38:36 -08:00
Noah Misch 73aae4522b Correct behavior descriptions in comments, and correct a test name. 2020-12-12 20:12:25 -08:00
Tom Lane 8c15a29745 Allow ALTER TYPE to update an existing type's typsubscript value.
This is essential if we'd like to allow existing extension data types
to support subscripting in future, since dropping and recreating the
type isn't a practical thing for an extension upgrade script, and
direct manipulation of pg_type isn't a great answer either.

There was some discussion about also allowing alteration of typelem,
but it's less clear whether that's a good idea or not, so for now
I forebore.

Discussion: https://postgr.es/m/3724341.1607551174@sss.pgh.pa.us
2020-12-11 18:58:21 -05:00
Tom Lane 653aa603f5 Provide an error cursor for "can't subscript" error messages.
Commit c7aba7c14 didn't add this, but after more fooling with the
feature I feel that it'd be useful.  To make this possible, refactor
getSubscriptingRoutines() so that the caller is responsible for
throwing any error.  (In clauses.c, I just chose to make the
most conservative assumption rather than throwing an error.  We don't
expect failures there anyway really, so the code space for an error
message would be a poor investment.)
2020-12-11 18:58:21 -05:00
Tom Lane c7aba7c14e Support subscripting of arbitrary types, not only arrays.
This patch generalizes the subscripting infrastructure so that any
data type can be subscripted, if it provides a handler function to
define what that means.  Traditional variable-length (varlena) arrays
all use array_subscript_handler(), while the existing fixed-length
types that support subscripting use raw_array_subscript_handler().
It's expected that other types that want to use subscripting notation
will define their own handlers.  (This patch provides no such new
features, though; it only lays the foundation for them.)

To do this, move the parser's semantic processing of subscripts
(including coercion to whatever data type is required) into a
method callback supplied by the handler.  On the execution side,
replace the ExecEvalSubscriptingRef* layer of functions with direct
calls to callback-supplied execution routines.  (Thus, essentially
no new run-time overhead should be caused by this patch.  Indeed,
there is room to remove some overhead by supplying specialized
execution routines.  This patch does a little bit in that line,
but more could be done.)

Additional work is required here and there to remove formerly
hard-wired assumptions about the result type, collation, etc
of a SubscriptingRef expression node; and to remove assumptions
that the subscript values must be integers.

One useful side-effect of this is that we now have a less squishy
mechanism for identifying whether a data type is a "true" array:
instead of wiring in weird rules about typlen, we can look to see
if pg_type.typsubscript == F_ARRAY_SUBSCRIPT_HANDLER.  For this
to be bulletproof, we have to forbid user-defined types from using
that handler directly; but there seems no good reason for them to
do so.

This patch also removes assumptions that the number of subscripts
is limited to MAXDIM (6), or indeed has any hard-wired limit.
That limit still applies to types handled by array_subscript_handler
or raw_array_subscript_handler, but to discourage other dependencies
on this constant, I've moved it from c.h to utils/array.h.

Dmitry Dolgov, reviewed at various times by Tom Lane, Arthur Zakirov,
Peter Eisentraut, Pavel Stehule

Discussion: https://postgr.es/m/CA+q6zcVDuGBv=M0FqBYX8DPebS3F_0KQ6OVFobGJPM507_SZ_w@mail.gmail.com
Discussion: https://postgr.es/m/CA+q6zcVovR+XY4mfk-7oNk-rF91gH0PebnNfuUjuuDsyHjOcVA@mail.gmail.com
2020-12-09 12:40:37 -05:00
Peter Eisentraut 8b069ef5dc Change get_constraint_index() to use pg_constraint.conindid
It was still using a scan of pg_depend instead of using the conindid
column that has been added since.

Since it is now just a catalog lookup wrapper and not related to
pg_depend, move from pg_depend.c to lsyscache.c.

Reviewed-by: Matthias van de Meent <boekewurm+postgres@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://www.postgresql.org/message-id/flat/4688d55c-9a2e-9a5a-d166-5f24fe0bf8db%40enterprisedb.com
2020-12-09 15:41:45 +01:00
Andres Freund df99ddc70b jit: Reference function pointer types via llvmjit_types.c.
It is error prone (see 5da871bfa1) and verbose to manually create function
types. Add a helper that can reference a function pointer type via
llvmjit_types.c and and convert existing instances of manual creation.

Author: Andres Freund <andres@anarazel.de>
Reviewed-By: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/20201207212142.wz5tnbk2jsaqzogb@alap3.anarazel.de
2020-12-08 16:55:20 -08:00
Tom Lane 62ee703313 Teach contain_leaked_vars that assignment SubscriptingRefs are leaky.
array_get_element and array_get_slice qualify as leakproof, since
they will silently return NULL for bogus subscripts.  But
array_set_element and array_set_slice throw errors for such cases,
making them clearly not leakproof.  contain_leaked_vars was evidently
written with only the former case in mind, as it gave the wrong answer
for assignment SubscriptingRefs (nee ArrayRefs).

This would be a live security bug, were it not that assignment
SubscriptingRefs can only occur in INSERT and UPDATE target lists,
while we only care about leakproofness for qual expressions; so the
wrong answer can't occur in practice.  Still, that's a rather shaky
answer for a security-related question; and maybe in future somebody
will want to ask about leakproofness of a tlist.  So it seems wise to
fix and even back-patch this correction.

(We would need some change here anyway for the upcoming
generic-subscripting patch, since extensions might make different
tradeoffs about whether to throw errors.  Commit 558d77f20 attempted
to lay groundwork for that by asking check_functions_in_node whether a
SubscriptingRef contains leaky functions; but that idea fails now that
the implementation methods of a SubscriptingRef are not SQL-visible
functions that could be marked leakproof or not.)

Back-patch to 9.6.  While 9.5 has the same issue, the code's a bit
different.  It seems quite unlikely that we'd introduce any actual bug
in the short time 9.5 has left to live, so the work/risk/reward balance
isn't attractive for changing 9.5.

Discussion: https://postgr.es/m/3143742.1607368115@sss.pgh.pa.us
2020-12-08 17:50:54 -05:00
Tom Lane a676386b58 Remove operator_precedence_warning.
This GUC was always intended as a temporary solution to help with
finding 9.4-to-9.5 migration issues.  Now that all pre-9.5 branches
are out of support, and 9.5 will be too before v14 is released,
it seems like it's okay to drop it.  Doing so allows removal of
several hundred lines of poorly-tested code in parse_expr.c,
which have been a fertile source of bugs when people did use this.

Discussion: https://postgr.es/m/2234320.1607117945@sss.pgh.pa.us
2020-12-08 16:29:52 -05:00
Dean Rasheed 4f5760d4af Improve estimation of ANDs under ORs using extended statistics.
Formerly, extended statistics only handled clauses that were
RestrictInfos. However, the restrictinfo machinery doesn't create
sub-AND RestrictInfos for AND clauses underneath OR clauses.
Therefore teach extended statistics to handle bare AND clauses,
looking for compatible RestrictInfo clauses underneath them.

Dean Rasheed, reviewed by Tomas Vondra.

Discussion: https://postgr.es/m/CAEZATCW=J65GUFm50RcPv-iASnS2mTXQbr=CfBvWRVhFLJ_fWA@mail.gmail.com
2020-12-08 20:10:11 +00:00
Dean Rasheed 88b0898fe3 Improve estimation of OR clauses using multiple extended statistics.
When estimating an OR clause using multiple extended statistics
objects, treat the estimates for each set of clauses for each
statistics object as independent of one another. The overlap estimates
produced for each statistics object do not apply to clauses covered by
other statistics objects.

Dean Rasheed, reviewed by Tomas Vondra.

Discussion: https://postgr.es/m/CAEZATCW=J65GUFm50RcPv-iASnS2mTXQbr=CfBvWRVhFLJ_fWA@mail.gmail.com
2020-12-08 19:39:24 +00:00
Fujii Masao e2ac3fed3b Speed up rechecking if relation needs to be vacuumed or analyze in autovacuum.
After autovacuum collects the relations to vacuum or analyze, it rechecks
whether each relation still needs to be vacuumed or analyzed before actually
doing that. Previously this recheck could be a significant overhead
especially when there were a very large number of relations. This was
because each recheck forced the statistics to be refreshed, and the refresh
of the statistics for a very large number of relations could cause heavy
overhead. There was the report that this issue caused autovacuum workers
to have gotten “stuck” in a tight loop of table_recheck_autovac() that
rechecks whether a relation needs to be vacuumed or analyzed.

This commit speeds up the recheck by making autovacuum worker reuse
the previously-read statistics for the recheck if possible. Then if that
"stale" statistics says that a relation still needs to be vacuumed or analyzed,
autovacuum refreshes the statistics and does the recheck again.

The benchmark shows that the more relations exist and autovacuum workers
are running concurrently, the more this change reduces the autovacuum
execution time. For example, when there are 20,000 tables and 10 autovacuum
workers are running, the benchmark showed that the change improved
the performance of autovacuum more than three times. On the other hand,
even when there are only 1000 tables and only a single autovacuum worker
is running, the benchmark didn't show any big performance regression by
the change.

Firstly POC patch was proposed by Jim Nasby. As the result of discussion,
we used Tatsuhito Kasahara's version of the patch using the approach
suggested by Tom Lane.

Reported-by: Jim Nasby
Author: Tatsuhito Kasahara
Reviewed-by: Masahiko Sawada, Fujii Masao
Discussion: https://postgr.es/m/3FC6C2F2-8A47-44C0-B997-28830B5716D0@amazon.com
2020-12-08 23:59:39 +09:00
Andres Freund 5da871bfa1 jit: Correct parameter type for generated expression evaluation functions.
clang only uses the 'i1' type for scalar booleans, not for pointers to
booleans (as the pointer might be pointing into a larger memory
allocation). Therefore a pointer-to-bool needs to the "storage" boolean.

There's no known case of wrong code generation due to this, but it seems quite
possible that it could cause problems (see e.g. 72559438f9).

Author: Andres Freund
Discussion: https://postgr.es/m/20201207212142.wz5tnbk2jsaqzogb@alap3.anarazel.de
Backpatch: 11-, where jit support was added
2020-12-07 19:34:13 -08:00
Michael Paquier 947789f1f5 Avoid using tuple from syscache for update of pg_database.datfrozenxid
pg_database.datfrozenxid gets updated using an in-place update at the
end of vacuum or autovacuum.  Since 96cdeae, as pg_database has a toast
relation, it is possible for a pg_database tuple to have toast values
if there is a large set of ACLs in place.  In such a case, the in-place
update would fail because of the flattening of the toast values done for
the catcache entry fetched.  Instead of using a copy from the catcache,
this changes the logic to fetch the copy of the tuple by directly
scanning pg_database.

Per the lack of complaints on the matter, no backpatch is done.  Note
that before 96cdeae, attempting to insert such a tuple to pg_database
would cause a "row is too big" error, so the end-of-vacuum problem was
not reachable.

Author: Ashwin Agrawal, Junfeng Yang
Discussion: https://postgr.es/m/DM5PR0501MB38800D9E4605BCA72DD35557CCE10@DM5PR0501MB3880.namprd05.prod.outlook.com
2020-12-08 12:13:19 +09:00
Tom Lane e98c900993 Fix missed step in removal of useless RESULT RTEs in the planner.
Commit 4be058fe9 forgot that the append_rel_list would already be
populated at the time we remove useless result RTEs, and it might contain
PlaceHolderVars that need to be adjusted like the ones in the main parse
tree.  This could lead to "no relation entry for relid N" failures later
on, when the planner tries to do something with an unadjusted PHV.

Per report from Tom Ellis.  Back-patch to v12 where the bug came in.

Discussion: https://postgr.es/m/20201205173056.GF30712@cloudinit-builder
2020-12-05 16:16:13 -05:00
Peter Eisentraut eb93f3a0b6 Convert elog(LOG) calls to ereport() where appropriate
User-visible log messages should go through ereport(), so they are
subject to translation.  Many remaining elog(LOG) calls are really
debugging calls.

Reviewed-by: Alvaro Herrera <alvherre@alvh.no-ip.org>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Noah Misch <noah@leadboat.com>
Discussion: https://www.postgresql.org/message-id/flat/92d6f545-5102-65d8-3c87-489f71ea0a37%40enterprisedb.com
2020-12-04 14:25:23 +01:00
Peter Eisentraut a6964bc1bb Remove unnecessary grammar symbols
Instead of publication_name_list, we can use name_list.  We already
refer to publications everywhere else by the 'name' or 'name_list'
symbols, so this only improves consistency.

Reviewed-by: https://www.postgresql.org/message-id/flat/3e3ccddb-41bd-ecd8-29fe-195e34d9886f%40enterprisedb.com
Discussion: Tom Lane <tgl@sss.pgh.pa.us>
2020-12-04 11:16:26 +01:00
Amit Kapila 8ae4ef4fb0 Remove incorrect assertion in reorderbuffer.c.
We start recording changes in ReorderBufferTXN even before we reach
SNAPBUILD_CONSISTENT state so that if the commit is encountered after
reaching that we should be able to send the changes of the entire transaction.
Now, while recording changes if the reorder buffer memory has exceeded
logical_decoding_work_mem then we can start streaming if it is allowed and
we haven't yet streamed that data. However, we must not allow streaming to
start unless the snapshot has reached SNAPBUILD_CONSISTENT state.

In passing, improve the comments atop ReorderBufferResetTXN to mention the
case when we need to continue streaming after getting an error.

Author: Amit Kapila
Reviewed-by: Dilip Kumar
Discussion: https://postgr.es/m/CAA4eK1KoOH0byboyYY40NBcC7Fe812trwTa+WY3jQF7WQWZbQg@mail.gmail.com
2020-12-04 13:54:50 +05:30
Michael Paquier bd94a9c04e Rename cryptohashes.c to cryptohashfuncs.c
87ae969 has created two new files called cryptohash{_openssl}.c in
src/common/, whose names overlap with the existing backend file called
cryptohashes.c dedicated to the SQL wrappers for SHA2 and MD5.  This
file is renamed to cryptohashfuncs.c to be more consistent with the
surroundings and reduce the confusion with the new cryptohash interface
of src/common/.

Author: Michael Paquier
Reviewed-by: Daniel Gustafsson
Discussion: https://postgr.es/m/X8hHhaQgbMbW+aGU@paquier.xyz
2020-12-04 12:58:44 +09:00
Michael Paquier 4f48a6fbe2 Change SHA2 implementation based on OpenSSL to use EVP digest routines
The use of low-level hash routines is not recommended by upstream
OpenSSL since 2000, and pgcrypto already switched to EVP as of 5ff4a67.
This takes advantage of the refactoring done in 87ae969 that has
introduced the allocation and free routines for cryptographic hashes.

Since 1.1.0, OpenSSL does not publish the contents of the cryptohash
contexts, forcing any consumers to rely on OpenSSL for all allocations.
Hence, the resource owner callback mechanism gains a new set of routines
to track and free cryptohash contexts when using OpenSSL, preventing any
risks of leaks in the backend.  Nothing is needed in the frontend thanks
to the refactoring of 87ae969, and the resowner knowledge is isolated
into cryptohash_openssl.c.

Note that this also fixes a failure with SCRAM authentication when using
FIPS in OpenSSL, but as there have been few complaints about this
problem and as this causes an ABI breakage, no backpatch is done.

Author: Michael Paquier
Reviewed-by: Daniel Gustafsson, Heikki Linnakangas
Discussion: https://postgr.es/m/20200924025314.GE7405@paquier.xyz
Discussion: https://postgr.es/m/20180911030250.GA27115@paquier.xyz
2020-12-04 10:49:23 +09:00
Peter Eisentraut 6114040711 Small code simplifications
strVal() can be used in a couple of places instead of coding the same
thing by hand.
2020-12-03 11:44:13 +01:00
Dean Rasheed 25a9e54d2d Improve estimation of OR clauses using extended statistics.
Formerly we only applied extended statistics to an OR clause as part
of the clauselist_selectivity() code path for an OR clause appearing
in an implicitly-ANDed list of clauses. This meant that it could only
use extended statistics if all sub-clauses of the OR clause were
covered by a single extended statistics object.

Instead, teach clause_selectivity() how to apply extended statistics
to an OR clause by handling its ORed list of sub-clauses in a similar
manner to an implicitly-ANDed list of sub-clauses, but with different
combination rules. This allows one or more extended statistics objects
to be used to estimate all or part of the list of sub-clauses. Any
remaining sub-clauses are then treated as if they are independent.

Additionally, to avoid double-application of extended statistics, this
introduces "extended" versions of clause_selectivity() and
clauselist_selectivity(), which include an option to ignore extended
statistics. This replaces the old clauselist_selectivity_simple()
function which failed to completely ignore extended statistics when
called from the extended statistics code.

A known limitation of the current infrastructure is that an AND clause
under an OR clause is not treated as compatible with extended
statistics (because we don't build RestrictInfos for such sub-AND
clauses). Thus, for example, "(a=1 AND b=1) OR (a=2 AND b=2)" will
currently be treated as two independent AND clauses (each of which may
be estimated using extended statistics), but extended statistics will
not currently be used to account for any possible overlap between
those clauses. Improving that is left as a task for the future.

Original patch by Tomas Vondra, with additional improvements by me.

Discussion: https://postgr.es/m/20200113230008.g67iyk4cs3xbnjju@development
2020-12-03 10:03:49 +00:00
Michael Paquier b5913f6120 Refactor CLUSTER and REINDEX grammar to use DefElem for option lists
This changes CLUSTER and REINDEX so as a parenthesized grammar becomes
possible for options, while unifying the grammar parsing rules for
option lists with the existing ones.

This is a follow-up of the work done in 873ea9e for VACUUM, ANALYZE and
EXPLAIN.  This benefits REINDEX for a potential backend-side filtering
for collatable-sensitive indexes and TABLESPACE, while CLUSTER would
benefit from the latter.

Author: Alexey Kondratov, Justin Pryzby
Discussion: https://postgr.es/m/8a8f5f73-00d3-55f8-7583-1375ca8f6a91@postgrespro.ru
2020-12-03 10:13:21 +09:00
Stephen Frost dc11f31a1a Add GSS information to connection authorized log message
GSS information (if used) such as if the connection was authorized using
GSS or if it was encrypted using GSS, and perhaps most importantly, what
the GSS principal used for the authentication was, is extremely useful
but wasn't being included in the connection authorized log message.

Therefore, add to the connection authorized log message that
information, in a similar manner to how we log SSL information when SSL
is used for a connection.

Author: Vignesh C
Reviewed-by: Bharath Rupireddy
Discussion: https://www.postgresql.org/message-id/CALDaNm2N1385_Ltoo%3DS7VGT-ESu_bRQa-sC1wg6ikrM2L2Z49w%40mail.gmail.com
2020-12-02 14:41:53 -05:00
Fujii Masao 01469241b2 Track total number of WAL records, FPIs and bytes generated in the cluster.
Commit 6b466bf5f2 allowed pg_stat_statements to track the number of
WAL records, full page images and bytes that each statement generated.
Similarly this commit allows us to track the cluster-wide WAL statistics
counters.

New columns wal_records, wal_fpi and wal_bytes are added into the
pg_stat_wal view, and reports the total number of WAL records,
full page images and bytes generated in the , respectively.

Author: Masahiro Ikeda
Reviewed-by: Amit Kapila, Movead Li, Kyotaro Horiguchi, Fujii Masao
Discussion: https://postgr.es/m/35ef960128b90bfae3b3fdf60a3a860f@oss.nttdata.com
2020-12-02 13:00:15 +09:00
Fujii Masao 942305a363 Allow restore_command parameter to be changed with reload.
This commit changes restore_command from PGC_POSTMASTER to PGC_SIGHUP.

As the side effect of this commit, restore_command can be reset to
empty during archive recovery. In this setting, archive recovery
tries to replay only WAL files available in pg_wal directory. This is
the same behavior as when the command that always fails is specified
in restore_command.

Note that restore_command still must be specified (not empty) when
starting archive recovery, even after applying this commit. This is
necessary as the safeguard to prevent users from forgetting to
specify restore_command and starting archive recovery.

Thanks to Peter Eisentraut, Michael Paquier, Andres Freund,
Robert Haas and Anastasia Lubennikova for discussion.

Author: Sergei Kornilov
Reviewed-by: Kyotaro Horiguchi, Fujii Masao
Discussion: https://postgr.es/m/2317771549527294@sas2-985f744271ca.qloud-c.yandex.net
2020-12-02 11:00:15 +09:00
Michael Paquier 87ae9691d2 Move SHA2 routines to a new generic API layer for crypto hashes
Two new routines to allocate a hash context and to free it are created,
as these become necessary for the goal behind this refactoring: switch
the all cryptohash implementations for OpenSSL to use EVP (for FIPS and
also because upstream does not recommend the use of low-level cryptohash
functions for 20 years).  Note that OpenSSL hides the internals of
cryptohash contexts since 1.1.0, so it is necessary to leave the
allocation to OpenSSL itself, explaining the need for those two new
routines.  This part is going to require more work to properly track
hash contexts with resource owners, but this not introduced here.
Still, this refactoring makes the move possible.

This reduces the number of routines for all SHA2 implementations from
twelve (SHA{224,256,386,512} with init, update and final calls) to five
(create, free, init, update and final calls) by incorporating the hash
type directly into the hash context data.

The new cryptohash routines are moved to a new file, called cryptohash.c
for the fallback implementations, with SHA2 specifics becoming a part
internal to src/common/.  OpenSSL specifics are part of
cryptohash_openssl.c.  This infrastructure is usable for more hash
types, like MD5 or HMAC.

Any code paths using the internal SHA2 routines are adapted to report
correctly errors, which are most of the changes of this commit.  The
zones mostly impacted are checksum manifests, libpq and SCRAM.

Note that e21cbb4 was a first attempt to switch SHA2 to EVP, but it
lacked the refactoring needed for libpq, as done here.

This patch has been tested on Linux and Windows, with and without
OpenSSL, and down to 1.0.1, the oldest version supported on HEAD.

Author: Michael Paquier
Reviewed-by: Daniel Gustafsson
Discussion: https://postgr.es/m/20200924025314.GE7405@paquier.xyz
2020-12-02 10:37:20 +09:00
Tom Lane f7f83a55bf Ensure that expandTableLikeClause() re-examines the same table.
As it stood, expandTableLikeClause() re-did the same relation_openrv
call that transformTableLikeClause() had done.  However there are
scenarios where this would not find the same table as expected.
We hold lock on the LIKE source table, so it can't be renamed or
dropped, but another table could appear before it in the search path.
This explains the odd behavior reported in bug #16758 when cloning a
table as a temp table of the same name.  This case worked as expected
before commit 502898192 introduced the need to open the source table
twice, so we should fix it.

To make really sure we get the same table, let's re-open it by OID not
name.  That requires adding an OID field to struct TableLikeClause,
which is a little nervous-making from an ABI standpoint, but as long
as it's at the end I don't think there's any serious risk.

Per bug #16758 from Marc Boeren.  Like the previous patch,
back-patch to all supported branches.

Discussion: https://postgr.es/m/16758-840e84a6cfab276d@postgresql.org
2020-12-01 14:02:27 -05:00
Alvaro Herrera 677f74e5bb
Avoid memcpy() with a NULL source pointer and count == 0
When memcpy() is called on a pointer, the compiler is entitled to assume
that the pointer is not null, which can lead to optimizing nearby code
in potentially undesirable ways.  We still want such optimizations
(gcc's -fdelete-null-pointer-checks) in cases where they're valid.

Related: commit 13bba02271.

Backpatch to pg11, where this particular instance appeared.

Reported-by: Ranier Vilela <ranier.vf@gmail.com>
Reported-by: Zhihong Yu <zyu@yugabyte.com>
Discussion: https://postgr.es/m/CAEudQApUndmQkr5fLrCKXQ7+ib44i7S+Kk93pyVThS85PnG3bQ@mail.gmail.com
Discussion: https://postgr.es/m/CALNJ-vSdhwSM5f4tnNn1cdLHvXMVe_S+V3nR5GwNrmCPNB2VtQ@mail.gmail.com
2020-12-01 11:46:56 -03:00
Thomas Munro 57faaf376e Use truncate(2) where appropriate.
When truncating files by name, use truncate(2).  Windows hasn't got it,
so keep our previous coding based on ftruncate(2) as a fallback.

Discussion: https://postgr.es/m/16663-fe97ccf9932fc800%40postgresql.org
2020-12-01 15:42:22 +13:00
Thomas Munro 9f35f94373 Free disk space for dropped relations on commit.
When committing a transaction that dropped a relation, we previously
truncated only the first segment file to free up disk space (the one
that won't be unlinked until the next checkpoint).

Truncate higher numbered segments too, even though we unlink them on
commit.  This frees the disk space immediately, even if other backends
have open file descriptors and might take a long time to get around to
handling shared invalidation events and closing them.  Also extend the
same behavior to the first segment, in recovery.

Back-patch to all supported releases.

Bug: #16663
Reported-by: Denis Patron <denis.patron@previnet.it>
Reviewed-by: Pavel Borisov <pashkin.elfe@gmail.com>
Reviewed-by: Neil Chen <carpenter.nail.cz@gmail.com>
Reviewed-by: David Zhang <david.zhang@highgo.ca>
Discussion: https://postgr.es/m/16663-fe97ccf9932fc800%40postgresql.org
2020-12-01 13:21:03 +13:00
Tom Lane 8286223f3d Fix missing outfuncs.c support for IncrementalSortPath.
For debugging purposes, Path nodes are supposed to have outfuncs
support, but this was overlooked in the original incremental sort patch.

While at it, clean up a couple other minor oversights, as well as
bizarre choice of return type for create_incremental_sort_path().
(All the existing callers just cast it to "Path *" immediately, so
they don't care, but some future caller might care.)

outfuncs.c fix by Zhijie Hou, the rest by me

Discussion: https://postgr.es/m/324c4d81d8134117972a5b1f6cdf9560@G08CNEXMBPEKD05.g08.fujitsu.local
2020-11-30 16:33:09 -05:00
Tom Lane 275b3411d9 Prevent parallel index build in a standalone backend.
This can't work if there's no postmaster, and indeed the code got an
assertion failure trying.  There should be a check on IsUnderPostmaster
gating the use of parallelism, as the planner has for ordinary
parallel queries.

Commit 40d964ec9 got this right, so follow its model of checking
IsUnderPostmaster at the same place where we check for
max_parallel_maintenance_workers == 0.  In general, new code
implementing parallel utility operations should do the same.

Report and patch by Yulin Pei, cosmetically adjusted by me.
Back-patch to v11 where this code came in.

Discussion: https://postgr.es/m/HK0PR01MB22747D839F77142D7E76A45DF4F50@HK0PR01MB2274.apcprd01.prod.exchangelabs.com
2020-11-30 14:38:00 -05:00
Tom Lane b1738ff6ab Fix miscomputation of direct_lateral_relids for join relations.
If a PlaceHolderVar is to be evaluated at a join relation, but
its value is only needed there and not at higher levels, we neglected
to update the joinrel's direct_lateral_relids to include the PHV's
source rel.  This causes problems because join_is_legal() then won't
allow joining the joinrel to the PHV's source rel at all, leading
to "failed to build any N-way joins" planner failures.

Per report from Andreas Seltenreich.  Back-patch to 9.5
where the problem originated.

Discussion: https://postgr.es/m/87blfgqa4t.fsf@aurora.ydns.eu
2020-11-30 12:22:43 -05:00
Michael Paquier 873ea9ee69 Refactor parsing rules for option lists of EXPLAIN, VACUUM and ANALYZE
Those three commands have been using the same grammar rules to handle a
a list of parenthesized options.  This refactors the code so as they use
the same parsing rules, shaving some code.  A future commit will make
use of those option parsing rules for more utility commands, like
REINDEX and CLUSTER.

Author: Alexey Kondratov, Justin Pryzby
Discussion: https://postgr.es/m/8a8f5f73-00d3-55f8-7583-1375ca8f6a91@postgrespro.ru
2020-11-30 20:27:37 +09:00
Heikki Linnakangas 2bc588798b Remove leftover comments, left behind by removal of WITH OIDS.
Author: Amit Langote
Discussion: https://www.postgresql.org/message-id/CA%2BHiwqGaRoF3XrhPW-Y7P%2BG7bKo84Z_h%3DkQHvMh-80%3Dav3wmOw%40mail.gmail.com
2020-11-30 10:26:43 +02:00
Fujii Masao 98e2d58d66 Improve log message about termination of background workers.
Previously the shutdown of a background worker that uses die() as
SIGTERM signal handler produced the log message "terminating
connection due to administrator command". This log message was
confusing because a background worker is not a connection.
This commit improves that log message to "terminating background
worker XXX due to administrator command" (XXX is replaced with
the name of the background worker). This is the same log message
as another SIGTERM signal handler bgworker_die() for a background
worker reports.

Author: Bharath Rupireddy
Reviewed-by: Fujii Masao
Discussion: https://postgr.es/m/3f292fbb-f155-9a01-7cb2-7ccc9007ab3f@oss.nttdata.com
2020-11-30 11:05:19 +09:00
Tom Lane 9c83b54a9c Fix a recently-introduced race condition in LISTEN/NOTIFY handling.
Commit 566372b3d fixed some race conditions involving concurrent
SimpleLruTruncate calls, but it introduced new ones in async.c.
A newly-listening backend could attempt to read Notify SLRU pages that
were in process of being truncated, possibly causing an error.  Also,
the QUEUE_TAIL pointer could become set to a value that's not equal to
the queue position of any backend.  While that's fairly harmless in
v13 and up (thanks to commit 51004c717), in older branches it resulted
in near-permanent disabling of the queue truncation logic, so that
continued use of NOTIFY led to queue-fill warnings and eventual
inability to send any more notifies.  (A server restart is enough to
make that go away, but it's still pretty unpleasant.)

The core of the problem is confusion about whether QUEUE_TAIL
represents the "logical" tail of the queue (i.e., the oldest
still-interesting data) or the "physical" tail (the oldest data we've
not yet truncated away).  To fix, split that into two variables.
QUEUE_TAIL regains its definition as the logical tail, and we
introduce a new variable to track the oldest un-truncated page.

Per report from Mikael Gustavsson.  Like the previous patch,
back-patch to all supported branches.

Discussion: https://postgr.es/m/1b8561412e8a4f038d7a491c8b922788@smhi.se
2020-11-28 14:03:40 -05:00
Fujii Masao 3df51ca8b3 Fix CLUSTER progress reporting of number of blocks scanned.
Previously pg_stat_progress_cluster view reported the current block
number in heap scan as the number of heap blocks scanned (i.e.,
heap_blks_scanned). This reported number could be incorrect when
synchronize_seqscans is enabled, because it allowed the heap scan to
start at block in middle. This could result in wraparounds in the
heap_blks_scanned column when the heap scan wrapped around.
This commit fixes the bug by calculating the number of blocks from
the block that the heap scan starts at to the current block in scan,
and reporting that number in the heap_blks_scanned column.

Also, in pg_stat_progress_cluster view, previously heap_blks_scanned
could not reach heap_blks_total at the end of heap scan phase
if the last pages scanned were empty. This commit fixes the bug by
manually updating heap_blks_scanned to the same value as
heap_blks_total when the heap scan phase finishes.

Back-patch to v12 where pg_stat_progress_cluster view was introduced.

Reported-by: Matthias van de Meent
Author: Matthias van de Meent
Reviewed-by: Fujii Masao
Discussion: https://postgr.es/m/CAEze2WjCBWSGkVfYag001Rc4+-nNLDpWM7QbyD6yPvuhKs-gYQ@mail.gmail.com
2020-11-27 20:16:44 +09:00
Amit Kapila 0926e96c49 Fix replication of in-progress transactions in tablesync worker.
Tablesync worker runs under a single transaction but in streaming mode, we
were committing the transaction on stream_stop, stream_abort, and
stream_commit. We need to avoid committing the transaction in a streaming
mode in tablesync worker.

In passing move the call to process_syncing_tables in
apply_handle_stream_commit after clean up of stream files. This will
allow clean up of files to happen before the exit of tablesync worker
which would otherwise be handled by one of the proc exit routines.

Author: Dilip Kumar
Reviewed-by: Amit Kapila and Peter Smith
Tested-by: Peter Smith
Discussion: https://postgr.es/m/CAHut+Pt4PyKQCwqzQ=EFF=bpKKJD7XKt_S23F6L20ayQNxg77A@mail.gmail.com
2020-11-27 07:43:34 +05:30
Alvaro Herrera dcfff74fb1
Restore lock level to update statusFlags
Reverts 27838981be (some comments are kept).  Per discussion, it does
not seem safe to relax the lock level used for this; in order for it to
be safe, there would have to be memory barriers between the point we set
the flag and the point we set the trasaction Xid, which perhaps would
not be so bad; but there would also have to be barriers at the readers'
side, which from a performance perspective might be bad.

Now maybe this analysis is wrong and it *is* safe for some reason, but
proof of that is not trivial.

Discussion: https://postgr.es/m/20201118190928.vnztes7c2sldu43a@alap3.anarazel.de
2020-11-26 12:30:48 -03:00
Amit Kapila f3a8f73ec2 Use Enums for logical replication message types at more places.
Commit 644f0d7cc9 added logical replication message type enums to use
instead of character literals but some char substitutions were overlooked.

Author: Peter Smith
Reviewed-by: Amit Kapila
Discussion: https://postgr.es/m/CAHut+PsTG=Vrv8hgrvOnAvCNR21jhqMdPk2n0a1uJPoW0p+UfQ@mail.gmail.com
2020-11-26 09:21:14 +05:30
Alvaro Herrera c98763bf51
Avoid spurious waits in concurrent indexing
In the various waiting phases of CREATE INDEX CONCURRENTLY (CIC) and
REINDEX CONCURRENTLY (RC), we wait for other processes to release their
snapshots; this is necessary in general for correctness.  However,
processes doing CIC in other tables cannot possibly affect CIC or RC
done in "this" table, so we don't need to wait for those.  This commit
adds a flag in MyProc->statusFlags to indicate that the current process
is doing CIC, so that other processes doing CIC or RC can ignore it when
waiting.

Note that this logic is only valid if the index does not access other
tables.  For simplicity we avoid setting the flag if the index has a
column that's an expression, or has a WHERE predicate.  (It is possible
to have expressional or partial indexes that do not access other tables,
but figuring that out would require more work.)

This flag can potentially also be used by processes doing REINDEX
CONCURRENTLY to be skipped; and by VACUUM to ignore processes in CIC or
RC for the purposes of computing an Xmin.  That's left for future
commits.

Author: Álvaro Herrera <alvherre@alvh.no-ip.org>
Author: Dimitry Dolgov <9erthalion6@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/20200810233815.GA18970@alvherre.pgsql
2020-11-25 18:22:57 -03:00
Tom Lane 2432b1a040 Avoid spamming the client with multiple ParameterStatus messages.
Up to now, we sent a ParameterStatus message to the client immediately
upon any change in the active value of any GUC_REPORT variable.  This
was only barely okay when the feature was designed; now that we have
things like function SET clauses, there are very plausible use-cases
where a GUC_REPORT variable might change many times within a query
--- and even end up back at its original value, perhaps.  Fortunately
most of our GUC_REPORT variables are unlikely to be changed often;
but there are proposals in play to enlarge that set, or even make it
user-configurable.

Hence, let's fix things to not generate more than one ParameterStatus
message per variable per query, and to not send any message at all
unless the end-of-query value is different from what we last reported.

Discussion: https://postgr.es/m/5708.1601145259@sss.pgh.pa.us
2020-11-25 11:40:44 -05:00
Peter Eisentraut d5d91acdcc Make error hint from bind() failure more accurate
The hint "Is another postmaster already running ..." should only be
printed for errors that are really about something else already using
the address.  In other cases it is misleading.  So only show that hint
if errno == EADDRINUSE.

Also, since Unix-domain sockets in the file-system namespace never
report EADDRINUSE for an existing file (they would just overwrite it),
the part of the hint saying "If not, remove socket file \"%s\" and
retry." can never happen, so remove it.  Unix-domain sockets in the
abstract namespace can report EADDRINUSE, but in that case there is no
file to remove, so the hint doesn't work there either.

Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://www.postgresql.org/message-id/flat/6dee8574-b0ad-fc49-9c8c-2edc796f0033@2ndquadrant.com
2020-11-25 08:33:57 +01:00
Peter Eisentraut c9f0624bc2 Add support for abstract Unix-domain sockets
This is a variant of the normal Unix-domain sockets that don't use the
file system but a separate "abstract" namespace.  At the user
interface, such sockets are represented by names starting with "@".
Supported on Linux and Windows right now.

Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://www.postgresql.org/message-id/flat/6dee8574-b0ad-fc49-9c8c-2edc796f0033@2ndquadrant.com
2020-11-25 08:33:57 +01:00
Thomas Munro a7e65dc88b Fix WaitLatch(NULL) on Windows.
Further to commit 733fa9aa, on Windows when a latch is triggered but we
aren't currently waiting for it, we need to locate the latch's HANDLE
rather than calling ResetEvent(NULL).

Author: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Reported-by: Ranier Vilela <ranier.vf@gmail.com>
Discussion: https://postgr.es/m/CAEudQArTPi1YBc%2Bn1fo0Asy3QBFhVjp_QgyKG-8yksVn%2ByRTiw%40mail.gmail.com
2020-11-25 17:55:49 +13:00
Amit Kapila 805b816305 Remove obsolete comment atop ri_PlanCheck.
Commit 5b7ba75f7f removed the unused parameter but forgot to update the
nearby comments.

Author: Li Japin
Backpatch-through: 13, where it was introduced
Discussion: https://postgr.es/m/0E2F62A2-B2F1-4052-83AE-F0BEC8A75789@hotmail.com
2020-11-25 09:14:45 +05:30
Michael Paquier 7b94e99960 Remove catalog function currtid()
currtid() and currtid2() are an undocumented set of functions whose sole
known user is the Postgres ODBC driver, able to retrieve the latest TID
version for a tuple given by the caller of those functions.

As used by Postgres ODBC, currtid() is a shortcut able to retrieve the
last TID loaded into a backend by passing an OID of 0 (magic value)
after a tuple insertion.  This is removed in this commit, as it became
obsolete after the driver began using "RETURNING ctid" with inserts, a
clause supported since Postgres 8.2 (using RETURNING is better for
performance anyway as it reduces the number of round-trips to the
backend).

currtid2() is still used by the driver, so this remains around for now.
Note that this function is kept in its original shape for backward
compatibility reasons.

Per discussion with many people, including Andres Freund, Peter
Eisentraut, Álvaro Herrera, Hiroshi Inoue, Tom Lane and myself.

Bump catalog version.

Discussion: https://postgr.es/m/20200603021448.GB89559@paquier.xyz
2020-11-25 12:18:26 +09:00
Andrew Gierth 660b89928d Properly check index mark/restore in ExecSupportsMarkRestore.
Previously this code assumed that all IndexScan nodes supported
mark/restore, which is not true since it depends on optional index AM
support functions. This could lead to errors about missing support
functions in rare edge cases of mergejoins with no sort keys, where an
unordered non-btree index scan was placed on the inner path without a
protecting Materialize node. (Normally, the fact that merge join
requires ordered input would avoid this error.)

Backpatch all the way since this bug is ancient.

Per report from Eugen Konkov on irc.

Discussion: https://postgr.es/m/87o8jn50be.fsf@news-spur.riddles.org.uk
2020-11-24 21:58:32 +00:00
Tom Lane ec05bafdbb Put "inline" marker on declarations of inline functions.
I'm having a hard time telling whether the letter of the C standard
requires this, but we do have a couple of buildfarm members that
throw warnings when this is not done.  Oversight in c532d15dd.
2020-11-24 15:43:01 -05:00
Heikki Linnakangas 0a2bc5d61e Move per-agg and per-trans duplicate finding to the planner.
This has the advantage that the cost estimates for aggregates can count
the number of calls to transition and final functions correctly.

Bump catalog version, because views can contain Aggrefs.

Reviewed-by: Andres Freund
Discussion: https://www.postgresql.org/message-id/b2e3536b-1dbc-8303-c97e-89cb0b4a9a48%40iki.fi
2020-11-24 10:45:00 +02:00
Michael Paquier d03d7549b2 Use macros instead of hardcoded offsets for LWLock initialization
This makes the code slightly easier to follow, as the initialization
relies on an offset that overlapped with an equivalent set of macros
defined, which are used in other places already.

Author: Japin Li
Discussion: https://postgr.es/m/MEYP282MB1669FB410006758402F2C3A2B6E00@MEYP282MB1669.AUSP282.PROD.OUTLOOK.COM
2020-11-24 12:39:58 +09:00
Tom Lane 789b938bf2 Centralize logic for skipping useless ereport/elog calls.
While ereport() and elog() themselves are quite cheap when the
error message level is too low to be printed, some places need to do
substantial work before they can call those macros at all.  To allow
optimizing away such setup work when nothing is to be printed, make
elog.c export a new function message_level_is_interesting(elevel)
that reports whether ereport/elog will do anything.  Make use of that
in various places that had ad-hoc direct tests of log_min_messages etc.
Also teach ProcSleep to use it to avoid some work.  (There may well
be other places that could usefully use this; I didn't search hard.)

Within elog.c, refactor a little bit to avoid having duplicate copies
of the policy-setting logic.  When that code was written, we weren't
relying on the availability of inline functions; so it had some
duplications in the name of efficiency, which I got rid of.

Alvaro Herrera and Tom Lane

Discussion: https://postgr.es/m/129515.1606166429@sss.pgh.pa.us
2020-11-23 19:10:46 -05:00
David Rowley 913ec71d68 Improve compiler code layout in elog/ereport ERROR calls
Here we use a bit of preprocessor trickery to coax supporting compilers
into laying out their generated code so that the code that's in the same
branch as elog(ERROR)/ereport(ERROR) calls is moved away from the hot
path.  Effectively, this reduces the size of the hot code meaning that it
can sit on fewer cache lines.

Performance improvements of between 10-15% have been seen on highly CPU
bound workloads using pgbench's TPC-b benchmark.

What's achieved here is very similar to putting the error condition inside
an unlikely() macro. For example;

if (unlikely(x < 0))
    elog(ERROR, "invalid x value");

now there's no need to make use of unlikely() here as the common macro
used by elog and ereport will now see that elevel is >= ERROR and make use
of a pg_attribute_cold marked version of errstart().

When elevel < ERROR or if it cannot be determined to be constant, the
original behavior is maintained.

Author: David Rowley
Reviewed-by: Andres Freund, Peter Eisentraut
Discussion: https://postgr.es/m/CAApHDvrVpasrEzLL2er7p9iwZFZ%3DJj6WisePcFeunwfrV0js_A%40mail.gmail.com
2020-11-24 12:04:42 +13:00
Alvaro Herrera 450c8230b1
Don't hold ProcArrayLock longer than needed in rare cases
While cancelling an autovacuum worker, we hold ProcArrayLock while
formatting a debugging log string.  We can make this shorter by saving
the data we need to produce the message and doing the formatting outside
the locked region.

This isn't terribly critical, as it only occurs pretty rarely: when a
backend runs deadlock detection and it happens to be blocked by a
autovacuum running autovacuum.  Still, there's no need to cause a hiccup
in ProcArrayLock processing, which can be very high-traffic in some
cases.

While at it, rework code so that we only print the string when it is
really going to be used, as suggested by Michael Paquier.

Discussion: https://postgr.es/m/20201118214127.GA3179@alvherre.pgsql
Reviewed-by: Michael Paquier <michael@paquier.xyz>
2020-11-23 18:55:23 -03:00
Tom Lane 0cc9932788 Rename the "point is strictly above/below point" comparison operators.
Historically these were called >^ and <^, but that is inconsistent
with the similar box, polygon, and circle operators, which are named
|>> and <<| respectively.  Worse, the >^ and <^ names are used for
*not* strict above/below tests for the box type.

Hence, invent new operators following the more common naming.  The
old operators remain available for now, and are still accepted by
the relevant index opclasses too.  But there's a deprecation notice,
so maybe we can get rid of them someday.

Emre Hasegeli, reviewed by Pavel Borisov

Discussion: https://postgr.es/m/24348.1587444160@sss.pgh.pa.us
2020-11-23 11:38:37 -05:00
Tom Lane d36228a9fc Improve wording of two error messages related to generated columns.
Clarify that you can "insert" into a generated column as long as what
you're inserting is a DEFAULT placeholder.

Also, use ERRCODE_GENERATED_ALWAYS in place of ERRCODE_SYNTAX_ERROR;
there doesn't seem to be any reason to use the less specific errcode.

Discussion: https://postgr.es/m/9q0sgcr416t.fsf@gmx.us
2020-11-23 11:15:12 -05:00
Alvaro Herrera fe05129155
Make some sanity-check elogs more verbose
A few sanity checks in funcapi.c were not mentioning all the possible
clauses for failure, confusing developers who fat-fingered catalog data
additions.  Make the errors more detailed to avoid wasting time in
pinpointing mistakes.

Per complaint from Craig Ringer.
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/CAMsr+YH7Kd87A3cU5m_wKo46HPQ46zFv5wesFNL0YWxkGhGv3g@mail.gmail.com
2020-11-23 13:10:03 -03:00
Heikki Linnakangas 68b1a4877e Fix a few comments that referred to copy.c.
Missed these in the previous commit.
2020-11-23 11:36:13 +02:00
Heikki Linnakangas c532d15ddd Split copy.c into four files.
Copy.c has grown really large. Split it into more manageable parts:

- copy.c now contains only a few functions that are common to COPY FROM
  and COPY TO.

- copyto.c contains code for COPY TO.

- copyfrom.c contains code for initializing COPY FROM, and inserting the
  tuples to the correct table.

- copyfromparse.c contains code for reading from the client/file/program,
  and parsing the input text/CSV/binary format into tuples.

All of these parts are fairly complicated, and fairly independent of each
other. There is a patch being discussed to implement parallel COPY FROM,
which will add a lot of new code to the COPY FROM path, and another patch
which would allow INSERTs to use the same multi-insert machinery as COPY
FROM, both of which will require refactoring that code. With those two
patches, there's going to be a lot of code churn in copy.c anyway, so now
seems like a good time to do this refactoring.

The CopyStateData struct is also split. All the formatting options, like
FORMAT, QUOTE, ESCAPE, are put in a new CopyFormatOption struct, which
is used by both COPY FROM and TO. Other state data are kept in separate
CopyFromStateData and CopyToStateData structs.

Reviewed-by: Soumyadeep Chakraborty, Erik Rijkers, Vignesh C, Andres Freund
Discussion: https://www.postgresql.org/message-id/8e15b560-f387-7acc-ac90-763986617bfb%40iki.fi
2020-11-23 10:50:50 +02:00
Tom Lane 17958972fe Allow a multi-row INSERT to specify DEFAULTs for a generated column.
One can say "INSERT INTO tab(generated_col) VALUES (DEFAULT)" and not
draw an error.  But the equivalent case for a multi-row VALUES list
always threw an error, even if one properly said DEFAULT in each row.
Fix that.  While here, improve the test cases for nearby logic about
OVERRIDING SYSTEM/USER values.

Dean Rasheed

Discussion: https://postgr.es/m/9q0sgcr416t.fsf@gmx.us
2020-11-22 15:48:32 -05:00
Tom Lane 9fe649ea29 In geo_ops.c, represent infinite slope as Infinity, not DBL_MAX.
Since we're assuming IEEE floats these days, there seems little
reason not to do this.  It has the advantage that when the slope is
computed as infinite due to the presence of Inf coordinates, we get
saner behavior than before from line_construct(), and thence also
in some dependent operations such as finding the closest point.

Also fix line_construct() to special-case slope zero.  The previous
coding got the right answer in most cases, but it could compute
C as NaN when the point has Inf coordinates.

Discussion: https://postgr.es/m/CAGf+fX70rWFOk5cd00uMfa__0yP+vtQg5ck7c2Onb-Yczp0URA@mail.gmail.com
2020-11-21 17:24:07 -05:00
Tom Lane 8597a48d01 Fix FPeq() and friends to get the right answers for infinities.
"FPeq(infinity, infinity)" returned false, on account of getting NaN
when it subtracts the two inputs.  Fix that by adding a separate
check for exact equality.

FPle() and FPge() similarly got the wrong answer for two like-signed
infinities.  In those cases, we can just rearrange the comparisons
to avoid potentially subtracting infinities.

While the sibling functions FPne() etc accidentally gave the right
answers even with the internal NaN results, it seems best to make
similar adjustments to them to avoid depending on this.

FPeq() has to be converted to an inline function to avoid double
evaluations of its arguments, and I did the same for the others
just for consistency.

In passing, make the handling of NaN cases in line_eq() and
point_eq_point() simpler and easier to reason about, and perhaps
faster.

This results in just one visible regression test change: slope()
now gives DBL_MAX for two inputs of (inf,1e300), which is consistent
with what it does for (1e300,inf), so that seems like a bug fix.

Discussion: https://postgr.es/m/CAGf+fX70rWFOk5cd00uMfa__0yP+vtQg5ck7c2Onb-Yczp0URA@mail.gmail.com
2020-11-21 16:46:43 -05:00
Michael Paquier 878f3a19c6 Remove INSERT privilege check at table creation of CTAS and matview
As per discussion with Peter Eisentraunt, the SQL standard specifies
that any tuple insertion done as part of CREATE TABLE AS happens without
any extra ACL check, so it makes little sense to keep a check for INSERT
privileges when using WITH DATA.  Materialized views are not part of the
standard, but similarly, this check can be confusing as this refers to
an access check on a table created within the same command as the one
that would insert data into this table.

This commit removes the INSERT privilege check for WITH DATA, the
default, that 846005e removed partially, but only for WITH NO DATA.

Author: Bharath Rupireddy
Discussion: https://postgr.es/m/d049c272-9a47-d783-46b0-46665b011598@enterprisedb.com
2020-11-21 19:45:30 +09:00
Peter Eisentraut b5acf10cfc Replace a macro by a function
Using a macro is ugly and not justified here.

Discussion: https://www.postgresql.org/message-id/flat/4ad69a4c-cc9b-0dfe-0352-8b1b0cd36c7b@2ndquadrant.com
2020-11-20 11:25:25 +01:00
Thomas Munro ca051d8b10 Add collation versions for FreeBSD.
On FreeBSD 13, use querylocale() to read the current version of libc
collations.  Similar to commits 352f6f2d for Windows and d5ac14f9 for
GNU/Linux.

Discussion: https://postgr.es/m/CAEepm%3D0uEQCpfq_%2BLYFBdArCe4Ot98t1aR4eYiYTe%3DyavQygiQ%40mail.gmail.com
2020-11-20 21:49:57 +13:00
Fujii Masao a4ef0329c2 Emit log when restore_command succeeds but archived file faills to be restored.
Previously, when restore_command claimed to succeed but failed to restore
the file with the right name, for example, due to mis-configuration of
restore_command, no log message was reported. Then the recovery failed
later with an error message not directly related to the issue.

This commit changes the recovery so that a log message is emitted in
this error case. This would enable us to investigate what happened in
this case more easily.

Author: Jeff Janes, Fujii Masao
Reviewed-by: Pavel Borisov, Kyotaro Horiguchi
Discussion: https://postgr.es/m/CAMkU=1xkFs3Omp4JR4wMYWdam_KLuj6LXnTYfU8u3T0h=PLLMQ@mail.gmail.com
2020-11-20 15:42:47 +09:00
Tom Lane 926fa801ac Remove undocumented IS [NOT] OF syntax.
This feature was added a long time ago, in 7c1e67bd5 and eb121ba2c,
but never documented in any user-facing way.  (Documentation added
in 6126d3e70 was commented out almost immediately, in 8272fc3f7.)
That's because, while this syntax is defined by SQL:99, our
implementation is only vaguely related to the standard's semantics.
The standard appears to intend a run-time not parse-time test, and
it definitely intends that the test should understand subtype
relationships.

No one has stepped up to fix that in the intervening years, but
people keep coming across the code and asking why it's not documented.
Let's just get rid of it: if anyone ever wants to make it work per
spec, they can easily recover whatever parts of this code are still
of value from our git history.

If there's anyone out there who's actually using this despite its
undocumented status, they can switch to using pg_typeof() instead,
eg. "pg_typeof(something) = 'mytype'::regtype".  That gives
essentially the same semantics as what our IS OF code did.
(We didn't have that function last time this was discussed, or
we would have ripped out IS OF then.)

Discussion: https://postgr.es/m/CAKFQuwZ2pTc-DSkOiTfjauqLYkNREeNZvWmeg12Q-_69D+sYZA@mail.gmail.com
Discussion: https://postgr.es/m/BAY20-F23E9F2B4DAB3E4E88D3623F99B0@phx.gbl
Discussion: https://postgr.es/m/3E7CF81D.1000203@joeconway.com
2020-11-19 17:39:39 -05:00
Tom Lane 97390fe8a6 Further fixes for CREATE TABLE LIKE: cope with self-referential FKs.
Commit 502898192 was too careless about the order of execution of the
additional ALTER TABLE operations generated by expandTableLikeClause.
It just stuck them all at the end, which seems okay for most purposes.
But it falls down in the case where LIKE is importing a primary key
or unique index and the outer CREATE TABLE includes a FOREIGN KEY
constraint that needs to depend on that index.  Weird as that is,
it used to work, so we ought to keep it working.

To fix, make parse_utilcmd.c insert LIKE clauses between index-creation
and FK-creation commands in the transformed list of commands, and change
utility.c so that the commands generated by expandTableLikeClause are
executed immediately not at the end.  One could imagine scenarios where
this wouldn't work either; but currently expandTableLikeClause only
makes column default expressions, CHECK constraints, and indexes, and
this ordering seems fine for those.

Per bug #16730 from Sofoklis Papasofokli.  Like the previous patch,
back-patch to all supported branches.

Discussion: https://postgr.es/m/16730-b902f7e6e0276b30@postgresql.org
2020-11-19 15:03:17 -05:00
Peter Eisentraut 01e658fa74 Hash support for row types
Add hash functions for the record type as well as a hash operator
family and operator class for the record type.  This enables all the
hash functionality for the record type such as hash-based plans for
UNION/INTERSECT/EXCEPT DISTINCT, recursive queries using UNION
DISTINCT, hash joins, and hash partitioning.

Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://www.postgresql.org/message-id/flat/38eccd35-4e2d-6767-1b3c-dada1eac3124%402ndquadrant.com
2020-11-19 09:32:47 +01:00
Thomas Munro 7888b09994 Add BarrierArriveAndDetachExceptLast().
Provide a way for one process to continue the remaining phases of a
(previously) parallel computation alone.  Later patches will use this to
extend Parallel Hash Join.

Author: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Thomas Munro <thomas.munro@gmail.com>
Discussion: https://postgr.es/m/CA%2BhUKG%2BA6ftXPz4oe92%2Bx8Er%2BxpGZqto70-Q_ERwRaSyA%3DafNg%40mail.gmail.com
2020-11-19 18:13:46 +13:00
Alvaro Herrera 27838981be
Relax lock level for setting PGPROC->statusFlags
We don't actually need a lock to set PGPROC->statusFlags itself; what we
do need is a shared lock on either XidGenLock or ProcArrayLock in order to
ensure MyProc->pgxactoff keeps still while we modify the mirror array in
ProcGlobal->statusFlags.  Some places were using an exclusive lock for
that, which is excessive.  Relax those to use shared lock only.

procarray.c has a couple of places with somewhat brittle assumptions
about PGPROC changes: ProcArrayEndTransaction uses only shared lock, so
it's permissible to change MyProc only.  On the other hand,
ProcArrayEndTransactionInternal also changes other procs, so it must
hold exclusive lock.  Add asserts to ensure those assumptions continue
to hold.

Author: Álvaro Herrera <alvherre@alvh.no-ip.org>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/20201117155501.GA13805@alvherre.pgsql
2020-11-18 13:24:22 -03:00
Heikki Linnakangas 2cccb627f1 Skip allocating hash table in EXPLAIN-only mode.
Author: Alexey Bashtanov
Discussion: https://www.postgresql.org/message-id/36823f65-050d-ae24-aa4d-a37726998240%40imap.cc
2020-11-18 12:39:15 +02:00
Peter Geoghegan cf2acaf4dc Deprecate nbtree's BTP_HAS_GARBAGE flag.
Streamline handling of the various strategies that we have to avoid a
page split in nbtinsert.c.  When it looks like a leaf page is about to
overflow, we now perform deleting LP_DEAD items and deduplication in one
central place.  This greatly simplifies _bt_findinsertloc().

This has an independently useful consequence: nbtree no longer relies on
the BTP_HAS_GARBAGE page level flag/hint for anything important.  We
still set and unset the flag in the same way as before, but it's no
longer treated as a gating condition when considering if we should check
for already-set LP_DEAD bits.  This happens at the point where the page
looks like it might have to be split anyway, so simply checking the
LP_DEAD bits in passing is practically free.  This avoids missing
LP_DEAD bits just because the page-level hint is unset, which is
probably reasonably common (e.g. it happens when VACUUM unsets the
page-level flag without actually removing index tuples whose LP_DEAD-bit
was set recently, after the VACUUM operation began but before it reached
the leaf page in question).

Note that this isn't a big behavioral change compared to PostgreSQL 13.
We were already checking for set LP_DEAD bits regardless of whether the
BTP_HAS_GARBAGE page level flag was set before we considered doing a
deduplication pass.  This commit only goes slightly further by doing the
same check for all indexes, even indexes where deduplication won't be
performed.

We don't completely remove the BTP_HAS_GARBAGE flag.  We still rely on
it as a gating condition with pg_upgrade'd indexes from before B-tree
version 4/PostgreSQL 12.  That makes sense because we sometimes have to
make a choice among pages full of duplicates when inserting a tuple with
pre version 4 indexes.  It probably still pays to avoid accessing the
line pointer array of a page there, since it won't yet be clear whether
we'll insert on to the page in question at all, let alone split it as a
result.

Author: Peter Geoghegan <pg@bowt.ie>
Reviewed-By: Victor Yegorov <vyegorov@gmail.com>
Discussion: https://postgr.es/m/CAH2-Wz%3DYpc1PDdk8OVJDChGJBjT06%3DA0Mbv9HyTLCsOknGcUFg%40mail.gmail.com
2020-11-17 09:45:56 -08:00
Alvaro Herrera 7684b6fbed
indexcmds.c: reorder function prototypes
... out of an overabundance of neatnikism, perhaps.
2020-11-17 14:22:26 -03:00
Peter Geoghegan a034f8b60c nbtree: Rename nbtinsert.c variables for consistency.
Stop naming special area/opaque pointer variables 'lpageop' in contexts
where it doesn't make sense.  This is a holdover from a time when logic
that performs tasks that are now spread across _bt_insertonpg(),
_bt_findinsertloc(), and _bt_split() was more centralized.  'lpageop'
denotes "left page", which doesn't make sense outside of contexts in
which there isn't also a right page.

Also acquire page flag variables up front within _bt_insertonpg().  This
makes it closer to _bt_split() following refactoring commit bc3087b626.
This allows the page split and retail insert paths to both make use of
the same variables.
2020-11-17 09:01:14 -08:00
Amit Kapila 9653f24ad8 Fix 'skip-empty-xacts' option in test_decoding for streaming mode.
In streaming mode, the transaction can be decoded in multiple streams and
those streams can be interleaved with streams of other transactions. So,
we can't remember the transaction's write status in the logical decoding
context because that might get changed due to some other transactions and
lead to wrong answers for 'skip-empty-xacts' option. We decided to keep
each transaction's write status in the ReorderBufferTxn to avoid
interleaved streams changing the status of some unrelated transactions.

Diagnosed-by: Amit Kapila
Author: Dilip Kumar
Reviewed-by: Amit Kapila
Discussion: https://postgr.es/m/CAA4eK1LR7=XNM_TLmpZMFuV8ZQpoxkem--NZJYf8YXmesbvwLA@mail.gmail.com
2020-11-17 12:14:53 +05:30
Tom Lane 2bd49b493a Don't Insert() a VFD entry until it's fully built.
Otherwise, if FDDEBUG is enabled, the debugging output fails because
it tries to read the fileName, which isn't set up yet (and should in
fact always be NULL).

AFAICT, this has been wrong since Berkeley.  Before 96bf88d52,
it would accidentally fail to crash on platforms where snprintf()
is forgiving about being passed a NULL pointer for %s; but the
file name intended to be included in the debug output wouldn't
ever have shown up.

Report and fix by Greg Nancarrow.  Although this is only visibly
broken in custom-made builds, it still seems worth back-patching
to all supported branches, as the FDDEBUG code is pretty useless
as it stands.

Discussion: https://postgr.es/m/CAJcOf-cUDgm9qYtC_B6XrC6MktMPNRby2p61EtSGZKnfotMArw@mail.gmail.com
2020-11-16 20:32:55 -05:00
Alvaro Herrera cd9c1b3e19
Rename PGPROC->vacuumFlags to statusFlags
With more flags associated to a PGPROC entry that are not related to
vacuum (currently existing or planned), the name "statusFlags" describes
its purpose better.

(The same is done to the mirroring PROC_HDR->vacuumFlags.)

No functional changes in this commit.

This was suggested first by Hari Babu Kommi in [1] and then by Michael
Paquier at [2].

[1] https://postgr.es/m/CAJrrPGcsDC-oy1AhqH0JkXYa0Z2AgbuXzHPpByLoBGMxfOZMEQ@mail.gmail.com
[2] https://postgr.es/m/20200820060929.GB3730@paquier.xyz

Author: Dmitry Dolgov <9erthalion6@gmail.com>
Reviewed-by: Álvaro Herrera <alvherre@alvh.no-ip.org>
Discussion: https://postgr.es/m/20201116182446.qcg3o6szo2zookyr@localhost
2020-11-16 19:42:55 -03:00
Tom Lane 4025e6c466 Do not return NULL for error cases in satisfies_hash_partition().
Since this function is used as a CHECK constraint condition,
returning NULL is tantamount to returning TRUE, which would have the
effect of letting in a row that doesn't satisfy the hash condition.
Admittedly, the cases for which this is done should be unreachable
in practice, but that doesn't make it any less a bad idea.  It also
seems like a dartboard was used to decide which error cases should
throw errors as opposed to returning NULL.

For the checks for NULL input values, I just switched it to returning
false.  There's some argument that an error would be better; but the
case really should be can't-happen in a generated hash constraint,
so it's likely not worth more code for.

For the parent-relation-open-failure case, it seems like we might
as well let relation_open throw an error, instead of having an
impossible-to-diagnose constraint failure.

Back-patch to v11 where this code came in.

Discussion: https://postgr.es/m/24067.1605134819@sss.pgh.pa.us
2020-11-16 16:39:59 -05:00
Tom Lane ad84ecc98d Use "true" not "TRUE" in one ICU function call.
This was evidently missed in commit 6337865f3, which generally did
s/TRUE/true/ everywhere.  It escaped notice up to now because ICU
versions before ICU 68 provided definitions of "TRUE" and "FALSE"
regardless.  With ICU 68, it fails to compile.

Per report from Condor.  Back-patch to v11 where 6337865f3 came in.
(I've not tested v10, where this call originated, but I imagine
it's fine since we defined TRUE in c.h back then.)

Discussion: https://postgr.es/m/7a6f3336165bfe3ca66abcda7966f9d0@stz-bg.com
2020-11-16 15:16:39 -05:00
Peter Eisentraut d93ccdea1d Remove unused and deprecated strategy numbers from BRIN code
These were dead code.

Discussion: https://www.postgresql.org/message-id/flat/20201027032511.GF9241@telsasoft.com
2020-11-16 17:25:41 +01:00
Peter Eisentraut 5664b7be5b Normalize comment in empty grammar rules
Change lower case /* empty */ to /* EMPTY */ for consistency with the
majority.

Discussion: https://www.postgresql.org/message-id/flat/e9eed669-e32d-6919-fed4-acc0daea857b%40enterprisedb.com
2020-11-16 11:54:52 +01:00
Peter Eisentraut 591d282e8d Remove code handling removed deprecated containment operators
This removes the code that was there for handling the operators
removed by 2f70fdb064.

Discussion: https://www.postgresql.org/message-id/flat/20201027032511.GF9241@telsasoft.com
2020-11-16 11:51:12 +01:00
Fujii Masao 2945a488a3 Make the standby server promptly handle interrupt signals.
This commit changes the startup process in the standby server so that
it handles the interrupt signals after waiting for wal_retrieve_retry_interval
on the latch and resetting it, before entering another wait on the latch.
This change causes the standby server to promptly handle interrupt signals.

Otherwise, previously, there was the case where the standby needs to
wait extra five seconds to shutdown when the shutdown request arrived
while the startup process was waiting for wal_retrieve_retry_interval
on the latch.

Author: Fujii Masao, but implementation idea is from Soumyadeep Chakraborty
Reviewed-by: Soumyadeep Chakraborty
Discussion: https://postgr.es/m/9d7e6ab0-8a53-ddb9-63cd-289bcb25fe0e@oss.nttdata.com
2020-11-16 18:27:51 +09:00
Michael Paquier 846005e4f3 Relax INSERT privilege requirement for CTAS and matviews WITH NO DATA
When specified, WITH NO DATA does not insert any data into the relation
created, so skip checking for the insert permissions.  With WITH DATA or
WITH NO DATA, it is always required for the user to have CREATE
privileges on the schema targeted for the relation.

Note that plain CREATE TABLE AS or CREATE MATERIALIZED VIEW queries have
begun to work accidentally without INSERT privilege checks as of
874fe3ae, while using EXECUTE or EXPLAIN ANALYZE would fail with the ACL
check, so this makes the behavior for all the command flavors consistent
with each other.  This is arguably a bug fix, but there have been no
complaints about the current behavior either so stable branches are not
changed.

While on it, document properly the privileges requirements for each
commands with more tests for all the scenarios possible, and avoid a
useless bulk-insert allocation when using WITH NO DATA.

Author: Bharath Rupireddy
Reviewed-by: Anastasia Lubennikova, Michael Paquier
Discussion: https://postgr.es/m/CALj2ACWc3N8j0_9nMPz9wcAUnVcdKHzFdDZJ3hVFNEbqtcyG9w@mail.gmail.com
2020-11-16 11:52:40 +09:00
Tom Lane 29d29d652f Fix fuzzy thinking about amcanmulticol versus amcaninclude.
These flags should be independent: in particular an index AM should
be able to say that it supports include columns without necessarily
supporting multiple key columns.  The included-columns patch got
this wrong, possibly aided by the fact that it didn't bother to
update the documentation.

While here, clarify some text about amcanreturn, which was a little
vague about what should happen when amcanreturn reports that only
some of the index columns are returnable.

Noted while reviewing the SP-GiST included-columns patch, which
quite incorrectly (and unsafely) changed SP-GiST to claim
amcanmulticol = true as a workaround for this bug.

Backpatch to v11 where included columns were introduced.
2020-11-15 16:10:58 -05:00
Peter Geoghegan 46cf3c72c3 nbtree: Demote incomplete split "can't happen" error.
Only a basic logic bug in a _bt_insertonpg() caller could lead to a
violation of this invariant (index corruption won't do it).  A "can't
happen" error seems inappropriate (it is arbitrary at best).

Demote the error to a simple assertion.  This matches similar nearby
sanity checks.
2020-11-15 11:53:37 -08:00