Commit Graph

2539 Commits

Author SHA1 Message Date
Robert Haas 3500ccc39b Support base backup targets.
pg_basebackup now has a --target=TARGET[:DETAIL] option. If specfied,
it is sent to the server as the value of the TARGET option to the
BASE_BACKUP command. If DETAIL is included, it is sent as the value of
the new TARGET_DETAIL option to the BASE_BACKUP command.  If the
target is anything other than 'client', pg_basebackup assumes that it
will now be the server's job to write the backup in a location somehow
defined by the target, and that it therefore needs to write nothing
locally. However, the server will still send messages to the client
for progress reporting purposes.

On the server side, we now support two additional types of backup
targets.  There is a 'blackhole' target, which just throws away the
backup data without doing anything at all with it. Naturally, this
should only be used for testing and debugging purposes, since you will
not actually have a backup when it finishes running. More usefully,
there is also a 'server' target, so you can now use something like
'pg_basebackup -Xnone -t server:/SOME/PATH' to write a backup to some
location on the server. We can extend this to more types of targets
in the future, and might even want to create an extensibility
mechanism for adding new target types.

Since WAL fetching is handled with separate client-side logic, it's
not part of this mechanism; thus, backups with non-default targets
must use -Xnone or -Xfetch.

Patch by me, with a bug fix by Jeevan Ladhe.  The patch set of which
this is a part has also had review and/or testing from Tushar Ahuja,
Suraj Kharage, Dipesh Pandit, and Mark Dilger.

Discussion: http://postgr.es/m/CA+TgmoaYZbz0=Yk797aOJwkGJC-LK3iXn+wzzMx7KdwNpZhS5g@mail.gmail.com
2022-01-20 10:46:33 -05:00
Michael Paquier dc686681e0 Introduce log_destination=jsonlog
"jsonlog" is a new value that can be added to log_destination to provide
logs in the JSON format, with its output written to a file, making it
the third type of destination of this kind, after "stderr" and
"csvlog".  The format is convenient to feed logs to other applications.
There is also a plugin external to core that provided this feature using
the hook in elog.c, but this had to overwrite the output of "stderr" to
work, so being able to do both at the same time was not possible.  The
files generated by this log format are suffixed with ".json", and use
the same rotation policies as the other two formats depending on the
backend configuration.

This takes advantage of the refactoring work done previously in ac7c807,
bed6ed3, 8b76f89 and 2d77d83 for the backend parts, and 72b76f7 for the
TAP tests, making the addition of any new file-based format rather
straight-forward.

The documentation is updated to list all the keys and the values that
can exist in this new format.  pg_current_logfile() also required a
refresh for the new option.

Author: Sehrope Sarkuni, Michael Paquier
Reviewed-by: Nathan Bossart, Justin Pryzby
Discussion: https://postgr.es/m/CAH7T-aqswBM6JWe4pDehi1uOiufqe06DJWaU5=X7dDLyqUExHg@mail.gmail.com
2022-01-17 10:16:53 +09:00
Michael Paquier ac7c80758a Refactor set of routines specific to elog.c
This refactors the following routines and facilities coming from
elog.c, to ease their use across multiple log destinations:
- Start timestamp, including its reset, to store when a process has been
started.
- The log timestamp, associated to an entry (the same timestamp is used
when logging across multiple destinations).
- Routine deciding if a query can be logged or not.
- The backend type names, depending on the process that logs any
information (postmaster, bgworker name or just GetBackendTypeDesc() with
a regular backend).
- Write of logs using the logging piped protocol, with the log collector
enabled.
- Error severity converted to a string.

These refactored routines will be used for some follow-up changes
to move all the csvlog logic into its own file and to potentially add
JSON as log destination, reducing the overall size of elog.c as the end
result.

Author: Michael Paquier, Sehrope Sarkuni
Reviewed-by: Nathan Bossart
Discussion: https://postgr.es/m/CAH7T-aqswBM6JWe4pDehi1uOiufqe06DJWaU5=X7dDLyqUExHg@mail.gmail.com
2022-01-12 14:16:59 +09:00
Bruce Momjian 27b77ecf9f Update copyright for 2022
Backpatch-through: 10
2022-01-07 19:04:57 -05:00
Tom Lane cab5b9ab2c Revert changes about warnings/errors for placeholders.
Revert commits 5609cc01c, 2ed8a8cc5, and 75d22069e until we have
a less broken idea of how this should work in parallel workers.
Per buildfarm.

Discussion: https://postgr.es/m/1640909.1640638123@sss.pgh.pa.us
2021-12-27 16:01:10 -05:00
Tom Lane 5609cc01c6 Rename EmitWarningsOnPlaceholders() to MarkGUCPrefixReserved().
This seems like a clearer name for what it does now.

Provide a compatibility macro so that extensions don't have to convert
to the new name right away.

Discussion: https://postgr.es/m/116024.1640111629@sss.pgh.pa.us
2021-12-27 14:39:08 -05:00
Peter Eisentraut 3c6f8c011f Simplify the general-purpose 64-bit integer parsing APIs
pg_strtouint64() is a wrapper around strtoull/strtoul/_strtoui64, but
it seems no longer necessary to have this indirection.
msvc/Solution.pm claims HAVE_STRTOULL, so the "MSVC only" part seems
unnecessary.  Also, we have code in c.h to substitute alternatives for
strtoull() if not found, and that would appear to cover all currently
supported platforms, so having a further fallback in pg_strtouint64()
seems unnecessary.

Therefore, we could remove pg_strtouint64(), and use strtoull()
directly in all call sites.  However, it seems useful to keep a
separate notation for parsing exactly 64-bit integers, matching the
type definition int64/uint64.  For that, add new macros strtoi64() and
strtou64() in c.h as thin wrappers around strtol()/strtoul() or
strtoll()/stroull().  This makes these functions available everywhere
instead of just in the server code, and it makes the function naming
notably different from the pg_strtointNN() functions in numutils.c,
which have a different API.

Discussion: https://www.postgresql.org/message-id/flat/a3df47c9-b1b4-29f2-7e91-427baf8b75a3%40enterprisedb.com
2021-12-17 06:32:07 +01:00
Tomas Vondra 5753d4ee32 Ignore BRIN indexes when checking for HOT udpates
When determining whether an index update may be skipped by using HOT, we
can ignore attributes indexed only by BRIN indexes. There are no index
pointers to individual tuples in BRIN, and the page range summary will
be updated anyway as it relies on visibility info.

This also removes rd_indexattr list, and replaces it with rd_attrsvalid
flag. The list was not used anywhere, and a simple flag is sufficient.

Patch by Josef Simanek, various fixes and improvements by me.

Author: Josef Simanek
Reviewed-by: Tomas Vondra, Alvaro Herrera
Discussion: https://postgr.es/m/CAFp7QwpMRGcDAQumN7onN9HjrJ3u4X3ZRXdGFT0K5G2JWvnbWg%40mail.gmail.com
2021-11-30 20:04:38 +01:00
Tom Lane 3804539e48 Replace random(), pg_erand48(), etc with a better PRNG API and algorithm.
Standardize on xoroshiro128** as our basic PRNG algorithm, eliminating
a bunch of platform dependencies as well as fundamentally-obsolete PRNG
code.  In addition, this API replacement will ease replacing the
algorithm again in future, should that become necessary.

xoroshiro128** is a few percent slower than the drand48 family,
but it can produce full-width 64-bit random values not only 48-bit,
and it should be much more trustworthy.  It's likely to be noticeably
faster than the platform's random(), depending on which platform you
are thinking about; and we can have non-global state vectors easily,
unlike with random().  It is not cryptographically strong, but neither
are the functions it replaces.

Fabien Coelho, reviewed by Dean Rasheed, Aleksander Alekseev, and myself

Discussion: https://postgr.es/m/alpine.DEB.2.22.394.2105241211230.165418@pseudo
2021-11-28 21:33:07 -05:00
David Rowley e502150f7d Allow Memoize to operate in binary comparison mode
Memoize would always use the hash equality operator for the cache key
types to determine if the current set of parameters were the same as some
previously cached set.  Certain types such as floating points where -0.0
and +0.0 differ in their binary representation but are classed as equal by
the hash equality operator may cause problems as unless the join uses the
same operator it's possible that whichever join operator is being used
would be able to distinguish the two values.  In which case we may
accidentally return in the incorrect rows out of the cache.

To fix this here we add a binary mode to Memoize to allow it to the
current set of parameters to previously cached values by comparing
bit-by-bit rather than logically using the hash equality operator.  This
binary mode is always used for LATERAL joins and it's used for normal
joins when any of the join operators are not hashable.

Reported-by: Tom Lane
Author: David Rowley
Discussion: https://postgr.es/m/3004308.1632952496@sss.pgh.pa.us
Backpatch-through: 14, where Memoize was added
2021-11-24 10:06:59 +13:00
Fujii Masao 1b06d7bac9 Report wait events for local shell commands like archive_command.
This commit introduces new wait events for archive_command,
archive_cleanup_command, restore_command and recovery_end_command.

Author: Fujii Masao
Reviewed-by: Bharath Rupireddy, Michael Paquier
Discussion: https://postgr.es/m/4ca4f920-6b48-638d-08b2-93598356f5d3@oss.nttdata.com
2021-11-22 10:28:21 +09:00
Amit Kapila 5a2832465f Allow publishing the tables of schema.
A new option "FOR ALL TABLES IN SCHEMA" in Create/Alter Publication allows
one or more schemas to be specified, whose tables are selected by the
publisher for sending the data to the subscriber.

The new syntax allows specifying both the tables and schemas. For example:
CREATE PUBLICATION pub1 FOR TABLE t1,t2,t3, ALL TABLES IN SCHEMA s1,s2;
OR
ALTER PUBLICATION pub1 ADD TABLE t1,t2,t3, ALL TABLES IN SCHEMA s1,s2;

A new system table "pg_publication_namespace" has been added, to maintain
the schemas that the user wants to publish through the publication.
Modified the output plugin (pgoutput) to publish the changes if the
relation is part of schema publication.

Updates pg_dump to identify and dump schema publications. Updates the \d
family of commands to display schema publications and \dRp+ variant will
now display associated schemas if any.

Author: Vignesh C, Hou Zhijie, Amit Kapila
Syntax-Suggested-by: Tom Lane, Alvaro Herrera
Reviewed-by: Greg Nancarrow, Masahiko Sawada, Hou Zhijie, Amit Kapila, Haiying Tang, Ajin Cherian, Rahila Syed, Bharath Rupireddy, Mark Dilger
Tested-by: Haiying Tang
Discussion: https://www.postgresql.org/message-id/CALDaNm0OANxuJ6RXqwZsM1MSY4s19nuH3734j4a72etDwvBETQ@mail.gmail.com
2021-10-27 07:44:52 +05:30
Robert Haas 9ce346eabf Report progress of startup operations that take a long time.
Users sometimes get concerned whe they start the server and it
emits a few messages and then doesn't emit any more messages for
a long time. Generally, what's happening is either that the
system is taking a long time to apply WAL, or it's taking a
long time to reset unlogged relations, or it's taking a long
time to fsync the data directory, but it's not easy to tell
which is the case.

To fix that, add a new 'log_startup_progress_interval' setting,
by default 10s. When an operation that is known to be potentially
long-running takes more than this amount of time, we'll log a
status update each time this interval elapses.

To avoid undesirable log chatter, don't log anything about WAL
replay when in standby mode.

Nitin Jadhav and Robert Haas, reviewed by Amul Sul, Bharath
Rupireddy, Justin Pryzby, Michael Paquier, and Álvaro Herrera.

Discussion: https://postgr.es/m/CA+TgmoaHQrgDFOBwgY16XCoMtXxsrVGFB2jNCvb7-ubuEe1MGg@mail.gmail.com
Discussion: https://postgr.es/m/CAMm1aWaHF7VE69572_OLQ+MgpT5RUiUDgF1x5RrtkJBLdpRj3Q@mail.gmail.com
2021-10-25 11:51:57 -04:00
Robert Haas 732e6677a6 Add enable_timeout_every() to fire the same timeout repeatedly.
enable_timeout_at() and enable_timeout_after() can still be used
when you want to fire a timeout just once.

Patch by me, per a suggestion from Tom Lane.

Discussion: http://postgr.es/m/2992585.1632938816@sss.pgh.pa.us
Discussion: http://postgr.es/m/CA+TgmoYqSF5sCNrgTom9r3Nh=at4WmYFD=gsV-omStZ60S0ZUQ@mail.gmail.com
2021-10-25 11:33:44 -04:00
Noah Misch fdd965d074 Avoid race in RelationBuildDesc() affecting CREATE INDEX CONCURRENTLY.
CIC and REINDEX CONCURRENTLY assume backends see their catalog changes
no later than each backend's next transaction start.  That failed to
hold when a backend absorbed a relevant invalidation in the middle of
running RelationBuildDesc() on the CIC index.  Queries that use the
resulting index can silently fail to find rows.  Fix this for future
index builds by making RelationBuildDesc() loop until it finishes
without accepting a relevant invalidation.  It may be necessary to
reindex to recover from past occurrences; REINDEX CONCURRENTLY suffices.
Back-patch to 9.6 (all supported versions).

Noah Misch and Andrey Borodin, reviewed (in earlier versions) by Andres
Freund.

Discussion: https://postgr.es/m/20210730022548.GA1940096@gust.leadboat.com
2021-10-23 18:36:38 -07:00
Amit Kapila 1607cd0b6c Remove unused wait events.
Commit 464824323e introduced the wait events which were neither used by
that commit nor by follow-up commits for that work.

Author: Masahiro Ikeda
Backpatch-through: 14, where it was introduced
Discussion: https://postgr.es/m/ff077840-3ab2-04dd-bbe4-4f5dfd2ad481@oss.nttdata.com
2021-10-21 08:01:25 +05:30
Heikki Linnakangas c4649cce39 Refactor LogicalTapeSet/LogicalTape interface.
All the tape functions, like LogicalTapeRead and LogicalTapeWrite, now
take a LogicalTape as argument, instead of LogicalTapeSet+tape number.
You can create any number of LogicalTapes in a single LogicalTapeSet, and
you don't need to decide the number upfront, when you create the tape set.

This makes the tape management in hash agg spilling in nodeAgg.c simpler.

Discussion: https://www.postgresql.org/message-id/420a0ec7-602c-d406-1e75-1ef7ddc58d83%40iki.fi
Reviewed-by: Peter Geoghegan, Zhihong Yu, John Naylor
2021-10-18 14:46:01 +03:00
Tom Lane 7b5d4c29ed Fix Portal snapshot tracking to handle subtransactions properly.
Commit 84f5c2908 forgot to consider the possibility that
EnsurePortalSnapshotExists could run inside a subtransaction with
lifespan shorter than the Portal's.  In that case, the new active
snapshot would be popped at the end of the subtransaction, leaving
a dangling pointer in the Portal, with mayhem ensuing.

To fix, make sure the ActiveSnapshot stack entry is marked with
the same subtransaction nesting level as the associated Portal.
It's certainly safe to do so since we won't be here at all unless
the stack is empty; hence we can't create an out-of-order stack.

Let's also apply this logic in the case where PortalRunUtility
sets portalSnapshot, just to be sure that path can't cause similar
problems.  It's slightly less clear that that path can't create
an out-of-order stack, so add an assertion guarding it.

Report and patch by Bertrand Drouvot (with kibitzing by me).
Back-patch to v11, like the previous commit.

Discussion: https://postgr.es/m/ff82b8c5-77f4-3fe7-6028-fcf3303e82dd@amazon.com
2021-10-01 11:10:12 -04:00
Alexander Korotkov b92f9f7443 Split macros from visibilitymap.h into a separate header
That allows to include just visibilitymapdefs.h from file.c, and in turn,
remove include of postgres.h from relcache.h.

Reported-by: Andres Freund
Discussion: https://postgr.es/m/20210913232614.czafiubr435l6egi%40alap3.anarazel.de
Author: Alexander Korotkov
Reviewed-by: Andres Freund, Tom Lane, Alvaro Herrera
Backpatch-through: 13
2021-09-23 19:59:03 +03:00
Michael Paquier 0c39c29207 Support "postgres -C" with runtime-computed GUCs
Until now, the -C option of postgres was handled before a small subset
of GUCs computed at runtime are initialized, leading to incorrect
results as GUC machinery would fall back to default values for such
parameters.

For example, data_checksums could report "off" for a cluster as the
control file is not loaded yet.  Or wal_segment_size would show a
segment size at 16MB even if initdb --wal-segsize used something else.
Worse, the command would fail to properly report the recently-introduced
shared_memory, that requires to load shared_preload_libraries as these
could ask for a chunk of shared memory.

Support for runtime GUCs comes with a limitation, as the operation is
now allowed on a running server.  One notable reason for this is that
_PG_init() functions of loadable libraries are called before all
runtime-computed GUCs are initialized, and this is not guaranteed to be
safe to do on running servers.  For the case of shared_memory_size,
where we want to know how much memory would be used without allocating
it, this limitation is fine.  Another case where this will help is for
huge pages, with the introduction of a different GUC to evaluate the
amount of huge pages required for a server before starting it, without
having to allocate large chunks of memory.

This feature is controlled with a new GUC flag, and four parameters are
classified as runtime-computed as of this change:
- data_checksums
- shared_memory_size
- data_directory_mode
- wal_segment_size

Some TAP tests are added to provide some coverage here, using
data_checksums in the tests of pg_checksums.

Per discussion with Andres Freund, Justin Pryzby, Magnus Hagander and
more.

Author: Nathan Bossart
Discussion: https://postgr.es/m/F2772387-CE0F-46BF-B5F1-CC55516EB885@amazon.com
2021-09-16 10:59:26 +09:00
Peter Eisentraut bb9ff46bc4 Fix typo 2021-08-25 10:14:51 +02:00
Michael Paquier 2576dcfb76 Revert refactoring of hex code to src/common/
This is a combined revert of the following commits:
- c3826f8, a refactoring piece that moved the hex decoding code to
src/common/.  This code was cleaned up by aef8948, as it originally
included no overflow checks in the same way as the base64 routines in
src/common/ used by SCRAM, making it unsafe for its purpose.
- aef8948, a more advanced refactoring of the hex encoding/decoding code
to src/common/ that added sanity checks on the result buffer for hex
decoding and encoding.  As reported by Hans Buschmann, those overflow
checks are expensive, and it is possible to see a performance drop in
the decoding/encoding of bytea or LOs the longer they are.  Simple SQLs
working on large bytea values show a clear difference in perf profile.
- ccf4e27, a cleanup made possible by aef8948.

The reverts of all those commits bring back the performance of hex
decoding and encoding back to what it was in ~13.  Fow now and
post-beta3, this is the simplest option.

Reported-by: Hans Buschmann
Discussion: https://postgr.es/m/1629039545467.80333@nidsa.net
Backpatch-through: 14
2021-08-19 09:20:13 +09:00
Tom Lane 18bac60ede Let regexp_replace() make use of REG_NOSUB when feasible.
If the replacement string doesn't contain \1...\9, then we don't
need sub-match locations, so we can use the REG_NOSUB optimization
here too.  There's already a pre-scan of the replacement string
to look for backslashes, so extend that to check for digits, and
refactor to allow that to happen before we compile the regexp.

While at it, try to speed up the pre-scan by using memchr() instead
of a handwritten loop.  It's likely that this is lost in the noise
compared to the regexp processing proper, but maybe not.  In any
case, this coding is shorter.

Also, add some test cases to improve the poor coverage of
appendStringInfoRegexpSubstr().

Discussion: https://postgr.es/m/3534632.1628536485@sss.pgh.pa.us
2021-08-09 20:53:25 -04:00
Tom Lane 6424337073 Add assorted new regexp_xxx SQL functions.
This patch adds new functions regexp_count(), regexp_instr(),
regexp_like(), and regexp_substr(), and extends regexp_replace()
with some new optional arguments.  All these functions follow
the definitions used in Oracle, although there are small differences
in the regexp language due to using our own regexp engine -- most
notably, that the default newline-matching behavior is different.
Similar functions appear in DB2 and elsewhere, too.  Aside from
easing portability, these functions are easier to use for certain
tasks than our existing regexp_match[es] functions.

Gilles Darold, heavily revised by me

Discussion: https://postgr.es/m/fc160ee0-c843-b024-29bb-97b5da61971f@darold.net
2021-08-03 13:08:49 -04:00
Dean Rasheed 085f931f52 Allow numeric scale to be negative or greater than precision.
Formerly, when specifying NUMERIC(precision, scale), the scale had to
be in the range [0, precision], which was per SQL spec. This commit
extends the range of allowed scales to [-1000, 1000], independent of
the precision (whose valid range remains [1, 1000]).

A negative scale implies rounding before the decimal point. For
example, a column might be declared with a scale of -3 to round values
to the nearest thousand. Note that the display scale remains
non-negative, so in this case the display scale will be zero, and all
digits before the decimal point will be displayed.

A scale greater than the precision supports fractional values with
zeros immediately after the decimal point.

Take the opportunity to tidy up the code that packs, unpacks and
validates the contents of a typmod integer, encapsulating it in a
small set of new inline functions.

Bump the catversion because the allowed contents of atttypmod have
changed for numeric columns. This isn't a change that requires a
re-initdb, but negative scale values in the typmod would confuse old
backends.

Dean Rasheed, with additional improvements by Tom Lane. Reviewed by
Tom Lane.

Discussion: https://postgr.es/m/CAEZATCWdNLgpKihmURF8nfofP0RFtAKJ7ktY6GcZOPnMfUoRqA@mail.gmail.com
2021-07-26 14:13:47 +01:00
Tom Lane d68a003912 Rename debug_invalidate_system_caches_always to debug_discard_caches.
The name introduced by commit 4656e3d66 was agreed to be unreasonably
long.  To match this change, rename initdb's recently-added
--clobber-cache option to --discard-caches.

Discussion: https://postgr.es/m/1374320.1625430433@sss.pgh.pa.us
2021-07-13 15:01:01 -04:00
Tom Lane f10f0ae420 Replace RelationOpenSmgr() with RelationGetSmgr().
The idea behind this patch is to design out bugs like the one fixed
by commit 9d523119f.  Previously, once one did RelationOpenSmgr(rel),
it was considered okay to access rel->rd_smgr directly for some
not-very-clear interval.  But since that pointer will be cleared by
relcache flushes, we had bugs arising from overreliance on a previous
RelationOpenSmgr call still being effective.

Now, very little code except that in rel.h and relcache.c should ever
touch the rd_smgr field directly.  The normal coding rule is to use
RelationGetSmgr(rel) and not expect the result to be valid for longer
than one smgr function call.  There are a couple of places where using
the function every single time seemed like overkill, but they are now
annotated with large warning comments.

Amul Sul, after an idea of mine.

Discussion: https://postgr.es/m/CANiYTQsU7yMFpQYnv=BrcRVqK_3U3mtAzAsJCaqtzsDHfsUbdQ@mail.gmail.com
2021-07-12 17:01:36 -04:00
Michael Paquier 70685385d7 Use WaitLatch() instead of pg_usleep() at end-of-vacuum truncation
This has the advantage to make a process more responsive when the
postmaster dies, even if the wait time was rather limited as there was
only a 50ms timeout here.  Another advantage of this change is for
monitoring, as we gain a new wait event for the end-of-vacuum
truncation.

Author: Bharath Rupireddy
Reviewed-by: Aleksander Alekseev, Thomas Munro, Michael Paquier
Discussion: https://postgr.es/m/CALj2ACU4AdPCq6NLfcA-ZGwX7pPCK5FgEj-CAU0xCKzkASSy_A@mail.gmail.com
2021-07-02 12:58:34 +09:00
Andrew Dunstan e1c1c30f63
Pre branch pgindent / pgperltidy run
Along the way make a slight adjustment to
src/include/utils/queryjumble.h to avoid an unused typedef.
2021-06-28 11:05:54 -04:00
Peter Geoghegan 3499df0dee Support disabling index bypassing by VACUUM.
Generalize the INDEX_CLEANUP VACUUM parameter (and the corresponding
reloption): make it into a ternary style boolean parameter.  It now
exposes a third option, "auto".  The "auto" option (which is now the
default) enables the "bypass index vacuuming" optimization added by
commit 1e55e7d1.

"VACUUM (INDEX_CLEANUP TRUE)" is redefined to once again make VACUUM
simply do any required index vacuuming, regardless of how few dead
tuples are encountered during the first scan of the target heap relation
(unless there are exactly zero).  This gives users a way of opting out
of the "bypass index vacuuming" optimization, if for whatever reason
that proves necessary.  It is also expected to be used by PostgreSQL
developers as a testing option from time to time.

"VACUUM (INDEX_CLEANUP FALSE)" does the same thing as it always has: it
forcibly disables both index vacuuming and index cleanup.  It's not
expected to be used much in PostgreSQL 14.  The failsafe mechanism added
by commit 1e55e7d1 addresses the same problem in a simpler way.
INDEX_CLEANUP can now be thought of as a testing and compatibility
option.

Author: Peter Geoghegan <pg@bowt.ie>
Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com>
Reviewed-By: Justin Pryzby <pryzby@telsasoft.com>
Discussion: https://postgr.es/m/CAH2-WznrBoCST4_Gxh_G9hA8NzGUbeBGnOUC8FcXcrhqsv6OHQ@mail.gmail.com
2021-06-18 20:04:07 -07:00
Tom Lane 84f5c2908d Restore the portal-level snapshot after procedure COMMIT/ROLLBACK.
COMMIT/ROLLBACK necessarily destroys all snapshots within the session.
The original implementation of intra-procedure transactions just
cavalierly did that, ignoring the fact that this left us executing in
a rather different environment than normal.  In particular, it turns
out that handling of toasted datums depends rather critically on there
being an outer ActiveSnapshot: otherwise, when SPI or the core
executor pop whatever snapshot they used and return, it's unsafe to
dereference any toasted datums that may appear in the query result.
It's possible to demonstrate "no known snapshots" and "missing chunk
number N for toast value" errors as a result of this oversight.

Historically this outer snapshot has been held by the Portal code,
and that seems like a good plan to preserve.  So add infrastructure
to pquery.c to allow re-establishing the Portal-owned snapshot if it's
not there anymore, and add enough bookkeeping support that we can tell
whether it is or not.

We can't, however, just re-establish the Portal snapshot as part of
COMMIT/ROLLBACK.  As in normal transaction start, acquiring the first
snapshot should wait until after SET and LOCK commands.  Hence, teach
spi.c about doing this at the right time.  (Note that this patch
doesn't fix the problem for any PLs that try to run intra-procedure
transactions without using SPI to execute SQL commands.)

This makes SPI's no_snapshots parameter rather a misnomer, so in HEAD,
rename that to allow_nonatomic.

replication/logical/worker.c also needs some fixes, because it wasn't
careful to hold a snapshot open around AFTER trigger execution.
That code doesn't use a Portal, which I suspect someday we're gonna
have to fix.  But for now, just rearrange the order of operations.
This includes back-patching the recent addition of finish_estate()
to centralize the cleanup logic there.

This also back-patches commit 2ecfeda3e into v13, to improve the
test coverage for worker.c (it was that test that exposed that
worker.c's snapshot management is wrong).

Per bug #15990 from Andreas Wicht.  Back-patch to v11 where
intra-procedure COMMIT was added.

Discussion: https://postgr.es/m/15990-eee2ac466b11293d@postgresql.org
2021-05-21 14:03:59 -04:00
Alvaro Herrera cafde58b33
Allow compute_query_id to be set to 'auto' and make it default
Allowing only on/off meant that all either all existing configuration
guides would become obsolete if we disabled it by default, or that we
would have to accept a performance loss in the default config if we
enabled it by default.  By allowing 'auto' as a middle ground, the
performance cost is only paid by those who enable pg_stat_statements and
similar modules.

I only edited the release notes to comment-out a paragraph that is now
factually wrong; further edits are probably needed to describe the
related change in more detail.

Author: Julien Rouhaud <rjuju123@gmail.com>
Reviewed-by: Justin Pryzby <pryzby@telsasoft.com>
Discussion: https://postgr.es/m/20210513002623.eugftm4nk2lvvks3@nol
2021-05-15 14:13:09 -04:00
Tom Lane def5b065ff Initial pgindent and pgperltidy run for v14.
Also "make reformat-dat-files".

The only change worthy of note is that pgindent messed up the formatting
of launcher.c's struct LogicalRepWorkerId, which led me to notice that
that struct wasn't used at all anymore, so I just took it out.
2021-05-12 13:14:10 -04:00
Tom Lane f02b9085ad Prevent integer overflows in array subscripting calculations.
While we were (mostly) careful about ensuring that the dimensions of
arrays aren't large enough to cause integer overflow, the lower bound
values were generally not checked.  This allows situations where
lower_bound + dimension overflows an integer.  It seems that that's
harmless so far as array reading is concerned, except that array
elements with subscripts notionally exceeding INT_MAX are inaccessible.
However, it confuses various array-assignment logic, resulting in a
potential for memory stomps.

Fix by adding checks that array lower bounds aren't large enough to
cause lower_bound + dimension to overflow.  (Note: this results in
disallowing cases where the last subscript position would be exactly
INT_MAX.  In principle we could probably allow that, but there's a lot
of code that computes lower_bound + dimension and would need adjustment.
It seems doubtful that it's worth the trouble/risk to allow it.)

Somewhat independently of that, array_set_element() was careless
about possible overflow when checking the subscript of a fixed-length
array, creating a different route to memory stomps.  Fix that too.

Security: CVE-2021-32027
2021-05-10 10:44:38 -04:00
Thomas Munro c2dc19342e Revert recovery prefetching feature.
This set of commits has some bugs with known fixes, but at this late
stage in the release cycle it seems best to revert and resubmit next
time, along with some new automated test coverage for this whole area.

Commits reverted:

dc88460c: Doc: Review for "Optionally prefetch referenced data in recovery."
1d257577: Optionally prefetch referenced data in recovery.
f003d9f8: Add circular WAL decoding buffer.
323cbe7c: Remove read_page callback from XLogReader.

Remove the new GUC group WAL_RECOVERY recently added by a55a9847, as the
corresponding section of config.sgml is now reverted.

Discussion: https://postgr.es/m/CAOuzzgrn7iKnFRsB4MHp3UisEQAGgZMbk_ViTN4HV4-Ksq8zCg%40mail.gmail.com
2021-05-10 16:06:09 +12:00
Tom Lane a55a98477b Sync guc.c and postgresql.conf.sample with the SGML docs.
It seems that various people have moved GUCs around in the config.sgml
listing without bothering to make the code agree.  Ensure that the
config_group codes assigned to GUCs match where they are listed in
config.sgml.  Likewise ensure that postgresql.conf.sample lists GUCs
in the same sub-section and same ordering as they appear in config.sgml.

(I've got some doubts about some of these choices, but for the purposes
of this patch, we'll treat config.sgml as gospel.)

Notably, this requires adding a WAL_RECOVERY config_group value,
because 1d257577e didn't.  As long as we're renumbering that enum
anyway, let's take out the values corresponding to major groups
that are divided into sub-groups.  No GUC should be assigned to the
major group itself, so those values just create a temptation to
do the wrong thing, while adding work for translators.

In passing, adjust the short_desc strings for PRESET_OPTIONS GUCs
to uniformly use the phrasing "Shows XYZ.", removing the impression
some of these strings left that you can set the value.

While some of these errors are old, no back-patch, as changing the
contents of the pg_settings view in stable branches seems more likely
to be seen as a compatibility break than anything helpful.

Bharath Rupireddy, Justin Pryzby, Tom Lane

Discussion: https://postgr.es/m/16997-ff16127f6e0d1390@postgresql.org
Discussion: https://postgr.es/m/20210413123139.GE6091@telsasoft.com
2021-05-08 12:13:33 -04:00
Thomas Munro ec48314708 Revert per-index collation version tracking feature.
Design problems were discovered in the handling of composite types and
record types that would cause some relevant versions not to be recorded.
Misgivings were also expressed about the use of the pg_depend catalog
for this purpose.  We're out of time for this release so we'll revert
and try again.

Commits reverted:

1bf946bd: Doc: Document known problem with Windows collation versions.
cf002008: Remove no-longer-relevant test case.
ef387bed: Fix bogus collation-version-recording logic.
0fb0a050: Hide internal error for pg_collation_actual_version(<bad OID>).
ff942057: Suppress "warning: variable 'collcollate' set but not used".
d50e3b1f: Fix assertion in collation version lookup.
f24b1569: Rethink extraction of collation dependencies.
257836a7: Track collation versions for indexes.
cd6f479e: Add pg_depend.refobjversion.
7d1297df: Remove pg_collation.collversion.

Discussion: https://postgr.es/m/CA%2BhUKGLhj5t1fcjqAu8iD9B3ixJtsTNqyCCD4V0aTO9kAKAjjA%40mail.gmail.com
2021-05-07 21:10:11 +12:00
Alvaro Herrera 3fe773b149
Track detached partitions more accurately in partdescs
In d6b8d29419 I (Álvaro) was sloppy about recording whether a
partition descripor does or does not include detached partitions, when
the snapshot checking does not see the pg_inherits row marked detached.
In that case no partition was omitted, yet in the relcache entry we were
saving the partdesc as omitting partitions.  Flip that (so we save it as
a partdesc not omitting partitions, which indeed it doesn't), which
hopefully makes the code easier to reason about.

Author: Amit Langote <amitlangote09@gmail.com>
Discussion: https://postgr.es/m/CA+HiwqE7GxGU4VdzwZzfiz+Ont5SsopoFkgtrZGEdPqWRL+biA@mail.gmail.com
2021-05-06 12:47:30 -04:00
Alvaro Herrera d6b8d29419
Allow a partdesc-omitting-partitions to be cached
Makes partition descriptor acquisition faster during the transient
period in which a partition is in the process of being detached.

This also adds the restriction that only one partition can be in
pending-detach state for a partitioned table.

While at it, return find_inheritance_children() API to what it was
before 71f4c8c6f7, and create a separate
find_inheritance_children_extended() that returns detailed info about
detached partitions.

(This incidentally fixes a bug in 8aba932251 whereby a memory context
holding a transient partdesc is reparented to a NULL PortalContext,
leading to permanent leak of that memory.  The fix is to no longer rely
on reparenting contexts to PortalContext.   Reported by Amit Langote.)

Per gripe from Amit Langote
Discussion: https://postgr.es/m/CA+HiwqFgpP1LxJZOBYGt9rpvTjXXkg5qG2+Xch2Z1Q7KrqZR1A@mail.gmail.com
2021-04-28 15:44:35 -04:00
Amit Kapila e7eea52b2d Fix Logical Replication of Truncate in synchronous commit mode.
The Truncate operation acquires an exclusive lock on the target relation
and indexes. It then waits for logical replication of the operation to
finish at commit. Now because we are acquiring the shared lock on the
target index to get index attributes in pgoutput while sending the
changes for the Truncate operation, it leads to a deadlock.

Actually, we don't need to acquire a lock on the target index as we build
the cache entry using a historic snapshot and all the later changes are
absorbed while decoding WAL. So, we wrote a special purpose function for
logical replication to get a bitmap of replica identity attribute numbers
where we get that information without locking the target index.

We decided not to backpatch this as there doesn't seem to be any field
complaint about this issue since it was introduced in commit 5dfd1e5a in
v11.

Reported-by: Haiying Tang
Author: Takamichi Osumi, test case by Li Japin
Reviewed-by: Amit Kapila, Ajin Cherian
Discussion: https://postgr.es/m/OS0PR01MB6113C2499C7DC70EE55ADB82FB759@OS0PR01MB6113.jpnprd01.prod.outlook.com
2021-04-27 08:28:26 +05:30
Bruce Momjian 9660834dd8 adjust query id feature to use pg_stat_activity.query_id
Previously, it was pg_stat_activity.queryid to match the
pg_stat_statements queryid column.  This is an adjustment to patch
4f0b0966c8.  This also adjusts some of the internal function calls to
match.  Catversion bumped.

Reported-by: Álvaro Herrera, Julien Rouhaud

Discussion: https://postgr.es/m/20210408032704.GA7498@alvherre.pgsql
2021-04-20 12:22:26 -04:00
Tom Lane d7cff12c4c Add macro PGWARNING, and make PGERROR available on all platforms.
We'd previously noted the need for coping with Windows headers
that provide some other definition of macro "ERROR" than elog.h
does.  It turns out that R also wants to define ERROR, and
WARNING too.  PL/R has been working around this in a hacky way
that broke when we recently changed the numeric value of ERROR.
To let them have a more future-proof solution, provide an
alternate macro PGWARNING for WARNING, and make PGERROR visible
always, not only when #ifdef WIN32.

Discussion: https://postgr.es/m/CADK3HHK6iMChd1yoOqssxBn5Z14Zar8Ztr3G-N_fuG7F8YTP3w@mail.gmail.com
2021-04-11 13:22:56 -04:00
Michael Paquier 609b0652af Fix typos and grammar in documentation and code comments
Comment fixes are applied on HEAD, and documentation improvements are
applied on back-branches where needed.

Author: Justin Pryzby
Discussion: https://postgr.es/m/20210408164008.GJ6592@telsasoft.com
Backpatch-through: 9.6
2021-04-09 13:53:07 +09:00
Fujii Masao 8ff1c94649 Allow TRUNCATE command to truncate foreign tables.
This commit introduces new foreign data wrapper API for TRUNCATE.
It extends TRUNCATE command so that it accepts foreign tables as
the targets to truncate and invokes that API. Also it extends postgres_fdw
so that it can issue TRUNCATE command to foreign servers, by adding
new routine for that TRUNCATE API.

The information about options specified in TRUNCATE command, e.g.,
ONLY, CACADE, etc is passed to FDW via API. The list of foreign tables to
truncate is also passed to FDW. FDW truncates the foreign data sources
that the passed foreign tables specify, based on those information.
For example, postgres_fdw constructs TRUNCATE command using them
and issues it to the foreign server.

For performance, TRUNCATE command invokes the FDW routine for
TRUNCATE once per foreign server that foreign tables to truncate belong to.

Author: Kazutaka Onishi, Kohei KaiGai, slightly modified by Fujii Masao
Reviewed-by: Bharath Rupireddy, Michael Paquier, Zhihong Yu, Alvaro Herrera, Stephen Frost, Ashutosh Bapat, Amit Langote, Daniel Gustafsson, Ibrar Ahmed, Fujii Masao
Discussion: https://postgr.es/m/CAOP8fzb_gkReLput7OvOK+8NHgw-RKqNv59vem7=524krQTcWA@mail.gmail.com
Discussion: https://postgr.es/m/CAJuF6cMWDDqU-vn_knZgma+2GMaout68YUgn1uyDnexRhqqM5Q@mail.gmail.com
2021-04-08 20:56:08 +09:00
Thomas Munro 1d257577e0 Optionally prefetch referenced data in recovery.
Introduce a new GUC recovery_prefetch, disabled by default.  When
enabled, look ahead in the WAL and try to initiate asynchronous reading
of referenced data blocks that are not yet cached in our buffer pool.
For now, this is done with posix_fadvise(), which has several caveats.
Better mechanisms will follow in later work on the I/O subsystem.

The GUC maintenance_io_concurrency is used to limit the number of
concurrent I/Os we allow ourselves to initiate, based on pessimistic
heuristics used to infer that I/Os have begun and completed.

The GUC wal_decode_buffer_size is used to limit the maximum distance we
are prepared to read ahead in the WAL to find uncached blocks.

Reviewed-by: Alvaro Herrera <alvherre@2ndquadrant.com> (parts)
Reviewed-by: Andres Freund <andres@anarazel.de> (parts)
Reviewed-by: Tomas Vondra <tomas.vondra@2ndquadrant.com> (parts)
Tested-by: Tomas Vondra <tomas.vondra@2ndquadrant.com>
Tested-by: Jakub Wartak <Jakub.Wartak@tomtom.com>
Tested-by: Dmitry Dolgov <9erthalion6@gmail.com>
Tested-by: Sait Talha Nisanci <Sait.Nisanci@microsoft.com>
Discussion: https://postgr.es/m/CA%2BhUKGJ4VJN8ttxScUFM8dOKX0BrBiboo5uz1cq%3DAovOddfHpA%40mail.gmail.com
2021-04-08 23:20:42 +12:00
Magnus Hagander aaf0432572 Add functions to wait for backend termination
This adds a function, pg_wait_for_backend_termination(), and a new
timeout argument to pg_terminate_backend(), which will wait for the
backend to actually terminate (with or without signaling it to do so
depending on which function is called). The default behaviour of
pg_terminate_backend() remains being timeout=0 which does not waiting.
For pg_wait_for_backend_termination() the default wait is 5 seconds.

Author: Bharath Rupireddy
Reviewed-By: Fujii Masao, David Johnston, Muhammad Usama,
             Hou Zhijie, Magnus Hagander
Discussion: https://postgr.es/m/CALj2ACUBpunmyhYZw-kXCYs5NM+h6oG_7Df_Tn4mLmmUQifkqA@mail.gmail.com
2021-04-08 11:40:54 +02:00
Bruce Momjian 4f0b0966c8 Make use of in-core query id added by commit 5fd9dfa5f5
Use the in-core query id computation for pg_stat_activity,
log_line_prefix, and EXPLAIN VERBOSE.

Similar to other fields in pg_stat_activity, only the queryid from the
top level statements are exposed, and if the backends status isn't
active then the queryid from the last executed statements is displayed.

Add a %Q placeholder to include the queryid in log_line_prefix, which
will also only expose top level statements.

For EXPLAIN VERBOSE, if a query identifier has been computed, either by
enabling compute_query_id or using a third-party module, display it.

Bump catalog version.

Discussion: https://postgr.es/m/20210407125726.tkvjdbw76hxnpwfi@nol

Author: Julien Rouhaud

Reviewed-by: Alvaro Herrera, Nitin Jadhav, Zhihong Yu
2021-04-07 14:04:06 -04:00
Bruce Momjian 5fd9dfa5f5 Move pg_stat_statements query jumbling to core.
Add compute_query_id GUC to control whether a query identifier should be
computed by the core (off by default).  It's thefore now possible to
disable core queryid computation and use pg_stat_statements with a
different algorithm to compute the query identifier by using a
third-party module.

To ensure that a single source of query identifier can be used and is
well defined, modules that calculate a query identifier should throw an
error if compute_query_id specified to compute a query id and if a query
idenfitier was already calculated.

Discussion: https://postgr.es/m/20210407125726.tkvjdbw76hxnpwfi@nol

Author: Julien Rouhaud

Reviewed-by: Alvaro Herrera, Nitin Jadhav, Zhihong Yu
2021-04-07 13:06:56 -04:00
Peter Eisentraut a2da77cdb4 Change return type of EXTRACT to numeric
The previous implementation of EXTRACT mapped internally to
date_part(), which returned type double precision (since it was
implemented long before the numeric type existed).  This can lead to
imprecise output in some cases, so returning numeric would be
preferrable.  Changing the return type of an existing function is a
bit risky, so instead we do the following:  We implement a new set of
functions, which are now called "extract", in parallel to the existing
date_part functions.  They work the same way internally but use
numeric instead of float8.  The EXTRACT construct is now mapped by the
parser to these new extract functions.  That way, dumps of views
etc. from old versions (which would use date_part) continue to work
unchanged, but new uses will map to the new extract functions.

Additionally, the reverse compilation of EXTRACT now reproduces the
original syntax, using the new mechanism introduced in
40c24bfef9.

The following minor changes of behavior result from the new
implementation:

- The column name from an isolated EXTRACT call is now "extract"
  instead of "date_part".

- Extract from date now rejects inappropriate field names such as
  HOUR.  It was previously mapped internally to extract from
  timestamp, so it would silently accept everything appropriate for
  timestamp.

- Return values when extracting fields with possibly fractional
  values, such as second and epoch, now have the full scale that the
  value has internally (so, for example, '1.000000' instead of just
  '1').

Reported-by: Petr Fedorov <petr.fedorov@phystech.edu>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://www.postgresql.org/message-id/flat/42b73d2d-da12-ba9f-570a-420e0cce19d9@phystech.edu
2021-04-06 07:20:42 +02:00
Fujii Masao 43620e3286 Add function to log the memory contexts of specified backend process.
Commit 3e98c0bafb added pg_backend_memory_contexts view to display
the memory contexts of the backend process. However its target process
is limited to the backend that is accessing to the view. So this is
not so convenient when investigating the local memory bloat of other
backend process. To improve this situation, this commit adds
pg_log_backend_memory_contexts() function that requests to log
the memory contexts of the specified backend process.

This information can be also collected by calling
MemoryContextStats(TopMemoryContext) via a debugger. But
this technique cannot be used in some environments because no debugger
is available there. So, pg_log_backend_memory_contexts() allows us to
see the memory contexts of specified backend more easily.

Only superusers are allowed to request to log the memory contexts
because allowing any users to issue this request at an unbounded rate
would cause lots of log messages and which can lead to denial of service.

On receipt of the request, at the next CHECK_FOR_INTERRUPTS(),
the target backend logs its memory contexts at LOG_SERVER_ONLY level,
so that these memory contexts will appear in the server log but not
be sent to the client. It logs one message per memory context.
Because if it buffers all memory contexts into StringInfo to log them
as one message, which may require the buffer to be enlarged very much
and lead to OOM error since there can be a large number of memory
contexts in a backend.

When a backend process is consuming huge memory, logging all its
memory contexts might overrun available disk space. To prevent this,
now this patch limits the number of child contexts to log per parent
to 100. As with MemoryContextStats(), it supposes that practical cases
where the log gets long will typically be huge numbers of siblings
under the same parent context; while the additional debugging value
from seeing details about individual siblings beyond 100 will not be large.

There was another proposed patch to add the function to return
the memory contexts of specified backend as the result sets,
instead of logging them, in the discussion. However that patch is
not included in this commit because it had several issues to address.

Thanks to Tatsuhito Kasahara, Andres Freund, Tom Lane, Tomas Vondra,
Michael Paquier, Kyotaro Horiguchi and Zhihong Yu for the discussion.

Bump catalog version.

Author: Atsushi Torikoshi
Reviewed-by: Kyotaro Horiguchi, Zhihong Yu, Fujii Masao
Discussion: https://postgr.es/m/0271f440ac77f2a4180e0e56ebd944d1@oss.nttdata.com
2021-04-06 13:44:15 +09:00
Andres Freund 225a22b19e Improve efficiency of wait event reporting, remove proc.h dependency.
pgstat_report_wait_start() and pgstat_report_wait_end() required two
conditional branches so far. One to check if MyProc is NULL, the other to
check if pgstat_track_activities is set. As wait events are used around
comparatively lightweight operations, and are inlined (reducing branch
predictor effectiveness), that's not great.

The dependency on MyProc has a second disadvantage: Low-level subsystems, like
storage/file/fd.c, report wait events, but architecturally it is preferable
for them to not depend on inter-process subsystems like proc.h (defining
PGPROC).  After this change including pgstat.h (nor obviously its
sub-components like backend_status.h, wait_event.h, ...) does not pull in IPC
related headers anymore.

These goals, efficiency and abstraction, are achieved by having
pgstat_report_wait_start/end() not interact with MyProc, but instead a new
my_wait_event_info variable. At backend startup it points to a local variable,
removing the need to check for MyProc being NULL. During process
initialization my_wait_event_info is redirected to MyProc->wait_event_info. At
shutdown this is reversed. Because wait event reporting now does not need to
know about where the wait event is stored, it does not need to know about
PGPROC anymore.

The removal of the branch for checking pgstat_track_activities is simpler:
Don't check anymore. The cost due to the branch are often higher than the
store - and even if not, pgstat_track_activities is rarely disabled.

The main motivator to commit this work now is that removing the (indirect)
pgproc.h include from pgstat.h simplifies a patch to move statistics reporting
to shared memory (which still has a chance to get into 14).

Author: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/20210402194458.2vu324hkk2djq6ce@alap3.anarazel.de
2021-04-03 12:03:45 -07:00
Andres Freund e1025044cd Split backend status and progress related functionality out of pgstat.c.
Backend status (supporting pg_stat_activity) and command
progress (supporting pg_stat_progress*) related code is largely
independent from the rest of pgstat.[ch] (supporting views like
pg_stat_all_tables that accumulate data over time). See also
a333476b92.

This commit doesn't rename the function names to make the distinction
from the rest of pgstat_ clearer - that'd be more invasive and not
clearly beneficial. If we were to decide to do such a rename at some
point, it's better done separately from moving the code as well.

Robert's review was of an earlier version.

Reviewed-By: Robert Haas <robertmhaas@gmail.com>
Discussion: https://postgr.es/m/20210316195440.twxmlov24rr2nxrg@alap3.anarazel.de
2021-04-03 11:42:52 -07:00
Michael Paquier e6bdfd9700 Refactor HMAC implementations
Similarly to the cryptohash implementations, this refactors the existing
HMAC code into a single set of APIs that can be plugged with any crypto
libraries PostgreSQL is built with (only OpenSSL currently).  If there
is no such libraries, a fallback implementation is available.  Those new
APIs are designed similarly to the existing cryptohash layer, so there
is no real new design here, with the same logic around buffer bound
checks and memory handling.

HMAC has a dependency on cryptohashes, so all the cryptohash types
supported by cryptohash{_openssl}.c can be used with HMAC.  This
refactoring is an advantage mainly for SCRAM, that included its own
implementation of HMAC with SHA256 without relying on the existing
crypto libraries even if PostgreSQL was built with their support.

This code has been tested on Windows and Linux, with and without
OpenSSL, across all the versions supported on HEAD from 1.1.1 down to
1.0.1.  I have also checked that the implementations are working fine
using some sample results, a custom extension of my own, and doing
cross-checks across different major versions with SCRAM with the client
and the backend.

Author: Michael Paquier
Reviewed-by: Bruce Momjian
Discussion: https://postgr.es/m/X9m0nkEJEzIPXjeZ@paquier.xyz
2021-04-03 17:30:49 +09:00
Andres Freund a333476b92 Split wait event related code from pgstat.[ch] into wait_event.[ch].
The wait event related code is independent from the rest of the
pgstat.[ch] code, of nontrivial size and changes on a regular
basis. Put it into its own set of files.

As there doesn't seem to be a good pre-existing directory for code
like this, add src/backend/utils/activity.

Reviewed-By: Robert Haas <robertmhaas@gmail.com>
Discussion: https://postgr.es/m/20210316195440.twxmlov24rr2nxrg@alap3.anarazel.de
2021-04-02 20:02:26 -07:00
Thomas Munro c30f54ad73 Detect POLLHUP/POLLRDHUP while running queries.
Provide a new GUC check_client_connection_interval that can be used to
check whether the client connection has gone away, while running very
long queries.  It is disabled by default.

For now this uses a non-standard Linux extension (also adopted by at
least one other OS).  POLLRDHUP is not defined by POSIX, and other OSes
don't have a reliable way to know if a connection was closed without
actually trying to read or write.

In future we might consider trying to send a no-op/heartbeat message
instead, but that could require protocol changes.

Author: Sergey Cherkashin <s.cherkashin@postgrespro.ru>
Author: Thomas Munro <thomas.munro@gmail.com>
Reviewed-by: Thomas Munro <thomas.munro@gmail.com>
Reviewed-by: Tatsuo Ishii <ishii@sraoss.co.jp>
Reviewed-by: Konstantin Knizhnik <k.knizhnik@postgrespro.ru>
Reviewed-by: Zhihong Yu <zyu@yugabyte.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Maksim Milyutin <milyutinma@gmail.com>
Reviewed-by: Tsunakawa, Takayuki/綱川 貴之 <tsunakawa.takay@fujitsu.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> (much earlier version)
Discussion: https://postgr.es/m/77def86b27e41f0efcba411460e929ae%40postgrespro.ru
2021-04-03 09:02:41 +13:00
Joe Conway b12bd4869b Fix has_column_privilege function corner case
According to the comments, when an invalid or dropped column oid is passed
to has_column_privilege(), the intention has always been to return NULL.
However, when the caller had table level privilege the invalid/missing
column was never discovered, because table permissions were checked first.

Fix that by introducing extended versions of pg_attribute_acl(check|mask)
and pg_class_acl(check|mask) which take a new argument, is_missing. When
is_missing is NULL, the old behavior is preserved. But when is_missing is
passed by the caller, no ERROR is thrown for dropped or missing
columns/relations, and is_missing is flipped to true. This in turn allows
has_column_privilege to check for column privileges first, providing the
desired semantics.

Not backpatched since it is a user visible behavioral change with no previous
complaints, and the fix is a bit on the invasive side.

Author: Joe Conway
Reviewed-By: Tom Lane
Reported by: Ian Barwick
Discussion: https://postgr.es/m/flat/9b5f4311-157b-4164-7fe7-077b4fe8ed84%40joeconway.com
2021-03-31 13:55:25 -04:00
Peter Eisentraut 91c5a8caaa Add errhint_plural() function and make use of it
Similar to existing errmsg_plural() and errdetail_plural().  Some
errhint() calls hadn't received the proper plural treatment yet.
2021-03-31 09:16:25 +02:00
David Rowley ed934d4fa3 Allow estimate_num_groups() to pass back further details about the estimation
Here we add a new output parameter to estimate_num_groups() to allow it to
inform the caller of additional, possibly useful information about the
estimation.

The new output parameter is a struct that currently contains just a single
field with a set of flags.  This was done rather than having the flags as
an output parameter to allow future fields to be added without having to
change the signature of the function at a later date when we want to pass
back further information that might not be suitable to store in the flags
field.

It seems reasonable that one day in the future that the planner would want
to know more about the estimation. For example, how many individual sets
of statistics was the estimation generated from?  The planner may want to
take that into account if we ever want to consider risks as well as costs
when generating plans.

For now, there's only 1 flag we set in the flags field.  This is to
indicate if the estimation fell back on using the hard-coded constants in
any part of the estimation. Callers may like to change their behavior if
this is set, and this gives them the ability to do so.  Callers may pass
the flag pointer as NULL if they have no interest in obtaining any
additional information about the estimate.

We're not adding any actual usages of these flags here.  Some follow-up
commits will make use of this feature.  Additionally, we're also not
making any changes to add support for clauselist_selectivity() and
clauselist_selectivity_ext().  However, if this is required in the future
then the same struct being added here should be fine to use as a new
output argument for those functions too.

Author: David Rowley
Discussion: https://postgr.es/m/CAApHDvqQqpk=1W-G_ds7A9CsXX3BggWj_7okinzkLVhDubQzjA@mail.gmail.com
2021-03-30 20:52:46 +13:00
Tomas Vondra a4d75c86bf Extended statistics on expressions
Allow defining extended statistics on expressions, not just just on
simple column references.  With this commit, expressions are supported
by all existing extended statistics kinds, improving the same types of
estimates. A simple example may look like this:

  CREATE TABLE t (a int);
  CREATE STATISTICS s ON mod(a,10), mod(a,20) FROM t;
  ANALYZE t;

The collected statistics are useful e.g. to estimate queries with those
expressions in WHERE or GROUP BY clauses:

  SELECT * FROM t WHERE mod(a,10) = 0 AND mod(a,20) = 0;

  SELECT 1 FROM t GROUP BY mod(a,10), mod(a,20);

This introduces new internal statistics kind 'e' (expressions) which is
built automatically when the statistics object definition includes any
expressions. This represents single-expression statistics, as if there
was an expression index (but without the index maintenance overhead).
The statistics is stored in pg_statistics_ext_data as an array of
composite types, which is possible thanks to 79f6a942bd.

CREATE STATISTICS allows building statistics on a single expression, in
which case in which case it's not possible to specify statistics kinds.

A new system view pg_stats_ext_exprs can be used to display expression
statistics, similarly to pg_stats and pg_stats_ext views.

ALTER TABLE ... ALTER COLUMN ... TYPE now treats indexes the same way it
treats indexes, i.e. it drops and recreates the statistics. This means
all statistics are reset, and we no longer try to preserve at least the
functional dependencies. This should not be a major issue in practice,
as the functional dependencies actually rely on per-column statistics,
which were always reset anyway.

Author: Tomas Vondra
Reviewed-by: Justin Pryzby, Dean Rasheed, Zhihong Yu
Discussion: https://postgr.es/m/ad7891d2-e90c-b446-9fe2-7419143847d7%40enterprisedb.com
2021-03-27 00:01:11 +01:00
Alvaro Herrera 71f4c8c6f7
ALTER TABLE ... DETACH PARTITION ... CONCURRENTLY
Allow a partition be detached from its partitioned table without
blocking concurrent queries, by running in two transactions and only
requiring ShareUpdateExclusive in the partitioned table.

Because it runs in two transactions, it cannot be used in a transaction
block.  This is the main reason to use dedicated syntax: so that users
can choose to use the original mode if they need it.  But also, it
doesn't work when a default partition exists (because an exclusive lock
would still need to be obtained on it, in order to change its partition
constraint.)

In case the second transaction is cancelled or a crash occurs, there's
ALTER TABLE .. DETACH PARTITION .. FINALIZE, which executes the final
steps.

The main trick to make this work is the addition of column
pg_inherits.inhdetachpending, initially false; can only be set true in
the first part of this command.  Once that is committed, concurrent
transactions that use a PartitionDirectory will include or ignore
partitions so marked: in optimizer they are ignored if the row is marked
committed for the snapshot; in executor they are always included.  As a
result, and because of the way PartitionDirectory caches partition
descriptors, queries that were planned before the detach will see the
rows in the detached partition and queries that are planned after the
detach, won't.

A CHECK constraint is created that duplicates the partition constraint.
This is probably not strictly necessary, and some users will prefer to
remove it afterwards, but if the partition is re-attached to a
partitioned table, the constraint needn't be rechecked.

Author: Álvaro Herrera <alvherre@alvh.no-ip.org>
Reviewed-by: Amit Langote <amitlangote09@gmail.com>
Reviewed-by: Justin Pryzby <pryzby@telsasoft.com>
Discussion: https://postgr.es/m/20200803234854.GA24158@alvherre.pgsql
2021-03-25 18:00:28 -03:00
Amit Kapila 26acb54a13 Revert "Enable parallel SELECT for "INSERT INTO ... SELECT ..."."
To allow inserts in parallel-mode this feature has to ensure that all the
constraints, triggers, etc. are parallel-safe for the partition hierarchy
which is costly and we need to find a better way to do that. Additionally,
we could have used existing cached information in some cases like indexes,
domains, etc. to determine the parallel-safety.

List of commits reverted, in reverse chronological order:

ed62d3737c Doc: Update description for parallel insert reloption.
c8f78b6161 Add a new GUC and a reloption to enable inserts in parallel-mode.
c5be48f092 Improve FK trigger parallel-safety check added by 05c8482f7f.
e2cda3c20a Fix use of relcache TriggerDesc field introduced by commit 05c8482f7f.
e4e87a32cc Fix valgrind issue in commit 05c8482f7f.
05c8482f7f Enable parallel SELECT for "INSERT INTO ... SELECT ...".

Discussion: https://postgr.es/m/E1lMiB9-0001c3-SY@gemulon.postgresql.org
2021-03-24 11:29:15 +05:30
Bruce Momjian 95d77149c5 Add macro RelationIsPermanent() to report relation permanence
Previously, to check relation permanence, the Relation's Form_pg_class
structure member relpersistence was compared to the value
RELPERSISTENCE_PERMANENT ("p"). This commit adds the macro
RelationIsPermanent() and is used in appropirate places to simplify the
code.  This matches other RelationIs* macros.

This macro will be used in more places in future cluster file encryption
patches.

Discussion: https://postgr.es/m/20210318153134.GH20766@tamriel.snowman.net
2021-03-22 20:23:52 -04:00
Amit Kapila c8f78b6161 Add a new GUC and a reloption to enable inserts in parallel-mode.
Commit 05c8482f7f added the implementation of parallel SELECT for
"INSERT INTO ... SELECT ..." which may incur non-negligible overhead in
the additional parallel-safety checks that it performs, even when, in the
end, those checks determine that parallelism can't be used. This is
normally only ever a problem in the case of when the target table has a
large number of partitions.

A new GUC option "enable_parallel_insert" is added, to allow insert in
parallel-mode. The default is on.

In addition to the GUC option, the user may want a mechanism to allow
inserts in parallel-mode with finer granularity at table level. The new
table option "parallel_insert_enabled" allows this. The default is true.

Author: "Hou, Zhijie"
Reviewed-by: Greg Nancarrow, Amit Langote, Takayuki Tsunakawa, Amit Kapila
Discussion: https://postgr.es/m/CAA4eK1K-cW7svLC2D7DHoGHxdAdg3P37BLgebqBOC2ZLc9a6QQ%40mail.gmail.com
Discussion: https://postgr.es/m/CAJcOf-cXnB5cnMKqWEp2E2z7Mvcd04iLVmV=qpFJrR3AcrTS3g@mail.gmail.com
2021-03-18 07:25:27 +05:30
Heikki Linnakangas 3174d69fb9 Remove server and libpq support for old FE/BE protocol version 2.
Protocol version 3 was introduced in PostgreSQL 7.4. There shouldn't be
many clients or servers left out there without version 3 support. But as
a courtesy, I kept just enough of the old protocol support that we can
still send the "unsupported protocol version" error in v2 format, so that
old clients can display the message properly. Likewise, libpq still
understands v2 ErrorResponse messages when establishing a connection.

The impetus to do this now is that I'm working on a patch to COPY
FROM, to always prefetch some data. We cannot do that safely with the
old protocol, because it requires parsing the input one byte at a time
to detect the end-of-copy marker.

Reviewed-by: Tom Lane, Alvaro Herrera, John Naylor
Discussion: https://www.postgresql.org/message-id/9ec25819-0a8a-d51a-17dc-4150bb3cca3b%40iki.fi
2021-03-04 10:45:55 +02:00
Thomas Munro 0fb0a0503b Hide internal error for pg_collation_actual_version(<bad OID>).
Instead of an unsightly internal "cache lookup failed" message, just
return NULL for bad OIDs, as is the convention for other similar things.

Reported-by: Justin Pryzby <pryzby@telsasoft.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/20210117215940.GE8560%40telsasoft.com
2021-02-22 23:01:20 +13:00
Michael Paquier bd12080980 Preserve pg_attribute.attstattarget across REINDEX CONCURRENTLY
For an index, attstattarget can be updated using ALTER INDEX SET
STATISTICS.  This data was lost on the new index after REINDEX
CONCURRENTLY.

The update of this field is done when the old and new indexes are
swapped to make the fix back-patchable.  Another approach we could look
after in the long-term is to change index_create() to pass the wanted
values of attstattarget when creating the new relation, but, as this
would cause an ABI breakage this can be done only on HEAD.

Reported-by: Ronan Dunklau
Author: Michael Paquier
Reviewed-by: Ronan Dunklau, Tomas Vondra
Discussion: https://postgr.es/m/16628084.uLZWGnKmhe@laptop-ronand
Backpatch-through: 12
2021-02-10 13:06:48 +09:00
Peter Geoghegan 617fffee8a Rename removable xid function for consistency.
GlobalVisIsRemovableFullXid() is now GlobalVisCheckRemovableFullXid().
This is consistent with the general convention for FullTransactionId
equivalents of functions that deal with TransactionId values.  It now
matches the nearby GlobalVisCheckRemovableXid() function, which performs
the same check for callers that use TransactionId values.

Oversight in commit dc7420c2c9.

Discussion: https://postgr.es/m/CAH2-Wzmes12jFNDcVgpU89Vp=r6uLFrE-MT0fjSWGsE70UiNaA@mail.gmail.com
2021-02-07 10:11:14 -08:00
Alexander Korotkov 676887a3b0 Implementation of subscripting for jsonb
Subscripting for jsonb does not support slices, does not have a limit for the
number of subscripts, and an assignment expects a replace value to have jsonb
type.  There is also one functional difference between assignment via
subscripting and assignment via jsonb_set().  When an original jsonb container
is NULL, the subscripting replaces it with an empty jsonb and proceeds with
an assignment.

For the sake of code reuse, we rearrange some parts of jsonb functionality
to allow the usage of the same functions for jsonb_set and assign subscripting
operation.

The original idea belongs to Oleg Bartunov.

Catversion is bumped.

Discussion: https://postgr.es/m/CA%2Bq6zcV8qvGcDXurwwgUbwACV86Th7G80pnubg42e-p9gsSf%3Dg%40mail.gmail.com
Discussion: https://postgr.es/m/CA%2Bq6zcX3mdxGCgdThzuySwH-ApyHHM-G4oB1R0fn0j2hZqqkLQ%40mail.gmail.com
Discussion: https://postgr.es/m/CA%2Bq6zcVDuGBv%3DM0FqBYX8DPebS3F_0KQ6OVFobGJPM507_SZ_w%40mail.gmail.com
Discussion: https://postgr.es/m/CA%2Bq6zcVovR%2BXY4mfk-7oNk-rF91gH0PebnNfuUjuuDsyHjOcVA%40mail.gmail.com
Author: Dmitry Dolgov
Reviewed-by: Tom Lane, Arthur Zakirov, Pavel Stehule, Dian M Fay
Reviewed-by: Andrew Dunstan, Chapman Flack, Merlin Moncure, Peter Geoghegan
Reviewed-by: Alvaro Herrera, Jim Nasby, Josh Berkus, Victor Wagner
Reviewed-by: Aleksander Alekseev, Robert Haas, Oleg Bartunov
2021-01-31 23:50:40 +03:00
Noah Misch 7da83415e5 Revive "snapshot too old" with wal_level=minimal and SET TABLESPACE.
Given a permanent relation rewritten in the current transaction, the
old_snapshot_threshold mechanism assumed the relation had never been
subject to early pruning.  Hence, a query could fail to report "snapshot
too old" when the rewrite followed an early truncation.  ALTER TABLE SET
TABLESPACE is probably the only rewrite mechanism capable of exposing
this bug.  REINDEX sets indcheckxmin, avoiding the problem.  CLUSTER has
zeroed page LSNs since before old_snapshot_threshold existed, so
old_snapshot_threshold has never cooperated with it.  ALTER TABLE
... SET DATA TYPE makes the table look empty to every past snapshot,
which is strictly worse.  Back-patch to v13, where commit
c6b92041d3 broke this.

Kyotaro Horiguchi and Noah Misch

Discussion: https://postgr.es/m/20210113.160705.2225256954956139776.horikyota.ntt@gmail.com
2021-01-30 00:12:18 -08:00
Tom Lane ee895a655c Improve performance of repeated CALLs within plpgsql procedures.
This patch essentially is cleaning up technical debt left behind
by the original implementation of plpgsql procedures, particularly
commit d92bc83c4.  That patch (or more precisely, follow-on patches
fixing its worst bugs) forced us to re-plan CALL and DO statements
each time through, if we're in a non-atomic context.  That wasn't
for any fundamental reason, but just because use of a saved plan
requires having a ResourceOwner to hold a reference count for the
plan, and we had no suitable resowner at hand, nor would the
available APIs support using one if we did.  While it's not that
expensive to create a "plan" for CALL/DO, the cycles do add up
in repeated executions.

This patch therefore makes the following API changes:

* GetCachedPlan/ReleaseCachedPlan are modified to let the caller
specify which resowner to use to pin the plan, rather than forcing
use of CurrentResourceOwner.

* spi.c gains a "SPI_execute_plan_extended" entry point that lets
callers say which resowner to use to pin the plan.  This borrows the
idea of an options struct from the recently added SPI_prepare_extended,
hopefully allowing future options to be added without more API breaks.
This supersedes SPI_execute_plan_with_paramlist (which I've marked
deprecated) as well as SPI_execute_plan_with_receiver (which is new
in v14, so I just took it out altogether).

* I also took the opportunity to remove the crude hack of letting
plpgsql reach into SPI private data structures to mark SPI plans as
"no_snapshot".  It's better to treat that as an option of
SPI_prepare_extended.

Now, when running a non-atomic procedure or DO block that contains
any CALL or DO commands, plpgsql creates a ResourceOwner that
will be used to pin the plans of the CALL/DO commands.  (In an
atomic context, we just use CurrentResourceOwner, as before.)
Having done this, we can just save CALL/DO plans normally,
whether or not they are used across transaction boundaries.
This seems to be good for something like 2X speedup of a CALL
of a trivial procedure with a few simple argument expressions.
By restricting the creation of an extra ResourceOwner like this,
there's essentially zero penalty in cases that can't benefit.

Pavel Stehule, with some further hacking by me

Discussion: https://postgr.es/m/CAFj8pRCLPdDAETvR7Po7gC5y_ibkn_-bOzbeJb39WHms01194Q@mail.gmail.com
2021-01-25 22:28:29 -05:00
Michael Paquier aef8948f38 Rework refactoring of hex and encoding routines
This commit addresses some issues with c3826f83 that moved the hex
decoding routine to src/common/:
- The decoding function lacked overflow checks, so when used for
security-related features it was an open door to out-of-bound writes if
not carefully used that could remain undetected.  Like the base64
routines already in src/common/ used by SCRAM, this routine is reworked
to check for overflows by having the size of the destination buffer
passed as argument, with overflows checked before doing any writes.
- The encoding routine was missing.  This is moved to src/common/ and
it gains the same overflow checks as the decoding part.

On failure, the hex routines of src/common/ issue an error as per the
discussion done to make them usable by frontend tools, but not by shared
libraries.  Note that this is why ECPG is left out of this commit, and
it still includes a duplicated logic doing hex encoding and decoding.

While on it, this commit uses better variable names for the source and
destination buffers in the existing escape and base64 routines in
encode.c and it makes them more robust to overflow detection.  The
previous core code issued a FATAL after doing out-of-bound writes if
going through the SQL functions, which would be enough to detect
problems when working on changes that impacted this area of the
code.  Instead, an error is issued before doing an out-of-bound write.
The hex routines were being directly called for bytea conversions and
backup manifests without such sanity checks.  The current calls happen
to not have any problems, but careless uses of such APIs could easily
lead to CVE-class bugs.

Author: Bruce Momjian, Michael Paquier
Reviewed-by: Sehrope Sarkuni
Discussion: https://postgr.es/m/20201231003557.GB22199@momjian.us
2021-01-14 11:13:24 +09:00
Tom Lane 9877374bef Add idle_session_timeout.
This GUC variable works much like idle_in_transaction_session_timeout,
in that it kills sessions that have waited too long for a new client
query.  But it applies when we're not in a transaction, rather than
when we are.

Li Japin, reviewed by David Johnston and Hayato Kuroda, some
fixes by me

Discussion: https://postgr.es/m/763A0689-F189-459E-946F-F0EC4458980B@hotmail.com
2021-01-06 18:28:52 -05:00
Peter Eisentraut 4656e3d668 Replace CLOBBER_CACHE_ALWAYS with run-time GUC
Forced cache invalidation (CLOBBER_CACHE_ALWAYS) has been impractical
to use for testing in PostgreSQL because it's so slow and because it's
toggled on/off only at build time.  It is helpful when hunting bugs in
any code that uses the sycache/relcache because causes cache
invalidations to be injected whenever it would be possible for an
invalidation to occur, whether or not one was really pending.

Address this by providing run-time control over cache clobber
behaviour using the new debug_invalidate_system_caches_always GUC.
Support is not compiled in at all unless assertions are enabled or
CLOBBER_CACHE_ENABLED is explicitly defined at compile time.  It
defaults to 0 if compiled in, so it has negligible effect on assert
build performance by default.

When support is compiled in, test code can now set
debug_invalidate_system_caches_always=1 locally to a backend to test
specific queries, functions, extensions, etc.  Or tests can toggle it
globally for a specific test case while retaining normal performance
during test setup and teardown.

For backwards compatibility with existing test harnesses and scripts,
debug_invalidate_system_caches_always defaults to 1 if
CLOBBER_CACHE_ALWAYS is defined, and to 3 if CLOBBER_CACHE_RECURSIVE
is defined.

CLOBBER_CACHE_ENABLED is now visible in pg_config_manual.h, as is the
related RECOVER_RELATION_BUILD_MEMORY setting for the relcache.

Author: Craig Ringer <craig.ringer@2ndquadrant.com>
Discussion: https://www.postgresql.org/message-id/flat/CAMsr+YF=+ctXBZj3ywmvKNUjWpxmuTuUKuv-rgbHGX5i5pLstQ@mail.gmail.com
2021-01-06 10:46:44 +01:00
Bruce Momjian ca3b37487b Update copyright for 2021
Backpatch-through: 9.5
2021-01-02 13:06:25 -05:00
Tom Lane 1f9158ba48 Suppress log spam from multiple reports of SIGQUIT shutdown.
When the postmaster sends SIGQUIT to its children, there's no real
need for all the children to log that fact; the postmaster already
made a log entry about it, so adding perhaps dozens or hundreds of
child-process log entries adds nothing of value.  So, let's introduce
a new ereport level to specify "WARNING, but never send to log" and
use that for these messages.

Such a change wouldn't have been desirable before commit 7e784d1dc,
because if someone manually SIGQUIT's a backend, we *do* want to log
that.  But now we can tell the difference between a signal that was
issued by the postmaster and one that was not with reasonable
certainty.

While we're here, also clear error_context_stack before ereport'ing,
to prevent error callbacks from being invoked in the signal-handler
context.  This should reduce the odds of getting hung up while trying
to notify the client.

Per a suggestion from Andres Freund.

Discussion: https://postgr.es/m/20201225230331.hru3u6obyy6j53tk@alap3.anarazel.de
2020-12-29 18:02:38 -05:00
Alexander Korotkov db6335b5b1 Add support of multirange matching to the existing range GiST indexes
6df7a9698b has introduced a set of operators between ranges and multiranges.
Existing GiST indexes for ranges could easily support majority of them.
This commit adds support for new operators to the existing range GiST indexes.
New operators resides the same strategy numbers as existing ones.  Appropriate
check function is determined using the subtype.

Catversion is bumped.
2020-12-29 23:36:43 +03:00
Alexander Korotkov d1d61a8b23 Improve the signature of internal multirange functions
There is a set of *_internal() functions exposed in
include/utils/multirangetypes.h.  This commit improves the signatures of these
functions in two ways.
 * Add const qualifies where applicable.
 * Replace multirange typecache argument with range typecache argument.
   Multirange typecache was used solely to find the range typecache.  At the
   same time, range typecache is easier for the caller to find.
2020-12-29 23:35:38 +03:00
Alexander Korotkov 4d7684cc75 Implement operators for checking if the range contains a multirange
We have operators for checking if the multirange contains a range but don't
have the opposite.  This commit improves completeness of the operator set by
adding two new operators: @> (anyrange,anymultirange) and
<@(anymultirange,anyrange).

Catversion is bumped.
2020-12-29 23:35:33 +03:00
Bruce Momjian 3187ef7c46 Revert "Add key management system" (978f869b99) & later commits
The patch needs test cases, reorganization, and cfbot testing.
Technically reverts commits 5c31afc49d..e35b2bad1a (exclusive/inclusive)
and 08db7c63f3..ccbe34139b.

Reported-by: Tom Lane, Michael Paquier

Discussion: https://postgr.es/m/E1ktAAG-0002V2-VB@gemulon.postgresql.org
2020-12-27 21:37:42 -05:00
Bruce Momjian 978f869b99 Add key management system
This adds a key management system that stores (currently) two data
encryption keys of length 128, 192, or 256 bits.  The data keys are
AES256 encrypted using a key encryption key, and validated via GCM
cipher mode.  A command to obtain the key encryption key must be
specified at initdb time, and will be run at every database server
start.  New parameters allow a file descriptor open to the terminal to
be passed.  pg_upgrade support has also been added.

Discussion: https://postgr.es/m/CA+fd4k7q5o6Nc_AaX6BcYM9yqTbC6_pnH-6nSD=54Zp6NBQTCQ@mail.gmail.com
Discussion: https://postgr.es/m/20201202213814.GG20285@momjian.us

Author: Masahiko Sawada, me, Stephen Frost
2020-12-25 10:19:44 -05:00
Bruce Momjian c3826f831e move hex_decode() to /common so it can be called from frontend
This allows removal of a copy of hex_decode() from ecpg, and will be
used by the soon-to-be added pg_alterckey command.

Backpatch-through: master
2020-12-24 17:25:48 -05:00
Alexander Korotkov 6df7a9698b Multirange datatypes
Multiranges are basically sorted arrays of non-overlapping ranges with
set-theoretic operations defined over them.

Since v14, each range type automatically gets a corresponding multirange
datatype.  There are both manual and automatic mechanisms for naming multirange
types.  Once can specify multirange type name using multirange_type_name
attribute in CREATE TYPE.  Otherwise, a multirange type name is generated
automatically.  If the range type name contains "range" then we change that to
"multirange".  Otherwise, we add "_multirange" to the end.

Implementation of multiranges comes with a space-efficient internal
representation format, which evades extra paddings and duplicated storage of
oids.  Altogether this format allows fetching a particular range by its index
in O(n).

Statistic gathering and selectivity estimation are implemented for multiranges.
For this purpose, stored multirange is approximated as union range without gaps.
This field will likely need improvements in the future.

Catversion is bumped.

Discussion: https://postgr.es/m/CALNJ-vSUpQ_Y%3DjXvTxt1VYFztaBSsWVXeF1y6gTYQ4bOiWDLgQ%40mail.gmail.com
Discussion: https://postgr.es/m/a0b8026459d1e6167933be2104a6174e7d40d0ab.camel%40j-davis.com#fe7218c83b08068bfffb0c5293eceda0
Author: Paul Jungwirth, revised by me
Reviewed-by: David Fetter, Corey Huinker, Jeff Davis, Pavel Stehule
Reviewed-by: Alvaro Herrera, Tom Lane, Isaac Morland, David G. Johnston
Reviewed-by: Zhihong Yu, Alexander Korotkov
2020-12-20 07:20:33 +03:00
Tom Lane b3817f5f77 Improve hash_create()'s API for some added robustness.
Invent a new flag bit HASH_STRINGS to specify C-string hashing, which
was formerly the default; and add assertions insisting that exactly
one of the bits HASH_STRINGS, HASH_BLOBS, and HASH_FUNCTION be set.
This is in hopes of preventing recurrences of the type of oversight
fixed in commit a1b8aa1e4 (i.e., mistakenly omitting HASH_BLOBS).

Also, when HASH_STRINGS is specified, insist that the keysize be
more than 8 bytes.  This is a heuristic, but it should catch
accidental use of HASH_STRINGS for integer or pointer keys.
(Nearly all existing use-cases set the keysize to NAMEDATALEN or
more, so there's little reason to think this restriction should
be problematic.)

Tweak hash_create() to insist that the HASH_ELEM flag be set, and
remove the defaults it had for keysize and entrysize.  Since those
defaults were undocumented and basically useless, no callers
omitted HASH_ELEM anyway.

Also, remove memset's zeroing the HASHCTL parameter struct from
those callers that had one.  This has never been really necessary,
and while it wasn't a bad coding convention it was confusing that
some callers did it and some did not.  We might as well save a few
cycles by standardizing on "not".

Also improve the documentation for hash_create().

In passing, improve reinit.c's usage of a hash table by storing
the key as a binary Oid rather than a string; and, since that's
a temporary hash table, allocate it in CurrentMemoryContext for
neatness.

Discussion: https://postgr.es/m/590625.1607878171@sss.pgh.pa.us
2020-12-15 11:38:53 -05:00
Tom Lane c7aba7c14e Support subscripting of arbitrary types, not only arrays.
This patch generalizes the subscripting infrastructure so that any
data type can be subscripted, if it provides a handler function to
define what that means.  Traditional variable-length (varlena) arrays
all use array_subscript_handler(), while the existing fixed-length
types that support subscripting use raw_array_subscript_handler().
It's expected that other types that want to use subscripting notation
will define their own handlers.  (This patch provides no such new
features, though; it only lays the foundation for them.)

To do this, move the parser's semantic processing of subscripts
(including coercion to whatever data type is required) into a
method callback supplied by the handler.  On the execution side,
replace the ExecEvalSubscriptingRef* layer of functions with direct
calls to callback-supplied execution routines.  (Thus, essentially
no new run-time overhead should be caused by this patch.  Indeed,
there is room to remove some overhead by supplying specialized
execution routines.  This patch does a little bit in that line,
but more could be done.)

Additional work is required here and there to remove formerly
hard-wired assumptions about the result type, collation, etc
of a SubscriptingRef expression node; and to remove assumptions
that the subscript values must be integers.

One useful side-effect of this is that we now have a less squishy
mechanism for identifying whether a data type is a "true" array:
instead of wiring in weird rules about typlen, we can look to see
if pg_type.typsubscript == F_ARRAY_SUBSCRIPT_HANDLER.  For this
to be bulletproof, we have to forbid user-defined types from using
that handler directly; but there seems no good reason for them to
do so.

This patch also removes assumptions that the number of subscripts
is limited to MAXDIM (6), or indeed has any hard-wired limit.
That limit still applies to types handled by array_subscript_handler
or raw_array_subscript_handler, but to discourage other dependencies
on this constant, I've moved it from c.h to utils/array.h.

Dmitry Dolgov, reviewed at various times by Tom Lane, Arthur Zakirov,
Peter Eisentraut, Pavel Stehule

Discussion: https://postgr.es/m/CA+q6zcVDuGBv=M0FqBYX8DPebS3F_0KQ6OVFobGJPM507_SZ_w@mail.gmail.com
Discussion: https://postgr.es/m/CA+q6zcVovR+XY4mfk-7oNk-rF91gH0PebnNfuUjuuDsyHjOcVA@mail.gmail.com
2020-12-09 12:40:37 -05:00
Peter Eisentraut 8b069ef5dc Change get_constraint_index() to use pg_constraint.conindid
It was still using a scan of pg_depend instead of using the conindid
column that has been added since.

Since it is now just a catalog lookup wrapper and not related to
pg_depend, move from pg_depend.c to lsyscache.c.

Reviewed-by: Matthias van de Meent <boekewurm+postgres@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://www.postgresql.org/message-id/flat/4688d55c-9a2e-9a5a-d166-5f24fe0bf8db%40enterprisedb.com
2020-12-09 15:41:45 +01:00
Michael Paquier 4f48a6fbe2 Change SHA2 implementation based on OpenSSL to use EVP digest routines
The use of low-level hash routines is not recommended by upstream
OpenSSL since 2000, and pgcrypto already switched to EVP as of 5ff4a67.
This takes advantage of the refactoring done in 87ae969 that has
introduced the allocation and free routines for cryptographic hashes.

Since 1.1.0, OpenSSL does not publish the contents of the cryptohash
contexts, forcing any consumers to rely on OpenSSL for all allocations.
Hence, the resource owner callback mechanism gains a new set of routines
to track and free cryptohash contexts when using OpenSSL, preventing any
risks of leaks in the backend.  Nothing is needed in the frontend thanks
to the refactoring of 87ae969, and the resowner knowledge is isolated
into cryptohash_openssl.c.

Note that this also fixes a failure with SCRAM authentication when using
FIPS in OpenSSL, but as there have been few complaints about this
problem and as this causes an ABI breakage, no backpatch is done.

Author: Michael Paquier
Reviewed-by: Daniel Gustafsson, Heikki Linnakangas
Discussion: https://postgr.es/m/20200924025314.GE7405@paquier.xyz
Discussion: https://postgr.es/m/20180911030250.GA27115@paquier.xyz
2020-12-04 10:49:23 +09:00
Tom Lane 2432b1a040 Avoid spamming the client with multiple ParameterStatus messages.
Up to now, we sent a ParameterStatus message to the client immediately
upon any change in the active value of any GUC_REPORT variable.  This
was only barely okay when the feature was designed; now that we have
things like function SET clauses, there are very plausible use-cases
where a GUC_REPORT variable might change many times within a query
--- and even end up back at its original value, perhaps.  Fortunately
most of our GUC_REPORT variables are unlikely to be changed often;
but there are proposals in play to enlarge that set, or even make it
user-configurable.

Hence, let's fix things to not generate more than one ParameterStatus
message per variable per query, and to not send any message at all
unless the end-of-query value is different from what we last reported.

Discussion: https://postgr.es/m/5708.1601145259@sss.pgh.pa.us
2020-11-25 11:40:44 -05:00
David Rowley b0727ae99b Tidy up definitions of pg_attribute_hot and pg_attribute_cold
1fa22a43a was a quick fix for portability problem I introduced in
697e1d02f.  1fa22a43a adds a few more cases to the preprocessor logic than
I'd have liked.  Andres Freund and Dagfinn Ilmari Mannsåker suggested a
better way to do this.

In passing, also adjust the only current usage of these macros so that the
macro comes before the function's return type in the declaration of the
function.  This now matches what the definition of the function does.

Discussion: https://postgr.es/m/20200625163553.lt6wocbjhklp5pl4@alap3.anarazel.de
Discussion: https://postgr.es/m/87pn43bmok.fsf@wibble.ilmari.org
2020-11-25 10:52:50 +13:00
Heikki Linnakangas 0a2bc5d61e Move per-agg and per-trans duplicate finding to the planner.
This has the advantage that the cost estimates for aggregates can count
the number of calls to transition and final functions correctly.

Bump catalog version, because views can contain Aggrefs.

Reviewed-by: Andres Freund
Discussion: https://www.postgresql.org/message-id/b2e3536b-1dbc-8303-c97e-89cb0b4a9a48%40iki.fi
2020-11-24 10:45:00 +02:00
Tom Lane 789b938bf2 Centralize logic for skipping useless ereport/elog calls.
While ereport() and elog() themselves are quite cheap when the
error message level is too low to be printed, some places need to do
substantial work before they can call those macros at all.  To allow
optimizing away such setup work when nothing is to be printed, make
elog.c export a new function message_level_is_interesting(elevel)
that reports whether ereport/elog will do anything.  Make use of that
in various places that had ad-hoc direct tests of log_min_messages etc.
Also teach ProcSleep to use it to avoid some work.  (There may well
be other places that could usefully use this; I didn't search hard.)

Within elog.c, refactor a little bit to avoid having duplicate copies
of the policy-setting logic.  When that code was written, we weren't
relying on the availability of inline functions; so it had some
duplications in the name of efficiency, which I got rid of.

Alvaro Herrera and Tom Lane

Discussion: https://postgr.es/m/129515.1606166429@sss.pgh.pa.us
2020-11-23 19:10:46 -05:00
David Rowley 913ec71d68 Improve compiler code layout in elog/ereport ERROR calls
Here we use a bit of preprocessor trickery to coax supporting compilers
into laying out their generated code so that the code that's in the same
branch as elog(ERROR)/ereport(ERROR) calls is moved away from the hot
path.  Effectively, this reduces the size of the hot code meaning that it
can sit on fewer cache lines.

Performance improvements of between 10-15% have been seen on highly CPU
bound workloads using pgbench's TPC-b benchmark.

What's achieved here is very similar to putting the error condition inside
an unlikely() macro. For example;

if (unlikely(x < 0))
    elog(ERROR, "invalid x value");

now there's no need to make use of unlikely() here as the common macro
used by elog and ereport will now see that elevel is >= ERROR and make use
of a pg_attribute_cold marked version of errstart().

When elevel < ERROR or if it cannot be determined to be constant, the
original behavior is maintained.

Author: David Rowley
Reviewed-by: Andres Freund, Peter Eisentraut
Discussion: https://postgr.es/m/CAApHDvrVpasrEzLL2er7p9iwZFZ%3DJj6WisePcFeunwfrV0js_A%40mail.gmail.com
2020-11-24 12:04:42 +13:00
Tom Lane 8597a48d01 Fix FPeq() and friends to get the right answers for infinities.
"FPeq(infinity, infinity)" returned false, on account of getting NaN
when it subtracts the two inputs.  Fix that by adding a separate
check for exact equality.

FPle() and FPge() similarly got the wrong answer for two like-signed
infinities.  In those cases, we can just rearrange the comparisons
to avoid potentially subtracting infinities.

While the sibling functions FPne() etc accidentally gave the right
answers even with the internal NaN results, it seems best to make
similar adjustments to them to avoid depending on this.

FPeq() has to be converted to an inline function to avoid double
evaluations of its arguments, and I did the same for the others
just for consistency.

In passing, make the handling of NaN cases in line_eq() and
point_eq_point() simpler and easier to reason about, and perhaps
faster.

This results in just one visible regression test change: slope()
now gives DBL_MAX for two inputs of (inf,1e300), which is consistent
with what it does for (1e300,inf), so that seems like a bug fix.

Discussion: https://postgr.es/m/CAGf+fX70rWFOk5cd00uMfa__0yP+vtQg5ck7c2Onb-Yczp0URA@mail.gmail.com
2020-11-21 16:46:43 -05:00
Peter Eisentraut a378ba49a5 Add pg_nodiscard decorations to some functions
Especially for the list API such as lappend() forgetting to assign the
return value is a common problem.

Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://www.postgresql.org/message-id/flat/e3753562-99cd-b65f-5aca-687dfd1ec2fc@2ndquadrant.com
2020-11-11 11:00:27 +01:00
Tom Lane ec29427ce2 Fix and simplify some usages of TimestampDifference().
Introduce TimestampDifferenceMilliseconds() to simplify callers
that would rather have the difference in milliseconds, instead of
the select()-oriented seconds-and-microseconds format.  This gets
rid of at least one integer division per call, and it eliminates
some apparently-easy-to-mess-up arithmetic.

Two of these call sites were in fact wrong:

* pg_prewarm's autoprewarm_main() forgot to multiply the seconds
by 1000, thus ending up with a delay 1000X shorter than intended.
That doesn't quite make it a busy-wait, but close.

* postgres_fdw's pgfdw_get_cleanup_result() thought it needed to compute
microseconds not milliseconds, thus ending up with a delay 1000X longer
than intended.  Somebody along the way had noticed this problem but
misdiagnosed the cause, and imposed an ad-hoc 60-second limit rather
than fixing the units.  This was relatively harmless in context, because
we don't care that much about exactly how long this delay is; still,
it's wrong.

There are a few more callers of TimestampDifference() that don't
have a direct need for seconds-and-microseconds, but can't use
TimestampDifferenceMilliseconds() either because they do need
microsecond precision or because they might possibly deal with
intervals long enough to overflow 32-bit milliseconds.  It might be
worth inventing another API to improve that, but that seems outside
the scope of this patch; so those callers are untouched here.

Given the fact that we are fixing some bugs, and the likelihood
that future patches might want to back-patch code that uses this
new API, back-patch to all supported branches.

Alexey Kondratov and Tom Lane

Discussion: https://postgr.es/m/3b1c053a21c07c1ed5e00be3b2b855ef@postgrespro.ru
2020-11-10 22:51:54 -05:00
Tom Lane fac83dbd6f Remove underflow error in float division with infinite divisor.
float4_div and float8_div correctly produced zero for zero divided
by infinity, but threw an underflow error for nonzero finite values
divided by infinity.  This seems wrong; at the very least it's
inconsistent with the behavior recently implemented for numeric
infinities.  Remove the error and allow zero to be returned.

This patch also removes a useless isinf() test from the overflow
checks in these functions (non-Inf divided by Inf can't produce Inf).

Extracted from a larger patch; this seems significant outside the
context of geometric operators, so it deserves its own commit.

Kyotaro Horiguchi

Discussion: https://postgr.es/m/CAGf+fX70rWFOk5cd00uMfa__0yP+vtQg5ck7c2Onb-Yczp0URA@mail.gmail.com
2020-11-04 18:11:15 -05:00
Thomas Munro 257836a755 Track collation versions for indexes.
Record the current version of dependent collations in pg_depend when
creating or rebuilding an index.  When accessing the index later, warn
that the index may be corrupted if the current version doesn't match.

Thanks to Douglas Doole, Peter Eisentraut, Christoph Berg, Laurenz Albe,
Michael Paquier, Robert Haas, Tom Lane and others for very helpful
discussion.

Author: Thomas Munro <thomas.munro@gmail.com>
Author: Julien Rouhaud <rjuju123@gmail.com>
Reviewed-by: Peter Eisentraut <peter.eisentraut@2ndquadrant.com> (earlier versions)
Discussion: https://postgr.es/m/CAEepm%3D0uEQCpfq_%2BLYFBdArCe4Ot98t1aR4eYiYTe%3DyavQygiQ%40mail.gmail.com
2020-11-03 01:19:50 +13:00
Tom Lane 3db322eaab Prevent internal overflows in date-vs-timestamp and related comparisons.
The date-vs-timestamp, date-vs-timestamptz, and timestamp-vs-timestamptz
comparators all worked by promoting the first type to the second and
then doing a simple same-type comparison.  This works fine, except
when the conversion result is out of range, in which case we throw an
entirely avoidable error.  The sources of such failures are
(a) type date can represent dates much farther in the future than
the timestamp types can;
(b) timezone rotation might cause a just-in-range timestamp value to
become a just-out-of-range timestamptz value.

Up to now we just ignored these corner-case issues, but now we have
an actual user complaint (bug #16657 from Huss EL-Sheikh), so let's
do something about it.

It turns out that commit 52ad1e659 already built all the necessary
infrastructure to support error-free comparisons, but neglected to
actually use it in the main-line code paths.  Fix that, do a little
bit of code style review, and remove the now-duplicate logic in
jsonpath_exec.c.

Back-patch to v13 where 52ad1e659 came in.  We could take this back
further by back-patching said infrastructure, but given the small
number of complaints so far, I don't feel a great need to.

Discussion: https://postgr.es/m/16657-cde2f876d8cc7971@postgresql.org
2020-10-07 17:10:26 -04:00
Tom Lane 97b6144826 Make postgres.bki use the same literal-string syntax as postgresql.conf.
The BKI file's string quoting conventions were previously quite weird,
perhaps as a result of repurposing a function built to scan
single-quoted strings to scan double-quoted ones.  Change to use the
same rules as we use in GUC files, allowing some simplifications in
genbki.pl and initdb.c.

While at it, completely remove the backend's scanstr() function, which
was essentially a duplicate of the string dequoting code in guc-file.l.
Instead export that one (under a less generic name than it had) and let
bootscanner.l use it.  Now we can clarify that scansup.c exists only to
support the main lexer. We could alternatively have removed GUC_scanstr,
but this way seems better since the previous arrangement could mislead
a reader into thinking that scanstr() had something to do with the main
lexer's handling of string literals.  Maybe it did once, but if so it
was a long time ago.

This patch does not bump catversion, since the initially-installed
catalog contents don't change.  Note however that successful initdb
after applying this patch will require up-to-date postgres.bki as well
as postgres and initdb executables.

In passing, remove a bunch of very-long-obsolete #include's in
bootparse.y and bootscanner.l.

John Naylor

Discussion: https://postgr.es/m/CACPNZCtDpd18T0KATTmCggO2GdVC4ow86ypiq5ENff1VnauL8g@mail.gmail.com
2020-10-04 16:09:55 -04:00
Robert Haas f5ea92e8d6 Expose oldSnapshotControl definition via new header.
This makes it possible for code outside snapmgr.c to examine the
contents of this data structure. This commit does not add any code
which actually does so; a subsequent commit will make that change.

Patch by me, reviewed by Thomas Munro, Dilip Kumar, Hamid Akhtar.

Discussion: http://postgr.es/m/CA+TgmoY=aqf0zjTD+3dUWYkgMiNDegDLFjo+6ze=Wtpik+3XqA@mail.gmail.com
2020-09-24 13:33:09 -04:00
Thomas Munro be0a666665 Remove large fill factor support from dynahash.c.
Since ancient times we have had support for a fill factor (maximum load
factor) to be set for a dynahash hash table, but:

1.  It was an integer, whereas for in-memory hash tables interesting
load factor targets are probably somewhere near the 0.75-1.0 range.

2.  It was implemented in a way that performed an expensive division
operation that regularly showed up in profiles.

3.  We are not aware of anyone ever having used a non-default value.

Therefore, remove support, effectively fixing it at 1.

Author: Jakub Wartak <Jakub.Wartak@tomtom.com>
Reviewed-by: Alvaro Herrera <alvherre@2ndquadrant.com>
Reviewed-by: Tomas Vondra <tomas.vondra@2ndquadrant.com>
Reviewed-by: Thomas Munro <thomas.munro@gmail.com>
Reviewed-by: David Rowley <dgrowleyml@gmail.com>
Discussion: https://postgr.es/m/VI1PR0701MB696044FC35013A96FECC7AC8F62D0%40VI1PR0701MB6960.eurprd07.prod.outlook.com
2020-09-19 11:40:39 +12:00
Heikki Linnakangas 16fa9b2b30 Add support for building GiST index by sorting.
This adds a new optional support function to the GiST access method:
sortsupport. If it is defined, the GiST index is built by sorting all data
to the order defined by the sortsupport's comparator function, and packing
the tuples in that order to GiST pages. This is similar to how B-tree
index build works, and is much faster than inserting the tuples one by
one. The resulting index is smaller too, because the pages are packed more
tightly, upto 'fillfactor'. The normal build method works by splitting
pages, which tends to lead to more wasted space.

The quality of the resulting index depends on how good the opclass-defined
sort order is. A good order preserves locality of the input data.

As the first user of this facility, add 'sortsupport' function to the
point_ops opclass. It sorts the points in Z-order (aka Morton Code), by
interleaving the bits of the X and Y coordinates.

Author: Andrey Borodin
Reviewed-by: Pavel Borisov, Thomas Munro
Discussion: https://www.postgresql.org/message-id/1A36620E-CAD8-4267-9067-FB31385E7C0D%40yandex-team.ru
2020-09-17 11:33:40 +03:00
Jeff Davis 0758964963 logtape.c: do not preallocate for tapes when sorting
The preallocation logic is only useful for HashAgg, so disable it when
sorting.

Also, adjust an out-of-date comment.

Reviewed-by: Peter Geoghegan
Discussion: https://postgr.es/m/CAH2-Wzn_o7tE2+hRVvwSFghRb75AJ5g-nqGzDUqLYMexjOAe=g@mail.gmail.com
Backpatch-through: 13
2020-09-11 17:10:02 -07:00
Peter Eisentraut 0aa8f76408 Expose internal function for converting int64 to numeric
Existing callers had to take complicated detours via
DirectFunctionCall1().  This simplifies a lot of code.

Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://www.postgresql.org/message-id/flat/42b73d2d-da12-ba9f-570a-420e0cce19d9@phystech.edu
2020-09-09 20:16:28 +02:00
Andres Freund 623a9ba79b snapshot scalability: cache snapshots using a xact completion counter.
Previous commits made it faster/more scalable to compute snapshots. But not
building a snapshot is still faster. Now that GetSnapshotData() does not
maintain RecentGlobal* anymore, that is actually not too hard:

This commit introduces xactCompletionCount, which tracks the number of
top-level transactions with xids (i.e. which may have modified the database)
that completed in some form since the start of the server.

We can avoid rebuilding the snapshot's contents whenever the current
xactCompletionCount is the same as it was when the snapshot was
originally built.  Currently this check happens while holding
ProcArrayLock. While it's likely possible to perform the check without
acquiring ProcArrayLock, it seems better to do that separately /
later, some careful analysis is required. Even with the lock this is a
significant win on its own.

On a smaller two socket machine this gains another ~1.03x, on a larger
machine the effect is roughly double (earlier patch version tested
though).  If we were able to safely avoid the lock there'd be another
significant gain on top of that.

Author: Andres Freund <andres@anarazel.de>
Reviewed-By: Robert Haas <robertmhaas@gmail.com>
Reviewed-By: Thomas Munro <thomas.munro@gmail.com>
Reviewed-By: David Rowley <dgrowleyml@gmail.com>
Discussion: https://postgr.es/m/20200301083601.ews6hz5dduc3w2se@alap3.anarazel.de
2020-08-17 21:08:30 -07:00
Andres Freund dc7420c2c9 snapshot scalability: Don't compute global horizons while building snapshots.
To make GetSnapshotData() more scalable, it cannot not look at at each proc's
xmin: While snapshot contents do not need to change whenever a read-only
transaction commits or a snapshot is released, a proc's xmin is modified in
those cases. The frequency of xmin modifications leads to, particularly on
higher core count systems, many cache misses inside GetSnapshotData(), despite
the data underlying a snapshot not changing. That is the most
significant source of GetSnapshotData() scaling poorly on larger systems.

Without accessing xmins, GetSnapshotData() cannot calculate accurate horizons /
thresholds as it has so far. But we don't really have to: The horizons don't
actually change that much between GetSnapshotData() calls. Nor are the horizons
actually used every time a snapshot is built.

The trick this commit introduces is to delay computation of accurate horizons
until there use and using horizon boundaries to determine whether accurate
horizons need to be computed.

The use of RecentGlobal[Data]Xmin to decide whether a row version could be
removed has been replaces with new GlobalVisTest* functions.  These use two
thresholds to determine whether a row can be pruned:
1) definitely_needed, indicating that rows deleted by XIDs >= definitely_needed
   are definitely still visible.
2) maybe_needed, indicating that rows deleted by XIDs < maybe_needed can
   definitely be removed
GetSnapshotData() updates definitely_needed to be the xmin of the computed
snapshot.

When testing whether a row can be removed (with GlobalVisTestIsRemovableXid())
and the tested XID falls in between the two (i.e. XID >= maybe_needed && XID <
definitely_needed) the boundaries can be recomputed to be more accurate. As it
is not cheap to compute accurate boundaries, we limit the number of times that
happens in short succession.  As the boundaries used by
GlobalVisTestIsRemovableXid() are never reset (with maybe_needed updated by
GetSnapshotData()), it is likely that further test can benefit from an earlier
computation of accurate horizons.

To avoid regressing performance when old_snapshot_threshold is set (as that
requires an accurate horizon to be computed), heap_page_prune_opt() doesn't
unconditionally call TransactionIdLimitedForOldSnapshots() anymore. Both the
computation of the limited horizon, and the triggering of errors (with
SetOldSnapshotThresholdTimestamp()) is now only done when necessary to remove
tuples.

This commit just removes the accesses to PGXACT->xmin from
GetSnapshotData(), but other members of PGXACT residing in the same
cache line are accessed. Therefore this in itself does not result in a
significant improvement. Subsequent commits will take advantage of the
fact that GetSnapshotData() now does not need to access xmins anymore.

Note: This contains a workaround in heap_page_prune_opt() to keep the
snapshot_too_old tests working. While that workaround is ugly, the tests
currently are not meaningful, and it seems best to address them separately.

Author: Andres Freund <andres@anarazel.de>
Reviewed-By: Robert Haas <robertmhaas@gmail.com>
Reviewed-By: Thomas Munro <thomas.munro@gmail.com>
Reviewed-By: David Rowley <dgrowleyml@gmail.com>
Discussion: https://postgr.es/m/20200301083601.ews6hz5dduc3w2se@alap3.anarazel.de
2020-08-12 16:03:49 -07:00
Peter Eisentraut 1784f278a6 Replace remaining StrNCpy() by strlcpy()
They are equivalent, except that StrNCpy() zero-fills the entire
destination buffer instead of providing just one trailing zero.  For
all but a tiny number of callers, that's just overhead rather than
being desirable.

Remove StrNCpy() as it is now unused.

In some cases, namestrcpy() is the more appropriate function to use.
While we're here, simplify the API of namestrcpy(): Remove the return
value, don't check for NULL input.  Nothing was using that anyway.
Also, remove a few unused name-related functions.

Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://www.postgresql.org/message-id/flat/44f5e198-36f6-6cdb-7fa9-60e34784daae%402ndquadrant.com
2020-08-10 23:20:37 +02:00
David Rowley 6ee3b5fb99 Use int64 instead of long in incremental sort code
Windows 64bit has 4-byte long values which is not suitable for tracking
disk space usage in the incremental sort code. Let's just make all these
fields int64s.

Author: James Coleman
Discussion: https://postgr.es/m/CAApHDvpky%2BUhof8mryPf5i%3D6e6fib2dxHqBrhp0Qhu0NeBhLJw%40mail.gmail.com
Backpatch-through: 13, where the incremental sort code was added
2020-08-02 14:24:46 +12:00
Amit Kapila c55040ccd0 WAL Log invalidations at command end with wal_level=logical.
When wal_level=logical, write invalidations at command end into WAL so
that decoding can use this information.

This patch is required to allow the streaming of in-progress transactions
in logical decoding.  The actual work to allow streaming will be committed
as a separate patch.

We still add the invalidations to the cache and write them to WAL at
commit time in RecordTransactionCommit(). This uses the existing
XLOG_INVALIDATIONS xlog record type, from the RM_STANDBY_ID resource
manager (see LogStandbyInvalidations for details).

So existing code relying on those invalidations (e.g. redo) does not need
to be changed.

The invalidations written at command end uses a new xlog record type
XLOG_XACT_INVALIDATIONS, from RM_XACT_ID resource manager. See
LogLogicalInvalidations for details.

These new xlog records are ignored by existing redo procedures, which
still rely on the invalidations written to commit records.

The invalidations are decoded and accumulated in top-transaction, and then
executed during replay.  This obviates the need to decode the
invalidations as part of a commit record.

Bump XLOG_PAGE_MAGIC, since this introduces XLOG_XACT_INVALIDATIONS.

Author: Dilip Kumar, Tomas Vondra, Amit Kapila
Reviewed-by: Amit Kapila
Tested-by: Neha Sharma and Mahendra Singh Thalor
Discussion: https://postgr.es/m/688b0b7f-2f6c-d827-c27b-216a8e3ea700@2ndquadrant.com
2020-07-23 08:34:48 +05:30
Tom Lane a57d312a77 Support infinity and -infinity in the numeric data type.
Add infinities that behave the same as they do in the floating-point
data types.  Aside from any intrinsic usefulness these may have,
this closes an important gap in our ability to convert floating
values to numeric and/or replace float-based APIs with numeric.

The new values are represented by bit patterns that were formerly
not used (although old code probably would take them for NaNs).
So there shouldn't be any pg_upgrade hazard.

Patch by me, reviewed by Dean Rasheed and Andrew Gierth

Discussion: https://postgr.es/m/606717.1591924582@sss.pgh.pa.us
2020-07-22 19:19:44 -04:00
Tom Lane 4fb6aeb4f6 Make floating-point "NaN / 0" return NaN instead of raising an error.
This is more consistent with the IEEE 754 spec and our treatment of
NaNs elsewhere; in particular, the case has always acted that way in
"numeric" arithmetic.

Noted by Dean Rasheed.

Discussion: https://postgr.es/m/3421746.1594927785@sss.pgh.pa.us
2020-07-20 19:44:45 -04:00
Fujii Masao d05b172a76 Add generic_plans and custom_plans fields into pg_prepared_statements.
There was no easy way to find how many times generic and custom plans
have been executed for a prepared statement. This commit exposes those
numbers of times in pg_prepared_statements view.

Author: Atsushi Torikoshi, Kyotaro Horiguchi
Reviewed-by: Tatsuro Yamada, Masahiro Ikeda, Fujii Masao
Discussion: https://postgr.es/m/CACZ0uYHZ4M=NZpofH6JuPHeX=__5xcDELF8hT8_2T+R55w4RQw@mail.gmail.com
2020-07-20 11:55:50 +09:00
Michael Paquier 2a10fdc430 Eliminate cache lookup errors in SQL functions for object addresses
When using the following functions, users could see various types of
errors of the type "cache lookup failed for OID XXX" with elog(), that
can only be used for internal errors:
* pg_describe_object()
* pg_identify_object()
* pg_identify_object_as_address()

The set of APIs managing object addresses for all object types are made
smarter by gaining a new argument "missing_ok" that allows any caller to
control if an error is raised or not on an undefined object.  The SQL
functions listed above are changed to handle the case where an object is
missing.

Regression tests are added for all object types for the cases where
these are undefined.  Before this commit, these cases failed with cache
lookup errors, and now they basically return NULL (minus the name of the
object type requested).

Author: Michael Paquier
Reviewed-by: Aleksander Alekseev, Dmitry Dolgov, Daniel Gustafsson,
Álvaro Herrera, Kyotaro Horiguchi
Discussion: https://postgr.es/m/CAB7nPqSZxrSmdHK-rny7z8mi=EAFXJ5J-0RbzDw6aus=wB5azQ@mail.gmail.com
2020-07-15 09:03:10 +09:00
Michael Paquier b8401c32ba Fix some header identifications
The following header files missed the shot:
- jsonfuncs.h, as of ce0425b.
- jsonapi.h, as of beb4699.
- llvmjit_emit.h as of 7ec0d80.
- partdesc.h, as of 1bb5e78.

Author: Jesse Zhang
Discussion: https://postgr.es/m/CAGf+fX4-8xULEOz09DE2dZGjT+q8VJ--rqfTpvcFwc+A4fc-3Q@mail.gmail.com
2020-07-14 13:39:45 +09:00
Andres Freund a9a4a7ad56 code: replace most remaining uses of 'master'.
Author: Andres Freund
Reviewed-By: David Steele
Discussion: https://postgr.es/m/20200615182235.x7lch5n6kcjq4aue@alap3.anarazel.de
2020-07-08 13:24:35 -07:00
Andres Freund 5e7bbb5286 code: replace 'master' with 'primary' where appropriate.
Also changed "in the primary" to "on the primary", and added a few
"the" before "primary".

Author: Andres Freund
Reviewed-By: David Steele
Discussion: https://postgr.es/m/20200615182235.x7lch5n6kcjq4aue@alap3.anarazel.de
2020-07-08 12:57:23 -07:00
Michael Paquier aa38434824 Refactor routines for name lookups of procedures and operators
This introduces a new set of extended routines for procedure and
operator name lookups, with a flag bitmask argument that can modify the
result.  The following options are available:
- Force schema qualification, ignoring search_path.  This is similar to
the existing option for format_{operator|procedure}_qualified().
- Force NULL as result instead of a numeric OID for an undefined
object.  This option is new.

This is a refactoring similar to 1185c78, that will be used for a future
patch to improve the SQL functions providing information using object
addresses for undefined objects.

Author: Michael Paquier
Reviewed-by: Aleksander Alekseev, Dmitry Dolgov, Daniel Gustafsson,
Álvaro Herrera
Discussion: https://postgr.es/m/CAB7nPqSZxrSmdHK-rny7z8mi=EAFXJ5J-0RbzDw6aus=wB5azQ@mail.gmail.com
2020-07-06 13:06:08 +09:00
Michael Paquier 1185c78294 Add new flag to format_type_extended() to get NULL for undefined type
If a type scanned is undefined, type format routines have two behaviors
depending on if FORMAT_TYPE_ALLOW_INVALID is used by the caller or not:
- Issue a cache lookup error
- Return an undefined type name "???", "???[]" or "-"

The current interface is not really helpful for callers willing to
format properly a type name, but still make sure that the type is
defined as there could be types matching the strings generated when
looking for an undefined type, even if that should not be a problem in
practice.  In order to counter that, add a new flag called
FORMAT_TYPE_INVALID_AS_NULL that returns a NULL result instead of "???
or "-" which does not generate an error.  This flag will be used in a
follow-up patch improving the set of SQL functions showing information
for object addresses when it comes to undefined objects.

Author: Michael Paquier
Reviewed-by: Aleksander Alekseev, Dmitry Dolgov, Daniel Gustafsson,
Álvaro Herrera
Discussion: https://postgr.es/m/CAB7nPqSZxrSmdHK-rny7z8mi=EAFXJ5J-0RbzDw6aus=wB5azQ@mail.gmail.com
2020-07-06 12:12:11 +09:00
David Rowley dad75eb4a8 Have pg_itoa, pg_ltoa and pg_lltoa return the length of the string
Core by no means makes excessive use of these functions, but quite a large
number of those usages do require the caller to call strlen() on the
returned string.  This is quite wasteful since these functions do already
have a good idea of the length of the string, so we might as well just
have them return that.

Reviewed-by: Andrew Gierth
Discussion: https://postgr.es/m/CAApHDvrm2A5x2uHYxsqriO2cUaGcFvND%2BksC9e7Tjep0t2RK_A%40mail.gmail.com
2020-06-13 12:32:00 +12:00
David Rowley 9a7fccd9ea Add missing extern keyword for a couple of numutils functions
In passing, also remove a few surplus empty lines from pg_ltoa and
pg_ulltoa_n in numutils.c

Reported-by: Andrew Gierth
Discussion: https://postgr.es/m/87y2ou3xuh.fsf@news-spur.riddles.org.uk
Backpatch-through: 13, where these changes were introduced
2020-06-13 11:27:25 +12:00
Peter Eisentraut b1d32d3e32 Unify drop-by-OID functions
There are a number of Remove${Something}ById() functions that are
essentially identical in structure and only different in which catalog
they are working on.  Refactor this to be one generic function.  The
information about which oid column, index, etc. to use was already
available in ObjectProperty for most catalogs, in a few cases it was
easily added.

Reviewed-by: Pavel Stehule <pavel.stehule@gmail.com>
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/331d9661-1743-857f-1cbb-d5728bcd62cb%402ndquadrant.com
2020-06-09 09:39:46 +02:00
Tom Lane 0c882e52a8 Improve ineq_histogram_selectivity's behavior for non-default orderings.
ineq_histogram_selectivity() can be invoked in situations where the
ordering we care about is not that of the column's histogram.  We could
be considering some other collation, or even more drastically, the
query operator might not agree at all with what was used to construct
the histogram.  (We'll get here for anything using scalarineqsel-based
estimators, so that's quite likely to happen for extension operators.)

Up to now we just ignored this issue and assumed we were dealing with
an operator/collation whose sort order exactly matches the histogram,
possibly resulting in junk estimates if the binary search gets confused.
It's past time to improve that, since the use of nondefault collations
is increasing.  What we can do is verify that the given operator and
collation match what's recorded in pg_statistic, and use the existing
code only if so.  When they don't match, instead execute the operator
against each histogram entry, and take the fraction of successes as our
selectivity estimate.  This gives an estimate that is probably good to
about 1/histogram_size, with no assumptions about ordering.  (The quality
of the estimate is likely to degrade near the ends of the value range,
since the two orderings probably don't agree on what is an extremal value;
but this is surely going to be more reliable than what we did before.)

At some point we might further improve matters by storing more than one
histogram calculated according to different orderings.  But this code
would still be good fallback logic when no matches exist, so that is
not an argument for not doing this.

While here, also improve get_variable_range() to deal more honestly
with non-default collations.

This isn't back-patchable, because it requires adding another argument
to ineq_histogram_selectivity, and because it might have significant
impact on the estimation results for extension operators relying on
scalarineqsel --- mostly for the better, one hopes, but in any case
destabilizing plan choices in back branches is best avoided.

Per investigation of a report from James Lucas.

Discussion: https://postgr.es/m/CAAFmbbOvfi=wMM=3qRsPunBSLb8BFREno2oOzSBS=mzfLPKABw@mail.gmail.com
2020-06-05 16:55:27 -04:00
Tom Lane 044c99bc56 Use query collation, not column's collation, while examining statistics.
Commit 5e0928005 changed the planner so that, instead of blindly using
DEFAULT_COLLATION_OID when invoking operators for selectivity estimation,
it would use the collation of the column whose statistics we're
considering.  This was recognized as still being not quite the right
thing, but it seemed like a good incremental improvement.  However,
shortly thereafter we introduced nondeterministic collations, and that
creates cases where operators can fail if they're passed the wrong
collation.  We don't want planning to fail in cases where the query itself
would work, so this means that we *must* use the query's collation when
invoking operators for estimation purposes.

The only real problem this creates is in ineq_histogram_selectivity, where
the binary search might produce a garbage answer if we perform comparisons
using a different collation than the column's histogram is ordered with.
However, when the query's collation is significantly different from the
column's default collation, the estimate we previously generated would be
pretty irrelevant anyway; so it's not clear that this will result in
noticeably worse estimates in practice.  (A follow-on patch will improve
this situation in HEAD, but it seems too invasive for back-patch.)

The patch requires changing the signatures of mcv_selectivity and allied
functions, which are exported and very possibly are used by extensions.
In HEAD, I just did that, but an API/ABI break of this sort isn't
acceptable in stable branches.  Therefore, in v12 the patch introduces
"mcv_selectivity_ext" and so on, with signatures matching HEAD, and makes
the old functions into wrappers that assume DEFAULT_COLLATION_OID should
be used.  That does not match the prior behavior, but it should avoid risk
of failure in most cases.  (In practice, I think most extension datatypes
aren't collation-aware, so the change probably doesn't matter to them.)

Per report from James Lucas.  Back-patch to v12 where the problem was
introduced.

Discussion: https://postgr.es/m/CAAFmbbOvfi=wMM=3qRsPunBSLb8BFREno2oOzSBS=mzfLPKABw@mail.gmail.com
2020-06-05 16:18:50 -04:00
Tom Lane a9632830bb Reject "23:59:60.nnn" in datetime input.
It's intentional that we don't allow values greater than 24 hours,
while we do allow "24:00:00" as well as "23:59:60" as inputs.
However, the range check was miscoded in such a way that it would
accept "23:59:60.nnn" with a nonzero fraction.  For time or timetz,
the stored result would then be greater than "24:00:00" which would
fail dump/reload, not to mention possibly confusing other operations.

Fix by explicitly calculating the result and making sure it does not
exceed 24 hours.  (This calculation is redundant with what will happen
later in tm2time or tm2timetz.  Maybe someday somebody will find that
annoying enough to justify refactoring to avoid the duplication; but
that seems too invasive for a back-patched bug fix, and the cost is
probably unmeasurable anyway.)

Note that this change also rejects such input as the time portion
of a timestamp(tz) value.

Back-patch to v10.  The bug is far older, but to change this pre-v10
we'd need to ensure that the logic behaves sanely with float timestamps,
which is possibly nontrivial due to roundoff considerations.
Doesn't really seem worth troubling with.

Per report from Christoph Berg.

Discussion: https://postgr.es/m/20200520125807.GB296739@msg.df7cb.de
2020-06-04 16:42:23 -04:00
Tom Lane 5cbfce562f Initial pgindent and pgperltidy run for v13.
Includes some manual cleanup of places that pgindent messed up,
most of which weren't per project style anyway.

Notably, it seems some people didn't absorb the style rules of
commit c9d297751, because there were a bunch of new occurrences
of function calls with a newline just after the left paren, all
with faulty expectations about how the rest of the call would get
indented.
2020-05-14 13:06:50 -04:00
Tom Lane 9d25e1aa31 Clean up cpluspluscheck violation.
"operator" is a reserved word in C++, so per project conventions,
don't use it as an identifier in header files.

My oversight in commit a80818605.
2020-04-21 11:21:15 -04:00
Michael Paquier 8128b0c152 Fix collection of typos and grammar mistakes in the tree, volume 2
This fixes some comments and documentation new as of Postgres 13, and is
a follow-up of the work done in dd0f37e.

Author: Justin Pryzby
Discussion: https://postgr.es/m/20200408165653.GF2228@telsasoft.com
2020-04-14 14:45:43 +09:00
Alexander Korotkov 1aac32df89 Revert 0f5ca02f53
0f5ca02f53 introduces 3 new keywords.  It appears to be too much for relatively
small feature.  Given now we past feature freeze, it's already late for
discussion of the new syntax.  So, revert.

Discussion: https://postgr.es/m/28209.1586294824%40sss.pgh.pa.us
2020-04-08 11:37:27 +03:00
Alexander Korotkov 0f5ca02f53 Implement waiting for given lsn at transaction start
This commit adds following optional clause to BEGIN and START TRANSACTION
commands.

  WAIT FOR LSN lsn [ TIMEOUT timeout ]

New clause pospones transaction start till given lsn is applied on standby.
This clause allows user be sure, that changes previously made on primary would
be visible on standby.

New shared memory struct is used to track awaited lsn per backend.  Recovery
process wakes up backend once required lsn is applied.

Author: Ivan Kartyshov, Anna Akenteva
Reviewed-by: Craig Ringer, Thomas Munro, Robert Haas, Kyotaro Horiguchi
Reviewed-by: Masahiko Sawada, Ants Aasma, Dmitry Ivanov, Simon Riggs
Reviewed-by: Amit Kapila, Alexander Korotkov
Discussion: https://postgr.es/m/0240c26c-9f84-30ea-fca9-93ab2df5f305%40postgrespro.ru
2020-04-07 23:51:10 +03:00
Tom Lane 26a944cf29 Adjust bytea get_bit/set_bit to use int8 not int4 for bit numbering.
Since the existing bit number argument can't exceed INT32_MAX, it's
not possible for these functions to manipulate bits beyond the first
256MB of a bytea value.  Lift that restriction by redeclaring the
bit number arguments as int8 (which requires a catversion bump,
hence is not back-patchable).

The similarly-named functions for bit/varbit don't really have a
problem because we restrict those types to at most VARBITMAXLEN bits;
hence leave them alone.

While here, extend the encode/decode functions in utils/adt/encode.c
to allow dealing with values wider than 1GB.  This is not a live bug
or restriction in current usage, because no input could be more than
1GB, and since none of the encoders can expand a string more than 4X,
the result size couldn't overflow uint32.  But it might be desirable
to support more in future, so make the input length values size_t
and the potential-output-length values uint64.

Also add some test cases to improve the miserable code coverage
of these functions.

Movead Li, editorialized some by me; also reviewed by Ashutosh Bapat

Discussion: https://postgr.es/m/20200312115135445367128@highgo.ca
2020-04-07 15:57:58 -04:00
Tom Lane c7654f6a37 Fix representation of SORT_TYPE_STILL_IN_PROGRESS.
It turns out that the code did indeed rely on a zeroed
TuplesortInstrumentation.sortMethod field to indicate
"this worker never did anything", although it seems the
issue only comes up during certain race-condition-y cases.

Hence, rearrange the TuplesortMethod enum to restore
SORT_TYPE_STILL_IN_PROGRESS to having the value zero,
and add some comments reinforcing that that isn't optional.

Also future-proof a loop over the possible values of the enum.
sizeof(bits32) happened to be the correct limit value,
but only by purest coincidence.

Per buildfarm and local investigation.

Discussion: https://postgr.es/m/12222.1586223974@sss.pgh.pa.us
2020-04-06 22:22:13 -04:00
Thomas Munro aeec457de8 Add SQL type xid8 to expose FullTransactionId to users.
Similar to xid, but 64 bits wide.  This new type is suitable for use in
various system views and administration functions.

Reviewed-by: Fujii Masao <masao.fujii@oss.nttdata.com>
Reviewed-by: Takao Fujii <btfujiitkp@oss.nttdata.com>
Reviewed-by: Yoshikazu Imai <imai.yoshikazu@fujitsu.com>
Reviewed-by: Mark Dilger <mark.dilger@enterprisedb.com>
Discussion: https://postgr.es/m/20190725000636.666m5mad25wfbrri%40alap3.anarazel.de
2020-04-07 12:03:59 +12:00
Tomas Vondra d2d8a229bc Implement Incremental Sort
Incremental Sort is an optimized variant of multikey sort for cases when
the input is already sorted by a prefix of the requested sort keys. For
example when the relation is already sorted by (key1, key2) and we need
to sort it by (key1, key2, key3) we can simply split the input rows into
groups having equal values in (key1, key2), and only sort/compare the
remaining column key3.

This has a number of benefits:

- Reduced memory consumption, because only a single group (determined by
  values in the sorted prefix) needs to be kept in memory. This may also
  eliminate the need to spill to disk.

- Lower startup cost, because Incremental Sort produce results after each
  prefix group, which is beneficial for plans where startup cost matters
  (like for example queries with LIMIT clause).

We consider both Sort and Incremental Sort, and decide based on costing.

The implemented algorithm operates in two different modes:

- Fetching a minimum number of tuples without check of equality on the
  prefix keys, and sorting on all columns when safe.

- Fetching all tuples for a single prefix group and then sorting by
  comparing only the remaining (non-prefix) keys.

We always start in the first mode, and employ a heuristic to switch into
the second mode if we believe it's beneficial - the goal is to minimize
the number of unnecessary comparions while keeping memory consumption
below work_mem.

This is a very old patch series. The idea was originally proposed by
Alexander Korotkov back in 2013, and then revived in 2017. In 2018 the
patch was taken over by James Coleman, who wrote and rewrote most of the
current code.

There were many reviewers/contributors since 2013 - I've done my best to
pick the most active ones, and listed them in this commit message.

Author: James Coleman, Alexander Korotkov
Reviewed-by: Tomas Vondra, Andreas Karlsson, Marti Raudsepp, Peter Geoghegan, Robert Haas, Thomas Munro, Antonin Houska, Andres Freund, Alexander Kuzmenkov
Discussion: https://postgr.es/m/CAPpHfdscOX5an71nHd8WSUH6GNOCf=V7wgDaTXdDd9=goN-gfA@mail.gmail.com
Discussion: https://postgr.es/m/CAPpHfds1waRZ=NOmueYq0sx1ZSCnt+5QJvizT8ndT2=etZEeAQ@mail.gmail.com
2020-04-06 21:35:10 +02:00
Michael Paquier a40caf5f86 Preserve clustered index after rewrites with ALTER TABLE
A table rewritten by ALTER TABLE would lose tracking of an index usable
for CLUSTER.  This setting is tracked by pg_index.indisclustered and is
controlled by ALTER TABLE, so some extra work was needed to restore it
properly.  Note that ALTER TABLE only marks the index that can be used
for clustering, and does not do the actual operation.

Author: Amit Langote, Justin Pryzby
Reviewed-by: Ibrar Ahmed, Michael Paquier
Discussion: https://postgr.es/m/20200202161718.GI13621@telsasoft.com
Backpatch-through: 9.5
2020-04-06 11:03:49 +09:00
Noah Misch c6b92041d3 Skip WAL for new relfilenodes, under wal_level=minimal.
Until now, only selected bulk operations (e.g. COPY) did this.  If a
given relfilenode received both a WAL-skipping COPY and a WAL-logged
operation (e.g. INSERT), recovery could lose tuples from the COPY.  See
src/backend/access/transam/README section "Skipping WAL for New
RelFileNode" for the new coding rules.  Maintainers of table access
methods should examine that section.

To maintain data durability, just before commit, we choose between an
fsync of the relfilenode and copying its contents to WAL.  A new GUC,
wal_skip_threshold, guides that choice.  If this change slows a workload
that creates small, permanent relfilenodes under wal_level=minimal, try
adjusting wal_skip_threshold.  Users setting a timeout on COMMIT may
need to adjust that timeout, and log_min_duration_statement analysis
will reflect time consumption moving to COMMIT from commands like COPY.

Internally, this requires a reliable determination of whether
RollbackAndReleaseCurrentSubTransaction() would unlink a relation's
current relfilenode.  Introduce rd_firstRelfilenodeSubid.  Amend the
specification of rd_createSubid such that the field is zero when a new
rel has an old rd_node.  Make relcache.c retain entries for certain
dropped relations until end of transaction.

Bump XLOG_PAGE_MAGIC, since this introduces XLOG_GIST_ASSIGN_LSN.
Future servers accept older WAL, so this bump is discretionary.

Kyotaro Horiguchi, reviewed (in earlier, similar versions) by Robert
Haas.  Heikki Linnakangas and Michael Paquier implemented earlier
designs that materially clarified the problem.  Reviewed, in earlier
designs, by Andrew Dunstan, Andres Freund, Alvaro Herrera, Tom Lane,
Fujii Masao, and Simon Riggs.  Reported by Martijn van Oosterhout.

Discussion: https://postgr.es/m/20150702220524.GA9392@svana.org
2020-04-04 12:25:34 -07:00
Tom Lane 0b34e7d307 Improve user control over truncation of logged bind-parameter values.
This patch replaces the boolean GUC log_parameters_on_error introduced
by commit ba79cb5dc with an integer log_parameter_max_length_on_error,
adding the ability to specify how many bytes to trim each logged
parameter value to.  (The previous coding hard-wired that choice at
64 bytes.)

In addition, add a new parameter log_parameter_max_length that provides
similar control over truncation of query parameters that are logged in
response to statement-logging options, as opposed to errors.  Previous
releases always logged such parameters in full, possibly causing log
bloat.

For backwards compatibility with prior releases,
log_parameter_max_length defaults to -1 (log in full), while
log_parameter_max_length_on_error defaults to 0 (no logging).

Per discussion, log_parameter_max_length is SUSET since the DBA should
control routine logging behavior, but log_parameter_max_length_on_error
is USERSET because it also affects errcontext data sent back to the
client.

Alexey Bashtanov, editorialized a little by me

Discussion: https://postgr.es/m/b10493cc-a399-a03a-67c7-068f2791ee50@imap.cc
2020-04-02 15:04:51 -04:00
Tom Lane a80818605e Improve selectivity estimation for assorted match-style operators.
Quite a few matching operators such as JSONB's @> used "contsel" and
"contjoinsel" as their selectivity estimators.  That was a bad idea,
because (a) contsel is only a stub, yielding a fixed default estimate,
and (b) that default is 0.001, meaning we estimate these operators as
five times more selective than equality, which is surely pretty silly.

There's a good model for improving this in ltree's ltreeparentsel():
for any "var OP constant" query, we can try applying the operator
to all of the column's MCV and histogram values, taking the latter
as being a random sample of the non-MCV values.  That code is
actually 100% generic, except for the question of exactly what
default selectivity ought to be plugged in when we don't have stats.

Hence, migrate the guts of ltreeparentsel() into the core code, provide
wrappers "matchingsel" and "matchingjoinsel" with a more-appropriate
default estimate, and use those for the non-geometric operators that
formerly used contsel (mostly JSONB containment operators and tsquery
matching).

Also apply this code to some match-like operators in hstore, ltree, and
pg_trgm, including the former users of ltreeparentsel as well as ones
that improperly used contsel.  Since commit 911e70207 just created new
versions of those extensions that we haven't released yet, we can sneak
this change into those new versions instead of having to create an
additional generation of update scripts.

Patch by me, reviewed by Alexey Bashtanov

Discussion: https://postgr.es/m/12237.1582833074@sss.pgh.pa.us
2020-04-01 10:32:33 -04:00
Alexander Korotkov 911e702077 Implement operator class parameters
PostgreSQL provides set of template index access methods, where opclasses have
much freedom in the semantics of indexing.  These index AMs are GiST, GIN,
SP-GiST and BRIN.  There opclasses define representation of keys, operations on
them and supported search strategies.  So, it's natural that opclasses may be
faced some tradeoffs, which require user-side decision.  This commit implements
opclass parameters allowing users to set some values, which tell opclass how to
index the particular dataset.

This commit doesn't introduce new storage in system catalog.  Instead it uses
pg_attribute.attoptions, which is used for table column storage options but
unused for index attributes.

In order to evade changing signature of each opclass support function, we
implement unified way to pass options to opclass support functions.  Options
are set to fn_expr as the constant bytea expression.  It's possible due to the
fact that opclass support functions are executed outside of expressions, so
fn_expr is unused for them.

This commit comes with some examples of opclass options usage.  We parametrize
signature length in GiST.  That applies to multiple opclasses: tsvector_ops,
gist__intbig_ops, gist_ltree_ops, gist__ltree_ops, gist_trgm_ops and
gist_hstore_ops.  Also we parametrize maximum number of integer ranges for
gist__int_ops.  However, the main future usage of this feature is expected
to be json, where users would be able to specify which way to index particular
json parts.

Catversion is bumped.

Discussion: https://postgr.es/m/d22c3a18-31c7-1879-fc11-4c1ce2f5e5af%40postgrespro.ru
Author: Nikita Glukhov, revised by me
Reviwed-by: Nikolay Shaplov, Robert Haas, Tom Lane, Tomas Vondra, Alvaro Herrera
2020-03-30 19:17:23 +03:00
David Rowley b07642dbcd Trigger autovacuum based on number of INSERTs
Traditionally autovacuum has only ever invoked a worker based on the
estimated number of dead tuples in a table and for anti-wraparound
purposes. For the latter, with certain classes of tables such as
insert-only tables, anti-wraparound vacuums could be the first vacuum that
the table ever receives. This could often lead to autovacuum workers being
busy for extended periods of time due to having to potentially freeze
every page in the table. This could be particularly bad for very large
tables. New clusters, or recently pg_restored clusters could suffer even
more as many large tables may have the same relfrozenxid, which could
result in large numbers of tables requiring an anti-wraparound vacuum all
at once.

Here we aim to reduce the work required by anti-wraparound and aggressive
vacuums in general, by triggering autovacuum when the table has received
enough INSERTs. This is controlled by adding two new GUCs and reloptions;
autovacuum_vacuum_insert_threshold and
autovacuum_vacuum_insert_scale_factor. These work exactly the same as the
existing scale factor and threshold controls, only base themselves off the
number of inserts since the last vacuum, rather than the number of dead
tuples. New controls were added rather than reusing the existing
controls, to allow these new vacuums to be tuned independently and perhaps
even completely disabled altogether, which can be done by setting
autovacuum_vacuum_insert_threshold to -1.

We make no attempt to skip index cleanup operations on these vacuums as
they may trigger for an insert-mostly table which continually doesn't have
enough dead tuples to trigger an autovacuum for the purpose of removing
those dead tuples. If we were to skip cleaning the indexes in this case,
then it is possible for the index(es) to become bloated over time.

There are additional benefits to triggering autovacuums based on inserts,
as tables which never contain enough dead tuples to trigger an autovacuum
are now more likely to receive a vacuum, which can mark more of the table
as "allvisible" and encourage the query planner to make use of Index Only
Scans.

Currently, we still obey vacuum_freeze_min_age when triggering these new
autovacuums based on INSERTs. For large insert-only tables, it may be
beneficial to lower the table's autovacuum_freeze_min_age so that tuples
are eligible to be frozen sooner. Here we've opted not to zero that for
these types of vacuums, since the table may just be insert-mostly and we
may otherwise freeze tuples that are still destined to be updated or
removed in the near future.

There was some debate to what exactly the new scale factor and threshold
should default to. For now, these are set to 0.2 and 1000, respectively.
There may be some motivation to adjust these before the release.

Author: Laurenz Albe, Darafei Praliaskouski
Reviewed-by: Alvaro Herrera, Masahiko Sawada, Chris Travers, Andres Freund, Justin Pryzby
Discussion: https://postgr.es/m/CAC8Q8t%2Bj36G_bLF%3D%2B0iMo6jGNWnLnWb1tujXuJr-%2Bx8ZCCTqoQ%40mail.gmail.com
2020-03-28 19:20:12 +13:00
Tom Lane fbc7a71608 Rearrange validity checks for plpgsql "simple" expressions.
Buildfarm experience shows what probably should've occurred to me before:
if a cache flush occurs partway through building a generic plan, then
the plansource may have is_valid = false even though the plan is valid.
We need to accept this case, use the generated plan, and then try to
replan the next time.  We can't try to replan immediately, because that
would produce an infinite loop in CLOBBER_CACHE_ALWAYS builds; moreover
it's really overkill.  (We can assume that the plan is valid, it's just
possibly a bit stale.  Note that the pre-existing code behaved this way,
and the non-simple-expression code paths do too.)  Conversely, not using
the generated plan would drop us into the not-a-simple-expression code
path, which is bad for performance and would also cause regression-test
failures due to visibly different error-reporting behavior.

Hence, refactor the validity-check functions so that the initial check
and recheck cases can react differently to plansource->is_valid.
This makes their usage a bit simpler, too.

Discussion: https://postgr.es/m/7072.1585332104@sss.pgh.pa.us
2020-03-27 14:47:34 -04:00
Tom Lane 8f59f6b9c0 Improve performance of "simple expressions" in PL/pgSQL.
For relatively simple expressions (say, "x + 1" or "x > 0"), plpgsql's
management overhead exceeds the cost of evaluating the expression.
This patch substantially improves that situation, providing roughly
2X speedup for such trivial expressions.

First, add infrastructure in the plancache to allow fast re-validation
of cached plans that contain no table access, and hence need no locks.
Teach plpgsql to use this infrastructure for expressions that it's
already deemed "simple" (which in particular will never contain table
references).

The fast path still requires checking that search_path hasn't changed,
so provide a fast path for OverrideSearchPathMatchesCurrent by
counting changes that have occurred to the active search path in the
current session.  This is simplistic but seems enough for now, seeing
that PushOverrideSearchPath is not used in any performance-critical
cases.

Second, manage the refcounts on simple expressions' cached plans using
a transaction-lifespan resource owner, so that we only need to take
and release an expression's refcount once per transaction not once per
expression evaluation.  The management of this resource owner exactly
parallels the existing management of plpgsql's simple-expression EState.

Add some regression tests covering this area, in particular verifying
that expression caching doesn't break semantics for search_path changes.

Patch by me, but it owes something to previous work by Amit Langote,
who recognized that getting rid of plancache-related overhead would
be a useful thing to do here.  Also thanks to Andres Freund for review.

Discussion: https://postgr.es/m/CAFj8pRDRVfLdAxsWeVLzCAbkLFZhW549K+67tpOc-faC8uH8zw@mail.gmail.com
2020-03-26 18:58:57 -04:00
Tom Lane bda6dedbea Go back to returning int from ereport auxiliary functions.
This reverts the parts of commit 17a28b0364
that changed ereport's auxiliary functions from returning dummy integer
values to returning void.  It turns out that a minority of compilers
complain (not entirely unreasonably) about constructs such as

	(condition) ? errdetail(...) : 0

if errdetail() returns void rather than int.  We could update those
call sites to say "(void) 0" perhaps, but the expectation for this
patch set was that ereport callers would not have to change anything.
And this aspect of the patch set was already the most invasive and
least compelling part of it, so let's just drop it.

Per buildfarm.

Discussion: https://postgr.es/m/CA+fd4k6N8EjNvZpM8nme+y+05mz-SM8Z_BgkixzkA34R+ej0Kw@mail.gmail.com
2020-03-25 11:57:36 -04:00
Tom Lane 17a28b0364 Improve the internal implementation of ereport().
Change all the auxiliary error-reporting routines to return void,
now that we no longer need to pretend they are passing something
useful to errfinish().  While this probably doesn't save anything
significant at the machine-code level, it allows detection of some
additional types of mistakes.

Pass the error location details (__FILE__, __LINE__, PG_FUNCNAME_MACRO)
to errfinish not errstart.  This shaves a few cycles off the case where
errstart decides we're not going to emit anything.

Re-implement elog() as a trivial wrapper around ereport(), removing
the separate support infrastructure it used to have.  Aside from
getting rid of some now-surplus code, this means that elog() now
really does have exactly the same semantics as ereport(), in particular
that it can skip evaluation work if the message is not to be emitted.

Andres Freund and Tom Lane

Discussion: https://postgr.es/m/CA+fd4k6N8EjNvZpM8nme+y+05mz-SM8Z_BgkixzkA34R+ej0Kw@mail.gmail.com
2020-03-24 12:08:48 -04:00
Tom Lane e3a87b4991 Re-implement the ereport() macro using __VA_ARGS__.
Now that we require C99, we can depend on __VA_ARGS__ to work, and
revising ereport() to use it has several significant benefits:

* The extra parentheses around the auxiliary function calls are now
optional.  Aside from being a bit less ugly, this removes a common
gotcha for new contributors, because in some cases the compiler errors
you got from forgetting them were unintelligible.

* The auxiliary function calls are now evaluated as a comma expression
list rather than as extra arguments to errfinish().  This means that
compilers can be expected to warn about no-op expressions in the list,
allowing detection of several other common mistakes such as forgetting
to add errmsg(...) when converting an elog() call to ereport().

* Unlike the situation with extra function arguments, comma expressions
are guaranteed to be evaluated left-to-right, so this removes platform
dependency in the order of the auxiliary function calls.  While that
dependency hasn't caused us big problems in the past, this change does
allow dropping some rather shaky assumptions around errcontext() domain
handling.

There's no intention to make wholesale changes of existing ereport
calls, but as proof-of-concept this patch removes the extra parens
from a couple of calls in postgres.c.

While new code can be written either way, code intended to be
back-patched will need to use extra parens for awhile yet.  It seems
worth back-patching this change into v12, so as to reduce the window
where we have to be careful about that by one year.  Hence, this patch
is careful to preserve ABI compatibility; a followup HEAD-only patch
will make some additional simplifications.

Andres Freund and Tom Lane

Discussion: https://postgr.es/m/CA+fd4k6N8EjNvZpM8nme+y+05mz-SM8Z_BgkixzkA34R+ej0Kw@mail.gmail.com
2020-03-24 11:49:00 -04:00
Noah Misch de9396326e Revert "Skip WAL for new relfilenodes, under wal_level=minimal."
This reverts commit cb2fd7eac2.  Per
numerous buildfarm members, it was incompatible with parallel query, and
a test case assumed LP64.  Back-patch to 9.5 (all supported versions).

Discussion: https://postgr.es/m/20200321224920.GB1763544@rfd.leadboat.com
2020-03-22 09:24:09 -07:00
Tom Lane d0587f52b3 Fix up recent breakage of headerscheck and cpluspluscheck.
headerscheck and cpluspluscheck should skip the recently-added
cmdtaglist.h header, since (like kwlist.h and some other similarly-
designed headers) it's not meant to be included standalone.

evtcache.h was missing an #include to support its usage of Bitmapset.

typecmds.h was missing an #include to support its usage of ParseState.

The first two of these were evidently oversights in commit 2f9661311.
I didn't track down exactly which change broke typecmds.h, but it
must have been some rearrangement in one of its existing inclusions,
because it's referenced ParseState for quite a long time and there
were not complaints from these checking programs before.
2020-03-21 18:28:44 -04:00
Noah Misch cb2fd7eac2 Skip WAL for new relfilenodes, under wal_level=minimal.
Until now, only selected bulk operations (e.g. COPY) did this.  If a
given relfilenode received both a WAL-skipping COPY and a WAL-logged
operation (e.g. INSERT), recovery could lose tuples from the COPY.  See
src/backend/access/transam/README section "Skipping WAL for New
RelFileNode" for the new coding rules.  Maintainers of table access
methods should examine that section.

To maintain data durability, just before commit, we choose between an
fsync of the relfilenode and copying its contents to WAL.  A new GUC,
wal_skip_threshold, guides that choice.  If this change slows a workload
that creates small, permanent relfilenodes under wal_level=minimal, try
adjusting wal_skip_threshold.  Users setting a timeout on COMMIT may
need to adjust that timeout, and log_min_duration_statement analysis
will reflect time consumption moving to COMMIT from commands like COPY.

Internally, this requires a reliable determination of whether
RollbackAndReleaseCurrentSubTransaction() would unlink a relation's
current relfilenode.  Introduce rd_firstRelfilenodeSubid.  Amend the
specification of rd_createSubid such that the field is zero when a new
rel has an old rd_node.  Make relcache.c retain entries for certain
dropped relations until end of transaction.

Back-patch to 9.5 (all supported versions).  This introduces a new WAL
record type, XLOG_GIST_ASSIGN_LSN, without bumping XLOG_PAGE_MAGIC.  As
always, update standby systems before master systems.  This changes
sizeof(RelationData) and sizeof(IndexStmt), breaking binary
compatibility for affected extensions.  (The most recent commit to
affect the same class of extensions was
089e4d405d0f3b94c74a2c6a54357a84a681754b.)

Kyotaro Horiguchi, reviewed (in earlier, similar versions) by Robert
Haas.  Heikki Linnakangas and Michael Paquier implemented earlier
designs that materially clarified the problem.  Reviewed, in earlier
designs, by Andrew Dunstan, Andres Freund, Alvaro Herrera, Tom Lane,
Fujii Masao, and Simon Riggs.  Reported by Martijn van Oosterhout.

Discussion: https://postgr.es/m/20150702220524.GA9392@svana.org
2020-03-21 09:38:26 -07:00
Noah Misch d60ef94d76 Fix cosmetic blemishes involving rd_createSubid.
Remove an obsolete comment from AtEOXact_cleanup().  Restore formatting
of a comment in struct RelationData, mangled by the pgindent run in
commit 9af4159fce.  Back-patch to 9.5 (all
supported versions), because another fix stacks on this.
2020-03-21 09:38:26 -07:00
Alvaro Herrera 487e9861d0
Enable BEFORE row-level triggers for partitioned tables
... with the limitation that the tuple must remain in the same
partition.

Reviewed-by: Ashutosh Bapat
Discussion: https://postgr.es/m/20200227165158.GA2071@alvherre.pgsql
2020-03-18 18:58:05 -03:00
Thomas Munro fc34b0d9de Introduce a maintenance_io_concurrency setting.
Introduce a GUC and a tablespace option to control I/O prefetching, much
like effective_io_concurrency, but for work that is done on behalf of
many client sessions.

Use the new setting in heapam.c instead of the hard-coded formula
effective_io_concurrency + 10 introduced by commit 558a9165e0.  Go with
a default value of 10 for now, because it's a round number pretty close
to the value used for that existing case.

Discussion: https://postgr.es/m/CA%2BhUKGJUw08dPs_3EUcdO6M90GnjofPYrWp4YSLaBkgYwS-AqA%40mail.gmail.com
2020-03-16 17:14:26 +13:00
Peter Eisentraut 1cc9c2412c Preserve replica identity index across ALTER TABLE rewrite
If an index was explicitly set as replica identity index, this setting
was lost when a table was rewritten by ALTER TABLE.  Because this
setting is part of pg_index but actually controlled by ALTER
TABLE (not part of CREATE INDEX, say), we have to do some extra work
to restore it.

Based-on-patch-by: Quan Zongliang <quanzongliang@gmail.com>
Reviewed-by: Euler Taveira <euler.taveira@2ndquadrant.com>
Discussion: https://www.postgresql.org/message-id/flat/c70fcab2-4866-0d9f-1d01-e75e189db342@gmail.com
2020-03-13 11:57:06 +01:00
Peter Eisentraut bf68b79e50 Refactor ps_status.c API
The init_ps_display() arguments were mostly lies by now, so to match
typical usage, just use one argument and let the caller assemble it
from multiple sources if necessary.  The only user of the additional
arguments is BackendInitialize(), which was already doing string
assembly on the caller side anyway.

Remove the second argument of set_ps_display() ("force") and just
handle that in init_ps_display() internally.

BackendInitialize() also used to set the initial status as
"authentication", but that was very far from where authentication
actually happened.  So now it's set to "initializing" and then
"authentication" just before the actual call to
ClientAuthentication().

Reviewed-by: Julien Rouhaud <rjuju123@gmail.com>
Reviewed-by: Kuntal Ghosh <kuntalghosh.2007@gmail.com>
Reviewed-by: Alvaro Herrera <alvherre@2ndquadrant.com>
Discussion: https://www.postgresql.org/message-id/flat/c65e5196-4f04-4ead-9353-6088c19615a3@2ndquadrant.com
2020-03-11 16:38:31 +01:00
Alvaro Herrera 40b3e2c201
Split out CreateCast into src/backend/catalog/pg_cast.c
This catalog-handling code was previously together with the rest of
CastCreate() in src/backend/commands/functioncmds.c.  A future patch
will need a way to add casts internally, so this will be useful to have
separate.

Also, move the nearby get_cast_oid() function from functioncmds.c to
lsyscache.c, which seems a more natural place for it.

Author: Paul Jungwirth, minor edits by Álvaro
Discussion: https://postgr.es/m/20200309210003.GA19992@alvherre.pgsql
2020-03-10 11:28:23 -03:00
Michael Paquier 61d7c7bce3 Prevent reindex of invalid indexes on TOAST tables
Such indexes can only be duplicated leftovers of a previously failed
REINDEX CONCURRENTLY command, and a valid equivalent is guaranteed to
exist.  As toast indexes can only be dropped if invalid, reindexing
these would lead to useless duplicated indexes that can't be dropped
anymore, except if the parent relation is dropped.

Thanks to Justin Pryzby for reminding that this problem was reported
long ago during the review of the original patch of REINDEX
CONCURRENTLY, but the issue was never addressed.

Reported-by: Sergei Kornilov, Justin Pryzby
Author: Julien Rouhaud
Reviewed-by: Michael Paquier
Discussion: https://postgr.es/m/36712441546604286%40sas1-890ba5c2334a.qloud-c.yandex.net
Discussion: https://postgr.es/m/20200216190835.GA21832@telsasoft.com
Backpatch-through: 12
2020-03-10 15:38:17 +09:00
Jeff Davis 24d85952a5 Introduce LogicalTapeSetExtend().
Increases the number of tapes in a logical tape set. This will be
important for disk-based hash aggregation, because the maximum number
of tapes is not known ahead of time.

While discussing this change, it was observed to regress the
performance of Sort for at least one test case. The performance
regression was because some versions of GCC switch to an inlined
version of memcpy() in LogicalTapeWrite() after this change. No
performance regression for clang was observed.

Because the regression is due to an arbitrary decision by the
compiler, I decided it shouldn't hold up this change. If it needs to
be fixed, we can find a workaround.

Author: Adam Lee, Jeff Davis
Discussion: https://postgr.es/m/e54bfec11c59689890f277722aaaabd05f78e22c.camel%40j-davis.com
2020-03-09 10:40:02 -07:00
Tom Lane fe30e7ebfa Allow ALTER TYPE to change some properties of a base type.
Specifically, this patch allows ALTER TYPE to:
* Change the default TOAST strategy for a toastable base type;
* Promote a non-toastable type to toastable;
* Add/remove binary I/O functions for a type;
* Add/remove typmod I/O functions for a type;
* Add/remove a custom ANALYZE statistics functions for a type.

The first of these can be done by the type's owner; all the others
require superuser privilege since misuse could cause problems.

The main motivation for this patch is to allow extensions to
upgrade the feature sets of their data types, so the set of
alterable properties is biased towards that use-case.  However
it's also true that changing some other properties would be
a lot harder, as they get baked into physical storage and/or
stored expressions that depend on the type.

Along the way, refactor GenerateTypeDependencies() to make it easier
to call, refactor DefineType's volatility checks so they can be shared
by AlterType, and teach typcache.c that it might have to reload data
from the type's pg_type row, a scenario it never handled before.
Also rearrange alter_type.sgml a bit for clarity (put the
composite-type operations together).

Tomas Vondra and Tom Lane

Discussion: https://postgr.es/m/20200228004440.b23ein4qvmxnlpht@development
2020-03-06 12:19:29 -05:00
Tom Lane 3ed2005ff5 Introduce macros for typalign and typstorage constants.
Our usual practice for "poor man's enum" catalog columns is to define
macros for the possible values and use those, not literal constants,
in C code.  But for some reason lost in the mists of time, this was
never done for typalign/attalign or typstorage/attstorage.  It's never
too late to make it better though, so let's do that.

The reason I got interested in this right now is the need to duplicate
some uses of the TYPSTORAGE constants in an upcoming ALTER TYPE patch.
But in general, this sort of change aids greppability and readability,
so it's a good idea even without any specific motivation.

I may have missed a few places that could be converted, and it's even
more likely that pending patches will re-introduce some hard-coded
references.  But that's not fatal --- there's no expectation that
we'd actually change any of these values.  We can clean up stragglers
over time.

Discussion: https://postgr.es/m/16457.1583189537@sss.pgh.pa.us
2020-03-04 10:34:25 -05:00
Tom Lane d677550493 Allow to_date/to_timestamp to recognize non-English month/day names.
to_char() has long allowed the TM (translation mode) prefix to
specify output of translated month or day names; but that prefix
had no effect in input format strings.  Now it does.  to_date()
and to_timestamp() will now recognize the same month or day names
that to_char() would output for the same format code.  Matching is
case-insensitive (per the active collation's notion of what that
means), just as it has always been for English month/day names
without the TM prefix.

(As per the discussion thread, there are lots of cases that this
feature will not handle, such as alternate day names.  But being
able to accept what to_char() will output seems useful enough.)

In passing, fix some shaky English and violations of message
style guidelines in jsonpath errors for the .datetime() method,
which depends on this code.

Juan José Santamaría Flecha, reviewed and modified by me,
with other commentary from Alvaro Herrera, Tomas Vondra,
Arthur Zakirov, Peter Eisentraut, Mark Dilger.

Discussion: https://postgr.es/m/CAC+AXB3u1jTngJcoC1nAHBf=M3v-jrEfo86UFtCqCjzbWS9QhA@mail.gmail.com
2020-03-03 11:06:47 -05:00
Alvaro Herrera 2f9661311b
Represent command completion tags as structs
The backend was using strings to represent command tags and doing string
comparisons in multiple places, but that's slow and unhelpful.  Create a
new command list with a supporting structure to use instead; this is
stored in a tag-list-file that can be tailored to specific purposes with
a caller-definable C macro, similar to what we do for WAL resource
managers.  The first first such uses are a new CommandTag enum and a
CommandTagBehavior struct.

Replace numerous occurrences of char *completionTag with a
QueryCompletion struct so that the code no longer stores information
about completed queries in a cstring.  Only at the last moment, in
EndCommand(), does this get converted to a string.

EventTriggerCacheItem no longer holds an array of palloc’d tag strings
in sorted order, but rather just a Bitmapset over the CommandTags.

Author: Mark Dilger, with unsolicited help from Álvaro Herrera
Reviewed-by: John Naylor, Tom Lane
Discussion: https://postgr.es/m/981A9DB4-3F0C-4DA5-88AD-CB9CFF4D6CAD@enterprisedb.com
2020-03-02 18:19:51 -03:00
Robert Haas 05d8449e73 Move src/backend/utils/hash/hashfn.c to src/common
This also involves renaming src/include/utils/hashutils.h, which
becomes src/include/common/hashfn.h. Perhaps an argument can be
made for keeping the hashutils.h name, but it seemed more
consistent to make it match the name of the file, and also more
descriptive of what is actually going on here.

Patch by me, reviewed by Suraj Kharage and Mark Dilger. Off-list
advice on how not to break the Windows build from Davinder Singh
and Amit Kapila.

Discussion: http://postgr.es/m/CA+TgmoaRiG4TXND8QuM6JXFRkM_1wL2ZNhzaUKsuec9-4yrkgw@mail.gmail.com
2020-02-27 09:25:41 +05:30
Robert Haas a91e2fa941 Adapt hashfn.c and hashutils.h for frontend use.
hash_any() and its various variants are defined to return Datum,
which is a backend-only concept, but the underlying functions
actually want to return uint32 and uint64, and only return Datum
because it's convenient for callers who are using them to
implement a hash function for some SQL datatype.

However, changing these functions to return uint32 and uint64
seems like it might lead to programming errors or back-patching
difficulties, both because they are widely used and because
failure to use UInt{32,64}GetDatum() might not provoke a
compilation error. Instead, rename the existing functions as
well as changing the return type, and add static inline wrappers
for those callers that need the previous behavior.

Although this commit adapts hashutils.h and hashfn.c so that they
can be compiled as frontend code, it does not actually do
anything that would cause them to be so compiled. That is left
for another commit.

Patch by me, reviewed by Suraj Kharage and Mark Dilger.

Discussion: http://postgr.es/m/CA+TgmoaRiG4TXND8QuM6JXFRkM_1wL2ZNhzaUKsuec9-4yrkgw@mail.gmail.com
2020-02-24 17:27:15 +05:30
Robert Haas 9341c783cc Put all the prototypes for hashfn.c into the same header file.
Previously, some of the prototypes for functions in hashfn.c were
in utils/hashutils.h and others were in utils/hsearch.h, but that
is confusing and has no particular benefit.

Patch by me, reviewed by Suraj Kharage and Mark Dilger.

Discussion: http://postgr.es/m/CA+TgmoaRiG4TXND8QuM6JXFRkM_1wL2ZNhzaUKsuec9-4yrkgw@mail.gmail.com
2020-02-24 17:22:45 +05:30
Robert Haas 07b95c3d83 Move bitmap_hash and bitmap_match to bitmapset.c.
The closely-related function bms_hash_value is already defined in that
file, and this change means that hashfn.c no longer needs to depend on
nodes/bitmapset.h. That gets us closer to allowing use of the hash
functions in hashfn.c in frontend code.

Patch by me, reviewed by Suraj Kharage and Mark Dilger.

Discussion: http://postgr.es/m/CA+TgmoaRiG4TXND8QuM6JXFRkM_1wL2ZNhzaUKsuec9-4yrkgw@mail.gmail.com
2020-02-24 17:17:43 +05:30
Tom Lane 607f8ce74d Avoid a performance regression in float overflow/underflow detection.
Commit 6bf0bc842 replaced float.c's CHECKFLOATVAL() macro with static
inline subroutines, but that wasn't too well thought out.  In the original
coding, the unlikely condition (isinf(result) or result == 0) was checked
first, and the inf_is_valid or zero_is_valid condition only afterwards.
The inline-subroutine coding caused that to be swapped around, which is
pretty horrid for performance because (a) in common cases the is_valid
condition is twice as expensive to evaluate (e.g., requiring two isinf()
calls not one) and (b) in common cases the is_valid condition is false,
requiring us to perform the unlikely-condition check anyway.  Net result
is that one isinf() call becomes two or three, resulting in visible
performance loss as reported by Keisuke Kuroda.

The original fix proposal was to revert the replacement of the macro,
but on second thought, that macro was just a bad idea from the beginning:
if anything it's a net negative for readability of the code.  So instead,
let's just open-code all the overflow/underflow tests, being careful to
test the unlikely condition first (and mark it unlikely() to help the
compiler get the point).

Also, rather than having N copies of the actual ereport() calls, collapse
those into out-of-line error subroutines to save some code space.  This
does mean that the error file/line numbers won't be very helpful for
figuring out where the issue really is --- but we'd already burned that
bridge by putting the ereports into static inlines.

In HEAD, check_float[48]_val() are gone altogether.  In v12, leave them
present in float.h but unused in the core code, just in case some
extension is depending on them.

Emre Hasegeli, with some kibitzing from me and Andres Freund

Discussion: https://postgr.es/m/CANDwggLe1Gc1OrRqvPfGE=kM9K0FSfia0hbeFCEmwabhLz95AA@mail.gmail.com
2020-02-13 13:37:43 -05:00
Andrew Gierth 1fd687a035 Optimizations for integer to decimal output.
Using a lookup table of digit pairs reduces the number of divisions
needed, and calculating the length upfront saves some work; these
ideas are taken from the code previously committed for floats.

David Fetter, reviewed by Kyotaro Horiguchi, Tels, and me.

Discussion: https://postgr.es/m/20190924052620.GP31596%40fetter.org
2020-02-01 21:57:14 +00:00
Tom Lane 74b35eb468 Fix CheckAttributeType's handling of collations for ranges.
Commit fc7695891 changed CheckAttributeType to recurse into ranges,
but made it pass down the wrong collation (always InvalidOid, since
ranges as such have no collation).  This would result in guaranteed
failure when considering a range type whose subtype is collatable.

Embarrassingly, we lack any regression tests that would expose such
a problem (but fortunately, somebody noticed before we shipped this
bug in any release).

Fix it to pass down the range's subtype collation property instead,
and add some regression test cases to exercise collatable-subtype
ranges a bit more.  Back-patch to all supported branches, as the
previous patch was.

Report and patch by Julien Rouhaud, test cases tweaked by me

Discussion: https://postgr.es/m/CAOBaU_aBWqNweiGUFX0guzBKkcfJ8mnnyyGC_KBQmO12Mj5f_A@mail.gmail.com
2020-01-31 17:03:55 -05:00
Robert Haas beb4699091 Move jsonapi.c and jsonapi.h to src/common.
To make this work, (1) makeJsonLexContextCstringLen now takes the
encoding to be used as an argument; (2) check_stack_depth() is made to
do nothing in frontend code, and (3) elog(ERROR, ...) is changed to
pg_log_fatal + exit in frontend code.

Mark Dilger, reviewed and slightly revised by me.

Discussion: http://postgr.es/m/CA+TgmoYfOXhd27MUDGioVh6QtpD0C1K-f6ObSA10AWiHBAL5bA@mail.gmail.com
2020-01-29 10:22:51 -05:00
Robert Haas 73ce2a03f3 Move some code from jsonapi.c to jsonfuncs.c.
Specifically, move those functions that depend on ereport()
from jsonapi.c to jsonfuncs.c, in preparation for allowing
jsonapi.c to be used from frontend code.

A few cases where elog(ERROR, ...) is used for can't-happen
conditions are left alone; we can handle those in some other
way in frontend code.

Reviewed by Mark Dilger and Andrew Dunstan.

Discussion: http://postgr.es/m/CA+TgmoYfOXhd27MUDGioVh6QtpD0C1K-f6ObSA10AWiHBAL5bA@mail.gmail.com
2020-01-27 11:22:13 -05:00
Robert Haas 1f3a021730 Adjust pg_parse_json() so that it does not directly ereport().
Instead, it now returns a value indicating either success or the
type of error which occurred. The old behavior is still available
by calling pg_parse_json_or_ereport(). If the new interface is
used, an error can be thrown by passing the return value of
pg_parse_json() to json_ereport_error().

pg_parse_json() can still elog() in can't-happen cases, but it
seems like that issue is best handled separately.

Adjust json_lex() and json_count_array_elements() to return an
error code, too.

This is all in preparation for making the backend's json parser
available to frontend code.

Reviewed and/or tested by Mark Dilger and Andrew Dunstan.

Discussion: http://postgr.es/m/CA+TgmoYfOXhd27MUDGioVh6QtpD0C1K-f6ObSA10AWiHBAL5bA@mail.gmail.com
2020-01-27 11:04:51 -05:00
Robert Haas 11b5e3e35d Split JSON lexer/parser from 'json' data type support.
Keep the code that pertains to the 'json' data type in json.c, but
move the lexing and parsing code to a new file jsonapi.c, a name
I chose because the corresponding prototypes are in jsonapi.h.

This seems like a logical division, because the JSON lexer and parser
are also used by the 'jsonb' data type, but the SQL-callable functions
in json.c are a separate thing. Also, the new jsonapi.c file needs to
include far fewer header files than json.c, which seems like a good
sign that this is an appropriate place to insert an abstraction
boundary. I took the opportunity to remove a few apparently-unneeded
includes from json.c at the same time.

Patch by me, reviewed by David Steele, Mark Dilger, and Andrew
Dunstan. The previous commit was, too, but I forgot to note it
in the commit message.

Discussion: http://postgr.es/m/CA+TgmoYfOXhd27MUDGioVh6QtpD0C1K-f6ObSA10AWiHBAL5bA@mail.gmail.com
2020-01-24 10:17:43 -08:00
Robert Haas ce0425b162 Adjust src/include/utils/jsonapi.h so it's not backend-only.
The major change here is that we no longer include jsonb.h into
jsonapi.h. The reason that was necessary is that jsonapi.h included
several prototypes functions in jsonfuncs.c that depend on the Jsonb
type. Move those prototypes to a new header, jsonfuncs.h, and include
it where needed.

The other change is that JsonEncodeDateTime is now declared in
json.h rather than jsonapi.h.

Taken together, these steps eliminate all dependencies of jsonapi.h
on backend-only data types and header files, so that it can
potentially be included in frontend code.
2020-01-24 09:58:37 -08:00
Alvaro Herrera a166d408eb Report progress of ANALYZE commands
This uses the progress reporting infrastructure added by c16dc1aca5,
adding support for ANALYZE.

Co-authored-by: Álvaro Herrera <alvherre@alvh.no-ip.org>
Co-authored-by: Tatsuro Yamada <tatsuro.yamada.tf@nttcom.co.jp>
Reviewed-by: Julien Rouhaud, Robert Haas, Anthony Nowocien, Kyotaro Horiguchi,
	Vignesh C, Amit Langote
2020-01-15 11:14:39 -03:00
Bruce Momjian 7559d8ebfa Update copyrights for 2020
Backpatch-through: update all files in master, backpatch legal files through 9.4
2020-01-01 12:21:45 -05:00
Michael Paquier 7854e07f25 Revert "Rename files and headers related to index AM"
This follows multiple complains from Peter Geoghegan, Andres Freund and
Alvaro Herrera that this issue ought to be dug more before actually
happening, if it happens.

Discussion: https://postgr.es/m/20191226144606.GA5659@alvherre.pgsql
2019-12-27 08:09:00 +09:00
Tom Lane fbe0232358 Improve comments in utils/rel.h.
Mark the fields that should be accessed via partitioning-related
functions, as we already did for some other fields.

Amit Langote

Discussion: https://postgr.es/m/CA+HiwqFnK6LbVMACMCaqwWrvoSFTecZzufKRahg2qGvLPYMX=g@mail.gmail.com
2019-12-26 11:20:05 -05:00
Tom Lane 5b9312378e Load relcache entries' partitioning data on-demand, not immediately.
Formerly the rd_partkey and rd_partdesc data structures were always
populated immediately when a relcache entry was built or rebuilt.
This patch changes things so that they are populated only when they
are first requested.  (Hence, callers *must* now always use
RelationGetPartitionKey or RelationGetPartitionDesc; just fetching
the pointer directly is no longer acceptable.)

This seems to have some performance benefits, but the main reason to do
it is that it eliminates a recursive-reload failure that occurs if the
partkey or partdesc expressions contain any references to the relation's
rowtype (as discovered by Amit Langote).  In retrospect, since loading
these data structures might result in execution of nearly-arbitrary code
via eval_const_expressions, it was a dumb idea to require that to happen
during relcache entry rebuild.

Also, fix things so that old copies of a relcache partition descriptor
will be dropped when the cache entry's refcount goes to zero.  In the
previous coding it was possible for such copies to survive for the
lifetime of the session, as I'd complained of in a previous discussion.
(This management technique still isn't perfect, but it's better than
before.)  Improve the commentary explaining how that works and why
it's safe to hand out direct pointers to these relcache substructures.

In passing, improve RelationBuildPartitionDesc by using the same
memory-context-parent-swap approach used by RelationBuildPartitionKey,
thereby making it less dependent on strong assumptions about what
partition_bounds_copy does.  Avoid doing get_rel_relkind in the
critical section, too.

Patch by Amit Langote and Tom Lane; Robert Haas deserves some credit
for prior work in the area, too.  Although this is a pre-existing
problem, no back-patch: the patch seems too invasive to be safe to
back-patch, and the bug it fixes is a corner case that seems
relatively unlikely to cause problems in the field.

Discussion: https://postgr.es/m/CA+HiwqFUzjfj9HEsJtYWcr1SgQ_=iCAvQ=O2Sx6aQxoDu4OiHw@mail.gmail.com
Discussion: https://postgr.es/m/CA+TgmoY3bRmGB6-DUnoVy5fJoreiBJ43rwMrQRCdPXuKt4Ykaw@mail.gmail.com
2019-12-25 14:43:13 -05:00
Michael Paquier 8ce3aa9b59 Rename files and headers related to index AM
The following renaming is done so as source files related to index
access methods are more consistent with table access methods (the
original names used for index AMs ware too generic, and could be
confused as including features related to table AMs):
- amapi.h -> indexam.h.
- amapi.c -> indexamapi.c.  Here we have an equivalent with
backend/access/table/tableamapi.c.
- amvalidate.c -> indexamvalidate.c.
- amvalidate.h -> indexamvalidate.h.
- genam.c -> indexgenam.c.
- genam.h -> indexgenam.h.

This has been discussed during the development of v12 when table AM was
worked on, but the renaming never happened.

Author: Michael Paquier
Reviewed-by: Fabien Coelho, Julien Rouhaud
Discussion: https://postgr.es/m/20191223053434.GF34339@paquier.xyz
2019-12-25 10:23:39 +09:00
Tom Lane 6ef77cf46e Further adjust EXPLAIN's choices of table alias names.
This patch causes EXPLAIN to always assign a separate table alias to the
parent RTE of an append relation (inheritance set); before, such RTEs
were ignored if not actually scanned by the plan.  Since the child RTEs
now always have that same alias to start with (cf. commit 55a1954da),
the net effect is that the parent RTE usually gets the alias used or
implied by the query text, and the children all get that alias with "_N"
appended.  (The exception to "usually" is if there are duplicate aliases
in different subtrees of the original query; then some of those original
RTEs will also have "_N" appended.)

This results in more uniform output for partitioned-table plans than
we had before: the partitioned table itself gets the original alias,
and all child tables have aliases with "_N", rather than the previous
behavior where one of the children would get an alias without "_N".

The reason for giving the parent RTE an alias, even if it isn't scanned
by the plan, is that we now use the parent's alias to qualify Vars that
refer to an appendrel output column and appear above the Append or
MergeAppend that computes the appendrel.  But below the append, Vars
refer to some one of the child relations, and are displayed that way.
This seems clearer than the old behavior where a Var that could carry
values from any child relation was displayed as if it referred to only
one of them.

While at it, change ruleutils.c so that the code paths used by EXPLAIN
deal in Plan trees not PlanState trees.  This effectively reverts a
decision made in commit 1cc29fe7c, which seemed like a good idea at
the time to make ruleutils.c consistent with explain.c.  However,
it's problematic because we'd really like to allow executor startup
pruning to remove all the children of an append node when possible,
leaving no child PlanState to resolve Vars against.  (That's not done
here, but will be in the next patch.)  This requires different handling
of subplans and initplans than before, but is otherwise a pretty
straightforward change.

Discussion: https://postgr.es/m/001001d4f44b$2a2cca50$7e865ef0$@lab.ntt.co.jp
2019-12-11 17:05:18 -05:00
Alvaro Herrera ba79cb5dc8 Emit parameter values during query bind/execute errors
This makes such log entries more useful, since the cause of the error
can be dependent on the parameter values.

Author: Alexey Bashtanov, Álvaro Herrera
Discussion: https://postgr.es/m/0146a67b-a22a-0519-9082-bc29756b93a2@imap.cc
Reviewed-by: Peter Eisentraut, Andres Freund, Tom Lane
2019-12-11 18:03:35 -03:00
Tom Lane c35b714caf Fix misbehavior with expression indexes on ON COMMIT DELETE ROWS tables.
We implement ON COMMIT DELETE ROWS by truncating tables marked that
way, which requires also truncating/rebuilding their indexes.  But
RelationTruncateIndexes asks the relcache for up-to-date copies of any
index expressions, which may cause execution of eval_const_expressions
on them, which can result in actual execution of subexpressions.
This is a bad thing to have happening during ON COMMIT.  Manuel Rigger
reported that use of a SQL function resulted in crashes due to
expectations that ActiveSnapshot would be set, which it isn't.
The most obvious fix perhaps would be to push a snapshot during
PreCommit_on_commit_actions, but I think that would just open the door
to more problems: CommitTransaction explicitly expects that no
user-defined code can be running at this point.

Fortunately, since we know that no tuples exist to be indexed, there
seems no need to use the real index expressions or predicates during
RelationTruncateIndexes.  We can set up dummy index expressions
instead (we do need something that will expose the right data type,
as there are places that build index tupdescs based on this), and
just ignore predicates and exclusion constraints.

In a green field it'd likely be better to reimplement ON COMMIT DELETE
ROWS using the same "init fork" infrastructure used for unlogged
relations.  That seems impractical without catalog changes though,
and even without that it'd be too big a change to back-patch.
So for now do it like this.

Per private report from Manuel Rigger.  This has been broken forever,
so back-patch to all supported branches.
2019-12-01 13:09:26 -05:00
Amit Kapila e0487223ec Make the order of the header file includes consistent.
Similar to commits 14aec03502, 7e735035f2 and dddf4cdc33, this commit
makes the order of header file inclusion consistent in more places.

Author: Vignesh C
Reviewed-by: Amit Kapila
Discussion: https://postgr.es/m/CALDaNm2Sznv8RR6Ex-iJO6xAdsxgWhCoETkaYX=+9DW3q0QCfA@mail.gmail.com
2019-11-25 08:08:57 +05:30
Michael Paquier 4cb658af70 Refactor reloption handling for index AMs in-core
This reworks the reloption parsing and build of a couple of index AMs by
creating new structures for each index AM's options.  This split was
already done for BRIN, GIN and GiST (which actually has a fillfactor
parameter), but not for hash, B-tree and SPGiST which relied on
StdRdOptions due to an overlap with the default option set.

This saves a couple of bytes for rd_options in each relcache entry with
indexes making use of relation options, and brings more consistency
between all index AMs.  While on it, add a couple of AssertMacro() calls
to make sure that utility macros to grab values of reloptions are used
with the expected index AM.

Author: Nikolay Shaplov
Reviewed-by: Amit Langote, Michael Paquier, Álvaro Herrera, Dent John
Discussion: https://postgr.es/m/4127670.gFlpRb6XCm@x200m
2019-11-25 09:40:53 +09:00
Alvaro Herrera 71a8a4f6e3 Add backtrace support for error reporting
Add some support for automatically showing backtraces in certain error
situations in the server.  Backtraces are shown on assertion failure;
also, a new setting backtrace_functions can be set to a list of C
function names, and all ereport()s and elog()s from the mentioned
functions will have backtraces generated.  Finally, the function
errbacktrace() can be manually added to an ereport() call to generate a
backtrace for that call.

Authors: Peter Eisentraut, Álvaro Herrera
Discussion: https://postgr.es/m//5f48cb47-bf1e-05b6-7aae-3bf2cd01586d@2ndquadrant.com
Discussion: https://postgr.es/m/CAMsr+YGL+yfWE=JvbUbnpWtrRZNey7hJ07+zT4bYJdVp4Szdrg@mail.gmail.com
2019-11-08 15:44:20 -03:00
Peter Eisentraut 581a55889b Fix nested error handling in PG_FINALLY
We need to pop the error stack before running the user-supplied
PG_FINALLY code.  Otherwise an error in the cleanup code would end up
at the same sigsetjmp() invocation and result in an infinite error
handling loop.

Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://www.postgresql.org/message-id/flat/95a822c3-728b-af0e-d7e5-71890507ae0c%402ndquadrant.com
2019-11-07 09:56:47 +01:00
Tomas Vondra 6e3e6cc0e8 Allow sampling of statements depending on duration
This allows logging a sample of statements, without incurring excessive
log traffic (which may impact performance).  This can be useful when
analyzing workloads with lots of short queries.

The sampling is configured using two new GUC parameters:

 * log_min_duration_sample - minimum required statement duration

 * log_statement_sample_rate - sample rate (0.0 - 1.0)

Only statements with duration exceeding log_min_duration_sample are
considered for sampling. To enable sampling, both those GUCs have to
be set correctly.

The existing log_min_duration_statement GUC has a higher priority, i.e.
statements with duration exceeding log_min_duration_statement will be
always logged, irrespectedly of how the sampling is configured. This
means only configurations

  log_min_duration_sample < log_min_duration_statement

do actually sample the statements, instead of logging everything.

Author: Adrien Nayrat
Reviewed-by: David Rowley, Vik Fearing, Tomas Vondra
Discussion: https://postgr.es/m/bbe0a1a8-a8f7-3be2-155a-888e661cc06c@anayrat.info
2019-11-06 19:11:07 +01:00
Peter Eisentraut 3967737624 Add some assertions to view reloption macros
In these macros, the rd_options pointer is cast to ViewOption *.  Add
some assertions that the passed-in relation is actually a view before
doing that.

Author: Nikolay Shaplov <dhyan@nataraj.su>
Discussion: https://www.postgresql.org/message-id/flat/3634983.eHpMQ1mJnI@x200m
2019-11-01 13:25:38 +01:00
Peter Eisentraut 604bd36711 PG_FINALLY
This gives an alternative way of catching exceptions, for the common
case where the cleanup code is the same in the error and non-error
cases.  So instead of

    PG_TRY();
    {
        ... code that might throw ereport(ERROR) ...
    }
    PG_CATCH();
    {
        cleanup();
	PG_RE_THROW();
    }
    PG_END_TRY();
    cleanup();

one can write

    PG_TRY();
    {
        ... code that might throw ereport(ERROR) ...
    }
    PG_FINALLY();
    {
        cleanup();
    }
    PG_END_TRY();

Discussion: https://www.postgresql.org/message-id/flat/95a822c3-728b-af0e-d7e5-71890507ae0c%402ndquadrant.com
2019-11-01 11:18:03 +01:00
Peter Eisentraut 7302514088 Add const qualifiers to internal range type APIs
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://www.postgresql.org/message-id/flat/dc9b45fa-b950-fadc-4751-85d6f729df55%402ndquadrant.com
2019-10-31 07:48:21 +01:00
Tom Lane 22f6f2c1cc Improve management of statement timeouts.
Commit f8e5f156b added private state in postgres.c to track whether
a statement timeout is running.  This seems like bad design to me;
timeout.c's private state should be the single source of truth about
that.  We already fixed one bug associated with failure to keep those
states in sync (cf. be42015fc), and I've got little faith that we
won't find more in future.  So get rid of postgres.c's local variable
by exposing a way to ask timeout.c whether a timeout is running.
(Obviously, such an inquiry is subject to race conditions, but it
seems fine for the purpose at hand.)

To make get_timeout_active() as cheap as possible, add a flag in
the per-timeout struct showing whether that timeout is active.
This allows some small savings elsewhere in timeout.c, mainly
elimination of unnecessary searches of the active_timeouts array.

While at it, fix enable_statement_timeout to not call disable_timeout
when statement_timeout is 0 and the timeout is not running.  This
avoids a useless deschedule-and-reschedule-timeouts cycle, which
represents a significant savings (at least one kernel call) when
there is any other active timeout.  Right now, there usually isn't,
but there are proposals around to change that.

Discussion: https://postgr.es/m/16035-456e6e69ebfd4374@postgresql.org
2019-10-25 11:41:16 -04:00
Alexander Korotkov 52ad1e6599 Refactor jsonpath's compareDatetime()
This commit refactors come ridiculous coding in compareDatetime().  Also, it
provides correct cross-datatype comparison even when one of values overflows
during cast.  That eliminates dilemma on whether we should suppress overflow
errors during cast.

Reported-by: Tom Lane
Discussion: https://postgr.es/m/32308.1569455803%40sss.pgh.pa.us
Discussion: https://postgr.es/m/a5629d0c-8162-7559-16aa-0c8390d6ba5f%40postgrespro.ru
Author: Nikita Glukhov, Alexander Korotkov
2019-10-21 23:07:07 +03:00
Michael Paquier 1de4fd1092 Refresh some incorrect links in pg_crc.c/h
Author: Vignesh C
Discussion: https://postgr.es/m/CALDaNm0LPk9vTGTBPBRv0=fX=94o4r6-DuBbHNeCN2AH5bufLw@mail.gmail.com
2019-10-16 15:10:14 +09:00
Peter Eisentraut 38d8dce61f Remove some code for old unsupported versions of MSVC
As of d9dd406fe2, we require MSVC 2013,
which means _MSC_VER >= 1800.  This means that conditionals about
older versions of _MSC_VER can be removed or simplified.

Previous code was also in some cases handling MinGW, where _MSC_VER is
not defined at all, incorrectly, such as in pg_ctl.c and win32_port.h,
leading to some compiler warnings.  This should now be handled better.

Reviewed-by: Michael Paquier <michael@paquier.xyz>
2019-10-08 10:50:54 +02:00
Tomas Vondra 36425ece5d Change MemoryContextMemAllocated to return Size
Commit f2369bc610 switched most of the memory accounting from int64 to
Size, but it forgot to change the MemoryContextMemAllocated return type.
So this fixes that omission.

Discussion: https://www.postgresql.org/message-id/11238.1570200198%40sss.pgh.pa.us
2019-10-05 20:49:39 +02:00
Tomas Vondra 5dd7fc1519 Add transparent block-level memory accounting
Adds accounting of memory allocated in a memory context. Compared to
various ad hoc solutions, the main advantage is that the accounting is
transparent and does not require direct control over allocations (this
matters for use cases where the allocations happen in user code, like
for example aggregate states allocated in a transition functions).

To reduce overhead, the accounting happens at the block level (not for
individual chunks) and only the context immediately owning the block is
updated. When inquiring about amount of memory allocated in a context,
we have to recursively walk all children contexts.

This "lazy" accounting works well for cases with relatively small number
of contexts in the relevant subtree and/or with infrequent inquiries.

Author: Jeff Davis
Reivewed-by: Tomas Vondra, Melanie Plageman, Soumyadeep Chakraborty
Discussion: https://www.postgresql.org/message-id/flat/027a129b8525601c6a680d27ce3a7172dab61aab.camel@j-davis.com
2019-10-01 03:13:39 +02:00
Alexander Korotkov bffe1bd684 Implement jsonpath .datetime() method
This commit implements jsonpath .datetime() method as it's specified in
SQL/JSON standard.  There are no-argument and single-argument versions of
this method.  No-argument version selects first of ISO datetime formats
matching input string.  Single-argument version accepts template string as
its argument.

Additionally to .datetime() method itself this commit also implements
comparison ability of resulting date and time values.  There is some difficulty
because exising jsonb_path_*() functions are immutable, while comparison of
timezoned and non-timezoned types involves current timezone.  At first, current
timezone could be changes in session.  Moreover, timezones themselves are not
immutable and could be updated.  This is why we let existing immutable functions
throw errors on such non-immutable comparison.  In the same time this commit
provides jsonb_path_*_tz() functions which are stable and support operations
involving timezones.  As new functions are added to the system catalog,
catversion is bumped.

Support of .datetime() method was the only blocker prevents T832 from being
marked as supported.  sql_features.txt is updated correspondingly.

Extracted from original patch by Nikita Glukhov, Teodor Sigaev, Oleg Bartunov.
Heavily revised by me.  Comments were adjusted by Liudmila Mantrova.

Discussion: https://postgr.es/m/fcc6fc6a-b497-f39a-923d-aa34d0c588e8%402ndQuadrant.com
Discussion: https://postgr.es/m/CAPpHfdsZgYEra_PeCLGNoXOWYx6iU-S3wF8aX0ObQUcZU%2B4XTw%40mail.gmail.com
Author: Alexander Korotkov, Nikita Glukhov, Teodor Sigaev, Oleg Bartunov, Liudmila Mantrova
Reviewed-by: Anastasia Lubennikova, Peter Eisentraut
2019-09-25 22:51:51 +03:00
Alexander Korotkov 6dda292d4d Allow datetime values in JsonbValue
SQL/JSON standard allows manipulation with datetime values.  So, it appears to
be convinient to allow datetime values to be represented in JsonbValue struct.
These datetime values are allowed for temporary representation only.  During
serialization datetime values are converted into strings.

SQL/JSON requires writing timestamps with timezone in the same timezone offset
as they were parsed.  This is why we allow storage of timezone offset in
JsonbValue struct.  For the same reason timezone offset argument is added to
JsonEncodeDateTime() function.

Extracted from original patch by Nikita Glukhov, Teodor Sigaev, Oleg Bartunov.
Revised by me.  Comments were adjusted by Liudmila Mantrova.

Discussion: https://postgr.es/m/fcc6fc6a-b497-f39a-923d-aa34d0c588e8%402ndQuadrant.com
Discussion: https://postgr.es/m/CAPpHfdsZgYEra_PeCLGNoXOWYx6iU-S3wF8aX0ObQUcZU%2B4XTw%40mail.gmail.com
Author: Nikita Glukhov, Teodor Sigaev, Oleg Bartunov, Alexander Korotkov, Liudmila Mantrova
Reviewed-by: Anastasia Lubennikova, Peter Eisentraut
2019-09-25 22:51:51 +03:00
Alexander Korotkov 5bc450629b Error suppression support for upcoming jsonpath .datetime() method
Add support of error suppression in some date and time manipulation functions
as it's required for jsonpath .datetime() method support.  This commit doesn't
use PG_TRY()/PG_CATCH() in order to implement that.  Instead, it provides
internal versions of date and time functions used, which support error
suppression.

Discussion: https://postgr.es/m/CAPpHfdsZgYEra_PeCLGNoXOWYx6iU-S3wF8aX0ObQUcZU%2B4XTw%40mail.gmail.com
Author: Alexander Korotkov, Nikita Glukhov
Reviewed-by: Anastasia Lubennikova, Peter Eisentraut
2019-09-25 22:51:51 +03:00
Alexander Korotkov 66c74f8b6e Implement parse_datetime() function
This commit adds parse_datetime() function, which implements datetime
parsing with extended features demanded by upcoming jsonpath .datetime()
method:

 * Dynamic type identification based on template string,
 * Support for standard-conforming 'strict' mode,
 * Timezone offset is returned as separate value.

Extracted from original patch by Nikita Glukhov, Teodor Sigaev, Oleg Bartunov.
Revised by me.

Discussion: https://postgr.es/m/fcc6fc6a-b497-f39a-923d-aa34d0c588e8%402ndQuadrant.com
Discussion: https://postgr.es/m/CAPpHfdsZgYEra_PeCLGNoXOWYx6iU-S3wF8aX0ObQUcZU%2B4XTw%40mail.gmail.com
Author: Nikita Glukhov, Teodor Sigaev, Oleg Bartunov, Alexander Korotkov
Reviewed-by: Anastasia Lubennikova, Peter Eisentraut
2019-09-25 22:51:51 +03:00
Alvaro Herrera 773df883e8 Support reloptions of enum type
All our current in core relation options of type string (not many,
admittedly) behave in reality like enums.  But after seeing an
implementation for enum reloptions, it's clear that strings are messier,
so introduce the new reloption type.  Switch all string options to be
enums instead.

Fortunately we have a recently introduced test module for reloptions, so
we don't lose coverage of string reloptions, which may still be used by
third-party modules.

Authors: Nikolay Shaplov, Álvaro Herrera
Reviewed-by: Nikita Glukhov, Aleksandr Parfenov
Discussion: https://postgr.es/m/43332102.S2V5pIjXRx@x200m
2019-09-25 15:56:52 -03:00
Tom Lane 5ac0d93600 Fix failure to zero-pad the result of bitshiftright().
If the bitstring length is not a multiple of 8, we'd shift the
rightmost bits into the pad space, which must be zeroes --- bit_cmp,
for one, depends on that.  This'd lead to the result failing to
compare equal to what it should compare equal to, as reported in
bug #16013 from Daryl Waycott.

This is, if memory serves, not the first such bug in the bitstring
functions.  In hopes of making it the last one, do a bit more work
than minimally necessary to fix the bug:

* Add assertion checks to bit_out() and varbit_out() to complain if
they are given incorrectly-padded input.  This will improve the
odds that manual testing of any new patch finds problems.

* Encapsulate the padding-related logic in macros to make it
easier to use.

Also, remove unnecessary padding logic from bit_or() and bitxor().
Somebody had already noted that we need not re-pad the result of
bit_and() since the inputs are required to be the same length,
but failed to extrapolate that to the other two.

Also, move a comment block that once was near the head of varbit.c
(but people kept putting other stuff in front of it), to put it in
the header block.

Note for the release notes: if anyone has inconsistent data as a
result of saving the output of bitshiftright() in a table, it's
possible to fix it with something like
UPDATE mytab SET bitcol = ~(~bitcol) WHERE bitcol != ~(~bitcol);

This has been broken since day one, so back-patch to all supported
branches.

Discussion: https://postgr.es/m/16013-c2765b6996aacae9@postgresql.org
2019-09-22 17:45:59 -04:00
Alvaro Herrera 1a2983231d Split out code into new getKeyJsonValueFromContainer()
The new function stashes its output value in a JsonbValue that can be
passed in by the caller, which enables some of them to pass
stack-allocated structs -- saving palloc cycles.  It also allows some
callers that know they are handling a jsonb object to use this new jsonb
object-specific API, instead of going through generic container
findJsonbValueFromContainer.

Author: Nikita Glukhov
Discussion: https://postgr.es/m/7c417f90-f95f-247e-ba63-d95e39c0ad14@postgrespro.ru
2019-09-20 20:18:11 -03:00
Tom Lane d5b90cd648 Fix bogus handling of XQuery regex option flags.
The SQL spec defers to XQuery to define what the option flags are
for LIKE_REGEX patterns.  XQuery says that:
* 's' allows the dot character to match newlines, which by
  default it will not;
* 'm' allows ^ and $ to match at newlines, not only at the
  start/end of the whole string.
Thus, these are *not* inverses as they are for the similarly-named
POSIX options, and neither one corresponds to the POSIX 'n' option.
Fortunately, Spencer's library does expose these two behaviors as
separately twiddlable flags, so we just have to fix the mapping from
JSP flag bits to REG flag bits.  I also chose to rename the symbol
for 's' to DOTALL, to make it clearer that it's not the inverse
of MLINE.

Also, XQuery says that if the 'q' flag "is used together with the m, s,
or x flag, that flag has no effect".  I read this as saying that 'q'
overrides the other flags; whoever wrote our code seems to have read
it backwards.

Lastly, while XQuery's 'x' flag is related to what Spencer's code
does for REG_EXPANDED, it's not the same or a subset.  It seems best
to treat XQuery's 'x' as unimplemented for now.  Maybe later we can
expand our regex code to offer 'x'-style parsing as a separate option.

While at it, refactor the jsonpath code so that (a) there's only
one copy of the flag transformation logic not two, and (b) the
processing of flags is independent of the order in which the flags
are written.

We need some documentation updates to go with this, but I'll
tackle that separately.

Back-patch to v12 where this code originated.

Discussion: https://postgr.es/m/CAPpHfdvDci4iqNF9fhRkTqhe-5_8HmzeLt56drH%2B_Rv2rNRqfg@mail.gmail.com
Reference: https://www.w3.org/TR/2017/REC-xpath-functions-31-20170321/#flags
2019-09-17 15:39:51 -04:00
Alexander Korotkov d589f94460 Support for FF1-FF6 datetime format patterns
SQL Standard 2016 defines FF1-FF9 format patters for fractions of seconds in
jsonpath .datetime() method and CAST (... FORMAT ...) SQL clause.  Parsing
engine of upcoming .datetime() method will be shared with to_date()/
to_timestamp().

This patch implements FF1-FF6 format patterns for upcoming jsonpath .datetime()
method.  to_date()/to_timestamp() functions will also get support of this
format patterns as positive side effect.  FF7-FF9 are not supported due to
lack of precision in our internal timestamp representation.

Extracted from original patch by Nikita Glukhov, Teodor Sigaev, Oleg Bartunov.
Heavily revised by me.

Discussion: https://postgr.es/m/fcc6fc6a-b497-f39a-923d-aa34d0c588e8%402ndQuadrant.com
Discussion: https://postgr.es/m/CAPpHfdsZgYEra_PeCLGNoXOWYx6iU-S3wF8aX0ObQUcZU%2B4XTw%40mail.gmail.com
Author: Nikita Glukhov, Teodor Sigaev, Oleg Bartunov, Alexander Korotkov
Reviewed-by: Anastasia Lubennikova, Peter Eisentraut
2019-09-16 21:14:32 +03:00
Tom Lane 9a86f03b4e Rearrange postmaster's startup sequence for better syslogger results.
This is a second try at what commit 57431a911 tried to do, namely,
launch the syslogger before we open postmaster sockets so that our
messages about the sockets end up in the syslogger files.  That
commit fell foul of a bunch of subtle issues caused by trying to
launch a postmaster child process before creating shared memory.
Rather than messing with that interaction, let's postpone opening
the sockets till after we launch the syslogger.

This would not have been terribly safe before commit 7de19fbc0,
because we relied on socket opening to detect whether any competing
postmasters were using the same port number.  But now that we choose
IPC keys without regard to the port number, there's no interaction
to worry about.

Also delay creation of the external PID file (if requested) till after
the sockets are open, since external code could plausibly be relying
on that ordering of events.  And postpone most of the work of
RemovePgTempFiles() so that that potentially-slow processing still
happens after we make the external PID file.  We have to be a bit
careful about that last though: as noted in the discussion subsequent to
bug #15804, EXEC_BACKEND builds still have to clear the parameter-file
temp dir before launching the syslogger.

Patch by me; thanks to Michael Paquier for review/testing.

Discussion: https://postgr.es/m/15804-3721117bf40fb654@postgresql.org
2019-09-11 11:43:01 -04:00
Thomas Munro 720b59b55b Avoid catalog lookups in RelationAllowsEarlyPruning().
RelationAllowsEarlyPruning() performed a catalog scan, but is used
in two contexts where that was a bad idea:

1.  In heap_page_prune_opt(), which runs very frequently in some large
    scans.  This caused major performance problems in a field report
    that was easy to reproduce.

2.  In TestForOldSnapshot(), which runs while we hold a buffer content
    lock.  It's not clear if this was guaranteed to be free of buffer
    deadlock risk.

The check was introduced in commit 2cc41acd8 and defended against a
real problem: 9.6's hash indexes have no page LSN and so we can't
allow early pruning (ie the snapshot-too-old feature).  We can remove
the check from all later releases though: hash indexes are now logged,
and there is no way to create UNLOGGED indexes on regular logged
tables.

If a future release allows such a combination, it might need to put
a similar check in place, but it'll need some more thought.

Back-patch to 10.

Author: Thomas Munro
Reviewed-by: Tom Lane, who spotted the second problem
Discussion: https://postgr.es/m/CA%2BhUKGKT8oTkp5jw_U4p0S-7UG9zsvtw_M47Y285bER6a2gD%2Bg%40mail.gmail.com
Discussion: https://postgr.es/m/CAA4eK1%2BWy%2BN4eE5zPm765h68LrkWc3Biu_8rzzi%2BOYX4j%2BiHRw%40mail.gmail.com
2019-08-28 16:18:29 +12:00
Andres Freund 4c01a11103 Add fmgr.h include to selfuncs.h.
Necessary after fb3b098f. That previously escaped notice, because all
including sites already include fmgr.h some other way.

Reported-By: Tom Lane
Author: Andres Freund
Discussion: https://postgr.es/m/17463.1566153454@sss.pgh.pa.us
2019-08-19 12:51:38 -07:00
Tom Lane 927f34ce8a Avoid conflicts with library versions of inet_net_ntop() and friends.
Prefix inet_net_ntop and sibling routines with "pg_" to ensure that
they aren't mistaken for C-library functions.  This fixes warnings
from cpluspluscheck on some platforms, and should help reduce reader
confusion everywhere, since our functions aren't exactly interchangeable
with the library versions (they may have different ideas about address
family codes).

This shouldn't be fixing any actual bugs, unless somebody's linker
is misbehaving, so no need to back-patch.

Discussion: https://postgr.es/m/20518.1559494394@sss.pgh.pa.us
2019-08-18 19:27:23 -04:00
Tom Lane 232720be9b Fix incidental warnings from cpluspluscheck.
Remove use of "register" keyword in hashfn.c.  It's obsolescent
according to recent C++ compilers, and no modern C compiler pays
much attention to it either.

Also fix one cosmetic warning about signed vs unsigned comparison.

Discussion: https://postgr.es/m/20518.1559494394@sss.pgh.pa.us
2019-08-18 19:01:40 -04:00
Andres Freund fb3b098fe8 Remove fmgr.h includes from headers that don't really need it.
Most of the fmgr.h includes were obsoleted by 352a24a1f9. A
few others can be obsoleted using the underlying struct type in an
implementation detail.

Author: Andres Freund
Discussion: https://postgr.es/m/20190803193733.g3l3x3o42uv4qj7l@alap3.anarazel.de
2019-08-16 10:35:31 -07:00
Andres Freund 6a04d345fd Don't include utils/array.h from acl.h.
For most uses of acl.h the details of how "Acl" internally looks like
are irrelevant. It might make sense to move a lot of the
implementation details into a separate header at a later point.

The main motivation of this change is to avoid including fmgr.h (via
array.h, which needs it for exposed structs) in a lot of files that
otherwise don't need it. A subsequent commit will remove the fmgr.h
include from a lot of files.

Directly include utils/array.h and utils/expandeddatum.h from the
files that need them, but previously included them indirectly, via
acl.h.

Author: Andres Freund
Discussion: https://postgr.es/m/20190803193733.g3l3x3o42uv4qj7l@alap3.anarazel.de
2019-08-16 10:33:30 -07:00
Michael Paquier 66bde49d96 Fix inconsistencies and typos in the tree, take 10
This addresses some issues with unnecessary code comments, fixes various
typos in docs and comments, and removes some orphaned structures and
definitions.

Author: Alexander Lakhin
Discussion: https://postgr.es/m/9aabc775-5494-b372-8bcb-4dfc0bd37c68@gmail.com
2019-08-13 13:53:41 +09:00
Michael Paquier 8548ddc61b Fix inconsistencies and typos in the tree, take 9
This addresses more issues with code comments, variable names and
unreferenced variables.

Author: Alexander Lakhin
Discussion: https://postgr.es/m/7ab243e0-116d-3e44-d120-76b3df7abefd@gmail.com
2019-08-05 12:14:58 +09:00
Tomas Vondra 75506195da Revert "Add log_statement_sample_rate parameter"
This reverts commit 88bdbd3f74.

As committed, statement sampling used the existing duration threshold
(log_min_duration_statement) when decide which statements to sample.
The issue is that even the longest statements are subject to sampling,
and so may not end up logged. An improvement was proposed, introducing
a second duration threshold, but it would not be backwards compatible.
So we've decided to revert this feature - the separate threshold should
be part of the feature itself.

Discussion: https://postgr.es/m/CAFj8pRDS8tQ3Wviw9%3DAvODyUciPSrGeMhJi_WPE%2BEB8%2B4gLL-Q%40mail.gmail.com
2019-08-04 23:38:27 +02:00
Andres Freund 870b1d6800 Remove superfluous newlines in function prototypes.
These were introduced by pgindent due to fixe to broken
indentation (c.f. 8255c7a5ee). Previously the mis-indentation of
function prototypes was creatively used to reduce indentation in a few
places.

As that formatting only exists in master and REL_12_STABLE, it seems
better to fix it in both, rather than having some odd indentation in
v12 that somebody might copy for future patches or such.

Author: Andres Freund
Discussion: https://postgr.es/m/20190728013754.jwcbe5nfyt3533vx@alap3.anarazel.de
Backpatch: 12-
2019-07-31 00:05:21 -07:00
Heikki Linnakangas a29834beb1 Allow table AM's to use rd_amcache, too.
The rd_amcache allows an index AM to cache arbitrary information in a
relcache entry. This commit moves the cleanup of rd_amcache so that it
can also be used by table AMs. Nothing takes advantage of that yet, but
I'm sure it'll come handy for anyone writing new table AMs.

Backpatch to v12, where table AM interface was introduced.

Reviewed-by: Julien Rouhaud
2019-07-30 21:43:27 +03:00
Heikki Linnakangas 6655a7299d Use full 64-bit XID for checking if a deleted GiST page is old enough.
Otherwise, after a deleted page gets even older, it becomes unrecyclable
again. B-tree has the same problem, and has had since time immemorial,
but let's at least fix this in GiST, where this is new.

Backpatch to v12, where GiST page deletion was introduced.

Reviewed-by: Andrey Borodin
Discussion: https://www.postgresql.org/message-id/835A15A5-F1B4-4446-A711-BF48357EB602%40yandex-team.ru
2019-07-24 20:24:07 +03:00
Michael Paquier 23bccc823d Fix inconsistencies and typos in the tree
This is numbered take 7, and addresses a set of issues with code
comments, variable names and unreferenced variables.

Author: Alexander Lakhin
Discussion: https://postgr.es/m/dff75442-2468-f74f-568c-6006e141062f@gmail.com
2019-07-22 10:01:50 +09:00
Michael Paquier d1a040543b Remove more unreferenced function declarations
Author: Masahiko Sawada
Discussion: https://postgr.es/m/CAD21AoDuAYsRb3Q9aobkFZ6DZMWxsyg4HOmgkwgeWNfSkTwGxw@mail.gmail.com
2019-07-07 09:58:33 +09:00
Tom Lane 79b94716e7 Remove unreferenced function declarations.
These seem to be leftovers from old patches, perhaps.

Masahiko Sawada

Discussion: https://postgr.es/m/CAD21AoDuAYsRb3Q9aobkFZ6DZMWxsyg4HOmgkwgeWNfSkTwGxw@mail.gmail.com
2019-07-05 19:28:45 -04:00
Michael Paquier c74d49d41c Fix many typos and inconsistencies
Author: Alexander Lakhin
Discussion: https://postgr.es/m/af27d1b3-a128-9d62-46e0-88f424397f44@gmail.com
2019-07-01 10:00:23 +09:00
Noah Misch 459c3cdb4a Don't read fields of a misaligned ExpandedObjectHeader or AnyArrayType.
UBSan complains about this.  Instead, cast to a suitable type requiring
only 4-byte alignment.  DatumGetAnyArrayP() already assumes one can cast
between AnyArrayType and ArrayType, so this doesn't introduce a new
assumption.  Back-patch to 9.5, where AnyArrayType was introduced.

Reviewed by Tom Lane.

Discussion: https://postgr.es/m/20190629210334.GA1244217@rfd.leadboat.com
2019-06-30 17:34:17 -07:00
Peter Eisentraut 21f428ebde Don't call data type input functions in GUC check hooks
Instead of calling pg_lsn_in() in check_recovery_target_lsn and
timestamptz_in() in check_recovery_target_time, reorganize the
respective code so that we don't raise any errors in the check hooks.
The previous code tried to use PG_TRY/PG_CATCH to handle errors in a
way that is not safe, so now the code contains no ereport() calls and
can operate safely within the GUC error handling system.

Moreover, since the interpretation of the recovery_target_time string
may depend on the time zone, we cannot do the final processing of that
string until all the GUC processing is done.  Instead,
check_recovery_target_time() now does some parsing for syntax
checking, but the actual conversion to a timestamptz value is done
later in the recovery code that uses it.

Reported-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://www.postgresql.org/message-id/flat/20190611061115.njjwkagvxp4qujhp%40alap3.anarazel.de
2019-06-30 10:27:43 +02:00
Peter Eisentraut 666cbae16d Remove explicit error handling for obsolete date/time values
The date/time values 'current', 'invalid', and 'undefined' were
removed a long time ago, but the code still contains explicit error
handling for the transition.  To simplify the code and avoid having to
handle these values everywhere, just remove the recognition of these
tokens altogether now.

Reviewed-by: Michael Paquier <michael@paquier.xyz>
2019-06-30 10:27:35 +02:00
Alexander Korotkov 261a5c1928 Support 'q' flag in jsonpath 'like_regex' predicate
SQL/JSON standard defines that jsonpath 'like_regex' predicate should support
the same set of flags as XQuery/XPath.  It appears that implementation of 'q'
flag was missed.  This commit fixes that.

Discussion: https://postgr.es/m/CAPpHfdtyfPsxLYiTjp5Ov8T5xGsB5t3CwE5%2B3PS%3DLLwA%2BxTJog%40mail.gmail.com
Author: Nikita Glukhov, Alexander Korotkov
2019-06-19 22:41:57 +03:00
Michael Paquier 3412030205 Fix more typos and inconsistencies in the tree
Author: Alexander Lakhin
Discussion: https://postgr.es/m/0a5419ea-1452-a4e6-72ff-545b1a5a8076@gmail.com
2019-06-17 16:13:16 +09:00
Tomas Vondra 6cbfb784c3 Rework the pg_statistic_ext catalog
Since extended statistic got introduced in PostgreSQL 10, there was a
single catalog pg_statistic_ext storing both the definitions and built
statistic.  That's however problematic when a user is supposed to have
access only to the definitions, but not to user data.

Consider for example pg_dump on a database with RLS enabled - if the
pg_statistic_ext catalog respects RLS (which it should, if it contains
user data), pg_dump would not see any records and the result would not
define any extended statistics.  That would be a surprising behavior.

Until now this was not a pressing issue, because the existing types of
extended statistic (functional dependencies and ndistinct coefficients)
do not include any user data directly.  This changed with introduction
of MCV lists, which do include most common combinations of values.

The easiest way to fix this is to split the pg_statistic_ext catalog
into two - one for definitions, one for the built statistic values.
The new catalog is called pg_statistic_ext_data, and we're maintaining
a 1:1 relationship with the old catalog - either there are matching
records in both catalogs, or neither of them.

Bumped CATVERSION due to changing system catalog definitions.

Author: Dean Rasheed, with improvements by me
Reviewed-by: Dean Rasheed, John Naylor
Discussion: https://postgr.es/m/CAEZATCUhT9rt7Ui%3DVdx4N%3D%3DVV5XOK5dsXfnGgVOz_JhAicB%3DZA%40mail.gmail.com
2019-06-16 01:20:31 +02:00
Alvaro Herrera b976845815 Fix double-word typos
Discussion: https://postgr.es/m/20190612184527.GA24266@alvherre.pgsql
Reviewed-by: Michaël Paquier
2019-06-13 10:03:56 -04:00
Amit Kapila 9679345f3c Fix typos.
Reported-by: Alexander Lakhin
Author: Alexander Lakhin
Reviewed-by: Amit Kapila and Tom Lane
Discussion: https://postgr.es/m/7208de98-add8-8537-91c0-f8b089e2928c@gmail.com
2019-05-26 18:28:18 +05:30
Tom Lane 8255c7a5ee Phase 2 pgindent run for v12.
Switch to 2.1 version of pg_bsd_indent.  This formats
multiline function declarations "correctly", that is with
additional lines of parameter declarations indented to match
where the first line's left parenthesis is.

Discussion: https://postgr.es/m/CAEepm=0P3FeTXRcU5B2W3jv3PgRVZ-kGUXLGfd42FFhUROO3ug@mail.gmail.com
2019-05-22 13:04:48 -04:00
Tom Lane be76af171c Initial pgindent run for v12.
This is still using the 2.0 version of pg_bsd_indent.
I thought it would be good to commit this separately,
so as to document the differences between 2.0 and 2.1 behavior.

Discussion: https://postgr.es/m/16296.1558103386@sss.pgh.pa.us
2019-05-22 12:55:34 -04:00
Andres Freund 78d6a9cbd3 Fix and improve SnapshotType comments.
The comment for SNAPSHOT_SELF was unfortunately explaining
SNAPSHOT_DIRTY, as reported by Sergei. Also expand a few comments, and
include a few more comments from heapam_visibility.c, so they're in an
AM independent place.

Reported-By: Sergei Kornilov
Author: Andres Freund
Discussion: https://postgr.es/m/9152241558192351@sas1-d856b3d759c7.qloud-c.yandex.net
2019-05-19 16:18:44 -07:00
Tom Lane f884dca495 Remove RelationSetIndexList().
In the wake of commit f912d7dec, RelationSetIndexList isn't used any
more.  It was always a horrid wart, so getting rid of it is very nice.
We can also convert rd_indexvalid back to a plain boolean.

Discussion: https://postgr.es/m/28926.1556664156@sss.pgh.pa.us
2019-05-03 10:26:14 -04:00
Tom Lane 5f1433ac5e Prevent memory leaks associated with relcache rd_partcheck structures.
The original coding of generate_partition_qual() just copied the list
of predicate expressions into the global CacheMemoryContext, making it
effectively impossible to clean up when the owning relcache entry is
destroyed --- the relevant code in RelationDestroyRelation() only managed
to free the topmost List header :-(.  This resulted in a session-lifespan
memory leak whenever a table partition's relcache entry is rebuilt.
Fortunately, that's not normally a large data structure, and rebuilds
shouldn't occur all that often in production situations; but this is
still a bug worth fixing back to v10 where the code was introduced.

To fix, put the cached expression tree into its own small memory context,
as we do with other complicated substructures of relcache entries.
Also, deal more honestly with the case that a partition has an empty
partcheck list; while that probably isn't a case that's very interesting
for production use, it's legal.

In passing, clarify comments about how partitioning-related relcache
data structures are managed, and add some Asserts that we're not leaking
old copies when we overwrite these data fields.

Amit Langote and Tom Lane

Discussion: https://postgr.es/m/7961.1552498252@sss.pgh.pa.us
2019-04-13 13:22:26 -04:00
Fujii Masao 119dcfad98 Add vacuum_truncate reloption.
vacuum_truncate controls whether vacuum tries to truncate off
any empty pages at the end of the table. Previously vacuum always
tried to do the truncation. However, the truncation could cause
some problems; for example, ACCESS EXCLUSIVE lock needs to
be taken on the table during the truncation and can cause
the query cancellation on the standby even if hot_standby_feedback
is true. Setting this reloption to false can be helpful to avoid
such problems.

Author: Tsunakawa Takayuki
Reviewed-By: Julien Rouhaud, Masahiko Sawada, Michael Paquier, Kirk Jamison and Fujii Masao
Discussion: https://postgr.es/m/CAHGQGwE5UqFqSq1=kV3QtTUtXphTdyHA-8rAj4A=Y+e4kyp3BQ@mail.gmail.com
2019-04-08 16:43:57 +09:00
Michael Paquier 249d649996 Add support TCP user timeout in libpq and the backend server
Similarly to the set of parameters for keepalive, a connection parameter
for libpq is added as well as a backend GUC, called tcp_user_timeout.

Increasing the TCP user timeout is useful to allow a connection to
survive extended periods without end-to-end connection, and decreasing
it allows application to fail faster.  By default, the parameter is 0,
which makes the connection use the system default, and follows a logic
close to the keepalive parameters in its handling.  When connecting
through a Unix-socket domain, the parameters have no effect.

Author: Ryohei Nagaura
Reviewed-by: Fabien Coelho, Robert Haas, Kyotaro Horiguchi, Kirk
Jamison, Mikalai Keida, Takayuki Tsunakawa, Andrei Yahorau
Discussion: https://postgr.es/m/EDA4195584F5064680D8130B1CA91C45367328@G01JPEXMBYT04
2019-04-06 15:23:37 +09:00
Robert Haas a96c41feec Allow VACUUM to be run with index cleanup disabled.
This commit adds a new reloption, vacuum_index_cleanup, which
controls whether index cleanup is performed for a particular
relation by default.  It also adds a new option to the VACUUM
command, INDEX_CLEANUP, which can be used to override the
reloption.  If neither the reloption nor the VACUUM option is
used, the default is true, as before.

Masahiko Sawada, reviewed and tested by Nathan Bossart, Alvaro
Herrera, Kyotaro Horiguchi, Darafei Praliaskouski, and me.
The wording of the documentation is mostly due to me.

Discussion: http://postgr.es/m/CAD21AoAt5R3DNUZSjOoXDUY=naYPUOuffVsRzuTYMz29yLzQCA@mail.gmail.com
2019-04-04 15:04:43 -04:00
Tomas Vondra ea569d64ac Add SETTINGS option to EXPLAIN, to print modified settings.
Query planning is affected by a number of configuration options, and it
may be crucial to know which of those options were set to non-default
values.  With this patch you can say EXPLAIN (SETTINGS ON) to include
that information in the query plan.  Only options affecting planning,
with values different from the built-in default are printed.

This patch also adds auto_explain.log_settings option, providing the
same capability in auto_explain module.

Author: Tomas Vondra
Reviewed-by: Rafia Sabih, John Naylor
Discussion: https://postgr.es/m/e1791b4c-df9c-be02-edc5-7c8874944be0@2ndquadrant.com
2019-04-04 00:04:31 +02:00
Alvaro Herrera 799e220346 Log all statements from a sample of transactions
This is useful to obtain a view of the different transaction types in an
application, regardless of the durations of the statements each runs.

Author: Adrien Nayrat
Reviewed-by: Masahiko Sawada, Hayato Kuroda, Andres Freund
2019-04-03 18:43:59 -03:00
Alvaro Herrera f56f8f8da6 Support foreign keys that reference partitioned tables
Previously, while primary keys could be made on partitioned tables, it
was not possible to define foreign keys that reference those primary
keys.  Now it is possible to do that.

Author: Álvaro Herrera
Reviewed-by: Amit Langote, Jesper Pedersen
Discussion: https://postgr.es/m/20181102234158.735b3fevta63msbj@alvherre.pgsql
2019-04-03 14:40:21 -03:00
Alexander Korotkov 0a02e2ae02 GIN support for @@ and @? jsonpath operators
This commit makes existing GIN operator classes jsonb_ops and json_path_ops
support "jsonb @@ jsonpath" and "jsonb @? jsonpath" operators.  Basic idea is
to extract statements of following form out of jsonpath.

 key1.key2. ... .keyN = const

The rest of jsonpath is rechecked from heap.

Catversion is bumped.

Discussion: https://postgr.es/m/fcc6fc6a-b497-f39a-923d-aa34d0c588e8%402ndQuadrant.com
Author: Nikita Glukhov, Alexander Korotkov
Reviewed-by: Jonathan Katz, Pavel Stehule
2019-04-01 18:08:52 +03:00
Peter Eisentraut fc22b6623b Generated columns
This is an SQL-standard feature that allows creating columns that are
computed from expressions rather than assigned, similar to a view or
materialized view but on a column basis.

This implements one kind of generated column: stored (computed on
write).  Another kind, virtual (computed on read), is planned for the
future, and some room is left for it.

Reviewed-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Pavel Stehule <pavel.stehule@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/b151f851-4019-bdb1-699e-ebab07d2f40a@2ndquadrant.com
2019-03-30 08:15:57 +01:00
Andres Freund d25f519107 tableam: relation creation, VACUUM FULL/CLUSTER, SET TABLESPACE.
This moves the responsibility for:
- creating the storage necessary for a relation, including creating a
  new relfilenode for a relation with existing storage
- non-transactional truncation of a relation
- VACUUM FULL / CLUSTER's rewrite of a table
below tableam.

This is fairly straight forward, with a bit of complexity smattered in
to move the computation of xid / multixid horizons below the AM, as
they don't make sense for every table AM.

Author: Andres Freund
Discussion: https://postgr.es/m/20180703070645.wchpu5muyto5n647@alap3.anarazel.de
2019-03-28 20:01:43 -07:00
Andres Freund 5db6df0c01 tableam: Add tuple_{insert, delete, update, lock} and use.
This adds new, required, table AM callbacks for insert/delete/update
and lock_tuple. To be able to reasonably use those, the EvalPlanQual
mechanism had to be adapted, moving more logic into the AM.

Previously both delete/update/lock call-sites and the EPQ mechanism had
to have awareness of the specific tuple format to be able to fetch the
latest version of a tuple. Obviously that needs to be abstracted
away. To do so, move the logic that find the latest row version into
the AM. lock_tuple has a new flag argument,
TUPLE_LOCK_FLAG_FIND_LAST_VERSION, that forces it to lock the last
version, rather than the current one.  It'd have been possible to do
so via a separate callback as well, but finding the last version
usually also necessitates locking the newest version, making it
sensible to combine the two. This replaces the previous use of
EvalPlanQualFetch().  Additionally HeapTupleUpdated, which previously
signaled either a concurrent update or delete, is now split into two,
to avoid callers needing AM specific knowledge to differentiate.

The move of finding the latest row version into tuple_lock means that
encountering a row concurrently moved into another partition will now
raise an error about "tuple to be locked" rather than "tuple to be
updated/deleted" - which is accurate, as that always happens when
locking rows. While possible slightly less helpful for users, it seems
like an acceptable trade-off.

As part of this commit HTSU_Result has been renamed to TM_Result, and
its members been expanded to differentiated between updating and
deleting. HeapUpdateFailureData has been renamed to TM_FailureData.

The interface to speculative insertion is changed so nodeModifyTable.c
does not have to set the speculative token itself anymore. Instead
there's a version of tuple_insert, tuple_insert_speculative, that
performs the speculative insertion (without requiring a flag to signal
that fact), and the speculative insertion is either made permanent
with table_complete_speculative(succeeded = true) or aborted with
succeeded = false).

Note that multi_insert is not yet routed through tableam, nor is
COPY. Changing multi_insert requires changes to copy.c that are large
enough to better be done separately.

Similarly, although simpler, CREATE TABLE AS and CREATE MATERIALIZED
VIEW are also only going to be adjusted in a later commit.

Author: Andres Freund and Haribabu Kommi
Discussion:
    https://postgr.es/m/20180703070645.wchpu5muyto5n647@alap3.anarazel.de
    https://postgr.es/m/20190313003903.nwvrxi7rw3ywhdel@alap3.anarazel.de
    https://postgr.es/m/20160812231527.GA690404@alvherre.pgsql
2019-03-23 19:55:57 -07:00
Peter Eisentraut 7b084b3831 Revert "Add gitignore entries for jsonpath_gram.h"
This reverts commit 4e274a043f.

These files aren't actually built anymore since 550b9d26f.
2019-03-23 00:19:34 +01:00
Peter Eisentraut 4e274a043f Add gitignore entries for jsonpath_gram.h 2019-03-22 23:19:30 +01:00
Peter Eisentraut 5e1963fb76 Collations with nondeterministic comparison
This adds a flag "deterministic" to collations.  If that is false,
such a collation disables various optimizations that assume that
strings are equal only if they are byte-wise equal.  That then allows
use cases such as case-insensitive or accent-insensitive comparisons
or handling of strings with different Unicode normal forms.

This functionality is only supported with the ICU provider.  At least
glibc doesn't appear to have any locales that work in a
nondeterministic way, so it's not worth supporting this for the libc
provider.

The term "deterministic comparison" in this context is from Unicode
Technical Standard #10
(https://unicode.org/reports/tr10/#Deterministic_Comparison).

This patch makes changes in three areas:

- CREATE COLLATION DDL changes and system catalog changes to support
  this new flag.

- Many executor nodes and auxiliary code are extended to track
  collations.  Previously, this code would just throw away collation
  information, because the eventually-called user-defined functions
  didn't use it since they only cared about equality, which didn't
  need collation information.

- String data type functions that do equality comparisons and hashing
  are changed to take the (non-)deterministic flag into account.  For
  comparison, this just means skipping various shortcuts and tie
  breakers that use byte-wise comparison.  For hashing, we first need
  to convert the input string to a canonical "sort key" using the ICU
  analogue of strxfrm().

Reviewed-by: Daniel Verite <daniel@manitou-mail.org>
Reviewed-by: Peter Geoghegan <pg@bowt.ie>
Discussion: https://www.postgresql.org/message-id/flat/1ccc668f-4cbc-0bef-af67-450b47cdfee7@2ndquadrant.com
2019-03-22 12:12:43 +01:00
Alexander Korotkov 550b9d26f8 Get rid of jsonpath_gram.h and jsonpath_scanner.h
Jsonpath grammar and scanner are both quite small.  It doesn't worth complexity
to compile them separately.  This commit makes grammar and scanner be compiled
at once.  Therefore, jsonpath_gram.h and jsonpath_gram.h are no longer needed.
This commit also does some reorganization of code in jsonpath_gram.y.

Discussion: https://postgr.es/m/d47b2023-3ecb-5f04-d253-d557547cf74f%402ndQuadrant.com
2019-03-20 11:13:34 +03:00
Alexander Korotkov 5e28b778bf Rename typedef in jsonpath_gram.y from "string" to "JsonPathString"
Reason is the same as in 75c57058b0.
2019-03-19 21:01:10 +03:00
Peter Eisentraut 1ffa59a85c Fix optimization of foreign-key on update actions
In RI_FKey_pk_upd_check_required(), we check among other things
whether the old and new key are equal, so that we don't need to run
cascade actions when nothing has actually changed.  This was using the
equality operator.  But the effect of this is that if a value in the
primary key is changed to one that "looks" different but compares as
equal, the update is not propagated.  (Examples are float -0 and 0 and
case-insensitive text.)  This appears to violate the SQL standard, and
it also behaves inconsistently if in a multicolumn key another key is
also updated that would cause the row to compare as not equal.

To fix, if we are looking at the PK table in ri_KeysEqual(), then do a
bytewise comparison similar to record_image_eq() instead of using the
equality operators.  This only makes a difference for ON UPDATE
CASCADE, but for consistency we treat all changes to the PK the same.  For
the FK table, we continue to use the equality operators.

Discussion: https://www.postgresql.org/message-id/flat/3326fc2e-bc02-d4c5-e3e5-e54da466e89a@2ndquadrant.com
2019-03-18 17:19:21 +01:00
Alexander Korotkov 16d489b0fe Numeric error suppression in jsonpath
Add support of numeric error suppression to jsonpath as it's required by
standard.  This commit doesn't use PG_TRY()/PG_CATCH() in order to implement
that.  Instead, it provides internal versions of numeric functions used, which
support error suppression.

Discussion: https://postgr.es/m/fcc6fc6a-b497-f39a-923d-aa34d0c588e8%402ndQuadrant.com
Author: Alexander Korotkov, Nikita Glukhov
Reviewed-by: Tomas Vondra
2019-03-16 12:21:19 +03:00
Alexander Korotkov 72b6460336 Partial implementation of SQL/JSON path language
SQL 2016 standards among other things contains set of SQL/JSON features for
JSON processing inside of relational database.  The core of SQL/JSON is JSON
path language, allowing access parts of JSON documents and make computations
over them.  This commit implements partial support JSON path language as
separate datatype called "jsonpath".  The implementation is partial because
it's lacking datetime support and suppression of numeric errors.  Missing
features will be added later by separate commits.

Support of SQL/JSON features requires implementation of separate nodes, and it
will be considered in subsequent patches.  This commit includes following
set of plain functions, allowing to execute jsonpath over jsonb values:

 * jsonb_path_exists(jsonb, jsonpath[, jsonb, bool]),
 * jsonb_path_match(jsonb, jsonpath[, jsonb, bool]),
 * jsonb_path_query(jsonb, jsonpath[, jsonb, bool]),
 * jsonb_path_query_array(jsonb, jsonpath[, jsonb, bool]).
 * jsonb_path_query_first(jsonb, jsonpath[, jsonb, bool]).

This commit also implements "jsonb @? jsonpath" and "jsonb @@ jsonpath", which
are wrappers over jsonpath_exists(jsonb, jsonpath) and jsonpath_predicate(jsonb,
jsonpath) correspondingly.  These operators will have an index support
(implemented in subsequent patches).

Catversion bumped, to add new functions and operators.

Code was written by Nikita Glukhov and Teodor Sigaev, revised by me.
Documentation was written by Oleg Bartunov and Liudmila Mantrova.  The work
was inspired by Oleg Bartunov.

Discussion: https://postgr.es/m/fcc6fc6a-b497-f39a-923d-aa34d0c588e8%402ndQuadrant.com
Author: Nikita Glukhov, Teodor Sigaev, Alexander Korotkov, Oleg Bartunov, Liudmila Mantrova
Reviewed-by: Tomas Vondra, Andrew Dunstan, Pavel Stehule, Alexander Korotkov
2019-03-16 12:16:48 +03:00