Commit Graph

147 Commits

Author SHA1 Message Date
Tomas Vondra a4d75c86bf Extended statistics on expressions
Allow defining extended statistics on expressions, not just just on
simple column references.  With this commit, expressions are supported
by all existing extended statistics kinds, improving the same types of
estimates. A simple example may look like this:

  CREATE TABLE t (a int);
  CREATE STATISTICS s ON mod(a,10), mod(a,20) FROM t;
  ANALYZE t;

The collected statistics are useful e.g. to estimate queries with those
expressions in WHERE or GROUP BY clauses:

  SELECT * FROM t WHERE mod(a,10) = 0 AND mod(a,20) = 0;

  SELECT 1 FROM t GROUP BY mod(a,10), mod(a,20);

This introduces new internal statistics kind 'e' (expressions) which is
built automatically when the statistics object definition includes any
expressions. This represents single-expression statistics, as if there
was an expression index (but without the index maintenance overhead).
The statistics is stored in pg_statistics_ext_data as an array of
composite types, which is possible thanks to 79f6a942bd.

CREATE STATISTICS allows building statistics on a single expression, in
which case in which case it's not possible to specify statistics kinds.

A new system view pg_stats_ext_exprs can be used to display expression
statistics, similarly to pg_stats and pg_stats_ext views.

ALTER TABLE ... ALTER COLUMN ... TYPE now treats indexes the same way it
treats indexes, i.e. it drops and recreates the statistics. This means
all statistics are reset, and we no longer try to preserve at least the
functional dependencies. This should not be a major issue in practice,
as the functional dependencies actually rely on per-column statistics,
which were always reset anyway.

Author: Tomas Vondra
Reviewed-by: Justin Pryzby, Dean Rasheed, Zhihong Yu
Discussion: https://postgr.es/m/ad7891d2-e90c-b446-9fe2-7419143847d7%40enterprisedb.com
2021-03-27 00:01:11 +01:00
Tomas Vondra ab596105b5 BRIN minmax-multi indexes
Adds BRIN opclasses similar to the existing minmax, except that instead
of summarizing the page range into a single [min,max] range, the summary
consists of multiple ranges and/or points, allowing gaps. This allows
more efficient handling of data with poor correlation to physical
location within the table and/or outlier values, for which the regular
minmax opclassed tend to work poorly.

It's possible to specify the number of values kept for each page range,
either as a single point or an interval boundary.

  CREATE TABLE t (a int);
  CREATE INDEX ON t
   USING brin (a int4_minmax_multi_ops(values_per_range=16));

When building the summary, the values are combined into intervals with
the goal to minimize the "covering" (sum of interval lengths), using a
support procedure computing distance between two values.

Bump catversion, due to various catalog changes.

Author: Tomas Vondra <tomas.vondra@postgresql.org>
Reviewed-by: Alvaro Herrera <alvherre@alvh.no-ip.org>
Reviewed-by: Alexander Korotkov <aekorotkov@gmail.com>
Reviewed-by: Sokolov Yura <y.sokolov@postgrespro.ru>
Reviewed-by: John Naylor <john.naylor@enterprisedb.com>
Discussion: https://postgr.es/m/c1138ead-7668-f0e1-0638-c3be3237e812@2ndquadrant.com
Discussion: https://postgr.es/m/5d78b774-7e9c-c94e-12cf-fef51cc89b1a%402ndquadrant.com
2021-03-26 13:54:30 +01:00
Tomas Vondra 77b88cd1bb BRIN bloom indexes
Adds a BRIN opclass using a Bloom filter to summarize the range. Indexes
using the new opclasses allow only equality queries (similar to hash
indexes), but that works fine for data like UUID, MAC addresses etc. for
which range queries are not very common. This also means the indexes
work for data that is not well correlated to physical location within
the table, or perhaps even entirely random (which is a common issue with
existing BRIN minmax opclasses).

It's possible to specify opclass parameters with the usual Bloom filter
parameters, i.e. the desired false-positive rate and the expected number
of distinct values per page range.

  CREATE TABLE t (a int);
  CREATE INDEX ON t
   USING brin (a int4_bloom_ops(false_positive_rate = 0.05,
                                n_distinct_per_range = 100));

The opclasses do not operate on the indexed values directly, but compute
a 32-bit hash first, and the Bloom filter is built on the hash value.
Collisions should not be a huge issue though, as the number of distinct
values in a page ranges is usually fairly small.

Bump catversion, due to various catalog changes.

Author: Tomas Vondra <tomas.vondra@postgresql.org>
Reviewed-by: Alvaro Herrera <alvherre@alvh.no-ip.org>
Reviewed-by: Alexander Korotkov <aekorotkov@gmail.com>
Reviewed-by: Sokolov Yura <y.sokolov@postgrespro.ru>
Reviewed-by: Nico Williams <nico@cryptonector.com>
Reviewed-by: John Naylor <john.naylor@enterprisedb.com>
Discussion: https://postgr.es/m/c1138ead-7668-f0e1-0638-c3be3237e812@2ndquadrant.com
Discussion: https://postgr.es/m/5d78b774-7e9c-c94e-12cf-fef51cc89b1a%402ndquadrant.com
2021-03-26 13:35:32 +01:00
Tomas Vondra a681e3c107 Support the old signature of BRIN consistent function
Commit a1c649d889 changed the signature of the BRIN consistent function
by adding a new required parameter.  Treating the parameter as optional,
which would make the change backwards incompatibile, was rejected with
the justification that there are few out-of-core extensions, so it's not
worth adding making the code more complex, and it's better to deal with
that in the extension.

But after further thought, that would be rather problematic, because
pg_upgrade simply dumps catalog contents and the same version of an
extension needs to work on both PostgreSQL versions. Supporting both
variants of the consistent function (with 3 or 4 arguments) makes that
possible.

The signature is not the only thing that changed, as commit 72ccf55cb9
moved handling of IS [NOT] NULL keys from the support procedures. But
this change is backward compatible - handling the keys in exension is
unnecessary, but harmless. The consistent function will do a bit of
unnecessary work, but it should be very cheap.

This also undoes most of the changes to the existing opclasses (minmax
and inclusion), making them use the old signature again. This should
make backpatching simpler.

Catversion bump, because of changes in pg_amproc.

Author: Tomas Vondra <tomas.vondra@postgresql.org>
Author: Nikita Glukhov <n.gluhov@postgrespro.ru>
Reviewed-by: Mark Dilger <hornschnorter@gmail.com>
Reviewed-by: Alexander Korotkov <aekorotkov@gmail.com>
Reviewed-by: Masahiko Sawada <masahiko.sawada@enterprisedb.com>
Reviewed-by: John Naylor <john.naylor@enterprisedb.com>
Discussion: https://postgr.es/m/c1138ead-7668-f0e1-0638-c3be3237e812@2ndquadrant.com
2021-03-26 13:17:58 +01:00
Peter Eisentraut 49ab61f0bd Add date_bin function
Similar to date_trunc, but allows binning by an arbitrary interval
rather than just full units.

Author: John Naylor <john.naylor@enterprisedb.com>
Reviewed-by: David Fetter <david@fetter.org>
Reviewed-by: Isaac Morland <isaac.morland@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Artur Zakirov <zaartur@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/CACPNZCt4buQFRgy6DyjuZS-2aPDpccRkrJBmgUfwYc1KiaXYxg@mail.gmail.com
2021-03-24 16:18:24 +01:00
Amit Kapila 26acb54a13 Revert "Enable parallel SELECT for "INSERT INTO ... SELECT ..."."
To allow inserts in parallel-mode this feature has to ensure that all the
constraints, triggers, etc. are parallel-safe for the partition hierarchy
which is costly and we need to find a better way to do that. Additionally,
we could have used existing cached information in some cases like indexes,
domains, etc. to determine the parallel-safety.

List of commits reverted, in reverse chronological order:

ed62d3737c Doc: Update description for parallel insert reloption.
c8f78b6161 Add a new GUC and a reloption to enable inserts in parallel-mode.
c5be48f092 Improve FK trigger parallel-safety check added by 05c8482f7f.
e2cda3c20a Fix use of relcache TriggerDesc field introduced by commit 05c8482f7f.
e4e87a32cc Fix valgrind issue in commit 05c8482f7f.
05c8482f7f Enable parallel SELECT for "INSERT INTO ... SELECT ...".

Discussion: https://postgr.es/m/E1lMiB9-0001c3-SY@gemulon.postgresql.org
2021-03-24 11:29:15 +05:30
Peter Eisentraut a6715af1e7 Add bit_count SQL function
This function for bit and bytea counts the set bits in the bit or byte
string.  Internally, we use the existing popcount functionality.

For the name, after some discussion, we settled on bit_count, which
also exists with this meaning in MySQL, Java, and Python.

Author: David Fetter <david@fetter.org>
Discussion: https://www.postgresql.org/message-id/flat/20201230105535.GJ13234@fetter.org
2021-03-23 10:13:58 +01:00
Tomas Vondra a1c649d889 Pass all scan keys to BRIN consistent function at once
This commit changes how we pass scan keys to BRIN consistent function.
Instead of passing them one by one, we now pass all scan keys for a
given attribute at once. That makes the consistent function a bit more
complex, as it has to loop through the keys, but it does allow more
elaborate opclasses that can use multiple keys to eliminate ranges much
more effectively.

The existing BRIN opclasses (minmax, inclusion) don't really benefit
from this change. The primary purpose is to allow future opclases to
benefit from seeing all keys at once.

This does change the BRIN API, because the signature of the consistent
function changes (a new parameter with number of scan keys). So this
breaks existing opclasses, and will require supporting two variants of
the code for different PostgreSQL versions. We've considered supporting
two variants of the consistent, but we've decided not to do that.
Firstly, there's another patch that moves handling of NULL values from
the opclass, which means the opclasses need to be updated anyway.
Secondly, we're not aware of any out-of-core BRIN opclasses, so it does
not seem worth the extra complexity.

Bump catversion, because of pg_proc changes.

Author: Tomas Vondra <tomas.vondra@postgresql.org>
Reviewed-by: Alvaro Herrera <alvherre@alvh.no-ip.org>
Reviewed-by: Mark Dilger <hornschnorter@gmail.com>
Reviewed-by: Alexander Korotkov <aekorotkov@gmail.com>
Reviewed-by: John Naylor <john.naylor@enterprisedb.com>
Reviewed-by: Nikita Glukhov <n.gluhov@postgrespro.ru>
Discussion: https://postgr.es/m/c1138ead-7668-f0e1-0638-c3be3237e812@2ndquadrant.com
2021-03-23 00:45:03 +01:00
Robert Haas bbe0a81db6 Allow configurable LZ4 TOAST compression.
There is now a per-column COMPRESSION option which can be set to pglz
(the default, and the only option in up until now) or lz4. Or, if you
like, you can set the new default_toast_compression GUC to lz4, and
then that will be the default for new table columns for which no value
is specified. We don't have lz4 support in the PostgreSQL code, so
to use lz4 compression, PostgreSQL must be built --with-lz4.

In general, TOAST compression means compression of individual column
values, not the whole tuple, and those values can either be compressed
inline within the tuple or compressed and then stored externally in
the TOAST table, so those properties also apply to this feature.

Prior to this commit, a TOAST pointer has two unused bits as part of
the va_extsize field, and a compessed datum has two unused bits as
part of the va_rawsize field. These bits are unused because the length
of a varlena is limited to 1GB; we now use them to indicate the
compression type that was used. This means we only have bit space for
2 more built-in compresison types, but we could work around that
problem, if necessary, by introducing a new vartag_external value for
any further types we end up wanting to add. Hopefully, it won't be
too important to offer a wide selection of algorithms here, since
each one we add not only takes more coding but also adds a build
dependency for every packager. Nevertheless, it seems worth doing
at least this much, because LZ4 gets better compression than PGLZ
with less CPU usage.

It's possible for LZ4-compressed datums to leak into composite type
values stored on disk, just as it is for PGLZ. It's also possible for
LZ4-compressed attributes to be copied into a different table via SQL
commands such as CREATE TABLE AS or INSERT .. SELECT.  It would be
expensive to force such values to be decompressed, so PostgreSQL has
never done so. For the same reasons, we also don't force recompression
of already-compressed values even if the target table prefers a
different compression method than was used for the source data.  These
architectural decisions are perhaps arguable but revisiting them is
well beyond the scope of what seemed possible to do as part of this
project.  However, it's relatively cheap to recompress as part of
VACUUM FULL or CLUSTER, so this commit adjusts those commands to do
so, if the configured compression method of the table happens not to
match what was used for some column value stored therein.

Dilip Kumar. The original patches on which this work was based were
written by Ildus Kurbangaliev, and those were patches were based on
even earlier work by Nikita Glukhov, but the design has since changed
very substantially, since allow a potentially large number of
compression methods that could be added and dropped on a running
system proved too problematic given some of the architectural issues
mentioned above; the choice of which specific compression method to
add first is now different; and a lot of the code has been heavily
refactored.  More recently, Justin Przyby helped quite a bit with
testing and reviewing and this version also includes some code
contributions from him. Other design input and review from Tomas
Vondra, Álvaro Herrera, Andres Freund, Oleg Bartunov, Alexander
Korotkov, and me.

Discussion: http://postgr.es/m/20170907194236.4cefce96%40wp.localdomain
Discussion: http://postgr.es/m/CAFiTN-uUpX3ck%3DK0mLEk-G_kUQY%3DSNOTeqdaNRR9FMdQrHKebw%40mail.gmail.com
2021-03-19 15:10:38 -04:00
Amit Kapila c5be48f092 Improve FK trigger parallel-safety check added by 05c8482f7f.
Commit 05c8482f7f added special logic related to parallel-safety of FK
triggers. This is a bit of a hack and should have instead been done by
simply setting appropriate proparallel values on those trigger functions
themselves.

Suggested-by: Tom Lane
Author: Greg Nancarrow
Reviewed-by: Amit Kapila
Discussion: https://postgr.es/m/2309260.1615485644@sss.pgh.pa.us
2021-03-13 09:20:52 +05:30
Robert Haas 32fd2b57d7 Be clear about whether a recovery pause has taken effect.
Previously, the code and documentation seem to have essentially
assumed than a call to pg_wal_replay_pause() would take place
immediately, but that's not the case, because we only check for a
pause in certain places. This means that a tool that uses this
function and then wants to do something else afterward that is
dependent on the pause having taken effect doesn't know how long it
needs to wait to be sure that no more WAL is going to be replayed.

To avoid that, add a new function pg_get_wal_replay_pause_state()
which returns either 'not paused', 'paused requested', or 'paused'.
After calling pg_wal_replay_pause() the status will immediate change
from 'not paused' to 'pause requested'; when the startup process
has noticed this, the status will change to 'pause'.  For backward
compatibility, pg_is_wal_replay_paused() still exists and returns
the same thing as before: true if a pause has been requested,
whether or not it has taken effect yet; and false if not.
The documentation is updated to clarify.

To improve the changes that a pause request is quickly confirmed
effective, adjust things so that WaitForWALToBecomeAvailable will
swiftly reach a call to recoveryPausesHere() when a pause request
is made.

Dilip Kumar, reviewed by Simon Riggs, Kyotaro Horiguchi, Yugo Nagata,
Masahiko Sawada, and Bharath Rupireddy.

Discussion: http://postgr.es/m/CAFiTN-vcLLWEm8Zr%3DYK83rgYrT9pbC8VJCfa1kY9vL3AUPfu6g%40mail.gmail.com
2021-03-11 15:07:03 -05:00
Fujii Masao ff99918c62 Track total amounts of times spent writing and syncing WAL data to disk.
This commit adds new GUC track_wal_io_timing. When this is enabled,
the total amounts of time XLogWrite writes and issue_xlog_fsync syncs
WAL data to disk are counted in pg_stat_wal. This information would be
useful to check how much WAL write and sync affect the performance.

Enabling track_wal_io_timing will make the server query the operating
system for the current time every time WAL is written or synced,
which may cause significant overhead on some platforms. To avoid such
additional overhead in the server with track_io_timing enabled,
this commit introduces track_wal_io_timing as a separate parameter from
track_io_timing.

Note that WAL write and sync activity by walreceiver has not been tracked yet.

This commit makes the server also track the numbers of times XLogWrite
writes and issue_xlog_fsync syncs WAL data to disk, in pg_stat_wal,
regardless of the setting of track_wal_io_timing. This counters can be
used to calculate the WAL write and sync time per request, for example.

Bump PGSTAT_FILE_FORMAT_ID.

Bump catalog version.

Author: Masahiro Ikeda
Reviewed-By: Japin Li, Hayato Kuroda, Masahiko Sawada, David Johnston, Fujii Masao
Discussion: https://postgr.es/m/0509ad67b585a5b86a83d445dfa75392@oss.nttdata.com
2021-03-09 16:52:06 +09:00
Michael Paquier f9264d1524 Remove support for SSL compression
PostgreSQL disabled compression as of e3bdb2d and the documentation
recommends against using it since.  Additionally, SSL compression has
been disabled in OpenSSL since version 1.1.0, and was disabled in many
distributions long before that.  The most recent TLS version, TLSv1.3,
disallows compression at the protocol level.

This commit removes the feature itself, removing support for the libpq
parameter sslcompression (parameter still listed for compatibility
reasons with existing connection strings, just ignored), and removes
the equivalent field in pg_stat_ssl and de facto PgBackendSSLStatus.

Note that, on top of removing the ability to activate compression by
configuration, compression is actively disabled in both frontend and
backend to avoid overrides from local configurations.

A TAP test is added for deprecated SSL parameters to check after
backwards compatibility.

Bump catalog version.

Author: Daniel Gustafsson
Reviewed-by: Peter Eisentraut, Magnus Hagander, Michael Paquier
Discussion:  https://postgr.es/m/7E384D48-11C5-441B-9EC3-F7DB1F8518F6@yesql.se
2021-03-09 11:16:47 +09:00
Tom Lane fed10d4eec Properly mark pg_stat_get_subscription() as returning a set.
The initial catalog data for this function failed to set proretset
or provide a prorows estimate.  It accidentally worked anyway when
invoked in the FROM clause, because the executor isn't too picky
about this; but the planner didn't expect the function to return
multiple rows, which could lead to bad plans.  Also the function
would fail if invoked in the SELECT list.

We can't easily back-patch this fix, but fortunately the bug's
consequences aren't awful in most cases.  Getting this right is
mainly an exercise in future-proofing.

Discussion: https://postgr.es/m/1636062.1615141782@sss.pgh.pa.us
2021-03-08 18:47:23 -05:00
Peter Eisentraut f9a0392e1c Add bit_xor aggregate function
This can be used as a checksum for unordered sets.  bit_and and bit_or
already exist.

Author: Alexey Bashtanov <bashtanov@imap.cc>
Reviewed-by: Ibrar Ahmed <ibrar.ahmad@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/9d4582ae-ecfc-3a13-2238-6ab5a37c1f41@imap.cc
2021-03-06 19:28:05 +01:00
Tom Lane 0a687c8f10 Add trim_array() function.
This has been in the SQL spec since 2008.  It's a pretty thin
wrapper around the array slice functionality, but the spec
says we should have it, so here it is.

Vik Fearing, reviewed by Dian Fay

Discussion: https://postgr.es/m/fc92ce17-9655-8ff1-c62a-4dc4c8ccd815@postgresfriends.org
2021-03-03 16:39:57 -05:00
Amit Kapila 19890a064e Add option to enable two_phase commits via pg_create_logical_replication_slot.
Commit 0aa8a01d04 extends the output plugin API to allow decoding of
prepared xacts and allowed the user to enable/disable the two-phase option
via pg_logical_slot_get_changes(). This can lead to a problem such that
the first time when it gets changes via pg_logical_slot_get_changes()
without two_phase option enabled it will not get the prepared even though
prepare is after consistent snapshot. Now next time during getting changes,
if the two_phase option is enabled it can skip prepare because by that
time start decoding point has been moved. So the user will only get commit
prepared.

Allow to enable/disable this option at the create slot time and default
will be false. It will break the existing slots which is fine in a major
release.

Author: Ajin Cherian
Reviewed-by: Amit Kapila and Vignesh C
Discussion: https://postgr.es/m/d0f60d60-133d-bf8d-bd70-47784d8fabf3@enterprisedb.com
2021-03-03 07:34:11 +05:30
Thomas Munro 8556267b2b Revert "pg_collation_actual_version() -> pg_collation_current_version()."
This reverts commit 9cf184cc05.  Name
change less well received than anticipated.

Discussion: https://postgr.es/m/afcfb97e-88a1-a540-db95-6c573b93bc2b%40eisentraut.org
2021-02-26 15:29:27 +13:00
Thomas Munro 9cf184cc05 pg_collation_actual_version() -> pg_collation_current_version().
The new name seems a bit more natural.

Discussion: https://postgr.es/m/20210117215940.GE8560%40telsasoft.com
2021-02-22 23:32:16 +13:00
Fujii Masao 46d6e5f567 Display the time when the process started waiting for the lock, in pg_locks, take 2
This commit adds new column "waitstart" into pg_locks view. This column
reports the time when the server process started waiting for the lock
if the lock is not held. This information is useful, for example, when
examining the amount of time to wait on a lock by subtracting
"waitstart" in pg_locks from the current time, and identify the lock
that the processes are waiting for very long.

This feature uses the current time obtained for the deadlock timeout
timer as "waitstart" (i.e., the time when this process started waiting
for the lock). Since getting the current time newly can cause overhead,
we reuse the already-obtained time to avoid that overhead.

Note that "waitstart" is updated without holding the lock table's
partition lock, to avoid the overhead by additional lock acquisition.
This can cause "waitstart" in pg_locks to become NULL for a very short
period of time after the wait started even though "granted" is false.
This is OK in practice because we can assume that users are likely to
look at "waitstart" when waiting for the lock for a long time.

The first attempt of this patch (commit 3b733fcd04) caused the buildfarm
member "rorqual" (built with --disable-atomics --disable-spinlocks) to report
the failure of the regression test. It was reverted by commit 890d2182a2.
The cause of this failure was that the atomic variable for "waitstart"
in the dummy process entry created at the end of prepare transaction was
not initialized. This second attempt fixes that issue.

Bump catalog version.

Author: Atsushi Torikoshi
Reviewed-by: Ian Lawrence Barwick, Robert Haas, Justin Pryzby, Fujii Masao
Discussion: https://postgr.es/m/a96013dc51cdc56b2a2b84fa8a16a993@oss.nttdata.com
2021-02-15 15:13:37 +09:00
Fujii Masao 890d2182a2 Revert "Display the time when the process started waiting for the lock, in pg_locks."
This reverts commit 3b733fcd04.

Per buildfarm members prion and rorqual.
2021-02-09 18:30:40 +09:00
Fujii Masao 3b733fcd04 Display the time when the process started waiting for the lock, in pg_locks.
This commit adds new column "waitstart" into pg_locks view. This column
reports the time when the server process started waiting for the lock
if the lock is not held. This information is useful, for example, when
examining the amount of time to wait on a lock by subtracting
"waitstart" in pg_locks from the current time, and identify the lock
that the processes are waiting for very long.

This feature uses the current time obtained for the deadlock timeout
timer as "waitstart" (i.e., the time when this process started waiting
for the lock). Since getting the current time newly can cause overhead,
we reuse the already-obtained time to avoid that overhead.

Note that "waitstart" is updated without holding the lock table's
partition lock, to avoid the overhead by additional lock acquisition.
This can cause "waitstart" in pg_locks to become NULL for a very short
period of time after the wait started even though "granted" is false.
This is OK in practice because we can assume that users are likely to
look at "waitstart" when waiting for the lock for a long time.

Bump catalog version.

Author: Atsushi Torikoshi
Reviewed-by: Ian Lawrence Barwick, Robert Haas, Justin Pryzby, Fujii Masao
Discussion: https://postgr.es/m/a96013dc51cdc56b2a2b84fa8a16a993@oss.nttdata.com
2021-02-09 18:10:19 +09:00
Tom Lane 62f34097c8 Build in some knowledge about foreign-key relationships in the catalogs.
This follows in the spirit of commit dfb75e478, which created primary
key and uniqueness constraints to improve the visibility of constraints
imposed on the system catalogs.  While our catalogs contain many
foreign-key-like relationships, they don't quite follow SQL semantics,
in that the convention for an omitted reference is to write zero not
NULL.  Plus, we have some cases in which there are arrays each of whose
elements is supposed to be an FK reference; SQL has no way to model that.
So we can't create actual foreign key constraints to describe the
situation.  Nonetheless, we can collect and use knowledge about these
relationships.

This patch therefore adds annotations to the catalog header files to
declare foreign-key relationships.  (The BKI_LOOKUP annotations cover
simple cases, but we weren't previously distinguishing which such
columns are allowed to contain zeroes; we also need new markings for
multi-column FK references.)  Then, Catalog.pm and genbki.pl are
taught to collect this information into a table in a new generated
header "system_fk_info.h".  The only user of that at the moment is
a new SQL function pg_get_catalog_foreign_keys(), which exposes the
table to SQL.  The oidjoins regression test is rewritten to use
pg_get_catalog_foreign_keys() to find out which columns to check.
Aside from removing the need for manual maintenance of that test
script, this allows it to cover numerous relationships that were not
checked by the old implementation based on findoidjoins.  (As of this
commit, 217 relationships are checked by the test, versus 181 before.)

Discussion: https://postgr.es/m/3240355.1612129197@sss.pgh.pa.us
2021-02-02 17:11:55 -05:00
Alexander Korotkov 676887a3b0 Implementation of subscripting for jsonb
Subscripting for jsonb does not support slices, does not have a limit for the
number of subscripts, and an assignment expects a replace value to have jsonb
type.  There is also one functional difference between assignment via
subscripting and assignment via jsonb_set().  When an original jsonb container
is NULL, the subscripting replaces it with an empty jsonb and proceeds with
an assignment.

For the sake of code reuse, we rearrange some parts of jsonb functionality
to allow the usage of the same functions for jsonb_set and assign subscripting
operation.

The original idea belongs to Oleg Bartunov.

Catversion is bumped.

Discussion: https://postgr.es/m/CA%2Bq6zcV8qvGcDXurwwgUbwACV86Th7G80pnubg42e-p9gsSf%3Dg%40mail.gmail.com
Discussion: https://postgr.es/m/CA%2Bq6zcX3mdxGCgdThzuySwH-ApyHHM-G4oB1R0fn0j2hZqqkLQ%40mail.gmail.com
Discussion: https://postgr.es/m/CA%2Bq6zcVDuGBv%3DM0FqBYX8DPebS3F_0KQ6OVFobGJPM507_SZ_w%40mail.gmail.com
Discussion: https://postgr.es/m/CA%2Bq6zcVovR%2BXY4mfk-7oNk-rF91gH0PebnNfuUjuuDsyHjOcVA%40mail.gmail.com
Author: Dmitry Dolgov
Reviewed-by: Tom Lane, Arthur Zakirov, Pavel Stehule, Dian M Fay
Reviewed-by: Andrew Dunstan, Chapman Flack, Merlin Moncure, Peter Geoghegan
Reviewed-by: Alvaro Herrera, Jim Nasby, Josh Berkus, Victor Wagner
Reviewed-by: Aleksander Alekseev, Robert Haas, Oleg Bartunov
2021-01-31 23:50:40 +03:00
Tom Lane a6cf3df4eb Add bytea equivalents of ltrim() and rtrim().
We had bytea btrim() already, but for some reason not the other two.

Joel Jacobson

Discussion: https://postgr.es/m/d10cd5cd-a901-42f1-b832-763ac6f7ff3a@www.fastmail.com
2021-01-18 15:11:32 -05:00
Magnus Hagander 960869da08 Add pg_stat_database counters for sessions and session time
This add counters for number of sessions, the different kind of session
termination types, and timers for how much time is spent in active vs
idle in a database to pg_stat_database.

Internally this also renames the parameter "force" to disconnect. This
was the only use-case for the parameter before, so repurposing it to
this mroe narrow usecase makes things cleaner than inventing something
new.

Author: Laurenz Albe
Reviewed-By: Magnus Hagander, Soumyadeep Chakraborty, Masahiro Ikeda
Discussion: https://postgr.es/m/b07e1f9953701b90c66ed368656f2aef40cac4fb.camel@cybertec.at
2021-01-17 13:52:31 +01:00
Tom Lane 5a6f9bce8d Mark inet_server_addr() and inet_server_port() as parallel-restricted.
These need to be PR because they access the MyProcPort data structure,
which doesn't get copied to parallel workers.  The very similar
functions inet_client_addr() and inet_client_port() are already
marked PR, but somebody missed these.

Although this is a pre-existing bug, we can't readily fix it in the back
branches since we can't force initdb.  Given the small usage of these
two functions, and the even smaller likelihood that they'd get pushed to
a parallel worker anyway, it doesn't seem worth the trouble to suggest
that DBAs should fix it manually.

Masahiko Sawada

Discussion: https://postgr.es/m/CAD21AoAT4aHP0Uxq91qpD7NL009tnUYQe-b14R3MnSVOjtE71g@mail.gmail.com
2021-01-13 16:23:15 -05:00
Tom Lane 8b411b8ff4 Run reformat-dat-files to declutter the catalog data files.
Things had gotten pretty messy here, apparently mostly but not
entirely the fault of the multirange patch.  No functional changes.
2021-01-13 16:14:38 -05:00
Bruce Momjian ca3b37487b Update copyright for 2021
Backpatch-through: 9.5
2021-01-02 13:06:25 -05:00
Alexander Korotkov db6335b5b1 Add support of multirange matching to the existing range GiST indexes
6df7a9698b has introduced a set of operators between ranges and multiranges.
Existing GiST indexes for ranges could easily support majority of them.
This commit adds support for new operators to the existing range GiST indexes.
New operators resides the same strategy numbers as existing ones.  Appropriate
check function is determined using the subtype.

Catversion is bumped.
2020-12-29 23:36:43 +03:00
Alexander Korotkov 4d7684cc75 Implement operators for checking if the range contains a multirange
We have operators for checking if the multirange contains a range but don't
have the opposite.  This commit improves completeness of the operator set by
adding two new operators: @> (anyrange,anymultirange) and
<@(anymultirange,anyrange).

Catversion is bumped.
2020-12-29 23:35:33 +03:00
Alexander Korotkov 6df7a9698b Multirange datatypes
Multiranges are basically sorted arrays of non-overlapping ranges with
set-theoretic operations defined over them.

Since v14, each range type automatically gets a corresponding multirange
datatype.  There are both manual and automatic mechanisms for naming multirange
types.  Once can specify multirange type name using multirange_type_name
attribute in CREATE TYPE.  Otherwise, a multirange type name is generated
automatically.  If the range type name contains "range" then we change that to
"multirange".  Otherwise, we add "_multirange" to the end.

Implementation of multiranges comes with a space-efficient internal
representation format, which evades extra paddings and duplicated storage of
oids.  Altogether this format allows fetching a particular range by its index
in O(n).

Statistic gathering and selectivity estimation are implemented for multiranges.
For this purpose, stored multirange is approximated as union range without gaps.
This field will likely need improvements in the future.

Catversion is bumped.

Discussion: https://postgr.es/m/CALNJ-vSUpQ_Y%3DjXvTxt1VYFztaBSsWVXeF1y6gTYQ4bOiWDLgQ%40mail.gmail.com
Discussion: https://postgr.es/m/a0b8026459d1e6167933be2104a6174e7d40d0ab.camel%40j-davis.com#fe7218c83b08068bfffb0c5293eceda0
Author: Paul Jungwirth, revised by me
Reviewed-by: David Fetter, Corey Huinker, Jeff Davis, Pavel Stehule
Reviewed-by: Alvaro Herrera, Tom Lane, Isaac Morland, David G. Johnston
Reviewed-by: Zhihong Yu, Alexander Korotkov
2020-12-20 07:20:33 +03:00
Tom Lane c7aba7c14e Support subscripting of arbitrary types, not only arrays.
This patch generalizes the subscripting infrastructure so that any
data type can be subscripted, if it provides a handler function to
define what that means.  Traditional variable-length (varlena) arrays
all use array_subscript_handler(), while the existing fixed-length
types that support subscripting use raw_array_subscript_handler().
It's expected that other types that want to use subscripting notation
will define their own handlers.  (This patch provides no such new
features, though; it only lays the foundation for them.)

To do this, move the parser's semantic processing of subscripts
(including coercion to whatever data type is required) into a
method callback supplied by the handler.  On the execution side,
replace the ExecEvalSubscriptingRef* layer of functions with direct
calls to callback-supplied execution routines.  (Thus, essentially
no new run-time overhead should be caused by this patch.  Indeed,
there is room to remove some overhead by supplying specialized
execution routines.  This patch does a little bit in that line,
but more could be done.)

Additional work is required here and there to remove formerly
hard-wired assumptions about the result type, collation, etc
of a SubscriptingRef expression node; and to remove assumptions
that the subscript values must be integers.

One useful side-effect of this is that we now have a less squishy
mechanism for identifying whether a data type is a "true" array:
instead of wiring in weird rules about typlen, we can look to see
if pg_type.typsubscript == F_ARRAY_SUBSCRIPT_HANDLER.  For this
to be bulletproof, we have to forbid user-defined types from using
that handler directly; but there seems no good reason for them to
do so.

This patch also removes assumptions that the number of subscripts
is limited to MAXDIM (6), or indeed has any hard-wired limit.
That limit still applies to types handled by array_subscript_handler
or raw_array_subscript_handler, but to discourage other dependencies
on this constant, I've moved it from c.h to utils/array.h.

Dmitry Dolgov, reviewed at various times by Tom Lane, Arthur Zakirov,
Peter Eisentraut, Pavel Stehule

Discussion: https://postgr.es/m/CA+q6zcVDuGBv=M0FqBYX8DPebS3F_0KQ6OVFobGJPM507_SZ_w@mail.gmail.com
Discussion: https://postgr.es/m/CA+q6zcVovR+XY4mfk-7oNk-rF91gH0PebnNfuUjuuDsyHjOcVA@mail.gmail.com
2020-12-09 12:40:37 -05:00
Fujii Masao 01469241b2 Track total number of WAL records, FPIs and bytes generated in the cluster.
Commit 6b466bf5f2 allowed pg_stat_statements to track the number of
WAL records, full page images and bytes that each statement generated.
Similarly this commit allows us to track the cluster-wide WAL statistics
counters.

New columns wal_records, wal_fpi and wal_bytes are added into the
pg_stat_wal view, and reports the total number of WAL records,
full page images and bytes generated in the , respectively.

Author: Masahiro Ikeda
Reviewed-by: Amit Kapila, Movead Li, Kyotaro Horiguchi, Fujii Masao
Discussion: https://postgr.es/m/35ef960128b90bfae3b3fdf60a3a860f@oss.nttdata.com
2020-12-02 13:00:15 +09:00
Michael Paquier 7b94e99960 Remove catalog function currtid()
currtid() and currtid2() are an undocumented set of functions whose sole
known user is the Postgres ODBC driver, able to retrieve the latest TID
version for a tuple given by the caller of those functions.

As used by Postgres ODBC, currtid() is a shortcut able to retrieve the
last TID loaded into a backend by passing an OID of 0 (magic value)
after a tuple insertion.  This is removed in this commit, as it became
obsolete after the driver began using "RETURNING ctid" with inserts, a
clause supported since Postgres 8.2 (using RETURNING is better for
performance anyway as it reduces the number of round-trips to the
backend).

currtid2() is still used by the driver, so this remains around for now.
Note that this function is kept in its original shape for backward
compatibility reasons.

Per discussion with many people, including Andres Freund, Peter
Eisentraut, Álvaro Herrera, Hiroshi Inoue, Tom Lane and myself.

Bump catalog version.

Discussion: https://postgr.es/m/20200603021448.GB89559@paquier.xyz
2020-11-25 12:18:26 +09:00
Peter Eisentraut 01e658fa74 Hash support for row types
Add hash functions for the record type as well as a hash operator
family and operator class for the record type.  This enables all the
hash functionality for the record type such as hash-based plans for
UNION/INTERSECT/EXCEPT DISTINCT, recursive queries using UNION
DISTINCT, hash joins, and hash partitioning.

Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://www.postgresql.org/message-id/flat/38eccd35-4e2d-6767-1b3c-dada1eac3124%402ndquadrant.com
2020-11-19 09:32:47 +01:00
Tom Lane 9e38c2bb50 Declare assorted array functions using anycompatible not anyelement.
Convert array_append, array_prepend, array_cat, array_position,
array_positions, array_remove, array_replace, and width_bucket
to use anycompatiblearray.  This is a simple extension of commit
5c292e6b9 to hit some other places where there's a pretty obvious
gain in usability from doing so.

Ideally we'd also modify other functions taking multiple old-style
polymorphic arguments.  But most of the remainder are tied into one
or more operator classes, making any such change a much larger can of
worms than I desire to open right now.

Discussion: https://postgr.es/m/77675130-89da-dab1-51dd-492c93dcf5d1@postgresfriends.org
2020-11-04 16:09:55 -05:00
Tom Lane 5c292e6b90 Declare lead() and lag() using anycompatible not anyelement.
This allows use of a "default" expression that doesn't slavishly
match the data column's type.  Formerly you got something like
"function lag(numeric, integer, integer) does not exist", which
is not just unhelpful but actively misleading.

The SQL spec suggests that the default should be coerced to the data
column's type, but this implementation instead chooses the common
supertype, which seems at least as reasonable.

(Note: I took the opportunity to run "make reformat-dat-files" on
pg_proc.dat, so this commit includes some cosmetic changes to
recently-added entries that aren't related to lead/lag.)

Vik Fearing

Discussion: https://postgr.es/m/77675130-89da-dab1-51dd-492c93dcf5d1@postgresfriends.org
2020-11-04 15:08:37 -05:00
Michael Paquier e152506ade Revert pg_relation_check_pages()
This reverts the following set of commits, following complaints about
the lack of portability of the central part of the code in bufmgr.c as
well as the use of partition mapping locks during page reads:
c780a7a9
f2b88396
b787d4ce
ce7f772c
60a51c6b

Per discussion with Andres Freund, Robert Haas and myself.

Bump catalog version.

Discussion: https://postgr.es/m/20201029181729.2nrub47u7yqncsv7@alap3.anarazel.de
2020-11-04 10:21:46 +09:00
Amit Kapila 8e90ec5580 Track statistics for streaming of changes from ReorderBuffer.
This adds the statistics about transactions streamed to the decoding
output plugin from ReorderBuffer. Users can query the
pg_stat_replication_slots view to check these stats and call
pg_stat_reset_replication_slot to reset the stats of a particular slot.
Users can pass NULL in pg_stat_reset_replication_slot to reset stats of
all the slots.

Commit 9868167500 has added the basic infrastructure to capture the stats
of slot and this commit extends the statistics collector to track
additional information about slots.

Bump the catversion as we have added new columns in the catalog entry.

Author: Ajin Cherian and Amit Kapila
Reviewed-by: Sawada Masahiko and Dilip Kumar
Discussion: https://postgr.es/m/CAA4eK1+chpEomLzgSoky-D31qev19AmECNiEAietPQUGEFhtVA@mail.gmail.com
2020-10-29 09:11:51 +05:30
Tom Lane 36b9312143 Don't use custom OID symbols in pg_proc.dat.
We have a perfectly good convention for OID macros for built-in functions
already, so making custom symbols is just introducing unnecessary
deviation from the convention.  Remove the one case that had snuck in,
and add an error check in genbki.pl to discourage future instances.

Although this touches pg_proc.dat, there's no need for a catversion
bump since the actual catalog data isn't changed.

John Naylor

Discussion: https://postgr.es/m/CAFBsxsHpCbjfoddNGpnnnY5pHwckWfiYkMYSF74PmP1su0+ZOw@mail.gmail.com
2020-10-28 12:18:45 -04:00
Michael Paquier f2b8839695 Add pg_relation_check_pages() to check on-disk pages of a relation
This makes use of CheckBuffer() introduced in c780a7a, adding a SQL
wrapper able to do checks for all the pages of a relation.  By default,
all the fork types of a relation are checked, and it is possible to
check only a given relation fork.  Note that if the relation given in
input has no physical storage or is temporary, then no errors are
generated, allowing full-database checks when coupled with a simple scan
of pg_class for example.  This is not limited to clusters with data
checksums enabled, as clusters without data checksums can still apply
checks on pages using the page headers or for the case of a page full of
zeros.

This function returns a set of tuples consisting of:
- The physical file where a broken page has been detected (without the
segment number as that can be AM-dependent, which can be guessed from
the block number for heap).  A relative path from PGPATH is used.
- The block number of the broken page.

By default, only superusers have an access to this function but
execution rights can be granted to other users.

The feature introduced here is still minimal, and more improvements
could be done, like:
- Addition of a start and end block number to run checks on a range
of blocks, which would apply only if one fork type is checked.
- Addition of some progress reporting.
- Throttling, with configuration parameters in function input or
potentially some cost-based GUCs.

Regression tests are added for positive cases in the main regression
test suite, and TAP tests are added for cases involving the emulation of
page corruptions.

Bump catalog version.

Author: Julien Rouhaud, Michael Paquier
Reviewed-by: Masahiko Sawada, Justin Pryzby
Discussion: https://postgr.es/m/CAOBaU_aVvMjQn=ge5qPiJOPMmOj5=ii3st5Q0Y+WuLML5sR17w@mail.gmail.com
2020-10-28 12:15:00 +09:00
Amit Kapila 03d51b776d Change the attribute name in pg_stat_replication_slots view.
Change the attribute 'name' to 'slot_name' in pg_stat_replication_slots
view to make it clear and that way we will be consistent with the other
places like pg_stat_wal_receiver view where we display the same attribute.

In the passing, fix the typo in one of the macros in the related code.

Bump the catversion as we have modified the name in the catalog as well.

Reported-by: Noriyoshi Shinoda
Author: Noriyoshi Shinoda
Reviewed-by: Sawada  Masahiko and Amit Kapila
Discussion: https://postgr.es/m/CA+fd4k5_pPAYRTDrO2PbtTOe0eHQpBvuqmCr8ic39uTNmR49Eg@mail.gmail.com
2020-10-20 10:24:36 +05:30
Amit Kapila 9868167500 Track statistics for spilling of changes from ReorderBuffer.
This adds the statistics about transactions spilled to disk from
ReorderBuffer. Users can query the pg_stat_replication_slots view to check
these stats and call pg_stat_reset_replication_slot to reset the stats of
a particular slot. Users can pass NULL in pg_stat_reset_replication_slot
to reset stats of all the slots.

This commit extends the statistics collector to track this information
about slots.

Author: Sawada Masahiko and Amit Kapila
Reviewed-by: Amit Kapila and Dilip Kumar
Discussion: https://postgr.es/m/CA+fd4k5_pPAYRTDrO2PbtTOe0eHQpBvuqmCr8ic39uTNmR49Eg@mail.gmail.com
2020-10-08 09:09:08 +05:30
Fujii Masao 8d9a935965 Add pg_stat_wal statistics view.
This view shows the statistics about WAL activity. Currently it has only
two columns: wal_buffers_full and stats_reset. wal_buffers_full column
indicates the number of times WAL data was written to the disk because
WAL buffers got full. This information is useful when tuning wal_buffers.
stats_reset column indicates the time at which these statistics were
last reset.

pg_stat_wal view is also the basic infrastructure to expose other
various statistics about WAL activity later.

Bump PGSTAT_FILE_FORMAT_ID due to the change in pgstat format.

Bump catalog version.

Author: Masahiro Ikeda
Reviewed-by: Takayuki Tsunakawa, Kyotaro Horiguchi, Amit Kapila, Fujii Masao
Discussion: https://postgr.es/m/188bd3f2d2233cf97753b5ced02bb050@oss.nttdata.com
2020-10-02 10:17:11 +09:00
Tom Lane 06a7c3154f Allow most keywords to be used as column labels without requiring AS.
Up to now, if you tried to omit "AS" before a column label in a SELECT
list, it would only work if the column label was an IDENT, that is not
any known keyword.  This is rather unfriendly considering that we have
so many keywords and are constantly growing more.  In the wake of commit
1ed6b8956 it's possible to improve matters quite a bit.

We'd originally tried to make this work by having some of the existing
keyword categories be allowed without AS, but that didn't work too well,
because each category contains a few special cases that don't work
without AS.  Instead, invent an entirely orthogonal keyword property
"can be bare column label", and mark all keywords that way for which
we don't get shift/reduce errors by doing so.

It turns out that of our 450 current keywords, all but 39 can be made
bare column labels, improving the situation by over 90%.  This number
might move around a little depending on future grammar work, but it's
a pretty nice improvement.

Mark Dilger, based on work by myself and Robert Haas;
review by John Naylor

Discussion: https://postgr.es/m/38ca86db-42ab-9b48-2902-337a0d6b8311@2ndquadrant.com
2020-09-18 16:46:36 -04:00
Tom Lane 76f412ab31 Remove factorial operators, leaving only the factorial() function.
The "!" operator is our only built-in postfix operator.  Remove it,
on the way to removal of grammar support for postfix operators.

There is also a "!!" prefix operator, but since it's been marked
deprecated for most of its existence, we might as well remove it too.

Also zap the SQL alias function numeric_fac(), which seems to have
equally little reason to live.

Mark Dilger, based on work by myself and Robert Haas;
review by John Naylor

Discussion: https://postgr.es/m/38ca86db-42ab-9b48-2902-337a0d6b8311@2ndquadrant.com
2020-09-17 16:17:27 -04:00
Heikki Linnakangas 16fa9b2b30 Add support for building GiST index by sorting.
This adds a new optional support function to the GiST access method:
sortsupport. If it is defined, the GiST index is built by sorting all data
to the order defined by the sortsupport's comparator function, and packing
the tuples in that order to GiST pages. This is similar to how B-tree
index build works, and is much faster than inserting the tuples one by
one. The resulting index is smaller too, because the pages are packed more
tightly, upto 'fillfactor'. The normal build method works by splitting
pages, which tends to lead to more wasted space.

The quality of the resulting index depends on how good the opclass-defined
sort order is. A good order preserves locality of the input data.

As the first user of this facility, add 'sortsupport' function to the
point_ops opclass. It sorts the points in Z-order (aka Morton Code), by
interleaving the bits of the X and Y coordinates.

Author: Andrey Borodin
Reviewed-by: Pavel Borisov, Thomas Munro
Discussion: https://www.postgresql.org/message-id/1A36620E-CAD8-4267-9067-FB31385E7C0D%40yandex-team.ru
2020-09-17 11:33:40 +03:00
Tom Lane 66f1630680 Add string_to_table() function.
This splits a string at occurrences of a delimiter.  It is exactly like
string_to_array() except for producing a set of values instead of an
array of values.  Thus, the relationship of these two functions is
the same as between regexp_split_to_table() and regexp_split_to_array().

Although the same results could be had from unnest(string_to_array()),
this is somewhat faster than that, and anyway it seems reasonable to
have it for symmetry with the regexp functions.

Pavel Stehule, reviewed by Peter Smith

Discussion: https://postgr.es/m/CAFj8pRD8HOpjq2TqeTBhSo_QkzjLOhXzGCpKJ4nCs7Y9SQkuPw@mail.gmail.com
2020-09-02 18:23:56 -04:00
Tom Lane 6ca547cf75 Mark factorial operator, and postfix operators in general, as deprecated.
Per discussion, we're planning to remove parser support for postfix
operators in order to simplify the grammar.  So it behooves us to
put out a deprecation notice at least one release before that.

There is only one built-in postfix operator, ! for factorial.
Label it deprecated in the docs and in pg_description, and adjust
some examples that formerly relied on it.  (The sister prefix
operator !! is also deprecated.  We don't really have to remove
that one, but since we're suggesting that people use factorial()
instead, it seems better to remove both operators.)

Also state in the CREATE OPERATOR ref page that postfix operators
in general are going away.

Although this changes the initial contents of pg_description,
I did not force a catversion bump; it doesn't seem essential.

In v13, also back-patch 4c5cf5431, so that there's someplace for
the <link>s to point to.

Mark Dilger and John Naylor, with some adjustments by me

Discussion: https://postgr.es/m/BE2DF53D-251A-4E26-972F-930E523580E9@enterprisedb.com
2020-08-30 14:37:24 -04:00