Commit Graph

2360 Commits

Author SHA1 Message Date
Joe Conway
b12bd4869b Fix has_column_privilege function corner case
According to the comments, when an invalid or dropped column oid is passed
to has_column_privilege(), the intention has always been to return NULL.
However, when the caller had table-level privilege, the invalid/missing
column was never discovered, because table permissions were checked first.

Fix that by introducing extended versions of pg_attribute_acl(check|mask)
and pg_class_acl(check|mask) which take a new argument, is_missing. When
is_missing is NULL, the old behavior is preserved. But when is_missing is
passed by the caller, no ERROR is thrown for dropped or missing
columns/relations, and is_missing is flipped to true. This in turn allows
has_column_privilege to check for column privileges first, providing the
desired semantics.
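
For illustration, with a hypothetical table t, passing the attribute
number of a dropped column now yields NULL even when the caller has
table-level privilege:

  CREATE TABLE t (a int, b int);
  ALTER TABLE t DROP COLUMN b;
  -- attnum 2 is the dropped column; with table-level privilege this
  -- formerly succeeded, but it now returns NULL
  SELECT has_column_privilege('t', 2::smallint, 'SELECT');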

Not backpatched since it is a user visible behavioral change with no previous
complaints, and the fix is a bit on the invasive side.

Author: Joe Conway
Reviewed-By: Tom Lane
Reported by: Ian Barwick
Discussion: https://postgr.es/m/flat/9b5f4311-157b-4164-7fe7-077b4fe8ed84%40joeconway.com
2021-03-31 13:55:25 -04:00
Peter Eisentraut
055fee7eb4 Allow an alias to be attached to a JOIN ... USING
This allows something like

    SELECT ... FROM t1 JOIN t2 USING (a, b, c) AS x

where x has the columns a, b, c and, unlike a regular alias, does not
hide the range variables t1 and t2 of the tables being joined.
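
For example (hypothetical columns), the common column can be qualified
with x while t1 and t2 remain usable for their other columns:

    SELECT x.a, t1.b, t2.c FROM t1 JOIN t2 USING (a) AS x;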

Per SQL:2016 feature F404 "Range variable for common column names".

Reviewed-by: Vik Fearing <vik.fearing@2ndquadrant.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://www.postgresql.org/message-id/flat/454638cf-d563-ab76-a585-2564428062af@2ndquadrant.com
2021-03-31 17:10:50 +02:00
Peter Eisentraut
8df2f37114 Improve consistency of SQL code capitalization 2021-03-27 10:17:12 +01:00
Tomas Vondra
a4d75c86bf Extended statistics on expressions
Allow defining extended statistics on expressions, not just on
simple column references.  With this commit, expressions are supported
by all existing extended statistics kinds, improving the same types of
estimates. A simple example may look like this:

  CREATE TABLE t (a int);
  CREATE STATISTICS s ON mod(a,10), mod(a,20) FROM t;
  ANALYZE t;

The collected statistics are useful e.g. to estimate queries with those
expressions in WHERE or GROUP BY clauses:

  SELECT * FROM t WHERE mod(a,10) = 0 AND mod(a,20) = 0;

  SELECT 1 FROM t GROUP BY mod(a,10), mod(a,20);

This introduces a new internal statistics kind 'e' (expressions) which is
built automatically when the statistics object definition includes any
expressions. This represents single-expression statistics, as if there
were an expression index (but without the index maintenance overhead).
The statistics are stored in pg_statistic_ext_data as an array of
composite types, which is possible thanks to 79f6a942bd.

CREATE STATISTICS allows building statistics on a single expression, in
which case it's not possible to specify statistics kinds.

A new system view pg_stats_ext_exprs can be used to display expression
statistics, similarly to pg_stats and pg_stats_ext views.
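
For example, a rough way to look at the expression statistics created
above (assuming the view carries a statistics_name column, mirroring
pg_stats_ext):

  SELECT * FROM pg_stats_ext_exprs WHERE statistics_name = 's';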

ALTER TABLE ... ALTER COLUMN ... TYPE now treats extended statistics the
same way it treats indexes, i.e. it drops and recreates the statistics. This means
all statistics are reset, and we no longer try to preserve at least the
functional dependencies. This should not be a major issue in practice,
as the functional dependencies actually rely on per-column statistics,
which were always reset anyway.

Author: Tomas Vondra
Reviewed-by: Justin Pryzby, Dean Rasheed, Zhihong Yu
Discussion: https://postgr.es/m/ad7891d2-e90c-b446-9fe2-7419143847d7%40enterprisedb.com
2021-03-27 00:01:11 +01:00
Noah Misch
a14a0118a1 Add "pg_database_owner" default role.
Membership consists, implicitly, of the current database owner.  Expect
use in template databases.  Once pg_database_owner has rights within a
template, each owner of a database instantiated from that template will
exercise those rights.
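
A minimal sketch of the expected pattern, assuming a hypothetical grant
issued in template1:

  -- in template1: the owner of each database later created from this
  -- template can exercise this right via pg_database_owner
  GRANT CREATE ON SCHEMA public TO pg_database_owner;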

Reviewed by John Naylor.

Discussion: https://postgr.es/m/20201228043148.GA1053024@rfd.leadboat.com
2021-03-26 10:42:17 -07:00
Robert Haas
5db1fd7823 Fix interaction of TOAST compression with expression indexes.
Before, trying to compress a value for insertion into an expression
index would crash.

Dilip Kumar, with some editing by me. Report by Jaime Casanova.

Discussion: http://postgr.es/m/CAJKUy5gcs0zGOp6JXU2mMVdthYhuQpFk=S3V8DOKT=LZC1L36Q@mail.gmail.com
2021-03-25 19:55:32 -04:00
Alvaro Herrera
71f4c8c6f7 ALTER TABLE ... DETACH PARTITION ... CONCURRENTLY
Allow a partition to be detached from its partitioned table without
blocking concurrent queries, by running in two transactions and only
requiring ShareUpdateExclusive on the partitioned table.

Because it runs in two transactions, it cannot be used in a transaction
block.  This is the main reason to use dedicated syntax: so that users
can choose to use the original mode if they need it.  But also, it
doesn't work when a default partition exists (because an exclusive lock
would still need to be obtained on it, in order to change its partition
constraint.)

In case the second transaction is cancelled or a crash occurs, there's
ALTER TABLE .. DETACH PARTITION .. FINALIZE, which executes the final
steps.
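
A sketch of the syntax with hypothetical partition names; the first form
cannot run inside a transaction block:

  ALTER TABLE parted DETACH PARTITION parted_1 CONCURRENTLY;
  -- if the command above is cancelled or crashes after the first
  -- transaction committed, complete the detach with:
  ALTER TABLE parted DETACH PARTITION parted_1 FINALIZE;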

The main trick to make this work is the addition of column
pg_inherits.inhdetachpending, initially false; can only be set true in
the first part of this command.  Once that is committed, concurrent
transactions that use a PartitionDirectory will include or ignore
partitions so marked: in the optimizer they are ignored if the row is marked
committed for the snapshot; in the executor they are always included.  As a
result, and because of the way PartitionDirectory caches partition
descriptors, queries that were planned before the detach will see the
rows in the detached partition and queries that are planned after the
detach, won't.

A CHECK constraint is created that duplicates the partition constraint.
This is probably not strictly necessary, and some users will prefer to
remove it afterwards, but if the partition is re-attached to a
partitioned table, the constraint needn't be rechecked.

Author: Álvaro Herrera <alvherre@alvh.no-ip.org>
Reviewed-by: Amit Langote <amitlangote09@gmail.com>
Reviewed-by: Justin Pryzby <pryzby@telsasoft.com>
Discussion: https://postgr.es/m/20200803234854.GA24158@alvherre.pgsql
2021-03-25 18:00:28 -03:00
Fujii Masao
7fbcee1b2d Log when GetNewOidWithIndex() fails to find unused OID many times.
GetNewOidWithIndex() generates new OIDs one at a time until it finds
one not present in the relation. If there are very long runs of consecutive
existing OIDs, GetNewOidWithIndex() needs to iterate many times
in the loop to find an unused OID. Since a TOAST table can have a large
number of entries and such long runs of OIDs, there are cases where it
takes many iterations to find a new OID not already in the TOAST table.
Furthermore, if all (i.e., 2^32) OIDs are already used,
GetNewOidWithIndex() enters something like a busy loop and repeats
the iterations until at least one OID is marked as unused.

There are some reported troubles caused by a large number of
iterations in GetNewOidWithIndex(). For example, when inserting
a billion records into a table, all the backends doing that
insertion got stuck with 100% CPU usage at some point.

Previously there was no easy way to detect that GetNewOidWithIndex()
had failed to find an unused OID many times. So, for example, a full gdb
backtrace of the hung backends needed to be taken in order to
investigate the trouble. This is inconvenient and may not be
possible in some production environments.

To provide an easy way to detect this, this commit makes GetNewOidWithIndex()
log when it has iterated more than GETNEWOID_LOG_THRESHOLD times without
finding an OID unused in the relation. It then repeats the logging with
exponentially increasing intervals until the iteration count exceeds
GETNEWOID_LOG_MAX_INTERVAL, after which it logs every
GETNEWOID_LOG_MAX_INTERVAL iterations until an unused OID is found.
These macros are used to keep the server log from filling up with
similar messages.

In the discussion at pgsql-hackers, there was another idea: to report
an excessive number of iterations in GetNewOidWithIndex() via a wait event.
But GetNewOidWithIndex() traverses indexes to find an unused OID,
which performs I/O, acquires locks, etc.; those operations would
overwrite the wait event and reset it to nothing once done. So that idea
doesn't work well, and we didn't adopt it.

Author: Tomohiro Hiramitsu
Reviewed-by: Tatsuhito Kasahara, Kyotaro Horiguchi, Tom Lane, Fujii Masao
Discussion: https://postgr.es/m/16722-93043fb459a41073@postgresql.org
2021-03-24 10:36:56 +09:00
Bruce Momjian
95d77149c5 Add macro RelationIsPermanent() to report relation permanence
Previously, to check relation permanence, the Relation's Form_pg_class
structure member relpersistence was compared to the value
RELPERSISTENCE_PERMANENT ("p"). This commit adds the macro
RelationIsPermanent(), which is used in appropriate places to simplify the
code.  This matches other RelationIs* macros.

This macro will be used in more places in future cluster file encryption
patches.

Discussion: https://postgr.es/m/20210318153134.GH20766@tamriel.snowman.net
2021-03-22 20:23:52 -04:00
Robert Haas
bbe0a81db6 Allow configurable LZ4 TOAST compression.
There is now a per-column COMPRESSION option which can be set to pglz
(the default, and the only option up until now) or lz4. Or, if you
like, you can set the new default_toast_compression GUC to lz4, and
then that will be the default for new table columns for which no value
is specified. We don't have lz4 support in the PostgreSQL code, so
to use lz4 compression, PostgreSQL must be built --with-lz4.
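
For example, assuming a server built --with-lz4, the per-column option
and the GUC can be used like this:

  CREATE TABLE blobs (payload text COMPRESSION lz4);
  ALTER TABLE blobs ALTER COLUMN payload SET COMPRESSION pglz;
  SET default_toast_compression = 'lz4';  -- default for new columns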

In general, TOAST compression means compression of individual column
values, not the whole tuple, and those values can either be compressed
inline within the tuple or compressed and then stored externally in
the TOAST table, so those properties also apply to this feature.

Prior to this commit, a TOAST pointer had two unused bits as part of
the va_extsize field, and a compressed datum had two unused bits as
part of the va_rawsize field. These bits are unused because the length
of a varlena is limited to 1GB; we now use them to indicate the
compression type that was used. This means we only have bit space for
2 more built-in compression types, but we could work around that
problem, if necessary, by introducing a new vartag_external value for
any further types we end up wanting to add. Hopefully, it won't be
too important to offer a wide selection of algorithms here, since
each one we add not only takes more coding but also adds a build
dependency for every packager. Nevertheless, it seems worth doing
at least this much, because LZ4 gets better compression than PGLZ
with less CPU usage.

It's possible for LZ4-compressed datums to leak into composite type
values stored on disk, just as it is for PGLZ. It's also possible for
LZ4-compressed attributes to be copied into a different table via SQL
commands such as CREATE TABLE AS or INSERT .. SELECT.  It would be
expensive to force such values to be decompressed, so PostgreSQL has
never done so. For the same reasons, we also don't force recompression
of already-compressed values even if the target table prefers a
different compression method than was used for the source data.  These
architectural decisions are perhaps arguable but revisiting them is
well beyond the scope of what seemed possible to do as part of this
project.  However, it's relatively cheap to recompress as part of
VACUUM FULL or CLUSTER, so this commit adjusts those commands to do
so, if the configured compression method of the table happens not to
match what was used for some column value stored therein.

Dilip Kumar. The original patches on which this work was based were
written by Ildus Kurbangaliev, and those patches were based on
even earlier work by Nikita Glukhov, but the design has since changed
very substantially, since allowing a potentially large number of
compression methods that could be added and dropped on a running
system proved too problematic given some of the architectural issues
mentioned above; the choice of which specific compression method to
add first is now different; and a lot of the code has been heavily
refactored.  More recently, Justin Pryzby helped quite a bit with
testing and reviewing and this version also includes some code
contributions from him. Other design input and review from Tomas
Vondra, Álvaro Herrera, Andres Freund, Oleg Bartunov, Alexander
Korotkov, and me.

Discussion: http://postgr.es/m/20170907194236.4cefce96%40wp.localdomain
Discussion: http://postgr.es/m/CAFiTN-uUpX3ck%3DK0mLEk-G_kUQY%3DSNOTeqdaNRR9FMdQrHKebw%40mail.gmail.com
2021-03-19 15:10:38 -04:00
Tomas Vondra
be45be9c33 Implement GROUP BY DISTINCT
With grouping sets, it's possible that some of the grouping sets are
duplicates.  This is especially common with CUBE and ROLLUP clauses. For
example GROUP BY CUBE (a,b), CUBE (b,c) is equivalent to

  GROUP BY GROUPING SETS (
    (a, b, c),
    (a, b, c),
    (a, b, c),
    (a, b),
    (a, b),
    (a, b),
    (a),
    (a),
    (a),
    (c, a),
    (c, a),
    (c, a),
    (c),
    (b, c),
    (b),
    ()
  )

Some of the grouping sets are calculated multiple times, which is mostly
unnecessary.  This commit implements a new GROUP BY DISTINCT feature, as
defined in the SQL standard, which eliminates the duplicate sets.
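
With the new syntax, the example above can be written as follows to
eliminate the duplicate sets:

  SELECT a, b, c, count(*) FROM t GROUP BY DISTINCT CUBE (a, b), CUBE (b, c);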

Author: Vik Fearing
Reviewed-by: Erik Rijkers, Georgios Kokolatos, Tomas Vondra
Discussion: https://postgr.es/m/bf3805a8-d7d1-ae61-fece-761b7ff41ecc@postgresfriends.org
2021-03-18 18:22:18 +01:00
Michael Paquier
9fd2952cf4 Fix comment in indexing.c
578b229, which removed support for WITH OIDS, changed
CatalogTupleInsert() to not return an Oid, but one comment still
mentioned that.

Author: Vik Fearing
Discussion: https://postgr.es/m/fef01975-ed10-3601-7b9e-80ecef72d00b@postgresfriends.org
2021-03-17 18:07:00 +09:00
Fujii Masao
ff99918c62 Track total amounts of times spent writing and syncing WAL data to disk.
This commit adds a new GUC, track_wal_io_timing. When this is enabled,
the total amounts of time XLogWrite spends writing and issue_xlog_fsync
spends syncing WAL data to disk are counted in pg_stat_wal. This
information is useful for checking how much WAL writes and syncs affect
performance.

Enabling track_wal_io_timing makes the server query the operating
system for the current time every time WAL is written or synced,
which may cause significant overhead on some platforms. To avoid imposing
that additional overhead on servers that already have track_io_timing
enabled, this commit introduces track_wal_io_timing as a separate
parameter from track_io_timing.

Note that WAL write and sync activity by walreceiver has not been tracked yet.

This commit also makes the server track the numbers of times XLogWrite
writes and issue_xlog_fsync syncs WAL data to disk, in pg_stat_wal,
regardless of the setting of track_wal_io_timing. These counters can be
used to calculate the WAL write and sync time per request, for example.
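
A rough sketch of consulting the new columns (assuming wal_write/wal_sync
counters and wal_write_time/wal_sync_time timings in milliseconds, the
latter populated only while track_wal_io_timing is enabled):

  SELECT wal_write_time / NULLIF(wal_write, 0) AS avg_write_ms,
         wal_sync_time  / NULLIF(wal_sync, 0)  AS avg_sync_ms
    FROM pg_stat_wal;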

Bump PGSTAT_FILE_FORMAT_ID.

Bump catalog version.

Author: Masahiro Ikeda
Reviewed-By: Japin Li, Hayato Kuroda, Masahiko Sawada, David Johnston, Fujii Masao
Discussion: https://postgr.es/m/0509ad67b585a5b86a83d445dfa75392@oss.nttdata.com
2021-03-09 16:52:06 +09:00
Michael Paquier
9d2d457009 Add support for more progress reporting in COPY
The command (TO or FROM), its type (file, pipe, program or callback),
and the number of tuples excluded by a WHERE clause in COPY FROM are
added to the progress reporting already available.

The column "lines_processed" is renamed to "tuples_processed" to
disambiguate the meaning of this column in the cases of CSV and BINARY
COPY and to be more consistent with the other catalog progress views.
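
After this change, a rough monitoring query from another session could
look like this (columns as described above plus the pre-existing byte
counters):

  SELECT relid::regclass, command, type,
         bytes_processed, bytes_total,
         tuples_processed, tuples_excluded
    FROM pg_stat_progress_copy;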

Bump catalog version, again.

Author: Matthias van de Meent
Reviewed-by: Michael Paquier, Justin Pryzby, Bharath Rupireddy, Josef
Šimánek, Tomas Vondra
Discussion: https://postgr.es/m/CAEze2WiOcgdH4aQA8NtZq-4dgvnJzp8PohdeKchPkhMY-jWZXA@mail.gmail.com
2021-03-09 14:21:03 +09:00
Michael Paquier
f9264d1524 Remove support for SSL compression
PostgreSQL disabled compression as of e3bdb2d, and the documentation
has recommended against using it since then.  Additionally, SSL compression has
been disabled in OpenSSL since version 1.1.0, and was disabled in many
distributions long before that.  The most recent TLS version, TLSv1.3,
disallows compression at the protocol level.

This commit removes the feature itself: support for the libpq
parameter sslcompression is dropped (the parameter is still accepted for
compatibility with existing connection strings, just ignored), and the
equivalent field is removed from pg_stat_ssl and, de facto, from
PgBackendSSLStatus.

Note that, on top of removing the ability to activate compression by
configuration, compression is actively disabled in both frontend and
backend to avoid overrides from local configurations.

A TAP test is added for deprecated SSL parameters to check
backwards compatibility.

Bump catalog version.

Author: Daniel Gustafsson
Reviewed-by: Peter Eisentraut, Magnus Hagander, Michael Paquier
Discussion:  https://postgr.es/m/7E384D48-11C5-441B-9EC3-F7DB1F8518F6@yesql.se
2021-03-09 11:16:47 +09:00
Tom Lane
0a687c8f10 Add trim_array() function.
This has been in the SQL spec since 2008.  It's a pretty thin
wrapper around the array slice functionality, but the spec
says we should have it, so here it is.
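
A quick example (the second argument is the number of elements trimmed
from the end of the array):

  SELECT trim_array(ARRAY[1, 2, 3, 4, 5], 2);  -- {1,2,3}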

Vik Fearing, reviewed by Dian Fay

Discussion: https://postgr.es/m/fc92ce17-9655-8ff1-c62a-4dc4c8ccd815@postgresfriends.org
2021-03-03 16:39:57 -05:00
Amit Kapila
19890a064e Add option to enable two_phase commits via pg_create_logical_replication_slot.
Commit 0aa8a01d04 extended the output plugin API to allow decoding of
prepared xacts and allowed the user to enable/disable the two-phase option
via pg_logical_slot_get_changes(). This can lead to a problem: the first
time changes are fetched via pg_logical_slot_get_changes() without the
two_phase option enabled, the prepared transaction is not returned even
though the prepare happened after the consistent snapshot. The next time
changes are fetched, if the two_phase option is enabled, the prepare can
be skipped because by then the start decoding point has moved, so the
user only gets the commit prepared.

Allow this option to be enabled/disabled at slot creation time, with a
default of false. It breaks existing slots, which is fine in a major
release.
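
A sketch of creating a slot with the option enabled, using a hypothetical
slot name and the test_decoding plugin; the new two_phase flag is the
fourth argument:

  SELECT pg_create_logical_replication_slot('myslot', 'test_decoding',
                                             false,  -- temporary
                                             true);  -- two_phase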

Author: Ajin Cherian
Reviewed-by: Amit Kapila and Vignesh C
Discussion: https://postgr.es/m/d0f60d60-133d-bf8d-bd70-47784d8fabf3@enterprisedb.com
2021-03-03 07:34:11 +05:30
Peter Eisentraut
f4adc41c4f Enhanced cycle mark values
Per SQL:202x draft, in the CYCLE clause of a recursive query, the
cycle mark values can be of type boolean and can be omitted, in which
case they default to TRUE and FALSE.
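
A sketch of the shortened form this permits, with a hypothetical graph
table; the generated is_cycle column is boolean, so it can be used
directly in WHERE:

  WITH RECURSIVE search_graph (id, link) AS (
      SELECT g.id, g.link FROM graph g
    UNION ALL
      SELECT g.id, g.link
        FROM graph g JOIN search_graph sg ON g.id = sg.link
  ) CYCLE id SET is_cycle USING path
  SELECT * FROM search_graph WHERE NOT is_cycle;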

Reviewed-by: Vik Fearing <vik@postgresfriends.org>
Discussion: https://www.postgresql.org/message-id/flat/db80ceee-6f97-9b4a-8ee8-3ba0c58e5be2@2ndquadrant.com
2021-02-27 08:13:24 +01:00
Amit Kapila
8a4f9522d0 Fix relcache reference leak introduced by ce0fdbfe97.
Author: Sawada Masahiko
Reviewed-by: Amit Kapila
Discussion: https://postgr.es/m/CAD21AoA7ZEfsOXQ9HQqMv3QYGsEm2H5Wk5ic5S=mvzDf-3a3SA@mail.gmail.com
2021-02-25 07:48:24 +05:30
Thomas Munro
0fb0a0503b Hide internal error for pg_collation_actual_version(<bad OID>).
Instead of an unsightly internal "cache lookup failed" message, just
return NULL for bad OIDs, as is the convention for other similar things.

Reported-by: Justin Pryzby <pryzby@telsasoft.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/20210117215940.GE8560%40telsasoft.com
2021-02-22 23:01:20 +13:00
Michael Paquier
9294264278 Use pgstat_progress_update_multi_param() where possible
This commit changes one code path in REINDEX INDEX and one code path
in CREATE INDEX CONCURRENTLY to report the progress of each operation
using pgstat_progress_update_multi_param() rather than
multiple calls to pgstat_progress_update_param().  This has the
advantage to make the progress report more consistent to the end-user
without impacting the amount of information provided.

Author: Bharath Rupireddy
Discussion: https://postgr.es/m/CALj2ACV5zW7GxD8D_tyO==bcj6ZktQchEKWKPBOAGKiLhAQo=w@mail.gmail.com
2021-02-22 14:21:40 +09:00
Peter Eisentraut
f40c6969d0 Routine usage information schema tables
Several information schema views track dependencies between
functions/procedures and objects used by them.  These had not been
implemented so far because PostgreSQL doesn't track objects used in a
function body.  However, formally, these also show dependencies used
in parameter default expressions, which PostgreSQL does support and
track.  So for the sake of completeness, we might as well add these.
If dependency tracking for function bodies is ever implemented, these
views will automatically work correctly.

Reviewed-by: Erik Rijkers <er@xs4all.nl>
Discussion: https://www.postgresql.org/message-id/flat/ac80fc74-e387-8950-9a31-2560778fc1e3%40enterprisedb.com
2021-02-17 18:16:06 +01:00
Peter Eisentraut
0e392fcc0d Use errmsg_internal for debug messages
An inconsistent set of debug-level messages was not using
errmsg_internal(), thus uselessly exposing the messages to translation
work.  Fix those.
2021-02-17 11:33:25 +01:00
Amit Kapila
ce0fdbfe97 Allow multiple xacts during table sync in logical replication.
For the initial table data synchronization in logical replication, we use
a single transaction to copy the entire table and then synchronize the
position in the stream with the main apply worker.

There are multiple downsides of this approach: (a) We have to perform the
entire copy operation again if there is any error (network breakdown,
error in the database operation, etc.) while we synchronize the WAL
position between tablesync worker and apply worker; this will be onerous
especially for large copies, (b) Using a single transaction in the
synchronization-phase (where we can receive WAL from multiple
transactions) will have the risk of exceeding the CID limit, (c) The slot
will hold the WAL till the entire sync is complete because we never commit
till the end.

This patch solves all the above downsides by allowing multiple
transactions during the tablesync phase. The initial copy is done in a
single transaction and after that, we commit each transaction as we
receive. To allow recovery after any error or crash, we use a permanent
slot and origin to track the progress. The slot and origin will be removed
once we finish the synchronization of the table. We also remove slot and
origin of tablesync workers if the user performs DROP SUBSCRIPTION .. or
ALTER SUBSCRIPTION .. REFRESH and some of the table syncs are still not
finished.

The commands ALTER SUBSCRIPTION ... REFRESH PUBLICATION and
ALTER SUBSCRIPTION ... SET PUBLICATION ... with refresh option as true
cannot be executed inside a transaction block because they can now drop
the slots for which we have no provision to rollback.

This also opens up the path for logical replication of 2PC
transactions on the subscriber side. Previously, we couldn't do that because
of the requirement of maintaining a single transaction in tablesync
workers.

Bump catalog version due to change of state in the catalog
(pg_subscription_rel).

Author: Peter Smith, Amit Kapila, and Takamichi Osumi
Reviewed-by: Ajin Cherian, Petr Jelinek, Hou Zhijie and Amit Kapila
Discussion: https://postgr.es/m/CAA4eK1KHJxaZS-fod-0fey=0tq3=Gkn4ho=8N4-5HWiCfu0H1A@mail.gmail.com
2021-02-12 07:41:51 +05:30
Michael Paquier
bd12080980 Preserve pg_attribute.attstattarget across REINDEX CONCURRENTLY
For an index, attstattarget can be updated using ALTER INDEX SET
STATISTICS.  This data was lost on the new index after REINDEX
CONCURRENTLY.
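
For example, with a hypothetical expression index idx_expr, the
statistics target set this way is now carried over to the rebuilt index:

  ALTER INDEX idx_expr ALTER COLUMN 1 SET STATISTICS 1000;
  REINDEX INDEX CONCURRENTLY idx_expr;  -- attstattarget is preserved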

The update of this field is done when the old and new indexes are
swapped, to make the fix back-patchable.  Another approach we could look
into in the long term is to change index_create() to pass the wanted
values of attstattarget when creating the new relation, but, as this
would cause an ABI breakage, it can be done only on HEAD.

Reported-by: Ronan Dunklau
Author: Michael Paquier
Reviewed-by: Ronan Dunklau, Tomas Vondra
Discussion: https://postgr.es/m/16628084.uLZWGnKmhe@laptop-ronand
Backpatch-through: 12
2021-02-10 13:06:48 +09:00
Tom Lane
dd705a039f Disallow converting an inheritance child table to a view.
Generally, members of inheritance trees must be plain tables (or,
in more recent versions, foreign tables).  ALTER TABLE INHERIT
rejects creating an inheritance relationship that has a view at
either end.  When DefineQueryRewrite attempts to convert a relation
to a view, it already had checks prohibiting doing so for partitioning
parents or children as well as traditional-inheritance parents ...
but it neglected to check that a traditional-inheritance child wasn't
being converted.  Since the planner assumes that any inheritance
child is a table, this led to making plans that tried to do a physical
scan on a view, causing failures (or even crashes, in recent versions).

One could imagine trying to support such a case by expanding the view
normally, but since the rewriter runs before the planner does
inheritance expansion, it would take some very fundamental refactoring
to make that possible.  There are probably a lot of other parts of the
system that don't cope well with such a situation, too.  For now,
just forbid it.

Per bug #16856 from Yang Lin.  Back-patch to all supported branches.
(In versions before v10, this includes back-patching the portion of
commit 501ed02cf that added has_superclass().  Perhaps the lack of
that infrastructure partially explains the missing check.)

Discussion: https://postgr.es/m/16856-0363e05c6e1612fd@postgresql.org
2021-02-06 15:17:01 -05:00
Michael Paquier
5128483d06 Ensure unlinking of old index file with REINDEX (TABLESPACE)
The original versions of the patch included this part, but a mismerge
from my side has made this piece go missing.  Oversight in c5b28604.
2021-02-04 17:16:47 +09:00
Michael Paquier
c5b286047c Add TABLESPACE option to REINDEX
This patch adds the ability to move indexes to a new tablespace
while rebuilding them.  Both the concurrent and the non-concurrent cases
are supported, and the following restrictions apply:
- When using TABLESPACE with a REINDEX command that targets a
partitioned table or index, all the indexes of the leaf partitions are
moved to the new tablespace.  The tablespace references of the non-leaf,
partitioned tables in pg_class.reltablespace are not changed. This
requires an extra ALTER TABLE SET TABLESPACE.
- Any index on a toast table rebuilt as part of a parent table is kept
in its original tablespace.
- The operation is forbidden on system catalogs, including trying to
directly move a toast relation with REINDEX.  This results in an error
if doing REINDEX on a single object.  REINDEX SCHEMA, DATABASE and
SYSTEM skip system relations when TABLESPACE is used.
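
A rough sketch of the new option, with hypothetical object and tablespace
names:

  REINDEX (TABLESPACE new_ts) TABLE my_table;
  REINDEX (TABLESPACE new_ts) INDEX CONCURRENTLY my_index;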

Author: Alexey Kondratov, Michael Paquier, Justin Pryzby
Reviewed-by: Álvaro Herrera, Michael Paquier
Discussion: https://postgr.es/m/8a8f5f73-00d3-55f8-7583-1375ca8f6a91@postgrespro.ru
2021-02-04 14:34:20 +09:00
Tom Lane
ba0faf81c6 Remove special BKI_LOOKUP magic for namespace and role OIDs.
Now that commit 62f34097c attached BKI_LOOKUP annotation to all the
namespace and role OID columns in the catalogs, there's no real reason
to have the magic PGNSP and PGUID symbols.  Get rid of them in favor
of implementing those lookups according to genbki.pl's normal pattern.

This means that in the catalog headers, BKI_DEFAULT(PGNSP) becomes
BKI_DEFAULT(pg_catalog), which seems a lot more transparent.
BKI_DEFAULT(PGUID) becomes BKI_DEFAULT(POSTGRES), which is perhaps
less so; but you can look into pg_authid.dat to discover that
POSTGRES is the nonce name for the bootstrap superuser.

This change also means that if we ever need cross-references in the
initial catalog data to any of the other built-in roles besides
POSTGRES, or to some other built-in schema besides pg_catalog,
we can just do it.

No catversion bump here, as there's no actual change in the contents
of postgres.bki.

Discussion: https://postgr.es/m/3240355.1612129197@sss.pgh.pa.us
2021-02-03 12:01:48 -05:00
Tom Lane
62f34097c8 Build in some knowledge about foreign-key relationships in the catalogs.
This follows in the spirit of commit dfb75e478, which created primary
key and uniqueness constraints to improve the visibility of constraints
imposed on the system catalogs.  While our catalogs contain many
foreign-key-like relationships, they don't quite follow SQL semantics,
in that the convention for an omitted reference is to write zero not
NULL.  Plus, we have some cases in which there are arrays each of whose
elements is supposed to be an FK reference; SQL has no way to model that.
So we can't create actual foreign key constraints to describe the
situation.  Nonetheless, we can collect and use knowledge about these
relationships.

This patch therefore adds annotations to the catalog header files to
declare foreign-key relationships.  (The BKI_LOOKUP annotations cover
simple cases, but we weren't previously distinguishing which such
columns are allowed to contain zeroes; we also need new markings for
multi-column FK references.)  Then, Catalog.pm and genbki.pl are
taught to collect this information into a table in a new generated
header "system_fk_info.h".  The only user of that at the moment is
a new SQL function pg_get_catalog_foreign_keys(), which exposes the
table to SQL.  The oidjoins regression test is rewritten to use
pg_get_catalog_foreign_keys() to find out which columns to check.
Aside from removing the need for manual maintenance of that test
script, this allows it to cover numerous relationships that were not
checked by the old implementation based on findoidjoins.  (As of this
commit, 217 relationships are checked by the test, versus 181 before.)
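
For example, the declared relationships can be listed directly with the
new function:

  SELECT * FROM pg_get_catalog_foreign_keys();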

Discussion: https://postgr.es/m/3240355.1612129197@sss.pgh.pa.us
2021-02-02 17:11:55 -05:00
Peter Eisentraut
1d71f3c83c Improve confusing variable names
The prototype calls the second argument of
pgstat_progress_update_multi_param() "index", and some callers name
their local variable that way.  But when the surrounding code deals
with index relations, this is confusing, and in at least one case
shadowed another variable that is referring to an index relation.
Adjust those call sites to have clearer local variable naming, similar
to existing callers in indexcmds.c.
2021-02-02 09:20:22 +01:00
Peter Eisentraut
3696a600e2 SEARCH and CYCLE clauses
This adds the SQL standard feature that adds the SEARCH and CYCLE
clauses to recursive queries to be able to do produce breadth- or
depth-first search orders and detect cycles.  These clauses can be
rewritten into queries using existing syntax, and that is what this
patch does in the rewriter.
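
For example (hypothetical adjacency-list table), a depth-first ordering
column can be requested like this:

  WITH RECURSIVE tree (id, parent_id) AS (
      SELECT n.id, n.parent_id FROM nodes n WHERE n.parent_id IS NULL
    UNION ALL
      SELECT n.id, n.parent_id
        FROM nodes n JOIN tree ON n.parent_id = tree.id
  ) SEARCH DEPTH FIRST BY id SET ordercol
  SELECT * FROM tree ORDER BY ordercol;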

Reviewed-by: Vik Fearing <vik@postgresfriends.org>
Reviewed-by: Pavel Stehule <pavel.stehule@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/db80ceee-6f97-9b4a-8ee8-3ba0c58e5be2@2ndquadrant.com
2021-02-01 14:32:51 +01:00
Peter Eisentraut
dfb75e478c Add primary keys and unique constraints to system catalogs
For those system catalogs that have unique indexes, make a primary
key and unique constraint, using ALTER TABLE ... PRIMARY KEY/UNIQUE
USING INDEX.

This can be helpful for GUI tools that look for a primary key, and it
might in the future allow declaring foreign keys, for making schema
diagrams.

The constraint creation statements are automatically created by
genbki.pl from DECLARE_UNIQUE_INDEX directives.  To specify which one
of the available unique indexes is the primary key, use the new
directive DECLARE_UNIQUE_INDEX_PKEY instead.  By convention, we
usually make a catalog's OID column its primary key, if it has one.

Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://www.postgresql.org/message-id/flat/dc5f44d9-5ec1-a596-0251-dadadcdede98@2ndquadrant.com
2021-01-30 19:44:29 +01:00
Peter Eisentraut
6aaaa76bb4 Allow GRANTED BY clause in normal GRANT and REVOKE statements
The SQL standard allows a GRANTED BY clause on GRANT and
REVOKE (privilege) statements that can specify CURRENT_USER or
CURRENT_ROLE.  In PostgreSQL, both of these are the default behavior.
Since we already have all the parsing support for this for the
GRANT (role) statement, we might as well add basic support for this
for the privilege variant as well.  This allows us to check off SQL
feature T332.  In the future, perhaps more interesting things could be
done with this, too.
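
For example, with hypothetical object names, both of these are now
accepted and behave as before, since the current user/role is already
the default grantor:

  GRANT SELECT ON my_table TO some_role GRANTED BY CURRENT_USER;
  REVOKE SELECT ON my_table FROM some_role GRANTED BY CURRENT_ROLE;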

Reviewed-by: Simon Riggs <simon@2ndquadrant.com>
Discussion: https://www.postgresql.org/message-id/flat/f2feac44-b4c5-f38f-3699-2851d6a76dc9@2ndquadrant.com
2021-01-30 09:45:11 +01:00
Noah Misch
360bd2321b Fix error with CREATE PUBLICATION, wal_level=minimal, and new tables.
CREATE PUBLICATION has failed spuriously when applied to a permanent
relation created or rewritten in the current transaction.  Make the same
change to another site having the same semantic intent; the second
instance has no user-visible consequences.  Back-patch to v13, where
commit c6b92041d3 broke this.

Kyotaro Horiguchi

Discussion: https://postgr.es/m/20210113.160705.2225256954956139776.horikyota.ntt@gmail.com
2021-01-30 00:11:38 -08:00
Michael Paquier
21378e1fef Fix ALTER DEFAULT PRIVILEGES with duplicated objects
Specifying duplicated objects in this command would lead to unique
constraint violations in pg_default_acl or "tuple already updated by
self" errors.  Similarly to GRANT/REVOKE, increment the command ID after
each subcommand processing to allow this case to work transparently.

A regression test is added by tweaking one of the existing queries of
privileges.sql to stress this case.

Reported-by: Andrus
Author: Michael Paquier
Reviewed-by: Álvaro Herrera
Discussion: https://postgr.es/m/ae2a7dc1-9d71-8cba-3bb9-e4cb7eb1f44e@hot.ee
Backpatch-through: 9.5
2021-01-20 11:38:17 +09:00
Michael Paquier
a3dc926009 Refactor option handling of CLUSTER, REINDEX and VACUUM
This continues the work done in b5913f6.  All the options of those
commands are changed to use hex values rather than enums to reduce the
risk of compatibility bugs when introducing new options.  Each option
set is moved into a new structure that can be extended with more
non-boolean options (this was already the case for VACUUM).  The code of
REINDEX is restructured so that manual REINDEX commands go through a
single routine from utility.c, like VACUUM, to ease the allocation
handling of option parameters when a command needs to go through
multiple transactions.

This can be used as a base infrastructure for future patches related to
those commands, including reindex filtering and tablespace support.

Per discussion with people mentioned below, as well as Alvaro Herrera
and Peter Eisentraut.

Author: Michael Paquier, Justin Pryzby
Reviewed-by: Alexey Kondratov, Justin Pryzby
Discussion: https://postgr.es/m/X8riynBLwxAD9uKk@paquier.xyz
2021-01-18 14:03:10 +09:00
Magnus Hagander
960869da08 Add pg_stat_database counters for sessions and session time
This adds counters for the number of sessions, the different kinds of
session termination, and timers for how much time is spent active vs.
idle in a database, to pg_stat_database.

Internally this also renames the parameter "force" to "disconnect". This
was the only use case for the parameter before, so repurposing it to
this more narrow use case makes things cleaner than inventing something
new.
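
A rough query against the new columns (names assumed from the
description above; they may differ):

  SELECT datname, sessions, sessions_abandoned, sessions_fatal,
         sessions_killed, session_time, active_time,
         idle_in_transaction_time
    FROM pg_stat_database
   WHERE datname = current_database();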

Author: Laurenz Albe
Reviewed-By: Magnus Hagander, Soumyadeep Chakraborty, Masahiro Ikeda
Discussion: https://postgr.es/m/b07e1f9953701b90c66ed368656f2aef40cac4fb.camel@cybertec.at
2021-01-17 13:52:31 +01:00
Alvaro Herrera
ebfe2dbd6b Prevent drop of tablespaces used by partitioned relations
When a tablespace is used in a partitioned relation (per commits
ca4103025d in pg12 for tables and 33e6c34c32 in pg11 for indexes),
it is possible to drop the tablespace, potentially causing various
problems.  One such was reported in bug #16577, where a rewriting ALTER
TABLE causes a server crash.

Protect against this by using pg_shdepend to keep track of tablespaces
when used for relations that don't keep physical files; we now abort a
tablespace drop if we see that the tablespace is referenced from any
partitioned relations.

Backpatch this to 11, where this problem has been latent all along.  We
don't try to create pg_shdepend entries for existing partitioned
indexes/tables, but any ones that are modified going forward will be
protected.

Note slight behavior change: when trying to drop a tablespace that
contains both regular tables as well as partitioned ones, you'd
previously get ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE and now you'll
get ERRCODE_DEPENDENT_OBJECTS_STILL_EXIST.  Arguably, the latter is more
correct.

It is possible to add protecting pg_shdepend entries for existing
tables/indexes, by doing
  ALTER TABLE ONLY some_partitioned_table SET TABLESPACE pg_default;
  ALTER TABLE ONLY some_partitioned_table SET TABLESPACE original_tablespace;
for each partitioned table/index that is not in the database default
tablespace.  Because these partitioned objects do not have storage, no
file needs to be actually moved, so it shouldn't take more time than
what's required to acquire locks.

This query can be used to search for such relations:
SELECT ... FROM pg_class WHERE relkind IN ('p', 'I') AND reltablespace <> 0

Reported-by: Alexander Lakhin <exclusion@gmail.com>
Discussion: https://postgr.es/m/16577-881633a9f9894fd5@postgresql.org
Author: Álvaro Herrera <alvherre@alvh.no-ip.org>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
2021-01-14 15:32:14 -03:00
Peter Geoghegan
9dc718bdf2 Pass down "logically unchanged index" hint.
Add an executor aminsert() hint mechanism that informs index AMs that
the incoming index tuple (the tuple that accompanies the hint) is not
being inserted by execution of an SQL statement that logically modifies
any of the index's key columns.

The hint is received by indexes when an UPDATE takes place that does not
apply an optimization like heapam's HOT (though only for indexes where
all key columns are logically unchanged).  Any index tuple that receives
the hint on insert is expected to be a duplicate of at least one
existing older version that is needed for the same logical row.  Related
versions will typically be stored on the same index page, at least
within index AMs that apply the hint.

Recognizing the difference between MVCC version churn duplicates and
true logical row duplicates at the index AM level can help with cleanup
of garbage index tuples.  Cleanup can intelligently target tuples that
are likely to be garbage, without wasting too many cycles on less
promising tuples/pages (index pages with little or no version churn).

This is infrastructure for an upcoming commit that will teach nbtree to
perform bottom-up index deletion.  No index AM actually applies the hint
just yet.

Author: Peter Geoghegan <pg@bowt.ie>
Reviewed-By: Victor Yegorov <vyegorov@gmail.com>
Discussion: https://postgr.es/m/CAH2-Wz=CEKFa74EScx_hFVshCOn6AA5T-ajFASTdzipdkLTNQQ@mail.gmail.com
2021-01-13 08:11:00 -08:00
Tomas Vondra
8a4f618e7a Report progress of COPY commands
This commit introduces a view pg_stat_progress_copy, reporting progress
of COPY commands.  This allows rough estimates of how far a running COPY
has progressed, with the caveat that the total number of bytes may not be
available in some cases (e.g. when the input comes from the client).

Author: Josef Šimánek
Reviewed-by: Fujii Masao, Bharath Rupireddy, Vignesh C, Matthias van de Meent
Discussion: https://postgr.es/m/CAFp7QwqMGEi4OyyaLEK9DR0+E+oK3UtA4bEjDVCa4bNkwUY2PQ@mail.gmail.com
Discussion: https://postgr.es/m/CAFp7Qwr6_FmRM6pCO0x_a0mymOfX_Gg+FEKet4XaTGSW=LitKQ@mail.gmail.com
2021-01-06 21:51:06 +01:00
Thomas Munro
c0d4f6d897 Rename "enum blacklist" to "uncommitted enums".
We agreed to remove this terminology and use something more descriptive.

Discussion: https://postgr.es/m/20200615182235.x7lch5n6kcjq4aue%40alap3.anarazel.de
2021-01-05 12:38:48 +13:00
Bruce Momjian
ca3b37487b Update copyright for 2021
Backpatch-through: 9.5
2021-01-02 13:06:25 -05:00
Alexander Korotkov
6df7a9698b Multirange datatypes
Multiranges are basically sorted arrays of non-overlapping ranges with
set-theoretic operations defined over them.

Since v14, each range type automatically gets a corresponding multirange
datatype.  There are both manual and automatic mechanisms for naming multirange
types.  One can specify the multirange type name using the multirange_type_name
attribute in CREATE TYPE.  Otherwise, a multirange type name is generated
automatically.  If the range type name contains "range" then we change that to
"multirange".  Otherwise, we add "_multirange" to the end.

The implementation of multiranges comes with a space-efficient internal
representation format, which avoids extra padding and duplicated storage of
OIDs.  Altogether this format allows fetching a particular range by its index
in O(n).
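
A brief sketch of the naming rules described above, with hypothetical
type names:

  -- automatic naming: "textrange" contains "range", so the multirange
  -- type is named "textmultirange"
  CREATE TYPE textrange AS RANGE (subtype = text);
  -- manual naming via the new attribute
  CREATE TYPE floatrange AS RANGE (
      subtype = float8,
      multirange_type_name = floatspan
  );
  SELECT '{[1,3), [5,7)}'::int4multirange;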

Statistics gathering and selectivity estimation are implemented for multiranges.
For this purpose, a stored multirange is approximated as a union range without gaps.
This field will likely need improvements in the future.

Catversion is bumped.

Discussion: https://postgr.es/m/CALNJ-vSUpQ_Y%3DjXvTxt1VYFztaBSsWVXeF1y6gTYQ4bOiWDLgQ%40mail.gmail.com
Discussion: https://postgr.es/m/a0b8026459d1e6167933be2104a6174e7d40d0ab.camel%40j-davis.com#fe7218c83b08068bfffb0c5293eceda0
Author: Paul Jungwirth, revised by me
Reviewed-by: David Fetter, Corey Huinker, Jeff Davis, Pavel Stehule
Reviewed-by: Alvaro Herrera, Tom Lane, Isaac Morland, David G. Johnston
Reviewed-by: Zhihong Yu, Alexander Korotkov
2020-12-20 07:20:33 +03:00
Tom Lane
b3817f5f77 Improve hash_create()'s API for some added robustness.
Invent a new flag bit HASH_STRINGS to specify C-string hashing, which
was formerly the default; and add assertions insisting that exactly
one of the bits HASH_STRINGS, HASH_BLOBS, and HASH_FUNCTION be set.
This is in hopes of preventing recurrences of the type of oversight
fixed in commit a1b8aa1e4 (i.e., mistakenly omitting HASH_BLOBS).

Also, when HASH_STRINGS is specified, insist that the keysize be
more than 8 bytes.  This is a heuristic, but it should catch
accidental use of HASH_STRINGS for integer or pointer keys.
(Nearly all existing use-cases set the keysize to NAMEDATALEN or
more, so there's little reason to think this restriction should
be problematic.)

Tweak hash_create() to insist that the HASH_ELEM flag be set, and
remove the defaults it had for keysize and entrysize.  Since those
defaults were undocumented and basically useless, no callers
omitted HASH_ELEM anyway.

Also, remove memset's zeroing the HASHCTL parameter struct from
those callers that had one.  This has never been really necessary,
and while it wasn't a bad coding convention it was confusing that
some callers did it and some did not.  We might as well save a few
cycles by standardizing on "not".

Also improve the documentation for hash_create().

In passing, improve reinit.c's usage of a hash table by storing
the key as a binary Oid rather than a string; and, since that's
a temporary hash table, allocate it in CurrentMemoryContext for
neatness.

Discussion: https://postgr.es/m/590625.1607878171@sss.pgh.pa.us
2020-12-15 11:38:53 -05:00
Tom Lane
c7aba7c14e Support subscripting of arbitrary types, not only arrays.
This patch generalizes the subscripting infrastructure so that any
data type can be subscripted, if it provides a handler function to
define what that means.  Traditional variable-length (varlena) arrays
all use array_subscript_handler(), while the existing fixed-length
types that support subscripting use raw_array_subscript_handler().
It's expected that other types that want to use subscripting notation
will define their own handlers.  (This patch provides no such new
features, though; it only lays the foundation for them.)

To do this, move the parser's semantic processing of subscripts
(including coercion to whatever data type is required) into a
method callback supplied by the handler.  On the execution side,
replace the ExecEvalSubscriptingRef* layer of functions with direct
calls to callback-supplied execution routines.  (Thus, essentially
no new run-time overhead should be caused by this patch.  Indeed,
there is room to remove some overhead by supplying specialized
execution routines.  This patch does a little bit in that line,
but more could be done.)

Additional work is required here and there to remove formerly
hard-wired assumptions about the result type, collation, etc
of a SubscriptingRef expression node; and to remove assumptions
that the subscript values must be integers.

One useful side-effect of this is that we now have a less squishy
mechanism for identifying whether a data type is a "true" array:
instead of wiring in weird rules about typlen, we can look to see
if pg_type.typsubscript == F_ARRAY_SUBSCRIPT_HANDLER.  For this
to be bulletproof, we have to forbid user-defined types from using
that handler directly; but there seems no good reason for them to
do so.
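
For instance, the "true array" test described here can be expressed in
SQL roughly as:

  SELECT typname
    FROM pg_type
   WHERE typsubscript = 'array_subscript_handler'::regproc;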

This patch also removes assumptions that the number of subscripts
is limited to MAXDIM (6), or indeed has any hard-wired limit.
That limit still applies to types handled by array_subscript_handler
or raw_array_subscript_handler, but to discourage other dependencies
on this constant, I've moved it from c.h to utils/array.h.

Dmitry Dolgov, reviewed at various times by Tom Lane, Arthur Zakirov,
Peter Eisentraut, Pavel Stehule

Discussion: https://postgr.es/m/CA+q6zcVDuGBv=M0FqBYX8DPebS3F_0KQ6OVFobGJPM507_SZ_w@mail.gmail.com
Discussion: https://postgr.es/m/CA+q6zcVovR+XY4mfk-7oNk-rF91gH0PebnNfuUjuuDsyHjOcVA@mail.gmail.com
2020-12-09 12:40:37 -05:00
Peter Eisentraut
8b069ef5dc Change get_constraint_index() to use pg_constraint.conindid
It was still using a scan of pg_depend instead of using the conindid
column that has been added since.

Since it is now just a catalog lookup wrapper and not related to
pg_depend, move from pg_depend.c to lsyscache.c.

Reviewed-by: Matthias van de Meent <boekewurm+postgres@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://www.postgresql.org/message-id/flat/4688d55c-9a2e-9a5a-d166-5f24fe0bf8db%40enterprisedb.com
2020-12-09 15:41:45 +01:00
Fujii Masao
01469241b2 Track total number of WAL records, FPIs and bytes generated in the cluster.
Commit 6b466bf5f2 allowed pg_stat_statements to track the number of
WAL records, full page images and bytes that each statement generated.
Similarly this commit allows us to track the cluster-wide WAL statistics
counters.

New columns wal_records, wal_fpi and wal_bytes are added to the
pg_stat_wal view, and report the total number of WAL records,
full page images and bytes generated in the cluster, respectively.

Author: Masahiro Ikeda
Reviewed-by: Amit Kapila, Movead Li, Kyotaro Horiguchi, Fujii Masao
Discussion: https://postgr.es/m/35ef960128b90bfae3b3fdf60a3a860f@oss.nttdata.com
2020-12-02 13:00:15 +09:00
Tom Lane
789b938bf2 Centralize logic for skipping useless ereport/elog calls.
While ereport() and elog() themselves are quite cheap when the
error message level is too low to be printed, some places need to do
substantial work before they can call those macros at all.  To allow
optimizing away such setup work when nothing is to be printed, make
elog.c export a new function message_level_is_interesting(elevel)
that reports whether ereport/elog will do anything.  Make use of that
in various places that had ad-hoc direct tests of log_min_messages etc.
Also teach ProcSleep to use it to avoid some work.  (There may well
be other places that could usefully use this; I didn't search hard.)

Within elog.c, refactor a little bit to avoid having duplicate copies
of the policy-setting logic.  When that code was written, we weren't
relying on the availability of inline functions; so it had some
duplications in the name of efficiency, which I got rid of.

Alvaro Herrera and Tom Lane

Discussion: https://postgr.es/m/129515.1606166429@sss.pgh.pa.us
2020-11-23 19:10:46 -05:00
Tom Lane
926fa801ac Remove undocumented IS [NOT] OF syntax.
This feature was added a long time ago, in 7c1e67bd5 and eb121ba2c,
but never documented in any user-facing way.  (Documentation added
in 6126d3e70 was commented out almost immediately, in 8272fc3f7.)
That's because, while this syntax is defined by SQL:99, our
implementation is only vaguely related to the standard's semantics.
The standard appears to intend a run-time not parse-time test, and
it definitely intends that the test should understand subtype
relationships.

No one has stepped up to fix that in the intervening years, but
people keep coming across the code and asking why it's not documented.
Let's just get rid of it: if anyone ever wants to make it work per
spec, they can easily recover whatever parts of this code are still
of value from our git history.

If there's anyone out there who's actually using this despite its
undocumented status, they can switch to using pg_typeof() instead,
eg. "pg_typeof(something) = 'mytype'::regtype".  That gives
essentially the same semantics as what our IS OF code did.
(We didn't have that function last time this was discussed, or
we would have ripped out IS OF then.)

Discussion: https://postgr.es/m/CAKFQuwZ2pTc-DSkOiTfjauqLYkNREeNZvWmeg12Q-_69D+sYZA@mail.gmail.com
Discussion: https://postgr.es/m/BAY20-F23E9F2B4DAB3E4E88D3623F99B0@phx.gbl
Discussion: https://postgr.es/m/3E7CF81D.1000203@joeconway.com
2020-11-19 17:39:39 -05:00