Commit Graph

21613 Commits

Author SHA1 Message Date
Joe Conway b12bd4869b Fix has_column_privilege function corner case
According to the comments, when an invalid or dropped column oid is passed
to has_column_privilege(), the intention has always been to return NULL.
However, when the caller had table level privilege the invalid/missing
column was never discovered, because table permissions were checked first.

Fix that by introducing extended versions of pg_attribute_acl(check|mask)
and pg_class_acl(check|mask) which take a new argument, is_missing. When
is_missing is NULL, the old behavior is preserved. But when is_missing is
passed by the caller, no ERROR is thrown for dropped or missing
columns/relations, and is_missing is flipped to true. This in turn allows
has_column_privilege to check for column privileges first, providing the
desired semantics.

Not backpatched since it is a user visible behavioral change with no previous
complaints, and the fix is a bit on the invasive side.

Author: Joe Conway
Reviewed-By: Tom Lane
Reported by: Ian Barwick
Discussion: https://postgr.es/m/flat/9b5f4311-157b-4164-7fe7-077b4fe8ed84%40joeconway.com
2021-03-31 13:55:25 -04:00
Tom Lane 86dc90056d Rework planning and execution of UPDATE and DELETE.
This patch makes two closely related sets of changes:

1. For UPDATE, the subplan of the ModifyTable node now only delivers
the new values of the changed columns (i.e., the expressions computed
in the query's SET clause) plus row identity information such as CTID.
ModifyTable must re-fetch the original tuple to merge in the old
values of any unchanged columns.  The core advantage of this is that
the changed columns are uniform across all tables of an inherited or
partitioned target relation, whereas the other columns might not be.
A secondary advantage, when the UPDATE involves joins, is that less
data needs to pass through the plan tree.  The disadvantage of course
is an extra fetch of each tuple to be updated.  However, that seems to
be very nearly free in context; even worst-case tests don't show it to
add more than a couple percent to the total query cost.  At some point
it might be interesting to combine the re-fetch with the tuple access
that ModifyTable must do anyway to mark the old tuple dead; but that
would require a good deal of refactoring and it seems it wouldn't buy
all that much, so this patch doesn't attempt it.

2. For inherited UPDATE/DELETE, instead of generating a separate
subplan for each target relation, we now generate a single subplan
that is just exactly like a SELECT's plan, then stick ModifyTable
on top of that.  To let ModifyTable know which target relation a
given incoming row refers to, a tableoid junk column is added to
the row identity information.  This gets rid of the horrid hack
that was inheritance_planner(), eliminating O(N^2) planning cost
and memory consumption in cases where there were many unprunable
target relations.

Point 2 of course requires point 1, so that there is a uniform
definition of the non-junk columns to be returned by the subplan.
We can't insist on uniform definition of the row identity junk
columns however, if we want to keep the ability to have both
plain and foreign tables in a partitioning hierarchy.  Since
it wouldn't scale very far to have every child table have its
own row identity column, this patch includes provisions to merge
similar row identity columns into one column of the subplan result.
In particular, we can merge the whole-row Vars typically used as
row identity by FDWs into one column by pretending they are type
RECORD.  (It's still okay for the actual composite Datums to be
labeled with the table's rowtype OID, though.)

There is more that can be done to file down residual inefficiencies
in this patch, but it seems to be committable now.

FDW authors should note several API changes:

* The argument list for AddForeignUpdateTargets() has changed, and so
has the method it must use for adding junk columns to the query.  Call
add_row_identity_var() instead of manipulating the parse tree directly.
You might want to reconsider exactly what you're adding, too.

* PlanDirectModify() must now work a little harder to find the
ForeignScan plan node; if the foreign table is part of a partitioning
hierarchy then the ForeignScan might not be the direct child of
ModifyTable.  See postgres_fdw for sample code.

* To check whether a relation is a target relation, it's no
longer sufficient to compare its relid to root->parse->resultRelation.
Instead, check it against all_result_relids or leaf_result_relids,
as appropriate.

Amit Langote and Tom Lane

Discussion: https://postgr.es/m/CA+HiwqHpHdqdDn48yCEhynnniahH78rwcrv1rEX65-fsZGBOLQ@mail.gmail.com
2021-03-31 11:52:37 -04:00
Peter Eisentraut 055fee7eb4 Allow an alias to be attached to a JOIN ... USING
This allows something like

    SELECT ... FROM t1 JOIN t2 USING (a, b, c) AS x

where x has the columns a, b, c and unlike a regular alias it does not
hide the range variables of the tables being joined t1 and t2.

Per SQL:2016 feature F404 "Range variable for common column names".

Reviewed-by: Vik Fearing <vik.fearing@2ndquadrant.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://www.postgresql.org/message-id/flat/454638cf-d563-ab76-a585-2564428062af@2ndquadrant.com
2021-03-31 17:10:50 +02:00
Etsuro Fujita 27e1f14563 Add support for asynchronous execution.
This implements asynchronous execution, which runs multiple parts of a
non-parallel-aware Append concurrently rather than serially to improve
performance when possible.  Currently, the only node type that can be
run concurrently is a ForeignScan that is an immediate child of such an
Append.  In the case where such ForeignScans access data on different
remote servers, this would run those ForeignScans concurrently, and
overlap the remote operations to be performed simultaneously, so it'll
improve the performance especially when the operations involve
time-consuming ones such as remote join and remote aggregation.

We may extend this to other node types such as joins or aggregates over
ForeignScans in the future.

This also adds the support for postgres_fdw, which is enabled by the
table-level/server-level option "async_capable".  The default is false.

Robert Haas, Kyotaro Horiguchi, Thomas Munro, and myself.  This commit
is mostly based on the patch proposed by Robert Haas, but also uses
stuff from the patch proposed by Kyotaro Horiguchi and from the patch
proposed by Thomas Munro.  Reviewed by Kyotaro Horiguchi, Konstantin
Knizhnik, Andrey Lepikhov, Movead Li, Thomas Munro, Justin Pryzby, and
others.

Discussion: https://postgr.es/m/CA%2BTgmoaXQEt4tZ03FtQhnzeDEMzBck%2BLrni0UWHVVgOTnA6C1w%40mail.gmail.com
Discussion: https://postgr.es/m/CA%2BhUKGLBRyu0rHrDCMC4%3DRn3252gogyp1SjOgG8SEKKZv%3DFwfQ%40mail.gmail.com
Discussion: https://postgr.es/m/20200228.170650.667613673625155850.horikyota.ntt%40gmail.com
2021-03-31 18:45:00 +09:00
Peter Eisentraut 66392d3965 Add p_names field to ParseNamespaceItem
ParseNamespaceItem had a wired-in assumption that p_rte->eref
describes the table and column aliases exposed by the nsitem.  This
relaxes this by creating a separate p_names field in an nsitem.  This
is mainly preparation for a patch for JOIN USING aliases, but it saves
one indirection in common code paths, so it's possibly a win on its
own.

Author: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://www.postgresql.org/message-id/785329.1616455091@sss.pgh.pa.us
2021-03-31 10:52:37 +02:00
Peter Eisentraut 91c5a8caaa Add errhint_plural() function and make use of it
Similar to existing errmsg_plural() and errdetail_plural().  Some
errhint() calls hadn't received the proper plural treatment yet.
2021-03-31 09:16:25 +02:00
Noah Misch 0ff8bbdee1 Accept slightly-filled pages for tuples larger than fillfactor.
We always inserted a larger-than-fillfactor tuple into a newly-extended
page, even when existing pages were empty or contained nothing but an
unused line pointer.  This was unnecessary relation extension.  Start
tolerating page usage up to 1/8 the maximum space that could be taken up
by line pointers.  This is somewhat arbitrary, but it should allow more
cases to reuse pages.  This has no effect on tables with fillfactor=100
(the default).

John Naylor and Floris van Nee.  Reviewed by Matthias van de Meent.
Reported by Floris van Nee.

Discussion: https://postgr.es/m/6e263217180649339720afe2176c50aa@opammb0562.comp.optiver.com
2021-03-30 18:53:44 -07:00
Tom Lane 65158f497a Remove small inefficiency in ExecARDeleteTriggers/ExecARUpdateTriggers.
Whilst poking at nodeModifyTable.c, I chanced to notice that while
its calls to ExecBR*Triggers and ExecIR*Triggers are protected by
tests to see if there are any relevant triggers to fire, its calls
to ExecAR*Triggers are not; the latter functions do the equivalent
tests themselves.  This seems possibly reasonable given the more
complex conditions involved, but what's less reasonable is that
the ExecAR* functions aren't careful to do no work when there is
no work to be done.  ExecARInsertTriggers gets this right, but the
other two will both force creation of a slot that the query may
have no use for.  ExecARUpdateTriggers additionally performed a
usually-useless ExecClearTuple() on that slot.  This is probably
all pretty microscopic in real workloads, but a cycle shaved is a
cycle earned.
2021-03-30 20:01:31 -04:00
Bruce Momjian 5da9868ed9 In messages, use singular nouns for -1, like we do for +1.
This outputs "-1 year", not "-1 years".

Reported-by: neverov.max@gmail.com

Bug: 16939

Discussion: https://postgr.es/m/16939-cceeb03fb72736ee@postgresql.org
2021-03-30 18:34:27 -04:00
Stephen Frost 4753ef37e0 Use a WaitLatch for vacuum/autovacuum sleeping
Instead of using pg_usleep() in vacuum_delay_point(), use a WaitLatch.
This has the advantage that we will realize if the postmaster has been
killed since the last time we decided to sleep while vacuuming.

Reviewed-by: Thomas Munro
Discussion: https://postgr.es/m/CAFh8B=kcdk8k-Y21RfXPu5dX=bgPqJ8TC3p_qxR_ygdBS=JN5w@mail.gmail.com
2021-03-30 12:52:56 -04:00
David Rowley ed934d4fa3 Allow estimate_num_groups() to pass back further details about the estimation
Here we add a new output parameter to estimate_num_groups() to allow it to
inform the caller of additional, possibly useful information about the
estimation.

The new output parameter is a struct that currently contains just a single
field with a set of flags.  This was done rather than having the flags as
an output parameter to allow future fields to be added without having to
change the signature of the function at a later date when we want to pass
back further information that might not be suitable to store in the flags
field.

It seems reasonable that one day in the future that the planner would want
to know more about the estimation. For example, how many individual sets
of statistics was the estimation generated from?  The planner may want to
take that into account if we ever want to consider risks as well as costs
when generating plans.

For now, there's only 1 flag we set in the flags field.  This is to
indicate if the estimation fell back on using the hard-coded constants in
any part of the estimation. Callers may like to change their behavior if
this is set, and this gives them the ability to do so.  Callers may pass
the flag pointer as NULL if they have no interest in obtaining any
additional information about the estimate.

We're not adding any actual usages of these flags here.  Some follow-up
commits will make use of this feature.  Additionally, we're also not
making any changes to add support for clauselist_selectivity() and
clauselist_selectivity_ext().  However, if this is required in the future
then the same struct being added here should be fine to use as a new
output argument for those functions too.

Author: David Rowley
Discussion: https://postgr.es/m/CAApHDvqQqpk=1W-G_ds7A9CsXX3BggWj_7okinzkLVhDubQzjA@mail.gmail.com
2021-03-30 20:52:46 +13:00
David Rowley efd9d92bb3 Fix compiler warning in unistr function
Some compilers are not aware that elog/ereport ERROR does not return.
2021-03-30 20:28:09 +13:00
Amit Kapila f64ea6dc5c Add a xid argument to the filter_prepare callback for output plugins.
Along with gid, this provides a different way to identify the transaction.
The users that use xid in some way to prepare the transactions can use it
to filter prepare transactions. The later commands COMMIT PREPARED or
ROLLBACK PREPARED carries both identifiers, providing an output plugin the
choice of what to use.

Author: Markus Wanner
Reviewed-by: Vignesh C, Amit Kapila
Discussion: https://postgr.es/m/ee280000-7355-c4dc-e47b-2436e7be959c@enterprisedb.com
2021-03-30 10:34:43 +05:30
David Rowley af527705ed Adjust design of per-worker parallel seqscan data struct
The design of the data structures which allow storage of the per-worker
memory during parallel seq scans were not ideal. The work done in
56788d215 required an additional data structure to allow workers to
remember the range of pages that had been allocated to them for
processing during a parallel seqscan.  That commit added a void pointer
field to TableScanDescData to allow heapam to store the per-worker
allocation information.  However putting the field there made very little
sense given that we have AM specific structs for that, e.g.
HeapScanDescData.

Here we remove the void pointer field from TableScanDescData and add a
dedicated field for this purpose to HeapScanDescData.

Previously we also allocated memory for this parallel per-worker data for
all scans, regardless if it was a parallel scan or not.  This was just a
wasted allocation for non-parallel scans, so here we make the allocation
conditional on the scan being parallel.

Also, add previously missing pfree() to free the per-worker data in
heap_endscan().

Reported-by: Andres Freund
Reviewed-by: Andres Freund
Discussion: https://postgr.es/m/20210317023101.anvejcfotwka6gaa@alap3.anarazel.de
2021-03-30 10:17:09 +13:00
Andrew Dunstan 6d7a6feac4 Allow matching the DN of a client certificate for authentication
Currently we only recognize the Common Name (CN) of a certificate's
subject to be matched against the user name. Thus certificates with
subjects '/OU=eng/CN=fred' and '/OU=sales/CN=fred' will have the same
connection rights. This patch provides an option to match the whole
Distinguished Name (DN) instead of just the CN. On any hba line using
client certificate identity, there is an option 'clientname' which can
have values of 'DN' or 'CN'. The default is 'CN', the current procedure.

The DN is matched against the RFC2253 formatted DN, which looks like
'CN=fred,OU=eng'.

This facility of probably best used in conjunction with an ident map.

Discussion: https://postgr.es/m/92e70110-9273-d93c-5913-0bccb6562740@dunslane.net

Reviewed-By: Michael Paquier, Daniel Gustafsson, Jacob Champion
2021-03-29 15:49:39 -04:00
Peter Eisentraut f37fec837c Add unistr function
This allows decoding a string with Unicode escape sequences.  It is
similar to Unicode escape strings, but offers some more flexibility.

Author: Pavel Stehule <pavel.stehule@gmail.com>
Reviewed-by: Asif Rehman <asifr.rehman@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/CAFj8pRA5GnKT+gDVwbVRH2ep451H_myBt+NTz8RkYUARE9+qOQ@mail.gmail.com
2021-03-29 11:56:53 +02:00
Peter Geoghegan 30aaab26e5 PageAddItemExtended(): Add LP_UNUSED assertion.
Assert that LP_UNUSED items have no storage.  If it's worth having
defensive code in non-assert builds then it's worth having an assertion
as well.
2021-03-28 20:10:02 -07:00
David Rowley f58b230ed0 Cache if PathTarget and RestrictInfos contain volatile functions
Here we aim to reduce duplicate work done by contain_volatile_functions()
by caching whether PathTargets and RestrictInfos contain any volatile
functions the first time contain_volatile_functions() is called for them.
Any future calls for these nodes just use the cached value rather than
going to the trouble of recursively checking the sub-node all over again.
Thanks to Tom Lane for the idea.

Any locations in the code which make changes to a PathTarget or
RestrictInfo which could change the outcome of the volatility check must
change the cached value back to VOLATILITY_UNKNOWN again.
contain_volatile_functions() is the only code in charge of setting the
cache value to either VOLATILITY_VOLATILE or VOLATILITY_NOVOLATILE.

Some existing code does benefit from this additional caching, however,
this change is mainly aimed at an upcoming patch that must check for
volatility during the join search.  Repeated volatility checks in that
case can become very expensive when the join search contains more than a
few relations.

Author: David Rowley
Discussion: https://postgr.es/m/3795226.1614059027@sss.pgh.pa.us
2021-03-29 14:55:26 +13:00
Peter Eisentraut 8df2f37114 Improve consistency of SQL code capitalization 2021-03-27 10:17:12 +01:00
Tomas Vondra a4d75c86bf Extended statistics on expressions
Allow defining extended statistics on expressions, not just just on
simple column references.  With this commit, expressions are supported
by all existing extended statistics kinds, improving the same types of
estimates. A simple example may look like this:

  CREATE TABLE t (a int);
  CREATE STATISTICS s ON mod(a,10), mod(a,20) FROM t;
  ANALYZE t;

The collected statistics are useful e.g. to estimate queries with those
expressions in WHERE or GROUP BY clauses:

  SELECT * FROM t WHERE mod(a,10) = 0 AND mod(a,20) = 0;

  SELECT 1 FROM t GROUP BY mod(a,10), mod(a,20);

This introduces new internal statistics kind 'e' (expressions) which is
built automatically when the statistics object definition includes any
expressions. This represents single-expression statistics, as if there
was an expression index (but without the index maintenance overhead).
The statistics is stored in pg_statistics_ext_data as an array of
composite types, which is possible thanks to 79f6a942bd.

CREATE STATISTICS allows building statistics on a single expression, in
which case in which case it's not possible to specify statistics kinds.

A new system view pg_stats_ext_exprs can be used to display expression
statistics, similarly to pg_stats and pg_stats_ext views.

ALTER TABLE ... ALTER COLUMN ... TYPE now treats indexes the same way it
treats indexes, i.e. it drops and recreates the statistics. This means
all statistics are reset, and we no longer try to preserve at least the
functional dependencies. This should not be a major issue in practice,
as the functional dependencies actually rely on per-column statistics,
which were always reset anyway.

Author: Tomas Vondra
Reviewed-by: Justin Pryzby, Dean Rasheed, Zhihong Yu
Discussion: https://postgr.es/m/ad7891d2-e90c-b446-9fe2-7419143847d7%40enterprisedb.com
2021-03-27 00:01:11 +01:00
Tomas Vondra 33e52ad9a3 Fix ndistinct estimates with system attributes
When estimating the number of groups using extended statistics, the code
was discarding information about system attributes. This led to strange
situation that

    SELECT 1 FROM t GROUP BY ctid;

could have produced higher estimate (equal to pg_class.reltuples) than

    SELECT 1 FROM t GROUP BY a, b, ctid;

with extended statistics on (a,b). Fixed by retaining information about
the system attribute.

Backpatch all the way to 10, where extended statistics were introduced.

Author: Tomas Vondra
Backpatch-through: 10
2021-03-26 22:34:58 +01:00
Noah Misch a14a0118a1 Add "pg_database_owner" default role.
Membership consists, implicitly, of the current database owner.  Expect
use in template databases.  Once pg_database_owner has rights within a
template, each owner of a database instantiated from that template will
exercise those rights.

Reviewed by John Naylor.

Discussion: https://postgr.es/m/20201228043148.GA1053024@rfd.leadboat.com
2021-03-26 10:42:17 -07:00
Noah Misch f687bf61ed Merge similar algorithms into roles_is_member_of().
The next commit would have complicated two or three algorithms, so take
this opportunity to consolidate.  No functional changes.

Reviewed by John Naylor.

Discussion: https://postgr.es/m/20201228043148.GA1053024@rfd.leadboat.com
2021-03-26 10:42:16 -07:00
Tomas Vondra 73b96bad4a Fix alignment in BRIN minmax-multi deserialization
The deserialization failed to ensure correct alignment, as it assumed it
can simply point into the serialized value. The serialization however
ignores alignment and copies just the significant bytes in order to make
the result as small as possible. This caused failures on systems that
are sensitive to mialigned addresses, like sparc, or with address
sanitizer enabled.

Fixed by copying the serialized data to ensure proper alignment. While
at it, fix an issue with serialization on big endian machines, using the
same store_att_byval/fetch_att trick as extended statistics.

Discussion: https://postgr.es/0c8c3304-d3dd-5e29-d5ac-b50589a23c8c%40enterprisedb.com
2021-03-26 16:48:36 +01:00
Tomas Vondra ab596105b5 BRIN minmax-multi indexes
Adds BRIN opclasses similar to the existing minmax, except that instead
of summarizing the page range into a single [min,max] range, the summary
consists of multiple ranges and/or points, allowing gaps. This allows
more efficient handling of data with poor correlation to physical
location within the table and/or outlier values, for which the regular
minmax opclassed tend to work poorly.

It's possible to specify the number of values kept for each page range,
either as a single point or an interval boundary.

  CREATE TABLE t (a int);
  CREATE INDEX ON t
   USING brin (a int4_minmax_multi_ops(values_per_range=16));

When building the summary, the values are combined into intervals with
the goal to minimize the "covering" (sum of interval lengths), using a
support procedure computing distance between two values.

Bump catversion, due to various catalog changes.

Author: Tomas Vondra <tomas.vondra@postgresql.org>
Reviewed-by: Alvaro Herrera <alvherre@alvh.no-ip.org>
Reviewed-by: Alexander Korotkov <aekorotkov@gmail.com>
Reviewed-by: Sokolov Yura <y.sokolov@postgrespro.ru>
Reviewed-by: John Naylor <john.naylor@enterprisedb.com>
Discussion: https://postgr.es/m/c1138ead-7668-f0e1-0638-c3be3237e812@2ndquadrant.com
Discussion: https://postgr.es/m/5d78b774-7e9c-c94e-12cf-fef51cc89b1a%402ndquadrant.com
2021-03-26 13:54:30 +01:00
Tomas Vondra 77b88cd1bb BRIN bloom indexes
Adds a BRIN opclass using a Bloom filter to summarize the range. Indexes
using the new opclasses allow only equality queries (similar to hash
indexes), but that works fine for data like UUID, MAC addresses etc. for
which range queries are not very common. This also means the indexes
work for data that is not well correlated to physical location within
the table, or perhaps even entirely random (which is a common issue with
existing BRIN minmax opclasses).

It's possible to specify opclass parameters with the usual Bloom filter
parameters, i.e. the desired false-positive rate and the expected number
of distinct values per page range.

  CREATE TABLE t (a int);
  CREATE INDEX ON t
   USING brin (a int4_bloom_ops(false_positive_rate = 0.05,
                                n_distinct_per_range = 100));

The opclasses do not operate on the indexed values directly, but compute
a 32-bit hash first, and the Bloom filter is built on the hash value.
Collisions should not be a huge issue though, as the number of distinct
values in a page ranges is usually fairly small.

Bump catversion, due to various catalog changes.

Author: Tomas Vondra <tomas.vondra@postgresql.org>
Reviewed-by: Alvaro Herrera <alvherre@alvh.no-ip.org>
Reviewed-by: Alexander Korotkov <aekorotkov@gmail.com>
Reviewed-by: Sokolov Yura <y.sokolov@postgrespro.ru>
Reviewed-by: Nico Williams <nico@cryptonector.com>
Reviewed-by: John Naylor <john.naylor@enterprisedb.com>
Discussion: https://postgr.es/m/c1138ead-7668-f0e1-0638-c3be3237e812@2ndquadrant.com
Discussion: https://postgr.es/m/5d78b774-7e9c-c94e-12cf-fef51cc89b1a%402ndquadrant.com
2021-03-26 13:35:32 +01:00
Tomas Vondra a681e3c107 Support the old signature of BRIN consistent function
Commit a1c649d889 changed the signature of the BRIN consistent function
by adding a new required parameter.  Treating the parameter as optional,
which would make the change backwards incompatibile, was rejected with
the justification that there are few out-of-core extensions, so it's not
worth adding making the code more complex, and it's better to deal with
that in the extension.

But after further thought, that would be rather problematic, because
pg_upgrade simply dumps catalog contents and the same version of an
extension needs to work on both PostgreSQL versions. Supporting both
variants of the consistent function (with 3 or 4 arguments) makes that
possible.

The signature is not the only thing that changed, as commit 72ccf55cb9
moved handling of IS [NOT] NULL keys from the support procedures. But
this change is backward compatible - handling the keys in exension is
unnecessary, but harmless. The consistent function will do a bit of
unnecessary work, but it should be very cheap.

This also undoes most of the changes to the existing opclasses (minmax
and inclusion), making them use the old signature again. This should
make backpatching simpler.

Catversion bump, because of changes in pg_amproc.

Author: Tomas Vondra <tomas.vondra@postgresql.org>
Author: Nikita Glukhov <n.gluhov@postgrespro.ru>
Reviewed-by: Mark Dilger <hornschnorter@gmail.com>
Reviewed-by: Alexander Korotkov <aekorotkov@gmail.com>
Reviewed-by: Masahiko Sawada <masahiko.sawada@enterprisedb.com>
Reviewed-by: John Naylor <john.naylor@enterprisedb.com>
Discussion: https://postgr.es/m/c1138ead-7668-f0e1-0638-c3be3237e812@2ndquadrant.com
2021-03-26 13:17:58 +01:00
Robert Haas 5db1fd7823 Fix interaction of TOAST compression with expression indexes.
Before, trying to compress a value for insertion into an expression
index would crash.

Dilip Kumar, with some editing by me. Report by Jaime Casanova.

Discussion: http://postgr.es/m/CAJKUy5gcs0zGOp6JXU2mMVdthYhuQpFk=S3V8DOKT=LZC1L36Q@mail.gmail.com
2021-03-25 19:55:32 -04:00
Alvaro Herrera 71f4c8c6f7
ALTER TABLE ... DETACH PARTITION ... CONCURRENTLY
Allow a partition be detached from its partitioned table without
blocking concurrent queries, by running in two transactions and only
requiring ShareUpdateExclusive in the partitioned table.

Because it runs in two transactions, it cannot be used in a transaction
block.  This is the main reason to use dedicated syntax: so that users
can choose to use the original mode if they need it.  But also, it
doesn't work when a default partition exists (because an exclusive lock
would still need to be obtained on it, in order to change its partition
constraint.)

In case the second transaction is cancelled or a crash occurs, there's
ALTER TABLE .. DETACH PARTITION .. FINALIZE, which executes the final
steps.

The main trick to make this work is the addition of column
pg_inherits.inhdetachpending, initially false; can only be set true in
the first part of this command.  Once that is committed, concurrent
transactions that use a PartitionDirectory will include or ignore
partitions so marked: in optimizer they are ignored if the row is marked
committed for the snapshot; in executor they are always included.  As a
result, and because of the way PartitionDirectory caches partition
descriptors, queries that were planned before the detach will see the
rows in the detached partition and queries that are planned after the
detach, won't.

A CHECK constraint is created that duplicates the partition constraint.
This is probably not strictly necessary, and some users will prefer to
remove it afterwards, but if the partition is re-attached to a
partitioned table, the constraint needn't be rechecked.

Author: Álvaro Herrera <alvherre@alvh.no-ip.org>
Reviewed-by: Amit Langote <amitlangote09@gmail.com>
Reviewed-by: Justin Pryzby <pryzby@telsasoft.com>
Discussion: https://postgr.es/m/20200803234854.GA24158@alvherre.pgsql
2021-03-25 18:00:28 -03:00
Alvaro Herrera cc121d5596
Add comments for AlteredTableInfo->rel
The prior commit which introduced it was pretty squalid in terms of
code documentation, so add some comments.
2021-03-25 16:07:15 -03:00
Alvaro Herrera cd03c6e94b
Let ALTER TABLE Phase 2 routines manage the relation pointer
Struct AlteredRelationInfo gains a new Relation member, to be used only
by Phase 2 (ATRewriteCatalogs); this allows ATExecCmd() subroutines open
and close the relation internally.

A future commit will use this facility to implement an ALTER TABLE
subcommand that closes and reopens the relation across transaction
boundaries.

(It is possible to keep the relation open past phase 2 to be used by
phase 3 instead of having to reopen it that point, but there are some
minor complications with that; it's not clear that there is much to be
won from doing that, though.)

Author: Álvaro Herrera <alvherre@alvh.no-ip.org>
Discussion: https://postgr.es/m/20200803234854.GA24158@alvherre.pgsql
2021-03-25 15:56:11 -03:00
Alvaro Herrera a24ae3d7b9
Remove StoreSingleInheritance reimplementation
I introduced this duplicate code in commit 8b08f7d482 for no good
reason.  Remove it, and backpatch to 11 where it was introduced.

Author: Álvaro Herrera <alvherre@alvh.no-ip.org>
2021-03-25 10:47:38 -03:00
Peter Eisentraut f2c7ce64ae Trim some extra whitespace in parser file 2021-03-25 10:17:52 +01:00
Peter Eisentraut 91d1f2d302 Rename a parse node to be more general
A WHERE clause will be used for row filtering in logical replication.
We already have a similar node: 'WHERE (condition here)'.  Let's
rename the node to a generic name and use it for row filtering too.

Author: Euler Taveira <euler.taveira@enterprisedb.com>
Discussion: https://www.postgresql.org/message-id/flat/CAHE3wggb715X+mK_DitLXF25B=jE6xyNCH4YOwM860JR7HarGQ@mail.gmail.com
2021-03-25 10:06:32 +01:00
Michael Paquier a1999a01bb Sanitize the term "combo CID" in code comments
Combo CIDs were referred in the code comments using different terms
across various places of the code, so unify a bit the term used with
what is currently in use in some of the READMEs.

Author: "Hou, Zhijie"
Discussion: https://postgr.es/m/1d42865c91404f46af4562532fdbea31@G08CNEXMBPEKD05.g08.fujitsu.local
2021-03-25 16:08:03 +09:00
Fujii Masao 438fc4a39c Fix bug in WAL replay of COMMIT_TS_SETTS record.
Previously the WAL replay of COMMIT_TS_SETTS record called
TransactionTreeSetCommitTsData() with the argument write_xlog=true,
which generated and wrote new COMMIT_TS_SETTS record.
This should not be acceptable because it's during recovery.

This commit fixes the WAL replay of COMMIT_TS_SETTS record
so that it calls TransactionTreeSetCommitTsData() with write_xlog=false
and doesn't generate new WAL during recovery.

Back-patch to all supported branches.

Reported-by: lx zou <zoulx1982@163.com>
Author: Fujii Masao
Reviewed-by: Alvaro Herrera
Discussion: https://postgr.es/m/16931-620d0f2fdc6108f1@postgresql.org
2021-03-25 11:23:30 +09:00
Fujii Masao df9384492b Improve connection denied error message during recovery.
Previously when an archive recovery or a standby was starting and
reached the consistent recovery state but hot_standby was configured
to off, the error message when a client connectted was "the database
system is starting up", which was needless confusing and not really
all that accurate either.

This commit improves the connection denied error message during
recovery, as follows, so that the users immediately know that their
servers are configured to deny those connections.

- If hot_standby is disabled, the error message "the database system
  is not accepting connections" and the detail message "Hot standby
  mode is disabled." are output when clients connect while an archive
  recovery or a standby is running.

- If hot_standby is enabled, the error message "the database system
  is not yet accepting connections" and the detail message
  "Consistent recovery state has not been yet reached." are output
  when clients connect until the consistent recovery state is reached
  and postmaster starts accepting read only connections.

This commit doesn't change the connection denied error message of
"the database system is starting up" during normal server startup and
crash recovery. Because it's still suitable for those situations.

Author: James Coleman
Reviewed-by: Alvaro Herrera, Andres Freund, David Zhang, Tom Lane, Fujii Masao
Discussion: https://postgr.es/m/CAAaqYe8h5ES_B=F_zDT+Nj9XU7YEwNhKhHA2RE4CFhAQ93hfig@mail.gmail.com
2021-03-25 10:41:28 +09:00
Peter Eisentraut 37c99d304d Fix stray double semicolons
Reported-by: John Naylor <john.naylor@enterprisedb.com>
2021-03-24 20:42:51 +01:00
Stephen Frost bbcc4eb2e0 Change checkpoint_completion_target default to 0.9
Common recommendations are that the checkpoint should be spread out as
much as possible, provided we avoid having it take too long.  This
change updates the default to 0.9 (from 0.5) to match that
recommendation.

There was some debate about possibly removing the option entirely but it
seems there may be some corner-cases where having it set much lower to
try to force the checkpoint to be as fast as possible could result in
fewer periods of time of reduced performance due to kernel flushing.
General agreement is that the "spread more" is the preferred approach
though and those who need to tune away from that value are much less
common.

Reviewed-By: Michael Paquier, Peter Eisentraut, Tom Lane, David Steele,
Nathan Bossart
Discussion: https://postgr.es/m/20201207175329.GM16415%40tamriel.snowman.net
2021-03-24 13:07:51 -04:00
Robert Haas e5595de03e Tidy up more loose ends related to configurable TOAST compression.
Change the default_toast_compression GUC to be an enum rather than
a string. Earlier, uncommitted versions of the patch supported using
CREATE ACCESS METHOD to add new compression methods to a running
system, but that idea was dropped before commit. So, we can simplify
the GUC handling as well, which has the nice side effect of improving
the error messages.

While updating the documentation to reflect the new GUC type, also
move it back to the right place in the list. I moved this while
revising what became commit 24f0e395ac,
but apparently the intended ordering is "alphabetical" rather than
"whatever Robert thinks looks nice."

Rejigger things to avoid having access/toast_compression.h depend on
utils/guc.h, so that we don't end up with every file that includes
it also depending on something largely unrelated. Move a few
inline functions back into the C source file partly to help reduce
dependencies and partly just to avoid clutter. A few very minor
cosmetic fixes.

Original patch by Justin Pryzby, but very heavily edited by me,
and reverse reviewed by him and also reviewed by by Tom Lane.

Discussion: http://postgr.es/m/CA+TgmoYp=GT_ztUCeZg2i4hkHAQv8o=-nVJ1-TKWTG1zQOmOpg@mail.gmail.com
2021-03-24 12:36:08 -04:00
Peter Eisentraut 49ab61f0bd Add date_bin function
Similar to date_trunc, but allows binning by an arbitrary interval
rather than just full units.

Author: John Naylor <john.naylor@enterprisedb.com>
Reviewed-by: David Fetter <david@fetter.org>
Reviewed-by: Isaac Morland <isaac.morland@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Artur Zakirov <zaartur@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/CACPNZCt4buQFRgy6DyjuZS-2aPDpccRkrJBmgUfwYc1KiaXYxg@mail.gmail.com
2021-03-24 16:18:24 +01:00
Peter Eisentraut 1509c6fc29 Improve an error message
Make it the same as another nearby message.
2021-03-24 08:02:06 +01:00
Amit Kapila 26acb54a13 Revert "Enable parallel SELECT for "INSERT INTO ... SELECT ..."."
To allow inserts in parallel-mode this feature has to ensure that all the
constraints, triggers, etc. are parallel-safe for the partition hierarchy
which is costly and we need to find a better way to do that. Additionally,
we could have used existing cached information in some cases like indexes,
domains, etc. to determine the parallel-safety.

List of commits reverted, in reverse chronological order:

ed62d3737c Doc: Update description for parallel insert reloption.
c8f78b6161 Add a new GUC and a reloption to enable inserts in parallel-mode.
c5be48f092 Improve FK trigger parallel-safety check added by 05c8482f7f.
e2cda3c20a Fix use of relcache TriggerDesc field introduced by commit 05c8482f7f.
e4e87a32cc Fix valgrind issue in commit 05c8482f7f.
05c8482f7f Enable parallel SELECT for "INSERT INTO ... SELECT ...".

Discussion: https://postgr.es/m/E1lMiB9-0001c3-SY@gemulon.postgresql.org
2021-03-24 11:29:15 +05:30
Fujii Masao 84007043fc Rename wait event WalrcvExit to WalReceiverExit.
Commit de829ddf23 added wait event WalrcvExit. But its name is not
consistent with other wait events like WalReceiverMain or
WalReceiverWaitStart, etc. So this commit renames WalrcvExit to
WalReceiverExit.

Author: Fujii Masao
Reviewed-by: Thomas Munro
Discussion: https://postgr.es/m/cced9995-8fa2-7b22-9d91-3f22a2b8c23c@oss.nttdata.com
2021-03-24 10:37:54 +09:00
Fujii Masao 7fbcee1b2d Log when GetNewOidWithIndex() fails to find unused OID many times.
GetNewOidWithIndex() generates a new OID one by one until it finds
one not in the relation. If there are very long runs of consecutive
existing OIDs, GetNewOidWithIndex() needs to iterate many times
in the loop to find unused OID. Since TOAST table can have a large
number of entries and there can be such long runs of OIDs, there is
the case where it takes so many iterations to find new OID not in
TOAST table. Furthermore if all (i.e., 2^32) OIDs are already used,
GetNewOidWithIndex() enters something like busy loop and repeats
the iterations until at least one OID is marked as unused.

There are some reported troubles caused by a large number of
iterations in GetNewOidWithIndex(). For example, when inserting
a billion of records into the table, all the backends doing that
insertion operation got hang with 100% CPU usage at some point.

Previously there was no easy way to detect that GetNewOidWithIndex()
failed to find unused OID many times. So, for example, gdb full
backtrace of hanged backends needed to be taken, in order to
investigate that trouble. This is inconvenient and may not be
available in some production environments.

To provide easy way for that, this commit makes GetNewOidWithIndex()
log that it iterates more than GETNEWOID_LOG_THRESHOLD but have
not yet found OID unused in the relation. Also this commit makes
it repeat logging with exponentially increasing intervals until
it iterates more than GETNEWOID_LOG_MAX_INTERVAL, and makes it
finally repeat logging every GETNEWOID_LOG_MAX_INTERVAL unless
an unused OID is found. Those macro variables are used not to
fill up the server log with the similar messages.

In the discusion at pgsql-hackers, there was another idea to report
the lots of iterations in GetNewOidWithIndex() via wait event.
But since GetNewOidWithIndex() traverses indexes to find unused
OID and which will do I/O, acquire locks, etc, which will overwrite
the wait event and reset it to nothing once done. So that idea
doesn't work well, and we didn't adopt it.

Author: Tomohiro Hiramitsu
Reviewed-by: Tatsuhito Kasahara, Kyotaro Horiguchi, Tom Lane, Fujii Masao
Discussion: https://postgr.es/m/16722-93043fb459a41073@postgresql.org
2021-03-24 10:36:56 +09:00
Michael Paquier 99dd75fb99 Reword slightly logs generated for index stats in autovacuum
Using "remain" is confusing, as it implies that the index file can
shrink.  Instead, use "in total".

Per discussion with Peter Geoghegan.

Discussion: https://postgr.es/m/CAH2-WzkYgHZzpGOwR14CScJsjaQpvJrEkEfkh_=wGhzLb=yVdQ@mail.gmail.com
2021-03-24 09:36:03 +09:00
Tomas Vondra 79f6a942bd Allow composite types in catalog bootstrap
When resolving types during catalog bootstrap, try to reload the pg_type
contents if a type is not found. That allows catalogs to contain
composite types, e.g. row types for other catalogs.

Author: Justin Pryzby
Reviewed-by: Dean Rasheed, Tomas Vondra
Discussion: https://postgr.es/m/ad7891d2-e90c-b446-9fe2-7419143847d7%40enterprisedb.com
2021-03-24 00:47:52 +01:00
Tomas Vondra e1a5e65703 Convert Typ from array to list in bootstrap
It's a bit easier and more convenient to free and reload a List,
compared to a plain array. This will be helpful when allowing catalogs
to contain composite types.

Author: Justin Pryzby
Reviewed-by: Dean Rasheed, Tomas Vondra
Discussion: https://postgr.es/m/ad7891d2-e90c-b446-9fe2-7419143847d7%40enterprisedb.com
2021-03-24 00:47:40 +01:00
Peter Geoghegan 5b861baa55 nbtree VACUUM: Cope with buggy opclasses.
Teach nbtree VACUUM to press on with vacuuming in the event of a page
deletion attempt that fails to "re-find" a downlink for its child/target
page.

There is no good reason to treat this as an irrecoverable error.  But
there is a good reason not to: pressing on at this point removes any
question of VACUUM not making progress solely due to misbehavior from
user-defined operator class code.

Discussion: https://postgr.es/m/CAH2-Wzma5G9CTtMjbrXTwOym+U=aWg-R7=-htySuztgoJLvZXg@mail.gmail.com
2021-03-23 16:09:51 -07:00
Tom Lane 9d523119fd Avoid possible crash while finishing up a heap rewrite.
end_heap_rewrite was not careful to ensure that the target relation
is open at the smgr level before performing its final smgrimmedsync.
In ordinary cases this is no problem, because it would have been
opened earlier during the rewrite.  However a crash can be reproduced
by re-clustering an empty table with CLOBBER_CACHE_ALWAYS enabled.

Although that exact scenario does not crash in v13, I think that's
a chance result of unrelated planner changes, and the problem is
likely still reachable with other test cases.  The true proximate
cause of this failure is commit c6b92041d, which replaced a call to
heap_sync (which was careful about opening smgr) with a direct call
to smgrimmedsync.  Hence, back-patch to v13.

Amul Sul, per report from Neha Sharma; cosmetic changes
and test case by me.

Discussion: https://postgr.es/m/CANiYTQsU7yMFpQYnv=BrcRVqK_3U3mtAzAsJCaqtzsDHfsUbdQ@mail.gmail.com
2021-03-23 11:24:16 -04:00