/*-------------------------------------------------------------------------
 *
 * indexcmds.c
 *		POSTGRES define and remove index code.
 *
 * Portions Copyright (c) 1996-2022, PostgreSQL Global Development Group
 * Portions Copyright (c) 1994, Regents of the University of California
 *
 *
 * IDENTIFICATION
 *		src/backend/commands/indexcmds.c
 *
 *-------------------------------------------------------------------------
 */

#include "postgres.h"

#include "access/amapi.h"
#include "access/heapam.h"
#include "access/htup_details.h"
#include "access/reloptions.h"
#include "access/sysattr.h"
#include "access/tableam.h"
#include "access/xact.h"
#include "catalog/catalog.h"
#include "catalog/index.h"
#include "catalog/indexing.h"
#include "catalog/pg_am.h"
#include "catalog/pg_constraint.h"
#include "catalog/pg_inherits.h"
#include "catalog/pg_opclass.h"
#include "catalog/pg_opfamily.h"
#include "catalog/pg_tablespace.h"
#include "catalog/pg_type.h"
#include "commands/comment.h"
#include "commands/dbcommands.h"
#include "commands/defrem.h"
#include "commands/event_trigger.h"
#include "commands/progress.h"
#include "commands/tablecmds.h"
#include "commands/tablespace.h"
#include "mb/pg_wchar.h"
#include "miscadmin.h"
#include "nodes/makefuncs.h"
#include "nodes/nodeFuncs.h"
#include "optimizer/optimizer.h"
#include "parser/parse_coerce.h"
#include "parser/parse_func.h"
#include "parser/parse_oper.h"
#include "partitioning/partdesc.h"
#include "pgstat.h"
#include "rewrite/rewriteManip.h"
#include "storage/lmgr.h"
#include "storage/proc.h"
#include "storage/procarray.h"
#include "storage/sinvaladt.h"
#include "utils/acl.h"
#include "utils/builtins.h"
#include "utils/fmgroids.h"
#include "utils/inval.h"
#include "utils/lsyscache.h"
#include "utils/memutils.h"
#include "utils/partcache.h"
#include "utils/pg_rusage.h"
#include "utils/regproc.h"
#include "utils/snapmgr.h"
#include "utils/syscache.h"


/* non-export function prototypes */
static bool CompareOpclassOptions(Datum *opts1, Datum *opts2, int natts);
static void CheckPredicate(Expr *predicate);
static void ComputeIndexAttrs(IndexInfo *indexInfo,
							  Oid *typeOidP,
							  Oid *collationOidP,
							  Oid *classOidP,
							  int16 *colOptionP,
							  List *attList,
							  List *exclusionOpNames,
							  Oid relId,
							  const char *accessMethodName, Oid accessMethodId,
							  bool amcanorder,
							  bool isconstraint);
static char *ChooseIndexName(const char *tabname, Oid namespaceId,
							 List *colnames, List *exclusionOpNames,
							 bool primary, bool isconstraint);
static char *ChooseIndexNameAddition(List *colnames);
static List *ChooseIndexColumnNames(List *indexElems);
static void ReindexIndex(RangeVar *indexRelation, ReindexParams *params,
						 bool isTopLevel);
static void RangeVarCallbackForReindexIndex(const RangeVar *relation,
											Oid relId, Oid oldRelId, void *arg);
static Oid	ReindexTable(RangeVar *relation, ReindexParams *params,
						 bool isTopLevel);
static void ReindexMultipleTables(const char *objectName,
								  ReindexObjectType objectKind, ReindexParams *params);
static void reindex_error_callback(void *args);
static void ReindexPartitions(Oid relid, ReindexParams *params,
							  bool isTopLevel);
static void ReindexMultipleInternal(List *relids,
									ReindexParams *params);
static bool ReindexRelationConcurrently(Oid relationOid,
										ReindexParams *params);
static void update_relispartition(Oid relationId, bool newval);
static inline void set_indexsafe_procflags(void);

/*
 * callback argument type for RangeVarCallbackForReindexIndex()
 */
struct ReindexIndexCallbackState
{
	ReindexParams params;		/* options from statement */
	Oid			locked_table_oid;	/* tracks previously locked table */
};

/*
 * callback arguments for reindex_error_callback()
 */
typedef struct ReindexErrorInfo
{
	char	   *relname;
	char	   *relnamespace;
	char		relkind;
} ReindexErrorInfo;

/*
 * CheckIndexCompatible
 *		Determine whether an existing index definition is compatible with a
 *		prospective index definition, such that the existing index storage
 *		could become the storage of the new index, avoiding a rebuild.
 *
 * 'oldId': the OID of the existing index
 * 'accessMethodName': name of the AM to use.
 * 'attributeList': a list of IndexElem specifying columns and expressions
 *		to index on.
 * 'exclusionOpNames': list of names of exclusion-constraint operators,
 *		or NIL if not an exclusion constraint.
 *
 * This is tailored to the needs of ALTER TABLE ALTER TYPE, which recreates
 * any indexes that depended on a changing column from their pg_get_indexdef
 * or pg_get_constraintdef definitions.  We omit some of the sanity checks of
 * DefineIndex.  We assume that the old and new indexes have the same number
 * of columns and that if one has an expression column or predicate, both do.
 * Errors arising from the attribute list still apply.
 *
 * Most column type changes that can skip a table rewrite do not invalidate
 * indexes.  We acknowledge this when all operator classes, collations and
 * exclusion operators match.  Though we could further permit intra-opfamily
 * changes for btree and hash indexes, that adds subtle complexity with no
 * concrete benefit for core types.  Note that INCLUDE columns aren't
 * checked by this function; for them it's enough that table rewrite is
 * skipped.
 *
 * When a comparison or exclusion operator has a polymorphic input type, the
 * actual input types must also match.  This defends against the possibility
 * that operators could vary behavior in response to get_fn_expr_argtype().
 * At present, this hazard is theoretical: check_exclusion_constraint() and
 * all core index access methods decline to set fn_expr for such calls.
 *
 * We do not yet implement a test to verify compatibility of expression
 * columns or predicates, so assume any such index is incompatible.
 */
|
|
|
|
bool
|
|
|
|
CheckIndexCompatible(Oid oldId,
|
2017-10-31 15:34:31 +01:00
|
|
|
const char *accessMethodName,
|
2011-07-18 17:02:48 +02:00
|
|
|
List *attributeList,
|
|
|
|
List *exclusionOpNames)
|
|
|
|
{
|
|
|
|
bool isconstraint;
|
2012-01-25 21:28:07 +01:00
|
|
|
Oid *typeObjectId;
|
2011-07-18 17:02:48 +02:00
|
|
|
Oid *collationObjectId;
|
|
|
|
Oid *classObjectId;
|
|
|
|
Oid accessMethodId;
|
|
|
|
Oid relationId;
|
|
|
|
HeapTuple tuple;
|
	Form_pg_index indexForm;
	Form_pg_am	accessMethodForm;
	IndexAmRoutine *amRoutine;
	bool		amcanorder;
	int16	   *coloptions;
	IndexInfo  *indexInfo;
	int			numberOfAttributes;
	int			old_natts;
	bool		isnull;
	bool		ret = true;
	oidvector  *old_indclass;
	oidvector  *old_indcollation;
	Relation	irel;
	int			i;
	Datum		d;

	/* Caller should already have the relation locked in some way. */
	relationId = IndexGetRelation(oldId, false);

	/*
	 * We can pretend isconstraint = false unconditionally.  It only serves to
	 * decide the text of an error message that should never happen for us.
	 */
	isconstraint = false;

	numberOfAttributes = list_length(attributeList);
	Assert(numberOfAttributes > 0);
	Assert(numberOfAttributes <= INDEX_MAX_KEYS);

	/* look up the access method */
	tuple = SearchSysCache1(AMNAME, PointerGetDatum(accessMethodName));
	if (!HeapTupleIsValid(tuple))
		ereport(ERROR,
				(errcode(ERRCODE_UNDEFINED_OBJECT),
				 errmsg("access method \"%s\" does not exist",
						accessMethodName)));
	accessMethodForm = (Form_pg_am) GETSTRUCT(tuple);
	accessMethodId = accessMethodForm->oid;
	amRoutine = GetIndexAmRoutine(accessMethodForm->amhandler);
	ReleaseSysCache(tuple);

	amcanorder = amRoutine->amcanorder;

	/*
	 * Compute the operator classes, collations, and exclusion operators for
	 * the new index, so we can test whether it's compatible with the existing
	 * one.  Note that ComputeIndexAttrs might fail here, but that's OK:
	 * DefineIndex would have called this function with the same arguments
	 * later on, and it would have failed then anyway.  Our attributeList
	 * contains only key attributes, thus we're filling ii_NumIndexAttrs and
	 * ii_NumIndexKeyAttrs with the same value.
	 */
	indexInfo = makeIndexInfo(numberOfAttributes, numberOfAttributes,
							  accessMethodId, NIL, NIL, false, false, false, false);
	typeObjectId = (Oid *) palloc(numberOfAttributes * sizeof(Oid));
	collationObjectId = (Oid *) palloc(numberOfAttributes * sizeof(Oid));
	classObjectId = (Oid *) palloc(numberOfAttributes * sizeof(Oid));
	coloptions = (int16 *) palloc(numberOfAttributes * sizeof(int16));
	ComputeIndexAttrs(indexInfo,
					  typeObjectId, collationObjectId, classObjectId,
					  coloptions, attributeList,
					  exclusionOpNames, relationId,
					  accessMethodName, accessMethodId,
					  amcanorder, isconstraint);

	/* Get the soon-obsolete pg_index tuple. */
	tuple = SearchSysCache1(INDEXRELID, ObjectIdGetDatum(oldId));
	if (!HeapTupleIsValid(tuple))
		elog(ERROR, "cache lookup failed for index %u", oldId);
	indexForm = (Form_pg_index) GETSTRUCT(tuple);

	/*
	 * We don't assess expressions or predicates; assume incompatibility.
	 * Also, if the index is invalid for any reason, treat it as incompatible.
	 */
	if (!(heap_attisnull(tuple, Anum_pg_index_indpred, NULL) &&
		  heap_attisnull(tuple, Anum_pg_index_indexprs, NULL) &&
		  indexForm->indisvalid))
	{
		ReleaseSysCache(tuple);
		return false;
	}

	/* Any change in operator class or collation breaks compatibility. */
	old_natts = indexForm->indnkeyatts;
	Assert(old_natts == numberOfAttributes);

	d = SysCacheGetAttr(INDEXRELID, tuple, Anum_pg_index_indcollation, &isnull);
	Assert(!isnull);
	old_indcollation = (oidvector *) DatumGetPointer(d);

	d = SysCacheGetAttr(INDEXRELID, tuple, Anum_pg_index_indclass, &isnull);
	Assert(!isnull);
	old_indclass = (oidvector *) DatumGetPointer(d);

	ret = (memcmp(old_indclass->values, classObjectId,
				  old_natts * sizeof(Oid)) == 0 &&
		   memcmp(old_indcollation->values, collationObjectId,
				  old_natts * sizeof(Oid)) == 0);

	ReleaseSysCache(tuple);

	if (!ret)
		return false;

	/* For polymorphic opcintype, column type changes break compatibility. */
	irel = index_open(oldId, AccessShareLock);	/* caller probably has a lock */
	for (i = 0; i < old_natts; i++)
	{
		if (IsPolymorphicType(get_opclass_input_type(classObjectId[i])) &&
			TupleDescAttr(irel->rd_att, i)->atttypid != typeObjectId[i])
		{
			ret = false;
			break;
		}
	}

	/* Any change in opclass options breaks compatibility. */
	if (ret)
	{
		Datum	   *opclassOptions = RelationGetIndexRawAttOptions(irel);

		ret = CompareOpclassOptions(opclassOptions,
									indexInfo->ii_OpclassOptions, old_natts);

		if (opclassOptions)
			pfree(opclassOptions);
	}

	/* Any change in exclusion operator selections breaks compatibility. */
	if (ret && indexInfo->ii_ExclusionOps != NULL)
	{
		Oid		   *old_operators,
				   *old_procs;
		uint16	   *old_strats;

		RelationGetExclusionInfo(irel, &old_operators, &old_procs, &old_strats);
		ret = memcmp(old_operators, indexInfo->ii_ExclusionOps,
					 old_natts * sizeof(Oid)) == 0;

		/* Require an exact input type match for polymorphic operators. */
		if (ret)
		{
			for (i = 0; i < old_natts && ret; i++)
			{
				Oid			left,
							right;

				op_input_types(indexInfo->ii_ExclusionOps[i], &left, &right);
				if ((IsPolymorphicType(left) || IsPolymorphicType(right)) &&
					TupleDescAttr(irel->rd_att, i)->atttypid != typeObjectId[i])
				{
					ret = false;
					break;
				}
			}
		}
	}

	index_close(irel, NoLock);
	return ret;
}

/*
 * CompareOpclassOptions
 *
 * Compare per-column opclass options, which are represented by arrays of
 * text[] datums.  Both the array elements and the arrays themselves can be
 * NULL.
 */
static bool
CompareOpclassOptions(Datum *opts1, Datum *opts2, int natts)
{
	int			i;

	if (!opts1 && !opts2)
		return true;

	for (i = 0; i < natts; i++)
	{
		Datum		opt1 = opts1 ? opts1[i] : (Datum) 0;
		Datum		opt2 = opts2 ? opts2[i] : (Datum) 0;

		if (opt1 == (Datum) 0)
		{
			if (opt2 == (Datum) 0)
				continue;
			else
				return false;
		}
		else if (opt2 == (Datum) 0)
			return false;

		/* Compare non-NULL text[] datums. */
		if (!DatumGetBool(DirectFunctionCall2(array_eq, opt1, opt2)))
			return false;
	}

	return true;
}

/*
 * WaitForOlderSnapshots
 *
 * Wait for transactions that might have an older snapshot than the given xmin
 * limit, because such a snapshot might not contain tuples deleted just before
 * it was taken.  Obtain a list of VXIDs of such transactions, and wait for
 * them individually.  This is used when building an index concurrently.
 *
 * We can exclude any running transactions that have xmin > the xmin given;
 * their oldest snapshot must be newer than our xmin limit.
 * We can also exclude any transactions that have xmin = zero, since they
 * evidently have no live snapshot at all (and any one they might be in
 * process of taking is certainly newer than ours).  Transactions in other
 * DBs can be ignored too, since they'll never even be able to see the
 * index being worked on.
 *
 * We can also exclude autovacuum processes and processes running manual
 * lazy VACUUMs, because they won't be fazed by missing index entries
 * either.  (Manual ANALYZEs, however, can't be excluded because they
 * might be within transactions that are going to do arbitrary operations
 * later.)  Processes running CREATE INDEX CONCURRENTLY or REINDEX
 * CONCURRENTLY on indexes that are neither expressional nor partial are also
 * safe to ignore, since we know that those processes won't examine any data
 * outside the table they're indexing.
 *
 * Also, GetCurrentVirtualXIDs never reports our own vxid, so we need not
 * check for that.
 *
 * If a process goes idle-in-transaction with xmin zero, we do not need to
 * wait for it anymore, per the above argument.  We do not have the
 * infrastructure right now to stop waiting if that happens, but we can at
 * least avoid the folly of waiting when it is idle at the time we would
 * begin to wait.  We do this by repeatedly rechecking the output of
 * GetCurrentVirtualXIDs.  If, during any iteration, a particular vxid
 * doesn't show up in the output, we know we can forget about it.
 */
void
WaitForOlderSnapshots(TransactionId limitXmin, bool progress)
{
	int			n_old_snapshots;
	int			i;
	VirtualTransactionId *old_snapshots;

	old_snapshots = GetCurrentVirtualXIDs(limitXmin, true, false,
										  PROC_IS_AUTOVACUUM | PROC_IN_VACUUM
										  | PROC_IN_SAFE_IC,
										  &n_old_snapshots);
	if (progress)
		pgstat_progress_update_param(PROGRESS_WAITFOR_TOTAL, n_old_snapshots);

	for (i = 0; i < n_old_snapshots; i++)
	{
		if (!VirtualTransactionIdIsValid(old_snapshots[i]))
			continue;			/* found uninteresting in previous cycle */

		if (i > 0)
		{
			/* see if anything's changed ... */
			VirtualTransactionId *newer_snapshots;
			int			n_newer_snapshots;
			int			j;
			int			k;

			newer_snapshots = GetCurrentVirtualXIDs(limitXmin,
													true, false,
													PROC_IS_AUTOVACUUM | PROC_IN_VACUUM
													| PROC_IN_SAFE_IC,
													&n_newer_snapshots);
			for (j = i; j < n_old_snapshots; j++)
			{
				if (!VirtualTransactionIdIsValid(old_snapshots[j]))
					continue;	/* found uninteresting in previous cycle */
				for (k = 0; k < n_newer_snapshots; k++)
				{
					if (VirtualTransactionIdEquals(old_snapshots[j],
												   newer_snapshots[k]))
						break;
				}
				if (k >= n_newer_snapshots) /* not there anymore */
					SetInvalidVirtualTransactionId(old_snapshots[j]);
			}
			pfree(newer_snapshots);
		}

		if (VirtualTransactionIdIsValid(old_snapshots[i]))
		{
			/* If requested, publish who we're going to wait for. */
			if (progress)
			{
				PGPROC	   *holder = BackendIdGetProc(old_snapshots[i].backendId);

				if (holder)
					pgstat_progress_update_param(PROGRESS_WAITFOR_CURRENT_PID,
												 holder->pid);
			}
			VirtualXactLock(old_snapshots[i], true);
		}

		if (progress)
			pgstat_progress_update_param(PROGRESS_WAITFOR_DONE, i + 1);
	}
}

/*
 * DefineIndex
 *		Creates a new index.
 *
 * 'relationId': the OID of the heap relation on which the index is to be
 *		created
 * 'stmt': IndexStmt describing the properties of the new index.
 * 'indexRelationId': normally InvalidOid, but during bootstrap can be
 *		nonzero to specify a preselected OID for the index.
 * 'parentIndexId': the OID of the parent index; InvalidOid if not the child
 *		of a partitioned index.
 * 'parentConstraintId': the OID of the parent constraint; InvalidOid if not
 *		the child of a constraint (only used when recursing)
 * 'is_alter_table': this is due to an ALTER rather than a CREATE operation.
 * 'check_rights': check for CREATE rights in namespace and tablespace.  (This
 *		should be true except when ALTER is deleting/recreating an index.)
 * 'check_not_in_use': check for table not already in use in current session.
 *		This should be true unless caller is holding the table open, in which
 *		case the caller had better have checked it earlier.
 * 'skip_build': make the catalog entries but don't create the index files
 * 'quiet': suppress the NOTICE chatter ordinarily provided for constraints.
 *
 * Returns the object address of the created index.
 */
ObjectAddress
DefineIndex(Oid relationId,
			IndexStmt *stmt,
			Oid indexRelationId,
			Oid parentIndexId,
			Oid parentConstraintId,
			bool is_alter_table,
			bool check_rights,
			bool check_not_in_use,
			bool skip_build,
			bool quiet)
{
	bool		concurrent;
	char	   *indexRelationName;
	char	   *accessMethodName;
	Oid		   *typeObjectId;
	Oid		   *collationObjectId;
	Oid		   *classObjectId;
	Oid			accessMethodId;
	Oid			namespaceId;
	Oid			tablespaceId;
	Oid			createdConstraintId = InvalidOid;
	List	   *indexColNames;
	List	   *allIndexParams;
	Relation	rel;
	HeapTuple	tuple;
	Form_pg_am	accessMethodForm;
	IndexAmRoutine *amRoutine;
	bool		amcanorder;
	amoptions_function amoptions;
	bool		partitioned;
	bool		safe_index;
	Datum		reloptions;
	int16	   *coloptions;
	IndexInfo  *indexInfo;
	bits16		flags;
	bits16		constr_flags;
	int			numberOfAttributes;
	int			numberOfKeyAttributes;
	TransactionId limitXmin;
	ObjectAddress address;
	LockRelId	heaprelid;
	LOCKTAG		heaplocktag;
If the name lookups come to different conclusions due to concurrent
activity, we might perform some parts of the DDL on a different table
than other parts. At least in the case of CREATE INDEX, this can be
used to cause the permissions checks to be performed against a
different table than the index creation, allowing for a privilege
escalation attack.
This changes the calling convention for DefineIndex, CreateTrigger,
transformIndexStmt, transformAlterTableStmt, CheckIndexCompatible
(in 9.2 and newer), and AlterTable (in 9.1 and older). In addition,
CheckRelationOwnership is removed in 9.2 and newer and the calling
convention is changed in older branches. A field has also been added
to the Constraint node (FkConstraint in 8.4). Third-party code calling
these functions or using the Constraint node will require updating.
Report by Andres Freund. Patch by Robert Haas and Andres Freund,
reviewed by Tom Lane.
Security: CVE-2014-0062
2014-02-17 15:33:31 +01:00
|
|
|
LOCKMODE lockmode;
|
2006-08-25 06:06:58 +02:00
|
|
|
Snapshot snapshot;
|
2022-05-09 17:35:08 +02:00
|
|
|
Oid root_save_userid;
|
|
|
|
int root_save_sec_context;
|
|
|
|
int root_save_nestlevel;
|
2009-04-04 19:40:36 +02:00
|
|
|
int i;
|
1997-09-07 07:04:48 +02:00
|
|
|
|
2022-05-09 17:35:08 +02:00
|
|
|
root_save_nestlevel = NewGUCNestLevel();
|
|
|
|
|
Fix tablespace inheritance for partitioned rels
Commit ca4103025dfe left a few loose ends. The most important one
(broken pg_dump output) is already fixed by virtue of commit
3b23552ad8bb, but some things remained:
* When ALTER TABLE rewrites tables, the indexes must remain in the
tablespace they were originally in. This didn't work because
index recreation during ALTER TABLE runs manufactured SQL (yuck),
which runs afoul of default_tablespace in competition with the parent
relation tablespace. To fix, reset default_tablespace to the empty
string temporarily, and add the TABLESPACE clause as appropriate.
* Setting a partitioned rel's tablespace to the database default is
confusing; if it worked, it would direct the partitions to that
tablespace regardless of default_tablespace. But in reality it does
not work, and making it work is a larger project. Therefore, throw
an error when this condition is detected, to alert the unwary.
Add some docs and tests, too.
Author: Álvaro Herrera
Discussion: https://postgr.es/m/CAKJS1f_1c260nOt_vBJ067AZ3JXptXVRohDVMLEBmudX1YEx-A@mail.gmail.com
2019-04-25 16:20:23 +02:00
|
|
|
/*
|
|
|
|
* Some callers need us to run with an empty default_tablespace; this is a
|
|
|
|
* necessary hack to be able to reproduce catalog state accurately when
|
|
|
|
* recreating indexes after table-rewriting ALTER TABLE.
|
|
|
|
*/
|
|
|
|
if (stmt->reset_default_tblspc)
|
|
|
|
(void) set_config_option("default_tablespace", "",
|
|
|
|
PGC_USERSET, PGC_S_SESSION,
|
|
|
|
GUC_ACTION_SAVE, true, 0, false);
|
Report progress of CREATE INDEX operations
This uses the progress reporting infrastructure added by c16dc1aca5e0,
adding support for CREATE INDEX and CREATE INDEX CONCURRENTLY.
There are two pieces to this: one is index-AM-agnostic, and the other is
AM-specific. The latter is fairly elaborate for btrees, including
reportage for parallel index builds and the separate phases that btree
index creation uses; other index AMs, which are much simpler in their
building procedures, have simplistic reporting only, but that seems
sufficient, at least for non-concurrent builds.
The index-AM-agnostic part is fairly complete, providing insight into
the CONCURRENTLY wait phases as well as block-based progress during the
index validation table scan. (The index validation index scan requires
patching each AM, which has not been included here.)
Reviewers: Rahila Syed, Pavan Deolasee, Tatsuro Yamada
Discussion: https://postgr.es/m/20181220220022.mg63bhk26zdpvmcj@alvherre.pgsql
2019-04-02 20:18:08 +02:00
|
|
|
|
Fix concurrent indexing operations with temporary tables
Attempting to use CREATE INDEX, DROP INDEX or REINDEX with CONCURRENTLY
on a temporary relation with ON COMMIT actions triggered unexpected
errors because those operations use multiple transactions internally to
complete their work. For example, here is one confusing error when using
ON COMMIT DELETE ROWS:
ERROR: index "foo" already contains data
Issues related to temporary relations and concurrent indexing are fixed
in this commit by enforcing the non-concurrent path to be taken for
temporary relations even if using CONCURRENTLY, transparently to the
user. Using a non-concurrent path does not matter in practice, as locks
cannot be taken on a temporary relation by a session other than the
one owning the relation, and the non-concurrent operation is more
efficient.
The problem exists with REINDEX since v12 with the introduction of
CONCURRENTLY, and with CREATE/DROP INDEX since CONCURRENTLY exists for
those commands. In all supported versions, this caused only confusing
error messages to be generated. Note that with REINDEX, it was also
possible to issue a REINDEX CONCURRENTLY for a temporary relation owned
by a different session, leading to a server crash.
The idea of transparently enforcing the non-concurrent code path for
temporary relations comes originally from Andres Freund.
Reported-by: Manuel Rigger
Author: Michael Paquier, Heikki Linnakangas
Reviewed-by: Andres Freund, Álvaro Herrera, Heikki Linnakangas
Discussion: https://postgr.es/m/CA+u7OA6gP7YAeCguyseusYcc=uR8+ypjCcgDDCTzjQ+k6S9ksQ@mail.gmail.com
Backpatch-through: 9.4
2020-01-22 01:49:18 +01:00
|
|
|
/*
|
|
|
|
* Force non-concurrent build on temporary relations, even if CONCURRENTLY
|
|
|
|
* was requested. Other backends can't access a temporary relation, so
|
|
|
|
* there's no harm in grabbing a stronger lock, and a non-concurrent DROP
|
|
|
|
* is more efficient. Do this before any use of the concurrent option is
|
|
|
|
* done.
|
|
|
|
*/
|
|
|
|
if (stmt->concurrent && get_rel_persistence(relationId) != RELPERSISTENCE_TEMP)
|
|
|
|
concurrent = true;
|
|
|
|
else
|
|
|
|
concurrent = false;
|
|
|
|
|
2019-04-02 20:18:08 +02:00
|
|
|
/*
|
|
|
|
* Start progress report. If we're building a partition, this was already
|
|
|
|
* done.
|
|
|
|
*/
|
|
|
|
if (!OidIsValid(parentIndexId))
|
2019-06-04 09:16:02 +02:00
|
|
|
{
|
2019-04-02 20:18:08 +02:00
|
|
|
pgstat_progress_start_command(PROGRESS_COMMAND_CREATE_INDEX,
|
|
|
|
relationId);
|
2019-06-04 09:16:02 +02:00
|
|
|
pgstat_progress_update_param(PROGRESS_CREATEIDX_COMMAND,
|
2020-01-22 01:49:18 +01:00
|
|
|
concurrent ?
|
2019-06-04 09:16:02 +02:00
|
|
|
PROGRESS_CREATEIDX_COMMAND_CREATE_CONCURRENTLY :
|
|
|
|
PROGRESS_CREATEIDX_COMMAND_CREATE);
|
|
|
|
}
|
2019-04-02 20:18:08 +02:00
|
|
|
|
2019-04-07 11:30:14 +02:00
|
|
|
/*
|
|
|
|
* No index OID to report yet
|
|
|
|
*/
|
|
|
|
pgstat_progress_update_param(PROGRESS_CREATEIDX_INDEX_OID,
|
|
|
|
InvalidOid);
|
|
|
|
|
1997-09-07 07:04:48 +02:00
|
|
|
/*
|
2018-04-07 22:00:39 +02:00
|
|
|
* count key attributes in index
|
1997-09-07 07:04:48 +02:00
|
|
|
*/
|
2018-04-07 22:00:39 +02:00
|
|
|
numberOfKeyAttributes = list_length(stmt->indexParams);
|
|
|
|
|
|
|
|
/*
|
2018-04-12 16:25:13 +02:00
|
|
|
* Calculate the new list of index columns including both key columns and
|
|
|
|
* INCLUDE columns. Later we can determine which of these are key
|
|
|
|
* columns, and which are just part of the INCLUDE list by checking the
|
|
|
|
* list position. A list item in a position less than ii_NumIndexKeyAttrs
|
|
|
|
* is part of the key columns, and anything equal to and over is part of
|
|
|
|
* the INCLUDE columns.
|
2018-04-07 22:00:39 +02:00
|
|
|
*/
|
Rationalize use of list_concat + list_copy combinations.
In the wake of commit 1cff1b95a, the result of list_concat no longer
shares the ListCells of the second input. Therefore, we can replace
"list_concat(x, list_copy(y))" with just "list_concat(x, y)".
To improve call sites that were list_copy'ing the first argument,
or both arguments, invent "list_concat_copy()" which produces a new
list sharing no ListCells with either input. (This is a bit faster
than "list_concat(list_copy(x), y)" because it makes the result list
the right size to start with.)
In call sites that were not list_copy'ing the second argument, the new
semantics mean that we are usually leaking the second List's storage,
since typically there is no remaining pointer to it. We considered
inventing another list_copy variant that would list_free the second
input, but concluded that for most call sites it isn't worth worrying
about, given the relative compactness of the new List representation.
(Note that in cases where such leakage would happen, the old code
already leaked the second List's header; so we're only discussing
the size of the leak, not whether there is one. I did adjust two or
three places that had been taking the trouble to free that header, so
that they now manually free the whole second List.)
Patch by me; thanks to David Rowley for review.
Discussion: https://postgr.es/m/11587.1550975080@sss.pgh.pa.us
2019-08-12 17:20:18 +02:00
|
|
|
allIndexParams = list_concat_copy(stmt->indexParams,
|
|
|
|
stmt->indexIncludingParams);
|
2018-04-12 16:25:13 +02:00
|
|
|
numberOfAttributes = list_length(allIndexParams);
|
2018-04-07 22:00:39 +02:00
|
|
|
|
2020-11-15 22:10:48 +01:00
|
|
|
if (numberOfKeyAttributes <= 0)
|
2016-04-08 20:52:13 +02:00
|
|
|
ereport(ERROR,
|
|
|
|
(errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
|
|
|
|
errmsg("must specify at least one column")));
|
2000-01-12 06:04:42 +01:00
|
|
|
if (numberOfAttributes > INDEX_MAX_KEYS)
|
2003-07-20 23:56:35 +02:00
|
|
|
ereport(ERROR,
|
|
|
|
(errcode(ERRCODE_TOO_MANY_COLUMNS),
|
2003-09-25 08:58:07 +02:00
|
|
|
errmsg("cannot use more than %d columns in an index",
|
2003-07-20 23:56:35 +02:00
|
|
|
INDEX_MAX_KEYS)));
|
1997-09-07 07:04:48 +02:00
|
|
|
|
1996-07-09 08:22:35 +02:00
|
|
|
/*
|
2006-08-25 06:06:58 +02:00
|
|
|
* Only SELECT ... FOR UPDATE/SHARE are allowed while doing a standard
|
|
|
|
* index build; but for concurrent builds we allow INSERT/UPDATE/DELETE
|
|
|
|
* (but not VACUUM).
|
2014-02-17 15:33:31 +01:00
|
|
|
*
|
|
|
|
* NB: Caller is responsible for making sure that relationId refers to the
|
|
|
|
* relation on which the index should be built; except in bootstrap mode,
|
|
|
|
* this will typically require the caller to have already locked the
|
|
|
|
* relation. To avoid lock upgrade hazards, that lock should be at least
|
|
|
|
* as strong as the one we take here.
|
Support parallel btree index builds.
To make this work, tuplesort.c and logtape.c must also support
parallelism, so this patch adds that infrastructure and then applies
it to the particular case of parallel btree index builds. Testing
to date shows that this can often be 2-3x faster than a serial
index build.
The model for deciding how many workers to use is fairly primitive
at present, but it's better than not having the feature. We can
refine it as we get more experience.
Peter Geoghegan with some help from Rushabh Lathia. While Heikki
Linnakangas is not an author of this patch, he wrote other patches
without which this feature would not have been possible, and
therefore the release notes should possibly credit him as an author
of this feature. Reviewed by Claudio Freire, Heikki Linnakangas,
Thomas Munro, Tels, Amit Kapila, me.
Discussion: http://postgr.es/m/CAM3SWZQKM=Pzc=CAHzRixKjp2eO5Q0Jg1SoFQqeXFQ647JiwqQ@mail.gmail.com
Discussion: http://postgr.es/m/CAH2-Wz=AxWqDoVvGU7dq856S4r6sJAj6DBn7VMtigkB33N5eyg@mail.gmail.com
2018-02-02 19:25:55 +01:00
|
|
|
*
|
|
|
|
* NB: If the lock strength here ever changes, code that is run by
|
|
|
|
* parallel workers under the control of certain particular ambuild
|
|
|
|
* functions will need to be updated, too.
|
1996-07-09 08:22:35 +02:00
|
|
|
*/
|
2020-01-22 01:49:18 +01:00
|
|
|
lockmode = concurrent ? ShareUpdateExclusiveLock : ShareLock;
|
2019-01-21 19:32:19 +01:00
|
|
|
rel = table_open(relationId, lockmode);
|
2006-08-25 06:06:58 +02:00
|
|
|
|
2022-05-09 17:35:08 +02:00
|
|
|
/*
|
|
|
|
* Switch to the table owner's userid, so that any index functions are run
|
|
|
|
* as that user. Also lock down security-restricted operations. We
|
|
|
|
* already arranged to make GUC variable changes local to this command.
|
|
|
|
*/
|
|
|
|
GetUserIdAndSecContext(&root_save_userid, &root_save_sec_context);
|
|
|
|
SetUserIdAndSecContext(rel->rd_rel->relowner,
|
|
|
|
root_save_sec_context | SECURITY_RESTRICTED_OPERATION);
|
|
|
|
|
2006-08-25 06:06:58 +02:00
|
|
|
namespaceId = RelationGetNamespace(rel);
|
2002-01-04 00:21:32 +01:00
|
|
|
|
2017-10-16 12:22:18 +02:00
|
|
|
/* Ensure that it makes sense to index this kind of relation */
|
|
|
|
switch (rel->rd_rel->relkind)
|
2011-05-05 21:47:42 +02:00
|
|
|
{
|
2017-10-16 12:22:18 +02:00
|
|
|
case RELKIND_RELATION:
|
|
|
|
case RELKIND_MATVIEW:
|
2018-01-19 15:49:22 +01:00
|
|
|
case RELKIND_PARTITIONED_TABLE:
|
2017-10-16 12:22:18 +02:00
|
|
|
/* OK */
|
|
|
|
break;
|
|
|
|
default:
|
2011-05-05 21:47:42 +02:00
|
|
|
ereport(ERROR,
|
|
|
|
(errcode(ERRCODE_WRONG_OBJECT_TYPE),
|
Improve error messages about mismatching relkind
Most error messages about a relkind that was not supported or
appropriate for the command were of the pattern
"relation \"%s\" is not a table, foreign table, or materialized view"
This style can become verbose and tedious to maintain. Moreover, it's
not very helpful: If I'm trying to create a comment on a TOAST table,
which is not supported, then the information that I could have created
a comment on a materialized view is pointless.
Instead, write the primary error message more briefly, saying more
directly that what was attempted is not possible. Then, in the detail
message, explain that the operation is not supported for the relkind
of the object. To simplify that, add a new function
errdetail_relkind_not_supported() that does this.
In passing, make use of RELKIND_HAS_STORAGE() where appropriate,
instead of listing out the relkinds individually.
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Alvaro Herrera <alvherre@alvh.no-ip.org>
Discussion: https://www.postgresql.org/message-id/flat/dc35a398-37d0-75ce-07ea-1dd71d98f8ec@2ndquadrant.com
2021-07-08 09:38:52 +02:00
|
|
|
errmsg("cannot create index on relation \"%s\"",
|
|
|
|
RelationGetRelationName(rel)),
|
|
|
|
errdetail_relkind_not_supported(rel->rd_rel->relkind)));
|
2018-01-19 15:49:22 +01:00
|
|
|
break;
|
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Establish behavior for partitioned tables, and verify sanity of
|
|
|
|
* parameters.
|
|
|
|
*
|
|
|
|
* We do not build an actual index in this case; we only create a few
|
|
|
|
* catalog entries. The actual indexes are built by recursing for each
|
|
|
|
* partition.
|
|
|
|
*/
|
|
|
|
partitioned = rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE;
|
|
|
|
if (partitioned)
|
|
|
|
{
|
2020-01-22 01:49:18 +01:00
|
|
|
/*
|
|
|
|
* Note: we check 'stmt->concurrent' rather than 'concurrent', so that
|
|
|
|
* the error is thrown also for temporary tables. Seems better to be
|
|
|
|
* consistent, even though we could do it on temporary tables because
|
|
|
|
* we're not actually doing it concurrently.
|
|
|
|
*/
|
2018-01-19 15:49:22 +01:00
|
|
|
if (stmt->concurrent)
|
|
|
|
ereport(ERROR,
|
|
|
|
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
|
|
|
|
errmsg("cannot create index on partitioned table \"%s\" concurrently",
|
|
|
|
RelationGetRelationName(rel))));
|
|
|
|
if (stmt->excludeOpNames)
|
|
|
|
ereport(ERROR,
|
|
|
|
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
|
|
|
|
errmsg("cannot create exclusion constraints on partitioned table \"%s\"",
|
|
|
|
RelationGetRelationName(rel))));
|
2011-05-05 21:47:42 +02:00
|
|
|
}
|
1997-09-07 07:04:48 +02:00
|
|
|
|
2006-08-25 06:06:58 +02:00
|
|
|
/*
|
|
|
|
* Don't try to CREATE INDEX on temp tables of other backends.
|
|
|
|
*/
|
2009-04-01 00:12:48 +02:00
|
|
|
if (RELATION_IS_OTHER_TEMP(rel))
|
2006-08-25 06:06:58 +02:00
|
|
|
ereport(ERROR,
|
|
|
|
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
|
|
|
|
errmsg("cannot create indexes on temporary tables of other sessions")));
|
2002-01-04 00:21:32 +01:00
|
|
|
|
2017-06-04 18:02:31 +02:00
|
|
|
/*
|
|
|
|
* Unless our caller vouches for having checked this already, insist that
|
|
|
|
* the table not be in use by our own session, either. Otherwise we might
|
|
|
|
* fail to make entries in the new index (for instance, if an INSERT or
|
|
|
|
* UPDATE is in progress and has already made its list of target indexes).
|
|
|
|
*/
|
|
|
|
if (check_not_in_use)
|
|
|
|
CheckTableNotInUse(rel, "CREATE INDEX");

	/*
	 * Verify we (still) have CREATE rights in the rel's namespace.
	 * (Presumably we did when the rel was created, but maybe not anymore.)
	 * Skip check if caller doesn't want it.  Also skip check if
	 * bootstrapping, since permissions machinery may not be working yet.
	 */
	if (check_rights && !IsBootstrapProcessingMode())
	{
		AclResult	aclresult;

		aclresult = pg_namespace_aclcheck(namespaceId, root_save_userid,
										  ACL_CREATE);
		if (aclresult != ACLCHECK_OK)
			aclcheck_error(aclresult, OBJECT_SCHEMA,
						   get_namespace_name(namespaceId));
	}

	/*
	 * Select tablespace to use.  If not specified, use default tablespace
	 * (which may in turn default to database's default).
	 */
	if (stmt->tableSpace)
	{
		tablespaceId = get_tablespace_oid(stmt->tableSpace, false);
		if (partitioned && tablespaceId == MyDatabaseTableSpace)
			ereport(ERROR,
					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
					 errmsg("cannot specify default tablespace for partitioned relations")));
	}
	else
	{
		tablespaceId = GetDefaultTablespace(rel->rd_rel->relpersistence,
											partitioned);
		/* note InvalidOid is OK in this case */
	}

	/* Check tablespace permissions */
	if (check_rights &&
		OidIsValid(tablespaceId) && tablespaceId != MyDatabaseTableSpace)
	{
		AclResult	aclresult;

		aclresult = pg_tablespace_aclcheck(tablespaceId, root_save_userid,
										   ACL_CREATE);
		if (aclresult != ACLCHECK_OK)
			aclcheck_error(aclresult, OBJECT_TABLESPACE,
						   get_tablespace_name(tablespaceId));
	}

	/*
	 * Force shared indexes into the pg_global tablespace.  This is a bit of a
	 * hack but seems simpler than marking them in the BKI commands.  On the
	 * other hand, if it's not shared, don't allow it to be placed there.
	 */
	if (rel->rd_rel->relisshared)
		tablespaceId = GLOBALTABLESPACE_OID;
	else if (tablespaceId == GLOBALTABLESPACE_OID)
		ereport(ERROR,
				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
				 errmsg("only shared relations can be placed in pg_global tablespace")));

	/*
	 * Choose the index column names.
	 */
	indexColNames = ChooseIndexColumnNames(allIndexParams);

	/*
	 * Select name for index if caller didn't specify
	 */
	indexRelationName = stmt->idxname;
	if (indexRelationName == NULL)
		indexRelationName = ChooseIndexName(RelationGetRelationName(rel),
											namespaceId,
											indexColNames,
											stmt->excludeOpNames,
											stmt->primary,
											stmt->isconstraint);

	/*
	 * look up the access method, verify it can handle the requested features
	 */
	accessMethodName = stmt->accessMethod;
	tuple = SearchSysCache1(AMNAME, PointerGetDatum(accessMethodName));
	if (!HeapTupleIsValid(tuple))
	{
		/*
		 * Hack to provide more-or-less-transparent updating of old RTREE
		 * indexes to GiST: if RTREE is requested and not found, use GIST.
		 */
		if (strcmp(accessMethodName, "rtree") == 0)
		{
			ereport(NOTICE,
					(errmsg("substituting access method \"gist\" for obsolete method \"rtree\"")));
			accessMethodName = "gist";
			tuple = SearchSysCache1(AMNAME, PointerGetDatum(accessMethodName));
		}

		if (!HeapTupleIsValid(tuple))
			ereport(ERROR,
					(errcode(ERRCODE_UNDEFINED_OBJECT),
					 errmsg("access method \"%s\" does not exist",
							accessMethodName)));
	}
	accessMethodForm = (Form_pg_am) GETSTRUCT(tuple);
	accessMethodId = accessMethodForm->oid;
	amRoutine = GetIndexAmRoutine(accessMethodForm->amhandler);

	pgstat_progress_update_param(PROGRESS_CREATEIDX_ACCESS_METHOD_OID,
								 accessMethodId);

	if (stmt->unique && !amRoutine->amcanunique)
		ereport(ERROR,
				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
				 errmsg("access method \"%s\" does not support unique indexes",
						accessMethodName)));
	if (stmt->indexIncludingParams != NIL && !amRoutine->amcaninclude)
		ereport(ERROR,
				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
				 errmsg("access method \"%s\" does not support included columns",
						accessMethodName)));
	if (numberOfKeyAttributes > 1 && !amRoutine->amcanmulticol)
		ereport(ERROR,
				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
				 errmsg("access method \"%s\" does not support multicolumn indexes",
						accessMethodName)));
	if (stmt->excludeOpNames && amRoutine->amgettuple == NULL)
		ereport(ERROR,
				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
				 errmsg("access method \"%s\" does not support exclusion constraints",
						accessMethodName)));

	amcanorder = amRoutine->amcanorder;
	amoptions = amRoutine->amoptions;

	pfree(amRoutine);
Restructure index AM interface for index building and index tuple deletion,
per previous discussion on pghackers. Most of the duplicate code in
different AMs' ambuild routines has been moved out to a common routine
in index.c; this means that all index types now do the right things about
inserting recently-dead tuples, etc. (I also removed support for EXTEND
INDEX in the ambuild routines, since that's about to go away anyway, and
it cluttered the code a lot.) The retail indextuple deletion routines have
been replaced by a "bulk delete" routine in which the indexscan is inside
the access method. I haven't pushed this change as far as it should go yet,
but it should allow considerable simplification of the internal bookkeeping
for deletions. Also, add flag columns to pg_am to eliminate various
hardcoded tests on AM OIDs, and remove unused pg_am columns.
Fix rtree and gist index types to not attempt to store NULLs; before this,
gist usually crashed, while rtree managed not to crash but computed wacko
bounding boxes for NULL entries (which might have had something to do with
the performance problems we've heard about occasionally).
Add AtEOXact routines to hash, rtree, and gist, all of which have static
state that needs to be reset after an error. We discovered this need long
ago for btree, but missed the other guys.
Oh, one more thing: concurrent VACUUM is now the default.
2001-07-16 00:48:19 +02:00
|
|
|
ReleaseSysCache(tuple);
|
2000-07-15 00:18:02 +02:00
|
|
|
|
1997-09-07 07:04:48 +02:00
|
|
|
/*
|
2003-12-28 22:57:37 +01:00
|
|
|
* Validate predicate, if given
|
1997-09-07 07:04:48 +02:00
|
|
|
*/
|
Avoid pre-determining index names during CREATE TABLE LIKE parsing.
Formerly, when trying to copy both indexes and comments, CREATE TABLE LIKE
had to pre-assign names to indexes that had comments, because it made up an
explicit CommentStmt command to apply the comment and so it had to know the
name for the index. This creates bad interactions with other indexes, as
shown in bug #6734 from Daniele Varrazzo: the preassignment logic couldn't
take any other indexes into account so it could choose a conflicting name.
To fix, add a field to IndexStmt that allows it to carry a comment to be
assigned to the new index. (This isn't a user-exposed feature of CREATE
INDEX, only an internal option.) Now we don't need preassignment of index
names in any situation.
I also took the opportunity to refactor DefineIndex to accept the IndexStmt
as such, rather than passing all its fields individually in a mile-long
parameter list.
Back-patch to 9.2, but no further, because it seems too dangerous to change
IndexStmt or DefineIndex's API in released branches. The bug exists back
to 9.0 where CREATE TABLE LIKE grew the ability to copy comments, but given
the lack of prior complaints we'll just let it go unfixed before 9.2.
2012-07-16 19:25:18 +02:00
|
|
|
if (stmt->whereClause)
|
|
|
|
CheckPredicate((Expr *) stmt->whereClause);

	/*
	 * Parse AM-specific options, convert to text array form, validate.
	 */
	reloptions = transformRelOptions((Datum) 0, stmt->options,
									 NULL, NULL, false, false);

	(void) index_reloptions(amoptions, reloptions, true);

	/*
	 * Prepare arguments for index_create, primarily an IndexInfo structure.
	 * Note that predicates must be in implicit-AND format.  In a concurrent
	 * build, mark it not-ready-for-inserts.
	 */
	indexInfo = makeIndexInfo(numberOfAttributes,
							  numberOfKeyAttributes,
							  accessMethodId,
							  NIL,	/* expressions, NIL for now */
							  make_ands_implicit((Expr *) stmt->whereClause),
							  stmt->unique,
							  stmt->nulls_not_distinct,
							  !concurrent,
							  concurrent);

	typeObjectId = (Oid *) palloc(numberOfAttributes * sizeof(Oid));
	collationObjectId = (Oid *) palloc(numberOfAttributes * sizeof(Oid));
	classObjectId = (Oid *) palloc(numberOfAttributes * sizeof(Oid));
	coloptions = (int16 *) palloc(numberOfAttributes * sizeof(int16));
	ComputeIndexAttrs(indexInfo,
					  typeObjectId, collationObjectId, classObjectId,
					  coloptions, allIndexParams,
					  stmt->excludeOpNames, relationId,
					  accessMethodName, accessMethodId,
					  amcanorder, stmt->isconstraint);

	/*
	 * Extra checks when creating a PRIMARY KEY index.
	 */
	if (stmt->primary)
		index_check_primary_key(rel, indexInfo, is_alter_table, stmt);

	/*
	 * If this table is partitioned and we're creating a unique index or a
	 * primary key, make sure that the partition key is a subset of the
	 * index's columns.  Otherwise it would be possible to violate uniqueness
	 * by putting values that ought to be unique in different partitions.
	 *
	 * We could lift this limitation if we had global indexes, but those have
	 * their own problems, so this is a useful feature combination.
	 */
	if (partitioned && (stmt->unique || stmt->primary))
	{
		PartitionKey key = RelationGetPartitionKey(rel);
		const char *constraint_type;
		int			i;

		if (stmt->primary)
			constraint_type = "PRIMARY KEY";
		else if (stmt->unique)
			constraint_type = "UNIQUE";
		else if (stmt->excludeOpNames != NIL)
			constraint_type = "EXCLUDE";
		else
		{
			elog(ERROR, "unknown constraint type");
			constraint_type = NULL; /* keep compiler quiet */
		}

		/*
		 * Verify that all the columns in the partition key appear in the
		 * unique key definition, with the same notion of equality.
		 */
		for (i = 0; i < key->partnatts; i++)
		{
			bool		found = false;
			int			eq_strategy;
			Oid			ptkey_eqop;
			int			j;

			/*
			 * Identify the equality operator associated with this partkey
			 * column.  For list and range partitioning, partkeys use btree
			 * operator classes; hash partitioning uses hash operator classes.
			 * (Keep this in sync with ComputePartitionAttrs!)
			 */
			if (key->strategy == PARTITION_STRATEGY_HASH)
				eq_strategy = HTEqualStrategyNumber;
			else
				eq_strategy = BTEqualStrategyNumber;

			ptkey_eqop = get_opfamily_member(key->partopfamily[i],
											 key->partopcintype[i],
											 key->partopcintype[i],
											 eq_strategy);
			if (!OidIsValid(ptkey_eqop))
				elog(ERROR, "missing operator %d(%u,%u) in partition opfamily %u",
					 eq_strategy, key->partopcintype[i], key->partopcintype[i],
					 key->partopfamily[i]);

			/*
			 * We'll need to be able to identify the equality operators
			 * associated with index columns, too.  We know what to do with
			 * btree opclasses; if there are ever any other index types that
			 * support unique indexes, this logic will need extension.
			 */
			if (accessMethodId == BTREE_AM_OID)
				eq_strategy = BTEqualStrategyNumber;
			else
				ereport(ERROR,
						(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
						 errmsg("cannot match partition key to an index using access method \"%s\"",
								accessMethodName)));

			/*
			 * It may be possible to support UNIQUE constraints when partition
			 * keys are expressions, but is it worth it?  Give up for now.
			 */
			if (key->partattrs[i] == 0)
				ereport(ERROR,
						(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
						 errmsg("unsupported %s constraint with partition key definition",
								constraint_type),
						 errdetail("%s constraints cannot be used when partition keys include expressions.",
								   constraint_type)));

			/* Search the index column(s) for a match */
			for (j = 0; j < indexInfo->ii_NumIndexKeyAttrs; j++)
			{
				if (key->partattrs[i] == indexInfo->ii_IndexAttrNumbers[j])
				{
					/* Matched the column, now what about the equality op? */
					Oid			idx_opfamily;
					Oid			idx_opcintype;

					if (get_opclass_opfamily_and_input_type(classObjectId[j],
															&idx_opfamily,
															&idx_opcintype))
					{
						Oid			idx_eqop;

						idx_eqop = get_opfamily_member(idx_opfamily,
													   idx_opcintype,
													   idx_opcintype,
													   eq_strategy);
						if (ptkey_eqop == idx_eqop)
						{
							found = true;
							break;
						}
					}
				}
			}
			if (!found)
			{
				Form_pg_attribute att;

				att = TupleDescAttr(RelationGetDescr(rel),
									key->partattrs[i] - 1);
				ereport(ERROR,
						(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
						 errmsg("unique constraint on partitioned table must include all partitioning columns"),
						 errdetail("%s constraint on table \"%s\" lacks column \"%s\" which is part of the partition key.",
								   constraint_type, RelationGetRelationName(rel),
								   NameStr(att->attname))));
			}
		}
	}

	/*
Remove WITH OIDS support, change oid catalog column visibility.
Previously tables declared WITH OIDS, including a significant fraction
of the catalog tables, stored the oid column not as a normal column,
but as part of the tuple header.
This special column was not shown by default, which was somewhat odd,
as it's often (consider e.g. pg_class.oid) one of the more important
parts of a row. Neither pg_dump nor COPY included the contents of the
oid column by default.
The fact that the oid column was not an ordinary column necessitated a
significant amount of special case code to support oid columns. That
already was painful for the existing, but upcoming work aiming to make
table storage pluggable, would have required expanding and duplicating
that "specialness" significantly.
WITH OIDS has been deprecated since 2005 (commit ff02d0a05280e0).
Remove it.
Removing includes:
- CREATE TABLE and ALTER TABLE syntax for declaring the table to be
WITH OIDS has been removed (WITH (oids[ = true]) will error out)
- pg_dump does not support dumping tables declared WITH OIDS and will
issue a warning when dumping one (and ignore the oid column).
- restoring an pg_dump archive with pg_restore will warn when
restoring a table with oid contents (and ignore the oid column)
- COPY will refuse to load binary dump that includes oids.
- pg_upgrade will error out when encountering tables declared WITH
OIDS, they have to be altered to remove the oid column first.
- Functionality to access the oid of the last inserted row (like
plpgsql's RESULT_OID, spi's SPI_lastoid, ...) has been removed.
The syntax for declaring a table WITHOUT OIDS (or WITH (oids = false)
for CREATE TABLE) is still supported. While that requires a bit of
support code, it seems unnecessary to break applications / dumps that
do not use oids, and are explicit about not using them.
The biggest user of WITH OID columns was postgres' catalog. This
commit changes all 'magic' oid columns to be columns that are normally
declared and stored. To reduce unnecessary query breakage all the
newly added columns are still named 'oid', even if a table's column
naming scheme would indicate 'reloid' or such. This obviously
requires adapting a lot code, mostly replacing oid access via
HeapTupleGetOid() with access to the underlying Form_pg_*->oid column.
The bootstrap process now assigns oids for all oid columns in
genbki.pl that do not have an explicit value (starting at the largest
oid previously used), only oids assigned later by oids will be above
FirstBootstrapObjectId. As the oid column now is a normal column the
special bootstrap syntax for oids has been removed.
Oids are not automatically assigned during insertion anymore, all
backend code explicitly assigns oids with GetNewOidWithIndex(). For
the rare case that insertions into the catalog via SQL are called for
the new pg_nextoid() function can be used (which only works on catalog
tables).
The fact that oid columns on system tables are now normal columns
means that they will be included in the set of columns expanded
by * (i.e. SELECT * FROM pg_class will now include the table's oid,
previously it did not). It'd not technically be hard to hide oid
column by default, but that'd mean confusing behavior would either
have to be carried forward forever, or it'd cause breakage down the
line.
While it's not unlikely that further adjustments are needed, the
scope/invasiveness of the patch makes it worthwhile to get merge this
now. It's painful to maintain externally, too complicated to commit
after the code code freeze, and a dependency of a number of other
patches.
Catversion bump, for obvious reasons.
Author: Andres Freund, with contributions by John Naylor
Discussion: https://postgr.es/m/20180930034810.ywp2c7awz7opzcfr@alap3.anarazel.de
2018-11-21 00:36:57 +01:00
	 * We disallow indexes on system columns.  They would not necessarily get
	 * updated correctly, and they don't seem useful anyway.
	 */
	for (i = 0; i < indexInfo->ii_NumIndexAttrs; i++)
	{
		AttrNumber	attno = indexInfo->ii_IndexAttrNumbers[i];

		if (attno < 0)
			ereport(ERROR,
					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
					 errmsg("index creation on system columns is not supported")));
	}

	/*
	 * Also check for system columns used in expressions or predicates.
	 */
	if (indexInfo->ii_Expressions || indexInfo->ii_Predicate)
	{
		Bitmapset  *indexattrs = NULL;

		pull_varattnos((Node *) indexInfo->ii_Expressions, 1, &indexattrs);
		pull_varattnos((Node *) indexInfo->ii_Predicate, 1, &indexattrs);

		for (i = FirstLowInvalidHeapAttributeNumber + 1; i < 0; i++)
		{
			if (bms_is_member(i - FirstLowInvalidHeapAttributeNumber,
							  indexattrs))
				ereport(ERROR,
						(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
						 errmsg("index creation on system columns is not supported")));
		}
	}

	/* Is index safe for others to ignore?  See set_indexsafe_procflags() */
	safe_index = indexInfo->ii_Expressions == NIL &&
		indexInfo->ii_Predicate == NIL;

	/*
	 * Report index creation if appropriate (delay this till after most of the
	 * error checks)
	 */
Avoid pre-determining index names during CREATE TABLE LIKE parsing.
Formerly, when trying to copy both indexes and comments, CREATE TABLE LIKE
had to pre-assign names to indexes that had comments, because it made up an
explicit CommentStmt command to apply the comment and so it had to know the
name for the index. This creates bad interactions with other indexes, as
shown in bug #6734 from Daniele Varrazzo: the preassignment logic couldn't
take any other indexes into account so it could choose a conflicting name.
To fix, add a field to IndexStmt that allows it to carry a comment to be
assigned to the new index. (This isn't a user-exposed feature of CREATE
INDEX, only an internal option.) Now we don't need preassignment of index
names in any situation.
I also took the opportunity to refactor DefineIndex to accept the IndexStmt
as such, rather than passing all its fields individually in a mile-long
parameter list.
Back-patch to 9.2, but no further, because it seems too dangerous to change
IndexStmt or DefineIndex's API in released branches. The bug exists back
to 9.0 where CREATE TABLE LIKE grew the ability to copy comments, but given
the lack of prior complaints we'll just let it go unfixed before 9.2.
2012-07-16 19:25:18 +02:00
	if (stmt->isconstraint && !quiet)
	{
		const char *constraint_type;

		if (stmt->primary)
			constraint_type = "PRIMARY KEY";
		else if (stmt->unique)
			constraint_type = "UNIQUE";
		else if (stmt->excludeOpNames != NIL)
			constraint_type = "EXCLUDE";
		else
		{
			elog(ERROR, "unknown constraint type");
			constraint_type = NULL; /* keep compiler quiet */
		}

		ereport(DEBUG1,
				(errmsg_internal("%s %s will create implicit index \"%s\" for table \"%s\"",
								 is_alter_table ? "ALTER TABLE / ADD" : "CREATE TABLE /",
								 constraint_type,
								 indexRelationName, RelationGetRelationName(rel))));
	}

	/*
	 * A valid stmt->oldNode implies that we already have a built form of the
	 * index.  The caller should also decline any index build.
	 */
Fix concurrent indexing operations with temporary tables
Attempting to use CREATE INDEX, DROP INDEX or REINDEX with CONCURRENTLY
on a temporary relation with ON COMMIT actions triggered unexpected
errors because those operations use multiple transactions internally to
complete their work. Here is for example one confusing error when using
ON COMMIT DELETE ROWS:
ERROR: index "foo" already contains data
Issues related to temporary relations and concurrent indexing are fixed
in this commit by enforcing the non-concurrent path to be taken for
temporary relations even if using CONCURRENTLY, transparently to the
user. Using a non-concurrent path does not matter in practice as locks
cannot be taken on a temporary relation by a session different than the
one owning the relation, and the non-concurrent operation is more
effective.
The problem exists with REINDEX since v12 with the introduction of
CONCURRENTLY, and with CREATE/DROP INDEX since CONCURRENTLY exists for
those commands. In all supported versions, this caused only confusing
error messages to be generated. Note that with REINDEX, it was also
possible to issue a REINDEX CONCURRENTLY for a temporary relation owned
by a different session, leading to a server crash.
The idea to enforce transparently the non-concurrent code path for
temporary relations comes originally from Andres Freund.
Reported-by: Manuel Rigger
Author: Michael Paquier, Heikki Linnakangas
Reviewed-by: Andres Freund, Álvaro Herrera, Heikki Linnakangas
Discussion: https://postgr.es/m/CA+u7OA6gP7YAeCguyseusYcc=uR8+ypjCcgDDCTzjQ+k6S9ksQ@mail.gmail.com
Backpatch-through: 9.4
2020-01-22 01:49:18 +01:00
	Assert(!OidIsValid(stmt->oldNode) || (skip_build && !concurrent));

	/*
	 * Make the catalog entries for the index, including constraints. This
	 * step also actually builds the index, except if caller requested not to
Local partitioned indexes
When CREATE INDEX is run on a partitioned table, create catalog entries
for an index on the partitioned table (which is just a placeholder since
the table proper has no data of its own), and recurse to create actual
indexes on the existing partitions; create them in future partitions
also.
As a convenience gadget, if the new index definition matches some
existing index in partitions, these are picked up and used instead of
creating new ones. Whichever way these indexes come about, they become
attached to the index on the parent table and are dropped alongside it,
and cannot be dropped on isolation unless they are detached first.
To support pg_dump'ing these indexes, add commands
CREATE INDEX ON ONLY <table>
(which creates the index on the parent partitioned table, without
recursing) and
ALTER INDEX ATTACH PARTITION
(which is used after the indexes have been created individually on each
partition, to attach them to the parent index). These reconstruct prior
database state exactly.
Reviewed-by: (in alphabetical order) Peter Eisentraut, Robert Haas, Amit
Langote, Jesper Pedersen, Simon Riggs, David Rowley
Discussion: https://postgr.es/m/20171113170646.gzweigyrgg6pwsg4@alvherre.pgsql
2018-01-19 15:49:22 +01:00
	 * or in concurrent mode, in which case it'll be done later, or doing a
	 * partitioned index (because those don't have storage).
	 */
	flags = constr_flags = 0;
	if (stmt->isconstraint)
		flags |= INDEX_CREATE_ADD_CONSTRAINT;
	if (skip_build || concurrent || partitioned)
		flags |= INDEX_CREATE_SKIP_BUILD;
	if (stmt->if_not_exists)
		flags |= INDEX_CREATE_IF_NOT_EXISTS;
	if (concurrent)
		flags |= INDEX_CREATE_CONCURRENT;
Local partitioned indexes
When CREATE INDEX is run on a partitioned table, create catalog entries
for an index on the partitioned table (which is just a placeholder since
the table proper has no data of its own), and recurse to create actual
indexes on the existing partitions; create them in future partitions
also.
As a convenience gadget, if the new index definition matches some
existing index in partitions, these are picked up and used instead of
creating new ones. Whichever way these indexes come about, they become
attached to the index on the parent table and are dropped alongside it,
and cannot be dropped in isolation unless they are detached first.
To support pg_dump'ing these indexes, add commands
CREATE INDEX ON ONLY <table>
(which creates the index on the parent partitioned table, without
recursing) and
ALTER INDEX ATTACH PARTITION
(which is used after the indexes have been created individually on each
partition, to attach them to the parent index). These reconstruct prior
database state exactly.
Reviewed-by: (in alphabetical order) Peter Eisentraut, Robert Haas, Amit
Langote, Jesper Pedersen, Simon Riggs, David Rowley
Discussion: https://postgr.es/m/20171113170646.gzweigyrgg6pwsg4@alvherre.pgsql
2018-01-19 15:49:22 +01:00
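The option flags that DefineIndex hands to index_create() are independent bits OR'ed into one mask. A condensed sketch of that composition, with illustrative bit values (the real definitions live in catalog/index.h and may use different positions):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative bit assignments, not the real ones. */
#define INDEX_CREATE_IS_PRIMARY  (1 << 0)
#define INDEX_CREATE_CONCURRENT  (1 << 3)
#define INDEX_CREATE_PARTITIONED (1 << 5)
#define INDEX_CREATE_INVALID     (1 << 6)

/* Sketch of how independent boolean options fold into one bitmask. */
static uint16_t
build_index_flags(bool concurrent, bool partitioned, bool primary,
				  bool mark_invalid)
{
	uint16_t flags = 0;

	if (concurrent)
		flags |= INDEX_CREATE_CONCURRENT;
	if (partitioned)
		flags |= INDEX_CREATE_PARTITIONED;
	if (primary)
		flags |= INDEX_CREATE_IS_PRIMARY;
	if (mark_invalid)
		flags |= INDEX_CREATE_INVALID;
	return flags;
}
```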
if (partitioned)
	flags |= INDEX_CREATE_PARTITIONED;
if (stmt->primary)
	flags |= INDEX_CREATE_IS_PRIMARY;

/*
 * If the table is partitioned, and recursion was declined but partitions
 * exist, mark the index as invalid.
 */
if (partitioned && stmt->relation && !stmt->relation->inh)
{
Fix relcache inconsistency hazard in partition detach
During queries coming from ri_triggers.c, we need to omit partitions
that are marked pending detach -- otherwise, the RI query is tricked
into allowing a row into the referencing table whose corresponding row
is in the detached partition. Which is bogus: once the detach operation
completes, the row becomes an orphan.
However, the code was not doing that in repeatable-read transactions,
because relcache kept a copy of the partition descriptor that included
the partition, and used it in the RI query. This commit changes the
partdesc cache code to only keep descriptors that aren't dependent on
a snapshot (namely: those where no detached partitions exist, and those
where detached partitions are included). When a partdesc-without-
detached-partitions is requested, we create one afresh each time; also,
those partdescs are stored in PortalContext instead of
CacheMemoryContext.
find_inheritance_children gets a new output *detached_exist boolean,
which indicates whether any partition marked pending-detach is found.
Its "include_detached" input flag is changed to "omit_detached", because
that name captures the desired semantics more naturally.
CreatePartitionDirectory() and RelationGetPartitionDesc() arguments are
identically renamed.
This was noticed because a buildfarm member that runs with relcache
clobbering, which would not keep the improperly cached partdesc, broke
one test, which led us to realize that the expected output of that test
was bogus. This commit also corrects that expected output.
Author: Amit Langote <amitlangote09@gmail.com>
Author: Álvaro Herrera <alvherre@alvh.no-ip.org>
Discussion: https://postgr.es/m/3269784.1617215412@sss.pgh.pa.us
2021-04-22 21:13:25 +02:00
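The caching rule this commit introduces can be reduced to one predicate: a partition descriptor depends on the calling snapshot, and thus must not be kept in relcache, exactly when detached partitions exist and were omitted. A hypothetical model of that rule; `partdesc_is_cacheable` is not a real PostgreSQL function:

```c
#include <assert.h>
#include <stdbool.h>

/* Model of the commit's caching policy: keep only descriptors whose
 * contents do not depend on a snapshot. */
static bool
partdesc_is_cacheable(bool detached_exist, bool omit_detached)
{
	/* Snapshot-dependent case: detached partitions exist but were omitted,
	 * so a different snapshot could see a different descriptor. */
	if (detached_exist && omit_detached)
		return false;
	return true;
}
```

In the non-cacheable case the real code builds a fresh descriptor each time and stores it in PortalContext rather than CacheMemoryContext.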
	PartitionDesc pd = RelationGetPartitionDesc(rel, true);

	if (pd->nparts != 0)
		flags |= INDEX_CREATE_INVALID;
}

if (stmt->deferrable)
	constr_flags |= INDEX_CONSTR_CREATE_DEFERRABLE;
if (stmt->initdeferred)
	constr_flags |= INDEX_CONSTR_CREATE_INIT_DEFERRED;

indexRelationId =
	index_create(rel, indexRelationName, indexRelationId, parentIndexId,
				 parentConstraintId,
				 stmt->oldNode, indexInfo, indexColNames,
				 accessMethodId, tablespaceId,
				 collationObjectId, classObjectId,
				 coloptions, reloptions,
				 flags, constr_flags,
				 allowSystemTableMods, !check_rights,
				 &createdConstraintId);
Change many routines to return ObjectAddress rather than OID
The changed routines are mostly those that can be directly called by
ProcessUtilitySlow; the intention is to make the affected object
information more precise, in support of future event trigger changes.
Originally it was envisioned that the OID of the affected object would
be enough, and in most cases that is correct, but upon actually
implementing the event trigger changes it turned out that ObjectAddress
is more widely useful.
Additionally, some command execution routines grew an output argument
that's an object address which provides further info about the executed
command. To wit:
* for ALTER DOMAIN / ADD CONSTRAINT, it corresponds to the address of
the new constraint
* for ALTER OBJECT / SET SCHEMA, it corresponds to the address of the
schema that originally contained the object.
* for ALTER EXTENSION {ADD, DROP} OBJECT, it corresponds to the address
of the object added to or dropped from the extension.
There's no user-visible change in this commit, and no functional change
either.
Discussion: 20150218213255.GC6717@tamriel.snowman.net
Reviewed-By: Stephen Frost, Andres Freund
2015-03-03 18:10:50 +01:00
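For context, ObjectAddress is a small struct identifying any catalog object. The mirror below follows the field layout in PostgreSQL's objectaddress.h; the local `Oid` typedef and the pg_class OID used in the test are illustrative stand-ins.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t Oid;			/* stand-in for PostgreSQL's Oid */

/* Mirror of PostgreSQL's ObjectAddress: the catalog the object lives in,
 * the object's own OID, and a sub-object id (e.g. a column number). */
typedef struct ObjectAddress
{
	Oid			classId;		/* catalog OID */
	Oid			objectId;		/* object OID within that catalog */
	int32_t		objectSubId;	/* sub-object, or 0 for the whole object */
} ObjectAddress;

/* Equivalent of the ObjectAddressSet() macro: fill in an address with
 * sub-object 0. */
#define ObjectAddressSet(addr, class_id, object_id) \
	do { \
		(addr).classId = (class_id); \
		(addr).objectId = (object_id); \
		(addr).objectSubId = 0; \
	} while (0)
```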
ObjectAddressSet(address, RelationRelationId, indexRelationId);

if (!OidIsValid(indexRelationId))
{
	/*
	 * Roll back any GUC changes executed by index functions. Also revert
	 * to original default_tablespace if we changed it above.
	 */
	AtEOXact_GUC(false, root_save_nestlevel);

	/* Restore userid and security context */
	SetUserIdAndSecContext(root_save_userid, root_save_sec_context);

	table_close(rel, NoLock);
Report progress of CREATE INDEX operations
This uses the progress reporting infrastructure added by c16dc1aca5e0,
adding support for CREATE INDEX and CREATE INDEX CONCURRENTLY.
There are two pieces to this: one is index-AM-agnostic, and the other is
AM-specific. The latter is fairly elaborate for btrees, including
reportage for parallel index builds and the separate phases that btree
index creation uses; other index AMs, which are much simpler in their
building procedures, have simplistic reporting only, but that seems
sufficient, at least for non-concurrent builds.
The index-AM-agnostic part is fairly complete, providing insight into
the CONCURRENTLY wait phases as well as block-based progress during the
index validation table scan. (The index validation index scan requires
patching each AM, which has not been included here.)
Reviewers: Rahila Syed, Pavan Deolasee, Tatsuro Yamada
Discussion: https://postgr.es/m/20181220220022.mg63bhk26zdpvmcj@alvherre.pgsql
2019-04-02 20:18:08 +02:00
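The progress-reporting infrastructure mentioned above boils down to a per-backend array of int64 counters that monitoring views read. This toy model shows the update/reset lifecycle only; the names, the array size, and the storage are assumptions (the real code keeps the array in shared memory):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define NUM_PROGRESS_PARAMS 20	/* illustrative size */

/* Toy stand-in for the per-backend progress slots. */
static int64_t st_progress_param[NUM_PROGRESS_PARAMS];

/* Model of pgstat_progress_update_param(): set one counter. */
static void
progress_update_param(int index, int64_t val)
{
	assert(index >= 0 && index < NUM_PROGRESS_PARAMS);
	st_progress_param[index] = val;
}

/* Model of pgstat_progress_end_command(): clear all counters when the
 * command finishes. */
static void
progress_end_command(void)
{
	memset(st_progress_param, 0, sizeof(st_progress_param));
}
```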
	/* If this is the top-level index, we're done */
	if (!OidIsValid(parentIndexId))
		pgstat_progress_end_command();
	return address;
}
Adjust naming of indexes and their columns per recent discussion.
Index expression columns are now named after the FigureColname result for
their expressions, rather than always being "pg_expression_N". Digits are
appended to this name if needed to make the column name unique within the
index. (That happens for regular columns too, thus fixing the old problem
that CREATE INDEX fooi ON foo (f1, f1) fails. Before exclusion indexes
there was no real reason to do such a thing, but now maybe there is.)
Default names for indexes and associated constraints now include the column
names of all their columns, not only the first one as in previous practice.
(Of course, this will be truncated as needed to fit in NAMEDATALEN. Also,
pkey indexes retain the historical behavior of not naming specific columns
at all.)
An example of the results:
regression=# create table foo (f1 int, f2 text,
regression(# exclude (f1 with =, lower(f2) with =));
NOTICE: CREATE TABLE / EXCLUDE will create implicit index "foo_f1_lower_exclusion" for table "foo"
CREATE TABLE
regression=# \d foo_f1_lower_exclusion
Index "public.foo_f1_lower_exclusion"
Column | Type | Definition
--------+---------+------------
f1 | integer | f1
lower | text | lower(f2)
btree, for table "public.foo"
2009-12-23 03:35:25 +01:00
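The digit-appending rule described above can be sketched as a small loop: try the base name, then base plus 1, 2, ... until no existing column name matches. `make_unique_name` is a hypothetical helper; unlike the real ChooseIndexColumnNames it ignores NAMEDATALEN truncation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Append digits to `base` until the result collides with none of the
 * `nused` names already taken within the index. */
static void
make_unique_name(char *dst, size_t dstlen, const char *base,
				 const char **used, int nused)
{
	int			suffix = 0;

	for (;;)
	{
		bool		clash = false;
		int			i;

		if (suffix == 0)
			snprintf(dst, dstlen, "%s", base);
		else
			snprintf(dst, dstlen, "%s%d", base, suffix);

		for (i = 0; i < nused; i++)
			if (strcmp(dst, used[i]) == 0)
				clash = true;
		if (!clash)
			return;
		suffix++;
	}
}
```

With this rule, CREATE INDEX fooi ON foo (f1, f1) yields columns "f1" and "f11" rather than failing on a duplicate name.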
/*
 * Roll back any GUC changes executed by index functions, and keep
 * subsequent changes local to this command.  It's barely possible that
 * some index function changed a behavior-affecting GUC, e.g. xmloption,
 * that affects subsequent steps.  This improves bug-compatibility with
 * older PostgreSQL versions.  They did the AtEOXact_GUC() here for the
 * purpose of clearing the above default_tablespace change.
 */
AtEOXact_GUC(false, root_save_nestlevel);
root_save_nestlevel = NewGUCNestLevel();
Avoid pre-determining index names during CREATE TABLE LIKE parsing.
Formerly, when trying to copy both indexes and comments, CREATE TABLE LIKE
had to pre-assign names to indexes that had comments, because it made up an
explicit CommentStmt command to apply the comment and so it had to know the
name for the index. This creates bad interactions with other indexes, as
shown in bug #6734 from Daniele Varrazzo: the preassignment logic couldn't
take any other indexes into account so it could choose a conflicting name.
To fix, add a field to IndexStmt that allows it to carry a comment to be
assigned to the new index. (This isn't a user-exposed feature of CREATE
INDEX, only an internal option.) Now we don't need preassignment of index
names in any situation.
I also took the opportunity to refactor DefineIndex to accept the IndexStmt
as such, rather than passing all its fields individually in a mile-long
parameter list.
Back-patch to 9.2, but no further, because it seems too dangerous to change
IndexStmt or DefineIndex's API in released branches. The bug exists back
to 9.0 where CREATE TABLE LIKE grew the ability to copy comments, but given
the lack of prior complaints we'll just let it go unfixed before 9.2.
2012-07-16 19:25:18 +02:00
/* Add any requested comment */
if (stmt->idxcomment != NULL)
	CreateComments(indexRelationId, RelationRelationId, 0,
				   stmt->idxcomment);
if (partitioned)
{
	PartitionDesc partdesc;
	/*
	 * Unless caller specified to skip this step (via ONLY), process each
	 * partition to make sure they all contain a corresponding index.
	 *
	 * If we're called internally (no stmt->relation), recurse always.
	 */
	partdesc = RelationGetPartitionDesc(rel, true);
	if ((!stmt->relation || stmt->relation->inh) && partdesc->nparts > 0)
	{
		int			nparts = partdesc->nparts;
		Oid		   *part_oids = palloc(sizeof(Oid) * nparts);
		bool		invalidate_parent = false;
		TupleDesc	parentDesc;
		Oid		   *opfamOids;
		pgstat_progress_update_param(PROGRESS_CREATEIDX_PARTITIONS_TOTAL,
									 nparts);
		memcpy(part_oids, partdesc->oids, sizeof(Oid) * nparts);

		parentDesc = RelationGetDescr(rel);
		opfamOids = palloc(sizeof(Oid) * numberOfKeyAttributes);
		for (i = 0; i < numberOfKeyAttributes; i++)
			opfamOids[i] = get_opclass_family(classObjectId[i]);

		/*
		 * For each partition, scan all existing indexes; if one matches
		 * our index definition and is not already attached to some other
		 * parent index, attach it to the one we just created.
		 *
		 * If none matches, build a new index by calling ourselves
		 * recursively with the same options (except for the index name).
		 */
		for (i = 0; i < nparts; i++)
		{
			Oid			childRelid = part_oids[i];
			Relation	childrel;
			Oid			child_save_userid;
			int			child_save_sec_context;
			int			child_save_nestlevel;
			List	   *childidxs;
			ListCell   *cell;
			AttrMap    *attmap;
			bool		found = false;

			childrel = table_open(childRelid, lockmode);

			GetUserIdAndSecContext(&child_save_userid,
								   &child_save_sec_context);
			SetUserIdAndSecContext(childrel->rd_rel->relowner,
								   child_save_sec_context | SECURITY_RESTRICTED_OPERATION);
			child_save_nestlevel = NewGUCNestLevel();

			/*
			 * Don't try to create indexes on foreign tables, though. Skip
			 * those if a regular index, or fail if trying to create a
			 * constraint index.
			 */
			if (childrel->rd_rel->relkind == RELKIND_FOREIGN_TABLE)
			{
				if (stmt->unique || stmt->primary)
					ereport(ERROR,
							(errcode(ERRCODE_WRONG_OBJECT_TYPE),
							 errmsg("cannot create unique index on partitioned table \"%s\"",
									RelationGetRelationName(rel)),
							 errdetail("Table \"%s\" contains partitions that are foreign tables.",
									   RelationGetRelationName(rel))));

				AtEOXact_GUC(false, child_save_nestlevel);
				SetUserIdAndSecContext(child_save_userid,
									   child_save_sec_context);
				table_close(childrel, lockmode);
				continue;
			}
			childidxs = RelationGetIndexList(childrel);
			attmap =
				build_attrmap_by_name(RelationGetDescr(childrel),
									  parentDesc);

			foreach(cell, childidxs)
			{
				Oid			cldidxid = lfirst_oid(cell);
				Relation	cldidx;
				IndexInfo  *cldIdxInfo;

				/* this index is already partition of another one */
				if (has_superclass(cldidxid))
					continue;

				cldidx = index_open(cldidxid, lockmode);
				cldIdxInfo = BuildIndexInfo(cldidx);
				if (CompareIndexInfo(cldIdxInfo, indexInfo,
									 cldidx->rd_indcollation,
									 collationObjectId,
									 cldidx->rd_opfamily,
									 opfamOids,
									 attmap))
				{
					Oid			cldConstrOid = InvalidOid;

					/*
					 * Found a match.
					 *
					 * If this index is being created in the parent
					 * because of a constraint, then the child needs to
					 * have a constraint also, so look for one.  If there
					 * is no such constraint, this index is no good, so
					 * keep looking.
					 */
					if (createdConstraintId != InvalidOid)
					{
						cldConstrOid =
							get_relation_idx_constraint_oid(childRelid,
															cldidxid);
						if (cldConstrOid == InvalidOid)
						{
							index_close(cldidx, lockmode);
							continue;
						}
					}

					/* Attach index to parent and we're done. */
					IndexSetParentIndex(cldidx, indexRelationId);
					if (createdConstraintId != InvalidOid)
						ConstraintSetParentConstraint(cldConstrOid,
													  createdConstraintId,
													  childRelid);

					if (!cldidx->rd_index->indisvalid)
						invalidate_parent = true;

					found = true;
					/* keep lock till commit */
					index_close(cldidx, NoLock);
					break;
				}

				index_close(cldidx, lockmode);
			}

			list_free(childidxs);
			AtEOXact_GUC(false, child_save_nestlevel);
			SetUserIdAndSecContext(child_save_userid,
								   child_save_sec_context);
			table_close(childrel, NoLock);

			/*
			 * If no matching index was found, create our own.
			 */
			if (!found)
			{
				IndexStmt  *childStmt = copyObject(stmt);
				bool		found_whole_row;
				ListCell   *lc;

				/*
				 * We can't use the same index name for the child index,
				 * so clear idxname to let the recursive invocation choose
				 * a new name.  Likewise, the existing target relation
				 * field is wrong, and if indexOid or oldNode are set,
				 * they mustn't be applied to the child either.
				 */
				childStmt->idxname = NULL;
				childStmt->relation = NULL;
				childStmt->indexOid = InvalidOid;
				childStmt->oldNode = InvalidOid;
				childStmt->oldCreateSubid = InvalidSubTransactionId;
				childStmt->oldFirstRelfilenodeSubid = InvalidSubTransactionId;

				/*
				 * Adjust any Vars (both in expressions and in the index's
				 * WHERE clause) to match the partition's column numbering
				 * in case it's different from the parent's.
				 */
				foreach(lc, childStmt->indexParams)
				{
					IndexElem  *ielem = lfirst(lc);

					/*
					 * If the index parameter is an expression, we must
					 * translate it to contain child Vars.
					 */
					if (ielem->expr)
					{
						ielem->expr =
							map_variable_attnos((Node *) ielem->expr,
												1, 0, attmap,
												InvalidOid,
												&found_whole_row);
						if (found_whole_row)
							elog(ERROR, "cannot convert whole-row table reference");
					}
				}
				childStmt->whereClause =
					map_variable_attnos(stmt->whereClause, 1, 0,
										attmap,
										InvalidOid, &found_whole_row);
				if (found_whole_row)
					elog(ERROR, "cannot convert whole-row table reference");

				/*
				 * Recurse as the starting user ID.  Callee will use that
				 * for permission checks, then switch again.
				 */
				Assert(GetUserId() == child_save_userid);
				SetUserIdAndSecContext(root_save_userid,
									   root_save_sec_context);
				DefineIndex(childRelid, childStmt,
							InvalidOid, /* no predefined OID */
							indexRelationId,	/* this is our child */
							createdConstraintId,
							is_alter_table, check_rights, check_not_in_use,
							skip_build, quiet);
				SetUserIdAndSecContext(child_save_userid,
									   child_save_sec_context);
			}

			pgstat_progress_update_param(PROGRESS_CREATEIDX_PARTITIONS_DONE,
										 i + 1);
			free_attrmap(attmap);
        }

        /*
         * The pg_index row we inserted for this index was marked
         * indisvalid=true.  But if we attached an existing index that is
         * invalid, this is incorrect, so update our row to invalid too.
         */
        if (invalidate_parent)
        {
            Relation    pg_index = table_open(IndexRelationId, RowExclusiveLock);
            HeapTuple   tup,
                        newtup;

            tup = SearchSysCache1(INDEXRELID,
                                  ObjectIdGetDatum(indexRelationId));
            if (!HeapTupleIsValid(tup))
                elog(ERROR, "cache lookup failed for index %u",
                     indexRelationId);
            newtup = heap_copytuple(tup);
            ((Form_pg_index) GETSTRUCT(newtup))->indisvalid = false;
            CatalogTupleUpdate(pg_index, &tup->t_self, newtup);
            ReleaseSysCache(tup);
            table_close(pg_index, RowExclusiveLock);
            heap_freetuple(newtup);
        }
    }

    /*
     * Indexes on partitioned tables are not themselves built, so we're
     * done here.
     */
    AtEOXact_GUC(false, root_save_nestlevel);
    SetUserIdAndSecContext(root_save_userid, root_save_sec_context);
    table_close(rel, NoLock);
    if (!OidIsValid(parentIndexId))
        pgstat_progress_end_command();
    return address;
    }

    AtEOXact_GUC(false, root_save_nestlevel);
    SetUserIdAndSecContext(root_save_userid, root_save_sec_context);

Fix concurrent indexing operations with temporary tables
Attempting to use CREATE INDEX, DROP INDEX or REINDEX with CONCURRENTLY
on a temporary relation with ON COMMIT actions triggered unexpected
errors because those operations use multiple transactions internally to
complete their work. Here is for example one confusing error when using
ON COMMIT DELETE ROWS:
ERROR: index "foo" already contains data
Issues related to temporary relations and concurrent indexing are fixed
in this commit by enforcing the non-concurrent path to be taken for
temporary relations even if using CONCURRENTLY, transparently to the
user. Using a non-concurrent path does not matter in practice as locks
cannot be taken on a temporary relation by a session different than the
one owning the relation, and the non-concurrent operation is more
effective.
The problem exists with REINDEX since v12 with the introduction of
CONCURRENTLY, and with CREATE/DROP INDEX since CONCURRENTLY exists for
those commands. In all supported versions, this caused only confusing
error messages to be generated. Note that with REINDEX, it was also
possible to issue a REINDEX CONCURRENTLY for a temporary relation owned
by a different session, leading to a server crash.
The idea to enforce transparently the non-concurrent code path for
temporary relations comes originally from Andres Freund.
Reported-by: Manuel Rigger
Author: Michael Paquier, Heikki Linnakangas
Reviewed-by: Andres Freund, Álvaro Herrera, Heikki Linnakangas
Discussion: https://postgr.es/m/CA+u7OA6gP7YAeCguyseusYcc=uR8+ypjCcgDDCTzjQ+k6S9ksQ@mail.gmail.com
Backpatch-through: 9.4
2020-01-22 01:49:18 +01:00
    if (!concurrent)
    {
        /* Close the heap and we're done, in the non-concurrent case */
        table_close(rel, NoLock);

        /* If this is the top-level index, we're done. */
        if (!OidIsValid(parentIndexId))
            pgstat_progress_end_command();

Change many routines to return ObjectAddress rather than OID
The changed routines are mostly those that can be directly called by
ProcessUtilitySlow; the intention is to make the affected object
information more precise, in support for future event trigger changes.
Originally it was envisioned that the OID of the affected object would
be enough, and in most cases that is correct, but upon actually
implementing the event trigger changes it turned out that ObjectAddress
is more widely useful.
Additionally, some command execution routines grew an output argument
that's an object address which provides further info about the executed
command. To wit:
* for ALTER DOMAIN / ADD CONSTRAINT, it corresponds to the address of
the new constraint
* for ALTER OBJECT / SET SCHEMA, it corresponds to the address of the
schema that originally contained the object.
* for ALTER EXTENSION {ADD, DROP} OBJECT, it corresponds to the address
of the object added to or dropped from the extension.
There's no user-visible change in this commit, and no functional change
either.
Discussion: 20150218213255.GC6717@tamriel.snowman.net
Reviewed-By: Stephen Frost, Andres Freund
2015-03-03 18:10:50 +01:00
        return address;
    }

    /* save lockrelid and locktag for below, then close rel */
    heaprelid = rel->rd_lockInfo.lockRelId;
    SET_LOCKTAG_RELATION(heaplocktag, heaprelid.dbId, heaprelid.relId);
    table_close(rel, NoLock);

Adjust naming of indexes and their columns per recent discussion.
Index expression columns are now named after the FigureColname result for
their expressions, rather than always being "pg_expression_N".  Digits are
appended to this name if needed to make the column name unique within the
index.  (That happens for regular columns too, thus fixing the old problem
that CREATE INDEX fooi ON foo (f1, f1) fails.  Before exclusion indexes
there was no real reason to do such a thing, but now maybe there is.)
Default names for indexes and associated constraints now include the column
names of all their columns, not only the first one as in previous practice.
(Of course, this will be truncated as needed to fit in NAMEDATALEN.  Also,
pkey indexes retain the historical behavior of not naming specific columns
at all.)
An example of the results:
regression=# create table foo (f1 int, f2 text,
regression(# exclude (f1 with =, lower(f2) with =));
NOTICE: CREATE TABLE / EXCLUDE will create implicit index "foo_f1_lower_exclusion" for table "foo"
CREATE TABLE
regression=# \d foo_f1_lower_exclusion
Index "public.foo_f1_lower_exclusion"
Column | Type | Definition
--------+---------+------------
f1 | integer | f1
lower | text | lower(f2)
btree, for table "public.foo"
2009-12-23 03:35:25 +01:00

    /*
     * For a concurrent build, it's important to make the catalog entries
     * visible to other transactions before we start to build the index. That
     * will prevent them from making incompatible HOT updates.  The new index
     * will be marked not indisready and not indisvalid, so that no one else
     * tries to either insert into it or use it for queries.
     *
     * We must commit our current transaction so that the index becomes
     * visible; then start another.  Note that all the data structures we just
     * built are lost in the commit.  The only data we keep past here are the
     * relation IDs.
     *
     * Before committing, get a session-level lock on the table, to ensure
     * that neither it nor the index can be dropped before we finish.  This
     * cannot block, even if someone else is waiting for access, because we
     * already have the same lock within our transaction.
     *
     * Note: we don't currently bother with a session lock on the index,
     * because there are no operations that could change its state while we
     * hold lock on the parent table.  This might need to change later.
     */
    LockRelationIdForSession(&heaprelid, ShareUpdateExclusiveLock);

    PopActiveSnapshot();
    CommitTransactionCommand();
    StartTransactionCommand();

    /* Tell concurrent index builds to ignore us, if index qualifies */
    if (safe_index)
        set_indexsafe_procflags();

    /*
     * The index is now visible, so we can report the OID.  While on it,
     * include the report for the beginning of phase 2.
     */
    {
        const int   progress_cols[] = {
            PROGRESS_CREATEIDX_INDEX_OID,
            PROGRESS_CREATEIDX_PHASE
        };
        const int64 progress_vals[] = {
            indexRelationId,
            PROGRESS_CREATEIDX_PHASE_WAIT_1
        };

        pgstat_progress_update_multi_param(2, progress_cols, progress_vals);
    }

    /*
     * Phase 2 of concurrent index build (see comments for validate_index()
     * for an overview of how this works)
     *
     * Now we must wait until no running transaction could have the table open
     * with the old list of indexes.  Use ShareLock to consider running
     * transactions that hold locks that permit writing to the table.  Note we
     * do not need to worry about xacts that open the table for writing after
     * this point; they will see the new index when they open it.
     *
     * Note: the reason we use actual lock acquisition here, rather than just
     * checking the ProcArray and sleeping, is that deadlock is possible if
     * one of the transactions in question is blocked trying to acquire an
     * exclusive lock on our table.  The lock code will detect deadlock and
     * error out properly.
     */
    WaitForLockers(heaplocktag, ShareLock, true);

    /*
     * At this moment we are sure that there are no transactions with the
     * table open for write that don't have this new index in their list of
     * indexes.  We have waited out all the existing transactions and any new
     * transaction will have the new index in its list, but the index is still
     * marked as "not-ready-for-inserts".  The index is consulted while
     * deciding HOT-safety though.  This arrangement ensures that no new HOT
     * chains can be created where the new tuple and the old tuple in the
     * chain have different index keys.
     *
     * We now take a new snapshot, and build the index using all tuples that
     * are visible in this snapshot.  We can be sure that any HOT updates to
     * these tuples will be compatible with the index, since any updates made
     * by transactions that didn't know about the index are now committed or
     * rolled back.  Thus, each visible tuple is either the end of its
     * HOT-chain or the extension of the chain is HOT-safe for this index.
     */

    /* Set ActiveSnapshot since functions in the indexes may need it */
    PushActiveSnapshot(GetTransactionSnapshot());

    /* Perform concurrent build of index */
    index_concurrently_build(relationId, indexRelationId);

    /* we can do away with our snapshot */
    PopActiveSnapshot();

    /*
     * Commit this transaction to make the indisready update visible.
     */
    CommitTransactionCommand();
    StartTransactionCommand();

    /* Tell concurrent index builds to ignore us, if index qualifies */
    if (safe_index)
        set_indexsafe_procflags();

    /*
     * Phase 3 of concurrent index build
     *
     * We once again wait until no transaction can have the table open with
     * the index marked as read-only for updates.
     */
    pgstat_progress_update_param(PROGRESS_CREATEIDX_PHASE,
                                 PROGRESS_CREATEIDX_PHASE_WAIT_2);
    WaitForLockers(heaplocktag, ShareLock, true);

    /*
     * Now take the "reference snapshot" that will be used by validate_index()
     * to filter candidate tuples.  Beware!  There might still be snapshots in
     * use that treat some transaction as in-progress that our reference
     * snapshot treats as committed.  If such a recently-committed transaction
     * deleted tuples in the table, we will not include them in the index; yet
     * those transactions which see the deleting one as still-in-progress will
     * expect such tuples to be there once we mark the index as valid.
     *
     * We solve this by waiting for all endangered transactions to exit before
     * we mark the index as valid.
     *
     * We also set ActiveSnapshot to this snap, since functions in indexes may
     * need a snapshot.
     */
    snapshot = RegisterSnapshot(GetTransactionSnapshot());
    PushActiveSnapshot(snapshot);

    /*
     * Scan the index and the heap, insert any missing index entries.
     */
    validate_index(relationId, indexRelationId, snapshot);

    /*
     * Drop the reference snapshot.  We must do this before waiting out other
     * snapshot holders, else we will deadlock against other processes also
     * doing CREATE INDEX CONCURRENTLY, which would see our snapshot as one
     * they must wait for.  But first, save the snapshot's xmin to use as
     * limitXmin for GetCurrentVirtualXIDs().
     */
    limitXmin = snapshot->xmin;

    PopActiveSnapshot();
    UnregisterSnapshot(snapshot);

    /*
     * The snapshot subsystem could still contain registered snapshots that
     * are holding back our process's advertised xmin; in particular, if
     * default_transaction_isolation = serializable, there is a transaction
     * snapshot that is still active.  The CatalogSnapshot is likewise a
     * hazard.  To ensure no deadlocks, we must commit and start yet another
     * transaction, and do our wait before any snapshot has been taken in it.
     */
    CommitTransactionCommand();
    StartTransactionCommand();

    /* Tell concurrent index builds to ignore us, if index qualifies */
    if (safe_index)
        set_indexsafe_procflags();

    /* We should now definitely not be advertising any xmin. */
    Assert(MyProc->xmin == InvalidTransactionId);

    /*
     * The index is now valid in the sense that it contains all currently
     * interesting tuples.  But since it might not contain tuples deleted just
     * before the reference snap was taken, we have to wait out any
     * transactions that might have older snapshots.
     */
    pgstat_progress_update_param(PROGRESS_CREATEIDX_PHASE,
                                 PROGRESS_CREATEIDX_PHASE_WAIT_3);
    WaitForOlderSnapshots(limitXmin, true);

    /*
     * Index can now be marked valid -- update its pg_index entry
     */
Fix assorted bugs in CREATE/DROP INDEX CONCURRENTLY.
Commit 8cb53654dbdb4c386369eb988062d0bbb6de725e, which introduced DROP
INDEX CONCURRENTLY, managed to break CREATE INDEX CONCURRENTLY via a poor
choice of catalog state representation. The pg_index state for an index
that's reached the final pre-drop stage was the same as the state for an
index just created by CREATE INDEX CONCURRENTLY. This meant that the
(necessary) change to make RelationGetIndexList ignore about-to-die indexes
also made it ignore freshly-created indexes; which is catastrophic because
the latter do need to be considered in HOT-safety decisions. Failure to
do so leads to incorrect index entries and subsequently wrong results from
queries depending on the concurrently-created index.
To fix, add an additional boolean column "indislive" to pg_index, so that
the freshly-created and about-to-die states can be distinguished. (This
change obviously is only possible in HEAD. This patch will need to be
back-patched, but in 9.2 we'll use a kluge consisting of overloading the
formerly-impossible state of indisvalid = true and indisready = false.)
In addition, change CREATE/DROP INDEX CONCURRENTLY so that the pg_index
flag changes they make without exclusive lock on the index are made via
heap_inplace_update() rather than a normal transactional update. The
latter is not very safe because moving the pg_index tuple could result in
concurrent SnapshotNow scans finding it twice or not at all, thus possibly
resulting in index corruption. This is a pre-existing bug in CREATE INDEX
CONCURRENTLY, which was copied into the DROP code.
In addition, fix various places in the code that ought to check to make
sure that the indexes they are manipulating are valid and/or ready as
appropriate. These represent bugs that have existed since 8.2, since
a failed CREATE INDEX CONCURRENTLY could leave a corrupt or invalid
index behind, and we ought not try to do anything that might fail with
such an index.
Also fix RelationReloadIndexInfo to ensure it copies all the pg_index
columns that are allowed to change after initial creation. Previously we
could have been left with stale values of some fields in an index relcache
entry. It's not clear whether this actually had any user-visible
consequences, but it's at least a bug waiting to happen.
In addition, do some code and docs review for DROP INDEX CONCURRENTLY;
some cosmetic code cleanup but mostly addition and revision of comments.
This will need to be back-patched, but in a noticeably different form,
so I'm committing it to HEAD before working on the back-patch.
Problem reported by Amit Kapila, diagnosis by Pavan Deolassee,
fix by Tom Lane and Andres Freund.
2012-11-29 03:25:27 +01:00
    index_set_state_flags(indexRelationId, INDEX_CREATE_SET_VALID);

2007-05-02 23:08:46 +02:00
|
|
|
/*
|
|
|
|
* The pg_index update will cause backends (including this one) to update
|
|
|
|
* relcache entries for the index itself, but we should also send a
|
|
|
|
* relcache inval on the parent table to force replanning of cached plans.
|
|
|
|
* Otherwise existing sessions might fail to use the new index where it
|
2007-09-20 19:56:33 +02:00
|
|
|
* would be useful. (Note that our earlier commits did not create reasons
|
Fix assorted bugs in CREATE/DROP INDEX CONCURRENTLY.
Commit 8cb53654dbdb4c386369eb988062d0bbb6de725e, which introduced DROP
INDEX CONCURRENTLY, managed to break CREATE INDEX CONCURRENTLY via a poor
choice of catalog state representation. The pg_index state for an index
that's reached the final pre-drop stage was the same as the state for an
index just created by CREATE INDEX CONCURRENTLY. This meant that the
(necessary) change to make RelationGetIndexList ignore about-to-die indexes
also made it ignore freshly-created indexes; which is catastrophic because
the latter do need to be considered in HOT-safety decisions. Failure to
do so leads to incorrect index entries and subsequently wrong results from
queries depending on the concurrently-created index.
To fix, add an additional boolean column "indislive" to pg_index, so that
the freshly-created and about-to-die states can be distinguished. (This
change obviously is only possible in HEAD. This patch will need to be
back-patched, but in 9.2 we'll use a kluge consisting of overloading the
formerly-impossible state of indisvalid = true and indisready = false.)
In addition, change CREATE/DROP INDEX CONCURRENTLY so that the pg_index
flag changes they make without exclusive lock on the index are made via
heap_inplace_update() rather than a normal transactional update. The
latter is not very safe because moving the pg_index tuple could result in
concurrent SnapshotNow scans finding it twice or not at all, thus possibly
resulting in index corruption. This is a pre-existing bug in CREATE INDEX
CONCURRENTLY, which was copied into the DROP code.
In addition, fix various places in the code that ought to check to make
sure that the indexes they are manipulating are valid and/or ready as
appropriate. These represent bugs that have existed since 8.2, since
a failed CREATE INDEX CONCURRENTLY could leave a corrupt or invalid
index behind, and we ought not try to do anything that might fail with
such an index.
Also fix RelationReloadIndexInfo to ensure it copies all the pg_index
columns that are allowed to change after initial creation. Previously we
could have been left with stale values of some fields in an index relcache
entry. It's not clear whether this actually had any user-visible
consequences, but it's at least a bug waiting to happen.
In addition, do some code and docs review for DROP INDEX CONCURRENTLY;
some cosmetic code cleanup but mostly addition and revision of comments.
This will need to be back-patched, but in a noticeably different form,
so I'm committing it to HEAD before working on the back-patch.
Problem reported by Amit Kapila, diagnosis by Pavan Deolasee,
fix by Tom Lane and Andres Freund.
2012-11-29 03:25:27 +01:00
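The state distinction this commit introduces can be sketched in standalone C. This is a hedged illustration only: the field names mirror pg_index's indislive/indisready/indisvalid columns, but none of this is PostgreSQL's actual code.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the three pg_index flag columns discussed
 * above; an illustration of the state machine, not PostgreSQL code. */
typedef struct IndexFlags
{
    bool indislive;             /* considered in HOT-safety decisions */
    bool indisready;            /* accepts new insertions */
    bool indisvalid;            /* usable for queries */
} IndexFlags;

/* State just after CREATE INDEX CONCURRENTLY creates the index. */
static IndexFlags
freshly_created(void)
{
    return (IndexFlags) {.indislive = true, .indisready = false,
                         .indisvalid = false};
}

/* Final pre-drop state during DROP INDEX CONCURRENTLY. */
static IndexFlags
about_to_die(void)
{
    return (IndexFlags) {.indislive = false, .indisready = false,
                         .indisvalid = false};
}

/* RelationGetIndexList may ignore an index only when it is truly dead;
 * before indislive existed the two states above were indistinguishable. */
static bool
safe_to_ignore(IndexFlags f)
{
    return !f.indislive;
}
```

With the extra flag, a freshly created index stays visible to HOT-safety decisions while a dying one can be skipped.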
|
|
|
* to replan; so relcache flush on the index itself was sufficient.)
|
2007-05-02 23:08:46 +02:00
|
|
|
*/
|
|
|
|
CacheInvalidateRelcacheByRelid(heaprelid.relId);
|
|
|
|
|
2006-08-25 06:06:58 +02:00
|
|
|
/*
|
|
|
|
* Last thing to do is release the session-level lock on the parent table.
|
|
|
|
*/
|
|
|
|
UnlockRelationIdForSession(&heaprelid, ShareUpdateExclusiveLock);
|
2011-07-18 17:02:48 +02:00
|
|
|
|
Report progress of CREATE INDEX operations
This uses the progress reporting infrastructure added by c16dc1aca5e0,
adding support for CREATE INDEX and CREATE INDEX CONCURRENTLY.
There are two pieces to this: one is index-AM-agnostic, and the other is
AM-specific. The latter is fairly elaborate for btrees, including
reportage for parallel index builds and the separate phases that btree
index creation uses; other index AMs, which are much simpler in their
building procedures, have simplistic reporting only, but that seems
sufficient, at least for non-concurrent builds.
The index-AM-agnostic part is fairly complete, providing insight into
the CONCURRENTLY wait phases as well as block-based progress during the
index validation table scan. (The index validation index scan requires
patching each AM, which has not been included here.)
Reviewers: Rahila Syed, Pavan Deolasee, Tatsuro Yamada
Discussion: https://postgr.es/m/20181220220022.mg63bhk26zdpvmcj@alvherre.pgsql
2019-04-02 20:18:08 +02:00
|
|
|
pgstat_progress_end_command();
|
|
|
|
|
Change many routines to return ObjectAddress rather than OID
The changed routines are mostly those that can be directly called by
ProcessUtilitySlow; the intention is to make the affected object
information more precise, in support for future event trigger changes.
Originally it was envisioned that the OID of the affected object would
be enough, and in most cases that is correct, but upon actually
implementing the event trigger changes it turned out that ObjectAddress
is more widely useful.
Additionally, some command execution routines grew an output argument
that's an object address which provides further info about the executed
command. To wit:
* for ALTER DOMAIN / ADD CONSTRAINT, it corresponds to the address of
the new constraint
* for ALTER OBJECT / SET SCHEMA, it corresponds to the address of the
schema that originally contained the object.
* for ALTER EXTENSION {ADD, DROP} OBJECT, it corresponds to the address
of the object added to or dropped from the extension.
There's no user-visible change in this commit, and no functional change
either.
Discussion: 20150218213255.GC6717@tamriel.snowman.net
Reviewed-By: Stephen Frost, Andres Freund
2015-03-03 18:10:50 +01:00
|
|
|
return address;
|
1996-07-09 08:22:35 +02:00
|
|
|
}
|
|
|
|
|
|
|
|
|
2010-05-27 17:59:10 +02:00
|
|
|
/*
|
|
|
|
* CheckMutability
|
|
|
|
* Test whether given expression is mutable
|
|
|
|
*/
|
|
|
|
static bool
|
|
|
|
CheckMutability(Expr *expr)
|
|
|
|
{
|
|
|
|
/*
|
|
|
|
* First run the expression through the planner. This has a couple of
|
|
|
|
* important consequences. First, function default arguments will get
|
|
|
|
* inserted, which may affect volatility (consider "default now()").
|
|
|
|
* Second, inline-able functions will get inlined, which may allow us to
|
|
|
|
* conclude that the function is really less volatile than it's marked. As
|
|
|
|
* an example, polymorphic functions must be marked with the most volatile
|
|
|
|
* behavior that they have for any input type, but once we inline the
|
|
|
|
* function we may be able to conclude that it's not so volatile for the
|
|
|
|
* particular input type we're dealing with.
|
|
|
|
*
|
|
|
|
* We assume here that expression_planner() won't scribble on its input.
|
|
|
|
*/
|
|
|
|
expr = expression_planner(expr);
|
|
|
|
|
|
|
|
/* Now we can search for non-immutable functions */
|
|
|
|
return contain_mutable_functions((Node *) expr);
|
|
|
|
}
|
|
|
|
|
|
|
|
|
1996-07-09 08:22:35 +02:00
|
|
|
/*
|
|
|
|
* CheckPredicate
|
2003-12-28 22:57:37 +01:00
|
|
|
* Checks that the given partial-index predicate is valid.
|
2001-07-16 07:07:00 +02:00
|
|
|
*
|
|
|
|
* This used to also constrain the form of the predicate to forms that
|
|
|
|
* indxpath.c could do something with. However, that seems overly
|
|
|
|
* restrictive. One useful application of partial indexes is to apply
|
|
|
|
* a UNIQUE constraint across a subset of a table, and in that scenario
|
2017-02-06 10:33:58 +01:00
|
|
|
* any evaluable predicate will work. So accept any predicate here
|
2001-07-16 07:07:00 +02:00
|
|
|
* (except ones requiring a plan), and let indxpath.c fend for itself.
|
1996-07-09 08:22:35 +02:00
|
|
|
*/
|
|
|
|
static void
|
2003-12-28 22:57:37 +01:00
|
|
|
CheckPredicate(Expr *predicate)
|
1996-07-09 08:22:35 +02:00
|
|
|
{
|
2001-07-17 23:53:01 +02:00
|
|
|
/*
|
Centralize the logic for detecting misplaced aggregates, window funcs, etc.
Formerly we relied on checking after-the-fact to see if an expression
contained aggregates, window functions, or sub-selects when it shouldn't.
This is grotty, easily forgotten (indeed, we had forgotten to teach
DefineIndex about rejecting window functions), and none too efficient
since it requires extra traversals of the parse tree. To improve matters,
define an enum type that classifies all SQL sub-expressions, store it in
ParseState to show what kind of expression we are currently parsing, and
make transformAggregateCall, transformWindowFuncCall, and transformSubLink
check the expression type and throw error if the type indicates the
construct is disallowed. This allows removal of a large number of ad-hoc
checks scattered around the code base. The enum type is sufficiently
fine-grained that we can still produce error messages of at least the
same specificity as before.
Bringing these error checks together revealed that we'd been none too
consistent about phrasing of the error messages, so standardize the wording
a bit.
Also, rewrite checking of aggregate arguments so that it requires only one
traversal of the arguments, rather than up to three as before.
In passing, clean up some more comments left over from add_missing_from
support, and annotate some tests that I think are dead code now that that's
gone. (I didn't risk actually removing said dead code, though.)
2012-08-10 17:35:33 +02:00
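The scheme the commit message above describes can be sketched in isolation: the parser records what kind of expression it is currently transforming, and constructs such as sub-selects consult that classification instead of relying on after-the-fact tree traversals. The names below imitate parse_node.h, but this is only an illustrative sketch, not PostgreSQL's actual code.

```c
#include <stdbool.h>

/* A tiny stand-in for the enum the commit adds to ParseState. */
typedef enum ParseExprKind
{
    EXPR_KIND_SELECT_TARGET,    /* SELECT list item: anything goes */
    EXPR_KIND_INDEX_EXPRESSION, /* index definition column */
    EXPR_KIND_INDEX_PREDICATE   /* partial-index WHERE clause */
} ParseExprKind;

/* Would a sub-select be legal in the expression being parsed now?
 * In the real code, transformSubLink throws an error for the
 * disallowed kinds instead of returning false. */
static bool
sublink_allowed(ParseExprKind kind)
{
    switch (kind)
    {
        case EXPR_KIND_INDEX_EXPRESSION:
        case EXPR_KIND_INDEX_PREDICATE:
            return false;
        default:
            return true;
    }
}
```

Because the check happens at parse time, no extra traversal of the finished tree is needed, which is why the after-the-fact checks below could become simple comments.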
|
|
|
* transformExpr() should have already rejected subqueries, aggregates,
|
|
|
|
* and window functions, based on the EXPR_KIND_ for a predicate.
|
2001-07-17 23:53:01 +02:00
|
|
|
*/
|
|
|
|
|
|
|
|
/*
|
2002-04-05 02:31:36 +02:00
|
|
|
* A predicate using mutable functions is probably wrong, for the same
|
2003-05-28 18:04:02 +02:00
|
|
|
* reasons that we don't allow an index expression to use one.
|
2001-07-17 23:53:01 +02:00
|
|
|
*/
|
2010-05-27 17:59:10 +02:00
|
|
|
if (CheckMutability(predicate))
|
2003-07-20 23:56:35 +02:00
|
|
|
ereport(ERROR,
|
|
|
|
(errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
|
|
|
|
errmsg("functions in index predicate must be marked IMMUTABLE")));
|
1996-07-09 08:22:35 +02:00
|
|
|
}
|
|
|
|
|
2007-01-09 03:14:16 +01:00
|
|
|
/*
|
|
|
|
* Compute per-index-column information, including indexed column numbers
|
Implement operator class parameters
PostgreSQL provides a set of template index access methods, where opclasses have
much freedom in the semantics of indexing. These index AMs are GiST, GIN,
SP-GiST and BRIN. There, opclasses define the representation of keys, operations
on them, and the supported search strategies. So it's natural that opclasses may
face some tradeoffs that require a user-side decision. This commit implements
opclass parameters, allowing users to set some values which tell the opclass how
to index the particular dataset.
This commit doesn't introduce new storage in the system catalog. Instead it uses
pg_attribute.attoptions, which is used for table column storage options but
unused for index attributes.
In order to avoid changing the signature of each opclass support function, we
implement a unified way to pass options to opclass support functions. Options
are set in fn_expr as a constant bytea expression. This is possible due to the
fact that opclass support functions are executed outside of expressions, so
fn_expr is unused for them.
This commit comes with some examples of opclass options usage. We parametrize
the signature length in GiST. That applies to multiple opclasses: tsvector_ops,
gist__intbig_ops, gist_ltree_ops, gist__ltree_ops, gist_trgm_ops and
gist_hstore_ops. We also parametrize the maximum number of integer ranges for
gist__int_ops. However, the main future usage of this feature is expected
to be json, where users would be able to specify which way to index particular
json parts.
Catversion is bumped.
Discussion: https://postgr.es/m/d22c3a18-31c7-1879-fc11-4c1ce2f5e5af%40postgrespro.ru
Author: Nikita Glukhov, revised by me
Reviewed-by: Nikolay Shaplov, Robert Haas, Tom Lane, Tomas Vondra, Alvaro Herrera
2020-03-30 18:17:11 +02:00
|
|
|
 * or index expressions, opclasses and their options. Note that all output
|
2018-04-12 15:37:22 +02:00
|
|
|
* should be allocated for all columns, including "including" ones.
|
2007-01-09 03:14:16 +01:00
|
|
|
*/
|
1996-07-09 08:22:35 +02:00
|
|
|
static void
|
2003-05-28 18:04:02 +02:00
|
|
|
ComputeIndexAttrs(IndexInfo *indexInfo,
|
2012-01-25 21:28:07 +01:00
|
|
|
Oid *typeOidP,
|
2011-02-08 22:04:18 +01:00
|
|
|
Oid *collationOidP,
|
2003-05-28 18:04:02 +02:00
|
|
|
Oid *classOidP,
|
2007-01-09 03:14:16 +01:00
|
|
|
int16 *colOptionP,
|
2003-05-28 18:04:02 +02:00
|
|
|
List *attList, /* list of IndexElem's */
|
2009-12-07 06:22:23 +01:00
|
|
|
List *exclusionOpNames,
|
2003-05-28 18:04:02 +02:00
|
|
|
Oid relId,
|
2017-10-31 15:34:31 +01:00
|
|
|
const char *accessMethodName,
|
2004-05-05 06:48:48 +02:00
|
|
|
Oid accessMethodId,
|
2007-01-09 03:14:16 +01:00
|
|
|
bool amcanorder,
|
2004-05-05 06:48:48 +02:00
|
|
|
bool isconstraint)
|
1996-07-09 08:22:35 +02:00
|
|
|
{
|
2009-12-07 06:22:23 +01:00
|
|
|
ListCell *nextExclOp;
|
|
|
|
ListCell *lc;
|
|
|
|
int attn;
|
2018-04-07 22:00:39 +02:00
|
|
|
int nkeycols = indexInfo->ii_NumIndexKeyAttrs;
|
2009-12-07 06:22:23 +01:00
|
|
|
|
|
|
|
/* Allocate space for exclusion operator info, if needed */
|
|
|
|
if (exclusionOpNames)
|
|
|
|
{
|
2018-04-07 22:00:39 +02:00
|
|
|
Assert(list_length(exclusionOpNames) == nkeycols);
|
|
|
|
indexInfo->ii_ExclusionOps = (Oid *) palloc(sizeof(Oid) * nkeycols);
|
|
|
|
indexInfo->ii_ExclusionProcs = (Oid *) palloc(sizeof(Oid) * nkeycols);
|
|
|
|
indexInfo->ii_ExclusionStrats = (uint16 *) palloc(sizeof(uint16) * nkeycols);
|
2009-12-07 06:22:23 +01:00
|
|
|
nextExclOp = list_head(exclusionOpNames);
|
|
|
|
}
|
|
|
|
else
|
|
|
|
nextExclOp = NULL;
|
1996-08-15 09:42:52 +02:00
|
|
|
|
1997-09-07 07:04:48 +02:00
|
|
|
/*
|
1996-07-09 08:22:35 +02:00
|
|
|
* process attributeList
|
1997-09-07 07:04:48 +02:00
|
|
|
*/
|
2009-12-07 06:22:23 +01:00
|
|
|
attn = 0;
|
|
|
|
foreach(lc, attList)
|
1997-09-07 07:04:48 +02:00
|
|
|
{
|
2009-12-07 06:22:23 +01:00
|
|
|
IndexElem *attribute = (IndexElem *) lfirst(lc);
|
2003-05-28 18:04:02 +02:00
|
|
|
Oid atttype;
|
2011-02-08 22:04:18 +01:00
|
|
|
Oid attcollation;
|
2000-02-25 03:58:48 +01:00
|
|
|
|
2007-01-09 03:14:16 +01:00
|
|
|
/*
|
|
|
|
* Process the column-or-expression to be indexed.
|
|
|
|
*/
|
2003-05-28 18:04:02 +02:00
|
|
|
if (attribute->name != NULL)
|
|
|
|
{
|
|
|
|
/* Simple index attribute */
|
|
|
|
HeapTuple atttuple;
|
|
|
|
Form_pg_attribute attform;
|
|
|
|
|
|
|
|
Assert(attribute->expr == NULL);
|
|
|
|
atttuple = SearchSysCacheAttName(relId, attribute->name);
|
|
|
|
if (!HeapTupleIsValid(atttuple))
|
2004-05-05 06:48:48 +02:00
|
|
|
{
|
|
|
|
/* difference in error message spellings is historical */
|
|
|
|
if (isconstraint)
|
|
|
|
ereport(ERROR,
|
|
|
|
(errcode(ERRCODE_UNDEFINED_COLUMN),
|
|
|
|
errmsg("column \"%s\" named in key does not exist",
|
|
|
|
attribute->name)));
|
|
|
|
else
|
|
|
|
ereport(ERROR,
|
|
|
|
(errcode(ERRCODE_UNDEFINED_COLUMN),
|
|
|
|
errmsg("column \"%s\" does not exist",
|
|
|
|
attribute->name)));
|
|
|
|
}
|
2003-05-28 18:04:02 +02:00
|
|
|
attform = (Form_pg_attribute) GETSTRUCT(atttuple);
|
2018-04-12 12:02:45 +02:00
|
|
|
indexInfo->ii_IndexAttrNumbers[attn] = attform->attnum;
|
2003-05-28 18:04:02 +02:00
|
|
|
atttype = attform->atttypid;
|
2011-02-08 22:04:18 +01:00
|
|
|
attcollation = attform->attcollation;
|
2003-05-28 18:04:02 +02:00
|
|
|
ReleaseSysCache(atttuple);
|
|
|
|
}
|
|
|
|
else
|
|
|
|
{
|
|
|
|
/* Index expression */
|
2011-03-24 20:29:52 +01:00
|
|
|
Node *expr = attribute->expr;
|
2003-05-28 18:04:02 +02:00
|
|
|
|
2011-03-24 20:29:52 +01:00
|
|
|
Assert(expr != NULL);
|
2018-04-07 22:00:39 +02:00
|
|
|
|
|
|
|
if (attn >= nkeycols)
|
|
|
|
ereport(ERROR,
|
|
|
|
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
|
|
|
|
errmsg("expressions are not supported in included columns")));
|
2011-03-24 20:29:52 +01:00
|
|
|
atttype = exprType(expr);
|
|
|
|
attcollation = exprCollation(expr);
|
2003-05-28 18:04:02 +02:00
|
|
|
|
|
|
|
/*
|
2011-03-24 20:29:52 +01:00
|
|
|
* Strip any top-level COLLATE clause. This ensures that we treat
|
|
|
|
* "x COLLATE y" and "(x COLLATE y)" alike.
|
2003-05-28 18:04:02 +02:00
|
|
|
*/
|
2011-03-24 20:29:52 +01:00
|
|
|
while (IsA(expr, CollateExpr))
|
|
|
|
expr = (Node *) ((CollateExpr *) expr)->arg;
|
|
|
|
|
|
|
|
if (IsA(expr, Var) &&
|
|
|
|
((Var *) expr)->varattno != InvalidAttrNumber)
|
|
|
|
{
|
|
|
|
/*
|
|
|
|
* User wrote "(column)" or "(column COLLATE something)".
|
|
|
|
 * Treat it like a simple attribute anyway.
|
|
|
|
*/
|
2018-04-12 12:02:45 +02:00
|
|
|
indexInfo->ii_IndexAttrNumbers[attn] = ((Var *) expr)->varattno;
|
2011-03-24 20:29:52 +01:00
|
|
|
}
|
|
|
|
else
|
|
|
|
{
|
2018-04-12 12:02:45 +02:00
|
|
|
indexInfo->ii_IndexAttrNumbers[attn] = 0; /* marks expression */
|
2011-03-24 20:29:52 +01:00
|
|
|
indexInfo->ii_Expressions = lappend(indexInfo->ii_Expressions,
|
|
|
|
expr);
|
|
|
|
|
|
|
|
/*
|
Centralize the logic for detecting misplaced aggregates, window funcs, etc.
Formerly we relied on checking after-the-fact to see if an expression
contained aggregates, window functions, or sub-selects when it shouldn't.
This is grotty, easily forgotten (indeed, we had forgotten to teach
DefineIndex about rejecting window functions), and none too efficient
since it requires extra traversals of the parse tree. To improve matters,
define an enum type that classifies all SQL sub-expressions, store it in
ParseState to show what kind of expression we are currently parsing, and
make transformAggregateCall, transformWindowFuncCall, and transformSubLink
check the expression type and throw error if the type indicates the
construct is disallowed. This allows removal of a large number of ad-hoc
checks scattered around the code base. The enum type is sufficiently
fine-grained that we can still produce error messages of at least the
same specificity as before.
Bringing these error checks together revealed that we'd been none too
consistent about phrasing of the error messages, so standardize the wording
a bit.
Also, rewrite checking of aggregate arguments so that it requires only one
traversal of the arguments, rather than up to three as before.
In passing, clean up some more comments left over from add_missing_from
support, and annotate some tests that I think are dead code now that that's
gone. (I didn't risk actually removing said dead code, though.)
2012-08-10 17:35:33 +02:00
|
|
|
* transformExpr() should have already rejected subqueries,
|
|
|
|
* aggregates, and window functions, based on the EXPR_KIND_
|
|
|
|
* for an index expression.
|
2011-03-24 20:29:52 +01:00
|
|
|
*/
|
|
|
|
|
|
|
|
/*
|
2015-04-26 18:42:31 +02:00
|
|
|
* An expression using mutable functions is probably wrong,
|
2011-03-24 20:29:52 +01:00
|
|
|
* since if you aren't going to get the same result for the
|
|
|
|
* same data every time, it's not clear what the index entries
|
|
|
|
* mean at all.
|
|
|
|
*/
|
|
|
|
if (CheckMutability((Expr *) expr))
|
|
|
|
ereport(ERROR,
|
|
|
|
(errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
|
|
|
|
errmsg("functions in index expression must be marked IMMUTABLE")));
|
|
|
|
}
|
2003-05-28 18:04:02 +02:00
|
|
|
}
|
1997-09-07 07:04:48 +02:00
|
|
|
|
2012-01-25 21:28:07 +01:00
|
|
|
typeOidP[attn] = atttype;
|
|
|
|
|
2018-04-12 15:37:22 +02:00
|
|
|
/*
|
|
|
|
* Included columns have no collation, no opclass and no ordering
|
|
|
|
* options.
|
|
|
|
*/
|
|
|
|
if (attn >= nkeycols)
|
|
|
|
{
|
|
|
|
if (attribute->collation)
|
|
|
|
ereport(ERROR,
|
|
|
|
(errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
|
|
|
|
errmsg("including column does not support a collation")));
|
|
|
|
if (attribute->opclass)
|
|
|
|
ereport(ERROR,
|
|
|
|
(errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
|
|
|
|
errmsg("including column does not support an operator class")));
|
|
|
|
if (attribute->ordering != SORTBY_DEFAULT)
|
|
|
|
ereport(ERROR,
|
|
|
|
(errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
|
|
|
|
errmsg("including column does not support ASC/DESC options")));
|
|
|
|
if (attribute->nulls_ordering != SORTBY_NULLS_DEFAULT)
|
|
|
|
ereport(ERROR,
|
|
|
|
(errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
|
|
|
|
errmsg("including column does not support NULLS FIRST/LAST options")));
|
|
|
|
|
|
|
|
classOidP[attn] = InvalidOid;
|
|
|
|
colOptionP[attn] = 0;
|
|
|
|
collationOidP[attn] = InvalidOid;
|
|
|
|
attn++;
|
|
|
|
|
|
|
|
continue;
|
|
|
|
}
|
|
|
|
|
2011-02-08 22:04:18 +01:00
|
|
|
/*
|
2011-03-20 01:29:08 +01:00
|
|
|
* Apply collation override if any
|
2011-02-08 22:04:18 +01:00
|
|
|
*/
|
|
|
|
if (attribute->collation)
|
2011-03-20 01:29:08 +01:00
|
|
|
attcollation = get_collation_oid(attribute->collation, false);
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Check we have a collation iff it's a collatable type. The only
|
|
|
|
* expected failures here are (1) COLLATE applied to a noncollatable
|
|
|
|
* type, or (2) index expression had an unresolved collation. But we
|
|
|
|
* might as well code this to be a complete consistency check.
|
|
|
|
*/
|
|
|
|
if (type_is_collatable(atttype))
|
|
|
|
{
|
|
|
|
if (!OidIsValid(attcollation))
|
|
|
|
ereport(ERROR,
|
|
|
|
(errcode(ERRCODE_INDETERMINATE_COLLATION),
|
2011-03-22 21:55:32 +01:00
|
|
|
errmsg("could not determine which collation to use for index expression"),
|
2011-03-20 01:29:08 +01:00
|
|
|
errhint("Use the COLLATE clause to set the collation explicitly.")));
|
|
|
|
}
|
|
|
|
else
|
2011-02-08 22:04:18 +01:00
|
|
|
{
|
2011-03-20 01:29:08 +01:00
|
|
|
if (OidIsValid(attcollation))
|
2011-02-08 22:04:18 +01:00
|
|
|
ereport(ERROR,
|
2011-03-20 01:29:08 +01:00
|
|
|
(errcode(ERRCODE_DATATYPE_MISMATCH),
|
2011-02-08 22:04:18 +01:00
|
|
|
errmsg("collations are not supported by type %s",
|
|
|
|
format_type_be(atttype))));
|
|
|
|
}
|
2011-03-20 01:29:08 +01:00
|
|
|
|
2011-02-08 22:04:18 +01:00
|
|
|
collationOidP[attn] = attcollation;
|
|
|
|
|
2007-01-09 03:14:16 +01:00
|
|
|
/*
|
|
|
|
* Identify the opclass to use.
|
|
|
|
*/
|
Implement table partitioning.
Table partitioning is like table inheritance and reuses much of the
existing infrastructure, but there are some important differences.
The parent is called a partitioned table and is always empty; it may
not have indexes or non-inherited constraints, since those make no
sense for a relation with no data of its own. The children are called
partitions and contain all of the actual data. Each partition has an
implicit partitioning constraint. Multiple inheritance is not
allowed, and partitioning and inheritance can't be mixed. Partitions
can't have extra columns and may not allow nulls unless the parent
does. Tuples inserted into the parent are automatically routed to the
correct partition, so tuple-routing ON INSERT triggers are not needed.
Tuple routing isn't yet supported for partitions which are foreign
tables, and it doesn't handle updates that cross partition boundaries.
Currently, tables can be range-partitioned or list-partitioned. List
partitioning is limited to a single column, but range partitioning can
involve multiple columns. A partitioning "column" can be an
expression.
Because table partitioning is less general than table inheritance, it
is hoped that it will be easier to reason about properties of
partitions, and therefore that this will serve as a better foundation
for a variety of possible optimizations, including query planner
optimizations. The tuple routing which this patch does based on
the implicit partitioning constraints is an example of this, but it
seems likely that many other useful optimizations are also possible.
Amit Langote, reviewed and tested by Robert Haas, Ashutosh Bapat,
Amit Kapila, Rajkumar Raghuwanshi, Corey Huinker, Jaime Casanova,
Rushabh Lathia, Erik Rijkers, among others. Minor revisions by me.
2016-12-07 19:17:43 +01:00
|
|
|
classOidP[attn] = ResolveOpClass(attribute->opclass,
|
|
|
|
atttype,
|
|
|
|
accessMethodName,
|
|
|
|
accessMethodId);
|
2007-01-09 03:14:16 +01:00
|
|
|
|
2009-12-07 06:22:23 +01:00
|
|
|
/*
|
|
|
|
* Identify the exclusion operator, if any.
|
|
|
|
*/
|
|
|
|
if (nextExclOp)
|
|
|
|
{
|
|
|
|
List *opname = (List *) lfirst(nextExclOp);
|
|
|
|
Oid opid;
|
|
|
|
Oid opfamily;
|
|
|
|
int strat;
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Find the operator --- it must accept the column datatype
|
|
|
|
* without runtime coercion (but binary compatibility is OK)
|
|
|
|
*/
|
|
|
|
opid = compatible_oper_opid(opname, atttype, atttype, false);
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Only allow commutative operators to be used in exclusion
|
|
|
|
* constraints. If X conflicts with Y, but Y does not conflict
|
|
|
|
* with X, bad things will happen.
|
|
|
|
*/
|
|
|
|
if (get_commutator(opid) != opid)
|
|
|
|
ereport(ERROR,
|
|
|
|
(errcode(ERRCODE_WRONG_OBJECT_TYPE),
|
|
|
|
errmsg("operator %s is not commutative",
|
|
|
|
format_operator(opid)),
|
|
|
|
errdetail("Only commutative operators can be used in exclusion constraints.")));
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Operator must be a member of the right opfamily, too
|
|
|
|
*/
|
|
|
|
opfamily = get_opclass_family(classOidP[attn]);
|
|
|
|
strat = get_op_opfamily_strategy(opid, opfamily);
|
|
|
|
if (strat == 0)
|
|
|
|
{
|
|
|
|
HeapTuple opftuple;
|
|
|
|
Form_pg_opfamily opfform;
|
|
|
|
|
|
|
|
/*
|
|
|
|
* attribute->opclass might not explicitly name the opfamily,
|
|
|
|
* so fetch the name of the selected opfamily for use in the
|
|
|
|
* error message.
|
|
|
|
*/
|
2010-02-14 19:42:19 +01:00
|
|
|
opftuple = SearchSysCache1(OPFAMILYOID,
|
|
|
|
ObjectIdGetDatum(opfamily));
|
2009-12-07 06:22:23 +01:00
|
|
|
if (!HeapTupleIsValid(opftuple))
|
|
|
|
elog(ERROR, "cache lookup failed for opfamily %u",
|
|
|
|
opfamily);
|
|
|
|
opfform = (Form_pg_opfamily) GETSTRUCT(opftuple);
|
|
|
|
|
|
|
|
ereport(ERROR,
|
|
|
|
(errcode(ERRCODE_WRONG_OBJECT_TYPE),
|
|
|
|
errmsg("operator %s is not a member of operator family \"%s\"",
|
|
|
|
format_operator(opid),
|
|
|
|
NameStr(opfform->opfname)),
|
|
|
|
errdetail("The exclusion operator must be related to the index operator class for the constraint.")));
|
|
|
|
}
|
|
|
|
|
|
|
|
indexInfo->ii_ExclusionOps[attn] = opid;
|
|
|
|
indexInfo->ii_ExclusionProcs[attn] = get_opcode(opid);
|
|
|
|
indexInfo->ii_ExclusionStrats[attn] = strat;
|
Represent Lists as expansible arrays, not chains of cons-cells.
Originally, Postgres Lists were a more or less exact reimplementation of
Lisp lists, which consist of chains of separately-allocated cons cells,
each having a value and a next-cell link. We'd hacked that once before
(commit d0b4399d8) to add a separate List header, but the data was still
in cons cells. That makes some operations -- notably list_nth() -- O(N),
and it's bulky because of the next-cell pointers and per-cell palloc
overhead, and it's very cache-unfriendly if the cons cells end up
scattered around rather than being adjacent.
In this rewrite, we still have List headers, but the data is in a
resizable array of values, with no next-cell links. Now we need at
most two palloc's per List, and often only one, since we can allocate
some values in the same palloc call as the List header. (Of course,
extending an existing List may require repalloc's to enlarge the array.
But this involves just O(log N) allocations not O(N).)
Of course this is not without downsides. The key difficulty is that
addition or deletion of a list entry may now cause other entries to
move, which it did not before.
For example, that breaks foreach() and sister macros, which historically
used a pointer to the current cons-cell as loop state. We can repair
those macros transparently by making their actual loop state be an
integer list index; the exposed "ListCell *" pointer is no longer state
carried across loop iterations, but is just a derived value. (In
practice, modern compilers can optimize things back to having just one
loop state value, at least for simple cases with inline loop bodies.)
In principle, this is a semantics change for cases where the loop body
inserts or deletes list entries ahead of the current loop index; but
I found no such cases in the Postgres code.
The change is not at all transparent for code that doesn't use foreach()
but chases lists "by hand" using lnext(). The largest share of such
code in the backend is in loops that were maintaining "prev" and "next"
variables in addition to the current-cell pointer, in order to delete
list cells efficiently using list_delete_cell(). However, we no longer
need a previous-cell pointer to delete a list cell efficiently. Keeping
a next-cell pointer doesn't work, as explained above, but we can improve
matters by changing such code to use a regular foreach() loop and then
using the new macro foreach_delete_current() to delete the current cell.
(This macro knows how to update the associated foreach loop's state so
that no cells will be missed in the traversal.)
There remains a nontrivial risk of code assuming that a ListCell *
pointer will remain good over an operation that could now move the list
contents. To help catch such errors, list.c can be compiled with a new
define symbol DEBUG_LIST_MEMORY_USAGE that forcibly moves list contents
whenever that could possibly happen. This makes list operations
significantly more expensive so it's not normally turned on (though it
is on by default if USE_VALGRIND is on).
There are two notable API differences from the previous code:
* lnext() now requires the List's header pointer in addition to the
current cell's address.
* list_delete_cell() no longer requires a previous-cell argument.
These changes are somewhat unfortunate, but on the other hand code using
either function needs inspection to see if it is assuming anything
it shouldn't, so it's not all bad.
Programmers should be aware of these significant performance changes:
* list_nth() and related functions are now O(1); so there's no
major access-speed difference between a list and an array.
* Inserting or deleting a list element now takes time proportional to
the distance to the end of the list, due to moving the array elements.
(However, it typically *doesn't* require palloc or pfree, so except in
long lists it's probably still faster than before.) Notably, lcons()
used to be about the same cost as lappend(), but that's no longer true
if the list is long. Code that uses lcons() and list_delete_first()
to maintain a stack might usefully be rewritten to push and pop at the
end of the list rather than the beginning.
* There are now list_insert_nth...() and list_delete_nth...() functions
that add or remove a list cell identified by index. These have the
data-movement penalty explained above, but there's no search penalty.
* list_concat() and variants now copy the second list's data into
storage belonging to the first list, so there is no longer any
sharing of cells between the input lists. The second argument is
now declared "const List *" to reflect that it isn't changed.
This patch just does the minimum needed to get the new implementation
in place and fix bugs exposed by the regression tests. As suggested
by the foregoing, there's a fair amount of followup work remaining to
do.
Also, the ENABLE_LIST_COMPAT macros are finally removed in this
commit. Code using those should have been gone a dozen years ago.
Patch by me; thanks to David Rowley, Jesper Pedersen, and others
for review.
Discussion: https://postgr.es/m/11587.1550975080@sss.pgh.pa.us
2019-07-15 19:41:58 +02:00
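The performance claims in the commit message above can be illustrated with a freestanding sketch of an array-backed list. This is a toy model only: the names imitate pg_list.h, but it is not PostgreSQL's implementation and handles only int elements.

```c
#include <stdlib.h>

/* Toy array-backed list: one header plus a resizable element array,
 * giving O(1) list_nth() instead of the O(N) walk that a chain of
 * cons cells would require. Illustrative sketch only. */
typedef struct List
{
    int  length;
    int  max_length;
    int *elements;
} List;

static List *
list_make(void)
{
    List *l = malloc(sizeof(List));

    l->length = 0;
    l->max_length = 4;
    l->elements = malloc(l->max_length * sizeof(int));
    return l;
}

static void
lappend_int(List *l, int datum)
{
    if (l->length == l->max_length)
    {
        /* doubling yields O(log N) reallocations over the list's life */
        l->max_length *= 2;
        l->elements = realloc(l->elements, l->max_length * sizeof(int));
    }
    l->elements[l->length++] = datum;
}

/* O(1) indexed access -- the point of the array representation */
static int
list_nth_int(const List *l, int n)
{
    return l->elements[n];
}
```

Appending stays amortized O(1), while indexed access no longer pays the per-cell pointer chase of the old cons-cell representation.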
|
|
|
nextExclOp = lnext(exclusionOpNames, nextExclOp);
|
2009-12-07 06:22:23 +01:00
|
|
|
}
|
|
|
|
|
2007-01-09 03:14:16 +01:00
|
|
|
/*
|
|
|
|
* Set up the per-column options (indoption field). For now, this is
|
|
|
|
* zero for any un-ordered index, while ordered indexes have DESC and
|
|
|
|
* NULLS FIRST/LAST options.
|
|
|
|
*/
|
|
|
|
colOptionP[attn] = 0;
|
|
|
|
if (amcanorder)
|
|
|
|
{
|
|
|
|
/* default ordering is ASC */
|
|
|
|
if (attribute->ordering == SORTBY_DESC)
|
|
|
|
colOptionP[attn] |= INDOPTION_DESC;
|
|
|
|
/* default null ordering is LAST for ASC, FIRST for DESC */
|
|
|
|
if (attribute->nulls_ordering == SORTBY_NULLS_DEFAULT)
|
|
|
|
{
|
|
|
|
if (attribute->ordering == SORTBY_DESC)
|
|
|
|
colOptionP[attn] |= INDOPTION_NULLS_FIRST;
|
|
|
|
}
|
|
|
|
else if (attribute->nulls_ordering == SORTBY_NULLS_FIRST)
|
|
|
|
colOptionP[attn] |= INDOPTION_NULLS_FIRST;
|
|
|
|
}
|
|
|
|
else
|
|
|
|
{
|
|
|
|
/* index AM does not support ordering */
|
|
|
|
if (attribute->ordering != SORTBY_DEFAULT)
|
|
|
|
ereport(ERROR,
|
|
|
|
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
|
|
|
|
errmsg("access method \"%s\" does not support ASC/DESC options",
|
|
|
|
accessMethodName)));
|
|
|
|
if (attribute->nulls_ordering != SORTBY_NULLS_DEFAULT)
|
|
|
|
ereport(ERROR,
|
|
|
|
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
|
|
|
|
errmsg("access method \"%s\" does not support NULLS FIRST/LAST options",
|
|
|
|
accessMethodName)));
|
|
|
|
}
|
|
|
|
|
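The per-column indoption logic above can be exercised in isolation. The sketch below uses the flag values from PostgreSQL's catalog/pg_index.h (INDOPTION_DESC, INDOPTION_NULLS_FIRST) and mimics the branch structure of the code, but the helper itself is only an illustration, not the real implementation.

```c
#include <stdint.h>

/* Flag values as in PostgreSQL's catalog/pg_index.h; the helper is an
 * illustrative re-implementation of the logic above, not real code. */
#define INDOPTION_DESC        0x0001
#define INDOPTION_NULLS_FIRST 0x0002

typedef enum SortByDir
{
    SORTBY_DEFAULT,
    SORTBY_ASC,
    SORTBY_DESC
} SortByDir;

typedef enum SortByNulls
{
    SORTBY_NULLS_DEFAULT,
    SORTBY_NULLS_FIRST,
    SORTBY_NULLS_LAST
} SortByNulls;

static int16_t
compute_indoption(SortByDir ordering, SortByNulls nulls_ordering)
{
    int16_t     opt = 0;

    /* default ordering is ASC */
    if (ordering == SORTBY_DESC)
        opt |= INDOPTION_DESC;

    /* default null ordering is LAST for ASC, FIRST for DESC */
    if (nulls_ordering == SORTBY_NULLS_DEFAULT)
    {
        if (ordering == SORTBY_DESC)
            opt |= INDOPTION_NULLS_FIRST;
    }
    else if (nulls_ordering == SORTBY_NULLS_FIRST)
        opt |= INDOPTION_NULLS_FIRST;

    return opt;
}
```

For example, a plain `col DESC` column gets both bits set, matching the "FIRST for DESC" default described in the comment above.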
Implement operator class parameters
PostgreSQL provides a set of template index access methods, where opclasses have
much freedom in the semantics of indexing. These index AMs are GiST, GIN,
SP-GiST and BRIN. There, opclasses define the representation of keys, operations
on them, and the supported search strategies. So it's natural that opclasses may
face some tradeoffs that require a user-side decision. This commit implements
opclass parameters, allowing users to set some values which tell the opclass how
to index the particular dataset.
This commit doesn't introduce new storage in the system catalog. Instead it uses
pg_attribute.attoptions, which is used for table column storage options but
unused for index attributes.
In order to avoid changing the signature of each opclass support function, we
implement a unified way to pass options to opclass support functions. Options
are set in fn_expr as a constant bytea expression. This is possible due to the
fact that opclass support functions are executed outside of expressions, so
fn_expr is unused for them.
This commit comes with some examples of opclass options usage. We parametrize
the signature length in GiST. That applies to multiple opclasses: tsvector_ops,
gist__intbig_ops, gist_ltree_ops, gist__ltree_ops, gist_trgm_ops and
gist_hstore_ops. We also parametrize the maximum number of integer ranges for
gist__int_ops. However, the main future usage of this feature is expected
to be json, where users would be able to specify which way to index particular
json parts.
Catversion is bumped.
Discussion: https://postgr.es/m/d22c3a18-31c7-1879-fc11-4c1ce2f5e5af%40postgrespro.ru
Author: Nikita Glukhov, revised by me
Reviwed-by: Nikolay Shaplov, Robert Haas, Tom Lane, Tomas Vondra, Alvaro Herrera
2020-03-30 18:17:11 +02:00
|
|
|
		/* Set up the per-column opclass options (attoptions field). */
		if (attribute->opclassopts)
		{
			Assert(attn < nkeycols);

			if (!indexInfo->ii_OpclassOptions)
				indexInfo->ii_OpclassOptions =
					palloc0(sizeof(Datum) * indexInfo->ii_NumIndexAttrs);

			indexInfo->ii_OpclassOptions[attn] =
				transformRelOptions((Datum) 0, attribute->opclassopts,
									NULL, NULL, false, false);
		}

		attn++;
	}
}

/*
 * Resolve possibly-defaulted operator class specification
 *
 * Note: This is used to resolve operator class specifications in index and
 * partition key definitions.
 */
Oid
ResolveOpClass(List *opclass, Oid attrType,
			   const char *accessMethodName, Oid accessMethodId)
{
	char	   *schemaname;
	char	   *opcname;
	HeapTuple	tuple;
	Form_pg_opclass opform;
	Oid			opClassId,
				opInputType;

	if (opclass == NIL)
	{
		/* no operator class specified, so find the default */
		opClassId = GetDefaultOpClass(attrType, accessMethodId);
		if (!OidIsValid(opClassId))
			ereport(ERROR,
					(errcode(ERRCODE_UNDEFINED_OBJECT),
					 errmsg("data type %s has no default operator class for access method \"%s\"",
							format_type_be(attrType), accessMethodName),
					 errhint("You must specify an operator class for the index or define a default operator class for the data type.")));
		return opClassId;
	}

	/*
	 * Specific opclass name given, so look up the opclass.
	 */

	/* deconstruct the name list */
	DeconstructQualifiedName(opclass, &schemaname, &opcname);

	if (schemaname)
	{
		/* Look in specific schema only */
		Oid			namespaceId;

		namespaceId = LookupExplicitNamespace(schemaname, false);
		tuple = SearchSysCache3(CLAAMNAMENSP,
								ObjectIdGetDatum(accessMethodId),
								PointerGetDatum(opcname),
								ObjectIdGetDatum(namespaceId));
	}
	else
	{
		/* Unqualified opclass name, so search the search path */
		opClassId = OpclassnameGetOpcid(accessMethodId, opcname);
		if (!OidIsValid(opClassId))
			ereport(ERROR,
					(errcode(ERRCODE_UNDEFINED_OBJECT),
					 errmsg("operator class \"%s\" does not exist for access method \"%s\"",
							opcname, accessMethodName)));
		tuple = SearchSysCache1(CLAOID, ObjectIdGetDatum(opClassId));
	}

	if (!HeapTupleIsValid(tuple))
		ereport(ERROR,
				(errcode(ERRCODE_UNDEFINED_OBJECT),
				 errmsg("operator class \"%s\" does not exist for access method \"%s\"",
						NameListToString(opclass), accessMethodName)));

	/*
	 * Verify that the index operator class accepts this datatype.  Note we
	 * will accept binary compatibility.
	 */
	opform = (Form_pg_opclass) GETSTRUCT(tuple);
	opClassId = opform->oid;
	opInputType = opform->opcintype;

	if (!IsBinaryCoercible(attrType, opInputType))
		ereport(ERROR,
				(errcode(ERRCODE_DATATYPE_MISMATCH),
				 errmsg("operator class \"%s\" does not accept data type %s",
						NameListToString(opclass), format_type_be(attrType))));

	ReleaseSysCache(tuple);

	return opClassId;
}

/*
 * GetDefaultOpClass
 *
 * Given the OIDs of a datatype and an access method, find the default
 * operator class, if any.  Returns InvalidOid if there is none.
 */
Oid
GetDefaultOpClass(Oid type_id, Oid am_id)
{
	Oid			result = InvalidOid;
	int			nexact = 0;
	int			ncompatible = 0;
	int			ncompatiblepreferred = 0;
	Relation	rel;
	ScanKeyData skey[1];
	SysScanDesc scan;
	HeapTuple	tup;
	TYPCATEGORY tcategory;

	/* If it's a domain, look at the base type instead */
	type_id = getBaseType(type_id);

	tcategory = TypeCategory(type_id);

	/*
	 * We scan through all the opclasses available for the access method,
	 * looking for one that is marked default and matches the target type
	 * (either exactly or binary-compatibly, but prefer an exact match).
	 *
	 * We could find more than one binary-compatible match.  If just one is
	 * for a preferred type, use that one; otherwise we fail, forcing the user
	 * to specify which one he wants.  (The preferred-type special case is a
	 * kluge for varchar: it's binary-compatible to both text and bpchar, so
	 * we need a tiebreaker.)  If we find more than one exact match, then
	 * someone put bogus entries in pg_opclass.
	 */
	rel = table_open(OperatorClassRelationId, AccessShareLock);

	ScanKeyInit(&skey[0],
				Anum_pg_opclass_opcmethod,
				BTEqualStrategyNumber, F_OIDEQ,
				ObjectIdGetDatum(am_id));

	scan = systable_beginscan(rel, OpclassAmNameNspIndexId, true,
							  NULL, 1, skey);

	while (HeapTupleIsValid(tup = systable_getnext(scan)))
	{
		Form_pg_opclass opclass = (Form_pg_opclass) GETSTRUCT(tup);

		/* ignore altogether if not a default opclass */
		if (!opclass->opcdefault)
			continue;
		if (opclass->opcintype == type_id)
		{
			nexact++;
			result = opclass->oid;
		}
		else if (nexact == 0 &&
				 IsBinaryCoercible(type_id, opclass->opcintype))
		{
			if (IsPreferredType(tcategory, opclass->opcintype))
			{
				ncompatiblepreferred++;
				result = opclass->oid;
			}
			else if (ncompatiblepreferred == 0)
			{
				ncompatible++;
|
Remove WITH OIDS support, change oid catalog column visibility.
Previously tables declared WITH OIDS, including a significant fraction
of the catalog tables, stored the oid column not as a normal column,
but as part of the tuple header.
This special column was not shown by default, which was somewhat odd,
as it's often (consider e.g. pg_class.oid) one of the more important
parts of a row. Neither pg_dump nor COPY included the contents of the
oid column by default.
The fact that the oid column was not an ordinary column necessitated a
significant amount of special case code to support oid columns. That
was already painful for the existing code, and the upcoming work aiming
to make table storage pluggable would have required expanding and
duplicating that "specialness" significantly.
WITH OIDS has been deprecated since 2005 (commit ff02d0a05280e0).
Remove it.
Removing includes:
- CREATE TABLE and ALTER TABLE syntax for declaring the table to be
WITH OIDS has been removed (WITH (oids[ = true]) will error out)
- pg_dump does not support dumping tables declared WITH OIDS and will
issue a warning when dumping one (and ignore the oid column).
- restoring a pg_dump archive with pg_restore will warn when
restoring a table with oid contents (and ignore the oid column)
- COPY will refuse to load binary dump that includes oids.
- pg_upgrade will error out when encountering tables declared WITH
OIDS, they have to be altered to remove the oid column first.
- Functionality to access the oid of the last inserted row (like
plpgsql's RESULT_OID, spi's SPI_lastoid, ...) has been removed.
The syntax for declaring a table WITHOUT OIDS (or WITH (oids = false)
for CREATE TABLE) is still supported. While that requires a bit of
support code, it seems unnecessary to break applications / dumps that
do not use oids, and are explicit about not using them.
The biggest user of WITH OID columns was postgres' catalog. This
commit changes all 'magic' oid columns to be columns that are normally
declared and stored. To reduce unnecessary query breakage all the
newly added columns are still named 'oid', even if a table's column
naming scheme would indicate 'reloid' or such. This obviously
requires adapting a lot of code, mostly replacing oid access via
HeapTupleGetOid() with access to the underlying Form_pg_*->oid column.
The bootstrap process now assigns oids for all oid columns in
genbki.pl that do not have an explicit value (starting at the largest
oid previously used); only oids assigned later will be above
FirstBootstrapObjectId. As the oid column now is a normal column the
special bootstrap syntax for oids has been removed.
Oids are not automatically assigned during insertion anymore, all
backend code explicitly assigns oids with GetNewOidWithIndex(). For
the rare case that insertions into the catalog via SQL are called for,
the new pg_nextoid() function can be used (which only works on catalog
tables).
The fact that oid columns on system tables are now normal columns
means that they will be included in the set of columns expanded
by * (i.e. SELECT * FROM pg_class will now include the table's oid,
previously it did not). It'd not technically be hard to hide the oid
column by default, but that'd mean confusing behavior would either
have to be carried forward forever, or it'd cause breakage down the
line.
While it's not unlikely that further adjustments are needed, the
scope/invasiveness of the patch makes it worthwhile to get this merged
now. It's painful to maintain externally, too complicated to commit
after the code freeze, and a dependency of a number of other
patches.
Catversion bump, for obvious reasons.
Author: Andres Freund, with contributions by John Naylor
Discussion: https://postgr.es/m/20180930034810.ywp2c7awz7opzcfr@alap3.anarazel.de
2018-11-21 00:36:57 +01:00
|
|
|
result = opclass->oid;
|
2001-08-21 18:36:06 +02:00
|
|
|
}
|
2000-04-25 04:45:54 +02:00
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2006-02-10 20:01:12 +01:00
|
|
|
systable_endscan(scan);
|
|
|
|
|
2019-01-21 19:32:19 +01:00
|
|
|
table_close(rel, AccessShareLock);
|
2006-02-10 20:01:12 +01:00
|
|
|
|
2006-12-23 01:43:13 +01:00
|
|
|
/* raise error if pg_opclass contains inconsistent data */
|
|
|
|
if (nexact > 1)
|
2003-07-20 23:56:35 +02:00
|
|
|
ereport(ERROR,
|
|
|
|
(errcode(ERRCODE_DUPLICATE_OBJECT),
|
|
|
|
errmsg("there are multiple default operator classes for data type %s",
|
2006-02-10 20:01:12 +01:00
|
|
|
format_type_be(type_id))));
|
2006-12-23 01:43:13 +01:00
|
|
|
|
|
|
|
if (nexact == 1 ||
|
|
|
|
ncompatiblepreferred == 1 ||
|
|
|
|
(ncompatiblepreferred == 0 && ncompatible == 1))
|
|
|
|
return result;
|
2000-11-16 23:30:52 +01:00
|
|
|
|
2001-08-21 18:36:06 +02:00
|
|
|
return InvalidOid;
|
1996-08-15 09:42:52 +02:00
|
|
|
}
|
|
|
|
|
2004-05-05 06:48:48 +02:00
|
|
|
/*
|
2004-06-10 19:56:03 +02:00
|
|
|
* makeObjectName()
|
|
|
|
*
|
Clone extended stats in CREATE TABLE (LIKE INCLUDING ALL)
The LIKE INCLUDING ALL clause to CREATE TABLE intuitively indicates
cloning of extended statistics on the source table, but it failed to do
so. Patch it up so that it does. Also include an INCLUDING STATISTICS
option to the LIKE clause, so that the behavior can be requested
individually, or excluded individually.
While at it, reorder the INCLUDING options, both in code and in docs, in
alphabetical order which makes more sense than feature-implementation
order that was previously used.
Backpatch this to Postgres 10, where extended statistics were
introduced, because this is seen as an oversight in a fresh feature
which is better to get consistent from the get-go instead of changing
only in pg11.
In pg11, comments on statistics objects are cloned too. In pg10 they
are not, because I (Álvaro) was too cowardly to change the parse node as
required to support it. Also, in pg10 I chose not to renumber the
parser symbols for the various INCLUDING options in LIKE, for the same
reason. Any corresponding user-visible changes (docs) are backpatched,
though.
Reported-by: Stephen Froehlich
Author: David Rowley
Reviewed-by: Álvaro Herrera, Tomas Vondra
Discussion: https://postgr.es/m/CY1PR0601MB1927315B45667A1B679D0FD5E5EF0@CY1PR0601MB1927.namprd06.prod.outlook.com
2018-03-05 23:37:19 +01:00
|
|
|
* Create a name for an implicitly created index, sequence, constraint,
|
|
|
|
* extended statistics, etc.
|
2004-06-10 19:56:03 +02:00
|
|
|
*
|
|
|
|
* The parameters are typically: the original table name, the original field
|
|
|
|
* name, and a "type" string (such as "seq" or "pkey"). The field name
|
|
|
|
* and/or type can be NULL if not relevant.
|
|
|
|
*
|
|
|
|
* The result is a palloc'd string.
|
|
|
|
*
|
|
|
|
* The basic result we want is "name1_name2_label", omitting "_name2" or
|
|
|
|
* "_label" when those parameters are NULL. However, we must generate
|
|
|
|
* a name with fewer than NAMEDATALEN characters! So, we truncate one or
|
|
|
|
* both names if necessary to make a short-enough string. The label part
|
|
|
|
* is never truncated (so it had better be reasonably short).
|
|
|
|
*
|
|
|
|
* The caller is responsible for checking uniqueness of the generated
|
|
|
|
* name and retrying as needed; retrying will be done by altering the
|
|
|
|
* "label" string (which is why we never truncate that part).
|
2004-05-05 06:48:48 +02:00
|
|
|
*/
|
2004-06-10 19:56:03 +02:00
|
|
|
char *
|
|
|
|
makeObjectName(const char *name1, const char *name2, const char *label)
|
2004-05-05 06:48:48 +02:00
|
|
|
{
|
2004-06-10 19:56:03 +02:00
|
|
|
char *name;
|
|
|
|
int overhead = 0; /* chars needed for label and underscores */
|
|
|
|
int availchars; /* chars available for name(s) */
|
|
|
|
int name1chars; /* chars allocated to name1 */
|
|
|
|
int name2chars; /* chars allocated to name2 */
|
|
|
|
int ndx;
|
|
|
|
|
|
|
|
name1chars = strlen(name1);
|
|
|
|
if (name2)
|
|
|
|
{
|
|
|
|
name2chars = strlen(name2);
|
|
|
|
overhead++; /* allow for separating underscore */
|
|
|
|
}
|
|
|
|
else
|
|
|
|
name2chars = 0;
|
|
|
|
if (label)
|
|
|
|
overhead += strlen(label) + 1;
|
|
|
|
|
|
|
|
availchars = NAMEDATALEN - 1 - overhead;
|
|
|
|
Assert(availchars > 0); /* else caller chose a bad label */
|
2004-05-05 06:48:48 +02:00
|
|
|
|
|
|
|
/*
|
2022-01-25 01:40:04 +01:00
|
|
|
* If we must truncate, preferentially truncate the longer name. This
|
2004-06-10 19:56:03 +02:00
|
|
|
* logic could be expressed without a loop, but it's simple and obvious as
|
|
|
|
* a loop.
|
2004-05-05 06:48:48 +02:00
|
|
|
*/
|
2004-06-10 19:56:03 +02:00
|
|
|
while (name1chars + name2chars > availchars)
|
|
|
|
{
|
|
|
|
if (name1chars > name2chars)
|
|
|
|
name1chars--;
|
|
|
|
else
|
|
|
|
name2chars--;
|
|
|
|
}
|
|
|
|
|
2005-06-21 02:35:05 +02:00
|
|
|
name1chars = pg_mbcliplen(name1, name1chars, name1chars);
|
2004-06-10 19:56:03 +02:00
|
|
|
if (name2)
|
|
|
|
name2chars = pg_mbcliplen(name2, name2chars, name2chars);
|
|
|
|
|
|
|
|
/* Now construct the string using the chosen lengths */
|
|
|
|
name = palloc(name1chars + name2chars + overhead + 1);
|
|
|
|
memcpy(name, name1, name1chars);
|
|
|
|
ndx = name1chars;
|
|
|
|
if (name2)
|
|
|
|
{
|
|
|
|
name[ndx++] = '_';
|
|
|
|
memcpy(name + ndx, name2, name2chars);
|
|
|
|
ndx += name2chars;
|
|
|
|
}
|
|
|
|
if (label)
|
|
|
|
{
|
|
|
|
name[ndx++] = '_';
|
|
|
|
strcpy(name + ndx, label);
|
|
|
|
}
|
|
|
|
else
|
|
|
|
name[ndx] = '\0';
|
|
|
|
|
|
|
|
return name;
|
|
|
|
}
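The doc comment above spells out the truncation rule that makeObjectName implements. The following is a minimal standalone sketch of that rule, not the backend function itself: it assumes NAMEDATALEN is 64 and a single-byte encoding (so the pg_mbcliplen step is a no-op), and `make_object_name` is a hypothetical helper name chosen to avoid clashing with the real one.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define NAMEDATALEN 64          /* PostgreSQL's default identifier limit */

/*
 * Build "name1_name2_label", truncating name1/name2 (never the label)
 * so the result fits in NAMEDATALEN - 1 characters.  Single-byte
 * encoding assumed, so no multibyte clipping is needed.
 */
static char *
make_object_name(const char *name1, const char *name2, const char *label)
{
    int overhead = 0;           /* chars needed for label and underscores */
    int name1chars = strlen(name1);
    int name2chars = 0;

    if (name2)
    {
        name2chars = strlen(name2);
        overhead++;             /* separating underscore */
    }
    if (label)
        overhead += strlen(label) + 1;

    int availchars = NAMEDATALEN - 1 - overhead;

    /* Preferentially truncate the longer of the two names. */
    while (name1chars + name2chars > availchars)
    {
        if (name1chars > name2chars)
            name1chars--;
        else
            name2chars--;
    }

    char *name = malloc(name1chars + name2chars + overhead + 1);
    int ndx = name1chars;

    memcpy(name, name1, name1chars);
    if (name2)
    {
        name[ndx++] = '_';
        memcpy(name + ndx, name2, name2chars);
        ndx += name2chars;
    }
    if (label)
    {
        name[ndx++] = '_';
        strcpy(name + ndx, label);
    }
    else
        name[ndx] = '\0';
    return name;
}
```

Truncating the longer name first keeps both components recognizable in generated names like "sometable_somecol_key", which is why the real code uses the same loop rather than chopping only one side.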
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Select a nonconflicting name for a new relation. This is ordinarily
|
|
|
|
* used to choose index names (which is why it's here) but it can also
|
|
|
|
* be used for sequences, or any autogenerated relation kind.
|
|
|
|
*
|
|
|
|
* name1, name2, and label are used the same way as for makeObjectName(),
|
|
|
|
* except that the label can't be NULL; digits will be appended to the label
|
|
|
|
* if needed to create a name that is unique within the specified namespace.
|
|
|
|
*
|
Fully enforce uniqueness of constraint names.
It's been true for a long time that we expect names of table and domain
constraints to be unique among the constraints of that table or domain.
However, the enforcement of that has been pretty haphazard, and it missed
some corner cases such as creating a CHECK constraint and then an index
constraint of the same name (as per recent report from André Hänsel).
Also, due to the lack of an actual unique index enforcing this, duplicates
could be created through race conditions.
Moreover, the code that searches pg_constraint has been quite inconsistent
about how to handle duplicate names if one did occur: some places checked
and threw errors if there was more than one match, while others just
processed the first match they came to.
To fix, create a unique index on (conrelid, contypid, conname). Since
either conrelid or contypid is zero, this will separately enforce
uniqueness of constraint names among constraints of any one table and any
one domain. (If we ever implement SQL assertions, and put them into this
catalog, more thought might be needed. But it'd be at least as reasonable
to put them into a new catalog; having overloaded this one catalog with
two kinds of constraints was a mistake already IMO.) This index can replace
the existing non-unique index on conrelid, though we need to keep the one
on contypid for query performance reasons.
Having done that, we can simplify the logic in various places that either
coped with duplicates or neglected to, as well as potentially improve
lookup performance when searching for a constraint by name.
Also, as per our usual practice, install a preliminary check so that you
get something more friendly than a unique-index violation report in the
case complained of by André. And teach ChooseIndexName to avoid choosing
autogenerated names that would draw such a failure.
While it's not possible to make such a change in the back branches,
it doesn't seem quite too late to put this into v11, so do so.
Discussion: https://postgr.es/m/0c1001d4428f$0942b430$1bc81c90$@webkr.de
2018-09-04 19:45:35 +02:00
|
|
|
* If isconstraint is true, we also avoid choosing a name matching any
|
|
|
|
* existing constraint in the same namespace. (This is stricter than what
|
|
|
|
* Postgres itself requires, but the SQL standard says that constraint names
|
|
|
|
* should be unique within schemas, so we follow that for autogenerated
|
|
|
|
* constraint names.)
|
|
|
|
*
|
2004-06-10 19:56:03 +02:00
|
|
|
* Note: it is theoretically possible to get a collision anyway, if someone
|
|
|
|
* else chooses the same name concurrently. This is fairly unlikely to be
|
|
|
|
* a problem in practice, especially if one is holding an exclusive lock on
|
|
|
|
* the relation identified by name1. However, if choosing multiple names
|
|
|
|
* within a single command, you'd better create the new object and do
|
|
|
|
* CommandCounterIncrement before choosing the next one!
|
|
|
|
*
|
|
|
|
* Returns a palloc'd string.
|
|
|
|
*/
|
|
|
|
char *
|
|
|
|
ChooseRelationName(const char *name1, const char *name2,
|
2018-09-04 19:45:35 +02:00
|
|
|
const char *label, Oid namespaceid,
|
|
|
|
bool isconstraint)
|
2004-06-10 19:56:03 +02:00
|
|
|
{
|
|
|
|
int pass = 0;
|
|
|
|
char *relname = NULL;
|
|
|
|
char modlabel[NAMEDATALEN];
|
|
|
|
|
|
|
|
/* try the unmodified label first */
|
2020-08-10 18:51:31 +02:00
|
|
|
strlcpy(modlabel, label, sizeof(modlabel));
|
2004-05-05 06:48:48 +02:00
|
|
|
|
|
|
|
for (;;)
|
|
|
|
{
|
2004-06-10 19:56:03 +02:00
|
|
|
relname = makeObjectName(name1, name2, modlabel);
|
2004-05-05 06:48:48 +02:00
|
|
|
|
2009-07-16 08:33:46 +02:00
|
|
|
if (!OidIsValid(get_relname_relid(relname, namespaceid)))
|
2018-09-04 19:45:35 +02:00
|
|
|
{
|
|
|
|
if (!isconstraint ||
|
|
|
|
!ConstraintNameExists(relname, namespaceid))
|
|
|
|
break;
|
|
|
|
}
|
2004-05-05 06:48:48 +02:00
|
|
|
|
|
|
|
/* found a conflict, so try a new name component */
|
2004-06-10 19:56:03 +02:00
|
|
|
pfree(relname);
|
|
|
|
snprintf(modlabel, sizeof(modlabel), "%s%d", label, ++pass);
|
2004-05-05 06:48:48 +02:00
|
|
|
}
|
|
|
|
|
2004-06-10 19:56:03 +02:00
|
|
|
return relname;
|
2004-05-05 06:48:48 +02:00
|
|
|
}
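ChooseRelationName's collision-retry loop can be sketched in isolation. This is a hedged sketch under stated assumptions, not the backend code: the `exists` callback stands in for the real catalog probes (get_relname_relid and, for constraints, ConstraintNameExists), and plain concatenation stands in for makeObjectName's truncation.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define NAMEDATALEN 64

/*
 * Try "name1_label" first; on conflict, retry with "name1_label1",
 * "name1_label2", ... until the candidate is free.  The caller supplies
 * the conflict check via the exists callback.
 */
static char *
choose_name(const char *name1, const char *label,
            int (*exists)(const char *name))
{
    char modlabel[NAMEDATALEN];
    int  pass = 0;

    /* try the unmodified label first */
    snprintf(modlabel, sizeof(modlabel), "%s", label);

    for (;;)
    {
        char candidate[2 * NAMEDATALEN];

        snprintf(candidate, sizeof(candidate), "%s_%s", name1, modlabel);
        if (!exists(candidate))
        {
            char *result = malloc(strlen(candidate) + 1);

            strcpy(result, candidate);
            return result;
        }

        /* found a conflict, so try a new label suffix */
        snprintf(modlabel, sizeof(modlabel), "%s%d", label, ++pass);
    }
}
```

As the comment in the original notes, this is only probabilistically safe: a concurrent session can still pick the same name between the probe and the creation, which is why callers are expected to hold a lock on the relation and do CommandCounterIncrement between successive choices.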
|
|
|
|
|
Adjust naming of indexes and their columns per recent discussion.
Index expression columns are now named after the FigureColname result for
their expressions, rather than always being "pg_expression_N". Digits are
appended to this name if needed to make the column name unique within the
index. (That happens for regular columns too, thus fixing the old problem
that CREATE INDEX fooi ON foo (f1, f1) fails. Before exclusion indexes
there was no real reason to do such a thing, but now maybe there is.)
Default names for indexes and associated constraints now include the column
names of all their columns, not only the first one as in previous practice.
(Of course, this will be truncated as needed to fit in NAMEDATALEN. Also,
pkey indexes retain the historical behavior of not naming specific columns
at all.)
An example of the results:
regression=# create table foo (f1 int, f2 text,
regression(# exclude (f1 with =, lower(f2) with =));
NOTICE: CREATE TABLE / EXCLUDE will create implicit index "foo_f1_lower_exclusion" for table "foo"
CREATE TABLE
regression=# \d foo_f1_lower_exclusion
Index "public.foo_f1_lower_exclusion"
Column | Type | Definition
--------+---------+------------
f1 | integer | f1
lower | text | lower(f2)
btree, for table "public.foo"
2009-12-23 03:35:25 +01:00
|
|
|
/*
|
|
|
|
* Select the name to be used for an index.
|
|
|
|
*
|
|
|
|
* The argument list is pretty ad-hoc :-(
|
|
|
|
*/
|
Avoid pre-determining index names during CREATE TABLE LIKE parsing.
Formerly, when trying to copy both indexes and comments, CREATE TABLE LIKE
had to pre-assign names to indexes that had comments, because it made up an
explicit CommentStmt command to apply the comment and so it had to know the
name for the index. This creates bad interactions with other indexes, as
shown in bug #6734 from Daniele Varrazzo: the preassignment logic couldn't
take any other indexes into account so it could choose a conflicting name.
To fix, add a field to IndexStmt that allows it to carry a comment to be
assigned to the new index. (This isn't a user-exposed feature of CREATE
INDEX, only an internal option.) Now we don't need preassignment of index
names in any situation.
I also took the opportunity to refactor DefineIndex to accept the IndexStmt
as such, rather than passing all its fields individually in a mile-long
parameter list.
Back-patch to 9.2, but no further, because it seems too dangerous to change
IndexStmt or DefineIndex's API in released branches. The bug exists back
to 9.0 where CREATE TABLE LIKE grew the ability to copy comments, but given
the lack of prior complaints we'll just let it go unfixed before 9.2.
2012-07-16 19:25:18 +02:00
|
|
|
static char *
|
2009-12-23 03:35:25 +01:00
|
|
|
ChooseIndexName(const char *tabname, Oid namespaceId,
|
|
|
|
List *colnames, List *exclusionOpNames,
|
|
|
|
bool primary, bool isconstraint)
|
|
|
|
{
|
|
|
|
char *indexname;
|
|
|
|
|
|
|
|
if (primary)
|
|
|
|
{
|
|
|
|
/* the primary key's name does not depend on the specific column(s) */
|
|
|
|
indexname = ChooseRelationName(tabname,
|
|
|
|
NULL,
|
|
|
|
"pkey",
|
2018-09-04 19:45:35 +02:00
|
|
|
namespaceId,
|
|
|
|
true);
|
2009-12-23 03:35:25 +01:00
|
|
|
}
|
|
|
|
else if (exclusionOpNames != NIL)
|
|
|
|
{
|
|
|
|
indexname = ChooseRelationName(tabname,
|
|
|
|
ChooseIndexNameAddition(colnames),
|
2010-03-22 16:24:11 +01:00
|
|
|
"excl",
|
2018-09-04 19:45:35 +02:00
|
|
|
namespaceId,
|
|
|
|
true);
|
2009-12-23 03:35:25 +01:00
|
|
|
}
|
|
|
|
else if (isconstraint)
|
|
|
|
{
|
|
|
|
indexname = ChooseRelationName(tabname,
|
|
|
|
ChooseIndexNameAddition(colnames),
|
|
|
|
"key",
|
2018-09-04 19:45:35 +02:00
|
|
|
namespaceId,
|
|
|
|
true);
|
Adjust naming of indexes and their columns per recent discussion.
Index expression columns are now named after the FigureColname result for
their expressions, rather than always being "pg_expression_N". Digits are
appended to this name if needed to make the column name unique within the
index. (That happens for regular columns too, thus fixing the old problem
that CREATE INDEX fooi ON foo (f1, f1) fails. Before exclusion indexes
there was no real reason to do such a thing, but now maybe there is.)
Default names for indexes and associated constraints now include the column
names of all their columns, not only the first one as in previous practice.
(Of course, this will be truncated as needed to fit in NAMEDATALEN. Also,
pkey indexes retain the historical behavior of not naming specific columns
at all.)
An example of the results:
regression=# create table foo (f1 int, f2 text,
regression(# exclude (f1 with =, lower(f2) with =));
NOTICE: CREATE TABLE / EXCLUDE will create implicit index "foo_f1_lower_exclusion" for table "foo"
CREATE TABLE
regression=# \d foo_f1_lower_exclusion
Index "public.foo_f1_lower_exclusion"
Column | Type | Definition
--------+---------+------------
f1 | integer | f1
lower | text | lower(f2)
btree, for table "public.foo"
2009-12-23 03:35:25 +01:00
|
|
|
}
|
|
|
|
else
|
|
|
|
{
|
|
|
|
indexname = ChooseRelationName(tabname,
|
|
|
|
ChooseIndexNameAddition(colnames),
|
|
|
|
"idx",
|
Fully enforce uniqueness of constraint names.
It's been true for a long time that we expect names of table and domain
constraints to be unique among the constraints of that table or domain.
However, the enforcement of that has been pretty haphazard, and it missed
some corner cases such as creating a CHECK constraint and then an index
constraint of the same name (as per recent report from André Hänsel).
Also, due to the lack of an actual unique index enforcing this, duplicates
could be created through race conditions.
Moreover, the code that searches pg_constraint has been quite inconsistent
about how to handle duplicate names if one did occur: some places checked
and threw errors if there was more than one match, while others just
processed the first match they came to.
To fix, create a unique index on (conrelid, contypid, conname). Since
either conrelid or contypid is zero, this will separately enforce
uniqueness of constraint names among constraints of any one table and any
one domain. (If we ever implement SQL assertions, and put them into this
catalog, more thought might be needed. But it'd be at least as reasonable
to put them into a new catalog; having overloaded this one catalog with
two kinds of constraints was a mistake already IMO.) This index can replace
the existing non-unique index on conrelid, though we need to keep the one
on contypid for query performance reasons.
Having done that, we can simplify the logic in various places that either
coped with duplicates or neglected to, as well as potentially improve
lookup performance when searching for a constraint by name.
Also, as per our usual practice, install a preliminary check so that you
get something more friendly than a unique-index violation report in the
case complained of by André. And teach ChooseIndexName to avoid choosing
autogenerated names that would draw such a failure.
While it's not possible to make such a change in the back branches,
it doesn't seem quite too late to put this into v11, so do so.
Discussion: https://postgr.es/m/0c1001d4428f$0942b430$1bc81c90$@webkr.de
2018-09-04 19:45:35 +02:00
|
|
|
namespaceId,
|
|
|
|
false);
|
Adjust naming of indexes and their columns per recent discussion.
Index expression columns are now named after the FigureColname result for
their expressions, rather than always being "pg_expression_N". Digits are
appended to this name if needed to make the column name unique within the
index. (That happens for regular columns too, thus fixing the old problem
that CREATE INDEX fooi ON foo (f1, f1) fails. Before exclusion indexes
there was no real reason to do such a thing, but now maybe there is.)
Default names for indexes and associated constraints now include the column
names of all their columns, not only the first one as in previous practice.
(Of course, this will be truncated as needed to fit in NAMEDATALEN. Also,
pkey indexes retain the historical behavior of not naming specific columns
at all.)
An example of the results:
regression=# create table foo (f1 int, f2 text,
regression(# exclude (f1 with =, lower(f2) with =));
NOTICE: CREATE TABLE / EXCLUDE will create implicit index "foo_f1_lower_exclusion" for table "foo"
CREATE TABLE
regression=# \d foo_f1_lower_exclusion
Index "public.foo_f1_lower_exclusion"
Column | Type | Definition
--------+---------+------------
f1 | integer | f1
lower | text | lower(f2)
btree, for table "public.foo"
2009-12-23 03:35:25 +01:00
|
|
|
}
|
|
|
|
|
|
|
|
return indexname;
|
|
|
|
}

/*
 * Generate "name2" for a new index given the list of column names for it
 * (as produced by ChooseIndexColumnNames).  This will be passed to
 * ChooseRelationName along with the parent table name and a suitable label.
 *
 * We know that less than NAMEDATALEN characters will actually be used,
 * so we can truncate the result once we've generated that many.
 *
 * XXX See also ChooseForeignKeyConstraintNameAddition and
 * ChooseExtendedStatisticNameAddition.
 */
static char *
ChooseIndexNameAddition(List *colnames)
{
	char		buf[NAMEDATALEN * 2];
	int			buflen = 0;
	ListCell   *lc;

	buf[0] = '\0';
	foreach(lc, colnames)
	{
		const char *name = (const char *) lfirst(lc);

		if (buflen > 0)
			buf[buflen++] = '_';	/* insert _ between names */

		/*
		 * At this point we have buflen <= NAMEDATALEN.  name should be less
		 * than NAMEDATALEN already, but use strlcpy for paranoia.
		 */
		strlcpy(buf + buflen, name, NAMEDATALEN);
		buflen += strlen(buf + buflen);
		if (buflen >= NAMEDATALEN)
			break;
	}
	return pstrdup(buf);
}

/*
 * Select the actual names to be used for the columns of an index, given the
 * list of IndexElems for the columns.  This is mostly about ensuring the
 * names are unique so we don't get a conflicting-attribute-names error.
 *
 * Returns a List of plain strings (char *, not String nodes).
 */
static List *
ChooseIndexColumnNames(List *indexElems)
{
	List	   *result = NIL;
	ListCell   *lc;

	foreach(lc, indexElems)
	{
		IndexElem  *ielem = (IndexElem *) lfirst(lc);
		const char *origname;
		const char *curname;
		int			i;
		char		buf[NAMEDATALEN];

		/* Get the preliminary name from the IndexElem */
		if (ielem->indexcolname)
			origname = ielem->indexcolname; /* caller-specified name */
		else if (ielem->name)
			origname = ielem->name; /* simple column reference */
		else
			origname = "expr";	/* default name for expression */

		/* If it conflicts with any previous column, tweak it */
		curname = origname;
		for (i = 1;; i++)
		{
			ListCell   *lc2;
			char		nbuf[32];
			int			nlen;

			foreach(lc2, result)
			{
				if (strcmp(curname, (char *) lfirst(lc2)) == 0)
					break;
			}
			if (lc2 == NULL)
				break;			/* found nonconflicting name */

			sprintf(nbuf, "%d", i);

			/* Ensure generated names are shorter than NAMEDATALEN */
			nlen = pg_mbcliplen(origname, strlen(origname),
								NAMEDATALEN - 1 - strlen(nbuf));
			memcpy(buf, origname, nlen);
			strcpy(buf + nlen, nbuf);
			curname = buf;
		}

		/* And attach to the result list */
		result = lappend(result, pstrdup(curname));
	}
	return result;
}

/*
 * ExecReindex
 *
 * Primary entry point for manual REINDEX commands.  This is mainly a
 * preparation wrapper for the real operations that will happen in
 * each subroutine of REINDEX.
 */
void
ExecReindex(ParseState *pstate, ReindexStmt *stmt, bool isTopLevel)
{
	ReindexParams params = {0};
	ListCell   *lc;
	bool		concurrently = false;
	bool		verbose = false;
	char	   *tablespacename = NULL;

	/* Parse option list */
	foreach(lc, stmt->params)
	{
		DefElem    *opt = (DefElem *) lfirst(lc);

		if (strcmp(opt->defname, "verbose") == 0)
			verbose = defGetBoolean(opt);
		else if (strcmp(opt->defname, "concurrently") == 0)
			concurrently = defGetBoolean(opt);
		else if (strcmp(opt->defname, "tablespace") == 0)
			tablespacename = defGetString(opt);
		else
			ereport(ERROR,
					(errcode(ERRCODE_SYNTAX_ERROR),
					 errmsg("unrecognized REINDEX option \"%s\"",
							opt->defname),
					 parser_errposition(pstate, opt->location)));
	}

	if (concurrently)
		PreventInTransactionBlock(isTopLevel,
								  "REINDEX CONCURRENTLY");

	params.options =
		(verbose ? REINDEXOPT_VERBOSE : 0) |
		(concurrently ? REINDEXOPT_CONCURRENTLY : 0);

	/*
	 * Assign the tablespace OID to move indexes to, with InvalidOid to do
	 * nothing.
	 */
	if (tablespacename != NULL)
	{
		params.tablespaceOid = get_tablespace_oid(tablespacename, false);

		/* Check permissions except when moving to database's default */
		if (OidIsValid(params.tablespaceOid) &&
			params.tablespaceOid != MyDatabaseTableSpace)
		{
			AclResult	aclresult;

			aclresult = pg_tablespace_aclcheck(params.tablespaceOid,
											   GetUserId(), ACL_CREATE);
			if (aclresult != ACLCHECK_OK)
				aclcheck_error(aclresult, OBJECT_TABLESPACE,
							   get_tablespace_name(params.tablespaceOid));
		}
	}
	else
		params.tablespaceOid = InvalidOid;

	switch (stmt->kind)
	{
		case REINDEX_OBJECT_INDEX:
			ReindexIndex(stmt->relation, &params, isTopLevel);
			break;
		case REINDEX_OBJECT_TABLE:
			ReindexTable(stmt->relation, &params, isTopLevel);
			break;
		case REINDEX_OBJECT_SCHEMA:
		case REINDEX_OBJECT_SYSTEM:
		case REINDEX_OBJECT_DATABASE:

			/*
			 * This cannot run inside a user transaction block; if we were
			 * inside a transaction, then its commit- and
			 * start-transaction-command calls would not have the intended
			 * effect!
			 */
			PreventInTransactionBlock(isTopLevel,
									  (stmt->kind == REINDEX_OBJECT_SCHEMA) ? "REINDEX SCHEMA" :
									  (stmt->kind == REINDEX_OBJECT_SYSTEM) ? "REINDEX SYSTEM" :
									  "REINDEX DATABASE");
			ReindexMultipleTables(stmt->name, stmt->kind, &params);
			break;
		default:
			elog(ERROR, "unrecognized object type: %d",
				 (int) stmt->kind);
			break;
	}
}

/*
 * ReindexIndex
 *		Recreate a specific index.
 */
static void
ReindexIndex(RangeVar *indexRelation, ReindexParams *params, bool isTopLevel)
{
	struct ReindexIndexCallbackState state;
	Oid			indOid;
	char		persistence;
	char		relkind;

	/*
	 * Find and lock index, and check permissions on table; use callback to
	 * obtain lock on table first, to avoid deadlock hazard.  The lock level
	 * used here must match the index lock obtained in reindex_index().
	 *
	 * If it's a temporary index, we will perform a non-concurrent reindex,
	 * even if CONCURRENTLY was requested.  In that case, reindex_index()
	 * will upgrade the lock, but that's OK, because other sessions can't
	 * hold locks on our temporary table.
	 */
	state.params = *params;
	state.locked_table_oid = InvalidOid;
	indOid = RangeVarGetRelidExtended(indexRelation,
									  (params->options & REINDEXOPT_CONCURRENTLY) != 0 ?
									  ShareUpdateExclusiveLock : AccessExclusiveLock,
									  0,
									  RangeVarCallbackForReindexIndex,
									  &state);

	/*
	 * Obtain the current persistence and kind of the existing index.  We
	 * already hold a lock on the index.
	 */
	persistence = get_rel_persistence(indOid);
	relkind = get_rel_relkind(indOid);
leaf partition reindexed. However, this is rather limited as LOCK would
cause REINDEX to block in the first transaction building the list of
partitions.
Per its multi-transaction nature, this new flavor cannot run in a
transaction block, similarly to REINDEX SCHEMA, SYSTEM and DATABASE.
Author: Justin Pryzby, Michael Paquier
Reviewed-by: Anastasia Lubennikova
Discussion: https://postgr.es/m/db12e897-73ff-467e-94cb-4af03705435f.adger.lj@alibaba-inc.com
2020-09-08 03:09:22 +02:00
|
|
|
if (relkind == RELKIND_PARTITIONED_INDEX)
|
2021-01-18 06:03:10 +01:00
|
|
|
ReindexPartitions(indOid, params, isTopLevel);
|
|
|
|
else if ((params->options & REINDEXOPT_CONCURRENTLY) != 0 &&
|
Add support for partitioned tables and indexes in REINDEX
Until now, REINDEX was not able to work with partitioned tables and
indexes, forcing users to reindex partitions one by one. This extends
REINDEX INDEX and REINDEX TABLE so as they can accept a partitioned
index and table in input, respectively, to reindex all the partitions
assigned to them with physical storage (foreign tables, partitioned
tables and indexes are then discarded).
This shares some logic with schema and database REINDEX as each
partition gets processed in its own transaction after building a list of
relations to work on. This choice has the advantage to minimize the
number of invalid indexes to one partition with REINDEX CONCURRENTLY in
the event a cancellation or failure in-flight, as the only indexes
handled at once in a single REINDEX CONCURRENTLY loop are the ones from
the partition being working on.
Isolation tests are added to emulate some cases I bumped into while
developing this feature, particularly with the concurrent drop of a
leaf partition reindexed. However, this is rather limited as LOCK would
cause REINDEX to block in the first transaction building the list of
partitions.
Per its multi-transaction nature, this new flavor cannot run in a
transaction block, similarly to REINDEX SCHEMA, SYSTEM and DATABASE.
Author: Justin Pryzby, Michael Paquier
Reviewed-by: Anastasia Lubennikova
Discussion: https://postgr.es/m/db12e897-73ff-467e-94cb-4af03705435f.adger.lj@alibaba-inc.com
2020-09-08 03:09:22 +02:00
|
|
|
persistence != RELPERSISTENCE_TEMP)
|
2021-01-18 06:03:10 +01:00
|
|
|
ReindexRelationConcurrently(indOid, params);
|
2019-03-29 08:25:20 +01:00
|
|
|
else
|
2021-01-18 06:03:10 +01:00
|
|
|
{
|
|
|
|
ReindexParams newparams = *params;
|
|
|
|
|
|
|
|
newparams.options |= REINDEXOPT_REPORT_PROGRESS;
|
|
|
|
reindex_index(indOid, false, persistence, &newparams);
|
|
|
|
}
|
Improve table locking behavior in the face of current DDL.
In the previous coding, callers were faced with an awkward choice:
look up the name, do permissions checks, and then lock the table; or
look up the name, lock the table, and then do permissions checks.
The first choice was wrong because the results of the name lookup
and permissions checks might be out-of-date by the time the table
lock was acquired, while the second allowed a user with no privileges
to interfere with access to a table by users who do have privileges
(e.g. if a malicious backend queues up for an AccessExclusiveLock on
a table on which AccessShareLock is already held, further attempts
to access the table will be blocked until the AccessExclusiveLock
is obtained and the malicious backend's transaction rolls back).
To fix, allow callers of RangeVarGetRelid() to pass a callback which
gets executed after performing the name lookup but before acquiring
the relation lock. If the name lookup is retried (because
invalidation messages are received), the callback will be re-executed
as well, so we get the best of both worlds. RangeVarGetRelid() is
renamed to RangeVarGetRelidExtended(); callers not wishing to supply
a callback can continue to invoke it as RangeVarGetRelid(), which is
now a macro. Since the only one caller that uses nowait = true now
passes a callback anyway, the RangeVarGetRelid() macro defaults nowait
as well. The callback can also be used for supplemental locking - for
example, REINDEX INDEX needs to acquire the table lock before the index
lock to reduce deadlock possibilities.
There's a lot more work to be done here to fix all the cases where this
can be a problem, but this commit provides the general infrastructure
and fixes the following specific cases: REINDEX INDEX, REINDEX TABLE,
LOCK TABLE, and and DROP TABLE/INDEX/SEQUENCE/VIEW/FOREIGN TABLE.
Per discussion with Noah Misch and Alvaro Herrera.
2011-11-30 16:12:27 +01:00
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Check permissions on table before acquiring relation lock; also lock
|
|
|
|
* the heap before the RangeVarGetRelidExtended takes the index lock, to avoid
|
|
|
|
* deadlocks.
|
|
|
|
*/
|
|
|
|
static void
|
|
|
|
RangeVarCallbackForReindexIndex(const RangeVar *relation,
|
|
|
|
Oid relId, Oid oldRelId, void *arg)
|
|
|
|
{
|
|
|
|
char relkind;
|
2019-05-08 14:15:01 +02:00
|
|
|
struct ReindexIndexCallbackState *state = arg;
|
|
|
|
LOCKMODE table_lockmode;
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Lock level here should match table lock in reindex_index() for
|
|
|
|
* non-concurrent case and table locks used by index_concurrently_*() for
|
|
|
|
* concurrent case.
|
|
|
|
*/
|
2021-01-18 06:03:10 +01:00
|
|
|
table_lockmode = (state->params.options & REINDEXOPT_CONCURRENTLY) != 0 ?
|
2020-09-04 03:36:35 +02:00
|
|
|
ShareUpdateExclusiveLock : ShareLock;
|
2000-11-08 23:10:03 +01:00
|
|
|
|
2011-07-09 04:19:30 +02:00
|
|
|
/*
|
Improve table locking behavior in the face of current DDL.
In the previous coding, callers were faced with an awkward choice:
look up the name, do permissions checks, and then lock the table; or
look up the name, lock the table, and then do permissions checks.
The first choice was wrong because the results of the name lookup
and permissions checks might be out-of-date by the time the table
lock was acquired, while the second allowed a user with no privileges
to interfere with access to a table by users who do have privileges
(e.g. if a malicious backend queues up for an AccessExclusiveLock on
a table on which AccessShareLock is already held, further attempts
to access the table will be blocked until the AccessExclusiveLock
is obtained and the malicious backend's transaction rolls back).
To fix, allow callers of RangeVarGetRelid() to pass a callback which
gets executed after performing the name lookup but before acquiring
the relation lock. If the name lookup is retried (because
invalidation messages are received), the callback will be re-executed
as well, so we get the best of both worlds. RangeVarGetRelid() is
renamed to RangeVarGetRelidExtended(); callers not wishing to supply
a callback can continue to invoke it as RangeVarGetRelid(), which is
now a macro. Since the only one caller that uses nowait = true now
passes a callback anyway, the RangeVarGetRelid() macro defaults nowait
as well. The callback can also be used for supplemental locking - for
example, REINDEX INDEX needs to acquire the table lock before the index
lock to reduce deadlock possibilities.
There's a lot more work to be done here to fix all the cases where this
can be a problem, but this commit provides the general infrastructure
and fixes the following specific cases: REINDEX INDEX, REINDEX TABLE,
LOCK TABLE, and and DROP TABLE/INDEX/SEQUENCE/VIEW/FOREIGN TABLE.
Per discussion with Noah Misch and Alvaro Herrera.
2011-11-30 16:12:27 +01:00
|
|
|
* If we previously locked some other index's heap, and the name we're
|
|
|
|
* looking up no longer refers to that relation, release the now-useless
|
|
|
|
* lock.
|
2011-07-09 04:19:30 +02:00
|
|
|
*/
|
Improve table locking behavior in the face of current DDL.
In the previous coding, callers were faced with an awkward choice:
look up the name, do permissions checks, and then lock the table; or
look up the name, lock the table, and then do permissions checks.
The first choice was wrong because the results of the name lookup
and permissions checks might be out-of-date by the time the table
lock was acquired, while the second allowed a user with no privileges
to interfere with access to a table by users who do have privileges
(e.g. if a malicious backend queues up for an AccessExclusiveLock on
a table on which AccessShareLock is already held, further attempts
to access the table will be blocked until the AccessExclusiveLock
is obtained and the malicious backend's transaction rolls back).
To fix, allow callers of RangeVarGetRelid() to pass a callback which
gets executed after performing the name lookup but before acquiring
the relation lock. If the name lookup is retried (because
invalidation messages are received), the callback will be re-executed
as well, so we get the best of both worlds. RangeVarGetRelid() is
renamed to RangeVarGetRelidExtended(); callers not wishing to supply
a callback can continue to invoke it as RangeVarGetRelid(), which is
now a macro. Since the only one caller that uses nowait = true now
passes a callback anyway, the RangeVarGetRelid() macro defaults nowait
as well. The callback can also be used for supplemental locking - for
example, REINDEX INDEX needs to acquire the table lock before the index
lock to reduce deadlock possibilities.
There's a lot more work to be done here to fix all the cases where this
can be a problem, but this commit provides the general infrastructure
and fixes the following specific cases: REINDEX INDEX, REINDEX TABLE,
LOCK TABLE, and and DROP TABLE/INDEX/SEQUENCE/VIEW/FOREIGN TABLE.
Per discussion with Noah Misch and Alvaro Herrera.
2011-11-30 16:12:27 +01:00
|
|
|
if (relId != oldRelId && OidIsValid(oldRelId))
|
|
|
|
{
|
2019-05-08 14:15:01 +02:00
|
|
|
UnlockRelationOid(state->locked_table_oid, table_lockmode);
|
|
|
|
state->locked_table_oid = InvalidOid;
|
Improve table locking behavior in the face of current DDL.
In the previous coding, callers were faced with an awkward choice:
look up the name, do permissions checks, and then lock the table; or
look up the name, lock the table, and then do permissions checks.
The first choice was wrong because the results of the name lookup
and permissions checks might be out-of-date by the time the table
lock was acquired, while the second allowed a user with no privileges
to interfere with access to a table by users who do have privileges
(e.g. if a malicious backend queues up for an AccessExclusiveLock on
a table on which AccessShareLock is already held, further attempts
to access the table will be blocked until the AccessExclusiveLock
is obtained and the malicious backend's transaction rolls back).
To fix, allow callers of RangeVarGetRelid() to pass a callback which
gets executed after performing the name lookup but before acquiring
the relation lock. If the name lookup is retried (because
invalidation messages are received), the callback will be re-executed
as well, so we get the best of both worlds. RangeVarGetRelid() is
renamed to RangeVarGetRelidExtended(); callers not wishing to supply
a callback can continue to invoke it as RangeVarGetRelid(), which is
now a macro. Since the only one caller that uses nowait = true now
passes a callback anyway, the RangeVarGetRelid() macro defaults nowait
as well. The callback can also be used for supplemental locking - for
example, REINDEX INDEX needs to acquire the table lock before the index
lock to reduce deadlock possibilities.
There's a lot more work to be done here to fix all the cases where this
can be a problem, but this commit provides the general infrastructure
and fixes the following specific cases: REINDEX INDEX, REINDEX TABLE,
LOCK TABLE, and and DROP TABLE/INDEX/SEQUENCE/VIEW/FOREIGN TABLE.
Per discussion with Noah Misch and Alvaro Herrera.
2011-11-30 16:12:27 +01:00
|
|
|
}
|
|
|
|
|
|
|
|
/* If the relation does not exist, there's nothing more to do. */
|
|
|
|
if (!OidIsValid(relId))
|
|
|
|
return;
|
2000-02-18 10:30:20 +01:00
|
|
|
|
Improve table locking behavior in the face of current DDL.
In the previous coding, callers were faced with an awkward choice:
look up the name, do permissions checks, and then lock the table; or
look up the name, lock the table, and then do permissions checks.
The first choice was wrong because the results of the name lookup
and permissions checks might be out-of-date by the time the table
lock was acquired, while the second allowed a user with no privileges
to interfere with access to a table by users who do have privileges
(e.g. if a malicious backend queues up for an AccessExclusiveLock on
a table on which AccessShareLock is already held, further attempts
to access the table will be blocked until the AccessExclusiveLock
is obtained and the malicious backend's transaction rolls back).
To fix, allow callers of RangeVarGetRelid() to pass a callback which
gets executed after performing the name lookup but before acquiring
the relation lock. If the name lookup is retried (because
invalidation messages are received), the callback will be re-executed
as well, so we get the best of both worlds. RangeVarGetRelid() is
renamed to RangeVarGetRelidExtended(); callers not wishing to supply
a callback can continue to invoke it as RangeVarGetRelid(), which is
now a macro. Since the only one caller that uses nowait = true now
passes a callback anyway, the RangeVarGetRelid() macro defaults nowait
as well. The callback can also be used for supplemental locking - for
example, REINDEX INDEX needs to acquire the table lock before the index
lock to reduce deadlock possibilities.
There's a lot more work to be done here to fix all the cases where this
can be a problem, but this commit provides the general infrastructure
and fixes the following specific cases: REINDEX INDEX, REINDEX TABLE,
LOCK TABLE, and and DROP TABLE/INDEX/SEQUENCE/VIEW/FOREIGN TABLE.
Per discussion with Noah Misch and Alvaro Herrera.
2011-11-30 16:12:27 +01:00
|
|
|
/*
|
|
|
|
* If the relation does exist, check whether it's an index. But note that
|
|
|
|
* the relation might have been dropped between the time we did the name
|
|
|
|
* lookup and now. In that case, there's nothing to do.
|
|
|
|
*/
|
|
|
|
relkind = get_rel_relkind(relId);
|
|
|
|
if (!relkind)
|
|
|
|
return;
|
Local partitioned indexes
When CREATE INDEX is run on a partitioned table, create catalog entries
for an index on the partitioned table (which is just a placeholder since
the table proper has no data of its own), and recurse to create actual
indexes on the existing partitions; create them in future partitions
also.
As a convenience gadget, if the new index definition matches some
existing index in partitions, these are picked up and used instead of
creating new ones. Whichever way these indexes come about, they become
attached to the index on the parent table and are dropped alongside it,
and cannot be dropped on isolation unless they are detached first.
To support pg_dump'ing these indexes, add commands
CREATE INDEX ON ONLY <table>
(which creates the index on the parent partitioned table, without
recursing) and
ALTER INDEX ATTACH PARTITION
(which is used after the indexes have been created individually on each
partition, to attach them to the parent index). These reconstruct prior
database state exactly.
Reviewed-by: (in alphabetical order) Peter Eisentraut, Robert Haas, Amit
Langote, Jesper Pedersen, Simon Riggs, David Rowley
Discussion: https://postgr.es/m/20171113170646.gzweigyrgg6pwsg4@alvherre.pgsql
2018-01-19 15:49:22 +01:00
|
|
|
if (relkind != RELKIND_INDEX &&
|
|
|
|
relkind != RELKIND_PARTITIONED_INDEX)
|
2003-07-20 23:56:35 +02:00
|
|
|
ereport(ERROR,
|
|
|
|
(errcode(ERRCODE_WRONG_OBJECT_TYPE),
|
Improve table locking behavior in the face of current DDL.
In the previous coding, callers were faced with an awkward choice:
look up the name, do permissions checks, and then lock the table; or
look up the name, lock the table, and then do permissions checks.
The first choice was wrong because the results of the name lookup
and permissions checks might be out-of-date by the time the table
lock was acquired, while the second allowed a user with no privileges
to interfere with access to a table by users who do have privileges
(e.g. if a malicious backend queues up for an AccessExclusiveLock on
a table on which AccessShareLock is already held, further attempts
to access the table will be blocked until the AccessExclusiveLock
is obtained and the malicious backend's transaction rolls back).
To fix, allow callers of RangeVarGetRelid() to pass a callback which
gets executed after performing the name lookup but before acquiring
the relation lock. If the name lookup is retried (because
invalidation messages are received), the callback will be re-executed
as well, so we get the best of both worlds. RangeVarGetRelid() is
renamed to RangeVarGetRelidExtended(); callers not wishing to supply
a callback can continue to invoke it as RangeVarGetRelid(), which is
now a macro. Since the only one caller that uses nowait = true now
passes a callback anyway, the RangeVarGetRelid() macro defaults nowait
as well. The callback can also be used for supplemental locking - for
example, REINDEX INDEX needs to acquire the table lock before the index
lock to reduce deadlock possibilities.
There's a lot more work to be done here to fix all the cases where this
can be a problem, but this commit provides the general infrastructure
and fixes the following specific cases: REINDEX INDEX, REINDEX TABLE,
LOCK TABLE, and and DROP TABLE/INDEX/SEQUENCE/VIEW/FOREIGN TABLE.
Per discussion with Noah Misch and Alvaro Herrera.
2011-11-30 16:12:27 +01:00
|
|
|
errmsg("\"%s\" is not an index", relation->relname)));
|
2002-03-26 20:17:02 +01:00
|
|
|
|
2003-09-24 20:54:02 +02:00
|
|
|
/* Check permissions */
|
Improve table locking behavior in the face of current DDL.
In the previous coding, callers were faced with an awkward choice:
look up the name, do permissions checks, and then lock the table; or
look up the name, lock the table, and then do permissions checks.
The first choice was wrong because the results of the name lookup
and permissions checks might be out-of-date by the time the table
lock was acquired, while the second allowed a user with no privileges
to interfere with access to a table by users who do have privileges
(e.g. if a malicious backend queues up for an AccessExclusiveLock on
a table on which AccessShareLock is already held, further attempts
to access the table will be blocked until the AccessExclusiveLock
is obtained and the malicious backend's transaction rolls back).
To fix, allow callers of RangeVarGetRelid() to pass a callback which
gets executed after performing the name lookup but before acquiring
the relation lock. If the name lookup is retried (because
invalidation messages are received), the callback will be re-executed
as well, so we get the best of both worlds. RangeVarGetRelid() is
renamed to RangeVarGetRelidExtended(); callers not wishing to supply
a callback can continue to invoke it as RangeVarGetRelid(), which is
now a macro. Since the only one caller that uses nowait = true now
passes a callback anyway, the RangeVarGetRelid() macro defaults nowait
as well. The callback can also be used for supplemental locking - for
example, REINDEX INDEX needs to acquire the table lock before the index
lock to reduce deadlock possibilities.
There's a lot more work to be done here to fix all the cases where this
can be a problem, but this commit provides the general infrastructure
and fixes the following specific cases: REINDEX INDEX, REINDEX TABLE,
LOCK TABLE, and and DROP TABLE/INDEX/SEQUENCE/VIEW/FOREIGN TABLE.
Per discussion with Noah Misch and Alvaro Herrera.
2011-11-30 16:12:27 +01:00
|
|
|
if (!pg_class_ownercheck(relId, GetUserId()))
|
2017-12-02 15:26:34 +01:00
|
|
|
aclcheck_error(ACLCHECK_NOT_OWNER, OBJECT_INDEX, relation->relname);
|
2000-02-18 10:30:20 +01:00
|
|
|
|
Improve table locking behavior in the face of current DDL.
In the previous coding, callers were faced with an awkward choice:
look up the name, do permissions checks, and then lock the table; or
look up the name, lock the table, and then do permissions checks.
The first choice was wrong because the results of the name lookup
and permissions checks might be out-of-date by the time the table
lock was acquired, while the second allowed a user with no privileges
to interfere with access to a table by users who do have privileges
(e.g. if a malicious backend queues up for an AccessExclusiveLock on
a table on which AccessShareLock is already held, further attempts
to access the table will be blocked until the AccessExclusiveLock
is obtained and the malicious backend's transaction rolls back).
To fix, allow callers of RangeVarGetRelid() to pass a callback which
gets executed after performing the name lookup but before acquiring
the relation lock. If the name lookup is retried (because
invalidation messages are received), the callback will be re-executed
as well, so we get the best of both worlds. RangeVarGetRelid() is
renamed to RangeVarGetRelidExtended(); callers not wishing to supply
a callback can continue to invoke it as RangeVarGetRelid(), which is
now a macro. Since the only one caller that uses nowait = true now
passes a callback anyway, the RangeVarGetRelid() macro defaults nowait
as well. The callback can also be used for supplemental locking - for
example, REINDEX INDEX needs to acquire the table lock before the index
lock to reduce deadlock possibilities.
There's a lot more work to be done here to fix all the cases where this
can be a problem, but this commit provides the general infrastructure
and fixes the following specific cases: REINDEX INDEX, REINDEX TABLE,
LOCK TABLE, and and DROP TABLE/INDEX/SEQUENCE/VIEW/FOREIGN TABLE.
Per discussion with Noah Misch and Alvaro Herrera.
2011-11-30 16:12:27 +01:00
|
|
|
/* Lock heap before index to avoid deadlock. */
|
|
|
|
if (relId != oldRelId)
|
|
|
|
{
|
2019-05-08 14:15:01 +02:00
|
|
|
Oid table_oid = IndexGetRelation(relId, true);
|
|
|
|
|
Improve table locking behavior in the face of current DDL.
In the previous coding, callers were faced with an awkward choice:
look up the name, do permissions checks, and then lock the table; or
look up the name, lock the table, and then do permissions checks.
The first choice was wrong because the results of the name lookup
and permissions checks might be out-of-date by the time the table
lock was acquired, while the second allowed a user with no privileges
to interfere with access to a table by users who do have privileges
(e.g. if a malicious backend queues up for an AccessExclusiveLock on
a table on which AccessShareLock is already held, further attempts
to access the table will be blocked until the AccessExclusiveLock
is obtained and the malicious backend's transaction rolls back).
To fix, allow callers of RangeVarGetRelid() to pass a callback which
gets executed after performing the name lookup but before acquiring
the relation lock. If the name lookup is retried (because
invalidation messages are received), the callback will be re-executed
as well, so we get the best of both worlds. RangeVarGetRelid() is
renamed to RangeVarGetRelidExtended(); callers not wishing to supply
a callback can continue to invoke it as RangeVarGetRelid(), which is
now a macro. Since the only one caller that uses nowait = true now
passes a callback anyway, the RangeVarGetRelid() macro defaults nowait
as well. The callback can also be used for supplemental locking - for
example, REINDEX INDEX needs to acquire the table lock before the index
lock to reduce deadlock possibilities.
There's a lot more work to be done here to fix all the cases where this
can be a problem, but this commit provides the general infrastructure
and fixes the following specific cases: REINDEX INDEX, REINDEX TABLE,
LOCK TABLE, and and DROP TABLE/INDEX/SEQUENCE/VIEW/FOREIGN TABLE.
Per discussion with Noah Misch and Alvaro Herrera.
2011-11-30 16:12:27 +01:00
|
|
|
/*
|
2019-05-08 14:15:01 +02:00
|
|
|
* If the OID isn't valid, it means the index was concurrently
|
|
|
|
* dropped, which is not a problem for us; just return normally.
|
Improve table locking behavior in the face of current DDL.
In the previous coding, callers were faced with an awkward choice:
look up the name, do permissions checks, and then lock the table; or
look up the name, lock the table, and then do permissions checks.
The first choice was wrong because the results of the name lookup
and permissions checks might be out-of-date by the time the table
lock was acquired, while the second allowed a user with no privileges
to interfere with access to a table by users who do have privileges
(e.g. if a malicious backend queues up for an AccessExclusiveLock on
a table on which AccessShareLock is already held, further attempts
to access the table will be blocked until the AccessExclusiveLock
is obtained and the malicious backend's transaction rolls back).
To fix, allow callers of RangeVarGetRelid() to pass a callback which
gets executed after performing the name lookup but before acquiring
the relation lock. If the name lookup is retried (because
invalidation messages are received), the callback will be re-executed
as well, so we get the best of both worlds. RangeVarGetRelid() is
renamed to RangeVarGetRelidExtended(); callers not wishing to supply
a callback can continue to invoke it as RangeVarGetRelid(), which is
now a macro. Since the only one caller that uses nowait = true now
passes a callback anyway, the RangeVarGetRelid() macro defaults nowait
as well. The callback can also be used for supplemental locking - for
example, REINDEX INDEX needs to acquire the table lock before the index
lock to reduce deadlock possibilities.
There's a lot more work to be done here to fix all the cases where this
can be a problem, but this commit provides the general infrastructure
and fixes the following specific cases: REINDEX INDEX, REINDEX TABLE,
LOCK TABLE, and and DROP TABLE/INDEX/SEQUENCE/VIEW/FOREIGN TABLE.
Per discussion with Noah Misch and Alvaro Herrera.
2011-11-30 16:12:27 +01:00
|
|
|
*/
|
2019-05-08 14:15:01 +02:00
|
|
|
if (OidIsValid(table_oid))
|
|
|
|
{
|
|
|
|
LockRelationOid(table_oid, table_lockmode);
|
|
|
|
state->locked_table_oid = table_oid;
|
|
|
|
}
|
Improve table locking behavior in the face of current DDL.
In the previous coding, callers were faced with an awkward choice:
look up the name, do permissions checks, and then lock the table; or
look up the name, lock the table, and then do permissions checks.
The first choice was wrong because the results of the name lookup
and permissions checks might be out-of-date by the time the table
lock was acquired, while the second allowed a user with no privileges
to interfere with access to a table by users who do have privileges
(e.g. if a malicious backend queues up for an AccessExclusiveLock on
a table on which AccessShareLock is already held, further attempts
to access the table will be blocked until the AccessExclusiveLock
is obtained and the malicious backend's transaction rolls back).
To fix, allow callers of RangeVarGetRelid() to pass a callback which
gets executed after performing the name lookup but before acquiring
the relation lock. If the name lookup is retried (because
invalidation messages are received), the callback will be re-executed
as well, so we get the best of both worlds. RangeVarGetRelid() is
renamed to RangeVarGetRelidExtended(); callers not wishing to supply
a callback can continue to invoke it as RangeVarGetRelid(), which is
now a macro. Since the only one caller that uses nowait = true now
passes a callback anyway, the RangeVarGetRelid() macro defaults nowait
as well. The callback can also be used for supplemental locking - for
example, REINDEX INDEX needs to acquire the table lock before the index
lock to reduce deadlock possibilities.
There's a lot more work to be done here to fix all the cases where this
can be a problem, but this commit provides the general infrastructure
and fixes the following specific cases: REINDEX INDEX, REINDEX TABLE,
LOCK TABLE, and and DROP TABLE/INDEX/SEQUENCE/VIEW/FOREIGN TABLE.
Per discussion with Noah Misch and Alvaro Herrera.
2011-11-30 16:12:27 +01:00
|
|
|
}
|
2000-02-18 10:30:20 +01:00
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
|
|
|
* ReindexTable
|
2005-06-22 23:14:31 +02:00
|
|
|
* Recreate all indexes of a table (and of its toast table, if any)
|
2000-02-18 10:30:20 +01:00
|
|
|
*/
|
2021-01-18 06:03:10 +01:00
|
|
|
static Oid
|
|
|
|
ReindexTable(RangeVar *relation, ReindexParams *params, bool isTopLevel)
|
2000-02-18 10:30:20 +01:00
|
|
|
{
|
2002-03-26 20:17:02 +01:00
|
|
|
Oid heapOid;
|
2019-03-29 08:25:20 +01:00
|
|
|
bool result;
|
2000-02-18 10:30:20 +01:00
|
|
|
|
Fix concurrent indexing operations with temporary tables
Attempting to use CREATE INDEX, DROP INDEX or REINDEX with CONCURRENTLY
on a temporary relation with ON COMMIT actions triggered unexpected
errors because those operations use multiple transactions internally to
complete their work. Here is for example one confusing error when using
ON COMMIT DELETE ROWS:
ERROR: index "foo" already contains data
Issues related to temporary relations and concurrent indexing are fixed
in this commit by enforcing the non-concurrent path to be taken for
temporary relations even if using CONCURRENTLY, transparently to the
user. Using a non-concurrent path does not matter in practice as locks
cannot be taken on a temporary relation by a session different than the
one owning the relation, and the non-concurrent operation is more
effective.
The problem exists with REINDEX since v12 with the introduction of
CONCURRENTLY, and with CREATE/DROP INDEX since CONCURRENTLY exists for
those commands. In all supported versions, this caused only confusing
error messages to be generated. Note that with REINDEX, it was also
possible to issue a REINDEX CONCURRENTLY for a temporary relation owned
by a different session, leading to a server crash.
The idea to enforce transparently the non-concurrent code path for
temporary relations comes originally from Andres Freund.
Reported-by: Manuel Rigger
Author: Michael Paquier, Heikki Linnakangas
Reviewed-by: Andres Freund, Álvaro Herrera, Heikki Linnakangas
Discussion: https://postgr.es/m/CA+u7OA6gP7YAeCguyseusYcc=uR8+ypjCcgDDCTzjQ+k6S9ksQ@mail.gmail.com
Backpatch-through: 9.4
2020-01-22 01:49:18 +01:00
|
|
|
/*
|
|
|
|
* The lock level used here should match reindex_relation().
|
|
|
|
*
|
|
|
|
* If it's a temporary table, we will perform a non-concurrent reindex,
|
|
|
|
* even if CONCURRENTLY was requested. In that case, reindex_relation()
|
|
|
|
* will upgrade the lock, but that's OK, because other sessions can't hold
|
|
|
|
* locks on our temporary table.
|
|
|
|
*/
|
2019-03-29 08:25:20 +01:00
|
|
|
heapOid = RangeVarGetRelidExtended(relation,
|
2021-01-18 06:03:10 +01:00
|
|
|
(params->options & REINDEXOPT_CONCURRENTLY) != 0 ?
|
2020-09-04 03:36:35 +02:00
|
|
|
ShareUpdateExclusiveLock : ShareLock,
|
2019-03-29 08:25:20 +01:00
|
|
|
0,
|
2011-12-21 21:17:28 +01:00
|
|
|
RangeVarCallbackOwnsTable, NULL);
|
2002-10-22 00:06:20 +02:00
|
|
|
|
Add support for partitioned tables and indexes in REINDEX
Until now, REINDEX was not able to work with partitioned tables and
indexes, forcing users to reindex partitions one by one. This extends
REINDEX INDEX and REINDEX TABLE so that they can accept a partitioned
index or table as input, respectively, to reindex all the partitions
assigned to them with physical storage (foreign tables, partitioned
tables and indexes are then discarded).
This shares some logic with schema and database REINDEX as each
partition gets processed in its own transaction after building a list of
relations to work on. This choice has the advantage of minimizing the
number of invalid indexes to one partition with REINDEX CONCURRENTLY in
the event of an in-flight cancellation or failure, as the only indexes
handled at once in a single REINDEX CONCURRENTLY loop are the ones from
the partition being worked on.
Isolation tests are added to emulate some cases I bumped into while
developing this feature, particularly with the concurrent drop of a
leaf partition being reindexed. However, this is rather limited, as LOCK would
cause REINDEX to block in the first transaction building the list of
partitions.
Per its multi-transaction nature, this new flavor cannot run in a
transaction block, similarly to REINDEX SCHEMA, SYSTEM and DATABASE.
Author: Justin Pryzby, Michael Paquier
Reviewed-by: Anastasia Lubennikova
Discussion: https://postgr.es/m/db12e897-73ff-467e-94cb-4af03705435f.adger.lj@alibaba-inc.com
2020-09-08 03:09:22 +02:00
|
|
|
if (get_rel_relkind(heapOid) == RELKIND_PARTITIONED_TABLE)
|
2021-01-18 06:03:10 +01:00
|
|
|
ReindexPartitions(heapOid, params, isTopLevel);
|
|
|
|
else if ((params->options & REINDEXOPT_CONCURRENTLY) != 0 &&
|
Add support for partitioned tables and indexes in REINDEX
Until now, REINDEX was not able to work with partitioned tables and
indexes, forcing users to reindex partitions one by one. This extends
REINDEX INDEX and REINDEX TABLE so that they can accept a partitioned
index or table as input, respectively, to reindex all the partitions
assigned to them with physical storage (foreign tables, partitioned
tables and indexes are then discarded).
This shares some logic with schema and database REINDEX as each
partition gets processed in its own transaction after building a list of
relations to work on. This choice has the advantage of minimizing the
number of invalid indexes to one partition with REINDEX CONCURRENTLY in
the event of an in-flight cancellation or failure, as the only indexes
handled at once in a single REINDEX CONCURRENTLY loop are the ones from
the partition being worked on.
Isolation tests are added to emulate some cases I bumped into while
developing this feature, particularly with the concurrent drop of a
leaf partition being reindexed. However, this is rather limited, as LOCK would
cause REINDEX to block in the first transaction building the list of
partitions.
Per its multi-transaction nature, this new flavor cannot run in a
transaction block, similarly to REINDEX SCHEMA, SYSTEM and DATABASE.
Author: Justin Pryzby, Michael Paquier
Reviewed-by: Anastasia Lubennikova
Discussion: https://postgr.es/m/db12e897-73ff-467e-94cb-4af03705435f.adger.lj@alibaba-inc.com
2020-09-08 03:09:22 +02:00
|
|
|
get_rel_persistence(heapOid) != RELPERSISTENCE_TEMP)
|
2019-06-05 11:05:41 +02:00
|
|
|
{
|
2021-01-18 06:03:10 +01:00
|
|
|
result = ReindexRelationConcurrently(heapOid, params);
|
2019-06-05 11:05:41 +02:00
|
|
|
|
|
|
|
if (!result)
|
|
|
|
ereport(NOTICE,
|
|
|
|
(errmsg("table \"%s\" has no indexes that can be reindexed concurrently",
|
|
|
|
relation->relname)));
|
|
|
|
}
|
2019-03-29 08:25:20 +01:00
|
|
|
else
|
2019-06-05 11:05:41 +02:00
|
|
|
{
|
2021-01-18 06:03:10 +01:00
|
|
|
ReindexParams newparams = *params;
|
|
|
|
|
|
|
|
newparams.options |= REINDEXOPT_REPORT_PROGRESS;
|
2019-03-29 08:25:20 +01:00
|
|
|
result = reindex_relation(heapOid,
|
|
|
|
REINDEX_REL_PROCESS_TOAST |
|
|
|
|
REINDEX_REL_CHECK_CONSTRAINTS,
|
2021-01-18 06:03:10 +01:00
|
|
|
&newparams);
|
2019-06-05 11:05:41 +02:00
|
|
|
if (!result)
|
|
|
|
ereport(NOTICE,
|
|
|
|
(errmsg("table \"%s\" has no indexes to reindex",
|
|
|
|
relation->relname)));
|
|
|
|
}
|
2012-12-29 13:55:37 +01:00
|
|
|
|
|
|
|
return heapOid;
|
2000-02-18 10:30:20 +01:00
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
2015-03-08 17:18:43 +01:00
|
|
|
* ReindexMultipleTables
|
|
|
|
* Recreate indexes of tables selected by objectName/objectKind.
|
2003-09-24 20:54:02 +02:00
|
|
|
*
|
|
|
|
* To reduce the probability of deadlocks, each table is reindexed in a
|
|
|
|
* separate transaction, so we can release the lock on it right away.
|
2007-03-13 01:33:44 +01:00
|
|
|
* That means this must not be called within a user transaction block!
|
2000-02-18 10:30:20 +01:00
|
|
|
*/
|
2021-01-18 06:03:10 +01:00
|
|
|
static void
|
2015-05-15 13:09:57 +02:00
|
|
|
ReindexMultipleTables(const char *objectName, ReindexObjectType objectKind,
|
2021-01-18 06:03:10 +01:00
|
|
|
ReindexParams *params)
|
2000-02-18 10:30:20 +01:00
|
|
|
{
|
2014-12-08 16:28:00 +01:00
|
|
|
Oid objectOid;
|
2004-05-26 06:41:50 +02:00
|
|
|
Relation relationRelation;
|
tableam: Add and use scan APIs.
To allow table accesses to not be directly dependent on heap, several
new abstractions are needed. Specifically:
1) Heap scans need to be generalized into table scans. Do this by
introducing TableScanDesc, which will be the "base class" for
individual AMs. This contains the AM independent fields from
HeapScanDesc.
The previous heap_{beginscan,rescan,endscan} et al. have been
replaced with a table_ version.
There's no direct replacement for heap_getnext(), as that returned
a HeapTuple, which is undesirable for other AMs. Instead there's
table_scan_getnextslot(). But note that heap_getnext() lives on,
it's still used widely to access catalog tables.
This is achieved by new scan_begin, scan_end, scan_rescan,
scan_getnextslot callbacks.
2) The portion of parallel scans that's shared between backends need
to be able to do so without the user doing per-AM work. To achieve
that new parallelscan_{estimate, initialize, reinitialize}
callbacks are introduced, which operate on a new
ParallelTableScanDesc, which again can be subclassed by AMs.
As it is likely that several AMs are going to be block oriented,
block oriented callbacks that can be shared between such AMs are
provided and used by heap. table_block_parallelscan_{estimate,
initialize, reinitialize} as callbacks, and
table_block_parallelscan_{nextpage, init} for use in AMs. These
operate on a ParallelBlockTableScanDesc.
3) Index scans need to be able to access tables to return a tuple, and
there needs to be state across individual accesses to the heap to
store state like buffers. That's now handled by introducing a
sort-of-scan IndexFetchTable, which again is intended to be
subclassed by individual AMs (for heap IndexFetchHeap).
The relevant callbacks for an AM are index_fetch_{end, begin,
reset} to create the necessary state, and index_fetch_tuple to
retrieve an indexed tuple. Note that index_fetch_tuple
implementations need to be smarter than just blindly fetching the
tuples for AMs that have optimizations similar to heap's HOT - the
currently alive tuple in the update chain needs to be fetched if
appropriate.
Similar to table_scan_getnextslot(), it's undesirable to continue
to return HeapTuples. Thus index_fetch_heap (might want to rename
that later) now accepts a slot as an argument. Core code doesn't
have a lot of call sites performing index scans without going
through the systable_* API (in contrast to loads of heap_getnext
calls and working directly with HeapTuples).
Index scans now store the result of a search in
IndexScanDesc->xs_heaptid, rather than xs_ctup->t_self. As the
target is not generally a HeapTuple anymore that seems cleaner.
To be able to sensibly adapt code to use the above, two further
callbacks have been introduced:
a) slot_callbacks returns a TupleTableSlotOps* suitable for creating
slots capable of holding a tuple of the AM's
type. table_slot_callbacks() and table_slot_create() are based
upon that, but have additional logic to deal with views, foreign
tables, etc.
While this change could have been done separately, nearly all the
call sites that needed to be adapted for the rest of this commit
also would have been needed to be adapted for
table_slot_callbacks(), making separation not worthwhile.
b) tuple_satisfies_snapshot checks whether the tuple in a slot is
currently visible according to a snapshot. That's required as a few
places now don't have a buffer + HeapTuple around, but a
slot (which in heap's case internally has that information).
Additionally a few infrastructure changes were needed:
I) SysScanDesc, as used by systable_{beginscan, getnext} et al. now
internally uses a slot to keep track of tuples. While
systable_getnext() still returns HeapTuples, and will do so for the
foreseeable future, the index API (see 1) above) now only deals with
slots.
The remainder, and largest part, of this commit is then adjusting all
scans in postgres to use the new APIs.
Author: Andres Freund, Haribabu Kommi, Alvaro Herrera
Discussion:
https://postgr.es/m/20180703070645.wchpu5muyto5n647@alap3.anarazel.de
https://postgr.es/m/20160812231527.GA690404@alvherre.pgsql
2019-03-11 20:46:41 +01:00
|
|
|
TableScanDesc scan;
|
2015-03-08 17:18:43 +01:00
|
|
|
ScanKeyData scan_keys[1];
|
2004-05-26 06:41:50 +02:00
|
|
|
HeapTuple tuple;
|
2000-06-28 05:33:33 +02:00
|
|
|
MemoryContext private_context;
|
2000-02-18 10:30:20 +01:00
|
|
|
MemoryContext old;
|
2004-05-26 06:41:50 +02:00
|
|
|
List *relids = NIL;
|
2015-03-08 17:18:43 +01:00
|
|
|
int num_keys;
|
2019-03-29 08:25:20 +01:00
|
|
|
bool concurrent_warning = false;
|
2021-02-04 06:34:20 +01:00
|
|
|
bool tablespace_warning = false;
|
2000-02-18 10:30:20 +01:00
|
|
|
|
2014-12-08 16:28:00 +01:00
|
|
|
AssertArg(objectName);
|
|
|
|
Assert(objectKind == REINDEX_OBJECT_SCHEMA ||
|
|
|
|
objectKind == REINDEX_OBJECT_SYSTEM ||
|
|
|
|
objectKind == REINDEX_OBJECT_DATABASE);
|
2000-02-18 10:30:20 +01:00
|
|
|
|
2020-09-04 03:36:35 +02:00
|
|
|
if (objectKind == REINDEX_OBJECT_SYSTEM &&
|
2021-01-18 06:03:10 +01:00
|
|
|
(params->options & REINDEXOPT_CONCURRENTLY) != 0)
|
2019-03-29 08:25:20 +01:00
|
|
|
ereport(ERROR,
|
|
|
|
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
|
2019-06-20 06:28:12 +02:00
|
|
|
errmsg("cannot reindex system catalogs concurrently")));
|
2019-03-29 08:25:20 +01:00
|
|
|
|
2014-12-08 16:28:00 +01:00
|
|
|
/*
|
2015-03-08 17:18:43 +01:00
|
|
|
* Get OID of object to reindex, being the database currently being used
|
|
|
|
* by session for a database or for system catalogs, or the schema defined
|
|
|
|
* by caller. At the same time do permission checks that need different
|
|
|
|
* processing depending on the object type.
|
2014-12-08 16:28:00 +01:00
|
|
|
*/
|
|
|
|
if (objectKind == REINDEX_OBJECT_SCHEMA)
|
|
|
|
{
|
|
|
|
objectOid = get_namespace_oid(objectName, false);
|
2000-06-28 05:33:33 +02:00
|
|
|
|
2014-12-08 16:28:00 +01:00
|
|
|
if (!pg_namespace_ownercheck(objectOid, GetUserId()))
|
2017-12-02 15:26:34 +01:00
|
|
|
aclcheck_error(ACLCHECK_NOT_OWNER, OBJECT_SCHEMA,
|
2014-12-08 16:28:00 +01:00
|
|
|
objectName);
|
|
|
|
}
|
|
|
|
else
|
|
|
|
{
|
|
|
|
objectOid = MyDatabaseId;
|
|
|
|
|
2015-03-08 17:18:43 +01:00
|
|
|
if (strcmp(objectName, get_database_name(objectOid)) != 0)
|
2014-12-08 16:28:00 +01:00
|
|
|
ereport(ERROR,
|
|
|
|
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
|
|
|
|
errmsg("can only reindex the currently open database")));
|
2015-03-08 17:18:43 +01:00
|
|
|
if (!pg_database_ownercheck(objectOid, GetUserId()))
|
2017-12-02 15:26:34 +01:00
|
|
|
aclcheck_error(ACLCHECK_NOT_OWNER, OBJECT_DATABASE,
|
2014-12-08 16:28:00 +01:00
|
|
|
objectName);
|
|
|
|
}
|
2000-02-18 10:30:20 +01:00
|
|
|
|
2000-06-28 05:33:33 +02:00
|
|
|
/*
|
|
|
|
* Create a memory context that will survive forced transaction commits we
|
2003-05-02 22:54:36 +02:00
|
|
|
* do below. Since it is a child of PortalContext, it will go away
|
2000-06-28 05:33:33 +02:00
|
|
|
* eventually even if we suffer an error; there's no need for special
|
|
|
|
* abort cleanup logic.
|
|
|
|
*/
|
2003-05-02 22:54:36 +02:00
|
|
|
private_context = AllocSetContextCreate(PortalContext,
|
2015-03-08 17:18:43 +01:00
|
|
|
"ReindexMultipleTables",
|
Add macros to make AllocSetContextCreate() calls simpler and safer.
I found that half a dozen (nearly 5%) of our AllocSetContextCreate calls
had typos in the context-sizing parameters. While none of these led to
especially significant problems, they did create minor inefficiencies,
and it's now clear that expecting people to copy-and-paste those calls
accurately is not a great idea. Let's reduce the risk of future errors
by introducing single macros that encapsulate the common use-cases.
Three such macros are enough to cover all but two special-purpose contexts;
those two calls can be left as-is, I think.
While this patch doesn't in itself improve matters for third-party
extensions, it doesn't break anything for them either, and they can
gradually adopt the simplified notation over time.
In passing, change TopMemoryContext to use the default allocation
parameters. Formerly it could only be extended 8K at a time. That was
probably reasonable when this code was written; but nowadays we create
many more contexts than we did then, so that it's not unusual to have a
couple hundred K in TopMemoryContext, even without considering various
dubious code that sticks other things there. There seems no good reason
not to let it use growing blocks like most other contexts.
Back-patch to 9.6, mostly because that's still close enough to HEAD that
it's easy to do so, and keeping the branches in sync can be expected to
avoid some future back-patching pain. The bugs fixed by these changes
don't seem to be significant enough to justify fixing them further back.
Discussion: <21072.1472321324@sss.pgh.pa.us>
2016-08-27 23:50:38 +02:00
|
|
|
ALLOCSET_SMALL_SIZES);
|
2000-02-18 10:30:20 +01:00
|
|
|
|
2003-09-24 20:54:02 +02:00
|
|
|
/*
|
2015-03-08 17:18:43 +01:00
|
|
|
* Define the search keys to find the objects to reindex. For a schema, we
|
|
|
|
* select target relations using relnamespace, something not necessary for
|
|
|
|
* a database-wide operation.
|
2014-12-08 16:28:00 +01:00
|
|
|
*/
|
|
|
|
if (objectKind == REINDEX_OBJECT_SCHEMA)
|
|
|
|
{
|
2014-12-11 23:54:05 +01:00
|
|
|
num_keys = 1;
|
2014-12-08 16:28:00 +01:00
|
|
|
ScanKeyInit(&scan_keys[0],
|
|
|
|
Anum_pg_class_relnamespace,
|
|
|
|
BTEqualStrategyNumber, F_OIDEQ,
|
|
|
|
ObjectIdGetDatum(objectOid));
|
|
|
|
}
|
|
|
|
else
|
|
|
|
num_keys = 0;
|
|
|
|
|
2001-11-20 03:46:13 +01:00
|
|
|
/*
|
|
|
|
* Scan pg_class to build a list of the relations we need to reindex.
|
2003-09-24 20:54:02 +02:00
|
|
|
*
|
2013-07-05 21:25:51 +02:00
|
|
|
* We only consider plain relations and materialized views here (toast
|
|
|
|
* rels will be processed indirectly by reindex_relation).
|
2001-11-20 03:46:13 +01:00
|
|
|
*/
|
2019-01-21 19:32:19 +01:00
|
|
|
relationRelation = table_open(RelationRelationId, AccessShareLock);
|
tableam: Add and use scan APIs.
To allow table accesses to not be directly dependent on heap, several
new abstractions are needed. Specifically:
1) Heap scans need to be generalized into table scans. Do this by
introducing TableScanDesc, which will be the "base class" for
individual AMs. This contains the AM independent fields from
HeapScanDesc.
The previous heap_{beginscan,rescan,endscan} et al. have been
replaced with a table_ version.
There's no direct replacement for heap_getnext(), as that returned
a HeapTuple, which is undesirable for other AMs. Instead there's
table_scan_getnextslot(). But note that heap_getnext() lives on,
it's still used widely to access catalog tables.
This is achieved by new scan_begin, scan_end, scan_rescan,
scan_getnextslot callbacks.
2) The portion of parallel scans that's shared between backends need
to be able to do so without the user doing per-AM work. To achieve
that new parallelscan_{estimate, initialize, reinitialize}
callbacks are introduced, which operate on a new
ParallelTableScanDesc, which again can be subclassed by AMs.
As it is likely that several AMs are going to be block oriented,
block oriented callbacks that can be shared between such AMs are
provided and used by heap. table_block_parallelscan_{estimate,
initialize, reinitialize} as callbacks, and
table_block_parallelscan_{nextpage, init} for use in AMs. These
operate on a ParallelBlockTableScanDesc.
3) Index scans need to be able to access tables to return a tuple, and
there needs to be state across individual accesses to the heap to
store state like buffers. That's now handled by introducing a
sort-of-scan IndexFetchTable, which again is intended to be
subclassed by individual AMs (for heap IndexFetchHeap).
The relevant callbacks for an AM are index_fetch_{end, begin,
reset} to create the necessary state, and index_fetch_tuple to
retrieve an indexed tuple. Note that index_fetch_tuple
implementations need to be smarter than just blindly fetching the
tuples for AMs that have optimizations similar to heap's HOT - the
currently alive tuple in the update chain needs to be fetched if
appropriate.
Similar to table_scan_getnextslot(), it's undesirable to continue
to return HeapTuples. Thus index_fetch_heap (might want to rename
that later) now accepts a slot as an argument. Core code doesn't
have a lot of call sites performing index scans without going
through the systable_* API (in contrast to loads of heap_getnext
calls and working directly with HeapTuples).
Index scans now store the result of a search in
IndexScanDesc->xs_heaptid, rather than xs_ctup->t_self. As the
target is not generally a HeapTuple anymore that seems cleaner.
To be able to sensibly adapt code to use the above, two further
callbacks have been introduced:
a) slot_callbacks returns a TupleTableSlotOps* suitable for creating
slots capable of holding a tuple of the AM's
type. table_slot_callbacks() and table_slot_create() are based
upon that, but have additional logic to deal with views, foreign
tables, etc.
While this change could have been done separately, nearly all the
call sites that needed to be adapted for the rest of this commit
also would have been needed to be adapted for
table_slot_callbacks(), making separation not worthwhile.
b) tuple_satisfies_snapshot checks whether the tuple in a slot is
currently visible according to a snapshot. That's required as a few
places now don't have a buffer + HeapTuple around, but a
slot (which in heap's case internally has that information).
Additionally a few infrastructure changes were needed:
I) SysScanDesc, as used by systable_{beginscan, getnext} et al. now
internally uses a slot to keep track of tuples. While
systable_getnext() still returns HeapTuples, and will do so for the
foreseeable future, the index API (see 1) above) now only deals with
slots.
The remainder, and largest part, of this commit is then adjusting all
scans in postgres to use the new APIs.
Author: Andres Freund, Haribabu Kommi, Alvaro Herrera
Discussion:
https://postgr.es/m/20180703070645.wchpu5muyto5n647@alap3.anarazel.de
https://postgr.es/m/20160812231527.GA690404@alvherre.pgsql
2019-03-11 20:46:41 +01:00
|
|
|
scan = table_beginscan_catalog(relationRelation, num_keys, scan_keys);
|
2002-05-21 01:51:44 +02:00
|
|
|
while ((tuple = heap_getnext(scan, ForwardScanDirection)) != NULL)
|
2000-02-18 10:30:20 +01:00
|
|
|
{
|
2003-09-24 20:54:02 +02:00
|
|
|
Form_pg_class classtuple = (Form_pg_class) GETSTRUCT(tuple);
|
Remove WITH OIDS support, change oid catalog column visibility.
Previously tables declared WITH OIDS, including a significant fraction
of the catalog tables, stored the oid column not as a normal column,
but as part of the tuple header.
This special column was not shown by default, which was somewhat odd,
as it's often (consider e.g. pg_class.oid) one of the more important
parts of a row. Neither pg_dump nor COPY included the contents of the
oid column by default.
The fact that the oid column was not an ordinary column necessitated a
significant amount of special case code to support oid columns. That
already was painful for the existing, but upcoming work aiming to make
table storage pluggable, would have required expanding and duplicating
that "specialness" significantly.
WITH OIDS has been deprecated since 2005 (commit ff02d0a05280e0).
Remove it.
Removing includes:
- CREATE TABLE and ALTER TABLE syntax for declaring the table to be
WITH OIDS has been removed (WITH (oids[ = true]) will error out)
- pg_dump does not support dumping tables declared WITH OIDS and will
issue a warning when dumping one (and ignore the oid column).
restoring a pg_dump archive with pg_restore will warn when
restoring a table with oid contents (and ignore the oid column)
- COPY will refuse to load binary dump that includes oids.
- pg_upgrade will error out when encountering tables declared WITH
OIDS, they have to be altered to remove the oid column first.
- Functionality to access the oid of the last inserted row (like
plpgsql's RESULT_OID, spi's SPI_lastoid, ...) has been removed.
The syntax for declaring a table WITHOUT OIDS (or WITH (oids = false)
for CREATE TABLE) is still supported. While that requires a bit of
support code, it seems unnecessary to break applications / dumps that
do not use oids, and are explicit about not using them.
The biggest user of WITH OID columns was postgres' catalog. This
commit changes all 'magic' oid columns to be columns that are normally
declared and stored. To reduce unnecessary query breakage all the
newly added columns are still named 'oid', even if a table's column
naming scheme would indicate 'reloid' or such. This obviously
requires adapting a lot code, mostly replacing oid access via
HeapTupleGetOid() with access to the underlying Form_pg_*->oid column.
The bootstrap process now assigns oids for all oid columns in
genbki.pl that do not have an explicit value (starting at the largest
oid previously used); only oids assigned later will be above
FirstBootstrapObjectId. As the oid column now is a normal column the
special bootstrap syntax for oids has been removed.
Oids are not automatically assigned during insertion anymore, all
backend code explicitly assigns oids with GetNewOidWithIndex(). For
the rare case that insertions into the catalog via SQL are called for
the new pg_nextoid() function can be used (which only works on catalog
tables).
The fact that oid columns on system tables are now normal columns
means that they will be included in the set of columns expanded
by * (i.e. SELECT * FROM pg_class will now include the table's oid,
previously it did not). It'd not technically be hard to hide the oid
column by default, but that'd mean confusing behavior would either
have to be carried forward forever, or it'd cause breakage down the
line.
While it's not unlikely that further adjustments are needed, the
scope/invasiveness of the patch makes it worthwhile to merge this
now. It's painful to maintain externally, too complicated to commit
after the code freeze, and a dependency of a number of other
patches.
Catversion bump, for obvious reasons.
Author: Andres Freund, with contributions by John Naylor
Discussion: https://postgr.es/m/20180930034810.ywp2c7awz7opzcfr@alap3.anarazel.de
2018-11-21 00:36:57 +01:00
|
|
|
Oid relid = classtuple->oid;
|
2003-09-24 20:54:02 +02:00
|
|
|
|
2014-12-11 23:54:05 +01:00
|
|
|
/*
|
2015-03-08 17:18:43 +01:00
|
|
|
* Only regular tables and matviews can have indexes, so ignore any
|
|
|
|
* other kind of relation.
|
Local partitioned indexes
When CREATE INDEX is run on a partitioned table, create catalog entries
for an index on the partitioned table (which is just a placeholder since
the table proper has no data of its own), and recurse to create actual
indexes on the existing partitions; create them in future partitions
also.
As a convenience gadget, if the new index definition matches some
existing index in partitions, these are picked up and used instead of
creating new ones. Whichever way these indexes come about, they become
attached to the index on the parent table and are dropped alongside it,
and cannot be dropped in isolation unless they are detached first.
To support pg_dump'ing these indexes, add commands
CREATE INDEX ON ONLY <table>
(which creates the index on the parent partitioned table, without
recursing) and
ALTER INDEX ATTACH PARTITION
(which is used after the indexes have been created individually on each
partition, to attach them to the parent index). These reconstruct prior
database state exactly.
Reviewed-by: (in alphabetical order) Peter Eisentraut, Robert Haas, Amit
Langote, Jesper Pedersen, Simon Riggs, David Rowley
Discussion: https://postgr.es/m/20171113170646.gzweigyrgg6pwsg4@alvherre.pgsql
2018-01-19 15:49:22 +01:00
|
|
|
*
|
Add support for partitioned tables and indexes in REINDEX
Until now, REINDEX was not able to work with partitioned tables and
indexes, forcing users to reindex partitions one by one. This extends
REINDEX INDEX and REINDEX TABLE so that they can accept a partitioned
index or table as input, respectively, to reindex all the partitions
assigned to them with physical storage (foreign tables, partitioned
tables and indexes are then discarded).
This shares some logic with schema and database REINDEX as each
partition gets processed in its own transaction after building a list of
relations to work on. This choice has the advantage of minimizing the
number of invalid indexes to one partition with REINDEX CONCURRENTLY in
the event of an in-flight cancellation or failure, as the only indexes
handled at once in a single REINDEX CONCURRENTLY loop are the ones from
the partition being worked on.
Isolation tests are added to emulate some cases I bumped into while
developing this feature, particularly with the concurrent drop of a
leaf partition being reindexed. However, this is rather limited, as LOCK would
cause REINDEX to block in the first transaction building the list of
partitions.
Per its multi-transaction nature, this new flavor cannot run in a
transaction block, similarly to REINDEX SCHEMA, SYSTEM and DATABASE.
Author: Justin Pryzby, Michael Paquier
Reviewed-by: Anastasia Lubennikova
Discussion: https://postgr.es/m/db12e897-73ff-467e-94cb-4af03705435f.adger.lj@alibaba-inc.com
2020-09-08 03:09:22 +02:00
|
|
|
* Partitioned tables/indexes are skipped but matching leaf partitions
|
|
|
|
* are processed.
|
2014-12-11 23:54:05 +01:00
|
|
|
*/
|
2013-03-04 01:23:31 +01:00
|
|
|
if (classtuple->relkind != RELKIND_RELATION &&
|
|
|
|
classtuple->relkind != RELKIND_MATVIEW)
|
2003-09-24 20:54:02 +02:00
|
|
|
continue;
|
2002-08-29 17:56:20 +02:00
|
|
|
|
2007-09-10 23:59:37 +02:00
|
|
|
/* Skip temp tables of other backends; we can't reindex them at all */
|
2010-12-13 18:34:26 +01:00
|
|
|
if (classtuple->relpersistence == RELPERSISTENCE_TEMP &&
|
2009-04-01 00:12:48 +02:00
|
|
|
!isTempNamespace(classtuple->relnamespace))
|
2007-09-10 23:59:37 +02:00
|
|
|
continue;
|
|
|
|
|
2005-06-22 23:14:31 +02:00
|
|
|
/* Check user/system classification, and optionally skip */
|
2015-03-08 17:18:43 +01:00
|
|
|
if (objectKind == REINDEX_OBJECT_SYSTEM &&
|
|
|
|
!IsSystemClass(relid, classtuple))
|
2014-12-08 16:28:00 +01:00
|
|
|
continue;
|
2003-09-24 20:54:02 +02:00
|
|
|
|
2018-08-09 09:40:15 +02:00
|
|
|
/*
|
|
|
|
* The table can be reindexed if the user is superuser, the table
|
|
|
|
* owner, or the database/schema owner (but in the latter case, only
|
|
|
|
* if it's not a shared relation). pg_class_ownercheck includes the
|
|
|
|
* superuser case, and depending on objectKind we already know that
|
|
|
|
* the user has permission to run REINDEX on this database or schema
|
|
|
|
* per the permission checks at the beginning of this routine.
|
|
|
|
*/
|
|
|
|
if (classtuple->relisshared &&
|
|
|
|
!pg_class_ownercheck(relid, GetUserId()))
|
|
|
|
continue;
|
|
|
|
|
2019-03-29 08:25:20 +01:00
|
|
|
/*
|
Clean up the behavior and API of catalog.c's is-catalog-relation tests.
The right way for IsCatalogRelation/Class to behave is to return true
for OIDs less than FirstBootstrapObjectId (not FirstNormalObjectId),
without any of the ad-hoc fooling around with schema membership.
The previous code was wrong because (1) it claimed that
information_schema tables were not catalog relations but their toast
tables were, which is silly; and (2) if you dropped and recreated
information_schema, which is a supported operation, the behavior
changed. That's even sillier. With this definition, "catalog
relations" are exactly the ones traceable to the postgres.bki data,
which seems like what we want.
With this simplification, we don't actually need access to the pg_class
tuple to identify a catalog relation; we only need its OID. Hence,
replace IsCatalogClass with "IsCatalogRelationOid(oid)". But keep
IsCatalogRelation as a convenience function.
This allows fixing some arguably-wrong semantics in contrib/sepgsql and
ReindexRelationConcurrently, which were using an IsSystemNamespace test
where what they really should be using is IsCatalogRelationOid. The
previous coding failed to protect toast tables of system catalogs, and
also was not on board with the general principle that user-created tables
do not become catalogs just by virtue of being renamed into pg_catalog.
We can also get rid of a messy hack in ReindexMultipleTables.
While we're at it, also rename IsSystemNamespace to IsCatalogNamespace,
because the previous name invited confusion with the more expansive
semantics used by IsSystemRelation/Class.
Also improve the comments in catalog.c.
There are a few remaining places in replication-related code that are
special-casing OIDs below FirstNormalObjectId. I'm inclined to think
those are wrong too, and if there should be any special case it should
just extend to FirstBootstrapObjectId. But first we need to debate
whether a FOR ALL TABLES publication should include information_schema.
Discussion: https://postgr.es/m/21697.1557092753@sss.pgh.pa.us
Discussion: https://postgr.es/m/15150.1557257111@sss.pgh.pa.us
2019-05-09 05:27:29 +02:00
|
|
|
* Skip system tables, since index_create() would reject indexing them
|
|
|
|
* concurrently (and it would likely fail if we tried).
|
2019-03-29 08:25:20 +01:00
|
|
|
*/
|
2021-01-18 06:03:10 +01:00
|
|
|
if ((params->options & REINDEXOPT_CONCURRENTLY) != 0 &&
|
Clean up the behavior and API of catalog.c's is-catalog-relation tests.
The right way for IsCatalogRelation/Class to behave is to return true
for OIDs less than FirstBootstrapObjectId (not FirstNormalObjectId),
without any of the ad-hoc fooling around with schema membership.
The previous code was wrong because (1) it claimed that
information_schema tables were not catalog relations but their toast
tables were, which is silly; and (2) if you dropped and recreated
information_schema, which is a supported operation, the behavior
changed. That's even sillier. With this definition, "catalog
relations" are exactly the ones traceable to the postgres.bki data,
which seems like what we want.
With this simplification, we don't actually need access to the pg_class
tuple to identify a catalog relation; we only need its OID. Hence,
replace IsCatalogClass with "IsCatalogRelationOid(oid)". But keep
IsCatalogRelation as a convenience function.
This allows fixing some arguably-wrong semantics in contrib/sepgsql and
ReindexRelationConcurrently, which were using an IsSystemNamespace test
where what they really should be using is IsCatalogRelationOid. The
previous coding failed to protect toast tables of system catalogs, and
also was not on board with the general principle that user-created tables
do not become catalogs just by virtue of being renamed into pg_catalog.
We can also get rid of a messy hack in ReindexMultipleTables.
While we're at it, also rename IsSystemNamespace to IsCatalogNamespace,
because the previous name invited confusion with the more expansive
semantics used by IsSystemRelation/Class.
Also improve the comments in catalog.c.
There are a few remaining places in replication-related code that are
special-casing OIDs below FirstNormalObjectId. I'm inclined to think
those are wrong too, and if there should be any special case it should
just extend to FirstBootstrapObjectId. But first we need to debate
whether a FOR ALL TABLES publication should include information_schema.
Discussion: https://postgr.es/m/21697.1557092753@sss.pgh.pa.us
Discussion: https://postgr.es/m/15150.1557257111@sss.pgh.pa.us
2019-05-09 05:27:29 +02:00
|
|
|
IsCatalogRelationOid(relid))
|
2019-03-29 08:25:20 +01:00
|
|
|
{
|
|
|
|
if (!concurrent_warning)
|
|
|
|
ereport(WARNING,
|
|
|
|
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
|
2019-06-20 06:28:12 +02:00
|
|
|
errmsg("cannot reindex system catalogs concurrently, skipping all")));
|
2019-03-29 08:25:20 +01:00
|
|
|
concurrent_warning = true;
|
|
|
|
continue;
|
|
|
|
}
|
|
|
|
|
2021-02-04 06:34:20 +01:00
|
|
|
/*
|
|
|
|
* If a new tablespace is set, check if this relation has to be
|
|
|
|
* skipped.
|
|
|
|
*/
|
|
|
|
if (OidIsValid(params->tablespaceOid))
|
|
|
|
{
|
|
|
|
bool skip_rel = false;
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Mapped relations cannot be moved to different tablespaces (in
|
|
|
|
* particular this eliminates all shared catalogs).
|
|
|
|
*/
|
|
|
|
if (RELKIND_HAS_STORAGE(classtuple->relkind) &&
|
|
|
|
!OidIsValid(classtuple->relfilenode))
|
|
|
|
skip_rel = true;
|
|
|
|
|
|
|
|
/*
|
|
|
|
* A system relation is always skipped, even with
|
|
|
|
* allow_system_table_mods enabled.
|
|
|
|
*/
|
|
|
|
if (IsSystemClass(relid, classtuple))
|
|
|
|
skip_rel = true;
|
|
|
|
|
|
|
|
if (skip_rel)
|
|
|
|
{
|
|
|
|
if (!tablespace_warning)
|
|
|
|
ereport(WARNING,
|
|
|
|
(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
|
|
|
|
errmsg("cannot move system relations, skipping all")));
|
|
|
|
tablespace_warning = true;
|
|
|
|
continue;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2015-03-08 17:18:43 +01:00
|
|
|
/* Save the list of relation OIDs in private context */
|
|
|
|
old = MemoryContextSwitchTo(private_context);
|
|
|
|
|
2014-12-08 16:28:00 +01:00
|
|
|
/*
|
2015-03-08 17:18:43 +01:00
|
|
|
* We always want to reindex pg_class first if it's selected to be
|
|
|
|
* reindexed. This ensures that if there is any corruption in
|
|
|
|
* pg_class' indexes, they will be fixed before we process any other
|
|
|
|
* tables. This is critical because reindexing itself will try to
|
|
|
|
* update pg_class.
|
2014-12-08 16:28:00 +01:00
|
|
|
*/
|
2015-03-08 17:18:43 +01:00
|
|
|
if (relid == RelationRelationId)
|
|
|
|
relids = lcons_oid(relid, relids);
|
|
|
|
else
|
|
|
|
relids = lappend_oid(relids, relid);
|
2003-09-24 20:54:02 +02:00
|
|
|
|
|
|
|
MemoryContextSwitchTo(old);
|
2000-02-18 10:30:20 +01:00
|
|
|
}
|
tableam: Add and use scan APIs.
To allow table accesses to not depend directly on heap, several
new abstractions are needed. Specifically:
1) Heap scans need to be generalized into table scans. Do this by
introducing TableScanDesc, which will be the "base class" for
individual AMs. This contains the AM independent fields from
HeapScanDesc.
The previous heap_{beginscan,rescan,endscan} et al. have been
replaced with a table_ version.
There's no direct replacement for heap_getnext(), as that returned
a HeapTuple, which is undesirable for other AMs. Instead there's
table_scan_getnextslot(). But note that heap_getnext() lives on,
it's still used widely to access catalog tables.
This is achieved by new scan_begin, scan_end, scan_rescan,
scan_getnextslot callbacks.
2) The portion of parallel scans that's shared between backends need
to be able to do so without the user doing per-AM work. To achieve
that new parallelscan_{estimate, initialize, reinitialize}
callbacks are introduced, which operate on a new
ParallelTableScanDesc, which again can be subclassed by AMs.
As it is likely that several AMs are going to be block oriented,
block oriented callbacks that can be shared between such AMs are
provided and used by heap. table_block_parallelscan_{estimate,
initialize, reinitialize} as callbacks, and
table_block_parallelscan_{nextpage, init} for use in AMs. These
operate on a ParallelBlockTableScanDesc.
3) Index scans need to be able to access tables to return a tuple, and
there needs to be state across individual accesses to the heap to
store state like buffers. That's now handled by introducing a
sort-of-scan IndexFetchTable, which again is intended to be
subclassed by individual AMs (for heap IndexFetchHeap).
The relevant callbacks for an AM are index_fetch_{end, begin,
reset} to create the necessary state, and index_fetch_tuple to
retrieve an indexed tuple. Note that index_fetch_tuple
implementations need to be smarter than just blindly fetching the
tuples for AMs that have optimizations similar to heap's HOT - the
currently alive tuple in the update chain needs to be fetched if
appropriate.
Similar to table_scan_getnextslot(), it's undesirable to continue
to return HeapTuples. Thus index_fetch_heap (might want to rename
that later) now accepts a slot as an argument. Core code doesn't
have a lot of call sites performing index scans without going
through the systable_* API (in contrast to loads of heap_getnext
calls and working directly with HeapTuples).
Index scans now store the result of a search in
IndexScanDesc->xs_heaptid, rather than xs_ctup->t_self. As the
target is not generally a HeapTuple anymore that seems cleaner.
To be able to sensibly adapt code to use the above, two further
callbacks have been introduced:
a) slot_callbacks returns a TupleTableSlotOps* suitable for creating
slots capable of holding a tuple of the AM's
type. table_slot_callbacks() and table_slot_create() are based
upon that, but have additional logic to deal with views, foreign
tables, etc.
While this change could have been done separately, nearly all the
call sites that needed to be adapted for the rest of this commit
would also have needed to be adapted for
table_slot_callbacks(), making separation not worthwhile.
b) tuple_satisfies_snapshot checks whether the tuple in a slot is
currently visible according to a snapshot. That's required as a few
places now don't have a buffer + HeapTuple around, but a
slot (which in heap's case internally has that information).
Additionally a few infrastructure changes were needed:
I) SysScanDesc, as used by systable_{beginscan, getnext} et al. now
internally uses a slot to keep track of tuples. While
systable_getnext() still returns HeapTuples, and will so for the
foreseeable future, the index API (see 1) above) now only deals with
slots.
The remainder, and largest part, of this commit is then adjusting all
scans in postgres to use the new APIs.
Author: Andres Freund, Haribabu Kommi, Alvaro Herrera
Discussion:
https://postgr.es/m/20180703070645.wchpu5muyto5n647@alap3.anarazel.de
https://postgr.es/m/20160812231527.GA690404@alvherre.pgsql
2019-03-11 20:46:41 +01:00
|
|
|
table_endscan(scan);
|
2019-01-21 19:32:19 +01:00
|
|
|
table_close(relationRelation, AccessShareLock);
|
2000-02-18 10:30:20 +01:00
|
|
|
|
Add support for partitioned tables and indexes in REINDEX
Until now, REINDEX was not able to work with partitioned tables and
indexes, forcing users to reindex partitions one by one. This extends
REINDEX INDEX and REINDEX TABLE so that they can accept a partitioned
index or table as input, respectively, reindexing all the partitions
assigned to them with physical storage (foreign tables, partitioned
tables and indexes are then discarded).
This shares some logic with schema and database REINDEX as each
partition gets processed in its own transaction after building a list of
relations to work on. This choice has the advantage of minimizing the
number of invalid indexes to one partition with REINDEX CONCURRENTLY in
the event of an in-flight cancellation or failure, as the only indexes
handled at once in a single REINDEX CONCURRENTLY loop are the ones from
the partition being worked on.
Isolation tests are added to emulate some cases I bumped into while
developing this feature, particularly the concurrent drop of a
leaf partition being reindexed. However, this is rather limited as LOCK would
cause REINDEX to block in the first transaction building the list of
partitions.
Per its multi-transaction nature, this new flavor cannot run in a
transaction block, similarly to REINDEX SCHEMA, SYSTEM and DATABASE.
Author: Justin Pryzby, Michael Paquier
Reviewed-by: Anastasia Lubennikova
Discussion: https://postgr.es/m/db12e897-73ff-467e-94cb-4af03705435f.adger.lj@alibaba-inc.com
2020-09-08 03:09:22 +02:00
|
|
|
/*
|
|
|
|
* Process each relation listed in a separate transaction. Note that this
|
|
|
|
* commits and then starts a new transaction immediately.
|
|
|
|
*/
|
2021-01-18 06:03:10 +01:00
|
|
|
ReindexMultipleInternal(relids, params);
|
2020-09-08 03:09:22 +02:00
|
|
|
|
|
|
|
MemoryContextDelete(private_context);
|
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Error callback specific to ReindexPartitions().
|
|
|
|
*/
|
|
|
|
static void
|
|
|
|
reindex_error_callback(void *arg)
|
|
|
|
{
|
|
|
|
ReindexErrorInfo *errinfo = (ReindexErrorInfo *) arg;
|
|
|
|
|
2021-12-03 13:38:26 +01:00
|
|
|
Assert(RELKIND_HAS_PARTITIONS(errinfo->relkind));
|
2020-09-08 03:09:22 +02:00
|
|
|
|
|
|
|
if (errinfo->relkind == RELKIND_PARTITIONED_TABLE)
|
|
|
|
errcontext("while reindexing partitioned table \"%s.%s\"",
|
|
|
|
errinfo->relnamespace, errinfo->relname);
|
|
|
|
else if (errinfo->relkind == RELKIND_PARTITIONED_INDEX)
|
|
|
|
errcontext("while reindexing partitioned index \"%s.%s\"",
|
|
|
|
errinfo->relnamespace, errinfo->relname);
|
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
|
|
|
* ReindexPartitions
|
|
|
|
*
|
|
|
|
* Reindex a set of partitions, per the partitioned index or table given
|
|
|
|
* by the caller.
|
|
|
|
*/
|
|
|
|
static void
|
2021-01-18 06:03:10 +01:00
|
|
|
ReindexPartitions(Oid relid, ReindexParams *params, bool isTopLevel)
|
2020-09-08 03:09:22 +02:00
|
|
|
{
|
|
|
|
List *partitions = NIL;
|
|
|
|
char relkind = get_rel_relkind(relid);
|
|
|
|
char *relname = get_rel_name(relid);
|
|
|
|
char *relnamespace = get_namespace_name(get_rel_namespace(relid));
|
|
|
|
MemoryContext reindex_context;
|
|
|
|
List *inhoids;
|
|
|
|
ListCell *lc;
|
|
|
|
ErrorContextCallback errcallback;
|
|
|
|
ReindexErrorInfo errinfo;
|
|
|
|
|
2021-12-03 13:38:26 +01:00
|
|
|
Assert(RELKIND_HAS_PARTITIONS(relkind));
|
2020-09-08 03:09:22 +02:00
|
|
|
|
|
|
|
/*
|
|
|
|
* Check if this runs in a transaction block, with an error callback to
|
|
|
|
* provide more context about where a problem happens.
|
|
|
|
*/
|
|
|
|
errinfo.relname = pstrdup(relname);
|
|
|
|
errinfo.relnamespace = pstrdup(relnamespace);
|
|
|
|
errinfo.relkind = relkind;
|
|
|
|
errcallback.callback = reindex_error_callback;
|
|
|
|
errcallback.arg = (void *) &errinfo;
|
|
|
|
errcallback.previous = error_context_stack;
|
|
|
|
error_context_stack = &errcallback;
|
|
|
|
|
|
|
|
PreventInTransactionBlock(isTopLevel,
|
|
|
|
relkind == RELKIND_PARTITIONED_TABLE ?
|
|
|
|
"REINDEX TABLE" : "REINDEX INDEX");
|
|
|
|
|
|
|
|
/* Pop the error context stack */
|
|
|
|
error_context_stack = errcallback.previous;
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Create special memory context for cross-transaction storage.
|
|
|
|
*
|
|
|
|
* Since it is a child of PortalContext, it will go away eventually even
|
|
|
|
* if we suffer an error so there is no need for special abort cleanup
|
|
|
|
* logic.
|
|
|
|
*/
|
|
|
|
reindex_context = AllocSetContextCreate(PortalContext, "Reindex",
|
|
|
|
ALLOCSET_DEFAULT_SIZES);
|
|
|
|
|
|
|
|
/* ShareLock is enough to prevent schema modifications */
|
|
|
|
inhoids = find_all_inheritors(relid, ShareLock, NULL);
|
|
|
|
|
|
|
|
/*
|
|
|
|
* The relations to reindex are the physical partitions of the
|
|
|
|
* tree, so discard any partitioned table or index.
|
|
|
|
*/
|
|
|
|
foreach(lc, inhoids)
|
|
|
|
{
|
|
|
|
Oid partoid = lfirst_oid(lc);
|
|
|
|
char partkind = get_rel_relkind(partoid);
|
|
|
|
MemoryContext old_context;
|
|
|
|
|
|
|
|
/*
|
|
|
|
* This discards partitioned tables, partitioned indexes and foreign
|
|
|
|
* tables.
|
|
|
|
*/
|
|
|
|
if (!RELKIND_HAS_STORAGE(partkind))
|
|
|
|
continue;
|
|
|
|
|
|
|
|
Assert(partkind == RELKIND_INDEX ||
|
|
|
|
partkind == RELKIND_RELATION);
|
|
|
|
|
|
|
|
/* Save partition OID */
|
|
|
|
old_context = MemoryContextSwitchTo(reindex_context);
|
|
|
|
partitions = lappend_oid(partitions, partoid);
|
|
|
|
MemoryContextSwitchTo(old_context);
|
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Process each partition listed in a separate transaction. Note that
|
|
|
|
* this commits and then starts a new transaction immediately.
|
|
|
|
*/
|
2021-01-18 06:03:10 +01:00
|
|
|
ReindexMultipleInternal(partitions, params);
|
2020-09-08 03:09:22 +02:00
|
|
|
|
|
|
|
/*
|
|
|
|
* Clean up working storage --- note we must do this after
|
|
|
|
* StartTransactionCommand, else we might be trying to delete the active
|
|
|
|
* context!
|
|
|
|
*/
|
|
|
|
MemoryContextDelete(reindex_context);
|
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
|
|
|
* ReindexMultipleInternal
|
|
|
|
*
|
|
|
|
* Reindex a list of relations, each one being processed in its own
|
|
|
|
* transaction. This commits the existing transaction immediately,
|
|
|
|
* and starts a new transaction when finished.
|
|
|
|
*/
|
|
|
|
static void
|
2021-01-18 06:03:10 +01:00
|
|
|
ReindexMultipleInternal(List *relids, ReindexParams *params)
|
2020-09-08 03:09:22 +02:00
|
|
|
{
|
|
|
|
ListCell *l;
|
|
|
|
|
2008-05-12 22:02:02 +02:00
|
|
|
PopActiveSnapshot();
|
2003-05-14 05:26:03 +02:00
|
|
|
CommitTransactionCommand();
|
2020-09-08 03:09:22 +02:00
|
|
|
|
2004-05-26 06:41:50 +02:00
|
|
|
foreach(l, relids)
|
2000-02-18 10:30:20 +01:00
|
|
|
{
|
2004-05-26 06:41:50 +02:00
|
|
|
Oid relid = lfirst_oid(l);
|
2020-09-08 03:09:22 +02:00
|
|
|
char relkind;
|
|
|
|
char relpersistence;
|
2003-09-24 20:54:02 +02:00
|
|
|
|
2003-05-14 05:26:03 +02:00
|
|
|
StartTransactionCommand();
|
2020-09-08 03:09:22 +02:00
|
|
|
|
2004-09-13 22:10:13 +02:00
|
|
|
/* functions in indexes may want a snapshot set */
|
2008-05-12 22:02:02 +02:00
|
|
|
PushActiveSnapshot(GetTransactionSnapshot());
|
2015-05-15 13:09:57 +02:00
|
|
|
|
2020-09-02 02:08:12 +02:00
|
|
|
/* check if the relation still exists */
|
|
|
|
if (!SearchSysCacheExists1(RELOID, ObjectIdGetDatum(relid)))
|
|
|
|
{
|
|
|
|
PopActiveSnapshot();
|
|
|
|
CommitTransactionCommand();
|
|
|
|
continue;
|
|
|
|
}
|
|
|
|
|
2021-02-04 06:34:20 +01:00
|
|
|
/*
|
|
|
|
* Check permissions except when moving to the database's default if a new
|
|
|
|
* tablespace is chosen. Note that this check also happens in
|
|
|
|
* ExecReindex(), but we do an extra check here as this runs across
|
|
|
|
* multiple transactions.
|
|
|
|
*/
|
|
|
|
if (OidIsValid(params->tablespaceOid) &&
|
|
|
|
params->tablespaceOid != MyDatabaseTableSpace)
|
|
|
|
{
|
|
|
|
AclResult aclresult;
|
|
|
|
|
|
|
|
aclresult = pg_tablespace_aclcheck(params->tablespaceOid,
|
|
|
|
GetUserId(), ACL_CREATE);
|
|
|
|
if (aclresult != ACLCHECK_OK)
|
|
|
|
aclcheck_error(aclresult, OBJECT_TABLESPACE,
|
|
|
|
get_tablespace_name(params->tablespaceOid));
|
|
|
|
}
|
|
|
|
|
2020-09-08 03:09:22 +02:00
|
|
|
relkind = get_rel_relkind(relid);
|
|
|
|
relpersistence = get_rel_persistence(relid);
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Partitioned tables and indexes can never be processed directly, and
|
|
|
|
* a list of their leaves should be built first.
|
|
|
|
*/
|
2021-12-03 13:38:26 +01:00
|
|
|
Assert(!RELKIND_HAS_PARTITIONS(relkind));
|
2020-09-08 03:09:22 +02:00
|
|
|
|
2021-01-18 06:03:10 +01:00
|
|
|
if ((params->options & REINDEXOPT_CONCURRENTLY) != 0 &&
|
Add support for partitioned tables and indexes in REINDEX
Until now, REINDEX was not able to work with partitioned tables and
indexes, forcing users to reindex partitions one by one. This extends
REINDEX INDEX and REINDEX TABLE so as they can accept a partitioned
index and table in input, respectively, to reindex all the partitions
assigned to them with physical storage (foreign tables, partitioned
tables and indexes are then discarded).
This shares some logic with schema and database REINDEX as each
partition gets processed in its own transaction after building a list of
relations to work on. This choice has the advantage to minimize the
number of invalid indexes to one partition with REINDEX CONCURRENTLY in
the event a cancellation or failure in-flight, as the only indexes
handled at once in a single REINDEX CONCURRENTLY loop are the ones from
the partition being working on.
Isolation tests are added to emulate some cases I bumped into while
developing this feature, particularly with the concurrent drop of a
leaf partition reindexed. However, this is rather limited as LOCK would
cause REINDEX to block in the first transaction building the list of
partitions.
Per its multi-transaction nature, this new flavor cannot run in a
transaction block, similarly to REINDEX SCHEMA, SYSTEM and DATABASE.
Author: Justin Pryzby, Michael Paquier
Reviewed-by: Anastasia Lubennikova
Discussion: https://postgr.es/m/db12e897-73ff-467e-94cb-4af03705435f.adger.lj@alibaba-inc.com
2020-09-08 03:09:22 +02:00
|
|
|
relpersistence != RELPERSISTENCE_TEMP)
|
2019-03-29 08:25:20 +01:00
|
|
|
{
|
2021-01-18 06:03:10 +01:00
|
|
|
ReindexParams newparams = *params;
|
|
|
|
|
|
|
|
newparams.options |= REINDEXOPT_MISSING_OK;
|
|
|
|
(void) ReindexRelationConcurrently(relid, &newparams);
|
2019-03-29 08:25:20 +01:00
|
|
|
/* ReindexRelationConcurrently() does the verbose output */
|
|
|
|
}
|
Add support for partitioned tables and indexes in REINDEX
Until now, REINDEX was not able to work with partitioned tables and
indexes, forcing users to reindex partitions one by one. This extends
REINDEX INDEX and REINDEX TABLE so as they can accept a partitioned
index and table in input, respectively, to reindex all the partitions
assigned to them with physical storage (foreign tables, partitioned
tables and indexes are then discarded).
This shares some logic with schema and database REINDEX as each
partition gets processed in its own transaction after building a list of
relations to work on. This choice has the advantage to minimize the
number of invalid indexes to one partition with REINDEX CONCURRENTLY in
the event a cancellation or failure in-flight, as the only indexes
handled at once in a single REINDEX CONCURRENTLY loop are the ones from
the partition being working on.
Isolation tests are added to emulate some cases I bumped into while
developing this feature, particularly with the concurrent drop of a
leaf partition reindexed. However, this is rather limited as LOCK would
cause REINDEX to block in the first transaction building the list of
partitions.
Per its multi-transaction nature, this new flavor cannot run in a
transaction block, similarly to REINDEX SCHEMA, SYSTEM and DATABASE.
Author: Justin Pryzby, Michael Paquier
Reviewed-by: Anastasia Lubennikova
Discussion: https://postgr.es/m/db12e897-73ff-467e-94cb-4af03705435f.adger.lj@alibaba-inc.com
2020-09-08 03:09:22 +02:00
|
|
|
else if (relkind == RELKIND_INDEX)
|
|
|
|
{
|
2021-01-18 06:03:10 +01:00
|
|
|
ReindexParams newparams = *params;
|
|
|
|
|
|
|
|
newparams.options |=
|
|
|
|
REINDEXOPT_REPORT_PROGRESS | REINDEXOPT_MISSING_OK;
|
|
|
|
reindex_index(relid, false, relpersistence, &newparams);
|
Add support for partitioned tables and indexes in REINDEX
Until now, REINDEX was not able to work with partitioned tables and
indexes, forcing users to reindex partitions one by one. This extends
REINDEX INDEX and REINDEX TABLE so as they can accept a partitioned
index and table in input, respectively, to reindex all the partitions
assigned to them with physical storage (foreign tables, partitioned
tables and indexes are then discarded).
This shares some logic with schema and database REINDEX as each
partition gets processed in its own transaction after building a list of
relations to work on. This choice has the advantage to minimize the
number of invalid indexes to one partition with REINDEX CONCURRENTLY in
the event a cancellation or failure in-flight, as the only indexes
handled at once in a single REINDEX CONCURRENTLY loop are the ones from
the partition being working on.
Isolation tests are added to emulate some cases I bumped into while
developing this feature, particularly with the concurrent drop of a
leaf partition reindexed. However, this is rather limited as LOCK would
cause REINDEX to block in the first transaction building the list of
partitions.
Per its multi-transaction nature, this new flavor cannot run in a
transaction block, similarly to REINDEX SCHEMA, SYSTEM and DATABASE.
Author: Justin Pryzby, Michael Paquier
Reviewed-by: Anastasia Lubennikova
Discussion: https://postgr.es/m/db12e897-73ff-467e-94cb-4af03705435f.adger.lj@alibaba-inc.com
2020-09-08 03:09:22 +02:00
|
|
|
PopActiveSnapshot();
|
|
|
|
/* reindex_index() does the verbose output */
|
|
|
|
}
|
2019-03-29 08:25:20 +01:00
|
|
|
else
|
|
|
|
{
|
2019-06-05 11:05:41 +02:00
|
|
|
bool result;
|
2021-01-18 06:03:10 +01:00
|
|
|
ReindexParams newparams = *params;
|
2019-06-05 11:05:41 +02:00
|
|
|
|
2021-01-18 06:03:10 +01:00
|
|
|
newparams.options |=
|
|
|
|
REINDEXOPT_REPORT_PROGRESS | REINDEXOPT_MISSING_OK;
|
2019-03-29 08:25:20 +01:00
|
|
|
result = reindex_relation(relid,
|
|
|
|
REINDEX_REL_PROCESS_TOAST |
|
|
|
|
REINDEX_REL_CHECK_CONSTRAINTS,
|
2021-01-18 06:03:10 +01:00
|
|
|
&newparams);
|
2019-03-29 08:25:20 +01:00
|
|
|
|
2021-01-18 06:03:10 +01:00
|
|
|
if (result && (params->options & REINDEXOPT_VERBOSE) != 0)
|
2015-05-15 13:09:57 +02:00
|
|
|
ereport(INFO,
|
|
|
|
(errmsg("table \"%s.%s\" was reindexed",
|
|
|
|
get_namespace_name(get_rel_namespace(relid)),
|
|
|
|
get_rel_name(relid))));
|
2019-03-29 08:25:20 +01:00
|
|
|
|
|
|
|
PopActiveSnapshot();
|
|
|
|
}
|
|
|
|
|
|
|
|
CommitTransactionCommand();
|
|
|
|
}
|
|
|
|
|
Add support for partitioned tables and indexes in REINDEX
Until now, REINDEX was not able to work with partitioned tables and
indexes, forcing users to reindex partitions one by one. This extends
REINDEX INDEX and REINDEX TABLE so as they can accept a partitioned
index and table in input, respectively, to reindex all the partitions
assigned to them with physical storage (foreign tables, partitioned
tables and indexes are then discarded).
This shares some logic with schema and database REINDEX as each
partition gets processed in its own transaction after building a list of
relations to work on. This choice has the advantage to minimize the
number of invalid indexes to one partition with REINDEX CONCURRENTLY in
the event a cancellation or failure in-flight, as the only indexes
handled at once in a single REINDEX CONCURRENTLY loop are the ones from
the partition being working on.
Isolation tests are added to emulate some cases I bumped into while
developing this feature, particularly with the concurrent drop of a
leaf partition reindexed. However, this is rather limited as LOCK would
cause REINDEX to block in the first transaction building the list of
partitions.
Per its multi-transaction nature, this new flavor cannot run in a
transaction block, similarly to REINDEX SCHEMA, SYSTEM and DATABASE.
Author: Justin Pryzby, Michael Paquier
Reviewed-by: Anastasia Lubennikova
Discussion: https://postgr.es/m/db12e897-73ff-467e-94cb-4af03705435f.adger.lj@alibaba-inc.com
2020-09-08 03:09:22 +02:00
|
|
|
StartTransactionCommand();
|
2019-03-29 08:25:20 +01:00
|
|
|
}
Fix concurrent indexing operations with temporary tables
Attempting to use CREATE INDEX, DROP INDEX or REINDEX with CONCURRENTLY
on a temporary relation with ON COMMIT actions triggered unexpected
errors because those operations use multiple transactions internally to
complete their work.  Here is for example one confusing error when using
ON COMMIT DELETE ROWS:
ERROR: index "foo" already contains data
Issues related to temporary relations and concurrent indexing are fixed
in this commit by enforcing the non-concurrent path to be taken for
temporary relations even if using CONCURRENTLY, transparently to the
user.  Using a non-concurrent path does not matter in practice as locks
cannot be taken on a temporary relation by a session different than the
one owning the relation, and the non-concurrent operation is more
effective.
The problem exists with REINDEX since v12 with the introduction of
CONCURRENTLY, and with CREATE/DROP INDEX since CONCURRENTLY exists for
those commands.  In all supported versions, this caused only confusing
error messages to be generated.  Note that with REINDEX, it was also
possible to issue a REINDEX CONCURRENTLY for a temporary relation owned
by a different session, leading to a server crash.
The idea to enforce transparently the non-concurrent code path for
temporary relations comes originally from Andres Freund.
Reported-by: Manuel Rigger
Author: Michael Paquier, Heikki Linnakangas
Reviewed-by: Andres Freund, Álvaro Herrera, Heikki Linnakangas
Discussion: https://postgr.es/m/CA+u7OA6gP7YAeCguyseusYcc=uR8+ypjCcgDDCTzjQ+k6S9ksQ@mail.gmail.com
Backpatch-through: 9.4
2020-01-22 01:49:18 +01:00

/*
 * ReindexRelationConcurrently - process REINDEX CONCURRENTLY for given
 * relation OID
 *
 * 'relationOid' can either belong to an index, a table or a materialized
 * view.  For tables and materialized views, all its indexes will be rebuilt,
 * excluding invalid indexes and any indexes used in exclusion constraints,
 * but including its associated toast table indexes.  For indexes, the index
 * itself will be rebuilt.
 *
 * The locks taken on parent tables and involved indexes are kept until the
 * transaction is committed, at which point a session lock is taken on each
 * relation.  Both of these protect against concurrent schema changes.
 *
 * Returns true if any indexes have been rebuilt (including toast table's
 * indexes, when relevant), otherwise returns false.
 *
 * NOTE: This cannot be used on temporary relations.  A concurrent build would
 * cause issues with ON COMMIT actions triggered by the transactions of the
 * concurrent build.  Temporary relations are not subject to concurrent
 * concerns, so there's no need for the more complicated concurrent build,
 * anyway, and a non-concurrent reindex is more efficient.
 */
static bool
ReindexRelationConcurrently(Oid relationOid, ReindexParams *params)
{
	typedef struct ReindexIndexInfo
	{
		Oid			indexId;
		Oid			tableId;
		Oid			amId;
		bool		safe;		/* for set_indexsafe_procflags */
	} ReindexIndexInfo;
	List	   *heapRelationIds = NIL;
	List	   *indexIds = NIL;
	List	   *newIndexIds = NIL;
	List	   *relationLocks = NIL;
	List	   *lockTags = NIL;
	ListCell   *lc,
			   *lc2;
	MemoryContext private_context;
	MemoryContext oldcontext;
	char		relkind;
	char	   *relationName = NULL;
	char	   *relationNamespace = NULL;
	PGRUsage	ru0;
	const int	progress_index[] = {
		PROGRESS_CREATEIDX_COMMAND,
		PROGRESS_CREATEIDX_PHASE,
		PROGRESS_CREATEIDX_INDEX_OID,
		PROGRESS_CREATEIDX_ACCESS_METHOD_OID
	};
	int64		progress_vals[4];

	/*
	 * Create a memory context that will survive forced transaction commits we
	 * do below.  Since it is a child of PortalContext, it will go away
	 * eventually even if we suffer an error; there's no need for special
	 * abort cleanup logic.
	 */
	private_context = AllocSetContextCreate(PortalContext,
											"ReindexConcurrent",
											ALLOCSET_SMALL_SIZES);

	if ((params->options & REINDEXOPT_VERBOSE) != 0)
	{
		/* Save data needed by REINDEX VERBOSE in private context */
		oldcontext = MemoryContextSwitchTo(private_context);

		relationName = get_rel_name(relationOid);
		relationNamespace = get_namespace_name(get_rel_namespace(relationOid));

		pg_rusage_init(&ru0);

		MemoryContextSwitchTo(oldcontext);
	}

	relkind = get_rel_relkind(relationOid);
	/*
	 * Extract the list of indexes that are going to be rebuilt based on the
	 * relation Oid given by caller.
	 */
	switch (relkind)
	{
		case RELKIND_RELATION:
		case RELKIND_MATVIEW:
		case RELKIND_TOASTVALUE:
			{
				/*
				 * In the case of a relation, find all its indexes including
				 * toast indexes.
				 */
				Relation	heapRelation;

				/* Save the list of relation OIDs in private context */
				oldcontext = MemoryContextSwitchTo(private_context);

				/* Track this relation for session locks */
				heapRelationIds = lappend_oid(heapRelationIds, relationOid);

				MemoryContextSwitchTo(oldcontext);

				if (IsCatalogRelationOid(relationOid))
					ereport(ERROR,
							(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
							 errmsg("cannot reindex system catalogs concurrently")));

				/* Open relation to get its indexes */
				if ((params->options & REINDEXOPT_MISSING_OK) != 0)
				{
					heapRelation = try_table_open(relationOid,
												  ShareUpdateExclusiveLock);
					/* leave if relation does not exist */
					if (!heapRelation)
						break;
				}
				else
					heapRelation = table_open(relationOid,
											  ShareUpdateExclusiveLock);

				if (OidIsValid(params->tablespaceOid) &&
					IsSystemRelation(heapRelation))
					ereport(ERROR,
							(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
							 errmsg("cannot move system relation \"%s\"",
									RelationGetRelationName(heapRelation))));

				/* Add all the valid indexes of relation to list */
				foreach(lc, RelationGetIndexList(heapRelation))
				{
					Oid			cellOid = lfirst_oid(lc);
					Relation	indexRelation = index_open(cellOid,
														   ShareUpdateExclusiveLock);

					if (!indexRelation->rd_index->indisvalid)
						ereport(WARNING,
								(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
								 errmsg("cannot reindex invalid index \"%s.%s\" concurrently, skipping",
										get_namespace_name(get_rel_namespace(cellOid)),
										get_rel_name(cellOid))));
					else if (indexRelation->rd_index->indisexclusion)
						ereport(WARNING,
								(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
								 errmsg("cannot reindex exclusion constraint index \"%s.%s\" concurrently, skipping",
										get_namespace_name(get_rel_namespace(cellOid)),
										get_rel_name(cellOid))));
					else
					{
						ReindexIndexInfo *idx;

						/* Save the list of relation OIDs in private context */
						oldcontext = MemoryContextSwitchTo(private_context);

						idx = palloc(sizeof(ReindexIndexInfo));
						idx->indexId = cellOid;
						/* other fields set later */

						indexIds = lappend(indexIds, idx);

						MemoryContextSwitchTo(oldcontext);
					}

					index_close(indexRelation, NoLock);
				}

				/* Also add the toast indexes */
				if (OidIsValid(heapRelation->rd_rel->reltoastrelid))
				{
					Oid			toastOid = heapRelation->rd_rel->reltoastrelid;
					Relation	toastRelation = table_open(toastOid,
														   ShareUpdateExclusiveLock);

					/* Save the list of relation OIDs in private context */
					oldcontext = MemoryContextSwitchTo(private_context);

					/* Track this relation for session locks */
					heapRelationIds = lappend_oid(heapRelationIds, toastOid);

					MemoryContextSwitchTo(oldcontext);

					foreach(lc2, RelationGetIndexList(toastRelation))
					{
						Oid			cellOid = lfirst_oid(lc2);
						Relation	indexRelation = index_open(cellOid,
															   ShareUpdateExclusiveLock);

						if (!indexRelation->rd_index->indisvalid)
							ereport(WARNING,
									(errcode(ERRCODE_INDEX_CORRUPTED),
									 errmsg("cannot reindex invalid index \"%s.%s\" concurrently, skipping",
											get_namespace_name(get_rel_namespace(cellOid)),
											get_rel_name(cellOid))));
						else
						{
							ReindexIndexInfo *idx;

							/*
							 * Save the list of relation OIDs in private
							 * context
							 */
							oldcontext = MemoryContextSwitchTo(private_context);

							idx = palloc(sizeof(ReindexIndexInfo));
							idx->indexId = cellOid;
							indexIds = lappend(indexIds, idx);
							/* other fields set later */

							MemoryContextSwitchTo(oldcontext);
						}

						index_close(indexRelation, NoLock);
					}

					table_close(toastRelation, NoLock);
				}

				table_close(heapRelation, NoLock);
				break;
			}
Clean up the behavior and API of catalog.c's is-catalog-relation tests.
The right way for IsCatalogRelation/Class to behave is to return true
for OIDs less than FirstBootstrapObjectId (not FirstNormalObjectId),
without any of the ad-hoc fooling around with schema membership.
The previous code was wrong because (1) it claimed that
information_schema tables were not catalog relations but their toast
tables were, which is silly; and (2) if you dropped and recreated
information_schema, which is a supported operation, the behavior
changed.  That's even sillier.  With this definition, "catalog
relations" are exactly the ones traceable to the postgres.bki data,
which seems like what we want.
With this simplification, we don't actually need access to the pg_class
tuple to identify a catalog relation; we only need its OID.  Hence,
replace IsCatalogClass with "IsCatalogRelationOid(oid)".  But keep
IsCatalogRelation as a convenience function.
This allows fixing some arguably-wrong semantics in contrib/sepgsql and
ReindexRelationConcurrently, which were using an IsSystemNamespace test
where what they really should be using is IsCatalogRelationOid.  The
previous coding failed to protect toast tables of system catalogs, and
also was not on board with the general principle that user-created tables
do not become catalogs just by virtue of being renamed into pg_catalog.
We can also get rid of a messy hack in ReindexMultipleTables.
While we're at it, also rename IsSystemNamespace to IsCatalogNamespace,
because the previous name invited confusion with the more expansive
semantics used by IsSystemRelation/Class.
Also improve the comments in catalog.c.
There are a few remaining places in replication-related code that are
special-casing OIDs below FirstNormalObjectId.  I'm inclined to think
those are wrong too, and if there should be any special case it should
just extend to FirstBootstrapObjectId.  But first we need to debate
whether a FOR ALL TABLES publication should include information_schema.
Discussion: https://postgr.es/m/21697.1557092753@sss.pgh.pa.us
Discussion: https://postgr.es/m/15150.1557257111@sss.pgh.pa.us
2019-05-09 05:27:29 +02:00

		case RELKIND_INDEX:
			{
				Oid			heapId = IndexGetRelation(relationOid,
													  (params->options & REINDEXOPT_MISSING_OK) != 0);
				Relation	heapRelation;
				ReindexIndexInfo *idx;

				/* if relation is missing, leave */
				if (!OidIsValid(heapId))
					break;

				if (IsCatalogRelationOid(heapId))
					ereport(ERROR,
							(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
							 errmsg("cannot reindex system catalogs concurrently")));

				/*
				 * Don't allow reindex for an invalid index on TOAST table, as
				 * if rebuilt it would not be possible to drop it.  Match
				 * error message in reindex_index().
				 */
				if (IsToastNamespace(get_rel_namespace(relationOid)) &&
					!get_index_isvalid(relationOid))
					ereport(ERROR,
							(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
							 errmsg("cannot reindex invalid index on TOAST table")));

				/*
				 * Check if parent relation can be locked and if it exists,
				 * this needs to be done at this stage as the list of indexes
				 * to rebuild is not complete yet, and REINDEXOPT_MISSING_OK
				 * should not be used once all the session locks are taken.
				 */
				if ((params->options & REINDEXOPT_MISSING_OK) != 0)
				{
					heapRelation = try_table_open(heapId,
												  ShareUpdateExclusiveLock);
					/* leave if relation does not exist */
					if (!heapRelation)
						break;
				}
				else
					heapRelation = table_open(heapId,
											  ShareUpdateExclusiveLock);

				if (OidIsValid(params->tablespaceOid) &&
					IsSystemRelation(heapRelation))
					ereport(ERROR,
							(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
							 errmsg("cannot move system relation \"%s\"",
									get_rel_name(relationOid))));

				table_close(heapRelation, NoLock);

				/* Save the list of relation OIDs in private context */
				oldcontext = MemoryContextSwitchTo(private_context);

				/* Track the heap relation of this index for session locks */
				heapRelationIds = list_make1_oid(heapId);

				/*
				 * Save the list of relation OIDs in private context.  Note
				 * that invalid indexes are allowed here.
				 */
				idx = palloc(sizeof(ReindexIndexInfo));
				idx->indexId = relationOid;
				indexIds = lappend(indexIds, idx);
				/* other fields set later */

				MemoryContextSwitchTo(oldcontext);
				break;
			}
		case RELKIND_PARTITIONED_TABLE:
		case RELKIND_PARTITIONED_INDEX:
		default:
			/* Return error if type of relation is not supported */
			ereport(ERROR,
					(errcode(ERRCODE_WRONG_OBJECT_TYPE),
					 errmsg("cannot reindex this type of relation concurrently")));
			break;
	}

	/*
	 * Definitely no indexes, so leave.  Any checks based on
	 * REINDEXOPT_MISSING_OK should be done only while the list of indexes to
	 * work on is built as the session locks taken before this transaction
	 * commits will make sure that they cannot be dropped by a concurrent
	 * session until this operation completes.
	 */
	if (indexIds == NIL)
	{
		PopActiveSnapshot();
		return false;
	}
2021-02-04 06:34:20 +01:00
|
|
|
/* It's not a shared catalog, so refuse to move it to shared tablespace */
|
|
|
|
if (params->tablespaceOid == GLOBALTABLESPACE_OID)
|
|
|
|
ereport(ERROR,
|
|
|
|
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
|
|
|
|
errmsg("cannot move non-shared relation to tablespace \"%s\"",
|
|
|
|
get_tablespace_name(params->tablespaceOid))));
|
|
|
|
|
2019-03-29 08:25:20 +01:00
|
|
|
Assert(heapRelationIds != NIL);
|
|
|
|
|
|
|
|
/*-----
|
|
|
|
* Now we have all the indexes we want to process in indexIds.
|
|
|
|
*
|
|
|
|
* The phases now are:
|
|
|
|
*
|
|
|
|
* 1. create new indexes in the catalog
|
|
|
|
* 2. build new indexes
|
|
|
|
* 3. let new indexes catch up with tuples inserted in the meantime
|
|
|
|
* 4. swap index names
|
|
|
|
* 5. mark old indexes as dead
|
|
|
|
* 6. drop old indexes
|
|
|
|
*
|
|
|
|
* We process each phase for all indexes before moving to the next phase,
|
|
|
|
* for efficiency.
|
|
|
|
*/
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Phase 1 of REINDEX CONCURRENTLY
|
|
|
|
*
|
|
|
|
* Create a new index with the same properties as the old one, but it is
|
|
|
|
* only registered in catalogs and will be built later. Then get session
|
|
|
|
* locks on all involved tables. See analogous code in DefineIndex() for
|
|
|
|
* more detailed comments.
|
|
|
|
*/
|
|
|
|
|
|
|
|
	foreach(lc, indexIds)
	{
		char	   *concurrentName;
		ReindexIndexInfo *idx = lfirst(lc);
		ReindexIndexInfo *newidx;
		Oid			newIndexId;
		Relation	indexRel;
		Relation	heapRel;
		Oid			save_userid;
		int			save_sec_context;
		int			save_nestlevel;
		Relation	newIndexRel;
		LockRelId  *lockrelid;
		Oid			tablespaceid;

		indexRel = index_open(idx->indexId, ShareUpdateExclusiveLock);
		heapRel = table_open(indexRel->rd_index->indrelid,
							 ShareUpdateExclusiveLock);

		/*
		 * Switch to the table owner's userid, so that any index functions are
		 * run as that user.  Also lock down security-restricted operations
		 * and arrange to make GUC variable changes local to this command.
		 */
		GetUserIdAndSecContext(&save_userid, &save_sec_context);
		SetUserIdAndSecContext(heapRel->rd_rel->relowner,
							   save_sec_context | SECURITY_RESTRICTED_OPERATION);
		save_nestlevel = NewGUCNestLevel();

		/* determine safety of this index for set_indexsafe_procflags */
		idx->safe = (indexRel->rd_indexprs == NIL &&
					 indexRel->rd_indpred == NIL);
		idx->tableId = RelationGetRelid(heapRel);
		idx->amId = indexRel->rd_rel->relam;

		/* This function shouldn't be called for temporary relations. */
		if (indexRel->rd_rel->relpersistence == RELPERSISTENCE_TEMP)
			elog(ERROR, "cannot reindex a temporary table concurrently");

		pgstat_progress_start_command(PROGRESS_COMMAND_CREATE_INDEX,
									  idx->tableId);

		progress_vals[0] = PROGRESS_CREATEIDX_COMMAND_REINDEX_CONCURRENTLY;
		progress_vals[1] = 0;	/* initializing */
		progress_vals[2] = idx->indexId;
		progress_vals[3] = idx->amId;
		pgstat_progress_update_multi_param(4, progress_index, progress_vals);

		/* Choose a temporary relation name for the new index */
		concurrentName = ChooseRelationName(get_rel_name(idx->indexId),
											NULL,
											"ccnew",
											get_rel_namespace(indexRel->rd_index->indrelid),
											false);

		/* Choose the new tablespace, indexes of toast tables are not moved */
		if (OidIsValid(params->tablespaceOid) &&
			heapRel->rd_rel->relkind != RELKIND_TOASTVALUE)
			tablespaceid = params->tablespaceOid;
		else
			tablespaceid = indexRel->rd_rel->reltablespace;

		/* Create new index definition based on given index */
		newIndexId = index_concurrently_create_copy(heapRel,
													idx->indexId,
													tablespaceid,
													concurrentName);

		/*
		 * Now open the relation of the new index, a session-level lock is
		 * also needed on it.
		 */
		newIndexRel = index_open(newIndexId, ShareUpdateExclusiveLock);

		/*
		 * Save the list of OIDs and locks in private context
		 */
		oldcontext = MemoryContextSwitchTo(private_context);

		newidx = palloc(sizeof(ReindexIndexInfo));
		newidx->indexId = newIndexId;
		newidx->safe = idx->safe;
		newidx->tableId = idx->tableId;
		newidx->amId = idx->amId;

		newIndexIds = lappend(newIndexIds, newidx);

		/*
		 * Save lockrelid to protect each relation from drop then close
		 * relations. The lockrelid on parent relation is not taken here to
		 * avoid multiple locks taken on the same relation, instead we rely on
		 * parentRelationIds built earlier.
		 */
		lockrelid = palloc(sizeof(*lockrelid));
		*lockrelid = indexRel->rd_lockInfo.lockRelId;
		relationLocks = lappend(relationLocks, lockrelid);
		lockrelid = palloc(sizeof(*lockrelid));
		*lockrelid = newIndexRel->rd_lockInfo.lockRelId;
		relationLocks = lappend(relationLocks, lockrelid);

		MemoryContextSwitchTo(oldcontext);

		index_close(indexRel, NoLock);
		index_close(newIndexRel, NoLock);

		/* Roll back any GUC changes executed by index functions */
		AtEOXact_GUC(false, save_nestlevel);

		/* Restore userid and security context */
		SetUserIdAndSecContext(save_userid, save_sec_context);

		table_close(heapRel, NoLock);
	}

	/*
	 * Save the heap lock for following visibility checks with other backends
	 * that might conflict with this session.
	 */
	foreach(lc, heapRelationIds)
	{
		Relation	heapRelation = table_open(lfirst_oid(lc), ShareUpdateExclusiveLock);
		LockRelId  *lockrelid;
		LOCKTAG    *heaplocktag;

		/* Save the list of locks in private context */
		oldcontext = MemoryContextSwitchTo(private_context);

		/* Add lockrelid of heap relation to the list of locked relations */
		lockrelid = palloc(sizeof(*lockrelid));
		*lockrelid = heapRelation->rd_lockInfo.lockRelId;
		relationLocks = lappend(relationLocks, lockrelid);

		heaplocktag = (LOCKTAG *) palloc(sizeof(LOCKTAG));

		/* Save the LOCKTAG for this parent relation for the wait phase */
		SET_LOCKTAG_RELATION(*heaplocktag, lockrelid->dbId, lockrelid->relId);
		lockTags = lappend(lockTags, heaplocktag);

		MemoryContextSwitchTo(oldcontext);

		/* Close heap relation */
		table_close(heapRelation, NoLock);
	}

	/* Get a session-level lock on each table. */
	foreach(lc, relationLocks)
	{
		LockRelId  *lockrelid = (LockRelId *) lfirst(lc);

		LockRelationIdForSession(lockrelid, ShareUpdateExclusiveLock);
	}

	PopActiveSnapshot();
	CommitTransactionCommand();
	StartTransactionCommand();

	/*
	 * Because we don't take a snapshot in this transaction, there's no need
	 * to set the PROC_IN_SAFE_IC flag here.
	 */

	/*
	 * Phase 2 of REINDEX CONCURRENTLY
	 *
	 * Build the new indexes in a separate transaction for each index to avoid
	 * having open transactions for an unnecessary long time.  But before
	 * doing that, wait until no running transactions could have the table of
	 * the index open with the old list of indexes.  See "phase 2" in
	 * DefineIndex() for more details.
	 */

	pgstat_progress_update_param(PROGRESS_CREATEIDX_PHASE,
								 PROGRESS_CREATEIDX_PHASE_WAIT_1);
	WaitForLockersMultiple(lockTags, ShareLock, true);
	CommitTransactionCommand();

	foreach(lc, newIndexIds)
	{
		ReindexIndexInfo *newidx = lfirst(lc);

		/* Start new transaction for this index's concurrent build */
		StartTransactionCommand();

		/*
		 * Check for user-requested abort.  This is inside a transaction so as
		 * xact.c does not issue a useless WARNING, and ensures that
		 * session-level locks are cleaned up on abort.
		 */
		CHECK_FOR_INTERRUPTS();

		/* Tell concurrent indexing to ignore us, if index qualifies */
		if (newidx->safe)
			set_indexsafe_procflags();

		/* Set ActiveSnapshot since functions in the indexes may need it */
		PushActiveSnapshot(GetTransactionSnapshot());

		/*
		 * Update progress for the index to build, with the correct parent
		 * table involved.
		 */
		pgstat_progress_start_command(PROGRESS_COMMAND_CREATE_INDEX, newidx->tableId);
		progress_vals[0] = PROGRESS_CREATEIDX_COMMAND_REINDEX_CONCURRENTLY;
		progress_vals[1] = PROGRESS_CREATEIDX_PHASE_BUILD;
		progress_vals[2] = newidx->indexId;
		progress_vals[3] = newidx->amId;
		pgstat_progress_update_multi_param(4, progress_index, progress_vals);

		/* Perform concurrent build of new index */
		index_concurrently_build(newidx->tableId, newidx->indexId);

		PopActiveSnapshot();
		CommitTransactionCommand();
	}

	StartTransactionCommand();

	/*
	 * Because we don't take a snapshot or Xid in this transaction, there's no
	 * need to set the PROC_IN_SAFE_IC flag here.
	 */

	/*
	 * Phase 3 of REINDEX CONCURRENTLY
	 *
	 * During this phase the old indexes catch up with any new tuples that
	 * were created during the previous phase.  See "phase 3" in DefineIndex()
	 * for more details.
	 */

	pgstat_progress_update_param(PROGRESS_CREATEIDX_PHASE,
								 PROGRESS_CREATEIDX_PHASE_WAIT_2);
	WaitForLockersMultiple(lockTags, ShareLock, true);
	CommitTransactionCommand();

	foreach(lc, newIndexIds)
	{
		ReindexIndexInfo *newidx = lfirst(lc);
		TransactionId limitXmin;
		Snapshot	snapshot;

		StartTransactionCommand();

		/*
		 * Check for user-requested abort.  This is inside a transaction so as
		 * xact.c does not issue a useless WARNING, and ensures that
		 * session-level locks are cleaned up on abort.
		 */
		CHECK_FOR_INTERRUPTS();

		/* Tell concurrent indexing to ignore us, if index qualifies */
		if (newidx->safe)
			set_indexsafe_procflags();

		/*
		 * Take the "reference snapshot" that will be used by validate_index()
		 * to filter candidate tuples.
		 */
		snapshot = RegisterSnapshot(GetTransactionSnapshot());
		PushActiveSnapshot(snapshot);

		/*
		 * Update progress for the index to build, with the correct parent
		 * table involved.
		 */
		pgstat_progress_start_command(PROGRESS_COMMAND_CREATE_INDEX,
									  newidx->tableId);
		progress_vals[0] = PROGRESS_CREATEIDX_COMMAND_REINDEX_CONCURRENTLY;
		progress_vals[1] = PROGRESS_CREATEIDX_PHASE_VALIDATE_IDXSCAN;
		progress_vals[2] = newidx->indexId;
		progress_vals[3] = newidx->amId;
		pgstat_progress_update_multi_param(4, progress_index, progress_vals);

		validate_index(newidx->tableId, newidx->indexId, snapshot);

		/*
		 * We can now do away with our active snapshot, we still need to save
		 * the xmin limit to wait for older snapshots.
		 */
		limitXmin = snapshot->xmin;

		PopActiveSnapshot();
		UnregisterSnapshot(snapshot);

		/*
		 * To ensure no deadlocks, we must commit and start yet another
		 * transaction, and do our wait before any snapshot has been taken in
		 * it.
		 */
		CommitTransactionCommand();
		StartTransactionCommand();

		/*
		 * The index is now valid in the sense that it contains all currently
		 * interesting tuples.  But since it might not contain tuples deleted
		 * just before the reference snap was taken, we have to wait out any
		 * transactions that might have older snapshots.
		 *
		 * Because we don't take a snapshot or Xid in this transaction,
		 * there's no need to set the PROC_IN_SAFE_IC flag here.
		 */
		pgstat_progress_update_param(PROGRESS_CREATEIDX_PHASE,
									 PROGRESS_CREATEIDX_PHASE_WAIT_3);
		WaitForOlderSnapshots(limitXmin, true);

		CommitTransactionCommand();
	}

	/*
	 * Phase 4 of REINDEX CONCURRENTLY
	 *
	 * Now that the new indexes have been validated, swap each new index with
	 * its corresponding old index.
	 *
	 * We mark the new indexes as valid and the old indexes as not valid at
	 * the same time to make sure we only get constraint violations from the
	 * indexes with the correct names.
	 */

	StartTransactionCommand();

	/*
	 * Because this transaction only does catalog manipulations and doesn't do
	 * any index operations, we can set the PROC_IN_SAFE_IC flag here
	 * unconditionally.
	 */
	set_indexsafe_procflags();

	forboth(lc, indexIds, lc2, newIndexIds)
	{
		ReindexIndexInfo *oldidx = lfirst(lc);
		ReindexIndexInfo *newidx = lfirst(lc2);
		char	   *oldName;

		/*
		 * Check for user-requested abort.  This is inside a transaction so as
		 * xact.c does not issue a useless WARNING, and ensures that
		 * session-level locks are cleaned up on abort.
		 */
		CHECK_FOR_INTERRUPTS();

		/* Choose a relation name for old index */
		oldName = ChooseRelationName(get_rel_name(oldidx->indexId),
									 NULL,
									 "ccold",
									 get_rel_namespace(oldidx->tableId),
									 false);

		/*
		 * Swap old index with the new one.  This also marks the new one as
		 * valid and the old one as not valid.
		 */
		index_concurrently_swap(newidx->indexId, oldidx->indexId, oldName);

		/*
		 * Invalidate the relcache for the table, so that after this commit
		 * all sessions will refresh any cached plans that might reference the
		 * index.
		 */
		CacheInvalidateRelcacheByRelid(oldidx->tableId);

		/*
		 * CCI here so that subsequent iterations see the oldName in the
		 * catalog and can choose a nonconflicting name for their oldName.
		 * Otherwise, this could lead to conflicts if a table has two indexes
		 * whose names are equal for the first NAMEDATALEN-minus-a-few
		 * characters.
		 */
		CommandCounterIncrement();
	}

	/* Commit this transaction and make index swaps visible */
	CommitTransactionCommand();
	StartTransactionCommand();

	/*
	 * While we could set PROC_IN_SAFE_IC if all indexes qualified, there's no
	 * real need for that, because we only acquire an Xid after the wait is
	 * done, and that lasts for a very short period.
	 */

	/*
	 * Phase 5 of REINDEX CONCURRENTLY
	 *
	 * Mark the old indexes as dead.  First we must wait until no running
	 * transaction could be using the index for a query.  See also
	 * index_drop() for more details.
	 */

	pgstat_progress_update_param(PROGRESS_CREATEIDX_PHASE,
								 PROGRESS_CREATEIDX_PHASE_WAIT_4);
	WaitForLockersMultiple(lockTags, AccessExclusiveLock, true);

	foreach(lc, indexIds)
	{
		ReindexIndexInfo *oldidx = lfirst(lc);

		/*
		 * Check for user-requested abort.  This is inside a transaction so as
		 * xact.c does not issue a useless WARNING, and ensures that
		 * session-level locks are cleaned up on abort.
		 */
		CHECK_FOR_INTERRUPTS();

		index_concurrently_set_dead(oldidx->tableId, oldidx->indexId);
	}

	/* Commit this transaction to make the updates visible. */
	CommitTransactionCommand();
	StartTransactionCommand();

	/*
	 * While we could set PROC_IN_SAFE_IC if all indexes qualified, there's no
	 * real need for that, because we only acquire an Xid after the wait is
	 * done, and that lasts for a very short period.
	 */

	/*
	 * Phase 6 of REINDEX CONCURRENTLY
	 *
	 * Drop the old indexes.
	 */

	pgstat_progress_update_param(PROGRESS_CREATEIDX_PHASE,
								 PROGRESS_CREATEIDX_PHASE_WAIT_5);
	WaitForLockersMultiple(lockTags, AccessExclusiveLock, true);

	PushActiveSnapshot(GetTransactionSnapshot());

	{
		ObjectAddresses *objects = new_object_addresses();

		foreach(lc, indexIds)
		{
			ReindexIndexInfo *idx = lfirst(lc);
			ObjectAddress object;

			object.classId = RelationRelationId;
			object.objectId = idx->indexId;
			object.objectSubId = 0;

			add_exact_object_address(&object, objects);
		}

		/*
		 * Use PERFORM_DELETION_CONCURRENT_LOCK so that index_drop() uses the
		 * right lock level.
		 */
		performMultipleDeletions(objects, DROP_RESTRICT,
								 PERFORM_DELETION_CONCURRENT_LOCK | PERFORM_DELETION_INTERNAL);
	}

	PopActiveSnapshot();
	CommitTransactionCommand();

	/*
	 * Finally, release the session-level lock on the table.
	 */
	foreach(lc, relationLocks)
	{
		LockRelId  *lockrelid = (LockRelId *) lfirst(lc);

		UnlockRelationIdForSession(lockrelid, ShareUpdateExclusiveLock);
	}

	/* Start a new transaction to finish process properly */
	StartTransactionCommand();

	/* Log what we did */
	if ((params->options & REINDEXOPT_VERBOSE) != 0)
	{
		if (relkind == RELKIND_INDEX)
			ereport(INFO,
					(errmsg("index \"%s.%s\" was reindexed",
							relationNamespace, relationName),
					 errdetail("%s.",
							   pg_rusage_show(&ru0))));
		else
		{
			foreach(lc, newIndexIds)
			{
				ReindexIndexInfo *idx = lfirst(lc);
				Oid			indOid = idx->indexId;

				ereport(INFO,
						(errmsg("index \"%s.%s\" was reindexed",
								get_namespace_name(get_rel_namespace(indOid)),
								get_rel_name(indOid))));
				/* Don't show rusage here, since it's not per index. */
			}

			ereport(INFO,
					(errmsg("table \"%s.%s\" was reindexed",
							relationNamespace, relationName),
					 errdetail("%s.",
							   pg_rusage_show(&ru0))));
		}
	}

	MemoryContextDelete(private_context);

	pgstat_progress_end_command();

	return true;
}
/*
 * Insert or delete an appropriate pg_inherits tuple to make the given index
 * be a partition of the indicated parent index.
 *
 * This also corrects the pg_depend information for the affected index.
 */
void
IndexSetParentIndex(Relation partitionIdx, Oid parentOid)
{
	Relation	pg_inherits;
	ScanKeyData key[2];
	SysScanDesc scan;
	Oid			partRelid = RelationGetRelid(partitionIdx);
	HeapTuple	tuple;
	bool		fix_dependencies;

	/* Make sure this is an index */
	Assert(partitionIdx->rd_rel->relkind == RELKIND_INDEX ||
		   partitionIdx->rd_rel->relkind == RELKIND_PARTITIONED_INDEX);

	/*
	 * Scan pg_inherits for rows linking our index to some parent.
	 */
	pg_inherits = relation_open(InheritsRelationId, RowExclusiveLock);
	ScanKeyInit(&key[0],
				Anum_pg_inherits_inhrelid,
				BTEqualStrategyNumber, F_OIDEQ,
				ObjectIdGetDatum(partRelid));
	ScanKeyInit(&key[1],
				Anum_pg_inherits_inhseqno,
				BTEqualStrategyNumber, F_INT4EQ,
				Int32GetDatum(1));
	scan = systable_beginscan(pg_inherits, InheritsRelidSeqnoIndexId, true,
							  NULL, 2, key);
	tuple = systable_getnext(scan);

	if (!HeapTupleIsValid(tuple))
	{
		if (parentOid == InvalidOid)
		{
			/*
			 * No pg_inherits row, and no parent wanted: nothing to do in this
			 * case.
			 */
			fix_dependencies = false;
		}
		else
		{
			StoreSingleInheritance(partRelid, parentOid, 1);
			fix_dependencies = true;
		}
	}
	else
	{
		Form_pg_inherits inhForm = (Form_pg_inherits) GETSTRUCT(tuple);

		if (parentOid == InvalidOid)
		{
			/*
			 * There exists a pg_inherits row, which we want to clear; do so.
			 */
			CatalogTupleDelete(pg_inherits, &tuple->t_self);
			fix_dependencies = true;
		}
		else
		{
			/*
			 * A pg_inherits row exists.  If it's the same we want, then we're
			 * good; if it differs, that amounts to a corrupt catalog and
			 * should not happen.
			 */
			if (inhForm->inhparent != parentOid)
			{
				/* unexpected: we should not get called in this case */
				elog(ERROR, "bogus pg_inherit row: inhrelid %u inhparent %u",
					 inhForm->inhrelid, inhForm->inhparent);
			}

			/* already in the right state */
			fix_dependencies = false;
		}
	}

	/* done with pg_inherits */
	systable_endscan(scan);
	relation_close(pg_inherits, RowExclusiveLock);
2018-10-22 04:04:48 +02:00
|
|
|
/* set relhassubclass if an index partition has been added to the parent */
|
|
|
|
if (OidIsValid(parentOid))
|
|
|
|
SetRelationHasSubclass(parentOid, true);
|
|
|
|
|
2019-04-25 16:50:14 +02:00
|
|
|
/* set relispartition correctly on the partition */
|
|
|
|
update_relispartition(partRelid, OidIsValid(parentOid));
|
|
|
|
|
Local partitioned indexes
When CREATE INDEX is run on a partitioned table, create catalog entries
for an index on the partitioned table (which is just a placeholder since
the table proper has no data of its own), and recurse to create actual
indexes on the existing partitions; create them in future partitions
also.
As a convenience gadget, if the new index definition matches some
existing index in partitions, these are picked up and used instead of
creating new ones. Whichever way these indexes come about, they become
attached to the index on the parent table and are dropped alongside it,
and cannot be dropped on isolation unless they are detached first.
To support pg_dump'ing these indexes, add commands
CREATE INDEX ON ONLY <table>
(which creates the index on the parent partitioned table, without
recursing) and
ALTER INDEX ATTACH PARTITION
(which is used after the indexes have been created individually on each
partition, to attach them to the parent index). These reconstruct prior
database state exactly.
Reviewed-by: (in alphabetical order) Peter Eisentraut, Robert Haas, Amit
Langote, Jesper Pedersen, Simon Riggs, David Rowley
Discussion: https://postgr.es/m/20171113170646.gzweigyrgg6pwsg4@alvherre.pgsql
2018-01-19 15:49:22 +01:00
|
|
|
if (fix_dependencies)
|
|
|
|
{
|
|
|
|
/*
|
Redesign the partition dependency mechanism.
The original setup for dependencies of partitioned objects had
serious problems:
1. It did not verify that a drop cascading to a partition-child object
also cascaded to at least one of the object's partition parents. Now,
normally a child object would share all its dependencies with one or
another parent (e.g. a child index's opclass dependencies would be shared
with the parent index), so that this oversight is usually harmless.
But if some dependency failed to fit this pattern, the child could be
dropped while all its parents remain, creating a logically broken
situation. (It's easy to construct artificial cases that break it,
such as attaching an unrelated extension dependency to the child object
and then dropping the extension. I'm not sure if any less-artificial
cases exist.)
2. Management of partition dependencies during ATTACH/DETACH PARTITION
was complicated and buggy; for example, after detaching a partition
table it was possible to create cases where a formerly-child index
should be dropped and was not, because the correct set of dependencies
had not been reconstructed.
Less seriously, because multiple partition relationships were
represented identically in pg_depend, there was an order-of-traversal
dependency on which partition parent was cited in error messages.
We also had some pre-existing order-of-traversal hazards for error
messages related to internal and extension dependencies. This is
cosmetic to users but causes testing problems.
To fix #1, add a check at the end of the partition tree traversal
to ensure that at least one partition parent got deleted. To fix #2,
establish a new policy that partition dependencies are in addition to,
not instead of, a child object's usual dependencies; in this way
ATTACH/DETACH PARTITION need not cope with adding or removing the
usual dependencies.
To fix the cosmetic problem, distinguish between primary and secondary
partition dependency entries in pg_depend, by giving them different
deptypes. (They behave identically except for having different
priorities for being cited in error messages.) This means that the
former 'I' dependency type is replaced with new 'P' and 'S' types.
This also fixes a longstanding bug that after handling an internal
dependency by recursing to the owning object, findDependentObjects
did not verify that the current target was now scheduled for deletion,
and did not apply the current recursion level's objflags to it.
Perhaps that should be back-patched; but in the back branches it
would only matter if some concurrent transaction had removed the
internal-linkage pg_depend entry before the recursive call found it,
or the recursive call somehow failed to find it, both of which seem
unlikely.
Catversion bump because the contents of pg_depend change for
partitioning relationships.
Patch HEAD only. It's annoying that we're not fixing #2 in v11,
but there seems no practical way to do so given that the problem
is exactly a poor choice of what entries to put in pg_depend.
We can't really fix that while staying compatible with what's
in pg_depend in existing v11 installations.
Discussion: https://postgr.es/m/CAH2-Wzkypv1R+teZrr71U23J578NnTBt2X8+Y=Odr4pOdW1rXg@mail.gmail.com
2019-02-11 20:41:13 +01:00
		 * Insert/delete pg_depend rows.  If setting a parent, add PARTITION
		 * dependencies on the parent index and the table; if removing a
		 * parent, delete PARTITION dependencies.
Local partitioned indexes
When CREATE INDEX is run on a partitioned table, create catalog entries
for an index on the partitioned table (which is just a placeholder since
the table proper has no data of its own), and recurse to create actual
indexes on the existing partitions; create them in future partitions
also.
As a convenience gadget, if the new index definition matches some
existing index in partitions, these are picked up and used instead of
creating new ones. Whichever way these indexes come about, they become
attached to the index on the parent table and are dropped alongside it,
and cannot be dropped in isolation unless they are detached first.
To support pg_dump'ing these indexes, add commands
CREATE INDEX ON ONLY <table>
(which creates the index on the parent partitioned table, without
recursing) and
ALTER INDEX ATTACH PARTITION
(which is used after the indexes have been created individually on each
partition, to attach them to the parent index). These reconstruct prior
database state exactly.
Reviewed-by: (in alphabetical order) Peter Eisentraut, Robert Haas, Amit
Langote, Jesper Pedersen, Simon Riggs, David Rowley
Discussion: https://postgr.es/m/20171113170646.gzweigyrgg6pwsg4@alvherre.pgsql
2018-01-19 15:49:22 +01:00
		 */

		if (OidIsValid(parentOid))
		{
			ObjectAddress partIdx;
			ObjectAddress parentIdx;
			ObjectAddress partitionTbl;

			ObjectAddressSet(partIdx, RelationRelationId, partRelid);
			ObjectAddressSet(parentIdx, RelationRelationId, parentOid);
			ObjectAddressSet(partitionTbl, RelationRelationId,
							 partitionIdx->rd_index->indrelid);
			recordDependencyOn(&partIdx, &parentIdx,
							   DEPENDENCY_PARTITION_PRI);
			recordDependencyOn(&partIdx, &partitionTbl,
							   DEPENDENCY_PARTITION_SEC);
		}
		else
		{
			deleteDependencyRecordsForClass(RelationRelationId, partRelid,
											RelationRelationId,
											DEPENDENCY_PARTITION_PRI);
			deleteDependencyRecordsForClass(RelationRelationId, partRelid,
											RelationRelationId,
											DEPENDENCY_PARTITION_SEC);
		}

		/* make our updates visible */
		CommandCounterIncrement();
	}
}
/*
 * Subroutine of IndexSetParentIndex to update the relispartition flag of the
 * given index to the given value.
 */
static void
update_relispartition(Oid relationId, bool newval)
{
	HeapTuple	tup;
	Relation	classRel;

	classRel = table_open(RelationRelationId, RowExclusiveLock);
	tup = SearchSysCacheCopy1(RELOID, ObjectIdGetDatum(relationId));
	if (!HeapTupleIsValid(tup))
		elog(ERROR, "cache lookup failed for relation %u", relationId);
	Assert(((Form_pg_class) GETSTRUCT(tup))->relispartition != newval);
	((Form_pg_class) GETSTRUCT(tup))->relispartition = newval;
	CatalogTupleUpdate(classRel, &tup->t_self, tup);
	heap_freetuple(tup);
	table_close(classRel, RowExclusiveLock);
}
/*
 * Set the PROC_IN_SAFE_IC flag in MyProc->statusFlags.
 *
 * When doing concurrent index builds, we can set this flag
 * to tell other processes concurrently running CREATE
 * INDEX CONCURRENTLY or REINDEX CONCURRENTLY to ignore us when
 * doing their waits for concurrent snapshots.  On one hand it
 * avoids pointlessly waiting for a process that's not interesting
 * anyway; but more importantly it avoids deadlocks in some cases.
 *
 * This can be done safely only for indexes that don't execute any
 * expressions that could access other tables, so index must not be
 * expressional nor partial.  Caller is responsible for only calling
 * this routine when that assumption holds true.
 *
 * (The flag is reset automatically at transaction end, so it must be
 * set for each transaction.)
 */
static inline void
set_indexsafe_procflags(void)
{
	/*
	 * This should only be called before installing xid or xmin in MyProc;
	 * otherwise, concurrent processes could see an Xmin that moves backwards.
	 */
	Assert(MyProc->xid == InvalidTransactionId &&
		   MyProc->xmin == InvalidTransactionId);

	LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
	MyProc->statusFlags |= PROC_IN_SAFE_IC;
	ProcGlobal->statusFlags[MyProc->pgxactoff] = MyProc->statusFlags;
	LWLockRelease(ProcArrayLock);
}