postgresql/src/backend/commands/indexcmds.c

3462 lines
104 KiB
C
Raw Normal View History

/*-------------------------------------------------------------------------
*
* indexcmds.c
* POSTGRES define and remove index code.
*
* Portions Copyright (c) 1996-2019, PostgreSQL Global Development Group
* Portions Copyright (c) 1994, Regents of the University of California
*
*
* IDENTIFICATION
2010-09-20 22:08:53 +02:00
* src/backend/commands/indexcmds.c
*
*-------------------------------------------------------------------------
*/
#include "postgres.h"
#include "access/amapi.h"
2019-01-15 00:54:18 +01:00
#include "access/heapam.h"
#include "access/htup_details.h"
#include "access/reloptions.h"
#include "access/sysattr.h"
tableam: Add and use scan APIs. Too allow table accesses to be not directly dependent on heap, several new abstractions are needed. Specifically: 1) Heap scans need to be generalized into table scans. Do this by introducing TableScanDesc, which will be the "base class" for individual AMs. This contains the AM independent fields from HeapScanDesc. The previous heap_{beginscan,rescan,endscan} et al. have been replaced with a table_ version. There's no direct replacement for heap_getnext(), as that returned a HeapTuple, which is undesirable for a other AMs. Instead there's table_scan_getnextslot(). But note that heap_getnext() lives on, it's still used widely to access catalog tables. This is achieved by new scan_begin, scan_end, scan_rescan, scan_getnextslot callbacks. 2) The portion of parallel scans that's shared between backends need to be able to do so without the user doing per-AM work. To achieve that new parallelscan_{estimate, initialize, reinitialize} callbacks are introduced, which operate on a new ParallelTableScanDesc, which again can be subclassed by AMs. As it is likely that several AMs are going to be block oriented, block oriented callbacks that can be shared between such AMs are provided and used by heap. table_block_parallelscan_{estimate, intiialize, reinitialize} as callbacks, and table_block_parallelscan_{nextpage, init} for use in AMs. These operate on a ParallelBlockTableScanDesc. 3) Index scans need to be able to access tables to return a tuple, and there needs to be state across individual accesses to the heap to store state like buffers. That's now handled by introducing a sort-of-scan IndexFetchTable, which again is intended to be subclassed by individual AMs (for heap IndexFetchHeap). The relevant callbacks for an AM are index_fetch_{end, begin, reset} to create the necessary state, and index_fetch_tuple to retrieve an indexed tuple. Note that index_fetch_tuple implementations need to be smarter than just blindly fetching the tuples for AMs that have optimizations similar to heap's HOT - the currently alive tuple in the update chain needs to be fetched if appropriate. Similar to table_scan_getnextslot(), it's undesirable to continue to return HeapTuples. Thus index_fetch_heap (might want to rename that later) now accepts a slot as an argument. Core code doesn't have a lot of call sites performing index scans without going through the systable_* API (in contrast to loads of heap_getnext calls and working directly with HeapTuples). Index scans now store the result of a search in IndexScanDesc->xs_heaptid, rather than xs_ctup->t_self. As the target is not generally a HeapTuple anymore that seems cleaner. To be able to sensible adapt code to use the above, two further callbacks have been introduced: a) slot_callbacks returns a TupleTableSlotOps* suitable for creating slots capable of holding a tuple of the AMs type. table_slot_callbacks() and table_slot_create() are based upon that, but have additional logic to deal with views, foreign tables, etc. While this change could have been done separately, nearly all the call sites that needed to be adapted for the rest of this commit also would have been needed to be adapted for table_slot_callbacks(), making separation not worthwhile. b) tuple_satisfies_snapshot checks whether the tuple in a slot is currently visible according to a snapshot. That's required as a few places now don't have a buffer + HeapTuple around, but a slot (which in heap's case internally has that information). Additionally a few infrastructure changes were needed: I) SysScanDesc, as used by systable_{beginscan, getnext} et al. now internally uses a slot to keep track of tuples. While systable_getnext() still returns HeapTuples, and will so for the foreseeable future, the index API (see 1) above) now only deals with slots. The remainder, and largest part, of this commit is then adjusting all scans in postgres to use the new APIs. Author: Andres Freund, Haribabu Kommi, Alvaro Herrera Discussion: https://postgr.es/m/20180703070645.wchpu5muyto5n647@alap3.anarazel.de https://postgr.es/m/20160812231527.GA690404@alvherre.pgsql
2019-03-11 20:46:41 +01:00
#include "access/tableam.h"
#include "access/xact.h"
#include "catalog/catalog.h"
#include "catalog/index.h"
#include "catalog/indexing.h"
#include "catalog/pg_am.h"
#include "catalog/pg_constraint.h"
#include "catalog/pg_inherits.h"
1999-07-16 07:00:38 +02:00
#include "catalog/pg_opclass.h"
#include "catalog/pg_opfamily.h"
#include "catalog/pg_tablespace.h"
#include "catalog/pg_type.h"
#include "commands/comment.h"
2003-06-27 16:45:32 +02:00
#include "commands/dbcommands.h"
#include "commands/defrem.h"
#include "commands/event_trigger.h"
#include "commands/progress.h"
#include "commands/tablecmds.h"
#include "commands/tablespace.h"
#include "mb/pg_wchar.h"
#include "miscadmin.h"
#include "nodes/makefuncs.h"
#include "nodes/nodeFuncs.h"
#include "optimizer/optimizer.h"
#include "parser/parse_coerce.h"
#include "parser/parse_func.h"
#include "parser/parse_oper.h"
#include "partitioning/partdesc.h"
#include "pgstat.h"
#include "rewrite/rewriteManip.h"
#include "storage/lmgr.h"
#include "storage/proc.h"
#include "storage/procarray.h"
#include "storage/sinvaladt.h"
#include "utils/acl.h"
1999-07-16 07:00:38 +02:00
#include "utils/builtins.h"
#include "utils/fmgroids.h"
#include "utils/inval.h"
#include "utils/lsyscache.h"
#include "utils/memutils.h"
#include "utils/partcache.h"
#include "utils/pg_rusage.h"
#include "utils/regproc.h"
#include "utils/snapmgr.h"
1999-07-16 07:00:38 +02:00
#include "utils/syscache.h"
/* non-export function prototypes */
static void CheckPredicate(Expr *predicate);
static void ComputeIndexAttrs(IndexInfo *indexInfo,
Oid *typeOidP,
Oid *collationOidP,
Oid *classOidP,
int16 *colOptionP,
2004-08-29 07:07:03 +02:00
List *attList,
List *exclusionOpNames,
2004-08-29 07:07:03 +02:00
Oid relId,
const char *accessMethodName, Oid accessMethodId,
bool amcanorder,
2004-08-29 07:07:03 +02:00
bool isconstraint);
static char *ChooseIndexName(const char *tabname, Oid namespaceId,
List *colnames, List *exclusionOpNames,
bool primary, bool isconstraint);
static char *ChooseIndexNameAddition(List *colnames);
static List *ChooseIndexColumnNames(List *indexElems);
Improve table locking behavior in the face of current DDL. In the previous coding, callers were faced with an awkward choice: look up the name, do permissions checks, and then lock the table; or look up the name, lock the table, and then do permissions checks. The first choice was wrong because the results of the name lookup and permissions checks might be out-of-date by the time the table lock was acquired, while the second allowed a user with no privileges to interfere with access to a table by users who do have privileges (e.g. if a malicious backend queues up for an AccessExclusiveLock on a table on which AccessShareLock is already held, further attempts to access the table will be blocked until the AccessExclusiveLock is obtained and the malicious backend's transaction rolls back). To fix, allow callers of RangeVarGetRelid() to pass a callback which gets executed after performing the name lookup but before acquiring the relation lock. If the name lookup is retried (because invalidation messages are received), the callback will be re-executed as well, so we get the best of both worlds. RangeVarGetRelid() is renamed to RangeVarGetRelidExtended(); callers not wishing to supply a callback can continue to invoke it as RangeVarGetRelid(), which is now a macro. Since the only one caller that uses nowait = true now passes a callback anyway, the RangeVarGetRelid() macro defaults nowait as well. The callback can also be used for supplemental locking - for example, REINDEX INDEX needs to acquire the table lock before the index lock to reduce deadlock possibilities. There's a lot more work to be done here to fix all the cases where this can be a problem, but this commit provides the general infrastructure and fixes the following specific cases: REINDEX INDEX, REINDEX TABLE, LOCK TABLE, and and DROP TABLE/INDEX/SEQUENCE/VIEW/FOREIGN TABLE. Per discussion with Noah Misch and Alvaro Herrera.
2011-11-30 16:12:27 +01:00
static void RangeVarCallbackForReindexIndex(const RangeVar *relation,
Oid relId, Oid oldRelId, void *arg);
static bool ReindexRelationConcurrently(Oid relationOid, int options);
static void ReindexPartitionedIndex(Relation parentIdx);
static void update_relispartition(Oid relationId, bool newval);
/*
* CheckIndexCompatible
* Determine whether an existing index definition is compatible with a
* prospective index definition, such that the existing index storage
* could become the storage of the new index, avoiding a rebuild.
*
* 'heapRelation': the relation the index would apply to.
* 'accessMethodName': name of the AM to use.
* 'attributeList': a list of IndexElem specifying columns and expressions
* to index on.
* 'exclusionOpNames': list of names of exclusion-constraint operators,
* or NIL if not an exclusion constraint.
*
* This is tailored to the needs of ALTER TABLE ALTER TYPE, which recreates
* any indexes that depended on a changing column from their pg_get_indexdef
* or pg_get_constraintdef definitions. We omit some of the sanity checks of
* DefineIndex. We assume that the old and new indexes have the same number
* of columns and that if one has an expression column or predicate, both do.
* Errors arising from the attribute list still apply.
*
* Most column type changes that can skip a table rewrite do not invalidate
* indexes. We acknowledge this when all operator classes, collations and
* exclusion operators match. Though we could further permit intra-opfamily
* changes for btree and hash indexes, that adds subtle complexity with no
* concrete benefit for core types. Note, that INCLUDE columns aren't
* checked by this function, for them it's enough that table rewrite is
* skipped.
*
* When a comparison or exclusion operator has a polymorphic input type, the
* actual input types must also match. This defends against the possibility
* that operators could vary behavior in response to get_fn_expr_argtype().
* At present, this hazard is theoretical: check_exclusion_constraint() and
* all core index access methods decline to set fn_expr for such calls.
*
* We do not yet implement a test to verify compatibility of expression
* columns or predicates, so assume any such index is incompatible.
*/
bool
CheckIndexCompatible(Oid oldId,
const char *accessMethodName,
List *attributeList,
List *exclusionOpNames)
{
bool isconstraint;
Oid *typeObjectId;
Oid *collationObjectId;
Oid *classObjectId;
Oid accessMethodId;
Oid relationId;
HeapTuple tuple;
Fix assorted bugs in CREATE/DROP INDEX CONCURRENTLY. Commit 8cb53654dbdb4c386369eb988062d0bbb6de725e, which introduced DROP INDEX CONCURRENTLY, managed to break CREATE INDEX CONCURRENTLY via a poor choice of catalog state representation. The pg_index state for an index that's reached the final pre-drop stage was the same as the state for an index just created by CREATE INDEX CONCURRENTLY. This meant that the (necessary) change to make RelationGetIndexList ignore about-to-die indexes also made it ignore freshly-created indexes; which is catastrophic because the latter do need to be considered in HOT-safety decisions. Failure to do so leads to incorrect index entries and subsequently wrong results from queries depending on the concurrently-created index. To fix, add an additional boolean column "indislive" to pg_index, so that the freshly-created and about-to-die states can be distinguished. (This change obviously is only possible in HEAD. This patch will need to be back-patched, but in 9.2 we'll use a kluge consisting of overloading the formerly-impossible state of indisvalid = true and indisready = false.) In addition, change CREATE/DROP INDEX CONCURRENTLY so that the pg_index flag changes they make without exclusive lock on the index are made via heap_inplace_update() rather than a normal transactional update. The latter is not very safe because moving the pg_index tuple could result in concurrent SnapshotNow scans finding it twice or not at all, thus possibly resulting in index corruption. This is a pre-existing bug in CREATE INDEX CONCURRENTLY, which was copied into the DROP code. In addition, fix various places in the code that ought to check to make sure that the indexes they are manipulating are valid and/or ready as appropriate. These represent bugs that have existed since 8.2, since a failed CREATE INDEX CONCURRENTLY could leave a corrupt or invalid index behind, and we ought not try to do anything that might fail with such an index. Also fix RelationReloadIndexInfo to ensure it copies all the pg_index columns that are allowed to change after initial creation. Previously we could have been left with stale values of some fields in an index relcache entry. It's not clear whether this actually had any user-visible consequences, but it's at least a bug waiting to happen. In addition, do some code and docs review for DROP INDEX CONCURRENTLY; some cosmetic code cleanup but mostly addition and revision of comments. This will need to be back-patched, but in a noticeably different form, so I'm committing it to HEAD before working on the back-patch. Problem reported by Amit Kapila, diagnosis by Pavan Deolassee, fix by Tom Lane and Andres Freund.
2012-11-29 03:25:27 +01:00
Form_pg_index indexForm;
Form_pg_am accessMethodForm;
IndexAmRoutine *amRoutine;
bool amcanorder;
int16 *coloptions;
IndexInfo *indexInfo;
int numberOfAttributes;
int old_natts;
bool isnull;
bool ret = true;
oidvector *old_indclass;
oidvector *old_indcollation;
Relation irel;
int i;
Datum d;
/* Caller should already have the relation locked in some way. */
relationId = IndexGetRelation(oldId, false);
/*
* We can pretend isconstraint = false unconditionally. It only serves to
* decide the text of an error message that should never happen for us.
*/
isconstraint = false;
numberOfAttributes = list_length(attributeList);
Assert(numberOfAttributes > 0);
Assert(numberOfAttributes <= INDEX_MAX_KEYS);
/* look up the access method */
tuple = SearchSysCache1(AMNAME, PointerGetDatum(accessMethodName));
if (!HeapTupleIsValid(tuple))
ereport(ERROR,
(errcode(ERRCODE_UNDEFINED_OBJECT),
errmsg("access method \"%s\" does not exist",
accessMethodName)));
accessMethodForm = (Form_pg_am) GETSTRUCT(tuple);
Remove WITH OIDS support, change oid catalog column visibility. Previously tables declared WITH OIDS, including a significant fraction of the catalog tables, stored the oid column not as a normal column, but as part of the tuple header. This special column was not shown by default, which was somewhat odd, as it's often (consider e.g. pg_class.oid) one of the more important parts of a row. Neither pg_dump nor COPY included the contents of the oid column by default. The fact that the oid column was not an ordinary column necessitated a significant amount of special case code to support oid columns. That already was painful for the existing, but upcoming work aiming to make table storage pluggable, would have required expanding and duplicating that "specialness" significantly. WITH OIDS has been deprecated since 2005 (commit ff02d0a05280e0). Remove it. Removing includes: - CREATE TABLE and ALTER TABLE syntax for declaring the table to be WITH OIDS has been removed (WITH (oids[ = true]) will error out) - pg_dump does not support dumping tables declared WITH OIDS and will issue a warning when dumping one (and ignore the oid column). - restoring an pg_dump archive with pg_restore will warn when restoring a table with oid contents (and ignore the oid column) - COPY will refuse to load binary dump that includes oids. - pg_upgrade will error out when encountering tables declared WITH OIDS, they have to be altered to remove the oid column first. - Functionality to access the oid of the last inserted row (like plpgsql's RESULT_OID, spi's SPI_lastoid, ...) has been removed. The syntax for declaring a table WITHOUT OIDS (or WITH (oids = false) for CREATE TABLE) is still supported. While that requires a bit of support code, it seems unnecessary to break applications / dumps that do not use oids, and are explicit about not using them. The biggest user of WITH OID columns was postgres' catalog. This commit changes all 'magic' oid columns to be columns that are normally declared and stored. To reduce unnecessary query breakage all the newly added columns are still named 'oid', even if a table's column naming scheme would indicate 'reloid' or such. This obviously requires adapting a lot code, mostly replacing oid access via HeapTupleGetOid() with access to the underlying Form_pg_*->oid column. The bootstrap process now assigns oids for all oid columns in genbki.pl that do not have an explicit value (starting at the largest oid previously used), only oids assigned later by oids will be above FirstBootstrapObjectId. As the oid column now is a normal column the special bootstrap syntax for oids has been removed. Oids are not automatically assigned during insertion anymore, all backend code explicitly assigns oids with GetNewOidWithIndex(). For the rare case that insertions into the catalog via SQL are called for the new pg_nextoid() function can be used (which only works on catalog tables). The fact that oid columns on system tables are now normal columns means that they will be included in the set of columns expanded by * (i.e. SELECT * FROM pg_class will now include the table's oid, previously it did not). It'd not technically be hard to hide oid column by default, but that'd mean confusing behavior would either have to be carried forward forever, or it'd cause breakage down the line. While it's not unlikely that further adjustments are needed, the scope/invasiveness of the patch makes it worthwhile to get merge this now. It's painful to maintain externally, too complicated to commit after the code code freeze, and a dependency of a number of other patches. Catversion bump, for obvious reasons. Author: Andres Freund, with contributions by John Naylor Discussion: https://postgr.es/m/20180930034810.ywp2c7awz7opzcfr@alap3.anarazel.de
2018-11-21 00:36:57 +01:00
accessMethodId = accessMethodForm->oid;
amRoutine = GetIndexAmRoutine(accessMethodForm->amhandler);
ReleaseSysCache(tuple);
amcanorder = amRoutine->amcanorder;
/*
* Compute the operator classes, collations, and exclusion operators for
* the new index, so we can test whether it's compatible with the existing
* one. Note that ComputeIndexAttrs might fail here, but that's OK:
* DefineIndex would have called this function with the same arguments
* later on, and it would have failed then anyway. Our attributeList
* contains only key attributes, thus we're filling ii_NumIndexAttrs and
* ii_NumIndexKeyAttrs with same value.
*/
indexInfo = makeNode(IndexInfo);
indexInfo->ii_NumIndexAttrs = numberOfAttributes;
indexInfo->ii_NumIndexKeyAttrs = numberOfAttributes;
indexInfo->ii_Expressions = NIL;
indexInfo->ii_ExpressionsState = NIL;
Faster expression evaluation and targetlist projection. This replaces the old, recursive tree-walk based evaluation, with non-recursive, opcode dispatch based, expression evaluation. Projection is now implemented as part of expression evaluation. This both leads to significant performance improvements, and makes future just-in-time compilation of expressions easier. The speed gains primarily come from: - non-recursive implementation reduces stack usage / overhead - simple sub-expressions are implemented with a single jump, without function calls - sharing some state between different sub-expressions - reduced amount of indirect/hard to predict memory accesses by laying out operation metadata sequentially; including the avoidance of nearly all of the previously used linked lists - more code has been moved to expression initialization, avoiding constant re-checks at evaluation time Future just-in-time compilation (JIT) has become easier, as demonstrated by released patches intended to be merged in a later release, for primarily two reasons: Firstly, due to a stricter split between expression initialization and evaluation, less code has to be handled by the JIT. Secondly, due to the non-recursive nature of the generated "instructions", less performance-critical code-paths can easily be shared between interpreted and compiled evaluation. The new framework allows for significant future optimizations. E.g.: - basic infrastructure for to later reduce the per executor-startup overhead of expression evaluation, by caching state in prepared statements. That'd be helpful in OLTPish scenarios where initialization overhead is measurable. - optimizing the generated "code". A number of proposals for potential work has already been made. - optimizing the interpreter. Similarly a number of proposals have been made here too. The move of logic into the expression initialization step leads to some backward-incompatible changes: - Function permission checks are now done during expression initialization, whereas previously they were done during execution. In edge cases this can lead to errors being raised that previously wouldn't have been, e.g. a NULL array being coerced to a different array type previously didn't perform checks. - The set of domain constraints to be checked, is now evaluated once during expression initialization, previously it was re-built every time a domain check was evaluated. For normal queries this doesn't change much, but e.g. for plpgsql functions, which caches ExprStates, the old set could stick around longer. The behavior around might still change. Author: Andres Freund, with significant changes by Tom Lane, changes by Heikki Linnakangas Reviewed-By: Tom Lane, Heikki Linnakangas Discussion: https://postgr.es/m/20161206034955.bh33paeralxbtluv@alap3.anarazel.de
2017-03-14 23:45:36 +01:00
indexInfo->ii_PredicateState = NULL;
indexInfo->ii_ExclusionOps = NULL;
indexInfo->ii_ExclusionProcs = NULL;
indexInfo->ii_ExclusionStrats = NULL;
indexInfo->ii_Am = accessMethodId;
indexInfo->ii_AmCache = NULL;
indexInfo->ii_Context = CurrentMemoryContext;
typeObjectId = (Oid *) palloc(numberOfAttributes * sizeof(Oid));
collationObjectId = (Oid *) palloc(numberOfAttributes * sizeof(Oid));
classObjectId = (Oid *) palloc(numberOfAttributes * sizeof(Oid));
coloptions = (int16 *) palloc(numberOfAttributes * sizeof(int16));
ComputeIndexAttrs(indexInfo,
typeObjectId, collationObjectId, classObjectId,
coloptions, attributeList,
exclusionOpNames, relationId,
accessMethodName, accessMethodId,
amcanorder, isconstraint);
/* Get the soon-obsolete pg_index tuple. */
tuple = SearchSysCache1(INDEXRELID, ObjectIdGetDatum(oldId));
if (!HeapTupleIsValid(tuple))
elog(ERROR, "cache lookup failed for index %u", oldId);
Fix assorted bugs in CREATE/DROP INDEX CONCURRENTLY. Commit 8cb53654dbdb4c386369eb988062d0bbb6de725e, which introduced DROP INDEX CONCURRENTLY, managed to break CREATE INDEX CONCURRENTLY via a poor choice of catalog state representation. The pg_index state for an index that's reached the final pre-drop stage was the same as the state for an index just created by CREATE INDEX CONCURRENTLY. This meant that the (necessary) change to make RelationGetIndexList ignore about-to-die indexes also made it ignore freshly-created indexes; which is catastrophic because the latter do need to be considered in HOT-safety decisions. Failure to do so leads to incorrect index entries and subsequently wrong results from queries depending on the concurrently-created index. To fix, add an additional boolean column "indislive" to pg_index, so that the freshly-created and about-to-die states can be distinguished. (This change obviously is only possible in HEAD. This patch will need to be back-patched, but in 9.2 we'll use a kluge consisting of overloading the formerly-impossible state of indisvalid = true and indisready = false.) In addition, change CREATE/DROP INDEX CONCURRENTLY so that the pg_index flag changes they make without exclusive lock on the index are made via heap_inplace_update() rather than a normal transactional update. The latter is not very safe because moving the pg_index tuple could result in concurrent SnapshotNow scans finding it twice or not at all, thus possibly resulting in index corruption. This is a pre-existing bug in CREATE INDEX CONCURRENTLY, which was copied into the DROP code. In addition, fix various places in the code that ought to check to make sure that the indexes they are manipulating are valid and/or ready as appropriate. These represent bugs that have existed since 8.2, since a failed CREATE INDEX CONCURRENTLY could leave a corrupt or invalid index behind, and we ought not try to do anything that might fail with such an index. Also fix RelationReloadIndexInfo to ensure it copies all the pg_index columns that are allowed to change after initial creation. Previously we could have been left with stale values of some fields in an index relcache entry. It's not clear whether this actually had any user-visible consequences, but it's at least a bug waiting to happen. In addition, do some code and docs review for DROP INDEX CONCURRENTLY; some cosmetic code cleanup but mostly addition and revision of comments. This will need to be back-patched, but in a noticeably different form, so I'm committing it to HEAD before working on the back-patch. Problem reported by Amit Kapila, diagnosis by Pavan Deolassee, fix by Tom Lane and Andres Freund.
2012-11-29 03:25:27 +01:00
indexForm = (Form_pg_index) GETSTRUCT(tuple);
Fix assorted bugs in CREATE/DROP INDEX CONCURRENTLY. Commit 8cb53654dbdb4c386369eb988062d0bbb6de725e, which introduced DROP INDEX CONCURRENTLY, managed to break CREATE INDEX CONCURRENTLY via a poor choice of catalog state representation. The pg_index state for an index that's reached the final pre-drop stage was the same as the state for an index just created by CREATE INDEX CONCURRENTLY. This meant that the (necessary) change to make RelationGetIndexList ignore about-to-die indexes also made it ignore freshly-created indexes; which is catastrophic because the latter do need to be considered in HOT-safety decisions. Failure to do so leads to incorrect index entries and subsequently wrong results from queries depending on the concurrently-created index. To fix, add an additional boolean column "indislive" to pg_index, so that the freshly-created and about-to-die states can be distinguished. (This change obviously is only possible in HEAD. This patch will need to be back-patched, but in 9.2 we'll use a kluge consisting of overloading the formerly-impossible state of indisvalid = true and indisready = false.) In addition, change CREATE/DROP INDEX CONCURRENTLY so that the pg_index flag changes they make without exclusive lock on the index are made via heap_inplace_update() rather than a normal transactional update. The latter is not very safe because moving the pg_index tuple could result in concurrent SnapshotNow scans finding it twice or not at all, thus possibly resulting in index corruption. This is a pre-existing bug in CREATE INDEX CONCURRENTLY, which was copied into the DROP code. In addition, fix various places in the code that ought to check to make sure that the indexes they are manipulating are valid and/or ready as appropriate. These represent bugs that have existed since 8.2, since a failed CREATE INDEX CONCURRENTLY could leave a corrupt or invalid index behind, and we ought not try to do anything that might fail with such an index. Also fix RelationReloadIndexInfo to ensure it copies all the pg_index columns that are allowed to change after initial creation. Previously we could have been left with stale values of some fields in an index relcache entry. It's not clear whether this actually had any user-visible consequences, but it's at least a bug waiting to happen. In addition, do some code and docs review for DROP INDEX CONCURRENTLY; some cosmetic code cleanup but mostly addition and revision of comments. This will need to be back-patched, but in a noticeably different form, so I'm committing it to HEAD before working on the back-patch. Problem reported by Amit Kapila, diagnosis by Pavan Deolassee, fix by Tom Lane and Andres Freund.
2012-11-29 03:25:27 +01:00
/*
* We don't assess expressions or predicates; assume incompatibility.
* Also, if the index is invalid for any reason, treat it as incompatible.
*/
if (!(heap_attisnull(tuple, Anum_pg_index_indpred, NULL) &&
heap_attisnull(tuple, Anum_pg_index_indexprs, NULL) &&
indexForm->indisvalid))
{
ReleaseSysCache(tuple);
return false;
}
/* Any change in operator class or collation breaks compatibility. */
old_natts = indexForm->indnkeyatts;
Assert(old_natts == numberOfAttributes);
d = SysCacheGetAttr(INDEXRELID, tuple, Anum_pg_index_indcollation, &isnull);
Assert(!isnull);
old_indcollation = (oidvector *) DatumGetPointer(d);
d = SysCacheGetAttr(INDEXRELID, tuple, Anum_pg_index_indclass, &isnull);
Assert(!isnull);
old_indclass = (oidvector *) DatumGetPointer(d);
ret = (memcmp(old_indclass->values, classObjectId,
old_natts * sizeof(Oid)) == 0 &&
memcmp(old_indcollation->values, collationObjectId,
old_natts * sizeof(Oid)) == 0);
ReleaseSysCache(tuple);
if (!ret)
return false;
/* For polymorphic opcintype, column type changes break compatibility. */
irel = index_open(oldId, AccessShareLock); /* caller probably has a lock */
for (i = 0; i < old_natts; i++)
{
if (IsPolymorphicType(get_opclass_input_type(classObjectId[i])) &&
TupleDescAttr(irel->rd_att, i)->atttypid != typeObjectId[i])
{
ret = false;
break;
}
}
/* Any change in exclusion operator selections breaks compatibility. */
if (ret && indexInfo->ii_ExclusionOps != NULL)
{
Oid *old_operators,
*old_procs;
uint16 *old_strats;
RelationGetExclusionInfo(irel, &old_operators, &old_procs, &old_strats);
ret = memcmp(old_operators, indexInfo->ii_ExclusionOps,
old_natts * sizeof(Oid)) == 0;
/* Require an exact input type match for polymorphic operators. */
if (ret)
{
for (i = 0; i < old_natts && ret; i++)
{
Oid left,
right;
op_input_types(indexInfo->ii_ExclusionOps[i], &left, &right);
if ((IsPolymorphicType(left) || IsPolymorphicType(right)) &&
TupleDescAttr(irel->rd_att, i)->atttypid != typeObjectId[i])
{
ret = false;
break;
}
}
}
}
index_close(irel, NoLock);
return ret;
}
/*
* WaitForOlderSnapshots
*
* Wait for transactions that might have an older snapshot than the given xmin
* limit, because it might not contain tuples deleted just before it has
* been taken. Obtain a list of VXIDs of such transactions, and wait for them
* individually. This is used when building an index concurrently.
*
* We can exclude any running transactions that have xmin > the xmin given;
* their oldest snapshot must be newer than our xmin limit.
* We can also exclude any transactions that have xmin = zero, since they
* evidently have no live snapshot at all (and any one they might be in
* process of taking is certainly newer than ours). Transactions in other
* DBs can be ignored too, since they'll never even be able to see the
* index being worked on.
*
* We can also exclude autovacuum processes and processes running manual
* lazy VACUUMs, because they won't be fazed by missing index entries
* either. (Manual ANALYZEs, however, can't be excluded because they
* might be within transactions that are going to do arbitrary operations
* later.)
*
* Also, GetCurrentVirtualXIDs never reports our own vxid, so we need not
* check for that.
*
* If a process goes idle-in-transaction with xmin zero, we do not need to
* wait for it anymore, per the above argument. We do not have the
* infrastructure right now to stop waiting if that happens, but we can at
* least avoid the folly of waiting when it is idle at the time we would
* begin to wait. We do this by repeatedly rechecking the output of
* GetCurrentVirtualXIDs. If, during any iteration, a particular vxid
* doesn't show up in the output, we know we can forget about it.
*/
static void
WaitForOlderSnapshots(TransactionId limitXmin, bool progress)
{
int n_old_snapshots;
int i;
VirtualTransactionId *old_snapshots;
old_snapshots = GetCurrentVirtualXIDs(limitXmin, true, false,
PROC_IS_AUTOVACUUM | PROC_IN_VACUUM,
&n_old_snapshots);
if (progress)
pgstat_progress_update_param(PROGRESS_WAITFOR_TOTAL, n_old_snapshots);
for (i = 0; i < n_old_snapshots; i++)
{
if (!VirtualTransactionIdIsValid(old_snapshots[i]))
continue; /* found uninteresting in previous cycle */
if (i > 0)
{
/* see if anything's changed ... */
VirtualTransactionId *newer_snapshots;
int n_newer_snapshots;
int j;
int k;
newer_snapshots = GetCurrentVirtualXIDs(limitXmin,
true, false,
PROC_IS_AUTOVACUUM | PROC_IN_VACUUM,
&n_newer_snapshots);
for (j = i; j < n_old_snapshots; j++)
{
if (!VirtualTransactionIdIsValid(old_snapshots[j]))
continue; /* found uninteresting in previous cycle */
for (k = 0; k < n_newer_snapshots; k++)
{
if (VirtualTransactionIdEquals(old_snapshots[j],
newer_snapshots[k]))
break;
}
if (k >= n_newer_snapshots) /* not there anymore */
SetInvalidVirtualTransactionId(old_snapshots[j]);
}
pfree(newer_snapshots);
}
if (VirtualTransactionIdIsValid(old_snapshots[i]))
{
if (progress)
{
PGPROC *holder = BackendIdGetProc(old_snapshots[i].backendId);
pgstat_progress_update_param(PROGRESS_WAITFOR_CURRENT_PID,
holder->pid);
}
VirtualXactLock(old_snapshots[i], true);
}
if (progress)
pgstat_progress_update_param(PROGRESS_WAITFOR_DONE, i + 1);
}
}
/*
1999-05-25 18:15:34 +02:00
* DefineIndex
* Creates a new index.
*
* 'relationId': the OID of the heap relation on which the index is to be
* created
* 'stmt': IndexStmt describing the properties of the new index.
* 'indexRelationId': normally InvalidOid, but during bootstrap can be
* nonzero to specify a preselected OID for the index.
* 'parentIndexId': the OID of the parent index; InvalidOid if not the child
* of a partitioned index.
* 'parentConstraintId': the OID of the parent constraint; InvalidOid if not
* the child of a constraint (only used when recursing)
* 'is_alter_table': this is due to an ALTER rather than a CREATE operation.
* 'check_rights': check for CREATE rights in namespace and tablespace. (This
* should be true except when ALTER is deleting/recreating an index.)
* 'check_not_in_use': check for table not already in use in current session.
* This should be true unless caller is holding the table open, in which
* case the caller had better have checked it earlier.
* 'skip_build': make the catalog entries but don't create the index files
* 'quiet': suppress the NOTICE chatter ordinarily provided for constraints.
*
* Returns the object address of the created index.
*/
ObjectAddress
DefineIndex(Oid relationId,
IndexStmt *stmt,
Oid indexRelationId,
Oid parentIndexId,
Oid parentConstraintId,
bool is_alter_table,
bool check_rights,
bool check_not_in_use,
bool skip_build,
bool quiet)
{
char *indexRelationName;
char *accessMethodName;
Oid *typeObjectId;
Oid *collationObjectId;
Oid *classObjectId;
Oid accessMethodId;
Oid namespaceId;
Oid tablespaceId;
Oid createdConstraintId = InvalidOid;
List *indexColNames;
List *allIndexParams;
Relation rel;
HeapTuple tuple;
Form_pg_am accessMethodForm;
IndexAmRoutine *amRoutine;
bool amcanorder;
amoptions_function amoptions;
bool partitioned;
Datum reloptions;
int16 *coloptions;
IndexInfo *indexInfo;
bits16 flags;
bits16 constr_flags;
int numberOfAttributes;
int numberOfKeyAttributes;
TransactionId limitXmin;
ObjectAddress address;
LockRelId heaprelid;
LOCKTAG heaplocktag;
LOCKMODE lockmode;
Snapshot snapshot;
int save_nestlevel = -1;
int i;
/*
* Some callers need us to run with an empty default_tablespace; this is a
* necessary hack to be able to reproduce catalog state accurately when
* recreating indexes after table-rewriting ALTER TABLE.
*/
if (stmt->reset_default_tblspc)
{
save_nestlevel = NewGUCNestLevel();
(void) set_config_option("default_tablespace", "",
PGC_USERSET, PGC_S_SESSION,
GUC_ACTION_SAVE, true, 0, false);
}
/*
* Start progress report. If we're building a partition, this was already
* done.
*/
if (!OidIsValid(parentIndexId))
pgstat_progress_start_command(PROGRESS_COMMAND_CREATE_INDEX,
relationId);
/*
* No index OID to report yet
*/
pgstat_progress_update_param(PROGRESS_CREATEIDX_INDEX_OID,
InvalidOid);
/*
* count key attributes in index
*/
numberOfKeyAttributes = list_length(stmt->indexParams);
/*
* Calculate the new list of index columns including both key columns and
* INCLUDE columns. Later we can determine which of these are key
* columns, and which are just part of the INCLUDE list by checking the
* list position. A list item in a position less than ii_NumIndexKeyAttrs
* is part of the key columns, and anything equal to and over is part of
* the INCLUDE columns.
*/
allIndexParams = list_concat(list_copy(stmt->indexParams),
list_copy(stmt->indexIncludingParams));
numberOfAttributes = list_length(allIndexParams);
if (numberOfAttributes <= 0)
ereport(ERROR,
(errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
errmsg("must specify at least one column")));
if (numberOfAttributes > INDEX_MAX_KEYS)
ereport(ERROR,
(errcode(ERRCODE_TOO_MANY_COLUMNS),
errmsg("cannot use more than %d columns in an index",
INDEX_MAX_KEYS)));
/*
* Only SELECT ... FOR UPDATE/SHARE are allowed while doing a standard
* index build; but for concurrent builds we allow INSERT/UPDATE/DELETE
* (but not VACUUM).
*
* NB: Caller is responsible for making sure that relationId refers to the
* relation on which the index should be built; except in bootstrap mode,
* this will typically require the caller to have already locked the
* relation. To avoid lock upgrade hazards, that lock should be at least
* as strong as the one we take here.
*
* NB: If the lock strength here ever changes, code that is run by
* parallel workers under the control of certain particular ambuild
* functions will need to be updated, too.
*/
lockmode = stmt->concurrent ? ShareUpdateExclusiveLock : ShareLock;
rel = table_open(relationId, lockmode);
namespaceId = RelationGetNamespace(rel);
/* Ensure that it makes sense to index this kind of relation */
switch (rel->rd_rel->relkind)
{
case RELKIND_RELATION:
case RELKIND_MATVIEW:
case RELKIND_PARTITIONED_TABLE:
/* OK */
break;
case RELKIND_FOREIGN_TABLE:
/*
* Custom error message for FOREIGN TABLE since the term is close
* to a regular table and can confuse the user.
*/
ereport(ERROR,
(errcode(ERRCODE_WRONG_OBJECT_TYPE),
errmsg("cannot create index on foreign table \"%s\"",
RelationGetRelationName(rel))));
break;
default:
ereport(ERROR,
(errcode(ERRCODE_WRONG_OBJECT_TYPE),
errmsg("\"%s\" is not a table or materialized view",
RelationGetRelationName(rel))));
break;
}
/*
* Establish behavior for partitioned tables, and verify sanity of
* parameters.
*
* We do not build an actual index in this case; we only create a few
* catalog entries. The actual indexes are built by recursing for each
* partition.
*/
partitioned = rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE;
if (partitioned)
{
if (stmt->concurrent)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("cannot create index on partitioned table \"%s\" concurrently",
RelationGetRelationName(rel))));
if (stmt->excludeOpNames)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("cannot create exclusion constraints on partitioned table \"%s\"",
RelationGetRelationName(rel))));
}
/*
* Don't try to CREATE INDEX on temp tables of other backends.
*/
if (RELATION_IS_OTHER_TEMP(rel))
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("cannot create indexes on temporary tables of other sessions")));
/*
* Unless our caller vouches for having checked this already, insist that
* the table not be in use by our own session, either. Otherwise we might
* fail to make entries in the new index (for instance, if an INSERT or
* UPDATE is in progress and has already made its list of target indexes).
*/
if (check_not_in_use)
CheckTableNotInUse(rel, "CREATE INDEX");
/*
* Verify we (still) have CREATE rights in the rel's namespace.
2005-10-15 04:49:52 +02:00
* (Presumably we did when the rel was created, but maybe not anymore.)
* Skip check if caller doesn't want it. Also skip check if
* bootstrapping, since permissions machinery may not be working yet.
*/
if (check_rights && !IsBootstrapProcessingMode())
{
AclResult aclresult;
aclresult = pg_namespace_aclcheck(namespaceId, GetUserId(),
ACL_CREATE);
if (aclresult != ACLCHECK_OK)
aclcheck_error(aclresult, OBJECT_SCHEMA,
get_namespace_name(namespaceId));
}
/*
* Select tablespace to use. If not specified, use default tablespace
* (which may in turn default to database's default).
*/
if (stmt->tableSpace)
{
tablespaceId = get_tablespace_oid(stmt->tableSpace, false);
if (partitioned && tablespaceId == MyDatabaseTableSpace)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
2019-04-30 16:00:38 +02:00
errmsg("cannot specify default tablespace for partitioned relations")));
}
else
{
tablespaceId = GetDefaultTablespace(rel->rd_rel->relpersistence,
partitioned);
/* note InvalidOid is OK in this case */
}
/* Check tablespace permissions */
if (check_rights &&
OidIsValid(tablespaceId) && tablespaceId != MyDatabaseTableSpace)
{
AclResult aclresult;
aclresult = pg_tablespace_aclcheck(tablespaceId, GetUserId(),
ACL_CREATE);
if (aclresult != ACLCHECK_OK)
aclcheck_error(aclresult, OBJECT_TABLESPACE,
get_tablespace_name(tablespaceId));
}
/*
* Force shared indexes into the pg_global tablespace. This is a bit of a
* hack but seems simpler than marking them in the BKI commands. On the
* other hand, if it's not shared, don't allow it to be placed there.
*/
if (rel->rd_rel->relisshared)
tablespaceId = GLOBALTABLESPACE_OID;
else if (tablespaceId == GLOBALTABLESPACE_OID)
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
errmsg("only shared relations can be placed in pg_global tablespace")));
/*
* Choose the index column names.
*/
indexColNames = ChooseIndexColumnNames(allIndexParams);
/*
* Select name for index if caller didn't specify
*/
indexRelationName = stmt->idxname;
if (indexRelationName == NULL)
indexRelationName = ChooseIndexName(RelationGetRelationName(rel),
namespaceId,
indexColNames,
stmt->excludeOpNames,
stmt->primary,
stmt->isconstraint);
/*
2005-10-15 04:49:52 +02:00
* look up the access method, verify it can handle the requested features
*/
accessMethodName = stmt->accessMethod;
tuple = SearchSysCache1(AMNAME, PointerGetDatum(accessMethodName));
if (!HeapTupleIsValid(tuple))
2005-11-07 18:36:47 +01:00
{
/*
* Hack to provide more-or-less-transparent updating of old RTREE
2011-05-19 00:14:45 +02:00
* indexes to GiST: if RTREE is requested and not found, use GIST.
2005-11-07 18:36:47 +01:00
*/
if (strcmp(accessMethodName, "rtree") == 0)
{
ereport(NOTICE,
(errmsg("substituting access method \"gist\" for obsolete method \"rtree\"")));
accessMethodName = "gist";
tuple = SearchSysCache1(AMNAME, PointerGetDatum(accessMethodName));
2005-11-07 18:36:47 +01:00
}
if (!HeapTupleIsValid(tuple))
ereport(ERROR,
(errcode(ERRCODE_UNDEFINED_OBJECT),
errmsg("access method \"%s\" does not exist",
accessMethodName)));
}
accessMethodForm = (Form_pg_am) GETSTRUCT(tuple);
Remove WITH OIDS support, change oid catalog column visibility. Previously tables declared WITH OIDS, including a significant fraction of the catalog tables, stored the oid column not as a normal column, but as part of the tuple header. This special column was not shown by default, which was somewhat odd, as it's often (consider e.g. pg_class.oid) one of the more important parts of a row. Neither pg_dump nor COPY included the contents of the oid column by default. The fact that the oid column was not an ordinary column necessitated a significant amount of special case code to support oid columns. That already was painful for the existing, but upcoming work aiming to make table storage pluggable, would have required expanding and duplicating that "specialness" significantly. WITH OIDS has been deprecated since 2005 (commit ff02d0a05280e0). Remove it. Removing includes: - CREATE TABLE and ALTER TABLE syntax for declaring the table to be WITH OIDS has been removed (WITH (oids[ = true]) will error out) - pg_dump does not support dumping tables declared WITH OIDS and will issue a warning when dumping one (and ignore the oid column). - restoring an pg_dump archive with pg_restore will warn when restoring a table with oid contents (and ignore the oid column) - COPY will refuse to load binary dump that includes oids. - pg_upgrade will error out when encountering tables declared WITH OIDS, they have to be altered to remove the oid column first. - Functionality to access the oid of the last inserted row (like plpgsql's RESULT_OID, spi's SPI_lastoid, ...) has been removed. The syntax for declaring a table WITHOUT OIDS (or WITH (oids = false) for CREATE TABLE) is still supported. While that requires a bit of support code, it seems unnecessary to break applications / dumps that do not use oids, and are explicit about not using them. The biggest user of WITH OID columns was postgres' catalog. This commit changes all 'magic' oid columns to be columns that are normally declared and stored. To reduce unnecessary query breakage all the newly added columns are still named 'oid', even if a table's column naming scheme would indicate 'reloid' or such. This obviously requires adapting a lot code, mostly replacing oid access via HeapTupleGetOid() with access to the underlying Form_pg_*->oid column. The bootstrap process now assigns oids for all oid columns in genbki.pl that do not have an explicit value (starting at the largest oid previously used), only oids assigned later by oids will be above FirstBootstrapObjectId. As the oid column now is a normal column the special bootstrap syntax for oids has been removed. Oids are not automatically assigned during insertion anymore, all backend code explicitly assigns oids with GetNewOidWithIndex(). For the rare case that insertions into the catalog via SQL are called for the new pg_nextoid() function can be used (which only works on catalog tables). The fact that oid columns on system tables are now normal columns means that they will be included in the set of columns expanded by * (i.e. SELECT * FROM pg_class will now include the table's oid, previously it did not). It'd not technically be hard to hide oid column by default, but that'd mean confusing behavior would either have to be carried forward forever, or it'd cause breakage down the line. While it's not unlikely that further adjustments are needed, the scope/invasiveness of the patch makes it worthwhile to get merge this now. It's painful to maintain externally, too complicated to commit after the code code freeze, and a dependency of a number of other patches. Catversion bump, for obvious reasons. Author: Andres Freund, with contributions by John Naylor Discussion: https://postgr.es/m/20180930034810.ywp2c7awz7opzcfr@alap3.anarazel.de
2018-11-21 00:36:57 +01:00
accessMethodId = accessMethodForm->oid;
amRoutine = GetIndexAmRoutine(accessMethodForm->amhandler);
pgstat_progress_update_param(PROGRESS_CREATEIDX_ACCESS_METHOD_OID,
accessMethodId);
if (stmt->unique && !amRoutine->amcanunique)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("access method \"%s\" does not support unique indexes",
accessMethodName)));
if (stmt->indexIncludingParams != NIL && !amRoutine->amcaninclude)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("access method \"%s\" does not support included columns",
accessMethodName)));
if (numberOfAttributes > 1 && !amRoutine->amcanmulticol)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("access method \"%s\" does not support multicolumn indexes",
accessMethodName)));
if (stmt->excludeOpNames && amRoutine->amgettuple == NULL)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("access method \"%s\" does not support exclusion constraints",
accessMethodName)));
amcanorder = amRoutine->amcanorder;
amoptions = amRoutine->amoptions;
pfree(amRoutine);
ReleaseSysCache(tuple);
/*
* Validate predicate, if given
*/
if (stmt->whereClause)
CheckPredicate((Expr *) stmt->whereClause);
/*
* Parse AM-specific options, convert to text array form, validate.
*/
reloptions = transformRelOptions((Datum) 0, stmt->options,
NULL, NULL, false, false);
(void) index_reloptions(amoptions, reloptions, true);
/*
2005-10-15 04:49:52 +02:00
* Prepare arguments for index_create, primarily an IndexInfo structure.
* Note that ii_Predicate must be in implicit-AND format.
*/
indexInfo = makeNode(IndexInfo);
indexInfo->ii_NumIndexAttrs = numberOfAttributes;
indexInfo->ii_NumIndexKeyAttrs = numberOfKeyAttributes;
2003-08-04 02:43:34 +02:00
indexInfo->ii_Expressions = NIL; /* for now */
indexInfo->ii_ExpressionsState = NIL;
indexInfo->ii_Predicate = make_ands_implicit((Expr *) stmt->whereClause);
Faster expression evaluation and targetlist projection. This replaces the old, recursive tree-walk based evaluation, with non-recursive, opcode dispatch based, expression evaluation. Projection is now implemented as part of expression evaluation. This both leads to significant performance improvements, and makes future just-in-time compilation of expressions easier. The speed gains primarily come from: - non-recursive implementation reduces stack usage / overhead - simple sub-expressions are implemented with a single jump, without function calls - sharing some state between different sub-expressions - reduced amount of indirect/hard to predict memory accesses by laying out operation metadata sequentially; including the avoidance of nearly all of the previously used linked lists - more code has been moved to expression initialization, avoiding constant re-checks at evaluation time Future just-in-time compilation (JIT) has become easier, as demonstrated by released patches intended to be merged in a later release, for primarily two reasons: Firstly, due to a stricter split between expression initialization and evaluation, less code has to be handled by the JIT. Secondly, due to the non-recursive nature of the generated "instructions", less performance-critical code-paths can easily be shared between interpreted and compiled evaluation. The new framework allows for significant future optimizations. E.g.: - basic infrastructure for to later reduce the per executor-startup overhead of expression evaluation, by caching state in prepared statements. That'd be helpful in OLTPish scenarios where initialization overhead is measurable. - optimizing the generated "code". A number of proposals for potential work has already been made. - optimizing the interpreter. Similarly a number of proposals have been made here too. The move of logic into the expression initialization step leads to some backward-incompatible changes: - Function permission checks are now done during expression initialization, whereas previously they were done during execution. In edge cases this can lead to errors being raised that previously wouldn't have been, e.g. a NULL array being coerced to a different array type previously didn't perform checks. - The set of domain constraints to be checked, is now evaluated once during expression initialization, previously it was re-built every time a domain check was evaluated. For normal queries this doesn't change much, but e.g. for plpgsql functions, which caches ExprStates, the old set could stick around longer. The behavior around might still change. Author: Andres Freund, with significant changes by Tom Lane, changes by Heikki Linnakangas Reviewed-By: Tom Lane, Heikki Linnakangas Discussion: https://postgr.es/m/20161206034955.bh33paeralxbtluv@alap3.anarazel.de
2017-03-14 23:45:36 +01:00
indexInfo->ii_PredicateState = NULL;
indexInfo->ii_ExclusionOps = NULL;
indexInfo->ii_ExclusionProcs = NULL;
indexInfo->ii_ExclusionStrats = NULL;
indexInfo->ii_Unique = stmt->unique;
/* In a concurrent build, mark it not-ready-for-inserts */
indexInfo->ii_ReadyForInserts = !stmt->concurrent;
indexInfo->ii_Concurrent = stmt->concurrent;
indexInfo->ii_BrokenHotChain = false;
indexInfo->ii_ParallelWorkers = 0;
indexInfo->ii_Am = accessMethodId;
indexInfo->ii_AmCache = NULL;
indexInfo->ii_Context = CurrentMemoryContext;
typeObjectId = (Oid *) palloc(numberOfAttributes * sizeof(Oid));
collationObjectId = (Oid *) palloc(numberOfAttributes * sizeof(Oid));
classObjectId = (Oid *) palloc(numberOfAttributes * sizeof(Oid));
coloptions = (int16 *) palloc(numberOfAttributes * sizeof(int16));
ComputeIndexAttrs(indexInfo,
typeObjectId, collationObjectId, classObjectId,
coloptions, allIndexParams,
stmt->excludeOpNames, relationId,
accessMethodName, accessMethodId,
amcanorder, stmt->isconstraint);
/*
* Extra checks when creating a PRIMARY KEY index.
*/
if (stmt->primary)
index_check_primary_key(rel, indexInfo, is_alter_table, stmt);
/*
* If this table is partitioned and we're creating a unique index or a
* primary key, make sure that the indexed columns are part of the
* partition key. Otherwise it would be possible to violate uniqueness by
* putting values that ought to be unique in different partitions.
*
* We could lift this limitation if we had global indexes, but those have
* their own problems, so this is a useful feature combination.
*/
if (partitioned && (stmt->unique || stmt->primary))
{
PartitionKey key = rel->rd_partkey;
int i;
/*
* A partitioned table can have unique indexes, as long as all the
* columns in the partition key appear in the unique key. A
* partition-local index can enforce global uniqueness iff the PK
* value completely determines the partition that a row is in.
*
* Thus, verify that all the columns in the partition key appear in
* the unique key definition.
*/
for (i = 0; i < key->partnatts; i++)
{
bool found = false;
int j;
const char *constraint_type;
if (stmt->primary)
constraint_type = "PRIMARY KEY";
else if (stmt->unique)
constraint_type = "UNIQUE";
else if (stmt->excludeOpNames != NIL)
constraint_type = "EXCLUDE";
else
{
elog(ERROR, "unknown constraint type");
constraint_type = NULL; /* keep compiler quiet */
}
/*
* It may be possible to support UNIQUE constraints when partition
* keys are expressions, but is it worth it? Give up for now.
*/
if (key->partattrs[i] == 0)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("unsupported %s constraint with partition key definition",
constraint_type),
errdetail("%s constraints cannot be used when partition keys include expressions.",
constraint_type)));
for (j = 0; j < indexInfo->ii_NumIndexKeyAttrs; j++)
{
if (key->partattrs[i] == indexInfo->ii_IndexAttrNumbers[j])
{
found = true;
break;
}
}
if (!found)
{
Form_pg_attribute att;
att = TupleDescAttr(RelationGetDescr(rel), key->partattrs[i] - 1);
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("insufficient columns in %s constraint definition",
constraint_type),
errdetail("%s constraint on table \"%s\" lacks column \"%s\" which is part of the partition key.",
constraint_type, RelationGetRelationName(rel),
NameStr(att->attname))));
}
}
}
/*
Remove WITH OIDS support, change oid catalog column visibility. Previously tables declared WITH OIDS, including a significant fraction of the catalog tables, stored the oid column not as a normal column, but as part of the tuple header. This special column was not shown by default, which was somewhat odd, as it's often (consider e.g. pg_class.oid) one of the more important parts of a row. Neither pg_dump nor COPY included the contents of the oid column by default. The fact that the oid column was not an ordinary column necessitated a significant amount of special case code to support oid columns. That already was painful for the existing, but upcoming work aiming to make table storage pluggable, would have required expanding and duplicating that "specialness" significantly. WITH OIDS has been deprecated since 2005 (commit ff02d0a05280e0). Remove it. Removing includes: - CREATE TABLE and ALTER TABLE syntax for declaring the table to be WITH OIDS has been removed (WITH (oids[ = true]) will error out) - pg_dump does not support dumping tables declared WITH OIDS and will issue a warning when dumping one (and ignore the oid column). - restoring an pg_dump archive with pg_restore will warn when restoring a table with oid contents (and ignore the oid column) - COPY will refuse to load binary dump that includes oids. - pg_upgrade will error out when encountering tables declared WITH OIDS, they have to be altered to remove the oid column first. - Functionality to access the oid of the last inserted row (like plpgsql's RESULT_OID, spi's SPI_lastoid, ...) has been removed. The syntax for declaring a table WITHOUT OIDS (or WITH (oids = false) for CREATE TABLE) is still supported. While that requires a bit of support code, it seems unnecessary to break applications / dumps that do not use oids, and are explicit about not using them. The biggest user of WITH OID columns was postgres' catalog. This commit changes all 'magic' oid columns to be columns that are normally declared and stored. To reduce unnecessary query breakage all the newly added columns are still named 'oid', even if a table's column naming scheme would indicate 'reloid' or such. This obviously requires adapting a lot code, mostly replacing oid access via HeapTupleGetOid() with access to the underlying Form_pg_*->oid column. The bootstrap process now assigns oids for all oid columns in genbki.pl that do not have an explicit value (starting at the largest oid previously used), only oids assigned later by oids will be above FirstBootstrapObjectId. As the oid column now is a normal column the special bootstrap syntax for oids has been removed. Oids are not automatically assigned during insertion anymore, all backend code explicitly assigns oids with GetNewOidWithIndex(). For the rare case that insertions into the catalog via SQL are called for the new pg_nextoid() function can be used (which only works on catalog tables). The fact that oid columns on system tables are now normal columns means that they will be included in the set of columns expanded by * (i.e. SELECT * FROM pg_class will now include the table's oid, previously it did not). It'd not technically be hard to hide oid column by default, but that'd mean confusing behavior would either have to be carried forward forever, or it'd cause breakage down the line. While it's not unlikely that further adjustments are needed, the scope/invasiveness of the patch makes it worthwhile to get merge this now. It's painful to maintain externally, too complicated to commit after the code code freeze, and a dependency of a number of other patches. Catversion bump, for obvious reasons. Author: Andres Freund, with contributions by John Naylor Discussion: https://postgr.es/m/20180930034810.ywp2c7awz7opzcfr@alap3.anarazel.de
2018-11-21 00:36:57 +01:00
* We disallow indexes on system columns. They would not necessarily get
* updated correctly, and they don't seem useful anyway.
*/
for (i = 0; i < indexInfo->ii_NumIndexAttrs; i++)
{
AttrNumber attno = indexInfo->ii_IndexAttrNumbers[i];
Remove WITH OIDS support, change oid catalog column visibility. Previously tables declared WITH OIDS, including a significant fraction of the catalog tables, stored the oid column not as a normal column, but as part of the tuple header. This special column was not shown by default, which was somewhat odd, as it's often (consider e.g. pg_class.oid) one of the more important parts of a row. Neither pg_dump nor COPY included the contents of the oid column by default. The fact that the oid column was not an ordinary column necessitated a significant amount of special case code to support oid columns. That already was painful for the existing, but upcoming work aiming to make table storage pluggable, would have required expanding and duplicating that "specialness" significantly. WITH OIDS has been deprecated since 2005 (commit ff02d0a05280e0). Remove it. Removing includes: - CREATE TABLE and ALTER TABLE syntax for declaring the table to be WITH OIDS has been removed (WITH (oids[ = true]) will error out) - pg_dump does not support dumping tables declared WITH OIDS and will issue a warning when dumping one (and ignore the oid column). - restoring an pg_dump archive with pg_restore will warn when restoring a table with oid contents (and ignore the oid column) - COPY will refuse to load binary dump that includes oids. - pg_upgrade will error out when encountering tables declared WITH OIDS, they have to be altered to remove the oid column first. - Functionality to access the oid of the last inserted row (like plpgsql's RESULT_OID, spi's SPI_lastoid, ...) has been removed. The syntax for declaring a table WITHOUT OIDS (or WITH (oids = false) for CREATE TABLE) is still supported. While that requires a bit of support code, it seems unnecessary to break applications / dumps that do not use oids, and are explicit about not using them. The biggest user of WITH OID columns was postgres' catalog. This commit changes all 'magic' oid columns to be columns that are normally declared and stored. To reduce unnecessary query breakage all the newly added columns are still named 'oid', even if a table's column naming scheme would indicate 'reloid' or such. This obviously requires adapting a lot code, mostly replacing oid access via HeapTupleGetOid() with access to the underlying Form_pg_*->oid column. The bootstrap process now assigns oids for all oid columns in genbki.pl that do not have an explicit value (starting at the largest oid previously used), only oids assigned later by oids will be above FirstBootstrapObjectId. As the oid column now is a normal column the special bootstrap syntax for oids has been removed. Oids are not automatically assigned during insertion anymore, all backend code explicitly assigns oids with GetNewOidWithIndex(). For the rare case that insertions into the catalog via SQL are called for the new pg_nextoid() function can be used (which only works on catalog tables). The fact that oid columns on system tables are now normal columns means that they will be included in the set of columns expanded by * (i.e. SELECT * FROM pg_class will now include the table's oid, previously it did not). It'd not technically be hard to hide oid column by default, but that'd mean confusing behavior would either have to be carried forward forever, or it'd cause breakage down the line. While it's not unlikely that further adjustments are needed, the scope/invasiveness of the patch makes it worthwhile to get merge this now. It's painful to maintain externally, too complicated to commit after the code code freeze, and a dependency of a number of other patches. Catversion bump, for obvious reasons. Author: Andres Freund, with contributions by John Naylor Discussion: https://postgr.es/m/20180930034810.ywp2c7awz7opzcfr@alap3.anarazel.de
2018-11-21 00:36:57 +01:00
if (attno < 0)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("index creation on system columns is not supported")));
}
/*
* Also check for system columns used in expressions or predicates.
*/
if (indexInfo->ii_Expressions || indexInfo->ii_Predicate)
{
Bitmapset *indexattrs = NULL;
pull_varattnos((Node *) indexInfo->ii_Expressions, 1, &indexattrs);
pull_varattnos((Node *) indexInfo->ii_Predicate, 1, &indexattrs);
for (i = FirstLowInvalidHeapAttributeNumber + 1; i < 0; i++)
{
Remove WITH OIDS support, change oid catalog column visibility. Previously tables declared WITH OIDS, including a significant fraction of the catalog tables, stored the oid column not as a normal column, but as part of the tuple header. This special column was not shown by default, which was somewhat odd, as it's often (consider e.g. pg_class.oid) one of the more important parts of a row. Neither pg_dump nor COPY included the contents of the oid column by default. The fact that the oid column was not an ordinary column necessitated a significant amount of special case code to support oid columns. That already was painful for the existing, but upcoming work aiming to make table storage pluggable, would have required expanding and duplicating that "specialness" significantly. WITH OIDS has been deprecated since 2005 (commit ff02d0a05280e0). Remove it. Removing includes: - CREATE TABLE and ALTER TABLE syntax for declaring the table to be WITH OIDS has been removed (WITH (oids[ = true]) will error out) - pg_dump does not support dumping tables declared WITH OIDS and will issue a warning when dumping one (and ignore the oid column). - restoring an pg_dump archive with pg_restore will warn when restoring a table with oid contents (and ignore the oid column) - COPY will refuse to load binary dump that includes oids. - pg_upgrade will error out when encountering tables declared WITH OIDS, they have to be altered to remove the oid column first. - Functionality to access the oid of the last inserted row (like plpgsql's RESULT_OID, spi's SPI_lastoid, ...) has been removed. The syntax for declaring a table WITHOUT OIDS (or WITH (oids = false) for CREATE TABLE) is still supported. While that requires a bit of support code, it seems unnecessary to break applications / dumps that do not use oids, and are explicit about not using them. The biggest user of WITH OID columns was postgres' catalog. This commit changes all 'magic' oid columns to be columns that are normally declared and stored. To reduce unnecessary query breakage all the newly added columns are still named 'oid', even if a table's column naming scheme would indicate 'reloid' or such. This obviously requires adapting a lot code, mostly replacing oid access via HeapTupleGetOid() with access to the underlying Form_pg_*->oid column. The bootstrap process now assigns oids for all oid columns in genbki.pl that do not have an explicit value (starting at the largest oid previously used), only oids assigned later by oids will be above FirstBootstrapObjectId. As the oid column now is a normal column the special bootstrap syntax for oids has been removed. Oids are not automatically assigned during insertion anymore, all backend code explicitly assigns oids with GetNewOidWithIndex(). For the rare case that insertions into the catalog via SQL are called for the new pg_nextoid() function can be used (which only works on catalog tables). The fact that oid columns on system tables are now normal columns means that they will be included in the set of columns expanded by * (i.e. SELECT * FROM pg_class will now include the table's oid, previously it did not). It'd not technically be hard to hide oid column by default, but that'd mean confusing behavior would either have to be carried forward forever, or it'd cause breakage down the line. While it's not unlikely that further adjustments are needed, the scope/invasiveness of the patch makes it worthwhile to get merge this now. It's painful to maintain externally, too complicated to commit after the code code freeze, and a dependency of a number of other patches. Catversion bump, for obvious reasons. Author: Andres Freund, with contributions by John Naylor Discussion: https://postgr.es/m/20180930034810.ywp2c7awz7opzcfr@alap3.anarazel.de
2018-11-21 00:36:57 +01:00
if (bms_is_member(i - FirstLowInvalidHeapAttributeNumber,
indexattrs))
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("index creation on system columns is not supported")));
}
}
/*
2005-10-15 04:49:52 +02:00
* Report index creation if appropriate (delay this till after most of the
* error checks)
*/
if (stmt->isconstraint && !quiet)
{
const char *constraint_type;
if (stmt->primary)
constraint_type = "PRIMARY KEY";
else if (stmt->unique)
constraint_type = "UNIQUE";
else if (stmt->excludeOpNames != NIL)
constraint_type = "EXCLUDE";
else
{
elog(ERROR, "unknown constraint type");
Phase 2 of pgindent updates. Change pg_bsd_indent to follow upstream rules for placement of comments to the right of code, and remove pgindent hack that caused comments following #endif to not obey the general rule. Commit e3860ffa4dd0dad0dd9eea4be9cc1412373a8c89 wasn't actually using the published version of pg_bsd_indent, but a hacked-up version that tried to minimize the amount of movement of comments to the right of code. The situation of interest is where such a comment has to be moved to the right of its default placement at column 33 because there's code there. BSD indent has always moved right in units of tab stops in such cases --- but in the previous incarnation, indent was working in 8-space tab stops, while now it knows we use 4-space tabs. So the net result is that in about half the cases, such comments are placed one tab stop left of before. This is better all around: it leaves more room on the line for comment text, and it means that in such cases the comment uniformly starts at the next 4-space tab stop after the code, rather than sometimes one and sometimes two tabs after. Also, ensure that comments following #endif are indented the same as comments following other preprocessor commands such as #else. That inconsistency turns out to have been self-inflicted damage from a poorly-thought-through post-indent "fixup" in pgindent. This patch is much less interesting than the first round of indent changes, but also bulkier, so I thought it best to separate the effects. Discussion: https://postgr.es/m/E1dAmxK-0006EE-1r@gemulon.postgresql.org Discussion: https://postgr.es/m/30527.1495162840@sss.pgh.pa.us
2017-06-21 21:18:54 +02:00
constraint_type = NULL; /* keep compiler quiet */
}
ereport(DEBUG1,
(errmsg("%s %s will create implicit index \"%s\" for table \"%s\"",
is_alter_table ? "ALTER TABLE / ADD" : "CREATE TABLE /",
constraint_type,
indexRelationName, RelationGetRelationName(rel))));
}
/*
* A valid stmt->oldNode implies that we already have a built form of the
* index. The caller should also decline any index build.
*/
Assert(!OidIsValid(stmt->oldNode) || (skip_build && !stmt->concurrent));
/*
* Make the catalog entries for the index, including constraints. This
* step also actually builds the index, except if caller requested not to
* or in concurrent mode, in which case it'll be done later, or doing a
* partitioned index (because those don't have storage).
*/
flags = constr_flags = 0;
if (stmt->isconstraint)
flags |= INDEX_CREATE_ADD_CONSTRAINT;
if (skip_build || stmt->concurrent || partitioned)
flags |= INDEX_CREATE_SKIP_BUILD;
if (stmt->if_not_exists)
flags |= INDEX_CREATE_IF_NOT_EXISTS;
if (stmt->concurrent)
flags |= INDEX_CREATE_CONCURRENT;
if (partitioned)
flags |= INDEX_CREATE_PARTITIONED;
if (stmt->primary)
flags |= INDEX_CREATE_IS_PRIMARY;
/*
* If the table is partitioned, and recursion was declined but partitions
* exist, mark the index as invalid.
*/
if (partitioned && stmt->relation && !stmt->relation->inh)
{
PartitionDesc pd = RelationGetPartitionDesc(rel);
if (pd->nparts != 0)
flags |= INDEX_CREATE_INVALID;
}
if (stmt->deferrable)
constr_flags |= INDEX_CONSTR_CREATE_DEFERRABLE;
if (stmt->initdeferred)
constr_flags |= INDEX_CONSTR_CREATE_INIT_DEFERRED;
indexRelationId =
index_create(rel, indexRelationName, indexRelationId, parentIndexId,
parentConstraintId,
stmt->oldNode, indexInfo, indexColNames,
accessMethodId, tablespaceId,
collationObjectId, classObjectId,
coloptions, reloptions,
flags, constr_flags,
allowSystemTableMods, !check_rights,
&createdConstraintId);
ObjectAddressSet(address, RelationRelationId, indexRelationId);
/*
* Revert to original default_tablespace. Must do this before any return
* from this function, but after index_create, so this is a good time.
*/
if (save_nestlevel >= 0)
AtEOXact_GUC(true, save_nestlevel);
if (!OidIsValid(indexRelationId))
{
table_close(rel, NoLock);
/* If this is the top-level index, we're done */
if (!OidIsValid(parentIndexId))
pgstat_progress_end_command();
return address;
}
/* Add any requested comment */
if (stmt->idxcomment != NULL)
CreateComments(indexRelationId, RelationRelationId, 0,
stmt->idxcomment);
if (partitioned)
{
/*
* Unless caller specified to skip this step (via ONLY), process each
* partition to make sure they all contain a corresponding index.
*
* If we're called internally (no stmt->relation), recurse always.
*/
if (!stmt->relation || stmt->relation->inh)
{
PartitionDesc partdesc = RelationGetPartitionDesc(rel);
int nparts = partdesc->nparts;
Oid *part_oids = palloc(sizeof(Oid) * nparts);
bool invalidate_parent = false;
TupleDesc parentDesc;
Oid *opfamOids;
pgstat_progress_update_param(PROGRESS_CREATEIDX_PARTITIONS_TOTAL,
nparts);
memcpy(part_oids, partdesc->oids, sizeof(Oid) * nparts);
parentDesc = CreateTupleDescCopy(RelationGetDescr(rel));
opfamOids = palloc(sizeof(Oid) * numberOfKeyAttributes);
for (i = 0; i < numberOfKeyAttributes; i++)
opfamOids[i] = get_opclass_family(classObjectId[i]);
table_close(rel, NoLock);
/*
* For each partition, scan all existing indexes; if one matches
* our index definition and is not already attached to some other
* parent index, attach it to the one we just created.
*
* If none matches, build a new index by calling ourselves
* recursively with the same options (except for the index name).
*/
for (i = 0; i < nparts; i++)
{
Oid childRelid = part_oids[i];
Relation childrel;
List *childidxs;
ListCell *cell;
AttrNumber *attmap;
bool found = false;
int maplen;
childrel = table_open(childRelid, lockmode);
childidxs = RelationGetIndexList(childrel);
attmap =
convert_tuples_by_name_map(RelationGetDescr(childrel),
parentDesc,
gettext_noop("could not convert row type"));
maplen = parentDesc->natts;
foreach(cell, childidxs)
{
Oid cldidxid = lfirst_oid(cell);
Relation cldidx;
IndexInfo *cldIdxInfo;
/* this index is already partition of another one */
if (has_superclass(cldidxid))
continue;
cldidx = index_open(cldidxid, lockmode);
cldIdxInfo = BuildIndexInfo(cldidx);
if (CompareIndexInfo(cldIdxInfo, indexInfo,
cldidx->rd_indcollation,
collationObjectId,
cldidx->rd_opfamily,
opfamOids,
attmap, maplen))
{
Oid cldConstrOid = InvalidOid;
/*
* Found a match.
*
* If this index is being created in the parent
* because of a constraint, then the child needs to
* have a constraint also, so look for one. If there
* is no such constraint, this index is no good, so
* keep looking.
*/
if (createdConstraintId != InvalidOid)
{
cldConstrOid =
get_relation_idx_constraint_oid(childRelid,
cldidxid);
if (cldConstrOid == InvalidOid)
{
index_close(cldidx, lockmode);
continue;
}
}
/* Attach index to parent and we're done. */
IndexSetParentIndex(cldidx, indexRelationId);
if (createdConstraintId != InvalidOid)
ConstraintSetParentConstraint(cldConstrOid,
Redesign the partition dependency mechanism. The original setup for dependencies of partitioned objects had serious problems: 1. It did not verify that a drop cascading to a partition-child object also cascaded to at least one of the object's partition parents. Now, normally a child object would share all its dependencies with one or another parent (e.g. a child index's opclass dependencies would be shared with the parent index), so that this oversight is usually harmless. But if some dependency failed to fit this pattern, the child could be dropped while all its parents remain, creating a logically broken situation. (It's easy to construct artificial cases that break it, such as attaching an unrelated extension dependency to the child object and then dropping the extension. I'm not sure if any less-artificial cases exist.) 2. Management of partition dependencies during ATTACH/DETACH PARTITION was complicated and buggy; for example, after detaching a partition table it was possible to create cases where a formerly-child index should be dropped and was not, because the correct set of dependencies had not been reconstructed. Less seriously, because multiple partition relationships were represented identically in pg_depend, there was an order-of-traversal dependency on which partition parent was cited in error messages. We also had some pre-existing order-of-traversal hazards for error messages related to internal and extension dependencies. This is cosmetic to users but causes testing problems. To fix #1, add a check at the end of the partition tree traversal to ensure that at least one partition parent got deleted. To fix #2, establish a new policy that partition dependencies are in addition to, not instead of, a child object's usual dependencies; in this way ATTACH/DETACH PARTITION need not cope with adding or removing the usual dependencies. To fix the cosmetic problem, distinguish between primary and secondary partition dependency entries in pg_depend, by giving them different deptypes. (They behave identically except for having different priorities for being cited in error messages.) This means that the former 'I' dependency type is replaced with new 'P' and 'S' types. This also fixes a longstanding bug that after handling an internal dependency by recursing to the owning object, findDependentObjects did not verify that the current target was now scheduled for deletion, and did not apply the current recursion level's objflags to it. Perhaps that should be back-patched; but in the back branches it would only matter if some concurrent transaction had removed the internal-linkage pg_depend entry before the recursive call found it, or the recursive call somehow failed to find it, both of which seem unlikely. Catversion bump because the contents of pg_depend change for partitioning relationships. Patch HEAD only. It's annoying that we're not fixing #2 in v11, but there seems no practical way to do so given that the problem is exactly a poor choice of what entries to put in pg_depend. We can't really fix that while staying compatible with what's in pg_depend in existing v11 installations. Discussion: https://postgr.es/m/CAH2-Wzkypv1R+teZrr71U23J578NnTBt2X8+Y=Odr4pOdW1rXg@mail.gmail.com
2019-02-11 20:41:13 +01:00
createdConstraintId,
childRelid);
if (!cldidx->rd_index->indisvalid)
invalidate_parent = true;
found = true;
/* keep lock till commit */
index_close(cldidx, NoLock);
break;
}
index_close(cldidx, lockmode);
}
list_free(childidxs);
table_close(childrel, NoLock);
/*
* If no matching index was found, create our own.
*/
if (!found)
{
IndexStmt *childStmt = copyObject(stmt);
bool found_whole_row;
ListCell *lc;
Apply stopgap fix for bug #15672. Fix DefineIndex so that it doesn't attempt to pass down a to-be-reused index relfilenode to a child index creation, and fix TryReuseIndex to not think that reuse is sensible for a partitioned index. In v11, this fixes a problem where ALTER TABLE on a partitioned table could assign the same relfilenode to several different child indexes, causing very nasty catalog corruption --- in fact, attempting to DROP the partitioned table then leads not only to a database crash, but to inability to restart because the same crash will recur during WAL replay. Either of these two changes would be enough to prevent the failure, but since neither action could possibly be sane, let's put in both changes for future-proofing. In HEAD, no such bug manifests, but that's just an accidental consequence of having changed the pg_class representation of partitioned indexes to have relfilenode = 0. Both of these changes still seem like smart future-proofing. This is only a stop-gap because the code for ALTER TABLE on a partitioned table with a no-op type change still leaves a great deal to be desired. As the added regression tests show, it gets things wrong for comments on child indexes/constraints, and it is regenerating child indexes it doesn't have to. However, fixing those problems will take more work which may not get back-patched into v11. We need a fix for the corruption problem now. Per bug #15672 from Jianing Yang. Patch by me, regression test cases based on work by Amit Langote, who also did a lot of the investigative work. Discussion: https://postgr.es/m/15672-b9fa7db32698269f@postgresql.org
2019-04-26 23:18:07 +02:00
/*
* We can't use the same index name for the child index,
* so clear idxname to let the recursive invocation choose
* a new name. Likewise, the existing target relation
* field is wrong, and if indexOid or oldNode are set,
* they mustn't be applied to the child either.
*/
childStmt->idxname = NULL;
childStmt->relation = NULL;
childStmt->indexOid = InvalidOid;
childStmt->oldNode = InvalidOid;
/*
* Adjust any Vars (both in expressions and in the index's
* WHERE clause) to match the partition's column numbering
* in case it's different from the parent's.
*/
foreach(lc, childStmt->indexParams)
{
2018-06-30 18:25:49 +02:00
IndexElem *ielem = lfirst(lc);
/*
* If the index parameter is an expression, we must
* translate it to contain child Vars.
*/
if (ielem->expr)
{
ielem->expr =
map_variable_attnos((Node *) ielem->expr,
1, 0, attmap, maplen,
InvalidOid,
&found_whole_row);
if (found_whole_row)
elog(ERROR, "cannot convert whole-row table reference");
}
}
childStmt->whereClause =
map_variable_attnos(stmt->whereClause, 1, 0,
attmap, maplen,
InvalidOid, &found_whole_row);
if (found_whole_row)
elog(ERROR, "cannot convert whole-row table reference");
DefineIndex(childRelid, childStmt,
InvalidOid, /* no predefined OID */
indexRelationId, /* this is our child */
createdConstraintId,
is_alter_table, check_rights, check_not_in_use,
skip_build, quiet);
}
pgstat_progress_update_param(PROGRESS_CREATEIDX_PARTITIONS_DONE,
i + 1);
pfree(attmap);
}
/*
* The pg_index row we inserted for this index was marked
* indisvalid=true. But if we attached an existing index that is
* invalid, this is incorrect, so update our row to invalid too.
*/
if (invalidate_parent)
{
Relation pg_index = table_open(IndexRelationId, RowExclusiveLock);
HeapTuple tup,
newtup;
tup = SearchSysCache1(INDEXRELID,
ObjectIdGetDatum(indexRelationId));
if (!tup)
elog(ERROR, "cache lookup failed for index %u",
indexRelationId);
newtup = heap_copytuple(tup);
((Form_pg_index) GETSTRUCT(newtup))->indisvalid = false;
CatalogTupleUpdate(pg_index, &tup->t_self, newtup);
ReleaseSysCache(tup);
table_close(pg_index, RowExclusiveLock);
heap_freetuple(newtup);
}
}
else
table_close(rel, NoLock);
/*
* Indexes on partitioned tables are not themselves built, so we're
* done here.
*/
if (!OidIsValid(parentIndexId))
pgstat_progress_end_command();
return address;
}
if (!stmt->concurrent)
{
/* Close the heap and we're done, in the non-concurrent case */
table_close(rel, NoLock);
/* If this is the top-level index, we're done. */
if (!OidIsValid(parentIndexId))
pgstat_progress_end_command();
return address;
}
/* save lockrelid and locktag for below, then close rel */
heaprelid = rel->rd_lockInfo.lockRelId;
SET_LOCKTAG_RELATION(heaplocktag, heaprelid.dbId, heaprelid.relId);
table_close(rel, NoLock);
/*
* For a concurrent build, it's important to make the catalog entries
2010-02-26 03:01:40 +01:00
* visible to other transactions before we start to build the index. That
* will prevent them from making incompatible HOT updates. The new index
2010-02-26 03:01:40 +01:00
* will be marked not indisready and not indisvalid, so that no one else
* tries to either insert into it or use it for queries.
*
* We must commit our current transaction so that the index becomes
2006-10-04 02:30:14 +02:00
* visible; then start another. Note that all the data structures we just
* built are lost in the commit. The only data we keep past here are the
* relation IDs.
*
* Before committing, get a session-level lock on the table, to ensure
2006-10-04 02:30:14 +02:00
* that neither it nor the index can be dropped before we finish. This
* cannot block, even if someone else is waiting for access, because we
* already have the same lock within our transaction.
*
* Note: we don't currently bother with a session lock on the index,
2006-10-04 02:30:14 +02:00
* because there are no operations that could change its state while we
* hold lock on the parent table. This might need to change later.
*/
LockRelationIdForSession(&heaprelid, ShareUpdateExclusiveLock);
PopActiveSnapshot();
CommitTransactionCommand();
StartTransactionCommand();
/*
* The index is now visible, so we can report the OID.
*/
pgstat_progress_update_param(PROGRESS_CREATEIDX_INDEX_OID,
indexRelationId);
/*
* Phase 2 of concurrent index build (see comments for validate_index()
* for an overview of how this works)
*
* Now we must wait until no running transaction could have the table open
* with the old list of indexes. Use ShareLock to consider running
* transactions that hold locks that permit writing to the table. Note we
* do not need to worry about xacts that open the table for writing after
* this point; they will see the new index when they open it.
*
2007-11-15 22:14:46 +01:00
* Note: the reason we use actual lock acquisition here, rather than just
* checking the ProcArray and sleeping, is that deadlock is possible if
* one of the transactions in question is blocked trying to acquire an
* exclusive lock on our table. The lock code will detect deadlock and
* error out properly.
*/
pgstat_progress_update_param(PROGRESS_CREATEIDX_PHASE,
PROGRESS_CREATEIDX_PHASE_WAIT_1);
WaitForLockers(heaplocktag, ShareLock, true);
/*
* At this moment we are sure that there are no transactions with the
* table open for write that don't have this new index in their list of
* indexes. We have waited out all the existing transactions and any new
* transaction will have the new index in its list, but the index is still
* marked as "not-ready-for-inserts". The index is consulted while
* deciding HOT-safety though. This arrangement ensures that no new HOT
* chains can be created where the new tuple and the old tuple in the
* chain have different index keys.
*
* We now take a new snapshot, and build the index using all tuples that
2007-11-15 22:14:46 +01:00
* are visible in this snapshot. We can be sure that any HOT updates to
* these tuples will be compatible with the index, since any updates made
* by transactions that didn't know about the index are now committed or
* rolled back. Thus, each visible tuple is either the end of its
* HOT-chain or the extension of the chain is HOT-safe for this index.
*/
/* Set ActiveSnapshot since functions in the indexes may need it */
PushActiveSnapshot(GetTransactionSnapshot());
/* Perform concurrent build of index */
index_concurrently_build(relationId, indexRelationId);
/* we can do away with our snapshot */
PopActiveSnapshot();
/*
* Commit this transaction to make the indisready update visible.
*/
CommitTransactionCommand();
StartTransactionCommand();
/*
* Phase 3 of concurrent index build
*
* We once again wait until no transaction can have the table open with
* the index marked as read-only for updates.
*/
pgstat_progress_update_param(PROGRESS_CREATEIDX_PHASE,
PROGRESS_CREATEIDX_PHASE_WAIT_2);
WaitForLockers(heaplocktag, ShareLock, true);
/*
* Now take the "reference snapshot" that will be used by validate_index()
* to filter candidate tuples. Beware! There might still be snapshots in
2007-11-15 22:14:46 +01:00
* use that treat some transaction as in-progress that our reference
* snapshot treats as committed. If such a recently-committed transaction
* deleted tuples in the table, we will not include them in the index; yet
* those transactions which see the deleting one as still-in-progress will
* expect such tuples to be there once we mark the index as valid.
*
* We solve this by waiting for all endangered transactions to exit before
* we mark the index as valid.
*
2006-10-04 02:30:14 +02:00
* We also set ActiveSnapshot to this snap, since functions in indexes may
* need a snapshot.
*/
snapshot = RegisterSnapshot(GetTransactionSnapshot());
PushActiveSnapshot(snapshot);
/*
* Scan the index and the heap, insert any missing index entries.
*/
validate_index(relationId, indexRelationId, snapshot);
/*
* Drop the reference snapshot. We must do this before waiting out other
* snapshot holders, else we will deadlock against other processes also
* doing CREATE INDEX CONCURRENTLY, which would see our snapshot as one
* they must wait for. But first, save the snapshot's xmin to use as
* limitXmin for GetCurrentVirtualXIDs().
*/
limitXmin = snapshot->xmin;
PopActiveSnapshot();
UnregisterSnapshot(snapshot);
/*
* The snapshot subsystem could still contain registered snapshots that
* are holding back our process's advertised xmin; in particular, if
* default_transaction_isolation = serializable, there is a transaction
* snapshot that is still active. The CatalogSnapshot is likewise a
* hazard. To ensure no deadlocks, we must commit and start yet another
* transaction, and do our wait before any snapshot has been taken in it.
*/
CommitTransactionCommand();
StartTransactionCommand();
/* We should now definitely not be advertising any xmin. */
Assert(MyPgXact->xmin == InvalidTransactionId);
/*
* The index is now valid in the sense that it contains all currently
* interesting tuples. But since it might not contain tuples deleted just
2006-10-04 02:30:14 +02:00
* before the reference snap was taken, we have to wait out any
* transactions that might have older snapshots.
*/
pgstat_progress_update_param(PROGRESS_CREATEIDX_PHASE,
PROGRESS_CREATEIDX_PHASE_WAIT_3);
WaitForOlderSnapshots(limitXmin, true);
/*
* Index can now be marked valid -- update its pg_index entry
*/
Fix assorted bugs in CREATE/DROP INDEX CONCURRENTLY. Commit 8cb53654dbdb4c386369eb988062d0bbb6de725e, which introduced DROP INDEX CONCURRENTLY, managed to break CREATE INDEX CONCURRENTLY via a poor choice of catalog state representation. The pg_index state for an index that's reached the final pre-drop stage was the same as the state for an index just created by CREATE INDEX CONCURRENTLY. This meant that the (necessary) change to make RelationGetIndexList ignore about-to-die indexes also made it ignore freshly-created indexes; which is catastrophic because the latter do need to be considered in HOT-safety decisions. Failure to do so leads to incorrect index entries and subsequently wrong results from queries depending on the concurrently-created index. To fix, add an additional boolean column "indislive" to pg_index, so that the freshly-created and about-to-die states can be distinguished. (This change obviously is only possible in HEAD. This patch will need to be back-patched, but in 9.2 we'll use a kluge consisting of overloading the formerly-impossible state of indisvalid = true and indisready = false.) In addition, change CREATE/DROP INDEX CONCURRENTLY so that the pg_index flag changes they make without exclusive lock on the index are made via heap_inplace_update() rather than a normal transactional update. The latter is not very safe because moving the pg_index tuple could result in concurrent SnapshotNow scans finding it twice or not at all, thus possibly resulting in index corruption. This is a pre-existing bug in CREATE INDEX CONCURRENTLY, which was copied into the DROP code. In addition, fix various places in the code that ought to check to make sure that the indexes they are manipulating are valid and/or ready as appropriate. These represent bugs that have existed since 8.2, since a failed CREATE INDEX CONCURRENTLY could leave a corrupt or invalid index behind, and we ought not try to do anything that might fail with such an index. Also fix RelationReloadIndexInfo to ensure it copies all the pg_index columns that are allowed to change after initial creation. Previously we could have been left with stale values of some fields in an index relcache entry. It's not clear whether this actually had any user-visible consequences, but it's at least a bug waiting to happen. In addition, do some code and docs review for DROP INDEX CONCURRENTLY; some cosmetic code cleanup but mostly addition and revision of comments. This will need to be back-patched, but in a noticeably different form, so I'm committing it to HEAD before working on the back-patch. Problem reported by Amit Kapila, diagnosis by Pavan Deolassee, fix by Tom Lane and Andres Freund.
2012-11-29 03:25:27 +01:00
index_set_state_flags(indexRelationId, INDEX_CREATE_SET_VALID);
/*
* The pg_index update will cause backends (including this one) to update
* relcache entries for the index itself, but we should also send a
* relcache inval on the parent table to force replanning of cached plans.
* Otherwise existing sessions might fail to use the new index where it
2007-11-15 22:14:46 +01:00
* would be useful. (Note that our earlier commits did not create reasons
Fix assorted bugs in CREATE/DROP INDEX CONCURRENTLY. Commit 8cb53654dbdb4c386369eb988062d0bbb6de725e, which introduced DROP INDEX CONCURRENTLY, managed to break CREATE INDEX CONCURRENTLY via a poor choice of catalog state representation. The pg_index state for an index that's reached the final pre-drop stage was the same as the state for an index just created by CREATE INDEX CONCURRENTLY. This meant that the (necessary) change to make RelationGetIndexList ignore about-to-die indexes also made it ignore freshly-created indexes; which is catastrophic because the latter do need to be considered in HOT-safety decisions. Failure to do so leads to incorrect index entries and subsequently wrong results from queries depending on the concurrently-created index. To fix, add an additional boolean column "indislive" to pg_index, so that the freshly-created and about-to-die states can be distinguished. (This change obviously is only possible in HEAD. This patch will need to be back-patched, but in 9.2 we'll use a kluge consisting of overloading the formerly-impossible state of indisvalid = true and indisready = false.) In addition, change CREATE/DROP INDEX CONCURRENTLY so that the pg_index flag changes they make without exclusive lock on the index are made via heap_inplace_update() rather than a normal transactional update. The latter is not very safe because moving the pg_index tuple could result in concurrent SnapshotNow scans finding it twice or not at all, thus possibly resulting in index corruption. This is a pre-existing bug in CREATE INDEX CONCURRENTLY, which was copied into the DROP code. In addition, fix various places in the code that ought to check to make sure that the indexes they are manipulating are valid and/or ready as appropriate. These represent bugs that have existed since 8.2, since a failed CREATE INDEX CONCURRENTLY could leave a corrupt or invalid index behind, and we ought not try to do anything that might fail with such an index. Also fix RelationReloadIndexInfo to ensure it copies all the pg_index columns that are allowed to change after initial creation. Previously we could have been left with stale values of some fields in an index relcache entry. It's not clear whether this actually had any user-visible consequences, but it's at least a bug waiting to happen. In addition, do some code and docs review for DROP INDEX CONCURRENTLY; some cosmetic code cleanup but mostly addition and revision of comments. This will need to be back-patched, but in a noticeably different form, so I'm committing it to HEAD before working on the back-patch. Problem reported by Amit Kapila, diagnosis by Pavan Deolassee, fix by Tom Lane and Andres Freund.
2012-11-29 03:25:27 +01:00
* to replan; so relcache flush on the index itself was sufficient.)
*/
CacheInvalidateRelcacheByRelid(heaprelid.relId);
/*
* Last thing to do is release the session-level lock on the parent table.
*/
UnlockRelationIdForSession(&heaprelid, ShareUpdateExclusiveLock);
pgstat_progress_end_command();
return address;
}
/*
* CheckMutability
* Test whether given expression is mutable
*/
static bool
CheckMutability(Expr *expr)
{
/*
* First run the expression through the planner. This has a couple of
* important consequences. First, function default arguments will get
* inserted, which may affect volatility (consider "default now()").
* Second, inline-able functions will get inlined, which may allow us to
2010-07-06 21:19:02 +02:00
* conclude that the function is really less volatile than it's marked. As
* an example, polymorphic functions must be marked with the most volatile
* behavior that they have for any input type, but once we inline the
* function we may be able to conclude that it's not so volatile for the
* particular input type we're dealing with.
*
* We assume here that expression_planner() won't scribble on its input.
*/
expr = expression_planner(expr);
/* Now we can search for non-immutable functions */
return contain_mutable_functions((Node *) expr);
}
/*
* CheckPredicate
* Checks that the given partial-index predicate is valid.
*
* This used to also constrain the form of the predicate to forms that
* indxpath.c could do something with. However, that seems overly
* restrictive. One useful application of partial indexes is to apply
* a UNIQUE constraint across a subset of a table, and in that scenario
* any evaluable predicate will work. So accept any predicate here
* (except ones requiring a plan), and let indxpath.c fend for itself.
*/
static void
CheckPredicate(Expr *predicate)
{
/*
* transformExpr() should have already rejected subqueries, aggregates,
* and window functions, based on the EXPR_KIND_ for a predicate.
*/
/*
2002-09-04 22:31:48 +02:00
* A predicate using mutable functions is probably wrong, for the same
* reasons that we don't allow an index expression to use one.
*/
if (CheckMutability(predicate))
ereport(ERROR,
(errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
errmsg("functions in index predicate must be marked IMMUTABLE")));
}
/*
* Compute per-index-column information, including indexed column numbers
* or index expressions, opclasses, and indoptions. Note, all output vectors
* should be allocated for all columns, including "including" ones.
*/
static void
ComputeIndexAttrs(IndexInfo *indexInfo,
Oid *typeOidP,
Oid *collationOidP,
Oid *classOidP,
int16 *colOptionP,
List *attList, /* list of IndexElem's */
List *exclusionOpNames,
Oid relId,
const char *accessMethodName,
Oid accessMethodId,
bool amcanorder,
bool isconstraint)
{
ListCell *nextExclOp;
ListCell *lc;
int attn;
int nkeycols = indexInfo->ii_NumIndexKeyAttrs;
/* Allocate space for exclusion operator info, if needed */
if (exclusionOpNames)
{
Assert(list_length(exclusionOpNames) == nkeycols);
indexInfo->ii_ExclusionOps = (Oid *) palloc(sizeof(Oid) * nkeycols);
indexInfo->ii_ExclusionProcs = (Oid *) palloc(sizeof(Oid) * nkeycols);
indexInfo->ii_ExclusionStrats = (uint16 *) palloc(sizeof(uint16) * nkeycols);
nextExclOp = list_head(exclusionOpNames);
}
else
nextExclOp = NULL;
/*
* process attributeList
*/
attn = 0;
foreach(lc, attList)
{
IndexElem *attribute = (IndexElem *) lfirst(lc);
Oid atttype;
Oid attcollation;
/*
* Process the column-or-expression to be indexed.
*/
if (attribute->name != NULL)
{
/* Simple index attribute */
HeapTuple atttuple;
Form_pg_attribute attform;
Assert(attribute->expr == NULL);
atttuple = SearchSysCacheAttName(relId, attribute->name);
if (!HeapTupleIsValid(atttuple))
{
/* difference in error message spellings is historical */
if (isconstraint)
ereport(ERROR,
(errcode(ERRCODE_UNDEFINED_COLUMN),
errmsg("column \"%s\" named in key does not exist",
attribute->name)));
else
ereport(ERROR,
(errcode(ERRCODE_UNDEFINED_COLUMN),
errmsg("column \"%s\" does not exist",
attribute->name)));
}
attform = (Form_pg_attribute) GETSTRUCT(atttuple);
indexInfo->ii_IndexAttrNumbers[attn] = attform->attnum;
atttype = attform->atttypid;
attcollation = attform->attcollation;
ReleaseSysCache(atttuple);
}
else
{
/* Index expression */
2011-04-10 17:42:00 +02:00
Node *expr = attribute->expr;
Assert(expr != NULL);
if (attn >= nkeycols)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("expressions are not supported in included columns")));
atttype = exprType(expr);
attcollation = exprCollation(expr);
/*
* Strip any top-level COLLATE clause. This ensures that we treat
* "x COLLATE y" and "(x COLLATE y)" alike.
*/
while (IsA(expr, CollateExpr))
expr = (Node *) ((CollateExpr *) expr)->arg;
if (IsA(expr, Var) &&
((Var *) expr)->varattno != InvalidAttrNumber)
{
/*
* User wrote "(column)" or "(column COLLATE something)".
* Treat it like simple attribute anyway.
*/
indexInfo->ii_IndexAttrNumbers[attn] = ((Var *) expr)->varattno;
}
else
{
indexInfo->ii_IndexAttrNumbers[attn] = 0; /* marks expression */
indexInfo->ii_Expressions = lappend(indexInfo->ii_Expressions,
expr);
/*
* transformExpr() should have already rejected subqueries,
* aggregates, and window functions, based on the EXPR_KIND_
* for an index expression.
*/
/*
* An expression using mutable functions is probably wrong,
* since if you aren't going to get the same result for the
* same data every time, it's not clear what the index entries
* mean at all.
*/
if (CheckMutability((Expr *) expr))
ereport(ERROR,
(errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
errmsg("functions in index expression must be marked IMMUTABLE")));
}
}
typeOidP[attn] = atttype;
/*
* Included columns have no collation, no opclass and no ordering
* options.
*/
if (attn >= nkeycols)
{
if (attribute->collation)
ereport(ERROR,
(errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
errmsg("including column does not support a collation")));
if (attribute->opclass)
ereport(ERROR,
(errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
errmsg("including column does not support an operator class")));
if (attribute->ordering != SORTBY_DEFAULT)
ereport(ERROR,
(errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
errmsg("including column does not support ASC/DESC options")));
if (attribute->nulls_ordering != SORTBY_NULLS_DEFAULT)
ereport(ERROR,
(errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
errmsg("including column does not support NULLS FIRST/LAST options")));
classOidP[attn] = InvalidOid;
colOptionP[attn] = 0;
collationOidP[attn] = InvalidOid;
attn++;
continue;
}
/*
* Apply collation override if any
*/
if (attribute->collation)
attcollation = get_collation_oid(attribute->collation, false);
/*
* Check we have a collation iff it's a collatable type. The only
* expected failures here are (1) COLLATE applied to a noncollatable
2011-04-10 17:42:00 +02:00
* type, or (2) index expression had an unresolved collation. But we
* might as well code this to be a complete consistency check.
*/
if (type_is_collatable(atttype))
{
if (!OidIsValid(attcollation))
ereport(ERROR,
(errcode(ERRCODE_INDETERMINATE_COLLATION),
errmsg("could not determine which collation to use for index expression"),
errhint("Use the COLLATE clause to set the collation explicitly.")));
}
else
{
if (OidIsValid(attcollation))
ereport(ERROR,
(errcode(ERRCODE_DATATYPE_MISMATCH),
errmsg("collations are not supported by type %s",
format_type_be(atttype))));
}
collationOidP[attn] = attcollation;
/*
* Identify the opclass to use.
*/
Implement table partitioning. Table partitioning is like table inheritance and reuses much of the existing infrastructure, but there are some important differences. The parent is called a partitioned table and is always empty; it may not have indexes or non-inherited constraints, since those make no sense for a relation with no data of its own. The children are called partitions and contain all of the actual data. Each partition has an implicit partitioning constraint. Multiple inheritance is not allowed, and partitioning and inheritance can't be mixed. Partitions can't have extra columns and may not allow nulls unless the parent does. Tuples inserted into the parent are automatically routed to the correct partition, so tuple-routing ON INSERT triggers are not needed. Tuple routing isn't yet supported for partitions which are foreign tables, and it doesn't handle updates that cross partition boundaries. Currently, tables can be range-partitioned or list-partitioned. List partitioning is limited to a single column, but range partitioning can involve multiple columns. A partitioning "column" can be an expression. Because table partitioning is less general than table inheritance, it is hoped that it will be easier to reason about properties of partitions, and therefore that this will serve as a better foundation for a variety of possible optimizations, including query planner optimizations. The tuple routing based which this patch does based on the implicit partitioning constraints is an example of this, but it seems likely that many other useful optimizations are also possible. Amit Langote, reviewed and tested by Robert Haas, Ashutosh Bapat, Amit Kapila, Rajkumar Raghuwanshi, Corey Huinker, Jaime Casanova, Rushabh Lathia, Erik Rijkers, among others. Minor revisions by me.
2016-12-07 19:17:43 +01:00
classOidP[attn] = ResolveOpClass(attribute->opclass,
atttype,
accessMethodName,
accessMethodId);
/*
* Identify the exclusion operator, if any.
*/
if (nextExclOp)
{
2010-02-26 03:01:40 +01:00
List *opname = (List *) lfirst(nextExclOp);
Oid opid;
Oid opfamily;
int strat;
/*
* Find the operator --- it must accept the column datatype
* without runtime coercion (but binary compatibility is OK)
*/
opid = compatible_oper_opid(opname, atttype, atttype, false);
/*
* Only allow commutative operators to be used in exclusion
* constraints. If X conflicts with Y, but Y does not conflict
* with X, bad things will happen.
*/
if (get_commutator(opid) != opid)
ereport(ERROR,
(errcode(ERRCODE_WRONG_OBJECT_TYPE),
errmsg("operator %s is not commutative",
format_operator(opid)),
errdetail("Only commutative operators can be used in exclusion constraints.")));
/*
* Operator must be a member of the right opfamily, too
*/
opfamily = get_opclass_family(classOidP[attn]);
strat = get_op_opfamily_strategy(opid, opfamily);
if (strat == 0)
{
2010-02-26 03:01:40 +01:00
HeapTuple opftuple;
Form_pg_opfamily opfform;
/*
* attribute->opclass might not explicitly name the opfamily,
* so fetch the name of the selected opfamily for use in the
* error message.
*/
opftuple = SearchSysCache1(OPFAMILYOID,
ObjectIdGetDatum(opfamily));
if (!HeapTupleIsValid(opftuple))
elog(ERROR, "cache lookup failed for opfamily %u",
opfamily);
opfform = (Form_pg_opfamily) GETSTRUCT(opftuple);
ereport(ERROR,
(errcode(ERRCODE_WRONG_OBJECT_TYPE),
errmsg("operator %s is not a member of operator family \"%s\"",
format_operator(opid),
NameStr(opfform->opfname)),
errdetail("The exclusion operator must be related to the index operator class for the constraint.")));
}
indexInfo->ii_ExclusionOps[attn] = opid;
indexInfo->ii_ExclusionProcs[attn] = get_opcode(opid);
indexInfo->ii_ExclusionStrats[attn] = strat;
nextExclOp = lnext(nextExclOp);
}
/*
2007-11-15 22:14:46 +01:00
* Set up the per-column options (indoption field). For now, this is
* zero for any un-ordered index, while ordered indexes have DESC and
* NULLS FIRST/LAST options.
*/
colOptionP[attn] = 0;
if (amcanorder)
{
/* default ordering is ASC */
if (attribute->ordering == SORTBY_DESC)
colOptionP[attn] |= INDOPTION_DESC;
/* default null ordering is LAST for ASC, FIRST for DESC */
if (attribute->nulls_ordering == SORTBY_NULLS_DEFAULT)
{
if (attribute->ordering == SORTBY_DESC)
colOptionP[attn] |= INDOPTION_NULLS_FIRST;
}
else if (attribute->nulls_ordering == SORTBY_NULLS_FIRST)
colOptionP[attn] |= INDOPTION_NULLS_FIRST;
}
else
{
/* index AM does not support ordering */
if (attribute->ordering != SORTBY_DEFAULT)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("access method \"%s\" does not support ASC/DESC options",
accessMethodName)));
if (attribute->nulls_ordering != SORTBY_NULLS_DEFAULT)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("access method \"%s\" does not support NULLS FIRST/LAST options",
accessMethodName)));
}
attn++;
}
}
/*
* Resolve possibly-defaulted operator class specification
Implement table partitioning. Table partitioning is like table inheritance and reuses much of the existing infrastructure, but there are some important differences. The parent is called a partitioned table and is always empty; it may not have indexes or non-inherited constraints, since those make no sense for a relation with no data of its own. The children are called partitions and contain all of the actual data. Each partition has an implicit partitioning constraint. Multiple inheritance is not allowed, and partitioning and inheritance can't be mixed. Partitions can't have extra columns and may not allow nulls unless the parent does. Tuples inserted into the parent are automatically routed to the correct partition, so tuple-routing ON INSERT triggers are not needed. Tuple routing isn't yet supported for partitions which are foreign tables, and it doesn't handle updates that cross partition boundaries. Currently, tables can be range-partitioned or list-partitioned. List partitioning is limited to a single column, but range partitioning can involve multiple columns. A partitioning "column" can be an expression. Because table partitioning is less general than table inheritance, it is hoped that it will be easier to reason about properties of partitions, and therefore that this will serve as a better foundation for a variety of possible optimizations, including query planner optimizations. The tuple routing based which this patch does based on the implicit partitioning constraints is an example of this, but it seems likely that many other useful optimizations are also possible. Amit Langote, reviewed and tested by Robert Haas, Ashutosh Bapat, Amit Kapila, Rajkumar Raghuwanshi, Corey Huinker, Jaime Casanova, Rushabh Lathia, Erik Rijkers, among others. Minor revisions by me.
2016-12-07 19:17:43 +01:00
*
* Note: This is used to resolve operator class specification in index and
* partition key definitions.
*/
Implement table partitioning. Table partitioning is like table inheritance and reuses much of the existing infrastructure, but there are some important differences. The parent is called a partitioned table and is always empty; it may not have indexes or non-inherited constraints, since those make no sense for a relation with no data of its own. The children are called partitions and contain all of the actual data. Each partition has an implicit partitioning constraint. Multiple inheritance is not allowed, and partitioning and inheritance can't be mixed. Partitions can't have extra columns and may not allow nulls unless the parent does. Tuples inserted into the parent are automatically routed to the correct partition, so tuple-routing ON INSERT triggers are not needed. Tuple routing isn't yet supported for partitions which are foreign tables, and it doesn't handle updates that cross partition boundaries. Currently, tables can be range-partitioned or list-partitioned. List partitioning is limited to a single column, but range partitioning can involve multiple columns. A partitioning "column" can be an expression. Because table partitioning is less general than table inheritance, it is hoped that it will be easier to reason about properties of partitions, and therefore that this will serve as a better foundation for a variety of possible optimizations, including query planner optimizations. The tuple routing based which this patch does based on the implicit partitioning constraints is an example of this, but it seems likely that many other useful optimizations are also possible. Amit Langote, reviewed and tested by Robert Haas, Ashutosh Bapat, Amit Kapila, Rajkumar Raghuwanshi, Corey Huinker, Jaime Casanova, Rushabh Lathia, Erik Rijkers, among others. Minor revisions by me.
2016-12-07 19:17:43 +01:00
Oid
ResolveOpClass(List *opclass, Oid attrType,
const char *accessMethodName, Oid accessMethodId)
{
char *schemaname;
char *opcname;
HeapTuple tuple;
Remove WITH OIDS support, change oid catalog column visibility. Previously tables declared WITH OIDS, including a significant fraction of the catalog tables, stored the oid column not as a normal column, but as part of the tuple header. This special column was not shown by default, which was somewhat odd, as it's often (consider e.g. pg_class.oid) one of the more important parts of a row. Neither pg_dump nor COPY included the contents of the oid column by default. The fact that the oid column was not an ordinary column necessitated a significant amount of special case code to support oid columns. That already was painful for the existing, but upcoming work aiming to make table storage pluggable, would have required expanding and duplicating that "specialness" significantly. WITH OIDS has been deprecated since 2005 (commit ff02d0a05280e0). Remove it. Removing includes: - CREATE TABLE and ALTER TABLE syntax for declaring the table to be WITH OIDS has been removed (WITH (oids[ = true]) will error out) - pg_dump does not support dumping tables declared WITH OIDS and will issue a warning when dumping one (and ignore the oid column). - restoring an pg_dump archive with pg_restore will warn when restoring a table with oid contents (and ignore the oid column) - COPY will refuse to load binary dump that includes oids. - pg_upgrade will error out when encountering tables declared WITH OIDS, they have to be altered to remove the oid column first. - Functionality to access the oid of the last inserted row (like plpgsql's RESULT_OID, spi's SPI_lastoid, ...) has been removed. The syntax for declaring a table WITHOUT OIDS (or WITH (oids = false) for CREATE TABLE) is still supported. While that requires a bit of support code, it seems unnecessary to break applications / dumps that do not use oids, and are explicit about not using them. The biggest user of WITH OID columns was postgres' catalog. This commit changes all 'magic' oid columns to be columns that are normally declared and stored. To reduce unnecessary query breakage all the newly added columns are still named 'oid', even if a table's column naming scheme would indicate 'reloid' or such. This obviously requires adapting a lot code, mostly replacing oid access via HeapTupleGetOid() with access to the underlying Form_pg_*->oid column. The bootstrap process now assigns oids for all oid columns in genbki.pl that do not have an explicit value (starting at the largest oid previously used), only oids assigned later by oids will be above FirstBootstrapObjectId. As the oid column now is a normal column the special bootstrap syntax for oids has been removed. Oids are not automatically assigned during insertion anymore, all backend code explicitly assigns oids with GetNewOidWithIndex(). For the rare case that insertions into the catalog via SQL are called for the new pg_nextoid() function can be used (which only works on catalog tables). The fact that oid columns on system tables are now normal columns means that they will be included in the set of columns expanded by * (i.e. SELECT * FROM pg_class will now include the table's oid, previously it did not). It'd not technically be hard to hide oid column by default, but that'd mean confusing behavior would either have to be carried forward forever, or it'd cause breakage down the line. While it's not unlikely that further adjustments are needed, the scope/invasiveness of the patch makes it worthwhile to get merge this now. It's painful to maintain externally, too complicated to commit after the code code freeze, and a dependency of a number of other patches. Catversion bump, for obvious reasons. Author: Andres Freund, with contributions by John Naylor Discussion: https://postgr.es/m/20180930034810.ywp2c7awz7opzcfr@alap3.anarazel.de
2018-11-21 00:36:57 +01:00
Form_pg_opclass opform;
Oid opClassId,
opInputType;
/*
2005-10-15 04:49:52 +02:00
* Release 7.0 removed network_ops, timespan_ops, and datetime_ops, so we
* ignore those opclass names so the default *_ops is used. This can be
* removed in some later release. bjm 2000/02/07
*
2003-08-04 02:43:34 +02:00
* Release 7.1 removes lztext_ops, so suppress that too for a while. tgl
* 2000/07/30
*
* Release 7.2 renames timestamp_ops to timestamptz_ops, so suppress that
* too for awhile. I'm starting to think we need a better approach. tgl
2005-10-15 04:49:52 +02:00
* 2000/10/01
*
* Release 8.0 removes bigbox_ops (which was dead code for a long while
* anyway). tgl 2003/11/11
*/
if (list_length(opclass) == 1)
{
char *claname = strVal(linitial(opclass));
if (strcmp(claname, "network_ops") == 0 ||
strcmp(claname, "timespan_ops") == 0 ||
strcmp(claname, "datetime_ops") == 0 ||
strcmp(claname, "lztext_ops") == 0 ||
strcmp(claname, "timestamp_ops") == 0 ||
strcmp(claname, "bigbox_ops") == 0)
opclass = NIL;
}
if (opclass == NIL)
{
/* no operator class specified, so find the default */
opClassId = GetDefaultOpClass(attrType, accessMethodId);
if (!OidIsValid(opClassId))
ereport(ERROR,
(errcode(ERRCODE_UNDEFINED_OBJECT),
errmsg("data type %s has no default operator class for access method \"%s\"",
format_type_be(attrType), accessMethodName),
errhint("You must specify an operator class for the index or define a default operator class for the data type.")));
return opClassId;
}
/*
* Specific opclass name given, so look up the opclass.
*/
/* deconstruct the name list */
DeconstructQualifiedName(opclass, &schemaname, &opcname);
if (schemaname)
{
/* Look in specific schema only */
2002-09-04 22:31:48 +02:00
Oid namespaceId;
namespaceId = LookupExplicitNamespace(schemaname, false);
tuple = SearchSysCache3(CLAAMNAMENSP,
ObjectIdGetDatum(accessMethodId),
PointerGetDatum(opcname),
ObjectIdGetDatum(namespaceId));
}
else
{
/* Unqualified opclass name, so search the search path */
opClassId = OpclassnameGetOpcid(accessMethodId, opcname);
if (!OidIsValid(opClassId))
ereport(ERROR,
(errcode(ERRCODE_UNDEFINED_OBJECT),
errmsg("operator class \"%s\" does not exist for access method \"%s\"",
opcname, accessMethodName)));
tuple = SearchSysCache1(CLAOID, ObjectIdGetDatum(opClassId));
}
if (!HeapTupleIsValid(tuple))
ereport(ERROR,
(errcode(ERRCODE_UNDEFINED_OBJECT),
errmsg("operator class \"%s\" does not exist for access method \"%s\"",
NameListToString(opclass), accessMethodName)));
/*
* Verify that the index operator class accepts this datatype. Note we
2005-10-15 04:49:52 +02:00
* will accept binary compatibility.
*/
Remove WITH OIDS support, change oid catalog column visibility. Previously tables declared WITH OIDS, including a significant fraction of the catalog tables, stored the oid column not as a normal column, but as part of the tuple header. This special column was not shown by default, which was somewhat odd, as it's often (consider e.g. pg_class.oid) one of the more important parts of a row. Neither pg_dump nor COPY included the contents of the oid column by default. The fact that the oid column was not an ordinary column necessitated a significant amount of special case code to support oid columns. That already was painful for the existing, but upcoming work aiming to make table storage pluggable, would have required expanding and duplicating that "specialness" significantly. WITH OIDS has been deprecated since 2005 (commit ff02d0a05280e0). Remove it. Removing includes: - CREATE TABLE and ALTER TABLE syntax for declaring the table to be WITH OIDS has been removed (WITH (oids[ = true]) will error out) - pg_dump does not support dumping tables declared WITH OIDS and will issue a warning when dumping one (and ignore the oid column). - restoring an pg_dump archive with pg_restore will warn when restoring a table with oid contents (and ignore the oid column) - COPY will refuse to load binary dump that includes oids. - pg_upgrade will error out when encountering tables declared WITH OIDS, they have to be altered to remove the oid column first. - Functionality to access the oid of the last inserted row (like plpgsql's RESULT_OID, spi's SPI_lastoid, ...) has been removed. The syntax for declaring a table WITHOUT OIDS (or WITH (oids = false) for CREATE TABLE) is still supported. While that requires a bit of support code, it seems unnecessary to break applications / dumps that do not use oids, and are explicit about not using them. The biggest user of WITH OID columns was postgres' catalog. This commit changes all 'magic' oid columns to be columns that are normally declared and stored. To reduce unnecessary query breakage all the newly added columns are still named 'oid', even if a table's column naming scheme would indicate 'reloid' or such. This obviously requires adapting a lot code, mostly replacing oid access via HeapTupleGetOid() with access to the underlying Form_pg_*->oid column. The bootstrap process now assigns oids for all oid columns in genbki.pl that do not have an explicit value (starting at the largest oid previously used), only oids assigned later by oids will be above FirstBootstrapObjectId. As the oid column now is a normal column the special bootstrap syntax for oids has been removed. Oids are not automatically assigned during insertion anymore, all backend code explicitly assigns oids with GetNewOidWithIndex(). For the rare case that insertions into the catalog via SQL are called for the new pg_nextoid() function can be used (which only works on catalog tables). The fact that oid columns on system tables are now normal columns means that they will be included in the set of columns expanded by * (i.e. SELECT * FROM pg_class will now include the table's oid, previously it did not). It'd not technically be hard to hide oid column by default, but that'd mean confusing behavior would either have to be carried forward forever, or it'd cause breakage down the line. While it's not unlikely that further adjustments are needed, the scope/invasiveness of the patch makes it worthwhile to get merge this now. It's painful to maintain externally, too complicated to commit after the code code freeze, and a dependency of a number of other patches. Catversion bump, for obvious reasons. Author: Andres Freund, with contributions by John Naylor Discussion: https://postgr.es/m/20180930034810.ywp2c7awz7opzcfr@alap3.anarazel.de
2018-11-21 00:36:57 +01:00
opform = (Form_pg_opclass) GETSTRUCT(tuple);
opClassId = opform->oid;
opInputType = opform->opcintype;
if (!IsBinaryCoercible(attrType, opInputType))
ereport(ERROR,
(errcode(ERRCODE_DATATYPE_MISMATCH),
2005-10-15 04:49:52 +02:00
errmsg("operator class \"%s\" does not accept data type %s",
NameListToString(opclass), format_type_be(attrType))));
ReleaseSysCache(tuple);
return opClassId;
}
/*
* GetDefaultOpClass
*
* Given the OIDs of a datatype and an access method, find the default
* operator class, if any. Returns InvalidOid if there is none.
*/
Oid
GetDefaultOpClass(Oid type_id, Oid am_id)
{
Oid result = InvalidOid;
int nexact = 0;
int ncompatible = 0;
int ncompatiblepreferred = 0;
Relation rel;
ScanKeyData skey[1];
SysScanDesc scan;
HeapTuple tup;
TYPCATEGORY tcategory;
/* If it's a domain, look at the base type instead */
type_id = getBaseType(type_id);
tcategory = TypeCategory(type_id);
/*
* We scan through all the opclasses available for the access method,
* looking for one that is marked default and matches the target type
* (either exactly or binary-compatibly, but prefer an exact match).
*
* We could find more than one binary-compatible match. If just one is
* for a preferred type, use that one; otherwise we fail, forcing the user
* to specify which one he wants. (The preferred-type special case is a
* kluge for varchar: it's binary-compatible to both text and bpchar, so
* we need a tiebreaker.) If we find more than one exact match, then
* someone put bogus entries in pg_opclass.
*/
rel = table_open(OperatorClassRelationId, AccessShareLock);
ScanKeyInit(&skey[0],
Anum_pg_opclass_opcmethod,
BTEqualStrategyNumber, F_OIDEQ,
ObjectIdGetDatum(am_id));
scan = systable_beginscan(rel, OpclassAmNameNspIndexId, true,
NULL, 1, skey);
while (HeapTupleIsValid(tup = systable_getnext(scan)))
{
Form_pg_opclass opclass = (Form_pg_opclass) GETSTRUCT(tup);
/* ignore altogether if not a default opclass */
if (!opclass->opcdefault)
continue;
if (opclass->opcintype == type_id)
{
nexact++;
Remove WITH OIDS support, change oid catalog column visibility. Previously tables declared WITH OIDS, including a significant fraction of the catalog tables, stored the oid column not as a normal column, but as part of the tuple header. This special column was not shown by default, which was somewhat odd, as it's often (consider e.g. pg_class.oid) one of the more important parts of a row. Neither pg_dump nor COPY included the contents of the oid column by default. The fact that the oid column was not an ordinary column necessitated a significant amount of special case code to support oid columns. That already was painful for the existing, but upcoming work aiming to make table storage pluggable, would have required expanding and duplicating that "specialness" significantly. WITH OIDS has been deprecated since 2005 (commit ff02d0a05280e0). Remove it. Removing includes: - CREATE TABLE and ALTER TABLE syntax for declaring the table to be WITH OIDS has been removed (WITH (oids[ = true]) will error out) - pg_dump does not support dumping tables declared WITH OIDS and will issue a warning when dumping one (and ignore the oid column). - restoring an pg_dump archive with pg_restore will warn when restoring a table with oid contents (and ignore the oid column) - COPY will refuse to load binary dump that includes oids. - pg_upgrade will error out when encountering tables declared WITH OIDS, they have to be altered to remove the oid column first. - Functionality to access the oid of the last inserted row (like plpgsql's RESULT_OID, spi's SPI_lastoid, ...) has been removed. The syntax for declaring a table WITHOUT OIDS (or WITH (oids = false) for CREATE TABLE) is still supported. While that requires a bit of support code, it seems unnecessary to break applications / dumps that do not use oids, and are explicit about not using them. The biggest user of WITH OID columns was postgres' catalog. This commit changes all 'magic' oid columns to be columns that are normally declared and stored. To reduce unnecessary query breakage all the newly added columns are still named 'oid', even if a table's column naming scheme would indicate 'reloid' or such. This obviously requires adapting a lot code, mostly replacing oid access via HeapTupleGetOid() with access to the underlying Form_pg_*->oid column. The bootstrap process now assigns oids for all oid columns in genbki.pl that do not have an explicit value (starting at the largest oid previously used), only oids assigned later by oids will be above FirstBootstrapObjectId. As the oid column now is a normal column the special bootstrap syntax for oids has been removed. Oids are not automatically assigned during insertion anymore, all backend code explicitly assigns oids with GetNewOidWithIndex(). For the rare case that insertions into the catalog via SQL are called for the new pg_nextoid() function can be used (which only works on catalog tables). The fact that oid columns on system tables are now normal columns means that they will be included in the set of columns expanded by * (i.e. SELECT * FROM pg_class will now include the table's oid, previously it did not). It'd not technically be hard to hide oid column by default, but that'd mean confusing behavior would either have to be carried forward forever, or it'd cause breakage down the line. While it's not unlikely that further adjustments are needed, the scope/invasiveness of the patch makes it worthwhile to get merge this now. It's painful to maintain externally, too complicated to commit after the code code freeze, and a dependency of a number of other patches. Catversion bump, for obvious reasons. Author: Andres Freund, with contributions by John Naylor Discussion: https://postgr.es/m/20180930034810.ywp2c7awz7opzcfr@alap3.anarazel.de
2018-11-21 00:36:57 +01:00
result = opclass->oid;
}
else if (nexact == 0 &&
IsBinaryCoercible(type_id, opclass->opcintype))
{
if (IsPreferredType(tcategory, opclass->opcintype))
{
ncompatiblepreferred++;
Remove WITH OIDS support, change oid catalog column visibility. Previously tables declared WITH OIDS, including a significant fraction of the catalog tables, stored the oid column not as a normal column, but as part of the tuple header. This special column was not shown by default, which was somewhat odd, as it's often (consider e.g. pg_class.oid) one of the more important parts of a row. Neither pg_dump nor COPY included the contents of the oid column by default. The fact that the oid column was not an ordinary column necessitated a significant amount of special case code to support oid columns. That already was painful for the existing, but upcoming work aiming to make table storage pluggable, would have required expanding and duplicating that "specialness" significantly. WITH OIDS has been deprecated since 2005 (commit ff02d0a05280e0). Remove it. Removing includes: - CREATE TABLE and ALTER TABLE syntax for declaring the table to be WITH OIDS has been removed (WITH (oids[ = true]) will error out) - pg_dump does not support dumping tables declared WITH OIDS and will issue a warning when dumping one (and ignore the oid column). - restoring an pg_dump archive with pg_restore will warn when restoring a table with oid contents (and ignore the oid column) - COPY will refuse to load binary dump that includes oids. - pg_upgrade will error out when encountering tables declared WITH OIDS, they have to be altered to remove the oid column first. - Functionality to access the oid of the last inserted row (like plpgsql's RESULT_OID, spi's SPI_lastoid, ...) has been removed. The syntax for declaring a table WITHOUT OIDS (or WITH (oids = false) for CREATE TABLE) is still supported. While that requires a bit of support code, it seems unnecessary to break applications / dumps that do not use oids, and are explicit about not using them. The biggest user of WITH OID columns was postgres' catalog. This commit changes all 'magic' oid columns to be columns that are normally declared and stored. To reduce unnecessary query breakage all the newly added columns are still named 'oid', even if a table's column naming scheme would indicate 'reloid' or such. This obviously requires adapting a lot code, mostly replacing oid access via HeapTupleGetOid() with access to the underlying Form_pg_*->oid column. The bootstrap process now assigns oids for all oid columns in genbki.pl that do not have an explicit value (starting at the largest oid previously used), only oids assigned later by oids will be above FirstBootstrapObjectId. As the oid column now is a normal column the special bootstrap syntax for oids has been removed. Oids are not automatically assigned during insertion anymore, all backend code explicitly assigns oids with GetNewOidWithIndex(). For the rare case that insertions into the catalog via SQL are called for the new pg_nextoid() function can be used (which only works on catalog tables). The fact that oid columns on system tables are now normal columns means that they will be included in the set of columns expanded by * (i.e. SELECT * FROM pg_class will now include the table's oid, previously it did not). It'd not technically be hard to hide oid column by default, but that'd mean confusing behavior would either have to be carried forward forever, or it'd cause breakage down the line. While it's not unlikely that further adjustments are needed, the scope/invasiveness of the patch makes it worthwhile to get merge this now. It's painful to maintain externally, too complicated to commit after the code code freeze, and a dependency of a number of other patches. Catversion bump, for obvious reasons. Author: Andres Freund, with contributions by John Naylor Discussion: https://postgr.es/m/20180930034810.ywp2c7awz7opzcfr@alap3.anarazel.de
2018-11-21 00:36:57 +01:00
result = opclass->oid;
}
else if (ncompatiblepreferred == 0)
{
ncompatible++;
Remove WITH OIDS support, change oid catalog column visibility. Previously tables declared WITH OIDS, including a significant fraction of the catalog tables, stored the oid column not as a normal column, but as part of the tuple header. This special column was not shown by default, which was somewhat odd, as it's often (consider e.g. pg_class.oid) one of the more important parts of a row. Neither pg_dump nor COPY included the contents of the oid column by default. The fact that the oid column was not an ordinary column necessitated a significant amount of special case code to support oid columns. That already was painful for the existing, but upcoming work aiming to make table storage pluggable, would have required expanding and duplicating that "specialness" significantly. WITH OIDS has been deprecated since 2005 (commit ff02d0a05280e0). Remove it. Removing includes: - CREATE TABLE and ALTER TABLE syntax for declaring the table to be WITH OIDS has been removed (WITH (oids[ = true]) will error out) - pg_dump does not support dumping tables declared WITH OIDS and will issue a warning when dumping one (and ignore the oid column). - restoring an pg_dump archive with pg_restore will warn when restoring a table with oid contents (and ignore the oid column) - COPY will refuse to load binary dump that includes oids. - pg_upgrade will error out when encountering tables declared WITH OIDS, they have to be altered to remove the oid column first. - Functionality to access the oid of the last inserted row (like plpgsql's RESULT_OID, spi's SPI_lastoid, ...) has been removed. The syntax for declaring a table WITHOUT OIDS (or WITH (oids = false) for CREATE TABLE) is still supported. While that requires a bit of support code, it seems unnecessary to break applications / dumps that do not use oids, and are explicit about not using them. The biggest user of WITH OID columns was postgres' catalog. This commit changes all 'magic' oid columns to be columns that are normally declared and stored. To reduce unnecessary query breakage all the newly added columns are still named 'oid', even if a table's column naming scheme would indicate 'reloid' or such. This obviously requires adapting a lot code, mostly replacing oid access via HeapTupleGetOid() with access to the underlying Form_pg_*->oid column. The bootstrap process now assigns oids for all oid columns in genbki.pl that do not have an explicit value (starting at the largest oid previously used), only oids assigned later by oids will be above FirstBootstrapObjectId. As the oid column now is a normal column the special bootstrap syntax for oids has been removed. Oids are not automatically assigned during insertion anymore, all backend code explicitly assigns oids with GetNewOidWithIndex(). For the rare case that insertions into the catalog via SQL are called for the new pg_nextoid() function can be used (which only works on catalog tables). The fact that oid columns on system tables are now normal columns means that they will be included in the set of columns expanded by * (i.e. SELECT * FROM pg_class will now include the table's oid, previously it did not). It'd not technically be hard to hide oid column by default, but that'd mean confusing behavior would either have to be carried forward forever, or it'd cause breakage down the line. While it's not unlikely that further adjustments are needed, the scope/invasiveness of the patch makes it worthwhile to get merge this now. It's painful to maintain externally, too complicated to commit after the code code freeze, and a dependency of a number of other patches. Catversion bump, for obvious reasons. Author: Andres Freund, with contributions by John Naylor Discussion: https://postgr.es/m/20180930034810.ywp2c7awz7opzcfr@alap3.anarazel.de
2018-11-21 00:36:57 +01:00
result = opclass->oid;
}
}
}
systable_endscan(scan);
table_close(rel, AccessShareLock);
/* raise error if pg_opclass contains inconsistent data */
if (nexact > 1)
ereport(ERROR,
(errcode(ERRCODE_DUPLICATE_OBJECT),
errmsg("there are multiple default operator classes for data type %s",
format_type_be(type_id))));
if (nexact == 1 ||
ncompatiblepreferred == 1 ||
(ncompatiblepreferred == 0 && ncompatible == 1))
return result;
return InvalidOid;
}
/*
* makeObjectName()
*
* Create a name for an implicitly created index, sequence, constraint,
* extended statistics, etc.
*
* The parameters are typically: the original table name, the original field
* name, and a "type" string (such as "seq" or "pkey"). The field name
* and/or type can be NULL if not relevant.
*
* The result is a palloc'd string.
*
* The basic result we want is "name1_name2_label", omitting "_name2" or
* "_label" when those parameters are NULL. However, we must generate
* a name with less than NAMEDATALEN characters! So, we truncate one or
* both names if necessary to make a short-enough string. The label part
* is never truncated (so it had better be reasonably short).
*
* The caller is responsible for checking uniqueness of the generated
* name and retrying as needed; retrying will be done by altering the
* "label" string (which is why we never truncate that part).
*/
char *
makeObjectName(const char *name1, const char *name2, const char *label)
{
char *name;
int overhead = 0; /* chars needed for label and underscores */
int availchars; /* chars available for name(s) */
int name1chars; /* chars allocated to name1 */
int name2chars; /* chars allocated to name2 */
int ndx;
name1chars = strlen(name1);
if (name2)
{
name2chars = strlen(name2);
overhead++; /* allow for separating underscore */
}
else
name2chars = 0;
if (label)
overhead += strlen(label) + 1;
availchars = NAMEDATALEN - 1 - overhead;
Assert(availchars > 0); /* else caller chose a bad label */
/*
* If we must truncate, preferentially truncate the longer name. This
2005-10-15 04:49:52 +02:00
* logic could be expressed without a loop, but it's simple and obvious as
* a loop.
*/
while (name1chars + name2chars > availchars)
{
if (name1chars > name2chars)
name1chars--;
else
name2chars--;
}
name1chars = pg_mbcliplen(name1, name1chars, name1chars);
if (name2)
name2chars = pg_mbcliplen(name2, name2chars, name2chars);
/* Now construct the string using the chosen lengths */
name = palloc(name1chars + name2chars + overhead + 1);
memcpy(name, name1, name1chars);
ndx = name1chars;
if (name2)
{
name[ndx++] = '_';
memcpy(name + ndx, name2, name2chars);
ndx += name2chars;
}
if (label)
{
name[ndx++] = '_';
strcpy(name + ndx, label);
}
else
name[ndx] = '\0';
return name;
}
/*
* Select a nonconflicting name for a new relation. This is ordinarily
* used to choose index names (which is why it's here) but it can also
* be used for sequences, or any autogenerated relation kind.
*
* name1, name2, and label are used the same way as for makeObjectName(),
* except that the label can't be NULL; digits will be appended to the label
* if needed to create a name that is unique within the specified namespace.
*
Fully enforce uniqueness of constraint names. It's been true for a long time that we expect names of table and domain constraints to be unique among the constraints of that table or domain. However, the enforcement of that has been pretty haphazard, and it missed some corner cases such as creating a CHECK constraint and then an index constraint of the same name (as per recent report from André Hänsel). Also, due to the lack of an actual unique index enforcing this, duplicates could be created through race conditions. Moreover, the code that searches pg_constraint has been quite inconsistent about how to handle duplicate names if one did occur: some places checked and threw errors if there was more than one match, while others just processed the first match they came to. To fix, create a unique index on (conrelid, contypid, conname). Since either conrelid or contypid is zero, this will separately enforce uniqueness of constraint names among constraints of any one table and any one domain. (If we ever implement SQL assertions, and put them into this catalog, more thought might be needed. But it'd be at least as reasonable to put them into a new catalog; having overloaded this one catalog with two kinds of constraints was a mistake already IMO.) This index can replace the existing non-unique index on conrelid, though we need to keep the one on contypid for query performance reasons. Having done that, we can simplify the logic in various places that either coped with duplicates or neglected to, as well as potentially improve lookup performance when searching for a constraint by name. Also, as per our usual practice, install a preliminary check so that you get something more friendly than a unique-index violation report in the case complained of by André. And teach ChooseIndexName to avoid choosing autogenerated names that would draw such a failure. While it's not possible to make such a change in the back branches, it doesn't seem quite too late to put this into v11, so do so. Discussion: https://postgr.es/m/0c1001d4428f$0942b430$1bc81c90$@webkr.de
2018-09-04 19:45:35 +02:00
* If isconstraint is true, we also avoid choosing a name matching any
* existing constraint in the same namespace. (This is stricter than what
* Postgres itself requires, but the SQL standard says that constraint names
* should be unique within schemas, so we follow that for autogenerated
* constraint names.)
*
* Note: it is theoretically possible to get a collision anyway, if someone
* else chooses the same name concurrently. This is fairly unlikely to be
* a problem in practice, especially if one is holding an exclusive lock on
* the relation identified by name1. However, if choosing multiple names
* within a single command, you'd better create the new object and do
* CommandCounterIncrement before choosing the next one!
*
* Returns a palloc'd string.
*/
char *
ChooseRelationName(const char *name1, const char *name2,
Fully enforce uniqueness of constraint names. It's been true for a long time that we expect names of table and domain constraints to be unique among the constraints of that table or domain. However, the enforcement of that has been pretty haphazard, and it missed some corner cases such as creating a CHECK constraint and then an index constraint of the same name (as per recent report from André Hänsel). Also, due to the lack of an actual unique index enforcing this, duplicates could be created through race conditions. Moreover, the code that searches pg_constraint has been quite inconsistent about how to handle duplicate names if one did occur: some places checked and threw errors if there was more than one match, while others just processed the first match they came to. To fix, create a unique index on (conrelid, contypid, conname). Since either conrelid or contypid is zero, this will separately enforce uniqueness of constraint names among constraints of any one table and any one domain. (If we ever implement SQL assertions, and put them into this catalog, more thought might be needed. But it'd be at least as reasonable to put them into a new catalog; having overloaded this one catalog with two kinds of constraints was a mistake already IMO.) This index can replace the existing non-unique index on conrelid, though we need to keep the one on contypid for query performance reasons. Having done that, we can simplify the logic in various places that either coped with duplicates or neglected to, as well as potentially improve lookup performance when searching for a constraint by name. Also, as per our usual practice, install a preliminary check so that you get something more friendly than a unique-index violation report in the case complained of by André. And teach ChooseIndexName to avoid choosing autogenerated names that would draw such a failure. While it's not possible to make such a change in the back branches, it doesn't seem quite too late to put this into v11, so do so. Discussion: https://postgr.es/m/0c1001d4428f$0942b430$1bc81c90$@webkr.de
2018-09-04 19:45:35 +02:00
const char *label, Oid namespaceid,
bool isconstraint)
{
int pass = 0;
char *relname = NULL;
char modlabel[NAMEDATALEN];
/* try the unmodified label first */
StrNCpy(modlabel, label, sizeof(modlabel));
for (;;)
{
relname = makeObjectName(name1, name2, modlabel);
if (!OidIsValid(get_relname_relid(relname, namespaceid)))
Fully enforce uniqueness of constraint names. It's been true for a long time that we expect names of table and domain constraints to be unique among the constraints of that table or domain. However, the enforcement of that has been pretty haphazard, and it missed some corner cases such as creating a CHECK constraint and then an index constraint of the same name (as per recent report from André Hänsel). Also, due to the lack of an actual unique index enforcing this, duplicates could be created through race conditions. Moreover, the code that searches pg_constraint has been quite inconsistent about how to handle duplicate names if one did occur: some places checked and threw errors if there was more than one match, while others just processed the first match they came to. To fix, create a unique index on (conrelid, contypid, conname). Since either conrelid or contypid is zero, this will separately enforce uniqueness of constraint names among constraints of any one table and any one domain. (If we ever implement SQL assertions, and put them into this catalog, more thought might be needed. But it'd be at least as reasonable to put them into a new catalog; having overloaded this one catalog with two kinds of constraints was a mistake already IMO.) This index can replace the existing non-unique index on conrelid, though we need to keep the one on contypid for query performance reasons. Having done that, we can simplify the logic in various places that either coped with duplicates or neglected to, as well as potentially improve lookup performance when searching for a constraint by name. Also, as per our usual practice, install a preliminary check so that you get something more friendly than a unique-index violation report in the case complained of by André. And teach ChooseIndexName to avoid choosing autogenerated names that would draw such a failure. While it's not possible to make such a change in the back branches, it doesn't seem quite too late to put this into v11, so do so. Discussion: https://postgr.es/m/0c1001d4428f$0942b430$1bc81c90$@webkr.de
2018-09-04 19:45:35 +02:00
{
if (!isconstraint ||
!ConstraintNameExists(relname, namespaceid))
break;
}
/* found a conflict, so try a new name component */
pfree(relname);
snprintf(modlabel, sizeof(modlabel), "%s%d", label, ++pass);
}
return relname;
}
/*
* Select the name to be used for an index.
*
* The argument list is pretty ad-hoc :-(
*/
static char *
ChooseIndexName(const char *tabname, Oid namespaceId,
List *colnames, List *exclusionOpNames,
bool primary, bool isconstraint)
{
char *indexname;
if (primary)
{
/* the primary key's name does not depend on the specific column(s) */
indexname = ChooseRelationName(tabname,
NULL,
"pkey",
Fully enforce uniqueness of constraint names. It's been true for a long time that we expect names of table and domain constraints to be unique among the constraints of that table or domain. However, the enforcement of that has been pretty haphazard, and it missed some corner cases such as creating a CHECK constraint and then an index constraint of the same name (as per recent report from André Hänsel). Also, due to the lack of an actual unique index enforcing this, duplicates could be created through race conditions. Moreover, the code that searches pg_constraint has been quite inconsistent about how to handle duplicate names if one did occur: some places checked and threw errors if there was more than one match, while others just processed the first match they came to. To fix, create a unique index on (conrelid, contypid, conname). Since either conrelid or contypid is zero, this will separately enforce uniqueness of constraint names among constraints of any one table and any one domain. (If we ever implement SQL assertions, and put them into this catalog, more thought might be needed. But it'd be at least as reasonable to put them into a new catalog; having overloaded this one catalog with two kinds of constraints was a mistake already IMO.) This index can replace the existing non-unique index on conrelid, though we need to keep the one on contypid for query performance reasons. Having done that, we can simplify the logic in various places that either coped with duplicates or neglected to, as well as potentially improve lookup performance when searching for a constraint by name. Also, as per our usual practice, install a preliminary check so that you get something more friendly than a unique-index violation report in the case complained of by André. And teach ChooseIndexName to avoid choosing autogenerated names that would draw such a failure. While it's not possible to make such a change in the back branches, it doesn't seem quite too late to put this into v11, so do so. Discussion: https://postgr.es/m/0c1001d4428f$0942b430$1bc81c90$@webkr.de
2018-09-04 19:45:35 +02:00
namespaceId,
true);
}
else if (exclusionOpNames != NIL)
{
indexname = ChooseRelationName(tabname,
ChooseIndexNameAddition(colnames),
"excl",
Fully enforce uniqueness of constraint names. It's been true for a long time that we expect names of table and domain constraints to be unique among the constraints of that table or domain. However, the enforcement of that has been pretty haphazard, and it missed some corner cases such as creating a CHECK constraint and then an index constraint of the same name (as per recent report from André Hänsel). Also, due to the lack of an actual unique index enforcing this, duplicates could be created through race conditions. Moreover, the code that searches pg_constraint has been quite inconsistent about how to handle duplicate names if one did occur: some places checked and threw errors if there was more than one match, while others just processed the first match they came to. To fix, create a unique index on (conrelid, contypid, conname). Since either conrelid or contypid is zero, this will separately enforce uniqueness of constraint names among constraints of any one table and any one domain. (If we ever implement SQL assertions, and put them into this catalog, more thought might be needed. But it'd be at least as reasonable to put them into a new catalog; having overloaded this one catalog with two kinds of constraints was a mistake already IMO.) This index can replace the existing non-unique index on conrelid, though we need to keep the one on contypid for query performance reasons. Having done that, we can simplify the logic in various places that either coped with duplicates or neglected to, as well as potentially improve lookup performance when searching for a constraint by name. Also, as per our usual practice, install a preliminary check so that you get something more friendly than a unique-index violation report in the case complained of by André. And teach ChooseIndexName to avoid choosing autogenerated names that would draw such a failure. While it's not possible to make such a change in the back branches, it doesn't seem quite too late to put this into v11, so do so. Discussion: https://postgr.es/m/0c1001d4428f$0942b430$1bc81c90$@webkr.de
2018-09-04 19:45:35 +02:00
namespaceId,
true);
}
else if (isconstraint)
{
indexname = ChooseRelationName(tabname,
ChooseIndexNameAddition(colnames),
"key",
Fully enforce uniqueness of constraint names. It's been true for a long time that we expect names of table and domain constraints to be unique among the constraints of that table or domain. However, the enforcement of that has been pretty haphazard, and it missed some corner cases such as creating a CHECK constraint and then an index constraint of the same name (as per recent report from André Hänsel). Also, due to the lack of an actual unique index enforcing this, duplicates could be created through race conditions. Moreover, the code that searches pg_constraint has been quite inconsistent about how to handle duplicate names if one did occur: some places checked and threw errors if there was more than one match, while others just processed the first match they came to. To fix, create a unique index on (conrelid, contypid, conname). Since either conrelid or contypid is zero, this will separately enforce uniqueness of constraint names among constraints of any one table and any one domain. (If we ever implement SQL assertions, and put them into this catalog, more thought might be needed. But it'd be at least as reasonable to put them into a new catalog; having overloaded this one catalog with two kinds of constraints was a mistake already IMO.) This index can replace the existing non-unique index on conrelid, though we need to keep the one on contypid for query performance reasons. Having done that, we can simplify the logic in various places that either coped with duplicates or neglected to, as well as potentially improve lookup performance when searching for a constraint by name. Also, as per our usual practice, install a preliminary check so that you get something more friendly than a unique-index violation report in the case complained of by André. And teach ChooseIndexName to avoid choosing autogenerated names that would draw such a failure. While it's not possible to make such a change in the back branches, it doesn't seem quite too late to put this into v11, so do so. Discussion: https://postgr.es/m/0c1001d4428f$0942b430$1bc81c90$@webkr.de
2018-09-04 19:45:35 +02:00
namespaceId,
true);
}
else
{
indexname = ChooseRelationName(tabname,
ChooseIndexNameAddition(colnames),
"idx",
Fully enforce uniqueness of constraint names. It's been true for a long time that we expect names of table and domain constraints to be unique among the constraints of that table or domain. However, the enforcement of that has been pretty haphazard, and it missed some corner cases such as creating a CHECK constraint and then an index constraint of the same name (as per recent report from André Hänsel). Also, due to the lack of an actual unique index enforcing this, duplicates could be created through race conditions. Moreover, the code that searches pg_constraint has been quite inconsistent about how to handle duplicate names if one did occur: some places checked and threw errors if there was more than one match, while others just processed the first match they came to. To fix, create a unique index on (conrelid, contypid, conname). Since either conrelid or contypid is zero, this will separately enforce uniqueness of constraint names among constraints of any one table and any one domain. (If we ever implement SQL assertions, and put them into this catalog, more thought might be needed. But it'd be at least as reasonable to put them into a new catalog; having overloaded this one catalog with two kinds of constraints was a mistake already IMO.) This index can replace the existing non-unique index on conrelid, though we need to keep the one on contypid for query performance reasons. Having done that, we can simplify the logic in various places that either coped with duplicates or neglected to, as well as potentially improve lookup performance when searching for a constraint by name. Also, as per our usual practice, install a preliminary check so that you get something more friendly than a unique-index violation report in the case complained of by André. And teach ChooseIndexName to avoid choosing autogenerated names that would draw such a failure. While it's not possible to make such a change in the back branches, it doesn't seem quite too late to put this into v11, so do so. Discussion: https://postgr.es/m/0c1001d4428f$0942b430$1bc81c90$@webkr.de
2018-09-04 19:45:35 +02:00
namespaceId,
false);
}
return indexname;
}
/*
* Generate "name2" for a new index given the list of column names for it
* (as produced by ChooseIndexColumnNames). This will be passed to
* ChooseRelationName along with the parent table name and a suitable label.
*
* We know that less than NAMEDATALEN characters will actually be used,
* so we can truncate the result once we've generated that many.
*
* XXX See also ChooseForeignKeyConstraintNameAddition and
* ChooseExtendedStatisticNameAddition.
*/
static char *
ChooseIndexNameAddition(List *colnames)
{
char buf[NAMEDATALEN * 2];
int buflen = 0;
ListCell *lc;
buf[0] = '\0';
foreach(lc, colnames)
{
const char *name = (const char *) lfirst(lc);
if (buflen > 0)
2010-02-26 03:01:40 +01:00
buf[buflen++] = '_'; /* insert _ between names */
/*
* At this point we have buflen <= NAMEDATALEN. name should be less
* than NAMEDATALEN already, but use strlcpy for paranoia.
*/
strlcpy(buf + buflen, name, NAMEDATALEN);
buflen += strlen(buf + buflen);
if (buflen >= NAMEDATALEN)
break;
}
return pstrdup(buf);
}
/*
* Select the actual names to be used for the columns of an index, given the
* list of IndexElems for the columns. This is mostly about ensuring the
* names are unique so we don't get a conflicting-attribute-names error.
*
* Returns a List of plain strings (char *, not String nodes).
*/
static List *
ChooseIndexColumnNames(List *indexElems)
{
List *result = NIL;
ListCell *lc;
foreach(lc, indexElems)
{
IndexElem *ielem = (IndexElem *) lfirst(lc);
const char *origname;
const char *curname;
int i;
char buf[NAMEDATALEN];
/* Get the preliminary name from the IndexElem */
if (ielem->indexcolname)
Phase 2 of pgindent updates. Change pg_bsd_indent to follow upstream rules for placement of comments to the right of code, and remove pgindent hack that caused comments following #endif to not obey the general rule. Commit e3860ffa4dd0dad0dd9eea4be9cc1412373a8c89 wasn't actually using the published version of pg_bsd_indent, but a hacked-up version that tried to minimize the amount of movement of comments to the right of code. The situation of interest is where such a comment has to be moved to the right of its default placement at column 33 because there's code there. BSD indent has always moved right in units of tab stops in such cases --- but in the previous incarnation, indent was working in 8-space tab stops, while now it knows we use 4-space tabs. So the net result is that in about half the cases, such comments are placed one tab stop left of before. This is better all around: it leaves more room on the line for comment text, and it means that in such cases the comment uniformly starts at the next 4-space tab stop after the code, rather than sometimes one and sometimes two tabs after. Also, ensure that comments following #endif are indented the same as comments following other preprocessor commands such as #else. That inconsistency turns out to have been self-inflicted damage from a poorly-thought-through post-indent "fixup" in pgindent. This patch is much less interesting than the first round of indent changes, but also bulkier, so I thought it best to separate the effects. Discussion: https://postgr.es/m/E1dAmxK-0006EE-1r@gemulon.postgresql.org Discussion: https://postgr.es/m/30527.1495162840@sss.pgh.pa.us
2017-06-21 21:18:54 +02:00
origname = ielem->indexcolname; /* caller-specified name */
else if (ielem->name)
Phase 2 of pgindent updates. Change pg_bsd_indent to follow upstream rules for placement of comments to the right of code, and remove pgindent hack that caused comments following #endif to not obey the general rule. Commit e3860ffa4dd0dad0dd9eea4be9cc1412373a8c89 wasn't actually using the published version of pg_bsd_indent, but a hacked-up version that tried to minimize the amount of movement of comments to the right of code. The situation of interest is where such a comment has to be moved to the right of its default placement at column 33 because there's code there. BSD indent has always moved right in units of tab stops in such cases --- but in the previous incarnation, indent was working in 8-space tab stops, while now it knows we use 4-space tabs. So the net result is that in about half the cases, such comments are placed one tab stop left of before. This is better all around: it leaves more room on the line for comment text, and it means that in such cases the comment uniformly starts at the next 4-space tab stop after the code, rather than sometimes one and sometimes two tabs after. Also, ensure that comments following #endif are indented the same as comments following other preprocessor commands such as #else. That inconsistency turns out to have been self-inflicted damage from a poorly-thought-through post-indent "fixup" in pgindent. This patch is much less interesting than the first round of indent changes, but also bulkier, so I thought it best to separate the effects. Discussion: https://postgr.es/m/E1dAmxK-0006EE-1r@gemulon.postgresql.org Discussion: https://postgr.es/m/30527.1495162840@sss.pgh.pa.us
2017-06-21 21:18:54 +02:00
origname = ielem->name; /* simple column reference */
else
2010-02-26 03:01:40 +01:00
origname = "expr"; /* default name for expression */
/* If it conflicts with any previous column, tweak it */
curname = origname;
for (i = 1;; i++)
{
ListCell *lc2;
char nbuf[32];
int nlen;
foreach(lc2, result)
{
if (strcmp(curname, (char *) lfirst(lc2)) == 0)
break;
}
if (lc2 == NULL)
break; /* found nonconflicting name */
sprintf(nbuf, "%d", i);
/* Ensure generated names are shorter than NAMEDATALEN */
nlen = pg_mbcliplen(origname, strlen(origname),
NAMEDATALEN - 1 - strlen(nbuf));
memcpy(buf, origname, nlen);
strcpy(buf + nlen, nbuf);
curname = buf;
}
/* And attach to the result list */
result = lappend(result, pstrdup(curname));
}
return result;
}
2000-02-18 10:30:20 +01:00
/*
* ReindexIndex
* Recreate a specific index.
2000-02-18 10:30:20 +01:00
*/
void
ReindexIndex(RangeVar *indexRelation, int options, bool concurrent)
2000-02-18 10:30:20 +01:00
{
Oid indOid;
Improve table locking behavior in the face of current DDL. In the previous coding, callers were faced with an awkward choice: look up the name, do permissions checks, and then lock the table; or look up the name, lock the table, and then do permissions checks. The first choice was wrong because the results of the name lookup and permissions checks might be out-of-date by the time the table lock was acquired, while the second allowed a user with no privileges to interfere with access to a table by users who do have privileges (e.g. if a malicious backend queues up for an AccessExclusiveLock on a table on which AccessShareLock is already held, further attempts to access the table will be blocked until the AccessExclusiveLock is obtained and the malicious backend's transaction rolls back). To fix, allow callers of RangeVarGetRelid() to pass a callback which gets executed after performing the name lookup but before acquiring the relation lock. If the name lookup is retried (because invalidation messages are received), the callback will be re-executed as well, so we get the best of both worlds. RangeVarGetRelid() is renamed to RangeVarGetRelidExtended(); callers not wishing to supply a callback can continue to invoke it as RangeVarGetRelid(), which is now a macro. Since the only one caller that uses nowait = true now passes a callback anyway, the RangeVarGetRelid() macro defaults nowait as well. The callback can also be used for supplemental locking - for example, REINDEX INDEX needs to acquire the table lock before the index lock to reduce deadlock possibilities. There's a lot more work to be done here to fix all the cases where this can be a problem, but this commit provides the general infrastructure and fixes the following specific cases: REINDEX INDEX, REINDEX TABLE, LOCK TABLE, and and DROP TABLE/INDEX/SEQUENCE/VIEW/FOREIGN TABLE. Per discussion with Noah Misch and Alvaro Herrera.
2011-11-30 16:12:27 +01:00
Oid heapOid = InvalidOid;
Relation irel;
char persistence;
Improve table locking behavior in the face of current DDL. In the previous coding, callers were faced with an awkward choice: look up the name, do permissions checks, and then lock the table; or look up the name, lock the table, and then do permissions checks. The first choice was wrong because the results of the name lookup and permissions checks might be out-of-date by the time the table lock was acquired, while the second allowed a user with no privileges to interfere with access to a table by users who do have privileges (e.g. if a malicious backend queues up for an AccessExclusiveLock on a table on which AccessShareLock is already held, further attempts to access the table will be blocked until the AccessExclusiveLock is obtained and the malicious backend's transaction rolls back). To fix, allow callers of RangeVarGetRelid() to pass a callback which gets executed after performing the name lookup but before acquiring the relation lock. If the name lookup is retried (because invalidation messages are received), the callback will be re-executed as well, so we get the best of both worlds. RangeVarGetRelid() is renamed to RangeVarGetRelidExtended(); callers not wishing to supply a callback can continue to invoke it as RangeVarGetRelid(), which is now a macro. Since the only one caller that uses nowait = true now passes a callback anyway, the RangeVarGetRelid() macro defaults nowait as well. The callback can also be used for supplemental locking - for example, REINDEX INDEX needs to acquire the table lock before the index lock to reduce deadlock possibilities. There's a lot more work to be done here to fix all the cases where this can be a problem, but this commit provides the general infrastructure and fixes the following specific cases: REINDEX INDEX, REINDEX TABLE, LOCK TABLE, and and DROP TABLE/INDEX/SEQUENCE/VIEW/FOREIGN TABLE. Per discussion with Noah Misch and Alvaro Herrera.
2011-11-30 16:12:27 +01:00
/*
* Find and lock index, and check permissions on table; use callback to
* obtain lock on table first, to avoid deadlock hazard. The lock level
* used here must match the index lock obtained in reindex_index().
*/
indOid = RangeVarGetRelidExtended(indexRelation,
concurrent ? ShareUpdateExclusiveLock : AccessExclusiveLock,
0,
Improve table locking behavior in the face of current DDL. In the previous coding, callers were faced with an awkward choice: look up the name, do permissions checks, and then lock the table; or look up the name, lock the table, and then do permissions checks. The first choice was wrong because the results of the name lookup and permissions checks might be out-of-date by the time the table lock was acquired, while the second allowed a user with no privileges to interfere with access to a table by users who do have privileges (e.g. if a malicious backend queues up for an AccessExclusiveLock on a table on which AccessShareLock is already held, further attempts to access the table will be blocked until the AccessExclusiveLock is obtained and the malicious backend's transaction rolls back). To fix, allow callers of RangeVarGetRelid() to pass a callback which gets executed after performing the name lookup but before acquiring the relation lock. If the name lookup is retried (because invalidation messages are received), the callback will be re-executed as well, so we get the best of both worlds. RangeVarGetRelid() is renamed to RangeVarGetRelidExtended(); callers not wishing to supply a callback can continue to invoke it as RangeVarGetRelid(), which is now a macro. Since the only one caller that uses nowait = true now passes a callback anyway, the RangeVarGetRelid() macro defaults nowait as well. The callback can also be used for supplemental locking - for example, REINDEX INDEX needs to acquire the table lock before the index lock to reduce deadlock possibilities. There's a lot more work to be done here to fix all the cases where this can be a problem, but this commit provides the general infrastructure and fixes the following specific cases: REINDEX INDEX, REINDEX TABLE, LOCK TABLE, and and DROP TABLE/INDEX/SEQUENCE/VIEW/FOREIGN TABLE. Per discussion with Noah Misch and Alvaro Herrera.
2011-11-30 16:12:27 +01:00
RangeVarCallbackForReindexIndex,
(void *) &heapOid);
/*
* Obtain the current persistence of the existing index. We already hold
* lock on the index.
*/
irel = index_open(indOid, NoLock);
if (irel->rd_rel->relkind == RELKIND_PARTITIONED_INDEX)
{
ReindexPartitionedIndex(irel);
return;
}
persistence = irel->rd_rel->relpersistence;
index_close(irel, NoLock);
if (concurrent)
ReindexRelationConcurrently(indOid, options);
else
reindex_index(indOid, false, persistence, options);
Improve table locking behavior in the face of current DDL. In the previous coding, callers were faced with an awkward choice: look up the name, do permissions checks, and then lock the table; or look up the name, lock the table, and then do permissions checks. The first choice was wrong because the results of the name lookup and permissions checks might be out-of-date by the time the table lock was acquired, while the second allowed a user with no privileges to interfere with access to a table by users who do have privileges (e.g. if a malicious backend queues up for an AccessExclusiveLock on a table on which AccessShareLock is already held, further attempts to access the table will be blocked until the AccessExclusiveLock is obtained and the malicious backend's transaction rolls back). To fix, allow callers of RangeVarGetRelid() to pass a callback which gets executed after performing the name lookup but before acquiring the relation lock. If the name lookup is retried (because invalidation messages are received), the callback will be re-executed as well, so we get the best of both worlds. RangeVarGetRelid() is renamed to RangeVarGetRelidExtended(); callers not wishing to supply a callback can continue to invoke it as RangeVarGetRelid(), which is now a macro. Since the only one caller that uses nowait = true now passes a callback anyway, the RangeVarGetRelid() macro defaults nowait as well. The callback can also be used for supplemental locking - for example, REINDEX INDEX needs to acquire the table lock before the index lock to reduce deadlock possibilities. There's a lot more work to be done here to fix all the cases where this can be a problem, but this commit provides the general infrastructure and fixes the following specific cases: REINDEX INDEX, REINDEX TABLE, LOCK TABLE, and and DROP TABLE/INDEX/SEQUENCE/VIEW/FOREIGN TABLE. Per discussion with Noah Misch and Alvaro Herrera.
2011-11-30 16:12:27 +01:00
}
/*
* Check permissions on table before acquiring relation lock; also lock
* the heap before the RangeVarGetRelidExtended takes the index lock, to avoid
* deadlocks.
*/
static void
RangeVarCallbackForReindexIndex(const RangeVar *relation,
Oid relId, Oid oldRelId, void *arg)
{
char relkind;
Oid *heapOid = (Oid *) arg;
/*
Improve table locking behavior in the face of current DDL. In the previous coding, callers were faced with an awkward choice: look up the name, do permissions checks, and then lock the table; or look up the name, lock the table, and then do permissions checks. The first choice was wrong because the results of the name lookup and permissions checks might be out-of-date by the time the table lock was acquired, while the second allowed a user with no privileges to interfere with access to a table by users who do have privileges (e.g. if a malicious backend queues up for an AccessExclusiveLock on a table on which AccessShareLock is already held, further attempts to access the table will be blocked until the AccessExclusiveLock is obtained and the malicious backend's transaction rolls back). To fix, allow callers of RangeVarGetRelid() to pass a callback which gets executed after performing the name lookup but before acquiring the relation lock. If the name lookup is retried (because invalidation messages are received), the callback will be re-executed as well, so we get the best of both worlds. RangeVarGetRelid() is renamed to RangeVarGetRelidExtended(); callers not wishing to supply a callback can continue to invoke it as RangeVarGetRelid(), which is now a macro. Since the only one caller that uses nowait = true now passes a callback anyway, the RangeVarGetRelid() macro defaults nowait as well. The callback can also be used for supplemental locking - for example, REINDEX INDEX needs to acquire the table lock before the index lock to reduce deadlock possibilities. There's a lot more work to be done here to fix all the cases where this can be a problem, but this commit provides the general infrastructure and fixes the following specific cases: REINDEX INDEX, REINDEX TABLE, LOCK TABLE, and and DROP TABLE/INDEX/SEQUENCE/VIEW/FOREIGN TABLE. Per discussion with Noah Misch and Alvaro Herrera.
2011-11-30 16:12:27 +01:00
* If we previously locked some other index's heap, and the name we're
* looking up no longer refers to that relation, release the now-useless
* lock.
*/
Improve table locking behavior in the face of current DDL. In the previous coding, callers were faced with an awkward choice: look up the name, do permissions checks, and then lock the table; or look up the name, lock the table, and then do permissions checks. The first choice was wrong because the results of the name lookup and permissions checks might be out-of-date by the time the table lock was acquired, while the second allowed a user with no privileges to interfere with access to a table by users who do have privileges (e.g. if a malicious backend queues up for an AccessExclusiveLock on a table on which AccessShareLock is already held, further attempts to access the table will be blocked until the AccessExclusiveLock is obtained and the malicious backend's transaction rolls back). To fix, allow callers of RangeVarGetRelid() to pass a callback which gets executed after performing the name lookup but before acquiring the relation lock. If the name lookup is retried (because invalidation messages are received), the callback will be re-executed as well, so we get the best of both worlds. RangeVarGetRelid() is renamed to RangeVarGetRelidExtended(); callers not wishing to supply a callback can continue to invoke it as RangeVarGetRelid(), which is now a macro. Since the only one caller that uses nowait = true now passes a callback anyway, the RangeVarGetRelid() macro defaults nowait as well. The callback can also be used for supplemental locking - for example, REINDEX INDEX needs to acquire the table lock before the index lock to reduce deadlock possibilities. There's a lot more work to be done here to fix all the cases where this can be a problem, but this commit provides the general infrastructure and fixes the following specific cases: REINDEX INDEX, REINDEX TABLE, LOCK TABLE, and and DROP TABLE/INDEX/SEQUENCE/VIEW/FOREIGN TABLE. Per discussion with Noah Misch and Alvaro Herrera.
2011-11-30 16:12:27 +01:00
if (relId != oldRelId && OidIsValid(oldRelId))
{
/* lock level here should match reindex_index() heap lock */
UnlockRelationOid(*heapOid, ShareLock);
*heapOid = InvalidOid;
}
/* If the relation does not exist, there's nothing more to do. */
if (!OidIsValid(relId))
return;
2000-02-18 10:30:20 +01:00
Improve table locking behavior in the face of current DDL. In the previous coding, callers were faced with an awkward choice: look up the name, do permissions checks, and then lock the table; or look up the name, lock the table, and then do permissions checks. The first choice was wrong because the results of the name lookup and permissions checks might be out-of-date by the time the table lock was acquired, while the second allowed a user with no privileges to interfere with access to a table by users who do have privileges (e.g. if a malicious backend queues up for an AccessExclusiveLock on a table on which AccessShareLock is already held, further attempts to access the table will be blocked until the AccessExclusiveLock is obtained and the malicious backend's transaction rolls back). To fix, allow callers of RangeVarGetRelid() to pass a callback which gets executed after performing the name lookup but before acquiring the relation lock. If the name lookup is retried (because invalidation messages are received), the callback will be re-executed as well, so we get the best of both worlds. RangeVarGetRelid() is renamed to RangeVarGetRelidExtended(); callers not wishing to supply a callback can continue to invoke it as RangeVarGetRelid(), which is now a macro. Since the only one caller that uses nowait = true now passes a callback anyway, the RangeVarGetRelid() macro defaults nowait as well. The callback can also be used for supplemental locking - for example, REINDEX INDEX needs to acquire the table lock before the index lock to reduce deadlock possibilities. There's a lot more work to be done here to fix all the cases where this can be a problem, but this commit provides the general infrastructure and fixes the following specific cases: REINDEX INDEX, REINDEX TABLE, LOCK TABLE, and and DROP TABLE/INDEX/SEQUENCE/VIEW/FOREIGN TABLE. Per discussion with Noah Misch and Alvaro Herrera.
2011-11-30 16:12:27 +01:00
/*
* If the relation does exist, check whether it's an index. But note that
* the relation might have been dropped between the time we did the name
* lookup and now. In that case, there's nothing to do.
Improve table locking behavior in the face of current DDL. In the previous coding, callers were faced with an awkward choice: look up the name, do permissions checks, and then lock the table; or look up the name, lock the table, and then do permissions checks. The first choice was wrong because the results of the name lookup and permissions checks might be out-of-date by the time the table lock was acquired, while the second allowed a user with no privileges to interfere with access to a table by users who do have privileges (e.g. if a malicious backend queues up for an AccessExclusiveLock on a table on which AccessShareLock is already held, further attempts to access the table will be blocked until the AccessExclusiveLock is obtained and the malicious backend's transaction rolls back). To fix, allow callers of RangeVarGetRelid() to pass a callback which gets executed after performing the name lookup but before acquiring the relation lock. If the name lookup is retried (because invalidation messages are received), the callback will be re-executed as well, so we get the best of both worlds. RangeVarGetRelid() is renamed to RangeVarGetRelidExtended(); callers not wishing to supply a callback can continue to invoke it as RangeVarGetRelid(), which is now a macro. Since the only one caller that uses nowait = true now passes a callback anyway, the RangeVarGetRelid() macro defaults nowait as well. The callback can also be used for supplemental locking - for example, REINDEX INDEX needs to acquire the table lock before the index lock to reduce deadlock possibilities. There's a lot more work to be done here to fix all the cases where this can be a problem, but this commit provides the general infrastructure and fixes the following specific cases: REINDEX INDEX, REINDEX TABLE, LOCK TABLE, and and DROP TABLE/INDEX/SEQUENCE/VIEW/FOREIGN TABLE. Per discussion with Noah Misch and Alvaro Herrera.
2011-11-30 16:12:27 +01:00
*/
relkind = get_rel_relkind(relId);
if (!relkind)
return;
if (relkind != RELKIND_INDEX &&
relkind != RELKIND_PARTITIONED_INDEX)
ereport(ERROR,
(errcode(ERRCODE_WRONG_OBJECT_TYPE),
Improve table locking behavior in the face of current DDL. In the previous coding, callers were faced with an awkward choice: look up the name, do permissions checks, and then lock the table; or look up the name, lock the table, and then do permissions checks. The first choice was wrong because the results of the name lookup and permissions checks might be out-of-date by the time the table lock was acquired, while the second allowed a user with no privileges to interfere with access to a table by users who do have privileges (e.g. if a malicious backend queues up for an AccessExclusiveLock on a table on which AccessShareLock is already held, further attempts to access the table will be blocked until the AccessExclusiveLock is obtained and the malicious backend's transaction rolls back). To fix, allow callers of RangeVarGetRelid() to pass a callback which gets executed after performing the name lookup but before acquiring the relation lock. If the name lookup is retried (because invalidation messages are received), the callback will be re-executed as well, so we get the best of both worlds. RangeVarGetRelid() is renamed to RangeVarGetRelidExtended(); callers not wishing to supply a callback can continue to invoke it as RangeVarGetRelid(), which is now a macro. Since the only one caller that uses nowait = true now passes a callback anyway, the RangeVarGetRelid() macro defaults nowait as well. The callback can also be used for supplemental locking - for example, REINDEX INDEX needs to acquire the table lock before the index lock to reduce deadlock possibilities. There's a lot more work to be done here to fix all the cases where this can be a problem, but this commit provides the general infrastructure and fixes the following specific cases: REINDEX INDEX, REINDEX TABLE, LOCK TABLE, and and DROP TABLE/INDEX/SEQUENCE/VIEW/FOREIGN TABLE. Per discussion with Noah Misch and Alvaro Herrera.
2011-11-30 16:12:27 +01:00
errmsg("\"%s\" is not an index", relation->relname)));
/* Check permissions */
Improve table locking behavior in the face of current DDL. In the previous coding, callers were faced with an awkward choice: look up the name, do permissions checks, and then lock the table; or look up the name, lock the table, and then do permissions checks. The first choice was wrong because the results of the name lookup and permissions checks might be out-of-date by the time the table lock was acquired, while the second allowed a user with no privileges to interfere with access to a table by users who do have privileges (e.g. if a malicious backend queues up for an AccessExclusiveLock on a table on which AccessShareLock is already held, further attempts to access the table will be blocked until the AccessExclusiveLock is obtained and the malicious backend's transaction rolls back). To fix, allow callers of RangeVarGetRelid() to pass a callback which gets executed after performing the name lookup but before acquiring the relation lock. If the name lookup is retried (because invalidation messages are received), the callback will be re-executed as well, so we get the best of both worlds. RangeVarGetRelid() is renamed to RangeVarGetRelidExtended(); callers not wishing to supply a callback can continue to invoke it as RangeVarGetRelid(), which is now a macro. Since the only one caller that uses nowait = true now passes a callback anyway, the RangeVarGetRelid() macro defaults nowait as well. The callback can also be used for supplemental locking - for example, REINDEX INDEX needs to acquire the table lock before the index lock to reduce deadlock possibilities. There's a lot more work to be done here to fix all the cases where this can be a problem, but this commit provides the general infrastructure and fixes the following specific cases: REINDEX INDEX, REINDEX TABLE, LOCK TABLE, and and DROP TABLE/INDEX/SEQUENCE/VIEW/FOREIGN TABLE. Per discussion with Noah Misch and Alvaro Herrera.
2011-11-30 16:12:27 +01:00
if (!pg_class_ownercheck(relId, GetUserId()))
aclcheck_error(ACLCHECK_NOT_OWNER, OBJECT_INDEX, relation->relname);
2000-02-18 10:30:20 +01:00
Improve table locking behavior in the face of current DDL. In the previous coding, callers were faced with an awkward choice: look up the name, do permissions checks, and then lock the table; or look up the name, lock the table, and then do permissions checks. The first choice was wrong because the results of the name lookup and permissions checks might be out-of-date by the time the table lock was acquired, while the second allowed a user with no privileges to interfere with access to a table by users who do have privileges (e.g. if a malicious backend queues up for an AccessExclusiveLock on a table on which AccessShareLock is already held, further attempts to access the table will be blocked until the AccessExclusiveLock is obtained and the malicious backend's transaction rolls back). To fix, allow callers of RangeVarGetRelid() to pass a callback which gets executed after performing the name lookup but before acquiring the relation lock. If the name lookup is retried (because invalidation messages are received), the callback will be re-executed as well, so we get the best of both worlds. RangeVarGetRelid() is renamed to RangeVarGetRelidExtended(); callers not wishing to supply a callback can continue to invoke it as RangeVarGetRelid(), which is now a macro. Since the only one caller that uses nowait = true now passes a callback anyway, the RangeVarGetRelid() macro defaults nowait as well. The callback can also be used for supplemental locking - for example, REINDEX INDEX needs to acquire the table lock before the index lock to reduce deadlock possibilities. There's a lot more work to be done here to fix all the cases where this can be a problem, but this commit provides the general infrastructure and fixes the following specific cases: REINDEX INDEX, REINDEX TABLE, LOCK TABLE, and and DROP TABLE/INDEX/SEQUENCE/VIEW/FOREIGN TABLE. Per discussion with Noah Misch and Alvaro Herrera.
2011-11-30 16:12:27 +01:00
/* Lock heap before index to avoid deadlock. */
if (relId != oldRelId)
{
/*
* Lock level here should match reindex_index() heap lock. If the OID
* isn't valid, it means the index as concurrently dropped, which is
* not a problem for us; just return normally.
Improve table locking behavior in the face of current DDL. In the previous coding, callers were faced with an awkward choice: look up the name, do permissions checks, and then lock the table; or look up the name, lock the table, and then do permissions checks. The first choice was wrong because the results of the name lookup and permissions checks might be out-of-date by the time the table lock was acquired, while the second allowed a user with no privileges to interfere with access to a table by users who do have privileges (e.g. if a malicious backend queues up for an AccessExclusiveLock on a table on which AccessShareLock is already held, further attempts to access the table will be blocked until the AccessExclusiveLock is obtained and the malicious backend's transaction rolls back). To fix, allow callers of RangeVarGetRelid() to pass a callback which gets executed after performing the name lookup but before acquiring the relation lock. If the name lookup is retried (because invalidation messages are received), the callback will be re-executed as well, so we get the best of both worlds. RangeVarGetRelid() is renamed to RangeVarGetRelidExtended(); callers not wishing to supply a callback can continue to invoke it as RangeVarGetRelid(), which is now a macro. Since the only one caller that uses nowait = true now passes a callback anyway, the RangeVarGetRelid() macro defaults nowait as well. The callback can also be used for supplemental locking - for example, REINDEX INDEX needs to acquire the table lock before the index lock to reduce deadlock possibilities. There's a lot more work to be done here to fix all the cases where this can be a problem, but this commit provides the general infrastructure and fixes the following specific cases: REINDEX INDEX, REINDEX TABLE, LOCK TABLE, and and DROP TABLE/INDEX/SEQUENCE/VIEW/FOREIGN TABLE. Per discussion with Noah Misch and Alvaro Herrera.
2011-11-30 16:12:27 +01:00
*/
*heapOid = IndexGetRelation(relId, true);
if (OidIsValid(*heapOid))
LockRelationOid(*heapOid, ShareLock);
}
2000-02-18 10:30:20 +01:00
}
/*
* ReindexTable
* Recreate all indexes of a table (and of its toast table, if any)
2000-02-18 10:30:20 +01:00
*/
Oid
ReindexTable(RangeVar *relation, int options, bool concurrent)
2000-02-18 10:30:20 +01:00
{
Oid heapOid;
bool result;
2000-02-18 10:30:20 +01:00
/* The lock level used here should match reindex_relation(). */
heapOid = RangeVarGetRelidExtended(relation,
concurrent ? ShareUpdateExclusiveLock : ShareLock,
0,
RangeVarCallbackOwnsTable, NULL);
if (concurrent)
result = ReindexRelationConcurrently(heapOid, options);
else
result = reindex_relation(heapOid,
REINDEX_REL_PROCESS_TOAST |
REINDEX_REL_CHECK_CONSTRAINTS,
options);
if (!result)
ereport(NOTICE,
(errmsg("table \"%s\" has no indexes",
relation->relname)));
return heapOid;
2000-02-18 10:30:20 +01:00
}
/*
* ReindexMultipleTables
* Recreate indexes of tables selected by objectName/objectKind.
*
* To reduce the probability of deadlocks, each table is reindexed in a
* separate transaction, so we can release the lock on it right away.
* That means this must not be called within a user transaction block!
2000-02-18 10:30:20 +01:00
*/
void
ReindexMultipleTables(const char *objectName, ReindexObjectType objectKind,
int options, bool concurrent)
2000-02-18 10:30:20 +01:00
{
Oid objectOid;
2004-08-29 07:07:03 +02:00
Relation relationRelation;
tableam: Add and use scan APIs. Too allow table accesses to be not directly dependent on heap, several new abstractions are needed. Specifically: 1) Heap scans need to be generalized into table scans. Do this by introducing TableScanDesc, which will be the "base class" for individual AMs. This contains the AM independent fields from HeapScanDesc. The previous heap_{beginscan,rescan,endscan} et al. have been replaced with a table_ version. There's no direct replacement for heap_getnext(), as that returned a HeapTuple, which is undesirable for a other AMs. Instead there's table_scan_getnextslot(). But note that heap_getnext() lives on, it's still used widely to access catalog tables. This is achieved by new scan_begin, scan_end, scan_rescan, scan_getnextslot callbacks. 2) The portion of parallel scans that's shared between backends need to be able to do so without the user doing per-AM work. To achieve that new parallelscan_{estimate, initialize, reinitialize} callbacks are introduced, which operate on a new ParallelTableScanDesc, which again can be subclassed by AMs. As it is likely that several AMs are going to be block oriented, block oriented callbacks that can be shared between such AMs are provided and used by heap. table_block_parallelscan_{estimate, intiialize, reinitialize} as callbacks, and table_block_parallelscan_{nextpage, init} for use in AMs. These operate on a ParallelBlockTableScanDesc. 3) Index scans need to be able to access tables to return a tuple, and there needs to be state across individual accesses to the heap to store state like buffers. That's now handled by introducing a sort-of-scan IndexFetchTable, which again is intended to be subclassed by individual AMs (for heap IndexFetchHeap). The relevant callbacks for an AM are index_fetch_{end, begin, reset} to create the necessary state, and index_fetch_tuple to retrieve an indexed tuple. Note that index_fetch_tuple implementations need to be smarter than just blindly fetching the tuples for AMs that have optimizations similar to heap's HOT - the currently alive tuple in the update chain needs to be fetched if appropriate. Similar to table_scan_getnextslot(), it's undesirable to continue to return HeapTuples. Thus index_fetch_heap (might want to rename that later) now accepts a slot as an argument. Core code doesn't have a lot of call sites performing index scans without going through the systable_* API (in contrast to loads of heap_getnext calls and working directly with HeapTuples). Index scans now store the result of a search in IndexScanDesc->xs_heaptid, rather than xs_ctup->t_self. As the target is not generally a HeapTuple anymore that seems cleaner. To be able to sensible adapt code to use the above, two further callbacks have been introduced: a) slot_callbacks returns a TupleTableSlotOps* suitable for creating slots capable of holding a tuple of the AMs type. table_slot_callbacks() and table_slot_create() are based upon that, but have additional logic to deal with views, foreign tables, etc. While this change could have been done separately, nearly all the call sites that needed to be adapted for the rest of this commit also would have been needed to be adapted for table_slot_callbacks(), making separation not worthwhile. b) tuple_satisfies_snapshot checks whether the tuple in a slot is currently visible according to a snapshot. That's required as a few places now don't have a buffer + HeapTuple around, but a slot (which in heap's case internally has that information). Additionally a few infrastructure changes were needed: I) SysScanDesc, as used by systable_{beginscan, getnext} et al. now internally uses a slot to keep track of tuples. While systable_getnext() still returns HeapTuples, and will so for the foreseeable future, the index API (see 1) above) now only deals with slots. The remainder, and largest part, of this commit is then adjusting all scans in postgres to use the new APIs. Author: Andres Freund, Haribabu Kommi, Alvaro Herrera Discussion: https://postgr.es/m/20180703070645.wchpu5muyto5n647@alap3.anarazel.de https://postgr.es/m/20160812231527.GA690404@alvherre.pgsql
2019-03-11 20:46:41 +01:00
TableScanDesc scan;
ScanKeyData scan_keys[1];
2004-08-29 07:07:03 +02:00
HeapTuple tuple;
MemoryContext private_context;
MemoryContext old;
2004-08-29 07:07:03 +02:00
List *relids = NIL;
ListCell *l;
int num_keys;
bool concurrent_warning = false;
2000-02-18 10:30:20 +01:00
AssertArg(objectName);
Assert(objectKind == REINDEX_OBJECT_SCHEMA ||
objectKind == REINDEX_OBJECT_SYSTEM ||
objectKind == REINDEX_OBJECT_DATABASE);
2000-02-18 10:30:20 +01:00
if (objectKind == REINDEX_OBJECT_SYSTEM && concurrent)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("concurrent reindex of system catalogs is not supported")));
/*
* Get OID of object to reindex, being the database currently being used
* by session for a database or for system catalogs, or the schema defined
* by caller. At the same time do permission checks that need different
* processing depending on the object type.
*/
if (objectKind == REINDEX_OBJECT_SCHEMA)
{
objectOid = get_namespace_oid(objectName, false);
if (!pg_namespace_ownercheck(objectOid, GetUserId()))
aclcheck_error(ACLCHECK_NOT_OWNER, OBJECT_SCHEMA,
objectName);
}
else
{
objectOid = MyDatabaseId;
if (strcmp(objectName, get_database_name(objectOid)) != 0)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("can only reindex the currently open database")));
if (!pg_database_ownercheck(objectOid, GetUserId()))
aclcheck_error(ACLCHECK_NOT_OWNER, OBJECT_DATABASE,
objectName);
}
2000-02-18 10:30:20 +01:00
/*
2005-10-15 04:49:52 +02:00
* Create a memory context that will survive forced transaction commits we
* do below. Since it is a child of PortalContext, it will go away
* eventually even if we suffer an error; there's no need for special
* abort cleanup logic.
*/
private_context = AllocSetContextCreate(PortalContext,
"ReindexMultipleTables",
Add macros to make AllocSetContextCreate() calls simpler and safer. I found that half a dozen (nearly 5%) of our AllocSetContextCreate calls had typos in the context-sizing parameters. While none of these led to especially significant problems, they did create minor inefficiencies, and it's now clear that expecting people to copy-and-paste those calls accurately is not a great idea. Let's reduce the risk of future errors by introducing single macros that encapsulate the common use-cases. Three such macros are enough to cover all but two special-purpose contexts; those two calls can be left as-is, I think. While this patch doesn't in itself improve matters for third-party extensions, it doesn't break anything for them either, and they can gradually adopt the simplified notation over time. In passing, change TopMemoryContext to use the default allocation parameters. Formerly it could only be extended 8K at a time. That was probably reasonable when this code was written; but nowadays we create many more contexts than we did then, so that it's not unusual to have a couple hundred K in TopMemoryContext, even without considering various dubious code that sticks other things there. There seems no good reason not to let it use growing blocks like most other contexts. Back-patch to 9.6, mostly because that's still close enough to HEAD that it's easy to do so, and keeping the branches in sync can be expected to avoid some future back-patching pain. The bugs fixed by these changes don't seem to be significant enough to justify fixing them further back. Discussion: <21072.1472321324@sss.pgh.pa.us>
2016-08-27 23:50:38 +02:00
ALLOCSET_SMALL_SIZES);
2000-02-18 10:30:20 +01:00
/*
* Define the search keys to find the objects to reindex. For a schema, we
* select target relations using relnamespace, something not necessary for
* a database-wide operation.
*/
if (objectKind == REINDEX_OBJECT_SCHEMA)
{
num_keys = 1;
ScanKeyInit(&scan_keys[0],
Anum_pg_class_relnamespace,
BTEqualStrategyNumber, F_OIDEQ,
ObjectIdGetDatum(objectOid));
}
else
num_keys = 0;
/*
* Scan pg_class to build a list of the relations we need to reindex.
*
* We only consider plain relations and materialized views here (toast
* rels will be processed indirectly by reindex_relation).
*/
relationRelation = table_open(RelationRelationId, AccessShareLock);
tableam: Add and use scan APIs. Too allow table accesses to be not directly dependent on heap, several new abstractions are needed. Specifically: 1) Heap scans need to be generalized into table scans. Do this by introducing TableScanDesc, which will be the "base class" for individual AMs. This contains the AM independent fields from HeapScanDesc. The previous heap_{beginscan,rescan,endscan} et al. have been replaced with a table_ version. There's no direct replacement for heap_getnext(), as that returned a HeapTuple, which is undesirable for a other AMs. Instead there's table_scan_getnextslot(). But note that heap_getnext() lives on, it's still used widely to access catalog tables. This is achieved by new scan_begin, scan_end, scan_rescan, scan_getnextslot callbacks. 2) The portion of parallel scans that's shared between backends need to be able to do so without the user doing per-AM work. To achieve that new parallelscan_{estimate, initialize, reinitialize} callbacks are introduced, which operate on a new ParallelTableScanDesc, which again can be subclassed by AMs. As it is likely that several AMs are going to be block oriented, block oriented callbacks that can be shared between such AMs are provided and used by heap. table_block_parallelscan_{estimate, intiialize, reinitialize} as callbacks, and table_block_parallelscan_{nextpage, init} for use in AMs. These operate on a ParallelBlockTableScanDesc. 3) Index scans need to be able to access tables to return a tuple, and there needs to be state across individual accesses to the heap to store state like buffers. That's now handled by introducing a sort-of-scan IndexFetchTable, which again is intended to be subclassed by individual AMs (for heap IndexFetchHeap). The relevant callbacks for an AM are index_fetch_{end, begin, reset} to create the necessary state, and index_fetch_tuple to retrieve an indexed tuple. Note that index_fetch_tuple implementations need to be smarter than just blindly fetching the tuples for AMs that have optimizations similar to heap's HOT - the currently alive tuple in the update chain needs to be fetched if appropriate. Similar to table_scan_getnextslot(), it's undesirable to continue to return HeapTuples. Thus index_fetch_heap (might want to rename that later) now accepts a slot as an argument. Core code doesn't have a lot of call sites performing index scans without going through the systable_* API (in contrast to loads of heap_getnext calls and working directly with HeapTuples). Index scans now store the result of a search in IndexScanDesc->xs_heaptid, rather than xs_ctup->t_self. As the target is not generally a HeapTuple anymore that seems cleaner. To be able to sensible adapt code to use the above, two further callbacks have been introduced: a) slot_callbacks returns a TupleTableSlotOps* suitable for creating slots capable of holding a tuple of the AMs type. table_slot_callbacks() and table_slot_create() are based upon that, but have additional logic to deal with views, foreign tables, etc. While this change could have been done separately, nearly all the call sites that needed to be adapted for the rest of this commit also would have been needed to be adapted for table_slot_callbacks(), making separation not worthwhile. b) tuple_satisfies_snapshot checks whether the tuple in a slot is currently visible according to a snapshot. That's required as a few places now don't have a buffer + HeapTuple around, but a slot (which in heap's case internally has that information). Additionally a few infrastructure changes were needed: I) SysScanDesc, as used by systable_{beginscan, getnext} et al. now internally uses a slot to keep track of tuples. While systable_getnext() still returns HeapTuples, and will so for the foreseeable future, the index API (see 1) above) now only deals with slots. The remainder, and largest part, of this commit is then adjusting all scans in postgres to use the new APIs. Author: Andres Freund, Haribabu Kommi, Alvaro Herrera Discussion: https://postgr.es/m/20180703070645.wchpu5muyto5n647@alap3.anarazel.de https://postgr.es/m/20160812231527.GA690404@alvherre.pgsql
2019-03-11 20:46:41 +01:00
scan = table_beginscan_catalog(relationRelation, num_keys, scan_keys);
while ((tuple = heap_getnext(scan, ForwardScanDirection)) != NULL)
2000-02-18 10:30:20 +01:00
{
Form_pg_class classtuple = (Form_pg_class) GETSTRUCT(tuple);
Remove WITH OIDS support, change oid catalog column visibility. Previously tables declared WITH OIDS, including a significant fraction of the catalog tables, stored the oid column not as a normal column, but as part of the tuple header. This special column was not shown by default, which was somewhat odd, as it's often (consider e.g. pg_class.oid) one of the more important parts of a row. Neither pg_dump nor COPY included the contents of the oid column by default. The fact that the oid column was not an ordinary column necessitated a significant amount of special case code to support oid columns. That already was painful for the existing, but upcoming work aiming to make table storage pluggable, would have required expanding and duplicating that "specialness" significantly. WITH OIDS has been deprecated since 2005 (commit ff02d0a05280e0). Remove it. Removing includes: - CREATE TABLE and ALTER TABLE syntax for declaring the table to be WITH OIDS has been removed (WITH (oids[ = true]) will error out) - pg_dump does not support dumping tables declared WITH OIDS and will issue a warning when dumping one (and ignore the oid column). - restoring an pg_dump archive with pg_restore will warn when restoring a table with oid contents (and ignore the oid column) - COPY will refuse to load binary dump that includes oids. - pg_upgrade will error out when encountering tables declared WITH OIDS, they have to be altered to remove the oid column first. - Functionality to access the oid of the last inserted row (like plpgsql's RESULT_OID, spi's SPI_lastoid, ...) has been removed. The syntax for declaring a table WITHOUT OIDS (or WITH (oids = false) for CREATE TABLE) is still supported. While that requires a bit of support code, it seems unnecessary to break applications / dumps that do not use oids, and are explicit about not using them. The biggest user of WITH OID columns was postgres' catalog. This commit changes all 'magic' oid columns to be columns that are normally declared and stored. To reduce unnecessary query breakage all the newly added columns are still named 'oid', even if a table's column naming scheme would indicate 'reloid' or such. This obviously requires adapting a lot code, mostly replacing oid access via HeapTupleGetOid() with access to the underlying Form_pg_*->oid column. The bootstrap process now assigns oids for all oid columns in genbki.pl that do not have an explicit value (starting at the largest oid previously used), only oids assigned later by oids will be above FirstBootstrapObjectId. As the oid column now is a normal column the special bootstrap syntax for oids has been removed. Oids are not automatically assigned during insertion anymore, all backend code explicitly assigns oids with GetNewOidWithIndex(). For the rare case that insertions into the catalog via SQL are called for the new pg_nextoid() function can be used (which only works on catalog tables). The fact that oid columns on system tables are now normal columns means that they will be included in the set of columns expanded by * (i.e. SELECT * FROM pg_class will now include the table's oid, previously it did not). It'd not technically be hard to hide oid column by default, but that'd mean confusing behavior would either have to be carried forward forever, or it'd cause breakage down the line. While it's not unlikely that further adjustments are needed, the scope/invasiveness of the patch makes it worthwhile to get merge this now. It's painful to maintain externally, too complicated to commit after the code code freeze, and a dependency of a number of other patches. Catversion bump, for obvious reasons. Author: Andres Freund, with contributions by John Naylor Discussion: https://postgr.es/m/20180930034810.ywp2c7awz7opzcfr@alap3.anarazel.de
2018-11-21 00:36:57 +01:00
Oid relid = classtuple->oid;
/*
* Only regular tables and matviews can have indexes, so ignore any
* other kind of relation.
*
* It is tempting to also consider partitioned tables here, but that
* has the problem that if the children are in the same schema, they
* would be processed twice. Maybe we could have a separate list of
* partitioned tables, and expand that afterwards into relids,
* ignoring any duplicates.
*/
if (classtuple->relkind != RELKIND_RELATION &&
classtuple->relkind != RELKIND_MATVIEW)
continue;
/* Skip temp tables of other backends; we can't reindex them at all */
if (classtuple->relpersistence == RELPERSISTENCE_TEMP &&
!isTempNamespace(classtuple->relnamespace))
continue;
/* Check user/system classification, and optionally skip */
if (objectKind == REINDEX_OBJECT_SYSTEM &&
!IsSystemClass(relid, classtuple))
continue;
/*
* The table can be reindexed if the user is superuser, the table
* owner, or the database/schema owner (but in the latter case, only
* if it's not a shared relation). pg_class_ownercheck includes the
* superuser case, and depending on objectKind we already know that
* the user has permission to run REINDEX on this database or schema
* per the permission checks at the beginning of this routine.
*/
if (classtuple->relisshared &&
!pg_class_ownercheck(relid, GetUserId()))
continue;
/*
* Skip system tables that index_create() would reject to index
* concurrently. XXX We need the additional check for
* FirstNormalObjectId to skip information_schema tables, because
* IsCatalogClass() here does not cover information_schema, but the
* check in index_create() will error on the TOAST tables of
* information_schema tables.
*/
if (concurrent &&
(IsCatalogClass(relid, classtuple) || relid < FirstNormalObjectId))
{
if (!concurrent_warning)
ereport(WARNING,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("concurrent reindex is not supported for catalog relations, skipping all")));
concurrent_warning = true;
continue;
}
/* Save the list of relation OIDs in private context */
old = MemoryContextSwitchTo(private_context);
/*
* We always want to reindex pg_class first if it's selected to be
* reindexed. This ensures that if there is any corruption in
* pg_class' indexes, they will be fixed before we process any other
* tables. This is critical because reindexing itself will try to
* update pg_class.
*/
if (relid == RelationRelationId)
relids = lcons_oid(relid, relids);
else
relids = lappend_oid(relids, relid);
MemoryContextSwitchTo(old);
2000-02-18 10:30:20 +01:00
}
tableam: Add and use scan APIs. Too allow table accesses to be not directly dependent on heap, several new abstractions are needed. Specifically: 1) Heap scans need to be generalized into table scans. Do this by introducing TableScanDesc, which will be the "base class" for individual AMs. This contains the AM independent fields from HeapScanDesc. The previous heap_{beginscan,rescan,endscan} et al. have been replaced with a table_ version. There's no direct replacement for heap_getnext(), as that returned a HeapTuple, which is undesirable for a other AMs. Instead there's table_scan_getnextslot(). But note that heap_getnext() lives on, it's still used widely to access catalog tables. This is achieved by new scan_begin, scan_end, scan_rescan, scan_getnextslot callbacks. 2) The portion of parallel scans that's shared between backends need to be able to do so without the user doing per-AM work. To achieve that new parallelscan_{estimate, initialize, reinitialize} callbacks are introduced, which operate on a new ParallelTableScanDesc, which again can be subclassed by AMs. As it is likely that several AMs are going to be block oriented, block oriented callbacks that can be shared between such AMs are provided and used by heap. table_block_parallelscan_{estimate, intiialize, reinitialize} as callbacks, and table_block_parallelscan_{nextpage, init} for use in AMs. These operate on a ParallelBlockTableScanDesc. 3) Index scans need to be able to access tables to return a tuple, and there needs to be state across individual accesses to the heap to store state like buffers. That's now handled by introducing a sort-of-scan IndexFetchTable, which again is intended to be subclassed by individual AMs (for heap IndexFetchHeap). The relevant callbacks for an AM are index_fetch_{end, begin, reset} to create the necessary state, and index_fetch_tuple to retrieve an indexed tuple. Note that index_fetch_tuple implementations need to be smarter than just blindly fetching the tuples for AMs that have optimizations similar to heap's HOT - the currently alive tuple in the update chain needs to be fetched if appropriate. Similar to table_scan_getnextslot(), it's undesirable to continue to return HeapTuples. Thus index_fetch_heap (might want to rename that later) now accepts a slot as an argument. Core code doesn't have a lot of call sites performing index scans without going through the systable_* API (in contrast to loads of heap_getnext calls and working directly with HeapTuples). Index scans now store the result of a search in IndexScanDesc->xs_heaptid, rather than xs_ctup->t_self. As the target is not generally a HeapTuple anymore that seems cleaner. To be able to sensible adapt code to use the above, two further callbacks have been introduced: a) slot_callbacks returns a TupleTableSlotOps* suitable for creating slots capable of holding a tuple of the AMs type. table_slot_callbacks() and table_slot_create() are based upon that, but have additional logic to deal with views, foreign tables, etc. While this change could have been done separately, nearly all the call sites that needed to be adapted for the rest of this commit also would have been needed to be adapted for table_slot_callbacks(), making separation not worthwhile. b) tuple_satisfies_snapshot checks whether the tuple in a slot is currently visible according to a snapshot. That's required as a few places now don't have a buffer + HeapTuple around, but a slot (which in heap's case internally has that information). Additionally a few infrastructure changes were needed: I) SysScanDesc, as used by systable_{beginscan, getnext} et al. now internally uses a slot to keep track of tuples. While systable_getnext() still returns HeapTuples, and will so for the foreseeable future, the index API (see 1) above) now only deals with slots. The remainder, and largest part, of this commit is then adjusting all scans in postgres to use the new APIs. Author: Andres Freund, Haribabu Kommi, Alvaro Herrera Discussion: https://postgr.es/m/20180703070645.wchpu5muyto5n647@alap3.anarazel.de https://postgr.es/m/20160812231527.GA690404@alvherre.pgsql
2019-03-11 20:46:41 +01:00
table_endscan(scan);
table_close(relationRelation, AccessShareLock);
2000-02-18 10:30:20 +01:00
/* Now reindex each rel in a separate transaction */
PopActiveSnapshot();
CommitTransactionCommand();
foreach(l, relids)
2000-02-18 10:30:20 +01:00
{
2004-08-29 07:07:03 +02:00
Oid relid = lfirst_oid(l);
bool result;
StartTransactionCommand();
/* functions in indexes may want a snapshot set */
PushActiveSnapshot(GetTransactionSnapshot());
if (concurrent)
{
result = ReindexRelationConcurrently(relid, options);
/* ReindexRelationConcurrently() does the verbose output */
}
else
{
result = reindex_relation(relid,
REINDEX_REL_PROCESS_TOAST |
REINDEX_REL_CHECK_CONSTRAINTS,
options);
if (result && (options & REINDEXOPT_VERBOSE))
ereport(INFO,
(errmsg("table \"%s.%s\" was reindexed",
get_namespace_name(get_rel_namespace(relid)),
get_rel_name(relid))));
PopActiveSnapshot();
}
CommitTransactionCommand();
}
StartTransactionCommand();
MemoryContextDelete(private_context);
}
/*
* ReindexRelationConcurrently - process REINDEX CONCURRENTLY for given
* relation OID
*
* The relation can be either an index or a table. If it is a table, all its
* valid indexes will be rebuilt, including its associated toast table
* indexes. If it is an index, this index itself will be rebuilt.
*
* The locks taken on parent tables and involved indexes are kept until the
* transaction is committed, at which point a session lock is taken on each
* relation. Both of these protect against concurrent schema changes.
*/
static bool
ReindexRelationConcurrently(Oid relationOid, int options)
{
List *heapRelationIds = NIL;
List *indexIds = NIL;
List *newIndexIds = NIL;
List *relationLocks = NIL;
List *lockTags = NIL;
ListCell *lc,
*lc2;
MemoryContext private_context;
MemoryContext oldcontext;
char relkind;
char *relationName = NULL;
char *relationNamespace = NULL;
PGRUsage ru0;
/*
* Create a memory context that will survive forced transaction commits we
* do below. Since it is a child of PortalContext, it will go away
* eventually even if we suffer an error; there's no need for special
* abort cleanup logic.
*/
private_context = AllocSetContextCreate(PortalContext,
"ReindexConcurrent",
ALLOCSET_SMALL_SIZES);
if (options & REINDEXOPT_VERBOSE)
{
/* Save data needed by REINDEX VERBOSE in private context */
oldcontext = MemoryContextSwitchTo(private_context);
relationName = get_rel_name(relationOid);
relationNamespace = get_namespace_name(get_rel_namespace(relationOid));
pg_rusage_init(&ru0);
MemoryContextSwitchTo(oldcontext);
}
relkind = get_rel_relkind(relationOid);
/*
* Extract the list of indexes that are going to be rebuilt based on the
* list of relation Oids given by caller.
*/
switch (relkind)
{
case RELKIND_RELATION:
case RELKIND_MATVIEW:
case RELKIND_TOASTVALUE:
{
/*
* In the case of a relation, find all its indexes including
* toast indexes.
*/
Relation heapRelation;
/* Save the list of relation OIDs in private context */
oldcontext = MemoryContextSwitchTo(private_context);
/* Track this relation for session locks */
heapRelationIds = lappend_oid(heapRelationIds, relationOid);
MemoryContextSwitchTo(oldcontext);
/* Open relation to get its indexes */
heapRelation = table_open(relationOid, ShareUpdateExclusiveLock);
/* Add all the valid indexes of relation to list */
foreach(lc, RelationGetIndexList(heapRelation))
{
Oid cellOid = lfirst_oid(lc);
Relation indexRelation = index_open(cellOid,
ShareUpdateExclusiveLock);
if (!indexRelation->rd_index->indisvalid)
ereport(WARNING,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("cannot reindex concurrently invalid index \"%s.%s\", skipping",
get_namespace_name(get_rel_namespace(cellOid)),
get_rel_name(cellOid))));
else if (indexRelation->rd_index->indisexclusion)
ereport(WARNING,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("cannot reindex concurrently exclusion constraint index \"%s.%s\", skipping",
get_namespace_name(get_rel_namespace(cellOid)),
get_rel_name(cellOid))));
else
{
/* Save the list of relation OIDs in private context */
oldcontext = MemoryContextSwitchTo(private_context);
indexIds = lappend_oid(indexIds, cellOid);
MemoryContextSwitchTo(oldcontext);
}
index_close(indexRelation, NoLock);
}
/* Also add the toast indexes */
if (OidIsValid(heapRelation->rd_rel->reltoastrelid))
{
Oid toastOid = heapRelation->rd_rel->reltoastrelid;
Relation toastRelation = table_open(toastOid,
ShareUpdateExclusiveLock);
/* Save the list of relation OIDs in private context */
oldcontext = MemoryContextSwitchTo(private_context);
/* Track this relation for session locks */
heapRelationIds = lappend_oid(heapRelationIds, toastOid);
MemoryContextSwitchTo(oldcontext);
foreach(lc2, RelationGetIndexList(toastRelation))
{
Oid cellOid = lfirst_oid(lc2);
Relation indexRelation = index_open(cellOid,
ShareUpdateExclusiveLock);
if (!indexRelation->rd_index->indisvalid)
ereport(WARNING,
(errcode(ERRCODE_INDEX_CORRUPTED),
errmsg("cannot reindex concurrently invalid index \"%s.%s\", skipping",
get_namespace_name(get_rel_namespace(cellOid)),
get_rel_name(cellOid))));
else
{
/*
* Save the list of relation OIDs in private
* context
*/
oldcontext = MemoryContextSwitchTo(private_context);
indexIds = lappend_oid(indexIds, cellOid);
MemoryContextSwitchTo(oldcontext);
}
index_close(indexRelation, NoLock);
}
table_close(toastRelation, NoLock);
}
table_close(heapRelation, NoLock);
break;
}
case RELKIND_INDEX:
{
Oid heapId = IndexGetRelation(relationOid, false);
/* A shared relation cannot be reindexed concurrently */
if (IsSharedRelation(heapId))
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("concurrent reindex is not supported for shared relations")));
/* A system catalog cannot be reindexed concurrently */
if (IsSystemNamespace(get_rel_namespace(heapId)))
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("concurrent reindex is not supported for catalog relations")));
/* Save the list of relation OIDs in private context */
oldcontext = MemoryContextSwitchTo(private_context);
/* Track the heap relation of this index for session locks */
heapRelationIds = list_make1_oid(heapId);
/*
* Save the list of relation OIDs in private context. Note
* that invalid indexes are allowed here.
*/
indexIds = lappend_oid(indexIds, relationOid);
MemoryContextSwitchTo(oldcontext);
break;
}
case RELKIND_PARTITIONED_TABLE:
/* see reindex_relation() */
ereport(WARNING,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("REINDEX of partitioned tables is not yet implemented, skipping \"%s\"",
get_rel_name(relationOid))));
return false;
default:
/* Return error if type of relation is not supported */
ereport(ERROR,
(errcode(ERRCODE_WRONG_OBJECT_TYPE),
errmsg("cannot reindex concurrently this type of relation")));
break;
}
/* Definitely no indexes, so leave */
if (indexIds == NIL)
{
PopActiveSnapshot();
return false;
}
Assert(heapRelationIds != NIL);
/*-----
* Now we have all the indexes we want to process in indexIds.
*
* The phases now are:
*
* 1. create new indexes in the catalog
* 2. build new indexes
* 3. let new indexes catch up with tuples inserted in the meantime
* 4. swap index names
* 5. mark old indexes as dead
* 6. drop old indexes
*
* We process each phase for all indexes before moving to the next phase,
* for efficiency.
*/
/*
* Phase 1 of REINDEX CONCURRENTLY
*
* Create a new index with the same properties as the old one, but it is
* only registered in catalogs and will be built later. Then get session
* locks on all involved tables. See analogous code in DefineIndex() for
* more detailed comments.
*/
foreach(lc, indexIds)
{
char *concurrentName;
Oid indexId = lfirst_oid(lc);
Oid newIndexId;
Relation indexRel;
Relation heapRel;
Relation newIndexRel;
LockRelId *lockrelid;
indexRel = index_open(indexId, ShareUpdateExclusiveLock);
heapRel = table_open(indexRel->rd_index->indrelid,
ShareUpdateExclusiveLock);
pgstat_progress_start_command(PROGRESS_COMMAND_CREATE_INDEX,
RelationGetRelid(heapRel));
pgstat_progress_update_param(PROGRESS_CREATEIDX_INDEX_OID,
indexId);
pgstat_progress_update_param(PROGRESS_CREATEIDX_ACCESS_METHOD_OID,
indexRel->rd_rel->relam);
/* Choose a temporary relation name for the new index */
concurrentName = ChooseRelationName(get_rel_name(indexId),
NULL,
"ccnew",
get_rel_namespace(indexRel->rd_index->indrelid),
false);
/* Create new index definition based on given index */
newIndexId = index_concurrently_create_copy(heapRel,
indexId,
concurrentName);
/* Now open the relation of the new index, a lock is also needed on it */
newIndexRel = index_open(indexId, ShareUpdateExclusiveLock);
/*
* Save the list of OIDs and locks in private context
*/
oldcontext = MemoryContextSwitchTo(private_context);
newIndexIds = lappend_oid(newIndexIds, newIndexId);
/*
* Save lockrelid to protect each relation from drop then close
* relations. The lockrelid on parent relation is not taken here to
* avoid multiple locks taken on the same relation, instead we rely on
* parentRelationIds built earlier.
*/
lockrelid = palloc(sizeof(*lockrelid));
*lockrelid = indexRel->rd_lockInfo.lockRelId;
relationLocks = lappend(relationLocks, lockrelid);
lockrelid = palloc(sizeof(*lockrelid));
*lockrelid = newIndexRel->rd_lockInfo.lockRelId;
relationLocks = lappend(relationLocks, lockrelid);
MemoryContextSwitchTo(oldcontext);
index_close(indexRel, NoLock);
index_close(newIndexRel, NoLock);
table_close(heapRel, NoLock);
}
/*
* Save the heap lock for following visibility checks with other backends
* might conflict with this session.
*/
foreach(lc, heapRelationIds)
{
Relation heapRelation = table_open(lfirst_oid(lc), ShareUpdateExclusiveLock);
LockRelId *lockrelid;
LOCKTAG *heaplocktag;
/* Save the list of locks in private context */
oldcontext = MemoryContextSwitchTo(private_context);
/* Add lockrelid of heap relation to the list of locked relations */
lockrelid = palloc(sizeof(*lockrelid));
*lockrelid = heapRelation->rd_lockInfo.lockRelId;
relationLocks = lappend(relationLocks, lockrelid);
heaplocktag = (LOCKTAG *) palloc(sizeof(LOCKTAG));
/* Save the LOCKTAG for this parent relation for the wait phase */
SET_LOCKTAG_RELATION(*heaplocktag, lockrelid->dbId, lockrelid->relId);
lockTags = lappend(lockTags, heaplocktag);
MemoryContextSwitchTo(oldcontext);
/* Close heap relation */
table_close(heapRelation, NoLock);
}
/* Get a session-level lock on each table. */
foreach(lc, relationLocks)
{
LockRelId *lockrelid = (LockRelId *) lfirst(lc);
LockRelationIdForSession(lockrelid, ShareUpdateExclusiveLock);
}
PopActiveSnapshot();
CommitTransactionCommand();
StartTransactionCommand();
/*
* Phase 2 of REINDEX CONCURRENTLY
*
* Build the new indexes in a separate transaction for each index to avoid
* having open transactions for an unnecessary long time. But before
* doing that, wait until no running transactions could have the table of
* the index open with the old list of indexes. See "phase 2" in
* DefineIndex() for more details.
*/
pgstat_progress_update_param(PROGRESS_CREATEIDX_PHASE,
PROGRESS_CREATEIDX_PHASE_WAIT_1);
WaitForLockersMultiple(lockTags, ShareLock, true);
CommitTransactionCommand();
forboth(lc, indexIds, lc2, newIndexIds)
{
Relation indexRel;
Oid oldIndexId = lfirst_oid(lc);
Oid newIndexId = lfirst_oid(lc2);
Oid heapId;
CHECK_FOR_INTERRUPTS();
/* Start new transaction for this index's concurrent build */
StartTransactionCommand();
/* Set ActiveSnapshot since functions in the indexes may need it */
PushActiveSnapshot(GetTransactionSnapshot());
/*
* Index relation has been closed by previous commit, so reopen it to
* get its information.
*/
indexRel = index_open(oldIndexId, ShareUpdateExclusiveLock);
heapId = indexRel->rd_index->indrelid;
index_close(indexRel, NoLock);
/* Perform concurrent build of new index */
index_concurrently_build(heapId, newIndexId);
PopActiveSnapshot();
CommitTransactionCommand();
}
StartTransactionCommand();
/*
* Phase 3 of REINDEX CONCURRENTLY
*
* During this phase the old indexes catch up with any new tuples that
* were created during the previous phase. See "phase 3" in DefineIndex()
* for more details.
*/
pgstat_progress_update_param(PROGRESS_CREATEIDX_PHASE,
PROGRESS_CREATEIDX_PHASE_WAIT_2);
WaitForLockersMultiple(lockTags, ShareLock, true);
CommitTransactionCommand();
foreach(lc, newIndexIds)
{
Oid newIndexId = lfirst_oid(lc);
Oid heapId;
TransactionId limitXmin;
Snapshot snapshot;
CHECK_FOR_INTERRUPTS();
StartTransactionCommand();
heapId = IndexGetRelation(newIndexId, false);
/*
* Take the "reference snapshot" that will be used by validate_index()
* to filter candidate tuples.
*/
snapshot = RegisterSnapshot(GetTransactionSnapshot());
PushActiveSnapshot(snapshot);
validate_index(heapId, newIndexId, snapshot);
/*
* We can now do away with our active snapshot, we still need to save
* the xmin limit to wait for older snapshots.
*/
limitXmin = snapshot->xmin;
PopActiveSnapshot();
UnregisterSnapshot(snapshot);
/*
* To ensure no deadlocks, we must commit and start yet another
* transaction, and do our wait before any snapshot has been taken in
* it.
*/
CommitTransactionCommand();
StartTransactionCommand();
/*
* The index is now valid in the sense that it contains all currently
* interesting tuples. But since it might not contain tuples deleted just
* before the reference snap was taken, we have to wait out any
* transactions that might have older snapshots.
*/
pgstat_progress_update_param(PROGRESS_CREATEIDX_PHASE,
PROGRESS_CREATEIDX_PHASE_WAIT_3);
WaitForOlderSnapshots(limitXmin, true);
CommitTransactionCommand();
2000-02-18 10:30:20 +01:00
}
/*
* Phase 4 of REINDEX CONCURRENTLY
*
* Now that the new indexes have been validated, swap each new index with
* its corresponding old index.
*
* We mark the new indexes as valid and the old indexes as not valid at
* the same time to make sure we only get constraint violations from the
* indexes with the correct names.
*/
StartTransactionCommand();
forboth(lc, indexIds, lc2, newIndexIds)
{
char *oldName;
Oid oldIndexId = lfirst_oid(lc);
Oid newIndexId = lfirst_oid(lc2);
Oid heapId;
CHECK_FOR_INTERRUPTS();
heapId = IndexGetRelation(oldIndexId, false);
/* Choose a relation name for old index */
oldName = ChooseRelationName(get_rel_name(oldIndexId),
NULL,
"ccold",
get_rel_namespace(heapId),
false);
/*
* Swap old index with the new one. This also marks the new one as
* valid and the old one as not valid.
*/
index_concurrently_swap(newIndexId, oldIndexId, oldName);
/*
* Invalidate the relcache for the table, so that after this commit
* all sessions will refresh any cached plans that might reference the
* index.
*/
CacheInvalidateRelcacheByRelid(heapId);
/*
* CCI here so that subsequent iterations see the oldName in the
* catalog and can choose a nonconflicting name for their oldName.
* Otherwise, this could lead to conflicts if a table has two indexes
* whose names are equal for the first NAMEDATALEN-minus-a-few
* characters.
*/
CommandCounterIncrement();
}
/* Commit this transaction and make index swaps visible */
CommitTransactionCommand();
StartTransactionCommand();
/*
* Phase 5 of REINDEX CONCURRENTLY
*
* Mark the old indexes as dead. First we must wait until no running
* transaction could be using the index for a query. See also
* index_drop() for more details.
*/
pgstat_progress_update_param(PROGRESS_CREATEIDX_PHASE,
PROGRESS_CREATEIDX_PHASE_WAIT_4);
WaitForLockersMultiple(lockTags, AccessExclusiveLock, true);
foreach(lc, indexIds)
{
Oid oldIndexId = lfirst_oid(lc);
Oid heapId;
CHECK_FOR_INTERRUPTS();
heapId = IndexGetRelation(oldIndexId, false);
index_concurrently_set_dead(heapId, oldIndexId);
}
/* Commit this transaction to make the updates visible. */
CommitTransactionCommand();
StartTransactionCommand();
/*
* Phase 6 of REINDEX CONCURRENTLY
*
* Drop the old indexes.
*/
pgstat_progress_update_param(PROGRESS_CREATEIDX_PHASE,
PROGRESS_CREATEIDX_PHASE_WAIT_4);
WaitForLockersMultiple(lockTags, AccessExclusiveLock, true);
PushActiveSnapshot(GetTransactionSnapshot());
{
ObjectAddresses *objects = new_object_addresses();
foreach(lc, indexIds)
{
Oid oldIndexId = lfirst_oid(lc);
ObjectAddress object;
object.classId = RelationRelationId;
object.objectId = oldIndexId;
object.objectSubId = 0;
add_exact_object_address(&object, objects);
}
/*
* Use PERFORM_DELETION_CONCURRENT_LOCK so that index_drop() uses the
* right lock level.
*/
performMultipleDeletions(objects, DROP_RESTRICT,
PERFORM_DELETION_CONCURRENT_LOCK | PERFORM_DELETION_INTERNAL);
}
PopActiveSnapshot();
CommitTransactionCommand();
/*
* Finally, release the session-level lock on the table.
*/
foreach(lc, relationLocks)
{
LockRelId *lockrelid = (LockRelId *) lfirst(lc);
UnlockRelationIdForSession(lockrelid, ShareUpdateExclusiveLock);
}
/* Start a new transaction to finish process properly */
StartTransactionCommand();
/* Log what we did */
if (options & REINDEXOPT_VERBOSE)
{
if (relkind == RELKIND_INDEX)
ereport(INFO,
(errmsg("index \"%s.%s\" was reindexed",
relationNamespace, relationName),
errdetail("%s.",
pg_rusage_show(&ru0))));
else
{
foreach(lc, newIndexIds)
{
Oid indOid = lfirst_oid(lc);
ereport(INFO,
(errmsg("index \"%s.%s\" was reindexed",
get_namespace_name(get_rel_namespace(indOid)),
get_rel_name(indOid))));
/* Don't show rusage here, since it's not per index. */
}
ereport(INFO,
(errmsg("table \"%s.%s\" was reindexed",
relationNamespace, relationName),
errdetail("%s.",
pg_rusage_show(&ru0))));
}
}
MemoryContextDelete(private_context);
pgstat_progress_end_command();
return true;
2000-02-18 10:30:20 +01:00
}
/*
* ReindexPartitionedIndex
* Reindex each child of the given partitioned index.
*
* Not yet implemented.
*/
static void
ReindexPartitionedIndex(Relation parentIdx)
{
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("REINDEX is not yet implemented for partitioned indexes")));
}
/*
* Insert or delete an appropriate pg_inherits tuple to make the given index
* be a partition of the indicated parent index.
*
* This also corrects the pg_depend information for the affected index.
*/
void
IndexSetParentIndex(Relation partitionIdx, Oid parentOid)
{
Relation pg_inherits;
ScanKeyData key[2];
SysScanDesc scan;
Oid partRelid = RelationGetRelid(partitionIdx);
HeapTuple tuple;
bool fix_dependencies;
/* Make sure this is an index */
Assert(partitionIdx->rd_rel->relkind == RELKIND_INDEX ||
partitionIdx->rd_rel->relkind == RELKIND_PARTITIONED_INDEX);
/*
* Scan pg_inherits for rows linking our index to some parent.
*/
pg_inherits = relation_open(InheritsRelationId, RowExclusiveLock);
ScanKeyInit(&key[0],
Anum_pg_inherits_inhrelid,
BTEqualStrategyNumber, F_OIDEQ,
ObjectIdGetDatum(partRelid));
ScanKeyInit(&key[1],
Anum_pg_inherits_inhseqno,
BTEqualStrategyNumber, F_INT4EQ,
Int32GetDatum(1));
scan = systable_beginscan(pg_inherits, InheritsRelidSeqnoIndexId, true,
NULL, 2, key);
tuple = systable_getnext(scan);
if (!HeapTupleIsValid(tuple))
{
if (parentOid == InvalidOid)
{
/*
* No pg_inherits row, and no parent wanted: nothing to do in this
* case.
*/
fix_dependencies = false;
}
else
{
Datum values[Natts_pg_inherits];
bool isnull[Natts_pg_inherits];
/*
* No pg_inherits row exists, and we want a parent for this index,
* so insert it.
*/
values[Anum_pg_inherits_inhrelid - 1] = ObjectIdGetDatum(partRelid);
values[Anum_pg_inherits_inhparent - 1] =
ObjectIdGetDatum(parentOid);
values[Anum_pg_inherits_inhseqno - 1] = Int32GetDatum(1);
memset(isnull, false, sizeof(isnull));
tuple = heap_form_tuple(RelationGetDescr(pg_inherits),
values, isnull);
CatalogTupleInsert(pg_inherits, tuple);
fix_dependencies = true;
}
}
else
{
Form_pg_inherits inhForm = (Form_pg_inherits) GETSTRUCT(tuple);
if (parentOid == InvalidOid)
{
/*
* There exists a pg_inherits row, which we want to clear; do so.
*/
CatalogTupleDelete(pg_inherits, &tuple->t_self);
fix_dependencies = true;
}
else
{
/*
* A pg_inherits row exists. If it's the same we want, then we're
* good; if it differs, that amounts to a corrupt catalog and
* should not happen.
*/
if (inhForm->inhparent != parentOid)
{
/* unexpected: we should not get called in this case */
elog(ERROR, "bogus pg_inherit row: inhrelid %u inhparent %u",
inhForm->inhrelid, inhForm->inhparent);
}
/* already in the right state */
fix_dependencies = false;
}
}
/* done with pg_inherits */
systable_endscan(scan);
relation_close(pg_inherits, RowExclusiveLock);
/* set relhassubclass if an index partition has been added to the parent */
if (OidIsValid(parentOid))
SetRelationHasSubclass(parentOid, true);
/* set relispartition correctly on the partition */
update_relispartition(partRelid, OidIsValid(parentOid));
if (fix_dependencies)
{
/*
Redesign the partition dependency mechanism. The original setup for dependencies of partitioned objects had serious problems: 1. It did not verify that a drop cascading to a partition-child object also cascaded to at least one of the object's partition parents. Now, normally a child object would share all its dependencies with one or another parent (e.g. a child index's opclass dependencies would be shared with the parent index), so that this oversight is usually harmless. But if some dependency failed to fit this pattern, the child could be dropped while all its parents remain, creating a logically broken situation. (It's easy to construct artificial cases that break it, such as attaching an unrelated extension dependency to the child object and then dropping the extension. I'm not sure if any less-artificial cases exist.) 2. Management of partition dependencies during ATTACH/DETACH PARTITION was complicated and buggy; for example, after detaching a partition table it was possible to create cases where a formerly-child index should be dropped and was not, because the correct set of dependencies had not been reconstructed. Less seriously, because multiple partition relationships were represented identically in pg_depend, there was an order-of-traversal dependency on which partition parent was cited in error messages. We also had some pre-existing order-of-traversal hazards for error messages related to internal and extension dependencies. This is cosmetic to users but causes testing problems. To fix #1, add a check at the end of the partition tree traversal to ensure that at least one partition parent got deleted. To fix #2, establish a new policy that partition dependencies are in addition to, not instead of, a child object's usual dependencies; in this way ATTACH/DETACH PARTITION need not cope with adding or removing the usual dependencies. To fix the cosmetic problem, distinguish between primary and secondary partition dependency entries in pg_depend, by giving them different deptypes. (They behave identically except for having different priorities for being cited in error messages.) This means that the former 'I' dependency type is replaced with new 'P' and 'S' types. This also fixes a longstanding bug that after handling an internal dependency by recursing to the owning object, findDependentObjects did not verify that the current target was now scheduled for deletion, and did not apply the current recursion level's objflags to it. Perhaps that should be back-patched; but in the back branches it would only matter if some concurrent transaction had removed the internal-linkage pg_depend entry before the recursive call found it, or the recursive call somehow failed to find it, both of which seem unlikely. Catversion bump because the contents of pg_depend change for partitioning relationships. Patch HEAD only. It's annoying that we're not fixing #2 in v11, but there seems no practical way to do so given that the problem is exactly a poor choice of what entries to put in pg_depend. We can't really fix that while staying compatible with what's in pg_depend in existing v11 installations. Discussion: https://postgr.es/m/CAH2-Wzkypv1R+teZrr71U23J578NnTBt2X8+Y=Odr4pOdW1rXg@mail.gmail.com
2019-02-11 20:41:13 +01:00
* Insert/delete pg_depend rows. If setting a parent, add PARTITION
* dependencies on the parent index and the table; if removing a
* parent, delete PARTITION dependencies.
*/
if (OidIsValid(parentOid))
{
Redesign the partition dependency mechanism. The original setup for dependencies of partitioned objects had serious problems: 1. It did not verify that a drop cascading to a partition-child object also cascaded to at least one of the object's partition parents. Now, normally a child object would share all its dependencies with one or another parent (e.g. a child index's opclass dependencies would be shared with the parent index), so that this oversight is usually harmless. But if some dependency failed to fit this pattern, the child could be dropped while all its parents remain, creating a logically broken situation. (It's easy to construct artificial cases that break it, such as attaching an unrelated extension dependency to the child object and then dropping the extension. I'm not sure if any less-artificial cases exist.) 2. Management of partition dependencies during ATTACH/DETACH PARTITION was complicated and buggy; for example, after detaching a partition table it was possible to create cases where a formerly-child index should be dropped and was not, because the correct set of dependencies had not been reconstructed. Less seriously, because multiple partition relationships were represented identically in pg_depend, there was an order-of-traversal dependency on which partition parent was cited in error messages. We also had some pre-existing order-of-traversal hazards for error messages related to internal and extension dependencies. This is cosmetic to users but causes testing problems. To fix #1, add a check at the end of the partition tree traversal to ensure that at least one partition parent got deleted. To fix #2, establish a new policy that partition dependencies are in addition to, not instead of, a child object's usual dependencies; in this way ATTACH/DETACH PARTITION need not cope with adding or removing the usual dependencies. To fix the cosmetic problem, distinguish between primary and secondary partition dependency entries in pg_depend, by giving them different deptypes. (They behave identically except for having different priorities for being cited in error messages.) This means that the former 'I' dependency type is replaced with new 'P' and 'S' types. This also fixes a longstanding bug that after handling an internal dependency by recursing to the owning object, findDependentObjects did not verify that the current target was now scheduled for deletion, and did not apply the current recursion level's objflags to it. Perhaps that should be back-patched; but in the back branches it would only matter if some concurrent transaction had removed the internal-linkage pg_depend entry before the recursive call found it, or the recursive call somehow failed to find it, both of which seem unlikely. Catversion bump because the contents of pg_depend change for partitioning relationships. Patch HEAD only. It's annoying that we're not fixing #2 in v11, but there seems no practical way to do so given that the problem is exactly a poor choice of what entries to put in pg_depend. We can't really fix that while staying compatible with what's in pg_depend in existing v11 installations. Discussion: https://postgr.es/m/CAH2-Wzkypv1R+teZrr71U23J578NnTBt2X8+Y=Odr4pOdW1rXg@mail.gmail.com
2019-02-11 20:41:13 +01:00
ObjectAddress partIdx;
ObjectAddress parentIdx;
Redesign the partition dependency mechanism. The original setup for dependencies of partitioned objects had serious problems: 1. It did not verify that a drop cascading to a partition-child object also cascaded to at least one of the object's partition parents. Now, normally a child object would share all its dependencies with one or another parent (e.g. a child index's opclass dependencies would be shared with the parent index), so that this oversight is usually harmless. But if some dependency failed to fit this pattern, the child could be dropped while all its parents remain, creating a logically broken situation. (It's easy to construct artificial cases that break it, such as attaching an unrelated extension dependency to the child object and then dropping the extension. I'm not sure if any less-artificial cases exist.) 2. Management of partition dependencies during ATTACH/DETACH PARTITION was complicated and buggy; for example, after detaching a partition table it was possible to create cases where a formerly-child index should be dropped and was not, because the correct set of dependencies had not been reconstructed. Less seriously, because multiple partition relationships were represented identically in pg_depend, there was an order-of-traversal dependency on which partition parent was cited in error messages. We also had some pre-existing order-of-traversal hazards for error messages related to internal and extension dependencies. This is cosmetic to users but causes testing problems. To fix #1, add a check at the end of the partition tree traversal to ensure that at least one partition parent got deleted. To fix #2, establish a new policy that partition dependencies are in addition to, not instead of, a child object's usual dependencies; in this way ATTACH/DETACH PARTITION need not cope with adding or removing the usual dependencies. To fix the cosmetic problem, distinguish between primary and secondary partition dependency entries in pg_depend, by giving them different deptypes. (They behave identically except for having different priorities for being cited in error messages.) This means that the former 'I' dependency type is replaced with new 'P' and 'S' types. This also fixes a longstanding bug that after handling an internal dependency by recursing to the owning object, findDependentObjects did not verify that the current target was now scheduled for deletion, and did not apply the current recursion level's objflags to it. Perhaps that should be back-patched; but in the back branches it would only matter if some concurrent transaction had removed the internal-linkage pg_depend entry before the recursive call found it, or the recursive call somehow failed to find it, both of which seem unlikely. Catversion bump because the contents of pg_depend change for partitioning relationships. Patch HEAD only. It's annoying that we're not fixing #2 in v11, but there seems no practical way to do so given that the problem is exactly a poor choice of what entries to put in pg_depend. We can't really fix that while staying compatible with what's in pg_depend in existing v11 installations. Discussion: https://postgr.es/m/CAH2-Wzkypv1R+teZrr71U23J578NnTBt2X8+Y=Odr4pOdW1rXg@mail.gmail.com
2019-02-11 20:41:13 +01:00
ObjectAddress partitionTbl;
Redesign the partition dependency mechanism. The original setup for dependencies of partitioned objects had serious problems: 1. It did not verify that a drop cascading to a partition-child object also cascaded to at least one of the object's partition parents. Now, normally a child object would share all its dependencies with one or another parent (e.g. a child index's opclass dependencies would be shared with the parent index), so that this oversight is usually harmless. But if some dependency failed to fit this pattern, the child could be dropped while all its parents remain, creating a logically broken situation. (It's easy to construct artificial cases that break it, such as attaching an unrelated extension dependency to the child object and then dropping the extension. I'm not sure if any less-artificial cases exist.) 2. Management of partition dependencies during ATTACH/DETACH PARTITION was complicated and buggy; for example, after detaching a partition table it was possible to create cases where a formerly-child index should be dropped and was not, because the correct set of dependencies had not been reconstructed. Less seriously, because multiple partition relationships were represented identically in pg_depend, there was an order-of-traversal dependency on which partition parent was cited in error messages. We also had some pre-existing order-of-traversal hazards for error messages related to internal and extension dependencies. This is cosmetic to users but causes testing problems. To fix #1, add a check at the end of the partition tree traversal to ensure that at least one partition parent got deleted. To fix #2, establish a new policy that partition dependencies are in addition to, not instead of, a child object's usual dependencies; in this way ATTACH/DETACH PARTITION need not cope with adding or removing the usual dependencies. To fix the cosmetic problem, distinguish between primary and secondary partition dependency entries in pg_depend, by giving them different deptypes. (They behave identically except for having different priorities for being cited in error messages.) This means that the former 'I' dependency type is replaced with new 'P' and 'S' types. This also fixes a longstanding bug that after handling an internal dependency by recursing to the owning object, findDependentObjects did not verify that the current target was now scheduled for deletion, and did not apply the current recursion level's objflags to it. Perhaps that should be back-patched; but in the back branches it would only matter if some concurrent transaction had removed the internal-linkage pg_depend entry before the recursive call found it, or the recursive call somehow failed to find it, both of which seem unlikely. Catversion bump because the contents of pg_depend change for partitioning relationships. Patch HEAD only. It's annoying that we're not fixing #2 in v11, but there seems no practical way to do so given that the problem is exactly a poor choice of what entries to put in pg_depend. We can't really fix that while staying compatible with what's in pg_depend in existing v11 installations. Discussion: https://postgr.es/m/CAH2-Wzkypv1R+teZrr71U23J578NnTBt2X8+Y=Odr4pOdW1rXg@mail.gmail.com
2019-02-11 20:41:13 +01:00
ObjectAddressSet(partIdx, RelationRelationId, partRelid);
ObjectAddressSet(parentIdx, RelationRelationId, parentOid);
Redesign the partition dependency mechanism. The original setup for dependencies of partitioned objects had serious problems: 1. It did not verify that a drop cascading to a partition-child object also cascaded to at least one of the object's partition parents. Now, normally a child object would share all its dependencies with one or another parent (e.g. a child index's opclass dependencies would be shared with the parent index), so that this oversight is usually harmless. But if some dependency failed to fit this pattern, the child could be dropped while all its parents remain, creating a logically broken situation. (It's easy to construct artificial cases that break it, such as attaching an unrelated extension dependency to the child object and then dropping the extension. I'm not sure if any less-artificial cases exist.) 2. Management of partition dependencies during ATTACH/DETACH PARTITION was complicated and buggy; for example, after detaching a partition table it was possible to create cases where a formerly-child index should be dropped and was not, because the correct set of dependencies had not been reconstructed. Less seriously, because multiple partition relationships were represented identically in pg_depend, there was an order-of-traversal dependency on which partition parent was cited in error messages. We also had some pre-existing order-of-traversal hazards for error messages related to internal and extension dependencies. This is cosmetic to users but causes testing problems. To fix #1, add a check at the end of the partition tree traversal to ensure that at least one partition parent got deleted. To fix #2, establish a new policy that partition dependencies are in addition to, not instead of, a child object's usual dependencies; in this way ATTACH/DETACH PARTITION need not cope with adding or removing the usual dependencies. To fix the cosmetic problem, distinguish between primary and secondary partition dependency entries in pg_depend, by giving them different deptypes. (They behave identically except for having different priorities for being cited in error messages.) This means that the former 'I' dependency type is replaced with new 'P' and 'S' types. This also fixes a longstanding bug that after handling an internal dependency by recursing to the owning object, findDependentObjects did not verify that the current target was now scheduled for deletion, and did not apply the current recursion level's objflags to it. Perhaps that should be back-patched; but in the back branches it would only matter if some concurrent transaction had removed the internal-linkage pg_depend entry before the recursive call found it, or the recursive call somehow failed to find it, both of which seem unlikely. Catversion bump because the contents of pg_depend change for partitioning relationships. Patch HEAD only. It's annoying that we're not fixing #2 in v11, but there seems no practical way to do so given that the problem is exactly a poor choice of what entries to put in pg_depend. We can't really fix that while staying compatible with what's in pg_depend in existing v11 installations. Discussion: https://postgr.es/m/CAH2-Wzkypv1R+teZrr71U23J578NnTBt2X8+Y=Odr4pOdW1rXg@mail.gmail.com
2019-02-11 20:41:13 +01:00
ObjectAddressSet(partitionTbl, RelationRelationId,
partitionIdx->rd_index->indrelid);
recordDependencyOn(&partIdx, &parentIdx,
DEPENDENCY_PARTITION_PRI);
recordDependencyOn(&partIdx, &partitionTbl,
DEPENDENCY_PARTITION_SEC);
}
else
{
deleteDependencyRecordsForClass(RelationRelationId, partRelid,
RelationRelationId,
Redesign the partition dependency mechanism. The original setup for dependencies of partitioned objects had serious problems: 1. It did not verify that a drop cascading to a partition-child object also cascaded to at least one of the object's partition parents. Now, normally a child object would share all its dependencies with one or another parent (e.g. a child index's opclass dependencies would be shared with the parent index), so that this oversight is usually harmless. But if some dependency failed to fit this pattern, the child could be dropped while all its parents remain, creating a logically broken situation. (It's easy to construct artificial cases that break it, such as attaching an unrelated extension dependency to the child object and then dropping the extension. I'm not sure if any less-artificial cases exist.) 2. Management of partition dependencies during ATTACH/DETACH PARTITION was complicated and buggy; for example, after detaching a partition table it was possible to create cases where a formerly-child index should be dropped and was not, because the correct set of dependencies had not been reconstructed. Less seriously, because multiple partition relationships were represented identically in pg_depend, there was an order-of-traversal dependency on which partition parent was cited in error messages. We also had some pre-existing order-of-traversal hazards for error messages related to internal and extension dependencies. This is cosmetic to users but causes testing problems. To fix #1, add a check at the end of the partition tree traversal to ensure that at least one partition parent got deleted. To fix #2, establish a new policy that partition dependencies are in addition to, not instead of, a child object's usual dependencies; in this way ATTACH/DETACH PARTITION need not cope with adding or removing the usual dependencies. To fix the cosmetic problem, distinguish between primary and secondary partition dependency entries in pg_depend, by giving them different deptypes. (They behave identically except for having different priorities for being cited in error messages.) This means that the former 'I' dependency type is replaced with new 'P' and 'S' types. This also fixes a longstanding bug that after handling an internal dependency by recursing to the owning object, findDependentObjects did not verify that the current target was now scheduled for deletion, and did not apply the current recursion level's objflags to it. Perhaps that should be back-patched; but in the back branches it would only matter if some concurrent transaction had removed the internal-linkage pg_depend entry before the recursive call found it, or the recursive call somehow failed to find it, both of which seem unlikely. Catversion bump because the contents of pg_depend change for partitioning relationships. Patch HEAD only. It's annoying that we're not fixing #2 in v11, but there seems no practical way to do so given that the problem is exactly a poor choice of what entries to put in pg_depend. We can't really fix that while staying compatible with what's in pg_depend in existing v11 installations. Discussion: https://postgr.es/m/CAH2-Wzkypv1R+teZrr71U23J578NnTBt2X8+Y=Odr4pOdW1rXg@mail.gmail.com
2019-02-11 20:41:13 +01:00
DEPENDENCY_PARTITION_PRI);
deleteDependencyRecordsForClass(RelationRelationId, partRelid,
RelationRelationId,
DEPENDENCY_PARTITION_SEC);
}
/* make our updates visible */
CommandCounterIncrement();
}
}
/*
* Subroutine of IndexSetParentIndex to update the relispartition flag of the
* given index to the given value.
*/
static void
update_relispartition(Oid relationId, bool newval)
{
HeapTuple tup;
Relation classRel;
classRel = table_open(RelationRelationId, RowExclusiveLock);
tup = SearchSysCacheCopy1(RELOID, ObjectIdGetDatum(relationId));
if (!HeapTupleIsValid(tup))
elog(ERROR, "cache lookup failed for relation %u", relationId);
Assert(((Form_pg_class) GETSTRUCT(tup))->relispartition != newval);
((Form_pg_class) GETSTRUCT(tup))->relispartition = newval;
CatalogTupleUpdate(classRel, &tup->t_self, tup);
heap_freetuple(tup);
table_close(classRel, RowExclusiveLock);
}