/*-------------------------------------------------------------------------
*
* rel.h
* POSTGRES relation descriptor (a/k/a relcache entry) definitions.
*
*
* Portions Copyright (c) 1996-2021, PostgreSQL Global Development Group
* Portions Copyright (c) 1994, Regents of the University of California
*
* src/include/utils/rel.h
*
*-------------------------------------------------------------------------
*/
#ifndef REL_H
#define REL_H
#include "access/tupdesc.h"
#include "access/xlog.h"
#include "catalog/pg_class.h"
#include "catalog/pg_index.h"
#include "catalog/pg_publication.h"
#include "nodes/bitmapset.h"
#include "partitioning/partdefs.h"
#include "rewrite/prs2lock.h"
#include "storage/block.h"
#include "storage/relfilenode.h"
#include "utils/relcache.h"
#include "utils/reltrigger.h"
/*
* LockRelId and LockInfo really belong to lmgr.h, but it's more convenient
* to declare them here so we can have a LockInfoData field in a Relation.
*/
typedef struct LockRelId
{
Oid relId; /* a relation identifier */
Oid dbId; /* a database identifier */
} LockRelId;
typedef struct LockInfoData
{
LockRelId lockRelId;
} LockInfoData;
typedef LockInfoData *LockInfo;
/*
* Here are the contents of a relation cache entry.
*/
typedef struct RelationData
{
RelFileNode rd_node; /* relation physical identifier */
/* use "struct" here to avoid needing to include smgr.h: */
struct SMgrRelationData *rd_smgr; /* cached file handle, or NULL */
int rd_refcnt; /* reference count */
BackendId rd_backend; /* owning backend id, if temporary relation */
bool rd_islocaltemp; /* rel is a temp rel of this session */
bool rd_isnailed; /* rel is nailed in cache */
bool rd_isvalid; /* relcache entry is valid */
bool rd_indexvalid; /* is rd_indexlist valid? (also rd_pkindex and
* rd_replidindex) */
bool rd_statvalid; /* is rd_statlist valid? */
/*----------
* rd_createSubid is the ID of the highest subtransaction the rel has
* survived into or zero if the rel or its rd_node was created before the
* current top transaction. (IndexStmt.oldNode leads to the case of a new
* rel with an old rd_node.) rd_firstRelfilenodeSubid is the ID of the
* highest subtransaction an rd_node change has survived into or zero if
* rd_node matches the value it had at the start of the current top
* transaction. (Rolling back the subtransaction that
* rd_firstRelfilenodeSubid denotes would restore rd_node to the value it
* had at the start of the current top transaction. Rolling back any
* lower subtransaction would not.) Their accuracy is critical to
* RelationNeedsWAL().
*
* rd_newRelfilenodeSubid is the ID of the highest subtransaction the
* most-recent relfilenode change has survived into or zero if not changed
* in the current transaction (or we have forgotten changing it). This
* field is accurate when non-zero, but it can be zero when a relation has
* multiple new relfilenodes within a single transaction, with one of them
* occurring in a subsequently aborted subtransaction, e.g.
* BEGIN;
* TRUNCATE t;
* SAVEPOINT save;
* TRUNCATE t;
* ROLLBACK TO save;
* -- rd_newRelfilenodeSubid is now forgotten
*
* If every rd_*Subid field is zero, they are read-only outside
* relcache.c. Files that trigger rd_node changes by updating
* pg_class.reltablespace and/or pg_class.relfilenode call
* RelationAssumeNewRelfilenode() to update rd_*Subid.
*
* rd_droppedSubid is the ID of the highest subtransaction that a drop of
* the rel has survived into. In entries visible outside relcache.c, this
* is always zero.
*/
SubTransactionId rd_createSubid; /* rel was created in current xact */
SubTransactionId rd_newRelfilenodeSubid; /* highest subxact changing
* rd_node to current value */
SubTransactionId rd_firstRelfilenodeSubid; /* highest subxact changing
* rd_node to any value */
SubTransactionId rd_droppedSubid; /* dropped with another Subid set */
Form_pg_class rd_rel; /* RELATION tuple */
TupleDesc rd_att; /* tuple descriptor */
Oid rd_id; /* relation's object id */
LockInfoData rd_lockInfo; /* lock mgr's info for locking relation */
RuleLock *rd_rules; /* rewrite rules */
MemoryContext rd_rulescxt; /* private memory cxt for rd_rules, if any */
TriggerDesc *trigdesc; /* Trigger info, or NULL if rel has none */
/* use "struct" here to avoid needing to include rowsecurity.h: */
struct RowSecurityDesc *rd_rsdesc; /* row security policies, or NULL */
/* data managed by RelationGetFKeyList: */
List *rd_fkeylist; /* list of ForeignKeyCacheInfo (see below) */
bool rd_fkeyvalid; /* true if list has been computed */
/* data managed by RelationGetPartitionKey: */
PartitionKey rd_partkey; /* partition key, or NULL */
MemoryContext rd_partkeycxt; /* private context for rd_partkey, if any */
/* data managed by RelationGetPartitionDesc: */
PartitionDesc rd_partdesc; /* partition descriptor, or NULL */
MemoryContext rd_pdcxt; /* private context for rd_partdesc, if any */
/* Same as above, for partdescs that omit detached partitions */
PartitionDesc rd_partdesc_nodetached; /* partdesc w/o detached parts */
MemoryContext rd_pddcxt; /* for rd_partdesc_nodetached, if any */
/*
* pg_inherits.xmin of the partition that was excluded in
* rd_partdesc_nodetached. This informs a future user of that partdesc:
* if this value is not in progress for the active snapshot, then the
* partdesc can be used, otherwise they have to build a new one. (This
* matches what find_inheritance_children_extended would do).
*/
TransactionId rd_partdesc_nodetached_xmin;
/* data managed by RelationGetPartitionQual: */
List *rd_partcheck; /* partition CHECK quals */
bool rd_partcheckvalid; /* true if list has been computed */
MemoryContext rd_partcheckcxt; /* private cxt for rd_partcheck, if any */
/* data managed by RelationGetIndexList: */
List *rd_indexlist; /* list of OIDs of indexes on relation */
Oid rd_pkindex; /* OID of primary key, if any */
Oid rd_replidindex; /* OID of replica identity index, if any */
/* data managed by RelationGetStatExtList: */
List *rd_statlist; /* list of OIDs of extended stats */
/* data managed by RelationGetIndexAttrBitmap: */
Bitmapset *rd_indexattr; /* identifies columns used in indexes */
Bitmapset *rd_keyattr; /* cols that can be ref'd by foreign keys */
Bitmapset *rd_pkattr; /* cols included in primary key */
Bitmapset *rd_idattr; /* included in replica identity index */
PublicationActions *rd_pubactions; /* publication actions */
/*
* rd_options is set whenever rd_rel is loaded into the relcache entry.
* Note that you can NOT look into rd_rel for this data. NULL means "use
* defaults".
*/
bytea *rd_options; /* parsed pg_class.reloptions */
/*
* Oid of the handler for this relation. For an index this is a function
* returning IndexAmRoutine, for table-like relations a function returning
* TableAmRoutine. This is stored separately from rd_indam, rd_tableam as
* its lookup requires syscache access, but during relcache bootstrap we
* need to be able to initialize rd_tableam without syscache lookups.
*/
Oid rd_amhandler; /* OID of index AM's handler function */
/*
* Table access method.
*/
const struct TableAmRoutine *rd_tableam;
/* These are non-NULL only for an index relation: */
Form_pg_index rd_index; /* pg_index tuple describing this index */
/* use "struct" here to avoid needing to include htup.h: */
struct HeapTupleData *rd_indextuple; /* all of pg_index tuple */
/*
* index access support info (used only for an index relation)
*
* Note: only default support procs for each opclass are cached, namely
* those with lefttype and righttype equal to the opclass's opcintype. The
* arrays are indexed by support function number, which is a sufficient
* identifier given that restriction.
*/
MemoryContext rd_indexcxt; /* private memory cxt for this stuff */
/* use "struct" here to avoid needing to include amapi.h: */
struct IndexAmRoutine *rd_indam; /* index AM's API struct */
Oid *rd_opfamily; /* OIDs of op families for each index col */
Oid *rd_opcintype; /* OIDs of opclass declared input data types */
RegProcedure *rd_support; /* OIDs of support procedures */
struct FmgrInfo *rd_supportinfo; /* lookup info for support procedures */
int16 *rd_indoption; /* per-column AM-specific flags */
List *rd_indexprs; /* index expression trees, if any */
List *rd_indpred; /* index predicate tree, if any */
Oid *rd_exclops; /* OIDs of exclusion operators, if any */
Oid *rd_exclprocs; /* OIDs of exclusion ops' procs, if any */
uint16 *rd_exclstrats; /* exclusion ops' strategy numbers, if any */
Oid *rd_indcollation; /* OIDs of index collations */
bytea **rd_opcoptions; /* parsed opclass-specific options */
/*
* rd_amcache is available for index and table AMs to cache private data
* about the relation. This must be just a cache since it may get reset
* at any time (in particular, it will get reset by a relcache inval
* message for the relation). If used, it must point to a single memory
* chunk palloc'd in CacheMemoryContext, or in rd_indexcxt for an index
* relation. A relcache reset will include freeing that chunk and setting
* rd_amcache = NULL.
*/
void *rd_amcache; /* available for use by index/table AM */
/*
* foreign-table support
*
* rd_fdwroutine must point to a single memory chunk palloc'd in
* CacheMemoryContext. It will be freed and reset to NULL on a relcache
* reset.
*/
/* use "struct" here to avoid needing to include fdwapi.h: */
struct FdwRoutine *rd_fdwroutine; /* cached function pointers, or NULL */
/*
* Hack for CLUSTER, rewriting ALTER TABLE, etc: when writing a new
* version of a table, we need to make any toast pointers inserted into it
* have the existing toast table's OID, not the OID of the transient toast
* table. If rd_toastoid isn't InvalidOid, it is the OID to place in
* toast pointers inserted into this rel. (Note it's set on the new
* version of the main heap, not the toast table itself.) This also
* causes toast_save_datum() to try to preserve toast value OIDs.
*/
Oid rd_toastoid; /* Real TOAST table's OID, or InvalidOid */
/* use "struct" here to avoid needing to include pgstat.h: */
struct PgStat_TableStatus *pgstat_info; /* statistics collection area */
} RelationData;
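/*
 * A minimal sketch of the rd_amcache discipline described above, assuming a
 * hypothetical AM-private struct; for an index, the cache is allocated in
 * rd_indexcxt so a relcache reset can free it along with the entry.
 */
#if 0							/* illustrative sketch; not compiled */
typedef struct ExampleAmCacheData
{
	uint32		cached_root_level;	/* whatever the AM wants to remember */
} ExampleAmCacheData;

static ExampleAmCacheData *
example_get_amcache(Relation indexrel)
{
	if (indexrel->rd_amcache == NULL)
		indexrel->rd_amcache =
			MemoryContextAllocZero(indexrel->rd_indexcxt,
								   sizeof(ExampleAmCacheData));
	/* a relcache inval may reset rd_amcache to NULL at any time */
	return (ExampleAmCacheData *) indexrel->rd_amcache;
}
#endif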
/*
* ForeignKeyCacheInfo
* Information the relcache can cache about foreign key constraints
*
* This is basically just an image of relevant columns from pg_constraint.
* We make it a subclass of Node so that copyObject() can be used on a list
* of these, but we also ensure it is a "flat" object without substructure,
* so that list_free_deep() is sufficient to free such a list.
* The per-FK-column arrays can be fixed-size because we allow at most
* INDEX_MAX_KEYS columns in a foreign key constraint.
*
* Currently, we mostly cache fields of interest to the planner, but the set
* of fields has already grown to include the constraint OID for other uses.
*/
typedef struct ForeignKeyCacheInfo
{
NodeTag type;
Oid conoid; /* oid of the constraint itself */
Oid conrelid; /* relation constrained by the foreign key */
Oid confrelid; /* relation referenced by the foreign key */
int nkeys; /* number of columns in the foreign key */
/* these arrays each have nkeys valid entries: */
AttrNumber conkey[INDEX_MAX_KEYS]; /* cols in referencing table */
AttrNumber confkey[INDEX_MAX_KEYS]; /* cols in referenced table */
Oid conpfeqop[INDEX_MAX_KEYS]; /* PK = FK operator OIDs */
} ForeignKeyCacheInfo;
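/*
 * A sketch of consuming the cached foreign-key list.  RelationGetFKeyList()
 * (declared in utils/relcache.h) returns a list of the structs above; the
 * list is owned by the relcache entry, so copy it (e.g. with copyObject)
 * before doing anything that could trigger a relcache flush.
 */
#if 0							/* illustrative sketch; not compiled */
static void
example_report_fkeys(Relation rel)
{
	List	   *fkeys = RelationGetFKeyList(rel);
	ListCell   *lc;

	foreach(lc, fkeys)
	{
		ForeignKeyCacheInfo *fk = (ForeignKeyCacheInfo *) lfirst(lc);

		elog(DEBUG1, "constraint %u: relation %u references %u (%d column(s))",
			 fk->conoid, fk->conrelid, fk->confrelid, fk->nkeys);
	}
}
#endif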
/*
* StdRdOptions
* Standard contents of rd_options for heaps.
*
* RelationGetFillFactor() and RelationGetTargetPageFreeSpace() can only
* be applied to relations that use this format or a superset for
* private options data.
*/
/* autovacuum-related reloptions. */
typedef struct AutoVacOpts
{
bool enabled;
int vacuum_threshold;
int vacuum_ins_threshold;
int analyze_threshold;
int vacuum_cost_limit;
int freeze_min_age;
int freeze_max_age;
int freeze_table_age;
int multixact_freeze_min_age;
int multixact_freeze_max_age;
int multixact_freeze_table_age;
int log_min_duration;
float8 vacuum_cost_delay;
float8 vacuum_scale_factor;
float8 vacuum_ins_scale_factor;
float8 analyze_scale_factor;
} AutoVacOpts;
/* StdRdOptions->vacuum_index_cleanup values */
typedef enum StdRdOptIndexCleanup
{
STDRD_OPTION_VACUUM_INDEX_CLEANUP_AUTO = 0,
STDRD_OPTION_VACUUM_INDEX_CLEANUP_OFF,
STDRD_OPTION_VACUUM_INDEX_CLEANUP_ON
} StdRdOptIndexCleanup;
typedef struct StdRdOptions
{
int32 vl_len_; /* varlena header (do not touch directly!) */
int fillfactor; /* page fill factor in percent (0..100) */
int toast_tuple_target; /* target for tuple toasting */
AutoVacOpts autovacuum; /* autovacuum-related options */
bool user_catalog_table; /* use as an additional catalog relation */
int parallel_workers; /* max number of parallel workers */
StdRdOptIndexCleanup vacuum_index_cleanup; /* controls index vacuuming */
bool vacuum_truncate; /* enables vacuum to truncate a relation */
} StdRdOptions;
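/*
 * A sketch of reading per-table reloptions through rd_options, assuming the
 * relation uses the StdRdOptions layout (heaps do).  NULL rd_options means
 * "use defaults"; for the autovacuum options, a negative stored value
 * conventionally means the option was not set for this table.
 */
#if 0							/* illustrative sketch; not compiled */
static int
example_vacuum_threshold(Relation rel, int default_threshold)
{
	StdRdOptions *opts = (StdRdOptions *) rel->rd_options;

	if (opts == NULL || opts->autovacuum.vacuum_threshold < 0)
		return default_threshold;
	return opts->autovacuum.vacuum_threshold;
}
#endif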
#define HEAP_MIN_FILLFACTOR 10
#define HEAP_DEFAULT_FILLFACTOR 100
/*
* RelationGetToastTupleTarget
* Returns the relation's toast_tuple_target. Note multiple eval of argument!
*/
#define RelationGetToastTupleTarget(relation, defaulttarg) \
((relation)->rd_options ? \
((StdRdOptions *) (relation)->rd_options)->toast_tuple_target : (defaulttarg))
/*
* RelationGetFillFactor
* Returns the relation's fillfactor. Note multiple eval of argument!
*/
#define RelationGetFillFactor(relation, defaultff) \
((relation)->rd_options ? \
((StdRdOptions *) (relation)->rd_options)->fillfactor : (defaultff))
/*
* RelationGetTargetPageUsage
* Returns the relation's desired space usage per page in bytes.
*/
#define RelationGetTargetPageUsage(relation, defaultff) \
(BLCKSZ * RelationGetFillFactor(relation, defaultff) / 100)
/*
* RelationGetTargetPageFreeSpace
* Returns the relation's desired freespace per page in bytes.
*/
#define RelationGetTargetPageFreeSpace(relation, defaultff) \
(BLCKSZ * (100 - RelationGetFillFactor(relation, defaultff)) / 100)
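/*
 * A sketch relating the fillfactor macros above; the numbers in the comment
 * assume BLCKSZ = 8192.
 */
#if 0							/* illustrative sketch; not compiled */
static void
example_page_targets(Relation rel)
{
	int			ff = RelationGetFillFactor(rel, HEAP_DEFAULT_FILLFACTOR);
	int			used = RelationGetTargetPageUsage(rel, HEAP_DEFAULT_FILLFACTOR);
	int			freesp = RelationGetTargetPageFreeSpace(rel, HEAP_DEFAULT_FILLFACTOR);

	/* e.g. fillfactor 90 gives used = 7372 and freesp = 819 */
	Assert(used + freesp <= BLCKSZ);
	elog(DEBUG1, "fillfactor %d: target usage %d, target free space %d",
		 ff, used, freesp);
}
#endif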
/*
* RelationIsUsedAsCatalogTable
* Returns whether the relation should be treated as a catalog table
* from the pov of logical decoding. Note multiple eval of argument!
*/
#define RelationIsUsedAsCatalogTable(relation) \
((relation)->rd_options && \
((relation)->rd_rel->relkind == RELKIND_RELATION || \
(relation)->rd_rel->relkind == RELKIND_MATVIEW) ? \
((StdRdOptions *) (relation)->rd_options)->user_catalog_table : false)
/*
* RelationGetParallelWorkers
* Returns the relation's parallel_workers reloption setting.
* Note multiple eval of argument!
*/
#define RelationGetParallelWorkers(relation, defaultpw) \
((relation)->rd_options ? \
((StdRdOptions *) (relation)->rd_options)->parallel_workers : (defaultpw))
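/*
 * The "multiple eval of argument" warnings above mean the relation argument
 * must be a side-effect-free expression.  A sketch of the safe pattern,
 * assuming access/table.h; the function name is purely illustrative.
 */
#if 0							/* illustrative sketch; not compiled */
static int
example_parallel_workers(Oid relid)
{
	Relation	rel = table_open(relid, AccessShareLock);
	int			pw;

	/* OK: "rel" may be evaluated twice, but doing so has no side effects */
	pw = RelationGetParallelWorkers(rel, -1);

	/*
	 * NOT OK: passing table_open(relid, AccessShareLock) directly could
	 * expand to two table_open() calls, leaking a relcache reference.
	 */

	table_close(rel, AccessShareLock);
	return pw;
}
#endif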
/* ViewOptions->check_option values */
typedef enum ViewOptCheckOption
{
VIEW_OPTION_CHECK_OPTION_NOT_SET,
VIEW_OPTION_CHECK_OPTION_LOCAL,
VIEW_OPTION_CHECK_OPTION_CASCADED
} ViewOptCheckOption;
/*
* ViewOptions
* Contents of rd_options for views
*/
typedef struct ViewOptions
{
int32 vl_len_; /* varlena header (do not touch directly!) */
bool security_barrier;
ViewOptCheckOption check_option;
} ViewOptions;
/*
* RelationIsSecurityView
* Returns whether the relation is a security view or not. Note multiple
* eval of argument!
*/
#define RelationIsSecurityView(relation) \
(AssertMacro(relation->rd_rel->relkind == RELKIND_VIEW), \
(relation)->rd_options ? \
((ViewOptions *) (relation)->rd_options)->security_barrier : false)
/*
* RelationHasCheckOption
* Returns true if the relation is a view defined with either the local
* or the cascaded check option. Note multiple eval of argument!
*/
#define RelationHasCheckOption(relation) \
(AssertMacro(relation->rd_rel->relkind == RELKIND_VIEW), \
(relation)->rd_options && \
((ViewOptions *) (relation)->rd_options)->check_option != \
VIEW_OPTION_CHECK_OPTION_NOT_SET)
/*
* RelationHasLocalCheckOption
* Returns true if the relation is a view defined with the local check
* option. Note multiple eval of argument!
*/
#define RelationHasLocalCheckOption(relation) \
(AssertMacro(relation->rd_rel->relkind == RELKIND_VIEW), \
(relation)->rd_options && \
((ViewOptions *) (relation)->rd_options)->check_option == \
VIEW_OPTION_CHECK_OPTION_LOCAL)
/*
* RelationHasCascadedCheckOption
* Returns true if the relation is a view defined with the cascaded check
* option. Note multiple eval of argument!
*/
#define RelationHasCascadedCheckOption(relation) \
(AssertMacro(relation->rd_rel->relkind == RELKIND_VIEW), \
(relation)->rd_options && \
((ViewOptions *) (relation)->rd_options)->check_option == \
VIEW_OPTION_CHECK_OPTION_CASCADED)
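/*
 * A sketch of consulting the view-option macros, assuming the caller has
 * already verified that the relation is a view (the AssertMacro calls above
 * insist on it); the function is illustrative only.
 */
#if 0							/* illustrative sketch; not compiled */
static const char *
example_view_check_option(Relation view)
{
	if (RelationHasLocalCheckOption(view))
		return "local";
	if (RelationHasCascadedCheckOption(view))
		return "cascaded";
	Assert(!RelationHasCheckOption(view));
	return "none";
}
#endif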
/*
* RelationIsValid
* True iff relation descriptor is valid.
*/
#define RelationIsValid(relation) PointerIsValid(relation)
#define InvalidRelation ((Relation) NULL)
/*
* RelationHasReferenceCountZero
* True iff relation reference count is zero.
*
* Note:
* Assumes relation descriptor is valid.
*/
#define RelationHasReferenceCountZero(relation) \
((bool)((relation)->rd_refcnt == 0))
/*
* RelationGetForm
* Returns pg_class tuple for a relation.
*
* Note:
* Assumes relation descriptor is valid.
*/
#define RelationGetForm(relation) ((relation)->rd_rel)
/*
* RelationGetRelid
* Returns the OID of the relation
*/
#define RelationGetRelid(relation) ((relation)->rd_id)
/*
* RelationGetNumberOfAttributes
* Returns the total number of attributes in a relation.
*/
#define RelationGetNumberOfAttributes(relation) ((relation)->rd_rel->relnatts)
/*
* IndexRelationGetNumberOfAttributes
* Returns the number of attributes in an index.
*/
#define IndexRelationGetNumberOfAttributes(relation) \
((relation)->rd_index->indnatts)
/*
* IndexRelationGetNumberOfKeyAttributes
* Returns the number of key attributes in an index.
*/
#define IndexRelationGetNumberOfKeyAttributes(relation) \
((relation)->rd_index->indnkeyatts)
/*
* RelationGetDescr
* Returns tuple descriptor for a relation.
*/
#define RelationGetDescr(relation) ((relation)->rd_att)
/*
* RelationGetRelationName
* Returns the rel's name.
*
* Note that the name is only unique within the containing namespace.
*/
#define RelationGetRelationName(relation) \
(NameStr((relation)->rd_rel->relname))
/*
* RelationGetNamespace
* Returns the rel's namespace OID.
*/
#define RelationGetNamespace(relation) \
((relation)->rd_rel->relnamespace)
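/*
 * A sketch of walking a relation's columns with the accessor macros above;
 * TupleDescAttr comes from access/tupdesc.h, which this file already
 * includes.
 */
#if 0							/* illustrative sketch; not compiled */
static void
example_list_columns(Relation rel)
{
	TupleDesc	tupdesc = RelationGetDescr(rel);

	for (int i = 0; i < RelationGetNumberOfAttributes(rel); i++)
	{
		Form_pg_attribute att = TupleDescAttr(tupdesc, i);

		if (att->attisdropped)
			continue;
		elog(DEBUG1, "%s.%s: type %u",
			 RelationGetRelationName(rel),
			 NameStr(att->attname),
			 att->atttypid);
	}
}
#endif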
/*
* RelationIsMapped
* True if the relation uses the relfilenode map. Note multiple eval
* of argument!
*/
#define RelationIsMapped(relation) \
(RELKIND_HAS_STORAGE((relation)->rd_rel->relkind) && \
((relation)->rd_rel->relfilenode == InvalidOid))
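/*
 * A sketch of resolving a relation's physical file identifier: for mapped
 * relations pg_class.relfilenode is zero and the relation map holds the real
 * value (assumes utils/relmapper.h).  The relcache has already done this
 * resolution into rd_node.relNode, so this is purely illustrative.
 */
#if 0							/* illustrative sketch; not compiled */
static Oid
example_physical_filenode(Relation rel)
{
	if (RelationIsMapped(rel))
		return RelationMapOidToFilenode(RelationGetRelid(rel),
										rel->rd_rel->relisshared);
	return rel->rd_rel->relfilenode;
}
#endif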
/*
* RelationOpenSmgr
* Open the relation at the smgr level, if not already done.
*/
#define RelationOpenSmgr(relation) \
do { \
if ((relation)->rd_smgr == NULL) \
smgrsetowner(&((relation)->rd_smgr), smgropen((relation)->rd_node, (relation)->rd_backend)); \
} while (0)
/*
* RelationCloseSmgr
* Close the relation at the smgr level, if not already done.
*
* Note: smgrclose should unhook from owner pointer, hence the Assert.
*/
#define RelationCloseSmgr(relation) \
do { \
if ((relation)->rd_smgr != NULL) \
{ \
smgrclose((relation)->rd_smgr); \
Assert((relation)->rd_smgr == NULL); \
} \
} while (0)
/*
* RelationGetTargetBlock
* Fetch relation's current insertion target block.
*
* Returns InvalidBlockNumber if there is no current target block. Note
* that the target block status is discarded on any smgr-level invalidation.
*/
#define RelationGetTargetBlock(relation) \
( (relation)->rd_smgr != NULL ? (relation)->rd_smgr->smgr_targblock : InvalidBlockNumber )
/*
* RelationSetTargetBlock
* Set relation's current insertion target block.
*/
#define RelationSetTargetBlock(relation, targblock) \
do { \
RelationOpenSmgr(relation); \
(relation)->rd_smgr->smgr_targblock = (targblock); \
} while (0)
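/*
 * A sketch of the insertion-target-block pattern these macros support,
 * loosely modeled on what heap insertion code does (assumes
 * storage/bufmgr.h for RelationGetNumberOfBlocks).
 */
#if 0							/* illustrative sketch; not compiled */
static BlockNumber
example_pick_insertion_block(Relation rel)
{
	BlockNumber targetBlock = RelationGetTargetBlock(rel);

	if (targetBlock == InvalidBlockNumber)
	{
		/* no cached target: try the last existing block, if there is one */
		BlockNumber nblocks = RelationGetNumberOfBlocks(rel);

		if (nblocks > 0)
			targetBlock = nblocks - 1;
	}

	if (targetBlock != InvalidBlockNumber)
		RelationSetTargetBlock(rel, targetBlock);

	return targetBlock;			/* InvalidBlockNumber: caller must extend */
}
#endif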
/*
* RelationIsPermanent
* True if relation is permanent.
*/
#define RelationIsPermanent(relation) \
((relation)->rd_rel->relpersistence == RELPERSISTENCE_PERMANENT)
/*
* RelationNeedsWAL
* True if relation needs WAL.
*
* Returns false if wal_level = minimal and this relation is created or
* truncated in the current transaction. See "Skipping WAL for New
* RelFileNode" in src/backend/access/transam/README.
*/
#define RelationNeedsWAL(relation) \
(RelationIsPermanent(relation) && (XLogIsNeeded() || \
(relation->rd_createSubid == InvalidSubTransactionId && \
relation->rd_firstRelfilenodeSubid == InvalidSubTransactionId)))
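/*
 * A sketch of the decision RelationNeedsWAL() drives in code that modifies
 * pages, assuming storage/bufmgr.h and storage/bufpage.h;
 * example_log_page_change() is hypothetical and stands in for building and
 * inserting a real WAL record.
 */
#if 0							/* illustrative sketch; not compiled */
static void
example_finish_page_change(Relation rel, Buffer buf)
{
	MarkBufferDirty(buf);

	if (RelationNeedsWAL(rel))
	{
		XLogRecPtr	lsn = example_log_page_change(rel, buf);	/* hypothetical */

		PageSetLSN(BufferGetPage(buf), lsn);
	}

	/*
	 * Otherwise WAL is skipped; durability is handled at commit time as
	 * described in the README cited above.
	 */
}
#endif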
/*
* RelationUsesLocalBuffers
* True if relation's pages are stored in local buffers.
*/
#define RelationUsesLocalBuffers(relation) \
((relation)->rd_rel->relpersistence == RELPERSISTENCE_TEMP)
/*
* RELATION_IS_LOCAL
* If a rel is either temp or newly created in the current transaction,
* it can be assumed to be accessible only to the current backend.
* This is typically used to decide that we can skip acquiring locks.
*
* Beware of multiple eval of argument
*/
#define RELATION_IS_LOCAL(relation) \
((relation)->rd_islocaltemp || \
(relation)->rd_createSubid != InvalidSubTransactionId)
/*
* RELATION_IS_OTHER_TEMP
* Test for a temporary relation that belongs to some other session.
*
* Beware of multiple eval of argument
*/
#define RELATION_IS_OTHER_TEMP(relation) \
((relation)->rd_rel->relpersistence == RELPERSISTENCE_TEMP && \
!(relation)->rd_islocaltemp)
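/*
 * A sketch of the usual guard against touching another session's temporary
 * table; the error text mirrors what several backend call sites report.
 */
#if 0							/* illustrative sketch; not compiled */
static void
example_reject_other_temp(Relation rel)
{
	if (RELATION_IS_OTHER_TEMP(rel))
		ereport(ERROR,
				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
				 errmsg("cannot access temporary tables of other sessions")));
}
#endif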
/*
* RelationIsScannable
* Currently can only be false for a materialized view which has not been
* populated by its query. This is likely to get more complicated later,
* so use a macro which looks like a function.
*/
#define RelationIsScannable(relation) ((relation)->rd_rel->relispopulated)
/*
* RelationIsPopulated
* Currently, we don't physically distinguish the "populated" and
* "scannable" properties of matviews, but that may change later.
* Hence, use the appropriate one of these macros in code tests.
*/
#define RelationIsPopulated(relation) ((relation)->rd_rel->relispopulated)
/*
* RelationIsAccessibleInLogicalDecoding
* True if we need to log enough information to have access via
* a decoding snapshot.
*/
#define RelationIsAccessibleInLogicalDecoding(relation) \
(XLogLogicalInfoActive() && \
RelationNeedsWAL(relation) && \
(IsCatalogRelation(relation) || RelationIsUsedAsCatalogTable(relation)))
/*
* RelationIsLogicallyLogged
* True if we need to log enough information to extract the data from the
* WAL stream.
*
* We don't log information for unlogged tables (since they don't WAL log
* anyway), for foreign tables (since they don't WAL log, either),
* and for system tables (their content is hard to make sense of, and
* it would complicate decoding slightly for little gain). Note that we *do*
* log information for user-defined catalog tables since they presumably are
* interesting to the user...
*/
#define RelationIsLogicallyLogged(relation) \
(XLogLogicalInfoActive() && \
RelationNeedsWAL(relation) && \
(relation)->rd_rel->relkind != RELKIND_FOREIGN_TABLE && \
!IsCatalogRelation(relation))
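/*
 * A sketch that merely reports what the two logical-decoding macros above
 * assert about a relation; it is not how the decoding code itself is
 * structured.
 */
#if 0							/* illustrative sketch; not compiled */
static void
example_describe_decoding(Relation rel)
{
	if (RelationIsAccessibleInLogicalDecoding(rel))
		elog(DEBUG1, "%s: extra information is logged for decoding snapshots",
			 RelationGetRelationName(rel));

	if (RelationIsLogicallyLogged(rel))
		elog(DEBUG1, "%s: data changes can be extracted from the WAL stream",
			 RelationGetRelationName(rel));
}
#endif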
/* routines in utils/cache/relcache.c */
extern void RelationIncrementReferenceCount(Relation rel);
extern void RelationDecrementReferenceCount(Relation rel);
#endif /* REL_H */