/*-------------------------------------------------------------------------
*
* syscache.c
* System cache management routines
*
* Portions Copyright (c) 1996-2017, PostgreSQL Global Development Group
* Portions Copyright (c) 1994, Regents of the University of California
*
*
* IDENTIFICATION
* src/backend/utils/cache/syscache.c
*
* NOTES
* These routines allow the parser/planner/executor to perform
* rapid lookups on the contents of the system catalogs.
*
* see utils/syscache.h for a list of the cache IDs
*
*-------------------------------------------------------------------------
*/
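
/*
 * Illustrative sketch, not part of the original file: a typical caller
 * fetches a catalog row through these caches with SearchSysCache1() and
 * releases it with ReleaseSysCache().  ("typid" is a hypothetical variable
 * holding a pg_type OID.)
 *
 *		HeapTuple	tup;
 *
 *		tup = SearchSysCache1(TYPEOID, ObjectIdGetDatum(typid));
 *		if (!HeapTupleIsValid(tup))
 *			elog(ERROR, "cache lookup failed for type %u", typid);
 *		... inspect ((Form_pg_type) GETSTRUCT(tup))->typlen, etc ...
 *		ReleaseSysCache(tup);
 */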
#include "postgres.h"
#include "access/htup_details.h"
#include "access/sysattr.h"
#include "catalog/indexing.h"
#include "catalog/pg_aggregate.h"
#include "catalog/pg_am.h"
#include "catalog/pg_amop.h"
#include "catalog/pg_amproc.h"
#include "catalog/pg_auth_members.h"
#include "catalog/pg_authid.h"
#include "catalog/pg_cast.h"
#include "catalog/pg_collation.h"
#include "catalog/pg_constraint.h"
#include "catalog/pg_conversion.h"
#include "catalog/pg_database.h"
#include "catalog/pg_db_role_setting.h"
#include "catalog/pg_default_acl.h"
#include "catalog/pg_depend.h"
#include "catalog/pg_description.h"
#include "catalog/pg_enum.h"
#include "catalog/pg_event_trigger.h"
#include "catalog/pg_foreign_data_wrapper.h"
#include "catalog/pg_foreign_server.h"
#include "catalog/pg_foreign_table.h"
#include "catalog/pg_language.h"
#include "catalog/pg_namespace.h"
#include "catalog/pg_opclass.h"
#include "catalog/pg_operator.h"
#include "catalog/pg_opfamily.h"
#include "catalog/pg_partitioned_table.h"
#include "catalog/pg_proc.h"
#include "catalog/pg_publication.h"
#include "catalog/pg_publication_rel.h"
#include "catalog/pg_range.h"
#include "catalog/pg_rewrite.h"
#include "catalog/pg_seclabel.h"
#include "catalog/pg_sequence.h"
#include "catalog/pg_shdepend.h"
#include "catalog/pg_shdescription.h"
#include "catalog/pg_shseclabel.h"
#include "catalog/pg_replication_origin.h"
#include "catalog/pg_statistic.h"
#include "catalog/pg_statistic_ext.h"
#include "catalog/pg_subscription.h"
#include "catalog/pg_subscription_rel.h"
#include "catalog/pg_tablespace.h"
#include "catalog/pg_transform.h"
#include "catalog/pg_ts_config.h"
#include "catalog/pg_ts_config_map.h"
#include "catalog/pg_ts_dict.h"
#include "catalog/pg_ts_parser.h"
#include "catalog/pg_ts_template.h"
#include "catalog/pg_type.h"
#include "catalog/pg_user_mapping.h"
#include "utils/rel.h"
#include "utils/catcache.h"
#include "utils/syscache.h"

/*---------------------------------------------------------------------------

	Adding system caches:

	Add your new cache to the list in include/utils/syscache.h.
	Keep the list sorted alphabetically.

	Add your entry to the cacheinfo[] array below. All cache lists are
	alphabetical, so add it in the proper place. Specify the relation OID,
	index OID, number of keys, key attribute numbers, and initial number of
	hash buckets.

	The number of hash buckets must be a power of 2. It's reasonable to
	set this to the number of entries that might be in the particular cache
	in a medium-size database.

	There must be a unique index underlying each syscache (ie, an index
	whose key is the same as that of the cache). If there is not one
	already, add definitions for it to include/catalog/indexing.h: you need
	to add a DECLARE_UNIQUE_INDEX macro and a #define for the index OID.
	(Adding an index requires a catversion.h update, while simply
	adding/deleting caches only requires a recompile.)

	Finally, any place your relation gets heap_insert() or
	heap_update() calls, use CatalogTupleInsert() or CatalogTupleUpdate()
	instead, which also update indexes. The heap_* calls do not do that.

*---------------------------------------------------------------------------
*/
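
/*
 * Illustrative sketch, with invented names: suppose a hypothetical catalog
 * pg_widget needed a by-name cache WIDGETNAME.  After adding WIDGETNAME to
 * the enum in include/utils/syscache.h, one would declare the supporting
 * unique index in include/catalog/indexing.h (the OID 9999 and all
 * pg_widget identifiers are made up for this example):
 *
 *		DECLARE_UNIQUE_INDEX(pg_widget_wgtname_index, 9999, on pg_widget using btree(wgtname name_ops));
 *		#define WidgetNameIndexId  9999
 *
 * and then add an entry of this shape at the alphabetically correct spot
 * in cacheinfo[] below:
 *
 *		{WidgetRelationId,
 *			WidgetNameIndexId,
 *			1,
 *			{
 *				Anum_pg_widget_wgtname,
 *				0,
 *				0,
 *				0
 *			},
 *			8
 *		},
 */
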
/*
* struct cachedesc: information defining a single syscache
*/
struct cachedesc
{
Oid reloid; /* OID of the relation being cached */
Oid indoid; /* OID of index relation for this cache */
int nkeys; /* # of keys needed for cache lookup */
int key[4]; /* attribute numbers of key attrs */
int nbuckets; /* number of hash buckets for this cache */
};
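
/*
 * To read the first entry below as an example: rows of pg_aggregate
 * (reloid = AggregateRelationId) are cached under cache ID AGGFNOID,
 * looked up through the unique index AggregateFnoidIndexId on the single
 * key column aggfnoid, with 16 initial hash buckets.
 */
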
static const struct cachedesc cacheinfo[] = {
{AggregateRelationId, /* AGGFNOID */
AggregateFnoidIndexId,
1,
{
Anum_pg_aggregate_aggfnoid,
0,
0,
0
},
16
},
{AccessMethodRelationId, /* AMNAME */
AmNameIndexId,
1,
{
Anum_pg_am_amname,
0,
0,
0
},
4
},
{AccessMethodRelationId, /* AMOID */
AmOidIndexId,
1,
{
ObjectIdAttributeNumber,
0,
0,
0
},
4
},
{AccessMethodOperatorRelationId, /* AMOPOPID */
AccessMethodOperatorIndexId,
3,
{
Anum_pg_amop_amopopr,
Anum_pg_amop_amoppurpose,
Anum_pg_amop_amopfamily,
0
},
64
},
{AccessMethodOperatorRelationId, /* AMOPSTRATEGY */
AccessMethodStrategyIndexId,
4,
{
Anum_pg_amop_amopfamily,
Anum_pg_amop_amoplefttype,
Anum_pg_amop_amoprighttype,
Anum_pg_amop_amopstrategy
},
64
},
{AccessMethodProcedureRelationId, /* AMPROCNUM */
AccessMethodProcedureIndexId,
4,
{
Anum_pg_amproc_amprocfamily,
Anum_pg_amproc_amproclefttype,
Anum_pg_amproc_amprocrighttype,
Anum_pg_amproc_amprocnum
},
16
},
{AttributeRelationId, /* ATTNAME */
AttributeRelidNameIndexId,
2,
{
Anum_pg_attribute_attrelid,
Anum_pg_attribute_attname,
0,
0
},
32
},
{AttributeRelationId, /* ATTNUM */
AttributeRelidNumIndexId,
2,
{
Anum_pg_attribute_attrelid,
Anum_pg_attribute_attnum,
0,
0
},
128
},
{AuthMemRelationId, /* AUTHMEMMEMROLE */
AuthMemMemRoleIndexId,
2,
{
Anum_pg_auth_members_member,
Anum_pg_auth_members_roleid,
0,
0
},
8
},
{AuthMemRelationId, /* AUTHMEMROLEMEM */
AuthMemRoleMemIndexId,
2,
{
Anum_pg_auth_members_roleid,
Anum_pg_auth_members_member,
0,
0
},
8
},
{AuthIdRelationId, /* AUTHNAME */
AuthIdRolnameIndexId,
1,
{
Anum_pg_authid_rolname,
0,
0,
0
},
8
},
{AuthIdRelationId, /* AUTHOID */
AuthIdOidIndexId,
1,
{
ObjectIdAttributeNumber,
0,
0,
0
},
8
},
{CastRelationId, /* CASTSOURCETARGET */
CastSourceTargetIndexId,
2,
{
Anum_pg_cast_castsource,
Anum_pg_cast_casttarget,
0,
0
},
256
},
{OperatorClassRelationId, /* CLAAMNAMENSP */
OpclassAmNameNspIndexId,
3,
{
Anum_pg_opclass_opcmethod,
Anum_pg_opclass_opcname,
Anum_pg_opclass_opcnamespace,
0
},
8
},
{OperatorClassRelationId, /* CLAOID */
OpclassOidIndexId,
1,
{
ObjectIdAttributeNumber,
0,
0,
0
},
8
},
{CollationRelationId, /* COLLNAMEENCNSP */
CollationNameEncNspIndexId,
3,
{
Anum_pg_collation_collname,
Anum_pg_collation_collencoding,
Anum_pg_collation_collnamespace,
0
},
8
},
{CollationRelationId, /* COLLOID */
CollationOidIndexId,
1,
{
ObjectIdAttributeNumber,
0,
0,
0
},
8
},
{ConversionRelationId, /* CONDEFAULT */
ConversionDefaultIndexId,
4,
{
Anum_pg_conversion_connamespace,
Anum_pg_conversion_conforencoding,
Anum_pg_conversion_contoencoding,
ObjectIdAttributeNumber,
},
8
},
{ConversionRelationId, /* CONNAMENSP */
ConversionNameNspIndexId,
2,
{
Anum_pg_conversion_conname,
Anum_pg_conversion_connamespace,
0,
0
},
8
},
{ConstraintRelationId, /* CONSTROID */
ConstraintOidIndexId,
1,
{
ObjectIdAttributeNumber,
0,
0,
0
},
16
},
{ConversionRelationId, /* CONVOID */
ConversionOidIndexId,
1,
{
ObjectIdAttributeNumber,
0,
0,
0
},
8
},
{DatabaseRelationId, /* DATABASEOID */
DatabaseOidIndexId,
1,
{
ObjectIdAttributeNumber,
0,
0,
0
},
4
},
{DefaultAclRelationId, /* DEFACLROLENSPOBJ */
DefaultAclRoleNspObjIndexId,
3,
{
Anum_pg_default_acl_defaclrole,
Anum_pg_default_acl_defaclnamespace,
Anum_pg_default_acl_defaclobjtype,
0
},
8
},
{EnumRelationId, /* ENUMOID */
EnumOidIndexId,
1,
{
ObjectIdAttributeNumber,
0,
0,
0
},
8
},
{EnumRelationId, /* ENUMTYPOIDNAME */
EnumTypIdLabelIndexId,
2,
{
Anum_pg_enum_enumtypid,
Anum_pg_enum_enumlabel,
0,
0
},
8
},
{EventTriggerRelationId, /* EVENTTRIGGERNAME */
EventTriggerNameIndexId,
1,
{
Anum_pg_event_trigger_evtname,
0,
0,
0
},
8
},
{EventTriggerRelationId, /* EVENTTRIGGEROID */
EventTriggerOidIndexId,
1,
{
ObjectIdAttributeNumber,
0,
0,
0
},
8
},
{ForeignDataWrapperRelationId, /* FOREIGNDATAWRAPPERNAME */
ForeignDataWrapperNameIndexId,
1,
{
Anum_pg_foreign_data_wrapper_fdwname,
0,
0,
0
},
2
},
{ForeignDataWrapperRelationId, /* FOREIGNDATAWRAPPEROID */
ForeignDataWrapperOidIndexId,
1,
{
ObjectIdAttributeNumber,
0,
0,
0
},
2
},
{ForeignServerRelationId, /* FOREIGNSERVERNAME */
ForeignServerNameIndexId,
1,
{
Anum_pg_foreign_server_srvname,
0,
0,
0
},
2
},
{ForeignServerRelationId, /* FOREIGNSERVEROID */
ForeignServerOidIndexId,
1,
{
ObjectIdAttributeNumber,
0,
0,
0
},
2
},
{ForeignTableRelationId, /* FOREIGNTABLEREL */
ForeignTableRelidIndexId,
1,
{
Anum_pg_foreign_table_ftrelid,
0,
0,
0
},
4
},
{IndexRelationId, /* INDEXRELID */
IndexRelidIndexId,
1,
{
Anum_pg_index_indexrelid,
0,
0,
0
},
64
},
{LanguageRelationId, /* LANGNAME */
LanguageNameIndexId,
1,
{
Anum_pg_language_lanname,
0,
0,
0
},
4
},
{LanguageRelationId, /* LANGOID */
LanguageOidIndexId,
1,
{
ObjectIdAttributeNumber,
0,
0,
0
},
4
},
{NamespaceRelationId, /* NAMESPACENAME */
NamespaceNameIndexId,
1,
{
Anum_pg_namespace_nspname,
0,
0,
0
},
4
},
{NamespaceRelationId, /* NAMESPACEOID */
NamespaceOidIndexId,
1,
{
ObjectIdAttributeNumber,
0,
0,
0
},
16
},
{OperatorRelationId, /* OPERNAMENSP */
OperatorNameNspIndexId,
4,
{
Anum_pg_operator_oprname,
Anum_pg_operator_oprleft,
Anum_pg_operator_oprright,
Anum_pg_operator_oprnamespace
},
256
},
{OperatorRelationId, /* OPEROID */
OperatorOidIndexId,
1,
{
ObjectIdAttributeNumber,
0,
0,
0
},
32
},
{OperatorFamilyRelationId, /* OPFAMILYAMNAMENSP */
OpfamilyAmNameNspIndexId,
3,
{
Anum_pg_opfamily_opfmethod,
Anum_pg_opfamily_opfname,
Anum_pg_opfamily_opfnamespace,
0
},
8
},
{OperatorFamilyRelationId, /* OPFAMILYOID */
OpfamilyOidIndexId,
1,
{
ObjectIdAttributeNumber,
0,
0,
0
},
8
},
{PartitionedRelationId, /* PARTRELID */
PartitionedRelidIndexId,
1,
{
Anum_pg_partitioned_table_partrelid,
0,
0,
0
},
32
},
{ProcedureRelationId, /* PROCNAMEARGSNSP */
ProcedureNameArgsNspIndexId,
3,
{
Anum_pg_proc_proname,
Anum_pg_proc_proargtypes,
Anum_pg_proc_pronamespace,
0
},
128
},
{ProcedureRelationId, /* PROCOID */
ProcedureOidIndexId,
1,
{
ObjectIdAttributeNumber,
0,
0,
0
},
128
},
{PublicationRelationId, /* PUBLICATIONNAME */
PublicationNameIndexId,
1,
{
Anum_pg_publication_pubname,
0,
0,
0
},
8
},
{PublicationRelationId, /* PUBLICATIONOID */
PublicationObjectIndexId,
1,
{
ObjectIdAttributeNumber,
0,
0,
0
},
8
},
{PublicationRelRelationId, /* PUBLICATIONREL */
PublicationRelObjectIndexId,
1,
{
ObjectIdAttributeNumber,
0,
0,
0
},
64
},
{PublicationRelRelationId, /* PUBLICATIONRELMAP */
PublicationRelPrrelidPrpubidIndexId,
2,
{
Anum_pg_publication_rel_prrelid,
Anum_pg_publication_rel_prpubid,
0,
0
},
64
},
{RangeRelationId, /* RANGETYPE */
RangeTypidIndexId,
1,
{
Anum_pg_range_rngtypid,
0,
0,
0
},
4
},
{RelationRelationId, /* RELNAMENSP */
ClassNameNspIndexId,
2,
{
Anum_pg_class_relname,
Anum_pg_class_relnamespace,
0,
0
},
128
},
{RelationRelationId, /* RELOID */
ClassOidIndexId,
1,
{
ObjectIdAttributeNumber,
0,
0,
0
},
128
},
{ReplicationOriginRelationId, /* REPLORIGIDENT */
ReplicationOriginIdentIndex,
1,
{
Anum_pg_replication_origin_roident,
0,
0,
0
},
16
},
{ReplicationOriginRelationId, /* REPLORIGNAME */
ReplicationOriginNameIndex,
1,
{
Anum_pg_replication_origin_roname,
0,
0,
0
},
16
},
{RewriteRelationId, /* RULERELNAME */
RewriteRelRulenameIndexId,
2,
{
Anum_pg_rewrite_ev_class,
Anum_pg_rewrite_rulename,
0,
0
},
8
},
{SequenceRelationId, /* SEQRELID */
SequenceRelidIndexId,
1,
{
Anum_pg_sequence_seqrelid,
0,
0,
0
},
32
},
{StatisticExtRelationId, /* STATEXTNAMENSP */
StatisticExtNameIndexId,
2,
{
Anum_pg_statistic_ext_stxname,
Anum_pg_statistic_ext_stxnamespace,
0,
0
},
4
},
{StatisticExtRelationId, /* STATEXTOID */
StatisticExtOidIndexId,
1,
{
ObjectIdAttributeNumber,
0,
0,
0
},
4
},
{StatisticRelationId, /* STATRELATTINH */
StatisticRelidAttnumInhIndexId,
3,
{
Anum_pg_statistic_starelid,
Anum_pg_statistic_staattnum,
Anum_pg_statistic_stainherit,
0
},
128
},
{SubscriptionRelationId, /* SUBSCRIPTIONNAME */
SubscriptionNameIndexId,
2,
{
Anum_pg_subscription_subdbid,
Anum_pg_subscription_subname,
0,
0
},
4
},
{SubscriptionRelationId, /* SUBSCRIPTIONOID */
SubscriptionObjectIndexId,
1,
{
ObjectIdAttributeNumber,
0,
0,
0
},
4
},
{SubscriptionRelRelationId, /* SUBSCRIPTIONRELMAP */
SubscriptionRelSrrelidSrsubidIndexId,
2,
{
Anum_pg_subscription_rel_srrelid,
Anum_pg_subscription_rel_srsubid,
0,
0
},
64
},
{TableSpaceRelationId, /* TABLESPACEOID */
TablespaceOidIndexId,
1,
{
ObjectIdAttributeNumber,
0,
0,
0,
},
4
},
{TransformRelationId, /* TRFOID */
TransformOidIndexId,
1,
{
ObjectIdAttributeNumber,
0,
0,
0,
},
16
},
{TransformRelationId, /* TRFTYPELANG */
TransformTypeLangIndexId,
2,
{
Anum_pg_transform_trftype,
Anum_pg_transform_trflang,
0,
0,
},
16
},
{TSConfigMapRelationId, /* TSCONFIGMAP */
TSConfigMapIndexId,
3,
{
Anum_pg_ts_config_map_mapcfg,
Anum_pg_ts_config_map_maptokentype,
Anum_pg_ts_config_map_mapseqno,
0
},
2
},
{TSConfigRelationId, /* TSCONFIGNAMENSP */
TSConfigNameNspIndexId,
2,
{
Anum_pg_ts_config_cfgname,
Anum_pg_ts_config_cfgnamespace,
0,
0
},
2
},
{TSConfigRelationId, /* TSCONFIGOID */
TSConfigOidIndexId,
1,
{
ObjectIdAttributeNumber,
0,
0,
0
},
2
},
{TSDictionaryRelationId, /* TSDICTNAMENSP */
TSDictionaryNameNspIndexId,
2,
{
Anum_pg_ts_dict_dictname,
Anum_pg_ts_dict_dictnamespace,
0,
0
},
2
},
{TSDictionaryRelationId, /* TSDICTOID */
TSDictionaryOidIndexId,
1,
{
ObjectIdAttributeNumber,
0,
0,
0
},
2
},
{TSParserRelationId, /* TSPARSERNAMENSP */
TSParserNameNspIndexId,
2,
{
Anum_pg_ts_parser_prsname,
Anum_pg_ts_parser_prsnamespace,
0,
0
},
2
},
{TSParserRelationId, /* TSPARSEROID */
TSParserOidIndexId,
1,
{
ObjectIdAttributeNumber,
0,
0,
0
},
2
},
{TSTemplateRelationId, /* TSTEMPLATENAMENSP */
TSTemplateNameNspIndexId,
2,
{
Anum_pg_ts_template_tmplname,
Anum_pg_ts_template_tmplnamespace,
0,
0
},
2
},
{TSTemplateRelationId, /* TSTEMPLATEOID */
TSTemplateOidIndexId,
1,
{
ObjectIdAttributeNumber,
0,
0,
0
},
2
},
{TypeRelationId, /* TYPENAMENSP */
TypeNameNspIndexId,
2,
{
Anum_pg_type_typname,
Anum_pg_type_typnamespace,
0,
0
},
64
},
{TypeRelationId, /* TYPEOID */
TypeOidIndexId,
1,
{
ObjectIdAttributeNumber,
0,
0,
0
},
64
},
{UserMappingRelationId, /* USERMAPPINGOID */
UserMappingOidIndexId,
1,
{
ObjectIdAttributeNumber,
0,
0,
0
},
2
},
{UserMappingRelationId, /* USERMAPPINGUSERSERVER */
UserMappingUserServerIndexId,
2,
{
Anum_pg_user_mapping_umuser,
Anum_pg_user_mapping_umserver,
0,
0
},
2
}
};
static CatCache *SysCache[SysCacheSize];
static bool CacheInitialized = false;
/* Sorted array of OIDs of tables that have caches on them */
static Oid SysCacheRelationOid[SysCacheSize];
static int SysCacheRelationOidSize;
/* Sorted array of OIDs of tables and indexes used by caches */
static Oid SysCacheSupportingRelOid[SysCacheSize * 2];
static int SysCacheSupportingRelOidSize;
static int oid_compare(const void *a, const void *b);
/*
* InitCatalogCache - initialize the caches
*
* Note that no database access is done here; we only allocate memory
* and initialize the cache structure. Interrogation of the database
* to complete initialization of a cache happens upon first use
* of that cache.
*/
void
InitCatalogCache(void)
{
int cacheId;
int i,
j;
StaticAssertStmt(SysCacheSize == (int) lengthof(cacheinfo),
"SysCacheSize does not match syscache.c's array");
Assert(!CacheInitialized);
SysCacheRelationOidSize = SysCacheSupportingRelOidSize = 0;
for (cacheId = 0; cacheId < SysCacheSize; cacheId++)
{
SysCache[cacheId] = InitCatCache(cacheId,
cacheinfo[cacheId].reloid,
cacheinfo[cacheId].indoid,
cacheinfo[cacheId].nkeys,
cacheinfo[cacheId].key,
cacheinfo[cacheId].nbuckets);
if (!PointerIsValid(SysCache[cacheId]))
elog(ERROR, "could not initialize cache %u (%d)",
cacheinfo[cacheId].reloid, cacheId);
/* Accumulate data for OID lists, too */
SysCacheRelationOid[SysCacheRelationOidSize++] =
cacheinfo[cacheId].reloid;
SysCacheSupportingRelOid[SysCacheSupportingRelOidSize++] =
cacheinfo[cacheId].reloid;
SysCacheSupportingRelOid[SysCacheSupportingRelOidSize++] =
cacheinfo[cacheId].indoid;
/* see comments for RelationInvalidatesSnapshotsOnly */
Assert(!RelationInvalidatesSnapshotsOnly(cacheinfo[cacheId].reloid));
}
Assert(SysCacheRelationOidSize <= lengthof(SysCacheRelationOid));
Assert(SysCacheSupportingRelOidSize <= lengthof(SysCacheSupportingRelOid));
/* Sort and de-dup OID arrays, so we can use binary search. */
pg_qsort(SysCacheRelationOid, SysCacheRelationOidSize,
sizeof(Oid), oid_compare);
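
	/*
	 * Classic in-place de-duplication over the sorted array: j trails i,
	 * and each not-yet-seen OID is copied down, so caches that share a
	 * catalog collapse to a single entry.
	 */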
for (i = 1, j = 0; i < SysCacheRelationOidSize; i++)
{
if (SysCacheRelationOid[i] != SysCacheRelationOid[j])
SysCacheRelationOid[++j] = SysCacheRelationOid[i];
}
SysCacheRelationOidSize = j + 1;
pg_qsort(SysCacheSupportingRelOid, SysCacheSupportingRelOidSize,
sizeof(Oid), oid_compare);
for (i = 1, j = 0; i < SysCacheSupportingRelOidSize; i++)
{
if (SysCacheSupportingRelOid[i] != SysCacheSupportingRelOid[j])
SysCacheSupportingRelOid[++j] = SysCacheSupportingRelOid[i];
}
SysCacheSupportingRelOidSize = j + 1;
CacheInitialized = true;
}
/*
* InitCatalogCachePhase2 - finish initializing the caches
*
* Finish initializing all the caches, including necessary database
* access.
*
* This is *not* essential; normally we allow syscaches to be initialized
* on first use. However, it is useful as a mechanism to preload the
* relcache with entries for the most-commonly-used system catalogs.
* Therefore, we invoke this routine when we need to write a new relcache
* init file.
*/
void
InitCatalogCachePhase2(void)
{
int cacheId;
Assert(CacheInitialized);
for (cacheId = 0; cacheId < SysCacheSize; cacheId++)
InitCatCachePhase2(SysCache[cacheId], true);
}
/*
* SearchSysCache
*
* A layer on top of SearchCatCache that does the initialization and
* key-setting for you.
*
* Returns the cache copy of the tuple if one is found, NULL if not.
* The tuple is the 'cache' copy and must NOT be modified!
*
* When the caller is done using the tuple, call ReleaseSysCache()
* to release the reference count grabbed by SearchSysCache(). If this
* is not done, the tuple will remain locked in cache until end of
* transaction, which is tolerable but not desirable.
*
* CAUTION: The tuple that is returned must NOT be freed by the caller!
*/
HeapTuple
SearchSysCache(int cacheId,
Datum key1,
Datum key2,
Datum key3,
Datum key4)
{
if (cacheId < 0 || cacheId >= SysCacheSize ||
!PointerIsValid(SysCache[cacheId]))
elog(ERROR, "invalid cache ID: %d", cacheId);
return SearchCatCache(SysCache[cacheId], key1, key2, key3, key4);
}
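
/*
 * A minimal usage sketch, not part of this file's logic: it assumes
 * catalog/pg_proc.h is available and sits under a hypothetical
 * SYSCACHE_USAGE_EXAMPLES symbol so it is never compiled by default.
 * SearchSysCache1() is the one-key convenience macro from syscache.h.
 */
#ifdef SYSCACHE_USAGE_EXAMPLES
static void
example_report_proc_name(Oid funcid)
{
	HeapTuple	tup;

	tup = SearchSysCache1(PROCOID, ObjectIdGetDatum(funcid));
	if (!HeapTupleIsValid(tup))
		elog(ERROR, "cache lookup failed for function %u", funcid);

	/* Read through the catalog struct; the cached tuple is read-only. */
	elog(DEBUG1, "function %u is named %s", funcid,
		 NameStr(((Form_pg_proc) GETSTRUCT(tup))->proname));

	ReleaseSysCache(tup);
}
#endif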
/*
* ReleaseSysCache
* Release previously grabbed reference count on a tuple
*/
void
ReleaseSysCache(HeapTuple tuple)
{
ReleaseCatCache(tuple);
}
/*
* SearchSysCacheCopy
*
* A convenience routine that does SearchSysCache and (if successful)
* returns a modifiable copy of the syscache entry. The original
* syscache entry is released before returning. The caller should
* heap_freetuple() the result when done with it.
*/
HeapTuple
SearchSysCacheCopy(int cacheId,
Datum key1,
Datum key2,
Datum key3,
Datum key4)
{
HeapTuple tuple,
newtuple;
tuple = SearchSysCache(cacheId, key1, key2, key3, key4);
if (!HeapTupleIsValid(tuple))
return tuple;
newtuple = heap_copytuple(tuple);
ReleaseSysCache(tuple);
return newtuple;
}
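
/*
 * A sketch of the copy-modify pattern, again under the hypothetical
 * SYSCACHE_USAGE_EXAMPLES guard and assuming catalog/pg_proc.h.
 * SearchSysCacheCopy1() is the one-key macro wrapper from syscache.h.
 */
#ifdef SYSCACHE_USAGE_EXAMPLES
static void
example_adjust_proc_cost(Oid funcid)
{
	HeapTuple	newtup;

	newtup = SearchSysCacheCopy1(PROCOID, ObjectIdGetDatum(funcid));
	if (!HeapTupleIsValid(newtup))
		elog(ERROR, "cache lookup failed for function %u", funcid);

	/* The copy is private, so modifying it is safe ... */
	((Form_pg_proc) GETSTRUCT(newtup))->procost = 100;

	/* ... a real caller would now write the tuple back to pg_proc. */
	heap_freetuple(newtup);
}
#endif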
/*
* SearchSysCacheExists
*
* A convenience routine that just probes to see if a tuple can be found.
* No lock is retained on the syscache entry.
*/
bool
SearchSysCacheExists(int cacheId,
Datum key1,
Datum key2,
Datum key3,
Datum key4)
{
HeapTuple tuple;
tuple = SearchSysCache(cacheId, key1, key2, key3, key4);
if (!HeapTupleIsValid(tuple))
return false;
ReleaseSysCache(tuple);
return true;
}
/*
* GetSysCacheOid
*
* A convenience routine that does SearchSysCache and returns the OID
* of the found tuple, or InvalidOid if no tuple could be found.
* No lock is retained on the syscache entry.
*/
Oid
GetSysCacheOid(int cacheId,
Datum key1,
Datum key2,
Datum key3,
Datum key4)
{
HeapTuple tuple;
Oid result;
tuple = SearchSysCache(cacheId, key1, key2, key3, key4);
if (!HeapTupleIsValid(tuple))
return InvalidOid;
result = HeapTupleGetOid(tuple);
ReleaseSysCache(tuple);
return result;
}
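
/*
 * A short sketch (hypothetical example guard): resolve a schema name to
 * its OID with the one-key macro GetSysCacheOid1() from syscache.h.
 */
#ifdef SYSCACHE_USAGE_EXAMPLES
static Oid
example_namespace_oid(const char *nspname)
{
	/* InvalidOid, not an error, is the result when no tuple matches. */
	return GetSysCacheOid1(NAMESPACENAME, CStringGetDatum(nspname));
}
#endif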
/*
* SearchSysCacheAttName
*
* This routine is equivalent to SearchSysCache on the ATTNAME cache,
* except that it will return NULL if the found attribute is marked
* attisdropped. This is convenient for callers that want to act as
* though dropped attributes don't exist.
*/
HeapTuple
SearchSysCacheAttName(Oid relid, const char *attname)
{
HeapTuple tuple;
tuple = SearchSysCache2(ATTNAME,
ObjectIdGetDatum(relid),
CStringGetDatum(attname));
if (!HeapTupleIsValid(tuple))
return NULL;
if (((Form_pg_attribute) GETSTRUCT(tuple))->attisdropped)
{
ReleaseSysCache(tuple);
return NULL;
}
return tuple;
}
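
/*
 * A dropped-column-aware lookup sketch (hypothetical example guard):
 * fetch a column's type OID, treating dropped columns as absent.
 */
#ifdef SYSCACHE_USAGE_EXAMPLES
static Oid
example_column_type(Oid relid, const char *colname)
{
	HeapTuple	tup;
	Oid			typid;

	tup = SearchSysCacheAttName(relid, colname);
	if (!HeapTupleIsValid(tup))
		return InvalidOid;		/* no such (live) column */
	typid = ((Form_pg_attribute) GETSTRUCT(tup))->atttypid;
	ReleaseSysCache(tup);
	return typid;
}
#endif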
/*
* SearchSysCacheCopyAttName
*
* As above, an attisdropped-aware version of SearchSysCacheCopy.
*/
HeapTuple
SearchSysCacheCopyAttName(Oid relid, const char *attname)
{
HeapTuple tuple,
newtuple;
tuple = SearchSysCacheAttName(relid, attname);
if (!HeapTupleIsValid(tuple))
return tuple;
newtuple = heap_copytuple(tuple);
ReleaseSysCache(tuple);
return newtuple;
}
/*
* SearchSysCacheExistsAttName
*
* As above, an attisdropped-aware version of SearchSysCacheExists.
*/
bool
SearchSysCacheExistsAttName(Oid relid, const char *attname)
{
HeapTuple tuple;
tuple = SearchSysCacheAttName(relid, attname);
if (!HeapTupleIsValid(tuple))
return false;
ReleaseSysCache(tuple);
return true;
}
/*
* SysCacheGetAttr
*
* Given a tuple previously fetched by SearchSysCache(),
* extract a specific attribute.
*
* This is equivalent to using heap_getattr() on a tuple fetched
* from a non-cached relation. Usually, this is only used for attributes
* that could be NULL or variable length; the fixed-size attributes in
* a system table are accessed just by mapping the tuple onto the C struct
* declarations from include/catalog/.
*
* As with heap_getattr(), if the attribute is of a pass-by-reference type
* then a pointer into the tuple data area is returned --- the caller must
* not modify or pfree the datum!
*
* Note: it is legal to use SysCacheGetAttr() with a cacheId referencing
* a different cache for the same catalog the tuple was fetched from.
*/
Datum
SysCacheGetAttr(int cacheId, HeapTuple tup,
AttrNumber attributeNumber,
bool *isNull)
{
/*
* We just need to get the TupleDesc out of the cache entry, and then we
* can apply heap_getattr(). Normally the cache control data is already
* valid (because the caller recently fetched the tuple via this same
* cache), but there are cases where we have to initialize the cache here.
*/
if (cacheId < 0 || cacheId >= SysCacheSize ||
!PointerIsValid(SysCache[cacheId]))
elog(ERROR, "invalid cache ID: %d", cacheId);
if (!PointerIsValid(SysCache[cacheId]->cc_tupdesc))
{
InitCatCachePhase2(SysCache[cacheId], false);
Assert(PointerIsValid(SysCache[cacheId]->cc_tupdesc));
}
return heap_getattr(tup, attributeNumber,
SysCache[cacheId]->cc_tupdesc,
isNull);
}
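
/*
 * A sketch (hypothetical guard, assuming catalog/pg_proc.h): probe a
 * variable-length, possibly-NULL column, which is exactly the case
 * SysCacheGetAttr() exists for.
 */
#ifdef SYSCACHE_USAGE_EXAMPLES
static bool
example_proc_has_source(HeapTuple proctup)
{
	bool		isnull;

	/* PROCOID names a cache on pg_proc, supplying the tuple descriptor. */
	(void) SysCacheGetAttr(PROCOID, proctup,
						   Anum_pg_proc_prosrc, &isnull);
	return !isnull;
}
#endif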
/*
* GetSysCacheHashValue
*
* Get the hash value that would be used for a tuple in the specified cache
* with the given search keys.
*
* The reason for exposing this as part of the API is that the hash value is
* exposed in cache invalidation operations, so there are places outside the
* catcache code that need to be able to compute the hash values.
*/
uint32
GetSysCacheHashValue(int cacheId,
Datum key1,
Datum key2,
Datum key3,
Datum key4)
{
if (cacheId < 0 || cacheId >= SysCacheSize ||
!PointerIsValid(SysCache[cacheId]))
elog(ERROR, "invalid cache ID: %d", cacheId);
return GetCatCacheHashValue(SysCache[cacheId], key1, key2, key3, key4);
}
/*
* List-search interface
*/
struct catclist *
SearchSysCacheList(int cacheId, int nkeys,
Datum key1, Datum key2, Datum key3, Datum key4)
{
if (cacheId < 0 || cacheId >= SysCacheSize ||
!PointerIsValid(SysCache[cacheId]))
elog(ERROR, "invalid cache ID: %d", cacheId);
return SearchCatCacheList(SysCache[cacheId], nkeys,
key1, key2, key3, key4);
}
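
/*
 * A partial-key list-search sketch (hypothetical example guard): count
 * the pg_proc entries sharing a name via the PROCNAMEARGSNSP cache,
 * whose first key column is proname.  SearchSysCacheList1() and
 * ReleaseSysCacheList() are the convenience macros from syscache.h.
 */
#ifdef SYSCACHE_USAGE_EXAMPLES
static int
example_count_overloads(const char *funcname)
{
	CatCList   *catlist;
	int			nmembers;

	catlist = SearchSysCacheList1(PROCNAMEARGSNSP,
								  CStringGetDatum(funcname));
	nmembers = catlist->n_members;
	ReleaseSysCacheList(catlist);
	return nmembers;
}
#endif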
/*
* SysCacheInvalidate
*
* Invalidate entries in the specified cache, given a hash value.
* See CatCacheInvalidate() for more info.
*
* This routine is only quasi-public: it should only be used by inval.c.
*/
void
SysCacheInvalidate(int cacheId, uint32 hashValue)
{
if (cacheId < 0 || cacheId >= SysCacheSize)
elog(ERROR, "invalid cache ID: %d", cacheId);
/* if this cache isn't initialized yet, no need to do anything */
if (!PointerIsValid(SysCache[cacheId]))
return;
CatCacheInvalidate(SysCache[cacheId], hashValue);
}
/*
* Certain relations that do not have system caches send snapshot invalidation
* messages in lieu of catcache messages. This is for the benefit of
* GetCatalogSnapshot(), which can then reuse its existing MVCC snapshot
* for scanning one of those catalogs, rather than taking a new one, if no
* invalidation has been received.
*
* Relations that have syscaches need not (and must not) be listed here. The
* catcache invalidation messages will also flush the snapshot. If you add a
* syscache for one of these relations, remove it from this list.
*/
bool
RelationInvalidatesSnapshotsOnly(Oid relid)
{
switch (relid)
{
case DbRoleSettingRelationId:
case DependRelationId:
case SharedDependRelationId:
case DescriptionRelationId:
case SharedDescriptionRelationId:
case SecLabelRelationId:
case SharedSecLabelRelationId:
return true;
default:
break;
}
return false;
}
/*
* Test whether a relation has a system cache.
*/
bool
RelationHasSysCache(Oid relid)
{
int low = 0,
high = SysCacheRelationOidSize - 1;
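
	/* Binary search the sorted, de-duplicated array built at init time. */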
while (low <= high)
{
int middle = low + (high - low) / 2;
if (SysCacheRelationOid[middle] == relid)
return true;
if (SysCacheRelationOid[middle] < relid)
low = middle + 1;
else
high = middle - 1;
}
return false;
}
/*
* Test whether a relation supports a system cache, ie it is either a
* cached table or the index used for a cache.
*/
bool
RelationSupportsSysCache(Oid relid)
{
int low = 0,
high = SysCacheSupportingRelOidSize - 1;
while (low <= high)
{
int middle = low + (high - low) / 2;
if (SysCacheSupportingRelOid[middle] == relid)
return true;
if (SysCacheSupportingRelOid[middle] < relid)
low = middle + 1;
else
high = middle - 1;
}
return false;
}
/*
* OID comparator for pg_qsort
*/
static int
oid_compare(const void *a, const void *b)
{
Oid oa = *((const Oid *) a);
Oid ob = *((const Oid *) b);
if (oa == ob)
return 0;
return (oa > ob) ? 1 : -1;
}