/*-------------------------------------------------------------------------
 *
 * catalog.c
 *		routines concerned with catalog naming conventions and other
 *		bits of hard-wired knowledge
 *
 *
 * Portions Copyright (c) 1996-2018, PostgreSQL Global Development Group
 * Portions Copyright (c) 1994, Regents of the University of California
 *
 *
 * IDENTIFICATION
 *	  src/backend/catalog/catalog.c
 *
 *-------------------------------------------------------------------------
 */

#include "postgres.h"

#include <fcntl.h>
#include <unistd.h>

#include "access/genam.h"
#include "access/sysattr.h"
#include "access/transam.h"
#include "catalog/catalog.h"
#include "catalog/indexing.h"
#include "catalog/namespace.h"
#include "catalog/pg_auth_members.h"
#include "catalog/pg_authid.h"
#include "catalog/pg_database.h"
#include "catalog/pg_namespace.h"
#include "catalog/pg_pltemplate.h"
#include "catalog/pg_db_role_setting.h"
#include "catalog/pg_replication_origin.h"
#include "catalog/pg_shdepend.h"
#include "catalog/pg_shdescription.h"
#include "catalog/pg_shseclabel.h"
#include "catalog/pg_subscription.h"
#include "catalog/pg_tablespace.h"
#include "catalog/pg_type.h"
#include "catalog/toasting.h"
#include "miscadmin.h"
#include "storage/fd.h"
#include "utils/fmgroids.h"
#include "utils/rel.h"
#include "utils/tqual.h"


/*
 * IsSystemRelation
 *		True iff the relation is either a system catalog or toast table.
 *		By a system catalog, we mean one that was created in the pg_catalog
 *		schema during initdb.  User-created relations in pg_catalog don't
 *		count as system catalogs.
 *
 *		NB: TOAST relations are considered system relations by this test
 *		for compatibility with the old IsSystemRelationName function.
 *		This is appropriate in many places but not all.  Where it's not,
 *		also check IsToastRelation or use IsCatalogRelation().
*/
bool
IsSystemRelation(Relation relation)
{
	return IsSystemClass(RelationGetRelid(relation), relation->rd_rel);
}

/*
 * IsSystemClass
 *		Like the above, but takes a Form_pg_class as argument.
 *		Used when we do not want to open the relation and have to
 *		search pg_class directly.
*/
bool
IsSystemClass(Oid relid, Form_pg_class reltuple)
{
	return IsToastClass(reltuple) || IsCatalogClass(relid, reltuple);
}

/*
 * IsCatalogRelation
 *		True iff the relation is a system catalog, or the toast table for
 *		a system catalog.  By a system catalog, we mean one that was created
 *		in the pg_catalog schema during initdb.  As with IsSystemRelation(),
 *		user-created relations in pg_catalog don't count as system catalogs.
 *
 *		Note that IsSystemRelation() returns true for ALL toast relations,
 *		but this function returns true only for toast relations of system
 *		catalogs.
*/
bool
IsCatalogRelation(Relation relation)
{
	return IsCatalogClass(RelationGetRelid(relation), relation->rd_rel);
}

/*
 * IsCatalogClass
 *		True iff the relation is a system catalog relation.
 *
 *		See IsCatalogRelation() for details.
*/
bool
IsCatalogClass(Oid relid, Form_pg_class reltuple)
{
	Oid			relnamespace = reltuple->relnamespace;

	/*
	 * Never consider relations outside pg_catalog/pg_toast to be catalog
	 * relations.
	 */
	if (!IsSystemNamespace(relnamespace) && !IsToastNamespace(relnamespace))
		return false;

/* ----
	 * Check whether the oid was assigned during initdb, when creating the
	 * initial template database.  Aside from the relations in
	 * information_schema excluded above, these are an integral part of the
	 * system.
	 *
	 * We could instead check whether the relation is pinned in pg_depend, but
	 * this is noticeably cheaper and doesn't require catalog access.
	 *
	 * This test is safe since even an oid wraparound will preserve this
	 * property (cf. GetNewObjectId()) and it has the advantage that it works
	 * correctly even if a user decides to create a relation in the pg_catalog
	 * namespace.
	 * ----
*/
	return relid < FirstNormalObjectId;
}

/*
 * IsToastRelation
 *		True iff relation is a TOAST support relation (or index).
 */
bool
IsToastRelation(Relation relation)
{
	return IsToastNamespace(RelationGetNamespace(relation));
}

/*
 * IsToastClass
 *		Like the above, but takes a Form_pg_class as argument.
 *		Used when we do not want to open the relation and have to
 *		search pg_class directly.
 */
bool
IsToastClass(Form_pg_class reltuple)
{
	Oid			relnamespace = reltuple->relnamespace;

	return IsToastNamespace(relnamespace);
}

/*
 * IsSystemNamespace
 *		True iff namespace is pg_catalog.
 *
 * NOTE: the reason this isn't a macro is to avoid having to include
 * catalog/pg_namespace.h in a lot of places.
 */
bool
IsSystemNamespace(Oid namespaceId)
{
	return namespaceId == PG_CATALOG_NAMESPACE;
}

/*
 * IsToastNamespace
 *		True iff namespace is pg_toast or my temporary-toast-table namespace.
 *
 * Note: this will return false for temporary-toast-table namespaces belonging
 * to other backends.  Those are treated the same as other backends' regular
 * temp table namespaces, and access is prevented where appropriate.
 */
bool
IsToastNamespace(Oid namespaceId)
{
	return (namespaceId == PG_TOAST_NAMESPACE) ||
		isTempToastNamespace(namespaceId);
}

/*
 * IsReservedName
 *		True iff name starts with the pg_ prefix.
 *
 *		For some classes of objects, the prefix pg_ is reserved for
 *		system objects only.  As of 8.0, this was only true for
 *		schema and tablespace names.  With 9.6, this is also true
 *		for roles.
 */
bool
IsReservedName(const char *name)
{
	/* ugly coding for speed */
	return (name[0] == 'p' &&
			name[1] == 'g' &&
			name[2] == '_');
}

/*
 * IsSharedRelation
 *		Given the OID of a relation, determine whether it's supposed to be
 *		shared across an entire database cluster.
 *
 * In older releases, this had to be hard-wired so that we could compute the
 * locktag for a relation and lock it before examining its catalog entry.
 * Since we now have MVCC catalog access, the race conditions that made that
 * a hard requirement are gone, so we could look at relaxing this restriction.
 * However, if we scanned the pg_class entry to find relisshared, and only
 * then locked the relation, pg_class could get updated in the meantime,
 * forcing us to scan the relation again, which would definitely be complex
 * and might have undesirable performance consequences.  Fortunately, the set
 * of shared relations is fairly static, so a hand-maintained list of their
 * OIDs isn't completely impractical.
 */
bool
IsSharedRelation(Oid relationId)
{
	/* These are the shared catalogs (look for BKI_SHARED_RELATION) */
	if (relationId == AuthIdRelationId ||
		relationId == AuthMemRelationId ||
		relationId == DatabaseRelationId ||
		relationId == PLTemplateRelationId ||
		relationId == SharedDescriptionRelationId ||
		relationId == SharedDependRelationId ||
		relationId == SharedSecLabelRelationId ||
		relationId == TableSpaceRelationId ||
		relationId == DbRoleSettingRelationId ||
		relationId == ReplicationOriginRelationId ||
		relationId == SubscriptionRelationId)
		return true;

	/* These are their indexes (see indexing.h) */
	if (relationId == AuthIdRolnameIndexId ||
		relationId == AuthIdOidIndexId ||
		relationId == AuthMemRoleMemIndexId ||
		relationId == AuthMemMemRoleIndexId ||
		relationId == DatabaseNameIndexId ||
		relationId == DatabaseOidIndexId ||
		relationId == PLTemplateNameIndexId ||
		relationId == SharedDescriptionObjIndexId ||
		relationId == SharedDependDependerIndexId ||
		relationId == SharedDependReferenceIndexId ||
		relationId == SharedSecLabelObjectIndexId ||
		relationId == TablespaceOidIndexId ||
		relationId == TablespaceNameIndexId ||
		relationId == DbRoleSettingDatidRolidIndexId ||
		relationId == ReplicationOriginIdentIndex ||
		relationId == ReplicationOriginNameIndex ||
		relationId == SubscriptionObjectIndexId ||
		relationId == SubscriptionNameIndexId)
		return true;

	/* These are their toast tables and toast indexes (see toasting.h) */
	if (relationId == PgShdescriptionToastTable ||
		relationId == PgShdescriptionToastIndex ||
		relationId == PgDbRoleSettingToastTable ||
		relationId == PgDbRoleSettingToastIndex ||
		relationId == PgShseclabelToastTable ||
		relationId == PgShseclabelToastIndex)
		return true;

	return false;
}

/*
|
2005-08-12 03:36:05 +02:00
|
|
|
* GetNewOid
|
|
|
|
* Generate a new OID that is unique within the given relation.
|
|
|
|
*
|
|
|
|
* Caller must have a suitable lock on the relation.
|
|
|
|
*
|
|
|
|
* Uniqueness is promised only if the relation has a unique index on OID.
|
|
|
|
* This is true for all system catalogs that have OIDs, but might not be
|
|
|
|
* true for user tables. Note that we are effectively assuming that the
|
|
|
|
* table has a relatively small number of entries (much less than 2^32)
|
|
|
|
* and there aren't very long runs of consecutive existing OIDs. Again,
|
|
|
|
* this is reasonable for system catalogs but less so for user tables.
|
|
|
|
*
|
|
|
|
* Since the OID is not immediately inserted into the table, there is a
|
|
|
|
* race condition here; but a problem could occur only if someone else
|
|
|
|
* managed to cycle through 2^32 OIDs and generate the same OID before we
|
2014-05-06 18:12:18 +02:00
|
|
|
* finish inserting our row. This seems unlikely to be a problem. Note
|
2005-08-12 03:36:05 +02:00
|
|
|
* that if we had to *commit* the row to end the race condition, the risk
|
Do not select new object OIDs that match recently-dead entries.
When selecting a new OID, we take care to avoid picking one that's already
in use in the target table, so as not to create duplicates after the OID
counter has wrapped around. However, up to now we used SnapshotDirty when
scanning for pre-existing entries. That ignores committed-dead rows, so
that we could select an OID matching a deleted-but-not-yet-vacuumed row.
While that mostly worked, it has two problems:
* If recently deleted, the dead row might still be visible to MVCC
snapshots, creating a risk for duplicate OIDs when examining the catalogs
within our own transaction. Such duplication couldn't be visible outside
the object-creating transaction, though, and we've heard few if any field
reports corresponding to such a symptom.
* When selecting a TOAST OID, deleted toast rows definitely *are* visible
to SnapshotToast, and will remain so until vacuumed away. This leads to
a conflict that will manifest in errors like "unexpected chunk number 0
(expected 1) for toast value nnnnn". We've been seeing reports of such
errors from the field for years, but the cause was unclear before.
The fix is simple: just use SnapshotAny to search for conflicting rows.
This results in a slightly longer window before object OIDs can be
recycled, but that seems unlikely to create any large problems.
Pavan Deolasee
Discussion: https://postgr.es/m/CABOikdOgWT2hHkYG3Wwo2cyZJq2zfs1FH0FgX-=h4OLosXHf9w@mail.gmail.com
2018-04-11 23:41:09 +02:00
|
|
|
* would be rather higher; therefore we use SnapshotAny in the test, so that
|
|
|
|
* we will see uncommitted rows. (We used to use SnapshotDirty, but that has
|
|
|
|
* the disadvantage that it ignores recently-deleted rows, creating a risk
|
|
|
|
* of transient conflicts for as long as our own MVCC snapshots think a
|
|
|
|
* recently-deleted row is live. The risk is far higher when selecting TOAST
|
|
|
|
* OIDs, because SnapshotToast considers dead rows as active indefinitely.)
|
2005-08-12 03:36:05 +02:00
|
|
|
*/
|
|
|
|
Oid
|
|
|
|
GetNewOid(Relation relation)
|
|
|
|
{
|
|
|
|
Oid oidIndex;
|
|
|
|
|
|
|
|
/* If relation doesn't have OIDs at all, caller is confused */
|
|
|
|
Assert(relation->rd_rel->relhasoids);
|
|
|
|
|
|
|
|
/* In bootstrap mode, we don't have any indexes to use */
|
|
|
|
if (IsBootstrapProcessingMode())
|
|
|
|
return GetNewObjectId();
|
|
|
|
|
|
|
|
/* The relcache will cache the identity of the OID index for us */
|
|
|
|
oidIndex = RelationGetOidIndex(relation);
|
|
|
|
|
|
|
|
/* If no OID index, just hand back the next OID counter value */
|
|
|
|
if (!OidIsValid(oidIndex))
|
|
|
|
{
|
|
|
|
/*
|
2005-10-15 04:49:52 +02:00
|
|
|
* System catalogs that have OIDs should *always* have a unique OID
|
|
|
|
* index; we should only take this path for user tables. Give a
|
|
|
|
* warning if it looks like somebody forgot an index.
|
2005-08-12 03:36:05 +02:00
|
|
|
*/
|
|
|
|
if (IsSystemRelation(relation))
|
|
|
|
elog(WARNING, "generating possibly-non-unique OID for \"%s\"",
|
|
|
|
RelationGetRelationName(relation));
|
|
|
|
|
|
|
|
return GetNewObjectId();
|
|
|
|
}
|
|
|
|
|
|
|
|
/* Otherwise, use the index to find a nonconflicting OID */
|
2008-04-13 01:14:21 +02:00
|
|
|
return GetNewOidWithIndex(relation, oidIndex, ObjectIdAttributeNumber);
|
2005-08-12 03:36:05 +02:00
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
|
|
|
* GetNewOidWithIndex
|
|
|
|
* Guts of GetNewOid: use the supplied index
|
|
|
|
*
|
|
|
|
* This is exported separately because there are cases where we want to use
|
|
|
|
* an index that will not be recognized by RelationGetOidIndex: TOAST tables
|
2012-10-10 23:04:37 +02:00
|
|
|
* have indexes that are usable, but have multiple columns and are on
|
2014-05-06 18:12:18 +02:00
|
|
|
* ordinary columns rather than a true OID column. This code will work
|
2012-10-10 23:04:37 +02:00
|
|
|
* anyway, so long as the OID is the index's first column. The caller must
|
|
|
|
* pass in the actual heap attnum of the OID column, however.
|
2005-08-12 03:36:05 +02:00
|
|
|
*
|
|
|
|
* Caller must have a suitable lock on the relation.
|
|
|
|
*/
|
|
|
|
Oid
|
2008-04-13 01:14:21 +02:00
|
|
|
GetNewOidWithIndex(Relation relation, Oid indexId, AttrNumber oidcolumn)
|
2005-08-12 03:36:05 +02:00
|
|
|
{
|
|
|
|
Oid newOid;
|
2009-06-11 16:49:15 +02:00
|
|
|
SysScanDesc scan;
|
2005-08-12 03:36:05 +02:00
|
|
|
ScanKeyData key;
|
|
|
|
bool collides;
|
|
|
|
|
Assert that we don't invent relfilenodes or type OIDs in binary upgrade.
During pg_upgrade's restore run, all relfilenode choices should be
overridden by commands in the dump script. If we ever find ourselves
choosing a relfilenode in the ordinary way, someone blew it. Likewise for
pg_type OIDs. Since pg_upgrade might well succeed anyway, if there happens
not to be a conflict during the regression test run, we need assertions
here to keep us on the straight and narrow.
We might someday be able to remove the assertion in GetNewRelFileNode,
if pg_upgrade is rewritten to remove its assumption that old and new
relfilenodes always match. But it's hard to see how to get rid of the
pg_type OID constraint, since those OIDs are embedded in user tables
in some cases.
Back-patch as far as 9.5, because of the risk of back-patches breaking
something here even if it works in HEAD. I'd prefer to go back further,
but 9.4 fails both assertions due to get_rel_infos()'s use of a temporary
table. We can't use the later-branch solution of a CTE for compatibility
reasons (cf commit 5d16332e9), and it doesn't seem worth inventing some
other way to do the query. (I did check, by dint of changing the Asserts
to elog(WARNING), that there are no other cases of unwanted OID assignments
during 9.4's regression test run.)
Discussion: https://postgr.es/m/19785.1497215827@sss.pgh.pa.us
2017-06-13 02:04:32 +02:00
|
|
|
/*
|
|
|
|
* We should never be asked to generate a new pg_type OID during
|
|
|
|
* pg_upgrade; doing so would risk collisions with the OIDs it wants to
|
|
|
|
* assign. Hitting this assert means there's some path where we failed to
|
|
|
|
* ensure that a type OID is determined by commands in the dump script.
|
|
|
|
*/
|
|
|
|
Assert(!IsBinaryUpgrade || RelationGetRelid(relation) != TypeRelationId);
|
|
|
|
|
2005-08-12 03:36:05 +02:00
|
|
|
/* Generate new OIDs until we find one not in the table */
|
|
|
|
do
|
|
|
|
{
|
2008-02-20 18:44:09 +01:00
|
|
|
CHECK_FOR_INTERRUPTS();
|
|
|
|
|
2005-08-12 03:36:05 +02:00
|
|
|
newOid = GetNewObjectId();
|
|
|
|
|
|
|
|
ScanKeyInit(&key,
|
2008-04-13 01:14:21 +02:00
|
|
|
oidcolumn,
|
2005-08-12 03:36:05 +02:00
|
|
|
BTEqualStrategyNumber, F_OIDEQ,
|
|
|
|
ObjectIdGetDatum(newOid));
|
|
|
|
|
2018-04-11 23:41:09 +02:00
|
|
|
/* see notes above about using SnapshotAny */
|
2008-04-13 01:14:21 +02:00
|
|
|
scan = systable_beginscan(relation, indexId, true,
|
2018-04-11 23:41:09 +02:00
|
|
|
SnapshotAny, 1, &key);
|
2005-08-12 03:36:05 +02:00
|
|
|
|
2008-04-13 01:14:21 +02:00
|
|
|
collides = HeapTupleIsValid(systable_getnext(scan));
|
2005-08-12 03:36:05 +02:00
|
|
|
|
2008-04-13 01:14:21 +02:00
|
|
|
systable_endscan(scan);
|
2005-08-12 03:36:05 +02:00
|
|
|
} while (collides);
|
|
|
|
|
|
|
|
return newOid;
|
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
|
|
|
* GetNewRelFileNode
|
2014-03-08 18:20:30 +01:00
|
|
|
* Generate a new relfilenode number that is unique within the
|
|
|
|
* database of the given tablespace.
|
2005-08-12 03:36:05 +02:00
|
|
|
*
|
|
|
|
* If the relfilenode will also be used as the relation's OID, pass the
|
|
|
|
* opened pg_class catalog, and this routine will guarantee that the result
|
|
|
|
* is also an unused OID within pg_class. If the result is to be used only
|
|
|
|
* as a relfilenode for an existing relation, pass NULL for pg_class.
|
1996-07-09 08:22:35 +02:00
|
|
|
*
|
2005-08-12 03:36:05 +02:00
|
|
|
* As with GetNewOid, there is some theoretical risk of a race condition,
|
|
|
|
* but it doesn't seem worth worrying about.
|
1996-07-09 08:22:35 +02:00
|
|
|
*
|
2005-08-12 03:36:05 +02:00
|
|
|
* Note: we don't support using this in bootstrap mode. All relations
|
|
|
|
* created by bootstrap have preassigned OIDs, so there's no need.
|
1996-07-09 08:22:35 +02:00
|
|
|
*/
|
1997-09-07 07:04:48 +02:00
|
|
|
Oid
|
2010-12-13 18:34:26 +01:00
|
|
|
GetNewRelFileNode(Oid reltablespace, Relation pg_class, char relpersistence)
|
1996-07-09 08:22:35 +02:00
|
|
|
{
|
2010-08-13 22:10:54 +02:00
|
|
|
RelFileNodeBackend rnode;
|
2005-08-12 03:36:05 +02:00
|
|
|
char *rpath;
|
|
|
|
int fd;
|
|
|
|
bool collides;
|
2010-12-13 18:34:26 +01:00
|
|
|
BackendId backend;
|
|
|
|
|
2017-06-13 02:04:32 +02:00
|
|
|
/*
|
|
|
|
* If we ever get here during pg_upgrade, there's something wrong; all
|
|
|
|
* relfilenode assignments during a binary-upgrade run should be
|
|
|
|
* determined by commands in the dump script.
|
|
|
|
*/
|
|
|
|
Assert(!IsBinaryUpgrade);
|
|
|
|
|
2010-12-13 18:34:26 +01:00
|
|
|
switch (relpersistence)
|
|
|
|
{
|
|
|
|
case RELPERSISTENCE_TEMP:
|
Improve the situation for parallel query versus temp relations.
Transmit the leader's temp-namespace state to workers. This is important
because without it, the workers do not really have the same search path
as the leader. For example, there is no good reason (and no extant code
either) to prevent a worker from executing a temp function that the
leader created previously; but as things stood it would fail to find the
temp function, and then either fail or execute the wrong function entirely.
We still prohibit a worker from creating a temp namespace on its own.
In effect, a worker can only see the session's temp namespace if the leader
had created it before starting the worker, which seems like the right
semantics.
Also, transmit the leader's BackendId to workers, and arrange for workers
to use that when determining the physical file path of a temp relation
belonging to their session. While the original intent was to prevent such
accesses entirely, there were a number of holes in that, notably in places
like dbsize.c which assume they can safely access temp rels of other
sessions anyway. We might as well get this right, as a small down payment
on someday allowing workers to access the leader's temp tables. (With
this change, directly using "MyBackendId" as a relation or buffer backend
ID is deprecated; you should use BackendIdForTempRelations() instead.
I left a couple of such uses alone though, as they're not going to be
reachable in parallel workers until we do something about localbuf.c.)
Move the thou-shalt-not-access-thy-leader's-temp-tables prohibition down
into localbuf.c, which is where it actually matters, instead of having it
in relation_open(). This amounts to recognizing that access to temp
tables' catalog entries is perfectly safe in a worker, it's only the data
in local buffers that is problematic.
Having done all that, we can get rid of the test in has_parallel_hazard()
that says that use of a temp table's rowtype is unsafe in parallel workers.
That test was unduly expensive, and if we really did need such a
prohibition, that was not even close to being a bulletproof guard for it.
(For example, any user-defined function executed in a parallel worker
might have attempted such access.)
2016-06-10 02:16:11 +02:00
|
|
|
backend = BackendIdForTempRelations();
|
2010-12-13 18:34:26 +01:00
|
|
|
break;
|
2010-12-29 12:48:53 +01:00
|
|
|
case RELPERSISTENCE_UNLOGGED:
|
2010-12-13 18:34:26 +01:00
|
|
|
case RELPERSISTENCE_PERMANENT:
|
|
|
|
backend = InvalidBackendId;
|
|
|
|
break;
|
|
|
|
default:
|
|
|
|
elog(ERROR, "invalid relpersistence: %c", relpersistence);
|
|
|
|
return InvalidOid; /* placate compiler */
|
|
|
|
}
|
2005-08-12 03:36:05 +02:00
|
|
|
|
2010-02-07 21:48:13 +01:00
|
|
|
/* This logic should match RelationInitPhysicalAddr */
|
2010-08-13 22:10:54 +02:00
|
|
|
rnode.node.spcNode = reltablespace ? reltablespace : MyDatabaseTableSpace;
|
|
|
|
rnode.node.dbNode = (rnode.node.spcNode == GLOBALTABLESPACE_OID) ? InvalidOid : MyDatabaseId;
|
|
|
|
|
|
|
|
/*
|
|
|
|
* The relpath will vary based on the backend ID, so we must initialize
|
|
|
|
* that properly here to make sure that any collisions based on filename
|
|
|
|
* are properly detected.
|
|
|
|
*/
|
|
|
|
rnode.backend = backend;
|
2005-08-12 03:36:05 +02:00
|
|
|
|
|
|
|
do
|
|
|
|
{
|
2008-02-20 18:44:09 +01:00
|
|
|
CHECK_FOR_INTERRUPTS();
|
|
|
|
|
2005-08-12 03:36:05 +02:00
|
|
|
/* Generate the OID */
|
|
|
|
if (pg_class)
|
2010-08-13 22:10:54 +02:00
|
|
|
rnode.node.relNode = GetNewOid(pg_class);
|
2005-08-12 03:36:05 +02:00
|
|
|
else
|
2010-08-13 22:10:54 +02:00
|
|
|
rnode.node.relNode = GetNewObjectId();
|
2005-08-12 03:36:05 +02:00
|
|
|
|
|
|
|
/* Check for existing file of same name */
|
2008-08-11 13:05:11 +02:00
|
|
|
rpath = relpath(rnode, MAIN_FORKNUM);
|
2017-09-23 15:49:22 +02:00
|
|
|
fd = BasicOpenFile(rpath, O_RDONLY | PG_BINARY);
|
2005-08-12 03:36:05 +02:00
|
|
|
|
|
|
|
if (fd >= 0)
|
|
|
|
{
|
|
|
|
/* definite collision */
|
|
|
|
close(fd);
|
|
|
|
collides = true;
|
|
|
|
}
|
|
|
|
else
|
|
|
|
{
|
|
|
|
/*
|
|
|
|
* Here we have a little bit of a dilemma: if errno is something
|
2005-10-15 04:49:52 +02:00
|
|
|
* other than ENOENT, should we declare a collision and loop? In
|
|
|
|
* particular one might think this advisable for, say, EPERM.
|
2005-08-12 03:36:05 +02:00
|
|
|
* However there really shouldn't be any unreadable files in a
|
|
|
|
* tablespace directory, and if the EPERM is actually complaining
|
|
|
|
* that we can't read the directory itself, we'd be in an infinite
|
|
|
|
* loop. In practice it seems best to go ahead regardless of the
|
2005-10-15 04:49:52 +02:00
|
|
|
* errno. If there is a colliding file we will get an smgr
|
|
|
|
* failure when we attempt to create the new relation file.
|
2005-08-12 03:36:05 +02:00
|
|
|
*/
|
|
|
|
collides = false;
|
|
|
|
}
|
|
|
|
|
|
|
|
pfree(rpath);
|
|
|
|
} while (collides);
|
|
|
|
|
2010-08-13 22:10:54 +02:00
|
|
|
return rnode.node.relNode;
|
1996-07-09 08:22:35 +02:00
|
|
|
}
|