2004-06-18 08:14:31 +02:00
|
|
|
/*-------------------------------------------------------------------------
|
|
|
|
*
|
|
|
|
* tablespace.c
|
|
|
|
* Commands to manipulate table spaces
|
|
|
|
*
|
|
|
|
* Tablespaces in PostgreSQL are designed to allow users to determine
|
|
|
|
* where the data file(s) for a given database object reside on the file
|
|
|
|
* system.
|
|
|
|
*
|
|
|
|
* A tablespace represents a directory on the file system. At tablespace
|
|
|
|
* creation time, the directory must be empty. To simplify things and
|
|
|
|
* remove the possibility of having file name conflicts, we isolate
|
|
|
|
* files within a tablespace into database-specific subdirectories.
|
|
|
|
*
|
|
|
|
* To support file access via the information given in RelFileNode, we
|
2004-06-21 03:04:45 +02:00
|
|
|
* maintain a symbolic-link map in $PGDATA/pg_tblspc. The symlinks are
|
2004-06-18 08:14:31 +02:00
|
|
|
* named by tablespace OIDs and point to the actual tablespace directories.
|
2010-01-12 03:42:52 +01:00
|
|
|
* There is also a per-cluster version directory in each tablespace.
|
2004-06-18 08:14:31 +02:00
|
|
|
* Thus the full path to an arbitrary file is
|
2010-01-12 03:42:52 +01:00
|
|
|
* $PGDATA/pg_tblspc/spcoid/PG_MAJORVER_CATVER/dboid/relfilenode
|
|
|
|
* e.g.
|
2010-02-17 05:19:41 +01:00
|
|
|
* $PGDATA/pg_tblspc/20981/PG_9.0_201002161/719849/83292814
|
2004-06-18 08:14:31 +02:00
|
|
|
*
|
2004-06-21 06:06:07 +02:00
|
|
|
* There are two tablespaces created at initdb time: pg_global (for shared
|
|
|
|
* tables) and pg_default (for everything else). For backwards compatibility
|
2004-06-18 08:14:31 +02:00
|
|
|
* and to remain functional on platforms without symlinks, these tablespaces
|
|
|
|
* are accessed specially: they are respectively
|
|
|
|
* $PGDATA/global/relfilenode
|
|
|
|
* $PGDATA/base/dboid/relfilenode
|
|
|
|
*
|
|
|
|
* To allow CREATE DATABASE to give a new database a default tablespace
|
|
|
|
* that's different from the template database's default, we make the
|
|
|
|
* provision that a zero in pg_class.reltablespace means the database's
|
2004-08-29 07:07:03 +02:00
|
|
|
* default tablespace. Without this, CREATE DATABASE would have to go in
|
2004-11-05 20:17:13 +01:00
|
|
|
* and munge the system catalogs of the new database.
|
2004-06-18 08:14:31 +02:00
|
|
|
*
|
|
|
|
*
|
2013-01-01 23:15:01 +01:00
|
|
|
* Portions Copyright (c) 1996-2013, PostgreSQL Global Development Group
|
2004-06-18 08:14:31 +02:00
|
|
|
* Portions Copyright (c) 1994, Regents of the University of California
|
|
|
|
*
|
|
|
|
*
|
|
|
|
* IDENTIFICATION
|
2010-09-20 22:08:53 +02:00
|
|
|
* src/backend/commands/tablespace.c
|
2004-06-18 08:14:31 +02:00
|
|
|
*
|
|
|
|
*-------------------------------------------------------------------------
|
|
|
|
*/
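/*
 * Illustrative sketch only, not part of tablespace.c: one way to compose the
 * paths described in the header comment above. The function name, the
 * SKETCH_* macros, and the choice to pass the version directory as a string
 * are assumptions made for this example; the real backend has its own path
 * helpers. The two OID constants assume the usual values for pg_default
 * (1663) and pg_global (1664).
 */

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Assumed OIDs of the two tablespaces created at initdb time. */
#define SKETCH_DEFAULTTABLESPACE_OID 1663u
#define SKETCH_GLOBALTABLESPACE_OID  1664u

/*
 * Write the path of a relation file, relative to $PGDATA, into buf.
 * version_dir stands in for the PG_MAJORVER_CATVER directory name.
 * Returns what snprintf returns (a value >= buflen means truncation).
 */
int
sketch_relation_path(char *buf, size_t buflen,
                     unsigned spcoid, unsigned dboid, unsigned relfilenode,
                     const char *version_dir)
{
    assert(buf != NULL && version_dir != NULL);

    /* pg_global: shared tables, no per-database subdirectory */
    if (spcoid == SKETCH_GLOBALTABLESPACE_OID)
        return snprintf(buf, buflen, "global/%u", relfilenode);

    /* pg_default: accessed directly under base/, no symlink involved */
    if (spcoid == SKETCH_DEFAULTTABLESPACE_OID)
        return snprintf(buf, buflen, "base/%u/%u", dboid, relfilenode);

    /* any other tablespace: go through the symlink map in pg_tblspc/ */
    return snprintf(buf, buflen, "pg_tblspc/%u/%s/%u/%u",
                    spcoid, version_dir, dboid, relfilenode);
}
```

/*
 * With the example values from the header comment (tablespace 20981, database
 * 719849, relfilenode 83292814, version directory PG_9.0_201002161), the
 * sketch yields pg_tblspc/20981/PG_9.0_201002161/719849/83292814.
 */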
|
|
|
|
#include "postgres.h"
|
|
|
|
|
|
|
|
#include <unistd.h>
|
|
|
|
#include <dirent.h>
|
|
|
|
#include <sys/types.h>
|
|
|
|
#include <sys/stat.h>
|
|
|
|
|
|
|
|
#include "access/heapam.h"
|
2010-01-05 22:54:00 +01:00
|
|
|
#include "access/reloptions.h"
|
2012-08-30 22:15:44 +02:00
|
|
|
#include "access/htup_details.h"
|
2008-05-12 02:00:54 +02:00
|
|
|
#include "access/sysattr.h"
|
2006-07-13 18:49:20 +02:00
|
|
|
#include "access/xact.h"
|
2004-06-18 08:14:31 +02:00
|
|
|
#include "catalog/catalog.h"
|
2005-07-07 22:40:02 +02:00
|
|
|
#include "catalog/dependency.h"
|
2004-06-18 08:14:31 +02:00
|
|
|
#include "catalog/indexing.h"
|
2010-11-25 17:48:49 +01:00
|
|
|
#include "catalog/objectaccess.h"
|
2004-06-18 08:14:31 +02:00
|
|
|
#include "catalog/pg_tablespace.h"
|
2006-02-12 04:22:21 +01:00
|
|
|
#include "commands/comment.h"
|
2011-07-20 19:18:24 +02:00
|
|
|
#include "commands/seclabel.h"
|
2004-06-18 08:14:31 +02:00
|
|
|
#include "commands/tablespace.h"
|
2013-02-22 02:46:17 +01:00
|
|
|
#include "common/relpath.h"
|
2004-06-18 08:14:31 +02:00
|
|
|
#include "miscadmin.h"
|
2007-11-15 21:36:40 +01:00
|
|
|
#include "postmaster/bgwriter.h"
|
2004-06-18 08:14:31 +02:00
|
|
|
#include "storage/fd.h"
|
2009-12-19 02:32:45 +01:00
|
|
|
#include "storage/standby.h"
|
2004-06-18 08:14:31 +02:00
|
|
|
#include "utils/acl.h"
|
|
|
|
#include "utils/builtins.h"
|
|
|
|
#include "utils/fmgroids.h"
|
2011-09-04 07:13:16 +02:00
|
|
|
#include "utils/guc.h"
|
2007-06-07 21:19:57 +02:00
|
|
|
#include "utils/memutils.h"
|
2008-06-19 02:46:06 +02:00
|
|
|
#include "utils/rel.h"
|
2008-03-26 22:10:39 +01:00
|
|
|
#include "utils/tqual.h"
|
2004-06-18 08:14:31 +02:00
|
|
|
|
|
|
|
|
2007-06-03 19:08:34 +02:00
|
|
|
/* GUC variables */
|
2005-10-15 04:49:52 +02:00
|
|
|
char *default_tablespace = NULL;
|
2007-06-03 19:08:34 +02:00
|
|
|
char *temp_tablespaces = NULL;
|
2004-11-05 20:17:13 +01:00
|
|
|
|
|
|
|
|
2010-01-12 03:42:52 +01:00
|
|
|
static void create_tablespace_directories(const char *location,
|
2010-02-26 03:01:40 +01:00
|
|
|
const Oid tablespaceoid);
|
2010-01-12 03:42:52 +01:00
|
|
|
static bool destroy_tablespace_directories(Oid tablespaceoid, bool redo);
|
2004-06-18 08:14:31 +02:00
|
|
|
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Each database using a table space is isolated into its own name space
|
|
|
|
* by a subdirectory named for the database OID. On first creation of an
|
|
|
|
* object in the tablespace, create the subdirectory. If the subdirectory
|
2010-01-07 05:05:39 +01:00
|
|
|
* already exists, fall through quietly.
|
2004-06-18 08:14:31 +02:00
|
|
|
*
|
2006-01-19 05:45:38 +01:00
|
|
|
* isRedo indicates that we are creating an object during WAL replay.
|
|
|
|
* In this case we will cope with the possibility of the tablespace
|
|
|
|
* directory not being there either --- this could happen if we are
|
2004-08-29 23:08:48 +02:00
|
|
|
* replaying an operation on a table in a subsequently-dropped tablespace.
|
|
|
|
* We handle this by making a directory in the place where the tablespace
|
|
|
|
* symlink would normally be. This isn't an exact replay of course, but
|
|
|
|
* it's the best we can do given the available information.
|
2006-03-29 17:15:43 +02:00
|
|
|
*
|
2010-01-07 05:10:39 +01:00
|
|
|
* If tablespaces are not supported, we still need this function in case we have to
|
|
|
|
* re-create a database subdirectory (of $PGDATA/base) during WAL replay.
|
2004-06-18 08:14:31 +02:00
|
|
|
*/
|
|
|
|
void
|
2004-07-11 21:52:52 +02:00
|
|
|
TablespaceCreateDbspace(Oid spcNode, Oid dbNode, bool isRedo)
|
2004-06-18 08:14:31 +02:00
|
|
|
{
|
|
|
|
struct stat st;
|
2004-08-29 07:07:03 +02:00
|
|
|
char *dir;
|
2004-06-18 08:14:31 +02:00
|
|
|
|
|
|
|
/*
|
2004-08-29 07:07:03 +02:00
|
|
|
* The global tablespace doesn't have per-database subdirectories, so
|
|
|
|
* nothing to do for it.
|
2004-06-18 08:14:31 +02:00
|
|
|
*/
|
|
|
|
if (spcNode == GLOBALTABLESPACE_OID)
|
|
|
|
return;
|
|
|
|
|
|
|
|
Assert(OidIsValid(spcNode));
|
|
|
|
Assert(OidIsValid(dbNode));
|
|
|
|
|
|
|
|
dir = GetDatabasePath(dbNode, spcNode);
|
|
|
|
|
|
|
|
if (stat(dir, &st) < 0)
|
|
|
|
{
|
2010-01-07 05:10:39 +01:00
|
|
|
/* Directory does not exist? */
|
2004-06-18 08:14:31 +02:00
|
|
|
if (errno == ENOENT)
|
|
|
|
{
|
|
|
|
/*
|
2006-01-19 05:45:38 +01:00
|
|
|
* Acquire TablespaceCreateLock to ensure that no DROP TABLESPACE
|
|
|
|
* or TablespaceCreateDbspace is running concurrently.
|
2004-06-18 08:14:31 +02:00
|
|
|
*/
|
2006-01-19 05:45:38 +01:00
|
|
|
LWLockAcquire(TablespaceCreateLock, LW_EXCLUSIVE);
|
2004-06-18 08:14:31 +02:00
|
|
|
|
|
|
|
/*
|
2005-10-15 04:49:52 +02:00
|
|
|
* Recheck to see if someone created the directory while we were
|
|
|
|
* waiting for lock.
|
2004-06-18 08:14:31 +02:00
|
|
|
*/
|
|
|
|
if (stat(dir, &st) == 0 && S_ISDIR(st.st_mode))
|
|
|
|
{
|
2010-01-07 05:10:39 +01:00
|
|
|
/* Directory was created */
|
2004-06-18 08:14:31 +02:00
|
|
|
}
|
|
|
|
else
|
|
|
|
{
|
2010-01-07 05:05:39 +01:00
|
|
|
/* Directory creation failed? */
|
2004-06-18 08:14:31 +02:00
|
|
|
if (mkdir(dir, S_IRWXU) < 0)
|
2004-08-29 23:08:48 +02:00
|
|
|
{
|
2004-08-30 04:54:42 +02:00
|
|
|
char *parentdir;
|
2004-08-29 23:08:48 +02:00
|
|
|
|
2010-01-12 03:42:52 +01:00
|
|
|
/* Failure other than ENOENT, or not in WAL replay? */
|
2004-08-29 23:08:48 +02:00
|
|
|
if (errno != ENOENT || !isRedo)
|
|
|
|
ereport(ERROR,
|
|
|
|
(errcode_for_file_access(),
|
2005-10-15 04:49:52 +02:00
|
|
|
errmsg("could not create directory \"%s\": %m",
|
|
|
|
dir)));
|
2010-01-07 05:10:39 +01:00
|
|
|
|
2010-01-12 03:42:52 +01:00
|
|
|
/*
|
|
|
|
* Parent directories are missing during WAL replay, so
|
2010-02-26 03:01:40 +01:00
|
|
|
* continue by creating simple parent directories rather
|
|
|
|
* than a symlink.
|
2010-01-12 03:42:52 +01:00
|
|
|
*/
|
|
|
|
|
|
|
|
/* Create the directory two levels up, if it does not exist */
|
2004-08-29 23:08:48 +02:00
|
|
|
parentdir = pstrdup(dir);
|
|
|
|
get_parent_directory(parentdir);
|
2010-01-12 03:42:52 +01:00
|
|
|
get_parent_directory(parentdir);
|
|
|
|
/* Can't create parent and it doesn't already exist? */
|
|
|
|
if (mkdir(parentdir, S_IRWXU) < 0 && errno != EEXIST)
|
|
|
|
ereport(ERROR,
|
|
|
|
(errcode_for_file_access(),
|
|
|
|
errmsg("could not create directory \"%s\": %m",
|
|
|
|
parentdir)));
|
|
|
|
pfree(parentdir);
|
|
|
|
|
|
|
|
/* Create the immediate parent directory, if it does not exist */
|
|
|
|
parentdir = pstrdup(dir);
|
|
|
|
get_parent_directory(parentdir);
|
|
|
|
/* Can't create parent and it doesn't already exist? */
|
|
|
|
if (mkdir(parentdir, S_IRWXU) < 0 && errno != EEXIST)
|
2004-08-29 23:08:48 +02:00
|
|
|
ereport(ERROR,
|
|
|
|
(errcode_for_file_access(),
|
2005-10-15 04:49:52 +02:00
|
|
|
errmsg("could not create directory \"%s\": %m",
|
|
|
|
parentdir)));
|
2004-08-29 23:08:48 +02:00
|
|
|
pfree(parentdir);
|
2010-01-07 05:10:39 +01:00
|
|
|
|
2010-01-07 05:05:39 +01:00
|
|
|
/* Create database directory */
|
2004-08-29 23:08:48 +02:00
|
|
|
if (mkdir(dir, S_IRWXU) < 0)
|
|
|
|
ereport(ERROR,
|
|
|
|
(errcode_for_file_access(),
|
2005-10-15 04:49:52 +02:00
|
|
|
errmsg("could not create directory \"%s\": %m",
|
|
|
|
dir)));
|
2004-08-29 23:08:48 +02:00
|
|
|
}
|
2004-06-18 08:14:31 +02:00
|
|
|
}
|
|
|
|
|
2006-01-19 05:45:38 +01:00
|
|
|
LWLockRelease(TablespaceCreateLock);
|
2004-06-18 08:14:31 +02:00
|
|
|
}
|
|
|
|
else
|
|
|
|
{
|
|
|
|
ereport(ERROR,
|
|
|
|
(errcode_for_file_access(),
|
|
|
|
errmsg("could not stat directory \"%s\": %m", dir)));
|
|
|
|
}
|
|
|
|
}
|
|
|
|
else
|
|
|
|
{
|
2010-01-07 05:05:39 +01:00
|
|
|
/* Is it not a directory? */
|
2004-06-18 08:14:31 +02:00
|
|
|
if (!S_ISDIR(st.st_mode))
|
|
|
|
ereport(ERROR,
|
|
|
|
(errcode(ERRCODE_WRONG_OBJECT_TYPE),
|
|
|
|
errmsg("\"%s\" exists but is not a directory",
|
|
|
|
dir)));
|
|
|
|
}
|
|
|
|
|
|
|
|
pfree(dir);
|
|
|
|
}
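/*
 * Illustrative sketch only, not part of tablespace.c: the
 * stat / lock / recheck / mkdir pattern used by TablespaceCreateDbspace,
 * reduced to plain POSIX. A pthread mutex stands in for TablespaceCreateLock,
 * and the function name and return convention are assumptions made for this
 * example. The WAL-replay fallback that creates missing parent directories is
 * deliberately left out.
 */

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Stand-in for TablespaceCreateLock (hypothetical name). */
static pthread_mutex_t sketch_create_lock = PTHREAD_MUTEX_INITIALIZER;

/*
 * Make sure dir exists as a directory.
 * Returns 0 on success, -1 on failure with errno set
 * (ENOTDIR if the path exists but is not a directory).
 */
int
sketch_ensure_directory(const char *dir)
{
    struct stat st;
    int         result = 0;
    int         saved_errno;

    assert(dir != NULL);

    /* Fast path: no lock needed if the path is already there. */
    if (stat(dir, &st) == 0)
    {
        if (S_ISDIR(st.st_mode))
            return 0;
        errno = ENOTDIR;
        return -1;
    }
    if (errno != ENOENT)
        return -1;              /* stat failed for some other reason */

    pthread_mutex_lock(&sketch_create_lock);

    /* Recheck: someone may have created it while we waited for the lock. */
    if (!(stat(dir, &st) == 0 && S_ISDIR(st.st_mode)))
    {
        if (mkdir(dir, S_IRWXU) < 0)
            result = -1;
    }

    /* Preserve errno from mkdir across the unlock call. */
    saved_errno = errno;
    pthread_mutex_unlock(&sketch_create_lock);
    errno = saved_errno;

    return result;
}
```

/*
 * The first stat() is taken without the lock so that the common case, where
 * the subdirectory already exists, stays cheap. The second stat() under the
 * lock closes the race between two creators.
 */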
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Create a table space
|
|
|
|
*
|
|
|
|
* Only superusers can create a tablespace. This seems a reasonable restriction
|
|
|
|
* since we're determining the system layout and, anyway, we probably have
|
|
|
|
* root if we're doing this kind of activity.
|
|
|
|
*/
|
2012-12-29 13:55:37 +01:00
|
|
|
Oid
|
2004-06-18 08:14:31 +02:00
|
|
|
CreateTableSpace(CreateTableSpaceStmt *stmt)
|
|
|
|
{
|
|
|
|
#ifdef HAVE_SYMLINK
|
2004-08-29 07:07:03 +02:00
|
|
|
Relation rel;
|
|
|
|
Datum values[Natts_pg_tablespace];
|
2008-11-02 02:45:28 +01:00
|
|
|
bool nulls[Natts_pg_tablespace];
|
2004-06-18 08:14:31 +02:00
|
|
|
HeapTuple tuple;
|
|
|
|
Oid tablespaceoid;
|
2004-08-29 07:07:03 +02:00
|
|
|
char *location;
|
2005-10-15 04:49:52 +02:00
|
|
|
Oid ownerId;
|
2004-06-18 08:14:31 +02:00
|
|
|
|
|
|
|
/* Must be superuser */
|
|
|
|
if (!superuser())
|
|
|
|
ereport(ERROR,
|
2004-08-29 07:07:03 +02:00
|
|
|
(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
|
|
|
|
errmsg("permission denied to create tablespace \"%s\"",
|
|
|
|
stmt->tablespacename),
|
|
|
|
errhint("Must be superuser to create a tablespace.")));
|
2004-06-18 08:14:31 +02:00
|
|
|
|
|
|
|
/* However, the eventual owner of the tablespace need not be */
|
|
|
|
if (stmt->owner)
|
2010-08-05 16:45:09 +02:00
|
|
|
ownerId = get_role_oid(stmt->owner, false);
|
2004-06-18 08:14:31 +02:00
|
|
|
else
|
2005-06-28 07:09:14 +02:00
|
|
|
ownerId = GetUserId();
|
2004-06-18 08:14:31 +02:00
|
|
|
|
|
|
|
/* Unix-ify the offered path, and strip any trailing slashes */
|
|
|
|
location = pstrdup(stmt->location);
|
|
|
|
canonicalize_path(location);
|
|
|
|
|
|
|
|
/* disallow quotes, else CREATE DATABASE would be at risk */
|
|
|
|
if (strchr(location, '\''))
|
|
|
|
ereport(ERROR,
|
|
|
|
(errcode(ERRCODE_INVALID_NAME),
|
2007-11-15 22:14:46 +01:00
|
|
|
errmsg("tablespace location cannot contain single quotes")));
|
2004-06-18 08:14:31 +02:00
|
|
|
|
|
|
|
/*
|
|
|
|
* Allowing relative paths seems risky
|
|
|
|
*
|
|
|
|
* this also helps us ensure that location is not empty or whitespace
|
|
|
|
*/
|
|
|
|
if (!is_absolute_path(location))
|
|
|
|
ereport(ERROR,
|
|
|
|
(errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
|
|
|
|
errmsg("tablespace location must be an absolute path")));
|
|
|
|
|
|
|
|
/*
|
2005-10-15 04:49:52 +02:00
|
|
|
* Check that location isn't too long. Remember that we're going to append
|
2010-02-26 03:01:40 +01:00
|
|
|
* 'PG_XXX/<dboid>/<relid>.<nnn>'. FYI, we never actually reference the
|
2010-01-12 03:42:52 +01:00
|
|
|
* whole path, but mkdir() uses the first two parts.
|
2004-06-18 08:14:31 +02:00
|
|
|
*/
|
2010-01-12 03:42:52 +01:00
|
|
|
if (strlen(location) + 1 + strlen(TABLESPACE_VERSION_DIRECTORY) + 1 +
|
|
|
|
OIDCHARS + 1 + OIDCHARS + 1 + OIDCHARS > MAXPGPATH)
|
2004-06-18 08:14:31 +02:00
|
|
|
ereport(ERROR,
|
|
|
|
(errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
|
|
|
|
errmsg("tablespace location \"%s\" is too long",
|
|
|
|
location)));
|
|
|
|
|
2004-06-21 06:06:07 +02:00
|
|
|
/*
|
|
|
|
* Disallow creation of tablespaces named "pg_xxx"; we reserve this
|
|
|
|
* namespace for system purposes.
|
|
|
|
*/
|
|
|
|
if (!allowSystemTableMods && IsReservedName(stmt->tablespacename))
|
|
|
|
ereport(ERROR,
|
|
|
|
(errcode(ERRCODE_RESERVED_NAME),
|
|
|
|
errmsg("unacceptable tablespace name \"%s\"",
|
|
|
|
stmt->tablespacename),
|
2005-10-15 04:49:52 +02:00
|
|
|
errdetail("The prefix \"pg_\" is reserved for system tablespaces.")));
|
2004-06-21 06:06:07 +02:00
|
|
|
|
2004-06-18 08:14:31 +02:00
|
|
|
/*
|
2004-08-29 07:07:03 +02:00
|
|
|
* Check that there is no other tablespace by this name. (The unique
|
|
|
|
* index would catch this anyway, but might as well give a friendlier
|
|
|
|
* message.)
|
2004-06-18 08:14:31 +02:00
|
|
|
*/
|
2010-08-05 16:45:09 +02:00
|
|
|
if (OidIsValid(get_tablespace_oid(stmt->tablespacename, true)))
|
2004-06-18 08:14:31 +02:00
|
|
|
ereport(ERROR,
|
|
|
|
(errcode(ERRCODE_DUPLICATE_OBJECT),
|
|
|
|
errmsg("tablespace \"%s\" already exists",
|
|
|
|
stmt->tablespacename)));
|
|
|
|
|
|
|
|
/*
|
2005-10-15 04:49:52 +02:00
|
|
|
* Insert tuple into pg_tablespace. The purpose of doing this first is to
|
|
|
|
* lock the proposed tablespace name against other would-be creators. The
|
|
|
|
* insertion will roll back if we find problems below.
|
2004-06-18 08:14:31 +02:00
|
|
|
*/
|
2005-04-14 22:03:27 +02:00
|
|
|
rel = heap_open(TableSpaceRelationId, RowExclusiveLock);
|
2004-06-18 08:14:31 +02:00
|
|
|
|
2008-11-02 02:45:28 +01:00
|
|
|
MemSet(nulls, false, sizeof(nulls));
|
2004-06-18 08:14:31 +02:00
|
|
|
|
|
|
|
values[Anum_pg_tablespace_spcname - 1] =
|
|
|
|
DirectFunctionCall1(namein, CStringGetDatum(stmt->tablespacename));
|
|
|
|
values[Anum_pg_tablespace_spcowner - 1] =
|
2005-06-28 07:09:14 +02:00
|
|
|
ObjectIdGetDatum(ownerId);
|
2008-11-02 02:45:28 +01:00
|
|
|
nulls[Anum_pg_tablespace_spcacl - 1] = true;
|
2010-01-05 22:54:00 +01:00
|
|
|
nulls[Anum_pg_tablespace_spcoptions - 1] = true;
|
2004-06-18 08:14:31 +02:00
|
|
|
|
2008-11-02 02:45:28 +01:00
|
|
|
tuple = heap_form_tuple(rel->rd_att, values, nulls);
|
2004-06-18 08:14:31 +02:00
|
|
|
|
2004-08-29 23:08:48 +02:00
|
|
|
tablespaceoid = simple_heap_insert(rel, tuple);
|
2004-06-18 08:14:31 +02:00
|
|
|
|
|
|
|
CatalogUpdateIndexes(rel, tuple);
|
|
|
|
|
|
|
|
heap_freetuple(tuple);
|
|
|
|
|
2005-07-07 22:40:02 +02:00
|
|
|
/* Record dependency on owner */
|
|
|
|
recordDependencyOnOwner(TableSpaceRelationId, tablespaceoid, ownerId);
|
|
|
|
|
2010-11-25 17:48:49 +01:00
|
|
|
/* Post creation hook for new tablespace */
|
2013-03-07 02:52:06 +01:00
|
|
|
InvokeObjectPostCreateHook(TableSpaceRelationId, tablespaceoid, 0);
|
2010-11-25 17:48:49 +01:00
|
|
|
|
2010-01-12 03:42:52 +01:00
|
|
|
create_tablespace_directories(location, tablespaceoid);
|
2004-06-18 08:14:31 +02:00
|
|
|
|
2004-08-29 23:08:48 +02:00
|
|
|
/* Record the filesystem change in XLOG */
|
|
|
|
{
|
|
|
|
xl_tblspc_create_rec xlrec;
|
|
|
|
XLogRecData rdata[2];
|
|
|
|
|
|
|
|
xlrec.ts_id = tablespaceoid;
|
|
|
|
rdata[0].data = (char *) &xlrec;
|
|
|
|
rdata[0].len = offsetof(xl_tblspc_create_rec, ts_path);
|
2005-06-06 22:22:58 +02:00
|
|
|
rdata[0].buffer = InvalidBuffer;
|
2004-08-29 23:08:48 +02:00
|
|
|
rdata[0].next = &(rdata[1]);
|
|
|
|
|
|
|
|
rdata[1].data = (char *) location;
|
|
|
|
rdata[1].len = strlen(location) + 1;
|
2005-06-06 22:22:58 +02:00
|
|
|
rdata[1].buffer = InvalidBuffer;
|
2004-08-29 23:08:48 +02:00
|
|
|
rdata[1].next = NULL;
|
|
|
|
|
|
|
|
(void) XLogInsert(RM_TBLSPC_ID, XLOG_TBLSPC_CREATE, rdata);
|
|
|
|
}
|
|
|
|
|
2007-08-02 00:45:09 +02:00
|
|
|
/*
|
2007-11-15 22:14:46 +01:00
|
|
|
* Force synchronous commit, to minimize the window between creating the
|
|
|
|
* symlink on-disk and marking the transaction committed. It's not great
|
|
|
|
* that there is any window at all, but definitely we don't want to make
|
|
|
|
* it larger than necessary.
|
2007-08-02 00:45:09 +02:00
|
|
|
*/
|
|
|
|
ForceSyncCommit();
|
|
|
|
|
2004-06-18 08:14:31 +02:00
|
|
|
pfree(location);
|
|
|
|
|
2004-08-29 23:08:48 +02:00
|
|
|
/* We keep the lock on pg_tablespace until commit */
|
|
|
|
heap_close(rel, NoLock);

return tablespaceoid;
#else /* !HAVE_SYMLINK */
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("tablespaces are not supported on this platform")));
return InvalidOid; /* tablespaceoid is not declared here; keep compiler quiet */
#endif /* HAVE_SYMLINK */
|
2004-06-18 08:14:31 +02:00
|
|
|
}
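/*
 * Illustrative sketch only, not part of tablespace.c: the three location
 * checks performed by CreateTableSpace, written as a standalone function.
 * The enum, the function name, and the SKETCH_* constants are assumptions for
 * this example; 1024 and 10 are assumed to mirror MAXPGPATH and OIDCHARS, and
 * "absolute" is simplified to "starts with a slash" (Unix only).
 */

```c
#include <assert.h>
#include <string.h>

#define SKETCH_MAXPGPATH 1024   /* assumed value of MAXPGPATH */
#define SKETCH_OIDCHARS  10     /* assumed max decimal digits in an OID */

typedef enum
{
    SKETCH_LOC_OK,
    SKETCH_LOC_HAS_QUOTE,
    SKETCH_LOC_NOT_ABSOLUTE,
    SKETCH_LOC_TOO_LONG
} SketchLocStatus;

/*
 * Check a candidate tablespace location in the same order as the code
 * above: single quotes, absolute path, total length once
 * "<version_dir>/<dboid>/<relid>.<nnn>" has been appended.
 */
SketchLocStatus
sketch_check_location(const char *location, const char *version_dir)
{
    assert(location != NULL && version_dir != NULL);

    if (strchr(location, '\'') != NULL)
        return SKETCH_LOC_HAS_QUOTE;

    /* Also rejects the empty string, since it has no leading slash. */
    if (location[0] != '/')
        return SKETCH_LOC_NOT_ABSOLUTE;

    if (strlen(location) + 1 + strlen(version_dir) + 1 +
        SKETCH_OIDCHARS + 1 + SKETCH_OIDCHARS + 1 + SKETCH_OIDCHARS >
        SKETCH_MAXPGPATH)
        return SKETCH_LOC_TOO_LONG;

    return SKETCH_LOC_OK;
}
```

/*
 * For example, "/mnt/ts1" passes all three checks, "relative/dir" is rejected
 * as not absolute, and "/mnt/it's" is rejected for containing a single quote.
 */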
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Drop a table space
|
|
|
|
*
|
|
|
|
* Be careful to check that the tablespace is empty.
|
|
|
|
*/
|
|
|
|
void
|
|
|
|
DropTableSpace(DropTableSpaceStmt *stmt)
|
|
|
|
{
|
|
|
|
#ifdef HAVE_SYMLINK
|
2004-08-29 07:07:03 +02:00
|
|
|
char *tablespacename = stmt->tablespacename;
|
|
|
|
HeapScanDesc scandesc;
|
|
|
|
Relation rel;
|
|
|
|
HeapTuple tuple;
|
|
|
|
ScanKeyData entry[1];
|
|
|
|
Oid tablespaceoid;
|
2004-06-18 08:14:31 +02:00
|
|
|
|
|
|
|
/*
|
|
|
|
* Find the target tuple
|
|
|
|
*/
|
2006-01-19 05:45:38 +01:00
|
|
|
rel = heap_open(TableSpaceRelationId, RowExclusiveLock);
|
|
|
|
|
2004-06-18 08:14:31 +02:00
|
|
|
ScanKeyInit(&entry[0],
|
|
|
|
Anum_pg_tablespace_spcname,
|
|
|
|
BTEqualStrategyNumber, F_NAMEEQ,
|
|
|
|
CStringGetDatum(tablespacename));
|
2013-07-02 15:47:01 +02:00
|
|
|
scandesc = heap_beginscan_catalog(rel, 1, entry);
|
2004-06-18 08:14:31 +02:00
|
|
|
tuple = heap_getnext(scandesc, ForwardScanDirection);
|
|
|
|
|
|
|
|
if (!HeapTupleIsValid(tuple))
|
2006-06-16 22:23:45 +02:00
|
|
|
{
|
2006-10-04 02:30:14 +02:00
|
|
|
if (!stmt->missing_ok)
|
2006-06-16 22:23:45 +02:00
|
|
|
{
|
|
|
|
ereport(ERROR,
|
|
|
|
(errcode(ERRCODE_UNDEFINED_OBJECT),
|
|
|
|
errmsg("tablespace \"%s\" does not exist",
|
|
|
|
tablespacename)));
|
|
|
|
}
|
|
|
|
else
|
|
|
|
{
|
|
|
|
ereport(NOTICE,
|
2006-10-03 23:21:36 +02:00
|
|
|
(errmsg("tablespace \"%s\" does not exist, skipping",
|
2006-06-16 22:23:45 +02:00
|
|
|
tablespacename)));
|
|
|
|
/* XXX I assume I need one or both of these next two calls */
|
|
|
|
heap_endscan(scandesc);
|
|
|
|
heap_close(rel, NoLock);
|
|
|
|
}
|
|
|
|
return;
|
|
|
|
}
|
2004-06-18 08:14:31 +02:00
|
|
|
|
|
|
|
tablespaceoid = HeapTupleGetOid(tuple);
|
|
|
|
|
2005-06-28 07:09:14 +02:00
|
|
|
/* Must be tablespace owner */
|
|
|
|
if (!pg_tablespace_ownercheck(tablespaceoid, GetUserId()))
|
2004-06-18 08:14:31 +02:00
|
|
|
aclcheck_error(ACLCHECK_NOT_OWNER, ACL_KIND_TABLESPACE,
|
|
|
|
tablespacename);
|
|
|
|
|
|
|
|
/* Disallow drop of the standard tablespaces, even by superuser */
|
|
|
|
if (tablespaceoid == GLOBALTABLESPACE_OID ||
|
|
|
|
tablespaceoid == DEFAULTTABLESPACE_OID)
|
|
|
|
aclcheck_error(ACLCHECK_NO_PRIV, ACL_KIND_TABLESPACE,
|
|
|
|
tablespacename);
|
|
|
|
|
2012-03-09 20:34:56 +01:00
|
|
|
/* DROP hook for the tablespace being removed */
|
2013-03-07 02:52:06 +01:00
|
|
|
InvokeObjectDropHook(TableSpaceRelationId, tablespaceoid, 0);
|
2012-03-09 20:34:56 +01:00
|
|
|
|
2004-08-29 23:08:48 +02:00
|
|
|
/*
|
2005-10-15 04:49:52 +02:00
|
|
|
* Remove the pg_tablespace tuple (this will roll back if we fail below)
|
2004-08-29 23:08:48 +02:00
|
|
|
*/
|
|
|
|
simple_heap_delete(rel, &tuple->t_self);
|
|
|
|
|
|
|
|
heap_endscan(scandesc);
|
|
|
|
|
2006-02-12 04:22:21 +01:00
|
|
|
/*
|
2011-07-20 19:18:24 +02:00
|
|
|
* Remove any comments or security labels on this tablespace.
|
2006-02-12 04:22:21 +01:00
|
|
|
*/
|
|
|
|
DeleteSharedComments(tablespaceoid, TableSpaceRelationId);
|
2011-07-20 19:18:24 +02:00
|
|
|
DeleteSharedSecurityLabel(tablespaceoid, TableSpaceRelationId);
|
2006-02-12 04:22:21 +01:00
|
|
|
|
2005-08-30 03:08:47 +02:00
|
|
|
/*
|
|
|
|
* Remove dependency on owner.
|
|
|
|
*/
|
2009-01-22 21:16:10 +01:00
|
|
|
deleteSharedDependencyRecordsFor(TableSpaceRelationId, tablespaceoid, 0);
|
2005-08-30 03:08:47 +02:00
|
|
|
|
2006-01-19 05:45:38 +01:00
|
|
|
/*
|
|
|
|
* Acquire TablespaceCreateLock to ensure that no TablespaceCreateDbspace
|
|
|
|
* is running concurrently.
|
|
|
|
*/
|
|
|
|
LWLockAcquire(TablespaceCreateLock, LW_EXCLUSIVE);
|
|
|
|
|
2004-08-29 23:08:48 +02:00
|
|
|
/*
|
2007-11-15 22:14:46 +01:00
|
|
|
* Try to remove the physical infrastructure.
|
2004-08-29 23:08:48 +02:00
|
|
|
*/
|
2010-01-12 03:42:52 +01:00
|
|
|
if (!destroy_tablespace_directories(tablespaceoid, false))
|
2007-11-15 21:36:40 +01:00
|
|
|
{
|
|
|
|
/*
|
|
|
|
* Not all files deleted? However, there can be lingering empty files
|
|
|
|
* in the directories, left behind by for example DROP TABLE, that
|
|
|
|
* have been scheduled for deletion at next checkpoint (see comments
|
2007-11-15 22:14:46 +01:00
|
|
|
* in mdunlink() for details). We could just delete them immediately,
|
2007-11-15 21:36:40 +01:00
|
|
|
* but we can't tell them apart from important data files that we
|
|
|
|
* mustn't delete. So instead, we force a checkpoint which will clean
|
|
|
|
* out any lingering files, and try again.
|
|
|
|
*/
|
|
|
|
RequestCheckpoint(CHECKPOINT_IMMEDIATE | CHECKPOINT_FORCE | CHECKPOINT_WAIT);
|
2010-01-12 03:42:52 +01:00
|
|
|
if (!destroy_tablespace_directories(tablespaceoid, false))
|
2007-11-15 21:36:40 +01:00
|
|
|
{
|
|
|
|
/* Still not empty, the files must be important then */
|
|
|
|
ereport(ERROR,
|
|
|
|
(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
|
|
|
|
errmsg("tablespace \"%s\" is not empty",
|
|
|
|
tablespacename)));
|
|
|
|
}
|
|
|
|
}
|
2004-08-29 23:08:48 +02:00
|
|
|
|
|
|
|
/* Record the filesystem change in XLOG */
|
|
|
|
{
|
|
|
|
xl_tblspc_drop_rec xlrec;
|
|
|
|
XLogRecData rdata[1];
|
|
|
|
|
|
|
|
xlrec.ts_id = tablespaceoid;
|
|
|
|
rdata[0].data = (char *) &xlrec;
|
|
|
|
rdata[0].len = sizeof(xl_tblspc_drop_rec);
|
2005-06-06 22:22:58 +02:00
|
|
|
rdata[0].buffer = InvalidBuffer;
|
2004-08-29 23:08:48 +02:00
|
|
|
rdata[0].next = NULL;
|
|
|
|
|
|
|
|
(void) XLogInsert(RM_TBLSPC_ID, XLOG_TBLSPC_DROP, rdata);
|
|
|
|
}
|
|
|
|
|
2006-01-19 05:45:38 +01:00
|
|
|
/*
|
2006-10-04 02:30:14 +02:00
|
|
|
* Note: because we checked that the tablespace was empty, there should be
|
|
|
|
* no need to worry about flushing shared buffers or free space map
|
2006-03-29 23:17:39 +02:00
|
|
|
* entries for relations in the tablespace.
|
|
|
|
*/
|
|
|
|
|
2007-08-02 00:45:09 +02:00
|
|
|
/*
|
2007-11-15 22:14:46 +01:00
|
|
|
* Force synchronous commit, to minimize the window between removing the
|
|
|
|
* files on-disk and marking the transaction committed. It's not great
|
|
|
|
* that there is any window at all, but definitely we don't want to make
|
|
|
|
* it larger than necessary.
|
2007-08-02 00:45:09 +02:00
|
|
|
*/
|
|
|
|
ForceSyncCommit();
|
|
|
|
|
2006-03-29 23:17:39 +02:00
|
|
|
/*
|
2006-01-19 05:45:38 +01:00
|
|
|
* Allow TablespaceCreateDbspace again.
|
|
|
|
*/
|
|
|
|
LWLockRelease(TablespaceCreateLock);
|
|
|
|
|
2004-08-29 23:08:48 +02:00
|
|
|
/* We keep the lock on pg_tablespace until commit */
|
|
|
|
heap_close(rel, NoLock);
|
|
|
|
#else /* !HAVE_SYMLINK */
|
|
|
|
ereport(ERROR,
|
|
|
|
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
|
|
|
|
errmsg("tablespaces are not supported on this platform")));
|
|
|
|
#endif /* HAVE_SYMLINK */
|
|
|
|
}
|
|
|
|
|
2010-01-12 03:42:52 +01:00
|
|
|
|
2004-08-29 23:08:48 +02:00
|
|
|
/*
|
2010-01-12 03:42:52 +01:00
|
|
|
* create_tablespace_directories
|
2004-08-29 23:08:48 +02:00
|
|
|
*
|
2010-01-12 03:42:52 +01:00
|
|
|
* Attempt to create filesystem infrastructure linking $PGDATA/pg_tblspc/
|
|
|
|
* to the specified directory
|
|
|
|
*/
|
|
|
|
static void
|
|
|
|
create_tablespace_directories(const char *location, const Oid tablespaceoid)
|
|
|
|
{
|
2013-10-13 06:09:18 +02:00
|
|
|
char *linkloc;
|
|
|
|
char *location_with_version_dir;
|
2010-01-12 03:42:52 +01:00
|
|
|
|
2013-10-13 06:09:18 +02:00
|
|
|
linkloc = psprintf("pg_tblspc/%u", tablespaceoid);
|
|
|
|
location_with_version_dir = psprintf("%s/%s", location,
|
2010-02-26 03:01:40 +01:00
|
|
|
TABLESPACE_VERSION_DIRECTORY);
|
2010-01-12 03:42:52 +01:00
|
|
|
|
|
|
|
/*
|
|
|
|
* Attempt to coerce target directory to safe permissions. If this fails,
|
|
|
|
* it doesn't exist or has the wrong owner.
|
|
|
|
*/
|
2010-12-10 23:35:33 +01:00
|
|
|
if (chmod(location, S_IRWXU) != 0)
|
2010-01-12 03:42:52 +01:00
|
|
|
{
|
|
|
|
if (errno == ENOENT)
|
|
|
|
ereport(ERROR,
|
|
|
|
(errcode(ERRCODE_UNDEFINED_FILE),
|
2010-07-02 04:44:32 +02:00
|
|
|
errmsg("directory \"%s\" does not exist", location),
|
2010-07-18 06:47:46 +02:00
|
|
|
InRecovery ? errhint("Create this directory for the tablespace before "
|
2011-04-10 17:42:00 +02:00
|
|
|
"restarting the server.") : 0));
|
2010-01-12 03:42:52 +01:00
|
|
|
else
|
|
|
|
ereport(ERROR,
|
2010-02-26 03:01:40 +01:00
|
|
|
(errcode_for_file_access(),
|
|
|
|
errmsg("could not set permissions on directory \"%s\": %m",
|
|
|
|
location)));
|
2010-01-12 03:42:52 +01:00
|
|
|
}
|
|
|
|
|
2010-07-20 20:14:16 +02:00
|
|
|
if (InRecovery)
|
|
|
|
{
|
|
|
|
struct stat st;
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Our theory for replaying a CREATE is to forcibly drop the target
|
2011-04-10 17:42:00 +02:00
|
|
|
* subdirectory if present, and then recreate it. This may be more
|
|
|
|
* work than needed, but it is simple to implement.
|
2010-07-20 20:14:16 +02:00
|
|
|
*/
|
|
|
|
if (stat(location_with_version_dir, &st) == 0 && S_ISDIR(st.st_mode))
|
|
|
|
{
|
|
|
|
if (!rmtree(location_with_version_dir, true))
|
|
|
|
/* If this failed, mkdir() below is going to error. */
|
|
|
|
ereport(WARNING,
|
|
|
|
(errmsg("some useless files may be left behind in old database directory \"%s\"",
|
|
|
|
location_with_version_dir)));
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2010-01-12 03:42:52 +01:00
|
|
|
/*
|
2010-02-26 03:01:40 +01:00
|
|
|
* The creation of the version directory prevents more than one tablespace
|
|
|
|
* in a single location.
|
2010-01-12 03:42:52 +01:00
|
|
|
*/
|
|
|
|
if (mkdir(location_with_version_dir, S_IRWXU) < 0)
|
|
|
|
{
|
|
|
|
if (errno == EEXIST)
|
|
|
|
ereport(ERROR,
|
|
|
|
(errcode(ERRCODE_OBJECT_IN_USE),
|
|
|
|
errmsg("directory \"%s\" already in use as a tablespace",
|
|
|
|
location_with_version_dir)));
|
|
|
|
else
|
|
|
|
ereport(ERROR,
|
|
|
|
(errcode_for_file_access(),
|
2010-02-26 03:01:40 +01:00
|
|
|
errmsg("could not create directory \"%s\": %m",
|
|
|
|
location_with_version_dir)));
|
2010-01-12 03:42:52 +01:00
|
|
|
}
|
|
|
|
|
2010-07-20 20:14:16 +02:00
|
|
|
/* Remove old symlink in recovery, in case it points to the wrong place */
|
|
|
|
if (InRecovery)
|
|
|
|
{
|
|
|
|
if (unlink(linkloc) < 0 && errno != ENOENT)
|
|
|
|
ereport(ERROR,
|
|
|
|
(errcode_for_file_access(),
|
|
|
|
errmsg("could not remove symbolic link \"%s\": %m",
|
|
|
|
linkloc)));
|
|
|
|
}
|
2010-11-23 21:27:50 +01:00
|
|
|
|
2010-01-12 03:42:52 +01:00
|
|
|
/*
|
|
|
|
* Create the symlink under PGDATA
|
|
|
|
*/
|
|
|
|
if (symlink(location, linkloc) < 0)
|
|
|
|
ereport(ERROR,
|
|
|
|
(errcode_for_file_access(),
|
|
|
|
errmsg("could not create symbolic link \"%s\": %m",
|
|
|
|
linkloc)));
|
|
|
|
|
|
|
|
pfree(linkloc);
|
|
|
|
pfree(location_with_version_dir);
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
|
|
/*
|
|
|
|
* destroy_tablespace_directories
|
|
|
|
*
|
Avoid throwing ERROR during WAL replay of DROP TABLESPACE.
Although we will not even issue an XLOG_TBLSPC_DROP WAL record unless
removal of the tablespace's directories succeeds, that does not guarantee
that the same operation will succeed during WAL replay. Foreseeable
reasons for it to fail include temp files created in the tablespace by Hot
Standby backends, wrong directory permissions on a standby server, etc etc.
The original coding threw ERROR if replay failed to remove the directories,
but that is a serious overreaction. Throwing an error aborts recovery,
and worse means that manual intervention will be needed to get the database
to start again, since otherwise the same error will recur on subsequent
attempts to replay the same WAL record. And the consequence of failing to
remove the directories is only that some probably-small amount of disk
space is wasted, so it hardly seems justified to throw an error.
Accordingly, arrange to report such failures as LOG messages and keep going
when a failure occurs during replay.
Back-patch to 9.0 where Hot Standby was introduced. In principle such
problems can occur in earlier releases, but Hot Standby increases the odds
of trouble significantly. Given the lack of field reports of such issues,
I'm satisfied with patching back as far as the patch applies easily.
2012-02-06 20:43:58 +01:00
|
|
|
* Attempt to remove filesystem infrastructure for the tablespace.
|
2010-01-12 03:42:52 +01:00
|
|
|
*
|
2012-02-06 20:43:58 +01:00
|
|
|
* 'redo' indicates we are redoing a drop from XLOG; in that case we should
|
2012-06-10 21:20:04 +02:00
|
|
|
* not throw an ERROR for problems, just LOG them. The worst consequence of
|
2012-02-06 20:43:58 +01:00
|
|
|
* not removing files here would be failure to release some disk space, which
|
|
|
|
* does not justify throwing an error that would require manual intervention
|
|
|
|
* to get the database running again.
|
2004-08-29 23:08:48 +02:00
|
|
|
*
|
2010-01-12 03:42:52 +01:00
|
|
|
* Returns TRUE if successful, FALSE if some subdirectory is not empty
|
2004-08-29 23:08:48 +02:00
|
|
|
*/
|
|
|
|
static bool
|
2010-01-12 03:42:52 +01:00
|
|
|
destroy_tablespace_directories(Oid tablespaceoid, bool redo)
|
2004-08-29 23:08:48 +02:00
|
|
|
{
|
2010-01-12 03:42:52 +01:00
|
|
|
char *linkloc;
|
|
|
|
char *linkloc_with_version_dir;
|
2004-08-29 23:08:48 +02:00
|
|
|
DIR *dirdesc;
|
|
|
|
struct dirent *de;
|
|
|
|
char *subfile;
|
|
|
|
struct stat st;
|
|
|
|
|
2013-10-13 06:09:18 +02:00
|
|
|
linkloc_with_version_dir = psprintf("pg_tblspc/%u/%s", tablespaceoid,
|
2010-02-26 03:01:40 +01:00
|
|
|
TABLESPACE_VERSION_DIRECTORY);
|
2004-06-18 08:14:31 +02:00
|
|
|
|
|
|
|
/*
|
2005-10-15 04:49:52 +02:00
|
|
|
* Check if the tablespace still contains any files. We try to rmdir each
|
|
|
|
* per-database directory we find in it. rmdir failure implies there are
|
|
|
|
* still files in that subdirectory, so give up. (We do not have to worry
|
|
|
|
* about undoing any already completed rmdirs, since the next attempt to
|
|
|
|
* use the tablespace from that database will simply recreate the
|
|
|
|
* subdirectory via TablespaceCreateDbspace.)
|
2004-06-18 08:14:31 +02:00
|
|
|
*
|
2006-01-19 05:45:38 +01:00
|
|
|
* Since we hold TablespaceCreateLock, no one else should be creating any
|
|
|
|
* fresh subdirectories in parallel. It is possible that new files are
|
|
|
|
* being created within subdirectories, though, so the rmdir call could
|
|
|
|
* fail. Worst consequence is a less friendly error message.
|
2007-03-22 20:51:44 +01:00
|
|
|
*
|
|
|
|
* If redo is true then ENOENT is a likely outcome here, and we allow it
|
|
|
|
* to pass without comment. In normal operation we still allow it, but
|
2007-11-15 22:14:46 +01:00
|
|
|
* with a warning. This is because even though ProcessUtility disallows
|
2007-03-22 20:51:44 +01:00
|
|
|
* DROP TABLESPACE in a transaction block, it's possible that a previous
|
|
|
|
* DROP failed and rolled back after removing the tablespace directories
|
2012-06-10 21:20:04 +02:00
|
|
|
* and/or symlink. We want to allow a new DROP attempt to succeed at
|
Fix DROP TABLESPACE to unlink symlink when directory is not there.
If the tablespace directory is missing entirely, we allow DROP TABLESPACE
to go through, on the grounds that it should be possible to clean up the
catalog entry in such a situation. However, we forgot that the pg_tblspc
symlink might still be there. We should try to remove the symlink too
(but not fail if it's no longer there), since not doing so can lead to
weird behavior subsequently, as per report from Michael Nolan.
There was some discussion of adding dependency links to prevent DROP
TABLESPACE when the catalogs still contain references to the tablespace.
That might be worth doing too, but it's an orthogonal question, and in
any case wouldn't be back-patchable.
Back-patch to 9.0, which is as far back as the logic looks like this.
We could possibly do something similar in 8.x, but given the lack of
reports I'm not sure it's worth the trouble, and anyway the case could
not arise in the form the logic is meant to cover (namely, a post-DROP
transaction rollback having resurrected the pg_tablespace entry after
some or all of the filesystem infrastructure is gone).
2012-05-14 00:06:52 +02:00
|
|
|
* removing the catalog entries (and symlink if still present), so we
|
|
|
|
* should not give a hard error here.
|
2004-06-18 08:14:31 +02:00
|
|
|
*/
|
2010-01-12 03:42:52 +01:00
|
|
|
dirdesc = AllocateDir(linkloc_with_version_dir);
|
2004-06-18 08:14:31 +02:00
|
|
|
if (dirdesc == NULL)
|
2004-08-29 23:08:48 +02:00
|
|
|
{
|
2007-03-22 20:51:44 +01:00
|
|
|
if (errno == ENOENT)
|
2004-08-29 23:08:48 +02:00
|
|
|
{
|
2007-03-22 20:51:44 +01:00
|
|
|
if (!redo)
|
|
|
|
ereport(WARNING,
|
|
|
|
(errcode_for_file_access(),
|
|
|
|
errmsg("could not open directory \"%s\": %m",
|
2010-01-12 03:42:52 +01:00
|
|
|
linkloc_with_version_dir)));
|
2012-05-14 00:06:52 +02:00
|
|
|
/* The symlink might still exist, so go try to remove it */
|
|
|
|
goto remove_symlink;
|
2004-08-29 23:08:48 +02:00
|
|
|
}
|
2012-02-06 20:43:58 +01:00
|
|
|
else if (redo)
|
|
|
|
{
|
|
|
|
/* in redo, just log other types of error */
|
|
|
|
ereport(LOG,
|
|
|
|
(errcode_for_file_access(),
|
|
|
|
errmsg("could not open directory \"%s\": %m",
|
|
|
|
linkloc_with_version_dir)));
|
|
|
|
pfree(linkloc_with_version_dir);
|
|
|
|
return false;
|
|
|
|
}
|
2005-06-19 23:34:03 +02:00
|
|
|
/* else let ReadDir report the error */
|
2004-08-29 23:08:48 +02:00
|
|
|
}
|
2004-06-18 08:14:31 +02:00
|
|
|
|
2010-01-12 03:42:52 +01:00
|
|
|
while ((de = ReadDir(dirdesc, linkloc_with_version_dir)) != NULL)
|
2004-06-18 08:14:31 +02:00
|
|
|
{
|
|
|
|
if (strcmp(de->d_name, ".") == 0 ||
|
2010-01-12 03:42:52 +01:00
|
|
|
strcmp(de->d_name, "..") == 0)
|
2004-06-18 08:14:31 +02:00
|
|
|
continue;
|
|
|
|
|
2013-10-13 06:09:18 +02:00
|
|
|
subfile = psprintf("%s/%s", linkloc_with_version_dir, de->d_name);
|
2004-06-18 08:14:31 +02:00
|
|
|
|
|
|
|
/* This check is just to deliver a friendlier error message */
|
2012-02-06 20:43:58 +01:00
|
|
|
if (!redo && !directory_is_empty(subfile))
|
2004-08-29 23:08:48 +02:00
|
|
|
{
|
|
|
|
FreeDir(dirdesc);
|
2010-01-12 03:42:52 +01:00
|
|
|
pfree(subfile);
|
|
|
|
pfree(linkloc_with_version_dir);
|
2004-08-29 23:08:48 +02:00
|
|
|
return false;
|
|
|
|
}
|
2004-06-18 08:14:31 +02:00
|
|
|
|
2010-01-12 03:42:52 +01:00
|
|
|
/* remove empty directory */
|
2004-06-18 08:14:31 +02:00
|
|
|
if (rmdir(subfile) < 0)
|
2012-02-06 20:43:58 +01:00
|
|
|
ereport(redo ? LOG : ERROR,
|
2004-06-18 08:14:31 +02:00
|
|
|
(errcode_for_file_access(),
|
2007-05-31 17:13:06 +02:00
|
|
|
errmsg("could not remove directory \"%s\": %m",
|
2004-06-18 08:14:31 +02:00
|
|
|
subfile)));
|
|
|
|
|
|
|
|
pfree(subfile);
|
|
|
|
}
|
2004-08-29 07:07:03 +02:00
|
|
|
|
2004-06-18 08:14:31 +02:00
|
|
|
FreeDir(dirdesc);
|
|
|
|
|
2010-01-12 03:42:52 +01:00
|
|
|
/* remove version directory */
|
|
|
|
if (rmdir(linkloc_with_version_dir) < 0)
|
2012-02-06 20:43:58 +01:00
|
|
|
{
|
|
|
|
ereport(redo ? LOG : ERROR,
|
2010-01-12 03:42:52 +01:00
|
|
|
(errcode_for_file_access(),
|
|
|
|
errmsg("could not remove directory \"%s\": %m",
|
|
|
|
linkloc_with_version_dir)));
|
2012-02-06 20:43:58 +01:00
|
|
|
pfree(linkloc_with_version_dir);
|
|
|
|
return false;
|
|
|
|
}
|
2010-02-26 03:01:40 +01:00
|
|
|
|
2004-06-18 08:14:31 +02:00
|
|
|
/*
|
2010-02-26 03:01:40 +01:00
|
|
|
* Try to remove the symlink. We must however deal with the possibility
|
|
|
|
* that it's a directory instead of a symlink --- this could happen during
|
|
|
|
* WAL replay (see TablespaceCreateDbspace), and it is also the case on
|
|
|
|
* Windows where junction points lstat() as directories.
|
2012-02-06 20:43:58 +01:00
|
|
|
*
|
|
|
|
* Note: in the redo case, we'll return true if this final step fails;
|
2012-05-14 00:06:52 +02:00
|
|
|
* there's no point in retrying it. Also, ENOENT should provoke no more
|
|
|
|
* than a warning.
|
2004-06-18 08:14:31 +02:00
|
|
|
*/
|
2012-05-14 00:06:52 +02:00
|
|
|
remove_symlink:
|
2010-01-12 03:42:52 +01:00
|
|
|
linkloc = pstrdup(linkloc_with_version_dir);
|
|
|
|
get_parent_directory(linkloc);
|
|
|
|
if (lstat(linkloc, &st) == 0 && S_ISDIR(st.st_mode))
|
2004-08-29 23:08:48 +02:00
|
|
|
{
|
2010-01-12 03:42:52 +01:00
|
|
|
if (rmdir(linkloc) < 0)
|
2012-02-06 20:43:58 +01:00
|
|
|
ereport(redo ? LOG : ERROR,
|
2004-08-29 23:08:48 +02:00
|
|
|
(errcode_for_file_access(),
|
|
|
|
errmsg("could not remove directory \"%s\": %m",
|
2010-01-12 03:42:52 +01:00
|
|
|
linkloc)));
|
2004-08-29 23:08:48 +02:00
|
|
|
}
|
|
|
|
else
|
|
|
|
{
|
2010-01-12 03:42:52 +01:00
|
|
|
if (unlink(linkloc) < 0)
|
2012-05-14 00:06:52 +02:00
|
|
|
ereport(redo ? LOG : (errno == ENOENT ? WARNING : ERROR),
|
2004-08-29 23:08:48 +02:00
|
|
|
(errcode_for_file_access(),
|
2004-11-05 18:11:34 +01:00
|
|
|
errmsg("could not remove symbolic link \"%s\": %m",
|
2010-01-12 03:42:52 +01:00
|
|
|
linkloc)));
|
2004-08-29 23:08:48 +02:00
|
|
|
}
|
2004-06-18 08:14:31 +02:00
|
|
|
|
2010-01-12 03:42:52 +01:00
|
|
|
pfree(linkloc_with_version_dir);
|
|
|
|
pfree(linkloc);
|
2004-06-18 08:14:31 +02:00
|
|
|
|
2004-08-29 23:08:48 +02:00
|
|
|
return true;
|
2004-06-18 08:14:31 +02:00
|
|
|
}
|
|
|
|
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Check if a directory is empty.
|
2004-10-17 22:47:21 +02:00
|
|
|
*
|
|
|
|
* This probably belongs somewhere else, but not sure where...
|
2004-06-18 08:14:31 +02:00
|
|
|
*/
|
2004-10-17 22:47:21 +02:00
|
|
|
bool
|
2004-06-18 08:14:31 +02:00
|
|
|
directory_is_empty(const char *path)
|
|
|
|
{
|
2004-08-29 07:07:03 +02:00
|
|
|
DIR *dirdesc;
|
2004-06-18 08:14:31 +02:00
|
|
|
struct dirent *de;
|
|
|
|
|
|
|
|
dirdesc = AllocateDir(path);
|
|
|
|
|
2005-06-19 23:34:03 +02:00
|
|
|
while ((de = ReadDir(dirdesc, path)) != NULL)
|
2004-06-18 08:14:31 +02:00
|
|
|
{
|
|
|
|
if (strcmp(de->d_name, ".") == 0 ||
|
|
|
|
strcmp(de->d_name, "..") == 0)
|
|
|
|
continue;
|
|
|
|
FreeDir(dirdesc);
|
|
|
|
return false;
|
|
|
|
}
|
2004-08-29 07:07:03 +02:00
|
|
|
|
2004-06-18 08:14:31 +02:00
|
|
|
FreeDir(dirdesc);
|
|
|
|
return true;
|
|
|
|
}
|
|
|
|
|
2010-01-12 03:42:52 +01:00
|
|
|
|
2004-06-25 23:55:59 +02:00
|
|
|
/*
|
|
|
|
* Rename a tablespace
|
|
|
|
*/
|
2012-12-24 00:25:03 +01:00
|
|
|
Oid
|
2004-06-25 23:55:59 +02:00
|
|
|
RenameTableSpace(const char *oldname, const char *newname)
|
|
|
|
{
|
2012-12-24 00:25:03 +01:00
|
|
|
Oid tspId;
|
2004-08-29 07:07:03 +02:00
|
|
|
Relation rel;
|
|
|
|
ScanKeyData entry[1];
|
2004-06-25 23:55:59 +02:00
|
|
|
HeapScanDesc scan;
|
|
|
|
HeapTuple tup;
|
|
|
|
HeapTuple newtuple;
|
|
|
|
Form_pg_tablespace newform;
|
|
|
|
|
|
|
|
/* Search pg_tablespace */
|
2005-04-14 22:03:27 +02:00
|
|
|
rel = heap_open(TableSpaceRelationId, RowExclusiveLock);
|
2004-06-25 23:55:59 +02:00
|
|
|
|
|
|
|
ScanKeyInit(&entry[0],
|
|
|
|
Anum_pg_tablespace_spcname,
|
|
|
|
BTEqualStrategyNumber, F_NAMEEQ,
|
|
|
|
CStringGetDatum(oldname));
|
Use an MVCC snapshot, rather than SnapshotNow, for catalog scans.
SnapshotNow scans have the undesirable property that, in the face of
concurrent updates, the scan can fail to see either the old or the new
versions of the row. In many cases, we work around this by requiring
DDL operations to hold AccessExclusiveLock on the object being
modified; in some cases, the existing locking is inadequate and random
failures occur as a result. This commit doesn't change anything
related to locking, but will hopefully pave the way to allowing lock
strength reductions in the future.
The major issue that has held us back from making this change in the past
is that taking an MVCC snapshot is significantly more expensive than
using a static special snapshot such as SnapshotNow. However, testing
of various worst-case scenarios reveals that this problem is not
severe except under fairly extreme workloads. To mitigate those
problems, we avoid retaking the MVCC snapshot for each new scan;
instead, we take a new snapshot only when invalidation messages have
been processed. The catcache machinery already requires that
invalidation messages be sent before releasing the related heavyweight
lock; else other backends might rely on locally-cached data rather
than scanning the catalog at all. Thus, making snapshot reuse
dependent on the same guarantees shouldn't break anything that wasn't
already subtly broken.
Patch by me. Review by Michael Paquier and Andres Freund.
2013-07-02 15:47:01 +02:00
|
|
|
scan = heap_beginscan_catalog(rel, 1, entry);
|
2004-06-25 23:55:59 +02:00
|
|
|
tup = heap_getnext(scan, ForwardScanDirection);
|
|
|
|
if (!HeapTupleIsValid(tup))
|
|
|
|
ereport(ERROR,
|
|
|
|
(errcode(ERRCODE_UNDEFINED_OBJECT),
|
|
|
|
errmsg("tablespace \"%s\" does not exist",
|
|
|
|
oldname)));
|
|
|
|
|
2012-12-24 00:25:03 +01:00
|
|
|
tspId = HeapTupleGetOid(tup);
|
2004-06-25 23:55:59 +02:00
|
|
|
newtuple = heap_copytuple(tup);
|
|
|
|
newform = (Form_pg_tablespace) GETSTRUCT(newtuple);
|
|
|
|
|
|
|
|
heap_endscan(scan);
|
|
|
|
|
2005-06-28 07:09:14 +02:00
|
|
|
/* Must be owner */
|
|
|
|
if (!pg_tablespace_ownercheck(HeapTupleGetOid(newtuple), GetUserId()))
|
2004-06-25 23:55:59 +02:00
|
|
|
aclcheck_error(ACLCHECK_NO_PRIV, ACL_KIND_TABLESPACE, oldname);
|
|
|
|
|
|
|
|
/* Validate new name */
|
|
|
|
if (!allowSystemTableMods && IsReservedName(newname))
|
|
|
|
ereport(ERROR,
|
|
|
|
(errcode(ERRCODE_RESERVED_NAME),
|
|
|
|
errmsg("unacceptable tablespace name \"%s\"", newname),
|
2005-10-15 04:49:52 +02:00
|
|
|
errdetail("The prefix \"pg_\" is reserved for system tablespaces.")));
|
2004-06-25 23:55:59 +02:00
|
|
|
|
|
|
|
/* Make sure the new name doesn't exist */
|
|
|
|
ScanKeyInit(&entry[0],
|
|
|
|
Anum_pg_tablespace_spcname,
|
|
|
|
BTEqualStrategyNumber, F_NAMEEQ,
|
|
|
|
CStringGetDatum(newname));
|
2013-07-02 15:47:01 +02:00
|
|
|
scan = heap_beginscan_catalog(rel, 1, entry);
|
2004-06-25 23:55:59 +02:00
|
|
|
tup = heap_getnext(scan, ForwardScanDirection);
|
|
|
|
if (HeapTupleIsValid(tup))
|
|
|
|
ereport(ERROR,
|
|
|
|
(errcode(ERRCODE_DUPLICATE_OBJECT),
|
|
|
|
errmsg("tablespace \"%s\" already exists",
|
|
|
|
newname)));
|
2004-08-29 07:07:03 +02:00
|
|
|
|
2004-06-25 23:55:59 +02:00
|
|
|
heap_endscan(scan);
|
|
|
|
|
|
|
|
/* OK, update the entry */
|
|
|
|
namestrcpy(&(newform->spcname), newname);
|
|
|
|
|
|
|
|
simple_heap_update(rel, &newtuple->t_self, newtuple);
|
|
|
|
CatalogUpdateIndexes(rel, newtuple);
|
|
|
|
|
2013-03-18 03:55:14 +01:00
|
|
|
InvokeObjectPostAlterHook(TableSpaceRelationId, tspId, 0);
|
|
|
|
|
2004-06-25 23:55:59 +02:00
|
|
|
heap_close(rel, NoLock);
|
2012-12-24 00:25:03 +01:00
|
|
|
|
|
|
|
return tspId;
|
2004-06-25 23:55:59 +02:00
|
|
|
}
|
|
|
|
|
2010-01-05 22:54:00 +01:00
|
|
|
/*
|
|
|
|
* Alter table space options
|
|
|
|
*/
|
2012-12-29 13:55:37 +01:00
|
|
|
Oid
|
2010-01-05 22:54:00 +01:00
|
|
|
AlterTableSpaceOptions(AlterTableSpaceOptionsStmt *stmt)
|
|
|
|
{
|
|
|
|
Relation rel;
|
|
|
|
ScanKeyData entry[1];
|
|
|
|
HeapScanDesc scandesc;
|
|
|
|
HeapTuple tup;
|
2012-12-29 13:55:37 +01:00
|
|
|
Oid tablespaceoid;
|
2010-01-05 22:54:00 +01:00
|
|
|
Datum datum;
|
|
|
|
Datum newOptions;
|
|
|
|
Datum repl_val[Natts_pg_tablespace];
|
|
|
|
bool isnull;
|
|
|
|
bool repl_null[Natts_pg_tablespace];
|
|
|
|
bool repl_repl[Natts_pg_tablespace];
|
|
|
|
HeapTuple newtuple;
|
|
|
|
|
|
|
|
/* Search pg_tablespace */
|
|
|
|
rel = heap_open(TableSpaceRelationId, RowExclusiveLock);
|
|
|
|
|
|
|
|
ScanKeyInit(&entry[0],
|
|
|
|
Anum_pg_tablespace_spcname,
|
|
|
|
BTEqualStrategyNumber, F_NAMEEQ,
|
|
|
|
CStringGetDatum(stmt->tablespacename));
|
2013-07-02 15:47:01 +02:00
|
|
|
scandesc = heap_beginscan_catalog(rel, 1, entry);
|
2010-01-05 22:54:00 +01:00
|
|
|
tup = heap_getnext(scandesc, ForwardScanDirection);
|
|
|
|
if (!HeapTupleIsValid(tup))
|
|
|
|
ereport(ERROR,
|
|
|
|
(errcode(ERRCODE_UNDEFINED_OBJECT),
|
|
|
|
errmsg("tablespace \"%s\" does not exist",
|
2010-02-26 03:01:40 +01:00
|
|
|
stmt->tablespacename)));
|
2010-01-05 22:54:00 +01:00
|
|
|
|
2012-12-29 13:55:37 +01:00
|
|
|
tablespaceoid = HeapTupleGetOid(tup);
|
|
|
|
|
2010-01-05 22:54:00 +01:00
|
|
|
/* Must be owner of the existing object */
|
|
|
|
if (!pg_tablespace_ownercheck(HeapTupleGetOid(tup), GetUserId()))
|
|
|
|
aclcheck_error(ACLCHECK_NOT_OWNER, ACL_KIND_TABLESPACE,
|
|
|
|
stmt->tablespacename);
|
|
|
|
|
|
|
|
/* Generate new proposed spcoptions (text array) */
|
|
|
|
datum = heap_getattr(tup, Anum_pg_tablespace_spcoptions,
|
|
|
|
RelationGetDescr(rel), &isnull);
|
|
|
|
newOptions = transformRelOptions(isnull ? (Datum) 0 : datum,
|
|
|
|
stmt->options, NULL, NULL, false,
|
|
|
|
stmt->isReset);
|
|
|
|
(void) tablespace_reloptions(newOptions, true);
|
|
|
|
|
|
|
|
/* Build new tuple. */
|
|
|
|
memset(repl_null, false, sizeof(repl_null));
|
|
|
|
memset(repl_repl, false, sizeof(repl_repl));
|
|
|
|
if (newOptions != (Datum) 0)
|
|
|
|
repl_val[Anum_pg_tablespace_spcoptions - 1] = newOptions;
|
|
|
|
else
|
|
|
|
repl_null[Anum_pg_tablespace_spcoptions - 1] = true;
|
|
|
|
repl_repl[Anum_pg_tablespace_spcoptions - 1] = true;
|
|
|
|
newtuple = heap_modify_tuple(tup, RelationGetDescr(rel), repl_val,
|
|
|
|
repl_null, repl_repl);
|
|
|
|
|
|
|
|
/* Update system catalog. */
|
|
|
|
simple_heap_update(rel, &newtuple->t_self, newtuple);
|
|
|
|
CatalogUpdateIndexes(rel, newtuple);
|
2013-03-18 03:55:14 +01:00
|
|
|
|
|
|
|
InvokeObjectPostAlterHook(TableSpaceRelationId, HeapTupleGetOid(tup), 0);
|
|
|
|
|
2010-01-05 22:54:00 +01:00
|
|
|
heap_freetuple(newtuple);
|
|
|
|
|
|
|
|
/* Conclude heap scan. */
|
|
|
|
heap_endscan(scandesc);
|
|
|
|
heap_close(rel, NoLock);
|
2012-12-29 13:55:37 +01:00
|
|
|
|
|
|
|
return tablespaceoid;
|
2010-01-05 22:54:00 +01:00
|
|
|
}
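The replace-array convention used by AlterTableSpaceOptions above can be sketched in isolation. This is a minimal standalone model, not the real heap_modify_tuple: the SketchRow type, the column count of 4, and sketch_modify_row are hypothetical names invented for illustration. It shows only the rule that a column changes when its repl_repl flag is set, and that repl_val is read only when the replacement is not null.

```c
#include <stdbool.h>

/* Hypothetical column count; chosen arbitrarily for the sketch. */
#define SKETCH_NATTS 4

/* Hypothetical stand-in for a catalog row: one value and one null flag per column. */
typedef struct
{
	long		values[SKETCH_NATTS];
	bool		nulls[SKETCH_NATTS];
} SketchRow;

/*
 * Build a modified copy of "old".  Column i is replaced only when
 * repl_repl[i] is true; repl_val[i] is consulted only when the
 * replacement is not null, so unused slots may stay unset.
 */
static SketchRow
sketch_modify_row(const SketchRow *old, const long *repl_val,
				  const bool *repl_null, const bool *repl_repl)
{
	SketchRow	result = *old;
	int			i;

	for (i = 0; i < SKETCH_NATTS; i++)
	{
		if (!repl_repl[i])
			continue;			/* keep the old column untouched */
		result.nulls[i] = repl_null[i];
		result.values[i] = repl_null[i] ? 0 : repl_val[i];
	}
	return result;
}
```

In the function above only the spcoptions slot has its repl_repl flag set, which is why the other entries of repl_val never need initializing.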
|
|
|
|
|
2004-11-05 20:17:13 +01:00
|
|
|
/*
|
|
|
|
* Routines for handling the GUC variable 'default_tablespace'.
|
|
|
|
*/
|
|
|
|
|
2011-04-07 06:11:01 +02:00
|
|
|
/* check_hook: validate new default_tablespace */
|
|
|
|
bool
|
|
|
|
check_default_tablespace(char **newval, void **extra, GucSource source)
|
2004-11-05 20:17:13 +01:00
|
|
|
{
|
|
|
|
/*
|
|
|
|
* If we aren't inside a transaction, we cannot do database access so
|
2005-10-15 04:49:52 +02:00
|
|
|
* cannot verify the name. Must accept the value on faith.
|
2004-11-05 20:17:13 +01:00
|
|
|
*/
|
|
|
|
if (IsTransactionState())
|
|
|
|
{
|
2011-04-07 06:11:01 +02:00
|
|
|
if (**newval != '\0' &&
|
|
|
|
!OidIsValid(get_tablespace_oid(*newval, true)))
|
2004-11-05 20:17:13 +01:00
|
|
|
{
|
Accept a non-existent value in "ALTER USER/DATABASE SET ..." command.
When default_text_search_config, default_tablespace, or temp_tablespaces
setting is set per-user or per-database, with an "ALTER USER/DATABASE SET
..." statement, don't throw an error if the text search configuration or
tablespace does not exist. In case of text search configuration, even if
it doesn't exist in the current database, it might exist in another
database, where the setting is intended to have its effect. This behavior
is now the same as search_path's.
Tablespaces are cluster-wide, so the same argument doesn't hold for
tablespaces, but there's a problem with pg_dumpall: it dumps "ALTER USER
SET ..." statements before the "CREATE TABLESPACE" statements. Arguably
that's pg_dumpall's fault - it should dump the statements in such an order
that the tablespace is created first and then the "ALTER USER SET
default_tablespace ..." statements after that - but it seems better to be
consistent with search_path and default_text_search_config anyway. Besides,
you could still create a dump that throws an error, by creating the
tablespace, running "ALTER USER SET default_tablespace", then dropping the
tablespace and running pg_dumpall on that.
Backpatch to all supported versions.
2012-01-30 09:32:46 +01:00
|
|
|
/*
|
2013-09-04 00:56:22 +02:00
|
|
|
* When source == PGC_S_TEST, don't throw a hard error for a
|
|
|
|
* nonexistent tablespace, only a NOTICE. See comments in guc.h.
|
2012-01-30 09:32:46 +01:00
|
|
|
*/
|
|
|
|
if (source == PGC_S_TEST)
|
|
|
|
{
|
|
|
|
ereport(NOTICE,
|
|
|
|
(errcode(ERRCODE_UNDEFINED_OBJECT),
|
|
|
|
errmsg("tablespace \"%s\" does not exist",
|
|
|
|
*newval)));
|
|
|
|
}
|
|
|
|
else
|
|
|
|
{
|
|
|
|
GUC_check_errdetail("Tablespace \"%s\" does not exist.",
|
|
|
|
*newval);
|
|
|
|
return false;
|
|
|
|
}
|
2004-11-05 20:17:13 +01:00
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2011-04-07 06:11:01 +02:00
|
|
|
return true;
|
2004-11-05 20:17:13 +01:00
|
|
|
}
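The decision structure of check_default_tablespace can be modelled without the backend. The sketch below is an assumption-laden illustration: SketchSource, sketch_tablespace_exists, and the "fastdisk" name are hypothetical stand-ins for GucSource, the catalog lookup, and real data. It keeps only the branching: accept on faith outside a transaction, accept the empty string, downgrade an unknown name to a notice for the test source, and reject it otherwise.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-ins for the two GucSource cases that matter here. */
typedef enum
{
	SKETCH_SOURCE_TEST,
	SKETCH_SOURCE_INTERACTIVE
} SketchSource;

/* Hypothetical catalog lookup: pretend only "fastdisk" exists. */
static bool
sketch_tablespace_exists(const char *name)
{
	return strcmp(name, "fastdisk") == 0;
}

/* Returns true when the proposed value should be accepted. */
static bool
sketch_check_default_tablespace(const char *newval, SketchSource source,
								bool in_transaction)
{
	/* Outside a transaction the name cannot be verified; accept it. */
	if (!in_transaction)
		return true;

	if (newval[0] != '\0' && !sketch_tablespace_exists(newval))
	{
		if (source == SKETCH_SOURCE_TEST)
			fprintf(stderr, "NOTICE: tablespace \"%s\" does not exist\n",
					newval);
		else
			return false;		/* hard rejection for other sources */
	}
	return true;
}
```

The test-source branch is what lets a per-user or per-database setting name a tablespace that does not exist yet, as the commit message above describes.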
|
|
|
|
|
|
|
|
/*
|
|
|
|
* GetDefaultTablespace -- get the OID of the current default tablespace
|
|
|
|
*
|
2010-12-13 18:34:26 +01:00
|
|
|
* Temporary objects have different default tablespaces, hence the
|
|
|
|
* relpersistence parameter must be specified.
|
2007-06-03 19:08:34 +02:00
|
|
|
*
|
|
|
|
* May return InvalidOid to indicate "use the database's default tablespace".
|
|
|
|
*
|
|
|
|
* Note that caller is expected to check appropriate permissions for any
|
|
|
|
* result other than InvalidOid.
|
2004-11-05 20:17:13 +01:00
|
|
|
*
|
|
|
|
* This exists to hide (and possibly optimize the use of) the
|
|
|
|
* default_tablespace GUC variable.
|
|
|
|
*/
|
|
|
|
Oid
|
2010-12-13 18:34:26 +01:00
|
|
|
GetDefaultTablespace(char relpersistence)
|
2004-11-05 20:17:13 +01:00
|
|
|
{
|
|
|
|
Oid result;
|
|
|
|
|
2007-06-07 21:19:57 +02:00
|
|
|
/* The temp-table case is handled elsewhere */
|
2010-12-13 18:34:26 +01:00
|
|
|
if (relpersistence == RELPERSISTENCE_TEMP)
|
2007-06-07 21:19:57 +02:00
|
|
|
{
|
|
|
|
PrepareTempTablespaces();
|
|
|
|
return GetNextTempTableSpace();
|
|
|
|
}
|
2007-06-03 19:08:34 +02:00
|
|
|
|
2004-11-05 20:17:13 +01:00
|
|
|
/* Fast path for default_tablespace == "" */
|
|
|
|
if (default_tablespace == NULL || default_tablespace[0] == '\0')
|
|
|
|
return InvalidOid;
|
2005-10-15 04:49:52 +02:00
|
|
|
|
2004-11-05 20:17:13 +01:00
|
|
|
/*
|
|
|
|
* It is tempting to cache this lookup for more speed, but then we would
|
2005-10-15 04:49:52 +02:00
|
|
|
* fail to detect the case where the tablespace was dropped since the GUC
|
|
|
|
* variable was set. Note also that we don't complain if the value fails
|
|
|
|
* to refer to an existing tablespace; we just silently return InvalidOid,
|
|
|
|
* causing the new object to be created in the database's tablespace.
|
2004-11-05 20:17:13 +01:00
|
|
|
*/
|
2010-08-05 16:45:09 +02:00
|
|
|
result = get_tablespace_oid(default_tablespace, true);
|
2005-10-15 04:49:52 +02:00
|
|
|
|
2004-11-05 20:17:13 +01:00
|
|
|
/*
|
|
|
|
* Allow explicit specification of database's default tablespace in
|
|
|
|
* default_tablespace without triggering permissions checks.
|
|
|
|
*/
|
|
|
|
if (result == MyDatabaseTableSpace)
|
|
|
|
result = InvalidOid;
|
|
|
|
return result;
|
|
|
|
}
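The resolution rules in GetDefaultTablespace for non-temporary objects can be sketched as a standalone function. Everything here is hypothetical scaffolding: SketchOid, the two-entry sketch_catalog table, and sketch_lookup stand in for Oid, pg_tablespace, and get_tablespace_oid with missing_ok = true. The temp-table branch is deliberately left out.

```c
#include <stddef.h>
#include <string.h>

typedef unsigned int SketchOid;

#define SKETCH_INVALID_OID ((SketchOid) 0)

/* Hypothetical stand-in for pg_tablespace; the entries are illustrative. */
typedef struct
{
	const char *name;
	SketchOid	oid;
} SketchTablespace;

static const SketchTablespace sketch_catalog[] = {
	{"pg_default", 1663},
	{"fastdisk", 20981},
};

/* Like a missing_ok lookup: an unknown name yields the invalid OID. */
static SketchOid
sketch_lookup(const char *name)
{
	size_t		i;

	for (i = 0; i < sizeof(sketch_catalog) / sizeof(sketch_catalog[0]); i++)
	{
		if (strcmp(sketch_catalog[i].name, name) == 0)
			return sketch_catalog[i].oid;
	}
	return SKETCH_INVALID_OID;
}

/* Invalid OID means "use the database's default tablespace". */
static SketchOid
sketch_get_default_tablespace(const char *guc_value,
							  SketchOid database_tablespace)
{
	SketchOid	result;

	/* Fast path: unset or empty setting. */
	if (guc_value == NULL || guc_value[0] == '\0')
		return SKETCH_INVALID_OID;

	/* Looked up on every call, so a dropped tablespace is noticed. */
	result = sketch_lookup(guc_value);

	/* Naming the database's own tablespace is treated as "default". */
	if (result == database_tablespace)
		result = SKETCH_INVALID_OID;

	return result;
}
```

Three different inputs collapse to the invalid OID (empty setting, unknown name, and the database's own tablespace), so callers only need a permission check for the remaining case.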
|
|
|
|
|
|
|
|
|
2007-06-03 19:08:34 +02:00
|
|
|
/*
|
|
|
|
* Routines for handling the GUC variable 'temp_tablespaces'.
|
|
|
|
*/
|
|
|
|
|
2011-04-07 06:11:01 +02:00
|
|
|
typedef struct
|
|
|
|
{
|
|
|
|
int numSpcs;
|
|
|
|
Oid tblSpcs[1]; /* VARIABLE LENGTH ARRAY */
|
|
|
|
} temp_tablespaces_extra;
|
|
|
|
|
|
|
|
/* check_hook: validate new temp_tablespaces */
|
|
|
|
bool
|
|
|
|
check_temp_tablespaces(char **newval, void **extra, GucSource source)
|
2007-06-03 19:08:34 +02:00
|
|
|
{
|
|
|
|
char *rawname;
|
|
|
|
List *namelist;
|
|
|
|
|
|
|
|
/* Need a modifiable copy of string */
|
2011-04-07 06:11:01 +02:00
|
|
|
rawname = pstrdup(*newval);
|
2007-06-03 19:08:34 +02:00
|
|
|
|
|
|
|
/* Parse string into list of identifiers */
|
|
|
|
if (!SplitIdentifierString(rawname, ',', &namelist))
|
|
|
|
{
|
|
|
|
/* syntax error in name list */
|
2011-04-07 06:11:01 +02:00
|
|
|
GUC_check_errdetail("List syntax is invalid.");
|
2007-06-03 19:08:34 +02:00
|
|
|
pfree(rawname);
|
|
|
|
list_free(namelist);
|
2011-04-07 06:11:01 +02:00
|
|
|
return false;
|
2007-06-03 19:08:34 +02:00
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
|
|
|
* If we aren't inside a transaction, we cannot do database access so
|
|
|
|
* cannot verify the individual names. Must accept the list on faith.
|
2007-06-07 21:19:57 +02:00
|
|
|
* Fortunately, there's then also no need to pass the data to fd.c.
|
2007-06-03 19:08:34 +02:00
|
|
|
*/
|
2007-06-07 21:19:57 +02:00
|
|
|
if (IsTransactionState())
|
2007-06-03 19:08:34 +02:00
|
|
|
{
|
2011-04-07 06:11:01 +02:00
|
|
|
temp_tablespaces_extra *myextra;
|
2007-11-15 22:14:46 +01:00
|
|
|
Oid *tblSpcs;
|
|
|
|
int numSpcs;
|
2007-06-07 21:19:57 +02:00
|
|
|
ListCell *l;
|
|
|
|
|
2011-04-07 06:11:01 +02:00
|
|
|
/* temporary workspace until we are done verifying the list */
|
|
|
|
tblSpcs = (Oid *) palloc(list_length(namelist) * sizeof(Oid));
|
2007-06-07 21:19:57 +02:00
|
|
|
numSpcs = 0;
|
2007-06-03 19:08:34 +02:00
|
|
|
foreach(l, namelist)
|
|
|
|
{
|
|
|
|
char *curname = (char *) lfirst(l);
|
2007-06-07 21:19:57 +02:00
|
|
|
Oid curoid;
|
|
|
|
AclResult aclresult;
|
2007-06-03 19:08:34 +02:00
|
|
|
|
|
|
|
/* Allow an empty string (signifying database default) */
|
|
|
|
if (curname[0] == '\0')
|
2007-06-07 21:19:57 +02:00
|
|
|
{
|
|
|
|
tblSpcs[numSpcs++] = InvalidOid;
|
2007-06-03 19:08:34 +02:00
|
|
|
continue;
|
2007-06-07 21:19:57 +02:00
|
|
|
}
|
2007-06-03 19:08:34 +02:00
|
|
|
|
2010-08-05 16:45:09 +02:00
|
|
|
/*
|
2013-09-04 00:56:22 +02:00
|
|
|
* In an interactive SET command, we ereport for bad info. When
|
|
|
|
* source == PGC_S_TEST, don't throw a hard error for a
|
|
|
|
* nonexistent tablespace, only a NOTICE. See comments in guc.h.
|
2010-08-05 16:45:09 +02:00
|
|
|
*/
|
2012-01-30 09:32:46 +01:00
|
|
|
curoid = get_tablespace_oid(curname, source <= PGC_S_TEST);
|
2007-06-07 21:19:57 +02:00
|
|
|
if (curoid == InvalidOid)
|
2012-01-30 09:32:46 +01:00
|
|
|
{
|
|
|
|
if (source == PGC_S_TEST)
|
|
|
|
ereport(NOTICE,
|
|
|
|
(errcode(ERRCODE_UNDEFINED_OBJECT),
|
|
|
|
errmsg("tablespace \"%s\" does not exist",
|
|
|
|
curname)));
|
2007-06-07 21:19:57 +02:00
|
|
|
continue;
|
2012-01-30 09:32:46 +01:00
|
|
|
}
|
2007-06-07 21:19:57 +02:00
|
|
|
|
|
|
|
/*
|
|
|
|
* Allow explicit specification of database's default tablespace
|
|
|
|
* in temp_tablespaces without triggering permissions checks.
|
|
|
|
*/
|
|
|
|
if (curoid == MyDatabaseTableSpace)
|
|
|
|
{
|
|
|
|
tblSpcs[numSpcs++] = InvalidOid;
|
|
|
|
continue;
|
|
|
|
}
|
|
|
|
|
2011-04-07 06:11:01 +02:00
|
|
|
/* Check permissions, similarly complaining only if interactive */
|
2007-06-07 21:19:57 +02:00
|
|
|
aclresult = pg_tablespace_aclcheck(curoid, GetUserId(),
|
|
|
|
ACL_CREATE);
|
|
|
|
if (aclresult != ACLCHECK_OK)
|
|
|
|
{
|
|
|
|
if (source >= PGC_S_INTERACTIVE)
|
|
|
|
aclcheck_error(aclresult, ACL_KIND_TABLESPACE, curname);
|
|
|
|
continue;
|
|
|
|
}
|
|
|
|
|
|
|
|
tblSpcs[numSpcs++] = curoid;
|
2007-06-03 19:08:34 +02:00
|
|
|
}
|
2007-06-07 21:19:57 +02:00
|
|
|
|
2011-04-07 06:11:01 +02:00
|
|
|
/* Now prepare an "extra" struct for assign_temp_tablespaces */
|
|
|
|
myextra = malloc(offsetof(temp_tablespaces_extra, tblSpcs) +
|
|
|
|
numSpcs * sizeof(Oid));
|
|
|
|
if (!myextra)
|
|
|
|
return false;
|
|
|
|
myextra->numSpcs = numSpcs;
|
|
|
|
memcpy(myextra->tblSpcs, tblSpcs, numSpcs * sizeof(Oid));
|
|
|
|
*extra = (void *) myextra;
|
|
|
|
|
|
|
|
pfree(tblSpcs);
|
2007-06-03 19:08:34 +02:00
|
|
|
}
|
|
|
|
|
|
|
|
pfree(rawname);
|
|
|
|
list_free(namelist);
|
|
|
|
|
2011-04-07 06:11:01 +02:00
|
|
|
return true;
|
|
|
|
}
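The way check_temp_tablespaces sizes its "extra" struct (header plus a variable number of OIDs in one malloc'd chunk) can be shown on its own. This sketch makes two assumptions that differ from the source: it uses a C99 flexible array member rather than the tblSpcs[1] idiom, and the names SketchOid, sketch_extra, and sketch_make_extra are hypothetical.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

typedef unsigned int SketchOid;

/* Header followed by a variable number of OIDs in the same allocation. */
typedef struct
{
	int			numSpcs;
	SketchOid	tblSpcs[];		/* C99 flexible array member */
} sketch_extra;

/*
 * Copy n OIDs into a single malloc'd chunk sized as
 * offsetof(header up to the array) + n elements.
 * Returns NULL on allocation failure; the caller frees the result.
 * Assumes n > 0 and oids != NULL.
 */
static sketch_extra *
sketch_make_extra(const SketchOid *oids, int n)
{
	sketch_extra *extra;

	extra = malloc(offsetof(sketch_extra, tblSpcs) +
				   (size_t) n * sizeof(SketchOid));
	if (extra == NULL)
		return NULL;

	extra->numSpcs = n;
	memcpy(extra->tblSpcs, oids, (size_t) n * sizeof(SketchOid));
	return extra;
}
```

A single allocation means whoever owns the struct can release it with one free call; the sketch does not model how the GUC machinery itself manages that memory.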
|
|
|
|
|
|
|
|
/* assign_hook: do extra actions as needed */
|
|
|
|
void
|
|
|
|
assign_temp_tablespaces(const char *newval, void *extra)
|
|
|
|
{
|
|
|
|
temp_tablespaces_extra *myextra = (temp_tablespaces_extra *) extra;
|
|
|
|
|
|
|
|
/*
|
|
|
|
* If check_temp_tablespaces was executed inside a transaction, then pass
|
|
|
|
* the list it made to fd.c. Otherwise, clear fd.c's list; we must be
|
|
|
|
* still outside a transaction, or else restoring during transaction exit,
|
|
|
|
* and in either case we can just let the next PrepareTempTablespaces call
|
|
|
|
* make things sane.
|
|
|
|
*/
|
|
|
|
if (myextra)
|
|
|
|
SetTempTablespaces(myextra->tblSpcs, myextra->numSpcs);
|
|
|
|
else
|
|
|
|
SetTempTablespaces(NULL, 0);
|
2007-06-03 19:08:34 +02:00
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
2007-06-07 21:19:57 +02:00
|
|
|
* PrepareTempTablespaces -- prepare to use temp tablespaces
|
2007-06-03 19:08:34 +02:00
|
|
|
*
|
2007-06-07 21:19:57 +02:00
|
|
|
* If we have not already done so in the current transaction, parse the
|
|
|
|
* temp_tablespaces GUC variable and tell fd.c which tablespace(s) to use
|
|
|
|
* for temp files.
|
2007-06-03 19:08:34 +02:00
|
|
|
*/
|
2007-06-07 21:19:57 +02:00
|
|
|
void
|
|
|
|
PrepareTempTablespaces(void)
|
2007-06-03 19:08:34 +02:00
|
|
|
{
|
|
|
|
char *rawname;
|
|
|
|
List *namelist;
|
2007-06-07 21:19:57 +02:00
|
|
|
Oid *tblSpcs;
|
|
|
|
int numSpcs;
|
|
|
|
ListCell *l;
|
2007-06-03 19:08:34 +02:00
|
|
|
|
2007-06-07 21:19:57 +02:00
|
|
|
/* No work if already done in current transaction */
|
|
|
|
if (TempTablespacesAreSet())
|
|
|
|
return;
|
2007-06-03 19:08:34 +02:00
|
|
|
|
|
|
|
/*
|
2007-11-15 22:14:46 +01:00
|
|
|
* Can't do catalog access unless within a transaction. This is just a
|
|
|
|
* safety check in case this function is called by low-level code that
|
|
|
|
* could conceivably execute outside a transaction. Note that in such a
|
|
|
|
* scenario, fd.c will fall back to using the current database's default
|
2007-06-07 21:19:57 +02:00
|
|
|
* tablespace, which should always be OK.
|
2007-06-03 19:08:34 +02:00
|
|
|
*/
|
2007-06-07 21:19:57 +02:00
|
|
|
if (!IsTransactionState())
|
|
|
|
return;
|
2007-06-03 19:08:34 +02:00
|
|
|
|
|
|
|
/* Need a modifiable copy of string */
|
|
|
|
rawname = pstrdup(temp_tablespaces);
|
|
|
|
|
|
|
|
/* Parse string into list of identifiers */
|
|
|
|
if (!SplitIdentifierString(rawname, ',', &namelist))
|
|
|
|
{
|
|
|
|
/* syntax error in name list */
|
2007-06-07 21:19:57 +02:00
|
|
|
SetTempTablespaces(NULL, 0);
|
2007-06-03 19:08:34 +02:00
|
|
|
pfree(rawname);
|
|
|
|
list_free(namelist);
|
2007-06-07 21:19:57 +02:00
|
|
|
return;
|
2007-06-03 19:08:34 +02:00
|
|
|
}
|
|
|
|
|
2007-06-07 21:19:57 +02:00
|
|
|
/* Store tablespace OIDs in an array in TopTransactionContext */
|
|
|
|
tblSpcs = (Oid *) MemoryContextAlloc(TopTransactionContext,
|
2007-11-15 22:14:46 +01:00
|
|
|
list_length(namelist) * sizeof(Oid));
|
2007-06-07 21:19:57 +02:00
|
|
|
numSpcs = 0;
|
|
|
|
foreach(l, namelist)
|
2007-06-03 19:08:34 +02:00
|
|
|
{
|
2007-06-07 21:19:57 +02:00
|
|
|
char *curname = (char *) lfirst(l);
|
|
|
|
Oid curoid;
|
|
|
|
AclResult aclresult;
|
2007-06-03 19:08:34 +02:00
|
|
|
|
2007-06-07 21:19:57 +02:00
|
|
|
/* Allow an empty string (signifying database default) */
|
|
|
|
if (curname[0] == '\0')
|
|
|
|
{
|
|
|
|
tblSpcs[numSpcs++] = InvalidOid;
|
|
|
|
continue;
|
|
|
|
}
|
2007-06-03 19:08:34 +02:00
|
|
|
|
2007-06-07 21:19:57 +02:00
|
|
|
/* Else verify that name is a valid tablespace name */
|
2010-08-05 16:45:09 +02:00
|
|
|
curoid = get_tablespace_oid(curname, true);
|
2007-06-07 21:19:57 +02:00
|
|
|
if (curoid == InvalidOid)
|
|
|
|
{
|
2010-08-05 16:45:09 +02:00
|
|
|
/* Skip any bad list elements */
|
2007-06-07 21:19:57 +02:00
|
|
|
continue;
|
|
|
|
}
|
2007-06-03 19:08:34 +02:00
|
|
|
|
2007-06-07 21:19:57 +02:00
|
|
|
/*
|
2007-11-15 22:14:46 +01:00
|
|
|
* Allow explicit specification of database's default tablespace in
|
|
|
|
* temp_tablespaces without triggering permissions checks.
|
2007-06-07 21:19:57 +02:00
|
|
|
*/
|
|
|
|
if (curoid == MyDatabaseTableSpace)
|
|
|
|
{
|
|
|
|
tblSpcs[numSpcs++] = InvalidOid;
|
|
|
|
continue;
|
|
|
|
}
|
|
|
|
|
|
|
|
/* Check permissions similarly */
|
|
|
|
aclresult = pg_tablespace_aclcheck(curoid, GetUserId(),
|
|
|
|
ACL_CREATE);
|
|
|
|
if (aclresult != ACLCHECK_OK)
|
|
|
|
continue;
|
|
|
|
|
|
|
|
tblSpcs[numSpcs++] = curoid;
|
|
|
|
}
|
|
|
|
|
|
|
|
SetTempTablespaces(tblSpcs, numSpcs);
|
2007-06-03 19:08:34 +02:00
|
|
|
|
|
|
|
pfree(rawname);
|
|
|
|
list_free(namelist);
|
|
|
|
}
|
|
|
|
|
|
|
|
|
2004-11-05 20:17:13 +01:00
|
|
|
/*
|
|
|
|
* get_tablespace_oid - given a tablespace name, look up the OID
|
|
|
|
*
|
2010-08-05 16:45:09 +02:00
|
|
|
* If missing_ok is false, throw an error if tablespace name not found. If
|
|
|
|
* true, just return InvalidOid.
|
2004-11-05 20:17:13 +01:00
|
|
|
*/
|
|
|
|
Oid
|
2010-08-05 16:45:09 +02:00
|
|
|
get_tablespace_oid(const char *tablespacename, bool missing_ok)
|
2004-11-05 20:17:13 +01:00
|
|
|
{
|
|
|
|
Oid result;
|
|
|
|
Relation rel;
|
|
|
|
HeapScanDesc scandesc;
|
|
|
|
HeapTuple tuple;
|
|
|
|
ScanKeyData entry[1];
|
|
|
|
|
2007-06-03 19:08:34 +02:00
|
|
|
/*
|
|
|
|
* Search pg_tablespace. We use a heapscan here even though there is an
|
|
|
|
* index on name, on the theory that pg_tablespace will usually have just
|
|
|
|
* a few entries and so an indexed lookup is a waste of effort.
|
|
|
|
*/
|
2005-04-14 22:03:27 +02:00
|
|
|
rel = heap_open(TableSpaceRelationId, AccessShareLock);
|
2004-11-05 20:17:13 +01:00
|
|
|
|
|
|
|
ScanKeyInit(&entry[0],
|
|
|
|
Anum_pg_tablespace_spcname,
|
|
|
|
BTEqualStrategyNumber, F_NAMEEQ,
|
|
|
|
CStringGetDatum(tablespacename));
|
2013-07-02 15:47:01 +02:00
|
|
|
scandesc = heap_beginscan_catalog(rel, 1, entry);
|
2004-11-05 20:17:13 +01:00
|
|
|
tuple = heap_getnext(scandesc, ForwardScanDirection);
|
|
|
|
|
2007-06-03 19:08:34 +02:00
|
|
|
/* We assume that there can be at most one matching tuple */
|
2004-11-05 20:17:13 +01:00
|
|
|
if (HeapTupleIsValid(tuple))
|
|
|
|
result = HeapTupleGetOid(tuple);
|
|
|
|
else
|
|
|
|
result = InvalidOid;
|
|
|
|
|
|
|
|
heap_endscan(scandesc);
|
|
|
|
heap_close(rel, AccessShareLock);
|
|
|
|
|
2010-08-05 16:45:09 +02:00
|
|
|
if (!OidIsValid(result) && !missing_ok)
|
2011-04-10 17:42:00 +02:00
|
|
|
ereport(ERROR,
|
|
|
|
(errcode(ERRCODE_UNDEFINED_OBJECT),
|
|
|
|
errmsg("tablespace \"%s\" does not exist",
|
|
|
|
tablespacename)));
|
2010-08-05 16:45:09 +02:00
|
|
|
|
2004-11-05 20:17:13 +01:00
|
|
|
return result;
|
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
|
|
|
* get_tablespace_name - given a tablespace OID, look up the name
|
|
|
|
*
|
|
|
|
* Returns a palloc'd string, or NULL if no such tablespace.
|
|
|
|
*/
|
|
|
|
char *
|
|
|
|
get_tablespace_name(Oid spc_oid)
|
|
|
|
{
|
|
|
|
char *result;
|
|
|
|
Relation rel;
|
|
|
|
HeapScanDesc scandesc;
|
|
|
|
HeapTuple tuple;
|
|
|
|
ScanKeyData entry[1];
|
|
|
|
|
2007-06-03 19:08:34 +02:00
|
|
|
/*
|
|
|
|
* Search pg_tablespace. We use a heapscan here even though there is an
|
2007-11-15 22:14:46 +01:00
|
|
|
* index on oid, on the theory that pg_tablespace will usually have just a
|
|
|
|
* few entries and so an indexed lookup is a waste of effort.
|
2007-06-03 19:08:34 +02:00
|
|
|
*/
|
2005-04-14 22:03:27 +02:00
|
|
|
rel = heap_open(TableSpaceRelationId, AccessShareLock);
|
2004-11-05 20:17:13 +01:00
|
|
|
|
|
|
|
ScanKeyInit(&entry[0],
|
|
|
|
ObjectIdAttributeNumber,
|
|
|
|
BTEqualStrategyNumber, F_OIDEQ,
|
|
|
|
ObjectIdGetDatum(spc_oid));
|
2013-07-02 15:47:01 +02:00
|
|
|
scandesc = heap_beginscan_catalog(rel, 1, entry);
|
2004-11-05 20:17:13 +01:00
|
|
|
tuple = heap_getnext(scandesc, ForwardScanDirection);
|
|
|
|
|
|
|
|
/* We assume that there can be at most one matching tuple */
|
|
|
|
if (HeapTupleIsValid(tuple))
|
|
|
|
result = pstrdup(NameStr(((Form_pg_tablespace) GETSTRUCT(tuple))->spcname));
|
|
|
|
else
|
|
|
|
result = NULL;
|
|
|
|
|
|
|
|
heap_endscan(scandesc);
|
|
|
|
heap_close(rel, AccessShareLock);
|
|
|
|
|
|
|
|
return result;
|
|
|
|
}
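The commit note attached to the heap_beginscan_catalog() call above says the catalog snapshot is not retaken for every scan, only after invalidation messages have been processed. A minimal sketch of that reuse rule follows, assuming a single cached snapshot and a validity flag. Every name here (MockSnapshot, mock_get_catalog_snapshot, mock_process_invalidation) is hypothetical and is not the PostgreSQL API; the real snapshot machinery is considerably more involved.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a snapshot: only an id, so reuse is observable. */
typedef struct MockSnapshot
{
	unsigned long id;
} MockSnapshot;

static MockSnapshot mock_cached_snapshot;
static bool mock_cached_valid = false;
static unsigned long mock_snapshots_taken = 0;

/* Called when invalidation messages are processed: forget the cached copy. */
static void
mock_process_invalidation(void)
{
	mock_cached_valid = false;
}

/*
 * Return the snapshot to use for a catalog scan.  The expensive step
 * (modelled here as bumping a counter) runs only if the cache was invalidated
 * since the last call; otherwise the cached snapshot is handed back.
 */
static const MockSnapshot *
mock_get_catalog_snapshot(void)
{
	if (!mock_cached_valid)
	{
		mock_cached_snapshot.id = ++mock_snapshots_taken;
		mock_cached_valid = true;
	}
	assert(mock_cached_valid);
	return &mock_cached_snapshot;
}
```

Two back-to-back calls to mock_get_catalog_snapshot() return the same id. A call made after mock_process_invalidation() returns a fresh one, which mirrors the "retake only after invalidation" rule in the note.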
|
|
|
|
|
|
|
|
|
2004-08-29 23:08:48 +02:00
|
|
|
/*
|
|
|
|
* TABLESPACE resource manager's routines
|
|
|
|
*/
|
|
|
|
void
|
|
|
|
tblspc_redo(XLogRecPtr lsn, XLogRecord *record)
|
|
|
|
{
|
|
|
|
uint8 info = record->xl_info & ~XLR_INFO_MASK;
|
|
|
|
|
2009-01-20 19:59:37 +01:00
|
|
|
/* Backup blocks are not used in tblspc records */
|
|
|
|
Assert(!(record->xl_info & XLR_BKP_BLOCK_MASK));
|
|
|
|
|
2004-08-29 23:08:48 +02:00
|
|
|
if (info == XLOG_TBLSPC_CREATE)
|
|
|
|
{
|
|
|
|
xl_tblspc_create_rec *xlrec = (xl_tblspc_create_rec *) XLogRecGetData(record);
|
|
|
|
char *location = xlrec->ts_path;
|
|
|
|
|
2010-01-12 03:42:52 +01:00
|
|
|
create_tablespace_directories(location, xlrec->ts_id);
|
2004-08-29 23:08:48 +02:00
|
|
|
}
|
|
|
|
else if (info == XLOG_TBLSPC_DROP)
|
|
|
|
{
|
|
|
|
xl_tblspc_drop_rec *xlrec = (xl_tblspc_drop_rec *) XLogRecGetData(record);
|
|
|
|
|
Allow read only connections during recovery, known as Hot Standby.
Enabled by recovery_connections = on (default) and forcing archive recovery using a recovery.conf. Recovery processing now emulates the original transactions as they are replayed, providing full locking and MVCC behaviour for read only queries. Recovery must enter consistent state before connections are allowed, so there is a delay, typically short, before connections succeed. Replay of recovering transactions can conflict and in some cases deadlock with queries during recovery; these result in query cancellation after max_standby_delay seconds have expired. Infrastructure changes have minor effects on normal running, though introduce four new types of WAL record.
New test mode "make standbycheck" allows regression tests of static command behaviour on a standby server while in recovery. Typical and extreme dynamic behaviours have been checked via code inspection and manual testing. Few port specific behaviours have been utilised, though primary testing has been on Linux only so far.
This commit is the basic patch. Additional changes will follow in this release to enhance some aspects of behaviour, notably improved handling of conflicts, deadlock detection and query cancellation. Changes to VACUUM FULL are also required.
Simon Riggs, with significant and lengthy review by Heikki Linnakangas, including streamlined redesign of snapshot creation and two-phase commit.
Important contributions from Florian Pflug, Mark Kirkwood, Merlin Moncure, Greg Stark, Gianni Ciolli, Gabriele Bartolini, Hannu Krosing, Robert Haas, Tatsuo Ishii, Hiroyuki Yamada plus support and feedback from many other community members.
2009-12-19 02:32:45 +01:00
|
|
|
/*
|
Avoid throwing ERROR during WAL replay of DROP TABLESPACE.
Although we will not even issue an XLOG_TBLSPC_DROP WAL record unless
removal of the tablespace's directories succeeds, that does not guarantee
that the same operation will succeed during WAL replay. Foreseeable
reasons for it to fail include temp files created in the tablespace by Hot
Standby backends, wrong directory permissions on a standby server, etc etc.
The original coding threw ERROR if replay failed to remove the directories,
but that is a serious overreaction. Throwing an error aborts recovery,
and worse means that manual intervention will be needed to get the database
to start again, since otherwise the same error will recur on subsequent
attempts to replay the same WAL record. And the consequence of failing to
remove the directories is only that some probably-small amount of disk
space is wasted, so it hardly seems justified to throw an error.
Accordingly, arrange to report such failures as LOG messages and keep going
when a failure occurs during replay.
Back-patch to 9.0 where Hot Standby was introduced. In principle such
problems can occur in earlier releases, but Hot Standby increases the odds
of trouble significantly. Given the lack of field reports of such issues,
I'm satisfied with patching back as far as the patch applies easily.
2012-02-06 20:43:58 +01:00
|
|
|
* If we issued a WAL record for a drop tablespace it implies that
|
|
|
|
* there were no files in it at all when the DROP was done. That means
|
|
|
|
* that no permanent objects can exist in it at this point.
|
2009-12-19 02:32:45 +01:00
|
|
|
*
|
2010-02-26 03:01:40 +01:00
|
|
|
* It is possible for standby users to be using this tablespace as a
|
|
|
|
* location for their temporary files, so if we fail to remove all
|
|
|
|
* files then do conflict processing and try again, if currently
|
|
|
|
* enabled.
|
2012-02-06 20:43:58 +01:00
|
|
|
*
|
2012-06-10 21:20:04 +02:00
|
|
|
* Other possible reasons for failure include bollixed file
|
|
|
|
* permissions on a standby server when they were okay on the primary,
|
|
|
|
* etc etc. There's not much we can do about that, so just remove what
|
|
|
|
* we can and press on.
|
2009-12-19 02:32:45 +01:00
|
|
|
*/
|
2010-01-12 03:42:52 +01:00
|
|
|
if (!destroy_tablespace_directories(xlrec->ts_id, true))
|
2009-12-19 02:32:45 +01:00
|
|
|
{
|
2010-01-14 12:08:02 +01:00
|
|
|
ResolveRecoveryConflictWithTablespace(xlrec->ts_id);
|
2009-12-19 02:32:45 +01:00
|
|
|
|
|
|
|
/*
|
2010-02-26 03:01:40 +01:00
|
|
|
* If we did recovery processing then hopefully the backends who
|
2012-02-06 20:43:58 +01:00
|
|
|
* wrote temp files should have cleaned up and exited by now. So
|
|
|
|
* retry before complaining. If we fail again, this is just a LOG
|
|
|
|
* condition, because it's not worth throwing an ERROR for (as
|
|
|
|
* that would crash the database and require manual intervention
|
|
|
|
* before we could get past this WAL record on restart).
|
2009-12-19 02:32:45 +01:00
|
|
|
*/
|
2010-01-12 03:42:52 +01:00
|
|
|
if (!destroy_tablespace_directories(xlrec->ts_id, true))
|
2012-02-06 20:43:58 +01:00
|
|
|
ereport(LOG,
|
2010-02-26 03:01:40 +01:00
|
|
|
(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
|
2012-06-10 21:20:04 +02:00
|
|
|
errmsg("directories for tablespace %u could not be removed",
|
|
|
|
xlrec->ts_id),
|
2012-02-06 20:43:58 +01:00
|
|
|
errhint("You can remove the directories manually if necessary.")));
|
2009-12-19 02:32:45 +01:00
|
|
|
}
|
2004-08-29 23:08:48 +02:00
|
|
|
}
|
|
|
|
else
|
|
|
|
elog(PANIC, "tblspc_redo: unknown op code %u", info);
|
|
|
|
}
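The drop branch of tblspc_redo() above follows a "try, resolve conflicts, retry, then log and continue" shape. The sketch below isolates that control flow with mock stand-ins, so the three possible outcomes are easy to see. MockTablespace, mock_destroy_directories, mock_resolve_conflict and mock_replay_drop are hypothetical names made up for illustration. They assume that conflict resolution clears leftover temp files and that a permissions problem cannot be fixed during replay.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical model of a tablespace directory on a standby. */
typedef struct MockTablespace
{
	unsigned int oid;
	int			temp_files;		/* temp files left by standby backends */
	bool		perms_ok;		/* false models wrong directory permissions */
} MockTablespace;

typedef enum
{
	DROP_OK_FIRST_TRY,
	DROP_OK_AFTER_RETRY,
	DROP_LOGGED_AND_SKIPPED
} DropOutcome;

/* Removal succeeds only if nothing is left behind and permissions allow it. */
static bool
mock_destroy_directories(const MockTablespace *ts)
{
	return ts->perms_ok && ts->temp_files == 0;
}

/* Assumed effect of conflict processing: conflicting backends clean up. */
static void
mock_resolve_conflict(MockTablespace *ts)
{
	ts->temp_files = 0;
}

/* Replay a drop: never raise an error, at worst log and keep going. */
static DropOutcome
mock_replay_drop(MockTablespace *ts)
{
	assert(ts != NULL);

	if (mock_destroy_directories(ts))
		return DROP_OK_FIRST_TRY;

	mock_resolve_conflict(ts);

	if (mock_destroy_directories(ts))
		return DROP_OK_AFTER_RETRY;

	/* Leftover directories only waste disk space, so replay continues. */
	fprintf(stderr,
			"LOG: directories for tablespace %u could not be removed\n",
			ts->oid);
	return DROP_LOGGED_AND_SKIPPED;
}
```

In this model a tablespace with leftover temp files yields DROP_OK_AFTER_RETRY, while one with bad permissions yields DROP_LOGGED_AND_SKIPPED. Returning an outcome instead of raising an error reflects the reasoning in the commit note: a failure here wastes some disk space, whereas an error would abort recovery and recur on every restart.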
|