2004-06-18 08:14:31 +02:00
|
|
|
/*-------------------------------------------------------------------------
|
|
|
|
*
|
|
|
|
* tablespace.c
|
|
|
|
* Commands to manipulate table spaces
|
|
|
|
*
|
|
|
|
* Tablespaces in PostgreSQL are designed to allow users to determine
|
|
|
|
* where the data file(s) for a given database object reside on the file
|
|
|
|
* system.
|
|
|
|
*
|
|
|
|
* A tablespace represents a directory on the file system. At tablespace
|
|
|
|
* creation time, the directory must be empty. To simplify things and
|
|
|
|
* remove the possibility of having file name conflicts, we isolate
|
|
|
|
* files within a tablespace into database-specific subdirectories.
|
|
|
|
*
|
|
|
|
* To support file access via the information given in RelFileNode, we
|
2004-06-21 03:04:45 +02:00
|
|
|
* maintain a symbolic-link map in $PGDATA/pg_tblspc. The symlinks are
|
2004-06-18 08:14:31 +02:00
|
|
|
* named by tablespace OIDs and point to the actual tablespace directories.
|
2010-01-12 03:42:52 +01:00
|
|
|
* There is also a per-cluster version directory in each tablespace.
|
2004-06-18 08:14:31 +02:00
|
|
|
* Thus the full path to an arbitrary file is
|
2010-01-12 03:42:52 +01:00
|
|
|
* $PGDATA/pg_tblspc/spcoid/PG_MAJORVER_CATVER/dboid/relfilenode
|
|
|
|
* e.g.
|
2010-02-17 05:19:41 +01:00
|
|
|
* $PGDATA/pg_tblspc/20981/PG_9.0_201002161/719849/83292814
|
2004-06-18 08:14:31 +02:00
|
|
|
*
|
2004-06-21 06:06:07 +02:00
|
|
|
* There are two tablespaces created at initdb time: pg_global (for shared
|
|
|
|
* tables) and pg_default (for everything else). For backwards compatibility
|
2004-06-18 08:14:31 +02:00
|
|
|
* and to remain functional on platforms without symlinks, these tablespaces
|
|
|
|
* are accessed specially: they are respectively
|
|
|
|
* $PGDATA/global/relfilenode
|
|
|
|
* $PGDATA/base/dboid/relfilenode
|
|
|
|
*
|
|
|
|
* To allow CREATE DATABASE to give a new database a default tablespace
|
|
|
|
* that's different from the template database's default, we make the
|
|
|
|
* provision that a zero in pg_class.reltablespace means the database's
|
|
|
|
* default tablespace. Without this, CREATE DATABASE would have to go in
|
2004-11-05 20:17:13 +01:00
|
|
|
* and munge the system catalogs of the new database.
|
2004-06-18 08:14:31 +02:00
|
|
|
*
|
|
|
|
*
|
2022-01-08 01:04:57 +01:00
|
|
|
* Portions Copyright (c) 1996-2022, PostgreSQL Global Development Group
|
2004-06-18 08:14:31 +02:00
|
|
|
* Portions Copyright (c) 1994, Regents of the University of California
|
|
|
|
*
|
|
|
|
*
|
|
|
|
* IDENTIFICATION
|
2010-09-20 22:08:53 +02:00
|
|
|
* src/backend/commands/tablespace.c
|
2004-06-18 08:14:31 +02:00
|
|
|
*
|
|
|
|
*-------------------------------------------------------------------------
|
|
|
|
*/
|
|
|
|
#include "postgres.h"
|
|
|
|
|
|
|
|
#include <unistd.h>
|
|
|
|
#include <dirent.h>
|
|
|
|
#include <sys/stat.h>
|
|
|
|
|
|
|
|
#include "access/heapam.h"
|
2012-08-30 22:15:44 +02:00
|
|
|
#include "access/htup_details.h"
|
2019-11-12 04:00:16 +01:00
|
|
|
#include "access/reloptions.h"
|
2008-05-12 02:00:54 +02:00
|
|
|
#include "access/sysattr.h"
|
tableam: Add and use scan APIs.
Too allow table accesses to be not directly dependent on heap, several
new abstractions are needed. Specifically:
1) Heap scans need to be generalized into table scans. Do this by
introducing TableScanDesc, which will be the "base class" for
individual AMs. This contains the AM independent fields from
HeapScanDesc.
The previous heap_{beginscan,rescan,endscan} et al. have been
replaced with a table_ version.
There's no direct replacement for heap_getnext(), as that returned
a HeapTuple, which is undesirable for a other AMs. Instead there's
table_scan_getnextslot(). But note that heap_getnext() lives on,
it's still used widely to access catalog tables.
This is achieved by new scan_begin, scan_end, scan_rescan,
scan_getnextslot callbacks.
2) The portion of parallel scans that's shared between backends need
to be able to do so without the user doing per-AM work. To achieve
that new parallelscan_{estimate, initialize, reinitialize}
callbacks are introduced, which operate on a new
ParallelTableScanDesc, which again can be subclassed by AMs.
As it is likely that several AMs are going to be block oriented,
block oriented callbacks that can be shared between such AMs are
provided and used by heap. table_block_parallelscan_{estimate,
intiialize, reinitialize} as callbacks, and
table_block_parallelscan_{nextpage, init} for use in AMs. These
operate on a ParallelBlockTableScanDesc.
3) Index scans need to be able to access tables to return a tuple, and
there needs to be state across individual accesses to the heap to
store state like buffers. That's now handled by introducing a
sort-of-scan IndexFetchTable, which again is intended to be
subclassed by individual AMs (for heap IndexFetchHeap).
The relevant callbacks for an AM are index_fetch_{end, begin,
reset} to create the necessary state, and index_fetch_tuple to
retrieve an indexed tuple. Note that index_fetch_tuple
implementations need to be smarter than just blindly fetching the
tuples for AMs that have optimizations similar to heap's HOT - the
currently alive tuple in the update chain needs to be fetched if
appropriate.
Similar to table_scan_getnextslot(), it's undesirable to continue
to return HeapTuples. Thus index_fetch_heap (might want to rename
that later) now accepts a slot as an argument. Core code doesn't
have a lot of call sites performing index scans without going
through the systable_* API (in contrast to loads of heap_getnext
calls and working directly with HeapTuples).
Index scans now store the result of a search in
IndexScanDesc->xs_heaptid, rather than xs_ctup->t_self. As the
target is not generally a HeapTuple anymore that seems cleaner.
To be able to sensible adapt code to use the above, two further
callbacks have been introduced:
a) slot_callbacks returns a TupleTableSlotOps* suitable for creating
slots capable of holding a tuple of the AMs
type. table_slot_callbacks() and table_slot_create() are based
upon that, but have additional logic to deal with views, foreign
tables, etc.
While this change could have been done separately, nearly all the
call sites that needed to be adapted for the rest of this commit
also would have been needed to be adapted for
table_slot_callbacks(), making separation not worthwhile.
b) tuple_satisfies_snapshot checks whether the tuple in a slot is
currently visible according to a snapshot. That's required as a few
places now don't have a buffer + HeapTuple around, but a
slot (which in heap's case internally has that information).
Additionally a few infrastructure changes were needed:
I) SysScanDesc, as used by systable_{beginscan, getnext} et al. now
internally uses a slot to keep track of tuples. While
systable_getnext() still returns HeapTuples, and will so for the
foreseeable future, the index API (see 1) above) now only deals with
slots.
The remainder, and largest part, of this commit is then adjusting all
scans in postgres to use the new APIs.
Author: Andres Freund, Haribabu Kommi, Alvaro Herrera
Discussion:
https://postgr.es/m/20180703070645.wchpu5muyto5n647@alap3.anarazel.de
https://postgr.es/m/20160812231527.GA690404@alvherre.pgsql
2019-03-11 20:46:41 +01:00
|
|
|
#include "access/tableam.h"
|
2006-07-13 18:49:20 +02:00
|
|
|
#include "access/xact.h"
|
2014-11-06 12:52:08 +01:00
|
|
|
#include "access/xloginsert.h"
|
2021-07-31 08:50:26 +02:00
|
|
|
#include "access/xlogutils.h"
|
2004-06-18 08:14:31 +02:00
|
|
|
#include "catalog/catalog.h"
|
2005-07-07 22:40:02 +02:00
|
|
|
#include "catalog/dependency.h"
|
2004-06-18 08:14:31 +02:00
|
|
|
#include "catalog/indexing.h"
|
2014-01-19 00:56:40 +01:00
|
|
|
#include "catalog/namespace.h"
|
2010-11-25 17:48:49 +01:00
|
|
|
#include "catalog/objectaccess.h"
|
2014-01-19 00:56:40 +01:00
|
|
|
#include "catalog/pg_namespace.h"
|
2004-06-18 08:14:31 +02:00
|
|
|
#include "catalog/pg_tablespace.h"
|
2006-02-12 04:22:21 +01:00
|
|
|
#include "commands/comment.h"
|
2011-07-20 19:18:24 +02:00
|
|
|
#include "commands/seclabel.h"
|
2014-01-19 00:56:40 +01:00
|
|
|
#include "commands/tablecmds.h"
|
2004-06-18 08:14:31 +02:00
|
|
|
#include "commands/tablespace.h"
|
2018-04-07 23:45:39 +02:00
|
|
|
#include "common/file_perm.h"
|
2004-06-18 08:14:31 +02:00
|
|
|
#include "miscadmin.h"
|
2007-11-15 21:36:40 +01:00
|
|
|
#include "postmaster/bgwriter.h"
|
2004-06-18 08:14:31 +02:00
|
|
|
#include "storage/fd.h"
|
2014-01-19 00:56:40 +01:00
|
|
|
#include "storage/lmgr.h"
|
Allow read only connections during recovery, known as Hot Standby.
Enabled by recovery_connections = on (default) and forcing archive recovery using a recovery.conf. Recovery processing now emulates the original transactions as they are replayed, providing full locking and MVCC behaviour for read only queries. Recovery must enter consistent state before connections are allowed, so there is a delay, typically short, before connections succeed. Replay of recovering transactions can conflict and in some cases deadlock with queries during recovery; these result in query cancellation after max_standby_delay seconds have expired. Infrastructure changes have minor effects on normal running, though introduce four new types of WAL record.
New test mode "make standbycheck" allows regression tests of static command behaviour on a standby server while in recovery. Typical and extreme dynamic behaviours have been checked via code inspection and manual testing. Few port specific behaviours have been utilised, though primary testing has been on Linux only so far.
This commit is the basic patch. Additional changes will follow in this release to enhance some aspects of behaviour, notably improved handling of conflicts, deadlock detection and query cancellation. Changes to VACUUM FULL are also required.
Simon Riggs, with significant and lengthy review by Heikki Linnakangas, including streamlined redesign of snapshot creation and two-phase commit.
Important contributions from Florian Pflug, Mark Kirkwood, Merlin Moncure, Greg Stark, Gianni Ciolli, Gabriele Bartolini, Hannu Krosing, Robert Haas, Tatsuo Ishii, Hiroyuki Yamada plus support and feedback from many other community members.
2009-12-19 02:32:45 +01:00
|
|
|
#include "storage/standby.h"
|
2004-06-18 08:14:31 +02:00
|
|
|
#include "utils/acl.h"
|
|
|
|
#include "utils/builtins.h"
|
|
|
|
#include "utils/fmgroids.h"
|
2011-09-04 07:13:16 +02:00
|
|
|
#include "utils/guc.h"
|
2014-01-19 00:56:40 +01:00
|
|
|
#include "utils/lsyscache.h"
|
2007-06-07 21:19:57 +02:00
|
|
|
#include "utils/memutils.h"
|
2008-06-19 02:46:06 +02:00
|
|
|
#include "utils/rel.h"
|
2017-01-21 02:29:53 +01:00
|
|
|
#include "utils/varlena.h"
|
2004-06-18 08:14:31 +02:00
|
|
|
|
2007-06-03 19:08:34 +02:00
|
|
|
/* GUC variables */
|
2004-11-05 20:17:13 +01:00
|
|
|
char *default_tablespace = NULL;
|
2007-06-03 19:08:34 +02:00
|
|
|
char *temp_tablespaces = NULL;
|
2022-01-14 09:27:44 +01:00
|
|
|
bool allow_in_place_tablespaces = false;
|
2004-11-05 20:17:13 +01:00
|
|
|
|
pg_upgrade: Preserve relfilenodes and tablespace OIDs.
Currently, database OIDs, relfilenodes, and tablespace OIDs can all
change when a cluster is upgraded using pg_upgrade. It seems better
to preserve them, because (1) it makes troubleshooting pg_upgrade
easier, since you don't have to do a lot of work to match up files
in the old and new clusters, (2) it allows 'rsync' to save bandwidth
when used to re-sync a cluster after an upgrade, and (3) if we ever
encrypt or sign blocks, we would likely want to use a nonce that
depends on these values.
This patch only arranges to preserve relfilenodes and tablespace
OIDs. The task of preserving database OIDs is left for another patch,
since it involves some complexities that don't exist in these cases.
Database OIDs have a similar issue, but there are some tricky points
in that case that do not apply to these cases, so that problem is left
for another patch.
Shruthi KC, based on an earlier patch from Antonin Houska, reviewed
and with some adjustments by me.
Discussion: http://postgr.es/m/CA+TgmoYgTwYcUmB=e8+hRHOFA0kkS6Kde85+UNdon6q7bt1niQ@mail.gmail.com
2022-01-17 19:32:44 +01:00
|
|
|
Oid binary_upgrade_next_pg_tablespace_oid = InvalidOid;
|
2004-11-05 20:17:13 +01:00
|
|
|
|
2010-01-12 03:42:52 +01:00
|
|
|
static void create_tablespace_directories(const char *location,
|
|
|
|
const Oid tablespaceoid);
|
|
|
|
static bool destroy_tablespace_directories(Oid tablespaceoid, bool redo);
|
2004-06-18 08:14:31 +02:00
|
|
|
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Each database using a table space is isolated into its own name space
|
|
|
|
* by a subdirectory named for the database OID. On first creation of an
|
|
|
|
* object in the tablespace, create the subdirectory. If the subdirectory
|
2010-01-07 05:05:39 +01:00
|
|
|
* already exists, fall through quietly.
|
2004-06-18 08:14:31 +02:00
|
|
|
*
|
2006-01-19 05:45:38 +01:00
|
|
|
* isRedo indicates that we are creating an object during WAL replay.
|
|
|
|
* In this case we will cope with the possibility of the tablespace
|
|
|
|
* directory not being there either --- this could happen if we are
|
2004-08-29 23:08:48 +02:00
|
|
|
* replaying an operation on a table in a subsequently-dropped tablespace.
|
|
|
|
* We handle this by making a directory in the place where the tablespace
|
|
|
|
* symlink would normally be. This isn't an exact replay of course, but
|
|
|
|
* it's the best we can do given the available information.
|
2006-03-29 17:15:43 +02:00
|
|
|
*
|
2010-01-07 05:10:39 +01:00
|
|
|
* If tablespaces are not supported, we still need it in case we have to
|
|
|
|
* re-create a database subdirectory (of $PGDATA/base) during WAL replay.
|
2004-06-18 08:14:31 +02:00
|
|
|
*/
|
|
|
|
void
|
2004-07-11 21:52:52 +02:00
|
|
|
TablespaceCreateDbspace(Oid spcNode, Oid dbNode, bool isRedo)
|
2004-06-18 08:14:31 +02:00
|
|
|
{
|
|
|
|
struct stat st;
|
|
|
|
char *dir;
|
|
|
|
|
|
|
|
/*
|
|
|
|
* The global tablespace doesn't have per-database subdirectories, so
|
|
|
|
* nothing to do for it.
|
|
|
|
*/
|
|
|
|
if (spcNode == GLOBALTABLESPACE_OID)
|
|
|
|
return;
|
|
|
|
|
|
|
|
Assert(OidIsValid(spcNode));
|
|
|
|
Assert(OidIsValid(dbNode));
|
|
|
|
|
|
|
|
dir = GetDatabasePath(dbNode, spcNode);
|
|
|
|
|
|
|
|
if (stat(dir, &st) < 0)
|
|
|
|
{
|
2010-01-07 05:10:39 +01:00
|
|
|
/* Directory does not exist? */
|
2004-06-18 08:14:31 +02:00
|
|
|
if (errno == ENOENT)
|
|
|
|
{
|
|
|
|
/*
|
2006-01-19 05:45:38 +01:00
|
|
|
* Acquire TablespaceCreateLock to ensure that no DROP TABLESPACE
|
|
|
|
* or TablespaceCreateDbspace is running concurrently.
|
2004-06-18 08:14:31 +02:00
|
|
|
*/
|
2006-01-19 05:45:38 +01:00
|
|
|
LWLockAcquire(TablespaceCreateLock, LW_EXCLUSIVE);
|
2004-06-18 08:14:31 +02:00
|
|
|
|
|
|
|
/*
|
|
|
|
* Recheck to see if someone created the directory while we were
|
|
|
|
* waiting for lock.
|
|
|
|
*/
|
|
|
|
if (stat(dir, &st) == 0 && S_ISDIR(st.st_mode))
|
|
|
|
{
|
2010-01-07 05:10:39 +01:00
|
|
|
/* Directory was created */
|
2004-06-18 08:14:31 +02:00
|
|
|
}
|
|
|
|
else
|
|
|
|
{
|
2010-01-07 05:05:39 +01:00
|
|
|
/* Directory creation failed? */
|
2018-04-07 23:45:39 +02:00
|
|
|
if (MakePGDirectory(dir) < 0)
|
2004-08-29 23:08:48 +02:00
|
|
|
{
|
|
|
|
char *parentdir;
|
|
|
|
|
2010-01-12 03:42:52 +01:00
|
|
|
/* Failure other than not exists or not in WAL replay? */
|
2004-08-29 23:08:48 +02:00
|
|
|
if (errno != ENOENT || !isRedo)
|
|
|
|
ereport(ERROR,
|
|
|
|
(errcode_for_file_access(),
|
|
|
|
errmsg("could not create directory \"%s\": %m",
|
|
|
|
dir)));
|
2010-01-07 05:10:39 +01:00
|
|
|
|
2010-01-12 03:42:52 +01:00
|
|
|
/*
|
|
|
|
* Parent directories are missing during WAL replay, so
|
|
|
|
* continue by creating simple parent directories rather
|
|
|
|
* than a symlink.
|
|
|
|
*/
|
|
|
|
|
|
|
|
/* create two parents up if not exist */
|
2004-08-29 23:08:48 +02:00
|
|
|
parentdir = pstrdup(dir);
|
|
|
|
get_parent_directory(parentdir);
|
2010-01-12 03:42:52 +01:00
|
|
|
get_parent_directory(parentdir);
|
|
|
|
/* Can't create parent and it doesn't already exist? */
|
2018-04-07 23:45:39 +02:00
|
|
|
if (MakePGDirectory(parentdir) < 0 && errno != EEXIST)
|
2010-01-12 03:42:52 +01:00
|
|
|
ereport(ERROR,
|
|
|
|
(errcode_for_file_access(),
|
|
|
|
errmsg("could not create directory \"%s\": %m",
|
|
|
|
parentdir)));
|
|
|
|
pfree(parentdir);
|
|
|
|
|
|
|
|
/* create one parent up if not exist */
|
|
|
|
parentdir = pstrdup(dir);
|
|
|
|
get_parent_directory(parentdir);
|
|
|
|
/* Can't create parent and it doesn't already exist? */
|
2018-04-07 23:45:39 +02:00
|
|
|
if (MakePGDirectory(parentdir) < 0 && errno != EEXIST)
|
2004-08-29 23:08:48 +02:00
|
|
|
ereport(ERROR,
|
|
|
|
(errcode_for_file_access(),
|
|
|
|
errmsg("could not create directory \"%s\": %m",
|
|
|
|
parentdir)));
|
|
|
|
pfree(parentdir);
|
2010-01-07 05:10:39 +01:00
|
|
|
|
2010-01-07 05:05:39 +01:00
|
|
|
/* Create database directory */
|
2018-04-07 23:45:39 +02:00
|
|
|
if (MakePGDirectory(dir) < 0)
|
2004-08-29 23:08:48 +02:00
|
|
|
ereport(ERROR,
|
|
|
|
(errcode_for_file_access(),
|
|
|
|
errmsg("could not create directory \"%s\": %m",
|
|
|
|
dir)));
|
|
|
|
}
|
2004-06-18 08:14:31 +02:00
|
|
|
}
|
|
|
|
|
2006-01-19 05:45:38 +01:00
|
|
|
LWLockRelease(TablespaceCreateLock);
|
2004-06-18 08:14:31 +02:00
|
|
|
}
|
|
|
|
else
|
|
|
|
{
|
|
|
|
ereport(ERROR,
|
|
|
|
(errcode_for_file_access(),
|
|
|
|
errmsg("could not stat directory \"%s\": %m", dir)));
|
|
|
|
}
|
|
|
|
}
|
|
|
|
else
|
|
|
|
{
|
2010-01-07 05:05:39 +01:00
|
|
|
/* Is it not a directory? */
|
2004-06-18 08:14:31 +02:00
|
|
|
if (!S_ISDIR(st.st_mode))
|
|
|
|
ereport(ERROR,
|
|
|
|
(errcode(ERRCODE_WRONG_OBJECT_TYPE),
|
|
|
|
errmsg("\"%s\" exists but is not a directory",
|
|
|
|
dir)));
|
|
|
|
}
|
|
|
|
|
|
|
|
pfree(dir);
|
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Create a table space
|
|
|
|
*
|
|
|
|
* Only superusers can create a tablespace. This seems a reasonable restriction
|
|
|
|
* since we're determining the system layout and, anyway, we probably have
|
|
|
|
* root if we're doing this kind of activity
|
|
|
|
*/
|
2012-12-29 13:55:37 +01:00
|
|
|
Oid
|
2004-06-18 08:14:31 +02:00
|
|
|
CreateTableSpace(CreateTableSpaceStmt *stmt)
|
|
|
|
{
|
|
|
|
#ifdef HAVE_SYMLINK
|
|
|
|
Relation rel;
|
|
|
|
Datum values[Natts_pg_tablespace];
|
2008-11-02 02:45:28 +01:00
|
|
|
bool nulls[Natts_pg_tablespace];
|
2004-06-18 08:14:31 +02:00
|
|
|
HeapTuple tuple;
|
|
|
|
Oid tablespaceoid;
|
|
|
|
char *location;
|
2005-06-28 07:09:14 +02:00
|
|
|
Oid ownerId;
|
2014-01-19 02:59:31 +01:00
|
|
|
Datum newOptions;
|
2022-01-14 09:27:44 +01:00
|
|
|
bool in_place;
|
2004-06-18 08:14:31 +02:00
|
|
|
|
2021-09-08 17:02:18 +02:00
|
|
|
/* Must be superuser */
|
2004-06-18 08:14:31 +02:00
|
|
|
if (!superuser())
|
|
|
|
ereport(ERROR,
|
|
|
|
(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
|
|
|
|
errmsg("permission denied to create tablespace \"%s\"",
|
|
|
|
stmt->tablespacename),
|
|
|
|
errhint("Must be superuser to create a tablespace.")));
|
|
|
|
|
|
|
|
/* However, the eventual owner of the tablespace need not be */
|
|
|
|
if (stmt->owner)
|
Allow CURRENT/SESSION_USER to be used in certain commands
Commands such as ALTER USER, ALTER GROUP, ALTER ROLE, GRANT, and the
various ALTER OBJECT / OWNER TO, as well as ad-hoc clauses related to
roles such as the AUTHORIZATION clause of CREATE SCHEMA, the FOR clause
of CREATE USER MAPPING, and the FOR ROLE clause of ALTER DEFAULT
PRIVILEGES can now take the keywords CURRENT_USER and SESSION_USER as
user specifiers in place of an explicit user name.
This commit also fixes some quite ugly handling of special standards-
mandated syntax in CREATE USER MAPPING, which in particular would fail
to work in presence of a role named "current_user".
The special role specifiers PUBLIC and NONE also have more consistent
handling now.
Also take the opportunity to add location tracking to user specifiers.
Authors: Kyotaro Horiguchi. Heavily reworked by Álvaro Herrera.
Reviewed by: Rushabh Lathia, Adam Brightwell, Marti Raudsepp.
2015-03-09 19:41:54 +01:00
|
|
|
ownerId = get_rolespec_oid(stmt->owner, false);
|
2004-06-18 08:14:31 +02:00
|
|
|
else
|
2005-06-28 07:09:14 +02:00
|
|
|
ownerId = GetUserId();
|
2004-06-18 08:14:31 +02:00
|
|
|
|
|
|
|
/* Unix-ify the offered path, and strip any trailing slashes */
|
|
|
|
location = pstrdup(stmt->location);
|
|
|
|
canonicalize_path(location);
|
|
|
|
|
|
|
|
/* disallow quotes, else CREATE DATABASE would be at risk */
|
|
|
|
if (strchr(location, '\''))
|
|
|
|
ereport(ERROR,
|
|
|
|
(errcode(ERRCODE_INVALID_NAME),
|
Wording cleanup for error messages. Also change can't -> cannot.
Standard English uses "may", "can", and "might" in different ways:
may - permission, "You may borrow my rake."
can - ability, "I can lift that log."
might - possibility, "It might rain today."
Unfortunately, in conversational English, their use is often mixed, as
in, "You may use this variable to do X", when in fact, "can" is a better
choice. Similarly, "It may crash" is better stated, "It might crash".
2007-02-01 20:10:30 +01:00
|
|
|
errmsg("tablespace location cannot contain single quotes")));
|
2004-06-18 08:14:31 +02:00
|
|
|
|
2022-01-14 09:27:44 +01:00
|
|
|
in_place = allow_in_place_tablespaces && strlen(location) == 0;
|
|
|
|
|
2004-06-18 08:14:31 +02:00
|
|
|
/*
|
|
|
|
* Allowing relative paths seems risky
|
|
|
|
*
|
2022-01-14 09:27:44 +01:00
|
|
|
* This also helps us ensure that location is not empty or whitespace,
|
|
|
|
* unless specifying a developer-only in-place tablespace.
|
2004-06-18 08:14:31 +02:00
|
|
|
*/
|
2022-01-14 09:27:44 +01:00
|
|
|
if (!in_place && !is_absolute_path(location))
|
2004-06-18 08:14:31 +02:00
|
|
|
ereport(ERROR,
|
|
|
|
(errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
|
|
|
|
errmsg("tablespace location must be an absolute path")));
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Check that location isn't too long. Remember that we're going to append
|
Rationalize common/relpath.[hc].
Commit a73018392636ce832b09b5c31f6ad1f18a4643ea created rather a mess by
putting dependencies on backend-only include files into include/common.
We really shouldn't do that. To clean it up:
* Move TABLESPACE_VERSION_DIRECTORY back to its longtime home in
catalog/catalog.h. We won't consider this symbol part of the FE/BE API.
* Push enum ForkNumber from relfilenode.h into relpath.h. We'll consider
relpath.h as the source of truth for fork numbers, since relpath.c was
already partially serving that function, and anyway relfilenode.h was
kind of a random place for that enum.
* So, relfilenode.h now includes relpath.h rather than vice-versa. This
direction of dependency is fine. (That allows most, but not quite all,
of the existing explicit #includes of relpath.h to go away again.)
* Push forkname_to_number from catalog.c to relpath.c, just to centralize
fork number stuff a bit better.
* Push GetDatabasePath from catalog.c to relpath.c; it was rather odd
that the previous commit didn't keep this together with relpath().
* To avoid needing relfilenode.h in common/, redefine the underlying
function (now called GetRelationPath) as taking separate OID arguments,
and make the APIs using RelFileNode or RelFileNodeBackend into macro
wrappers. (The macros have a potential multiple-eval risk, but none of
the existing call sites have an issue with that; one of them had such a
risk already anyway.)
* Fix failure to follow the directions when "init" fork type was added;
specifically, the errhint in forkname_to_number wasn't updated, and neither
was the SGML documentation for pg_relation_size().
* Fix tablespace-path-too-long check in CreateTableSpace() to account for
fork-name component of maximum-length pathnames. This requires putting
FORKNAMECHARS into a header file, but it was rather useless (and
actually unreferenced) where it was.
The last couple of items are potentially back-patchable bug fixes,
if anyone is sufficiently excited about them; but personally I'm not.
Per a gripe from Christoph Berg about how include/common wasn't
self-contained.
2014-04-30 23:30:50 +02:00
|
|
|
* 'PG_XXX/<dboid>/<relid>_<fork>.<nnn>'. FYI, we never actually
|
2018-04-07 23:45:39 +02:00
|
|
|
* reference the whole path here, but MakePGDirectory() uses the first two
|
|
|
|
* parts.
|
2004-06-18 08:14:31 +02:00
|
|
|
*/
|
2010-01-12 03:42:52 +01:00
|
|
|
if (strlen(location) + 1 + strlen(TABLESPACE_VERSION_DIRECTORY) + 1 +
|
Rationalize common/relpath.[hc].
Commit a73018392636ce832b09b5c31f6ad1f18a4643ea created rather a mess by
putting dependencies on backend-only include files into include/common.
We really shouldn't do that. To clean it up:
* Move TABLESPACE_VERSION_DIRECTORY back to its longtime home in
catalog/catalog.h. We won't consider this symbol part of the FE/BE API.
* Push enum ForkNumber from relfilenode.h into relpath.h. We'll consider
relpath.h as the source of truth for fork numbers, since relpath.c was
already partially serving that function, and anyway relfilenode.h was
kind of a random place for that enum.
* So, relfilenode.h now includes relpath.h rather than vice-versa. This
direction of dependency is fine. (That allows most, but not quite all,
of the existing explicit #includes of relpath.h to go away again.)
* Push forkname_to_number from catalog.c to relpath.c, just to centralize
fork number stuff a bit better.
* Push GetDatabasePath from catalog.c to relpath.c; it was rather odd
that the previous commit didn't keep this together with relpath().
* To avoid needing relfilenode.h in common/, redefine the underlying
function (now called GetRelationPath) as taking separate OID arguments,
and make the APIs using RelFileNode or RelFileNodeBackend into macro
wrappers. (The macros have a potential multiple-eval risk, but none of
the existing call sites have an issue with that; one of them had such a
risk already anyway.)
* Fix failure to follow the directions when "init" fork type was added;
specifically, the errhint in forkname_to_number wasn't updated, and neither
was the SGML documentation for pg_relation_size().
* Fix tablespace-path-too-long check in CreateTableSpace() to account for
fork-name component of maximum-length pathnames. This requires putting
FORKNAMECHARS into a header file, but it was rather useless (and
actually unreferenced) where it was.
The last couple of items are potentially back-patchable bug fixes,
if anyone is sufficiently excited about them; but personally I'm not.
Per a gripe from Christoph Berg about how include/common wasn't
self-contained.
2014-04-30 23:30:50 +02:00
|
|
|
OIDCHARS + 1 + OIDCHARS + 1 + FORKNAMECHARS + 1 + OIDCHARS > MAXPGPATH)
|
2004-06-18 08:14:31 +02:00
|
|
|
ereport(ERROR,
|
|
|
|
(errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
|
|
|
|
errmsg("tablespace location \"%s\" is too long",
|
|
|
|
location)));
|
|
|
|
|
2015-04-28 23:35:12 +02:00
|
|
|
/* Warn if the tablespace is in the data directory. */
|
|
|
|
if (path_is_prefix_of_path(DataDir, location))
|
|
|
|
ereport(WARNING,
|
|
|
|
(errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
|
|
|
|
errmsg("tablespace location should not be inside the data directory")));
|
|
|
|
|
2004-06-21 06:06:07 +02:00
|
|
|
/*
|
|
|
|
* Disallow creation of tablespaces named "pg_xxx"; we reserve this
|
|
|
|
* namespace for system purposes.
|
|
|
|
*/
|
|
|
|
if (!allowSystemTableMods && IsReservedName(stmt->tablespacename))
|
|
|
|
ereport(ERROR,
|
|
|
|
(errcode(ERRCODE_RESERVED_NAME),
|
|
|
|
errmsg("unacceptable tablespace name \"%s\"",
|
|
|
|
stmt->tablespacename),
|
|
|
|
errdetail("The prefix \"pg_\" is reserved for system tablespaces.")));
|
|
|
|
|
Add an enforcement mechanism for global object names in regression tests.
In commit 18555b132 we tentatively established a rule that regression
tests should use names containing "regression" for databases, and names
starting with "regress_" for all other globally-visible object names, so
as to circumscribe the side-effects that "make installcheck" could have
on an existing installation.
This commit adds a simple enforcement mechanism for that rule: if the code
is compiled with ENFORCE_REGRESSION_TEST_NAME_RESTRICTIONS defined, it
will emit a warning (not an error) whenever a database, role, tablespace,
subscription, or replication origin name is created that doesn't obey the
rule. Running one or more buildfarm members with that symbol defined
should be enough to catch new violations, at least in the regular
regression tests. Most TAP tests wouldn't notice such warnings, but
that's actually fine because TAP tests don't execute against an existing
server anyway.
Since it's already the case that running src/test/modules/ tests in
installcheck mode is deprecated, we can use that as a home for tests
that seem unsafe to run against an existing server, such as tests that
might have side-effects on existing roles. Document that (though this
commit doesn't in itself make it any less safe than before).
Update regress.sgml to define these restrictions more clearly, and
to clean up assorted lack-of-up-to-date-ness in its descriptions of
the available regression tests.
Discussion: https://postgr.es/m/16638.1468620817@sss.pgh.pa.us
2019-06-29 17:34:00 +02:00
|
|
|
/*
|
|
|
|
* If built with appropriate switch, whine when regression-testing
|
|
|
|
* conventions for tablespace names are violated.
|
|
|
|
*/
|
|
|
|
#ifdef ENFORCE_REGRESSION_TEST_NAME_RESTRICTIONS
|
|
|
|
if (strncmp(stmt->tablespacename, "regress_", 8) != 0)
|
|
|
|
elog(WARNING, "tablespaces created by regression test cases should have names starting with \"regress_\"");
|
|
|
|
#endif
|
|
|
|
|
2004-06-18 08:14:31 +02:00
|
|
|
/*
|
|
|
|
* Check that there is no other tablespace by this name. (The unique
|
|
|
|
* index would catch this anyway, but might as well give a friendlier
|
|
|
|
* message.)
|
|
|
|
*/
|
2010-08-05 16:45:09 +02:00
|
|
|
if (OidIsValid(get_tablespace_oid(stmt->tablespacename, true)))
|
2004-06-18 08:14:31 +02:00
|
|
|
ereport(ERROR,
|
|
|
|
(errcode(ERRCODE_DUPLICATE_OBJECT),
|
|
|
|
errmsg("tablespace \"%s\" already exists",
|
|
|
|
stmt->tablespacename)));
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Insert tuple into pg_tablespace. The purpose of doing this first is to
|
|
|
|
* lock the proposed tablename against other would-be creators. The
|
|
|
|
* insertion will roll back if we find problems below.
|
|
|
|
*/
|
2019-01-21 19:32:19 +01:00
|
|
|
rel = table_open(TableSpaceRelationId, RowExclusiveLock);
|
2004-06-18 08:14:31 +02:00
|
|
|
|
2008-11-02 02:45:28 +01:00
|
|
|
MemSet(nulls, false, sizeof(nulls));
|
2004-06-18 08:14:31 +02:00
|
|
|
|
pg_upgrade: Preserve relfilenodes and tablespace OIDs.
Currently, database OIDs, relfilenodes, and tablespace OIDs can all
change when a cluster is upgraded using pg_upgrade. It seems better
to preserve them, because (1) it makes troubleshooting pg_upgrade
easier, since you don't have to do a lot of work to match up files
in the old and new clusters, (2) it allows 'rsync' to save bandwidth
when used to re-sync a cluster after an upgrade, and (3) if we ever
encrypt or sign blocks, we would likely want to use a nonce that
depends on these values.
This patch only arranges to preserve relfilenodes and tablespace
OIDs. The task of preserving database OIDs is left for another patch,
since it involves some complexities that don't exist in these cases.
Database OIDs have a similar issue, but there are some tricky points
in that case that do not apply to these cases, so that problem is left
for another patch.
Shruthi KC, based on an earlier patch from Antonin Houska, reviewed
and with some adjustments by me.
Discussion: http://postgr.es/m/CA+TgmoYgTwYcUmB=e8+hRHOFA0kkS6Kde85+UNdon6q7bt1niQ@mail.gmail.com
2022-01-17 19:32:44 +01:00
|
|
|
if (IsBinaryUpgrade)
|
|
|
|
{
|
|
|
|
/* Use binary-upgrade override for tablespace oid */
|
|
|
|
if (!OidIsValid(binary_upgrade_next_pg_tablespace_oid))
|
|
|
|
ereport(ERROR,
|
|
|
|
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
|
|
|
|
errmsg("pg_tablespace OID value not set when in binary upgrade mode")));
|
|
|
|
|
|
|
|
tablespaceoid = binary_upgrade_next_pg_tablespace_oid;
|
|
|
|
binary_upgrade_next_pg_tablespace_oid = InvalidOid;
|
|
|
|
}
|
|
|
|
else
|
|
|
|
tablespaceoid = GetNewOidWithIndex(rel, TablespaceOidIndexId,
|
|
|
|
Anum_pg_tablespace_oid);
|
Remove WITH OIDS support, change oid catalog column visibility.
Previously tables declared WITH OIDS, including a significant fraction
of the catalog tables, stored the oid column not as a normal column,
but as part of the tuple header.
This special column was not shown by default, which was somewhat odd,
as it's often (consider e.g. pg_class.oid) one of the more important
parts of a row. Neither pg_dump nor COPY included the contents of the
oid column by default.
The fact that the oid column was not an ordinary column necessitated a
significant amount of special case code to support oid columns. That
already was painful for the existing, but upcoming work aiming to make
table storage pluggable, would have required expanding and duplicating
that "specialness" significantly.
WITH OIDS has been deprecated since 2005 (commit ff02d0a05280e0).
Remove it.
Removing includes:
- CREATE TABLE and ALTER TABLE syntax for declaring the table to be
WITH OIDS has been removed (WITH (oids[ = true]) will error out)
- pg_dump does not support dumping tables declared WITH OIDS and will
issue a warning when dumping one (and ignore the oid column).
- restoring an pg_dump archive with pg_restore will warn when
restoring a table with oid contents (and ignore the oid column)
- COPY will refuse to load binary dump that includes oids.
- pg_upgrade will error out when encountering tables declared WITH
OIDS, they have to be altered to remove the oid column first.
- Functionality to access the oid of the last inserted row (like
plpgsql's RESULT_OID, spi's SPI_lastoid, ...) has been removed.
The syntax for declaring a table WITHOUT OIDS (or WITH (oids = false)
for CREATE TABLE) is still supported. While that requires a bit of
support code, it seems unnecessary to break applications / dumps that
do not use oids, and are explicit about not using them.
The biggest user of WITH OID columns was postgres' catalog. This
commit changes all 'magic' oid columns to be columns that are normally
declared and stored. To reduce unnecessary query breakage all the
newly added columns are still named 'oid', even if a table's column
naming scheme would indicate 'reloid' or such. This obviously
requires adapting a lot code, mostly replacing oid access via
HeapTupleGetOid() with access to the underlying Form_pg_*->oid column.
The bootstrap process now assigns oids for all oid columns in
genbki.pl that do not have an explicit value (starting at the largest
oid previously used), only oids assigned later by oids will be above
FirstBootstrapObjectId. As the oid column now is a normal column the
special bootstrap syntax for oids has been removed.
Oids are not automatically assigned during insertion anymore, all
backend code explicitly assigns oids with GetNewOidWithIndex(). For
the rare case that insertions into the catalog via SQL are called for
the new pg_nextoid() function can be used (which only works on catalog
tables).
The fact that oid columns on system tables are now normal columns
means that they will be included in the set of columns expanded
by * (i.e. SELECT * FROM pg_class will now include the table's oid,
previously it did not). It'd not technically be hard to hide oid
column by default, but that'd mean confusing behavior would either
have to be carried forward forever, or it'd cause breakage down the
line.
While it's not unlikely that further adjustments are needed, the
scope/invasiveness of the patch makes it worthwhile to get merge this
now. It's painful to maintain externally, too complicated to commit
after the code code freeze, and a dependency of a number of other
patches.
Catversion bump, for obvious reasons.
Author: Andres Freund, with contributions by John Naylor
Discussion: https://postgr.es/m/20180930034810.ywp2c7awz7opzcfr@alap3.anarazel.de
2018-11-21 00:36:57 +01:00
|
|
|
values[Anum_pg_tablespace_oid - 1] = ObjectIdGetDatum(tablespaceoid);
|
2004-06-18 08:14:31 +02:00
|
|
|
values[Anum_pg_tablespace_spcname - 1] =
|
|
|
|
DirectFunctionCall1(namein, CStringGetDatum(stmt->tablespacename));
|
|
|
|
values[Anum_pg_tablespace_spcowner - 1] =
|
2005-06-28 07:09:14 +02:00
|
|
|
ObjectIdGetDatum(ownerId);
|
2008-11-02 02:45:28 +01:00
|
|
|
nulls[Anum_pg_tablespace_spcacl - 1] = true;
|
2014-01-19 02:59:31 +01:00
|
|
|
|
|
|
|
/* Generate new proposed spcoptions (text array) */
|
|
|
|
newOptions = transformRelOptions((Datum) 0,
|
|
|
|
stmt->options,
|
|
|
|
NULL, NULL, false, false);
|
|
|
|
(void) tablespace_reloptions(newOptions, true);
|
|
|
|
if (newOptions != (Datum) 0)
|
|
|
|
values[Anum_pg_tablespace_spcoptions - 1] = newOptions;
|
|
|
|
else
|
|
|
|
nulls[Anum_pg_tablespace_spcoptions - 1] = true;
|
2004-06-18 08:14:31 +02:00
|
|
|
|
2008-11-02 02:45:28 +01:00
|
|
|
tuple = heap_form_tuple(rel->rd_att, values, nulls);
|
2004-06-18 08:14:31 +02:00
|
|
|
|
Remove WITH OIDS support, change oid catalog column visibility.
Previously tables declared WITH OIDS, including a significant fraction
of the catalog tables, stored the oid column not as a normal column,
but as part of the tuple header.
This special column was not shown by default, which was somewhat odd,
as it's often (consider e.g. pg_class.oid) one of the more important
parts of a row. Neither pg_dump nor COPY included the contents of the
oid column by default.
The fact that the oid column was not an ordinary column necessitated a
significant amount of special case code to support oid columns. That
already was painful for the existing, but upcoming work aiming to make
table storage pluggable, would have required expanding and duplicating
that "specialness" significantly.
WITH OIDS has been deprecated since 2005 (commit ff02d0a05280e0).
Remove it.
Removing includes:
- CREATE TABLE and ALTER TABLE syntax for declaring the table to be
WITH OIDS has been removed (WITH (oids[ = true]) will error out)
- pg_dump does not support dumping tables declared WITH OIDS and will
issue a warning when dumping one (and ignore the oid column).
- restoring an pg_dump archive with pg_restore will warn when
restoring a table with oid contents (and ignore the oid column)
- COPY will refuse to load binary dump that includes oids.
- pg_upgrade will error out when encountering tables declared WITH
OIDS, they have to be altered to remove the oid column first.
- Functionality to access the oid of the last inserted row (like
plpgsql's RESULT_OID, spi's SPI_lastoid, ...) has been removed.
The syntax for declaring a table WITHOUT OIDS (or WITH (oids = false)
for CREATE TABLE) is still supported. While that requires a bit of
support code, it seems unnecessary to break applications / dumps that
do not use oids, and are explicit about not using them.
The biggest user of WITH OID columns was postgres' catalog. This
commit changes all 'magic' oid columns to be columns that are normally
declared and stored. To reduce unnecessary query breakage all the
newly added columns are still named 'oid', even if a table's column
naming scheme would indicate 'reloid' or such. This obviously
requires adapting a lot code, mostly replacing oid access via
HeapTupleGetOid() with access to the underlying Form_pg_*->oid column.
The bootstrap process now assigns oids for all oid columns in
genbki.pl that do not have an explicit value (starting at the largest
oid previously used), only oids assigned later by oids will be above
FirstBootstrapObjectId. As the oid column now is a normal column the
special bootstrap syntax for oids has been removed.
Oids are not automatically assigned during insertion anymore, all
backend code explicitly assigns oids with GetNewOidWithIndex(). For
the rare case that insertions into the catalog via SQL are called for
the new pg_nextoid() function can be used (which only works on catalog
tables).
The fact that oid columns on system tables are now normal columns
means that they will be included in the set of columns expanded
by * (i.e. SELECT * FROM pg_class will now include the table's oid,
previously it did not). It'd not technically be hard to hide oid
column by default, but that'd mean confusing behavior would either
have to be carried forward forever, or it'd cause breakage down the
line.
While it's not unlikely that further adjustments are needed, the
scope/invasiveness of the patch makes it worthwhile to get merge this
now. It's painful to maintain externally, too complicated to commit
after the code code freeze, and a dependency of a number of other
patches.
Catversion bump, for obvious reasons.
Author: Andres Freund, with contributions by John Naylor
Discussion: https://postgr.es/m/20180930034810.ywp2c7awz7opzcfr@alap3.anarazel.de
2018-11-21 00:36:57 +01:00
|
|
|
CatalogTupleInsert(rel, tuple);
|
2004-06-18 08:14:31 +02:00
|
|
|
|
|
|
|
heap_freetuple(tuple);
|
|
|
|
|
2005-07-07 22:40:02 +02:00
|
|
|
/* Record dependency on owner */
|
|
|
|
recordDependencyOnOwner(TableSpaceRelationId, tablespaceoid, ownerId);
|
|
|
|
|
2010-11-25 17:48:49 +01:00
|
|
|
/* Post creation hook for new tablespace */
|
2013-03-07 02:52:06 +01:00
|
|
|
InvokeObjectPostCreateHook(TableSpaceRelationId, tablespaceoid, 0);
|
2010-11-25 17:48:49 +01:00
|
|
|
|
2010-01-12 03:42:52 +01:00
|
|
|
create_tablespace_directories(location, tablespaceoid);
|
2004-06-18 08:14:31 +02:00
|
|
|
|
2004-08-29 23:08:48 +02:00
|
|
|
/* Record the filesystem change in XLOG */
|
|
|
|
{
|
|
|
|
xl_tblspc_create_rec xlrec;
|
|
|
|
|
|
|
|
xlrec.ts_id = tablespaceoid;
|
|
|
|
|
Revamp the WAL record format.
Each WAL record now carries information about the modified relation and
block(s) in a standardized format. That makes it easier to write tools that
need that information, like pg_rewind, prefetching the blocks to speed up
recovery, etc.
There's a whole new API for building WAL records, replacing the XLogRecData
chains used previously. The new API consists of XLogRegister* functions,
which are called for each buffer and chunk of data that is added to the
record. The new API also gives more control over when a full-page image is
written, by passing flags to the XLogRegisterBuffer function.
This also simplifies the XLogReadBufferForRedo() calls. The function can dig
the relation and block number from the WAL record, so they no longer need to
be passed as arguments.
For the convenience of redo routines, XLogReader now disects each WAL record
after reading it, copying the main data part and the per-block data into
MAXALIGNed buffers. The data chunks are not aligned within the WAL record,
but the redo routines can assume that the pointers returned by XLogRecGet*
functions are. Redo routines are now passed the XLogReaderState, which
contains the record in the already-disected format, instead of the plain
XLogRecord.
The new record format also makes the fixed size XLogRecord header smaller,
by removing the xl_len field. The length of the "main data" portion is now
stored at the end of the WAL record, and there's a separate header after
XLogRecord for it. The alignment padding at the end of XLogRecord is also
removed. This compansates for the fact that the new format would otherwise
be more bulky than the old format.
Reviewed by Andres Freund, Amit Kapila, Michael Paquier, Alvaro Herrera,
Fujii Masao.
2014-11-20 16:56:26 +01:00
|
|
|
XLogBeginInsert();
|
|
|
|
XLogRegisterData((char *) &xlrec,
|
|
|
|
offsetof(xl_tblspc_create_rec, ts_path));
|
|
|
|
XLogRegisterData((char *) location, strlen(location) + 1);
|
2004-08-29 23:08:48 +02:00
|
|
|
|
Revamp the WAL record format.
Each WAL record now carries information about the modified relation and
block(s) in a standardized format. That makes it easier to write tools that
need that information, like pg_rewind, prefetching the blocks to speed up
recovery, etc.
There's a whole new API for building WAL records, replacing the XLogRecData
chains used previously. The new API consists of XLogRegister* functions,
which are called for each buffer and chunk of data that is added to the
record. The new API also gives more control over when a full-page image is
written, by passing flags to the XLogRegisterBuffer function.
This also simplifies the XLogReadBufferForRedo() calls. The function can dig
the relation and block number from the WAL record, so they no longer need to
be passed as arguments.
For the convenience of redo routines, XLogReader now disects each WAL record
after reading it, copying the main data part and the per-block data into
MAXALIGNed buffers. The data chunks are not aligned within the WAL record,
but the redo routines can assume that the pointers returned by XLogRecGet*
functions are. Redo routines are now passed the XLogReaderState, which
contains the record in the already-disected format, instead of the plain
XLogRecord.
The new record format also makes the fixed size XLogRecord header smaller,
by removing the xl_len field. The length of the "main data" portion is now
stored at the end of the WAL record, and there's a separate header after
XLogRecord for it. The alignment padding at the end of XLogRecord is also
removed. This compansates for the fact that the new format would otherwise
be more bulky than the old format.
Reviewed by Andres Freund, Amit Kapila, Michael Paquier, Alvaro Herrera,
Fujii Masao.
2014-11-20 16:56:26 +01:00
|
|
|
(void) XLogInsert(RM_TBLSPC_ID, XLOG_TBLSPC_CREATE);
|
2004-08-29 23:08:48 +02:00
|
|
|
}
|
|
|
|
|
2007-08-02 00:45:09 +02:00
|
|
|
/*
|
|
|
|
* Force synchronous commit, to minimize the window between creating the
|
|
|
|
* symlink on-disk and marking the transaction committed. It's not great
|
|
|
|
* that there is any window at all, but definitely we don't want to make
|
|
|
|
* it larger than necessary.
|
|
|
|
*/
|
|
|
|
ForceSyncCommit();
|
|
|
|
|
2004-06-18 08:14:31 +02:00
|
|
|
pfree(location);
|
|
|
|
|
2004-08-29 23:08:48 +02:00
|
|
|
/* We keep the lock on pg_tablespace until commit */
|
2019-01-21 19:32:19 +01:00
|
|
|
table_close(rel, NoLock);
|
2015-09-05 22:15:38 +02:00
|
|
|
|
|
|
|
return tablespaceoid;
|
2004-06-18 08:14:31 +02:00
|
|
|
#else /* !HAVE_SYMLINK */
|
|
|
|
ereport(ERROR,
|
|
|
|
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
|
|
|
|
errmsg("tablespaces are not supported on this platform")));
|
2015-09-05 22:15:38 +02:00
|
|
|
return InvalidOid; /* keep compiler quiet */
|
2004-06-18 08:14:31 +02:00
|
|
|
#endif /* HAVE_SYMLINK */
|
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Drop a table space
|
|
|
|
*
|
|
|
|
* Be careful to check that the tablespace is empty.
|
|
|
|
*/
|
|
|
|
void
|
|
|
|
DropTableSpace(DropTableSpaceStmt *stmt)
|
|
|
|
{
|
|
|
|
#ifdef HAVE_SYMLINK
|
|
|
|
char *tablespacename = stmt->tablespacename;
|
tableam: Add and use scan APIs.
Too allow table accesses to be not directly dependent on heap, several
new abstractions are needed. Specifically:
1) Heap scans need to be generalized into table scans. Do this by
introducing TableScanDesc, which will be the "base class" for
individual AMs. This contains the AM independent fields from
HeapScanDesc.
The previous heap_{beginscan,rescan,endscan} et al. have been
replaced with a table_ version.
There's no direct replacement for heap_getnext(), as that returned
a HeapTuple, which is undesirable for a other AMs. Instead there's
table_scan_getnextslot(). But note that heap_getnext() lives on,
it's still used widely to access catalog tables.
This is achieved by new scan_begin, scan_end, scan_rescan,
scan_getnextslot callbacks.
2) The portion of parallel scans that's shared between backends need
to be able to do so without the user doing per-AM work. To achieve
that new parallelscan_{estimate, initialize, reinitialize}
callbacks are introduced, which operate on a new
ParallelTableScanDesc, which again can be subclassed by AMs.
As it is likely that several AMs are going to be block oriented,
block oriented callbacks that can be shared between such AMs are
provided and used by heap. table_block_parallelscan_{estimate,
intiialize, reinitialize} as callbacks, and
table_block_parallelscan_{nextpage, init} for use in AMs. These
operate on a ParallelBlockTableScanDesc.
3) Index scans need to be able to access tables to return a tuple, and
there needs to be state across individual accesses to the heap to
store state like buffers. That's now handled by introducing a
sort-of-scan IndexFetchTable, which again is intended to be
subclassed by individual AMs (for heap IndexFetchHeap).
The relevant callbacks for an AM are index_fetch_{end, begin,
reset} to create the necessary state, and index_fetch_tuple to
retrieve an indexed tuple. Note that index_fetch_tuple
implementations need to be smarter than just blindly fetching the
tuples for AMs that have optimizations similar to heap's HOT - the
currently alive tuple in the update chain needs to be fetched if
appropriate.
Similar to table_scan_getnextslot(), it's undesirable to continue
to return HeapTuples. Thus index_fetch_heap (might want to rename
that later) now accepts a slot as an argument. Core code doesn't
have a lot of call sites performing index scans without going
through the systable_* API (in contrast to loads of heap_getnext
calls and working directly with HeapTuples).
Index scans now store the result of a search in
IndexScanDesc->xs_heaptid, rather than xs_ctup->t_self. As the
target is not generally a HeapTuple anymore that seems cleaner.
To be able to sensible adapt code to use the above, two further
callbacks have been introduced:
a) slot_callbacks returns a TupleTableSlotOps* suitable for creating
slots capable of holding a tuple of the AMs
type. table_slot_callbacks() and table_slot_create() are based
upon that, but have additional logic to deal with views, foreign
tables, etc.
While this change could have been done separately, nearly all the
call sites that needed to be adapted for the rest of this commit
also would have been needed to be adapted for
table_slot_callbacks(), making separation not worthwhile.
b) tuple_satisfies_snapshot checks whether the tuple in a slot is
currently visible according to a snapshot. That's required as a few
places now don't have a buffer + HeapTuple around, but a
slot (which in heap's case internally has that information).
Additionally a few infrastructure changes were needed:
I) SysScanDesc, as used by systable_{beginscan, getnext} et al. now
internally uses a slot to keep track of tuples. While
systable_getnext() still returns HeapTuples, and will so for the
foreseeable future, the index API (see 1) above) now only deals with
slots.
The remainder, and largest part, of this commit is then adjusting all
scans in postgres to use the new APIs.
Author: Andres Freund, Haribabu Kommi, Alvaro Herrera
Discussion:
https://postgr.es/m/20180703070645.wchpu5muyto5n647@alap3.anarazel.de
https://postgr.es/m/20160812231527.GA690404@alvherre.pgsql
2019-03-11 20:46:41 +01:00
|
|
|
TableScanDesc scandesc;
|
2004-06-18 08:14:31 +02:00
|
|
|
Relation rel;
|
|
|
|
HeapTuple tuple;
|
Remove WITH OIDS support, change oid catalog column visibility.
Previously tables declared WITH OIDS, including a significant fraction
of the catalog tables, stored the oid column not as a normal column,
but as part of the tuple header.
This special column was not shown by default, which was somewhat odd,
as it's often (consider e.g. pg_class.oid) one of the more important
parts of a row. Neither pg_dump nor COPY included the contents of the
oid column by default.
The fact that the oid column was not an ordinary column necessitated a
significant amount of special case code to support oid columns. That
already was painful for the existing, but upcoming work aiming to make
table storage pluggable, would have required expanding and duplicating
that "specialness" significantly.
WITH OIDS has been deprecated since 2005 (commit ff02d0a05280e0).
Remove it.
Removing includes:
- CREATE TABLE and ALTER TABLE syntax for declaring the table to be
WITH OIDS has been removed (WITH (oids[ = true]) will error out)
- pg_dump does not support dumping tables declared WITH OIDS and will
issue a warning when dumping one (and ignore the oid column).
- restoring an pg_dump archive with pg_restore will warn when
restoring a table with oid contents (and ignore the oid column)
- COPY will refuse to load binary dump that includes oids.
- pg_upgrade will error out when encountering tables declared WITH
OIDS, they have to be altered to remove the oid column first.
- Functionality to access the oid of the last inserted row (like
plpgsql's RESULT_OID, spi's SPI_lastoid, ...) has been removed.
The syntax for declaring a table WITHOUT OIDS (or WITH (oids = false)
for CREATE TABLE) is still supported. While that requires a bit of
support code, it seems unnecessary to break applications / dumps that
do not use oids, and are explicit about not using them.
The biggest user of WITH OID columns was postgres' catalog. This
commit changes all 'magic' oid columns to be columns that are normally
declared and stored. To reduce unnecessary query breakage all the
newly added columns are still named 'oid', even if a table's column
naming scheme would indicate 'reloid' or such. This obviously
requires adapting a lot code, mostly replacing oid access via
HeapTupleGetOid() with access to the underlying Form_pg_*->oid column.
The bootstrap process now assigns oids for all oid columns in
genbki.pl that do not have an explicit value (starting at the largest
oid previously used), only oids assigned later by oids will be above
FirstBootstrapObjectId. As the oid column now is a normal column the
special bootstrap syntax for oids has been removed.
Oids are not automatically assigned during insertion anymore, all
backend code explicitly assigns oids with GetNewOidWithIndex(). For
the rare case that insertions into the catalog via SQL are called for
the new pg_nextoid() function can be used (which only works on catalog
tables).
The fact that oid columns on system tables are now normal columns
means that they will be included in the set of columns expanded
by * (i.e. SELECT * FROM pg_class will now include the table's oid,
previously it did not). It'd not technically be hard to hide oid
column by default, but that'd mean confusing behavior would either
have to be carried forward forever, or it'd cause breakage down the
line.
While it's not unlikely that further adjustments are needed, the
scope/invasiveness of the patch makes it worthwhile to get merge this
now. It's painful to maintain externally, too complicated to commit
after the code code freeze, and a dependency of a number of other
patches.
Catversion bump, for obvious reasons.
Author: Andres Freund, with contributions by John Naylor
Discussion: https://postgr.es/m/20180930034810.ywp2c7awz7opzcfr@alap3.anarazel.de
2018-11-21 00:36:57 +01:00
|
|
|
Form_pg_tablespace spcform;
|
2004-06-18 08:14:31 +02:00
|
|
|
ScanKeyData entry[1];
|
|
|
|
Oid tablespaceoid;
|
Prevent drop of tablespaces used by partitioned relations
When a tablespace is used in a partitioned relation (per commits
ca4103025dfe in pg12 for tables and 33e6c34c3267 in pg11 for indexes),
it is possible to drop the tablespace, potentially causing various
problems. One such was reported in bug #16577, where a rewriting ALTER
TABLE causes a server crash.
Protect against this by using pg_shdepend to keep track of tablespaces
when used for relations that don't keep physical files; we now abort a
tablespace if we see that the tablespace is referenced from any
partitioned relations.
Backpatch this to 11, where this problem has been latent all along. We
don't try to create pg_shdepend entries for existing partitioned
indexes/tables, but any ones that are modified going forward will be
protected.
Note slight behavior change: when trying to drop a tablespace that
contains both regular tables as well as partitioned ones, you'd
previously get ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE and now you'll
get ERRCODE_DEPENDENT_OBJECTS_STILL_EXIST. Arguably, the latter is more
correct.
It is possible to add protecting pg_shdepend entries for existing
tables/indexes, by doing
ALTER TABLE ONLY some_partitioned_table SET TABLESPACE pg_default;
ALTER TABLE ONLY some_partitioned_table SET TABLESPACE original_tablespace;
for each partitioned table/index that is not in the database default
tablespace. Because these partitioned objects do not have storage, no
file needs to be actually moved, so it shouldn't take more time than
what's required to acquire locks.
This query can be used to search for such relations:
SELECT ... FROM pg_class WHERE relkind IN ('p', 'I') AND reltablespace <> 0
Reported-by: Alexander Lakhin <exclusion@gmail.com>
Discussion: https://postgr.es/m/16577-881633a9f9894fd5@postgresql.org
Author: Álvaro Herrera <alvherre@alvh.no-ip.org>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
2021-01-14 19:32:14 +01:00
|
|
|
char *detail;
|
|
|
|
char *detail_log;
|
2004-06-18 08:14:31 +02:00
|
|
|
|
|
|
|
/*
|
|
|
|
* Find the target tuple
|
|
|
|
*/
|
2019-01-21 19:32:19 +01:00
|
|
|
rel = table_open(TableSpaceRelationId, RowExclusiveLock);
|
2006-01-19 05:45:38 +01:00
|
|
|
|
2004-06-18 08:14:31 +02:00
|
|
|
ScanKeyInit(&entry[0],
|
|
|
|
Anum_pg_tablespace_spcname,
|
|
|
|
BTEqualStrategyNumber, F_NAMEEQ,
|
|
|
|
CStringGetDatum(tablespacename));
|
tableam: Add and use scan APIs.
Too allow table accesses to be not directly dependent on heap, several
new abstractions are needed. Specifically:
1) Heap scans need to be generalized into table scans. Do this by
introducing TableScanDesc, which will be the "base class" for
individual AMs. This contains the AM independent fields from
HeapScanDesc.
The previous heap_{beginscan,rescan,endscan} et al. have been
replaced with a table_ version.
There's no direct replacement for heap_getnext(), as that returned
a HeapTuple, which is undesirable for a other AMs. Instead there's
table_scan_getnextslot(). But note that heap_getnext() lives on,
it's still used widely to access catalog tables.
This is achieved by new scan_begin, scan_end, scan_rescan,
scan_getnextslot callbacks.
2) The portion of parallel scans that's shared between backends need
to be able to do so without the user doing per-AM work. To achieve
that new parallelscan_{estimate, initialize, reinitialize}
callbacks are introduced, which operate on a new
ParallelTableScanDesc, which again can be subclassed by AMs.
As it is likely that several AMs are going to be block oriented,
block oriented callbacks that can be shared between such AMs are
provided and used by heap. table_block_parallelscan_{estimate,
intiialize, reinitialize} as callbacks, and
table_block_parallelscan_{nextpage, init} for use in AMs. These
operate on a ParallelBlockTableScanDesc.
3) Index scans need to be able to access tables to return a tuple, and
there needs to be state across individual accesses to the heap to
store state like buffers. That's now handled by introducing a
sort-of-scan IndexFetchTable, which again is intended to be
subclassed by individual AMs (for heap IndexFetchHeap).
The relevant callbacks for an AM are index_fetch_{end, begin,
reset} to create the necessary state, and index_fetch_tuple to
retrieve an indexed tuple. Note that index_fetch_tuple
implementations need to be smarter than just blindly fetching the
tuples for AMs that have optimizations similar to heap's HOT - the
currently alive tuple in the update chain needs to be fetched if
appropriate.
Similar to table_scan_getnextslot(), it's undesirable to continue
to return HeapTuples. Thus index_fetch_heap (might want to rename
that later) now accepts a slot as an argument. Core code doesn't
have a lot of call sites performing index scans without going
through the systable_* API (in contrast to loads of heap_getnext
calls and working directly with HeapTuples).
Index scans now store the result of a search in
IndexScanDesc->xs_heaptid, rather than xs_ctup->t_self. As the
target is not generally a HeapTuple anymore that seems cleaner.
To be able to sensible adapt code to use the above, two further
callbacks have been introduced:
a) slot_callbacks returns a TupleTableSlotOps* suitable for creating
slots capable of holding a tuple of the AMs
type. table_slot_callbacks() and table_slot_create() are based
upon that, but have additional logic to deal with views, foreign
tables, etc.
While this change could have been done separately, nearly all the
call sites that needed to be adapted for the rest of this commit
also would have been needed to be adapted for
table_slot_callbacks(), making separation not worthwhile.
b) tuple_satisfies_snapshot checks whether the tuple in a slot is
currently visible according to a snapshot. That's required as a few
places now don't have a buffer + HeapTuple around, but a
slot (which in heap's case internally has that information).
Additionally a few infrastructure changes were needed:
I) SysScanDesc, as used by systable_{beginscan, getnext} et al. now
internally uses a slot to keep track of tuples. While
systable_getnext() still returns HeapTuples, and will so for the
foreseeable future, the index API (see 1) above) now only deals with
slots.
The remainder, and largest part, of this commit is then adjusting all
scans in postgres to use the new APIs.
Author: Andres Freund, Haribabu Kommi, Alvaro Herrera
Discussion:
https://postgr.es/m/20180703070645.wchpu5muyto5n647@alap3.anarazel.de
https://postgr.es/m/20160812231527.GA690404@alvherre.pgsql
2019-03-11 20:46:41 +01:00
|
|
|
scandesc = table_beginscan_catalog(rel, 1, entry);
|
2004-06-18 08:14:31 +02:00
|
|
|
tuple = heap_getnext(scandesc, ForwardScanDirection);
|
|
|
|
|
|
|
|
if (!HeapTupleIsValid(tuple))
|
2006-06-16 22:23:45 +02:00
|
|
|
{
|
|
|
|
if (!stmt->missing_ok)
|
|
|
|
{
|
|
|
|
ereport(ERROR,
|
|
|
|
(errcode(ERRCODE_UNDEFINED_OBJECT),
|
|
|
|
errmsg("tablespace \"%s\" does not exist",
|
|
|
|
tablespacename)));
|
|
|
|
}
|
|
|
|
else
|
|
|
|
{
|
|
|
|
ereport(NOTICE,
|
2006-10-03 23:21:36 +02:00
|
|
|
(errmsg("tablespace \"%s\" does not exist, skipping",
|
2006-06-16 22:23:45 +02:00
|
|
|
tablespacename)));
|
tableam: Add and use scan APIs.
Too allow table accesses to be not directly dependent on heap, several
new abstractions are needed. Specifically:
1) Heap scans need to be generalized into table scans. Do this by
introducing TableScanDesc, which will be the "base class" for
individual AMs. This contains the AM independent fields from
HeapScanDesc.
The previous heap_{beginscan,rescan,endscan} et al. have been
replaced with a table_ version.
There's no direct replacement for heap_getnext(), as that returned
a HeapTuple, which is undesirable for a other AMs. Instead there's
table_scan_getnextslot(). But note that heap_getnext() lives on,
it's still used widely to access catalog tables.
This is achieved by new scan_begin, scan_end, scan_rescan,
scan_getnextslot callbacks.
2) The portion of parallel scans that's shared between backends need
to be able to do so without the user doing per-AM work. To achieve
that new parallelscan_{estimate, initialize, reinitialize}
callbacks are introduced, which operate on a new
ParallelTableScanDesc, which again can be subclassed by AMs.
As it is likely that several AMs are going to be block oriented,
block oriented callbacks that can be shared between such AMs are
provided and used by heap. table_block_parallelscan_{estimate,
intiialize, reinitialize} as callbacks, and
table_block_parallelscan_{nextpage, init} for use in AMs. These
operate on a ParallelBlockTableScanDesc.
3) Index scans need to be able to access tables to return a tuple, and
there needs to be state across individual accesses to the heap to
store state like buffers. That's now handled by introducing a
sort-of-scan IndexFetchTable, which again is intended to be
subclassed by individual AMs (for heap IndexFetchHeap).
The relevant callbacks for an AM are index_fetch_{end, begin,
reset} to create the necessary state, and index_fetch_tuple to
retrieve an indexed tuple. Note that index_fetch_tuple
implementations need to be smarter than just blindly fetching the
tuples for AMs that have optimizations similar to heap's HOT - the
currently alive tuple in the update chain needs to be fetched if
appropriate.
Similar to table_scan_getnextslot(), it's undesirable to continue
to return HeapTuples. Thus index_fetch_heap (might want to rename
that later) now accepts a slot as an argument. Core code doesn't
have a lot of call sites performing index scans without going
through the systable_* API (in contrast to loads of heap_getnext
calls and working directly with HeapTuples).
Index scans now store the result of a search in
IndexScanDesc->xs_heaptid, rather than xs_ctup->t_self. As the
target is not generally a HeapTuple anymore that seems cleaner.
To be able to sensible adapt code to use the above, two further
callbacks have been introduced:
a) slot_callbacks returns a TupleTableSlotOps* suitable for creating
slots capable of holding a tuple of the AMs
type. table_slot_callbacks() and table_slot_create() are based
upon that, but have additional logic to deal with views, foreign
tables, etc.
While this change could have been done separately, nearly all the
call sites that needed to be adapted for the rest of this commit
also would have been needed to be adapted for
table_slot_callbacks(), making separation not worthwhile.
b) tuple_satisfies_snapshot checks whether the tuple in a slot is
currently visible according to a snapshot. That's required as a few
places now don't have a buffer + HeapTuple around, but a
slot (which in heap's case internally has that information).
Additionally a few infrastructure changes were needed:
I) SysScanDesc, as used by systable_{beginscan, getnext} et al. now
internally uses a slot to keep track of tuples. While
systable_getnext() still returns HeapTuples, and will so for the
foreseeable future, the index API (see 1) above) now only deals with
slots.
The remainder, and largest part, of this commit is then adjusting all
scans in postgres to use the new APIs.
Author: Andres Freund, Haribabu Kommi, Alvaro Herrera
Discussion:
https://postgr.es/m/20180703070645.wchpu5muyto5n647@alap3.anarazel.de
https://postgr.es/m/20160812231527.GA690404@alvherre.pgsql
2019-03-11 20:46:41 +01:00
|
|
|
table_endscan(scandesc);
|
2019-01-21 19:32:19 +01:00
|
|
|
table_close(rel, NoLock);
|
2006-06-16 22:23:45 +02:00
|
|
|
}
|
|
|
|
return;
|
|
|
|
}
|
2004-06-18 08:14:31 +02:00
|
|
|
|
Remove WITH OIDS support, change oid catalog column visibility.
Previously tables declared WITH OIDS, including a significant fraction
of the catalog tables, stored the oid column not as a normal column,
but as part of the tuple header.
This special column was not shown by default, which was somewhat odd,
as it's often (consider e.g. pg_class.oid) one of the more important
parts of a row. Neither pg_dump nor COPY included the contents of the
oid column by default.
The fact that the oid column was not an ordinary column necessitated a
significant amount of special case code to support oid columns. That
already was painful for the existing, but upcoming work aiming to make
table storage pluggable, would have required expanding and duplicating
that "specialness" significantly.
WITH OIDS has been deprecated since 2005 (commit ff02d0a05280e0).
Remove it.
Removing includes:
- CREATE TABLE and ALTER TABLE syntax for declaring the table to be
WITH OIDS has been removed (WITH (oids[ = true]) will error out)
- pg_dump does not support dumping tables declared WITH OIDS and will
issue a warning when dumping one (and ignore the oid column).
- restoring an pg_dump archive with pg_restore will warn when
restoring a table with oid contents (and ignore the oid column)
- COPY will refuse to load binary dump that includes oids.
- pg_upgrade will error out when encountering tables declared WITH
OIDS, they have to be altered to remove the oid column first.
- Functionality to access the oid of the last inserted row (like
plpgsql's RESULT_OID, spi's SPI_lastoid, ...) has been removed.
The syntax for declaring a table WITHOUT OIDS (or WITH (oids = false)
for CREATE TABLE) is still supported. While that requires a bit of
support code, it seems unnecessary to break applications / dumps that
do not use oids, and are explicit about not using them.
The biggest user of WITH OID columns was postgres' catalog. This
commit changes all 'magic' oid columns to be columns that are normally
declared and stored. To reduce unnecessary query breakage all the
newly added columns are still named 'oid', even if a table's column
naming scheme would indicate 'reloid' or such. This obviously
requires adapting a lot code, mostly replacing oid access via
HeapTupleGetOid() with access to the underlying Form_pg_*->oid column.
The bootstrap process now assigns oids for all oid columns in
genbki.pl that do not have an explicit value (starting at the largest
oid previously used), only oids assigned later by oids will be above
FirstBootstrapObjectId. As the oid column now is a normal column the
special bootstrap syntax for oids has been removed.
Oids are not automatically assigned during insertion anymore, all
backend code explicitly assigns oids with GetNewOidWithIndex(). For
the rare case that insertions into the catalog via SQL are called for
the new pg_nextoid() function can be used (which only works on catalog
tables).
The fact that oid columns on system tables are now normal columns
means that they will be included in the set of columns expanded
by * (i.e. SELECT * FROM pg_class will now include the table's oid,
previously it did not). It'd not technically be hard to hide oid
column by default, but that'd mean confusing behavior would either
have to be carried forward forever, or it'd cause breakage down the
line.
While it's not unlikely that further adjustments are needed, the
scope/invasiveness of the patch makes it worthwhile to get merge this
now. It's painful to maintain externally, too complicated to commit
after the code code freeze, and a dependency of a number of other
patches.
Catversion bump, for obvious reasons.
Author: Andres Freund, with contributions by John Naylor
Discussion: https://postgr.es/m/20180930034810.ywp2c7awz7opzcfr@alap3.anarazel.de
2018-11-21 00:36:57 +01:00
|
|
|
spcform = (Form_pg_tablespace) GETSTRUCT(tuple);
|
|
|
|
tablespaceoid = spcform->oid;
|
2004-06-18 08:14:31 +02:00
|
|
|
|
2005-06-28 07:09:14 +02:00
|
|
|
/* Must be tablespace owner */
|
|
|
|
if (!pg_tablespace_ownercheck(tablespaceoid, GetUserId()))
|
2017-12-02 15:26:34 +01:00
|
|
|
aclcheck_error(ACLCHECK_NOT_OWNER, OBJECT_TABLESPACE,
|
2004-06-18 08:14:31 +02:00
|
|
|
tablespacename);
|
|
|
|
|
|
|
|
/* Disallow drop of the standard tablespaces, even by superuser */
|
2021-07-15 17:41:47 +02:00
|
|
|
if (IsPinnedObject(TableSpaceRelationId, tablespaceoid))
|
2017-12-02 15:26:34 +01:00
|
|
|
aclcheck_error(ACLCHECK_NO_PRIV, OBJECT_TABLESPACE,
|
2004-06-18 08:14:31 +02:00
|
|
|
tablespacename);
|
|
|
|
|
Prevent drop of tablespaces used by partitioned relations
When a tablespace is used in a partitioned relation (per commits
ca4103025dfe in pg12 for tables and 33e6c34c3267 in pg11 for indexes),
it is possible to drop the tablespace, potentially causing various
problems. One such was reported in bug #16577, where a rewriting ALTER
TABLE causes a server crash.
Protect against this by using pg_shdepend to keep track of tablespaces
when used for relations that don't keep physical files; we now abort a
tablespace if we see that the tablespace is referenced from any
partitioned relations.
Backpatch this to 11, where this problem has been latent all along. We
don't try to create pg_shdepend entries for existing partitioned
indexes/tables, but any ones that are modified going forward will be
protected.
Note slight behavior change: when trying to drop a tablespace that
contains both regular tables as well as partitioned ones, you'd
previously get ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE and now you'll
get ERRCODE_DEPENDENT_OBJECTS_STILL_EXIST. Arguably, the latter is more
correct.
It is possible to add protecting pg_shdepend entries for existing
tables/indexes, by doing
ALTER TABLE ONLY some_partitioned_table SET TABLESPACE pg_default;
ALTER TABLE ONLY some_partitioned_table SET TABLESPACE original_tablespace;
for each partitioned table/index that is not in the database default
tablespace. Because these partitioned objects do not have storage, no
file needs to be actually moved, so it shouldn't take more time than
what's required to acquire locks.
This query can be used to search for such relations:
SELECT ... FROM pg_class WHERE relkind IN ('p', 'I') AND reltablespace <> 0
Reported-by: Alexander Lakhin <exclusion@gmail.com>
Discussion: https://postgr.es/m/16577-881633a9f9894fd5@postgresql.org
Author: Álvaro Herrera <alvherre@alvh.no-ip.org>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
2021-01-14 19:32:14 +01:00
|
|
|
/* Check for pg_shdepend entries depending on this tablespace */
|
|
|
|
if (checkSharedDependencies(TableSpaceRelationId, tablespaceoid,
|
|
|
|
&detail, &detail_log))
|
|
|
|
ereport(ERROR,
|
|
|
|
(errcode(ERRCODE_DEPENDENT_OBJECTS_STILL_EXIST),
|
|
|
|
errmsg("tablespace \"%s\" cannot be dropped because some objects depend on it",
|
|
|
|
tablespacename),
|
|
|
|
errdetail_internal("%s", detail),
|
|
|
|
errdetail_log("%s", detail_log)));
|
|
|
|
|
2012-03-09 20:34:56 +01:00
|
|
|
/* DROP hook for the tablespace being removed */
|
2013-03-07 02:52:06 +01:00
|
|
|
InvokeObjectDropHook(TableSpaceRelationId, tablespaceoid, 0);
|
2012-03-09 20:34:56 +01:00
|
|
|
|
2004-08-29 23:08:48 +02:00
|
|
|
/*
|
|
|
|
* Remove the pg_tablespace tuple (this will roll back if we fail below)
|
|
|
|
*/
|
2017-02-01 22:13:30 +01:00
|
|
|
CatalogTupleDelete(rel, &tuple->t_self);
|
2004-08-29 23:08:48 +02:00
|
|
|
|
tableam: Add and use scan APIs.
Too allow table accesses to be not directly dependent on heap, several
new abstractions are needed. Specifically:
1) Heap scans need to be generalized into table scans. Do this by
introducing TableScanDesc, which will be the "base class" for
individual AMs. This contains the AM independent fields from
HeapScanDesc.
The previous heap_{beginscan,rescan,endscan} et al. have been
replaced with a table_ version.
There's no direct replacement for heap_getnext(), as that returned
a HeapTuple, which is undesirable for a other AMs. Instead there's
table_scan_getnextslot(). But note that heap_getnext() lives on,
it's still used widely to access catalog tables.
This is achieved by new scan_begin, scan_end, scan_rescan,
scan_getnextslot callbacks.
2) The portion of parallel scans that's shared between backends need
to be able to do so without the user doing per-AM work. To achieve
that new parallelscan_{estimate, initialize, reinitialize}
callbacks are introduced, which operate on a new
ParallelTableScanDesc, which again can be subclassed by AMs.
As it is likely that several AMs are going to be block oriented,
block oriented callbacks that can be shared between such AMs are
provided and used by heap. table_block_parallelscan_{estimate,
intiialize, reinitialize} as callbacks, and
table_block_parallelscan_{nextpage, init} for use in AMs. These
operate on a ParallelBlockTableScanDesc.
3) Index scans need to be able to access tables to return a tuple, and
there needs to be state across individual accesses to the heap to
store state like buffers. That's now handled by introducing a
sort-of-scan IndexFetchTable, which again is intended to be
subclassed by individual AMs (for heap IndexFetchHeap).
The relevant callbacks for an AM are index_fetch_{end, begin,
reset} to create the necessary state, and index_fetch_tuple to
retrieve an indexed tuple. Note that index_fetch_tuple
implementations need to be smarter than just blindly fetching the
tuples for AMs that have optimizations similar to heap's HOT - the
currently alive tuple in the update chain needs to be fetched if
appropriate.
Similar to table_scan_getnextslot(), it's undesirable to continue
to return HeapTuples. Thus index_fetch_heap (might want to rename
that later) now accepts a slot as an argument. Core code doesn't
have a lot of call sites performing index scans without going
through the systable_* API (in contrast to loads of heap_getnext
calls and working directly with HeapTuples).
Index scans now store the result of a search in
IndexScanDesc->xs_heaptid, rather than xs_ctup->t_self. As the
target is not generally a HeapTuple anymore that seems cleaner.
To be able to sensible adapt code to use the above, two further
callbacks have been introduced:
a) slot_callbacks returns a TupleTableSlotOps* suitable for creating
slots capable of holding a tuple of the AMs
type. table_slot_callbacks() and table_slot_create() are based
upon that, but have additional logic to deal with views, foreign
tables, etc.
While this change could have been done separately, nearly all the
call sites that needed to be adapted for the rest of this commit
also would have been needed to be adapted for
table_slot_callbacks(), making separation not worthwhile.
b) tuple_satisfies_snapshot checks whether the tuple in a slot is
currently visible according to a snapshot. That's required as a few
places now don't have a buffer + HeapTuple around, but a
slot (which in heap's case internally has that information).
Additionally a few infrastructure changes were needed:
I) SysScanDesc, as used by systable_{beginscan, getnext} et al. now
internally uses a slot to keep track of tuples. While
systable_getnext() still returns HeapTuples, and will so for the
foreseeable future, the index API (see 1) above) now only deals with
slots.
The remainder, and largest part, of this commit is then adjusting all
scans in postgres to use the new APIs.
Author: Andres Freund, Haribabu Kommi, Alvaro Herrera
Discussion:
https://postgr.es/m/20180703070645.wchpu5muyto5n647@alap3.anarazel.de
https://postgr.es/m/20160812231527.GA690404@alvherre.pgsql
2019-03-11 20:46:41 +01:00
|
|
|
table_endscan(scandesc);
|
2004-08-29 23:08:48 +02:00
|
|
|
|
2006-02-12 04:22:21 +01:00
|
|
|
/*
|
2011-07-20 19:18:24 +02:00
|
|
|
* Remove any comments or security labels on this tablespace.
|
2006-02-12 04:22:21 +01:00
|
|
|
*/
|
|
|
|
DeleteSharedComments(tablespaceoid, TableSpaceRelationId);
|
2011-07-20 19:18:24 +02:00
|
|
|
DeleteSharedSecurityLabel(tablespaceoid, TableSpaceRelationId);
|
2006-02-12 04:22:21 +01:00
|
|
|
|
2005-08-30 03:08:47 +02:00
|
|
|
/*
|
|
|
|
* Remove dependency on owner.
|
|
|
|
*/
|
2009-01-22 21:16:10 +01:00
|
|
|
deleteSharedDependencyRecordsFor(TableSpaceRelationId, tablespaceoid, 0);
|
2005-08-30 03:08:47 +02:00
|
|
|
|
2006-01-19 05:45:38 +01:00
|
|
|
/*
|
|
|
|
* Acquire TablespaceCreateLock to ensure that no TablespaceCreateDbspace
|
|
|
|
* is running concurrently.
|
|
|
|
*/
|
|
|
|
LWLockAcquire(TablespaceCreateLock, LW_EXCLUSIVE);
|
|
|
|
|
2004-08-29 23:08:48 +02:00
|
|
|
/*
|
2007-11-15 21:36:40 +01:00
|
|
|
* Try to remove the physical infrastructure.
|
2004-08-29 23:08:48 +02:00
|
|
|
*/
|
2010-01-12 03:42:52 +01:00
|
|
|
if (!destroy_tablespace_directories(tablespaceoid, false))
|
2007-11-15 21:36:40 +01:00
|
|
|
{
|
|
|
|
/*
|
|
|
|
* Not all files deleted? However, there can be lingering empty files
|
|
|
|
* in the directories, left behind by for example DROP TABLE, that
|
|
|
|
* have been scheduled for deletion at next checkpoint (see comments
|
|
|
|
* in mdunlink() for details). We could just delete them immediately,
|
|
|
|
* but we can't tell them apart from important data files that we
|
|
|
|
* mustn't delete. So instead, we force a checkpoint which will clean
|
|
|
|
* out any lingering files, and try again.
|
2022-02-11 22:21:23 +01:00
|
|
|
*/
|
|
|
|
RequestCheckpoint(CHECKPOINT_IMMEDIATE | CHECKPOINT_FORCE | CHECKPOINT_WAIT);
|
|
|
|
|
|
|
|
/*
|
|
|
|
* On Windows, an unlinked file persists in the directory listing
|
2014-11-12 13:33:17 +01:00
|
|
|
* until no process retains an open handle for the file. The DDL
|
|
|
|
* commands that schedule files for unlink send invalidation messages
|
2022-02-11 22:21:23 +01:00
|
|
|
* directing other PostgreSQL processes to close the files, but
|
|
|
|
* nothing guarantees they'll be processed in time. So, we'll also
|
|
|
|
* use a global barrier to ask all backends to close all files, and
|
|
|
|
* wait until they're finished.
|
2007-11-15 21:36:40 +01:00
|
|
|
*/
|
2022-02-11 22:21:23 +01:00
|
|
|
LWLockRelease(TablespaceCreateLock);
|
|
|
|
WaitForProcSignalBarrier(EmitProcSignalBarrier(PROCSIGNAL_BARRIER_SMGRRELEASE));
|
|
|
|
LWLockAcquire(TablespaceCreateLock, LW_EXCLUSIVE);
|
2022-05-07 05:19:52 +02:00
|
|
|
|
2022-02-11 22:21:23 +01:00
|
|
|
/* And now try again. */
|
2010-01-12 03:42:52 +01:00
|
|
|
if (!destroy_tablespace_directories(tablespaceoid, false))
|
2007-11-15 21:36:40 +01:00
|
|
|
{
|
|
|
|
/* Still not empty, the files must be important then */
|
|
|
|
ereport(ERROR,
|
|
|
|
(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
|
|
|
|
errmsg("tablespace \"%s\" is not empty",
|
|
|
|
tablespacename)));
|
|
|
|
}
|
|
|
|
}
|
2004-08-29 23:08:48 +02:00
|
|
|
|
|
|
|
/* Record the filesystem change in XLOG */
|
|
|
|
{
|
|
|
|
xl_tblspc_drop_rec xlrec;
|
|
|
|
|
|
|
|
xlrec.ts_id = tablespaceoid;
|
|
|
|
|
Revamp the WAL record format.
Each WAL record now carries information about the modified relation and
block(s) in a standardized format. That makes it easier to write tools that
need that information, like pg_rewind, prefetching the blocks to speed up
recovery, etc.
There's a whole new API for building WAL records, replacing the XLogRecData
chains used previously. The new API consists of XLogRegister* functions,
which are called for each buffer and chunk of data that is added to the
record. The new API also gives more control over when a full-page image is
written, by passing flags to the XLogRegisterBuffer function.
This also simplifies the XLogReadBufferForRedo() calls. The function can dig
the relation and block number from the WAL record, so they no longer need to
be passed as arguments.
For the convenience of redo routines, XLogReader now disects each WAL record
after reading it, copying the main data part and the per-block data into
MAXALIGNed buffers. The data chunks are not aligned within the WAL record,
but the redo routines can assume that the pointers returned by XLogRecGet*
functions are. Redo routines are now passed the XLogReaderState, which
contains the record in the already-disected format, instead of the plain
XLogRecord.
The new record format also makes the fixed size XLogRecord header smaller,
by removing the xl_len field. The length of the "main data" portion is now
stored at the end of the WAL record, and there's a separate header after
XLogRecord for it. The alignment padding at the end of XLogRecord is also
removed. This compansates for the fact that the new format would otherwise
be more bulky than the old format.
Reviewed by Andres Freund, Amit Kapila, Michael Paquier, Alvaro Herrera,
Fujii Masao.
2014-11-20 16:56:26 +01:00
|
|
|
XLogBeginInsert();
|
|
|
|
XLogRegisterData((char *) &xlrec, sizeof(xl_tblspc_drop_rec));
|
|
|
|
|
|
|
|
(void) XLogInsert(RM_TBLSPC_ID, XLOG_TBLSPC_DROP);
|
2004-08-29 23:08:48 +02:00
|
|
|
}
|
|
|
|
|
2006-01-19 05:45:38 +01:00
|
|
|
/*
|
2006-03-29 23:17:39 +02:00
|
|
|
* Note: because we checked that the tablespace was empty, there should be
|
|
|
|
* no need to worry about flushing shared buffers or free space map
|
|
|
|
* entries for relations in the tablespace.
|
|
|
|
*/
|
|
|
|
|
2007-08-02 00:45:09 +02:00
|
|
|
/*
|
|
|
|
* Force synchronous commit, to minimize the window between removing the
|
|
|
|
* files on-disk and marking the transaction committed. It's not great
|
|
|
|
* that there is any window at all, but definitely we don't want to make
|
|
|
|
* it larger than necessary.
|
|
|
|
*/
|
|
|
|
ForceSyncCommit();
|
|
|
|
|
2006-03-29 23:17:39 +02:00
|
|
|
/*
|
2006-01-19 05:45:38 +01:00
|
|
|
* Allow TablespaceCreateDbspace again.
|
|
|
|
*/
|
|
|
|
LWLockRelease(TablespaceCreateLock);
|
|
|
|
|
2004-08-29 23:08:48 +02:00
|
|
|
/* We keep the lock on pg_tablespace until commit */
|
2019-01-21 19:32:19 +01:00
|
|
|
table_close(rel, NoLock);
|
2004-08-29 23:08:48 +02:00
|
|
|
#else /* !HAVE_SYMLINK */
|
|
|
|
ereport(ERROR,
|
|
|
|
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
|
|
|
|
errmsg("tablespaces are not supported on this platform")));
|
|
|
|
#endif /* HAVE_SYMLINK */
|
|
|
|
}
|
|
|
|
|
2010-01-12 03:42:52 +01:00
|
|
|
|
2004-08-29 23:08:48 +02:00
|
|
|
/*
|
2010-01-12 03:42:52 +01:00
|
|
|
* create_tablespace_directories
|
2004-08-29 23:08:48 +02:00
|
|
|
*
|
2010-01-12 03:42:52 +01:00
|
|
|
* Attempt to create filesystem infrastructure linking $PGDATA/pg_tblspc/
|
|
|
|
* to the specified directory
|
|
|
|
*/
|
|
|
|
static void
|
|
|
|
create_tablespace_directories(const char *location, const Oid tablespaceoid)
|
|
|
|
{
|
2013-10-13 06:09:18 +02:00
|
|
|
char *linkloc;
|
|
|
|
char *location_with_version_dir;
|
2014-04-05 05:09:35 +02:00
|
|
|
struct stat st;
|
2022-01-14 09:27:44 +01:00
|
|
|
bool in_place;
|
2010-01-12 03:42:52 +01:00
|
|
|
|
2013-10-13 06:09:18 +02:00
|
|
|
linkloc = psprintf("pg_tblspc/%u", tablespaceoid);
|
2022-01-14 09:27:44 +01:00
|
|
|
|
|
|
|
/*
|
|
|
|
* If we're asked to make an 'in place' tablespace, create the directory
|
|
|
|
* directly where the symlink would normally go. This is a developer-only
|
|
|
|
* option for now, to facilitate regression testing.
|
|
|
|
*/
|
|
|
|
in_place = strlen(location) == 0;
|
|
|
|
|
|
|
|
if (in_place)
|
|
|
|
{
|
|
|
|
if (MakePGDirectory(linkloc) < 0 && errno != EEXIST)
|
|
|
|
ereport(ERROR,
|
|
|
|
(errcode_for_file_access(),
|
|
|
|
errmsg("could not create directory \"%s\": %m",
|
|
|
|
linkloc)));
|
|
|
|
}
|
|
|
|
|
|
|
|
location_with_version_dir = psprintf("%s/%s", in_place ? linkloc : location,
|
2010-01-12 03:42:52 +01:00
|
|
|
TABLESPACE_VERSION_DIRECTORY);
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Attempt to coerce target directory to safe permissions. If this fails,
|
2022-01-14 09:27:44 +01:00
|
|
|
* it doesn't exist or has the wrong owner. Not needed for in-place mode,
|
|
|
|
* because in that case we created the directory with the desired
|
|
|
|
* permissions.
|
2010-01-12 03:42:52 +01:00
|
|
|
*/
|
2022-01-14 09:27:44 +01:00
|
|
|
if (!in_place && chmod(location, pg_dir_create_mode) != 0)
|
2010-01-12 03:42:52 +01:00
|
|
|
{
|
|
|
|
if (errno == ENOENT)
|
|
|
|
ereport(ERROR,
|
|
|
|
(errcode(ERRCODE_UNDEFINED_FILE),
|
2010-07-02 04:44:32 +02:00
|
|
|
errmsg("directory \"%s\" does not exist", location),
|
2010-07-18 06:47:46 +02:00
|
|
|
InRecovery ? errhint("Create this directory for the tablespace before "
|
|
|
|
"restarting the server.") : 0));
|
2010-01-12 03:42:52 +01:00
|
|
|
else
|
|
|
|
ereport(ERROR,
|
|
|
|
(errcode_for_file_access(),
|
|
|
|
errmsg("could not set permissions on directory \"%s\": %m",
|
|
|
|
location)));
|
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
|
|
|
* The creation of the version directory prevents more than one tablespace
|
2021-08-28 08:33:23 +02:00
|
|
|
* in a single location. This imitates TablespaceCreateDbspace(), but it
|
|
|
|
* ignores concurrency and missing parent directories. The chmod() would
|
|
|
|
* have failed in the absence of a parent. pg_tablespace_spcname_index
|
|
|
|
* prevents concurrency.
|
2010-01-12 03:42:52 +01:00
|
|
|
*/
|
2021-08-28 08:33:23 +02:00
|
|
|
if (stat(location_with_version_dir, &st) < 0)
|
2010-01-12 03:42:52 +01:00
|
|
|
{
|
2021-08-28 08:33:23 +02:00
|
|
|
if (errno != ENOENT)
|
2010-01-12 03:42:52 +01:00
|
|
|
ereport(ERROR,
|
2021-08-28 08:33:23 +02:00
|
|
|
(errcode_for_file_access(),
|
|
|
|
errmsg("could not stat directory \"%s\": %m",
|
2010-01-12 03:42:52 +01:00
|
|
|
location_with_version_dir)));
|
2021-08-28 08:33:23 +02:00
|
|
|
else if (MakePGDirectory(location_with_version_dir) < 0)
|
2010-01-12 03:42:52 +01:00
|
|
|
ereport(ERROR,
|
|
|
|
(errcode_for_file_access(),
|
|
|
|
errmsg("could not create directory \"%s\": %m",
|
|
|
|
location_with_version_dir)));
|
|
|
|
}
|
2021-08-28 08:33:23 +02:00
|
|
|
else if (!S_ISDIR(st.st_mode))
|
|
|
|
ereport(ERROR,
|
|
|
|
(errcode(ERRCODE_WRONG_OBJECT_TYPE),
|
|
|
|
errmsg("\"%s\" exists but is not a directory",
|
|
|
|
location_with_version_dir)));
|
|
|
|
else if (!InRecovery)
|
|
|
|
ereport(ERROR,
|
|
|
|
(errcode(ERRCODE_OBJECT_IN_USE),
|
|
|
|
errmsg("directory \"%s\" already in use as a tablespace",
|
|
|
|
location_with_version_dir)));
|
2010-01-12 03:42:52 +01:00
|
|
|
|
2014-04-05 05:09:35 +02:00
|
|
|
/*
|
|
|
|
* In recovery, remove old symlink, in case it points to the wrong place.
|
|
|
|
*/
|
2022-01-14 09:27:44 +01:00
|
|
|
if (!in_place && InRecovery)
|
2015-06-26 21:53:13 +02:00
|
|
|
remove_tablespace_symlink(linkloc);
|
2010-11-23 21:27:50 +01:00
|
|
|
|
2010-01-12 03:42:52 +01:00
|
|
|
/*
|
|
|
|
* Create the symlink under PGDATA
|
|
|
|
*/
|
2022-01-14 09:27:44 +01:00
|
|
|
if (!in_place && symlink(location, linkloc) < 0)
|
2010-01-12 03:42:52 +01:00
|
|
|
ereport(ERROR,
|
|
|
|
(errcode_for_file_access(),
|
|
|
|
errmsg("could not create symbolic link \"%s\": %m",
|
|
|
|
linkloc)));
|
|
|
|
|
|
|
|
pfree(linkloc);
|
|
|
|
pfree(location_with_version_dir);
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
|
|
/*
|
|
|
|
* destroy_tablespace_directories
|
|
|
|
*
|
Avoid throwing ERROR during WAL replay of DROP TABLESPACE.
Although we will not even issue an XLOG_TBLSPC_DROP WAL record unless
removal of the tablespace's directories succeeds, that does not guarantee
that the same operation will succeed during WAL replay. Foreseeable
reasons for it to fail include temp files created in the tablespace by Hot
Standby backends, wrong directory permissions on a standby server, etc etc.
The original coding threw ERROR if replay failed to remove the directories,
but that is a serious overreaction. Throwing an error aborts recovery,
and worse means that manual intervention will be needed to get the database
to start again, since otherwise the same error will recur on subsequent
attempts to replay the same WAL record. And the consequence of failing to
remove the directories is only that some probably-small amount of disk
space is wasted, so it hardly seems justified to throw an error.
Accordingly, arrange to report such failures as LOG messages and keep going
when a failure occurs during replay.
Back-patch to 9.0 where Hot Standby was introduced. In principle such
problems can occur in earlier releases, but Hot Standby increases the odds
of trouble significantly. Given the lack of field reports of such issues,
I'm satisfied with patching back as far as the patch applies easily.
2012-02-06 20:43:58 +01:00
|
|
|
* Attempt to remove filesystem infrastructure for the tablespace.
|
2010-01-12 03:42:52 +01:00
|
|
|
*
|
Avoid throwing ERROR during WAL replay of DROP TABLESPACE.
Although we will not even issue an XLOG_TBLSPC_DROP WAL record unless
removal of the tablespace's directories succeeds, that does not guarantee
that the same operation will succeed during WAL replay. Foreseeable
reasons for it to fail include temp files created in the tablespace by Hot
Standby backends, wrong directory permissions on a standby server, etc etc.
The original coding threw ERROR if replay failed to remove the directories,
but that is a serious overreaction. Throwing an error aborts recovery,
and worse means that manual intervention will be needed to get the database
to start again, since otherwise the same error will recur on subsequent
attempts to replay the same WAL record. And the consequence of failing to
remove the directories is only that some probably-small amount of disk
space is wasted, so it hardly seems justified to throw an error.
Accordingly, arrange to report such failures as LOG messages and keep going
when a failure occurs during replay.
Back-patch to 9.0 where Hot Standby was introduced. In principle such
problems can occur in earlier releases, but Hot Standby increases the odds
of trouble significantly. Given the lack of field reports of such issues,
I'm satisfied with patching back as far as the patch applies easily.
2012-02-06 20:43:58 +01:00
|
|
|
* 'redo' indicates we are redoing a drop from XLOG; in that case we should
|
|
|
|
* not throw an ERROR for problems, just LOG them. The worst consequence of
|
|
|
|
* not removing files here would be failure to release some disk space, which
|
|
|
|
* does not justify throwing an error that would require manual intervention
|
|
|
|
* to get the database running again.
|
2004-08-29 23:08:48 +02:00
|
|
|
*
|
2017-08-16 06:22:32 +02:00
|
|
|
* Returns true if successful, false if some subdirectory is not empty
|
2004-08-29 23:08:48 +02:00
|
|
|
*/
|
|
|
|
static bool
|
2010-01-12 03:42:52 +01:00
|
|
|
destroy_tablespace_directories(Oid tablespaceoid, bool redo)
|
2004-08-29 23:08:48 +02:00
|
|
|
{
|
2010-01-12 03:42:52 +01:00
|
|
|
char *linkloc;
|
|
|
|
char *linkloc_with_version_dir;
|
2004-08-29 23:08:48 +02:00
|
|
|
DIR *dirdesc;
|
|
|
|
struct dirent *de;
|
|
|
|
char *subfile;
|
|
|
|
struct stat st;
|
|
|
|
|
2013-10-13 06:09:18 +02:00
|
|
|
linkloc_with_version_dir = psprintf("pg_tblspc/%u/%s", tablespaceoid,
|
2010-01-12 03:42:52 +01:00
|
|
|
TABLESPACE_VERSION_DIRECTORY);
|
2004-06-18 08:14:31 +02:00
|
|
|
|
|
|
|
/*
|
|
|
|
* Check if the tablespace still contains any files. We try to rmdir each
|
|
|
|
* per-database directory we find in it. rmdir failure implies there are
|
|
|
|
* still files in that subdirectory, so give up. (We do not have to worry
|
|
|
|
* about undoing any already completed rmdirs, since the next attempt to
|
|
|
|
* use the tablespace from that database will simply recreate the
|
|
|
|
* subdirectory via TablespaceCreateDbspace.)
|
|
|
|
*
|
2006-01-19 05:45:38 +01:00
|
|
|
* Since we hold TablespaceCreateLock, no one else should be creating any
|
|
|
|
* fresh subdirectories in parallel. It is possible that new files are
|
|
|
|
* being created within subdirectories, though, so the rmdir call could
|
|
|
|
* fail. Worst consequence is a less friendly error message.
|
2007-03-22 20:51:44 +01:00
|
|
|
*
|
|
|
|
* If redo is true then ENOENT is a likely outcome here, and we allow it
|
|
|
|
* to pass without comment. In normal operation we still allow it, but
|
|
|
|
* with a warning. This is because even though ProcessUtility disallows
|
|
|
|
* DROP TABLESPACE in a transaction block, it's possible that a previous
|
|
|
|
* DROP failed and rolled back after removing the tablespace directories
|
Fix DROP TABLESPACE to unlink symlink when directory is not there.
If the tablespace directory is missing entirely, we allow DROP TABLESPACE
to go through, on the grounds that it should be possible to clean up the
catalog entry in such a situation. However, we forgot that the pg_tblspc
symlink might still be there. We should try to remove the symlink too
(but not fail if it's no longer there), since not doing so can lead to
weird behavior subsequently, as per report from Michael Nolan.
There was some discussion of adding dependency links to prevent DROP
TABLESPACE when the catalogs still contain references to the tablespace.
That might be worth doing too, but it's an orthogonal question, and in
any case wouldn't be back-patchable.
Back-patch to 9.0, which is as far back as the logic looks like this.
We could possibly do something similar in 8.x, but given the lack of
reports I'm not sure it's worth the trouble, and anyway the case could
not arise in the form the logic is meant to cover (namely, a post-DROP
transaction rollback having resurrected the pg_tablespace entry after
some or all of the filesystem infrastructure is gone).
2012-05-14 00:06:52 +02:00
|
|
|
* and/or symlink. We want to allow a new DROP attempt to succeed at
|
|
|
|
* removing the catalog entries (and symlink if still present), so we
|
|
|
|
* should not give a hard error here.
|
2004-06-18 08:14:31 +02:00
|
|
|
*/
|
2010-01-12 03:42:52 +01:00
|
|
|
dirdesc = AllocateDir(linkloc_with_version_dir);
|
2004-06-18 08:14:31 +02:00
|
|
|
if (dirdesc == NULL)
|
2004-08-29 23:08:48 +02:00
|
|
|
{
|
2007-03-22 20:51:44 +01:00
|
|
|
if (errno == ENOENT)
|
2004-08-29 23:08:48 +02:00
|
|
|
{
|
2007-03-22 20:51:44 +01:00
|
|
|
if (!redo)
|
|
|
|
ereport(WARNING,
|
|
|
|
(errcode_for_file_access(),
|
|
|
|
errmsg("could not open directory \"%s\": %m",
|
2010-01-12 03:42:52 +01:00
|
|
|
linkloc_with_version_dir)));
|
Fix DROP TABLESPACE to unlink symlink when directory is not there.
If the tablespace directory is missing entirely, we allow DROP TABLESPACE
to go through, on the grounds that it should be possible to clean up the
catalog entry in such a situation. However, we forgot that the pg_tblspc
symlink might still be there. We should try to remove the symlink too
(but not fail if it's no longer there), since not doing so can lead to
weird behavior subsequently, as per report from Michael Nolan.
There was some discussion of adding dependency links to prevent DROP
TABLESPACE when the catalogs still contain references to the tablespace.
That might be worth doing too, but it's an orthogonal question, and in
any case wouldn't be back-patchable.
Back-patch to 9.0, which is as far back as the logic looks like this.
We could possibly do something similar in 8.x, but given the lack of
reports I'm not sure it's worth the trouble, and anyway the case could
not arise in the form the logic is meant to cover (namely, a post-DROP
transaction rollback having resurrected the pg_tablespace entry after
some or all of the filesystem infrastructure is gone).
2012-05-14 00:06:52 +02:00
|
|
|
/* The symlink might still exist, so go try to remove it */
|
|
|
|
goto remove_symlink;
|
2004-08-29 23:08:48 +02:00
|
|
|
}
|
Avoid throwing ERROR during WAL replay of DROP TABLESPACE.
Although we will not even issue an XLOG_TBLSPC_DROP WAL record unless
removal of the tablespace's directories succeeds, that does not guarantee
that the same operation will succeed during WAL replay. Foreseeable
reasons for it to fail include temp files created in the tablespace by Hot
Standby backends, wrong directory permissions on a standby server, etc etc.
The original coding threw ERROR if replay failed to remove the directories,
but that is a serious overreaction. Throwing an error aborts recovery,
and worse means that manual intervention will be needed to get the database
to start again, since otherwise the same error will recur on subsequent
attempts to replay the same WAL record. And the consequence of failing to
remove the directories is only that some probably-small amount of disk
space is wasted, so it hardly seems justified to throw an error.
Accordingly, arrange to report such failures as LOG messages and keep going
when a failure occurs during replay.
Back-patch to 9.0 where Hot Standby was introduced. In principle such
problems can occur in earlier releases, but Hot Standby increases the odds
of trouble significantly. Given the lack of field reports of such issues,
I'm satisfied with patching back as far as the patch applies easily.
2012-02-06 20:43:58 +01:00
|
|
|
else if (redo)
|
|
|
|
{
|
|
|
|
/* in redo, just log other types of error */
|
|
|
|
ereport(LOG,
|
|
|
|
(errcode_for_file_access(),
|
|
|
|
errmsg("could not open directory \"%s\": %m",
|
|
|
|
linkloc_with_version_dir)));
|
|
|
|
pfree(linkloc_with_version_dir);
|
|
|
|
return false;
|
|
|
|
}
|
2005-06-19 23:34:03 +02:00
|
|
|
/* else let ReadDir report the error */
|
2004-08-29 23:08:48 +02:00
|
|
|
}
|
2004-06-18 08:14:31 +02:00
|
|
|
|
2010-01-12 03:42:52 +01:00
|
|
|
while ((de = ReadDir(dirdesc, linkloc_with_version_dir)) != NULL)
|
2004-06-18 08:14:31 +02:00
|
|
|
{
|
|
|
|
if (strcmp(de->d_name, ".") == 0 ||
|
2010-01-12 03:42:52 +01:00
|
|
|
strcmp(de->d_name, "..") == 0)
|
2004-06-18 08:14:31 +02:00
|
|
|
continue;
|
|
|
|
|
2013-10-13 06:09:18 +02:00
|
|
|
subfile = psprintf("%s/%s", linkloc_with_version_dir, de->d_name);
|
2004-06-18 08:14:31 +02:00
|
|
|
|
|
|
|
/* This check is just to deliver a friendlier error message */
|
Avoid throwing ERROR during WAL replay of DROP TABLESPACE.
Although we will not even issue an XLOG_TBLSPC_DROP WAL record unless
removal of the tablespace's directories succeeds, that does not guarantee
that the same operation will succeed during WAL replay. Foreseeable
reasons for it to fail include temp files created in the tablespace by Hot
Standby backends, wrong directory permissions on a standby server, etc etc.
The original coding threw ERROR if replay failed to remove the directories,
but that is a serious overreaction. Throwing an error aborts recovery,
and worse means that manual intervention will be needed to get the database
to start again, since otherwise the same error will recur on subsequent
attempts to replay the same WAL record. And the consequence of failing to
remove the directories is only that some probably-small amount of disk
space is wasted, so it hardly seems justified to throw an error.
Accordingly, arrange to report such failures as LOG messages and keep going
when a failure occurs during replay.
Back-patch to 9.0 where Hot Standby was introduced. In principle such
problems can occur in earlier releases, but Hot Standby increases the odds
of trouble significantly. Given the lack of field reports of such issues,
I'm satisfied with patching back as far as the patch applies easily.
2012-02-06 20:43:58 +01:00
|
|
|
if (!redo && !directory_is_empty(subfile))
|
2004-08-29 23:08:48 +02:00
|
|
|
{
|
|
|
|
FreeDir(dirdesc);
|
2010-01-12 03:42:52 +01:00
|
|
|
pfree(subfile);
|
|
|
|
pfree(linkloc_with_version_dir);
|
2004-08-29 23:08:48 +02:00
|
|
|
return false;
|
|
|
|
}
|
2004-06-18 08:14:31 +02:00
|
|
|
|
2010-01-12 03:42:52 +01:00
|
|
|
/* remove empty directory */
|
2004-06-18 08:14:31 +02:00
|
|
|
if (rmdir(subfile) < 0)
|
Avoid throwing ERROR during WAL replay of DROP TABLESPACE.
Although we will not even issue an XLOG_TBLSPC_DROP WAL record unless
removal of the tablespace's directories succeeds, that does not guarantee
that the same operation will succeed during WAL replay. Foreseeable
reasons for it to fail include temp files created in the tablespace by Hot
Standby backends, wrong directory permissions on a standby server, etc etc.
The original coding threw ERROR if replay failed to remove the directories,
but that is a serious overreaction. Throwing an error aborts recovery,
and worse means that manual intervention will be needed to get the database
to start again, since otherwise the same error will recur on subsequent
attempts to replay the same WAL record. And the consequence of failing to
remove the directories is only that some probably-small amount of disk
space is wasted, so it hardly seems justified to throw an error.
Accordingly, arrange to report such failures as LOG messages and keep going
when a failure occurs during replay.
Back-patch to 9.0 where Hot Standby was introduced. In principle such
problems can occur in earlier releases, but Hot Standby increases the odds
of trouble significantly. Given the lack of field reports of such issues,
I'm satisfied with patching back as far as the patch applies easily.
2012-02-06 20:43:58 +01:00
|
|
|
ereport(redo ? LOG : ERROR,
|
2004-06-18 08:14:31 +02:00
|
|
|
(errcode_for_file_access(),
|
2007-05-31 17:13:06 +02:00
|
|
|
errmsg("could not remove directory \"%s\": %m",
|
2004-06-18 08:14:31 +02:00
|
|
|
subfile)));
|
|
|
|
|
|
|
|
pfree(subfile);
|
|
|
|
}
|
2004-08-29 07:07:03 +02:00
|
|
|
|
2004-06-18 08:14:31 +02:00
|
|
|
FreeDir(dirdesc);
|
|
|
|
|
2010-01-12 03:42:52 +01:00
|
|
|
/* remove version directory */
|
|
|
|
if (rmdir(linkloc_with_version_dir) < 0)
|
Avoid throwing ERROR during WAL replay of DROP TABLESPACE.
Although we will not even issue an XLOG_TBLSPC_DROP WAL record unless
removal of the tablespace's directories succeeds, that does not guarantee
that the same operation will succeed during WAL replay. Foreseeable
reasons for it to fail include temp files created in the tablespace by Hot
Standby backends, wrong directory permissions on a standby server, etc etc.
The original coding threw ERROR if replay failed to remove the directories,
but that is a serious overreaction. Throwing an error aborts recovery,
and worse means that manual intervention will be needed to get the database
to start again, since otherwise the same error will recur on subsequent
attempts to replay the same WAL record. And the consequence of failing to
remove the directories is only that some probably-small amount of disk
space is wasted, so it hardly seems justified to throw an error.
Accordingly, arrange to report such failures as LOG messages and keep going
when a failure occurs during replay.
Back-patch to 9.0 where Hot Standby was introduced. In principle such
problems can occur in earlier releases, but Hot Standby increases the odds
of trouble significantly. Given the lack of field reports of such issues,
I'm satisfied with patching back as far as the patch applies easily.
2012-02-06 20:43:58 +01:00
|
|
|
{
|
|
|
|
ereport(redo ? LOG : ERROR,
|
2010-01-12 03:42:52 +01:00
|
|
|
(errcode_for_file_access(),
|
|
|
|
errmsg("could not remove directory \"%s\": %m",
|
|
|
|
linkloc_with_version_dir)));
|
Avoid throwing ERROR during WAL replay of DROP TABLESPACE.
Although we will not even issue an XLOG_TBLSPC_DROP WAL record unless
removal of the tablespace's directories succeeds, that does not guarantee
that the same operation will succeed during WAL replay. Foreseeable
reasons for it to fail include temp files created in the tablespace by Hot
Standby backends, wrong directory permissions on a standby server, etc etc.
The original coding threw ERROR if replay failed to remove the directories,
but that is a serious overreaction. Throwing an error aborts recovery,
and worse means that manual intervention will be needed to get the database
to start again, since otherwise the same error will recur on subsequent
attempts to replay the same WAL record. And the consequence of failing to
remove the directories is only that some probably-small amount of disk
space is wasted, so it hardly seems justified to throw an error.
Accordingly, arrange to report such failures as LOG messages and keep going
when a failure occurs during replay.
Back-patch to 9.0 where Hot Standby was introduced. In principle such
problems can occur in earlier releases, but Hot Standby increases the odds
of trouble significantly. Given the lack of field reports of such issues,
I'm satisfied with patching back as far as the patch applies easily.
2012-02-06 20:43:58 +01:00
|
|
|
pfree(linkloc_with_version_dir);
|
|
|
|
return false;
|
|
|
|
}
|
2010-02-26 03:01:40 +01:00
|
|
|
|
2004-06-18 08:14:31 +02:00
|
|
|
/*
|
2010-01-12 03:42:52 +01:00
|
|
|
* Try to remove the symlink. We must however deal with the possibility
|
2004-08-29 23:08:48 +02:00
|
|
|
* that it's a directory instead of a symlink --- this could happen during
|
|
|
|
* WAL replay (see TablespaceCreateDbspace), and it is also the case on
|
2010-01-12 03:42:52 +01:00
|
|
|
* Windows where junction points lstat() as directories.
|
Avoid throwing ERROR during WAL replay of DROP TABLESPACE.
Although we will not even issue an XLOG_TBLSPC_DROP WAL record unless
removal of the tablespace's directories succeeds, that does not guarantee
that the same operation will succeed during WAL replay. Foreseeable
reasons for it to fail include temp files created in the tablespace by Hot
Standby backends, wrong directory permissions on a standby server, etc etc.
The original coding threw ERROR if replay failed to remove the directories,
but that is a serious overreaction. Throwing an error aborts recovery,
and worse means that manual intervention will be needed to get the database
to start again, since otherwise the same error will recur on subsequent
attempts to replay the same WAL record. And the consequence of failing to
remove the directories is only that some probably-small amount of disk
space is wasted, so it hardly seems justified to throw an error.
Accordingly, arrange to report such failures as LOG messages and keep going
when a failure occurs during replay.
Back-patch to 9.0 where Hot Standby was introduced. In principle such
problems can occur in earlier releases, but Hot Standby increases the odds
of trouble significantly. Given the lack of field reports of such issues,
I'm satisfied with patching back as far as the patch applies easily.
2012-02-06 20:43:58 +01:00
|
|
|
*
|
|
|
|
* Note: in the redo case, we'll return true if this final step fails;
|
Fix DROP TABLESPACE to unlink symlink when directory is not there.
If the tablespace directory is missing entirely, we allow DROP TABLESPACE
to go through, on the grounds that it should be possible to clean up the
catalog entry in such a situation. However, we forgot that the pg_tblspc
symlink might still be there. We should try to remove the symlink too
(but not fail if it's no longer there), since not doing so can lead to
weird behavior subsequently, as per report from Michael Nolan.
There was some discussion of adding dependency links to prevent DROP
TABLESPACE when the catalogs still contain references to the tablespace.
That might be worth doing too, but it's an orthogonal question, and in
any case wouldn't be back-patchable.
Back-patch to 9.0, which is as far back as the logic looks like this.
We could possibly do something similar in 8.x, but given the lack of
reports I'm not sure it's worth the trouble, and anyway the case could
not arise in the form the logic is meant to cover (namely, a post-DROP
transaction rollback having resurrected the pg_tablespace entry after
some or all of the filesystem infrastructure is gone).
2012-05-14 00:06:52 +02:00
|
|
|
* there's no point in retrying it. Also, ENOENT should provoke no more
|
|
|
|
* than a warning.
|
2004-06-18 08:14:31 +02:00
|
|
|
*/
|
Fix DROP TABLESPACE to unlink symlink when directory is not there.
If the tablespace directory is missing entirely, we allow DROP TABLESPACE
to go through, on the grounds that it should be possible to clean up the
catalog entry in such a situation. However, we forgot that the pg_tblspc
symlink might still be there. We should try to remove the symlink too
(but not fail if it's no longer there), since not doing so can lead to
weird behavior subsequently, as per report from Michael Nolan.
There was some discussion of adding dependency links to prevent DROP
TABLESPACE when the catalogs still contain references to the tablespace.
That might be worth doing too, but it's an orthogonal question, and in
any case wouldn't be back-patchable.
Back-patch to 9.0, which is as far back as the logic looks like this.
We could possibly do something similar in 8.x, but given the lack of
reports I'm not sure it's worth the trouble, and anyway the case could
not arise in the form the logic is meant to cover (namely, a post-DROP
transaction rollback having resurrected the pg_tablespace entry after
some or all of the filesystem infrastructure is gone).
2012-05-14 00:06:52 +02:00
|
|
|
remove_symlink:
|
2010-01-12 03:42:52 +01:00
|
|
|
linkloc = pstrdup(linkloc_with_version_dir);
|
|
|
|
get_parent_directory(linkloc);
|
2016-04-08 18:31:42 +02:00
|
|
|
if (lstat(linkloc, &st) < 0)
|
|
|
|
{
|
|
|
|
int saved_errno = errno;
|
|
|
|
|
|
|
|
ereport(redo ? LOG : (saved_errno == ENOENT ? WARNING : ERROR),
|
|
|
|
(errcode_for_file_access(),
|
|
|
|
errmsg("could not stat file \"%s\": %m",
|
|
|
|
linkloc)));
|
|
|
|
}
|
|
|
|
else if (S_ISDIR(st.st_mode))
|
2004-08-29 23:08:48 +02:00
|
|
|
{
|
2010-01-12 03:42:52 +01:00
|
|
|
if (rmdir(linkloc) < 0)
|
2016-04-08 18:31:42 +02:00
|
|
|
{
|
|
|
|
int saved_errno = errno;
|
|
|
|
|
|
|
|
ereport(redo ? LOG : (saved_errno == ENOENT ? WARNING : ERROR),
|
2004-08-29 23:08:48 +02:00
|
|
|
(errcode_for_file_access(),
|
|
|
|
errmsg("could not remove directory \"%s\": %m",
|
2010-01-12 03:42:52 +01:00
|
|
|
linkloc)));
|
2016-04-08 18:31:42 +02:00
|
|
|
}
|
2004-08-29 23:08:48 +02:00
|
|
|
}
|
2015-06-26 21:53:13 +02:00
|
|
|
#ifdef S_ISLNK
|
|
|
|
else if (S_ISLNK(st.st_mode))
|
2004-08-29 23:08:48 +02:00
|
|
|
{
|
2010-01-12 03:42:52 +01:00
|
|
|
if (unlink(linkloc) < 0)
|
2014-01-30 02:03:57 +01:00
|
|
|
{
|
|
|
|
int saved_errno = errno;
|
|
|
|
|
|
|
|
ereport(redo ? LOG : (saved_errno == ENOENT ? WARNING : ERROR),
|
2004-08-29 23:08:48 +02:00
|
|
|
(errcode_for_file_access(),
|
2004-11-05 18:11:34 +01:00
|
|
|
errmsg("could not remove symbolic link \"%s\": %m",
|
2010-01-12 03:42:52 +01:00
|
|
|
linkloc)));
|
2014-01-30 02:03:57 +01:00
|
|
|
}
|
2004-08-29 23:08:48 +02:00
|
|
|
}
|
2015-06-26 21:53:13 +02:00
|
|
|
#endif
|
|
|
|
else
|
|
|
|
{
|
|
|
|
/* Refuse to remove anything that's not a directory or symlink */
|
|
|
|
ereport(redo ? LOG : ERROR,
|
2016-04-08 18:31:42 +02:00
|
|
|
(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
|
2015-11-17 03:16:42 +01:00
|
|
|
errmsg("\"%s\" is not a directory or symbolic link",
|
2015-06-26 21:53:13 +02:00
|
|
|
linkloc)));
|
|
|
|
}
|
2004-06-18 08:14:31 +02:00
|
|
|
|
2010-01-12 03:42:52 +01:00
|
|
|
pfree(linkloc_with_version_dir);
|
|
|
|
pfree(linkloc);
|
2004-06-18 08:14:31 +02:00
|
|
|
|
2004-08-29 23:08:48 +02:00
|
|
|
return true;
|
2004-06-18 08:14:31 +02:00
|
|
|
}
|
|
|
|
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Check if a directory is empty.
|
2004-10-17 22:47:21 +02:00
|
|
|
*
|
|
|
|
* This probably belongs somewhere else, but not sure where...
|
2004-06-18 08:14:31 +02:00
|
|
|
*/
|
2004-10-17 22:47:21 +02:00
|
|
|
bool
|
2004-06-18 08:14:31 +02:00
|
|
|
directory_is_empty(const char *path)
|
|
|
|
{
|
|
|
|
DIR *dirdesc;
|
|
|
|
struct dirent *de;
|
|
|
|
|
|
|
|
dirdesc = AllocateDir(path);
|
|
|
|
|
2005-06-19 23:34:03 +02:00
|
|
|
while ((de = ReadDir(dirdesc, path)) != NULL)
|
2004-06-18 08:14:31 +02:00
|
|
|
{
|
|
|
|
if (strcmp(de->d_name, ".") == 0 ||
|
|
|
|
strcmp(de->d_name, "..") == 0)
|
|
|
|
continue;
|
|
|
|
FreeDir(dirdesc);
|
|
|
|
return false;
|
|
|
|
}
|
2004-08-29 07:07:03 +02:00
|
|
|
|
2004-06-18 08:14:31 +02:00
|
|
|
FreeDir(dirdesc);
|
|
|
|
return true;
|
|
|
|
}
|
|
|
|
|
2015-06-26 21:53:13 +02:00
|
|
|
/*
|
|
|
|
* remove_tablespace_symlink
|
|
|
|
*
|
|
|
|
* This function removes symlinks in pg_tblspc. On Windows, junction points
|
|
|
|
* act like directories so we must be able to apply rmdir. This function
|
|
|
|
* works like the symlink removal code in destroy_tablespace_directories,
|
|
|
|
* except that failure to remove is always an ERROR. But if the file doesn't
|
|
|
|
* exist at all, that's OK.
|
|
|
|
*/
|
|
|
|
void
|
|
|
|
remove_tablespace_symlink(const char *linkloc)
|
|
|
|
{
|
|
|
|
struct stat st;
|
|
|
|
|
2016-04-08 18:31:42 +02:00
|
|
|
if (lstat(linkloc, &st) < 0)
|
2015-06-26 21:53:13 +02:00
|
|
|
{
|
|
|
|
if (errno == ENOENT)
|
|
|
|
return;
|
|
|
|
ereport(ERROR,
|
|
|
|
(errcode_for_file_access(),
|
2015-10-29 01:23:53 +01:00
|
|
|
errmsg("could not stat file \"%s\": %m", linkloc)));
|
2015-06-26 21:53:13 +02:00
|
|
|
}
|
|
|
|
|
|
|
|
if (S_ISDIR(st.st_mode))
|
|
|
|
{
|
|
|
|
/*
|
2016-04-08 18:31:42 +02:00
|
|
|
* This will fail if the directory isn't empty, but not if it's a
|
|
|
|
* junction point.
|
2015-06-26 21:53:13 +02:00
|
|
|
*/
|
2016-04-08 18:31:42 +02:00
|
|
|
if (rmdir(linkloc) < 0 && errno != ENOENT)
|
2015-06-26 21:53:13 +02:00
|
|
|
ereport(ERROR,
|
|
|
|
(errcode_for_file_access(),
|
|
|
|
errmsg("could not remove directory \"%s\": %m",
|
|
|
|
linkloc)));
|
|
|
|
}
|
|
|
|
#ifdef S_ISLNK
|
|
|
|
else if (S_ISLNK(st.st_mode))
|
|
|
|
{
|
|
|
|
if (unlink(linkloc) < 0 && errno != ENOENT)
|
|
|
|
ereport(ERROR,
|
|
|
|
(errcode_for_file_access(),
|
2016-04-08 18:31:42 +02:00
|
|
|
errmsg("could not remove symbolic link \"%s\": %m",
|
2015-06-26 21:53:13 +02:00
|
|
|
linkloc)));
|
|
|
|
}
|
|
|
|
#endif
|
|
|
|
else
|
|
|
|
{
|
|
|
|
/* Refuse to remove anything that's not a directory or symlink */
|
|
|
|
ereport(ERROR,
|
2016-04-08 18:31:42 +02:00
|
|
|
(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
|
|
|
|
errmsg("\"%s\" is not a directory or symbolic link",
|
2015-06-26 21:53:13 +02:00
|
|
|
linkloc)));
|
|
|
|
}
|
|
|
|
}
|
2010-01-12 03:42:52 +01:00
|
|
|
|
2004-06-25 23:55:59 +02:00
|
|
|
/*
|
|
|
|
* Rename a tablespace
|
|
|
|
*/
|
Change many routines to return ObjectAddress rather than OID
The changed routines are mostly those that can be directly called by
ProcessUtilitySlow; the intention is to make the affected object
information more precise, in support for future event trigger changes.
Originally it was envisioned that the OID of the affected object would
be enough, and in most cases that is correct, but upon actually
implementing the event trigger changes it turned out that ObjectAddress
is more widely useful.
Additionally, some command execution routines grew an output argument
that's an object address which provides further info about the executed
command. To wit:
* for ALTER DOMAIN / ADD CONSTRAINT, it corresponds to the address of
the new constraint
* for ALTER OBJECT / SET SCHEMA, it corresponds to the address of the
schema that originally contained the object.
* for ALTER EXTENSION {ADD, DROP} OBJECT, it corresponds to the address
of the object added to or dropped from the extension.
There's no user-visible change in this commit, and no functional change
either.
Discussion: 20150218213255.GC6717@tamriel.snowman.net
Reviewed-By: Stephen Frost, Andres Freund
2015-03-03 18:10:50 +01:00
|
|
|
ObjectAddress
|
2004-06-25 23:55:59 +02:00
|
|
|
RenameTableSpace(const char *oldname, const char *newname)
|
|
|
|
{
|
2012-12-24 00:25:03 +01:00
|
|
|
Oid tspId;
|
2004-06-25 23:55:59 +02:00
|
|
|
Relation rel;
|
|
|
|
ScanKeyData entry[1];
|
tableam: Add and use scan APIs.
Too allow table accesses to be not directly dependent on heap, several
new abstractions are needed. Specifically:
1) Heap scans need to be generalized into table scans. Do this by
introducing TableScanDesc, which will be the "base class" for
individual AMs. This contains the AM independent fields from
HeapScanDesc.
The previous heap_{beginscan,rescan,endscan} et al. have been
replaced with a table_ version.
There's no direct replacement for heap_getnext(), as that returned
a HeapTuple, which is undesirable for a other AMs. Instead there's
table_scan_getnextslot(). But note that heap_getnext() lives on,
it's still used widely to access catalog tables.
This is achieved by new scan_begin, scan_end, scan_rescan,
scan_getnextslot callbacks.
2) The portion of parallel scans that's shared between backends need
to be able to do so without the user doing per-AM work. To achieve
that new parallelscan_{estimate, initialize, reinitialize}
callbacks are introduced, which operate on a new
ParallelTableScanDesc, which again can be subclassed by AMs.
As it is likely that several AMs are going to be block oriented,
block oriented callbacks that can be shared between such AMs are
provided and used by heap. table_block_parallelscan_{estimate,
intiialize, reinitialize} as callbacks, and
table_block_parallelscan_{nextpage, init} for use in AMs. These
operate on a ParallelBlockTableScanDesc.
3) Index scans need to be able to access tables to return a tuple, and
there needs to be state across individual accesses to the heap to
store state like buffers. That's now handled by introducing a
sort-of-scan IndexFetchTable, which again is intended to be
subclassed by individual AMs (for heap IndexFetchHeap).
The relevant callbacks for an AM are index_fetch_{end, begin,
reset} to create the necessary state, and index_fetch_tuple to
retrieve an indexed tuple. Note that index_fetch_tuple
implementations need to be smarter than just blindly fetching the
tuples for AMs that have optimizations similar to heap's HOT - the
currently alive tuple in the update chain needs to be fetched if
appropriate.
Similar to table_scan_getnextslot(), it's undesirable to continue
to return HeapTuples. Thus index_fetch_heap (might want to rename
that later) now accepts a slot as an argument. Core code doesn't
have a lot of call sites performing index scans without going
through the systable_* API (in contrast to loads of heap_getnext
calls and working directly with HeapTuples).
Index scans now store the result of a search in
IndexScanDesc->xs_heaptid, rather than xs_ctup->t_self. As the
target is not generally a HeapTuple anymore that seems cleaner.
To be able to sensible adapt code to use the above, two further
callbacks have been introduced:
a) slot_callbacks returns a TupleTableSlotOps* suitable for creating
slots capable of holding a tuple of the AMs
type. table_slot_callbacks() and table_slot_create() are based
upon that, but have additional logic to deal with views, foreign
tables, etc.
While this change could have been done separately, nearly all the
call sites that needed to be adapted for the rest of this commit
also would have been needed to be adapted for
table_slot_callbacks(), making separation not worthwhile.
b) tuple_satisfies_snapshot checks whether the tuple in a slot is
currently visible according to a snapshot. That's required as a few
places now don't have a buffer + HeapTuple around, but a
slot (which in heap's case internally has that information).
Additionally a few infrastructure changes were needed:
I) SysScanDesc, as used by systable_{beginscan, getnext} et al. now
internally uses a slot to keep track of tuples. While
systable_getnext() still returns HeapTuples, and will so for the
foreseeable future, the index API (see 1) above) now only deals with
slots.
The remainder, and largest part, of this commit is then adjusting all
scans in postgres to use the new APIs.
Author: Andres Freund, Haribabu Kommi, Alvaro Herrera
Discussion:
https://postgr.es/m/20180703070645.wchpu5muyto5n647@alap3.anarazel.de
https://postgr.es/m/20160812231527.GA690404@alvherre.pgsql
2019-03-11 20:46:41 +01:00
|
|
|
TableScanDesc scan;
|
2004-06-25 23:55:59 +02:00
|
|
|
HeapTuple tup;
|
|
|
|
HeapTuple newtuple;
|
|
|
|
Form_pg_tablespace newform;
|
Change many routines to return ObjectAddress rather than OID
The changed routines are mostly those that can be directly called by
ProcessUtilitySlow; the intention is to make the affected object
information more precise, in support for future event trigger changes.
Originally it was envisioned that the OID of the affected object would
be enough, and in most cases that is correct, but upon actually
implementing the event trigger changes it turned out that ObjectAddress
is more widely useful.
Additionally, some command execution routines grew an output argument
that's an object address which provides further info about the executed
command. To wit:
* for ALTER DOMAIN / ADD CONSTRAINT, it corresponds to the address of
the new constraint
* for ALTER OBJECT / SET SCHEMA, it corresponds to the address of the
schema that originally contained the object.
* for ALTER EXTENSION {ADD, DROP} OBJECT, it corresponds to the address
of the object added to or dropped from the extension.
There's no user-visible change in this commit, and no functional change
either.
Discussion: 20150218213255.GC6717@tamriel.snowman.net
Reviewed-By: Stephen Frost, Andres Freund
2015-03-03 18:10:50 +01:00
|
|
|
ObjectAddress address;
|
2004-06-25 23:55:59 +02:00
|
|
|
|
|
|
|
/* Search pg_tablespace */
|
2019-01-21 19:32:19 +01:00
|
|
|
rel = table_open(TableSpaceRelationId, RowExclusiveLock);
|
2004-06-25 23:55:59 +02:00
|
|
|
|
|
|
|
ScanKeyInit(&entry[0],
|
|
|
|
Anum_pg_tablespace_spcname,
|
|
|
|
BTEqualStrategyNumber, F_NAMEEQ,
|
|
|
|
CStringGetDatum(oldname));
|
tableam: Add and use scan APIs.
Too allow table accesses to be not directly dependent on heap, several
new abstractions are needed. Specifically:
1) Heap scans need to be generalized into table scans. Do this by
introducing TableScanDesc, which will be the "base class" for
individual AMs. This contains the AM independent fields from
HeapScanDesc.
The previous heap_{beginscan,rescan,endscan} et al. have been
replaced with a table_ version.
There's no direct replacement for heap_getnext(), as that returned
a HeapTuple, which is undesirable for a other AMs. Instead there's
table_scan_getnextslot(). But note that heap_getnext() lives on,
it's still used widely to access catalog tables.
This is achieved by new scan_begin, scan_end, scan_rescan,
scan_getnextslot callbacks.
2) The portion of parallel scans that's shared between backends need
to be able to do so without the user doing per-AM work. To achieve
that new parallelscan_{estimate, initialize, reinitialize}
callbacks are introduced, which operate on a new
ParallelTableScanDesc, which again can be subclassed by AMs.
As it is likely that several AMs are going to be block oriented,
block oriented callbacks that can be shared between such AMs are
provided and used by heap. table_block_parallelscan_{estimate,
intiialize, reinitialize} as callbacks, and
table_block_parallelscan_{nextpage, init} for use in AMs. These
operate on a ParallelBlockTableScanDesc.
3) Index scans need to be able to access tables to return a tuple, and
there needs to be state across individual accesses to the heap to
store state like buffers. That's now handled by introducing a
sort-of-scan IndexFetchTable, which again is intended to be
subclassed by individual AMs (for heap IndexFetchHeap).
The relevant callbacks for an AM are index_fetch_{end, begin,
reset} to create the necessary state, and index_fetch_tuple to
retrieve an indexed tuple. Note that index_fetch_tuple
implementations need to be smarter than just blindly fetching the
tuples for AMs that have optimizations similar to heap's HOT - the
currently alive tuple in the update chain needs to be fetched if
appropriate.
Similar to table_scan_getnextslot(), it's undesirable to continue
to return HeapTuples. Thus index_fetch_heap (might want to rename
that later) now accepts a slot as an argument. Core code doesn't
have a lot of call sites performing index scans without going
through the systable_* API (in contrast to loads of heap_getnext
calls and working directly with HeapTuples).
Index scans now store the result of a search in
IndexScanDesc->xs_heaptid, rather than xs_ctup->t_self. As the
target is not generally a HeapTuple anymore that seems cleaner.
To be able to sensible adapt code to use the above, two further
callbacks have been introduced:
a) slot_callbacks returns a TupleTableSlotOps* suitable for creating
slots capable of holding a tuple of the AMs
type. table_slot_callbacks() and table_slot_create() are based
upon that, but have additional logic to deal with views, foreign
tables, etc.
While this change could have been done separately, nearly all the
call sites that needed to be adapted for the rest of this commit
also would have been needed to be adapted for
table_slot_callbacks(), making separation not worthwhile.
b) tuple_satisfies_snapshot checks whether the tuple in a slot is
currently visible according to a snapshot. That's required as a few
places now don't have a buffer + HeapTuple around, but a
slot (which in heap's case internally has that information).
Additionally a few infrastructure changes were needed:
I) SysScanDesc, as used by systable_{beginscan, getnext} et al. now
internally uses a slot to keep track of tuples. While
systable_getnext() still returns HeapTuples, and will so for the
foreseeable future, the index API (see 1) above) now only deals with
slots.
The remainder, and largest part, of this commit is then adjusting all
scans in postgres to use the new APIs.
Author: Andres Freund, Haribabu Kommi, Alvaro Herrera
Discussion:
https://postgr.es/m/20180703070645.wchpu5muyto5n647@alap3.anarazel.de
https://postgr.es/m/20160812231527.GA690404@alvherre.pgsql
2019-03-11 20:46:41 +01:00
|
|
|
scan = table_beginscan_catalog(rel, 1, entry);
|
2004-06-25 23:55:59 +02:00
|
|
|
tup = heap_getnext(scan, ForwardScanDirection);
|
|
|
|
if (!HeapTupleIsValid(tup))
|
|
|
|
ereport(ERROR,
|
|
|
|
(errcode(ERRCODE_UNDEFINED_OBJECT),
|
|
|
|
errmsg("tablespace \"%s\" does not exist",
|
|
|
|
oldname)));
|
|
|
|
|
|
|
|
newtuple = heap_copytuple(tup);
|
|
|
|
newform = (Form_pg_tablespace) GETSTRUCT(newtuple);
|
Remove WITH OIDS support, change oid catalog column visibility.
Previously tables declared WITH OIDS, including a significant fraction
of the catalog tables, stored the oid column not as a normal column,
but as part of the tuple header.
This special column was not shown by default, which was somewhat odd,
as it's often (consider e.g. pg_class.oid) one of the more important
parts of a row. Neither pg_dump nor COPY included the contents of the
oid column by default.
The fact that the oid column was not an ordinary column necessitated a
significant amount of special case code to support oid columns. That
already was painful for the existing, but upcoming work aiming to make
table storage pluggable, would have required expanding and duplicating
that "specialness" significantly.
WITH OIDS has been deprecated since 2005 (commit ff02d0a05280e0).
Remove it.
Removing includes:
- CREATE TABLE and ALTER TABLE syntax for declaring the table to be
WITH OIDS has been removed (WITH (oids[ = true]) will error out)
- pg_dump does not support dumping tables declared WITH OIDS and will
issue a warning when dumping one (and ignore the oid column).
- restoring an pg_dump archive with pg_restore will warn when
restoring a table with oid contents (and ignore the oid column)
- COPY will refuse to load binary dump that includes oids.
- pg_upgrade will error out when encountering tables declared WITH
OIDS, they have to be altered to remove the oid column first.
- Functionality to access the oid of the last inserted row (like
plpgsql's RESULT_OID, spi's SPI_lastoid, ...) has been removed.
The syntax for declaring a table WITHOUT OIDS (or WITH (oids = false)
for CREATE TABLE) is still supported. While that requires a bit of
support code, it seems unnecessary to break applications / dumps that
do not use oids, and are explicit about not using them.
The biggest user of WITH OID columns was postgres' catalog. This
commit changes all 'magic' oid columns to be columns that are normally
declared and stored. To reduce unnecessary query breakage all the
newly added columns are still named 'oid', even if a table's column
naming scheme would indicate 'reloid' or such. This obviously
requires adapting a lot code, mostly replacing oid access via
HeapTupleGetOid() with access to the underlying Form_pg_*->oid column.
The bootstrap process now assigns oids for all oid columns in
genbki.pl that do not have an explicit value (starting at the largest
oid previously used), only oids assigned later by oids will be above
FirstBootstrapObjectId. As the oid column now is a normal column the
special bootstrap syntax for oids has been removed.
Oids are not automatically assigned during insertion anymore, all
backend code explicitly assigns oids with GetNewOidWithIndex(). For
the rare case that insertions into the catalog via SQL are called for
the new pg_nextoid() function can be used (which only works on catalog
tables).
The fact that oid columns on system tables are now normal columns
means that they will be included in the set of columns expanded
by * (i.e. SELECT * FROM pg_class will now include the table's oid,
previously it did not). It'd not technically be hard to hide oid
column by default, but that'd mean confusing behavior would either
have to be carried forward forever, or it'd cause breakage down the
line.
While it's not unlikely that further adjustments are needed, the
scope/invasiveness of the patch makes it worthwhile to get merge this
now. It's painful to maintain externally, too complicated to commit
after the code code freeze, and a dependency of a number of other
patches.
Catversion bump, for obvious reasons.
Author: Andres Freund, with contributions by John Naylor
Discussion: https://postgr.es/m/20180930034810.ywp2c7awz7opzcfr@alap3.anarazel.de
2018-11-21 00:36:57 +01:00
|
|
|
tspId = newform->oid;
|
2004-06-25 23:55:59 +02:00
|
|
|
|
tableam: Add and use scan APIs.
Too allow table accesses to be not directly dependent on heap, several
new abstractions are needed. Specifically:
1) Heap scans need to be generalized into table scans. Do this by
introducing TableScanDesc, which will be the "base class" for
individual AMs. This contains the AM independent fields from
HeapScanDesc.
The previous heap_{beginscan,rescan,endscan} et al. have been
replaced with a table_ version.
There's no direct replacement for heap_getnext(), as that returned
a HeapTuple, which is undesirable for a other AMs. Instead there's
table_scan_getnextslot(). But note that heap_getnext() lives on,
it's still used widely to access catalog tables.
This is achieved by new scan_begin, scan_end, scan_rescan,
scan_getnextslot callbacks.
2) The portion of parallel scans that's shared between backends need
to be able to do so without the user doing per-AM work. To achieve
that new parallelscan_{estimate, initialize, reinitialize}
callbacks are introduced, which operate on a new
ParallelTableScanDesc, which again can be subclassed by AMs.
As it is likely that several AMs are going to be block oriented,
block oriented callbacks that can be shared between such AMs are
provided and used by heap. table_block_parallelscan_{estimate,
intiialize, reinitialize} as callbacks, and
table_block_parallelscan_{nextpage, init} for use in AMs. These
operate on a ParallelBlockTableScanDesc.
3) Index scans need to be able to access tables to return a tuple, and
there needs to be state across individual accesses to the heap to
store state like buffers. That's now handled by introducing a
sort-of-scan IndexFetchTable, which again is intended to be
subclassed by individual AMs (for heap IndexFetchHeap).
The relevant callbacks for an AM are index_fetch_{end, begin,
reset} to create the necessary state, and index_fetch_tuple to
retrieve an indexed tuple. Note that index_fetch_tuple
implementations need to be smarter than just blindly fetching the
tuples for AMs that have optimizations similar to heap's HOT - the
currently alive tuple in the update chain needs to be fetched if
appropriate.
Similar to table_scan_getnextslot(), it's undesirable to continue
to return HeapTuples. Thus index_fetch_heap (might want to rename
that later) now accepts a slot as an argument. Core code doesn't
have a lot of call sites performing index scans without going
through the systable_* API (in contrast to loads of heap_getnext
calls and working directly with HeapTuples).
Index scans now store the result of a search in
IndexScanDesc->xs_heaptid, rather than xs_ctup->t_self. As the
target is not generally a HeapTuple anymore that seems cleaner.
To be able to sensible adapt code to use the above, two further
callbacks have been introduced:
a) slot_callbacks returns a TupleTableSlotOps* suitable for creating
slots capable of holding a tuple of the AMs
type. table_slot_callbacks() and table_slot_create() are based
upon that, but have additional logic to deal with views, foreign
tables, etc.
While this change could have been done separately, nearly all the
call sites that needed to be adapted for the rest of this commit
also would have been needed to be adapted for
table_slot_callbacks(), making separation not worthwhile.
b) tuple_satisfies_snapshot checks whether the tuple in a slot is
currently visible according to a snapshot. That's required as a few
places now don't have a buffer + HeapTuple around, but a
slot (which in heap's case internally has that information).
Additionally a few infrastructure changes were needed:
I) SysScanDesc, as used by systable_{beginscan, getnext} et al. now
internally uses a slot to keep track of tuples. While
systable_getnext() still returns HeapTuples, and will so for the
foreseeable future, the index API (see 1) above) now only deals with
slots.
The remainder, and largest part, of this commit is then adjusting all
scans in postgres to use the new APIs.
Author: Andres Freund, Haribabu Kommi, Alvaro Herrera
Discussion:
https://postgr.es/m/20180703070645.wchpu5muyto5n647@alap3.anarazel.de
https://postgr.es/m/20160812231527.GA690404@alvherre.pgsql
2019-03-11 20:46:41 +01:00
|
|
|
table_endscan(scan);
|
2004-06-25 23:55:59 +02:00
|
|
|
|
2005-06-28 07:09:14 +02:00
|
|
|
/* Must be owner */
|
Remove WITH OIDS support, change oid catalog column visibility.
Previously tables declared WITH OIDS, including a significant fraction
of the catalog tables, stored the oid column not as a normal column,
but as part of the tuple header.
This special column was not shown by default, which was somewhat odd,
as it's often (consider e.g. pg_class.oid) one of the more important
parts of a row. Neither pg_dump nor COPY included the contents of the
oid column by default.
The fact that the oid column was not an ordinary column necessitated a
significant amount of special case code to support oid columns. That
already was painful for the existing, but upcoming work aiming to make
table storage pluggable, would have required expanding and duplicating
that "specialness" significantly.
WITH OIDS has been deprecated since 2005 (commit ff02d0a05280e0).
Remove it.
Removing includes:
- CREATE TABLE and ALTER TABLE syntax for declaring the table to be
WITH OIDS has been removed (WITH (oids[ = true]) will error out)
- pg_dump does not support dumping tables declared WITH OIDS and will
issue a warning when dumping one (and ignore the oid column).
- restoring an pg_dump archive with pg_restore will warn when
restoring a table with oid contents (and ignore the oid column)
- COPY will refuse to load binary dump that includes oids.
- pg_upgrade will error out when encountering tables declared WITH
OIDS, they have to be altered to remove the oid column first.
- Functionality to access the oid of the last inserted row (like
plpgsql's RESULT_OID, spi's SPI_lastoid, ...) has been removed.
The syntax for declaring a table WITHOUT OIDS (or WITH (oids = false)
for CREATE TABLE) is still supported. While that requires a bit of
support code, it seems unnecessary to break applications / dumps that
do not use oids, and are explicit about not using them.
The biggest user of WITH OID columns was postgres' catalog. This
commit changes all 'magic' oid columns to be columns that are normally
declared and stored. To reduce unnecessary query breakage all the
newly added columns are still named 'oid', even if a table's column
naming scheme would indicate 'reloid' or such. This obviously
requires adapting a lot code, mostly replacing oid access via
HeapTupleGetOid() with access to the underlying Form_pg_*->oid column.
The bootstrap process now assigns oids for all oid columns in
genbki.pl that do not have an explicit value (starting at the largest
oid previously used), only oids assigned later by oids will be above
FirstBootstrapObjectId. As the oid column now is a normal column the
special bootstrap syntax for oids has been removed.
Oids are not automatically assigned during insertion anymore, all
backend code explicitly assigns oids with GetNewOidWithIndex(). For
the rare case that insertions into the catalog via SQL are called for
the new pg_nextoid() function can be used (which only works on catalog
tables).
The fact that oid columns on system tables are now normal columns
means that they will be included in the set of columns expanded
by * (i.e. SELECT * FROM pg_class will now include the table's oid,
previously it did not). It'd not technically be hard to hide oid
column by default, but that'd mean confusing behavior would either
have to be carried forward forever, or it'd cause breakage down the
line.
While it's not unlikely that further adjustments are needed, the
scope/invasiveness of the patch makes it worthwhile to get merge this
now. It's painful to maintain externally, too complicated to commit
after the code code freeze, and a dependency of a number of other
patches.
Catversion bump, for obvious reasons.
Author: Andres Freund, with contributions by John Naylor
Discussion: https://postgr.es/m/20180930034810.ywp2c7awz7opzcfr@alap3.anarazel.de
2018-11-21 00:36:57 +01:00
|
|
|
if (!pg_tablespace_ownercheck(tspId, GetUserId()))
|
2017-12-02 15:26:34 +01:00
|
|
|
aclcheck_error(ACLCHECK_NO_PRIV, OBJECT_TABLESPACE, oldname);
|
2004-06-25 23:55:59 +02:00
|
|
|
|
|
|
|
/* Validate new name */
|
|
|
|
if (!allowSystemTableMods && IsReservedName(newname))
|
|
|
|
ereport(ERROR,
|
|
|
|
(errcode(ERRCODE_RESERVED_NAME),
|
|
|
|
errmsg("unacceptable tablespace name \"%s\"", newname),
|
|
|
|
errdetail("The prefix \"pg_\" is reserved for system tablespaces.")));
|
|
|
|
|
Add an enforcement mechanism for global object names in regression tests.
In commit 18555b132 we tentatively established a rule that regression
tests should use names containing "regression" for databases, and names
starting with "regress_" for all other globally-visible object names, so
as to circumscribe the side-effects that "make installcheck" could have
on an existing installation.
This commit adds a simple enforcement mechanism for that rule: if the code
is compiled with ENFORCE_REGRESSION_TEST_NAME_RESTRICTIONS defined, it
will emit a warning (not an error) whenever a database, role, tablespace,
subscription, or replication origin name is created that doesn't obey the
rule. Running one or more buildfarm members with that symbol defined
should be enough to catch new violations, at least in the regular
regression tests. Most TAP tests wouldn't notice such warnings, but
that's actually fine because TAP tests don't execute against an existing
server anyway.
Since it's already the case that running src/test/modules/ tests in
installcheck mode is deprecated, we can use that as a home for tests
that seem unsafe to run against an existing server, such as tests that
might have side-effects on existing roles. Document that (though this
commit doesn't in itself make it any less safe than before).
Update regress.sgml to define these restrictions more clearly, and
to clean up assorted lack-of-up-to-date-ness in its descriptions of
the available regression tests.
Discussion: https://postgr.es/m/16638.1468620817@sss.pgh.pa.us
2019-06-29 17:34:00 +02:00
|
|
|
/*
|
|
|
|
* If built with appropriate switch, whine when regression-testing
|
|
|
|
* conventions for tablespace names are violated.
|
|
|
|
*/
|
|
|
|
#ifdef ENFORCE_REGRESSION_TEST_NAME_RESTRICTIONS
|
|
|
|
if (strncmp(newname, "regress_", 8) != 0)
|
|
|
|
elog(WARNING, "tablespaces created by regression test cases should have names starting with \"regress_\"");
|
|
|
|
#endif
|
|
|
|
|
2004-06-25 23:55:59 +02:00
|
|
|
/* Make sure the new name doesn't exist */
|
|
|
|
ScanKeyInit(&entry[0],
|
|
|
|
Anum_pg_tablespace_spcname,
|
|
|
|
BTEqualStrategyNumber, F_NAMEEQ,
|
|
|
|
CStringGetDatum(newname));
|
tableam: Add and use scan APIs.
Too allow table accesses to be not directly dependent on heap, several
new abstractions are needed. Specifically:
1) Heap scans need to be generalized into table scans. Do this by
introducing TableScanDesc, which will be the "base class" for
individual AMs. This contains the AM independent fields from
HeapScanDesc.
The previous heap_{beginscan,rescan,endscan} et al. have been
replaced with a table_ version.
There's no direct replacement for heap_getnext(), as that returned
a HeapTuple, which is undesirable for a other AMs. Instead there's
table_scan_getnextslot(). But note that heap_getnext() lives on,
it's still used widely to access catalog tables.
This is achieved by new scan_begin, scan_end, scan_rescan,
scan_getnextslot callbacks.
2) The portion of parallel scans that's shared between backends need
to be able to do so without the user doing per-AM work. To achieve
that new parallelscan_{estimate, initialize, reinitialize}
callbacks are introduced, which operate on a new
ParallelTableScanDesc, which again can be subclassed by AMs.
As it is likely that several AMs are going to be block oriented,
block oriented callbacks that can be shared between such AMs are
provided and used by heap. table_block_parallelscan_{estimate,
intiialize, reinitialize} as callbacks, and
table_block_parallelscan_{nextpage, init} for use in AMs. These
operate on a ParallelBlockTableScanDesc.
3) Index scans need to be able to access tables to return a tuple, and
there needs to be state across individual accesses to the heap to
store state like buffers. That's now handled by introducing a
sort-of-scan IndexFetchTable, which again is intended to be
subclassed by individual AMs (for heap IndexFetchHeap).
The relevant callbacks for an AM are index_fetch_{end, begin,
reset} to create the necessary state, and index_fetch_tuple to
retrieve an indexed tuple. Note that index_fetch_tuple
implementations need to be smarter than just blindly fetching the
tuples for AMs that have optimizations similar to heap's HOT - the
currently alive tuple in the update chain needs to be fetched if
appropriate.
Similar to table_scan_getnextslot(), it's undesirable to continue
to return HeapTuples. Thus index_fetch_heap (might want to rename
that later) now accepts a slot as an argument. Core code doesn't
have a lot of call sites performing index scans without going
through the systable_* API (in contrast to loads of heap_getnext
calls and working directly with HeapTuples).
Index scans now store the result of a search in
IndexScanDesc->xs_heaptid, rather than xs_ctup->t_self. As the
target is not generally a HeapTuple anymore that seems cleaner.
To be able to sensible adapt code to use the above, two further
callbacks have been introduced:
a) slot_callbacks returns a TupleTableSlotOps* suitable for creating
slots capable of holding a tuple of the AMs
type. table_slot_callbacks() and table_slot_create() are based
upon that, but have additional logic to deal with views, foreign
tables, etc.
While this change could have been done separately, nearly all the
call sites that needed to be adapted for the rest of this commit
also would have been needed to be adapted for
table_slot_callbacks(), making separation not worthwhile.
b) tuple_satisfies_snapshot checks whether the tuple in a slot is
currently visible according to a snapshot. That's required as a few
places now don't have a buffer + HeapTuple around, but a
slot (which in heap's case internally has that information).
Additionally a few infrastructure changes were needed:
I) SysScanDesc, as used by systable_{beginscan, getnext} et al. now
internally uses a slot to keep track of tuples. While
systable_getnext() still returns HeapTuples, and will so for the
foreseeable future, the index API (see 1) above) now only deals with
slots.
The remainder, and largest part, of this commit is then adjusting all
scans in postgres to use the new APIs.
Author: Andres Freund, Haribabu Kommi, Alvaro Herrera
Discussion:
https://postgr.es/m/20180703070645.wchpu5muyto5n647@alap3.anarazel.de
https://postgr.es/m/20160812231527.GA690404@alvherre.pgsql
2019-03-11 20:46:41 +01:00
|
|
|
scan = table_beginscan_catalog(rel, 1, entry);
|
2004-06-25 23:55:59 +02:00
|
|
|
tup = heap_getnext(scan, ForwardScanDirection);
|
|
|
|
if (HeapTupleIsValid(tup))
|
|
|
|
ereport(ERROR,
|
|
|
|
(errcode(ERRCODE_DUPLICATE_OBJECT),
|
|
|
|
errmsg("tablespace \"%s\" already exists",
|
|
|
|
newname)));
|
2004-08-29 07:07:03 +02:00
|
|
|
|
tableam: Add and use scan APIs.
Too allow table accesses to be not directly dependent on heap, several
new abstractions are needed. Specifically:
1) Heap scans need to be generalized into table scans. Do this by
introducing TableScanDesc, which will be the "base class" for
individual AMs. This contains the AM independent fields from
HeapScanDesc.
The previous heap_{beginscan,rescan,endscan} et al. have been
replaced with a table_ version.
There's no direct replacement for heap_getnext(), as that returned
a HeapTuple, which is undesirable for a other AMs. Instead there's
table_scan_getnextslot(). But note that heap_getnext() lives on,
it's still used widely to access catalog tables.
This is achieved by new scan_begin, scan_end, scan_rescan,
scan_getnextslot callbacks.
2) The portion of parallel scans that's shared between backends need
to be able to do so without the user doing per-AM work. To achieve
that new parallelscan_{estimate, initialize, reinitialize}
callbacks are introduced, which operate on a new
ParallelTableScanDesc, which again can be subclassed by AMs.
As it is likely that several AMs are going to be block oriented,
block oriented callbacks that can be shared between such AMs are
provided and used by heap. table_block_parallelscan_{estimate,
intiialize, reinitialize} as callbacks, and
table_block_parallelscan_{nextpage, init} for use in AMs. These
operate on a ParallelBlockTableScanDesc.
3) Index scans need to be able to access tables to return a tuple, and
there needs to be state across individual accesses to the heap to
store state like buffers. That's now handled by introducing a
sort-of-scan IndexFetchTable, which again is intended to be
subclassed by individual AMs (for heap IndexFetchHeap).
The relevant callbacks for an AM are index_fetch_{end, begin,
reset} to create the necessary state, and index_fetch_tuple to
retrieve an indexed tuple. Note that index_fetch_tuple
implementations need to be smarter than just blindly fetching the
tuples for AMs that have optimizations similar to heap's HOT - the
currently alive tuple in the update chain needs to be fetched if
appropriate.
Similar to table_scan_getnextslot(), it's undesirable to continue
to return HeapTuples. Thus index_fetch_heap (might want to rename
that later) now accepts a slot as an argument. Core code doesn't
have a lot of call sites performing index scans without going
through the systable_* API (in contrast to loads of heap_getnext
calls and working directly with HeapTuples).
Index scans now store the result of a search in
IndexScanDesc->xs_heaptid, rather than xs_ctup->t_self. As the
target is not generally a HeapTuple anymore that seems cleaner.
To be able to sensible adapt code to use the above, two further
callbacks have been introduced:
a) slot_callbacks returns a TupleTableSlotOps* suitable for creating
slots capable of holding a tuple of the AMs
type. table_slot_callbacks() and table_slot_create() are based
upon that, but have additional logic to deal with views, foreign
tables, etc.
While this change could have been done separately, nearly all the
call sites that needed to be adapted for the rest of this commit
also would have been needed to be adapted for
table_slot_callbacks(), making separation not worthwhile.
b) tuple_satisfies_snapshot checks whether the tuple in a slot is
currently visible according to a snapshot. That's required as a few
places now don't have a buffer + HeapTuple around, but a
slot (which in heap's case internally has that information).
Additionally a few infrastructure changes were needed:
I) SysScanDesc, as used by systable_{beginscan, getnext} et al. now
internally uses a slot to keep track of tuples. While
systable_getnext() still returns HeapTuples, and will so for the
foreseeable future, the index API (see 1) above) now only deals with
slots.
The remainder, and largest part, of this commit is then adjusting all
scans in postgres to use the new APIs.
Author: Andres Freund, Haribabu Kommi, Alvaro Herrera
Discussion:
https://postgr.es/m/20180703070645.wchpu5muyto5n647@alap3.anarazel.de
https://postgr.es/m/20160812231527.GA690404@alvherre.pgsql
2019-03-11 20:46:41 +01:00
|
|
|
table_endscan(scan);
|
2004-06-25 23:55:59 +02:00
|
|
|
|
|
|
|
/* OK, update the entry */
|
|
|
|
namestrcpy(&(newform->spcname), newname);
|
|
|
|
|
2017-01-31 22:42:24 +01:00
|
|
|
CatalogTupleUpdate(rel, &newtuple->t_self, newtuple);
|
2004-06-25 23:55:59 +02:00
|
|
|
|
2013-03-18 03:55:14 +01:00
|
|
|
InvokeObjectPostAlterHook(TableSpaceRelationId, tspId, 0);
|
|
|
|
|
Change many routines to return ObjectAddress rather than OID
The changed routines are mostly those that can be directly called by
ProcessUtilitySlow; the intention is to make the affected object
information more precise, in support for future event trigger changes.
Originally it was envisioned that the OID of the affected object would
be enough, and in most cases that is correct, but upon actually
implementing the event trigger changes it turned out that ObjectAddress
is more widely useful.
Additionally, some command execution routines grew an output argument
that's an object address which provides further info about the executed
command. To wit:
* for ALTER DOMAIN / ADD CONSTRAINT, it corresponds to the address of
the new constraint
* for ALTER OBJECT / SET SCHEMA, it corresponds to the address of the
schema that originally contained the object.
* for ALTER EXTENSION {ADD, DROP} OBJECT, it corresponds to the address
of the object added to or dropped from the extension.
There's no user-visible change in this commit, and no functional change
either.
Discussion: 20150218213255.GC6717@tamriel.snowman.net
Reviewed-By: Stephen Frost, Andres Freund
2015-03-03 18:10:50 +01:00
|
|
|
ObjectAddressSet(address, TableSpaceRelationId, tspId);
|
|
|
|
|
2019-01-21 19:32:19 +01:00
|
|
|
table_close(rel, NoLock);
|
2012-12-24 00:25:03 +01:00
|
|
|
|
Change many routines to return ObjectAddress rather than OID
The changed routines are mostly those that can be directly called by
ProcessUtilitySlow; the intention is to make the affected object
information more precise, in support for future event trigger changes.
Originally it was envisioned that the OID of the affected object would
be enough, and in most cases that is correct, but upon actually
implementing the event trigger changes it turned out that ObjectAddress
is more widely useful.
Additionally, some command execution routines grew an output argument
that's an object address which provides further info about the executed
command. To wit:
* for ALTER DOMAIN / ADD CONSTRAINT, it corresponds to the address of
the new constraint
* for ALTER OBJECT / SET SCHEMA, it corresponds to the address of the
schema that originally contained the object.
* for ALTER EXTENSION {ADD, DROP} OBJECT, it corresponds to the address
of the object added to or dropped from the extension.
There's no user-visible change in this commit, and no functional change
either.
Discussion: 20150218213255.GC6717@tamriel.snowman.net
Reviewed-By: Stephen Frost, Andres Freund
2015-03-03 18:10:50 +01:00
|
|
|
return address;
|
2004-06-25 23:55:59 +02:00
|
|
|
}
|
|
|
|
|
2010-01-05 22:54:00 +01:00
|
|
|
/*
|
|
|
|
* Alter table space options
|
|
|
|
*/
|
2012-12-29 13:55:37 +01:00
|
|
|
Oid
|
2010-01-05 22:54:00 +01:00
|
|
|
AlterTableSpaceOptions(AlterTableSpaceOptionsStmt *stmt)
|
|
|
|
{
|
|
|
|
Relation rel;
|
|
|
|
ScanKeyData entry[1];
|
tableam: Add and use scan APIs.
Too allow table accesses to be not directly dependent on heap, several
new abstractions are needed. Specifically:
1) Heap scans need to be generalized into table scans. Do this by
introducing TableScanDesc, which will be the "base class" for
individual AMs. This contains the AM independent fields from
HeapScanDesc.
The previous heap_{beginscan,rescan,endscan} et al. have been
replaced with a table_ version.
There's no direct replacement for heap_getnext(), as that returned
a HeapTuple, which is undesirable for a other AMs. Instead there's
table_scan_getnextslot(). But note that heap_getnext() lives on,
it's still used widely to access catalog tables.
This is achieved by new scan_begin, scan_end, scan_rescan,
scan_getnextslot callbacks.
2) The portion of parallel scans that's shared between backends need
to be able to do so without the user doing per-AM work. To achieve
that new parallelscan_{estimate, initialize, reinitialize}
callbacks are introduced, which operate on a new
ParallelTableScanDesc, which again can be subclassed by AMs.
As it is likely that several AMs are going to be block oriented,
block oriented callbacks that can be shared between such AMs are
provided and used by heap. table_block_parallelscan_{estimate,
intiialize, reinitialize} as callbacks, and
table_block_parallelscan_{nextpage, init} for use in AMs. These
operate on a ParallelBlockTableScanDesc.
3) Index scans need to be able to access tables to return a tuple, and
there needs to be state across individual accesses to the heap to
store state like buffers. That's now handled by introducing a
sort-of-scan IndexFetchTable, which again is intended to be
subclassed by individual AMs (for heap IndexFetchHeap).
The relevant callbacks for an AM are index_fetch_{end, begin,
reset} to create the necessary state, and index_fetch_tuple to
retrieve an indexed tuple. Note that index_fetch_tuple
implementations need to be smarter than just blindly fetching the
tuples for AMs that have optimizations similar to heap's HOT - the
currently alive tuple in the update chain needs to be fetched if
appropriate.
Similar to table_scan_getnextslot(), it's undesirable to continue
to return HeapTuples. Thus index_fetch_heap (might want to rename
that later) now accepts a slot as an argument. Core code doesn't
have a lot of call sites performing index scans without going
through the systable_* API (in contrast to loads of heap_getnext
calls and working directly with HeapTuples).
Index scans now store the result of a search in
IndexScanDesc->xs_heaptid, rather than xs_ctup->t_self. As the
target is not generally a HeapTuple anymore that seems cleaner.
To be able to sensible adapt code to use the above, two further
callbacks have been introduced:
a) slot_callbacks returns a TupleTableSlotOps* suitable for creating
slots capable of holding a tuple of the AMs
type. table_slot_callbacks() and table_slot_create() are based
upon that, but have additional logic to deal with views, foreign
tables, etc.
While this change could have been done separately, nearly all the
call sites that needed to be adapted for the rest of this commit
also would have been needed to be adapted for
table_slot_callbacks(), making separation not worthwhile.
b) tuple_satisfies_snapshot checks whether the tuple in a slot is
currently visible according to a snapshot. That's required as a few
places now don't have a buffer + HeapTuple around, but a
slot (which in heap's case internally has that information).
Additionally a few infrastructure changes were needed:
I) SysScanDesc, as used by systable_{beginscan, getnext} et al. now
internally uses a slot to keep track of tuples. While
systable_getnext() still returns HeapTuples, and will so for the
foreseeable future, the index API (see 1) above) now only deals with
slots.
The remainder, and largest part, of this commit is then adjusting all
scans in postgres to use the new APIs.
Author: Andres Freund, Haribabu Kommi, Alvaro Herrera
Discussion:
https://postgr.es/m/20180703070645.wchpu5muyto5n647@alap3.anarazel.de
https://postgr.es/m/20160812231527.GA690404@alvherre.pgsql
2019-03-11 20:46:41 +01:00
|
|
|
TableScanDesc scandesc;
|
2010-01-05 22:54:00 +01:00
|
|
|
HeapTuple tup;
|
2012-12-29 13:55:37 +01:00
|
|
|
Oid tablespaceoid;
|
2010-01-05 22:54:00 +01:00
|
|
|
Datum datum;
|
|
|
|
Datum newOptions;
|
|
|
|
Datum repl_val[Natts_pg_tablespace];
|
|
|
|
bool isnull;
|
|
|
|
bool repl_null[Natts_pg_tablespace];
|
|
|
|
bool repl_repl[Natts_pg_tablespace];
|
|
|
|
HeapTuple newtuple;
|
|
|
|
|
|
|
|
/* Search pg_tablespace */
|
2019-01-21 19:32:19 +01:00
|
|
|
rel = table_open(TableSpaceRelationId, RowExclusiveLock);
|
2010-01-05 22:54:00 +01:00
|
|
|
|
|
|
|
ScanKeyInit(&entry[0],
|
|
|
|
Anum_pg_tablespace_spcname,
|
|
|
|
BTEqualStrategyNumber, F_NAMEEQ,
|
|
|
|
CStringGetDatum(stmt->tablespacename));
|
tableam: Add and use scan APIs.
Too allow table accesses to be not directly dependent on heap, several
new abstractions are needed. Specifically:
1) Heap scans need to be generalized into table scans. Do this by
introducing TableScanDesc, which will be the "base class" for
individual AMs. This contains the AM independent fields from
HeapScanDesc.
The previous heap_{beginscan,rescan,endscan} et al. have been
replaced with a table_ version.
There's no direct replacement for heap_getnext(), as that returned
a HeapTuple, which is undesirable for a other AMs. Instead there's
table_scan_getnextslot(). But note that heap_getnext() lives on,
it's still used widely to access catalog tables.
This is achieved by new scan_begin, scan_end, scan_rescan,
scan_getnextslot callbacks.
2) The portion of parallel scans that's shared between backends need
to be able to do so without the user doing per-AM work. To achieve
that new parallelscan_{estimate, initialize, reinitialize}
callbacks are introduced, which operate on a new
ParallelTableScanDesc, which again can be subclassed by AMs.
As it is likely that several AMs are going to be block oriented,
block oriented callbacks that can be shared between such AMs are
provided and used by heap. table_block_parallelscan_{estimate,
intiialize, reinitialize} as callbacks, and
table_block_parallelscan_{nextpage, init} for use in AMs. These
operate on a ParallelBlockTableScanDesc.
3) Index scans need to be able to access tables to return a tuple, and
there needs to be state across individual accesses to the heap to
store state like buffers. That's now handled by introducing a
sort-of-scan IndexFetchTable, which again is intended to be
subclassed by individual AMs (for heap IndexFetchHeap).
The relevant callbacks for an AM are index_fetch_{end, begin,
reset} to create the necessary state, and index_fetch_tuple to
retrieve an indexed tuple. Note that index_fetch_tuple
implementations need to be smarter than just blindly fetching the
tuples for AMs that have optimizations similar to heap's HOT - the
currently alive tuple in the update chain needs to be fetched if
appropriate.
Similar to table_scan_getnextslot(), it's undesirable to continue
to return HeapTuples. Thus index_fetch_heap (might want to rename
that later) now accepts a slot as an argument. Core code doesn't
have a lot of call sites performing index scans without going
through the systable_* API (in contrast to loads of heap_getnext
calls and working directly with HeapTuples).
Index scans now store the result of a search in
IndexScanDesc->xs_heaptid, rather than xs_ctup->t_self. As the
target is not generally a HeapTuple anymore that seems cleaner.
To be able to sensible adapt code to use the above, two further
callbacks have been introduced:
a) slot_callbacks returns a TupleTableSlotOps* suitable for creating
slots capable of holding a tuple of the AMs
type. table_slot_callbacks() and table_slot_create() are based
upon that, but have additional logic to deal with views, foreign
tables, etc.
While this change could have been done separately, nearly all the
call sites that needed to be adapted for the rest of this commit
also would have been needed to be adapted for
table_slot_callbacks(), making separation not worthwhile.
b) tuple_satisfies_snapshot checks whether the tuple in a slot is
currently visible according to a snapshot. That's required as a few
places now don't have a buffer + HeapTuple around, but a
slot (which in heap's case internally has that information).
Additionally a few infrastructure changes were needed:
I) SysScanDesc, as used by systable_{beginscan, getnext} et al. now
internally uses a slot to keep track of tuples. While
systable_getnext() still returns HeapTuples, and will so for the
foreseeable future, the index API (see 1) above) now only deals with
slots.
The remainder, and largest part, of this commit is then adjusting all
scans in postgres to use the new APIs.
Author: Andres Freund, Haribabu Kommi, Alvaro Herrera
Discussion:
https://postgr.es/m/20180703070645.wchpu5muyto5n647@alap3.anarazel.de
https://postgr.es/m/20160812231527.GA690404@alvherre.pgsql
2019-03-11 20:46:41 +01:00
|
|
|
scandesc = table_beginscan_catalog(rel, 1, entry);
|
2010-01-05 22:54:00 +01:00
|
|
|
tup = heap_getnext(scandesc, ForwardScanDirection);
|
|
|
|
if (!HeapTupleIsValid(tup))
|
|
|
|
ereport(ERROR,
|
|
|
|
(errcode(ERRCODE_UNDEFINED_OBJECT),
|
|
|
|
errmsg("tablespace \"%s\" does not exist",
|
|
|
|
stmt->tablespacename)));
|
|
|
|
|
Remove WITH OIDS support, change oid catalog column visibility.
Previously tables declared WITH OIDS, including a significant fraction
of the catalog tables, stored the oid column not as a normal column,
but as part of the tuple header.
This special column was not shown by default, which was somewhat odd,
as it's often (consider e.g. pg_class.oid) one of the more important
parts of a row. Neither pg_dump nor COPY included the contents of the
oid column by default.
The fact that the oid column was not an ordinary column necessitated a
significant amount of special case code to support oid columns. That
already was painful for the existing, but upcoming work aiming to make
table storage pluggable, would have required expanding and duplicating
that "specialness" significantly.
WITH OIDS has been deprecated since 2005 (commit ff02d0a05280e0).
Remove it.
Removing includes:
- CREATE TABLE and ALTER TABLE syntax for declaring the table to be
WITH OIDS has been removed (WITH (oids[ = true]) will error out)
- pg_dump does not support dumping tables declared WITH OIDS and will
issue a warning when dumping one (and ignore the oid column).
- restoring an pg_dump archive with pg_restore will warn when
restoring a table with oid contents (and ignore the oid column)
- COPY will refuse to load binary dump that includes oids.
- pg_upgrade will error out when encountering tables declared WITH
OIDS, they have to be altered to remove the oid column first.
- Functionality to access the oid of the last inserted row (like
plpgsql's RESULT_OID, spi's SPI_lastoid, ...) has been removed.
The syntax for declaring a table WITHOUT OIDS (or WITH (oids = false)
for CREATE TABLE) is still supported. While that requires a bit of
support code, it seems unnecessary to break applications / dumps that
do not use oids, and are explicit about not using them.
The biggest user of WITH OID columns was postgres' catalog. This
commit changes all 'magic' oid columns to be columns that are normally
declared and stored. To reduce unnecessary query breakage all the
newly added columns are still named 'oid', even if a table's column
naming scheme would indicate 'reloid' or such. This obviously
requires adapting a lot code, mostly replacing oid access via
HeapTupleGetOid() with access to the underlying Form_pg_*->oid column.
The bootstrap process now assigns oids for all oid columns in
genbki.pl that do not have an explicit value (starting at the largest
oid previously used), only oids assigned later by oids will be above
FirstBootstrapObjectId. As the oid column now is a normal column the
special bootstrap syntax for oids has been removed.
Oids are not automatically assigned during insertion anymore, all
backend code explicitly assigns oids with GetNewOidWithIndex(). For
the rare case that insertions into the catalog via SQL are called for
the new pg_nextoid() function can be used (which only works on catalog
tables).
The fact that oid columns on system tables are now normal columns
means that they will be included in the set of columns expanded
by * (i.e. SELECT * FROM pg_class will now include the table's oid,
previously it did not). It'd not technically be hard to hide oid
column by default, but that'd mean confusing behavior would either
have to be carried forward forever, or it'd cause breakage down the
line.
While it's not unlikely that further adjustments are needed, the
scope/invasiveness of the patch makes it worthwhile to get merge this
now. It's painful to maintain externally, too complicated to commit
after the code code freeze, and a dependency of a number of other
patches.
Catversion bump, for obvious reasons.
Author: Andres Freund, with contributions by John Naylor
Discussion: https://postgr.es/m/20180930034810.ywp2c7awz7opzcfr@alap3.anarazel.de
2018-11-21 00:36:57 +01:00
|
|
|
tablespaceoid = ((Form_pg_tablespace) GETSTRUCT(tup))->oid;
|
2012-12-29 13:55:37 +01:00
|
|
|
|
2010-01-05 22:54:00 +01:00
|
|
|
/* Must be owner of the existing object */
|
Remove WITH OIDS support, change oid catalog column visibility.
Previously tables declared WITH OIDS, including a significant fraction
of the catalog tables, stored the oid column not as a normal column,
but as part of the tuple header.
This special column was not shown by default, which was somewhat odd,
as it's often (consider e.g. pg_class.oid) one of the more important
parts of a row. Neither pg_dump nor COPY included the contents of the
oid column by default.
The fact that the oid column was not an ordinary column necessitated a
significant amount of special case code to support oid columns. That
already was painful for the existing, but upcoming work aiming to make
table storage pluggable, would have required expanding and duplicating
that "specialness" significantly.
WITH OIDS has been deprecated since 2005 (commit ff02d0a05280e0).
Remove it.
Removing includes:
- CREATE TABLE and ALTER TABLE syntax for declaring the table to be
WITH OIDS has been removed (WITH (oids[ = true]) will error out)
- pg_dump does not support dumping tables declared WITH OIDS and will
issue a warning when dumping one (and ignore the oid column).
- restoring an pg_dump archive with pg_restore will warn when
restoring a table with oid contents (and ignore the oid column)
- COPY will refuse to load binary dump that includes oids.
- pg_upgrade will error out when encountering tables declared WITH
OIDS, they have to be altered to remove the oid column first.
- Functionality to access the oid of the last inserted row (like
plpgsql's RESULT_OID, spi's SPI_lastoid, ...) has been removed.
The syntax for declaring a table WITHOUT OIDS (or WITH (oids = false)
for CREATE TABLE) is still supported. While that requires a bit of
support code, it seems unnecessary to break applications / dumps that
do not use oids, and are explicit about not using them.
The biggest user of WITH OID columns was postgres' catalog. This
commit changes all 'magic' oid columns to be columns that are normally
declared and stored. To reduce unnecessary query breakage all the
newly added columns are still named 'oid', even if a table's column
naming scheme would indicate 'reloid' or such. This obviously
requires adapting a lot code, mostly replacing oid access via
HeapTupleGetOid() with access to the underlying Form_pg_*->oid column.
The bootstrap process now assigns oids for all oid columns in
genbki.pl that do not have an explicit value (starting at the largest
oid previously used), only oids assigned later by oids will be above
FirstBootstrapObjectId. As the oid column now is a normal column the
special bootstrap syntax for oids has been removed.
Oids are not automatically assigned during insertion anymore, all
backend code explicitly assigns oids with GetNewOidWithIndex(). For
the rare case that insertions into the catalog via SQL are called for
the new pg_nextoid() function can be used (which only works on catalog
tables).
The fact that oid columns on system tables are now normal columns
means that they will be included in the set of columns expanded
by * (i.e. SELECT * FROM pg_class will now include the table's oid,
previously it did not). It'd not technically be hard to hide oid
column by default, but that'd mean confusing behavior would either
have to be carried forward forever, or it'd cause breakage down the
line.
While it's not unlikely that further adjustments are needed, the
scope/invasiveness of the patch makes it worthwhile to get merge this
now. It's painful to maintain externally, too complicated to commit
after the code code freeze, and a dependency of a number of other
patches.
Catversion bump, for obvious reasons.
Author: Andres Freund, with contributions by John Naylor
Discussion: https://postgr.es/m/20180930034810.ywp2c7awz7opzcfr@alap3.anarazel.de
2018-11-21 00:36:57 +01:00
|
|
|
if (!pg_tablespace_ownercheck(tablespaceoid, GetUserId()))
|
2017-12-02 15:26:34 +01:00
|
|
|
aclcheck_error(ACLCHECK_NOT_OWNER, OBJECT_TABLESPACE,
|
2010-01-05 22:54:00 +01:00
|
|
|
stmt->tablespacename);
|
|
|
|
|
|
|
|
/* Generate new proposed spcoptions (text array) */
|
|
|
|
datum = heap_getattr(tup, Anum_pg_tablespace_spcoptions,
|
|
|
|
RelationGetDescr(rel), &isnull);
|
|
|
|
newOptions = transformRelOptions(isnull ? (Datum) 0 : datum,
|
|
|
|
stmt->options, NULL, NULL, false,
|
|
|
|
stmt->isReset);
|
|
|
|
(void) tablespace_reloptions(newOptions, true);
|
|
|
|
|
|
|
|
/* Build new tuple. */
|
|
|
|
memset(repl_null, false, sizeof(repl_null));
|
|
|
|
memset(repl_repl, false, sizeof(repl_repl));
|
|
|
|
if (newOptions != (Datum) 0)
|
|
|
|
repl_val[Anum_pg_tablespace_spcoptions - 1] = newOptions;
|
|
|
|
else
|
|
|
|
repl_null[Anum_pg_tablespace_spcoptions - 1] = true;
|
|
|
|
repl_repl[Anum_pg_tablespace_spcoptions - 1] = true;
|
|
|
|
newtuple = heap_modify_tuple(tup, RelationGetDescr(rel), repl_val,
|
|
|
|
repl_null, repl_repl);
|
|
|
|
|
|
|
|
/* Update system catalog. */
|
2017-01-31 22:42:24 +01:00
|
|
|
CatalogTupleUpdate(rel, &newtuple->t_self, newtuple);
|
2013-03-18 03:55:14 +01:00
|
|
|
|
Remove WITH OIDS support, change oid catalog column visibility.
Previously tables declared WITH OIDS, including a significant fraction
of the catalog tables, stored the oid column not as a normal column,
but as part of the tuple header.
This special column was not shown by default, which was somewhat odd,
as it's often (consider e.g. pg_class.oid) one of the more important
parts of a row. Neither pg_dump nor COPY included the contents of the
oid column by default.
The fact that the oid column was not an ordinary column necessitated a
significant amount of special case code to support oid columns. That
already was painful for the existing, but upcoming work aiming to make
table storage pluggable, would have required expanding and duplicating
that "specialness" significantly.
WITH OIDS has been deprecated since 2005 (commit ff02d0a05280e0).
Remove it.
Removing includes:
- CREATE TABLE and ALTER TABLE syntax for declaring the table to be
WITH OIDS has been removed (WITH (oids[ = true]) will error out)
- pg_dump does not support dumping tables declared WITH OIDS and will
issue a warning when dumping one (and ignore the oid column).
- restoring an pg_dump archive with pg_restore will warn when
restoring a table with oid contents (and ignore the oid column)
- COPY will refuse to load binary dump that includes oids.
- pg_upgrade will error out when encountering tables declared WITH
OIDS, they have to be altered to remove the oid column first.
- Functionality to access the oid of the last inserted row (like
plpgsql's RESULT_OID, spi's SPI_lastoid, ...) has been removed.
The syntax for declaring a table WITHOUT OIDS (or WITH (oids = false)
for CREATE TABLE) is still supported. While that requires a bit of
support code, it seems unnecessary to break applications / dumps that
do not use oids, and are explicit about not using them.
The biggest user of WITH OID columns was postgres' catalog. This
commit changes all 'magic' oid columns to be columns that are normally
declared and stored. To reduce unnecessary query breakage all the
newly added columns are still named 'oid', even if a table's column
naming scheme would indicate 'reloid' or such. This obviously
requires adapting a lot code, mostly replacing oid access via
HeapTupleGetOid() with access to the underlying Form_pg_*->oid column.
The bootstrap process now assigns oids for all oid columns in
genbki.pl that do not have an explicit value (starting at the largest
oid previously used), only oids assigned later by oids will be above
FirstBootstrapObjectId. As the oid column now is a normal column the
special bootstrap syntax for oids has been removed.
Oids are not automatically assigned during insertion anymore, all
backend code explicitly assigns oids with GetNewOidWithIndex(). For
the rare case that insertions into the catalog via SQL are called for
the new pg_nextoid() function can be used (which only works on catalog
tables).
The fact that oid columns on system tables are now normal columns
means that they will be included in the set of columns expanded
by * (i.e. SELECT * FROM pg_class will now include the table's oid,
previously it did not). It'd not technically be hard to hide oid
column by default, but that'd mean confusing behavior would either
have to be carried forward forever, or it'd cause breakage down the
line.
While it's not unlikely that further adjustments are needed, the
scope/invasiveness of the patch makes it worthwhile to get merge this
now. It's painful to maintain externally, too complicated to commit
after the code code freeze, and a dependency of a number of other
patches.
Catversion bump, for obvious reasons.
Author: Andres Freund, with contributions by John Naylor
Discussion: https://postgr.es/m/20180930034810.ywp2c7awz7opzcfr@alap3.anarazel.de
2018-11-21 00:36:57 +01:00
|
|
|
InvokeObjectPostAlterHook(TableSpaceRelationId, tablespaceoid, 0);
|
2013-03-18 03:55:14 +01:00
|
|
|
|
2010-01-05 22:54:00 +01:00
|
|
|
heap_freetuple(newtuple);
|
|
|
|
|
|
|
|
/* Conclude heap scan. */
|
tableam: Add and use scan APIs.
Too allow table accesses to be not directly dependent on heap, several
new abstractions are needed. Specifically:
1) Heap scans need to be generalized into table scans. Do this by
introducing TableScanDesc, which will be the "base class" for
individual AMs. This contains the AM independent fields from
HeapScanDesc.
The previous heap_{beginscan,rescan,endscan} et al. have been
replaced with a table_ version.
There's no direct replacement for heap_getnext(), as that returned
a HeapTuple, which is undesirable for a other AMs. Instead there's
table_scan_getnextslot(). But note that heap_getnext() lives on,
it's still used widely to access catalog tables.
This is achieved by new scan_begin, scan_end, scan_rescan,
scan_getnextslot callbacks.
2) The portion of parallel scans that's shared between backends need
to be able to do so without the user doing per-AM work. To achieve
that new parallelscan_{estimate, initialize, reinitialize}
callbacks are introduced, which operate on a new
ParallelTableScanDesc, which again can be subclassed by AMs.
As it is likely that several AMs are going to be block oriented,
block oriented callbacks that can be shared between such AMs are
provided and used by heap. table_block_parallelscan_{estimate,
intiialize, reinitialize} as callbacks, and
table_block_parallelscan_{nextpage, init} for use in AMs. These
operate on a ParallelBlockTableScanDesc.
3) Index scans need to be able to access tables to return a tuple, and
there needs to be state across individual accesses to the heap to
store state like buffers. That's now handled by introducing a
sort-of-scan IndexFetchTable, which again is intended to be
subclassed by individual AMs (for heap IndexFetchHeap).
The relevant callbacks for an AM are index_fetch_{end, begin,
reset} to create the necessary state, and index_fetch_tuple to
retrieve an indexed tuple. Note that index_fetch_tuple
implementations need to be smarter than just blindly fetching the
tuples for AMs that have optimizations similar to heap's HOT - the
currently alive tuple in the update chain needs to be fetched if
appropriate.
Similar to table_scan_getnextslot(), it's undesirable to continue
to return HeapTuples. Thus index_fetch_heap (might want to rename
that later) now accepts a slot as an argument. Core code doesn't
have a lot of call sites performing index scans without going
through the systable_* API (in contrast to loads of heap_getnext
calls and working directly with HeapTuples).
Index scans now store the result of a search in
IndexScanDesc->xs_heaptid, rather than xs_ctup->t_self. As the
target is not generally a HeapTuple anymore that seems cleaner.
To be able to sensible adapt code to use the above, two further
callbacks have been introduced:
a) slot_callbacks returns a TupleTableSlotOps* suitable for creating
slots capable of holding a tuple of the AMs
type. table_slot_callbacks() and table_slot_create() are based
upon that, but have additional logic to deal with views, foreign
tables, etc.
While this change could have been done separately, nearly all the
call sites that needed to be adapted for the rest of this commit
also would have been needed to be adapted for
table_slot_callbacks(), making separation not worthwhile.
b) tuple_satisfies_snapshot checks whether the tuple in a slot is
currently visible according to a snapshot. That's required as a few
places now don't have a buffer + HeapTuple around, but a
slot (which in heap's case internally has that information).
Additionally a few infrastructure changes were needed:
I) SysScanDesc, as used by systable_{beginscan, getnext} et al. now
internally uses a slot to keep track of tuples. While
systable_getnext() still returns HeapTuples, and will so for the
foreseeable future, the index API (see 1) above) now only deals with
slots.
The remainder, and largest part, of this commit is then adjusting all
scans in postgres to use the new APIs.
Author: Andres Freund, Haribabu Kommi, Alvaro Herrera
Discussion:
https://postgr.es/m/20180703070645.wchpu5muyto5n647@alap3.anarazel.de
https://postgr.es/m/20160812231527.GA690404@alvherre.pgsql
2019-03-11 20:46:41 +01:00
|
|
|
table_endscan(scandesc);
|
2019-01-21 19:32:19 +01:00
|
|
|
table_close(rel, NoLock);
|
2012-12-29 13:55:37 +01:00
|
|
|
|
|
|
|
return tablespaceoid;
|
2010-01-05 22:54:00 +01:00
|
|
|
}
|
|
|
|
|
2004-11-05 20:17:13 +01:00
|
|
|
/*
|
|
|
|
* Routines for handling the GUC variable 'default_tablespace'.
|
|
|
|
*/
|
|
|
|
|
2011-04-07 06:11:01 +02:00
|
|
|
/* check_hook: validate new default_tablespace */
|
|
|
|
bool
|
|
|
|
check_default_tablespace(char **newval, void **extra, GucSource source)
|
2004-11-05 20:17:13 +01:00
|
|
|
{
|
|
|
|
/*
|
2019-06-11 08:20:48 +02:00
|
|
|
* If we aren't inside a transaction, or connected to a database, we
|
|
|
|
* cannot do the catalog accesses necessary to verify the name. Must
|
|
|
|
* accept the value on faith.
|
2004-11-05 20:17:13 +01:00
|
|
|
*/
|
2019-06-11 08:20:48 +02:00
|
|
|
if (IsTransactionState() && MyDatabaseId != InvalidOid)
|
2004-11-05 20:17:13 +01:00
|
|
|
{
|
2011-04-07 06:11:01 +02:00
|
|
|
if (**newval != '\0' &&
|
|
|
|
!OidIsValid(get_tablespace_oid(*newval, true)))
|
2004-11-05 20:17:13 +01:00
|
|
|
{
|
Accept a non-existent value in "ALTER USER/DATABASE SET ..." command.
When default_text_search_config, default_tablespace, or temp_tablespaces
setting is set per-user or per-database, with an "ALTER USER/DATABASE SET
..." statement, don't throw an error if the text search configuration or
tablespace does not exist. In case of text search configuration, even if
it doesn't exist in the current database, it might exist in another
database, where the setting is intended to have its effect. This behavior
is now the same as search_path's.
Tablespaces are cluster-wide, so the same argument doesn't hold for
tablespaces, but there's a problem with pg_dumpall: it dumps "ALTER USER
SET ..." statements before the "CREATE TABLESPACE" statements. Arguably
that's pg_dumpall's fault - it should dump the statements in such an order
that the tablespace is created first and then the "ALTER USER SET
default_tablespace ..." statements after that - but it seems better to be
consistent with search_path and default_text_search_config anyway. Besides,
you could still create a dump that throws an error, by creating the
tablespace, running "ALTER USER SET default_tablespace", then dropping the
tablespace and running pg_dumpall on that.
Backpatch to all supported versions.
2012-01-30 09:32:46 +01:00
|
|
|
/*
|
2013-09-04 00:56:22 +02:00
|
|
|
* When source == PGC_S_TEST, don't throw a hard error for a
|
|
|
|
* nonexistent tablespace, only a NOTICE. See comments in guc.h.
|
Accept a non-existent value in "ALTER USER/DATABASE SET ..." command.
When default_text_search_config, default_tablespace, or temp_tablespaces
setting is set per-user or per-database, with an "ALTER USER/DATABASE SET
..." statement, don't throw an error if the text search configuration or
tablespace does not exist. In case of text search configuration, even if
it doesn't exist in the current database, it might exist in another
database, where the setting is intended to have its effect. This behavior
is now the same as search_path's.
Tablespaces are cluster-wide, so the same argument doesn't hold for
tablespaces, but there's a problem with pg_dumpall: it dumps "ALTER USER
SET ..." statements before the "CREATE TABLESPACE" statements. Arguably
that's pg_dumpall's fault - it should dump the statements in such an order
that the tablespace is created first and then the "ALTER USER SET
default_tablespace ..." statements after that - but it seems better to be
consistent with search_path and default_text_search_config anyway. Besides,
you could still create a dump that throws an error, by creating the
tablespace, running "ALTER USER SET default_tablespace", then dropping the
tablespace and running pg_dumpall on that.
Backpatch to all supported versions.
2012-01-30 09:32:46 +01:00
|
|
|
*/
|
|
|
|
if (source == PGC_S_TEST)
|
|
|
|
{
|
|
|
|
ereport(NOTICE,
|
|
|
|
(errcode(ERRCODE_UNDEFINED_OBJECT),
|
|
|
|
errmsg("tablespace \"%s\" does not exist",
|
|
|
|
*newval)));
|
|
|
|
}
|
|
|
|
else
|
|
|
|
{
|
|
|
|
GUC_check_errdetail("Tablespace \"%s\" does not exist.",
|
|
|
|
*newval);
|
|
|
|
return false;
|
|
|
|
}
|
2004-11-05 20:17:13 +01:00
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2011-04-07 06:11:01 +02:00
|
|
|
return true;
|
2004-11-05 20:17:13 +01:00
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
|
|
|
* GetDefaultTablespace -- get the OID of the current default tablespace
|
|
|
|
*
|
2010-12-13 18:34:26 +01:00
|
|
|
* Temporary objects have different default tablespaces, hence the
|
Fix tablespace inheritance for partitioned rels
Commit ca4103025dfe left a few loose ends. The most important one
(broken pg_dump output) is already fixed by virtue of commit
3b23552ad8bb, but some things remained:
* When ALTER TABLE rewrites tables, the indexes must remain in the
tablespace they were originally in. This didn't work because
index recreation during ALTER TABLE runs manufactured SQL (yuck),
which runs afoul of default_tablespace in competition with the parent
relation tablespace. To fix, reset default_tablespace to the empty
string temporarily, and add the TABLESPACE clause as appropriate.
* Setting a partitioned rel's tablespace to the database default is
confusing; if it worked, it would direct the partitions to that
tablespace regardless of default_tablespace. But in reality it does
not work, and making it work is a larger project. Therefore, throw
an error when this condition is detected, to alert the unwary.
Add some docs and tests, too.
Author: Álvaro Herrera
Discussion: https://postgr.es/m/CAKJS1f_1c260nOt_vBJ067AZ3JXptXVRohDVMLEBmudX1YEx-A@mail.gmail.com
2019-04-25 16:20:23 +02:00
|
|
|
* relpersistence parameter must be specified. Also, for partitioned tables,
|
|
|
|
* we disallow specifying the database default, so that needs to be specified
|
|
|
|
* too.
|
2007-06-03 19:08:34 +02:00
|
|
|
*
|
|
|
|
* May return InvalidOid to indicate "use the database's default tablespace".
|
|
|
|
*
|
|
|
|
* Note that caller is expected to check appropriate permissions for any
|
|
|
|
* result other than InvalidOid.
|
2004-11-05 20:17:13 +01:00
|
|
|
*
|
|
|
|
* This exists to hide (and possibly optimize the use of) the
|
|
|
|
* default_tablespace GUC variable.
|
|
|
|
*/
|
|
|
|
Oid
|
Fix tablespace inheritance for partitioned rels
Commit ca4103025dfe left a few loose ends. The most important one
(broken pg_dump output) is already fixed by virtue of commit
3b23552ad8bb, but some things remained:
* When ALTER TABLE rewrites tables, the indexes must remain in the
tablespace they were originally in. This didn't work because
index recreation during ALTER TABLE runs manufactured SQL (yuck),
which runs afoul of default_tablespace in competition with the parent
relation tablespace. To fix, reset default_tablespace to the empty
string temporarily, and add the TABLESPACE clause as appropriate.
* Setting a partitioned rel's tablespace to the database default is
confusing; if it worked, it would direct the partitions to that
tablespace regardless of default_tablespace. But in reality it does
not work, and making it work is a larger project. Therefore, throw
an error when this condition is detected, to alert the unwary.
Add some docs and tests, too.
Author: Álvaro Herrera
Discussion: https://postgr.es/m/CAKJS1f_1c260nOt_vBJ067AZ3JXptXVRohDVMLEBmudX1YEx-A@mail.gmail.com
2019-04-25 16:20:23 +02:00
|
|
|
GetDefaultTablespace(char relpersistence, bool partitioned)
|
2004-11-05 20:17:13 +01:00
|
|
|
{
|
|
|
|
Oid result;
|
|
|
|
|
2007-06-07 21:19:57 +02:00
|
|
|
/* The temp-table case is handled elsewhere */
|
2010-12-13 18:34:26 +01:00
|
|
|
if (relpersistence == RELPERSISTENCE_TEMP)
|
2007-06-07 21:19:57 +02:00
|
|
|
{
|
|
|
|
PrepareTempTablespaces();
|
|
|
|
return GetNextTempTableSpace();
|
|
|
|
}
|
2007-06-03 19:08:34 +02:00
|
|
|
|
2004-11-05 20:17:13 +01:00
|
|
|
/* Fast path for default_tablespace == "" */
|
|
|
|
if (default_tablespace == NULL || default_tablespace[0] == '\0')
|
|
|
|
return InvalidOid;
|
2005-10-15 04:49:52 +02:00
|
|
|
|
2004-11-05 20:17:13 +01:00
|
|
|
/*
|
|
|
|
* It is tempting to cache this lookup for more speed, but then we would
|
|
|
|
* fail to detect the case where the tablespace was dropped since the GUC
|
|
|
|
* variable was set. Note also that we don't complain if the value fails
|
|
|
|
* to refer to an existing tablespace; we just silently return InvalidOid,
|
|
|
|
* causing the new object to be created in the database's tablespace.
|
|
|
|
*/
|
2010-08-05 16:45:09 +02:00
|
|
|
result = get_tablespace_oid(default_tablespace, true);
|
2005-10-15 04:49:52 +02:00
|
|
|
|
2004-11-05 20:17:13 +01:00
|
|
|
/*
|
|
|
|
* Allow explicit specification of database's default tablespace in
|
Fix tablespace inheritance for partitioned rels
Commit ca4103025dfe left a few loose ends. The most important one
(broken pg_dump output) is already fixed by virtue of commit
3b23552ad8bb, but some things remained:
* When ALTER TABLE rewrites tables, the indexes must remain in the
tablespace they were originally in. This didn't work because
index recreation during ALTER TABLE runs manufactured SQL (yuck),
which runs afoul of default_tablespace in competition with the parent
relation tablespace. To fix, reset default_tablespace to the empty
string temporarily, and add the TABLESPACE clause as appropriate.
* Setting a partitioned rel's tablespace to the database default is
confusing; if it worked, it would direct the partitions to that
tablespace regardless of default_tablespace. But in reality it does
not work, and making it work is a larger project. Therefore, throw
an error when this condition is detected, to alert the unwary.
Add some docs and tests, too.
Author: Álvaro Herrera
Discussion: https://postgr.es/m/CAKJS1f_1c260nOt_vBJ067AZ3JXptXVRohDVMLEBmudX1YEx-A@mail.gmail.com
2019-04-25 16:20:23 +02:00
|
|
|
* default_tablespace without triggering permissions checks. Don't allow
|
|
|
|
* specifying that when creating a partitioned table, however, since the
|
|
|
|
* result is confusing.
|
2004-11-05 20:17:13 +01:00
|
|
|
*/
|
|
|
|
if (result == MyDatabaseTableSpace)
|
Fix tablespace inheritance for partitioned rels
Commit ca4103025dfe left a few loose ends. The most important one
(broken pg_dump output) is already fixed by virtue of commit
3b23552ad8bb, but some things remained:
* When ALTER TABLE rewrites tables, the indexes must remain in the
tablespace they were originally in. This didn't work because
index recreation during ALTER TABLE runs manufactured SQL (yuck),
which runs afoul of default_tablespace in competition with the parent
relation tablespace. To fix, reset default_tablespace to the empty
string temporarily, and add the TABLESPACE clause as appropriate.
* Setting a partitioned rel's tablespace to the database default is
confusing; if it worked, it would direct the partitions to that
tablespace regardless of default_tablespace. But in reality it does
not work, and making it work is a larger project. Therefore, throw
an error when this condition is detected, to alert the unwary.
Add some docs and tests, too.
Author: Álvaro Herrera
Discussion: https://postgr.es/m/CAKJS1f_1c260nOt_vBJ067AZ3JXptXVRohDVMLEBmudX1YEx-A@mail.gmail.com
2019-04-25 16:20:23 +02:00
|
|
|
{
|
|
|
|
if (partitioned)
|
|
|
|
ereport(ERROR,
|
|
|
|
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
|
|
|
|
errmsg("cannot specify default tablespace for partitioned relations")));
|
2004-11-05 20:17:13 +01:00
|
|
|
result = InvalidOid;
|
Fix tablespace inheritance for partitioned rels
Commit ca4103025dfe left a few loose ends. The most important one
(broken pg_dump output) is already fixed by virtue of commit
3b23552ad8bb, but some things remained:
* When ALTER TABLE rewrites tables, the indexes must remain in the
tablespace they were originally in. This didn't work because
index recreation during ALTER TABLE runs manufactured SQL (yuck),
which runs afoul of default_tablespace in competition with the parent
relation tablespace. To fix, reset default_tablespace to the empty
string temporarily, and add the TABLESPACE clause as appropriate.
* Setting a partitioned rel's tablespace to the database default is
confusing; if it worked, it would direct the partitions to that
tablespace regardless of default_tablespace. But in reality it does
not work, and making it work is a larger project. Therefore, throw
an error when this condition is detected, to alert the unwary.
Add some docs and tests, too.
Author: Álvaro Herrera
Discussion: https://postgr.es/m/CAKJS1f_1c260nOt_vBJ067AZ3JXptXVRohDVMLEBmudX1YEx-A@mail.gmail.com
2019-04-25 16:20:23 +02:00
|
|
|
}
|
2004-11-05 20:17:13 +01:00
|
|
|
return result;
|
|
|
|
}
|
|
|
|
|
|
|
|
|
2007-06-03 19:08:34 +02:00
|
|
|
/*
|
|
|
|
* Routines for handling the GUC variable 'temp_tablespaces'.
|
|
|
|
*/
|
|
|
|
|
2011-04-07 06:11:01 +02:00
|
|
|
typedef struct
|
|
|
|
{
|
2020-07-03 23:01:34 +02:00
|
|
|
/* Array of OIDs to be passed to SetTempTablespaces() */
|
2011-04-07 06:11:01 +02:00
|
|
|
int numSpcs;
|
2015-02-20 23:32:01 +01:00
|
|
|
Oid tblSpcs[FLEXIBLE_ARRAY_MEMBER];
|
2011-04-07 06:11:01 +02:00
|
|
|
} temp_tablespaces_extra;
|
|
|
|
|
|
|
|
/* check_hook: validate new temp_tablespaces */
|
|
|
|
bool
|
|
|
|
check_temp_tablespaces(char **newval, void **extra, GucSource source)
|
2007-06-03 19:08:34 +02:00
|
|
|
{
|
|
|
|
char *rawname;
|
|
|
|
List *namelist;
|
|
|
|
|
|
|
|
/* Need a modifiable copy of string */
|
2011-04-07 06:11:01 +02:00
|
|
|
rawname = pstrdup(*newval);
|
2007-06-03 19:08:34 +02:00
|
|
|
|
|
|
|
/* Parse string into list of identifiers */
|
|
|
|
if (!SplitIdentifierString(rawname, ',', &namelist))
|
|
|
|
{
|
|
|
|
/* syntax error in name list */
|
2011-04-07 06:11:01 +02:00
|
|
|
GUC_check_errdetail("List syntax is invalid.");
|
2007-06-03 19:08:34 +02:00
|
|
|
pfree(rawname);
|
|
|
|
list_free(namelist);
|
2011-04-07 06:11:01 +02:00
|
|
|
return false;
|
2007-06-03 19:08:34 +02:00
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
2019-06-11 08:20:48 +02:00
|
|
|
* If we aren't inside a transaction, or connected to a database, we
|
|
|
|
* cannot do the catalog accesses necessary to verify the name. Must
|
2007-06-07 21:19:57 +02:00
|
|
|
* accept the value on faith. Fortunately, there's then also no need to
|
|
|
|
* pass the data to fd.c.
|
2007-06-03 19:08:34 +02:00
|
|
|
*/
|
2019-06-11 08:20:48 +02:00
|
|
|
if (IsTransactionState() && MyDatabaseId != InvalidOid)
|
2007-06-03 19:08:34 +02:00
|
|
|
{
|
2011-04-07 06:11:01 +02:00
|
|
|
temp_tablespaces_extra *myextra;
|
2007-06-07 21:19:57 +02:00
|
|
|
Oid *tblSpcs;
|
|
|
|
int numSpcs;
|
|
|
|
ListCell *l;
|
|
|
|
|
2011-04-07 06:11:01 +02:00
|
|
|
/* temporary workspace until we are done verifying the list */
|
|
|
|
tblSpcs = (Oid *) palloc(list_length(namelist) * sizeof(Oid));
|
2007-06-07 21:19:57 +02:00
|
|
|
numSpcs = 0;
|
2007-06-03 19:08:34 +02:00
|
|
|
foreach(l, namelist)
|
|
|
|
{
|
|
|
|
char *curname = (char *) lfirst(l);
|
2007-06-07 21:19:57 +02:00
|
|
|
Oid curoid;
|
|
|
|
AclResult aclresult;
|
2007-06-03 19:08:34 +02:00
|
|
|
|
|
|
|
/* Allow an empty string (signifying database default) */
|
|
|
|
if (curname[0] == '\0')
|
2007-06-07 21:19:57 +02:00
|
|
|
{
|
2020-07-03 23:01:34 +02:00
|
|
|
/* InvalidOid signifies database's default tablespace */
|
2007-06-07 21:19:57 +02:00
|
|
|
tblSpcs[numSpcs++] = InvalidOid;
|
2007-06-03 19:08:34 +02:00
|
|
|
continue;
|
2007-06-07 21:19:57 +02:00
|
|
|
}
|
2007-06-03 19:08:34 +02:00
|
|
|
|
2010-08-05 16:45:09 +02:00
|
|
|
/*
|
2013-09-04 00:56:22 +02:00
|
|
|
* In an interactive SET command, we ereport for bad info. When
|
|
|
|
* source == PGC_S_TEST, don't throw a hard error for a
|
|
|
|
* nonexistent tablespace, only a NOTICE. See comments in guc.h.
|
2010-08-05 16:45:09 +02:00
|
|
|
*/
|
Accept a non-existent value in "ALTER USER/DATABASE SET ..." command.
When default_text_search_config, default_tablespace, or temp_tablespaces
setting is set per-user or per-database, with an "ALTER USER/DATABASE SET
..." statement, don't throw an error if the text search configuration or
tablespace does not exist. In case of text search configuration, even if
it doesn't exist in the current database, it might exist in another
database, where the setting is intended to have its effect. This behavior
is now the same as search_path's.
Tablespaces are cluster-wide, so the same argument doesn't hold for
tablespaces, but there's a problem with pg_dumpall: it dumps "ALTER USER
SET ..." statements before the "CREATE TABLESPACE" statements. Arguably
that's pg_dumpall's fault - it should dump the statements in such an order
that the tablespace is created first and then the "ALTER USER SET
default_tablespace ..." statements after that - but it seems better to be
consistent with search_path and default_text_search_config anyway. Besides,
you could still create a dump that throws an error, by creating the
tablespace, running "ALTER USER SET default_tablespace", then dropping the
tablespace and running pg_dumpall on that.
Backpatch to all supported versions.
2012-01-30 09:32:46 +01:00
|
|
|
curoid = get_tablespace_oid(curname, source <= PGC_S_TEST);
|
2007-06-07 21:19:57 +02:00
|
|
|
if (curoid == InvalidOid)
|
Accept a non-existent value in "ALTER USER/DATABASE SET ..." command.
When default_text_search_config, default_tablespace, or temp_tablespaces
setting is set per-user or per-database, with an "ALTER USER/DATABASE SET
..." statement, don't throw an error if the text search configuration or
tablespace does not exist. In case of text search configuration, even if
it doesn't exist in the current database, it might exist in another
database, where the setting is intended to have its effect. This behavior
is now the same as search_path's.
Tablespaces are cluster-wide, so the same argument doesn't hold for
tablespaces, but there's a problem with pg_dumpall: it dumps "ALTER USER
SET ..." statements before the "CREATE TABLESPACE" statements. Arguably
that's pg_dumpall's fault - it should dump the statements in such an order
that the tablespace is created first and then the "ALTER USER SET
default_tablespace ..." statements after that - but it seems better to be
consistent with search_path and default_text_search_config anyway. Besides,
you could still create a dump that throws an error, by creating the
tablespace, running "ALTER USER SET default_tablespace", then dropping the
tablespace and running pg_dumpall on that.
Backpatch to all supported versions.
2012-01-30 09:32:46 +01:00
|
|
|
{
|
|
|
|
if (source == PGC_S_TEST)
|
|
|
|
ereport(NOTICE,
|
|
|
|
(errcode(ERRCODE_UNDEFINED_OBJECT),
|
|
|
|
errmsg("tablespace \"%s\" does not exist",
|
|
|
|
curname)));
|
2007-06-07 21:19:57 +02:00
|
|
|
continue;
|
Accept a non-existent value in "ALTER USER/DATABASE SET ..." command.
When default_text_search_config, default_tablespace, or temp_tablespaces
setting is set per-user or per-database, with an "ALTER USER/DATABASE SET
..." statement, don't throw an error if the text search configuration or
tablespace does not exist. In case of text search configuration, even if
it doesn't exist in the current database, it might exist in another
database, where the setting is intended to have its effect. This behavior
is now the same as search_path's.
Tablespaces are cluster-wide, so the same argument doesn't hold for
tablespaces, but there's a problem with pg_dumpall: it dumps "ALTER USER
SET ..." statements before the "CREATE TABLESPACE" statements. Arguably
that's pg_dumpall's fault - it should dump the statements in such an order
that the tablespace is created first and then the "ALTER USER SET
default_tablespace ..." statements after that - but it seems better to be
consistent with search_path and default_text_search_config anyway. Besides,
you could still create a dump that throws an error, by creating the
tablespace, running "ALTER USER SET default_tablespace", then dropping the
tablespace and running pg_dumpall on that.
Backpatch to all supported versions.
2012-01-30 09:32:46 +01:00
|
|
|
}
|
2007-06-07 21:19:57 +02:00
|
|
|
|
|
|
|
/*
|
|
|
|
* Allow explicit specification of database's default tablespace
|
|
|
|
* in temp_tablespaces without triggering permissions checks.
|
|
|
|
*/
|
|
|
|
if (curoid == MyDatabaseTableSpace)
|
|
|
|
{
|
2020-07-03 23:01:34 +02:00
|
|
|
/* InvalidOid signifies database's default tablespace */
|
2007-06-07 21:19:57 +02:00
|
|
|
tblSpcs[numSpcs++] = InvalidOid;
|
|
|
|
continue;
|
|
|
|
}
|
|
|
|
|
2011-04-07 06:11:01 +02:00
|
|
|
/* Check permissions, similarly complaining only if interactive */
|
2007-06-07 21:19:57 +02:00
|
|
|
aclresult = pg_tablespace_aclcheck(curoid, GetUserId(),
|
|
|
|
ACL_CREATE);
|
|
|
|
if (aclresult != ACLCHECK_OK)
|
|
|
|
{
|
|
|
|
if (source >= PGC_S_INTERACTIVE)
|
2017-12-02 15:26:34 +01:00
|
|
|
aclcheck_error(aclresult, OBJECT_TABLESPACE, curname);
|
2007-06-07 21:19:57 +02:00
|
|
|
continue;
|
|
|
|
}
|
|
|
|
|
|
|
|
tblSpcs[numSpcs++] = curoid;
|
2007-06-03 19:08:34 +02:00
|
|
|
}
|
2007-06-07 21:19:57 +02:00
|
|
|
|
2011-04-07 06:11:01 +02:00
|
|
|
/* Now prepare an "extra" struct for assign_temp_tablespaces */
|
|
|
|
myextra = malloc(offsetof(temp_tablespaces_extra, tblSpcs) +
|
|
|
|
numSpcs * sizeof(Oid));
|
|
|
|
if (!myextra)
|
|
|
|
return false;
|
|
|
|
myextra->numSpcs = numSpcs;
|
|
|
|
memcpy(myextra->tblSpcs, tblSpcs, numSpcs * sizeof(Oid));
|
|
|
|
*extra = (void *) myextra;
|
|
|
|
|
|
|
|
pfree(tblSpcs);
|
2007-06-03 19:08:34 +02:00
|
|
|
}
|
|
|
|
|
|
|
|
pfree(rawname);
|
|
|
|
list_free(namelist);
|
|
|
|
|
2011-04-07 06:11:01 +02:00
|
|
|
return true;
|
|
|
|
}
|
|
|
|
|
|
|
|
/* assign_hook: do extra actions as needed */
|
|
|
|
void
|
|
|
|
assign_temp_tablespaces(const char *newval, void *extra)
|
|
|
|
{
|
|
|
|
temp_tablespaces_extra *myextra = (temp_tablespaces_extra *) extra;
|
|
|
|
|
|
|
|
/*
|
|
|
|
* If check_temp_tablespaces was executed inside a transaction, then pass
|
|
|
|
* the list it made to fd.c. Otherwise, clear fd.c's list; we must be
|
|
|
|
* still outside a transaction, or else restoring during transaction exit,
|
|
|
|
* and in either case we can just let the next PrepareTempTablespaces call
|
|
|
|
* make things sane.
|
|
|
|
*/
|
|
|
|
if (myextra)
|
|
|
|
SetTempTablespaces(myextra->tblSpcs, myextra->numSpcs);
|
|
|
|
else
|
|
|
|
SetTempTablespaces(NULL, 0);
|
2007-06-03 19:08:34 +02:00
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
2007-06-07 21:19:57 +02:00
|
|
|
* PrepareTempTablespaces -- prepare to use temp tablespaces
|
2007-06-03 19:08:34 +02:00
|
|
|
*
|
2007-06-07 21:19:57 +02:00
|
|
|
* If we have not already done so in the current transaction, parse the
|
|
|
|
* temp_tablespaces GUC variable and tell fd.c which tablespace(s) to use
|
|
|
|
* for temp files.
|
2007-06-03 19:08:34 +02:00
|
|
|
*/
|
2007-06-07 21:19:57 +02:00
|
|
|
void
|
|
|
|
PrepareTempTablespaces(void)
|
2007-06-03 19:08:34 +02:00
|
|
|
{
|
|
|
|
char *rawname;
|
|
|
|
List *namelist;
|
2007-06-07 21:19:57 +02:00
|
|
|
Oid *tblSpcs;
|
|
|
|
int numSpcs;
|
|
|
|
ListCell *l;
|
2007-06-03 19:08:34 +02:00
|
|
|
|
2007-06-07 21:19:57 +02:00
|
|
|
/* No work if already done in current transaction */
|
|
|
|
if (TempTablespacesAreSet())
|
|
|
|
return;
|
2007-06-03 19:08:34 +02:00
|
|
|
|
|
|
|
/*
|
2007-06-07 21:19:57 +02:00
|
|
|
* Can't do catalog access unless within a transaction. This is just a
|
|
|
|
* safety check in case this function is called by low-level code that
|
|
|
|
* could conceivably execute outside a transaction. Note that in such a
|
|
|
|
* scenario, fd.c will fall back to using the current database's default
|
|
|
|
* tablespace, which should always be OK.
|
2007-06-03 19:08:34 +02:00
|
|
|
*/
|
2007-06-07 21:19:57 +02:00
|
|
|
if (!IsTransactionState())
|
|
|
|
return;
|
2007-06-03 19:08:34 +02:00
|
|
|
|
|
|
|
/* Need a modifiable copy of string */
|
|
|
|
rawname = pstrdup(temp_tablespaces);
|
|
|
|
|
|
|
|
/* Parse string into list of identifiers */
|
|
|
|
if (!SplitIdentifierString(rawname, ',', &namelist))
|
|
|
|
{
|
|
|
|
/* syntax error in name list */
|
2007-06-07 21:19:57 +02:00
|
|
|
SetTempTablespaces(NULL, 0);
|
2007-06-03 19:08:34 +02:00
|
|
|
pfree(rawname);
|
|
|
|
list_free(namelist);
|
2007-06-07 21:19:57 +02:00
|
|
|
return;
|
2007-06-03 19:08:34 +02:00
|
|
|
}
|
|
|
|
|
2007-06-07 21:19:57 +02:00
|
|
|
/* Store tablespace OIDs in an array in TopTransactionContext */
|
|
|
|
tblSpcs = (Oid *) MemoryContextAlloc(TopTransactionContext,
|
|
|
|
list_length(namelist) * sizeof(Oid));
|
|
|
|
numSpcs = 0;
|
|
|
|
foreach(l, namelist)
|
2007-06-03 19:08:34 +02:00
|
|
|
{
|
2007-06-07 21:19:57 +02:00
|
|
|
char *curname = (char *) lfirst(l);
|
|
|
|
Oid curoid;
|
|
|
|
AclResult aclresult;
|
2007-06-03 19:08:34 +02:00
|
|
|
|
2007-06-07 21:19:57 +02:00
|
|
|
/* Allow an empty string (signifying database default) */
|
|
|
|
if (curname[0] == '\0')
|
|
|
|
{
|
2020-07-03 23:01:34 +02:00
|
|
|
/* InvalidOid signifies database's default tablespace */
|
2007-06-07 21:19:57 +02:00
|
|
|
tblSpcs[numSpcs++] = InvalidOid;
|
|
|
|
continue;
|
|
|
|
}
|
2007-06-03 19:08:34 +02:00
|
|
|
|
2007-06-07 21:19:57 +02:00
|
|
|
/* Else verify that name is a valid tablespace name */
|
2010-08-05 16:45:09 +02:00
|
|
|
curoid = get_tablespace_oid(curname, true);
|
2007-06-07 21:19:57 +02:00
|
|
|
if (curoid == InvalidOid)
|
|
|
|
{
|
2010-08-05 16:45:09 +02:00
|
|
|
/* Skip any bad list elements */
|
2007-06-07 21:19:57 +02:00
|
|
|
continue;
|
|
|
|
}
|
2007-06-03 19:08:34 +02:00
|
|
|
|
2007-06-07 21:19:57 +02:00
|
|
|
/*
|
|
|
|
* Allow explicit specification of database's default tablespace in
|
|
|
|
* temp_tablespaces without triggering permissions checks.
|
|
|
|
*/
|
|
|
|
if (curoid == MyDatabaseTableSpace)
|
|
|
|
{
|
2020-07-03 23:01:34 +02:00
|
|
|
/* InvalidOid signifies database's default tablespace */
|
|
|
|
tblSpcs[numSpcs++] = InvalidOid;
|
2007-06-07 21:19:57 +02:00
|
|
|
continue;
|
|
|
|
}
|
|
|
|
|
|
|
|
/* Check permissions similarly */
|
|
|
|
aclresult = pg_tablespace_aclcheck(curoid, GetUserId(),
|
|
|
|
ACL_CREATE);
|
|
|
|
if (aclresult != ACLCHECK_OK)
|
|
|
|
continue;
|
|
|
|
|
|
|
|
tblSpcs[numSpcs++] = curoid;
|
|
|
|
}
|
|
|
|
|
|
|
|
SetTempTablespaces(tblSpcs, numSpcs);
|
2007-06-03 19:08:34 +02:00
|
|
|
|
|
|
|
pfree(rawname);
|
|
|
|
list_free(namelist);
|
|
|
|
}
|
|
|
|
|
|
|
|
|
2004-11-05 20:17:13 +01:00
|
|
|
/*
|
|
|
|
* get_tablespace_oid - given a tablespace name, look up the OID
|
|
|
|
*
|
2010-08-05 16:45:09 +02:00
|
|
|
* If missing_ok is false, throw an error if tablespace name not found. If
|
|
|
|
* true, just return InvalidOid.
|
2004-11-05 20:17:13 +01:00
|
|
|
*/
|
|
|
|
Oid
|
2010-08-05 16:45:09 +02:00
|
|
|
get_tablespace_oid(const char *tablespacename, bool missing_ok)
|
2004-11-05 20:17:13 +01:00
|
|
|
{
|
|
|
|
Oid result;
|
|
|
|
Relation rel;
|
tableam: Add and use scan APIs.
Too allow table accesses to be not directly dependent on heap, several
new abstractions are needed. Specifically:
1) Heap scans need to be generalized into table scans. Do this by
introducing TableScanDesc, which will be the "base class" for
individual AMs. This contains the AM independent fields from
HeapScanDesc.
The previous heap_{beginscan,rescan,endscan} et al. have been
replaced with a table_ version.
There's no direct replacement for heap_getnext(), as that returned
a HeapTuple, which is undesirable for a other AMs. Instead there's
table_scan_getnextslot(). But note that heap_getnext() lives on,
it's still used widely to access catalog tables.
This is achieved by new scan_begin, scan_end, scan_rescan,
scan_getnextslot callbacks.
2) The portion of parallel scans that's shared between backends need
to be able to do so without the user doing per-AM work. To achieve
that new parallelscan_{estimate, initialize, reinitialize}
callbacks are introduced, which operate on a new
ParallelTableScanDesc, which again can be subclassed by AMs.
As it is likely that several AMs are going to be block oriented,
block oriented callbacks that can be shared between such AMs are
provided and used by heap. table_block_parallelscan_{estimate,
intiialize, reinitialize} as callbacks, and
table_block_parallelscan_{nextpage, init} for use in AMs. These
operate on a ParallelBlockTableScanDesc.
3) Index scans need to be able to access tables to return a tuple, and
there needs to be state across individual accesses to the heap to
store state like buffers. That's now handled by introducing a
sort-of-scan IndexFetchTable, which again is intended to be
subclassed by individual AMs (for heap IndexFetchHeap).
The relevant callbacks for an AM are index_fetch_{end, begin,
reset} to create the necessary state, and index_fetch_tuple to
retrieve an indexed tuple. Note that index_fetch_tuple
implementations need to be smarter than just blindly fetching the
tuples for AMs that have optimizations similar to heap's HOT - the
currently alive tuple in the update chain needs to be fetched if
appropriate.
Similar to table_scan_getnextslot(), it's undesirable to continue
to return HeapTuples. Thus index_fetch_heap (might want to rename
that later) now accepts a slot as an argument. Core code doesn't
have a lot of call sites performing index scans without going
through the systable_* API (in contrast to loads of heap_getnext
calls and working directly with HeapTuples).
Index scans now store the result of a search in
IndexScanDesc->xs_heaptid, rather than xs_ctup->t_self. As the
target is not generally a HeapTuple anymore that seems cleaner.
To be able to sensible adapt code to use the above, two further
callbacks have been introduced:
a) slot_callbacks returns a TupleTableSlotOps* suitable for creating
slots capable of holding a tuple of the AMs
type. table_slot_callbacks() and table_slot_create() are based
upon that, but have additional logic to deal with views, foreign
tables, etc.
While this change could have been done separately, nearly all the
call sites that needed to be adapted for the rest of this commit
also would have been needed to be adapted for
table_slot_callbacks(), making separation not worthwhile.
b) tuple_satisfies_snapshot checks whether the tuple in a slot is
currently visible according to a snapshot. That's required as a few
places now don't have a buffer + HeapTuple around, but a
slot (which in heap's case internally has that information).
Additionally a few infrastructure changes were needed:
I) SysScanDesc, as used by systable_{beginscan, getnext} et al. now
internally uses a slot to keep track of tuples. While
systable_getnext() still returns HeapTuples, and will so for the
foreseeable future, the index API (see 1) above) now only deals with
slots.
The remainder, and largest part, of this commit is then adjusting all
scans in postgres to use the new APIs.
Author: Andres Freund, Haribabu Kommi, Alvaro Herrera
Discussion:
https://postgr.es/m/20180703070645.wchpu5muyto5n647@alap3.anarazel.de
https://postgr.es/m/20160812231527.GA690404@alvherre.pgsql
2019-03-11 20:46:41 +01:00
|
|
|
TableScanDesc scandesc;
|
2004-11-05 20:17:13 +01:00
|
|
|
HeapTuple tuple;
|
|
|
|
ScanKeyData entry[1];
|
|
|
|
|
2007-06-03 19:08:34 +02:00
|
|
|
/*
|
|
|
|
* Search pg_tablespace. We use a heapscan here even though there is an
|
|
|
|
* index on name, on the theory that pg_tablespace will usually have just
|
|
|
|
* a few entries and so an indexed lookup is a waste of effort.
|
|
|
|
*/
|
2019-01-21 19:32:19 +01:00
|
|
|
rel = table_open(TableSpaceRelationId, AccessShareLock);
|
2004-11-05 20:17:13 +01:00
|
|
|
|
|
|
|
ScanKeyInit(&entry[0],
|
|
|
|
Anum_pg_tablespace_spcname,
|
|
|
|
BTEqualStrategyNumber, F_NAMEEQ,
|
|
|
|
CStringGetDatum(tablespacename));
|
tableam: Add and use scan APIs.
Too allow table accesses to be not directly dependent on heap, several
new abstractions are needed. Specifically:
1) Heap scans need to be generalized into table scans. Do this by
introducing TableScanDesc, which will be the "base class" for
individual AMs. This contains the AM independent fields from
HeapScanDesc.
The previous heap_{beginscan,rescan,endscan} et al. have been
replaced with a table_ version.
There's no direct replacement for heap_getnext(), as that returned
a HeapTuple, which is undesirable for a other AMs. Instead there's
table_scan_getnextslot(). But note that heap_getnext() lives on,
it's still used widely to access catalog tables.
This is achieved by new scan_begin, scan_end, scan_rescan,
scan_getnextslot callbacks.
2) The portion of parallel scans that's shared between backends need
to be able to do so without the user doing per-AM work. To achieve
that new parallelscan_{estimate, initialize, reinitialize}
callbacks are introduced, which operate on a new
ParallelTableScanDesc, which again can be subclassed by AMs.
As it is likely that several AMs are going to be block oriented,
block oriented callbacks that can be shared between such AMs are
provided and used by heap. table_block_parallelscan_{estimate,
intiialize, reinitialize} as callbacks, and
table_block_parallelscan_{nextpage, init} for use in AMs. These
operate on a ParallelBlockTableScanDesc.
3) Index scans need to be able to access tables to return a tuple, and
there needs to be state across individual accesses to the heap to
store state like buffers. That's now handled by introducing a
sort-of-scan IndexFetchTable, which again is intended to be
subclassed by individual AMs (for heap IndexFetchHeap).
The relevant callbacks for an AM are index_fetch_{end, begin,
reset} to create the necessary state, and index_fetch_tuple to
retrieve an indexed tuple. Note that index_fetch_tuple
implementations need to be smarter than just blindly fetching the
tuples for AMs that have optimizations similar to heap's HOT - the
currently alive tuple in the update chain needs to be fetched if
appropriate.
Similar to table_scan_getnextslot(), it's undesirable to continue
to return HeapTuples. Thus index_fetch_heap (might want to rename
that later) now accepts a slot as an argument. Core code doesn't
have a lot of call sites performing index scans without going
through the systable_* API (in contrast to loads of heap_getnext
calls and working directly with HeapTuples).
Index scans now store the result of a search in
IndexScanDesc->xs_heaptid, rather than xs_ctup->t_self. As the
target is not generally a HeapTuple anymore that seems cleaner.
To be able to sensible adapt code to use the above, two further
callbacks have been introduced:
a) slot_callbacks returns a TupleTableSlotOps* suitable for creating
slots capable of holding a tuple of the AMs
type. table_slot_callbacks() and table_slot_create() are based
upon that, but have additional logic to deal with views, foreign
tables, etc.
While this change could have been done separately, nearly all the
call sites that needed to be adapted for the rest of this commit
also would have been needed to be adapted for
table_slot_callbacks(), making separation not worthwhile.
b) tuple_satisfies_snapshot checks whether the tuple in a slot is
currently visible according to a snapshot. That's required as a few
places now don't have a buffer + HeapTuple around, but a
slot (which in heap's case internally has that information).
Additionally a few infrastructure changes were needed:
I) SysScanDesc, as used by systable_{beginscan, getnext} et al. now
internally uses a slot to keep track of tuples. While
systable_getnext() still returns HeapTuples, and will so for the
foreseeable future, the index API (see 1) above) now only deals with
slots.
The remainder, and largest part, of this commit is then adjusting all
scans in postgres to use the new APIs.
Author: Andres Freund, Haribabu Kommi, Alvaro Herrera
Discussion:
https://postgr.es/m/20180703070645.wchpu5muyto5n647@alap3.anarazel.de
https://postgr.es/m/20160812231527.GA690404@alvherre.pgsql
2019-03-11 20:46:41 +01:00
|
|
|
scandesc = table_beginscan_catalog(rel, 1, entry);
|
2004-11-05 20:17:13 +01:00
|
|
|
tuple = heap_getnext(scandesc, ForwardScanDirection);
|
|
|
|
|
2007-06-03 19:08:34 +02:00
|
|
|
/* We assume that there can be at most one matching tuple */
|
2004-11-05 20:17:13 +01:00
|
|
|
if (HeapTupleIsValid(tuple))
|
Remove WITH OIDS support, change oid catalog column visibility.
Previously tables declared WITH OIDS, including a significant fraction
of the catalog tables, stored the oid column not as a normal column,
but as part of the tuple header.
This special column was not shown by default, which was somewhat odd,
as it's often (consider e.g. pg_class.oid) one of the more important
parts of a row. Neither pg_dump nor COPY included the contents of the
oid column by default.
The fact that the oid column was not an ordinary column necessitated a
significant amount of special case code to support oid columns. That
already was painful for the existing, but upcoming work aiming to make
table storage pluggable, would have required expanding and duplicating
that "specialness" significantly.
WITH OIDS has been deprecated since 2005 (commit ff02d0a05280e0).
Remove it.
Removing includes:
- CREATE TABLE and ALTER TABLE syntax for declaring the table to be
WITH OIDS has been removed (WITH (oids[ = true]) will error out)
- pg_dump does not support dumping tables declared WITH OIDS and will
issue a warning when dumping one (and ignore the oid column).
- restoring an pg_dump archive with pg_restore will warn when
restoring a table with oid contents (and ignore the oid column)
- COPY will refuse to load binary dump that includes oids.
- pg_upgrade will error out when encountering tables declared WITH
OIDS, they have to be altered to remove the oid column first.
- Functionality to access the oid of the last inserted row (like
plpgsql's RESULT_OID, spi's SPI_lastoid, ...) has been removed.
The syntax for declaring a table WITHOUT OIDS (or WITH (oids = false)
for CREATE TABLE) is still supported. While that requires a bit of
support code, it seems unnecessary to break applications / dumps that
do not use oids, and are explicit about not using them.
The biggest user of WITH OID columns was postgres' catalog. This
commit changes all 'magic' oid columns to be columns that are normally
declared and stored. To reduce unnecessary query breakage all the
newly added columns are still named 'oid', even if a table's column
naming scheme would indicate 'reloid' or such. This obviously
requires adapting a lot code, mostly replacing oid access via
HeapTupleGetOid() with access to the underlying Form_pg_*->oid column.
The bootstrap process now assigns oids for all oid columns in
genbki.pl that do not have an explicit value (starting at the largest
oid previously used), only oids assigned later by oids will be above
FirstBootstrapObjectId. As the oid column now is a normal column the
special bootstrap syntax for oids has been removed.
Oids are not automatically assigned during insertion anymore, all
backend code explicitly assigns oids with GetNewOidWithIndex(). For
the rare case that insertions into the catalog via SQL are called for
the new pg_nextoid() function can be used (which only works on catalog
tables).
The fact that oid columns on system tables are now normal columns
means that they will be included in the set of columns expanded
by * (i.e. SELECT * FROM pg_class will now include the table's oid,
previously it did not). It'd not technically be hard to hide oid
column by default, but that'd mean confusing behavior would either
have to be carried forward forever, or it'd cause breakage down the
line.
While it's not unlikely that further adjustments are needed, the
scope/invasiveness of the patch makes it worthwhile to get merge this
now. It's painful to maintain externally, too complicated to commit
after the code code freeze, and a dependency of a number of other
patches.
Catversion bump, for obvious reasons.
Author: Andres Freund, with contributions by John Naylor
Discussion: https://postgr.es/m/20180930034810.ywp2c7awz7opzcfr@alap3.anarazel.de
2018-11-21 00:36:57 +01:00
|
|
|
result = ((Form_pg_tablespace) GETSTRUCT(tuple))->oid;
|
2004-11-05 20:17:13 +01:00
|
|
|
else
|
|
|
|
result = InvalidOid;
|
|
|
|
|
tableam: Add and use scan APIs.
Too allow table accesses to be not directly dependent on heap, several
new abstractions are needed. Specifically:
1) Heap scans need to be generalized into table scans. Do this by
introducing TableScanDesc, which will be the "base class" for
individual AMs. This contains the AM independent fields from
HeapScanDesc.
The previous heap_{beginscan,rescan,endscan} et al. have been
replaced with a table_ version.
There's no direct replacement for heap_getnext(), as that returned
a HeapTuple, which is undesirable for a other AMs. Instead there's
table_scan_getnextslot(). But note that heap_getnext() lives on,
it's still used widely to access catalog tables.
This is achieved by new scan_begin, scan_end, scan_rescan,
scan_getnextslot callbacks.
2) The portion of parallel scans that's shared between backends need
to be able to do so without the user doing per-AM work. To achieve
that new parallelscan_{estimate, initialize, reinitialize}
callbacks are introduced, which operate on a new
ParallelTableScanDesc, which again can be subclassed by AMs.
As it is likely that several AMs are going to be block oriented,
block oriented callbacks that can be shared between such AMs are
provided and used by heap. table_block_parallelscan_{estimate,
intiialize, reinitialize} as callbacks, and
table_block_parallelscan_{nextpage, init} for use in AMs. These
operate on a ParallelBlockTableScanDesc.
3) Index scans need to be able to access tables to return a tuple, and
there needs to be state across individual accesses to the heap to
store state like buffers. That's now handled by introducing a
sort-of-scan IndexFetchTable, which again is intended to be
subclassed by individual AMs (for heap IndexFetchHeap).
The relevant callbacks for an AM are index_fetch_{end, begin,
reset} to create the necessary state, and index_fetch_tuple to
retrieve an indexed tuple. Note that index_fetch_tuple
implementations need to be smarter than just blindly fetching the
tuples for AMs that have optimizations similar to heap's HOT - the
currently alive tuple in the update chain needs to be fetched if
appropriate.
Similar to table_scan_getnextslot(), it's undesirable to continue
to return HeapTuples. Thus index_fetch_heap (might want to rename
that later) now accepts a slot as an argument. Core code doesn't
have a lot of call sites performing index scans without going
through the systable_* API (in contrast to loads of heap_getnext
calls and working directly with HeapTuples).
Index scans now store the result of a search in
IndexScanDesc->xs_heaptid, rather than xs_ctup->t_self. As the
target is not generally a HeapTuple anymore that seems cleaner.
To be able to sensible adapt code to use the above, two further
callbacks have been introduced:
a) slot_callbacks returns a TupleTableSlotOps* suitable for creating
slots capable of holding a tuple of the AMs
type. table_slot_callbacks() and table_slot_create() are based
upon that, but have additional logic to deal with views, foreign
tables, etc.
While this change could have been done separately, nearly all the
call sites that needed to be adapted for the rest of this commit
also would have been needed to be adapted for
table_slot_callbacks(), making separation not worthwhile.
b) tuple_satisfies_snapshot checks whether the tuple in a slot is
currently visible according to a snapshot. That's required as a few
places now don't have a buffer + HeapTuple around, but a
slot (which in heap's case internally has that information).
Additionally a few infrastructure changes were needed:
I) SysScanDesc, as used by systable_{beginscan, getnext} et al. now
internally uses a slot to keep track of tuples. While
systable_getnext() still returns HeapTuples, and will so for the
foreseeable future, the index API (see 1) above) now only deals with
slots.
The remainder, and largest part, of this commit is then adjusting all
scans in postgres to use the new APIs.
Author: Andres Freund, Haribabu Kommi, Alvaro Herrera
Discussion:
https://postgr.es/m/20180703070645.wchpu5muyto5n647@alap3.anarazel.de
https://postgr.es/m/20160812231527.GA690404@alvherre.pgsql
2019-03-11 20:46:41 +01:00
|
|
|
table_endscan(scandesc);
|
2019-01-21 19:32:19 +01:00
|
|
|
table_close(rel, AccessShareLock);
|
2004-11-05 20:17:13 +01:00
|
|
|
|
2010-08-05 16:45:09 +02:00
|
|
|
if (!OidIsValid(result) && !missing_ok)
|
|
|
|
ereport(ERROR,
|
|
|
|
(errcode(ERRCODE_UNDEFINED_OBJECT),
|
|
|
|
errmsg("tablespace \"%s\" does not exist",
|
|
|
|
tablespacename)));
|
|
|
|
|
2004-11-05 20:17:13 +01:00
|
|
|
return result;
|
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
|
|
|
* get_tablespace_name - given a tablespace OID, look up the name
|
|
|
|
*
|
|
|
|
* Returns a palloc'd string, or NULL if no such tablespace.
|
|
|
|
*/
|
|
|
|
char *
|
|
|
|
get_tablespace_name(Oid spc_oid)
|
|
|
|
{
|
|
|
|
char *result;
|
|
|
|
Relation rel;
|
tableam: Add and use scan APIs.
Too allow table accesses to be not directly dependent on heap, several
new abstractions are needed. Specifically:
1) Heap scans need to be generalized into table scans. Do this by
introducing TableScanDesc, which will be the "base class" for
individual AMs. This contains the AM independent fields from
HeapScanDesc.
The previous heap_{beginscan,rescan,endscan} et al. have been
replaced with a table_ version.
There's no direct replacement for heap_getnext(), as that returned
a HeapTuple, which is undesirable for a other AMs. Instead there's
table_scan_getnextslot(). But note that heap_getnext() lives on,
it's still used widely to access catalog tables.
This is achieved by new scan_begin, scan_end, scan_rescan,
scan_getnextslot callbacks.
2) The portion of parallel scans that's shared between backends need
to be able to do so without the user doing per-AM work. To achieve
that new parallelscan_{estimate, initialize, reinitialize}
callbacks are introduced, which operate on a new
ParallelTableScanDesc, which again can be subclassed by AMs.
As it is likely that several AMs are going to be block oriented,
block oriented callbacks that can be shared between such AMs are
provided and used by heap. table_block_parallelscan_{estimate,
intiialize, reinitialize} as callbacks, and
table_block_parallelscan_{nextpage, init} for use in AMs. These
operate on a ParallelBlockTableScanDesc.
3) Index scans need to be able to access tables to return a tuple, and
there needs to be state across individual accesses to the heap to
store state like buffers. That's now handled by introducing a
sort-of-scan IndexFetchTable, which again is intended to be
subclassed by individual AMs (for heap IndexFetchHeap).
The relevant callbacks for an AM are index_fetch_{end, begin,
reset} to create the necessary state, and index_fetch_tuple to
retrieve an indexed tuple. Note that index_fetch_tuple
implementations need to be smarter than just blindly fetching the
tuples for AMs that have optimizations similar to heap's HOT - the
currently alive tuple in the update chain needs to be fetched if
appropriate.
Similar to table_scan_getnextslot(), it's undesirable to continue
to return HeapTuples. Thus index_fetch_heap (might want to rename
that later) now accepts a slot as an argument. Core code doesn't
have a lot of call sites performing index scans without going
through the systable_* API (in contrast to loads of heap_getnext
calls and working directly with HeapTuples).
Index scans now store the result of a search in
IndexScanDesc->xs_heaptid, rather than xs_ctup->t_self. As the
target is not generally a HeapTuple anymore that seems cleaner.
To be able to sensible adapt code to use the above, two further
callbacks have been introduced:
a) slot_callbacks returns a TupleTableSlotOps* suitable for creating
slots capable of holding a tuple of the AMs
type. table_slot_callbacks() and table_slot_create() are based
upon that, but have additional logic to deal with views, foreign
tables, etc.
While this change could have been done separately, nearly all the
call sites that needed to be adapted for the rest of this commit
also would have been needed to be adapted for
table_slot_callbacks(), making separation not worthwhile.
b) tuple_satisfies_snapshot checks whether the tuple in a slot is
currently visible according to a snapshot. That's required as a few
places now don't have a buffer + HeapTuple around, but a
slot (which in heap's case internally has that information).
Additionally a few infrastructure changes were needed:
I) SysScanDesc, as used by systable_{beginscan, getnext} et al. now
internally uses a slot to keep track of tuples. While
systable_getnext() still returns HeapTuples, and will so for the
foreseeable future, the index API (see 1) above) now only deals with
slots.
The remainder, and largest part, of this commit is then adjusting all
scans in postgres to use the new APIs.
Author: Andres Freund, Haribabu Kommi, Alvaro Herrera
Discussion:
https://postgr.es/m/20180703070645.wchpu5muyto5n647@alap3.anarazel.de
https://postgr.es/m/20160812231527.GA690404@alvherre.pgsql
2019-03-11 20:46:41 +01:00
|
|
|
TableScanDesc scandesc;
|
2004-11-05 20:17:13 +01:00
|
|
|
HeapTuple tuple;
|
|
|
|
ScanKeyData entry[1];
|
|
|
|
|
2007-06-03 19:08:34 +02:00
|
|
|
/*
|
|
|
|
* Search pg_tablespace. We use a heapscan here even though there is an
|
|
|
|
* index on oid, on the theory that pg_tablespace will usually have just a
|
|
|
|
* few entries and so an indexed lookup is a waste of effort.
|
|
|
|
*/
|
2019-01-21 19:32:19 +01:00
|
|
|
rel = table_open(TableSpaceRelationId, AccessShareLock);
|
2004-11-05 20:17:13 +01:00
|
|
|
|
|
|
|
ScanKeyInit(&entry[0],
|
Remove WITH OIDS support, change oid catalog column visibility.
Previously tables declared WITH OIDS, including a significant fraction
of the catalog tables, stored the oid column not as a normal column,
but as part of the tuple header.
This special column was not shown by default, which was somewhat odd,
as it's often (consider e.g. pg_class.oid) one of the more important
parts of a row. Neither pg_dump nor COPY included the contents of the
oid column by default.
The fact that the oid column was not an ordinary column necessitated a
significant amount of special case code to support oid columns. That
already was painful for the existing, but upcoming work aiming to make
table storage pluggable, would have required expanding and duplicating
that "specialness" significantly.
WITH OIDS has been deprecated since 2005 (commit ff02d0a05280e0).
Remove it.
Removing includes:
- CREATE TABLE and ALTER TABLE syntax for declaring the table to be
WITH OIDS has been removed (WITH (oids[ = true]) will error out)
- pg_dump does not support dumping tables declared WITH OIDS and will
issue a warning when dumping one (and ignore the oid column).
- restoring an pg_dump archive with pg_restore will warn when
restoring a table with oid contents (and ignore the oid column)
- COPY will refuse to load binary dump that includes oids.
- pg_upgrade will error out when encountering tables declared WITH
OIDS, they have to be altered to remove the oid column first.
- Functionality to access the oid of the last inserted row (like
plpgsql's RESULT_OID, spi's SPI_lastoid, ...) has been removed.
The syntax for declaring a table WITHOUT OIDS (or WITH (oids = false)
for CREATE TABLE) is still supported. While that requires a bit of
support code, it seems unnecessary to break applications / dumps that
do not use oids, and are explicit about not using them.
The biggest user of WITH OID columns was postgres' catalog. This
commit changes all 'magic' oid columns to be columns that are normally
declared and stored. To reduce unnecessary query breakage all the
newly added columns are still named 'oid', even if a table's column
naming scheme would indicate 'reloid' or such. This obviously
requires adapting a lot code, mostly replacing oid access via
HeapTupleGetOid() with access to the underlying Form_pg_*->oid column.
The bootstrap process now assigns oids for all oid columns in
genbki.pl that do not have an explicit value (starting at the largest
oid previously used), only oids assigned later by oids will be above
FirstBootstrapObjectId. As the oid column now is a normal column the
special bootstrap syntax for oids has been removed.
Oids are not automatically assigned during insertion anymore, all
backend code explicitly assigns oids with GetNewOidWithIndex(). For
the rare case that insertions into the catalog via SQL are called for
the new pg_nextoid() function can be used (which only works on catalog
tables).
The fact that oid columns on system tables are now normal columns
means that they will be included in the set of columns expanded
by * (i.e. SELECT * FROM pg_class will now include the table's oid,
previously it did not). It'd not technically be hard to hide oid
column by default, but that'd mean confusing behavior would either
have to be carried forward forever, or it'd cause breakage down the
line.
While it's not unlikely that further adjustments are needed, the
scope/invasiveness of the patch makes it worthwhile to get merge this
now. It's painful to maintain externally, too complicated to commit
after the code code freeze, and a dependency of a number of other
patches.
Catversion bump, for obvious reasons.
Author: Andres Freund, with contributions by John Naylor
Discussion: https://postgr.es/m/20180930034810.ywp2c7awz7opzcfr@alap3.anarazel.de
2018-11-21 00:36:57 +01:00
|
|
|
Anum_pg_tablespace_oid,
|
2004-11-05 20:17:13 +01:00
|
|
|
BTEqualStrategyNumber, F_OIDEQ,
|
|
|
|
ObjectIdGetDatum(spc_oid));
|
tableam: Add and use scan APIs.
Too allow table accesses to be not directly dependent on heap, several
new abstractions are needed. Specifically:
1) Heap scans need to be generalized into table scans. Do this by
introducing TableScanDesc, which will be the "base class" for
individual AMs. This contains the AM independent fields from
HeapScanDesc.
The previous heap_{beginscan,rescan,endscan} et al. have been
replaced with a table_ version.
There's no direct replacement for heap_getnext(), as that returned
a HeapTuple, which is undesirable for a other AMs. Instead there's
table_scan_getnextslot(). But note that heap_getnext() lives on,
it's still used widely to access catalog tables.
This is achieved by new scan_begin, scan_end, scan_rescan,
scan_getnextslot callbacks.
2) The portion of parallel scans that's shared between backends need
to be able to do so without the user doing per-AM work. To achieve
that new parallelscan_{estimate, initialize, reinitialize}
callbacks are introduced, which operate on a new
ParallelTableScanDesc, which again can be subclassed by AMs.
As it is likely that several AMs are going to be block oriented,
block oriented callbacks that can be shared between such AMs are
provided and used by heap. table_block_parallelscan_{estimate,
intiialize, reinitialize} as callbacks, and
table_block_parallelscan_{nextpage, init} for use in AMs. These
operate on a ParallelBlockTableScanDesc.
3) Index scans need to be able to access tables to return a tuple, and
there needs to be state across individual accesses to the heap to
store state like buffers. That's now handled by introducing a
sort-of-scan IndexFetchTable, which again is intended to be
subclassed by individual AMs (for heap IndexFetchHeap).
The relevant callbacks for an AM are index_fetch_{end, begin,
reset} to create the necessary state, and index_fetch_tuple to
retrieve an indexed tuple. Note that index_fetch_tuple
implementations need to be smarter than just blindly fetching the
tuples for AMs that have optimizations similar to heap's HOT - the
currently alive tuple in the update chain needs to be fetched if
appropriate.
Similar to table_scan_getnextslot(), it's undesirable to continue
to return HeapTuples. Thus index_fetch_heap (might want to rename
that later) now accepts a slot as an argument. Core code doesn't
have a lot of call sites performing index scans without going
through the systable_* API (in contrast to loads of heap_getnext
calls and working directly with HeapTuples).
Index scans now store the result of a search in
IndexScanDesc->xs_heaptid, rather than xs_ctup->t_self. As the
target is not generally a HeapTuple anymore that seems cleaner.
To be able to sensible adapt code to use the above, two further
callbacks have been introduced:
a) slot_callbacks returns a TupleTableSlotOps* suitable for creating
slots capable of holding a tuple of the AMs
type. table_slot_callbacks() and table_slot_create() are based
upon that, but have additional logic to deal with views, foreign
tables, etc.
While this change could have been done separately, nearly all the
call sites that needed to be adapted for the rest of this commit
also would have been needed to be adapted for
table_slot_callbacks(), making separation not worthwhile.
b) tuple_satisfies_snapshot checks whether the tuple in a slot is
currently visible according to a snapshot. That's required as a few
places now don't have a buffer + HeapTuple around, but a
slot (which in heap's case internally has that information).
Additionally a few infrastructure changes were needed:
I) SysScanDesc, as used by systable_{beginscan, getnext} et al. now
internally uses a slot to keep track of tuples. While
systable_getnext() still returns HeapTuples, and will so for the
foreseeable future, the index API (see 1) above) now only deals with
slots.
The remainder, and largest part, of this commit is then adjusting all
scans in postgres to use the new APIs.
Author: Andres Freund, Haribabu Kommi, Alvaro Herrera
Discussion:
https://postgr.es/m/20180703070645.wchpu5muyto5n647@alap3.anarazel.de
https://postgr.es/m/20160812231527.GA690404@alvherre.pgsql
2019-03-11 20:46:41 +01:00
|
|
|
scandesc = table_beginscan_catalog(rel, 1, entry);
|
2004-11-05 20:17:13 +01:00
|
|
|
tuple = heap_getnext(scandesc, ForwardScanDirection);
|
|
|
|
|
|
|
|
/* We assume that there can be at most one matching tuple */
|
|
|
|
if (HeapTupleIsValid(tuple))
|
|
|
|
result = pstrdup(NameStr(((Form_pg_tablespace) GETSTRUCT(tuple))->spcname));
|
|
|
|
else
|
|
|
|
result = NULL;
|
|
|
|
|
tableam: Add and use scan APIs.
Too allow table accesses to be not directly dependent on heap, several
new abstractions are needed. Specifically:
1) Heap scans need to be generalized into table scans. Do this by
introducing TableScanDesc, which will be the "base class" for
individual AMs. This contains the AM independent fields from
HeapScanDesc.
The previous heap_{beginscan,rescan,endscan} et al. have been
replaced with a table_ version.
There's no direct replacement for heap_getnext(), as that returned
a HeapTuple, which is undesirable for a other AMs. Instead there's
table_scan_getnextslot(). But note that heap_getnext() lives on,
it's still used widely to access catalog tables.
This is achieved by new scan_begin, scan_end, scan_rescan,
scan_getnextslot callbacks.
2) The portion of parallel scans that's shared between backends need
to be able to do so without the user doing per-AM work. To achieve
that new parallelscan_{estimate, initialize, reinitialize}
callbacks are introduced, which operate on a new
ParallelTableScanDesc, which again can be subclassed by AMs.
As it is likely that several AMs are going to be block oriented,
block oriented callbacks that can be shared between such AMs are
provided and used by heap. table_block_parallelscan_{estimate,
intiialize, reinitialize} as callbacks, and
table_block_parallelscan_{nextpage, init} for use in AMs. These
operate on a ParallelBlockTableScanDesc.
3) Index scans need to be able to access tables to return a tuple, and
there needs to be state across individual accesses to the heap to
store state like buffers. That's now handled by introducing a
sort-of-scan IndexFetchTable, which again is intended to be
subclassed by individual AMs (for heap IndexFetchHeap).
The relevant callbacks for an AM are index_fetch_{end, begin,
reset} to create the necessary state, and index_fetch_tuple to
retrieve an indexed tuple. Note that index_fetch_tuple
implementations need to be smarter than just blindly fetching the
tuples for AMs that have optimizations similar to heap's HOT - the
currently alive tuple in the update chain needs to be fetched if
appropriate.
Similar to table_scan_getnextslot(), it's undesirable to continue
to return HeapTuples. Thus index_fetch_heap (might want to rename
that later) now accepts a slot as an argument. Core code doesn't
have a lot of call sites performing index scans without going
through the systable_* API (in contrast to loads of heap_getnext
calls and working directly with HeapTuples).
Index scans now store the result of a search in
IndexScanDesc->xs_heaptid, rather than xs_ctup->t_self. As the
target is not generally a HeapTuple anymore that seems cleaner.
To be able to sensible adapt code to use the above, two further
callbacks have been introduced:
a) slot_callbacks returns a TupleTableSlotOps* suitable for creating
slots capable of holding a tuple of the AMs
type. table_slot_callbacks() and table_slot_create() are based
upon that, but have additional logic to deal with views, foreign
tables, etc.
While this change could have been done separately, nearly all the
call sites that needed to be adapted for the rest of this commit
also would have been needed to be adapted for
table_slot_callbacks(), making separation not worthwhile.
b) tuple_satisfies_snapshot checks whether the tuple in a slot is
currently visible according to a snapshot. That's required as a few
places now don't have a buffer + HeapTuple around, but a
slot (which in heap's case internally has that information).
Additionally a few infrastructure changes were needed:
I) SysScanDesc, as used by systable_{beginscan, getnext} et al. now
internally uses a slot to keep track of tuples. While
systable_getnext() still returns HeapTuples, and will so for the
foreseeable future, the index API (see 1) above) now only deals with
slots.
The remainder, and largest part, of this commit is then adjusting all
scans in postgres to use the new APIs.
Author: Andres Freund, Haribabu Kommi, Alvaro Herrera
Discussion:
https://postgr.es/m/20180703070645.wchpu5muyto5n647@alap3.anarazel.de
https://postgr.es/m/20160812231527.GA690404@alvherre.pgsql
2019-03-11 20:46:41 +01:00
|
|
|
table_endscan(scandesc);
|
2019-01-21 19:32:19 +01:00
|
|
|
table_close(rel, AccessShareLock);
|
2004-11-05 20:17:13 +01:00
|
|
|
|
|
|
|
return result;
|
|
|
|
}
|
|
|
|
|
|
|
|
|
2004-08-29 23:08:48 +02:00
|
|
|
/*
|
|
|
|
* TABLESPACE resource manager's routines
|
|
|
|
*/
|
|
|
|
void
|
Revamp the WAL record format.
Each WAL record now carries information about the modified relation and
block(s) in a standardized format. That makes it easier to write tools that
need that information, like pg_rewind, prefetching the blocks to speed up
recovery, etc.
There's a whole new API for building WAL records, replacing the XLogRecData
chains used previously. The new API consists of XLogRegister* functions,
which are called for each buffer and chunk of data that is added to the
record. The new API also gives more control over when a full-page image is
written, by passing flags to the XLogRegisterBuffer function.
This also simplifies the XLogReadBufferForRedo() calls. The function can dig
the relation and block number from the WAL record, so they no longer need to
be passed as arguments.
For the convenience of redo routines, XLogReader now disects each WAL record
after reading it, copying the main data part and the per-block data into
MAXALIGNed buffers. The data chunks are not aligned within the WAL record,
but the redo routines can assume that the pointers returned by XLogRecGet*
functions are. Redo routines are now passed the XLogReaderState, which
contains the record in the already-disected format, instead of the plain
XLogRecord.
The new record format also makes the fixed size XLogRecord header smaller,
by removing the xl_len field. The length of the "main data" portion is now
stored at the end of the WAL record, and there's a separate header after
XLogRecord for it. The alignment padding at the end of XLogRecord is also
removed. This compansates for the fact that the new format would otherwise
be more bulky than the old format.
Reviewed by Andres Freund, Amit Kapila, Michael Paquier, Alvaro Herrera,
Fujii Masao.
2014-11-20 16:56:26 +01:00
|
|
|
tblspc_redo(XLogReaderState *record)
|
2004-08-29 23:08:48 +02:00
|
|
|
{
|
Revamp the WAL record format.
Each WAL record now carries information about the modified relation and
block(s) in a standardized format. That makes it easier to write tools that
need that information, like pg_rewind, prefetching the blocks to speed up
recovery, etc.
There's a whole new API for building WAL records, replacing the XLogRecData
chains used previously. The new API consists of XLogRegister* functions,
which are called for each buffer and chunk of data that is added to the
record. The new API also gives more control over when a full-page image is
written, by passing flags to the XLogRegisterBuffer function.
This also simplifies the XLogReadBufferForRedo() calls. The function can dig
the relation and block number from the WAL record, so they no longer need to
be passed as arguments.
For the convenience of redo routines, XLogReader now disects each WAL record
after reading it, copying the main data part and the per-block data into
MAXALIGNed buffers. The data chunks are not aligned within the WAL record,
but the redo routines can assume that the pointers returned by XLogRecGet*
functions are. Redo routines are now passed the XLogReaderState, which
contains the record in the already-disected format, instead of the plain
XLogRecord.
The new record format also makes the fixed size XLogRecord header smaller,
by removing the xl_len field. The length of the "main data" portion is now
stored at the end of the WAL record, and there's a separate header after
XLogRecord for it. The alignment padding at the end of XLogRecord is also
removed. This compansates for the fact that the new format would otherwise
be more bulky than the old format.
Reviewed by Andres Freund, Amit Kapila, Michael Paquier, Alvaro Herrera,
Fujii Masao.
2014-11-20 16:56:26 +01:00
|
|
|
uint8 info = XLogRecGetInfo(record) & ~XLR_INFO_MASK;
|
2004-08-29 23:08:48 +02:00
|
|
|
|
2009-01-20 19:59:37 +01:00
|
|
|
/* Backup blocks are not used in tblspc records */
|
Revamp the WAL record format.
Each WAL record now carries information about the modified relation and
block(s) in a standardized format. That makes it easier to write tools that
need that information, like pg_rewind, prefetching the blocks to speed up
recovery, etc.
There's a whole new API for building WAL records, replacing the XLogRecData
chains used previously. The new API consists of XLogRegister* functions,
which are called for each buffer and chunk of data that is added to the
record. The new API also gives more control over when a full-page image is
written, by passing flags to the XLogRegisterBuffer function.
This also simplifies the XLogReadBufferForRedo() calls. The function can dig
the relation and block number from the WAL record, so they no longer need to
be passed as arguments.
For the convenience of redo routines, XLogReader now disects each WAL record
after reading it, copying the main data part and the per-block data into
MAXALIGNed buffers. The data chunks are not aligned within the WAL record,
but the redo routines can assume that the pointers returned by XLogRecGet*
functions are. Redo routines are now passed the XLogReaderState, which
contains the record in the already-disected format, instead of the plain
XLogRecord.
The new record format also makes the fixed size XLogRecord header smaller,
by removing the xl_len field. The length of the "main data" portion is now
stored at the end of the WAL record, and there's a separate header after
XLogRecord for it. The alignment padding at the end of XLogRecord is also
removed. This compansates for the fact that the new format would otherwise
be more bulky than the old format.
Reviewed by Andres Freund, Amit Kapila, Michael Paquier, Alvaro Herrera,
Fujii Masao.
2014-11-20 16:56:26 +01:00
|
|
|
Assert(!XLogRecHasAnyBlockRefs(record));
|
2009-01-20 19:59:37 +01:00
|
|
|
|
2004-08-29 23:08:48 +02:00
|
|
|
if (info == XLOG_TBLSPC_CREATE)
|
|
|
|
{
|
|
|
|
xl_tblspc_create_rec *xlrec = (xl_tblspc_create_rec *) XLogRecGetData(record);
|
|
|
|
char *location = xlrec->ts_path;
|
|
|
|
|
2010-01-12 03:42:52 +01:00
|
|
|
create_tablespace_directories(location, xlrec->ts_id);
|
2004-08-29 23:08:48 +02:00
|
|
|
}
|
|
|
|
else if (info == XLOG_TBLSPC_DROP)
|
|
|
|
{
|
|
|
|
xl_tblspc_drop_rec *xlrec = (xl_tblspc_drop_rec *) XLogRecGetData(record);
|
|
|
|
|
2022-05-07 05:19:52 +02:00
|
|
|
/* Close all smgr fds in all backends. */
|
|
|
|
WaitForProcSignalBarrier(EmitProcSignalBarrier(PROCSIGNAL_BARRIER_SMGRRELEASE));
|
|
|
|
|
Allow read only connections during recovery, known as Hot Standby.
Enabled by recovery_connections = on (default) and forcing archive recovery using a recovery.conf. Recovery processing now emulates the original transactions as they are replayed, providing full locking and MVCC behaviour for read only queries. Recovery must enter consistent state before connections are allowed, so there is a delay, typically short, before connections succeed. Replay of recovering transactions can conflict and in some cases deadlock with queries during recovery; these result in query cancellation after max_standby_delay seconds have expired. Infrastructure changes have minor effects on normal running, though introduce four new types of WAL record.
New test mode "make standbycheck" allows regression tests of static command behaviour on a standby server while in recovery. Typical and extreme dynamic behaviours have been checked via code inspection and manual testing. Few port specific behaviours have been utilised, though primary testing has been on Linux only so far.
This commit is the basic patch. Additional changes will follow in this release to enhance some aspects of behaviour, notably improved handling of conflicts, deadlock detection and query cancellation. Changes to VACUUM FULL are also required.
Simon Riggs, with significant and lengthy review by Heikki Linnakangas, including streamlined redesign of snapshot creation and two-phase commit.
Important contributions from Florian Pflug, Mark Kirkwood, Merlin Moncure, Greg Stark, Gianni Ciolli, Gabriele Bartolini, Hannu Krosing, Robert Haas, Tatsuo Ishii, Hiroyuki Yamada plus support and feedback from many other community members.
2009-12-19 02:32:45 +01:00
|
|
|
/*
|
Avoid throwing ERROR during WAL replay of DROP TABLESPACE.
Although we will not even issue an XLOG_TBLSPC_DROP WAL record unless
removal of the tablespace's directories succeeds, that does not guarantee
that the same operation will succeed during WAL replay. Foreseeable
reasons for it to fail include temp files created in the tablespace by Hot
Standby backends, wrong directory permissions on a standby server, etc etc.
The original coding threw ERROR if replay failed to remove the directories,
but that is a serious overreaction. Throwing an error aborts recovery,
and worse means that manual intervention will be needed to get the database
to start again, since otherwise the same error will recur on subsequent
attempts to replay the same WAL record. And the consequence of failing to
remove the directories is only that some probably-small amount of disk
space is wasted, so it hardly seems justified to throw an error.
Accordingly, arrange to report such failures as LOG messages and keep going
when a failure occurs during replay.
Back-patch to 9.0 where Hot Standby was introduced. In principle such
problems can occur in earlier releases, but Hot Standby increases the odds
of trouble significantly. Given the lack of field reports of such issues,
I'm satisfied with patching back as far as the patch applies easily.
2012-02-06 20:43:58 +01:00
|
|
|
* If we issued a WAL record for a drop tablespace it implies that
|
|
|
|
* there were no files in it at all when the DROP was done. That means
|
|
|
|
* that no permanent objects can exist in it at this point.
|
Allow read only connections during recovery, known as Hot Standby.
Enabled by recovery_connections = on (default) and forcing archive recovery using a recovery.conf. Recovery processing now emulates the original transactions as they are replayed, providing full locking and MVCC behaviour for read only queries. Recovery must enter consistent state before connections are allowed, so there is a delay, typically short, before connections succeed. Replay of recovering transactions can conflict and in some cases deadlock with queries during recovery; these result in query cancellation after max_standby_delay seconds have expired. Infrastructure changes have minor effects on normal running, though introduce four new types of WAL record.
New test mode "make standbycheck" allows regression tests of static command behaviour on a standby server while in recovery. Typical and extreme dynamic behaviours have been checked via code inspection and manual testing. Few port specific behaviours have been utilised, though primary testing has been on Linux only so far.
This commit is the basic patch. Additional changes will follow in this release to enhance some aspects of behaviour, notably improved handling of conflicts, deadlock detection and query cancellation. Changes to VACUUM FULL are also required.
Simon Riggs, with significant and lengthy review by Heikki Linnakangas, including streamlined redesign of snapshot creation and two-phase commit.
Important contributions from Florian Pflug, Mark Kirkwood, Merlin Moncure, Greg Stark, Gianni Ciolli, Gabriele Bartolini, Hannu Krosing, Robert Haas, Tatsuo Ishii, Hiroyuki Yamada plus support and feedback from many other community members.
2009-12-19 02:32:45 +01:00
|
|
|
*
|
|
|
|
* It is possible for standby users to be using this tablespace as a
|
|
|
|
* location for their temporary files, so if we fail to remove all
|
|
|
|
* files then do conflict processing and try again, if currently
|
|
|
|
* enabled.
|
Avoid throwing ERROR during WAL replay of DROP TABLESPACE.
Although we will not even issue an XLOG_TBLSPC_DROP WAL record unless
removal of the tablespace's directories succeeds, that does not guarantee
that the same operation will succeed during WAL replay. Foreseeable
reasons for it to fail include temp files created in the tablespace by Hot
Standby backends, wrong directory permissions on a standby server, etc etc.
The original coding threw ERROR if replay failed to remove the directories,
but that is a serious overreaction. Throwing an error aborts recovery,
and worse means that manual intervention will be needed to get the database
to start again, since otherwise the same error will recur on subsequent
attempts to replay the same WAL record. And the consequence of failing to
remove the directories is only that some probably-small amount of disk
space is wasted, so it hardly seems justified to throw an error.
Accordingly, arrange to report such failures as LOG messages and keep going
when a failure occurs during replay.
Back-patch to 9.0 where Hot Standby was introduced. In principle such
problems can occur in earlier releases, but Hot Standby increases the odds
of trouble significantly. Given the lack of field reports of such issues,
I'm satisfied with patching back as far as the patch applies easily.
2012-02-06 20:43:58 +01:00
|
|
|
*
|
|
|
|
* Other possible reasons for failure include bollixed file
|
|
|
|
* permissions on a standby server when they were okay on the primary,
|
|
|
|
* etc etc. There's not much we can do about that, so just remove what
|
|
|
|
* we can and press on.
|
Allow read only connections during recovery, known as Hot Standby.
Enabled by recovery_connections = on (default) and forcing archive recovery using a recovery.conf. Recovery processing now emulates the original transactions as they are replayed, providing full locking and MVCC behaviour for read only queries. Recovery must enter consistent state before connections are allowed, so there is a delay, typically short, before connections succeed. Replay of recovering transactions can conflict and in some cases deadlock with queries during recovery; these result in query cancellation after max_standby_delay seconds have expired. Infrastructure changes have minor effects on normal running, though introduce four new types of WAL record.
New test mode "make standbycheck" allows regression tests of static command behaviour on a standby server while in recovery. Typical and extreme dynamic behaviours have been checked via code inspection and manual testing. Few port specific behaviours have been utilised, though primary testing has been on Linux only so far.
This commit is the basic patch. Additional changes will follow in this release to enhance some aspects of behaviour, notably improved handling of conflicts, deadlock detection and query cancellation. Changes to VACUUM FULL are also required.
Simon Riggs, with significant and lengthy review by Heikki Linnakangas, including streamlined redesign of snapshot creation and two-phase commit.
Important contributions from Florian Pflug, Mark Kirkwood, Merlin Moncure, Greg Stark, Gianni Ciolli, Gabriele Bartolini, Hannu Krosing, Robert Haas, Tatsuo Ishii, Hiroyuki Yamada plus support and feedback from many other community members.
2009-12-19 02:32:45 +01:00
|
|
|
*/
|
2010-01-12 03:42:52 +01:00
|
|
|
if (!destroy_tablespace_directories(xlrec->ts_id, true))
|
Allow read only connections during recovery, known as Hot Standby.
Enabled by recovery_connections = on (default) and forcing archive recovery using a recovery.conf. Recovery processing now emulates the original transactions as they are replayed, providing full locking and MVCC behaviour for read only queries. Recovery must enter consistent state before connections are allowed, so there is a delay, typically short, before connections succeed. Replay of recovering transactions can conflict and in some cases deadlock with queries during recovery; these result in query cancellation after max_standby_delay seconds have expired. Infrastructure changes have minor effects on normal running, though introduce four new types of WAL record.
New test mode "make standbycheck" allows regression tests of static command behaviour on a standby server while in recovery. Typical and extreme dynamic behaviours have been checked via code inspection and manual testing. Few port specific behaviours have been utilised, though primary testing has been on Linux only so far.
This commit is the basic patch. Additional changes will follow in this release to enhance some aspects of behaviour, notably improved handling of conflicts, deadlock detection and query cancellation. Changes to VACUUM FULL are also required.
Simon Riggs, with significant and lengthy review by Heikki Linnakangas, including streamlined redesign of snapshot creation and two-phase commit.
Important contributions from Florian Pflug, Mark Kirkwood, Merlin Moncure, Greg Stark, Gianni Ciolli, Gabriele Bartolini, Hannu Krosing, Robert Haas, Tatsuo Ishii, Hiroyuki Yamada plus support and feedback from many other community members.
2009-12-19 02:32:45 +01:00
|
|
|
{
|
2010-01-14 12:08:02 +01:00
|
|
|
ResolveRecoveryConflictWithTablespace(xlrec->ts_id);
|
Allow read only connections during recovery, known as Hot Standby.
Enabled by recovery_connections = on (default) and forcing archive recovery using a recovery.conf. Recovery processing now emulates the original transactions as they are replayed, providing full locking and MVCC behaviour for read only queries. Recovery must enter consistent state before connections are allowed, so there is a delay, typically short, before connections succeed. Replay of recovering transactions can conflict and in some cases deadlock with queries during recovery; these result in query cancellation after max_standby_delay seconds have expired. Infrastructure changes have minor effects on normal running, though introduce four new types of WAL record.
New test mode "make standbycheck" allows regression tests of static command behaviour on a standby server while in recovery. Typical and extreme dynamic behaviours have been checked via code inspection and manual testing. Few port specific behaviours have been utilised, though primary testing has been on Linux only so far.
This commit is the basic patch. Additional changes will follow in this release to enhance some aspects of behaviour, notably improved handling of conflicts, deadlock detection and query cancellation. Changes to VACUUM FULL are also required.
Simon Riggs, with significant and lengthy review by Heikki Linnakangas, including streamlined redesign of snapshot creation and two-phase commit.
Important contributions from Florian Pflug, Mark Kirkwood, Merlin Moncure, Greg Stark, Gianni Ciolli, Gabriele Bartolini, Hannu Krosing, Robert Haas, Tatsuo Ishii, Hiroyuki Yamada plus support and feedback from many other community members.
2009-12-19 02:32:45 +01:00
|
|
|
|
|
|
|
/*
|
|
|
|
* If we did recovery processing then hopefully the backends who
|
Avoid throwing ERROR during WAL replay of DROP TABLESPACE.
Although we will not even issue an XLOG_TBLSPC_DROP WAL record unless
removal of the tablespace's directories succeeds, that does not guarantee
that the same operation will succeed during WAL replay. Foreseeable
reasons for it to fail include temp files created in the tablespace by Hot
Standby backends, wrong directory permissions on a standby server, etc etc.
The original coding threw ERROR if replay failed to remove the directories,
but that is a serious overreaction. Throwing an error aborts recovery,
and worse means that manual intervention will be needed to get the database
to start again, since otherwise the same error will recur on subsequent
attempts to replay the same WAL record. And the consequence of failing to
remove the directories is only that some probably-small amount of disk
space is wasted, so it hardly seems justified to throw an error.
Accordingly, arrange to report such failures as LOG messages and keep going
when a failure occurs during replay.
Back-patch to 9.0 where Hot Standby was introduced. In principle such
problems can occur in earlier releases, but Hot Standby increases the odds
of trouble significantly. Given the lack of field reports of such issues,
I'm satisfied with patching back as far as the patch applies easily.
2012-02-06 20:43:58 +01:00
|
|
|
* wrote temp files should have cleaned up and exited by now. So
|
|
|
|
* retry before complaining. If we fail again, this is just a LOG
|
|
|
|
* condition, because it's not worth throwing an ERROR for (as
|
|
|
|
* that would crash the database and require manual intervention
|
|
|
|
* before we could get past this WAL record on restart).
|
Allow read only connections during recovery, known as Hot Standby.
Enabled by recovery_connections = on (default) and forcing archive recovery using a recovery.conf. Recovery processing now emulates the original transactions as they are replayed, providing full locking and MVCC behaviour for read only queries. Recovery must enter consistent state before connections are allowed, so there is a delay, typically short, before connections succeed. Replay of recovering transactions can conflict and in some cases deadlock with queries during recovery; these result in query cancellation after max_standby_delay seconds have expired. Infrastructure changes have minor effects on normal running, though introduce four new types of WAL record.
New test mode "make standbycheck" allows regression tests of static command behaviour on a standby server while in recovery. Typical and extreme dynamic behaviours have been checked via code inspection and manual testing. Few port specific behaviours have been utilised, though primary testing has been on Linux only so far.
This commit is the basic patch. Additional changes will follow in this release to enhance some aspects of behaviour, notably improved handling of conflicts, deadlock detection and query cancellation. Changes to VACUUM FULL are also required.
Simon Riggs, with significant and lengthy review by Heikki Linnakangas, including streamlined redesign of snapshot creation and two-phase commit.
Important contributions from Florian Pflug, Mark Kirkwood, Merlin Moncure, Greg Stark, Gianni Ciolli, Gabriele Bartolini, Hannu Krosing, Robert Haas, Tatsuo Ishii, Hiroyuki Yamada plus support and feedback from many other community members.
2009-12-19 02:32:45 +01:00
|
|
|
*/
|
2010-01-12 03:42:52 +01:00
|
|
|
if (!destroy_tablespace_directories(xlrec->ts_id, true))
|
Avoid throwing ERROR during WAL replay of DROP TABLESPACE.
Although we will not even issue an XLOG_TBLSPC_DROP WAL record unless
removal of the tablespace's directories succeeds, that does not guarantee
that the same operation will succeed during WAL replay. Foreseeable
reasons for it to fail include temp files created in the tablespace by Hot
Standby backends, wrong directory permissions on a standby server, etc etc.
The original coding threw ERROR if replay failed to remove the directories,
but that is a serious overreaction. Throwing an error aborts recovery,
and worse means that manual intervention will be needed to get the database
to start again, since otherwise the same error will recur on subsequent
attempts to replay the same WAL record. And the consequence of failing to
remove the directories is only that some probably-small amount of disk
space is wasted, so it hardly seems justified to throw an error.
Accordingly, arrange to report such failures as LOG messages and keep going
when a failure occurs during replay.
Back-patch to 9.0 where Hot Standby was introduced. In principle such
problems can occur in earlier releases, but Hot Standby increases the odds
of trouble significantly. Given the lack of field reports of such issues,
I'm satisfied with patching back as far as the patch applies easily.
2012-02-06 20:43:58 +01:00
|
|
|
ereport(LOG,
|
2004-08-29 23:08:48 +02:00
|
|
|
(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
|
Avoid throwing ERROR during WAL replay of DROP TABLESPACE.
Although we will not even issue an XLOG_TBLSPC_DROP WAL record unless
removal of the tablespace's directories succeeds, that does not guarantee
that the same operation will succeed during WAL replay. Foreseeable
reasons for it to fail include temp files created in the tablespace by Hot
Standby backends, wrong directory permissions on a standby server, etc etc.
The original coding threw ERROR if replay failed to remove the directories,
but that is a serious overreaction. Throwing an error aborts recovery,
and worse means that manual intervention will be needed to get the database
to start again, since otherwise the same error will recur on subsequent
attempts to replay the same WAL record. And the consequence of failing to
remove the directories is only that some probably-small amount of disk
space is wasted, so it hardly seems justified to throw an error.
Accordingly, arrange to report such failures as LOG messages and keep going
when a failure occurs during replay.
Back-patch to 9.0 where Hot Standby was introduced. In principle such
problems can occur in earlier releases, but Hot Standby increases the odds
of trouble significantly. Given the lack of field reports of such issues,
I'm satisfied with patching back as far as the patch applies easily.
2012-02-06 20:43:58 +01:00
|
|
|
errmsg("directories for tablespace %u could not be removed",
|
|
|
|
xlrec->ts_id),
|
|
|
|
errhint("You can remove the directories manually if necessary.")));
|
Allow read only connections during recovery, known as Hot Standby.
Enabled by recovery_connections = on (default) and forcing archive recovery using a recovery.conf. Recovery processing now emulates the original transactions as they are replayed, providing full locking and MVCC behaviour for read only queries. Recovery must enter consistent state before connections are allowed, so there is a delay, typically short, before connections succeed. Replay of recovering transactions can conflict and in some cases deadlock with queries during recovery; these result in query cancellation after max_standby_delay seconds have expired. Infrastructure changes have minor effects on normal running, though introduce four new types of WAL record.
New test mode "make standbycheck" allows regression tests of static command behaviour on a standby server while in recovery. Typical and extreme dynamic behaviours have been checked via code inspection and manual testing. Few port specific behaviours have been utilised, though primary testing has been on Linux only so far.
This commit is the basic patch. Additional changes will follow in this release to enhance some aspects of behaviour, notably improved handling of conflicts, deadlock detection and query cancellation. Changes to VACUUM FULL are also required.
Simon Riggs, with significant and lengthy review by Heikki Linnakangas, including streamlined redesign of snapshot creation and two-phase commit.
Important contributions from Florian Pflug, Mark Kirkwood, Merlin Moncure, Greg Stark, Gianni Ciolli, Gabriele Bartolini, Hannu Krosing, Robert Haas, Tatsuo Ishii, Hiroyuki Yamada plus support and feedback from many other community members.
2009-12-19 02:32:45 +01:00
|
|
|
}
|
2004-08-29 23:08:48 +02:00
|
|
|
}
|
|
|
|
else
|
|
|
|
elog(PANIC, "tblspc_redo: unknown op code %u", info);
|
|
|
|
}
|