postgresql/doc/TODO.detail/tablespaces

From pgsql-hackers-owner+M174@hub.org Sun Mar 12 22:31:11 2000
Received: from renoir.op.net (root@renoir.op.net [207.29.195.4])
	by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id XAA25886
	for <pgman@candle.pha.pa.us>; Sun, 12 Mar 2000 23:31:10 -0500 (EST)
Received: from news.tht.net (news.hub.org [216.126.91.242]) by renoir.op.net (o1/$Revision: 1.2 $) with ESMTP id XAA04589 for <pgman@candle.pha.pa.us>; Sun, 12 Mar 2000 23:19:33 -0500 (EST)
Received: from hub.org (hub.org [216.126.84.1])
	by news.tht.net (8.9.3/8.9.3) with SMTP id XAA42854;
	Sun, 12 Mar 2000 23:05:05 -0500 (EST)
	(envelope-from pgsql-hackers-owner+M174@hub.org)
Received: from candle.pha.pa.us (root@s5-03.ppp.op.net [209.152.195.67])
	by hub.org (8.9.3/8.9.3) with ESMTP id XAA95917
	for <pgsql-hackers@postgreSQL.org>; Sun, 12 Mar 2000 23:00:56 -0500 (EST)
	(envelope-from pgman@candle.pha.pa.us)
Received: (from pgman@localhost)
	by candle.pha.pa.us (8.9.0/8.9.0) id WAA25403
	for pgsql-hackers@postgreSQL.org; Sun, 12 Mar 2000 22:59:56 -0500 (EST)
From: Bruce Momjian <pgman@candle.pha.pa.us>
Message-Id: <200003130359.WAA25403@candle.pha.pa.us>
Subject: [HACKERS] Fix for RENAME
To: PostgreSQL-development <pgsql-hackers@postgresql.org>
Date: Sun, 12 Mar 2000 22:59:56 -0500 (EST)
X-Mailer: ELM [version 2.4ME+ PL72 (25)]
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Precedence: bulk
Sender: pgsql-hackers-owner@hub.org
Status: OR

I have thought about the issue with ALTER TABLE RENAME and keeping the
file system in sync with the database.

It seems there are three commands that can cause these to get out of
sync:

	CREATE TABLE/INDEX
	DROP TABLE/INDEX
	ALTER TABLE RENAME

Now, if we had file names based only on the oid, we could eliminate file
renaming for RENAME, but the others would still be a problem.

Seems there are three ways to get out of sync:

	ABORT transaction
	backend crash
	OS crash

The last two are the same, except the backend crash restarts the
postmaster, while the OS crash has the postmaster starting up normally.

Here is my idea.  Create a C List of file names to unlink on transaction
commit or abort.  For CREATE, unlink created files on transaction ABORT.
For DROP, unlink dropped files on COMMIT.  For RENAME, create a hard
link for the new table linked to old table, and unlink the old file name
on COMMIT or the new file on ABORT.
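
The bookkeeping described above could be sketched roughly as
follows. This is Python for brevity, not backend C, and the names
(PendingFileActions, create, drop, rename, demo) are made up for
illustration; nothing here exists in the PostgreSQL source.

```python
import os
import tempfile


class PendingFileActions:
    """Per-transaction lists of files to unlink at COMMIT or ABORT."""

    def __init__(self):
        self.on_commit = []  # files to remove if the transaction commits
        self.on_abort = []   # files to remove if the transaction aborts

    def create(self, path):
        # CREATE: the new file must disappear again if we abort.
        open(path, "w").close()
        self.on_abort.append(path)

    def drop(self, path):
        # DROP: keep the file until the drop is known to have committed.
        self.on_commit.append(path)

    def rename(self, old, new):
        # RENAME: hard-link the new name to the old file, then remove
        # exactly one of the two names when the transaction ends.
        os.link(old, new)
        self.on_commit.append(old)
        self.on_abort.append(new)

    def _finish(self, doomed):
        for path in doomed:
            if os.path.exists(path):
                os.unlink(path)
        self.on_commit, self.on_abort = [], []

    def commit(self):
        self._finish(self.on_commit)

    def abort(self):
        self._finish(self.on_abort)


def demo():
    """Rename a file twice: once aborted, once committed."""
    with tempfile.TemporaryDirectory() as db_dir:
        old = os.path.join(db_dir, "old_name")
        new = os.path.join(db_dir, "new_name")
        open(old, "w").close()

        tx = PendingFileActions()
        tx.rename(old, new)
        tx.abort()   # rename rolled back: only the old name survives
        after_abort = sorted(os.listdir(db_dir))

        tx = PendingFileActions()
        tx.rename(old, new)
        tx.commit()  # rename made permanent: only the new name survives
        after_commit = sorted(os.listdir(db_dir))
        return after_abort, after_commit
```

Because both names point at the same file until the transaction
ends, neither outcome requires copying data; only a directory
entry is removed.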

That takes care of COMMIT and ABORT.  For backend crash or OS crash, add
a postgres command-line flag for recovery.  Have the postmaster on
startup or shared memory refresh start up a postgres backend on every
database with the recovery flag set.  Have the postgres backend find all
the oids in the pg_class table, and have it go through every file in the
database directory and remove all files that don't match the oids/names
in pg_class.  Also, remove all old sort, noname, and temp files at the
same time.  Seems we should be doing this anyway.
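
A minimal sketch of that recovery sweep, assuming the set of valid
file names has already been read from pg_class. The function name
sweep_orphans and the sample file names are hypothetical, and the
sketch is Python rather than backend code.

```python
import os
import tempfile


def sweep_orphans(db_dir, known_files):
    """Remove every regular file in db_dir not listed in known_files.

    known_files stands in for the file names derived from pg_class.
    Returns the sorted names of the files that were removed.
    """
    removed = []
    for name in sorted(os.listdir(db_dir)):
        path = os.path.join(db_dir, name)
        if os.path.isfile(path) and name not in known_files:
            os.unlink(path)
            removed.append(name)
    return removed


def demo():
    """Build a scratch database directory and sweep it once."""
    with tempfile.TemporaryDirectory() as db_dir:
        # Two live relations, one leftover sort file, one orphaned table.
        for name in ("accounts", "accounts_pkey",
                     "pg_sorttemp123.0", "dropped_tbl"):
            open(os.path.join(db_dir, name), "w").close()
        removed = sweep_orphans(db_dir, {"accounts", "accounts_pkey"})
        return removed, sorted(os.listdir(db_dir))
```

A real sweep would also need to leave alone any non-relation files
that legitimately live in the database directory; this sketch
assumes there are none.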

Care would have to be taken that a corrupted database that caused a
postgres crash on connection would not get the postmaster startup into
an infinite loop.

Comments?

--
  Bruce Momjian                        |  http://www.op.net/~candle
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026

From reedstrm@wallace.ece.rice.edu Tue Mar 14 12:33:31 2000
Received: from wallace.ece.rice.edu (root@wallace.ece.rice.edu [128.42.12.154])
	by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id NAA23826
	for <pgman@candle.pha.pa.us>; Tue, 14 Mar 2000 13:33:29 -0500 (EST)
Received: by wallace.ece.rice.edu
	via sendmail from stdin
	id <m12Uw8K-000LELC@wallace.ece.rice.edu> (Debian Smail3.2.0.102)
	for pgman@candle.pha.pa.us; Tue, 14 Mar 2000 12:33:32 -0600 (CST)
Date: Tue, 14 Mar 2000 12:33:32 -0600
From: "Ross J. Reedstrom" <reedstrm@wallace.ece.rice.edu>
To: Hiroshi Inoue <Inoue@tpf.co.jp>
Cc: Bruce Momjian <pgman@candle.pha.pa.us>,
        PostgreSQL-development <pgsql-hackers@postgresql.org>
Subject: Re: [HACKERS] Fix for RENAME
Message-ID: <20000314123331.A6094@rice.edu>
References: <200003140317.WAA27733@candle.pha.pa.us> <000c01bf8d75$a0016800$2801007e@tpf.co.jp>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
User-Agent: Mutt/1.0i
In-Reply-To: <000c01bf8d75$a0016800$2801007e@tpf.co.jp>; from Inoue@tpf.co.jp on Tue, Mar 14, 2000 at 02:24:52PM +0900
Status: OR

Hiroshi -
I've just about finished working up a patch to store the physical
file name in the pg_class table. There are only two places that
require a Rule for generating the filename, and one of them is
only used for bootstrapping. For the initial cut, I used the rule:

The filename consists of the TABLENAME, an underscore, and the OID.
If this is longer than NAMEDATALEN, shorten the TABLENAME.

I implemented this rule by exporting Tom's makeObjectName function
from analyze.c, which is used to make other system generated names
that have a requirement to be human readable. Replacing this
rule with any other in the future would be straightforward, except
for bootstrap. There are a number of places in bootstrap that need to
know the filename. I've factored them out into yet another set of
#defines (in catname.h) to make that easier.
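
For illustration only, the naming rule stated above can be written
as a few lines of Python. This is not makeObjectName itself, and
the value NAMEDATALEN = 32 (leaving 31 usable characters) is an
assumption; the function name physical_file_name is made up.

```python
NAMEDATALEN = 32  # assumed value; usable name length is NAMEDATALEN - 1


def physical_file_name(relname, oid, namedatalen=NAMEDATALEN):
    """Build <relname>_<oid>, shortening relname so the result fits."""
    suffix = "_" + str(oid)
    room = namedatalen - 1 - len(suffix)  # characters left for relname
    if room < 1:
        raise ValueError("oid suffix leaves no room for the table name")
    return relname[:room] + suffix
```

Only the table name is shortened; the OID is kept whole because it
is what makes the file name unique.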


I'm working through the regression tests right now: this is a relatively
extensive change, since it modifies the low level access routines, and the
buffer cache (which I indexed on physical filename, rather than relname,
as it is now). Hopefully, I caught all the places that assume relname ==
filename == unique name within a single database (see, I want schemas...)

Ross
--
Ross J. Reedstrom, Ph.D., <reedstrm@rice.edu>
NSBRI Research Scientist/Programmer
Computer and Information Technology Institute
Rice University, 6100 S. Main St.,  Houston, TX 77005


On Tue, Mar 14, 2000 at 02:24:52PM +0900, Hiroshi Inoue wrote:
> > -----Original Message-----
> > From: Bruce Momjian [mailto:pgman@candle.pha.pa.us]
> >
> > > > They use the existing table file.  It is only when
> > > > adding/removing/renaming file system files that this
> > out-of-sync problem
> > > > happens.
> > > >
> >
> > Not sure.  I was going to get the CREATE/DROP/RENAME working as it
> > should then as we add more features, we can implement this solution for
> > them too.
> >
>
> Hmm,is general solution difficult ?
> Is more flexible naming rule bad ?
>
> This the 3rd or 4th time that I mention the following.
>
> PostgreSQL doesn't keep the information in itself where tables are
> allocated. So we need a naming rule to find where existent tables
> are allocated.  Don't you wonder the spec ?
>
> Regards.
>
> Hiroshi Inoue
> Inoue@tpf.co.jp
>
>

From mascarm@mascari.com Tue Mar 14 16:34:04 2000
Received: from corvette.mascari.com (dhcp26136016.columbus.rr.com [24.26.136.16])
	by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id RAA04395
	for <pgman@candle.pha.pa.us>; Tue, 14 Mar 2000 17:32:14 -0500 (EST)
Received: from mascari.com (ferrari.mascari.com [192.168.2.1])
	by corvette.mascari.com (8.9.3/8.9.3) with ESMTP id RAA09562;
	Tue, 14 Mar 2000 17:27:22 -0500
Message-ID: <38CEBD0A.52ADB37E@mascari.com>
Date: Tue, 14 Mar 2000 17:28:26 -0500
From: Mike Mascari <mascarm@mascari.com>
X-Mailer: Mozilla 4.7 [en] (Win95; I)
X-Accept-Language: en
MIME-Version: 1.0
To: Bruce Momjian <pgman@candle.pha.pa.us>
CC: Hiroshi Inoue <Inoue@tpf.co.jp>,
        PostgreSQL-development <pgsql-hackers@postgresql.org>
Subject: Re: [HACKERS] Fix for RENAME
References: <200003141545.KAA17518@candle.pha.pa.us>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Status: OR

Bruce Momjian wrote:
>
> > Hmm,is general solution difficult ?
> > Is more flexible naming rule bad ?
> >
> > This the 3rd or 4th time that I mention the following.
>
> That's because I didn't understand.
>
> >
> > PostgreSQL doesn't keep the information in itself where tables are
> > allocated. So we need a naming rule to find where existent tables
> > are allocated.  Don't you wonder the spec ?
>
> How does naming the files in the database help our DROP/CREATE problem?
> It would help RENAME a little bit.  Not sure about the others because
> currently they don't have a problem.

I've been thinking about this somewhat, and I think the first
step necessary in correctly supporting ROLLBACK-able DDL
statements in transactions is the change to <relname>_<oid>.
Imagine the scenario:

CREATE TABLE test (key int4);

a) Session #1:

BEGIN;

b) Session #2:

BEGIN;
DROP TABLE test;
CREATE TABLE test (value varchar(32));

c) Session #1:

DROP TABLE test;
COMMIT;

d) Session #2:

COMMIT;

What's clear to me is that, if DDL statements are to be
ROLLBACK-able, either (1) an AccessExclusive lock is held on the
relation until transaction commit (like Phillip Warner stated was
Dec/Rdb's behavior) or (2) PostgreSQL must be capable of
supporting "multi-versioned schema" as well as tuples. Before
step 'c' is executed, both tables must simultaneously exist in
the database with the same name, which works fine in the catalog
thanks to MVCC, but requires that, on disk, there exists:

test_01231  - Session #1's table, available for ROLLBACK
test_13421  - Session #2's table, available for COMMIT

Now, I believe it was Andreas who suggested that VACUUM be
modified to perform cleanup. I agree with this. VACUUM will need
to check for aborted relation tuples in pg_class and remove the
associated file from the filesystem in the event, for example,
that Session #2 aborted -or- Session #1 aborted leaving the
original pg_class tuple the "active" one and Session #2 attempted
to COMMIT, which violates the UNIQUE constraint on the relname of
pg_class. In addition, for "active" relation entries, VACUUM
should verify the filename is <relname>_<oid> for the given oid.
If it is not, it should rename the filename on the filesystem.
Again, this is purely cosmetic for administrative purposes only,
but would allow for lack of atomicity only with respect to the
label of the relation file, until the next VACUUM is run.
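
A rough sketch of that VACUUM pass, in Python for brevity. The
catalog is modelled as a list of dicts, and the names vacuum_files,
"live" and "filename" are hypothetical stand-ins for whatever
pg_class would actually record.

```python
import os
import tempfile


def vacuum_files(db_dir, catalog):
    """Reconcile files in db_dir with the catalog entries.

    catalog: list of dicts with keys oid, relname, filename, live.
    Files of dead (aborted or dropped) entries are removed; files of
    live entries are renamed to <relname>_<oid> when they differ.
    Returns the list of actions that were taken.
    """
    actions = []
    for rel in catalog:
        path = os.path.join(db_dir, rel["filename"])
        if not rel["live"]:
            if os.path.exists(path):
                os.unlink(path)
                actions.append(("unlink", rel["filename"]))
            continue
        wanted = "%s_%d" % (rel["relname"], rel["oid"])
        if rel["filename"] != wanted:
            os.rename(path, os.path.join(db_dir, wanted))
            actions.append(("rename", rel["filename"], wanted))
            rel["filename"] = wanted
    return actions


def demo():
    """One aborted table, one committed table, one renamed table."""
    catalog = [
        {"oid": 1231, "relname": "test",
         "filename": "test_1231", "live": False},
        {"oid": 13421, "relname": "test",
         "filename": "test_13421", "live": True},
        {"oid": 2000, "relname": "customers",
         "filename": "clients_2000", "live": True},
    ]
    with tempfile.TemporaryDirectory() as db_dir:
        for rel in catalog:
            open(os.path.join(db_dir, rel["filename"]), "w").close()
        actions = vacuum_files(db_dir, catalog)
        return actions, sorted(os.listdir(db_dir))
```

The rename step is the cosmetic part: a file with a stale label is
still found through the filename stored in the catalog entry.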

For the case of ALTER TABLE RENAME, ALTER TABLE DROP COLUMN,
etc., the same functionality would apply. But, as in previous
discussions regarding ALTER TABLE DROP COLUMN, PostgreSQL MUST be
capable of allowing multiple tuples with different attribute
counts and types within the same relation:

CREATE TABLE test (key int4);

a) Session #1:

BEGIN;

b) Session #2:

BEGIN;
ALTER TABLE test ADD COLUMN value int4;
INSERT INTO test values (1, 1);

c) Session #1:

INSERT INTO test values (0);
COMMIT;

d) Session #2:

COMMIT;

This also means that Hiroshi's plan to suppress the visibility of
attributes for ALTER TABLE DROP COLUMN would be required anyway,
to allow for "multi-versioning" of attributes within a single
tuple (i.e., like multi-versioning of tuples within relations),
an attribute is either visible or not, but the tuple should
always grow, until, of course, the next VACUUM.
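
One way to picture tuples with different attribute counts living
in the same relation is the toy model below. It is an illustrative
assumption, not Hiroshi's design: the class Relation and its
methods are invented here, short tuples are read back as NULL
(None) in the missing columns, and dropped columns are only hidden.

```python
class Relation:
    """Toy relation whose attribute list only ever grows."""

    def __init__(self, columns):
        # Each attribute is [name, visible]; nothing is physically removed.
        self.attrs = [[name, True] for name in columns]
        self.tuples = []

    def add_column(self, name):
        self.attrs.append([name, True])

    def drop_column(self, name):
        # Dropping only suppresses visibility; stored tuples are untouched.
        for attr in self.attrs:
            if attr[0] == name and attr[1]:
                attr[1] = False
                return
        raise KeyError(name)

    def insert(self, values):
        # A session with an older view of the schema may supply fewer values.
        if len(values) > len(self.attrs):
            raise ValueError("more values than attributes")
        self.tuples.append(tuple(values))

    def select(self):
        rows = []
        for stored in self.tuples:
            # Pad short tuples so every attribute has a value to show.
            padded = stored + (None,) * (len(self.attrs) - len(stored))
            rows.append({name: value
                         for (name, visible), value
                         in zip(self.attrs, padded) if visible})
        return rows
```

In the scenario above, Session #2's tuple (1, 1) and Session #1's
tuple (0) would both be stored as written; a reconstruction pass
such as VACUUM would be the point where they are brought to a
common shape.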

So, to support rollback-able DDL statements ("multi-versioning
schema", if you will), PostgreSQL needs:

1) relation names of the form <relname>_<oid>
2) support "multi-versioning" of attributes within a single tuple
3) modify VACUUM to:

  A) Remove filesystem files whose pg_class tuples are no longer
     valid
  B) Rename filesystem files to relname of pg_class when the
     <relname>_<oid> doesn't match
  C) Reconstruct relations after attributes have been
     added/dropped.

4) All DDL statements should perform their non-create filesystem
functions in the now infamous "post-transaction-commit" trigger.
If the backend should crash between the time the transaction
committed and the rename() or unlink(), no adverse effects would
be encountered with the database WRT data, VACUUM would clean up
the rename() problem, and, worst-case scenario, an old
<relname>_<oid> file would lie around unused. But at least it
would no longer prohibit the creation of a table by the same
name....

Just my humble opinion,

Mike Mascari

From Inoue@tpf.co.jp Tue Mar 14 20:31:35 2000
Received: from sd.tpf.co.jp (sd.tpf.co.jp [210.161.239.34])
	by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id VAA08792
	for <pgman@candle.pha.pa.us>; Tue, 14 Mar 2000 21:30:35 -0500 (EST)
Received: from cadzone ([126.0.1.40] (may be forged))
          by sd.tpf.co.jp (2.5 Build 2640 (Berkeley 8.8.6)/8.8.4) with SMTP
   id LAA00515; Wed, 15 Mar 2000 11:29:09 +0900
From: "Hiroshi Inoue" <Inoue@tpf.co.jp>
To: "Ross J. Reedstrom" <reedstrm@wallace.ece.rice.edu>,
        "Bruce Momjian" <pgman@candle.pha.pa.us>
Cc: "PostgreSQL-development" <pgsql-hackers@postgresql.org>
Subject: RE: [HACKERS] Fix for RENAME
Date: Wed, 15 Mar 2000 11:35:46 +0900
Message-ID: <000c01bf8e27$2b3c3ce0$2801007e@tpf.co.jp>
MIME-Version: 1.0
Content-Type: text/plain;
	charset="iso-8859-1"
Content-Transfer-Encoding: 7bit
X-Priority: 3 (Normal)
X-MSMail-Priority: Normal
X-Mailer: Microsoft Outlook 8.5, Build 4.71.2173.0
X-MimeOLE: Produced By Microsoft MimeOLE V5.00.2314.1300
In-Reply-To: <20000314123331.A6094@rice.edu>
Importance: Normal
Status: ORr

> -----Original Message-----
> From: Ross J. Reedstrom [mailto:reedstrm@wallace.ece.rice.edu]
>
> Hiroshi -
> I've just about finished working up a patch to store the physical
> file name in the pg_class table. There are only two places that
> require a Rule for generating the filename, and one of them is
> only used for bootstrapping.

Thanks for trying this.
It's nice that only two places require a naming rule.

I'm not tied to any one naming rule.
The only requirement is uniqueness, and the rule
could be changed according to the situation.
For example, we could change the naming rule according to
the kind of relation, such as system/user relations.

I'm now inclined to introduce a new system relation to store
the physical path name. It could also have table(data)space
information in the (near ?) future.
It seems better to separate it from pg_class because table(data?)
space may change the concept of table allocation.

Comments ?

Regards.

Hiroshi Inoue
Inoue@tpf.co.jp


From Inoue@tpf.co.jp Wed Mar 15 02:00:58 2000
Received: from renoir.op.net (root@renoir.op.net [207.29.195.4])
	by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id DAA17887
	for <pgman@candle.pha.pa.us>; Wed, 15 Mar 2000 03:00:57 -0500 (EST)
Received: from sd.tpf.co.jp (sd.tpf.co.jp [210.161.239.34]) by renoir.op.net (o1/$Revision: 1.2 $) with ESMTP id CAA02974 for <pgman@candle.pha.pa.us>; Wed, 15 Mar 2000 02:54:44 -0500 (EST)
Received: from cadzone ([126.0.1.40] (may be forged))
          by sd.tpf.co.jp (2.5 Build 2640 (Berkeley 8.8.6)/8.8.4) with SMTP
   id QAA00734; Wed, 15 Mar 2000 16:53:56 +0900
From: "Hiroshi Inoue" <Inoue@tpf.co.jp>
To: "Bruce Momjian" <pgman@candle.pha.pa.us>
Cc: "Ross J. Reedstrom" <reedstrm@wallace.ece.rice.edu>,
        "PostgreSQL-development" <pgsql-hackers@postgresql.org>
Subject: RE: [HACKERS] Fix for RENAME
Date: Wed, 15 Mar 2000 17:00:35 +0900
Message-ID: <001101bf8e54$8b941cc0$2801007e@tpf.co.jp>
MIME-Version: 1.0
Content-Type: text/plain;
	charset="iso-8859-1"
Content-Transfer-Encoding: 7bit
X-Priority: 3 (Normal)
X-MSMail-Priority: Normal
X-Mailer: Microsoft Outlook 8.5, Build 4.71.2173.0
X-MimeOLE: Produced By Microsoft MimeOLE V5.00.2314.1300
In-Reply-To: <200003150433.XAA13256@candle.pha.pa.us>
Importance: Normal
Status: ORr

> -----Original Message-----
> From: Bruce Momjian [mailto:pgman@candle.pha.pa.us]
>
> > I'm now inclined to introduce a new system relation to store
> > the physical path name. It could also have table(data)space
> > information in the (near ?) future.
> > It seems better to separate it from pg_class because table(data?)
> > space may change the concept of table allocation.
>
> Why not just put it in pg_class?
>

Not sure, it's only my feeling.
Comments please, everyone.

In this thread we have taken a practical approach that doesn't break
the file-per-table assumption, and it wouldn't be so difficult to
implement. In fact Ross has already tried it.

However, there was a discussion about data(table)spaces some
months ago, and a new discussion is under way now.
Judging from the previous discussion, I don't expect it to
reach a practical consensus (there were so many opinions).
We can take a practical step toward the future by encapsulating
the table allocation information. Separating table allocation info
from pg_class seems to be one way to do that.
There may be more essential things to encapsulate.

Comments ?

Regards.

Hiroshi Inoue
Inoue@tpf.co.jp


From pgsql-hackers-owner+M196@hub.org Thu Mar 16 03:02:35 2000
Received: from hub.org (hub.org [216.126.84.1])
	by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id EAA05789
	for <pgman@candle.pha.pa.us>; Thu, 16 Mar 2000 04:02:29 -0500 (EST)
Received: from hub.org (hub.org [216.126.84.1])
	by hub.org (8.9.3/8.9.3) with SMTP id CAA27302;
	Thu, 16 Mar 2000 02:58:55 -0500 (EST)
	(envelope-from pgsql-hackers-owner+M196@hub.org)
Received: from downtown.oche.de (root@downtown.oche.de [194.94.253.3])
	by hub.org (8.9.3/8.9.3) with ESMTP id CAA23907
	for <pgsql-hackers@postgresql.org>; Thu, 16 Mar 2000 02:37:54 -0500 (EST)
	(envelope-from mne@darwin.oche.de)
Received: from darwin.oche.de (uucp@localhost)
	by downtown.oche.de (8.9.3/8.9.3/Debian/GNU) with SMTP id IAA30654
	for <pgsql-hackers@postgresql.org>; Thu, 16 Mar 2000 08:40:04 +0100
Received: from mne by darwin.oche.de with local (Exim 3.12 #1 (Debian))
	id 12VUhX-0003Vz-00
	for <pgsql-hackers@postgreSQL.org>; Thu, 16 Mar 2000 08:28:11 +0100
Date: Thu, 16 Mar 2000 08:28:11 +0100 (CET)
From: Martin Neumann <mne@mne.de>
Subject: [HACKERS] RfD: Design of tablespaces
To: pgsql-hackers@postgresql.org
MIME-Version: 1.0
Content-Type: TEXT/plain; CHARSET=US-ASCII
Message-Id: <E12VUhX-0003Vz-00@darwin.oche.de>
Precedence: bulk
Sender: pgsql-hackers-owner@hub.org
Status: OR


I have written down some thoughts on the concept of tablespaces.
I would be happy to get some comments on them.

-----------------------------------------------------------------
  Implementation of tablespaces within PostgreSQL
- a brainstorming paper designed for general discussion -

by Martin Neumann, 2000/3/15


1. What are tablespaces?
-------------------------

Tablespaces make it possible to distribute storage objects
over multiple points of storage (POS). Therefore one could
say a tablespace can be a POS.

Example:

tablespace_a -----> /mnt/raid/arena0/
tablespace_b -----> /mnt/raid/emc0/

Tablespaces can also store their data on other tablespaces:

tablespace_c -----> tablespace_b

This is quite interesting for administration purposes.
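
A small sketch of how such a chain of tablespaces might be resolved
to a directory. The table layout, the "dir"/"link" tags and the
function resolve_tablespace are assumptions made for this example;
the cycle check is an addition the paper does not mention.

```python
def resolve_tablespace(name, tablespaces):
    """Follow link tablespaces until a directory-backed one is reached."""
    seen = set()
    while True:
        if name in seen:
            raise ValueError("tablespace link cycle at %r" % name)
        seen.add(name)
        kind, target = tablespaces[name]
        if kind == "dir":
            return target
        name = target  # kind == "link": target names another tablespace


# The example mapping from the text above.
TABLESPACES = {
    "tablespace_a": ("dir", "/mnt/raid/arena0/"),
    "tablespace_b": ("dir", "/mnt/raid/emc0/"),
    "tablespace_c": ("link", "tablespace_b"),
}
```

With this mapping, resolving tablespace_c yields the directory of
tablespace_b, so moving tablespace_b moves everything stored
through tablespace_c as well.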


2. What are its advantages?
----------------------------

As you can choose a different tablespace for every storage
object (table, index etc.) it is easy to improve the following
aspects of your system:

 - Reliability

 You can put storage objects (mostly tables) you strongly depend
 on onto a more reliable tablespace (mirrored RAID or perhaps
 simply a directory which gets backed up more often than others).

 - Speed

 You can put storage objects you rarely need onto a rather slow
 tablespace and keep your fast tablespaces free of them.

 A fast, but more expensive RAID-Stripeset can be used more
 efficiently as it doesn't get filled with non-performance
 sensitive data.

 Distributing storage objects with equal speed requirements
 across different tablespaces also makes sense, as you gain
 speed by spreading data over more than one hard disk spindle.

 - Manageability

 You can grant and revoke rights on a per-tablespace basis.

 As every storage object belongs to exactly one tablespace,
 you can easily group storage objects using a tablespace.


3. What about disk I/O?
------------------------

Tablespaces tell the storage manager only where to store
the data, not how. This is the reasonable way.


4. Usage
---------

CREATE TABLESPACE tsname TYPE storage_type storage_options

Examples:

CREATE TABLESPACE tsemc0
  TYPE classic DIRECTORY /mnt/raid/emc0 NOFSYNC

CREATE TABLESPACE tsarena0 TYPE raw DEVICE /dev/araid/0
  MINSIZE 128 MAXSIZE 4096 GROW 4 32 SHRINK 2 32
  BLOCKSIZE 16384

CREATE TABLESPACE quick0 TYPE link TABLESPACE tsarena0;

--

CREATE TABLE tbname ( ... ) TABLESPACE tsname;

Examples:

CREATE TABLE foo (
  id   int4 NOT NULL UNIQUE,
  name text NOT NULL
) TABLESPACE tsemc0;

CREATE TABLE bar (
  id   int4 NOT NULL UNIQUE,
  name text NOT NULL
) TABLESPACE default;

If the tablespace isn't given, the storage object is created
in the "default" tablespace.

"default" is PostgreSQL's default tablespace and the only one
which has to exist on each system.

--

ALTER TABLESPACE tsname tssettings

Examples:

ALTER TABLESPACE tsemc0 DIRECTORY /mnt/raid/emc1


NOTE: altering tablespaces without recreating the contained
storage objects introduces many problems.
Implementing this is difficult and won't be my first goal.

--

DROP TABLESPACE tsname [FORCE]

Examples:

DROP TABLESPACE tsarena0

This will immediately remove the tablespace tsarena0
if it contains no storage objects.

If it still contains some, the tablespace is marked for
deletion.

This means:
1. you can't create new storage objects in the tablespace
2. if the last storage object inside gets dropped, the
   tablespace will be removed.


DROP TABLESPACE tsarena0 FORCE

This will remove the tablespace including all contained
storage objects immediately.
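
The DROP TABLESPACE rules above can be modelled in a few lines.
This is only a sketch of the proposed semantics in Python; the
class TablespaceRegistry and its method names are invented for
illustration.

```python
class TablespaceRegistry:
    """Toy model of tablespaces and the storage objects inside them."""

    def __init__(self):
        self.objects = {}    # tablespace name -> set of storage objects
        self.doomed = set()  # tablespaces marked for deletion

    def create_tablespace(self, ts):
        self.objects[ts] = set()

    def create_object(self, ts, obj):
        # Rule 1: no new storage objects in a tablespace marked for deletion.
        if ts in self.doomed:
            raise RuntimeError("tablespace %s is marked for deletion" % ts)
        self.objects[ts].add(obj)

    def drop_object(self, ts, obj):
        self.objects[ts].remove(obj)
        # Rule 2: dropping the last object removes a marked tablespace.
        if ts in self.doomed and not self.objects[ts]:
            self._remove(ts)

    def drop_tablespace(self, ts, force=False):
        if force or not self.objects[ts]:
            self._remove(ts)     # FORCE also discards contained objects
        else:
            self.doomed.add(ts)  # non-empty: only mark for deletion

    def _remove(self, ts):
        del self.objects[ts]
        self.doomed.discard(ts)
```

The deferred form means a plain DROP TABLESPACE never destroys
data; only the FORCE variant does.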

--

VACUUM tsname

Example:

VACUUM tsemc0

This will vacuum a single tablespace with all contained
storage objects.
-----------------------------------------------------------------

--
Martin Neumann, Welkenrather Str. 118c, 52074 Aachen, Germany
mne@mne.de - http://www.mne.de/mne/ - sms@mne.de [eMail2SMS]
Tel. 0241 / 8876-080 - Mobil: 0173 / 27 69 632
..------.---------------------------------------------------------
|  at  | Inform GmbH - Abteilung Airport Logistics
| work | Pascalstr. 23 - 52076 Aachen - Tel. 02408 / 9456-0
|______| martin.neumann@inform-ac.com - http://www.inform-ac.com