Received: from news.tht.net (news.hub.org [216.126.91.242]) by renoir.op.net (o1/$Revision: 1.3 $) with ESMTP id XAA04589 for <pgman@candle.pha.pa.us>; Sun, 12 Mar 2000 23:19:33 -0500 (EST)
Received: from sd.tpf.co.jp (sd.tpf.co.jp [210.161.239.34]) by renoir.op.net (o1/$Revision: 1.3 $) with ESMTP id CAA02974 for <pgman@candle.pha.pa.us>; Wed, 15 Mar 2000 02:54:44 -0500 (EST)
From JanWieck@t-online.de Wed Jun 14 19:01:01 2000
Received: from renoir.op.net (root@renoir.op.net [207.29.195.4])
by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id TAA21372
for <pgman@candle.pha.pa.us>; Wed, 14 Jun 2000 19:00:59 -0400 (EDT)
Received: from mailout02.sul.t-online.com (mailout02.sul.t-online.com [194.25.134.17]) by renoir.op.net (o1/$Revision: 1.3 $) with ESMTP id SAA01930 for <pgman@candle.pha.pa.us>; Wed, 14 Jun 2000 18:51:11 -0400 (EDT)
Received: from fwd01.sul.t-online.de
by mailout02.sul.t-online.com with smtp
id 132Lz6-0004ec-01; Thu, 15 Jun 2000 00:50:08 +0200
Received: from hot.jw.home (340000654369-0001@[62.224.107.172]) by fwd01.sul.t-online.de
with esmtp id 132Lyy-0tYyi9C; Thu, 15 Jun 2000 00:50:00 +0200
Received: (from wieck@localhost)
by hot.jw.home (8.8.5/8.8.5) id WAA07887;
Wed, 14 Jun 2000 22:43:39 +0200
From: JanWieck@t-online.de (Jan Wieck)
Message-Id: <200006142043.WAA07887@hot.jw.home>
Subject: Re: [HACKERS] Big 7.1 open items
In-Reply-To: <14752.960996980@sss.pgh.pa.us> from Tom Lane at "Jun 14, 2000 11:36:20
am"
To: Tom Lane <tgl@sss.pgh.pa.us>
Date: Wed, 14 Jun 2000 22:43:39 +0200 (MEST)
CC: Oliver Elphick <olly@lfix.co.uk>, Bruce Momjian <pgman@candle.pha.pa.us>,
Comments: In-reply-to Bruce Momjian <pgman@candle.pha.pa.us>
message dated "Wed, 14 Jun 2000 22:44:16 -0400"
Date: Wed, 14 Jun 2000 23:13:52 -0400
Message-ID: <16985.961038832@sss.pgh.pa.us>
From: Tom Lane <tgl@sss.pgh.pa.us>
X-Mailing-List: pgsql-hackers@postgresql.org
Precedence: bulk
Sender: pgsql-hackers-owner@hub.org
Status: ORr
Bruce Momjian <pgman@candle.pha.pa.us> writes:
> That was my point --- that in doing this change, we are taking on more
> TODO items, that may detract from our main TODO items.
True, but they are also TODO items that could be handled by people other
than the inner circle of key developers. The actual rejiggering of
table-to-filename mapping is going to have to be done by one of the
small number of people who are fully up to speed on backend internals.
But we've got a lot more folks who would be able (and, hopefully,
willing) to design and code whatever tools are needed to make the
dbadmin's job easier in the face of the new filesystem layout. I'd
rather not expend a lot of core time to avoid needing those tools,
especially when I feel the old approach is fatally flawed anyway.
> Even gdb shows us the filename/tablename in backtraces. We are never
> going to be able to reproduce that.
Backtraces from *what*, exactly? 99% of the backend is still going
to be dealing with the same data as ever. It might be that poking
around in fd.c will be a little harder, but considering that fd.c
doesn't really know or care what the files it's manipulating are
anyway, I'm not convinced that this is a real issue.
> I guess I don't consider table schema commands inside transactions and
> such to be as big an items as the utility features we will need to
> build.
You've *got* to be kidding. We're constantly seeing complaints about
the fact that rolling back DROP or RENAME TABLE fails --- and worse,
leaves the table in a corrupted/inconsistent state. As far as I can
tell, that's one of the worst robustness problems we've got left to
fix. This is a big deal IMHO, and I want it to be fixed and fixed
right. I don't see how to fix it right if we try to keep physical
filenames tied to logical tablenames.
Moreover, that restriction will continue to hurt us if we try to
preserve it while implementing tablespaces, ANSI schemas, etc.
regards, tom lane
From pgsql-hackers-owner+M3397@hub.org Thu Jun 15 03:03:33 2000
Received: from hub.org (root@hub.org [216.126.84.1])
by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id DAA24286
for <pgman@candle.pha.pa.us>; Thu, 15 Jun 2000 03:03:32 -0400 (EDT)
Received: from hub.org (majordom@localhost [127.0.0.1])
by hub.org (8.10.1/8.10.1) with SMTP id e5F72T815284;
Thu, 15 Jun 2000 03:02:29 -0400 (EDT)
Received: from mailo.vtcif.telstra.com.au (mailo.vtcif.telstra.com.au [202.12.144.17])
by hub.org (8.10.1/8.10.1) with ESMTP id e5F721814963
for <pgsql-hackers@postgresql.org>; Thu, 15 Jun 2000 03:02:01 -0400 (EDT)
Received: (from uucp@localhost) by mailo.vtcif.telstra.com.au (8.8.2/8.6.9) id RAA01186; Thu, 15 Jun 2000 17:01:48 +1000 (EST)
Received: from maili.vtcif.telstra.com.au(202.12.142.17)
via SMTP by mailo.vtcif.telstra.com.au, id smtpd0SbI.z; Thu Jun 15 17:00:39 2000
Received: (from uucp@localhost) by maili.vtcif.telstra.com.au (8.8.2/8.6.9) id RAA21419; Thu, 15 Jun 2000 17:00:37 +1000 (EST)
Received: from localhost(127.0.0.1), claiming to be "mail.cdn.telstra.com.au"
via SMTP by localhost, id smtpdWTHrU_; Thu Jun 15 16:59:34 2000
Received: from lunitari.nimrod.itg.telecom.com.au (lunitari.nimrod.itg.telecom.com.au [192.53.254.48]) by mail.cdn.telstra.com.au (8.8.2/8.6.9) with ESMTP id QAA04796; Thu, 15 Jun 2000 16:59:33 +1000 (EST)
Received: from nimrod.itg.telecom.com.au (majere [192.53.254.45])
by lunitari.nimrod.itg.telecom.com.au (8.9.1/8.9.3) with ESMTP id QAA18056;
> Any strong objections to the mixed relname_oid solution? It gets us
> everything oids does, and still lets Bruce use 'ls -l' to find the big
> tables, putting off writing any admin tools that'll need to be rewritten,
> anyway.
Doesn't relname_oid defeat the purpose of oid file names, which is that
they don't change when the table is renamed? Wasn't it going to be oids
with a tool to create a symlink of relname -> oid ?
From pgsql-hackers-owner+M3400@hub.org Thu Jun 15 03:31:16 2000
Received: from renoir.op.net (root@renoir.op.net [207.29.195.4])
by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id DAA24604
for <pgman@candle.pha.pa.us>; Thu, 15 Jun 2000 03:31:15 -0400 (EDT)
Received: from hub.org (root@hub.org [216.126.84.1]) by renoir.op.net (o1/$Revision: 1.3 $) with ESMTP id DAA01191 for <pgman@candle.pha.pa.us>; Thu, 15 Jun 2000 03:15:28 -0400 (EDT)
Received: from hub.org (majordom@localhost [127.0.0.1])
by hub.org (8.10.1/8.10.1) with SMTP id e5F7CP835301;
Thu, 15 Jun 2000 03:12:25 -0400 (EDT)
Received: from sss2.sss.pgh.pa.us (sss.pgh.pa.us [209.114.166.2])
by hub.org (8.10.1/8.10.1) with ESMTP id e5F7Bt833744
for <pgsql-hackers@postgresql.org>; Thu, 15 Jun 2000 03:11:55 -0400 (EDT)
Received: from sss2.sss.pgh.pa.us (tgl@localhost [127.0.0.1])
by sss2.sss.pgh.pa.us (8.9.3/8.9.3) with ESMTP id DAA18801;
Comments: In-reply-to "Ross J. Reedstrom" <reedstrm@rice.edu>
message dated "Thu, 15 Jun 2000 01:03:12 -0500"
Date: Thu, 15 Jun 2000 03:11:52 -0400
Message-ID: <18798.961053112@sss.pgh.pa.us>
From: Tom Lane <tgl@sss.pgh.pa.us>
X-Mailing-List: pgsql-hackers@postgresql.org
Precedence: bulk
Sender: pgsql-hackers-owner@hub.org
Status: OR
"Ross J. Reedstrom" <reedstrm@rice.edu> writes:
> Any strong objections to the mixed relname_oid solution?
Yes!
You cannot make it work reliably unless the relname part is the original
relname and does not track ALTER TABLE RENAME. IMHO having an obsolete
relname in the filename is worse than not having the relname at all;
it's a recipe for confusion, it means you still need admin tools to tell
which end is really up, and what's worst is you might think you don't.
Furthermore it requires an additional column in pg_class to keep track
of the original relname, which is a waste of space and effort.
It also creates a portability risk, or at least fails to remove one,
since you are critically dependent on the assumption that the OS
supports long filenames --- on a filesystem that truncates names to less
than about 45 characters you're in very deep trouble. An OID-only
approach still works on traditional 14-char-filename Unix filesystems
(it'd mostly even work on DOS 8+3, though I doubt we care about that).
Finally, one of the reasons I want to go to filenames based only on OID
is that that'll make life easier for mdblindwrt. Original relname + OID
doesn't help, in fact it makes life harder (more shmem space needed to
keep track of the filename for each buffer).
Can we *PLEASE JUST LET GO* of this bad idea? No relname in the
filename. Period.
regards, tom lane
From tgl@sss.pgh.pa.us Thu Jun 15 03:31:11 2000
Received: from renoir.op.net (root@renoir.op.net [207.29.195.4])
by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id DAA24592
for <pgman@candle.pha.pa.us>; Thu, 15 Jun 2000 03:31:10 -0400 (EDT)
Received: from sss2.sss.pgh.pa.us (sss.pgh.pa.us [209.114.166.2]) by renoir.op.net (o1/$Revision: 1.3 $) with ESMTP id DAA01213 for <pgman@candle.pha.pa.us>; Thu, 15 Jun 2000 03:15:46 -0400 (EDT)
Received: from sss2.sss.pgh.pa.us (tgl@localhost [127.0.0.1])
by sss2.sss.pgh.pa.us (8.9.3/8.9.3) with ESMTP id DAA18833;
Thu, 15 Jun 2000 03:14:30 -0400 (EDT)
To: Bruce Momjian <pgman@candle.pha.pa.us>
cc: Jan Wieck <JanWieck@Yahoo.com>, Oliver Elphick <olly@lfix.co.uk>,
Comments: In-reply-to Bruce Momjian <pgman@candle.pha.pa.us>
message dated "Wed, 14 Jun 2000 23:21:15 -0400"
Date: Thu, 15 Jun 2000 03:14:30 -0400
Message-ID: <18830.961053270@sss.pgh.pa.us>
From: Tom Lane <tgl@sss.pgh.pa.us>
Status: OR
Bruce Momjian <pgman@candle.pha.pa.us> writes:
> Well, we did have someone do a test implementation of oid file names,
> and their report was that is looked pretty ugly. However, if people are
> convinced it has to be done, we can get started. I guess I was waiting
> for Vadim's storage manager, where the whole idea of separate files is
> going to go away anyway, I suspect. We would then have to re-write all
> our admin tools for the new format.
I seem to recall him saying that he wanted to go to filename == OID
just like I'm suggesting. But I agree we probably ought to hold off
doing anything until he gets back from Russia and can let us know
whether that's still his plan. If he is planning one-huge-file or
something like that, we might as well let these issues go unfixed
for one more release cycle.
regards, tom lane
From pgsql-hackers-owner+M3401@hub.org Thu Jun 15 03:31:15 2000
Received: from renoir.op.net (root@renoir.op.net [207.29.195.4])
by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id DAA24601
for <pgman@candle.pha.pa.us>; Thu, 15 Jun 2000 03:31:14 -0400 (EDT)
Received: from hub.org (root@hub.org [216.126.84.1]) by renoir.op.net (o1/$Revision: 1.3 $) with ESMTP id DAA01428 for <pgman@candle.pha.pa.us>; Thu, 15 Jun 2000 03:19:39 -0400 (EDT)
Received: from hub.org (majordom@localhost [127.0.0.1])
by hub.org (8.10.1/8.10.1) with SMTP id e5F7GP843802;
Thu, 15 Jun 2000 03:16:25 -0400 (EDT)
Received: from sss2.sss.pgh.pa.us (sss.pgh.pa.us [209.114.166.2])
by hub.org (8.10.1/8.10.1) with ESMTP id e5F7Fr842651
for <pgsql-hackers@postgresql.org>; Thu, 15 Jun 2000 03:15:53 -0400 (EDT)
Received: from sss2.sss.pgh.pa.us (tgl@localhost [127.0.0.1])
by sss2.sss.pgh.pa.us (8.9.3/8.9.3) with ESMTP id DAA18833;
Thu, 15 Jun 2000 03:14:30 -0400 (EDT)
To: Bruce Momjian <pgman@candle.pha.pa.us>
cc: Jan Wieck <JanWieck@Yahoo.com>, Oliver Elphick <olly@lfix.co.uk>,
> > You need something that works from the command line, and
> something that
> > works if PostgreSQL is not running. How would you restore
> one file from
> > a tape.
>
> "Restore one file from a tape"? How are you going to do that anyway?
> You can't save and restore portions of a database like that, because
> of transaction commit status problems. To restore table X correctly,
> you'd have to restore pg_log as well, and then your other tables are
> hosed --- unless you also restore all of them from the backup. Only
> a complete database restore from tape would work, and for that you
> don't need to tell which file is which. So the above argument is a
> red herring.
>From what I know it is possible to simply restore one table file
since pg_log keeps all tid's. Of course it cannot guarantee integrity
and does not work if the table was altered.
> I realize it's nice to be able to tell which table file is which by
> eyeball, but the price we are paying for that small convenience is
> just too high. Give that up, and we can have rollbackable DROP and
> RENAME now (I'll personally commit to making it happen for 7.1).
> Continue to insist on it, and I don't think we'll *ever* have those
> features in a really robust form. It's just not possible to do
> multiple file renames atomically.
In the last proposal Bruce and I had it all layed out for tabname + oid
with no overhead in the normal situation, and little overhead if a rename
table crashed or was not rolled back or committed properly
which imho had all advantages combined.
Andreas
From ZeugswetterA@wien.spardat.at Thu Jun 15 04:31:04 2000
Received: from renoir.op.net (root@renoir.op.net [207.29.195.4])
by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id EAA25144
for <pgman@candle.pha.pa.us>; Thu, 15 Jun 2000 04:31:03 -0400 (EDT)
Received: from gandalf.it-austria.net (gandalf.it-austria.net [213.150.1.65]) by renoir.op.net (o1/$Revision: 1.3 $) with ESMTP id EAA03225 for <pgman@candle.pha.pa.us>; Thu, 15 Jun 2000 04:05:41 -0400 (EDT)
Received: from sdexcgtw01.f000.d0188.sd.spardat.at (sdgtw.sd.spardat.at [172.18.1.16])
by gandalf.it-austria.net (xxx/xxx) with ESMTP id KAA100894;
Thu, 15 Jun 2000 10:04:52 +0200
Received: by sdexcgtw01.f000.d0188.sd.spardat.at with Internet Mail Service (5.5.2448.0)
> In reality, very few people are going to be interested in restoring
> a table in a way that breaks referential integrity and other
> normal assumptions about what exists in the database.
This is not true. In my DBA history it would have saved me manweeks
of work if an easy and efficient restore of one single table from backup
would have been available in Informix and Oracle.
We allways had to restore most of the whole system to another machine only
to get back at some table info that would then be manually re-added
to the production system.
A restore of one table to a different/new tablename would have been
very convenient, and this is currently possible in PostgreSQL.
(create new table with same schema, then replace new table data file
with file from backup)
> The reality
> is that most people are going to engage in a little time travel
> to a past, consistent backup rather than do as you suggest.
No, this is what is done most of the time, but it is very inconvenient
to tell people that they loose all work from past days, so it is usually
done as I noted above if possible. We once had a situation where all data
was deleted from a table, but the problem was only noticed 3 weeks later.
> This is going to be more and more true as Postgres gains more and
> more acceptance in (no offense intended) the real world.
>
> >Right now, we use 'ps' with args to display backend
> information, and ls
> >-l to show disk information. We are going to lose that here.
>
> Dependence on "ls -l" is, IMO, a very weak argument.
In normal situations where everything works I agree, it is the
error situations where it really helps if you see what data is where.
debugging, lsof, Bruce already named them.
Andreas
From pgsql-hackers-owner+M3405@hub.org Thu Jun 15 04:31:09 2000
Received: from renoir.op.net (root@renoir.op.net [207.29.195.4])
by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id EAA25151
for <pgman@candle.pha.pa.us>; Thu, 15 Jun 2000 04:31:07 -0400 (EDT)
Received: from hub.org (root@hub.org [216.126.84.1]) by renoir.op.net (o1/$Revision: 1.3 $) with ESMTP id EAA04151 for <pgman@candle.pha.pa.us>; Thu, 15 Jun 2000 04:30:23 -0400 (EDT)
Received: from hub.org (majordom@localhost [127.0.0.1])
by hub.org (8.10.1/8.10.1) with SMTP id e5F8RI883087;
Thu, 15 Jun 2000 04:27:18 -0400 (EDT)
Received: from gandalf.it-austria.net (gandalf.it-austria.net [213.150.1.65])
by hub.org (8.10.1/8.10.1) with ESMTP id e5F8Qx881928
for <pgsql-hackers@postgresql.org>; Thu, 15 Jun 2000 04:27:00 -0400 (EDT)
Received: from sdexcgtw01.f000.d0188.sd.spardat.at (sdgtw.sd.spardat.at [172.18.1.16])
by gandalf.it-austria.net (xxx/xxx) with ESMTP id KAA79848;
Thu, 15 Jun 2000 10:26:13 +0200
Received: by sdexcgtw01.f000.d0188.sd.spardat.at with Internet Mail Service (5.5.2448.0)
> > Any strong objections to the mixed relname_oid solution?
>
> Yes!
>
> You cannot make it work reliably unless the relname part is
> the original
> relname and does not track ALTER TABLE RENAME.
It does, or should at least. Only problem case is where db crashes during
alter or commit/rollback. This could be fixed by first open that fails to
find the file
or vacuum, or some other utility.
> IMHO having
> an obsolete
> relname in the filename is worse than not having the relname at all;
> it's a recipe for confusion, it means you still need admin
> tools to tell
> which end is really up, and what's worst is you might think you don't.
>
> Furthermore it requires an additional column in pg_class to keep track
> of the original relname, which is a waste of space and effort.
it does not.
> Finally, one of the reasons I want to go to filenames based
> only on OID
> is that that'll make life easier for mdblindwrt. Original
> relname + OID
> doesn't help, in fact it makes life harder (more shmem space needed to
> keep track of the filename for each buffer).
I do not see this. filename is constructed from relname+oid.
if not found, do directory scan for *_<OID>.dat, if found --> rename.
Andreas
From pgsql-hackers-owner+M3407@hub.org Thu Jun 15 05:01:03 2000
Received: from renoir.op.net (root@renoir.op.net [207.29.195.4])
by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id FAA25462
for <pgman@candle.pha.pa.us>; Thu, 15 Jun 2000 05:01:02 -0400 (EDT)
Received: from hub.org (root@hub.org [216.126.84.1]) by renoir.op.net (o1/$Revision: 1.3 $) with ESMTP id EAA04667 for <pgman@candle.pha.pa.us>; Thu, 15 Jun 2000 04:45:51 -0400 (EDT)
Received: from hub.org (majordom@localhost [127.0.0.1])
by hub.org (8.10.1/8.10.1) with SMTP id e5F8gr817124;
Thu, 15 Jun 2000 04:42:53 -0400 (EDT)
Received: from gandalf.it-austria.net (gandalf.it-austria.net [213.150.1.65])
by hub.org (8.10.1/8.10.1) with ESMTP id e5F8gX815763
for <pgsql-hackers@postgresql.org>; Thu, 15 Jun 2000 04:42:34 -0400 (EDT)
Received: from sdexcgtw01.f000.d0188.sd.spardat.at (sdgtw.sd.spardat.at [172.18.1.16])
by gandalf.it-austria.net (xxx/xxx) with ESMTP id KAA29072;
Thu, 15 Jun 2000 10:41:51 +0200
Received: by sdexcgtw01.f000.d0188.sd.spardat.at with Internet Mail Service (5.5.2448.0)
This is not necessary, since *_<OID> is unique regardless of relname prefix.
Andreas
From scrappy@hub.org Thu Jun 15 08:30:59 2000
Received: from renoir.op.net (root@renoir.op.net [207.29.195.4])
by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id IAA03846
for <pgman@candle.pha.pa.us>; Thu, 15 Jun 2000 08:30:58 -0400 (EDT)
Received: from thelab.hub.org (nat193.152.mpoweredpc.net [142.177.193.152]) by renoir.op.net (o1/$Revision: 1.3 $) with ESMTP id IAA14167 for <pgman@candle.pha.pa.us>; Thu, 15 Jun 2000 08:16:58 -0400 (EDT)
Received: from localhost (scrappy@localhost)
by thelab.hub.org (8.9.3/8.9.3) with ESMTP id JAA74856;
Thu, 15 Jun 2000 09:14:29 -0300 (ADT)
(envelope-from scrappy@hub.org)
X-Authentication-Warning: thelab.hub.org: scrappy owned process doing -bs
Date: Thu, 15 Jun 2000 09:14:29 -0300 (ADT)
From: The Hermit Hacker <scrappy@hub.org>
To: Bruce Momjian <pgman@candle.pha.pa.us>
cc: Tom Lane <tgl@sss.pgh.pa.us>, Jan Wieck <JanWieck@Yahoo.com>,
> > Backtraces from *what*, exactly? 99% of the backend is still going
> > to be dealing with the same data as ever. It might be that poking
> > around in fd.c will be a little harder, but considering that fd.c
> > doesn't really know or care what the files it's manipulating are
> > anyway, I'm not convinced that this is a real issue.
>
> I was just throwing gdb out as an example. The bigger ones are ls,
> lsof/fstat, and tar.
You've lost me on this one ... if someone does an lsof of the process, it
will still provide them a list of open files ... are you complaining about
the extra step required to translate the file name to a "valid table"?
Oh, one point here ... this whole 'filenaming issue' ... as far as ls is
concerned, at least, only affects the superuser, since he's the only one
that can go 'ls'ng around i nthe directories ...
And, ummm, how hard would it be to have \d in psql display the "physical
table name" as part of its output?
Slight tangent here:
One thing that I think would be great if we could add is some sort of:
SELECT db_name, disk_space;
query wher a database owner, not the superuser, could see how much disk
space their tables are using up ... possible?
From pgsql-hackers-owner+M3412@hub.org Thu Jun 15 08:30:55 2000
Received: from renoir.op.net (root@renoir.op.net [207.29.195.4])
by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id IAA03842
for <pgman@candle.pha.pa.us>; Thu, 15 Jun 2000 08:30:54 -0400 (EDT)
Received: from hub.org (root@hub.org [216.126.84.1]) by renoir.op.net (o1/$Revision: 1.3 $) with ESMTP id IAA15241 for <pgman@candle.pha.pa.us>; Thu, 15 Jun 2000 08:31:29 -0400 (EDT)
Received: from hub.org (majordom@localhost [127.0.0.1])
by hub.org (8.10.1/8.10.1) with SMTP id e5FCSM877572;
Thu, 15 Jun 2000 08:28:22 -0400 (EDT)
Received: from zrtps06s.us.nortel.com ([47.140.48.50])
by hub.org (8.10.1/8.10.1) with ESMTP id e5FCRS877255
for <pgsql-hackers@postgresql.org>; Thu, 15 Jun 2000 08:27:28 -0400 (EDT)
Received: from ertpg15e1.nortelnetworks.com (actually zrtph06n.us.nortel.com)
by zrtps06s.us.nortel.com; Thu, 15 Jun 2000 08:26:34 -0400
Received: from zrtpd004.us.nortel.com (actually zrtpd004)
by ertpg15e1.nortelnetworks.com; Thu, 15 Jun 2000 08:26:11 -0400
Received: from zrtpd003.us.nortel.com ([47.140.224.137])
by zrtpd004.us.nortel.com
with SMTP (Microsoft Exchange Internet Mail Service Version 5.5.2650.21)
id MPQCZWMM; Thu, 15 Jun 2000 08:26:10 -0400
Received: from americasm01.nt.com (hrtpp28d.us.nortel.com [47.190.110.250])
by zrtpd003.us.nortel.com
with SMTP (Microsoft Exchange Internet Mail Service Version 5.5.2650.21)
From dhogaza@pacifier.com Thu Jun 15 09:31:05 2000
Received: from renoir.op.net (root@renoir.op.net [207.29.195.4])
by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id JAA04418
for <pgman@candle.pha.pa.us>; Thu, 15 Jun 2000 09:31:04 -0400 (EDT)
Received: from smtp.pacifier.com (comet.pacifier.com [199.2.117.155]) by renoir.op.net (o1/$Revision: 1.3 $) with ESMTP id JAA20080 for <pgman@candle.pha.pa.us>; Thu, 15 Jun 2000 09:22:36 -0400 (EDT)
Received: from desktop (dsl-dhogaza.pacifier.net [207.202.226.68])
by smtp.pacifier.com (8.9.3/8.9.3pop) with SMTP id GAA05755;
At 10:04 AM 6/15/00 +0200, Zeugswetter Andreas SB wrote:
>
>> In reality, very few people are going to be interested in restoring
>> a table in a way that breaks referential integrity and other
>> normal assumptions about what exists in the database.
>
>This is not true. In my DBA history it would have saved me manweeks
>of work if an easy and efficient restore of one single table from backup
>would have been available in Informix and Oracle.
>We allways had to restore most of the whole system to another machine only
>to get back at some table info that would then be manually re-added
>to the production system.
I'm missing something, I guess. You would do a createdb, do a filesystem
copy of pg_log and one file into it, and then read data from the table
without having to restore the other tables in the database?
I'm just curious - when was the last time you restored a Postgres
database in this piecemeal manner, and how often do you do it?
- Don Baccus, Portland OR <dhogaza@pacifier.com>
Nature photos, on-line guides, Pacific Northwest
Rare Bird Alert Service and other goodies at
http://donb.photo.net.
From pgsql-hackers-owner+M3440@hub.org Thu Jun 15 14:46:22 2000
Received: from renoir.op.net (root@renoir.op.net [207.29.195.4])
by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id OAA04607
for <pgman@candle.pha.pa.us>; Thu, 15 Jun 2000 14:46:21 -0400 (EDT)
Received: from hub.org (root@hub.org [216.126.84.1]) by renoir.op.net (o1/$Revision: 1.3 $) with ESMTP id MAA12695 for <pgman@candle.pha.pa.us>; Thu, 15 Jun 2000 12:48:58 -0400 (EDT)
Received: from hub.org (majordom@localhost [127.0.0.1])
by hub.org (8.10.1/8.10.1) with SMTP id e5FGjXI40370;
Thu, 15 Jun 2000 12:45:33 -0400 (EDT)
Received: from wallace.ece.rice.edu (wallace.ece.rice.edu [128.42.12.154])
by hub.org (8.10.1/8.10.1) with ESMTP id e5FGjJI39359
for <pgsql-hackers@postgresql.org>; Thu, 15 Jun 2000 12:45:20 -0400 (EDT)
Received: by rice.edu
via sendmail from stdin
id <m132clb-000LEEC@wallace.ece.rice.edu> (Debian Smail3.2.0.102)
for pgsql-hackers@postgresql.org; Thu, 15 Jun 2000 11:45:19 -0500 (CDT)
X-MimeOLE: Produced By Microsoft MimeOLE V5.00.2919.6700
Importance: Normal
Status: ORr
> -----Original Message-----
> From: pgsql-hackers-owner@hub.org
> [mailto:pgsql-hackers-owner@hub.org]On Behalf Of Bruce Momjian
>
> > > Can we *PLEASE JUST LET GO* of this bad idea? No relname in the
> > > filename. Period.
> > >
> >
> > Gee, so dogmatic. No one besides Bruce and Hiroshi discussed this _at
> > all_ when I first put up patches two month ago. O.K., I'll do the oids
> > only version (and fix up relpath_blind)
>
> Hold on. I don't think we want that work done yet. Seems even Tom is
> thinking that if Vadim is going to re-do everything later anyway, we may
> be better with a relname/oid solution that does require additional
> administration apps.
>
Hmm,why is naming rule first ?
I've never enphasized naming rule except that it should be unique.
It has been my main point to reduce the necessity of naming rule
as possible. IIRC,by keeping the stored place in pg_class,Ross's
trial patch remains only 2 places where naming rule is required.
So wouldn't we be free from naming rule(it would not be so difficult
to change naming rule if the rule is found to be bad) ?
I've also mentioned many times neither relname nor oid is sufficient
for the uniqueness. In addiiton neither relname nor oid would be
necessary for the uniqueness.
IMHO,it's bad to rely on the item which is neither necessary nor
sufficient.
I proposed relname+unique_id naming once. The unique_id is
independent from oid. The relname is only for convinience for
DBA and so we don't have to change it due to RENAME.
Db's consistency is much more important than dba's satis-
faction.
Comments ?
Regards.
Hiroshi Inoue
Inoue@tpf.co.jp
From pgsql-hackers-owner+M3448@hub.org Thu Jun 15 19:01:03 2000
Received: from renoir.op.net (root@renoir.op.net [207.29.195.4])
by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id TAA00764
for <pgman@candle.pha.pa.us>; Thu, 15 Jun 2000 19:01:02 -0400 (EDT)
Received: from hub.org (root@hub.org [216.126.84.1]) by renoir.op.net (o1/$Revision: 1.3 $) with ESMTP id SAA17328 for <pgman@candle.pha.pa.us>; Thu, 15 Jun 2000 18:57:32 -0400 (EDT)
Received: from hub.org (majordom@localhost [127.0.0.1])
by hub.org (8.10.1/8.10.1) with SMTP id e5FMsMI97744;
Thu, 15 Jun 2000 18:54:22 -0400 (EDT)
Received: from wallace.ece.rice.edu (wallace.ece.rice.edu [128.42.12.154])
by hub.org (8.10.1/8.10.1) with ESMTP id e5FMs0I94252
for <pgsql-hackers@postgresql.org>; Thu, 15 Jun 2000 18:54:00 -0400 (EDT)
Received: by rice.edu
via sendmail from stdin
id <m132iWN-000LEEC@wallace.ece.rice.edu> (Debian Smail3.2.0.102)
for pgsql-hackers@postgresql.org; Thu, 15 Jun 2000 17:53:59 -0500 (CDT)
Date: Thu, 15 Jun 2000 17:53:59 -0500
From: "Ross J. Reedstrom" <reedstrm@rice.edu>
To: PostgreSQL-development <pgsql-hackers@postgresql.org>
In-Reply-To: <200006152148.RAA27790@candle.pha.pa.us>; from pgman@candle.pha.pa.us on Thu, Jun 15, 2000 at 05:48:59PM -0400
X-Mailing-List: pgsql-hackers@postgresql.org
Precedence: bulk
Sender: pgsql-hackers-owner@hub.org
Status: OR
On Thu, Jun 15, 2000 at 05:48:59PM -0400, Bruce Momjian wrote:
> > I've also mentioned many times neither relname nor oid is sufficient
> > for the uniqueness. In addiiton neither relname nor oid would be
> > necessary for the uniqueness.
> > IMHO,it's bad to rely on the item which is neither necessary nor
> > sufficient.
> > I proposed relname+unique_id naming once. The unique_id is
> > independent from oid. The relname is only for convinience for
> > DBA and so we don't have to change it due to RENAME.
> > Db's consistency is much more important than dba's satis-
> > faction.
> >
> > Comments ?
>
> I am happy not to rename the file on 'RENAME', but seems no one likes
> that.
Good, 'cause that's how I've implemented it so far. Actually, all
I've done is port my previous patch to current, with one little
change: I added a macro RelationGetRealRelationName which does what
RelationGetPhysicalRelationName used to do: i.e. return the relname with
no temptable funny business, and used that for the relcache macros. It
passes all the serial regression tests: I haven't run the parallel tests
yet. ALTER TABLE RENAME rollsback nicely. I'll need to learn some omre
about xacts to get DROP TABLE rolling back.
I'll drop it on PATCHES right now, for comment.
Ross
--
Ross J. Reedstrom, Ph.D., <reedstrm@rice.edu>
NSBRI Research Scientist/Programmer
Computer and Information Technology Institute
Rice University, 6100 S. Main St., Houston, TX 77005
From pgsql-patches-owner+M233@hub.org Thu Jun 15 19:31:07 2000
Received: from renoir.op.net (root@renoir.op.net [207.29.195.4])
by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id TAA01228
for <pgman@candle.pha.pa.us>; Thu, 15 Jun 2000 19:31:04 -0400 (EDT)
Received: from hub.org (root@hub.org [216.126.84.1]) by renoir.op.net (o1/$Revision: 1.3 $) with ESMTP id TAA17880 for <pgman@candle.pha.pa.us>; Thu, 15 Jun 2000 19:05:42 -0400 (EDT)
Received: from hub.org (majordom@localhost [127.0.0.1])
by hub.org (8.10.1/8.10.1) with SMTP id e5FN11I12640;
Thu, 15 Jun 2000 19:01:01 -0400 (EDT)
Received: from wallace.ece.rice.edu (wallace.ece.rice.edu [128.42.12.154])
by hub.org (8.10.1/8.10.1) with ESMTP id e5FN0qI12620
for <pgsql-patches@postgresql.org>; Thu, 15 Jun 2000 19:00:52 -0400 (EDT)
Received: by rice.edu
via sendmail from stdin
id <m132iZu-000LEEC@wallace.ece.rice.edu> (Debian Smail3.2.0.102)
for pgsql-patches@postgresql.org; Thu, 15 Jun 2000 17:57:38 -0500 (CDT)
Date: Thu, 15 Jun 2000 17:57:38 -0500
From: "Ross J. Reedstrom" <reedstrm@rice.edu>
To: Bruce Momjian <pgman@candle.pha.pa.us>
Cc: pgsql-patches@postgresql.org
Subject: [PATCHES] filename patch (was Re: [HACKERS] Big 7.1 open items)
From pgsql-hackers-owner+M3451@hub.org Thu Jun 15 20:01:00 2000
Received: from renoir.op.net (root@renoir.op.net [207.29.195.4])
by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id UAA01651
for <pgman@candle.pha.pa.us>; Thu, 15 Jun 2000 20:00:59 -0400 (EDT)
Received: from hub.org (root@hub.org [216.126.84.1]) by renoir.op.net (o1/$Revision: 1.3 $) with ESMTP id TAA20985 for <pgman@candle.pha.pa.us>; Thu, 15 Jun 2000 19:57:49 -0400 (EDT)
Received: from hub.org (majordom@localhost [127.0.0.1])
by hub.org (8.10.1/8.10.1) with SMTP id e5FNsgI25402;
Thu, 15 Jun 2000 19:54:42 -0400 (EDT)
Received: from sss2.sss.pgh.pa.us (sss.pgh.pa.us [209.114.166.2])
by hub.org (8.10.1/8.10.1) with ESMTP id e5FNsCI22412
for <pgsql-hackers@postgresql.org>; Thu, 15 Jun 2000 19:54:12 -0400 (EDT)
Received: from sss2.sss.pgh.pa.us (tgl@localhost [127.0.0.1])
by sss2.sss.pgh.pa.us (8.9.3/8.9.3) with ESMTP id TAA02263;
Comments: In-reply-to "Ross J. Reedstrom" <reedstrm@rice.edu>
message dated "Thu, 15 Jun 2000 11:45:19 -0500"
Date: Thu, 15 Jun 2000 19:53:52 -0400
Message-ID: <2260.961113232@sss.pgh.pa.us>
From: Tom Lane <tgl@sss.pgh.pa.us>
X-Mailing-List: pgsql-hackers@postgresql.org
Precedence: bulk
Sender: pgsql-hackers-owner@hub.org
Status: OR
"Ross J. Reedstrom" <reedstrm@rice.edu> writes:
> On Thu, Jun 15, 2000 at 03:11:52AM -0400, Tom Lane wrote:
>> "Ross J. Reedstrom" <reedstrm@rice.edu> writes:
>>>> Any strong objections to the mixed relname_oid solution?
>>
>> Yes!
> The plan here was to let VACUUM handle renaming the file, since it
> will already have all the necessary locks. This shortens the window
> of confusion. ALTER TABLE RENAME doesn't happen that often, really -
> the relname is there just for human consumption, then.
Yeah, I've seen tons of discussion of how if we do this, that, and
the other thing, and be prepared to fix up some other things in case
of crash recovery, we can make it work with filename == relname + OID
(where relname tracks logical name, at least at some remove).
Probably. Assuming nobody forgets anything.
I'm just trying to point out that that's a huge amount of pretty
delicate mechanism. The amount of work required to make it trustworthy
looks to me to dwarf the admin tools that Bruce is complaining about.
And we only have a few people competent to do the work. (With all
due respect, Ross, if you weren't already aware of the implications
for mdblindwrt, I have to wonder what else you missed.)
Filename == OID is so simple, reliable, and straightforward by
comparison that I think the decision is a no-brainer.
If we could afford to sink unlimited time into this one issue then
it might make sense to do it the hard way, but we have enough
important stuff on our TODO list to keep us all busy for years ---
I cannot believe that it's an effective use of our time to do this.
> Hmm, what's all this with functions in catalog.c that are only called by
> smgr/md.c? seems to me that anything having to do with physical storage
> (like the path!) belongs in the smgr abstraction.
Yeah, there's a bunch of stuff that should have been implemented by
adding new smgr entry points, but wasn't. It should be pushed down.
(I can't resist pointing out that one of those things is physical
relation rename, which will go away and not *need* to be pushed down
if we do it the way I want.)
regards, tom lane
From tgl@sss.pgh.pa.us Thu Jun 15 20:00:59 2000
Received: from renoir.op.net (root@renoir.op.net [207.29.195.4])
by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id UAA01647
for <pgman@candle.pha.pa.us>; Thu, 15 Jun 2000 20:00:58 -0400 (EDT)
Received: from sss2.sss.pgh.pa.us (sss.pgh.pa.us [209.114.166.2]) by renoir.op.net (o1/$Revision: 1.3 $) with ESMTP id TAA21034 for <pgman@candle.pha.pa.us>; Thu, 15 Jun 2000 19:58:30 -0400 (EDT)
Received: from sss2.sss.pgh.pa.us (tgl@localhost [127.0.0.1])
by sss2.sss.pgh.pa.us (8.9.3/8.9.3) with ESMTP id TAA02283;
Comments: In-reply-to Bruce Momjian <pgman@candle.pha.pa.us>
message dated "Thu, 15 Jun 2000 15:35:45 -0400"
Date: Thu, 15 Jun 2000 19:57:05 -0400
Message-ID: <2280.961113425@sss.pgh.pa.us>
From: Tom Lane <tgl@sss.pgh.pa.us>
Status: OR
Bruce Momjian <pgman@candle.pha.pa.us> writes:
>> Gee, so dogmatic. No one besides Bruce and Hiroshi discussed this _at
>> all_ when I first put up patches two month ago. O.K., I'll do the oids
>> only version (and fix up relpath_blind)
> Hold on. I don't think we want that work done yet. Seems even Tom is
> thinking that if Vadim is going to re-do everything later anyway, we may
> be better with a relname/oid solution that does require additional
> administration apps.
Don't put words in my mouth, please. If we are going to throw the
work away later, it'd be foolish to do the much greater amount of
work needed to make filename=relname+OID fly than is needed for
filename=OID.
However, I'm pretty sure I recall Vadim stating that he thought
filename=OID would be required for his smgr changes anyway...
regards, tom lane
From pgsql-hackers-owner+M3453@hub.org Thu Jun 15 21:01:01 2000
Received: from renoir.op.net (root@renoir.op.net [207.29.195.4])
by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id VAA02731
for <pgman@candle.pha.pa.us>; Thu, 15 Jun 2000 21:01:01 -0400 (EDT)
Received: from hub.org (root@hub.org [216.126.84.1]) by renoir.op.net (o1/$Revision: 1.3 $) with ESMTP id UAA23469 for <pgman@candle.pha.pa.us>; Thu, 15 Jun 2000 20:36:36 -0400 (EDT)
Received: from hub.org (majordom@localhost [127.0.0.1])
by hub.org (8.10.1/8.10.1) with SMTP id e5G0WDI97134;
Thu, 15 Jun 2000 20:32:13 -0400 (EDT)
Received: from sd.tpf.co.jp (sd.tpf.co.jp [210.161.239.34])
by hub.org (8.10.1/8.10.1) with ESMTP id e5G0VsI97003
for <pgsql-hackers@postgresql.org>; Thu, 15 Jun 2000 20:31:54 -0400 (EDT)
Received: from cadzone ([126.0.1.40] (may be forged))
by sd.tpf.co.jp (2.5 Build 2640 (Berkeley 8.8.6)/8.8.4) with SMTP
> > On Thu, Jun 15, 2000 at 03:11:52AM -0400, Tom Lane wrote:
> >> "Ross J. Reedstrom" <reedstrm@rice.edu> writes:
> >>>> Any strong objections to the mixed relname_oid solution?
> >>
> >> Yes!
>
> > The plan here was to let VACUUM handle renaming the file, since it
> > will already have all the necessary locks. This shortens the window
> > of confusion. ALTER TABLE RENAME doesn't happen that often, really -
> > the relname is there just for human consumption, then.
>
> Yeah, I've seen tons of discussion of how if we do this, that, and
> the other thing, and be prepared to fix up some other things in case
> of crash recovery, we can make it work with filename == relname + OID
> (where relname tracks logical name, at least at some remove).
>
I've seen little discussion of how to avoid the use of naming rule.
I've proposed many times that we should keep the information
where the table is stored in our database itself. I've never seen
clear objections to it. So I could understand my proposal is OK ?
Isn't it much more important than naming rule ? Under the
mechanism,we could easily replace bad naming rule.
And I believe that Ross's work is mostly around the mechanism
not naming rule.
Now I like neither relname nor oid because it's not sufficient
for my purpose.
Regards.
Hiroshi Inoue
Inoue@tpf.co.jp
From tgl@sss.pgh.pa.us Thu Jun 15 22:01:02 2000
Received: from renoir.op.net (root@renoir.op.net [207.29.195.4])
by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id WAA03637
for <maillist@candle.pha.pa.us>; Thu, 15 Jun 2000 22:01:01 -0400 (EDT)
Received: from sss2.sss.pgh.pa.us (sss.pgh.pa.us [209.114.166.2]) by renoir.op.net (o1/$Revision: 1.3 $) with ESMTP id VAA28521 for <maillist@candle.pha.pa.us>; Thu, 15 Jun 2000 21:58:46 -0400 (EDT)
Received: from sss2.sss.pgh.pa.us (tgl@localhost [127.0.0.1])
by sss2.sss.pgh.pa.us (8.9.3/8.9.3) with ESMTP id VAA02730;
X-Mailer: Microsoft Outlook 8.5, Build 4.71.2173.0
In-Reply-To: <2727.961120647@sss.pgh.pa.us>
Importance: Normal
X-MimeOLE: Produced By Microsoft MimeOLE V5.00.2314.1300
Status: OR
Sorry for my previous mail. It was posted by my mistake.
> -----Original Message-----
> From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
>
> "Hiroshi Inoue" <Inoue@tpf.co.jp> writes:
> > Now I like neither relname nor oid because it's not sufficient
> > for my purpose.
>
> We should probably not do much of anything with this issue until
> we have a clearer understanding of what we want to do about
> tablespaces and schemas.
>
> My gut feeling is that we will end up with pathnames that look
> something like
>
> .../data/base/DBNAME/TABLESPACE/OIDOFRELATION
>
Schema is a logical concept and irrevant to physical location.
I strongly object your suggestion unless above means *default*
location.
Tablespace is an encapsulation of table allocation and the
name should be irrevant to the location basically. So above
seems very bad for me.
Anyway I don't see any advantage in fixed mapping impleme
ntation. After renewal,we should at least have a possibility to
allocate a specific table in arbitrary separate directory.
Regards.
Hiroshi Inoue
Inoue@tpf.co.jp
From Inoue@tpf.co.jp Thu Jun 15 23:31:00 2000
Received: from renoir.op.net (root@renoir.op.net [207.29.195.4])
by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id XAA06634;
Thu, 15 Jun 2000 23:30:59 -0400 (EDT)
Received: from sd.tpf.co.jp (sd.tpf.co.jp [210.161.239.34]) by renoir.op.net (o1/$Revision: 1.3 $) with ESMTP id XAA03227; Thu, 15 Jun 2000 23:18:54 -0400 (EDT)
Received: from cadzone ([126.0.1.40] (may be forged))
by sd.tpf.co.jp (2.5 Build 2640 (Berkeley 8.8.6)/8.8.4) with SMTP
id MAA07544; Fri, 16 Jun 2000 12:18:06 +0900
From: "Hiroshi Inoue" <Inoue@tpf.co.jp>
To: "Bruce Momjian" <pgman@candle.pha.pa.us>, "Tom Lane" <tgl@sss.pgh.pa.us>
relname/unique_id but need some work new pg_class column,
no relname change. for unique-id generation filename not relname
Regards.
Hiroshi Inoue
Inoue@tpf.co.jp
From pgsql-hackers-owner+M3465@hub.org Fri Jun 16 00:01:01 2000
Received: from renoir.op.net (root@renoir.op.net [207.29.195.4])
by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id AAA06924
for <pgman@candle.pha.pa.us>; Fri, 16 Jun 2000 00:01:00 -0400 (EDT)
Received: from hub.org (root@hub.org [216.126.84.1]) by renoir.op.net (o1/$Revision: 1.3 $) with ESMTP id XAA05470 for <pgman@candle.pha.pa.us>; Thu, 15 Jun 2000 23:59:46 -0400 (EDT)
Received: from hub.org (majordom@localhost [127.0.0.1])
by hub.org (8.10.1/8.10.1) with SMTP id e5G3uaI10809;
Thu, 15 Jun 2000 23:56:36 -0400 (EDT)
Received: from sd.tpf.co.jp (sd.tpf.co.jp [210.161.239.34])
by hub.org (8.10.1/8.10.1) with ESMTP id e5G3uKI10702
for <pgsql-hackers@postgresql.org>; Thu, 15 Jun 2000 23:56:21 -0400 (EDT)
Received: from cadzone ([126.0.1.40] (may be forged))
by sd.tpf.co.jp (2.5 Build 2640 (Berkeley 8.8.6)/8.8.4) with SMTP
X-Mailer: Microsoft Outlook 8.5, Build 4.71.2173.0
In-Reply-To: <3264.961127021@sss.pgh.pa.us>
Importance: Normal
X-MimeOLE: Produced By Microsoft MimeOLE V5.00.2314.1300
X-Mailing-List: pgsql-hackers@postgresql.org
Precedence: bulk
Sender: pgsql-hackers-owner@hub.org
Status: OR
> -----Original Message-----
> From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
>
> "Hiroshi Inoue" <Inoue@tpf.co.jp> writes:
> > Please add my opinion for naming rule.
>
> > relname/unique_id but need some work new
> pg_class column,
> > no relname change. for unique-id generation filename not relname
>
> Why is a unique ID better than --- or even different from ---
> using the relation's OID? It seems pointless to me...
>
For example,in the implementation of CLUSTER command,
we would need another new file for the target relation in
order to put sorted rows but don't we want to change the
OID ? It would be needed for table re-construction generally.
If I remember correectly,you once proposed OID+version
naming for the cases.
Regards.
Hiroshi Inoue
Inoue@tpf.co.jp
From Inoue@tpf.co.jp Fri Jun 16 02:01:00 2000
Received: from renoir.op.net (root@renoir.op.net [207.29.195.4])
by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id CAA08093
for <maillist@candle.pha.pa.us>; Fri, 16 Jun 2000 02:00:59 -0400 (EDT)
Received: from sd.tpf.co.jp (sd.tpf.co.jp [210.161.239.34]) by renoir.op.net (o1/$Revision: 1.3 $) with ESMTP id BAA10174 for <maillist@candle.pha.pa.us>; Fri, 16 Jun 2000 01:34:44 -0400 (EDT)
Received: from cadzone ([126.0.1.40] (may be forged))
by sd.tpf.co.jp (2.5 Build 2640 (Berkeley 8.8.6)/8.8.4) with SMTP
X-Mailer: Microsoft Outlook 8.5, Build 4.71.2173.0
In-Reply-To: <3238.961126521@sss.pgh.pa.us>
Importance: Normal
X-MimeOLE: Produced By Microsoft MimeOLE V5.00.2314.1300
Status: OR
> -----Original Message-----
> From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
>
> "Hiroshi Inoue" <Inoue@tpf.co.jp> writes:
> > Tablespace is an encapsulation of table allocation and the
> > name should be irrevant to the location basically. So above
> > seems very bad for me.
> > Anyway I don't see any advantage in fixed mapping impleme
> > ntation. After renewal,we should at least have a possibility to
> > allocate a specific table in arbitrary separate directory.
>
> Call a "directory" a "tablespace" and we're on the same page,
> aren't we? Actually I'd envision some kind of admin command
> "CREATE TABLESPACE foo AS /path/to/wherever".
Yes,I think 'tablespace -> directory' is the most natural
extension under current file_per_table storage manager.
If many_tables_in_a_file storage manager is introduced,we
may be able to change the definiiton of TABLESPACE
to 'tablespace -> files' like Oracle.
> That would make
> appropriate system catalog entries and also create a symlink
> from ".../data/base/foo" (or some such place) to the target
> directory.
> Then when we make a table in that tablespace,
> it's in the right place. Problem solved, no?
>
I don't like symlink for dbms data files. However it may
be OK,If symlink are limited to 'tablespace->directory'
corrspondence and all tablespaces(including default
etc) are symlink. It is simple and all debugging would
be processed under tablespace_is_symlink environment.
> It gets a little trickier if you want to be able to split
> multi-gig tables across several tablespaces, though, since
> you couldn't just append ".N" to the base table path in that
> scenario.
>
This seems to be not that easy to solve now.
Ross doesn't change this naming rule for multi-gig
tables either in his trial.
> I'd be interested to know what sort of facilities Oracle
> provides for managing huge tables...
>
In my knowledge about old Oracle,one TABLESPACE
could have many DATAFILEs which could contain
many tables.
Regards.
Hiroshi Inoue
Inoue@tpf.co.jp
From pgsql-hackers-owner+M3469@hub.org Fri Jun 16 02:01:03 2000
Received: from renoir.op.net (root@renoir.op.net [207.29.195.4])
by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id CAA08109
for <pgman@candle.pha.pa.us>; Fri, 16 Jun 2000 02:01:02 -0400 (EDT)
Received: from hub.org (root@hub.org [216.126.84.1]) by renoir.op.net (o1/$Revision: 1.3 $) with ESMTP id BAA11218 for <pgman@candle.pha.pa.us>; Fri, 16 Jun 2000 01:57:33 -0400 (EDT)
Received: from hub.org (majordom@localhost [127.0.0.1])
by hub.org (8.10.1/8.10.1) with SMTP id e5G5tLI49492;
Fri, 16 Jun 2000 01:55:21 -0400 (EDT)
Received: from sss2.sss.pgh.pa.us (sss.pgh.pa.us [209.114.166.2])
by hub.org (8.10.1/8.10.1) with ESMTP id e5G5tAI49395
for <pgsql-hackers@postgresql.org>; Fri, 16 Jun 2000 01:55:10 -0400 (EDT)
Received: from sss2.sss.pgh.pa.us (tgl@localhost [127.0.0.1])
by sss2.sss.pgh.pa.us (8.9.3/8.9.3) with ESMTP id BAA05749;
>> Why is a unique ID better than --- or even different from ---
>> using the relation's OID? It seems pointless to me...
> For example,in the implementation of CLUSTER command,
> we would need another new file for the target relation in
> order to put sorted rows but don't we want to change the
> OID ? It would be needed for table re-construction generally.
> If I remember correectly,you once proposed OID+version
> naming for the cases.
Hmm, so you are thinking that the pg_class row for the table would
include this uniqueID, and then committing the pg_class update would
be the atomic action that replaces the old table contents with the
new? It does have some attraction now that I think about it.
But there are other ways we could do the same thing. If we want to
have tablespaces, there will need to be a tablespace identifier in
each pg_class row. So we could do CLUSTER in the same way as we'd
move a table from one tablespace to another: create the new files in
the new tablespace directory, and the commit of the new pg_class row
with the new tablespace value is the atomic action that makes the new
files valid and the old files not.
You will probably say "but I didn't want to move my table to a new
tablespace just to cluster it!" I think we could live with that,
though. A tablespace doesn't need to have any existence more concrete
than a subdirectory, in my vision of the way things would work. We
could do something like making two subdirectories of each place that
the dbadmin designates as a "tablespace", so that we make two logical
tablespaces out of what the dbadmin thinks of as one. Then we can
ping-pong between those directories to do things like clustering "in
place".
Basically I want to keep the bottom-level mechanisms as simple and
reliable as we possibly can. The fewer concepts are known down at
the bottom, the better. If we can keep the pathname constituents
to just "tablespace" and "relation OID" we'll be in great shape ---
but each additional concept that has to be known down there is
another potential problem.
regards, tom lane
From pgsql-hackers-owner+M3471@hub.org Fri Jun 16 03:31:05 2000
Received: from renoir.op.net (root@renoir.op.net [207.29.195.4])
by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id DAA12816
for <pgman@candle.pha.pa.us>; Fri, 16 Jun 2000 03:31:04 -0400 (EDT)
Received: from hub.org (root@hub.org [216.126.84.1]) by renoir.op.net (o1/$Revision: 1.3 $) with ESMTP id DAA14405 for <pgman@candle.pha.pa.us>; Fri, 16 Jun 2000 03:03:38 -0400 (EDT)
Received: from hub.org (majordom@localhost [127.0.0.1])
by hub.org (8.10.1/8.10.1) with SMTP id e5G71YI83633;
Fri, 16 Jun 2000 03:01:34 -0400 (EDT)
Received: from sd.tpf.co.jp (sd.tpf.co.jp [210.161.239.34])
by hub.org (8.10.1/8.10.1) with ESMTP id e5G713I82023
for <pgsql-hackers@postgresql.org>; Fri, 16 Jun 2000 03:01:04 -0400 (EDT)
Received: from cadzone ([126.0.1.40] (may be forged))
by sd.tpf.co.jp (2.5 Build 2640 (Berkeley 8.8.6)/8.8.4) with SMTP
X-Mailer: Microsoft Outlook 8.5, Build 4.71.2173.0
In-Reply-To: <5746.961134886@sss.pgh.pa.us>
Importance: Normal
X-MimeOLE: Produced By Microsoft MimeOLE V5.00.2314.1300
X-Mailing-List: pgsql-hackers@postgresql.org
Precedence: bulk
Sender: pgsql-hackers-owner@hub.org
Status: OR
> -----Original Message-----
> From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
>
> "Hiroshi Inoue" <Inoue@tpf.co.jp> writes:
> >> Why is a unique ID better than --- or even different from ---
> >> using the relation's OID? It seems pointless to me...
>
> > For example,in the implementation of CLUSTER command,
> > we would need another new file for the target relation in
> > order to put sorted rows but don't we want to change the
> > OID ? It would be needed for table re-construction generally.
> > If I remember correectly,you once proposed OID+version
> > naming for the cases.
>
> Hmm, so you are thinking that the pg_class row for the table would
> include this uniqueID,
No,I just include the place where the table is stored(pathname under
current file_per_table storage manager) in the pg_class row because
I don't want to rely on table allocating rule(naming rule for current)
to access existent relation files. This has always been my main point.
Many_tables_in_a_file storage manager wouldn't be able to live without
keeping this kind of infomation.
This information(where it is stored) is diffrent from tablespace(where
to store) information. There was an idea to keep the information into
opaque entry in pg_class which only a specific storage manager
could handle. There was an idea to have a new system table which
keeps the information. and so on...
> and then committing the pg_class update would
> be the atomic action that replaces the old table contents with the
> new? It does have some attraction now that I think about it.
>
> But there are other ways we could do the same thing. If we want to
> have tablespaces, there will need to be a tablespace identifier in
> each pg_class row. So we could do CLUSTER in the same way as we'd
> move a table from one tablespace to another: create the new files in
> the new tablespace directory, and the commit of the new pg_class row
> with the new tablespace value is the atomic action that makes the new
> files valid and the old files not.
>
> You will probably say "but I didn't want to move my table to a new
> tablespace just to cluster it!"
Yes.
> I think we could live with that,
> though. A tablespace doesn't need to have any existence more concrete
> than a subdirectory, in my vision of the way things would work. We
> could do something like making two subdirectories of each place that
> the dbadmin designates as a "tablespace", so that we make two logical
> tablespaces out of what the dbadmin thinks of as one.
Certainly we could design TABLESPACE(where to store) as above.
> Then we can
> ping-pong between those directories to do things like clustering "in
> place".
>
But maybe we must keep the directory information where the table was
*ping-ponged* in (e.g.) pg_class. Is such an implementation cleaner or
more extensible than mine(keeping the stored place exactly) ?
Regards.
Hiroshi Inoue
Inoue@tpf.co.jp
From pgsql-hackers-owner+M3473@hub.org Fri Jun 16 04:01:12 2000
Received: from renoir.op.net (root@renoir.op.net [207.29.195.4])
by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id EAA13087
for <pgman@candle.pha.pa.us>; Fri, 16 Jun 2000 04:01:11 -0400 (EDT)
Received: from hub.org (root@hub.org [216.126.84.1]) by renoir.op.net (o1/$Revision: 1.3 $) with ESMTP id DAA16002 for <pgman@candle.pha.pa.us>; Fri, 16 Jun 2000 03:37:24 -0400 (EDT)
Received: from hub.org (majordom@localhost [127.0.0.1])
by hub.org (8.10.1/8.10.1) with SMTP id e5G7ZZI51521;
Fri, 16 Jun 2000 03:35:35 -0400 (EDT)
Received: from sss2.sss.pgh.pa.us (sss.pgh.pa.us [209.114.166.2])
by hub.org (8.10.1/8.10.1) with ESMTP id e5G7ZEI51350
for <pgsql-hackers@postgresql.org>; Fri, 16 Jun 2000 03:35:14 -0400 (EDT)
Received: from sss2.sss.pgh.pa.us (tgl@localhost [127.0.0.1])
by sss2.sss.pgh.pa.us (8.9.3/8.9.3) with ESMTP id DAA06103;
Fri, 16 Jun 2000 03:34:47 -0400 (EDT)
To: Chris Bitmead <chrisb@nimrod.itg.telstra.com.au>
Comments: In-reply-to Chris Bitmead <chrisb@nimrod.itg.telstra.com.au>
message dated "Fri, 16 Jun 2000 15:36:04 +1000"
Date: Fri, 16 Jun 2000 03:34:47 -0400
Message-ID: <6100.961140887@sss.pgh.pa.us>
From: Tom Lane <tgl@sss.pgh.pa.us>
X-Mailing-List: pgsql-hackers@postgresql.org
Precedence: bulk
Sender: pgsql-hackers-owner@hub.org
Status: OR
Chris Bitmead <chrisb@nimrod.itg.telstra.com.au> writes:
> Tom Lane wrote:
>> I don't see a lot of value in that. Better to do something like
>> tablespaces:
>>
>> <dbroot>/<oidoftablespace>/<oidofobject>
> What is the benefit of having oidoftablespace in the directory path?
> Isn't tablespace an idea so you can store it somewhere completely
> different?
> Or is there some symlink idea or something?
Exactly --- I'm assuming that the tablespace "directory" is likely
to be a symlink to some other mounted volume. The point here is
to keep the low-level file access routines from having to know very
much about tablespaces or file organization. In the above proposal,
all they need to know is the relation's OID and the name (or OID)
of the tablespace the relation's assigned to; then they can form
a valid path using a hardwired rule. There's still plenty of
flexibility of organization, but it's not necessary to know that
where the rubber meets the road (eg, when you're down inside mdblindwrt
trying to dump a dirty buffer to disk with no spare resources to find
out anything about the relation the page belongs to...)
regards, tom lane
From JanWieck@t-online.de Fri Jun 16 11:01:06 2000
Received: from renoir.op.net (root@renoir.op.net [207.29.195.4])
by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id LAA28913
for <maillist@candle.pha.pa.us>; Fri, 16 Jun 2000 11:01:05 -0400 (EDT)
Received: from mailout05.sul.t-online.com (mailout05.sul.t-online.com [194.25.134.82]) by renoir.op.net (o1/$Revision: 1.3 $) with ESMTP id KAA01818 for <maillist@candle.pha.pa.us>; Fri, 16 Jun 2000 10:46:42 -0400 (EDT)
Received: from fwd06.sul.t-online.de
by mailout05.sul.t-online.com with smtp
id 132xN9-0006ze-03; Fri, 16 Jun 2000 16:45:27 +0200
Received: from hot.jw.home (340000654369-0001@[62.158.179.251]) by fwd06.sul.t-online.de
with esmtp id 132xMx-0E54HQC; Fri, 16 Jun 2000 16:45:15 +0200
Received: (from wieck@localhost)
by hot.jw.home (8.8.5/8.8.5) id OAA15163;
Fri, 16 Jun 2000 14:42:12 +0200
From: JanWieck@t-online.de (Jan Wieck)
Message-Id: <200006161242.OAA15163@hot.jw.home>
Subject: Re: [HACKERS] Big 7.1 open items
In-Reply-To: <3238.961126521@sss.pgh.pa.us> from Tom Lane at "Jun 15, 2000 11:35:21
pm"
To: Tom Lane <tgl@sss.pgh.pa.us>
Date: Fri, 16 Jun 2000 14:42:12 +0200 (MEST)
CC: Hiroshi Inoue <Inoue@tpf.co.jp>, Bruce Momjian <maillist@candle.pha.pa.us>,
> You can run out of space even if there are plenty GB's
> free on your disks. You have to create tablespaces
> explicitly.
Not to mention the reverse: if I read this right, you have to suck
up your GB's long in advance of actually needing them. That's OK
for a machine that's dedicated to Oracle ... not so OK for smaller
installations, playpens, etc.
I'm not convinced that there's anything fundamentally wrong with
doing storage allocation in Unix files the way we have been.
(At least not when we're sitting atop a well-done filesystem,
which may leave the Linux folk out in the cold ;-).)
regards, tom lane
From tgl@sss.pgh.pa.us Fri Jun 16 12:01:03 2000
Received: from renoir.op.net (root@renoir.op.net [207.29.195.4])
by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id MAA29853
for <maillist@candle.pha.pa.us>; Fri, 16 Jun 2000 12:01:02 -0400 (EDT)
Received: from sss2.sss.pgh.pa.us (sss.pgh.pa.us [209.114.166.2]) by renoir.op.net (o1/$Revision: 1.3 $) with ESMTP id LAA08255 for <maillist@candle.pha.pa.us>; Fri, 16 Jun 2000 11:48:10 -0400 (EDT)
Received: from sss2.sss.pgh.pa.us (tgl@localhost [127.0.0.1])
by sss2.sss.pgh.pa.us (8.9.3/8.9.3) with ESMTP id LAA07461;
Fri, 16 Jun 2000 11:46:41 -0400 (EDT)
To: Jan Wieck <JanWieck@Yahoo.com>
cc: Hiroshi Inoue <Inoue@tpf.co.jp>, Bruce Momjian <maillist@candle.pha.pa.us>,
This isn't any harder for md.c to deal with than what we do now,
but by making the /N subdirectories be symlinks, the dbadmin could
easily arrange for extension segments to go on different filesystems.
Also, since /N subdirectory symlinks can be added as needed,
expanding available space by attaching more disks isn't hard.
(If the admin hasn't pre-made a /N symlink when it's needed,
I'd envision the backend just automatically creating a plain
subdirectory so that it can extend the table.)
A limitation is that the N'th extension segments of all the relations
in a given tablespace have to be in the same place, but I don't see
that as a major objection. Worst case is you make a separate tablespace
for each of your multi-gig relations ... you're probably not going to
have a very large number of such relations, so this doesn't seem like
unmanageable admin complexity.
We'd still want to create some tools to help the dbadmin with slinging
all these symlinks around, of course. But I think it's critical to keep
the low-level file access protocol simple and reliable, which really
means minimizing the amount of information the backend needs to know to
figure out which file to write a page in. With something like the above
you only need to know the tablespace name (or more likely OID), the
relation OID (+name or not, depending on outcome of other argument),
and the offset in the table. No worse than now from the software's
point of view.
Comments?
regards, tom lane
From lockhart@alumni.caltech.edu Fri Jun 16 12:31:50 2000
Received: from renoir.op.net (root@renoir.op.net [207.29.195.4])
by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id MAA00649
for <maillist@candle.pha.pa.us>; Fri, 16 Jun 2000 12:31:49 -0400 (EDT)
Received: from huey.jpl.nasa.gov (huey.jpl.nasa.gov [128.149.68.100]) by renoir.op.net (o1/$Revision: 1.3 $) with ESMTP id MAA13118 for <maillist@candle.pha.pa.us>; Fri, 16 Jun 2000 12:31:52 -0400 (EDT)
Received: from golem.jpl.nasa.gov (hectic-1 [128.149.68.203])
by huey.jpl.nasa.gov (8.8.8+Sun/8.8.8) with ESMTP id JAA15007;
Fri, 16 Jun 2000 09:27:18 -0700 (PDT)
Received: from alumni.caltech.edu (localhost.localdomain [127.0.0.1])
by golem.jpl.nasa.gov (Postfix) with ESMTP
id DD8426F51; Fri, 16 Jun 2000 16:27:22 +0000 (UTC)
> the low-level file access protocol simple and reliable, which really
> means minimizing the amount of information the backend needs to know
> to figure out which file to write a page in. With something like the
> above you only need to know the tablespace name (or more likely OID),
> the relation OID (+name or not, depending on outcome of other
> argument), and the offset in the table. No worse than now from the
> software's point of view.
> Comments?
I'm probably missing the context a bit, but imho we should try hard to
stay away from symlinks as the general solution for anything.
Sorry for being behind here, but to make sure I'm on the right page:
o tablespaces decouple storage from logical tables
o a database lives in a default tablespace, unless specified
o by default, a table will live in the default tablespace
o (eventually) a table can be split across tablespaces
Some thoughts:
o the ability to split single tables across disks was essential for
scalability when disks were small. But with RAID, NAS, etc etc isn't
that a smaller issue now?
o "tablespaces" would implement our less-developed "with location"
feature, right? Splitting databases, whole indices and whole tables
across storage is the biggest win for this work since more users will
use the feature.
o location information needs to travel with individual tables anyway.
From scrappy@hub.org Fri Jun 16 13:01:02 2000
Received: from renoir.op.net (root@renoir.op.net [207.29.195.4])
by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id NAA01191;
Fri, 16 Jun 2000 13:01:01 -0400 (EDT)
Received: from thelab.hub.org (nat193.152.mpoweredpc.net [142.177.193.152]) by renoir.op.net (o1/$Revision: 1.3 $) with ESMTP id MAA15282; Fri, 16 Jun 2000 12:53:23 -0400 (EDT)
Received: from localhost (scrappy@localhost)
by thelab.hub.org (8.9.3/8.9.3) with ESMTP id NAA28326;
Fri, 16 Jun 2000 13:50:37 -0300 (ADT)
(envelope-from scrappy@hub.org)
X-Authentication-Warning: thelab.hub.org: scrappy owned process doing -bs
Date: Fri, 16 Jun 2000 13:50:37 -0300 (ADT)
From: The Hermit Hacker <scrappy@hub.org>
To: Bruce Momjian <pgman@candle.pha.pa.us>
cc: Tom Lane <tgl@sss.pgh.pa.us>, Hiroshi Inoue <Inoue@tpf.co.jp>,
> Keep current system no work rename/create no rollback
>
> relname/oid but less work new pg_class column,
> no rename change filename not accurate on
> rename
>
> relname/oid with more work complex code
> rename change during
> vacuum
>
> oid filename less work, but confusing to admins
> need admin tools
My vote is with Tom on this one ... oid only ... the admin should be able
to do a quick SELECT on a table to find out the OID->table mapping, and I
believe its already been pointed out that you cant' just restore one file
anyway, so it kinda negates the "server isn't running problem" ...
From tgl@sss.pgh.pa.us Fri Jun 16 13:01:01 2000
Received: from renoir.op.net (root@renoir.op.net [207.29.195.4])
by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id NAA01188
for <maillist@candle.pha.pa.us>; Fri, 16 Jun 2000 13:01:01 -0400 (EDT)
Received: from sss2.sss.pgh.pa.us (sss.pgh.pa.us [209.114.166.2]) by renoir.op.net (o1/$Revision: 1.3 $) with ESMTP id MAA15530 for <maillist@candle.pha.pa.us>; Fri, 16 Jun 2000 12:55:38 -0400 (EDT)
Received: from sss2.sss.pgh.pa.us (tgl@localhost [127.0.0.1])
by sss2.sss.pgh.pa.us (8.9.3/8.9.3) with ESMTP id MAA07750;
Fri, 16 Jun 2000 12:54:00 -0400 (EDT)
To: Thomas Lockhart <lockhart@alumni.caltech.edu>
cc: Jan Wieck <JanWieck@yahoo.com>, Hiroshi Inoue <Inoue@tpf.co.jp>,
Comments: In-reply-to Thomas Lockhart <lockhart@alumni.caltech.edu>
message dated "Fri, 16 Jun 2000 16:27:22 -0000"
Date: Fri, 16 Jun 2000 12:54:00 -0400
Message-ID: <7747.961174440@sss.pgh.pa.us>
From: Tom Lane <tgl@sss.pgh.pa.us>
Status: OR
Thomas Lockhart <lockhart@alumni.caltech.edu> writes:
>> ... But I think it's critical to keep
>> the low-level file access protocol simple and reliable, which really
>> means minimizing the amount of information the backend needs to know
>> to figure out which file to write a page in. With something like the
>> above you only need to know the tablespace name (or more likely OID),
>> the relation OID (+name or not, depending on outcome of other
>> argument), and the offset in the table. No worse than now from the
>> software's point of view.
>> Comments?
> I'm probably missing the context a bit, but imho we should try hard to
> stay away from symlinks as the general solution for anything.
Why?
regards, tom lane
From dhogaza@pacifier.com Fri Jun 16 14:55:00 2000
Received: from renoir.op.net (root@renoir.op.net [207.29.195.4])
by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id OAA02086
for <maillist@candle.pha.pa.us>; Fri, 16 Jun 2000 14:54:59 -0400 (EDT)
Received: from smtp.pacifier.com (comet.pacifier.com [199.2.117.155]) by renoir.op.net (o1/$Revision: 1.3 $) with ESMTP id OAA26430 for <maillist@candle.pha.pa.us>; Fri, 16 Jun 2000 14:40:00 -0400 (EDT)
Received: from desktop (dsl-dhogaza.pacifier.net [207.202.226.68])
by smtp.pacifier.com (8.9.3/8.9.3pop) with SMTP id LAA08661;
>This isn't any harder for md.c to deal with than what we do now,
>but by making the /N subdirectories be symlinks, the dbadmin could
>easily arrange for extension segments to go on different filesystems.
I personally dislike depending on symlinks to move stuff around.
Among other things, a pg_dump/restore (and presumably future
backup tools?) can't recreate the disk layout automatically.
>We'd still want to create some tools to help the dbadmin with slinging
>all these symlinks around, of course.
OK, if symlinks are simply an implementation detail hidden from the
dbadmin, and if the physical structure is kept in the db so it can
be rebuilt if necessary automatically, then I don't mind symlinks.
> But I think it's critical to keep
>the low-level file access protocol simple and reliable, which really
>means minimizing the amount of information the backend needs to know to
>figure out which file to write a page in. With something like the above
>you only need to know the tablespace name (or more likely OID), the
>relation OID (+name or not, depending on outcome of other argument),
>and the offset in the table. No worse than now from the software's
>point of view.
Make the code that creates and otherwise manipulates tablespaces
do the work, while keeping the low-level file access protocol simple.
Yes, this approach sounds very good to me.
- Don Baccus, Portland OR <dhogaza@pacifier.com>
Nature photos, on-line guides, Pacific Northwest
Rare Bird Alert Service and other goodies at
http://donb.photo.net.
From pgsql-hackers-owner+M3500@hub.org Fri Jun 16 14:55:10 2000
Received: from renoir.op.net (root@renoir.op.net [207.29.195.4])
by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id OAA02107
for <pgman@candle.pha.pa.us>; Fri, 16 Jun 2000 14:55:09 -0400 (EDT)
Received: from hub.org (root@hub.org [216.126.84.1]) by renoir.op.net (o1/$Revision: 1.3 $) with ESMTP id OAA26943 for <pgman@candle.pha.pa.us>; Fri, 16 Jun 2000 14:44:12 -0400 (EDT)
Received: from hub.org (majordom@localhost [127.0.0.1])
by hub.org (8.10.1/8.10.1) with SMTP id e5GIelM05972;
Fri, 16 Jun 2000 14:40:47 -0400 (EDT)
Received: from smtp.pacifier.com (comet.pacifier.com [199.2.117.155])
by hub.org (8.10.1/8.10.1) with ESMTP id e5GIe5M05692
for <pgsql-hackers@postgresql.org>; Fri, 16 Jun 2000 14:40:05 -0400 (EDT)
Received: from desktop (dsl-dhogaza.pacifier.net [207.202.226.68])
by smtp.pacifier.com (8.9.3/8.9.3pop) with SMTP id LAA08667;
Comments: In-reply-to Don Baccus <dhogaza@pacifier.com>
message dated "Fri, 16 Jun 2000 10:50:23 -0700"
Date: Fri, 16 Jun 2000 15:00:10 -0400
Message-ID: <8244.961182010@sss.pgh.pa.us>
From: Tom Lane <tgl@sss.pgh.pa.us>
Status: OR
Don Baccus <dhogaza@pacifier.com> writes:
>> This isn't any harder for md.c to deal with than what we do now,
>> but by making the /N subdirectories be symlinks, the dbadmin could
>> easily arrange for extension segments to go on different filesystems.
> I personally dislike depending on symlinks to move stuff around.
> Among other things, a pg_dump/restore (and presumably future
> backup tools?) can't recreate the disk layout automatically.
Good point, we'd need some way of saving/restoring the tablespace
structures.
>> We'd still want to create some tools to help the dbadmin with slinging
>> all these symlinks around, of course.
> OK, if symlinks are simply an implementation detail hidden from the
> dbadmin, and if the physical structure is kept in the db so it can
> be rebuilt if necessary automatically, then I don't mind symlinks.
I'm not sure about keeping it in the db --- creates a bit of a
chicken-and-egg problem doesn't it? Maybe there needs to be a
"system database" that has nailed-down pathnames (no tablespaces
for you baby) and contains the critical installation-wide tables
like pg_database, pg_user, pg_tablespace. A restore would have
to restore these tables first anyway.
> Make the code that creates and otherwise manipulates tablespaces
> do the work, while keeping the low-level file access protocol simple.
Right, that's the bottom line for me.
regards, tom lane
From reedstrm@rice.edu Fri Jun 16 16:51:50 2000
Received: from renoir.op.net (root@renoir.op.net [207.29.195.4])
by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id QAA03689
for <maillist@candle.pha.pa.us>; Fri, 16 Jun 2000 16:51:49 -0400 (EDT)
Received: from wallace.ece.rice.edu (wallace.ece.rice.edu [128.42.12.154]) by renoir.op.net (o1/$Revision: 1.3 $) with ESMTP id PAA03409 for <maillist@candle.pha.pa.us>; Fri, 16 Jun 2000 15:48:40 -0400 (EDT)
Received: by rice.edu
via sendmail from stdin
id <m1331to-000LEJC@wallace.ece.rice.edu> (Debian Smail3.2.0.102)
for maillist@candle.pha.pa.us; Fri, 16 Jun 2000 14:35:28 -0500 (CDT)
Date: Fri, 16 Jun 2000 14:35:28 -0500
From: "Ross J. Reedstrom" <reedstrm@rice.edu>
To: Thomas Lockhart <lockhart@alumni.caltech.edu>
Cc: Tom Lane <tgl@sss.pgh.pa.us>, Jan Wieck <JanWieck@yahoo.com>,
Rice University, 6100 S. Main St., Houston, TX 77005
From dhogaza@pacifier.com Fri Jun 16 16:51:51 2000
Received: from renoir.op.net (root@renoir.op.net [207.29.195.4])
by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id QAA03692
for <maillist@candle.pha.pa.us>; Fri, 16 Jun 2000 16:51:50 -0400 (EDT)
Received: from smtp.pacifier.com (comet.pacifier.com [199.2.117.155]) by renoir.op.net (o1/$Revision: 1.3 $) with ESMTP id PAA02911 for <maillist@candle.pha.pa.us>; Fri, 16 Jun 2000 15:43:13 -0400 (EDT)
Received: from desktop (dsl-dhogaza.pacifier.net [207.202.226.68])
by smtp.pacifier.com (8.9.3/8.9.3pop) with SMTP id MAA11003;
>> OK, if symlinks are simply an implementation detail hidden from the
>> dbadmin, and if the physical structure is kept in the db so it can
>> be rebuilt if necessary automatically, then I don't mind symlinks.
>
>I'm not sure about keeping it in the db --- creates a bit of a
>chicken-and-egg problem doesn't it?
Not if the tablespace creates preceeds the tables stored in them.
> Maybe there needs to be a
>"system database" that has nailed-down pathnames (no tablespaces
>for you baby) and contains the critical installation-wide tables
>like pg_database, pg_user, pg_tablespace. A restore would have
>to restore these tables first anyway.
Oh, I see. Yes, when I've looked into this and have thought about
it I've assumed that there would always be a known starting point
which would contain the installation-wide tables.
>From a practical point of view, I don't think that's really a
problem.
I've not looked into how Oracle does this, I assume it builds
a system tablespace on one of the initial mount points you give
it when you install the thing. The paths to the mount points
are stored in specific files known to Oracle, I think. It's
been over a year (not long enough!) since I've set up Oracle...
- Don Baccus, Portland OR <dhogaza@pacifier.com>
Nature photos, on-line guides, Pacific Northwest
Rare Bird Alert Service and other goodies at
http://donb.photo.net.
From pgsql-hackers-owner+M3512@hub.org Fri Jun 16 17:31:04 2000
Received: from renoir.op.net (root@renoir.op.net [207.29.195.4])
by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id RAA04168
for <pgman@candle.pha.pa.us>; Fri, 16 Jun 2000 17:31:03 -0400 (EDT)
Received: from hub.org (root@hub.org [216.126.84.1]) by renoir.op.net (o1/$Revision: 1.3 $) with ESMTP id RAA12122 for <pgman@candle.pha.pa.us>; Fri, 16 Jun 2000 17:09:28 -0400 (EDT)
Received: from hub.org (majordom@localhost [127.0.0.1])
by hub.org (8.10.1/8.10.1) with SMTP id e5GL7WM02231;
Fri, 16 Jun 2000 17:07:32 -0400 (EDT)
Received: from wallace.ece.rice.edu (wallace.ece.rice.edu [128.42.12.154])
by hub.org (8.10.1/8.10.1) with ESMTP id e5GL7EM02150
for <pgsql-hackers@postgresql.org>; Fri, 16 Jun 2000 17:07:14 -0400 (EDT)
Received: by rice.edu
via sendmail from stdin
id <m1333Kb-000LEJC@wallace.ece.rice.edu> (Debian Smail3.2.0.102)
for pgsql-hackers@postgresql.org; Fri, 16 Jun 2000 16:07:13 -0500 (CDT)
In-Reply-To: <2260.961113232@sss.pgh.pa.us>; from tgl@sss.pgh.pa.us on Thu, Jun 15, 2000 at 07:53:52PM -0400
X-Mailing-List: pgsql-hackers@postgresql.org
Precedence: bulk
Sender: pgsql-hackers-owner@hub.org
Status: OR
On Thu, Jun 15, 2000 at 07:53:52PM -0400, Tom Lane wrote:
> "Ross J. Reedstrom" <reedstrm@rice.edu> writes:
> > On Thu, Jun 15, 2000 at 03:11:52AM -0400, Tom Lane wrote:
> >> "Ross J. Reedstrom" <reedstrm@rice.edu> writes:
> >>>> Any strong objections to the mixed relname_oid solution?
> >>
> >> Yes!
>
> > The plan here was to let VACUUM handle renaming the file, since it
> > will already have all the necessary locks. This shortens the window
> > of confusion. ALTER TABLE RENAME doesn't happen that often, really -
> > the relname is there just for human consumption, then.
>
> Yeah, I've seen tons of discussion of how if we do this, that, and
> the other thing, and be prepared to fix up some other things in case
> of crash recovery, we can make it work with filename == relname + OID
> (where relname tracks logical name, at least at some remove).
>
> Probably. Assuming nobody forgets anything.
I agree, it seems a major undertaking, at first glance. And second. Even
third. Especially for someone who hasn't 'earned his spurs' yet. as
it were.
> I'm just trying to point out that that's a huge amount of pretty
> delicate mechanism. The amount of work required to make it trustworthy
> looks to me to dwarf the admin tools that Bruce is complaining about.
> And we only have a few people competent to do the work. (With all
> due respect, Ross, if you weren't already aware of the implications
> for mdblindwrt, I have to wonder what else you missed.)
Ah, you knew that comment would come back to haunt me (I have a
tendency to think out loud, even if checking and coming back latter
would be better;-) In fact, there's no problem, and never was, since the
buffer->blind.relname is filled in via RelationGetPhysicalRelationName,
just like every other path that requires direct file access. I just
didn't remember that I had in fact checked it (it's been a couple months,
and I just got back from vacation ;-)
Actually, Once I re-checked it, the code looked very familiar. I had
spent time looking at the blind write code in the context of getting
rid of the only non-startup use of GetRawDatabaseInfo.
As to missing things: I'm leaning heavily on Bruce's previous
work for temp tables, to seperate the two uses of relname, via the
RelationGetRelationName and RelationGetPhysicalRelationName. There are
102 uses of the first in the current code (many in elog messages), and
only 11 of the second. If I'd had to do the original work of finding
every use of relname, and catagorizing it, I agree I'm not (yet) up to
it, but I have more confidence in Bruce's (already tested) work.
>
> Filename == OID is so simple, reliable, and straightforward by
> comparison that I think the decision is a no-brainer.
>
Perhaps. Changing the label of the file on disk still requires finding
all the code that assumes it knows what that name is, and changing it.
Same work.
> If we could afford to sink unlimited time into this one issue then
> it might make sense to do it the hard way, but we have enough
> important stuff on our TODO list to keep us all busy for years ---
> I cannot believe that it's an effective use of our time to do this.
>
The joys of Open Development. You've spent a fair amount of time trying
to convince _me_ not to waste my time. Thanks, but I'm pretty bull headed
sometimes. Since I've already done something of the work, take a look
at what I've got, and then tell me I'm wasting my time, o.k.?
>
> > Hmm, what's all this with functions in catalog.c that are only called by
> > smgr/md.c? seems to me that anything having to do with physical storage
> > (like the path!) belongs in the smgr abstraction.
>
> Yeah, there's a bunch of stuff that should have been implemented by
> adding new smgr entry points, but wasn't. It should be pushed down.
> (I can't resist pointing out that one of those things is physical
> relation rename, which will go away and not *need* to be pushed down
> if we do it the way I want.)
>
Oh, I agree completely. In fact, As I said to Hiroshi last time this came
up, I think of the field in pg_class an an opaque token, to be filled in
by the smgr, and only used by code further up to hand back to the smgr
routines. Same should be true of the buffer->blind struct.
Ross
--
Ross J. Reedstrom, Ph.D., <reedstrm@rice.edu>
NSBRI Research Scientist/Programmer
Computer and Information Technology Institute
Rice University, 6100 S. Main St., Houston, TX 77005
From Inoue@tpf.co.jp Fri Jun 16 19:31:00 2000
Received: from renoir.op.net (root@renoir.op.net [207.29.195.4])
by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id TAA05334
for <maillist@candle.pha.pa.us>; Fri, 16 Jun 2000 19:30:59 -0400 (EDT)
Received: from sd.tpf.co.jp (sd.tpf.co.jp [210.161.239.34]) by renoir.op.net (o1/$Revision: 1.3 $) with ESMTP id TAA19834 for <maillist@candle.pha.pa.us>; Fri, 16 Jun 2000 19:09:59 -0400 (EDT)
Received: from mcadnote1 (ppm122.noc.fukui.nsk.ne.jp [210.161.188.41])
by sd.tpf.co.jp (2.5 Build 2640 (Berkeley 8.8.6)/8.8.4) with SMTP
id IAA08210; Sat, 17 Jun 2000 08:08:15 +0900
From: "Hiroshi Inoue" <Inoue@tpf.co.jp>
To: "Tom Lane" <tgl@sss.pgh.pa.us>, "Jan Wieck" <JanWieck@Yahoo.com>
X-Mailer: Microsoft Outlook IMO, Build 9.0.2416 (9.0.2910.0)
In-Reply-To: <7181.961167635@sss.pgh.pa.us>
X-MimeOLE: Produced By Microsoft MimeOLE V5.00.2919.6700
Importance: Normal
Status: OR
> -----Original Message-----
> From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
>
> JanWieck@t-online.de (Jan Wieck) writes:
> > There are also disadvantages.
>
> > You can run out of space even if there are plenty GB's
> > free on your disks. You have to create tablespaces
> > explicitly.
>
> Not to mention the reverse: if I read this right, you have to suck
> up your GB's long in advance of actually needing them. That's OK
> for a machine that's dedicated to Oracle ... not so OK for smaller
> installations, playpens, etc.
>
I've had an anxiety about the way like Oracle's preallocation.
It had not been easy for me to estimate the extent size in
Oracle. Maybe it would lose the simplicity of environment
settings which is one of the biggest advantage of PostgreSQL.
It seems that we should also provide not_preallocated DATAFILE
when many_tables_in_a_file storage manager is introduced.
Regards.
Hiroshi Inoue
Inoue@tpf.co.jp
From tgl@sss.pgh.pa.us Fri Jun 16 19:31:01 2000
Received: from renoir.op.net (root@renoir.op.net [207.29.195.4])
by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id TAA05337
for <maillist@candle.pha.pa.us>; Fri, 16 Jun 2000 19:31:00 -0400 (EDT)
Received: from sss2.sss.pgh.pa.us (sss.pgh.pa.us [209.114.166.2]) by renoir.op.net (o1/$Revision: 1.3 $) with ESMTP id TAA20335 for <maillist@candle.pha.pa.us>; Fri, 16 Jun 2000 19:18:26 -0400 (EDT)
Received: from sss2.sss.pgh.pa.us (tgl@localhost [127.0.0.1])
by sss2.sss.pgh.pa.us (8.9.3/8.9.3) with ESMTP id TAA09274;
Fri, 16 Jun 2000 19:16:37 -0400 (EDT)
To: "Ross J. Reedstrom" <reedstrm@rice.edu>
cc: Thomas Lockhart <lockhart@alumni.caltech.edu>,
Jan Wieck <JanWieck@Yahoo.com>, Hiroshi Inoue <Inoue@tpf.co.jp>,
> It seems that we should also provide not_preallocated DATAFILE
> when many_tables_in_a_file storage manager is introduced.
Several people in this thread have been talking like a
single-physical-file storage manager is in our future, but I can't
recall anyone saying that they were going to do such a thing or even
presenting reasons why it'd be a good idea.
Seems to me that physical file per relation is considerably better for
our purposes. It's easier to figure out what's going on for admin and
debug work, it means less lock contention among different backends
appending concurrently to different relations, and it gives the OS a
better shot at doing effective read-ahead on sequential scans.
So why all the enthusiasm for multi-tables-per-file?
regards, tom lane
From chris@bitmead.com Fri Jun 16 21:01:02 2000
Received: from renoir.op.net (root@renoir.op.net [207.29.195.4])
by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id VAA07578;
Fri, 16 Jun 2000 21:01:00 -0400 (EDT)
Received: from tech.com.au (IDENT:root@techpt.lnk.telstra.net [139.130.75.122]) by renoir.op.net (o1/$Revision: 1.3 $) with ESMTP id UAA24724; Fri, 16 Jun 2000 20:39:30 -0400 (EDT)
Received: from bitmead.com (IDENT:chris@tardis [203.41.180.243])
by tech.com.au (8.9.3/8.9.3) with ESMTP id KAA21388;
Sat, 17 Jun 2000 10:39:21 +1000
Sender: chris@tech.com.au
Message-ID: <394AC8B4.C5B4CCFB@bitmead.com>
Date: Sat, 17 Jun 2000 10:39:16 +1000
From: Chris Bitmead <chris@bitmead.com>
X-Mailer: Mozilla 4.72 [en] (X11; U; Linux 2.2.14-5.0 i686)
X-Accept-Language: en
MIME-Version: 1.0
To: Bruce Momjian <pgman@candle.pha.pa.us>
CC: Tom Lane <tgl@sss.pgh.pa.us>, Hiroshi Inoue <Inoue@tpf.co.jp>,
> > So why all the enthusiasm for multi-tables-per-file?
It allows you to use raw partitions which stop the OS double buffering
and wasting half of memory, as well as removing the overhead of indirect
blocks in the file system.
From Inoue@tpf.co.jp Sat Jun 17 06:00:59 2000
Received: from renoir.op.net (root@renoir.op.net [207.29.195.4])
by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id GAA22177;
Sat, 17 Jun 2000 06:00:59 -0400 (EDT)
Received: from sd.tpf.co.jp (sd.tpf.co.jp [210.161.239.34]) by renoir.op.net (o1/$Revision: 1.3 $) with ESMTP id FAA21759; Sat, 17 Jun 2000 05:36:27 -0400 (EDT)
Received: from mcadnote1 (ppm130.noc.fukui.nsk.ne.jp [210.161.188.49])
by sd.tpf.co.jp (2.5 Build 2640 (Berkeley 8.8.6)/8.8.4) with SMTP
id SAA08383; Sat, 17 Jun 2000 18:35:36 +0900
From: "Hiroshi Inoue" <Inoue@tpf.co.jp>
To: "Bruce Momjian" <pgman@candle.pha.pa.us>, "Tom Lane" <tgl@sss.pgh.pa.us>
X-MimeOLE: Produced By Microsoft MimeOLE V5.00.2919.6700
Importance: Normal
Status: OR
> -----Original Message-----
> From: Bruce Momjian [mailto:pgman@candle.pha.pa.us]
> >
> > So why all the enthusiasm for multi-tables-per-file?
>
> No idea. I thought Vadim mentioned it, but I am not sure anymore. I
> certainly like our current system.
>
Oops,I'm not so enthusiastic for multi_tables_per_file smgr.
I believe that Ross and I have taken a practical way that doesn't
break current file_per_table smgr.
However it seems very natural to take multi_tables_per_file
smgr into account when we consider TABLESPACE concept.
Because TABLESPACE is an encapsulation,it should have
a possibility to handle multi_tables_per_file smgr IMHO.
Regards.
Hiroshi Inoue
Inoue@tpf.co.jp
From tgl@sss.pgh.pa.us Sat Jun 17 12:31:08 2000
Received: from renoir.op.net (root@renoir.op.net [207.29.195.4])
by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id MAA02794;
Sat, 17 Jun 2000 12:31:07 -0400 (EDT)
Received: from sss2.sss.pgh.pa.us (sss.pgh.pa.us [209.114.166.2]) by renoir.op.net (o1/$Revision: 1.3 $) with ESMTP id MAA07194; Sat, 17 Jun 2000 12:12:53 -0400 (EDT)
Received: from sss2.sss.pgh.pa.us (tgl@localhost [127.0.0.1])
by sss2.sss.pgh.pa.us (8.9.3/8.9.3) with ESMTP id MAA18824;
> However it seems very natural to take multi_tables_per_file
> smgr into account when we consider TABLESPACE concept.
> Because TABLESPACE is an encapsulation,it should have
> a possibility to handle multi_tables_per_file smgr IMHO.
OK, I see: you're just saying that the tablespace stuff should be
designed in such a way that it would work with a non-file-per-table
smgr. Agreed, that'd be a good check of a clean design, and someday
we might need it...
regards, tom lane
From tgl@sss.pgh.pa.us Sun Jun 18 12:30:59 2000
Received: from renoir.op.net (root@renoir.op.net [207.29.195.4])
by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id MAA06514
for <pgman@candle.pha.pa.us>; Sun, 18 Jun 2000 12:30:58 -0400 (EDT)
Received: from sss2.sss.pgh.pa.us (sss.pgh.pa.us [209.114.166.2]) by renoir.op.net (o1/$Revision: 1.3 $) with ESMTP id MAA04979 for <pgman@candle.pha.pa.us>; Sun, 18 Jun 2000 12:07:44 -0400 (EDT)
Received: from sss2.sss.pgh.pa.us (tgl@localhost [127.0.0.1])
by sss2.sss.pgh.pa.us (8.9.3/8.9.3) with ESMTP id MAA12163;
Sun, 18 Jun 2000 12:06:29 -0400 (EDT)
To: Bruce Momjian <pgman@candle.pha.pa.us>
cc: Jan Wieck <JanWieck@Yahoo.com>, Hiroshi Inoue <Inoue@tpf.co.jp>,
Comments: In-reply-to Bruce Momjian <pgman@candle.pha.pa.us>
message dated "Sun, 18 Jun 2000 09:33:44 -0400"
Date: Sun, 18 Jun 2000 12:06:29 -0400
Message-ID: <12160.961344389@sss.pgh.pa.us>
From: Tom Lane <tgl@sss.pgh.pa.us>
Status: ORr
Bruce Momjian <pgman@candle.pha.pa.us> writes:
> ... We could even get fancy and
> round-robin through all the extents directories, looping around to the
> beginning when we run out of them. That sounds nice.
That sounds horrible. There's no way to tell which extent directory
extent N goes into except by scanning the location directory to find
out how many extent subdirectories there are (so that you can compute
N modulo number-of-directories). Do you want to pay that price on every
file open?
Worse, what happens when you add another extent directory? You can't
find your old extents anymore, that's what, because they're not in the
right place (N modulo number-of-directories just changed). Since the
extents are presumably on different volumes, you're talking about
physical file moves to get them where they should be. You probably
can't add a new extent without shutting down the entire database while
you reshuffle files --- at the very least you'd need to get exclusive
locks on all the tables in that tablespace.
Also, you'll get filename conflicts from multiple extents of a single
table appearing in one of the recycled extent dirs. You could work
around it by using the non-modulo'd N as part of the final file name,
but that just adds more complexity and makes the filename-generation
machinery that much more closely tied to this specific way of doing
things.
The right way to do this is that extent N goes into extents subdirectory
N, period. If there's no such subdirectory, create one on-the-fly as a
plain subdirectory of the location directory. The dbadmin can easily
create secondary extent symlinks *in advance of their being needed*.
Reorganizing later is much more painful since it requires moving
physical files, but I think that'd be true no matter what. At least
we should see to it that adding more space in advance of needing it is
painless.
It's possible to do it that way (auto-create extent subdir if needed)
without tying the md.c machinery real closely to a specific filename
creation procedure: it's just the same sort of thing as install programs
customarily do. "If you fail to create a file, try creating its
ancestor directory." We'd have to think about whether it'd be a good
idea to allow auto-creation of more than one level of directory; offhand
it seems that needing to make more than one level is probably a sign of
an erroneous path, not need for another extent subdirectory.
regards, tom lane
From dhogaza@pacifier.com Sun Jun 18 20:01:00 2000
Received: from renoir.op.net (root@renoir.op.net [207.29.195.4])
by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id UAA19951
for <pgman@candle.pha.pa.us>; Sun, 18 Jun 2000 20:00:59 -0400 (EDT)
Received: from smtp.pacifier.com (asteroid.pacifier.com [199.2.117.154]) by renoir.op.net (o1/$Revision: 1.3 $) with ESMTP id TAA24345 for <pgman@candle.pha.pa.us>; Sun, 18 Jun 2000 19:50:06 -0400 (EDT)
Received: from desktop (dsl-dhogaza.pacifier.net [207.202.226.68])
by smtp.pacifier.com (8.9.3/8.9.3pop) with SMTP id QAA05302;
>If we eliminate the round-robin idea, what did people think of the rest
>of the ideas?
Why invent new syntax when "create tablespace" is something a lot
of folks will recognize?
And why not use "create table ... using ... "? In other words,
Oracle-compatible for this construct? Sure, Postgres doesn't
have to follow Oraclisms but picking an existing contruct means
at least SOME folks can import a datamodel without having to
edit it.
Does your proposal break the smgr abstraction, i.e. does it
preclude later efforts to (say) implement an (optional)
raw-device storage manager?
- Don Baccus, Portland OR <dhogaza@pacifier.com>
Nature photos, on-line guides, Pacific Northwest
Rare Bird Alert Service and other goodies at
http://donb.photo.net.
From pgsql-hackers-owner+M3571@hub.org Sun Jun 18 23:28:13 2000
Received: from renoir.op.net (root@renoir.op.net [207.29.195.4])
by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id XAA23880
for <pgman@candle.pha.pa.us>; Sun, 18 Jun 2000 23:28:12 -0400 (EDT)
Received: from hub.org (root@hub.org [216.126.84.1]) by renoir.op.net (o1/$Revision: 1.3 $) with ESMTP id XAA04627 for <pgman@candle.pha.pa.us>; Sun, 18 Jun 2000 23:24:37 -0400 (EDT)
Received: from hub.org (majordom@localhost [127.0.0.1])
by hub.org (8.10.1/8.10.1) with SMTP id e5J3GQM78526;
Sun, 18 Jun 2000 23:16:26 -0400 (EDT)
Received: from candle.pha.pa.us (pgman@nav-43.dsl.navpoint.com [162.33.245.46])
by hub.org (8.10.1/8.10.1) with ESMTP id e5J3E3M71538
for <pgsql-hackers@postgresql.org>; Sun, 18 Jun 2000 23:14:03 -0400 (EDT)
My basic proposal is that we optionally allow symlinks when creating
tablespace directories, and that we interrogate those symlinks during a
dump so administrators can move tablespaces around without having to
modify environment variables or system tables.
I also suggested creating an extent directory to hold extents, like
extent/2 and extent/3. This will allow administration for smaller sites
to be simpler.
--
Bruce Momjian | http://www.op.net/~candle
pgman@candle.pha.pa.us | (610) 853-3000
+ If your life is a hard drive, | 830 Blythe Avenue
+ Christ can be your backup. | Drexel Hill, Pennsylvania 19026
From dhogaza@pacifier.com Mon Jun 19 00:31:00 2000
Received: from renoir.op.net (root@renoir.op.net [207.29.195.4])
by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id AAA01941
for <pgman@candle.pha.pa.us>; Mon, 19 Jun 2000 00:31:00 -0400 (EDT)
Received: from smtp.pacifier.com (comet.pacifier.com [199.2.117.155]) by renoir.op.net (o1/$Revision: 1.3 $) with ESMTP id AAA06881 for <pgman@candle.pha.pa.us>; Mon, 19 Jun 2000 00:11:39 -0400 (EDT)
Received: from desktop (dsl-dhogaza.pacifier.net [207.202.226.68])
by smtp.pacifier.com (8.9.3/8.9.3pop) with SMTP id VAA29138;
>My basic proposal is that we optionally allow symlinks when creating
>tablespace directories, and that we interrogate those symlinks during a
>dump so administrators can move tablespaces around without having to
>modify environment variables or system tables.
If they can move them around from within the db, they'll have no need to
move them around from outside the db.
I don't quite understand your devotion to using filesystem commands
outside the database to do database administration.
- Don Baccus, Portland OR <dhogaza@pacifier.com>
Nature photos, on-line guides, Pacific Northwest
Rare Bird Alert Service and other goodies at
http://donb.photo.net.
From pgsql-hackers-owner+M3573@hub.org Mon Jun 19 01:31:02 2000
Received: from renoir.op.net (root@renoir.op.net [207.29.195.4])
by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id BAA01981
for <pgman@candle.pha.pa.us>; Mon, 19 Jun 2000 01:31:01 -0400 (EDT)
Received: from hub.org (root@hub.org [216.126.84.1]) by renoir.op.net (o1/$Revision: 1.3 $) with ESMTP id BAA09569 for <pgman@candle.pha.pa.us>; Mon, 19 Jun 2000 01:13:53 -0400 (EDT)
Received: from hub.org (majordom@localhost [127.0.0.1])
by hub.org (8.10.1/8.10.1) with SMTP id e5J4T3M86960;
Mon, 19 Jun 2000 00:29:04 -0400 (EDT)
Received: from sss2.sss.pgh.pa.us (sss.pgh.pa.us [209.114.166.2])
by hub.org (8.10.1/8.10.1) with ESMTP id e5J4RFM80712
for <pgsql-hackers@postgresql.org>; Mon, 19 Jun 2000 00:27:15 -0400 (EDT)
Received: from sss2.sss.pgh.pa.us (tgl@localhost [127.0.0.1])
by sss2.sss.pgh.pa.us (8.9.3/8.9.3) with ESMTP id AAA09517;
Mon, 19 Jun 2000 00:25:53 -0400 (EDT)
To: Bruce Momjian <pgman@candle.pha.pa.us>
cc: Jan Wieck <JanWieck@yahoo.com>, Hiroshi Inoue <Inoue@tpf.co.jp>,
Comments: In-reply-to Bruce Momjian <pgman@candle.pha.pa.us>
message dated "Sun, 18 Jun 2000 23:13:44 -0400"
Date: Mon, 19 Jun 2000 00:25:52 -0400
Message-ID: <9514.961388752@sss.pgh.pa.us>
From: Tom Lane <tgl@sss.pgh.pa.us>
X-Mailing-List: pgsql-hackers@postgresql.org
Precedence: bulk
Sender: pgsql-hackers-owner@hub.org
Status: ORr
Bruce Momjian <pgman@candle.pha.pa.us> writes:
> I also suggested creating an extent directory to hold extents, like
> extent/2 and extent/3. This will allow administration for smaller sites
> to be simpler.
I don't see the value in creating an extra level of directory --- seems
that just adds one more Unix directory-lookup cycle to each file open,
without any apparent return. What's wrong with extent directory names
like extent2, extent3, etc?
Obviously the extent dirnames must be chosen so they can't conflict
with table filenames, but that's easily done. For example, if table
files are named like 'OID_xxx' then 'extentN' will never conflict.
regards, tom lane
From tgl@sss.pgh.pa.us Mon Jun 19 00:30:58 2000
Received: from renoir.op.net (root@renoir.op.net [207.29.195.4])
by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id AAA01934
for <pgman@candle.pha.pa.us>; Mon, 19 Jun 2000 00:30:58 -0400 (EDT)
Received: from sss2.sss.pgh.pa.us (sss.pgh.pa.us [209.114.166.2]) by renoir.op.net (o1/$Revision: 1.3 $) with ESMTP id AAA07814 for <pgman@candle.pha.pa.us>; Mon, 19 Jun 2000 00:29:36 -0400 (EDT)
Received: from sss2.sss.pgh.pa.us (tgl@localhost [127.0.0.1])
by sss2.sss.pgh.pa.us (8.9.3/8.9.3) with ESMTP id AAA09535;
Mon, 19 Jun 2000 00:28:14 -0400 (EDT)
To: Don Baccus <dhogaza@pacifier.com>
cc: Bruce Momjian <pgman@candle.pha.pa.us>, Jan Wieck <JanWieck@yahoo.com>,
Comments: In-reply-to Don Baccus <dhogaza@pacifier.com>
message dated "Sun, 18 Jun 2000 21:07:48 -0700"
Date: Mon, 19 Jun 2000 00:28:14 -0400
Message-ID: <9532.961388894@sss.pgh.pa.us>
From: Tom Lane <tgl@sss.pgh.pa.us>
Status: ORr
Don Baccus <dhogaza@pacifier.com> writes:
> If they can move them around from within the db, they'll have no need to
> move them around from outside the db.
> I don't quite understand your devotion to using filesystem commands
> outside the database to do database administration.
Being *able* to use filesystem commands to see/fix what's going on is a
good thing, particularly from a development/debugging standpoint. But
I agree we want to have within-the-system admin commands to do the same
things.
regards, tom lane
From pgsql-hackers-owner+M3574@hub.org Mon Jun 19 01:31:01 2000
Received: from renoir.op.net (root@renoir.op.net [207.29.195.4])
by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id BAA01977
for <pgman@candle.pha.pa.us>; Mon, 19 Jun 2000 01:31:00 -0400 (EDT)
Received: from hub.org (root@hub.org [216.126.84.1]) by renoir.op.net (o1/$Revision: 1.3 $) with ESMTP id BAA09374 for <pgman@candle.pha.pa.us>; Mon, 19 Jun 2000 01:07:50 -0400 (EDT)
Received: from hub.org (majordom@localhost [127.0.0.1])
by hub.org (8.10.1/8.10.1) with SMTP id e5J4VkM95901;
Mon, 19 Jun 2000 00:31:46 -0400 (EDT)
Received: from sss2.sss.pgh.pa.us (sss.pgh.pa.us [209.114.166.2])
by hub.org (8.10.1/8.10.1) with ESMTP id e5J4TgM89399
for <pgsql-hackers@postgresql.org>; Mon, 19 Jun 2000 00:29:42 -0400 (EDT)
Received: from sss2.sss.pgh.pa.us (tgl@localhost [127.0.0.1])
by sss2.sss.pgh.pa.us (8.9.3/8.9.3) with ESMTP id AAA09535;
Mon, 19 Jun 2000 00:28:14 -0400 (EDT)
To: Don Baccus <dhogaza@pacifier.com>
cc: Bruce Momjian <pgman@candle.pha.pa.us>, Jan Wieck <JanWieck@yahoo.com>,
Comments: In-reply-to Don Baccus <dhogaza@pacifier.com>
message dated "Sun, 18 Jun 2000 21:07:48 -0700"
Date: Mon, 19 Jun 2000 00:28:14 -0400
Message-ID: <9532.961388894@sss.pgh.pa.us>
From: Tom Lane <tgl@sss.pgh.pa.us>
X-Mailing-List: pgsql-hackers@postgresql.org
Precedence: bulk
Sender: pgsql-hackers-owner@hub.org
Status: OR
Don Baccus <dhogaza@pacifier.com> writes:
> If they can move them around from within the db, they'll have no need to
> move them around from outside the db.
> I don't quite understand your devotion to using filesystem commands
> outside the database to do database administration.
Being *able* to use filesystem commands to see/fix what's going on is a
good thing, particularly from a development/debugging standpoint. But
I agree we want to have within-the-system admin commands to do the same
things.
regards, tom lane
From dhogaza@pacifier.com Mon Jun 19 00:58:39 2000
Received: from renoir.op.net (root@renoir.op.net [207.29.195.4])
by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id AAA00799
for <pgman@candle.pha.pa.us>; Mon, 19 Jun 2000 00:58:38 -0400 (EDT)
Received: from smtp.pacifier.com (comet.pacifier.com [199.2.117.155]) by renoir.op.net (o1/$Revision: 1.3 $) with ESMTP id AAA08143 for <pgman@candle.pha.pa.us>; Mon, 19 Jun 2000 00:37:39 -0400 (EDT)
Received: from desktop (dsl-dhogaza.pacifier.net [207.202.226.68])
by smtp.pacifier.com (8.9.3/8.9.3pop) with SMTP id VAA00259;
>Being *able* to use filesystem commands to see/fix what's going on is a
>good thing, particularly from a development/debugging standpoint.
Of course it's a crutch for development, but outside of development
circles few users will know how to use the OS in regard to the
database.
Assuming PG takes off. Of course, if it remains the realm of the
dedicated hard-core hacker, I'm wrong.
I have nothing against preserving the ability to use filesystem
commands if there's no significant costs inherent with this approach.
I'd view the breaking of smgr abstraction as a significant cost (though
I agree with Ross that it Bruce's proposal shouldn't require that, I
asked my question to flush Bruce out, if you will, because he's
devoted to a particular outside-the-db management model).
> But
>I agree we want to have within-the-system admin commands to do the same
>things.
MUST have, I should think.
- Don Baccus, Portland OR <dhogaza@pacifier.com>
Nature photos, on-line guides, Pacific Northwest
Rare Bird Alert Service and other goodies at
http://donb.photo.net.
From Inoue@tpf.co.jp Mon Jun 19 12:31:17 2000
Received: from renoir.op.net (root@renoir.op.net [207.29.195.4])
by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id MAA29988
for <pgman@candle.pha.pa.us>; Mon, 19 Jun 2000 12:31:16 -0400 (EDT)
Received: from sd.tpf.co.jp (mail.tpf.co.jp [210.161.239.34]) by renoir.op.net (o1/$Revision: 1.3 $) with ESMTP id MAA21005 for <pgman@candle.pha.pa.us>; Mon, 19 Jun 2000 12:15:22 -0400 (EDT)
Received: from mcadnote1 (ppm127.noc.fukui.nsk.ne.jp [210.161.188.46])
by sd.tpf.co.jp (2.5 Build 2640 (Berkeley 8.8.6)/8.8.4) with SMTP
X-MimeOLE: Produced By Microsoft MimeOLE V5.00.2919.6700
Importance: Normal
Status: ORr
> -----Original Message-----
> From: Bruce Momjian [mailto:pgman@candle.pha.pa.us]
>
> The fact is that symlink information is already stored in the file
> system. If we store symlink information in the database too, there
> exists the ability for the two to get out of sync. My point is that I
> think we can _not_ store symlink information in the database, and query
> the file system using lstat when required.
>
Hmm,this seems pretty confusing to me.
I don't understand the necessity of symlink.
Directory tree,symlink,hard link ... are OS's standard.
But I don't think they are fit for dbms management.
PostgreSQL is a database system of cource. So
couldn't it handle more flexible structure than OS's
directory tree for itself ?
Regards.
Hiroshi Inoue
Inoue@tpf.co.jp
From Inoue@tpf.co.jp Tue Jun 20 02:01:04 2000
Received: from renoir.op.net (root@renoir.op.net [207.29.195.4])
by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id CAA24419
for <pgman@candle.pha.pa.us>; Tue, 20 Jun 2000 02:00:59 -0400 (EDT)
Received: from sd.tpf.co.jp (sd.tpf.co.jp [210.161.239.34]) by renoir.op.net (o1/$Revision: 1.3 $) with ESMTP id BAA26090 for <pgman@candle.pha.pa.us>; Tue, 20 Jun 2000 01:51:00 -0400 (EDT)
Received: from cadzone ([126.0.1.40] (may be forged))
by sd.tpf.co.jp (2.5 Build 2640 (Berkeley 8.8.6)/8.8.4) with SMTP
X-MimeOLE: Produced By Microsoft MimeOLE V5.00.2314.1300
Importance: Normal
Status: ORr
> -----Original Message-----
> From: Bruce Momjian [mailto:pgman@candle.pha.pa.us]
>
> > > -----Original Message-----
> > > From: Bruce Momjian [mailto:pgman@candle.pha.pa.us]
> > >
> > > The fact is that symlink information is already stored in the file
> > > system. If we store symlink information in the database too, there
> > > exists the ability for the two to get out of sync. My point is that I
> > > think we can _not_ store symlink information in the database,
> and query
> > > the file system using lstat when required.
> > >
> > Hmm,this seems pretty confusing to me.
> > I don't understand the necessity of symlink.
> > Directory tree,symlink,hard link ... are OS's standard.
> > But I don't think they are fit for dbms management.
> >
> > PostgreSQL is a database system of cource. So
> > couldn't it handle more flexible structure than OS's
> > directory tree for itself ?
>
> Yes, but is anyone suggesting a solution that does not work with
> symlinks? If not, why not do it that way?
>
Maybe other solutions have been proposed already because
there have been so many opinions and proposals.
I've felt TABLE(DATA)SPACE discussion has always been
divergent. IMHO,one of the main cause is that various factors
have been discussed at once. Shouldn't we make step by step
consensus in TABLE(DATA)SPACE discussion ?
IMHO,the first step is to decide the syntax of CREATE TABLE
command not to define TABLE(DATA)SPACE.
Comments ?
Regards.
Hiroshi Inoue
Inoue@tpf.co.jp
From tgl@sss.pgh.pa.us Tue Jun 20 10:51:32 2000
Received: from renoir.op.net (root@renoir.op.net [207.29.195.4])
by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id KAA15181
for <pgman@candle.pha.pa.us>; Tue, 20 Jun 2000 10:51:31 -0400 (EDT)
Received: from sss2.sss.pgh.pa.us (sss.pgh.pa.us [209.114.166.2]) by renoir.op.net (o1/$Revision: 1.3 $) with ESMTP id KAA26466 for <pgman@candle.pha.pa.us>; Tue, 20 Jun 2000 10:37:20 -0400 (EDT)
Received: from sss2.sss.pgh.pa.us (tgl@localhost [127.0.0.1])
by sss2.sss.pgh.pa.us (8.9.3/8.9.3) with ESMTP id KAA29689;
Tue, 20 Jun 2000 10:36:04 -0400 (EDT)
To: Bruce Momjian <pgman@candle.pha.pa.us>
cc: Hiroshi Inoue <Inoue@tpf.co.jp>, Jan Wieck <JanWieck@yahoo.com>,
Comments: In-reply-to Bruce Momjian <pgman@candle.pha.pa.us>
message dated "Tue, 20 Jun 2000 09:40:03 -0400"
Date: Tue, 20 Jun 2000 10:36:04 -0400
Message-ID: <29686.961511764@sss.pgh.pa.us>
From: Tom Lane <tgl@sss.pgh.pa.us>
Status: OR
Bruce Momjian <pgman@candle.pha.pa.us> writes:
> Agreed. Seems we have several issues:
> filename contents
> tablespace implementation
> tablespace directory layout
> tablespace commands and syntax
I think we've agreed that the filename must depend on tablespace,
file version, and file segment number in some fashion --- plus
the table name/OID of course. Although there's no real consensus
about exactly how to construct the name, agreeing on the components
is still a positive step.
A couple of other areas of contention were:
revising smgr interface to be cleaner
exactly what to store in pg_class
I don't think there's any quibble about the idea of cleaning up smgr,
but we don't have a complete proposal on the table yet either.
As for the pg_class issue, I still favor storing
(a) OID of tablespace --- not for file access, but so that
associated tablespace-table entry can be looked up
by tablespace management operations
(b) pathname of file as a column of type "name", including
a %d to be replaced by segment #
I think Peter was holding out for storing purely numeric tablespace OID
and table version in pg_class and having a hardwired mapping to pathname
somewhere in smgr. However, I think that doing it that way gains only
micro-efficiency compared to passing a "name" around, while using the
name approach buys us flexibility that's needed for at least some of
the variants under discussion. Given that the exact filename contents
are still so contentious, I think it'd be a bad idea to pick an
implementation that doesn't allow some leeway as to what the filename
will be. A name also has the advantage that it is a single item that
can be used to identify the table to smgr, which will help in cleaning
up the smgr interface.
As for tablespace layout/implementation, the only real proposal I've
heard is that there be a subdirectory of the database directory for each
tablespace, and that that have a subdirectory for each segment (extent)
of its tables --- where any of these subdirectories could be symlinks
off to a different filesystem. Some unhappiness was raised about
depending on symlinks for this function, but I didn't hear one single
concrete reason not to do it, nor an alternative design. Unless someone
comes up with a counterproposal, I think that that's what the actual
access mechanism will look like. We still need to talk about what we
want to store in the SQL-level representation of a tablespace, and what
sort of tablespace management tools/commands are needed. (Although
"try to make it look like Oracle" seems to be pretty much the consensus
for the command level, not all of us know exactly what that means...)
Comments? Anything else that we do have consensus on?
regards, tom lane
From pgsql-hackers-owner+M3615@hub.org Tue Jun 20 12:55:05 2000
Received: from renoir.op.net (root@renoir.op.net [207.29.195.4])
by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id MAA25768
for <pgman@candle.pha.pa.us>; Tue, 20 Jun 2000 12:55:04 -0400 (EDT)
Received: from hub.org (root@hub.org [216.126.84.1]) by renoir.op.net (o1/$Revision: 1.3 $) with ESMTP id MAA09949 for <pgman@candle.pha.pa.us>; Tue, 20 Jun 2000 12:41:15 -0400 (EDT)
Received: from hub.org (majordom@localhost [127.0.0.1])
by hub.org (8.10.1/8.10.1) with SMTP id e5KGcCM19112;
Tue, 20 Jun 2000 12:38:12 -0400 (EDT)
Received: from merganser.its.uu.se (merganser.its.uu.se [130.238.6.236])
by hub.org (8.10.1/8.10.1) with ESMTP id e5KGbbM18701
for <pgsql-hackers@postgresql.org>; Tue, 20 Jun 2000 12:37:37 -0400 (EDT)
Received: from regulus.student.UU.SE ([130.238.5.2]:43625 "EHLO
regulus.its.uu.se") by merganser.its.uu.se with ESMTP
id <S303230AbQFTQhF>; Tue, 20 Jun 2000 18:37:05 +0200
Received: from peter (helo=localhost)
by regulus.its.uu.se with local-esmtp (Exim 3.02 #2)
id 134R7f-0003wS-00; Tue, 20 Jun 2000 18:43:35 +0200
Date: Tue, 20 Jun 2000 18:43:35 +0200 (CEST)
From: Peter Eisentraut <peter_e@gmx.net>
To: Bruce Momjian <pgman@candle.pha.pa.us>
cc: Jan Wieck <JanWieck@Yahoo.com>, Tom Lane <tgl@sss.pgh.pa.us>,
> If we have a new CREATE DATABASE LOCATION command, we can say:
>
> CREATE DATABASE LOCATION dbloc IN '/var/private/pgsql';
> CREATE DATABASE newdb IN dbloc;
We kind of have this already, with CREATE DATABASE foo WITH LOCATION =
'bar'; but of course with environment variable kludgery. But it's a start.
> mkdir /var/private/pgsql/dbloc
> ln -s /var/private/pgsql/dbloc data/base/dbloc
I think the problem with this was that you'd have to do an extra lookup
into, say, pg_location to resolve this. Some people are talking about
blind writes, this is not really blind.
> CREATE LOCATION tabloc IN '/var/private/pgsql';
> CREATE TABLE newtab ... IN tabloc;
Okay, so we'd have "table spaces" and "database spaces". Seems like one
"space" ought to be enough. I was thinking that the database "space" would
serve as a default "space" for tables created within it but you could
still create tables in other "spaces" than were the database really is. In
fact, the database wouldn't show up at all in the file names anymore,
which may or may not be a good thing.
I think Tom suggested something more or less like this:
$PGDATA/base/tablespace/segment/table
(leaving the details of "table" aside for now). pg_class would get a
column storing the table space somehow, say an oid reference to
pg_location. There would have to be a default tablespace that's created by
initdb and it's indicated by oid 0. So if you create a simple little table
"foo" it ends up in
$PGDATA/base/0/0/foo
That is pretty manageable. Now to create a table space you do
CREATE LOCATION "name" AT '/some/where';
which would make an entry in pg_location and, similar to how you
suggested, create a symlink from
$PGDATA/base/newoid -> /some/where
Then when you create a new table at that new location this gets simply
noted in pg_class with an oid reference, the rest works completely
transparently and no lookup outside of pg_class required. The system would
create the segment 0 subdirectory automatically.
When tables get segmented the system would simply create subdirectories 1,
2, 3, etc. as needed, just as it created the 0 as need, no extra code.
pg_dump doesn't need to use lstat or whatever at all because the locations
are catalogued. Administrators don't even need to know about the linking
business, they just make sure the target directory exists.
Two more items to ponder:
* per-location transaction logs
* pg_upgrade
--
Peter Eisentraut Sernanders v<>g 10:115
peter_e@gmx.net 75262 Uppsala
http://yi.org/peter-e/ Sweden
From Inoue@tpf.co.jp Tue Jun 20 17:10:56 2000
Received: from renoir.op.net (root@renoir.op.net [207.29.195.4])
by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id RAA10307
for <pgman@candle.pha.pa.us>; Tue, 20 Jun 2000 17:10:55 -0400 (EDT)
Received: from sd.tpf.co.jp (mail.tpf.co.jp [210.161.239.34]) by renoir.op.net (o1/$Revision: 1.3 $) with ESMTP id QAA08017 for <pgman@candle.pha.pa.us>; Tue, 20 Jun 2000 16:57:44 -0400 (EDT)
Received: from mcadnote1 (ppm127.noc.fukui.nsk.ne.jp [210.161.188.46])
by sd.tpf.co.jp (2.5 Build 2640 (Berkeley 8.8.6)/8.8.4) with SMTP
id FAA00867; Wed, 21 Jun 2000 05:56:44 +0900
From: "Hiroshi Inoue" <Inoue@tpf.co.jp>
To: "Tom Lane" <tgl@sss.pgh.pa.us>, "Bruce Momjian" <pgman@candle.pha.pa.us>
Cc: "Jan Wieck" <JanWieck@yahoo.com>, "Ross J. Reedstrom" <reedstrm@rice.edu>,
X-Mailer: Microsoft Outlook IMO, Build 9.0.2416 (9.0.2910.0)
X-MimeOLE: Produced By Microsoft MimeOLE V5.00.2919.6700
In-Reply-To: <29686.961511764@sss.pgh.pa.us>
Importance: Normal
Status: OR
> -----Original Message-----
> From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
>
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > Agreed. Seems we have several issues:
>
> > filename contents
> > tablespace implementation
> > tablespace directory layout
> > tablespace commands and syntax
>
[snip]
>
> Comments? Anything else that we do have consensus on?
>
Before the details of tablespace implementation,
1) How to change(extend) the syntax of CREATE TABLE
We only add table(data)space name with some
keyword ? i.e Do we consider tablespace as an
abstraction ?
To confirm our mutual understanding.
2) Is tablespace defined per PostgreSQL's database ?
3) Is default tablespace defined per database/user or
for all ?
AFAIK in Oracle,2) global, 3) per user.
Regards.
Hiroshi Inoue
Inoue@tpf.co.jp
From Inoue@tpf.co.jp Tue Jun 20 20:00:59 2000
Received: from renoir.op.net (root@renoir.op.net [207.29.195.4])
by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id UAA12668;
Tue, 20 Jun 2000 20:00:58 -0400 (EDT)
Received: from sd.tpf.co.jp (sd.tpf.co.jp [210.161.239.34]) by renoir.op.net (o1/$Revision: 1.3 $) with ESMTP id TAA21016; Tue, 20 Jun 2000 19:54:18 -0400 (EDT)
Received: from cadzone ([126.0.1.40] (may be forged))
by sd.tpf.co.jp (2.5 Build 2640 (Berkeley 8.8.6)/8.8.4) with SMTP
> I think the problem with this was that you'd have to do an extra lookup
> into, say, pg_location to resolve this. Some people are talking about
> blind writes, this is not really blind.
>
> > CREATE LOCATION tabloc IN '/var/private/pgsql';
> > CREATE TABLE newtab ... IN tabloc;
>
> Okay, so we'd have "table spaces" and "database spaces". Seems like one
> "space" ought to be enough.
Does your "database space" correspond to current PostgreSQL's database ?
And is it different from SCHEMA ?
Regards.
Hiroshi Inoue
Inoue@tpf.co.jp
From tgl@sss.pgh.pa.us Wed Jun 21 00:23:48 2000
Received: from renoir.op.net (root@renoir.op.net [207.29.195.4])
by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id AAA18016;
Wed, 21 Jun 2000 00:23:47 -0400 (EDT)
Received: from sss2.sss.pgh.pa.us (sss.pgh.pa.us [209.114.166.2]) by renoir.op.net (o1/$Revision: 1.3 $) with ESMTP id AAA05207; Wed, 21 Jun 2000 00:07:58 -0400 (EDT)
Received: from sss2.sss.pgh.pa.us (tgl@localhost [127.0.0.1])
by sss2.sss.pgh.pa.us (8.9.3/8.9.3) with ESMTP id AAA03002;
Wed, 21 Jun 2000 00:06:42 -0400 (EDT)
To: Bruce Momjian <pgman@candle.pha.pa.us>
cc: Hiroshi Inoue <Inoue@tpf.co.jp>, Peter Eisentraut <peter_e@gmx.net>,
X-Mailer: Microsoft Outlook 8.5, Build 4.71.2173.0
In-reply-to: <2999.961560402@sss.pgh.pa.us>
Importance: Normal
X-MimeOLE: Produced By Microsoft MimeOLE V5.00.2314.1300
X-Mailing-List: pgsql-hackers@postgresql.org
Precedence: bulk
Sender: pgsql-hackers-owner@hub.org
Status: OR
> -----Original Message-----
> From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
>
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > I recommend making a dbname in each directory, then putting the
> > location inside there.
>
> This still seems backwards to me. Why is it better than tablespace
> directory inside database directory?
>
> One significant problem with it is that there's no longer (AFAICS)
> a "default" per-database directory that corresponds to the current
> working directory of backends running in that database. Thus,
> for example, it's not immediately clear where temporary files and
> backend core-dump files will end up. Also, you've just added an
> essential extra level (if not two) to the pathnames that backends will
> use to address files.
>
> There is a great deal to be said for
> ..../database/tablespace/filename
OK,I seem to have gotten the answer for the question
Is tablespace defined per PostgreSQL's database ?
You and Bruce
1) tablespace is per database
Peter seems to have the following idea(?? not sure)
2) database = tablespace
My opinion
3) database and tablespace are relatively irrelevant.
I assume PostgreSQL's database would correspond
to the concept of SCHEMA.
It seems we are different from the first.
Shoudln't we reach an agreement on it in the first place ?
Regards.
Hiroshi Inoue
Inoue@tpf.co.jp
From pgsql-hackers-owner+M3636@hub.org Wed Jun 21 01:31:12 2000
Received: from renoir.op.net (root@renoir.op.net [207.29.195.4])
by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id BAA20523
for <pgman@candle.pha.pa.us>; Wed, 21 Jun 2000 01:31:12 -0400 (EDT)
Received: from hub.org (root@hub.org [216.126.84.1]) by renoir.op.net (o1/$Revision: 1.3 $) with ESMTP id BAA08982 for <pgman@candle.pha.pa.us>; Wed, 21 Jun 2000 01:15:17 -0400 (EDT)
Received: from hub.org (majordom@localhost [127.0.0.1])
by hub.org (8.10.1/8.10.1) with SMTP id e5L5Bp151546;
Wed, 21 Jun 2000 01:11:51 -0400 (EDT)
Received: from sss2.sss.pgh.pa.us (sss.pgh.pa.us [209.114.166.2])
by hub.org (8.10.1/8.10.1) with ESMTP id e5L5BP151324
for <pgsql-hackers@postgresql.org>; Wed, 21 Jun 2000 01:11:25 -0400 (EDT)
Received: from sss2.sss.pgh.pa.us (tgl@localhost [127.0.0.1])
by sss2.sss.pgh.pa.us (8.9.3/8.9.3) with ESMTP id BAA03463;
Wed, 21 Jun 2000 01:09:52 -0400 (EDT)
To: Chris Bitmead <chrisb@nimrod.itg.telstra.com.au>
Comments: In-reply-to Chris Bitmead <chrisb@nimrod.itg.telstra.com.au>
message dated "Wed, 21 Jun 2000 14:45:01 +1000"
Date: Wed, 21 Jun 2000 01:09:52 -0400
Message-ID: <3459.961564192@sss.pgh.pa.us>
From: Tom Lane <tgl@sss.pgh.pa.us>
X-Mailing-List: pgsql-hackers@postgresql.org
Precedence: bulk
Sender: pgsql-hackers-owner@hub.org
Status: OR
Chris Bitmead <chrisb@nimrod.itg.telstra.com.au> writes:
> What I meant is, would you still be able to create tablespaces on
> systems without symlinks? That would seem to be a desirable feature.
All else being equal, it'd be nice. Since all else is not equal,
exactly how much sweat are we willing to expend on supporting that
feature on such systems --- to the exclusion of other features we
might expend the same sweat on, with more widely useful results?
Bear in mind that everything will still *work* just fine on such a
platform, you just don't have a way to spread the database across
multiple filesystems. That's only an issue if the platform has a
fairly Unixy notion of filesystems ... but no symlinks.
A few messages back someone was opining that we were wasting our time
thinking about tablespaces at all, because any modern platform can
create disk-spanning filesystems for itself, so applications don't have
to worry. I don't buy that argument in general, but I'm quite willing
to quote it for the *very* few systems that are Unixy enough to run
Postgres in the first place, but not quite Unixy enough to have
symlinks.
You gotta draw the line somewhere at what you will support, and
this particular line seems to me to be entirely reasonable and
justifiable. YMMV...
regards, tom lane
From dhogaza@pacifier.com Wed Jun 21 01:31:03 2000
Received: from renoir.op.net (root@renoir.op.net [207.29.195.4])
by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id BAA20492
for <pgman@candle.pha.pa.us>; Wed, 21 Jun 2000 01:30:58 -0400 (EDT)
Received: from smtp.pacifier.com (comet.pacifier.com [199.2.117.155]) by renoir.op.net (o1/$Revision: 1.3 $) with ESMTP id BAA09401 for <pgman@candle.pha.pa.us>; Wed, 21 Jun 2000 01:22:50 -0400 (EDT)
Received: from desktop (dsl-dhogaza.pacifier.net [207.202.226.68])
by smtp.pacifier.com (8.9.3/8.9.3pop) with SMTP id WAA22395;
At 11:22 AM 6/21/00 +1000, Philip J. Warner wrote:
>It may be worth considering leaving the CREATE TABLE statement alone.
>Dec/RDB uses a new statement entirely to define where a table goes...
It's worth considering, but on the other hand Oracle users greatly
outnumber Compaq/RDB users these days...
If there's no SQL92 guidance for implementing a feature, I'm pretty much in
favor of tracking Oracle, whose SQL dialect is rapidly becoming a
de-facto standard.
I'm not saying I like the fact, Oracle's a pain in the ass. But when
adopting existing syntax, might as well adopt that of the crushing
borg.
- Don Baccus, Portland OR <dhogaza@pacifier.com>
Nature photos, on-line guides, Pacific Northwest
Rare Bird Alert Service and other goodies at
http://donb.photo.net.
From lockhart@alumni.caltech.edu Wed Jun 21 01:31:07 2000
Received: from renoir.op.net (root@renoir.op.net [207.29.195.4])
by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id BAA20508;
Wed, 21 Jun 2000 01:31:06 -0400 (EDT)
Received: from huey.jpl.nasa.gov (huey.jpl.nasa.gov [128.149.68.100]) by renoir.op.net (o1/$Revision: 1.3 $) with ESMTP id BAA09355; Wed, 21 Jun 2000 01:22:03 -0400 (EDT)
Received: from golem.jpl.nasa.gov (hectic-1 [128.149.68.203])
by huey.jpl.nasa.gov (8.8.8+Sun/8.8.8) with ESMTP id WAA00821;
Tue, 20 Jun 2000 22:18:38 -0700 (PDT)
Received: from alumni.caltech.edu (localhost.localdomain [127.0.0.1])
by golem.jpl.nasa.gov (Postfix) with ESMTP
id AF4376F51; Wed, 21 Jun 2000 05:19:29 +0000 (UTC)
> Yes, I didn't like the environment variable stuff. In fact, I would
> like to not mention the symlink location anywhere in the database, so
> it can be changed without changing it in the database.
Well, as y'all have noticed, I think there are strong reasons to use
environment variables to manage locations, and that symlinks are a
potential portability and robustness problem.
An additional point which has relevance to this whole discussion:
In the future we may allow system resource such as tables to carry names
which use multi-byte encodings. afaik these encodings are not allowed to
be used for physical file names, and even if they were the utility of
using standard operating system utilities like ls goes way down.
istm that from a portability and evolutionary standpoint OID-only file
names (or at least file names *not* based on relation/class names) is a
requirement.
Comments?
- Thomas
From tgl@sss.pgh.pa.us Wed Jun 21 01:31:05 2000
Received: from renoir.op.net (root@renoir.op.net [207.29.195.4])
by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id BAA20503
for <pgman@candle.pha.pa.us>; Wed, 21 Jun 2000 01:31:05 -0400 (EDT)
Received: from sss2.sss.pgh.pa.us (sss.pgh.pa.us [209.114.166.2]) by renoir.op.net (o1/$Revision: 1.3 $) with ESMTP id BAA09513 for <pgman@candle.pha.pa.us>; Wed, 21 Jun 2000 01:25:18 -0400 (EDT)
Received: from sss2.sss.pgh.pa.us (tgl@localhost [127.0.0.1])
by sss2.sss.pgh.pa.us (8.9.3/8.9.3) with ESMTP id BAA03557;
> OK,I seem to have gotten the answer for the question
> Is tablespace defined per PostgreSQL's database ?
Not necessarily --- the tablespace subdirectories could be symlinks
pointing to the same place (assuming you use OIDs or something to keep
the table filenames unique even across databases). This is just an
implementation mechanism; it doesn't foreclose the policy decision
whether tablespaces are database-local or installation-wide.
(OTOH, pathnames like tablespace/database would pretty much force
tablespaces to be installation-wide whether you wanted it that way
or not.)
> My opinion
> 3) database and tablespace are relatively irrelevant.
> I assume PostgreSQL's database would correspond
> to the concept of SCHEMA.
My inclindation is that tablespaces should be installation-wide, but
I'm not completely sold on it. In any case I could see wanting a
permissions mechanism that would only allow some databases to have
tables in a particular tablespace.
We do need to think more about how traditional Postgres databases
fit together with SCHEMA. Maybe we wouldn't even need multiple
databases per installation if we had SCHEMA done right.
regards, tom lane
From pgsql-hackers-owner+M3641@hub.org Wed Jun 21 02:31:02 2000
Received: from renoir.op.net (root@renoir.op.net [207.29.195.4])
by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id CAA25698
for <pgman@candle.pha.pa.us>; Wed, 21 Jun 2000 02:31:00 -0400 (EDT)
Received: from hub.org (root@hub.org [216.126.84.1]) by renoir.op.net (o1/$Revision: 1.3 $) with ESMTP id CAA11423 for <pgman@candle.pha.pa.us>; Wed, 21 Jun 2000 02:09:13 -0400 (EDT)
Received: from hub.org (majordom@localhost [127.0.0.1])
by hub.org (8.10.1/8.10.1) with SMTP id e5L5we151226;
Wed, 21 Jun 2000 01:58:40 -0400 (EDT)
Received: from wallace.ece.rice.edu (wallace.ece.rice.edu [128.42.12.154])
by hub.org (8.10.1/8.10.1) with ESMTP id e5L5wE151030
for <pgsql-hackers@postgresql.org>; Wed, 21 Jun 2000 01:58:14 -0400 (EDT)
Received: by rice.edu
via sendmail from stdin
id <m134dJu-000LGmC@wallace.ece.rice.edu> (Debian Smail3.2.0.102)
for pgsql-hackers@postgresql.org; Wed, 21 Jun 2000 00:45:02 -0500 (CDT)
Date: Wed, 21 Jun 2000 00:45:02 -0500
From: "Ross J. Reedstrom" <reedstrm@rice.edu>
To: Tom Lane <tgl@sss.pgh.pa.us>
Cc: Hiroshi Inoue <Inoue@tpf.co.jp>, Bruce Momjian <pgman@candle.pha.pa.us>,
Peter Eisentraut <peter_e@gmx.net>, Jan Wieck <JanWieck@yahoo.com>,
In-Reply-To: <3554.961565037@sss.pgh.pa.us>; from tgl@sss.pgh.pa.us on Wed, Jun 21, 2000 at 01:23:57AM -0400
X-Mailing-List: pgsql-hackers@postgresql.org
Precedence: bulk
Sender: pgsql-hackers-owner@hub.org
Status: ORr
On Wed, Jun 21, 2000 at 01:23:57AM -0400, Tom Lane wrote:
> "Hiroshi Inoue" <Inoue@tpf.co.jp> writes:
>
> > My opinion
> > 3) database and tablespace are relatively irrelevant.
> > I assume PostgreSQL's database would correspond
> > to the concept of SCHEMA.
>
> My inclindation is that tablespaces should be installation-wide, but
> I'm not completely sold on it. In any case I could see wanting a
> permissions mechanism that would only allow some databases to have
> tables in a particular tablespace.
>
> We do need to think more about how traditional Postgres databases
> fit together with SCHEMA. Maybe we wouldn't even need multiple
> databases per installation if we had SCHEMA done right.
>
The important point I think is that tablespaces are about physical
storage/namespace, and SCHEMA are about logical namespace: it would make
sense for tables from multiple schema to live in the same tablespace,
as well as tables from one schema to be stored in multiple tablespaces.
Ross
--
Ross J. Reedstrom, Ph.D., <reedstrm@rice.edu>
NSBRI Research Scientist/Programmer
Computer and Information Technology Institute
Rice University, 6100 S. Main St., Houston, TX 77005
From pgsql-hackers-owner+M3644@hub.org Wed Jun 21 02:31:03 2000
Received: from renoir.op.net (root@renoir.op.net [207.29.195.4])
by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id CAA25704
for <pgman@candle.pha.pa.us>; Wed, 21 Jun 2000 02:31:02 -0400 (EDT)
Received: from hub.org (root@hub.org [216.126.84.1]) by renoir.op.net (o1/$Revision: 1.3 $) with ESMTP id CAA11923 for <pgman@candle.pha.pa.us>; Wed, 21 Jun 2000 02:22:41 -0400 (EDT)
Received: from hub.org (majordom@localhost [127.0.0.1])
by hub.org (8.10.1/8.10.1) with SMTP id e5L6JO196109;
Wed, 21 Jun 2000 02:19:24 -0400 (EDT)
Received: from mailo.vtcif.telstra.com.au (mailo.vtcif.telstra.com.au [202.12.144.17])
by hub.org (8.10.1/8.10.1) with ESMTP id e5L6JB196028
for <pgsql-hackers@postgresql.org>; Wed, 21 Jun 2000 02:19:11 -0400 (EDT)
Received: (from uucp@localhost) by mailo.vtcif.telstra.com.au (8.8.2/8.6.9) id QAA21128 for <pgsql-hackers@postgresql.org>; Wed, 21 Jun 2000 16:19:04 +1000 (EST)
Received: from maili.vtcif.telstra.com.au(202.12.142.17)
via SMTP by mailo.vtcif.telstra.com.au, id smtpd08EKgu; Wed Jun 21 16:17:56 2000
Received: (from uucp@localhost) by maili.vtcif.telstra.com.au (8.8.2/8.6.9) id QAA02825 for <pgsql-hackers@postgresql.org>; Wed, 21 Jun 2000 16:17:55 +1000 (EST)
Received: from localhost(127.0.0.1), claiming to be "mail.cdn.telstra.com.au"
via SMTP by localhost, id smtpdnjRBD_; Wed Jun 21 16:17:14 2000
Received: from lunitari.nimrod.itg.telecom.com.au (lunitari.nimrod.itg.telecom.com.au [192.53.254.48]) by mail.cdn.telstra.com.au (8.8.2/8.6.9) with ESMTP id QAA07553 for <pgsql-hackers@postgresql.org>; Wed, 21 Jun 2000 16:17:14 +1000 (EST)
Received: from nimrod.itg.telecom.com.au (majere [192.53.254.45])
by lunitari.nimrod.itg.telecom.com.au (8.9.1/8.9.3) with ESMTP id QAA05880
for <pgsql-hackers@postgresql.org>; Wed, 21 Jun 2000 16:15:56 +1000 (EST)
In other words there is a directory for databases, and a directory for
tablespaces. Database tables are symlinked to the appropriate
tablespace. So there is multiple databases per tablespace and multiple
tablespaces per database.
From pgsql-hackers-owner+M3648@hub.org Wed Jun 21 09:01:01 2000
Received: from renoir.op.net (root@renoir.op.net [207.29.195.4])
by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id JAA06055
for <pgman@candle.pha.pa.us>; Wed, 21 Jun 2000 09:01:00 -0400 (EDT)
Received: from hub.org (root@hub.org [216.126.84.1]) by renoir.op.net (o1/$Revision: 1.3 $) with ESMTP id IAA29647 for <pgman@candle.pha.pa.us>; Wed, 21 Jun 2000 08:52:25 -0400 (EDT)
Received: from hub.org (majordom@localhost [127.0.0.1])
by hub.org (8.10.1/8.10.1) with SMTP id e5LCo0112103;
Wed, 21 Jun 2000 08:50:00 -0400 (EDT)
Received: from gandalf.it-austria.net (gandalf.it-austria.net [213.150.1.65])
by hub.org (8.10.1/8.10.1) with ESMTP id e5LCnS112011
for <pgsql-hackers@postgresql.org>; Wed, 21 Jun 2000 08:49:28 -0400 (EDT)
Received: from sdexcgtw01.f000.d0188.sd.spardat.at (sdgtw.sd.spardat.at [172.18.1.16])
by gandalf.it-austria.net (xxx/xxx) with ESMTP id OAA27330;
Wed, 21 Jun 2000 14:48:44 +0200
Received: by sdexcgtw01.f000.d0188.sd.spardat.at with Internet Mail Service (5.5.2448.0)
> > > CREATE LOCATION tabloc IN '/var/private/pgsql';
> > > CREATE TABLE newtab ... IN tabloc;
> >
> > Okay, so we'd have "table spaces" and "database spaces".
> Seems like one
> > "space" ought to be enough.
Yes, one space should be enough.
>
> Does your "database space" correspond to current PostgreSQL's
> database ?
I think we should think of the "database space" as the default "table space"
for this database.
> And is it different from SCHEMA ?
Please don't mix schema and database, they are two different issues.
Even Oracle has a database, only in Oracle you are limited to one database
per instance. We do not want to add this limitation to PostgreSQL.
Andreas
From e99re41@DoCS.UU.SE Wed Jun 21 10:01:10 2000
Received: from renoir.op.net (root@renoir.op.net [207.29.195.4])
by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id KAA06585;
Wed, 21 Jun 2000 10:01:09 -0400 (EDT)
Received: from meryl.it.uu.se (root@meryl.it.uu.se [130.238.12.42]) by renoir.op.net (o1/$Revision: 1.3 $) with ESMTP id JAA03592; Wed, 21 Jun 2000 09:38:34 -0400 (EDT)
Received: from Ulv.DoCS.UU.SE (e99re41@Ulv.DoCS.UU.SE [130.238.9.167])
by meryl.it.uu.se (8.8.5/8.8.5) with ESMTP id PAA20520;
Wed, 21 Jun 2000 15:34:34 +0200 (MET DST)
Received: from localhost (e99re41@localhost) by Ulv.DoCS.UU.SE (8.6.12/8.6.12) with ESMTP id PAA10847; Wed, 21 Jun 2000 15:34:27 +0200
X-Authentication-Warning: Ulv.DoCS.UU.SE: e99re41 owned process doing -bs
Date: Wed, 21 Jun 2000 15:34:27 +0200 (MET DST)
From: Peter Eisentraut <e99re41@DoCS.UU.SE>
Reply-To: Peter Eisentraut <peter_e@gmx.net>
To: Hiroshi Inoue <Inoue@tpf.co.jp>
cc: Tom Lane <tgl@sss.pgh.pa.us>, Bruce Momjian <pgman@candle.pha.pa.us>,
X-MIME-Autoconverted: from QUOTED-PRINTABLE to 8bit by candle.pha.pa.us id KAA06585
Status: OR
On Wed, 21 Jun 2000, Hiroshi Inoue wrote:
> Peter seems to have the following idea(?? not sure)
> 2) database = tablespace
No, I thought that a database would have a table space assigned that would
serve as the default for newly created tables, but could be overridden. So
you could group databases onto disks as you want, but a couple of
particularly big/important/unimportant/etc tables from each database could
be put on a different disk. At least this seems to be the most flexible
and conceptually simple solution.
Ideally, directories per database would go away, but then we'd have the
system tables colliding, since those have the same oid in each database.
But that's not really important. So essentially you'd have
$PGDATA/base/tablespacesomething/database/tables
In the default tablespace, "tablespacesomething" is an ordinary directory,
for other tablespaces it symlinks somewhere else. For those browsing
$PGDATA/base, it all looks the same (unless you have colour ls). For those
browsing the actual storage location it looks like
/var/foo/elsewhere/database/tables.
I'm sure you can squeeze the extension segments in there, maybe between
tablespace and database.
What I think Bruce is saying is that there should be both database spaces
and table spaces, I think that's too much.
> My opinion
> 3) database and tablespace are relatively irrelevant.
> I assume PostgreSQL's database would correspond
> to the concept of SCHEMA.
A database corresponds to a catalog and a schema corresponds to nothing
yet.
--
Peter Eisentraut Sernanders v<>g 10:115
peter_e@gmx.net 75262 Uppsala
http://yi.org/peter-e/ Sweden
From e99re41@DoCS.UU.SE Wed Jun 21 10:01:09 2000
Received: from renoir.op.net (root@renoir.op.net [207.29.195.4])
by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id KAA06582;
Wed, 21 Jun 2000 10:01:08 -0400 (EDT)
Received: from meryl.it.uu.se (root@meryl.it.uu.se [130.238.12.42]) by renoir.op.net (o1/$Revision: 1.3 $) with ESMTP id JAA04510; Wed, 21 Jun 2000 09:43:48 -0400 (EDT)
Received: from Ulv.DoCS.UU.SE (e99re41@Ulv.DoCS.UU.SE [130.238.9.167])
by meryl.it.uu.se (8.8.5/8.8.5) with ESMTP id PAA20730;
Wed, 21 Jun 2000 15:39:23 +0200 (MET DST)
Received: from localhost (e99re41@localhost) by Ulv.DoCS.UU.SE (8.6.12/8.6.12) with ESMTP id PAA10853; Wed, 21 Jun 2000 15:39:16 +0200
X-Authentication-Warning: Ulv.DoCS.UU.SE: e99re41 owned process doing -bs
Date: Wed, 21 Jun 2000 15:39:16 +0200 (MET DST)
From: Peter Eisentraut <e99re41@DoCS.UU.SE>
Reply-To: Peter Eisentraut <peter_e@gmx.net>
To: Bruce Momjian <pgman@candle.pha.pa.us>
cc: Jan Wieck <JanWieck@yahoo.com>, Tom Lane <tgl@sss.pgh.pa.us>,
X-MIME-Autoconverted: from QUOTED-PRINTABLE to 8bit by candle.pha.pa.us id KAA06582
Status: ORr
On Tue, 20 Jun 2000, Bruce Momjian wrote:
> What I was suggesting is not to catalog the symlink locations, but to
> use lstat when dumping, so that admins can move files around using
> symlinks and not have to udpate the database.
That surely wouldn't make those happy that are calling for smgr
abstraction.
--
Peter Eisentraut Sernanders v<>g 10:115
peter_e@gmx.net 75262 Uppsala
http://yi.org/peter-e/ Sweden
From tgl@sss.pgh.pa.us Wed Jun 21 11:31:09 2000
Received: from renoir.op.net (root@renoir.op.net [207.29.195.4])
by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id LAA08120;
Wed, 21 Jun 2000 11:31:08 -0400 (EDT)
Received: from sss2.sss.pgh.pa.us (sss.pgh.pa.us [209.114.166.2]) by renoir.op.net (o1/$Revision: 1.3 $) with ESMTP id LAA13232; Wed, 21 Jun 2000 11:08:38 -0400 (EDT)
Received: from sss2.sss.pgh.pa.us (tgl@localhost [127.0.0.1])
by sss2.sss.pgh.pa.us (8.9.3/8.9.3) with ESMTP id LAA04286;
Wed, 21 Jun 2000 11:07:20 -0400 (EDT)
To: Bruce Momjian <pgman@candle.pha.pa.us>
cc: Hiroshi Inoue <Inoue@tpf.co.jp>, Peter Eisentraut <peter_e@gmx.net>,
> In this way, the database has a view of its main directory, plus a /loc
> subdirectory for the tablespace. In the other location, we have
> /var/pgsql/dbname/loc because this allows different databases to use:
> CREATE TABLESPACE loc USING '/var/pgsql'
> and they do not collide with each other in /var/pgsql.
But they don't collide anyway, because the dbname is already unique.
Isn't the extra subdirectory a waste?
Because table files will have installation-wide unique names, there's
no really good reason to have either level of subdirectory; you could
just make
CREATE TABLESPACE loc USING '/var/pgsql'
do
ln -s /var/pgsql data/base/dbname/loc
and it'd still work even if multiple DBs were using the same tablespace.
However, forcing creation of a subdirectory does give you the chance to
make sure the subdir is owned by postgres and has the right permissions,
so there's something to be said for that. It might be reasonable to do
mkdir /var/pgsql/dbname
chmod 700 /var/pgsql/dbname
ln -s /var/pgsql/dbname data/base/dbname/loc
regards, tom lane
From lockhart@alumni.caltech.edu Wed Jun 21 11:31:10 2000
Received: from renoir.op.net (root@renoir.op.net [207.29.195.4])
by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id LAA08135;
Wed, 21 Jun 2000 11:31:09 -0400 (EDT)
Received: from huey.jpl.nasa.gov (huey.jpl.nasa.gov [128.149.68.100]) by renoir.op.net (o1/$Revision: 1.3 $) with ESMTP id LAA15864; Wed, 21 Jun 2000 11:30:06 -0400 (EDT)
Received: from golem.jpl.nasa.gov (hectic-1 [128.149.68.203])
by huey.jpl.nasa.gov (8.8.8+Sun/8.8.8) with ESMTP id IAA02881;
Wed, 21 Jun 2000 08:26:40 -0700 (PDT)
Received: from alumni.caltech.edu (localhost.localdomain [127.0.0.1])
by golem.jpl.nasa.gov (Postfix) with ESMTP
id AB8AE6F51; Wed, 21 Jun 2000 15:27:36 +0000 (UTC)
> Sorry, disagree. Environment variables are a pain to administer, and
> quite counter-intuitive.
Well, I guess we disagree. But until we have a complete proposed
solution, we should leave environment variables on the table, since they
*do* allow some decoupling of logical and physical storage, and *do*
give the administrator some control over resources *that the admin would
not otherwise have*.
> > istm that from a portability and evolutionary standpoint OID-only
> > file names (or at least file names *not* based on relation/class
> > names) is a requirement.
> Maybe a requirement at some point for some installations, but I hope
> not a general requirement.
If a table name can have characters which are not legal for file names,
then how would you propose to support it? If we are doing a
restructuring of the storage scheme, this should be taken into account.
lockhart=# create table "one/two" (i int);
ERROR: cannot create one/two
Why not? It demonstrates an unfortunate linkage between file systems and
database resources.
- Thomas
From tgl@sss.pgh.pa.us Wed Jun 21 11:31:18 2000
Received: from renoir.op.net (root@renoir.op.net [207.29.195.4])
by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id LAA08164;
Wed, 21 Jun 2000 11:31:12 -0400 (EDT)
Received: from sss2.sss.pgh.pa.us (sss.pgh.pa.us [209.114.166.2]) by renoir.op.net (o1/$Revision: 1.3 $) with ESMTP id LAA15786; Wed, 21 Jun 2000 11:29:30 -0400 (EDT)
Received: from sss2.sss.pgh.pa.us (tgl@localhost [127.0.0.1])
by sss2.sss.pgh.pa.us (8.9.3/8.9.3) with ESMTP id LAA04451;
Wed, 21 Jun 2000 11:28:09 -0400 (EDT)
To: Thomas Lockhart <lockhart@alumni.caltech.edu>
cc: Bruce Momjian <pgman@candle.pha.pa.us>, Peter Eisentraut <peter_e@gmx.net>,
Jan Wieck <JanWieck@Yahoo.com>, Hiroshi Inoue <Inoue@tpf.co.jp>,
Comments: In-reply-to Thomas Lockhart <lockhart@alumni.caltech.edu>
message dated "Wed, 21 Jun 2000 05:19:29 -0000"
Date: Wed, 21 Jun 2000 11:28:09 -0400
Message-ID: <4448.961601289@sss.pgh.pa.us>
From: Tom Lane <tgl@sss.pgh.pa.us>
Status: OR
Thomas Lockhart <lockhart@alumni.caltech.edu> writes:
> Well, as y'all have noticed, I think there are strong reasons to use
> environment variables to manage locations, and that symlinks are a
> potential portability and robustness problem.
Reasons? Evidence?
> An additional point which has relevance to this whole discussion:
> In the future we may allow system resource such as tables to carry names
> which use multi-byte encodings. afaik these encodings are not allowed to
> be used for physical file names, and even if they were the utility of
> using standard operating system utilities like ls goes way down.
Good point, although in one sense a string is a string --- as long as
we don't allow embedded nulls in server-side encodings, we could use
anything that Postgres thought was a name in a filename, and the OS
should take it. But if your local ls doesn't show it the way you see
in Postgres, the usefulness of having the tablename in the filename
goes way down.
> istm that from a portability and evolutionary standpoint OID-only file
> names (or at least file names *not* based on relation/class names) is a
> requirement.
No argument from me ;-). I've been looking for compromise positions
but I still think that pure numeric filenames are the cleanest solution.
There's something else that should be taken into account: for WAL, the
log will need to record the table file that each insert/delete/update
operation affects. To do that with the smgr-token-is-a-pathname
approach I was suggesting yesterday, I think you have to record the
database name and pathname in each WAL log entry. That's 64 bytes/log
entry which is a *lot*. If we bit the bullet and restricted ourselves
to numeric filenames then the log would need just four numeric values:
database OID
tablespace OID
relation OID
relation version number
(this set of 4 values would also be an smgr file reference token).
16 bytes/log entry looks much better than 64.
At the moment I can recall the following opinions:
Pure OID filenames: Thomas, Tom, Marc, Peter E.
OID+relname filenames: Bruce
Vadim was in the pure-OID camp a few months ago, but I won't presume
to list him there now since he hasn't been involved in this most
recent round of discussions. I'm not sure where anyone else stands...
but at least in terms of the core group it's pretty clear where the
majority opinion is.
regards, tom lane
From lamar.owen@wgcr.org Wed Jun 21 11:51:39 2000
Received: from renoir.op.net (root@renoir.op.net [207.29.195.4])
by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id LAA09021;
Wed, 21 Jun 2000 11:51:38 -0400 (EDT)
Received: from www.wgcr.org (IDENT:root@www.wgcr.org [206.74.232.194]) by renoir.op.net (o1/$Revision: 1.3 $) with ESMTP id LAA18613; Wed, 21 Jun 2000 11:51:48 -0400 (EDT)
Received: from wgcr.org ([206.74.232.197])
by www.wgcr.org (8.9.3/8.9.3/WGCR) with ESMTP id LAA19124;
Wed, 21 Jun 2000 11:48:25 -0400
Message-ID: <3950E3C3.7322BD70@wgcr.org>
Date: Wed, 21 Jun 2000 11:48:19 -0400
From: Lamar Owen <lamar.owen@wgcr.org>
X-Mailer: Mozilla 4.61 [en] (Win95; I)
X-Accept-Language: en
MIME-Version: 1.0
To: Tom Lane <tgl@sss.pgh.pa.us>
CC: Thomas Lockhart <lockhart@alumni.caltech.edu>,
Bruce Momjian <pgman@candle.pha.pa.us>,
Peter Eisentraut <peter_e@gmx.net>, Jan Wieck <JanWieck@Yahoo.com>,
Comments: In-reply-to Bruce Momjian <pgman@candle.pha.pa.us>
message dated "Wed, 21 Jun 2000 11:45:12 -0400"
Date: Wed, 21 Jun 2000 12:10:15 -0400
Message-ID: <4786.961603815@sss.pgh.pa.us>
From: Tom Lane <tgl@sss.pgh.pa.us>
Status: ORr
Bruce Momjian <pgman@candle.pha.pa.us> writes:
> Yes, that is true. My idea is that they may want to create loc1 and
> loc2 which initially point to the same location, but later may be moved.
> For example, one tablespace for tables, another for indexes. They may
> initially point to the same directory, but later be split.
Well, that opens up a completely different issue, which is what about
moving tables from one tablespace to another?
I think the way you appear to be implying above (shut down the server
so that you can rearrange subdirectories by hand) is the wrong way to
go about it. For one thing, lots of people don't want to shut down
their servers completely for that long, but it's difficult to avoid
doing so if you want to move files by filesystem commands. For another
thing, the above approach requires guessing in advance --- maybe long
in advance --- how you are going to want to repartition your database
when it gets too big for your existing storage.
The right way to address this problem is to invent a "move table to
new tablespace" command. This'd be pretty trivial to implement based
on a file-versioning approach: the new version of the pg_class tuple
has a new tablespace identifier in it.
regards, tom lane
From pgsql-hackers-owner+M3670@hub.org Wed Jun 21 12:30:42 2000
Received: from renoir.op.net (root@renoir.op.net [207.29.195.4])
by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id MAA10371
for <pgman@candle.pha.pa.us>; Wed, 21 Jun 2000 12:30:41 -0400 (EDT)
Received: from hub.org (root@hub.org [216.126.84.1]) by renoir.op.net (o1/$Revision: 1.3 $) with ESMTP id MAA22315 for <pgman@candle.pha.pa.us>; Wed, 21 Jun 2000 12:23:18 -0400 (EDT)
Received: from hub.org (majordom@localhost [127.0.0.1])
by hub.org (8.10.1/8.10.1) with SMTP id e5LGJU175424;
Wed, 21 Jun 2000 12:19:30 -0400 (EDT)
Received: from sss2.sss.pgh.pa.us (sss.pgh.pa.us [209.114.166.2])
by hub.org (8.10.1/8.10.1) with ESMTP id e5LGJJ175359
for <pgsql-hackers@postgresql.org>; Wed, 21 Jun 2000 12:19:19 -0400 (EDT)
Received: from sss2.sss.pgh.pa.us (tgl@localhost [127.0.0.1])
by sss2.sss.pgh.pa.us (8.9.3/8.9.3) with ESMTP id MAA04878;
Wed, 21 Jun 2000 12:17:38 -0400 (EDT)
To: Bruce Momjian <pgman@candle.pha.pa.us>
cc: Lamar Owen <lamar.owen@wgcr.org>,
Thomas Lockhart <lockhart@alumni.caltech.edu>,
Peter Eisentraut <peter_e@gmx.net>, Jan Wieck <JanWieck@Yahoo.com>,
Comments: In-reply-to Bruce Momjian <pgman@candle.pha.pa.us>
message dated "Wed, 21 Jun 2000 12:03:12 -0400"
Date: Wed, 21 Jun 2000 12:17:37 -0400
Message-ID: <4875.961604257@sss.pgh.pa.us>
From: Tom Lane <tgl@sss.pgh.pa.us>
X-Mailing-List: pgsql-hackers@postgresql.org
Precedence: bulk
Sender: pgsql-hackers-owner@hub.org
Status: OR
Bruce Momjian <pgman@candle.pha.pa.us> writes:
>> Sorry Bruce -- I understand and am sympathetic to your position, and, at
>> one time, I agreed with it. But not any more.
> I thought the most recent proposal was to just throw ~16 chars of the
> file name on the end of the file name, and that should not be used for
> anything except visibility. WAL would not need to store that. It could
> just grab the file name that matches the oid/sequence number.
But that's extra complexity in WAL, plus extra complexity in renaming
tables (if you want the filename to track the logical table name, which
I expect you would), plus extra complexity in smgr and bufmgr and other
places.
I think people are coming around to the notion that it's better to keep
these low-level operations simple, even if we need to expend more work
on high-level admin tools as a result.
But we do need to remember to expend that effort on tools! Let's not
drop the ball on that, folks.
regards, tom lane
From tgl@sss.pgh.pa.us Wed Jun 21 12:30:40 2000
Received: from renoir.op.net (root@renoir.op.net [207.29.195.4])
by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id MAA10364
for <pgman@candle.pha.pa.us>; Wed, 21 Jun 2000 12:30:38 -0400 (EDT)
Received: from sss2.sss.pgh.pa.us (sss.pgh.pa.us [209.114.166.2]) by renoir.op.net (o1/$Revision: 1.3 $) with ESMTP id MAA22593 for <pgman@candle.pha.pa.us>; Wed, 21 Jun 2000 12:25:58 -0400 (EDT)
Received: from sss2.sss.pgh.pa.us (tgl@localhost [127.0.0.1])
by sss2.sss.pgh.pa.us (8.9.3/8.9.3) with ESMTP id MAA04944;
Wed, 21 Jun 2000 12:24:44 -0400 (EDT)
To: Bruce Momjian <pgman@candle.pha.pa.us>
cc: Hiroshi Inoue <Inoue@tpf.co.jp>, Peter Eisentraut <peter_e@gmx.net>,
Comments: In-reply-to Bruce Momjian <pgman@candle.pha.pa.us>
message dated "Wed, 21 Jun 2000 12:14:59 -0400"
Date: Wed, 21 Jun 2000 12:24:44 -0400
Message-ID: <4941.961604684@sss.pgh.pa.us>
From: Tom Lane <tgl@sss.pgh.pa.us>
Status: ORr
Bruce Momjian <pgman@candle.pha.pa.us> writes:
>> Well, that opens up a completely different issue, which is what about
>> moving tables from one tablespace to another?
> Are you suggesting that doing dbname/locname is somehow harder to do
> that? If you are, I don't understand why.
It doesn't make it harder, but it still seems pointless to have the
extra directory level. Bear in mind that if we go with all-OID
filenames then you're not going to be looking at "loc1" and "loc2"
anyway, but at "5938171" and "8583727". It's not much of a convenience
to the admin to see that, so we might as well save a level of directory
lookup.
> The general issue of moving tables between tablespaces can be done from
> in the database. I don't think it is reasonable to shut down the db to
> do that. However, I can see moving tablespaces to different symlinked
> locations may require a shutdown.
Only if you insist on doing it outside the database using filesystem
tools. Another way is to create a new tablespace in the desired new
location, then move the tables one-by-one to that new tablespace.
I suppose either one might be preferable depending on your access
patterns --- locking your most critical tables while they're being moved
might be as bad as a total shutdown.
regards, tom lane
From tgl@sss.pgh.pa.us Wed Jun 21 13:01:06 2000
Received: from renoir.op.net (root@renoir.op.net [207.29.195.4])
by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id NAA11366
for <pgman@candle.pha.pa.us>; Wed, 21 Jun 2000 13:01:05 -0400 (EDT)
Received: from sss2.sss.pgh.pa.us (sss.pgh.pa.us [209.114.166.2]) by renoir.op.net (o1/$Revision: 1.3 $) with ESMTP id MAA24726 for <pgman@candle.pha.pa.us>; Wed, 21 Jun 2000 12:47:50 -0400 (EDT)
Received: from sss2.sss.pgh.pa.us (tgl@localhost [127.0.0.1])
by sss2.sss.pgh.pa.us (8.9.3/8.9.3) with ESMTP id MAA05112;
Wed, 21 Jun 2000 12:46:34 -0400 (EDT)
To: Bruce Momjian <pgman@candle.pha.pa.us>
cc: Hiroshi Inoue <Inoue@tpf.co.jp>, Peter Eisentraut <peter_e@gmx.net>,
Comments: In-reply-to Bruce Momjian <pgman@candle.pha.pa.us>
message dated "Wed, 21 Jun 2000 12:40:35 -0400"
Date: Wed, 21 Jun 2000 12:46:34 -0400
Message-ID: <5109.961605994@sss.pgh.pa.us>
From: Tom Lane <tgl@sss.pgh.pa.us>
Status: ORr
Bruce Momjian <pgman@candle.pha.pa.us> writes:
>>>> Are you suggesting that doing dbname/locname is somehow harder to do
>>>> that? If you are, I don't understand why.
>>
>> It doesn't make it harder, but it still seems pointless to have the
>> extra directory level. Bear in mind that if we go with all-OID
>> filenames then you're not going to be looking at "loc1" and "loc2"
>> anyway, but at "5938171" and "8583727". It's not much of a convenience
>> to the admin to see that, so we might as well save a level of directory
>> lookup.
> Just seems easier to have stuff segregates into separate per-db
> directories for clarity. Also, as directories get bigger, finding a
> specific file in there becomes harder. Putting 10 databases all in the
> same directory seems bad in this regard.
Huh? I wasn't arguing against making a db-specific directory below the
tablespace point. I was arguing against making *another* directory
below that one.
> I don't think we want to be using
> symlinks for tables if we can avoid it.
Agreed, but where did that come from? None of these proposals mentioned
symlinks for anything but directories, AFAIR.
regards, tom lane
From peter@localhost.its.uu.se Wed Jun 21 14:31:13 2000
Received: from renoir.op.net (root@renoir.op.net [207.29.195.4])
by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id OAA13233
for <pgman@candle.pha.pa.us>; Wed, 21 Jun 2000 14:31:13 -0400 (EDT)
Received: from merganser.its.uu.se (merganser.its.uu.se [130.238.6.236]) by renoir.op.net (o1/$Revision: 1.3 $) with ESMTP id OAA04201 for <pgman@candle.pha.pa.us>; Wed, 21 Jun 2000 14:11:42 -0400 (EDT)
Received: from regulus.student.UU.SE ([130.238.5.2]:34923 "EHLO
regulus.its.uu.se") by merganser.its.uu.se with ESMTP
id <S385153AbQFUSJq>; Wed, 21 Jun 2000 20:09:46 +0200
Received: from peter (helo=localhost)
by regulus.its.uu.se with local-esmtp (Exim 3.02 #2)
id 134p2o-0000Uo-00; Wed, 21 Jun 2000 20:16:10 +0200
Date: Wed, 21 Jun 2000 20:16:10 +0200 (CEST)
From: Peter Eisentraut <peter_e@gmx.net>
To: Tom Lane <tgl@sss.pgh.pa.us>
cc: Bruce Momjian <pgman@candle.pha.pa.us>, Hiroshi Inoue <Inoue@tpf.co.jp>,
> I think Peter was holding out for storing purely numeric tablespace OID
> and table version in pg_class and having a hardwired mapping to pathname
> somewhere in smgr. However, I think that doing it that way gains only
> micro-efficiency compared to passing a "name" around, while using the
> name approach buys us flexibility that's needed for at least some of
> the variants under discussion.
But that name can only be a dozen or so characters, contain no slash or
other funny characters, etc. That's really poor. Then the alternative is
to have an internal name and an external canonical name. Then you have two
names to worry about. Also consider that when you store both the table
space oid and the internal name in pg_class you create redundant data.
What if you rename the table space? Do you leave the internal name out of
sync? Then what good is the internal name? I'm just concerned that we are
creating at the table space level problems similar to that we're trying to
get rid of at the relation and database level.
--
Peter Eisentraut Sernanders v<>g 10:115
peter_e@gmx.net 75262 Uppsala
http://yi.org/peter-e/ Sweden
From tgl@sss.pgh.pa.us Wed Jun 21 18:14:19 2000
Received: from renoir.op.net (root@renoir.op.net [207.29.195.4])
by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id SAA24147
for <pgman@candle.pha.pa.us>; Wed, 21 Jun 2000 18:14:18 -0400 (EDT)
Received: from sss2.sss.pgh.pa.us (sss.pgh.pa.us [209.114.166.2]) by renoir.op.net (o1/$Revision: 1.3 $) with ESMTP id RAA24649 for <pgman@candle.pha.pa.us>; Wed, 21 Jun 2000 17:40:59 -0400 (EDT)
Received: from sss2.sss.pgh.pa.us (tgl@localhost [127.0.0.1])
by sss2.sss.pgh.pa.us (8.9.3/8.9.3) with ESMTP id RAA06031;
Wed, 21 Jun 2000 17:39:38 -0400 (EDT)
To: Bruce Momjian <pgman@candle.pha.pa.us>
cc: Peter Eisentraut <peter_e@gmx.net>, Hiroshi Inoue <Inoue@tpf.co.jp>,
Comments: In-reply-to Bruce Momjian <pgman@candle.pha.pa.us>
message dated "Wed, 21 Jun 2000 14:42:21 -0400"
Date: Wed, 21 Jun 2000 17:39:38 -0400
Message-ID: <6028.961623578@sss.pgh.pa.us>
From: Tom Lane <tgl@sss.pgh.pa.us>
Status: OR
Bruce Momjian <pgman@candle.pha.pa.us> writes:
>> But that name can only be a dozen or so characters, contain no slash or
>> other funny characters, etc. That's really poor. Then the alternative is
>> to have an internal name and an external canonical name. Then you have two
>> names to worry about. Also consider that when you store both the table
>> space oid and the internal name in pg_class you create redundant data.
>> What if you rename the table space? Do you leave the internal name out of
>> sync? Then what good is the internal name? I'm just concerned that we are
>> creating at the table space level problems similar to that we're trying to
>> get rid of at the relation and database level.
> Agreed. Having table spaces stored by directories named by oid just
> seems very complicated for no reason.
Huh? He just gave you two very good reasons: avoid Unix-derived
limitations on the naming of tablespaces (and tables), and avoid
problems with renaming tablespaces.
I'm pretty much firmly back in the "OID and nothing but" camp.
Or perhaps I should say "OID, file version, and nothing but",
since we still need a version number to do CLUSTER etc.
regards, tom lane
From vmikheev@SECTORBASE.COM Wed Jun 21 22:18:38 2000
Received: from renoir.op.net (root@renoir.op.net [207.29.195.4])
by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id WAA07570;
Wed, 21 Jun 2000 22:18:36 -0400 (EDT)
Received: from sectorbase2.sectorbase.com ([208.48.122.131]) by renoir.op.net (o1/$Revision: 1.3 $) with ESMTP id TAA29965; Wed, 21 Jun 2000 19:07:37 -0400 (EDT)
Received: by SECTORBASE2 with Internet Mail Service (5.5.2650.21)
> If we bit the bullet and restricted ourselves to numeric filenames then
> the log would need just four numeric values:
> database OID
> tablespace OID
Is someone going to implement it for 7.1?
> relation OID
> relation version number
I believe that we can avoid versions using WAL...
> (this set of 4 values would also be an smgr file reference token).
> 16 bytes/log entry looks much better than 64.
>
> At the moment I can recall the following opinions:
>
> Pure OID filenames: Thomas, Tom, Marc, Peter E.
+ me.
But what about LOCATIONs? I object using environment and think that
locations
must be stored in pg_control..?
Vadim
From Inoue@tpf.co.jp Wed Jun 21 22:18:39 2000
Received: from renoir.op.net (root@renoir.op.net [207.29.195.4])
by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id WAA07573;
Wed, 21 Jun 2000 22:18:38 -0400 (EDT)
Received: from sd.tpf.co.jp (sd.tpf.co.jp [210.161.239.34]) by renoir.op.net (o1/$Revision: 1.3 $) with ESMTP id TAA01857; Wed, 21 Jun 2000 19:37:04 -0400 (EDT)
Received: from cadzone ([126.0.1.40] (may be forged))
by sd.tpf.co.jp (2.5 Build 2640 (Berkeley 8.8.6)/8.8.4) with SMTP
X-Mailer: Microsoft Outlook 8.5, Build 4.71.2173.0
Importance: Normal
In-Reply-To: <4448.961601289@sss.pgh.pa.us>
X-MimeOLE: Produced By Microsoft MimeOLE V5.00.2314.1300
Status: OR
> -----Original Message-----
> From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
>
> No argument from me ;-). I've been looking for compromise positions
> but I still think that pure numeric filenames are the cleanest solution.
>
> There's something else that should be taken into account: for WAL, the
> log will need to record the table file that each insert/delete/update
> operation affects. To do that with the smgr-token-is-a-pathname
> approach I was suggesting yesterday, I think you have to record the
> database name and pathname in each WAL log entry. That's 64 bytes/log
> entry which is a *lot*. If we bit the bullet and restricted ourselves
> to numeric filenames then the log would need just four numeric values:
> database OID
> tablespace OID
I strongly object to keep tablespace OID for smgr file reference token
though we have to keep it for another purpose of cource. I've mentioned
many times tablespace(where to store) info should be distinguished from
*where it is stored* info. Generally tablespace isn't sufficiently
restrictive
for this purpose. e.g. there was an idea about round-robin. e.g. Oracle's
tablespace could have pluaral files... etc.
IMHO,it is misleading to use tablespace OID as (a part of) reference token.
> relation OID
> relation version number
> (this set of 4 values would also be an smgr file reference token).
> 16 bytes/log entry looks much better than 64.
>
Regards.
Hiroshi Inoue
Inoue@tpf.co.jp
From Inoue@tpf.co.jp Wed Jun 21 22:18:15 2000
Received: from renoir.op.net (root@renoir.op.net [207.29.195.4])
by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id WAA07540;
Wed, 21 Jun 2000 22:18:11 -0400 (EDT)
Received: from sd.tpf.co.jp (sd.tpf.co.jp [210.161.239.34]) by renoir.op.net (o1/$Revision: 1.3 $) with ESMTP id UAA04100; Wed, 21 Jun 2000 20:15:09 -0400 (EDT)
Received: from cadzone ([126.0.1.40] (may be forged))
by sd.tpf.co.jp (2.5 Build 2640 (Berkeley 8.8.6)/8.8.4) with SMTP
> > If we bit the bullet and restricted ourselves to numeric filenames then
> > the log would need just four numeric values:
> > database OID
> > tablespace OID
>
> Is someone going to implement it for 7.1?
>
> > relation OID
> > relation version number
>
> I believe that we can avoid versions using WAL...
>
How to re-construct tables in place ?
Is the following right ?
1) save the content of current table to somewhere
2) shrink the table and related indexes
3) reload the saved(+some filtering) content
Regards.
Hiroshi Inoue
Inoue@tpf.co.jp
From Inoue@tpf.co.jp Wed Jun 21 22:18:16 2000
Received: from renoir.op.net (root@renoir.op.net [207.29.195.4])
by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id WAA07553;
Wed, 21 Jun 2000 22:18:15 -0400 (EDT)
Received: from sd.tpf.co.jp (sd.tpf.co.jp [210.161.239.34]) by renoir.op.net (o1/$Revision: 1.3 $) with ESMTP id UAA05872; Wed, 21 Jun 2000 20:44:21 -0400 (EDT)
Received: from cadzone ([126.0.1.40] (may be forged))
by sd.tpf.co.jp (2.5 Build 2640 (Berkeley 8.8.6)/8.8.4) with SMTP
> > > I believe that we can avoid versions using WAL...
> > >
> >
> > How to re-construct tables in place ?
> > Is the following right ?
> > 1) save the content of current table to somewhere
> > 2) shrink the table and related indexes
> > 3) reload the saved(+some filtering) content
>
> Or - create tmp file and load with new content; log "intent to
> relink table
> file";
> relink table file; log "file is relinked".
>
It seems to me that whole content of the table should be
logged before relinking or shrinking.
Is my understanding right ?
Regards.
Hiroshi Inoue
Inoue@tpf.co.jp
From pgsql-hackers-owner+M3700@hub.org Wed Jun 21 22:17:59 2000
Received: from renoir.op.net (root@renoir.op.net [207.29.195.4])
by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id WAA07504
for <pgman@candle.pha.pa.us>; Wed, 21 Jun 2000 22:17:58 -0400 (EDT)
Received: from hub.org (root@hub.org [216.126.84.1]) by renoir.op.net (o1/$Revision: 1.3 $) with ESMTP id VAA07914 for <pgman@candle.pha.pa.us>; Wed, 21 Jun 2000 21:23:22 -0400 (EDT)
Received: from hub.org (majordom@localhost [127.0.0.1])
by hub.org (8.10.1/8.10.1) with SMTP id e5M1It194420;
Wed, 21 Jun 2000 21:18:55 -0400 (EDT)
Received: from sd.tpf.co.jp (sd.tpf.co.jp [210.161.239.34])
by hub.org (8.10.1/8.10.1) with ESMTP id e5M1Ig194334
for <pgsql-hackers@postgresql.org>; Wed, 21 Jun 2000 21:18:43 -0400 (EDT)
Received: from cadzone ([126.0.1.40] (may be forged))
by sd.tpf.co.jp (2.5 Build 2640 (Berkeley 8.8.6)/8.8.4) with SMTP
X-Mailer: Microsoft Outlook 8.5, Build 4.71.2173.0
Importance: Normal
In-Reply-To: <4448.961601289@sss.pgh.pa.us>
X-MimeOLE: Produced By Microsoft MimeOLE V5.00.2314.1300
X-Mailing-List: pgsql-hackers@postgresql.org
Precedence: bulk
Sender: pgsql-hackers-owner@hub.org
Status: OR
> -----Original Message-----
> From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
>
> At the moment I can recall the following opinions:
>
> Pure OID filenames: Thomas, Tom, Marc, Peter E.
>
> OID+relname filenames: Bruce
>
Please add my opinion to the list.
Unique-id filename: Hiroshi
(Unqiue-id is irrelevant to OID/relname).
Regards.
Hiroshi Inoue
Inoue@tpf.co.jp
From pgsql-hackers-owner+M3701@hub.org Wed Jun 21 22:18:02 2000
Received: from renoir.op.net (root@renoir.op.net [207.29.195.4])
by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id WAA07513
for <pgman@candle.pha.pa.us>; Wed, 21 Jun 2000 22:18:01 -0400 (EDT)
Received: from hub.org (root@hub.org [216.126.84.1]) by renoir.op.net (o1/$Revision: 1.3 $) with ESMTP id VAA08502 for <pgman@candle.pha.pa.us>; Wed, 21 Jun 2000 21:33:13 -0400 (EDT)
Received: from hub.org (majordom@localhost [127.0.0.1])
by hub.org (8.10.1/8.10.1) with SMTP id e5M1QS107400;
Wed, 21 Jun 2000 21:26:28 -0400 (EDT)
Received: from sd.tpf.co.jp (sd.tpf.co.jp [210.161.239.34])
by hub.org (8.10.1/8.10.1) with ESMTP id e5M1QA107223
for <pgsql-hackers@postgresql.org>; Wed, 21 Jun 2000 21:26:10 -0400 (EDT)
Received: from cadzone ([126.0.1.40] (may be forged))
by sd.tpf.co.jp (2.5 Build 2640 (Berkeley 8.8.6)/8.8.4) with SMTP
> > > Or - create tmp file and load with new content;
> > > log "intent to relink table file";
> > > relink table file; log "file is relinked".
> >
> > It seems to me that whole content of the table should be
> > logged before relinking or shrinking.
>
> Why not just fsync tmp files?
>
Probably I've misunderstood *relink*.
If *relink* different from *rename* ?
Regards.
Hiroshi Inoue
Inoue@tpf.co.jp
From vmikheev@SECTORBASE.COM Wed Jun 21 22:17:52 2000
Received: from renoir.op.net (root@renoir.op.net [207.29.195.4])
by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id WAA07492;
Wed, 21 Jun 2000 22:17:51 -0400 (EDT)
Received: from sectorbase2.sectorbase.com ([208.48.122.131]) by renoir.op.net (o1/$Revision: 1.3 $) with ESMTP id VAA08730; Wed, 21 Jun 2000 21:37:44 -0400 (EDT)
Received: by SECTORBASE2 with Internet Mail Service (5.5.2650.21)
unlink(tmp file). We can do additional logging (with log flush) of these
steps
if required, postpone on-recovery redo of operations till last relink log
record/
end of log/transaction abort etc etc etc.
Vadim
From Inoue@tpf.co.jp Wed Jun 21 23:22:36 2000
Received: from renoir.op.net (root@renoir.op.net [207.29.195.4])
by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id XAA10350
for <pgman@candle.pha.pa.us>; Wed, 21 Jun 2000 23:22:35 -0400 (EDT)
Received: from sd.tpf.co.jp (sd.tpf.co.jp [210.161.239.34]) by renoir.op.net (o1/$Revision: 1.3 $) with ESMTP id XAA13743 for <pgman@candle.pha.pa.us>; Wed, 21 Jun 2000 23:07:50 -0400 (EDT)
Received: from cadzone ([126.0.1.40] (may be forged))
by sd.tpf.co.jp (2.5 Build 2640 (Berkeley 8.8.6)/8.8.4) with SMTP
I see,old file would be rolled back from tmp2 file on abort.
This would work on most platforms.
But cygwin port has a flaw that files could not be unlinked
if they are open. So *relink* may fail in some cases(including
rollback cases).
Regards.
Hiroshi Inoue
Inoue@tpf.co.jp
From tgl@sss.pgh.pa.us Wed Jun 21 23:22:38 2000
Received: from renoir.op.net (root@renoir.op.net [207.29.195.4])
by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id XAA10353
for <pgman@candle.pha.pa.us>; Wed, 21 Jun 2000 23:22:36 -0400 (EDT)
Received: from sss2.sss.pgh.pa.us (sss.pgh.pa.us [209.114.166.2]) by renoir.op.net (o1/$Revision: 1.3 $) with ESMTP id XAA14206 for <pgman@candle.pha.pa.us>; Wed, 21 Jun 2000 23:16:26 -0400 (EDT)
Received: from sss2.sss.pgh.pa.us (tgl@localhost [127.0.0.1])
by sss2.sss.pgh.pa.us (8.9.3/8.9.3) with ESMTP id XAA07099;
Wed, 21 Jun 2000 23:14:50 -0400 (EDT)
To: "Mikheev, Vadim" <vmikheev@SECTORBASE.COM>
cc: Thomas Lockhart <lockhart@alumni.caltech.edu>,
Bruce Momjian <pgman@candle.pha.pa.us>,
Peter Eisentraut <peter_e@gmx.net>, Jan Wieck <JanWieck@Yahoo.com>,
> I believe that we can avoid versions using WAL...
I don't think so. You're basically saying that
1. create file 'new'
2. delete file 'old'
3. rename 'new' to 'old'
is safe as long as you have a redo log to ensure that the rename
happens even if you crash between steps 2 and 3. But crash is not
the only hazard. What if step 3 just plain fails? Redo won't help.
I'm having a hard time inventing really plausible examples, but a
slightly implausible example is that someone chmod's the containing
directory -w between steps 2 and 3. (Maybe it's not so implausible
if you assume a crash after step 2 ... someone might have left the
directory nonwritable while restoring the system.)
If we use file version numbers, then the *only* thing needed to
make a valid transition between one set of files and another is
a commit of the update of pg_class that shows the new version number
in the rel's pg_class tuple. The worst that can happen to you in
a crash or other failure is that you are unable to get rid of the
set of files that you don't want anymore. That might waste disk
space but it doesn't leave the database corrupted.
> But what about LOCATIONs? I object using environment and think that
> locations must be stored in pg_control..?
I don't like environment variables for this either; it's just way too
easy to start the postmaster with wrong environment. It still seems
to me that relying on subdirectory symlinks is a good way to go.
pg_control is not so good --- if it gets corrupted, how do you recover?
symlinks can be recreated by hand if necessary, but...
regards, tom lane
From pgsql-hackers-owner+M3711@hub.org Thu Jun 22 01:01:06 2000
Received: from renoir.op.net (root@renoir.op.net [207.29.195.4])
by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id BAA22245
for <pgman@candle.pha.pa.us>; Thu, 22 Jun 2000 01:01:02 -0400 (EDT)
Received: from hub.org (root@hub.org [216.126.84.1]) by renoir.op.net (o1/$Revision: 1.3 $) with ESMTP id AAA18310 for <pgman@candle.pha.pa.us>; Thu, 22 Jun 2000 00:43:00 -0400 (EDT)
Received: from hub.org (majordom@localhost [127.0.0.1])
by hub.org (8.10.1/8.10.1) with SMTP id e5M3US167109;
Wed, 21 Jun 2000 23:30:28 -0400 (EDT)
Received: from sss2.sss.pgh.pa.us (sss.pgh.pa.us [209.114.166.2])
by hub.org (8.10.1/8.10.1) with ESMTP id e5M3U0164115
for <pgsql-hackers@postgresql.org>; Wed, 21 Jun 2000 23:30:00 -0400 (EDT)
Received: from sss2.sss.pgh.pa.us (tgl@localhost [127.0.0.1])
by sss2.sss.pgh.pa.us (8.9.3/8.9.3) with ESMTP id XAA07156;
"Unique ID" is more or less equivalent to "OID + version number",
right?
I was trying earlier to convince myself that a single unique-ID value
would be better than OID+version for the smgr interface, because it'd
certainly be easier to pass around. I failed to convince myself though,
and the thing that bothered me was this. Suppose you are trying to
recover a corrupted database manually, and the only information you have
about which table is which is a somewhat out-of-date listing of OIDs
versus table names. (Maybe it's out of date because you got it from
your last backup tape.) If the files are named OID+version you're not
going to have much trouble seeing which is which, even if some of the
versions are higher than what was on the tape. But if version-updated
tables are given entirely new unique IDs, you've got no hope at all of
telling which one corresponds to what you had in the listing. Maybe
you can tell by looking through the physical file contents, but
certainly this way is more fragile from the point of view of data
recovery.
regards, tom lane
From tgl@sss.pgh.pa.us Thu Jun 22 01:01:00 2000
Received: from renoir.op.net (root@renoir.op.net [207.29.195.4])
by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id BAA22232;
Thu, 22 Jun 2000 01:00:59 -0400 (EDT)
Received: from sss2.sss.pgh.pa.us (sss.pgh.pa.us [209.114.166.2]) by renoir.op.net (o1/$Revision: 1.3 $) with ESMTP id AAA17842; Thu, 22 Jun 2000 00:31:06 -0400 (EDT)
Received: from sss2.sss.pgh.pa.us (tgl@localhost [127.0.0.1])
by sss2.sss.pgh.pa.us (8.9.3/8.9.3) with ESMTP id AAA07254;
> I strongly object to keep tablespace OID for smgr file reference token
> though we have to keep it for another purpose of cource. I've mentioned
> many times tablespace(where to store) info should be distinguished from
> *where it is stored* info.
Sure. But this proposal assumes that we're relying on symlinks to
carry the information about physical locations corresponding to
tablespace OIDs. The backend just needs to know enough to access a
relation file at a relative pathname like
tablespaceOID/relationOID
(ignoring version and segment numbers for now). Under the hood,
a symlink for tablespaceOID gets the work done.
Certainly this is not a perfect mechanism. But it is simple, it
is reliable, it is portable to most of the platforms we care about
(yeah, I know we have a Win port, but you wouldn't ever recommend
someone to run a *serious* database on it would you?), and in general
I think the bang-for-the-buck ratio is enormous. I do not want to
have to deal with explicit tablespace bookkeeping in the backend,
but that seems like what we'd have to do in order to improve on
symlinks.
regards, tom lane
From pgsql-hackers-owner+M3720@hub.org Thu Jun 22 02:01:02 2000
Received: from renoir.op.net (root@renoir.op.net [207.29.195.4])
by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id CAA24025
for <pgman@candle.pha.pa.us>; Thu, 22 Jun 2000 02:01:02 -0400 (EDT)
Received: from hub.org (root@hub.org [216.126.84.1]) by renoir.op.net (o1/$Revision: 1.3 $) with ESMTP id BAA21392 for <pgman@candle.pha.pa.us>; Thu, 22 Jun 2000 01:56:49 -0400 (EDT)
Received: from hub.org (majordom@localhost [127.0.0.1])
by hub.org (8.10.1/8.10.1) with SMTP id e5M5jp143149;
Thu, 22 Jun 2000 01:45:51 -0400 (EDT)
Received: from smtp.pacifier.com (comet.pacifier.com [199.2.117.155])
by hub.org (8.10.1/8.10.1) with ESMTP id e5M5jT143025
for <pgsql-hackers@postgreSQL.org>; Thu, 22 Jun 2000 01:45:29 -0400 (EDT)
Received: from desktop (dsl-dhogaza.pacifier.net [207.202.226.68])
by smtp.pacifier.com (8.9.3/8.9.3pop) with SMTP id WAA11735;
>I'm wondering if pg_dump should store the location of the tablespace. If
>your machine dies, you get a new machine to re-create the database, you
>may not want the tablespace in the same spot. And text-editing a
>gigabyte file would be extremely painful.
So you don't dump your create tablespace statements, recognizing that on
a new machine (due to upgrades or crashing) you might assign them to
different directories/mount points/whatever. That's the reason for
wanting to hide physical allocation in tablespaces ... the rest of
your datamodel doesn't need to know.
Or you do dump your tablespaces, and knowing the paths assigned
to various ones set up your new machine accordingly.
- Don Baccus, Portland OR <dhogaza@pacifier.com>
Nature photos, on-line guides, Pacific Northwest
Rare Bird Alert Service and other goodies at
http://donb.photo.net.
From dhogaza@pacifier.com Thu Jun 22 02:00:58 2000
Received: from renoir.op.net (root@renoir.op.net [207.29.195.4])
by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id CAA24005
for <pgman@candle.pha.pa.us>; Thu, 22 Jun 2000 02:00:58 -0400 (EDT)
Received: from smtp.pacifier.com (comet.pacifier.com [199.2.117.155]) by renoir.op.net (o1/$Revision: 1.3 $) with ESMTP id BAA21369 for <pgman@candle.pha.pa.us>; Thu, 22 Jun 2000 01:56:18 -0400 (EDT)
Received: from desktop (dsl-dhogaza.pacifier.net [207.202.226.68])
by smtp.pacifier.com (8.9.3/8.9.3pop) with SMTP id WAA12121;
>If the symlink create fails in CREATE TABLESPACE, it just creates an
>ordinary directory.
Silent surprises - the earmark of truly professional software ...
- Don Baccus, Portland OR <dhogaza@pacifier.com>
Nature photos, on-line guides, Pacific Northwest
Rare Bird Alert Service and other goodies at
http://donb.photo.net.
From Inoue@tpf.co.jp Thu Jun 22 02:01:00 2000
Received: from renoir.op.net (root@renoir.op.net [207.29.195.4])
by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id CAA24009
for <pgman@candle.pha.pa.us>; Thu, 22 Jun 2000 02:00:59 -0400 (EDT)
Received: from sd.tpf.co.jp (sd.tpf.co.jp [210.161.239.34]) by renoir.op.net (o1/$Revision: 1.3 $) with ESMTP id BAA21277 for <pgman@candle.pha.pa.us>; Thu, 22 Jun 2000 01:54:44 -0400 (EDT)
Received: from cadzone ([126.0.1.40] (may be forged))
by sd.tpf.co.jp (2.5 Build 2640 (Berkeley 8.8.6)/8.8.4) with SMTP
X-Mailer: Microsoft Outlook 8.5, Build 4.71.2173.0
Importance: Normal
In-Reply-To: <7251.961648182@sss.pgh.pa.us>
X-MimeOLE: Produced By Microsoft MimeOLE V5.00.2314.1300
Status: OR
> -----Original Message-----
> From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
>
> "Hiroshi Inoue" <Inoue@tpf.co.jp> writes:
> > I strongly object to keep tablespace OID for smgr file reference token
> > though we have to keep it for another purpose of cource. I've mentioned
> > many times tablespace(where to store) info should be distinguished from
> > *where it is stored* info.
>
> Sure. But this proposal assumes that we're relying on symlinks to
> carry the information about physical locations corresponding to
> tablespace OIDs. The backend just needs to know enough to access a
> relation file at a relative pathname like
> tablespaceOID/relationOID
> (ignoring version and segment numbers for now). Under the hood,
> a symlink for tablespaceOID gets the work done.
>
I think tablespaceOID is an easy substitution for the purpose.
I don't like to depend on poor directory tree structure in dbms
either..
> Certainly this is not a perfect mechanism. But it is simple, it
> is reliable, it is portable to most of the platforms we care about
> (yeah, I know we have a Win port, but you wouldn't ever recommend
> someone to run a *serious* database on it would you?), and in general
> I think the bang-for-the-buck ratio is enormous. I do not want to
> have to deal with explicit tablespace bookkeeping in the backend,
> but that seems like what we'd have to do in order to improve on
> symlinks.
>
I've already mentioned about it 10 times or so but unfortunately
I see no one on my side yet.
OK,I've given up the discussion about it. I don't want to waste
my time any more.
Regards.
Hiroshi Inoue
Inoue@tpf.co.jp
From tgl@sss.pgh.pa.us Thu Jun 22 03:31:04 2000
Received: from renoir.op.net (root@renoir.op.net [207.29.195.4])
by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id DAA28813
for <pgman@candle.pha.pa.us>; Thu, 22 Jun 2000 03:31:03 -0400 (EDT)
Received: from sss2.sss.pgh.pa.us (sss.pgh.pa.us [209.114.166.2]) by renoir.op.net (o1/$Revision: 1.3 $) with ESMTP id DAA23901 for <pgman@candle.pha.pa.us>; Thu, 22 Jun 2000 03:06:47 -0400 (EDT)
Received: from sss2.sss.pgh.pa.us (tgl@localhost [127.0.0.1])
by sss2.sss.pgh.pa.us (8.9.3/8.9.3) with ESMTP id DAA07725;
Thu, 22 Jun 2000 03:05:00 -0400 (EDT)
To: Chris Bitmead <chrisb@nimrod.itg.telstra.com.au>
Comments: In-reply-to Chris Bitmead <chrisb@nimrod.itg.telstra.com.au>
message dated "Thu, 22 Jun 2000 13:43:56 +1000"
Date: Thu, 22 Jun 2000 03:05:00 -0400
Message-ID: <7722.961657500@sss.pgh.pa.us>
From: Tom Lane <tgl@sss.pgh.pa.us>
Status: OR
Chris Bitmead <chrisb@nimrod.itg.telstra.com.au> writes:
> I'm wondering if pg_dump should store the location of the tablespace. If
> your machine dies, you get a new machine to re-create the database, you
> may not want the tablespace in the same spot. And text-editing a
> gigabyte file would be extremely painful.
Might make sense to store the tablespace setup separately from the bulk
of the data, but certainly you want some way to dump that info in a
restorable form.
I've been thinking lately that the pg_dump shove-it-all-in-one-file
approach doesn't scale anyway. We ought to start thinking about ways
to make the standard dump method store schema separately from bulk
data, for example. That's offtopic for this thread but ought to be
on the TODO list someplace...
regards, tom lane
From pgsql-hackers-owner+M3727@hub.org Thu Jun 22 03:31:06 2000
Received: from renoir.op.net (root@renoir.op.net [207.29.195.4])
by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id DAA28819
for <pgman@candle.pha.pa.us>; Thu, 22 Jun 2000 03:31:05 -0400 (EDT)
Received: from hub.org (root@hub.org [216.126.84.1]) by renoir.op.net (o1/$Revision: 1.3 $) with ESMTP id DAA24751 for <pgman@candle.pha.pa.us>; Thu, 22 Jun 2000 03:29:00 -0400 (EDT)
Received: from hub.org (majordom@localhost [127.0.0.1])
by hub.org (8.10.1/8.10.1) with SMTP id e5M7KP140211;
Thu, 22 Jun 2000 03:20:25 -0400 (EDT)
Received: from sss2.sss.pgh.pa.us (sss.pgh.pa.us [209.114.166.2])
by hub.org (8.10.1/8.10.1) with ESMTP id e5M7Jb139991
for <pgsql-hackers@postgresql.org>; Thu, 22 Jun 2000 03:19:37 -0400 (EDT)
Received: from sss2.sss.pgh.pa.us (tgl@localhost [127.0.0.1])
by sss2.sss.pgh.pa.us (8.9.3/8.9.3) with ESMTP id DAA07785;
Comments: In-reply-to "Philip J. Warner" <pjw@rhyme.com.au>
message dated "Thu, 22 Jun 2000 16:31:33 +1000"
Date: Thu, 22 Jun 2000 03:17:45 -0400
Message-ID: <7782.961658265@sss.pgh.pa.us>
From: Tom Lane <tgl@sss.pgh.pa.us>
X-Mailing-List: pgsql-hackers@postgresql.org
Precedence: bulk
Sender: pgsql-hackers-owner@hub.org
Status: OR
"Philip J. Warner" <pjw@rhyme.com.au> writes:
>> ... the thing that bothered me was this. Suppose you are trying to
>> recover a corrupted database manually, and the only information you have
>> about which table is which is a somewhat out-of-date listing of OIDs
>> versus table names.
> This worries me a little; in the Dec/RDB world it is a very long time since
> database backups were done by copying the files. There is a database
> backup/restore utility which runs while the database is on-line and makes
> sure a valid snapshot is taken. Backing up storage areas (table spapces)
> can be done separately by the same utility, and again, it records enough
> information to ensure integrity. Maybe the thing to do is write a pg_backup
> utility, which in a first pass could, presumably, be synonymous with pg_dump?
pg_dump already does the consistent-snapshot trick (it just has to run
inside a single transaction).
> Am I missing something here? Is there a problem with backing up using
> 'pg_dump | gzip'?
None, as long as your ambition extends no further than restoring your
data to where it was at your last pg_dump. I was thinking about the
all-too-common-in-the-real-world scenario where you're hoping to recover
some data more recent than your last backup from the fractured shards
of your database...
regards, tom lane
From zeugswettera@wien.spardat.at Thu Jun 22 05:01:11 2000
Received: from renoir.op.net (root@renoir.op.net [207.29.195.4])
by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id FAA29525
for <pgman@candle.pha.pa.us>; Thu, 22 Jun 2000 05:01:09 -0400 (EDT)
Received: from gandalf.it-austria.net (gandalf.it-austria.net [213.150.1.65]) by renoir.op.net (o1/$Revision: 1.3 $) with ESMTP id EAA27070 for <pgman@candle.pha.pa.us>; Thu, 22 Jun 2000 04:38:32 -0400 (EDT)
Received: from peligor.server.lan.at (peligor.server.lan.at [10.8.32.84])
by gandalf.it-austria.net (xxx/xxx) with ESMTP id KAA23252;
Thu, 22 Jun 2000 10:37:45 +0200
Received: from zeus (totalctlh1-port029.f000.d0188.sd.spardat.at [10.8.35.226])
by peligor.server.lan.at (8.9.1/8.9.1) with SMTP id KAA02457;
Thu, 22 Jun 2000 10:41:04 GMT
From: Zeugswetter Andreas SB <zeugswettera@wien.spardat.at>
To: Chris Bitmead <chrisb@nimrod.itg.telstra.com.au>,
> > pg_dump would recreate a CREATE TABLESPACE command:
> >
> > printf("CREATE TABLESPACE %s USING %s", loc, symloc);
> >
> > where symloc would be SELECT symloc(loc) and return the value into a
> > variable that is used by pg_dump. The backend would do the lstat() and
> > return the value to the client.
>
> I'm wondering if pg_dump should store the location of the tablespace. If
> your machine dies, you get a new machine to re-create the database, you
> may not want the tablespace in the same spot. And text-editing a
> gigabyte file would be extremely painful.
Yes, that seems like a valid concern that should be kept in mind.
It should also be possible to restore a pg instance to a different location
on the same machine.
Maybe this could be done by adding a utility that dumps all tablespace
info which could then be altered to desire.
I still opt for instance-wide tablespaces. People wanting separation can easily
create different tablespaces for each database, but those that only want to
separate data and index need only create two tablespaces. A typical installation would
have 1 to 4 tablespaces (systemtbs, datatbs, indextbs, toasttbs | lobdbs )
I would also switch the directory structure between dbname and extent subdir,
because that allows less symlinks/filesystems, and thus less admin.
thus you would have:
tablespace1/extent1/dbname1
tablespace1/extent2/dbname1
tablespace1/extent1/dbname2
Andreas
From pjw@rhyme.com.au Thu Jun 22 04:01:05 2000
Received: from renoir.op.net (root@renoir.op.net [207.29.195.4])
by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id EAA29060
for <pgman@candle.pha.pa.us>; Thu, 22 Jun 2000 04:01:03 -0400 (EDT)
Received: from acheron.rime.com.au (root@albatr.lnk.telstra.net [139.130.54.222]) by renoir.op.net (o1/$Revision: 1.3 $) with ESMTP id DAA25604 for <pgman@candle.pha.pa.us>; Thu, 22 Jun 2000 03:50:30 -0400 (EDT)
Received: from oberon (Oberon.rime.com.au [203.8.195.100])
by acheron.rime.com.au (8.9.3/8.9.3) with SMTP id RAA08811;
From pgsql-hackers-owner+M3730@hub.org Thu Jun 22 05:31:00 2000
Received: from renoir.op.net (root@renoir.op.net [207.29.195.4])
by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id FAA29741
for <pgman@candle.pha.pa.us>; Thu, 22 Jun 2000 05:31:00 -0400 (EDT)
Received: from hub.org (root@hub.org [216.126.84.1]) by renoir.op.net (o1/$Revision: 1.3 $) with ESMTP id FAA28478 for <pgman@candle.pha.pa.us>; Thu, 22 Jun 2000 05:18:37 -0400 (EDT)
Received: from hub.org (majordom@localhost [127.0.0.1])
by hub.org (8.10.1/8.10.1) with SMTP id e5M96W171286;
Thu, 22 Jun 2000 05:06:32 -0400 (EDT)
Received: from sd.tpf.co.jp (sd.tpf.co.jp [210.161.239.34])
by hub.org (8.10.1/8.10.1) with ESMTP id e5M96A168442
for <pgsql-hackers@postgresql.org>; Thu, 22 Jun 2000 05:06:10 -0400 (EDT)
Received: from cadzone ([126.0.1.40] (may be forged))
by sd.tpf.co.jp (2.5 Build 2640 (Berkeley 8.8.6)/8.8.4) with SMTP
X-MimeOLE: Produced By Microsoft MimeOLE V5.00.2314.1300
X-Mailing-List: pgsql-hackers@postgresql.org
Precedence: bulk
Sender: pgsql-hackers-owner@hub.org
Status: OR
> -----Original Message-----
> From: Peter Eisentraut [mailto:e99re41@DoCS.UU.SE]
>
> > My opinion
> > 3) database and tablespace are relatively irrelevant.
> > I assume PostgreSQL's database would correspond
> > to the concept of SCHEMA.
>
> A database corresponds to a catalog and a schema corresponds to nothing
> yet.
>
Oh I see your point. However I've thought that current PostgreSQL's
database is an imcomplete SCHEMA and still feel so in reality.
Catalog per database has been nothing but needless for me from
the first.
Regards.
Hiroshi Inoue
Inoue@tpf.co.jp
From Inoue@tpf.co.jp Thu Jun 22 07:31:01 2000
Received: from renoir.op.net (root@renoir.op.net [207.29.195.4])
by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id HAA07559
for <pgman@candle.pha.pa.us>; Thu, 22 Jun 2000 07:31:00 -0400 (EDT)
Received: from sd.tpf.co.jp (sd.tpf.co.jp [210.161.239.34]) by renoir.op.net (o1/$Revision: 1.3 $) with ESMTP id HAA02741 for <pgman@candle.pha.pa.us>; Thu, 22 Jun 2000 07:08:29 -0400 (EDT)
Received: from cadzone ([126.0.1.40] (may be forged))
by sd.tpf.co.jp (2.5 Build 2640 (Berkeley 8.8.6)/8.8.4) with SMTP
X-Mailer: Microsoft Outlook 8.5, Build 4.71.2173.0
Importance: Normal
In-Reply-To: <7153.961644430@sss.pgh.pa.us>
X-MimeOLE: Produced By Microsoft MimeOLE V5.00.2314.1300
Status: OR
> -----Original Message-----
> From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
>
> "Hiroshi Inoue" <Inoue@tpf.co.jp> writes:
> > Please add my opinion to the list.
> > Unique-id filename: Hiroshi
> > (Unqiue-id is irrelevant to OID/relname).
>
> "Unique ID" is more or less equivalent to "OID + version number",
> right?
>
Hmm,no one seems to be on my side at this point also.
OK,I change my mind as follows.
OID except cygwin,unique-id on cygwin
Regards.
Hiroshi Inoue
Inoue@tpf.co.jp
From tgl@sss.pgh.pa.us Thu Jun 22 11:31:06 2000
Received: from renoir.op.net (root@renoir.op.net [207.29.195.4])
by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id LAA10544
for <pgman@candle.pha.pa.us>; Thu, 22 Jun 2000 11:31:05 -0400 (EDT)
Received: from sss2.sss.pgh.pa.us (sss.pgh.pa.us [209.114.166.2]) by renoir.op.net (o1/$Revision: 1.3 $) with ESMTP id LAA23513 for <pgman@candle.pha.pa.us>; Thu, 22 Jun 2000 11:28:53 -0400 (EDT)
Received: from sss2.sss.pgh.pa.us (tgl@localhost [127.0.0.1])
by sss2.sss.pgh.pa.us (8.9.3/8.9.3) with ESMTP id LAA08851;
We don't really want to do that, do we? That's a huge difference in
behavior to have in just one port --- especially a port that none of
the primary developers use (AFAIK anyway). The cygwin port's normal
state of existence will be "broken", surely, if we go that way.
Besides which, OID alone doesn't give us a possibility of file
versioning, and as I commented to Vadim I think we will want that,
WAL or no WAL. So it seems to me the two viable choices are
unique-id or OID+version-number. Either way, the file-naming behavior
should be the same across all platforms.
regards, tom lane
From vmikheev@SECTORBASE.COM Thu Jun 22 14:31:00 2000
Received: from renoir.op.net (root@renoir.op.net [207.29.195.4])
by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id OAA11892
for <pgman@candle.pha.pa.us>; Thu, 22 Jun 2000 14:30:59 -0400 (EDT)
Received: from sectorbase2.sectorbase.com ([208.48.122.131]) by renoir.op.net (o1/$Revision: 1.3 $) with ESMTP id OAA10107 for <pgman@candle.pha.pa.us>; Thu, 22 Jun 2000 14:17:04 -0400 (EDT)
Received: by SECTORBASE2 with Internet Mail Service (5.5.2650.21)
> > I believe that we can avoid versions using WAL...
>
> I don't think so. You're basically saying that
> 1. create file 'new'
> 2. delete file 'old'
> 3. rename 'new' to 'old'
> is safe as long as you have a redo log to ensure that the rename
> happens even if you crash between steps 2 and 3. But crash is not
> the only hazard. What if step 3 just plain fails? Redo won't help.
Ok, ok. Let's use *unique* file name for each table version.
But after thinking, seems that I agreed with Hiroshi about using
*some unique id* for file names instead of oid+version: we could use
just DB' OID + this unique ID in log records to find table file - just
8 bytes.
So, add me to Hiroshi' camp... if Hiroshi is ready to implement new file
naming -:)
> > But what about LOCATIONs? I object using environment and think that
> > locations must be stored in pg_control..?
>
> I don't like environment variables for this either; it's just way too
> easy to start the postmaster with wrong environment. It still seems
> to me that relying on subdirectory symlinks is a good way to go.
I always thought so.
> pg_control is not so good --- if it gets corrupted, how do
> you recover?
Impossible to recover anyway - pg_control keeps last checkpoint pointer,
required for recovery. That's why Oracle recommends (requires?) at least
two copies of control file (and log too).
But what if log gets corrupted? Or file system (lost symlinks etc)?
One will have to use backup...
Vadim
From peter@localhost.its.uu.se Thu Jun 22 18:37:35 2000
Received: from renoir.op.net (root@renoir.op.net [207.29.195.4])
by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id SAA19684
for <pgman@candle.pha.pa.us>; Thu, 22 Jun 2000 18:37:34 -0400 (EDT)
Received: from merganser.its.uu.se (merganser.its.uu.se [130.238.6.236]) by renoir.op.net (o1/$Revision: 1.3 $) with ESMTP id SAA02841 for <pgman@candle.pha.pa.us>; Thu, 22 Jun 2000 18:31:53 -0400 (EDT)
Received: from regulus.student.UU.SE ([130.238.5.2]:37596 "EHLO
regulus.its.uu.se") by merganser.its.uu.se with ESMTP
id <S125060AbQFVW3s>; Fri, 23 Jun 2000 00:29:48 +0200
Received: from peter (helo=localhost)
by regulus.its.uu.se with local-esmtp (Exim 3.02 #2)
id 135FaG-00062q-00; Fri, 23 Jun 2000 00:36:28 +0200
Date: Fri, 23 Jun 2000 00:36:28 +0200 (CEST)
From: Peter Eisentraut <peter_e@gmx.net>
To: Tom Lane <tgl@sss.pgh.pa.us>
cc: Hiroshi Inoue <Inoue@tpf.co.jp>, Bruce Momjian <pgman@candle.pha.pa.us>,
> The idea was to put the main files in the directory, and create Extent2,
> Extent3 directories for the extents.
The reasoning was, that the database subdir should be below the extentdir,
so that creating different fs for each extent would be easier, and not
depend
on the database name.
It is easy to create fs for:
/var/myspace
or
/var/myspace[/extent1]
/var/myspace/extent2
but not if it has dbname in it.
Andreas
From ZeugswetterA@wien.spardat.at Thu Jun 29 06:34:49 2000
Received: from renoir.op.net (root@renoir.op.net [207.29.195.4])
by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id GAA25201
for <pgman@candle.pha.pa.us>; Thu, 29 Jun 2000 06:34:44 -0400 (EDT)
Received: from gandalf.it-austria.net (gandalf.it-austria.net [213.150.1.65]) by renoir.op.net (o1/$Revision: 1.3 $) with ESMTP id GAA00379 for <pgman@candle.pha.pa.us>; Thu, 29 Jun 2000 06:35:30 -0400 (EDT)
Received: from sdexcgtw01.f000.d0188.sd.spardat.at (sdgtw.sd.spardat.at [172.18.1.16])
by gandalf.it-austria.net (xxx/xxx) with ESMTP id MAA33950;
Thu, 29 Jun 2000 12:33:42 +0200
Received: by sdexcgtw01.f000.d0188.sd.spardat.at with Internet Mail Service (5.5.2448.0)
> Come to think of it, it would probably make sense to adapt the existing
> notion of "location" (cf initlocation script) into something meaning
> "directory that users are allowed to create tablespaces (including
> databases) in".
This is what I've been trying to push all along. But note that this
mechanism does allow multiple databases per location. :)
--
Peter Eisentraut Sernanders v<>g 10:115
peter_e@gmx.net 75262 Uppsala
http://yi.org/peter-e/ Sweden
From ZeugswetterA@wien.spardat.at Mon Jul 3 04:30:07 2000
Received: from renoir.op.net (root@renoir.op.net [207.29.195.4])
by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id EAA16088
for <pgman@candle.pha.pa.us>; Mon, 3 Jul 2000 04:30:05 -0400 (EDT)
Received: from gandalf.it-austria.net (gandalf.it-austria.net [213.150.1.65]) by renoir.op.net (o1/$Revision: 1.3 $) with ESMTP id EAA19031 for <pgman@candle.pha.pa.us>; Mon, 3 Jul 2000 04:30:07 -0400 (EDT)
Received: from sdexcgtw01.f000.d0188.sd.spardat.at (sdgtw.sd.spardat.at [172.18.1.16])
by gandalf.it-austria.net (xxx/xxx) with ESMTP id KAA28416;
Mon, 3 Jul 2000 10:28:06 +0200
Received: by sdexcgtw01.f000.d0188.sd.spardat.at with Internet Mail Service (5.5.2448.0)