Add TODO detail directory.

Bruce Momjian 1999-09-20 15:40:12 +00:00
parent 7559677551
commit 957e6a6921
19 changed files with 12082 additions and 0 deletions

2
doc/TODO.detail/README Normal file

@@ -0,0 +1,2 @@
These files are in standard Unix mailbox format and contain detailed
information related to the TODO list.

107
doc/TODO.detail/alpha Normal file

@@ -0,0 +1,107 @@
From owner-pgsql-hackers@hub.org Fri May 14 16:00:46 1999
Received: from renoir.op.net (root@renoir.op.net [209.152.193.4])
by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id QAA02173
for <maillist@candle.pha.pa.us>; Fri, 14 May 1999 16:00:44 -0400 (EDT)
Received: from hub.org (hub.org [209.167.229.1]) by renoir.op.net (o1/$Revision: 1.1 $) with ESMTP id QAA02824 for <maillist@candle.pha.pa.us>; Fri, 14 May 1999 16:00:45 -0400 (EDT)
Received: from hub.org (hub.org [209.167.229.1])
by hub.org (8.9.3/8.9.3) with ESMTP id PAA47798;
Fri, 14 May 1999 15:57:54 -0400 (EDT)
(envelope-from owner-pgsql-hackers@hub.org)
Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Fri, 14 May 1999 15:54:30 +0000 (EDT)
Received: (from majordom@localhost)
by hub.org (8.9.3/8.9.3) id PAA47191
for pgsql-hackers-outgoing; Fri, 14 May 1999 15:54:28 -0400 (EDT)
(envelope-from owner-pgsql-hackers@postgreSQL.org)
Received: from thelab.hub.org (nat194.147.mpoweredpc.net [142.177.194.147])
by hub.org (8.9.3/8.9.3) with ESMTP id PAA46457
for <pgsql-hackers@postgresql.org>; Fri, 14 May 1999 15:49:35 -0400 (EDT)
(envelope-from scrappy@hub.org)
Received: from localhost (scrappy@localhost)
by thelab.hub.org (8.9.3/8.9.1) with ESMTP id QAA16128;
Fri, 14 May 1999 16:49:44 -0300 (ADT)
(envelope-from scrappy@hub.org)
X-Authentication-Warning: thelab.hub.org: scrappy owned process doing -bs
Date: Fri, 14 May 1999 16:49:44 -0300 (ADT)
From: The Hermit Hacker <scrappy@hub.org>
To: pgsql-hackers@postgreSQL.org
cc: Jack Howarth <howarth@nitro.med.uc.edu>
Subject: [HACKERS] postgresql bug report (fwd)
Message-ID: <Pine.BSF.4.05.9905141649150.47191-100000@thelab.hub.org>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-pgsql-hackers@postgreSQL.org
Precedence: bulk
Status: ROr
Marc G. Fournier ICQ#7615664 IRC Nick: Scrappy
Systems Administrator @ hub.org
primary: scrappy@hub.org secondary: scrappy@{freebsd|postgresql}.org
---------- Forwarded message ----------
Date: Fri, 14 May 1999 14:50:58 -0400
From: Jack Howarth <howarth@nitro.med.uc.edu>
To: scrappy@hub.org
Subject: postgresql bug report
Marc,
In porting the RedHat 6.0 srpm set for a linuxppc release we
believe a bug has been identified in
the postgresql source for 6.5-0.beta1. Our development tools are as
follows...
glibc 2.1.1 pre 2
linux 2.2.6
egcs 1.1.2
the latest binutils snapshot
The bug that we see is that when egcs compiles postgresql at -O1 or
higher (-O0 is fine),
postgresql creates incorrectly formed databases such that when the user
does a destroydb
the database can not be destroyed. Franz Sirl has identified the problem
as follows...
it seems that this problem is a type casting/promotion bug in the
source. The
routine _bt_checkkeys() in backend/access/nbtree/nbtutils.c calls
int2eq() in
backend/utils/adt/int.c via a function pointer
*fmgr_faddr(&key[0].sk_func). As
the type information for int2eq is lost via the function pointer,
the compiler
passes 2 ints, but int2eq expects 2 (preformatted in a 32bit reg)
int16's.
This particular bug goes away, if I for example change int2eq to:
    bool
    int2eq(int32 arg1, int32 arg2)
    {
        return (int16)arg1 == (int16)arg2;
    }
This moves away the type casting/promotion "work" from caller to the
callee and
is probably the right thing to do for functions used via function
pointers.
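The callee-side narrowing described above can be sketched in isolation. This is only an illustration: the typedefs stand in for the backend's own, and `cmp_fn` and `call_cmp` are hypothetical names, not part of the PostgreSQL source or of fmgr.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the backend typedefs. */
typedef int16_t int16;
typedef int32_t int32;

/*
 * Generic signature seen by the caller: both arguments travel as
 * full-width ints, so the callee must not assume they were already
 * truncated to 16 bits.
 */
typedef bool (*cmp_fn) (int32, int32);

/* Callee-side narrowing: accept int32 and cast down before comparing. */
bool
int2eq(int32 arg1, int32 arg2)
{
    return (int16) arg1 == (int16) arg2;
}

/* Call through the pointer, as a generic dispatcher would (illustrative). */
bool
call_cmp(cmp_fn fn, int32 a, int32 b)
{
    return fn(a, b);
}
```

With this shape, `call_cmp(int2eq, 5, 5)` compares equal, and stray high bits in either argument are discarded by the cast inside the callee rather than by the caller.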
...because of the large number of changes required to do this, Franz
thought we should
pass this on to the postgresql maintainers for correction. Please feel
free to contact
Franz Sirl (Franz.Sirl-kernel@lauterbach.com) if you have any questions
on this bug
report.
--
------------------------------------------------------------------------------
Jack W. Howarth, Ph.D. 231 Bethesda Avenue
NMR Facility Director Cincinnati, Ohio 45267-0524
Dept. of Molecular Genetics phone: (513) 558-4420
Univ. of Cincinnati College of Medicine fax: (513) 558-8474

94
doc/TODO.detail/arrays Normal file

@@ -0,0 +1,94 @@
From owner-pgsql-hackers@hub.org Wed Nov 25 19:01:02 1998
Received: from renoir.op.net (root@renoir.op.net [209.152.193.4])
by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id TAA16399
for <maillist@candle.pha.pa.us>; Wed, 25 Nov 1998 19:01:01 -0500 (EST)
Received: from hub.org (majordom@hub.org [209.47.148.200]) by renoir.op.net (o1/$Revision: 1.1 $) with ESMTP id SAA05250 for <maillist@candle.pha.pa.us>; Wed, 25 Nov 1998 18:53:12 -0500 (EST)
Received: from localhost (majordom@localhost)
by hub.org (8.9.1/8.9.1) with SMTP id SAA17798;
Wed, 25 Nov 1998 18:49:38 -0500 (EST)
(envelope-from owner-pgsql-hackers@hub.org)
Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Wed, 25 Nov 1998 18:49:07 +0000 (EST)
Received: (from majordom@localhost)
by hub.org (8.9.1/8.9.1) id SAA17697
for pgsql-hackers-outgoing; Wed, 25 Nov 1998 18:49:06 -0500 (EST)
(envelope-from owner-pgsql-hackers@postgreSQL.org)
Received: from mail.enterprise.net (root@mail.enterprise.net [194.72.192.18])
by hub.org (8.9.1/8.9.1) with ESMTP id SAA17650;
Wed, 25 Nov 1998 18:48:55 -0500 (EST)
(envelope-from olly@lfix.co.uk)
Received: from linda.lfix.co.uk (root@max01-040.enterprise.net [194.72.197.40])
by mail.enterprise.net (8.8.5/8.8.5) with ESMTP id XAA20539;
Wed, 25 Nov 1998 23:48:52 GMT
Received: from linda.lfix.co.uk (olly@localhost [127.0.0.1])
by linda.lfix.co.uk (8.9.1a/8.9.1/Debian/GNU) with ESMTP id XAA12089;
Wed, 25 Nov 1998 23:48:52 GMT
Message-Id: <199811252348.XAA12089@linda.lfix.co.uk>
X-Mailer: exmh version 2.0.2 2/24/98 (debian)
X-URL: http://www.lfix.co.uk/oliver
X-face: "xUFVDj+ZJtL_IbURmI}!~xAyPC"Mrk=MkAm&tPQnNq(FWxv49R}\>0oI8VM?O2VY+N7@F-
KMLl*!h}B)u@TW|B}6<X<J|}QsVlTi:RA:O7Abc(@D2Y/"J\S,b1!<&<B/J}b.Ii9@B]H6V!+#sE0Q
_+=`K$5TI|4I0-=Cp%pt~L#QYydO'iBXR~\tT?uftep9n9AF`@SzTwsw6uqJ}pL,h(cZi}T#PB"#!k
p^e=Z.K~fuw$l?]lUV)?R]U}l;f*~Ol)#fpKR)Yt}XOr6BI\_Jjr0!@GMnpCTnTym4f;c{;Ms=0{`D
Lq9MO6{wj%s-*N"G,g
To: bugs@postgreSQL.org, hackers@postgreSQL.org
Subject: [HACKERS] Failures with arrays
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Date: Wed, 25 Nov 1998 23:48:51 +0000
From: "Oliver Elphick" <olly@lfix.co.uk>
Sender: owner-pgsql-hackers@postgreSQL.org
Precedence: bulk
Status: ROr
This was reported as a bug with the Debian package of 6.3.2; the same
behaviour is still present in 6.4.
bray=> create table foo ( t text[]);
CREATE
bray=> insert into foo values ( '{"a"}');
INSERT 201354 1
bray=> insert into foo values ( '{"a","b"}');
INSERT 201355 1
bray=> insert into foo values ( '{"a","b","c"}');
INSERT 201356 1
bray=> select * from foo;
t
-------------
{"a"}
{"a","b"}
{"a","b","c"}
(3 rows)
bray=> select t[1] from foo;
ERROR: type name lookup of t failed
bray=> select * from foo;
t
-------------
{"a"}
{"a","b"}
{"a","b","c"}
(3 rows)
bray=> select foo.t[1] from foo;
t
-
a
a
a
(3 rows)
bray=> select count(foo.t[1]) from foo;
pqReadData() -- backend closed the channel unexpectedly.
--
Oliver Elphick Oliver.Elphick@lfix.co.uk
Isle of Wight http://www.lfix.co.uk/oliver
PGP key from public servers; key ID 32B8FAA1
========================================
"Let us therefore come boldly unto the throne of grace,
that we may obtain mercy, and find grace to help in
time of need." Hebrews 4:16

1556
doc/TODO.detail/cnfify Normal file

File diff suppressed because it is too large

351
doc/TODO.detail/flock Normal file

@@ -0,0 +1,351 @@
From tgl@sss.pgh.pa.us Sun Aug 30 11:25:23 1998
Received: from sss.sss.pgh.pa.us (sss.pgh.pa.us [206.210.65.6])
by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id LAA12607
for <maillist@candle.pha.pa.us>; Sun, 30 Aug 1998 11:25:20 -0400 (EDT)
Received: from sss.sss.pgh.pa.us (localhost [127.0.0.1])
by sss.sss.pgh.pa.us (8.9.1/8.9.1) with ESMTP id LAA15788;
Sun, 30 Aug 1998 11:23:38 -0400 (EDT)
To: Bruce Momjian <maillist@candle.pha.pa.us>
cc: dz@cs.unitn.it (Massimo Dal Zotto), hackers@postgreSQL.org
Subject: Re: [HACKERS] flock patch breaks things here
In-reply-to: Your message of Sun, 30 Aug 1998 08:19:52 -0400 (EDT)
<199808301219.IAA08821@candle.pha.pa.us>
Date: Sun, 30 Aug 1998 11:23:38 -0400
Message-ID: <15786.904490618@sss.pgh.pa.us>
From: Tom Lane <tgl@sss.pgh.pa.us>
Status: RO
Bruce Momjian <maillist@candle.pha.pa.us> writes:
> Can't we just have configure check for flock()? Another idea is to
> create a 'pid' file in the pgsql/data/base directory, and do a kill -0
> to see if it is still running before removing the lock.
The latter approach is what I was going to suggest. Writing a pid file
would be a fine idea anyway --- for one thing, it makes it a lot easier
to write a "kill the postmaster" script. Given that the postmaster
should write a pid file, a new postmaster should look for an existing
pid file, and try to do a kill(pid, 0) on the number contained therein.
If this doesn't return an error, then you figure there is already a
postmaster running, complain, and exit. Otherwise you figure you is it,
(re)write the pid file and away you go. Then pqcomm.c can just
unconditionally delete any old file that's in the way of making the
pipe.
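The pid file interlock outlined above might look roughly like this. It is a sketch under assumptions, not the postmaster's code: the function names are hypothetical, the file holds a single decimal pid, and treating EPERM as "process exists" goes slightly beyond the plain "no error" test described in the message.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Return 1 if the pid file names a process that still exists, 0 otherwise.
 * A missing or unparsable file counts as "not running".
 */
int
pidfile_owner_alive(const char *path)
{
    FILE *fp = fopen(path, "r");
    long  pid = 0;
    int   ok;

    if (fp == NULL)
        return 0;
    ok = (fscanf(fp, "%ld", &pid) == 1);
    fclose(fp);
    if (!ok || pid <= 0)
        return 0;

    /* Signal 0 only checks existence and permission; nothing is delivered. */
    if (kill((pid_t) pid, 0) == 0)
        return 1;

    /* Assumption: EPERM means the process exists but belongs to another user. */
    return errno == EPERM;
}

/*
 * Claim the pid file for the current process.
 * Return 0 on success, -1 if a live process already holds it or on I/O error.
 */
int
claim_pidfile(const char *path)
{
    FILE *fp;

    if (pidfile_owner_alive(path))
        return -1;              /* someone is already running: complain and exit */

    fp = fopen(path, "w");      /* (re)write the pid file */
    if (fp == NULL)
        return -1;
    fprintf(fp, "%ld\n", (long) getpid());
    return fclose(fp) == 0 ? 0 : -1;
}
```

Note that the check and the rewrite are separate steps, so two processes starting at the same instant could both pass the check. The sketch leaves that window open.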
The pidfile checking and creation probably ought to go in postmaster.c,
not down inside pqcomm.c. I never liked the fact that a critical
interlock function was being done by a low-level library that one might
not even want to invoke (if all your clients are using TCP, opening up
the Unix-domain socket is a waste of time, no?).
BTW, there is another problem with relying on flock on the socket file
for this purpose: it opens up a hole for a denial-of-service attack.
Anyone who can write the file can flock it. (We already had a problem
with DOS via creating a dummy file at /tmp/.s.PGSQL.5432, but it would
be harder to spot the culprit with an flock-based interference.)
regards, tom lane
From owner-pgsql-hackers@hub.org Sun Aug 30 12:27:41 1998
Received: from hub.org (hub.org [209.47.148.200])
by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id MAA12976
for <maillist@candle.pha.pa.us>; Sun, 30 Aug 1998 12:27:37 -0400 (EDT)
Received: from localhost (majordom@localhost) by hub.org (8.8.8/8.7.5) with SMTP id MAA09234; Sun, 30 Aug 1998 12:24:51 -0400 (EDT)
Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Sun, 30 Aug 1998 12:23:26 +0000 (EDT)
Received: (from majordom@localhost) by hub.org (8.8.8/8.7.5) id MAA09167 for pgsql-hackers-outgoing; Sun, 30 Aug 1998 12:23:25 -0400 (EDT)
Received: from mambo.cs.unitn.it (mambo.cs.unitn.it [193.205.199.204]) by hub.org (8.8.8/8.7.5) with SMTP id MAA09150 for <hackers@postgreSQL.org>; Sun, 30 Aug 1998 12:23:08 -0400 (EDT)
Received: from boogie.cs.unitn.it (dz@boogie [193.205.199.79]) by mambo.cs.unitn.it (8.6.12/8.6.12) with ESMTP id SAA29572; Sun, 30 Aug 1998 18:21:42 +0200
Received: (from dz@localhost) by boogie.cs.unitn.it (8.8.5/8.6.9) id SAA05993; Sun, 30 Aug 1998 18:21:41 +0200
From: Massimo Dal Zotto <dz@cs.unitn.it>
Message-Id: <199808301621.SAA05993@boogie.cs.unitn.it>
Subject: Re: [HACKERS] flock patch breaks things here
To: hackers@postgreSQL.org (PostgreSQL Hackers)
Date: Sun, 30 Aug 1998 18:21:41 +0200 (MET DST)
Cc: tgl@sss.pgh.pa.us (Tom Lane)
In-Reply-To: <15786.904490618@sss.pgh.pa.us> from "Tom Lane" at Aug 30, 98 11:23:38 am
X-Mailer: ELM [version 2.4 PL24 ME4]
MIME-Version: 1.0
Content-Type: text/plain; charset=iso-8859-1
Content-Transfer-Encoding: 8bit
Sender: owner-pgsql-hackers@hub.org
Precedence: bulk
Status: ROr
>
> Bruce Momjian <maillist@candle.pha.pa.us> writes:
> > Can't we just have configure check for flock()? Another idea is to
> > create a 'pid' file in the pgsql/data/base directory, and do a kill -0
> > to see if it is still running before removing the lock.
>
> The latter approach is what I was going to suggest. Writing a pid file
> would be a fine idea anyway --- for one thing, it makes it a lot easier
> to write a "kill the postmaster" script. Given that the postmaster
> should write a pid file, a new postmaster should look for an existing
> pid file, and try to do a kill(pid, 0) on the number contained therein.
> If this doesn't return an error, then you figure there is already a
> postmaster running, complain, and exit. Otherwise you figure you is it,
> (re)write the pid file and away you go. Then pqcomm.c can just
> unconditionally delete any old file that's in the way of making the
> pipe.
>
> The pidfile checking and creation probably ought to go in postmaster.c,
> not down inside pqcomm.c. I never liked the fact that a critical
> interlock function was being done by a low-level library that one might
> not even want to invoke (if all your clients are using TCP, opening up
> the Unix-domain socket is a waste of time, no?).
>
> BTW, there is another problem with relying on flock on the socket file
> for this purpose: it opens up a hole for a denial-of-service attack.
> Anyone who can write the file can flock it. (We already had a problem
> with DOS via creating a dummy file at /tmp/.s.PGSQL.5432, but it would
> be harder to spot the culprit with an flock-based interference.)
This came to my mind, but I didn't think this would have happened so
quickly. In my opinion the socket and the pidfile should be created in a
directory owned by postgres, for example /tmp/.Pgsql-unix, as X does.
--
Massimo Dal Zotto
+----------------------------------------------------------------------+
| Massimo Dal Zotto email: dz@cs.unitn.it |
| Via Marconi, 141 phone: ++39-461-534251 |
| 38057 Pergine Valsugana (TN) www: http://www.cs.unitn.it/~dz/ |
| Italy pgp: finger dz@tango.cs.unitn.it |
+----------------------------------------------------------------------+
From owner-pgsql-hackers@hub.org Sun Aug 30 13:01:10 1998
Received: from renoir.op.net (root@renoir.op.net [209.152.193.4])
by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id NAA13785
for <maillist@candle.pha.pa.us>; Sun, 30 Aug 1998 13:01:09 -0400 (EDT)
Received: from hub.org (hub.org [209.47.148.200]) by renoir.op.net (o1/$Revision: 1.1 $) with ESMTP id MAA29386 for <maillist@candle.pha.pa.us>; Sun, 30 Aug 1998 12:58:24 -0400 (EDT)
Received: from localhost (majordom@localhost) by hub.org (8.8.8/8.7.5) with SMTP id MAA11406; Sun, 30 Aug 1998 12:54:48 -0400 (EDT)
Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Sun, 30 Aug 1998 12:52:22 +0000 (EDT)
Received: (from majordom@localhost) by hub.org (8.8.8/8.7.5) id MAA11310 for pgsql-hackers-outgoing; Sun, 30 Aug 1998 12:52:20 -0400 (EDT)
Received: from sss.sss.pgh.pa.us (sss.pgh.pa.us [206.210.65.6]) by hub.org (8.8.8/8.7.5) with ESMTP id MAA11296 for <hackers@postgreSQL.org>; Sun, 30 Aug 1998 12:52:13 -0400 (EDT)
Received: from sss.sss.pgh.pa.us (localhost [127.0.0.1])
by sss.sss.pgh.pa.us (8.9.1/8.9.1) with ESMTP id MAA16094;
Sun, 30 Aug 1998 12:50:55 -0400 (EDT)
To: Massimo Dal Zotto <dz@cs.unitn.it>
cc: hackers@postgreSQL.org (PostgreSQL Hackers)
Subject: Re: [HACKERS] flock patch breaks things here
In-reply-to: Your message of Sun, 30 Aug 1998 18:21:41 +0200 (MET DST)
<199808301621.SAA05993@boogie.cs.unitn.it>
Date: Sun, 30 Aug 1998 12:50:55 -0400
Message-ID: <16092.904495855@sss.pgh.pa.us>
From: Tom Lane <tgl@sss.pgh.pa.us>
Sender: owner-pgsql-hackers@hub.org
Precedence: bulk
Status: RO
Massimo Dal Zotto <dz@cs.unitn.it> writes:
> In my opinion the socket and the pidfile should be created in a
> directory owned by postgres, for example /tmp/.Pgsql-unix, as X does.
The pidfile belongs at the top level of the database directory (eg,
/usr/local/pgsql/data/postmaster.pid), because what it actually
represents is that there is a postmaster running *for that database
group*.
If you want to support multiple database sets on one machine (which I
do), then the interlock has to be per database directory. Putting the
pidfile into a common directory would mean we'd have to invent some
kind of pidfile naming convention to keep multiple postmasters from
tromping on each other. This is unnecessarily complex.
I agree with you that putting the socket file into a less easily munged
directory than /tmp would be a good idea for security. But that's a
separate issue. On machines that understand stickybits for directories,
the security hole is not really very big.
At this point, the fact that /tmp/.s.PGSQL.port# is the socket path is
effectively a version-independent aspect of the FE/BE protocol, and so
we can't change it without breaking old applications. I'm not sure that
that's worth the security improvement.
What I'd like to see someday is a postmaster command line switch to tell
it to use *only* TCP connections and not create a Unix socket at all.
That hasn't been possible so far, because we were relying on the socket
file to provide a safety interlock against starting multiple
postmasters. But an interlock using a pidfile would be much better.
(Look around; *every* other Unix daemon I know of that wants to ensure
that there's only one of it uses a pidfile interlock. Not file locking.
There's a reason why that's the well-trodden path.)
regards, tom lane
From owner-pgsql-hackers@hub.org Sun Aug 30 15:31:13 1998
Received: from hub.org (hub.org [209.47.148.200])
by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id PAA15275
for <maillist@candle.pha.pa.us>; Sun, 30 Aug 1998 15:31:11 -0400 (EDT)
Received: from localhost (majordom@localhost) by hub.org (8.8.8/8.7.5) with SMTP id PAA22194; Sun, 30 Aug 1998 15:27:20 -0400 (EDT)
Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Sun, 30 Aug 1998 15:23:58 +0000 (EDT)
Received: (from majordom@localhost) by hub.org (8.8.8/8.7.5) id PAA21800 for pgsql-hackers-outgoing; Sun, 30 Aug 1998 15:23:57 -0400 (EDT)
Received: from thelab.hub.org (nat0118.mpoweredpc.net [142.177.188.118]) by hub.org (8.8.8/8.7.5) with ESMTP id PAA21696 for <hackers@postgreSQL.org>; Sun, 30 Aug 1998 15:22:51 -0400 (EDT)
Received: from localhost (scrappy@localhost)
by thelab.hub.org (8.9.1/8.8.8) with SMTP id QAA18542;
Sun, 30 Aug 1998 16:21:29 -0300 (ADT)
(envelope-from scrappy@hub.org)
X-Authentication-Warning: thelab.hub.org: scrappy owned process doing -bs
Date: Sun, 30 Aug 1998 16:21:28 -0300 (ADT)
From: The Hermit Hacker <scrappy@hub.org>
To: Tom Lane <tgl@sss.pgh.pa.us>
cc: Massimo Dal Zotto <dz@cs.unitn.it>,
PostgreSQL Hackers <hackers@postgreSQL.org>
Subject: Re: [HACKERS] flock patch breaks things here
In-Reply-To: <16092.904495855@sss.pgh.pa.us>
Message-ID: <Pine.BSF.4.02.9808301618350.343-100000@thelab.hub.org>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-pgsql-hackers@hub.org
Precedence: bulk
Status: RO
On Sun, 30 Aug 1998, Tom Lane wrote:
> Massimo Dal Zotto <dz@cs.unitn.it> writes:
> > In my opinion the socket and the pidfile should be created in a
> > directory owned by postgres, for example /tmp/.Pgsql-unix, as X does.
>
> The pidfile belongs at the top level of the database directory (eg,
> /usr/local/pgsql/data/postmaster.pid), because what it actually
> represents is that there is a postmaster running *for that database
> group*.
I have to agree with this one...but then it also negates the
argument about the flock() DoS...*grin*
BTW...I like the kill(pid,0) solution myself, primarily because it
is, I think, the most portable solution.
I would not consider a patch to remove the flock() solution and
replace it with the kill(pid,0) solution a new feature, just an
improvement of an existing one...either way, moving the pid file (or
socket, for that matter) from /tmp should be listed as a security related
requirement for v6.4 :)
Marc G. Fournier
Systems Administrator @ hub.org
primary: scrappy@hub.org secondary: scrappy@{freebsd|postgresql}.org
From owner-pgsql-hackers@hub.org Sun Aug 30 22:41:10 1998
Received: from hub.org (hub.org [209.47.148.200])
by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id WAA01526
for <maillist@candle.pha.pa.us>; Sun, 30 Aug 1998 22:41:08 -0400 (EDT)
Received: from localhost (majordom@localhost) by hub.org (8.8.8/8.7.5) with SMTP id WAA29298; Sun, 30 Aug 1998 22:38:18 -0400 (EDT)
Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Sun, 30 Aug 1998 22:35:05 +0000 (EDT)
Received: (from majordom@localhost) by hub.org (8.8.8/8.7.5) id WAA29203 for pgsql-hackers-outgoing; Sun, 30 Aug 1998 22:35:03 -0400 (EDT)
Received: from sss.sss.pgh.pa.us (sss.pgh.pa.us [206.210.65.6]) by hub.org (8.8.8/8.7.5) with ESMTP id WAA29017 for <hackers@postgreSQL.org>; Sun, 30 Aug 1998 22:34:55 -0400 (EDT)
Received: from sss.sss.pgh.pa.us (localhost [127.0.0.1])
by sss.sss.pgh.pa.us (8.9.1/8.9.1) with ESMTP id WAA20075;
Sun, 30 Aug 1998 22:34:41 -0400 (EDT)
To: The Hermit Hacker <scrappy@hub.org>
cc: PostgreSQL Hackers <hackers@postgreSQL.org>
Subject: Re: [HACKERS] flock patch breaks things here
In-reply-to: Your message of Sun, 30 Aug 1998 16:21:28 -0300 (ADT)
<Pine.BSF.4.02.9808301618350.343-100000@thelab.hub.org>
Date: Sun, 30 Aug 1998 22:34:40 -0400
Message-ID: <20073.904530880@sss.pgh.pa.us>
From: Tom Lane <tgl@sss.pgh.pa.us>
Sender: owner-pgsql-hackers@hub.org
Precedence: bulk
Status: ROr
The Hermit Hacker <scrappy@hub.org> writes:
> either way, moving the pid file (or
> socket, for that matter) from /tmp should be listed as a security related
> requirement for v6.4 :)
Huh? There is no pid file being generated in /tmp (or anywhere else)
at the moment. If we do add one, it should not go into /tmp for the
reasons I gave before.
Where the Unix-domain socket file lives is an entirely separate issue.
If we move the socket out of /tmp then we have just kicked away all the
work we did to preserve backwards compatibility of the FE/BE protocol
with existing clients. Being able to talk to a 1.0 client isn't much
good if you aren't listening where he's going to try to contact you.
So I think I have to vote in favor of leaving the socket where it is.
regards, tom lane
From owner-pgsql-hackers@hub.org Mon Aug 31 11:31:19 1998
Received: from renoir.op.net (root@renoir.op.net [209.152.193.4])
by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id LAA21195
for <maillist@candle.pha.pa.us>; Mon, 31 Aug 1998 11:31:13 -0400 (EDT)
Received: from hub.org (hub.org [209.47.148.200]) by renoir.op.net (o1/$Revision: 1.1 $) with ESMTP id LAA06827 for <maillist@candle.pha.pa.us>; Mon, 31 Aug 1998 11:17:41 -0400 (EDT)
Received: from localhost (majordom@localhost) by hub.org (8.8.8/8.7.5) with SMTP id LAA24792; Mon, 31 Aug 1998 11:12:18 -0400 (EDT)
Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Mon, 31 Aug 1998 11:10:31 +0000 (EDT)
Received: (from majordom@localhost) by hub.org (8.8.8/8.7.5) id LAA24742 for pgsql-hackers-outgoing; Mon, 31 Aug 1998 11:10:29 -0400 (EDT)
Received: from trillium.nmsu.edu (trillium.NMSU.Edu [128.123.5.15]) by hub.org (8.8.8/8.7.5) with ESMTP id LAA24725 for <hackers@postgreSQL.org>; Mon, 31 Aug 1998 11:10:22 -0400 (EDT)
Received: (from brook@localhost)
by trillium.nmsu.edu (8.8.8/8.8.8) id JAA03282;
Mon, 31 Aug 1998 09:09:01 -0600 (MDT)
Date: Mon, 31 Aug 1998 09:09:01 -0600 (MDT)
Message-Id: <199808311509.JAA03282@trillium.nmsu.edu>
From: Brook Milligan <brook@trillium.NMSU.Edu>
To: tgl@sss.pgh.pa.us
CC: dg@informix.com, hackers@postgreSQL.org
In-reply-to: <23042.904573041@sss.pgh.pa.us> (message from Tom Lane on Mon, 31
Aug 1998 10:17:21 -0400)
Subject: Re: [HACKERS] flock patch breaks things here
References: <23042.904573041@sss.pgh.pa.us>
Sender: owner-pgsql-hackers@hub.org
Precedence: bulk
Status: ROr
I just came up with an idea that might help alleviate the /tmp security
exposure without creating a backwards-compatibility problem. It works
like this:
1. During installation, create a subdirectory of /tmp to hold Postgres'
socket files and associated pid lockfiles. This subdirectory should be
owned by the Postgres superuser and have permissions 755
(world-readable, writable only by Postgres superuser). Maybe call it
/tmp/.pgsql --- the name should start with a dot to keep it out of the
way. (Bruce points out that some systems clear /tmp during reboot, so
it might be that a postmaster will have to be prepared to recreate this
directory at startup --- anyone know if subdirectories of /tmp are
zapped too? My system doesn't do that...)
...
I notice that on my system, the X11 socket files in /tmp/.X11-unix are
actually symlinks to socket files in /usr/spool/sockets/X11. I dunno if
it's worth our trouble to get into putting our sockets under /usr/spool
or /var/spool or whatever --- seems like another configuration choice to
mess up. It'd be nice if the socket directory lived somewhere where the
parent dirs weren't world-writable, but this would mean one more thing
that you have to have root permissions for in order to install pgsql.
It seems like we need a directory for locks (= pid files) and one for
sockets (perhaps the same one). I strongly suggest that the location
for these be configurable. By default, it might make sense to put
them in ~pgsql/locks and ~pgsql/sockets. It is easy (i.e., I'll be
glad to do it) to modify configure.in to take options like
--lock-dir=/var/spool/lock
--socket-dir=/var/spool/sockets
that set cc defines and have the code respond accordingly. This way,
those who don't care (or don't have root access) can use the defaults,
whereas those with root access who like to keep locks and sockets in a
common place can do so easily. Either way, multiple postmasters (all
compiled with the same options of course) can check the appropriate
locks in the well-known places. Finally, drop the link into /tmp for
the old socket and document that it will be disappearing at some
point, and all is fine.
If someone wants to give me some guidance on what preprocessor
variables should be set in response to the above options (or something
like them), I'll do the configure stuff.
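One possible shape for the compile-time side of this proposal is sketched below. Everything named here is an assumption for illustration: the macro names `PG_LOCK_DIR` and `PG_SOCKET_DIR`, the default directories (a stand-in for ~pgsql/locks and ~pgsql/sockets), and the lock file naming. Only the `.s.PGSQL.<port>` socket file name comes from the discussion.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical macro names. A configure script could override them on the
 * compiler command line, e.g. -DPG_LOCK_DIR='"/var/spool/lock"'.
 */
#ifndef PG_LOCK_DIR
#define PG_LOCK_DIR "/usr/local/pgsql/locks"
#endif
#ifndef PG_SOCKET_DIR
#define PG_SOCKET_DIR "/usr/local/pgsql/sockets"
#endif

/* Build "<socket dir>/.s.PGSQL.<port>"; returns snprintf's result. */
int
socket_path(char *buf, size_t len, int port)
{
    return snprintf(buf, len, "%s/.s.PGSQL.%d", PG_SOCKET_DIR, port);
}

/* Build "<lock dir>/postmaster.<port>.pid" (this naming is an assumption). */
int
lock_path(char *buf, size_t len, int port)
{
    return snprintf(buf, len, "%s/postmaster.%d.pid", PG_LOCK_DIR, port);
}
```

Keeping the directories behind two macros means the rest of the code only ever asks for a path by port number, so a site that prefers /var/spool needs no source changes.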
Cheers,
Brook

69
doc/TODO.detail/fsync Normal file

@@ -0,0 +1,69 @@
From owner-pgsql-general@hub.org Fri Dec 18 06:31:23 1998
Received: from renoir.op.net (root@renoir.op.net [209.152.193.4])
by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id GAA05554
for <maillist@candle.pha.pa.us>; Fri, 18 Dec 1998 06:31:21 -0500 (EST)
Received: from hub.org (majordom@hub.org [209.47.145.100]) by renoir.op.net (o1/$Revision: 1.1 $) with ESMTP id EAA21127 for <maillist@candle.pha.pa.us>; Fri, 18 Dec 1998 04:46:38 -0500 (EST)
Received: from localhost (majordom@localhost)
by hub.org (8.9.1/8.9.1) with SMTP id EAA01409;
Fri, 18 Dec 1998 04:44:19 -0500 (EST)
(envelope-from owner-pgsql-general@hub.org)
Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Fri, 18 Dec 1998 04:43:22 +0000 (EST)
Received: (from majordom@localhost)
by hub.org (8.9.1/8.9.1) id EAA01093
for pgsql-general-outgoing; Fri, 18 Dec 1998 04:43:18 -0500 (EST)
(envelope-from owner-pgsql-general@postgreSQL.org)
Received: from dune.krs.ru (dune.krs.ru [195.161.16.38])
by hub.org (8.9.1/8.9.1) with ESMTP id EAA01067
for <pgsql-general@postgreSQL.org>; Fri, 18 Dec 1998 04:43:09 -0500 (EST)
(envelope-from vadim@krs.ru)
Received: from krs.ru (localhost.krs.ru [127.0.0.1])
by dune.krs.ru (8.8.8/8.8.7) with ESMTP id QAA16201;
Fri, 18 Dec 1998 16:41:44 +0700 (KRS)
(envelope-from vadim@krs.ru)
Message-ID: <367A2354.E998763@krs.ru>
Date: Fri, 18 Dec 1998 16:41:40 +0700
From: Vadim Mikheev <vadim@krs.ru>
Organization: OJSC Rostelecom (Krasnoyarsk)
X-Mailer: Mozilla 4.5 [en] (X11; I; FreeBSD 2.2.6-RELEASE i386)
X-Accept-Language: ru, en
MIME-Version: 1.0
To: Anton de Wet <adw@obsidian.co.za>
CC: pgsql-general@postgreSQL.org
Subject: Re: [GENERAL] Why PostgreSQL is better than other commerial softwares?
References: <Pine.LNX.4.04.9812181046030.9458-100000@ra.obsidian.co.za>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-pgsql-general@postgreSQL.org
Precedence: bulk
Status: RO
Anton de Wet wrote:
>
> >
> > Often quick mailing list support?
>
> :-)
>
> While on the subject I finally found the solution to a problem I (and one
> or two other people) posted about without answer. (So sometimes it's slow
> mailing list support).
>
> In importing about 5 million records (which I copy in blocks of 10000) the
> copy became linearly slower. After a friend RTFM and referred me, I used
> the -F switch (passed by the postmaster to the backend processes) and the
> time became linear and a LOT shorter. Import time for the 5000000 records
> is now the same (or maybe even slightly faster, I didn't accurately time
> them) as importing the data into Oracle on the same machine.
"While on the subject..." -:)
This is a problem in the buffer manager, known for a very long time:
when COPY eats all the buffers, the manager begins to write/fsync each
dirty buffer to free a buffer for new data. All updated relations
should be fsynced _once_ @ transaction commit. With that change, you
would get the same results without -F...
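The difference between the two patterns can be sketched outside the backend. This is a stand-alone illustration, not buffer manager code: `write_blocks`, the block size, and the single-file setup are all assumptions, and the final fsync merely stands in for "once at transaction commit".

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define BLOCK_SIZE 8192

/*
 * Write nblocks blocks to path. If sync_each is nonzero, fsync after every
 * block (the slow pattern); otherwise fsync once after the last block.
 * Returns the number of fsync calls made, or -1 on error.
 */
int
write_blocks(const char *path, int nblocks, int sync_each)
{
    char buf[BLOCK_SIZE];
    int  nsyncs = 0;
    int  fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);

    if (fd < 0)
        return -1;
    memset(buf, 'x', sizeof(buf));

    for (int i = 0; i < nblocks; i++)
    {
        if (write(fd, buf, sizeof(buf)) != (ssize_t) sizeof(buf))
        {
            close(fd);
            return -1;
        }
        if (sync_each)
        {
            /* One forced flush per dirty block. */
            if (fsync(fd) != 0)
            {
                close(fd);
                return -1;
            }
            nsyncs++;
        }
    }

    if (!sync_each)
    {
        /* Single flush covering everything written above. */
        if (fsync(fd) != 0)
        {
            close(fd);
            return -1;
        }
        nsyncs++;
    }
    return close(fd) == 0 ? nsyncs : -1;
}
```

For N blocks the first mode issues N fsync calls and the second issues one, which is the whole point of deferring the flush to commit time.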
I still have no time to implement this -:(
Vadim

332
doc/TODO.detail/lex Normal file

@@ -0,0 +1,332 @@
From selkovjr@mcs.anl.gov Sat Jul 25 05:31:05 1998
Received: from renoir.op.net (root@renoir.op.net [209.152.193.4])
by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id FAA16564
for <maillist@candle.pha.pa.us>; Sat, 25 Jul 1998 05:31:03 -0400 (EDT)
Received: from antares.mcs.anl.gov (mcs.anl.gov [140.221.9.6]) by renoir.op.net (o1/$Revision: 1.1 $) with SMTP id FAA01775 for <maillist@candle.pha.pa.us>; Sat, 25 Jul 1998 05:28:22 -0400 (EDT)
Received: from mcs.anl.gov (wit.mcs.anl.gov [140.221.5.148]) by antares.mcs.anl.gov (8.6.10/8.6.10) with ESMTP
id EAA28698 for <maillist@candle.pha.pa.us>; Sat, 25 Jul 1998 04:27:05 -0500
Sender: selkovjr@mcs.anl.gov
Message-ID: <35B9968D.21CF60A2@mcs.anl.gov>
Date: Sat, 25 Jul 1998 08:25:49 +0000
From: "Gene Selkov, Jr." <selkovjr@mcs.anl.gov>
Organization: MCS, Argonne Natl. Lab
X-Mailer: Mozilla 4.03 [en] (X11; I; Linux 2.0.32 i586)
MIME-Version: 1.0
To: Bruce Momjian <maillist@candle.pha.pa.us>
Subject: position-aware scanners
References: <199807250524.BAA07296@candle.pha.pa.us>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Status: RO
Bruce,
I attached here (through the web links) a couple of examples, totally
irrelevant to postgres but good enough to discuss token locations. I
might as well try to patch the backend parser, though not sure how soon.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1.
The first C parser I wrote,
http://wit.mcs.anl.gov/~selkovjr/unit-troff.tgz, is not very
sophisticated, so token locations reported by yyerror() may be slightly
incorrect (+/- one position, depending on the existence and type of the
lookahead token). It is a filter used to typeset the units of measurement
with eqn. To use it, unpack the tar file and run make. The Makefile is
not too generic but I built it on various systems including linux,
freebsd and sunos 4.3. The invocation can be something like this:
./check 0 parse "l**3/(mmoll*min)"
parse error, expecting `BASIC_UNIT' or `INTEGER' or `POSITIVE_NUMBER' or
`'(''
l**3/(mmoll*min)
^^^^^
Now to the guts. As far as I can imagine, the only way to consistently
keep track of each character read by the scanner (regardless of the
length of expressions it will match) is to redefine its YY_INPUT like
this:
    #undef YY_INPUT
    #define YY_INPUT(buf,result,max_size) \
    { \
        int c = (int) buffer[pos++]; \
        result = (c == '\0') ? YY_NULL : (buf[0] = c, 1); \
    }
Here, buffer is the pointer to the origin of the string being scanned
and pos is a global variable, similar in usage to a file pointer (you
can both read and manipulate it at will). The buffer and the pointer are
initialized by the function
    void setString(char *s)
    {
        buffer = s;
        pos = 0;
    }
each time the new string is to be parsed. This (exportable) function is
part of the interface.
In this simplistic design, yyerror() is part of the scanner module and
it uses the pos variable to report the location of unexpected tokens.
The downside of such an arrangement is that, in an error condition, you
can't easily tell whether the offending token is the current one or the
lookahead: yyerror() just reports the position of the last token read
(be it $ (end of buffer) or something else):
./check 0 convert "mol/foo"
parse error, expecting `BASIC_UNIT' or `INTEGER' or `POSITIVE_NUMBER' or
`'(''
mol/foo
^^^
(should be at the beginning of "foo")
./check 0 convert "mmol//l"
parse error, expecting `BASIC_UNIT' or `INTEGER' or `POSITIVE_NUMBER' or
`'(''
mmol//l
^
(should be at the second '/')
I believe this is why most simple parsers made with yacc report
parse errors as being "at or near" some token, which is fair enough if
the expression is not too complex.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
2. The second version of the same scanner,
http://wit.mcs.anl.gov/~selkovjr/scanner-example.tgz, addresses this
problem by recording exact locations of the tokens in each instance of
the token semantic data structure. The global,
UNIT_YYSTYPE unit_yylval;
would be normally used to export the token semantics (including its
original or modified text and location data) to the parser.
Unfortunately, I cannot show you the parser part in C, because that's
about when I stopped writing parsers in C. Instead, I included a small
test program, test.c, that mimics the parser's expectations for the
scanner data pretty well. I am assuming here that you are not interested
in digging through someone else's ugly guts for a relatively small bit of
information; let me know if I am wrong and I will send you the complete
Perl code (also generated with bison).
To run this example, unpack the tar file and run make. Then do
gcc test.c scanner.o
and run a.out
Note the line
yylval = unit_getyylval();
in test.c. You will not normally need it in a C parser. It is enough to
define yylval as an external variable and link it to yylval in yylex().
In the bison-generated parser, yylloc gets pushed onto a stack (pointed
to by yylsp) each time a new token is read. For each syntax rule, the
bison macros @1, @2, ... are just shortcuts to locations in the stack 1,
2, ... levels deep. In the following code fragment, @3 refers to the
location info for the third term in the rule (INTEGER):
(sorry about the Perl, but I think you can do the same things in C without
significant changes to your existing parser)
term: base {
          $$ = $1;
          $$->{'order'} = 1;
      }
    | base EXP INTEGER {
          $$ = $1;
          $$->{'order'} = @3->{'text'};
          $$->{'scale'} = $$->{'scale'} ** $$->{'order'};
          if ( $$->{'order'} == 0 ) {
              yyerror("Error: expecting a non-zero integer exponent");
              YYERROR;
          }
      }
which translates to:
($yyn == 10) && do {
    $yyval = $yyvsa[-1];
    $yyval->{'order'} = 1;
    last SWITCH;
};
($yyn == 11) && do {
    $yyval = $yyvsa[-3];
    $yyval->{'order'} = $yylsa[-1]->{'text'};
    $yyval->{'scale'} = $yyval->{'scale'} ** $yyval->{'order'};
    if ( $yyval->{'order'} == 0 ) {
        yyerror("Error: expecting a non-zero integer exponent");
        goto yyerrlab1 ;
    }
    last SWITCH;
};
In C, you will have somewhat more complicated pointer arithmetic to address
the stack, but the usage of objects will be the same. Note here that it
is convenient to keep all information about the token in its location
info (yylsa, yylsp, yylloc, @n), while everything relating to the value
of the expression, or to the parse tree, is better placed in the
semantic stack (yyvsa, yyvsp, yylval, $n). Also note that in some cases
you can do semantic checks inside rules and report useful messages
before or instead of invoking yyerror().
Finally, it is useful to make the following wrapper function around
external yylex() in order to maintain your own token stack. Unlike the
parser's internal stack which is only as deep as the rule being reduced,
this one can hold all tokens recognized during the current run, and that
can be extremely helpful for error reporting and any transformations you
may need. In this way, you can even scan (tokenize) the whole buffer
before handing it off to the parser (who knows, you may need a token
ahead of what is currently seen by the parser):
sub tokenize {
    undef @tokenTable;
    my ($tok, $text, $name, $unit, $first_line, $first_column,
        $last_line, $last_column);
    # this is where the c-coded yylex is called;
    # UnitLex is the perl extension encapsulating it
    while ( ($tok = &UnitLex::yylex()) > 0 ) {
        ( $text, $name, $unit, $first_line, $first_column, $last_line,
          $last_column ) = &UnitLex::getyylval;
        push(@tokenTable,
             Unit::yyltype->new (
                 'token'        => $tok,
                 'text'         => $text,
                 'name'         => $name,
                 'unit'         => $unit,
                 'first_line'   => $first_line,
                 'first_column' => $first_column,
                 'last_line'    => $last_line,
                 'last_column'  => $last_column,
             )
        );
    }
}
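For readers who would rather stay in C, the token-table idea can be sketched as below. This is an illustration under stated assumptions, not a translation of the Perl: struct token, enum token_kind and this tokenize() are hypothetical, and the lexical rules (words, digit runs, single punctuation characters) are far cruder than the real scanner's.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

enum token_kind { TOK_WORD, TOK_NUMBER, TOK_PUNCT };

/* One entry of the token table: kind plus location within the buffer. */
struct token {
    enum token_kind kind;
    size_t first_column;   /* offset of the first character */
    size_t last_column;    /* offset of the last character */
};

/*
 * Split s into tokens before any parsing happens.  Stores at most max
 * entries in table and returns the number stored.
 */
size_t tokenize(const char *s, struct token *table, size_t max)
{
    size_t n = 0, i = 0;

    assert(s != NULL && table != NULL);
    while (s[i] != '\0' && n < max) {
        unsigned char c = (unsigned char) s[i];
        size_t start = i;

        if (isspace(c)) {
            i++;
            continue;
        }
        if (isalpha(c)) {
            while (isalpha((unsigned char) s[i]))
                i++;
            table[n].kind = TOK_WORD;
        } else if (isdigit(c)) {
            while (isdigit((unsigned char) s[i]))
                i++;
            table[n].kind = TOK_NUMBER;
        } else {
            i++;
            table[n].kind = TOK_PUNCT;
        }
        table[n].first_column = start;
        table[n].last_column = i - 1;
        n++;
    }
    return n;
}
```

Since the whole buffer is tokenized up front, an error routine can look at any entry, including ones the parser has not reached yet, and still know exactly where it came from.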
It is now a lot easier to handle various state-related problems, such as
backtracking and error reporting. The yylex() function as seen by the
parser might be constructed somewhat like this:
sub yylex {
    # $tokenNo is a global; now instead of a "file pointer",
    # as in the first example, we have a "token pointer"
    $yylloc = $tokenTable[$tokenNo];
    undef $yylval;
    # disregard this; name this block "computing semantic values"
    if ( $yylloc->{'token'} == UNIT) {
        $yylval = Unit::Operand->new(
            'unit'        => Unit::Dict::unit($yylloc->{'unit'}),
            'base'        => Unit::Dict::base($yylloc->{'unit'}),
            'scale'       => Unit::Dict::scale($yylloc->{'unit'}),
            'scaleToBase' => Unit::Dict::scaleToBase($yylloc->{'unit'}),
            'loc'         => $yylloc,
        );
    }
    elsif ( ($yylloc->{'token'} == INTEGER ) ||
            ($yylloc->{'token'} == POSITIVE_NUMBER) ) {
        $yylval = Unit::Operand->new(
            'unit'        => '1',
            'base'        => '1',
            'scale'       => 1,
            'scaleToBase' => 1,
            'loc'         => $yylloc,
        );
    }
    $tokenNo++;
    # This is all the parser needs to know about this token.
    # But we already made sure we saved everything we need to know.
    return($yylloc->{'token'});
}
Now the most interesting part, the error reporting routine:
sub yyerror {
    my ($str) = @_;
    my ($message, $start, $end, $loc);
    # This is the same as to say,
    # "obtain the location info for the current token"
    $loc = $tokenTable[$tokenNo-1];
    # You may use this routine for your own purposes or let parser use it
    if( $str ne 'parse error' ) {
        $message = "$str instead of `" . $loc->{'name'} . "' <" .
            $loc->{'text'} . ">, at line " . $loc->{'first_line'} . ":\n\n";
    }
    else {
        $message = "unexpected token `" . $loc->{'name'} . "' <" .
            $loc->{'text'} . ">, at line " . $loc->{'first_line'} . ":\n\n";
    }
    # $parseBuffer is the original string that was used to set the
    # parser buffer
    $message .= $parseBuffer . "\n";
    $message .= ( ' ' x ($loc->{'first_column'} + 1) ) .
        ( '^' x length($loc->{'text'}) ) . "\n";
    if( $str ne 'parse error' ) {
        print STDERR "$str instead of `", $loc->{'name'}, "' {",
            $loc->{'text'}, "}, at line ", $loc->{'first_line'}, ":\n\n";
    }
    else {
        print STDERR "unexpected token `", $loc->{'name'}, "' {",
            $loc->{'text'}, "}, at line ", $loc->{'first_line'}, ":\n\n";
    }
    print STDERR "$parseBuffer\n";
    print STDERR ' ' x ($loc->{'first_column'} + 1), '^' x
        length($loc->{'text'}), "\n";
}
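The underlining step of the routine above is small enough to sketch in C. The function name caret_line is hypothetical, and the sketch assumes zero-based columns; the Perl routine pads with first_column + 1 spaces, which suggests its scanner counts columns slightly differently.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Write an underline for a token into out: first_column spaces, then
 * one '^' per token character.  Returns 0 on success, or -1 if out
 * (of size out_size) is too small to hold the line and its NUL.
 */
int caret_line(size_t first_column, size_t length, char *out, size_t out_size)
{
    assert(out != NULL);
    if (first_column + length + 1 > out_size)
        return -1;
    memset(out, ' ', first_column);
    memset(out + first_column, '^', length);
    out[first_column + length] = '\0';
    return 0;
}
```

Printing the original buffer on one line and this string on the next gives the same kind of display as the ./check examples earlier in the message.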
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Scanners used in these examples assume there is a single line of text on
the input (the first_line and last_line elements of yylloc are simply
ignored). If you want to be able to parse multi-line buffers, just add a
lex rule for '\n' that will increment the line count and reset the pos
variable to zero.
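The newline rule described above amounts to the following bookkeeping, shown here as a minimal C sketch. struct position, advance and locate are hypothetical names; in a real scanner the body of advance() would sit in the lex action for '\n' and in the character-reading code.

```c
#include <assert.h>
#include <stddef.h>

struct position {
    size_t line;     /* 1-based line number */
    size_t column;   /* 0-based column within the line */
};

/* Advance p past character c, the way a scanner rule for '\n' would. */
void advance(struct position *p, char c)
{
    assert(p != NULL);
    if (c == '\n') {
        p->line++;
        p->column = 0;
    } else {
        p->column++;
    }
}

/* Position of the character at offset in s; stops early at the NUL. */
struct position locate(const char *s, size_t offset)
{
    struct position p = { 1, 0 };
    size_t i;

    assert(s != NULL);
    for (i = 0; i < offset && s[i] != '\0'; i++)
        advance(&p, s[i]);
    return p;
}
```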
Ugly as it may seem, I find this approach extremely liberating. If the
grammar becomes too complicated for a LALR(1) parser, I can cascade
multiple parsers. The token table can then be used to reassemble parts
of the original expression for subordinate parsers, preserving the
location info all the way down, so that subordinate parsers can report
their problems consistently. You probably don't need this, as SQL is very
well thought out and has a parsable grammar. But it may be of some help
for error reporting.
--Gene

5708
doc/TODO.detail/limit Normal file

File diff suppressed because it is too large

207
doc/TODO.detail/logging Normal file

@ -0,0 +1,207 @@
From owner-pgsql-hackers@hub.org Fri Nov 13 13:24:37 1998
Received: from hub.org (majordom@hub.org [209.47.148.200])
by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id NAA13457
for <maillist@candle.pha.pa.us>; Fri, 13 Nov 1998 13:24:35 -0500 (EST)
Received: from localhost (majordom@localhost)
by hub.org (8.9.1/8.9.1) with SMTP id NAA02464;
Fri, 13 Nov 1998 13:22:52 -0500 (EST)
(envelope-from owner-pgsql-hackers@hub.org)
Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Fri, 13 Nov 1998 13:21:14 +0000 (EST)
Received: (from majordom@localhost)
by hub.org (8.9.1/8.9.1) id NAA02331
for pgsql-hackers-outgoing; Fri, 13 Nov 1998 13:21:12 -0500 (EST)
(envelope-from owner-pgsql-hackers@postgreSQL.org)
Received: from orion.SAPserv.Hamburg.dsh.de (Tpolaris2.sapham.debis.de [53.2.131.8])
by hub.org (8.9.1/8.9.1) with SMTP id NAA02316
for <pgsql-hackers@postgreSQL.org>; Fri, 13 Nov 1998 13:21:06 -0500 (EST)
(envelope-from wieck@sapserv.debis.de)
Received: by orion.SAPserv.Hamburg.dsh.de
for pgsql-hackers@postgreSQL.org
id m0zeOEf-000EBPC; Fri, 13 Nov 98 19:46 MET
Message-Id: <m0zeOEf-000EBPC@orion.SAPserv.Hamburg.dsh.de>
From: jwieck@debis.com (Jan Wieck)
Subject: [HACKERS] shmem limits and redolog
To: pgsql-hackers@postgreSQL.org (PostgreSQL HACKERS)
Date: Fri, 13 Nov 1998 19:46:20 +0100 (MET)
Reply-To: jwieck@debis.com (Jan Wieck)
X-Mailer: ELM [version 2.4 PL25]
Content-Type: text
Sender: owner-pgsql-hackers@postgreSQL.org
Precedence: bulk
Status: ROr
Hi,
I'm currently hacking around on a solution for logging all
database operations at query level that can recover a crashed
database from the last successful backup by redoing all the
commands.
Well, I wanted it to be as flexible as possible. So I decided to
make it configurable per database. One could say which
databases are logged and, if a database is logged, whether it is
logged sync or async (in sync mode, every COMMIT forces an fsync of
the actual logfile and controlfiles).
To make async mode as fast as possible, I'm using a shared memory
segment of 32K per database (not per backend) that is used as a
wraparound buffer in which the backends place their query
information. So the log writer can fall a little behind if
there are many backends doing different things that don't
lock each other.
Now I'm a little in doubt about the shared memory limits
reported. Was it a good decision to use shared memory? Am I
better off using sockets?
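A wraparound buffer of this kind can be sketched in C as follows. This is an illustration only: struct ring and its functions are hypothetical, the buffer lives in ordinary process memory, and the locking (semaphores) that a real shared memory segment used by several backends would need is left out.

```c
#include <assert.h>
#include <stddef.h>

#define RING_SIZE 32768        /* 32K, the size mentioned in the message */

struct ring {
    unsigned char data[RING_SIZE];
    size_t head;               /* next byte to write */
    size_t tail;               /* next byte to read */
    size_t used;               /* bytes currently stored */
};

void ring_init(struct ring *r)
{
    assert(r != NULL);
    r->head = r->tail = r->used = 0;
}

/* Append len bytes; returns 0 on success, -1 if they do not fit. */
int ring_put(struct ring *r, const void *src, size_t len)
{
    const unsigned char *p = src;
    size_t i;

    assert(r != NULL && src != NULL);
    if (len > RING_SIZE - r->used)
        return -1;
    for (i = 0; i < len; i++) {
        r->data[r->head] = p[i];
        r->head = (r->head + 1) % RING_SIZE;
    }
    r->used += len;
    return 0;
}

/* Remove up to len bytes into dst; returns the number removed. */
size_t ring_get(struct ring *r, void *dst, size_t len)
{
    unsigned char *p = dst;
    size_t i, n;

    assert(r != NULL && dst != NULL);
    n = len < r->used ? len : r->used;
    for (i = 0; i < n; i++) {
        p[i] = r->data[r->tail];
        r->tail = (r->tail + 1) % RING_SIZE;
    }
    r->used -= n;
    return n;
}
```

The writer side (ring_put) fails rather than overwriting unread data, so the reader can lag by at most RING_SIZE bytes before the backends have to wait.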
The bad thing in what I have up to now (it's far from
complete) is that even if a database isn't currently logged,
a redolog writer is started and creates the 32K shmem segment
(plus a semaphore set with 5 semaphores). This is because I
plan to create commands like
ALTER DATABASE LOG MODE=ASYNC LOGDIR='/somewhere/dbname';
and the like that can be used at runtime (while more than one
backend is connected to the database) to turn logging on/off,
switch to/from backup mode (all other activity is stopped)
etc.
So every 32 databases will require another megabyte of shared
memory. The logging master controls which databases have
activity and kills redolog writers after some time of
inactivity, and the shmem is freed then. But it can hurt if
someone really has many many databases that are all used at
the same time.
What do the others say?
Jan
--
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me. #
#======================================== jwieck@debis.com (Jan Wieck) #
From owner-pgsql-hackers@hub.org Wed Dec 16 15:46:41 1998
Received: from renoir.op.net (root@renoir.op.net [209.152.193.4])
by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id PAA00521
for <maillist@candle.pha.pa.us>; Wed, 16 Dec 1998 15:46:40 -0500 (EST)
Received: from hub.org (majordom@hub.org [209.47.145.100]) by renoir.op.net (o1/$Revision: 1.1 $) with ESMTP id PAA08772 for <maillist@candle.pha.pa.us>; Wed, 16 Dec 1998 15:10:01 -0500 (EST)
Received: from localhost (majordom@localhost)
by hub.org (8.9.1/8.9.1) with SMTP id PAA01254;
Wed, 16 Dec 1998 15:06:56 -0500 (EST)
(envelope-from owner-pgsql-hackers@hub.org)
Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Wed, 16 Dec 1998 14:58:11 +0000 (EST)
Received: (from majordom@localhost)
by hub.org (8.9.1/8.9.1) id OAA00660
for pgsql-hackers-outgoing; Wed, 16 Dec 1998 14:58:10 -0500 (EST)
(envelope-from owner-pgsql-hackers@postgreSQL.org)
Received: from orion.SAPserv.Hamburg.dsh.de (Tpolaris2.sapham.debis.de [53.2.131.8])
by hub.org (8.9.1/8.9.1) with SMTP id OAA00643
for <pgsql-hackers@postgreSQL.org>; Wed, 16 Dec 1998 14:58:05 -0500 (EST)
(envelope-from wieck@sapserv.debis.de)
Received: by orion.SAPserv.Hamburg.dsh.de
for pgsql-hackers@postgreSQL.org
id m0zqNDo-000EBTC; Wed, 16 Dec 98 21:07 MET
Message-Id: <m0zqNDo-000EBTC@orion.SAPserv.Hamburg.dsh.de>
From: jwieck@debis.com (Jan Wieck)
Subject: Re: [HACKERS] redolog - for discussion
To: vadim@krs.ru (Vadim Mikheev)
Date: Wed, 16 Dec 1998 21:07:00 +0100 (MET)
Cc: jwieck@debis.com, pgsql-hackers@postgreSQL.org
Reply-To: jwieck@debis.com (Jan Wieck)
In-Reply-To: <3677B71D.C67462B3@krs.ru> from "Vadim Mikheev" at Dec 16, 98 08:35:25 pm
X-Mailer: ELM [version 2.4 PL25]
Content-Type: text
Sender: owner-pgsql-hackers@postgreSQL.org
Precedence: bulk
Status: RO
Vadim wrote:
>
> Jan Wieck wrote:
> >
> > RECOVER DATABASE {ALL | UNTIL 'datetime' | RESET};
> >
> ...
> >
> > For the others, the backend starts the recovery program
> > which reads the redolog files, establishes database
> > connections as required and reruns all the commands in
> ^^^^^^^^^^^^^^^^^^^^^^^^^^
> > them. If a required logfile isn't found, it tells the
> ^^^^^
>
> I foresee problems with using _commands_ logging for
> recovery/replication -:((
>
> Let's consider two concurrent updates in READ COMMITTED mode:
>
> update test set x = 2 where y = 1;
>
> and
>
> update test set x = 3 where y = 1;
>
> The result of both committed transaction will be x = 2
> if the 1st transaction updated row _after_ 2nd transaction
> and x = 3 if the 2nd transaction gets row after 1st one.
> Order of updates is not defined by order in which commands
> begun and so order in which commands should be rerun
> will be unknown...
Yepp, the order in which commands began is absolutely not of
interest. Locking could already delay the execution of one
command until another one, started later, has finished and
released the lock. It's a classic race condition.
Thus, my plan was to log the queries just before the call to
CommitTransactionCommand() in tcop. This has the advantage
that queries which bail out with errors don't get into the
log at all and need not be rerun. And I can set a static
flag to false before starting the command, which is set to
true in the buffer manager when a buffer is written (marked
dirty), so filtering out queries that do no updates at all is
easy.
Unfortunately, query level logging gets hit by the current
implementation of sequence numbers. If a query that gets
aborted somewhere in the middle (maybe by a trigger) called
nextval() for rows processed earlier, the sequence number
isn't advanced at recovery time, because the query is
suppressed entirely. And sequences aren't locked, so for
concurrently running queries getting numbers from the same
sequence, the results aren't reproducible. If some
application selects a value resulting from a sequence and
uses that later in another query, how could the redolog know
that this has changed? It's a Const in the query logged, and
all that corrupts the whole thing.
All that is painful, and I don't yet see another solution than
to hook into nextval(), log the numbers generated in
normal operation and hand back the same numbers in redo
mode.
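The nextval() hook could look roughly like the C sketch below. Everything in it is hypothetical (struct sequence, struct seq_log, seq_nextval, the fixed-size in-memory log), and it ignores the harder question of matching recorded numbers to the right query when several backends draw from the same sequence.

```c
#include <assert.h>
#include <stddef.h>

#define SEQ_LOG_MAX 1024

struct seq_log {
    long values[SEQ_LOG_MAX];  /* numbers handed out, in order */
    size_t count;              /* entries recorded */
    size_t next;               /* next entry to hand back during redo */
};

struct sequence {
    long last;                 /* last value generated */
    int redo;                  /* nonzero while replaying the log */
    struct seq_log *log;
};

/*
 * In normal operation, generate the next number and record it.
 * In redo mode, hand back the recorded numbers in the original order.
 * Returns 0 on success, -1 if the log is full or exhausted.
 */
int seq_nextval(struct sequence *s, long *out)
{
    assert(s != NULL && s->log != NULL && out != NULL);
    if (s->redo) {
        if (s->log->next >= s->log->count)
            return -1;
        *out = s->log->values[s->log->next++];
        s->last = *out;
        return 0;
    }
    if (s->log->count >= SEQ_LOG_MAX)
        return -1;
    *out = ++s->last;
    s->log->values[s->log->count++] = *out;
    return 0;
}
```

Numbers consumed by a query that later aborts are still in the log, so a replay that skips the aborted query would have to skip its numbers as well; the sketch does not attempt that.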
The whole thing gets more and more complicated :-(
Jan
--
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me. #
#======================================== jwieck@debis.com (Jan Wieck) #

1240
doc/TODO.detail/memory Normal file

File diff suppressed because it is too large

119
doc/TODO.detail/nulls Normal file

@ -0,0 +1,119 @@
From owner-pgsql-general@hub.org Fri Oct 9 18:22:09 1998
Received: from hub.org (majordom@hub.org [209.47.148.200])
by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id SAA04220
for <maillist@candle.pha.pa.us>; Fri, 9 Oct 1998 18:22:08 -0400 (EDT)
Received: from localhost (majordom@localhost)
by hub.org (8.8.8/8.8.8) with SMTP id SAA26960;
Fri, 9 Oct 1998 18:18:29 -0400 (EDT)
(envelope-from owner-pgsql-general@hub.org)
Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Fri, 09 Oct 1998 18:18:07 +0000 (EDT)
Received: (from majordom@localhost)
by hub.org (8.8.8/8.8.8) id SAA26917
for pgsql-general-outgoing; Fri, 9 Oct 1998 18:18:04 -0400 (EDT)
(envelope-from owner-pgsql-general@postgreSQL.org)
X-Authentication-Warning: hub.org: majordom set sender to owner-pgsql-general@postgreSQL.org using -f
Received: from gecko.statsol.com (gecko.statsol.com [198.11.51.133])
by hub.org (8.8.8/8.8.8) with ESMTP id SAA26904
for <pgsql-general@postgresql.org>; Fri, 9 Oct 1998 18:17:46 -0400 (EDT)
(envelope-from statsol@statsol.com)
Received: from gecko (gecko [198.11.51.133])
by gecko.statsol.com (8.9.0/8.9.0) with SMTP id SAA00557
for <pgsql-general@postgresql.org>; Fri, 9 Oct 1998 18:18:00 -0400 (EDT)
Date: Fri, 9 Oct 1998 18:18:00 -0400 (EDT)
From: Steve Doliov <statsol@statsol.com>
X-Sender: statsol@gecko
To: pgsql-general@postgreSQL.org
Subject: Re: [GENERAL] Making NULLs visible.
Message-ID: <Pine.GSO.3.96.981009181716.545B-100000@gecko>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-pgsql-general@postgreSQL.org
Precedence: bulk
Status: RO
On Fri, 9 Oct 1998, Bruce Momjian wrote:
> [Charset iso-8859-1 unsupported, filtering to ASCII...]
> > > Yes, \ always outputs as \\, excepts someone changed it last week, and I
> > > am requesting a reversal. Do you like the \N if it is unique?
> >
> > Well, it's certainly clear, but could be confused with \n (newline). Can we
> > have \0 instead?
>
> Yes, but it is uppercase. \0 looks like an octal number to me, and I
> think we even output octals sometimes, don't we?
>
my first suggestion may have been hare-brained, but why not just make the
specifics of the output user-configurable? So if the user chooses \0, so
be it, if the user chooses \N so be it, if the user likes NULL so be it.
but the option would only have one value per database at any given point
in time. so database x could use \N on tuesday and NULL on wednesday, but
database x could never have two references to the character(s) used to
represent a null value.
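As a rough illustration of the suggestion, a single configurable string could drive the display. The C below is a sketch under that assumption; struct display_options and field_text are hypothetical names, not anything in psql or the backend.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-database setting: how a NULL field is displayed. */
struct display_options {
    const char *null_string;   /* for example "\\N", "\\0" or "NULL" */
};

/* Return the text to print for a field; value is NULL for an SQL null. */
const char *field_text(const struct display_options *opts, const char *value)
{
    assert(opts != NULL && opts->null_string != NULL);
    return value != NULL ? value : opts->null_string;
}
```

With one such setting per database there is exactly one representation in effect at any time, which is the constraint described above.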
steve
From owner-pgsql-general@hub.org Sun Oct 11 17:31:08 1998
Received: from renoir.op.net (root@renoir.op.net [209.152.193.4])
by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id RAA20043
for <maillist@candle.pha.pa.us>; Sun, 11 Oct 1998 17:31:02 -0400 (EDT)
Received: from hub.org (majordom@hub.org [209.47.148.200]) by renoir.op.net (o1/$Revision: 1.1 $) with ESMTP id RAA03069 for <maillist@candle.pha.pa.us>; Sun, 11 Oct 1998 17:10:34 -0400 (EDT)
Received: from localhost (majordom@localhost)
by hub.org (8.8.8/8.8.8) with SMTP id QAA10856;
Sun, 11 Oct 1998 16:57:34 -0400 (EDT)
(envelope-from owner-pgsql-general@hub.org)
Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Sun, 11 Oct 1998 16:53:35 +0000 (EDT)
Received: (from majordom@localhost)
by hub.org (8.8.8/8.8.8) id QAA10393
for pgsql-general-outgoing; Sun, 11 Oct 1998 16:53:34 -0400 (EDT)
(envelope-from owner-pgsql-general@postgreSQL.org)
X-Authentication-Warning: hub.org: majordom set sender to owner-pgsql-general@postgreSQL.org using -f
Received: from mail1.panix.com (mail1.panix.com [166.84.0.212])
by hub.org (8.8.8/8.8.8) with ESMTP id QAA10378
for <pgsql-general@postgreSQL.org>; Sun, 11 Oct 1998 16:53:28 -0400 (EDT)
(envelope-from tomg@admin.nrnet.org)
Received: from mailhost.nrnet.org (root@mailhost.nrnet.org [166.84.192.39])
by mail1.panix.com (8.8.8/8.8.8/PanixM1.3) with ESMTP id QAA16311
for <pgsql-general@postgreSQL.org>; Sun, 11 Oct 1998 16:53:24 -0400 (EDT)
Received: from admin.nrnet.org (uucp@localhost)
by mailhost.nrnet.org (8.8.7/8.8.4) with UUCP
id QAA16345 for pgsql-general@postgreSQL.org; Sun, 11 Oct 1998 16:28:47 -0400
Received: from localhost (tomg@localhost)
by admin.nrnet.org (8.8.7/8.8.7) with SMTP id QAA11569
for <pgsql-general@postgreSQL.org>; Sun, 11 Oct 1998 16:28:41 -0400
Date: Sun, 11 Oct 1998 16:28:41 -0400 (EDT)
From: Thomas Good <tomg@admin.nrnet.org>
To: pgsql-general@postgreSQL.org
Subject: Re: [GENERAL] Making NULLs visible.
In-Reply-To: <Pine.GSO.3.96.981009181716.545B-100000@gecko>
Message-ID: <Pine.LNX.3.96.981011161908.11556A-100000@admin.nrnet.org>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-pgsql-general@postgreSQL.org
Precedence: bulk
Status: RO
Watching all this go by...as a guy who has to move a lot of data
from legacy dbs to postgres, I've gotten used to \N being a null.
My vote, if I were allowed to cast one, would be to have one null
and that would be the COPY command null. I have no difficulty
distinguishing a null from a newline...
At the pgsql command prompt I would find seeing \N rather reassuring.
I've seen a lot of these little guys.
---------- Sisters of Charity Medical Center ----------
Department of Psychiatry
----
Thomas Good <tomg@q8.nrnet.org>
Coordinator, North Richmond C.M.H.C. Information Systems
75 Vanderbilt Ave, Quarters 8 Phone: 718-354-5528
Staten Island, NY 10304 Fax: 718-354-5056

987
doc/TODO.detail/optimizer Normal file

@ -0,0 +1,987 @@
From owner-pgsql-hackers@hub.org Mon Mar 22 18:43:41 1999
Received: from renoir.op.net (root@renoir.op.net [209.152.193.4])
by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id SAA23978
for <maillist@candle.pha.pa.us>; Mon, 22 Mar 1999 18:43:39 -0500 (EST)
Received: from hub.org (majordom@hub.org [209.47.145.100]) by renoir.op.net (o1/$Revision: 1.1 $) with ESMTP id SAA06472 for <maillist@candle.pha.pa.us>; Mon, 22 Mar 1999 18:36:44 -0500 (EST)
Received: from localhost (majordom@localhost)
by hub.org (8.9.2/8.9.1) with SMTP id SAA92604;
Mon, 22 Mar 1999 18:34:23 -0500 (EST)
(envelope-from owner-pgsql-hackers@hub.org)
Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Mon, 22 Mar 1999 18:33:50 +0000 (EST)
Received: (from majordom@localhost)
by hub.org (8.9.2/8.9.1) id SAA92469
for pgsql-hackers-outgoing; Mon, 22 Mar 1999 18:33:47 -0500 (EST)
(envelope-from owner-pgsql-hackers@postgreSQL.org)
Received: from po8.andrew.cmu.edu (PO8.ANDREW.CMU.EDU [128.2.10.108])
by hub.org (8.9.2/8.9.1) with ESMTP id SAA92456
for <pgsql-hackers@postgresql.org>; Mon, 22 Mar 1999 18:33:41 -0500 (EST)
(envelope-from er1p+@andrew.cmu.edu)
Received: (from postman@localhost) by po8.andrew.cmu.edu (8.8.5/8.8.2) id SAA12894 for pgsql-hackers@postgresql.org; Mon, 22 Mar 1999 18:33:38 -0500 (EST)
Received: via switchmail; Mon, 22 Mar 1999 18:33:38 -0500 (EST)
Received: from cloudy.me.cmu.edu via qmail
ID </afs/andrew.cmu.edu/service/mailqs/q007/QF.Aqxh7Lu00gNtQ0TZE5>;
Mon, 22 Mar 1999 18:27:20 -0500 (EST)
Received: from cloudy.me.cmu.edu via qmail
ID </afs/andrew.cmu.edu/usr2/er1p/.Outgoing/QF.Uqxh7JS00gNtMmTJFk>;
Mon, 22 Mar 1999 18:27:17 -0500 (EST)
Received: from mms.4.60.Jun.27.1996.03.05.56.sun4.41.EzMail.2.0.CUILIB.3.45.SNAP.NOT.LINKED.cloudy.me.cmu.edu.sun4m.412
via MS.5.6.cloudy.me.cmu.edu.sun4_41;
Mon, 22 Mar 1999 18:27:15 -0500 (EST)
Message-ID: <sqxh7H_00gNtAmTJ5Q@andrew.cmu.edu>
Date: Mon, 22 Mar 1999 18:27:15 -0500 (EST)
From: Erik Riedel <riedel+@CMU.EDU>
To: pgsql-hackers@postgreSQL.org
Subject: [HACKERS] optimizer and type question
Sender: owner-pgsql-hackers@postgreSQL.org
Precedence: bulk
Status: RO
[last week aggregation, this week, the optimizer]
I have a somewhat general optimizer question/problem that I would like
to get some input on - i.e. I'd like to know what is "supposed" to
work here and what I should be expecting. Sadly, I think the patch
for this is more involved than my last message.
Using my favorite table these days:
Table = lineitem
+------------------------+----------------------------------+-------+
| Field | Type | Length|
+------------------------+----------------------------------+-------+
| l_orderkey | int4 not null | 4 |
| l_partkey | int4 not null | 4 |
| l_suppkey | int4 not null | 4 |
| l_linenumber | int4 not null | 4 |
| l_quantity | float4 not null | 4 |
| l_extendedprice | float4 not null | 4 |
| l_discount | float4 not null | 4 |
| l_tax | float4 not null | 4 |
| l_returnflag | char() not null | 1 |
| l_linestatus | char() not null | 1 |
| l_shipdate | date | 4 |
| l_commitdate | date | 4 |
| l_receiptdate | date | 4 |
| l_shipinstruct | char() not null | 25 |
| l_shipmode | char() not null | 10 |
| l_comment | char() not null | 44 |
+------------------------+----------------------------------+-------+
Index: lineitem_index_
and the query:
--
-- Query 1
--
explain select l_returnflag, l_linestatus, sum(l_quantity) as sum_qty,
sum(l_extendedprice) as sum_base_price,
sum(l_extendedprice*(1-l_discount)) as sum_disc_price,
sum(l_extendedprice*(1-l_discount)*(1+l_tax)) as sum_charge,
avg(l_quantity) as avg_qty, avg(l_extendedprice) as avg_price,
avg(l_discount) as avg_disc, count(*) as count_order
from lineitem
where l_shipdate <= '1998-09-02'::date
group by l_returnflag, l_linestatus
order by l_returnflag, l_linestatus;
note that I have eliminated the date calculation in my query of last
week and manually replaced it with a constant (since this wasn't
happening automatically - but let's not worry about that for now).
And this is only an explain, we care about the optimizer. So we get:
Sort (cost=34467.88 size=0 width=0)
-> Aggregate (cost=34467.88 size=0 width=0)
-> Group (cost=34467.88 size=0 width=0)
-> Sort (cost=34467.88 size=0 width=0)
-> Seq Scan on lineitem (cost=34467.88 size=200191 width=44)
so let's think about the selectivity that is being chosen for the
seq scan (the where l_shipdate <= '1998-09-02').
Turns out the optimizer is choosing "33%", even though the real answer
is somewhere in 90+% (that's how the query is designed). So, why does
it do that?
Turns out that selectivity in this case is determined via
plancat::restriction_selectivity() which calls into functionOID = 103
(intltsel) for operatorOID = 1096 (date "<=") on relation OID = 18663
(my lineitem).
This all follows because of the description of 1096 (date "<=") in
pg_operator. Looking at local1_template1.bki.source near line 1754
shows:
insert OID = 1096 ( "<=" PGUID 0 <...> date_le intltsel intltjoinsel )
where we see that indeed, it thinks "intltsel" is the right function
to use for "oprrest" in the case of dates.
Question 1 - is intltsel the right thing for selectivity on dates?
Hope someone is still with me.
So now we're running selfuncs::intltsel() where we make a further call
to selfuncs::gethilokey(). The job of gethilokey is to determine the
min and max values of a particular attribute in the table, which will
then be used with the constant in my where clause to estimate the
selectivity. It is going to search the pg_statistic relation with
three key values:
Anum_pg_statistic_starelid 18663 (lineitem)
Anum_pg_statistic_staattnum 11 (l_shipdate)
Anum_pg_statistic_staop 1096 (date "<=")
this finds no tuples in pg_statistic. Why is that? The only nearby
tuple in pg_statistic is:
starelid|staattnum|staop|stalokey |stahikey
--------+---------+-----+----------------+----------------
18663| 11| 0|01-02-1992 |12-01-1998
and the reason the query doesn't match anything? Because 1096 != 0.
But why is it 0 in pg_statistic? Statistics are determined near line
1844 in vacuum.c (assuming a 'vacuum analyze' run at some point)
i = 0;
values[i++] = (Datum) relid; /* 1 */
values[i++] = (Datum) attp->attnum; /* 2 */
====> values[i++] = (Datum) InvalidOid; /* 3 */
fmgr_info(stats->outfunc, &out_function);
out_string = <...min...>
values[i++] = (Datum) fmgr(F_TEXTIN, out_string);
pfree(out_string);
out_string = <...max...>
values[i++] = (Datum) fmgr(F_TEXTIN, out_string);
pfree(out_string);
stup = heap_formtuple(sd->rd_att, values, nulls);
the "offending" line is setting the staop to InvalidOid (i.e. 0).
Question 2 - is this right? Is the intent for 0 to serve as a
"wildcard", or should it be inserting an entry for each operation
individually?
In the case of "wildcard" then gethilokey() should allow a match for
Anum_pg_statistic_staop 0
instead of requiring the more restrictive 1096. In the current code,
what happens next is gethilokey() returns "not found" and intltsel()
returns the default 1/3 which I see in the resultant query plan (size
= 200191 is 1/3 of the number of lineitem tuples).
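The "wildcard" reading and the fallback can be put side by side in a small C sketch. It is only a model of the behavior discussed here: struct stat_row, find_stat and estimated_rows are hypothetical, and the real gethilokey() scans pg_statistic rather than an array.

```c
#include <assert.h>
#include <stddef.h>

#define INVALID_OID 0u
#define DEFAULT_SELECTIVITY (1.0 / 3.0)

/* Simplified stand-in for one pg_statistic row. */
struct stat_row {
    unsigned relid;
    int attnum;
    unsigned staop;            /* INVALID_OID is treated as "any operator" */
    const char *lokey;
    const char *hikey;
};

/* Find the row for (relid, attnum, op); a stored staop of 0 matches any op. */
const struct stat_row *find_stat(const struct stat_row *rows, size_t n,
                                 unsigned relid, int attnum, unsigned op)
{
    size_t i;

    assert(rows != NULL || n == 0);
    for (i = 0; i < n; i++) {
        if (rows[i].relid == relid && rows[i].attnum == attnum &&
            (rows[i].staop == op || rows[i].staop == INVALID_OID))
            return &rows[i];
    }
    return NULL;
}

/* Row estimate: tuples times selectivity, rounded to nearest. */
long estimated_rows(long tuples, double selectivity)
{
    return (long) (tuples * selectivity + 0.5);
}
```

With the wildcard match, a lookup for operator 1096 finds the row stored with staop 0. As a check on the arithmetic, estimated_rows(600573, DEFAULT_SELECTIVITY) gives 200191; the 600573 figure is an assumed table size chosen to match the plan, not a number taken from the message.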
Question 3 - is there any inherent reason it couldn't get this right?
The statistic is in the table 1992 to 1998, so the '1998-09-02' date
should be 90-some% selectivity, a much better guess than 33%.
Doesn't make a difference for this particular query, of course,
because the seq scan must proceed anyhow, but it could easily affect
other queries where selectivities matter (and it affects the
modifications I am trying to test in the optimizer to be "smarter"
about selectivities - my overall context is to understand/improve the
behavior that the underlying storage system sees from queries like this).
OK, so let's say we treat 0 as a "wildcard" and stop checking for
1096. Now we let gethilokey() return the two dates from the statistic
table. The immediate next thing that intltsel() does, near line 122
in selfuncs.c, is call atol() on the strings from gethilokey(). And
guess what it comes up with?
low = 1
high = 12
because it calls atol() on '01-02-1992' and '12-01-1998'. This
clearly isn't right, it should get some large integer that includes
the year and day in the result. Then it should compare reasonably
with my constant from the where clause and give a decent selectivity
value. This leads to a re-visit of Question 1.
Question 4 - should date "<=" use a dateltsel() function instead of
intltsel() as oprrest?
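To make the difference concrete, here is a hedged C sketch of what a date-aware estimate might compute. None of it is PostgreSQL code: julian_day, parse_mdy and le_selectivity are hypothetical helpers, the keys are assumed to be in MM-DD-YYYY order, and the estimate is a plain linear interpolation between the low and high keys.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Julian day number of a Gregorian calendar date (Fliegel-Van Flandern). */
long julian_day(int y, int m, int d)
{
    long a = (m - 14) / 12;    /* -1 for January and February, else 0 */

    return (1461 * (y + 4800 + a)) / 4
         + (367 * (m - 2 - 12 * a)) / 12
         - (3 * ((y + 4900 + a) / 100)) / 4
         + d - 32075;
}

/* Parse "MM-DD-YYYY" into a day number; returns 0 on success, -1 otherwise. */
int parse_mdy(const char *s, long *day)
{
    int m, d, y;

    assert(s != NULL && day != NULL);
    if (sscanf(s, "%d-%d-%d", &m, &d, &y) != 3)
        return -1;
    *day = julian_day(y, m, d);
    return 0;
}

/* Fraction of the [lo, hi] range at or below value, clamped to [0, 1]. */
double le_selectivity(long lo, long hi, long value)
{
    if (hi <= lo)
        return 1.0 / 3.0;      /* degenerate range: fall back to a default */
    if (value <= lo)
        return 0.0;
    if (value >= hi)
        return 1.0;
    return (double) (value - lo) / (double) (hi - lo);
}
```

For the keys 01-02-1992 and 12-01-1998 the range is 2525 days, and the constant from the where clause (written as 09-02-1998) falls 2435 days in, so the estimate is about 0.96. By contrast, atol() stops at the first '-' and returns 1 and 12 for the same two strings.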
If anyone is still with me, could you tell me if this makes sense, or
if there is some other location where the appropriate type conversion
could take place so that intltsel() gets something reasonable when it
does the atol() calls?
Could someone also give me a sense for how far out-of-whack the whole
current selectivity-handling structure is? It seems that most of the
operators in pg_operator actually use intltsel() and would have
type-specific problems like that described. Or is the problem in the
way attribute values are stored in pg_statistic by vacuum analyze? Or
is there another layer where type conversion belongs?
Phew. Enough typing, hope someone can follow this and address at
least some of the questions.
Thanks.
Erik Riedel
Carnegie Mellon University
www.cs.cmu.edu/~riedel
From owner-pgsql-hackers@hub.org Mon Mar 22 20:31:11 1999
Received: from renoir.op.net (root@renoir.op.net [209.152.193.4])
by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id UAA00802
for <maillist@candle.pha.pa.us>; Mon, 22 Mar 1999 20:31:09 -0500 (EST)
Received: from hub.org (majordom@hub.org [209.47.145.100]) by renoir.op.net (o1/$Revision: 1.1 $) with ESMTP id UAA13231 for <maillist@candle.pha.pa.us>; Mon, 22 Mar 1999 20:15:20 -0500 (EST)
Received: from localhost (majordom@localhost)
by hub.org (8.9.2/8.9.1) with SMTP id UAA01981;
Mon, 22 Mar 1999 20:14:04 -0500 (EST)
(envelope-from owner-pgsql-hackers@hub.org)
Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Mon, 22 Mar 1999 20:13:32 +0000 (EST)
Received: (from majordom@localhost)
by hub.org (8.9.2/8.9.1) id UAA01835
for pgsql-hackers-outgoing; Mon, 22 Mar 1999 20:13:28 -0500 (EST)
(envelope-from owner-pgsql-hackers@postgreSQL.org)
Received: from sss.sss.pgh.pa.us (sss.pgh.pa.us [206.210.65.6])
by hub.org (8.9.2/8.9.1) with ESMTP id UAA01822
for <pgsql-hackers@postgreSQL.org>; Mon, 22 Mar 1999 20:13:21 -0500 (EST)
(envelope-from tgl@sss.pgh.pa.us)
Received: from sss.sss.pgh.pa.us (localhost [127.0.0.1])
by sss.sss.pgh.pa.us (8.9.1/8.9.1) with ESMTP id UAA23294;
Mon, 22 Mar 1999 20:12:43 -0500 (EST)
To: Erik Riedel <riedel+@CMU.EDU>
cc: pgsql-hackers@postgreSQL.org
Subject: Re: [HACKERS] optimizer and type question
In-reply-to: Your message of Mon, 22 Mar 1999 18:27:15 -0500 (EST)
<sqxh7H_00gNtAmTJ5Q@andrew.cmu.edu>
Date: Mon, 22 Mar 1999 20:12:43 -0500
Message-ID: <23292.922151563@sss.pgh.pa.us>
From: Tom Lane <tgl@sss.pgh.pa.us>
Sender: owner-pgsql-hackers@postgreSQL.org
Precedence: bulk
Status: ROr
Erik Riedel <riedel+@CMU.EDU> writes:
> [ optimizer doesn't find relevant pg_statistic entry ]
It's clearly a bug that the selectivity code is not finding this tuple.
If your analysis is correct, then selectivity estimation has *never*
worked properly, or at least not in recent memory :-(. Yipes.
Bruce and I found a bunch of other problems in the optimizer recently,
so it doesn't faze me to assume that this is broken too.
> the "offending" line is setting the staop to InvalidOid (i.e. 0).
> Question 2 - is this right? Is the intent for 0 to serve as a
> "wildcard",
My thought is that what the staop column ought to be is the OID of the
comparison function that was used to determine the sort order of the
column. Without a sort op the lowest and highest keys in the column are
not well defined, so it makes no sense to assert "these are the lowest
and highest values" without providing the sort op that determined that.
(For sufficiently complex data types one could reasonably have multiple
ordering operators. A crude example is sorting on "circumference" and
"area" for polygons.) But typically the sort op will be the "<"
operator for the column data type.
So, the vacuum code is definitely broken --- it's not storing the sort
op that it used. The code in gethilokey might be broken too, depending
on how it is producing the operator it's trying to match against the
tuple. For example, if the actual operator in the query is any of
< <= > >= on int4, then int4lt ought to be used to probe the pg_statistic
table. I'm not sure if we have adequate info in pg_operator or pg_type
to let the optimizer code determine the right thing to probe with :-(
> The immediate next thing that intltsel() does, near lines 122
> in selfuncs.c is call atol() on the strings from gethilokey(). And
> guess what it comes up with?
> low = 1
> high = 12
> because it calls atol() on '01-02-1992' and '12-01-1998'. This
> clearly isn't right, it should get some large integer that includes
> the year and day in the result. Then it should compare reasonably
> with my constant from the where clause and give a decent selectivity
> value. This leads to a re-visit of Question 1.
> Question 4 - should date "<=" use a dateltsel() function instead of
> intltsel() as oprrest?
This is clearly busted as well. I'm not sure that creating dateltsel()
is the right fix, however, because if you go down that path then every
single datatype needs its own selectivity function; that's more than we
need.
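The atol() failure quoted above is easy to reproduce outside the backend. The following is only a minimal sketch: key_as_long() is a hypothetical stand-in for what intltsel() effectively does with the text keys it gets back from gethilokey(), not a function in the source tree.

```c
#include <stdlib.h>

/*
 * Hypothetical helper (not in the PostgreSQL tree): treats a key stored
 * as text in pg_statistic as if it were an integer, the way the atol()
 * calls in intltsel() do.
 */
long key_as_long(const char *stored_key)
{
    /* atol() stops at the first character that cannot be part of an integer */
    return atol(stored_key);
}
```

With the two date strings from the report, key_as_long("01-02-1992") yields 1 and key_as_long("12-01-1998") yields 12, because parsing stops at the first '-'. Only the month survives, so the resulting "range" says nothing useful about the column.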
What we really want here is to be able to map datatype values into
some sort of numeric range so that we can compute what fraction of the
low-key-to-high-key range is on each side of the probe value (the
constant taken from the query). This general concept will apply to
many scalar types, so what we want is a type-specific mapping function
and a less-specific fraction-computing-function. Offhand I'd say that
we want intltsel() and floatltsel(), plus conversion routines that can
produce either int4 or float8 from a data type as seems appropriate.
Anything that couldn't map to one or the other would have to supply its
own selectivity function.
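A rough sketch of that split, assuming one small type-specific mapping function per data type and a single generic fraction routine. All names here (to_numeric_fn, int4_to_numeric, float8_to_numeric, fraction_below) are hypothetical and chosen only for illustration; the 1/3 fallback mirrors the default estimate the existing code already uses.

```c
#include <stdint.h>

/* Hypothetical type-specific mapper: turns a value of some type into a double. */
typedef double (*to_numeric_fn)(const void *value);

double int4_to_numeric(const void *value)
{
    return (double) *(const int32_t *) value;
}

double float8_to_numeric(const void *value)
{
    return *(const double *) value;
}

/*
 * Generic part: the fraction of the low-key-to-high-key range that lies
 * below the probe value.  Falls back to 1/3 when the range is empty or
 * inverted, since no interpolation is possible then.
 */
double fraction_below(to_numeric_fn map,
                      const void *low, const void *high, const void *probe)
{
    double lo = map(low);
    double hi = map(high);
    double val = map(probe);

    if (hi <= lo)
        return 1.0 / 3.0;
    if (val <= lo)
        return 0.0;
    if (val >= hi)
        return 1.0;
    return (val - lo) / (hi - lo);
}
```

The point of the design is that only the mapper knows about the data type; fraction_below() never needs to change when a new scalar type is added.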
> Or is the problem in the
> way attribute values are stored in pg_statistic by vacuum analyze?
Looks like it converts the low and high values to text and stores them
that way. Ugly as can be :-( but I'm not sure there is a good
alternative. We have no "wild card" column type AFAIK, which is what
these columns of pg_statistic would have to be to allow storage of
unconverted min and max values.
I think you've found a can of worms here. Congratulations ;-)
regards, tom lane
From owner-pgsql-hackers@hub.org Mon Mar 22 23:31:00 1999
Received: from renoir.op.net (root@renoir.op.net [209.152.193.4])
by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id XAA03384
for <maillist@candle.pha.pa.us>; Mon, 22 Mar 1999 23:30:58 -0500 (EST)
Received: from hub.org (majordom@hub.org [209.47.145.100]) by renoir.op.net (o1/$Revision: 1.1 $) with ESMTP id XAA25586 for <maillist@candle.pha.pa.us>; Mon, 22 Mar 1999 23:18:25 -0500 (EST)
Received: from localhost (majordom@localhost)
by hub.org (8.9.2/8.9.1) with SMTP id XAA17955;
Mon, 22 Mar 1999 23:17:24 -0500 (EST)
(envelope-from owner-pgsql-hackers@hub.org)
Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Mon, 22 Mar 1999 23:16:49 +0000 (EST)
Received: (from majordom@localhost)
by hub.org (8.9.2/8.9.1) id XAA17764
for pgsql-hackers-outgoing; Mon, 22 Mar 1999 23:16:46 -0500 (EST)
(envelope-from owner-pgsql-hackers@postgreSQL.org)
Received: from po8.andrew.cmu.edu (PO8.ANDREW.CMU.EDU [128.2.10.108])
by hub.org (8.9.2/8.9.1) with ESMTP id XAA17745
for <pgsql-hackers@postgreSQL.org>; Mon, 22 Mar 1999 23:16:39 -0500 (EST)
(envelope-from er1p+@andrew.cmu.edu)
Received: (from postman@localhost) by po8.andrew.cmu.edu (8.8.5/8.8.2) id XAA04273; Mon, 22 Mar 1999 23:16:37 -0500 (EST)
Received: via switchmail; Mon, 22 Mar 1999 23:16:37 -0500 (EST)
Received: from hazy.adsl.net.cmu.edu via qmail
ID </afs/andrew.cmu.edu/service/mailqs/q000/QF.kqxlJ:S00anI00p040>;
Mon, 22 Mar 1999 23:15:09 -0500 (EST)
Received: from hazy.adsl.net.cmu.edu via qmail
ID </afs/andrew.cmu.edu/usr2/er1p/.Outgoing/QF.MqxlJ3q00anI01hKE0>;
Mon, 22 Mar 1999 23:15:00 -0500 (EST)
Received: from mms.4.60.Jun.27.1996.03.02.53.sun4.51.EzMail.2.0.CUILIB.3.45.SNAP.NOT.LINKED.hazy.adsl.net.cmu.edu.sun4m.54
via MS.5.6.hazy.adsl.net.cmu.edu.sun4_51;
Mon, 22 Mar 1999 23:14:55 -0500 (EST)
Message-ID: <4qxlJ0200anI01hK40@andrew.cmu.edu>
Date: Mon, 22 Mar 1999 23:14:55 -0500 (EST)
From: Erik Riedel <riedel+@CMU.EDU>
To: Tom Lane <tgl@sss.pgh.pa.us>
Subject: Re: [HACKERS] optimizer and type question
Cc: pgsql-hackers@postgreSQL.org
In-Reply-To: <23292.922151563@sss.pgh.pa.us>
References: <23292.922151563@sss.pgh.pa.us>
Sender: owner-pgsql-hackers@postgreSQL.org
Precedence: bulk
Status: ROr
OK, building on your high-level explanation, I am attaching a patch that
attempts to do something "better" than the current code. Note that I
have only tested this with the date type and my particular query. I
haven't run it through the regression tests, so consider it "proof of concept"
at best, although hopefully it will serve my purposes.
> My thought is that what the staop column ought to be is the OID of the
> comparison function that was used to determine the sort order of the
> column. Without a sort op the lowest and highest keys in the column are
> not well defined, so it makes no sense to assert "these are the lowest
> and highest values" without providing the sort op that determined that.
>
> (For sufficiently complex data types one could reasonably have multiple
> ordering operators. A crude example is sorting on "circumference" and
> "area" for polygons.) But typically the sort op will be the "<"
> operator for the column data type.
>
I changed vacuum.c to do exactly that: it now stores the OID of the "<"
comparison function that was used for sorting.
> So, the vacuum code is definitely broken --- it's not storing the sort
> op that it used. The code in gethilokey might be broken too, depending
> on how it is producing the operator it's trying to match against the
> tuple. For example, if the actual operator in the query is any of
> < <= > >= on int4, then int4lt ought to be used to probe the pg_statistic
> table. I'm not sure if we have adequate info in pg_operator or pg_type
> to let the optimizer code determine the right thing to probe with :-(
>
This indeed seems like a bigger problem. I thought about somehow using
type-matching from the sort op and the actual operator in the query - if
both the left and right type match, then consider them the same for
purposes of this probe. That seemed complicated, so I punted in my
example: it just does the search with relid and attnum and assumes that
the search returns only one tuple. This works in my case (and maybe in
all cases, because of the way vacuum is currently written?).
> What we really want here is to be able to map datatype values into
> some sort of numeric range so that we can compute what fraction of the
> low-key-to-high-key range is on each side of the probe value (the
> constant taken from the query). This general concept will apply to
> many scalar types, so what we want is a type-specific mapping function
> and a less-specific fraction-computing-function. Offhand I'd say that
> we want intltsel() and floatltsel(), plus conversion routines that can
> produce either int4 or float8 from a data type as seems appropriate.
> Anything that couldn't map to one or the other would have to supply its
> own selectivity function.
>
This is what my example then does: it uses the stored sort op to get the
type, then uses typinput to convert from the string to an int4, and
finally puts the int4 back into string format because that is what the
callers expect.
It seems to work for my particular query. I now get:
(selfuncs) gethilokey() obj 18663 attr 11 opid 1096 (ignored)
(selfuncs) gethilokey() found op 1087 in pg_proc
(selfuncs) gethilokey() found type 1082 in pg_type
(selfuncs) gethilokey() going to use 1084 to convert type 1082
(selfuncs) gethilokey() have low -2921 high -396
(selfuncs) intltsel() high -396 low -2921 val -486
(plancat) restriction_selectivity() for func 103 op 1096 rel 18663 attr 11 const -486 flag 3 returns 0.964356
NOTICE: QUERY PLAN:
Sort (cost=34467.88 size=0 width=0)
  -> Aggregate (cost=34467.88 size=0 width=0)
        -> Group (cost=34467.88 size=0 width=0)
              -> Sort (cost=34467.88 size=0 width=0)
                    -> Seq Scan on lineitem (cost=34467.88 size=579166 width=44)
including my printfs, which exist in the patch as well.
Selectivity is now the expected 96% and the size estimate for the seq
scan is much closer to correct.
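As a sanity check, the logged selectivity is consistent with plain linear interpolation between the low and high keys. This is an assumption about what intltsel() computes for this flag value, not something verified against the code:

```latex
\[
\frac{\mathrm{val} - \mathrm{low}}{\mathrm{high} - \mathrm{low}}
  = \frac{-486 - (-2921)}{-396 - (-2921)}
  = \frac{2435}{2525}
  \approx 0.964356
\]
```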
Again, not tested with anything besides date, so caveat not-tested.
Hope this helps.
Erik
----------------------[optimizer_fix.sh]------------------------
#! /bin/sh
# This is a shell archive, meaning:
# 1. Remove everything above the #! /bin/sh line.
# 2. Save the resulting text in a file.
# 3. Execute the file with /bin/sh (not csh) to create:
# selfuncs.c.diff
# vacuum.c.diff
# This archive created: Mon Mar 22 22:58:14 1999
export PATH; PATH=/bin:/usr/bin:$PATH
if test -f 'selfuncs.c.diff'
then
echo shar: "will not over-write existing file 'selfuncs.c.diff'"
else
cat << \SHAR_EOF > 'selfuncs.c.diff'
*** /afs/ece.cmu.edu/project/lcs/lcs-004/er1p/postgres/611/src/backend/utils/adt/selfuncs.c Thu Mar 11 23:59:35 1999
--- /afs/ece.cmu.edu/project/lcs/lcs-004/er1p/postgres/615/src/backend/utils/adt/selfuncs.c Mon Mar 22 22:57:25 1999
***************
*** 32,37 ****
--- 32,40 ----
#include "utils/lsyscache.h" /* for get_oprrest() */
#include "catalog/pg_statistic.h"
+ #include "catalog/pg_proc.h" /* for Form_pg_proc */
+ #include "catalog/pg_type.h" /* for Form_pg_type */
+
/* N is not a valid var/constant or relation id */
#define NONVALUE(N) ((N) == -1)
***************
*** 103,110 ****
bottom;
result = (float64) palloc(sizeof(float64data));
! if (NONVALUE(attno) || NONVALUE(relid))
*result = 1.0 / 3;
else
{
/* XXX val = atol(value); */
--- 106,114 ----
bottom;
result = (float64) palloc(sizeof(float64data));
! if (NONVALUE(attno) || NONVALUE(relid)) {
*result = 1.0 / 3;
+ }
else
{
/* XXX val = atol(value); */
***************
*** 117,130 ****
}
high = atol(highchar);
low = atol(lowchar);
if ((flag & SEL_RIGHT && val < low) ||
(!(flag & SEL_RIGHT) && val > high))
{
float32data nvals;
nvals = getattdisbursion(relid, (int) attno);
! if (nvals == 0)
*result = 1.0 / 3.0;
else
{
*result = 3.0 * (float64data) nvals;
--- 121,136 ----
}
high = atol(highchar);
low = atol(lowchar);
+ printf("(selfuncs) intltsel() high %d low %d val %d\n",high,low,val);
if ((flag & SEL_RIGHT && val < low) ||
(!(flag & SEL_RIGHT) && val > high))
{
float32data nvals;
nvals = getattdisbursion(relid, (int) attno);
! if (nvals == 0) {
*result = 1.0 / 3.0;
+ }
else
{
*result = 3.0 * (float64data) nvals;
***************
*** 336,341 ****
--- 342,353 ----
{
Relation rel;
HeapScanDesc scan;
+ /* this assumes there is only one row in the statistics table for any particular */
+ /* relid, attnum pair - could be more complicated if staop is also used. */
+ /* at the moment, if there are multiple rows, this code ends up picking the */
+ /* "first" one - er1p */
+ /* the actual "ignoring" is done in the call to heap_beginscan() below, where */
+ /* we only mention 2 of the 3 keys in this array - er1p */
static ScanKeyData key[3] = {
{0, Anum_pg_statistic_starelid, F_OIDEQ, {0, 0, F_OIDEQ}},
{0, Anum_pg_statistic_staattnum, F_INT2EQ, {0, 0, F_INT2EQ}},
***************
*** 344,355 ****
bool isnull;
HeapTuple tuple;
rel = heap_openr(StatisticRelationName);
key[0].sk_argument = ObjectIdGetDatum(relid);
key[1].sk_argument = Int16GetDatum((int16) attnum);
key[2].sk_argument = ObjectIdGetDatum(opid);
! scan = heap_beginscan(rel, 0, SnapshotNow, 3, key);
tuple = heap_getnext(scan, 0);
if (!HeapTupleIsValid(tuple))
{
--- 356,377 ----
bool isnull;
HeapTuple tuple;
+ HeapTuple tup;
+ Form_pg_proc proc;
+ Form_pg_type typ;
+ Oid which_op;
+ Oid which_type;
+ int32 low_value;
+ int32 high_value;
+
rel = heap_openr(StatisticRelationName);
key[0].sk_argument = ObjectIdGetDatum(relid);
key[1].sk_argument = Int16GetDatum((int16) attnum);
key[2].sk_argument = ObjectIdGetDatum(opid);
! printf("(selfuncs) gethilokey() obj %d attr %d opid %d (ignored)\n",
! key[0].sk_argument,key[1].sk_argument,key[2].sk_argument);
! scan = heap_beginscan(rel, 0, SnapshotNow, 2, key);
tuple = heap_getnext(scan, 0);
if (!HeapTupleIsValid(tuple))
{
***************
*** 376,383 ****
--- 398,461 ----
&isnull));
if (isnull)
elog(DEBUG, "gethilokey: low key is null");
+
heap_endscan(scan);
heap_close(rel);
+
+ /* now we deal with type conversion issues */
+ /* when intltsel() calls this routine (who knows what other callers might do) */
+ /* it assumes that it can call atol() on the strings and then use integer */
+ /* comparison from there. what we are going to do here, then, is try to use */
+ /* the type information from Anum_pg_statistic_staop to convert the high */
+ /* and low values - er1p */
+
+ /* WARNING: this code has only been tested with the date type and has NOT */
+ /* been regression tested. consider it "sample" code of what might be the */
+ /* right kind of thing to do - er1p */
+
+ /* get the 'op' from pg_statistic and look it up in pg_proc */
+ which_op = heap_getattr(tuple,
+ Anum_pg_statistic_staop,
+ RelationGetDescr(rel),
+ &isnull);
+ if (InvalidOid == which_op) {
+ /* ignore all this stuff, try conversion only if we have a valid staop */
+ /* note that there is an accompanying change to 'vacuum analyze' that */
+ /* gets this set to something useful. */
+ } else {
+ /* staop looks valid, so let's see what we can do about conversion */
+ tup = SearchSysCacheTuple(PROOID, ObjectIdGetDatum(which_op), 0, 0, 0);
+ if (!HeapTupleIsValid(tup)) {
+ elog(ERROR, "selfuncs: unable to find op in pg_proc %d", which_op);
+ }
+ printf("(selfuncs) gethilokey() found op %d in pg_proc\n",which_op);
+
+ /* use that to determine the type of stahikey and stalokey via pg_type */
+ proc = (Form_pg_proc) GETSTRUCT(tup);
+ which_type = proc->proargtypes[0]; /* XXX - use left and right separately? */
+ tup = SearchSysCacheTuple(TYPOID, ObjectIdGetDatum(which_type), 0, 0, 0);
+ if (!HeapTupleIsValid(tup)) {
+ elog(ERROR, "selfuncs: unable to find type in pg_type %d", which_type);
+ }
+ printf("(selfuncs) gethilokey() found type %d in pg_type\n",which_type);
+
+ /* and use that type to get the conversion function to int4 */
+ typ = (Form_pg_type) GETSTRUCT(tup);
+ printf("(selfuncs) gethilokey() going to use %d to convert type %d\n",typ->typinput,which_type);
+
+ /* and convert the low and high strings */
+ low_value = (int32) fmgr(typ->typinput, *low, -1);
+ high_value = (int32) fmgr(typ->typinput, *high, -1);
+ printf("(selfuncs) gethilokey() have low %d high %d\n",low_value,high_value);
+
+ /* now we have int4's, which we put back into strings because that's what our */
+ /* callers (intltsel() at least) expect - er1p */
+ pfree(*low); pfree(*high); /* let's not leak the old strings */
+ *low = int4out(low_value);
+ *high = int4out(high_value);
+
+ /* XXX - this probably leaks the two tups we got from SearchSysCacheTuple() - er1p */
+ }
}
float64
SHAR_EOF
fi
if test -f 'vacuum.c.diff'
then
echo shar: "will not over-write existing file 'vacuum.c.diff'"
else
cat << \SHAR_EOF > 'vacuum.c.diff'
*** /afs/ece.cmu.edu/project/lcs/lcs-004/er1p/postgres/611/src/backend/commands/vacuum.c Thu Mar 11 23:59:09 1999
--- /afs/ece.cmu.edu/project/lcs/lcs-004/er1p/postgres/615/src/backend/commands/vacuum.c Mon Mar 22 21:23:15 1999
***************
*** 1842,1848 ****
i = 0;
values[i++] = (Datum) relid; /* 1 */
values[i++] = (Datum) attp->attnum; /* 2 */
! values[i++] = (Datum) InvalidOid; /* 3 */
fmgr_info(stats->outfunc, &out_function);
out_string = (*fmgr_faddr(&out_function)) (stats->min,
stats->attr->atttypid);
values[i++] = (Datum) fmgr(F_TEXTIN, out_string);
--- 1842,1848 ----
i = 0;
values[i++] = (Datum) relid; /* 1 */
values[i++] = (Datum) attp->attnum; /* 2 */
! values[i++] = (Datum) stats->f_cmplt.fn_oid; /* 3 */ /* get the '<' oid, instead of 'invalid' - er1p */
fmgr_info(stats->outfunc, &out_function);
out_string = (*fmgr_faddr(&out_function)) (stats->min,
stats->attr->atttypid);
values[i++] = (Datum) fmgr(F_TEXTIN, out_string);
SHAR_EOF
fi
exit 0
# End of shell archive
From owner-pgsql-hackers@hub.org Tue Mar 23 12:31:05 1999
Received: from renoir.op.net (root@renoir.op.net [209.152.193.4])
by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id MAA17491
for <maillist@candle.pha.pa.us>; Tue, 23 Mar 1999 12:31:04 -0500 (EST)
Received: from hub.org (majordom@hub.org [209.47.145.100]) by renoir.op.net (o1/$Revision: 1.1 $) with ESMTP id MAA08839 for <maillist@candle.pha.pa.us>; Tue, 23 Mar 1999 12:08:14 -0500 (EST)
Received: from localhost (majordom@localhost)
by hub.org (8.9.2/8.9.1) with SMTP id MAA93649;
Tue, 23 Mar 1999 12:04:57 -0500 (EST)
(envelope-from owner-pgsql-hackers@hub.org)
Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Tue, 23 Mar 1999 12:03:00 +0000 (EST)
Received: (from majordom@localhost)
by hub.org (8.9.2/8.9.1) id MAA93355
for pgsql-hackers-outgoing; Tue, 23 Mar 1999 12:02:55 -0500 (EST)
(envelope-from owner-pgsql-hackers@postgreSQL.org)
Received: from sss.sss.pgh.pa.us (sss.pgh.pa.us [206.210.65.6])
by hub.org (8.9.2/8.9.1) with ESMTP id MAA93336
for <pgsql-hackers@postgreSQL.org>; Tue, 23 Mar 1999 12:02:43 -0500 (EST)
(envelope-from tgl@sss.pgh.pa.us)
Received: from sss.sss.pgh.pa.us (localhost [127.0.0.1])
by sss.sss.pgh.pa.us (8.9.1/8.9.1) with ESMTP id MAA24455;
Tue, 23 Mar 1999 12:01:57 -0500 (EST)
To: Erik Riedel <riedel+@CMU.EDU>
cc: pgsql-hackers@postgreSQL.org
Subject: Re: [HACKERS] optimizer and type question
In-reply-to: Your message of Mon, 22 Mar 1999 23:14:55 -0500 (EST)
<4qxlJ0200anI01hK40@andrew.cmu.edu>
Date: Tue, 23 Mar 1999 12:01:57 -0500
Message-ID: <24453.922208517@sss.pgh.pa.us>
From: Tom Lane <tgl@sss.pgh.pa.us>
Sender: owner-pgsql-hackers@postgreSQL.org
Precedence: bulk
Status: RO
Erik Riedel <riedel+@CMU.EDU> writes:
> OK, building on your high-level explanation, I am attaching a patch that
> attempts to do something "better" than the current code. Note that I
> have only tested this with the date type and my particular query.
Glad to see you working on this. I don't like the details of your
patch too much though ;-). Here are some suggestions for making it
better.
1. I think just removing staop from the lookup in gethilokey is OK for
now, though I'm dubious about Bruce's thought that we could delete that
field entirely. As you observe, vacuum will not currently put more
than one tuple for a column into pg_statistic, so we can just do the
lookup with relid and attno and leave it at that. But I think we ought
to leave the field there, with the idea that vacuum might someday
compute more than one statistic for a data column. Fixing vacuum to
put its sort op into the field is a good idea in the meantime.
2. The type conversion you're doing in gethilokey is a mess; I think
what you ought to make it do is simply the inbound conversion of the
string from pg_statistic into the internal representation for the
column's datatype, and return that value as a Datum. It also needs
a cleaner success/failure return convention --- this business with
"n" return is ridiculously type-specific. Also, the best and easiest
way to find the type to convert to is to look up the column type in
the info for the given relid, not search pg_proc with the staop value.
(I'm not sure that will even work, since there are pg_proc entries
with wildcard argument types.)
3. The atol() calls currently found in intltsel are a type-specific
cheat on what is conceptually a two-step process:
* Convert the string stored in pg_statistic back to the internal
form for the column data type.
* Generate a numeric representation of the data value that can be
used as an estimate of the range of values in the table.
The second step is trivial for integers, which may obscure the fact
that there are two steps involved, but nonetheless there are. If
you think about applying selectivity logic to strings, say, it
becomes clear that the second step is a necessary component of the
process. Furthermore, the second step must also be applied to the
probe value that's being passed into the selectivity operator.
(The probe value is already in internal form, of course; but it is
not necessarily in a useful numeric form.)
We can do the first of these steps by applying the appropriate "XXXin"
conversion function for the column data type, as you have done. The
interesting question is how to do the second one. A really clean
solution would require adding a column to pg_type that points to a
function that will do the appropriate conversion. I'd be inclined to
make all of these functions return "double" (float8) and just have one
top-level selectivity routine for all data types that can use
range-based selectivity logic.
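The two steps can be sketched for the date case as follows. This is an illustration only: it assumes the stored text has the MM-DD-YYYY form seen in the report and that the internal form of a date is a day count relative to 2000-01-01, which is consistent with the low -2921 / high -396 values logged earlier in the thread. The function names are hypothetical and are not the backend's actual "XXXin" routines.

```c
#include <stdio.h>

/* Days from 1970-01-01 to the given proleptic Gregorian date. */
static long days_from_civil(long y, unsigned m, unsigned d)
{
    long era;
    unsigned long yoe, doy, doe;

    y -= (m <= 2);                      /* treat Jan/Feb as end of previous year */
    era = (y >= 0 ? y : y - 399) / 400; /* 400-year Gregorian cycle */
    yoe = (unsigned long) (y - era * 400);
    doy = (153 * (m > 2 ? m - 3 : m + 9) + 2) / 5 + d - 1;
    doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;
    return era * 146097 + (long) doe - 719468;
}

/*
 * Step 1: text from pg_statistic -> internal form (days since 2000-01-01).
 * Returns 0 on success and -1 if the text cannot be parsed, so the caller
 * gets a type-independent success/failure indication.
 */
int date_text_to_internal(const char *text, long *days_out)
{
    unsigned m, d;
    long y;

    if (sscanf(text, "%u-%u-%ld", &m, &d, &y) != 3)
        return -1;
    if (m < 1 || m > 12 || d < 1 || d > 31)
        return -1;
    /* 10957 is the number of days from 1970-01-01 to 2000-01-01 */
    *days_out = days_from_civil(y, m, d) - 10957;
    return 0;
}

/* Step 2: internal form -> numeric value usable for range estimation. */
double date_internal_to_numeric(long days)
{
    return (double) days;
}
```

For dates the second step is as trivial as it is for integers; the value of keeping it separate only shows up for types such as text, where the internal form is not already a number.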
We could probably hack something together that would not use an explicit
conversion function for each data type, but instead would rely on
type-specific assumptions inside the selectivity routines. We'd need many
more selectivity routines though (at least one for each of int, float4,
float8, and text data types) so I'm not sure we'd really save any work
compared to doing it right.
BTW, now that I look at this issue it's real clear that the selectivity
entries in pg_operator are horribly broken. The intltsel/intgtsel
selectivity routines are currently applied to 32 distinct data types:
regression=> select distinct typname,oprleft from pg_operator, pg_type
regression-> where pg_type.oid = oprleft
regression-> and oprrest in (103,104);
typname |oprleft
---------+-------
_aclitem | 1034
abstime | 702
bool | 16
box | 603
bpchar | 1042
char | 18
cidr | 650
circle | 718
date | 1082
datetime | 1184
float4 | 700
float8 | 701
inet | 869
int2 | 21
int4 | 23
int8 | 20
line | 628
lseg | 601
macaddr | 829
money | 790
name | 19
numeric | 1700
oid | 26
oid8 | 30
path | 602
point | 600
polygon | 604
text | 25
time | 1083
timespan | 1186
timestamp| 1296
varchar | 1043
(32 rows)
many of which are very obviously not compatible with integer for *any*
purpose. It looks to me like a lot of data types were added to
pg_operator just by copy-and-paste, without paying attention to whether
the selectivity routines were actually correct for the data type.
As the code stands today, the bogus entries don't matter because
gethilokey always fails, so we always get 1/3 as the selectivity
estimate for any comparison operator (except = and != of course).
I had actually noticed that fact and assumed that it was supposed
to work that way :-(. But, clearly, there is code in here that
is *trying* to be smarter.
As soon as we fix gethilokey so that it can succeed, we will start
getting essentially-random selectivity estimates for those data types
that aren't actually binary-compatible with integer. That will not do;
we have to do something about the issue.
regards, tom lane
From tgl@sss.pgh.pa.us Tue Mar 23 12:31:02 1999
Received: from renoir.op.net (root@renoir.op.net [209.152.193.4])
by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id MAA17484
for <maillist@candle.pha.pa.us>; Tue, 23 Mar 1999 12:31:01 -0500 (EST)
Received: from sss.sss.pgh.pa.us (sss.pgh.pa.us [206.210.65.6]) by renoir.op.net (o1/$Revision: 1.1 $) with ESMTP id MAA09042 for <maillist@candle.pha.pa.us>; Tue, 23 Mar 1999 12:10:55 -0500 (EST)
Received: from sss.sss.pgh.pa.us (localhost [127.0.0.1])
by sss.sss.pgh.pa.us (8.9.1/8.9.1) with ESMTP id MAA24474;
Tue, 23 Mar 1999 12:09:52 -0500 (EST)
To: Bruce Momjian <maillist@candle.pha.pa.us>
cc: riedel+@CMU.EDU, pgsql-hackers@postgreSQL.org
Subject: Re: [HACKERS] optimizer and type question
In-reply-to: Your message of Mon, 22 Mar 1999 21:25:45 -0500 (EST)
<199903230225.VAA01641@candle.pha.pa.us>
Date: Tue, 23 Mar 1999 12:09:52 -0500
Message-ID: <24471.922208992@sss.pgh.pa.us>
From: Tom Lane <tgl@sss.pgh.pa.us>
Status: RO
Bruce Momjian <maillist@candle.pha.pa.us> writes:
> What we really need is some way to determine how far the requested value
> is from the min/max values. With int, we just do (val-min)/(max-min).
> That works, but how do we do that for types that don't support division?
> Strings come to mind in this case.
What I'm envisioning is that we still apply the (val-min)/(max-min)
logic, but apply it to numeric values that are produced in a
type-dependent way.
For ints and floats the conversion is trivial, of course.
For strings, the first thing that comes to mind is to return 0 for a
null string and the value of the first byte for a non-null string.
This would give you one-part-in-256 selectivity which is plenty good
enough for what the selectivity code needs to do. (Actually, it's
only that good if the strings' first bytes are pretty well spread out.
If you have a table containing English words, for example, you might
only get about one part in 26 this way, since the first bytes will
probably only run from A to Z. Might be better to use the first two
characters of the string to compute the selectivity representation.)
In general, you can apply this logic as long as you can come up with
some numerical approximation to the data type's sorting order. It
doesn't have to be exact.
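A minimal sketch of the two-character variant, assuming a single-byte encoding in which byte order matches the sort order. text_to_numeric() is a hypothetical name used only for illustration.

```c
#include <stddef.h>

/*
 * Hypothetical mapping from a string to a number that approximates its
 * sort position: 0 for a missing or empty string, otherwise the first
 * two bytes read as a base-256 number.  A one-character string uses its
 * terminating '\0' as the second byte.
 */
double text_to_numeric(const char *s)
{
    unsigned char b0, b1;

    if (s == NULL || s[0] == '\0')
        return 0.0;
    b0 = (unsigned char) s[0];
    b1 = (unsigned char) s[1];
    return b0 * 256.0 + b1;
}
```

Strings that share their first two bytes map to the same number, so this only approximates the ordering, which is all the selectivity estimate needs.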
regards, tom lane

313
doc/TODO.detail/outer Normal file
View File

@ -0,0 +1,313 @@
From lockhart@alumni.caltech.edu Thu Jan 7 13:31:08 1999
Received: from renoir.op.net (root@renoir.op.net [209.152.193.4])
by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id NAA07771
for <maillist@candle.pha.pa.us>; Thu, 7 Jan 1999 13:31:06 -0500 (EST)
Received: from golem.jpl.nasa.gov (IDENT:root@hectic-2.jpl.nasa.gov [128.149.68.204]) by renoir.op.net (o1/$Revision: 1.1 $) with ESMTP id NAA14597 for <maillist@candle.pha.pa.us>; Thu, 7 Jan 1999 13:27:37 -0500 (EST)
Received: from alumni.caltech.edu (localhost [127.0.0.1])
by golem.jpl.nasa.gov (8.8.5/8.8.5) with ESMTP id SAA13416;
Thu, 7 Jan 1999 18:26:56 GMT
Sender: tgl@mythos.jpl.nasa.gov
Message-ID: <3694FC70.FAD67BC3@alumni.caltech.edu>
Date: Thu, 07 Jan 1999 18:26:56 +0000
From: "Thomas G. Lockhart" <lockhart@alumni.caltech.edu>
Organization: Caltech/JPL
X-Mailer: Mozilla 4.07 [en] (X11; I; Linux 2.0.30 i686)
MIME-Version: 1.0
To: Bruce Momjian <maillist@candle.pha.pa.us>
CC: Postgres Hackers List <hackers@postgresql.org>
Subject: Outer Joins (and need CASE help)
References: <199901071747.MAA07054@candle.pha.pa.us>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Status: RO
> Thomas, do you need help on outer joins?
Yes. I'm going slowly partly because I get distracted with other
Postgres stuff like docs, and partly because I don't understand all of
the pieces I'm working with.
I've identified the place in the MergeJoin code where the null filling
for outer joins needs to happen, and have the "merge walk" code done.
But I don't have the supporting code which actually would know how to
null-fill a result tuple from the left or right. I thought you might be
interested in that?
I've done some work in the parser, and can now do things like:
postgres=> select * from t1 join t2 using (i);
NOTICE: JOIN not yet implemented
i|j|i|k
-+-+-+-
1|2|1|3
(1 row)
But this is just an inner join, and the result isn't quite right since
the second "i" column should probably be omitted. At the moment I
transform it from the syntax above into existing parse nodes, and
everything from there on works.
I don't yet pass an explicit join node into the planner/optimizer, and
that will be the hardest part I assume. Perhaps we can work on that
together.
So, what I'll try to do (soon, in the next few days?) is put in
#ifdef ENABLE_OUTER_JOINS
conditional code into the parser area (already there for the executor)
and commit everything to the development tree. Does that sound OK?
Oh, and if anyone is looking for something to do, I've got a couple of
CASE statements in the case.sql regression test which are commented out
because they crash the backend. They involve references to multiple
tables within a single result column, and in other contexts that
construct works. It would be great if someone had time to track it
down...
- Tom
From lockhart@alumni.caltech.edu Mon Feb 22 02:01:13 1999
Received: from renoir.op.net (root@renoir.op.net [209.152.193.4])
by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id CAA22073
for <maillist@candle.pha.pa.us>; Mon, 22 Feb 1999 02:01:12 -0500 (EST)
Received: from golem.jpl.nasa.gov (IDENT:root@hectic-2.jpl.nasa.gov [128.149.68.204]) by renoir.op.net (o1/$Revision: 1.1 $) with ESMTP id BAA26054 for <maillist@candle.pha.pa.us>; Mon, 22 Feb 1999 01:57:00 -0500 (EST)
Received: from alumni.caltech.edu (localhost [127.0.0.1])
by golem.jpl.nasa.gov (8.8.5/8.8.5) with ESMTP id GAA04715;
Mon, 22 Feb 1999 06:56:36 GMT
Sender: tgl@mythos.jpl.nasa.gov
Message-ID: <36D0FFA4.32ADB75C@alumni.caltech.edu>
Date: Mon, 22 Feb 1999 06:56:36 +0000
From: "Thomas G. Lockhart" <lockhart@alumni.caltech.edu>
Organization: Caltech/JPL
X-Mailer: Mozilla 4.07 [en] (X11; I; Linux 2.0.36 i686)
MIME-Version: 1.0
To: Bruce Momjian <maillist@candle.pha.pa.us>
CC: hackers@postgreSQL.org
Subject: Re: start on outer join
References: <199902220304.WAA10066@candle.pha.pa.us>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Status: ROr
Bruce Momjian wrote:
>
> > Will apply ... some other changes laying a bit of
> > groundwork for outer joins so you can start on the planner/optimizer
> > parts :)
> Those will be a cinch now that I understand the optimizer. In fact, I
> think it all will happen in the executor.
I've modified executor/nodeMergeJoin.c to walk a left/right/both outer
join, but didn't fill in the part which actually creates the result
tuple (which will be the current left- or right-side tuple plus nulls
for filler). I hope this is up your alley :)
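A minimal sketch of the null-filling step described above, for a left outer merge join over two inputs already sorted on the join key. This is an illustration in Python, not the code in executor/nodeMergeJoin.c; the function name and the (key, value) row layout are assumptions made for the example.

```python
def merge_left_outer_join(left, right):
    """Left outer merge join of two lists of (key, value) pairs.

    Both inputs must be sorted by key. Every left row appears in the
    result; a left row with no partner is padded with None, which
    stands in for the NULL filler.
    """
    result = []
    j = 0
    for lkey, lval in left:
        # Skip right rows whose key is smaller than the current left key.
        while j < len(right) and right[j][0] < lkey:
            j += 1
        # Emit one joined row per right row with an equal key.
        k = j
        matched = False
        while k < len(right) and right[k][0] == lkey:
            result.append((lkey, lval, right[k][1]))
            matched = True
            k += 1
        # No partner found: emit the left row plus a null filler.
        if not matched:
            result.append((lkey, lval, None))
    return result
```

A right or full outer join would need the mirror-image bookkeeping for unmatched right rows, which this sketch leaves out.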
So far, I'm not certain what to pass to the planner. The syntax leads me
to pass a select structure from gram.y with a "JoinExpr" structure in
the "fromClause" list. I need to expand that with a combination of
column names and qualifications, but at the time I see the JoinExpr I
don't have access to the top query structure itself. So I may just keep
a modestly transformed JoinExpr to expand later or to pass to the
planner.
btw, the EXCEPT/INTERSECT stuff from Stefan has some ugliness in gram.y
which needs to be fixed (the shift/reduce conflict is not acceptable for
our release version) and some of that code clearly needs to move to
analyze.c or some other module.
- Tom
From maillist Wed Feb 24 05:27:08 1999
Received: (from maillist@localhost)
by candle.pha.pa.us (8.9.0/8.9.0) id FAA09648;
Wed, 24 Feb 1999 05:27:08 -0500 (EST)
From: Bruce Momjian <maillist>
Message-Id: <199902241027.FAA09648@candle.pha.pa.us>
Subject: Re: [HACKERS] OUTER joins
In-Reply-To: <199902240953.EAA08561@candle.pha.pa.us> from Bruce Momjian at "Feb 24, 1999 4:53:21 am"
To: maillist@candle.pha.pa.us (Bruce Momjian)
Date: Wed, 24 Feb 1999 05:27:07 -0500 (EST)
Cc: lockhart@alumni.caltech.edu, hackers@postgreSQL.org
X-Mailer: ELM [version 2.4ME+ PL47 (25)]
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Status: RO
>
> How do you propose doing outer joins in non-mergejoin situations?
> Mergejoins can only be used currently in equal joins.
Is your solution going to be to make sure the OUTER table is always a
MergeJoin, or on the outside of a join loop? That could work.
That could get tricky if the table is joined to _two_ other tables.
With the cleaned-up optimizer, we can disable non-merge joins in certain
circumstances, and prevent OUTER tables from being inner in the others.
Is that the plan?
--
Bruce Momjian | http://www.op.net/~candle
maillist@candle.pha.pa.us | (610) 853-3000
+ If your life is a hard drive, | 830 Blythe Avenue
+ Christ can be your backup. | Drexel Hill, Pennsylvania 19026
From lockhart@alumni.caltech.edu Mon Mar 1 13:01:08 1999
Received: from renoir.op.net (root@renoir.op.net [209.152.193.4])
by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id NAA21672
for <maillist@candle.pha.pa.us>; Mon, 1 Mar 1999 13:01:06 -0500 (EST)
Received: from golem.jpl.nasa.gov (IDENT:root@hectic-2.jpl.nasa.gov [128.149.68.204]) by renoir.op.net (o1/$Revision: 1.1 $) with ESMTP id MAA12756 for <maillist@candle.pha.pa.us>; Mon, 1 Mar 1999 12:14:16 -0500 (EST)
Received: from alumni.caltech.edu (localhost [127.0.0.1])
by golem.jpl.nasa.gov (8.8.5/8.8.5) with ESMTP id RAA09406;
Mon, 1 Mar 1999 17:10:49 GMT
Sender: tgl@mythos.jpl.nasa.gov
Message-ID: <36DACA19.E6DBE7D8@alumni.caltech.edu>
Date: Mon, 01 Mar 1999 17:10:49 +0000
From: "Thomas G. Lockhart" <lockhart@alumni.caltech.edu>
Organization: Caltech/JPL
X-Mailer: Mozilla 4.07 [en] (X11; I; Linux 2.0.36 i686)
MIME-Version: 1.0
To: Bruce Momjian <maillist@candle.pha.pa.us>
CC: PostgreSQL-development <hackers@postgreSQL.org>
Subject: Re: OUTER joins
References: <199902240953.EAA08561@candle.pha.pa.us>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Status: ROr
(back from a short vacation...)
> How do you propose doing outer joins in non-mergejoin situations?
> Mergejoins can only be used currently in equal joins.
Hadn't thought about it, other than figuring that implementing the
equi-join first was a good start. There is a class of outer join syntax
(the USING clause) which is implicitly an equi-join...
- Tom
From lockhart@alumni.caltech.edu Mon Mar 8 21:55:02 1999
Received: from renoir.op.net (root@renoir.op.net [209.152.193.4])
by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id VAA15978
for <maillist@candle.pha.pa.us>; Mon, 8 Mar 1999 21:54:57 -0500 (EST)
Received: from golem.jpl.nasa.gov (IDENT:root@hectic-1.jpl.nasa.gov [128.149.68.203]) by renoir.op.net (o1/$Revision: 1.1 $) with ESMTP id VAA15837 for <maillist@candle.pha.pa.us>; Mon, 8 Mar 1999 21:48:33 -0500 (EST)
Received: from alumni.caltech.edu (localhost [127.0.0.1])
by golem.jpl.nasa.gov (8.8.5/8.8.5) with ESMTP id CAA06996;
Tue, 9 Mar 1999 02:46:40 GMT
Sender: tgl@mythos.jpl.nasa.gov
Message-ID: <36E48B90.F3E902B7@alumni.caltech.edu>
Date: Tue, 09 Mar 1999 02:46:40 +0000
From: "Thomas G. Lockhart" <lockhart@alumni.caltech.edu>
Organization: Caltech/JPL
X-Mailer: Mozilla 4.07 [en] (X11; I; Linux 2.0.36 i686)
MIME-Version: 1.0
To: Bruce Momjian <maillist@candle.pha.pa.us>
CC: hackers@postgreSQL.org
Subject: Re: OUTER joins
References: <199903070325.WAA10357@candle.pha.pa.us>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Status: ROr
> > Hadn't thought about it, other than figuring that implementing the
> > equi-join first was a good start. There is a class of outer join
> > syntax (the USING clause) which is implicitly an equi-join...
> Not that easy. You don't automatically get a mergejoin from an
> equijoin. I will have to force outer's to be either mergejoins, or
> inners of non-merge joins. Can you add code to non-merge joins in the
> executor to throw out a null row if it does not find an inner match
> for the outer row, and I will handle the optimizer so it doesn't throw
> a non-conforming plan to the executor.
So far I don't have enough info in the parser to get the
planner/optimizer going. Should we work from the front to the back, or
should I go ahead and look at the non-merge joins? It's painfully
obvious that I don't know anything about the middle parts of this to
proceed without lots more research.
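The non-merge case requested in the quoted text (emit a null row when an outer row finds no inner match) can be sketched as a nested loop. This is a hypothetical Python illustration, not executor code; the function name, the tuple rows, and the inner_width parameter are assumptions.

```python
def nested_loop_left_outer_join(outer_rows, inner_rows, predicate, inner_width):
    """Nested-loop left outer join over tuples.

    predicate(outer, inner) may be any join condition, not only equality.
    inner_width is the number of columns in an inner row, used to build
    the all-None filler for outer rows that match nothing.
    """
    result = []
    for outer in outer_rows:
        found = False
        for inner in inner_rows:
            if predicate(outer, inner):
                result.append(outer + inner)
                found = True
        if not found:
            # Outer row survives with nulls in place of the inner columns.
            result.append(outer + (None,) * inner_width)
    return result
```

Because the predicate is arbitrary, this form also covers joins that are not equi-joins, at the cost of scanning the inner input once per outer row.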
- Tom
From lockhart@alumni.caltech.edu Tue Mar 9 22:47:57 1999
Received: from golem.jpl.nasa.gov (IDENT:root@hectic-1.jpl.nasa.gov [128.149.68.203])
by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id WAA07869
for <maillist@candle.pha.pa.us>; Tue, 9 Mar 1999 22:47:54 -0500 (EST)
Received: from alumni.caltech.edu (localhost [127.0.0.1])
by golem.jpl.nasa.gov (8.8.5/8.8.5) with ESMTP id DAA14761;
Wed, 10 Mar 1999 03:46:43 GMT
Sender: tgl@mythos.jpl.nasa.gov
Message-ID: <36E5EB23.F5CD959B@alumni.caltech.edu>
Date: Wed, 10 Mar 1999 03:46:43 +0000
From: "Thomas G. Lockhart" <lockhart@alumni.caltech.edu>
Organization: Caltech/JPL
X-Mailer: Mozilla 4.07 [en] (X11; I; Linux 2.0.36 i686)
MIME-Version: 1.0
To: Bruce Momjian <maillist@candle.pha.pa.us>, tgl@mythos.jpl.nasa.gov
Subject: Re: SQL outer
References: <199903100112.UAA05772@candle.pha.pa.us>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Status: RO
> select *
> from outer tab1, tab2, tab3
> where tab1.col1 = tab2.col1 and
> tab1.col1 = tab3.col1
select *
from t1 left join t2 using (c1)
join t3 on (c1 = t3.c1)
Result:
t1.c1  t1.c2  t2.c2  t3.c2
2      12     NULL   32
t1:
c1  c2
1   11
2   12
3   13
4   14
t2:
c1  c2
1   21
3   23
t3:
c1  c2
2   32
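The example above can be checked with a short script. This sketch uses Python's built-in sqlite3 module purely as a stand-in engine (an assumption; it is not the Postgres of this thread), and it spells the joins with explicit ON clauses and qualified column names rather than USING. The last selected column is t3.c2, which is where the value 32 comes from.

```python
import sqlite3


def run_outer_join_example():
    """Build t1, t2, t3 from the message and run the left join plus inner join."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE t1 (c1 INTEGER, c2 INTEGER);
        CREATE TABLE t2 (c1 INTEGER, c2 INTEGER);
        CREATE TABLE t3 (c1 INTEGER, c2 INTEGER);
        INSERT INTO t1 VALUES (1, 11), (2, 12), (3, 13), (4, 14);
        INSERT INTO t2 VALUES (1, 21), (3, 23);
        INSERT INTO t3 VALUES (2, 32);
    """)
    rows = conn.execute("""
        SELECT t1.c1, t1.c2, t2.c2, t3.c2
        FROM t1 LEFT JOIN t2 ON t1.c1 = t2.c1
                JOIN t3 ON t1.c1 = t3.c1
    """).fetchall()
    conn.close()
    return rows
```

Only the t1 row with c1 = 2 survives the inner join to t3, and since t2 has no row with c1 = 2 the t2 column is null-filled (None in Python).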
From lockhart@alumni.caltech.edu Wed Mar 10 10:48:54 1999
Received: from golem.jpl.nasa.gov (IDENT:root@hectic-1.jpl.nasa.gov [128.149.68.203])
by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id KAA16741
for <maillist@candle.pha.pa.us>; Wed, 10 Mar 1999 10:48:51 -0500 (EST)
Received: from alumni.caltech.edu (localhost [127.0.0.1])
by golem.jpl.nasa.gov (8.8.5/8.8.5) with ESMTP id PAA17723;
Wed, 10 Mar 1999 15:48:31 GMT
Sender: tgl@mythos.jpl.nasa.gov
Message-ID: <36E6944F.1F93B08@alumni.caltech.edu>
Date: Wed, 10 Mar 1999 15:48:31 +0000
From: "Thomas G. Lockhart" <lockhart@alumni.caltech.edu>
Organization: Caltech/JPL
X-Mailer: Mozilla 4.07 [en] (X11; I; Linux 2.0.36 i686)
MIME-Version: 1.0
To: Bruce Momjian <maillist@candle.pha.pa.us>
CC: Thomas Lockhart <lockhart@alumni.caltech.edu>
Subject: Re: SQL outer
References: <199903100112.UAA05772@candle.pha.pa.us> <36E5EB23.F5CD959B@alumni.caltech.edu>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Status: ROr
Just thinking...
If the initial RelOptInfo groupings are derived from the WHERE clause
expressions, how about marking the "outer" property in those expressions
in the parser? istm that is where the parser knows about two tables in
one place, and I'm generating those expressions anyway. We could add a
field(s) to the expression structure, or pass along a slightly different
structure...
- Tom

343
doc/TODO.detail/performance Normal file

@ -0,0 +1,343 @@
From owner-pgsql-hackers@hub.org Sun Jun 14 18:45:04 1998
Received: from hub.org (hub.org [209.47.148.200])
by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id SAA03690
for <maillist@candle.pha.pa.us>; Sun, 14 Jun 1998 18:45:00 -0400 (EDT)
Received: from localhost (majordom@localhost) by hub.org (8.8.8/8.7.5) with SMTP id SAA28049; Sun, 14 Jun 1998 18:39:42 -0400 (EDT)
Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Sun, 14 Jun 1998 18:36:06 +0000 (EDT)
Received: (from majordom@localhost) by hub.org (8.8.8/8.7.5) id SAA27943 for pgsql-hackers-outgoing; Sun, 14 Jun 1998 18:36:04 -0400 (EDT)
Received: from angular.illustra.com (ifmxoak.illustra.com [206.175.10.34]) by hub.org (8.8.8/8.7.5) with ESMTP id SAA27925 for <pgsql-hackers@postgresql.org>; Sun, 14 Jun 1998 18:35:47 -0400 (EDT)
Received: from hawk.illustra.com (hawk.illustra.com [158.58.61.70]) by angular.illustra.com (8.7.4/8.7.3) with SMTP id PAA21293 for <pgsql-hackers@postgresql.org>; Sun, 14 Jun 1998 15:35:12 -0700 (PDT)
Received: by hawk.illustra.com (5.x/smail2.5/06-10-94/S)
id AA07922; Sun, 14 Jun 1998 15:35:13 -0700
From: dg@illustra.com (David Gould)
Message-Id: <9806142235.AA07922@hawk.illustra.com>
Subject: [HACKERS] performance tests, initial results
To: pgsql-hackers@postgreSQL.org
Date: Sun, 14 Jun 1998 15:35:13 -0700 (PDT)
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Sender: owner-pgsql-hackers@hub.org
Precedence: bulk
Status: RO
I have been playing a little with the performance tests found in
pgsql/src/tests/performance and have a few observations that might be of
minor interest.
The tests themselves are simple enough although the result parsing in the
driver did not work on Linux. I am enclosing a patch below to fix this. I
think it will also work better on the other systems.
A summary of results from my testing is below. Details are at the bottom
of this message.
My test system is 'leslie':
linux 2.0.32, gcc version 2.7.2.3
P133, HX chipset, 512K L2, 32MB mem
NCR810 fast scsi, Quantum Atlas 2GB drive (7200 rpm).
Results Summary (times in seconds)
                         Single txn  8K txn     Create  8K idx  8K random  Simple
Case Description         8K insert   8K insert  Index   Insert  Scans      Orderby
=======================  ==========  =========  ======  ======  =========  =======
1 From Distribution
  P90 FreeBsd -B256           39.56    1190.98    3.69   46.65      65.49     2.27
  IDE
2 Running on leslie
  P133 Linux 2.0.32           15.48     326.75    2.99   20.69      35.81     1.68
  SCSI 32M
3 leslie, -o -F
  no forced writes            15.90      24.98    2.63   20.46      36.43     1.69
4 leslie, -o -F
  no ASSERTS                  14.92      23.23    1.38   18.67      33.79     1.58
5 leslie, -o -F -B2048
  more buffers                21.31      42.28    2.65   25.74      42.26     1.72
6 leslie, -o -F -B2048
  more bufs, no ASSERT        20.52      39.79    1.40   24.77      39.51     1.55
Case to Case Difference Factors (+ is faster)
                         Single txn  8K txn     Create  8K idx  8K random  Simple
Case Description         8K insert   8K insert  Index   Insert  Scans      Orderby
=======================  ==========  =========  ======  ======  =========  =======
leslie vs BSD P90.             2.56       3.65    1.23    2.25       1.83     1.35
(noflush -F) vs no -F         -1.03      13.08    1.14    1.01      -1.02     1.00
No Assert vs Assert            1.05       1.07    1.90    1.06       1.07     1.09
-B256 vs -B2048                1.34       1.69    1.01    1.26       1.16     1.02
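As a rough illustration of how such factors can be derived from the timings, here is a small Python sketch. The sign convention (positive ratio when the second run is faster, negated inverse ratio when it is slower) is an assumption inferred from the "+ is faster" heading and the -1.03 entry; the message does not spell it out, and some rows may have been averaged over more than one pair of cases.

```python
def difference_factor(baseline_seconds, candidate_seconds):
    """Signed speed factor between two timings of the same test.

    Returns baseline/candidate when the candidate is at least as fast,
    and -(candidate/baseline) when the candidate is slower.
    """
    if candidate_seconds <= baseline_seconds:
        return baseline_seconds / candidate_seconds
    return -(candidate_seconds / baseline_seconds)


# Single-transaction 8K insert: case 1 (P90 BSD) versus case 2 (leslie).
bsd_vs_leslie = round(difference_factor(39.56, 15.48), 2)
# Same test: case 2 (forced writes) versus case 3 (-F).
flush_vs_noflush = round(difference_factor(15.48, 15.90), 2)
# 8K separate transactions: case 2 versus case 3 (-F).
txn_flush_vs_noflush = round(difference_factor(326.75, 24.98), 2)
```

With the timings from the summary table these three values come out as 2.56, -1.03 and 13.08.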
Observations:
- leslie (P133 linux) appears to be about 1.8 times faster than the
P90 BSD system used for the test result distributed with the source, not
counting the 8K txn insert case which was completely disk bound.
- SCSI disks make a big (factor of 3.6) difference. During this test the
disk was hammering and cpu utilization was < 10%.
- Assertion checking seems to cost about 7% except for create index where
it costs 90%
- the -F option to avoid flushing buffers has tremendous effect if there are
many very small transactions. Or, another way, flushing at the end of the
transaction is a major disaster for performance.
- Something is very wrong with our buffer cache implementation. Going from
256 buffers to 2048 buffers costs an average of 25%. In the 8K txn case
it costs about 70%. I see looking at the code and profiling that in the 8K
txn case this is in BufferSync() which examines all the buffers at commit
time. I don't quite understand why it is so costly for the single 8K row
txn (35%) though.
It would be nice to have some more tests. Maybe the Wisconsin stuff will
be useful.
----------------- patch to test harness. apply from pgsql ------------
*** src/test/performance/runtests.pl.orig Sun Jun 14 11:34:04 1998
--- src/test/performance/runtests.pl Sun Jun 14 12:07:30 1998
***************
*** 84,123 ****
open (STDERR, ">$TmpFile") or die;
select (STDERR); $| = 1;
! for ($i = 0; $i <= $#perftests; $i++)
! {
$test = $perftests[$i];
($test, $XACTBLOCK) = split (/ /, $test);
$runtest = $test;
! if ( $test =~ /\.ntm/ )
! {
! #
# No timing for this queries
- #
close (STDERR); # close $TmpFile
open (STDERR, ">/dev/null") or die;
$runtest =~ s/\.ntm//;
}
! else
! {
close (STDOUT);
open(STDOUT, ">&SAVEOUT");
print STDOUT "\nRunning: $perftests[$i+1] ...";
close (STDOUT);
open (STDOUT, ">/dev/null") or die;
select (STDERR); $| = 1;
! printf "$perftests[$i+1]: ";
}
do "sqls/$runtest";
# Restore STDERR to $TmpFile
! if ( $test =~ /\.ntm/ )
! {
close (STDERR);
open (STDERR, ">>$TmpFile") or die;
}
-
select (STDERR); $| = 1;
$i++;
}
--- 84,116 ----
open (STDERR, ">$TmpFile") or die;
select (STDERR); $| = 1;
! for ($i = 0; $i <= $#perftests; $i++) {
$test = $perftests[$i];
($test, $XACTBLOCK) = split (/ /, $test);
$runtest = $test;
! if ( $test =~ /\.ntm/ ) {
# No timing for this queries
close (STDERR); # close $TmpFile
open (STDERR, ">/dev/null") or die;
$runtest =~ s/\.ntm//;
}
! else {
close (STDOUT);
open(STDOUT, ">&SAVEOUT");
print STDOUT "\nRunning: $perftests[$i+1] ...";
close (STDOUT);
open (STDOUT, ">/dev/null") or die;
select (STDERR); $| = 1;
! print "$perftests[$i+1]: ";
}
do "sqls/$runtest";
# Restore STDERR to $TmpFile
! if ( $test =~ /\.ntm/ ) {
close (STDERR);
open (STDERR, ">>$TmpFile") or die;
}
select (STDERR); $| = 1;
$i++;
}
***************
*** 128,138 ****
open (TMPF, "<$TmpFile") or die;
open (RESF, ">$ResFile") or die;
! while (<TMPF>)
! {
! $str = $_;
! ($test, $rtime) = split (/:/, $str);
! ($tmp, $rtime, $rest) = split (/[ ]+/, $rtime);
! print RESF "$test: $rtime\n";
}
--- 121,130 ----
open (TMPF, "<$TmpFile") or die;
open (RESF, ">$ResFile") or die;
! while (<TMPF>) {
! if (m/^(.*: ).* ([0-9:.]+) *elapsed/) {
! ($test, $rtime) = ($1, $2);
! print RESF $test, $rtime, "\n";
! }
}
------------------------------------------------------------------------
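The result-parsing change in the patch above replaces field splitting with a single pattern match on the "elapsed" figure. A Python rendering of the same regular expression is sketched below; the sample input line in the usage note is an assumption about what the timing output looks like, not something taken from the message.

```python
import re

# Same pattern as the Perl patch: test label up to ": ", then the
# number (digits, colons, dots) that immediately precedes "elapsed".
ELAPSED_RE = re.compile(r"^(.*: ).* ([0-9:.]+) *elapsed")


def parse_timing_line(line):
    """Return 'label: elapsed' for a timing line, or None if it does not match."""
    match = ELAPSED_RE.match(line)
    if match is None:
        return None
    return match.group(1) + match.group(2)
```

For a hypothetical line such as "8192 INSERTs INTO SIMPLE (1 xact): 0.50user 0.10system 0:15.48elapsed 3%CPU" this yields "8192 INSERTs INTO SIMPLE (1 xact): 0:15.48", and lines without an elapsed figure are skipped instead of producing an empty result.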
------------------------- testcase detail --------------------------
1. from distribution
DBMS: PostgreSQL 6.2b10
OS: FreeBSD 2.1.5-RELEASE
HardWare: i586/90, 24M RAM, IDE
StartUp: postmaster -B 256 '-o -S 2048' -S
Compiler: gcc 2.6.3
Compiled: -O, without CASSERT checking, with
-DTBL_FREE_CMD_MEMORY (to free memory
if BEGIN/END after each query execution)
DB connection startup: 0.20
8192 INSERTs INTO SIMPLE (1 xact): 39.58
8192 INSERTs INTO SIMPLE (8192 xacts): 1190.98
Create INDEX on SIMPLE: 3.69
8192 INSERTs INTO SIMPLE with INDEX (1 xact): 46.65
8192 random INDEX scans on SIMPLE (1 xact): 65.49
ORDER BY SIMPLE: 2.27
2. run on leslie with asserts
DBMS: PostgreSQL 6.3.2 (plus changes to 98/06/01)
OS: Linux 2.0.32 leslie
HardWare: i586/133 HX 512, 32M RAM, fast SCSI, 7200rpm
StartUp: postmaster -B 256 '-o -S 2048' -S
Compiler: gcc 2.7.2.3
Compiled: -O, WITH CASSERT checking, with
-DTBL_FREE_CMD_MEMORY (to free memory
if BEGIN/END after each query execution)
DB connection startup: 0.10
8192 INSERTs INTO SIMPLE (1 xact): 15.48
8192 INSERTs INTO SIMPLE (8192 xacts): 326.75
Create INDEX on SIMPLE: 2.99
8192 INSERTs INTO SIMPLE with INDEX (1 xact): 20.69
8192 random INDEX scans on SIMPLE (1 xact): 35.81
ORDER BY SIMPLE: 1.68
3. with -F to avoid forced i/o
DBMS: PostgreSQL 6.3.2 (plus changes to 98/06/01)
OS: Linux 2.0.32 leslie
HardWare: i586/133 HX 512, 32M RAM, fast SCSI, 7200rpm
StartUp: postmaster -B 256 '-o -S 2048 -F' -S
Compiler: gcc 2.7.2.3
Compiled: -O, WITH CASSERT checking, with
-DTBL_FREE_CMD_MEMORY (to free memory
if BEGIN/END after each query execution)
DB connection startup: 0.10
8192 INSERTs INTO SIMPLE (1 xact): 15.90
8192 INSERTs INTO SIMPLE (8192 xacts): 24.98
Create INDEX on SIMPLE: 2.63
8192 INSERTs INTO SIMPLE with INDEX (1 xact): 20.46
8192 random INDEX scans on SIMPLE (1 xact): 36.43
ORDER BY SIMPLE: 1.69
4. no asserts, -F to avoid forced I/O
DBMS: PostgreSQL 6.3.2 (plus changes to 98/06/01)
OS: Linux 2.0.32 leslie
HardWare: i586/133 HX 512, 32M RAM, fast SCSI, 7200rpm
StartUp: postmaster -B 256 '-o -S 2048 -F' -S
Compiler: gcc 2.7.2.3
Compiled: -O, No CASSERT checking, with
-DTBL_FREE_CMD_MEMORY (to free memory
if BEGIN/END after each query execution)
DB connection startup: 0.10
8192 INSERTs INTO SIMPLE (1 xact): 14.92
8192 INSERTs INTO SIMPLE (8192 xacts): 23.23
Create INDEX on SIMPLE: 1.38
8192 INSERTs INTO SIMPLE with INDEX (1 xact): 18.67
8192 random INDEX scans on SIMPLE (1 xact): 33.79
ORDER BY SIMPLE: 1.58
5. with more buffers (2048 vs 256) and -F to avoid forced i/o
DBMS: PostgreSQL 6.3.2 (plus changes to 98/06/01)
OS: Linux 2.0.32 leslie
HardWare: i586/133 HX 512, 32M RAM, fast SCSI, 7200rpm
StartUp: postmaster -B 2048 '-o -S 2048 -F' -S
Compiler: gcc 2.7.2.3
Compiled: -O, WITH CASSERT checking, with
-DTBL_FREE_CMD_MEMORY (to free memory
if BEGIN/END after each query execution)
DB connection startup: 0.11
8192 INSERTs INTO SIMPLE (1 xact): 21.31
8192 INSERTs INTO SIMPLE (8192 xacts): 42.28
Create INDEX on SIMPLE: 2.65
8192 INSERTs INTO SIMPLE with INDEX (1 xact): 25.74
8192 random INDEX scans on SIMPLE (1 xact): 42.26
ORDER BY SIMPLE: 1.72
6. No Asserts, more buffers (2048 vs 256) and -F to avoid forced i/o
DBMS: PostgreSQL 6.3.2 (plus changes to 98/06/01)
OS: Linux 2.0.32 leslie
HardWare: i586/133 HX 512, 32M RAM, fast SCSI, 7200rpm
StartUp: postmaster -B 2048 '-o -S 2048 -F' -S
Compiler: gcc 2.7.2.3
Compiled: -O, No CASSERT checking, with
-DTBL_FREE_CMD_MEMORY (to free memory
if BEGIN/END after each query execution)
DB connection startup: 0.11
8192 INSERTs INTO SIMPLE (1 xact): 20.52
8192 INSERTs INTO SIMPLE (8192 xacts): 39.79
Create INDEX on SIMPLE: 1.40
8192 INSERTs INTO SIMPLE with INDEX (1 xact): 24.77
8192 random INDEX scans on SIMPLE (1 xact): 39.51
ORDER BY SIMPLE: 1.55
---------------------------------------------------------------------
-dg
David Gould dg@illustra.com 510.628.3783 or 510.305.9468
Informix Software (No, really) 300 Lakeside Drive Oakland, CA 94612
"Don't worry about people stealing your ideas. If your ideas are any
good, you'll have to ram them down people's throats." -- Howard Aiken

102
doc/TODO.detail/persistent Normal file

@ -0,0 +1,102 @@
From owner-pgsql-hackers@hub.org Mon May 11 11:31:09 1998
Received: from renoir.op.net (root@renoir.op.net [209.152.193.4])
by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id LAA03006
for <maillist@candle.pha.pa.us>; Mon, 11 May 1998 11:31:07 -0400 (EDT)
Received: from hub.org (hub.org [209.47.148.200]) by renoir.op.net (o1/$Revision: 1.1 $) with ESMTP id LAA01663 for <maillist@candle.pha.pa.us>; Mon, 11 May 1998 11:24:42 -0400 (EDT)
Received: from localhost (majordom@localhost) by hub.org (8.8.8/8.7.5) with SMTP id LAA21841; Mon, 11 May 1998 11:15:25 -0400 (EDT)
Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Mon, 11 May 1998 11:15:12 +0000 (EDT)
Received: (from majordom@localhost) by hub.org (8.8.8/8.7.5) id LAA21683 for pgsql-hackers-outgoing; Mon, 11 May 1998 11:15:09 -0400 (EDT)
Received: from sss.sss.pgh.pa.us (sss.pgh.pa.us [206.210.65.6]) by hub.org (8.8.8/8.7.5) with ESMTP id LAA21451 for <hackers@postgreSQL.org>; Mon, 11 May 1998 11:15:03 -0400 (EDT)
Received: from sss.sss.pgh.pa.us (localhost [127.0.0.1])
by sss.sss.pgh.pa.us (8.8.5/8.8.5) with ESMTP id LAA24915;
Mon, 11 May 1998 11:14:43 -0400 (EDT)
To: Brett McCormick <brett@work.chicken.org>
cc: hackers@postgreSQL.org
Subject: Re: [HACKERS] Re: [PATCHES] Try again: S_LOCK reduced contentionh]
In-reply-to: Your message of Mon, 11 May 1998 07:57:23 -0700 (PDT)
<13655.4384.345723.466046@abraxas.scene.com>
Date: Mon, 11 May 1998 11:14:43 -0400
Message-ID: <24913.894899683@sss.pgh.pa.us>
From: Tom Lane <tgl@sss.pgh.pa.us>
Sender: owner-pgsql-hackers@hub.org
Precedence: bulk
Status: RO
Brett McCormick <brett@work.chicken.org> writes:
> same way that the current network socket is passed -- through an execv
> argument. hopefully, however, the non-execv()ing fork will be in 6.4.
Um, you missed the point, Brett. David was hoping to transfer a client
connection from the postmaster to an *already existing* backend process.
Fork, with or without exec, solves the problem for a backend that's
started after the postmaster has accepted the client socket.
This does lead to a different line of thought, however. Pre-started
backends would have access to the "master" connection socket on which
the postmaster listens for client connections, right? Suppose that we
fire the postmaster as postmaster, and demote it to being simply a
manufacturer of new backend processes as old ones get used up. Have
one of the idle backend processes be the one doing the accept() on the
master socket. Once it has a client connection, it performs the
authentication handshake and then starts serving the client (or just
quits if authentication fails). Meanwhile the next idle backend process
has executed accept() on the master socket and is waiting for the next
client; and shortly the postmaster/factory/whateverwecallitnow notices
that it needs to start another backend to add to the idle-backend pool.
This'd probably need some interlocking among the backends. I have no
idea whether it'd be safe to have all the idle backends trying to
do accept() on the master socket simultaneously, but it sounds risky.
Better to use a mutex so that only one gets to do it while the others
sleep.
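A toy sketch of the idle-backend pool with a mutex around accept(), under heavy assumptions: Python threads stand in for pre-started backend processes, a threading.Lock stands in for whatever cross-process mutex would really be needed, and the "authentication handshake" is reduced to echoing one message. All names here are hypothetical.

```python
import socket
import threading


def idle_backend(listener, accept_lock, backend_id):
    """One pre-started backend: wait for a client, serve it, then exit."""
    # Only one idle backend at a time sits in accept(); the rest sleep
    # on the lock, as suggested in the message.
    with accept_lock:
        conn, _addr = listener.accept()
    with conn:
        # Stand-in for the authentication handshake and query service.
        request = conn.recv(1024)
        conn.sendall(b"backend %d served %s" % (backend_id, request))


def run_demo(n_backends=3):
    """Start n_backends idle backends and connect one client per backend."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))      # port 0: let the OS pick a free port
    listener.listen(n_backends)
    listener.settimeout(5.0)             # do not hang forever if no client arrives
    port = listener.getsockname()[1]

    accept_lock = threading.Lock()
    backends = [
        threading.Thread(target=idle_backend, args=(listener, accept_lock, i))
        for i in range(n_backends)
    ]
    for backend in backends:
        backend.start()

    replies = []
    for n in range(n_backends):
        with socket.create_connection(("127.0.0.1", port), timeout=5.0) as client:
            client.sendall(b"client %d" % n)
            replies.append(client.recv(1024))

    for backend in backends:
        backend.join()
    listener.close()
    return replies
```

The sketch leaves out the factory role entirely: nothing here starts a replacement backend once one has been used up.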
regards, tom lane
From owner-pgsql-hackers@hub.org Mon May 11 11:35:55 1998
Received: from hub.org (hub.org [209.47.148.200])
by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id LAA03043
for <maillist@candle.pha.pa.us>; Mon, 11 May 1998 11:35:53 -0400 (EDT)
Received: from localhost (majordom@localhost) by hub.org (8.8.8/8.7.5) with SMTP id LAA23494; Mon, 11 May 1998 11:27:10 -0400 (EDT)
Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Mon, 11 May 1998 11:27:02 +0000 (EDT)
Received: (from majordom@localhost) by hub.org (8.8.8/8.7.5) id LAA23473 for pgsql-hackers-outgoing; Mon, 11 May 1998 11:27:01 -0400 (EDT)
Received: from sss.sss.pgh.pa.us (sss.pgh.pa.us [206.210.65.6]) by hub.org (8.8.8/8.7.5) with ESMTP id LAA23462 for <hackers@postgreSQL.org>; Mon, 11 May 1998 11:26:56 -0400 (EDT)
Received: from sss.sss.pgh.pa.us (localhost [127.0.0.1])
by sss.sss.pgh.pa.us (8.8.5/8.8.5) with ESMTP id LAA25006;
Mon, 11 May 1998 11:26:44 -0400 (EDT)
To: Brett McCormick <brett@work.chicken.org>
cc: hackers@postgreSQL.org
Subject: Re: [HACKERS] Re: [PATCHES] Try again: S_LOCK reduced contentionh]
In-reply-to: Your message of Mon, 11 May 1998 07:57:23 -0700 (PDT)
<13655.4384.345723.466046@abraxas.scene.com>
Date: Mon, 11 May 1998 11:26:44 -0400
Message-ID: <25004.894900404@sss.pgh.pa.us>
From: Tom Lane <tgl@sss.pgh.pa.us>
Sender: owner-pgsql-hackers@hub.org
Precedence: bulk
Status: RO
Meanwhile, *I* missed the point about Brett's second comment :-(
Brett McCormick <brett@work.chicken.org> writes:
> There will have to be some sort of arg parsing in any case,
> considering that you can pass configurable arguments to the backend..
If we do the sort of change David and I were just discussing, then the
pre-spawned backend would become responsible for parsing and dealing
with the PGOPTIONS portion of the client's connection request message.
That's just part of shifting the authentication handshake code from
postmaster to backend, so it shouldn't be too hard.
BUT: the whole point is to be able to initialize the backend before it
is connected to a client. How much of the expensive backend startup
work depends on having the client connection options available?
Any work that needs to know the options will have to wait until after
the client connects. If that means most of the startup work can't
happen in advance anyway, then we're out of luck; a pre-started backend
won't save enough time to be worth the effort. (Unless we are willing
to eliminate or redefine the troublesome options...)
regards, tom lane

55
doc/TODO.detail/pg_shadow Normal file

@ -0,0 +1,55 @@
From owner-pgsql-hackers@hub.org Sun Aug 2 20:01:13 1998
Received: from renoir.op.net (root@renoir.op.net [209.152.193.4])
by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id UAA15937
for <maillist@candle.pha.pa.us>; Sun, 2 Aug 1998 20:01:11 -0400 (EDT)
Received: from hub.org (hub.org [209.47.148.200]) by renoir.op.net (o1/$Revision: 1.1 $) with ESMTP id TAA01026 for <maillist@candle.pha.pa.us>; Sun, 2 Aug 1998 19:33:53 -0400 (EDT)
Received: from localhost (majordom@localhost) by hub.org (8.8.8/8.7.5) with SMTP id TAA19878; Sun, 2 Aug 1998 19:30:59 -0400 (EDT)
Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Sun, 02 Aug 1998 19:28:23 +0000 (EDT)
Received: (from majordom@localhost) by hub.org (8.8.8/8.7.5) id TAA19534 for pgsql-hackers-outgoing; Sun, 2 Aug 1998 19:28:22 -0400 (EDT)
Received: from sss.sss.pgh.pa.us (sss.pgh.pa.us [206.210.65.6]) by hub.org (8.8.8/8.7.5) with ESMTP id TAA19521 for <pgsql-hackers@postgreSQL.org>; Sun, 2 Aug 1998 19:28:15 -0400 (EDT)
Received: from sss.sss.pgh.pa.us (localhost [127.0.0.1])
by sss.sss.pgh.pa.us (8.9.1/8.9.1) with ESMTP id TAA22594
for <pgsql-hackers@postgreSQL.org>; Sun, 2 Aug 1998 19:28:13 -0400 (EDT)
To: pgsql-hackers@postgreSQL.org
Subject: [HACKERS] TODO item: make pg_shadow updates more robust
Date: Sun, 02 Aug 1998 19:28:13 -0400
Message-ID: <22591.902100493@sss.pgh.pa.us>
From: Tom Lane <tgl@sss.pgh.pa.us>
Sender: owner-pgsql-hackers@hub.org
Precedence: bulk
Status: ROr
I learned the hard way last night that the postmaster's password
authentication routines don't look at the pg_shadow table. They
look at a separate file named pg_pwd, which certain backend operations
will update from pg_shadow. (This is not documented in any user
documentation that I could find; I had to burrow into
src/backend/commands/user.c to discover it.)
Unfortunately, if a clueless dbadmin (like me ;-)) tries to update
password data with the obvious thing,
update pg_shadow set passwd = 'xxxxx' where usename = 'yyyy';
pg_pwd doesn't get fixed.
A more drastic problem is that pg_dump believes it can save and
restore pg_shadow data using "copy". Following an initdb and restore
from a pg_dump -z script, pg_shadow will look just fine, but only
the database admin will be listed in pg_pwd. This is likely to provoke
some confusion, IMHO.
As a short-term thing, the fact that you *must* set passwords with
ALTER USER ought to be documented, preferably someplace where a
dbadmin who's never heard of ALTER USER is likely to find it.
As a longer-term thing, I think it would be far better if ordinary
SQL operations on pg_shadow just did the right thing. Wouldn't it
be possible to implement copying to pg_pwd by means of a trigger on
pg_shadow updates, or something like that?
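To make the trigger idea concrete, here is a sketch using Python's built-in sqlite3 module as a stand-in engine. Everything about it is an assumption for illustration: in the system described above pg_pwd is a separate file rather than a table, the two-column layout is invented, and the trigger names are hypothetical.

```python
import sqlite3


def demo_password_sync():
    """Mirror changes on a pg_shadow stand-in into a pg_pwd stand-in via triggers."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE pg_shadow (usename TEXT PRIMARY KEY, passwd TEXT);
        CREATE TABLE pg_pwd    (usename TEXT PRIMARY KEY, passwd TEXT);

        -- New users are copied across as they are created.
        CREATE TRIGGER sync_insert AFTER INSERT ON pg_shadow
        BEGIN
            INSERT INTO pg_pwd (usename, passwd) VALUES (NEW.usename, NEW.passwd);
        END;

        -- A plain UPDATE on pg_shadow now reaches the copy as well.
        CREATE TRIGGER sync_update AFTER UPDATE ON pg_shadow
        BEGIN
            UPDATE pg_pwd SET passwd = NEW.passwd WHERE usename = OLD.usename;
        END;
    """)
    conn.execute("INSERT INTO pg_shadow VALUES ('yyyy', 'old')")
    conn.execute("UPDATE pg_shadow SET passwd = 'xxxxx' WHERE usename = 'yyyy'")
    rows = conn.execute("SELECT usename, passwd FROM pg_pwd").fetchall()
    conn.close()
    return rows
```

With the insert trigger in place, a bulk load into the pg_shadow stand-in would also populate the copy, which is the situation the restore problem above runs into.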
(I'm afraid that pg_dump -z is pretty well broken for operations on
a password-protected database, btw. Has anyone used it successfully
in that situation?)
regards, tom lane

98
doc/TODO.detail/prepare Normal file

@ -0,0 +1,98 @@
From owner-pgsql-hackers@hub.org Wed Nov 18 14:40:49 1998
Received: from hub.org (majordom@hub.org [209.47.148.200])
by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id OAA29743
for <maillist@candle.pha.pa.us>; Wed, 18 Nov 1998 14:40:36 -0500 (EST)
Received: from localhost (majordom@localhost)
by hub.org (8.9.1/8.9.1) with SMTP id OAA03716;
Wed, 18 Nov 1998 14:37:04 -0500 (EST)
(envelope-from owner-pgsql-hackers@hub.org)
Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Wed, 18 Nov 1998 14:34:39 +0000 (EST)
Received: (from majordom@localhost)
by hub.org (8.9.1/8.9.1) id OAA03395
for pgsql-hackers-outgoing; Wed, 18 Nov 1998 14:34:37 -0500 (EST)
(envelope-from owner-pgsql-hackers@postgreSQL.org)
Received: from orion.SAPserv.Hamburg.dsh.de (Tpolaris2.sapham.debis.de [53.2.131.8])
by hub.org (8.9.1/8.9.1) with SMTP id OAA03381
for <pgsql-hackers@hub.org>; Wed, 18 Nov 1998 14:34:31 -0500 (EST)
(envelope-from wieck@sapserv.debis.de)
Received: by orion.SAPserv.Hamburg.dsh.de
for pgsql-hackers@hub.org
id m0zgDnj-000EBTC; Wed, 18 Nov 98 21:02 MET
Message-Id: <m0zgDnj-000EBTC@orion.SAPserv.Hamburg.dsh.de>
From: jwieck@debis.com (Jan Wieck)
Subject: Re: [HACKERS] PREPARE
To: meskes@usa.net (Michael Meskes)
Date: Wed, 18 Nov 1998 21:02:06 +0100 (MET)
Cc: pgsql-hackers@hub.org
Reply-To: jwieck@debis.com (Jan Wieck)
In-Reply-To: <19981118084843.B869@usa.net> from "Michael Meskes" at Nov 18, 98 08:48:43 am
X-Mailer: ELM [version 2.4 PL25]
Content-Type: text
Sender: owner-pgsql-hackers@postgreSQL.org
Precedence: bulk
Status: RO
Michael Meskes wrote:
>
> On Wed, Nov 18, 1998 at 03:23:30AM +0000, Thomas G. Lockhart wrote:
> > > I didn't get this one completely. What input do you mean?
> >
> > Just the original string/query to be prepared...
>
> I see. But wouldn't it be more useful to preprocess the query and store the
> resulting nodes instead? We don't want to parse the statement every time a
> variable binding comes in.
Right. The only real improvement would be to keep the prepared
execution plan in the backend and just pass in the parameter
values.

I can think of the following construct:

    PREPARE optimizable-statement;

That one will run the parser/rewriter/planner, create a new memory
context with a unique identifier, and save the querytrees
and plans in it. Parameter values are identified by the
usual $n notation. The command returns the identifier.

    EXECUTE QUERY identifier [value [, ...]];

then gets back the prepared plan and querytree by the id,
creates an executor context with the given values in the
parameter array and calls ExecutorRun() for them.

The PREPARE needs to analyze the resulting parsetrees to get
the datatypes (and maybe atttypmods) of the parameters, so
EXECUTE QUERY can convert the values into Datums using the
types' input functions. And the EXECUTE has to be handled
specially in tcop (it's something between a regular query and
a utility statement). But it's not too hard to implement.

Finally a

    FORGET QUERY identifier;

(I don't remember what the others named it) will remove the
prepared plan etc. simply by destroying the memory context
and dropping the identifier from the id->mcontext+prepareinfo
mapping.

This all restricts the usage of PREPARE to optimizable
statements. Is it required to be able to prepare utility
statements (like CREATE TABLE or so) too?
Jan
--
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me. #
#======================================== jwieck@debis.com (Jan Wieck) #
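
The PREPARE / EXECUTE QUERY / FORGET QUERY bookkeeping described in the message above might be sketched in C roughly as follows. This is only an illustration under stated assumptions, not backend code: `PreparedEntry`, `prepare_statement`, `lookup_prepared`, `forget_query` and `count_params` are hypothetical names, the copied statement text stands in for the saved querytree and plan, and a linked list stands in for the id->mcontext+prepareinfo mapping.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for one prepared statement. */
typedef struct PreparedEntry
{
    int         id;        /* identifier returned by PREPARE */
    char       *plan;      /* placeholder for the saved querytree + plan */
    int         nparams;   /* highest $n seen in the statement */
    struct PreparedEntry *next;
} PreparedEntry;

static PreparedEntry *prepared_list = NULL;
static int  next_id = 1;

/* Find the highest $n parameter number. Simplification: this scans the
 * raw text and does not skip string literals, unlike a real parser. */
static int
count_params(const char *stmt)
{
    int         max = 0;
    const char *p;

    for (p = stmt; *p != '\0'; p++)
    {
        if (*p == '$' && isdigit((unsigned char) p[1]))
        {
            int         n = atoi(p + 1);

            if (n > max)
                max = n;
        }
    }
    return max;
}

/* PREPARE: store the statement and return its identifier, or -1. */
int
prepare_statement(const char *stmt)
{
    PreparedEntry *e;
    size_t      len;

    assert(stmt != NULL);
    len = strlen(stmt) + 1;
    e = malloc(sizeof(*e));
    if (e == NULL)
        return -1;
    e->plan = malloc(len);
    if (e->plan == NULL)
    {
        free(e);
        return -1;
    }
    memcpy(e->plan, stmt, len);
    e->id = next_id++;
    e->nparams = count_params(stmt);
    e->next = prepared_list;
    prepared_list = e;
    return e->id;
}

/* EXECUTE QUERY would start here: fetch the entry by identifier. */
const PreparedEntry *
lookup_prepared(int id)
{
    const PreparedEntry *e;

    for (e = prepared_list; e != NULL; e = e->next)
        if (e->id == id)
            return e;
    return NULL;
}

/* FORGET QUERY: drop the entry; returns 0 on success, -1 if unknown. */
int
forget_query(int id)
{
    PreparedEntry **link;

    for (link = &prepared_list; *link != NULL; link = &(*link)->next)
    {
        if ((*link)->id == id)
        {
            PreparedEntry *dead = *link;

            *link = dead->next;
            free(dead->plan);
            free(dead);
            return 0;
        }
    }
    return -1;
}
```

In the proposal, freeing an entry corresponds to destroying its memory context, which is why FORGET QUERY can be cheap: everything belonging to one prepared statement is released together.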

159
doc/TODO.detail/primary Normal file

@ -0,0 +1,159 @@
From owner-pgsql-hackers@hub.org Fri Sep 4 00:47:06 1998
Received: from renoir.op.net (root@renoir.op.net [209.152.193.4])
by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id AAA01047
for <maillist@candle.pha.pa.us>; Fri, 4 Sep 1998 00:47:05 -0400 (EDT)
Received: from hub.org (hub.org [209.47.148.200]) by renoir.op.net (o1/$Revision: 1.1 $) with ESMTP id XAA02044 for <maillist@candle.pha.pa.us>; Thu, 3 Sep 1998 23:11:07 -0400 (EDT)
Received: from localhost (majordom@localhost) by hub.org (8.8.8/8.7.5) with SMTP id XAA27418; Thu, 3 Sep 1998 23:06:16 -0400 (EDT)
Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Thu, 03 Sep 1998 23:04:11 +0000 (EDT)
Received: (from majordom@localhost) by hub.org (8.8.8/8.7.5) id XAA27185 for pgsql-hackers-outgoing; Thu, 3 Sep 1998 23:04:09 -0400 (EDT)
Received: from dune.krs.ru (dune.krs.ru [195.161.16.38]) by hub.org (8.8.8/8.7.5) with ESMTP id XAA27169 for <hackers@postgreSQL.org>; Thu, 3 Sep 1998 23:03:59 -0400 (EDT)
Received: from krs.ru (localhost.krs.ru [127.0.0.1])
by dune.krs.ru (8.8.8/8.8.8) with ESMTP id LAA10059;
Fri, 4 Sep 1998 11:03:00 +0800 (KRSS)
(envelope-from vadim@krs.ru)
Message-ID: <35EF5864.E5142D35@krs.ru>
Date: Fri, 04 Sep 1998 11:03:00 +0800
From: Vadim Mikheev <vadim@krs.ru>
Organization: OJSC Rostelecom (Krasnoyarsk)
X-Mailer: Mozilla 4.05 [en] (X11; I; FreeBSD 2.2.6-RELEASE i386)
MIME-Version: 1.0
To: "D'Arcy J.M. Cain" <darcy@druid.net>
CC: "Thomas G. Lockhart" <lockhart@alumni.caltech.edu>, hackers@postgreSQL.org
Subject: Re: [HACKERS] Adding PRIMARY KEY info
References: <m0zEaoV-00006JC@druid.net>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-pgsql-hackers@hub.org
Precedence: bulk
Status: RO
D'Arcy J.M. Cain wrote:
>
> Thus spake Vadim Mikheev
> > Imho, indices should be used/created for FOREIGN keys and so pg_index
> > is a good place for both PRIMARY and FOREIGN key info.
>
> Are you sure? I don't know about implementing it but it seems more
> like an attribute thing rather than an index thing. Certainly from a
> database design viewpoint you want to refer to the fields, not the
> index on them. If you put it into the index then you have to do
> an extra join to get the information.
>
> Perhaps you have to do the extra join anyway for other purposes so it
> may not matter. All I want is to be able to extract the
> field that the designer specified as the key. As long as I can design
> a select statement that gives me that I don't much care how it is
> implemented. I'll cache the information anyway so it won't have a
> huge impact on my programs.
First, let me note that you have to add an int28 field to pg_class,
not just an oid field, to know which attributeS are in the primary key
(we support multi-attribute primary keys).
This could be done...

But what about foreign and unique (!) keys?
There may be _many_ foreign/unique keys defined for one table!
And so foreign/unique key info has to be stored somewhere else,
not in pg_class.

pg_index is a good place for all _3_ key types because:

1. an index should be created for each foreign key -
   just for performance.
2. pg_index already has an int28 field for key attributes.
3. pg_index already has indisunique (note that foreign keys
   may reference unique keys, not just primary ones).

- so we just have to add two fields to pg_index:

bool indisprimary;
oid indreferenced;
^^^^^^^^^^^^^^^^^^
this is for foreign keys: oid of the referenced relation's
primary/unique key index.

I agree that indices are just implementation...
If you don't want to store key info in pg_index then
a new pg_key relation has to be added...

Comments?
Vadim
From owner-pgsql-hackers@hub.org Sat Sep 5 02:01:13 1998
Received: from renoir.op.net (root@renoir.op.net [209.152.193.4])
by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id CAA14437
for <maillist@candle.pha.pa.us>; Sat, 5 Sep 1998 02:01:11 -0400 (EDT)
Received: from hub.org (hub.org [209.47.148.200]) by renoir.op.net (o1/$Revision: 1.1 $) with ESMTP id BAA09928 for <maillist@candle.pha.pa.us>; Sat, 5 Sep 1998 01:48:32 -0400 (EDT)
Received: from localhost (majordom@localhost) by hub.org (8.8.8/8.7.5) with SMTP id BAA18282; Sat, 5 Sep 1998 01:43:16 -0400 (EDT)
Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Sat, 05 Sep 1998 01:41:40 +0000 (EDT)
Received: (from majordom@localhost) by hub.org (8.8.8/8.7.5) id BAA18241 for pgsql-hackers-outgoing; Sat, 5 Sep 1998 01:41:38 -0400 (EDT)
Received: from dune.krs.ru (dune.krs.ru [195.161.16.38]) by hub.org (8.8.8/8.7.5) with ESMTP id BAA18211; Sat, 5 Sep 1998 01:41:21 -0400 (EDT)
Received: from krs.ru (localhost.krs.ru [127.0.0.1])
by dune.krs.ru (8.8.8/8.8.8) with ESMTP id NAA20555;
Sat, 5 Sep 1998 13:40:44 +0800 (KRSS)
(envelope-from vadim@krs.ru)
Message-ID: <35F0CEDB.AD721090@krs.ru>
Date: Sat, 05 Sep 1998 13:40:43 +0800
From: Vadim Mikheev <vadim@krs.ru>
Organization: OJSC Rostelecom (Krasnoyarsk)
X-Mailer: Mozilla 4.05 [en] (X11; I; FreeBSD 2.2.6-RELEASE i386)
MIME-Version: 1.0
To: "D'Arcy J.M. Cain" <darcy@druid.net>
CC: hackers@postgreSQL.org, pgsql-core@postgreSQL.org
Subject: Re: [HACKERS] Adding PRIMARY KEY info
References: <m0zEvLK-00006FC@druid.net>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-pgsql-hackers@hub.org
Precedence: bulk
Status: ROr
D'Arcy J.M. Cain wrote:
>
> >
> > pg_index is a good place for all _3_ key types because:
> >
> > 1. an index should be created for each foreign key -
> > just for performance.
> > 2. pg_index already has an int28 field for key attributes.
> > 3. pg_index already has indisunique (note that foreign keys
> > may reference unique keys, not just primary ones).
> >
> > - so we just have to add two fields to pg_index:
> >
> > bool indisprimary;
> > oid indreferenced;
> > ^^^^^^^^^^^^^^^^^^
> > this is for foreign keys: oid of the referenced relation's
> > primary/unique key index.
>
> Sounds fine to me. Any chance of seeing this in 6.4?
I could add this (and FOREIGN key implementation) before
11-13 Sep... But not the ALTER TABLE ADD/DROP CONSTRAINT
stuff (ok for Entry SQL).
But we are in beta...
Comments?
> Nope, pg_index is fine by me. Now, once we have this, how do we find
> the index for a particular attribute? I can't seem to figure out the
> relationship between pg_attribute and pg_index. The chart in the docs
> suggests that indkey is the relation but I can't see any useful info
> there for joining the tables.
pg_index:
    indrelid - oid of indexed relation
    indkey - up to 8 attnums
pg_attribute:
    attrelid - oid of relation
    attnum - ...

Without an outer join you have to query pg_attribute for each
valid attnum from pg_index->indkey :-(
Vadim
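
The per-attnum lookup described above, from pg_index.indkey into pg_attribute, could be sketched in C like this. The structs are hypothetical in-memory stand-ins for catalog rows, `index_key_names` is an invented helper, and the sketch assumes that unused indkey slots hold 0.

```c
#include <assert.h>
#include <stddef.h>

#define INDEX_MAX_KEYS 8        /* "up to 8 attnums" */

typedef unsigned int Oid;

/* Hypothetical in-memory copies of the relevant catalog columns. */
typedef struct IndexRow
{
    Oid         indrelid;                   /* oid of indexed relation */
    short       indkey[INDEX_MAX_KEYS];     /* attnums, 0 = unused slot */
} IndexRow;

typedef struct AttributeRow
{
    Oid         attrelid;       /* oid of relation */
    short       attnum;
    const char *attname;
} AttributeRow;

/* For each valid attnum in idx->indkey, find the matching pg_attribute
 * row (same relation, same attnum) and store its name in names[].
 * names[] must have room for INDEX_MAX_KEYS entries.
 * Returns the number of names stored. */
size_t
index_key_names(const IndexRow *idx, const AttributeRow *attrs,
                size_t nattrs, const char **names)
{
    size_t      found = 0;
    int         k;

    assert(idx != NULL && attrs != NULL && names != NULL);
    for (k = 0; k < INDEX_MAX_KEYS && idx->indkey[k] != 0; k++)
    {
        size_t      a;

        for (a = 0; a < nattrs; a++)
        {
            if (attrs[a].attrelid == idx->indrelid &&
                attrs[a].attnum == idx->indkey[k])
            {
                names[found++] = attrs[a].attname;
                break;
            }
        }
    }
    return found;
}
```

The join condition is the pair (indrelid = attrelid, indkey[k] = attnum); the inner loop plays the role of the separate pg_attribute query per attnum that the message mentions.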

240
doc/TODO.detail/tcl_arrays Normal file

@ -0,0 +1,240 @@
From owner-pgsql-patches@hub.org Wed Oct 14 17:31:26 1998
Received: from renoir.op.net (root@renoir.op.net [209.152.193.4])
by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id RAA01594
for <maillist@candle.pha.pa.us>; Wed, 14 Oct 1998 17:31:24 -0400 (EDT)
Received: from hub.org (majordom@hub.org [209.47.148.200]) by renoir.op.net (o1/$Revision: 1.1 $) with ESMTP id RAA01745 for <maillist@candle.pha.pa.us>; Wed, 14 Oct 1998 17:12:28 -0400 (EDT)
Received: from localhost (majordom@localhost)
by hub.org (8.8.8/8.8.8) with SMTP id RAA06607;
Wed, 14 Oct 1998 17:10:43 -0400 (EDT)
(envelope-from owner-pgsql-patches@hub.org)
Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Wed, 14 Oct 1998 17:10:27 +0000 (EDT)
Received: (from majordom@localhost)
by hub.org (8.8.8/8.8.8) id RAA06562
for pgsql-patches-outgoing; Wed, 14 Oct 1998 17:10:26 -0400 (EDT)
(envelope-from owner-pgsql-patches@postgreSQL.org)
X-Authentication-Warning: hub.org: majordom set sender to owner-pgsql-patches@postgreSQL.org using -f
Received: from mambo.cs.unitn.it (mambo.cs.unitn.it [193.205.199.204])
by hub.org (8.8.8/8.8.8) with SMTP id RAA06494
for <pgsql-patches@postgreSQL.org>; Wed, 14 Oct 1998 17:10:01 -0400 (EDT)
(envelope-from dz@cs.unitn.it)
Received: from nikita.wizard.net (ts-slip31.gelso.unitn.it [193.205.200.31]) by mambo.cs.unitn.it (8.6.12/8.6.12) with ESMTP id XAA20316 for <pgsql-patches@postgreSQL.org>; Wed, 14 Oct 1998 23:09:52 +0200
Received: (from dz@localhost) by nikita.wizard.net (8.8.5/8.6.9) id WAA00489 for pgsql-patches@postgreSQL.org; Wed, 14 Oct 1998 22:56:58 +0200
From: Massimo Dal Zotto <dz@cs.unitn.it>
Message-Id: <199810142056.WAA00489@nikita.wizard.net>
Subject: [PATCHES] TCL_ARRAYS
To: pgsql-patches@postgreSQL.org (Pgsql Patches)
Date: Wed, 14 Oct 1998 22:56:58 +0200 (MET DST)
X-Mailer: ELM [version 2.4 PL24 ME4]
MIME-Version: 1.0
Content-Type: text/plain; charset=iso-8859-1
Content-Transfer-Encoding: 8bit
Sender: owner-pgsql-patches@postgreSQL.org
Precedence: bulk
Status: RO
Hi,
I have written this patch which fixes some problems with TCL_ARRAYS.
The new array code uses a temporary buffer and is disabled by default
because it depends on contrib/string-io which most of you don't use.
This raises once again the problem of backslashes/escapes and various
ambiguities in pgsql output. I hope this will be solved in 6.5.
*** src/interfaces/libpgtcl/pgtclCmds.c.orig Mon Sep 21 09:00:19 1998
--- src/interfaces/libpgtcl/pgtclCmds.c Wed Oct 14 15:32:21 1998
***************
*** 602,616 ****
{
for (i = 0; i < PQnfields(result); i++)
{
sprintf(nameBuffer, "%d,%.200s", tupno, PQfname(result, i));
if (Tcl_SetVar2(interp, arrVar, nameBuffer,
! #ifdef TCL_ARRAYS
! tcl_value(PQgetvalue(result, tupno, i)),
#else
PQgetvalue(result, tupno, i),
- #endif
TCL_LEAVE_ERR_MSG) == NULL)
return TCL_ERROR;
}
}
Tcl_AppendResult(interp, arrVar, 0);
--- 602,624 ----
{
for (i = 0; i < PQnfields(result); i++)
{
+ #ifdef TCL_ARRAYS
+ char *buff = strdup(PQgetvalue(result, tupno, i));
sprintf(nameBuffer, "%d,%.200s", tupno, PQfname(result, i));
if (Tcl_SetVar2(interp, arrVar, nameBuffer,
! tcl_value(buff),
! TCL_LEAVE_ERR_MSG) == NULL) {
! free(buff);
! return TCL_ERROR;
! }
! free(buff);
#else
+ sprintf(nameBuffer, "%d,%.200s", tupno, PQfname(result, i));
+ if (Tcl_SetVar2(interp, arrVar, nameBuffer,
PQgetvalue(result, tupno, i),
TCL_LEAVE_ERR_MSG) == NULL)
return TCL_ERROR;
+ #endif
}
}
Tcl_AppendResult(interp, arrVar, 0);
***************
*** 636,643 ****
*/
for (tupno = 0; tupno < PQntuples(result); tupno++)
{
const char *field0 = PQgetvalue(result, tupno, 0);
! char * workspace = malloc(strlen(field0) + strlen(appendstr) + 210);
for (i = 1; i < PQnfields(result); i++)
{
--- 644,674 ----
*/
for (tupno = 0; tupno < PQntuples(result); tupno++)
{
+ #ifdef TCL_ARRAYS
+ char *buff = strdup(PQgetvalue(result, tupno, 0));
+ const char *field0 = tcl_value(buff);
+ char *workspace = malloc(strlen(field0) + 210 + strlen(appendstr));
+
+ for (i = 1; i < PQnfields(result); i++)
+ {
+ free(buff);
+ buff = strdup(PQgetvalue(result, tupno, i));
+ sprintf(workspace, "%s,%.200s%s", field0, PQfname(result,i),
+ appendstr);
+ if (Tcl_SetVar2(interp, arrVar, workspace,
+ tcl_value(buff),
+ TCL_LEAVE_ERR_MSG) == NULL)
+ {
+ free(buff);
+ free(workspace);
+ return TCL_ERROR;
+ }
+ }
+ free(buff);
+ free(workspace);
+ #else
const char *field0 = PQgetvalue(result, tupno, 0);
! char *workspace = malloc(strlen(field0) + 210 + strlen(appendstr));
for (i = 1; i < PQnfields(result); i++)
{
***************
*** 652,657 ****
--- 683,689 ----
}
}
free(workspace);
+ #endif
}
Tcl_AppendResult(interp, arrVar, 0);
return TCL_OK;
***************
*** 669,676 ****
--- 701,716 ----
Tcl_AppendResult(interp, "argument to getTuple cannot exceed number of tuples - 1", 0);
return TCL_ERROR;
}
+ #ifdef TCL_ARRAYS
+ for (i = 0; i < PQnfields(result); i++) {
+ char *buff = strdup(PQgetvalue(result, tupno, i));
+ Tcl_AppendElement(interp, tcl_value(buff));
+ free(buff);
+ }
+ #else
for (i = 0; i < PQnfields(result); i++)
Tcl_AppendElement(interp, PQgetvalue(result, tupno, i));
+ #endif
return TCL_OK;
}
else if (strcmp(opt, "-tupleArray") == 0)
***************
*** 688,697 ****
--- 728,748 ----
}
for (i = 0; i < PQnfields(result); i++)
{
+ #ifdef TCL_ARRAYS
+ char *buff = strdup(PQgetvalue(result, tupno, i));
+ if (Tcl_SetVar2(interp, argv[4], PQfname(result, i),
+ tcl_value(buff),
+ TCL_LEAVE_ERR_MSG) == NULL) {
+ free(buff);
+ return TCL_ERROR;
+ }
+ free(buff);
+ #else
if (Tcl_SetVar2(interp, argv[4], PQfname(result, i),
PQgetvalue(result, tupno, i),
TCL_LEAVE_ERR_MSG) == NULL)
return TCL_ERROR;
+ #endif
}
return TCL_OK;
}
***************
*** 1303,1310 ****
sprintf(buffer, "%d", tupno);
Tcl_SetVar2(interp, argv[3], ".tupno", buffer, 0);
for (column = 0; column < ncols; column++)
! Tcl_SetVar2(interp, argv[3], info[column].cname, PQgetvalue(result, tupno, column), 0);
Tcl_SetVar2(interp, argv[3], ".command", "update", 0);
--- 1354,1371 ----
sprintf(buffer, "%d", tupno);
Tcl_SetVar2(interp, argv[3], ".tupno", buffer, 0);
+ #ifdef TCL_ARRAYS
+ for (column = 0; column < ncols; column++) {
+ char *buff = strdup(PQgetvalue(result, tupno, column));
+ Tcl_SetVar2(interp, argv[3], info[column].cname,
+ tcl_value(buff), 0);
+ free(buff);
+ }
+ #else
for (column = 0; column < ncols; column++)
! Tcl_SetVar2(interp, argv[3], info[column].cname,
! PQgetvalue(result, tupno, column), 0);
! #endif
Tcl_SetVar2(interp, argv[3], ".command", "update", 0);
*** src/include/config.h.in.orig Wed Aug 26 09:01:16 1998
--- src/include/config.h.in Wed Oct 14 22:44:00 1998
***************
*** 312,318 ****
* of postgres C-like arrays, for example {{"a1" "a2"} {"b1" "b2"}} instead
* of {{"a1","a2"},{"b1","b2"}}.
*/
! #define TCL_ARRAYS
/*
* The following flag allows limiting the number of rows returned by a query.
--- 312,318 ----
* of postgres C-like arrays, for example {{"a1" "a2"} {"b1" "b2"}} instead
* of {{"a1","a2"},{"b1","b2"}}.
*/
! /* #define TCL_ARRAYS */
/*
* The following flag allows limiting the number of rows returned by a query.
--
Massimo Dal Zotto
+----------------------------------------------------------------------+
| Massimo Dal Zotto email: dz@cs.unitn.it |
| Via Marconi, 141 phone: ++39-461-534251 |
| 38057 Pergine Valsugana (TN) www: http://www.cs.unitn.it/~dz/ |
| Italy pgp: finger dz@tango.cs.unitn.it |
+----------------------------------------------------------------------+
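
The temporary-buffer pattern the patch above applies at each call site (copy the PQgetvalue() result, convert the copy, then free it) can be sketched in isolation as follows. This is a simplified illustration under assumptions: `commas_to_spaces` is a hypothetical stand-in for an in-place converter and is not the real tcl_value(), and `converted_copy` is an invented helper name.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, much simplified stand-in for an in-place converter:
 * it rewrites the string it is given and returns the same pointer. */
static char *
commas_to_spaces(char *s)
{
    char       *p;

    for (p = s; *p != '\0'; p++)
        if (*p == ',')
            *p = ' ';
    return s;
}

/* Return a converted, malloc'ed copy of value, leaving value itself
 * untouched. The caller must free() the result. NULL on out of memory. */
char *
converted_copy(const char *value)
{
    size_t      len;
    char       *buff;

    assert(value != NULL);
    len = strlen(value) + 1;
    buff = malloc(len);
    if (buff == NULL)
        return NULL;
    memcpy(buff, value, len);
    return commas_to_spaces(buff);
}
```

Working on a copy matters because the converter modifies its argument, while the string returned by PQgetvalue() belongs to the PGresult and should be treated as read-only by the caller.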