mirror of https://git.postgresql.org/git/postgresql.git
synced 2024-10-04 04:46:52 +02:00
Remove TODO.detail files that contained useless or very old information.
Update TODO accordingly.
parent 5de02e283f
commit 2b721d3d41
@ -1,542 +0,0 @@
From fjoe@iclub.nsu.ru Tue Jan 23 03:38:45 2001
Received: from mx.nsu.ru (root@mx.nsu.ru [193.124.215.71])
    by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id DAA14458
    for <pgman@candle.pha.pa.us>; Tue, 23 Jan 2001 03:38:24 -0500 (EST)
Received: from iclub.nsu.ru (root@iclub.nsu.ru [193.124.222.66])
    by mx.nsu.ru (8.9.1/8.9.0) with ESMTP id OAA29153;
    Tue, 23 Jan 2001 14:31:27 +0600 (NOVT)
Received: from localhost (fjoe@localhost)
    by iclub.nsu.ru (8.11.1/8.11.1) with ESMTP id f0N8VOr15273;
    Tue, 23 Jan 2001 14:31:25 +0600 (NS)
    (envelope-from fjoe@iclub.nsu.ru)
Date: Tue, 23 Jan 2001 14:31:24 +0600 (NS)
From: Max Khon <fjoe@iclub.nsu.ru>
To: Bruce Momjian <pgman@candle.pha.pa.us>
cc: PostgreSQL-development <pgsql-hackers@postgresql.org>
Subject: Re: [HACKERS] Bug in FOREIGN KEY
In-Reply-To: <200101230416.XAA04293@candle.pha.pa.us>
Message-ID: <Pine.BSF.4.21.0101231429310.12474-100000@iclub.nsu.ru>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Status: RO

hi, there!

On Mon, 22 Jan 2001, Bruce Momjian wrote:

>
> > This problem with foreign keys has been reported to me, and I have confirmed
> > the bug exists in current sources. The DELETE should succeed:
> >
> > ---------------------------------------------------------------------------
> >
> > CREATE TABLE primarytest2 (
> >     col1 INTEGER,
> >     col2 INTEGER,
> >     PRIMARY KEY(col1, col2)
> > );
> >
> > CREATE TABLE foreigntest2 (col3 INTEGER,
> >     col4 INTEGER,
> >     FOREIGN KEY (col3, col4) REFERENCES primarytest2
> > );
> > test=> BEGIN;
> > BEGIN
> > test=> INSERT INTO primarytest2 VALUES (5,5);
> > INSERT 27618 1
> > test=> DELETE FROM primarytest2 WHERE col1 = 5 AND col2 = 5;
> > ERROR: triggered data change violation on relation "primarytest2"

I have another (slightly different) example:
--- cut here ---
test=> CREATE TABLE pr(obj_id int PRIMARY KEY);
NOTICE: CREATE TABLE/PRIMARY KEY will create implicit index 'pr_pkey' for
table 'pr'
CREATE
test=> CREATE TABLE fr(obj_id int REFERENCES pr ON DELETE CASCADE);
NOTICE: CREATE TABLE will create implicit trigger(s) for FOREIGN KEY
check(s)
CREATE
test=> BEGIN;
BEGIN
test=> INSERT INTO pr (obj_id) VALUES (1);
INSERT 200539 1
test=> INSERT INTO fr (obj_id) SELECT obj_id FROM pr;
INSERT 200540 1
test=> DELETE FROM fr;
ERROR: triggered data change violation on relation "fr"
test=>
--- cut here ---

we are running postgresql 7.1 beta3

/fjoe


From sszabo@megazone23.bigpanda.com Tue Jan 23 13:41:55 2001
Received: from megazone23.bigpanda.com (rfx-64-6-210-138.users.reflexcom.com [64.6.210.138])
    by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id NAA19924
    for <pgman@candle.pha.pa.us>; Tue, 23 Jan 2001 13:41:54 -0500 (EST)
Received: from localhost (sszabo@localhost)
    by megazone23.bigpanda.com (8.11.1/8.11.1) with ESMTP id f0NIfLa41018;
    Tue, 23 Jan 2001 10:41:21 -0800 (PST)
Date: Tue, 23 Jan 2001 10:41:21 -0800 (PST)
From: Stephan Szabo <sszabo@megazone23.bigpanda.com>
To: Bruce Momjian <pgman@candle.pha.pa.us>
cc: Jan Wieck <janwieck@Yahoo.com>, Peter Eisentraut <peter_e@gmx.net>,
    PostgreSQL-development <pgsql-hackers@postgresql.org>
Subject: Re: [HACKERS] Bug in FOREIGN KEY
In-Reply-To: <200101230417.XAA04332@candle.pha.pa.us>
Message-ID: <Pine.BSF.4.21.0101231031290.40955-100000@megazone23.bigpanda.com>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Status: RO


> > Think I misinterpreted the SQL3 specs WR to this detail. The
> > checks must be made per statement, not at the transaction
> > level. I'll try to fix it, but we need to define what will
> > happen with referential actions in the case of conflicting
> > actions on the same key - there are some possible conflicts:
> >
> > 1. DEFERRED ON DELETE NO ACTION or RESTRICT
> >
> >     Do the referencing rows reference to the new PK row with
> >     the same key now, or is this still a constraint
> >     violation? I would say it's not, because the constraint
> >     condition is satisfied at the end of the transaction. How
> >     do other databases behave?
> >
> > 2. DEFERRED ON DELETE CASCADE, SET NULL or SET DEFAULT
> >
> >     Again I'd say that the action should be suppressed
> >     because a matching PK row is present at transaction end -
> >     it's not the same old row, but the constraint itself is
> >     still satisfied.

I'm not actually sure on the cascade, set null and set default. The
way they are written seems to imply to me that it's based on the state
of the database before/after the command in question as opposed to the
deferred state of the database because of the stuff about updating the
state of partially matching rows immediately after the delete/update of
the row which wouldn't really make sense when deferred. Does anyone know
what other systems do with a case something like this all in a
transaction:

create table a (a int primary key);
create table b (b int references a match full on update cascade
    on delete cascade deferrable initially deferred);
insert into a values (1);
insert into a values (2);
insert into b values (1);
delete from a where a=1;
select * from b;
commit;


From pgsql-hackers-owner+M3901@postgresql.org Fri Jan 26 17:00:24 2001
Received: from mail.postgresql.org (webmail.postgresql.org [216.126.85.28])
    by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id RAA10576
    for <pgman@candle.pha.pa.us>; Fri, 26 Jan 2001 17:00:24 -0500 (EST)
Received: from mail.postgresql.org (webmail.postgresql.org [216.126.85.28])
    by mail.postgresql.org (8.11.1/8.11.1) with SMTP id f0QLtVq53019;
    Fri, 26 Jan 2001 16:55:31 -0500 (EST)
    (envelope-from pgsql-hackers-owner+M3901@postgresql.org)
Received: from smtp1b.mail.yahoo.com (smtp3.mail.yahoo.com [128.11.68.135])
    by mail.postgresql.org (8.11.1/8.11.1) with SMTP id f0QLqmq52691
    for <pgsql-hackers@postgresql.org>; Fri, 26 Jan 2001 16:52:48 -0500 (EST)
    (envelope-from janwieck@yahoo.com)
Received: from j13.us.greatbridge.com (HELO jupiter.greatbridge.com) (216.54.52.153)
    by smtp.mail.vip.suc.yahoo.com with SMTP; 26 Jan 2001 22:49:57 -0000
X-Apparently-From: <janwieck@yahoo.com>
Received: (from janwieck@localhost)
    by jupiter.greatbridge.com (8.9.3/8.9.3) id RAA04701;
    Fri, 26 Jan 2001 17:02:32 -0500
From: Jan Wieck <janwieck@Yahoo.com>
Message-Id: <200101262202.RAA04701@jupiter.greatbridge.com>
Subject: Re: [HACKERS] Bug in FOREIGN KEY
In-Reply-To: <200101262110.QAA06902@candle.pha.pa.us> from Bruce Momjian at "Jan
    26, 2001 04:10:22 pm"
To: Bruce Momjian <pgman@candle.pha.pa.us>
Date: Fri, 26 Jan 2001 17:02:32 -0500 (EST)
CC: Jan Wieck <janwieck@Yahoo.com>, Peter Eisentraut <peter_e@gmx.net>,
    PostgreSQL-development <pgsql-hackers@postgresql.org>
X-Mailer: ELM [version 2.4ME+ PL68 (25)]
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Precedence: bulk
Sender: pgsql-hackers-owner@postgresql.org
Status: RO

Bruce Momjian wrote:
> Here is another bug:
>
> test=> begin;
> BEGIN
> test=> INSERT INTO primarytest2 VALUES (5,5);
> INSERT 18757 1
> test=> UPDATE primarytest2 SET col2=1 WHERE col1 = 5 AND col2 = 5;
> ERROR: deferredTriggerGetPreviousEvent: event for tuple (0,10) not
> found

Schema?


Jan

--

#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#================================================== JanWieck@Yahoo.com #



_________________________________________________________
Do You Yahoo!?
Get your free @yahoo.com address at http://mail.yahoo.com


From pgsql-hackers-owner+M3864@postgresql.org Fri Jan 26 10:07:36 2001
Received: from mail.postgresql.org (webmail.postgresql.org [216.126.85.28])
    by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id KAA17732
    for <pgman@candle.pha.pa.us>; Fri, 26 Jan 2001 10:07:35 -0500 (EST)
Received: from mail.postgresql.org (webmail.postgresql.org [216.126.85.28])
    by mail.postgresql.org (8.11.1/8.11.1) with SMTP id f0QF3lq12782;
    Fri, 26 Jan 2001 10:03:47 -0500 (EST)
    (envelope-from pgsql-hackers-owner+M3864@postgresql.org)
Received: from mailout00.sul.t-online.com (mailout00.sul.t-online.com [194.25.134.16])
    by mail.postgresql.org (8.11.1/8.11.1) with ESMTP id f0QF0Yq12614
    for <pgsql-hackers@postgresql.org>; Fri, 26 Jan 2001 10:00:34 -0500 (EST)
    (envelope-from peter_e@gmx.net)
Received: from fwd01.sul.t-online.com
    by mailout00.sul.t-online.com with smtp
    id 14MALp-0006Im-00; Fri, 26 Jan 2001 15:59:45 +0100
Received: from peter.localdomain (520083510237-0001@[212.185.245.73]) by fmrl01.sul.t-online.com
    with esmtp id 14MALQ-1Z0gkaC; Fri, 26 Jan 2001 15:59:20 +0100
Date: Fri, 26 Jan 2001 16:07:27 +0100 (CET)
From: Peter Eisentraut <peter_e@gmx.net>
To: Hiroshi Inoue <Inoue@tpf.co.jp>
cc: Bruce Momjian <pgman@candle.pha.pa.us>,
    PostgreSQL-development <pgsql-hackers@postgresql.org>
Subject: Re: [HACKERS] Open 7.1 items
In-Reply-To: <3A70FA87.933B3D51@tpf.co.jp>
Message-ID: <Pine.LNX.4.30.0101261604030.769-100000@peter.localdomain>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
X-Sender: 520083510237-0001@t-dialin.net
Precedence: bulk
Sender: pgsql-hackers-owner@postgresql.org
Status: RO

Hiroshi Inoue writes:

> What does this item mean ?
> Is it the following ?
>
> begin;
> insert into pk (id) values (1);
> update(delete from) pk where id=1;
> ERROR: triggered data change violation on relation "pk"
>
> If so, isn't it a simple bug ?

Depends on the definition of "bug". It's not spec compliant and it's not
documented and it's annoying. But it's been like this for a year and the
issue is well known and can normally be avoided. It looks like a
documentation to-do to me.

--
Peter Eisentraut peter_e@gmx.net http://yi.org/peter-e/


From pgsql-hackers-owner+M3876@postgresql.org Fri Jan 26 13:07:10 2001
Received: from mail.postgresql.org (webmail.postgresql.org [216.126.85.28])
    by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id NAA26086
    for <pgman@candle.pha.pa.us>; Fri, 26 Jan 2001 13:07:09 -0500 (EST)
Received: from mail.postgresql.org (webmail.postgresql.org [216.126.85.28])
    by mail.postgresql.org (8.11.1/8.11.1) with SMTP id f0QI4Vq30248;
    Fri, 26 Jan 2001 13:04:31 -0500 (EST)
    (envelope-from pgsql-hackers-owner+M3876@postgresql.org)
Received: from sectorbase2.sectorbase.com ([208.48.122.131])
    by mail.postgresql.org (8.11.1/8.11.1) with SMTP id f0QI3Aq30098
    for <pgsql-hackers@postgreSQL.org>; Fri, 26 Jan 2001 13:03:11 -0500 (EST)
    (envelope-from vmikheev@SECTORBASE.COM)
Received: by sectorbase2.sectorbase.com with Internet Mail Service (5.5.2653.19)
    id <D49FAF71>; Fri, 26 Jan 2001 09:41:23 -0800
Message-ID: <8F4C99C66D04D4118F580090272A7A234D32C1@sectorbase1.sectorbase.com>
From: "Mikheev, Vadim" <vmikheev@SECTORBASE.COM>
To: "'Jan Wieck'" <janwieck@Yahoo.com>,
    PostgreSQL HACKERS
    <pgsql-hackers@postgresql.org>,
    Bruce Momjian <root@candle.pha.pa.us>
Subject: RE: [HACKERS] Open 7.1 items
Date: Fri, 26 Jan 2001 10:02:59 -0800
MIME-Version: 1.0
X-Mailer: Internet Mail Service (5.5.2653.19)
Content-Type: text/plain;
    charset="iso-8859-1"
Precedence: bulk
Sender: pgsql-hackers-owner@postgresql.org
Status: RO

> > FOREIGN KEY INSERT & UPDATE/DELETE in transaction "change violation"
>
> A well known issue, and I've asked multiple times how exactly
> we want to define the behaviour for deferred constraints. Do
> foreign keys reference just to a key value and are happy with
> its existence, or do they refer to a particular row?

I think first. The last is closer to OODBMS world, not to [O]RDBMS one.

> Consider you have a deferred "ON DELETE CASCADE" constraint
> and do a DELETE, INSERT of a PK. Do the FK rows need to be
> deleted or not?

Good example. I think FK should not be deleted. If someone really
wants to delete "old" FK then he can do

DELETE PK;
SET CONSTRAINT ... IMMEDIATE; -- FK need to be deleted here
INSERT PK;
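
The difference between the two sequences can be sketched in a few lines
of Python. This is only a toy model of the "key value" reading argued
for above, not of how any PostgreSQL release behaves: the deferred
ON DELETE CASCADE action is assumed to run at commit (or when the
constraint is switched to immediate) and to remove only those FK rows
whose key still has no PK row at that moment. The function name
run_transaction and the statement tuples are hypothetical.

```python
def run_transaction(pk_keys, fk_keys, statements):
    """Toy model of a deferred ON DELETE CASCADE under key-value semantics.

    pk_keys: iterable of primary key values.
    fk_keys: list of referencing key values (duplicates allowed).
    statements: ("delete", key), ("insert", key) or ("set_immediate",).
    Returns the (pk, fk) state after commit.
    """
    pk = set(pk_keys)
    fk = list(fk_keys)
    pending = set()  # keys deleted since the cascade last ran

    def fire_cascade():
        nonlocal fk
        # Only keys that are still missing from the PK table cascade.
        missing = {key for key in pending if key not in pk}
        fk = [key for key in fk if key not in missing]
        pending.clear()

    for stmt in statements:
        if stmt[0] == "delete":
            pk.discard(stmt[1])
            pending.add(stmt[1])
        elif stmt[0] == "insert":
            pk.add(stmt[1])
        elif stmt[0] == "set_immediate":
            fire_cascade()
        else:
            raise ValueError(f"unknown statement: {stmt!r}")

    fire_cascade()  # deferred actions run at commit
    return pk, fk


# DELETE PK; INSERT PK;  -> the FK rows survive
print(run_transaction({1}, [1, 1], [("delete", 1), ("insert", 1)]))
# DELETE PK; SET CONSTRAINT ... IMMEDIATE; INSERT PK;  -> the FK rows are gone
print(run_transaction({1}, [1, 1],
                      [("delete", 1), ("set_immediate",), ("insert", 1)]))
```

Under the alternative "particular row" reading, the first sequence would
also delete the FK rows, which is exactly the open question in this
thread.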

> Consider you have a deferred "ON DELETE RESTRICT" and "ON
> UPDATE CASCADE" constraint. If you DELETE PK1 and UPDATE PK2
> to PK1, the FK2 rows need to follow, but does PK2 inherit all
> FK1 rows now so it's the master of both groups?

Yes. Again one can use SET CONSTRAINT to achieve desirable results.
It seems that SET CONSTRAINT was designed for these purposes - ie
for better flexibility.

Though, it would be better to look how other DBes handle all these
cases -:)

Vadim

From janwieck@yahoo.com Fri Jan 26 12:20:27 2001
Received: from smtp6.mail.yahoo.com (smtp6.mail.yahoo.com [128.11.69.103])
    by candle.pha.pa.us (8.9.0/8.9.0) with SMTP id MAA22158
    for <root@candle.pha.pa.us>; Fri, 26 Jan 2001 12:20:27 -0500 (EST)
Received: from j13.us.greatbridge.com (HELO jupiter.greatbridge.com) (216.54.52.153)
    by smtp.mail.vip.suc.yahoo.com with SMTP; 26 Jan 2001 17:20:26 -0000
X-Apparently-From: <janwieck@yahoo.com>
Received: (from janwieck@localhost)
    by jupiter.greatbridge.com (8.9.3/8.9.3) id MAA03196;
    Fri, 26 Jan 2001 12:30:05 -0500
From: Jan Wieck <janwieck@yahoo.com>
Message-Id: <200101261730.MAA03196@jupiter.greatbridge.com>
Subject: Re: [HACKERS] Open 7.1 items
To: PostgreSQL HACKERS <pgsql-hackers@postgreSQL.org>,
    Bruce Momjian <root@candle.pha.pa.us>
Date: Fri, 26 Jan 2001 12:30:05 -0500 (EST)
X-Mailer: ELM [version 2.4ME+ PL68 (25)]
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Status: RO

Bruce Momjian wrote:
> Here are my open 7.1 items. Thanks for shrinking the list so far.
>
> ---------------------------------------------------------------------------
>
> FreeBSD locale bug
> Reorder INSERT firing in rules

I don't recall why this is wanted. AFAIK there's no reason
NOT to do so, except for the actual state of being far too
close to a release candidate.

> Philip Warner UPDATE crash
> JDBC LargeObject short read return value missing
> SELECT cash_out(1) crashes all backends
> LAZY VACUUM
> FOREIGN KEY INSERT & UPDATE/DELETE in transaction "change violation"

A well known issue, and I've asked multiple times how exactly
we want to define the behaviour for deferred constraints. Do
foreign keys reference just to a key value and are happy with
its existence, or do they refer to a particular row?

Consider you have a deferred "ON DELETE CASCADE" constraint
and do a DELETE, INSERT of a PK. Do the FK rows need to be
deleted or not?

Consider you have a deferred "ON DELETE RESTRICT" and "ON
UPDATE CASCADE" constraint. If you DELETE PK1 and UPDATE PK2
to PK1, the FK2 rows need to follow, but does PK2 inherit all
FK1 rows now so it's the master of both groups?

These are only two possible combinations. There are many to
think of. As said, I've asked before, but no one voted yet.
Move the item to 7.2 anyway, because changing this behaviour
would require massive changes in the trigger queue *and* the
generic RI triggers, which cannot be tested enough any more.


Jan

> Usernames limited in length
> Does pg_dump preserve COMMENTs?
> Failure of nested cursors in JDBC
> JDBC setMaxRows() is global variable affecting other objects
> Does JDBC Makefile need current dir?
> Fix for pg_dump of bad system tables
> Steve Howe failure query with rules
> ODBC/JDBC not disconnecting properly?
> Magnus Hagander ODBC issues?
> Merge MySQL/PgSQL translation scripts
> Fix ipcclean on Linux
> Merge global and template BKI files?
>
>
> --
>   Bruce Momjian                        |  http://candle.pha.pa.us
>   pgman@candle.pha.pa.us               |  (610) 853-3000
>   +  If your life is a hard drive,     |  830 Blythe Avenue
>   +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
>


--

#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#================================================== JanWieck@Yahoo.com #


_________________________________________________________
Do You Yahoo!?
Get your free @yahoo.com address at http://mail.yahoo.com


From pgsql-general-owner+M590@postgresql.org Tue Nov 14 16:30:40 2000
Received: from mail.postgresql.org (webmail.postgresql.org [216.126.85.28])
    by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id RAA22313
    for <pgman@candle.pha.pa.us>; Tue, 14 Nov 2000 17:30:39 -0500 (EST)
Received: from mail.postgresql.org (webmail.postgresql.org [216.126.85.28])
    by mail.postgresql.org (8.11.1/8.11.1) with SMTP id eAEMSJs66979;
    Tue, 14 Nov 2000 17:28:21 -0500 (EST)
    (envelope-from pgsql-general-owner+M590@postgresql.org)
Received: from megazone23.bigpanda.com (138.210.6.64.reflexcom.com [64.6.210.138])
    by mail.postgresql.org (8.11.1/8.11.1) with ESMTP id eAEMREs66800
    for <pgsql-general@postgresql.org>; Tue, 14 Nov 2000 17:27:14 -0500 (EST)
    (envelope-from sszabo@megazone23.bigpanda.com)
Received: from localhost (sszabo@localhost)
    by megazone23.bigpanda.com (8.11.1/8.11.0) with ESMTP id eAEMPpH69059;
    Tue, 14 Nov 2000 14:25:51 -0800 (PST)
Date: Tue, 14 Nov 2000 14:25:51 -0800 (PST)
From: Stephan Szabo <sszabo@megazone23.bigpanda.com>
To: "Beth K. Gatewood" <bethg@mbt.washington.edu>
cc: pgsql-general@postgresql.org
Subject: Re: [GENERAL] a request for some experienced input.....
In-Reply-To: <3A11ACA1.E5D847DD@mbt.washington.edu>
Message-ID: <Pine.BSF.4.21.0011141403380.68986-100000@megazone23.bigpanda.com>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Precedence: bulk
Sender: pgsql-general-owner@postgresql.org
Status: OR


On Tue, 14 Nov 2000, Beth K. Gatewood wrote:

> >
>
> Stephan-
>
> Thank you so much for taking the effort to answer these questions. Your
> help is truly appreciated....
>
> I just have a few points for clarification.
>
> >
> > MATCH PARTIAL is a specific match type which describes which rows are
> > considered matching rows for purposes of meeting or failing the
> > constraint. (In match partial, a fktable (NULL, 2) would match a pk
> > table (1,2) as well as a pk table (2,2). It's different from match
> > full in which case (NULL,2) would be invalid or match unspecified
> > in which case it would match due to the existence of the NULL in any
> > case). There are some bizarre implementation details involved with
> > it and it's different from the others in ways that make it difficult.
> > It's in my list of things to do, but I haven't come up with an acceptable
> > mechanism in my head yet.
>
> Does this mean, currently that I can not have foreign keys with null values?

Not exactly...

Match full = In FK row, all columns must be NULL or the value of each
    column must not be null and there is a row in the PK table where
    each referencing column equals the corresponding referenced
    column.

Unspecified = In FK row, at least one column must be NULL or each
    referencing column shall be equal to the corresponding referenced
    column in some row of the referenced table

Match partial is similar to match full except we ignore the null columns
    for purposes of the "each referencing column equals" bit.

For example:
    PK Table Key values: (1,2), (1,3), (3,3)
    Attempted FK Table Key values: (1,2), (1,NULL), (5,NULL), (NULL, NULL)
(hopefully I get this right)...
    In match full, only the 1st and 4th fk values are valid.
    In match partial, the 1st, 2nd, and 4th fk values are valid.
    In match unspecified, all the fk values are valid.

The other note is that generally speaking, all three are basically the
same for the single column key. If you're only doing references on one
column, the match type is mostly meaningless.
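
The three definitions above can be checked against the example with a
short Python sketch, using None for NULL. It is only a direct reading of
the rules as worded in this message, not the backend's implementation;
the name fk_is_valid and the match-type strings are hypothetical.

```python
def fk_is_valid(fk_row, pk_rows, match):
    """Check one referencing row against the referenced key values.

    fk_row: tuple of referencing column values, None meaning NULL.
    pk_rows: iterable of referenced key tuples (no NULLs).
    match: "full", "partial" or "unspecified".
    """
    if match not in ("full", "partial", "unspecified"):
        raise ValueError(f"unknown match type: {match!r}")

    nulls = [value is None for value in fk_row]
    if all(nulls):
        return True   # an all-NULL key passes under every match type
    if any(nulls):
        if match == "unspecified":
            return True   # one NULL column is enough
        if match == "full":
            return False  # a mix of NULL and non-NULL is rejected
    # "partial" compares only the non-NULL columns; for "full" and
    # "unspecified" there are no NULLs left, so every column is compared.
    return any(
        all(f is None or f == p for f, p in zip(fk_row, pk_row))
        for pk_row in pk_rows
    )


pk_values = [(1, 2), (1, 3), (3, 3)]
fk_values = [(1, 2), (1, None), (5, None), (None, None)]
for match in ("full", "partial", "unspecified"):
    print(match, [fk_is_valid(row, pk_values, match) for row in fk_values])
```

With a single-column key the three branches collapse to the same answer,
which is the point of the closing remark above.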

> > PENDANT adds that for each row of the referenced table the values of
> > the specified column(s) are the same as the values of the specified
> > column(s) in some row of the referencing tables.
>
> I am not sure I know what you mean here.....Are you saying that the value for
> the FK column must match the value for the PK column?

I haven't really looked at PENDANT, the above was just a small rewrite of
some descriptive text in the sql99 draft I have. There's a whole bunch
of rules in the actual text of the referential constraint definition.

The base stuff seems to be: (Rf is the referencing columns, T is the
referenced table)

3) If PENDANT is specified, then:
   a) For a given row in the referencing table, let pendant
      reference designate an instance in which all Rf are
      non-null.

   b) Let number of pendant paths be the number of pendant
      references to the same referenced row in a referenced table
      from all referencing rows in all base tables.

   c) For every row in T, the number of pendant paths is equal to
      or greater than 1.

So, I'd read it as every row in T must have at least one referencing row
in some base table.
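
That reading of rules a) to c) can be written down as a small Python
check. It is a sketch of the interpretation given here, not of the
standard's full rule set, and pendant_violations is a hypothetical
helper name; None again stands for NULL.

```python
def pendant_violations(referenced_rows, referencing_tables):
    """Return the rows of T that have no pendant reference.

    referenced_rows: list of key tuples in the referenced table T.
    referencing_tables: list of tables, each a list of Rf tuples.
    """
    pendant_targets = set()
    for table in referencing_tables:
        for row in table:
            # Rule a): a pendant reference has every Rf column non-null.
            if all(value is not None for value in row):
                pendant_targets.add(tuple(row))
    # Rule c): every row of T needs at least one pendant path.
    return [row for row in referenced_rows if tuple(row) not in pendant_targets]


# Key (2,) is referenced by nobody, so it violates PENDANT.
print(pendant_violations([(1,), (2,)], [[(1,), (None,)], [(1,)]]))
```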

There are some details about updates and that you can't mix PENDANT and
MATCH PARTIAL or SET DEFAULT actions.

> > The main issues in 7.0 are that older versions (might be fixed in
> > 7.0.3) would fail very badly if you used alter table to rename tables that
> > were referenced in a fk constraint and that you need to give update
> > permission to the referenced table. For the former, 7.1 will (and 7.0.3
> > may) give an elog(ERROR) to you rather than crashing the backend and the
> > latter should be fixed for 7.1 (although you still need to have write
> > perms to the referencing table for referential actions to work properly)
>
> Are the steps to this outlined somewhere then?

The permissions stuff is just a matter of using GRANT and REVOKE to set
the permissions that a user has to a table.

@ -1,129 +0,0 @@
From pgsql-hackers-owner+M908@postgresql.org Sun Nov 19 14:27:43 2000
Received: from mail.postgresql.org (webmail.postgresql.org [216.126.85.28])
    by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id OAA10885
    for <pgman@candle.pha.pa.us>; Sun, 19 Nov 2000 14:27:42 -0500 (EST)
Received: from mail.postgresql.org (webmail.postgresql.org [216.126.85.28])
    by mail.postgresql.org (8.11.1/8.11.1) with SMTP id eAJJSMs83653;
    Sun, 19 Nov 2000 14:28:22 -0500 (EST)
    (envelope-from pgsql-hackers-owner+M908@postgresql.org)
Received: from candle.pha.pa.us (candle.navpoint.com [162.33.245.46] (may be forged))
    by mail.postgresql.org (8.11.1/8.11.1) with ESMTP id eAJJQns83565
    for <pgsql-hackers@postgreSQL.org>; Sun, 19 Nov 2000 14:26:49 -0500 (EST)
    (envelope-from pgman@candle.pha.pa.us)
Received: (from pgman@localhost)
    by candle.pha.pa.us (8.9.0/8.9.0) id OAA06790;
    Sun, 19 Nov 2000 14:23:06 -0500 (EST)
From: Bruce Momjian <pgman@candle.pha.pa.us>
Message-Id: <200011191923.OAA06790@candle.pha.pa.us>
Subject: Re: [HACKERS] WAL fsync scheduling
In-Reply-To: <002101c0525e$2d964480$b97a30d0@sectorbase.com> "from Vadim Mikheev
    at Nov 19, 2000 11:23:19 am"
To: Vadim Mikheev <vmikheev@sectorbase.com>
Date: Sun, 19 Nov 2000 14:23:06 -0500 (EST)
CC: Tom Samplonius <tom@sdf.com>, Alfred@candle.pha.pa.us,
    Perlstein <bright@wintelcom.net>, Larry@candle.pha.pa.us,
    Rosenman <ler@lerctr.org>,
    PostgreSQL-development <pgsql-hackers@postgresql.org>
X-Mailer: ELM [version 2.4ME+ PL77 (25)]
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
Content-Type: text/plain; charset=US-ASCII
Precedence: bulk
Sender: pgsql-hackers-owner@postgresql.org
Status: OR
|
|
||||||
[ Charset ISO-8859-1 unsupported, converting... ]
|
|
||||||
> > There are two parts to transaction commit. The first is writing all
|
|
||||||
> > dirty buffers or log changes to the kernel, and second is fsync of the
|
|
||||||
> ^^^^^^^^^^^^
|
|
||||||
> Backend doesn't write any dirty buffer to the kernel at commit time.
|
|
||||||
|
|
||||||
Yes, I suspected that.
|
|
||||||
|
|
||||||
>
|
|
||||||
> > log file.
|
|
||||||
>
|
|
||||||
> The first part is writing commit record into WAL buffers in shmem.
|
|
||||||
> This is what XLogInsert does. After that XLogFlush is called to ensure
|
|
||||||
> that entire commit record is on disk. XLogFlush does *both* write() and
|
|
||||||
> fsync() (single slock is used for both writing and fsyncing) if it needs to
|
|
||||||
> do it at all.
|
|
||||||
|
|
||||||
Yes, I realize there are new steps in WAL.
|
|
||||||
|
|
||||||
>
|
|
||||||
> > I suggest having a per-backend shared memory byte that has the following
|
|
||||||
> > values:
|
|
||||||
> >
|
|
||||||
> > START_LOG_WRITE
|
|
||||||
> > WAIT_ON_FSYNC
|
|
||||||
> > NOT_IN_COMMIT
|
|
||||||
> > backend_number_doing_fsync
|
|
||||||
> >
|
|
||||||
> > I suggest that when each backend starts a commit, it sets its byte to
|
|
||||||
> > START_LOG_WRITE.
|
|
||||||
> ^^^^^^^^^^^^^^^^^^^^^^^
|
|
||||||
> Isn't START_COMMIT more meaningful?
|
|
||||||
|
|
||||||
Yes.
|
|
||||||
|
|
||||||
>
|
|
||||||
> > When it gets ready to fsync, it checks all backends.
|
|
||||||
> ^^^^^^^^^^^^^^^^^^^^^^^^^^
|
|
||||||
> What do you mean by this? The moment just after XLogInsert?
|
|
||||||
|
|
||||||
Just before it calls fsync().
|
|
||||||
|
|
||||||
>
|
|
||||||
> > If all are NOT_IN_COMMIT, it does fsync and continues.
|
|
||||||
>
|
|
||||||
> 1st edition:
|
|
||||||
> > If one or more are in START_LOG_WRITE, it waits until no one is in
|
|
||||||
> > START_LOG_WRITE. It then checks all WAIT_ON_FSYNC, and if it is the
|
|
||||||
> > lowest backend in WAIT_ON_FSYNC, marks all others with its backend
|
|
||||||
> > number, and does fsync. It then clears all backends with its number to
|
|
||||||
> > NOT_IN_COMMIT. Other backend will see they are not the lowest
|
|
||||||
> > WAIT_ON_FSYNC and will wait for their byte to be set to NOT_IN_COMMIT
|
|
||||||
> > so they can then continue, knowing their data was synced.
|
|
||||||
>
|
|
||||||
> 2nd edition:
|
|
||||||
> > I have another idea. If a backend gets to the point that it needs
|
|
||||||
> > fsync, and there is another backend in START_LOG_WRITE, it can go to an
|
|
||||||
> > interuptable sleep, knowing another backend will perform the fsync and
|
|
||||||
> > wake it up. Therefore, there is no busy-wait or timed sleep.
|
|
||||||
> >
|
|
||||||
> > Of course, a backend must set its status to WAIT_ON_FSYNC to avoid a
|
|
||||||
> > race condition.
|
|
||||||
>
|
|
||||||
> The 2nd edition is much better. But I'm not sure we really need
|
|
||||||
> these per-backend bytes in shmem. Why not just have some counters?
|
|
||||||
> We can use a semaphore to wake up all waiters at once.
|
|
||||||
|
|
||||||
Yes, that is much better and clearer. My idea was just to say, "if no
|
|
||||||
one is entering commit phase, do the commit. If someone else is coming,
|
|
||||||
sleep and wait for them to do the fsync and wake me up with a signal."
|
|
||||||
|
|
||||||
>
|
|
||||||
> > This allows a single backend not to sleep, and allows multiple backends
|
|
||||||
> > to bunch up only when they are all about to commit.
|
|
||||||
> >
|
|
||||||
> > The reason backend numbers are written is so other backends entering the
|
|
||||||
> > commit code will not interfere with the backends performing fsync.
|
|
||||||
>
|
|
||||||
> A woken-up backend can check what's written/fsynced by calling XLogFlush.
|
|
||||||
|
|
||||||
Seems that may not be needed anymore with a counter. The only issue is
|
|
||||||
that other backends may enter commit while fsync() is happening. The
|
|
||||||
process that did the fsync must be sure to wake up only the backends
|
|
||||||
that were waiting for it, and not other backends that may also be
|
|
||||||
doing fsync as a group while the first fsync was happening. I leave
|
|
||||||
those details to people more experienced. :-)
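
The counter-based variant discussed above can be sketched roughly as follows. This is only an illustration under stated assumptions, not PostgreSQL code: the class `GroupCommitLog`, its methods, and the `written`/`flushed` position counters are hypothetical names, threads stand in for backends, and fsync() is simulated by a no-op. A waiter compares its own log position with the flushed position, so a backend that entered commit while an fsync was in progress is not released by that fsync.

```python
import threading


class GroupCommitLog:
    """Toy log: many committers share a small number of simulated fsyncs."""

    def __init__(self):
        self._cond = threading.Condition()
        self.written = 0         # position of the last record written
        self.flushed = 0         # position known to be "on disk"
        self._flushing = False   # True while some thread is inside fsync
        self.fsync_calls = 0     # number of simulated fsyncs performed

    def write_commit_record(self):
        """Append a commit record and return its position."""
        with self._cond:
            self.written += 1
            return self.written

    def flush(self, position):
        """Return once everything up to `position` has been flushed."""
        with self._cond:
            while self.flushed < position:
                if self._flushing:
                    # Another thread is in fsync: sleep until it wakes us.
                    self._cond.wait()
                    continue
                # Nobody is flushing: flush everything written so far.
                self._flushing = True
                target = self.written
                self._cond.release()      # let others write meanwhile
                try:
                    self._simulated_fsync()
                finally:
                    self._cond.acquire()
                    self._flushing = False
                    self._cond.notify_all()
                self.flushed = target
                self.fsync_calls += 1

    def _simulated_fsync(self):
        # Stand-in for the real fsync(); deliberately does nothing.
        pass
```

A lone committer never sleeps: it finds nobody flushing and does the fsync itself. Committers whose records were written after `target` was captured simply loop again, and one of them becomes the next flusher.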
|
|
||||||
|
|
||||||
I am just glad people liked my idea.
|
|
||||||
|
|
||||||
--
|
|
||||||
Bruce Momjian | http://candle.pha.pa.us
|
|
||||||
pgman@candle.pha.pa.us | (610) 853-3000
|
|
||||||
+ If your life is a hard drive, | 830 Blythe Avenue
|
|
||||||
+ Christ can be your backup. | Drexel Hill, Pennsylvania 19026
|
|
||||||
|
|
@ -1,102 +0,0 @@
|
|||||||
From owner-pgsql-hackers@hub.org Mon May 11 11:31:09 1998
|
|
||||||
Received: from renoir.op.net (root@renoir.op.net [209.152.193.4])
|
|
||||||
by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id LAA03006
|
|
||||||
for <maillist@candle.pha.pa.us>; Mon, 11 May 1998 11:31:07 -0400 (EDT)
|
|
||||||
Received: from hub.org (hub.org [209.47.148.200]) by renoir.op.net (o1/$ Revision: 1.17 $) with ESMTP id LAA01663 for <maillist@candle.pha.pa.us>; Mon, 11 May 1998 11:24:42 -0400 (EDT)
|
|
||||||
Received: from localhost (majordom@localhost) by hub.org (8.8.8/8.7.5) with SMTP id LAA21841; Mon, 11 May 1998 11:15:25 -0400 (EDT)
|
|
||||||
Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Mon, 11 May 1998 11:15:12 +0000 (EDT)
|
|
||||||
Received: (from majordom@localhost) by hub.org (8.8.8/8.7.5) id LAA21683 for pgsql-hackers-outgoing; Mon, 11 May 1998 11:15:09 -0400 (EDT)
|
|
||||||
Received: from sss.sss.pgh.pa.us (sss.pgh.pa.us [206.210.65.6]) by hub.org (8.8.8/8.7.5) with ESMTP id LAA21451 for <hackers@postgreSQL.org>; Mon, 11 May 1998 11:15:03 -0400 (EDT)
|
|
||||||
Received: from sss.sss.pgh.pa.us (localhost [127.0.0.1])
|
|
||||||
by sss.sss.pgh.pa.us (8.8.5/8.8.5) with ESMTP id LAA24915;
|
|
||||||
Mon, 11 May 1998 11:14:43 -0400 (EDT)
|
|
||||||
To: Brett McCormick <brett@work.chicken.org>
|
|
||||||
cc: hackers@postgreSQL.org
|
|
||||||
Subject: Re: [HACKERS] Re: [PATCHES] Try again: S_LOCK reduced contentionh]
|
|
||||||
In-reply-to: Your message of Mon, 11 May 1998 07:57:23 -0700 (PDT)
|
|
||||||
<13655.4384.345723.466046@abraxas.scene.com>
|
|
||||||
Date: Mon, 11 May 1998 11:14:43 -0400
|
|
||||||
Message-ID: <24913.894899683@sss.pgh.pa.us>
|
|
||||||
From: Tom Lane <tgl@sss.pgh.pa.us>
|
|
||||||
Sender: owner-pgsql-hackers@hub.org
|
|
||||||
Precedence: bulk
|
|
||||||
Status: RO
|
|
||||||
|
|
||||||
Brett McCormick <brett@work.chicken.org> writes:
|
|
||||||
> same way that the current network socket is passed -- through an execv
|
|
||||||
> argument. hopefully, however, the non-execv()ing fork will be in 6.4.
|
|
||||||
|
|
||||||
Um, you missed the point, Brett. David was hoping to transfer a client
|
|
||||||
connection from the postmaster to an *already existing* backend process.
|
|
||||||
Fork, with or without exec, solves the problem for a backend that's
|
|
||||||
started after the postmaster has accepted the client socket.
|
|
||||||
|
|
||||||
This does lead to a different line of thought, however. Pre-started
|
|
||||||
backends would have access to the "master" connection socket on which
|
|
||||||
the postmaster listens for client connections, right? Suppose that we
|
|
||||||
fire the postmaster as postmaster, and demote it to being simply a
|
|
||||||
manufacturer of new backend processes as old ones get used up. Have
|
|
||||||
one of the idle backend processes be the one doing the accept() on the
|
|
||||||
master socket. Once it has a client connection, it performs the
|
|
||||||
authentication handshake and then starts serving the client (or just
|
|
||||||
quits if authentication fails). Meanwhile the next idle backend process
|
|
||||||
has executed accept() on the master socket and is waiting for the next
|
|
||||||
client; and shortly the postmaster/factory/whateverwecallitnow notices
|
|
||||||
that it needs to start another backend to add to the idle-backend pool.
|
|
||||||
|
|
||||||
This'd probably need some interlocking among the backends. I have no
|
|
||||||
idea whether it'd be safe to have all the idle backends trying to
|
|
||||||
do accept() on the master socket simultaneously, but it sounds risky.
|
|
||||||
Better to use a mutex so that only one gets to do it while the others
|
|
||||||
sleep.
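
The mutex-around-accept() idea can be illustrated with a small sketch. The assumptions are loud ones: threads stand in for pre-started backend processes, a `threading.Lock` stands in for whatever cross-process mutex would really be needed, and the names `idle_backend`, `run_pool` and `read_all` are hypothetical. It shows only the interlocking, not authentication or backend startup.

```python
import socket
import threading


def read_all(sock):
    """Read from the socket until the peer closes its sending side."""
    chunks = []
    while True:
        data = sock.recv(1024)
        if not data:
            return b"".join(chunks)
        chunks.append(data)


def idle_backend(listener, accept_mutex, backend_id, served):
    """One pre-started worker: take the mutex, accept one client, serve it."""
    with accept_mutex:               # only one idle worker sits in accept()
        conn, _addr = listener.accept()
    with conn:                       # mutex released: next worker may accept
        request = read_all(conn)
        conn.sendall(b"backend %d got: " % backend_id + request)
    served.append(backend_id)


def run_pool(num_backends):
    """Start the pool, connect one client per worker, return the replies."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(num_backends)
    address = listener.getsockname()

    accept_mutex = threading.Lock()
    served = []
    workers = [
        threading.Thread(target=idle_backend,
                         args=(listener, accept_mutex, i, served))
        for i in range(num_backends)
    ]
    for w in workers:
        w.start()

    replies = []
    for _ in range(num_backends):
        with socket.create_connection(address, timeout=5) as client:
            client.sendall(b"hello")
            client.shutdown(socket.SHUT_WR)   # signal end of request
            replies.append(read_all(client))

    for w in workers:
        w.join()
    listener.close()
    return replies, served
```

Each worker here serves a single client and exits, which mirrors the "old ones get used up" wording; the component that would start replacement workers is left out.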
|
|
||||||
|
|
||||||
regards, tom lane
|
|
||||||
|
|
||||||
|
|
||||||
From owner-pgsql-hackers@hub.org Mon May 11 11:35:55 1998
|
|
||||||
Received: from hub.org (hub.org [209.47.148.200])
|
|
||||||
by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id LAA03043
|
|
||||||
for <maillist@candle.pha.pa.us>; Mon, 11 May 1998 11:35:53 -0400 (EDT)
|
|
||||||
Received: from localhost (majordom@localhost) by hub.org (8.8.8/8.7.5) with SMTP id LAA23494; Mon, 11 May 1998 11:27:10 -0400 (EDT)
|
|
||||||
Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Mon, 11 May 1998 11:27:02 +0000 (EDT)
|
|
||||||
Received: (from majordom@localhost) by hub.org (8.8.8/8.7.5) id LAA23473 for pgsql-hackers-outgoing; Mon, 11 May 1998 11:27:01 -0400 (EDT)
|
|
||||||
Received: from sss.sss.pgh.pa.us (sss.pgh.pa.us [206.210.65.6]) by hub.org (8.8.8/8.7.5) with ESMTP id LAA23462 for <hackers@postgreSQL.org>; Mon, 11 May 1998 11:26:56 -0400 (EDT)
|
|
||||||
Received: from sss.sss.pgh.pa.us (localhost [127.0.0.1])
|
|
||||||
by sss.sss.pgh.pa.us (8.8.5/8.8.5) with ESMTP id LAA25006;
|
|
||||||
Mon, 11 May 1998 11:26:44 -0400 (EDT)
|
|
||||||
To: Brett McCormick <brett@work.chicken.org>
|
|
||||||
cc: hackers@postgreSQL.org
|
|
||||||
Subject: Re: [HACKERS] Re: [PATCHES] Try again: S_LOCK reduced contentionh]
|
|
||||||
In-reply-to: Your message of Mon, 11 May 1998 07:57:23 -0700 (PDT)
|
|
||||||
<13655.4384.345723.466046@abraxas.scene.com>
|
|
||||||
Date: Mon, 11 May 1998 11:26:44 -0400
|
|
||||||
Message-ID: <25004.894900404@sss.pgh.pa.us>
|
|
||||||
From: Tom Lane <tgl@sss.pgh.pa.us>
|
|
||||||
Sender: owner-pgsql-hackers@hub.org
|
|
||||||
Precedence: bulk
|
|
||||||
Status: RO
|
|
||||||
|
|
||||||
Meanwhile, *I* missed the point about Brett's second comment :-(
|
|
||||||
|
|
||||||
Brett McCormick <brett@work.chicken.org> writes:
|
|
||||||
> There will have to be some sort of arg parsing in any case,
|
|
||||||
> considering that you can pass configurable arguments to the backend..
|
|
||||||
|
|
||||||
If we do the sort of change David and I were just discussing, then the
|
|
||||||
pre-spawned backend would become responsible for parsing and dealing
|
|
||||||
with the PGOPTIONS portion of the client's connection request message.
|
|
||||||
That's just part of shifting the authentication handshake code from
|
|
||||||
postmaster to backend, so it shouldn't be too hard.
|
|
||||||
|
|
||||||
BUT: the whole point is to be able to initialize the backend before it
|
|
||||||
is connected to a client. How much of the expensive backend startup
|
|
||||||
work depends on having the client connection options available?
|
|
||||||
Any work that needs to know the options will have to wait until after
|
|
||||||
the client connects. If that means most of the startup work can't
|
|
||||||
happen in advance anyway, then we're out of luck; a pre-started backend
|
|
||||||
won't save enough time to be worth the effort. (Unless we are willing
|
|
||||||
to eliminate or redefine the troublesome options...)
|
|
||||||
|
|
||||||
regards, tom lane
|
|
||||||
|
|
||||||
|
|
@ -1319,3 +1319,105 @@ DDI: +64(4)916-7201 MOB: +64(21)635-694 OFFICE: +64(4)499-2267
|
|||||||
---------------------------(end of broadcast)---------------------------
|
---------------------------(end of broadcast)---------------------------
|
||||||
TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org
|
TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org
|
||||||
|
|
||||||
|
From owner-pgsql-hackers@hub.org Mon May 11 11:31:09 1998
|
||||||
|
Received: from renoir.op.net (root@renoir.op.net [209.152.193.4])
|
||||||
|
by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id LAA03006
|
||||||
|
for <maillist@candle.pha.pa.us>; Mon, 11 May 1998 11:31:07 -0400 (EDT)
|
||||||
|
Received: from hub.org (hub.org [209.47.148.200]) by renoir.op.net (o1/$ Revision: 1.17 $) with ESMTP id LAA01663 for <maillist@candle.pha.pa.us>; Mon, 11 May 1998 11:24:42 -0400 (EDT)
|
||||||
|
Received: from localhost (majordom@localhost) by hub.org (8.8.8/8.7.5) with SMTP id LAA21841; Mon, 11 May 1998 11:15:25 -0400 (EDT)
|
||||||
|
Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Mon, 11 May 1998 11:15:12 +0000 (EDT)
|
||||||
|
Received: (from majordom@localhost) by hub.org (8.8.8/8.7.5) id LAA21683 for pgsql-hackers-outgoing; Mon, 11 May 1998 11:15:09 -0400 (EDT)
|
||||||
|
Received: from sss.sss.pgh.pa.us (sss.pgh.pa.us [206.210.65.6]) by hub.org (8.8.8/8.7.5) with ESMTP id LAA21451 for <hackers@postgreSQL.org>; Mon, 11 May 1998 11:15:03 -0400 (EDT)
|
||||||
|
Received: from sss.sss.pgh.pa.us (localhost [127.0.0.1])
|
||||||
|
by sss.sss.pgh.pa.us (8.8.5/8.8.5) with ESMTP id LAA24915;
|
||||||
|
Mon, 11 May 1998 11:14:43 -0400 (EDT)
|
||||||
|
To: Brett McCormick <brett@work.chicken.org>
|
||||||
|
cc: hackers@postgreSQL.org
|
||||||
|
Subject: Re: [HACKERS] Re: [PATCHES] Try again: S_LOCK reduced contentionh]
|
||||||
|
In-reply-to: Your message of Mon, 11 May 1998 07:57:23 -0700 (PDT)
|
||||||
|
<13655.4384.345723.466046@abraxas.scene.com>
|
||||||
|
Date: Mon, 11 May 1998 11:14:43 -0400
|
||||||
|
Message-ID: <24913.894899683@sss.pgh.pa.us>
|
||||||
|
From: Tom Lane <tgl@sss.pgh.pa.us>
|
||||||
|
Sender: owner-pgsql-hackers@hub.org
|
||||||
|
Precedence: bulk
|
||||||
|
Status: RO
|
||||||
|
|
||||||
|
Brett McCormick <brett@work.chicken.org> writes:
|
||||||
|
> same way that the current network socket is passed -- through an execv
|
||||||
|
> argument. hopefully, however, the non-execv()ing fork will be in 6.4.
|
||||||
|
|
||||||
|
Um, you missed the point, Brett. David was hoping to transfer a client
|
||||||
|
connection from the postmaster to an *already existing* backend process.
|
||||||
|
Fork, with or without exec, solves the problem for a backend that's
|
||||||
|
started after the postmaster has accepted the client socket.
|
||||||
|
|
||||||
|
This does lead to a different line of thought, however. Pre-started
|
||||||
|
backends would have access to the "master" connection socket on which
|
||||||
|
the postmaster listens for client connections, right? Suppose that we
|
||||||
|
fire the postmaster as postmaster, and demote it to being simply a
|
||||||
|
manufacturer of new backend processes as old ones get used up. Have
|
||||||
|
one of the idle backend processes be the one doing the accept() on the
|
||||||
|
master socket. Once it has a client connection, it performs the
|
||||||
|
authentication handshake and then starts serving the client (or just
|
||||||
|
quits if authentication fails). Meanwhile the next idle backend process
|
||||||
|
has executed accept() on the master socket and is waiting for the next
|
||||||
|
client; and shortly the postmaster/factory/whateverwecallitnow notices
|
||||||
|
that it needs to start another backend to add to the idle-backend pool.
|
||||||
|
|
||||||
|
This'd probably need some interlocking among the backends. I have no
|
||||||
|
idea whether it'd be safe to have all the idle backends trying to
|
||||||
|
do accept() on the master socket simultaneously, but it sounds risky.
|
||||||
|
Better to use a mutex so that only one gets to do it while the others
|
||||||
|
sleep.
|
||||||
|
|
||||||
|
regards, tom lane
|
||||||
|
|
||||||
|
|
||||||
|
From owner-pgsql-hackers@hub.org Mon May 11 11:35:55 1998
|
||||||
|
Received: from hub.org (hub.org [209.47.148.200])
|
||||||
|
by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id LAA03043
|
||||||
|
for <maillist@candle.pha.pa.us>; Mon, 11 May 1998 11:35:53 -0400 (EDT)
|
||||||
|
Received: from localhost (majordom@localhost) by hub.org (8.8.8/8.7.5) with SMTP id LAA23494; Mon, 11 May 1998 11:27:10 -0400 (EDT)
|
||||||
|
Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Mon, 11 May 1998 11:27:02 +0000 (EDT)
|
||||||
|
Received: (from majordom@localhost) by hub.org (8.8.8/8.7.5) id LAA23473 for pgsql-hackers-outgoing; Mon, 11 May 1998 11:27:01 -0400 (EDT)
|
||||||
|
Received: from sss.sss.pgh.pa.us (sss.pgh.pa.us [206.210.65.6]) by hub.org (8.8.8/8.7.5) with ESMTP id LAA23462 for <hackers@postgreSQL.org>; Mon, 11 May 1998 11:26:56 -0400 (EDT)
|
||||||
|
Received: from sss.sss.pgh.pa.us (localhost [127.0.0.1])
|
||||||
|
by sss.sss.pgh.pa.us (8.8.5/8.8.5) with ESMTP id LAA25006;
|
||||||
|
Mon, 11 May 1998 11:26:44 -0400 (EDT)
|
||||||
|
To: Brett McCormick <brett@work.chicken.org>
|
||||||
|
cc: hackers@postgreSQL.org
|
||||||
|
Subject: Re: [HACKERS] Re: [PATCHES] Try again: S_LOCK reduced contentionh]
|
||||||
|
In-reply-to: Your message of Mon, 11 May 1998 07:57:23 -0700 (PDT)
|
||||||
|
<13655.4384.345723.466046@abraxas.scene.com>
|
||||||
|
Date: Mon, 11 May 1998 11:26:44 -0400
|
||||||
|
Message-ID: <25004.894900404@sss.pgh.pa.us>
|
||||||
|
From: Tom Lane <tgl@sss.pgh.pa.us>
|
||||||
|
Sender: owner-pgsql-hackers@hub.org
|
||||||
|
Precedence: bulk
|
||||||
|
Status: RO
|
||||||
|
|
||||||
|
Meanwhile, *I* missed the point about Brett's second comment :-(
|
||||||
|
|
||||||
|
Brett McCormick <brett@work.chicken.org> writes:
|
||||||
|
> There will have to be some sort of arg parsing in any case,
|
||||||
|
> considering that you can pass configurable arguments to the backend..
|
||||||
|
|
||||||
|
If we do the sort of change David and I were just discussing, then the
|
||||||
|
pre-spawned backend would become responsible for parsing and dealing
|
||||||
|
with the PGOPTIONS portion of the client's connection request message.
|
||||||
|
That's just part of shifting the authentication handshake code from
|
||||||
|
postmaster to backend, so it shouldn't be too hard.
|
||||||
|
|
||||||
|
BUT: the whole point is to be able to initialize the backend before it
|
||||||
|
is connected to a client. How much of the expensive backend startup
|
||||||
|
work depends on having the client connection options available?
|
||||||
|
Any work that needs to know the options will have to wait until after
|
||||||
|
the client connects. If that means most of the startup work can't
|
||||||
|
happen in advance anyway, then we're out of luck; a pre-started backend
|
||||||
|
won't save enough time to be worth the effort. (Unless we are willing
|
||||||
|
to eliminate or redefine the troublesome options...)
|
||||||
|
|
||||||
|
regards, tom lane
|
||||||
|
|
||||||
|
|
||||||
|
@ -1,916 +0,0 @@
|
|||||||
From pgsql-hackers-owner+M1833@hub.org Sat May 13 22:49:26 2000
|
|
||||||
Received: from news.tht.net (news.hub.org [216.126.91.242])
|
|
||||||
by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id WAA07394
|
|
||||||
for <pgman@candle.pha.pa.us>; Sat, 13 May 2000 22:49:24 -0400 (EDT)
|
|
||||||
Received: from hub.org (majordom@hub.org [216.126.84.1])
|
|
||||||
by news.tht.net (8.9.3/8.9.3) with ESMTP id WAB99859;
|
|
||||||
Sat, 13 May 2000 22:44:15 -0400 (EDT)
|
|
||||||
(envelope-from pgsql-hackers-owner+M1833@hub.org)
|
|
||||||
Received: from sss2.sss.pgh.pa.us (sss.pgh.pa.us [209.114.166.2])
|
|
||||||
by hub.org (8.9.3/8.9.3) with ESMTP id WAA51058
|
|
||||||
for <pgsql-hackers@postgreSQL.org>; Sat, 13 May 2000 22:41:16 -0400 (EDT)
|
|
||||||
(envelope-from tgl@sss.pgh.pa.us)
|
|
||||||
Received: from sss2.sss.pgh.pa.us (tgl@localhost [127.0.0.1])
|
|
||||||
by sss2.sss.pgh.pa.us (8.9.3/8.9.3) with ESMTP id WAA18343
|
|
||||||
for <pgsql-hackers@postgreSQL.org>; Sat, 13 May 2000 22:40:38 -0400 (EDT)
|
|
||||||
To: pgsql-hackers@postgresql.org
|
|
||||||
Subject: [HACKERS] Proposal for fixing numeric type-resolution issues
|
|
||||||
Date: Sat, 13 May 2000 22:40:38 -0400
|
|
||||||
Message-ID: <18340.958272038@sss.pgh.pa.us>
|
|
||||||
From: Tom Lane <tgl@sss.pgh.pa.us>
|
|
||||||
X-Mailing-List: pgsql-hackers@postgresql.org
|
|
||||||
Precedence: bulk
|
|
||||||
Sender: pgsql-hackers-owner@hub.org
|
|
||||||
Status: ORr
|
|
||||||
|
|
||||||
We've got a collection of problems that are related to the parser's
|
|
||||||
inability to make good type-resolution choices for numeric constants.
|
|
||||||
In some cases you get a hard error; for example "NumericVar + 4.4"
|
|
||||||
yields
|
|
||||||
ERROR: Unable to identify an operator '+' for types 'numeric' and 'float8'
|
|
||||||
You will have to retype this query using an explicit cast
|
|
||||||
because "4.4" is initially typed as float8 and the system can't figure
|
|
||||||
out whether to use numeric or float8 addition. A more subtle problem
|
|
||||||
is that a query like "... WHERE Int2Var < 42" is unable to make use of
|
|
||||||
an index on the int2 column: 42 is resolved as int4, so the operator
|
|
||||||
is int24lt, which works but is not in the opclass of an int2 index.
|
|
||||||
|
|
||||||
Here is a proposal for fixing these problems. I think we could get this
|
|
||||||
done for 7.1 if people like it.
|
|
||||||
|
|
||||||
The basic problem is that there's not enough smarts in the type resolver
|
|
||||||
about the interrelationships of the numeric datatypes. All it has is
|
|
||||||
a concept of a most-preferred type within the category of numeric types.
|
|
||||||
(We are abusing the most-preferred-type mechanism, BTW, because both
|
|
||||||
FLOAT8 and NUMERIC claim to be the most-preferred type in the numeric
|
|
||||||
category! This is in fact why the resolver can't make a choice for
|
|
||||||
"numeric+float8".) We need more intelligence than that.
|
|
||||||
|
|
||||||
I propose that we set up a strictly-ordered hierarchy of numeric
|
|
||||||
datatypes, running from least preferred to most preferred:
|
|
||||||
int2, int4, int8, numeric, float4, float8.
|
|
||||||
Rather than simply considering coercions to the most-preferred type,
|
|
||||||
the type resolver should use the following rules:
|
|
||||||
|
|
||||||
1. No value will be down-converted (eg int4 to int2) except by an
|
|
||||||
explicit conversion.
|
|
||||||
|
|
||||||
2. If there is not an exact matching operator, numeric values will be
|
|
||||||
up-converted to the highest numeric datatype present among the operator
|
|
||||||
or function's arguments. For example, given "int2 + int8" we'd up-
|
|
||||||
convert the int2 to int8 and apply int8 addition.
|
|
||||||
|
|
||||||
The final piece of the puzzle is that the type initially assigned to
|
|
||||||
an undecorated numeric constant should be NUMERIC if it contains a
|
|
||||||
decimal point or exponent, and otherwise the smallest of int2, int4,
|
|
||||||
int8, NUMERIC that will represent it. This is a considerable change
|
|
||||||
from the current lexer behavior, where you get either int4 or float8.
|
|
||||||
|
|
||||||
For example, given "NumericVar + 4.4", the constant 4.4 will initially
|
|
||||||
be assigned type NUMERIC, we will resolve the operator as numeric plus,
|
|
||||||
and everything's fine. Given "Float8Var + 4.4", the constant is still
|
|
||||||
initially numeric, but will be up-converted to float8 so that float8
|
|
||||||
addition can be used. The end result is the same as in traditional
|
|
||||||
Postgres: you get float8 addition. Given "Int2Var < 42", the constant
|
|
||||||
is initially typed as int2, since it fits, and we end up selecting
|
|
||||||
int2lt, thereby allowing use of an int2 index. (On the other hand,
|
|
||||||
given "Int2Var < 100000", we'd end up using int4lt, which is correct
|
|
||||||
to avoid overflow.)
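
The two rules and the literal-typing rule can be traced with a short sketch. It is an illustration only: `literal_type` and `resolve_operator` are hypothetical names, type names are plain strings, and the set of exactly matching operators is passed in by the caller rather than read from any catalog.

```python
# Hierarchy from the proposal, least preferred first.
HIERARCHY = ["int2", "int4", "int8", "numeric", "float4", "float8"]

INT_RANGES = [
    ("int2", -2**15, 2**15 - 1),
    ("int4", -2**31, 2**31 - 1),
    ("int8", -2**63, 2**63 - 1),
]


def literal_type(text):
    """Initial type of an undecorated numeric constant."""
    if any(ch in text for ch in ".eE"):
        return "numeric"             # decimal point or exponent
    value = int(text)
    for name, low, high in INT_RANGES:
        if low <= value <= high:
            return name              # smallest integer type that fits
    return "numeric"                 # too large even for int8


def resolve_operator(left, right, exact_operators=frozenset()):
    """Pick operand types for a binary operator per rules 1 and 2."""
    if (left, right) in exact_operators:
        return (left, right)         # exact match: no conversion needed
    highest = max(left, right, key=HIERARCHY.index)
    return (highest, highest)        # up-convert only, never down-convert


# The cases walked through in the text:
print(resolve_operator("numeric", literal_type("4.4")))    # ('numeric', 'numeric')
print(resolve_operator("float8", literal_type("4.4")))     # ('float8', 'float8')
print(resolve_operator("int2", literal_type("42")))        # ('int2', 'int2')
print(resolve_operator("int2", literal_type("100000")))    # ('int4', 'int4')
```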
|
|
||||||
|
|
||||||
A couple of crucial subtleties here:
|
|
||||||
|
|
||||||
1. We are assuming that the parser or optimizer will constant-fold
|
|
||||||
any conversion functions that are introduced. Thus, in the
|
|
||||||
"Float8Var + 4.4" case, the 4.4 is represented as a float8 4.4 by the
|
|
||||||
time execution begins, so there's no performance loss.
|
|
||||||
|
|
||||||
2. We cannot lose precision by initially representing a constant as
|
|
||||||
numeric and later converting it to float. Nor can we exceed NUMERIC's
|
|
||||||
range (the default 1000-digit limit is more than the range of IEEE
|
|
||||||
float8 data). It would not work as well to start out by representing
|
|
||||||
a constant as float and then converting it to numeric.
|
|
||||||
|
|
||||||
Presently, the pg_proc and pg_operator tables contain a pretty fair
|
|
||||||
collection of cross-datatype numeric operators, such as int24lt,
|
|
||||||
float48pl, etc. We could perhaps leave these in, but I believe that
|
|
||||||
it is better to remove them. For example, if int42lt is left in place,
|
|
||||||
then it would capture cases like "Int4Var < 42", whereas we need that
|
|
||||||
to be translated to int4lt so that an int4 index can be used. Removing
|
|
||||||
these operators will eliminate some code bloat and system-catalog bloat
|
|
||||||
to boot.
|
|
||||||
|
|
||||||
As far as I can tell, this proposal is almost compatible with the rules
|
|
||||||
given in SQL92: in particular, SQL92 specifies that an operator having
|
|
||||||
both "approximate numeric" (float) and "exact numeric" (int or numeric)
|
|
||||||
inputs should deliver an approximate-numeric result. I propose
|
|
||||||
deviating from SQL92 in a single respect: SQL92 specifies that a
|
|
||||||
constant containing an exponent (eg 1.2E34) is approximate numeric,
|
|
||||||
which implies that the result of an operator using it is approximate
|
|
||||||
even if the other operand is exact. I believe it's better to treat
|
|
||||||
such a constant as exact (ie, type NUMERIC) and only convert it to
|
|
||||||
float if the other operand is float. Without doing that, an assignment
|
|
||||||
like
|
|
||||||
UPDATE tab SET NumericVar = 1.234567890123456789012345E34;
|
|
||||||
will not work as desired because the constant will be prematurely
|
|
||||||
coerced to float, causing precision loss.
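
The precision loss is easy to demonstrate outside the database. In this sketch Python's `Decimal` stands in for NUMERIC and a Python `float` (an IEEE double) stands in for float8; that mapping is an assumption made for illustration, not a statement about how PostgreSQL stores either type.

```python
from decimal import Decimal

literal = "1.234567890123456789012345E34"

exact = Decimal(literal)               # constant kept as an exact value
via_float = Decimal(float(literal))    # constant coerced to a double first

# The exact path keeps all 25 significant digits of the literal.
assert int(exact) == 1234567890123456789012345 * 10**10

# The double cannot represent that value, so the round trip changes it.
assert via_float != exact
```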
|
|
||||||
|
|
||||||
Comments?
|
|
||||||
|
|
||||||
regards, tom lane
|
|
||||||
|
|
||||||
From tgl@sss.pgh.pa.us Sun May 14 17:30:56 2000
|
|
||||||
Received: from renoir.op.net (root@renoir.op.net [207.29.195.4])
|
|
||||||
by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id RAA05808
|
|
||||||
for <pgman@candle.pha.pa.us>; Sun, 14 May 2000 17:30:52 -0400 (EDT)
|
|
||||||
Received: from sss2.sss.pgh.pa.us (sss.pgh.pa.us [209.114.166.2]) by renoir.op.net (o1/$Revision: 1.4 $) with ESMTP id RAA16657 for <pgman@candle.pha.pa.us>; Sun, 14 May 2000 17:29:52 -0400 (EDT)
|
|
||||||
Received: from sss2.sss.pgh.pa.us (tgl@localhost [127.0.0.1])
|
|
||||||
by sss2.sss.pgh.pa.us (8.9.3/8.9.3) with ESMTP id RAA20914;
|
|
||||||
Sun, 14 May 2000 17:29:30 -0400 (EDT)
|
|
||||||
To: Bruce Momjian <pgman@candle.pha.pa.us>
|
|
||||||
cc: PostgreSQL-development <pgsql-hackers@postgreSQL.org>
|
|
||||||
Subject: Re: [HACKERS] type conversion discussion
|
|
||||||
In-reply-to: <200005141950.PAA04636@candle.pha.pa.us>
|
|
||||||
References: <200005141950.PAA04636@candle.pha.pa.us>
|
|
||||||
Comments: In-reply-to Bruce Momjian <pgman@candle.pha.pa.us>
|
|
||||||
message dated "Sun, 14 May 2000 15:50:20 -0400"
|
|
||||||
Date: Sun, 14 May 2000 17:29:30 -0400
|
|
||||||
Message-ID: <20911.958339770@sss.pgh.pa.us>
|
|
||||||
From: Tom Lane <tgl@sss.pgh.pa.us>
|
|
||||||
Status: OR
|
|
||||||
|
|
||||||
Bruce Momjian <pgman@candle.pha.pa.us> writes:
|
|
||||||
> At some point, it seems we need to get all the PostgreSQL minds together
|
|
||||||
> to discuss type conversion issues. These problems continue to come up
|
|
||||||
> from release to release. We are getting better, but it seems a full
|
|
||||||
> discussion could help solidify our strategy.
|
|
||||||
|
|
||||||
OK, here are a few things that bug me about the current type-resolution
|
|
||||||
code:
|
|
||||||
|
|
||||||
1. Poor choice of type to attribute to numeric literals. (A possible
|
|
||||||
solution is sketched in my earlier message, but do we need similar
|
|
||||||
mechanisms for other type categories?)
|
|
||||||
|
|
||||||
2. Tensions between treating string literals as "unknown" type and
|
|
||||||
as "text" type, per this thread so far.
|
|
||||||
|
|
||||||
3. IS_BINARY_COMPATIBLE seems like a bogus concept. Do we really want a
|
|
||||||
fully symmetrical ring of types in each group? I'd prefer to see a
|
|
||||||
one-way equivalence, which allows eg. OID to be silently converted
|
|
||||||
to INT4, but *not* vice versa (except perhaps by specific user cast).
|
|
||||||
This'd be more like a traditional "is-a" or inheritance relationship
|
|
||||||
between datatypes, which has well-understood semantics.
|
|
||||||
|
|
||||||
4. I'm also concerned that the behavior of IS_BINARY_COMPATIBLE isn't
|
|
||||||
very predictable because it will happily go either way. For example,
|
|
||||||
if I do
|
|
||||||
select * from pg_class where oid = 1234;
|
|
||||||
it's unclear whether I will get an oideq or an int4eq operator ---
|
|
||||||
and that's a rather critical point since only one of them can exploit
|
|
||||||
an index on the oid column. Currently, there is some klugery in the
|
|
||||||
planner that works around this by overriding the parser's choice of
|
|
||||||
operator to substitute one that is compatible with an available index.
|
|
||||||
That's a pretty ugly solution ... I'm not sure I know a better one,
|
|
||||||
but as long as we're discussing type resolution issues ...
|
|
||||||
|
|
||||||
5. Lack of extensibility. There's way too much knowledge hard-wired
|
|
||||||
into the parser about type categories, preferred types, binary
|
|
||||||
compatibility, etc. All of it falls down when faced with
|
|
||||||
user-defined datatypes. If we do something like I suggested with
|
|
||||||
a hardwired hierarchy of numeric datatypes, it'll get even worse.
|
|
||||||
All this stuff ought to be driven off fields in pg_type rather than
|
|
||||||
be hardwired into the code, so that the same concepts can be extended
|
|
||||||
to user-defined types.
|
|
||||||
|
|
||||||
I don't have worked-out proposals for any of these but the first,
|
|
||||||
but they've all been bothering me for a while.
|
|
||||||
|
|
||||||
regards, tom lane
|
|
||||||
|
|
||||||
From tgl@sss.pgh.pa.us Sun May 14 21:02:31 2000
|
|
||||||
Received: from sss2.sss.pgh.pa.us (sss.pgh.pa.us [209.114.166.2])
|
|
||||||
by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id VAA07700
|
|
||||||
for <pgman@candle.pha.pa.us>; Sun, 14 May 2000 21:02:28 -0400 (EDT)
|
|
||||||
Received: from sss2.sss.pgh.pa.us (tgl@localhost [127.0.0.1])
|
|
||||||
by sss2.sss.pgh.pa.us (8.9.3/8.9.3) with ESMTP id VAA21261;
|
|
||||||
Sun, 14 May 2000 21:03:17 -0400 (EDT)
|
|
||||||
To: Bruce Momjian <pgman@candle.pha.pa.us>
|
|
||||||
cc: PostgreSQL-development <pgsql-hackers@postgreSQL.org>
|
|
||||||
Subject: Re: [HACKERS] type conversion discussion
|
|
||||||
In-reply-to: <20911.958339770@sss.pgh.pa.us>
|
|
||||||
References: <200005141950.PAA04636@candle.pha.pa.us> <20911.958339770@sss.pgh.pa.us>
|
|
||||||
Comments: In-reply-to Tom Lane <tgl@sss.pgh.pa.us>
|
|
||||||
message dated "Sun, 14 May 2000 17:29:30 -0400"
|
|
||||||
Date: Sun, 14 May 2000 21:03:17 -0400
|
|
||||||
Message-ID: <21258.958352597@sss.pgh.pa.us>
|
|
||||||
From: Tom Lane <tgl@sss.pgh.pa.us>
|
|
||||||
Status: OR
|
|
||||||
|
|
||||||
Here are the results of some further thoughts about type-conversion
|
|
||||||
issues. This is not a complete proposal yet, but a sketch of an
|
|
||||||
approach that might solve several of the gripes in my previous proposal.
|
|
||||||
|
|
||||||
While thinking about this, I realized that my numeric-types proposal
|
|
||||||
of yesterday would break at least a few cases that work nicely now.
|
|
||||||
For example, I frequently do things like
|
|
||||||
select * from pg_class where oid = 1234;
|
|
||||||
whilst poking around in system tables and querytree dumps. If that
|
|
||||||
constant is initially resolved as int2, as I suggested yesterday,
|
|
||||||
then we have "oid = int2" for which there is no operator. To succeed
|
|
||||||
we must decide to promote the constant to int4 --- but with no int4
|
|
||||||
visible among the operands of the "=", it will not work to just "promote
|
|
||||||
numerics to the highest type seen in the operands" as I suggested
|
|
||||||
yesterday. So there has to be some more interaction in there.
|
|
Anyway, I was complaining about the looseness of the concept of
binary-compatible types and the fact that the parser's type conversion
knowledge is mostly hardwired. These might be resolved by generalizing
the numeric type hierarchy idea into a "type promotion lattice", which
would work like this:

* Add a "typpromote" column to pg_type, which contains either zero or
  the OID of another type that the parser is allowed to promote this
  type to when searching for usable functions/operators. For example,
  my numeric-types hierarchy of yesterday would be expressed by making
  int2 promote to int4, int4 to int8, int8 to numeric, numeric to
  float4, and float4 to float8. The promotion idea also replaces the
  current concept of binary-compatible types: for example, OID would
  link to int4 and varchar would link to text (but not vice versa!).

* Also add a "typpromotebin" boolean column to pg_type, which contains
  't' if the type conversion indicated by typpromote is "free", ie,
  no conversion function need be executed before regarding a value as
  belonging to the promoted type. This distinguishes binary-compatible
  from non-binary-compatible cases. If "typpromotebin" is 'f' and the
  parser decides it needs to apply the conversion, then it has to look
  up the appropriate conversion function in pg_proc. (More about this
  below.)

Now, if the parser fails to find an exact match for a given function
or operator name and the exact set of input data types, it proceeds by
chasing up the promotion chains for the input data types and trying to
locate a set of types for which there is a matching function/operator.
If there are multiple possibilities, we choose the one which is the
"least promoted" by some yet-to-be-determined metric. (This metric
would probably favor "free" conversions over non-free ones, but other
than that I'm not quite sure how it should work. The metric would
replace a whole bunch of ad-hoc heuristics that are currently applied
in the type resolver, so even if it seems rather ad-hoc it'd still be
cleaner than what we have ;-).)

In a situation like the "oid = int2" example above, this mechanism would
presumably settle on "int4 = int4" as being the least-promoted
equivalent operator. (It could not find "oid = oid" since there is
no promotion path from int2 to oid.) That looks bad since it isn't
compatible with an oidops index --- but I have a solution for that!
I don't think we need the oid opclass at all; why shouldn't indexes
on oid be expressed as int4 indexes to begin with? In general, if
two types are considered binary-equivalent under the old scheme, then
the one that is considered the subtype probably shouldn't have separate
index operators under this new scheme. Instead it should just rely on
the index operators of the promoted type.
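
[Illustration, not part of the original message.] A minimal sketch of the
lookup described above, assuming a hypothetical in-memory table in place of
the proposed typpromote and typpromotebin columns. The cost metric (fewest
non-free steps, then fewest total steps) is only one possible choice, since
the proposal leaves the metric open. The names TYPPROMOTE, promotion_chain,
and resolve are made up for this sketch and do not exist in PostgreSQL.

```python
# Hypothetical stand-in for the proposed pg_type columns:
# type -> (typpromote, typpromotebin). Not a real PostgreSQL structure.
TYPPROMOTE = {
    "int2": ("int4", False),
    "int4": ("int8", False),
    "int8": ("numeric", False),
    "numeric": ("float4", False),
    "float4": ("float8", False),
    "oid": ("int4", True),       # binary-compatible, so the step is "free"
    "varchar": ("text", True),
}


def promotion_chain(typ):
    """Return [(type, total_steps, nonfree_steps), ...], starting at typ itself."""
    chain = [(typ, 0, 0)]
    steps = nonfree = 0
    while typ in TYPPROMOTE:
        typ, free = TYPPROMOTE[typ]
        steps += 1
        nonfree += 0 if free else 1
        chain.append((typ, steps, nonfree))
    return chain


def resolve(operators, left, right):
    """Pick the least-promoted (left, right) signature present in operators.

    Assumed metric: compare (non-free steps, total steps) lexicographically.
    """
    best = None
    for ltype, lsteps, lnonfree in promotion_chain(left):
        for rtype, rsteps, rnonfree in promotion_chain(right):
            if (ltype, rtype) in operators:
                cost = (lnonfree + rnonfree, lsteps + rsteps)
                if best is None or cost < best[0]:
                    best = (cost, (ltype, rtype))
    return None if best is None else best[1]


# Same-type "=" operators only; an illustrative subset.
EQ_OPERATORS = {
    ("int2", "int2"), ("int4", "int4"), ("int8", "int8"),
    ("oid", "oid"), ("float8", "float8"),
}

print(resolve(EQ_OPERATORS, "oid", "int2"))  # ('int4', 'int4')
```

With this metric, "oid = int2" lands on "int4 = int4", matching the outcome
described in the text: "oid = oid" is unreachable because int2 has no
promotion path to oid.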

The point of the proposed typpromotebin field is to save a pg_proc
lookup when trying to determine whether a particular promotion is "free"
or not. We could save even more lookups if we didn't store the boolean
but instead the actual OID of the conversion function, or zero if the
promotion is "free". The trouble with that is that it creates a
circularity problem when trying to define a new user type --- you can't
define the conversion function if its input type doesn't exist yet.
In any case, we want the parser to do a function lookup if we've
advanced more than one step in the promotion hierarchy: if we've decided
to promote int4 to float8 (which will be a four-step chain through int8,
numeric, float4) we sure want the thing to use a direct int4tofloat8
conversion function if available, not a chain of four conversion
functions. So on balance I think we want to look in pg_proc once we've
decided which conversion to perform. The only reason for having
typpromotebin is that the promotion metric will want to know which
conversions are free, and we don't want to have to do a lookup in
pg_proc for each alternative we consider, only the ones that are finally
selected to be used.

I can think of at least one special case that still isn't cleanly
handled under this scheme, and that is bpchar vs. varchar comparison.
Currently, we have

regression=# select 'a'::bpchar = 'a '::bpchar;
 ?column?
----------
 t
(1 row)

This is correct since trailing blanks are insignificant in bpchar land,
so the two values should be considered equal. If we try

regression=# select 'a'::bpchar = 'a '::varchar;
ERROR: Unable to identify an operator '=' for types 'bpchar' and 'varchar'
        You will have to retype this query using an explicit cast

which is pretty bogus but at least it saves the system from making some
random choice about whether bpchar or varchar comparison rules apply.
On the other hand,

regression=# select 'a'::bpchar = 'a '::text;
 ?column?
----------
 f
(1 row)

Here the bpchar value has been promoted to text and then text comparison
(where trailing blanks *are* significant) is applied. I'm not sure that
we can really justify doing this in this case when we reject the bpchar
vs varchar case, but maybe someone wants to argue that that's correct.

The natural setup in my type-promotion scheme would be that both bpchar
and varchar link to 'text' as their promoted type. If we do nothing
special then text-style comparison would be used in a bpchar vs varchar
comparison, which is arguably wrong.

One way to deal with this without introducing kluges into the type
resolver is to provide a full set of bpchar vs text and text vs bpchar
operators, and make sure that the promotion metric is such that these
will be used in place of text vs text operators if they apply (which
should hold, I think, for any reasonable metric). This is probably
the only way to get the "right" behavior in any case --- I think that
the "right" behavior for such comparisons is to strip trailing blanks
from the bpchar side but not the text/varchar side. (I haven't checked
to see if SQL92 agrees, though.)
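
[Illustration, not part of the original message.] The three comparison
rules discussed above can be sketched as plain string functions. This is
only a model of the semantics under discussion, assuming values are already
available as Python strings; the function names are hypothetical and are not
PostgreSQL operator names.

```python
def bpchar_eq_bpchar(a: str, b: str) -> bool:
    """bpchar vs bpchar: trailing blanks are insignificant on both sides."""
    return a.rstrip(" ") == b.rstrip(" ")


def text_eq_text(a: str, b: str) -> bool:
    """text vs text: trailing blanks are significant."""
    return a == b


def bpchar_eq_text(bpchar_value: str, text_value: str) -> bool:
    """Suggested mixed rule: strip trailing blanks from the bpchar side only."""
    return bpchar_value.rstrip(" ") == text_value


print(bpchar_eq_bpchar("a", "a "))  # True
print(text_eq_text("a", "a "))      # False
print(bpchar_eq_text("a ", "a"))    # True: blanks on the bpchar side are ignored
print(bpchar_eq_text("a", "a "))    # False: blanks on the text side still count
```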

Another issue is how to fit resolution of "unknown" literals into this
scheme. We could probably continue to handle them more or less as we
do now, but they might complicate the promotion metric.

I am not clear yet on whether we'd still need the concept of "type
categories" as they presently exist in the resolver. It's possible
that we wouldn't, which would be a nice simplification. (If we do
still need them, we should have a column in pg_type that defines the
category of a type, instead of hard-wiring category assignments.)

regards, tom lane

From e99re41@DoCS.UU.SE Mon May 15 07:39:03 2000
Received: from meryl.it.uu.se (root@meryl.it.uu.se [130.238.12.42])
    by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id HAA10251
    for <pgman@candle.pha.pa.us>; Mon, 15 May 2000 07:39:01 -0400 (EDT)
Received: from Zebra.DoCS.UU.SE (e99re41@Zebra.DoCS.UU.SE [130.238.9.158])
    by meryl.it.uu.se (8.8.5/8.8.5) with ESMTP id NAA10849;
    Mon, 15 May 2000 13:39:45 +0200 (MET DST)
Received: from localhost (e99re41@localhost) by Zebra.DoCS.UU.SE (8.6.12/8.6.12) with ESMTP id NAA26523; Mon, 15 May 2000 13:39:44 +0200
X-Authentication-Warning: Zebra.DoCS.UU.SE: e99re41 owned process doing -bs
Date: Mon, 15 May 2000 13:39:44 +0200 (MET DST)
From: Peter Eisentraut <e99re41@DoCS.UU.SE>
Reply-To: Peter Eisentraut <peter_e@gmx.net>
To: Tom Lane <tgl@sss.pgh.pa.us>
cc: Bruce Momjian <pgman@candle.pha.pa.us>,
    PostgreSQL-development <pgsql-hackers@postgresql.org>
Subject: Re: [HACKERS] type conversion discussion
In-Reply-To: <20911.958339770@sss.pgh.pa.us>
Message-ID: <Pine.GSO.4.02A.10005151309020.26399-100000@Zebra.DoCS.UU.SE>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=iso-8859-1
Content-Transfer-Encoding: 8bit
X-MIME-Autoconverted: from QUOTED-PRINTABLE to 8bit by candle.pha.pa.us id HAA10251
Status: OR

On Sun, 14 May 2000, Tom Lane wrote:

> 1. Poor choice of type to attribute to numeric literals. (A possible
> solution is sketched in my earlier message, but do we need similar
> mechanisms for other type categories?)

I think your plan looks good for the numerical land. (I'll ponder the oid
issues in a second.) For other type categories, perhaps not. Should a line
be promoted to a polygon so you can check if it contains a point? Or a
polygon to a box? Higher dimensions? :-)


> 2. Tensions between treating string literals as "unknown" type and
> as "text" type, per this thread so far.

Yes, while we're at it, let's look at this in detail. I claim that
something of the form 'xxx' should always be text (or char or whatever),
period. Let's consider the cases where this could potentially clash with
the current behaviour:

a) The target type is unambiguously clear, e.g., UPDATE ... SET. Then you
cast text to the target type. The effect is identical.

b) The target type is completely unspecified, e.g. CREATE TABLE AS SELECT
'xxx'; This will currently create an "unknown" column. It should arguably
create a "text" column.

Function argument resolution:

c) There is only one function and it has a "text" argument. No-brainer.

d) There is only one function and it has an argument other than text. Try
to cast text to that type. (This is what's done in general, isn't it?)

e) The function is overloaded for many types, amongst which is text. Then
call the text version. I believe this would currently fail, which I'd
consider a deficiency.

f) The function is overloaded for many types, none of which is text. In
that case you have to cast anyway, so you don't lose anything.

One thing to also keep in mind regarding required casting for (b) and (f)
is that SQL never allowed literals of "fancy" types (e.g., DATE) to have
undecorated 'yyyy-mm-dd' constants, you always have to say DATE
'yyyy-mm-dd'. What Postgres allows is a convenience where DATE would be
obvious or implied. In the end it's a win-win situation: you tell the
system what you want, and your code is clearer.


> 3. IS_BINARY_COMPATIBLE seems like a bogus concept.

At least it's bogus when used for types which are not actually binary
compatible, e.g. int4 and oid. The result of the current implementation is
that you can perfectly happily insert and retrieve negative numbers from
oid fields.
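
[Illustration, not part of the original message.] A small sketch of why
int4 and oid are not interchangeable at the bit level, assuming an OID is an
unsigned 32-bit value while int4 is a signed 32-bit value. The helper name
int4_bits_as_oid is hypothetical; it only reinterprets the same four bytes.

```python
import struct


def int4_bits_as_oid(value: int) -> int:
    """Reinterpret the 4 bytes of a signed int4 as an unsigned 32-bit number."""
    return struct.unpack("<I", struct.pack("<i", value))[0]


print(int4_bits_as_oid(42))   # 42: non-negative values look the same either way
print(int4_bits_as_oid(-1))   # 4294967295: a negative int4 is a huge unsigned value
```

The two readings agree only for values from 0 to 2^31 - 1, which is why
treating the types as freely binary compatible lets negative numbers appear
in oid columns.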

I'm not so sure about the value of this particular equivalency anyway.
AFAICS the only functions that make sense for oids are comparisons (incl.
min, max), adding integers to them, subtracting one oid from another.
Silent mangling with int4 means that you can multiply them, square them,
add floating point numbers to them (doesn't really work in practice
though), all things that have no business with oids.

I'd say define the operators that are useful for oids explicitly for oids
and require casts for all others, so the users know what they're doing.
The fact that an oid is also a number should be an implementation detail.

In my mind oids are like pointers in C. Indiscriminate mangling of
pointers and integers in C has long been dismissed as questionable coding.


Of course I'd be very willing to consider counterexamples to these
theories ...

-- 
Peter Eisentraut                  Sernanders väg 10:115
peter_e@gmx.net                   75262 Uppsala
http://yi.org/peter-e/            Sweden

From tgl@sss.pgh.pa.us Tue Jun 13 04:58:20 2000
Received: from sss2.sss.pgh.pa.us (sss.pgh.pa.us [209.114.166.2])
    by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id DAA24281
    for <pgman@candle.pha.pa.us>; Tue, 13 Jun 2000 03:58:18 -0400 (EDT)
Received: from sss2.sss.pgh.pa.us (tgl@localhost [127.0.0.1])
    by sss2.sss.pgh.pa.us (8.9.3/8.9.3) with ESMTP id DAA02571;
    Tue, 13 Jun 2000 03:58:43 -0400 (EDT)
To: Bruce Momjian <pgman@candle.pha.pa.us>
cc: pgsql-hackers@postgresql.org
Subject: Re: [HACKERS] Proposal for fixing numeric type-resolution issues
In-reply-to: <200006130741.DAA23502@candle.pha.pa.us>
References: <200006130741.DAA23502@candle.pha.pa.us>
Comments: In-reply-to Bruce Momjian <pgman@candle.pha.pa.us>
    message dated "Tue, 13 Jun 2000 03:41:56 -0400"
Date: Tue, 13 Jun 2000 03:58:43 -0400
Message-ID: <2568.960883123@sss.pgh.pa.us>
From: Tom Lane <tgl@sss.pgh.pa.us>
Status: OR

Bruce Momjian <pgman@candle.pha.pa.us> writes:
> Again, anything to add to the TODO here?

IIRC, there was some unhappiness with the proposal you quote, so I'm
not sure we've quite agreed what to do... but clearly something must
be done.

regards, tom lane


>> We've got a collection of problems that are related to the parser's
>> inability to make good type-resolution choices for numeric constants.
>> In some cases you get a hard error; for example "NumericVar + 4.4"
>> yields
>> ERROR: Unable to identify an operator '+' for types 'numeric' and 'float8'
>> You will have to retype this query using an explicit cast
>> because "4.4" is initially typed as float8 and the system can't figure
>> out whether to use numeric or float8 addition. A more subtle problem
>> is that a query like "... WHERE Int2Var < 42" is unable to make use of
>> an index on the int2 column: 42 is resolved as int4, so the operator
>> is int24lt, which works but is not in the opclass of an int2 index.
>>
>> Here is a proposal for fixing these problems. I think we could get this
>> done for 7.1 if people like it.
>>
>> The basic problem is that there's not enough smarts in the type resolver
>> about the interrelationships of the numeric datatypes. All it has is
>> a concept of a most-preferred type within the category of numeric types.
>> (We are abusing the most-preferred-type mechanism, BTW, because both
>> FLOAT8 and NUMERIC claim to be the most-preferred type in the numeric
>> category! This is in fact why the resolver can't make a choice for
>> "numeric+float8".) We need more intelligence than that.
>>
>> I propose that we set up a strictly-ordered hierarchy of numeric
>> datatypes, running from least preferred to most preferred:
>> int2, int4, int8, numeric, float4, float8.
>> Rather than simply considering coercions to the most-preferred type,
>> the type resolver should use the following rules:
>>
>> 1. No value will be down-converted (eg int4 to int2) except by an
>> explicit conversion.
>>
>> 2. If there is not an exact matching operator, numeric values will be
>> up-converted to the highest numeric datatype present among the operator
>> or function's arguments. For example, given "int2 + int8" we'd up-
>> convert the int2 to int8 and apply int8 addition.
>>
>> The final piece of the puzzle is that the type initially assigned to
>> an undecorated numeric constant should be NUMERIC if it contains a
>> decimal point or exponent, and otherwise the smallest of int2, int4,
>> int8, NUMERIC that will represent it. This is a considerable change
>> from the current lexer behavior, where you get either int4 or float8.
>>
>> For example, given "NumericVar + 4.4", the constant 4.4 will initially
>> be assigned type NUMERIC, we will resolve the operator as numeric plus,
>> and everything's fine. Given "Float8Var + 4.4", the constant is still
>> initially numeric, but will be up-converted to float8 so that float8
>> addition can be used. The end result is the same as in traditional
>> Postgres: you get float8 addition. Given "Int2Var < 42", the constant
>> is initially typed as int2, since it fits, and we end up selecting
>> int2lt, thereby allowing use of an int2 index. (On the other hand,
>> given "Int2Var < 100000", we'd end up using int4lt, which is correct
>> to avoid overflow.)
>>
>> A couple of crucial subtleties here:
>>
>> 1. We are assuming that the parser or optimizer will constant-fold
>> any conversion functions that are introduced. Thus, in the
>> "Float8Var + 4.4" case, the 4.4 is represented as a float8 4.4 by the
>> time execution begins, so there's no performance loss.
>>
>> 2. We cannot lose precision by initially representing a constant as
>> numeric and later converting it to float. Nor can we exceed NUMERIC's
>> range (the default 1000-digit limit is more than the range of IEEE
>> float8 data). It would not work as well to start out by representing
>> a constant as float and then converting it to numeric.
>>
>> Presently, the pg_proc and pg_operator tables contain a pretty fair
>> collection of cross-datatype numeric operators, such as int24lt,
>> float48pl, etc. We could perhaps leave these in, but I believe that
>> it is better to remove them. For example, if int42lt is left in place,
>> then it would capture cases like "Int4Var < 42", whereas we need that
>> to be translated to int4lt so that an int4 index can be used. Removing
>> these operators will eliminate some code bloat and system-catalog bloat
>> to boot.
>>
>> As far as I can tell, this proposal is almost compatible with the rules
>> given in SQL92: in particular, SQL92 specifies that an operator having
>> both "approximate numeric" (float) and "exact numeric" (int or numeric)
>> inputs should deliver an approximate-numeric result. I propose
>> deviating from SQL92 in a single respect: SQL92 specifies that a
>> constant containing an exponent (eg 1.2E34) is approximate numeric,
>> which implies that the result of an operator using it is approximate
>> even if the other operand is exact. I believe it's better to treat
>> such a constant as exact (ie, type NUMERIC) and only convert it to
>> float if the other operand is float. Without doing that, an assignment
>> like
>> UPDATE tab SET NumericVar = 1.234567890123456789012345E34;
>> will not work as desired because the constant will be prematurely
>> coerced to float, causing precision loss.
>>
>> Comments?
>>
>> regards, tom lane
>>
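
[Illustration, not part of the original messages.] The quoted proposal's two
mechanical pieces, the initial typing of undecorated numeric constants and
the up-conversion rule, can be sketched as follows. This assumes only
same-type operators exist (the proposal suggests removing the cross-type
ones), and the names HIERARCHY, literal_type, and common_type are
hypothetical.

```python
# Least preferred to most preferred, as listed in the quoted proposal.
HIERARCHY = ["int2", "int4", "int8", "numeric", "float4", "float8"]

INT_RANGES = [
    ("int2", -2**15, 2**15 - 1),
    ("int4", -2**31, 2**31 - 1),
    ("int8", -2**63, 2**63 - 1),
]


def literal_type(token: str) -> str:
    """Initial type of an undecorated numeric constant under the proposal."""
    if "." in token or "e" in token.lower():
        return "numeric"          # decimal point or exponent
    value = int(token)
    for name, low, high in INT_RANGES:
        if low <= value <= high:
            return name           # smallest integer type that can hold it
    return "numeric"              # too large for int8


def common_type(left: str, right: str) -> str:
    """Up-convert to the highest type present; nothing is ever down-converted."""
    return max(left, right, key=HIERARCHY.index)


print(common_type("numeric", literal_type("4.4")))   # numeric
print(common_type("float8", literal_type("4.4")))    # float8
print(common_type("int2", literal_type("42")))       # int2
print(common_type("int2", literal_type("100000")))   # int4
```

The four calls mirror the quoted examples "NumericVar + 4.4",
"Float8Var + 4.4", "Int2Var < 42", and "Int2Var < 100000".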


> -- 
>   Bruce Momjian                        |  http://www.op.net/~candle
>   pgman@candle.pha.pa.us               |  (610) 853-3000
>   +  If your life is a hard drive,     |  830 Blythe Avenue
>   +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026

From tgl@sss.pgh.pa.us Mon Jun 12 14:09:45 2000
Received: from sss2.sss.pgh.pa.us (sss.pgh.pa.us [209.114.166.2])
    by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id NAA01993
    for <pgman@candle.pha.pa.us>; Mon, 12 Jun 2000 13:09:43 -0400 (EDT)
Received: from sss2.sss.pgh.pa.us (tgl@localhost [127.0.0.1])
    by sss2.sss.pgh.pa.us (8.9.3/8.9.3) with ESMTP id NAA01515;
    Mon, 12 Jun 2000 13:10:01 -0400 (EDT)
To: Peter Eisentraut <peter_e@gmx.net>
cc: Bruce Momjian <pgman@candle.pha.pa.us>,
    "Thomas G. Lockhart" <lockhart@alumni.caltech.edu>,
    PostgreSQL-development <pgsql-hackers@postgresql.org>
Subject: Re: [HACKERS] Adding time to DATE type
In-reply-to: <Pine.LNX.4.21.0006110322150.9195-100000@localhost.localdomain>
References: <Pine.LNX.4.21.0006110322150.9195-100000@localhost.localdomain>
Comments: In-reply-to Peter Eisentraut <peter_e@gmx.net>
    message dated "Sun, 11 Jun 2000 13:41:24 +0200"
Date: Mon, 12 Jun 2000 13:10:00 -0400
Message-ID: <1512.960829800@sss.pgh.pa.us>
From: Tom Lane <tgl@sss.pgh.pa.us>
Status: ORr

Peter Eisentraut <peter_e@gmx.net> writes:
> Bruce Momjian writes:
>> Can someone give me a TODO summary for this issue?

> * make 'text' constants default to text type (not unknown)

> (I think not everyone's completely convinced on this issue, but I don't
> recall anyone being firmly opposed to it.)

It would be a mistake to eliminate the distinction between unknown and
text. See for example my just-posted response to John Cochran on
pgsql-general about why 'BOULEVARD'::text behaves differently from
'BOULEVARD'::char. If string literals are immediately assigned type
text then we will have serious problems with char(n) fields.

I think it's fine to assign string literals a type of 'unknown'
initially. What we need to do is add a phase of type resolution that
considers treating them as text, but only after the existing logic fails
to deduce a type.

(BTW it might be better to treat string literals as defaulting to char(n)
instead of text, allowing the normal promotion rules to replace char(n)
with text if necessary. Not sure if that would make things more or less
confusing for operations that intermix fixed- and variable-width char
types.)

regards, tom lane

From pgsql-hackers-owner+M1936@postgresql.org Sun Dec 10 13:17:54 2000
Received: from mail.postgresql.org (webmail.postgresql.org [216.126.85.28])
    by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id NAA20676
    for <pgman@candle.pha.pa.us>; Sun, 10 Dec 2000 13:17:54 -0500 (EST)
Received: from mail.postgresql.org (webmail.postgresql.org [216.126.85.28])
    by mail.postgresql.org (8.11.1/8.11.1) with SMTP id eBAIGvZ40566;
    Sun, 10 Dec 2000 13:16:57 -0500 (EST)
    (envelope-from pgsql-hackers-owner+M1936@postgresql.org)
Received: from sss.pgh.pa.us (sss.pgh.pa.us [209.114.132.154])
    by mail.postgresql.org (8.11.1/8.11.1) with ESMTP id eBAI8HZ39820
    for <pgsql-hackers@postgreSQL.org>; Sun, 10 Dec 2000 13:08:17 -0500 (EST)
    (envelope-from tgl@sss.pgh.pa.us)
Received: from sss2.sss.pgh.pa.us (tgl@localhost [127.0.0.1])
    by sss.pgh.pa.us (8.11.1/8.11.1) with ESMTP id eBAI82o28682;
    Sun, 10 Dec 2000 13:08:02 -0500 (EST)
To: Thomas Lockhart <lockhart@alumni.caltech.edu>
cc: pgsql-hackers@postgresql.org
Subject: [HACKERS] Unknown-type resolution rules, redux
Date: Sun, 10 Dec 2000 13:08:02 -0500
Message-ID: <28679.976471682@sss.pgh.pa.us>
From: Tom Lane <tgl@sss.pgh.pa.us>
Precedence: bulk
Sender: pgsql-hackers-owner@postgresql.org
Status: OR

parse_coerce.c contains the following conversation --- I believe the
first XXX comment is from me and the second from you:

    /*
     * Still too many candidates? Try assigning types for the unknown
     * columns.
     *
     * We do this by examining each unknown argument position to see if all
     * the candidates agree on the type category of that slot. If so, and
     * if some candidates accept the preferred type in that category,
     * eliminate the candidates with other input types. If we are down to
     * one candidate at the end, we win.
     *
     * XXX It's kinda bogus to do this left-to-right, isn't it? If we
     * eliminate some candidates because they are non-preferred at the
     * first slot, we won't notice that they didn't have the same type
     * category for a later slot.
     * XXX Hmm. How else would you do this? These candidates are here because
     * they all have the same number of matches on arguments with explicit
     * types, so from here on left-to-right resolution is as good as any.
     * Need a counterexample to see otherwise...
     */

The comment is out of date anyway because it fails to mention the new
rule about preferring STRING category. But to answer your request for
a counterexample: consider

    SELECT foo('bar', 'baz')

First, suppose the available candidates are

    foo(float8, int4)
    foo(float8, point)

In this case, we examine the first argument position, see that all the
candidates agree on NUMERIC category, so we consider resolving the first
unknown input to float8. That eliminates neither candidate so we move
on to the second argument position. Here there is a conflict of
categories so we can't eliminate anything, and we decide the call is
ambiguous. That's correct (or at least Operating As Designed ;-)).

But now suppose we have

    foo(float8, int4)
    foo(float4, point)

Here, at the first position we will still see that all candidates agree
on NUMERIC category, and then we will eliminate candidate 2 because it
isn't the preferred type in that category. Now when we come to the
second argument position, there's only one candidate left so there's
no category conflict. Result: this call is considered non-ambiguous.

This means there is a left-to-right bias in the algorithm. For example,
the exact same call *would* be considered ambiguous if the candidates'
argument orders were reversed:

    foo(int4, float8)
    foo(point, float4)

I do not like that. You could maybe argue that earlier arguments are
more important than later ones for functions, but it's harder to make
that case for binary operators --- and in any case this behavior is
extremely difficult to explain in prose.
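
[Illustration, not part of the original message.] The left-to-right bias can
be reproduced with a simplified model of the loop described in the quoted
comment. This is a sketch, not the parse_coerce.c code: it assumes every
argument is an unknown literal, uses a made-up category table with float8 as
the only preferred type, and treats a category conflict at any slot as an
immediate "ambiguous" result. All names are hypothetical.

```python
# Simplified, assumed category table; the real catalog is much larger.
CATEGORY = {
    "int4": "NUMERIC", "float4": "NUMERIC", "float8": "NUMERIC",
    "point": "GEOMETRIC",
}
PREFERRED = {"float8"}  # assumed single preferred type for this sketch


def resolve_left_to_right(candidates):
    """Single pass that eliminates candidates slot by slot, left to right."""
    remaining = list(candidates)
    for slot in range(len(remaining[0])):
        categories = {CATEGORY[c[slot]] for c in remaining}
        if len(categories) != 1:
            return None  # category conflict: the call is reported as ambiguous
        preferred = [c for c in remaining if c[slot] in PREFERRED]
        if preferred:
            remaining = preferred  # drop candidates with non-preferred types
    return remaining[0] if len(remaining) == 1 else None


print(resolve_left_to_right([("float8", "int4"), ("float8", "point")]))  # None
print(resolve_left_to_right([("float8", "int4"), ("float4", "point")]))
# ('float8', 'int4')
print(resolve_left_to_right([("int4", "float8"), ("point", "float4")]))  # None
```

The second and third calls use the same candidates with argument order
reversed, yet only one of them resolves.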

To fix this, I think we need to split the loop into two passes.
The first pass does *not* remove any candidates. What it does is to
look separately at each UNKNOWN-argument position and attempt to deduce
a probable category for it, using the following rules:

* If any candidate has an input type of STRING category, use STRING
  category; else if all candidates agree on the category, use that
  category; else fail because no resolution can be made.

* The first pass must also remember whether any candidates are of a
  preferred type within the selected category.

The probable categories and exists-preferred-type booleans are saved in
local arrays. (Note this has to be done this way because
IsPreferredType currently allows more than one type to be considered
preferred in a category ... so the first pass cannot try to determine a
unique type, only a category.)

If we find a category for every UNKNOWN arg, then we enter a second loop
in which we discard candidates. In this pass we discard a candidate if
(a) it is of the wrong category, or (b) it is of the right category but
is not of preferred type in that category, *and* we found candidate(s)
of preferred type at this slot.

If we end with exactly one candidate then we win.

It is clear in this algorithm that there is no order dependency: the
conditions for keeping or discarding a candidate are fixed before we
start the second pass, and do not vary depending on which other
candidates were discarded before it.
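
[Illustration, not part of the original message.] The two-pass scheme above
can be sketched as follows, again assuming every argument is an unknown
literal and using a small made-up category table. CATEGORY, PREFERRED, and
resolve_two_pass are hypothetical names, not PostgreSQL identifiers.

```python
# Simplified, assumed category and preferred-type tables.
CATEGORY = {
    "int4": "NUMERIC", "float4": "NUMERIC", "float8": "NUMERIC",
    "point": "GEOMETRIC", "text": "STRING", "varchar": "STRING",
}
PREFERRED = {"float8", "text"}


def resolve_two_pass(candidates):
    nargs = len(candidates[0])
    slot_category = []
    slot_has_preferred = []

    # Pass 1: deduce a category per slot; no candidate is removed here.
    for slot in range(nargs):
        categories = {CATEGORY[c[slot]] for c in candidates}
        if "STRING" in categories:
            chosen = "STRING"
        elif len(categories) == 1:
            chosen = next(iter(categories))
        else:
            return None  # no resolution can be made
        slot_category.append(chosen)
        slot_has_preferred.append(any(
            CATEGORY[c[slot]] == chosen and c[slot] in PREFERRED
            for c in candidates
        ))

    # Pass 2: keep/discard conditions are already fixed, so order is irrelevant.
    def keep(candidate):
        for slot in range(nargs):
            if CATEGORY[candidate[slot]] != slot_category[slot]:
                return False  # (a) wrong category
            if slot_has_preferred[slot] and candidate[slot] not in PREFERRED:
                return False  # (b) non-preferred while a preferred one exists
        return True

    survivors = [c for c in candidates if keep(c)]
    return survivors[0] if len(survivors) == 1 else None


print(resolve_two_pass([("float8", "int4"), ("float4", "point")]))  # None
print(resolve_two_pass([("int4", "float8"), ("point", "float4")]))  # None
print(resolve_two_pass([("float8", "int4"), ("float4", "int4")]))
# ('float8', 'int4')
```

Under this model the pair of candidates from the counterexample is ambiguous
in both argument orders, so the left-to-right dependence is gone.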

Comments?

regards, tom lane

From pgsql-general-owner+M18949=candle.pha.pa.us=pgman@postgresql.org Sat Dec 29 15:47:47 2001
Return-path: <pgsql-general-owner+M18949=candle.pha.pa.us=pgman@postgresql.org>
Received: from rs.postgresql.org (server1.pgsql.org [64.39.15.238] (may be forged))
    by candle.pha.pa.us (8.11.6/8.10.1) with ESMTP id fBTKlkT05111
    for <pgman@candle.pha.pa.us>; Sat, 29 Dec 2001 15:47:46 -0500 (EST)
Received: from postgresql.org (postgresql.org [64.49.215.8])
    by rs.postgresql.org (8.11.6/8.11.6) with ESMTP id fBTKhZN74322
    for <pgman@candle.pha.pa.us>; Sat, 29 Dec 2001 14:43:35 -0600 (CST)
    (envelope-from pgsql-general-owner+M18949=candle.pha.pa.us=pgman@postgresql.org)
Received: from candle.pha.pa.us (216-55-132-35.dsl.san-diego.abac.net [216.55.132.35])
    by postgresql.org (8.11.3/8.11.4) with ESMTP id fBTKaem38452
    for <pgsql-general@postgresql.org>; Sat, 29 Dec 2001 15:36:40 -0500 (EST)
    (envelope-from pgman@candle.pha.pa.us)
Received: (from pgman@localhost)
    by candle.pha.pa.us (8.11.6/8.10.1) id fBTKaTg04256;
    Sat, 29 Dec 2001 15:36:29 -0500 (EST)
From: Bruce Momjian <pgman@candle.pha.pa.us>
Message-ID: <200112292036.fBTKaTg04256@candle.pha.pa.us>
Subject: Re: [GENERAL] Casting Varchar to Numeric
In-Reply-To: <20011206150158.O28880-100000@megazone23.bigpanda.com>
To: Stephan Szabo <sszabo@megazone23.bigpanda.com>
Date: Sat, 29 Dec 2001 15:36:29 -0500 (EST)
cc: Andy Marden <amarden@usa.net>, pgsql-general@postgresql.org
X-Mailer: ELM [version 2.4ME+ PL96 (25)]
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
Content-Type: text/plain; charset=US-ASCII
Precedence: bulk
Sender: pgsql-general-owner@postgresql.org
Status: OR

> On Mon, 3 Dec 2001, Andy Marden wrote:
>
> > Martijn,
> >
> > It does work (believe it or not). I've now tried the method you mention
> > below - that also works and is much nicer. I can't believe that PostgreSQL
> > can't work this out. Surely implementing an algorithm that understands that
> > if you can go from a ->b and b->c then you can certainly go from a->c. If
>
> It's more complicated than that (and postgres does some of this but not
> all), for example the cast text->float8->numeric potentially loses
> precision and should probably not be an automatic cast for that reason.
>
> > this is viewed as too complex a task for the internals - at least a diagram
> > or some way of understanding how you should go from a->c would be immensely
> > helpful wouldn't it! Daunting for anyone picking up the database and trying
> > to do something simple(!)
>
> There may be a need for documentation on this. Would you like to write
> some ;)

OK, I ran some tests:

    test=> create table test (x text);
    CREATE
    test=> insert into test values ('323');
    INSERT 5122745 1
    test=> select cast (x as numeric) from test;
    ERROR: Cannot cast type 'text' to 'numeric'

I can see problems with automatically casting numeric to text because
you have to guess the desired format, but going from text to numeric
seems quite easy to do. Is there a reason we don't do it?

I can cast to integer and float8 fine:

    test=> select cast ( x as integer) from test;
     ?column?
    ----------
          323
    (1 row)

    test=> select cast ( x as float8) from test;
     ?column?
    ----------
          323
    (1 row)
|
|
||||||
|
|
||||||
--
|
|
||||||
Bruce Momjian | http://candle.pha.pa.us
|
|
||||||
pgman@candle.pha.pa.us | (610) 853-3000
|
|
||||||
+ If your life is a hard drive, | 830 Blythe Avenue
|
|
||||||
+ Christ can be your backup. | Drexel Hill, Pennsylvania 19026
|
|
||||||
|
|
||||||
---------------------------(end of broadcast)---------------------------
|
|
||||||
TIP 2: you can get off all lists at once with the unregister command
|
|
||||||
(send "unregister YourEmailAddressHere" to majordomo@postgresql.org)
|
|
||||||
|
|
||||||
From pgsql-general-owner+M18951=candle.pha.pa.us=pgman@postgresql.org Sat Dec 29 19:10:38 2001
|
|
||||||
Return-path: <pgsql-general-owner+M18951=candle.pha.pa.us=pgman@postgresql.org>
|
|
||||||
Received: from west.navpoint.com (west.navpoint.com [207.106.42.13])
|
|
||||||
by candle.pha.pa.us (8.11.6/8.10.1) with ESMTP id fBU0AbT23972
|
|
||||||
for <pgman@candle.pha.pa.us>; Sat, 29 Dec 2001 19:10:37 -0500 (EST)
|
|
||||||
Received: from rs.postgresql.org (server1.pgsql.org [64.39.15.238] (may be forged))
|
|
||||||
by west.navpoint.com (8.11.6/8.10.1) with ESMTP id fBTNVj008959
|
|
||||||
for <pgman@candle.pha.pa.us>; Sat, 29 Dec 2001 18:31:45 -0500 (EST)
|
|
||||||
Received: from postgresql.org (postgresql.org [64.49.215.8])
|
|
||||||
by rs.postgresql.org (8.11.6/8.11.6) with ESMTP id fBTNQrN78655
|
|
||||||
for <pgman@candle.pha.pa.us>; Sat, 29 Dec 2001 17:26:53 -0600 (CST)
|
|
||||||
(envelope-from pgsql-general-owner+M18951=candle.pha.pa.us=pgman@postgresql.org)
|
|
||||||
Received: from sss.pgh.pa.us ([192.204.191.242])
|
|
||||||
by postgresql.org (8.11.3/8.11.4) with ESMTP id fBTN8Fm47978
|
|
||||||
for <pgsql-general@postgresql.org>; Sat, 29 Dec 2001 18:08:15 -0500 (EST)
|
|
||||||
(envelope-from tgl@sss.pgh.pa.us)
|
|
||||||
Received: from sss2.sss.pgh.pa.us (tgl@localhost [127.0.0.1])
|
|
||||||
by sss.pgh.pa.us (8.11.4/8.11.4) with ESMTP id fBTN7vg20245;
|
|
||||||
Sat, 29 Dec 2001 18:07:57 -0500 (EST)
|
|
||||||
To: Bruce Momjian <pgman@candle.pha.pa.us>
|
|
||||||
cc: Stephan Szabo <sszabo@megazone23.bigpanda.com>,
|
|
||||||
Andy Marden <amarden@usa.net>, pgsql-general@postgresql.org
|
|
||||||
Subject: Re: [GENERAL] Casting Varchar to Numeric
|
|
||||||
In-Reply-To: <200112292036.fBTKaTg04256@candle.pha.pa.us>
|
|
||||||
References: <200112292036.fBTKaTg04256@candle.pha.pa.us>
|
|
||||||
Comments: In-reply-to Bruce Momjian <pgman@candle.pha.pa.us>
|
|
||||||
message dated "Sat, 29 Dec 2001 15:36:29 -0500"
|
|
||||||
Date: Sat, 29 Dec 2001 18:07:57 -0500
|
|
||||||
Message-ID: <20242.1009667277@sss.pgh.pa.us>
|
|
||||||
From: Tom Lane <tgl@sss.pgh.pa.us>
|
|
||||||
Precedence: bulk
|
|
||||||
Sender: pgsql-general-owner@postgresql.org
|
|
||||||
Status: OR
|
|
||||||
|
|
||||||
Bruce Momjian <pgman@candle.pha.pa.us> writes:
|
|
||||||
> I can see problems with automatically casting numeric to text because
|
|
||||||
> you have to guess the desired format, but going from text to numeric
|
|
||||||
> seems quite easy to do. Is there a reason we don't do it?
|
|
||||||
|
|
||||||
I do not think it's a good idea to have implicit casts between text and
|
|
||||||
everything under the sun, because that essentially destroys the type
|
|
||||||
checking system. What we need (see previous discussion) is a flag in
|
|
||||||
pg_proc that says whether a type conversion function may be invoked
|
|
||||||
implicitly or not. I've got no problem with offering text(numeric) and
|
|
||||||
numeric(text) functions that are invoked by explicit function calls or
|
|
||||||
casts --- I just don't want the system trying to use them to make
|
|
||||||
sense of a bogus query.
|
|
||||||
|
|
||||||
> I can cast to integer and float8 fine:
|
|
||||||
|
|
||||||
I don't believe that those should be available as implicit casts either.
|
|
||||||
They are, at the moment:
|
|
||||||
|
|
||||||
regression=# select 33 || 44.0;
|
|
||||||
?column?
|
|
||||||
----------
|
|
||||||
3344
|
|
||||||
(1 row)
|
|
||||||
|
|
||||||
Ugh.
|
|
||||||
|
|
||||||
regards, tom lane
|
|
||||||
|
|
||||||
---------------------------(end of broadcast)---------------------------
|
|
||||||
TIP 6: Have you searched our list archives?
|
|
||||||
|
|
||||||
http://archives.postgresql.org
|
|
||||||
|
|
File diff suppressed because it is too large
@ -1,402 +0,0 @@
|
|||||||
From selkovjr@mcs.anl.gov Sat Jul 25 05:31:05 1998
|
|
||||||
Received: from renoir.op.net (root@renoir.op.net [209.152.193.4])
|
|
||||||
by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id FAA16564
|
|
||||||
for <maillist@candle.pha.pa.us>; Sat, 25 Jul 1998 05:31:03 -0400 (EDT)
|
|
||||||
Received: from antares.mcs.anl.gov (mcs.anl.gov [140.221.9.6]) by renoir.op.net (o1/$ Revision: 1.18 $) with SMTP id FAA01775 for <maillist@candle.pha.pa.us>; Sat, 25 Jul 1998 05:28:22 -0400 (EDT)
|
|
||||||
Received: from mcs.anl.gov (wit.mcs.anl.gov [140.221.5.148]) by antares.mcs.anl.gov (8.6.10/8.6.10) with ESMTP
|
|
||||||
id EAA28698 for <maillist@candle.pha.pa.us>; Sat, 25 Jul 1998 04:27:05 -0500
|
|
||||||
Sender: selkovjr@mcs.anl.gov
|
|
||||||
Message-ID: <35B9968D.21CF60A2@mcs.anl.gov>
|
|
||||||
Date: Sat, 25 Jul 1998 08:25:49 +0000
|
|
||||||
From: "Gene Selkov, Jr." <selkovjr@mcs.anl.gov>
|
|
||||||
Organization: MCS, Argonne Natl. Lab
|
|
||||||
X-Mailer: Mozilla 4.03 [en] (X11; I; Linux 2.0.32 i586)
|
|
||||||
MIME-Version: 1.0
|
|
||||||
To: Bruce Momjian <maillist@candle.pha.pa.us>
|
|
||||||
Subject: position-aware scanners
|
|
||||||
References: <199807250524.BAA07296@candle.pha.pa.us>
|
|
||||||
Content-Type: text/plain; charset=us-ascii
|
|
||||||
Content-Transfer-Encoding: 7bit
|
|
||||||
Status: RO
|
|
||||||
|
|
||||||
Bruce,
|
|
||||||
|
|
||||||
I attached here (through the web links) a couple of examples, totally
|
|
||||||
irrelevant to postgres but good enough to discuss token locations. I
|
|
||||||
might as well try to patch the backend parser, though not sure how soon.
|
|
||||||
|
|
||||||
|
|
||||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
|
||||||
1.
|
|
||||||
|
|
||||||
The first c parser I wrote,
|
|
||||||
http://wit.mcs.anl.gov/~selkovjr/unit-troff.tgz, is not very
|
|
||||||
sophisticated, so token locations reported by yyerr() may be slightly
|
|
||||||
incorrect (+/- one position depending on the existence and type of the
|
|
||||||
lookahead token). It is a filter used to typeset the units of measurement
|
|
||||||
with eqn. To use it, unpack the tar file and run make. The Makefile is
|
|
||||||
not too generic but I built it on various systems including linux,
|
|
||||||
freebsd and sunos 4.3. The invocation can be something like this:
|
|
||||||
|
|
||||||
./check 0 parse "l**3/(mmoll*min)"
|
|
||||||
parse error, expecting `BASIC_UNIT' or `INTEGER' or `POSITIVE_NUMBER' or
|
|
||||||
`'(''
|
|
||||||
|
|
||||||
l**3/(mmoll*min)
|
|
||||||
^^^^^
|
|
||||||
|
|
||||||
Now to the guts. As far as I can imagine, the only way to consistently
|
|
||||||
keep track of each character read by the scanner (regardless of the
|
|
||||||
length of expressions it will match) is to redefine its YY_INPUT like
|
|
||||||
this:
|
|
||||||
|
|
||||||
#undef YY_INPUT
|
|
||||||
#define YY_INPUT(buf,result,max_size) \
|
|
||||||
{ \
|
|
||||||
int c = (int) buffer[pos++]; \
|
|
||||||
result = (c == '\0') ? YY_NULL : (buf[0] = c, 1); \
|
|
||||||
}
|
|
||||||
|
|
||||||
Here, buffer is the pointer to the origin of the string being scanned
|
|
||||||
and pos is a global variable, similar in usage to a file pointer (you
|
|
||||||
can both read and manipulate it at will). The buffer and the pointer are
|
|
||||||
initialized by the function
|
|
||||||
|
|
||||||
void setString(char *s)
|
|
||||||
{
|
|
||||||
buffer = s;
|
|
||||||
pos = 0;
|
|
||||||
}
|
|
||||||
|
|
||||||
each time the new string is to be parsed. This (exportable) function is
|
|
||||||
part of the interface.
|
|
||||||
|
|
||||||
In this simplistic design, yyerror() is part of the scanner module and
|
|
||||||
it uses the pos variable to report the location of unexpected tokens.
|
|
||||||
The downside of such arrangement is that in case of error condition, you
|
|
||||||
can't easily tell whether your context is current or lookahead token, it
|
|
||||||
just reports the position of the last token read (be it $ (end of
|
|
||||||
buffer) or something else):
|
|
||||||
|
|
||||||
./check 0 convert "mol/foo"
|
|
||||||
parse error, expecting `BASIC_UNIT' or `INTEGER' or `POSITIVE_NUMBER' or
|
|
||||||
`'(''
|
|
||||||
|
|
||||||
mol/foo
|
|
||||||
^^^
|
|
||||||
|
|
||||||
(should be at the beginning of "foo")
|
|
||||||
|
|
||||||
./check 0 convert "mmol//l"
|
|
||||||
parse error, expecting `BASIC_UNIT' or `INTEGER' or `POSITIVE_NUMBER' or
|
|
||||||
`'(''
|
|
||||||
|
|
||||||
mmol//l
|
|
||||||
^
|
|
||||||
|
|
||||||
(should be at the second '/')
|
|
||||||
|
|
||||||
|
|
||||||
I believe this is why most simple parsers made with yacc would report
|
|
||||||
parse errors being "at or near" some token, which is fair enough if the
|
|
||||||
expression is not too complex.
|
|
||||||
|
|
||||||
|
|
||||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
|
||||||
2. The second version of the same scanner,
|
|
||||||
http://wit.mcs.anl.gov/~selkovjr/scanner-example.tgz, addresses this
|
|
||||||
problem by recording exact locations of the tokens in each instance of
|
|
||||||
the token semantic data structure. The global,
|
|
||||||
|
|
||||||
UNIT_YYSTYPE unit_yylval;
|
|
||||||
|
|
||||||
would be normally used to export the token semantics (including its
|
|
||||||
original or modified text and location data) to the parser.
|
|
||||||
Unfortunately, I cannot show you the parser part in c, because that's
|
|
||||||
about when I stopped writing parsers in c. Instead, I included a small
|
|
||||||
test program, test.c, that mimics the parser's expectations for the
|
|
||||||
scanner data pretty well. I am assuming here that you are not interested
|
|
||||||
in digging someone else's ugly guts for relatively small bit of
|
|
||||||
information; let me know if I am wrong and I will send you the complete
|
|
||||||
perl code (also generated with bison).
|
|
||||||
|
|
||||||
To run this example, unpack the tar file and run Make. Then do
|
|
||||||
|
|
||||||
gcc test.c scanner.o
|
|
||||||
|
|
||||||
and run a.out
|
|
||||||
|
|
||||||
Note the line
|
|
||||||
|
|
||||||
yylval = unit_getyylval();
|
|
||||||
|
|
||||||
in test.c. You will not normally need it in a c parser. It is enough to
|
|
||||||
define yylval as an external variable and link it to yylval in yylex()
|
|
||||||
|
|
||||||
In the bison-generated parser, yylval gets pushed into a stack (pointed
|
|
||||||
to by yylsp) each time a new token is read. For each syntax rule, the
|
|
||||||
bison macros @1, @2, ... are just shortcuts to locations in the stack 1,
|
|
||||||
2, ... levels deep. In the following code fragment, @3 refers to the
|
|
||||||
location info for the third term in the rule (INTEGER):
|
|
||||||
|
|
||||||
(sorry about perl, but I think you can do the same things in c without
|
|
||||||
significant changes to your existing parser)
|
|
||||||
|
|
||||||
term: base {
|
|
||||||
$$ = $1;
|
|
||||||
$$->{'order'} = 1;
|
|
||||||
}
|
|
||||||
| base EXP INTEGER {
|
|
||||||
$$ = $1;
|
|
||||||
$$->{'order'} = @3->{'text'};
|
|
||||||
$$->{'scale'} = $$->{'scale'} ** $$->{'order'};
|
|
||||||
if ( $$->{'order'} == 0 ) {
|
|
||||||
yyerror("Error: expecting a non-zero
|
|
||||||
integer exponent");
|
|
||||||
YYERROR;
|
|
||||||
}
|
|
||||||
}
|
|
||||||
|
|
||||||
|
|
||||||
which translates to:
|
|
||||||
|
|
||||||
($yyn == 10) && do {
|
|
||||||
$yyval = $yyvsa[-1];
|
|
||||||
$yyval->{'order'} = 1;
|
|
||||||
last SWITCH;
|
|
||||||
};
|
|
||||||
|
|
||||||
($yyn == 11) && do {
|
|
||||||
$yyval = $yyvsa[-3];
|
|
||||||
$yyval->{'order'} = $yylsa[-1]->{'text'};
|
|
||||||
$yyval->{'scale'} = $yyval->{'scale'} ** $yyval->{'order'};
|
|
||||||
if ( $yyval->{'order'} == 0 ) {
|
|
||||||
yyerror("Error: expecting a non-zero integer
|
|
||||||
exponent");
|
|
||||||
goto yyerrlab1 ;
|
|
||||||
}
|
|
||||||
last SWITCH;
|
|
||||||
};
|
|
||||||
|
|
||||||
In c, you will have a bit more complicated pointer arithmetic to address
|
|
||||||
the stack, but the usage of objects will be the same. Note here that it
|
|
||||||
is convenient to keep all information about the token in its location
|
|
||||||
info, (yylsa, yylsp, yylval, @n), while everything relating to the value
|
|
||||||
of the expression, or to the parse tree, is better placed in the
|
|
||||||
semantic stack (yyvsa, yyvsp, yyval, $n). Also note that in some cases
|
|
||||||
you can do semantic checks inside rules and report useful messages
|
|
||||||
before or instead of invoking yyerror();
|
|
||||||
|
|
||||||
Finally, it is useful to make the following wrapper function around
|
|
||||||
external yylex() in order to maintain your own token stack. Unlike the
|
|
||||||
parser's internal stack which is only as deep as the rule being reduced,
|
|
||||||
this one can hold all tokens recognized during the current run, and that
|
|
||||||
can be extremely helpful for error reporting and any transformations you
|
|
||||||
may need. In this way, you can even scan (tokenize) the whole buffer
|
|
||||||
before handing it off to the parser (who knows, you may need a token
|
|
||||||
ahead of what is currently seen by the parser):
|
|
||||||
|
|
||||||
|
|
||||||
sub tokenize {
|
|
||||||
undef @tokenTable;
|
|
||||||
my ($tok, $text, $name, $unit, $first_line, $first_column,
|
|
||||||
$last_line, $last_column);
|
|
||||||
|
|
||||||
while ( ($tok = &UnitLex::yylex()) > 0 ) { # this is where the
|
|
||||||
c-coded yylex is called,
|
|
||||||
# UnitLex is the perl
|
|
||||||
extension encapsulating it
|
|
||||||
( $text, $name, $unit, $first_line, $first_column, $last_line,
|
|
||||||
$last_column ) = &UnitLex::getyylval;
|
|
||||||
push(@tokenTable,
|
|
||||||
Unit::yyltype->new (
|
|
||||||
'token' => $tok,
|
|
||||||
'text' => $text,
|
|
||||||
'name' => $name,
|
|
||||||
'unit' => $unit,
|
|
||||||
'first_line' => $first_line,
|
|
||||||
'first_column' => $first_column,
|
|
||||||
'last_line' => $last_line,
|
|
||||||
'last_column' => $last_column,
|
|
||||||
)
|
|
||||||
)
|
|
||||||
}
|
|
||||||
|
|
||||||
}
|
|
||||||
|
|
||||||
|
|
||||||
It is now a lot easier to handle various state-related problems, such as
|
|
||||||
backtracking and error reporting. The yylex() function as seen by the
|
|
||||||
parser might be constructed somewhat like this:
|
|
||||||
|
|
||||||
sub yylex {
|
|
||||||
$yylloc = $tokenTable[$tokenNo]; # $tokenNo is a global; now
|
|
||||||
instead of a "file pointer",
|
|
||||||
# as in the first example, we have
|
|
||||||
a "token pointer"
|
|
||||||
undef $yylval;
|
|
||||||
|
|
||||||
|
|
||||||
# disregard this; name this block "computing semantic values"
|
|
||||||
if ( $yylloc->{'token'} == UNIT) {
|
|
||||||
$yylval = Unit::Operand->new(
|
|
||||||
'unit' => Unit::Dict::unit($yylloc->{'unit'}),
|
|
||||||
'base' => Unit::Dict::base($yylloc->{'unit'}),
|
|
||||||
'scale' => Unit::Dict::scale($yylloc->{'unit'}),
|
|
||||||
'scaleToBase' => Unit::Dict::scaleToBase($yylloc->{'unit'}),
|
|
||||||
'loc' => $yylloc,
|
|
||||||
);
|
|
||||||
}
|
|
||||||
elsif ( ($yylloc->{'token'} == INTEGER ) || ($yylloc->{'token'} ==
|
|
||||||
POSITIVE_NUMBER) ) {
|
|
||||||
$yylval = Unit::Operand->new(
|
|
||||||
'unit' => '1',
|
|
||||||
'base' => '1',
|
|
||||||
'scale' => 1,
|
|
||||||
'scaleToBase' => 1,
|
|
||||||
'loc' => $yylloc,
|
|
||||||
);
|
|
||||||
}
|
|
||||||
|
|
||||||
$tokenNo++;
|
|
||||||
return(%{$yylloc}->{'token'}); # This is all the parser needs to
|
|
||||||
know about this token.
|
|
||||||
# But we already made sure we saved
|
|
||||||
everything we need to know.
|
|
||||||
}
|
|
||||||
|
|
||||||
|
|
||||||
Now the most interesting part, the error reporting routine:
|
|
||||||
|
|
||||||
|
|
||||||
sub yyerror {
|
|
||||||
my ($str) = @_;
|
|
||||||
my ($message, $start, $end, $loc);
|
|
||||||
|
|
||||||
$loc = $tokenTable[$tokenNo-1]; # This is the same as to say,
|
|
||||||
# "obtain the location info for the
|
|
||||||
current token"
|
|
||||||
|
|
||||||
# You may use this routine for your own purposes or let parser use
|
|
||||||
it
|
|
||||||
if( $str ne 'parse error' ) {
|
|
||||||
$message = "$str instead of `" . $loc->{'name'} . "' <" .
|
|
||||||
$loc->{'text'} . ">, at line " . $loc->{'first_line'} . ":\n\
|
|
||||||
n";
|
|
||||||
}
|
|
||||||
else {
|
|
||||||
$message = "unexpected token `" . $loc->{'name'} . "' <" .
|
|
||||||
$loc->{'text'} . ">, at line " . $loc->{'first_line'} . ":\n
|
|
||||||
\n";
|
|
||||||
}
|
|
||||||
|
|
||||||
$message .= $parseBuffer . "\n"; # that's the original string that
|
|
||||||
was used to set the parser buffer
|
|
||||||
|
|
||||||
$message .= ( ' ' x ($loc->{'first_column'} + 1) ) . ( '^' x
|
|
||||||
length($loc->{'text'}) ). "\n";
|
|
||||||
if( $str ne 'parse error' ) {
|
|
||||||
print STDERR "$str instead of `", $loc->{'name'}, "' {",
|
|
||||||
$loc->{'text'}, "}, at line ", $loc->{'first_line'}, ":\n\n";
|
|
||||||
}
|
|
||||||
else {
|
|
||||||
print STDERR "unexpected token `", $loc->{'name'}, "' {",
|
|
||||||
$loc->{'text'}, "}, at line ", $loc->{'first_line'}, ":\n\n";
|
|
||||||
}
|
|
||||||
|
|
||||||
print STDERR "$parseBuffer\n";
|
|
||||||
print STDERR ' ' x ($loc->{'first_column'} + 1), '^' x
|
|
||||||
length($loc->{'text'}), "\n";
|
|
||||||
}
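
The caret display built by the Perl yyerror() above could be sketched in C roughly as follows. This is only an illustration, not code from the linked tarballs: format_caret_message and its parameters are hypothetical names, and the column is assumed to be zero-based (the Perl version adds 1 to first_column, so adjust to whatever your scanner records).

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: write the source text, a newline, and a line of
 * carets underlining `len` characters starting at zero-based `column`.
 * Returns the number of characters written, or -1 if `out` is too small.
 */
int format_caret_message(char *out, size_t out_size,
                         const char *source, int column, int len)
{
    int n;

    assert(out != NULL && source != NULL && column >= 0 && len >= 0);

    /* "%*s" with an empty string emits `column` spaces of padding. */
    n = snprintf(out, out_size, "%s\n%*s", source, column, "");
    if (n < 0 || (size_t) n + (size_t) len + 2 > out_size)
        return -1;

    memset(out + n, '^', (size_t) len);   /* the underline itself */
    out[n + len] = '\n';
    out[n + len + 1] = '\0';
    return n + len + 1;
}
```

Called with the source "mol/foo", column 4 and length 3, this places the carets under "foo", which is where the first example above says the marker should be.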
|
|
||||||
|
|
||||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
|
||||||
|
|
||||||
Scanners used in these examples assume there is a single line of text on
|
|
||||||
the input (the first_line and last_line elements of yylloc are simply
|
|
||||||
ignored). If you want to be able to parse multi-line buffers, just add a
|
|
||||||
lex rule for '\n' that will increment the line count and reset the pos
|
|
||||||
variable to zero.
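
A minimal sketch of that bookkeeping in C, under the assumption that the scanner calls a helper for every piece of text it consumes (in a flex scanner that would typically be yytext and yyleng). SourcePos and advance_pos are hypothetical names, not part of the examples linked above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical position record: current line and column within that line. */
typedef struct
{
    int line;
    int column;
} SourcePos;

/*
 * Advance the position over `len` characters of consumed text.
 * A newline bumps the line count and resets the column to zero,
 * which is the rule described in the paragraph above.
 */
void advance_pos(SourcePos *pos, const char *text, size_t len)
{
    size_t i;

    assert(pos != NULL && text != NULL);

    for (i = 0; i < len; i++)
    {
        if (text[i] == '\n')
        {
            pos->line++;
            pos->column = 0;
        }
        else
            pos->column++;
    }
}
```

Doing this in one helper, rather than only in the '\n' rule, also keeps the column right for tokens that themselves contain newlines (multi-line strings or comments, if the grammar has any).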
|
|
||||||
|
|
||||||
|
|
||||||
Ugly as it may seem, I find this approach extremely liberating. If the
|
|
||||||
grammar becomes too complicated for a LALR(1) parser, I can cascade
|
|
||||||
multiple parsers. The token table can then be used to reassemble parts
|
|
||||||
of original expression for subordinate parsers, preserving the location
|
|
||||||
info all the way down, so that subordinate parsers can report their
|
|
||||||
problems consistently. You probably don't need this, as SQL is very well
|
|
||||||
thought of and has parsable grammar. But it may be of some help, for
|
|
||||||
error reporting.
|
|
||||||
|
|
||||||
|
|
||||||
--Gene
|
|
||||||
|
|
||||||
From pgsql-patches-owner+M1499@postgresql.org Sat Aug 4 13:11:53 2001
|
|
||||||
Return-path: <pgsql-patches-owner+M1499@postgresql.org>
|
|
||||||
Received: from postgresql.org (webmail.postgresql.org [216.126.85.28])
|
|
||||||
by candle.pha.pa.us (8.10.1/8.10.1) with ESMTP id f74HBrh11339
|
|
||||||
for <pgman@candle.pha.pa.us>; Sat, 4 Aug 2001 13:11:53 -0400 (EDT)
|
|
||||||
Received: from postgresql.org.org (webmail.postgresql.org [216.126.85.28])
|
|
||||||
by postgresql.org (8.11.3/8.11.4) with SMTP id f74H89655183;
|
|
||||||
Sat, 4 Aug 2001 13:08:09 -0400 (EDT)
|
|
||||||
(envelope-from pgsql-patches-owner+M1499@postgresql.org)
|
|
||||||
Received: from sss.pgh.pa.us ([192.204.191.242])
|
|
||||||
by postgresql.org (8.11.3/8.11.4) with ESMTP id f74Gxb653074
|
|
||||||
for <pgsql-patches@postgresql.org>; Sat, 4 Aug 2001 12:59:37 -0400 (EDT)
|
|
||||||
(envelope-from tgl@sss.pgh.pa.us)
|
|
||||||
Received: from sss2.sss.pgh.pa.us (tgl@localhost [127.0.0.1])
|
|
||||||
by sss.pgh.pa.us (8.11.4/8.11.4) with ESMTP id f74GtPC29183;
|
|
||||||
Sat, 4 Aug 2001 12:55:25 -0400 (EDT)
|
|
||||||
To: Dave Page <dpage@vale-housing.co.uk>
|
|
||||||
cc: "'Fernando Nasser'" <fnasser@cygnus.com>,
|
|
||||||
Bruce Momjian <pgman@candle.pha.pa.us>, Neil Padgett <npadgett@redhat.com>,
|
|
||||||
pgsql-patches@postgresql.org
|
|
||||||
Subject: Re: [PATCHES] Patch for Improved Syntax Error Reporting
|
|
||||||
In-Reply-To: <8568FC767B4AD311AC33006097BCD3D61A2D70@woody.vale-housing.co.uk>
|
|
||||||
References: <8568FC767B4AD311AC33006097BCD3D61A2D70@woody.vale-housing.co.uk>
|
|
||||||
Comments: In-reply-to Dave Page <dpage@vale-housing.co.uk>
|
|
||||||
message dated "Sat, 04 Aug 2001 12:37:23 +0100"
|
|
||||||
Date: Sat, 04 Aug 2001 12:55:24 -0400
|
|
||||||
Message-ID: <29180.996944124@sss.pgh.pa.us>
|
|
||||||
From: Tom Lane <tgl@sss.pgh.pa.us>
|
|
||||||
Precedence: bulk
|
|
||||||
Sender: pgsql-patches-owner@postgresql.org
|
|
||||||
Status: OR
|
|
||||||
|
|
||||||
Dave Page <dpage@vale-housing.co.uk> writes:
|
|
||||||
> Oh, I quite agree. I'm not adverse to updating my code, I just want to avoid
|
|
||||||
> users getting misleading messages until I come up with those updates.
|
|
||||||
|
|
||||||
Hmm ... if they were actively misleading then I'd share your concern.
|
|
||||||
|
|
||||||
I guess what you're thinking is that the error offset reported by the
|
|
||||||
backend won't correspond directly to what the user typed, and if the
|
|
||||||
user tries to use the offset to manually count off characters, he may
|
|
||||||
arrive at the wrong place? Good point. I'm not sure whether a message
|
|
||||||
like
|
|
||||||
|
|
||||||
ERROR: parser: parse error at or near 'frum';
|
|
||||||
POSITION: 42
|
|
||||||
|
|
||||||
would be likely to encourage people to try that. Thoughts? (I do think
|
|
||||||
this is a good argument for not embedding the position straight into the
|
|
||||||
main error message though...)
|
|
||||||
|
|
||||||
One possible compromise is to combine the straight character-offset
|
|
||||||
approach with a simplistic context display:
|
|
||||||
|
|
||||||
ERROR: parser: parse error at or near 'frum';
|
|
||||||
POSITION: 42 ... oid,relname FRUM ...
|
|
||||||
|
|
||||||
The idea is to define the "POSITION" field as an integer offset possibly
|
|
||||||
followed by whitespace and noise words. An updated client would grab
|
|
||||||
the offset, ignore the rest of the field, and do the right thing. A
|
|
||||||
not-updated client would display the entire message, and with any luck
|
|
||||||
the user would read it correctly.
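
For illustration only, an updated client's handling of such a field might look like the sketch below. It assumes the field text has already been separated from the rest of the error message; parse_position_field is a hypothetical helper, not a libpq function, and the exact field format was still under discussion in this thread.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>

/*
 * Hypothetical helper: extract the leading integer offset from a POSITION
 * field such as "42 ... oid,relname FRUM ...". Everything after the
 * offset is treated as noise and ignored. Returns the offset, or -1 if
 * the field does not start with a non-negative integer.
 */
long parse_position_field(const char *field)
{
    char *end;
    long  offset;

    assert(field != NULL);

    while (isspace((unsigned char) *field))
        field++;
    if (!isdigit((unsigned char) *field))
        return -1;

    errno = 0;
    offset = strtol(field, &end, 10);
    if (errno == ERANGE)
        return -1;

    /* The offset must end the field or be followed by whitespace. */
    if (*end != '\0' && !isspace((unsigned char) *end))
        return -1;

    return offset;
}
```

A client that does not know about the convention needs no code at all: it prints the whole field, context words included.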
|
|
||||||
|
|
||||||
regards, tom lane
|
|
||||||
|
|
||||||
---------------------------(end of broadcast)---------------------------
|
|
||||||
TIP 5: Have you checked our extensive FAQ?
|
|
||||||
|
|
||||||
http://www.postgresql.org/users-lounge/docs/faq.html
|
|
||||||
|
|