postgresql/doc/TODO.detail/cnfify

From daybee@bellatlantic.net Sun Aug 23 20:21:48 1998
Received: from iconmail.bellatlantic.net (iconmail.bellatlantic.net [199.173.162.30])
	by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id UAA26688
	for <maillist@candle.pha.pa.us>; Sun, 23 Aug 1998 20:21:46 -0400 (EDT)
Received: from bellatlantic.net (client196-126-169.bellatlantic.net [151.196.126.169])
	by iconmail.bellatlantic.net (IConNet Sendmail) with ESMTP id UAA09478;
	Sun, 23 Aug 1998 20:18:35 -0400 (EDT)
Message-ID: <35E0ABF0.578694C8@bellatlantic.net>
Date: Sun, 23 Aug 1998 19:55:29 -0400
From: David Hartwig <daybee@bellatlantic.net>
Organization: Home
X-Mailer: Mozilla 4.04 [en] (Win95; I)
MIME-Version: 1.0
To: Bruce Momjian <maillist@candle.pha.pa.us>
CC: hannu@trust.ee, pgsql-interfaces@postgreSQL.org, hackers@postgreSQL.org
Subject: Re: [INTERFACES] Re: [HACKERS] changes in 6.4
References: <199808220353.XAA04528@candle.pha.pa.us>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Status: ROr


Bruce Momjian wrote:

> >
> > Hannu Krosing wrote:
> >
> > > > The days where every release fixed server crashes, or added a feature
> > > > that users were 'screaming for' may be a thing of the past.
> > >
> > > Is anyone working on fixing the exploding optimisations for many OR-s,
> > > at least the canonic case used by access?
> > >
> > > My impression is that this has fallen somewhere between
> > > insightdist and Vadim.
> >
> > This is really big for the ODBCers. (And I suspect for JDBCers too.)  Many
> > desktop libraries and end-user tools depend on this "record set" strategy to
> > operate effectively.
> >
> > I have put together a workable hack that runs just before cnfify().  The
> > option is activated through the SET command.  Once activated, it identifies
> > queries with this particular multi-OR pattern generated by these RECORD SET
> > strategies.  Qualified query trees are rewritten as multiple UNIONs.   (One
> > for each OR grouping).
> >
> > The results are profound.    Queries that used to scan tables because of the
> > ORs, now make use of any indexes.   Thus, the size of the table has virtually
> > no effect on performance.  Furthermore, queries that used to crash the
> > backend, now run in under a second.
> >
> > Currently the down sides are:
> >     1. If there is no usable index, performance is significantly worse.  The
> > patch does not check to make sure that there is a usable index.  I could use
> > some pointers on this.
> >
> >     2. Small tables are actually a bit slower than without the patch.
> >
> >     3.  Not very elegant.    I am looking for a more generalized solution.
> > I have lots of ideas, but I would need to know the backend much better before
> > attempting any of them.   My favorite idea is before cnfify(), to factor the
> > OR terms and pull out the constants into a virtual (temporary) table space.
> > Then rewrite the query as a join.   The optimizer will (should) treat the new
> > query accordingly.  This assumes that an efficient factoring algorithm exists
> > and that temporary tables can exist in the heap.
> >
> > Illustration:
> > SELECT ... FROM tab WHERE
> > (var1 = const1 AND var2 = const2) OR
> > (var1 = const3 AND var2 = const4) OR
> > (var1 = const5 AND var2 = const6)
> >
> > SELECT ... FROM tab, tmp WHERE
> > (var1 = var_x AND var2 = var_y)
> >
> > tmp
> > var_x  | var_y
> > --------------
> > const1|const2
> > const3|const4
> > const5|const6
>
> David, where are we on this?  I know we have OR's using indexes.  Do we
> still need to look at this as a fix, or are we OK?  I have not gotten far
> enough in the optimizer to know how to fix the

Bruce,

If the question is, have I come up with a solution for the cnf'ify problem:  No

If the question is, is it still important:  Very much yes.

It is essential for many RAD tools using remote data objects which make use of key
sets.  Your recent optimization of the OR list goes a long way, but inevitably
users are confronted with multi-part keys.

When I look at the problem my head spins.   I do not have the experience (yet?)
with the backend to be mucking around in the optimizer.  As I see it, cnf'ify is
doing just what it is supposed to do.  Boundless boolean logic.

I think hope may lie, though, in identifying each AND'ed group associated with a key
and tagging it as a special sub-root node which cnf'ify does not penetrate.   This
node would be allowed to pass to the later stages of the optimizer where it will be
used to plan index scans.  Easy for me to say.

In the meantime, I still have the patch that I described in prior email.  It has
worked well for us.  Let me restate that.   We could not survive without it!
However, I do not feel that it is a sufficiently functional approach to be
incorporated as a final solution.  I will submit the patch if you (or anyone) do
not come up with a better solution.  It is coded to be activated by a SET KSQO to
minimize its reach.
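
As an illustration of the rewrite described above (one UNION branch per OR
grouping), here is a minimal sketch that works on query text rather than on
query trees.  It is not the patch itself: the function name
build_union_query and its parameters are hypothetical, and it assumes a
two-column integer key.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper, for illustration only: writes
 *   SELECT * FROM tab WHERE (c1 = k AND c2 = k) UNION SELECT ...
 * into buf, one SELECT per key pair.
 * Returns 0 on success, -1 if buf is too small.
 */
int
build_union_query(char *buf, size_t buflen, const char *table,
				  const char *col1, const char *col2,
				  const int (*keys)[2], size_t nkeys)
{
	size_t		used = 0;
	size_t		i;

	if (buflen == 0)
		return -1;
	buf[0] = '\0';

	for (i = 0; i < nkeys; i++)
	{
		/* Every branch after the first is joined with UNION */
		int			n = snprintf(buf + used, buflen - used,
								 "%sSELECT * FROM %s WHERE (%s = %d AND %s = %d)",
								 i == 0 ? "" : " UNION ",
								 table, col1, keys[i][0], col2, keys[i][1]);

		if (n < 0 || (size_t) n >= buflen - used)
			return -1;			/* output would have been truncated */
		used += (size_t) n;
	}
	return 0;
}
```

Each branch carries a plain AND'ed qualification, so each one can be planned
as an ordinary index lookup on its own.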


From daybee@bellatlantic.net Sun Aug 30 12:06:24 1998
Received: from iconmail.bellatlantic.net (iconmail.bellatlantic.net [199.173.162.30])
	by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id MAA12860
	for <maillist@candle.pha.pa.us>; Sun, 30 Aug 1998 12:06:22 -0400 (EDT)
Received: from bellatlantic.net (client196-126-73.bellatlantic.net [151.196.126.73])
	by iconmail.bellatlantic.net (IConNet Sendmail) with ESMTP id MAA18468;
	Sun, 30 Aug 1998 12:03:33 -0400 (EDT)
Message-ID: <35E9726E.C6E73049@bellatlantic.net>
Date: Sun, 30 Aug 1998 11:40:31 -0400
From: David Hartwig <daybee@bellatlantic.net>
Organization: Home
X-Mailer: Mozilla 4.06 [en] (Win98; I)
MIME-Version: 1.0
To: Bruce Momjian <maillist@candle.pha.pa.us>
CC: hannu@trust.ee, pgsql-interfaces@postgreSQL.org, hackers@postgreSQL.org
Subject: Re: [INTERFACES] Re: [HACKERS] changes in 6.4
References: <199808290344.XAA28089@candle.pha.pa.us>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Status: RO


Bruce Momjian wrote:

> OK, let me try this one.
>
> Why is the system cnf'ifying the query?  Because it wants to have a
> list of qualifications that are AND'ed, so it can just pick the most
> restrictive/cheapest, and evaluate that one first.
>
> If you have:
>
>         (a=b and c=d) or e=1
>
> In this case, without cnf'ify, it has to evaluate both of them, because
> if one is false, you can't be sure another would be true.  In the
> cnf'ify case,
>
>         (a=b or e=1) and (c=d or e=1)
>
> In this case, it can choose either, and act on just one, if a row fails
> to meet it, it can stop and not evaluate it using the other restriction.
>
> The fact is that it is only going to use fancy join/index in one of the
> two cases, so it tries to pick the best one, and does a brute-force
> qualification test on the remaining item if the first one tried is true.
>
> The problem is of course large where clauses can exponentially expand
> this.  What it is really trying to do is to pick a cheapest restriction,
> but the memory explosion and query failure are serious problems.
>
> The issue is that it thinks it is doing something to help things, while
> it is actually hurting things.
>
> In the ODBC case of:
>
>         (x=3 and y=4) or
>         (x=3 and y=5) or
>         (x=3 and y=6) or ...
>
> it clearly is not going to gain anything by choosing any CHEAPEST path,
> because they are all the same in terms of cost, and the use by ODBC
> clients is hurting reliability.
>
> I am inclined to agree with David's solution of breaking apart the query
> into separate UNION queries in certain cases.  It seems to be the most
> logical solution, because the cnf'ify code is working counter to its
> purpose in these cases.
>
> Now, the question is how/where to implement this.  I see your idea of
> making the OR a join to a temp table that holds all the constants.
> Another idea would be to do actual UNION queries:
>
>         SELECT * FROM tab
>         WHERE (x=3 and y=4)
>         UNION
>         SELECT * FROM tab
>         WHERE (x=3 and y=5)
>         UNION
>         SELECT * FROM tab
>         WHERE (x=3 and y=6) ...
>
> This would work well for tables with indexes, but for a sequential scan,
> you are doing a sequential scan for each UNION.

Practically speaking, the concern about the lack of an index may not be justified.   The reason
these queries are being generated, with this shape, is because remote data objects on the
client side are being told that a primary key exists on these tables.  The object is told
about these keys  in one of two ways.

1.  It queries the database for the primary key of the table.  The ODBC driver services
this request by querying for the attributes used in {table_name}_pkey.

2.  The user manually specifies the primary key.  In this case an actual index may not
exist.   (i.e. MS Access asks the user for this information if a primary key is not found
in a table)

The second case is the only one that would cause a problem.  Fortunately, the solution is
simple.  Add a primary key index!

My only concern is to be able to accurately identify a query with the proper signature
before rewriting it as a UNION.   To what degree should this inspection be taken?

BTW,  I would not do the rewrite on OR's without AND's since you have fixed the OR's use
of the index.

There is one other potential issue.  In my experience, using arrays in tables together
with UNIONs creates problems.  Array comparison operators, which are needed by the
implied DISTINCT, are missing.

> Another idea is
> subselects.  Also, you have to make sure you return the proper rows,
> keeping duplicates where they are in the base table, but not returning
> them when they meet more than one qualification.
>
>         SELECT * FROM tab
>         WHERE (x,y) IN (SELECT 3, 4
>                         UNION
>                         SELECT 3, 5
>                         UNION
>                         SELECT 3, 6)
>
> I believe we actually support this.  This is not going to use an index
> on tab, so it may be slow if x and y are indexed.
>
> Another more bizarre solution is:
>
>         SELECT * FROM tab
>         WHERE (x,y) = (SELECT 3, 4) OR
>               (x,y) = (SELECT 3, 5) OR
>               (x,y) = (SELECT 3, 6)
>
> Again, I think we do this too.  I don't think cnf'ify does anything with
> this.  I also believe "=" uses indexes on subselects, while IN does not
> because IN could return lots of rows, and an index is slower than a
> non-index join on lots of rows.  Of course, now that we index OR's.
>
> Let me ask another question.  If I do:
>
>         SELECT * FROM tab WHERE x=3 OR x=4
>
> it works, and uses indexes.  Why can't the optimizer just not cnf'ify
> things sometimes, and just do:
>
>         SELECT * FROM tab
>         WHERE   (x=3 AND y=4) OR
>                 (x=3 AND y=5) OR
>                 (x=3 AND y=6)
>
> Why can it handle x=3 OR x=4, but not the more complicated case above,
> without trying to be too smart?  If x,y is a multi-key index, it could
> use that quite easily.  If not, it can do a sequential scan and run the
> tests.
>
> Another issue.  To the optimizer, x=3 and x=y are totally different.  In
> x=3, it is a column compared to a constant, while in x=y, it is a join.
> That makes a huge difference.
>
> In the case of (a=b and c=d) or e=1, you pick the best path and do the
> a=b join, and throw in the e=1 entries.  You can't easily do both joins,
> because you also need the e=1 stuff.
>
> I wonder what would happen if we prevent cnf'ifying of cases where the
> OR represent only column = constant restrictions.
>
> I meant to really go through the optimizer this month, but other backend
> items took my time.
>
> Can someone run some tests on disabling the cnf'ify calls.  It is my
> understanding that with the non-cnf-ify'ed query, it can't choose an
> optimal path, and starts to do either straight index matches,
> sequential scans, or cartesian products where it joins every row to
> every other row looking for a match.
>
> Let's say we turn off cnf-ify just for non-join queries.  Does that
> help?
>
> I am not sure of the ramifications of telling the optimizer it no longer
> has a variety of paths to choose for evaluating the query.

I did not try this earlier because I thought it was too good to be true.   I was right.
I tried commenting out the normalize() function in cnfify().   The EXPLAIN showed a
sequential scan and the resulting tuple set was empty.   Time will not allow me to dig
into this further this weekend.

Unless you come up with a better solution, I am going to submit my patch on Monday to
make the Sept. 1st deadline.  It includes a SET switch to activate the rewrite so as not
to cause problems outside the ODBC users.    We can either improve it or yank it by the
Oct. 1st deadline.
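
To put a number on the expansion discussed in this message, here is a
minimal sketch.  It assumes a naive distribution of OR over AND with no
simplification of duplicate terms, which may not match what cnfify()
actually does step by step; the function name cnf_clause_count is
hypothetical.

```c
#include <stdint.h>

/*
 * Number of OR-clauses produced by naively converting
 *     (t1 AND ... AND tn) OR (t1 AND ... AND tn) OR ...   [k groups]
 * to conjunctive normal form.  Each resulting clause picks one term
 * from every group, so there are n^k clauses of k terms each.
 * Returns UINT64_MAX if the count does not fit in 64 bits.
 * Hypothetical helper for illustration; not part of the backend.
 */
uint64_t
cnf_clause_count(unsigned terms_per_group, unsigned groups)
{
	uint64_t	count = 1;
	unsigned	i;

	for (i = 0; i < groups; i++)
	{
		/* Check before multiplying so the product cannot wrap around */
		if (terms_per_group != 0 && count > UINT64_MAX / terms_per_group)
			return UINT64_MAX;
		count *= terms_per_group;
	}
	return count;
}
```

Under that assumption, the three-row example with a two-part key gives
cnf_clause_count(2, 3) = 8 clauses, and a 20-row key set gives
cnf_clause_count(2, 20) = 1048576, which is consistent with the memory
exhaustion reported for larger key sets.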


From infotecn@tin.it Mon Aug 31 03:01:51 1998
Received: from mail.tol.it (mail.tin.it [194.243.154.49])
	by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id DAA09740
	for <maillist@candle.pha.pa.us>; Mon, 31 Aug 1998 03:01:48 -0400 (EDT)
Received: from Server.InfoTecna.com (a-mz6-50.tin.it [212.216.9.113])
          by mail.tol.it (8.8.4/8.8.4) with ESMTP
   id JAA16451; Mon, 31 Aug 1998 09:00:35 +0200 (MET DST)
Received: from tm3.InfoTecna.com (Tm1.InfoTecna.com [192.168.1.1])
	by Server.InfoTecna.com (8.8.5/8.8.5) with SMTP id IAA18678;
	Mon, 31 Aug 1998 08:53:13 +0200
Message-Id: <3.0.5.32.19980831085312.00986cc0@MBox.InfoTecna.com>
X-Sender: denis@MBox.InfoTecna.com
X-Mailer: QUALCOMM Windows Eudora Light Version 3.0.5 (32)
Date: Mon, 31 Aug 1998 08:53:12 +0200
To: David Hartwig <daybee@bellatlantic.net>,
        Bruce Momjian <maillist@candle.pha.pa.us>
From: Sbragion Denis <infotecn@tin.it>
Subject: Re: [INTERFACES] Re: [HACKERS] changes in 6.4
Cc: hannu@trust.ee, pgsql-interfaces@postgreSQL.org, hackers@postgreSQL.org
In-Reply-To: <35E9726E.C6E73049@bellatlantic.net>
References: <199808290344.XAA28089@candle.pha.pa.us>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Status: RO

Hello,

At 11.40 30/08/98 -0400, David Hartwig wrote:
>> Why is the system cnf'ifying the query.  Because it  wants to have a
>> list of qualifications that are AND'ed, so it can just pick the most
>> restrictive/cheapest, and evaluate that one first.

Just a small question about all this optimization stuff. I'm not a
database expert but I think we are talking about an NP-complete problem.
Couldn't we convert this optimization problem into another NP one that is
known to have a good solution? For example, for the traveling salesman
problem there's an algorithm that provides a solution that's never more than
two times the optimal one and provides results that are *really* near the
optimal one most of the time. The simplex algorithm may be another
example. I think that this kind of algorithm would be better than a
collection of tricks for special cases, and these tricks could be used
anyway when special cases are detected. Furthermore, I also know that there is
a free program I used in the past that provides this kind of optimization
for chip design. I don't remember the exact name of the program but I
remember it came from Berkeley university. Of course, maybe I'm totally
missing the point.

Hope it helps !

Bye!

	Dr. Sbragion Denis
	InfoTecna
	Tel, Fax: +39 39 2324054
	URL: http://space.tin.it/internet/dsbragio

From andreas.zeugswetter@telecom.at Mon Aug 31 06:31:13 1998
Received: from renoir.op.net (root@renoir.op.net [209.152.193.4])
	by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id GAA14231
	for <maillist@candle.pha.pa.us>; Mon, 31 Aug 1998 06:31:12 -0400 (EDT)
Received: from gandalf.telecom.at (gandalf.telecom.at [194.118.26.84]) by renoir.op.net (o1/$ Revision: 1.18 $) with ESMTP id GAA21099 for <maillist@candle.pha.pa.us>; Mon, 31 Aug 1998 06:23:41 -0400 (EDT)
Received: from zeugswettera.user.lan.at (zeugswettera.user.lan.at [10.4.123.227]) by gandalf.telecom.at (A.B.C.Delta4/8.8.8) with SMTP id MAA38132; Mon, 31 Aug 1998 12:22:07 +0200
Received: by zeugswettera.user.lan.at with Microsoft Mail
	id <01BDD4DA.C7F5B690@zeugswettera.user.lan.at>; Mon, 31 Aug 1998 12:27:55 +0200
Message-ID: <01BDD4DA.C7F5B690@zeugswettera.user.lan.at>
From: Andreas Zeugswetter <andreas.zeugswetter@telecom.at>
To: "'maillist@candle.pha.pa.us'" <maillist@candle.pha.pa.us>
Cc: "hackers@postgreSQL.org" <hackers@postgreSQL.org>
Subject: AW: [INTERFACES] Re: [HACKERS] changes in 6.4
Date: Mon, 31 Aug 1998 12:22:05 +0200
Encoding: 31 TEXT
Status: RO


>Another idea would be to do actual UNION queries:
>
>	SELECT * FROM tab
>	WHERE (x=3 and y=4)
>	UNION
>	SELECT * FROM tab
>	WHERE (x=3 and y=5)
>	UNION
>	SELECT * FROM tab
>	WHERE (x=3 and y=6) ...
>
>This would work well for tables with indexes, but for a sequential scan,
>you are doing a sequential scan for each UNION.

The most important Application for this syntax will be M$ Access
because it uses this syntax to display x rows from a table in a particular
sort order. In this case x and y will be the primary key and therefore have a
unique index. So I think this special case should work well.

The strategy could be something like:
iff x, y is a unique index
	do the union access path
else
	do something else
done

I think hand written SQL can always be rewritten if it is not fast enough
using this syntax.

Andreas


From owner-pgsql-patches@hub.org Tue Sep  1 02:01:10 1998
Received: from hub.org (hub.org [209.47.148.200])
	by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id CAA28687
	for <maillist@candle.pha.pa.us>; Tue, 1 Sep 1998 02:01:06 -0400 (EDT)
Received: from localhost (majordom@localhost) by hub.org (8.8.8/8.7.5) with SMTP id BAA02180; Tue, 1 Sep 1998 01:48:43 -0400 (EDT)
Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Tue, 01 Sep 1998 01:47:48 +0000 (EDT)
Received: (from majordom@localhost) by hub.org (8.8.8/8.7.5) id BAA02160 for pgsql-patches-outgoing; Tue, 1 Sep 1998 01:47:46 -0400 (EDT)
Received: from iconmail.bellatlantic.net (iconmail.bellatlantic.net [199.173.162.30]) by hub.org (8.8.8/8.7.5) with ESMTP id BAA02147 for <pgsql-patches@postgreSQL.org>; Tue, 1 Sep 1998 01:47:42 -0400 (EDT)
Received: from bellatlantic.net (client196-126-3.bellatlantic.net [151.196.126.3])
	by iconmail.bellatlantic.net (IConNet Sendmail) with ESMTP id XAA27530
	for <pgsql-patches@postgreSQL.org>; Mon, 31 Aug 1998 23:24:07 -0400 (EDT)
Message-ID: <35EB2B33.EBF1E9AA@bellatlantic.net>
Date: Mon, 31 Aug 1998 19:01:07 -0400
From: David Hartwig <daybee@bellatlantic.net>
Organization: Insight Distribution Systems
X-Mailer: Mozilla 4.04 [en] (X11; I; Linux 2.0.29 i586)
MIME-Version: 1.0
To: patches <pgsql-patches@postgreSQL.org>
Subject: [PATCHES] Interim AND/OR memory exaustion fix.
Content-Type: multipart/mixed; boundary="------------BEFD1E6DA78A2DC20B524E32"
Sender: owner-pgsql-patches@hub.org
Precedence: bulk
Status: ROr

This is a multi-part message in MIME format.
--------------BEFD1E6DA78A2DC20B524E32
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit

I will be cleaning this up more before the Oct 1 deadline.

--------------BEFD1E6DA78A2DC20B524E32
Content-Type: text/plain; charset=us-ascii; name="keyset.patch"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline; filename="keyset.patch"

*** ./backend/commands/variable.c.orig	Thu Jul 30 19:25:26 1998
--- ./backend/commands/variable.c	Mon Aug 31 17:23:32 1998
***************
*** 24,29 ****
--- 24,30 ----
  extern bool _use_geqo_;
  extern int32 _use_geqo_rels_;
  extern bool _use_right_sided_plans_;
+ extern bool _use_keyset_query_optimizer;

  /*-----------------------------------------------------------------------*/
  static const char *
***************
*** 559,564 ****
--- 560,568 ----
  	},
  #endif
  	{
+ 		"ksqo", parse_ksqo, show_ksqo, reset_ksqo
+ 	},
+ 	{
  		NULL, NULL, NULL, NULL
  	}
  };
***************
*** 611,615 ****
--- 615,663 ----

  	elog(NOTICE, "Unrecognized variable %s", name);

+ 	return TRUE;
+ }
+
+
+ /*-----------------------------------------------------------------------
+ KSQO code will one day be unnecessary when the optimizer makes use of
+ indexes when multiple ORs are specified in the where clause.
+ See optimizer/prep/prepkeyset.c for more on this.
+ 	daveh@insightdist.com    6/16/98
+ -----------------------------------------------------------------------*/
+ bool
+ parse_ksqo(const char *value)
+ {
+ 	if (value == NULL)
+ 	{
+ 		reset_ksqo();
+ 		return TRUE;
+ 	}
+
+ 	if (strcasecmp(value, "on") == 0)
+ 		_use_keyset_query_optimizer = true;
+ 	else if (strcasecmp(value, "off") == 0)
+ 		_use_keyset_query_optimizer = false;
+ 	else
+ 		elog(ERROR, "Bad value for Key Set Query Optimizer (%s)", value);
+
+ 	return TRUE;
+ }
+
+ bool
+ show_ksqo()
+ {
+
+ 	if (_use_keyset_query_optimizer)
+ 		elog(NOTICE, "Key Set Query Optimizer is ON");
+ 	else
+ 		elog(NOTICE, "Key Set Query Optimizer is OFF");
+ 	return TRUE;
+ }
+
+ bool
+ reset_ksqo()
+ {
+ 	_use_keyset_query_optimizer = false;
  	return TRUE;
  }
*** ./backend/optimizer/plan/planner.c.orig	Sun Aug 30 04:28:02 1998
--- ./backend/optimizer/plan/planner.c	Mon Aug 31 17:23:32 1998
***************
*** 69,74 ****
--- 69,75 ----
  	PlannerInitPlan = NULL;
  	PlannerPlanId = 0;

+ 	transformKeySetQuery(parse);
  	result_plan = union_planner(parse);

  	Assert(PlannerQueryLevel == 1);
*** ./backend/optimizer/prep/Makefile.orig	Sun Apr  5 20:23:48 1998
--- ./backend/optimizer/prep/Makefile	Mon Aug 31 17:23:32 1998
***************
*** 13,19 ****

  CFLAGS += -I../..

! OBJS = prepqual.o preptlist.o prepunion.o

  # not ready yet: predmig.o xfunc.o

--- 13,19 ----

  CFLAGS += -I../..

! OBJS = prepqual.o preptlist.o prepunion.o prepkeyset.o

  # not ready yet: predmig.o xfunc.o

*** ./backend/optimizer/prep/prepkeyset.c.orig	Mon Aug 31 17:23:32 1998
--- ./backend/optimizer/prep/prepkeyset.c	Mon Aug 31 18:30:58 1998
***************
*** 0 ****
--- 1,213 ----
+ /*-------------------------------------------------------------------------
+  *
+  * prepkeyset.c--
+  *	  Special preparation for keyset queries.
+  *
+  * Copyright (c) 1994, Regents of the University of California
+  *
+  *-------------------------------------------------------------------------
+  */
+ #include <stdio.h>
+ #include <string.h>
+
+ #include "postgres.h"
+ #include "nodes/pg_list.h"
+ #include "nodes/parsenodes.h"
+ #include "utils/elog.h"
+
+ #include "nodes/nodes.h"
+ #include "nodes/execnodes.h"
+ #include "nodes/plannodes.h"
+ #include "nodes/primnodes.h"
+ #include "nodes/relation.h"
+
+ #include "catalog/pg_type.h"
+ #include "lib/stringinfo.h"
+ #include "optimizer/planmain.h"
+ /*
+  * Node_Copy--
+  *        a macro to simplify calling of copyObject on the specified field
+  */
+ #define Node_Copy(from, newnode, field) newnode->field = copyObject(from->field)
+
+ /*****  DEBUG stuff
+ #define TABS {int i; printf("\n"); for (i = 0; i<level; i++) printf("\t"); }
+ static int level = 0;
+ ******/
+
+ bool _use_keyset_query_optimizer = FALSE;
+
+ static int inspectOpNode(Expr *expr);
+ static int inspectAndNode(Expr *expr);
+ static int inspectOrNode(Expr *expr);
+
+ /**********************************************************************
+  *   This routine transforms query trees with the following form:
+  *       SELECT a,b, ... FROM one_table WHERE
+  *        (v1 = const1 AND v2 = const2 [ vn = constn ]) OR
+  *        (v1 = const3 AND v2 = const4 [ vn = constn ]) OR
+  *        (v1 = const5 AND v2 = const6 [ vn = constn ]) OR
+  *                         ...
+  *        [(v1 = constn AND v2 = constn [ vn = constn ])]
+  *
+  *                             into
+  *
+  *       SELECT a,b, ... FROM one_table WHERE
+  *        (v1 = const1 AND v2 = const2 [ vn = constn ]) UNION
+  *       SELECT a,b, ... FROM one_table WHERE
+  *        (v1 = const3 AND v2 = const4 [ vn = constn ]) UNION
+  *       SELECT a,b, ... FROM one_table WHERE
+  *        (v1 = const5 AND v2 = const6 [ vn = constn ]) UNION
+  *                         ...
+  *       SELECT a,b, ... FROM one_table WHERE
+  *        [(v1 = constn AND v2 = constn [ vn = constn ])]
+  *
+  *
+  *   To qualify for transformation the query must not be a sub select,
+  *   a HAVING, or a GROUP BY.   It must be a single table and have KSQO
+  *   set to 'on'.
+  *
+  *   The primary use of this transformation is to avoid the exponential
+  *   memory consumption of cnfify() and to make use of index access
+  *   methods.
+  *
+  *        daveh@insightdist.com   1998-08-31
+  *
+  *   Needs to better identify the signature WHERE clause.
+  *   May want to also prune out duplicate where clauses.
+  **********************************************************************/
+ void
+ transformKeySetQuery(Query *origNode)
+ {
+ 	/*   Qualify as a key set query candidate  */
+ 	if (_use_keyset_query_optimizer == FALSE ||
+ 			origNode->groupClause ||
+ 			origNode->havingQual ||
+ 			origNode->hasAggs ||
+ 			origNode->utilityStmt ||
+ 			origNode->unionClause ||
+ 			origNode->unionall ||
+ 			origNode->hasSubLinks ||
+ 			origNode->commandType != CMD_SELECT)
+ 		return;
+
+ 	/*  Qualify single table query   */
+
+ 	/*  Qualify where clause */
+ 	if  ( ! inspectOrNode((Expr*)origNode->qual))  {
+ 		return;
+ 	}
+
+ 	/*  Copy essential elements into a union node */
+ 	/*
+ 	elog(NOTICE, "OR_EXPR=%d, OP_EXPR=%d, AND_EXPR=%d", OR_EXPR, OP_EXPR, AND_EXPR);
+ 	elog(NOTICE, "T_List=%d, T_Expr=%d, T_Var=%d, T_Const=%d", T_List, T_Expr, T_Var, T_Const);
+ 	elog(NOTICE, "opType=%d", ((Expr*)origNode->qual)->opType);
+ 	*/
+ 	while (((Expr*)origNode->qual)->opType == OR_EXPR)  {
+ 		Query	   *unionNode = makeNode(Query);
+
+ 		/*   Pull up Expr =  */
+ 		unionNode->qual = lsecond(((Expr*)origNode->qual)->args);
+
+ 		/*   Pull up balance of tree  */
+ 		origNode->qual = lfirst(((Expr*)origNode->qual)->args);
+
+ 		/*
+ 		elog(NOTICE, "origNode: opType=%d, nodeTag=%d", ((Expr*)origNode->qual)->opType, nodeTag(origNode->qual));
+ 		elog(NOTICE, "unionNode: opType=%d, nodeTag=%d", ((Expr*)unionNode->qual)->opType, nodeTag(unionNode->qual));
+ 		*/
+
+ 		unionNode->commandType = origNode->commandType;
+ 		unionNode->resultRelation = origNode->resultRelation;
+ 		unionNode->isPortal = origNode->isPortal;
+ 		unionNode->isBinary = origNode->isBinary;
+
+ 		if (origNode->uniqueFlag)
+ 			unionNode->uniqueFlag = pstrdup(origNode->uniqueFlag);
+
+ 		Node_Copy(origNode, unionNode, sortClause);
+ 		Node_Copy(origNode, unionNode, rtable);
+ 		Node_Copy(origNode, unionNode, targetList);
+
+ 		origNode->unionClause = lappend(origNode->unionClause, unionNode);
+ 	}
+ 	return;
+ }
+
+
+
+
+ static int
+ inspectOrNode(Expr *expr)
+ {
+ 	int fr = 0, sr = 0;
+ 	Expr *firstExpr, *secondExpr;
+
+ 	if ( ! (expr && nodeTag(expr) == T_Expr && expr->opType == OR_EXPR))
+ 		return 0;
+
+ 	firstExpr = lfirst(expr->args);
+ 	secondExpr = lsecond(expr->args);
+ 	if (nodeTag(firstExpr) != T_Expr || nodeTag(secondExpr) != T_Expr)
+ 		return 0;
+
+ 	if (firstExpr->opType == OR_EXPR)
+ 		fr = inspectOrNode(firstExpr);
+ 	else if (firstExpr->opType == OP_EXPR)    /*   Need to make sure it is last  */
+ 		fr = inspectOpNode(firstExpr);
+ 	else if (firstExpr->opType == AND_EXPR)    /*   Need to make sure it is last  */
+ 		fr = inspectAndNode(firstExpr);
+
+
+ 	if (secondExpr->opType == AND_EXPR)
+ 		sr = inspectAndNode(secondExpr);
+ 	else if (secondExpr->opType == OP_EXPR)
+ 		sr = inspectOpNode(secondExpr);
+
+ 	return (fr && sr);
+ }
+
+
+ static int
+ inspectAndNode(Expr *expr)
+ {
+ 	int fr = 0, sr = 0;
+ 	Expr *firstExpr, *secondExpr;
+
+ 	if ( ! (expr && nodeTag(expr) == T_Expr && expr->opType == AND_EXPR))
+ 		return 0;
+
+ 	firstExpr = lfirst(expr->args);
+ 	secondExpr = lsecond(expr->args);
+ 	if (nodeTag(firstExpr) != T_Expr || nodeTag(secondExpr) != T_Expr)
+ 		return 0;
+
+ 	if (firstExpr->opType == AND_EXPR)
+ 		fr = inspectAndNode(firstExpr);
+ 	else if (firstExpr->opType == OP_EXPR)
+ 		fr = inspectOpNode(firstExpr);
+
+ 	if (secondExpr->opType == OP_EXPR)
+ 		sr = inspectOpNode(secondExpr);
+
+ 	return (fr && sr);
+ }
+
+
+ static int
+ /******************************************************************
+  *  Return TRUE if T_Var = T_Const, else FALSE
+  *  Actually it does not test for =.    Need to do this!
+  ******************************************************************/
+ inspectOpNode(Expr *expr)
+ {
+ 	Expr *firstExpr, *secondExpr;
+
+ 	if (nodeTag(expr) != T_Expr || expr->opType != OP_EXPR)
+ 		return 0;
+
+ 	firstExpr = lfirst(expr->args);
+ 	secondExpr = lsecond(expr->args);
+ 	return  (firstExpr && secondExpr && nodeTag(firstExpr) == T_Var && nodeTag(secondExpr) == T_Const);
+ }
*** ./include/commands/variable.h.orig	Thu Jul 30 19:27:05 1998
--- ./include/commands/variable.h	Mon Aug 31 17:23:32 1998
***************
*** 54,58 ****
--- 54,61 ----
  extern bool show_geqo(void);
  extern bool reset_geqo(void);
  extern bool parse_geqo(const char *);
+ extern bool show_ksqo(void);
+ extern bool reset_ksqo(void);
+ extern bool parse_ksqo(const char *);

  #endif							/* VARIABLE_H */
*** ./include/optimizer/planmain.h.orig	Mon Aug 31 18:27:03 1998
--- ./include/optimizer/planmain.h	Mon Aug 31 18:26:04 1998
***************
*** 67,71 ****
--- 67,72 ----
  extern List *check_having_qual_for_aggs(Node *clause,
  					List *subplanTargetList, List *groupClause);
  extern List *check_having_qual_for_vars(Node *clause, List *targetlist_so_far);
+ extern void transformKeySetQuery(Query *origNode);

  #endif							/* PLANMAIN_H */

--------------BEFD1E6DA78A2DC20B524E32--


From daveh@insightdist.com Thu Sep  3 12:34:48 1998
Received: from u1.abs.net (root@u1.abs.net [207.114.0.131])
	by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id MAA07696
	for <maillist@candle.pha.pa.us>; Thu, 3 Sep 1998 12:34:46 -0400 (EDT)
Received: from insightdist.com (nobody@localhost)
	by u1.abs.net (8.9.0/8.9.0) with UUCP id MAA23590
	for maillist@candle.pha.pa.us; Thu, 3 Sep 1998 12:17:44 -0400 (EDT)
X-Authentication-Warning: u1.abs.net: nobody set sender to insightdist.com!daveh using -f
Received: from ceodev by insightdist.com (AIX 3.2/UCB 5.64/4.03)
          id AA56436; Thu, 3 Sep 1998 11:51:24 -0400
Received: from daveh by ceodev (AIX 4.1/UCB 5.64/4.03)
          id AA45986; Thu, 3 Sep 1998 11:51:24 -0400
Message-Id: <35EEBBEF.2158F68A@insightdist.com>
Date: Thu, 03 Sep 1998 11:55:28 -0400
From: David Hartwig <daveh@insightdist.com>
Organization: Insight Distribution Systems
X-Mailer: Mozilla 4.05 [en] (Win95; I)
Mime-Version: 1.0
To: Bruce Momjian <maillist@candle.pha.pa.us>
Cc: David Hartwig <daybee@bellatlantic.net>, pgsql-patches@postgreSQL.org
Subject: Re: [PATCHES] Interim AND/OR memory exaustion fix.
References: <199809030236.WAA22888@candle.pha.pa.us>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Status: RO


Bruce Momjian wrote:

> > I will be cleaning this up more before the Oct 1 deadline.
>
> > *** ./backend/commands/variable.c.orig        Thu Jul 30 19:25:26 1998
> > --- ./backend/commands/variable.c     Mon Aug 31 17:23:32 1998
>
> Applied.  Let's keep talking to see if we can come up with a nice
> general solution to this.
>

Agreed.

> I have been thinking, and the trouble case is a query that uses only one
> table, and had only "column = value" statements.  I believe this can be
> easily identified and reworked somehow.
>

If you are referring to the AND'less set of OR's, I do have plans to not let
that qualify since you have gotten the index scan working with OR's.

I also think that the qualification process should be tightened up.   For
example force the number of AND's to be the same in each OR grouping.   And
have at least n OR's to qualify.    We just need to head off the memory
exhaustion.

> Your subtable idea may be a good one.
>

This sounds like a 6.5 thing.   I needed to stop the bleeding for 6.4.
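
The tighter qualification suggested above could look roughly like the
following.  This is only a sketch on a simplified representation (an array
holding the number of AND'ed terms in each OR grouping), not on the real
Expr tree; the name keyset_shape_qualifies and the min_groups threshold are
hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * and_terms[i] is the number of AND'ed "column = constant" terms in the
 * i-th OR grouping.  The query shape qualifies only if:
 *   - there are at least min_groups OR groupings,
 *   - every grouping has two or more terms (an AND-less list of ORs is
 *     left alone, since it can already use an index), and
 *   - every grouping has the same number of terms.
 * Hypothetical helper for illustration; not part of the patch.
 */
bool
keyset_shape_qualifies(const int *and_terms, size_t ngroups, size_t min_groups)
{
	size_t		i;

	if (and_terms == NULL || ngroups == 0 || ngroups < min_groups)
		return false;
	if (and_terms[0] < 2)
		return false;
	for (i = 1; i < ngroups; i++)
	{
		if (and_terms[i] != and_terms[0])
			return false;
	}
	return true;
}
```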


From bga@mug.org Tue Sep  8 03:39:37 1998
Received: from renoir.op.net (root@renoir.op.net [209.152.193.4])
	by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id DAA06237
	for <maillist@candle.pha.pa.us>; Tue, 8 Sep 1998 03:39:36 -0400 (EDT)
Received: from bgalli.mug.org (bajor.mug.org [207.158.132.1]) by renoir.op.net (o1/$ Revision: 1.18 $) with ESMTP id DAA03648 for <maillist@candle.pha.pa.us>; Tue, 8 Sep 1998 03:38:52 -0400 (EDT)
Received: from localhost (bga@localhost) by bgalli.mug.org (8.8.7/SCO5) with SMTP id DAA02895 for <maillist@candle.pha.pa.us>; Tue, 8 Sep 1998 03:31:26 -0400 (EDT)
Message-Id: <199809080731.DAA02895@bgalli.mug.org>
X-Authentication-Warning: bgalli.mug.org: bga@localhost didn't use HELO protocol
X-Mailer: exmh version 2.0.2 2/24/98
From: "Billy G. Allie" <Bill.Allie@mug.org>
Reply-To: "Billy G. Allie" <Bill.Allie@mug.org>
To: Bruce Momjian <maillist@candle.pha.pa.us>
Subject: Re: [HACKERS] flock patch breaks things here
In-reply-to: Your message of "Mon, 31 Aug 1998 00:36:34 EDT."
             <199808310436.AAA07618@candle.pha.pa.us>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Date: Tue, 08 Sep 1998 03:31:26 -0400
Sender: bga@mug.org
Status: ROr

Bruce Momjian writes:

> I have been thinking about this.  First, we can easily use fopen(r+) to
> check to see if the file exists, and if it does read the pid and do a
> kill -0 to see if it is running.  If no one else does it, I will take it
> on.

It is better to use open with the O_CREAT and O_EXCL set.  If the file does not
exist it will be created and the PID can be written to it.  If the file exists
then the call will fail, at which point it can be opened with fopen, and the
PID it contains can be checked to see if it still exists with kill.  The open
call has the added advantage that 'The check for the existence of the file and
the creation of the file if it does not exist is atomic with respect to other
processes executing open naming the same filename in the same directory with
O_EXCL and O_CREAT set.' [from the UnixWare 7 man page, open(2)].
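
The scheme described above can be sketched in C roughly as follows. This is a minimal illustration, not the postmaster's code: the function name `acquire_pid_lock`, its return convention, and the single retry are assumptions made for the example. The stale-file branch (unlink, then retry) is still open to the race mentioned in this message, so a real implementation would need the extra re-check.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Try to take a PID lock file at `path` (a hypothetical, caller-chosen name).
 * Returns 0 if this process now owns the lock, 1 if a live process
 * already owns it, and -1 on any other error.
 */
int
acquire_pid_lock(const char *path)
{
    int         attempt;

    assert(path != NULL);

    for (attempt = 0; attempt < 2; attempt++)
    {
        FILE       *fp;
        long        owner = 0;
        int         fields;
        int         fd;

        /* O_CREAT | O_EXCL makes "check for existence, then create" atomic. */
        fd = open(path, O_WRONLY | O_CREAT | O_EXCL, 0644);
        if (fd >= 0)
        {
            char        buf[32];
            int         len = snprintf(buf, sizeof(buf), "%ld\n", (long) getpid());
            int         ok = (write(fd, buf, (size_t) len) == (ssize_t) len);

            if (close(fd) != 0)
                ok = 0;
            if (!ok)
            {
                unlink(path);
                return -1;
            }
            return 0;
        }
        if (errno != EEXIST)
            return -1;

        /* The file exists: read the PID it records. */
        fp = fopen(path, "r");
        if (fp == NULL)
        {
            if (errno == ENOENT)
                continue;       /* the owner removed it in the meantime */
            return -1;
        }
        fields = fscanf(fp, "%ld", &owner);
        fclose(fp);

        /* kill(pid, 0) sends no signal; it only tests that the PID exists. */
        if (fields == 1 && owner > 0 &&
            (kill((pid_t) owner, 0) == 0 || errno == EPERM))
            return 1;

        /* Stale or unreadable lock: remove it and retry the atomic create. */
        if (unlink(path) != 0 && errno != ENOENT)
            return -1;
    }
    return -1;
}
```

A caller would treat a return of 1 as "another postmaster is running" and refuse to start, and would unlink the file itself on a clean exit.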

Also, you can't just delete the file, create it and write your PID to it
and assume that you have the lock; you need to close the file, sleep some
small amount of time and then open and read the file to see if you still have
the lock.  If you like, I can take this task on.

Oh, the postmaster must clear the PID when it exits.

>
> Second, where to put the pid file.  There is reason to put in /tmp,
> because it will get cleared in a reboot, and because it is locking the
> port number 5432.  There is also reason to put it in /data because you
> can't have more than one postmaster running on a single data directory.
>
> So, we really want to lock both places.  If this is going to make it
> easier for people to run more than one postmaster, because it will
> prevent/warn administrators when they try and put two postmasters in the
> same data dir or port, I say create the pid lock files both places, and
> give the admin a clear description of what he is doing wrong in each
> case.

IMHO, the pid should be put in the data directory.  The reasoning that it
will get cleared in a reboot is not sufficient, since the logic used to
create the PID file will delete it if the PID it contains is not a running
process.  Besides, I have used systems where /tmp was not cleared out on a
re-boot (for various reasons).  Also, I would rather have a script that
explicitly removes the PID locking file at system startup (if it exists), in
which case it doesn't matter where it resides.
--
____       | Billy G. Allie    | Domain....: Bill.Allie@mug.org
|  /|      | 7436 Hartwell     | Compuserve: 76337,2061
|-/-|----- | Dearborn, MI 48126| MSN.......: B_G_Allie@email.msn.com
|/  |LLIE  | (313) 582-1540    |


From owner-pgsql-general@hub.org Thu Oct  1 14:00:57 1998
Received: from renoir.op.net (root@renoir.op.net [209.152.193.4])
	by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id OAA12443
	for <maillist@candle.pha.pa.us>; Thu, 1 Oct 1998 14:00:56 -0400 (EDT)
Received: from hub.org (majordom@hub.org [209.47.148.200]) by renoir.op.net (o1/$ Revision: 1.18 $) with ESMTP id NAA07930 for <maillist@candle.pha.pa.us>; Thu, 1 Oct 1998 13:57:47 -0400 (EDT)
Received: from localhost (majordom@localhost)
	by hub.org (8.8.8/8.8.8) with SMTP id NAA26913;
	Thu, 1 Oct 1998 13:56:29 -0400 (EDT)
	(envelope-from owner-pgsql-general@hub.org)
Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Thu, 01 Oct 1998 13:55:56 +0000 (EDT)
Received: (from majordom@localhost)
	by hub.org (8.8.8/8.8.8) id NAA26856
	for pgsql-general-outgoing; Thu, 1 Oct 1998 13:55:54 -0400 (EDT)
	(envelope-from owner-pgsql-general@postgreSQL.org)
X-Authentication-Warning: hub.org: majordom set sender to owner-pgsql-general@postgreSQL.org using -f
Received: from mail.utexas.edu (wb3-a.mail.utexas.edu [128.83.126.138])
	by hub.org (8.8.8/8.8.8) with SMTP id NAA26840
	for <pgsql-general@hub.org>; Thu, 1 Oct 1998 13:55:49 -0400 (EDT)
	(envelope-from taral@mail.utexas.edu)
Received: (qmail 1198 invoked by uid 0); 1 Oct 1998 17:55:40 -0000
Received: from dial-24-13.ots.utexas.edu (HELO taral) (128.83.128.157)
  by umbs-smtp-3 with SMTP; 1 Oct 1998 17:55:40 -0000
From: "Taral" <taral@mail.utexas.edu>
To: <pgsql-general@hub.org>
Subject: [GENERAL] CNF vs DNF
Date: Thu, 1 Oct 1998 12:55:39 -0500
Message-ID: <000001bded64$b34b2200$3b291f0a@taral>
MIME-Version: 1.0
Content-Type: text/plain;
	charset="iso-8859-1"
Content-Transfer-Encoding: 7bit
X-Priority: 3 (Normal)
X-MSMail-Priority: Normal
X-Mailer: Microsoft Outlook 8.5, Build 4.71.2173.0
X-MimeOLE: Produced By Microsoft MimeOLE V4.72.3155.0
In-Reply-To: <F10BB1FAF801D111829B0060971D839F445B3D@cpsmail>
Importance: Normal
Sender: owner-pgsql-general@postgreSQL.org
Precedence: bulk
Status: RO

> select * from aa where (bb = 2 and ff = 3) or (bb = 4 and ff = 5);

I've been told that the system restructures these in CNF (conjunctive normal
form)... i.e. the above query turns into:

select * from aa where (bb = 2 or bb = 4) and (ff = 3 or bb = 4)
and (bb = 2 or ff = 5) and (ff = 3 or ff = 5);

Much longer and much less efficient, AFAICT. Isn't it more efficient to do a
union of many queries (DNF) than an intersection of many subqueries (CNF)?
Certainly remembering the subqueries takes less memory... Also, queries
already in DNF are probably more common than queries in CNF, requiring less
rewrite.

Can someone clarify this?

Taral


From taral@mail.utexas.edu Fri Oct  2 01:35:42 1998
Received: from mail.utexas.edu (wb1-a.mail.utexas.edu [128.83.126.134])
	by candle.pha.pa.us (8.9.0/8.9.0) with SMTP id BAA28231
	for <maillist@candle.pha.pa.us>; Fri, 2 Oct 1998 01:35:27 -0400 (EDT)
Received: (qmail 16318 invoked by uid 0); 2 Oct 1998 05:35:13 -0000
Received: from dial-42-8.ots.utexas.edu (HELO taral) (128.83.111.216)
  by umbs-smtp-1 with SMTP; 2 Oct 1998 05:35:13 -0000
From: "Taral" <taral@mail.utexas.edu>
To: "Bruce Momjian" <maillist@candle.pha.pa.us>
Cc: <pgsql-general@postgreSQL.org>
Subject: RE: [GENERAL] Long update query ? (also Re: [GENERAL] CNF vs. DNF)
Date: Fri, 2 Oct 1998 00:35:12 -0500
Message-ID: <000001bdedc6$6cf75d20$3b291f0a@taral>
MIME-Version: 1.0
Content-Type: text/plain;
	charset="iso-8859-1"
Content-Transfer-Encoding: 7bit
X-Priority: 3 (Normal)
X-MSMail-Priority: Normal
X-Mailer: Microsoft Outlook 8.5, Build 4.71.2173.0
In-Reply-To: <199810020218.WAA23299@candle.pha.pa.us>
Importance: Normal
X-MimeOLE: Produced By Microsoft MimeOLE V4.72.3155.0
Status: ROr

> It currently converts to CNF so it can select the most restrictive
> restriction and join, and use those first.  However, the CNF conversion
> is a memory exploder for some queries, and we certainly need to have
> another method to split up those queries into UNIONS.  I think we need
> code to identify those queries capable of being converted to UNIONS,
> and do that before the query gets to the CNF section.  That would be
> great, and David Hartwig has implemented a limited capability of doing
> this, but we really need a general routine to do this with 100%
> reliability.

Well, if you're talking about a routine to generate a heuristic for CNF vs.
DNF, it is possible to precalculate the query sizes for CNF and DNF
rewrites...

For conversion to CNF:

At every node:

if nodeType = AND then f(node) = f(left) + f(right)
if nodeType = OR then f(node) = f(left) * f(right)

f(root) = a reasonably (but not wonderful) metric

For DNF just switch AND and OR in the above. You may want to compute both
metrics and compare... take the smaller one and use that path.
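
As a rough illustration of the metric above, here is a small C sketch. The `QualNode` type and all function names are hypothetical and are not PostgreSQL's expression-tree structures; the sketch also assumes that every leaf (a simple comparison) has f = 1, which the message does not state. Ties go to CNF, and the arithmetic saturates so that a pathological query cannot wrap the counter.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

typedef enum { QUAL_LEAF, QUAL_AND, QUAL_OR } QualType;

/* Hypothetical binary qualification tree, used only for this sketch. */
typedef struct QualNode
{
    QualType    type;
    const struct QualNode *left;    /* NULL for a leaf */
    const struct QualNode *right;   /* NULL for a leaf */
} QualNode;

/* Addition that clamps at ULONG_MAX instead of wrapping. */
static unsigned long
sat_add(unsigned long a, unsigned long b)
{
    return (b > ULONG_MAX - a) ? ULONG_MAX : a + b;
}

/* Multiplication that clamps at ULONG_MAX instead of wrapping. */
static unsigned long
sat_mul(unsigned long a, unsigned long b)
{
    return (a != 0 && b > ULONG_MAX / a) ? ULONG_MAX : a * b;
}

/* f(node): nodes of type sum_op add their children, the other type multiplies. */
static unsigned long
rewrite_size(const QualNode *node, QualType sum_op)
{
    unsigned long l;
    unsigned long r;

    assert(node != NULL);
    if (node->type == QUAL_LEAF)
        return 1;               /* assumption: one simple comparison counts as 1 */
    l = rewrite_size(node->left, sum_op);
    r = rewrite_size(node->right, sum_op);
    return (node->type == sum_op) ? sat_add(l, r) : sat_mul(l, r);
}

/* CNF estimate: AND adds, OR multiplies. */
unsigned long
cnf_size(const QualNode *root)
{
    return rewrite_size(root, QUAL_AND);
}

/* DNF estimate: the same rule with AND and OR switched. */
unsigned long
dnf_size(const QualNode *root)
{
    return rewrite_size(root, QUAL_OR);
}

/* Nonzero only when DNF is strictly smaller; ties stay with CNF. */
int
prefer_dnf(const QualNode *root)
{
    return dnf_size(root) < cnf_size(root);
}
```

For a qualification of the shape (A AND B) OR (C AND D), the CNF estimate is 2 * 2 = 4 and the DNF estimate is 1 + 1 = 2, so the sketch would pick DNF.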

How to deal with other operators depends on their implementation...

Taral


From taral@mail.utexas.edu Fri Oct  2 12:48:27 1998
Received: from mail.utexas.edu (wb4-a.mail.utexas.edu [128.83.126.140])
	by candle.pha.pa.us (8.9.0/8.9.0) with SMTP id MAA11438
	for <maillist@candle.pha.pa.us>; Fri, 2 Oct 1998 12:48:25 -0400 (EDT)
Received: (qmail 15628 invoked by uid 0); 2 Oct 1998 16:47:50 -0000
Received: from dial-42-8.ots.utexas.edu (HELO taral) (128.83.111.216)
  by umbs-smtp-4 with SMTP; 2 Oct 1998 16:47:50 -0000
From: "Taral" <taral@mail.utexas.edu>
To: "Bruce Momjian" <maillist@candle.pha.pa.us>
Cc: <hackers@postgreSQL.org>
Subject: RE: [GENERAL] Long update query ? (also Re: [GENERAL] CNF vs. DNF)
Date: Fri, 2 Oct 1998 11:47:48 -0500
Message-ID: <000301bdee24$63308740$3b291f0a@taral>
MIME-Version: 1.0
Content-Type: text/plain;
	charset="iso-8859-1"
Content-Transfer-Encoding: 7bit
X-Priority: 3 (Normal)
X-MSMail-Priority: Normal
X-Mailer: Microsoft Outlook 8.5, Build 4.71.2173.0
In-reply-to: <199810021640.MAA10925@candle.pha.pa.us>
Importance: Normal
X-MimeOLE: Produced By Microsoft MimeOLE V4.72.3155.0
Status: RO

> > Create a temporary oid hash? (for each table selected on, I guess)
>
> What I did with indexes was to run the previous OR clause index
> restrictions through the qualification code, and make sure it failed,
> but I am not sure how that is going to work with a more complex WHERE
> clause.  Perhaps I need to restrict this to just simple cases of
> constants, which are easy to pick out and run through.  Doing this with
> joins would be very hard, I think.

Actually, I was thinking more of an index of returned rows... After each
subquery, the backend would check each row to see if it was already in the
index... Simple duplicate check, in other words. Of course, I don't know how
well this would behave with large tables being returned...

Anyone else have some ideas they want to throw in?

Taral


From taral@mail.utexas.edu Fri Oct  2 17:13:01 1998
Received: from mail.utexas.edu (wb1-a.mail.utexas.edu [128.83.126.134])
	by candle.pha.pa.us (8.9.0/8.9.0) with SMTP id RAA20838
	for <maillist@candle.pha.pa.us>; Fri, 2 Oct 1998 17:12:27 -0400 (EDT)
Received: (qmail 17418 invoked by uid 0); 2 Oct 1998 21:12:19 -0000
Received: from dial-46-30.ots.utexas.edu (HELO taral) (128.83.112.158)
  by umbs-smtp-1 with SMTP; 2 Oct 1998 21:12:19 -0000
From: "Taral" <taral@mail.utexas.edu>
To: "Bruce Momjian" <maillist@candle.pha.pa.us>, <jwieck@debis.com>
Cc: <hackers@postgreSQL.org>
Subject: RE: [HACKERS] RE: [GENERAL] Long update query ? (also Re: [GENERAL] CNF vs. DNF)
Date: Fri, 2 Oct 1998 16:12:19 -0500
Message-ID: <000001bdee49$56c7cd40$3b291f0a@taral>
MIME-Version: 1.0
Content-Type: text/plain;
	charset="iso-8859-1"
Content-Transfer-Encoding: 7bit
X-Priority: 3 (Normal)
X-MSMail-Priority: Normal
X-Mailer: Microsoft Outlook 8.5, Build 4.71.2173.0
In-reply-to: <199810021758.NAA15524@candle.pha.pa.us>
Importance: Normal
X-MimeOLE: Produced By Microsoft MimeOLE V4.72.3155.0
Status: ROr

> Another idea is that we rewrite queries such as:
>
> 	SELECT *
> 	FROM tab
> 	WHERE (a=1 AND b=2 AND c=3) OR
> 	      (a=1 AND b=2 AND c=4) OR
> 	      (a=1 AND b=2 AND c=5) OR
> 	      (a=1 AND b=2 AND c=6)
>
> into:
>
> 	SELECT *
> 	FROM tab
> 	WHERE (a=1 AND b=2) AND (c=3 OR c=4 OR c=5 OR c=6)

Very nice, but that's like trying to code factorization of numbers... not
pretty, and very CPU intensive on complex queries...

Taral


From taral@mail.utexas.edu Fri Oct  2 17:49:59 1998
Received: from mail.utexas.edu (wb2-a.mail.utexas.edu [128.83.126.136])
	by candle.pha.pa.us (8.9.0/8.9.0) with SMTP id RAA21488
	for <maillist@candle.pha.pa.us>; Fri, 2 Oct 1998 17:49:52 -0400 (EDT)
Received: (qmail 23729 invoked by uid 0); 2 Oct 1998 21:49:27 -0000
Received: from dial-2-6.ots.utexas.edu (HELO taral) (128.83.204.22)
  by umbs-smtp-2 with SMTP; 2 Oct 1998 21:49:27 -0000
From: "Taral" <taral@mail.utexas.edu>
To: "Bruce Momjian" <maillist@candle.pha.pa.us>
Cc: <jwieck@debis.com>, <hackers@postgreSQL.org>
Subject: RE: [HACKERS] RE: [GENERAL] Long update query ? (also Re: [GENERAL] CNF vs. DNF)
Date: Fri, 2 Oct 1998 16:49:26 -0500
Message-ID: <000001bdee4e$86688b20$3b291f0a@taral>
MIME-Version: 1.0
Content-Type: text/plain;
	charset="iso-8859-1"
Content-Transfer-Encoding: 7bit
X-Priority: 3 (Normal)
X-MSMail-Priority: Normal
X-Mailer: Microsoft Outlook 8.5, Build 4.71.2173.0
In-Reply-To: <199810022139.RAA21082@candle.pha.pa.us>
Importance: Normal
X-MimeOLE: Produced By Microsoft MimeOLE V4.72.3155.0
Status: ROr

> > Very nice, but that's like trying to code factorization of numbers... not
> > pretty, and very CPU intensive on complex queries...
>
> Yes, but how large are the WHERE clauses going to be?  Considering the
> cost of cnfify() and UNION, it seems like a clear win.  Is it general
> enough to solve our problems?

Could be... the examples I received where the cnfify() was really bad were
cases where the query was submitted already in DNF... and where the UNION was
a simple one. However, I don't know of any algorithms for generic
simplification of logical constraints. One problem is resolution/selection
of factors:

SELECT * FROM a WHERE (a = 1 AND b = 2 AND c = 3) OR
(a = 4 AND b = 2 AND c = 3) OR (a = 1 AND b = 5 AND c = 3) OR
(a = 1 AND b = 2 AND c = 6);

Try that on for size. You can understand why that code gets ugly, fast.
Somebody could try coding it, but it's not a clear win to me.
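
To make the difficulty concrete, here is a minimal C sketch of the simplest form of factoring: pulling out the atoms shared by every OR grouping. It assumes, purely for illustration, that each grouping is encoded as a bitmask with one bit per distinct "column = value" test; `TermMask` and `factor_common_atoms` are hypothetical names and this is not how the backend represents qualifications.

```c
#include <assert.h>
#include <stddef.h>

/* One OR grouping: bit i is set when atom i (e.g. "a = 1") is ANDed into it. */
typedef unsigned int TermMask;

/*
 * Return the atoms present in every term and clear them from each term
 * in place, so that the original qualification equals
 *     (common atoms) AND (term[0] OR term[1] OR ...).
 * A term left at 0 means that grouping consisted only of common atoms,
 * in which case the OR part is trivially true and the caller can drop it.
 */
TermMask
factor_common_atoms(TermMask *terms, size_t nterms)
{
    TermMask    common;
    size_t      i;

    assert(terms != NULL || nterms == 0);
    if (nterms == 0)
        return 0;

    common = terms[0];
    for (i = 1; i < nterms; i++)
        common &= terms[i];     /* keep only atoms shared by all terms */

    for (i = 0; i < nterms; i++)
        terms[i] &= ~common;    /* what remains inside each OR grouping */

    return common;
}
```

With the atoms of the four-way query above numbered a=1, b=2, c=3, a=4, b=5, c=6 (bits 0 to 5), the groupings are 7, 14, 21 and 35 and the common mask comes out as 0: no atom appears in all four, so this simple pass factors nothing and a more expensive pairwise search would be needed.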

My original heuristic was missing one thing: "Where the heuristic fails to
process or decide, default to CNF." Since that's the current behavior, we're
less likely to break things.

Taral


From owner-pgsql-hackers@hub.org Fri Oct  2 19:28:09 1998
Received: from renoir.op.net (root@renoir.op.net [209.152.193.4])
	by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id TAA23341
	for <maillist@candle.pha.pa.us>; Fri, 2 Oct 1998 19:28:08 -0400 (EDT)
Received: from hub.org (majordom@hub.org [209.47.148.200]) by renoir.op.net (o1/$ Revision: 1.18 $) with ESMTP id SAA18003 for <maillist@candle.pha.pa.us>; Fri, 2 Oct 1998 18:21:37 -0400 (EDT)
Received: from localhost (majordom@localhost)
	by hub.org (8.8.8/8.8.8) with SMTP id SAA01250;
	Fri, 2 Oct 1998 18:08:02 -0400 (EDT)
	(envelope-from owner-pgsql-hackers@hub.org)
Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Fri, 02 Oct 1998 18:04:37 +0000 (EDT)
Received: (from majordom@localhost)
	by hub.org (8.8.8/8.8.8) id SAA00847
	for pgsql-hackers-outgoing; Fri, 2 Oct 1998 18:04:35 -0400 (EDT)
	(envelope-from owner-pgsql-hackers@postgreSQL.org)
X-Authentication-Warning: hub.org: majordom set sender to owner-pgsql-hackers@postgreSQL.org using -f
Received: from mail.utexas.edu (wb2-a.mail.utexas.edu [128.83.126.136])
	by hub.org (8.8.8/8.8.8) with SMTP id SAA00806
	for <hackers@postgreSQL.org>; Fri, 2 Oct 1998 18:04:26 -0400 (EDT)
	(envelope-from taral@mail.utexas.edu)
Received: (qmail 29662 invoked by uid 0); 2 Oct 1998 22:04:25 -0000
Received: from dial-2-6.ots.utexas.edu (HELO taral) (128.83.204.22)
  by umbs-smtp-2 with SMTP; 2 Oct 1998 22:04:25 -0000
From: "Taral" <taral@mail.utexas.edu>
To: "Bruce Momjian" <maillist@candle.pha.pa.us>
Cc: <jwieck@debis.com>, <hackers@postgreSQL.org>
Subject: RE: [HACKERS] RE: [GENERAL] Long update query ? (also Re: [GENERAL] CNF vs. DNF)
Date: Fri, 2 Oct 1998 17:04:24 -0500
Message-ID: <000201bdee50$9d9c4320$3b291f0a@taral>
MIME-Version: 1.0
Content-Type: text/plain;
	charset="iso-8859-1"
Content-Transfer-Encoding: 7bit
X-Priority: 3 (Normal)
X-MSMail-Priority: Normal
X-Mailer: Microsoft Outlook 8.5, Build 4.71.2173.0
In-Reply-To: <199810022157.RAA21769@candle.pha.pa.us>
Importance: Normal
X-MimeOLE: Produced By Microsoft MimeOLE V4.72.3155.0
Sender: owner-pgsql-hackers@postgreSQL.org
Precedence: bulk
Status: ROr

> How do we do that with UNION, and return the right rows.  Seems the
> _join_ happening multiple times would be much worse than the factoring.

Ok... We have two problems:

1) DNF for unjoined queries.
2) Factorization for the rest.

I have some solutions for (1). Not for (2). Remember that unjoined queries
are quite common. :)

For (1), we can always try to run the multiple queries in parallel...
especially in the case where a sequential search is required.

Taral


From owner-pgsql-hackers@hub.org Sat Oct  3 23:32:35 1998
Received: from renoir.op.net (root@renoir.op.net [209.152.193.4])
	by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id XAA06644
	for <maillist@candle.pha.pa.us>; Sat, 3 Oct 1998 23:31:13 -0400 (EDT)
Received: from hub.org (root@hub.org [209.47.148.200]) by renoir.op.net (o1/$ Revision: 1.18 $) with ESMTP id XAA26912 for <maillist@candle.pha.pa.us>; Sat, 3 Oct 1998 23:14:01 -0400 (EDT)
Received: from localhost (majordom@localhost)
	by hub.org (8.8.8/8.8.8) with SMTP id WAA04407;
	Sat, 3 Oct 1998 22:07:05 -0400 (EDT)
	(envelope-from owner-pgsql-hackers@hub.org)
Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Sat, 03 Oct 1998 22:02:00 +0000 (EDT)
Received: (from majordom@localhost)
	by hub.org (8.8.8/8.8.8) id WAA04010
	for pgsql-hackers-outgoing; Sat, 3 Oct 1998 22:01:59 -0400 (EDT)
	(envelope-from owner-pgsql-hackers@postgreSQL.org)
X-Authentication-Warning: hub.org: majordom set sender to owner-pgsql-hackers@postgreSQL.org using -f
Received: from candle.pha.pa.us (maillist@s5-03.ppp.op.net [209.152.195.67])
	by hub.org (8.8.8/8.8.8) with ESMTP id WAA03968
	for <hackers@postgreSQL.org>; Sat, 3 Oct 1998 22:00:37 -0400 (EDT)
	(envelope-from maillist@candle.pha.pa.us)
Received: (from maillist@localhost)
	by candle.pha.pa.us (8.9.0/8.9.0) id VAA04640;
	Sat, 3 Oct 1998 21:57:30 -0400 (EDT)
From: Bruce Momjian <maillist@candle.pha.pa.us>
Message-Id: <199810040157.VAA04640@candle.pha.pa.us>
Subject: Re: [HACKERS] RE: [GENERAL] Long update query ? (also Re: [GENERAL] CNF vs. DNF)
In-Reply-To: <000201bdee50$9d9c4320$3b291f0a@taral> from Taral at "Oct 2, 1998  5: 4:24 pm"
To: taral@mail.utexas.edu (Taral)
Date: Sat, 3 Oct 1998 21:57:30 -0400 (EDT)
Cc: jwieck@debis.com, hackers@postgreSQL.org
X-Mailer: ELM [version 2.4ME+ PL47 (25)]
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Sender: owner-pgsql-hackers@postgreSQL.org
Precedence: bulk
Status: RO


I have another idea.

When we cnfify, this:

	(A AND B) OR (C AND D)

becomes

	(A OR C) AND (A OR D) AND (B OR C) AND (B OR D)

however if A and C are identical, this could become:

	(A OR A) AND (A OR D) AND (B OR A) AND (B OR D)

and A OR A is A:

	A AND (A OR D) AND (B OR A) AND (B OR D)

and since we are now saying A has to be true, we can remove OR's with A:

	A AND (B OR D)

Much smaller, and a big win for queries like:

	SELECT *
	FROM tab
	WHERE	(a=1 AND b=2) OR
		(a=1 AND b=3)

This becomes:

		(a=1) AND (b=2 OR b=3)

which is accurate, and uses our OR indexing.

Seems I could code cnfify() to look for identical qualifications in two
joined OR clauses and remove the duplicates.
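
The reduction walked through above can be sketched in C under one simplifying assumption: each OR clause of the CNF result is stored as a bitmask with one bit per distinct qualification, so "A OR A" collapses to "A" by construction. `ClauseMask` and `remove_redundant_clauses` are hypothetical names for this sketch, not part of cnfify().

```c
#include <assert.h>
#include <stddef.h>

/* One OR clause of a CNF qualification: bit i is set when atom i appears. */
typedef unsigned int ClauseMask;

/*
 * Copy the clauses of `in` that are still needed into `out` (which must
 * have room for n entries) and return how many were kept.
 *
 * A clause is dropped when another clause's atoms are a subset of its own:
 * if "A" must be true then "A OR D" is true as well.  Of several identical
 * clauses only the first is kept.
 */
size_t
remove_redundant_clauses(const ClauseMask *in, size_t n, ClauseMask *out)
{
    size_t      i;
    size_t      j;
    size_t      kept = 0;

    assert((in != NULL && out != NULL) || n == 0);

    for (i = 0; i < n; i++)
    {
        int         redundant = 0;

        for (j = 0; j < n && !redundant; j++)
        {
            if (j == i)
                continue;
            /* in[j] is a subset of in[i] when it has no atom outside in[i] */
            if ((in[j] & ~in[i]) == 0 && (in[j] != in[i] || j < i))
                redundant = 1;
        }
        if (!redundant)
            out[kept++] = in[i];
    }
    return kept;
}
```

With A, B and D on bits 0, 1 and 3, the four clauses (A OR A), (A OR D), (B OR A), (B OR D) are 1, 9, 3 and 10; the pass keeps only 1 and 10, that is, A AND (B OR D). The double loop is quadratic in the number of clauses, which is one reason to apply it while the clause list is still small.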

Sounds like a big win, and fairly easy and inexpensive in processing time.

Comments?

--
  Bruce Momjian                        |  http://www.op.net/~candle
  maillist@candle.pha.pa.us            |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026


From taral@mail.utexas.edu Sat Oct  3 22:43:41 1998
Received: from renoir.op.net (root@renoir.op.net [209.152.193.4])
	by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id WAA05961
	for <maillist@candle.pha.pa.us>; Sat, 3 Oct 1998 22:42:18 -0400 (EDT)
Received: from mail.utexas.edu (wb2-a.mail.utexas.edu [128.83.126.136]) by renoir.op.net (o1/$ Revision: 1.18 $) with SMTP id WAA25111 for <maillist@candle.pha.pa.us>; Sat, 3 Oct 1998 22:27:34 -0400 (EDT)
Received: (qmail 25622 invoked by uid 0); 4 Oct 1998 02:26:21 -0000
Received: from dial-42-9.ots.utexas.edu (HELO taral) (128.83.111.217)
  by umbs-smtp-2 with SMTP; 4 Oct 1998 02:26:21 -0000
From: "Taral" <taral@mail.utexas.edu>
To: "Bruce Momjian" <maillist@candle.pha.pa.us>
Cc: <jwieck@debis.com>, <hackers@postgreSQL.org>
Subject: RE: [HACKERS] RE: [GENERAL] Long update query ? (also Re: [GENERAL] CNF vs. DNF)
Date: Sat, 3 Oct 1998 21:26:20 -0500
Message-ID: <000501bdef3e$5f5293a0$3b291f0a@taral>
MIME-Version: 1.0
Content-Type: text/plain;
	charset="iso-8859-1"
Content-Transfer-Encoding: 7bit
X-Priority: 3 (Normal)
X-MSMail-Priority: Normal
X-Mailer: Microsoft Outlook 8.5, Build 4.71.2173.0
Importance: Normal
In-Reply-To: <199810040157.VAA04640@candle.pha.pa.us>
X-MimeOLE: Produced By Microsoft MimeOLE V4.72.3155.0
Status: ROr

> however if A and C are identical, this could become:
>
> 	(A OR A) AND (A OR D) AND (B OR A) AND (B OR D)
>
> and A OR A is A:
>
> 	A AND (A OR D) AND (B OR A) AND (B OR D)
>
> and since we are now saying A has to be true, we can remove OR's with A:
>
> 	A AND (B OR D)

Very nice... and you could do that after each iteration of the rewrite,
preventing the size from getting too big. :)

I have a symbolic expression tree evaluator that would be perfect for
this... I'll see if I can't adapt it.

Can someone mail me the structures for expression trees? I don't want to
have to excise them from the source. Please?

Taral


From daveh@insightdist.com Mon Nov  9 13:31:07 1998
Received: from renoir.op.net (root@renoir.op.net [209.152.193.4])
	by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id NAA00997
	for <maillist@candle.pha.pa.us>; Mon, 9 Nov 1998 13:31:00 -0500 (EST)
Received: from u1.abs.net (root@u1.abs.net [207.114.0.131]) by renoir.op.net (o1/$ Revision: 1.18 $) with ESMTP id NAA26657 for <maillist@candle.pha.pa.us>; Mon, 9 Nov 1998 13:10:14 -0500 (EST)
Received: from insightdist.com (nobody@localhost)
	by u1.abs.net (8.9.0/8.9.0) with UUCP id MAA17710
	for maillist@candle.pha.pa.us; Mon, 9 Nov 1998 12:52:05 -0500 (EST)
X-Authentication-Warning: u1.abs.net: nobody set sender to insightdist.com!daveh using -f
Received: from ceodev by insightdist.com (AIX 3.2/UCB 5.64/4.03)
          id AA43498; Mon, 9 Nov 1998 12:38:24 -0500
Received: from daveh by ceodev (AIX 4.1/UCB 5.64/4.03)
          id AA54446; Mon, 9 Nov 1998 12:38:23 -0500
Message-Id: <3647296F.6F7FDDD2@insightdist.com>
Date: Mon, 09 Nov 1998 12:42:07 -0500
From: David Hartwig <daveh@insightdist.com>
Organization: Insight Distribution Systems
X-Mailer: Mozilla 4.5 [en] (Win98; I)
X-Accept-Language: en
Mime-Version: 1.0
To: Bob Kruger <bkruger@mindspring.com>,
        Bruce Momjian <maillist@candle.pha.pa.us>
Cc: pgsql-general@postgreSQL.org, Byron Nikolaidis <byronn@insightdist.com>
Subject: Re: [GENERAL] Incrementing a Serial Field
References: <3.0.5.32.19981109110757.0082c950@mindspring.com>
Content-Type: multipart/mixed;
	boundary="------------3D3EE7F67DFC542D3928BB7E"
Status: ROr

This is a multi-part message in MIME format.
--------------3D3EE7F67DFC542D3928BB7E
Content-Type: multipart/alternative;
 boundary="------------43E2CC34278FA08EFC9E0611"


--------------43E2CC34278FA08EFC9E0611
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit


Bob Kruger wrote:

> The second question is that I noticed the ODBC bug (feature?) when linking
> Postgres to MS Access still exists.  This bug occurs when linking a MS
> Access table to a Postgres table, and identifying more than one field as
> the unique record identifier.  This makes Postgres run until it exhausts
> all available memory.  Does anyone know a way around this?  Enabling read
> only ODBC is a feature I would like to make available, but I do not want
> the possibility of postgres crashing because of an error on the part of a
> MS Access user.
>
> BTW - Having capability to be linked to an Access database is not an
> option.  The current project I am working on calls for that, so it is a
> necessary evil that I have to live with.
>

In the driver connection settings add the following line.

    SET ksqo TO 'on';

Stands for: keyset query optimization.  This is not considered a final
solution.  As such, it is undocumented.   Some time in the next day or so, we
will be releasing a version of the driver which will automatically SET ksqo.

You will most likely be satisfied with the results.  One problem with this
solution, however, is that it does not work if you have any (some kinds of?)
arrays in the table you are browsing.  This is a side effect of the rewrite to a
UNION, which performs an internal sort unique.

Also, if you are using row versioning you may need to overload some operators
for xid and int4.  I have included a script that will take care of this.

Bruce, can I get these operators hardcoded into 6.4.1, assuming there will be
one?  The operators are necessitated by the UNION side effects.


--------------43E2CC34278FA08EFC9E0611--

--------------3D3EE7F67DFC542D3928BB7E
Content-Type: text/plain; charset=us-ascii;
 name="xidint4.sql"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline;
 filename="xidint4.sql"

--   Insight Distribution Systems - System V - Apr 1998
--   @(#)xidint4.sql	1.2 :/sccs/sql/extend/s.xidint4.sql 10/2/98 13:40:19"

create function int4eq(xid,int4)
  returns bool
  as ''
  language 'internal';

create operator = (
        leftarg=xid,
        rightarg=int4,
        procedure=int4eq,
        commutator='=',
        negator='<>',
        restrict=eqsel,
        join=eqjoinsel
        );

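create function int4lt(xid,xid)
  returns bool
  as ''
  language 'internal';

-- No commutator, negator, restrict or join clause is given here: the
-- equality settings used for "=" above ('=', '<>', eqsel, eqjoinsel)
-- would be wrong for "<", and all four clauses are optional.
create operator < (
        leftarg=xid,
        rightarg=xid,
        procedure=int4lt
        );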


--------------3D3EE7F67DFC542D3928BB7E--