From daybee@bellatlantic.net Sun Aug 23 20:21:48 1998 Received: from iconmail.bellatlantic.net (iconmail.bellatlantic.net [199.173.162.30]) by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id UAA26688 for ; Sun, 23 Aug 1998 20:21:46 -0400 (EDT) Received: from bellatlantic.net (client196-126-169.bellatlantic.net [151.196.126.169]) by iconmail.bellatlantic.net (IConNet Sendmail) with ESMTP id UAA09478; Sun, 23 Aug 1998 20:18:35 -0400 (EDT) Message-ID: <35E0ABF0.578694C8@bellatlantic.net> Date: Sun, 23 Aug 1998 19:55:29 -0400 From: David Hartwig Organization: Home X-Mailer: Mozilla 4.04 [en] (Win95; I) MIME-Version: 1.0 To: Bruce Momjian CC: hannu@trust.ee, pgsql-interfaces@postgreSQL.org, hackers@postgreSQL.org Subject: Re: [INTERFACES] Re: [HACKERS] changes in 6.4 References: <199808220353.XAA04528@candle.pha.pa.us> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Status: ROr Bruce Momjian wrote: > > > > Hannu Krosing wrote: > > > > > > The days where every release fixed server crashes, or added a feature > > > > that users were 'screaming for' may be a thing of the past. > > > > > > Is anyone working on fixing the exploding optimisations for many OR-s, > > > at least the canonic case used by access? > > > > > > My impression is that this has fallen somewhere between > > > insightdist and Vadim. > > > > This is really big for the ODBCers. (And I suspect for JDBCers too.) Many > > desktop libraries and end-user tools depend on this "record set" strategy to > > operate effectively. > > > > I have put together a workable hack that runs just before cnfify(). The > > option is activated through the SET command. Once activated, it identifies > > queries with this particular multi-OR pattern generated by these RECORD SET > > strategies. Qualified query trees are rewritten as multiple UNIONs. (One > > for each OR grouping). > > > > The results are profound. Queries that used to scan tables because of the > > ORs, now make use of any indexes. 
Thus, the size of the table has virtually > > no effect on performance. Furthermore, queries that used to crash the > > backend, now run in under a second. > > > > Currently the down sides are: > > 1. If there is no usable index, performance is significantly worse. The > > patch does not check to make sure that there is a usable index. I could use > > some pointers on this. > > > > 2. Small tables are actually a bit slower than without the patch. > > > > 3. Not very elegant. I am looking for a more generalized solution. > > I have lots of ideas, but I would need to know the backend much better before > > attempting any of them. My favorite idea is before cnfify(), to factor the > > OR terms and pull out the constants into a virtual (temporary) table spaces. > > Then rewrite the query as a join. The optimizer will (should) treat the new > > query accordingly. This assumes that an efficient factoring algorithm exists > > and that temporary tables can exist in the heap. > > > > Illustration: > > SELECT ... FROM tab WHERE > > (var1 = const1 AND var2 = const2) OR > > (var1 = const3 AND var2 = const4) OR > > (var1 = const5 AND var2 = const6) > > > > SELECT ... FROM tab, tmp WHERE > > (var1 = var_x AND var2 = var_y) > > > > tmp > > var_x | var_y > > -------------- > > const1|const2 > > const3|const4 > > const5|const6 > > David, where are we on this? I know we have OR's using indexes. Do we > still need to look this as a fix, or are we OK. I have not gotten far > enough in the optimizer to know how to fix the Bruce, If the question is, have I come up with a solution for the cnf'ify problem: No If the question is, is it still important: Very much yes. It is essential for many RAD tools using remote data objects which make use of key sets. Your recent optimization of the OR list goes a long way, but inevitably users are confronted with multi-part keys. When I look at the problem my head spins. I do not have the experience (yet?) 
with the backend to be mucking around in the optimizer. As I see it, cnf'ify is doing just what it is supposed to do. Boundless boolean logic. I think hope may lay though, in identifying each AND'ed group associated with a key and tagging it as a special sub-root node which cnf'ify does not penetrate. This node would be allowed to pass to the later stages of the optimizer where it will be used to plan index scans. Easy for me to say. In the meantime, I still have the patch that I described in prior email. It has worked well for us. Let me restate that. We could not survive without it! However, I do not feel that is a sufficiently functional approach that should be incorporated as a final solution. I will submit the patch if you, (anyone) does not come up with a better solution. It is coded to be activated by a SET KSQO to minimize its reach. From daybee@bellatlantic.net Sun Aug 30 12:06:24 1998 Received: from iconmail.bellatlantic.net (iconmail.bellatlantic.net [199.173.162.30]) by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id MAA12860 for ; Sun, 30 Aug 1998 12:06:22 -0400 (EDT) Received: from bellatlantic.net (client196-126-73.bellatlantic.net [151.196.126.73]) by iconmail.bellatlantic.net (IConNet Sendmail) with ESMTP id MAA18468; Sun, 30 Aug 1998 12:03:33 -0400 (EDT) Message-ID: <35E9726E.C6E73049@bellatlantic.net> Date: Sun, 30 Aug 1998 11:40:31 -0400 From: David Hartwig Organization: Home X-Mailer: Mozilla 4.06 [en] (Win98; I) MIME-Version: 1.0 To: Bruce Momjian CC: hannu@trust.ee, pgsql-interfaces@postgreSQL.org, hackers@postgreSQL.org Subject: Re: [INTERFACES] Re: [HACKERS] changes in 6.4 References: <199808290344.XAA28089@candle.pha.pa.us> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Status: RO Bruce Momjian wrote: > OK, let me try this one. > > Why is the system cnf'ifying the query. 
Because it wants to have a > list of qualifications that are AND'ed, so it can just pick the most > restrictive/cheapest, and evaluate that one first. > > If you have: > > (a=b and c=d) or e=1 > > In this case, without cnf'ify, it has to evaluate both of them, because > if one is false, you can't be sure another would be true. In the > cnf'ify case, > > (a=b or e=1) and (c=d or e=1) > > In this case, it can choose either, and act on just one, if a row fails > to meet it, it can stop and not evaluate it using the other restriction. > > The fact is that it is only going to use fancy join/index in one of the > two cases, so it tries to pick the best one, and does a brute-force > qualification test on the remaining item if the first one tried is true. > > The problem is of course large where clauses can exponentially expand > this. What it really trying to do is to pick a cheapest restriction, > but the memory explosion and query failure are serious problems. > > The issue is that it thinks it is doing something to help things, while > it is actually hurting things. > > In the ODBC case of: > > (x=3 and y=4) or > (x=3 and y=5) or > (x=3 and y=6) or ... > > it clearly is not going to gain anything by choosing any CHEAPEST path, > because they are all the same in terms of cost, and the use by ODBC > clients is hurting reliability. > > I am inclined to agree with David's solution of breaking apart the query > into separate UNION queries in certain cases. It seems to be the most > logical solution, because the cnf'ify code is working counter to its > purpose in these cases. > > Now, the question is how/where to implement this. I see your idea of > making the OR a join to a temp table that holds all the constants. > Another idea would be to do actual UNION queries: > > SELECT * FROM tab > WHERE (x=3 and y=4) > UNION > SELECT * FROM tab > WHERE (x=3 and y=5) > UNION > SELECT * FROM tab > WHERE (x=3 and y=6) ... 
> > This would work well for tables with indexes, but for a sequential scan, > you are doing a sequential scan for each UNION. Practically speaking, the lack of an index concern, may not be justified. The reason these queries are being generated, with this shape, is because remote data objects on the client side are being told that a primary key exists on these tables. The object is told about these keys in one of two ways. 1. It queries the database for the primary key of the table. The ODBC driver serviced this request by querying for the attributes used in {table_name}_pkey. 2. The user manually specifies the primary key. In this case an actual index may not exist. (i.e. MS Access asks the user for this information if a primary key is not found in a table) The second case is the only one that would cause a problem. Fortunately, the solution is simple. Add a primary key index! My only concern is to be able to accurately identify a query with the proper signature before rewriting it as a UNION. To what degree should this inspection be taken? BTW, I would not do the rewrite on OR's without AND's since you have fixed the OR's use of the index. There is one other potential issue. My experience with using arrays in tables and UNIONS creates problems. There are missing array comparison operators which are used by the implied DISTINCT. > Another idea is > subselects. Also, you have to make sure you return the proper rows, > keeping duplicates where they are in the base table, but not returning > them when the meet more than one qualification. > > SELECT * FROM tab > WHERE (x,y) IN (SELECT 3, 4 > UNION > SELECT 3, 5 > UNION > SELECT 3, 6) > > I believe we actually support this. This is not going to use an index > on tab, so it may be slow if x and y are indexed. > > Another more bizarre solution is: > > SELECT * FROM tab > WHERE (x,y) = (SELECT 3, 4) OR > (x,y) = (SELECT 3, 5) OR > (x,y) = (SELECT 3, 6) > > Again, I think we do this too. 
I don't think cnf'ify does anything with
> this. I also believe "=" uses indexes on subselects, while IN does not
> because IN could return lots of rows, and an index is slower than a
> non-index join on lots of rows. Of course, now that we index OR's.
>
> Let me ask another question. If I do:
>
>     SELECT * FROM tab WHERE x=3 OR x=4
>
> it works, and uses indexes. Why can't the optimizer just not cnf'ify
> things sometimes, and just do:
>
>     SELECT * FROM tab
>     WHERE (x=3 AND y=4) OR
>           (x=3 AND y=5) OR
>           (x=3 AND y=6)
>
> Why can it handle x=3 OR x=4, but not the more complicated case above,
> without trying to be too smart? If x,y is a multi-key index, it could
> use that quite easily. If not, it can do a sequential scan and run the
> tests.
>
> Another issue. To the optimizer, x=3 and x=y are totally different. In
> x=3, it is a column compared to a constant, while in x=y, it is a join.
> That makes a huge difference.
>
> In the case of (a=b and c=d) or e=1, you pick the best path and do the
> a=b join, and throw in the e=1 entries. You can't easily do both joins,
> because you also need the e=1 stuff.
>
> I wonder what would happen if we prevent cnf'ifying of cases where the
> OR's represent only column = constant restrictions.
>
> I meant to really go through the optimizer this month, but other backend
> items took my time.
>
> Can someone run some tests on disabling the cnf'ify calls? It is my
> understanding that with the non-cnf-ify'ed query, it can't choose an
> optimal path, and starts to do either straight index matches,
> sequential scans, or cartesian products where it joins every row to
> every other row looking for a match.
>
> Let's say we turn off cnf-ify just for non-join queries. Does that
> help?
>
> I am not sure of the ramifications of telling the optimizer it no longer
> has a variety of paths to choose for evaluating the query.

I did not try this earlier because I thought it was too good to be true. I was right.
I tried commenting out the normalize() function in the cnfify(). The EXPLAIN showed a sequential scan and the resulting tuple set was empty. Time will not allow me to dig into this further this weekend. Unless you come up with a better solution, I am going to submit my patch on Monday to make the Sept. 1st deadline. It includes a SET switch to activate the rewrite so as not to cause problems outside the ODBC users. We can either improve, it or yank it, by the Oct. 1st deadline. From infotecn@tin.it Mon Aug 31 03:01:51 1998 Received: from mail.tol.it (mail.tin.it [194.243.154.49]) by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id DAA09740 for ; Mon, 31 Aug 1998 03:01:48 -0400 (EDT) Received: from Server.InfoTecna.com (a-mz6-50.tin.it [212.216.9.113]) by mail.tol.it (8.8.4/8.8.4) with ESMTP id JAA16451; Mon, 31 Aug 1998 09:00:35 +0200 (MET DST) Received: from tm3.InfoTecna.com (Tm1.InfoTecna.com [192.168.1.1]) by Server.InfoTecna.com (8.8.5/8.8.5) with SMTP id IAA18678; Mon, 31 Aug 1998 08:53:13 +0200 Message-Id: <3.0.5.32.19980831085312.00986cc0@MBox.InfoTecna.com> X-Sender: denis@MBox.InfoTecna.com X-Mailer: QUALCOMM Windows Eudora Light Version 3.0.5 (32) Date: Mon, 31 Aug 1998 08:53:12 +0200 To: David Hartwig , Bruce Momjian From: Sbragion Denis Subject: Re: [INTERFACES] Re: [HACKERS] changes in 6.4 Cc: hannu@trust.ee, pgsql-interfaces@postgreSQL.org, hackers@postgreSQL.org In-Reply-To: <35E9726E.C6E73049@bellatlantic.net> References: <199808290344.XAA28089@candle.pha.pa.us> Mime-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Status: RO Hello, At 11.40 30/08/98 -0400, David Hartwig wrote: >> Why is the system cnf'ifying the query. Because it wants to have a >> list of qualifications that are AND'ed, so it can just pick the most >> restrictive/cheapest, and evaluate that one first. Just a small question about all this optimizations stuff. I'm not a database expert but I think we are talking about a NP-complete problem. 
Couldn't we convert this optimization problem into another NP one that is
known to have a good solution? For example, for the traveling salesman
problem there's an algorithm that provides a solution that's never more than
two times the optimal one and provides results that are *really* near the
optimal one most of the time. The simplex algorithm may be another example.
I think that this kind of algorithm would be better than a collection of
tricks for special cases, and these tricks could be used anyway when special
cases are detected. Furthermore, I also know that there exists a free program
I used in the past that provides this kind of optimization for chip design.
I don't remember the exact name of the program but I remember it came from
Berkeley university.

Of course, maybe I'm totally missing the point.

Hope it helps!

Bye!

Dr. Sbragion Denis
InfoTecna
Tel, Fax: +39 39 2324054
URL: http://space.tin.it/internet/dsbragio

From andreas.zeugswetter@telecom.at Mon Aug 31 06:31:13 1998
Received: from renoir.op.net (root@renoir.op.net [209.152.193.4]) by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id GAA14231 for ; Mon, 31 Aug 1998 06:31:12 -0400 (EDT)
Received: from gandalf.telecom.at (gandalf.telecom.at [194.118.26.84]) by renoir.op.net (o1/$ Revision: 1.18 $) with ESMTP id GAA21099 for ; Mon, 31 Aug 1998 06:23:41 -0400 (EDT)
Received: from zeugswettera.user.lan.at (zeugswettera.user.lan.at [10.4.123.227]) by gandalf.telecom.at (A.B.C.Delta4/8.8.8) with SMTP id MAA38132; Mon, 31 Aug 1998 12:22:07 +0200
Received: by zeugswettera.user.lan.at with Microsoft Mail id <01BDD4DA.C7F5B690@zeugswettera.user.lan.at>; Mon, 31 Aug 1998 12:27:55 +0200
Message-ID: <01BDD4DA.C7F5B690@zeugswettera.user.lan.at>
From: Andreas Zeugswetter
To: "'maillist@candle.pha.pa.us'"
Cc: "hackers@postgreSQL.org"
Subject: AW: [INTERFACES] Re: [HACKERS] changes in 6.4
Date: Mon, 31 Aug 1998 12:22:05 +0200
Encoding: 31 TEXT
Status: RO

>Another idea would be to do actual UNION queries:
>
> SELECT * FROM tab >
WHERE (x=3 and y=4) > UNION > SELECT * FROM tab > WHERE (x=3 and y=5) > UNION > SELECT * FROM tab > WHERE (x=3 and y=6) ... > >This would work well for tables with indexes, but for a sequential scan, >you are doing a sequential scan for each UNION. The most important Application for this syntax will be M$ Access because it uses this syntax to display x rows from a table in a particular sort order. In this case x and y will be the primary key and therefore have a unique index. So I think this special case should work good. The strategy could be something like: iff x, y is a unique index do the union access path else do something else done I think hand written SQL can always be rewritten if it is not fast enough using this syntax. Andreas From owner-pgsql-patches@hub.org Tue Sep 1 02:01:10 1998 Received: from hub.org (hub.org [209.47.148.200]) by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id CAA28687 for ; Tue, 1 Sep 1998 02:01:06 -0400 (EDT) Received: from localhost (majordom@localhost) by hub.org (8.8.8/8.7.5) with SMTP id BAA02180; Tue, 1 Sep 1998 01:48:43 -0400 (EDT) Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Tue, 01 Sep 1998 01:47:48 +0000 (EDT) Received: (from majordom@localhost) by hub.org (8.8.8/8.7.5) id BAA02160 for pgsql-patches-outgoing; Tue, 1 Sep 1998 01:47:46 -0400 (EDT) Received: from iconmail.bellatlantic.net (iconmail.bellatlantic.net [199.173.162.30]) by hub.org (8.8.8/8.7.5) with ESMTP id BAA02147 for ; Tue, 1 Sep 1998 01:47:42 -0400 (EDT) Received: from bellatlantic.net (client196-126-3.bellatlantic.net [151.196.126.3]) by iconmail.bellatlantic.net (IConNet Sendmail) with ESMTP id XAA27530 for ; Mon, 31 Aug 1998 23:24:07 -0400 (EDT) Message-ID: <35EB2B33.EBF1E9AA@bellatlantic.net> Date: Mon, 31 Aug 1998 19:01:07 -0400 From: David Hartwig Organization: Insight Distribution Systems X-Mailer: Mozilla 4.04 [en] (X11; I; Linux 2.0.29 i586) MIME-Version: 1.0 To: patches Subject: [PATCHES] Interim AND/OR memory exaustion fix. 
Content-Type: multipart/mixed; boundary="------------BEFD1E6DA78A2DC20B524E32" Sender: owner-pgsql-patches@hub.org Precedence: bulk Status: ROr This is a multi-part message in MIME format. --------------BEFD1E6DA78A2DC20B524E32 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit I will be cleaning this up more before the Oct 1 deadline. --------------BEFD1E6DA78A2DC20B524E32 Content-Type: text/plain; charset=us-ascii; name="keyset.patch" Content-Transfer-Encoding: 7bit Content-Disposition: inline; filename="keyset.patch" *** ./backend/commands/variable.c.orig Thu Jul 30 19:25:26 1998 --- ./backend/commands/variable.c Mon Aug 31 17:23:32 1998 *************** *** 24,29 **** --- 24,30 ---- extern bool _use_geqo_; extern int32 _use_geqo_rels_; extern bool _use_right_sided_plans_; + extern bool _use_keyset_query_optimizer; /*-----------------------------------------------------------------------*/ static const char * *************** *** 559,564 **** --- 560,568 ---- }, #endif { + "ksqo", parse_ksqo, show_ksqo, reset_ksqo + }, + { NULL, NULL, NULL, NULL } }; *************** *** 611,615 **** --- 615,663 ---- elog(NOTICE, "Unrecognized variable %s", name); + return TRUE; + } + + + /*----------------------------------------------------------------------- + KSQO code will one day be unnecessary when the optimizer makes use of + indexes when multiple ORs are specified in the where clause. + See optimizer/prep/prepkeyset.c for more on this. 
+ daveh@insightdist.com 6/16/98 + -----------------------------------------------------------------------*/ + bool + parse_ksqo(const char *value) + { + if (value == NULL) + { + reset_ksqo(); + return TRUE; + } + + if (strcasecmp(value, "on") == 0) + _use_keyset_query_optimizer = true; + else if (strcasecmp(value, "off") == 0) + _use_keyset_query_optimizer = false; + else + elog(ERROR, "Bad value for Key Set Query Optimizer (%s)", value); + + return TRUE; + } + + bool + show_ksqo() + { + + if (_use_keyset_query_optimizer) + elog(NOTICE, "Key Set Query Optimizer is ON"); + else + elog(NOTICE, "Key Set Query Optimizer is OFF"); + return TRUE; + } + + bool + reset_ksqo() + { + _use_keyset_query_optimizer = false; return TRUE; } *** ./backend/optimizer/plan/planner.c.orig Sun Aug 30 04:28:02 1998 --- ./backend/optimizer/plan/planner.c Mon Aug 31 17:23:32 1998 *************** *** 69,74 **** --- 69,75 ---- PlannerInitPlan = NULL; PlannerPlanId = 0; + transformKeySetQuery(parse); result_plan = union_planner(parse); Assert(PlannerQueryLevel == 1); *** ./backend/optimizer/prep/Makefile.orig Sun Apr 5 20:23:48 1998 --- ./backend/optimizer/prep/Makefile Mon Aug 31 17:23:32 1998 *************** *** 13,19 **** CFLAGS += -I../.. ! OBJS = prepqual.o preptlist.o prepunion.o # not ready yet: predmig.o xfunc.o --- 13,19 ---- CFLAGS += -I../.. ! OBJS = prepqual.o preptlist.o prepunion.o prepkeyset.o # not ready yet: predmig.o xfunc.o *** ./backend/optimizer/prep/prepkeyset.c.orig Mon Aug 31 17:23:32 1998 --- ./backend/optimizer/prep/prepkeyset.c Mon Aug 31 18:30:58 1998 *************** *** 0 **** --- 1,213 ---- + /*------------------------------------------------------------------------- + * + * prepkeyset.c-- + * Special preperation for keyset queries. 
+ * + * Copyright (c) 1994, Regents of the University of California + * + *------------------------------------------------------------------------- + */ + #include + #include + + #include "postgres.h" + #include "nodes/pg_list.h" + #include "nodes/parsenodes.h" + #include "utils/elog.h" + + #include "nodes/nodes.h" + #include "nodes/execnodes.h" + #include "nodes/plannodes.h" + #include "nodes/primnodes.h" + #include "nodes/relation.h" + + #include "catalog/pg_type.h" + #include "lib/stringinfo.h" + #include "optimizer/planmain.h" + /* + * Node_Copy-- + * a macro to simplify calling of copyObject on the specified field + */ + #define Node_Copy(from, newnode, field) newnode->field = copyObject(from->field) + + /***** DEBUG stuff + #define TABS {int i; printf("\n"); for (i = 0; igroupClause || + origNode->havingQual || + origNode->hasAggs || + origNode->utilityStmt || + origNode->unionClause || + origNode->unionall || + origNode->hasSubLinks || + origNode->commandType != CMD_SELECT) + return; + + /* Qualify single table query */ + + /* Qualify where clause */ + if ( ! 
inspectOrNode((Expr*)origNode->qual)) { + return; + } + + /* Copy essential elements into a union node */ + /* + elog(NOTICE, "OR_EXPR=%d, OP_EXPR=%d, AND_EXPR=%d", OR_EXPR, OP_EXPR, AND_EXPR); + elog(NOTICE, "T_List=%d, T_Expr=%d, T_Var=%d, T_Const=%d", T_List, T_Expr, T_Var, T_Const); + elog(NOTICE, "opType=%d", ((Expr*)origNode->qual)->opType); + */ + while (((Expr*)origNode->qual)->opType == OR_EXPR) { + Query *unionNode = makeNode(Query); + + /* Pull up Expr = */ + unionNode->qual = lsecond(((Expr*)origNode->qual)->args); + + /* Pull up balance of tree */ + origNode->qual = lfirst(((Expr*)origNode->qual)->args); + + /* + elog(NOTICE, "origNode: opType=%d, nodeTag=%d", ((Expr*)origNode->qual)->opType, nodeTag(origNode->qual)); + elog(NOTICE, "unionNode: opType=%d, nodeTag=%d", ((Expr*)unionNode->qual)->opType, nodeTag(unionNode->qual)); + */ + + unionNode->commandType = origNode->commandType; + unionNode->resultRelation = origNode->resultRelation; + unionNode->isPortal = origNode->isPortal; + unionNode->isBinary = origNode->isBinary; + + if (origNode->uniqueFlag) + unionNode->uniqueFlag = pstrdup(origNode->uniqueFlag); + + Node_Copy(origNode, unionNode, sortClause); + Node_Copy(origNode, unionNode, rtable); + Node_Copy(origNode, unionNode, targetList); + + origNode->unionClause = lappend(origNode->unionClause, unionNode); + } + return; + } + + + + + static int + inspectOrNode(Expr *expr) + { + int fr = 0, sr = 0; + Expr *firstExpr, *secondExpr; + + if ( ! 
(expr && nodeTag(expr) == T_Expr && expr->opType == OR_EXPR)) + return 0; + + firstExpr = lfirst(expr->args); + secondExpr = lsecond(expr->args); + if (nodeTag(firstExpr) != T_Expr || nodeTag(secondExpr) != T_Expr) + return 0; + + if (firstExpr->opType == OR_EXPR) + fr = inspectOrNode(firstExpr); + else if (firstExpr->opType == OP_EXPR) /* Need to make sure it is last */ + fr = inspectOpNode(firstExpr); + else if (firstExpr->opType == AND_EXPR) /* Need to make sure it is last */ + fr = inspectAndNode(firstExpr); + + + if (secondExpr->opType == AND_EXPR) + sr = inspectAndNode(secondExpr); + else if (secondExpr->opType == OP_EXPR) + sr = inspectOpNode(secondExpr); + + return (fr && sr); + } + + + static int + inspectAndNode(Expr *expr) + { + int fr = 0, sr = 0; + Expr *firstExpr, *secondExpr; + + if ( ! (expr && nodeTag(expr) == T_Expr && expr->opType == AND_EXPR)) + return 0; + + firstExpr = lfirst(expr->args); + secondExpr = lsecond(expr->args); + if (nodeTag(firstExpr) != T_Expr || nodeTag(secondExpr) != T_Expr) + return 0; + + if (firstExpr->opType == AND_EXPR) + fr = inspectAndNode(firstExpr); + else if (firstExpr->opType == OP_EXPR) + fr = inspectOpNode(firstExpr); + + if (secondExpr->opType == OP_EXPR) + sr = inspectOpNode(secondExpr); + + return (fr && sr); + } + + + static int + /****************************************************************** + * Return TRUE if T_Var = T_Const, else FALSE + * Actually it does not test for =. Need to do this! 
+ ******************************************************************/ + inspectOpNode(Expr *expr) + { + Expr *firstExpr, *secondExpr; + + if (nodeTag(expr) != T_Expr || expr->opType != OP_EXPR) + return 0; + + firstExpr = lfirst(expr->args); + secondExpr = lsecond(expr->args); + return (firstExpr && secondExpr && nodeTag(firstExpr) == T_Var && nodeTag(secondExpr) == T_Const); + } *** ./include/commands/variable.h.orig Thu Jul 30 19:27:05 1998 --- ./include/commands/variable.h Mon Aug 31 17:23:32 1998 *************** *** 54,58 **** --- 54,61 ---- extern bool show_geqo(void); extern bool reset_geqo(void); extern bool parse_geqo(const char *); + extern bool show_ksqo(void); + extern bool reset_ksqo(void); + extern bool parse_ksqo(const char *); #endif /* VARIABLE_H */ *** ./include/optimizer/planmain.h.orig Mon Aug 31 18:27:03 1998 --- ./include/optimizer/planmain.h Mon Aug 31 18:26:04 1998 *************** *** 67,71 **** --- 67,72 ---- extern List *check_having_qual_for_aggs(Node *clause, List *subplanTargetList, List *groupClause); extern List *check_having_qual_for_vars(Node *clause, List *targetlist_so_far); + extern void transformKeySetQuery(Query *origNode); #endif /* PLANMAIN_H */ --------------BEFD1E6DA78A2DC20B524E32-- From daveh@insightdist.com Thu Sep 3 12:34:48 1998 Received: from u1.abs.net (root@u1.abs.net [207.114.0.131]) by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id MAA07696 for ; Thu, 3 Sep 1998 12:34:46 -0400 (EDT) Received: from insightdist.com (nobody@localhost) by u1.abs.net (8.9.0/8.9.0) with UUCP id MAA23590 for maillist@candle.pha.pa.us; Thu, 3 Sep 1998 12:17:44 -0400 (EDT) X-Authentication-Warning: u1.abs.net: nobody set sender to insightdist.com!daveh using -f Received: from ceodev by insightdist.com (AIX 3.2/UCB 5.64/4.03) id AA56436; Thu, 3 Sep 1998 11:51:24 -0400 Received: from daveh by ceodev (AIX 4.1/UCB 5.64/4.03) id AA45986; Thu, 3 Sep 1998 11:51:24 -0400 Message-Id: <35EEBBEF.2158F68A@insightdist.com> Date: Thu, 03 Sep 1998 11:55:28 
-0400 From: David Hartwig Organization: Insight Distribution Systems X-Mailer: Mozilla 4.05 [en] (Win95; I) Mime-Version: 1.0 To: Bruce Momjian Cc: David Hartwig , pgsql-patches@postgreSQL.org Subject: Re: [PATCHES] Interim AND/OR memory exaustion fix. References: <199809030236.WAA22888@candle.pha.pa.us> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Status: RO Bruce Momjian wrote: > > I will be cleaning this up more before the Oct 1 deadline. > > > *** ./backend/commands/variable.c.orig Thu Jul 30 19:25:26 1998 > > --- ./backend/commands/variable.c Mon Aug 31 17:23:32 1998 > > Applied. Let's keep talking to see if we can come up with a nice > general solution to this. > Agreed. > I have been thinking, and the trouble case is a query that uses only one > table, and had only "column = value" statements. I believe this can be > easily identified and reworked somehow. > If you are referring to the AND'less set of OR's, I do have plans to not let that qualify since you have gotten the index scan working with OR's. I also think that the qualification process should be tightened up. For example force the number of AND's to be the same in each OR grouping. And have at least n OR's to qualify. We just need to head off the memory exhaustion. > Your subtable idea may be a good one. > This sounds like a 6.5 thing. I needed to stop the bleeding for 6.4. 
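
[Editorial sketch] The tightened qualification test proposed in the message above (no AND-less OR lists, the same number of AND'ed comparisons in each OR grouping, and at least n groupings) might look roughly like the following. This is a standalone illustration on a hypothetical toy tree: the `QNode` type and the names `and_group_size`, `count_arms`, `qualifies_as_keyset` and `min_arms` are assumptions made for this example, not the backend's Expr/Query nodes and not the code in prepkeyset.c.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy expression tree; NOT the backend's Expr node. */
typedef enum { Q_OR, Q_AND, Q_EQ_CONST, Q_OTHER } QKind;

typedef struct QNode
{
    QKind               kind;
    const struct QNode *left;   /* used by Q_OR and Q_AND only */
    const struct QNode *right;
} QNode;

/*
 * Count the "column = constant" leaves of an AND group.
 * Returns -1 if the group contains anything else.
 */
static int
and_group_size(const QNode *n)
{
    int l, r;

    if (n == NULL)
        return -1;
    if (n->kind == Q_EQ_CONST)
        return 1;
    if (n->kind != Q_AND)
        return -1;
    l = and_group_size(n->left);
    r = and_group_size(n->right);
    return (l < 0 || r < 0) ? -1 : l + r;
}

/*
 * Walk the OR tree. Every arm must be an AND group holding the same
 * number of comparisons, and at least two of them, so that plain OR
 * lists are left to the ordinary OR index path. *width stays 0 until
 * the first arm has been seen.
 * Returns the number of arms, or -1 if the shape does not qualify.
 */
static int
count_arms(const QNode *n, int *width)
{
    int l, r, w;

    if (n == NULL)
        return -1;
    if (n->kind == Q_OR)
    {
        l = count_arms(n->left, width);
        r = count_arms(n->right, width);
        return (l < 0 || r < 0) ? -1 : l + r;
    }
    w = and_group_size(n);
    if (w < 2)
        return -1;
    if (*width == 0)
        *width = w;
    return (w == *width) ? 1 : -1;
}

/* Returns 1 if qual has the key-set shape with at least min_arms arms. */
int
qualifies_as_keyset(const QNode *qual, int min_arms)
{
    int width = 0;
    int arms;

    assert(min_arms >= 2);
    arms = count_arms(qual, &width);
    return arms >= min_arms;
}
```

Unlike the posted patch, this sketch accepts OR trees of any shape rather than only left-deep ones, and it still does not check that the operator is "=" or that a usable index exists; both remain open points in the thread.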
From bga@mug.org Tue Sep 8 03:39:37 1998
Received: from renoir.op.net (root@renoir.op.net [209.152.193.4]) by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id DAA06237 for ; Tue, 8 Sep 1998 03:39:36 -0400 (EDT)
Received: from bgalli.mug.org (bajor.mug.org [207.158.132.1]) by renoir.op.net (o1/$ Revision: 1.18 $) with ESMTP id DAA03648 for ; Tue, 8 Sep 1998 03:38:52 -0400 (EDT)
Received: from localhost (bga@localhost) by bgalli.mug.org (8.8.7/SCO5) with SMTP id DAA02895 for ; Tue, 8 Sep 1998 03:31:26 -0400 (EDT)
Message-Id: <199809080731.DAA02895@bgalli.mug.org>
X-Authentication-Warning: bgalli.mug.org: bga@localhost didn't use HELO protocol
X-Mailer: exmh version 2.0.2 2/24/98
From: "Billy G. Allie"
Reply-To: "Billy G. Allie"
To: Bruce Momjian
Subject: Re: [HACKERS] flock patch breaks things here
In-reply-to: Your message of "Mon, 31 Aug 1998 00:36:34 EDT." <199808310436.AAA07618@candle.pha.pa.us>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Date: Tue, 08 Sep 1998 03:31:26 -0400
Sender: bga@mug.org
Status: ROr

Bruce Momjian writes:
> I have been thinking about this. First, we can easily use fopen(r+) to
> check to see if the file exists, and if it does read the pid and do a
> kill -0 to see if it is running. If no one else does it, I will take it
> on.

It is better to use open with the O_CREAT and O_EXCL flags set. If the file
does not exist it will be created and the PID can be written to it. If the
file exists then the call will fail, at which point it can be opened with
fopen and read, and the PID it contains can be checked with kill to see if
that process still exists. The open call has the added advantage that 'The
check for the existence of the file and the creation of the file if it does
not exist is atomic with respect to other processes executing open naming the
same filename in the same directory with O_EXCL and O_CREAT set.' [from the
UnixWare 7 man page, open(2)].
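
[Editorial sketch] The O_CREAT | O_EXCL scheme described above could be sketched in C roughly as follows. This is a minimal illustration under stated assumptions, not postmaster code: the function name `acquire_pid_lock`, its return codes, and the file mode are hypothetical choices made for this example. It only reports a stale file; it does not remove one.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Try to take a PID lock file (hypothetical helper for illustration).
 * Returns:
 *    0  we created the file and wrote our PID into it
 *    1  the file exists and names a process that is still running
 *    2  the file exists but its PID is not a running process (stale)
 *   -1  any other error, including an empty or unreadable file
 */
int
acquire_pid_lock(const char *path)
{
    int   fd;
    int   len;
    long  pid = 0;
    char  buf[32];
    FILE *fp;

    assert(path != NULL);

    /* O_CREAT | O_EXCL makes "check for existence and create" atomic. */
    fd = open(path, O_WRONLY | O_CREAT | O_EXCL, 0644);
    if (fd >= 0)
    {
        len = snprintf(buf, sizeof(buf), "%ld\n", (long) getpid());
        if (write(fd, buf, (size_t) len) != (ssize_t) len)
        {
            close(fd);
            unlink(path);
            return -1;
        }
        close(fd);
        return 0;
    }
    if (errno != EEXIST)
        return -1;

    /* The file already exists: read the PID it contains. */
    fp = fopen(path, "r");
    if (fp == NULL)
        return -1;
    if (fscanf(fp, "%ld", &pid) != 1 || pid <= 0)
    {
        fclose(fp);
        return -1;
    }
    fclose(fp);

    /* Signal 0 delivers nothing; it only tests whether the PID exists. */
    if (kill((pid_t) pid, 0) == 0 || errno == EPERM)
        return 1;
    return 2;
}
```

A caller that gets 2 still has to decide how to take over a stale file safely, which is the part of the problem the rest of the message goes on to discuss.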
Also, you can't just delete the file, create it, write your PID to it, and
assume that you have the lock; you need to close the file, sleep some small
amount of time, and then open and read the file to see if you still have the
lock. If you like, I can take this task on.

Oh, the postmaster must clear the PID when it exits.

>
> Second, where to put the pid file. There is reason to put it in /tmp,
> because it will get cleared in a reboot, and because it is locking the
> port number 5432. There is also reason to put it in /data because you
> can't have more than one postmaster running on a single data directory.
>
> So, we really want to lock both places. If this is going to make it
> easier for people to run more than one postmaster, because it will
> prevent/warn administrators when they try and put two postmasters in the
> same data dir or port, I say create the pid lock files both places, and
> give the admin a clear description of what he is doing wrong in each
> case.

IMHO, the pid should be put in the data directory. The reasoning that it will
get cleared in a reboot is not sufficient, since the logic used to create the
PID file will delete it if the PID it contains is not a running process.
Besides, I have used systems where /tmp was not cleared out on a re-boot (for
various reasons). Also, I would rather have a script that explicitly removes
the PID locking file at system startup (if it exists), in which case it
doesn't matter where it resides.

-- ____ | Billy G.
Allie | Domain....: Bill.Allie@mug.org | /| | 7436 Hartwell | Compuserve: 76337,2061 |-/-|----- | Dearborn, MI 48126| MSN.......: B_G_Allie@email.msn.com |/ |LLIE | (313) 582-1540 | From owner-pgsql-general@hub.org Thu Oct 1 14:00:57 1998 Received: from renoir.op.net (root@renoir.op.net [209.152.193.4]) by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id OAA12443 for ; Thu, 1 Oct 1998 14:00:56 -0400 (EDT) Received: from hub.org (majordom@hub.org [209.47.148.200]) by renoir.op.net (o1/$ Revision: 1.18 $) with ESMTP id NAA07930 for ; Thu, 1 Oct 1998 13:57:47 -0400 (EDT) Received: from localhost (majordom@localhost) by hub.org (8.8.8/8.8.8) with SMTP id NAA26913; Thu, 1 Oct 1998 13:56:29 -0400 (EDT) (envelope-from owner-pgsql-general@hub.org) Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Thu, 01 Oct 1998 13:55:56 +0000 (EDT) Received: (from majordom@localhost) by hub.org (8.8.8/8.8.8) id NAA26856 for pgsql-general-outgoing; Thu, 1 Oct 1998 13:55:54 -0400 (EDT) (envelope-from owner-pgsql-general@postgreSQL.org) X-Authentication-Warning: hub.org: majordom set sender to owner-pgsql-general@postgreSQL.org using -f Received: from mail.utexas.edu (wb3-a.mail.utexas.edu [128.83.126.138]) by hub.org (8.8.8/8.8.8) with SMTP id NAA26840 for ; Thu, 1 Oct 1998 13:55:49 -0400 (EDT) (envelope-from taral@mail.utexas.edu) Received: (qmail 1198 invoked by uid 0); 1 Oct 1998 17:55:40 -0000 Received: from dial-24-13.ots.utexas.edu (HELO taral) (128.83.128.157) by umbs-smtp-3 with SMTP; 1 Oct 1998 17:55:40 -0000 From: "Taral" To: Subject: [GENERAL] CNF vs DNF Date: Thu, 1 Oct 1998 12:55:39 -0500 Message-ID: <000001bded64$b34b2200$3b291f0a@taral> MIME-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit X-Priority: 3 (Normal) X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook 8.5, Build 4.71.2173.0 X-MimeOLE: Produced By Microsoft MimeOLE V4.72.3155.0 In-Reply-To: Importance: Normal Sender: owner-pgsql-general@postgreSQL.org 
Precedence: bulk
Status: RO

> select * from aa where (bb = 2 and ff = 3) or (bb = 4 and ff = 5);

I've been told that the system restructures these into CNF (conjunctive
normal form), i.e. the above query turns into:

select * from aa where (bb = 2 or bb = 4) and (ff = 3 or bb = 4) and (bb = 2
or ff = 5) and (ff = 3 or ff = 5);

Much longer and much less efficient, AFAICT. Isn't it more efficient to do a
union of many queries (DNF) than an intersection of many subqueries (CNF)?
Certainly remembering the subqueries takes less memory... Also, queries
already in DNF are probably more common than queries in CNF, requiring less
rewrite. Can someone clarify this?

Taral

From taral@mail.utexas.edu Fri Oct 2 01:35:42 1998
Received: from mail.utexas.edu (wb1-a.mail.utexas.edu [128.83.126.134]) by candle.pha.pa.us (8.9.0/8.9.0) with SMTP id BAA28231 for ; Fri, 2 Oct 1998 01:35:27 -0400 (EDT)
Received: (qmail 16318 invoked by uid 0); 2 Oct 1998 05:35:13 -0000
Received: from dial-42-8.ots.utexas.edu (HELO taral) (128.83.111.216) by umbs-smtp-1 with SMTP; 2 Oct 1998 05:35:13 -0000
From: "Taral"
To: "Bruce Momjian"
Cc:
Subject: RE: [GENERAL] Long update query ? (also Re: [GENERAL] CNF vs. DNF)
Date: Fri, 2 Oct 1998 00:35:12 -0500
Message-ID: <000001bdedc6$6cf75d20$3b291f0a@taral>
MIME-Version: 1.0
Content-Type: text/plain; charset="iso-8859-1"
Content-Transfer-Encoding: 7bit
X-Priority: 3 (Normal)
X-MSMail-Priority: Normal
X-Mailer: Microsoft Outlook 8.5, Build 4.71.2173.0
In-Reply-To: <199810020218.WAA23299@candle.pha.pa.us>
Importance: Normal
X-MimeOLE: Produced By Microsoft MimeOLE V4.72.3155.0
Status: ROr

> It currently converts to CNF so it can select the most restrictive
> restriction and join, and use those first. However, the CNF conversion
> is a memory exploder for some queries, and we certainly need to have
> another method to split up those queries into UNIONS.
> I think we need
> to code to identify those queries capable of being converted to UNIONS,
> and do that before the query gets to the CNF section. That would be
> great, and David Hartwig has implemented a limited capability of doing
> this, but we really need a general routine to do this with 100%
> reliability.

Well, if you're talking about a routine to generate a heuristic for CNF vs.
DNF, it is possible to precalculate the query sizes for CNF and DNF
rewrites...

For conversion to CNF:

At every node:

if nodeType = AND then f(node) = f(left) + f(right)
if nodeType = OR then f(node) = f(left) * f(right)

f(root) = a reasonable (but not wonderful) metric

For DNF just switch AND and OR in the above. You may want to compute both
metrics and compare... take the smaller one and use that path.

How to deal with other operators depends on their implementation...

Taral

From taral@mail.utexas.edu Fri Oct 2 12:48:27 1998
Received: from mail.utexas.edu (wb4-a.mail.utexas.edu [128.83.126.140]) by candle.pha.pa.us (8.9.0/8.9.0) with SMTP id MAA11438 for ; Fri, 2 Oct 1998 12:48:25 -0400 (EDT)
Received: (qmail 15628 invoked by uid 0); 2 Oct 1998 16:47:50 -0000
Received: from dial-42-8.ots.utexas.edu (HELO taral) (128.83.111.216) by umbs-smtp-4 with SMTP; 2 Oct 1998 16:47:50 -0000
From: "Taral"
To: "Bruce Momjian"
Cc:
Subject: RE: [GENERAL] Long update query ? (also Re: [GENERAL] CNF vs. DNF)
Date: Fri, 2 Oct 1998 11:47:48 -0500
Message-ID: <000301bdee24$63308740$3b291f0a@taral>
MIME-Version: 1.0
Content-Type: text/plain; charset="iso-8859-1"
Content-Transfer-Encoding: 7bit
X-Priority: 3 (Normal)
X-MSMail-Priority: Normal
X-Mailer: Microsoft Outlook 8.5, Build 4.71.2173.0
In-reply-to: <199810021640.MAA10925@candle.pha.pa.us>
Importance: Normal
X-MimeOLE: Produced By Microsoft MimeOLE V4.72.3155.0
Status: RO

> > Create a temporary oid hash?
> > (for each table selected on, I guess)
>
> What I did with indexes was to run the previous OR clause index
> restrictions through the qualification code, and make sure it failed,
> but I am not sure how that is going to work with a more complex WHERE
> clause. Perhaps I need to restrict this to just simple cases of
> constants, which are easy to pick out and run through. Doing this with
> joins would be very hard, I think.

Actually, I was thinking more of an index of returned rows... After each
subquery, the backend would check each row to see if it was already in the
index... Simple duplicate check, in other words. Of course, I don't know how
well this would behave with large tables being returned...

Anyone else have some ideas they want to throw in?

Taral

From taral@mail.utexas.edu Fri Oct 2 17:13:01 1998
Received: from mail.utexas.edu (wb1-a.mail.utexas.edu [128.83.126.134]) by candle.pha.pa.us (8.9.0/8.9.0) with SMTP id RAA20838 for ; Fri, 2 Oct 1998 17:12:27 -0400 (EDT)
Received: (qmail 17418 invoked by uid 0); 2 Oct 1998 21:12:19 -0000
Received: from dial-46-30.ots.utexas.edu (HELO taral) (128.83.112.158) by umbs-smtp-1 with SMTP; 2 Oct 1998 21:12:19 -0000
From: "Taral"
To: "Bruce Momjian" ,
Cc:
Subject: RE: [HACKERS] RE: [GENERAL] Long update query ? (also Re: [GENERAL] CNF vs.
 DNF)
Date: Fri, 2 Oct 1998 16:12:19 -0500
Message-ID: <000001bdee49$56c7cd40$3b291f0a@taral>
MIME-Version: 1.0
Content-Type: text/plain; charset="iso-8859-1"
Content-Transfer-Encoding: 7bit
X-Priority: 3 (Normal)
X-MSMail-Priority: Normal
X-Mailer: Microsoft Outlook 8.5, Build 4.71.2173.0
In-reply-to: <199810021758.NAA15524@candle.pha.pa.us>
Importance: Normal
X-MimeOLE: Produced By Microsoft MimeOLE V4.72.3155.0
Status: ROr

> Another idea is that we rewrite queries such as:
>
> SELECT *
> FROM tab
> WHERE (a=1 AND b=2 AND c=3) OR
> (a=1 AND b=2 AND c=4) OR
> (a=1 AND b=2 AND c=5) OR
> (a=1 AND b=2 AND c=6)
>
> into:
>
> SELECT *
> FROM tab
> WHERE (a=1 AND b=2) AND (c=3 OR c=4 OR c=5 OR c=6)

Very nice, but that's like trying to code factorization of numbers... not
pretty, and very CPU intensive on complex queries...

Taral

From taral@mail.utexas.edu Fri Oct 2 17:49:59 1998
Received: from mail.utexas.edu (wb2-a.mail.utexas.edu [128.83.126.136]) by candle.pha.pa.us (8.9.0/8.9.0) with SMTP id RAA21488 for ; Fri, 2 Oct 1998 17:49:52 -0400 (EDT)
Received: (qmail 23729 invoked by uid 0); 2 Oct 1998 21:49:27 -0000
Received: from dial-2-6.ots.utexas.edu (HELO taral) (128.83.204.22) by umbs-smtp-2 with SMTP; 2 Oct 1998 21:49:27 -0000
From: "Taral"
To: "Bruce Momjian"
Cc: ,
Subject: RE: [HACKERS] RE: [GENERAL] Long update query ? (also Re: [GENERAL] CNF vs. DNF)
Date: Fri, 2 Oct 1998 16:49:26 -0500
Message-ID: <000001bdee4e$86688b20$3b291f0a@taral>
MIME-Version: 1.0
Content-Type: text/plain; charset="iso-8859-1"
Content-Transfer-Encoding: 7bit
X-Priority: 3 (Normal)
X-MSMail-Priority: Normal
X-Mailer: Microsoft Outlook 8.5, Build 4.71.2173.0
In-Reply-To: <199810022139.RAA21082@candle.pha.pa.us>
Importance: Normal
X-MimeOLE: Produced By Microsoft MimeOLE V4.72.3155.0
Status: ROr

> > Very nice, but that's like trying to code factorization of numbers... not
> > pretty, and very CPU intensive on complex queries...
>
> Yes, but how large are the WHERE clauses going to be?
> Considering the
> cost of cnfify() and UNION, it seems like a clear win. Is it general
> enough to solve our problems?

Could be... the examples I received where cnfify() was really bad were
cases where the query was submitted already in DNF... and where the UNION was
a simple one. However, I don't know of any algorithms for generic
simplification of logical constraints. One problem is resolution/selection of
factors:

SELECT * FROM a WHERE
(a = 1 AND b = 2 AND c = 3) OR
(a = 4 AND b = 2 AND c = 3) OR
(a = 1 AND b = 5 AND c = 3) OR
(a = 1 AND b = 2 AND c = 6);

Try that on for size. You can understand why that code gets ugly, fast.
Somebody could try coding it, but it's not a clear win to me.

My original heuristic was missing one thing: "Where the heuristic fails to
process or decide, default to CNF." Since that's the current behavior, we're
less likely to break things.

Taral

From owner-pgsql-hackers@hub.org Fri Oct 2 19:28:09 1998
Received: from renoir.op.net (root@renoir.op.net [209.152.193.4]) by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id TAA23341 for ; Fri, 2 Oct 1998 19:28:08 -0400 (EDT)
Received: from hub.org (majordom@hub.org [209.47.148.200]) by renoir.op.net (o1/$ Revision: 1.18 $) with ESMTP id SAA18003 for ; Fri, 2 Oct 1998 18:21:37 -0400 (EDT)
Received: from localhost (majordom@localhost) by hub.org (8.8.8/8.8.8) with SMTP id SAA01250; Fri, 2 Oct 1998 18:08:02 -0400 (EDT) (envelope-from owner-pgsql-hackers@hub.org)
Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Fri, 02 Oct 1998 18:04:37 +0000 (EDT)
Received: (from majordom@localhost) by hub.org (8.8.8/8.8.8) id SAA00847 for pgsql-hackers-outgoing; Fri, 2 Oct 1998 18:04:35 -0400 (EDT) (envelope-from owner-pgsql-hackers@postgreSQL.org)
X-Authentication-Warning: hub.org: majordom set sender to owner-pgsql-hackers@postgreSQL.org using -f
Received: from mail.utexas.edu (wb2-a.mail.utexas.edu [128.83.126.136]) by hub.org (8.8.8/8.8.8) with SMTP id SAA00806 for ; Fri, 2 Oct 1998 18:04:26
 -0400 (EDT) (envelope-from taral@mail.utexas.edu)
Received: (qmail 29662 invoked by uid 0); 2 Oct 1998 22:04:25 -0000
Received: from dial-2-6.ots.utexas.edu (HELO taral) (128.83.204.22) by umbs-smtp-2 with SMTP; 2 Oct 1998 22:04:25 -0000
From: "Taral"
To: "Bruce Momjian"
Cc: ,
Subject: RE: [HACKERS] RE: [GENERAL] Long update query ? (also Re: [GENERAL] CNF vs. DNF)
Date: Fri, 2 Oct 1998 17:04:24 -0500
Message-ID: <000201bdee50$9d9c4320$3b291f0a@taral>
MIME-Version: 1.0
Content-Type: text/plain; charset="iso-8859-1"
Content-Transfer-Encoding: 7bit
X-Priority: 3 (Normal)
X-MSMail-Priority: Normal
X-Mailer: Microsoft Outlook 8.5, Build 4.71.2173.0
In-Reply-To: <199810022157.RAA21769@candle.pha.pa.us>
Importance: Normal
X-MimeOLE: Produced By Microsoft MimeOLE V4.72.3155.0
Sender: owner-pgsql-hackers@postgreSQL.org
Precedence: bulk
Status: ROr

> How do we do that with UNION, and return the right rows. Seems the
> _join_ happening multiple times would be much worse than the factoring.

Ok... We have two problems:

1) DNF for unjoined queries.
2) Factorization for the rest.

I have some solutions for (1). Not for (2).

Remember that unjoined queries are quite common. :)

For (1), we can always try to run the multiple queries in parallel...
especially in the case where a sequential search is required.
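
A minimal sketch of case (1), for illustration only: when a single-table
WHERE clause is already in DNF, each OR group can be run as its own SELECT
and the results combined with UNION. The snippet uses Python's sqlite3
module as a stand-in engine (an assumption; it is not the PostgreSQL
backend), the table aa and columns bb, ff come from the example earlier in
the thread, and the helper names dnf_to_union and demo_db as well as the
sample rows are made up. UNION removes duplicate rows, so the two forms
agree only when the table itself holds no duplicate rows.

```python
import sqlite3


def dnf_to_union(table, columns, groups):
    """Build one SELECT per OR group and join them with UNION.

    `groups` is a list of dicts mapping column name -> constant; each dict
    is one AND group of the original WHERE clause. Identifiers are pasted
    into the SQL text as-is, so this is for trusted names only.
    """
    selects, params = [], []
    for group in groups:
        cond = " AND ".join(f"{col} = ?" for col in group)
        selects.append(f"SELECT {', '.join(columns)} FROM {table} WHERE {cond}")
        params.extend(group.values())
    return " UNION ".join(selects), params


def demo_db():
    """In-memory table with made-up rows (no duplicate rows)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE aa (bb INTEGER, ff INTEGER)")
    conn.executemany(
        "INSERT INTO aa VALUES (?, ?)",
        [(2, 3), (4, 5), (2, 5), (4, 3), (1, 1)],
    )
    return conn


conn = demo_db()

# Original form: one query with OR-ed AND groups.
or_rows = conn.execute(
    "SELECT bb, ff FROM aa WHERE (bb = 2 AND ff = 3) OR (bb = 4 AND ff = 5)"
).fetchall()

# Rewritten form: one SELECT per group, combined with UNION.
sql, params = dnf_to_union(
    "aa", ["bb", "ff"], [{"bb": 2, "ff": 3}, {"bb": 4, "ff": 5}]
)
union_rows = conn.execute(sql, params).fetchall()

# Compare as sorted lists because UNION does not promise any row order.
print(sorted(or_rows) == sorted(union_rows))
```

Each branch of the UNION carries only simple equality conditions, which is
the shape an index lookup can serve; whether a given planner actually uses
an index for it is not something this sketch shows.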
Taral From owner-pgsql-hackers@hub.org Sat Oct 3 23:32:35 1998 Received: from renoir.op.net (root@renoir.op.net [209.152.193.4]) by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id XAA06644 for ; Sat, 3 Oct 1998 23:31:13 -0400 (EDT) Received: from hub.org (root@hub.org [209.47.148.200]) by renoir.op.net (o1/$ Revision: 1.18 $) with ESMTP id XAA26912 for ; Sat, 3 Oct 1998 23:14:01 -0400 (EDT) Received: from localhost (majordom@localhost) by hub.org (8.8.8/8.8.8) with SMTP id WAA04407; Sat, 3 Oct 1998 22:07:05 -0400 (EDT) (envelope-from owner-pgsql-hackers@hub.org) Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Sat, 03 Oct 1998 22:02:00 +0000 (EDT) Received: (from majordom@localhost) by hub.org (8.8.8/8.8.8) id WAA04010 for pgsql-hackers-outgoing; Sat, 3 Oct 1998 22:01:59 -0400 (EDT) (envelope-from owner-pgsql-hackers@postgreSQL.org) X-Authentication-Warning: hub.org: majordom set sender to owner-pgsql-hackers@postgreSQL.org using -f Received: from candle.pha.pa.us (maillist@s5-03.ppp.op.net [209.152.195.67]) by hub.org (8.8.8/8.8.8) with ESMTP id WAA03968 for ; Sat, 3 Oct 1998 22:00:37 -0400 (EDT) (envelope-from maillist@candle.pha.pa.us) Received: (from maillist@localhost) by candle.pha.pa.us (8.9.0/8.9.0) id VAA04640; Sat, 3 Oct 1998 21:57:30 -0400 (EDT) From: Bruce Momjian Message-Id: <199810040157.VAA04640@candle.pha.pa.us> Subject: Re: [HACKERS] RE: [GENERAL] Long update query ? (also Re: [GENERAL] CNF vs. DNF) In-Reply-To: <000201bdee50$9d9c4320$3b291f0a@taral> from Taral at "Oct 2, 1998 5: 4:24 pm" To: taral@mail.utexas.edu (Taral) Date: Sat, 3 Oct 1998 21:57:30 -0400 (EDT) Cc: jwieck@debis.com, hackers@postgreSQL.org X-Mailer: ELM [version 2.4ME+ PL47 (25)] MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Sender: owner-pgsql-hackers@postgreSQL.org Precedence: bulk Status: RO I have another idea. 
When we cnfify, this:

(A AND B) OR (C AND D)

becomes

(A OR C) AND (A OR D) AND (B OR C) AND (B OR D)

however if A and C are identical, this could become:

(A OR A) AND (A OR D) AND (B OR A) AND (B OR D)

and A OR A is A:

A AND (A OR D) AND (B OR A) AND (B OR D)

and since we are now saying A has to be true, we can remove OR's with A:

A AND (B OR D)

Much smaller, and a big win for queries like:

SELECT *
FROM tab
WHERE (a=1 AND b=2) OR
(a=1 AND b=3)

This becomes:

(a=1) AND (b=2 OR b=3)

which is accurate, and uses our OR indexing.

Seems I could code cnfify() to look for identical qualifications in two
joined OR clauses and remove the duplicates.

Sounds like a big win, and fairly easy and inexpensive in processing time.

Comments?

--
  Bruce Momjian                        |  http://www.op.net/~candle
  maillist@candle.pha.pa.us            |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026

From taral@mail.utexas.edu Sat Oct 3 22:43:41 1998
Received: from renoir.op.net (root@renoir.op.net [209.152.193.4]) by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id WAA05961 for ; Sat, 3 Oct 1998 22:42:18 -0400 (EDT)
Received: from mail.utexas.edu (wb2-a.mail.utexas.edu [128.83.126.136]) by renoir.op.net (o1/$ Revision: 1.18 $) with SMTP id WAA25111 for ; Sat, 3 Oct 1998 22:27:34 -0400 (EDT)
Received: (qmail 25622 invoked by uid 0); 4 Oct 1998 02:26:21 -0000
Received: from dial-42-9.ots.utexas.edu (HELO taral) (128.83.111.217) by umbs-smtp-2 with SMTP; 4 Oct 1998 02:26:21 -0000
From: "Taral"
To: "Bruce Momjian"
Cc: ,
Subject: RE: [HACKERS] RE: [GENERAL] Long update query ? (also Re: [GENERAL] CNF vs.
 DNF)
Date: Sat, 3 Oct 1998 21:26:20 -0500
Message-ID: <000501bdef3e$5f5293a0$3b291f0a@taral>
MIME-Version: 1.0
Content-Type: text/plain; charset="iso-8859-1"
Content-Transfer-Encoding: 7bit
X-Priority: 3 (Normal)
X-MSMail-Priority: Normal
X-Mailer: Microsoft Outlook 8.5, Build 4.71.2173.0
Importance: Normal
In-Reply-To: <199810040157.VAA04640@candle.pha.pa.us>
X-MimeOLE: Produced By Microsoft MimeOLE V4.72.3155.0
Status: ROr

> however if A and C are identical, this could become:
>
> (A OR A) AND (A OR D) AND (B OR A) AND (B OR D)
>
> and A OR A is A:
>
> A AND (A OR D) AND (B OR A) AND (B OR D)
>
> and since we are now saying A has to be true, we can remove OR's with A:
>
> A AND (B OR D)

Very nice... and you could do that after each iteration of the rewrite,
preventing the size from getting too big. :)

I have a symbolic expression tree evaluator that would be perfect for
this... I'll see if I can't adapt it.

Can someone mail me the structures for expression trees? I don't want to have
to excise them from the source. Please?
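
The simplification quoted above comes down to two Boolean identities applied
to the list of CNF clauses: idempotence (A OR A = A) and absorption
(A AND (A OR D) = A). A minimal Python sketch follows, assuming that each
qualification can be compared for equality as a plain string. The function
names cnf_of_or and simplify_cnf are made up for illustration and are not
taken from the cnfify() source.

```python
from itertools import product


def cnf_of_or(left, right):
    """Distribute (l1 AND l2 ...) OR (r1 AND r2 ...) into CNF clauses.

    Each clause is a frozenset of qualifications, so a clause such as
    (A OR A) collapses to (A) as soon as it is built.
    """
    return {frozenset((l, r)) for l, r in product(left, right)}


def simplify_cnf(clauses):
    """Absorption: drop every clause that strictly contains another clause."""
    return {c for c in clauses if not any(other < c for other in clauses)}


# (a=1 AND b=2) OR (a=1 AND b=3), the example from the message above.
clauses = cnf_of_or(["a=1", "b=2"], ["a=1", "b=3"])
simplified = simplify_cnf(clauses)

# Print one clause per line in a stable order.
for clause in sorted(simplified, key=lambda c: sorted(c)):
    print(" OR ".join(sorted(clause)))
```

For the example this leaves two clauses, (a=1) and (b=2 OR b=3), matching
the hand-derived result. The subset test here is quadratic in the number of
clauses, which is fine for a sketch; running it after every distribution
step, as suggested above, keeps that number small.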
Taral From daveh@insightdist.com Mon Nov 9 13:31:07 1998 Received: from renoir.op.net (root@renoir.op.net [209.152.193.4]) by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id NAA00997 for ; Mon, 9 Nov 1998 13:31:00 -0500 (EST) Received: from u1.abs.net (root@u1.abs.net [207.114.0.131]) by renoir.op.net (o1/$ Revision: 1.18 $) with ESMTP id NAA26657 for ; Mon, 9 Nov 1998 13:10:14 -0500 (EST) Received: from insightdist.com (nobody@localhost) by u1.abs.net (8.9.0/8.9.0) with UUCP id MAA17710 for maillist@candle.pha.pa.us; Mon, 9 Nov 1998 12:52:05 -0500 (EST) X-Authentication-Warning: u1.abs.net: nobody set sender to insightdist.com!daveh using -f Received: from ceodev by insightdist.com (AIX 3.2/UCB 5.64/4.03) id AA43498; Mon, 9 Nov 1998 12:38:24 -0500 Received: from daveh by ceodev (AIX 4.1/UCB 5.64/4.03) id AA54446; Mon, 9 Nov 1998 12:38:23 -0500 Message-Id: <3647296F.6F7FDDD2@insightdist.com> Date: Mon, 09 Nov 1998 12:42:07 -0500 From: David Hartwig Organization: Insight Distribution Systems X-Mailer: Mozilla 4.5 [en] (Win98; I) X-Accept-Language: en Mime-Version: 1.0 To: Bob Kruger , Bruce Momjian Cc: pgsql-general@postgreSQL.org, Byron Nikolaidis Subject: Re: [GENERAL] Incrementing a Serial Field References: <3.0.5.32.19981109110757.0082c950@mindspring.com> Content-Type: multipart/mixed; boundary="------------3D3EE7F67DFC542D3928BB7E" Status: ROr This is a multi-part message in MIME format. --------------3D3EE7F67DFC542D3928BB7E Content-Type: multipart/alternative; boundary="------------43E2CC34278FA08EFC9E0611" --------------43E2CC34278FA08EFC9E0611 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Bob Kruger wrote: > The second question is that I noticed the ODBC bug (feature?) when linking > Postgres to MS Access still exists. This bug occurs when linking a MS > Access table to a Postgres table, and identifying more than one field as > the unique record identifier. This makes Postgres run until it exhausts > all available memory. 
> Does anyone know a way around this?  Enabling read
> only ODBC is a feature I would like to make available, but I do not want
> the possibility of postgres crashing because of an error on the part of a
> MS Access user.
>
> BTW - Having capability to be linked to an Access database is not an
> option.  The current project I am working on calls for that, so it is a
> necessary evil that I have to live with.
>

In the driver connection settings add the following line.

    SET ksqo TO 'on';

Stands for: keyset query optimization. This is not considered a final
solution. As such, it is undocumented. Some time in the next day or so, we
will be releasing a version of the driver which will automatically SET ksqo.

You will most likely be satisfied with the results. One problem with this
solution, however, is that it does not work if you have any (some kinds of?)
arrays in the table you are browsing. This is a side effect of the rewrite to
a UNION, which performs an internal sort unique.

Also, if you are using row versioning you may need to overload some operators
for xid and int4. I have included a script that will take care of this.

Bruce, can I get these operators hardcoded into 6.4.1, assuming there will be
one? The operators are necessitated by the UNION side effects.

--------------43E2CC34278FA08EFC9E0611
Content-Type: text/html; charset=us-ascii
Content-Transfer-Encoding: 7bit

--------------43E2CC34278FA08EFC9E0611--

--------------3D3EE7F67DFC542D3928BB7E
Content-Type: text/plain; charset=us-ascii; name="xidint4.sql"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline; filename="xidint4.sql"

-- Insight Distribution Systems - System V - Apr 1998
-- @(#)xidint4.sql 1.2 :/sccs/sql/extend/s.xidint4.sql 10/2/98 13:40:19"

create function int4eq(xid,int4)
    returns bool
    as ''
    language 'internal';

create operator = (
    leftarg=xid,
    rightarg=int4,
    procedure=int4eq,
    commutator='=',
    negator='<>',
    restrict=eqsel,
    join=eqjoinsel
    );

create function int4lt(xid,xid)
    returns bool
    as ''
    language 'internal';

create operator < (
    leftarg=xid,
    rightarg=xid,
    procedure=int4lt,
    commutator='=',
    negator='<>',
    restrict=eqsel,
    join=eqjoinsel
    );

--------------3D3EE7F67DFC542D3928BB7E--