diff --git a/src/backend/optimizer/plan/README b/src/backend/optimizer/plan/README new file mode 100644 index 0000000000..d3a516834c --- /dev/null +++ b/src/backend/optimizer/plan/README @@ -0,0 +1,156 @@ +Subselect notes from Vadim. + + + +From owner-pgsql-hackers@hub.org Fri Feb 13 09:01:19 1998 +Received: from renoir.op.net (root@renoir.op.net [209.152.193.4]) + by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id JAA11576 + for ; Fri, 13 Feb 1998 09:01:17 -0500 (EST) +Received: from hub.org (hub.org [209.47.148.200]) by renoir.op.net (o1/$Revision: 1.1 $) with ESMTP id IAA09761 for ; Fri, 13 Feb 1998 08:41:22 -0500 (EST) +Received: from localhost (majordom@localhost) by hub.org (8.8.8/8.7.5) with SMTP id IAA08135; Fri, 13 Feb 1998 08:40:17 -0500 (EST) +Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Fri, 13 Feb 1998 08:38:42 -0500 (EST) +Received: (from majordom@localhost) by hub.org (8.8.8/8.7.5) id IAA06646 for pgsql-hackers-outgoing; Fri, 13 Feb 1998 08:38:35 -0500 (EST) +Received: from dune.krasnet.ru (dune.krasnet.ru [193.125.44.86]) by hub.org (8.8.8/8.7.5) with ESMTP id IAA04568 for ; Fri, 13 Feb 1998 08:37:16 -0500 (EST) +Received: from sable.krasnoyarsk.su (dune.krasnet.ru [193.125.44.86]) + by dune.krasnet.ru (8.8.7/8.8.7) with ESMTP id UAA13717 + for ; Fri, 13 Feb 1998 20:51:03 +0700 (KRS) + (envelope-from vadim@sable.krasnoyarsk.su) +Message-ID: <34E44FBA.D64E7997@sable.krasnoyarsk.su> +Date: Fri, 13 Feb 1998 20:50:50 +0700 +From: "Vadim B. Mikheev" +Organization: ITTS (Krasnoyarsk) +X-Mailer: Mozilla 4.04 [en] (X11; I; FreeBSD 2.2.5-RELEASE i386) +MIME-Version: 1.0 +To: PostgreSQL Developers List +Subject: [HACKERS] Subselects are in CVS... +Content-Type: text/plain; charset=us-ascii +Content-Transfer-Encoding: 7bit +Sender: owner-pgsql-hackers@hub.org +Precedence: bulk +Status: OR + +This is some implementation notes and opened issues... + +First, implementation uses new type of parameters - PARAM_EXEC - to deal +with correlation Vars. When query_planner() is called, it first tries to +replace all upper queries Var referenced in current query with Param of +this type. Some global variables are used to keep mapping of Vars to +Params and Params to Vars. + +After this, all current query' SubLinks are processed: for each SubLink +found in query' qual union_planner() (old planner() function) will be +called to plan corresponding subselect (union_planner() calls +query_planner() for "simple" query and supports UNIONs). After subselect +are planned, optimizer knows about is this correlated, un-correlated or +_undirect_ correlated (references some grand-parent Vars but no parent +ones: uncorrelated from the parent' point of view) query. + +For uncorrelated and undirect correlated subqueries of EXPRession or +EXISTS type SubLinks will be replaced with "normal" clauses from +SubLink->Oper list (I changed this list to be list of EXPR nodes, +not just Oper ones). Right sides of these nodes are replaced with +PARAM_EXEC parameters. This is second use of new parameter type. +At run-time these parameters get value from result of subquery +evaluation (i.e. - from target list of subquery). Execution plan of +subquery itself becomes init plan of parent query. InitPlan knows +what parameters are to get values from subquery' results and will be +executed "on-demand" (for query select * from table where x > 0 and +y > (select max(a) from table_a) subquery will not be executed at all +if there are no tuples with x > 0 _and_ y is not used in index scan). + +SubLinks for subqueries of all other types are transformed into +new type of Expr node - SUBPLAN_EXPR. Expr->args are just correlation +variables from _parent_ query. Expr->oper is new SubPlan node. + +This node is used for InitPlan too. It keeps subquery range table, +indices of Params which are to get value from _parent_ query Vars +(i.e. - from Expr->args), indices of Params into which subquery' +results are to be substituted (this is for InitPlans), SubLink +and subquery' execution plan. + +Plan node was changed to know about dependencies on Params from +parent queries and InitPlans, to keep list of changed Params +(from the above) and so be re-scanned if this list is not NULL. +Also, added list of InitPlans (actually, all of them for current +query are in topmost plan node now) and other SubPlans (from +plan->qual) - to initialize them and let them know about changed +Params (from the list of their "interests"). + +After all SubLinks are processed, query_planner() calls qual' +canonificator and does "normal" work. By using Params optimizer +is mostly unchanged. + +Well, Executor. To get subplans re-evaluated without ExecutorStart() +and ExecutorEnd() (without opening and closing relations and indices +and without many palloc() and pfree() - this is what SQL-funcs does +on each call) ExecReScan() now supports most of Plan types... + +Explanation of EXPLAIN. + +vac=> explain select * from tmp where x >= (select max(x2) from test2 +where y2 = y and exists (select * from tempx where tx = x)); +NOTICE: QUERY PLAN: + +Seq Scan on tmp (cost=40.03 size=101 width=8) + SubPlan + ^^^^^^^ subquery is in Seq Scan' qual, its plan is below + -> Aggregate (cost=2.05 size=0 width=0) + InitPlan + ^^^^^^^^ EXISTS subsubquery is InitPlan of subquery + -> Seq Scan on tempx (cost=4.33 size=1 width=4) + -> Result (cost=2.05 size=0 width=0) + ^^^^^^ EXISTS subsubquery was transformed into Param + and so we have Result node here + -> Index Scan on test2 (cost=2.05 size=1 width=4) + + +Opened issues. + +1. No read permissions checking (easy, just not done yet). +2. readfuncs.c can't read subplan-s (easy, not critical, because of + we currently nowhere use ascii representation of execution plans). +3. ExecReScan() doesn't support all plan types. At least support for + MergeJoin has to be implemented. +4. Memory leaks in ExecReScan(). +5. I need in advice: if subquery introduced with NOT IN doesn't return + any tuples then qualification is failed, yes ? +6. Regression tests !!!!!!!!!!!!!!!!!!!! + (Could we use data/queries from MySQL' crash.me ? + Copyright-ed ? Could they give us rights ?) +7. Performance. + - Should be good when subquery is transformed into InitPlan. + - Something should be done for uncorrelated subqueries introduced + with ANY/ALL - keep thinking. Currently, subplan will be re-scanned + for each parent tuple - very slow... + +Results of some test. TMP is table with x,y (int4-s), x in 0-9, +y = 100 - x, 1000 tuples (10 duplicates of each tuple). TEST2 is table +with x2, y2 (int4-s), x2 in 1-99, y2 = 100 -x2, 10000 tuples (100 dups). + + Trying + +select * from tmp where x >= (select max(x2) from test2 where y2 = y); + + and + +begin; +select y as ty, max(x2) as mx into table tsub from test2, tmp +where y2 = y group by ty; +vacuum tsub; +select x, y from tmp, tsub where x >= mx and y = ty; +drop table tsub; +end; + + Without index on test2(y2): + +SubSelect -> 320 sec +Using temp table -> 32 sec + + Having index + +SubSelect -> 17 sec (2M of memory) +Using temp table -> 32 sec (12M of memory: -S 8192) + +Vadim + +