diff --git a/doc/src/sgml/geqo.sgml b/doc/src/sgml/geqo.sgml index 725504c28f..61abf13ca4 100644 --- a/doc/src/sgml/geqo.sgml +++ b/doc/src/sgml/geqo.sgml @@ -3,78 +3,103 @@ Martin Utesch + + +University of Mining and Technology + + +Institute of Automatic Control + +
+ +Freiberg + + +Germany + +
+
+1997-10-02 Genetic Query Optimization in Database Systems - -Martin Utesch + +Author + +Written by Martin Utesch +for the Institute of Automatic Control at the University of Mining and Technology in Freiberg, Germany. + + - Institute of Automatic Control - University of Mining and Technology - Freiberg, Germany - - 02/10/1997 - - -1.) Query Handling as a Complex Optimization Problem -==================================================== + +Query Handling as a Complex Optimization Problem + Among all relational operators the most difficult one to process and -optimize is the JOIN. The number of alternative plans to answer a query -grows exponentially with the number of JOINs included in it. Further -optimization effort is caused by the support of a variety of *JOIN -methods* (e.g., nested loop, index scan, merge join in Postgres) to -process individual JOINs and a diversity of *indices* (e.g., r-tree, -b-tree, hash in Postgres) as access paths for relations. +optimize is the join. The number of alternative plans to answer a query +grows exponentially with the number of joins included in it. Further +optimization effort is caused by the support of a variety of join methods + (e.g., nested loop, index scan, merge join in Postgres) to +process individual joins and a diversity of indices (e.g., r-tree, +b-tree, hash in Postgres) as access paths for relations. - The current Postgres optimizer implementation performs a *near- -exhaustive search* over the space of alternative strategies. This query + + The current Postgres optimizer implementation performs a near- +exhaustive search over the space of alternative strategies. This query optimization technique is inadequate to support database application domains that involve the need for extensive queries, such as artificial intelligence. + The Institute of Automatic Control at the University of Mining and Technology, in Freiberg, Germany, encountered the described problems as its -folks wanted to take the Postgres DBMS as the backend for a decision +folks wanted to take the Postgres DBMS as the backend for a decision support knowledge based system for the maintenance of an electrical -power grid. The DBMS needed to handle large JOIN queries for the +power grid. The DBMS needed to handle large join queries for the inference machine of the knowledge based system. + Performance difficulties within exploring the space of possible query plans arose the demand for a new optimization technique being developed. - In the following we propose the implementation of a *Genetic -Algorithm* as an option for the database query optimization problem. + + In the following we propose the implementation of a Genetic Algorithm + as an option for the database query optimization problem. -2.) Genetic Algorithms (GA) -=========================== + +Genetic Algorithms (<Acronym>GA</Acronym>) - The GA is a heuristic optimization method which operates through + + The GA is a heuristic optimization method which operates through determined, randomized search. The set of possible solutions for the -optimization problem is considered as a *population* of *individuals*. +optimization problem is considered as a population of individuals. The degree of adaption of an individual to its environment is specified -by its *fitness*. +by its fitness. + The coordinates of an individual in the search space are represented -by *chromosomes*, in essence a set of character strings. A *gene* is a +by chromosomes, in essence a set of character strings. A gene is a subsection of a chromosome which encodes the value of a single parameter -being optimized. Typical encodings for a gene could be *binary* or -*integer*. +being optimized. Typical encodings for a gene could be binary or +integer. - Through simulation of the evolutionary operations *recombination*, -*mutation*, and *selection* new generations of search points are found + + Through simulation of the evolutionary operations recombination, +mutation, and selection new generations of search points are found that show a higher average fitness than their ancestors. - According to the "comp.ai.genetic" FAQ it cannot be stressed too -strongly that a GA is not a pure random search for a solution to a -problem. A GA uses stochastic processes, but the result is distinctly + + According to the "comp.ai.genetic" FAQ it cannot be stressed too +strongly that a GA is not a pure random search for a solution to a +problem. A GA uses stochastic processes, but the result is distinctly non-random (better than random). -Structured Diagram of a GA: + +Structured Diagram of a GA: --------------------------- P(t) generation of ancestors at a time t @@ -101,128 +126,233 @@ P''(t) generation of descendants at a time t | +-------------------------------------+ | | t := t + 1 | +===+=====================================+ + + +Genetic Query Optimization (<Acronym>GEQO</Acronym>) in Postgres -3.) Genetic Query Optimization (GEQO) in PostgreSQL -=================================================== - - The GEQO module is intended for the solution of the query -optimization problem similar to a traveling salesman problem (TSP). + + The GEQO module is intended for the solution of the query +optimization problem similar to a traveling salesman problem (TSP). Possible query plans are encoded as integer strings. Each string -represents the JOIN order from one relation of the query to the next. -E. g., the query tree /\ - /\ 2 - /\ 3 - 4 1 is encoded by the integer string '4-1-3-2', +represents the join order from one relation of the query to the next. +E. g., the query tree + + /\ + /\ 2 + /\ 3 + 4 1 + +is encoded by the integer string '4-1-3-2', which means, first join relation '4' and '1', then '3', and -then '2', where 1, 2, 3, 4 are relids in PostgreSQL. +then '2', where 1, 2, 3, 4 are relids in Postgres. - Parts of the GEQO module are adapted from D. Whitley's Genitor + + Parts of the GEQO module are adapted from D. Whitley's Genitor algorithm. - Specific characteristics of the GEQO implementation in PostgreSQL + + Specific characteristics of the GEQO implementation in Postgres are: -o usage of a *steady state* GA (replacement of the least fit + + + +Usage of a steady state GA (replacement of the least fit individuals in a population, not whole-generational replacement) allows fast convergence towards improved query plans. This is essential for query handling with reasonable time; + + -o usage of *edge recombination crossover* which is especially suited - to keep edge losses low for the solution of the TSP by means of a GA; + + +Usage of edge recombination crossover which is especially suited + to keep edge losses low for the solution of the TSP by means of a GA; + + -o mutation as genetic operator is deprecated so that no repair - mechanisms are needed to generate legal TSP tours. + + +Mutation as genetic operator is deprecated so that no repair + mechanisms are needed to generate legal TSP tours. + + + - The GEQO module gives the following benefits to the PostgreSQL DBMS -compared to the Postgres query optimizer implementation: + + The GEQO module gives the following benefits to the Postgres DBMS +compared to the Postgres query optimizer implementation: -o handling of large JOIN queries through non-exhaustive search; + + + +Handling of large join queries through non-exhaustive search; + + -o improved cost size approximation of query plans since no longer - plan merging is needed (the GEQO module evaluates the cost for a + + +Improved cost size approximation of query plans since no longer + plan merging is needed (the GEQO module evaluates the cost for a query plan as an individual). + + + + -References -========== + +Future Implementation Tasks for <ProductName>Postgres</ProductName> <Acronym>GEQO</Acronym> -J. Heitk"otter, D. Beasley: ---------------------------- - "The Hitch-Hicker's Guide to Evolutionary Computation", - FAQ in 'comp.ai.genetic', - 'ftp://ftp.Germany.EU.net/pub/research/softcomp/EC/Welcome.html' + +Basic Improvements -Z. Fong: --------- - "The Design and Implementation of the Postgres Query Optimizer", - file 'planner/Report.ps' in the 'postgres-papers' distribution + +Improve freeing of memory when query is already processed -R. Elmasri, S. Navathe: ------------------------ - "Fundamentals of Database Systems", - The Benjamin/Cummings Pub., Inc. + +With large join queries the computing time spent for the genetic query +optimization seems to be a mere fraction of the time + Postgres +needs for freeing memory via routine MemoryContextFree, +file backend/utils/mmgr/mcxt.c. +Debugging showed that it get stucked in a loop of routine +OrderedElemPop, file backend/utils/mmgr/oset.c. +The same problems arise with long queries when using the normal +Postgres query optimization algorithm. + +Improve genetic algorithm parameter settings -=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*= -* Things left to done for the PostgreSQL * -= Genetic Query Optimization (GEQO) = -* module implementation * -=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*= -* Martin Utesch * Institute of Automatic Control * -= = University of Mining and Technology = -* utesch@aut.tu-freiberg.de * Freiberg, Germany * -=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*= - - -1.) Basic Improvements -=============================================================== - -a) improve freeing of memory when query is already processed: -------------------------------------------------------------- -with large JOIN queries the computing time spent for the genetic query -optimization seems to be a mere *fraction* of the time Postgres -needs for freeing memory via routine 'MemoryContextFree', -file 'backend/utils/mmgr/mcxt.c'; -debugging showed that it get stucked in a loop of routine -'OrderedElemPop', file 'backend/utils/mmgr/oset.c'; -the same problems arise with long queries when using the normal -Postgres query optimization algorithm; - -b) improve genetic algorithm parameter settings: ------------------------------------------------- -file 'backend/optimizer/geqo/geqo_params.c', routines -'gimme_pool_size' and 'gimme_number_generations'; + +In file backend/optimizer/geqo/geqo_params.c, routines +gimme_pool_size and gimme_number_generations, we have to find a compromise for the parameter settings to satisfy two competing demands: -1. optimality of the query plan -2. computing time + + + +Optimality of the query plan + + + + +Computing time + + + -c) find better solution for integer overflow: ---------------------------------------------- -file 'backend/optimizer/geqo/geqo_eval.c', routine -'geqo_joinrel_size'; -the present hack for MAXINT overflow is to set the Postgres integer -value of 'rel->size' to its logarithm; -modifications of 'struct Rel' in 'backend/nodes/relation.h' will -surely have severe impacts on the whole PostgreSQL implementation. + +Find better solution for integer overflow -d) find solution for exhausted memory: --------------------------------------- -that may occur with more than 10 relations involved in a query, -file 'backend/optimizer/geqo/geqo_eval.c', routine -'gimme_tree' which is recursively called; -maybe I forgot something to be freed correctly, but I dunno what; -of course the 'rel' data structure of the JOIN keeps growing and -growing the more relations are packed into it; -suggestions are welcome :-( + +In file backend/optimizer/geqo/geqo_eval.c, routine +geqo_joinrel_size, +the present hack for MAXINT overflow is to set the Postgres integer +value of rel->size to its logarithm. +Modifications of Rel in backend/nodes/relation.h will +surely have severe impacts on the whole Postgres implementation. + + +Find solution for exhausted memory + + +Memory exhaustion may occur with more than 10 relations involved in a query. +In file backend/optimizer/geqo/geqo_eval.c, routine +gimme_tree is recursively called. +Maybe I forgot something to be freed correctly, but I dunno what. +Of course the rel data structure of the join keeps growing and +growing the more relations are packed into it. +Suggestions are welcome :-( -2.) Further Improvements -=============================================================== -Enable bushy query tree processing within PostgreSQL; + +Further Improvements + + +Enable bushy query tree processing within Postgres; that may improve the quality of query plans. - + + +References + +Reference information for GEQ algorithms. + + + + + +The Hitch-Hiker's Guide to Evolutionary Computation + + + +Jörg +Heitkötter + + +David +Beasley + + + + +InterNet resource + + + + +FAQ in comp.ai.genetic +is available at Encore. + + + + + +The Design and Implementation of the Postgres Query Optimizer + + + +Z. +Fong + + + + +University of California, Berkeley Computer Science Department + + + + +File planner/Report.ps in the 'postgres-papers' distribution. + + + + + + +Fundamentals of Database Systems + + + +R. +Elmasri + + +S. +Navathe + + + + +The Benjamin/Cummings Pub., Inc. + + + + + + +