1998-08-15 08:55:05 +02:00
|
|
|
<!--
|
2001-05-17 23:50:18 +02:00
|
|
|
$Header: /cvsroot/pgsql/doc/src/sgml/geqo.sgml,v 1.17 2001/05/17 21:50:15 petere Exp $
|
1998-08-15 08:55:05 +02:00
|
|
|
Genetic Optimizer
|
|
|
|
-->
|
|
|
|
|
2000-08-23 07:59:11 +02:00
|
|
|
<chapter id="geqo">
|
|
|
|
<docinfo>
|
|
|
|
<author>
|
|
|
|
<firstname>Martin</firstname>
|
|
|
|
<surname>Utesch</surname>
|
|
|
|
<affiliation>
|
|
|
|
<orgname>
|
|
|
|
University of Mining and Technology
|
|
|
|
</orgname>
|
|
|
|
<orgdiv>
|
|
|
|
Institute of Automatic Control
|
|
|
|
</orgdiv>
|
|
|
|
<address>
|
|
|
|
<city>
|
|
|
|
Freiberg
|
|
|
|
</city>
|
|
|
|
<country>
|
|
|
|
Germany
|
|
|
|
</country>
|
|
|
|
</address>
|
|
|
|
</affiliation>
|
|
|
|
</author>
|
|
|
|
<date>1997-10-02</date>
|
|
|
|
</docinfo>
|
|
|
|
|
2001-04-20 17:52:33 +02:00
|
|
|
<title>Genetic Query Optimization</title>
|
2000-08-23 07:59:11 +02:00
|
|
|
|
|
|
|
<para>
|
|
|
|
<note>
|
|
|
|
<title>Author</title>
|
|
|
|
<para>
|
2000-12-22 22:51:58 +01:00
|
|
|
Written by Martin Utesch (<email>utesch@aut.tu-freiberg.de</email>)
|
2000-08-23 07:59:11 +02:00
|
|
|
for the Institute of Automatic Control at the University of Mining and Technology in Freiberg, Germany.
|
|
|
|
</para>
|
|
|
|
</note>
|
|
|
|
</para>
|
|
|
|
|
2000-09-29 22:21:34 +02:00
|
|
|
<sect1 id="geqo-intro">
|
2000-08-23 07:59:11 +02:00
|
|
|
<title>Query Handling as a Complex Optimization Problem</title>
|
|
|
|
|
|
|
|
<para>
|
|
|
|
Among all relational operators the most difficult one to process and
|
|
|
|
optimize is the <firstterm>join</firstterm>. The number of alternative plans to answer a query
|
|
|
|
grows exponentially with the number of <command>join</command>s included in it. Further
|
|
|
|
optimization effort is caused by the support of a variety of
|
|
|
|
<firstterm>join methods</firstterm>
|
2000-12-16 23:44:47 +01:00
|
|
|
(e.g., nested loop, hash join, merge join in <productname>Postgres</productname>) to
|
2000-08-23 07:59:11 +02:00
|
|
|
process individual <command>join</command>s and a diversity of
|
2001-05-17 23:50:18 +02:00
|
|
|
<firstterm>indexes</firstterm> (e.g., R-tree,
|
|
|
|
B-tree, hash in <productname>Postgres</productname>) as access paths for relations.
|
2000-08-23 07:59:11 +02:00
|
|
|
</para>
|
|
|
|
|
|
|
|
<para>
|
|
|
|
The current <productname>Postgres</productname> optimizer
|
2000-12-16 23:44:47 +01:00
|
|
|
implementation performs a <firstterm>near-exhaustive search</firstterm>
|
|
|
|
over the space of alternative strategies. This query
|
2000-08-23 07:59:11 +02:00
|
|
|
optimization technique is inadequate to support database application
|
|
|
|
domains that involve the need for extensive queries, such as artificial
|
|
|
|
intelligence.
|
|
|
|
</para>
|
|
|
|
|
|
|
|
<para>
|
|
|
|
The Institute of Automatic Control at the University of Mining and
|
|
|
|
Technology, in Freiberg, Germany, encountered the described problems as its
|
|
|
|
folks wanted to take the <productname>Postgres</productname> DBMS as the backend for a decision
|
|
|
|
support knowledge based system for the maintenance of an electrical
|
|
|
|
power grid. The DBMS needed to handle large <command>join</command> queries for the
|
|
|
|
inference machine of the knowledge based system.
|
|
|
|
</para>
|
|
|
|
|
|
|
|
<para>
|
2000-12-16 23:44:47 +01:00
|
|
|
Performance difficulties in exploring the space of possible query
|
|
|
|
plans created the demand for a new optimization technique being developed.
|
2000-08-23 07:59:11 +02:00
|
|
|
</para>
|
|
|
|
|
|
|
|
<para>
|
|
|
|
In the following we propose the implementation of a <firstterm>Genetic Algorithm</firstterm>
|
|
|
|
as an option for the database query optimization problem.
|
|
|
|
</para>
|
|
|
|
</sect1>
|
|
|
|
|
2000-09-29 22:21:34 +02:00
|
|
|
<sect1 id="geqo-intro2">
|
2000-08-23 07:59:11 +02:00
|
|
|
<title>Genetic Algorithms (<acronym>GA</acronym>)</title>
|
|
|
|
|
|
|
|
<para>
|
2000-12-16 23:44:47 +01:00
|
|
|
The <acronym>GA</acronym> is a heuristic optimization method which
|
|
|
|
operates through
|
2000-08-23 07:59:11 +02:00
|
|
|
determined, randomized search. The set of possible solutions for the
|
|
|
|
optimization problem is considered as a
|
2000-12-16 23:44:47 +01:00
|
|
|
<firstterm>population</firstterm> of <firstterm>individuals</firstterm>.
|
2000-08-23 07:59:11 +02:00
|
|
|
The degree of adaption of an individual to its environment is specified
|
|
|
|
by its <firstterm>fitness</firstterm>.
|
|
|
|
</para>
|
|
|
|
|
|
|
|
<para>
|
|
|
|
The coordinates of an individual in the search space are represented
|
|
|
|
by <firstterm>chromosomes</firstterm>, in essence a set of character
|
|
|
|
strings. A <firstterm>gene</firstterm> is a
|
|
|
|
subsection of a chromosome which encodes the value of a single parameter
|
|
|
|
being optimized. Typical encodings for a gene could be <firstterm>binary</firstterm> or
|
|
|
|
<firstterm>integer</firstterm>.
|
|
|
|
</para>
|
|
|
|
|
|
|
|
<para>
|
|
|
|
Through simulation of the evolutionary operations <firstterm>recombination</firstterm>,
|
|
|
|
<firstterm>mutation</firstterm>, and
|
|
|
|
<firstterm>selection</firstterm> new generations of search points are found
|
|
|
|
that show a higher average fitness than their ancestors.
|
|
|
|
</para>
|
|
|
|
|
|
|
|
<para>
|
|
|
|
According to the "comp.ai.genetic" <acronym>FAQ</acronym> it cannot be stressed too
|
|
|
|
strongly that a <acronym>GA</acronym> is not a pure random search for a solution to a
|
|
|
|
problem. A <acronym>GA</acronym> uses stochastic processes, but the result is distinctly
|
|
|
|
non-random (better than random).
|
|
|
|
|
|
|
|
<programlisting>
|
|
|
|
Structured Diagram of a <acronym>GA</acronym>:
|
1998-03-01 09:16:16 +01:00
|
|
|
---------------------------
|
|
|
|
|
|
|
|
P(t) generation of ancestors at a time t
|
|
|
|
P''(t) generation of descendants at a time t
|
|
|
|
|
|
|
|
+=========================================+
|
|
|
|
|>>>>>>>>>>> Algorithm GA <<<<<<<<<<<<<<|
|
|
|
|
+=========================================+
|
|
|
|
| INITIALIZE t := 0 |
|
|
|
|
+=========================================+
|
|
|
|
| INITIALIZE P(t) |
|
|
|
|
+=========================================+
|
|
|
|
| evalute FITNESS of P(t) |
|
|
|
|
+=========================================+
|
|
|
|
| while not STOPPING CRITERION do |
|
|
|
|
| +-------------------------------------+
|
|
|
|
| | P'(t) := RECOMBINATION{P(t)} |
|
|
|
|
| +-------------------------------------+
|
|
|
|
| | P''(t) := MUTATION{P'(t)} |
|
|
|
|
| +-------------------------------------+
|
|
|
|
| | P(t+1) := SELECTION{P''(t) + P(t)} |
|
|
|
|
| +-------------------------------------+
|
|
|
|
| | evalute FITNESS of P''(t) |
|
|
|
|
| +-------------------------------------+
|
|
|
|
| | t := t + 1 |
|
|
|
|
+===+=====================================+
|
2000-08-23 07:59:11 +02:00
|
|
|
</programlisting>
|
|
|
|
</para>
|
|
|
|
</sect1>
|
|
|
|
|
2000-09-29 22:21:34 +02:00
|
|
|
<sect1 id="geqo-pg-intro">
|
2000-08-23 07:59:11 +02:00
|
|
|
<title>Genetic Query Optimization (<acronym>GEQO</acronym>) in Postgres</title>
|
|
|
|
|
|
|
|
<para>
|
|
|
|
The <acronym>GEQO</acronym> module is intended for the solution of the query
|
|
|
|
optimization problem similar to a traveling salesman problem (<acronym>TSP</acronym>).
|
|
|
|
Possible query plans are encoded as integer strings. Each string
|
|
|
|
represents the <command>join</command> order from one relation of the query to the next.
|
|
|
|
E. g., the query tree
|
|
|
|
<programlisting>
|
|
|
|
/\
|
|
|
|
/\ 2
|
|
|
|
/\ 3
|
|
|
|
4 1
|
|
|
|
</programlisting>
|
|
|
|
is encoded by the integer string '4-1-3-2',
|
|
|
|
which means, first join relation '4' and '1', then '3', and
|
2000-12-16 23:44:47 +01:00
|
|
|
then '2', where 1, 2, 3, 4 are relids within the
|
|
|
|
<productname>Postgres</productname> optimizer.
|
2000-08-23 07:59:11 +02:00
|
|
|
</para>
|
|
|
|
|
|
|
|
<para>
|
|
|
|
Parts of the <acronym>GEQO</acronym> module are adapted from D. Whitley's Genitor
|
|
|
|
algorithm.
|
|
|
|
</para>
|
|
|
|
|
|
|
|
<para>
|
|
|
|
Specific characteristics of the <acronym>GEQO</acronym>
|
|
|
|
implementation in <productname>Postgres</productname>
|
|
|
|
are:
|
|
|
|
|
|
|
|
<itemizedlist spacing="compact" mark="bullet">
|
|
|
|
<listitem>
|
|
|
|
<para>
|
|
|
|
Usage of a <firstterm>steady state</firstterm> <acronym>GA</acronym> (replacement of the least fit
|
|
|
|
individuals in a population, not whole-generational replacement)
|
|
|
|
allows fast convergence towards improved query plans. This is
|
|
|
|
essential for query handling with reasonable time;
|
|
|
|
</para>
|
|
|
|
</listitem>
|
|
|
|
|
|
|
|
<listitem>
|
|
|
|
<para>
|
2000-12-16 23:44:47 +01:00
|
|
|
Usage of <firstterm>edge recombination crossover</firstterm> which is
|
|
|
|
especially suited
|
2000-08-23 07:59:11 +02:00
|
|
|
to keep edge losses low for the solution of the
|
2000-12-16 23:44:47 +01:00
|
|
|
<acronym>TSP</acronym> by means of a <acronym>GA</acronym>;
|
2000-08-23 07:59:11 +02:00
|
|
|
</para>
|
|
|
|
</listitem>
|
|
|
|
|
|
|
|
<listitem>
|
|
|
|
<para>
|
|
|
|
Mutation as genetic operator is deprecated so that no repair
|
|
|
|
mechanisms are needed to generate legal <acronym>TSP</acronym> tours.
|
|
|
|
</para>
|
|
|
|
</listitem>
|
|
|
|
</itemizedlist>
|
|
|
|
</para>
|
|
|
|
|
|
|
|
<para>
|
2000-12-16 23:44:47 +01:00
|
|
|
The <acronym>GEQO</acronym> module allows
|
|
|
|
the <productname>Postgres</productname> query optimizer to
|
|
|
|
support large <command>join</command> queries effectively through
|
|
|
|
non-exhaustive search.
|
2000-08-23 07:59:11 +02:00
|
|
|
</para>
|
|
|
|
|
2000-12-16 23:44:47 +01:00
|
|
|
<sect2 id="geqo-future">
|
2000-08-23 07:59:11 +02:00
|
|
|
<title>Future Implementation Tasks for
|
2000-09-18 22:11:37 +02:00
|
|
|
<productname>PostgreSQL</> <acronym>GEQO</acronym></title>
|
2000-08-23 07:59:11 +02:00
|
|
|
|
|
|
|
<para>
|
2000-12-16 23:44:47 +01:00
|
|
|
Work is still needed to improve the genetic algorithm parameter
|
|
|
|
settings.
|
2000-08-23 07:59:11 +02:00
|
|
|
In file <filename>backend/optimizer/geqo/geqo_params.c</filename>, routines
|
|
|
|
<function>gimme_pool_size</function> and <function>gimme_number_generations</function>,
|
|
|
|
we have to find a compromise for the parameter settings
|
|
|
|
to satisfy two competing demands:
|
|
|
|
<itemizedlist spacing="compact">
|
|
|
|
<listitem>
|
|
|
|
<para>
|
|
|
|
Optimality of the query plan
|
|
|
|
</para>
|
|
|
|
</listitem>
|
|
|
|
<listitem>
|
|
|
|
<para>
|
|
|
|
Computing time
|
|
|
|
</para>
|
|
|
|
</listitem>
|
|
|
|
</itemizedlist>
|
|
|
|
</para>
|
|
|
|
|
|
|
|
</sect2>
|
|
|
|
|
|
|
|
<bibliography id="geqo-biblio">
|
|
|
|
<title>
|
|
|
|
References
|
|
|
|
</title>
|
|
|
|
<para>Reference information for <acronym>GEQ</acronym> algorithms.
|
|
|
|
</para>
|
|
|
|
<biblioentry>
|
|
|
|
|
|
|
|
<bookbiblio>
|
|
|
|
<title>
|
|
|
|
The Hitch-Hiker's Guide to Evolutionary Computation
|
|
|
|
</title>
|
|
|
|
<authorgroup>
|
|
|
|
<author>
|
|
|
|
<firstname>Jörg</firstname>
|
|
|
|
<surname>Heitkötter</surname>
|
|
|
|
</author>
|
|
|
|
<author>
|
|
|
|
<firstname>David</firstname>
|
|
|
|
<surname>Beasley</surname>
|
|
|
|
</author>
|
|
|
|
</authorgroup>
|
|
|
|
<publisher>
|
|
|
|
<publishername>
|
|
|
|
InterNet resource
|
|
|
|
</publishername>
|
|
|
|
</publisher>
|
|
|
|
<abstract>
|
|
|
|
<para>
|
|
|
|
FAQ in <ulink url="news://comp.ai.genetic">comp.ai.genetic</ulink>
|
|
|
|
is available at <ulink
|
|
|
|
url="ftp://ftp.Germany.EU.net/pub/research/softcomp/EC/Welcome.html">Encore</ulink>.
|
|
|
|
</para>
|
|
|
|
</abstract>
|
|
|
|
</bookbiblio>
|
|
|
|
|
|
|
|
<bookbiblio>
|
|
|
|
<title>
|
|
|
|
The Design and Implementation of the Postgres Query Optimizer
|
|
|
|
</title>
|
|
|
|
<authorgroup>
|
|
|
|
<author>
|
|
|
|
<firstname>Z.</firstname>
|
|
|
|
<surname>Fong</surname>
|
|
|
|
</author>
|
|
|
|
</authorgroup>
|
|
|
|
<publisher>
|
|
|
|
<publishername>
|
|
|
|
University of California, Berkeley Computer Science Department
|
|
|
|
</publishername>
|
|
|
|
</publisher>
|
|
|
|
<abstract>
|
|
|
|
<para>
|
|
|
|
File <filename>planner/Report.ps</filename> in the 'postgres-papers' distribution.
|
|
|
|
</para>
|
|
|
|
</abstract>
|
|
|
|
</bookbiblio>
|
|
|
|
|
|
|
|
<bookbiblio>
|
|
|
|
<title>
|
|
|
|
Fundamentals of Database Systems
|
|
|
|
</title>
|
|
|
|
<authorgroup>
|
|
|
|
<author>
|
|
|
|
<firstname>R.</firstname>
|
|
|
|
<surname>Elmasri</surname>
|
|
|
|
</author>
|
|
|
|
<author>
|
|
|
|
<firstname>S.</firstname>
|
|
|
|
<surname>Navathe</surname>
|
|
|
|
</author>
|
|
|
|
</authorgroup>
|
|
|
|
<publisher>
|
|
|
|
<publishername>
|
|
|
|
The Benjamin/Cummings Pub., Inc.
|
|
|
|
</publishername>
|
|
|
|
</publisher>
|
|
|
|
</bookbiblio>
|
|
|
|
|
|
|
|
</biblioentry>
|
|
|
|
</bibliography>
|
|
|
|
|
|
|
|
</sect1>
|
|
|
|
</chapter>
|
1999-03-30 17:25:56 +02:00
|
|
|
|
|
|
|
<!-- Keep this comment at the end of the file
|
|
|
|
Local variables:
|
2000-03-31 05:27:42 +02:00
|
|
|
mode:sgml
|
1999-03-30 17:25:56 +02:00
|
|
|
sgml-omittag:nil
|
|
|
|
sgml-shorttag:t
|
|
|
|
sgml-minimize-attributes:nil
|
|
|
|
sgml-always-quote-attributes:t
|
|
|
|
sgml-indent-step:1
|
|
|
|
sgml-indent-data:t
|
|
|
|
sgml-parent-document:nil
|
|
|
|
sgml-default-dtd-file:"./reference.ced"
|
|
|
|
sgml-exposed-tags:nil
|
2000-03-31 05:27:42 +02:00
|
|
|
sgml-local-catalogs:("/usr/lib/sgml/catalog")
|
1999-03-30 17:25:56 +02:00
|
|
|
sgml-local-ecat-files:nil
|
|
|
|
End:
|
|
|
|
-->
|