postgresql/doc/src/sgml/geqo.sgml

319 lines
9.9 KiB
Plaintext
Raw Normal View History

<!--
2002-01-20 23:19:57 +01:00
$Header: /cvsroot/pgsql/doc/src/sgml/geqo.sgml,v 1.23 2002/01/20 22:19:56 petere Exp $
Genetic Optimizer
-->
<chapter id="geqo">
<docinfo>
<author>
<firstname>Martin</firstname>
<surname>Utesch</surname>
<affiliation>
<orgname>
University of Mining and Technology
</orgname>
<orgdiv>
Institute of Automatic Control
</orgdiv>
<address>
<city>
Freiberg
</city>
<country>
Germany
</country>
</address>
</affiliation>
</author>
<date>1997-10-02</date>
</docinfo>
<title>Genetic Query Optimization</title>
<para>
<note>
<title>Author</title>
<para>
Written by Martin Utesch (<email>utesch@aut.tu-freiberg.de</email>)
for the Institute of Automatic Control at the University of Mining and Technology in Freiberg, Germany.
</para>
</note>
</para>
<sect1 id="geqo-intro">
<title>Query Handling as a Complex Optimization Problem</title>
<para>
Among all relational operators the most difficult one to process and
optimize is the <firstterm>join</firstterm>. The number of alternative plans to answer a query
grows exponentially with the number of joins included in it. Further
optimization effort is caused by the support of a variety of
<firstterm>join methods</firstterm>
(e.g., nested loop, hash join, merge join in <productname>PostgreSQL</productname>) to
process individual joins and a diversity of
<firstterm>indexes</firstterm> (e.g., R-tree,
B-tree, hash in <productname>PostgreSQL</productname>) as access paths for relations.
</para>
<para>
The current <productname>PostgreSQL</productname> optimizer
2000-12-16 23:44:47 +01:00
implementation performs a <firstterm>near-exhaustive search</firstterm>
over the space of alternative strategies. This query
optimization technique is inadequate to support database application
domains that involve the need for extensive queries, such as artificial
intelligence.
</para>
<para>
The Institute of Automatic Control at the University of Mining and
Technology, in Freiberg, Germany, encountered the described problems as its
folks wanted to take the <productname>PostgreSQL</productname> DBMS as the backend for a decision
support knowledge based system for the maintenance of an electrical
power grid. The DBMS needed to handle large join queries for the
inference machine of the knowledge based system.
</para>
<para>
2000-12-16 23:44:47 +01:00
Performance difficulties in exploring the space of possible query
plans created the demand for a new optimization technique being developed.
</para>
<para>
In the following we propose the implementation of a <firstterm>Genetic Algorithm</firstterm>
as an option for the database query optimization problem.
</para>
</sect1>
<sect1 id="geqo-intro2">
<title>Genetic Algorithms</title>
<para>
The genetic algorithm (<acronym>GA</acronym>) is a heuristic optimization method which
2000-12-16 23:44:47 +01:00
operates through
determined, randomized search. The set of possible solutions for the
optimization problem is considered as a
2000-12-16 23:44:47 +01:00
<firstterm>population</firstterm> of <firstterm>individuals</firstterm>.
2002-01-20 23:19:57 +01:00
The degree of adaptation of an individual to its environment is specified
by its <firstterm>fitness</firstterm>.
</para>
<para>
The coordinates of an individual in the search space are represented
by <firstterm>chromosomes</firstterm>, in essence a set of character
strings. A <firstterm>gene</firstterm> is a
subsection of a chromosome which encodes the value of a single parameter
being optimized. Typical encodings for a gene could be <firstterm>binary</firstterm> or
<firstterm>integer</firstterm>.
</para>
<para>
Through simulation of the evolutionary operations <firstterm>recombination</firstterm>,
<firstterm>mutation</firstterm>, and
<firstterm>selection</firstterm> new generations of search points are found
that show a higher average fitness than their ancestors.
</para>
<para>
According to the <systemitem class="resource">comp.ai.genetic</> <acronym>FAQ</acronym> it cannot be stressed too
strongly that a <acronym>GA</acronym> is not a pure random search for a solution to a
problem. A <acronym>GA</acronym> uses stochastic processes, but the result is distinctly
non-random (better than random).
</para>
<figure id="geqo-diagram">
<title>Structured Diagram of a Genetic Algorithm</title>
<informaltable frame="none">
<tgroup cols="2">
<tbody>
<row>
<entry>P(t)</entry>
<entry>generation of ancestors at a time t</entry>
</row>
<row>
<entry>P''(t)</entry>
<entry>generation of descendants at a time t</entry>
</row>
</tbody>
</tgroup>
</informaltable>
<literallayout class="monospaced">
1998-03-01 09:16:16 +01:00
+=========================================+
|>>>>>>>>>>> Algorithm GA <<<<<<<<<<<<<<|
+=========================================+
| INITIALIZE t := 0 |
+=========================================+
| INITIALIZE P(t) |
+=========================================+
2002-01-20 23:19:57 +01:00
| evaluate FITNESS of P(t) |
1998-03-01 09:16:16 +01:00
+=========================================+
| while not STOPPING CRITERION do |
| +-------------------------------------+
| | P'(t) := RECOMBINATION{P(t)} |
| +-------------------------------------+
| | P''(t) := MUTATION{P'(t)} |
| +-------------------------------------+
| | P(t+1) := SELECTION{P''(t) + P(t)} |
| +-------------------------------------+
2002-01-20 23:19:57 +01:00
| | evaluate FITNESS of P''(t) |
1998-03-01 09:16:16 +01:00
| +-------------------------------------+
| | t := t + 1 |
+===+=====================================+
</literallayout>
</figure>
</sect1>
<sect1 id="geqo-pg-intro">
<title>Genetic Query Optimization (<acronym>GEQO</acronym>) in PostgreSQL</title>
<para>
The <acronym>GEQO</acronym> module is intended for the solution of the query
optimization problem similar to a traveling salesman problem (<acronym>TSP</acronym>).
Possible query plans are encoded as integer strings. Each string
represents the join order from one relation of the query to the next.
E. g., the query tree
<literallayout class="monospaced">
/\
/\ 2
/\ 3
4 1
</literallayout>
is encoded by the integer string '4-1-3-2',
which means, first join relation '4' and '1', then '3', and
2002-01-20 23:19:57 +01:00
then '2', where 1, 2, 3, 4 are relation IDs within the
<productname>PostgreSQL</productname> optimizer.
</para>
<para>
Parts of the <acronym>GEQO</acronym> module are adapted from D. Whitley's Genitor
algorithm.
</para>
<para>
Specific characteristics of the <acronym>GEQO</acronym>
implementation in <productname>PostgreSQL</productname>
are:
<itemizedlist spacing="compact" mark="bullet">
<listitem>
<para>
Usage of a <firstterm>steady state</firstterm> <acronym>GA</acronym> (replacement of the least fit
individuals in a population, not whole-generational replacement)
allows fast convergence towards improved query plans. This is
essential for query handling with reasonable time;
</para>
</listitem>
<listitem>
<para>
2000-12-16 23:44:47 +01:00
Usage of <firstterm>edge recombination crossover</firstterm> which is
especially suited
to keep edge losses low for the solution of the
2000-12-16 23:44:47 +01:00
<acronym>TSP</acronym> by means of a <acronym>GA</acronym>;
</para>
</listitem>
<listitem>
<para>
Mutation as genetic operator is deprecated so that no repair
mechanisms are needed to generate legal <acronym>TSP</acronym> tours.
</para>
</listitem>
</itemizedlist>
</para>
<para>
2000-12-16 23:44:47 +01:00
The <acronym>GEQO</acronym> module allows
the <productname>PostgreSQL</productname> query optimizer to
support large join queries effectively through
2000-12-16 23:44:47 +01:00
non-exhaustive search.
</para>
2000-12-16 23:44:47 +01:00
<sect2 id="geqo-future">
<title>Future Implementation Tasks for
<productname>PostgreSQL</> <acronym>GEQO</acronym></title>
<para>
2000-12-16 23:44:47 +01:00
Work is still needed to improve the genetic algorithm parameter
settings.
In file <filename>backend/optimizer/geqo/geqo_params.c</filename>, routines
<function>gimme_pool_size</function> and <function>gimme_number_generations</function>,
we have to find a compromise for the parameter settings
to satisfy two competing demands:
<itemizedlist spacing="compact">
<listitem>
<para>
Optimality of the query plan
</para>
</listitem>
<listitem>
<para>
Computing time
</para>
</listitem>
</itemizedlist>
</para>
</sect2>
</sect1>
<sect1 id="geqo-biblio">
<title>Further Readings</title>
<para>
The following resources contain additional information about
genetic algorithms:
<itemizedlist>
<listitem>
<para>
<ulink url="http://surf.de.uu.net/encore/www/">The Hitch-Hiker's
Guide to Evolutionary Computation</ulink> (FAQ for <ulink
url="news://comp.ai.genetic">comp.ai.genetic</ulink>)
</para>
</listitem>
<listitem>
<para>
<ulink url="http://www.red3d.com/cwr/evolve.html">Evolutionary
Computation and its application to art and design</ulink> by
Craig Reynolds
</para>
</listitem>
<listitem>
<para>
<xref linkend="ELMA99">
</para>
</listitem>
<listitem>
<para>
<xref linkend="FONG">
</para>
</listitem>
</itemizedlist>
</para>
</sect1>
</chapter>
<!-- Keep this comment at the end of the file
Local variables:
mode:sgml
sgml-omittag:nil
sgml-shorttag:t
sgml-minimize-attributes:nil
sgml-always-quote-attributes:t
sgml-indent-step:1
sgml-indent-data:t
sgml-parent-document:nil
sgml-default-dtd-file:"./reference.ced"
sgml-exposed-tags:nil
sgml-local-catalogs:("/usr/lib/sgml/catalog")
sgml-local-ecat-files:nil
End:
-->