<!--
$Header: /cvsroot/pgsql/doc/src/sgml/geqo.sgml,v 1.13 2000/09/29 20:21:33 petere Exp $
Genetic Optimizer
-->

<chapter id="geqo">
<docinfo>
<author>
<firstname>Martin</firstname>
<surname>Utesch</surname>
<affiliation>
<orgname>
University of Mining and Technology
</orgname>
<orgdiv>
Institute of Automatic Control
</orgdiv>
<address>
<city>
Freiberg
</city>
<country>
Germany
</country>
</address>
</affiliation>
</author>
<date>1997-10-02</date>
</docinfo>

<title>Genetic Query Optimization in Database Systems</title>

<para>
<note>
<title>Author</title>
<para>
Written by <ulink url="mailto:utesch@aut.tu-freiberg.de">Martin Utesch</ulink>
for the Institute of Automatic Control at the University of Mining and Technology in Freiberg, Germany.
</para>
</note>
</para>

<sect1 id="geqo-intro">
<title>Query Handling as a Complex Optimization Problem</title>

<para>
Among all relational operators, the most difficult one to process and
optimize is the <firstterm>join</firstterm>. The number of alternative plans to answer a query
grows exponentially with the number of <command>join</command>s included in it. Further
optimization effort is required to support a variety of
<firstterm>join methods</firstterm>
(e.g., nested loop, hash join, merge join in <productname>Postgres</productname>) to
process individual <command>join</command>s, and a diversity of
<firstterm>indices</firstterm> (e.g., R-tree,
B-tree, hash in <productname>Postgres</productname>) as access paths for relations.
</para>

<para>
The current <productname>Postgres</productname> optimizer
implementation performs a <firstterm>near-exhaustive search</firstterm>
over the space of alternative strategies. This query
optimization technique is inadequate to support database application
domains that involve extensive queries, such as artificial
intelligence.
</para>

<para>
The Institute of Automatic Control at the University of Mining and
Technology in Freiberg, Germany, encountered these problems when its
staff wanted to use the <productname>Postgres</productname> DBMS as the backend for a decision
support knowledge-based system for the maintenance of an electrical
power grid. The DBMS needed to handle large <command>join</command> queries for the
inference engine of the knowledge-based system.
</para>

<para>
Performance difficulties in exploring the space of possible query
plans created the demand for a new optimization technique.
</para>

<para>
In the following, we propose the implementation of a <firstterm>Genetic Algorithm</firstterm>
as an option for the database query optimization problem.
</para>
</sect1>

<sect1 id="geqo-intro2">
<title>Genetic Algorithms (<acronym>GA</acronym>)</title>

<para>
The <acronym>GA</acronym> is a heuristic optimization method which operates through
a directed, randomized search. The set of possible solutions for the
optimization problem is considered as a
<firstterm>population</firstterm> of <firstterm>individuals</firstterm>.
The degree of adaptation of an individual to its environment is specified
by its <firstterm>fitness</firstterm>.
</para>

<para>
The coordinates of an individual in the search space are represented
by <firstterm>chromosomes</firstterm>, in essence a set of character
strings. A <firstterm>gene</firstterm> is a
subsection of a chromosome which encodes the value of a single parameter
being optimized. Typical encodings for a gene are <firstterm>binary</firstterm> or
<firstterm>integer</firstterm>.
</para>

<para>
Through simulation of the evolutionary operations <firstterm>recombination</firstterm>,
<firstterm>mutation</firstterm>, and
<firstterm>selection</firstterm>, new generations of search points are found
that show a higher average fitness than their ancestors.
</para>

<para>
According to the "comp.ai.genetic" <acronym>FAQ</acronym>, it cannot be stressed too
strongly that a <acronym>GA</acronym> is not a pure random search for a solution to a
problem. A <acronym>GA</acronym> uses stochastic processes, but the result is distinctly
non-random (better than random).

<programlisting>
Structured Diagram of a <acronym>GA</acronym>:
---------------------------

P(t)    generation of ancestors at a time t
P''(t)  generation of descendants at a time t

+=========================================+
|>>>>>>>>>>>>> Algorithm GA <<<<<<<<<<<<<<|
+=========================================+
| INITIALIZE t := 0                       |
+=========================================+
| INITIALIZE P(t)                         |
+=========================================+
| evaluate FITNESS of P(t)                |
+=========================================+
| while not STOPPING CRITERION do         |
|   +-------------------------------------+
|   | P'(t)  := RECOMBINATION{P(t)}       |
|   +-------------------------------------+
|   | P''(t) := MUTATION{P'(t)}           |
|   +-------------------------------------+
|   | evaluate FITNESS of P''(t)          |
|   +-------------------------------------+
|   | P(t+1) := SELECTION{P''(t) + P(t)}  |
|   +-------------------------------------+
|   | t := t + 1                          |
+===+=====================================+
</programlisting>
</para>
</sect1>

<sect1 id="geqo-pg-intro">
<title>Genetic Query Optimization (<acronym>GEQO</acronym>) in Postgres</title>

<para>
The <acronym>GEQO</acronym> module treats the query
optimization problem much like a traveling salesman problem (<acronym>TSP</acronym>).
Possible query plans are encoded as integer strings. Each string
represents the <command>join</command> order from one relation of the query to the next.
For example, the query tree
<programlisting>
   /\
  /\ 2
 /\ 3
4  1
</programlisting>
is encoded by the integer string '4-1-3-2',
which means: first join relations '4' and '1', then '3', and
then '2', where 1, 2, 3, 4 are relids in <productname>Postgres</productname>.
</para>

<para>
Parts of the <acronym>GEQO</acronym> module are adapted from D. Whitley's Genitor
algorithm.
</para>

<para>
Specific characteristics of the <acronym>GEQO</acronym>
implementation in <productname>Postgres</productname>
are:

<itemizedlist spacing="compact" mark="bullet">
<listitem>
<para>
Usage of a <firstterm>steady state</firstterm> <acronym>GA</acronym> (replacement of the least fit
individuals in a population, not whole-generational replacement)
allows fast convergence towards improved query plans. This is
essential for handling queries in reasonable time;
</para>
</listitem>

<listitem>
<para>
Usage of <firstterm>edge recombination crossover</firstterm>, which is especially suited
to keep edge losses low for the solution of the
<acronym>TSP</acronym> by means of a <acronym>GA</acronym>;
</para>
</listitem>

<listitem>
<para>
Mutation as a genetic operator is avoided, so that no repair
mechanisms are needed to generate legal <acronym>TSP</acronym> tours.
</para>
</listitem>
</itemizedlist>
</para>

<para>
The <acronym>GEQO</acronym> module gives the following benefits to
the <productname>Postgres</productname> DBMS
compared to the standard <productname>Postgres</productname> query optimizer implementation:

<itemizedlist spacing="compact" mark="bullet">
<listitem>
<para>
Handling of large <command>join</command> queries through non-exhaustive search;
</para>
</listitem>

<listitem>
<para>
Improved cost size approximation of query plans, since plan merging
is no longer needed (the <acronym>GEQO</acronym> module evaluates the cost of each
query plan as an individual).
</para>
</listitem>
</itemizedlist>
</para>

</sect1>

<sect1 id="geqo-future">
<title>Future Implementation Tasks for
<productname>PostgreSQL</> <acronym>GEQO</acronym></title>

<sect2>
<title>Basic Improvements</title>

<sect3>
<title>Improve genetic algorithm parameter settings</title>

<para>
In the routines <function>gimme_pool_size</function> and <function>gimme_number_generations</function>
in file <filename>backend/optimizer/geqo/geqo_params.c</filename>,
we have to find a compromise for the parameter settings
to satisfy two competing demands:
<itemizedlist spacing="compact">
<listitem>
<para>
Optimality of the query plan
</para>
</listitem>
<listitem>
<para>
Computing time
</para>
</listitem>
</itemizedlist>
</para>
</sect3>

<sect3>
<title>Find better solution for integer overflow</title>

<para>
In the routine <function>geqo_joinrel_size</function> in file
<filename>backend/optimizer/geqo/geqo_eval.c</filename>,
the present hack for MAXINT overflow is to set the <productname>Postgres</productname> integer
value of <structfield>rel->size</structfield> to its logarithm.
Modifying <structname>Rel</structname> in <filename>backend/nodes/relation.h</filename> would
surely have severe impacts on the whole <productname>Postgres</productname> implementation.
</para>
</sect3>

<sect3>
<title>Find solution for exhausted memory</title>

<para>
Memory exhaustion may occur with more than 10 relations involved in a query.
In file <filename>backend/optimizer/geqo/geqo_eval.c</filename>, the routine
<function>gimme_tree</function> is called recursively.
Maybe I forgot to free something correctly, but I do not know what.
Of course, the <structname>rel</structname> data structure of the
<command>join</command> keeps growing
as more relations are packed into it.
Suggestions are welcome.
</para>
</sect3>
</sect2>

<bibliography id="geqo-biblio">
<title>
References
</title>
<para>Reference information for <acronym>GEQO</acronym> algorithms.
</para>
<biblioentry>

<bookbiblio>
<title>
The Hitch-Hiker's Guide to Evolutionary Computation
</title>
<authorgroup>
<author>
<firstname>Jörg</firstname>
<surname>Heitkötter</surname>
</author>
<author>
<firstname>David</firstname>
<surname>Beasley</surname>
</author>
</authorgroup>
<publisher>
<publishername>
Internet resource
</publishername>
</publisher>
<abstract>
<para>
FAQ in <ulink url="news://comp.ai.genetic">comp.ai.genetic</ulink>
is available at <ulink
url="ftp://ftp.Germany.EU.net/pub/research/softcomp/EC/Welcome.html">Encore</ulink>.
</para>
</abstract>
</bookbiblio>

<bookbiblio>
<title>
The Design and Implementation of the Postgres Query Optimizer
</title>
<authorgroup>
<author>
<firstname>Z.</firstname>
<surname>Fong</surname>
</author>
</authorgroup>
<publisher>
<publishername>
University of California, Berkeley Computer Science Department
</publishername>
</publisher>
<abstract>
<para>
File <filename>planner/Report.ps</filename> in the 'postgres-papers' distribution.
</para>
</abstract>
</bookbiblio>

<bookbiblio>
<title>
Fundamentals of Database Systems
</title>
<authorgroup>
<author>
<firstname>R.</firstname>
<surname>Elmasri</surname>
</author>
<author>
<firstname>S.</firstname>
<surname>Navathe</surname>
</author>
</authorgroup>
<publisher>
<publishername>
The Benjamin/Cummings Pub., Inc.
</publishername>
</publisher>
</bookbiblio>

</biblioentry>
</bibliography>

</sect1>
</chapter>

<!-- Keep this comment at the end of the file
Local variables:
mode:sgml
sgml-omittag:nil
sgml-shorttag:t
sgml-minimize-attributes:nil
sgml-always-quote-attributes:t
sgml-indent-step:1
sgml-indent-data:t
sgml-parent-document:nil
sgml-default-dtd-file:"./reference.ced"
sgml-exposed-tags:nil
sgml-local-catalogs:("/usr/lib/sgml/catalog")
sgml-local-ecat-files:nil
End:
-->