This patch makes a few incremental improvements to geqo.sgml and

arch-dev.sgml

Neil Conway
This commit is contained in:
Bruce Momjian 2003-09-29 18:18:35 +00:00
parent 04e401f97f
commit a17b53753e
6 changed files with 101 additions and 65 deletions

View File

@ -1,5 +1,5 @@
<!--
$Header: /cvsroot/pgsql/doc/src/sgml/arch-dev.sgml,v 2.21 2003/06/22 16:16:44 tgl Exp $
$Header: /cvsroot/pgsql/doc/src/sgml/arch-dev.sgml,v 2.22 2003/09/29 18:18:35 momjian Exp $
-->
<chapter id="overview">
@ -25,7 +25,7 @@ $Header: /cvsroot/pgsql/doc/src/sgml/arch-dev.sgml,v 2.21 2003/06/22 16:16:44 tg
very extensive. Rather, this chapter is intended to help the reader
understand the general sequence of operations that occur within the
backend from the point at which a query is received, to the point
when the results are returned to the client.
at which the results are returned to the client.
</para>
<sect1 id="query-path">
@ -79,7 +79,7 @@ $Header: /cvsroot/pgsql/doc/src/sgml/arch-dev.sgml,v 2.21 2003/06/22 16:16:44 tg
<step>
<para>
The <firstterm>planner/optimizer</firstterm> takes
the (rewritten) querytree and creates a
the (rewritten) query tree and creates a
<firstterm>query plan</firstterm> that will be the input to the
<firstterm>executor</firstterm>.
</para>
@ -183,12 +183,12 @@ $Header: /cvsroot/pgsql/doc/src/sgml/arch-dev.sgml,v 2.21 2003/06/22 16:16:44 tg
<title>Parser</title>
<para>
The parser has to check the query string (which arrives as
plain ASCII text) for valid syntax. If the syntax is correct a
<firstterm>parse tree</firstterm> is built up and handed back otherwise an error is
returned. For the implementation the well known Unix
tools <application>lex</application> and <application>yacc</application>
are used.
The parser has to check the query string (which arrives as plain
ASCII text) for valid syntax. If the syntax is correct a
<firstterm>parse tree</firstterm> is built up and handed back;
otherwise an error is returned. The parser and lexer are
implemented using the well-known Unix tools <application>yacc</>
and <application>lex</>.
</para>
<para>
@ -201,23 +201,22 @@ $Header: /cvsroot/pgsql/doc/src/sgml/arch-dev.sgml,v 2.21 2003/06/22 16:16:44 tg
</para>
<para>
The parser is defined in the file <filename>gram.y</filename> and consists of a
set of <firstterm>grammar rules</firstterm> and <firstterm>actions</firstterm>
that are executed
whenever a rule is fired. The code of the actions (which
is actually C-code) is used to build up the parse tree.
The parser is defined in the file <filename>gram.y</filename> and
consists of a set of <firstterm>grammar rules</firstterm> and
<firstterm>actions</firstterm> that are executed whenever a rule
is fired. The code of the actions (which is actually C code) is
used to build up the parse tree.
</para>
<para>
The file <filename>scan.l</filename> is transformed to
the C-source file <filename>scan.c</filename>
using the program <application>lex</application>
and <filename>gram.y</filename> is transformed to
<filename>gram.c</filename> using <application>yacc</application>.
After these transformations have taken
place a normal C-compiler can be used to create the
parser. Never make any changes to the generated C-files as they will
be overwritten the next time <application>lex</application>
The file <filename>scan.l</filename> is transformed to the C
source file <filename>scan.c</filename> using the program
<application>lex</application> and <filename>gram.y</filename> is
transformed to <filename>gram.c</filename> using
<application>yacc</application>. After these transformations
have taken place a normal C compiler can be used to create the
parser. Never make any changes to the generated C files as they
will be overwritten the next time <application>lex</application>
or <application>yacc</application> is called.
<note>
@ -334,15 +333,27 @@ $Header: /cvsroot/pgsql/doc/src/sgml/arch-dev.sgml,v 2.21 2003/06/22 16:16:44 tg
<title>Planner/Optimizer</title>
<para>
The task of the <firstterm>planner/optimizer</firstterm> is to create an optimal
execution plan. It first considers all possible ways of
<firstterm>scanning</firstterm> and <firstterm>joining</firstterm>
the relations that appear in a
query. All the created paths lead to the same result and it's the
task of the optimizer to estimate the cost of executing each path and
find out which one is the cheapest.
The task of the <firstterm>planner/optimizer</firstterm> is to
create an optimal execution plan. A given SQL query (and hence, a
query tree) can be actually executed in a wide variety of
different ways, each of which will produce the same set of
results. If it is computationally feasible, the query optimizer
will examine each of these possible execution plans, ultimately
selecting the execution plan that will run the fastest.
</para>
<note>
<para>
In some situations, examining each possible way in which a query
may be executed would take an excessive amount of time and memory
space. In particular, this occurs when executing queries
involving large numbers of join operations. In order to determine
a reasonable (not optimal) query plan in a reasonable amount of
time, <productname>PostgreSQL</productname> uses a <xref
linkend="geqo" endterm="geqo-title">.
</para>
</note>
<para>
After the cheapest path is determined, a <firstterm>plan tree</>
is built to pass to the executor. This represents the desired
@ -373,7 +384,7 @@ $Header: /cvsroot/pgsql/doc/src/sgml/arch-dev.sgml,v 2.21 2003/06/22 16:16:44 tg
After all feasible plans have been found for scanning single relations,
plans for joining relations are created. The planner/optimizer
preferentially considers joins between any two relations for which there
exist a corresponding join clause in the WHERE qualification (i.e. for
exist a corresponding join clause in the <literal>WHERE</literal> qualification (i.e. for
which a restriction like <literal>where rel1.attr1=rel2.attr2</literal>
exists). Join pairs with no join clause are considered only when there
is no other choice, that is, a particular relation has no available
@ -416,17 +427,19 @@ $Header: /cvsroot/pgsql/doc/src/sgml/arch-dev.sgml,v 2.21 2003/06/22 16:16:44 tg
</para>
<para>
The finished plan tree consists of sequential or index scans of the
base relations, plus nestloop, merge, or hash join nodes as needed,
plus any auxiliary steps needed, such as sort nodes or aggregate-function
calculation nodes. Most of these plan node types have the additional
ability to do <firstterm>selection</> (discarding rows that do
not meet a specified boolean condition) and <firstterm>projection</>
(computation of a derived column set based on given column values,
that is, evaluation of scalar expressions where needed). One of
the responsibilities of the planner is to attach selection conditions
from the WHERE clause and computation of required output expressions
to the most appropriate nodes of the plan tree.
The finished plan tree consists of sequential or index scans of
the base relations, plus nestloop, merge, or hash join nodes as
needed, plus any auxiliary steps needed, such as sort nodes or
aggregate-function calculation nodes. Most of these plan node
types have the additional ability to do <firstterm>selection</>
(discarding rows that do not meet a specified boolean condition)
and <firstterm>projection</> (computation of a derived column set
based on given column values, that is, evaluation of scalar
expressions where needed). One of the responsibilities of the
planner is to attach selection conditions from the
<literal>WHERE</literal> clause and computation of required
output expressions to the most appropriate nodes of the plan
tree.
</para>
</sect2>
</sect1>

View File

@ -1,5 +1,5 @@
<!--
$Header: /cvsroot/pgsql/doc/src/sgml/geqo.sgml,v 1.23 2002/01/20 22:19:56 petere Exp $
$Header: /cvsroot/pgsql/doc/src/sgml/geqo.sgml,v 1.24 2003/09/29 18:18:35 momjian Exp $
Genetic Optimizer
-->
@ -28,7 +28,7 @@ Genetic Optimizer
<date>1997-10-02</date>
</docinfo>
<title>Genetic Query Optimization</title>
<title id="geqo-title">Genetic Query Optimizer</title>
<para>
<note>
@ -44,24 +44,29 @@ Genetic Optimizer
<title>Query Handling as a Complex Optimization Problem</title>
<para>
Among all relational operators the most difficult one to process and
optimize is the <firstterm>join</firstterm>. The number of alternative plans to answer a query
grows exponentially with the number of joins included in it. Further
optimization effort is caused by the support of a variety of
<firstterm>join methods</firstterm>
(e.g., nested loop, hash join, merge join in <productname>PostgreSQL</productname>) to
process individual joins and a diversity of
<firstterm>indexes</firstterm> (e.g., R-tree,
B-tree, hash in <productname>PostgreSQL</productname>) as access paths for relations.
Among all relational operators the most difficult one to process
and optimize is the <firstterm>join</firstterm>. The number of
alternative plans to answer a query grows exponentially with the
number of joins included in it. Further optimization effort is
caused by the support of a variety of <firstterm>join
methods</firstterm> (e.g., nested loop, hash join, merge join in
<productname>PostgreSQL</productname>) to process individual joins
and a diversity of <firstterm>indexes</firstterm> (e.g., R-tree,
B-tree, hash in <productname>PostgreSQL</productname>) as access
paths for relations.
</para>
<para>
The current <productname>PostgreSQL</productname> optimizer
implementation performs a <firstterm>near-exhaustive search</firstterm>
over the space of alternative strategies. This query
optimization technique is inadequate to support database application
domains that involve the need for extensive queries, such as artificial
intelligence.
implementation performs a <firstterm>near-exhaustive
search</firstterm> over the space of alternative strategies. This
algorithm, first introduced in the <quote>System R</quote>
database, produces a near-optimal join order, but can take an
enormous amount of time and memory space when the number of joins
in the query grows large. This makes the ordinary
<productname>PostgreSQL</productname> query optimizer
inappropriate for database application domains that involve the
need for extensive queries, such as artificial intelligence.
</para>
<para>
@ -75,12 +80,14 @@ Genetic Optimizer
<para>
Performance difficulties in exploring the space of possible query
plans created the demand for a new optimization technique being developed.
plans created the demand for a new optimization technique to be developed.
</para>
<para>
In the following we propose the implementation of a <firstterm>Genetic Algorithm</firstterm>
as an option for the database query optimization problem.
In the following we describe the implementation of a
<firstterm>Genetic Algorithm</firstterm> to solve the join
ordering problem in a manner that is efficient for queries
involving large numbers of joins.
</para>
</sect1>
@ -208,10 +215,10 @@ Genetic Optimizer
<listitem>
<para>
Usage of <firstterm>edge recombination crossover</firstterm> which is
especially suited
to keep edge losses low for the solution of the
<acronym>TSP</acronym> by means of a <acronym>GA</acronym>;
Usage of <firstterm>edge recombination crossover</firstterm>
which is especially suited to keep edge losses low for the
solution of the <acronym>TSP</acronym> by means of a
<acronym>GA</acronym>;
</para>
</listitem>

View File

@ -1,3 +1,7 @@
<!--
$Header: /cvsroot/pgsql/doc/src/sgml/gist.sgml,v 1.12 2003/09/29 18:18:35 momjian Exp $
-->
<Chapter Id="gist">
<DocInfo>
<AuthorGroup>

View File

@ -1,3 +1,7 @@
<!--
$Header: /cvsroot/pgsql/doc/src/sgml/install-win32.sgml,v 1.12 2003/09/29 18:18:35 momjian Exp $
-->
<chapter id="install-win32">
<title>Installation on <productname>Windows</productname></title>

View File

@ -1,3 +1,7 @@
<!--
$Header: /cvsroot/pgsql/doc/src/sgml/Attic/libpgtcl.sgml,v 1.38 2003/09/29 18:18:35 momjian Exp $
-->
<chapter id="pgtcl">
<title><application>pgtcl</application> - Tcl Binding Library</title>

View File

@ -1,3 +1,7 @@
<!--
$Header: /cvsroot/pgsql/doc/src/sgml/Attic/page.sgml,v 1.14 2003/09/29 18:18:35 momjian Exp $
-->
<chapter id="page">
<title>Page Files</title>