This patch makes a few incremental improvements to geqo.sgml and arch-dev.sgml

Neil Conway
Bruce Momjian 2003-09-29 18:18:35 +00:00
parent 04e401f97f
commit a17b53753e
6 changed files with 101 additions and 65 deletions

@@ -1,5 +1,5 @@
 <!--
-$Header: /cvsroot/pgsql/doc/src/sgml/arch-dev.sgml,v 2.21 2003/06/22 16:16:44 tgl Exp $
+$Header: /cvsroot/pgsql/doc/src/sgml/arch-dev.sgml,v 2.22 2003/09/29 18:18:35 momjian Exp $
 -->
 <chapter id="overview">
@@ -25,7 +25,7 @@ $Header: /cvsroot/pgsql/doc/src/sgml/arch-dev.sgml,v 2.21 2003/06/22 16:16:44 tg
 very extensive. Rather, this chapter is intended to help the reader
 understand the general sequence of operations that occur within the
 backend from the point at which a query is received, to the point
-when the results are returned to the client.
+at which the results are returned to the client.
 </para>
 <sect1 id="query-path">
@@ -79,7 +79,7 @@ $Header: /cvsroot/pgsql/doc/src/sgml/arch-dev.sgml,v 2.21 2003/06/22 16:16:44 tg
 <step>
 <para>
 The <firstterm>planner/optimizer</firstterm> takes
-the (rewritten) querytree and creates a
+the (rewritten) query tree and creates a
 <firstterm>query plan</firstterm> that will be the input to the
 <firstterm>executor</firstterm>.
 </para>
@@ -183,12 +183,12 @@ $Header: /cvsroot/pgsql/doc/src/sgml/arch-dev.sgml,v 2.21 2003/06/22 16:16:44 tg
 <title>Parser</title>
 <para>
-The parser has to check the query string (which arrives as
-plain ASCII text) for valid syntax. If the syntax is correct a
-<firstterm>parse tree</firstterm> is built up and handed back otherwise an error is
-returned. For the implementation the well known Unix
-tools <application>lex</application> and <application>yacc</application>
-are used.
+The parser has to check the query string (which arrives as plain
+ASCII text) for valid syntax. If the syntax is correct a
+<firstterm>parse tree</firstterm> is built up and handed back;
+otherwise an error is returned. The parser and lexer are
+implemented using the well-known Unix tools <application>yacc</>
+and <application>lex</>.
 </para>
 <para>
@@ -201,23 +201,22 @@ $Header: /cvsroot/pgsql/doc/src/sgml/arch-dev.sgml,v 2.21 2003/06/22 16:16:44 tg
 </para>
 <para>
-The parser is defined in the file <filename>gram.y</filename> and consists of a
-set of <firstterm>grammar rules</firstterm> and <firstterm>actions</firstterm>
-that are executed
-whenever a rule is fired. The code of the actions (which
-is actually C-code) is used to build up the parse tree.
+The parser is defined in the file <filename>gram.y</filename> and
+consists of a set of <firstterm>grammar rules</firstterm> and
+<firstterm>actions</firstterm> that are executed whenever a rule
+is fired. The code of the actions (which is actually C code) is
+used to build up the parse tree.
 </para>
 <para>
-The file <filename>scan.l</filename> is transformed to
-the C-source file <filename>scan.c</filename>
-using the program <application>lex</application>
-and <filename>gram.y</filename> is transformed to
-<filename>gram.c</filename> using <application>yacc</application>.
-After these transformations have taken
-place a normal C-compiler can be used to create the
-parser. Never make any changes to the generated C-files as they will
-be overwritten the next time <application>lex</application>
+The file <filename>scan.l</filename> is transformed to the C
+source file <filename>scan.c</filename> using the program
+<application>lex</application> and <filename>gram.y</filename> is
+transformed to <filename>gram.c</filename> using
+<application>yacc</application>. After these transformations
+have taken place a normal C compiler can be used to create the
+parser. Never make any changes to the generated C files as they
+will be overwritten the next time <application>lex</application>
 or <application>yacc</application> is called.
 <note>
@@ -334,15 +333,27 @@ $Header: /cvsroot/pgsql/doc/src/sgml/arch-dev.sgml,v 2.21 2003/06/22 16:16:44 tg
 <title>Planner/Optimizer</title>
 <para>
-The task of the <firstterm>planner/optimizer</firstterm> is to create an optimal
-execution plan. It first considers all possible ways of
-<firstterm>scanning</firstterm> and <firstterm>joining</firstterm>
-the relations that appear in a
-query. All the created paths lead to the same result and it's the
-task of the optimizer to estimate the cost of executing each path and
-find out which one is the cheapest.
+The task of the <firstterm>planner/optimizer</firstterm> is to
+create an optimal execution plan. A given SQL query (and hence, a
+query tree) can be actually executed in a wide variety of
+different ways, each of which will produce the same set of
+results. If it is computationally feasible, the query optimizer
+will examine each of these possible execution plans, ultimately
+selecting the execution plan that will run the fastest.
 </para>
+<note>
+<para>
+In some situations, examining each possible way in which a query
+may be executed would take an excessive amount of time and memory
+space. In particular, this occurs when executing queries
+involving large numbers of join operations. In order to determine
+a reasonable (not optimal) query plan in a reasonable amount of
+time, <productname>PostgreSQL</productname> uses a <xref
+linkend="geqo" endterm="geqo-title">.
+</para>
+</note>
 <para>
 After the cheapest path is determined, a <firstterm>plan tree</>
 is built to pass to the executor. This represents the desired
@@ -373,7 +384,7 @@ $Header: /cvsroot/pgsql/doc/src/sgml/arch-dev.sgml,v 2.21 2003/06/22 16:16:44 tg
 After all feasible plans have been found for scanning single relations,
 plans for joining relations are created. The planner/optimizer
 preferentially considers joins between any two relations for which there
-exist a corresponding join clause in the WHERE qualification (i.e. for
+exist a corresponding join clause in the <literal>WHERE</literal> qualification (i.e. for
 which a restriction like <literal>where rel1.attr1=rel2.attr2</literal>
 exists). Join pairs with no join clause are considered only when there
 is no other choice, that is, a particular relation has no available
@@ -416,17 +427,19 @@ $Header: /cvsroot/pgsql/doc/src/sgml/arch-dev.sgml,v 2.21 2003/06/22 16:16:44 tg
 </para>
 <para>
-The finished plan tree consists of sequential or index scans of the
-base relations, plus nestloop, merge, or hash join nodes as needed,
-plus any auxiliary steps needed, such as sort nodes or aggregate-function
-calculation nodes. Most of these plan node types have the additional
-ability to do <firstterm>selection</> (discarding rows that do
-not meet a specified boolean condition) and <firstterm>projection</>
-(computation of a derived column set based on given column values,
-that is, evaluation of scalar expressions where needed). One of
-the responsibilities of the planner is to attach selection conditions
-from the WHERE clause and computation of required output expressions
-to the most appropriate nodes of the plan tree.
+The finished plan tree consists of sequential or index scans of
+the base relations, plus nestloop, merge, or hash join nodes as
+needed, plus any auxiliary steps needed, such as sort nodes or
+aggregate-function calculation nodes. Most of these plan node
+types have the additional ability to do <firstterm>selection</>
+(discarding rows that do not meet a specified boolean condition)
+and <firstterm>projection</> (computation of a derived column set
+based on given column values, that is, evaluation of scalar
+expressions where needed). One of the responsibilities of the
+planner is to attach selection conditions from the
+<literal>WHERE</literal> clause and computation of required
+output expressions to the most appropriate nodes of the plan
+tree.
 </para>
 </sect2>
 </sect1>
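The selection and projection behavior described in the last paragraph can be pictured with a small Python sketch. This is not PostgreSQL's C executor. `SeqScan`, `NestLoop`, `project` and `passthrough` are hypothetical, simplified stand-ins. Each node carries a `qual` (selection) and a `targetlist` of `(name, expression)` pairs (projection), loosely mirroring the qualification and target list that real plan nodes carry. Rows are plain dictionaries keyed by qualified column names, an assumption made only to keep the example short.

```python
from dataclasses import dataclass
from operator import itemgetter
from typing import Callable, Dict, Iterable, Iterator, List, Tuple

Row = Dict[str, object]
Qual = Callable[[Row], bool]                            # selection predicate
TargetList = List[Tuple[str, Callable[[Row], object]]]  # projection: (name, expr)


def project(row: Row, targetlist: TargetList) -> Row:
    """Projection: evaluate each output expression against the input row."""
    return {name: expr(row) for name, expr in targetlist}


def passthrough(*cols: str) -> TargetList:
    """Hypothetical helper: a target list that copies the named columns."""
    return [(c, itemgetter(c)) for c in cols]


@dataclass
class SeqScan:
    """Toy sequential scan of a base relation with selection and projection."""
    rows: List[Row]
    qual: Qual
    targetlist: TargetList

    def __iter__(self) -> Iterator[Row]:
        for row in self.rows:
            if self.qual(row):  # selection: discard rows failing the condition
                yield project(row, self.targetlist)


@dataclass
class NestLoop:
    """Toy nested-loop join; the join clause is simply this node's qual."""
    outer: Iterable[Row]
    inner: Iterable[Row]
    qual: Qual
    targetlist: TargetList

    def __iter__(self) -> Iterator[Row]:
        inner_rows = list(self.inner)  # materialize so it can be rescanned
        for o in self.outer:
            for i in inner_rows:
                row = {**o, **i}
                if self.qual(row):
                    yield project(row, self.targetlist)


# SELECT e.name, d.dname, e.salary * 12 AS annual
#   FROM emp e, dept d WHERE e.dept = d.id AND e.salary > 1000
emp = [
    {"emp.name": "ann", "emp.dept": 1, "emp.salary": 1200},
    {"emp.name": "bob", "emp.dept": 2, "emp.salary": 800},
    {"emp.name": "cat", "emp.dept": 1, "emp.salary": 1500},
]
dept = [{"dept.id": 1, "dept.dname": "eng"}, {"dept.id": 2, "dept.dname": "ops"}]

plan = NestLoop(
    # The restriction that mentions only emp is attached to the scan of emp...
    outer=SeqScan(emp, lambda r: r["emp.salary"] > 1000,
                  passthrough("emp.name", "emp.dept", "emp.salary")),
    inner=SeqScan(dept, lambda r: True, passthrough("dept.id", "dept.dname")),
    # ...while the join clause can only be checked once both sides are present.
    qual=lambda r: r["emp.dept"] == r["dept.id"],
    targetlist=[("name", itemgetter("emp.name")),
                ("dname", itemgetter("dept.dname")),
                ("annual", lambda r: r["emp.salary"] * 12)],
)
```

Attaching `e.salary > 1000` to the scan of `emp` means `bob` is discarded before the join ever sees him. Placing each condition at the lowest node where all of its columns are available is the kind of decision the paragraph attributes to the planner.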

@@ -1,5 +1,5 @@
 <!--
-$Header: /cvsroot/pgsql/doc/src/sgml/geqo.sgml,v 1.23 2002/01/20 22:19:56 petere Exp $
+$Header: /cvsroot/pgsql/doc/src/sgml/geqo.sgml,v 1.24 2003/09/29 18:18:35 momjian Exp $
 Genetic Optimizer
 -->
@@ -28,7 +28,7 @@ Genetic Optimizer
 <date>1997-10-02</date>
 </docinfo>
-<title>Genetic Query Optimization</title>
+<title id="geqo-title">Genetic Query Optimizer</title>
 <para>
 <note>
@@ -44,24 +44,29 @@ Genetic Optimizer
 <title>Query Handling as a Complex Optimization Problem</title>
 <para>
-Among all relational operators the most difficult one to process and
-optimize is the <firstterm>join</firstterm>. The number of alternative plans to answer a query
-grows exponentially with the number of joins included in it. Further
-optimization effort is caused by the support of a variety of
-<firstterm>join methods</firstterm>
-(e.g., nested loop, hash join, merge join in <productname>PostgreSQL</productname>) to
-process individual joins and a diversity of
-<firstterm>indexes</firstterm> (e.g., R-tree,
-B-tree, hash in <productname>PostgreSQL</productname>) as access paths for relations.
+Among all relational operators the most difficult one to process
+and optimize is the <firstterm>join</firstterm>. The number of
+alternative plans to answer a query grows exponentially with the
+number of joins included in it. Further optimization effort is
+caused by the support of a variety of <firstterm>join
+methods</firstterm> (e.g., nested loop, hash join, merge join in
+<productname>PostgreSQL</productname>) to process individual joins
+and a diversity of <firstterm>indexes</firstterm> (e.g., R-tree,
+B-tree, hash in <productname>PostgreSQL</productname>) as access
+paths for relations.
 </para>
 <para>
 The current <productname>PostgreSQL</productname> optimizer
-implementation performs a <firstterm>near-exhaustive search</firstterm>
-over the space of alternative strategies. This query
-optimization technique is inadequate to support database application
-domains that involve the need for extensive queries, such as artificial
-intelligence.
+implementation performs a <firstterm>near-exhaustive
+search</firstterm> over the space of alternative strategies. This
+algorithm, first introduced in the <quote>System R</quote>
+database, produces a near-optimal join order, but can take an
+enormous amount of time and memory space when the number of joins
+in the query grows large. This makes the ordinary
+<productname>PostgreSQL</productname> query optimizer
+inappropriate for database application domains that involve the
+need for extensive queries, such as artificial intelligence.
 </para>
 <para>
@@ -75,12 +80,14 @@ Genetic Optimizer
 <para>
 Performance difficulties in exploring the space of possible query
-plans created the demand for a new optimization technique being developed.
+plans created the demand for a new optimization technique to be developed.
 </para>
 <para>
-In the following we propose the implementation of a <firstterm>Genetic Algorithm</firstterm>
-as an option for the database query optimization problem.
+In the following we describe the implementation of a
+<firstterm>Genetic Algorithm</firstterm> to solve the join
+ordering problem in a manner that is efficient for queries
+involving large numbers of joins.
 </para>
 </sect1>
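The following sketch makes the exponential growth described above concrete. It is a minimal Python illustration, not PostgreSQL code, of a brute-force search over left-deep join orders. The row counts in `ROWS`, the selectivities in `SELECTIVITY` and the cost function `order_cost` (sum of estimated intermediate result sizes) are hypothetical stand-ins for the planner's statistics and cost model. PostgreSQL's planner does not literally enumerate permutations like this, so read it only as an illustration of why the search space explodes.

```python
from itertools import permutations
from math import factorial

# Hypothetical statistics: base-relation row counts, and selectivities for
# relation pairs linked by a join clause such as rel1.attr1 = rel2.attr2.
ROWS = {"a": 1000, "b": 100, "c": 10, "d": 5000}
SELECTIVITY = {
    frozenset("ab"): 0.01,
    frozenset("bc"): 0.1,
    frozenset("cd"): 0.001,
}


def order_cost(order):
    """Toy cost of a left-deep plan joining relations in `order`:
    the sum of the estimated sizes of all intermediate results."""
    joined = [order[0]]
    size = float(ROWS[order[0]])
    cost = 0.0
    for rel in order[1:]:
        sel = 1.0
        for prev in joined:
            # Pairs without a join clause behave like a cross product.
            sel *= SELECTIVITY.get(frozenset((prev, rel)), 1.0)
        size *= ROWS[rel] * sel
        cost += size
        joined.append(rel)
    return cost


def exhaustive_best(rels):
    """Cost every left-deep order (n! of them) and return the cheapest."""
    return min(permutations(rels), key=order_cost)


def left_deep_orders(n):
    """Number of left-deep join orders for n relations."""
    return factorial(n)


def bushy_trees(n):
    """Number of join trees of any shape for n relations:
    n! leaf orders times Catalan(n - 1) shapes = (2n - 2)! / (n - 1)!."""
    return factorial(2 * n - 2) // factorial(n - 1)


if __name__ == "__main__":
    print("cheapest order:", exhaustive_best(list(ROWS)))
    for n in (2, 4, 8, 12, 16):
        print(n, left_deep_orders(n), bushy_trees(n))
```

Each extra relation multiplies the number of left-deep orders by n, and the count of arbitrarily shaped trees grows faster still. For 8 relations there are already 17,297,280 distinct trees, which is why an exhaustive search stops being practical once a query contains many joins.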
@@ -208,10 +215,10 @@ Genetic Optimizer
 <listitem>
 <para>
-Usage of <firstterm>edge recombination crossover</firstterm> which is
-especially suited
-to keep edge losses low for the solution of the
-<acronym>TSP</acronym> by means of a <acronym>GA</acronym>;
+Usage of <firstterm>edge recombination crossover</firstterm>
+which is especially suited to keep edge losses low for the
+solution of the <acronym>TSP</acronym> by means of a
+<acronym>GA</acronym>;
 </para>
 </listitem>
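Two ideas from the geqo.sgml changes are combined below: a genetic algorithm over join orders encoded like TSP tours, and edge recombination crossover. This Python sketch is illustrative only and is not a port of PostgreSQL's GEQO code. Several pieces are assumptions chosen for brevity:

- the steady-state loop, which picks two parents biased toward cheaper ones, breeds one child and replaces the worst pool member when the child is cheaper;
- the parameters `pool_size`, `generations` and `seed`;
- the toy `cost` function with its hypothetical `ROWS` and `SELECTIVITY` statistics.

```python
import random

# Hypothetical statistics for six relations and their join clauses.
ROWS = {"a": 1000, "b": 100, "c": 10, "d": 5000, "e": 50, "f": 200}
SELECTIVITY = {
    frozenset("ab"): 0.01, frozenset("bc"): 0.1, frozenset("cd"): 0.001,
    frozenset("de"): 0.02, frozenset("ef"): 0.05,
}


def cost(order):
    """Toy cost: summed intermediate result sizes of a left-deep plan."""
    size, total = float(ROWS[order[0]]), 0.0
    for k in range(1, len(order)):
        sel = 1.0
        for prev in order[:k]:
            sel *= SELECTIVITY.get(frozenset((prev, order[k])), 1.0)
        size *= ROWS[order[k]] * sel
        total += size
    return total


def adjacency(tour):
    """Neighbours of each gene when the tour is read as a cycle (as in the TSP)."""
    n = len(tour)
    return {c: {tour[i - 1], tour[(i + 1) % n]} for i, c in enumerate(tour)}


def edge_recombination(p1, p2, rng):
    """Edge recombination crossover: build the child from edges present in
    either parent, preferring the neighbour with the fewest remaining edges
    so that few parental edges are lost."""
    a1, a2 = adjacency(p1), adjacency(p2)
    edges = {c: a1[c] | a2[c] for c in p1}
    current = p1[0]
    child = [current]
    while len(child) < len(p1):
        for nbrs in edges.values():
            nbrs.discard(current)          # current gene is now used up
        candidates = edges.pop(current)
        if candidates:
            fewest = min(len(edges[c]) for c in candidates)
            current = rng.choice(sorted(c for c in candidates if len(edges[c]) == fewest))
        else:                              # dead end: fall back to any unused gene
            current = rng.choice(sorted(edges))
        child.append(current)
    return child


def geqo_sketch(rels, pool_size=20, generations=300, seed=0):
    """Steady-state GA over join orders; returns the cheapest order found."""
    rng = random.Random(seed)
    pool = sorted((rng.sample(rels, len(rels)) for _ in range(pool_size)), key=cost)
    for _ in range(generations):
        # Squaring a uniform draw biases parent choice toward the cheap end.
        p1, p2 = (pool[int(pool_size * rng.random() ** 2)] for _ in range(2))
        child = edge_recombination(p1, p2, rng)
        if cost(child) < cost(pool[-1]):   # replace the worst member
            pool[-1] = child
            pool.sort(key=cost)
    return pool[0]


if __name__ == "__main__":
    best = geqo_sketch(list(ROWS))
    print(best, cost(best))
```

Passing an explicit `seed` keeps runs reproducible. A genetic search like this only ever improves on its starting pool and offers no guarantee of finding the cheapest order. That is consistent with the "reasonable (not optimal)" wording used for GEQO in arch-dev.sgml.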

@@ -1,3 +1,7 @@
+<!--
+$Header: /cvsroot/pgsql/doc/src/sgml/gist.sgml,v 1.12 2003/09/29 18:18:35 momjian Exp $
+-->
 <Chapter Id="gist">
 <DocInfo>
 <AuthorGroup>

@@ -1,3 +1,7 @@
+<!--
+$Header: /cvsroot/pgsql/doc/src/sgml/install-win32.sgml,v 1.12 2003/09/29 18:18:35 momjian Exp $
+-->
 <chapter id="install-win32">
 <title>Installation on <productname>Windows</productname></title>

@@ -1,3 +1,7 @@
+<!--
+$Header: /cvsroot/pgsql/doc/src/sgml/Attic/libpgtcl.sgml,v 1.38 2003/09/29 18:18:35 momjian Exp $
+-->
 <chapter id="pgtcl">
 <title><application>pgtcl</application> - Tcl Binding Library</title>

@@ -1,3 +1,7 @@
+<!--
+$Header: /cvsroot/pgsql/doc/src/sgml/Attic/page.sgml,v 1.14 2003/09/29 18:18:35 momjian Exp $
+-->
 <chapter id="page">
 <title>Page Files</title>