286 lines
8.3 KiB
Plaintext
286 lines
8.3 KiB
Plaintext
<!--
|
|
$PostgreSQL: pgsql/doc/src/sgml/indexcost.sgml,v 2.18 2003/11/29 19:51:37 pgsql Exp $
|
|
-->
|
|
|
|
<chapter id="indexcost">
|
|
<title>Index Cost Estimation Functions</title>
|
|
|
|
<note>
|
|
<title>Author</title>
|
|
|
|
<para>
|
|
Written by Tom Lane (<email>tgl@sss.pgh.pa.us</email>) on 2000-01-24
|
|
</para>
|
|
</note>
|
|
|
|
<note>
|
|
<para>
|
|
This must eventually become part of a much larger chapter about
|
|
writing new index access methods.
|
|
</para>
|
|
</note>
|
|
|
|
<para>
|
|
Every index access method must provide a cost estimation function for
|
|
use by the planner/optimizer. The procedure OID of this function is
|
|
given in the <literal>amcostestimate</literal> field of the access
|
|
method's <literal>pg_am</literal> entry.
|
|
|
|
<note>
|
|
<para>
|
|
Prior to <productname>PostgreSQL</productname> 7.0, a different
|
|
scheme was used for registering
|
|
index-specific cost estimation functions.
|
|
</para>
|
|
</note>
|
|
</para>
|
|
|
|
<para>
|
|
The amcostestimate function is given a list of WHERE clauses that have
|
|
been determined to be usable with the index. It must return estimates
|
|
of the cost of accessing the index and the selectivity of the WHERE
|
|
clauses (that is, the fraction of main-table rows that will be
|
|
retrieved during the index scan). For simple cases, nearly all the
|
|
work of the cost estimator can be done by calling standard routines
|
|
in the optimizer; the point of having an amcostestimate function is
|
|
to allow index access methods to provide index-type-specific knowledge,
|
|
in case it is possible to improve on the standard estimates.
|
|
</para>
|
|
|
|
<para>
|
|
Each amcostestimate function must have the signature:
|
|
|
|
<programlisting>
|
|
void
|
|
amcostestimate (Query *root,
|
|
RelOptInfo *rel,
|
|
IndexOptInfo *index,
|
|
List *indexQuals,
|
|
Cost *indexStartupCost,
|
|
Cost *indexTotalCost,
|
|
Selectivity *indexSelectivity,
|
|
double *indexCorrelation);
|
|
</programlisting>
|
|
|
|
The first four parameters are inputs:
|
|
|
|
<variablelist>
|
|
<varlistentry>
|
|
<term>root</term>
|
|
<listitem>
|
|
<para>
|
|
The query being processed.
|
|
</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
<varlistentry>
|
|
<term>rel</term>
|
|
<listitem>
|
|
<para>
|
|
The relation the index is on.
|
|
</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
<varlistentry>
|
|
<term>index</term>
|
|
<listitem>
|
|
<para>
|
|
The index itself.
|
|
</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
<varlistentry>
|
|
<term>indexQuals</term>
|
|
<listitem>
|
|
<para>
|
|
List of index qual clauses (implicitly ANDed);
|
|
a NIL list indicates no qualifiers are available.
|
|
</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
</variablelist>
|
|
</para>
|
|
|
|
<para>
|
|
The last four parameters are pass-by-reference outputs:
|
|
|
|
<variablelist>
|
|
<varlistentry>
|
|
<term>*indexStartupCost</term>
|
|
<listitem>
|
|
<para>
|
|
Set to cost of index start-up processing
|
|
</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
<varlistentry>
|
|
<term>*indexTotalCost</term>
|
|
<listitem>
|
|
<para>
|
|
Set to total cost of index processing
|
|
</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
<varlistentry>
|
|
<term>*indexSelectivity</term>
|
|
<listitem>
|
|
<para>
|
|
Set to index selectivity
|
|
</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
<varlistentry>
|
|
<term>*indexCorrelation</term>
|
|
<listitem>
|
|
<para>
|
|
Set to correlation coefficient between index scan order and
|
|
underlying table's order
|
|
</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
</variablelist>
|
|
</para>
|
|
|
|
<para>
|
|
Note that cost estimate functions must be written in C, not in SQL or
|
|
any available procedural language, because they must access internal
|
|
data structures of the planner/optimizer.
|
|
</para>
|
|
|
|
<para>
|
|
The index access costs should be computed in the units used by
|
|
<filename>src/backend/optimizer/path/costsize.c</filename>: a sequential disk block fetch
|
|
has cost 1.0, a nonsequential fetch has cost random_page_cost, and
|
|
the cost of processing one index row should usually be taken as
|
|
cpu_index_tuple_cost (which is a user-adjustable optimizer parameter).
|
|
In addition, an appropriate multiple of cpu_operator_cost should be charged
|
|
for any comparison operators invoked during index processing (especially
|
|
evaluation of the indexQuals themselves).
|
|
</para>
|
|
|
|
<para>
|
|
The access costs should include all disk and CPU costs associated with
|
|
scanning the index itself, but NOT the costs of retrieving or processing
|
|
the main-table rows that are identified by the index.
|
|
</para>
|
|
|
|
<para>
|
|
The <quote>start-up cost</quote> is the part of the total scan cost that must be expended
|
|
before we can begin to fetch the first row. For most indexes this can
|
|
be taken as zero, but an index type with a high start-up cost might want
|
|
to set it nonzero.
|
|
</para>
|
|
|
|
<para>
|
|
The indexSelectivity should be set to the estimated fraction of the main
|
|
table rows that will be retrieved during the index scan. In the case
|
|
of a lossy index, this will typically be higher than the fraction of
|
|
rows that actually pass the given qual conditions.
|
|
</para>
|
|
|
|
<para>
|
|
The indexCorrelation should be set to the correlation (ranging between
|
|
-1.0 and 1.0) between the index order and the table order. This is used
|
|
to adjust the estimate for the cost of fetching rows from the main
|
|
table.
|
|
</para>
|
|
|
|
<procedure>
|
|
<title>Cost Estimation</title>
|
|
<para>
|
|
A typical cost estimator will proceed as follows:
|
|
</para>
|
|
|
|
<step>
|
|
<para>
|
|
Estimate and return the fraction of main-table rows that will be visited
|
|
based on the given qual conditions. In the absence of any index-type-specific
|
|
knowledge, use the standard optimizer function <function>clauselist_selectivity()</function>:
|
|
|
|
<programlisting>
|
|
*indexSelectivity = clauselist_selectivity(root, indexQuals,
|
|
rel->relid, JOIN_INNER);
|
|
</programlisting>
|
|
</para>
|
|
</step>
|
|
|
|
<step>
|
|
<para>
|
|
Estimate the number of index rows that will be visited during the
|
|
scan. For many index types this is the same as indexSelectivity times
|
|
the number of rows in the index, but it might be more. (Note that the
|
|
index's size in pages and rows is available from the IndexOptInfo struct.)
|
|
</para>
|
|
</step>
|
|
|
|
<step>
|
|
<para>
|
|
Estimate the number of index pages that will be retrieved during the scan.
|
|
This might be just indexSelectivity times the index's size in pages.
|
|
</para>
|
|
</step>
|
|
|
|
<step>
|
|
<para>
|
|
Compute the index access cost. A generic estimator might do this:
|
|
|
|
<programlisting>
|
|
/*
|
|
* Our generic assumption is that the index pages will be read
|
|
* sequentially, so they have cost 1.0 each, not random_page_cost.
|
|
* Also, we charge for evaluation of the indexquals at each index row.
|
|
* All the costs are assumed to be paid incrementally during the scan.
|
|
*/
|
|
cost_qual_eval(&index_qual_cost, indexQuals);
|
|
*indexStartupCost = index_qual_cost.startup;
|
|
*indexTotalCost = numIndexPages +
|
|
(cpu_index_tuple_cost + index_qual_cost.per_tuple) * numIndexTuples;
|
|
</programlisting>
|
|
</para>
|
|
</step>
|
|
|
|
<step>
|
|
<para>
|
|
Estimate the index correlation. For a simple ordered index on a single
|
|
field, this can be retrieved from pg_statistic. If the correlation
|
|
is not known, the conservative estimate is zero (no correlation).
|
|
</para>
|
|
</step>
|
|
</procedure>
|
|
|
|
<para>
|
|
Examples of cost estimator functions can be found in
|
|
<filename>src/backend/utils/adt/selfuncs.c</filename>.
|
|
</para>
|
|
|
|
<para>
|
|
By convention, the <literal>pg_proc</literal> entry for an
|
|
<literal>amcostestimate</literal> function should show
|
|
eight arguments all declared as <type>internal</> (since none of them have
|
|
types that are known to SQL), and the return type is <type>void</>.
|
|
</para>
|
|
</chapter>
|
|
|
|
<!-- Keep this comment at the end of the file
|
|
Local variables:
|
|
mode:sgml
|
|
sgml-omittag:nil
|
|
sgml-shorttag:t
|
|
sgml-minimize-attributes:nil
|
|
sgml-always-quote-attributes:t
|
|
sgml-indent-step:1
|
|
sgml-indent-data:t
|
|
sgml-parent-document:nil
|
|
sgml-default-dtd-file:"./reference.ced"
|
|
sgml-exposed-tags:nil
|
|
sgml-local-catalogs:("/usr/lib/sgml/catalog")
|
|
sgml-local-ecat-files:nil
|
|
End:
|
|
-->
|