848 lines
32 KiB
Plaintext
848 lines
32 KiB
Plaintext
<!-- $PostgreSQL: pgsql/doc/src/sgml/indices.sgml,v 1.47 2003/11/29 19:51:37 pgsql Exp $ -->
|
|
|
|
<chapter id="indexes">
|
|
<title id="indexes-title">Indexes</title>
|
|
|
|
<indexterm zone="indexes">
|
|
<primary>index</primary>
|
|
</indexterm>
|
|
|
|
<para>
|
|
Indexes are a common way to enhance database performance. An index
|
|
allows the database server to find and retrieve specific rows much
|
|
faster than it could do without an index. But indexes also add
|
|
overhead to the database system as a whole, so they should be used
|
|
sensibly.
|
|
</para>
|
|
|
|
|
|
<sect1 id="indexes-intro">
|
|
<title>Introduction</title>
|
|
|
|
<para>
|
|
Suppose we have a table similar to this:
|
|
<programlisting>
|
|
CREATE TABLE test1 (
|
|
id integer,
|
|
content varchar
|
|
);
|
|
</programlisting>
|
|
and the application requires a lot of queries of the form
|
|
<programlisting>
|
|
SELECT content FROM test1 WHERE id = <replaceable>constant</replaceable>;
|
|
</programlisting>
|
|
With no advance preparation, the system would have to scan the entire
|
|
<structname>test1</structname> table, row by row, to find all
|
|
matching entries. If there are a lot of rows in
|
|
<structname>test1</structname> and only a few rows (perhaps only zero
|
|
or one) that would be returned by such a query, then this is clearly an
|
|
inefficient method. But if the system has been instructed to maintain an
|
|
index on the <structfield>id</structfield> column, then it can use a more
|
|
efficient method for locating matching rows. For instance, it
|
|
might only have to walk a few levels deep into a search tree.
|
|
</para>
|
|
|
|
<para>
|
|
A similar approach is used in most books of non-fiction: terms and
|
|
concepts that are frequently looked up by readers are collected in
|
|
an alphabetic index at the end of the book. The interested reader
|
|
can scan the index relatively quickly and flip to the appropriate
|
|
page(s), rather than having to read the entire book to find the
|
|
material of interest. Just as it is the task of the author to
|
|
anticipate the items that the readers are most likely to look up,
|
|
it is the task of the database programmer to foresee which indexes
|
|
would be of advantage.
|
|
</para>
|
|
|
|
<para>
|
|
The following command would be used to create the index on the
|
|
<structfield>id</structfield> column, as discussed:
|
|
<programlisting>
|
|
CREATE INDEX test1_id_index ON test1 (id);
|
|
</programlisting>
|
|
The name <structname>test1_id_index</structname> can be chosen
|
|
freely, but you should pick something that enables you to remember
|
|
later what the index was for.
|
|
</para>
|
|
|
|
<para>
|
|
To remove an index, use the <command>DROP INDEX</command> command.
|
|
Indexes can be added to and removed from tables at any time.
|
|
</para>
|
|
|
|
<para>
|
|
Once the index is created, no further intervention is required: the
|
|
system will update the index when the table is modified, and it will
|
|
use the index in queries when it thinks this would be more efficient
|
|
than a sequential table scan. But you may have to run the
|
|
<command>ANALYZE</command> command regularly to update
|
|
statistics to allow the query planner to make educated decisions.
|
|
See <xref linkend="performance-tips"> for information about
|
|
how to find out whether an index is used and when and why the
|
|
planner may choose <emphasis>not</emphasis> to use an index.
|
|
</para>
|
|
|
|
<para>
|
|
Indexes can also benefit <command>UPDATE</command> and
|
|
<command>DELETE</command> commands with search conditions. Indexes can moreover be
|
|
used in join queries. Thus,
|
|
an index defined on a column that is part of a join condition can
|
|
significantly speed up queries with joins.
|
|
</para>
|
|
|
|
<para>
|
|
When an index is created, the system has to keep it synchronized with the
|
|
table. This adds overhead to data manipulation operations.
|
|
Therefore indexes that are non-essential or do not get used at all
|
|
should be removed. Note that a
|
|
query or data manipulation command can use at most one index
|
|
per table.
|
|
</para>
|
|
</sect1>
|
|
|
|
|
|
<sect1 id="indexes-types">
|
|
<title>Index Types</title>
|
|
|
|
<para>
|
|
<productname>PostgreSQL</productname> provides several index types:
|
|
B-tree, R-tree, GiST, and Hash. Each index type uses a different
|
|
algorithm that is best suited to different types of queries.
|
|
<indexterm>
|
|
<primary>index</primary>
|
|
<secondary>B-tree</secondary>
|
|
</indexterm>
|
|
<indexterm>
|
|
<primary>B-tree</primary>
|
|
<see>index</see>
|
|
</indexterm>
|
|
By default, the <command>CREATE INDEX</command> command will create a
|
|
B-tree index, which fits the most common situations. B-trees can
|
|
handle equality and range queries on data that can be sorted into
|
|
some ordering. In
|
|
particular, the <productname>PostgreSQL</productname> query planner
|
|
will consider using a B-tree index whenever an indexed column is
|
|
involved in a comparison using one of these operators:
|
|
|
|
<simplelist type="inline">
|
|
<member><literal><</literal></member>
|
|
<member><literal><=</literal></member>
|
|
<member><literal>=</literal></member>
|
|
<member><literal>>=</literal></member>
|
|
<member><literal>></literal></member>
|
|
</simplelist>
|
|
</para>
|
|
|
|
<para>
|
|
The optimizer can also use a B-tree index for queries involving the
|
|
pattern matching operators <literal>LIKE</>,
|
|
<literal>ILIKE</literal>, <literal>~</literal>, and
|
|
<literal>~*</literal>, <emphasis>if</emphasis> the pattern is
|
|
anchored to the beginning of the string, e.g., <literal>col LIKE
|
|
'foo%'</literal> or <literal>col ~ '^foo'</literal>, but not
|
|
<literal>col LIKE '%bar'</literal>. However, if your server does
|
|
not use the C locale you will need to create the index with a
|
|
special operator class. See <xref linkend="indexes-opclass">
|
|
below.
|
|
</para>
|
|
|
|
<para>
|
|
<indexterm>
|
|
<primary>index</primary>
|
|
<secondary>R-tree</secondary>
|
|
</indexterm>
|
|
<indexterm>
|
|
<primary>R-tree</primary>
|
|
<see>index</see>
|
|
</indexterm>
|
|
R-tree indexes are suited for queries on spatial data. To create
|
|
an R-tree index, use a command of the form
|
|
<synopsis>
|
|
CREATE INDEX <replaceable>name</replaceable> ON <replaceable>table</replaceable> USING RTREE (<replaceable>column</replaceable>);
|
|
</synopsis>
|
|
The <productname>PostgreSQL</productname> query planner will
|
|
consider using an R-tree index whenever an indexed column is
|
|
involved in a comparison using one of these operators:
|
|
|
|
<simplelist type="inline">
|
|
<member><literal><<</literal></member>
|
|
<member><literal>&<</literal></member>
|
|
<member><literal>&></literal></member>
|
|
<member><literal>>></literal></member>
|
|
<member><literal>@</literal></member>
|
|
<member><literal>~=</literal></member>
|
|
<member><literal>&&</literal></member>
|
|
</simplelist>
|
|
(Refer to <xref linkend="functions-geometry"> about the meaning of
|
|
these operators.)
|
|
</para>
|
|
|
|
<para>
|
|
<indexterm>
|
|
<primary>index</primary>
|
|
<secondary>hash</secondary>
|
|
</indexterm>
|
|
<indexterm>
|
|
<primary>hash</primary>
|
|
<see>index</see>
|
|
</indexterm>
|
|
Hash indexes can only handle simple equality comparisons.
|
|
The query planner will consider using a hash index whenever an
|
|
indexed column is involved in a comparison using the
|
|
<literal>=</literal> operator. The following command is used to
|
|
create a hash index:
|
|
<synopsis>
|
|
CREATE INDEX <replaceable>name</replaceable> ON <replaceable>table</replaceable> USING HASH (<replaceable>column</replaceable>);
|
|
</synopsis>
|
|
<note>
|
|
<para>
|
|
Testing has shown <productname>PostgreSQL</productname>'s hash
|
|
indexes to perform no better than B-tree indexes, and the
|
|
index size and build time for hash indexes is much worse. For
|
|
these reasons, hash index use is presently discouraged.
|
|
</para>
|
|
</note>
|
|
</para>
|
|
|
|
<para>
|
|
The B-tree index method is an implementation of Lehman-Yao
|
|
high-concurrency B-trees. The R-tree index method implements
|
|
standard R-trees using Guttman's quadratic split algorithm. The
|
|
hash index method is an implementation of Litwin's linear hashing. We
|
|
mention the algorithms used solely to indicate that all of these
|
|
index methods are fully dynamic and do not have to be optimized
|
|
periodically (as is the case with, for example, static hash methods).
|
|
</para>
|
|
</sect1>
|
|
|
|
|
|
<sect1 id="indexes-multicolumn">
|
|
<title>Multicolumn Indexes</title>
|
|
|
|
<indexterm zone="indexes-multicolumn">
|
|
<primary>index</primary>
|
|
<secondary>multicolumn</secondary>
|
|
</indexterm>
|
|
|
|
<para>
|
|
An index can be defined on more than one column. For example, if
|
|
you have a table of this form:
|
|
<programlisting>
|
|
CREATE TABLE test2 (
|
|
major int,
|
|
minor int,
|
|
name varchar
|
|
);
|
|
</programlisting>
|
|
(say, you keep your <filename class="directory">/dev</filename>
|
|
directory in a database...) and you frequently make queries like
|
|
<programlisting>
|
|
SELECT name FROM test2 WHERE major = <replaceable>constant</replaceable> AND minor = <replaceable>constant</replaceable>;
|
|
</programlisting>
|
|
then it may be appropriate to define an index on the columns
|
|
<structfield>major</structfield> and
|
|
<structfield>minor</structfield> together, e.g.,
|
|
<programlisting>
|
|
CREATE INDEX test2_mm_idx ON test2 (major, minor);
|
|
</programlisting>
|
|
</para>
|
|
|
|
<para>
|
|
Currently, only the B-tree and GiST implementations support multicolumn
|
|
indexes. Up to 32 columns may be specified. (This limit can be
|
|
altered when building <productname>PostgreSQL</productname>; see the
|
|
file <filename>pg_config_manual.h</filename>.)
|
|
</para>
|
|
|
|
<para>
|
|
The query planner can use a multicolumn index for queries that
|
|
involve the leftmost column in the index definition plus any number
|
|
of columns listed to the right of it, without a gap. For example,
|
|
an index on <literal>(a, b, c)</literal> can be used in queries
|
|
involving all of <literal>a</literal>, <literal>b</literal>, and
|
|
<literal>c</literal>, or in queries involving both
|
|
<literal>a</literal> and <literal>b</literal>, or in queries
|
|
involving only <literal>a</literal>, but not in other combinations.
|
|
(In a query involving <literal>a</literal> and <literal>c</literal>
|
|
the planner could choose to use the index for
|
|
<literal>a</literal>, while treating <literal>c</literal> like an
|
|
ordinary unindexed column.) Of course, each column must be used with
|
|
operators appropriate to the index type; clauses that involve other
|
|
operators will not be considered.
|
|
</para>
|
|
|
|
<para>
|
|
Multicolumn indexes can only be used if the clauses involving the
|
|
indexed columns are joined with <literal>AND</literal>. For instance,
|
|
<programlisting>
|
|
SELECT name FROM test2 WHERE major = <replaceable>constant</replaceable> OR minor = <replaceable>constant</replaceable>;
|
|
</programlisting>
|
|
cannot make use of the index <structname>test2_mm_idx</structname>
|
|
defined above to look up both columns. (It can be used to look up
|
|
only the <structfield>major</structfield> column, however.)
|
|
</para>
|
|
|
|
<para>
|
|
Multicolumn indexes should be used sparingly. Most of the time,
|
|
an index on a single column is sufficient and saves space and time.
|
|
Indexes with more than three columns are unlikely to be helpful
|
|
unless the usage of the table is extremely stylized.
|
|
</para>
|
|
</sect1>
|
|
|
|
|
|
<sect1 id="indexes-unique">
|
|
<title>Unique Indexes</title>
|
|
|
|
<indexterm zone="indexes-unique">
|
|
<primary>index</primary>
|
|
<secondary>unique</secondary>
|
|
</indexterm>
|
|
|
|
<para>
|
|
Indexes may also be used to enforce uniqueness of a column's value,
|
|
or the uniqueness of the combined values of more than one column.
|
|
<synopsis>
|
|
CREATE UNIQUE INDEX <replaceable>name</replaceable> ON <replaceable>table</replaceable> (<replaceable>column</replaceable> <optional>, ...</optional>);
|
|
</synopsis>
|
|
Currently, only B-tree indexes can be declared unique.
|
|
</para>
|
|
|
|
<para>
|
|
When an index is declared unique, multiple table rows with equal
|
|
indexed values will not be allowed. Null values are not considered
|
|
equal. A multicolumn unique index will only reject cases where all
|
|
of the indexed columns are equal in two rows.
|
|
</para>
|
|
|
|
<para>
|
|
<productname>PostgreSQL</productname> automatically creates a unique
|
|
index when a unique constraint or a primary key is defined for a table.
|
|
The index covers the columns that make up the primary key or unique
|
|
columns (a multicolumn index, if appropriate), and is the mechanism
|
|
that enforces the constraint.
|
|
</para>
|
|
|
|
<note>
|
|
<para>
|
|
The preferred way to add a unique constraint to a table is
|
|
<literal>ALTER TABLE ... ADD CONSTRAINT</literal>. The use of
|
|
indexes to enforce unique constraints could be considered an
|
|
implementation detail that should not be accessed directly.
|
|
One should, however, be aware that there's no need to manually
|
|
create indexes on unique columns; doing so would just duplicate
|
|
the automatically-created index.
|
|
</para>
|
|
</note>
|
|
</sect1>
|
|
|
|
|
|
<sect1 id="indexes-expressional">
|
|
<title>Indexes on Expressions</title>
|
|
|
|
<indexterm zone="indexes-expressional">
|
|
<primary>index</primary>
|
|
<secondary sortas="expressions">on expressions</secondary>
|
|
</indexterm>
|
|
|
|
<para>
|
|
An index column need not be just a column of the underlying table,
|
|
but can be a function or scalar expression computed from one or
|
|
more columns of the table. This feature is useful to obtain fast
|
|
access to tables based on the results of computations.
|
|
</para>
|
|
|
|
<para>
|
|
For example, a common way to do case-insensitive comparisons is to
|
|
use the <function>lower</function> function:
|
|
<programlisting>
|
|
SELECT * FROM test1 WHERE lower(col1) = 'value';
|
|
</programlisting>
|
|
This query can use an index, if one has been
|
|
defined on the result of the <literal>lower(col1)</literal>
|
|
operation:
|
|
<programlisting>
|
|
CREATE INDEX test1_lower_col1_idx ON test1 (lower(col1));
|
|
</programlisting>
|
|
</para>
|
|
|
|
<para>
|
|
If we were to declare this index <literal>UNIQUE</>, it would prevent
|
|
creation of rows whose <literal>col1</> values differ only in case,
|
|
as well as rows whose <literal>col1</> values are actually identical.
|
|
Thus, indexes on expressions can be used to enforce constraints that
|
|
are not definable as simple unique constraints.
|
|
</para>
|
|
|
|
<para>
|
|
As another example, if one often does queries like this:
|
|
<programlisting>
|
|
SELECT * FROM people WHERE (first_name || ' ' || last_name) = 'John Smith';
|
|
</programlisting>
|
|
then it might be worth creating an index like this:
|
|
<programlisting>
|
|
CREATE INDEX people_names ON people ((first_name || ' ' || last_name));
|
|
</programlisting>
|
|
</para>
|
|
|
|
<para>
|
|
The syntax of the <command>CREATE INDEX</> command normally requires
|
|
writing parentheses around index expressions, as shown in the second
|
|
example. The parentheses may be omitted when the expression is just
|
|
a function call, as in the first example.
|
|
</para>
|
|
|
|
<para>
|
|
Index expressions are relatively expensive to maintain, since the
|
|
derived expression(s) must be computed for each row upon insertion
|
|
or whenever it is updated. Therefore they should be used only when
|
|
queries that can use the index are very frequent.
|
|
</para>
|
|
</sect1>
|
|
|
|
|
|
<sect1 id="indexes-opclass">
|
|
<title>Operator Classes</title>
|
|
|
|
<indexterm zone="indexes-opclass">
|
|
<primary>operator class</primary>
|
|
</indexterm>
|
|
|
|
<para>
|
|
An index definition may specify an <firstterm>operator
|
|
class</firstterm> for each column of an index.
|
|
<synopsis>
|
|
CREATE INDEX <replaceable>name</replaceable> ON <replaceable>table</replaceable> (<replaceable>column</replaceable> <replaceable>opclass</replaceable> <optional>, ...</optional>);
|
|
</synopsis>
|
|
The operator class identifies the operators to be used by the index
|
|
for that column. For example, a B-tree index on the type <type>int4</type>
|
|
would use the <literal>int4_ops</literal> class; this operator
|
|
class includes comparison functions for values of type <type>int4</type>.
|
|
In practice the default operator class for the column's data type is
|
|
usually sufficient. The main point of having operator classes is
|
|
that for some data types, there could be more than one meaningful
|
|
index behavior. For example, we might want to sort a complex-number data
|
|
type either by absolute value or by real part. We could do this by
|
|
defining two operator classes for the data type and then selecting
|
|
the proper class when making an index.
|
|
</para>
|
|
|
|
<para>
|
|
There are also some built-in operator classes besides the default ones:
|
|
|
|
<itemizedlist>
|
|
<listitem>
|
|
<para>
|
|
The operator classes <literal>text_pattern_ops</literal>,
|
|
<literal>varchar_pattern_ops</literal>,
|
|
<literal>bpchar_pattern_ops</literal>, and
|
|
<literal>name_pattern_ops</literal> support B-tree indexes on
|
|
the types <type>text</type>, <type>varchar</type>,
|
|
<type>char</type>, and <type>name</type>, respectively. The
|
|
difference from the ordinary operator classes is that the values
|
|
are compared strictly character by character rather than
|
|
according to the locale-specific collation rules. This makes
|
|
these operator classes suitable for use by queries involving
|
|
pattern matching expressions (<literal>LIKE</literal> or POSIX
|
|
regular expressions) if the server does not use the standard
|
|
<quote>C</quote> locale. As an example, you might index a
|
|
<type>varchar</type> column like this:
|
|
<programlisting>
|
|
CREATE INDEX test_index ON test_table (col varchar_pattern_ops);
|
|
</programlisting>
|
|
If you do use the C locale, you may instead create an index
|
|
with the default operator class, and it will still be useful
|
|
for pattern-matching queries. Also note that you should
|
|
create an index with the default operator class if you want
|
|
queries involving ordinary comparisons to use an index. Such
|
|
queries cannot use the
|
|
<literal><replaceable>xxx</replaceable>_pattern_ops</literal>
|
|
operator classes. It is allowed to create multiple
|
|
indexes on the same column with different operator classes.
|
|
</para>
|
|
</listitem>
|
|
</itemizedlist>
|
|
</para>
|
|
|
|
<para>
|
|
The following query shows all defined operator classes:
|
|
|
|
<programlisting>
|
|
SELECT am.amname AS index_method,
|
|
opc.opcname AS opclass_name
|
|
FROM pg_am am, pg_opclass opc
|
|
WHERE opc.opcamid = am.oid
|
|
ORDER BY index_method, opclass_name;
|
|
</programlisting>
|
|
|
|
It can be extended to show all the operators included in each class:
|
|
<programlisting>
|
|
SELECT am.amname AS index_method,
|
|
opc.opcname AS opclass_name,
|
|
opr.oprname AS opclass_operator
|
|
FROM pg_am am, pg_opclass opc, pg_amop amop, pg_operator opr
|
|
WHERE opc.opcamid = am.oid AND
|
|
amop.amopclaid = opc.oid AND
|
|
amop.amopopr = opr.oid
|
|
ORDER BY index_method, opclass_name, opclass_operator;
|
|
</programlisting>
|
|
</para>
|
|
</sect1>
|
|
|
|
|
|
<sect1 id="indexes-partial">
|
|
<title>Partial Indexes</title>
|
|
|
|
<indexterm zone="indexes-partial">
|
|
<primary>index</primary>
|
|
<secondary>partial</secondary>
|
|
</indexterm>
|
|
|
|
<para>
|
|
A <firstterm>partial index</firstterm> is an index built over a
|
|
subset of a table; the subset is defined by a conditional
|
|
expression (called the <firstterm>predicate</firstterm> of the
|
|
partial index). The index contains entries for only those table
|
|
rows that satisfy the predicate.
|
|
</para>
|
|
|
|
<para>
|
|
A major motivation for partial indexes is to avoid indexing common
|
|
values. Since a query searching for a common value (one that
|
|
accounts for more than a few percent of all the table rows) will not
|
|
use the index anyway, there is no point in keeping those rows in the
|
|
index at all. This reduces the size of the index, which will speed
|
|
up queries that do use the index. It will also speed up many table
|
|
update operations because the index does not need to be
|
|
updated in all cases. <xref linkend="indexes-partial-ex1"> shows a
|
|
possible application of this idea.
|
|
</para>
|
|
|
|
<example id="indexes-partial-ex1">
|
|
<title>Setting up a Partial Index to Exclude Common Values</title>
|
|
|
|
<para>
|
|
Suppose you are storing web server access logs in a database.
|
|
Most accesses originate from the IP address range of your organization but
|
|
some are from elsewhere (say, employees on dial-up connections).
|
|
If your searches by IP are primarily for outside accesses,
|
|
you probably do not need to index the IP range that corresponds to your
|
|
organization's subnet.
|
|
</para>
|
|
|
|
<para>
|
|
Assume a table like this:
|
|
<programlisting>
|
|
CREATE TABLE access_log (
|
|
url varchar,
|
|
client_ip inet,
|
|
...
|
|
);
|
|
</programlisting>
|
|
</para>
|
|
|
|
<para>
|
|
To create a partial index that suits our example, use a command
|
|
such as this:
|
|
<programlisting>
|
|
CREATE INDEX access_log_client_ip_ix ON access_log (client_ip)
|
|
WHERE NOT (client_ip > inet '192.168.100.0' AND client_ip < inet '192.168.100.255');
|
|
</programlisting>
|
|
</para>
|
|
|
|
<para>
|
|
A typical query that can use this index would be:
|
|
<programlisting>
|
|
SELECT * FROM access_log WHERE url = '/index.html' AND client_ip = inet '212.78.10.32';
|
|
</programlisting>
|
|
A query that cannot use this index is:
|
|
<programlisting>
|
|
SELECT * FROM access_log WHERE client_ip = inet '192.168.100.23';
|
|
</programlisting>
|
|
</para>
|
|
|
|
<para>
|
|
Observe that this kind of partial index requires that the common
|
|
values be predetermined. If the distribution of values is
|
|
inherent (due to the nature of the application) and static (not
|
|
changing over time), this is not difficult, but if the common values are
|
|
merely due to the coincidental data load this can require a lot of
|
|
maintenance work.
|
|
</para>
|
|
</example>
|
|
|
|
<para>
|
|
Another possibility is to exclude values from the index that the
|
|
typical query workload is not interested in; this is shown in <xref
|
|
linkend="indexes-partial-ex2">. This results in the same
|
|
advantages as listed above, but it prevents the
|
|
<quote>uninteresting</quote> values from being accessed via that
|
|
index at all, even if an index scan might be profitable in that
|
|
case. Obviously, setting up partial indexes for this kind of
|
|
scenario will require a lot of care and experimentation.
|
|
</para>
|
|
|
|
<example id="indexes-partial-ex2">
|
|
<title>Setting up a Partial Index to Exclude Uninteresting Values</title>
|
|
|
|
<para>
|
|
If you have a table that contains both billed and unbilled orders,
|
|
where the unbilled orders take up a small fraction of the total
|
|
table and yet those are the most-accessed rows, you can improve
|
|
performance by creating an index on just the unbilled rows. The
|
|
command to create the index would look like this:
|
|
<programlisting>
|
|
CREATE INDEX orders_unbilled_index ON orders (order_nr)
|
|
WHERE billed is not true;
|
|
</programlisting>
|
|
</para>
|
|
|
|
<para>
|
|
A possible query to use this index would be
|
|
<programlisting>
|
|
SELECT * FROM orders WHERE billed is not true AND order_nr < 10000;
|
|
</programlisting>
|
|
However, the index can also be used in queries that do not involve
|
|
<structfield>order_nr</> at all, e.g.,
|
|
<programlisting>
|
|
SELECT * FROM orders WHERE billed is not true AND amount > 5000.00;
|
|
</programlisting>
|
|
This is not as efficient as a partial index on the
|
|
<structfield>amount</> column would be, since the system has to
|
|
scan the entire index. Yet, if there are relatively few unbilled
|
|
orders, using this partial index just to find the unbilled orders
|
|
could be a win.
|
|
</para>
|
|
|
|
<para>
|
|
Note that this query cannot use this index:
|
|
<programlisting>
|
|
SELECT * FROM orders WHERE order_nr = 3501;
|
|
</programlisting>
|
|
The order 3501 may be among the billed or among the unbilled
|
|
orders.
|
|
</para>
|
|
</example>
|
|
|
|
<para>
|
|
<xref linkend="indexes-partial-ex2"> also illustrates that the
|
|
indexed column and the column used in the predicate do not need to
|
|
match. <productname>PostgreSQL</productname> supports partial
|
|
indexes with arbitrary predicates, so long as only columns of the
|
|
table being indexed are involved. However, keep in mind that the
|
|
predicate must match the conditions used in the queries that
|
|
are supposed to benefit from the index. To be precise, a partial
|
|
index can be used in a query only if the system can recognize that
|
|
the <literal>WHERE</> condition of the query mathematically implies
|
|
the predicate of the index.
|
|
<productname>PostgreSQL</productname> does not have a sophisticated
|
|
theorem prover that can recognize mathematically equivalent
|
|
expressions that are written in different forms. (Not
|
|
only is such a general theorem prover extremely difficult to
|
|
create, it would probably be too slow to be of any real use.)
|
|
The system can recognize simple inequality implications, for example
|
|
<quote>x < 1</quote> implies <quote>x < 2</quote>; otherwise
|
|
the predicate condition must exactly match part of the query's
|
|
<literal>WHERE</> condition
|
|
or the index will not be recognized to be usable.
|
|
</para>
|
|
|
|
<para>
|
|
A third possible use for partial indexes does not require the
|
|
index to be used in queries at all. The idea here is to create
|
|
a unique index over a subset of a table, as in <xref
|
|
linkend="indexes-partial-ex3">. This enforces uniqueness
|
|
among the rows that satisfy the index predicate, without constraining
|
|
those that do not.
|
|
</para>
|
|
|
|
<example id="indexes-partial-ex3">
|
|
<title>Setting up a Partial Unique Index</title>
|
|
|
|
<para>
|
|
Suppose that we have a table describing test outcomes. We wish
|
|
to ensure that there is only one <quote>successful</> entry for
|
|
a given subject and target combination, but there might be any number of
|
|
<quote>unsuccessful</> entries. Here is one way to do it:
|
|
<programlisting>
|
|
CREATE TABLE tests (
|
|
subject text,
|
|
target text,
|
|
success boolean,
|
|
...
|
|
);
|
|
|
|
CREATE UNIQUE INDEX tests_success_constraint ON tests (subject, target)
|
|
WHERE success;
|
|
</programlisting>
|
|
This is a particularly efficient way of doing it when there are few
|
|
successful tests and many unsuccessful ones.
|
|
</para>
|
|
</example>
|
|
|
|
<para>
|
|
Finally, a partial index can also be used to override the system's
|
|
query plan choices. It may occur that data sets with peculiar
|
|
distributions will cause the system to use an index when it really
|
|
should not. In that case the index can be set up so that it is not
|
|
available for the offending query. Normally,
|
|
<productname>PostgreSQL</> makes reasonable choices about index
|
|
usage (e.g., it avoids them when retrieving common values, so the
|
|
earlier example really only saves index size, it is not required to
|
|
avoid index usage), and grossly incorrect plan choices are cause
|
|
for a bug report.
|
|
</para>
|
|
|
|
<para>
|
|
Keep in mind that setting up a partial index indicates that you
|
|
know at least as much as the query planner knows, in particular you
|
|
know when an index might be profitable. Forming this knowledge
|
|
requires experience and understanding of how indexes in
|
|
<productname>PostgreSQL</> work. In most cases, the advantage of a
|
|
partial index over a regular index will not be much.
|
|
</para>
|
|
|
|
<para>
|
|
More information about partial indexes can be found in <xref
|
|
linkend="STON89b">, <xref linkend="OLSON93">, and <xref
|
|
linkend="SESHADRI95">.
|
|
</para>
|
|
</sect1>
|
|
|
|
<sect1 id="indexes-examine">
|
|
<title>Examining Index Usage</title>
|
|
|
|
<indexterm zone="indexes-examine">
|
|
<primary>index</primary>
|
|
<secondary>examining usage</secondary>
|
|
</indexterm>
|
|
|
|
<para>
|
|
Although indexes in <productname>PostgreSQL</> do not need
|
|
maintenance and tuning, it is still important to check
|
|
which indexes are actually used by the real-life query workload.
|
|
Examining index usage for an individual query is done with the
|
|
<xref linkend="sql-explain" endterm="sql-explain-title">
|
|
command; its application for this purpose is
|
|
illustrated in <xref linkend="using-explain">.
|
|
It is also possible to gather overall statistics about index usage
|
|
in a running server, as described in <xref linkend="monitoring-stats">.
|
|
</para>
|
|
|
|
<para>
|
|
It is difficult to formulate a general procedure for determining
|
|
which indexes to set up. There are a number of typical cases that
|
|
have been shown in the examples throughout the previous sections.
|
|
A good deal of experimentation will be necessary in most cases.
|
|
The rest of this section gives some tips for that.
|
|
</para>
|
|
|
|
<itemizedlist>
|
|
<listitem>
|
|
<para>
|
|
Always run <xref linkend="sql-analyze" endterm="sql-analyze-title">
|
|
first. This command
|
|
collects statistics about the distribution of the values in the
|
|
table. This information is required to guess the number of rows
|
|
returned by a query, which is needed by the planner to assign
|
|
realistic costs to each possible query plan. In absence of any
|
|
real statistics, some default values are assumed, which are
|
|
almost certain to be inaccurate. Examining an application's
|
|
index usage without having run <command>ANALYZE</command> is
|
|
therefore a lost cause.
|
|
</para>
|
|
</listitem>
|
|
|
|
<listitem>
|
|
<para>
|
|
Use real data for experimentation. Using test data for setting
|
|
up indexes will tell you what indexes you need for the test data,
|
|
but that is all.
|
|
</para>
|
|
|
|
<para>
|
|
It is especially fatal to use proportionally reduced data sets.
|
|
While selecting 1000 out of 100000 rows could be a candidate for
|
|
an index, selecting 1 out of 100 rows will hardly be, because the
|
|
100 rows will probably fit within a single disk page, and there
|
|
is no plan that can beat sequentially fetching 1 disk page.
|
|
</para>
|
|
|
|
<para>
|
|
Also be careful when making up test data, which is often
|
|
unavoidable when the application is not in production use yet.
|
|
Values that are very similar, completely random, or inserted in
|
|
sorted order will skew the statistics away from the distribution
|
|
that real data would have.
|
|
</para>
|
|
</listitem>
|
|
|
|
<listitem>
|
|
<para>
|
|
When indexes are not used, it can be useful for testing to force
|
|
their use. There are run-time parameters that can turn off
|
|
various plan types (described in <xref linkend="runtime-config">).
|
|
For instance, turning off sequential scans
|
|
(<varname>enable_seqscan</>) and nested-loop joins
|
|
(<varname>enable_nestloop</>), which are the most basic plans,
|
|
will force the system to use a different plan. If the system
|
|
still chooses a sequential scan or nested-loop join then there is
|
|
probably a more fundamental problem for why the index is not
|
|
used, for example, the query condition does not match the index.
|
|
(What kind of query can use what kind of index is explained in
|
|
the previous sections.)
|
|
</para>
|
|
</listitem>
|
|
|
|
<listitem>
|
|
<para>
|
|
If forcing index usage does use the index, then there are two
|
|
possibilities: Either the system is right and using the index is
|
|
indeed not appropriate, or the cost estimates of the query plans
|
|
are not reflecting reality. So you should time your query with
|
|
and without indexes. The <command>EXPLAIN ANALYZE</command>
|
|
command can be useful here.
|
|
</para>
|
|
</listitem>
|
|
|
|
<listitem>
|
|
<para>
|
|
If it turns out that the cost estimates are wrong, there are,
|
|
again, two possibilities. The total cost is computed from the
|
|
per-row costs of each plan node times the selectivity estimate of
|
|
the plan node. The costs of the plan nodes can be tuned with
|
|
run-time parameters (described in <xref linkend="runtime-config">).
|
|
An inaccurate selectivity estimate is due to
|
|
insufficient statistics. It may be possible to help this by
|
|
tuning the statistics-gathering parameters (see
|
|
<xref linkend="sql-altertable" endterm="sql-altertable-title">).
|
|
</para>
|
|
|
|
<para>
|
|
If you do not succeed in adjusting the costs to be more
|
|
appropriate, then you may have to resort to forcing index usage
|
|
explicitly. You may also want to contact the
|
|
<productname>PostgreSQL</> developers to examine the issue.
|
|
</para>
|
|
</listitem>
|
|
</itemizedlist>
|
|
</sect1>
|
|
</chapter>
|
|
|
|
<!-- Keep this comment at the end of the file
|
|
Local variables:
|
|
mode:sgml
|
|
sgml-omittag:nil
|
|
sgml-shorttag:t
|
|
sgml-minimize-attributes:nil
|
|
sgml-always-quote-attributes:t
|
|
sgml-indent-step:1
|
|
sgml-indent-data:t
|
|
sgml-parent-document:nil
|
|
sgml-default-dtd-file:"./reference.ced"
|
|
sgml-exposed-tags:nil
|
|
sgml-local-catalogs:("/usr/lib/sgml/catalog")
|
|
sgml-local-ecat-files:nil
|
|
End:
|
|
-->
|