2000-12-16 03:29:36 +01:00
|
|
|
<!--
|
2001-10-16 03:13:44 +02:00
|
|
|
$Header: /cvsroot/pgsql/doc/src/sgml/perform.sgml,v 1.13 2001/10/16 01:13:44 tgl Exp $
|
2000-12-16 03:29:36 +01:00
|
|
|
-->
|
|
|
|
|
|
|
|
<chapter id="performance-tips">
|
|
|
|
<title>Performance Tips</title>
|
|
|
|
|
|
|
|
<para>
|
|
|
|
Query performance can be affected by many things. Some of these can
|
|
|
|
be manipulated by the user, while others are fundamental to the underlying
|
|
|
|
design of the system. This chapter provides some hints about understanding
|
|
|
|
and tuning <productname>Postgres</productname> performance.
|
|
|
|
</para>
|
|
|
|
|
|
|
|
<sect1 id="using-explain">
|
|
|
|
<title>Using <command>EXPLAIN</command></title>
|
|
|
|
|
|
|
|
<para>
|
|
|
|
<productname>Postgres</productname> devises a <firstterm>query
|
|
|
|
plan</firstterm> for each query it is given. Choosing the right
|
|
|
|
plan to match the query structure and the properties of the data
|
|
|
|
is absolutely critical for good performance. You can use the
|
|
|
|
<command>EXPLAIN</command> command to see what query plan the system
|
2001-06-11 02:52:09 +02:00
|
|
|
creates for any query.
|
|
|
|
Plan-reading is an art that deserves an extensive tutorial, which
|
|
|
|
this is not; but here is some basic information.
|
2000-12-16 03:29:36 +01:00
|
|
|
</para>
|
|
|
|
|
|
|
|
<para>
|
2001-06-11 02:52:09 +02:00
|
|
|
The numbers that are currently quoted by <command>EXPLAIN</command> are:
|
2000-12-16 03:29:36 +01:00
|
|
|
|
|
|
|
<itemizedlist>
|
|
|
|
<listitem>
|
|
|
|
<para>
|
|
|
|
Estimated start-up cost (time expended before output scan can start,
|
2001-03-25 00:03:26 +01:00
|
|
|
e.g., time to do the sorting in a SORT node).
|
2000-12-16 03:29:36 +01:00
|
|
|
</para>
|
|
|
|
</listitem>
|
|
|
|
|
|
|
|
<listitem>
|
|
|
|
<para>
|
|
|
|
Estimated total cost (if all tuples are retrieved, which they may not
|
|
|
|
be --- a query with a LIMIT will stop short of paying the total cost,
|
|
|
|
for example).
|
|
|
|
</para>
|
|
|
|
</listitem>
|
|
|
|
|
|
|
|
<listitem>
|
|
|
|
<para>
|
|
|
|
Estimated number of rows output by this plan node (again, without
|
|
|
|
regard for any LIMIT).
|
|
|
|
</para>
|
|
|
|
</listitem>
|
|
|
|
|
|
|
|
<listitem>
|
|
|
|
<para>
|
|
|
|
Estimated average width (in bytes) of rows output by this plan
|
|
|
|
node.
|
|
|
|
</para>
|
|
|
|
</listitem>
|
|
|
|
</itemizedlist>
|
|
|
|
</para>
|
|
|
|
|
|
|
|
<para>
|
|
|
|
The costs are measured in units of disk page fetches. (CPU effort
|
|
|
|
estimates are converted into disk-page units using some
|
|
|
|
fairly arbitrary fudge-factors. If you want to experiment with these
|
|
|
|
factors, see the list of run-time configuration parameters in the
|
|
|
|
<citetitle>Administrator's Guide</citetitle>.)
|
|
|
|
</para>
|
|
|
|
|
|
|
|
<para>
|
|
|
|
It's important to note that the cost of an upper-level node includes
|
|
|
|
the cost of all its child nodes. It's also important to realize that
|
|
|
|
the cost only reflects things that the planner/optimizer cares about.
|
|
|
|
In particular, the cost does not consider the time spent transmitting
|
|
|
|
result tuples to the frontend --- which could be a pretty dominant
|
|
|
|
factor in the true elapsed time, but the planner ignores it because
|
|
|
|
it cannot change it by altering the plan. (Every correct plan will
|
|
|
|
output the same tuple set, we trust.)
|
|
|
|
</para>
|
|
|
|
|
|
|
|
<para>
|
|
|
|
Rows output is a little tricky because it is <emphasis>not</emphasis> the
|
|
|
|
number of rows
|
|
|
|
processed/scanned by the query --- it is usually less, reflecting the
|
|
|
|
estimated selectivity of any WHERE-clause constraints that are being
|
|
|
|
applied at this node. Ideally the top-level rows estimate will
|
|
|
|
approximate the number of rows actually returned, updated, or deleted
|
2001-06-11 02:52:09 +02:00
|
|
|
by the query.
|
2000-12-16 03:29:36 +01:00
|
|
|
</para>
|
|
|
|
|
|
|
|
<para>
|
|
|
|
Here are some examples (using the regress test database after a
|
2001-06-11 02:52:09 +02:00
|
|
|
vacuum analyze, and 7.2 development sources):
|
2000-12-16 03:29:36 +01:00
|
|
|
|
|
|
|
<programlisting>
|
2001-10-13 01:32:34 +02:00
|
|
|
regression=# EXPLAIN SELECT * FROM tenk1;
|
2000-12-16 03:29:36 +01:00
|
|
|
NOTICE: QUERY PLAN:
|
|
|
|
|
|
|
|
Seq Scan on tenk1 (cost=0.00..333.00 rows=10000 width=148)
|
|
|
|
</programlisting>
|
|
|
|
</para>
|
|
|
|
|
|
|
|
<para>
|
|
|
|
This is about as straightforward as it gets. If you do
|
|
|
|
|
|
|
|
<programlisting>
|
2001-10-13 01:32:34 +02:00
|
|
|
SELECT * FROM pg_class WHERE relname = 'tenk1';
|
2000-12-16 03:29:36 +01:00
|
|
|
</programlisting>
|
|
|
|
|
2001-09-09 19:21:59 +02:00
|
|
|
you will find out that <classname>tenk1</classname> has 233 disk
|
2001-06-22 20:53:36 +02:00
|
|
|
pages and 10000 tuples. So the cost is estimated at 233 page
|
2001-09-09 19:21:59 +02:00
|
|
|
reads, defined as 1.0 apiece, plus 10000 * <varname>cpu_tuple_cost</varname> which is
|
2000-12-16 03:29:36 +01:00
|
|
|
currently 0.01 (try <command>show cpu_tuple_cost</command>).
|
|
|
|
</para>
|
|
|
|
|
|
|
|
<para>
|
|
|
|
Now let's modify the query to add a qualification clause:
|
|
|
|
|
|
|
|
<programlisting>
|
2001-10-13 01:32:34 +02:00
|
|
|
regression=# EXPLAIN SELECT * FROM tenk1 WHERE unique1 < 1000;
|
2000-12-16 03:29:36 +01:00
|
|
|
NOTICE: QUERY PLAN:
|
|
|
|
|
2001-09-18 03:59:07 +02:00
|
|
|
Seq Scan on tenk1 (cost=0.00..358.00 rows=1007 width=148)
|
2000-12-16 03:29:36 +01:00
|
|
|
</programlisting>
|
|
|
|
|
|
|
|
The estimate of output rows has gone down because of the WHERE clause.
|
2001-06-11 02:52:09 +02:00
|
|
|
However, the scan will still have to visit all 10000 rows, so the cost
|
|
|
|
hasn't decreased; in fact it has gone up a bit to reflect the extra CPU
|
|
|
|
time spent checking the WHERE condition.
|
|
|
|
</para>
|
|
|
|
|
|
|
|
<para>
|
|
|
|
The actual number of rows this query would select is 1000, but the
|
|
|
|
estimate is only approximate. If you try to duplicate this experiment,
|
|
|
|
you will probably get a slightly different estimate; moreover, it will
|
|
|
|
change after each <command>ANALYZE</command> command, because the
|
|
|
|
statistics produced by <command>ANALYZE</command> are taken from a
|
|
|
|
randomized sample of the table.
|
2000-12-16 03:29:36 +01:00
|
|
|
</para>
|
|
|
|
|
|
|
|
<para>
|
|
|
|
Modify the query to restrict the qualification even more:
|
|
|
|
|
|
|
|
<programlisting>
|
2001-10-13 01:32:34 +02:00
|
|
|
regression=# EXPLAIN SELECT * FROM tenk1 WHERE unique1 < 50;
|
2000-12-16 03:29:36 +01:00
|
|
|
NOTICE: QUERY PLAN:
|
|
|
|
|
2001-09-18 03:59:07 +02:00
|
|
|
Index Scan using tenk1_unique1 on tenk1 (cost=0.00..181.09 rows=49 width=148)
|
2000-12-16 03:29:36 +01:00
|
|
|
</programlisting>
|
|
|
|
|
|
|
|
and you will see that if we make the WHERE condition selective
|
|
|
|
enough, the planner will
|
2001-09-09 19:21:59 +02:00
|
|
|
eventually decide that an index scan is cheaper than a sequential scan.
|
2001-06-11 02:52:09 +02:00
|
|
|
This plan will only have to visit 50 tuples because of the index,
|
|
|
|
so it wins despite the fact that each individual fetch is more expensive
|
|
|
|
than reading a whole disk page sequentially.
|
2000-12-16 03:29:36 +01:00
|
|
|
</para>
|
|
|
|
|
|
|
|
<para>
|
|
|
|
Add another condition to the qualification:
|
|
|
|
|
|
|
|
<programlisting>
|
2001-10-13 01:32:34 +02:00
|
|
|
regression=# EXPLAIN SELECT * FROM tenk1 WHERE unique1 < 50 AND
|
2000-12-16 03:29:36 +01:00
|
|
|
regression-# stringu1 = 'xxx';
|
|
|
|
NOTICE: QUERY PLAN:
|
|
|
|
|
2001-09-18 03:59:07 +02:00
|
|
|
Index Scan using tenk1_unique1 on tenk1 (cost=0.00..181.22 rows=1 width=148)
|
2000-12-16 03:29:36 +01:00
|
|
|
</programlisting>
|
|
|
|
|
2001-09-09 19:21:59 +02:00
|
|
|
The added clause <literal>stringu1 = 'xxx'</literal> reduces the output-rows estimate,
|
2000-12-16 03:29:36 +01:00
|
|
|
but not the cost because we still have to visit the same set of tuples.
|
|
|
|
</para>
|
|
|
|
|
|
|
|
<para>
|
|
|
|
Let's try joining two tables, using the fields we have been discussing:
|
|
|
|
|
|
|
|
<programlisting>
|
2001-10-13 01:32:34 +02:00
|
|
|
regression=# EXPLAIN SELECT * FROM tenk1 t1, tenk2 t2 WHERE t1.unique1 < 50
|
|
|
|
regression-# AND t1.unique2 = t2.unique2;
|
2000-12-16 03:29:36 +01:00
|
|
|
NOTICE: QUERY PLAN:
|
|
|
|
|
2001-09-18 03:59:07 +02:00
|
|
|
Nested Loop (cost=0.00..330.41 rows=49 width=296)
|
2000-12-16 03:29:36 +01:00
|
|
|
-> Index Scan using tenk1_unique1 on tenk1 t1
|
2001-09-18 03:59:07 +02:00
|
|
|
(cost=0.00..181.09 rows=49 width=148)
|
2000-12-16 03:29:36 +01:00
|
|
|
-> Index Scan using tenk2_unique2 on tenk2 t2
|
2001-09-18 03:59:07 +02:00
|
|
|
(cost=0.00..3.01 rows=1 width=148)
|
2000-12-16 03:29:36 +01:00
|
|
|
</programlisting>
|
|
|
|
</para>
|
|
|
|
|
|
|
|
<para>
|
2001-09-09 19:21:59 +02:00
|
|
|
In this nested-loop join, the outer scan is the same index scan we had
|
2000-12-16 03:29:36 +01:00
|
|
|
in the example before last, and so its cost and row count are the same
|
2001-09-13 17:55:24 +02:00
|
|
|
because we are applying the <literal>unique1 < 50</literal> WHERE clause at that node.
|
|
|
|
The <literal>t1.unique2 = t2.unique2</literal> clause is not relevant yet, so it doesn't
|
2001-09-09 19:21:59 +02:00
|
|
|
affect row count of the outer scan. For the inner scan, the unique2 value of the
|
2000-12-16 03:29:36 +01:00
|
|
|
current
|
2001-09-09 19:21:59 +02:00
|
|
|
outer-scan tuple is plugged into the inner index scan
|
|
|
|
to produce an index qualification like
|
2001-09-13 17:55:24 +02:00
|
|
|
<literal>t2.unique2 = <replaceable>constant</replaceable></literal>. So we get the
|
2001-09-09 19:21:59 +02:00
|
|
|
same inner-scan plan and costs that we'd get from, say, <literal>explain select
|
|
|
|
* from tenk2 where unique2 = 42</literal>. The costs of the loop node are then set
|
|
|
|
on the basis of the cost of the outer scan, plus one repetition of the
|
2001-09-18 03:59:07 +02:00
|
|
|
inner scan for each outer tuple (49 * 3.01, here), plus a little CPU
|
2000-12-16 03:29:36 +01:00
|
|
|
time for join processing.
|
|
|
|
</para>
|
|
|
|
|
|
|
|
<para>
|
|
|
|
In this example the loop's output row count is the same as the product
|
|
|
|
of the two scans' row counts, but that's not true in general, because
|
|
|
|
in general you can have WHERE clauses that mention both relations and
|
|
|
|
so can only be applied at the join point, not to either input scan.
|
2001-09-13 17:55:24 +02:00
|
|
|
For example, if we added <literal>WHERE ... AND t1.hundred < t2.hundred</literal>,
|
2001-09-09 19:21:59 +02:00
|
|
|
that would decrease the output row count of the join node, but not change
|
2000-12-16 03:29:36 +01:00
|
|
|
either input scan.
|
|
|
|
</para>
|
|
|
|
|
|
|
|
<para>
|
|
|
|
One way to look at variant plans is to force the planner to disregard
|
|
|
|
whatever strategy it thought was the winner, using the enable/disable
|
|
|
|
flags for each plan type. (This is a crude tool, but useful. See
|
|
|
|
also <xref linkend="explicit-joins">.)
|
|
|
|
|
|
|
|
<programlisting>
|
|
|
|
regression=# set enable_nestloop = off;
|
|
|
|
SET VARIABLE
|
2001-10-13 01:32:34 +02:00
|
|
|
regression=# EXPLAIN SELECT * FROM tenk1 t1, tenk2 t2 WHERE t1.unique1 < 50
|
|
|
|
regression-# AND t1.unique2 = t2.unique2;
|
2000-12-16 03:29:36 +01:00
|
|
|
NOTICE: QUERY PLAN:
|
|
|
|
|
2001-09-18 03:59:07 +02:00
|
|
|
Hash Join (cost=181.22..564.83 rows=49 width=296)
|
2000-12-16 03:29:36 +01:00
|
|
|
-> Seq Scan on tenk2 t2
|
|
|
|
(cost=0.00..333.00 rows=10000 width=148)
|
2001-09-18 03:59:07 +02:00
|
|
|
-> Hash (cost=181.09..181.09 rows=49 width=148)
|
2000-12-16 03:29:36 +01:00
|
|
|
-> Index Scan using tenk1_unique1 on tenk1 t1
|
2001-09-18 03:59:07 +02:00
|
|
|
(cost=0.00..181.09 rows=49 width=148)
|
2000-12-16 03:29:36 +01:00
|
|
|
</programlisting>
|
|
|
|
|
2001-09-09 19:21:59 +02:00
|
|
|
This plan proposes to extract the 50 interesting rows of <classname>tenk1</classname>
|
|
|
|
using ye same olde index scan, stash them into an in-memory hash table,
|
|
|
|
and then do a sequential scan of <classname>tenk2</classname>, probing into the hash table
|
2001-09-13 17:55:24 +02:00
|
|
|
for possible matches of <literal>t1.unique2 = t2.unique2</literal> at each <classname>tenk2</classname> tuple.
|
2001-09-09 19:21:59 +02:00
|
|
|
The cost to read <classname>tenk1</classname> and set up the hash table is entirely start-up
|
2000-12-16 03:29:36 +01:00
|
|
|
cost for the hash join, since we won't get any tuples out until we can
|
2001-09-09 19:21:59 +02:00
|
|
|
start reading <classname>tenk2</classname>. The total time estimate for the join also
|
2001-06-11 02:52:09 +02:00
|
|
|
includes a hefty charge for CPU time to probe the hash table
|
2001-09-18 03:59:07 +02:00
|
|
|
10000 times. Note, however, that we are NOT charging 10000 times 181.09;
|
2000-12-16 03:29:36 +01:00
|
|
|
the hash table setup is only done once in this plan type.
|
|
|
|
</para>
|
2001-06-22 20:53:36 +02:00
|
|
|
|
2001-09-18 03:59:07 +02:00
|
|
|
<para>
|
|
|
|
It is possible to check on the accuracy of the planner's estimated costs
|
|
|
|
by using EXPLAIN ANALYZE. This command actually executes the query,
|
|
|
|
and then displays the true runtime accumulated within each plan node
|
|
|
|
along with the same estimated costs that a plain EXPLAIN shows.
|
|
|
|
For example, we might get a result like this:
|
|
|
|
|
2001-10-09 20:46:00 +02:00
|
|
|
<screen>
|
2001-10-13 01:32:34 +02:00
|
|
|
regression=# EXPLAIN ANALYZE
|
|
|
|
regression-# SELECT * FROM tenk1 t1, tenk2 t2
|
|
|
|
regression-# WHERE t1.unique1 < 50 AND t1.unique2 = t2.unique2;
|
2001-09-18 03:59:07 +02:00
|
|
|
NOTICE: QUERY PLAN:
|
|
|
|
|
|
|
|
Nested Loop (cost=0.00..330.41 rows=49 width=296) (actual time=1.31..28.90 rows=50 loops=1)
|
|
|
|
-> Index Scan using tenk1_unique1 on tenk1 t1
|
|
|
|
(cost=0.00..181.09 rows=49 width=148) (actual time=0.69..8.84 rows=50 loops=1)
|
|
|
|
-> Index Scan using tenk2_unique2 on tenk2 t2
|
|
|
|
(cost=0.00..3.01 rows=1 width=148) (actual time=0.28..0.31 rows=1 loops=50)
|
|
|
|
Total runtime: 30.67 msec
|
2001-10-09 20:46:00 +02:00
|
|
|
</screen>
|
2001-09-18 03:59:07 +02:00
|
|
|
|
|
|
|
Note that the <quote>actual time</quote> values are in milliseconds of
|
|
|
|
real time, whereas the <quote>cost</quote> estimates are expressed in
|
|
|
|
arbitrary units of disk fetches; so they are unlikely to match up.
|
|
|
|
The thing to pay attention to is the ratios.
|
|
|
|
</para>
|
|
|
|
|
|
|
|
<para>
|
|
|
|
In some query plans, it is possible for a subplan node to be executed more
|
2001-10-09 20:46:00 +02:00
|
|
|
than once. For example, the inner index scan is executed once per outer
|
2001-09-18 03:59:07 +02:00
|
|
|
tuple in the above nested-loop plan. In such cases, the
|
|
|
|
<quote>loops</quote> value reports the
|
|
|
|
total number of executions of the node, and the actual time and rows
|
|
|
|
values shown are averages per-execution. This is done to make the numbers
|
|
|
|
comparable with the way that the cost estimates are shown. Multiply by
|
|
|
|
the <quote>loops</quote> value to get the total time actually spent in
|
|
|
|
the node.
|
|
|
|
</para>
|
|
|
|
|
|
|
|
<para>
|
|
|
|
The <quote>total runtime</quote> shown by EXPLAIN ANALYZE includes
|
|
|
|
executor startup and shutdown time, as well as time spent processing
|
|
|
|
the result tuples. It does not include parsing, rewriting, or planning
|
|
|
|
time. For a SELECT query, the total runtime will normally be just a
|
|
|
|
little larger than the total time reported for the top-level plan node.
|
|
|
|
For INSERT, UPDATE, and DELETE queries, the total runtime may be
|
|
|
|
considerably larger, because it includes the time spent processing the
|
|
|
|
output tuples. In these queries, the time for the top plan node
|
|
|
|
essentially is the time spent computing the new tuples and/or locating
|
|
|
|
the old ones, but it doesn't include the time spent making the changes.
|
|
|
|
</para>
|
|
|
|
|
2001-06-22 20:53:36 +02:00
|
|
|
<para>
|
|
|
|
It is worth noting that EXPLAIN results should not be extrapolated
|
|
|
|
to situations other than the one you are actually testing; for example,
|
|
|
|
results on a toy-sized table can't be assumed to apply to large tables.
|
|
|
|
The planner's cost estimates are not linear and so it may well choose
|
|
|
|
a different plan for a larger or smaller table. An extreme example
|
|
|
|
is that on a table that only occupies one disk page, you'll nearly
|
|
|
|
always get a sequential scan plan whether indexes are available or not.
|
|
|
|
The planner realizes that it's going to take one disk page read to
|
|
|
|
process the table in any case, so there's no value in expending additional
|
|
|
|
page reads to look at an index.
|
|
|
|
</para>
|
2000-12-16 03:29:36 +01:00
|
|
|
</sect1>
|
|
|
|
|
2001-10-16 03:13:44 +02:00
|
|
|
<sect1 id="planner-stats">
|
|
|
|
<title>Statistics used by the Planner</title>
|
|
|
|
|
|
|
|
<para>
|
|
|
|
As we saw in the previous section, the query planner needs to estimate
|
|
|
|
the number of rows retrieved by a query in order to make good choices
|
|
|
|
of query plans. This section provides a quick look at the statistics
|
|
|
|
that the system uses for these estimates.
|
|
|
|
</para>
|
|
|
|
|
|
|
|
<para>
|
|
|
|
One component of the statistics is the total number of entries in each
|
|
|
|
table and index, as well as the number of disk blocks occupied by each
|
|
|
|
table and index. This information is kept in
|
|
|
|
<structname>pg_class</structname>'s <structfield>reltuples</structfield>
|
|
|
|
and <structfield>relpages</structfield> columns. We can look at it
|
|
|
|
with queries similar to this one:
|
|
|
|
|
|
|
|
<screen>
|
|
|
|
regression=# select relname, relkind, reltuples, relpages from pg_class
|
|
|
|
regression-# where relname like 'tenk1%';
|
|
|
|
relname | relkind | reltuples | relpages
|
|
|
|
---------------+---------+-----------+----------
|
|
|
|
tenk1 | r | 10000 | 233
|
|
|
|
tenk1_hundred | i | 10000 | 30
|
|
|
|
tenk1_unique1 | i | 10000 | 30
|
|
|
|
tenk1_unique2 | i | 10000 | 30
|
|
|
|
(4 rows)
|
|
|
|
</screen>
|
|
|
|
|
|
|
|
Here we can see that <structname>tenk1</structname> contains 10000
|
|
|
|
rows, as do its indexes, but the indexes are (unsurprisingly) much
|
|
|
|
smaller than the table.
|
|
|
|
</para>
|
|
|
|
|
|
|
|
<para>
|
|
|
|
For efficiency reasons, <structfield>reltuples</structfield>
|
|
|
|
and <structfield>relpages</structfield> are not updated on-the-fly,
|
|
|
|
and so they usually contain only approximate values (which is good
|
|
|
|
enough for the planner's purposes). They are initialized with dummy
|
|
|
|
values (presently 1000 and 10 respectively) when a table is created.
|
|
|
|
They are updated by certain commands, presently <command>VACUUM</>,
|
|
|
|
<command>ANALYZE</>, and <command>CREATE INDEX</>. A stand-alone
|
|
|
|
<command>ANALYZE</>, that is one not part of <command>VACUUM</>,
|
|
|
|
generates an approximate <structfield>reltuples</structfield> value
|
|
|
|
since it does not read every row of the table.
|
|
|
|
</para>
|
|
|
|
|
|
|
|
<para>
|
|
|
|
Most queries retrieve only a fraction of the rows in a table, due
|
|
|
|
to having WHERE clauses that restrict the rows to be examined.
|
|
|
|
The planner thus needs to make an estimate of the
|
|
|
|
<firstterm>selectivity</> of WHERE clauses, that is, the fraction of
|
|
|
|
rows that match each clause of the WHERE condition. The information
|
|
|
|
used for this task is stored in the <structname>pg_statistic</structname>
|
|
|
|
system catalog. Entries in <structname>pg_statistic</structname> are
|
|
|
|
updated by <command>ANALYZE</> and <command>VACUUM ANALYZE</> commands,
|
|
|
|
and are always approximate even when freshly updated.
|
|
|
|
</para>
|
|
|
|
|
|
|
|
<para>
|
|
|
|
Rather than look at <structname>pg_statistic</structname> directly,
|
|
|
|
it's better to look at its view <structname>pg_stats</structname>
|
|
|
|
when examining the statistics manually. <structname>pg_stats</structname>
|
|
|
|
is designed to be more easily readable. Furthermore,
|
|
|
|
<structname>pg_stats</structname> is readable by all, whereas
|
|
|
|
<structname>pg_statistic</structname> is only readable by the superuser.
|
|
|
|
(This prevents unprivileged users from learning something about
|
|
|
|
the contents of other people's tables from the statistics. The
|
|
|
|
<structname>pg_stats</structname> view is restricted to show only
|
|
|
|
rows about tables that the current user can read.)
|
|
|
|
For example, we might do:
|
|
|
|
|
|
|
|
<screen>
|
|
|
|
regression=# select attname, n_distinct, most_common_vals from pg_stats where tablename = 'road';
|
|
|
|
attname | n_distinct | most_common_vals
|
|
|
|
---------+------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|
|
|
|
name | -0.467008 | {"I- 580 Ramp","I- 880 Ramp","Sp Railroad ","I- 580 ","I- 680 Ramp","I- 80 Ramp","14th St ","5th St ","Mission Blvd","I- 880 "}
|
|
|
|
thepath | 20 | {"[(-122.089,37.71),(-122.0886,37.711)]"}
|
|
|
|
(2 rows)
|
|
|
|
regression=#
|
|
|
|
</screen>
|
|
|
|
|
|
|
|
As of <productname>Postgres</productname> 7.2 the following columns exist
|
|
|
|
in <structname>pg_stats</structname>:
|
|
|
|
</para>
|
|
|
|
|
|
|
|
<table>
|
|
|
|
<title>pg_stats Columns</title>
|
|
|
|
|
|
|
|
<tgroup cols=3>
|
|
|
|
<thead>
|
|
|
|
<row>
|
|
|
|
<entry>Name</entry>
|
|
|
|
<entry>Type</entry>
|
|
|
|
<entry>Description</entry>
|
|
|
|
</row>
|
|
|
|
</thead>
|
|
|
|
|
|
|
|
<tbody>
|
|
|
|
<row>
|
|
|
|
<entry>tablename</entry>
|
|
|
|
<entry><type>name</type></entry>
|
|
|
|
<entry>Name of table containing column</entry>
|
|
|
|
</row>
|
|
|
|
|
|
|
|
<row>
|
|
|
|
<entry>attname</entry>
|
|
|
|
<entry><type>name</type></entry>
|
|
|
|
<entry>Column described by this row</entry>
|
|
|
|
</row>
|
|
|
|
|
|
|
|
<row>
|
|
|
|
<entry>null_frac</entry>
|
|
|
|
<entry><type>real</type></entry>
|
|
|
|
<entry>Fraction of column's entries that are NULL</entry>
|
|
|
|
</row>
|
|
|
|
|
|
|
|
<row>
|
|
|
|
<entry>avg_width</entry>
|
|
|
|
<entry><type>integer</type></entry>
|
|
|
|
<entry>Average width in bytes of column's entries</entry>
|
|
|
|
</row>
|
|
|
|
|
|
|
|
<row>
|
|
|
|
<entry>n_distinct</entry>
|
|
|
|
<entry><type>real</type></entry>
|
|
|
|
<entry>If greater than zero, the estimated number of distinct values
|
|
|
|
in the column. If less than zero, the negative of the number of
|
|
|
|
distinct values divided by the number of rows. (The negated form
|
|
|
|
is used when ANALYZE believes that the number of distinct values
|
|
|
|
is likely to increase as the table grows; the positive form is used
|
|
|
|
when the column seems to have a fixed number of possible values.)
|
|
|
|
For example, -1 indicates a unique column in which the number of
|
|
|
|
distinct values is the same as the number of rows.
|
|
|
|
</entry>
|
|
|
|
</row>
|
|
|
|
|
|
|
|
<row>
|
|
|
|
<entry>most_common_vals</entry>
|
|
|
|
<entry><type>text[]</type></entry>
|
|
|
|
<entry>A list of the most common values in the column. (Omitted if
|
|
|
|
no values seem to be more common than any others.)</entry>
|
|
|
|
</row>
|
|
|
|
|
|
|
|
<row>
|
|
|
|
<entry>most_common_freqs</entry>
|
|
|
|
<entry><type>real[]</type></entry>
|
|
|
|
<entry>A list of the frequencies of the most common values,
|
|
|
|
ie, number of occurrences of each divided by total number of rows.
|
|
|
|
</entry>
|
|
|
|
</row>
|
|
|
|
|
|
|
|
<row>
|
|
|
|
<entry>histogram_bounds</entry>
|
|
|
|
<entry><type>text[]</type></entry>
|
|
|
|
<entry>A list of values that divide the column's values into
|
|
|
|
groups of approximately equal population. The
|
|
|
|
<structfield>most_common_vals</>, if present, are omitted from the
|
|
|
|
histogram calculation. (Omitted if column datatype does not have a
|
|
|
|
<literal><</> operator, or if the <structfield>most_common_vals</>
|
|
|
|
list accounts for the entire population.)
|
|
|
|
</entry>
|
|
|
|
</row>
|
|
|
|
|
|
|
|
<row>
|
|
|
|
<entry>correlation</entry>
|
|
|
|
<entry><type>real</type></entry>
|
|
|
|
<entry>Statistical correlation between physical row ordering and
|
|
|
|
logical ordering of the column values. This ranges from -1 to +1.
|
|
|
|
When the value is near -1 or +1, an indexscan on the column will
|
|
|
|
be estimated to be cheaper than when it is near zero, due to reduction
|
|
|
|
of random access to the disk. (Omitted if column datatype does
|
|
|
|
not have a <literal><</> operator.)
|
|
|
|
</entry>
|
|
|
|
</row>
|
|
|
|
</tbody>
|
|
|
|
</tgroup>
|
|
|
|
</table>
|
|
|
|
|
|
|
|
<para>
|
|
|
|
The maximum number of entries in the <structfield>most_common_vals</>
|
|
|
|
and <structfield>histogram_bounds</> arrays can be set on a
|
|
|
|
column-by-column basis using the <command>ALTER TABLE SET STATISTICS</>
|
|
|
|
command. The default limit is presently 10 entries. Raising the limit
|
|
|
|
may allow more accurate planner estimates to be made, particularly for
|
|
|
|
columns with irregular data distributions, at the price of consuming
|
|
|
|
more space in <structname>pg_statistic</structname> and slightly more
|
|
|
|
time to compute the estimates. Conversely, a lower limit may be
|
|
|
|
appropriate for columns with simple data distributions.
|
|
|
|
</para>
|
|
|
|
|
|
|
|
</sect1>
|
|
|
|
|
2000-12-16 03:29:36 +01:00
|
|
|
<sect1 id="explicit-joins">
|
|
|
|
<title>Controlling the Planner with Explicit JOINs</title>
|
|
|
|
|
|
|
|
<para>
|
|
|
|
Beginning with <productname>Postgres</productname> 7.1 it is possible
|
|
|
|
to control the query planner to some extent by using explicit JOIN
|
|
|
|
syntax. To see why this matters, we first need some background.
|
|
|
|
</para>
|
|
|
|
|
|
|
|
<para>
|
|
|
|
In a simple join query, such as
|
|
|
|
<programlisting>
|
|
|
|
SELECT * FROM a,b,c WHERE a.id = b.id AND b.ref = c.id;
|
|
|
|
</programlisting>
|
|
|
|
the planner is free to join the given tables in any order. For example,
|
|
|
|
it could generate a query plan that joins A to B, using the WHERE clause
|
|
|
|
a.id = b.id, and then joins C to this joined table, using the other
|
|
|
|
WHERE clause. Or it could join B to C and then join A to that result.
|
|
|
|
Or it could join A to C and then join them with B --- but that would
|
|
|
|
be inefficient, since the full Cartesian product of A and C would have
|
|
|
|
to be formed, there being no applicable WHERE clause to allow optimization
|
|
|
|
of the join.
|
|
|
|
(All joins in the <productname>Postgres</productname> executor happen
|
|
|
|
between two input tables, so it's necessary to build up the result in one
|
|
|
|
or another of these fashions.) The important point is that these different
|
|
|
|
join possibilities give semantically equivalent results but may have hugely
|
|
|
|
different execution costs. Therefore, the planner will explore all of them
|
|
|
|
to try to find the most efficient query plan.
|
|
|
|
</para>
|
|
|
|
|
|
|
|
<para>
|
|
|
|
When a query only involves two or three tables, there aren't many join
|
|
|
|
orders to worry about. But the number of possible join orders grows
|
|
|
|
exponentially as the number of tables expands. Beyond ten or so input
|
|
|
|
tables it's no longer practical to do an exhaustive search of all the
|
|
|
|
possibilities, and even for six or seven tables planning may take an
|
|
|
|
annoyingly long time. When there are too many input tables, the
|
|
|
|
<productname>Postgres</productname> planner will switch from exhaustive
|
|
|
|
search to a <firstterm>genetic</firstterm> probabilistic search
|
2001-09-09 19:21:59 +02:00
|
|
|
through a limited number of possibilities. (The switch-over threshold is
|
|
|
|
set by the <varname>GEQO_THRESHOLD</varname> run-time
|
2000-12-16 03:29:36 +01:00
|
|
|
parameter described in the <citetitle>Administrator's Guide</citetitle>.)
|
|
|
|
The genetic search takes less time, but it won't
|
|
|
|
necessarily find the best possible plan.
|
|
|
|
</para>
|
|
|
|
|
|
|
|
<para>
|
|
|
|
When the query involves outer joins, the planner has much less freedom
|
|
|
|
than it does for plain (inner) joins. For example, consider
|
|
|
|
<programlisting>
|
|
|
|
SELECT * FROM a LEFT JOIN (b JOIN c ON (b.ref = c.id)) ON (a.id = b.id);
|
|
|
|
</programlisting>
|
|
|
|
Although this query's restrictions are superficially similar to the
|
|
|
|
previous example, the semantics are different because a row must be
|
|
|
|
emitted for each row of A that has no matching row in the join of B and C.
|
|
|
|
Therefore the planner has no choice of join order here: it must join
|
|
|
|
B to C and then join A to that result. Accordingly, this query takes
|
|
|
|
less time to plan than the previous query.
|
|
|
|
</para>
|
|
|
|
|
|
|
|
<para>
|
|
|
|
In <productname>Postgres</productname> 7.1, the planner treats all
|
|
|
|
explicit JOIN syntaxes as constraining the join order, even though
|
|
|
|
it is not logically necessary to make such a constraint for inner
|
|
|
|
joins. Therefore, although all of these queries give the same result:
|
|
|
|
<programlisting>
|
|
|
|
SELECT * FROM a,b,c WHERE a.id = b.id AND b.ref = c.id;
|
|
|
|
SELECT * FROM a CROSS JOIN b CROSS JOIN c WHERE a.id = b.id AND b.ref = c.id;
|
|
|
|
SELECT * FROM a JOIN (b JOIN c ON (b.ref = c.id)) ON (a.id = b.id);
|
|
|
|
</programlisting>
|
|
|
|
the second and third take less time to plan than the first. This effect
|
|
|
|
is not worth worrying about for only three tables, but it can be a
|
|
|
|
lifesaver with many tables.
|
|
|
|
</para>
|
|
|
|
|
|
|
|
<para>
|
|
|
|
You do not need to constrain the join order completely in order to
|
|
|
|
cut search time, because it's OK to use JOIN operators in a plain
|
|
|
|
FROM list. For example,
|
|
|
|
<programlisting>
|
|
|
|
SELECT * FROM a CROSS JOIN b, c, d, e WHERE ...;
|
|
|
|
</programlisting>
|
|
|
|
forces the planner to join A to B before joining them to other tables,
|
|
|
|
but doesn't constrain its choices otherwise. In this example, the
|
|
|
|
number of possible join orders is reduced by a factor of 5.
|
|
|
|
</para>
|
|
|
|
|
|
|
|
<para>
|
|
|
|
If you have a mix of outer and inner joins in a complex query, you
|
|
|
|
might not want to constrain the planner's search for a good ordering
|
|
|
|
of inner joins inside an outer join. You can't do that directly in the
|
|
|
|
JOIN syntax, but you can get around the syntactic limitation by using
|
2001-02-19 01:24:30 +01:00
|
|
|
subselects. For example,
|
2000-12-16 03:29:36 +01:00
|
|
|
<programlisting>
|
2001-02-19 01:24:30 +01:00
|
|
|
SELECT * FROM d LEFT JOIN
|
|
|
|
(SELECT * FROM a, b, c WHERE ...) AS ss
|
|
|
|
ON (...);
|
2000-12-16 03:29:36 +01:00
|
|
|
</programlisting>
|
|
|
|
Here, joining D must be the last step in the query plan, but the
|
|
|
|
planner is free to consider various join orders for A,B,C.
|
|
|
|
</para>
|
|
|
|
|
|
|
|
<para>
|
|
|
|
Constraining the planner's search in this way is a useful technique
|
|
|
|
both for reducing planning time and for directing the planner to a
|
|
|
|
good query plan. If the planner chooses a bad join order by default,
|
|
|
|
you can force it to choose a better order via JOIN syntax --- assuming
|
|
|
|
that you know of a better order, that is. Experimentation is recommended.
|
|
|
|
</para>
|
|
|
|
</sect1>
|
|
|
|
|
|
|
|
<sect1 id="populate">
|
|
|
|
<title>Populating a Database</title>
|
|
|
|
|
|
|
|
<para>
|
|
|
|
One may need to do a large number of table insertions when first
|
|
|
|
populating a database. Here are some tips and techniques for making that as
|
|
|
|
efficient as possible.
|
|
|
|
</para>
|
|
|
|
|
|
|
|
<sect2 id="disable-autocommit">
|
|
|
|
<title>Disable Auto-commit</title>
|
|
|
|
|
|
|
|
<para>
|
|
|
|
Turn off auto-commit and just do one commit at
|
2001-06-22 20:53:36 +02:00
|
|
|
the end. (In plain SQL, this means issuing <command>BEGIN</command>
|
|
|
|
at the start and <command>COMMIT</command> at the end. Some client
|
|
|
|
libraries may do this behind your back, in which case you need to
|
|
|
|
make sure the library does it when you want it done.)
|
|
|
|
If you allow each insertion to be committed separately,
|
|
|
|
<productname>Postgres</productname> is doing a lot of work for each
|
|
|
|
record added.
|
2000-12-16 03:29:36 +01:00
|
|
|
</para>
|
|
|
|
</sect2>
|
|
|
|
|
|
|
|
<sect2 id="populate-copy-from">
|
|
|
|
<title>Use COPY FROM</title>
|
|
|
|
|
|
|
|
<para>
|
|
|
|
Use <command>COPY FROM STDIN</command> to load all the records in one
|
2001-06-22 20:53:36 +02:00
|
|
|
command, instead of using
|
|
|
|
a series of <command>INSERT</command> commands. This reduces parsing,
|
|
|
|
planning, etc
|
2000-12-16 03:29:36 +01:00
|
|
|
overhead a great deal. If you do this then it's not necessary to fool
|
2001-06-22 20:53:36 +02:00
|
|
|
around with auto-commit, since it's only one command anyway.
|
2000-12-16 03:29:36 +01:00
|
|
|
</para>
|
|
|
|
</sect2>
|
|
|
|
|
2001-05-17 23:50:18 +02:00
|
|
|
<sect2 id="populate-rm-indexes">
|
|
|
|
<title>Remove Indexes</title>
|
2000-12-16 03:29:36 +01:00
|
|
|
|
|
|
|
<para>
|
|
|
|
If you are loading a freshly created table, the fastest way is to
|
2001-06-22 20:53:36 +02:00
|
|
|
create the table, bulk-load with <command>COPY</command>, then create any
|
|
|
|
indexes needed
|
2000-12-16 03:29:36 +01:00
|
|
|
for the table. Creating an index on pre-existing data is quicker than
|
|
|
|
updating it incrementally as each record is loaded.
|
|
|
|
</para>
|
|
|
|
|
|
|
|
<para>
|
|
|
|
If you are augmenting an existing table, you can <command>DROP
|
2001-06-22 20:53:36 +02:00
|
|
|
INDEX</command>, load the table, then recreate the index. Of
|
2000-12-16 03:29:36 +01:00
|
|
|
course, the database performance for other users may be adversely
|
2001-06-22 20:53:36 +02:00
|
|
|
affected during the time that the index is missing. One should also
|
|
|
|
think twice before dropping UNIQUE indexes, since the error checking
|
|
|
|
afforded by the UNIQUE constraint will be lost while the index is missing.
|
|
|
|
</para>
|
|
|
|
</sect2>
|
|
|
|
|
|
|
|
<sect2 id="populate-analyze">
|
|
|
|
<title>ANALYZE Afterwards</title>
|
|
|
|
|
|
|
|
<para>
|
|
|
|
It's a good idea to run <command>ANALYZE</command> or <command>VACUUM
|
|
|
|
ANALYZE</command> anytime you've added or updated a lot of data,
|
|
|
|
including just after initially populating a table. This ensures that
|
|
|
|
the planner has up-to-date statistics about the table. With no statistics
|
|
|
|
or obsolete statistics, the planner may make poor choices of query plans,
|
|
|
|
leading to bad performance on queries that use your table.
|
2000-12-16 03:29:36 +01:00
|
|
|
</para>
|
|
|
|
</sect2>
|
|
|
|
</sect1>
|
|
|
|
|
|
|
|
</chapter>
|
|
|
|
|
|
|
|
<!-- Keep this comment at the end of the file
|
|
|
|
Local variables:
|
|
|
|
mode:sgml
|
|
|
|
sgml-omittag:nil
|
|
|
|
sgml-shorttag:t
|
|
|
|
sgml-minimize-attributes:nil
|
|
|
|
sgml-always-quote-attributes:t
|
|
|
|
sgml-indent-step:1
|
|
|
|
sgml-indent-data:t
|
|
|
|
sgml-parent-document:nil
|
|
|
|
sgml-default-dtd-file:"./reference.ced"
|
|
|
|
sgml-exposed-tags:nil
|
|
|
|
sgml-local-catalogs:("/usr/lib/sgml/catalog")
|
|
|
|
sgml-local-ecat-files:nil
|
|
|
|
End:
|
|
|
|
-->
|