<!--
doc/src/sgml/ref/analyze.sgml
PostgreSQL documentation
-->

<refentry id="SQL-ANALYZE">
 <indexterm zone="sql-analyze">
  <primary>ANALYZE</primary>
 </indexterm>

 <refmeta>
  <refentrytitle>ANALYZE</refentrytitle>
  <manvolnum>7</manvolnum>
  <refmiscinfo>SQL - Language Statements</refmiscinfo>
 </refmeta>

 <refnamediv>
  <refname>ANALYZE</refname>
  <refpurpose>collect statistics about a database</refpurpose>
 </refnamediv>

 <refsynopsisdiv>
<synopsis>
ANALYZE [ VERBOSE ] [ <replaceable class="PARAMETER">table_name</replaceable> [ ( <replaceable class="PARAMETER">column_name</replaceable> [, ...] ) ] ]
</synopsis>
 </refsynopsisdiv>

 <refsect1>
  <title>Description</title>

  <para>
   <command>ANALYZE</command> collects statistics about the contents
   of tables in the database, and stores the results in the <link
   linkend="catalog-pg-statistic"><structname>pg_statistic</></>
   system catalog.  Subsequently, the query planner uses these
   statistics to help determine the most efficient execution plans for
   queries.
  </para>

  <para>
   With no parameter, <command>ANALYZE</command> examines every table in the
   current database.  With a table name, <command>ANALYZE</command> examines
   only that table.  A list of column names can also be given,
   in which case statistics are collected only for those columns.
  </para>
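
  <para>
   As a brief illustration, the forms described above might be invoked
   as follows.  The table <literal>orders</literal> and its columns are
   hypothetical names used only for this sketch:
<programlisting>
-- analyze every table in the current database
ANALYZE;

-- analyze a single table and print progress messages
ANALYZE VERBOSE orders;

-- collect statistics for two columns only
ANALYZE orders (customer_id, order_date);
</programlisting>
  </para>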
 </refsect1>
|
|
|
|
|
|
|
|
<refsect1>
|
|
|
|
<title>Parameters</title>
|
|
|
|
|
|
|
|
<variablelist>
|
|
|
|
<varlistentry>
|
|
|
|
<term><literal>VERBOSE</literal></term>
|
|
|
|
<listitem>
|
|
|
|
<para>
|
|
|
|
Enables display of progress messages.
|
|
|
|
</para>
|
|
|
|
</listitem>
|
|
|
|
</varlistentry>
|
|
|
|
|
|
|
|
<varlistentry>
|
2012-06-22 00:06:14 +02:00
|
|
|
<term><replaceable class="PARAMETER">table_name</replaceable></term>
|
2003-04-15 15:25:08 +02:00
|
|
|
<listitem>
|
|
|
|
<para>
|
|
|
|
The name (possibly schema-qualified) of a specific table to
|
2012-04-06 21:02:35 +02:00
|
|
|
analyze. If omitted, all regular tables (but not foreign tables)
|
|
|
|
in the current database are analyzed.
|
2003-04-15 15:25:08 +02:00
|
|
|
</para>
|
|
|
|
</listitem>
|
|
|
|
</varlistentry>
|
|
|
|
|
|
|
|
<varlistentry>
|
2012-06-22 00:06:14 +02:00
|
|
|
<term><replaceable class="PARAMETER">column_name</replaceable></term>
|
2003-04-15 15:25:08 +02:00
|
|
|
<listitem>
|
|
|
|
<para>
|
|
|
|
The name of a specific column to analyze. Defaults to all columns.
|
|
|
|
</para>
|
|
|
|
</listitem>
|
|
|
|
</varlistentry>
|
|
|
|
</variablelist>
|
|
|
|
</refsect1>

 <refsect1>
  <title>Outputs</title>

  <para>
   When <literal>VERBOSE</> is specified, <command>ANALYZE</> emits
   progress messages to indicate which table is currently being
   processed.  Various statistics about the tables are printed as well.
  </para>
 </refsect1>

 <refsect1>
  <title>Notes</title>

  <para>
   Foreign tables are analyzed only when explicitly selected.  Not all
   foreign data wrappers support <command>ANALYZE</>.  If the table's
   wrapper does not support <command>ANALYZE</>, the command prints a
   warning and does nothing.
  </para>

  <para>
   In the default <productname>PostgreSQL</productname> configuration,
   the autovacuum daemon (see <xref linkend="autovacuum">)
   takes care of automatic analyzing of tables when they are first loaded
   with data, and as they change throughout regular operation.
   When autovacuum is disabled,
   it is a good idea to run <command>ANALYZE</command> periodically, or
   just after making major changes in the contents of a table.  Accurate
   statistics will help the planner to choose the most appropriate query
   plan, and thereby improve the speed of query processing.  A common
   strategy for read-mostly databases is to run <xref linkend="sql-vacuum">
   and <command>ANALYZE</command> once a day during a low-usage time of day.
   (This will not be sufficient if there is heavy update activity.)
  </para>
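
  <para>
   One way to check whether a table has been analyzed recently, either
   manually or by autovacuum, is to look at the
   <structfield>last_analyze</structfield> and
   <structfield>last_autoanalyze</structfield> columns of the
   <structname>pg_stat_user_tables</structname> view.  A minimal sketch,
   assuming a hypothetical table named <literal>orders</literal>:
<programlisting>
SELECT relname, last_analyze, last_autoanalyze
  FROM pg_stat_user_tables
 WHERE relname = 'orders';
</programlisting>
   A null value in both columns suggests that the table has not been
   analyzed since the statistics counters were last reset.
  </para>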

  <para>
   <command>ANALYZE</command>
   requires only a read lock on the target table, so it can run in
   parallel with other activity on the table.
  </para>

  <para>
   The statistics collected by <command>ANALYZE</command> usually
   include a list of some of the most common values in each column and
   a histogram showing the approximate data distribution in each
   column.  One or both of these can be omitted if
   <command>ANALYZE</command> deems them uninteresting (for example,
   in a unique-key column, there are no common values) or if the
   column data type does not support the appropriate operators.  There
   is more information about the statistics in <xref
   linkend="maintenance">.
  </para>
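
  <para>
   The collected statistics can be inspected through the
   <structname>pg_stats</structname> view, which presents the contents of
   <structname>pg_statistic</structname> in a more readable form.  A
   minimal sketch, assuming a hypothetical table
   <literal>orders</literal> in the <literal>public</literal> schema:
<programlisting>
SELECT attname, null_frac, n_distinct,
       most_common_vals, histogram_bounds
  FROM pg_stats
 WHERE schemaname = 'public' AND tablename = 'orders';
</programlisting>
   A null <structfield>most_common_vals</structfield> or
   <structfield>histogram_bounds</structfield> entry corresponds to the
   case described above, where that kind of statistic was omitted for the
   column.
  </para>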

  <para>
   For large tables, <command>ANALYZE</command> takes a random sample
   of the table contents, rather than examining every row.  This
   allows even very large tables to be analyzed in a small amount of
   time.  Note, however, that the statistics are only approximate, and
   will change slightly each time <command>ANALYZE</command> is run,
   even if the actual table contents did not change.  This might result
   in small changes in the planner's estimated costs shown by
   <xref linkend="sql-explain">.
   In rare situations, this non-determinism will cause the planner's
   choices of query plans to change after <command>ANALYZE</command> is run.
   To avoid this, raise the amount of statistics collected by
   <command>ANALYZE</command>, as described below.
  </para>

  <para>
   The extent of analysis can be controlled by adjusting the
   <xref linkend="guc-default-statistics-target"> configuration variable, or
   on a column-by-column basis by setting the per-column statistics
   target with <command>ALTER TABLE ... ALTER COLUMN ... SET
   STATISTICS</command> (see <xref linkend="sql-altertable">).
   The target value sets the
   maximum number of entries in the most-common-value list and the
   maximum number of bins in the histogram.  The default target value
   is 100, but this can be adjusted up or down to trade off accuracy of
   planner estimates against the time taken for
   <command>ANALYZE</command> and the amount of space occupied in
   <literal>pg_statistic</literal>.  In particular, setting the
   statistics target to zero disables collection of statistics for
   that column.  It might be useful to do that for columns that are
   never used as part of the <literal>WHERE</>, <literal>GROUP BY</>,
   or <literal>ORDER BY</> clauses of queries, since the planner will
   have no use for statistics on such columns.
  </para>
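
  <para>
   The following sketch shows how these settings might be applied.  The
   table <literal>orders</literal>, its columns, and the target values
   200 and 500 are hypothetical and chosen only for illustration:
<programlisting>
-- raise the default target for the current session, then re-analyze
SET default_statistics_target = 200;
ANALYZE orders;

-- raise the target for one frequently filtered column
ALTER TABLE orders ALTER COLUMN customer_id SET STATISTICS 500;

-- disable statistics collection for a column that never appears
-- in WHERE, GROUP BY, or ORDER BY
ALTER TABLE orders ALTER COLUMN notes SET STATISTICS 0;

-- per-column targets are used the next time the table is analyzed
ANALYZE orders;
</programlisting>
  </para>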

  <para>
   The largest statistics target among the columns being analyzed determines
   the number of table rows sampled to prepare the statistics.  Increasing
   the target causes a proportional increase in the time and space needed
   to do <command>ANALYZE</command>.
  </para>

  <para>
   One of the values estimated by <command>ANALYZE</command> is the number of
   distinct values that appear in each column.  Because only a subset of the
   rows is examined, this estimate can sometimes be quite inaccurate, even
   with the largest possible statistics target.  If this inaccuracy leads to
   bad query plans, a more accurate value can be determined manually and then
   installed with
   <command>ALTER TABLE ... ALTER COLUMN ... SET (n_distinct = ...)</>
   (see <xref linkend="sql-altertable">).
  </para>
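
  <para>
   A minimal sketch of such an override follows.  The table
   <literal>orders</literal>, the column
   <literal>customer_id</literal>, and the value 5000 are hypothetical;
   in practice the value would come from an exact count or from knowledge
   of the data:
<programlisting>
-- install a manually determined number of distinct values
ALTER TABLE orders ALTER COLUMN customer_id SET (n_distinct = 5000);

-- the override is applied the next time the table is analyzed
ANALYZE orders;

-- compare the stored value with the override
SELECT attname, n_distinct
  FROM pg_stats
 WHERE tablename = 'orders' AND attname = 'customer_id';
</programlisting>
  </para>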

  <para>
   If the table being analyzed has one or more children,
   <command>ANALYZE</command> will gather statistics twice: once on the
   rows of the parent table only, and a second time on the rows of the
   parent table with all of its children.  This second set of statistics
   is needed when planning queries that traverse the entire inheritance
   tree.  The autovacuum daemon, however, will only consider inserts or
   updates on the parent table itself when deciding whether to trigger an
   automatic analyze for that table.  If that table is rarely inserted into
   or updated, the inheritance statistics will not be up to date unless you
   run <command>ANALYZE</command> manually.
  </para>
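
  <para>
   As a sketch of the manual step, assuming a hypothetical parent table
   named <literal>measurement</literal> with child tables, the two sets
   of statistics can be refreshed and then told apart by the
   <structfield>inherited</structfield> column of the
   <structname>pg_stats</structname> view:
<programlisting>
-- refresh both the parent-only and the inheritance-tree statistics
ANALYZE measurement;

-- inherited = false: rows of the parent table only
-- inherited = true:  rows of the parent table and all its children
SELECT attname, inherited, n_distinct
  FROM pg_stats
 WHERE tablename = 'measurement'
 ORDER BY attname, inherited;
</programlisting>
  </para>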

  <para>
   If the table being analyzed is completely empty, <command>ANALYZE</command>
   will not record new statistics for that table.  Any existing statistics
   will be retained.
  </para>
 </refsect1>

 <refsect1>
  <title>Compatibility</title>

  <para>
   There is no <command>ANALYZE</command> statement in the SQL standard.
  </para>
 </refsect1>

 <refsect1>
  <title>See Also</title>

  <simplelist type="inline">
   <member><xref linkend="sql-vacuum"></member>
   <member><xref linkend="app-vacuumdb"></member>
   <member><xref linkend="runtime-config-resource-vacuum-cost"></member>
   <member><xref linkend="autovacuum"></member>
  </simplelist>
 </refsect1>
</refentry>