<!--
doc/src/sgml/ref/analyze.sgml
PostgreSQL documentation
-->

<refentry id="SQL-ANALYZE">
 <indexterm zone="sql-analyze">
  <primary>ANALYZE</primary>
 </indexterm>

 <refmeta>
  <refentrytitle>ANALYZE</refentrytitle>
  <manvolnum>7</manvolnum>
  <refmiscinfo>SQL - Language Statements</refmiscinfo>
 </refmeta>

 <refnamediv>
  <refname>ANALYZE</refname>
  <refpurpose>collect statistics about a database</refpurpose>
 </refnamediv>

 <refsynopsisdiv>
<synopsis>
ANALYZE [ VERBOSE ] [ <replaceable class="parameter">table_and_columns</replaceable> [, ...] ]

<phrase>where <replaceable class="parameter">table_and_columns</replaceable> is:</phrase>

    <replaceable class="parameter">table_name</replaceable> [ ( <replaceable class="parameter">column_name</replaceable> [, ...] ) ]
</synopsis>
 </refsynopsisdiv>

 <refsect1>
  <title>Description</title>

  <para>
   <command>ANALYZE</command> collects statistics about the contents
   of tables in the database, and stores the results in the <link
   linkend="catalog-pg-statistic"><structname>pg_statistic</structname></link>
   system catalog. Subsequently, the query planner uses these
   statistics to help determine the most efficient execution plans for
   queries.
  </para>

  <para>
   Without a <replaceable class="parameter">table_and_columns</replaceable>
   list, <command>ANALYZE</command> processes every table and materialized view
   in the current database that the current user has permission to analyze.
   With a list, <command>ANALYZE</command> processes only the listed tables.
   For each table, a list of column names can also be given, in which case
   statistics are collected only for those columns.
  </para>
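  <para>
   The forms of the command described above can be sketched as follows.
   The table names <literal>orders</literal> and <literal>customers</literal>
   and the column names are hypothetical.
<programlisting>
-- Analyze every table and materialized view the current user may analyze
ANALYZE;

-- Analyze one table, printing progress messages
ANALYZE VERBOSE orders;

-- Analyze two columns of one table plus all columns of another table
ANALYZE orders (customer_id, order_date), customers;
</programlisting>
  </para>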
 </refsect1>

 <refsect1>
  <title>Parameters</title>

  <variablelist>
   <varlistentry>
    <term><literal>VERBOSE</literal></term>
    <listitem>
     <para>
      Enables display of progress messages.
     </para>
    </listitem>
   </varlistentry>

   <varlistentry>
    <term><replaceable class="parameter">table_name</replaceable></term>
    <listitem>
     <para>
      The name (possibly schema-qualified) of a specific table to
      analyze. If omitted, all regular tables, partitioned tables, and
      materialized views in the current database are analyzed (but not
      foreign tables). If the specified table is a partitioned table, both the
      inheritance statistics of the partitioned table as a whole and
      statistics of the individual partitions are updated.
     </para>
    </listitem>
   </varlistentry>

   <varlistentry>
    <term><replaceable class="parameter">column_name</replaceable></term>
    <listitem>
     <para>
      The name of a specific column to analyze. Defaults to all columns.
     </para>
    </listitem>
   </varlistentry>
  </variablelist>
 </refsect1>

 <refsect1>
  <title>Outputs</title>

  <para>
   When <literal>VERBOSE</literal> is specified, <command>ANALYZE</command> emits
   progress messages to indicate which table is currently being
   processed. Various statistics about the tables are printed as well.
  </para>
 </refsect1>

 <refsect1>
  <title>Notes</title>

  <para>
   Foreign tables are analyzed only when explicitly selected. Not all
   foreign data wrappers support <command>ANALYZE</command>. If the table's
   wrapper does not support <command>ANALYZE</command>, the command prints a
   warning and does nothing.
  </para>

  <para>
   In the default <productname>PostgreSQL</productname> configuration,
   the autovacuum daemon (see <xref linkend="autovacuum">)
   automatically analyzes tables when they are first loaded
   with data, and as they change throughout regular operation.
   When autovacuum is disabled,
   it is a good idea to run <command>ANALYZE</command> periodically, or
   just after making major changes in the contents of a table. Accurate
   statistics will help the planner to choose the most appropriate query
   plan, and thereby improve the speed of query processing. A common
   strategy for read-mostly databases is to run <xref linkend="sql-vacuum">
   and <command>ANALYZE</command> once a day during a low-usage time of day.
   (This will not be sufficient if there is heavy update activity.)
  </para>
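  <para>
   To check when each table was last analyzed, whether manually or by
   autovacuum, one option is to query the
   <structname>pg_stat_user_tables</structname> statistics view. This is a
   minimal sketch that lists the tables analyzed longest ago first:
<programlisting>
-- GREATEST ignores NULLs, so a table analyzed only by autovacuum still sorts
-- by that time; tables never analyzed at all (NULL) are listed first
SELECT relname, last_analyze, last_autoanalyze
FROM pg_stat_user_tables
ORDER BY greatest(last_analyze, last_autoanalyze) NULLS FIRST;
</programlisting>
  </para>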

  <para>
   <command>ANALYZE</command>
   requires only a read lock on the target table, so it can run in
   parallel with other non-DDL activity on the table.
  </para>

  <para>
   The statistics collected by <command>ANALYZE</command> usually
   include a list of some of the most common values in each column and
   a histogram showing the approximate data distribution in each
   column. One or both of these can be omitted if
   <command>ANALYZE</command> deems them uninteresting (for example,
   in a unique-key column, there are no common values) or if the
   column data type does not support the appropriate operators. There
   is more information about the statistics in <xref
   linkend="maintenance">.
  </para>
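  <para>
   The collected statistics can be inspected through the
   <structname>pg_stats</structname> view, which presents the contents of
   <structname>pg_statistic</structname> in a more readable form. The
   following sketch assumes a hypothetical table named <literal>orders</literal>:
<programlisting>
-- Per-column statistics from the most recent ANALYZE of orders; the
-- most_common_vals or histogram_bounds column is NULL when that statistic was omitted
SELECT attname, null_frac, n_distinct, most_common_vals, histogram_bounds
FROM pg_stats
WHERE tablename = 'orders';
</programlisting>
  </para>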

  <para>
   For large tables, <command>ANALYZE</command> takes a random sample
   of the table contents, rather than examining every row. This
   allows even very large tables to be analyzed in a small amount of
   time. Note, however, that the statistics are only approximate, and
   will change slightly each time <command>ANALYZE</command> is run,
   even if the actual table contents did not change. This might result
   in small changes in the planner's estimated costs shown by
   <xref linkend="sql-explain">.
   In rare situations, this non-determinism will cause the planner's
   choices of query plans to change after <command>ANALYZE</command> is run.
   To make this less likely, raise the amount of statistics collected by
   <command>ANALYZE</command>, as described below.
  </para>

  <para>
   The extent of analysis can be controlled by adjusting the
   <xref linkend="guc-default-statistics-target"> configuration variable, or
   on a column-by-column basis by setting the per-column statistics
   target with <command>ALTER TABLE ... ALTER COLUMN ... SET
   STATISTICS</command> (see <xref linkend="sql-altertable">).
   The target value sets the
   maximum number of entries in the most-common-value list and the
   maximum number of bins in the histogram. The default target value
   is 100, but this can be adjusted up or down to trade off accuracy of
   planner estimates against the time taken for
   <command>ANALYZE</command> and the amount of space occupied in
   <literal>pg_statistic</literal>. In particular, setting the
   statistics target to zero disables collection of statistics for
   that column. It might be useful to do that for columns that are
   never used as part of the <literal>WHERE</literal>, <literal>GROUP BY</literal>,
   or <literal>ORDER BY</literal> clauses of queries, since the planner will
   have no use for statistics on such columns.
  </para>
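  <para>
   As an illustration, a session-level default and per-column overrides
   might be set as shown below. The table <literal>orders</literal>, its
   columns, and the values 200, 500 and 0 are all hypothetical choices:
<programlisting>
-- Raise the default target for ANALYZE commands run in this session
SET default_statistics_target = 200;

-- Keep more detailed statistics for a column with a skewed distribution
ALTER TABLE orders ALTER COLUMN customer_id SET STATISTICS 500;

-- Stop collecting statistics for a column never used in WHERE, GROUP BY, or ORDER BY
ALTER TABLE orders ALTER COLUMN internal_note SET STATISTICS 0;

-- The new targets take effect the next time the table is analyzed
ANALYZE orders;
</programlisting>
  </para>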

  <para>
   The largest statistics target among the columns being analyzed determines
   the number of table rows sampled to prepare the statistics. Increasing
   the target causes a proportional increase in the time and space needed
   to do <command>ANALYZE</command>.
  </para>
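  <para>
   As a rough guide, in this release the sample holds up to 300 rows per unit
   of the largest target. A target of 100 therefore samples up to 30,000 rows,
   and a target of 1000 samples up to 300,000 rows. The multiplier is an
   implementation detail rather than a documented guarantee. The per-column
   targets that feed into this calculation can be listed with a query such as
   the sketch below. Here <literal>orders</literal> is a hypothetical table, and
   a value of -1 means that the column uses
   <varname>default_statistics_target</varname>:
<programlisting>
-- User columns of orders and their per-column statistics targets
SELECT attname, attstattarget
FROM pg_attribute
WHERE attrelid = 'orders'::regclass
  AND attnum &gt; 0
  AND NOT attisdropped;
</programlisting>
  </para>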

  <para>
   One of the values estimated by <command>ANALYZE</command> is the number of
   distinct values that appear in each column. Because only a subset of the
   rows is examined, this estimate can sometimes be quite inaccurate, even
   with the largest possible statistics target. If this inaccuracy leads to
   bad query plans, a more accurate value can be determined manually and then
   installed with
   <command>ALTER TABLE ... ALTER COLUMN ... SET (n_distinct = ...)</command>
   (see <xref linkend="sql-altertable">).
  </para>
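  <para>
   A sketch of that workflow follows, assuming a hypothetical
   <literal>orders</literal> table. First count the distinct values exactly
   and compare the result with the estimate in <structname>pg_stats</structname>,
   then install a corrected figure. A positive <literal>n_distinct</literal> is
   an absolute count. A negative value is a fraction of the row count (here,
   5%), which keeps the figure valid as the table grows. The override is used
   from the next <command>ANALYZE</command> onward.
<programlisting>
-- Exact count, for comparison with the sampled estimate
SELECT count(DISTINCT customer_id) FROM orders;
SELECT n_distinct FROM pg_stats
WHERE tablename = 'orders' AND attname = 'customer_id';

-- Declare that distinct values are about 5% of the number of rows
ALTER TABLE orders ALTER COLUMN customer_id SET (n_distinct = -0.05);
ANALYZE orders;
</programlisting>
  </para>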

  <para>
   If the table being analyzed has one or more children,
   <command>ANALYZE</command> will gather statistics twice: once on the
   rows of the parent table only, and a second time on the rows of the
   parent table with all of its children. This second set of statistics
   is needed when planning queries that traverse the entire inheritance
   tree. The autovacuum daemon, however, will only consider inserts or
   updates on the parent table itself when deciding whether to trigger an
   automatic analyze for that table. If that table is rarely inserted into
   or updated, the inheritance statistics will not be up to date unless you
   run <command>ANALYZE</command> manually.
  </para>
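  <para>
   The two sets of statistics are distinguished by the
   <structfield>inherited</structfield> column of the
   <structname>pg_stats</structname> view. The following sketch assumes a
   hypothetical inheritance parent named <literal>measurement</literal> whose
   children receive most of the inserts:
<programlisting>
-- Refresh the whole-tree statistics that autovacuum might not trigger
ANALYZE measurement;

-- inherited = false: parent rows only; inherited = true: parent plus children
SELECT attname, inherited, n_distinct
FROM pg_stats
WHERE tablename = 'measurement'
ORDER BY attname, inherited;
</programlisting>
  </para>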

  <para>
   If any of the child tables are foreign tables whose foreign data wrappers
   do not support <command>ANALYZE</command>, those child tables are ignored while
   gathering inheritance statistics.
  </para>

  <para>
   If the table being analyzed is completely empty, <command>ANALYZE</command>
   will not record new statistics for that table. Any existing statistics
   will be retained.
  </para>
 </refsect1>

 <refsect1>
  <title>Compatibility</title>

  <para>
   There is no <command>ANALYZE</command> statement in the SQL standard.
  </para>
 </refsect1>

 <refsect1>
  <title>See Also</title>

  <simplelist type="inline">
   <member><xref linkend="sql-vacuum"></member>
   <member><xref linkend="app-vacuumdb"></member>
   <member><xref linkend="runtime-config-resource-vacuum-cost"></member>
   <member><xref linkend="autovacuum"></member>
  </simplelist>
 </refsect1>
</refentry>