postgresql/doc/src/sgml/pgstatstatements.sgml

458 lines
16 KiB
Plaintext
Raw Normal View History

2010-09-20 22:08:53 +02:00
<!-- doc/src/sgml/pgstatstatements.sgml -->
<sect1 id="pgstatstatements" xreflabel="pg_stat_statements">
<title>pg_stat_statements</title>
<indexterm zone="pgstatstatements">
<primary>pg_stat_statements</primary>
</indexterm>
<para>
The <filename>pg_stat_statements</filename> module provides a means for
tracking execution statistics of all SQL statements executed by a server.
</para>
<para>
The module must be loaded by adding <literal>pg_stat_statements</> to
<xref linkend="guc-shared-preload-libraries"> in
<filename>postgresql.conf</>, because it requires additional shared memory.
This means that a server restart is needed to add or remove the module.
</para>
<sect2>
<title>The <structname>pg_stat_statements</structname> View</title>
<para>
The statistics gathered by the module are made available via a
system view named <structname>pg_stat_statements</>. This view
contains one row for each distinct database ID, user ID and query
ID (up to the maximum number of distinct statements that the module
can track). The columns of the view are shown in
<xref linkend="pgstatstatements-columns">.
</para>
<table id="pgstatstatements-columns">
<title><structname>pg_stat_statements</> Columns</title>
<tgroup cols="4">
<thead>
<row>
<entry>Name</entry>
<entry>Type</entry>
<entry>References</entry>
<entry>Description</entry>
</row>
</thead>
<tbody>
<row>
<entry><structfield>userid</structfield></entry>
<entry><type>oid</type></entry>
<entry><literal><link linkend="catalog-pg-authid"><structname>pg_authid</structname></link>.oid</literal></entry>
<entry>OID of user who executed the statement</entry>
</row>
<row>
<entry><structfield>dbid</structfield></entry>
<entry><type>oid</type></entry>
<entry><literal><link linkend="catalog-pg-database"><structname>pg_database</structname></link>.oid</literal></entry>
<entry>OID of database in which the statement was executed</entry>
</row>
<row>
<entry><structfield>queryid</structfield></entry>
<entry><type>bigint</type></entry>
<entry></entry>
<entry>Internal hash identifier, computed from the entry's post-parse-analysis tree</entry>
</row>
<row>
<entry><structfield>query</structfield></entry>
<entry><type>text</type></entry>
<entry></entry>
<entry>Text of a representative statement (up to <xref linkend="guc-track-activity-query-size"> bytes)</entry>
</row>
<row>
<entry><structfield>calls</structfield></entry>
<entry><type>bigint</type></entry>
<entry></entry>
<entry>Number of times executed</entry>
</row>
<row>
<entry><structfield>total_time</structfield></entry>
<entry><type>double precision</type></entry>
<entry></entry>
<entry>Total time spent in the statement, in milliseconds</entry>
</row>
<row>
<entry><structfield>rows</structfield></entry>
<entry><type>bigint</type></entry>
<entry></entry>
<entry>Total number of rows retrieved or affected by the statement</entry>
</row>
<row>
<entry><structfield>shared_blks_hit</structfield></entry>
<entry><type>bigint</type></entry>
<entry></entry>
<entry>Total number of shared block cache hits by the statement</entry>
</row>
<row>
<entry><structfield>shared_blks_read</structfield></entry>
<entry><type>bigint</type></entry>
<entry></entry>
<entry>Total number of shared blocks read by the statement</entry>
</row>
<row>
<entry><structfield>shared_blks_dirtied</structfield></entry>
<entry><type>bigint</type></entry>
<entry></entry>
<entry>Total number of shared blocks dirtied by the statement</entry>
</row>
<row>
<entry><structfield>shared_blks_written</structfield></entry>
<entry><type>bigint</type></entry>
<entry></entry>
<entry>Total number of shared blocks written by the statement</entry>
</row>
<row>
<entry><structfield>local_blks_hit</structfield></entry>
<entry><type>bigint</type></entry>
<entry></entry>
<entry>Total number of local block cache hits by the statement</entry>
</row>
<row>
<entry><structfield>local_blks_read</structfield></entry>
<entry><type>bigint</type></entry>
<entry></entry>
<entry>Total number of local blocks read by the statement</entry>
</row>
<row>
<entry><structfield>local_blks_dirtied</structfield></entry>
<entry><type>bigint</type></entry>
<entry></entry>
<entry>Total number of local blocks dirtied by the statement</entry>
</row>
<row>
<entry><structfield>local_blks_written</structfield></entry>
<entry><type>bigint</type></entry>
<entry></entry>
<entry>Total number of local blocks written by the statement</entry>
</row>
<row>
<entry><structfield>temp_blks_read</structfield></entry>
<entry><type>bigint</type></entry>
<entry></entry>
<entry>Total number of temp blocks read by the statement</entry>
</row>
<row>
<entry><structfield>temp_blks_written</structfield></entry>
<entry><type>bigint</type></entry>
<entry></entry>
<entry>Total number of temp blocks written by the statement</entry>
</row>
<row>
<entry><structfield>blk_read_time</structfield></entry>
<entry><type>double precision</type></entry>
<entry></entry>
<entry>
Total time the statement spent reading blocks, in milliseconds
(if <xref linkend="guc-track-io-timing"> is enabled, otherwise zero)
</entry>
</row>
<row>
<entry><structfield>blk_write_time</structfield></entry>
<entry><type>double precision</type></entry>
<entry></entry>
<entry>
Total time the statement spent writing blocks, in milliseconds
(if <xref linkend="guc-track-io-timing"> is enabled, otherwise zero)
</entry>
</row>
</tbody>
</tgroup>
</table>
<para>
This view, and the function <function>pg_stat_statements_reset</>,
are available only in databases they have been specifically installed into
by installing the <literal>pg_stat_statements</> extension.
However, statistics are tracked across all databases of the server
whenever the <filename>pg_stat_statements</filename> module is loaded
into the server, regardless of presence of the view.
</para>
<para>
For security reasons, non-superusers are not allowed to see the SQL
text or queryid of queries executed by other users. They can see
the statistics, however, if the view has been installed in their
database.
</para>
<para>
Plannable queries (that is, <command>SELECT</>, <command>INSERT</>,
<command>UPDATE</>, and <command>DELETE</>) are combined into a single
<structname>pg_stat_statements</> entry whenever they have identical query
structures according to an internal hash calculation. Typically, two
queries will be considered the same for this purpose if they are
semantically equivalent except for the values of literal constants
appearing in the query. Utility commands (that is, all other commands)
are compared strictly on the basis of their textual query strings, however.
</para>
<para>
When a constant's value has been ignored for purposes of matching the
query to other queries, the constant is replaced by <literal>?</literal>
in the <structname>pg_stat_statements</> display. The rest of the query
text is that of the first query that had the particular
<structfield>queryid</> hash value associated with the
<structname>pg_stat_statements</> entry.
</para>
<para>
In some cases, queries with visibly different texts might get merged into a
single <structname>pg_stat_statements</> entry. Normally this will happen
only for semantically equivalent queries, but there is a small chance of
hash collisions causing unrelated queries to be merged into one entry.
(This cannot happen for queries belonging to different users or databases,
however.)
</para>
<para>
Since the <structfield>queryid</> hash value is computed on the
post-parse-analysis representation of the queries, the opposite is
also possible: queries with identical texts might appear as
separate entries, if they have different meanings as a result of
factors such as different <varname>search_path</> settings.
</para>
<para>
Consumers of <literal>pg_stat_statements</> may wish to use
<structfield>queryid</> (perhaps in composite with
<structfield>dbid</> and <structfield>userid</>) as a more stable
and reliable identifier for each entry than its query text.
However, it is important to understand that there are only limited
guarantees around the stability of the <structfield>queryid</> hash
value. Since the identifier is derived from the
post-parse-analysis tree, its value is a function of, among other
things, the internal identifiers that comprise this representation.
This has some counterintuitive implications. For example, a query
against a table that is fingerprinted by
<literal>pg_stat_statements</> will appear distinct to a
subsequently executed query that a reasonable observer might judge
to be a non-distinct, if in the interim the table was dropped and
re-created. The hashing process is sensitive to difference in
machine architecture and other facets of the platform.
Furthermore, it is not safe to assume that <structfield>queryid</>
will be stable across major versions of <productname>PostgreSQL</>.
</para>
<para>
As a rule of thumb, an assumption of the stability or comparability
of <structfield>queryid</> values should be predicated on the
underlying catalog metadata and hash function implementation
details exactly matching. Any two servers participating in any
variety of replication based on physical WAL-replay can be expected
to have identical <structfield>queryid</> values for the same query.
Logical replication schemes do not have replicas comparable in all
relevant regards, and so <structfield>queryid</> will not be a
useful identifier for accumulating costs for the entire replica
set. If in doubt, direct testing is recommended.
</para>
</sect2>
<sect2>
<title>Functions</title>
<variablelist>
<varlistentry>
<indexterm>
<primary>pg_stat_statements_reset</primary>
</indexterm>
<term>
<function>pg_stat_statements_reset() returns void</function>
</term>
<listitem>
<para>
<function>pg_stat_statements_reset</function> discards all statistics
gathered so far by <filename>pg_stat_statements</>.
By default, this function can only be executed by superusers.
</para>
</listitem>
</varlistentry>
</variablelist>
</sect2>
<sect2>
<title>Configuration Parameters</title>
<variablelist>
<varlistentry>
<term>
<varname>pg_stat_statements.max</varname> (<type>integer</type>)
</term>
<listitem>
<para>
<varname>pg_stat_statements.max</varname> is the maximum number of
statements tracked by the module (i.e., the maximum number of rows
in the <structname>pg_stat_statements</> view). If more distinct
statements than that are observed, information about the least-executed
statements is discarded.
The default value is 1000.
This parameter can only be set at server start.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term>
<varname>pg_stat_statements.track</varname> (<type>enum</type>)
</term>
<listitem>
<para>
<varname>pg_stat_statements.track</varname> controls which statements
are counted by the module.
Specify <literal>top</> to track top-level statements (those issued
directly by clients), <literal>all</> to also track nested statements
(such as statements invoked within functions), or <literal>none</> to
disable statement statistics collection.
The default value is <literal>top</>.
Only superusers can change this setting.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term>
<varname>pg_stat_statements.track_utility</varname> (<type>boolean</type>)
</term>
<listitem>
<para>
<varname>pg_stat_statements.track_utility</varname> controls whether
utility commands are tracked by the module. Utility commands are
all those other than <command>SELECT</>, <command>INSERT</>,
<command>UPDATE</> and <command>DELETE</>.
The default value is <literal>on</>.
Only superusers can change this setting.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term>
<varname>pg_stat_statements.save</varname> (<type>boolean</type>)
</term>
<listitem>
<para>
<varname>pg_stat_statements.save</varname> specifies whether to
save statement statistics across server shutdowns.
If it is <literal>off</> then statistics are not saved at
shutdown nor reloaded at server start.
The default value is <literal>on</>.
This parameter can only be set in the <filename>postgresql.conf</>
file or on the server command line.
</para>
</listitem>
</varlistentry>
</variablelist>
<para>
The module requires additional shared memory amounting to about
<varname>pg_stat_statements.max</varname> <literal>*</>
<xref linkend="guc-track-activity-query-size"> bytes. Note that this
memory is consumed whenever the module is loaded, even if
<varname>pg_stat_statements.track</> is set to <literal>none</>.
</para>
<para>
These parameters must be set in <filename>postgresql.conf</>.
Typical usage might be:
<programlisting>
# postgresql.conf
shared_preload_libraries = 'pg_stat_statements'
pg_stat_statements.max = 10000
pg_stat_statements.track = all
</programlisting>
</para>
</sect2>
<sect2>
<title>Sample Output</title>
<screen>
bench=# SELECT pg_stat_statements_reset();
$ pgbench -i bench
$ pgbench -c10 -t300 bench
bench=# \x
bench=# SELECT query, calls, total_time, rows, 100.0 * shared_blks_hit /
nullif(shared_blks_hit + shared_blks_read, 0) AS hit_percent
FROM pg_stat_statements ORDER BY total_time DESC LIMIT 5;
-[ RECORD 1 ]---------------------------------------------------------------------
query | UPDATE pgbench_branches SET bbalance = bbalance + ? WHERE bid = ?;
calls | 3000
total_time | 9609.00100000002
rows | 2836
hit_percent | 99.9778970000200936
-[ RECORD 2 ]---------------------------------------------------------------------
query | UPDATE pgbench_tellers SET tbalance = tbalance + ? WHERE tid = ?;
calls | 3000
total_time | 8015.156
rows | 2990
hit_percent | 99.9731126579631345
-[ RECORD 3 ]---------------------------------------------------------------------
query | copy pgbench_accounts from stdin
calls | 1
total_time | 310.624
rows | 100000
hit_percent | 0.30395136778115501520
-[ RECORD 4 ]---------------------------------------------------------------------
query | UPDATE pgbench_accounts SET abalance = abalance + ? WHERE aid = ?;
calls | 3000
total_time | 271.741999999997
rows | 3000
hit_percent | 93.7968855088209426
-[ RECORD 5 ]---------------------------------------------------------------------
query | alter table pgbench_accounts add primary key (aid)
calls | 1
total_time | 81.42
rows | 0
hit_percent | 34.4947735191637631
</screen>
</sect2>
<sect2>
<title>Authors</title>
<para>
Takahiro Itagaki <email>itagaki.takahiro@oss.ntt.co.jp</email>.
Query normalization added by Peter Geoghegan <email>peter@2ndquadrant.com</email>.
</para>
</sect2>
</sect1>