docs: Rework sections on partition pruning/exclusion
Explain partition pruning more thoroughly, in a section above the one that explains constraint exclusion, since the new feature is the one that will be used more extensively from now on. Move some of the material from the constraint exclusion subsection to the one on partition pruning, so that we can explain the legacy method by explaining the differences with the new one instead of repeating it. Author: David Rowley, Álvaro Herrera Reviewed-by: Amit Langote, David G. Johnston, Justin Pryzby Discussion: https://postgr.es/m/CAKJS1f8PECxEi1YQ9nhVtshtfOMHUzAMm_Zp4gGCOCnMPjEKJA@mail.gmail.com
This commit is contained in:
parent
d9fcf7f5e3
commit
bebc46931a
|
@ -3859,6 +3859,7 @@ ANY <replaceable class="parameter">num_sync</replaceable> ( <replaceable class="
|
||||||
the planner's ability to generate query plans which allow the query
|
the planner's ability to generate query plans which allow the query
|
||||||
executor to remove (ignore) partitions during query execution. The
|
executor to remove (ignore) partitions during query execution. The
|
||||||
default is <literal>on</literal>.
|
default is <literal>on</literal>.
|
||||||
|
See <xref linkend="ddl-partition-pruning"/> for details.
|
||||||
</para>
|
</para>
|
||||||
</listitem>
|
</listitem>
|
||||||
</varlistentry>
|
</varlistentry>
|
||||||
|
|
|
@ -3759,6 +3759,156 @@ ANALYZE measurement;
|
||||||
</sect3>
|
</sect3>
|
||||||
</sect2>
|
</sect2>
|
||||||
|
|
||||||
|
<sect2 id="ddl-partition-pruning">
|
||||||
|
<title>Partition Pruning</title>
|
||||||
|
|
||||||
|
<indexterm>
|
||||||
|
<primary>partition pruning</primary>
|
||||||
|
</indexterm>
|
||||||
|
|
||||||
|
<para>
|
||||||
|
<firstterm>Partition pruning</firstterm> is a query optimization technique
|
||||||
|
that improves performance for partitioned tables. As an example:
|
||||||
|
|
||||||
|
<programlisting>
|
||||||
|
SET enable_partition_pruning = on; -- the default
|
||||||
|
SELECT count(*) FROM measurement WHERE logdate >= DATE '2008-01-01';
|
||||||
|
</programlisting>
|
||||||
|
|
||||||
|
Without partition pruning, the above query would scan each of the
|
||||||
|
partitions of the <structname>measurement</structname> table. With
|
||||||
|
partition pruning enabled, the planner will examine the definition
|
||||||
|
of each partition and prove that the partition need not
|
||||||
|
be scanned because it could not contain any rows meeting the query's
|
||||||
|
<literal>WHERE</literal> clause. When the planner can prove this, it
|
||||||
|
excludes (<firstterm>prunes</firstterm>) the partition from the query
|
||||||
|
plan.
|
||||||
|
</para>
|
||||||
|
|
||||||
|
<para>
|
||||||
|
You can use the <command>EXPLAIN</command> command to show the
|
||||||
|
difference between a plan whose partitions have been pruned from one
|
||||||
|
whose partitions haven't, by using the
|
||||||
|
<xref linkend="guc-enable-partition-pruning"/> configuration
|
||||||
|
parameter. A typical unoptimized plan for this type of table setup
|
||||||
|
is:
|
||||||
|
<programlisting>
|
||||||
|
SET enable_partition_pruning = off;
|
||||||
|
EXPLAIN SELECT count(*) FROM measurement WHERE logdate >= DATE '2008-01-01';
|
||||||
|
QUERY PLAN
|
||||||
|
───────────────────────────────────────────────────────────────────────────────────
|
||||||
|
Aggregate (cost=188.76..188.77 rows=1 width=8)
|
||||||
|
-> Append (cost=0.00..181.05 rows=3085 width=0)
|
||||||
|
-> Seq Scan on measurement_y2006m02 (cost=0.00..33.12 rows=617 width=0)
|
||||||
|
Filter: (logdate >= '2008-01-01'::date)
|
||||||
|
-> Seq Scan on measurement_y2006m03 (cost=0.00..33.12 rows=617 width=0)
|
||||||
|
Filter: (logdate >= '2008-01-01'::date)
|
||||||
|
...
|
||||||
|
-> Seq Scan on measurement_y2007m11 (cost=0.00..33.12 rows=617 width=0)
|
||||||
|
Filter: (logdate >= '2008-01-01'::date)
|
||||||
|
-> Seq Scan on measurement_y2007m12 (cost=0.00..33.12 rows=617 width=0)
|
||||||
|
Filter: (logdate >= '2008-01-01'::date)
|
||||||
|
-> Seq Scan on measurement_y2008m01 (cost=0.00..33.12 rows=617 width=0)
|
||||||
|
Filter: (logdate >= '2008-01-01'::date)
|
||||||
|
</programlisting>
|
||||||
|
|
||||||
|
Some or all of the partitions might use index scans instead of
|
||||||
|
full-table sequential scans, but the point here is that there
|
||||||
|
is no need to scan the older partitions at all to answer this query.
|
||||||
|
When we enable partition pruning, we get a significantly
|
||||||
|
cheaper plan that will deliver the same answer:
|
||||||
|
<programlisting>
|
||||||
|
SET enable_partition_pruning = on;
|
||||||
|
EXPLAIN SELECT count(*) FROM measurement WHERE logdate >= DATE '2008-01-01';
|
||||||
|
QUERY PLAN
|
||||||
|
───────────────────────────────────────────────────────────────────────────────────
|
||||||
|
Aggregate (cost=37.75..37.76 rows=1 width=8)
|
||||||
|
-> Append (cost=0.00..36.21 rows=617 width=0)
|
||||||
|
-> Seq Scan on measurement_y2008m01 (cost=0.00..33.12 rows=617 width=0)
|
||||||
|
Filter: (logdate >= '2008-01-01'::date)
|
||||||
|
</programlisting>
|
||||||
|
</para>
|
||||||
|
|
||||||
|
<para>
|
||||||
|
Note that partition pruning is driven only by the constraints defined
|
||||||
|
implicitly by the partition keys, not by the presence of indexes.
|
||||||
|
Therefore it isn't necessary to define indexes on the key columns.
|
||||||
|
Whether an index needs to be created for a given partition depends on
|
||||||
|
whether you expect that queries that scan the partition will
|
||||||
|
generally scan a large part of the partition or just a small part.
|
||||||
|
An index will be helpful in the latter case but not the former.
|
||||||
|
</para>
|
||||||
|
|
||||||
|
<para>
|
||||||
|
Partition pruning can be performed not only during the planning of a
|
||||||
|
given query, but also during its execution. This is useful as it can
|
||||||
|
allow more partitions to be pruned when clauses contain expressions
|
||||||
|
whose values are not known at query planning time; for example,
|
||||||
|
parameters defined in a <command>PREPARE</command> statement, using a
|
||||||
|
value obtained from a subquery or using a parameterized value on the
|
||||||
|
inner side of a nested loop join. Partition pruning during execution
|
||||||
|
can be performed at any of the following times:
|
||||||
|
|
||||||
|
<itemizedlist>
|
||||||
|
<listitem>
|
||||||
|
<para>
|
||||||
|
During initialization of the query plan. Partition pruning can be
|
||||||
|
performed here for parameter values which are known during the
|
||||||
|
initialization phase of execution. Partitions which are pruned
|
||||||
|
during this stage will not show up in the query's
|
||||||
|
<command>EXPLAIN</command> or <command>EXPLAIN ANALYZE</command>.
|
||||||
|
It is possible to determine the number of partitions which were
|
||||||
|
removed during this phase by observing the
|
||||||
|
<quote>Subplans Removed</quote> property in the
|
||||||
|
<command>EXPLAIN</command> output.
|
||||||
|
</para>
|
||||||
|
</listitem>
|
||||||
|
|
||||||
|
<listitem>
|
||||||
|
<para>
|
||||||
|
During actual execution of the query plan. Partition pruning may
|
||||||
|
also be performed here to remove partitions using values which are
|
||||||
|
only known during actual query execution. This includes values
|
||||||
|
from subqueries and values from execution-time parameters such as
|
||||||
|
those from parameterized nested loop joins. Since the value of
|
||||||
|
these parameters may change many times during the execution of the
|
||||||
|
query, partition pruning is performed whenever one of the
|
||||||
|
execution parameters being used by partition pruning changes.
|
||||||
|
Determining if partitions were pruned during this phase requires
|
||||||
|
careful inspection of the <literal>nloops</literal> property in
|
||||||
|
the <command>EXPLAIN ANALYZE</command> output.
|
||||||
|
</para>
|
||||||
|
</listitem>
|
||||||
|
</itemizedlist>
|
||||||
|
</para>
|
||||||
|
|
||||||
|
<para>
|
||||||
|
Partition pruning can be disabled using the
|
||||||
|
<xref linkend="guc-enable-partition-pruning"/> setting.
|
||||||
|
</para>
|
||||||
|
|
||||||
|
<note>
|
||||||
|
<para>
|
||||||
|
Currently, pruning of partitions during the planning of an
|
||||||
|
<command>UPDATE</command> or <command>DELETE</command> command is
|
||||||
|
implemented using the constraint exclusion method (however, it is
|
||||||
|
still ruled by the <literal>enable_partition_pruning</literal>
|
||||||
|
setting instead of <literal>constraint_exclusion</literal>) —
|
||||||
|
see the next section for details and caveats that apply.
|
||||||
|
</para>
|
||||||
|
|
||||||
|
<para>
|
||||||
|
Also, execution-time partition pruning currently only occurs for the
|
||||||
|
<literal>Append</literal> node type, not <literal>MergeAppend</literal>.
|
||||||
|
</para>
|
||||||
|
|
||||||
|
<para>
|
||||||
|
Both of these behaviors are likely to be changed in a future release
|
||||||
|
of <productname>PostgreSQL</productname>.
|
||||||
|
</para>
|
||||||
|
</note>
|
||||||
|
</sect2>
|
||||||
|
|
||||||
<sect2 id="ddl-partitioning-constraint-exclusion">
|
<sect2 id="ddl-partitioning-constraint-exclusion">
|
||||||
<title>Partitioning and Constraint Exclusion</title>
|
<title>Partitioning and Constraint Exclusion</title>
|
||||||
|
|
||||||
|
@ -3767,97 +3917,53 @@ ANALYZE measurement;
|
||||||
</indexterm>
|
</indexterm>
|
||||||
|
|
||||||
<para>
|
<para>
|
||||||
<firstterm>Constraint exclusion</firstterm> is a query optimization technique
|
<firstterm>Constraint exclusion</firstterm> is a query optimization
|
||||||
that improves performance for partitioned tables defined in the
|
technique similar to partition pruning. While it is primarily used
|
||||||
fashion described above (both declaratively partitioned tables and those
|
for partitioned tables using the legacy inheritance method, it can be
|
||||||
implemented using inheritance). As an example:
|
used for other purposes, including with declarative partitioning.
|
||||||
|
|
||||||
<programlisting>
|
|
||||||
SET constraint_exclusion = on;
|
|
||||||
SELECT count(*) FROM measurement WHERE logdate >= DATE '2008-01-01';
|
|
||||||
</programlisting>
|
|
||||||
|
|
||||||
Without constraint exclusion, the above query would scan each of
|
|
||||||
the partitions of the <structname>measurement</structname> table. With constraint
|
|
||||||
exclusion enabled, the planner will examine the constraints of each
|
|
||||||
partition and try to prove that the partition need not
|
|
||||||
be scanned because it could not contain any rows meeting the query's
|
|
||||||
<literal>WHERE</literal> clause. When the planner can prove this, it
|
|
||||||
excludes the partition from the query plan.
|
|
||||||
</para>
|
</para>
|
||||||
|
|
||||||
<para>
|
<para>
|
||||||
You can use the <command>EXPLAIN</command> command to show the difference
|
Constraint exclusion works in a very similar way to partition
|
||||||
between a plan with <varname>constraint_exclusion</varname> on and a plan
|
pruning, except that it uses each table's <literal>CHECK</literal>
|
||||||
with it off. A typical unoptimized plan for this type of table setup is:
|
constraints — which gives it its name — whereas partition
|
||||||
|
pruning uses the table's partitioning constraint, which exists only in
|
||||||
<programlisting>
|
the case of declarative partitioning. Another difference is that it
|
||||||
SET constraint_exclusion = off;
|
is only applied at plan time; there is no attempt to remove
|
||||||
EXPLAIN SELECT count(*) FROM measurement WHERE logdate >= DATE '2008-01-01';
|
partitions at execution time.
|
||||||
|
|
||||||
QUERY PLAN
|
|
||||||
-----------------------------------------------------------------------------------------------
|
|
||||||
Aggregate (cost=158.66..158.68 rows=1 width=0)
|
|
||||||
-> Append (cost=0.00..151.88 rows=2715 width=0)
|
|
||||||
-> Seq Scan on measurement (cost=0.00..30.38 rows=543 width=0)
|
|
||||||
Filter: (logdate >= '2008-01-01'::date)
|
|
||||||
-> Seq Scan on measurement_y2006m02 measurement (cost=0.00..30.38 rows=543 width=0)
|
|
||||||
Filter: (logdate >= '2008-01-01'::date)
|
|
||||||
-> Seq Scan on measurement_y2006m03 measurement (cost=0.00..30.38 rows=543 width=0)
|
|
||||||
Filter: (logdate >= '2008-01-01'::date)
|
|
||||||
...
|
|
||||||
-> Seq Scan on measurement_y2007m12 measurement (cost=0.00..30.38 rows=543 width=0)
|
|
||||||
Filter: (logdate >= '2008-01-01'::date)
|
|
||||||
-> Seq Scan on measurement_y2008m01 measurement (cost=0.00..30.38 rows=543 width=0)
|
|
||||||
Filter: (logdate >= '2008-01-01'::date)
|
|
||||||
</programlisting>
|
|
||||||
|
|
||||||
Some or all of the partitions might use index scans instead of
|
|
||||||
full-table sequential scans, but the point here is that there
|
|
||||||
is no need to scan the older partitions at all to answer this query.
|
|
||||||
When we enable constraint exclusion, we get a significantly
|
|
||||||
cheaper plan that will deliver the same answer:
|
|
||||||
|
|
||||||
<programlisting>
|
|
||||||
SET constraint_exclusion = on;
|
|
||||||
EXPLAIN SELECT count(*) FROM measurement WHERE logdate >= DATE '2008-01-01';
|
|
||||||
QUERY PLAN
|
|
||||||
-----------------------------------------------------------------------------------------------
|
|
||||||
Aggregate (cost=63.47..63.48 rows=1 width=0)
|
|
||||||
-> Append (cost=0.00..60.75 rows=1086 width=0)
|
|
||||||
-> Seq Scan on measurement (cost=0.00..30.38 rows=543 width=0)
|
|
||||||
Filter: (logdate >= '2008-01-01'::date)
|
|
||||||
-> Seq Scan on measurement_y2008m01 measurement (cost=0.00..30.38 rows=543 width=0)
|
|
||||||
Filter: (logdate >= '2008-01-01'::date)
|
|
||||||
</programlisting>
|
|
||||||
</para>
|
</para>
|
||||||
|
|
||||||
<para>
|
<para>
|
||||||
Note that constraint exclusion is driven only by <literal>CHECK</literal>
|
The fact that constraint exclusion uses <literal>CHECK</literal>
|
||||||
constraints, not by the presence of indexes. Therefore it isn't
|
constraints, which makes it slow compared to partition pruning, can
|
||||||
necessary to define indexes on the key columns. Whether an index
|
sometimes be used as an advantage: because constraints can be defined
|
||||||
needs to be created for a given partition depends on whether you
|
even on declaratively-partitioned tables, in addition to the internal
|
||||||
expect that queries that scan the partition will generally scan
|
partitioning constraints, and only constraint exclusion would be able
|
||||||
a large part of the partition or just a small part. An index will
|
to elide certain partitions from the query plan using those.
|
||||||
be helpful in the latter case but not the former.
|
|
||||||
</para>
|
</para>
|
||||||
|
|
||||||
<para>
|
<para>
|
||||||
The default (and recommended) setting of
|
The default (and recommended) setting of
|
||||||
<xref linkend="guc-constraint-exclusion"/> is actually neither
|
<xref linkend="guc-constraint-exclusion"/> is neither
|
||||||
<literal>on</literal> nor <literal>off</literal>, but an intermediate setting
|
<literal>on</literal> nor <literal>off</literal>, but an intermediate setting
|
||||||
called <literal>partition</literal>, which causes the technique to be
|
called <literal>partition</literal>, which causes the technique to be
|
||||||
applied only to queries that are likely to be working on partitioned
|
applied only to queries that are likely to be working on inheritance partitioned
|
||||||
tables. The <literal>on</literal> setting causes the planner to examine
|
tables. The <literal>on</literal> setting causes the planner to examine
|
||||||
<literal>CHECK</literal> constraints in all queries, even simple ones that
|
<literal>CHECK</literal> constraints in all queries, even simple ones that
|
||||||
are unlikely to benefit.
|
are unlikely to benefit.
|
||||||
</para>
|
</para>
|
||||||
|
|
||||||
<para>
|
<para>
|
||||||
The following caveats apply to constraint exclusion, which is used by
|
The following caveats apply to constraint exclusion:
|
||||||
both inheritance and partitioned tables:
|
|
||||||
|
|
||||||
<itemizedlist>
|
<itemizedlist>
|
||||||
|
<listitem>
|
||||||
|
<para>
|
||||||
|
Constraint exclusion is only applied during query planning; it is
|
||||||
|
not applied at execution time like partition pruning does.
|
||||||
|
</para>
|
||||||
|
</listitem>
|
||||||
|
|
||||||
<listitem>
|
<listitem>
|
||||||
<para>
|
<para>
|
||||||
Constraint exclusion only works when the query's <literal>WHERE</literal>
|
Constraint exclusion only works when the query's <literal>WHERE</literal>
|
||||||
|
@ -3877,11 +3983,8 @@ EXPLAIN SELECT count(*) FROM measurement WHERE logdate >= DATE '2008-01-01';
|
||||||
range tests for range partitioning, as illustrated in the preceding
|
range tests for range partitioning, as illustrated in the preceding
|
||||||
examples. A good rule of thumb is that partitioning constraints should
|
examples. A good rule of thumb is that partitioning constraints should
|
||||||
contain only comparisons of the partitioning column(s) to constants
|
contain only comparisons of the partitioning column(s) to constants
|
||||||
using B-tree-indexable operators, which applies even to partitioned
|
using B-tree-indexable operators, because only B-tree-indexable
|
||||||
tables, because only B-tree-indexable column(s) are allowed in the
|
column(s) are allowed in the partition key.
|
||||||
partition key. (This is not a problem when using declarative
|
|
||||||
partitioning, since the automatically generated constraints are simple
|
|
||||||
enough to be understood by the planner.)
|
|
||||||
</para>
|
</para>
|
||||||
</listitem>
|
</listitem>
|
||||||
|
|
||||||
|
@ -3889,9 +3992,9 @@ EXPLAIN SELECT count(*) FROM measurement WHERE logdate >= DATE '2008-01-01';
|
||||||
<para>
|
<para>
|
||||||
All constraints on all partitions of the master table are examined
|
All constraints on all partitions of the master table are examined
|
||||||
during constraint exclusion, so large numbers of partitions are likely
|
during constraint exclusion, so large numbers of partitions are likely
|
||||||
to increase query planning time considerably. Partitioning using
|
to increase query planning time considerably. So the legacy
|
||||||
these techniques will work well with up to perhaps a hundred partitions;
|
inheritance based partitioning will work well with up to perhaps a
|
||||||
don't try to use many thousands of partitions.
|
hundred partitions; don't try to use many thousands of partitions.
|
||||||
</para>
|
</para>
|
||||||
</listitem>
|
</listitem>
|
||||||
|
|
||||||
|
|
Loading…
Reference in New Issue