docs: Rework sections on partition pruning/exclusion

Explain partition pruning more thoroughly, in a section above the one that explains constraint exclusion, since the new feature is the one that will be used more extensively from now on. Move some of the material from the constraint exclusion subsection to the one on partition pruning, so that we can explain the legacy method by explaining the differences with the new one instead of repeating it. Author: David Rowley, Álvaro Herrera Reviewed-by: Amit Langote, David G. Johnston, Justin Pryzby Discussion: https://postgr.es/m/CAKJS1f8PECxEi1YQ9nhVtshtfOMHUzAMm_Zp4gGCOCnMPjEKJA@mail.gmail.com
2018-05-12 12:08:17 -03:00 · 2018-05-12 12:08:17 -03:00 · bebc46931a
parent d9fcf7f5e3
commit bebc46931a
2 changed files with 184 additions and 80 deletions
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@ -3859,6 +3859,7 @@ ANY <replaceable class="parameter">num_sync</replaceable> ( <replaceable class="
        the planner's ability to generate query plans which allow the query
        executor to remove (ignore) partitions during query execution.  The
        default is <literal>on</literal>.
+        See <xref linkend="ddl-partition-pruning"/> for details.
       </para>
      </listitem>
     </varlistentry>
--- a/doc/src/sgml/ddl.sgml
+++ b/doc/src/sgml/ddl.sgml
@ -2946,7 +2946,7 @@ VALUES ('Albany', NULL, NULL, 'NY');
    divide a table into pieces called partitions.  The table that is divided
    is referred to as a <firstterm>partitioned table</firstterm>.  The
    specification consists of the <firstterm>partitioning method</firstterm>
-    and a list of columns or expressions to be used as the 
+    and a list of columns or expressions to be used as the
    <firstterm>partition key</firstterm>.
   </para>

@ -3759,6 +3759,156 @@ ANALYZE measurement;
   </sect3>
  </sect2>

+  <sect2 id="ddl-partition-pruning">
+   <title>Partition Pruning</title>
+
+   <indexterm>
+    <primary>partition pruning</primary>
+   </indexterm>
+
+   <para>
+    <firstterm>Partition pruning</firstterm> is a query optimization technique
+    that improves performance for partitioned tables.  As an example:
+
+<programlisting>
+SET enable_partition_pruning = on;                 -- the default
+SELECT count(*) FROM measurement WHERE logdate &gt;= DATE '2008-01-01';
+</programlisting>
+
+    Without partition pruning, the above query would scan each of the
+    partitions of the <structname>measurement</structname> table. With
+    partition pruning enabled, the planner will examine the definition
+    of each partition and prove that the partition need not
+    be scanned because it could not contain any rows meeting the query's
+    <literal>WHERE</literal> clause.  When the planner can prove this, it
+    excludes (<firstterm>prunes</firstterm>) the partition from the query
+    plan.
+   </para>
+
+   <para>
+    You can use the <command>EXPLAIN</command> command to show the
+    difference between a plan whose partitions have been pruned from one
+    whose partitions haven't, by using the
+    <xref linkend="guc-enable-partition-pruning"/> configuration
+    parameter.  A typical unoptimized plan for this type of table setup
+    is:
+<programlisting>
+SET enable_partition_pruning = off;
+EXPLAIN SELECT count(*) FROM measurement WHERE logdate &gt;= DATE '2008-01-01';
+                                    QUERY PLAN
+───────────────────────────────────────────────────────────────────────────────────
+ Aggregate  (cost=188.76..188.77 rows=1 width=8)
+   -&gt;  Append  (cost=0.00..181.05 rows=3085 width=0)
+         -&gt;  Seq Scan on measurement_y2006m02  (cost=0.00..33.12 rows=617 width=0)
+               Filter: (logdate &gt;= '2008-01-01'::date)
+         -&gt;  Seq Scan on measurement_y2006m03  (cost=0.00..33.12 rows=617 width=0)
+               Filter: (logdate &gt;= '2008-01-01'::date)
+...
+         -&gt;  Seq Scan on measurement_y2007m11  (cost=0.00..33.12 rows=617 width=0)
+               Filter: (logdate &gt;= '2008-01-01'::date)
+         -&gt;  Seq Scan on measurement_y2007m12  (cost=0.00..33.12 rows=617 width=0)
+               Filter: (logdate &gt;= '2008-01-01'::date)
+         -&gt;  Seq Scan on measurement_y2008m01  (cost=0.00..33.12 rows=617 width=0)
+               Filter: (logdate &gt;= '2008-01-01'::date)
+</programlisting>
+
+    Some or all of the partitions might use index scans instead of
+    full-table sequential scans, but the point here is that there
+    is no need to scan the older partitions at all to answer this query.
+    When we enable partition pruning, we get a significantly
+    cheaper plan that will deliver the same answer:
+<programlisting>
+SET enable_partition_pruning = on;
+EXPLAIN SELECT count(*) FROM measurement WHERE logdate &gt;= DATE '2008-01-01';
+                                    QUERY PLAN
+───────────────────────────────────────────────────────────────────────────────────
+ Aggregate  (cost=37.75..37.76 rows=1 width=8)
+   -&gt;  Append  (cost=0.00..36.21 rows=617 width=0)
+         -&gt;  Seq Scan on measurement_y2008m01  (cost=0.00..33.12 rows=617 width=0)
+               Filter: (logdate &gt;= '2008-01-01'::date)
+</programlisting>
+   </para>
+
+   <para>
+    Note that partition pruning is driven only by the constraints defined
+    implicitly by the partition keys, not by the presence of indexes.
+    Therefore it isn't necessary to define indexes on the key columns.
+    Whether an index needs to be created for a given partition depends on
+    whether you expect that queries that scan the partition will
+    generally scan a large part of the partition or just a small part.
+    An index will be helpful in the latter case but not the former.
+   </para>
+
+   <para>
+    Partition pruning can be performed not only during the planning of a
+    given query, but also during its execution.  This is useful as it can
+    allow more partitions to be pruned when clauses contain expressions
+    whose values are not known at query planning time; for example,
+    parameters defined in a <command>PREPARE</command> statement, using a
+    value obtained from a subquery or using a parameterized value on the
+    inner side of a nested loop join.  Partition pruning during execution
+    can be performed at any of the following times:
+
+    <itemizedlist>
+     <listitem>
+      <para>
+       During initialization of the query plan.  Partition pruning can be
+       performed here for parameter values which are known during the
+       initialization phase of execution.  Partitions which are pruned
+       during this stage will not show up in the query's
+       <command>EXPLAIN</command> or <command>EXPLAIN ANALYZE</command>.
+       It is possible to determine the number of partitions which were
+       removed during this phase by observing the
+       <quote>Subplans Removed</quote> property in the
+       <command>EXPLAIN</command> output.
+      </para>
+     </listitem>
+
+     <listitem>
+      <para>
+       During actual execution of the query plan.  Partition pruning may
+       also be performed here to remove partitions using values which are
+       only known during actual query execution.  This includes values
+       from subqueries and values from execution-time parameters such as
+       those from parameterized nested loop joins.  Since the value of
+       these parameters may change many times during the execution of the
+       query, partition pruning is performed whenever one of the
+       execution parameters being used by partition pruning changes.
+       Determining if partitions were pruned during this phase requires
+       careful inspection of the <literal>nloops</literal> property in
+       the <command>EXPLAIN ANALYZE</command> output.
+      </para>
+     </listitem>
+    </itemizedlist>
+   </para>
+
+   <para>
+    Partition pruning can be disabled using the
+    <xref linkend="guc-enable-partition-pruning"/> setting.
+   </para>
+
+   <note>
+    <para>
+     Currently, pruning of partitions during the planning of an
+     <command>UPDATE</command> or <command>DELETE</command> command is
+     implemented using the constraint exclusion method (however, it is
+     still ruled by the <literal>enable_partition_pruning</literal>
+     setting instead of <literal>constraint_exclusion</literal>) &mdash;
+     see the next section for details and caveats that apply.
+    </para>
+
+    <para>
+     Also, execution-time partition pruning currently only occurs for the
+     <literal>Append</literal> node type, not <literal>MergeAppend</literal>.
+    </para>
+
+    <para>
+     Both of these behaviors are likely to be changed in a future release
+     of <productname>PostgreSQL</productname>.
+    </para>
+   </note>
+  </sect2>
+
  <sect2 id="ddl-partitioning-constraint-exclusion">
   <title>Partitioning and Constraint Exclusion</title>

@ -3767,97 +3917,53 @@ ANALYZE measurement;
   </indexterm>

   <para>
-    <firstterm>Constraint exclusion</firstterm> is a query optimization technique
-    that improves performance for partitioned tables defined in the
-    fashion described above (both declaratively partitioned tables and those
-    implemented using inheritance).  As an example:
-
-<programlisting>
-SET constraint_exclusion = on;
-SELECT count(*) FROM measurement WHERE logdate &gt;= DATE '2008-01-01';
-</programlisting>
-
-    Without constraint exclusion, the above query would scan each of
-    the partitions of the <structname>measurement</structname> table. With constraint
-    exclusion enabled, the planner will examine the constraints of each
-    partition and try to prove that the partition need not
-    be scanned because it could not contain any rows meeting the query's
-    <literal>WHERE</literal> clause.  When the planner can prove this, it
-    excludes the partition from the query plan.
+    <firstterm>Constraint exclusion</firstterm> is a query optimization
+    technique similar to partition pruning.  While it is primarily used
+    for partitioned tables using the legacy inheritance method, it can be
+    used for other purposes, including with declarative partitioning.
   </para>

   <para>
-    You can use the <command>EXPLAIN</command> command to show the difference
-    between a plan with <varname>constraint_exclusion</varname> on and a plan
-    with it off.  A typical unoptimized plan for this type of table setup is:
-
-<programlisting>
-SET constraint_exclusion = off;
-EXPLAIN SELECT count(*) FROM measurement WHERE logdate &gt;= DATE '2008-01-01';
-
-                                          QUERY PLAN
-----------------------------------------------------------------------------------------------
- Aggregate  (cost=158.66..158.68 rows=1 width=0)
-   -&gt;  Append  (cost=0.00..151.88 rows=2715 width=0)
-         -&gt;  Seq Scan on measurement  (cost=0.00..30.38 rows=543 width=0)
-               Filter: (logdate &gt;= '2008-01-01'::date)
-         -&gt;  Seq Scan on measurement_y2006m02 measurement  (cost=0.00..30.38 rows=543 width=0)
-               Filter: (logdate &gt;= '2008-01-01'::date)
-         -&gt;  Seq Scan on measurement_y2006m03 measurement  (cost=0.00..30.38 rows=543 width=0)
-               Filter: (logdate &gt;= '2008-01-01'::date)
-...
-         -&gt;  Seq Scan on measurement_y2007m12 measurement  (cost=0.00..30.38 rows=543 width=0)
-               Filter: (logdate &gt;= '2008-01-01'::date)
-         -&gt;  Seq Scan on measurement_y2008m01 measurement  (cost=0.00..30.38 rows=543 width=0)
-               Filter: (logdate &gt;= '2008-01-01'::date)
-</programlisting>
-
-    Some or all of the partitions might use index scans instead of
-    full-table sequential scans, but the point here is that there
-    is no need to scan the older partitions at all to answer this query.
-    When we enable constraint exclusion, we get a significantly
-    cheaper plan that will deliver the same answer:
-
-<programlisting>
-SET constraint_exclusion = on;
-EXPLAIN SELECT count(*) FROM measurement WHERE logdate &gt;= DATE '2008-01-01';
-                                          QUERY PLAN
-----------------------------------------------------------------------------------------------
- Aggregate  (cost=63.47..63.48 rows=1 width=0)
-   -&gt;  Append  (cost=0.00..60.75 rows=1086 width=0)
-         -&gt;  Seq Scan on measurement  (cost=0.00..30.38 rows=543 width=0)
-               Filter: (logdate &gt;= '2008-01-01'::date)
-         -&gt;  Seq Scan on measurement_y2008m01 measurement  (cost=0.00..30.38 rows=543 width=0)
-               Filter: (logdate &gt;= '2008-01-01'::date)
-</programlisting>
+    Constraint exclusion works in a very similar way to partition
+    pruning, except that it uses each table's <literal>CHECK</literal>
+    constraints &mdash; which gives it its name &mdash; whereas partition
+    pruning uses the table's partitioning constraint, which exists only in
+    the case of declarative partitioning.  Another difference is that it
+    is only applied at plan time; there is no attempt to remove
+    partitions at execution time.
   </para>

   <para>
-    Note that constraint exclusion is driven only by <literal>CHECK</literal>
-    constraints, not by the presence of indexes.  Therefore it isn't
-    necessary to define indexes on the key columns.  Whether an index
-    needs to be created for a given partition depends on whether you
-    expect that queries that scan the partition will generally scan
-    a large part of the partition or just a small part.  An index will
-    be helpful in the latter case but not the former.
+    The fact that constraint exclusion uses <literal>CHECK</literal>
+    constraints, which makes it slow compared to partition pruning, can
+    sometimes be used as an advantage: because constraints can be defined
+    even on declaratively-partitioned tables, in addition to the internal
+    partitioning constraints, and only constraint exclusion would be able
+    to elide certain partitions from the query plan using those.
   </para>

   <para>
    The default (and recommended) setting of
-    <xref linkend="guc-constraint-exclusion"/> is actually neither
+    <xref linkend="guc-constraint-exclusion"/> is neither
    <literal>on</literal> nor <literal>off</literal>, but an intermediate setting
    called <literal>partition</literal>, which causes the technique to be
-    applied only to queries that are likely to be working on partitioned
+    applied only to queries that are likely to be working on inheritance partitioned
    tables.  The <literal>on</literal> setting causes the planner to examine
    <literal>CHECK</literal> constraints in all queries, even simple ones that
    are unlikely to benefit.
   </para>

   <para>
-    The following caveats apply to constraint exclusion, which is used by
-    both inheritance and partitioned tables:
+    The following caveats apply to constraint exclusion:

   <itemizedlist>
+    <listitem>
+     <para>
+      Constraint exclusion is only applied during query planning; it is
+      not applied at execution time like partition pruning does.
+     </para>
+    </listitem>
+
    <listitem>
     <para>
      Constraint exclusion only works when the query's <literal>WHERE</literal>
@ -3877,11 +3983,8 @@ EXPLAIN SELECT count(*) FROM measurement WHERE logdate &gt;= DATE '2008-01-01';
      range tests for range partitioning, as illustrated in the preceding
      examples.  A good rule of thumb is that partitioning constraints should
      contain only comparisons of the partitioning column(s) to constants
-      using B-tree-indexable operators, which applies even to partitioned
-      tables, because only B-tree-indexable column(s) are allowed in the
-      partition key.  (This is not a problem when using declarative
-      partitioning, since the automatically generated constraints are simple
-      enough to be understood by the planner.)
+      using B-tree-indexable operators, because only B-tree-indexable
+      column(s) are allowed in the partition key.
     </para>
    </listitem>

@ -3889,9 +3992,9 @@ EXPLAIN SELECT count(*) FROM measurement WHERE logdate &gt;= DATE '2008-01-01';
     <para>
      All constraints on all partitions of the master table are examined
      during constraint exclusion, so large numbers of partitions are likely
-      to increase query planning time considerably.  Partitioning using
-      these techniques will work well with up to perhaps a hundred partitions;
-      don't try to use many thousands of partitions.
+      to increase query planning time considerably.  So the legacy
+      inheritance based partitioning will work well with up to perhaps a
+      hundred partitions; don't try to use many thousands of partitions.
     </para>
    </listitem>