doc: Add best practises section to partitioning docs

A few questionable partitioning designs have been cropping up lately
around the mailing lists.  Generally, these cases have been partitioning
using too many partitions which have caused performance or OOM problems for
the users.

Since we have very little else to guide users into good design, here we
add a new section to the partitioning documentation with some best
practise guidelines for good design.

Reviewed-by: Justin Pryzby, Amit Langote, Alvaro Herrera
Discussion: https://postgr.es/m/CAKJS1f-2rx+E9mG3xrCVHupefMjAp1+tpczQa9SEOZWyU7fjEA@mail.gmail.com
Backpatch-through: 10
This commit is contained in:
David Rowley 2019-06-12 08:09:11 +12:00
parent 1c9034579c
commit 936b5e589e
1 changed files with 84 additions and 2 deletions

View File

@ -2833,8 +2833,9 @@ VALUES ('Albany', NULL, NULL, 'NY');
</listitem>
</itemizedlist>
These deficiencies will probably be fixed in some future release,
but in the meantime considerable care is needed in deciding whether
Some functionality not implemented for inheritance hierarchies is
implemented for declarative partitioning.
Considerable care is needed in deciding whether partitioning with legacy
inheritance is useful for your application.
</para>
@ -4057,6 +4058,87 @@ EXPLAIN SELECT count(*) FROM measurement WHERE logdate &gt;= DATE '2008-01-01';
</itemizedlist>
</para>
</sect2>
<sect2 id="ddl-partitioning-declarative-best-practices">
<title>Declarative Partitioning Best Practices</title>
<para>
The choice of how to partition a table should be made carefully as the
performance of query planning and execution can be negatively affected by
poor design.
</para>
<para>
One of the most critical design decisions will be the column or columns
by which you partition your data. Often the best choice will be to
partition by the column or set of columns which most commonly appear in
<literal>WHERE</literal> clauses of queries being executed on the
partitioned table. <literal>WHERE</literal> clause items that match and
are compatible with the partition key can be used to prune unneeded
partitions. However, you may be forced into making other decisions by
requirements for the <literal>PRIMARY KEY</literal> or a
<literal>UNIQUE</literal> constraint. Removal of unwanted data is also a
factor to consider when planning your partitioning strategy. An entire
partition can be detached fairly quickly, so it may be beneficial to
design the partition strategy in such a way that all data to be removed
at once is located in a single partition.
</para>
<para>
Choosing the target number of partitions that the table should be divided
into is also a critical decision to make. Not having enough partitions
may mean that indexes remain too large and that data locality remains poor
which could result in low cache hit ratios. However, dividing the table
into too many partitions can also cause issues. Too many partitions can
mean longer query planning times and higher memory consumption during both
query planning and execution. When choosing how to partition your table,
it's also important to consider what changes may occur in the future. For
example, if you choose to have one partition per customer and you
currently have a small number of large customers, consider the
implications if in several years you instead find yourself with a large
number of small customers. In this case, it may be better to choose to
partition by <literal>HASH</literal> and choose a reasonable number of
partitions rather than trying to partition by <literal>LIST</literal> and
hoping that the number of customers does not increase beyond what it is
practical to partition the data by.
</para>
<para>
Sub-partitioning can be useful to further divide partitions that are
expected to become larger than other partitions, although excessive
sub-partitioning can easily lead to large numbers of partitions and can
cause the same problems mentioned in the preceding paragraph.
</para>
<para>
It is also important to consider the overhead of partitioning during
query planning and execution. The query planner is generally able to
handle partition hierarchies up a few hundred partitions fairly well,
provided that typical queries allow the query planner to prune all but a
small number of partitions. Planning times become longer and memory
consumption becomes higher as more partitions are added. This is
particularly true for the <command>UPDATE</command> and
<command>DELETE</command> commands. Another reason to be concerned about
having a large number of partitions is that the server's memory
consumption may grow significantly over a period of time, especially if
many sessions touch large numbers of partitions. That's because each
partition requires its metadata to be loaded into the local memory of
each session that touches it.
</para>
<para>
With data warehouse type workloads, it can make sense to use a larger
number of partitions than with an <acronym>OLTP</acronym> type workload.
Generally, in data warehouses, query planning time is less of a concern as
the majority of processing time is spent during query execution. With
either of these two types of workload, it is important to make the right
decisions early, as re-partitioning large quantities of data can be
painfully slow. Simulations of the intended workload are often beneficial
for optimizing the partitioning strategy. Never assume that more
partitions are better than fewer partitions and vice-versa.
</para>
</sect2>
</sect1>
<sect1 id="ddl-foreign-data">