diff --git a/doc/src/sgml/ddl.sgml b/doc/src/sgml/ddl.sgml index 39382e99c7..b628cac2d3 100644 --- a/doc/src/sgml/ddl.sgml +++ b/doc/src/sgml/ddl.sgml @@ -2833,8 +2833,9 @@ VALUES ('Albany', NULL, NULL, 'NY'); - These deficiencies will probably be fixed in some future release, - but in the meantime considerable care is needed in deciding whether + Some functionality not implemented for inheritance hierarchies is + implemented for declarative partitioning. + Considerable care is needed in deciding whether partitioning with legacy inheritance is useful for your application. @@ -4057,6 +4058,87 @@ EXPLAIN SELECT count(*) FROM measurement WHERE logdate >= DATE '2008-01-01'; + + + Declarative Partitioning Best Practices + + + The choice of how to partition a table should be made carefully as the + performance of query planning and execution can be negatively affected by + poor design. + + + + One of the most critical design decisions will be the column or columns + by which you partition your data. Often the best choice will be to + partition by the column or set of columns which most commonly appear in + WHERE clauses of queries being executed on the + partitioned table. WHERE clause items that match and + are compatible with the partition key can be used to prune unneeded + partitions. However, you may be forced into making other decisions by + requirements for the PRIMARY KEY or a + UNIQUE constraint. Removal of unwanted data is also a + factor to consider when planning your partitioning strategy. An entire + partition can be detached fairly quickly, so it may be beneficial to + design the partition strategy in such a way that all data to be removed + at once is located in a single partition. + + + + Choosing the target number of partitions that the table should be divided + into is also a critical decision to make. Not having enough partitions + may mean that indexes remain too large and that data locality remains poor + which could result in low cache hit ratios. However, dividing the table + into too many partitions can also cause issues. Too many partitions can + mean longer query planning times and higher memory consumption during both + query planning and execution. When choosing how to partition your table, + it's also important to consider what changes may occur in the future. For + example, if you choose to have one partition per customer and you + currently have a small number of large customers, consider the + implications if in several years you instead find yourself with a large + number of small customers. In this case, it may be better to choose to + partition by HASH and choose a reasonable number of + partitions rather than trying to partition by LIST and + hoping that the number of customers does not increase beyond what it is + practical to partition the data by. + + + + Sub-partitioning can be useful to further divide partitions that are + expected to become larger than other partitions, although excessive + sub-partitioning can easily lead to large numbers of partitions and can + cause the same problems mentioned in the preceding paragraph. + + + + It is also important to consider the overhead of partitioning during + query planning and execution. The query planner is generally able to + handle partition hierarchies up a few hundred partitions fairly well, + provided that typical queries allow the query planner to prune all but a + small number of partitions. Planning times become longer and memory + consumption becomes higher as more partitions are added. This is + particularly true for the UPDATE and + DELETE commands. Another reason to be concerned about + having a large number of partitions is that the server's memory + consumption may grow significantly over a period of time, especially if + many sessions touch large numbers of partitions. That's because each + partition requires its metadata to be loaded into the local memory of + each session that touches it. + + + + With data warehouse type workloads, it can make sense to use a larger + number of partitions than with an OLTP type workload. + Generally, in data warehouses, query planning time is less of a concern as + the majority of processing time is spent during query execution. With + either of these two types of workload, it is important to make the right + decisions early, as re-partitioning large quantities of data can be + painfully slow. Simulations of the intended workload are often beneficial + for optimizing the partitioning strategy. Never assume that more + partitions are better than fewer partitions and vice-versa. + + +