From a87d7801c24ffb3593841838ba0e3d4883d34853 Mon Sep 17 00:00:00 2001
From: Tom Lane
Date: Sat, 14 Nov 2020 13:09:53 -0500
Subject: [PATCH] Doc: improve partitioning discussion in ddl.sgml.

This started with the intent to explain that range upper bounds are exclusive, which previously you could only find out by reading the CREATE TABLE man page. But I soon found that section 5.11 really could stand a fair amount of editorial attention. It's apparently been revised several times without much concern for overall flow, nor careful copy-editing.

Back-patch to v11, which is as far as the patch goes easily.

Per gripe from Edson Richter. Thanks to David Johnston for review.

Discussion: https://postgr.es/m/DM6PR13MB3988736CF8F5DC5720440231CFE60@DM6PR13MB3988.namprd13.prod.outlook.com
---
 doc/src/sgml/ddl.sgml | 372 +++++++++++++++++++++++------------------
 1 file changed, 203 insertions(+), 169 deletions(-)

diff --git a/doc/src/sgml/ddl.sgml b/doc/src/sgml/ddl.sgml
index 8fafa5dd04..7538200d01 100644
--- a/doc/src/sgml/ddl.sgml
+++ b/doc/src/sgml/ddl.sgml
@@ -2947,8 +2947,8 @@ VALUES ('Albany', NULL, NULL, 'NY'); Query performance can be improved dramatically in certain situations, particularly when most of the heavily accessed rows of the table are in a - single partition or a small number of partitions. The partitioning - substitutes for leading columns of indexes, reducing index size and + single partition or a small number of partitions. Partitioning + effectively substitutes for the upper tree levels of indexes, making it more likely that the heavily-used parts of the indexes fit in memory. @@ -2957,18 +2957,20 @@ VALUES ('Albany', NULL, NULL, 'NY'); When queries or updates access a large percentage of a single - partition, performance can be improved by taking advantage - of sequential scan of that partition instead of using an - index and random access reads scattered across the whole table.
+ partition, performance can be improved by using a + sequential scan of that partition instead of using an + index, which would require random-access reads scattered across the + whole table. Bulk loads and deletes can be accomplished by adding or removing - partitions, if that requirement is planned into the partitioning design. - Doing ALTER TABLE DETACH PARTITION or dropping an individual - partition using DROP TABLE is far faster than a bulk + partitions, if the usage pattern is accounted for in the + partitioning design. Dropping an individual partition + using DROP TABLE, or doing ALTER TABLE + DETACH PARTITION, is far faster than a bulk operation. These commands also entirely avoid the VACUUM overhead caused by a bulk DELETE. @@ -2981,7 +2983,7 @@ VALUES ('Albany', NULL, NULL, 'NY'); - The benefits will normally be worthwhile only when a table would + These benefits will normally be worthwhile only when a table would otherwise be very large. The exact point at which a table will benefit from partitioning depends on the application, although a rule of thumb is that the size of the table should exceed the physical @@ -3003,6 +3005,13 @@ VALUES ('Albany', NULL, NULL, 'NY'); the ranges of values assigned to different partitions. For example, one might partition by date ranges, or by ranges of identifiers for particular business objects. + Each range's bounds are understood as being inclusive at the + lower end and exclusive at the upper end. For example, if one + partition's range is from 1 + to 10, and the next one's range is + from 10 to 20, then + value 10 belongs to the second partition not + the first. @@ -3012,7 +3021,7 @@ VALUES ('Albany', NULL, NULL, 'NY'); - The table is partitioned by explicitly listing which key values + The table is partitioned by explicitly listing which key value(s) appear in each partition. 
@@ -3044,25 +3053,34 @@ VALUES ('Albany', NULL, NULL, 'NY'); Declarative Partitioning - PostgreSQL offers a way to specify how to - divide a table into pieces called partitions. The table that is divided + PostgreSQL allows you to declare + that a table is divided into partitions. The table that is divided is referred to as a partitioned table. The - specification consists of the partitioning method - and a list of columns or expressions to be used as the - partition key. + declaration includes the partitioning method + as described above, plus a list of columns or expressions to be used + as the partition key. - All rows inserted into a partitioned table will be routed to one of the - partitions based on the value of the partition - key. Each partition has a subset of the data defined by its - partition bounds. The currently supported - partitioning methods are range, list, and hash. + The partitioned table itself is a virtual table having + no storage of its own. Instead, the storage belongs + to partitions, which are otherwise-ordinary + tables associated with the partitioned table. + Each partition stores a subset of the data as defined by its + partition bounds. + All rows inserted into a partitioned table will be routed to the + appropriate one of the partitions based on the values of the partition + key column(s). + Updating the partition key of a row will cause it to be moved into a + different partition if it no longer satisfies the partition bounds + of its original partition. - Partitions may themselves be defined as partitioned tables, using what is - called sub-partitioning. Partitions may have their + Partitions may themselves be defined as partitioned tables, resulting + in sub-partitioning. Although all partitions + must have the same columns as their partitioned parent, partitions may + have their own indexes, constraints and default values, distinct from those of other partitions. See for more details on creating partitioned tables and partitions. 
@@ -3070,91 +3088,21 @@ VALUES ('Albany', NULL, NULL, 'NY'); It is not possible to turn a regular table into a partitioned table or - vice versa. However, it is possible to add a regular or partitioned table - containing data as a partition of a partitioned table, or remove a + vice versa. However, it is possible to add an existing regular or + partitioned table as a partition of a partitioned table, or remove a partition from a partitioned table turning it into a standalone table; - see to learn more about the + this can simplify and speed up many maintenance processes. + See to learn more about the ATTACH PARTITION and DETACH PARTITION sub-commands. - - Individual partitions are linked to the partitioned table with inheritance - behind-the-scenes; however, it is not possible to use some of the - generic features of inheritance (discussed below) with declaratively - partitioned tables or their partitions. For example, a partition - cannot have any parents other than the partitioned table it is a - partition of, nor can a regular table inherit from a partitioned table - making the latter its parent. That means partitioned tables and their - partitions do not participate in inheritance with regular tables. - Since a partition hierarchy consisting of the partitioned table and its - partitions is still an inheritance hierarchy, all the normal rules of - inheritance apply as described in with - some exceptions, most notably: - - - - - Both CHECK and NOT NULL - constraints of a partitioned table are always inherited by all its - partitions. CHECK constraints that are marked - NO INHERIT are not allowed to be created on - partitioned tables. - - - - - - Using ONLY to add or drop a constraint on only the - partitioned table is supported as long as there are no partitions. Once - partitions exist, using ONLY will result in an error - as adding or dropping constraints on only the partitioned table, when - partitions exist, is not supported. 
Instead, constraints on the - partitions themselves can be added and (if they are not present in the - parent table) dropped. - - - - - - As a partitioned table does not have any data directly, attempts to use - TRUNCATE ONLY on a partitioned - table will always return an error. - - - - - - Partitions cannot have columns that are not present in the parent. It - is not possible to specify columns when creating partitions with - CREATE TABLE, nor is it possible to add columns to - partitions after-the-fact using ALTER TABLE. Tables may be - added as a partition with ALTER TABLE ... ATTACH PARTITION - only if their columns exactly match the parent, including any - oid column. - - - - - - You cannot drop the NOT NULL constraint on a - partition's column if the constraint is present in the parent table. - - - - - Partitions can also be foreign tables, although they have some limitations that normal tables do not; see for more information. - - Updating the partition key of a row might cause it to be moved into a - different partition where this row satisfies the partition bounds. - - Example @@ -3175,7 +3123,7 @@ CREATE TABLE measurement ( We know that most queries will access just the last week's, month's or quarter's data, since the main use of this table will be to prepare online reports for management. To reduce the amount of old data that - needs to be stored, we decide to only keep the most recent 3 years + needs to be stored, we decide to keep only the most recent 3 years worth of data. At the beginning of each month we will remove the oldest month's data. In this situation we can use partitioning to help us meet all of our different requirements for the measurements table. 
@@ -3187,7 +3135,7 @@ CREATE TABLE measurement ( - Create measurement table as a partitioned + Create the measurement table as a partitioned table by specifying the PARTITION BY clause, which includes the partitioning method (RANGE in this case) and the list of column(s) to use as the partition key. @@ -3201,30 +3149,15 @@ CREATE TABLE measurement ( ) PARTITION BY RANGE (logdate); - - - You may decide to use multiple columns in the partition key for range - partitioning, if desired. Of course, this will often result in a larger - number of partitions, each of which is individually smaller. On the - other hand, using fewer columns may lead to a coarser-grained - partitioning criteria with smaller number of partitions. A query - accessing the partitioned table will have to scan fewer partitions if - the conditions involve some or all of these columns. - For example, consider a table range partitioned using columns - lastname and firstname (in that order) - as the partition key. - - Create partitions. Each partition's definition must specify the bounds + Create partitions. Each partition's definition must specify bounds that correspond to the partitioning method and partition key of the parent. Note that specifying bounds such that the new partition's - values will overlap with those in one or more existing partitions will - cause an error. Inserting data into the parent table that does not map - to one of the existing partitions will cause an error; an appropriate - partition must be added manually. + values would overlap with those in one or more existing partitions will + cause an error. @@ -3235,10 +3168,9 @@ CREATE TABLE measurement ( - It is not necessary to create table constraints describing partition - boundary condition for partitions. Instead, partition constraints are - generated implicitly from the partition bound specification whenever - there is need to refer to them. 
+ For our example, each partition should hold one month's worth of + data, to match the requirement of deleting one month's data at a + time. So the commands might look like: CREATE TABLE measurement_y2006m02 PARTITION OF measurement @@ -3260,10 +3192,13 @@ CREATE TABLE measurement_y2008m01 PARTITION OF measurement WITH (parallel_workers = 4) TABLESPACE fasttablespace; + + (Recall that adjacent partitions can share a bound value, since + range upper bounds are treated as exclusive bounds.) - To implement sub-partitioning, specify the + If you wish to implement sub-partitioning, again specify the PARTITION BY clause in the commands used to create individual partitions, for example: @@ -3275,16 +3210,29 @@ CREATE TABLE measurement_y2006m02 PARTITION OF measurement After creating partitions of measurement_y2006m02, any data inserted into measurement that is mapped to - measurement_y2006m02 (or data that is directly inserted - into measurement_y2006m02, provided it satisfies its - partition constraint) will be further redirected to one of its + measurement_y2006m02 (or data that is + directly inserted into measurement_y2006m02, + which is allowed provided its partition constraint is satisfied) + will be further redirected to one of its partitions based on the peaktemp column. The partition key specified may overlap with the parent's partition key, although care should be taken when specifying the bounds of a sub-partition such that the set of data it accepts constitutes a subset of what - the partition's own bounds allows; the system does not try to check + the partition's own bounds allow; the system does not try to check whether that's really the case. + + + Inserting data into the parent table that does not map + to one of the existing partitions will cause an error; an appropriate + partition must be added manually. + + + + It is not necessary to manually create table constraints describing + the partition boundary conditions for partitions. 
Such constraints + will be created automatically. + @@ -3292,9 +3240,13 @@ CREATE TABLE measurement_y2006m02 PARTITION OF measurement Create an index on the key column(s), as well as any other indexes you might want, on the partitioned table. (The key index is not strictly necessary, but in most scenarios it is helpful.) - This automatically creates - one index on each partition, and any partitions you create or attach - later will also contain the index. + This automatically creates a matching index on each partition, and + any partitions you create or attach later will also have such an + index. + An index or unique constraint declared on a partitioned table + is virtual in the same way that the partitioned table + is: the actual data is in child indexes on the individual partition + tables. CREATE INDEX ON measurement (logdate); @@ -3325,7 +3277,7 @@ CREATE INDEX ON measurement (logdate); Normally the set of partitions established when initially defining the table is not intended to remain static. It is common to want to - remove old partitions of data and periodically add new partitions for + remove partitions holding old data and periodically add new partitions for new data. One of the most important advantages of partitioning is precisely that it allows this otherwise painful task to be executed nearly instantaneously by manipulating the partition structure, rather @@ -3374,8 +3326,10 @@ CREATE TABLE measurement_y2008m02 PARTITION OF measurement As an alternative, it is sometimes more convenient to create the new table outside the partition structure, and make it a proper - partition later. This allows the data to be loaded, checked, and - transformed prior to it appearing in the partitioned table: + partition later. This allows new data to be loaded, checked, and + transformed prior to it appearing in the partitioned table. + The CREATE TABLE ... 
LIKE option is helpful + to avoid tediously repeating the parent table's definition: CREATE TABLE measurement_y2008m02 @@ -3396,26 +3350,28 @@ ALTER TABLE measurement ATTACH PARTITION measurement_y2008m02 Before running the ATTACH PARTITION command, it is recommended to create a CHECK constraint on the table to - be attached matching the desired partition constraint. That way, - the system will be able to skip the scan to validate the implicit + be attached that matches the expected partition constraint, as + illustrated above. That way, the system will be able to skip the scan + which is otherwise needed to validate the implicit partition constraint. Without the CHECK constraint, the table will be scanned to validate the partition constraint while holding an ACCESS EXCLUSIVE lock on the parent table. - It may be desired to drop the redundant CHECK constraint - after ATTACH PARTITION is finished. + It is recommended to drop the now-redundant CHECK + constraint after ATTACH PARTITION is finished. As explained above, it is possible to create indexes on partitioned tables - and they are applied automatically to the entire hierarchy. This is very - convenient, as not only the existing partitions will become indexed, but + so that they are applied automatically to the entire hierarchy. + This is very + convenient, as not only will the existing partitions become indexed, but also any partitions that are created in the future will. One limitation is that it's not possible to use the CONCURRENTLY - qualifier when creating such a partitioned index. To overcome long lock + qualifier when creating such a partitioned index. To avoid long lock times, it is possible to use CREATE INDEX ON ONLY the partitioned table; such an index is marked invalid, and the partitions do not get the index applied automatically. 
The indexes on partitions can - be created separately using CONCURRENTLY, and later + be created individually using CONCURRENTLY, and then attached to the index on the parent using ALTER INDEX .. ATTACH PARTITION. Once indexes for all partitions are attached to the parent index, the parent index is marked @@ -3452,18 +3408,22 @@ ALTER INDEX measurement_city_id_logdate_key - There is no way to create an - exclusion constraint spanning all partitions; it is only possible - to constrain each leaf partition individually. + Unique constraints (and hence primary keys) on partitioned tables must + include all the partition key columns. This limitation exists because + the individual indexes making up the constraint can only directly + enforce uniqueness within their own partitions; therefore, the + partition structure itself must guarantee that there are not + duplicates in different partitions. - Unique constraints (and hence primary keys) on partitioned tables must - include all the partition key columns. This limitation exists because - PostgreSQL can only enforce - uniqueness in each partition individually. + There is no way to create an exclusion constraint spanning the + whole partitioned table. It is only possible to put such a + constraint on each leaf partition individually. Again, this + limitation stems from not being able to enforce cross-partition + restrictions. @@ -3493,11 +3453,76 @@ ALTER INDEX measurement_city_id_logdate_key + + + Individual partitions are linked to their partitioned table using + inheritance behind-the-scenes. However, it is not possible to use + all of the generic features of inheritance with declaratively + partitioned tables or their partitions, as discussed below. Notably, + a partition cannot have any parents other than the partitioned table + it is a partition of, nor can a table inherit from both a partitioned + table and a regular table. 
That means partitioned tables and their + partitions never share an inheritance hierarchy with regular tables. + + + + Since a partition hierarchy consisting of the partitioned table and its + partitions is still an inheritance hierarchy, all the normal rules of + inheritance apply as described in , with + a few exceptions: + + + + + Partitions cannot have columns that are not present in the parent. It + is not possible to specify columns when creating partitions with + CREATE TABLE, nor is it possible to add columns to + partitions after-the-fact using ALTER TABLE. + Tables may be added as a partition with ALTER TABLE + ... ATTACH PARTITION only if their columns exactly match + the parent, including any oid column. + + + + + + Both CHECK and NOT NULL + constraints of a partitioned table are always inherited by all its + partitions. CHECK constraints that are marked + NO INHERIT are not allowed to be created on + partitioned tables. + You cannot drop a NOT NULL constraint on a + partition's column if the same constraint is present in the parent + table. + + + + + + Using ONLY to add or drop a constraint on only + the partitioned table is supported as long as there are no + partitions. Once partitions exist, using ONLY + will result in an error. Instead, constraints on the partitions + themselves can be added and (if they are not present in the parent + table) dropped. + + + + + + As a partitioned table does not have any data itself, attempts to use + TRUNCATE ONLY on a partitioned + table will always return an error. + + + + - - Implementation Using Inheritance + + Partitioning Using Inheritance + While the built-in declarative partitioning is suitable for most common use cases, there are some circumstances where a more flexible @@ -3547,8 +3572,8 @@ ALTER INDEX measurement_city_id_logdate_key Example - We use the non-partitioned measurement - table above. 
To implement partitioning using inheritance, use + This example builds a partitioning structure equivalent to the + declarative partitioning example above. Use the following steps: @@ -3560,7 +3585,16 @@ ALTER INDEX measurement_city_id_logdate_key to be applied equally to all child tables. There is no point in defining any indexes or unique constraints on it, either. For our example, the master table is the measurement - table as originally defined. + table as originally defined: + + +CREATE TABLE measurement ( + city_id int not null, + logdate date not null, + peaktemp int, + unitsales int +); + @@ -3607,10 +3641,7 @@ CHECK ( outletID BETWEEN 200 AND 300 ) This is wrong since it is not clear which child table the key value 200 belongs in. - - - - It would be better to instead create child tables as follows: + Instead, ranges should be defined in this style: CREATE TABLE measurement_y2006m02 ( @@ -3683,7 +3714,7 @@ CREATE TRIGGER insert_measurement_trigger We must redefine the trigger function each month so that it always - points to the current child table. The trigger definition does + inserts into the current child table. The trigger definition does not need to be updated, however. @@ -4139,12 +4170,12 @@ EXPLAIN SELECT count(*) FROM measurement WHERE logdate >= DATE '2008-01-01'; - + - Declarative Partitioning Best Practices + Best Practices for Declarative Partitioning - The choice of how to partition a table should be made carefully as the + The choice of how to partition a table should be made carefully, as the performance of query planning and execution can be negatively affected by poor design. @@ -4154,8 +4185,8 @@ EXPLAIN SELECT count(*) FROM measurement WHERE logdate >= DATE '2008-01-01'; by which you partition your data. Often the best choice will be to partition by the column or set of columns which most commonly appear in WHERE clauses of queries being executed on the - partitioned table. 
WHERE clause items that match and - are compatible with the partition key can be used to prune unneeded + partitioned table. WHERE clauses that are compatible + with the partition bound constraints can be used to prune unneeded partitions. However, you may be forced into making other decisions by requirements for the PRIMARY KEY or a UNIQUE constraint. Removal of unwanted data is also a @@ -4172,7 +4203,8 @@ EXPLAIN SELECT count(*) FROM measurement WHERE logdate >= DATE '2008-01-01'; which could result in low cache hit ratios. However, dividing the table into too many partitions can also cause issues. Too many partitions can mean longer query planning times and higher memory consumption during both - query planning and execution. When choosing how to partition your table, + query planning and execution, as further described below. + When choosing how to partition your table, it's also important to consider what changes may occur in the future. For example, if you choose to have one partition per customer and you currently have a small number of large customers, consider the @@ -4186,13 +4218,15 @@ EXPLAIN SELECT count(*) FROM measurement WHERE logdate >= DATE '2008-01-01'; Sub-partitioning can be useful to further divide partitions that are - expected to become larger than other partitions, although excessive - sub-partitioning can easily lead to large numbers of partitions and can - cause the same problems mentioned in the preceding paragraph. + expected to become larger than other partitions. + Another option is to use range partitioning with multiple columns in + the partition key. + Either of these can easily lead to excessive numbers of partitions, + so restraint is advisable. - It is also important to consider the overhead of partitioning during + It is important to consider the overhead of partitioning during query planning and execution. 
The query planner is generally able to handle partition hierarchies with up to a few hundred partitions fairly well, provided that typical queries allow the query planner to prune all @@ -4201,7 +4235,7 @@ EXPLAIN SELECT count(*) FROM measurement WHERE logdate >= DATE '2008-01-01'; particularly true for the UPDATE and DELETE commands. Another reason to be concerned about having a large number of partitions is that the server's memory - consumption may grow significantly over a period of time, especially if + consumption may grow significantly over time, especially if many sessions touch large numbers of partitions. That's because each partition requires its metadata to be loaded into the local memory of each session that touches it. @@ -4215,8 +4249,8 @@ EXPLAIN SELECT count(*) FROM measurement WHERE logdate >= DATE '2008-01-01'; either of these two types of workload, it is important to make the right decisions early, as re-partitioning large quantities of data can be painfully slow. Simulations of the intended workload are often beneficial - for optimizing the partitioning strategy. Never assume that more - partitions are better than fewer partitions and vice-versa. + for optimizing the partitioning strategy. Never just assume that more + partitions are better than fewer partitions, nor vice-versa.
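
Reviewer's sketch (not part of the patch): the range-bound rule that motivated this change, inclusive at the lower end and exclusive at the upper end, can be checked with a few statements like the ones below. The table and partition names (bounds_demo and friends) are hypothetical and chosen only for illustration; the ranges mirror the 1 to 10 and 10 to 20 example in the new text.

```sql
-- Hypothetical demo table; the names are illustrative and do not appear in the patch.
CREATE TABLE bounds_demo (id int NOT NULL) PARTITION BY RANGE (id);

-- Adjacent partitions share the bound value 10 without overlapping,
-- because each upper bound is exclusive.
CREATE TABLE bounds_demo_1_10 PARTITION OF bounds_demo
    FOR VALUES FROM (1) TO (10);    -- accepts 1 through 9
CREATE TABLE bounds_demo_10_20 PARTITION OF bounds_demo
    FOR VALUES FROM (10) TO (20);   -- accepts 10 through 19

INSERT INTO bounds_demo VALUES (9), (10);

-- tableoid::regclass reports which partition actually stores each row.
SELECT tableoid::regclass AS stored_in, id
FROM bounds_demo
ORDER BY id;
```

Under these assumptions the row with id 9 should be reported in bounds_demo_1_10 and the row with id 10 in bounds_demo_10_20. Inserting 20 would be rejected, since no partition covers it, which matches the patch's note that an appropriate partition must be added manually.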