From b150a76793109b00ea43c26e0006b3cad9030096 Mon Sep 17 00:00:00 2001
From: Peter Geoghegan
Date: Tue, 24 Mar 2020 14:58:27 -0700
Subject: [PATCH] Fix nbtree deduplication README commentary.

Descriptions of some aspects of how deduplication works were unclear in
a couple of places.
---
 src/backend/access/nbtree/README | 45 +++++++++++++++++---------------
 1 file changed, 24 insertions(+), 21 deletions(-)

diff --git a/src/backend/access/nbtree/README b/src/backend/access/nbtree/README
index 6499f5adb7..2d0f8f4b79 100644
--- a/src/backend/access/nbtree/README
+++ b/src/backend/access/nbtree/README
@@ -780,20 +780,21 @@ order. Delaying deduplication minimizes page level fragmentation.
 Deduplication in unique indexes
 -------------------------------
 
-Very often, the range of values that can be placed on a given leaf page in
-a unique index is fixed and permanent. For example, a primary key on an
-identity column will usually only have page splits caused by the insertion
-of new logical rows within the rightmost leaf page. If there is a split
-of a non-rightmost leaf page, then the split must have been triggered by
-inserts associated with an UPDATE of an existing logical row. Splitting a
-leaf page purely to store multiple versions should be considered
-pathological, since it permanently degrades the index structure in order
-to absorb a temporary burst of duplicates. Deduplication in unique
-indexes helps to prevent these pathological page splits. Storing
-duplicates in a space efficient manner is not the goal, since in the long
-run there won't be any duplicates anyway. Rather, we're buying time for
-standard garbage collection mechanisms to run before a page split is
-needed.
+Very often, the number of distinct values that can ever be placed on
+almost any given leaf page in a unique index is fixed and permanent. For
+example, a primary key on an identity column will usually only have leaf
+page splits caused by the insertion of new logical rows within the
+rightmost leaf page. If there is a split of a non-rightmost leaf page,
+then the split must have been triggered by inserts associated with UPDATEs
+of existing logical rows. Splitting a leaf page purely to store multiple
+versions is a false economy. In effect, we're permanently degrading the
+index structure just to absorb a temporary burst of duplicates.
+
+Deduplication in unique indexes helps to prevent these pathological page
+splits. Storing duplicates in a space-efficient manner is not the goal,
+since in the long run there won't be any duplicates anyway. Rather, we're
+buying time for standard garbage collection mechanisms to run before a
+page split is needed.
 
 Unique index leaf pages only get a deduplication pass when an insertion
 (that might have to split the page) observed an existing duplicate on the
@@ -838,13 +839,15 @@ list splits.
 
 Only a few isolated extra steps are required to preserve the illusion that
 the new item never overlapped with an existing posting list in the first
-place: the heap TID of the incoming tuple is swapped with the rightmost/max
-heap TID from the existing/originally overlapping posting list. Also, the
-posting-split-with-page-split case must generate a new high key based on
-an imaginary version of the original page that has both the final new item
-and the after-list-split posting tuple (page splits usually just operate
-against an imaginary version that contains the new item/item that won't
-fit).
+place: the incoming tuple has its heap TID replaced with the rightmost/max
+heap TID from the existing/originally overlapping posting list.
+Similarly, the original incoming item's TID is relocated to the
+appropriate offset in the posting list (we usually shift TIDs out of the
+way to make a hole for it). Finally, the posting-split-with-page-split
+case must generate a new high key based on an imaginary version of the
+original page that has both the final new item and the after-list-split
+posting tuple (page splits usually just operate against an imaginary
+version that contains the new item/item that won't fit).
 
 This approach avoids inventing an "eager" atomic posting split operation
 that splits the posting list without simultaneously finishing the insert
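
The TID swap and relocation described by the second hunk can be sketched in
C. This is a minimal, hypothetical model (not PostgreSQL code): a heap TID
is reduced to a single comparable integer rather than an ItemPointerData
(block number, offset number), and the function name is invented for
illustration. It shows only the in-memory rearrangement: the incoming
tuple takes over the posting list's rightmost/max TID, while its original
TID is shifted into place inside the list.

```c
#include <assert.h>

/*
 * Illustrative stand-in for a heap TID.  In PostgreSQL a heap TID is an
 * ItemPointerData (block number, offset number); a single comparable
 * integer models the same total ordering here.
 */
typedef unsigned long SimpleTid;

/*
 * Simplified model of a posting list split.  The incoming TID falls
 * within the range of an existing posting list, so the incoming tuple
 * takes over the list's rightmost/max TID, and the original incoming
 * TID is relocated to the appropriate offset in the list (larger TIDs
 * are shifted out of the way to make a hole for it).  The displaced max
 * TID is returned; the caller inserts it as the final "new" item, which
 * preserves the illusion that the new item never overlapped the list.
 *
 * posting[] holds nitems TIDs in ascending order.
 */
SimpleTid
posting_list_split_sketch(SimpleTid *posting, int nitems, SimpleTid incoming)
{
    SimpleTid   displaced = posting[nitems - 1];
    int         i = nitems - 1;

    /* caller only does this when there is a genuine overlap */
    assert(nitems > 0 && incoming < displaced);

    /* shift larger TIDs right, opening a hole at the sorted position */
    while (i > 0 && posting[i - 1] > incoming)
    {
        posting[i] = posting[i - 1];
        i--;
    }
    posting[i] = incoming;      /* original incoming TID lands here */

    return displaced;           /* becomes the final new item to insert */
}
```

For example, inserting TID 15 into the posting list {10, 20, 30} leaves the
list as {10, 15, 20} and returns 30 as the item actually inserted. The
real code must of course also handle WAL logging and the
posting-split-with-page-split high key case discussed above, which this
sketch deliberately omits.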