Fix and clarify comments on replacement selection.

These were modified by the patch to only use replacement selection for the
first run in an external sort.
Heikki Linnakangas 2016-09-15 11:51:43 +03:00
parent 5e1431f94e
commit 18ae680632
1 changed files with 16 additions and 16 deletions

@@ -13,26 +13,26 @@
  * See Knuth, volume 3, for more than you want to know about the external
  * sorting algorithm.  Historically, we divided the input into sorted runs
  * using replacement selection, in the form of a priority tree implemented
- * as a heap (essentially his Algorithm 5.2.3H -- although that strategy is
- * often avoided altogether), but that can now only happen first the first
- * run.  We merge the runs using polyphase merge, Knuth's Algorithm
+ * as a heap (essentially his Algorithm 5.2.3H), but now we only do that
+ * for the first run, and only if the run would otherwise end up being very
+ * short.  We merge the runs using polyphase merge, Knuth's Algorithm
  * 5.4.2D.  The logical "tapes" used by Algorithm D are implemented by
  * logtape.c, which avoids space wastage by recycling disk space as soon
  * as each block is read from its "tape".
  *
- * We never form the initial runs using Knuth's recommended replacement
- * selection data structure (Algorithm 5.4.1R), because it uses a fixed
- * number of records in memory at all times.  Since we are dealing with
- * tuples that may vary considerably in size, we want to be able to vary
- * the number of records kept in memory to ensure full utilization of the
- * allowed sort memory space.  So, we keep the tuples in a variable-size
- * heap, with the next record to go out at the top of the heap.  Like
- * Algorithm 5.4.1R, each record is stored with the run number that it
- * must go into, and we use (run number, key) as the ordering key for the
- * heap.  When the run number at the top of the heap changes, we know that
- * no more records of the prior run are left in the heap.  Note that there
- * are in practice only ever two distinct run numbers, due to the greatly
- * reduced use of replacement selection in PostgreSQL 9.6.
+ * We do not use Knuth's recommended data structure (Algorithm 5.4.1R) for
+ * the replacement selection, because it uses a fixed number of records
+ * in memory at all times.  Since we are dealing with tuples that may vary
+ * considerably in size, we want to be able to vary the number of records
+ * kept in memory to ensure full utilization of the allowed sort memory
+ * space.  So, we keep the tuples in a variable-size heap, with the next
+ * record to go out at the top of the heap.  Like Algorithm 5.4.1R, each
+ * record is stored with the run number that it must go into, and we use
+ * (run number, key) as the ordering key for the heap.  When the run number
+ * at the top of the heap changes, we know that no more records of the prior
+ * run are left in the heap.  Note that there are in practice only ever two
+ * distinct run numbers, because since PostgreSQL 9.6, we only use
+ * replacement selection to form the first run.
  *
  * In PostgreSQL 9.6, a heap (based on Knuth's Algorithm H, with some small
  * customizations) is only used with the aim of producing just one run,
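
Note for readers of this diff: the comment describes a heap ordered by (run number, key), where a change of run number at the top of the heap marks a run boundary. The following is a minimal sketch of that idea in its classic multi-run form. It is not the tuplesort.c code. All names (HeapItem, replacement_selection, and the helpers) are hypothetical, and the heap here has a fixed capacity counted in records, whereas the comment describes a variable-size heap bounded by sort memory.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* One heap entry: the run the record must go into, plus its sort key. */
typedef struct
{
    int run;
    int key;
} HeapItem;

/* Order by (run number, key): all of run N sorts before anything in run N+1. */
static int
item_less(HeapItem a, HeapItem b)
{
    if (a.run != b.run)
        return a.run < b.run;
    return a.key < b.key;
}

static void
swap_items(HeapItem *a, HeapItem *b)
{
    HeapItem tmp = *a;

    *a = *b;
    *b = tmp;
}

/* Restore the min-heap property after appending an entry at index i. */
static void
sift_up(HeapItem *heap, size_t i)
{
    while (i > 0)
    {
        size_t parent = (i - 1) / 2;

        if (!item_less(heap[i], heap[parent]))
            break;
        swap_items(&heap[i], &heap[parent]);
        i = parent;
    }
}

/* Restore the min-heap property after replacing the entry at index i. */
static void
sift_down(HeapItem *heap, size_t n, size_t i)
{
    for (;;)
    {
        size_t left = 2 * i + 1;
        size_t right = left + 1;
        size_t smallest = i;

        if (left < n && item_less(heap[left], heap[smallest]))
            smallest = left;
        if (right < n && item_less(heap[right], heap[smallest]))
            smallest = right;
        if (smallest == i)
            break;
        swap_items(&heap[i], &heap[smallest]);
        i = smallest;
    }
}

/*
 * Hypothetical driver: reads in[0..n), writes the keys in output order to
 * out_keys and the run number of each to out_runs (both of length n).
 * Returns the number of runs produced, or -1 if allocation fails.
 */
int
replacement_selection(const int *in, size_t n, size_t cap,
                      int *out_keys, int *out_runs)
{
    HeapItem *heap;
    size_t heap_n = 0, next = 0, emitted = 0;
    int last_run = 0;

    assert(cap > 0);
    if (n == 0)
        return 0;
    heap = malloc(cap * sizeof(*heap));
    if (heap == NULL)
        return -1;

    /* Initial fill: everything read so far belongs to run 0. */
    while (next < n && heap_n < cap)
    {
        heap[heap_n].run = 0;
        heap[heap_n].key = in[next++];
        sift_up(heap, heap_n);
        heap_n++;
    }

    while (heap_n > 0)
    {
        HeapItem top = heap[0];

        out_keys[emitted] = top.key;
        out_runs[emitted] = top.run;
        emitted++;
        last_run = top.run;

        if (next < n)
        {
            /* A key smaller than the one just written must wait for the next run. */
            HeapItem incoming;

            incoming.key = in[next++];
            incoming.run = (incoming.key >= top.key) ? top.run : top.run + 1;
            heap[0] = incoming;
        }
        else
            heap[0] = heap[--heap_n];   /* input exhausted: shrink the heap */
        sift_down(heap, heap_n, 0);
    }

    free(heap);
    return last_run + 1;
}
```

With the input 5, 1, 9, 2, 8, 3, 7 and a capacity of three records, the sketch emits 1, 2, 5, 8, 9 as run 0 and 3, 7 as run 1, so the first run is longer than the heap itself. At any moment the heap holds at most two distinct run numbers, the current run and the next one.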