Fix and clarify comments on replacement selection.

These were modified by the patch to only use replacement selection for the
first run in an external sort.
Heikki Linnakangas 2016-09-15 11:51:43 +03:00
parent 5e1431f94e
commit 18ae680632

@@ -13,26 +13,26 @@
  * See Knuth, volume 3, for more than you want to know about the external
  * sorting algorithm. Historically, we divided the input into sorted runs
  * using replacement selection, in the form of a priority tree implemented
- * as a heap (essentially his Algorithm 5.2.3H -- although that strategy is
- * often avoided altogether), but that can now only happen first the first
- * run. We merge the runs using polyphase merge, Knuth's Algorithm
+ * as a heap (essentially his Algorithm 5.2.3H), but now we only do that
+ * for the first run, and only if the run would otherwise end up being very
+ * short. We merge the runs using polyphase merge, Knuth's Algorithm
  * 5.4.2D. The logical "tapes" used by Algorithm D are implemented by
  * logtape.c, which avoids space wastage by recycling disk space as soon
  * as each block is read from its "tape".
  *
- * We never form the initial runs using Knuth's recommended replacement
- * selection data structure (Algorithm 5.4.1R), because it uses a fixed
- * number of records in memory at all times. Since we are dealing with
- * tuples that may vary considerably in size, we want to be able to vary
- * the number of records kept in memory to ensure full utilization of the
- * allowed sort memory space. So, we keep the tuples in a variable-size
- * heap, with the next record to go out at the top of the heap. Like
- * Algorithm 5.4.1R, each record is stored with the run number that it
- * must go into, and we use (run number, key) as the ordering key for the
- * heap. When the run number at the top of the heap changes, we know that
- * no more records of the prior run are left in the heap. Note that there
- * are in practice only ever two distinct run numbers, due to the greatly
- * reduced use of replacement selection in PostgreSQL 9.6.
+ * We do not use Knuth's recommended data structure (Algorithm 5.4.1R) for
+ * the replacement selection, because it uses a fixed number of records
+ * in memory at all times. Since we are dealing with tuples that may vary
+ * considerably in size, we want to be able to vary the number of records
+ * kept in memory to ensure full utilization of the allowed sort memory
+ * space. So, we keep the tuples in a variable-size heap, with the next
+ * record to go out at the top of the heap. Like Algorithm 5.4.1R, each
+ * record is stored with the run number that it must go into, and we use
+ * (run number, key) as the ordering key for the heap. When the run number
+ * at the top of the heap changes, we know that no more records of the prior
+ * run are left in the heap. Note that there are in practice only ever two
+ * distinct run numbers, because since PostgreSQL 9.6, we only use
+ * replacement selection to form the first run.
  *
  * In PostgreSQL 9.6, a heap (based on Knuth's Algorithm H, with some small
  * customizations) is only used with the aim of producing just one run,
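
For readers unfamiliar with the (run number, key) ordering the comment describes, here is a minimal sketch of textbook replacement selection over plain integers. It is not the tuplesort.c code: the names HeapItem, heap_push, heap_pop and replacement_selection are hypothetical, the heap has a fixed caller-supplied capacity rather than a memory-bounded variable size, and it keeps starting new runs indefinitely, whereas the comment says PostgreSQL 9.6 only uses replacement selection for the first run.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record: an integer key tagged with the run it belongs to. */
typedef struct { int run; int key; } HeapItem;

/* Heap ordering: compare by run number first, then by key. */
static int item_less(HeapItem a, HeapItem b)
{
    if (a.run != b.run)
        return a.run < b.run;
    return a.key < b.key;
}

/* Insert an item into a binary min-heap, sifting it up from the end. */
static void heap_push(HeapItem *heap, size_t *n, HeapItem item)
{
    size_t i = (*n)++;
    while (i > 0) {
        size_t parent = (i - 1) / 2;
        if (!item_less(item, heap[parent]))
            break;
        heap[i] = heap[parent];
        i = parent;
    }
    heap[i] = item;
}

/* Remove and return the smallest item; the heap must be non-empty. */
static HeapItem heap_pop(HeapItem *heap, size_t *n)
{
    HeapItem top = heap[0];
    HeapItem last = heap[--(*n)];
    size_t i = 0;
    for (;;) {
        size_t child = 2 * i + 1;
        if (child >= *n)
            break;
        if (child + 1 < *n && item_less(heap[child + 1], heap[child]))
            child++;
        if (!item_less(heap[child], last))
            break;
        heap[i] = heap[child];
        i = child;
    }
    heap[i] = last;
    return top;
}

/*
 * Write the keys in output order to out_keys and each key's run number to
 * out_runs (both must hold n entries). Returns the number of runs produced.
 */
int replacement_selection(const int *input, size_t n,
                          HeapItem *heap, size_t capacity,
                          int *out_keys, int *out_runs)
{
    size_t heap_size = 0, next = 0, produced = 0;
    int last_run = -1;

    assert(capacity > 0);

    /* Fill the heap; every initial record belongs to run 0. */
    while (next < n && heap_size < capacity)
        heap_push(heap, &heap_size, (HeapItem){0, input[next++]});

    while (heap_size > 0) {
        HeapItem top = heap_pop(heap, &heap_size);
        out_keys[produced] = top.key;
        out_runs[produced] = top.run;
        produced++;
        last_run = top.run;

        if (next < n) {
            /* A key smaller than the one just written cannot join the
             * current run without breaking its order, so defer it. */
            int key = input[next++];
            int run = (key >= top.key) ? top.run : top.run + 1;
            heap_push(heap, &heap_size, (HeapItem){run, key});
        }
    }
    return last_run + 1;
}
```

Because the run number is the leading sort key, deferred records sink below every record still eligible for the current run. A change in the run number at the top of the heap therefore signals that the current run is complete, which is the property the comment relies on.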