/*-------------------------------------------------------------------------
 *
 * logtape.c
 *	  Management of "logical tapes" within temporary files.
 *
 * This module exists to support sorting via multiple merge passes (see
 * tuplesort.c).  Merging is an ideal algorithm for tape devices, but if
 * we implement it on disk by creating a separate file for each "tape",
 * there is an annoying problem: the peak space usage is at least twice
 * the volume of actual data to be sorted.  (This must be so because each
 * datum will appear in both the input and output tapes of the final
 * merge pass.)
 *
 * We can work around this problem by recognizing that any one tape
 * dataset (with the possible exception of the final output) is written
 * and read exactly once in a perfectly sequential manner.  Therefore,
 * a datum once read will not be required again, and we can recycle its
 * space for use by the new tape dataset(s) being generated.  In this way,
 * the total space usage is essentially just the actual data volume, plus
 * insignificant bookkeeping and start/stop overhead.
 *
 * Few OSes allow arbitrary parts of a file to be released back to the OS,
 * so we have to implement this space-recycling ourselves within a single
 * logical file.  logtape.c exists to perform this bookkeeping and provide
 * the illusion of N independent tape devices to tuplesort.c.  Note that
 * logtape.c itself depends on buffile.c to provide a "logical file" of
 * larger size than the underlying OS may support.
 *
 * For simplicity, we allocate and release space in the underlying file
 * in BLCKSZ-size blocks.  Space allocation boils down to keeping track
 * of which blocks in the underlying file belong to which logical tape,
 * plus any blocks that are free (recycled and not yet reused).
 *
 * The blocks in each logical tape form a chain, with a prev- and next-
 * pointer in each block.
 *
 * The initial write pass is guaranteed to fill the underlying file
 * perfectly sequentially, no matter how data is divided into logical tapes.
 * Once we begin merge passes, the access pattern becomes considerably
 * less predictable --- but the seeking involved should be comparable to
 * what would happen if we kept each logical tape in a separate file,
 * so there's no serious performance penalty paid to obtain the space
 * savings of recycling.  We try to localize the write accesses by always
 * writing to the lowest-numbered free block when we have a choice; it's
 * not clear this helps much, but it can't hurt.  (XXX perhaps a LIFO
 * policy for free blocks would be better?)
 *
 * To further make the I/Os more sequential, we can use a larger buffer
 * when reading, and read multiple blocks from the same tape in one go,
 * whenever the buffer becomes empty.
 *
 * To support the above policy of writing to the lowest free block, the
 * freelist is a min heap.
 *
 * Since all the bookkeeping and buffer memory is allocated with palloc(),
 * and the underlying file(s) are made with OpenTemporaryFile, all resources
 * for a logical tape set are certain to be cleaned up even if processing
 * is aborted by ereport(ERROR).  To avoid confusion, the caller should take
 * care that all calls for a single LogicalTapeSet are made in the same
 * palloc context.
 *
 * To support parallel sort operations involving coordinated callers to
 * tuplesort.c routines across multiple workers, it is necessary to
 * concatenate each worker BufFile/tapeset into one single logical tapeset
 * managed by the leader.  Workers should have produced one final
 * materialized tape (their entire output) by the time this happens in the
 * leader.  There will always be the same number of runs as input tapes,
 * and the same number of input tapes as participants (worker
 * Tuplesortstates).
 *
 * Portions Copyright (c) 1996-2024, PostgreSQL Global Development Group
 * Portions Copyright (c) 1994, Regents of the University of California
 *
 * IDENTIFICATION
 *	  src/backend/utils/sort/logtape.c
 *
 *-------------------------------------------------------------------------
 */

#include "postgres.h"

#include <fcntl.h>

#include "storage/buffile.h"
#include "utils/builtins.h"
#include "utils/logtape.h"
#include "utils/memdebug.h"
#include "utils/memutils.h"

/*
 * A TapeBlockTrailer is stored at the end of each BLCKSZ block.
 *
 * The first block of a tape has prev == -1.  The last block of a tape
 * stores the number of valid bytes on the block, inverted, in 'next'.
 * Therefore next < 0 indicates the last block.
 */
typedef struct TapeBlockTrailer
{
	int64		prev;			/* previous block on this tape, or -1 on first
								 * block */
	int64		next;			/* next block on this tape, or # of valid
								 * bytes on last block (if < 0) */
} TapeBlockTrailer;

#define TapeBlockPayloadSize  (BLCKSZ - sizeof(TapeBlockTrailer))
#define TapeBlockGetTrailer(buf) \
	((TapeBlockTrailer *) ((char *) buf + TapeBlockPayloadSize))

#define TapeBlockIsLast(buf) (TapeBlockGetTrailer(buf)->next < 0)
#define TapeBlockGetNBytes(buf) \
	(TapeBlockIsLast(buf) ? \
	 (- TapeBlockGetTrailer(buf)->next) : TapeBlockPayloadSize)
#define TapeBlockSetNBytes(buf, nbytes) \
	(TapeBlockGetTrailer(buf)->next = -(nbytes))

/*
 * When multiple tapes are being written to concurrently (as in HashAgg),
 * avoid excessive fragmentation by preallocating block numbers to individual
 * tapes.  Each preallocation doubles in size starting at
 * TAPE_WRITE_PREALLOC_MIN blocks up to TAPE_WRITE_PREALLOC_MAX blocks.
 *
 * No filesystem operations are performed for preallocation; only the block
 * numbers are reserved.  This may lead to sparse writes, which will cause
 * ltsWriteBlock() to fill in holes with zeros.
 */
#define TAPE_WRITE_PREALLOC_MIN	8
#define TAPE_WRITE_PREALLOC_MAX	128

/*
 * This data structure represents a single "logical tape" within the set
 * of logical tapes stored in the same file.
 *
 * While writing, we hold the current partially-written data block in the
 * buffer.  While reading, we can hold multiple blocks in the buffer.  Note
 * that we don't retain the trailers of a block when it's read into the
 * buffer.  The buffer therefore contains one large contiguous chunk of data
 * from the tape.
 */
struct LogicalTape
{
	LogicalTapeSet *tapeSet;	/* tape set this tape is part of */

	bool		writing;		/* T while in write phase */
	bool		frozen;			/* T if blocks should not be freed when read */
	bool		dirty;			/* does buffer need to be written? */

	/*
	 * Block numbers of the first, current, and next block of the tape.
	 *
	 * The "current" block number is only valid when writing, or reading from
	 * a frozen tape.  (When reading from an unfrozen tape, we use a larger
	 * read buffer that holds multiple blocks, so the "current" block is
	 * ambiguous.)
	 *
	 * When concatenation of worker tape BufFiles is performed, an offset to
	 * the first block in the unified BufFile space is applied during reads.
	 */
	int64		firstBlockNumber;
	int64		curBlockNumber;
	int64		nextBlockNumber;
	int64		offsetBlockNumber;

	/*
	 * Buffer for current data block(s).
	 */
	char	   *buffer;			/* physical buffer (separately palloc'd) */
	int			buffer_size;	/* allocated size of the buffer */
	int			max_size;		/* highest useful, safe buffer_size */
	int			pos;			/* next read/write position in buffer */
	int			nbytes;			/* total # of valid bytes in buffer */

	/*
	 * Preallocated block numbers are held in an array sorted in descending
	 * order; blocks are consumed from the end of the array (lowest block
	 * numbers first).
	 */
	int64	   *prealloc;
	int			nprealloc;		/* number of elements in list */
	int			prealloc_size;	/* number of elements list can hold */
};

/*
 * This data structure represents a set of related "logical tapes" sharing
 * space in a single underlying file.  (But that "file" may be multiple files
 * if needed to escape OS limits on file size; buffile.c handles that for us.)
 * Tapes belonging to a tape set can be created and destroyed on-the-fly, on
 * demand.
 */
struct LogicalTapeSet
{
	BufFile    *pfile;			/* underlying file for whole tape set */
	SharedFileSet *fileset;
	int			worker;			/* worker # if shared, -1 for leader/serial */

	/*
	 * File size tracking.  nBlocksWritten is the size of the underlying file,
	 * in BLCKSZ blocks.  nBlocksAllocated is the number of blocks allocated
	 * by ltsGetBlock(), and it is always greater than or equal to
	 * nBlocksWritten.  Blocks between nBlocksWritten and nBlocksAllocated are
	 * blocks that have been allocated for a tape, but have not been written
	 * to the underlying file yet.  nHoleBlocks tracks the total number of
	 * blocks that are in unused holes between worker spaces following BufFile
	 * concatenation.
	 */
	int64		nBlocksAllocated;	/* # of blocks allocated */
	int64		nBlocksWritten; /* # of blocks used in underlying file */
	int64		nHoleBlocks;	/* # of "hole" blocks left */

	/*
	 * We store the numbers of recycled-and-available blocks in freeBlocks[].
	 * When there are no such blocks, we extend the underlying file.
	 *
	 * If forgetFreeSpace is true then any freed blocks are simply forgotten
	 * rather than being remembered in freeBlocks[].  See notes for
	 * LogicalTapeSetForgetFreeSpace().
	 */
	bool		forgetFreeSpace;	/* are we remembering free blocks? */
	int64	   *freeBlocks;		/* resizable array holding minheap */
	int64		nFreeBlocks;	/* # of currently free blocks */
	Size		freeBlocksLen;	/* current allocated length of freeBlocks[] */
	bool		enable_prealloc;	/* preallocate write blocks? */
};

static LogicalTape *ltsCreateTape(LogicalTapeSet *lts);
static void ltsWriteBlock(LogicalTapeSet *lts, int64 blocknum, const void *buffer);
static void ltsReadBlock(LogicalTapeSet *lts, int64 blocknum, void *buffer);
static int64 ltsGetBlock(LogicalTapeSet *lts, LogicalTape *lt);
static int64 ltsGetFreeBlock(LogicalTapeSet *lts);
static int64 ltsGetPreallocBlock(LogicalTapeSet *lts, LogicalTape *lt);
static void ltsReleaseBlock(LogicalTapeSet *lts, int64 blocknum);
static void ltsInitReadBuffer(LogicalTape *lt);


/*
 * Write a block-sized buffer to the specified block of the underlying file.
 *
 * No need for an error return convention; we ereport() on any error.
 */
static void
ltsWriteBlock(LogicalTapeSet *lts, int64 blocknum, const void *buffer)
{
	/*
	 * BufFile does not support "holes", so if we're about to write a block
	 * that's past the current end of file, fill the space between the current
	 * end of file and the target block with zeros.
	 *
	 * This can happen either when tapes preallocate blocks, or for the last
	 * block of a tape, which might not have been flushed.
	 *
	 * Note that BufFile concatenation can leave "holes" in BufFile between
	 * worker-owned block ranges.  These are tracked for reporting purposes
	 * only.  We never read from nor write to these hole blocks, and so they
	 * are not considered here.
	 */
	while (blocknum > lts->nBlocksWritten)
	{
		PGIOAlignedBlock zerobuf;

		MemSet(zerobuf.data, 0, sizeof(zerobuf));

		ltsWriteBlock(lts, lts->nBlocksWritten, zerobuf.data);
	}

	/* Write the requested block */
	if (BufFileSeekBlock(lts->pfile, blocknum) != 0)
		ereport(ERROR,
				(errcode_for_file_access(),
				 errmsg("could not seek to block %lld of temporary file",
						(long long) blocknum)));
	BufFileWrite(lts->pfile, buffer, BLCKSZ);

	/* Update nBlocksWritten, if we extended the file */
	if (blocknum == lts->nBlocksWritten)
		lts->nBlocksWritten++;
}

/*
 * Read a block-sized buffer from the specified block of the underlying file.
 *
 * No need for an error return convention; we ereport() on any error.  This
 * module should never attempt to read a block it doesn't know is there.
 */
static void
ltsReadBlock(LogicalTapeSet *lts, int64 blocknum, void *buffer)
{
	if (BufFileSeekBlock(lts->pfile, blocknum) != 0)
		ereport(ERROR,
				(errcode_for_file_access(),
				 errmsg("could not seek to block %lld of temporary file",
						(long long) blocknum)));
	BufFileReadExact(lts->pfile, buffer, BLCKSZ);
}

/*
 * Read as many blocks as we can into the per-tape buffer.
 *
 * Returns true if anything was read, false on EOF.
 */
static bool
ltsReadFillBuffer(LogicalTape *lt)
{
	lt->pos = 0;
	lt->nbytes = 0;

	do
	{
		char	   *thisbuf = lt->buffer + lt->nbytes;
		int64		datablocknum = lt->nextBlockNumber;

		/* Fetch next block number */
		if (datablocknum == -1L)
			break;				/* EOF */
		/* Apply worker offset, needed for leader tapesets */
		datablocknum += lt->offsetBlockNumber;
2016-10-03 12:37:49 +02:00
|
|
|
|
|
|
|
/* Read the block */
|
2022-12-30 10:02:59 +01:00
|
|
|
ltsReadBlock(lt->tapeSet, datablocknum, thisbuf);
|
Change the way pre-reading in external sort's merge phase works.
Don't pre-read tuples into SortTuple slots during merge. Instead, use the
memory for larger read buffers in logtape.c. We're doing the same number
of READTUP() calls either way, but managing the pre-read SortTuple slots
is much more complicated. Also, the on-tape representation is more compact
than SortTuples, so we can fit more pre-read tuples into the same amount
of memory this way. And we have better cache-locality, when we use just a
small number of SortTuple slots.
Now that we only hold one tuple from each tape in the SortTuple slots, we
can greatly simplify the "batch memory" management. We now maintain a
small set of fixed-sized slots, to hold the tuples, and fall back to
palloc() for larger tuples. We use this method during all merge phases,
not just the final merge, and also when randomAccess is requested, and
also in the TSS_SORTEDONTAPE case. In other words, it's used whenever we
do an external sort.
Reviewed by Peter Geoghegan and Claudio Freire.
Discussion: <CAM3SWZTpaORV=yQGVCG8Q4axcZ3MvF-05xe39ZvORdU9JcD6hQ@mail.gmail.com>
2016-10-03 12:37:49 +02:00
|
|
|
if (!lt->frozen)
|
2021-10-18 13:30:00 +02:00
|
|
|
ltsReleaseBlock(lt->tapeSet, datablocknum);
|
2016-12-22 17:45:00 +01:00
|
|
|
lt->curBlockNumber = lt->nextBlockNumber;
|
Change the way pre-reading in external sort's merge phase works.
Don't pre-read tuples into SortTuple slots during merge. Instead, use the
memory for larger read buffers in logtape.c. We're doing the same number
of READTUP() calls either way, but managing the pre-read SortTuple slots
is much more complicated. Also, the on-tape representation is more compact
than SortTuples, so we can fit more pre-read tuples into the same amount
of memory this way. And we have better cache-locality, when we use just a
small number of SortTuple slots.
Now that we only hold one tuple from each tape in the SortTuple slots, we
can greatly simplify the "batch memory" management. We now maintain a
small set of fixed-sized slots, to hold the tuples, and fall back to
palloc() for larger tuples. We use this method during all merge phases,
not just the final merge, and also when randomAccess is requested, and
also in the TSS_SORTEDONTAPE case. In other words, it's used whenever we
do an external sort.
Reviewed by Peter Geoghegan and Claudio Freire.
Discussion: <CAM3SWZTpaORV=yQGVCG8Q4axcZ3MvF-05xe39ZvORdU9JcD6hQ@mail.gmail.com>
2016-10-03 12:37:49 +02:00
|
|
|
|
2016-12-22 17:45:00 +01:00
|
|
|
lt->nbytes += TapeBlockGetNBytes(thisbuf);
|
|
|
|
if (TapeBlockIsLast(thisbuf))
|
Change the way pre-reading in external sort's merge phase works.
Don't pre-read tuples into SortTuple slots during merge. Instead, use the
memory for larger read buffers in logtape.c. We're doing the same number
of READTUP() calls either way, but managing the pre-read SortTuple slots
is much more complicated. Also, the on-tape representation is more compact
than SortTuples, so we can fit more pre-read tuples into the same amount
of memory this way. And we have better cache-locality, when we use just a
small number of SortTuple slots.
Now that we only hold one tuple from each tape in the SortTuple slots, we
can greatly simplify the "batch memory" management. We now maintain a
small set of fixed-sized slots, to hold the tuples, and fall back to
palloc() for larger tuples. We use this method during all merge phases,
not just the final merge, and also when randomAccess is requested, and
also in the TSS_SORTEDONTAPE case. In other words, it's used whenever we
do an external sort.
Reviewed by Peter Geoghegan and Claudio Freire.
Discussion: <CAM3SWZTpaORV=yQGVCG8Q4axcZ3MvF-05xe39ZvORdU9JcD6hQ@mail.gmail.com>
2016-10-03 12:37:49 +02:00
|
|
|
{
|
2016-12-22 17:45:00 +01:00
|
|
|
lt->nextBlockNumber = -1L;
|
Change the way pre-reading in external sort's merge phase works.
Don't pre-read tuples into SortTuple slots during merge. Instead, use the
memory for larger read buffers in logtape.c. We're doing the same number
of READTUP() calls either way, but managing the pre-read SortTuple slots
is much more complicated. Also, the on-tape representation is more compact
than SortTuples, so we can fit more pre-read tuples into the same amount
of memory this way. And we have better cache-locality, when we use just a
small number of SortTuple slots.
Now that we only hold one tuple from each tape in the SortTuple slots, we
can greatly simplify the "batch memory" management. We now maintain a
small set of fixed-sized slots, to hold the tuples, and fall back to
palloc() for larger tuples. We use this method during all merge phases,
not just the final merge, and also when randomAccess is requested, and
also in the TSS_SORTEDONTAPE case. In other words, it's used whenever we
do an external sort.
Reviewed by Peter Geoghegan and Claudio Freire.
Discussion: <CAM3SWZTpaORV=yQGVCG8Q4axcZ3MvF-05xe39ZvORdU9JcD6hQ@mail.gmail.com>
2016-10-03 12:37:49 +02:00
|
|
|
/* EOF */
|
|
|
|
break;
|
|
|
|
}
|
2016-12-22 17:45:00 +01:00
|
|
|
else
|
|
|
|
lt->nextBlockNumber = TapeBlockGetTrailer(thisbuf)->next;
|
Change the way pre-reading in external sort's merge phase works.
Don't pre-read tuples into SortTuple slots during merge. Instead, use the
memory for larger read buffers in logtape.c. We're doing the same number
of READTUP() calls either way, but managing the pre-read SortTuple slots
is much more complicated. Also, the on-tape representation is more compact
than SortTuples, so we can fit more pre-read tuples into the same amount
of memory this way. And we have better cache-locality, when we use just a
small number of SortTuple slots.
Now that we only hold one tuple from each tape in the SortTuple slots, we
can greatly simplify the "batch memory" management. We now maintain a
small set of fixed-sized slots, to hold the tuples, and fall back to
palloc() for larger tuples. We use this method during all merge phases,
not just the final merge, and also when randomAccess is requested, and
also in the TSS_SORTEDONTAPE case. In other words, it's used whenever we
do an external sort.
Reviewed by Peter Geoghegan and Claudio Freire.
Discussion: <CAM3SWZTpaORV=yQGVCG8Q4axcZ3MvF-05xe39ZvORdU9JcD6hQ@mail.gmail.com>
2016-10-03 12:37:49 +02:00
|
|
|
|
|
|
|
/* Advance to next block, if we have buffer space left */
|
2016-12-22 17:45:00 +01:00
|
|
|
} while (lt->buffer_size - lt->nbytes > BLCKSZ);
|
Change the way pre-reading in external sort's merge phase works.
Don't pre-read tuples into SortTuple slots during merge. Instead, use the
memory for larger read buffers in logtape.c. We're doing the same number
of READTUP() calls either way, but managing the pre-read SortTuple slots
is much more complicated. Also, the on-tape representation is more compact
than SortTuples, so we can fit more pre-read tuples into the same amount
of memory this way. And we have better cache-locality, when we use just a
small number of SortTuple slots.
Now that we only hold one tuple from each tape in the SortTuple slots, we
can greatly simplify the "batch memory" management. We now maintain a
small set of fixed-sized slots, to hold the tuples, and fall back to
palloc() for larger tuples. We use this method during all merge phases,
not just the final merge, and also when randomAccess is requested, and
also in the TSS_SORTEDONTAPE case. In other words, it's used whenever we
do an external sort.
Reviewed by Peter Geoghegan and Claudio Freire.
Discussion: <CAM3SWZTpaORV=yQGVCG8Q4axcZ3MvF-05xe39ZvORdU9JcD6hQ@mail.gmail.com>
2016-10-03 12:37:49 +02:00
|
|
|
|
|
|
|
return (lt->nbytes > 0);
|
|
|
|
}

static inline uint64
left_offset(uint64 i)
{
	return 2 * i + 1;
}

static inline uint64
right_offset(uint64 i)
{
	return 2 * i + 2;
}

static inline uint64
parent_offset(uint64 i)
{
	return (i - 1) / 2;
}

/*
 * Get the next block for writing.
 */
static int64
ltsGetBlock(LogicalTapeSet *lts, LogicalTape *lt)
{
	if (lts->enable_prealloc)
		return ltsGetPreallocBlock(lts, lt);
	else
		return ltsGetFreeBlock(lts);
}

/*
 * Select the lowest currently unused block from the tape set's global free
 * list min heap.
 */
static int64
ltsGetFreeBlock(LogicalTapeSet *lts)
{
	int64	   *heap = lts->freeBlocks;
	int64		blocknum;
	int64		heapsize;
	int64		holeval;
	uint64		holepos;

	/* freelist empty; allocate a new block */
	if (lts->nFreeBlocks == 0)
		return lts->nBlocksAllocated++;

	/* easy if heap contains one element */
	if (lts->nFreeBlocks == 1)
	{
		lts->nFreeBlocks--;
		return lts->freeBlocks[0];
	}

	/* remove top of minheap */
	blocknum = heap[0];

	/* we'll replace it with end of minheap array */
	holeval = heap[--lts->nFreeBlocks];

	/* sift down */
	holepos = 0;				/* holepos is where the "hole" is */
	heapsize = lts->nFreeBlocks;
	while (true)
	{
		uint64		left = left_offset(holepos);
		uint64		right = right_offset(holepos);
		uint64		min_child;

		if (left < heapsize && right < heapsize)
			min_child = (heap[left] < heap[right]) ? left : right;
		else if (left < heapsize)
			min_child = left;
		else if (right < heapsize)
			min_child = right;
		else
			break;

		if (heap[min_child] >= holeval)
			break;

		heap[holepos] = heap[min_child];
		holepos = min_child;
	}
	heap[holepos] = holeval;

	return blocknum;
}

/*
 * Return the lowest free block number from the tape's preallocation list.
 * Refill the preallocation list with blocks from the tape set's free list if
 * necessary.
 */
static int64
ltsGetPreallocBlock(LogicalTapeSet *lts, LogicalTape *lt)
{
	/* sorted in descending order, so return the last element */
	if (lt->nprealloc > 0)
		return lt->prealloc[--lt->nprealloc];

	if (lt->prealloc == NULL)
	{
		lt->prealloc_size = TAPE_WRITE_PREALLOC_MIN;
		lt->prealloc = (int64 *) palloc(sizeof(int64) * lt->prealloc_size);
	}
	else if (lt->prealloc_size < TAPE_WRITE_PREALLOC_MAX)
	{
		/* when the preallocation list runs out, double the size */
		lt->prealloc_size *= 2;
		if (lt->prealloc_size > TAPE_WRITE_PREALLOC_MAX)
			lt->prealloc_size = TAPE_WRITE_PREALLOC_MAX;
		lt->prealloc = (int64 *) repalloc(lt->prealloc,
										  sizeof(int64) * lt->prealloc_size);
	}

	/* refill preallocation list */
	lt->nprealloc = lt->prealloc_size;
	for (int i = lt->nprealloc; i > 0; i--)
	{
		lt->prealloc[i - 1] = ltsGetFreeBlock(lts);

		/* verify descending order */
		Assert(i == lt->nprealloc || lt->prealloc[i - 1] > lt->prealloc[i]);
	}

	return lt->prealloc[--lt->nprealloc];
}

/*
 * Return a block# to the freelist.
 */
static void
ltsReleaseBlock(LogicalTapeSet *lts, int64 blocknum)
{
	int64	   *heap;
	uint64		holepos;

	/*
	 * Do nothing if we're no longer interested in remembering free space.
	 */
	if (lts->forgetFreeSpace)
		return;

	/*
	 * Enlarge freeBlocks array if full.
	 */
	if (lts->nFreeBlocks >= lts->freeBlocksLen)
	{
		/*
		 * If the freelist becomes very large, just return and leak this free
		 * block.
		 */
		if (lts->freeBlocksLen * 2 * sizeof(int64) > MaxAllocSize)
			return;

		lts->freeBlocksLen *= 2;
		lts->freeBlocks = (int64 *) repalloc(lts->freeBlocks,
											 lts->freeBlocksLen * sizeof(int64));
	}

	/* create a "hole" at end of minheap array */
	heap = lts->freeBlocks;
	holepos = lts->nFreeBlocks;
	lts->nFreeBlocks++;

	/* sift up to insert blocknum */
	while (holepos != 0)
	{
		uint64		parent = parent_offset(holepos);

		if (heap[parent] < blocknum)
			break;

		heap[holepos] = heap[parent];
		holepos = parent;
	}
	heap[holepos] = blocknum;
}

/*
 * Lazily allocate and initialize the read buffer. This avoids waste when many
 * tapes are open at once, but not all are active between rewinding and
 * reading.
 */
static void
ltsInitReadBuffer(LogicalTape *lt)
{
	Assert(lt->buffer_size > 0);
	lt->buffer = palloc(lt->buffer_size);

	/* Read the first block, or reset if tape is empty */
	lt->nextBlockNumber = lt->firstBlockNumber;
	lt->pos = 0;
	lt->nbytes = 0;
	ltsReadFillBuffer(lt);
}

/*
 * Create a tape set, backed by a temporary underlying file.
 *
 * The tape set is initially empty. Use LogicalTapeCreate() to create
 * tapes in it.
 *
 * In a single-process sort, pass NULL argument for fileset, and -1 for
 * worker.
 *
 * In a parallel sort, parallel workers pass the shared fileset handle and
 * their own worker number. After the workers have finished, create the
 * tape set in the leader, passing the shared fileset handle and -1 for
 * worker, and use LogicalTapeImport() to import the worker tapes into it.
 *
 * Currently, the leader will only import worker tapes into the set, it does
 * not create tapes of its own, although in principle that should work.
 *
 * If preallocate is true, blocks for each individual tape are allocated in
 * batches. This avoids fragmentation when writing multiple tapes at the
 * same time.
 */
LogicalTapeSet *
LogicalTapeSetCreate(bool preallocate, SharedFileSet *fileset, int worker)
{
	LogicalTapeSet *lts;

	/*
	 * Create top-level struct including per-tape LogicalTape structs.
	 */
	lts = (LogicalTapeSet *) palloc(sizeof(LogicalTapeSet));
	lts->nBlocksAllocated = 0L;
	lts->nBlocksWritten = 0L;
	lts->nHoleBlocks = 0L;
	lts->forgetFreeSpace = false;
	lts->freeBlocksLen = 32;	/* reasonable initial guess */
	lts->freeBlocks = (int64 *) palloc(lts->freeBlocksLen * sizeof(int64));
	lts->nFreeBlocks = 0;
	lts->enable_prealloc = preallocate;

	lts->fileset = fileset;
	lts->worker = worker;

	/*
	 * Create temp BufFile storage as required.
	 *
	 * In leader, we hijack the BufFile of the first tape that's imported, and
	 * concatenate the BufFiles of any subsequent tapes to that. Hence don't
	 * create a BufFile here. Things are simpler for the worker case and the
	 * serial case, though. They are generally very similar -- workers use a
	 * shared fileset, whereas serial sorts use a conventional serial BufFile.
	 */
	if (fileset && worker == -1)
		lts->pfile = NULL;
	else if (fileset)
	{
		char		filename[MAXPGPATH];

		pg_itoa(worker, filename);
		lts->pfile = BufFileCreateFileSet(&fileset->fs, filename);
	}
	else
		lts->pfile = BufFileCreateTemp(false);

	return lts;
}

/*
 * Claim ownership of a logical tape from an existing shared BufFile.
 *
 * Caller should be leader process. Though tapes are marked as frozen in
 * workers, they are not frozen when opened within leader, since unfrozen tapes
 * use a larger read buffer. (Frozen tapes have smaller read buffer, optimized
 * for random access.)
 */
LogicalTape *
LogicalTapeImport(LogicalTapeSet *lts, int worker, TapeShare *shared)
{
	LogicalTape *lt;
	int64		tapeblocks;
	char		filename[MAXPGPATH];
	BufFile    *file;
	int64		filesize;

	lt = ltsCreateTape(lts);

	/*
	 * build concatenated view of all buffiles, remembering the block number
	 * where each source file begins.
	 */
	pg_itoa(worker, filename);
	file = BufFileOpenFileSet(&lts->fileset->fs, filename, O_RDONLY, false);
	filesize = BufFileSize(file);

	/*
	 * Stash first BufFile, and concatenate subsequent BufFiles to that. Store
	 * block offset into each tape as we go.
	 */
	lt->firstBlockNumber = shared->firstblocknumber;
	if (lts->pfile == NULL)
	{
		lts->pfile = file;
		lt->offsetBlockNumber = 0L;
	}
	else
	{
		lt->offsetBlockNumber = BufFileAppend(lts->pfile, file);
	}
	/* Don't allocate more for read buffer than could possibly help */
	lt->max_size = Min(MaxAllocSize, filesize);
	tapeblocks = filesize / BLCKSZ;

	/*
	 * Update # of allocated blocks and # blocks written to reflect the
	 * imported BufFile. Allocated/written blocks include space used by holes
	 * left between concatenated BufFiles. Also track the number of hole
	 * blocks so that we can later work backwards to calculate the number of
	 * physical blocks for instrumentation.
	 */
	lts->nHoleBlocks += lt->offsetBlockNumber - lts->nBlocksAllocated;

	lts->nBlocksAllocated = lt->offsetBlockNumber + tapeblocks;
	lts->nBlocksWritten = lts->nBlocksAllocated;

	return lt;
}

/*
 * Close a logical tape set and release all resources.
 *
 * NOTE: This doesn't close any of the tapes!  You must close them
 * first, or you can let them be destroyed along with the memory context.
 */
void
LogicalTapeSetClose(LogicalTapeSet *lts)
{
	BufFileClose(lts->pfile);
	pfree(lts->freeBlocks);
	pfree(lts);
}

/*
 * Create a logical tape in the given tapeset.
 *
 * The tape is initialized in write state.
 */
LogicalTape *
LogicalTapeCreate(LogicalTapeSet *lts)
{
	/*
	 * The only thing that currently prevents creating new tapes in leader is
	 * the fact that BufFiles opened using BufFileOpenFileSet() are read-only
	 * by definition, but that could be changed if it seemed worthwhile. For
	 * now, writing to the leader tape will raise a "Bad file descriptor"
	 * error, so tuplesort must avoid writing to the leader tape altogether.
	 */
	if (lts->fileset && lts->worker == -1)
		elog(ERROR, "cannot create new tapes in leader process");

	return ltsCreateTape(lts);
}

static LogicalTape *
ltsCreateTape(LogicalTapeSet *lts)
{
	LogicalTape *lt;

	/*
	 * Create per-tape struct.  Note we allocate the I/O buffer lazily.
	 */
	lt = palloc(sizeof(LogicalTape));
	lt->tapeSet = lts;
	lt->writing = true;
	lt->frozen = false;
	lt->dirty = false;
	lt->firstBlockNumber = -1L;
	lt->curBlockNumber = -1L;
	lt->nextBlockNumber = -1L;
	lt->offsetBlockNumber = 0L;
	lt->buffer = NULL;
	lt->buffer_size = 0;
	/* palloc() larger than MaxAllocSize would fail */
	lt->max_size = MaxAllocSize;
	lt->pos = 0;
	lt->nbytes = 0;
	lt->prealloc = NULL;
	lt->nprealloc = 0;
	lt->prealloc_size = 0;

	return lt;
}
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Close a logical tape.
|
|
|
|
*
|
|
|
|
* Note: This doesn't return any blocks to the free list! You must read
|
|
|
|
* the tape to the end first, to reuse the space. In current use, though,
|
|
|
|
* we only close tapes after fully reading them.
|
|
|
|
*/
|
|
|
|
void
|
|
|
|
LogicalTapeClose(LogicalTape *lt)
|
|
|
|
{
|
|
|
|
if (lt->buffer)
|
|
|
|
pfree(lt->buffer);
|
|
|
|
pfree(lt);
|
|
|
|
}
|
|
|
|
|
2006-03-07 20:06:50 +01:00
|
|
|
/*
|
|
|
|
* Mark a logical tape set as not needing management of free space anymore.
|
|
|
|
*
|
|
|
|
* This should be called if the caller does not intend to write any more data
|
|
|
|
* into the tape set, but is reading from un-frozen tapes. Since no more
|
|
|
|
* writes are planned, remembering free blocks is no longer useful. Setting
|
|
|
|
* this flag lets us avoid wasting time and space in ltsReleaseBlock(), which
|
|
|
|
* is not designed to handle large numbers of free blocks.
|
|
|
|
*/
|
|
|
|
void
|
|
|
|
LogicalTapeSetForgetFreeSpace(LogicalTapeSet *lts)
|
|
|
|
{
|
|
|
|
lts->forgetFreeSpace = true;
|
|
|
|
}
|
|
|
|
|
1999-10-16 21:49:28 +02:00
|
|
|
/*
|
|
|
|
* Write to a logical tape.
|
|
|
|
*
|
2003-07-25 22:18:01 +02:00
|
|
|
* There are no error returns; we ereport() on failure.
|
1999-10-16 21:49:28 +02:00
|
|
|
*/
|
|
|
|
void
|
2022-12-30 10:02:59 +01:00
|
|
|
LogicalTapeWrite(LogicalTape *lt, const void *ptr, size_t size)
|
1999-10-16 21:49:28 +02:00
|
|
|
{
|
2021-10-18 13:30:00 +02:00
|
|
|
LogicalTapeSet *lts = lt->tapeSet;
|
1999-10-16 21:49:28 +02:00
|
|
|
size_t nthistime;
|
|
|
|
|
|
|
|
Assert(lt->writing);
|
Support parallel btree index builds.
To make this work, tuplesort.c and logtape.c must also support
parallelism, so this patch adds that infrastructure and then applies
it to the particular case of parallel btree index builds. Testing
to date shows that this can often be 2-3x faster than a serial
index build.
The model for deciding how many workers to use is fairly primitive
at present, but it's better than not having the feature. We can
refine it as we get more experience.
Peter Geoghegan with some help from Rushabh Lathia. While Heikki
Linnakangas is not an author of this patch, he wrote other patches
without which this feature would not have been possible, and
therefore the release notes should possibly credit him as an author
of this feature. Reviewed by Claudio Freire, Heikki Linnakangas,
Thomas Munro, Tels, Amit Kapila, me.
Discussion: http://postgr.es/m/CAM3SWZQKM=Pzc=CAHzRixKjp2eO5Q0Jg1SoFQqeXFQ647JiwqQ@mail.gmail.com
Discussion: http://postgr.es/m/CAH2-Wz=AxWqDoVvGU7dq856S4r6sJAj6DBn7VMtigkB33N5eyg@mail.gmail.com
2018-02-02 19:25:55 +01:00
|
|
|
Assert(lt->offsetBlockNumber == 0L);
|
1999-10-16 21:49:28 +02:00
|
|
|
|
2016-12-22 17:45:00 +01:00
|
|
|
/* Allocate data buffer and first block on first write */
|
2006-02-19 06:58:36 +01:00
|
|
|
if (lt->buffer == NULL)
|
Change the way pre-reading in external sort's merge phase works.
Don't pre-read tuples into SortTuple slots during merge. Instead, use the
memory for larger read buffers in logtape.c. We're doing the same number
of READTUP() calls either way, but managing the pre-read SortTuple slots
is much more complicated. Also, the on-tape representation is more compact
than SortTuples, so we can fit more pre-read tuples into the same amount
of memory this way. And we have better cache-locality, when we use just a
small number of SortTuple slots.
Now that we only hold one tuple from each tape in the SortTuple slots, we
can greatly simplify the "batch memory" management. We now maintain a
small set of fixed-sized slots, to hold the tuples, and fall back to
palloc() for larger tuples. We use this method during all merge phases,
not just the final merge, and also when randomAccess is requested, and
also in the TSS_SORTEDONTAPE case. In other words, it's used whenever we
do an external sort.
Reviewed by Peter Geoghegan and Claudio Freire.
Discussion: <CAM3SWZTpaORV=yQGVCG8Q4axcZ3MvF-05xe39ZvORdU9JcD6hQ@mail.gmail.com>
2016-10-03 12:37:49 +02:00
|
|
|
{
|
2006-02-19 06:58:36 +01:00
|
|
|
lt->buffer = (char *) palloc(BLCKSZ);
|
Change the way pre-reading in external sort's merge phase works.
Don't pre-read tuples into SortTuple slots during merge. Instead, use the
memory for larger read buffers in logtape.c. We're doing the same number
of READTUP() calls either way, but managing the pre-read SortTuple slots
is much more complicated. Also, the on-tape representation is more compact
than SortTuples, so we can fit more pre-read tuples into the same amount
of memory this way. And we have better cache-locality, when we use just a
small number of SortTuple slots.
Now that we only hold one tuple from each tape in the SortTuple slots, we
can greatly simplify the "batch memory" management. We now maintain a
small set of fixed-sized slots, to hold the tuples, and fall back to
palloc() for larger tuples. We use this method during all merge phases,
not just the final merge, and also when randomAccess is requested, and
also in the TSS_SORTEDONTAPE case. In other words, it's used whenever we
do an external sort.
Reviewed by Peter Geoghegan and Claudio Freire.
Discussion: <CAM3SWZTpaORV=yQGVCG8Q4axcZ3MvF-05xe39ZvORdU9JcD6hQ@mail.gmail.com>
2016-10-03 12:37:49 +02:00
|
|
|
lt->buffer_size = BLCKSZ;
|
|
|
|
}
|
2016-12-22 17:45:00 +01:00
|
|
|
if (lt->curBlockNumber == -1)
|
2006-02-19 06:58:36 +01:00
|
|
|
{
|
2016-12-22 17:45:00 +01:00
|
|
|
Assert(lt->firstBlockNumber == -1);
|
|
|
|
Assert(lt->pos == 0);
|
|
|
|
|
2020-09-12 02:10:02 +02:00
|
|
|
lt->curBlockNumber = ltsGetBlock(lts, lt);
|
2016-12-22 17:45:00 +01:00
|
|
|
lt->firstBlockNumber = lt->curBlockNumber;
|
|
|
|
|
|
|
|
TapeBlockGetTrailer(lt->buffer)->prev = -1L;
|
2006-02-19 06:58:36 +01:00
|
|
|
}
|
|
|
|
|
Change the way pre-reading in external sort's merge phase works.
Don't pre-read tuples into SortTuple slots during merge. Instead, use the
memory for larger read buffers in logtape.c. We're doing the same number
of READTUP() calls either way, but managing the pre-read SortTuple slots
is much more complicated. Also, the on-tape representation is more compact
than SortTuples, so we can fit more pre-read tuples into the same amount
of memory this way. And we have better cache-locality, when we use just a
small number of SortTuple slots.
Now that we only hold one tuple from each tape in the SortTuple slots, we
can greatly simplify the "batch memory" management. We now maintain a
small set of fixed-sized slots, to hold the tuples, and fall back to
palloc() for larger tuples. We use this method during all merge phases,
not just the final merge, and also when randomAccess is requested, and
also in the TSS_SORTEDONTAPE case. In other words, it's used whenever we
do an external sort.
Reviewed by Peter Geoghegan and Claudio Freire.
Discussion: <CAM3SWZTpaORV=yQGVCG8Q4axcZ3MvF-05xe39ZvORdU9JcD6hQ@mail.gmail.com>
2016-10-03 12:37:49 +02:00
|
|
|
Assert(lt->buffer_size == BLCKSZ);
|
1999-10-16 21:49:28 +02:00
|
|
|
while (size > 0)
|
|
|
|
{
|
2020-06-07 18:14:24 +02:00
|
|
|
if (lt->pos >= (int) TapeBlockPayloadSize)
|
1999-10-16 21:49:28 +02:00
|
|
|
{
|
|
|
|
/* Buffer full, dump it out */
|
2023-11-17 03:20:53 +01:00
|
|
|
int64 nextBlockNumber;
|
2016-12-22 17:45:00 +01:00
|
|
|
|
|
|
|
if (!lt->dirty)
|
1999-10-16 21:49:28 +02:00
|
|
|
{
|
|
|
|
/* Hmm, went directly from reading to writing? */
|
2003-07-25 22:18:01 +02:00
|
|
|
elog(ERROR, "invalid logtape state: should be dirty");
|
1999-10-16 21:49:28 +02:00
|
|
|
}
|
2016-12-22 17:45:00 +01:00
|
|
|
|
|
|
|
/*
|
|
|
|
* First allocate the next block, so that we can store it in the
|
|
|
|
* 'next' pointer of this block.
|
|
|
|
*/
|
2021-10-18 13:30:00 +02:00
|
|
|
nextBlockNumber = ltsGetBlock(lt->tapeSet, lt);
|
2016-12-22 17:45:00 +01:00
|
|
|
|
|
|
|
/* set the next-pointer and dump the current block. */
|
|
|
|
TapeBlockGetTrailer(lt->buffer)->next = nextBlockNumber;
|
2022-12-30 10:02:59 +01:00
|
|
|
ltsWriteBlock(lt->tapeSet, lt->curBlockNumber, lt->buffer);
|
2016-12-22 17:45:00 +01:00
|
|
|
|
|
|
|
/* initialize the prev-pointer of the next block */
|
|
|
|
TapeBlockGetTrailer(lt->buffer)->prev = lt->curBlockNumber;
|
|
|
|
lt->curBlockNumber = nextBlockNumber;
|
1999-10-16 21:49:28 +02:00
|
|
|
lt->pos = 0;
|
|
|
|
lt->nbytes = 0;
|
|
|
|
}
|
|
|
|
|
2016-12-22 17:45:00 +01:00
|
|
|
nthistime = TapeBlockPayloadSize - lt->pos;
|
1999-10-16 21:49:28 +02:00
|
|
|
if (nthistime > size)
|
|
|
|
nthistime = size;
|
|
|
|
Assert(nthistime > 0);
|
|
|
|
|
|
|
|
memcpy(lt->buffer + lt->pos, ptr, nthistime);
|
|
|
|
|
|
|
|
lt->dirty = true;
|
|
|
|
lt->pos += nthistime;
|
|
|
|
if (lt->nbytes < lt->pos)
|
|
|
|
lt->nbytes = lt->pos;
|
2022-12-30 10:02:59 +01:00
|
|
|
ptr = (const char *) ptr + nthistime;
|
1999-10-16 21:49:28 +02:00
|
|
|
size -= nthistime;
|
|
|
|
}
|
|
|
|
}
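
/*
 * Illustrative sketch (comment only, not part of the build): each BLCKSZ
 * block dumped by LogicalTapeWrite() above carries a trailer linking it to
 * its neighbors, so each tape is a doubly linked list of blocks within the
 * underlying BufFile.  Roughly:
 *
 *     block:  [ payload bytes ............ | trailer { prev, next, nbytes } ]
 *
 *     trailer->prev = block written just before this one (-1L for the first)
 *     trailer->next = successor block, allocated via ltsGetBlock() *before*
 *                     this block is flushed, so the pointer can be written
 *                     into the trailer ahead of the flush
 *
 * This is why the loop allocates nextBlockNumber first and only then calls
 * ltsWriteBlock() on the current block.
 */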

/*
 * Rewind logical tape and switch from writing to reading.
 *
 * The tape must currently be in writing state, or "frozen" in read state.
 *
 * 'buffer_size' specifies how much memory to use for the read buffer.
 * Regardless of the argument, the actual amount of memory used is between
 * BLCKSZ and MaxAllocSize, and is a multiple of BLCKSZ.  The given value is
 * rounded down and truncated to fit those constraints, if necessary.  If the
 * tape is frozen, the 'buffer_size' argument is ignored, and a small BLCKSZ
 * byte buffer is used.
 */
void
LogicalTapeRewindForRead(LogicalTape *lt, size_t buffer_size)
{
	LogicalTapeSet *lts = lt->tapeSet;

	/*
	 * Round and cap buffer_size if needed.
	 */
	if (lt->frozen)
		buffer_size = BLCKSZ;
	else
	{
		/* need at least one block */
		if (buffer_size < BLCKSZ)
			buffer_size = BLCKSZ;

		/* palloc() larger than max_size is unlikely to be helpful */
		if (buffer_size > lt->max_size)
			buffer_size = lt->max_size;

		/* round down to BLCKSZ boundary */
		buffer_size -= buffer_size % BLCKSZ;
	}

	if (lt->writing)
	{
		/*
		 * Completion of a write phase.  Flush last partial data block, and
		 * rewind for normal (destructive) read.
		 */
		if (lt->dirty)
		{
			/*
			 * As long as we've filled the buffer at least once, its contents
			 * are entirely defined from valgrind's point of view, even
			 * though contents beyond the current end point may be stale.
			 * But it's possible - at least in the case of a parallel sort -
			 * to sort such a small amount of data that we do not fill the
			 * buffer even once.  Tell valgrind that its contents are
			 * defined, so it doesn't bleat.
			 */
			VALGRIND_MAKE_MEM_DEFINED(lt->buffer + lt->nbytes,
									  lt->buffer_size - lt->nbytes);

			TapeBlockSetNBytes(lt->buffer, lt->nbytes);
			ltsWriteBlock(lt->tapeSet, lt->curBlockNumber, lt->buffer);
		}
		lt->writing = false;
	}
	else
	{
		/*
		 * This is only OK if tape is frozen; we rewind for (another) read
		 * pass.
		 */
		Assert(lt->frozen);
	}

	if (lt->buffer)
		pfree(lt->buffer);

	/* the buffer is lazily allocated, but set the size here */
	lt->buffer = NULL;
	lt->buffer_size = buffer_size;

	/* free the preallocation list, and return unused block numbers */
	if (lt->prealloc != NULL)
	{
		for (int i = lt->nprealloc; i > 0; i--)
			ltsReleaseBlock(lts, lt->prealloc[i - 1]);
		pfree(lt->prealloc);
		lt->prealloc = NULL;
		lt->nprealloc = 0;
		lt->prealloc_size = 0;
	}
}

/*
 * Read from a logical tape.
 *
 * Early EOF is indicated by return value less than #bytes requested.
 */
size_t
LogicalTapeRead(LogicalTape *lt, void *ptr, size_t size)
{
	size_t		nread = 0;
	size_t		nthistime;

	Assert(!lt->writing);

	if (lt->buffer == NULL)
		ltsInitReadBuffer(lt);

	while (size > 0)
	{
		if (lt->pos >= lt->nbytes)
		{
			/* Try to load more data into buffer. */
			if (!ltsReadFillBuffer(lt))
				break;			/* EOF */
		}

		nthistime = lt->nbytes - lt->pos;
		if (nthistime > size)
			nthistime = size;
		Assert(nthistime > 0);

		memcpy(ptr, lt->buffer + lt->pos, nthistime);

		lt->pos += nthistime;
		ptr = (char *) ptr + nthistime;
		size -= nthistime;
		nread += nthistime;
	}

	return nread;
}

/*
 * "Freeze" the contents of a tape so that it can be read multiple times
 * and/or read backwards.  Once a tape is frozen, its contents will not
 * be released until the LogicalTapeSet is destroyed.  This is expected
 * to be used only for the final output pass of a merge.
 *
 * This *must* be called just at the end of a write pass, before the
 * tape is rewound (after rewind is too late!).  It performs a rewind
 * and switch to read mode "for free".  An immediately following rewind-
 * for-read call is OK but not necessary.
 *
 * share output argument is set with details of storage used for tape after
 * freezing, which may be passed to LogicalTapeSetCreate within leader
 * process later.  This metadata is only of interest to worker callers
 * freezing their final output for leader (single materialized tape).
 * Serial sorts should set share to NULL.
 */
void
LogicalTapeFreeze(LogicalTape *lt, TapeShare *share)
{
	LogicalTapeSet *lts = lt->tapeSet;

	Assert(lt->writing);
	Assert(lt->offsetBlockNumber == 0L);

	/*
	 * Completion of a write phase.  Flush last partial data block, and rewind
	 * for nondestructive read.
	 */
	if (lt->dirty)
	{
		/*
		 * As long as we've filled the buffer at least once, its contents are
		 * entirely defined from valgrind's point of view, even though
		 * contents beyond the current end point may be stale.  But it's
		 * possible - at least in the case of a parallel sort - to sort such
		 * a small amount of data that we do not fill the buffer even once.
		 * Tell valgrind that its contents are defined, so it doesn't bleat.
		 */
		VALGRIND_MAKE_MEM_DEFINED(lt->buffer + lt->nbytes,
								  lt->buffer_size - lt->nbytes);

		TapeBlockSetNBytes(lt->buffer, lt->nbytes);
		ltsWriteBlock(lt->tapeSet, lt->curBlockNumber, lt->buffer);
	}
	lt->writing = false;
	lt->frozen = true;

	/*
	 * The seek and backspace functions assume a single block read buffer.
	 * That's OK with current usage.  A larger buffer is helpful to make the
	 * read pattern of the backing file look more sequential to the OS, when
	 * we're reading from multiple tapes.  But at the end of a sort, when a
	 * tape is frozen, we only read from a single tape anyway.
	 */
	if (!lt->buffer || lt->buffer_size != BLCKSZ)
	{
		if (lt->buffer)
			pfree(lt->buffer);
		lt->buffer = palloc(BLCKSZ);
		lt->buffer_size = BLCKSZ;
	}

	/* Read the first block, or reset if tape is empty */
	lt->curBlockNumber = lt->firstBlockNumber;
	lt->pos = 0;
	lt->nbytes = 0;

	if (lt->firstBlockNumber == -1L)
		lt->nextBlockNumber = -1L;
	ltsReadBlock(lt->tapeSet, lt->curBlockNumber, lt->buffer);
	if (TapeBlockIsLast(lt->buffer))
		lt->nextBlockNumber = -1L;
	else
		lt->nextBlockNumber = TapeBlockGetTrailer(lt->buffer)->next;
	lt->nbytes = TapeBlockGetNBytes(lt->buffer);

	/* Handle extra steps when caller is to share its tapeset */
	if (share)
	{
		BufFileExportFileSet(lts->pfile);
		share->firstblocknumber = lt->firstBlockNumber;
	}
}
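
/*
 * Usage sketch (comment only; an illustrative caller in the style of
 * tuplesort.c, with hypothetical variable names):
 *
 *     LogicalTapeFreeze(lt, NULL);         -- end of the final write pass
 *     LogicalTapeRead(lt, buf, len);       -- read forward...
 *     LogicalTapeBackspace(lt, len);       -- ...or back up over a tuple
 *     LogicalTapeTell(lt, &blk, &off);     -- remember a position
 *     LogicalTapeSeek(lt, blk, off);       -- and return to it later
 *
 * Only a frozen tape supports backspace and seek, as the function comments
 * below explain.
 */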
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Backspace the tape a given number of bytes. (We also support a more
|
|
|
|
* general seek interface, see below.)
|
|
|
|
*
|
|
|
|
* *Only* a frozen-for-read tape can be backed up; we don't support
|
|
|
|
* random access during write, and an unfrozen read tape may have
|
|
|
|
* already discarded the desired data!
|
|
|
|
*
|
2016-12-22 17:45:00 +01:00
|
|
|
* Returns the number of bytes backed up. It can be less than the
|
|
|
|
* requested amount, if there isn't that much data before the current
|
|
|
|
* position. The tape is positioned to the beginning of the tape in
|
|
|
|
* that case.
|
1999-10-16 21:49:28 +02:00
|
|
|
*/
|
2016-12-22 17:45:00 +01:00
|
|
|
size_t
|
2021-10-18 13:30:00 +02:00
|
|
|
LogicalTapeBackspace(LogicalTape *lt, size_t size)
|
1999-10-16 21:49:28 +02:00
|
|
|
{
|
2016-12-22 17:45:00 +01:00
|
|
|
size_t seekpos = 0;
|
1999-10-16 21:49:28 +02:00
|
|
|
|
|
|
|
Assert(lt->frozen);
|
Change the way pre-reading in external sort's merge phase works.
Don't pre-read tuples into SortTuple slots during merge. Instead, use the
memory for larger read buffers in logtape.c. We're doing the same number
of READTUP() calls either way, but managing the pre-read SortTuple slots
is much more complicated. Also, the on-tape representation is more compact
than SortTuples, so we can fit more pre-read tuples into the same amount
of memory this way. And we have better cache-locality, when we use just a
small number of SortTuple slots.
Now that we only hold one tuple from each tape in the SortTuple slots, we
can greatly simplify the "batch memory" management. We now maintain a
small set of fixed-sized slots, to hold the tuples, and fall back to
palloc() for larger tuples. We use this method during all merge phases,
not just the final merge, and also when randomAccess is requested, and
also in the TSS_SORTEDONTAPE case. In other words, it's used whenever we
do an external sort.
Reviewed by Peter Geoghegan and Claudio Freire.
Discussion: <CAM3SWZTpaORV=yQGVCG8Q4axcZ3MvF-05xe39ZvORdU9JcD6hQ@mail.gmail.com>
2016-10-03 12:37:49 +02:00
|
|
|
Assert(lt->buffer_size == BLCKSZ);
|
1999-10-16 21:49:28 +02:00
|
|
|
|
2020-02-13 18:43:51 +01:00
|
|
|
if (lt->buffer == NULL)
|
2021-10-18 13:30:00 +02:00
|
|
|
ltsInitReadBuffer(lt);
|
2020-02-13 18:43:51 +01:00
|
|
|
|
1999-10-16 21:49:28 +02:00
|
|
|
/*
|
|
|
|
* Easy case for seek within current block.
|
|
|
|
*/
|
|
|
|
if (size <= (size_t) lt->pos)
|
|
|
|
{
|
|
|
|
lt->pos -= (int) size;
|
2016-12-22 17:45:00 +01:00
|
|
|
return size;
|
1999-10-16 21:49:28 +02:00
|
|
|
}
|
2000-04-12 19:17:23 +02:00
|
|
|
|
1999-10-16 21:49:28 +02:00
|
|
|
/*
|
2016-12-22 17:45:00 +01:00
|
|
|
* Not-so-easy case, have to walk back the chain of blocks. This
|
|
|
|
* implementation would be pretty inefficient for long seeks, but we
|
|
|
|
* really aren't doing that (a seek over one tuple is typical).
|
1999-10-16 21:49:28 +02:00
|
|
|
*/
|
2016-12-22 17:45:00 +01:00
|
|
|
seekpos = (size_t) lt->pos; /* part within this block */
|
|
|
|
while (size > seekpos)
|
1999-10-16 21:49:28 +02:00
|
|
|
{
|
2023-11-17 03:20:53 +01:00
|
|
|
int64 prev = TapeBlockGetTrailer(lt->buffer)->prev;
|
1999-10-16 21:49:28 +02:00
|
|
|
|
2016-12-22 17:45:00 +01:00
|
|
|
if (prev == -1L)
|
1999-10-16 21:49:28 +02:00
|
|
|
{
|
2016-12-22 17:45:00 +01:00
|
|
|
/* Tried to back up beyond the beginning of tape. */
|
|
|
|
if (lt->curBlockNumber != lt->firstBlockNumber)
|
|
|
|
elog(ERROR, "unexpected end of tape");
|
|
|
|
lt->pos = 0;
|
|
|
|
return seekpos;
|
1999-10-16 21:49:28 +02:00
|
|
|
}
|
2016-12-22 17:45:00 +01:00
|
|
|
|
2022-12-30 10:02:59 +01:00
|
|
|
ltsReadBlock(lt->tapeSet, prev, lt->buffer);
|
2016-12-22 17:45:00 +01:00
|
|
|
|
|
|
|
if (TapeBlockGetTrailer(lt->buffer)->next != lt->curBlockNumber)
|
2023-11-17 03:20:53 +01:00
|
|
|
elog(ERROR, "broken tape, next of block %lld is %lld, expected %lld",
|
|
|
|
(long long) prev,
|
|
|
|
(long long) (TapeBlockGetTrailer(lt->buffer)->next),
|
|
|
|
(long long) lt->curBlockNumber);
|
2016-12-22 17:45:00 +01:00
|
|
|
|
|
|
|
lt->nbytes = TapeBlockPayloadSize;
|
|
|
|
lt->curBlockNumber = prev;
|
|
|
|
lt->nextBlockNumber = TapeBlockGetTrailer(lt->buffer)->next;
|
|
|
|
|
|
|
|
seekpos += TapeBlockPayloadSize;
|
1999-10-16 21:49:28 +02:00
|
|
|
}
|
2016-12-22 17:45:00 +01:00
|
|
|
|
|
|
|
/*
|
|
|
|
* 'seekpos' can now be greater than 'size', because it points to the
|
|
|
|
* beginning the target block. The difference is the position within the
|
|
|
|
* page.
|
|
|
|
*/
|
|
|
|
lt->pos = seekpos - size;
|
|
|
|
return size;
|
1999-10-16 21:49:28 +02:00
|
|
|
}

/*
 * Seek to an arbitrary position in a logical tape.
 *
 * *Only* a frozen-for-read tape can be seeked.
 *
 * Must be called with a block/offset previously returned by
 * LogicalTapeTell().
 */
void
LogicalTapeSeek(LogicalTape *lt, int64 blocknum, int offset)
{
	Assert(lt->frozen);
	Assert(offset >= 0 && offset <= TapeBlockPayloadSize);
	Assert(lt->buffer_size == BLCKSZ);

	if (lt->buffer == NULL)
		ltsInitReadBuffer(lt);

	if (blocknum != lt->curBlockNumber)
	{
		ltsReadBlock(lt->tapeSet, blocknum, lt->buffer);
		lt->curBlockNumber = blocknum;
		lt->nbytes = TapeBlockPayloadSize;
		lt->nextBlockNumber = TapeBlockGetTrailer(lt->buffer)->next;
	}

	if (offset > lt->nbytes)
		elog(ERROR, "invalid tape seek position");
	lt->pos = offset;
}

/*
 * Obtain current position in a form suitable for a later LogicalTapeSeek.
 *
 * NOTE: it'd be OK to do this during write phase with intention of using
 * the position for a seek after freezing.  Not clear if anyone needs that.
 */
void
LogicalTapeTell(LogicalTape *lt, int64 *blocknum, int *offset)
{
	if (lt->buffer == NULL)
		ltsInitReadBuffer(lt);

	Assert(lt->offsetBlockNumber == 0L);

	/* With a larger buffer, 'pos' wouldn't be the same as offset within page */
	Assert(lt->buffer_size == BLCKSZ);

	*blocknum = lt->curBlockNumber;
	*offset = lt->pos;
}

/*
 * Obtain total disk space currently used by a LogicalTapeSet, in blocks.
 *
 * This should not be called while there are open write buffers; otherwise it
 * may not account for buffered data.
 */
int64
LogicalTapeSetBlocks(LogicalTapeSet *lts)
{
	return lts->nBlocksWritten - lts->nHoleBlocks;
}