performance issue during regular merge passes not only the 'final merge'
case. The original design contemplated that there would never be more
than about one free block per 'tape', hence no need for an efficient
method of keeping the free blocks sorted. But given the later addition
of merge preread behavior in tuplesort.c, there is likely to be about
work_mem worth of free blocks, which is not so small ... and for that
matter the number of tapes isn't necessarily small anymore either. So
we'd better get rid of the assumption entirely. Instead, I'm assuming
that the usage pattern will involve alternation between merge preread
and writing of a new run. This makes it reasonable to just add blocks
to the list without sorting during successive ltsReleaseBlock calls,
and then do a qsort() when we start getting ltsGetFreeBlock() calls.
Experimentation seems to confirm that there aren't many qsort calls
relative to the number of ltsReleaseBlock/ltsGetFreeBlock calls.
we are doing the final merge pass on-the-fly, and not writing the data
back onto a 'tape', the number of free blocks in the tape set will become
large, leading to a lot of time wasted in ltsReleaseBlock(). There is
really no need to track the free blocks anymore in this state, so add a
simple shutoff switch. Per report from Stefan Kaltenbrunner.
allocates the control data. The per-tape buffers are allocated only
on first use. This saves memory in situations where tuplesort.c
overestimates the number of tapes needed (ie, there are fewer runs
than tapes). Also, this makes legitimate the coding in inittapes()
that includes tape buffer space in the maximum-memory calculation:
when inittapes runs, we've already expended the whole allowed memory
on tuple storage, and so we'd better not allocate all the tape buffers
until we've flushed some tuples out of memory.
Also performed an initial run through of upgrading our Copyright date to
extend to 2005 ... first run here was very simple ... change everything
where: grep 1996-2004 && the word 'Copyright' ... scanned through the
generated list with 'less' first, and after, to make sure that I only
picked up the right entries ...
(materialization into a tuple store) discussed on pgsql-hackers earlier.
I've updated the documentation and the regression tests.
Notes on the implementation:
- I needed to change the tuple store API slightly -- it assumes that it
won't be used to hold data across transaction boundaries, so the temp
files that it uses for on-disk storage are automatically reclaimed at
end-of-transaction. I added a flag to tuplestore_begin_heap() to control
this behavior. Is changing the tuple store API in this fashion OK?
- in order to store executor results in a tuple store, I added a new
CommandDest. This works well for the most part, with one exception: the
current DestFunction API doesn't provide enough information to allow the
Executor to store results into an arbitrary tuple store (where the
particular tuple store to use is chosen by the call site of
ExecutorRun). To workaround this, I've temporarily hacked up a solution
that works, but is not ideal: since the receiveTuple DestFunction is
passed the portal name, we can use that to lookup the Portal data
structure for the cursor and then use that to get at the tuple store the
Portal is using. This unnecessarily ties the Portal code with the
tupleReceiver code, but it works...
The proper fix for this is probably to change the DestFunction API --
Tom suggested passing the full QueryDesc to the receiveTuple function.
In that case, callers of ExecutorRun could "subclass" QueryDesc to add
any additional fields that their particular CommandDest needed to get
access to. This approach would work, but I'd like to think about it for
a little bit longer before deciding which route to go. In the mean time,
the code works fine, so I don't think a fix is urgent.
- (semi-related) I added a NO SCROLL keyword to DECLARE CURSOR, and
adjusted the behavior of SCROLL in accordance with the discussion on
-hackers.
- (unrelated) Cleaned up some SGML markup in sql.sgml, copy.sgml
Neil Conway
running gcc and HP's cc with warnings cranked way up. Signed vs unsigned
comparisons, routines declared static and then defined not-static,
that kind of thing. Tedious, but perhaps useful...
a generalized module 'tuplesort.c' that can sort either HeapTuples or
IndexTuples, and is not tied to execution of a Sort node. Clean up
memory leakages in sorting, and replace nbtsort.c's private implementation
of mergesorting with calls to tuplesort.c.
recycle storage within sort temp file on a block-by-block basis. This
reduces peak disk usage to essentially just the volume of data being
sorted, whereas it had been about 4x the data volume before.