Commit Graph

427 Commits

Author SHA1 Message Date
Robert Haas 1d41739e5a Don't require sort support functions to provide a comparator.
This could be useful for datatypes like text, where we might want
to optimize for some collations but not others.  However, this patch
doesn't introduce any new sortsupport functions that work this way;
it merely revises the code so that future patches may do so.

Patch by me.  Review by Peter Geoghegan.
2014-08-06 16:06:06 -04:00
Peter Eisentraut 80ddd04b4d Fix whitespace 2014-07-11 15:12:11 -04:00
Robert Haas 9f03ca9151 Avoid copying index tuples when building an index.
The previous code, perhaps out of concern for avoid memory leaks, formed
the tuple in one memory context and then copied it to another memory
context.  However, this doesn't appear to be necessary, since
index_form_tuple and the functions it calls take precautions against
leaking memory.  In my testing, building the tuple directly inside the
sort context shaves several percent off the index build time.
Rearrange things so we do that.

Patch by me.  Review by Amit Kapila, Tom Lane, Andres Freund.
2014-07-01 10:34:42 -04:00
Tom Lane 6554656ea2 Improve tuplestore's error messages for I/O failures.
We should report the errno when we get a failure from functions like
BufFileWrite.  "ERROR: write failed" is unreasonably taciturn for a
case that's well within the realm of possibility; I've seen it a
couple times in the buildfarm recently, in situations that were
probably out-of-disk-space, but it'd be good to see the errno
to confirm it.

I think this code was originally written without assuming that
the buffile.c functions would return useful errno; but most other
callers *are* assuming that, and a quick look at the buffile code
gives no reason to suppose otherwise.

Also, a couple of the old messages were phrased on the assumption
that a short read might indicate a logic bug in tuplestore itself;
but that code's pretty well tested by now, so a filesystem-level
problem seems much more likely.
2014-06-12 18:59:06 -04:00
Bruce Momjian 0a78320057 pgindent run for 9.4
This includes removing tabs after periods in C comments, which was
applied to back branches, so this change should not effect backpatching.
2014-05-06 12:12:18 -04:00
Tom Lane e0c91a7ff0 Improve some O(N^2) behavior in window function evaluation.
Repositioning the tuplestore seek pointer in window_gettupleslot() turns
out to be a very significant expense when the window frame is sizable and
the frame end can move.  To fix, introduce a tuplestore function for
skipping an arbitrary number of tuples in one call, parallel to the one we
introduced for tuplesort objects in commit 8d65da1f.  This reduces the cost
of window_gettupleslot() to O(1) if the tuplestore has not spilled to disk.
As in the previous commit, I didn't try to do any real optimization of
tuplestore_skiptuples for the case where the tuplestore has spilled to
disk.  There is probably no practical way to get the cost to less than O(N)
anyway, but perhaps someone can think of something later.

Also fix PersistHoldablePortal() to make use of this API now that we have
it.

Based on a suggestion by Dean Rasheed, though this turns out not to look
much like his patch.
2014-04-13 13:59:17 -04:00
Bruce Momjian 7e04792a1c Update copyright for 2014
Update all files in head, and files COPYRIGHT and legal.sgml in all back
branches.
2014-01-07 16:05:30 -05:00
Tom Lane 1def747db6 Fix inadequately-tested code path in tuplesort_skiptuples().
Per report from Jeff Davis.
2013-12-24 17:13:02 -05:00
Tom Lane 8d65da1f01 Support ordered-set (WITHIN GROUP) aggregates.
This patch introduces generic support for ordered-set and hypothetical-set
aggregate functions, as well as implementations of the instances defined in
SQL:2008 (percentile_cont(), percentile_disc(), rank(), dense_rank(),
percent_rank(), cume_dist()).  We also added mode() though it is not in the
spec, as well as versions of percentile_cont() and percentile_disc() that
can compute multiple percentile values in one pass over the data.

Unlike the original submission, this patch puts full control of the sorting
process in the hands of the aggregate's support functions.  To allow the
support functions to find out how they're supposed to sort, a new API
function AggGetAggref() is added to nodeAgg.c.  This allows retrieval of
the aggregate call's Aggref node, which may have other uses beyond the
immediate need.  There is also support for ordered-set aggregates to
install cleanup callback functions, so that they can be sure that
infrastructure such as tuplesort objects gets cleaned up.

In passing, make some fixes in the recently-added support for variadic
aggregates, and make some editorial adjustments in the recent FILTER
additions for aggregates.  Also, simplify use of IsBinaryCoercible() by
allowing it to succeed whenever the target type is ANY or ANYELEMENT.
It was inconsistent that it dealt with other polymorphic target types
but not these.

Atri Sharma and Andrew Gierth; reviewed by Pavel Stehule and Vik Fearing,
and rather heavily editorialized upon by Tom Lane
2013-12-23 16:11:35 -05:00
Stephen Frost 273dcd1628 Ensure 64bit arithmetic when calculating tapeSpace
In tuplesort.c:inittapes(), we calculate tapeSpace by first figuring
out how many 'tapes' we can use (maxTapes) and then multiplying the
result by the tape buffer overhead for each.  Unfortunately, when
we are on a system with an 8-byte long, we allow work_mem to be
larger than 2GB and that allows maxTapes to be large enough that the
32bit arithmetic can overflow when multiplied against the buffer
overhead.

When this overflow happens, we end up adding the overflow to the
amount of space available, causing the amount of memory allocated to
be larger than work_mem.

Note that to reach this point, you have to set work mem to at least
24GB and be sorting a set which is at least that size.  Given that a
user who can set work_mem to 24GB could also set it even higher, if
they were looking to run the system out of memory, this isn't
considered a security issue.

This overflow risk was found by the Coverity scanner.

Back-patch to all supported branches, as this issue has existed
since before 8.4.
2013-07-14 16:26:16 -04:00
Noah Misch 79e0f87a15 Use type "int64" for memory accounting in tuplesort.c/tuplestore.c.
Commit 263865a489 switched tuplesort.c and
tuplestore.c variables representing memory usage from type "long" to
type "Size".  This was unnecessary; I thought doing so avoided overflow
scenarios on 64-bit Windows, but guc.c already limited work_mem so as to
prevent the overflow.  It was also incomplete, not touching the logic
that assumed a signed data type.  Change the affected variables to
"int64".  This is perfect for 64-bit platforms, and it reduces the need
to contemplate platform-specific overflow scenarios.  It also puts us
close to being able to support work_mem over 2 GiB on 64-bit Windows.

Per report from Andres Freund.
2013-07-04 23:13:54 -04:00
Noah Misch 263865a489 Permit super-MaxAllocSize allocations with MemoryContextAllocHuge().
The MaxAllocSize guard is convenient for most callers, because it
reduces the need for careful attention to overflow, data type selection,
and the SET_VARSIZE() limit.  A handful of callers are happy to navigate
those hazards in exchange for the ability to allocate a larger chunk.
Introduce MemoryContextAllocHuge() and repalloc_huge().  Use this in
tuplesort.c and tuplestore.c, enabling internal sorts of up to INT_MAX
tuples, a factor-of-48 increase.  In particular, B-tree index builds can
now benefit from much-larger maintenance_work_mem settings.

Reviewed by Stephen Frost, Simon Riggs and Jeff Janes.
2013-06-27 14:53:57 -04:00
Bruce Momjian 9af4159fce pgindent run for release 9.3
This is the first run of the Perl-based pgindent script.  Also update
pgindent instructions.
2013-05-29 16:58:43 -04:00
Tom Lane 991f3e5ab3 Provide database object names as separate fields in error messages.
This patch addresses the problem that applications currently have to
extract object names from possibly-localized textual error messages,
if they want to know for example which index caused a UNIQUE_VIOLATION
failure.  It adds new error message fields to the wire protocol, which
can carry the name of a table, table column, data type, or constraint
associated with the error.  (Since the protocol spec has always instructed
clients to ignore unrecognized field types, this should not create any
compatibility problem.)

Support for providing these new fields has been added to just a limited set
of error reports (mainly, those in the "integrity constraint violation"
SQLSTATE class), but we will doubtless add them to more calls in future.

Pavel Stehule, reviewed and extensively revised by Peter Geoghegan, with
additional hacking by Tom Lane.
2013-01-29 17:08:26 -05:00
Tom Lane 8ae35e9180 Improve memory space management in tuplesort and tuplestore.
The code originally just doubled the size of the tuple-pointer array so
long as that would fit in allowedMem.  This could result in failing to use
as much as half of allowedMem, if (as is typical) the last doubling attempt
didn't quite fit.  Worse, we might double the array size but be unable to
use most of the added slots, because there was no room left within the
allowedMem limit for tuples the slots should point to.  To fix, double only
so long as we've used less than half of allowedMem in total.  Then do one
more array enlargement, but scale it based on total memory consumption so
far.  This will work nicely as long as the average tuple size is reasonably
stable, and in any case should be better than the old method.

This change will result in large sort operations consuming a larger
fraction of work_mem than they typically did in the past.  The release
notes should mention that users may want to revisit their work_mem
settings, if they'd tuned those settings based on the old behavior of
sorting.

Jeff Janes, reviewed by Peter Geoghegan and Robert Haas
2013-01-17 13:12:56 -05:00
Bruce Momjian bd61a623ac Update copyrights for 2013
Fully update git head, and update back branches in ./COPYRIGHT and
legal.sgml files.
2013-01-01 17:15:01 -05:00
Alvaro Herrera 976fa10d20 Add support for easily declaring static inline functions
We already had those, but they forced modules to spell out the function
bodies twice.  Eliminate some duplicates we had already grown.

Extracted from a somewhat larger patch from Andres Freund.
2012-10-08 16:28:01 -03:00
Alvaro Herrera c219d9b0a5 Split tuple struct defs from htup.h to htup_details.h
This reduces unnecessary exposure of other headers through htup.h, which
is very widely included by many files.

I have chosen to move the function prototypes to the new file as well,
because that means htup.h no longer needs to include tupdesc.h.  In
itself this doesn't have much effect in indirect inclusion of tupdesc.h
throughout the tree, because it's also required by execnodes.h; but it's
something to explore in the future, and it seemed best to do the htup.h
change now while I'm busy with it.
2012-08-30 16:52:35 -04:00
Bruce Momjian 042d9ffc28 Run newly-configured perltidy script on Perl files.
Run on HEAD and 9.2.
2012-07-04 21:47:49 -04:00
Bruce Momjian 927d61eeff Run pgindent on 9.2 source tree in preparation for first 9.3
commit-fest.
2012-06-10 15:20:04 -04:00
Tom Lane 95b9c333b2 Further adjustment of comment about qsort_tuple. 2012-04-07 17:48:40 -04:00
Tom Lane 17b985b1a0 Fix broken comparetup_datum code.
Commit 337b6f5ecf contained the entirely
fanciful assumption that it had made comparetup_datum unreachable.
Reported and patched by Takashi Yamamoto.

Fix up some not terribly accurate/useful comments from that commit, too.
2012-04-06 16:58:50 -04:00
Robert Haas aefa6d163e Add some CHECK_FOR_INTERRUPTS() calls to the heap-sort call path.
I broke this in commit 337b6f5ecf, which
among other things arranged for quicksorts to CHECK_FOR_INTERRUPTS()
slightly less frequently.  Sadly, it also arranged for heapsorts to
CHECK_FOR_INTERRUPTS() much less frequently.  Repair.
2012-03-20 21:26:39 -04:00
Robert Haas edec8c8e00 Fix VPATH builds, broken by my recent commit to speed up tuplesorting.
The relevant commit is 337b6f5ecf.
2012-02-15 15:53:53 -05:00
Robert Haas 337b6f5ecf Speed up in-memory tuplesorting.
Per recent work by Peter Geoghegan, it's significantly faster to
tuplesort on a single sortkey if ApplySortComparator is inlined into
quicksort rather reached via a function pointer.  It's also faster
in general to have a version of quicksort which is specialized for
sorting SortTuple objects rather than objects of arbitrary size and
type.  This requires a couple of additional copies of the quicksort
logic, which in this patch are generate using a Perl script.  There
might be some benefit in adding further specializations here too,
but thus far it's not clear that those gains are worth their weight
in code footprint.
2012-02-15 12:13:32 -05:00
Robert Haas c5a03256c7 Adjust tuplesort.c based on the fact that we never use the OS's qsort().
Our own qsort_arg() implementation doesn't have the defect previously
observed to affect only QNX 4, so it seems sufficiently to assert that
it isn't broken rather than retesting.  Also, update a few comments to
clarify why it's valuable to retain a tie-break rule based on CTID
during index builds.

Peter Geoghegan, with slight tweaks by me.
2012-01-26 14:43:28 -05:00
Bruce Momjian e126958c2e Update copyright notices for year 2012. 2012-01-01 18:01:58 -05:00
Tom Lane c6e3ac11b6 Create a "sort support" interface API for faster sorting.
This patch creates an API whereby a btree index opclass can optionally
provide non-SQL-callable support functions for sorting.  In the initial
patch, we only use this to provide a directly-callable comparator function,
which can be invoked with a bit less overhead than the traditional
SQL-callable comparator.  While that should be of value in itself, the real
reason for doing this is to provide a datatype-extensible framework for
more aggressive optimizations, as in Peter Geoghegan's recent work.

Robert Haas and Tom Lane
2011-12-07 00:19:39 -05:00
Bruce Momjian 6416a82a62 Remove unnecessary #include references, per pgrminclude script. 2011-09-01 10:04:27 -04:00
Tom Lane 10db3de66e Fix failure to account for memory used by tuplestore_putvalues().
This oversight could result in a tuplestore using much more than the
intended amount of memory.  It would only happen in a code path that loaded
a tuplestore via tuplestore_putvalues(), and many of those won't emit huge
amounts of data; but cases such as holdable cursors and plpgsql's RETURN
NEXT command could have the problem.  The fix ensures that the tuplestore
will switch to write-to-disk mode when it overruns work_mem.

The potential overrun was finite, because we would still count the space
used by the tuple pointer array, so the tuplestore code would eventually
flip into write-to-disk mode anyway.  When storing wide tuples we would
go far past the expected work_mem usage before that happened; but this
may account for the lack of prior reports.

Back-patch to 8.4, where tuplestore_putvalues was introduced.

Per bug #6061 from Yann Delorme.
2011-06-15 14:05:22 -04:00
Tom Lane d64713df7e Pass collations to functions in FunctionCallInfoData, not FmgrInfo.
Since collation is effectively an argument, not a property of the function,
FmgrInfo is really the wrong place for it; and this becomes critical in
cases where a cached FmgrInfo is used for varying purposes that might need
different collation settings.  Fix by passing it in FunctionCallInfoData
instead.  In particular this allows a clean fix for bug #5970 (record_cmp
not working).  This requires touching a bit more code than the original
method, but nobody ever thought that collations would not be an invasive
patch...
2011-04-12 19:19:24 -04:00
Bruce Momjian bf50caf105 pgindent run before PG 9.1 beta 1. 2011-04-10 11:42:00 -04:00
Tom Lane 7208fae18f Clean up cruft around collation initialization for tupdescs and scankeys.
I found actual bugs in GiST and plpgsql; the rest of this is cosmetic
but meant to decrease the odds of future bugs of omission.
2011-03-26 18:28:40 -04:00
Tom Lane b310b6e31c Revise collation derivation method and expression-tree representation.
All expression nodes now have an explicit output-collation field, unless
they are known to only return a noncollatable data type (such as boolean
or record).  Also, nodes that can invoke collation-aware functions store
a separate field that is the collation value to pass to the function.
This avoids confusion that arises when a function has collatable inputs
and noncollatable output type, or vice versa.

Also, replace the parser's on-the-fly collation assignment method with
a post-pass over the completed expression tree.  This allows us to use
a more complex (and hopefully more nearly spec-compliant) assignment
rule without paying for it in extra storage in every expression node.

Fix assorted bugs in the planner's handling of collations by making
collation one of the defining properties of an EquivalenceClass and
by converting CollateExprs into discardable RelabelType nodes during
expression preprocessing.
2011-03-19 20:30:08 -04:00
Peter Eisentraut 414c5a2ea6 Per-column collation support
This adds collation support for columns and domains, a COLLATE clause
to override it per expression, and B-tree index support.

Peter Eisentraut
reviewed by Pavel Stehule, Itagaki Takahiro, Robert Haas, Noah Misch
2011-02-08 23:04:18 +02:00
Bruce Momjian 5d950e3b0c Stamp copyrights for year 2011. 2011-01-01 13:18:15 -05:00
Tom Lane 244407a710 Fix efficiency problems in tuplestore_trim().
The original coding in tuplestore_trim() was only meant to work efficiently
in cases where each trim call deleted most of the tuples in the store.
Which, in fact, was the pattern of the original usage with a Material node
supporting mark/restore operations underneath a MergeJoin.  However,
WindowAgg now uses tuplestores and it has considerably less friendly
trimming behavior.  In particular it can attempt to trim one tuple at a
time off a large tuplestore.  tuplestore_trim() had O(N^2) runtime in this
situation because of repeatedly shifting its tuple pointer array.  Fix by
avoiding shifting the array until a reasonably large number of tuples have
been deleted.  This can waste some pointer space, but we do still reclaim
the tuples themselves, so the percentage wastage should be pretty small.

Per Jie Li's report of slow percent_rank() evaluation.  cume_dist() and
ntile() would certainly be affected as well, along with any other window
function that has a moving frame start and requires reading substantially
ahead of the current row.

Back-patch to 8.4, where window functions were introduced.  There's no
need to tweak it before that.
2010-12-10 11:33:38 -05:00
Tom Lane 26a7b48e10 Eliminate some repetitive coding in tuplesort.c.
Use a macro LogicalTapeReadExact() to encapsulate the error check when
we want to read an exact number of bytes from a "tape".  Per a suggestion
of Takahiro Itagaki.
2010-10-07 20:32:21 -04:00
Tom Lane 3ba11d3df2 Teach CLUSTER to use seqscan-and-sort when it's faster than indexscan.
... or at least, when the planner's cost estimates say it will be faster.

Leonardo Francalanci, reviewed by Itagaki Takahiro and Tom Lane
2010-10-07 20:00:28 -04:00
Magnus Hagander 9f2e211386 Remove cvs keywords from all files. 2010-09-20 22:08:53 +02:00
Bruce Momjian 65e806cba1 pgindent run for 9.0 2010-02-26 02:01:40 +00:00
Bruce Momjian 0239800893 Update copyright for the year 2010. 2010-01-02 16:58:17 +00:00
Heikki Linnakangas 84d723b6ce Previous fix for temporary file management broke returning a set from
PL/pgSQL function within an exception handler. Make sure we use the right
resource owner when we create the tuplestore to hold returned tuples.

Simplify tuplestore API so that the caller doesn't need to be in the right
memory context when calling tuplestore_put* functions. tuplestore.c
automatically switches to the memory context used when the tuplestore was
created. Tuplesort was already modified like this earlier. This patch also
removes the now useless MemoryContextSwitch calls from callers.

Report by Aleksei on pgsql-bugs on Dec 22 2009. Backpatch to 8.1, like
the previous patch that broke this.
2009-12-29 17:40:59 +00:00
Tom Lane 9bd27b7c9e Extend EXPLAIN to support output in XML or JSON format.
There are probably still some adjustments to be made in the details
of the output, but this gets the basic structure in place.

Robert Haas
2009-08-10 05:46:50 +00:00
Tom Lane 527f0ae3fa Department of second thoughts: let's show the exact key during unique index
build failures, too.  Refactor a bit more since that error message isn't
spelled the same.
2009-08-01 20:59:17 +00:00
Bruce Momjian d747140279 8.4 pgindent run, with new combined Linux/FreeBSD/MinGW typedef list
provided by Andrew.
2009-06-11 14:49:15 +00:00
Tom Lane 25bf7f8b9b Fix possible failures when a tuplestore switches from in-memory to on-disk
mode while callers hold pointers to in-memory tuples.  I reported this for
the case of nodeWindowAgg's primary scan tuple, but inspection of the code
shows that all of the calls in nodeWindowAgg and nodeCtescan are at risk.
For the moment, fix it with a rather brute-force approach of copying
whenever one of the at-risk callers requests a tuple.  Later we might
think of some sort of reference-count approach to reduce tuple copying.
2009-03-27 18:30:21 +00:00
Tom Lane e04810e8c4 Code review for dtrace probes added (so far) to 8.4. Adjust placement of
some bufmgr probes, take out redundant and memory-leak-inducing path arguments
to smgr__md__read__done and smgr__md__write__done, fix bogus attempt to
recalculate space used in sort__done, clean up formatting in places where
I'm not sure pgindent will do a nice job by itself.
2009-03-11 23:19:25 +00:00
Bruce Momjian 511db38ace Update copyright for 2009. 2009-01-01 17:24:05 +00:00
Tom Lane 95b07bc7f5 Support window functions a la SQL:2008.
Hitoshi Harada, with some kibitzing from Heikki and Tom.
2008-12-28 18:54:01 +00:00
Tom Lane 38e9348282 Make a couple of small changes to the tuplestore API, for the benefit of the
upcoming window-functions patch.  First, tuplestore_trim is now an
exported function that must be explicitly invoked by callers at
appropriate times, rather than something that tuplestore tries to do
behind the scenes.  Second, a read pointer that is marked as allowing
backward scan no longer prevents truncation.  This means that a read pointer
marked as having BACKWARD but not REWIND capability can only safely read
backwards as far as the oldest other read pointer.  (The expected use pattern
for this involves having another read pointer that serves as the truncation
fencepost.)
2008-12-27 17:39:00 +00:00
Tom Lane d26bf23f34 Arrange to squeeze out the MINIMAL_TUPLE_PADDING in the tuple representation
written to temp files by tuplesort.c and tuplestore.c.  This saves 2 bytes per
row for 32-bit machines, and 6 bytes per row for 64-bit machines, which seems
worth the slight additional uglification of the tuple read/write routines.
2008-10-28 15:51:03 +00:00
Tom Lane 34f89cb4af Fix oversight in recent patch to support multiple read positions in
tuplestore: in READFILE state tuplestore_select_read_pointer must
save the current file seek position in the read pointer being
deactivated.
2008-10-07 00:05:55 +00:00
Tom Lane 44d5be0e53 Implement SQL-standard WITH clauses, including WITH RECURSIVE.
There are some unimplemented aspects: recursive queries must use UNION ALL
(should allow UNION too), and we don't have SEARCH or CYCLE clauses.
These might or might not get done for 8.4, but even without them it's a
pretty useful feature.

There are also a couple of small loose ends and definitional quibbles,
which I'll send a memo about to pgsql-hackers shortly.  But let's land
the patch now so we can get on with other development.

Yoshiyuki Asaba, with lots of help from Tatsuo Ishii and Tom Lane
2008-10-04 21:56:55 +00:00
Tom Lane dad4cb6258 Improve tuplestore.c to support multiple concurrent read positions.
This facility replaces the former mark/restore support but is otherwise
upward-compatible with previous uses.  It's expected to be needed for
single evaluation of CTEs and also for window functions, so I'm committing
it separately instead of waiting for either one of those patches to be
finished.  Per discussion with Greg Stark and Hitoshi Harada.

Note: I removed nodeFunctionscan's mark/restore support, instead of bothering
to update it for this change, because it was dead code anyway.
2008-10-01 19:51:50 +00:00
Tom Lane 4adc2f72a4 Change hash indexes to store only the hash code rather than the whole indexed
value.  This means that hash index lookups are always lossy and have to be
rechecked when the heap is visited; however, the gain in index compactness
outweighs this when the indexed values are wide.  Also, we only need to
perform datatype comparisons when the hash codes match exactly, rather than
for every entry in the hash bucket; so it could also win for datatypes that
have expensive comparison functions.  A small additional win is gained by
keeping hash index pages sorted by hash code and using binary search to reduce
the number of index tuples we have to look at.

Xiao Meng

This commit also incorporates Zdenek Kotala's patch to isolate hash metapages
and hash bitmaps a bit better from the page header datastructures.
2008-09-15 18:43:41 +00:00
Alvaro Herrera e36e6b1cab Add a few more DTrace probes to the backend.
Robert Lor
2008-08-01 13:16:09 +00:00
Alvaro Herrera a3540b0f65 Improve our #include situation by moving pointer types away from the
corresponding struct definitions.  This allows other headers to avoid including
certain highly-loaded headers such as rel.h and relscan.h, instead using just
relcache.h, heapam.h or genam.h, which are more lightweight and thus cause less
unnecessary dependencies.
2008-06-19 00:46:06 +00:00
Alvaro Herrera f8c4d7db60 Restructure some header files a bit, in particular heapam.h, by removing some
unnecessary #include lines in it.  Also, move some tuple routine prototypes and
macros to htup.h, which allows removal of heapam.h inclusion from some .c
files.

For this to work, a new header file access/sysattr.h needed to be created,
initially containing attribute numbers of system columns, for pg_dump usage.

While at it, make contrib ltree, intarray and hstore header files more
consistent with our header style.
2008-05-12 00:00:54 +00:00
Neil Conway 1d812a98b4 Add a new tuplestore API function, tuplestore_putvalues(). This is
identical to tuplestore_puttuple(), except it operates on arrays of
Datums + nulls rather than a fully-formed HeapTuple. In several places
that use the tuplestore API, this means we can avoid creating a
HeapTuple altogether, saving a copy.
2008-03-25 19:26:54 +00:00
Tom Lane 0c5962c054 Grab some low-hanging fruit in the new hash index build code.
oprofile shows that a nontrivial amount of time is being spent in
repeated calls to index_getprocinfo, which really only needs to be
called once.  So do that, and inline _hash_datum2hashkey to make it
work.
2008-03-17 03:45:36 +00:00
Tom Lane 787eba734b When creating a large hash index, pre-sort the index entries by estimated
bucket number, so as to ensure locality of access to the index during the
insertion step.  Without this, building an index significantly larger than
available RAM takes a very long time because of thrashing.  On the other
hand, sorting is just useless overhead when the index does fit in RAM.
We choose to sort when the initial index size exceeds effective_cache_size.

This is a revised version of work by Tom Raney and Shreya Bhargava.
2008-03-16 23:15:08 +00:00
Tom Lane f0828b2fc3 Provide a build-time option to store large relations as single files, rather
than dividing them into 1GB segments as has been our longtime practice.  This
requires working support for large files in the operating system; at least for
the time being, it won't be the default.

Zdenek Kotala
2008-03-10 20:06:27 +00:00
Peter Eisentraut 0474dcb608 Refactor backend makefiles to remove lots of duplicate code 2008-02-19 10:30:09 +00:00
Bruce Momjian 9098ab9e32 Update copyrights in source tree to 2008. 2008-01-01 19:46:01 +00:00
Bruce Momjian fdf5a5efb7 pgindent run for 8.3. 2007-11-15 21:14:46 +00:00
Tom Lane 2aae35d049 Mention the index name in 'could not create unique index' errors,
per suggestion from Rene Gollent.
2007-10-29 21:31:28 +00:00
Tom Lane d2825e1c85 Since sort_bounded_heap makes state changes that should be made
regardless of the number of tuples involved, it's incorrect to skip it
when memtupcount = 1; the number of cycles saved is minuscule anyway.
An alternative solution would be to pull the state changes out to the
call site in tuplesort_performsort, but keeping them near the corresponding
changes in make_bounded_heap seems marginally cleaner.  Noticed by
Greg Stark.
2007-09-01 18:47:39 +00:00
Neil Conway 494d6f809e Fix a memory leak in tuplestore_end(). Unlikely to be significant during
normal operation, but tuplestore_end() ought to do what it claims to do.
2007-08-02 17:48:52 +00:00
Tom Lane 24ee8af573 Rework temp_tablespaces patch so that temp tablespaces are assigned separately
for each temp file, rather than once per sort or hashjoin; this allows
spreading the data of a large sort or join across multiple tablespaces.
(I remain dubious that this will make any difference in practice, but certain
people insisted.)  Arrange to cache the results of parsing the GUC variable
instead of recomputing from scratch on every demand, and push usage of the
cache down to the bottommost fd.c level.
2007-06-07 19:19:57 +00:00
Tom Lane acfce502ba Create a GUC parameter temp_tablespaces that allows selection of the
tablespace(s) in which to store temp tables and temporary files.  This is a
list to allow spreading the load across multiple tablespaces (a random list
element is chosen each time a temp object is to be created).  Temp files are
not stored in per-database pgsql_tmp/ directories anymore, but per-tablespace
directories.

Jaime Casanova and Albert Cervera, with review by Bernd Helmle and Tom Lane.
2007-06-03 17:08:34 +00:00
Tom Lane 2415ad9831 Teach tuplestore.c to throw away data before the "mark" point when the caller
is using mark/restore but not rewind or backward-scan capability.  Insert a
materialize plan node between a mergejoin and its inner child if the inner
child is a sort that is expected to spill to disk.  The materialize shields
the sort from the need to do mark/restore and thereby allows it to perform
its final merge pass on-the-fly; while the materialize itself is normally
cheap since it won't spill to disk unless the number of tuples with equal
key values exceeds work_mem.

Greg Stark, with some kibitzing from Tom Lane.
2007-05-21 17:57:35 +00:00
Tom Lane d2a4a4069f Add a line to the EXPLAIN ANALYZE output for a Sort node, showing the
actual sort strategy and amount of space used.  By popular demand.
2007-05-04 21:29:53 +00:00
Tom Lane d26559dbf3 Teach tuplesort.c about "top N" sorting, in which only the first N tuples
need be returned.  We keep a heap of the current best N tuples and sift-up
new tuples into it as we scan the input.  For M input tuples this means
only about M*log(N) comparisons instead of M*log(M), not to mention a lot
less workspace when N is small --- avoiding spill-to-disk for large M
is actually the most attractive thing about it.  Patch includes planner
and executor support for invoking this facility in ORDER BY ... LIMIT
queries.  Greg Stark, with some editorialization by moi.
2007-05-04 01:13:45 +00:00
Peter Eisentraut 2cc01004c6 Remove remains of old depend target. 2007-01-20 17:16:17 +00:00
Tom Lane a191a169d6 Change the planner-to-executor API so that the planner tells the executor
which comparison operators to use for plan nodes involving tuple comparison
(Agg, Group, Unique, SetOp).  Formerly the executor looked up the default
equality operator for the datatype, which was really pretty shaky, since it's
possible that the data being fed to the node is sorted according to some
nondefault operator class that could have an incompatible idea of equality.
The planner knows what it has sorted by and therefore can provide the right
equality operator to use.  Also, this change moves a couple of catalog lookups
out of the executor and into the planner, which should help startup time for
pre-planned queries by some small amount.  Modify the planner to remove some
other cavalier assumptions about always being able to use the default
operators.  Also add "nulls first/last" info to the Plan node for a mergejoin
--- neither the executor nor the planner can cope yet, but at least the API is
in place.
2007-01-10 18:06:05 +00:00
Tom Lane 4431758229 Support ORDER BY ... NULLS FIRST/LAST, and add ASC/DESC/NULLS FIRST/NULLS LAST
per-column options for btree indexes.  The planner's support for this is still
pretty rudimentary; it does not yet know how to plan mergejoins with
nondefault ordering options.  The documentation is pretty rudimentary, too.
I'll work on improving that stuff later.

Note incompatible change from prior behavior: ORDER BY ... USING will now be
rejected if the operator is not a less-than or greater-than member of some
btree opclass.  This prevents less-than-sane behavior if an operator that
doesn't actually define a proper sort ordering is selected.
2007-01-09 02:14:16 +00:00
Bruce Momjian 29dccf5fe0 Update CVS HEAD for 2007 copyright. Back branches are typically not
back-stamped for this.
2007-01-05 22:20:05 +00:00
Tom Lane a78fcfb512 Restructure operator classes to allow improved handling of cross-data-type
cases.  Operator classes now exist within "operator families".  While most
families are equivalent to a single class, related classes can be grouped
into one family to represent the fact that they are semantically compatible.
Cross-type operators are now naturally adjunct parts of a family, without
having to wedge them into a particular opclass as we had done originally.

This commit restructures the catalogs and cleans up enough of the fallout so
that everything still works at least as well as before, but most of the work
needed to actually improve the planner's behavior will come later.  Also,
there are not yet CREATE/DROP/ALTER OPERATOR FAMILY commands; the only way
to create a new family right now is to allow CREATE OPERATOR CLASS to make
one by default.  I owe some more documentation work, too.  But that can all
be done in smaller pieces once this infrastructure is in place.
2006-12-23 00:43:13 +00:00
Bruce Momjian f99a569a2e pgindent run for 8.2. 2006-10-04 00:30:14 +00:00
Tom Lane 6edd2b4a91 Switch over to using our own qsort() all the time, as has been proposed
repeatedly.  Now that we don't have to worry about memory leaks from
glibc's qsort, we can safely put CHECK_FOR_INTERRUPTS into the tuplesort
comparators, as was requested a couple months ago.  Also, get rid of
non-reentrancy and an extra level of function call in tuplesort.c by
providing a variant qsort_arg() API that passes an extra void * argument
through to the comparison routine.  (We might want to use that in other
places too, I didn't look yet.)
2006-10-03 22:18:23 +00:00
Bruce Momjian e0522505bd Remove 576 references of include files that were not needed. 2006-07-14 14:52:27 +00:00
Tom Lane cdd5178c69 Extend the MinimalTuple concept to tuplesort.c, thereby reducing the
per-tuple space overhead for sorts in memory.  I chose to replace the
previous patch that tried to write out the bare minimum amount of data
when sorting on disk; instead, just dump the MinimalTuples as-is.  This
wastes 3 to 10 bytes per tuple depending on architecture and null-bitmap
length, but the simplification in the writetup/readtup routines seems
worth it.
2006-06-27 16:53:02 +00:00
Tom Lane 3f50ba27cf Create infrastructure for 'MinimalTuple' representation of in-memory
tuples with less header overhead than a regular HeapTuple, per my
recent proposal.  Teach TupleTableSlot code how to deal with these.
As proof of concept, change tuplestore.c to store MinimalTuples instead
of HeapTuples.  Future patches will expand the concept to other places
where it is useful.
2006-06-27 02:51:40 +00:00
Tom Lane 7f52e0c50e Tweak writetup_heap/readtup_heap to avoid storing the tuple identity
and transaction visibility fields of tuples being sorted.  These are
always uninteresting in a tuple being sorted (if the fields were actually
selected, they'd have been pulled out into user columns beforehand).
This saves about 24 bytes per row being sorted, which is a useful savings
for any but the widest of sort rows.  Per recent discussion.
2006-05-23 21:37:59 +00:00
Tom Lane c65ab0bfa9 Recent changes in memory management in tuplesort.c had a problem: the
case where we run low on array slots before we run low on memory is much
more probable than I had thought, and so it's important to treat each
tape fairly in that case.  To fix this, track per-tape slot allocations
just like we track per-tape space allocation.  Also, in the FINALMERGE
code path avoid scanning all the input tapes when we really only need to
read from one.  This should fix poor behavior with very large work_mem
as exhibited by Stefan Kaltenbrunner.
I didn't do anything about putting an upper bound on the number of tapes,
but maybe we should still consider that.
2006-03-10 23:19:00 +00:00
Tom Lane c8cd76de28 Tweak trace_sort code to show the merge order (number of active input
tapes) for each merge step.  This will give us some idea of how effective
the merge distribution algorithm is.
2006-03-08 16:59:03 +00:00
Tom Lane 43ceb3d449 Further examination of ltsReleaseBlock usage shows that it's got a
performance issue during regular merge passes not only the 'final merge'
case.  The original design contemplated that there would never be more
than about one free block per 'tape', hence no need for an efficient
method of keeping the free blocks sorted.  But given the later addition
of merge preread behavior in tuplesort.c, there is likely to be about
work_mem worth of free blocks, which is not so small ... and for that
matter the number of tapes isn't necessarily small anymore either.  So
we'd better get rid of the assumption entirely.  Instead, I'm assuming
that the usage pattern will involve alternation between merge preread
and writing of a new run.  This makes it reasonable to just add blocks
to the list without sorting during successive ltsReleaseBlock calls,
and then do a qsort() when we start getting ltsGetFreeBlock() calls.
Experimentation seems to confirm that there aren't many qsort calls
relative to the number of ltsReleaseBlock/ltsGetFreeBlock calls.
2006-03-07 23:46:24 +00:00
Tom Lane 8db05ba411 Repair old performance bug in tuplesort.c/logtape.c. In the case where
we are doing the final merge pass on-the-fly, and not writing the data
back onto a 'tape', the number of free blocks in the tape set will become
large, leading to a lot of time wasted in ltsReleaseBlock().  There is
really no need to track the free blocks anymore in this state, so add a
simple shutoff switch.  Per report from Stefan Kaltenbrunner.
2006-03-07 19:06:50 +00:00
Bruce Momjian f2f5b05655 Update copyright for 2006. Update scripts. 2006-03-05 15:59:11 +00:00
Tom Lane 2689abf078 Incorporate a couple of recent tuplesort.c improvements into tuplestore.c.
In particular, ensure that enlargement of the memtuples[] array doesn't
fall foul of MaxAllocSize when work_mem is very large, and don't bother
enlarging it if that would force an immediate switch into 'tape' mode anyway.
2006-03-04 19:30:12 +00:00
Tom Lane 80cadb303c Prevent sorting from requesting a SortTuple array that exceeds MaxAllocSize;
we'll go over to disk-based sort if we reach that limit.
This fixes Stefan Kaltenbrunner's observation that sorting can suffer an
'invalid memory alloc request size' failure when sort_mem is set large
enough.  It's unfortunately not so easy to fix in 8.1 ...
2006-03-04 19:05:06 +00:00
Tom Lane 909ca1407c Improve sorting speed by pre-extracting the first sort-key column of
each tuple, as per my proposal of several days ago.  Also, clean up
sort memory management by keeping all working data in a separate memory
context, and refine the handling of low-memory conditions.
2006-02-26 22:58:12 +00:00
Tom Lane 21e2544aa7 Update obsolete comment. 2006-02-19 19:59:53 +00:00
Tom Lane b34aa3372f Modify logtape.c so that the initial LogicalTapeSetCreate call only
allocates the control data.  The per-tape buffers are allocated only
on first use.  This saves memory in situations where tuplesort.c
overestimates the number of tapes needed (ie, there are fewer runs
than tapes).  Also, this makes legitimate the coding in inittapes()
that includes tape buffer space in the maximum-memory calculation:
when inittapes runs, we've already expended the whole allowed memory
on tuple storage, and so we'd better not allocate all the tape buffers
until we've flushed some tuples out of memory.
2006-02-19 05:58:36 +00:00
Tom Lane df700e6b40 Improve tuplesort.c to support variable merge order. The original coding
with fixed merge order (fixed number of "tapes") was based on obsolete
assumptions, namely that tape drives are expensive.  Since our "tapes"
are really just a couple of buffers, we can have a lot of them given
adequate workspace.  This allows reduction of the number of merge passes
with consequent savings of I/O during large sorts.

Simon Riggs with some rework by Tom Lane
2006-02-19 05:54:06 +00:00
Bruce Momjian a1675649e4 Remove QNX port. 2006-01-05 01:56:30 +00:00
Bruce Momjian 436a2956d8 Re-run pgindent, fixing a problem where comment lines after a blank
comment line where output as too long, and update typedefs for /lib
directory.  Also fix case where identifiers were used as variable names
in the backend, but as typedefs in ecpg (favor the backend for
indenting).

Backpatch to 8.1.X.
2005-11-22 18:17:34 +00:00
Tom Lane dd218ae7b0 Remove the t_datamcxt field of HeapTupleData. This was introduced for
the convenience of tuptoaster.c and is no longer needed, so may as well
get rid of some small amount of overhead.
2005-11-20 19:49:08 +00:00
Bruce Momjian 9ee8b9fd38 Change trace_sort to output to the log, rather than the user's terminal. 2005-10-25 13:47:08 +00:00
Tom Lane b33a732264 Improve trace_sort code to also show the total memory or disk space used.
Per request from Marc.
2005-10-18 22:59:37 +00:00
Bruce Momjian 1dc3498251 Standard pgindent run for 8.1. 2005-10-15 02:49:52 +00:00
Tom Lane 53e47cdd79 Add a trace_sort option to help with measuring resource usage of external
sort operations.  Per recent discussion.  Simon Riggs and Tom Lane.
2005-10-03 22:55:56 +00:00
Tom Lane 06d70d78a4 Fix typo in comment. 2005-09-23 15:36:57 +00:00
Bruce Momjian b492c3accc Add parentheses to macros when args are used in computations. Without
them, the executation behavior could be unexpected.
2005-05-25 21:40:43 +00:00
Tom Lane 278bd0cc22 For some reason access/tupmacs.h has been #including utils/memutils.h,
which is neither needed by nor related to that header.  Remove the bogus
inclusion and instead include the header in those C files that actually
need it.  Also fix unnecessary inclusions and bad inclusion order in
tsearch2 files.
2005-05-06 17:24:55 +00:00
Tom Lane bd9b4a9d46 Use InitFunctionCallInfoData() macro instead of MemSet in performance
critical places in execQual.  By Atsushi Ogawa; some minor cleanup by moi.
2005-03-22 20:13:09 +00:00
Tom Lane ad476170e9 Improve performance of fmgr.c calling routines for cases with more than
two arguments.  Per suggestions from A. Ogawa.
2005-02-02 22:40:04 +00:00
PostgreSQL Daemon 2ff501590b Tag appropriate files for rc3
Also performed an initial run through of upgrading our Copyright date to
extend to 2005 ... first run here was very simple ... change everything
where: grep 1996-2004 && the word 'Copyright' ... scanned through the
generated list with 'less' first, and after, to make sure that I only
picked up the right entries ...
2004-12-31 22:04:05 +00:00
Bruce Momjian b6b71b85bc Pgindent run for 8.0. 2004-08-29 05:07:03 +00:00
Bruce Momjian da9a8649d8 Update copyright to 2004. 2004-08-29 04:13:13 +00:00
Tom Lane fbac1272b8 During btree index build, sort equal-keyed tuples according to their
TID (heap position).  This doesn't do anything to the validity of the
finished index, but by pretending to qsort() that there are no really
equal keys in the sort, we can avoid performance problems with qsort
implementations that have trouble with large numbers of equal keys.
Patch from Manfred Koizar.
2004-03-17 22:24:58 +00:00
Tom Lane 391c3811a2 Rename SortMem and VacuumMem to work_mem and maintenance_work_mem.
Make btree index creation and initial validation of foreign-key constraints
use maintenance_work_mem rather than work_mem as their memory limit.
Add some code to guc.c to allow these variables to be referenced by their
old names in SHOW and SET commands, for backwards compatibility.
2004-02-03 17:34:04 +00:00
PostgreSQL Daemon 969685ad44 $Header: -> $PostgreSQL Changes ... 2003-11-29 19:52:15 +00:00
Tom Lane fa5c8a055a Cross-data-type comparisons are now indexable by btrees, pursuant to my
pghackers proposal of 8-Nov.  All the existing cross-type comparison
operators (int2/int4/int8 and float4/float8) have appropriate support.
The original proposal of storing the right-hand-side datatype as part of
the primary key for pg_amop and pg_amproc got modified a bit in the event;
it is easier to store zero as the 'default' case and only store a nonzero
when the operator is actually cross-type.  Along the way, remove the
long-since-defunct bigbox_ops operator class.
2003-11-12 21:15:59 +00:00
Tom Lane c1d62bfd00 Add operator strategy and comparison-value datatype fields to ScanKey.
Remove the 'strategy map' code, which was a large amount of mechanism
that no longer had any use except reverse-mapping from procedure OID to
strategy number.  Passing the strategy number to the index AM in the
first place is simpler and faster.
This is a preliminary step in planned support for cross-datatype index
operations.  I'm committing it now since the ScanKeyEntryInitialize()
API change touches quite a lot of files, and I want to commit those
changes before the tree drifts under me.
2003-11-09 21:30:38 +00:00
Tom Lane ec646dbc65 Create a 'type cache' that keeps track of the data needed for any particular
datatype by array_eq and array_cmp; use this to solve problems with memory
leaks in array indexing support.  The parser's equality_oper and ordering_oper
routines also use the cache.  Change the operator search algorithms to look
for appropriate btree or hash index opclasses, instead of assuming operators
named '<' or '=' have the right semantics.  (ORDER BY ASC/DESC now also look
at opclasses, instead of assuming '<' and '>' are the right things.)  Add
several more index opclasses so that there is no regression in functionality
for base datatypes.  initdb forced due to catalog additions.
2003-08-17 19:58:06 +00:00
Bruce Momjian f3c3deb7d0 Update copyrights to 2003. 2003-08-04 02:40:20 +00:00
Bruce Momjian 089003fb46 pgindent run. 2003-08-04 00:43:34 +00:00
Tom Lane 689eb53e47 Error message editing in backend/utils (except /adt). 2003-07-25 20:18:01 +00:00
Tom Lane 1c9ac7dfd0 Change pg_amop's index on (amopclaid,amopopr) to index (amopopr,amopclaid).
This makes no difference for existing uses, but allows SelectSortFunction()
and pred_test_simple_clause() to use indexscans instead of seqscans to
locate entries for a particular operator in pg_amop.  Better yet, they can
use the SearchSysCacheList() API to cache the search results.
2003-05-13 04:38:58 +00:00
Tom Lane 4a5f38c4e6 Code review for holdable-cursors patch. Fix error recovery, memory
context sloppiness, some other things.  Includes Neil's mopup patch
of 22-Apr.
2003-04-29 03:21:30 +00:00
Bruce Momjian 54f7338fa1 This patch implements holdable cursors, following the proposal
(materialization into a tuple store) discussed on pgsql-hackers earlier.
I've updated the documentation and the regression tests.

Notes on the implementation:

- I needed to change the tuple store API slightly -- it assumes that it
won't be used to hold data across transaction boundaries, so the temp
files that it uses for on-disk storage are automatically reclaimed at
end-of-transaction. I added a flag to tuplestore_begin_heap() to control
this behavior. Is changing the tuple store API in this fashion OK?

- in order to store executor results in a tuple store, I added a new
CommandDest. This works well for the most part, with one exception: the
current DestFunction API doesn't provide enough information to allow the
Executor to store results into an arbitrary tuple store (where the
particular tuple store to use is chosen by the call site of
ExecutorRun). To workaround this, I've temporarily hacked up a solution
that works, but is not ideal: since the receiveTuple DestFunction is
passed the portal name, we can use that to lookup the Portal data
structure for the cursor and then use that to get at the tuple store the
Portal is using. This unnecessarily ties the Portal code with the
tupleReceiver code, but it works...

The proper fix for this is probably to change the DestFunction API --
Tom suggested passing the full QueryDesc to the receiveTuple function.
In that case, callers of ExecutorRun could "subclass" QueryDesc to add
any additional fields that their particular CommandDest needed to get
access to. This approach would work, but I'd like to think about it for
a little bit longer before deciding which route to go. In the mean time,
the code works fine, so I don't think a fix is urgent.

- (semi-related) I added a NO SCROLL keyword to DECLARE CURSOR, and
adjusted the behavior of SCROLL in accordance with the discussion on
-hackers.

- (unrelated) Cleaned up some SGML markup in sql.sgml, copy.sgml

Neil Conway
2003-03-27 16:51:29 +00:00
Tom Lane aa60eecc37 Revise tuplestore and nodeMaterial so that we don't have to read the
entire contents of the subplan into the tuplestore before we can return
any tuples.  Instead, the tuplestore holds what we've already read, and
we fetch additional rows from the subplan as needed.  Random access to
the previously-read rows works with the tuplestore, and doesn't affect
the state of the partially-read subplan.  This is a step towards fixing
the problems with cursors over complex queries --- we don't want to
stick in Materialize nodes if they'll prevent quick startup for a cursor.
2003-03-09 02:19:13 +00:00
Bruce Momjian 9b12ab6d5d Add new palloc0 call as merge of palloc and MemSet(0). 2002-11-13 00:39:48 +00:00
Bruce Momjian 75fee4535d Back out use of palloc0 in place if palloc/MemSet. Seems constant len
to MemSet is a performance boost.
2002-11-11 03:02:20 +00:00
Bruce Momjian 8fee9615cc Merge palloc()/MemSet(0) calls into a single palloc0() call. 2002-11-10 07:25:14 +00:00
Tom Lane 5936055d46 Avoid use of inline functions that are not declared static. Needed to
conform to C99's brain-dead notion of how inline functions should work.
2002-10-31 19:11:48 +00:00
Tom Lane 3b8ba163d0 Tweak a few of the most heavily used function call points to zero out
just the significant fields of FunctionCallInfoData, rather than MemSet'ing
the whole struct to zero.  Unused positions in the arg[] array will
thereby contain garbage rather than zeroes.  This buys back some of the
performance hit from increasing FUNC_MAX_ARGS.  Also tweak tuplesort.c
code for more speed by marking some routines 'inline'.  All together
these changes speed up simple sorts, like count(distinct int4column),
by about 25% on a P4 running RH Linux 7.2.
2002-10-04 17:19:55 +00:00
Bruce Momjian e50f52a074 pgindent run. 2002-09-04 20:31:48 +00:00
Tom Lane 976246cc7e The cstring datatype can now be copied, passed around, etc. The typlen
value '-2' is used to indicate a variable-width type whose width is
computed as strlen(datum)+1.  Everything that looks at typlen is updated
except for array support, which Joe Conway is working on; at the moment
it wouldn't work to try to create an array of cstring.
2002-08-24 15:00:47 +00:00
Tom Lane 77a7e9968b Change memory-space accounting mechanism in tuplesort.c and tuplestore.c
to make a reasonable attempt at accounting for palloc overhead, not just
the requested size of each memory chunk.  Since in many scenarios this
will make for a significant reduction in the amount of space acquired,
partially compensate by doubling the default value of SORT_MEM to 1Mb.
Per discussion in pgsql-general around 9-Jun-2002..
2002-08-12 00:36:12 +00:00
Bruce Momjian d84fe82230 Update copyright to 2002. 2002-06-20 20:29:54 +00:00
Tom Lane 44fbe20d62 Restructure indexscan API (index_beginscan, index_getnext) per
yesterday's proposal to pghackers.  Also remove unnecessary parameters
to heap_beginscan, heap_rescan.  I modified pg_proc.h to reflect the
new numbers of parameters for the AM interface routines, but did not
force an initdb because nothing actually looks at those fields.
2002-05-20 23:51:44 +00:00
Tom Lane 3b6cbce458 Add CHECK_FOR_INTERRUPTS() in various strategic spots, per comments
from Hiroshi.
2002-01-06 00:37:44 +00:00
Tom Lane 69a59150c2 Defend against brain-dead QNX implementation of qsort().
Per report from Bernd Tegge, 10-Nov-01.
2001-11-11 22:00:25 +00:00
Bruce Momjian 6783b2372e Another pgindent run. Fixes enum indenting, and improves #endif
spacing.  Also adds space for one-line comments.
2001-10-28 06:26:15 +00:00
Bruce Momjian b81844b173 pgindent run on all C files. Java run to follow. initdb/regression
tests pass.
2001-10-25 05:50:21 +00:00
Tom Lane f933766ba7 Restructure pg_opclass, pg_amop, and pg_amproc per previous discussions in
pgsql-hackers.  pg_opclass now has a row for each opclass supported by each
index AM, not a row for each opclass name.  This allows pg_opclass to show
directly whether an AM supports an opclass, and furthermore makes it possible
to store additional information about an opclass that might be AM-dependent.
pg_opclass and pg_amop now store "lossy" and "haskeytype" information that we
previously expected the user to remember to provide in CREATE INDEX commands.
Lossiness is no longer an index-level property, but is associated with the
use of a particular operator in a particular index opclass.

Along the way, IndexSupportInitialize now uses the syscaches to retrieve
pg_amop and pg_amproc entries.  I find this reduces backend launch time by
about ten percent, at the cost of a couple more special cases in catcache.c's
IndexScanOK.

Initial work by Oleg Bartunov and Teodor Sigaev, further hacking by Tom Lane.

initdb forced.
2001-08-21 16:36:06 +00:00
Tom Lane 5433b48380 Tweak sorting so that nulls appear at the front of a descending sort
(vs. at the end of a normal sort).  This ensures that explicit sorts
yield the same ordering as a btree index scan.  To be really sure that
that equivalence holds, we use the btree entries in pg_amop to decide
whether we are looking at a '<' or '>' operator.  For a sort operator
that has no btree association, we put the nulls at the front if the
operator is named '>' ... pretty grotty, but it does the right thing in
simple ASC and DESC cases, and at least there's no possibility of getting
a different answer depending on the plan type chosen.
2001-06-02 19:01:53 +00:00
Tom Lane f905d65ee3 Rewrite of planner statistics-gathering code. ANALYZE is now available as
a separate statement (though it can still be invoked as part of VACUUM, too).
pg_statistic redesigned to be more flexible about what statistics are
stored.  ANALYZE now collects a list of several of the most common values,
not just one, plus a histogram (not just the min and max values).  Random
sampling is used to make the process reasonably fast even on very large
tables.  The number of values and histogram bins collected is now
user-settable via an ALTER TABLE command.

There is more still to do; the new stats are not being used everywhere
they could be in the planner.  But the remaining changes for this project
should be localized, and the behavior is already better than before.

A not-very-related change is that sorting now makes use of btree comparison
routines if it can find one, rather than invoking '<' twice.
2001-05-07 00:43:27 +00:00
Bruce Momjian 7cf952e7b4 Fix comments that were mis-wrapped, for Tom Lane. 2001-03-23 04:49:58 +00:00
Bruce Momjian 9e1552607a pgindent run. Make it all clean. 2001-03-22 04:01:46 +00:00
Tom Lane 0d54d6ac44 Clean up handling of tuple descriptors so that result-tuple descriptors
allocated by plan nodes are not leaked at end of query.  This doesn't
really matter for normal queries, but it sure does for queries invoked
repetitively inside SQL functions.  Clean up some other grotty code
associated with tupdescs, and fix a few other memory leaks exposed by
tests with simple SQL functions.
2001-01-29 00:39:20 +00:00
Bruce Momjian 623bf843d2 Change Copyright from PostgreSQL, Inc to PostgreSQL Global Development Group. 2001-01-24 19:43:33 +00:00
Tom Lane a933ee38bb Change SearchSysCache coding conventions so that a reference count is
maintained for each cache entry.  A cache entry will not be freed until
the matching ReleaseSysCache call has been executed.  This eliminates
worries about cache entries getting dropped while still in use.  See
my posting to pg-hackers of even date for more info.
2000-11-16 22:30:52 +00:00
Peter Eisentraut 424f0edcb8 Fix relative path references so that make knowns which dependencies refer
to one another. Sort out builddir vs srcdir variable namings. Remove some
now obsoleted make variables.
2000-08-31 16:12:35 +00:00
Tom Lane 1ee26b7764 Reimplement nodeMaterial to use a temporary BufFile (or even memory, if the
materialized tupleset is small enough) instead of a temporary relation.
This was something I was thinking of doing anyway for performance, and Jan
says he needs it for TOAST because he doesn't want to cope with toasting
noname relations.  With this change, the 'noname table' support in heap.c
is dead code, and I have accordingly removed it.  Also clean up 'noname'
plan handling in planner --- nonames are either sort or materialize plans,
and it seems less confusing to handle them separately under those names.
2000-06-18 22:44:35 +00:00
Tom Lane 0f1e39643d Third round of fmgr updates: eliminate calls using fmgr() and
fmgr_faddr() in favor of new-style calls.  Lots of cleanup of
sloppy casts to use XXXGetDatum and DatumGetXXX ...
2000-05-30 04:25:00 +00:00
Tom Lane 091126fa28 Generated header files parse.h and fmgroids.h are now copied into
the src/include tree, so that -I backend is no longer necessary anywhere.
Also, clean up some bit rot in contrib tree.
2000-05-29 05:45:56 +00:00
Bruce Momjian 52f77df613 Ye-old pgindent run. Same 4-space tabs. 2000-04-12 17:17:23 +00:00
Tom Lane 341b328b18 Fix a bunch of minor portability problems and maybe-bugs revealed by
running gcc and HP's cc with warnings cranked way up.  Signed vs unsigned
comparisons, routines declared static and then defined not-static,
that kind of thing.  Tedious, but perhaps useful...
2000-03-17 02:36:41 +00:00
Tom Lane 73dd716285 Small performance improvement in comparetup_heap. 2000-03-01 17:14:09 +00:00
Tom Lane 8cb624262a Replace inefficient _bt_invokestrat calls with direct calls to the
appropriate btree three-way comparison routine.  Not clear why the
three-way comparison routines were being used in some paths and not
others in btree --- incomplete changes by someone long ago, maybe?
Anyway, this makes for a nice speedup in CREATE INDEX.
2000-02-18 06:32:39 +00:00
Bruce Momjian 5c25d60244 Add:
* Portions Copyright (c) 1996-2000, PostgreSQL, Inc

to all files copyright Regents of Berkeley.  Man, that's a lot of files.
2000-01-26 05:58:53 +00:00
Jan Wieck 397e9b32a3 Some changes to prepare for LONG attributes.
Jan
1999-12-16 22:20:03 +00:00
Bruce Momjian a82f9ffde6 New LDOUT makefile variable for QNX os. 1999-12-13 22:35:27 +00:00
Tom Lane a8ae19ec3d aggregate(DISTINCT ...) works, per SQL spec.
Note this forces initdb because of change of Aggref node in stored rules.
1999-12-13 01:27:21 +00:00
Bruce Momjian 3ffd3d82db Make LD -r as macros that can be changed for QNX. 1999-12-09 19:15:45 +00:00
Tom Lane cf627ab41a Further performance improvements in sorting: reduce number of comparisons
during initial run formation by keeping both current run and next-run
tuples in the same heap (yup, Knuth is smarter than I am).  And, during
merge passes, make use of available sort memory to load multiple tuples
from any one input 'tape' at a time, thereby improving locality of
access to the temp file.
1999-10-30 17:27:15 +00:00
Tom Lane 887afac1f5 Remove now-dead sort modules. 1999-10-17 22:19:07 +00:00
Tom Lane 26c48b5e8c Final stage of psort reconstruction work: replace psort.c with
a generalized module 'tuplesort.c' that can sort either HeapTuples or
IndexTuples, and is not tied to execution of a Sort node.  Clean up
memory leakages in sorting, and replace nbtsort.c's private implementation
of mergesorting with calls to tuplesort.c.
1999-10-17 22:15:09 +00:00
Tom Lane 957146dcec Second phase of psort reconstruction project: add bookkeeping logic to
recycle storage within sort temp file on a block-by-block basis.  This
reduces peak disk usage to essentially just the volume of data being
sorted, whereas it had been about 4x the data volume before.
1999-10-16 19:49:28 +00:00
Tom Lane db3c4c3a2d Split 'BufFile' routines out of fd.c into a new module, buffile.c. Extend
BufFile so that it handles multi-segment temporary files transparently.
This allows sorts and hashes to work with data exceeding 2Gig (or whatever
the local limit on file size is).  Change psort.c to use relative seeks
instead of absolute seeks for backwards scanning, so that it won't fail
when the data volume exceeds 2Gig.
1999-10-13 15:02:32 +00:00
Bruce Momjian 3406901a29 Move some system includes into c.h, and remove duplicates. 1999-07-17 20:18:55 +00:00
Bruce Momjian 69817665cb Final cleanup 1999-07-16 05:23:30 +00:00
Bruce Momjian 2e6b1e63a3 Remove unused #includes in *.c files. 1999-07-15 22:40:16 +00:00
Tom Lane cc62dc2032 Fix tuplecmp() to ensure repeatable sort ordering of tuples
that contain null fields.  Old code would produce erratic sort results
because comparisons of tuples containing nulls could produce inconsistent
answers.
1999-07-10 18:21:59 +00:00
Bruce Momjian fcff1cdf4e Another pgindent run. Sorry folks. 1999-05-25 22:43:53 +00:00
Bruce Momjian 07842084fe pgindent run over code. 1999-05-25 16:15:34 +00:00
Tom Lane 71d5d95376 Update hash and join routines to use fd.c's new temp-file
code, instead of not-very-bulletproof stuff they had before.
1999-05-09 00:53:22 +00:00
Bruce Momjian 6724a50787 Change my-function-name-- to my_function_name, and optimizer renames. 1999-02-13 23:22:53 +00:00
Bruce Momjian 9322950aa4 Cleanup of source files where 'return' or 'var =' is alone on a line. 1999-02-03 21:18:02 +00:00
Bruce Momjian 4390b0bfbe Add TEMP tables/indexes. Add COPY pfree(). Other cleanups. 1999-02-02 03:45:56 +00:00
Bruce Momjian 7a6b562fdf Apply Win32 patch from Horak Daniel. 1999-01-17 06:20:06 +00:00
Bruce Momjian f0fbd7b87e Some security, since we now have vsnprintf, I remade an old patch
with some extra ugly sprintfs fixed. More work in this area is
   needed still.

Göran Thyni
1999-01-01 04:48:49 +00:00
Marc G. Fournier 9396802f14 more cleanups...of note, appendStringInfo now performs like sprintf(),
where you state a format and arguments.  the old behavior required
each appendStringInfo to have to have a sprintf() before it if any
formatting was required.

Also shortened several instances where there were multiple appendStringInfo()
calls in a row, doing nothing more then adding one more word to the String,
instead of doing them all in one call.
1998-12-14 08:11:17 +00:00
Marc G. Fournier 7c3b7d2744 Initial attempt to clean up the code...
Switch sprintf() to snprintf()
Remove any/all #if 0 -or- #ifdef NOT_USED -or- #ifdef FALSE sections of
	code
1998-12-14 05:19:16 +00:00
Vadim B. Mikheev 6beba218d7 New HeapTuple structure/interface. 1998-11-27 19:52:36 +00:00
Bruce Momjian af74855a60 Renaming cleanup, no pgindent yet. 1998-09-01 03:29:17 +00:00
Bruce Momjian 6bd323c6b3 Remove un-needed braces around single statements. 1998-06-15 19:30:31 +00:00
Bruce Momjian 27db9ecd0b Fix macros that were not properly surrounded by parens or braces. 1998-06-15 18:40:05 +00:00
Bruce Momjian 1e801a8f16 Hi,
Attached you'll find a (big) patch that fixes make dep and make
depend in all Makefiles where I found it to be appropriate.

It also removes the dependency in Makefile.global for NAMEDATALEN
and OIDNAMELEN by making backend/catalog/genbki.sh and bin/initdb/initdb.sh
a little smarter.

This no longer requires initdb.sh that is turned into initdb with
a sed script when installing Postgres, hence initdb.sh should be
renamed to initdb (after the patch has been applied :-) )

This patch is against the 6.3 sources, as it took a while to
complete.

Please review and apply,

Cheers,

Jeroen van Vianen
1998-04-06 00:32:26 +00:00
Bruce Momjian a32450a585 pgindent run before 6.3 release, with Thomas' requested changes. 1998-02-26 04:46:47 +00:00
Vadim B. Mikheev f0e7e2faa4 ExecReScan for Unique & Sort nodes. 1998-02-23 06:28:16 +00:00
Bruce Momjian 24cab6bd0d Goodbye register keyword. Compiler knows better. 1998-02-11 19:14:04 +00:00
Bruce Momjian 79f99a3888 Fix for psort. fixes regression tests. 1998-02-01 22:20:47 +00:00
Bruce Momjian 726c3854cb Inline fastgetattr and others so data access does not use function
calls.
1998-01-31 04:39:26 +00:00
Marc G. Fournier c58fb21bd4 From: Jeroen van Vianen <jeroenv@design.nl>
This patch solves the problem with multiple order by columns, with the
first one having NULL values.
1998-01-25 05:18:34 +00:00
Bruce Momjian c16ebb0f67 getpid/pid cleanup 1998-01-25 05:15:15 +00:00
PostgreSQL Daemon baef78d96b Thank god for searchable mail archives.
Patch by: wieck@sapserv.debis.de (Jan Wieck)

   One  of  the design rules of PostgreSQL is extensibility. And
   to follow this rule means (at least for me) that there should
   not  only  be a builtin PL.  Instead I would prefer a defined
   interface for PL implemetations.
1998-01-15 19:46:37 +00:00
Marc G. Fournier 374bb5d261 Some *very* major changes by darrenk@insightdist.com (Darren King)
==========================================
What follows is a set of diffs that cleans up the usage of BLCKSZ.

As a side effect, the person compiling the code can change the
value of BLCKSZ _at_their_own_risk_.  By that, I mean that I've
tried it here at 4096 and 16384 with no ill-effects.  A value
of 4096 _shouldn't_ affect much as far as the kernel/file system
goes, but making it bigger than 8192 can have severe consequences
if you don't know what you're doing.  16394 worked for me, _BUT_
when I went to 32768 and did an initdb, the SCSI driver broke and
the partition that I was running under went to hell in a hand
basket. Had to reboot and do a good bit of fsck'ing to fix things up.

The patch can be safely applied though.  Just leave BLCKSZ = 8192
and everything is as before.  It basically only cleans up all of the
references to BLCKSZ in the code.

If this patch is applied, a comment in the config.h file though above
the BLCKSZ define with warning about monkeying around with it would
be a good idea.

Darren  darrenk@insightdist.com

(Also cleans up some of the #includes in files referencing BLCKSZ.)
==========================================
1998-01-13 04:05:12 +00:00
Bruce Momjian 679d39b9c8 Goodbye ABORT. Hello ERROR for all errors. 1998-01-07 21:07:04 +00:00
Bruce Momjian 0d9fc5afd6 Change elog(WARN) to elog(ERROR) and elog(ABORT). 1998-01-05 03:35:55 +00:00
Marc G. Fournier 6e337eef45 Major cleanout of PORTNAME variables from Makefiles...bound to screw up
some of the ports...
1997-12-20 00:29:35 +00:00
Marc G. Fournier 5379b84eff More cleanups. I can now compile without PORTNAME being defined n
Makefile.global.

End result, if all goes well, should allow for much easier porting, since
there will no longer be a concept of a "port".  Most, if not everything,
*should* be determined by configure, or by the compiler itself.  Still
work to be done though :)
1997-12-19 02:09:10 +00:00
Bruce Momjian f7f2e18f8e Remove tqual.h includes not needed. 1997-11-24 05:09:50 +00:00
Bruce Momjian f3af1368bd Rename strNcpy to StrNCpy, and change third parameter. 1997-10-25 01:10:58 +00:00
Vadim B. Mikheev 78351f422b Fix for backward cursors with ORDER BY. 1997-10-15 06:36:36 +00:00
Bruce Momjian 5e2c0a87c9 Fix for psort temp file names, from Vadim. 1997-09-26 20:05:47 +00:00
Vadim B. Mikheev b0ccd78479 Don't limit number of tuples in leftist trees!
Use qsort to sort array of tuples for nextrun when current
run is done and put into leftist tree from sorted array!
It's much faster and creates non-bushy tree - this is ve-e-ery good
for perfomance!
1997-09-18 14:41:56 +00:00
Vadim B. Mikheev 712ea2507e 1. Use qsort for first run
2. Limit number of tuples in leftist trees:
	- put one tuple from current tree to disk if limit reached;
	- end run creation if limit reached by nextrun.
3. Avoid mergeruns() if first run is single one!
1997-09-18 05:37:31 +00:00
Vadim B. Mikheev f3e9cf9c6b Fix pfree problem. 1997-09-15 14:29:01 +00:00
Bruce Momjian 1ea01720d5 heapattr functions now return a Datum, not char *. 1997-09-12 04:09:08 +00:00
Bruce Momjian 59f6a57e59 Used modified version of indent that understands over 100 typedefs. 1997-09-08 21:56:23 +00:00
Bruce Momjian 319dbfa736 Another PGINDENT run that changes variable indenting and case label indenting. Also static variable indenting. 1997-09-08 02:41:22 +00:00
Bruce Momjian 1ccd423235 Massive commit to run PGINDENT on all *.c and *.h files. 1997-09-07 05:04:48 +00:00
Bruce Momjian 868d708188 Add // comments. 1997-09-05 00:09:47 +00:00
Bruce Momjian 1d8bbfd2e7 Make functions static where possible, enclose unused functions in #ifdef NOT_USED. 1997-08-19 21:40:56 +00:00
Bruce Momjian 022903f22e Reduce open() calls. Replace fopen() calls with calls to fd.c functions. 1997-08-18 02:15:04 +00:00
Bruce Momjian fd86ae151a Cleanup global variables, remove stable memory stuff. 1997-08-14 16:11:41 +00:00
Vadim B. Mikheev e99e4ba833 sprintf "...%d...", ... (int)getpid(), ...
^^^^^
1997-08-14 05:04:38 +00:00
Bruce Momjian ea5b5357cd Remove more (void) and fix -Wall warnings. 1997-08-12 22:55:25 +00:00
Bruce Momjian edb58721b8 Fix pgproc names over 15 chars in output. Add strNcpy() function. remove some (void) casts that are unnecessary. 1997-08-12 20:16:25 +00:00
Bruce Momjian dc374505fa Fix for psort again. 1997-08-06 17:11:20 +00:00
Bruce Momjian 677efc7679 Another psort fix. 1997-08-06 07:39:20 +00:00
Bruce Momjian 42c0cd33a2 I think I finally got psort working for all cases. 1997-08-06 07:02:49 +00:00
Bruce Momjian cc24b846dd psort cleanups. 1997-08-06 05:38:46 +00:00
Bruce Momjian ead219384f Fix for palloc(0) in new code 1997-08-06 04:45:39 +00:00
Bruce Momjian f5f366e188 Allow internal sorts to be stored in memory rather than in files. 1997-08-06 03:42:21 +00:00
Bruce Momjian 3ac9d2fff3 Various compile errors concerning overflow due to shifts, unsigned, and bad prototypes, from Solaris, from Diab Jerius 1997-07-24 20:19:10 +00:00
Vadim B. Mikheev 5f893a1e32 Shouldn't we use palloc instead of malloc ?
Because of
 *      resetpsort  - resets (frees) malloc'd memory for an aborted Xaction
 *
 *      Not implemented yet.
1997-05-20 11:35:50 +00:00
Bruce Momjian a0990e1884 Makefile cleanup after reorganization 1996-11-09 06:24:51 +00:00
Marc G. Fournier 0020e8790d Another directory that compiles with no errors, and few warnings 1996-11-06 10:32:10 +00:00
Marc G. Fournier c9002ecb21 Produce a clean compile of backend... 1996-11-03 06:54:38 +00:00
Bryan Henderson b0d6f0aa63 Simplify make files, add full dependencies. 1996-10-27 09:55:05 +00:00
Marc G. Fournier d31084e9d1 Postgres95 1.01 Distribution - Virgin Sources 1996-07-09 06:22:35 +00:00