postgresql

Commit Graph

Author	SHA1	Message	Date
Robert Haas	1d41739e5a	Don't require sort support functions to provide a comparator. This could be useful for datatypes like text, where we might want to optimize for some collations but not others. However, this patch doesn't introduce any new sortsupport functions that work this way; it merely revises the code so that future patches may do so. Patch by me. Review by Peter Geoghegan.	2014-08-06 16:06:06 -04:00
Peter Eisentraut	80ddd04b4d	Fix whitespace	2014-07-11 15:12:11 -04:00
Robert Haas	9f03ca9151	Avoid copying index tuples when building an index. The previous code, perhaps out of concern for avoiding memory leaks, formed the tuple in one memory context and then copied it to another memory context. However, this doesn't appear to be necessary, since index_form_tuple and the functions it calls take precautions against leaking memory. In my testing, building the tuple directly inside the sort context shaves several percent off the index build time. Rearrange things so we do that. Patch by me. Review by Amit Kapila, Tom Lane, Andres Freund.	2014-07-01 10:34:42 -04:00
Tom Lane	6554656ea2	Improve tuplestore's error messages for I/O failures. We should report the errno when we get a failure from functions like BufFileWrite. "ERROR: write failed" is unreasonably taciturn for a case that's well within the realm of possibility; I've seen it a couple times in the buildfarm recently, in situations that were probably out-of-disk-space, but it'd be good to see the errno to confirm it. I think this code was originally written without assuming that the buffile.c functions would return useful errno; but most other callers are assuming that, and a quick look at the buffile code gives no reason to suppose otherwise. Also, a couple of the old messages were phrased on the assumption that a short read might indicate a logic bug in tuplestore itself; but that code's pretty well tested by now, so a filesystem-level problem seems much more likely.	2014-06-12 18:59:06 -04:00
Bruce Momjian	0a78320057	pgindent run for 9.4. This includes removing tabs after periods in C comments, which was applied to back branches, so this change should not affect backpatching.	2014-05-06 12:12:18 -04:00
Tom Lane	e0c91a7ff0	Improve some O(N^2) behavior in window function evaluation. Repositioning the tuplestore seek pointer in window_gettupleslot() turns out to be a very significant expense when the window frame is sizable and the frame end can move. To fix, introduce a tuplestore function for skipping an arbitrary number of tuples in one call, parallel to the one we introduced for tuplesort objects in commit `8d65da1f`. This reduces the cost of window_gettupleslot() to O(1) if the tuplestore has not spilled to disk. As in the previous commit, I didn't try to do any real optimization of tuplestore_skiptuples for the case where the tuplestore has spilled to disk. There is probably no practical way to get the cost to less than O(N) anyway, but perhaps someone can think of something later. Also fix PersistHoldablePortal() to make use of this API now that we have it. Based on a suggestion by Dean Rasheed, though this turns out not to look much like his patch.	2014-04-13 13:59:17 -04:00
Bruce Momjian	7e04792a1c	Update copyright for 2014. Update all files in head, and files COPYRIGHT and legal.sgml in all back branches.	2014-01-07 16:05:30 -05:00
Tom Lane	1def747db6	Fix inadequately-tested code path in tuplesort_skiptuples(). Per report from Jeff Davis.	2013-12-24 17:13:02 -05:00
Tom Lane	8d65da1f01	Support ordered-set (WITHIN GROUP) aggregates. This patch introduces generic support for ordered-set and hypothetical-set aggregate functions, as well as implementations of the instances defined in SQL:2008 (percentile_cont(), percentile_disc(), rank(), dense_rank(), percent_rank(), cume_dist()). We also added mode() though it is not in the spec, as well as versions of percentile_cont() and percentile_disc() that can compute multiple percentile values in one pass over the data. Unlike the original submission, this patch puts full control of the sorting process in the hands of the aggregate's support functions. To allow the support functions to find out how they're supposed to sort, a new API function AggGetAggref() is added to nodeAgg.c. This allows retrieval of the aggregate call's Aggref node, which may have other uses beyond the immediate need. There is also support for ordered-set aggregates to install cleanup callback functions, so that they can be sure that infrastructure such as tuplesort objects gets cleaned up. In passing, make some fixes in the recently-added support for variadic aggregates, and make some editorial adjustments in the recent FILTER additions for aggregates. Also, simplify use of IsBinaryCoercible() by allowing it to succeed whenever the target type is ANY or ANYELEMENT. It was inconsistent that it dealt with other polymorphic target types but not these. Atri Sharma and Andrew Gierth; reviewed by Pavel Stehule and Vik Fearing, and rather heavily editorialized upon by Tom Lane	2013-12-23 16:11:35 -05:00
Stephen Frost	273dcd1628	Ensure 64bit arithmetic when calculating tapeSpace. In tuplesort.c:inittapes(), we calculate tapeSpace by first figuring out how many 'tapes' we can use (maxTapes) and then multiplying the result by the tape buffer overhead for each. Unfortunately, when we are on a system with an 8-byte long, we allow work_mem to be larger than 2GB and that allows maxTapes to be large enough that the 32bit arithmetic can overflow when multiplied against the buffer overhead. When this overflow happens, we end up adding the overflow to the amount of space available, causing the amount of memory allocated to be larger than work_mem. Note that to reach this point, you have to set work_mem to at least 24GB and be sorting a set which is at least that size. Given that a user who can set work_mem to 24GB could also set it even higher, if they were looking to run the system out of memory, this isn't considered a security issue. This overflow risk was found by the Coverity scanner. Back-patch to all supported branches, as this issue has existed since before 8.4.	2013-07-14 16:26:16 -04:00
Noah Misch	79e0f87a15	Use type "int64" for memory accounting in tuplesort.c/tuplestore.c. Commit `263865a489` switched tuplesort.c and tuplestore.c variables representing memory usage from type "long" to type "Size". This was unnecessary; I thought doing so avoided overflow scenarios on 64-bit Windows, but guc.c already limited work_mem so as to prevent the overflow. It was also incomplete, not touching the logic that assumed a signed data type. Change the affected variables to "int64". This is perfect for 64-bit platforms, and it reduces the need to contemplate platform-specific overflow scenarios. It also puts us close to being able to support work_mem over 2 GiB on 64-bit Windows. Per report from Andres Freund.	2013-07-04 23:13:54 -04:00
Noah Misch	263865a489	Permit super-MaxAllocSize allocations with MemoryContextAllocHuge(). The MaxAllocSize guard is convenient for most callers, because it reduces the need for careful attention to overflow, data type selection, and the SET_VARSIZE() limit. A handful of callers are happy to navigate those hazards in exchange for the ability to allocate a larger chunk. Introduce MemoryContextAllocHuge() and repalloc_huge(). Use this in tuplesort.c and tuplestore.c, enabling internal sorts of up to INT_MAX tuples, a factor-of-48 increase. In particular, B-tree index builds can now benefit from much-larger maintenance_work_mem settings. Reviewed by Stephen Frost, Simon Riggs and Jeff Janes.	2013-06-27 14:53:57 -04:00
Bruce Momjian	9af4159fce	pgindent run for release 9.3. This is the first run of the Perl-based pgindent script. Also update pgindent instructions.	2013-05-29 16:58:43 -04:00
Tom Lane	991f3e5ab3	Provide database object names as separate fields in error messages. This patch addresses the problem that applications currently have to extract object names from possibly-localized textual error messages, if they want to know for example which index caused a UNIQUE_VIOLATION failure. It adds new error message fields to the wire protocol, which can carry the name of a table, table column, data type, or constraint associated with the error. (Since the protocol spec has always instructed clients to ignore unrecognized field types, this should not create any compatibility problem.) Support for providing these new fields has been added to just a limited set of error reports (mainly, those in the "integrity constraint violation" SQLSTATE class), but we will doubtless add them to more calls in future. Pavel Stehule, reviewed and extensively revised by Peter Geoghegan, with additional hacking by Tom Lane.	2013-01-29 17:08:26 -05:00
Tom Lane	8ae35e9180	Improve memory space management in tuplesort and tuplestore. The code originally just doubled the size of the tuple-pointer array so long as that would fit in allowedMem. This could result in failing to use as much as half of allowedMem, if (as is typical) the last doubling attempt didn't quite fit. Worse, we might double the array size but be unable to use most of the added slots, because there was no room left within the allowedMem limit for tuples the slots should point to. To fix, double only so long as we've used less than half of allowedMem in total. Then do one more array enlargement, but scale it based on total memory consumption so far. This will work nicely as long as the average tuple size is reasonably stable, and in any case should be better than the old method. This change will result in large sort operations consuming a larger fraction of work_mem than they typically did in the past. The release notes should mention that users may want to revisit their work_mem settings, if they'd tuned those settings based on the old behavior of sorting. Jeff Janes, reviewed by Peter Geoghegan and Robert Haas	2013-01-17 13:12:56 -05:00
Bruce Momjian	bd61a623ac	Update copyrights for 2013. Fully update git head, and update back branches in ./COPYRIGHT and legal.sgml files.	2013-01-01 17:15:01 -05:00
Alvaro Herrera	976fa10d20	Add support for easily declaring static inline functions. We already had those, but they forced modules to spell out the function bodies twice. Eliminate some duplicates we had already grown. Extracted from a somewhat larger patch from Andres Freund.	2012-10-08 16:28:01 -03:00
Alvaro Herrera	c219d9b0a5	Split tuple struct defs from htup.h to htup_details.h This reduces unnecessary exposure of other headers through htup.h, which is very widely included by many files. I have chosen to move the function prototypes to the new file as well, because that means htup.h no longer needs to include tupdesc.h. In itself this doesn't have much effect in indirect inclusion of tupdesc.h throughout the tree, because it's also required by execnodes.h; but it's something to explore in the future, and it seemed best to do the htup.h change now while I'm busy with it.	2012-08-30 16:52:35 -04:00
Bruce Momjian	042d9ffc28	Run newly-configured perltidy script on Perl files. Run on HEAD and 9.2.	2012-07-04 21:47:49 -04:00
Bruce Momjian	927d61eeff	Run pgindent on 9.2 source tree in preparation for first 9.3 commit-fest.	2012-06-10 15:20:04 -04:00
Tom Lane	95b9c333b2	Further adjustment of comment about qsort_tuple.	2012-04-07 17:48:40 -04:00
Tom Lane	17b985b1a0	Fix broken comparetup_datum code. Commit `337b6f5ecf` contained the entirely fanciful assumption that it had made comparetup_datum unreachable. Reported and patched by Takashi Yamamoto. Fix up some not terribly accurate/useful comments from that commit, too.	2012-04-06 16:58:50 -04:00
Robert Haas	aefa6d163e	Add some CHECK_FOR_INTERRUPTS() calls to the heap-sort call path. I broke this in commit `337b6f5ecf`, which among other things arranged for quicksorts to CHECK_FOR_INTERRUPTS() slightly less frequently. Sadly, it also arranged for heapsorts to CHECK_FOR_INTERRUPTS() much less frequently. Repair.	2012-03-20 21:26:39 -04:00
Robert Haas	edec8c8e00	Fix VPATH builds, broken by my recent commit to speed up tuplesorting. The relevant commit is `337b6f5ecf`.	2012-02-15 15:53:53 -05:00
Robert Haas	337b6f5ecf	Speed up in-memory tuplesorting. Per recent work by Peter Geoghegan, it's significantly faster to tuplesort on a single sortkey if ApplySortComparator is inlined into quicksort rather than reached via a function pointer. It's also faster in general to have a version of quicksort which is specialized for sorting SortTuple objects rather than objects of arbitrary size and type. This requires a couple of additional copies of the quicksort logic, which in this patch are generated using a Perl script. There might be some benefit in adding further specializations here too, but thus far it's not clear that those gains are worth their weight in code footprint.	2012-02-15 12:13:32 -05:00
Robert Haas	c5a03256c7	Adjust tuplesort.c based on the fact that we never use the OS's qsort(). Our own qsort_arg() implementation doesn't have the defect previously observed to affect only QNX 4, so it seems sufficient to assert that it isn't broken rather than retesting. Also, update a few comments to clarify why it's valuable to retain a tie-break rule based on CTID during index builds. Peter Geoghegan, with slight tweaks by me.	2012-01-26 14:43:28 -05:00
Bruce Momjian	e126958c2e	Update copyright notices for year 2012.	2012-01-01 18:01:58 -05:00
Tom Lane	c6e3ac11b6	Create a "sort support" interface API for faster sorting. This patch creates an API whereby a btree index opclass can optionally provide non-SQL-callable support functions for sorting. In the initial patch, we only use this to provide a directly-callable comparator function, which can be invoked with a bit less overhead than the traditional SQL-callable comparator. While that should be of value in itself, the real reason for doing this is to provide a datatype-extensible framework for more aggressive optimizations, as in Peter Geoghegan's recent work. Robert Haas and Tom Lane	2011-12-07 00:19:39 -05:00
Bruce Momjian	6416a82a62	Remove unnecessary #include references, per pgrminclude script.	2011-09-01 10:04:27 -04:00
Tom Lane	10db3de66e	Fix failure to account for memory used by tuplestore_putvalues(). This oversight could result in a tuplestore using much more than the intended amount of memory. It would only happen in a code path that loaded a tuplestore via tuplestore_putvalues(), and many of those won't emit huge amounts of data; but cases such as holdable cursors and plpgsql's RETURN NEXT command could have the problem. The fix ensures that the tuplestore will switch to write-to-disk mode when it overruns work_mem. The potential overrun was finite, because we would still count the space used by the tuple pointer array, so the tuplestore code would eventually flip into write-to-disk mode anyway. When storing wide tuples we would go far past the expected work_mem usage before that happened; but this may account for the lack of prior reports. Back-patch to 8.4, where tuplestore_putvalues was introduced. Per bug #6061 from Yann Delorme.	2011-06-15 14:05:22 -04:00
Tom Lane	d64713df7e	Pass collations to functions in FunctionCallInfoData, not FmgrInfo. Since collation is effectively an argument, not a property of the function, FmgrInfo is really the wrong place for it; and this becomes critical in cases where a cached FmgrInfo is used for varying purposes that might need different collation settings. Fix by passing it in FunctionCallInfoData instead. In particular this allows a clean fix for bug #5970 (record_cmp not working). This requires touching a bit more code than the original method, but nobody ever thought that collations would not be an invasive patch...	2011-04-12 19:19:24 -04:00
Bruce Momjian	bf50caf105	pgindent run before PG 9.1 beta 1.	2011-04-10 11:42:00 -04:00
Tom Lane	7208fae18f	Clean up cruft around collation initialization for tupdescs and scankeys. I found actual bugs in GiST and plpgsql; the rest of this is cosmetic but meant to decrease the odds of future bugs of omission.	2011-03-26 18:28:40 -04:00
Tom Lane	b310b6e31c	Revise collation derivation method and expression-tree representation. All expression nodes now have an explicit output-collation field, unless they are known to only return a noncollatable data type (such as boolean or record). Also, nodes that can invoke collation-aware functions store a separate field that is the collation value to pass to the function. This avoids confusion that arises when a function has collatable inputs and noncollatable output type, or vice versa. Also, replace the parser's on-the-fly collation assignment method with a post-pass over the completed expression tree. This allows us to use a more complex (and hopefully more nearly spec-compliant) assignment rule without paying for it in extra storage in every expression node. Fix assorted bugs in the planner's handling of collations by making collation one of the defining properties of an EquivalenceClass and by converting CollateExprs into discardable RelabelType nodes during expression preprocessing.	2011-03-19 20:30:08 -04:00
Peter Eisentraut	414c5a2ea6	Per-column collation support. This adds collation support for columns and domains, a COLLATE clause to override it per expression, and B-tree index support. Peter Eisentraut, reviewed by Pavel Stehule, Itagaki Takahiro, Robert Haas, Noah Misch	2011-02-08 23:04:18 +02:00
Bruce Momjian	5d950e3b0c	Stamp copyrights for year 2011.	2011-01-01 13:18:15 -05:00
Tom Lane	244407a710	Fix efficiency problems in tuplestore_trim(). The original coding in tuplestore_trim() was only meant to work efficiently in cases where each trim call deleted most of the tuples in the store. Which, in fact, was the pattern of the original usage with a Material node supporting mark/restore operations underneath a MergeJoin. However, WindowAgg now uses tuplestores and it has considerably less friendly trimming behavior. In particular it can attempt to trim one tuple at a time off a large tuplestore. tuplestore_trim() had O(N^2) runtime in this situation because of repeatedly shifting its tuple pointer array. Fix by avoiding shifting the array until a reasonably large number of tuples have been deleted. This can waste some pointer space, but we do still reclaim the tuples themselves, so the percentage wastage should be pretty small. Per Jie Li's report of slow percent_rank() evaluation. cume_dist() and ntile() would certainly be affected as well, along with any other window function that has a moving frame start and requires reading substantially ahead of the current row. Back-patch to 8.4, where window functions were introduced. There's no need to tweak it before that.	2010-12-10 11:33:38 -05:00
Tom Lane	26a7b48e10	Eliminate some repetitive coding in tuplesort.c. Use a macro LogicalTapeReadExact() to encapsulate the error check when we want to read an exact number of bytes from a "tape". Per a suggestion of Takahiro Itagaki.	2010-10-07 20:32:21 -04:00
Tom Lane	3ba11d3df2	Teach CLUSTER to use seqscan-and-sort when it's faster than indexscan. ... or at least, when the planner's cost estimates say it will be faster. Leonardo Francalanci, reviewed by Itagaki Takahiro and Tom Lane	2010-10-07 20:00:28 -04:00
Magnus Hagander	9f2e211386	Remove cvs keywords from all files.	2010-09-20 22:08:53 +02:00
Bruce Momjian	65e806cba1	pgindent run for 9.0	2010-02-26 02:01:40 +00:00
Bruce Momjian	0239800893	Update copyright for the year 2010.	2010-01-02 16:58:17 +00:00
Heikki Linnakangas	84d723b6ce	Previous fix for temporary file management broke returning a set from PL/pgSQL function within an exception handler. Make sure we use the right resource owner when we create the tuplestore to hold returned tuples. Simplify tuplestore API so that the caller doesn't need to be in the right memory context when calling tuplestore_put* functions. tuplestore.c automatically switches to the memory context used when the tuplestore was created. Tuplesort was already modified like this earlier. This patch also removes the now useless MemoryContextSwitch calls from callers. Report by Aleksei on pgsql-bugs on Dec 22 2009. Backpatch to 8.1, like the previous patch that broke this.	2009-12-29 17:40:59 +00:00
Tom Lane	9bd27b7c9e	Extend EXPLAIN to support output in XML or JSON format. There are probably still some adjustments to be made in the details of the output, but this gets the basic structure in place. Robert Haas	2009-08-10 05:46:50 +00:00
Tom Lane	527f0ae3fa	Department of second thoughts: let's show the exact key during unique index build failures, too. Refactor a bit more since that error message isn't spelled the same.	2009-08-01 20:59:17 +00:00
Bruce Momjian	d747140279	8.4 pgindent run, with new combined Linux/FreeBSD/MinGW typedef list provided by Andrew.	2009-06-11 14:49:15 +00:00
Tom Lane	25bf7f8b9b	Fix possible failures when a tuplestore switches from in-memory to on-disk mode while callers hold pointers to in-memory tuples. I reported this for the case of nodeWindowAgg's primary scan tuple, but inspection of the code shows that all of the calls in nodeWindowAgg and nodeCtescan are at risk. For the moment, fix it with a rather brute-force approach of copying whenever one of the at-risk callers requests a tuple. Later we might think of some sort of reference-count approach to reduce tuple copying.	2009-03-27 18:30:21 +00:00
Tom Lane	e04810e8c4	Code review for dtrace probes added (so far) to 8.4. Adjust placement of some bufmgr probes, take out redundant and memory-leak-inducing path arguments to smgr__md__read__done and smgr__md__write__done, fix bogus attempt to recalculate space used in sort__done, clean up formatting in places where I'm not sure pgindent will do a nice job by itself.	2009-03-11 23:19:25 +00:00
Bruce Momjian	511db38ace	Update copyright for 2009.	2009-01-01 17:24:05 +00:00
Tom Lane	95b07bc7f5	Support window functions a la SQL:2008. Hitoshi Harada, with some kibitzing from Heikki and Tom.	2008-12-28 18:54:01 +00:00
Tom Lane	38e9348282	Make a couple of small changes to the tuplestore API, for the benefit of the upcoming window-functions patch. First, tuplestore_trim is now an exported function that must be explicitly invoked by callers at appropriate times, rather than something that tuplestore tries to do behind the scenes. Second, a read pointer that is marked as allowing backward scan no longer prevents truncation. This means that a read pointer marked as having BACKWARD but not REWIND capability can only safely read backwards as far as the oldest other read pointer. (The expected use pattern for this involves having another read pointer that serves as the truncation fencepost.)	2008-12-27 17:39:00 +00:00
Tom Lane	d26bf23f34	Arrange to squeeze out the MINIMAL_TUPLE_PADDING in the tuple representation written to temp files by tuplesort.c and tuplestore.c. This saves 2 bytes per row for 32-bit machines, and 6 bytes per row for 64-bit machines, which seems worth the slight additional uglification of the tuple read/write routines.	2008-10-28 15:51:03 +00:00
Tom Lane	34f89cb4af	Fix oversight in recent patch to support multiple read positions in tuplestore: in READFILE state tuplestore_select_read_pointer must save the current file seek position in the read pointer being deactivated.	2008-10-07 00:05:55 +00:00
Tom Lane	44d5be0e53	Implement SQL-standard WITH clauses, including WITH RECURSIVE. There are some unimplemented aspects: recursive queries must use UNION ALL (should allow UNION too), and we don't have SEARCH or CYCLE clauses. These might or might not get done for 8.4, but even without them it's a pretty useful feature. There are also a couple of small loose ends and definitional quibbles, which I'll send a memo about to pgsql-hackers shortly. But let's land the patch now so we can get on with other development. Yoshiyuki Asaba, with lots of help from Tatsuo Ishii and Tom Lane	2008-10-04 21:56:55 +00:00
Tom Lane	dad4cb6258	Improve tuplestore.c to support multiple concurrent read positions. This facility replaces the former mark/restore support but is otherwise upward-compatible with previous uses. It's expected to be needed for single evaluation of CTEs and also for window functions, so I'm committing it separately instead of waiting for either one of those patches to be finished. Per discussion with Greg Stark and Hitoshi Harada. Note: I removed nodeFunctionscan's mark/restore support, instead of bothering to update it for this change, because it was dead code anyway.	2008-10-01 19:51:50 +00:00
Tom Lane	4adc2f72a4	Change hash indexes to store only the hash code rather than the whole indexed value. This means that hash index lookups are always lossy and have to be rechecked when the heap is visited; however, the gain in index compactness outweighs this when the indexed values are wide. Also, we only need to perform datatype comparisons when the hash codes match exactly, rather than for every entry in the hash bucket; so it could also win for datatypes that have expensive comparison functions. A small additional win is gained by keeping hash index pages sorted by hash code and using binary search to reduce the number of index tuples we have to look at. Xiao Meng. This commit also incorporates Zdenek Kotala's patch to isolate hash metapages and hash bitmaps a bit better from the page header datastructures.	2008-09-15 18:43:41 +00:00
Alvaro Herrera	e36e6b1cab	Add a few more DTrace probes to the backend. Robert Lor	2008-08-01 13:16:09 +00:00
Alvaro Herrera	a3540b0f65	Improve our #include situation by moving pointer types away from the corresponding struct definitions. This allows other headers to avoid including certain highly-loaded headers such as rel.h and relscan.h, instead using just relcache.h, heapam.h or genam.h, which are more lightweight and thus cause less unnecessary dependencies.	2008-06-19 00:46:06 +00:00
Alvaro Herrera	f8c4d7db60	Restructure some header files a bit, in particular heapam.h, by removing some unnecessary #include lines in it. Also, move some tuple routine prototypes and macros to htup.h, which allows removal of heapam.h inclusion from some .c files. For this to work, a new header file access/sysattr.h needed to be created, initially containing attribute numbers of system columns, for pg_dump usage. While at it, make contrib ltree, intarray and hstore header files more consistent with our header style.	2008-05-12 00:00:54 +00:00
Neil Conway	1d812a98b4	Add a new tuplestore API function, tuplestore_putvalues(). This is identical to tuplestore_puttuple(), except it operates on arrays of Datums + nulls rather than a fully-formed HeapTuple. In several places that use the tuplestore API, this means we can avoid creating a HeapTuple altogether, saving a copy.	2008-03-25 19:26:54 +00:00
Tom Lane	0c5962c054	Grab some low-hanging fruit in the new hash index build code. oprofile shows that a nontrivial amount of time is being spent in repeated calls to index_getprocinfo, which really only needs to be called once. So do that, and inline _hash_datum2hashkey to make it work.	2008-03-17 03:45:36 +00:00
Tom Lane	787eba734b	When creating a large hash index, pre-sort the index entries by estimated bucket number, so as to ensure locality of access to the index during the insertion step. Without this, building an index significantly larger than available RAM takes a very long time because of thrashing. On the other hand, sorting is just useless overhead when the index does fit in RAM. We choose to sort when the initial index size exceeds effective_cache_size. This is a revised version of work by Tom Raney and Shreya Bhargava.	2008-03-16 23:15:08 +00:00
Tom Lane	f0828b2fc3	Provide a build-time option to store large relations as single files, rather than dividing them into 1GB segments as has been our longtime practice. This requires working support for large files in the operating system; at least for the time being, it won't be the default. Zdenek Kotala	2008-03-10 20:06:27 +00:00
Peter Eisentraut	0474dcb608	Refactor backend makefiles to remove lots of duplicate code	2008-02-19 10:30:09 +00:00
Bruce Momjian	9098ab9e32	Update copyrights in source tree to 2008.	2008-01-01 19:46:01 +00:00
Bruce Momjian	fdf5a5efb7	pgindent run for 8.3.	2007-11-15 21:14:46 +00:00
Tom Lane	2aae35d049	Mention the index name in 'could not create unique index' errors, per suggestion from Rene Gollent.	2007-10-29 21:31:28 +00:00
Tom Lane	d2825e1c85	Since sort_bounded_heap makes state changes that should be made regardless of the number of tuples involved, it's incorrect to skip it when memtupcount = 1; the number of cycles saved is minuscule anyway. An alternative solution would be to pull the state changes out to the call site in tuplesort_performsort, but keeping them near the corresponding changes in make_bounded_heap seems marginally cleaner. Noticed by Greg Stark.	2007-09-01 18:47:39 +00:00
Neil Conway	494d6f809e	Fix a memory leak in tuplestore_end(). Unlikely to be significant during normal operation, but tuplestore_end() ought to do what it claims to do.	2007-08-02 17:48:52 +00:00
Tom Lane	24ee8af573	Rework temp_tablespaces patch so that temp tablespaces are assigned separately for each temp file, rather than once per sort or hashjoin; this allows spreading the data of a large sort or join across multiple tablespaces. (I remain dubious that this will make any difference in practice, but certain people insisted.) Arrange to cache the results of parsing the GUC variable instead of recomputing from scratch on every demand, and push usage of the cache down to the bottommost fd.c level.	2007-06-07 19:19:57 +00:00
Tom Lane	acfce502ba	Create a GUC parameter temp_tablespaces that allows selection of the tablespace(s) in which to store temp tables and temporary files. This is a list to allow spreading the load across multiple tablespaces (a random list element is chosen each time a temp object is to be created). Temp files are not stored in per-database pgsql_tmp/ directories anymore, but per-tablespace directories. Jaime Casanova and Albert Cervera, with review by Bernd Helmle and Tom Lane.	2007-06-03 17:08:34 +00:00
Tom Lane	2415ad9831	Teach tuplestore.c to throw away data before the "mark" point when the caller is using mark/restore but not rewind or backward-scan capability. Insert a materialize plan node between a mergejoin and its inner child if the inner child is a sort that is expected to spill to disk. The materialize shields the sort from the need to do mark/restore and thereby allows it to perform its final merge pass on-the-fly; while the materialize itself is normally cheap since it won't spill to disk unless the number of tuples with equal key values exceeds work_mem. Greg Stark, with some kibitzing from Tom Lane.	2007-05-21 17:57:35 +00:00
Tom Lane	d2a4a4069f	Add a line to the EXPLAIN ANALYZE output for a Sort node, showing the actual sort strategy and amount of space used. By popular demand.	2007-05-04 21:29:53 +00:00
Tom Lane	d26559dbf3	Teach tuplesort.c about "top N" sorting, in which only the first N tuples need be returned. We keep a heap of the current best N tuples and sift-up new tuples into it as we scan the input. For M input tuples this means only about Mlog(N) comparisons instead of Mlog(M), not to mention a lot less workspace when N is small --- avoiding spill-to-disk for large M is actually the most attractive thing about it. Patch includes planner and executor support for invoking this facility in ORDER BY ... LIMIT queries. Greg Stark, with some editorialization by moi.	2007-05-04 01:13:45 +00:00
Peter Eisentraut	2cc01004c6	Remove remains of old depend target.	2007-01-20 17:16:17 +00:00
Tom Lane	a191a169d6	Change the planner-to-executor API so that the planner tells the executor which comparison operators to use for plan nodes involving tuple comparison (Agg, Group, Unique, SetOp). Formerly the executor looked up the default equality operator for the datatype, which was really pretty shaky, since it's possible that the data being fed to the node is sorted according to some nondefault operator class that could have an incompatible idea of equality. The planner knows what it has sorted by and therefore can provide the right equality operator to use. Also, this change moves a couple of catalog lookups out of the executor and into the planner, which should help startup time for pre-planned queries by some small amount. Modify the planner to remove some other cavalier assumptions about always being able to use the default operators. Also add "nulls first/last" info to the Plan node for a mergejoin --- neither the executor nor the planner can cope yet, but at least the API is in place.	2007-01-10 18:06:05 +00:00
Tom Lane	4431758229	Support ORDER BY ... NULLS FIRST/LAST, and add ASC/DESC/NULLS FIRST/NULLS LAST per-column options for btree indexes. The planner's support for this is still pretty rudimentary; it does not yet know how to plan mergejoins with nondefault ordering options. The documentation is pretty rudimentary, too. I'll work on improving that stuff later. Note incompatible change from prior behavior: ORDER BY ... USING will now be rejected if the operator is not a less-than or greater-than member of some btree opclass. This prevents less-than-sane behavior if an operator that doesn't actually define a proper sort ordering is selected.	2007-01-09 02:14:16 +00:00
Bruce Momjian	29dccf5fe0	Update CVS HEAD for 2007 copyright. Back branches are typically not back-stamped for this.	2007-01-05 22:20:05 +00:00
Tom Lane	a78fcfb512	Restructure operator classes to allow improved handling of cross-data-type cases. Operator classes now exist within "operator families". While most families are equivalent to a single class, related classes can be grouped into one family to represent the fact that they are semantically compatible. Cross-type operators are now naturally adjunct parts of a family, without having to wedge them into a particular opclass as we had done originally. This commit restructures the catalogs and cleans up enough of the fallout so that everything still works at least as well as before, but most of the work needed to actually improve the planner's behavior will come later. Also, there are not yet CREATE/DROP/ALTER OPERATOR FAMILY commands; the only way to create a new family right now is to allow CREATE OPERATOR CLASS to make one by default. I owe some more documentation work, too. But that can all be done in smaller pieces once this infrastructure is in place.	2006-12-23 00:43:13 +00:00
Bruce Momjian	f99a569a2e	pgindent run for 8.2.	2006-10-04 00:30:14 +00:00
Tom Lane	6edd2b4a91	Switch over to using our own qsort() all the time, as has been proposed repeatedly. Now that we don't have to worry about memory leaks from glibc's qsort, we can safely put CHECK_FOR_INTERRUPTS into the tuplesort comparators, as was requested a couple months ago. Also, get rid of non-reentrancy and an extra level of function call in tuplesort.c by providing a variant qsort_arg() API that passes an extra void * argument through to the comparison routine. (We might want to use that in other places too, I didn't look yet.)	2006-10-03 22:18:23 +00:00
Bruce Momjian	e0522505bd	Remove 576 references of include files that were not needed.	2006-07-14 14:52:27 +00:00
Tom Lane	cdd5178c69	Extend the MinimalTuple concept to tuplesort.c, thereby reducing the per-tuple space overhead for sorts in memory. I chose to replace the previous patch that tried to write out the bare minimum amount of data when sorting on disk; instead, just dump the MinimalTuples as-is. This wastes 3 to 10 bytes per tuple depending on architecture and null-bitmap length, but the simplification in the writetup/readtup routines seems worth it.	2006-06-27 16:53:02 +00:00
Tom Lane	3f50ba27cf	Create infrastructure for 'MinimalTuple' representation of in-memory tuples with less header overhead than a regular HeapTuple, per my recent proposal. Teach TupleTableSlot code how to deal with these. As proof of concept, change tuplestore.c to store MinimalTuples instead of HeapTuples. Future patches will expand the concept to other places where it is useful.	2006-06-27 02:51:40 +00:00
Tom Lane	7f52e0c50e	Tweak writetup_heap/readtup_heap to avoid storing the tuple identity and transaction visibility fields of tuples being sorted. These are always uninteresting in a tuple being sorted (if the fields were actually selected, they'd have been pulled out into user columns beforehand). This saves about 24 bytes per row being sorted, which is a useful savings for any but the widest of sort rows. Per recent discussion.	2006-05-23 21:37:59 +00:00
Tom Lane	c65ab0bfa9	Recent changes in memory management in tuplesort.c had a problem: the case where we run low on array slots before we run low on memory is much more probable than I had thought, and so it's important to treat each tape fairly in that case. To fix this, track per-tape slot allocations just like we track per-tape space allocation. Also, in the FINALMERGE code path avoid scanning all the input tapes when we really only need to read from one. This should fix poor behavior with very large work_mem as exhibited by Stefan Kaltenbrunner. I didn't do anything about putting an upper bound on the number of tapes, but maybe we should still consider that.	2006-03-10 23:19:00 +00:00
Tom Lane	c8cd76de28	Tweak trace_sort code to show the merge order (number of active input tapes) for each merge step. This will give us some idea of how effective the merge distribution algorithm is.	2006-03-08 16:59:03 +00:00
Tom Lane	43ceb3d449	Further examination of ltsReleaseBlock usage shows that it's got a performance issue during regular merge passes not only the 'final merge' case. The original design contemplated that there would never be more than about one free block per 'tape', hence no need for an efficient method of keeping the free blocks sorted. But given the later addition of merge preread behavior in tuplesort.c, there is likely to be about work_mem worth of free blocks, which is not so small ... and for that matter the number of tapes isn't necessarily small anymore either. So we'd better get rid of the assumption entirely. Instead, I'm assuming that the usage pattern will involve alternation between merge preread and writing of a new run. This makes it reasonable to just add blocks to the list without sorting during successive ltsReleaseBlock calls, and then do a qsort() when we start getting ltsGetFreeBlock() calls. Experimentation seems to confirm that there aren't many qsort calls relative to the number of ltsReleaseBlock/ltsGetFreeBlock calls.	2006-03-07 23:46:24 +00:00
Tom Lane	8db05ba411	Repair old performance bug in tuplesort.c/logtape.c. In the case where we are doing the final merge pass on-the-fly, and not writing the data back onto a 'tape', the number of free blocks in the tape set will become large, leading to a lot of time wasted in ltsReleaseBlock(). There is really no need to track the free blocks anymore in this state, so add a simple shutoff switch. Per report from Stefan Kaltenbrunner.	2006-03-07 19:06:50 +00:00
Bruce Momjian	f2f5b05655	Update copyright for 2006. Update scripts.	2006-03-05 15:59:11 +00:00
Tom Lane	2689abf078	Incorporate a couple of recent tuplesort.c improvements into tuplestore.c. In particular, ensure that enlargement of the memtuples[] array doesn't fall foul of MaxAllocSize when work_mem is very large, and don't bother enlarging it if that would force an immediate switch into 'tape' mode anyway.	2006-03-04 19:30:12 +00:00
Tom Lane	80cadb303c	Prevent sorting from requesting a SortTuple array that exceeds MaxAllocSize; we'll go over to disk-based sort if we reach that limit. This fixes Stefan Kaltenbrunner's observation that sorting can suffer an 'invalid memory alloc request size' failure when sort_mem is set large enough. It's unfortunately not so easy to fix in 8.1 ...	2006-03-04 19:05:06 +00:00
Tom Lane	909ca1407c	Improve sorting speed by pre-extracting the first sort-key column of each tuple, as per my proposal of several days ago. Also, clean up sort memory management by keeping all working data in a separate memory context, and refine the handling of low-memory conditions.	2006-02-26 22:58:12 +00:00
Tom Lane	21e2544aa7	Update obsolete comment.	2006-02-19 19:59:53 +00:00
Tom Lane	b34aa3372f	Modify logtape.c so that the initial LogicalTapeSetCreate call only allocates the control data. The per-tape buffers are allocated only on first use. This saves memory in situations where tuplesort.c overestimates the number of tapes needed (ie, there are fewer runs than tapes). Also, this makes legitimate the coding in inittapes() that includes tape buffer space in the maximum-memory calculation: when inittapes runs, we've already expended the whole allowed memory on tuple storage, and so we'd better not allocate all the tape buffers until we've flushed some tuples out of memory.	2006-02-19 05:58:36 +00:00
Tom Lane	df700e6b40	Improve tuplesort.c to support variable merge order. The original coding with fixed merge order (fixed number of "tapes") was based on obsolete assumptions, namely that tape drives are expensive. Since our "tapes" are really just a couple of buffers, we can have a lot of them given adequate workspace. This allows reduction of the number of merge passes with consequent savings of I/O during large sorts. Simon Riggs with some rework by Tom Lane	2006-02-19 05:54:06 +00:00
Bruce Momjian	a1675649e4	Remove QNX port.	2006-01-05 01:56:30 +00:00
Bruce Momjian	436a2956d8	Re-run pgindent, fixing a problem where comment lines after a blank comment line were output as too long, and update typedefs for /lib directory. Also fix case where identifiers were used as variable names in the backend, but as typedefs in ecpg (favor the backend for indenting). Backpatch to 8.1.X.	2005-11-22 18:17:34 +00:00
Tom Lane	dd218ae7b0	Remove the t_datamcxt field of HeapTupleData. This was introduced for the convenience of tuptoaster.c and is no longer needed, so may as well get rid of some small amount of overhead.	2005-11-20 19:49:08 +00:00
Bruce Momjian	9ee8b9fd38	Change trace_sort to output to the log, rather than the user's terminal.	2005-10-25 13:47:08 +00:00
Tom Lane	b33a732264	Improve trace_sort code to also show the total memory or disk space used. Per request from Marc.	2005-10-18 22:59:37 +00:00
Bruce Momjian	1dc3498251	Standard pgindent run for 8.1.	2005-10-15 02:49:52 +00:00
Tom Lane	53e47cdd79	Add a trace_sort option to help with measuring resource usage of external sort operations. Per recent discussion. Simon Riggs and Tom Lane.	2005-10-03 22:55:56 +00:00
Tom Lane	06d70d78a4	Fix typo in comment.	2005-09-23 15:36:57 +00:00
Bruce Momjian	b492c3accc	Add parentheses to macros when args are used in computations. Without them, the execution behavior could be unexpected.	2005-05-25 21:40:43 +00:00
Tom Lane	278bd0cc22	For some reason access/tupmacs.h has been #including utils/memutils.h, which is neither needed by nor related to that header. Remove the bogus inclusion and instead include the header in those C files that actually need it. Also fix unnecessary inclusions and bad inclusion order in tsearch2 files.	2005-05-06 17:24:55 +00:00
Tom Lane	bd9b4a9d46	Use InitFunctionCallInfoData() macro instead of MemSet in performance critical places in execQual. By Atsushi Ogawa; some minor cleanup by moi.	2005-03-22 20:13:09 +00:00
Tom Lane	ad476170e9	Improve performance of fmgr.c calling routines for cases with more than two arguments. Per suggestions from A. Ogawa.	2005-02-02 22:40:04 +00:00
PostgreSQL Daemon	2ff501590b	Tag appropriate files for rc3 Also performed an initial run through of upgrading our Copyright date to extend to 2005 ... first run here was very simple ... change everything where: grep 1996-2004 && the word 'Copyright' ... scanned through the generated list with 'less' first, and after, to make sure that I only picked up the right entries ...	2004-12-31 22:04:05 +00:00
Bruce Momjian	b6b71b85bc	Pgindent run for 8.0.	2004-08-29 05:07:03 +00:00
Bruce Momjian	da9a8649d8	Update copyright to 2004.	2004-08-29 04:13:13 +00:00
Tom Lane	fbac1272b8	During btree index build, sort equal-keyed tuples according to their TID (heap position). This doesn't do anything to the validity of the finished index, but by pretending to qsort() that there are no really equal keys in the sort, we can avoid performance problems with qsort implementations that have trouble with large numbers of equal keys. Patch from Manfred Koizar.	2004-03-17 22:24:58 +00:00
Tom Lane	391c3811a2	Rename SortMem and VacuumMem to work_mem and maintenance_work_mem. Make btree index creation and initial validation of foreign-key constraints use maintenance_work_mem rather than work_mem as their memory limit. Add some code to guc.c to allow these variables to be referenced by their old names in SHOW and SET commands, for backwards compatibility.	2004-02-03 17:34:04 +00:00
PostgreSQL Daemon	969685ad44	$Header: -> $PostgreSQL Changes ...	2003-11-29 19:52:15 +00:00
Tom Lane	fa5c8a055a	Cross-data-type comparisons are now indexable by btrees, pursuant to my pghackers proposal of 8-Nov. All the existing cross-type comparison operators (int2/int4/int8 and float4/float8) have appropriate support. The original proposal of storing the right-hand-side datatype as part of the primary key for pg_amop and pg_amproc got modified a bit in the event; it is easier to store zero as the 'default' case and only store a nonzero when the operator is actually cross-type. Along the way, remove the long-since-defunct bigbox_ops operator class.	2003-11-12 21:15:59 +00:00
Tom Lane	c1d62bfd00	Add operator strategy and comparison-value datatype fields to ScanKey. Remove the 'strategy map' code, which was a large amount of mechanism that no longer had any use except reverse-mapping from procedure OID to strategy number. Passing the strategy number to the index AM in the first place is simpler and faster. This is a preliminary step in planned support for cross-datatype index operations. I'm committing it now since the ScanKeyEntryInitialize() API change touches quite a lot of files, and I want to commit those changes before the tree drifts under me.	2003-11-09 21:30:38 +00:00
Tom Lane	ec646dbc65	Create a 'type cache' that keeps track of the data needed for any particular datatype by array_eq and array_cmp; use this to solve problems with memory leaks in array indexing support. The parser's equality_oper and ordering_oper routines also use the cache. Change the operator search algorithms to look for appropriate btree or hash index opclasses, instead of assuming operators named '<' or '=' have the right semantics. (ORDER BY ASC/DESC now also look at opclasses, instead of assuming '<' and '>' are the right things.) Add several more index opclasses so that there is no regression in functionality for base datatypes. initdb forced due to catalog additions.	2003-08-17 19:58:06 +00:00
Bruce Momjian	f3c3deb7d0	Update copyrights to 2003.	2003-08-04 02:40:20 +00:00
Bruce Momjian	089003fb46	pgindent run.	2003-08-04 00:43:34 +00:00
Tom Lane	689eb53e47	Error message editing in backend/utils (except /adt).	2003-07-25 20:18:01 +00:00
Tom Lane	1c9ac7dfd0	Change pg_amop's index on (amopclaid,amopopr) to index (amopopr,amopclaid). This makes no difference for existing uses, but allows SelectSortFunction() and pred_test_simple_clause() to use indexscans instead of seqscans to locate entries for a particular operator in pg_amop. Better yet, they can use the SearchSysCacheList() API to cache the search results.	2003-05-13 04:38:58 +00:00
Tom Lane	4a5f38c4e6	Code review for holdable-cursors patch. Fix error recovery, memory context sloppiness, some other things. Includes Neil's mopup patch of 22-Apr.	2003-04-29 03:21:30 +00:00
Bruce Momjian	54f7338fa1	This patch implements holdable cursors, following the proposal (materialization into a tuple store) discussed on pgsql-hackers earlier. I've updated the documentation and the regression tests. Notes on the implementation: - I needed to change the tuple store API slightly -- it assumes that it won't be used to hold data across transaction boundaries, so the temp files that it uses for on-disk storage are automatically reclaimed at end-of-transaction. I added a flag to tuplestore_begin_heap() to control this behavior. Is changing the tuple store API in this fashion OK? - in order to store executor results in a tuple store, I added a new CommandDest. This works well for the most part, with one exception: the current DestFunction API doesn't provide enough information to allow the Executor to store results into an arbitrary tuple store (where the particular tuple store to use is chosen by the call site of ExecutorRun). To workaround this, I've temporarily hacked up a solution that works, but is not ideal: since the receiveTuple DestFunction is passed the portal name, we can use that to lookup the Portal data structure for the cursor and then use that to get at the tuple store the Portal is using. This unnecessarily ties the Portal code with the tupleReceiver code, but it works... The proper fix for this is probably to change the DestFunction API -- Tom suggested passing the full QueryDesc to the receiveTuple function. In that case, callers of ExecutorRun could "subclass" QueryDesc to add any additional fields that their particular CommandDest needed to get access to. This approach would work, but I'd like to think about it for a little bit longer before deciding which route to go. In the mean time, the code works fine, so I don't think a fix is urgent. - (semi-related) I added a NO SCROLL keyword to DECLARE CURSOR, and adjusted the behavior of SCROLL in accordance with the discussion on -hackers. - (unrelated) Cleaned up some SGML markup in sql.sgml, copy.sgml Neil Conway	2003-03-27 16:51:29 +00:00
Tom Lane	aa60eecc37	Revise tuplestore and nodeMaterial so that we don't have to read the entire contents of the subplan into the tuplestore before we can return any tuples. Instead, the tuplestore holds what we've already read, and we fetch additional rows from the subplan as needed. Random access to the previously-read rows works with the tuplestore, and doesn't affect the state of the partially-read subplan. This is a step towards fixing the problems with cursors over complex queries --- we don't want to stick in Materialize nodes if they'll prevent quick startup for a cursor.	2003-03-09 02:19:13 +00:00
Bruce Momjian	9b12ab6d5d	Add new palloc0 call as merge of palloc and MemSet(0).	2002-11-13 00:39:48 +00:00
Bruce Momjian	75fee4535d	Back out use of palloc0 in place of palloc/MemSet. Seems constant len to MemSet is a performance boost.	2002-11-11 03:02:20 +00:00
Bruce Momjian	8fee9615cc	Merge palloc()/MemSet(0) calls into a single palloc0() call.	2002-11-10 07:25:14 +00:00
Tom Lane	5936055d46	Avoid use of inline functions that are not declared static. Needed to conform to C99's brain-dead notion of how inline functions should work.	2002-10-31 19:11:48 +00:00
Tom Lane	3b8ba163d0	Tweak a few of the most heavily used function call points to zero out just the significant fields of FunctionCallInfoData, rather than MemSet'ing the whole struct to zero. Unused positions in the arg[] array will thereby contain garbage rather than zeroes. This buys back some of the performance hit from increasing FUNC_MAX_ARGS. Also tweak tuplesort.c code for more speed by marking some routines 'inline'. All together these changes speed up simple sorts, like count(distinct int4column), by about 25% on a P4 running RH Linux 7.2.	2002-10-04 17:19:55 +00:00
Bruce Momjian	e50f52a074	pgindent run.	2002-09-04 20:31:48 +00:00
Tom Lane	976246cc7e	The cstring datatype can now be copied, passed around, etc. The typlen value '-2' is used to indicate a variable-width type whose width is computed as strlen(datum)+1. Everything that looks at typlen is updated except for array support, which Joe Conway is working on; at the moment it wouldn't work to try to create an array of cstring.	2002-08-24 15:00:47 +00:00
Tom Lane	77a7e9968b	Change memory-space accounting mechanism in tuplesort.c and tuplestore.c to make a reasonable attempt at accounting for palloc overhead, not just the requested size of each memory chunk. Since in many scenarios this will make for a significant reduction in the amount of space acquired, partially compensate by doubling the default value of SORT_MEM to 1Mb. Per discussion in pgsql-general around 9-Jun-2002..	2002-08-12 00:36:12 +00:00
Bruce Momjian	d84fe82230	Update copyright to 2002.	2002-06-20 20:29:54 +00:00
Tom Lane	44fbe20d62	Restructure indexscan API (index_beginscan, index_getnext) per yesterday's proposal to pghackers. Also remove unnecessary parameters to heap_beginscan, heap_rescan. I modified pg_proc.h to reflect the new numbers of parameters for the AM interface routines, but did not force an initdb because nothing actually looks at those fields.	2002-05-20 23:51:44 +00:00
Tom Lane	3b6cbce458	Add CHECK_FOR_INTERRUPTS() in various strategic spots, per comments from Hiroshi.	2002-01-06 00:37:44 +00:00
Tom Lane	69a59150c2	Defend against brain-dead QNX implementation of qsort(). Per report from Bernd Tegge, 10-Nov-01.	2001-11-11 22:00:25 +00:00
Bruce Momjian	6783b2372e	Another pgindent run. Fixes enum indenting, and improves #endif spacing. Also adds space for one-line comments.	2001-10-28 06:26:15 +00:00
Bruce Momjian	b81844b173	pgindent run on all C files. Java run to follow. initdb/regression tests pass.	2001-10-25 05:50:21 +00:00
Tom Lane	f933766ba7	Restructure pg_opclass, pg_amop, and pg_amproc per previous discussions in pgsql-hackers. pg_opclass now has a row for each opclass supported by each index AM, not a row for each opclass name. This allows pg_opclass to show directly whether an AM supports an opclass, and furthermore makes it possible to store additional information about an opclass that might be AM-dependent. pg_opclass and pg_amop now store "lossy" and "haskeytype" information that we previously expected the user to remember to provide in CREATE INDEX commands. Lossiness is no longer an index-level property, but is associated with the use of a particular operator in a particular index opclass. Along the way, IndexSupportInitialize now uses the syscaches to retrieve pg_amop and pg_amproc entries. I find this reduces backend launch time by about ten percent, at the cost of a couple more special cases in catcache.c's IndexScanOK. Initial work by Oleg Bartunov and Teodor Sigaev, further hacking by Tom Lane. initdb forced.	2001-08-21 16:36:06 +00:00
Tom Lane	5433b48380	Tweak sorting so that nulls appear at the front of a descending sort (vs. at the end of a normal sort). This ensures that explicit sorts yield the same ordering as a btree index scan. To be really sure that that equivalence holds, we use the btree entries in pg_amop to decide whether we are looking at a '<' or '>' operator. For a sort operator that has no btree association, we put the nulls at the front if the operator is named '>' ... pretty grotty, but it does the right thing in simple ASC and DESC cases, and at least there's no possibility of getting a different answer depending on the plan type chosen.	2001-06-02 19:01:53 +00:00
Tom Lane	f905d65ee3	Rewrite of planner statistics-gathering code. ANALYZE is now available as a separate statement (though it can still be invoked as part of VACUUM, too). pg_statistic redesigned to be more flexible about what statistics are stored. ANALYZE now collects a list of several of the most common values, not just one, plus a histogram (not just the min and max values). Random sampling is used to make the process reasonably fast even on very large tables. The number of values and histogram bins collected is now user-settable via an ALTER TABLE command. There is more still to do; the new stats are not being used everywhere they could be in the planner. But the remaining changes for this project should be localized, and the behavior is already better than before. A not-very-related change is that sorting now makes use of btree comparison routines if it can find one, rather than invoking '<' twice.	2001-05-07 00:43:27 +00:00
Bruce Momjian	7cf952e7b4	Fix comments that were mis-wrapped, for Tom Lane.	2001-03-23 04:49:58 +00:00
Bruce Momjian	9e1552607a	pgindent run. Make it all clean.	2001-03-22 04:01:46 +00:00
Tom Lane	0d54d6ac44	Clean up handling of tuple descriptors so that result-tuple descriptors allocated by plan nodes are not leaked at end of query. This doesn't really matter for normal queries, but it sure does for queries invoked repetitively inside SQL functions. Clean up some other grotty code associated with tupdescs, and fix a few other memory leaks exposed by tests with simple SQL functions.	2001-01-29 00:39:20 +00:00
Bruce Momjian	623bf843d2	Change Copyright from PostgreSQL, Inc to PostgreSQL Global Development Group.	2001-01-24 19:43:33 +00:00
Tom Lane	a933ee38bb	Change SearchSysCache coding conventions so that a reference count is maintained for each cache entry. A cache entry will not be freed until the matching ReleaseSysCache call has been executed. This eliminates worries about cache entries getting dropped while still in use. See my posting to pg-hackers of even date for more info.	2000-11-16 22:30:52 +00:00
Peter Eisentraut	424f0edcb8	Fix relative path references so that make knows which dependencies refer to one another. Sort out builddir vs srcdir variable namings. Remove some now obsoleted make variables.	2000-08-31 16:12:35 +00:00
Tom Lane	1ee26b7764	Reimplement nodeMaterial to use a temporary BufFile (or even memory, if the materialized tupleset is small enough) instead of a temporary relation. This was something I was thinking of doing anyway for performance, and Jan says he needs it for TOAST because he doesn't want to cope with toasting noname relations. With this change, the 'noname table' support in heap.c is dead code, and I have accordingly removed it. Also clean up 'noname' plan handling in planner --- nonames are either sort or materialize plans, and it seems less confusing to handle them separately under those names.	2000-06-18 22:44:35 +00:00
Tom Lane	0f1e39643d	Third round of fmgr updates: eliminate calls using fmgr() and fmgr_faddr() in favor of new-style calls. Lots of cleanup of sloppy casts to use XXXGetDatum and DatumGetXXX ...	2000-05-30 04:25:00 +00:00
Tom Lane	091126fa28	Generated header files parse.h and fmgroids.h are now copied into the src/include tree, so that -I backend is no longer necessary anywhere. Also, clean up some bit rot in contrib tree.	2000-05-29 05:45:56 +00:00
Bruce Momjian	52f77df613	Ye-old pgindent run. Same 4-space tabs.	2000-04-12 17:17:23 +00:00
Tom Lane	341b328b18	Fix a bunch of minor portability problems and maybe-bugs revealed by running gcc and HP's cc with warnings cranked way up. Signed vs unsigned comparisons, routines declared static and then defined not-static, that kind of thing. Tedious, but perhaps useful...	2000-03-17 02:36:41 +00:00
Tom Lane	73dd716285	Small performance improvement in comparetup_heap.	2000-03-01 17:14:09 +00:00
Tom Lane	8cb624262a	Replace inefficient _bt_invokestrat calls with direct calls to the appropriate btree three-way comparison routine. Not clear why the three-way comparison routines were being used in some paths and not others in btree --- incomplete changes by someone long ago, maybe? Anyway, this makes for a nice speedup in CREATE INDEX.	2000-02-18 06:32:39 +00:00
Bruce Momjian	5c25d60244	Add: * Portions Copyright (c) 1996-2000, PostgreSQL, Inc to all files copyright Regents of Berkeley. Man, that's a lot of files.	2000-01-26 05:58:53 +00:00
Jan Wieck	397e9b32a3	Some changes to prepare for LONG attributes. Jan	1999-12-16 22:20:03 +00:00
Bruce Momjian	a82f9ffde6	New LDOUT makefile variable for QNX os.	1999-12-13 22:35:27 +00:00
Tom Lane	a8ae19ec3d	aggregate(DISTINCT ...) works, per SQL spec. Note this forces initdb because of change of Aggref node in stored rules.	1999-12-13 01:27:21 +00:00
Bruce Momjian	3ffd3d82db	Make LD -r as macros that can be changed for QNX.	1999-12-09 19:15:45 +00:00
Tom Lane	cf627ab41a	Further performance improvements in sorting: reduce number of comparisons during initial run formation by keeping both current run and next-run tuples in the same heap (yup, Knuth is smarter than I am). And, during merge passes, make use of available sort memory to load multiple tuples from any one input 'tape' at a time, thereby improving locality of access to the temp file.	1999-10-30 17:27:15 +00:00
Tom Lane	887afac1f5	Remove now-dead sort modules.	1999-10-17 22:19:07 +00:00
Tom Lane	26c48b5e8c	Final stage of psort reconstruction work: replace psort.c with a generalized module 'tuplesort.c' that can sort either HeapTuples or IndexTuples, and is not tied to execution of a Sort node. Clean up memory leakages in sorting, and replace nbtsort.c's private implementation of mergesorting with calls to tuplesort.c.	1999-10-17 22:15:09 +00:00
Tom Lane	957146dcec	Second phase of psort reconstruction project: add bookkeeping logic to recycle storage within sort temp file on a block-by-block basis. This reduces peak disk usage to essentially just the volume of data being sorted, whereas it had been about 4x the data volume before.	1999-10-16 19:49:28 +00:00
Tom Lane	db3c4c3a2d	Split 'BufFile' routines out of fd.c into a new module, buffile.c. Extend BufFile so that it handles multi-segment temporary files transparently. This allows sorts and hashes to work with data exceeding 2Gig (or whatever the local limit on file size is). Change psort.c to use relative seeks instead of absolute seeks for backwards scanning, so that it won't fail when the data volume exceeds 2Gig.	1999-10-13 15:02:32 +00:00
Bruce Momjian	3406901a29	Move some system includes into c.h, and remove duplicates.	1999-07-17 20:18:55 +00:00
Bruce Momjian	69817665cb	Final cleanup	1999-07-16 05:23:30 +00:00
Bruce Momjian	2e6b1e63a3	Remove unused #includes in *.c files.	1999-07-15 22:40:16 +00:00
Tom Lane	cc62dc2032	Fix tuplecmp() to ensure repeatable sort ordering of tuples that contain null fields. Old code would produce erratic sort results because comparisons of tuples containing nulls could produce inconsistent answers.	1999-07-10 18:21:59 +00:00
Bruce Momjian	fcff1cdf4e	Another pgindent run. Sorry folks.	1999-05-25 22:43:53 +00:00
Bruce Momjian	07842084fe	pgindent run over code.	1999-05-25 16:15:34 +00:00
Tom Lane	71d5d95376	Update hash and join routines to use fd.c's new temp-file code, instead of not-very-bulletproof stuff they had before.	1999-05-09 00:53:22 +00:00
Bruce Momjian	6724a50787	Change my-function-name-- to my_function_name, and optimizer renames.	1999-02-13 23:22:53 +00:00
Bruce Momjian	9322950aa4	Cleanup of source files where 'return' or 'var =' is alone on a line.	1999-02-03 21:18:02 +00:00
Bruce Momjian	4390b0bfbe	Add TEMP tables/indexes. Add COPY pfree(). Other cleanups.	1999-02-02 03:45:56 +00:00
Bruce Momjian	7a6b562fdf	Apply Win32 patch from Horak Daniel.	1999-01-17 06:20:06 +00:00
Bruce Momjian	f0fbd7b87e	Some security, since we now have vsnprintf, I remade an old patch with some extra ugly sprintfs fixed. More work in this area is needed still. Göran Thyni	1999-01-01 04:48:49 +00:00
Marc G. Fournier	9396802f14	more cleanups...of note, appendStringInfo now performs like sprintf(), where you state a format and arguments. the old behavior required each appendStringInfo to have a sprintf() before it if any formatting was required. Also shortened several instances where there were multiple appendStringInfo() calls in a row, doing nothing more than adding one more word to the String, instead of doing them all in one call.	1998-12-14 08:11:17 +00:00
Marc G. Fournier	7c3b7d2744	Initial attempt to clean up the code... Switch sprintf() to snprintf() Remove any/all #if 0 -or- #ifdef NOT_USED -or- #ifdef FALSE sections of code	1998-12-14 05:19:16 +00:00
Vadim B. Mikheev	6beba218d7	New HeapTuple structure/interface.	1998-11-27 19:52:36 +00:00
Bruce Momjian	af74855a60	Renaming cleanup, no pgindent yet.	1998-09-01 03:29:17 +00:00
Bruce Momjian	6bd323c6b3	Remove un-needed braces around single statements.	1998-06-15 19:30:31 +00:00
Bruce Momjian	27db9ecd0b	Fix macros that were not properly surrounded by parens or braces.	1998-06-15 18:40:05 +00:00
Bruce Momjian	1e801a8f16	Hi, Attached you'll find a (big) patch that fixes make dep and make depend in all Makefiles where I found it to be appropriate. It also removes the dependency in Makefile.global for NAMEDATALEN and OIDNAMELEN by making backend/catalog/genbki.sh and bin/initdb/initdb.sh a little smarter. This no longer requires initdb.sh that is turned into initdb with a sed script when installing Postgres, hence initdb.sh should be renamed to initdb (after the patch has been applied :-) ) This patch is against the 6.3 sources, as it took a while to complete. Please review and apply, Cheers, Jeroen van Vianen	1998-04-06 00:32:26 +00:00
Bruce Momjian	a32450a585	pgindent run before 6.3 release, with Thomas' requested changes.	1998-02-26 04:46:47 +00:00
Vadim B. Mikheev	f0e7e2faa4	ExecReScan for Unique & Sort nodes.	1998-02-23 06:28:16 +00:00
Bruce Momjian	24cab6bd0d	Goodbye register keyword. Compiler knows better.	1998-02-11 19:14:04 +00:00
Bruce Momjian	79f99a3888	Fix for psort. fixes regression tests.	1998-02-01 22:20:47 +00:00
Bruce Momjian	726c3854cb	Inline fastgetattr and others so data access does not use function calls.	1998-01-31 04:39:26 +00:00
Marc G. Fournier	c58fb21bd4	From: Jeroen van Vianen <jeroenv@design.nl> This patch solves the problem with multiple order by columns, with the first one having NULL values.	1998-01-25 05:18:34 +00:00
Bruce Momjian	c16ebb0f67	getpid/pid cleanup	1998-01-25 05:15:15 +00:00
PostgreSQL Daemon	baef78d96b	Thank god for searchable mail archives. Patch by: wieck@sapserv.debis.de (Jan Wieck) One of the design rules of PostgreSQL is extensibility. And to follow this rule means (at least for me) that there should not only be a builtin PL. Instead I would prefer a defined interface for PL implementations.	1998-01-15 19:46:37 +00:00
Marc G. Fournier	374bb5d261	Some very major changes by darrenk@insightdist.com (Darren King) ========================================== What follows is a set of diffs that cleans up the usage of BLCKSZ. As a side effect, the person compiling the code can change the value of BLCKSZ _at_their_own_risk_. By that, I mean that I've tried it here at 4096 and 16384 with no ill-effects. A value of 4096 _shouldn't_ affect much as far as the kernel/file system goes, but making it bigger than 8192 can have severe consequences if you don't know what you're doing. 16394 worked for me, _BUT_ when I went to 32768 and did an initdb, the SCSI driver broke and the partition that I was running under went to hell in a hand basket. Had to reboot and do a good bit of fsck'ing to fix things up. The patch can be safely applied though. Just leave BLCKSZ = 8192 and everything is as before. It basically only cleans up all of the references to BLCKSZ in the code. If this patch is applied, a comment in the config.h file though above the BLCKSZ define with warning about monkeying around with it would be a good idea. Darren darrenk@insightdist.com (Also cleans up some of the #includes in files referencing BLCKSZ.) ==========================================	1998-01-13 04:05:12 +00:00
Bruce Momjian	679d39b9c8	Goodbye ABORT. Hello ERROR for all errors.	1998-01-07 21:07:04 +00:00
Bruce Momjian	0d9fc5afd6	Change elog(WARN) to elog(ERROR) and elog(ABORT).	1998-01-05 03:35:55 +00:00
Marc G. Fournier	6e337eef45	Major cleanout of PORTNAME variables from Makefiles...bound to screw up some of the ports...	1997-12-20 00:29:35 +00:00
Marc G. Fournier	5379b84eff	More cleanups. I can now compile without PORTNAME being defined in Makefile.global. End result, if all goes well, should allow for much easier porting, since there will no longer be a concept of a "port". Most, if not everything, should be determined by configure, or by the compiler itself. Still work to be done though :)	1997-12-19 02:09:10 +00:00
Bruce Momjian	f7f2e18f8e	Remove tqual.h includes not needed.	1997-11-24 05:09:50 +00:00
Bruce Momjian	f3af1368bd	Rename strNcpy to StrNCpy, and change third parameter.	1997-10-25 01:10:58 +00:00
Vadim B. Mikheev	78351f422b	Fix for backward cursors with ORDER BY.	1997-10-15 06:36:36 +00:00
Bruce Momjian	5e2c0a87c9	Fix for psort temp file names, from Vadim.	1997-09-26 20:05:47 +00:00
Vadim B. Mikheev	b0ccd78479	Don't limit number of tuples in leftist trees! Use qsort to sort array of tuples for nextrun when current run is done and put into leftist tree from sorted array! It's much faster and creates non-bushy tree - this is ve-e-ery good for performance!	1997-09-18 14:41:56 +00:00
Vadim B. Mikheev	712ea2507e	1. Use qsort for first run 2. Limit number of tuples in leftist trees: - put one tuple from current tree to disk if limit reached; - end run creation if limit reached by nextrun. 3. Avoid mergeruns() if first run is single one!	1997-09-18 05:37:31 +00:00
Vadim B. Mikheev	f3e9cf9c6b	Fix pfree problem.	1997-09-15 14:29:01 +00:00
Bruce Momjian	1ea01720d5	heapattr functions now return a Datum, not char *.	1997-09-12 04:09:08 +00:00
Bruce Momjian	59f6a57e59	Used modified version of indent that understands over 100 typedefs.	1997-09-08 21:56:23 +00:00
Bruce Momjian	319dbfa736	Another PGINDENT run that changes variable indenting and case label indenting. Also static variable indenting.	1997-09-08 02:41:22 +00:00
Bruce Momjian	1ccd423235	Massive commit to run PGINDENT on all .c and .h files.	1997-09-07 05:04:48 +00:00
Bruce Momjian	868d708188	Add // comments.	1997-09-05 00:09:47 +00:00
Bruce Momjian	1d8bbfd2e7	Make functions static where possible, enclose unused functions in #ifdef NOT_USED.	1997-08-19 21:40:56 +00:00
Bruce Momjian	022903f22e	Reduce open() calls. Replace fopen() calls with calls to fd.c functions.	1997-08-18 02:15:04 +00:00
Bruce Momjian	fd86ae151a	Cleanup global variables, remove stable memory stuff.	1997-08-14 16:11:41 +00:00
Vadim B. Mikheev	e99e4ba833	sprintf "...%d...", ... (int)getpid(), ... ^^^^^	1997-08-14 05:04:38 +00:00
Bruce Momjian	ea5b5357cd	Remove more (void) and fix -Wall warnings.	1997-08-12 22:55:25 +00:00
Bruce Momjian	edb58721b8	Fix pgproc names over 15 chars in output. Add strNcpy() function. remove some (void) casts that are unnecessary.	1997-08-12 20:16:25 +00:00
Bruce Momjian	dc374505fa	Fix for psort again.	1997-08-06 17:11:20 +00:00
Bruce Momjian	677efc7679	Another psort fix.	1997-08-06 07:39:20 +00:00
Bruce Momjian	42c0cd33a2	I think I finally got psort working for all cases.	1997-08-06 07:02:49 +00:00
Bruce Momjian	cc24b846dd	psort cleanups.	1997-08-06 05:38:46 +00:00
Bruce Momjian	ead219384f	Fix for palloc(0) in new code	1997-08-06 04:45:39 +00:00
Bruce Momjian	f5f366e188	Allow internal sorts to be stored in memory rather than in files.	1997-08-06 03:42:21 +00:00
Bruce Momjian	3ac9d2fff3	Various compile errors concerning overflow due to shifts, unsigned, and bad prototypes, from Solaris, from Diab Jerius	1997-07-24 20:19:10 +00:00
Vadim B. Mikheev	5f893a1e32	Shouldn't we use palloc instead of malloc ? Because of * resetpsort - resets (frees) malloc'd memory for an aborted Xaction * * Not implemented yet.	1997-05-20 11:35:50 +00:00
Bruce Momjian	a0990e1884	Makefile cleanup after reorganization	1996-11-09 06:24:51 +00:00
Marc G. Fournier	0020e8790d	Another directory that compiles with no errors, and few warnings	1996-11-06 10:32:10 +00:00
Marc G. Fournier	c9002ecb21	Produce a clean compile of backend...	1996-11-03 06:54:38 +00:00
Bryan Henderson	b0d6f0aa63	Simplify make files, add full dependencies.	1996-10-27 09:55:05 +00:00
Marc G. Fournier	d31084e9d1	Postgres95 1.01 Distribution - Virgin Sources	1996-07-09 06:22:35 +00:00

427 Commits