postgresql

Commit Graph

Author	SHA1	Message	Date
Simon Riggs	e99767bc28	First part of refactoring of code for ResolveRecoveryConflict. Purposes of this are to centralise the conflict code to allow further change, as well as to allow passing through the full reason for the conflict through to the conflicting backends. Backend state alters how we can handle different types of conflict so this is now required. As originally suggested by Heikki, no longer optional.	2010-01-14 11:08:02 +00:00
Tom Lane	5b76bb180f	Dept of second thoughts: my first cut at supporting "x IS NOT NULL" btree indexscans would do the wrong thing if index_rescan() was called with a NULL instead of a new set of scankeys and the index was DESC order, because sk_strategy would not get flipped a second time. I think that those provisions for a NULL argument are dead code now as far as the core backend goes, but possibly somebody somewhere is still using it. In any case, this refactoring seems clearer, and it's definitely shorter.	2010-01-03 05:39:08 +00:00
Bruce Momjian	0239800893	Update copyright for the year 2010.	2010-01-02 16:58:17 +00:00
Tom Lane	29c4ad9829	Support "x IS NOT NULL" clauses as indexscan conditions. This turns out to be just a minor extension of the previous patch that made "x IS NULL" indexable, because we can treat the IS NOT NULL condition as if it were "x < NULL" or "x > NULL" (depending on the index's NULLS FIRST/LAST option), just like IS NULL is treated like "x = NULL". Aside from any possible usefulness in its own right, this is an important improvement for index-optimized MAX/MIN aggregates: it is now reliably possible to get a column's min or max value cheaply, even when there are a lot of nulls cluttering the interesting end of the index.	2010-01-01 21:53:49 +00:00
Simon Riggs	efc16ea520	Allow read only connections during recovery, known as Hot Standby. Enabled by recovery_connections = on (default) and forcing archive recovery using a recovery.conf. Recovery processing now emulates the original transactions as they are replayed, providing full locking and MVCC behaviour for read only queries. Recovery must enter consistent state before connections are allowed, so there is a delay, typically short, before connections succeed. Replay of recovering transactions can conflict and in some cases deadlock with queries during recovery; these result in query cancellation after max_standby_delay seconds have expired. Infrastructure changes have minor effects on normal running, though introduce four new types of WAL record. New test mode "make standbycheck" allows regression tests of static command behaviour on a standby server while in recovery. Typical and extreme dynamic behaviours have been checked via code inspection and manual testing. Few port specific behaviours have been utilised, though primary testing has been on Linux only so far. This commit is the basic patch. Additional changes will follow in this release to enhance some aspects of behaviour, notably improved handling of conflicts, deadlock detection and query cancellation. Changes to VACUUM FULL are also required. Simon Riggs, with significant and lengthy review by Heikki Linnakangas, including streamlined redesign of snapshot creation and two-phase commit. Important contributions from Florian Pflug, Mark Kirkwood, Merlin Moncure, Greg Stark, Gianni Ciolli, Gabriele Bartolini, Hannu Krosing, Robert Haas, Tatsuo Ishii, Hiroyuki Yamada plus support and feedback from many other community members.	2009-12-19 01:32:45 +00:00
Tom Lane	c970292a94	Remove very ancient tuple-counting infrastructure (IncrRetrieved() and friends). This code has all been ifdef'd out for many years, and doesn't seem to have any prospect of becoming any more useful in the future. EXPLAIN ANALYZE is what people use in practice, and I think if we did want process-wide counters we'd be more likely to put in dtrace events for that than try to resurrect this code. Get rid of it so as to have one less detail to worry about while refactoring execMain.c.	2009-10-08 22:34:57 +00:00
Tom Lane	e66d714386	Make sure that GIN fast-insert and regular code paths enforce the same tuple size limit. Improve the error message for index-tuple-too-large so that it includes the actual size, the limit, and the index name. Sync with the btree occurrences of the same error. Back-patch to 8.4 because it appears that the out-of-sync problem is occurring in the field. Teodor and Tom	2009-10-02 21:14:04 +00:00
Tom Lane	527f0ae3fa	Department of second thoughts: let's show the exact key during unique index build failures, too. Refactor a bit more since that error message isn't spelled the same.	2009-08-01 20:59:17 +00:00
Tom Lane	b680ae4bdb	Improve unique-constraint-violation error messages to include the exact values being complained of. In passing, also remove the arbitrary length limitation in the similar error detail message for foreign key violations. Itagaki Takahiro	2009-08-01 19:59:41 +00:00
Tom Lane	25d9bf2e3e	Support deferrable uniqueness constraints. The current implementation fires an AFTER ROW trigger for each tuple that looks like it might be non-unique according to the index contents at the time of insertion. This works well as long as there aren't many conflicts, but won't scale to massive unique-key reassignments. Improving that case is a TODO item. Dean Rasheed	2009-07-29 20:56:21 +00:00
Bruce Momjian	d747140279	8.4 pgindent run, with new combined Linux/FreeBSD/MinGW typedef list provided by Andrew.	2009-06-11 14:49:15 +00:00
Tom Lane	32ea236361	Improve the IndexVacuumInfo/IndexBulkDeleteResult API to allow somewhat sane behavior in cases where we don't know the heap tuple count accurately; in particular partial vacuum, but this also makes the API a bit more useful for ANALYZE. This patch adds "estimated_count" flags to both structs so that an approximate count can be flagged as such, and adjusts the logic so that approximate counts are not used for updating pg_class.reltuples. This fixes my previous complaint that VACUUM was putting ridiculous values into pg_class.reltuples for indexes. The actual impact of that bug is limited, because the planner only pays attention to reltuples for an index if the index is partial; which probably explains why beta testers hadn't noticed a degradation in plan quality from it. But it needs to be fixed. The whole thing is a bit messy and should be redesigned in future, because reltuples now has the potential to drift quite far away from reality when a long period elapses with no non-partial vacuums. But this is as good as it's going to get for 8.4.	2009-06-06 22:13:52 +00:00
Tom Lane	8f348112f3	Insert CHECK_FOR_INTERRUPTS() calls into btree and hash index scans at the points where we step right or left to the next page. This should ensure reasonable response time to a query cancel request during an unsuccessful index scan, as seen in recent gripe from Marc Cousin. It's a bit trickier than it might seem at first glance, because CHECK_FOR_INTERRUPTS() is a no-op if executed while holding a buffer lock. So we have to do it just at the point where we've dropped one page lock and not yet acquired the next. Remove CHECK_FOR_INTERRUPTS calls at the top level of btgetbitmap and hashgetbitmap, since they're pointless given the added checks. I think that GIST is okay already --- at least, there's a CHECK_FOR_INTERRUPTS at a plausible-looking place in gistnext(). I don't claim to know GIN well enough to try to poke it for this, if indeed it has a problem at all. This is a pre-existing issue, but in view of the lack of prior complaints I'm not going to risk back-patching.	2009-05-05 19:36:32 +00:00
Tom Lane	2aa5ca952f	Update comment for _bt_relandgetbuf.	2009-05-05 19:02:22 +00:00
Tom Lane	ff301d6e69	Implement "fastupdate" support for GIN indexes, in which we try to accumulate multiple index entries in a holding area before adding them to the main index structure. This helps because bulk insert is (usually) significantly faster than retail insert for GIN. This patch also removes GIN support for amgettuple-style index scans. The API defined for amgettuple is difficult to support with fastupdate, and the previously committed partial-match feature didn't really work with it either. We might eventually figure a way to put back amgettuple support, but it won't happen for 8.4. catversion bumped because of change in GIN's pg_am entry, and because the format of GIN indexes changed on-disk (there's a metapage now, and possibly a pending list). Teodor Sigaev	2009-03-24 20:17:18 +00:00
Heikki Linnakangas	b2a667b9ee	Add a new option to RestoreBkpBlocks() to indicate if a cleanup lock should be used instead of the normal exclusive lock, and make WAL redo functions responsible for calling RestoreBkpBlocks(). They know better what kind of a lock they need. At the moment, this just moves things around with no functional change, but makes the hot standby patch that's under review cleaner.	2009-01-20 18:59:37 +00:00
Alvaro Herrera	ba748f7a11	Change the reloptions machinery to use a table-based parser, and provide a more complete framework for writing custom option processing routines by user-defined access methods. Catalog version bumped due to the general API changes, which are going to affect user-defined "amoptions" routines.	2009-01-05 17:14:28 +00:00
Bruce Momjian	511db38ace	Update copyright for 2009.	2009-01-01 17:24:05 +00:00
Heikki Linnakangas	3396000684	Rethink the way FSM truncation works. Instead of WAL-logging FSM truncations in FSM code, call FreeSpaceMapTruncateRel from smgr_redo. To make that cleaner from modularity point of view, move the WAL-logging one level up to RelationTruncate, and move RelationTruncate and all the related WAL-logging to new src/backend/catalog/storage.c file. Introduce new RelationCreateStorage and RelationDropStorage functions that are used instead of calling smgrcreate/smgrscheduleunlink directly. Move the pending rel deletion stuff from smgrcreate/smgrscheduleunlink to the new functions. This leaves smgr.c as a thin wrapper around md.c; all the transactional stuff is now in storage.c. This will make it easier to add new forks with similar truncation logic, like the visibility map.	2008-11-19 10:34:52 +00:00
Tom Lane	10e3acb8e7	Prevent synchronous scan during GIN index build, because GIN is optimized for inserting tuples in increasing TID order. It's not clear whether this fully explains Ivan Sergio Borgonovo's complaint, but simple testing confirms that a scan that doesn't start at block 0 can slow GIN build by a factor of three or four. Backpatch to 8.3. Sync scan didn't exist before that.	2008-11-13 17:42:10 +00:00
Tom Lane	b4eae023bb	Clean up the messy semantics (not to mention inefficiency) of PageGetTempPage by splitting it into three functions with better-defined behaviors. Zdenek Kotala	2008-11-03 20:47:49 +00:00
Heikki Linnakangas	19c8dc839b	Unite ReadBufferWithFork, ReadBufferWithStrategy, and ZeroOrReadBuffer functions into one ReadBufferExtended function, that takes the strategy and mode as argument. There are three modes: RBM_NORMAL, which is the default used by plain ReadBuffer(), RBM_ZERO, which replaces ZeroOrReadBuffer, and a new mode RBM_ZERO_ON_ERROR, which allows callers to read corrupt pages without throwing an error. The FSM needs the new mode to recover from corrupt pages, which could happen if we crash after extending an FSM file, and the new page is "torn". Add fork number to some error messages in bufmgr.c, that still lacked it.	2008-10-31 15:05:00 +00:00
Heikki Linnakangas	89f373bf5b	Index FSMs need to be vacuumed as well. Report by Jeff Davis.	2008-10-06 08:04:11 +00:00
Heikki Linnakangas	15c121b3ed	Rewrite the FSM. Instead of relying on a fixed-size shared memory segment, the free space information is stored in a dedicated FSM relation fork, with each relation (except for hash indexes; they don't use FSM). This eliminates the max_fsm_relations and max_fsm_pages GUC options; remove any trace of them from the backend, initdb, and documentation. Rewrite contrib/pg_freespacemap to match the new FSM implementation. Also introduce a new variant of the get_raw_page(regclass, int4, int4) function in contrib/pageinspect that lets you return pages from any relation fork, and a new fsm_page_contents() function to inspect the new FSM pages.	2008-09-30 10:52:14 +00:00
Heikki Linnakangas	3f0e808c4a	Introduce the concept of relation forks. An smgr relation can now consist of multiple forks, and each fork can be created and grown separately. The bulk of this patch is about changing the smgr API to include an extra ForkNumber argument in every smgr function. Also, smgrscheduleunlink and smgrdounlink no longer implicitly call smgrclose, because other forks might still exist after unlinking one. The callers of those functions have been modified to call smgrclose instead. This patch in itself doesn't have any user-visible effect, but provides the infrastructure needed for upcoming patches. The additional forks envisioned are a rewritten FSM implementation that doesn't rely on a fixed-size shared memory block, and a visibility map to allow skipping portions of a table in VACUUM that have no dead tuples.	2008-08-11 11:05:11 +00:00
Tom Lane	9d035f4254	Clean up the use of some page-header-access macros: principally, use SizeOfPageHeaderData instead of sizeof(PageHeaderData) in places where that makes the code clearer, and avoid casting between Page and PageHeader where possible. Zdenek Kotala, with some additional cleanup by Heikki Linnakangas. I did not apply the parts of the proposed patch that would have resulted in slightly changing the on-disk format of hash indexes; it seems to me that's not a win as long as there's any chance of having in-place upgrade for 8.4.	2008-07-13 20:45:47 +00:00
Alvaro Herrera	a3540b0f65	Improve our #include situation by moving pointer types away from the corresponding struct definitions. This allows other headers to avoid including certain highly-loaded headers such as rel.h and relscan.h, instead using just relcache.h, heapam.h or genam.h, which are more lightweight and thus cause less unnecessary dependencies.	2008-06-19 00:46:06 +00:00
Heikki Linnakangas	a213f1ee6c	Refactor XLogOpenRelation() and XLogReadBuffer() in preparation for relation forks. XLogOpenRelation() and the associated light-weight relation cache in xlogutils.c is gone, and XLogReadBuffer() now takes a RelFileNode as argument, instead of Relation. For functions that still need a Relation struct during WAL replay, there's a new function called CreateFakeRelcacheEntry() that returns a fake entry like XLogOpenRelation() used to.	2008-06-12 09:12:31 +00:00
Heikki Linnakangas	96675bff1f	Fix bug in the WAL recovery code to finish an incomplete split. CacheInvalidateRelcache() crashes if called in WAL recovery, because the invalidation infrastructure hasn't been initialized yet. Back-patch to 8.2, where the bug was introduced.	2008-06-11 08:38:56 +00:00
Tom Lane	7b8a63c3e9	Alter the xxx_pattern_ops opclasses to use the regular equality operator of the associated datatype as their equality member. This means that these opclasses can now support plain equality comparisons along with LIKE tests, thus avoiding the need for an extra index in some applications. This optimization was not possible when the pattern opclasses were first introduced, because we didn't insist that text equality meant bitwise equality; but we do now, so there is no semantic difference between regular and pattern equality operators. I removed the name_pattern_ops opclass altogether, since it's really useless: name's regular comparisons are just strcmp() and are unlikely to become something different. Instead teach indxpath.c that btree name_ops can be used for LIKE whether or not the locale is C. This might lead to a useful speedup in LIKE queries on the system catalogs in non-C locales. The ~=~ and ~<>~ operators are gone altogether. (It would have been nice to keep them for backward compatibility's sake, but since the pg_amop structure doesn't allow multiple equality operators per opclass, there's no way.) A not-immediately-obvious incompatibility is that the sort order within bpchar_pattern_ops indexes changes --- it had been identical to plain strcmp, but is now trailing-blank-insensitive. This will impact in-place upgrades, if those ever happen. Per discussions a couple months ago.	2008-05-27 00:13:09 +00:00
Alvaro Herrera	f8c4d7db60	Restructure some header files a bit, in particular heapam.h, by removing some unnecessary #include lines in it. Also, move some tuple routine prototypes and macros to htup.h, which allows removal of heapam.h inclusion from some .c files. For this to work, a new header file access/sysattr.h needed to be created, initially containing attribute numbers of system columns, for pg_dump usage. While at it, make contrib ltree, intarray and hstore header files more consistent with our header style.	2008-05-12 00:00:54 +00:00
Tom Lane	d1cbd26ded	Repair two places where SIGTERM exit could leave shared memory state corrupted. (Neither is very important if SIGTERM is used to shut down the whole database cluster together, but there's a problem if someone tries to SIGTERM individual backends.) To do this, introduce new infrastructure macros PG_ENSURE_ERROR_CLEANUP/PG_END_ENSURE_ERROR_CLEANUP that take care of transiently pushing an on_shmem_exit cleanup hook. Also use this method for createdb cleanup --- that wasn't a shared-memory-corruption problem, but SIGTERM abort of createdb could leave orphaned files lying around. Backpatch as far as 8.2. The shmem corruption cases don't exist in 8.1, and the createdb usage doesn't seem important enough to risk backpatching further.	2008-04-16 23:59:40 +00:00
Tom Lane	24558da14a	Phase 2 of project to make index operator lossiness be determined at runtime instead of plan time. Extend the amgettuple API so that the index AM returns a boolean indicating whether the indexquals need to be rechecked, and make that rechecking happen in nodeIndexscan.c (currently the only place where it's expected to be needed; other callers of index_getnext are just erroring out for now). For the moment, GIN and GIST have stub logic that just always sets the recheck flag to TRUE --- I'm hoping to get Teodor to handle pushing that control down to the opclass consistent() functions. The planner no longer pays any attention to amopreqcheck, and that catalog column will go away in due course.	2008-04-13 19:18:14 +00:00
Tom Lane	4e82a95476	Replace "amgetmulti" AM functions with "amgetbitmap", in which the whole indexscan always occurs in one call, and the results are returned in a TIDBitmap instead of a limited-size array of TIDs. This should improve speed a little by reducing AM entry/exit overhead, and it is necessary infrastructure if we are ever to support bitmap indexes. In an only slightly related change, add support for TIDBitmaps to preserve (somewhat lossily) the knowledge that particular TIDs reported by an index need to have their quals rechecked when the heap is visited. This facility is not really used yet; we'll need to extend the forced-recheck feature to plain indexscans before it's useful, and that hasn't been coded yet. The intent is to use it to clean up 8.3's horrid @@@ kluge for text search with weighted queries. There might be other uses in future, but that one alone is sufficient reason. Heikki Linnakangas, with some adjustments by me.	2008-04-10 22:25:26 +00:00
Alvaro Herrera	73b0300b2a	Move the HTSU_Result enum definition into snapshot.h, to avoid including tqual.h into heapam.h. This makes all inclusion of tqual.h explicit. I also sorted alphabetically the includes on some source files.	2008-03-26 21:10:39 +00:00
Alvaro Herrera	78f02ca1f5	Rename snapmgmt.c/h to snapmgr.c/h, for consistency with other files. Per complaint from Tom Lane.	2008-03-26 18:48:59 +00:00
Alvaro Herrera	d43b085d57	Separate snapshot management code from tuple visibility code, create a snapmgmt.c file for the former. The header files have also been reorganized in three parts: the most basic snapshot definitions are now in a new file snapshot.h, and the also new snapmgmt.h keeps the definitions for snapmgmt.c. tqual.h has been reduced to the bare minimum. This patch is just a first step towards managing live snapshots within a transaction; there is no functionality change. Per my proposal to pgsql-patches on 20080318191940.GB27458@alvh.no-ip.org and subsequent discussion.	2008-03-26 16:20:48 +00:00
Bruce Momjian	fca9fff41b	More README src cleanups.	2008-03-21 13:23:29 +00:00
Bruce Momjian	4e228447aa	Make source code READMEs more consistent. Add CVS tags to all README files.	2008-03-20 17:55:15 +00:00
Tom Lane	787eba734b	When creating a large hash index, pre-sort the index entries by estimated bucket number, so as to ensure locality of access to the index during the insertion step. Without this, building an index significantly larger than available RAM takes a very long time because of thrashing. On the other hand, sorting is just useless overhead when the index does fit in RAM. We choose to sort when the initial index size exceeds effective_cache_size. This is a revised version of work by Tom Raney and Shreya Bhargava.	2008-03-16 23:15:08 +00:00
Peter Eisentraut	0474dcb608	Refactor backend makefiles to remove lots of duplicate code	2008-02-19 10:30:09 +00:00
Bruce Momjian	9098ab9e32	Update copyrights in source tree to 2008.	2008-01-01 19:46:01 +00:00
Tom Lane	ac1ae9f2fa	Improve a number of elog messages for not-supposed-to-happen cases in btrees, since these seem to happen after all in corrupted indexes. Make sure we supply the index name in all cases, and provide relevant block numbers where available. Also consistently identify the index name as such. Back-patch to 8.2, in hopes that this might help Mason Hale figure out his problem.	2007-12-31 04:52:05 +00:00
Tom Lane	93190c3098	Repair still another bug in the btree page split WAL reduction patch: it failed for splits of non-leaf pages because in such pages the first data key on a page is suppressed, and so we can't just copy the first key from the right page to reconstitute the left page's high key. Problem found by Koichi Suzuki, patch by Heikki.	2007-11-16 19:53:50 +00:00
Bruce Momjian	f6e8730d11	Re-run pgindent with updated list of typedefs. (Updated README should avoid this problem in the future.)	2007-11-15 22:25:18 +00:00
Bruce Momjian	fdf5a5efb7	pgindent run for 8.3.	2007-11-15 21:14:46 +00:00
Tom Lane	282d2a03dd	HOT updates. When we update a tuple without changing any of its indexed columns, and the new version can be stored on the same heap page, we no longer generate extra index entries for the new version. Instead, index searches follow the HOT-chain links to ensure they find the correct tuple version. In addition, this patch introduces the ability to "prune" dead tuples on a per-page basis, without having to do a complete VACUUM pass to recover space. VACUUM is still needed to clean up dead index entries, however. Pavan Deolasee, with help from a bunch of other people.	2007-09-20 17:56:33 +00:00
Tom Lane	6889303531	Redefine the lp_flags field of item pointers as having four states, rather than two independent bits (one of which was never used in heap pages anyway, or at least hadn't been in a very long time). This gives us flexibility to add the HOT notions of redirected and dead item pointers without requiring anything so klugy as magic values of lp_off and lp_len. The state values are chosen so that for the states currently in use (pre-HOT) there is no change in the physical representation.	2007-09-12 22:10:26 +00:00
Peter Eisentraut	f4a3789b39	Clarify some error messages about duplicate things.	2007-06-03 22:16:03 +00:00
Tom Lane	d526575f89	Make large sequential scans and VACUUMs work in a limited-size "ring" of buffers, rather than blowing out the whole shared-buffer arena. Aside from avoiding cache spoliation, this fixes the problem that VACUUM formerly tended to cause a WAL flush for every page it modified, because we had it hacked to use only a single buffer. Those flushes will now occur only once per ring-ful. The exact ring size, and the threshold for seqscans to switch into the ring usage pattern, remain under debate; but the infrastructure seems done. The key bit of infrastructure is a new optional BufferAccessStrategy object that can be passed to ReadBuffer operations; this replaces the former StrategyHintVacuum API. This patch also changes the buffer usage-count methodology a bit: we now advance usage_count when first pinning a buffer, rather than when last unpinning it. To preserve the behavior that a buffer's lifetime starts to decrease when it's released, the clock sweep code is modified to not decrement usage_count of pinned buffers. Work not done in this commit: teach GiST and GIN indexes to use the vacuum BufferAccessStrategy for vacuum-driven fetches. Original patch by Simon, reworked by Heikki and again by Tom.	2007-05-30 20:12:03 +00:00
Tom Lane	77947c51c0	Fix up pgstats counting of live and dead tuples to recognize that committed and aborted transactions have different effects; also teach it not to assume that prepared transactions are always committed. Along the way, simplify the pgstats API by tying counting directly to Relations; I cannot detect any redeeming social value in having stats pointers in HeapScanDesc and IndexScanDesc structures. And fix a few corner cases in which counts might be missed because the relation's pgstat_info pointer hadn't been set.	2007-05-27 03:50:39 +00:00
Tom Lane	a8d539f124	To support external compression of archived WAL data, add a flag bit to WAL records that shows whether it is safe to remove full-page images (ie, whether or not an on-line backup was in progress when the WAL entry was made). Also make provision for an XLOG_NOOP record type that can be used to fill in the extra space when decompressing the data for restore. This is the portion of Koichi Suzuki's "full page writes" patch that has to go into the core database. The remainder of that work is two external compression and decompression programs, which for the time being will undergo separate development on pgfoundry. Per discussion. Also, twiddle the handling of BTREE_SPLIT records to ensure it'll be possible to compress them (the previous coding caused essential info to be omitted). The other commonly-used record types seem OK already, with the possible exception of GIN and GIST WAL records, which I don't understand well enough to opine on.	2007-05-20 21:08:19 +00:00
Tom Lane	226a100568	Code review for btree page split WAL reduction patch. Make it actually work (original code always created a full-page image for the left page, thus leaving the intended savings unrealized), avoid risk of not having enough room on the page during xlog restore, squeeze out another couple bytes in the xlog record, clean up neglected comments.	2007-04-11 20:47:38 +00:00
Tom Lane	56218fbc48	Minor tweaking of index special-space definitions so that the various index types can be reliably distinguished by examining the special space on an index page. Per my earlier proposal, plus the realization that there's no need for btree's vacuum cycle ID to cycle through every possible 16-bit value. Restricting its range a little costs nearly nothing and eliminates the possibility of collisions. Memo to self: remember to make bitmap indexes play along with this scheme, assuming that patch ever gets accepted.	2007-04-09 22:04:08 +00:00
Tom Lane	7b78474da3	Make CLUSTER MVCC-safe. Heikki Linnakangas	2007-04-08 01:26:33 +00:00
Tom Lane	f02a82b6ad	Make 'col IS NULL' clauses be indexable conditions. Teodor Sigaev, with some kibitzing from Tom Lane.	2007-04-06 22:33:43 +00:00
Tom Lane	8875d0987d	Fix oversight in coding of _bt_start_vacuum: we can't assume that the LWLock will be released by transaction abort before _bt_end_vacuum gets called. If either of these "can't happen" errors actually happened, we'd freeze up trying to acquire an already-held lock. Latest word is that this does not explain Martin Pitt's trouble report, but it still looks like a bug.	2007-03-30 00:12:59 +00:00
Tom Lane	e85a01df67	Clean up the representation of special snapshots by including a "method pointer" in every Snapshot struct. This allows removal of the case-by-case tests in HeapTupleSatisfiesVisibility, which should make it a bit faster (I didn't try any performance tests though). More importantly, we are no longer violating portable C practices by assuming that small integers are distinct from all pointer values, and HeapTupleSatisfiesDirty no longer has a non-reentrant API involving side-effects on a global variable. There were a couple of places calling HeapTupleSatisfiesXXX routines directly rather than through the HeapTupleSatisfiesVisibility macro. Since these places had to be changed anyway, I chose to make them go through the macro for uniformity. Along the way I renamed HeapTupleSatisfiesSnapshot to HeapTupleSatisfiesMVCC to emphasize that it's only used with MVCC-type snapshots. I was sorely tempted to rename HeapTupleSatisfiesVisibility to HeapTupleSatisfiesSnapshot, but forebore for the moment to avoid confusion and reduce the likelihood that this patch breaks some of the pending patches. Might want to reconsider doing that later.	2007-03-25 19:45:14 +00:00
Neil Conway	e1d8deb918	Fix a typo in a comment. Heikki Linnakangas.	2007-03-05 14:13:12 +00:00
Bruce Momjian	bc292937ae	Split _bt_insertonpg to two functions. Heikki Linnakangas	2007-03-03 20:13:06 +00:00
Bruce Momjian	6f519ad01c	btree source code cleanups: I refactored findsplitloc and checksplitloc so that the division of labor is more clear IMO. I pushed all the space calculation inside the loop to checksplitloc. I also fixed the off by 4 in free space calculation caused by PageGetFreeSpace subtracting sizeof(ItemIdData), even though it was harmless, because it was distracting and I felt it might come back to bite us in the future if we change the page layout or alignments. There's now a new function PageGetExactFreeSpace that doesn't do the subtraction. findsplitloc now tries the "just the new item to right page" split as well. If people don't like the refactoring, I can write a patch to just add that. Heikki Linnakangas	2007-02-21 20:02:17 +00:00
Alvaro Herrera	f8ebab901b	Fix reference-after-free in the new btree page split code, as reported by the buildfarm via Stefan Kaltenbrunner. Patch from Heikki Linnakangas.	2007-02-08 13:52:55 +00:00
Bruce Momjian	b79575ce45	Reduce WAL activity for page splits: Currently, an index split writes all the data on the split page to WAL. That's a lot of WAL traffic. The tuples that are copied to the right page need to be WAL logged, but the tuples that stay on the original page don't. Heikki Linnakangas	2007-02-08 05:05:53 +00:00
Tom Lane	c76ed81513	Remove some dead code, per Heikki.	2007-02-06 14:55:11 +00:00
Bruce Momjian	8b4ff8b6a1	Wording cleanup for error messages. Also change can't -> cannot. Standard English uses "may", "can", and "might" in different ways: may - permission, "You may borrow my rake." can - ability, "I can lift that log." might - possibility, "It might rain today." Unfortunately, in conversational English, their use is often mixed, as in, "You may use this variable to do X", when in fact, "can" is a better choice. Similarly, "It may crash" is better stated, "It might crash".	2007-02-01 19:10:30 +00:00
Tom Lane	6cefacd7c8	Correct an old logic error in btree page splitting: when considering a split exactly at the point where we need to insert a new item, the calculation used the wrong size for the "high key" of the new left page. This could lead to choosing an unworkable split, resulting in "PANIC: failed to add item to the left sibling" (or "right sibling") failure. Although this bug has been there a long time, it's very difficult to trigger a failure before 8.2, since there was generally a lot of free space on both sides of a chosen split. In 8.2, where the user-selected fill factor determines how much free space the code tries to leave, an unworkable split is much more likely. Report by Joe Conway, diagnosis and fix by Heikki Linnakangas.	2007-01-27 20:53:30 +00:00
Peter Eisentraut	2cc01004c6	Remove remains of old depend target.	2007-01-20 17:16:17 +00:00
Tom Lane	d83235415b	Add some notes about the basic mathematical laws that the system presumes hold true for operators in a btree operator family. This is mostly to clarify my own thinking about what the planner can assume for optimization purposes. (blowing dust off an old abstract-algebra textbook...)	2007-01-12 17:04:54 +00:00
Tom Lane	4431758229	Support ORDER BY ... NULLS FIRST/LAST, and add ASC/DESC/NULLS FIRST/NULLS LAST per-column options for btree indexes. The planner's support for this is still pretty rudimentary; it does not yet know how to plan mergejoins with nondefault ordering options. The documentation is pretty rudimentary, too. I'll work on improving that stuff later. Note incompatible change from prior behavior: ORDER BY ... USING will now be rejected if the operator is not a less-than or greater-than member of some btree opclass. This prevents less-than-sane behavior if an operator that doesn't actually define a proper sort ordering is selected.	2007-01-09 02:14:16 +00:00
Bruce Momjian	29dccf5fe0	Update CVS HEAD for 2007 copyright. Back branches are typically not back-stamped for this.	2007-01-05 22:20:05 +00:00
Tom Lane	ef07221997	Clean up smgr.c/md.c APIs as per discussion a couple months ago. Instead of having md.c return a success/failure boolean to smgr.c, which was just going to elog anyway, let md.c issue the elog messages itself. This allows better error reporting, particularly in cases such as "short read" or "short write" which Peter was complaining of. Also, remove the kluge of allowing mdread() to return zeroes from a read-beyond-EOF: this is now an error condition except when InRecovery or zero_damaged_pages = true. (Hash indexes used to require that behavior, but no more.) Also, enforce that mdwrite() is to be used for rewriting existing blocks while mdextend() is to be used for extending the relation EOF. This restriction lets us get rid of the old ad-hoc defense against creating huge files by an accidental reference to a bogus block number: we'll only create new segments in mdextend() not mdwrite() or mdread(). (Again, when InRecovery we allow it anyway, since we need to allow updates of blocks that were later truncated away.) Also, clean up the original makeshift patch for bug #2737: move the responsibility for padding relation segments to full length into md.c.	2007-01-03 18:11:01 +00:00
Tom Lane	9aefd56669	Fix up btree's initial scankey processing to be able to detect redundant or contradictory keys even in cross-data-type scenarios. This is another benefit of the opfamily rewrite: we can find the needed comparison operators now.	2006-12-28 23:16:39 +00:00
Tom Lane	a78fcfb512	Restructure operator classes to allow improved handling of cross-data-type cases. Operator classes now exist within "operator families". While most families are equivalent to a single class, related classes can be grouped into one family to represent the fact that they are semantically compatible. Cross-type operators are now naturally adjunct parts of a family, without having to wedge them into a particular opclass as we had done originally. This commit restructures the catalogs and cleans up enough of the fallout so that everything still works at least as well as before, but most of the work needed to actually improve the planner's behavior will come later. Also, there are not yet CREATE/DROP/ALTER OPERATOR FAMILY commands; the only way to create a new family right now is to allow CREATE OPERATOR CLASS to make one by default. I owe some more documentation work, too. But that can all be done in smaller pieces once this infrastructure is in place.	2006-12-23 00:43:13 +00:00
Tom Lane	a46ca619f8	Suppress a few 'uninitialized variable' warnings that gcc emits only at -O3 or higher (presumably because it inlines more things). Per gripe from Mark Mielke.	2006-11-11 01:14:19 +00:00
Tom Lane	70ce5c9082	Fix "failed to re-find parent key" btree VACUUM failure by revising page deletion code to avoid the case where an upper-level btree page remains "half dead" for a significant period of time, and to block insertions into a key range that is in process of being re-assigned to the right sibling of the deleted page's parent. This prevents the scenario reported by Ed L. wherein index keys could become out-of-order in the grandparent index level. Since this is a moderately invasive fix, I'm applying it only to HEAD. The bug exists back to 7.4, but the back branches will get a different patch.	2006-11-01 19:43:17 +00:00
Bruce Momjian	f99a569a2e	pgindent run for 8.2.	2006-10-04 00:30:14 +00:00
Tom Lane	9e936693a9	Fix free space map to correctly track the total amount of FSM space needed even when a single relation requires more than max_fsm_pages pages. Also, make VACUUM emit a warning in this case, since it likely means that VACUUM FULL or other drastic corrective measure is needed. Per reports from Jeff Frost and others of unexpected changes in the claimed max_fsm_pages need.	2006-09-21 20:31:22 +00:00
Tom Lane	e093dcdd28	Add the ability to create indexes 'concurrently', that is, without blocking concurrent writes to the table. Greg Stark, with a little help from Tom Lane.	2006-08-25 04:06:58 +00:00
Tom Lane	08ae5edc5c	Optimize the case where a btree indexscan has current and mark positions on the same index page; we can avoid data copying as well as buffer refcount manipulations in this common case. Makes for a small but noticeable improvement in mergejoin speed. Heikki Linnakangas	2006-08-24 01:18:34 +00:00
Tom Lane	e002836913	Make recovery from WAL be restartable, by executing a checkpoint-like operation every so often. This improves the usefulness of PITR log shipping for hot standby: formerly, if the standby server crashed, it was necessary to restart it from the last base backup and replay all the WAL since then. Now it will only need to reread about the same amount of WAL as the master server would. The behavior might also come in handy during a long PITR replay sequence. Simon Riggs, with some editorialization by Tom Lane.	2006-08-07 16:57:57 +00:00
Tom Lane	e6284649b9	Modify btree to delete known-dead index entries without an actual VACUUM. When we are about to split an index page to do an insertion, first look to see if any entries marked LP_DELETE exist on the page, and if so remove them to try to make enough space for the desired insert. This should reduce index bloat in heavily-updated tables, although of course you still need VACUUM eventually to clean up the heap. Junji Teramoto	2006-07-25 19:13:00 +00:00
Bruce Momjian	e0522505bd	Remove 576 references of include files that were not needed.	2006-07-14 14:52:27 +00:00
Bruce Momjian	a22d76d96a	Allow include files to compile on their own. Strip unused include files out of include files, and add needed includes to C files. The next step is to remove unused include files in C files.	2006-07-13 16:49:20 +00:00
Tom Lane	d29b66882a	Tweak fillfactor code as per my recent proposal. Fix nbtsort.c so that it can handle small fillfactors for ordinary-sized index entries without failing on large ones; fix nbtinsert.c to distinguish leaf and nonleaf pages; change the minimum fillfactor to 10% for all index types.	2006-07-11 21:05:57 +00:00
Tom Lane	b7b78d24f7	Code review for FILLFACTOR patch. Change WITH grammar as per earlier discussion (including making def_arg allow reserved words), add missed opt_definition for UNIQUE case. Put the reloptions support code in a less random place (I chose to make a new file access/common/reloptions.c). Eliminate header inclusion creep. Make the index options functions safely user-callable (seems like client apps might like to be able to test validity of options before trying to make an index). Reduce overhead for normal case with no options by allowing rd_options to be NULL. Fix some unmaintainably klugy code, including getting rid of Natts_pg_class_fixed at long last. Some stylistic cleanup too, and pay attention to keeping comments in sync with code. Documentation still needs work, though I did fix the omissions in catalogs.sgml and indexam.sgml.	2006-07-03 22:45:41 +00:00
Bruce Momjian	277807bd9e	Add FILLFACTOR to CREATE INDEX. ITAGAKI Takahiro	2006-07-02 02:23:23 +00:00
Tom Lane	cdd5178c69	Extend the MinimalTuple concept to tuplesort.c, thereby reducing the per-tuple space overhead for sorts in memory. I chose to replace the previous patch that tried to write out the bare minimum amount of data when sorting on disk; instead, just dump the MinimalTuples as-is. This wastes 3 to 10 bytes per tuple depending on architecture and null-bitmap length, but the simplification in the writetup/readtup routines seems worth it.	2006-06-27 16:53:02 +00:00
Tom Lane	3fdeb189e9	Clean up code associated with updating pg_class statistics columns (relpages/reltuples). To do this, create formal support in heapam.c for "overwrite" tuple updates (including xlog replay capability) and use that instead of the ad-hoc overwrites we'd been using in VACUUM and CREATE INDEX. Take the responsibility for updating stats during CREATE INDEX out of the individual index AMs, and do it where it belongs, in catalog/index.c. Aside from being more modular, this avoids having to update the same tuple twice in some paths through CREATE INDEX. It's probably not measurably faster, but for sure it's a lot cleaner than before.	2006-05-10 23:18:39 +00:00
Tom Lane	5749f6ef0c	Rewrite btree vacuuming to fold the former bulkdelete and cleanup operations into a single mostly-physical-order scan of the index. This requires some ticklish interlocking considerations, but should create no material performance impact on normal index operations (at least given the already-committed changes to make scans work a page at a time). VACUUM itself should get significantly faster in any index that's degenerated to a very nonlinear page order. Also, we save one pass over the index entirely, except in the case where there were no deletions to do and so only one pass happened anyway. Original patch by Heikki Linnakangas, rework by Tom Lane.	2006-05-08 00:00:17 +00:00
Tom Lane	09cb5c0e7d	Rewrite btree index scans to work a page at a time in all cases (both btgettuple and btgetmulti). This eliminates the problem of "re-finding" the exact stopping point, since the stopping point is effectively always a page boundary, and index items are never moved across pre-existing page boundaries. A small penalty is that the keys_are_unique optimization is effectively disabled (and, therefore, is removed in this patch), causing us to apply _bt_checkkeys() to at least one more tuple than necessary when looking up a unique key. However, the advantages for non-unique cases seem great enough to accept this tradeoff. Aside from simplifying and (sometimes) speeding up the indexscan code, this will allow us to reimplement btbulkdelete as a largely sequential scan instead of index-order traversal, thereby significantly reducing the cost of VACUUM. Those changes will come in a separate patch. Original patch by Heikki Linnakangas, rework by Tom Lane.	2006-05-07 01:21:30 +00:00
Tom Lane	e57345975c	Clean up API for ambulkdelete/amvacuumcleanup as per today's discussion. This formulation requires every AM to provide amvacuumcleanup, unlike before, but it's surely a whole lot cleaner. Also, add an 'amstorage' column to pg_am so that we can get rid of hardwired knowledge in DefineOpClass().	2006-05-02 22:25:10 +00:00
Tom Lane	d2896a9ed1	Arrange to cache btree metapage data in the relcache entry for the index, thereby saving a visit to the metapage in most index searches/updates. This wouldn't actually save any I/O (since in the old regime the metapage generally stayed in cache anyway), but it does provide a useful decrease in bufmgr traffic in high-contention scenarios. Per my recent proposal.	2006-04-25 22:46:05 +00:00
Tom Lane	49a7610c36	Fix an ancient oversight in btree xlog replay. When trying to determine if an upper-level insertion completes a previously-seen split, we cannot simply grab the downlink block number out of the buffer, because the buffer could contain a later state of the page --- or perhaps the page doesn't even exist at all any more, due to relation truncation. These possibilities have been masked up to now because the use of full_page_writes effectively ensured that no xlog replay routine ever actually saw a page state newer than its own change. Since we're deprecating full_page_writes in 8.1.*, there's no need to fix this in existing release branches, but we need a fix in HEAD if we want to have any hope of re-allowing full_page_writes. Accordingly, adjust the contents of btree WAL records so that we can always get the downlink block number from the WAL record rather than having to depend on buffer contents. Per report from Kevin Grittner and Peter Brant. Improve a few comments in related code while at it.	2006-04-13 03:53:05 +00:00
Tom Lane	89bda95d82	Remove the 'slow' path for btree index build, which built the btree incrementally by successive inserts rather than by sorting the data. We were only using the slow path during bootstrap, apparently because when first written it failed during bootstrap --- but it works fine now AFAICT. Removing it saves a hundred or so lines of code and produces noticeably (~10%) smaller initial states of the system catalog indexes. While that won't make much difference for heavily-modified catalogs, for the more static ones there may be a useful long-term performance improvement.	2006-04-01 03:03:37 +00:00
Tom Lane	a8b8f4db23	Clean up WAL/buffer interactions as per my recent proposal. Get rid of the misleadingly-named WriteBuffer routine, and instead require routines that change buffer pages to call MarkBufferDirty (which does exactly what it says). We also require that they do so before calling XLogInsert; this takes care of the synchronization requirement documented in SyncOneBuffer. Note that because bufmgr takes the buffer content lock (in shared mode) while writing out any buffer, it doesn't matter whether MarkBufferDirty is executed before the buffer content change is complete, so long as the content change is completed before releasing exclusive lock on the buffer. So it's OK to set the dirtybit before we fill in the LSN. This eliminates the former kluge of needing to set the dirtybit in LockBuffer. Aside from making the code more transparent, we can also add some new debugging assertions, in particular that the caller of MarkBufferDirty must hold the buffer content lock, not merely a pin.	2006-03-31 23:32:07 +00:00
Tom Lane	6d61cdec07	Clean up and document the API for XLogOpenRelation and XLogReadBuffer. This commit doesn't make much functional change, but it does eliminate some duplicated code --- for instance, PageIsNew tests are now done inside XLogReadBuffer rather than by each caller. The GIST xlog code still needs a lot of love, but I'll worry about that separately.	2006-03-29 21:17:39 +00:00
Tom Lane	288551fc60	Repair longstanding error in btree xlog replay: XLogReadBuffer should be passed extend = true whenever we are reading a page we intend to reinitialize completely, even if we think the page "should exist". This is because it might indeed not exist, if the relation got truncated sometime after the current xlog record was made and before the crash we're trying to recover from. These two thinkos appear to explain both of the old bug reports discussed here: http://archives.postgresql.org/pgsql-hackers/2005-05/msg01369.php	2006-03-28 21:17:23 +00:00
Tom Lane	0a20207060	Arrange to emit a description of the current XLOG record as error context when an error occurs during xlog replay. Also, replace the former risky 'write into a fixed-size buffer with no overflow detection' API for XLOG record description routines; use an expansible StringInfo instead. (The latter accounts for most of the patch bulk.) Qingqing Zhou	2006-03-24 04:32:13 +00:00
Tom Lane	9f6192490e	Add a CHECK_FOR_INTERRUPTS() in _bt_buildadd(). This fixes a problem with not responding to query cancel during the last stage of btree index creation.	2006-03-10 20:18:15 +00:00
Bruce Momjian	f2f5b05655	Update copyright for 2006. Update scripts.	2006-03-05 15:59:11 +00:00
Tom Lane	2d7f694729	Move btbulkdelete's vacuum_delay_point() call to a place in the loop where we are not holding a buffer content lock; where it was, InterruptHoldoffCount is positive and so we'd not respond to cancel signals as intended. Also add missing vacuum_delay_point() call in btvacuumcleanup. This should fix complaint from Evgeny Gridasov about failure to respond to SIGINT/SIGTERM in a timely fashion (bug #2257).	2006-02-14 17:20:01 +00:00
Tom Lane	d52a57fc30	Actually there's a better way to do this, which is to count tuples during the vacuumcleanup scan that we're going to do anyway. Should save a few cycles (one calculation per page, not per tuple) as well as not having to depend on assumptions about heap and index being in step. I think this could probably be made to work for GIST too, but that code looks messy enough that I'm disinclined to try right now.	2006-02-12 00:18:17 +00:00
Tom Lane	fd267c1ebc	Skip ambulkdelete scan if there's nothing to delete and the index is not partial. None of the existing AMs do anything useful except counting tuples when there's nothing to delete, and we can get a tuple count from the heap as long as it's not a partial index. (hash actually can skip anyway because it maintains a tuple count in the index metapage.) GIST is not currently able to exploit this optimization because, due to failure to index NULLs, GIST is always effectively partial. Possibly we should fix that sometime. Simon Riggs w/ some review by Tom Lane.	2006-02-11 23:31:34 +00:00
Bruce Momjian	77bb65d3fc	Revert based on Tom's recommendation: "Allow VACUUM to complete faster by avoiding scanning the indexes when no rows were removed from the heap by the VACUUM."	2006-02-11 17:14:09 +00:00
Bruce Momjian	bf324946b3	Allow VACUUM to complete faster by avoiding scanning the indexes when no rows were removed from the heap by the VACUUM. Simon Riggs	2006-02-11 16:59:09 +00:00
Tom Lane	c389760c32	Remove the no-longer-useful BTItem/BTItemData level of structure, and just refer to btree index entries as plain IndexTuples, which is what they have been for a very long time. This is mostly just an exercise in removing extraneous notation, but it does save a palloc/pfree cycle per index insertion.	2006-01-25 23:04:21 +00:00
Tom Lane	3a0a16cb7e	Allow row comparisons to be used as indexscan qualifications. This completes the project to upgrade our handling of row comparisons.	2006-01-25 20:29:24 +00:00
Tom Lane	7ccaf13a06	Instead of using a numberOfRequiredKeys count to distinguish required and non-required keys in a btree index scan, mark the required scankeys with private flag bits SK_BT_REQFWD and/or SK_BT_REQBKWD. This seems at least marginally clearer to me, and it eliminates a wired-into-the-data-structure assumption that required keys are consecutive. Even though that assumption will remain true for the foreseeable future, having it in there makes the code seem more complex than necessary.	2006-01-23 22:31:41 +00:00
Tom Lane	73e3566078	Improve comments about btree's use of ScanKey data structures: there are two basically different kinds of scankeys, and we ought to try harder to indicate which is used in each place in the code. I've chosen the names "search scankey" and "insertion scankey", though you could make about as good an argument for "operator scankey" and "comparison function scankey".	2006-01-17 00:09:01 +00:00
Neil Conway	fb627b76cc	Cosmetic code cleanup: fix a bunch of places that used "return (expr);" rather than "return expr;" -- the latter style is used in most of the tree. I kept the parentheses when they were necessary or useful because the return expression was complex.	2006-01-11 08:43:13 +00:00
Tom Lane	afa8f1971a	Add RelationOpenSmgr() calls to ensure rd_smgr is valid when we try to use it. While it normally has been opened earlier during btree index build, testing shows that it's possible for the link to be closed again if an sinval reset occurs while the index is being built.	2006-01-07 22:45:41 +00:00
Tom Lane	cefcbbf1fd	Push the responsibility for handling ignore_killed_tuples down into _bt_checkkeys(), instead of checking it in the top-level nbtree.c routines as formerly. This saves a little bit of loop overhead, but more importantly it lets us skip performing the index key comparisons for dead tuples.	2005-12-07 19:37:53 +00:00
Tom Lane	f1b059af12	A couple of tiny performance hacks in _bt_step(). Remove PageIsEmpty checks, which were once needed because PageGetMaxOffsetNumber would fail on empty pages, but are now just redundant. Also, don't set up local variables that aren't needed in the fast path --- most of the time, we only need to advance offnum and not step across a page boundary. Motivated by noticing _bt_step at the top of OProfile profile for a pgbench run.	2005-12-07 18:03:48 +00:00
Bruce Momjian	436a2956d8	Re-run pgindent, fixing a problem where comment lines after a blank comment line were output as too long, and update typedefs for /lib directory. Also fix case where identifiers were used as variable names in the backend, but as typedefs in ecpg (favor the backend for indenting). Backpatch to 8.1.X.	2005-11-22 18:17:34 +00:00
Tom Lane	766dc45d9f	Add defenses to btree and hash index AMs to do simple sanity checks on every index page they read; in particular to catch the case of an all-zero page, which PageHeaderIsValid allows to pass. It turns out hash already had this idea, but it was just Assert()ing things rather than doing a straight error check, and the Asserts were partially redundant with PageHeaderIsValid anyway. Per recent failure example from Jim Nasby. (gist still needs the same treatment.)	2005-11-06 19:29:01 +00:00
Tom Lane	23836fb1fb	A few trivial code cleanups motivated by reading warnings generated by a recent HP C compiler. Mostly, get rid of useless local variables that are assigned to but never used.	2005-10-18 01:06:24 +00:00
Bruce Momjian	1dc3498251	Standard pgindent run for 8.1.	2005-10-15 02:49:52 +00:00
Tom Lane	e952ae1268	Fix longstanding bug found by Atsushi Ogawa: _bt_check_unique would mark the wrong buffer dirty when trying to kill a dead index entry that's on a page after the one it started on. No risk of data corruption, just inefficiency, but still a bug.	2005-10-12 17:18:03 +00:00
Tom Lane	cb8b6618ce	Revise pgstats stuff to fix the problems with not counting accesses generated by bitmap index scans. Along the way, simplify and speed up the code for counting sequential and index scans; it was both confusing and inefficient to be taking care of that in the per-tuple loops, IMHO. initdb forced because of internal changes in pg_stat view definitions.	2005-10-06 02:29:23 +00:00
Tom Lane	303e089df5	Clean up possibly-uninitialized-variable warnings reported by gcc 4.x.	2005-09-24 22:54:44 +00:00
Tom Lane	35e9b1cc1e	Clean up a couple of ad-hoc computations of the maximum number of tuples on a page, as suggested by ITAGAKI Takahiro. Also, change a few places that were using some other estimates of max-items-per-page to consistently use MaxOffsetNumber. This is conservatively large --- we could have used the new MaxHeapTuplesPerPage macro, or a similar one for index tuples --- but those places are simply declaring a fixed-size buffer and assuming it will work, rather than actively testing for overrun. It seems safer to size these buffers in a way that can't overflow even if the page is corrupt.	2005-09-02 19:02:20 +00:00
Tom Lane	6ea05c16a4	Change a couple of "can't happen" error messages to be a shade more verbose when they do happen. The "left link changed unexpectedly" one in particular has been seen more than once in the field.	2005-08-12 14:34:14 +00:00
Bruce Momjian	949ebbd55e	Mention MD5 function index for indexing long values.	2005-08-11 13:22:33 +00:00
Tom Lane	24ff62d76f	Make new hints follow style guide.	2005-08-10 22:39:00 +00:00
Bruce Momjian	237be3cc29	Add hints to cases where indexes fail because of values that are too long.	2005-08-10 21:36:46 +00:00
Tom Lane	d961a56899	Avoid unnecessary palloc overhead in _bt_first(). The temporary scankeys arrays that it needs can never have more than INDEX_MAX_KEYS entries, so it's reasonable to just allocate them as fixed-size local arrays, and save the cost of palloc/pfree. Not a huge savings, but a cycle saved is a cycle earned ...	2005-06-19 22:41:00 +00:00
Tom Lane	c186c93148	Change the planner to allow indexscan qualification clauses to use nonconsecutive columns of a multicolumn index, as per discussion around mid-May (pghackers thread "Best way to scan on-disk bitmaps"). This turns out to require only minimal changes in btree, and so far as I can see none at all in GiST. btcostestimate did need some work, but its original assumption that index selectivity == heap selectivity was quite bogus even before this.	2005-06-13 23:14:49 +00:00
Tom Lane	ee7ac7b11e	Modify XLogInsert API to make callers specify whether pages to be backed up have the standard layout with unused space between pd_lower and pd_upper. When this is set, XLogInsert will omit the unused space without bothering to scan it to see if it's zero. That saves time in XLogInsert, and also allows reversion of my earlier patch to make PageRepairFragmentation et al explicitly re-zero freed space. Per suggestion by Heikki Linnakangas.	2005-06-06 20:22:58 +00:00
Tom Lane	4c8495a1f2	Remove the mostly-stubbed-out-anyway support routines for WAL UNDO. That code is never going to be used in the foreseeable future, and where it's more than a stub it's making the redo routines harder to read.	2005-06-06 17:01:25 +00:00
Tom Lane	21fda22ec4	Change CRCs in WAL records from 64bit to 32bit for performance reasons. Instead of a separate CRC on each backup block, include backup blocks in their parent WAL record's CRC; this is important to ensure that the backup block really goes with the WAL record, ie there was not a page tear right at the start of the backup block. Implement a simple form of compression of backup blocks: drop any run of zeroes starting at pd_lower, so as not to store the unused 'hole' that commonly exists in PG heap and index pages. Tweak PageRepairFragmentation and related routines to ensure they keep the unused space zeroed, so that the above compression method remains effective. All per recent discussions.	2005-06-02 05:55:29 +00:00
Neil Conway	3140437495	This patch refactors away some duplicated code in the index AM build methods: they all invoke UpdateStats() since they have computed the number of heap tuples, so I created a function in catalog/index.c that each AM now calls.	2005-05-11 06:24:55 +00:00
Tom Lane	30f540be43	Repair very-low-probability race condition between relation extension and VACUUM: in the interval between adding a new page to the relation and formatting it, it was possible for VACUUM to come along and decide it should format the page too. Though not harmful in itself, this would cause data loss if a third transaction were able to insert tuples into the vacuumed page before the original extender got control back.	2005-05-07 21:32:24 +00:00
Tom Lane	278bd0cc22	For some reason access/tupmacs.h has been #including utils/memutils.h, which is neither needed by nor related to that header. Remove the bogus inclusion and instead include the header in those C files that actually need it. Also fix unnecessary inclusions and bad inclusion order in tsearch2 files.	2005-05-06 17:24:55 +00:00
Tom Lane	3a694bb0a1	Restructure LOCKTAG as per discussions of a couple months ago. Essentially, we shoehorn in a lockable-object-type field by taking a byte away from the lockmethodid, which can surely fit in one byte instead of two. This allows less artificial definitions of all the other fields of LOCKTAG; we can get rid of the special pg_xactlock pseudo-relation, and also support locks on individual tuples and general database objects (including shared objects). None of those possibilities are actually exploited just yet, however. I removed pg_xactlock from pg_class, but did not force initdb for that change. At this point, relkind 's' (SPECIAL) is unused and could be removed entirely.	2005-04-29 22:28:24 +00:00
Tom Lane	70c9763d48	Convert oidvector and int2vector into variable-length arrays. This change saves a great deal of space in pg_proc and its primary index, and it eliminates the former requirement that INDEX_MAX_KEYS and FUNC_MAX_ARGS have the same value. INDEX_MAX_KEYS is still embedded in the on-disk representation (because it affects index tuple header size), but FUNC_MAX_ARGS is not. I believe it would now be possible to increase FUNC_MAX_ARGS at little cost, but haven't experimented yet. There are still a lot of vestigial references to FUNC_MAX_ARGS, which I will clean up in a separate pass. However, getting rid of it altogether would require changing the FunctionCallInfoData struct, and I'm not sure I want to buy into that.	2005-03-29 00:17:27 +00:00
Tom Lane	bf3dbb5881	First steps towards index scans with heap access decoupled from index access: define new index access method functions 'amgetmulti' that can fetch multiple TIDs per call. (The functions exist but are totally untested as yet.) Since I was modifying pg_am anyway, remove the no-longer-needed 'rel' parameter from amcostestimate functions, and also remove the vestigial amowner column that was creating useless work for Alvaro's shared-object-dependencies project. Initdb forced due to changes in pg_am.	2005-03-27 23:53:05 +00:00
Tom Lane	94e03330cb	Create a routine PageIndexMultiDelete() that replaces a loop around PageIndexTupleDelete() with a single pass of compactification --- logic mostly lifted from PageRepairFragmentation. I noticed while profiling that a VACUUM that's cleaning up a whole lot of deleted tuples would spend as much as a third of its CPU time in PageIndexTupleDelete; not too surprising considering the loop method was roughly O(N^2) in the number of tuples involved.	2005-03-22 06:17:03 +00:00
Tom Lane	ee4ddac137	Convert index-related tuple handling routines from char 'n'/' ' to bool convention for isnull flags. Also, remove the useless InsertIndexResult return struct from index AM aminsert calls --- there is no reason for the caller to know where in the index the tuple was inserted, and we were wasting a palloc cycle per insert to deliver this uninteresting value (plus nontrivial complexity in some AMs). I forced initdb because of the change in the signature of the aminsert routines, even though nothing really looks at those pg_proc entries...	2005-03-21 01:24:04 +00:00
Tom Lane	354049c709	Remove unnecessary calls of FlushRelationBuffers: there is no need to write out data that we are about to tell the filesystem to drop. smgr_internal_unlink already had a DropRelFileNodeBuffers call to get rid of dead buffers without a write after it's no longer possible to roll back the deleting transaction. Adding a similar call in smgrtruncate simplifies callers and makes the overall division of labor clearer. This patch removes the former behavior that VACUUM would write all dirty buffers of a relation unconditionally.	2005-03-20 22:00:54 +00:00
PostgreSQL Daemon	2ff501590b	Tag appropriate files for rc3. Also performed an initial run-through of upgrading our Copyright date to extend to 2005 ... first run here was very simple ... change everything where: grep 1996-2004 && the word 'Copyright' ... scanned through the generated list with 'less' first, and after, to make sure that I only picked up the right entries ...	2004-12-31 22:04:05 +00:00
Tom Lane	c3d6c7d8f9	Calculation of keys_are_unique flag was wrong for cases involving redundant cross-datatype comparisons. Per example from Merlin Moncure.	2004-12-15 19:16:39 +00:00
Tom Lane	5374d097de	Change planner to use the current true disk file size as its estimate of a relation's number of blocks, rather than the possibly-obsolete value in pg_class.relpages. Scale the value in pg_class.reltuples correspondingly to arrive at a hopefully more accurate number of rows. When pg_class contains 0/0, estimate a tuple width from the column datatypes and divide that into current file size to estimate number of rows. This improved methodology allows us to jettison the ancient hacks that put bogus default values into pg_class when a table is first created. Also, per a suggestion from Simon, make VACUUM (but not VACUUM FULL or ANALYZE) adjust the value it puts into pg_class.reltuples to try to represent the mean tuple density instead of the minimal density that actually prevails just after VACUUM. These changes alter the plans selected for certain regression tests, so update the expected files accordingly. (I removed join_1.out because it's not clear if it still applies; we can add back any variant versions as they are shown to be needed.)	2004-12-01 19:00:56 +00:00
Neil Conway	5d1dd2bc55	Micro-optimization of markpos() and restrpos() in btree and hash indexes. Rather than using ReadBuffer() to increment the reference count on an already-pinned buffer, we should use IncrBufferRefCount() as it is faster and does not require acquiring the BufMgrLock.	2004-11-17 03:13:38 +00:00
Neil Conway	4d0f669f3c	Remove obsolete comment from btbuild() and hashbuild(): we no longer use a global variable to control building indexes.	2004-11-11 00:32:50 +00:00
Tom Lane	83cd2d8b0f	Make heap_fetch API more consistent by having the buffer remain pinned in all cases when keep_buf = true. This allows ANALYZE's inner loop to use heap_release_fetch, which saves multiple buffer lookups for the same page and avoids overestimation of cost by the vacuum cost mechanism.	2004-10-26 16:05:03 +00:00
Tom Lane	9ffc8ed58b	Repair possible failure to update hint bits back to disk, per http://archives.postgresql.org/pgsql-hackers/2004-10/msg00464.php. This fix is intended to be permanent: it moves the responsibility for calling SetBufferCommitInfoNeedsSave() into the tqual.c routines, eliminating the requirement for callers to test whether t_infomask changed. Also, tighten validity checking on buffer IDs in bufmgr.c --- several routines were paranoid about out-of-range shared buffer numbers but not about out-of-range local ones, which seems a tad pointless.	2004-10-15 22:40:29 +00:00
Neil Conway	0ed07d49d5	Code cleanup: don't bother casting the argument to pfree() to void * from another pointer type. Per C89, this is unnecessary, and it is common practice throughout the rest of the tree anyway.	2004-09-27 04:01:23 +00:00
Bruce Momjian	b6b71b85bc	Pgindent run for 8.0.	2004-08-29 05:07:03 +00:00
Bruce Momjian	da9a8649d8	Update copyright to 2004.	2004-08-29 04:13:13 +00:00
Tom Lane	1c72d0dec1	Fix relcache to account properly for subtransaction status of 'new' relcache entries. Also, change TransactionIdIsCurrentTransactionId() so that if consulted during transaction abort, it will not say that the aborted xact is still current. (It would be better to ensure that it's never called at all during abort, but I'm not sure we can easily guarantee that.) In combination, these fix a crash we have seen occasionally during parallel regression tests of 8.0.	2004-08-28 20:31:44 +00:00
Tom Lane	19cd31b068	Fix bug introduced into _bt_getstackbuf() on 2003-Feb-21: the initial value of 'start' could be past the end of the page, if the page was split by some concurrent inserting process since we visited it. In this situation the code could look at bogus entries and possibly find a match (since after all those entries still contain what they had before the split). This would lead to 'specified item offset is too large' followed by 'PANIC: failed to add item to the page', as reported by Joe Conway for scenarios involving heavy concurrent insertion activity.	2004-08-17 23:15:33 +00:00
Tom Lane	1a3de15a3a	Dept. of further reflection: I looked around to see if any other callers of XLogInsert had the same sort of checkpoint interlock problem as RecordTransactionCommit, and indeed I found some. Btree index build and ALTER TABLE SET TABLESPACE write data outside the friendly confines of the buffer manager, and therefore they have to take their own responsibility for checkpoint interlock. The easiest solution seems to be to force smgrimmedsync at the end of the index build or table copy, even when the operation is being WAL-logged. This is sufficient since the new index or table will be of interest to no one if we don't get as far as committing the current transaction.	2004-08-15 23:44:46 +00:00
Tom Lane	2042b3428d	Invent WAL timelines, as per recent discussion, to make point-in-time recovery more manageable. Also, undo recent change to add FILE_HEADER and WASTED_SPACE records to XLOG; instead make the XLOG page header variable-size with extra fields in the first page of an XLOG file. This should fix the boundary-case bugs observed by Mark Kirkwood. initdb forced due to change of XLOG representation.	2004-07-21 22:31:26 +00:00
Tom Lane	66ec2db728	XLOG file archiving and point-in-time recovery. There are still some loose ends and a glaring lack of documentation, but it basically works. Simon Riggs with some editorialization by Tom Lane.	2004-07-19 02:47:16 +00:00
Tom Lane	fe548629c5	Invent ResourceOwner mechanism as per my recent proposal, and use it to keep track of portal-related resources separately from transaction-related resources. This allows cursors to work in a somewhat sane fashion with nested transactions. For now, cursor behavior is non-subtransactional, that is a cursor's state does not roll back if you abort a subtransaction that fetched from the cursor. We might want to change that later.	2004-07-17 03:32:14 +00:00
Tom Lane	94d4d240bb	Rename XLOG_BTREE_NEWPAGE xlog record type into XLOG_HEAP_NEWPAGE, and shift support code into heapam.c accordingly. This is in service of soon-to-be-committed ALTER TABLE SET TABLESPACE code that will want to use this same record type for both heaps and indexes. Theoretically I should have forced initdb for this, but in practice there is no change in xlog contents because CVS tip will never really emit this record type anyhow...	2004-07-11 18:01:45 +00:00
Tom Lane	2467394ee1	Tablespaces. Alternate database locations are dead, long live tablespaces. There are various things left to do: contrib dbsize and oid2name modules need work, and so does the documentation. Also someone should think about COMMENT ON TABLESPACE and maybe RENAME TABLESPACE. Also initlocation is dead, it just doesn't know it yet. Gavin Sherry and Tom Lane.	2004-06-18 06:14:31 +00:00
Tom Lane	c3a153afed	Tweak palloc/repalloc to allow zero bytes to be requested, as per recent proposal. Eliminate several dozen now-unnecessary hacks to avoid palloc(0). (It's likely there are more that I didn't find.)	2004-06-05 19:48:09 +00:00
Tom Lane	2095206de1	Adjust btree index build to not use shared buffers, thereby avoiding the locking conflict against concurrent CHECKPOINT that was discussed a few weeks ago. Also, if not using WAL archiving (which is always true ATM but won't be if PITR makes it into this release), there's no need to WAL-log the index build process; it's sufficient to force-fsync the completed index before commit. This seems to gain about a factor of 2 in my tests, which is consistent with writing half as much data. I did not try it with WAL on a separate drive though --- probably the gain would be a lot less in that scenario.	2004-06-02 17:28:18 +00:00
Tom Lane	e674707968	Minor code rationalization: FlushRelationBuffers just returns void, rather than an error code, and does elog(ERROR) not elog(WARNING) when it detects a problem. All callers were simply elog(ERROR)'ing on failure return anyway, and I find it hard to envision a caller that would not, so we may as well simplify the callers and produce the more useful error message directly.	2004-05-31 19:24:05 +00:00
Neil Conway	72b6ad6313	Use the new List API function names throughout the backend, and disable the list compatibility API by default. While doing this, I decided to keep the llast() macro around and introduce llast_int() and llast_oid() variants.	2004-05-30 23:40:41 +00:00
Neil Conway	d0b4399d81	Reimplement the linked list data structure used throughout the backend. In the past, we used a 'Lispy' linked list implementation: a "list" was merely a pointer to the head node of the list. The problem with that design is that it makes lappend() and length() linear time. This patch fixes that problem (and others) by maintaining a count of the list length and a pointer to the tail node along with each head node pointer. A "list" is now a pointer to a structure containing some meta-data about the list; the head and tail pointers in that structure refer to ListCell structures that maintain the actual linked list of nodes. The function names of the list API have also been changed to, I hope, be more logically consistent. By default, the old function names are still available; they will be disabled-by-default once the rest of the tree has been updated to use the new API names.	2004-05-26 04:41:50 +00:00
Tom Lane	4af3421161	Get rid of rd_nblocks field in relcache entries. Turns out this was costing us lots more to maintain than it was worth. On shared tables it was of exactly zero benefit because we couldn't trust it to be up to date. On temp tables it sometimes saved an lseek, but not often enough to be worth getting excited about. And the real problem was that we forced an lseek on every relcache flush in order to update the field. So all in all it seems best to lose the complexity.	2004-05-08 19:09:25 +00:00
Tom Lane	37fa3b6c89	Tweak indexscan and seqscan code to arrange that steps from one page to the next are handled by ReleaseAndReadBuffer rather than separate ReleaseBuffer and ReadBuffer calls. This cuts the number of acquisitions of the BufMgrLock by a factor of 2 (possibly more, if an indexscan happens to pull successive rows from the same heap page). Unfortunately this doesn't seem enough to get us out of the recently discussed context-switch storm problem, but it's surely worth doing anyway.	2004-04-21 18:24:26 +00:00
Tom Lane	58f337a343	Centralize implementation of delay code by creating a pg_usleep() subroutine in src/port/pgsleep.c. Remove platform dependencies from miscadmin.h and put them in port.h where they belong. Extend recent vacuum cost-based-delay patch to apply to VACUUM FULL, ANALYZE, and non-btree index vacuuming. By the way, where is the documentation for the cost-based-delay patch?	2004-02-10 03:42:45 +00:00
Tom Lane	87bd956385	Restructure smgr API as per recent proposal. smgr no longer depends on the relcache, and so the notion of 'blind write' is gone. This should improve efficiency in bgwriter and background checkpoint processes. Internal restructuring in md.c to remove the not-very-useful array of MdfdVec objects --- might as well just use pointers. Also remove the long-dead 'persistent main memory' storage manager (mm.c), since it seems quite unlikely to ever get resurrected.	2004-02-10 01:55:27 +00:00
Jan Wieck	f425b605f4	Cost based vacuum delay feature. Jan	2004-02-06 19:36:18 +00:00
Tom Lane	391c3811a2	Rename SortMem and VacuumMem to work_mem and maintenance_work_mem. Make btree index creation and initial validation of foreign-key constraints use maintenance_work_mem rather than work_mem as their memory limit. Add some code to guc.c to allow these variables to be referenced by their old names in SHOW and SET commands, for backwards compatibility.	2004-02-03 17:34:04 +00:00
Neil Conway	192ad63bd7	More janitorial work: remove the explicit casting of NULL literals to a pointer type when it is not necessary to do so. For future reference, casting NULL to a pointer type is only necessary when (a) invoking a function AND either (b) the function has no prototype OR (c) the function is a varargs function.	2004-01-07 18:56:30 +00:00
Tom Lane	ef92b82dbb	Further cleanup in _bt_first: eliminate duplicate code paths.	2003-12-21 17:52:34 +00:00
Tom Lane	2a0caefeb5	Previous change exposed some opportunities for further simplification in _bt_first().	2003-12-21 03:00:04 +00:00
Tom Lane	569659ae16	Improve btree's initial-positioning-strategy code so that we never need to step more than one entry after descending the search tree to arrive at the correct place to start the scan. This can improve the behavior substantially when there are many entries equal to the chosen boundary value. Per suggestion from Dmitry Tkach, 14-Jul-03.	2003-12-21 01:23:06 +00:00
Neil Conway	fef0c8345a	I posted some bufmgr cleanup a few weeks ago, but it conflicted with some concurrent changes Jan was making to the bufmgr. Here's an updated version of the patch -- it should apply cleanly to CVS HEAD and passes the regression tests. This patch makes the following changes: - remove the UnlockAndReleaseBuffer() and UnlockAndWriteBuffer() macros, and replace uses of them with calls to the appropriate functions. - remove a bunch of #ifdef BMTRACE code: it is ugly & broken (i.e. it doesn't compile) - make BufferReplace() return a bool, not an int - cleanup some logic in bufmgr.c; should be functionality equivalent to the previous code, just cleaner now - remove the BM_PRIVATE flag as it is unused - improve a few comments, etc.	2003-12-14 00:34:47 +00:00
PostgreSQL Daemon	969685ad44	$Header: -> $PostgreSQL Changes ...	2003-11-29 19:52:15 +00:00
Tom Lane	fa5c8a055a	Cross-data-type comparisons are now indexable by btrees, pursuant to my pghackers proposal of 8-Nov. All the existing cross-type comparison operators (int2/int4/int8 and float4/float8) have appropriate support. The original proposal of storing the right-hand-side datatype as part of the primary key for pg_amop and pg_amproc got modified a bit in the event; it is easier to store zero as the 'default' case and only store a nonzero when the operator is actually cross-type. Along the way, remove the long-since-defunct bigbox_ops operator class.	2003-11-12 21:15:59 +00:00
Tom Lane	c1d62bfd00	Add operator strategy and comparison-value datatype fields to ScanKey. Remove the 'strategy map' code, which was a large amount of mechanism that no longer had any use except reverse-mapping from procedure OID to strategy number. Passing the strategy number to the index AM in the first place is simpler and faster. This is a preliminary step in planned support for cross-datatype index operations. I'm committing it now since the ScanKeyEntryInitialize() API change touches quite a lot of files, and I want to commit those changes before the tree drifts under me.	2003-11-09 21:30:38 +00:00
Tom Lane	e33f205a94	Adjust btree index build procedure so that the btree metapage looks invalid (has the wrong magic number) until the build is entirely complete. This turns out to cost no additional writes in the normal case, since we were rewriting the metapage at the end of the process anyway. In normal scenarios there's no real gain in security, because a failed index build would roll back the transaction leaving an unused index file, but for rebuilding shared system indexes this seems to add some useful protection.	2003-09-29 23:40:26 +00:00
Peter Eisentraut	feb4f44d29	Message editing: remove gratuitous variations in message wording, standardize terms, add some clarifications, fix some untranslatable attempts at dynamic message building.	2003-09-25 06:58:07 +00:00
Tom Lane	5ac2d7c0eb	In _bt_check_unique() loop, don't bother applying _bt_isequal() to killed items; just skip to the next item immediately. Only check for key equality when we reach a non-killed item or the end of the index page. This saves key comparisons when there are lots of killed items, as for example in a heavily-updated table that's not been vacuumed lately. Seems to be a win for pgbench anyway.	2003-09-02 22:10:16 +00:00
Tom Lane	ffafacc1f6	Repair potential deadlock created by recent changes to recycle btree index pages: when _bt_getbuf asks the FSM for a free index page, it is possible (and, in some cases, even moderately likely) that the answer will be the same page that _bt_split is trying to split. _bt_getbuf already knew that the returned page might not be free, but it wasn't prepared for the possibility that even trying to lock the page could be problematic. Fix by doing a conditional rather than unconditional grab of the page lock.	2003-08-10 19:48:08 +00:00
Bruce Momjian	46785776c4	Another pgindent run with updated typedefs.	2003-08-08 21:42:59 +00:00
Bruce Momjian	f3c3deb7d0	Update copyrights to 2003.	2003-08-04 02:40:20 +00:00
Bruce Momjian	089003fb46	pgindent run.	2003-08-04 00:43:34 +00:00
Tom Lane	892a51c367	Fix longstanding error in _bt_search(): should moveright at top of loop not bottom. Otherwise we fail to moveright when the root page was split while we were "in flight" to it. This is not a significant problem when the root is above the leaf level, but if the root was also a leaf (ie, a single-page index just got split) we may return the wrong leaf page to the caller, resulting in failure to find a key that is in fact present. Bug has existed at least since 7.1, probably forever.	2003-07-29 22:18:38 +00:00
Tom Lane	81b5c8a136	A visit from the message-style police ...	2003-07-28 00:09:16 +00:00
Tom Lane	ec7aa4b515	Error message editing in backend/access.	2003-07-21 20:29:40 +00:00
Bruce Momjian	98b6f37e47	Make debug_ GUC variables output DEBUG1 rather than LOG, and mention in docs that CLIENT/LOG_MIN_MESSAGES now controls debug_* output location. Doc changes included.	2003-05-27 17:49:47 +00:00
Peter Eisentraut	2c0556068f	Indexing support for pattern matching operations via separate operator class when lc_collate is not C.	2003-05-15 15:50:21 +00:00
Tom Lane	0489783011	Adjust amrescan code so that it's allowed to call index_rescan with a NULL key pointer, indicating that the existing scan key should be reused. This behavior isn't used yet but will be needed for my planned fix to the keys_are_unique code.	2003-03-23 23:01:03 +00:00
Tom Lane	391eb5e5b6	Reimplement free-space-map management as per recent discussions. Adjustable threshold is gone in favor of keeping track of total requested page storage and doling out proportional fractions to each relation (with a minimum amount per relation, and some quantization of the results to avoid thrashing with small changes in page counts). Provide special-case code for indexes so as not to waste space storing useless page free space counts. Restructure internal data storage to be a flat array instead of list-of-chunks; this may cost a little more work in data copying when reorganizing, but allows binary search to be used during lookup_fsm_page_entry().	2003-03-04 21:51:22 +00:00
Tom Lane	0797bb5c50	During VACUUM FULL, truncate off any deletable pages that are at the end of a btree index. This isn't super-effective, since we won't move nondeletable pages, but it's better than nothing. Also, improve stats displayed during VACUUM VERBOSE.	2003-02-24 00:57:17 +00:00
Tom Lane	3981f2195f	Remove no-longer-used FixBTree GUC variable.	2003-02-23 23:27:21 +00:00
Tom Lane	61b22d3aab	btree page recycling can be done as soon as page's next-xact label is older than current Xmin; we don't have to wait till it's older than GlobalXmin.	2003-02-23 23:20:52 +00:00
Tom Lane	3bbd6af37c	Adjust btbulkdelete logic so that only one WAL record is issued while deleting multiple index entries on a single index page. This makes for a very substantial reduction in the amount of WAL traffic during a large delete operation.	2003-02-23 22:43:09 +00:00
Tom Lane	88dc31e3f2	First cut at recycling space in btree indexes. Still some rough edges to fix, but it seems to basically work...	2003-02-23 06:17:13 +00:00
Tom Lane	799bc58dc7	More infrastructure for btree compaction project. Tree-traversal code now knows what to do upon hitting a dead page (in theory anyway, it's untested...). Add a post-VACUUM-cleanup entry point for index AMs, to provide a place for dead-page scavenging to happen. Also, fix oversight that broke btpo_prev links in temporary indexes. initdb forced due to additions in pg_am.	2003-02-22 00:45:05 +00:00
Tom Lane	70508ba7ae	Make btree index structure adjustments and WAL logging changes needed to support btree compaction, as per proposal of a few days ago. btree index pages no longer store parent links, instead they have a level indicator (counting up from zero for leaf pages). The FixBTree recovery logic is removed, and replaced by code that detects missing parent-level insertions during WAL replay. Also, generate appropriate WAL entries when updating btree metapage and when building a btree index from scratch. I believe btree indexes are now completely WAL-legal for the first time. initdb forced due to index and WAL changes.	2003-02-21 00:06:22 +00:00
Bruce Momjian	559b6c7ced	Rename show_btree_build_stats to log_btree_build_stats	2002-11-15 01:26:09 +00:00
Bruce Momjian	9b12ab6d5d	Add new palloc0 call as merge of palloc and MemSet(0).	2002-11-13 00:39:48 +00:00
Bruce Momjian	75fee4535d	Back out use of palloc0 in place if palloc/MemSet. Seems constant len to MemSet is a performance boost.	2002-11-11 03:02:20 +00:00
Bruce Momjian	8fee9615cc	Merge palloc()/MemSet(0) calls into a single palloc0() call.	2002-11-10 07:25:14 +00:00
Tom Lane	13416a1f8f	Fix potential problem with btbulkdelete deleting an indexscan's current item, if the page containing the current item is split while the indexscan is stopped and holds no read-lock on the page. The current item might move right onto a page that the indexscan holds no pin on. In the prior code this would allow btbulkdelete to reach and possibly delete the item, causing 'my bits moved right off the end of the world!' when the indexscan finally resumes. Fix by chaining read-locks to the right during _bt_restscan and requiring btbulkdelete to LockBufferForCleanup on every page it scans, not only those with deletable items. Per my pghackers message of 25-May-02. (Too bad no one could think of a better way.)	2002-10-20 20:47:31 +00:00
Bruce Momjian	e50f52a074	pgindent run.	2002-09-04 20:31:48 +00:00
Tom Lane	ba053de197	Still more paranoia in PageAddItem: disallow specification of an item offset past the last-used-item-plus-one, since that would result in leaving uninitialized holes in the item pointer array. AFAICT the only place that was depending on this was btree index build, which was being cavalier about when to fill in the P_HIKEY pointer; easily fixed. Also a small performance improvement: shuffle itemid's by means of memmove, not a one-at-a-time loop.	2002-08-06 19:41:23 +00:00
Tom Lane	5df307c778	Restructure local-buffer handling per recent pghackers discussion. The local buffer manager is no longer used for newly-created relations (unless they are TEMP); a new non-TEMP relation goes through the shared bufmgr and thus will participate normally in checkpoints. But TEMP relations use the local buffer manager throughout their lifespan. Also, operations in TEMP relations are not logged in WAL, thus improving performance. Since it's no longer necessary to fsync relations as they move out of the local buffers into shared buffers, quite a lot of smgr.c/md.c/fd.c code is no longer needed and has been removed: there's no concept of a dirty relation anymore in md.c/fd.c, and we never fsync anything but WAL. Still TODO: improve local buffer management algorithms so that it would be reasonable to increase NLocBuffer.	2002-08-06 02:36:35 +00:00
Bruce Momjian	33f1687879	There already was a macro PageGetItemId; this is now used in (almost) all places, where pd_linp is accessed. Also introduce new macros SizeOfPageHeaderData and BTMaxItemSize. This is just source code cosmetic, no behaviour changed. Manfred Koizar	2002-07-02 05:48:44 +00:00
Bruce Momjian	d84fe82230	Update copyright to 2002.	2002-06-20 20:29:54 +00:00
Tom Lane	de09da547a	Wups, managed to break ANALYZE with one aspect of that heap_fetch change.	2002-05-24 19:52:43 +00:00
Tom Lane	3f4d488022	Mark index entries "killed" when they are no longer visible to any transaction, so as to avoid returning them out of the index AM. Saves repeated heap_fetch operations on frequently-updated rows. Also detect queries on unique keys (equality to all columns of a unique index), and don't bother continuing scan once we have found first match. Killing is implemented in the btree and hash AMs, but not yet in rtree or gist, because there isn't an equally convenient place to do it in those AMs (the outer amgetnext routine can't do it without re-pinning the index page). Did some small cleanup on APIs of HeapTupleSatisfies, heap_fetch, and index_insert to make this a little easier.	2002-05-24 18:57:57 +00:00
Tom Lane	44fbe20d62	Restructure indexscan API (index_beginscan, index_getnext) per yesterday's proposal to pghackers. Also remove unnecessary parameters to heap_beginscan, heap_rescan. I modified pg_proc.h to reflect the new numbers of parameters for the AM interface routines, but did not force an initdb because nothing actually looks at those fields.	2002-05-20 23:51:44 +00:00
Bruce Momjian	92288a1cf9	Change made to elog: o Change all current CVS messages of NOTICE to WARNING. We were going to do this just before 7.3 beta but it has to be done now, as you will see below. o Change current INFO messages that should be controlled by client_min_messages to NOTICE. o Force remaining INFO messages, like from EXPLAIN, VACUUM VERBOSE, etc. to always go to the client. o Remove INFO from the client_min_messages options and add NOTICE. Seems we do need three non-ERROR elog levels to handle the various behaviors we need for these messages. Regression passed.	2002-03-06 06:10:59 +00:00
Bruce Momjian	a033daf566	Commit to match discussed elog() changes. Only update is that LOG is now just below FATAL in server_min_messages. Added more text to highlight ordering difference between it and client_min_messages. --------------------------------------------------------------------------- REALLYFATAL => PANIC STOP => PANIC New INFO level the prints to client by default New LOG level the prints to server log by default Cause VACUUM information to print only to the client NOTICE => INFO where purely information messages are sent DEBUG => LOG for purely server status messages DEBUG removed, kept as backward compatible DEBUG5, DEBUG4, DEBUG3, DEBUG2, DEBUG1 added DebugLvl removed in favor of new DEBUG[1-5] symbols New server_min_messages GUC parameter with values: DEBUG[5-1], INFO, NOTICE, ERROR, LOG, FATAL, PANIC New client_min_messages GUC parameter with values: DEBUG[5-1], LOG, INFO, NOTICE, ERROR, FATAL, PANIC Server startup now logged with LOG instead of DEBUG Remove debug_level GUC parameter elog() numbers now start at 10 Add test to print error message if older elog() values are passed to elog() Bootstrap mode now has a -d that requires an argument, like postmaster	2002-03-02 21:39:36 +00:00
Tom Lane	aa00e6134e	Add more sanity-checking to PageAddItem and PageIndexTupleDelete, to prevent spreading of corruption when page header pointers are bad. Merge PageZero into PageInit, since it was never used separately, and remove separate memset calls used at most other PageInit call points. Remove IndexPageCleanup, which wasn't used at all.	2002-01-15 22:14:17 +00:00
Tom Lane	3b6cbce458	Add CHECK_FOR_INTERRUPTS() in various strategic spots, per comments from Hiroshi.	2002-01-06 00:37:44 +00:00
Tom Lane	1ccc67600b	Fix race condition that could allow two concurrent transactions to insert the same key into a supposedly unique index. The bug is of low probability, and may not explain any of the recent reports of duplicated rows; but a bug is a bug.	2002-01-01 20:32:37 +00:00
Tom Lane	cd255bb070	Fix boundary condition in btbulkdelete: don't examine high key in case where rightmost index page splits while we are waiting to obtain exclusive lock on it. Not clear this would actually hurt (probably the callback would always fail), but better safe than sorry. Also, improve comments describing concurrency considerations in this code.	2001-11-23 23:41:54 +00:00
Tom Lane	f6ee99a062	Clean up usage-statistics display code (ShowUsage and friends). StatFp is gone, usage messages now go through elog(DEBUG).	2001-11-10 23:51:14 +00:00
Bruce Momjian	ea08e6cd55	New pgindent run with fixes suggested by Tom. Patch manually reviewed, initdb/regression tests pass.	2001-11-05 17:46:40 +00:00
Bruce Momjian	6783b2372e	Another pgindent run. Fixes enum indenting, and improves #endif spacing. Also adds space for one-line comments.	2001-10-28 06:26:15 +00:00
Bruce Momjian	b81844b173	pgindent run on all C files. Java run to follow. initdb/regression tests pass.	2001-10-25 05:50:21 +00:00
Tom Lane	85801a4dbd	Rearrange fmgr.c and relcache so that it's possible to keep FmgrInfo lookup info in the relcache for index access method support functions. This makes a huge difference for dynamically loaded support functions, and should save a few cycles even for built-in ones. Also tweak dfmgr.c so that load_external_function is called only once, not twice, when doing fmgr_info for a dynamically loaded function. All per performance gripe from Teodor Sigaev, 5-Oct-01.	2001-10-06 23:21:45 +00:00
Tom Lane	1663f33838	Tweak btree page split logic so that when splitting a page that is rightmost on its tree level, we split 2/3 to the left and 1/3 to the new right page, rather than the even split we use elsewhere. The idea is that when faced with a steadily increasing series of inserted keys (such as sequence or timestamp values), we'll end up with a btree that's about 2/3ds full not 1/2 full, which is much closer to the desired steady-state load for a btree. Per suggestion from Ann Harrison of IBPhoenix.	2001-09-29 23:49:51 +00:00
Tom Lane	7326e78c42	Ensure that all TransactionId comparisons are encapsulated in macros (TransactionIdPrecedes, TransactionIdFollows, etc). First step on the way to transaction ID wrap solution ...	2001-08-23 23:06:38 +00:00
Tom Lane	c8076f09d2	Restructure index AM interface for index building and index tuple deletion, per previous discussion on pghackers. Most of the duplicate code in different AMs' ambuild routines has been moved out to a common routine in index.c; this means that all index types now do the right things about inserting recently-dead tuples, etc. (I also removed support for EXTEND INDEX in the ambuild routines, since that's about to go away anyway, and it cluttered the code a lot.) The retail indextuple deletion routines have been replaced by a "bulk delete" routine in which the indexscan is inside the access method. I haven't pushed this change as far as it should go yet, but it should allow considerable simplification of the internal bookkeeping for deletions. Also, add flag columns to pg_am to eliminate various hardcoded tests on AM OIDs, and remove unused pg_am columns. Fix rtree and gist index types to not attempt to store NULLs; before this, gist usually crashed, while rtree managed not to crash but computed wacko bounding boxes for NULL entries (which might have had something to do with the performance problems we've heard about occasionally). Add AtEOXact routines to hash, rtree, and gist, all of which have static state that needs to be reset after an error. We discovered this need long ago for btree, but missed the other guys. Oh, one more thing: concurrent VACUUM is now the default.	2001-07-15 22:48:19 +00:00
Tom Lane	e0c9301c87	Install infrastructure for shared-memory free space map. Doesn't actually do anything yet, but it has the necessary connections to initialization and so forth. Make some gestures towards allowing number of blocks in a relation to be BlockNumber, ie, unsigned int, rather than signed int. (I doubt I got all the places that are sloppy about it, yet.) On the way, replace the hardwired NLOCKS_PER_XACT fudge factor with a GUC variable.	2001-06-27 23:31:40 +00:00
Jan Wieck	8d80b0d980	Statistical system views (yet without the config stuff, but it's hard to keep such massive changes in sync with the tree so I need to get it in and work from there now). Jan	2001-06-22 19:16:24 +00:00
Tom Lane	f1d5d0905c	Tweak StrategyEvaluation data structure to eliminate hardwired limit on number of strategies supported by an index AM. Add missing copyright notices and CVS $Header$ markers to GIST source files.	2001-05-30 19:53:40 +00:00
Bruce Momjian	dc0ff5c67a	Small code cleanups,formatting.	2001-05-18 21:24:20 +00:00
Bruce Momjian	806aba49fd	Small cleanup of spacing.	2001-05-17 14:59:31 +00:00
Tom Lane	f905d65ee3	Rewrite of planner statistics-gathering code. ANALYZE is now available as a separate statement (though it can still be invoked as part of VACUUM, too). pg_statistic redesigned to be more flexible about what statistics are stored. ANALYZE now collects a list of several of the most common values, not just one, plus a histogram (not just the min and max values). Random sampling is used to make the process reasonably fast even on very large tables. The number of values and histogram bins collected is now user-settable via an ALTER TABLE command. There is more still to do; the new stats are not being used everywhere they could be in the planner. But the remaining changes for this project should be localized, and the behavior is already better than before. A not-very-related change is that sorting now makes use of btree comparison routines if it can find one, rather than invoking '<' twice.	2001-05-07 00:43:27 +00:00
Tom Lane	2792374cff	Ensure that btree sort ordering functions and boolean comparison operators give consistent results for all datatypes. Types float4, float8, and numeric were broken for NaN values; abstime, timestamp, and interval were broken for INVALID values; timetz was just plain broken (some possible pairs of values were neither < nor = nor >). Also clean up text, bpchar, varchar, and bit/varbit to eliminate duplicate code and thereby reduce the probability of similar inconsistencies arising in the future.	2001-05-03 19:00:37 +00:00
Bruce Momjian	7cf952e7b4	Fix comments that were mis-wrapped, for Tom Lane.	2001-03-23 04:49:58 +00:00
Bruce Momjian	0686d49da0	Remove dashes in comments that don't need them, rewrap with pgindent.	2001-03-22 06:16:21 +00:00
Bruce Momjian	9e1552607a	pgindent run. Make it all clean.	2001-03-22 04:01:46 +00:00
Vadim B. Mikheev	c19dadbf08	Runtime btree recovery is now ON by default.	2001-02-07 23:35:33 +00:00
Vadim B. Mikheev	b18c09ee3a	Runtime tree recovery is implemented, just testing is left -:)	2001-02-02 19:49:15 +00:00
Vadim B. Mikheev	dca0762efc	Couple additional functions to fix tree at runtime. Need in one more function to handle "my bits moved..." case. FixBTree is still FALSE.	2001-01-31 01:08:36 +00:00
Vadim B. Mikheev	598a12722a	Call _bt_fixroot() from _bt_insertonpg.	2001-01-29 07:28:17 +00:00
Tom Lane	0d54d6ac44	Clean up handling of tuple descriptors so that result-tuple descriptors allocated by plan nodes are not leaked at end of query. This doesn't really matter for normal queries, but it sure does for queries invoked repetitively inside SQL functions. Clean up some other grotty code associated with tupdescs, and fix a few other memory leaks exposed by tests with simple SQL functions.	2001-01-29 00:39:20 +00:00
Vadim B. Mikheev	c6e6d292bc	First step in attempt to fix tree at runtime: create upper levels and new root page if old root one was splitted but new root page wasn't created. New code is protected by FixBTree bool flag setted to FALSE, so nothing should be affected by this untested approach.	2001-01-26 01:24:31 +00:00
Bruce Momjian	623bf843d2	Change Copyright from PostgreSQL, Inc to PostgreSQL Global Development Group.	2001-01-24 19:43:33 +00:00
Tom Lane	4e27b308e2	Do _bt_wrtbuf() outside critical section, per discussion with Vadim 1/19.	2001-01-23 23:29:22 +00:00
Tom Lane	36839c1927	Restructure backend SIGINT/SIGTERM handling so that 'die' interrupts are treated more like 'cancel' interrupts: the signal handler sets a flag that is examined at well-defined spots, rather than trying to cope with an interrupt that might happen anywhere. See pghackers discussion of 1/12/01.	2001-01-14 05:08:17 +00:00
Tom Lane	6162432de9	Add more critical-section calls: all code sections that hold spinlocks are now critical sections, so as to ensure die() won't interrupt us while we are munging shared-memory data structures. Avoid insecure intermediate states in some code that proc_exit will call, like palloc/pfree. Rename START/END_CRIT_CODE to START/END_CRIT_SECTION, since that seems to be what people tend to call them anyway, and make them be called with () like a function call, in hopes of not confusing pg_indent. I doubt that this is sufficient to make SIGTERM safe anywhere; there's just too much code that could get invoked during proc_exit().	2001-01-12 21:54:01 +00:00
Vadim B. Mikheev	7d363c4c33	MUST update (in-memory) data page BEFORE XLogInsert to log NEW page content if WAL will decide to backup page.	2000-12-29 20:47:17 +00:00
Vadim B. Mikheev	b3c4f03c9c	nbtree_xlog_newroot: set meta flag in meta page opaque.	2000-12-29 08:08:59 +00:00
Vadim B. Mikheev	7ceeeb662f	New WAL version - CRC and data blocks backup.	2000-12-28 13:00:29 +00:00
Vadim B. Mikheev	65b362fae1	Disable elog(ERROR|FATAL) in signal handlers in critical sections of code.	2000-12-03 10:27:29 +00:00
Vadim B. Mikheev	81c8c244b2	No more #ifdef XLOG.	2000-11-30 08:46:26 +00:00
Tom Lane	680b7357ce	Rearrange bufmgr header files so that buf_internals.h need not be included by everything that includes bufmgr.h --- it's supposed to be internals, after all, not part of the API! This fixes the conflict against FreeBSD headers reported by Rosenman, by making it unnecessary for s_lock.h to be included by plperl.c.	2000-11-30 01:39:08 +00:00
Peter Eisentraut	a70e74b060	Put external declarations into header files.	2000-11-21 21:16:06 +00:00
Bruce Momjian	312063c97b	Make pgsql compile on FreeBSD-alpha. Context diff this time. Remove -m486 compile args for FreeBSD-i386, compile -O2 on i386. Compile with only -O on alpha for codegen safety. Make the port use the TEST_AND_SET for alpha and i386 on FreeBSD. Fix a lot of bogus string formats for outputting pointers (cast to int and %u/%x replaced with no cast and %p), and 'Size'(size_t) are now cast to 'unsigned long' and output with %lu/ Remove an unused variable. Alfred Perlstein	2000-11-16 05:51:07 +00:00
Tom Lane	3908473c80	Make DROP TABLE rollback-able: postpone physical file delete until commit. (WAL logging for this is not done yet, however.) Clean up a number of really crufty things that are no longer needed now that DROP behaves nicely. Make temp table mapper do the right things when drop or rename affecting a temp table is rolled back. Also, remove "relation modified while in use" error check, in favor of locking tables at first reference and holding that lock throughout the statement.	2000-11-08 22:10:03 +00:00
Vadim B. Mikheev	855ffa0be0	Forgot to check page LSN and unlock buffer in btree_xlog_delete - fixed. (Thanks to Tatsuo Ishii for finding bug)	2000-11-01 20:39:58 +00:00
Vadim B. Mikheev	e3ba543525	WAL fixes.	2000-10-29 18:33:41 +00:00
Vadim B. Mikheev	a7fcadd10a	WAL	2000-10-21 15:43:36 +00:00
Vadim B. Mikheev	b58c0411ba	redo/undo support functions and cleanups.	2000-10-20 11:01:21 +00:00
Vadim B. Mikheev	deee783052	WAL	2000-10-13 12:05:22 +00:00
Vadim B. Mikheev	25a26a7ab8	WAL	2000-10-13 02:03:02 +00:00
Tom Lane	32616129cd	Suppress gcc warnings.	2000-10-05 20:10:20 +00:00
Vadim B. Mikheev	5800c6b9aa	Btree WAL logging.	2000-10-04 00:04:43 +00:00
Peter Eisentraut	424f0edcb8	Fix relative path references so that make knows which dependencies refer to one another. Sort out builddir vs srcdir variable namings. Remove some now obsoleted make variables.	2000-08-31 16:12:35 +00:00
Tom Lane	40549e9cb5	Tweak btree insertion to avoid O(N^2) slowdown with large numbers of equal keys. See discussion of today's date in pghackers list.	2000-08-25 23:13:33 +00:00
Hiroshi Inoue	b0d5036c7c	CREATE btree INDEX takes dead tuples into account when old transactions are running.	2000-08-10 02:33:20 +00:00
Tom Lane	dc73e25a5e	Add commentary about varying usage of scankeys in btree code.	2000-07-25 05:26:40 +00:00
Tom Lane	916b2321ad	Clean up and document btree code for ordering keys. Neat stuff, actually, but who could understand it with no comments? Fix bug while at it: _bt_orderkeys would try to invoke comparisons on NULL inputs, given the right sort of redundant quals.	2000-07-25 04:47:59 +00:00
Tom Lane	421f0baaff	Further cleanup of btbuild (CREATE INDEX). Avoid storing unneeded left keys during bottom-up index build, and leave some free space instead of packing the pages to the brim (so as to avoid vast numbers of page splits during the first interactive insertions).	2000-07-21 22:14:09 +00:00
Tom Lane	1ea912e16d	Fix sloppiness about alignment requirements in findsplitloc() space calculation, also make it stop when it has a 'good enough' split instead of exhaustively trying all split points.	2000-07-21 19:21:00 +00:00
Tom Lane	9e85183bfc	Major overhaul of btree index code. Eliminate special BTP_CHAIN logic for duplicate keys by letting search go to the left rather than right when an equal key is seen at an upper tree level. Fix poor choice of page split point (leading to insertion failures) that was forced by chaining logic. Don't store leftmost key in non-leaf pages, since it's not necessary. Don't create root page until something is first stored in the index, so an unused index is now 8K not 16K. (Doesn't seem to be as easy to get rid of the metadata page, unfortunately.) Massive cleanup of unreadable code, fix poor, obsolete, and just plain wrong documentation and comments. See src/backend/access/nbtree/README for the gory details.	2000-07-21 06:42:39 +00:00
Tom Lane	6bfe64032e	Cleanup of code for creating index entries. Functional indexes with pass-by-ref data types --- eg, an index on lower(textfield) --- no longer leak memory during index creation or update. Clean up a lot of redundant code ... did you know that copy, vacuum, truncate, reindex, extend index, and bootstrap each basically duplicated the main executor's logic for extracting information about an index and preparing index entries? Functional indexes should be a little faster now too, due to removal of repeated function lookups. CREATE INDEX 'opt_type' clause is deimplemented by these changes, but I haven't removed it from the parser yet (need to merge with Thomas' latest change set first).	2000-07-14 22:18:02 +00:00
Tom Lane	badce86a2c	First stage of reclaiming memory in executor by resetting short-term memory contexts. Currently, only leaks in expressions executed as quals or projections are handled. Clean up some old dead cruft in executor while at it --- unused fields in state nodes, that sort of thing.	2000-07-12 02:37:39 +00:00
Tom Lane	c590273fef	Clean up bogosities in pg_opclass, pg_amop, pg_amproc. There are amproc entries now for int8 and network hash indexes. int24_ops and int42_ops are gone. pg_opclass no longer contains multiple entries claiming to be the default opclass for the same datatype. opr_sanity regress test extended to catch errors like these in the future.	2000-06-19 03:55:01 +00:00
Tom Lane	edf0b5f0db	Get rid of IndexIsUniqueNoCache() kluge by the simple expedient of passing the index-is-unique flag to index build routines (duh! ... why wasn't it done this way to begin with?). Aside from eliminating an eyesore, this should save a few milliseconds in btree index creation because a full scan of pg_index is not needed any more.	2000-06-17 23:41:51 +00:00
Bruce Momjian	946e80c435	Final #include cleanup.	2000-06-15 04:10:30 +00:00
Bruce Momjian	df43800fc8	Clean up #include's.	2000-06-15 03:33:12 +00:00
Tom Lane	ff7b9f5541	I had overlooked the fact that some fmgr-callable functions return void --- ie, they're only called for side-effects. Add a PG_RETURN_VOID() macro and use it where appropriate. This probably doesn't change the machine code by a single bit ... it's just for documentation.	2000-06-14 05:24:50 +00:00
Tom Lane	f2d1205322	Another batch of fmgr updates. I think I have gotten all old-style functions that take pass-by-value datatypes. Should be ready for port testing ...	2000-06-13 07:35:40 +00:00
Tom Lane	ae526b4070	Another round of updates for new fmgr, mostly in the datetime code.	2000-06-09 01:11:16 +00:00
Bruce Momjian	20ad43b576	Mark functions as static and ifdef NOT_USED as appropriate.	2000-06-08 22:38:00 +00:00
Tom Lane	48165ec226	Latest round of fmgr updates. All functions with bool,char, or int2 inputs have been converted to newstyle. This should go a long way towards fixing our portability problems with platforms where char and short parameters are passed differently from int-width parameters. Still more to do for the Alpha port however.	2000-06-05 07:29:25 +00:00
Peter Eisentraut	6a68f42648	The heralded `Grand Unified Configuration scheme' (GUC) That means you can now set your options in either or all of $PGDATA/configuration, some postmaster option (--enable-fsync=off), or set a SET command. The list of options is in backend/utils/misc/guc.c, documentation will be written post haste. pg_options is gone, so is that pq_geqo config file. Also removed were backend -K, -Q, and -T options (no longer applicable, although -d0 does the same as -Q). Added to configure an --enable-syslog option. changed all callers from TPRINTF to elog(DEBUG)	2000-05-31 00:28:42 +00:00
Tom Lane	0f1e39643d	Third round of fmgr updates: eliminate calls using fmgr() and fmgr_faddr() in favor of new-style calls. Lots of cleanup of sloppy casts to use XXXGetDatum and DatumGetXXX ...	2000-05-30 04:25:00 +00:00
Tom Lane	091126fa28	Generated header files parse.h and fmgroids.h are now copied into the src/include tree, so that -I backend is no longer necessary anywhere. Also, clean up some bit rot in contrib tree.	2000-05-29 05:45:56 +00:00
Bruce Momjian	52f77df613	Ye-old pgindent run. Same 4-space tabs.	2000-04-12 17:17:23 +00:00
Tom Lane	341b328b18	Fix a bunch of minor portability problems and maybe-bugs revealed by running gcc and HP's cc with warnings cranked way up. Signed vs unsigned comparisons, routines declared static and then defined not-static, that kind of thing. Tedious, but perhaps useful...	2000-03-17 02:36:41 +00:00
Hiroshi Inoue	e3a97b370c	Implement reindex command	2000-02-18 09:30:20 +00:00
Tom Lane	8cb624262a	Replace inefficient _bt_invokestrat calls with direct calls to the appropriate btree three-way comparison routine. Not clear why the three-way comparison routines were being used in some paths and not others in btree --- incomplete changes by someone long ago, maybe? Anyway, this makes for a nice speedup in CREATE INDEX.	2000-02-18 06:32:39 +00:00
Bruce Momjian	7528fd2d52	Add btree indexing of boolean values Don Baccus	2000-02-10 19:51:52 +00:00
Bruce Momjian	1380921e65	Patch from Hiroshi for overflow btree comparison.	2000-01-28 17:23:47 +00:00
Bruce Momjian	5c25d60244	Add: * Portions Copyright (c) 1996-2000, PostgreSQL, Inc to all files copyright Regents of Berkeley. Man, that's a lot of files.	2000-01-26 05:58:53 +00:00
Tom Lane	6d1efd76fb	Fix handling of NULL constraint conditions: per SQL92 spec, a NULL result from a constraint condition does not violate the constraint (cf. discussion on pghackers 12/9/99). Implemented by adding a parameter to ExecQual, specifying whether to return TRUE or FALSE when the qual result is really NULL in three-valued boolean logic. Currently, ExecRelCheck is the only caller that asks for TRUE, but if we find any other places that have the wrong response to NULL, it'll be easy to fix them.	2000-01-19 23:55:03 +00:00
Peter Eisentraut	1cd4c14116	Fixed all elog related warnings, as well as a few others.	2000-01-15 02:59:43 +00:00
Bruce Momjian	8a093d0ae3	Make number of args to a function configurable.	2000-01-10 17:14:46 +00:00
Bruce Momjian	6456b17bc1	Rename oid8 -> oidvector and int28 -> int2vector. Cleanup of *out functions.	2000-01-10 16:13:23 +00:00
Tom Lane	b79e75d66f	Need defense against oversize index entries in btree CREATE INDEX, as well as when inserting entries into an existing index.	2000-01-08 21:24:49 +00:00
Tom Lane	a6a70315af	It turns out that the item size limit for btree indexes is about BLCKSZ/3, not BLCKSZ/2 as some of us thought. Add check for oversize item so that failure is detected before corrupting the index, not after.	1999-12-26 03:48:22 +00:00
Bruce Momjian	a82f9ffde6	New LDOUT makefile variable for QNX os.	1999-12-13 22:35:27 +00:00
Bruce Momjian	97dec77fab	Rename several destroy* functions/tags to drop*.	1999-12-10 03:56:14 +00:00
Bruce Momjian	3ffd3d82db	Make LD -r as macros that can be changed for QNX.	1999-12-09 19:15:45 +00:00
Bruce Momjian	4901ff77bd	Mention index name when reporting corruption.	1999-12-01 00:29:54 +00:00
Bruce Momjian	fc955b14ea	Add system indexes to match all caches. Make all system indexes unique. Make all cache loads use system indexes. Rename rel to relid in inheritance tables. Rename cache names to be clearer.	1999-11-22 17:56:41 +00:00
Tom Lane	a5150dc658	Fix typo so it actually compiles...	1999-11-14 19:01:04 +00:00
Bruce Momjian	103022c339	Add index recreation suggestion to end of world error message.	1999-11-14 16:22:59 +00:00
Bruce Momjian	86ef36c907	New NameStr macro to convert Name to Str. No need for var.data anymore. Fewer calls to nameout. Better use of RelationGetRelationName.	1999-11-07 23:08:36 +00:00
Tom Lane	26c48b5e8c	Final stage of psort reconstruction work: replace psort.c with a generalized module 'tuplesort.c' that can sort either HeapTuples or IndexTuples, and is not tied to execution of a Sort node. Clean up memory leakages in sorting, and replace nbtsort.c's private implementation of mergesorting with calls to tuplesort.c.	1999-10-17 22:15:09 +00:00
Bruce Momjian	7b2a8e4e56	Currently, only the first column of multi-column indices is used to find start scan position of Indexscan-s. To speed up finding scan start position, I have changed _bt_first() to use as many keys as possible. I'll attach the patch here. Regards. Hiroshi Inoue	1999-09-27 18:20:21 +00:00
Tom Lane	bd272cace6	Mega-commit to make heap_open/heap_openr/heap_close take an additional argument specifying the kind of lock to acquire/release (or 'NoLock' to do no lock processing). Ensure that all relations are locked with some appropriate lock level before being examined --- this ensures that relevant shared-inval messages have been processed and should prevent problems caused by concurrent VACUUM. Fix several bugs having to do with mismatched increment/decrement of relation ref count and mismatched heap_open/close (which amounts to the same thing). A bogus ref count on a relation doesn't matter much unless a SI Inval message happens to arrive at the wrong time, which is probably why we got away with this sloppiness for so long. Repair missing grab of AccessExclusiveLock in DROP TABLE, ALTER/RENAME TABLE, etc, as noted by Hiroshi. Recommend 'make clean all' after pulling this update; I modified the Relation struct layout slightly. Will post further discussion to pghackers list shortly.	1999-09-18 19:08:25 +00:00
Vadim B. Mikheev	1ecb43a40c	Re-use free space on index pages with duplicates.	1999-08-09 01:39:19 +00:00
Tom Lane	4488b69b4c	Fix nbtree's failure to clear BTScans list during xact abort. Also, move responsibility for calling vc_abort into main xact.c list of things-to-call-at-abort. What in the world was it doing down inside of TransactionIdAbort()?	1999-08-08 20:12:52 +00:00
Bruce Momjian	faf7d78174	Install new alignment code to use MAXALIGN rather than DOUBLEALIGN where appropriate.	1999-07-19 07:07:29 +00:00
Bruce Momjian	3406901a29	Move some system includes into c.h, and remove duplicates.	1999-07-17 20:18:55 +00:00
Tom Lane	df454bd864	Fix silly typo in commentary...	1999-07-17 16:02:50 +00:00
Tom Lane	bc9236bc01	Revise _bt_binsrch() so that its binary search loop takes care of equal-key cases, eliminating bt_firsteq(). The linear search formerly done by bt_firsteq() took a lot of time in the case where many equal keys appear on the same page.	1999-07-16 22:17:06 +00:00
Bruce Momjian	a71802e12e	Final cleanup.	1999-07-16 05:00:38 +00:00
Bruce Momjian	a9591ce66a	Change #include's to use <> and "" as appropriate.	1999-07-15 23:04:24 +00:00
Bruce Momjian	2e6b1e63a3	Remove unused #includes in *.c files.	1999-07-15 22:40:16 +00:00
Bruce Momjian	4b2c2850bf	Clean up #include in /include directory. Add scripts for checking includes.	1999-07-15 15:21:54 +00:00
Bruce Momjian	0cf1b79528	Cleanup of /include #include's, for 6.6 only.	1999-07-14 01:20:30 +00:00
Vadim B. Mikheev	4c45832c39	Concurrency... Highest one... DO NOT EVEN TRY TO DO PageGetMaxOffsetNumber BEFORE LockBuffer! -:)	1999-06-07 15:14:54 +00:00
Vadim B. Mikheev	43c135e351	Have to release meta page before reading root one! < 6.5 versions were just not affected by this bug due to locking.	1999-06-07 14:28:22 +00:00
Bruce Momjian	fcff1cdf4e	Another pgindent run. Sorry folks.	1999-05-25 22:43:53 +00:00
Bruce Momjian	4eadfe8754	Make 0x007f -> (unsigned)0x7f to make pgindent happy.	1999-05-25 22:04:56 +00:00
Vadim B. Mikheev	7d443a85af	Get rid of page-level locking in btree-s. LockBuffer is used to acquire read/write access to index pages. Pages are released before leaving index internals.	1999-05-25 18:20:31 +00:00
Bruce Momjian	07842084fe	pgindent run over code.	1999-05-25 16:15:34 +00:00
Tom Lane	71d5d95376	Update hash and join routines to use fd.c's new temp-file code, instead of not-very-bulletproof stuff they had before.	1999-05-09 00:53:22 +00:00
Vadim B. Mikheev	b4c7a5655d	Patch from "Hiroshi Inoue" <Inoue@tpf.co.jp> for FATAL 1:btree: BTP_CHAIN flag was expected	1999-05-01 16:09:45 +00:00
Vadim B. Mikheev	3888b53a58	Fix duplicating ROOT page in concurrent updates.	1999-04-22 08:19:59 +00:00
Bruce Momjian	174b552e71	There are some bugs about backward scanning using indexes. 1. Index Scan using plural indexids never scan backward as to the order of indexids. 2. The cursor using Index scan is not usable after moving past the end. This patch solves above bugs. Moreover the change of _bt_first() would be useful to extend ORDER BY patch by Jan Wieck for all descending order cases. Hiroshi Inoue	1999-04-13 17:18:29 +00:00
Vadim B. Mikheev	401293fcff	Unique btree-s: /* * Have to check is inserted heap tuple deleted one * (i.e. just moved to another place by vacuum)! */	1999-04-12 16:56:08 +00:00
Vadim B. Mikheev	fdf6be80f9	1. Vacuum is updated for MVCC. 2. Much faster btree tuples deletion in the case when first on page index tuple is deleted (no movement to the left page(s)). 3. Remember blkno of new root page in BTPageOpaque of left/right siblings when root page is splitted.	1999-03-28 20:32:42 +00:00
Bruce Momjian	817a3e6d39	Enclosed below I have a patch to allow a btree index on the int8 type. I would like some feedback on what the hash function for the int8 hash function in the ./backend/access/hash/hashfunc.c should return. Also, could someone (maybe Tomas Lockhart?) look-over the patch and make sure the system table entries are correct? I've tried to research them as much as I could, but some of them are still not clear to me. Thanks, -Ryan	1999-03-14 05:09:05 +00:00
Marc G. Fournier	8c3e8a8a0e	From: Tatsuo Ishii <t-ishii@sra.co.jp> Ok. I made patches replacing all of "#if FALSE" or "#if 0" to "#ifdef NOT_USED" for current. I have tested these patches in that the postgres binaries are identical.	1999-02-21 03:49:55 +00:00
Bruce Momjian	6724a50787	Change my-function-name-- to my_function_name, and optimizer renames.	1999-02-13 23:22:53 +00:00
Bruce Momjian	9322950aa4	Cleanup of source files where 'return' or 'var =' is alone on a line.	1999-02-03 21:18:02 +00:00
Vadim B. Mikheev	e3a1ab764e	READ COMMITTED isolevel is implemented and is default now.	1999-01-29 09:23:17 +00:00
Thomas G. Lockhart	974757f19a	Add a set of braces to clarify conditional nesting. gcc complained about ambiguities.	1999-01-20 16:24:59 +00:00
Bruce Momjian	7a6b562fdf	Apply Win32 patch from Horak Daniel.	1999-01-17 06:20:06 +00:00
Vadim B. Mikheev	3f7fbf85dc	Initial MVCC code. New code for locking buffer' context.	1998-12-15 12:47:01 +00:00
Vadim B. Mikheev	6beba218d7	New HeapTuple structure/interface.	1998-11-27 19:52:36 +00:00
Bruce Momjian	733ad60409	Fix for relname.data from SHIOZAKI Takehiko	1998-11-02 15:28:36 +00:00
Bruce Momjian	202751921d	Alignment cleanup so no more massive switch statements for alignment, just two macros.	1998-09-07 05:35:48 +00:00
Bruce Momjian	fa1a8d6a97	OK, folks, here is the pgindent output.	1998-09-01 04:40:42 +00:00
Bruce Momjian	af74855a60	Renaming cleanup, no pgindent yet.	1998-09-01 03:29:17 +00:00
Marc G. Fournier	7414d61950	From: Massimo Dal Zotto <dz@cs.unitn.it> tprintf.patch adds functions and macros which implement a conditional trace package with the ability to change flags and numeric options of running backends at runtime. Options/flags can be specified in the command line and/or read from the file pg_options in the data directory.	1998-08-25 21:34:10 +00:00
Bruce Momjian	7971539020	heap_fetch requires buffer pointer, must be released; heap_getnext no longer returns buffer pointer, can be gotten from scan; descriptor; bootstrap can create multi-key indexes; pg_procname index now is multi-key index; oidint2, oidint4, oidname are gone (must be removed from regression tests); use System Cache rather than sequential scan in many places; heap_modifytuple no longer takes buffer parameter; remove unused buffer parameter in a few other functions; oid8 is not index-able; remove some use of single-character variable names; cleanup Buffer variables usage and scan descriptor looping; cleaned up allocation and freeing of tuples; 18k lines of diff;	1998-08-19 02:04:17 +00:00
Vadim B. Mikheev	f73fc6eb29	Fix scan adjustment.	1998-07-30 05:05:05 +00:00
Vadim B. Mikheev	be8300b18f	Use Snapshot in heap access methods.	1998-07-27 19:38:40 +00:00
Bruce Momjian	6bd323c6b3	Remove un-needed braces around single statements.	1998-06-15 19:30:31 +00:00
Bruce Momjian	68f9c9819b	Remove added NullProc define, and use fmgr.h value from fmgr.h.	1998-05-13 03:44:24 +00:00
Bruce Momjian	09baa3cc81	This patch... 1. Removes the unnecessary "#define AbcRegProcedure 123"'s from pg_proc.h. 2. Changes those #defines to use the names already defined in fmgr.h. 3. Forces the make of fmgr.h in backend/Makefile instead of having it made as a dependency in access/common/Makefile hackhackhack 4. Rearranged the #includes to a less helter-skelter arrangement, also changing <file.h> to "file.h" to signify a non-system header. 5. Removed "pg_proc.h" from files where its only purpose was for the #defines removed in item #1. 6. Added "fmgr.h" to each file changed for completeness sake. Turns out that #6 was not necessary for some files because fmgr.h was being included in a roundabout way SIX levels deep by the first include. "access/genam.h" ->"access/relscan.h" ->"utils/rel.h" ->"access/strat.h" ->"access/skey.h" ->"fmgr.h" So adding fmgr.h really didn't add anything to the compile, hopefully just made it clearer to the programmer. S Darren.	1998-04-27 04:08:07 +00:00
Bruce Momjian	0d203b745d	Re-apply Darren's char2-16 removal code.	1998-04-26 04:12:15 +00:00
Marc G. Fournier	afdc54ab15	Oops...I used Relation->rd_fd->relname exactly, instead of using the actual variable name blush grin	1998-04-10 22:07:41 +00:00
Marc G. Fournier	57a40abd68	Okay, add relation name to the file generating the error...	1998-04-10 21:59:30 +00:00
Marc G. Fournier	0b746a7d05	See if I can determine where the BTP_CHAIN error is coming from...	1998-04-10 18:43:30 +00:00
Bruce Momjian	db21523314	Back out char2-char16 removal. Add later.	1998-04-07 18:14:38 +00:00
Bruce Momjian	1e801a8f16	Hi, Attached you'll find a (big) patch that fixes make dep and make depend in all Makefiles where I found it to be appropriate. It also removes the dependency in Makefile.global for NAMEDATALEN and OIDNAMELEN by making backend/catalog/genbki.sh and bin/initdb/initdb.sh a little smarter. This no longer requires initdb.sh that is turned into initdb with a sed script when installing Postgres, hence initdb.sh should be renamed to initdb (after the patch has been applied :-) ) This patch is against the 6.3 sources, as it took a while to complete. Please review and apply, Cheers, Jeroen van Vianen	1998-04-06 00:32:26 +00:00
Bruce Momjian	57b5966405	The following uuencoded, gzip'd file will ... 1. Remove the char2, char4, char8 and char16 types from postgresql 2. Change references of char16 to name in the regression tests. 3. Rename the char16.sql regression test to name.sql. 4. Modify the regression test scripts and outputs to match up. Might require new regression.{SYSTEM} files... Darren King	1998-03-30 17:28:21 +00:00
Vadim B. Mikheev	4af1e537d6	Fix scan adjusting for marked index tuples.	1998-02-28 13:53:18 +00:00
Bruce Momjian	a32450a585	pgindent run before 6.3 release, with Thomas' requested changes.	1998-02-26 04:46:47 +00:00
Marc G. Fournier	aa7244ed01	Change: #define TAPETEMP "pg_btsortXXXXXX" to: #define TAPETEMP "pg_btsortXXXXXXX" For some reason, under FreeBSD, it appears that the mktemp() value needs the extra 'X' to improve/ensure uniqueness	1998-02-21 19:23:14 +00:00
PostgreSQL Daemon	baef78d96b	Thank god for searchable mail archives. Patch by: wieck@sapserv.debis.de (Jan Wieck) One of the design rules of PostgreSQL is extensibility. And to follow this rule means (at least for me) that there should not only be a builtin PL. Instead I would prefer a defined interface for PL implementations.	1998-01-15 19:46:37 +00:00
Marc G. Fournier	374bb5d261	Some very major changes by darrenk@insightdist.com (Darren King) ========================================== What follows is a set of diffs that cleans up the usage of BLCKSZ. As a side effect, the person compiling the code can change the value of BLCKSZ _at_their_own_risk_. By that, I mean that I've tried it here at 4096 and 16384 with no ill-effects. A value of 4096 _shouldn't_ affect much as far as the kernel/file system goes, but making it bigger than 8192 can have severe consequences if you don't know what you're doing. 16394 worked for me, _BUT_ when I went to 32768 and did an initdb, the SCSI driver broke and the partition that I was running under went to hell in a hand basket. Had to reboot and do a good bit of fsck'ing to fix things up. The patch can be safely applied though. Just leave BLCKSZ = 8192 and everything is as before. It basically only cleans up all of the references to BLCKSZ in the code. If this patch is applied, a comment in the config.h file though above the BLCKSZ define with warning about monkeying around with it would be a good idea. Darren darrenk@insightdist.com (Also cleans up some of the #includes in files referencing BLCKSZ.) ==========================================	1998-01-13 04:05:12 +00:00
Bruce Momjian	679d39b9c8	Goodbye ABORT. Hello ERROR for all errors.	1998-01-07 21:07:04 +00:00
Bruce Momjian	0d9fc5afd6	Change elog(WARN) to elog(ERROR) and elog(ABORT).	1998-01-05 03:35:55 +00:00
Marc G. Fournier	6e337eef45	Major cleanout of PORTNAME variables from Makefiles...bound to screw up some of the ports...	1997-12-20 00:29:35 +00:00
Marc G. Fournier	e2d9501094	Clean up the Makefiles Essentially, this cleans things up so that if PORTNAME isn't defined (I'm working on getting rid of it for FreeBSD, at least, to see if its possible) none of the PORTNAME related stuff gets passed around. Had a little bit of -I related redundancy as well	1997-12-17 04:31:34 +00:00
Thomas G. Lockhart	a440f8e3d7	Remove trailing period from an elog message. Most other messages do not have one.	1997-12-09 01:40:30 +00:00
Bruce Momjian	e9e1ff226f	Remove all time travel stuff. Small parser cleanup.	1997-11-20 23:24:03 +00:00
Vadim B. Mikheev	bd305f3f06	Fix multi-column index scans in internal pages.	1997-10-22 19:02:52 +00:00
Bruce Momjian	3f365ba0fc	Inline memset() as MemSet().	1997-09-18 20:22:58 +00:00
Bruce Momjian	59f6a57e59	Used modified version of indent that understands over 100 typedefs.	1997-09-08 21:56:23 +00:00
Bruce Momjian	075cede748	Add typedefs to pgindent run.	1997-09-08 20:59:27 +00:00
Bruce Momjian	319dbfa736	Another PGINDENT run that changes variable indenting and case label indenting. Also static variable indenting.	1997-09-08 02:41:22 +00:00
Bruce Momjian	1ccd423235	Massive commit to run PGINDENT on all .c and .h files.	1997-09-07 05:04:48 +00:00
Bruce Momjian	11ac1bf268	More NOT_USEDs	1997-08-20 14:54:35 +00:00
Bruce Momjian	1d8bbfd2e7	Make functions static where possible, enclose unused functions in #ifdef NOT_USED.	1997-08-19 21:40:56 +00:00
Bruce Momjian	ea5b5357cd	Remove more (void) and fix -Wall warnings.	1997-08-12 22:55:25 +00:00
Vadim B. Mikheev	1561684a2d	Compare 'char' and 'text' lexicographically.	1997-06-11 05:20:05 +00:00
Vadim B. Mikheev	71b3e93c50	Duplicates handling...	1997-06-10 07:28:50 +00:00
Vadim B. Mikheev	c8a38d5d97	Added check is new item successfully inserted to a page or not.	1997-06-06 03:11:46 +00:00
Vadim B. Mikheev	139858e699	If we have to split leaf page in the chain of duplicates then we try to look at our right sibling first, but not farther, as it was in yesterday fix.	1997-05-31 06:35:56 +00:00
Vadim B. Mikheev	3f5834fb8c	Fix duplicates handling.	1997-05-30 18:35:40 +00:00
Marc G. Fournier	3e871388b5	From: Darren King <aixssd!darrenk@abs.net> Subject: [PATCHES] Re: [PORTS] AIX 6.1 fixes... Here are the patches for the two things that wouldn't make it thru the AIX compiler. The geo_ops.c change is harmless I believe. The nbtcompare.c patch fixes me, but I don't know about any other ports. Maybe wait on that one until Vadim decides what to do about the unsigned vs signed chars varlena issue.	1997-05-22 00:07:30 +00:00
Vadim B. Mikheev	c3b51e0d67	Bug: backend crashes in btbeginscan()->btrescan()->_bt_orderkeys() when btree is used in an inner scan with a run-time key whose value is passed by pointer. Fix: keys ordering stuff moved to _bt_first(). Pointed out by Thomas Lockhart.	1997-05-05 03:41:19 +00:00
Vadim B. Mikheev	72b523d055	_bt_endpoint fixed: set currentItemData to Invalid if no result.	1997-04-24 15:46:44 +00:00
Vadim B. Mikheev	538f58c04c	#ifdef BTREE_BUILD_STATS enables collecting executor stats for btree building.	1997-04-18 03:37:57 +00:00
Vadim B. Mikheev	55f5354380	Fix bttextcmp() to use unsigned char*. #ifdef USE_LOCALE added.	1997-04-18 02:48:05 +00:00
Vadim B. Mikheev	329fb11262	1. BTREE_VERSION_1: using bti_itup->t_tid as unique identifier for a given index tuple (logical position within A LEVEL). bti_oid & bti_dummy taken off from BTItemData. 2. Fix for multi-column indices (nbtsearch.c): _bt_binsrch() - for searches on internal pages having keysize < number of attrs we point at the last item < the scankey, not at the first item = the scankey; _bt_moveright() - if keysize < number of attrs we compare scankey with _last_ item on current page to decide should we move right or not.	1997-04-16 01:48:29 +00:00
Vadim B. Mikheev	c56b20eee9	Fix btabstimecmp ().	1997-04-07 06:45:41 +00:00
Vadim B. Mikheev	14f6b387b1	+ NULLs handling Actually required by multi-column indices support. We still don't use btree for 'A is (not) null', but now btree keep items with NULL attrs using single rule for placing/finding items on pages: NULLs greater NOT_NULLs and NULL = NULL. + Bulkload code (nbtsort.c) support for multi-column indices building and NULLs. + Fix for btendscan()->pfree(scanopaque) from Chris Dunlop.	1997-03-24 08:48:16 +00:00
Marc G. Fournier	d146305065	Patches for Vadim's multikey indexing...	1997-03-18 18:41:37 +00:00
Marc G. Fournier	00bcb8a0ed	Change "WARN" message generated if a unique index is attempted on a table/key containing non-unique data	1997-02-25 03:38:23 +00:00
Vadim B. Mikheev	36058981a4	Added: UNIQUE feature to bulkload code.	1997-02-22 10:08:27 +00:00
Bruce Momjian	a17b01f320	Update btree patches that were missed.	1997-02-18 17:14:25 +00:00
Bruce Momjian	d38767fcb5	Add prototypes and remove unused variables from btree Fastbuild patch.	1997-02-14 22:47:36 +00:00
Marc G. Fournier	5d9f146c64	What looks like some major improvements to btree indexing... Patches from: aoki@CS.Berkeley.EDU (Paul M. Aoki) i gave jolly my btree bulkload code a long, long time ago but never gave him a bunch of my bugfixes. here's a diff against the 6.0 baseline. for some reason, this code has slowed down somewhat relative to the insertion-build code on very small tables. don't know why -- it used to be within about 10%. anyway, here are some (highly unscientific!) timings on a dec 3000/300 for synthetic tables with 10k, 100k and 1000k tuples (basically, 1mb, 10mb and 100mb heaps). 'c' means clustered (pre-sorted) inputs and 'u' means unclustered (randomly ordered) inputs. the 10k table basically fits in the buffer pool, but the 100k and 1000k tables don't. as you can see, insertion build is fine if you've sorted your heaps on your index key or if your heap fits in core, but is absolutely horrible on unordered data (yes, that's 7.5 hours to index 100mb of data...) because of the zillions of random i/os. if it doesn't work for you for whatever reason, you can always turn it back off by flipping the FastBuild flag in nbtree.c. i don't have time to maintain it. good luck! 
baseline code:
time psql -c 'create index c10 on k10 using btree (c int4_ops)' bttest real 8.6
time psql -c 'create index u10 on k10 using btree (b int4_ops)' bttest real 9.1
time psql -c 'create index c100 on k100 using btree (c int4_ops)' bttest real 59.2
time psql -c 'create index u100 on k100 using btree (b int4_ops)' bttest real 652.4
time psql -c 'create index c1000 on k1000 using btree (c int4_ops)' bttest real 636.1
time psql -c 'create index u1000 on k1000 using btree (b int4_ops)' bttest real 26772.9
bulkloading code:
time psql -c 'create index c10 on k10 using btree (c int4_ops)' bttest real 11.3
time psql -c 'create index u10 on k10 using btree (b int4_ops)' bttest real 10.4
time psql -c 'create index c100 on k100 using btree (c int4_ops)' bttest real 59.5
time psql -c 'create index u100 on k100 using btree (b int4_ops)' bttest real 63.5
time psql -c 'create index c1000 on k1000 using btree (c int4_ops)' bttest real 636.9
time psql -c 'create index u1000 on k1000 using btree (b int4_ops)' bttest real 701.0	1997-02-12 05:04:52 +00:00
Bruce Momjian	311c521d96	would you mind committing the following changes for me? (the first bug causes compilation to fail on alpha, the second causes a compiler in this environment	1997-01-25 21:09:20 +00:00
Vadim B. Mikheev	daec84f09d	Fixed (I hope) unique btree index implementation.	1997-01-10 10:06:20 +00:00
Vadim B. Mikheev	675457d6ab	index_insert has now HeapRelation as last param (for unique index implementation).	1997-01-10 09:46:33 +00:00
Vadim B. Mikheev	8fa5394c49	Releasing empty root page in _bt_endpoint () to avoid buffer leak.	1997-01-05 10:56:36 +00:00

772 Commits