postgresql

mirror of https://git.postgresql.org/git/postgresql.git synced 2024-10-05 00:58:04 +02:00

Author	SHA1	Message	Date
Bruce Momjian	65e806cba1	pgindent run for 9.0	2010-02-26 02:01:40 +00:00
Bruce Momjian	0239800893	Update copyright for the year 2010.	2010-01-02 16:58:17 +00:00
Simon Riggs	efc16ea520	Allow read only connections during recovery, known as Hot Standby. Enabled by recovery_connections = on (default) and forcing archive recovery using a recovery.conf. Recovery processing now emulates the original transactions as they are replayed, providing full locking and MVCC behaviour for read only queries. Recovery must enter consistent state before connections are allowed, so there is a delay, typically short, before connections succeed. Replay of recovering transactions can conflict and in some cases deadlock with queries during recovery; these result in query cancellation after max_standby_delay seconds have expired. Infrastructure changes have minor effects on normal running, though introduce four new types of WAL record. New test mode "make standbycheck" allows regression tests of static command behaviour on a standby server while in recovery. Typical and extreme dynamic behaviours have been checked via code inspection and manual testing. Few port specific behaviours have been utilised, though primary testing has been on Linux only so far. This commit is the basic patch. Additional changes will follow in this release to enhance some aspects of behaviour, notably improved handling of conflicts, deadlock detection and query cancellation. Changes to VACUUM FULL are also required. Simon Riggs, with significant and lengthy review by Heikki Linnakangas, including streamlined redesign of snapshot creation and two-phase commit. Important contributions from Florian Pflug, Mark Kirkwood, Merlin Moncure, Greg Stark, Gianni Ciolli, Gabriele Bartolini, Hannu Krosing, Robert Haas, Tatsuo Ishii, Hiroyuki Yamada plus support and feedback from many other community members.	2009-12-19 01:32:45 +00:00
Tom Lane	e66d714386	Make sure that GIN fast-insert and regular code paths enforce the same tuple size limit. Improve the error message for index-tuple-too-large so that it includes the actual size, the limit, and the index name. Sync with the btree occurrences of the same error. Back-patch to 8.4 because it appears that the out-of-sync problem is occurring in the field. Teodor and Tom	2009-10-02 21:14:04 +00:00
Tom Lane	527f0ae3fa	Department of second thoughts: let's show the exact key during unique index build failures, too. Refactor a bit more since that error message isn't spelled the same.	2009-08-01 20:59:17 +00:00
Tom Lane	b680ae4bdb	Improve unique-constraint-violation error messages to include the exact values being complained of. In passing, also remove the arbitrary length limitation in the similar error detail message for foreign key violations. Itagaki Takahiro	2009-08-01 19:59:41 +00:00
Tom Lane	25d9bf2e3e	Support deferrable uniqueness constraints. The current implementation fires an AFTER ROW trigger for each tuple that looks like it might be non-unique according to the index contents at the time of insertion. This works well as long as there aren't many conflicts, but won't scale to massive unique-key reassignments. Improving that case is a TODO item. Dean Rasheed	2009-07-29 20:56:21 +00:00
Bruce Momjian	d747140279	8.4 pgindent run, with new combined Linux/FreeBSD/MinGW typedef list provided by Andrew.	2009-06-11 14:49:15 +00:00
Bruce Momjian	511db38ace	Update copyright for 2009.	2009-01-01 17:24:05 +00:00
Tom Lane	b4eae023bb	Clean up the messy semantics (not to mention inefficiency) of PageGetTempPage by splitting it into three functions with better-defined behaviors. Zdenek Kotala	2008-11-03 20:47:49 +00:00
Heikki Linnakangas	96675bff1f	Fix bug in the WAL recovery code to finish an incomplete split. CacheInvalidateRelcache() crashes if called in WAL recovery, because the invalidation infrastructure hasn't been initialized yet. Back-patch to 8.2, where the bug was introduced.	2008-06-11 08:38:56 +00:00
Alvaro Herrera	f8c4d7db60	Restructure some header files a bit, in particular heapam.h, by removing some unnecessary #include lines in it. Also, move some tuple routine prototypes and macros to htup.h, which allows removal of heapam.h inclusion from some .c files. For this to work, a new header file access/sysattr.h needed to be created, initially containing attribute numbers of system columns, for pg_dump usage. While at it, make contrib ltree, intarray and hstore header files more consistent with our header style.	2008-05-12 00:00:54 +00:00
Alvaro Herrera	73b0300b2a	Move the HTSU_Result enum definition into snapshot.h, to avoid including tqual.h into heapam.h. This makes all inclusion of tqual.h explicit. I also sorted alphabetically the includes on some source files.	2008-03-26 21:10:39 +00:00
Bruce Momjian	9098ab9e32	Update copyrights in source tree to 2008.	2008-01-01 19:46:01 +00:00
Tom Lane	ac1ae9f2fa	Improve a number of elog messages for not-supposed-to-happen cases in btrees, since these seem to happen after all in corrupted indexes. Make sure we supply the index name in all cases, and provide relevant block numbers where available. Also consistently identify the index name as such. Back-patch to 8.2, in hopes that this might help Mason Hale figure out his problem.	2007-12-31 04:52:05 +00:00
Tom Lane	93190c3098	Repair still another bug in the btree page split WAL reduction patch: it failed for splits of non-leaf pages because in such pages the first data key on a page is suppressed, and so we can't just copy the first key from the right page to reconstitute the left page's high key. Problem found by Koichi Suzuki, patch by Heikki.	2007-11-16 19:53:50 +00:00
Bruce Momjian	fdf5a5efb7	pgindent run for 8.3.	2007-11-15 21:14:46 +00:00
Tom Lane	282d2a03dd	HOT updates. When we update a tuple without changing any of its indexed columns, and the new version can be stored on the same heap page, we no longer generate extra index entries for the new version. Instead, index searches follow the HOT-chain links to ensure they find the correct tuple version. In addition, this patch introduces the ability to "prune" dead tuples on a per-page basis, without having to do a complete VACUUM pass to recover space. VACUUM is still needed to clean up dead index entries, however. Pavan Deolasee, with help from a bunch of other people.	2007-09-20 17:56:33 +00:00
Tom Lane	6889303531	Redefine the lp_flags field of item pointers as having four states, rather than two independent bits (one of which was never used in heap pages anyway, or at least hadn't been in a very long time). This gives us flexibility to add the HOT notions of redirected and dead item pointers without requiring anything so klugy as magic values of lp_off and lp_len. The state values are chosen so that for the states currently in use (pre-HOT) there is no change in the physical representation.	2007-09-12 22:10:26 +00:00
Peter Eisentraut	f4a3789b39	Clarify some error messages about duplicate things.	2007-06-03 22:16:03 +00:00
Tom Lane	a8d539f124	To support external compression of archived WAL data, add a flag bit to WAL records that shows whether it is safe to remove full-page images (ie, whether or not an on-line backup was in progress when the WAL entry was made). Also make provision for an XLOG_NOOP record type that can be used to fill in the extra space when decompressing the data for restore. This is the portion of Koichi Suzuki's "full page writes" patch that has to go into the core database. The remainder of that work is two external compression and decompression programs, which for the time being will undergo separate development on pgfoundry. Per discussion. Also, twiddle the handling of BTREE_SPLIT records to ensure it'll be possible to compress them (the previous coding caused essential info to be omitted). The other commonly-used record types seem OK already, with the possible exception of GIN and GIST WAL records, which I don't understand well enough to opine on.	2007-05-20 21:08:19 +00:00
Tom Lane	226a100568	Code review for btree page split WAL reduction patch. Make it actually work (original code always created a full-page image for the left page, thus leaving the intended savings unrealized), avoid risk of not having enough room on the page during xlog restore, squeeze out another couple bytes in the xlog record, clean up neglected comments.	2007-04-11 20:47:38 +00:00
Tom Lane	e85a01df67	Clean up the representation of special snapshots by including a "method pointer" in every Snapshot struct. This allows removal of the case-by-case tests in HeapTupleSatisfiesVisibility, which should make it a bit faster (I didn't try any performance tests though). More importantly, we are no longer violating portable C practices by assuming that small integers are distinct from all pointer values, and HeapTupleSatisfiesDirty no longer has a non-reentrant API involving side-effects on a global variable. There were a couple of places calling HeapTupleSatisfiesXXX routines directly rather than through the HeapTupleSatisfiesVisibility macro. Since these places had to be changed anyway, I chose to make them go through the macro for uniformity. Along the way I renamed HeapTupleSatisfiesSnapshot to HeapTupleSatisfiesMVCC to emphasize that it's only used with MVCC-type snapshots. I was sorely tempted to rename HeapTupleSatisfiesVisibility to HeapTupleSatisfiesSnapshot, but forebore for the moment to avoid confusion and reduce the likelihood that this patch breaks some of the pending patches. Might want to reconsider doing that later.	2007-03-25 19:45:14 +00:00
Neil Conway	e1d8deb918	Fix a typo in a comment. Heikki Linnakangas.	2007-03-05 14:13:12 +00:00
Bruce Momjian	bc292937ae	Split _bt_insertonpg to two functions. Heikki Linnakangas	2007-03-03 20:13:06 +00:00
Bruce Momjian	6f519ad01c	btree source code cleanups: I refactored findsplitloc and checksplitloc so that the division of labor is more clear IMO. I pushed all the space calculation inside the loop to checksplitloc. I also fixed the off by 4 in free space calculation caused by PageGetFreeSpace subtracting sizeof(ItemIdData), even though it was harmless, because it was distracting and I felt it might come back to bite us in the future if we change the page layout or alignments. There's now a new function PageGetExactFreeSpace that doesn't do the subtraction. findsplitloc now tries the "just the new item to right page" split as well. If people don't like the refactoring, I can write a patch to just add that. Heikki Linnakangas	2007-02-21 20:02:17 +00:00
Alvaro Herrera	f8ebab901b	Fix reference-after-free in the new btree page split code, as reported by the buildfarm via Stefan Kaltenbrunner. Patch from Heikki Linnakangas.	2007-02-08 13:52:55 +00:00
Bruce Momjian	b79575ce45	Reduce WAL activity for page splits: > Currently, an index split writes all the data on the split page to > WAL. That's a lot of WAL traffic. The tuples that are copied to the > right page need to be WAL logged, but the tuples that stay on the > original page don't. Heikki Linnakangas	2007-02-08 05:05:53 +00:00
Tom Lane	c76ed81513	Remove some dead code, per Heikki.	2007-02-06 14:55:11 +00:00
Tom Lane	6cefacd7c8	Correct an old logic error in btree page splitting: when considering a split exactly at the point where we need to insert a new item, the calculation used the wrong size for the "high key" of the new left page. This could lead to choosing an unworkable split, resulting in "PANIC: failed to add item to the left sibling" (or "right sibling") failure. Although this bug has been there a long time, it's very difficult to trigger a failure before 8.2, since there was generally a lot of free space on both sides of a chosen split. In 8.2, where the user-selected fill factor determines how much free space the code tries to leave, an unworkable split is much more likely. Report by Joe Conway, diagnosis and fix by Heikki Linnakangas.	2007-01-27 20:53:30 +00:00
Bruce Momjian	29dccf5fe0	Update CVS HEAD for 2007 copyright. Back branches are typically not back-stamped for this.	2007-01-05 22:20:05 +00:00
Tom Lane	a46ca619f8	Suppress a few 'uninitialized variable' warnings that gcc emits only at -O3 or higher (presumably because it inlines more things). Per gripe from Mark Mielke.	2006-11-11 01:14:19 +00:00
Tom Lane	70ce5c9082	Fix "failed to re-find parent key" btree VACUUM failure by revising page deletion code to avoid the case where an upper-level btree page remains "half dead" for a significant period of time, and to block insertions into a key range that is in process of being re-assigned to the right sibling of the deleted page's parent. This prevents the scenario reported by Ed L. wherein index keys could become out-of-order in the grandparent index level. Since this is a moderately invasive fix, I'm applying it only to HEAD. The bug exists back to 7.4, but the back branches will get a different patch.	2006-11-01 19:43:17 +00:00
Bruce Momjian	f99a569a2e	pgindent run for 8.2.	2006-10-04 00:30:14 +00:00
Tom Lane	e093dcdd28	Add the ability to create indexes 'concurrently', that is, without blocking concurrent writes to the table. Greg Stark, with a little help from Tom Lane.	2006-08-25 04:06:58 +00:00
Tom Lane	e6284649b9	Modify btree to delete known-dead index entries without an actual VACUUM. When we are about to split an index page to do an insertion, first look to see if any entries marked LP_DELETE exist on the page, and if so remove them to try to make enough space for the desired insert. This should reduce index bloat in heavily-updated tables, although of course you still need VACUUM eventually to clean up the heap. Junji Teramoto	2006-07-25 19:13:00 +00:00
Bruce Momjian	a22d76d96a	Allow include files to compile own their own. Strip unused include files out unused include files, and add needed includes to C files. The next step is to remove unused include files in C files.	2006-07-13 16:49:20 +00:00
Tom Lane	d29b66882a	Tweak fillfactor code as per my recent proposal. Fix nbtsort.c so that it can handle small fillfactors for ordinary-sized index entries without failing on large ones; fix nbtinsert.c to distinguish leaf and nonleaf pages; change the minimum fillfactor to 10% for all index types.	2006-07-11 21:05:57 +00:00
Tom Lane	b7b78d24f7	Code review for FILLFACTOR patch. Change WITH grammar as per earlier discussion (including making def_arg allow reserved words), add missed opt_definition for UNIQUE case. Put the reloptions support code in a less random place (I chose to make a new file access/common/reloptions.c). Eliminate header inclusion creep. Make the index options functions safely user-callable (seems like client apps might like to be able to test validity of options before trying to make an index). Reduce overhead for normal case with no options by allowing rd_options to be NULL. Fix some unmaintainably klugy code, including getting rid of Natts_pg_class_fixed at long last. Some stylistic cleanup too, and pay attention to keeping comments in sync with code. Documentation still needs work, though I did fix the omissions in catalogs.sgml and indexam.sgml.	2006-07-03 22:45:41 +00:00
Bruce Momjian	277807bd9e	Add FILLFACTOR to CREATE INDEX. ITAGAKI Takahiro	2006-07-02 02:23:23 +00:00
Tom Lane	5749f6ef0c	Rewrite btree vacuuming to fold the former bulkdelete and cleanup operations into a single mostly-physical-order scan of the index. This requires some ticklish interlocking considerations, but should create no material performance impact on normal index operations (at least given the already-committed changes to make scans work a page at a time). VACUUM itself should get significantly faster in any index that's degenerated to a very nonlinear page order. Also, we save one pass over the index entirely, except in the case where there were no deletions to do and so only one pass happened anyway. Original patch by Heikki Linnakangas, rework by Tom Lane.	2006-05-08 00:00:17 +00:00
Tom Lane	d2896a9ed1	Arrange to cache btree metapage data in the relcache entry for the index, thereby saving a visit to the metapage in most index searches/updates. This wouldn't actually save any I/O (since in the old regime the metapage generally stayed in cache anyway), but it does provide a useful decrease in bufmgr traffic in high-contention scenarios. Per my recent proposal.	2006-04-25 22:46:05 +00:00
Tom Lane	49a7610c36	Fix an ancient oversight in btree xlog replay. When trying to determine if an upper-level insertion completes a previously-seen split, we cannot simply grab the downlink block number out of the buffer, because the buffer could contain a later state of the page --- or perhaps the page doesn't even exist at all any more, due to relation truncation. These possibilities have been masked up to now because the use of full_page_writes effectively ensured that no xlog replay routine ever actually saw a page state newer than its own change. Since we're deprecating full_page_writes in 8.1.*, there's no need to fix this in existing release branches, but we need a fix in HEAD if we want to have any hope of re-allowing full_page_writes. Accordingly, adjust the contents of btree WAL records so that we can always get the downlink block number from the WAL record rather than having to depend on buffer contents. Per report from Kevin Grittner and Peter Brant. Improve a few comments in related code while at it.	2006-04-13 03:53:05 +00:00
Tom Lane	a8b8f4db23	Clean up WAL/buffer interactions as per my recent proposal. Get rid of the misleadingly-named WriteBuffer routine, and instead require routines that change buffer pages to call MarkBufferDirty (which does exactly what it says). We also require that they do so before calling XLogInsert; this takes care of the synchronization requirement documented in SyncOneBuffer. Note that because bufmgr takes the buffer content lock (in shared mode) while writing out any buffer, it doesn't matter whether MarkBufferDirty is executed before the buffer content change is complete, so long as the content change is completed before releasing exclusive lock on the buffer. So it's OK to set the dirtybit before we fill in the LSN. This eliminates the former kluge of needing to set the dirtybit in LockBuffer. Aside from making the code more transparent, we can also add some new debugging assertions, in particular that the caller of MarkBufferDirty must hold the buffer content lock, not merely a pin.	2006-03-31 23:32:07 +00:00
Bruce Momjian	f2f5b05655	Update copyright for 2006. Update scripts.	2006-03-05 15:59:11 +00:00
Tom Lane	c389760c32	Remove the no-longer-useful BTItem/BTItemData level of structure, and just refer to btree index entries as plain IndexTuples, which is what they have been for a very long time. This is mostly just an exercise in removing extraneous notation, but it does save a palloc/pfree cycle per index insertion.	2006-01-25 23:04:21 +00:00
Tom Lane	73e3566078	Improve comments about btree's use of ScanKey data structures: there are two basically different kinds of scankeys, and we ought to try harder to indicate which is used in each place in the code. I've chosen the names "search scankey" and "insertion scankey", though you could make about as good an argument for "operator scankey" and "comparison function scankey".	2006-01-17 00:09:01 +00:00
Neil Conway	fb627b76cc	Cosmetic code cleanup: fix a bunch of places that used "return (expr);" rather than "return expr;" -- the latter style is used in most of the tree. I kept the parentheses when they were necessary or useful because the return expression was complex.	2006-01-11 08:43:13 +00:00
Bruce Momjian	436a2956d8	Re-run pgindent, fixing a problem where comment lines after a blank comment line where output as too long, and update typedefs for /lib directory. Also fix case where identifiers were used as variable names in the backend, but as typedefs in ecpg (favor the backend for indenting). Backpatch to 8.1.X.	2005-11-22 18:17:34 +00:00
Tom Lane	766dc45d9f	Add defenses to btree and hash index AMs to do simple sanity checks on every index page they read; in particular to catch the case of an all-zero page, which PageHeaderIsValid allows to pass. It turns out hash already had this idea, but it was just Assert()ing things rather than doing a straight error check, and the Asserts were partially redundant with PageHeaderIsValid anyway. Per recent failure example from Jim Nasby. (gist still needs the same treatment.)	2005-11-06 19:29:01 +00:00
Bruce Momjian	1dc3498251	Standard pgindent run for 8.1.	2005-10-15 02:49:52 +00:00
Tom Lane	e952ae1268	Fix longstanding bug found by Atsushi Ogawa: _bt_check_unique would mark the wrong buffer dirty when trying to kill a dead index entry that's on a page after the one it started on. No risk of data corruption, just inefficiency, but still a bug.	2005-10-12 17:18:03 +00:00
Tom Lane	303e089df5	Clean up possibly-uninitialized-variable warnings reported by gcc 4.x.	2005-09-24 22:54:44 +00:00
Bruce Momjian	949ebbd55e	Mention MD5 function index for indexing long values.	2005-08-11 13:22:33 +00:00
Tom Lane	24ff62d76f	Make new hints follow style guide.	2005-08-10 22:39:00 +00:00
Bruce Momjian	237be3cc29	Add hints to cases where indexes fail because of values that are too long.	2005-08-10 21:36:46 +00:00
Tom Lane	ee7ac7b11e	Modify XLogInsert API to make callers specify whether pages to be backed up have the standard layout with unused space between pd_lower and pd_upper. When this is set, XLogInsert will omit the unused space without bothering to scan it to see if it's zero. That saves time in XLogInsert, and also allows reversion of my earlier patch to make PageRepairFragmentation et al explicitly re-zero freed space. Per suggestion by Heikki Linnakangas.	2005-06-06 20:22:58 +00:00
Tom Lane	ee4ddac137	Convert index-related tuple handling routines from char 'n'/' ' to bool convention for isnull flags. Also, remove the useless InsertIndexResult return struct from index AM aminsert calls --- there is no reason for the caller to know where in the index the tuple was inserted, and we were wasting a palloc cycle per insert to deliver this uninteresting value (plus nontrivial complexity in some AMs). I forced initdb because of the change in the signature of the aminsert routines, even though nothing really looks at those pg_proc entries...	2005-03-21 01:24:04 +00:00
PostgreSQL Daemon	2ff501590b	Tag appropriate files for rc3 Also performed an initial run through of upgrading our Copyright date to extend to 2005 ... first run here was very simple ... change everything where: grep 1996-2004 && the word 'Copyright' ... scanned through the generated list with 'less' first, and after, to make sure that I only picked up the right entries ...	2004-12-31 22:04:05 +00:00
Tom Lane	83cd2d8b0f	Make heap_fetch API more consistent by having the buffer remain pinned in all cases when keep_buf = true. This allows ANALYZE's inner loop to use heap_release_fetch, which saves multiple buffer lookups for the same page and avoids overestimation of cost by the vacuum cost mechanism.	2004-10-26 16:05:03 +00:00
Tom Lane	9ffc8ed58b	Repair possible failure to update hint bits back to disk, per http://archives.postgresql.org/pgsql-hackers/2004-10/msg00464.php. This fix is intended to be permanent: it moves the responsibility for calling SetBufferCommitInfoNeedsSave() into the tqual.c routines, eliminating the requirement for callers to test whether t_infomask changed. Also, tighten validity checking on buffer IDs in bufmgr.c --- several routines were paranoid about out-of-range shared buffer numbers but not about out-of-range local ones, which seems a tad pointless.	2004-10-15 22:40:29 +00:00
Bruce Momjian	b6b71b85bc	Pgindent run for 8.0.	2004-08-29 05:07:03 +00:00
Bruce Momjian	da9a8649d8	Update copyright to 2004.	2004-08-29 04:13:13 +00:00
Tom Lane	19cd31b068	Fix bug introduced into _bt_getstackbuf() on 2003-Feb-21: the initial value of 'start' could be past the end of the page, if the page was split by some concurrent inserting process since we visited it. In this situation the code could look at bogus entries and possibly find a match (since after all those entries still contain what they had before the split). This would lead to 'specified item offset is too large' followed by 'PANIC: failed to add item to the page', as reported by Joe Conway for scenarios involving heavy concurrent insertion activity.	2004-08-17 23:15:33 +00:00
Tom Lane	2042b3428d	Invent WAL timelines, as per recent discussion, to make point-in-time recovery more manageable. Also, undo recent change to add FILE_HEADER and WASTED_SPACE records to XLOG; instead make the XLOG page header variable-size with extra fields in the first page of an XLOG file. This should fix the boundary-case bugs observed by Mark Kirkwood. initdb forced due to change of XLOG representation.	2004-07-21 22:31:26 +00:00
Tom Lane	37fa3b6c89	Tweak indexscan and seqscan code to arrange that steps from one page to the next are handled by ReleaseAndReadBuffer rather than separate ReleaseBuffer and ReadBuffer calls. This cuts the number of acquisitions of the BufMgrLock by a factor of 2 (possibly more, if an indexscan happens to pull successive rows from the same heap page). Unfortunately this doesn't seem enough to get us out of the recently discussed context-switch storm problem, but it's surely worth doing anyway.	2004-04-21 18:24:26 +00:00
Neil Conway	192ad63bd7	More janitorial work: remove the explicit casting of NULL literals to a pointer type when it is not necessary to do so. For future reference, casting NULL to a pointer type is only necessary when (a) invoking a function AND either (b) the function has no prototype OR (c) the function is a varargs function.	2004-01-07 18:56:30 +00:00
Tom Lane	569659ae16	Improve btree's initial-positioning-strategy code so that we never need to step more than one entry after descending the search tree to arrive at the correct place to start the scan. This can improve the behavior substantially when there are many entries equal to the chosen boundary value. Per suggestion from Dmitry Tkach, 14-Jul-03.	2003-12-21 01:23:06 +00:00
PostgreSQL Daemon	969685ad44	$Header: -> $PostgreSQL Changes ...	2003-11-29 19:52:15 +00:00
Tom Lane	fa5c8a055a	Cross-data-type comparisons are now indexable by btrees, pursuant to my pghackers proposal of 8-Nov. All the existing cross-type comparison operators (int2/int4/int8 and float4/float8) have appropriate support. The original proposal of storing the right-hand-side datatype as part of the primary key for pg_amop and pg_amproc got modified a bit in the event; it is easier to store zero as the 'default' case and only store a nonzero when the operator is actually cross-type. Along the way, remove the long-since-defunct bigbox_ops operator class.	2003-11-12 21:15:59 +00:00
Tom Lane	c1d62bfd00	Add operator strategy and comparison-value datatype fields to ScanKey. Remove the 'strategy map' code, which was a large amount of mechanism that no longer had any use except reverse-mapping from procedure OID to strategy number. Passing the strategy number to the index AM in the first place is simpler and faster. This is a preliminary step in planned support for cross-datatype index operations. I'm committing it now since the ScanKeyEntryInitialize() API change touches quite a lot of files, and I want to commit those changes before the tree drifts under me.	2003-11-09 21:30:38 +00:00
Peter Eisentraut	feb4f44d29	Message editing: remove gratuitous variations in message wording, standardize terms, add some clarifications, fix some untranslatable attempts at dynamic message building.	2003-09-25 06:58:07 +00:00
Tom Lane	5ac2d7c0eb	In _bt_check_unique() loop, don't bother applying _bt_isequal() to killed items; just skip to the next item immediately. Only check for key equality when we reach a non-killed item or the end of the index page. This saves key comparisons when there are lots of killed items, as for example in a heavily-updated table that's not been vacuumed lately. Seems to be a win for pgbench anyway.	2003-09-02 22:10:16 +00:00
Bruce Momjian	f3c3deb7d0	Update copyrights to 2003.	2003-08-04 02:40:20 +00:00
Bruce Momjian	089003fb46	pgindent run.	2003-08-04 00:43:34 +00:00
Tom Lane	81b5c8a136	A visit from the message-style police ...	2003-07-28 00:09:16 +00:00
Tom Lane	ec7aa4b515	Error message editing in backend/access.	2003-07-21 20:29:40 +00:00
Bruce Momjian	98b6f37e47	Make debug_ GUC varables output DEBUG1 rather than LOG, and mention in docs that CLIENT/LOG_MIN_MESSAGES now controls debug_* output location. Doc changes included.	2003-05-27 17:49:47 +00:00
Tom Lane	88dc31e3f2	First cut at recycling space in btree indexes. Still some rough edges to fix, but it seems to basically work...	2003-02-23 06:17:13 +00:00
Tom Lane	799bc58dc7	More infrastructure for btree compaction project. Tree-traversal code now knows what to do upon hitting a dead page (in theory anyway, it's untested...). Add a post-VACUUM-cleanup entry point for index AMs, to provide a place for dead-page scavenging to happen. Also, fix oversight that broke btpo_prev links in temporary indexes. initdb forced due to additions in pg_am.	2003-02-22 00:45:05 +00:00
Tom Lane	70508ba7ae	Make btree index structure adjustments and WAL logging changes needed to support btree compaction, as per proposal of a few days ago. btree index pages no longer store parent links, instead they have a level indicator (counting up from zero for leaf pages). The FixBTree recovery logic is removed, and replaced by code that detects missing parent-level insertions during WAL replay. Also, generate appropriate WAL entries when updating btree metapage and when building a btree index from scratch. I believe btree indexes are now completely WAL-legal for the first time. initdb forced due to index and WAL changes.	2003-02-21 00:06:22 +00:00
Bruce Momjian	e50f52a074	pgindent run.	2002-09-04 20:31:48 +00:00
Tom Lane	5df307c778	Restructure local-buffer handling per recent pghackers discussion. The local buffer manager is no longer used for newly-created relations (unless they are TEMP); a new non-TEMP relation goes through the shared bufmgr and thus will participate normally in checkpoints. But TEMP relations use the local buffer manager throughout their lifespan. Also, operations in TEMP relations are not logged in WAL, thus improving performance. Since it's no longer necessary to fsync relations as they move out of the local buffers into shared buffers, quite a lot of smgr.c/md.c/fd.c code is no longer needed and has been removed: there's no concept of a dirty relation anymore in md.c/fd.c, and we never fsync anything but WAL. Still TODO: improve local buffer management algorithms so that it would be reasonable to increase NLocBuffer.	2002-08-06 02:36:35 +00:00
Bruce Momjian	33f1687879	There already was a macro PageGetItemId; this is now used in (almost) all places, where pd_linp is accessed. Also introduce new macros SizeOfPageHeaderData and BTMaxItemSize. This is just source code cosmetic, no behaviour changed. Manfred Koizar	2002-07-02 05:48:44 +00:00
Bruce Momjian	d84fe82230	Update copyright to 2002.	2002-06-20 20:29:54 +00:00
Tom Lane	de09da547a	Wups, managed to break ANALYZE with one aspect of that heap_fetch change.	2002-05-24 19:52:43 +00:00
Tom Lane	3f4d488022	Mark index entries "killed" when they are no longer visible to any transaction, so as to avoid returning them out of the index AM. Saves repeated heap_fetch operations on frequently-updated rows. Also detect queries on unique keys (equality to all columns of a unique index), and don't bother continuing scan once we have found first match. Killing is implemented in the btree and hash AMs, but not yet in rtree or gist, because there isn't an equally convenient place to do it in those AMs (the outer amgetnext routine can't do it without re-pinning the index page). Did some small cleanup on APIs of HeapTupleSatisfies, heap_fetch, and index_insert to make this a little easier.	2002-05-24 18:57:57 +00:00
Bruce Momjian	92288a1cf9	Change made to elog: o Change all current CVS messages of NOTICE to WARNING. We were going to do this just before 7.3 beta but it has to be done now, as you will see below. o Change current INFO messages that should be controlled by client_min_messages to NOTICE. o Force remaining INFO messages, like from EXPLAIN, VACUUM VERBOSE, etc. to always go to the client. o Remove INFO from the client_min_messages options and add NOTICE. Seems we do need three non-ERROR elog levels to handle the various behaviors we need for these messages. Regression passed.	2002-03-06 06:10:59 +00:00
Bruce Momjian	a033daf566	Commit to match discussed elog() changes. Only update is that LOG is now just below FATAL in server_min_messages. Added more text to highlight ordering difference between it and client_min_messages. --------------------------------------------------------------------------- REALLYFATAL => PANIC STOP => PANIC New INFO level the prints to client by default New LOG level the prints to server log by default Cause VACUUM information to print only to the client NOTICE => INFO where purely information messages are sent DEBUG => LOG for purely server status messages DEBUG removed, kept as backward compatible DEBUG5, DEBUG4, DEBUG3, DEBUG2, DEBUG1 added DebugLvl removed in favor of new DEBUG[1-5] symbols New server_min_messages GUC parameter with values: DEBUG[5-1], INFO, NOTICE, ERROR, LOG, FATAL, PANIC New client_min_messages GUC parameter with values: DEBUG[5-1], LOG, INFO, NOTICE, ERROR, FATAL, PANIC Server startup now logged with LOG instead of DEBUG Remove debug_level GUC parameter elog() numbers now start at 10 Add test to print error message if older elog() values are passed to elog() Bootstrap mode now has a -d that requires an argument, like postmaster	2002-03-02 21:39:36 +00:00
Tom Lane	1ccc67600b	Fix race condition that could allow two concurrent transactions to insert the same key into a supposedly unique index. The bug is of low probability, and may not explain any of the recent reports of duplicated rows; but a bug is a bug.	2002-01-01 20:32:37 +00:00
Bruce Momjian	b81844b173	pgindent run on all C files. Java run to follow. initdb/regression tests pass.	2001-10-25 05:50:21 +00:00
Tom Lane	1663f33838	Tweak btree page split logic so that when splitting a page that is rightmost on its tree level, we split 2/3 to the left and 1/3 to the new right page, rather than the even split we use elsewhere. The idea is that when faced with a steadily increasing series of inserted keys (such as sequence or timestamp values), we'll end up with a btree that's about 2/3ds full not 1/2 full, which is much closer to the desired steady-state load for a btree. Per suggestion from Ann Harrison of IBPhoenix.	2001-09-29 23:49:51 +00:00
Tom Lane	7326e78c42	Ensure that all TransactionId comparisons are encapsulated in macros (TransactionIdPrecedes, TransactionIdFollows, etc). First step on the way to transaction ID wrap solution ...	2001-08-23 23:06:38 +00:00
Tom Lane	c8076f09d2	Restructure index AM interface for index building and index tuple deletion, per previous discussion on pghackers. Most of the duplicate code in different AMs' ambuild routines has been moved out to a common routine in index.c; this means that all index types now do the right things about inserting recently-dead tuples, etc. (I also removed support for EXTEND INDEX in the ambuild routines, since that's about to go away anyway, and it cluttered the code a lot.) The retail indextuple deletion routines have been replaced by a "bulk delete" routine in which the indexscan is inside the access method. I haven't pushed this change as far as it should go yet, but it should allow considerable simplification of the internal bookkeeping for deletions. Also, add flag columns to pg_am to eliminate various hardcoded tests on AM OIDs, and remove unused pg_am columns. Fix rtree and gist index types to not attempt to store NULLs; before this, gist usually crashed, while rtree managed not to crash but computed wacko bounding boxes for NULL entries (which might have had something to do with the performance problems we've heard about occasionally). Add AtEOXact routines to hash, rtree, and gist, all of which have static state that needs to be reset after an error. We discovered this need long ago for btree, but missed the other guys. Oh, one more thing: concurrent VACUUM is now the default.	2001-07-15 22:48:19 +00:00
Jan Wieck	8d80b0d980	Statistical system views (yet without the config stuff, but it's hard to keep such massive changes in sync with the tree so I need to get it in and work from there now). Jan	2001-06-22 19:16:24 +00:00
Bruce Momjian	9e1552607a	pgindent run. Make it all clean.	2001-03-22 04:01:46 +00:00
Vadim B. Mikheev	c19dadbf08	Runtime btree recovery is now ON by default.	2001-02-07 23:35:33 +00:00
Vadim B. Mikheev	b18c09ee3a	Runtime tree recovery is implemented, just testing is left -:)	2001-02-02 19:49:15 +00:00
Vadim B. Mikheev	dca0762efc	Couple additional functions to fix tree at runtime. Need in one more function to handle "my bits moved..." case. FixBTree is still FALSE.	2001-01-31 01:08:36 +00:00
Vadim B. Mikheev	598a12722a	Call _bt_fixroot() from _bt_insertonpg.	2001-01-29 07:28:17 +00:00
Vadim B. Mikheev	c6e6d292bc	First step in attempt to fix tree at runtime: create upper levels and new root page if old root one was splitted but new root page wasn't created. New code is protected by FixBTree bool flag setted to FALSE, so nothing should be affected by this untested approach.	2001-01-26 01:24:31 +00:00
Bruce Momjian	623bf843d2	Change Copyright from PostgreSQL, Inc to PostgreSQL Global Development Group.	2001-01-24 19:43:33 +00:00
Tom Lane	4e27b308e2	Do _bt_wrtbuf() outside critical section, per discussion with Vadim 1/19.	2001-01-23 23:29:22 +00:00
Tom Lane	36839c1927	Restructure backend SIGINT/SIGTERM handling so that 'die' interrupts are treated more like 'cancel' interrupts: the signal handler sets a flag that is examined at well-defined spots, rather than trying to cope with an interrupt that might happen anywhere. See pghackers discussion of 1/12/01.	2001-01-14 05:08:17 +00:00
Tom Lane	6162432de9	Add more critical-section calls: all code sections that hold spinlocks are now critical sections, so as to ensure die() won't interrupt us while we are munging shared-memory data structures. Avoid insecure intermediate states in some code that proc_exit will call, like palloc/pfree. Rename START/END_CRIT_CODE to START/END_CRIT_SECTION, since that seems to be what people tend to call them anyway, and make them be called with () like a function call, in hopes of not confusing pg_indent. I doubt that this is sufficient to make SIGTERM safe anywhere; there's just too much code that could get invoked during proc_exit().	2001-01-12 21:54:01 +00:00
Vadim B. Mikheev	7d363c4c33	MUST update (in-memory) data page BEFORE XLogInsert to log NEW page content if WAL will decide to backup page.	2000-12-29 20:47:17 +00:00
Vadim B. Mikheev	7ceeeb662f	New WAL version - CRC and data blocks backup.	2000-12-28 13:00:29 +00:00
Vadim B. Mikheev	65b362fae1	Disable elog(ERROR\|FATAL) in signal handlers in critical sections of code.	2000-12-03 10:27:29 +00:00
Vadim B. Mikheev	81c8c244b2	No more #ifdef XLOG.	2000-11-30 08:46:26 +00:00
Bruce Momjian	312063c97b	Make pgsql compile on FreeBSD-alpha. Context diff this time. Remove -m486 compile args for FreeBSD-i386, compile -O2 on i386. Compile with only -O on alpha for codegen safety. Make the port use the TEST_AND_SET for alpha and i386 on FreeBSD. Fix a lot of bogus string formats for outputting pointers (cast to int and %u/%x replaced with no cast and %p), and 'Size'(size_t) are now cast to 'unsigned long' and output with %lu/ Remove an unused variable. Alfred Perlstein	2000-11-16 05:51:07 +00:00
Vadim B. Mikheev	a7fcadd10a	WAL	2000-10-21 15:43:36 +00:00
Vadim B. Mikheev	deee783052	WAL	2000-10-13 12:05:22 +00:00
Vadim B. Mikheev	25a26a7ab8	WAL	2000-10-13 02:03:02 +00:00
Tom Lane	32616129cd	Suppress gcc warnings.	2000-10-05 20:10:20 +00:00
Vadim B. Mikheev	5800c6b9aa	Btree WAL logging.	2000-10-04 00:04:43 +00:00
Tom Lane	40549e9cb5	Tweak btree insertion to avoid O(N^2) slowdown with large numbers of equal keys. See discussion of today's date in pghackers list.	2000-08-25 23:13:33 +00:00
Tom Lane	1ea912e16d	Fix sloppiness about alignment requirements in findsplitloc() space calculation, also make it stop when it has a 'good enough' split instead of exhaustively trying all split points.	2000-07-21 19:21:00 +00:00
Tom Lane	9e85183bfc	Major overhaul of btree index code. Eliminate special BTP_CHAIN logic for duplicate keys by letting search go to the left rather than right when an equal key is seen at an upper tree level. Fix poor choice of page split point (leading to insertion failures) that was forced by chaining logic. Don't store leftmost key in non-leaf pages, since it's not necessary. Don't create root page until something is first stored in the index, so an unused index is now 8K not 16K. (Doesn't seem to be as easy to get rid of the metadata page, unfortunately.) Massive cleanup of unreadable code, fix poor, obsolete, and just plain wrong documentation and comments. See src/backend/access/nbtree/README for the gory details.	2000-07-21 06:42:39 +00:00
Bruce Momjian	20ad43b576	Mark functions as static and ifdef NOT_USED as appropriate.	2000-06-08 22:38:00 +00:00
Tom Lane	0f1e39643d	Third round of fmgr updates: eliminate calls using fmgr() and fmgr_faddr() in favor of new-style calls. Lots of cleanup of sloppy casts to use XXXGetDatum and DatumGetXXX ...	2000-05-30 04:25:00 +00:00
Bruce Momjian	52f77df613	Ye-old pgindent run. Same 4-space tabs.	2000-04-12 17:17:23 +00:00
Tom Lane	341b328b18	Fix a bunch of minor portability problems and maybe-bugs revealed by running gcc and HP's cc with warnings cranked way up. Signed vs unsigned comparisons, routines declared static and then defined not-static, that kind of thing. Tedious, but perhaps useful...	2000-03-17 02:36:41 +00:00
Tom Lane	8cb624262a	Replace inefficient _bt_invokestrat calls with direct calls to the appropriate btree three-way comparison routine. Not clear why the three-way comparison routines were being used in some paths and not others in btree --- incomplete changes by someone long ago, maybe? Anyway, this makes for a nice speedup in CREATE INDEX.	2000-02-18 06:32:39 +00:00
Bruce Momjian	5c25d60244	Add: * Portions Copyright (c) 1996-2000, PostgreSQL, Inc to all files copyright Regents of Berkeley. Man, that's a lot of files.	2000-01-26 05:58:53 +00:00
Peter Eisentraut	1cd4c14116	Fixed all elog related warnings, as well as a few others.	2000-01-15 02:59:43 +00:00
Tom Lane	a6a70315af	It turns out that the item size limit for btree indexes is about BLCKSZ/3, not BLCKSZ/2 as some of us thought. Add check for oversize item so that failure is detected before corrupting the index, not after.	1999-12-26 03:48:22 +00:00
Bruce Momjian	fc955b14ea	Add system indexes to match all caches. Make all system indexes unique. Make all cache loads use system indexes. Rename rel to relid in inheritance tables. Rename cache names to be clearer.	1999-11-22 17:56:41 +00:00
Vadim B. Mikheev	1ecb43a40c	Re-use free space on index pages with duplicates.	1999-08-09 01:39:19 +00:00
Bruce Momjian	faf7d78174	Install new alignment code to use MAXALIGN rather than DOUBLEALIGN where approproate.	1999-07-19 07:07:29 +00:00
Bruce Momjian	3406901a29	Move some system includes into c.h, and remove duplicates.	1999-07-17 20:18:55 +00:00
Bruce Momjian	a71802e12e	Final cleanup.	1999-07-16 05:00:38 +00:00
Bruce Momjian	a9591ce66a	Change #include's to use <> and "" as appropriate.	1999-07-15 23:04:24 +00:00
Bruce Momjian	2e6b1e63a3	Remove unused #includes in *.c files.	1999-07-15 22:40:16 +00:00
Bruce Momjian	4b2c2850bf	Clean up #include in /include directory. Add scripts for checking includes.	1999-07-15 15:21:54 +00:00
Bruce Momjian	0cf1b79528	Cleanup of /include #include's, for 6.6 only.	1999-07-14 01:20:30 +00:00
Bruce Momjian	4eadfe8754	Make 0x007f -> (unsigned)0x7f to make pgindent happy.	1999-05-25 22:04:56 +00:00
Vadim B. Mikheev	7d443a85af	Get rid of page-level locking in btree-s. LockBuffer is used to acquire read/write access to index pages. Pages are released before leaving index internals.	1999-05-25 18:20:31 +00:00
Bruce Momjian	07842084fe	pgindent run over code.	1999-05-25 16:15:34 +00:00
Vadim B. Mikheev	b4c7a5655d	Patch from "Hiroshi Inoue" <Inoue@tpf.co.jp> for FATAL 1:btree: BTP_CHAIN flag was expected	1999-05-01 16:09:45 +00:00
Vadim B. Mikheev	3888b53a58	Fix duplicating ROOT page in concurrent updates.	1999-04-22 08:19:59 +00:00
Vadim B. Mikheev	401293fcff	Unique btree-s: /* * Have to check is inserted heap tuple deleted one * (i.e. just moved to another place by vacuum)! */	1999-04-12 16:56:08 +00:00
Vadim B. Mikheev	fdf6be80f9	1. Vacuum is updated for MVCC. 2. Much faster btree tuples deletion in the case when first on page index tuple is deleted (no movement to the left page(s)). 3. Remember blkno of new root page in BTPageOpaque of left/right siblings when root page is splitted.	1999-03-28 20:32:42 +00:00
Bruce Momjian	6724a50787	Change my-function-name-- to my_function_name, and optimizer renames.	1999-02-13 23:22:53 +00:00
Bruce Momjian	9322950aa4	Cleanup of source files where 'return' or 'var =' is alone on a line.	1999-02-03 21:18:02 +00:00
Vadim B. Mikheev	e3a1ab764e	READ COMMITTED isolevel is implemented and is default now.	1999-01-29 09:23:17 +00:00
Vadim B. Mikheev	3f7fbf85dc	Initial MVCC code. New code for locking buffer' context.	1998-12-15 12:47:01 +00:00
Vadim B. Mikheev	6beba218d7	New HeapTuple structure/interface.	1998-11-27 19:52:36 +00:00
Bruce Momjian	fa1a8d6a97	OK, folks, here is the pgindent output.	1998-09-01 04:40:42 +00:00
Bruce Momjian	af74855a60	Renaming cleanup, no pgindent yet.	1998-09-01 03:29:17 +00:00
Bruce Momjian	7971539020	heap_fetch requires buffer pointer, must be released; heap_getnext no longer returns buffer pointer, can be gotten from scan; descriptor; bootstrap can create multi-key indexes; pg_procname index now is multi-key index; oidint2, oidint4, oidname are gone (must be removed from regression tests); use System Cache rather than sequential scan in many places; heap_modifytuple no longer takes buffer parameter; remove unused buffer parameter in a few other functions; oid8 is not index-able; remove some use of single-character variable names; cleanup Buffer variables usage and scan descriptor looping; cleaned up allocation and freeing of tuples; 18k lines of diff;	1998-08-19 02:04:17 +00:00

1 2 3 4 5 ...

277 Commits