postgresql

mirror of https://git.postgresql.org/git/postgresql.git synced 2024-10-04 12:56:51 +02:00

Author	SHA1	Message	Date
Bruce Momjian	f6e8730d11	Re-run pgindent with updated list of typedefs. (Updated README should avoid this problem in the future.)	2007-11-15 22:25:18 +00:00
Bruce Momjian	fdf5a5efb7	pgindent run for 8.3.	2007-11-15 21:14:46 +00:00
Tom Lane	282d2a03dd	HOT updates. When we update a tuple without changing any of its indexed columns, and the new version can be stored on the same heap page, we no longer generate extra index entries for the new version. Instead, index searches follow the HOT-chain links to ensure they find the correct tuple version. In addition, this patch introduces the ability to "prune" dead tuples on a per-page basis, without having to do a complete VACUUM pass to recover space. VACUUM is still needed to clean up dead index entries, however. Pavan Deolasee, with help from a bunch of other people.	2007-09-20 17:56:33 +00:00
Tom Lane	6889303531	Redefine the lp_flags field of item pointers as having four states, rather than two independent bits (one of which was never used in heap pages anyway, or at least hadn't been in a very long time). This gives us flexibility to add the HOT notions of redirected and dead item pointers without requiring anything so klugy as magic values of lp_off and lp_len. The state values are chosen so that for the states currently in use (pre-HOT) there is no change in the physical representation.	2007-09-12 22:10:26 +00:00
Tom Lane	a8d539f124	To support external compression of archived WAL data, add a flag bit to WAL records that shows whether it is safe to remove full-page images (ie, whether or not an on-line backup was in progress when the WAL entry was made). Also make provision for an XLOG_NOOP record type that can be used to fill in the extra space when decompressing the data for restore. This is the portion of Koichi Suzuki's "full page writes" patch that has to go into the core database. The remainder of that work is two external compression and decompression programs, which for the time being will undergo separate development on pgfoundry. Per discussion. Also, twiddle the handling of BTREE_SPLIT records to ensure it'll be possible to compress them (the previous coding caused essential info to be omitted). The other commonly-used record types seem OK already, with the possible exception of GIN and GIST WAL records, which I don't understand well enough to opine on.	2007-05-20 21:08:19 +00:00
Tom Lane	226a100568	Code review for btree page split WAL reduction patch. Make it actually work (original code always created a full-page image for the left page, thus leaving the intended savings unrealized), avoid risk of not having enough room on the page during xlog restore, squeeze out another couple bytes in the xlog record, clean up neglected comments.	2007-04-11 20:47:38 +00:00
Bruce Momjian	b79575ce45	Reduce WAL activity for page splits: Currently, an index split writes all the data on the split page to WAL. That's a lot of WAL traffic. The tuples that are copied to the right page need to be WAL logged, but the tuples that stay on the original page don't. Heikki Linnakangas	2007-02-08 05:05:53 +00:00
Bruce Momjian	8b4ff8b6a1	Wording cleanup for error messages. Also change can't -> cannot. Standard English uses "may", "can", and "might" in different ways: may - permission, "You may borrow my rake." can - ability, "I can lift that log." might - possibility, "It might rain today." Unfortunately, in conversational English, their use is often mixed, as in, "You may use this variable to do X", when in fact, "can" is a better choice. Similarly, "It may crash" is better stated, "It might crash".	2007-02-01 19:10:30 +00:00
Bruce Momjian	29dccf5fe0	Update CVS HEAD for 2007 copyright. Back branches are typically not back-stamped for this.	2007-01-05 22:20:05 +00:00
Tom Lane	70ce5c9082	Fix "failed to re-find parent key" btree VACUUM failure by revising page deletion code to avoid the case where an upper-level btree page remains "half dead" for a significant period of time, and to block insertions into a key range that is in process of being re-assigned to the right sibling of the deleted page's parent. This prevents the scenario reported by Ed L. wherein index keys could become out-of-order in the grandparent index level. Since this is a moderately invasive fix, I'm applying it only to HEAD. The bug exists back to 7.4, but the back branches will get a different patch.	2006-11-01 19:43:17 +00:00
Bruce Momjian	f99a569a2e	pgindent run for 8.2.	2006-10-04 00:30:14 +00:00
Tom Lane	e002836913	Make recovery from WAL be restartable, by executing a checkpoint-like operation every so often. This improves the usefulness of PITR log shipping for hot standby: formerly, if the standby server crashed, it was necessary to restart it from the last base backup and replay all the WAL since then. Now it will only need to reread about the same amount of WAL as the master server would. The behavior might also come in handy during a long PITR replay sequence. Simon Riggs, with some editorialization by Tom Lane.	2006-08-07 16:57:57 +00:00
Tom Lane	e6284649b9	Modify btree to delete known-dead index entries without an actual VACUUM. When we are about to split an index page to do an insertion, first look to see if any entries marked LP_DELETE exist on the page, and if so remove them to try to make enough space for the desired insert. This should reduce index bloat in heavily-updated tables, although of course you still need VACUUM eventually to clean up the heap. Junji Teramoto	2006-07-25 19:13:00 +00:00
Bruce Momjian	e0522505bd	Remove 576 references of include files that were not needed.	2006-07-14 14:52:27 +00:00
Bruce Momjian	a22d76d96a	Allow include files to compile on their own. Strip unused include files out of include files, and add needed includes to C files. The next step is to remove unused include files in C files.	2006-07-13 16:49:20 +00:00
Tom Lane	5749f6ef0c	Rewrite btree vacuuming to fold the former bulkdelete and cleanup operations into a single mostly-physical-order scan of the index. This requires some ticklish interlocking considerations, but should create no material performance impact on normal index operations (at least given the already-committed changes to make scans work a page at a time). VACUUM itself should get significantly faster in any index that's degenerated to a very nonlinear page order. Also, we save one pass over the index entirely, except in the case where there were no deletions to do and so only one pass happened anyway. Original patch by Heikki Linnakangas, rework by Tom Lane.	2006-05-08 00:00:17 +00:00
Tom Lane	49a7610c36	Fix an ancient oversight in btree xlog replay. When trying to determine if an upper-level insertion completes a previously-seen split, we cannot simply grab the downlink block number out of the buffer, because the buffer could contain a later state of the page --- or perhaps the page doesn't even exist at all any more, due to relation truncation. These possibilities have been masked up to now because the use of full_page_writes effectively ensured that no xlog replay routine ever actually saw a page state newer than its own change. Since we're deprecating full_page_writes in 8.1.*, there's no need to fix this in existing release branches, but we need a fix in HEAD if we want to have any hope of re-allowing full_page_writes. Accordingly, adjust the contents of btree WAL records so that we can always get the downlink block number from the WAL record rather than having to depend on buffer contents. Per report from Kevin Grittner and Peter Brant. Improve a few comments in related code while at it.	2006-04-13 03:53:05 +00:00
Tom Lane	89bda95d82	Remove the 'slow' path for btree index build, which built the btree incrementally by successive inserts rather than by sorting the data. We were only using the slow path during bootstrap, apparently because when first written it failed during bootstrap --- but it works fine now AFAICT. Removing it saves a hundred or so lines of code and produces noticeably (~10%) smaller initial states of the system catalog indexes. While that won't make much difference for heavily-modified catalogs, for the more static ones there may be a useful long-term performance improvement.	2006-04-01 03:03:37 +00:00
Tom Lane	a8b8f4db23	Clean up WAL/buffer interactions as per my recent proposal. Get rid of the misleadingly-named WriteBuffer routine, and instead require routines that change buffer pages to call MarkBufferDirty (which does exactly what it says). We also require that they do so before calling XLogInsert; this takes care of the synchronization requirement documented in SyncOneBuffer. Note that because bufmgr takes the buffer content lock (in shared mode) while writing out any buffer, it doesn't matter whether MarkBufferDirty is executed before the buffer content change is complete, so long as the content change is completed before releasing exclusive lock on the buffer. So it's OK to set the dirtybit before we fill in the LSN. This eliminates the former kluge of needing to set the dirtybit in LockBuffer. Aside from making the code more transparent, we can also add some new debugging assertions, in particular that the caller of MarkBufferDirty must hold the buffer content lock, not merely a pin.	2006-03-31 23:32:07 +00:00
Tom Lane	6d61cdec07	Clean up and document the API for XLogOpenRelation and XLogReadBuffer. This commit doesn't make much functional change, but it does eliminate some duplicated code --- for instance, PageIsNew tests are now done inside XLogReadBuffer rather than by each caller. The GIST xlog code still needs a lot of love, but I'll worry about that separately.	2006-03-29 21:17:39 +00:00
Tom Lane	288551fc60	Repair longstanding error in btree xlog replay: XLogReadBuffer should be passed extend = true whenever we are reading a page we intend to reinitialize completely, even if we think the page "should exist". This is because it might indeed not exist, if the relation got truncated sometime after the current xlog record was made and before the crash we're trying to recover from. These two thinkos appear to explain both of the old bug reports discussed here: http://archives.postgresql.org/pgsql-hackers/2005-05/msg01369.php	2006-03-28 21:17:23 +00:00
Tom Lane	0a20207060	Arrange to emit a description of the current XLOG record as error context when an error occurs during xlog replay. Also, replace the former risky 'write into a fixed-size buffer with no overflow detection' API for XLOG record description routines; use an expansible StringInfo instead. (The latter accounts for most of the patch bulk.) Qingqing Zhou	2006-03-24 04:32:13 +00:00
Bruce Momjian	f2f5b05655	Update copyright for 2006. Update scripts.	2006-03-05 15:59:11 +00:00
Tom Lane	c389760c32	Remove the no-longer-useful BTItem/BTItemData level of structure, and just refer to btree index entries as plain IndexTuples, which is what they have been for a very long time. This is mostly just an exercise in removing extraneous notation, but it does save a palloc/pfree cycle per index insertion.	2006-01-25 23:04:21 +00:00
Tom Lane	23836fb1fb	A few trivial code cleanups motivated by reading warnings generated by a recent HP C compiler. Mostly, get rid of useless local variables that are assigned to but never used.	2005-10-18 01:06:24 +00:00
Bruce Momjian	1dc3498251	Standard pgindent run for 8.1.	2005-10-15 02:49:52 +00:00
Tom Lane	4c8495a1f2	Remove the mostly-stubbed-out-anyway support routines for WAL UNDO. That code is never going to be used in the foreseeable future, and where it's more than a stub it's making the redo routines harder to read.	2005-06-06 17:01:25 +00:00
Tom Lane	21fda22ec4	Change CRCs in WAL records from 64bit to 32bit for performance reasons. Instead of a separate CRC on each backup block, include backup blocks in their parent WAL record's CRC; this is important to ensure that the backup block really goes with the WAL record, ie there was not a page tear right at the start of the backup block. Implement a simple form of compression of backup blocks: drop any run of zeroes starting at pd_lower, so as not to store the unused 'hole' that commonly exists in PG heap and index pages. Tweak PageRepairFragmentation and related routines to ensure they keep the unused space zeroed, so that the above compression method remains effective. All per recent discussions.	2005-06-02 05:55:29 +00:00
Tom Lane	94e03330cb	Create a routine PageIndexMultiDelete() that replaces a loop around PageIndexTupleDelete() with a single pass of compactification --- logic mostly lifted from PageRepairFragmentation. I noticed while profiling that a VACUUM that's cleaning up a whole lot of deleted tuples would spend as much as a third of its CPU time in PageIndexTupleDelete; not too surprising considering the loop method was roughly O(N^2) in the number of tuples involved.	2005-03-22 06:17:03 +00:00
PostgreSQL Daemon	2ff501590b	Tag appropriate files for rc3. Also performed an initial run through of upgrading our Copyright date to extend to 2005 ... first run here was very simple ... change everything where: grep 1996-2004 && the word 'Copyright' ... scanned through the generated list with 'less' first, and after, to make sure that I only picked up the right entries ...	2004-12-31 22:04:05 +00:00
Bruce Momjian	b6b71b85bc	Pgindent run for 8.0.	2004-08-29 05:07:03 +00:00
Bruce Momjian	da9a8649d8	Update copyright to 2004.	2004-08-29 04:13:13 +00:00
Tom Lane	2042b3428d	Invent WAL timelines, as per recent discussion, to make point-in-time recovery more manageable. Also, undo recent change to add FILE_HEADER and WASTED_SPACE records to XLOG; instead make the XLOG page header variable-size with extra fields in the first page of an XLOG file. This should fix the boundary-case bugs observed by Mark Kirkwood. initdb forced due to change of XLOG representation.	2004-07-21 22:31:26 +00:00
Tom Lane	94d4d240bb	Rename XLOG_BTREE_NEWPAGE xlog record type into XLOG_HEAP_NEWPAGE, and shift support code into heapam.c accordingly. This is in service of soon-to-be-committed ALTER TABLE SET TABLESPACE code that will want to use this same record type for both heaps and indexes. Theoretically I should have forced initdb for this, but in practice there is no change in xlog contents because CVS tip will never really emit this record type anyhow...	2004-07-11 18:01:45 +00:00
Tom Lane	2467394ee1	Tablespaces. Alternate database locations are dead, long live tablespaces. There are various things left to do: contrib dbsize and oid2name modules need work, and so does the documentation. Also someone should think about COMMENT ON TABLESPACE and maybe RENAME TABLESPACE. Also initlocation is dead, it just doesn't know it yet. Gavin Sherry and Tom Lane.	2004-06-18 06:14:31 +00:00
Tom Lane	2095206de1	Adjust btree index build to not use shared buffers, thereby avoiding the locking conflict against concurrent CHECKPOINT that was discussed a few weeks ago. Also, if not using WAL archiving (which is always true ATM but won't be if PITR makes it into this release), there's no need to WAL-log the index build process; it's sufficient to force-fsync the completed index before commit. This seems to gain about a factor of 2 in my tests, which is consistent with writing half as much data. I did not try it with WAL on a separate drive though --- probably the gain would be a lot less in that scenario.	2004-06-02 17:28:18 +00:00
Neil Conway	72b6ad6313	Use the new List API function names throughout the backend, and disable the list compatibility API by default. While doing this, I decided to keep the llast() macro around and introduce llast_int() and llast_oid() variants.	2004-05-30 23:40:41 +00:00
Neil Conway	d0b4399d81	Reimplement the linked list data structure used throughout the backend. In the past, we used a 'Lispy' linked list implementation: a "list" was merely a pointer to the head node of the list. The problem with that design is that it makes lappend() and length() linear time. This patch fixes that problem (and others) by maintaining a count of the list length and a pointer to the tail node along with each head node pointer. A "list" is now a pointer to a structure containing some meta-data about the list; the head and tail pointers in that structure refer to ListCell structures that maintain the actual linked list of nodes. The function names of the list API have also been changed to, I hope, be more logically consistent. By default, the old function names are still available; they will be disabled-by-default once the rest of the tree has been updated to use the new API names.	2004-05-26 04:41:50 +00:00
Neil Conway	192ad63bd7	More janitorial work: remove the explicit casting of NULL literals to a pointer type when it is not necessary to do so. For future reference, casting NULL to a pointer type is only necessary when (a) invoking a function AND either (b) the function has no prototype OR (c) the function is a varargs function.	2004-01-07 18:56:30 +00:00
Neil Conway	fef0c8345a	I posted some bufmgr cleanup a few weeks ago, but it conflicted with some concurrent changes Jan was making to the bufmgr. Here's an updated version of the patch -- it should apply cleanly to CVS HEAD and passes the regression tests. This patch makes the following changes: - remove the UnlockAndReleaseBuffer() and UnlockAndWriteBuffer() macros, and replace uses of them with calls to the appropriate functions. - remove a bunch of #ifdef BMTRACE code: it is ugly & broken (i.e. it doesn't compile) - make BufferReplace() return a bool, not an int - cleanup some logic in bufmgr.c; should be functionally equivalent to the previous code, just cleaner now - remove the BM_PRIVATE flag as it is unused - improve a few comments, etc.	2003-12-14 00:34:47 +00:00
PostgreSQL Daemon	969685ad44	$Header: -> $PostgreSQL Changes ...	2003-11-29 19:52:15 +00:00
Tom Lane	e33f205a94	Adjust btree index build procedure so that the btree metapage looks invalid (has the wrong magic number) until the build is entirely complete. This turns out to cost no additional writes in the normal case, since we were rewriting the metapage at the end of the process anyway. In normal scenarios there's no real gain in security, because a failed index build would roll back the transaction leaving an unused index file, but for rebuilding shared system indexes this seems to add some useful protection.	2003-09-29 23:40:26 +00:00
Bruce Momjian	46785776c4	Another pgindent run with updated typedefs.	2003-08-08 21:42:59 +00:00
Bruce Momjian	f3c3deb7d0	Update copyrights to 2003.	2003-08-04 02:40:20 +00:00
Bruce Momjian	089003fb46	pgindent run.	2003-08-04 00:43:34 +00:00
Tom Lane	3bbd6af37c	Adjust btbulkdelete logic so that only one WAL record is issued while deleting multiple index entries on a single index page. This makes for a very substantial reduction in the amount of WAL traffic during a large delete operation.	2003-02-23 22:43:09 +00:00
Tom Lane	88dc31e3f2	First cut at recycling space in btree indexes. Still some rough edges to fix, but it seems to basically work...	2003-02-23 06:17:13 +00:00
Tom Lane	70508ba7ae	Make btree index structure adjustments and WAL logging changes needed to support btree compaction, as per proposal of a few days ago. btree index pages no longer store parent links, instead they have a level indicator (counting up from zero for leaf pages). The FixBTree recovery logic is removed, and replaced by code that detects missing parent-level insertions during WAL replay. Also, generate appropriate WAL entries when updating btree metapage and when building a btree index from scratch. I believe btree indexes are now completely WAL-legal for the first time. initdb forced due to index and WAL changes.	2003-02-21 00:06:22 +00:00
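
Commit 21fda22ec4 above describes dropping any run of zeroes that starts at pd_lower, so the unused hole in a page is not stored in a backup block. A minimal sketch of that idea follows. It is not the PostgreSQL code: the page size, the function names, and the flat byte-array page are all assumptions made for illustration, and only the "drop the zero run at pd_lower" rule comes from the commit message.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Tiny page size so the example is easy to follow (assumption). */
#define DEMO_PAGE_SIZE 64

/* Length of the run of zero bytes that starts at offset 'lower'. */
static size_t
demo_hole_length(const unsigned char *page, size_t lower)
{
    size_t len = 0;

    while (lower + len < DEMO_PAGE_SIZE && page[lower + len] == 0)
        len++;
    return len;
}

/*
 * Copy the page into 'out' with the hole removed.
 * Stores the hole length in *hole_len and returns the bytes written.
 * The caller must ensure lower <= DEMO_PAGE_SIZE.
 */
static size_t
demo_compress_page(const unsigned char *page, size_t lower,
                   size_t *hole_len, unsigned char *out)
{
    size_t hole = demo_hole_length(page, lower);

    memcpy(out, page, lower);
    memcpy(out + lower, page + lower + hole,
           DEMO_PAGE_SIZE - lower - hole);
    *hole_len = hole;
    return DEMO_PAGE_SIZE - hole;
}

/* Rebuild the full page by re-inserting 'hole_len' zero bytes at 'lower'. */
static void
demo_restore_page(const unsigned char *in, size_t lower,
                  size_t hole_len, unsigned char *page)
{
    memcpy(page, in, lower);
    memset(page + lower, 0, hole_len);
    memcpy(page + lower + hole_len, in + lower,
           DEMO_PAGE_SIZE - lower - hole_len);
}
```

The scheme only pays off while the free space really is zeroed, which is presumably why the same commit adjusts PageRepairFragmentation and related routines to keep the unused space zeroed.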

98 Commits
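
Commit d0b4399d81 in the log above replaces a bare head-pointer list with a header structure that tracks the length and the tail cell, so that appending and taking the length no longer walk the list. The sketch below illustrates that layout. All names (DemoList, DemoCell, demo_lappend_int and so on) are hypothetical and are not the backend's List API; memory handling uses plain malloc rather than palloc.

```c
#include <assert.h>
#include <stdlib.h>

/* One node of the list (hypothetical name). */
typedef struct DemoCell
{
    int              value;
    struct DemoCell *next;
} DemoCell;

/* List header: metadata plus head and tail pointers. */
typedef struct DemoList
{
    int       length;   /* kept up to date on every append */
    DemoCell *head;
    DemoCell *tail;
} DemoList;

/* Append in constant time; a NULL list stands for the empty list. */
static DemoList *
demo_lappend_int(DemoList *list, int value)
{
    DemoCell *cell = malloc(sizeof(DemoCell));

    if (cell == NULL)
        abort();
    cell->value = value;
    cell->next = NULL;

    if (list == NULL)
    {
        list = malloc(sizeof(DemoList));
        if (list == NULL)
            abort();
        list->length = 0;
        list->head = NULL;
        list->tail = NULL;
    }

    if (list->tail != NULL)
        list->tail->next = cell;    /* link after the current tail */
    else
        list->head = cell;          /* first cell in the list */
    list->tail = cell;
    list->length++;
    return list;
}

/* Constant-time length: read the stored count instead of walking. */
static int
demo_list_length(const DemoList *list)
{
    return list ? list->length : 0;
}

/* Constant-time access to the last element of a non-empty list. */
static int
demo_llast_int(const DemoList *list)
{
    assert(list != NULL && list->tail != NULL);
    return list->tail->value;
}

/* Release every cell and then the header. */
static void
demo_list_free(DemoList *list)
{
    DemoCell *cell = list ? list->head : NULL;

    while (cell != NULL)
    {
        DemoCell *next = cell->next;

        free(cell);
        cell = next;
    }
    free(list);
}
```

Starting from `DemoList *l = NULL;`, three calls to `demo_lappend_int` leave `demo_list_length(l)` at 3 without any traversal, which is the property the commit message is after.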