postgresql

mirror of https://git.postgresql.org/git/postgresql.git synced 2024-09-30 09:41:17 +02:00

Author	SHA1	Message	Date
Tom Lane	a8d539f124	To support external compression of archived WAL data, add a flag bit to WAL records that shows whether it is safe to remove full-page images (ie, whether or not an on-line backup was in progress when the WAL entry was made). Also make provision for an XLOG_NOOP record type that can be used to fill in the extra space when decompressing the data for restore. This is the portion of Koichi Suzuki's "full page writes" patch that has to go into the core database. The remainder of that work is two external compression and decompression programs, which for the time being will undergo separate development on pgfoundry. Per discussion. Also, twiddle the handling of BTREE_SPLIT records to ensure it'll be possible to compress them (the previous coding caused essential info to be omitted). The other commonly-used record types seem OK already, with the possible exception of GIN and GIST WAL records, which I don't understand well enough to opine on.	2007-05-20 21:08:19 +00:00
Alvaro Herrera	3b0347b36e	Move the tuple freezing point in CLUSTER to a point further back in the past, to avoid losing useful Xid information in not-so-old tuples. This makes CLUSTER behave the same as VACUUM as far as tuple-freezing behavior goes (though CLUSTER does not yet advance the table's relfrozenxid). While at it, move the actual freezing operation in rewriteheap.c to a more appropriate place, and document it thoroughly. This part of the patch from Tom Lane.	2007-05-17 15:28:29 +00:00
Alvaro Herrera	dfed0012bc	Have the rewriteheap code freeze old tuples. This is safe because it is only applied to live tuples older than a recent Xmin, not to tuples that may be part of an update chain. Those still keep their original markings. This patch makes it possible for CLUSTER to advance relfrozenxid, thus avoiding the need of vacuuming the table for Xid wraparound purposes. That will be patched separately. Patch from Heikki Linnakangas.	2007-05-16 16:36:56 +00:00
Tom Lane	0fef38da21	Tweak hash index AM to use the new ReadOrZeroBuffer bufmgr API when fetching pages it intends to zero immediately. Just to show there is some use for that function besides WAL recovery :-). Along the way, fold _hash_checkpage and _hash_pageinit calls into _hash_getbuf and friends, instead of expecting callers to do that separately.	2007-05-03 16:45:58 +00:00
Tom Lane	8c3cc86e7b	During WAL recovery, when reading a page that we intend to overwrite completely from the WAL data, don't bother to physically read it; just have bufmgr.c return a zeroed-out buffer instead. This speeds recovery significantly, and also avoids unnecessary failures when a page-to-be-overwritten has corrupt page headers on disk. This replaces a former kluge that accomplished the latter by pretending zero_damaged_pages was always ON during WAL recovery; which was OK when the kluge was put in, but is unsafe when restoring a WAL log that was written with full_page_writes off. Heikki Linnakangas	2007-05-02 23:18:03 +00:00
Tom Lane	c432061963	Change the timestamps recorded in transaction commit/abort xlog records from time_t to TimestampTz representation. This provides full gettimeofday() resolution of the timestamps, which might be useful when attempting to do point-in-time recovery --- previously it was not possible to specify the stop point with sub-second resolution. But mostly this is to get rid of TimestampTz-to-time_t conversion overhead during commit. Per my proposal of a day or two back.	2007-04-30 21:01:53 +00:00
Tom Lane	957d08c81f	Implement rate-limiting logic on how often backends will attempt to send messages to the stats collector. This avoids the problem that enabling stats_row_level for autovacuum has a significant overhead for short read-only transactions, as noted by Arjen van der Meijden. We can avoid an extra gettimeofday call by piggybacking on the one done for WAL-logging xact commit or abort (although that doesn't help read-only transactions, since they don't WAL-log anything). In my proposal for this, I noted that we could change the WAL log entries for commit/abort to record full TimestampTz precision, instead of only time_t as at present. That's not done in this patch, but will be committed separately.	2007-04-30 03:23:49 +00:00
Tom Lane	a2e923a652	Fix dynahash.c to suppress hash bucket splits while a hash_seq_search() scan is in progress on the same hashtable. This seems the least invasive way to fix the recently-recognized problem that a split could cause the scan to visit entries twice or (with much lower probability) miss them entirely. The only field-reported problem caused by this is the "failed to re-find shared lock object" PANIC in COMMIT PREPARED reported by Michel Dorochevsky, which was caused by multiply visited entries. However, it seems certain that mdsync() is vulnerable to missing required fsync's due to missed entries, and I am fearful that RelationCacheInitializePhase2() might be at risk as well. Because of that and the generalized hazard presented by this bug, back-patch all the supported branches. Along the way, fix pg_prepared_statement() and pg_cursor() to not assume that the hashtables they are examining will stay static between calls. This is risky regardless of the newly noted dynahash problem, because hash_seq_search() has never promised to cope with deletion of table entries other than the just-returned one. There may be no bug here because the only supported way to call these functions is via ExecMakeTableFunctionResult() which will cycle them to completion before doing anything very interesting, but it seems best to get rid of the assumption. This affects 8.2 and HEAD only, since those functions weren't there earlier.	2007-04-26 23:24:46 +00:00
Tom Lane	9d37c038fc	Repair PANIC condition in hash indexes when a previous index extension attempt failed (due to lock conflicts or out-of-space). We might have already extended the index's filesystem EOF before failing, causing the EOF to be beyond what the metapage says is the last used page. Hence the invariant maintained by the code needs to be "EOF is at or beyond last used page", not "EOF is exactly the last used page". Problem was created by my patch of 2006-11-19 that attempted to repair bug #2737. Since that was back-patched to 7.4, this needs to be as well. Per report and test case from Vlastimil Krejcir.	2007-04-19 20:24:04 +00:00
Tom Lane	836feeda9c	Fix condition for whether end_heap_rewrite must fsync, per Heikki.	2007-04-17 21:29:31 +00:00
Tom Lane	4942ee656a	Don't assume rd_smgr stays open across all of a rewriteheap operation; doing so can result in crash if an sinval reset occurs meanwhile. I believe this explains intermittent buildfarm failures in cluster test.	2007-04-17 20:49:39 +00:00
Tom Lane	226a100568	Code review for btree page split WAL reduction patch. Make it actually work (original code always created a full-page image for the left page, thus leaving the intended savings unrealized), avoid risk of not having enough room on the page during xlog restore, squeeze out another couple bytes in the xlog record, clean up neglected comments.	2007-04-11 20:47:38 +00:00
Tom Lane	56218fbc48	Minor tweaking of index special-space definitions so that the various index types can be reliably distinguished by examining the special space on an index page. Per my earlier proposal, plus the realization that there's no need for btree's vacuum cycle ID to cycle through every possible 16-bit value. Restricting its range a little costs nearly nothing and eliminates the possibility of collisions. Memo to self: remember to make bitmap indexes play along with this scheme, assuming that patch ever gets accepted.	2007-04-09 22:04:08 +00:00
Tom Lane	7b78474da3	Make CLUSTER MVCC-safe. Heikki Linnakangas	2007-04-08 01:26:33 +00:00
Tom Lane	f02a82b6ad	Make 'col IS NULL' clauses be indexable conditions. Teodor Sigaev, with some kibitzing from Tom Lane.	2007-04-06 22:33:43 +00:00
Tom Lane	3e23b68dac	Support varlena fields with single-byte headers and unaligned storage. This commit breaks any code that assumes that the mere act of forming a tuple (without writing it to disk) does not "toast" any fields. While all available regression tests pass, I'm not totally sure that we've fixed every nook and cranny, especially in contrib. Greg Stark with some help from Tom Lane	2007-04-06 04:21:44 +00:00
Tom Lane	9c9b619473	Remove the CheckpointStartLock in favor of having backends show whether they are in their commit critical sections via flags in the ProcArray. Checkpoint can watch the ProcArray to determine when it's safe to proceed. This is a considerably better solution to the original problem of race conditions between checkpoint and transaction commit: it speeds up commit, since there's one less lock to fool with, and it prevents the problem of checkpoint being delayed indefinitely when there's a constant flow of commits. Heikki, with some kibitzing from Tom.	2007-04-03 16:34:36 +00:00
Tom Lane	b3005276eb	Decouple the values of TOAST_TUPLE_THRESHOLD and TOAST_MAX_CHUNK_SIZE. Add the latter to the values checked in pg_control, since it can't be changed without invalidating toast table content. This commit in itself shouldn't change any behavior, but it lays some necessary groundwork for experimentation with these toast-control numbers. Note: while TOAST_TUPLE_THRESHOLD can now be changed without initdb, some thought still needs to be given to needs_toast_table() in toasting.c before unleashing random changes.	2007-04-03 04:14:26 +00:00
Tom Lane	57690c6803	Support enum data types. Along the way, use macros for the values of pg_type.typtype wherever practical. Tom Dunstan, with some kibitzing from Tom Lane.	2007-04-02 03:49:42 +00:00
Tom Lane	8875d0987d	Fix oversight in coding of _bt_start_vacuum: we can't assume that the LWLock will be released by transaction abort before _bt_end_vacuum gets called. If either of these "can't happen" errors actually happened, we'd freeze up trying to acquire an already-held lock. Latest word is that this does not explain Martin Pitt's trouble report, but it still looks like a bug.	2007-03-30 00:12:59 +00:00
Tom Lane	fba8113c1b	Teach CLUSTER to skip writing WAL if not needed (ie, not using archiving) --- Simon. Also, code review and cleanup for the previous COPY-no-WAL patches --- Tom.	2007-03-29 00:15:39 +00:00
Tom Lane	e85a01df67	Clean up the representation of special snapshots by including a "method pointer" in every Snapshot struct. This allows removal of the case-by-case tests in HeapTupleSatisfiesVisibility, which should make it a bit faster (I didn't try any performance tests though). More importantly, we are no longer violating portable C practices by assuming that small integers are distinct from all pointer values, and HeapTupleSatisfiesDirty no longer has a non-reentrant API involving side-effects on a global variable. There were a couple of places calling HeapTupleSatisfiesXXX routines directly rather than through the HeapTupleSatisfiesVisibility macro. Since these places had to be changed anyway, I chose to make them go through the macro for uniformity. Along the way I renamed HeapTupleSatisfiesSnapshot to HeapTupleSatisfiesMVCC to emphasize that it's only used with MVCC-type snapshots. I was sorely tempted to rename HeapTupleSatisfiesVisibility to HeapTupleSatisfiesSnapshot, but forebore for the moment to avoid confusion and reduce the likelihood that this patch breaks some of the pending patches. Might want to reconsider doing that later.	2007-03-25 19:45:14 +00:00
Tom Lane	4f896dac17	Arrange for PreventTransactionChain to reject commands submitted as part of a multi-statement simple-Query message. This bug goes all the way back, but unfortunately is not nearly so easy to fix in existing releases; it is only the recent ProcessUtility API change that makes it fixable in HEAD. Per report from William Garrison.	2007-03-22 19:55:04 +00:00
Peter Eisentraut	f4ee82e3d3	Reverted waiting for further fixes: Make configuration parameters fall back to their default values when they are removed from the configuration file. Joachim Wieland	2007-03-13 14:32:25 +00:00
Tom Lane	b9527e9840	First phase of plan-invalidation project: create a plan cache management module and teach PREPARE and protocol-level prepared statements to use it. In service of this, rearrange utility-statement processing so that parse analysis does not assume table schemas can't change before execution for utility statements (necessary because we don't attempt to re-acquire locks for utility statements when reusing a stored plan). This requires some refactoring of the ProcessUtility API, but it ends up cleaner anyway, for instance we can get rid of the QueryContext global. Still to do: fix up SPI and related code to use the plan cache; I'm tempted to try to make SQL functions use it too. Also, there are at least some aspects of system state that we want to ensure remain the same during a replan as in the original processing; search_path certainly ought to behave that way for instance, and perhaps there are others.	2007-03-13 00:33:44 +00:00
Peter Eisentraut	f84308f195	Make configuration parameters fall back to their default values when they are removed from the configuration file. Joachim Wieland	2007-03-12 22:09:28 +00:00
Neil Conway	e1d8deb918	Fix a typo in a comment. Heikki Linnakangas.	2007-03-05 14:13:12 +00:00
Bruce Momjian	bc292937ae	Split _bt_insertonpg to two functions. Heikki Linnakangas	2007-03-03 20:13:06 +00:00
Bruce Momjian	ae35867a39	Remove undo information from pg_controldata --- never used. Florian G. Pflug	2007-03-03 20:02:27 +00:00
Tom Lane	234a02b2a8	Replace direct assignments to VARATT_SIZEP(x) with SET_VARSIZE(x, len). Get rid of VARATT_SIZE and VARATT_DATA, which were simply redundant with VARSIZE and VARDATA, and as a consequence almost no code was using the longer names. Rename the length fields of struct varlena and various derived structures to catch anyplace that was accessing them directly; and clean up various places so caught. In itself this patch doesn't change any behavior at all, but it is necessary infrastructure if we hope to play any games with the representation of varlena headers. Greg Stark and Tom Lane	2007-02-27 23:48:10 +00:00
Bruce Momjian	6f519ad01c	btree source code cleanups: I refactored findsplitloc and checksplitloc so that the division of labor is more clear IMO. I pushed all the space calculation inside the loop to checksplitloc. I also fixed the off by 4 in free space calculation caused by PageGetFreeSpace subtracting sizeof(ItemIdData), even though it was harmless, because it was distracting and I felt it might come back to bite us in the future if we change the page layout or alignments. There's now a new function PageGetExactFreeSpace that doesn't do the subtraction. findsplitloc now tries the "just the new item to right page" split as well. If people don't like the refactoring, I can write a patch to just add that. Heikki Linnakangas	2007-02-21 20:02:17 +00:00
Alvaro Herrera	1820650934	Restructure autovacuum in two processes: a dummy process, which runs continuously, and requests vacuum runs of "autovacuum workers" to postmaster. The workers do the actual vacuum work. This allows for future improvements, like allowing multiple autovacuum jobs running in parallel. For now, the code keeps the original behavior of having a single autovac process at any time by sleeping until the previous worker has finished.	2007-02-15 23:23:23 +00:00
Bruce Momjian	a9eb53969a	Move fsync method macro defines into /include/access/xlogdefs.h so they can be used by src/tools/fsync/test_fsync.c.	2007-02-14 05:00:40 +00:00
Tom Lane	caf2b64a75	Disallow committing a prepared transaction unless we are in the same database it was executed in. Someday it might be nice to allow cross-DB commits, but work would be needed in NOTIFY and perhaps other places. Per Heikki.	2007-02-13 19:39:42 +00:00
Peter Eisentraut	c138b966d4	Replace useless uses of := by = in makefiles.	2007-02-09 15:56:00 +00:00
Tom Lane	c398300330	Combine cmin and cmax fields of HeapTupleHeaders into a single field, by keeping private state in each backend that has inserted and deleted the same tuple during its current top-level transaction. This is sufficient since there is no need to be able to determine the cmin/cmax from any other transaction. This gets us back down to 23-byte headers, removing a penalty paid in 8.0 to support subtransactions. Patch by Heikki Linnakangas, with minor revisions by moi, following a design hashed out awhile back on the pghackers list.	2007-02-09 03:35:35 +00:00
Alvaro Herrera	f8ebab901b	Fix reference-after-free in the new btree page split code, as reported by the buildfarm via Stefan Kaltenbrunner. Patch from Heikki Linnakangas.	2007-02-08 13:52:55 +00:00
Peter Eisentraut	086c189456	Normalize fgets() calls to use sizeof() for calculating the buffer size where possible, and fix some sites that apparently thought that fgets() will overwrite the buffer by one byte. Also add some strlcpy() to eliminate some weird memory handling.	2007-02-08 11:10:27 +00:00
Bruce Momjian	b79575ce45	Reduce WAL activity for page splits: > Currently, an index split writes all the data on the split page to > WAL. That's a lot of WAL traffic. The tuples that are copied to the > right page need to be WAL logged, but the tuples that stay on the > original page don't. Heikki Linnakangas	2007-02-08 05:05:53 +00:00
Tom Lane	aec4cf1c8c	Add a function pg_stat_clear_snapshot() that discards any statistics snapshot already collected in the current transaction; this allows plpgsql functions to watch for stats updates even though they are confined to a single transaction. Use this instead of the previous kluge involving pg_stat_file() to wait for the stats collector to update in the stats regression test. Internally, decouple storage of stats snapshots from transaction boundaries; they'll now stick around until someone calls pgstat_clear_snapshot --- which xact.c still does at transaction end, to maintain the previous behavior. This makes the logic a lot cleaner, at the price of a couple dozen cycles per transaction exit.	2007-02-07 23:11:30 +00:00
Tom Lane	78d1216160	Remove the xlog-centric "database system is ready" message and replace it with "database system is ready to accept connections", which is issued by the postmaster when it really is ready to accept connections. Per proposal from Markus Schiltknecht and subsequent discussion.	2007-02-07 16:44:48 +00:00
Tom Lane	c76ed81513	Remove some dead code, per Heikki.	2007-02-06 14:55:11 +00:00
Tom Lane	23c4978e6c	Rename MaxTupleSize to MaxHeapTupleSize to clarify that it's not meant to describe the maximum size of index tuples (which is typically AM-dependent anyway); and consequently remove the bogus deduction for "special space" that was built into it. Adjust TOAST_TUPLE_THRESHOLD and TOAST_MAX_CHUNK_SIZE to avoid wasting two bytes per toast chunk, and to ensure that the calculation correctly tracks any future changes in page header size. The computation had been inaccurate in a way that didn't cause any harm except space wastage, but future changes could have broken it more drastically. Fix the calculation of BTMaxItemSize, which was formerly computed as 1 byte more than it could safely be. This didn't cause any harm in practice because it's only compared against maxalign'd lengths, but future changes in the size of page headers or btree special space could have exposed the problem. initdb forced because of change in TOAST_MAX_CHUNK_SIZE, which alters the storage of toast tables.	2007-02-05 04:22:18 +00:00
Tom Lane	a2e092e1c7	Don't MAXALIGN in the checks to decide whether a tuple is over TOAST's threshold for tuple length. On 4-byte-MAXALIGN machines, the toast code creates tuples that have t_len exactly TOAST_TUPLE_THRESHOLD ... but this number is not itself maxaligned, so if heap_insert maxaligns t_len before comparing to TOAST_TUPLE_THRESHOLD, it'll uselessly recurse back to tuptoaster.c, wasting cycles. (It turns out that this does not happen on 8-byte-MAXALIGN machines, because for them the outer MAXALIGN in the TOAST_MAX_CHUNK_SIZE macro reduces TOAST_MAX_CHUNK_SIZE so that toast tuples will be less than TOAST_TUPLE_THRESHOLD in size. That MAXALIGN is really incorrect, but we can't remove it now, see below.) There isn't any particular value in maxaligning before comparing to the thresholds, so just don't do that, which saves a small number of cycles in itself. These numbers should be rejiggered to minimize wasted space on toast-relation pages, but we can't do that in the back branches because changing TOAST_MAX_CHUNK_SIZE would force an initdb (by changing the contents of toast tables). We can move the toast decision thresholds a bit, though, which is what this patch effectively does. Thanks to Pavan Deolasee for discovering the unintended recursion. Back-patch into 8.2, but not further, pending more testing. (HEAD is about to get a further patch modifying the thresholds, so it won't help much for testing this form of the patch.)	2007-02-04 20:00:37 +00:00
Bruce Momjian	8b4ff8b6a1	Wording cleanup for error messages. Also change can't -> cannot. Standard English uses "may", "can", and "might" in different ways: may - permission, "You may borrow my rake." can - ability, "I can lift that log." might - possibility, "It might rain today." Unfortunately, in conversational English, their use is often mixed, as in, "You may use this variable to do X", when in fact, "can" is a better choice. Similarly, "It may crash" is better stated, "It might crash".	2007-02-01 19:10:30 +00:00
Neil Conway	dbcaee49b5	Fix a few typos in comments in GIN.	2007-02-01 04:16:08 +00:00
Teodor Sigaev	d4c6da1527	Allow GIN's extractQuery method to signal that nothing can satisfy the query. In this case extractQuery should return -1 as nentries. This changes the prototype of the extractQuery method to use int32* instead of uint32* for the nentries argument. Based on that, gincostestimate may see two corner cases: nothing will be found or seqscan should be used. Per proposal at http://archives.postgresql.org/pgsql-hackers/2007-01/msg01581.php PS: the tsearch_core patch should be slightly modified to support the changes, but I'm waiting for a verdict on the review of the tsearch_core patch.	2007-01-31 15:09:45 +00:00
Tom Lane	a635c08fa1	Add support for cross-type hashing in hash index searches and hash joins. Hashing for aggregation purposes still needs work, so it's not time to mark any cross-type operators as hashable for general use, but these cases work if the operators are so marked by hand in the system catalogs.	2007-01-30 01:33:36 +00:00
Tom Lane	e8cd6f14a2	Add comment noting that hashm_procid in a hash index's metapage isn't actually used for anything.	2007-01-29 23:22:59 +00:00
Tom Lane	6cefacd7c8	Correct an old logic error in btree page splitting: when considering a split exactly at the point where we need to insert a new item, the calculation used the wrong size for the "high key" of the new left page. This could lead to choosing an unworkable split, resulting in "PANIC: failed to add item to the left sibling" (or "right sibling") failure. Although this bug has been there a long time, it's very difficult to trigger a failure before 8.2, since there was generally a lot of free space on both sides of a chosen split. In 8.2, where the user-selected fill factor determines how much free space the code tries to leave, an unworkable split is much more likely. Report by Joe Conway, diagnosis and fix by Heikki Linnakangas.	2007-01-27 20:53:30 +00:00
Bruce Momjian	ef65f6f7a4	Prevent WAL logging when COPY is done in the same transaction that created the table. Simon Riggs	2007-01-25 02:17:26 +00:00
Neil Conway	2b7334d487	Refactor the index AM API slightly: move currentItemData and currentMarkData from IndexScanDesc to the opaque structs for the AMs that need this information (currently gist and hash). Patch from Heikki Linnakangas, fixes by Neil Conway.	2007-01-20 18:43:35 +00:00
Peter Eisentraut	2cc01004c6	Remove remains of old depend target.	2007-01-20 17:16:17 +00:00
Alvaro Herrera	eb63cc3da8	Arrange for autovacuum to be killed when another operation, like DROP DATABASE, wants to be alone accessing the database. This allows the regression tests to pass with autovacuum enabled, which opens the gates for finally enabling autovacuum by default.	2007-01-16 13:28:57 +00:00
Tom Lane	d83235415b	Add some notes about the basic mathematical laws that the system presumes hold true for operators in a btree operator family. This is mostly to clarify my own thinking about what the planner can assume for optimization purposes. (blowing dust off an old abstract-algebra textbook...)	2007-01-12 17:04:54 +00:00
Bruce Momjian	40f797be03	Enable another five tuple status bits by using the high bits of the nattr field, and rename the field. Heikki Linnakangas	2007-01-09 22:01:00 +00:00
Tom Lane	69db009163	Add a citation to Seltzer and Yigit's Usenix '91 paper about hash table management. The paper clearly describes many of the ideas embodied in our current hashing code, but as far as I could find out there is not a direct code heritage. (Mike Olsen recalls discussion of this paper at Postgres meetings but believes it "informed the Postgres implementation probably just at the design level". Margo herself says she wasn't involved with Postgres' hash code.) Credit where credit is due 'n all that, even if fifteen years after the fact.	2007-01-09 07:30:49 +00:00
Tom Lane	4431758229	Support ORDER BY ... NULLS FIRST/LAST, and add ASC/DESC/NULLS FIRST/NULLS LAST per-column options for btree indexes. The planner's support for this is still pretty rudimentary; it does not yet know how to plan mergejoins with nondefault ordering options. The documentation is pretty rudimentary, too. I'll work on improving that stuff later. Note incompatible change from prior behavior: ORDER BY ... USING will now be rejected if the operator is not a less-than or greater-than member of some btree opclass. This prevents less-than-sane behavior if an operator that doesn't actually define a proper sort ordering is selected.	2007-01-09 02:14:16 +00:00
Bruce Momjian	29dccf5fe0	Update CVS HEAD for 2007 copyright. Back branches are typically not back-stamped for this.	2007-01-05 22:20:05 +00:00
Tom Lane	7c8927bf08	Fix some small typos in comments. Greg Stark	2007-01-04 16:29:42 +00:00
Tom Lane	ef07221997	Clean up smgr.c/md.c APIs as per discussion a couple months ago. Instead of having md.c return a success/failure boolean to smgr.c, which was just going to elog anyway, let md.c issue the elog messages itself. This allows better error reporting, particularly in cases such as "short read" or "short write" which Peter was complaining of. Also, remove the kluge of allowing mdread() to return zeroes from a read-beyond-EOF: this is now an error condition except when InRecovery or zero_damaged_pages = true. (Hash indexes used to require that behavior, but no more.) Also, enforce that mdwrite() is to be used for rewriting existing blocks while mdextend() is to be used for extending the relation EOF. This restriction lets us get rid of the old ad-hoc defense against creating huge files by an accidental reference to a bogus block number: we'll only create new segments in mdextend() not mdwrite() or mdread(). (Again, when InRecovery we allow it anyway, since we need to allow updates of blocks that were later truncated away.) Also, clean up the original makeshift patch for bug #2737: move the responsibility for padding relation segments to full length into md.c.	2007-01-03 18:11:01 +00:00
Tom Lane	5725b9d9af	Support type modifiers for user-defined types, and pull most knowledge about typmod representation for standard types out into type-specific typmod I/O functions. Teodor Sigaev, with some editorialization by Tom Lane.	2006-12-30 21:21:56 +00:00
Tom Lane	9aefd56669	Fix up btree's initial scankey processing to be able to detect redundant or contradictory keys even in cross-data-type scenarios. This is another benefit of the opfamily rewrite: we can find the needed comparison operators now.	2006-12-28 23:16:39 +00:00
Tom Lane	a78fcfb512	Restructure operator classes to allow improved handling of cross-data-type cases. Operator classes now exist within "operator families". While most families are equivalent to a single class, related classes can be grouped into one family to represent the fact that they are semantically compatible. Cross-type operators are now naturally adjunct parts of a family, without having to wedge them into a particular opclass as we had done originally. This commit restructures the catalogs and cleans up enough of the fallout so that everything still works at least as well as before, but most of the work needed to actually improve the planner's behavior will come later. Also, there are not yet CREATE/DROP/ALTER OPERATOR FAMILY commands; the only way to create a new family right now is to allow CREATE OPERATOR CLASS to make one by default. I owe some more documentation work, too. But that can all be done in smaller pieces once this infrastructure is in place.	2006-12-23 00:43:13 +00:00
Tom Lane	0cb91ccba9	Remove the logId/logSeg fields from pg_control, because they are not needed in normal operation, and we can avoid rewriting pg_control at every log segment switch if we don't insist that these values be valid. Reducing the number of pg_control updates is a good idea for both performance and reliability. It does make pg_resetxlog's life a bit harder, but that seems a good tradeoff; and anyway the change to pg_resetxlog amounts to automating something people formerly needed to do by hand, namely look at the existing pg_xlog files to make sure the new WAL start point was past them. In passing, change the wording of xlog.c's "database system was interrupted" messages: describe the pg_control timestamp as "last known up at" rather than implying it is the exact time of service interruption. With this change the timestamp will generally be the time of the last checkpoint, which could be many minutes before the failure; and we've already seen indications that people tend to misinterpret the old wording. initdb forced due to change in pg_control layout. Simon Riggs and Tom Lane	2006-12-08 19:50:53 +00:00
Neil Conway	886a02d1cb	Add a txn_start column to pg_stat_activity. This makes it easier to identify long-running transactions. Since we already need to record the transaction-start time (e.g. for now()), we don't need any additional system calls to report this information. Catversion bumped, initdb required.	2006-12-06 18:06:48 +00:00
Tom Lane	5f60086e10	Minor adjustments to make failures in startup/shutdown behave more cleanly. StartupXLOG and ShutdownXLOG no longer need to be critical sections, because in all contexts where they are invoked, elog(ERROR) would be translated to elog(FATAL) anyway. (One change in bgwriter.c is needed to make this true: set ExitOnAnyError before trying to exit. This is a good fix anyway since the existing code would have gone into an infinite loop on elog(ERROR) during shutdown.) That avoids a misleading report of PANIC during semi-orderly failures. Modify the postmaster to include the startup process in the set of processes that get SIGTERM when a fast shutdown is requested, and also fix it to not try to restart the bgwriter if the bgwriter fails while trying to write the shutdown checkpoint. Net result is that "pg_ctl stop -m fast" does something reasonable for a system in warm standby mode, and so should Unix system shutdown (ie, universal SIGTERM). Per gripe from Stephen Harris and some corner-case testing of my own.	2006-11-30 18:29:12 +00:00
Teodor Sigaev	ef148d6b85	Fix bug with page deletion. If an inner page is removed and it tries to remove a page on the next level linked from the next inner page, ginScanToDelete() wrongly sets the parent page. The bug appears when many item pointers (several hundred thousand) have been deleted from the index. Bug discovered by hubert depesz lubaczewski <depesz@gmail.com>. I suppose we need rc2 before release...	2006-11-30 16:22:32 +00:00
Neil Conway	546d6848ca	Add a comment noting that heap_copytuple_with_tuple() results in a HeapTuple that is no longer allocated as a single palloc() block; if used carelessly, this might result in a subsequent memory leak after heap_freetuple().	2006-11-23 05:27:18 +00:00
Tom Lane	395249ecbe	Several changes to reduce the probability of running out of memory during AbortTransaction, which would lead to recursion and eventual PANIC exit as illustrated in recent report from Jeff Davis. First, in xact.c create a special dedicated memory context for AbortTransaction to run in. This solves the problem as long as AbortTransaction doesn't need more than 32K (or whatever other size we create the context with). But in corner cases it might. Second, in trigger.c arrange to keep pending after-trigger event records in separate contexts that can be freed near the beginning of AbortTransaction, rather than having them persist until CleanupTransaction as before. Third, in portalmem.c arrange to free executor state data earlier as well. These two changes should result in backing off the out-of-memory condition before AbortTransaction needs any significant amount of memory, at least in typical cases such as memory overrun due to too many trigger events or too big an executor hash table. And all the same for subtransaction abort too, of course.	2006-11-23 01:14:59 +00:00
Tom Lane	3ad0728c81	On systems that have setsid(2) (which should be just about everything except Windows), arrange for each postmaster child process to be its own process group leader, and deliver signals SIGINT, SIGTERM, SIGQUIT to the whole process group not only the direct child process. This provides saner behavior for archive and recovery scripts; in particular, it's possible to shut down a warm-standby recovery server using "pg_ctl stop -m immediate", since delivery of SIGQUIT to the startup subprocess will result in killing the waiting recovery_command. Also, this makes Query Cancel and statement_timeout apply to scripts being run from backends via system(). (There is no support in the core backend for that, but it's widely done using untrusted PLs.) Per gripe from Stephen Harris and subsequent discussion.	2006-11-21 20:59:53 +00:00
Tom Lane	d68efb3f8d	Repair problems with hash indexes that span multiple segments: the hash code's preference for filling pages out-of-order tends to confuse the sanity checks in md.c, as per report from Balazs Nagy in bug #2737. The fix is to ensure that the smgr-level code always has the same idea of the logical EOF as the hash index code does, by using ReadBuffer(P_NEW) where we are adding a single page to the end of the index, and using smgrextend() to reserve a large batch of pages when creating a new splitpoint. The patch is a bit ugly because it avoids making any changes in md.c, which seems the most prudent approach for a backpatchable beta-period fix. After 8.3 development opens, I'll take a look at a cleaner but more invasive patch, in particular getting rid of the now unnecessary hack to allow reading beyond EOF in mdread(). Backpatch as far as 7.4. The bug likely exists in 7.3 as well, but because of the magnitude of the 7.3-to-7.4 changes in hash, the later-version patch doesn't even begin to apply. Given the other known bugs in the 7.3-era hash code, it does not seem worth trying to develop a separate patch for 7.3.	2006-11-19 21:33:23 +00:00
Tom Lane	4f335a3d7f	Repair two related errors in heap_lock_tuple: it was failing to recognize cases where we already hold the desired lock "indirectly", either via membership in a MultiXact or because the lock was originally taken by a different subtransaction of the current transaction. These cases must be accounted for to avoid needless deadlocks and/or inappropriate replacement of an exclusive lock with a shared lock. Per report from Clarence Gardner and subsequent investigation.	2006-11-17 18:00:15 +00:00
Peter Eisentraut	e138b80996	String fix	2006-11-16 14:28:41 +00:00
Neil Conway	dc10387eb1	Fix some typos in comments.	2006-11-12 06:55:54 +00:00
Tom Lane	a46ca619f8	Suppress a few 'uninitialized variable' warnings that gcc emits only at -O3 or higher (presumably because it inlines more things). Per gripe from Mark Mielke.	2006-11-11 01:14:19 +00:00
Tom Lane	792d6edd5b	Clean up some misleading references to %p being a full path, per Simon.	2006-11-10 22:32:20 +00:00
Tom Lane	dcbdf9b1d4	Change Windows rename and unlink substitutes so that they time out after 30 seconds instead of retrying forever. Also modify xlog.c so that if it fails to rename an old xlog segment up to a future slot, it will unlink the segment instead. Per discussion of bug #2712, in which it became apparent that Windows can handle unlinking a file that's being held open, but not renaming it.	2006-11-08 20:12:05 +00:00
Tom Lane	48188e1621	Fix recently-understood problems with handling of XID freezing, particularly in PITR scenarios. We now WAL-log the replacement of old XIDs with FrozenTransactionId, so that such replacement is guaranteed to propagate to PITR slave databases. Also, rather than relying on hint-bit updates to be preserved, pg_clog is not truncated until all instances of an XID are known to have been replaced by FrozenTransactionId. Add new GUC variables and pg_autovacuum columns to allow management of the freezing policy, so that users can trade off the size of pg_clog against the amount of freezing work done. Revise the already-existing code that forces autovacuum of tables approaching the wraparound point to make it more bulletproof; also, revise the autovacuum logic so that anti-wraparound vacuuming is done per-table rather than per-database. initdb forced because of changes in pg_class, pg_database, and pg_autovacuum catalogs. Heikki Linnakangas, Simon Riggs, and Tom Lane.	2006-11-05 22:42:10 +00:00
Tom Lane	70ce5c9082	Fix "failed to re-find parent key" btree VACUUM failure by revising page deletion code to avoid the case where an upper-level btree page remains "half dead" for a significant period of time, and to block insertions into a key range that is in process of being re-assigned to the right sibling of the deleted page's parent. This prevents the scenario reported by Ed L. wherein index keys could become out-of-order in the grandparent index level. Since this is a moderately invasive fix, I'm applying it only to HEAD. The bug exists back to 7.4, but the back branches will get a different patch.	2006-11-01 19:43:17 +00:00
Tom Lane	1e758d5263	Add some code to CREATE DATABASE to check for pre-existing subdirectories that conflict with the OID that we want to use for the new database. This avoids the risk of trying to remove files that maybe we shouldn't remove. Per gripe from Jon Lapham and subsequent discussion of 27-Sep.	2006-10-18 22:44:12 +00:00
Peter Eisentraut	b9b4f10b5b	Message style improvements	2006-10-06 17:14:01 +00:00
Tom Lane	378c79dc78	Cleanup for pglz_compress code: remove dead code, const-ify API of remaining functions, simplify pglz_compress's API to not require a useless data copy when compression fails. Also add a check in pglz_decompress that the expected amount of data was decompressed.	2006-10-05 23:33:33 +00:00
Tom Lane	e378f82e00	Make use of qsort_arg in several places that were formerly using klugy static variables. This avoids any risk of potential non-reentrancy, and in particular offers a much cleaner workaround for the Intel compiler bug that was affecting ginutil.c.	2006-10-05 17:57:40 +00:00
Bruce Momjian	f99a569a2e	pgindent run for 8.2.	2006-10-04 00:30:14 +00:00
Bruce Momjian	45c8ed96b9	Make some sentences consistent with similar ones. Euler Taveira de Oliveira	2006-10-03 21:21:36 +00:00
Alvaro Herrera	4650c4fdb9	Degrade the transaction-id wraparound point message from LOG to DEBUG1, per discussion. Patch from Simon Riggs.	2006-09-26 17:21:39 +00:00
Tom Lane	9e936693a9	Fix free space map to correctly track the total amount of FSM space needed even when a single relation requires more than max_fsm_pages pages. Also, make VACUUM emit a warning in this case, since it likely means that VACUUM FULL or other drastic corrective measure is needed. Per reports from Jeff Frost and others of unexpected changes in the claimed max_fsm_pages need.	2006-09-21 20:31:22 +00:00
Teodor Sigaev	deb66e013c	Improve error message. Per discussion http://archives.postgresql.org/pgsql-general/2006-09/msg00186.php	2006-09-14 11:26:49 +00:00
Bruce Momjian	d18768867e	Remove unnecessary brace pair.	2006-09-10 23:33:22 +00:00
Tom Lane	f5b4d9a9e0	If we're going to advertise the array overlap/containment operators, we probably should make them work reliably for all arrays. Fix code to handle NULLs and multidimensional arrays, move it into arrayfuncs.c. GIN is still restricted to indexing arrays with no null elements, however.	2006-09-10 20:14:20 +00:00
Tom Lane	ba920e1c91	Rename contains/contained-by operators to @> and <@, per discussion that agreed these symbols are less easily confused. I made new pg_operator entries (with new OIDs) for the old names, so as to provide backward compatibility while making it pretty easy to remove the old names in some future release cycle. This commit only touches the core datatypes, contrib will be fixed separately.	2006-09-10 00:29:35 +00:00
Teodor Sigaev	889ec4b998	Fix Intel compiler bug. Per discussion 'GIN FailedAssertions on Itanium2 with Intel compiler' in pgsql-hackers, http://archives.postgresql.org/pgsql-hackers/2006-08/msg01914.php	2006-09-05 18:25:10 +00:00
Tom Lane	8fad2e3ff4	Arrange for GetSnapshotData to copy live-subtransaction XIDs from the PGPROC array into snapshots, and use this information to avoid visits to pg_subtrans in HeapTupleSatisfiesSnapshot. This appears to solve the pg_subtrans-related context swap storm problem that's been reported by several people for 8.1. While at it, modify GetSnapshotData to not take an exclusive lock on ProcArrayLock, as closer analysis shows that shared lock is always sufficient. Itagaki Takahiro and Tom Lane	2006-09-03 15:59:39 +00:00
Teodor Sigaev	b681bfdd59	Fix BUG #2594 : Gin Indexes cause server to crash when it builds on empty table	2006-08-29 14:05:44 +00:00
Tom Lane	ca1fd0ea5b	Move xact.c's partial support for Lists of TransactionIds into pg_list.h. Needed because lock.c is now going to use the same type of list.	2006-08-27 19:11:46 +00:00
Tom Lane	e093dcdd28	Add the ability to create indexes 'concurrently', that is, without blocking concurrent writes to the table. Greg Stark, with a little help from Tom Lane.	2006-08-25 04:06:58 +00:00
Tom Lane	08ae5edc5c	Optimize the case where a btree indexscan has current and mark positions on the same index page; we can avoid data copying as well as buffer refcount manipulations in this common case. Makes for a small but noticeable improvement in mergejoin speed. Heikki Linnakangas	2006-08-24 01:18:34 +00:00
Tom Lane	35af5422f6	Make the server track an 'XID epoch', that is, maintain higher-order bits of the transaction ID counter. Nothing is done with the epoch except to store it in checkpoint records, but this provides a foundation with which add-on code can pretend that XIDs never wrap around. This is a severely trimmed and rewritten version of the xxid patch submitted by Marko Kreen. Per discussion, the epoch counter seems the only part of xxid that really needs to be in the core server.	2006-08-21 16:16:31 +00:00
Tom Lane	7aa772f03e	Now that we've rearranged relation open to get a lock before touching the rel, it's easy to get rid of the narrow race-condition window that used to exist in VACUUM and CLUSTER. Did some minor code-beautification work in the same area, too.	2006-08-18 16:09:13 +00:00
Tom Lane	e8ea9e9587	Implement archive_timeout feature to force xlog file switches to occur no more than N seconds apart. This allows a simple, if not very high performance, means of guaranteeing that a PITR archive is no more than N seconds behind real time. Also make pg_current_xlog_location return the WAL Write pointer, add pg_current_xlog_insert_location to return the Insert pointer, and fix pg_xlogfile_name_offset to return its results as a two-element record instead of a smashed-together string, as per recent discussion. Simon Riggs	2006-08-17 23:04:10 +00:00
Tom Lane	7a3e30e608	Add INSERT/UPDATE/DELETE RETURNING, with basic docs and regression tests. plpgsql support to come later. Along the way, convert execMain's SELECT INTO support into a DestReceiver, in order to eliminate some ugly special cases. Jonah Harris and Tom Lane	2006-08-12 02:52:06 +00:00
Tom Lane	e002836913	Make recovery from WAL be restartable, by executing a checkpoint-like operation every so often. This improves the usefulness of PITR log shipping for hot standby: formerly, if the standby server crashed, it was necessary to restart it from the last base backup and replay all the WAL since then. Now it will only need to reread about the same amount of WAL as the master server would. The behavior might also come in handy during a long PITR replay sequence. Simon Riggs, with some editorialization by Tom Lane.	2006-08-07 16:57:57 +00:00
Tom Lane	704ddaaa09	Add support for forcing a switch to a new xlog file; cause such a switch to happen automatically during pg_stop_backup(). Add some functions for interrogating the current xlog insertion point and for easily extracting WAL filenames from the hex WAL locations displayed by pg_stop_backup and friends. Simon Riggs with some editorialization by Tom Lane.	2006-08-06 03:53:44 +00:00
Tom Lane	bc8ac3ce40	Add missing pgstat_count_index_scan(), per Andreas Seltenreich.	2006-08-03 15:22:09 +00:00
Tom Lane	09d3670df3	Change the relation_open protocol so that we obtain lock on a relation (table or index) before trying to open its relcache entry. This fixes race conditions in which someone else commits a change to the relation's catalog entries while we are in process of doing relcache load. Problems of that ilk have been reported sporadically for years, but it was not really practical to fix until recently --- for instance, the recent addition of WAL-log support for in-place updates helped. Along the way, remove pg_am.amconcurrent: all AMs are now expected to support concurrent update.	2006-07-31 20:09:10 +00:00
Alvaro Herrera	92c2ecc130	Modify snapshot definition so that lazy vacuums are ignored by other vacuums. This allows an OLTP-like system with big tables to continue regular vacuuming on small-but-frequently-updated tables while the big tables are being vacuumed. Original patch from Hannu Krossing, rewritten by Tom Lane and updated by me.	2006-07-30 02:07:18 +00:00
Tom Lane	e6284649b9	Modify btree to delete known-dead index entries without an actual VACUUM. When we are about to split an index page to do an insertion, first look to see if any entries marked LP_DELETE exist on the page, and if so remove them to try to make enough space for the desired insert. This should reduce index bloat in heavily-updated tables, although of course you still need VACUUM eventually to clean up the heap. Junji Teramoto	2006-07-25 19:13:00 +00:00
Peter Eisentraut	e9b4969062	DTrace support, with a small initial set of probes by Robert Lor	2006-07-24 16:32:45 +00:00
Tom Lane	9dc842f083	Don't try to truncate multixact SLRU files in checkpoints done during xlog recovery. In the first place, it doesn't work because slru's latest_page_number isn't set up yet (this is why we've been hearing reports of strange "apparent wraparound" log messages during crash recovery, but only from people who'd managed to advance their next-mxact counters some considerable distance from 0). In the second place, it seems a bit unwise to be throwing away data during crash recovery anyway. This latter consideration convinces me to just disable truncation during recovery, rather than computing latest_page_number and pushing ahead.	2006-07-20 00:46:42 +00:00
Tom Lane	c36418be40	Fix getDatumCopy(): don't use store_att_byval to copy into a Datum variable (this accounts for regression failures on PPC64, and in fact won't work on any big-endian machine). Get rid of hardwired knowledge about datum size rules; make it look just like datumCopy().	2006-07-16 00:54:22 +00:00
Tom Lane	e040ab44e4	Improve error message wording.	2006-07-16 00:52:05 +00:00
Tom Lane	98bac16e4d	Fix misguided removal of access/tuptoaster.h inclusion, per Kris Jurka. I'm going to insist on reversion of this entire patch unless pgrminclude is upgraded to a less broken state, but in the meantime let's get contrib passing regression again.	2006-07-14 19:05:52 +00:00
Bruce Momjian	e0522505bd	Remove 576 references of include files that were not needed.	2006-07-14 14:52:27 +00:00
Bruce Momjian	a22d76d96a	Allow include files to compile on their own. Strip unused include files out of include files, and add needed includes to C files. The next step is to remove unused include files in C files.	2006-07-13 16:49:20 +00:00
Tom Lane	d29b66882a	Tweak fillfactor code as per my recent proposal. Fix nbtsort.c so that it can handle small fillfactors for ordinary-sized index entries without failing on large ones; fix nbtinsert.c to distinguish leaf and nonleaf pages; change the minimum fillfactor to 10% for all index types.	2006-07-11 21:05:57 +00:00
Teodor Sigaev	001d30ee6b	Add support to GIN for =(anyarray,anyarray) operation	2006-07-11 19:49:14 +00:00
Bruce Momjian	ac230e7431	Alphabetically order reference to include files, "S"-"Z".	2006-07-11 18:26:11 +00:00
Bruce Momjian	0ff3461bcc	Alphabetically order reference to include files, "N" - "S".	2006-07-11 17:26:59 +00:00
Bruce Momjian	3a534ade39	Alphabetically order reference to include files, "G" - "M".	2006-07-11 17:04:13 +00:00
Teodor Sigaev	234163649e	GIN improvements - Replace the sorted array of entries in maintenance_work_mem with a binary tree; this should improve create performance. - Calculate allocated memory more precisely, eliminate leaks with user-defined extractValue() - Improve wording in tsearch2	2006-07-11 16:55:34 +00:00
Alvaro Herrera	d4cef0aa2a	Improve vacuum code to track minimum Xids per table instead of per database. To this end, add a couple of columns to pg_class, relminxid and relvacuumxid, based on which we calculate the pg_database columns after each vacuum. We now force all databases to be vacuumed, even template ones. A backend noticing too old a database (meaning pg_database.datminxid is in danger of falling behind Xid wraparound) will signal the postmaster, which in turn will start an autovacuum iteration to process the offending database. In principle this is only there to cope with frozen (non-connectable) databases without forcing users to set them to connectable, but it could force regular user database to go through a database-wide vacuum at any time. Maybe we should warn users about this somehow. Of course the real solution will be to use autovacuum all the time ;-) There are some additional improvements we could have in this area: for example the vacuum code could be smarter about not updating pg_database for each table when called by autovacuum, and do it only once the whole autovacuum iteration is done. I updated the system catalogs documentation, but I didn't modify the maintenance section. Also having some regression tests for this would be nice but it's not really a very straightforward thing to do. Catalog version bumped due to system catalog changes.	2006-07-10 16:20:52 +00:00
Tom Lane	b7b78d24f7	Code review for FILLFACTOR patch. Change WITH grammar as per earlier discussion (including making def_arg allow reserved words), add missed opt_definition for UNIQUE case. Put the reloptions support code in a less random place (I chose to make a new file access/common/reloptions.c). Eliminate header inclusion creep. Make the index options functions safely user-callable (seems like client apps might like to be able to test validity of options before trying to make an index). Reduce overhead for normal case with no options by allowing rd_options to be NULL. Fix some unmaintainably klugy code, including getting rid of Natts_pg_class_fixed at long last. Some stylistic cleanup too, and pay attention to keeping comments in sync with code. Documentation still needs work, though I did fix the omissions in catalogs.sgml and indexam.sgml.	2006-07-03 22:45:41 +00:00
Bruce Momjian	277807bd9e	Add FILLFACTOR to CREATE INDEX. ITAGAKI Takahiro	2006-07-02 02:23:23 +00:00
Teodor Sigaev	783a73168b	Forgot to add new file :((	2006-06-28 12:08:35 +00:00
Teodor Sigaev	1f7ef548ec	Changes * new split algorithm (as proposed in http://archives.postgresql.org/pgsql-hackers/2006-06/msg00254.php) * possible to call pickSplit() for second and subsequent columns * add spl_(l|r)datum_exists to GIST_SPLITVEC - pickSplit should check its values to use already defined spl_(l|r)datum for splitting. pickSplit should set spl_(l|r)datum_exists to 'false' (if they were 'true') to signal to the caller that it used spl_(l|r)datum. * support for old pickSplit(): not very optimal but correct split * remove 'bytes' field from GISTENTRY: in any case the size of a value is defined by its type. * split GIST_SPLITVEC into two structures: one for use in picksplit and a second for internal use. * some code refactoring * support of subsplit in rtree opclasses TODO: add support of subsplit to contrib modules	2006-06-28 12:00:14 +00:00
Tom Lane	3c71244b74	Put #ifdef NOT_USED around posix_fadvise call. We may want to resurrect this someday, but right now it seems that posix_fadvise is immature to the point of being broken on many platforms ... and we don't have any benchmark evidence proving it's worth spending time on.	2006-06-27 18:59:17 +00:00
Tom Lane	cdd5178c69	Extend the MinimalTuple concept to tuplesort.c, thereby reducing the per-tuple space overhead for sorts in memory. I chose to replace the previous patch that tried to write out the bare minimum amount of data when sorting on disk; instead, just dump the MinimalTuples as-is. This wastes 3 to 10 bytes per tuple depending on architecture and null-bitmap length, but the simplification in the writetup/readtup routines seems worth it.	2006-06-27 16:53:02 +00:00
Tom Lane	3f50ba27cf	Create infrastructure for 'MinimalTuple' representation of in-memory tuples with less header overhead than a regular HeapTuple, per my recent proposal. Teach TupleTableSlot code how to deal with these. As proof of concept, change tuplestore.c to store MinimalTuples instead of HeapTuples. Future patches will expand the concept to other places where it is useful.	2006-06-27 02:51:40 +00:00
Tom Lane	3a04f53e7f	pg_stop_backup was calling XLogArchiveNotify() twice for the newly created backup history file. Bug introduced by the 8.1 change to make pg_stop_backup delete older history files. Per report from Masao Fujii.	2006-06-22 20:42:57 +00:00
Tom Lane	27c3e3de09	Remove redundant gettimeofday() calls to the extent practical without changing semantics too much. statement_timestamp is now set immediately upon receipt of a client command message, and the various places that used to do their own gettimeofday() calls to mark command startup are referenced to that instead. I have also made stats_command_string use that same value for pg_stat_activity.query_start for both the command itself and its eventual replacement by <IDLE> or <idle in transaction>. There was some debate about that, but no argument that seemed convincing enough to justify an extra gettimeofday() call.	2006-06-20 22:52:00 +00:00
Tom Lane	1e8ae13640	Don't try to call posix_fadvise() unless <fcntl.h> supplies a declaration for it. Hopefully will fix core dump evidenced by some buildfarm members since fadvise patch went in. The actual definition of the function is not ABI-compatible with compiler's default assumption in the absence of any declaration, so it's clearly unsafe to try to call it without seeing a declaration.	2006-06-18 18:30:21 +00:00
Tom Lane	06e10abc0b	Fix problems with cached tuple descriptors disappearing while still in use by creating a reference-count mechanism, similar to what we did a long time ago for catcache entries. The back branches have an ugly solution involving lots of extra copies, but this way is more efficient. Reference counting is only applied to tupdescs that are actually in caches --- there seems no need to use it for tupdescs that are generated in the executor, since they'll go away during plan shutdown by virtue of being in the per-query memory context. Neil Conway and Tom Lane	2006-06-16 18:42:24 +00:00
Bruce Momjian	40bc06fa16	Test for POSIX_FADV_DONTNEED to use posix_fadvise().	2006-06-16 04:11:48 +00:00
Bruce Momjian	94a5c4a01b	Use posix_fadvise() to avoid kernel caching of WAL contents on WAL file close. ITAGAKI Takahiro	2006-06-15 19:15:00 +00:00
Teodor Sigaev	b32000eda4	Some improvement of page split in multicolumn GiST indexes. If the user picksplit on the n-th column generates equal left and right unions, picksplit is then called on the (n+1)-th column.	2006-05-29 12:50:06 +00:00
Teodor Sigaev	0a6fde5a26	Correct checking in findParents(). From Andreas Seltenreich <andreas+pg@gate450.dyndns.org>	2006-05-29 08:39:44 +00:00
Alvaro Herrera	3d58a1c168	Remove traces of otherwise unused RELKIND_SPECIAL symbol. Leave the psql bits in place though, so that it plays nicely with older servers. Per discussion.	2006-05-28 02:27:08 +00:00
Teodor Sigaev	5d1a066e64	Fix findParents() in case of multiple levels to find. By Andreas Seltenreich <andreas+pg@gate450.dyndns.org>	2006-05-26 08:01:17 +00:00
Teodor Sigaev	d2158b0281	* Add NULL support to GiST. * some refactoring and simplification of code in gistutil.c and gist.c * in some cases the user-defined picksplit method can now be called for a non-first column in the index, but there is room to do more here. * small fix of docs related to NULL support.	2006-05-24 11:01:39 +00:00
Teodor Sigaev	09518fbdf4	Call MarkBufferDirty() before XLogInsert() during completion of insert	2006-05-19 17:15:41 +00:00
Teodor Sigaev	420cbff881	Simplify gistSplit() and some refactoring related code.	2006-05-19 16:15:17 +00:00
Teodor Sigaev	5890790b4a	Rework completion of incomplete inserts. Now it writes WAL log during inserts.	2006-05-19 11:10:25 +00:00
Teodor Sigaev	8876e37d07	Reduce size of critical section during vacuum full; critical sections are no longer nested. All user-defined functions are now called outside critical sections. Small improvements in WAL protocol. TODO: improve XLOG replay	2006-05-17 16:34:59 +00:00
Tom Lane	3fdeb189e9	Clean up code associated with updating pg_class statistics columns (relpages/reltuples). To do this, create formal support in heapam.c for "overwrite" tuple updates (including xlog replay capability) and use that instead of the ad-hoc overwrites we'd been using in VACUUM and CREATE INDEX. Take the responsibility for updating stats during CREATE INDEX out of the individual index AMs, and do it where it belongs, in catalog/index.c. Aside from being more modular, this avoids having to update the same tuple twice in some paths through CREATE INDEX. It's probably not measurably faster, but for sure it's a lot cleaner than before.	2006-05-10 23:18:39 +00:00
Teodor Sigaev	10dd8df68e	Reduce size of critical section and remove calls of user-defined functions in insertion and deletion; modify gistSplit() to not use buffers. TODO: gistvacuumcleanup and XLOG	2006-05-10 09:19:54 +00:00
Tom Lane	5749f6ef0c	Rewrite btree vacuuming to fold the former bulkdelete and cleanup operations into a single mostly-physical-order scan of the index. This requires some ticklish interlocking considerations, but should create no material performance impact on normal index operations (at least given the already-committed changes to make scans work a page at a time). VACUUM itself should get significantly faster in any index that's degenerated to a very nonlinear page order. Also, we save one pass over the index entirely, except in the case where there were no deletions to do and so only one pass happened anyway. Original patch by Heikki Linnakangas, rework by Tom Lane.	2006-05-08 00:00:17 +00:00
Tom Lane	09cb5c0e7d	Rewrite btree index scans to work a page at a time in all cases (both btgettuple and btgetmulti). This eliminates the problem of "re-finding" the exact stopping point, since the stopping point is effectively always a page boundary, and index items are never moved across pre-existing page boundaries. A small penalty is that the keys_are_unique optimization is effectively disabled (and, therefore, is removed in this patch), causing us to apply _bt_checkkeys() to at least one more tuple than necessary when looking up a unique key. However, the advantages for non-unique cases seem great enough to accept this tradeoff. Aside from simplifying and (sometimes) speeding up the indexscan code, this will allow us to reimplement btbulkdelete as a largely sequential scan instead of index-order traversal, thereby significantly reducing the cost of VACUUM. Those changes will come in a separate patch. Original patch by Heikki Linnakangas, rework by Tom Lane.	2006-05-07 01:21:30 +00:00
Teodor Sigaev	2a58f3bff6	Fix typo noticed by Alvaro Herrera	2006-05-03 06:56:47 +00:00
Tom Lane	e57345975c	Clean up API for ambulkdelete/amvacuumcleanup as per today's discussion. This formulation requires every AM to provide amvacuumcleanup, unlike before, but it's surely a whole lot cleaner. Also, add an 'amstorage' column to pg_am so that we can get rid of hardwired knowledge in DefineOpClass().	2006-05-02 22:25:10 +00:00

1378 Commits