postgresql

mirror of https://git.postgresql.org/git/postgresql.git synced 2024-07-31 08:23:22 +02:00

Author	SHA1	Message	Date
Teodor Sigaev	54af876593	Replace ReadBuffer to ReadBufferWithStrategy in all vacuum-involved places to implement limited-size "ring" of buffers for VACUUM for GIN & GIST	2007-05-31 14:03:09 +00:00
Peter Eisentraut	71fb7b9014	Downgrade some low-level startup messages to DEBUG1.	2007-05-31 07:36:12 +00:00
Tom Lane	fa0e318f94	Fix overly-strict sanity check in BeginInternalSubTransaction that made it fail when used in a deferred trigger. Bug goes back to 8.0; no doubt the reason it hadn't been noticed is that we've been discouraging use of user-defined constraint triggers. Per report from Frank van Vugt.	2007-05-30 21:01:39 +00:00
Tom Lane	d526575f89	Make large sequential scans and VACUUMs work in a limited-size "ring" of buffers, rather than blowing out the whole shared-buffer arena. Aside from avoiding cache spoliation, this fixes the problem that VACUUM formerly tended to cause a WAL flush for every page it modified, because we had it hacked to use only a single buffer. Those flushes will now occur only once per ring-ful. The exact ring size, and the threshold for seqscans to switch into the ring usage pattern, remain under debate; but the infrastructure seems done. The key bit of infrastructure is a new optional BufferAccessStrategy object that can be passed to ReadBuffer operations; this replaces the former StrategyHintVacuum API. This patch also changes the buffer usage-count methodology a bit: we now advance usage_count when first pinning a buffer, rather than when last unpinning it. To preserve the behavior that a buffer's lifetime starts to decrease when it's released, the clock sweep code is modified to not decrement usage_count of pinned buffers. Work not done in this commit: teach GiST and GIN indexes to use the vacuum BufferAccessStrategy for vacuum-driven fetches. Original patch by Simon, reworked by Heikki and again by Tom.	2007-05-30 20:12:03 +00:00
Tom Lane	77947c51c0	Fix up pgstats counting of live and dead tuples to recognize that committed and aborted transactions have different effects; also teach it not to assume that prepared transactions are always committed. Along the way, simplify the pgstats API by tying counting directly to Relations; I cannot detect any redeeming social value in having stats pointers in HeapScanDesc and IndexScanDesc structures. And fix a few corner cases in which counts might be missed because the relation's pgstat_info pointer hadn't been set.	2007-05-27 03:50:39 +00:00
Tom Lane	a8d539f124	To support external compression of archived WAL data, add a flag bit to WAL records that shows whether it is safe to remove full-page images (ie, whether or not an on-line backup was in progress when the WAL entry was made). Also make provision for an XLOG_NOOP record type that can be used to fill in the extra space when decompressing the data for restore. This is the portion of Koichi Suzuki's "full page writes" patch that has to go into the core database. The remainder of that work is two external compression and decompression programs, which for the time being will undergo separate development on pgfoundry. Per discussion. Also, twiddle the handling of BTREE_SPLIT records to ensure it'll be possible to compress them (the previous coding caused essential info to be omitted). The other commonly-used record types seem OK already, with the possible exception of GIN and GIST WAL records, which I don't understand well enough to opine on.	2007-05-20 21:08:19 +00:00
Alvaro Herrera	3b0347b36e	Move the tuple freezing point in CLUSTER to a point further back in the past, to avoid losing useful Xid information in not-so-old tuples. This makes CLUSTER behave the same as VACUUM as far a tuple-freezing behavior goes (though CLUSTER does not yet advance the table's relfrozenxid). While at it, move the actual freezing operation in rewriteheap.c to a more appropriate place, and document it thoroughly. This part of the patch from Tom Lane.	2007-05-17 15:28:29 +00:00
Alvaro Herrera	dfed0012bc	Have the rewriteheap code freeze old tuples. This is safe because it is only applied to live tuples older than a recent Xmin, not to tuples that may be part of an update chain. Those still keep their original markings. This patch makes it possible for CLUSTER to advance relfrozenxid, thus avoiding the need of vacuuming the table for Xid wraparound purposes. That will be patched separately. Patch from Heikki Linnakangas.	2007-05-16 16:36:56 +00:00
Tom Lane	0fef38da21	Tweak hash index AM to use the new ReadOrZeroBuffer bufmgr API when fetching pages it intends to zero immediately. Just to show there is some use for that function besides WAL recovery :-). Along the way, fold _hash_checkpage and _hash_pageinit calls into _hash_getbuf and friends, instead of expecting callers to do that separately.	2007-05-03 16:45:58 +00:00
Tom Lane	8c3cc86e7b	During WAL recovery, when reading a page that we intend to overwrite completely from the WAL data, don't bother to physically read it; just have bufmgr.c return a zeroed-out buffer instead. This speeds recovery significantly, and also avoids unnecessary failures when a page-to-be-overwritten has corrupt page headers on disk. This replaces a former kluge that accomplished the latter by pretending zero_damaged_pages was always ON during WAL recovery; which was OK when the kluge was put in, but is unsafe when restoring a WAL log that was written with full_page_writes off. Heikki Linnakangas	2007-05-02 23:18:03 +00:00
Tom Lane	c432061963	Change the timestamps recorded in transaction commit/abort xlog records from time_t to TimestampTz representation. This provides full gettimeofday() resolution of the timestamps, which might be useful when attempting to do point-in-time recovery --- previously it was not possible to specify the stop point with sub-second resolution. But mostly this is to get rid of TimestampTz-to-time_t conversion overhead during commit. Per my proposal of a day or two back.	2007-04-30 21:01:53 +00:00
Tom Lane	957d08c81f	Implement rate-limiting logic on how often backends will attempt to send messages to the stats collector. This avoids the problem that enabling stats_row_level for autovacuum has a significant overhead for short read-only transactions, as noted by Arjen van der Meijden. We can avoid an extra gettimeofday call by piggybacking on the one done for WAL-logging xact commit or abort (although that doesn't help read-only transactions, since they don't WAL-log anything). In my proposal for this, I noted that we could change the WAL log entries for commit/abort to record full TimestampTz precision, instead of only time_t as at present. That's not done in this patch, but will be committed separately.	2007-04-30 03:23:49 +00:00
Tom Lane	a2e923a652	Fix dynahash.c to suppress hash bucket splits while a hash_seq_search() scan is in progress on the same hashtable. This seems the least invasive way to fix the recently-recognized problem that a split could cause the scan to visit entries twice or (with much lower probability) miss them entirely. The only field-reported problem caused by this is the "failed to re-find shared lock object" PANIC in COMMIT PREPARED reported by Michel Dorochevsky, which was caused by multiply visited entries. However, it seems certain that mdsync() is vulnerable to missing required fsync's due to missed entries, and I am fearful that RelationCacheInitializePhase2() might be at risk as well. Because of that and the generalized hazard presented by this bug, back-patch all the supported branches. Along the way, fix pg_prepared_statement() and pg_cursor() to not assume that the hashtables they are examining will stay static between calls. This is risky regardless of the newly noted dynahash problem, because hash_seq_search() has never promised to cope with deletion of table entries other than the just-returned one. There may be no bug here because the only supported way to call these functions is via ExecMakeTableFunctionResult() which will cycle them to completion before doing anything very interesting, but it seems best to get rid of the assumption. This affects 8.2 and HEAD only, since those functions weren't there earlier.	2007-04-26 23:24:46 +00:00
Tom Lane	9d37c038fc	Repair PANIC condition in hash indexes when a previous index extension attempt failed (due to lock conflicts or out-of-space). We might have already extended the index's filesystem EOF before failing, causing the EOF to be beyond what the metapage says is the last used page. Hence the invariant maintained by the code needs to be "EOF is at or beyond last used page", not "EOF is exactly the last used page". Problem was created by my patch of 2006-11-19 that attempted to repair bug #2737. Since that was back-patched to 7.4, this needs to be as well. Per report and test case from Vlastimil Krejcir.	2007-04-19 20:24:04 +00:00
Tom Lane	836feeda9c	Fix condition for whether end_heap_rewrite must fsync, per Heikki.	2007-04-17 21:29:31 +00:00
Tom Lane	4942ee656a	Don't assume rd_smgr stays open across all of a rewriteheap operation; doing so can result in crash if an sinval reset occurs meanwhile. I believe this explains intermittent buildfarm failures in cluster test.	2007-04-17 20:49:39 +00:00
Tom Lane	226a100568	Code review for btree page split WAL reduction patch. Make it actually work (original code always created a full-page image for the left page, thus leaving the intended savings unrealized), avoid risk of not having enough room on the page during xlog restore, squeeze out another couple bytes in the xlog record, clean up neglected comments.	2007-04-11 20:47:38 +00:00
Tom Lane	56218fbc48	Minor tweaking of index special-space definitions so that the various index types can be reliably distinguished by examining the special space on an index page. Per my earlier proposal, plus the realization that there's no need for btree's vacuum cycle ID to cycle through every possible 16-bit value. Restricting its range a little costs nearly nothing and eliminates the possibility of collisions. Memo to self: remember to make bitmap indexes play along with this scheme, assuming that patch ever gets accepted.	2007-04-09 22:04:08 +00:00
Tom Lane	7b78474da3	Make CLUSTER MVCC-safe. Heikki Linnakangas	2007-04-08 01:26:33 +00:00
Tom Lane	f02a82b6ad	Make 'col IS NULL' clauses be indexable conditions. Teodor Sigaev, with some kibitzing from Tom Lane.	2007-04-06 22:33:43 +00:00
Tom Lane	3e23b68dac	Support varlena fields with single-byte headers and unaligned storage. This commit breaks any code that assumes that the mere act of forming a tuple (without writing it to disk) does not "toast" any fields. While all available regression tests pass, I'm not totally sure that we've fixed every nook and cranny, especially in contrib. Greg Stark with some help from Tom Lane	2007-04-06 04:21:44 +00:00
Tom Lane	9c9b619473	Remove the CheckpointStartLock in favor of having backends show whether they are in their commit critical sections via flags in the ProcArray. Checkpoint can watch the ProcArray to determine when it's safe to proceed. This is a considerably better solution to the original problem of race conditions between checkpoint and transaction commit: it speeds up commit, since there's one less lock to fool with, and it prevents the problem of checkpoint being delayed indefinitely when there's a constant flow of commits. Heikki, with some kibitzing from Tom.	2007-04-03 16:34:36 +00:00
Tom Lane	b3005276eb	Decouple the values of TOAST_TUPLE_THRESHOLD and TOAST_MAX_CHUNK_SIZE. Add the latter to the values checked in pg_control, since it can't be changed without invalidating toast table content. This commit in itself shouldn't change any behavior, but it lays some necessary groundwork for experimentation with these toast-control numbers. Note: while TOAST_TUPLE_THRESHOLD can now be changed without initdb, some thought still needs to be given to needs_toast_table() in toasting.c before unleashing random changes.	2007-04-03 04:14:26 +00:00
Tom Lane	57690c6803	Support enum data types. Along the way, use macros for the values of pg_type.typtype whereever practical. Tom Dunstan, with some kibitzing from Tom Lane.	2007-04-02 03:49:42 +00:00
Tom Lane	8875d0987d	Fix oversight in coding of _bt_start_vacuum: we can't assume that the LWLock will be released by transaction abort before _bt_end_vacuum gets called. If either of these "can't happen" errors actually happened, we'd freeze up trying to acquire an already-held lock. Latest word is that this does not explain Martin Pitt's trouble report, but it still looks like a bug.	2007-03-30 00:12:59 +00:00
Tom Lane	fba8113c1b	Teach CLUSTER to skip writing WAL if not needed (ie, not using archiving) --- Simon. Also, code review and cleanup for the previous COPY-no-WAL patches --- Tom.	2007-03-29 00:15:39 +00:00
Tom Lane	e85a01df67	Clean up the representation of special snapshots by including a "method pointer" in every Snapshot struct. This allows removal of the case-by-case tests in HeapTupleSatisfiesVisibility, which should make it a bit faster (I didn't try any performance tests though). More importantly, we are no longer violating portable C practices by assuming that small integers are distinct from all pointer values, and HeapTupleSatisfiesDirty no longer has a non-reentrant API involving side-effects on a global variable. There were a couple of places calling HeapTupleSatisfiesXXX routines directly rather than through the HeapTupleSatisfiesVisibility macro. Since these places had to be changed anyway, I chose to make them go through the macro for uniformity. Along the way I renamed HeapTupleSatisfiesSnapshot to HeapTupleSatisfiesMVCC to emphasize that it's only used with MVCC-type snapshots. I was sorely tempted to rename HeapTupleSatisfiesVisibility to HeapTupleSatisfiesSnapshot, but forebore for the moment to avoid confusion and reduce the likelihood that this patch breaks some of the pending patches. Might want to reconsider doing that later.	2007-03-25 19:45:14 +00:00
Tom Lane	4f896dac17	Arrange for PreventTransactionChain to reject commands submitted as part of a multi-statement simple-Query message. This bug goes all the way back, but unfortunately is not nearly so easy to fix in existing releases; it is only the recent ProcessUtility API change that makes it fixable in HEAD. Per report from William Garrison.	2007-03-22 19:55:04 +00:00
Peter Eisentraut	f4ee82e3d3	Reverted waiting for further fixes: Make configuration parameters fall back to their default values when they are removed from the configuration file. Joachim Wieland	2007-03-13 14:32:25 +00:00
Tom Lane	b9527e9840	First phase of plan-invalidation project: create a plan cache management module and teach PREPARE and protocol-level prepared statements to use it. In service of this, rearrange utility-statement processing so that parse analysis does not assume table schemas can't change before execution for utility statements (necessary because we don't attempt to re-acquire locks for utility statements when reusing a stored plan). This requires some refactoring of the ProcessUtility API, but it ends up cleaner anyway, for instance we can get rid of the QueryContext global. Still to do: fix up SPI and related code to use the plan cache; I'm tempted to try to make SQL functions use it too. Also, there are at least some aspects of system state that we want to ensure remain the same during a replan as in the original processing; search_path certainly ought to behave that way for instance, and perhaps there are others.	2007-03-13 00:33:44 +00:00
Peter Eisentraut	f84308f195	Make configuration parameters fall back to their default values when they are removed from the configuration file. Joachim Wieland	2007-03-12 22:09:28 +00:00
Neil Conway	e1d8deb918	Fix a typo in a comment. Heikki Linnakangas.	2007-03-05 14:13:12 +00:00
Bruce Momjian	bc292937ae	Split _bt_insertonpg to two functions. Heikki Linnakangas	2007-03-03 20:13:06 +00:00
Bruce Momjian	ae35867a39	Remove undo information from pg_controldata --- never used. Florian G. Pflug	2007-03-03 20:02:27 +00:00
Tom Lane	234a02b2a8	Replace direct assignments to VARATT_SIZEP(x) with SET_VARSIZE(x, len). Get rid of VARATT_SIZE and VARATT_DATA, which were simply redundant with VARSIZE and VARDATA, and as a consequence almost no code was using the longer names. Rename the length fields of struct varlena and various derived structures to catch anyplace that was accessing them directly; and clean up various places so caught. In itself this patch doesn't change any behavior at all, but it is necessary infrastructure if we hope to play any games with the representation of varlena headers. Greg Stark and Tom Lane	2007-02-27 23:48:10 +00:00
Bruce Momjian	6f519ad01c	btree source code cleanups: I refactored findsplitloc and checksplitloc so that the division of labor is more clear IMO. I pushed all the space calculation inside the loop to checksplitloc. I also fixed the off by 4 in free space calculation caused by PageGetFreeSpace subtracting sizeof(ItemIdData), even though it was harmless, because it was distracting and I felt it might come back to bite us in the future if we change the page layout or alignments. There's now a new function PageGetExactFreeSpace that doesn't do the subtraction. findsplitloc now tries the "just the new item to right page" split as well. If people don't like the refactoring, I can write a patch to just add that. Heikki Linnakangas	2007-02-21 20:02:17 +00:00
Alvaro Herrera	1820650934	Restructure autovacuum in two processes: a dummy process, which runs continuously, and requests vacuum runs of "autovacuum workers" to postmaster. The workers do the actual vacuum work. This allows for future improvements, like allowing multiple autovacuum jobs running in parallel. For now, the code keeps the original behavior of having a single autovac process at any time by sleeping until the previous worker has finished.	2007-02-15 23:23:23 +00:00
Bruce Momjian	a9eb53969a	Move fsync method macro defines into /include/access/xlogdefs.h so they can be used by src/tools/fsync/test_fsync.c.	2007-02-14 05:00:40 +00:00
Tom Lane	caf2b64a75	Disallow committing a prepared transaction unless we are in the same database it was executed in. Someday it might be nice to allow cross-DB commits, but work would be needed in NOTIFY and perhaps other places. Per Heikki.	2007-02-13 19:39:42 +00:00
Peter Eisentraut	c138b966d4	Replace useless uses of := by = in makefiles.	2007-02-09 15:56:00 +00:00
Tom Lane	c398300330	Combine cmin and cmax fields of HeapTupleHeaders into a single field, by keeping private state in each backend that has inserted and deleted the same tuple during its current top-level transaction. This is sufficient since there is no need to be able to determine the cmin/cmax from any other transaction. This gets us back down to 23-byte headers, removing a penalty paid in 8.0 to support subtransactions. Patch by Heikki Linnakangas, with minor revisions by moi, following a design hashed out awhile back on the pghackers list.	2007-02-09 03:35:35 +00:00
Alvaro Herrera	f8ebab901b	Fix reference-after-free in the new btree page split code, as reported by the buildfarm via Stefan Kaltenbrunner. Patch from Heikki Linnakangas.	2007-02-08 13:52:55 +00:00
Peter Eisentraut	086c189456	Normalize fgets() calls to use sizeof() for calculating the buffer size where possible, and fix some sites that apparently thought that fgets() will overwrite the buffer by one byte. Also add some strlcpy() to eliminate some weird memory handling.	2007-02-08 11:10:27 +00:00
Bruce Momjian	b79575ce45	Reduce WAL activity for page splits: > Currently, an index split writes all the data on the split page to > WAL. That's a lot of WAL traffic. The tuples that are copied to the > right page need to be WAL logged, but the tuples that stay on the > original page don't. Heikki Linnakangas	2007-02-08 05:05:53 +00:00
Tom Lane	aec4cf1c8c	Add a function pg_stat_clear_snapshot() that discards any statistics snapshot already collected in the current transaction; this allows plpgsql functions to watch for stats updates even though they are confined to a single transaction. Use this instead of the previous kluge involving pg_stat_file() to wait for the stats collector to update in the stats regression test. Internally, decouple storage of stats snapshots from transaction boundaries; they'll now stick around until someone calls pgstat_clear_snapshot --- which xact.c still does at transaction end, to maintain the previous behavior. This makes the logic a lot cleaner, at the price of a couple dozen cycles per transaction exit.	2007-02-07 23:11:30 +00:00
Tom Lane	78d1216160	Remove the xlog-centric "database system is ready" message and replace it with "database system is ready to accept connections", which is issued by the postmaster when it really is ready to accept connections. Per proposal from Markus Schiltknecht and subsequent discussion.	2007-02-07 16:44:48 +00:00
Tom Lane	c76ed81513	Remove some dead code, per Heikki.	2007-02-06 14:55:11 +00:00
Tom Lane	23c4978e6c	Rename MaxTupleSize to MaxHeapTupleSize to clarify that it's not meant to describe the maximum size of index tuples (which is typically AM-dependent anyway); and consequently remove the bogus deduction for "special space" that was built into it. Adjust TOAST_TUPLE_THRESHOLD and TOAST_MAX_CHUNK_SIZE to avoid wasting two bytes per toast chunk, and to ensure that the calculation correctly tracks any future changes in page header size. The computation had been inaccurate in a way that didn't cause any harm except space wastage, but future changes could have broken it more drastically. Fix the calculation of BTMaxItemSize, which was formerly computed as 1 byte more than it could safely be. This didn't cause any harm in practice because it's only compared against maxalign'd lengths, but future changes in the size of page headers or btree special space could have exposed the problem. initdb forced because of change in TOAST_MAX_CHUNK_SIZE, which alters the storage of toast tables.	2007-02-05 04:22:18 +00:00
Tom Lane	a2e092e1c7	Don't MAXALIGN in the checks to decide whether a tuple is over TOAST's threshold for tuple length. On 4-byte-MAXALIGN machines, the toast code creates tuples that have t_len exactly TOAST_TUPLE_THRESHOLD ... but this number is not itself maxaligned, so if heap_insert maxaligns t_len before comparing to TOAST_TUPLE_THRESHOLD, it'll uselessly recurse back to tuptoaster.c, wasting cycles. (It turns out that this does not happen on 8-byte-MAXALIGN machines, because for them the outer MAXALIGN in the TOAST_MAX_CHUNK_SIZE macro reduces TOAST_MAX_CHUNK_SIZE so that toast tuples will be less than TOAST_TUPLE_THRESHOLD in size. That MAXALIGN is really incorrect, but we can't remove it now, see below.) There isn't any particular value in maxaligning before comparing to the thresholds, so just don't do that, which saves a small number of cycles in itself. These numbers should be rejiggered to minimize wasted space on toast-relation pages, but we can't do that in the back branches because changing TOAST_MAX_CHUNK_SIZE would force an initdb (by changing the contents of toast tables). We can move the toast decision thresholds a bit, though, which is what this patch effectively does. Thanks to Pavan Deolasee for discovering the unintended recursion. Back-patch into 8.2, but not further, pending more testing. (HEAD is about to get a further patch modifying the thresholds, so it won't help much for testing this form of the patch.)	2007-02-04 20:00:37 +00:00
Bruce Momjian	8b4ff8b6a1	Wording cleanup for error messages. Also change can't -> cannot. Standard English uses "may", "can", and "might" in different ways: may - permission, "You may borrow my rake." can - ability, "I can lift that log." might - possibility, "It might rain today." Unfortunately, in conversational English, their use is often mixed, as in, "You may use this variable to do X", when in fact, "can" is a better choice. Similarly, "It may crash" is better stated, "It might crash".	2007-02-01 19:10:30 +00:00
Neil Conway	dbcaee49b5	Fix a few typos in comments in GiN.	2007-02-01 04:16:08 +00:00
Teodor Sigaev	d4c6da1527	Allow GIN's extractQuery method to signal that nothing can satisfy the query. In this case extractQuery should returns -1 as nentries. This changes prototype of extractQuery method to use int32* instead of uint32* for nentries argument. Based on that gincostestimate may see two corner cases: nothing will be found or seqscan should be used. Per proposal at http://archives.postgresql.org/pgsql-hackers/2007-01/msg01581.php PS tsearch_core patch should be sightly modified to support changes, but I'm waiting a verdict about reviewing of tsearch_core patch.	2007-01-31 15:09:45 +00:00
Tom Lane	a635c08fa1	Add support for cross-type hashing in hash index searches and hash joins. Hashing for aggregation purposes still needs work, so it's not time to mark any cross-type operators as hashable for general use, but these cases work if the operators are so marked by hand in the system catalogs.	2007-01-30 01:33:36 +00:00
Tom Lane	e8cd6f14a2	Add comment noting that hashm_procid in a hash index's metapage isn't actually used for anything.	2007-01-29 23:22:59 +00:00
Tom Lane	6cefacd7c8	Correct an old logic error in btree page splitting: when considering a split exactly at the point where we need to insert a new item, the calculation used the wrong size for the "high key" of the new left page. This could lead to choosing an unworkable split, resulting in "PANIC: failed to add item to the left sibling" (or "right sibling") failure. Although this bug has been there a long time, it's very difficult to trigger a failure before 8.2, since there was generally a lot of free space on both sides of a chosen split. In 8.2, where the user-selected fill factor determines how much free space the code tries to leave, an unworkable split is much more likely. Report by Joe Conway, diagnosis and fix by Heikki Linnakangas.	2007-01-27 20:53:30 +00:00
Bruce Momjian	ef65f6f7a4	Prevent WAL logging when COPY is done in the same transation that created it. Simon Riggs	2007-01-25 02:17:26 +00:00
Neil Conway	2b7334d487	Refactor the index AM API slightly: move currentItemData and currentMarkData from IndexScanDesc to the opaque structs for the AMs that need this information (currently gist and hash). Patch from Heikki Linnakangas, fixes by Neil Conway.	2007-01-20 18:43:35 +00:00
Peter Eisentraut	2cc01004c6	Remove remains of old depend target.	2007-01-20 17:16:17 +00:00
Alvaro Herrera	eb63cc3da8	Arrange for autovacuum to be killed when another operation wants to be alone accessing it, like DROP DATABASE. This allows the regression tests to pass with autovacuum enabled, which open the gates for finally enabling autovacuum by default.	2007-01-16 13:28:57 +00:00
Tom Lane	d83235415b	Add some notes about the basic mathematical laws that the system presumes hold true for operators in a btree operator family. This is mostly to clarify my own thinking about what the planner can assume for optimization purposes. (blowing dust off an old abstract-algebra textbook...)	2007-01-12 17:04:54 +00:00
Bruce Momjian	40f797be03	Enable another five tuple status bits by using the high bits of the nattr field, and rename the field. Heikki Linnakangas	2007-01-09 22:01:00 +00:00
Tom Lane	69db009163	Add a citation to Seltzer and Yigit's Usenix '91 paper about hash table management. The paper clearly describes many of the ideas embodied in our current hashing code, but as far as I could find out there is not a direct code heritage. (Mike Olsen recalls discussion of this paper at Postgres meetings but believes it "informed the Postgres implementation probably just at the design level". Margo herself says she wasn't involved with Postgres' hash code.) Credit where credit is due 'n all that, even if fifteen years after the fact.	2007-01-09 07:30:49 +00:00
Tom Lane	4431758229	Support ORDER BY ... NULLS FIRST/LAST, and add ASC/DESC/NULLS FIRST/NULLS LAST per-column options for btree indexes. The planner's support for this is still pretty rudimentary; it does not yet know how to plan mergejoins with nondefault ordering options. The documentation is pretty rudimentary, too. I'll work on improving that stuff later. Note incompatible change from prior behavior: ORDER BY ... USING will now be rejected if the operator is not a less-than or greater-than member of some btree opclass. This prevents less-than-sane behavior if an operator that doesn't actually define a proper sort ordering is selected.	2007-01-09 02:14:16 +00:00
Bruce Momjian	29dccf5fe0	Update CVS HEAD for 2007 copyright. Back branches are typically not back-stamped for this.	2007-01-05 22:20:05 +00:00
Tom Lane	7c8927bf08	Fix some small typos in comments. Greg Stark	2007-01-04 16:29:42 +00:00
Tom Lane	ef07221997	Clean up smgr.c/md.c APIs as per discussion a couple months ago. Instead of having md.c return a success/failure boolean to smgr.c, which was just going to elog anyway, let md.c issue the elog messages itself. This allows better error reporting, particularly in cases such as "short read" or "short write" which Peter was complaining of. Also, remove the kluge of allowing mdread() to return zeroes from a read-beyond-EOF: this is now an error condition except when InRecovery or zero_damaged_pages = true. (Hash indexes used to require that behavior, but no more.) Also, enforce that mdwrite() is to be used for rewriting existing blocks while mdextend() is to be used for extending the relation EOF. This restriction lets us get rid of the old ad-hoc defense against creating huge files by an accidental reference to a bogus block number: we'll only create new segments in mdextend() not mdwrite() or mdread(). (Again, when InRecovery we allow it anyway, since we need to allow updates of blocks that were later truncated away.) Also, clean up the original makeshift patch for bug #2737: move the responsibility for padding relation segments to full length into md.c.	2007-01-03 18:11:01 +00:00
Tom Lane	5725b9d9af	Support type modifiers for user-defined types, and pull most knowledge about typmod representation for standard types out into type-specific typmod I/O functions. Teodor Sigaev, with some editorialization by Tom Lane.	2006-12-30 21:21:56 +00:00
Tom Lane	9aefd56669	Fix up btree's initial scankey processing to be able to detect redundant or contradictory keys even in cross-data-type scenarios. This is another benefit of the opfamily rewrite: we can find the needed comparison operators now.	2006-12-28 23:16:39 +00:00
Tom Lane	a78fcfb512	Restructure operator classes to allow improved handling of cross-data-type cases. Operator classes now exist within "operator families". While most families are equivalent to a single class, related classes can be grouped into one family to represent the fact that they are semantically compatible. Cross-type operators are now naturally adjunct parts of a family, without having to wedge them into a particular opclass as we had done originally. This commit restructures the catalogs and cleans up enough of the fallout so that everything still works at least as well as before, but most of the work needed to actually improve the planner's behavior will come later. Also, there are not yet CREATE/DROP/ALTER OPERATOR FAMILY commands; the only way to create a new family right now is to allow CREATE OPERATOR CLASS to make one by default. I owe some more documentation work, too. But that can all be done in smaller pieces once this infrastructure is in place.	2006-12-23 00:43:13 +00:00
Tom Lane	0cb91ccba9	Remove the logId/logSeg fields from pg_control, because they are not needed in normal operation, and we can avoid rewriting pg_control at every log segment switch if we don't insist that these values be valid. Reducing the number of pg_control updates is a good idea for both performance and reliability. It does make pg_resetxlog's life a bit harder, but that seems a good tradeoff; and anyway the change to pg_resetxlog amounts to automating something people formerly needed to do by hand, namely look at the existing pg_xlog files to make sure the new WAL start point was past them. In passing, change the wording of xlog.c's "database system was interrupted" messages: describe the pg_control timestamp as "last known up at" rather than implying it is the exact time of service interruption. With this change the timestamp will generally be the time of the last checkpoint, which could be many minutes before the failure; and we've already seen indications that people tend to misinterpret the old wording. initdb forced due to change in pg_control layout. Simon Riggs and Tom Lane	2006-12-08 19:50:53 +00:00
Neil Conway	886a02d1cb	Add a txn_start column to pg_stat_activity. This makes it easier to identify long-running transactions. Since we already need to record the transaction-start time (e.g. for now()), we don't need any additional system calls to report this information. Catversion bumped, initdb required.	2006-12-06 18:06:48 +00:00
Tom Lane	5f60086e10	Minor adjustments to make failures in startup/shutdown behave more cleanly. StartupXLOG and ShutdownXLOG no longer need to be critical sections, because in all contexts where they are invoked, elog(ERROR) would be translated to elog(FATAL) anyway. (One change in bgwriter.c is needed to make this true: set ExitOnAnyError before trying to exit. This is a good fix anyway since the existing code would have gone into an infinite loop on elog(ERROR) during shutdown.) That avoids a misleading report of PANIC during semi-orderly failures. Modify the postmaster to include the startup process in the set of processes that get SIGTERM when a fast shutdown is requested, and also fix it to not try to restart the bgwriter if the bgwriter fails while trying to write the shutdown checkpoint. Net result is that "pg_ctl stop -m fast" does something reasonable for a system in warm standby mode, and so should Unix system shutdown (ie, universal SIGTERM). Per gripe from Stephen Harris and some corner-case testing of my own.	2006-11-30 18:29:12 +00:00
Teodor Sigaev	ef148d6b85	Fix bug with page deletion. If inner page is removed and it tries to remove page on next level linked from next inner page, ginScanToDelete() wrongly sets parent page. Bug reveals when many item pointers from index was deleted ( several hundred thousands). Bug is discovered by hubert depesz lubaczewski <depesz@gmail.com> Suppose, we need rc2 before release...	2006-11-30 16:22:32 +00:00
Neil Conway	546d6848ca	Add a comment noting that heap_copytuple_with_tuple() results in a HeapTuple that is no longer allocated as a single palloc() block; if used carelessly, this might result in a subsequent memory leak after heap_freetuple().	2006-11-23 05:27:18 +00:00
Tom Lane	395249ecbe	Several changes to reduce the probability of running out of memory during AbortTransaction, which would lead to recursion and eventual PANIC exit as illustrated in recent report from Jeff Davis. First, in xact.c create a special dedicated memory context for AbortTransaction to run in. This solves the problem as long as AbortTransaction doesn't need more than 32K (or whatever other size we create the context with). But in corner cases it might. Second, in trigger.c arrange to keep pending after-trigger event records in separate contexts that can be freed near the beginning of AbortTransaction, rather than having them persist until CleanupTransaction as before. Third, in portalmem.c arrange to free executor state data earlier as well. These two changes should result in backing off the out-of-memory condition before AbortTransaction needs any significant amount of memory, at least in typical cases such as memory overrun due to too many trigger events or too big an executor hash table. And all the same for subtransaction abort too, of course.	2006-11-23 01:14:59 +00:00
Tom Lane	3ad0728c81	On systems that have setsid(2) (which should be just about everything except Windows), arrange for each postmaster child process to be its own process group leader, and deliver signals SIGINT, SIGTERM, SIGQUIT to the whole process group not only the direct child process. This provides saner behavior for archive and recovery scripts; in particular, it's possible to shut down a warm-standby recovery server using "pg_ctl stop -m immediate", since delivery of SIGQUIT to the startup subprocess will result in killing the waiting recovery_command. Also, this makes Query Cancel and statement_timeout apply to scripts being run from backends via system(). (There is no support in the core backend for that, but it's widely done using untrusted PLs.) Per gripe from Stephen Harris and subsequent discussion.	2006-11-21 20:59:53 +00:00
Tom Lane	d68efb3f8d	Repair problems with hash indexes that span multiple segments: the hash code's preference for filling pages out-of-order tends to confuse the sanity checks in md.c, as per report from Balazs Nagy in bug #2737. The fix is to ensure that the smgr-level code always has the same idea of the logical EOF as the hash index code does, by using ReadBuffer(P_NEW) where we are adding a single page to the end of the index, and using smgrextend() to reserve a large batch of pages when creating a new splitpoint. The patch is a bit ugly because it avoids making any changes in md.c, which seems the most prudent approach for a backpatchable beta-period fix. After 8.3 development opens, I'll take a look at a cleaner but more invasive patch, in particular getting rid of the now unnecessary hack to allow reading beyond EOF in mdread(). Backpatch as far as 7.4. The bug likely exists in 7.3 as well, but because of the magnitude of the 7.3-to-7.4 changes in hash, the later-version patch doesn't even begin to apply. Given the other known bugs in the 7.3-era hash code, it does not seem worth trying to develop a separate patch for 7.3.	2006-11-19 21:33:23 +00:00
Tom Lane	4f335a3d7f	Repair two related errors in heap_lock_tuple: it was failing to recognize cases where we already hold the desired lock "indirectly", either via membership in a MultiXact or because the lock was originally taken by a different subtransaction of the current transaction. These cases must be accounted for to avoid needless deadlocks and/or inappropriate replacement of an exclusive lock with a shared lock. Per report from Clarence Gardner and subsequent investigation.	2006-11-17 18:00:15 +00:00
Peter Eisentraut	e138b80996	String fix	2006-11-16 14:28:41 +00:00
Neil Conway	dc10387eb1	Fix some typos in comments.	2006-11-12 06:55:54 +00:00
Tom Lane	a46ca619f8	Suppress a few 'uninitialized variable' warnings that gcc emits only at -O3 or higher (presumably because it inlines more things). Per gripe from Mark Mielke.	2006-11-11 01:14:19 +00:00
Tom Lane	792d6edd5b	Clean up some misleading references to %p being a full path, per Simon.	2006-11-10 22:32:20 +00:00
Tom Lane	dcbdf9b1d4	Change Windows rename and unlink substitutes so that they time out after 30 seconds instead of retrying forever. Also modify xlog.c so that if it fails to rename an old xlog segment up to a future slot, it will unlink the segment instead. Per discussion of bug #2712, in which it became apparent that Windows can handle unlinking a file that's being held open, but not renaming it.	2006-11-08 20:12:05 +00:00
Tom Lane	48188e1621	Fix recently-understood problems with handling of XID freezing, particularly in PITR scenarios. We now WAL-log the replacement of old XIDs with FrozenTransactionId, so that such replacement is guaranteed to propagate to PITR slave databases. Also, rather than relying on hint-bit updates to be preserved, pg_clog is not truncated until all instances of an XID are known to have been replaced by FrozenTransactionId. Add new GUC variables and pg_autovacuum columns to allow management of the freezing policy, so that users can trade off the size of pg_clog against the amount of freezing work done. Revise the already-existing code that forces autovacuum of tables approaching the wraparound point to make it more bulletproof; also, revise the autovacuum logic so that anti-wraparound vacuuming is done per-table rather than per-database. initdb forced because of changes in pg_class, pg_database, and pg_autovacuum catalogs. Heikki Linnakangas, Simon Riggs, and Tom Lane.	2006-11-05 22:42:10 +00:00
Tom Lane	70ce5c9082	Fix "failed to re-find parent key" btree VACUUM failure by revising page deletion code to avoid the case where an upper-level btree page remains "half dead" for a significant period of time, and to block insertions into a key range that is in process of being re-assigned to the right sibling of the deleted page's parent. This prevents the scenario reported by Ed L. wherein index keys could become out-of-order in the grandparent index level. Since this is a moderately invasive fix, I'm applying it only to HEAD. The bug exists back to 7.4, but the back branches will get a different patch.	2006-11-01 19:43:17 +00:00
Tom Lane	1e758d5263	Add some code to CREATE DATABASE to check for pre-existing subdirectories that conflict with the OID that we want to use for the new database. This avoids the risk of trying to remove files that maybe we shouldn't remove. Per gripe from Jon Lapham and subsequent discussion of 27-Sep.	2006-10-18 22:44:12 +00:00
Peter Eisentraut	b9b4f10b5b	Message style improvements	2006-10-06 17:14:01 +00:00
Tom Lane	378c79dc78	Cleanup for pglz_compress code: remove dead code, const-ify API of remaining functions, simplify pglz_compress's API to not require a useless data copy when compression fails. Also add a check in pglz_decompress that the expected amount of data was decompressed.	2006-10-05 23:33:33 +00:00
Tom Lane	e378f82e00	Make use of qsort_arg in several places that were formerly using klugy static variables. This avoids any risk of potential non-reentrancy, and in particular offers a much cleaner workaround for the Intel compiler bug that was affecting ginutil.c.	2006-10-05 17:57:40 +00:00
Bruce Momjian	f99a569a2e	pgindent run for 8.2.	2006-10-04 00:30:14 +00:00
Bruce Momjian	45c8ed96b9	Make some sentences consistent with similar ones. Euler Taveira de Oliveira	2006-10-03 21:21:36 +00:00
Alvaro Herrera	4650c4fdb9	Degrade the transaction-id wraparound point message from LOG to DEBUG1, per discussion. Patch from Simon Riggs.	2006-09-26 17:21:39 +00:00
Tom Lane	9e936693a9	Fix free space map to correctly track the total amount of FSM space needed even when a single relation requires more than max_fsm_pages pages. Also, make VACUUM emit a warning in this case, since it likely means that VACUUM FULL or other drastic corrective measure is needed. Per reports from Jeff Frost and others of unexpected changes in the claimed max_fsm_pages need.	2006-09-21 20:31:22 +00:00
Teodor Sigaev	deb66e013c	Improve error message. Per discussion http://archives.postgresql.org/pgsql-general/2006-09/msg00186.php	2006-09-14 11:26:49 +00:00
Bruce Momjian	d18768867e	Remove unnecessary brace pair.	2006-09-10 23:33:22 +00:00
Tom Lane	f5b4d9a9e0	If we're going to advertise the array overlap/containment operators, we probably should make them work reliably for all arrays. Fix code to handle NULLs and multidimensional arrays, move it into arrayfuncs.c. GIN is still restricted to indexing arrays with no null elements, however.	2006-09-10 20:14:20 +00:00
Tom Lane	ba920e1c91	Rename contains/contained-by operators to @> and <@, per discussion that agreed these symbols are less easily confused. I made new pg_operator entries (with new OIDs) for the old names, so as to provide backward compatibility while making it pretty easy to remove the old names in some future release cycle. This commit only touches the core datatypes, contrib will be fixed separately.	2006-09-10 00:29:35 +00:00
Teodor Sigaev	889ec4b998	Fix Intel compiler bug. Per discussion 'GIN FailedAssertions on Itanium2 with Intel compiler' in pgsql-hackers, http://archives.postgresql.org/pgsql-hackers/2006-08/msg01914.php	2006-09-05 18:25:10 +00:00
Tom Lane	8fad2e3ff4	Arrange for GetSnapshotData to copy live-subtransaction XIDs from the PGPROC array into snapshots, and use this information to avoid visits to pg_subtrans in HeapTupleSatisfiesSnapshot. This appears to solve the pg_subtrans-related context swap storm problem that's been reported by several people for 8.1. While at it, modify GetSnapshotData to not take an exclusive lock on ProcArrayLock, as closer analysis shows that shared lock is always sufficient. Itagaki Takahiro and Tom Lane	2006-09-03 15:59:39 +00:00
Teodor Sigaev	b681bfdd59	Fix BUG #2594 : Gin Indexes cause server to crash when it builds on empty table	2006-08-29 14:05:44 +00:00
Tom Lane	ca1fd0ea5b	Move xact.c's partial support for Lists of TransactionIds into pg_list.h. Needed because lock.c is now going to use the same type of list.	2006-08-27 19:11:46 +00:00
Tom Lane	e093dcdd28	Add the ability to create indexes 'concurrently', that is, without blocking concurrent writes to the table. Greg Stark, with a little help from Tom Lane.	2006-08-25 04:06:58 +00:00
Tom Lane	08ae5edc5c	Optimize the case where a btree indexscan has current and mark positions on the same index page; we can avoid data copying as well as buffer refcount manipulations in this common case. Makes for a small but noticeable improvement in mergejoin speed. Heikki Linnakangas	2006-08-24 01:18:34 +00:00
Tom Lane	35af5422f6	Make the server track an 'XID epoch', that is, maintain higher-order bits of the transaction ID counter. Nothing is done with the epoch except to store it in checkpoint records, but this provides a foundation with which add-on code can pretend that XIDs never wrap around. This is a severely trimmed and rewritten version of the xxid patch submitted by Marko Kreen. Per discussion, the epoch counter seems the only part of xxid that really needs to be in the core server.	2006-08-21 16:16:31 +00:00
Tom Lane	7aa772f03e	Now that we've rearranged relation open to get a lock before touching the rel, it's easy to get rid of the narrow race-condition window that used to exist in VACUUM and CLUSTER. Did some minor code-beautification work in the same area, too.	2006-08-18 16:09:13 +00:00
Tom Lane	e8ea9e9587	Implement archive_timeout feature to force xlog file switches to occur no more than N seconds apart. This allows a simple, if not very high performance, means of guaranteeing that a PITR archive is no more than N seconds behind real time. Also make pg_current_xlog_location return the WAL Write pointer, add pg_current_xlog_insert_location to return the Insert pointer, and fix pg_xlogfile_name_offset to return its results as a two-element record instead of a smashed-together string, as per recent discussion. Simon Riggs	2006-08-17 23:04:10 +00:00
Tom Lane	7a3e30e608	Add INSERT/UPDATE/DELETE RETURNING, with basic docs and regression tests. plpgsql support to come later. Along the way, convert execMain's SELECT INTO support into a DestReceiver, in order to eliminate some ugly special cases. Jonah Harris and Tom Lane	2006-08-12 02:52:06 +00:00
Tom Lane	e002836913	Make recovery from WAL be restartable, by executing a checkpoint-like operation every so often. This improves the usefulness of PITR log shipping for hot standby: formerly, if the standby server crashed, it was necessary to restart it from the last base backup and replay all the WAL since then. Now it will only need to reread about the same amount of WAL as the master server would. The behavior might also come in handy during a long PITR replay sequence. Simon Riggs, with some editorialization by Tom Lane.	2006-08-07 16:57:57 +00:00
Tom Lane	704ddaaa09	Add support for forcing a switch to a new xlog file; cause such a switch to happen automatically during pg_stop_backup(). Add some functions for interrogating the current xlog insertion point and for easily extracting WAL filenames from the hex WAL locations displayed by pg_stop_backup and friends. Simon Riggs with some editorialization by Tom Lane.	2006-08-06 03:53:44 +00:00
Tom Lane	bc8ac3ce40	Add missing pgstat_count_index_scan(), per Andreas Seltenreich.	2006-08-03 15:22:09 +00:00
Tom Lane	09d3670df3	Change the relation_open protocol so that we obtain lock on a relation (table or index) before trying to open its relcache entry. This fixes race conditions in which someone else commits a change to the relation's catalog entries while we are in process of doing relcache load. Problems of that ilk have been reported sporadically for years, but it was not really practical to fix until recently --- for instance, the recent addition of WAL-log support for in-place updates helped. Along the way, remove pg_am.amconcurrent: all AMs are now expected to support concurrent update.	2006-07-31 20:09:10 +00:00
Alvaro Herrera	92c2ecc130	Modify snapshot definition so that lazy vacuums are ignored by other vacuums. This allows a OLTP-like system with big tables to continue regular vacuuming on small-but-frequently-updated tables while the big tables are being vacuumed. Original patch from Hannu Krossing, rewritten by Tom Lane and updated by me.	2006-07-30 02:07:18 +00:00
Tom Lane	e6284649b9	Modify btree to delete known-dead index entries without an actual VACUUM. When we are about to split an index page to do an insertion, first look to see if any entries marked LP_DELETE exist on the page, and if so remove them to try to make enough space for the desired insert. This should reduce index bloat in heavily-updated tables, although of course you still need VACUUM eventually to clean up the heap. Junji Teramoto	2006-07-25 19:13:00 +00:00
Peter Eisentraut	e9b4969062	DTrace support, with a small initial set of probes by Robert Lor	2006-07-24 16:32:45 +00:00
Tom Lane	9dc842f083	Don't try to truncate multixact SLRU files in checkpoints done during xlog recovery. In the first place, it doesn't work because slru's latest_page_number isn't set up yet (this is why we've been hearing reports of strange "apparent wraparound" log messages during crash recovery, but only from people who'd managed to advance their next-mxact counters some considerable distance from 0). In the second place, it seems a bit unwise to be throwing away data during crash recovery anwyway. This latter consideration convinces me to just disable truncation during recovery, rather than computing latest_page_number and pushing ahead.	2006-07-20 00:46:42 +00:00
Tom Lane	c36418be40	Fix getDatumCopy(): don't use store_att_byval to copy into a Datum variable (this accounts for regression failures on PPC64, and in fact won't work on any big-endian machine). Get rid of hardwired knowledge about datum size rules; make it look just like datumCopy().	2006-07-16 00:54:22 +00:00
Tom Lane	e040ab44e4	Improve error message wording.	2006-07-16 00:52:05 +00:00
Tom Lane	98bac16e4d	Fix misguided removal of access/tuptoaster.h inclusion, per Kris Jurka. I'm going to insist on reversion of this entire patch unless pgrminclude is upgraded to a less broken state, but in the meantime let's get contrib passing regression again.	2006-07-14 19:05:52 +00:00
Bruce Momjian	e0522505bd	Remove 576 references of include files that were not needed.	2006-07-14 14:52:27 +00:00
Bruce Momjian	a22d76d96a	Allow include files to compile own their own. Strip unused include files out unused include files, and add needed includes to C files. The next step is to remove unused include files in C files.	2006-07-13 16:49:20 +00:00
Tom Lane	d29b66882a	Tweak fillfactor code as per my recent proposal. Fix nbtsort.c so that it can handle small fillfactors for ordinary-sized index entries without failing on large ones; fix nbtinsert.c to distinguish leaf and nonleaf pages; change the minimum fillfactor to 10% for all index types.	2006-07-11 21:05:57 +00:00
Teodor Sigaev	001d30ee6b	Add support to GIN for =(anyarray,anyarray) operation	2006-07-11 19:49:14 +00:00
Bruce Momjian	ac230e7431	Alphabetically order reference to include files, "S"-"Z".	2006-07-11 18:26:11 +00:00
Bruce Momjian	0ff3461bcc	Alphabetically order reference to include files, "N" - "S".	2006-07-11 17:26:59 +00:00
Bruce Momjian	3a534ade39	Alphabetically order reference to include files, "G" - "M".	2006-07-11 17:04:13 +00:00
Teodor Sigaev	234163649e	GIN improvements - Replace sorted array of entries in maintenance_work_mem to binary tree, this should improve create performance. - More precisely calculate allocated memory, eliminate leaks with user-defined extractValue() - Improve wordings in tsearch2	2006-07-11 16:55:34 +00:00
Alvaro Herrera	d4cef0aa2a	Improve vacuum code to track minimum Xids per table instead of per database. To this end, add a couple of columns to pg_class, relminxid and relvacuumxid, based on which we calculate the pg_database columns after each vacuum. We now force all databases to be vacuumed, even template ones. A backend noticing too old a database (meaning pg_database.datminxid is in danger of falling behind Xid wraparound) will signal the postmaster, which in turn will start an autovacuum iteration to process the offending database. In principle this is only there to cope with frozen (non-connectable) databases without forcing users to set them to connectable, but it could force regular user database to go through a database-wide vacuum at any time. Maybe we should warn users about this somehow. Of course the real solution will be to use autovacuum all the time ;-) There are some additional improvements we could have in this area: for example the vacuum code could be smarter about not updating pg_database for each table when called by autovacuum, and do it only once the whole autovacuum iteration is done. I updated the system catalogs documentation, but I didn't modify the maintenance section. Also having some regression tests for this would be nice but it's not really a very straightforward thing to do. Catalog version bumped due to system catalog changes.	2006-07-10 16:20:52 +00:00
Tom Lane	b7b78d24f7	Code review for FILLFACTOR patch. Change WITH grammar as per earlier discussion (including making def_arg allow reserved words), add missed opt_definition for UNIQUE case. Put the reloptions support code in a less random place (I chose to make a new file access/common/reloptions.c). Eliminate header inclusion creep. Make the index options functions safely user-callable (seems like client apps might like to be able to test validity of options before trying to make an index). Reduce overhead for normal case with no options by allowing rd_options to be NULL. Fix some unmaintainably klugy code, including getting rid of Natts_pg_class_fixed at long last. Some stylistic cleanup too, and pay attention to keeping comments in sync with code. Documentation still needs work, though I did fix the omissions in catalogs.sgml and indexam.sgml.	2006-07-03 22:45:41 +00:00
Bruce Momjian	277807bd9e	Add FILLFACTOR to CREATE INDEX. ITAGAKI Takahiro	2006-07-02 02:23:23 +00:00
Teodor Sigaev	783a73168b	Forget to add new file :((	2006-06-28 12:08:35 +00:00
Teodor Sigaev	1f7ef548ec	Changes * new split algorithm (as proposed in http://archives.postgresql.org/pgsql-hackers/2006-06/msg00254.php) * possible call pickSplit() for second and below columns * add spl_(l\|r)datum_exists to GIST_SPLITVEC - pickSplit should check its values to use already defined spl_(l\|r)datum for splitting. pickSplit should set spl_(l\|r)datum_exists to 'false' (if they was 'true') to signal to caller about using spl_(l\|r)datum. * support for old pickSplit(): not very optimal but correct split * remove 'bytes' field from GISTENTRY: in any case size of value is defined by it's type. * split GIST_SPLITVEC to two structures: one for using in picksplit and second - for internal use. * some code refactoring * support of subsplit to rtree opclasses TODO: add support of subsplit to contrib modules	2006-06-28 12:00:14 +00:00
Tom Lane	3c71244b74	Put #ifdef NOT_USED around posix_fadvise call. We may want to resurrect this someday, but right now it seems that posix_fadvise is immature to the point of being broken on many platforms ... and we don't have any benchmark evidence proving it's worth spending time on.	2006-06-27 18:59:17 +00:00
Tom Lane	cdd5178c69	Extend the MinimalTuple concept to tuplesort.c, thereby reducing the per-tuple space overhead for sorts in memory. I chose to replace the previous patch that tried to write out the bare minimum amount of data when sorting on disk; instead, just dump the MinimalTuples as-is. This wastes 3 to 10 bytes per tuple depending on architecture and null-bitmap length, but the simplification in the writetup/readtup routines seems worth it.	2006-06-27 16:53:02 +00:00
Tom Lane	3f50ba27cf	Create infrastructure for 'MinimalTuple' representation of in-memory tuples with less header overhead than a regular HeapTuple, per my recent proposal. Teach TupleTableSlot code how to deal with these. As proof of concept, change tuplestore.c to store MinimalTuples instead of HeapTuples. Future patches will expand the concept to other places where it is useful.	2006-06-27 02:51:40 +00:00
Tom Lane	3a04f53e7f	pg_stop_backup was calling XLogArchiveNotify() twice for the newly created backup history file. Bug introduced by the 8.1 change to make pg_stop_backup delete older history files. Per report from Masao Fujii.	2006-06-22 20:42:57 +00:00
Tom Lane	27c3e3de09	Remove redundant gettimeofday() calls to the extent practical without changing semantics too much. statement_timestamp is now set immediately upon receipt of a client command message, and the various places that used to do their own gettimeofday() calls to mark command startup are referenced to that instead. I have also made stats_command_string use that same value for pg_stat_activity.query_start for both the command itself and its eventual replacement by <IDLE> or <idle in transaction>. There was some debate about that, but no argument that seemed convincing enough to justify an extra gettimeofday() call.	2006-06-20 22:52:00 +00:00
Tom Lane	1e8ae13640	Don't try to call posix_fadvise() unless <fcntl.h> supplies a declaration for it. Hopefully will fix core dump evidenced by some buildfarm members since fadvise patch went in. The actual definition of the function is not ABI-compatible with compiler's default assumption in the absence of any declaration, so it's clearly unsafe to try to call it without seeing a declaration.	2006-06-18 18:30:21 +00:00
Tom Lane	06e10abc0b	Fix problems with cached tuple descriptors disappearing while still in use by creating a reference-count mechanism, similar to what we did a long time ago for catcache entries. The back branches have an ugly solution involving lots of extra copies, but this way is more efficient. Reference counting is only applied to tupdescs that are actually in caches --- there seems no need to use it for tupdescs that are generated in the executor, since they'll go away during plan shutdown by virtue of being in the per-query memory context. Neil Conway and Tom Lane	2006-06-16 18:42:24 +00:00
Bruce Momjian	40bc06fa16	Test for POSIX_FADV_DONTNEED to use posix_fadvise().	2006-06-16 04:11:48 +00:00
Bruce Momjian	94a5c4a01b	Use posix_fadvise() to avoid kernel caching of WAL contents on WAL file close. ITAGAKI Takahiro	2006-06-15 19:15:00 +00:00
Teodor Sigaev	b32000eda4	Som improve page split in multicolumn GiST index. If user picksplit on n-th column generate equals left and right unions then it calls picksplit on n+1-th column.	2006-05-29 12:50:06 +00:00
Teodor Sigaev	0a6fde5a26	Correct cheking in findParents(). i From Andreas Seltenreich <andreas+pg@gate450.dyndns.org>	2006-05-29 08:39:44 +00:00
Alvaro Herrera	3d58a1c168	Remove traces of otherwise unused RELKIND_SPECIAL symbol. Leave the psql bits in place though, so that it plays nicely with older servers. Per discussion.	2006-05-28 02:27:08 +00:00
Teodor Sigaev	5d1a066e64	Fix findParents() in case of multiple levels to find. By Andreas Seltenreich <andreas+pg@gate450.dyndns.org>	2006-05-26 08:01:17 +00:00
Teodor Sigaev	d2158b0281	* Add support NULL to GiST. * some refactoring and simplify code int gistutil.c and gist.c * now in some cases it can be called used-defined picksplit method for non-first column in index, but here is a place to do more. * small fix of docs related to support NULL.	2006-05-24 11:01:39 +00:00
Teodor Sigaev	09518fbdf4	Call MarkBufferDirty() before XLogInsert() during completion of insert	2006-05-19 17:15:41 +00:00
Teodor Sigaev	420cbff881	Simplify gistSplit() and some refactoring related code.	2006-05-19 16:15:17 +00:00
Teodor Sigaev	5890790b4a	Rework completion of incomplete inserts. Now it writes WAL log during inserts.	2006-05-19 11:10:25 +00:00
Teodor Sigaev	8876e37d07	Reduce size of critial section during vacuum full, critical sections now isn't nested. All user-defined functions now is called outside critsections. Small improvements in WAL protocol. TODO: improve XLOG replay	2006-05-17 16:34:59 +00:00
Tom Lane	3fdeb189e9	Clean up code associated with updating pg_class statistics columns (relpages/reltuples). To do this, create formal support in heapam.c for "overwrite" tuple updates (including xlog replay capability) and use that instead of the ad-hoc overwrites we'd been using in VACUUM and CREATE INDEX. Take the responsibility for updating stats during CREATE INDEX out of the individual index AMs, and do it where it belongs, in catalog/index.c. Aside from being more modular, this avoids having to update the same tuple twice in some paths through CREATE INDEX. It's probably not measurably faster, but for sure it's a lot cleaner than before.	2006-05-10 23:18:39 +00:00
Teodor Sigaev	10dd8df68e	Reduce size of critical section and remove call of user-defined functions in insertion and deletion, modify gistSplit() to do not use buffers. TODO: gistvacuumcleanup and XLOG	2006-05-10 09:19:54 +00:00
Tom Lane	5749f6ef0c	Rewrite btree vacuuming to fold the former bulkdelete and cleanup operations into a single mostly-physical-order scan of the index. This requires some ticklish interlocking considerations, but should create no material performance impact on normal index operations (at least given the already-committed changes to make scans work a page at a time). VACUUM itself should get significantly faster in any index that's degenerated to a very nonlinear page order. Also, we save one pass over the index entirely, except in the case where there were no deletions to do and so only one pass happened anyway. Original patch by Heikki Linnakangas, rework by Tom Lane.	2006-05-08 00:00:17 +00:00
Tom Lane	09cb5c0e7d	Rewrite btree index scans to work a page at a time in all cases (both btgettuple and btgetmulti). This eliminates the problem of "re-finding" the exact stopping point, since the stopping point is effectively always a page boundary, and index items are never moved across pre-existing page boundaries. A small penalty is that the keys_are_unique optimization is effectively disabled (and, therefore, is removed in this patch), causing us to apply _bt_checkkeys() to at least one more tuple than necessary when looking up a unique key. However, the advantages for non-unique cases seem great enough to accept this tradeoff. Aside from simplifying and (sometimes) speeding up the indexscan code, this will allow us to reimplement btbulkdelete as a largely sequential scan instead of index-order traversal, thereby significantly reducing the cost of VACUUM. Those changes will come in a separate patch. Original patch by Heikki Linnakangas, rework by Tom Lane.	2006-05-07 01:21:30 +00:00
Teodor Sigaev	2a58f3bff6	Fix typo noticed by Alvaro Herrera	2006-05-03 06:56:47 +00:00
Tom Lane	e57345975c	Clean up API for ambulkdelete/amvacuumcleanup as per today's discussion. This formulation requires every AM to provide amvacuumcleanup, unlike before, but it's surely a whole lot cleaner. Also, add an 'amstorage' column to pg_am so that we can get rid of hardwired knowledge in DefineOpClass().	2006-05-02 22:25:10 +00:00
Tom Lane	67030eec1e	Suppress some gcc warnings.	2006-05-02 15:48:11 +00:00
Teodor Sigaev	8a3631f8d8	GIN: Generalized Inverted iNdex. text[], int4[], Tsearch2 support for GIN.	2006-05-02 11:28:56 +00:00
Tom Lane	d2896a9ed1	Arrange to cache btree metapage data in the relcache entry for the index, thereby saving a visit to the metapage in most index searches/updates. This wouldn't actually save any I/O (since in the old regime the metapage generally stayed in cache anyway), but it does provide a useful decrease in bufmgr traffic in high-contention scenarios. Per my recent proposal.	2006-04-25 22:46:05 +00:00
Bruce Momjian	e6004f0151	Add statement_timestamp(), clock_timestamp(), and transaction_timestamp() (just like now()). Also update statement_timeout() to mention it is statement arrival time that is measured. Catalog version updated.	2006-04-25 00:25:22 +00:00
Tom Lane	eac825aa68	Ensure that we validate the page header of the first page of a WAL file whenever we start to read within that file. The first page carries extra identification information that really ought to be checked, but as the code stood, this was only checked when we switched sequentially into a new WAL file, or if by chance the starting checkpoint record was within the first page. This patch ensures that we will detect bogus 'long header' information before we start replaying the WAL sequence.	2006-04-20 04:07:38 +00:00
Tom Lane	0a87394956	Fix the torn-page hazard for PITR base backups by forcing full page writes to occur between pg_start_backup() and pg_stop_backup(), even if the GUC setting full_page_writes is OFF. Per discussion, doing this in combination with the already-existing checkpoint during pg_start_backup() should ensure safety against partial page updates being included in the backup. We do not have to force full page writes to occur during normal PITR operation, as I had first feared.	2006-04-17 18:55:05 +00:00
Tom Lane	defe93463c	Make the world safe for full_page_writes. Allow XLOG records that try to update no-longer-existing pages to fall through as no-ops, but make a note of each page number referenced by such records. If we don't see a later XLOG entry dropping the table or truncating away the page, complain at the end of XLOG replay. Since this fixes the known failure mode for full_page_writes = off, revert my previous band-aid patch that disabled that GUC variable.	2006-04-14 20:27:24 +00:00
Tom Lane	49a7610c36	Fix an ancient oversight in btree xlog replay. When trying to determine if an upper-level insertion completes a previously-seen split, we cannot simply grab the downlink block number out of the buffer, because the buffer could contain a later state of the page --- or perhaps the page doesn't even exist at all any more, due to relation truncation. These possibilities have been masked up to now because the use of full_page_writes effectively ensured that no xlog replay routine ever actually saw a page state newer than its own change. Since we're deprecating full_page_writes in 8.1.*, there's no need to fix this in existing release branches, but we need a fix in HEAD if we want to have any hope of re-allowing full_page_writes. Accordingly, adjust the contents of btree WAL records so that we can always get the downlink block number from the WAL record rather than having to depend on buffer contents. Per report from Kevin Grittner and Peter Brant. Improve a few comments in related code while at it.	2006-04-13 03:53:05 +00:00
Tom Lane	7fdb4305db	Fix a bunch of problems with domains by making them use special input functions that apply the necessary domain constraint checks immediately. This fixes cases where domain constraints went unchecked for statement parameters, PL function local variables and results, etc. We can also eliminate existing special cases for domains in places that had gotten it right, eg COPY. Also, allow domains over domains (base of a domain is another domain type). This almost worked before, but was disallowed because the original patch hadn't gotten it quite right.	2006-04-05 22:11:58 +00:00
Tom Lane	09b5271ebd	Add a field to the first page of each WAL file to indicate the XLOG_BLCKSZ. This ought to help in preventing configuration mismatch problems if anyone tries to ship PITR files between servers compiled with different XLOG_BLCKSZ settings. Simon Riggs	2006-04-05 03:34:05 +00:00
Tom Lane	e6140d9052	Don't use BLCKSZ for the physical length of the pg_control file, but instead a dedicated symbol. This probably makes no functional difference for likely values of BLCKSZ, but it makes the intent clearer. Simon Riggs, minor editorialization by Tom Lane.	2006-04-04 22:39:59 +00:00
Tom Lane	147d4bf3e5	Modify all callers of datatype input and receive functions so that if these functions are not strict, they will be called (passing a NULL first parameter) during any attempt to input a NULL value of their datatype. Currently, all our input functions are strict and so this commit does not change any behavior. However, this will make it possible to build domain input functions that centralize checking of domain constraints, thereby closing numerous holes in our domain support, as per previous discussion. While at it, I took the opportunity to introduce convenience functions InputFunctionCall, OutputFunctionCall, etc to use in code that calls I/O functions. This eliminates a lot of grotty-looking casts, but the main motivation is to make it easier to grep for these places if we ever need to touch them again.	2006-04-04 19:35:37 +00:00
Tom Lane	eaef111396	Define a separately configurable XLOG_BLCKSZ symbol for the page size used within WAL files. Historically this was the same as the data file BLCKSZ, but there's no necessary connection, and it's possible that performance gains might ensue from reducing XLOG_BLCKSZ. In any case distinguishing two symbols should improve code clarity. This commit does not actually change the page size, only provide the infrastructure to make it possible to do so. initdb forced because of addition of a field to pg_control. Mark Wong, with some help from Simon Riggs and Tom Lane.	2006-04-03 23:35:05 +00:00
Tom Lane	c9a2b6d4ca	Fix thinko in gistRedoPageUpdateRecord: if XLR_BKP_BLOCK_1 is set, we don't have anything to do to the page, but we still have to adjust the incomplete_inserts list that we're maintaining in memory.	2006-04-03 16:45:50 +00:00
Teodor Sigaev	8d02b15e33	Eliminate ajust scan code. Since concurrent GiST it doesn't do real work. That was missed during concurrence development.	2006-04-03 13:44:33 +00:00
Tom Lane	89bda95d82	Remove the 'slow' path for btree index build, which built the btree incrementally by successive inserts rather than by sorting the data. We were only using the slow path during bootstrap, apparently because when first written it failed during bootstrap --- but it works fine now AFAICT. Removing it saves a hundred or so lines of code and produces noticeably (~10%) smaller initial states of the system catalog indexes. While that won't make much difference for heavily-modified catalogs, for the more static ones there may be a useful long-term performance improvement.	2006-04-01 03:03:37 +00:00
Tom Lane	a8b8f4db23	Clean up WAL/buffer interactions as per my recent proposal. Get rid of the misleadingly-named WriteBuffer routine, and instead require routines that change buffer pages to call MarkBufferDirty (which does exactly what it says). We also require that they do so before calling XLogInsert; this takes care of the synchronization requirement documented in SyncOneBuffer. Note that because bufmgr takes the buffer content lock (in shared mode) while writing out any buffer, it doesn't matter whether MarkBufferDirty is executed before the buffer content change is complete, so long as the content change is completed before releasing exclusive lock on the buffer. So it's OK to set the dirtybit before we fill in the LSN. This eliminates the former kluge of needing to set the dirtybit in LockBuffer. Aside from making the code more transparent, we can also add some new debugging assertions, in particular that the caller of MarkBufferDirty must hold the buffer content lock, not merely a pin.	2006-03-31 23:32:07 +00:00
Tom Lane	89395bfa6f	Improve gist XLOG code to follow the coding rules needed to prevent torn-page problems. This introduces some issues of its own, mainly that there are now some critical sections of unreasonably broad scope, but it's a step forward anyway. Further cleanup will require some code refactoring that I'd prefer to get Oleg and Teodor involved in.	2006-03-30 23:03:10 +00:00
Tom Lane	6d61cdec07	Clean up and document the API for XLogOpenRelation and XLogReadBuffer. This commit doesn't make much functional change, but it does eliminate some duplicated code --- for instance, PageIsNew tests are now done inside XLogReadBuffer rather than by each caller. The GIST xlog code still needs a lot of love, but I'll worry about that separately.	2006-03-29 21:17:39 +00:00
Tom Lane	0a971e2f20	Disable full_page_writes, because turning it off risks causing crash-recovery failures even when the hardware and OS did nothing wrong. Per recent analysis of a problem report from Alex Bahdushka. For the moment I've just diked out the test of the parameter, rather than removing the GUC infrastructure and documentation, in case we conclude that there's something salvageable there. There seems no chance of it being resurrected in the 8.1 branch though.	2006-03-28 22:01:16 +00:00
Tom Lane	288551fc60	Repair longstanding error in btree xlog replay: XLogReadBuffer should be passed extend = true whenever we are reading a page we intend to reinitialize completely, even if we think the page "should exist". This is because it might indeed not exist, if the relation got truncated sometime after the current xlog record was made and before the crash we're trying to recover from. These two thinkos appear to explain both of the old bug reports discussed here: http://archives.postgresql.org/pgsql-hackers/2005-05/msg01369.php	2006-03-28 21:17:23 +00:00
Tom Lane	0a20207060	Arrange to emit a description of the current XLOG record as error context when an error occurs during xlog replay. Also, replace the former risky 'write into a fixed-size buffer with no overflow detection' API for XLOG record description routines; use an expansible StringInfo instead. (The latter accounts for most of the patch bulk.) Qingqing Zhou	2006-03-24 04:32:13 +00:00
Tom Lane	2316013961	Clean up representation of function RTEs for functions returning RECORD. The original coding stored the raw parser output (ColumnDef and TypeName nodes) which was ugly, bulky, and wrong because it failed to create any dependency on the referenced datatype --- and in fact would not track type renamings and suchlike. Instead store a list of column type OIDs in the RTE. Also fix up general failure of recordDependencyOnExpr to do anything sane about recording dependencies on datatypes. While there are many cases where there will be an indirect dependency (eg if an operator returns a datatype, the dependency on the operator is enough), we do have to record the datatype as a separate dependency in examples like CoerceToDomain. initdb forced because of change of stored rules.	2006-03-16 00:31:55 +00:00
Tom Lane	20ab467d76	Improve parser so that we can show an error cursor position for errors during parse analysis, not only errors detected in the flex/bison stages. This is per my earlier proposal. This commit includes all the basic infrastructure, but locations are only tracked and reported for errors involving column references, function calls, and operators. More could be done later but this seems like a good set to start with. I've also moved the ReportSyntaxErrorPosition logic out of psql and into libpq, which should make it available to more people --- even within psql this is an improvement because warnings weren't handled by ReportSyntaxErrorPosition.	2006-03-14 22:48:25 +00:00
Tom Lane	9f6192490e	Add a CHECK_FOR_INTERRUPTS() in _bt_buildadd(). This fixes problem with not responding to query cancel during the last stage of btree index creation.	2006-03-10 20:18:15 +00:00
Bruce Momjian	f2f5b05655	Update copyright for 2006. Update scripts.	2006-03-05 15:59:11 +00:00
Neil Conway	8e5a10d46c	This patch makes the error message strings throughout the backend more compliant with the error message style guide. In particular, errdetail should begin with a capital letter and end with a period, whereas errmsg should not. I also fixed a few related issues in passing, such as fixing the repeated misspelling of "lexeme" in contrib/tsearch2 (per Tom's suggestion).	2006-03-01 06:30:32 +00:00
Neil Conway	737651f6be	Cleanup the usage of ScanDirection: use the symbolic names for the possible ScanDirection alternatives rather than magic numbers (-1, 0, 1). Also, use the ScanDirection macros in a few places rather than directly checking whether `dir == ForwardScanDirection' and the like. Per patch from James William Pye. His patch also changed ScanDirection to be a "char" rather than an enum, which I haven't applied.	2006-02-21 23:01:54 +00:00
Tom Lane	2d7f694729	Move btbulkdelete's vacuum_delay_point() call to a place in the loop where we are not holding a buffer content lock; where it was, InterruptHoldoffCount is positive and so we'd not respond to cancel signals as intended. Also add missing vacuum_delay_point() call in btvacuumcleanup. This should fix complaint from Evgeny Gridasov about failure to respond to SIGINT/SIGTERM in a timely fashion (bug #2257).	2006-02-14 17:20:01 +00:00
Tom Lane	49758f4703	Add some missing vacuum_delay_point calls in GIST vacuuming.	2006-02-14 16:39:32 +00:00
Tom Lane	d52a57fc30	Actually there's a better way to do this, which is to count tuples during the vacuumcleanup scan that we're going to do anyway. Should save a few cycles (one calculation per page, not per tuple) as well as not having to depend on assumptions about heap and index being in step. I think this could probably be made to work for GIST too, but that code looks messy enough that I'm disinclined to try right now.	2006-02-12 00:18:17 +00:00
Tom Lane	fd267c1ebc	Skip ambulkdelete scan if there's nothing to delete and the index is not partial. None of the existing AMs do anything useful except counting tuples when there's nothing to delete, and we can get a tuple count from the heap as long as it's not a partial index. (hash actually can skip anyway because it maintains a tuple count in the index metapage.) GIST is not currently able to exploit this optimization because, due to failure to index NULLs, GIST is always effectively partial. Possibly we should fix that sometime. Simon Riggs w/ some review by Tom Lane.	2006-02-11 23:31:34 +00:00
Bruce Momjian	77bb65d3fc	Revert based on Tom's recommendation: > Allow VACUUM to complete faster by avoiding scanning the indexes when no > rows were removed from the heap by the VACUUM.	2006-02-11 17:14:09 +00:00
Bruce Momjian	bf324946b3	Allow VACUUM to complete faster by avoiding scanning the indexes when no rows were removed from the heap by the VACUUM. Simon Riggs	2006-02-11 16:59:09 +00:00
Tom Lane	5997386a0a	Remove the no-longer-useful HashItem/HashItemData level of structure. Same motivation as for BTItem.	2006-01-25 23:26:11 +00:00
Tom Lane	c389760c32	Remove the no-longer-useful BTItem/BTItemData level of structure, and just refer to btree index entries as plain IndexTuples, which is what they have been for a very long time. This is mostly just an exercise in removing extraneous notation, but it does save a palloc/pfree cycle per index insertion.	2006-01-25 23:04:21 +00:00
Tom Lane	3a0a16cb7e	Allow row comparisons to be used as indexscan qualifications. This completes the project to upgrade our handling of row comparisons.	2006-01-25 20:29:24 +00:00
Tom Lane	7ccaf13a06	Instead of using a numberOfRequiredKeys count to distinguish required and non-required keys in a btree index scan, mark the required scankeys with private flag bits SK_BT_REQFWD and/or SK_BT_REQBKWD. This seems at least marginally clearer to me, and it eliminates a wired-into-the- data-structure assumption that required keys are consecutive. Even though that assumption will remain true for the foreseeable future, having it in there makes the code seem more complex than necessary.	2006-01-23 22:31:41 +00:00
Tom Lane	c89a0dd3bb	Repair longstanding bug in slru/clog logic: it is possible for two backends to try to create a log segment file concurrently, but the code erroneously specified O_EXCL to open(), resulting in a needless failure. Before 7.4, it was even a PANIC condition :-(. Correct code is actually simpler than what we had, because we can just say O_CREAT to start with and not need a second open() call. I believe this accounts for several recent reports of hard-to-reproduce "could not create file ...: File exists" errors in both pg_clog and pg_subtrans.	2006-01-21 04:38:21 +00:00
Tom Lane	73e3566078	Improve comments about btree's use of ScanKey data structures: there are two basically different kinds of scankeys, and we ought to try harder to indicate which is used in each place in the code. I've chosen the names "search scankey" and "insertion scankey", though you could make about as good an argument for "operator scankey" and "comparison function scankey".	2006-01-17 00:09:01 +00:00
Tom Lane	f7ea931287	Some minor code cleanup, falling out from the removal of rtree. SK_NEGATE isn't being used anywhere anymore, and there seems no point in a generic index_keytest() routine when two out of three remaining access methods aren't using it. Also, add a comment documenting a convention for letting access methods define private flag bits in ScanKey sk_flags. There are no such flags at the moment but I'm thinking about changing btree's handling of "required keys" to use flag bits in the keys rather than a count of required key positions. Also, if some AM did still want SK_NEGATE then it would be reasonable to treat it as a private flag bit.	2006-01-14 22:03:35 +00:00
Neil Conway	fb627b76cc	Cosmetic code cleanup: fix a bunch of places that used "return (expr);" rather than "return expr;" -- the latter style is used in most of the tree. I kept the parentheses when they were necessary or useful because the return expression was complex.	2006-01-11 08:43:13 +00:00
Tom Lane	afa8f1971a	Add RelationOpenSmgr() calls to ensure rd_smgr is valid when we try to use it. While it normally has been opened earlier during btree index build, testing shows that it's possible for the link to be closed again if an sinval reset occurs while the index is being built.	2006-01-07 22:45:41 +00:00
Tom Lane	2d0475e480	Convert Assert checking for empty page into a regular test and elog. The consequences of overwriting a non-empty page are bad enough that we should not omit this test in production builds.	2006-01-06 00:15:50 +00:00
Peter Eisentraut	86c23a6eb2	Make all command-line options of postmaster and postgres the same. See http://archives.postgresql.org/pgsql-hackers/2006-01/msg00151.php for the complete plan.	2006-01-05 10:07:46 +00:00
Tom Lane	195f164228	Get rid of the SpinLockAcquire/SpinLockAcquire_NoHoldoff distinction in favor of having just one set of macros that don't do HOLD/RESUME_INTERRUPTS (hence, these correspond to the old SpinLockAcquire_NoHoldoff case). Given our coding rules for spinlock use, there is no reason to allow CHECK_FOR_INTERRUPTS to be done while holding a spinlock, and also there is no situation where ImmediateInterruptOK will be true while holding a spinlock. Therefore doing HOLD/RESUME_INTERRUPTS while taking/releasing a spinlock is just a waste of cycles. Qingqing Zhou and Tom Lane.	2005-12-29 18:08:05 +00:00
Tom Lane	ab51bbaa06	Arrange to set the LC_XXX environment variables to match our locale setup. This protects against undesired changes in locale behavior if someone carelessly does setlocale(LC_ALL, "") (and we know who you are, perl guys).	2005-12-28 23:22:51 +00:00
Bruce Momjian	261114a23f	I have added these macros to c.h: #define HIGHBIT (0x80) #define IS_HIGHBIT_SET(ch) ((unsigned char)(ch) & HIGHBIT) and removed CSIGNBIT and mapped it uses to HIGHBIT. I have also added uses for IS_HIGHBIT_SET where appropriate. This change is purely for code clarity.	2005-12-25 02:14:19 +00:00
Tom Lane	656beff590	Adjust string comparison so that only bitwise-equal strings are considered equal: if strcoll claims two strings are equal, check it with strcmp, and sort according to strcmp if not identical. This fixes inconsistent behavior under glibc's hu_HU locale, and probably under some other locales as well. Also, take advantage of the now-well-defined behavior to speed up texteq, textne, bpchareq, bpcharne: they may as well just do a bitwise comparison and not bother with strcoll at all. NOTE: affected databases may need to REINDEX indexes on text columns to be sure they are self-consistent.	2005-12-22 22:50:00 +00:00
Tom Lane	ec0baf949e	Divide the lock manager's shared state into 'partitions', so as to reduce contention for the former single LockMgrLock. Per my recent proposal. I set it up for 16 partitions, but on a pgbench test this gives only a marginal further improvement over 4 partitions --- we need to test more scenarios to choose the number of partitions.	2005-12-11 21:02:18 +00:00
Tom Lane	cefcbbf1fd	Push the responsibility for handling ignore_killed_tuples down into _bt_checkkeys(), instead of checking it in the top-level nbtree.c routines as formerly. This saves a little bit of loop overhead, but more importantly it lets us skip performing the index key comparisons for dead tuples.	2005-12-07 19:37:53 +00:00
Tom Lane	f1b059af12	A couple of tiny performance hacks in _bt_step(). Remove PageIsEmpty checks, which were once needed because PageGetMaxOffsetNumber would fail on empty pages, but are now just redundant. Also, don't set up local variables that aren't needed in the fast path --- most of the time, we only need to advance offnum and not step across a page boundary. Motivated by noticing _bt_step at the top of OProfile profile for a pgbench run.	2005-12-07 18:03:48 +00:00
Tom Lane	887a7c61f6	Get rid of slru.c's hardwired insistence on a fixed number of slots per SLRU area. The number of slots is still a compile-time constant (someday we might want to change that), but at least it's a different constant for each SLRU area. Increase number of subtrans buffers to 32 based on experimentation with a heavily subtrans-bashing test case, and increase number of multixact member buffers to 16, since it's obviously silly for it not to be at least twice the number of multixact offset buffers.	2005-12-06 23:08:34 +00:00
Tom Lane	a615acf555	Arrange for read-only accesses to SLRU page buffers to take only a shared lock, not exclusive, if the desired page is already in memory. This can be demonstrated to be a significant win on the pg_subtrans cache when there is a large window of open transactions. It should be useful for pg_clog as well. I didn't try to make GetMultiXactIdMembers() use the code, as that would have taken some restructuring, and what with the local cache for multixact contents it probably wouldn't really make a difference. Per my recent proposal.	2005-12-06 18:10:06 +00:00
Tom Lane	a98871b7ac	Tweak indexscan machinery to avoid taking an AccessShareLock on an index if we already have a stronger lock due to the index's table being the update target table of the query. Same optimization I applied earlier at the table level. There doesn't seem to be much interest in the more radical idea of not locking indexes at all, so do what we can ...	2005-12-03 05:51:03 +00:00
Tom Lane	4c4eb57154	Some marginal additional hacking to shave a few more cycles off heapgettup.	2005-11-26 05:03:06 +00:00
Tom Lane	70f1482de3	Change seqscan logic so that we check visibility of all tuples on a page when we first read the page, rather than checking them one at a time. This allows us to take and release the buffer content lock just once per page, instead of once per tuple. Since it's a shared lock the contention penalty for holding the lock longer shouldn't be too bad. We can safely do this only when using an MVCC snapshot; else the assumption that visibility won't change over time is uncool. Therefore there are now two code paths depending on the snapshot type. I also made the same change in nodeBitmapHeapscan.c, where it can be done always because we only support MVCC snapshots for bitmap scans anyway. Also make some incidental cleanups in the APIs of these functions. Per a suggestion from Qingqing Zhou.	2005-11-26 03:03:07 +00:00
Bruce Momjian	436a2956d8	Re-run pgindent, fixing a problem where comment lines after a blank comment line where output as too long, and update typedefs for /lib directory. Also fix case where identifiers were used as variable names in the backend, but as typedefs in ecpg (favor the backend for indenting). Backpatch to 8.1.X.	2005-11-22 18:17:34 +00:00
Tom Lane	dd218ae7b0	Remove the t_datamcxt field of HeapTupleData. This was introduced for the convenience of tuptoaster.c and is no longer needed, so may as well get rid of some small amount of overhead.	2005-11-20 19:49:08 +00:00
Tom Lane	40314f2dac	Modify tuptoaster's API so that it does not try to modify the passed tuple in-place, but instead passes back an all-new tuple structure if any changes are needed. This is a much cleaner and more robust solution for the bug discovered by Alexey Beschiokov; accordingly, revert the quick hack I installed yesterday. With this change, HeapTupleData.t_datamcxt is no longer needed; will remove it in a separate commit in HEAD only.	2005-11-20 18:38:20 +00:00
Tom Lane	2a8d3d83ef	R-tree is dead ... long live GiST.	2005-11-07 17:36:47 +00:00
Tom Lane	6236991143	Add simple sanity checks on newly-read pages to GiST, too.	2005-11-06 22:39:21 +00:00
Tom Lane	766dc45d9f	Add defenses to btree and hash index AMs to do simple sanity checks on every index page they read; in particular to catch the case of an all-zero page, which PageHeaderIsValid allows to pass. It turns out hash already had this idea, but it was just Assert()ing things rather than doing a straight error check, and the Asserts were partially redundant with PageHeaderIsValid anyway. Per recent failure example from Jim Nasby. (gist still needs the same treatment.)	2005-11-06 19:29:01 +00:00
Tom Lane	18691d8ee3	Clean up representation of SLRU page state. This is the cleaner fix for the SLRU race condition that I posted a few days ago, but we decided not to use in 8.1 and older branches.	2005-11-05 21:19:47 +00:00
Alvaro Herrera	902377c465	Rename the members of CommandDest enum so they don't collide with other uses of those names. (Debug and None were pretty bad names anyway.) I hope I catched all uses of the names in comments too.	2005-11-03 17:11:40 +00:00
Tom Lane	99d48695d4	Fix longstanding race condition in transaction log management: there was a very narrow window in which SimpleLruReadPage or SimpleLruWritePage could think that I/O was needed when it wasn't (and indeed the buffer had already been assigned to another page). This would result in an Assert failure if Asserts were enabled, and probably in silent data corruption if not. Reported independently by Jim Nasby and Robert Creager. I intend a more extensive fix when 8.2 development starts, but this is a reasonably low-impact patch for the existing branches.	2005-11-03 00:23:36 +00:00
Peter Eisentraut	07bb9f086b	Message corrections	2005-10-29 00:31:52 +00:00
Tom Lane	a037926295	Reorder code so that we don't have to hold a critical section while reserving SLRU space for a new MultiXact. The original coding would have treated out-of-disk-space as a PANIC condition, which is unnecessary.	2005-10-28 19:00:19 +00:00
Tom Lane	1986ca5ce5	Fix race condition in multixact code: it's possible to try to read a multixact's starting offset before the offset has been stored into the SLRU file. A simple fix would be to hold the MultiXactGenLock until the offset has been stored, but that looks like a big concurrency hit. Instead rely on knowledge that unset offsets will be zero, and loop when we see a zero. This requires a little extra hacking to ensure that zero is never a valid value for the offset. Problem reported by Matteo Beccati, fix ideas from Martijn van Oosterhout, Alvaro Herrera, and Tom Lane.	2005-10-28 17:27:29 +00:00
Tom Lane	6d6c3722fb	Make code for selecting default WAL sync method less confusing.	2005-10-22 20:27:17 +00:00
Tom Lane	d9cb48786e	Better solution to the problem of labeling whole-row Datums that are generated from subquery outputs: use the type info stored in the Var itself. To avoid making ExecEvalVar and slot_getattr more complex and slower, I split out the whole-row case into a separate ExecEval routine.	2005-10-19 22:30:30 +00:00
Tom Lane	07908c9c37	Ensure that the Datum generated from a whole-row Var contains valid type ID information even when it's a record type. This is needed to handle whole-row Vars referencing subquery outputs. Per example from Richard Huxton.	2005-10-19 18:18:33 +00:00
Tom Lane	23836fb1fb	A few trivial code cleanups motivated by reading warnings generated by a recent HP C compiler. Mostly, get rid of useless local variables that are assigned to but never used.	2005-10-18 01:06:24 +00:00
Bruce Momjian	1dc3498251	Standard pgindent run for 8.1.	2005-10-15 02:49:52 +00:00
Bruce Momjian	1d028537a2	This makes the error messages for PREPARE TRANSACTION, COMMIT PREPARED etc. match the docs, which talk about "transaction identifier" not "gid" or "global transaction identifier". Steve Woodcock	2005-10-13 22:55:55 +00:00
Bruce Momjian	84cc9a4bb3	Back out this because of fear of changing error strings: This makes the error messages for PREPARE TRANSACTION, COMMIT PREPARED etc. match the docs, which talk about "transaction identifier" not "gid" or "global transaction identifier". Steve Woodcock	2005-10-13 17:57:57 +00:00
Bruce Momjian	90c22c9206	This makes the error messages for PREPARE TRANSACTION, COMMIT PREPARED etc. match the docs, which talk about "transaction identifier" not "gid" or "global transaction identifier". Steve Woodcock	2005-10-13 17:57:17 +00:00
Tom Lane	e952ae1268	Fix longstanding bug found by Atsushi Ogawa: _bt_check_unique would mark the wrong buffer dirty when trying to kill a dead index entry that's on a page after the one it started on. No risk of data corruption, just inefficiency, but still a bug.	2005-10-12 17:18:03 +00:00
Tom Lane	cb8b6618ce	Revise pgstats stuff to fix the problems with not counting accesses generated by bitmap index scans. Along the way, simplify and speed up the code for counting sequential and index scans; it was both confusing and inefficient to be taking care of that in the per-tuple loops, IMHO. initdb forced because of internal changes in pg_stat view definitions.	2005-10-06 02:29:23 +00:00
Tom Lane	64eea6c21d	Expand pg_control information so that we can verify that the database was created on a machine with alignment rules and floating-point format similar to the current machine. Per recent discussion, this seems like a good idea with the increasing prevalence of 32/64 bit environments.	2005-10-03 00:28:43 +00:00
Tom Lane	303e089df5	Clean up possibly-uninitialized-variable warnings reported by gcc 4.x.	2005-09-24 22:54:44 +00:00
Bruce Momjian	b3364fc81b	pgindent new GIST index code, per request from Tom.	2005-09-22 20:44:36 +00:00
Tom Lane	08817bdb76	Adjust GiST error messages to conform to message style guidelines.	2005-09-22 18:49:45 +00:00
Teodor Sigaev	f4516f8732	Small fixes	2005-09-16 14:40:54 +00:00
Neil Conway	1dd9b09332	Copy-editing for GiST README.	2005-09-15 17:44:27 +00:00
Teodor Sigaev	79fae4a764	Readme about GiST's algorithms	2005-09-15 16:39:15 +00:00
Tom Lane	35e9b1cc1e	Clean up a couple of ad-hoc computations of the maximum number of tuples on a page, as suggested by ITAGAKI Takahiro. Also, change a few places that were using some other estimates of max-items-per-page to consistently use MaxOffsetNumber. This is conservatively large --- we could have used the new MaxHeapTuplesPerPage macro, or a similar one for index tuples --- but those places are simply declaring a fixed-size buffer and assuming it will work, rather than actively testing for overrun. It seems safer to size these buffers in a way that can't overflow even if the page is corrupt.	2005-09-02 19:02:20 +00:00
Tom Lane	037709e0b3	Reduce default value of max_prepared_transactions from 50 to 5. This saves nearly 700kB in the default shared memory segment size, which seems worthwhile, and it is a feature that many users won't use anyway. Per Heikki's argument, there is no point in a compromise value --- those who are using 2PC at all will probably want it at least equal to max_connections. But we can't set it to zero by default without breaking the prepared_xacts regression test.	2005-08-29 21:38:18 +00:00
Tom Lane	9052537325	Rewrite gather-write patch into something less obviously bolted on after the fact. Fix bug with incorrect test for whether we are at end of logfile segment. Arrange for writes triggered by XLogInsert's is-cache-more-than-half-full test to synchronize with the cache boundaries, so that in long transactions we tend to write alternating halves of the cache rather than randomly chosen portions of it; this saves one more write syscall per cache load.	2005-08-22 23:59:04 +00:00
Bruce Momjian	8ad3965a11	Improve xid wraparound message (the server isn't really shut down, just not accepting queries). errmsg("database is not accepting queries to avoid wraparound data loss in database \"%s\"", errhint("Stop the postmaster and use a standalone backend to VACUUM database \"%s\".",	2005-08-22 16:59:47 +00:00
Tom Lane	d0096a41fa	Fix some inconsistent choices of datatypes in xlog.c. Make buffer indexes all be int, rather than variously int, uint16 and uint32; add some casts where necessary to support large buffer arrays.	2005-08-22 00:41:28 +00:00
Tom Lane	f39f6b500f	Seems that the childXids list would be better based on Oid lists than integer lists.	2005-08-20 23:45:08 +00:00
Tom Lane	0007490e09	Convert the arithmetic for shared memory size calculation from 'int' to 'Size' (that is, size_t), and install overflow detection checks in it. This allows us to remove the former arbitrary restrictions on NBuffers etc. It won't make any difference in a 32-bit machine, but in a 64-bit machine you could theoretically have terabytes of shared buffers. (How efficiently we could manage 'em remains to be seen.) Similarly, num_temp_buffers, work_mem, and maintenance_work_mem can be set above 2Gb on a 64-bit machine. Original patch from Koichi Suzuki, additional work by moi.	2005-08-20 23:26:37 +00:00
Tatsuo Ishii	ba2fc7eb4b	Make GetMultiXactIdMembers() a public function.	2005-08-20 01:29:27 +00:00
Tom Lane	f57e3f4cf3	Repair problems with VACUUM destroying t_ctid chains too soon, and with insufficient paranoia in code that follows t_ctid links. (We must do both because even with VACUUM doing it properly, the intermediate state with a dangling t_ctid link is visible concurrently during lazy VACUUM, and could be seen afterwards if either type of VACUUM crashes partway through.) Also try to improve documentation about what's going on. Patch is a bit bulky because passing the XMAX information around required changing the APIs of some low-level heapam.c routines, but it's not conceptually very complicated. Per trouble report from Teodor and subsequent analysis. This needs to be back-patched, but I'll do that after 8.1 beta is out.	2005-08-20 00:40:32 +00:00
Tom Lane	f8d0a82bf9	Avoid an Assert failure if OuterUserId hasn't been set yet during AbortTransaction. This can happen if a backend's InitPostgres transaction fails (eg, because the given username is invalid). Per Alvaro.	2005-08-17 22:14:34 +00:00
Tom Lane	6ea05c16a4	Change a couple of "can't happen" error messages to be a shade more verbose when they do happen. The "left link changed unexpectedly" one in particular has been seen more than once in the field.	2005-08-12 14:34:14 +00:00
Tom Lane	721e53785d	Solve the problem of OID collisions by probing for duplicate OIDs whenever we generate a new OID. This prevents occasional duplicate-OID errors that can otherwise occur once the OID counter has wrapped around. Duplicate relfilenode values are also checked for when creating new physical files. Per my recent proposal.	2005-08-12 01:36:05 +00:00
Tom Lane	d90c531188	Autovacuum loose end mop-up. Provide autovacuum-specific vacuum cost delay and limit, both as global GUCs and as table-specific entries in pg_autovacuum. stats_reset_on_server_start is now OFF by default, but a reset is forced if we did WAL replay. XID-wrap vacuums do not ANALYZE, but do FREEZE if it's a template database. Alvaro Herrera	2005-08-11 21:11:50 +00:00
Bruce Momjian	949ebbd55e	Mention MD5 function index for indexing long values.	2005-08-11 13:22:33 +00:00
Tom Lane	24ff62d76f	Make new hints follow style guide.	2005-08-10 22:39:00 +00:00
Bruce Momjian	237be3cc29	Add hints to cases where indexes fail because of values that are too long.	2005-08-10 21:36:46 +00:00
Tom Lane	4568e0f791	Modify AtEOXact_CatCache and AtEOXact_RelationCache to assume that the ResourceOwner mechanism already released all reference counts for the cache entries; therefore, we do not need to scan the catcache or relcache at transaction end, unless we want to do it as a debugging crosscheck. Do the crosscheck only in Assert mode. This is the same logic we had previously installed in AtEOXact_Buffers to avoid overhead with large numbers of shared buffers. I thought it'd be a good idea to do it here too, in view of Kari Lavikka's recent report showing a real-world case where AtEOXact_CatCache is taking a significant fraction of runtime.	2005-08-08 19:17:23 +00:00
Tom Lane	0001e98d54	Code and docs review for pg_column_size() patch.	2005-08-02 16:11:57 +00:00
Tom Lane	2a4fad1a0e	Add NOWAIT option to SELECT FOR UPDATE/SHARE. Original patch by Hans-Juergen Schoenig, revisions by Karel Zak and Tom Lane.	2005-08-01 20:31:16 +00:00
Tom Lane	d42cf5a42a	Add per-user and per-database connection limit options. This patch also includes preliminary update of pg_dumpall for roles. Petr Jelinek, with review by Bruce Momjian and Tom Lane.	2005-07-31 17:19:22 +00:00
Bruce Momjian	5b0bfec414	Fix compile for no O_SYNC, but introduced with O_DIRECT.	2005-07-30 14:15:44 +00:00
Tom Lane	5d5f1a79e6	Clean up a number of autovacuum loose ends. Make the stats collector track shared relations in a separate hashtable, so that operations done from different databases are counted correctly. Add proper support for anti-XID-wraparound vacuuming, even in databases that are never connected to and so have no stats entries. Miscellaneous other bug fixes. Alvaro Herrera, some additional fixes by Tom Lane.	2005-07-29 19:30:09 +00:00
Bruce Momjian	c6b1724c67	Update O_DIRECT comment.	2005-07-29 03:25:53 +00:00
Bruce Momjian	c34bb00581	Use O_DIRECT if available when using O_SYNC for wal_sync_method. Also, write multiple WAL buffers out in one write() operation. ITAGAKI Takahiro --------------------------------------------------------------------------- > If we disable writeback-cache and use open_sync, the per-page writing > behavior in WAL module will show up as bad result. O_DIRECT is similar > to O_DSYNC (at least on linux), so that the benefit of it will disappear > behind the slow disk revolution. > > In the current source, WAL is written as: > for (i = 0; i < N; i++) { write(&buffers[i], BLCKSZ); } > Is this intentional? Can we rewrite it as follows? > write(&buffers[0], N * BLCKSZ); > > In order to achieve it, I wrote a 'gather-write' patch (xlog.gw.diff). > Aside from this, I'll also send the fixed direct io patch (xlog.dio.diff). > These two patches are independent, so they can be applied either or both. > > > I tested them on my machine and the results as follows. It shows that > direct-io and gather-write is the best choice when writeback-cache is off. > Are these two patches worth trying if they are used together? > > > \| writeback \| fsync= \| fdata \| open_ \| fsync_ \| open_ > patch \| cache \| false \| sync \| sync \| direct \| direct > ------------+-----------+--------+-------+-------+--------+--------- > direct io \| off \| 124.2 \| 105.7 \| 48.3 \| 48.3 \| 48.2 > direct io \| on \| 129.1 \| 112.3 \| 114.1 \| 142.9 \| 144.5 > gather-write\| off \| 124.3 \| 108.7 \| 105.4 \| (N/A) \| (N/A) > both \| off \| 131.5 \| 115.5 \| 114.4 \| 145.4 \| 145.2 > > - 20runs * pgbench -s 100 -c 50 -t 200 > - with tuning (wal_buffers=64, commit_delay=500, checkpoint_segments=8) > - using 2 ATA disks: > - hda(reiserfs) includes system and wal. > - hdc(jfs) includes database files. writeback-cache is always on. > > --- > ITAGAKI Takahiro	2005-07-29 03:22:33 +00:00
Tom Lane	e5d6b91220	Add SET ROLE. This is a partial commit of Stephen Frost's recent patch; I'm still working on the has_role function and information_schema changes.	2005-07-25 22:12:34 +00:00
Bruce Momjian	9af9d674c6	Remove unintended code addition.	2005-07-23 15:31:16 +00:00
Bruce Momjian	4098c8867d	Macro alignment cleanup.	2005-07-23 15:29:47 +00:00
Tom Lane	f2bf2d2dc5	Fix a couple of bogus comments, per Alvaro.	2005-07-13 22:46:09 +00:00
Tom Lane	d7207cfc6b	Even though I'd like to see full_page_writes go away before 8.1, a minimum requirement is that it not completely break the system meanwhile. Put the test in the right place.	2005-07-08 04:07:26 +00:00
Bruce Momjian	a923602855	Add pg_column_size() to return storage size of a column, including possible compression. Mark Kirkwood	2005-07-06 19:02:54 +00:00
Bruce Momjian	326a7a0788	Add GUC full_page_writes to control writing full pages to WAL.	2005-07-05 23:18:10 +00:00
Tom Lane	eb5949d190	Arrange for the postmaster (and standalone backends, initdb, etc) to chdir into PGDATA and subsequently use relative paths instead of absolute paths to access all files under PGDATA. This seems to give a small performance improvement, and it should make the system more robust against naive DBAs doing things like moving a database directory that has a live postmaster in it. Per recent discussion.	2005-07-04 04:51:52 +00:00
Tom Lane	e7e1694295	Migrate rtree_gist functionality into the core system, and add some basic regression tests for GiST to the standard regression tests. I took the opportunity to add an rtree-equivalent gist opclass for circles; the contrib version only covered boxes and polygons, but indexing circles is very handy for distance searches.	2005-07-01 19:19:05 +00:00
Teodor Sigaev	5350216156	Improve error messages and add comment	2005-07-01 13:18:17 +00:00
Teodor Sigaev	898a7bd13b	Bug fixes for GiST crash recovery. - add forgotten check of lsn for insert completion - remove level of pages: hard to check in recovery - some cleanups	2005-06-30 17:52:14 +00:00
Tom Lane	401de9c8be	Improve the checkpoint signaling mechanism so that the bgwriter can tell the difference between checkpoints forced due to WAL segment consumption and checkpoints forced for other reasons (such as CREATE DATABASE). Avoid generating 'checkpoints are occurring too frequently' messages when the checkpoint wasn't caused by WAL segment consumption. Per gripe from Chris K-L.	2005-06-30 00:00:52 +00:00
Tom Lane	b5f7cff84f	Clean up the rather historically encumbered interface to now() and current time: provide a GetCurrentTimestamp() function that returns current time in the form of a TimestampTz, instead of separate time_t and microseconds fields. This is what all the callers really want anyway, and it eliminates low-level dependencies on AbsoluteTime, which is a deprecated datatype that will have to disappear eventually.	2005-06-29 22:51:57 +00:00
Teodor Sigaev	4523e0b63a	Cleanup, remove unneeded pallocs	2005-06-29 14:06:14 +00:00
Teodor Sigaev	88b49cdc95	Code cleanup. gistfillbuffer accepts InvalidOffsetNumber.	2005-06-28 15:51:00 +00:00
Tom Lane	7762619e95	Replace pg_shadow and pg_group by new role-capable catalogs pg_authid and pg_auth_members. There are still many loose ends to finish in this patch (no documentation, no regression tests, no pg_dump support for instance). But I'm going to commit it now anyway so that Alvaro can make some progress on shared dependencies. The catalog changes should be pretty much done.	2005-06-28 05:09:14 +00:00
Teodor Sigaev	e8cab5fe49	Concurrency for GiST - full concurrency for insert/update/select/vacuum: - select and vacuum never locks more than one page simultaneously - select (gettuple) hasn't any lock across it's calls - insert never locks more than two page simultaneously: - during search of leaf to insert it locks only one page simultaneously - while walk upward to the root it locked only parent (may be non-direct parent) and child. One of them X-lock, another may be S- or X-lock - 'vacuum full' locks index - improve gistgetmulti - simplify XLOG records Fix bug in index_beginscan_internal: LockRelation may clean rd_aminfo structure, so move GET_REL_PROCEDURE after LockRelation	2005-06-27 12:45:23 +00:00
Tom Lane	b90f8f20f0	Extend r-tree operator classes to handle Y-direction tests equivalent to the existing X-direction tests. An rtree class now includes 4 actual 2-D tests, 4 1-D X-direction tests, and 4 1-D Y-direction tests. This involved adding four new Y-direction test operators for each of box and polygon; I followed the PostGIS project's lead as to the names of these operators. NON BACKWARDS COMPATIBLE CHANGE: the poly_overleft (&<) and poly_overright (&>) operators now have semantics comparable to box_overleft and box_overright. This is necessary to make r-tree indexes work correctly on polygons. Also, I changed circle_left and circle_right to agree with box_left and box_right --- formerly they allowed the boundaries to touch. This isn't actually essential given the lack of any r-tree opclass for circles, but it seems best to sync all the definitions while we are at it.	2005-06-24 20:53:34 +00:00
Tom Lane	9a09248edd	Fix rtree and contrib/rtree_gist search behavior for the 1-D box and polygon operators (<<, &<, >>, &>). Per ideas originally put forward by andrew@supernews and later rediscovered by moi. This patch just fixes the existing opclasses, and does not add any new behavior as I proposed earlier; that can be sorted out later. In principle this could be back-patched, since it changes only search behavior and not system catalog entries nor rtree index contents. I'm not currently planning to do that, though, since I think it could use more testing.	2005-06-24 00:18:52 +00:00
Tom Lane	e98edb5555	Fix the mechanism for reporting the original table OID and column number of columns of a query result so that it can "see through" cursors and prepared statements. Per gripe a couple months back from John DeSoi.	2005-06-22 17:45:46 +00:00
Tom Lane	b95ae32b41	Avoid WAL-logging individual tuple insertions during CREATE TABLE AS (a/k/a SELECT INTO). Instead, flush and fsync the whole relation before committing. We do still need the WAL log when PITR is active, however. Simon Riggs and Tom Lane.	2005-06-20 18:37:02 +00:00
Teodor Sigaev	1bfdd1a893	fix founded hole in recovery after crash, add vacuum_delay_point()	2005-06-20 15:22:38 +00:00
Teodor Sigaev	d544ec8bbd	1. full functional WAL for GiST 2. improve vacuum for gist - use FSM - full vacuum: - reforms parent tuple if it's needed ( tuples was deleted on child page or parent tuple remains invalid after crash recovery ) - truncate index file if possible 3. fixes bugs and mistakes	2005-06-20 10:29:37 +00:00
Tom Lane	d961a56899	Avoid unnecessary palloc overhead in _bt_first(). The temporary scankeys arrays that it needs can never have more than INDEX_MAX_KEYS entries, so it's reasonable to just allocate them as fixed-size local arrays, and save the cost of palloc/pfree. Not a huge savings, but a cycle saved is a cycle earned ...	2005-06-19 22:41:00 +00:00
Tom Lane	fc654583ab	Need #include <time.h> on some platforms.	2005-06-19 22:34:56 +00:00
Tom Lane	3f749924f8	Simplify uses of readdir() by creating a function ReadDir() that includes error checking and an appropriate ereport(ERROR) message. This gets rid of rather tedious and error-prone manipulation of errno, as well as a Windows-specific bug workaround, at more than a dozen call sites. After an idea in a recent patch by Heikki Linnakangas.	2005-06-19 21:34:03 +00:00
Tom Lane	e26b0abda3	Arrange to fsync two-phase-commit state files only during checkpoints; given reasonably short lifespans for prepared transactions, this should mean that only a small minority of state files ever need to be fsynced at all. Per discussion with Heikki Linnakangas.	2005-06-19 20:00:39 +00:00
Tom Lane	a8d1075f27	Add a time-of-preparation column to the pg_prepared_xacts view, per an old suggestion by Oliver Jowett. Also, add a transaction column to the pg_locks view to show the xid of each transaction holding or awaiting locks; this allows prepared transactions to be properly associated with the locks they own. There was already a column named 'transaction', and I chose to rename it to 'transactionid' --- since this column is new in the current devel cycle there should be no backwards compatibility issue to worry about.	2005-06-18 19:33:42 +00:00
Tom Lane	66b098492e	Dept. of second thoughts: regular COMMIT deletes deletable files before releasing locks, so COMMIT PREPARED should too.	2005-06-18 05:21:09 +00:00
Tom Lane	d0a89683a3	Two-phase commit. Original patch by Heikki Linnakangas, with additional hacking by Alvaro Herrera and Tom Lane.	2005-06-17 22:32:51 +00:00
Bruce Momjian	f4d907ca85	Remove old .backup files when we do pg_stop_backup(). This prevents a large number of .backup files from existing in pg_xlog/	2005-06-15 01:36:08 +00:00
Teodor Sigaev	37c839365c	WAL for GiST. It work for online backup and so on, but on recovery after crash (power loss etc) it may say that it can't restore index and index should be reindexed. Some refactoring code.	2005-06-14 11:45:14 +00:00
Tom Lane	c186c93148	Change the planner to allow indexscan qualification clauses to use nonconsecutive columns of a multicolumn index, as per discussion around mid-May (pghackers thread "Best way to scan on-disk bitmaps"). This turns out to require only minimal changes in btree, and so far as I can see none at all in GiST. btcostestimate did need some work, but its original assumption that index selectivity == heap selectivity was quite bogus even before this.	2005-06-13 23:14:49 +00:00
Bruce Momjian	51746c4549	Free buffer allocated via malloc (process is short-lived, but fix it anyway).	2005-06-09 22:36:27 +00:00
Tom Lane	dce83e76e0	Add missing #include -- mea culpa.	2005-06-09 21:01:25 +00:00
Tom Lane	82358ca936	Put a critical section around update of hash index metapage. Per discussion with Qingqing Zhou.	2005-06-09 18:23:50 +00:00
Tom Lane	f5b2f60bd1	Change WAL-logging scheme for multixacts to be more like regular transaction IDs, rather than like subtrans; in particular, the information now survives a database restart. Per previous discussion, this is essential for PITR log shipping and for 2PC.	2005-06-08 15:50:28 +00:00
Tom Lane	ee7ac7b11e	Modify XLogInsert API to make callers specify whether pages to be backed up have the standard layout with unused space between pd_lower and pd_upper. When this is set, XLogInsert will omit the unused space without bothering to scan it to see if it's zero. That saves time in XLogInsert, and also allows reversion of my earlier patch to make PageRepairFragmentation et al explicitly re-zero freed space. Per suggestion by Heikki Linnakangas.	2005-06-06 20:22:58 +00:00
Tom Lane	4c8495a1f2	Remove the mostly-stubbed-out-anyway support routines for WAL UNDO. That code is never going to be used in the foreseeable future, and where it's more than a stub it's making the redo routines harder to read.	2005-06-06 17:01:25 +00:00
Tom Lane	21fda22ec4	Change CRCs in WAL records from 64bit to 32bit for performance reasons. Instead of a separate CRC on each backup block, include backup blocks in their parent WAL record's CRC; this is important to ensure that the backup block really goes with the WAL record, ie there was not a page tear right at the start of the backup block. Implement a simple form of compression of backup blocks: drop any run of zeroes starting at pd_lower, so as not to store the unused 'hole' that commonly exists in PG heap and index pages. Tweak PageRepairFragmentation and related routines to ensure they keep the unused space zeroed, so that the above compression method remains effective. All per recent discussions.	2005-06-02 05:55:29 +00:00
Tom Lane	a91fa39028	Add test to WAL replay to verify that xl_prev points back to the previous WAL record; this is necessary to be sure we recognize stale WAL records when a WAL page was only partially written during a system crash.	2005-05-31 19:10:28 +00:00
Tom Lane	e92a88272e	Modify hash_search() API to prevent future occurrences of the error spotted by Qingqing Zhou. The HASH_ENTER action now automatically fails with elog(ERROR) on out-of-memory --- which incidentally lets us eliminate duplicate error checks in quite a bunch of places. If you really need the old return-NULL-on-out-of-memory behavior, you can ask for HASH_ENTER_NULL. But there is now an Assert in that path checking that you aren't hoping to get that behavior in a palloc-based hash table. Along the way, remove the old HASH_FIND_SAVE/HASH_REMOVE_SAVED actions, which were not being used anywhere anymore, and were surely too ugly and unsafe to want to see revived again.	2005-05-29 04:23:07 +00:00
Tom Lane	32e8fc4a28	Arrange to cache fmgr lookup information for an index's access method routines in the index's relcache entry, instead of doing a fresh fmgr_info on every index access. We were already doing this for the index's opclass support functions; not sure why we didn't think to do it for the AM functions too. This supersedes the former method of caching (only) amgettuple in indexscan scan descriptors; it's an improvement because the function lookup can be amortized across multiple statements instead of being repeated for each statement. Even though lookup for builtin functions is pretty cheap, this seems to drop a percent or two off some simple benchmarks.	2005-05-27 23:31:21 +00:00
Bruce Momjian	b492c3accc	Add parentheses to macros when args are used in computations. Without them, the executation behavior could be unexpected.	2005-05-25 21:40:43 +00:00
Bruce Momjian	6dc7760ac3	Add support for wal_fsync_writethrough for Darwin, and restructure the code to better handle writethrough. Chris Campbell	2005-05-20 14:53:26 +00:00
Tom Lane	ff0c143a3b	Make a comment pgindent-proof, per suggestion from Alvaro.	2005-05-19 23:58:51 +00:00
Tom Lane	ee3b71f6bc	Split the shared-memory array of PGPROC pointers out of the sinval communication structure, and make it its own module with its own lock. This should reduce contention at least a little, and it definitely makes the code seem cleaner. Per my recent proposal.	2005-05-19 21:35:48 +00:00
Neil Conway	c891e05f26	Cleanup GiST header files. Since GiST extensions are often written as external projects, we should be careful about what parts of the GiST API are considered implementation details, and which are part of the public API. Therefore, I've moved internal-only declarations into gist_private.h -- future backward-incompatible changes to gist.h should be made with care, to avoid needlessly breaking external GiST extensions. Also did some related header cleanup: remove some unnecessary #includes from gist.h, and remove some unused definitions: isAttByVal(), _gistdump(), and GISTNStrategies.	2005-05-17 03:34:18 +00:00
Neil Conway	eda6dd32d1	GiST improvements: - make sure we always invoke user-supplied GiST methods in a short-lived memory context. This means the backend isn't exposed to any memory leaks that be in those methods (in fact, it is probably a net loss for most GiST methods to bother manually freeing memory now). This also means we can do away with a lot of ugly manual memory management in the GiST code itself. - keep the current page of a GiST index scan pinned, rather than doing a ReadBuffer() for each tuple produced by the scan. Since ReadBuffer() is expensive, this is a perf. win - implement dead tuple killing for GiST indexes (which is easy to do, now that we keep a pin on the current scan page). Now all the builtin indexes implement dead tuple killing. - cleanup a lot of ugly code in GiST	2005-05-17 00:59:30 +00:00
Tom Lane	2ef172a2a4	Fix latent bug in ExecSeqRestrPos: it leaves the plan node's result slot in an inconsistent state. (This is only latent because in reality ExecSeqRestrPos is dead code at the moment ... but someday maybe it won't be.) Add some comments about what the API for plan node mark/restore actually is, because it's not immediately obvious.	2005-05-15 21:19:55 +00:00
Neil Conway	eb0d00a9be	Various style cleanups for GiST; no changes to functionality.	2005-05-15 04:08:29 +00:00
Neil Conway	3140437495	This patch refactors away some duplicated code in the index AM build methods: they all invoke UpdateStats() since they have computed the number of heap tuples, so I created a function in catalog/index.c that each AM now calls.	2005-05-11 06:24:55 +00:00
Neil Conway	f38e413b20	Code cleanup: in C89, there is no point casting the first argument to memset() or MemSet() to a char . For one, memset()'s first argument is a void , and further void * can be implicitly coerced to/from any other pointer type.	2005-05-11 01:26:02 +00:00
Bruce Momjian	35e1651508	Back out check for unreferenced files. Heikki Linnakangas	2005-05-10 22:27:30 +00:00
Neil Conway	dc5ebcfcce	Fix typo in comment.	2005-05-10 05:15:07 +00:00
Tom Lane	30f540be43	Repair very-low-probability race condition between relation extension and VACUUM: in the interval between adding a new page to the relation and formatting it, it was possible for VACUUM to come along and decide it should format the page too. Though not harmful in itself, this would cause data loss if a third transaction were able to insert tuples into the vacuumed page before the original extender got control back.	2005-05-07 21:32:24 +00:00
Tom Lane	4cd4ed0cc2	Fix case in which a debug printout would print already-pfreed data.	2005-05-07 18:14:25 +00:00
Tom Lane	278bd0cc22	For some reason access/tupmacs.h has been #including utils/memutils.h, which is neither needed by nor related to that header. Remove the bogus inclusion and instead include the header in those C files that actually need it. Also fix unnecessary inclusions and bad inclusion order in tsearch2 files.	2005-05-06 17:24:55 +00:00
Tom Lane	126eaef651	Clean up MultiXactIdExpand's API by separating out the case where we are creating a new MultiXactId from two regular XIDs. The original coding was unnecessarily complicated and didn't save any code anyway.	2005-05-03 19:42:41 +00:00
Bruce Momjian	76668e6eb4	Check the file system on postmaster startup and report any unreferenced files in the server log. Heikki Linnakangas	2005-05-02 18:26:54 +00:00
Tom Lane	6c412f0605	Change CREATE TYPE to require datatype output and send functions to have only one argument. (Per recent discussion, the option to accept multiple arguments is pretty useless for user-defined types, and would be a likely source of security holes if it was used.) Simplify call sites of output/send functions to not bother passing more than one argument.	2005-05-01 18:56:19 +00:00
Tom Lane	93b2477278	Use the standard lock manager to establish priority order when there is contention for a tuple-level lock. This solves the problem of a would-be exclusive locker being starved out by an indefinite succession of share-lockers. Per recent discussion with Alvaro.	2005-04-30 19:03:33 +00:00
Tom Lane	3a694bb0a1	Restructure LOCKTAG as per discussions of a couple months ago. Essentially, we shoehorn in a lockable-object-type field by taking a byte away from the lockmethodid, which can surely fit in one byte instead of two. This allows less artificial definitions of all the other fields of LOCKTAG; we can get rid of the special pg_xactlock pseudo-relation, and also support locks on individual tuples and general database objects (including shared objects). None of those possibilities are actually exploited just yet, however. I removed pg_xactlock from pg_class, but did not force initdb for that change. At this point, relkind 's' (SPECIAL) is unused and could be removed entirely.	2005-04-29 22:28:24 +00:00
Tom Lane	bedb78d386	Implement sharable row-level locks, and use them for foreign key references to eliminate unnecessary deadlocks. This commit adds SELECT ... FOR SHARE paralleling SELECT ... FOR UPDATE. The implementation uses a new SLRU data structure (managed much like pg_subtrans) to represent multiple- transaction-ID sets. When more than one transaction is holding a shared lock on a particular row, we create a MultiXactId representing that set of transactions and store its ID in the row's XMAX. This scheme allows an effectively unlimited number of row locks, just as we did before, while not costing any extra overhead except when a shared lock actually has to be shared. Still TODO: use the regular lock manager to control the grant order when multiple backends are waiting for a row lock. Alvaro Herrera and Tom Lane.	2005-04-28 21:47:18 +00:00
Tom Lane	19d127548c	Add comment about checkpoint panic behavior during shutdown, per suggestion from Qingqing Zhou.	2005-04-23 18:49:54 +00:00
Tom Lane	3842892492	Recent changes got the sense of the notnull bit backwards in the 2.0 protocol output routines. Mea culpa :-(. Per report from Kris Jurka.	2005-04-23 17:45:35 +00:00
Bruce Momjian	1a6ad669fb	Fix comment typo.	2005-04-17 03:04:29 +00:00
Tom Lane	5f0a974ea9	Reduce PANIC to ERROR in several xlog routines that are used in both critical and noncritical contexts (an example of noncritical being post-checkpoint removal of dead xlog segments). In the critical cases the CRIT_SECTION mechanism will cause ERROR to be promoted to PANIC anyway, and in the noncritical cases we shouldn't let an error take down the entire database. Arguably there should be no explicit PANIC errors in this module, only more START/END_CRIT_SECTION calls, but I didn't go that far. (Yet.)	2005-04-15 22:19:48 +00:00
Tom Lane	61b861421b	Modify MoveOfflineLogs/InstallXLogFileSegment to avoid O(N^2) behavior when recycling a large number of xlog segments during checkpoint. The former behavior searched from the same start point each time, requiring O(checkpoint_segments^2) stat() calls to relocate all the segments. Instead keep track of where we stopped last time through.	2005-04-15 18:48:10 +00:00
Tom Lane	8e14408028	Make equalTupleDescs() compare attlen/attbyval/attalign rather than assuming comparison of atttypid is sufficient. In a dropped column atttypid will be 0, and we'd better check the physical-storage data to make sure the tupdescs are physically compatible. I do not believe there is a real risk before 8.0, since before that we only used this routine to compare successive states of the tupdesc for a particular relation. But 8.0's typcache.c might be comparing arbitrary tupdescs so we'd better play it safer.	2005-04-14 22:34:48 +00:00
Tom Lane	162bd08b3f	Completion of project to use fixed OIDs for all system catalogs and indexes. Replace all heap_openr and index_openr calls by heap_open and index_open. Remove runtime lookups of catalog OID numbers in various places. Remove relcache's support for looking up system catalogs by name. Bulky but mostly very boring patch ...	2005-04-14 20:03:27 +00:00
Tom Lane	2193a856a2	Simplify initdb-time assignment of OIDs as I proposed yesterday, and avoid encroaching on the 'user' range of OIDs by allowing automatic OID assignment to use values below 16k until we reach normal operation. initdb not forced since this doesn't make any incompatible change; however a lot of stuff will have different OIDs after your next initdb.	2005-04-13 18:54:57 +00:00
Tom Lane	c3294f1cbf	Fix interaction between materializing holdable cursors and firing deferred triggers: either one can create more work for the other, so we have to loop till it's all gone. Per example from andrew@supernews. Add a regression test to help spot trouble in this area in future.	2005-04-11 19:51:16 +00:00
Tom Lane	ad161bcc8a	Merge Resdom nodes into TargetEntry nodes to simplify code and save a few palloc's. I also chose to eliminate the restype and restypmod fields entirely, since they are redundant with information stored in the node's contained expression; re-examining the expression at need seems simpler and more reliable than trying to keep restype/restypmod up to date. initdb forced due to change in contents of stored rules.	2005-04-06 16:34:07 +00:00
Tom Lane	47888fe842	First phase of OUT-parameters project. We can now define and use SQL functions with OUT parameters. The various PLs still need work, as does pg_dump. Rudimentary docs and regression tests included.	2005-03-31 22:46:33 +00:00
Tom Lane	8c85a34a3b	Officially decouple FUNC_MAX_ARGS from INDEX_MAX_KEYS, and set the former to 100 by default. Clean up some of the less necessary dependencies on FUNC_MAX_ARGS; however, the biggie (FunctionCallInfoData) remains.	2005-03-29 03:01:32 +00:00
Tom Lane	70c9763d48	Convert oidvector and int2vector into variable-length arrays. This change saves a great deal of space in pg_proc and its primary index, and it eliminates the former requirement that INDEX_MAX_KEYS and FUNC_MAX_ARGS have the same value. INDEX_MAX_KEYS is still embedded in the on-disk representation (because it affects index tuple header size), but FUNC_MAX_ARGS is not. I believe it would now be possible to increase FUNC_MAX_ARGS at little cost, but haven't experimented yet. There are still a lot of vestigial references to FUNC_MAX_ARGS, which I will clean up in a separate pass. However, getting rid of it altogether would require changing the FunctionCallInfoData struct, and I'm not sure I want to buy into that.	2005-03-29 00:17:27 +00:00
Tom Lane	119191609c	Remove dead push/pop rollback code. Vadim once planned to implement transaction rollback via UNDO but I think that's highly unlikely to happen, so we may as well remove the stubs. (Someday we ought to rip out the stub xxx_undo routines, too.) Per Alvaro.	2005-03-28 01:50:34 +00:00
Tom Lane	bf3dbb5881	First steps towards index scans with heap access decoupled from index access: define new index access method functions 'amgetmulti' that can fetch multiple TIDs per call. (The functions exist but are totally untested as yet.) Since I was modifying pg_am anyway, remove the no-longer-needed 'rel' parameter from amcostestimate functions, and also remove the vestigial amowner column that was creating useless work for Alvaro's shared-object-dependencies project. Initdb forced due to changes in pg_am.	2005-03-27 23:53:05 +00:00
Tom Lane	617dd33b6e	Eliminate duplicate hasnulls bit testing in index tuple access, and clean up itup.h a little bit.	2005-03-27 18:38:27 +00:00
Bruce Momjian	b1f57d88f5	Change Win32 O_SYNC method to O_DSYNC because that is what the method currently does. This is now the default Win32 wal sync method because we perfer o_datasync to fsync. Also, change Win32 fsync to a new wal sync method called fsync_writethrough because that is the behavior of _commit, which is what is used for fsync on Win32. Backpatch to 8.0.X.	2005-03-24 04:36:20 +00:00
Tom Lane	94e03330cb	Create a routine PageIndexMultiDelete() that replaces a loop around PageIndexTupleDelete() with a single pass of compactification --- logic mostly lifted from PageRepairFragmentation. I noticed while profiling that a VACUUM that's cleaning up a whole lot of deleted tuples would spend as much as a third of its CPU time in PageIndexTupleDelete; not too surprising considering the loop method was roughly O(N^2) in the number of tuples involved.	2005-03-22 06:17:03 +00:00
Tom Lane	ee4ddac137	Convert index-related tuple handling routines from char 'n'/' ' to bool convention for isnull flags. Also, remove the useless InsertIndexResult return struct from index AM aminsert calls --- there is no reason for the caller to know where in the index the tuple was inserted, and we were wasting a palloc cycle per insert to deliver this uninteresting value (plus nontrivial complexity in some AMs). I forced initdb because of the change in the signature of the aminsert routines, even though nothing really looks at those pg_proc entries...	2005-03-21 01:24:04 +00:00
Neil Conway	fe7015f5e8	Change the return value of HeapTupleSatisfiesUpdate() to be an enum, rather than an integer, and fix the associated fallout. From Alvaro Herrera.	2005-03-20 23:40:34 +00:00
Tom Lane	354049c709	Remove unnecessary calls of FlushRelationBuffers: there is no need to write out data that we are about to tell the filesystem to drop. smgr_internal_unlink already had a DropRelFileNodeBuffers call to get rid of dead buffers without a write after it's no longer possible to roll back the deleting transaction. Adding a similar call in smgrtruncate simplifies callers and makes the overall division of labor clearer. This patch removes the former behavior that VACUUM would write all dirty buffers of a relation unconditionally.	2005-03-20 22:00:54 +00:00
Tom Lane	f97aebd162	Revise TupleTableSlot code to avoid unnecessary construction and disassembly of tuples when passing data up through multiple plan nodes. A slot can now hold either a normal "physical" HeapTuple, or a "virtual" tuple consisting of Datum/isnull arrays. Upper plan levels can usually just copy the Datum arrays, avoiding heap_formtuple() and possible subsequent nocachegetattr() calls to extract the data again. This work extends Atsushi Ogawa's earlier patch, which provided the key idea of adding Datum arrays to TupleTableSlots. (I believe however that something like this was foreseen way back in Berkeley days --- see the old comment on ExecProject.) A test case involving many levels of join of fairly wide tables (about 80 columns altogether) showed about 3x overall speedup, though simple queries will probably not be helped very much. I have also duplicated some code in heaptuple.c in order to provide versions of heap_formtuple and friends that use "bool" arrays to indicate null attributes, instead of the old convention of "char" arrays containing either 'n' or ' '. This provides a better match to the convention used by ExecEvalExpr. While I have not made a concerted effort to get rid of uses of the old routines, I think they should be deprecated and eventually removed.	2005-03-16 21:38:10 +00:00
Tom Lane	a9b05bdc83	Avoid O(N^2) overhead in repeated nocachegetattr calls when columns of a tuple are being accessed via ExecEvalVar and the attcacheoff shortcut isn't usable (due to nulls and/or varlena columns). To do this, cache Datums extracted from a tuple in the associated TupleTableSlot. Also some code cleanup in and around the TupleTable handling. Atsushi Ogawa with some kibitzing by Tom Lane.	2005-03-14 04:41:13 +00:00
Tom Lane	a52b4fb131	Adjust creation/destruction of TupleDesc data structure to reduce the number of palloc calls. This has a salutory impact on plpgsql operations with record variables (which create and destroy tupdescs constantly) and probably helps a bit in some other cases too.	2005-03-07 04:42:17 +00:00
Tom Lane	4aefe75553	Remove some no-longer-needed kluges for bootstrapping, in particular the AMI_OVERRIDE flag. The fact that TransactionLogFetch treats BootstrapTransactionId as always committed is sufficient to make bootstrap work, and getting rid of extra tests in heavily used code paths seems like a win. The files produced by initdb are demonstrably the same after this change.	2005-02-20 21:46:50 +00:00
Tom Lane	60b2444cc3	Add code to prevent transaction ID wraparound by enforcing a safe limit in GetNewTransactionId(). Since the limit value has to be computed before we run any real transactions, this requires adding code to database startup to scan pg_database and determine the oldest datfrozenxid. This can conveniently be combined with the first stage of an attack on the problem that the 'flat file' copies of pg_shadow and pg_group are not properly updated during WAL recovery. The code I've added to startup resides in a new file src/backend/utils/init/flatfiles.c, and it is responsible for rewriting the flat files as well as initializing the XID wraparound limit value. This will eventually allow us to get rid of GetRawDatabaseInfo too, but we'll need an initdb so we can add a trigger to pg_database.	2005-02-20 02:22:07 +00:00
Bruce Momjian	7c44e57331	Move plpgsql DEBUG from DEBUG2 to DEBUG1 because it is a user-requested DEBUG. Fix a few places where DEBUG1 crept in that should have been DEBUG2.	2005-02-12 23:53:42 +00:00
Tom Lane	12179c99b1	Marginal hack to merge adjacent ReleaseBuffer/ReadBuffer calls into ReleaseAndReadBuffer during GIST index searches. We already did this in btree and rtree, might as well do it here too.	2005-02-05 19:38:58 +00:00
Neil Conway	a885ecd6ef	Change heap_modifytuple() to require a TupleDesc rather than a Relation. Patch from Alvaro Herrera, minor editorializing by Neil Conway.	2005-01-27 23:24:11 +00:00
Tom Lane	0ffe9f7946	Fix memory leak in rtdosplit, per report from Clive Page.	2005-01-24 02:47:26 +00:00
Neil Conway	b4297c177c	This patch makes some improvements to the rtree index implementation: (1) Keep a pin on the scan's current buffer and mark buffer. This avoids the need to do a ReadBuffer() for each tuple produced by the scan. Since ReadBuffer() is expensive, this is a significant win. (2) Convert a ReleaseBuffer(); ReadBuffer() pair into ReleaseAndReadBuffer(). Surely not a huge win, but it saves a lock acquire/release... (3) Remove a bunch of duplicated code in rtget.c; make rtnext() handle both the "initial result" and "subsequent result" cases. (4) Add support for index tuple killing (5) Remove rtscancache(): it is dead code, for the same reason that gistscancache() is dead code (an index scan ought not be invoked with NoMovementScanDirection). The end result is about a 10% improvement in rtree index scan perf, according to contrib/rtree_gist/bench.	2005-01-18 23:25:55 +00:00
Tom Lane	0ce4d56924	Phase 1 of fix for 'SMgrRelation hashtable corrupted' problem. This is the minimum required fix. I want to look next at taking advantage of it by simplifying the message semantics in the shared inval message queue, but that part can be held over for 8.1 if it turns out too ugly.	2005-01-10 20:02:24 +00:00
Bruce Momjian	2daed8c5b3	Update copyrights that were missed.	2005-01-01 05:43:09 +00:00
PostgreSQL Daemon	2ff501590b	Tag appropriate files for rc3 Also performed an initial run through of upgrading our Copyright date to extend to 2005 ... first run here was very simple ... change everything where: grep 1996-2004 && the word 'Copyright' ... scanned through the generated list with 'less' first, and after, to make sure that I only picked up the right entries ...	2004-12-31 22:04:05 +00:00
Tom Lane	bfa5f30481	Awhile back I added some code to StartupCLOG() to forcibly zero out the remainder of the current clog page during system startup. While this was a good idea, it turns out the code fails if nextXid is exactly at a page boundary, because we won't have created the "current" clog page yet in that case. Since the page will be correctly zeroed when we execute the first transaction on it, the solution is just to do nothing when exactly at a page boundary. Per trouble report from Dave Hartwig.	2004-12-22 18:45:49 +00:00
Tom Lane	ff5a354ece	Fix is-it-time-for-a-checkpoint logic so that checkpoint_segments can usefully be larger than 255. Per gripe from Simon Riggs.	2004-12-17 00:10:36 +00:00
Tom Lane	c3d6c7d8f9	Calculation of keys_are_unique flag was wrong for cases involving redundant cross-datatype comparisons. Per example from Merlin Moncure.	2004-12-15 19:16:39 +00:00
Tom Lane	5374d097de	Change planner to use the current true disk file size as its estimate of a relation's number of blocks, rather than the possibly-obsolete value in pg_class.relpages. Scale the value in pg_class.reltuples correspondingly to arrive at a hopefully more accurate number of rows. When pg_class contains 0/0, estimate a tuple width from the column datatypes and divide that into current file size to estimate number of rows. This improved methodology allows us to jettison the ancient hacks that put bogus default values into pg_class when a table is first created. Also, per a suggestion from Simon, make VACUUM (but not VACUUM FULL or ANALYZE) adjust the value it puts into pg_class.reltuples to try to represent the mean tuple density instead of the minimal density that actually prevails just after VACUUM. These changes alter the plans selected for certain regression tests, so update the expected files accordingly. (I removed join_1.out because it's not clear if it still applies; we can add back any variant versions as they are shown to be needed.)	2004-12-01 19:00:56 +00:00
Tom Lane	37d693033d	Minor adjustment of message style.	2004-11-17 16:26:59 +00:00
Neil Conway	5d1dd2bc55	Micro-optimization of markpos() and restrpos() in btree and hash indexes. Rather than using ReadBuffer() to increment the reference count on an already-pinned buffer, we should use IncrBufferRefCount() as it is faster and does not require acquiring the BufMgrLock.	2004-11-17 03:13:38 +00:00
Neil Conway	b25d23e1e6	Don't allow pg_start_backup() to be invoked if archive_command has not been defined. Patch from Gavin Sherry, editorializing by Neil Conway.	2004-11-17 02:22:54 +00:00
Neil Conway	a236dd9536	There is no need for ReadBuffer() call sites to check that the returned buffer is valid, as ReadBuffer() will elog on error. Most of the call sites of ReadBuffer() got this right, but this patch fixes those call sites that did not.	2004-11-14 02:04:14 +00:00
Neil Conway	4d0f669f3c	Remove obsolete comment from btbuild() and hashbuild(): we no longer use a global variable to control building indexes.	2004-11-11 00:32:50 +00:00
Peter Eisentraut	0ed3c7665e	Small message clarifications	2004-11-05 17:11:34 +00:00
Tom Lane	88868d4fbc	Change COMMIT back to the old behavior of emitting command tag COMMIT, not ROLLBACK, for the case of COMMIT outside a transaction block. Alvaro Herrera	2004-10-30 20:44:43 +00:00
Tom Lane	23f264d125	Rearrange order of pre-commit operations: must close cursors before doing ON COMMIT actions. Per bug report from Michael Guerin.	2004-10-29 22:19:53 +00:00
Tom Lane	ee69be44d5	Add DEBUG1-level logging of checkpoint start and end. Also, reduce the 'recycled log files' and 'removed log files' messages from DEBUG1 to DEBUG2, replacing them with a count of files added/removed/recycled in the checkpoint end message, as per suggestion from Simon Riggs.	2004-10-29 00:16:08 +00:00
Tom Lane	83cd2d8b0f	Make heap_fetch API more consistent by having the buffer remain pinned in all cases when keep_buf = true. This allows ANALYZE's inner loop to use heap_release_fetch, which saves multiple buffer lookups for the same page and avoids overestimation of cost by the vacuum cost mechanism.	2004-10-26 16:05:03 +00:00
Tom Lane	fb22b32095	Allow functions returning void or cstring to appear in FROM clause, to make life cushy for the JDBC driver. Centralize the decision-making that affects this by inventing a get_type_func_class() function, rather than adding special cases in half a dozen places.	2004-10-20 16:04:50 +00:00
Tom Lane	fdd13f1568	Give the ResourceOwner mechanism full responsibility for releasing buffer pins at end of transaction, and reduce AtEOXact_Buffers to an Assert cross-check that this was done correctly. When not USE_ASSERT_CHECKING, AtEOXact_Buffers is a complete no-op. This gets rid of an O(NBuffers) bottleneck during transaction commit/abort, which recent testing has shown becomes significant above a few tens of thousands of shared buffers.	2004-10-16 18:57:26 +00:00
Tom Lane	9ffc8ed58b	Repair possible failure to update hint bits back to disk, per http://archives.postgresql.org/pgsql-hackers/2004-10/msg00464.php. This fix is intended to be permanent: it moves the responsibility for calling SetBufferCommitInfoNeedsSave() into the tqual.c routines, eliminating the requirement for callers to test whether t_infomask changed. Also, tighten validity checking on buffer IDs in bufmgr.c --- several routines were paranoid about out-of-range shared buffer numbers but not about out-of-range local ones, which seems a tad pointless.	2004-10-15 22:40:29 +00:00
Bruce Momjian	5c267325ec	Add 'int' cast for getpid() because some Solaris releases return long for getpid().	2004-10-14 20:23:46 +00:00
Peter Eisentraut	0fd37839d9	Message style revisions	2004-10-12 21:54:45 +00:00
Bruce Momjian	67608a393b	Make getpid() use %d consistently for printing.	2004-10-09 02:46:42 +00:00
Bruce Momjian	a5d7ba773d	Adjust comments previously moved to column 1 by pgident.	2004-10-07 15:21:58 +00:00
Tom Lane	4c77cbb272	PortalRun must guard against the possibility that the portal it's running contains VACUUM or a similar command that will internally start and commit transactions. In such a case, the original caller values of CurrentMemoryContext and CurrentResourceOwner will point to objects that will be destroyed by the internal commit. We must restore these pointers to point to the newly-manufactured transaction context and resource owner, rather than possibly pointing to deleted memory. Also tweak xact.c so that AbortTransaction and AbortSubTransaction forcibly restore a sane value for CurrentResourceOwner, much as they have always done for CurrentMemoryContext. I'm not certain this is necessary but I'm feeling paranoid today. Responds to Sean Chittenden's bug report of 4-Oct.	2004-10-04 21:52:15 +00:00
Tom Lane	4c5e810fcd	Code review for NOWAIT patch: downgrade NOWAIT from fully reserved keyword to unreserved keyword, use ereport not elog, assign a separate error code for 'could not obtain lock' so that applications will be able to detect that case cleanly.	2004-10-01 16:40:05 +00:00
Tom Lane	d2af5f8a3e	Adjust index locking rules as per my proposal of earlier today. You now are supposed to take some kind of lock on an index whenever you are going to access the index contents, rather than relying only on a lock on the parent table.	2004-09-30 23:21:26 +00:00
Neil Conway	0ed07d49d5	Code cleanup: don't bother casting the argument to pfree() to void * from another pointer type. Per C89, this is unnecessary, and it is common practice throughout the rest of the tree anyway.	2004-09-27 04:01:23 +00:00
Tom Lane	054b78ba38	Now that xmax and cmin are distinct fields again, we should zero xmax when creating a new tuple. This is just for debugging sanity, though, since nothing should be paying any attention to xmax when the HEAP_XMAX_INVALID bit is set.	2004-09-17 18:09:55 +00:00
Tom Lane	257cccbe5e	Add some marginal tweaks to eliminate memory leakages associated with subtransactions. Trivial subxacts (such as a plpgsql exception block containing no database access) now demonstrably leak zero bytes.	2004-09-16 20:17:49 +00:00
Tom Lane	86fff990b2	RecentXmin is too recent to use as the cutoff point for accessing pg_subtrans --- what we need is the oldest xmin of any snapshot in use in the current top transaction. Introduce a new variable TransactionXmin to play this role. Fixes intermittent regression failure reported by Neil Conway.	2004-09-16 18:35:23 +00:00
Tom Lane	8f9f198603	Restructure subtransaction handling to reduce resource consumption, as per recent discussions. Invent SubTransactionIds that are managed like CommandIds (ie, counter is reset at start of each top transaction), and use these instead of TransactionIds to keep track of subtransaction status in those modules that need it. This means that a subtransaction does not need an XID unless it actually inserts/modifies rows in the database. Accordingly, don't assign it an XID nor take a lock on the XID until it tries to do that. This saves a lot of overhead for subtransactions that are only used for error recovery (eg plpgsql exceptions). Also, arrange to release a subtransaction's XID lock as soon as the subtransaction exits, in both the commit and abort cases. This avoids holding many unique locks after a long series of subtransactions. The price is some additional overhead in XactLockTableWait, but that seems acceptable. Finally, restructure the state machine in xact.c to have a more orthogonal set of states for subtransactions.	2004-09-16 16:58:44 +00:00
Tom Lane	b2c4071299	Redesign query-snapshot timing so that volatile functions in READ COMMITTED mode see a fresh snapshot for each command in the function, rather than using the latest interactive command's snapshot. Also, suppress fresh snapshots as well as CommandCounterIncrement inside STABLE and IMMUTABLE functions, instead using the snapshot taken for the most closely nested regular query. (This behavior is only sane for read-only functions, so the patch also enforces that such functions contain only SELECT commands.) As per my proposal of 6-Sep-2004; I note that I floated essentially the same proposal on 19-Jun-2002, but that discussion tailed off without any action. Since 8.0 seems like the right place to be taking possibly nontrivial backwards compatibility hits, let's get it done now.	2004-09-13 20:10:13 +00:00
Tom Lane	493f72606b	Renumber SnapshotNow and the other special snapshot codes so that ((Snapshot) NULL) can no longer be confused with a valid snapshot, as per my recent suggestion. Define a macro InvalidSnapshot for 0. Use InvalidSnapshot instead of SnapshotAny as the do-nothing special case for heap_update and heap_delete crosschecks; this seems a little cleaner even though the behavior is really the same.	2004-09-11 18:28:34 +00:00
Tom Lane	b339d1fff6	Fire non-deferred AFTER triggers immediately upon query completion, rather than when returning to the idle loop. This makes no particular difference for interactively-issued queries, but it makes a big difference for queries issued within functions: trigger execution now occurs before the calling function is allowed to proceed. This responds to numerous complaints about nonintuitive behavior of foreign key checking, such as http://archives.postgresql.org/pgsql-bugs/2004-09/msg00020.php, and appears to be required by the SQL99 spec. Also take the opportunity to simplify the data structures used for the pending-trigger list, rename them for more clarity, and squeeze out a bit of space.	2004-09-10 18:40:09 +00:00
Tom Lane	23645f0582	Fix incorrect ordering of smgr cleanup relative to buffer pin cleanup during transaction abort. Add a regression test case to catch related mistakes in future. Alvaro Herrera and Tom Lane.	2004-09-06 17:56:33 +00:00
Tom Lane	e32bba202d	Downgrade LOG messages to DEBUG1 for normal recycling of xlog, clog, subtrans segments. Per Greg Mullane and Chris K-L.	2004-09-06 03:04:27 +00:00
Tom Lane	64cb889106	Ensure that the remainder of the current pg_clog page is zeroed during startup, just to be sure that there's no leftover junk there.	2004-08-30 19:00:42 +00:00
Tom Lane	9cf4eaa00b	Fix failure to advance nextXID beyond subtransactions whose XIDs appear only within COMMIT or ABORT records.	2004-08-30 19:00:03 +00:00
Bruce Momjian	15d3f9f6b7	Another pgindent run with lib typedefs added.	2004-08-30 02:54:42 +00:00
Tom Lane	50742aed68	Add WAL logging for CREATE/DROP DATABASE and CREATE/DROP TABLESPACE. Fix TablespaceCreateDbspace() to be able to create a dummy directory in place of a dropped tablespace's symlink. This eliminates the open problem of a PANIC during WAL replay when a replayed action attempts to touch a file in a since-deleted tablespace. It also makes for a significant improvement in the usability of PITR replay.	2004-08-29 21:08:48 +00:00
Tom Lane	0ffe11abd3	Widen xl_len field of XLogRecord header to 32 bits, so that we'll have a more tolerable limit on the number of subtransactions or deleted files in COMMIT and ABORT records. Buy back the extra space by eliminating the xl_xact_prev field, which isn't being used for anything and is rather unlikely ever to be used for anything. This does not force initdb, but you do need to do pg_resetxlog if you want to upgrade an existing 8.0 installation without initdb.	2004-08-29 16:34:48 +00:00
Bruce Momjian	b6b71b85bc	Pgindent run for 8.0.	2004-08-29 05:07:03 +00:00
Bruce Momjian	da9a8649d8	Update copyright to 2004.	2004-08-29 04:13:13 +00:00
Tom Lane	f78ecbf20e	Now that TransactionIdDidAbort doesn't think it should try to modify pg_clog, there's no reason to do abort marking of subtransactions in a nonintuitive order.	2004-08-28 22:04:12 +00:00
Tom Lane	7531d2fd85	Add missing Assert to make TransactionIdDidAbort more consistent with TransactionIdDidCommit.	2004-08-28 21:58:59 +00:00
Tom Lane	1c72d0dec1	Fix relcache to account properly for subtransaction status of 'new' relcache entries. Also, change TransactionIdIsCurrentTransactionId() so that if consulted during transaction abort, it will not say that the aborted xact is still current. (It would be better to ensure that it's never called at all during abort, but I'm not sure we can easily guarantee that.) In combination, these fix a crash we have seen occasionally during parallel regression tests of 8.0.	2004-08-28 20:31:44 +00:00
Tom Lane	f444dafab0	Can't truncate pg_subtrans during a recovery checkpoint --- subtrans module isn't fully initialized yet.	2004-08-28 18:18:03 +00:00
Tom Lane	fe455ee1d4	Revise ResourceOwner code to avoid accumulating ResourceOwner objects for every command executed within a transaction. For long transactions this was a significant memory leak. Instead, we can delete a portal's or subtransaction's ResourceOwner immediately, if we physically transfer the information about its locks up to the parent owner. This does not fully solve the leak problem; we need to do something about counting multiple acquisitions of the same lock in order to fix it. But it's a necessary step along the way.	2004-08-25 18:43:43 +00:00
Tom Lane	4dbb880d3c	Rearrange pg_subtrans handling as per recent discussion. pg_subtrans updates are no longer WAL-logged nor even fsync'd; we do not need to, since after a crash no old pg_subtrans data is needed again. We truncate pg_subtrans to RecentGlobalXmin at each checkpoint. slru.c's API is refactored a little bit to separate out the necessary decisions.	2004-08-23 23:22:45 +00:00
Tom Lane	f009c316ba	Tweak code so that pg_subtrans is never consulted for XIDs older than RecentXmin (== MyProc->xmin). This ensures that it will be safe to truncate pg_subtrans at RecentGlobalXmin, which should largely eliminate any fear of bloat. Along the way, eliminate SubTransXidsHaveCommonAncestor, which isn't really needed and could not give a trustworthy result anyway under the lookback restriction. In an unrelated but nearby change, #ifdef out GetUndoRecPtr, which has been dead code since 2001 and seems unlikely to ever be resurrected.	2004-08-22 02:41:58 +00:00
Tom Lane	19cd31b068	Fix bug introduced into _bt_getstackbuf() on 2003-Feb-21: the initial value of 'start' could be past the end of the page, if the page was split by some concurrent inserting process since we visited it. In this situation the code could look at bogus entries and possibly find a match (since after all those entries still contain what they had before the split). This would lead to 'specified item offset is too large' followed by 'PANIC: failed to add item to the page', as reported by Joe Conway for scenarios involving heavy concurrent insertion activity.	2004-08-17 23:15:33 +00:00
Tom Lane	1a3de15a3a	Dept. of further reflection: I looked around to see if any other callers of XLogInsert had the same sort of checkpoint interlock problem as RecordTransactionCommit, and indeed I found some. Btree index build and ALTER TABLE SET TABLESPACE write data outside the friendly confines of the buffer manager, and therefore they have to take their own responsibility for checkpoint interlock. The easiest solution seems to be to force smgrimmedsync at the end of the index build or table copy, even when the operation is being WAL-logged. This is sufficient since the new index or table will be of interest to no one if we don't get as far as committing the current transaction.	2004-08-15 23:44:46 +00:00
Bruce Momjian	10249abfa1	Cleanup Win32 COPY handling, and move archive examples to SGML.	2004-08-12 19:03:44 +00:00
Bruce Momjian	43ea65a0dc	Add mention of "WIN32" COPY.	2004-08-12 18:34:45 +00:00
Bruce Momjian	6525b42b10	Add make_native_path() because Win32 COPY is an internal CMD.EXE command and doesn't process forward slashes in the same way as external commands. Quoting the first argument to COPY does not convert forward to backward slashes, but COPY does properly process quoted forward slashes in the second argument. Win32 COPY works with quoted forward slashes in the first argument only if the current directory is the same as the directory of the first argument.	2004-08-12 18:32:52 +00:00
Tom Lane	3fdf649f4f	Fix failure to guarantee that a checkpoint will write out pg_clog updates for transaction commits that occurred just before the checkpoint. This is an EXTREMELY serious bug --- kudos to Satoshi Okada for creating a reproducible test case to prove its existence.	2004-08-11 04:07:16 +00:00
Tom Lane	35f539b481	When expanding %p in archive_command or restore_command, translate slashes to backslashes #ifdef WIN32. This is to cope with the fact that Windows seems exceedingly unfriendly to slashes in shell commands, as per recent discussion.	2004-08-09 16:26:06 +00:00
Tom Lane	7dca975c5d	Add a comment about why we always replay backup blocks from WAL.	2004-08-08 03:22:08 +00:00
Tom Lane	fcbc438727	Label CVS tip as 8.0devel instead of 7.5devel. Adjust various comments and documentation to reference 8.0 instead of 7.5.	2004-08-04 21:34:35 +00:00
Tom Lane	b387d16f96	Make use of backup label/history files to control recovery properly.	2004-08-04 16:25:02 +00:00
Tom Lane	58c41712d5	Add functions pg_start_backup, pg_stop_backup to create backup label and history files as per recent discussion. While at it, remove pg_terminate_backend, since we have decided we do not have time during this release cycle to address the reliability concerns it creates. Split the 'Miscellaneous Functions' documentation section into 'System Information Functions' and 'System Administration Functions', which hopefully will draw the eyes of those looking for such things.	2004-08-03 20:32:36 +00:00
Tom Lane	a83c45c4c6	Fix misplacement of savepointLevel test, per report from Chris K-L.	2004-08-03 15:57:26 +00:00
Tom Lane	410b1dfb88	Update the in-code documentation about the transaction system. Move it into a README file instead of being in xact.c's header comment. Alvaro Herrera.	2004-08-01 20:57:59 +00:00
Tom Lane	5cc380f9a3	Error message style adjustments, per Alvaro Herrera.	2004-08-01 17:45:43 +00:00
Tom Lane	efcaf1e868	Some mop-up work for savepoints (nested transactions). Store a small number of active subtransaction XIDs in each backend's PGPROC entry, and use this to avoid expensive probes into pg_subtrans during TransactionIdIsInProgress. Extend EOXactCallback API to allow add-on modules to get control at subxact start/end. (This is deliberately not compatible with the former API, since any uses of that API probably need manual review anyway.) Add basic reference documentation for SAVEPOINT and related commands. Minor other cleanups to check off some of the open issues for subtransactions. Alvaro Herrera and Tom Lane.	2004-08-01 17:32:22 +00:00
Tom Lane	beda4814c1	plpgsql does exceptions. There are still some things that need refinement; in particular I fear that the recognized set of error condition names probably has little in common with what Oracle recognizes. But it's a start.	2004-07-31 07:39:21 +00:00
Tom Lane	1bf3d61504	Fix subtransaction behavior for large objects, temp namespace, files, password/group files. Also allow read-only subtransactions of a read-write parent, but not vice versa. These are the reasonably noncontroversial parts of Alvaro's recent mop-up patch, plus further work on large objects to minimize use of the TopTransactionResourceOwner.	2004-07-28 14:23:31 +00:00
Tom Lane	cc813fc2b8	Replace nested-BEGIN syntax for subtransactions with spec-compliant SAVEPOINT/RELEASE/ROLLBACK-TO syntax. (Alvaro) Cause COMMIT of a failed transaction to report ROLLBACK instead of COMMIT in its command tag. (Tom) Fix a few loose ends in the nested-transactions stuff.	2004-07-27 05:11:48 +00:00
Tom Lane	acd907bfcc	Add cross-check that current timeline of pg_control is an ancestor of recovery_target_timeline --- otherwise there is no path from the backup to the requested timeline. This check was foreseen in the original discussion but I forgot to implement it.	2004-07-22 21:09:37 +00:00
Tom Lane	3dba9cb694	Add a check on file size as an additional safety check that a WAL file recovered from archive is not corrupt. It's not much but it will catch one common problem, viz out-of-disk-space. Also, force a WAL recovery scan when recovery.conf is present, even if pg_control shows a clean shutdown. This allows recovery with a tar backup that was taken with the postmaster shut down, as per complaint from Mark Kirkwood.	2004-07-22 20:18:40 +00:00
Tom Lane	2042b3428d	Invent WAL timelines, as per recent discussion, to make point-in-time recovery more manageable. Also, undo recent change to add FILE_HEADER and WASTED_SPACE records to XLOG; instead make the XLOG page header variable-size with extra fields in the first page of an XLOG file. This should fix the boundary-case bugs observed by Mark Kirkwood. initdb forced due to change of XLOG representation.	2004-07-21 22:31:26 +00:00
Tom Lane	9c7a765f02	Remove unportable use of strptime() to parse recovery target time spec. Instead use our own abstimein code, which is more flexible anyway.	2004-07-19 14:34:39 +00:00
Tom Lane	66ec2db728	XLOG file archiving and point-in-time recovery. There are still some loose ends and a glaring lack of documentation, but it basically works. Simon Riggs with some editorialization by Tom Lane.	2004-07-19 02:47:16 +00:00
Tom Lane	fe548629c5	Invent ResourceOwner mechanism as per my recent proposal, and use it to keep track of portal-related resources separately from transaction-related resources. This allows cursors to work in a somewhat sane fashion with nested transactions. For now, cursor behavior is non-subtransactional, that is a cursor's state does not roll back if you abort a subtransaction that fetched from the cursor. We might want to change that later.	2004-07-17 03:32:14 +00:00
Tom Lane	94d4d240bb	Rename XLOG_BTREE_NEWPAGE xlog record type into XLOG_HEAP_NEWPAGE, and shift support code into heapam.c accordingly. This is in service of soon-to-be-committed ALTER TABLE SET TABLESPACE code that will want to use this same record type for both heaps and indexes. Theoretically I should have forced initdb for this, but in practice there is no change in xlog contents because CVS tip will never really emit this record type anyhow...	2004-07-11 18:01:45 +00:00
Tom Lane	f5c798ee82	Fix no-longer-correct bit-pushing in TransactionIdSetStatus, per Alvaro.	2004-07-03 02:55:56 +00:00
Tom Lane	b6197fe069	Further review of xact.c state machine for nested transactions. Fix problems with starting subtransactions inside already-failed transactions. Clean up some comments.	2004-07-01 20:11:03 +00:00
Tom Lane	573a71a5da	Nested transactions. There is still much left to do, especially on the performance front, but with feature freeze upon us I think it's time to drive a stake in the ground and say that this will be in 7.5. Alvaro Herrera, with some help from Tom Lane.	2004-07-01 00:52:04 +00:00
Tom Lane	2467394ee1	Tablespaces. Alternate database locations are dead, long live tablespaces. There are various things left to do: contrib dbsize and oid2name modules need work, and so does the documentation. Also someone should think about COMMENT ON TABLESPACE and maybe RENAME TABLESPACE. Also initlocation is dead, it just doesn't know it yet. Gavin Sherry and Tom Lane.	2004-06-18 06:14:31 +00:00
Tom Lane	950d047ec5	Give inet/cidr datatypes their own hash function that ignores the inet vs cidr type bit, the same as network_eq does. This is needed for hash joins and hash aggregation to work correctly on these types. Per bug report from Michael Fuhr, 2004-04-13. Also, improve hash function for int8 as suggested by Greg Stark.	2004-06-13 21:57:28 +00:00
Tom Lane	c541bb86e9	Infrastructure for I/O of composite types: arrange for the I/O routines of a composite type to get that type's OID as their second parameter, in place of typelem which is useless. The actual changes are mostly centralized in getTypeInputInfo and siblings, but I had to fix a few places that were fetching pg_type.typelem for themselves instead of using the lsyscache.c routines. Also, I renamed all the related variables from 'typelem' to 'typioparam' to discourage people from assuming that they necessarily contain array element types.	2004-06-06 00:41:28 +00:00
Tom Lane	c3a153afed	Tweak palloc/repalloc to allow zero bytes to be requested, as per recent proposal. Eliminate several dozen now-unnecessary hacks to avoid palloc(0). (It's likely there are more that I didn't find.)	2004-06-05 19:48:09 +00:00
Tom Lane	ae93e5fd6e	Make the world very nearly safe for composite-type columns in tables. 1. Solve the problem of not having TOAST references hiding inside composite values by establishing the rule that toasting only goes one level deep: a tuple can contain toasted fields, but a composite-type datum that is to be inserted into a tuple cannot. Enforcing this in heap_formtuple is relatively cheap and it avoids a large increase in the cost of running the tuptoaster during final storage of a row. 2. Fix some interesting problems in expansion of inherited queries that reference whole-row variables. We never really did this correctly before, but it's now relatively painless to solve by expanding the parent's whole-row Var into a RowExpr() selecting the proper columns from the child. If you dike out the preventive check in CheckAttributeType(), composite-type columns now seem to actually work. However, we surely cannot ship them like this --- without I/O for composite types, you can't get pg_dump to dump tables containing them. So a little more work still to do.	2004-06-05 01:55:05 +00:00
Tom Lane	8f2ea8b7b5	Resurrect heap_deformtuple(), this time implemented as a singly nested loop over the fields instead of a loop around heap_getattr. This is considerably faster (O(N) instead of O(N^2)) when there are nulls or varlena fields, since those prevent use of attcacheoff. Replace loops over heap_getattr with heap_deformtuple in situations where all or most of the fields have to be fetched, such as printtup and tuptoaster. Profiling done more than a year ago shows that this should be a nice win for situations involving many-column tables.	2004-06-04 20:35:21 +00:00
Tom Lane	921d749bd4	Adjust our timezone library to use pg_time_t (typedef'd as int64) in place of time_t, as per prior discussion. The behavior does not change on machines without a 64-bit-int type, but on machines with one, which is most, we are rid of the bizarre boundary behavior at the edges of the 32-bit-time_t range (1901 and 2038). The system will now treat times over the full supported timestamp range as being in your local time zone. It may seem a little bizarre to consider that times in 4000 BC are PST or EST, but this is surely at least as reasonable as propagating Gregorian calendar rules back that far. I did not modify the format of the zic timezone database files, which means that for the moment the system will not know about daylight-savings periods outside the range 1901-2038. Given the way the files are set up, it's not a simple decision like 'widen to 64 bits'; we have to actually think about the range of years that need to be supported. We should probably inquire what the plans of the upstream zic people are before making any decisions of our own.	2004-06-03 02:08:07 +00:00
Tom Lane	2095206de1	Adjust btree index build to not use shared buffers, thereby avoiding the locking conflict against concurrent CHECKPOINT that was discussed a few weeks ago. Also, if not using WAL archiving (which is always true ATM but won't be if PITR makes it into this release), there's no need to WAL-log the index build process; it's sufficient to force-fsync the completed index before commit. This seems to gain about a factor of 2 in my tests, which is consistent with writing half as much data. I did not try it with WAL on a separate drive though --- probably the gain would be a lot less in that scenario.	2004-06-02 17:28:18 +00:00
Tom Lane	e674707968	Minor code rationalization: FlushRelationBuffers just returns void, rather than an error code, and does elog(ERROR) not elog(WARNING) when it detects a problem. All callers were simply elog(ERROR)'ing on failure return anyway, and I find it hard to envision a caller that would not, so we may as well simplify the callers and produce the more useful error message directly.	2004-05-31 19:24:05 +00:00
Tom Lane	9b178555fc	Per previous discussions, get rid of use of sync(2) in favor of explicitly fsync'ing every (non-temp) file we have written since the last checkpoint. In the vast majority of cases, the burden of the fsyncs should fall on the bgwriter process not on backends. (To this end, we assume that an fsync issued by the bgwriter will force out blocks written to the same file by other processes using other file descriptors. Anyone have a problem with that?) This makes the world safe for WIN32, which ain't even got sync(2), and really makes the world safe for Unixen as well, because sync(2) never had the semantics we need: it offers no way to wait for the requested I/O to finish. Along the way, fix a bug I recently introduced in xlog recovery: file truncation replay failed to clear bufmgr buffers for the dropped blocks, which could result in 'PANIC: heap_delete_redo: no block' later on in xlog replay.	2004-05-31 03:48:10 +00:00
Neil Conway	72b6ad6313	Use the new List API function names throughout the backend, and disable the list compatibility API by default. While doing this, I decided to keep the llast() macro around and introduce llast_int() and llast_oid() variants.	2004-05-30 23:40:41 +00:00
Tom Lane	076a055acf	Separate out bgwriter code into a logically separate module, rather than being random pieces of other files. Give bgwriter responsibility for all checkpoint activity (other than a post-recovery checkpoint); so this child process absorbs the functionality of the former transient checkpoint and shutdown subprocesses. While at it, create an actual include file for postmaster.c, which for some reason never had its own file before.	2004-05-29 22:48:23 +00:00
Tom Lane	1a321f26d8	Code review for EXEC_BACKEND changes. Reduce the number of #ifdefs by about a third, make it work on non-Windows platforms again. (But perhaps I broke the WIN32 code, since I have no way to test that.) Fold all the paths that fork postmaster child processes to go through the single routine SubPostmasterMain, which takes care of resurrecting the state that would normally be inherited from the postmaster (including GUC variables). Clean up some places where there's no particularly good reason for the EXEC and non-EXEC cases to work differently. Take care of one or two FIXMEs that remained in the code.	2004-05-28 05:13:32 +00:00
Tom Lane	16974ee910	Get rid of the former rather baroque mechanism for propagating the values of ThisStartUpID and RedoRecPtr into new backends. It's a lot easier just to make them all grab the values out of shared memory during startup. This helps to decouple the postmaster from checkpoint execution, which I need since I'm intending to let the bgwriter do it instead, and it also fixes a bug in the Win32 port: ThisStartUpID wasn't getting propagated at all AFAICS. (Doesn't give me a lot of faith in the amount of testing that port has gotten.)	2004-05-27 17:12:57 +00:00
Neil Conway	d0b4399d81	Reimplement the linked list data structure used throughout the backend. In the past, we used a 'Lispy' linked list implementation: a "list" was merely a pointer to the head node of the list. The problem with that design is that it makes lappend() and length() linear time. This patch fixes that problem (and others) by maintaining a count of the list length and a pointer to the tail node along with each head node pointer. A "list" is now a pointer to a structure containing some meta-data about the list; the head and tail pointers in that structure refer to ListCell structures that maintain the actual linked list of nodes. The function names of the list API have also been changed to, I hope, be more logically consistent. By default, the old function names are still available; they will be disabled-by-default once the rest of the tree has been updated to use the new API names.	2004-05-26 04:41:50 +00:00
Tom Lane	4d86ae4260	For multi-table ANALYZE, use per-table transactions when possible (ie, when not inside a transaction block), so that we can avoid holding locks longer than necessary. Per trouble report from Philip Warner.	2004-05-22 23:14:38 +00:00
Tom Lane	e6319d1d28	Put back #include <sys/time.h> in files that seem to need it on Linux.	2004-05-21 16:08:47 +00:00
Tom Lane	63bd0db121	Integrate src/timezone library for all platforms. There is more we can and should do now that we control our own destiny for timezone handling, but this commit gets the bulk of the picayune diffs in place. Magnus Hagander and Tom Lane.	2004-05-21 05:08:06 +00:00
Tom Lane	868404b859	Fix speling.	2004-05-20 15:07:30 +00:00
Tom Lane	4af3421161	Get rid of rd_nblocks field in relcache entries. Turns out this was costing us lots more to maintain than it was worth. On shared tables it was of exactly zero benefit because we couldn't trust it to be up to date. On temp tables it sometimes saved an lseek, but not often enough to be worth getting excited about. And the real problem was that we forced an lseek on every relcache flush in order to update the field. So all in all it seems best to lose the complexity.	2004-05-08 19:09:25 +00:00
Tom Lane	0bd61548ab	Solve the 'Turkish problem' with undesirable locale behavior for case conversion of basic ASCII letters. Remove all uses of strcasecmp and strncasecmp in favor of new functions pg_strcasecmp and pg_strncasecmp; remove most but not all direct uses of toupper and tolower in favor of pg_toupper and pg_tolower. These functions use the same notions of case folding already developed for identifier case conversion. I left the straight locale-based folding in place for situations where we are just manipulating user data and not trying to match it to built-in strings --- for example, the SQL upper() function is still locale dependent. Perhaps this will prove not to be what's wanted, but at the moment we can initdb and pass regression tests in Turkish locale.	2004-05-07 00:24:59 +00:00
Tom Lane	37fa3b6c89	Tweak indexscan and seqscan code to arrange that steps from one page to the next are handled by ReleaseAndReadBuffer rather than separate ReleaseBuffer and ReadBuffer calls. This cuts the number of acquisitions of the BufMgrLock by a factor of 2 (possibly more, if an indexscan happens to pull successive rows from the same heap page). Unfortunately this doesn't seem enough to get us out of the recently discussed context-switch storm problem, but it's surely worth doing anyway.	2004-04-21 18:24:26 +00:00
Bruce Momjian	31338352bd	* Most changes are to fix warnings issued when compiling win32 * removed a few redundant defines * get_user_name safe under win32 * rationalized pipe read EOF for win32 (UPDATED PATCH USED) * changed all backend instances of sleep() to pg_usleep - except for the SLEEP_ON_ASSERT in assert.c, as it would exceed a 32-bit long [Note to patcher: If a SLEEP_ON_ASSERT of 2000 seconds is acceptable, please replace with pg_usleep(2000000000L)] I added a comment to that part of the code: /* * It would be nice to use pg_usleep() here, but only does 2000 sec * or 33 minutes, which seems too short. */ sleep(1000000); Claudio Natoli	2004-04-19 17:42:59 +00:00
Bruce Momjian	823ac7c2b4	This is a cleanup patch for access/transam/xact.c. It only removes some #ifdef NOT_USED code, and adds a new TBLOCK state which signals the fact that StartTransaction() has been executed. Alvaro Herrera	2004-04-05 03:11:39 +00:00
Tom Lane	375369acd1	Replace TupleTableSlot convention for whole-row variables and function results with tuples as ordinary varlena Datums. This commit does not in itself do much for us, except eliminate the horrid memory leak associated with evaluation of whole-row variables. However, it lays the groundwork for allowing composite types as table columns, and perhaps some other useful features as well. Per my proposal of a few days ago.	2004-04-01 21:28:47 +00:00
Teodor Sigaev	f2c064afcb	Cleanup vectors of GISTENTRY and eliminate problem with 64-bit strict-aligned boxes. Change interface to user-defined GiST support methods union and picksplit. Now instead of bytea struct it used special GistEntryVector structure.	2004-03-30 15:45:33 +00:00
Bruce Momjian	6367ed4382	Increase xlog str_time() static string variable, per Korean User's Group.	2004-03-22 04:16:57 +00:00
Tatsuo Ishii	0b86ade1c2	Add NOWAIT option to LOCK command	2004-03-11 01:47:41 +00:00
Tom Lane	7a57a67278	Replace opendir/closedir calls throughout the backend with AllocateDir and FreeDir routines modeled on the existing AllocateFile/FreeFile. Like the latter, these routines will avoid failing on EMFILE/ENFILE conditions whenever possible, and will prevent leakage of directory descriptors if an elog() occurs while one is open. Also, reduce PANIC to ERROR in MoveOfflineLogs() --- this is not critical code and there is no reason to force a DB restart on failure. All per recent trouble report from Olivier Hubaut.	2004-02-23 23:03:10 +00:00
Bruce Momjian	1f17316a3d	Here is an updated version of the win32 readdir patch. 1) Now puts in exactly the same change as the current-cvs mingw code does. (see http://cvs.sourceforge.net/viewcvs.py/mingw/runtime/mingwex/dirent.c?r1= 1.3&r2=1.4, second part of the patch). 2) Updates both xlog.c and slru.c in backend/access/transam/ 3) Also updates pg_resetxlog, which also uses readdir() and checks the errno value after the loop. Magnus Hagander	2004-02-17 03:45:17 +00:00
Tom Lane	c3c09be34b	Commit the reasonably uncontroversial parts of J.R. Nield's PITR patch, to wit: Add a header record to each WAL segment file so that it can be reliably identified. Avoid splitting WAL records across segment files (this is not strictly necessary, but makes it simpler to incorporate the header records). Make WAL entries for file creation, deletion, and truncation (as foreseen but never implemented by Vadim). Also, add support for making XLOG_SEG_SIZE configurable at compile time, similarly to BLCKSZ. Fix a couple bugs I introduced in WAL replay during recent smgr API changes. initdb is forced due to changes in pg_control contents.	2004-02-11 22:55:26 +00:00
Tom Lane	58f337a343	Centralize implementation of delay code by creating a pg_usleep() subroutine in src/port/pgsleep.c. Remove platform dependencies from miscadmin.h and put them in port.h where they belong. Extend recent vacuum cost-based-delay patch to apply to VACUUM FULL, ANALYZE, and non-btree index vacuuming. By the way, where is the documentation for the cost-based-delay patch?	2004-02-10 03:42:45 +00:00
Tom Lane	87bd956385	Restructure smgr API as per recent proposal. smgr no longer depends on the relcache, and so the notion of 'blind write' is gone. This should improve efficiency in bgwriter and background checkpoint processes. Internal restructuring in md.c to remove the not-very-useful array of MdfdVec objects --- might as well just use pointers. Also remove the long-dead 'persistent main memory' storage manager (mm.c), since it seems quite unlikely to ever get resurrected.	2004-02-10 01:55:27 +00:00
Jan Wieck	f425b605f4	Cost based vacuum delay feature. Jan	2004-02-06 19:36:18 +00:00
Tom Lane	391c3811a2	Rename SortMem and VacuumMem to work_mem and maintenance_work_mem. Make btree index creation and initial validation of foreign-key constraints use maintenance_work_mem rather than work_mem as their memory limit. Add some code to guc.c to allow these variables to be referenced by their old names in SHOW and SET commands, for backwards compatibility.	2004-02-03 17:34:04 +00:00
Tom Lane	2f0d43b251	Review uses of IsUnderPostmaster, change some tests to look at whereToSendOutput instead because they are really inquiring about the correct client communication protocol. Update some comments. This is pointing towards supporting regular FE/BE client protocol in a standalone backend, per discussion a month or so back.	2004-01-28 21:02:40 +00:00
Bruce Momjian	f4921e5ca3	Attached is a patch that fixes some trivial typos and alignment. Please apply. Alvaro Herrera	2004-01-26 22:51:56 +00:00
Tom Lane	c77f363384	Ensure that close() and fclose() are checked for errors, at least in cases involving writes. Per recent discussion about the possibility of close-time failures on some filesystems. There is a TODO item for this, too.	2004-01-26 22:35:32 +00:00
Tom Lane	be11fa26e3	Repair incorrect order of operations in GetNewTransactionId(). We must complete ExtendCLOG() before advancing nextXid, so that if that routine fails, the next incoming transaction will try it again. Per trouble report from Christopher Kings-Lynne.	2004-01-26 19:15:59 +00:00
Tom Lane	9bd681a522	Repair problem identified by Olivier Prenant: ALTER DATABASE SET search_path should not be too eager to reject paths involving unknown schemas, since it can't really tell whether the schemas exist in the target database. (Also, when reading pg_dumpall output, it could be that the schemas don't exist yet, but eventually will.) ALTER USER SET has a similar issue. So, reduce the normal ERROR to a NOTICE when checking search_path values for these commands. Supporting this requires changing the API for GUC assign_hook functions, which causes the patch to touch a lot of places, but the changes are conceptually trivial.	2004-01-19 19:04:40 +00:00
Tom Lane	0966516b75	Tighten short-circuit tests for deciding whether we need to invoke tuptoaster.c --- fields that are compressed in-line are not a reason to invoke the toaster. Along the way, add a couple more htup.h macros to eliminate confusing negated tests, and get rid of the already vestigial TUPLE_TOASTER_ACTIVE symbol.	2004-01-16 20:51:30 +00:00
Bruce Momjian	38081fd000	Change PG_DELAY from msec to usec and use it consistenly rather than select(). Add Win32 Sleep() for delay.	2004-01-09 21:08:50 +00:00
Neil Conway	192ad63bd7	More janitorial work: remove the explicit casting of NULL literals to a pointer type when it is not necessary to do so. For future reference, casting NULL to a pointer type is only necessary when (a) invoking a function AND either (b) the function has no prototype OR (c) the function is a varargs function.	2004-01-07 18:56:30 +00:00
Tom Lane	06288d4e22	Suppress compiler warning (xlog_outrec is unused if not WAL_DEBUG).	2004-01-06 22:22:37 +00:00
Neil Conway	bc028beb16	Make the 'wal_debug' GUC variable a boolean (rather than an integer), and hide it behind #ifdef WAL_DEBUG blocks.	2004-01-06 17:26:23 +00:00
Neil Conway	548523533f	Fix three trivial typos in comments.	2004-01-05 20:36:04 +00:00
Tom Lane	ef92b82dbb	Further cleanup in _bt_first: eliminate duplicate code paths.	2003-12-21 17:52:34 +00:00
Tom Lane	2a0caefeb5	Previous change exposed some opportunities for further simplification in _bt_first().	2003-12-21 03:00:04 +00:00
Tom Lane	569659ae16	Improve btree's initial-positioning-strategy code so that we never need to step more than one entry after descending the search tree to arrive at the correct place to start the scan. This can improve the behavior substantially when there are many entries equal to the chosen boundary value. Per suggestion from Dmitry Tkach, 14-Jul-03.	2003-12-21 01:23:06 +00:00
Bruce Momjian	d75b2ec4eb	This patch is the next step towards (re)allowing fork/exec. Claudio Natoli	2003-12-20 17:31:21 +00:00
Neil Conway	fef0c8345a	I posted some bufmgr cleanup a few weeks ago, but it conflicted with some concurrent changes Jan was making to the bufmgr. Here's an updated version of the patch -- it should apply cleanly to CVS HEAD and passes the regression tests. This patch makes the following changes: - remove the UnlockAndReleaseBuffer() and UnlockAndWriteBuffer() macros, and replace uses of them with calls to the appropriate functions. - remove a bunch of #ifdef BMTRACE code: it is ugly & broken (i.e. it doesn't compile) - make BufferReplace() return a bool, not an int - cleanup some logic in bufmgr.c; should be functionality equivalent to the previous code, just cleaner now - remove the BM_PRIVATE flag as it is unused - improve a few comments, etc.	2003-12-14 00:34:47 +00:00
Peter Eisentraut	2afacfc403	This patch properly sets the prototype for the on_shmem_exit and on_proc_exit functions, and adjust all other related code to use the proper types too. by Kurt Roeckx	2003-12-12 18:45:10 +00:00
Joe Conway	e2605c8311	Add a warning to AtEOXact_SPI() to catch cases where the current transaction has been committed without SPI_finish() being called first. Per recent discussion here: http://archives.postgresql.org/pgsql-patches/2003-11/msg00286.php	2003-12-02 19:26:47 +00:00
PostgreSQL Daemon	55b113257c	make sure the $Id tags are converted to $PostgreSQL as well ...	2003-11-29 22:41:33 +00:00
PostgreSQL Daemon	969685ad44	$Header: -> $PostgreSQL Changes ...	2003-11-29 19:52:15 +00:00
Tom Lane	fa5c8a055a	Cross-data-type comparisons are now indexable by btrees, pursuant to my pghackers proposal of 8-Nov. All the existing cross-type comparison operators (int2/int4/int8 and float4/float8) have appropriate support. The original proposal of storing the right-hand-side datatype as part of the primary key for pg_amop and pg_amproc got modified a bit in the event; it is easier to store zero as the 'default' case and only store a nonzero when the operator is actually cross-type. Along the way, remove the long-since-defunct bigbox_ops operator class.	2003-11-12 21:15:59 +00:00
Tom Lane	c1d62bfd00	Add operator strategy and comparison-value datatype fields to ScanKey. Remove the 'strategy map' code, which was a large amount of mechanism that no longer had any use except reverse-mapping from procedure OID to strategy number. Passing the strategy number to the index AM in the first place is simpler and faster. This is a preliminary step in planned support for cross-datatype index operations. I'm committing it now since the ScanKeyEntryInitialize() API change touches quite a lot of files, and I want to commit those changes before the tree drifts under me.	2003-11-09 21:30:38 +00:00
Tom Lane	90b2202975	Fix bad interaction between NOTIFY processing and V3 extended query protocol, per report from Igor Shevchenko. NOTIFY thought it could do its thing if transaction blockState is TBLOCK_DEFAULT, but in reality it had better check the low-level transaction state is TRANS_DEFAULT as well. Formerly it was not possible to wait for the client in a state where the first is true and the second is not ... but now we can have such a state. Minor cleanup in StartTransaction() as well.	2003-10-16 16:50:41 +00:00
Tom Lane	55d85f42a8	Repair RI trigger visibility problems (this time for sure ;-)) per recent discussion on pgsql-hackers: in READ COMMITTED mode we just have to force a QuerySnapshot update in the trigger, but in SERIALIZABLE mode we have to run the scan under a current snapshot and then complain if any rows would be updated/deleted that are not visible in the transaction snapshot.	2003-10-01 21:30:53 +00:00
Tom Lane	e33f205a94	Adjust btree index build procedure so that the btree metapage looks invalid (has the wrong magic number) until the build is entirely complete. This turns out to cost no additional writes in the normal case, since we were rewriting the metapage at the end of the process anyway. In normal scenarios there's no real gain in security, because a failed index build would roll back the transaction leaving an unused index file, but for rebuilding shared system indexes this seems to add some useful protection.	2003-09-29 23:40:26 +00:00
Tom Lane	8934790052	Add a mechanism to let dynamically loaded modules register post-commit/ post-abort cleanup hooks. I'm surprised that we have not needed this already, but I need it now to fix a plpgsql problem, and the usefulness for other dynamically loaded modules seems obvious.	2003-09-28 23:26:20 +00:00
Tom Lane	4f7a2fa0c3	Fix typo in message.	2003-09-27 18:16:35 +00:00
Peter Eisentraut	d84b6ef56b	Various message fixes, among those fixes for the previous round of fixes	2003-09-26 15:27:37 +00:00
Peter Eisentraut	feb4f44d29	Message editing: remove gratuitous variations in message wording, standardize terms, add some clarifications, fix some untranslatable attempts at dynamic message building.	2003-09-25 06:58:07 +00:00
Tom Lane	a56a016ceb	Repair some REINDEX problems per recent discussions. The relcache is now able to cope with assigning new relfilenode values to nailed-in-cache indexes, so they can be reindexed using the fully crash-safe method. This leaves only shared system indexes as special cases. Remove the 'index deactivation' code, since it provides no useful protection in the shared- index case. Require reindexing of shared indexes to be done in standalone mode, but remove other restrictions on REINDEX. -P (IgnoreSystemIndexes) now prevents using indexes for lookups, but does not disable index updates. It is therefore safe to allow from PGOPTIONS. Upshot: reindexing system catalogs can be done without a standalone backend for all cases except shared catalogs.	2003-09-24 18:54:02 +00:00
Tom Lane	db18703b5a	Fix LISTEN/NOTIFY race condition reported by Gavin Sherry. While a really general fix might be difficult, I believe the only case where AtCommit_Notify could see an uncommitted tuple is where the other guy has just unlistened and not yet committed. The best solution seems to be to just skip updating that tuple, on the assumption that the other guy does not want to hear about the notification anyway. This is not perfect --- if the other guy rolls back his unlisten instead of committing, then he really should have gotten this notify. But to do that, we'd have to wait to see if he commits or not, or make UNLISTEN hold exclusive lock on pg_listener until commit. Either of these answers is deadlock-prone, not to mention horrible for interactive performance. Do it this way for now. (What happened to that project to do LISTEN/NOTIFY in memory with no table, anyway?)	2003-09-15 23:33:43 +00:00
Tom Lane	7a3693716d	Reimplement hash index locking algorithms, per my recent proposal to pghackers. This fixes the problem recently reported by Markus KrÌutner (hash bucket split corrupts the state of scans being done concurrently), and I believe it also fixes all the known problems with deadlocks in hash index operations. Hash indexes are still not really ready for prime time (since they aren't WAL-logged), but this is a step forward.	2003-09-04 22:06:27 +00:00
Tom Lane	5ac2d7c0eb	In _bt_check_unique() loop, don't bother applying _bt_isequal() to killed items; just skip to the next item immediately. Only check for key equality when we reach a non-killed item or the end of the index page. This saves key comparisons when there are lots of killed items, as for example in a heavily-updated table that's not been vacuumed lately. Seems to be a win for pgbench anyway.	2003-09-02 22:10:16 +00:00
Tom Lane	d70610c4ee	Several fixes for hash indexes that involve changing the on-disk index layout; therefore, this change forces REINDEX of hash indexes (though not a full initdb). Widen hashm_ntuples to double so that hash space management doesn't get confused by more than 4G entries; enlarge the allowed number of free-space-bitmap pages; replace the useless bshift field with a useful bmshift field; eliminate 4 bytes of wasted space in the per-page special area.	2003-09-02 18:13:32 +00:00
Tom Lane	8b2450c831	Fix a couple typos, add some more comments.	2003-09-02 03:29:01 +00:00
Tom Lane	39673ca47b	Rewrite hashbulkdelete() to make it amenable to new bucket locking scheme. A pleasant side effect is that it is much faster when deleting a large fraction of the indexed tuples, because of elimination of redundant hash_step activity induced by hash_adjscans. Various other continuing code cleanup.	2003-09-02 02:18:38 +00:00
Tom Lane	65c2d427fb	Preliminary cleanup for hash index code (doesn't attack the locking problem yet). Fix a couple of bugs that would only appear if multiple bitmap pages are used, including a buffer reference leak and incorrect computation of bit indexes. Get rid of 'overflow address' concept, which accomplished nothing except obfuscating the code and creating a risk of failure due to limited range of offset field. Rename some misleadingly-named fields and routines, and improve documentation.	2003-09-01 20:26:34 +00:00
Tom Lane	eaeb8621f8	Add some internals documentation for hash indexes, including an explanation of the remarkably confusing page addressing scheme. The file also includes my planned-but-not-yet-implemented revision of the hash index locking scheme.	2003-09-01 20:24:49 +00:00
Tom Lane	302f1a86dc	Rewriter and planner should use only resno, not resname, to identify target columns in INSERT and UPDATE targetlists. Don't rely on resname to be accurate in ruleutils, either. This fixes bug reported by Donald Fraser, in which renaming a column referenced in a rule did not work very well.	2003-08-11 23:04:50 +00:00
Tom Lane	ffafacc1f6	Repair potential deadlock created by recent changes to recycle btree index pages: when _bt_getbuf asks the FSM for a free index page, it is possible (and, in some cases, even moderately likely) that the answer will be the same page that _bt_split is trying to split. _bt_getbuf already knew that the returned page might not be free, but it wasn't prepared for the possibility that even trying to lock the page could be problematic. Fix by doing a conditional rather than unconditional grab of the page lock.	2003-08-10 19:48:08 +00:00
Bruce Momjian	46785776c4	Another pgindent run with updated typedefs.	2003-08-08 21:42:59 +00:00
Tom Lane	870886affe	Suppress unused-variable warnings when building without Asserts.	2003-08-08 14:39:45 +00:00
Tom Lane	338aa57be0	Rename fields of DestReceiver to avoid collisions with (ill-considered) macros in some platforms' sys/socket.h.	2003-08-06 17:46:46 +00:00
Tom Lane	2f9c859ea1	Fix some copyright notices that weren't updated. Improve copyright tool so it won't miss 'em again.	2003-08-04 23:59:41 +00:00
Bruce Momjian	f3c3deb7d0	Update copyrights to 2003.	2003-08-04 02:40:20 +00:00
Bruce Momjian	089003fb46	pgindent run.	2003-08-04 00:43:34 +00:00
Tom Lane	892a51c367	Fix longstanding error in _bt_search(): should moveright at top of loop not bottom. Otherwise we fail to moveright when the root page was split while we were "in flight" to it. This is not a significant problem when the root is above the leaf level, but if the root was also a leaf (ie, a single-page index just got split) we may return the wrong leaf page to the caller, resulting in failure to find a key that is in fact present. Bug has existed at least since 7.1, probably forever.	2003-07-29 22:18:38 +00:00
Tom Lane	81b5c8a136	A visit from the message-style police ...	2003-07-28 00:09:16 +00:00
Tom Lane	ec7aa4b515	Error message editing in backend/access.	2003-07-21 20:29:40 +00:00
Tom Lane	fa3bd4dbd0	Error message editing: finish up undone task of reporting the problem xid when we fail to access pg_clog.	2003-07-19 21:37:37 +00:00
Tom Lane	8cf63ba920	Repair boundary-case bug introduced by patch of two months ago that fixed incorrect initial setting of StartUpID. The logic in XLogWrite() expects that Write->curridx is advanced to the next page as soon as LogwrtResult points to the end of the current page, but StartupXLOG() failed to make that happen when the old WAL ended exactly on a page boundary. Per trouble report from Hannu Krosing.	2003-07-17 16:45:04 +00:00
Tom Lane	0c985ab5a8	Add comment pointing out that XLByteToPrevSeg macro is not broken.	2003-06-26 18:23:07 +00:00
Tom Lane	bff0422b6c	Revise hash join and hash aggregation code to use the same datatype- specific hash functions used by hash indexes, rather than the old not-datatype-aware ComputeHashFunc routine. This makes it safe to do hash joining on several datatypes that previously couldn't use hashing. The sets of datatypes that are hash indexable and hash joinable are now exactly the same, whereas before each had some that weren't in the other.	2003-06-22 22:04:55 +00:00
Tom Lane	3fb6f1347f	Replace cryptic 'Unknown kind of return type' messages with something hopefully a little more useful.	2003-06-15 17:59:10 +00:00
Bruce Momjian	0abe7431c6	This patch extracts page buffer pooling and the simple least-recently-used strategy from clog.c into slru.c. It doesn't change any visible behaviour and passes all regression tests plus a TruncateCLOG test done manually. Apart from refactoring I made a little change to SlruRecentlyUsed, formerly ClogRecentlyUsed: It now skips incrementing lru_counts, if slotno is already the LRU slot, thus saving a few CPU cycles. To make this work, lru_counts are initialised to 1 in SimpleLruInit. SimpleLru will be used by pg_subtrans (part of the nested transactions project), so the main purpose of this patch is to avoid future code duplication. Manfred Koizar	2003-06-11 22:37:46 +00:00
Bruce Momjian	98b6f37e47	Make debug_ GUC varables output DEBUG1 rather than LOG, and mention in docs that CLIENT/LOG_MIN_MESSAGES now controls debug_* output location. Doc changes included.	2003-05-27 17:49:47 +00:00
Tom Lane	8c43300ccc	Make sure printtup() always sends the number of columns previously advertised in RowDescription message. Depending on the physical tuple's column count is not really correct, since according to heap_getattr() conventions the tuple may be short some columns, which will automatically get read as nulls. Problem has been latent since forever, but was only exposed by recent change to skip a projection step in SELECT * FROM...	2003-05-26 17:51:38 +00:00
Tom Lane	39e98d9563	Repair sometimes-incorrect computation of StartUpID after a crash, per example from Rao Kumar. This is a very corner corner-case, requiring a minimum of three closely-spaced database crashes and an unlucky positioning of the second recovery's checkpoint record before you'd notice any problem. But the consequences are dire enough that it's a must-fix.	2003-05-22 14:39:28 +00:00
Peter Eisentraut	2c0556068f	Indexing support for pattern matching operations via separate operator class when lc_collate is not C.	2003-05-15 15:50:21 +00:00
Tom Lane	f85f43dfb5	Backend support for autocommit removed, per recent discussions. The only remnant of this failed experiment is that the server will take SET AUTOCOMMIT TO ON. Still TODO: provide some client-side autocommit logic in libpq.	2003-05-14 03:26:03 +00:00
Tom Lane	d9b679c13a	In RowDescription messages, report columns of domain datatypes as having the type OID and typmod of the underlying base type. Per discussions a few weeks ago with Andreas Pflug and others. Note that this behavioral change affects both old- and new-protocol clients.	2003-05-13 18:39:50 +00:00
Tom Lane	30f609484d	Add binary I/O routines for a bunch more datatypes. Still a few to go, but that was enough tedium for one day. Along the way, move the few support routines for types xid and cid into a more logical place.	2003-05-12 23:08:52 +00:00
Tom Lane	8d86a96068	Adjust CreateCheckpoint so that buffer dumping activities and cleanup of dead xlog segments are not considered part of a critical section. It is not necessary to force a database-wide panic if we get a failure in these operations. Per recent trouble reports.	2003-05-10 18:01:31 +00:00
Tom Lane	0ac6298bb8	Implement new-protocol binary I/O support in DataRow, Bind, and FunctionCall messages. Binary I/O is now up and working, but only for a small set of datatypes (integers, text, bytea).	2003-05-09 18:08:48 +00:00
Tom Lane	c0a8c3ac13	Update 3.0 protocol support to match recent agreements about how to handle multiple 'formats' for data I/O. Restructure CommandDest and DestReceiver stuff one more time (it's finally starting to look a bit clean though). Code now matches latest 3.0 protocol document as far as message formats go --- but there is no support for binary I/O yet.	2003-05-08 18:16:37 +00:00
Tom Lane	79913910d4	Restructure command destination handling so that we pass around DestReceiver pointers instead of just CommandDest values. The DestReceiver is made at the point where the destination is selected, rather than deep inside the executor. This cleans up the original kluge implementation of tstoreReceiver.c, and makes it easy to support retrieving results from utility statements inside portals. Thus, you can now do fun things like Bind and Execute a FETCH or EXPLAIN command, and it'll all work as expected (e.g., you can Describe the portal, or use Execute's count parameter to suspend the output partway through). Implementation involves stuffing the utility command's output into a Tuplestore, which would be kind of annoying for huge output sets, but should be quite acceptable for typical uses of utility commands.	2003-05-06 20:26:28 +00:00
Tom Lane	2cf57c8f8d	Implement feature of new FE/BE protocol whereby RowDescription identifies the column by table OID and column number, if it's a simple column reference. Along the way, get rid of reskey/reskeyop fields in Resdoms. Turns out that representation was not convenient for either the planner or the executor; we can make the planner deliver exactly what the executor wants with no more effort. initdb forced due to change in stored rule representation.	2003-05-06 00:20:33 +00:00
Tom Lane	16503e6fa4	Extended query protocol: parse, bind, execute, describe FE/BE messages. Only lightly tested as yet, since libpq doesn't know anything about 'em.	2003-05-05 00:44:56 +00:00
Bruce Momjian	a7fd03e1de	Handle clog structure in shared memory in exec() case, for Win32.	2003-05-03 03:52:07 +00:00
Bruce Momjian	a2e038fbee	Back out last commit --- wrong patch.	2003-05-02 21:59:31 +00:00
Bruce Momjian	fb1f7ccec5	Dump/read non-default GUC values for use by exec'ed backends, for Win32.	2003-05-02 21:52:42 +00:00
Tom Lane	de28dc9a04	Portal and memory management infrastructure for extended query protocol. Both plannable queries and utility commands are now always executed within Portals, which have been revamped so that they can handle the load (they used to be good only for single SELECT queries). Restructure code to push command-completion-tag selection logic out of postgres.c, so that it won't have to be duplicated between simple and extended queries. initdb forced due to addition of a field to Query nodes.	2003-05-02 20:54:36 +00:00
Tom Lane	4db9689d1a	Add transaction status field to ReadyForQuery messages, and make room for tableID/columnID in RowDescription. (The latter isn't really implemented yet though --- the backend always sends zeroes, and libpq just throws away the data.)	2003-04-26 20:23:00 +00:00
Tom Lane	9cbaf72177	In the continuing saga of FE/BE protocol revisions, add reporting of initial values and runtime changes in selected parameters. This gets rid of the need for an initial 'select pg_client_encoding()' query in libpq, bringing us back to one message transmitted in each direction for a standard connection startup. To allow server version to be sent using the same GUC mechanism that handles other parameters, invent the concept of a never-settable GUC parameter: you can 'show server_version' but it's not settable by any GUC input source. Create 'lc_collate' and 'lc_ctype' never-settable parameters so that people can find out these settings without need for pg_controldata. (These side ideas were all discussed some time ago in pgsql-hackers, but not yet implemented.)	2003-04-25 19:45:10 +00:00
Tom Lane	5ed27e35f3	Another round of protocol changes. Backend-to-frontend messages now all have length words. COPY OUT reimplemented per new protocol: it doesn't need \. anymore, thank goodness. COPY BINARY to/from frontend works, at least as far as the backend is concerned --- libpq's PQgetline API is not up to snuff, and will have to be replaced with something that is null-safe. libpq uses message length words for performance improvement (no cycles wasted rescanning long messages), but not yet for error recovery.	2003-04-22 00:08:07 +00:00
Bruce Momjian	4d4953fc41	Make Win32 tests to match existing Cygwin tests, where appropriate.	2003-04-18 01:03:42 +00:00

... 9 10 11 12 13 ...

1783 Commits