postgresql

Commit Graph

Author	SHA1	Message	Date
Alvaro Herrera	4650c4fdb9	Degrade the transaction-id wraparound point message from LOG to DEBUG1, per discussion. Patch from Simon Riggs.	2006-09-26 17:21:39 +00:00
Tom Lane	8fad2e3ff4	Arrange for GetSnapshotData to copy live-subtransaction XIDs from the PGPROC array into snapshots, and use this information to avoid visits to pg_subtrans in HeapTupleSatisfiesSnapshot. This appears to solve the pg_subtrans-related context swap storm problem that's been reported by several people for 8.1. While at it, modify GetSnapshotData to not take an exclusive lock on ProcArrayLock, as closer analysis shows that shared lock is always sufficient. Itagaki Takahiro and Tom Lane	2006-09-03 15:59:39 +00:00
Tom Lane	ca1fd0ea5b	Move xact.c's partial support for Lists of TransactionIds into pg_list.h. Needed because lock.c is now going to use the same type of list.	2006-08-27 19:11:46 +00:00
Tom Lane	35af5422f6	Make the server track an 'XID epoch', that is, maintain higher-order bits of the transaction ID counter. Nothing is done with the epoch except to store it in checkpoint records, but this provides a foundation with which add-on code can pretend that XIDs never wrap around. This is a severely trimmed and rewritten version of the xxid patch submitted by Marko Kreen. Per discussion, the epoch counter seems the only part of xxid that really needs to be in the core server.	2006-08-21 16:16:31 +00:00
Tom Lane	e8ea9e9587	Implement archive_timeout feature to force xlog file switches to occur no more than N seconds apart. This allows a simple, if not very high performance, means of guaranteeing that a PITR archive is no more than N seconds behind real time. Also make pg_current_xlog_location return the WAL Write pointer, add pg_current_xlog_insert_location to return the Insert pointer, and fix pg_xlogfile_name_offset to return its results as a two-element record instead of a smashed-together string, as per recent discussion. Simon Riggs	2006-08-17 23:04:10 +00:00
Tom Lane	e002836913	Make recovery from WAL be restartable, by executing a checkpoint-like operation every so often. This improves the usefulness of PITR log shipping for hot standby: formerly, if the standby server crashed, it was necessary to restart it from the last base backup and replay all the WAL since then. Now it will only need to reread about the same amount of WAL as the master server would. The behavior might also come in handy during a long PITR replay sequence. Simon Riggs, with some editorialization by Tom Lane.	2006-08-07 16:57:57 +00:00
Tom Lane	704ddaaa09	Add support for forcing a switch to a new xlog file; cause such a switch to happen automatically during pg_stop_backup(). Add some functions for interrogating the current xlog insertion point and for easily extracting WAL filenames from the hex WAL locations displayed by pg_stop_backup and friends. Simon Riggs with some editorialization by Tom Lane.	2006-08-06 03:53:44 +00:00
Alvaro Herrera	92c2ecc130	Modify snapshot definition so that lazy vacuums are ignored by other vacuums. This allows a OLTP-like system with big tables to continue regular vacuuming on small-but-frequently-updated tables while the big tables are being vacuumed. Original patch from Hannu Krossing, rewritten by Tom Lane and updated by me.	2006-07-30 02:07:18 +00:00
Peter Eisentraut	e9b4969062	DTrace support, with a small initial set of probes by Robert Lor	2006-07-24 16:32:45 +00:00
Tom Lane	9dc842f083	Don't try to truncate multixact SLRU files in checkpoints done during xlog recovery. In the first place, it doesn't work because slru's latest_page_number isn't set up yet (this is why we've been hearing reports of strange "apparent wraparound" log messages during crash recovery, but only from people who'd managed to advance their next-mxact counters some considerable distance from 0). In the second place, it seems a bit unwise to be throwing away data during crash recovery anwyway. This latter consideration convinces me to just disable truncation during recovery, rather than computing latest_page_number and pushing ahead.	2006-07-20 00:46:42 +00:00
Bruce Momjian	e0522505bd	Remove 576 references of include files that were not needed.	2006-07-14 14:52:27 +00:00
Bruce Momjian	a22d76d96a	Allow include files to compile own their own. Strip unused include files out unused include files, and add needed includes to C files. The next step is to remove unused include files in C files.	2006-07-13 16:49:20 +00:00
Bruce Momjian	0ff3461bcc	Alphabetically order reference to include files, "N" - "S".	2006-07-11 17:26:59 +00:00
Bruce Momjian	3a534ade39	Alphabetically order reference to include files, "G" - "M".	2006-07-11 17:04:13 +00:00
Alvaro Herrera	d4cef0aa2a	Improve vacuum code to track minimum Xids per table instead of per database. To this end, add a couple of columns to pg_class, relminxid and relvacuumxid, based on which we calculate the pg_database columns after each vacuum. We now force all databases to be vacuumed, even template ones. A backend noticing too old a database (meaning pg_database.datminxid is in danger of falling behind Xid wraparound) will signal the postmaster, which in turn will start an autovacuum iteration to process the offending database. In principle this is only there to cope with frozen (non-connectable) databases without forcing users to set them to connectable, but it could force regular user database to go through a database-wide vacuum at any time. Maybe we should warn users about this somehow. Of course the real solution will be to use autovacuum all the time ;-) There are some additional improvements we could have in this area: for example the vacuum code could be smarter about not updating pg_database for each table when called by autovacuum, and do it only once the whole autovacuum iteration is done. I updated the system catalogs documentation, but I didn't modify the maintenance section. Also having some regression tests for this would be nice but it's not really a very straightforward thing to do. Catalog version bumped due to system catalog changes.	2006-07-10 16:20:52 +00:00
Tom Lane	b7b78d24f7	Code review for FILLFACTOR patch. Change WITH grammar as per earlier discussion (including making def_arg allow reserved words), add missed opt_definition for UNIQUE case. Put the reloptions support code in a less random place (I chose to make a new file access/common/reloptions.c). Eliminate header inclusion creep. Make the index options functions safely user-callable (seems like client apps might like to be able to test validity of options before trying to make an index). Reduce overhead for normal case with no options by allowing rd_options to be NULL. Fix some unmaintainably klugy code, including getting rid of Natts_pg_class_fixed at long last. Some stylistic cleanup too, and pay attention to keeping comments in sync with code. Documentation still needs work, though I did fix the omissions in catalogs.sgml and indexam.sgml.	2006-07-03 22:45:41 +00:00
Bruce Momjian	277807bd9e	Add FILLFACTOR to CREATE INDEX. ITAGAKI Takahiro	2006-07-02 02:23:23 +00:00
Tom Lane	3c71244b74	Put #ifdef NOT_USED around posix_fadvise call. We may want to resurrect this someday, but right now it seems that posix_fadvise is immature to the point of being broken on many platforms ... and we don't have any benchmark evidence proving it's worth spending time on.	2006-06-27 18:59:17 +00:00
Tom Lane	3a04f53e7f	pg_stop_backup was calling XLogArchiveNotify() twice for the newly created backup history file. Bug introduced by the 8.1 change to make pg_stop_backup delete older history files. Per report from Masao Fujii.	2006-06-22 20:42:57 +00:00
Tom Lane	27c3e3de09	Remove redundant gettimeofday() calls to the extent practical without changing semantics too much. statement_timestamp is now set immediately upon receipt of a client command message, and the various places that used to do their own gettimeofday() calls to mark command startup are referenced to that instead. I have also made stats_command_string use that same value for pg_stat_activity.query_start for both the command itself and its eventual replacement by <IDLE> or <idle in transaction>. There was some debate about that, but no argument that seemed convincing enough to justify an extra gettimeofday() call.	2006-06-20 22:52:00 +00:00
Tom Lane	1e8ae13640	Don't try to call posix_fadvise() unless <fcntl.h> supplies a declaration for it. Hopefully will fix core dump evidenced by some buildfarm members since fadvise patch went in. The actual definition of the function is not ABI-compatible with compiler's default assumption in the absence of any declaration, so it's clearly unsafe to try to call it without seeing a declaration.	2006-06-18 18:30:21 +00:00
Bruce Momjian	40bc06fa16	Test for POSIX_FADV_DONTNEED to use posix_fadvise().	2006-06-16 04:11:48 +00:00
Bruce Momjian	94a5c4a01b	Use posix_fadvise() to avoid kernel caching of WAL contents on WAL file close. ITAGAKI Takahiro	2006-06-15 19:15:00 +00:00
Teodor Sigaev	8a3631f8d8	GIN: Generalized Inverted iNdex. text[], int4[], Tsearch2 support for GIN.	2006-05-02 11:28:56 +00:00
Bruce Momjian	e6004f0151	Add statement_timestamp(), clock_timestamp(), and transaction_timestamp() (just like now()). Also update statement_timeout() to mention it is statement arrival time that is measured. Catalog version updated.	2006-04-25 00:25:22 +00:00
Tom Lane	eac825aa68	Ensure that we validate the page header of the first page of a WAL file whenever we start to read within that file. The first page carries extra identification information that really ought to be checked, but as the code stood, this was only checked when we switched sequentially into a new WAL file, or if by chance the starting checkpoint record was within the first page. This patch ensures that we will detect bogus 'long header' information before we start replaying the WAL sequence.	2006-04-20 04:07:38 +00:00
Tom Lane	0a87394956	Fix the torn-page hazard for PITR base backups by forcing full page writes to occur between pg_start_backup() and pg_stop_backup(), even if the GUC setting full_page_writes is OFF. Per discussion, doing this in combination with the already-existing checkpoint during pg_start_backup() should ensure safety against partial page updates being included in the backup. We do not have to force full page writes to occur during normal PITR operation, as I had first feared.	2006-04-17 18:55:05 +00:00
Tom Lane	defe93463c	Make the world safe for full_page_writes. Allow XLOG records that try to update no-longer-existing pages to fall through as no-ops, but make a note of each page number referenced by such records. If we don't see a later XLOG entry dropping the table or truncating away the page, complain at the end of XLOG replay. Since this fixes the known failure mode for full_page_writes = off, revert my previous band-aid patch that disabled that GUC variable.	2006-04-14 20:27:24 +00:00
Tom Lane	09b5271ebd	Add a field to the first page of each WAL file to indicate the XLOG_BLCKSZ. This ought to help in preventing configuration mismatch problems if anyone tries to ship PITR files between servers compiled with different XLOG_BLCKSZ settings. Simon Riggs	2006-04-05 03:34:05 +00:00
Tom Lane	e6140d9052	Don't use BLCKSZ for the physical length of the pg_control file, but instead a dedicated symbol. This probably makes no functional difference for likely values of BLCKSZ, but it makes the intent clearer. Simon Riggs, minor editorialization by Tom Lane.	2006-04-04 22:39:59 +00:00
Tom Lane	eaef111396	Define a separately configurable XLOG_BLCKSZ symbol for the page size used within WAL files. Historically this was the same as the data file BLCKSZ, but there's no necessary connection, and it's possible that performance gains might ensue from reducing XLOG_BLCKSZ. In any case distinguishing two symbols should improve code clarity. This commit does not actually change the page size, only provide the infrastructure to make it possible to do so. initdb forced because of addition of a field to pg_control. Mark Wong, with some help from Simon Riggs and Tom Lane.	2006-04-03 23:35:05 +00:00
Tom Lane	a8b8f4db23	Clean up WAL/buffer interactions as per my recent proposal. Get rid of the misleadingly-named WriteBuffer routine, and instead require routines that change buffer pages to call MarkBufferDirty (which does exactly what it says). We also require that they do so before calling XLogInsert; this takes care of the synchronization requirement documented in SyncOneBuffer. Note that because bufmgr takes the buffer content lock (in shared mode) while writing out any buffer, it doesn't matter whether MarkBufferDirty is executed before the buffer content change is complete, so long as the content change is completed before releasing exclusive lock on the buffer. So it's OK to set the dirtybit before we fill in the LSN. This eliminates the former kluge of needing to set the dirtybit in LockBuffer. Aside from making the code more transparent, we can also add some new debugging assertions, in particular that the caller of MarkBufferDirty must hold the buffer content lock, not merely a pin.	2006-03-31 23:32:07 +00:00
Tom Lane	6d61cdec07	Clean up and document the API for XLogOpenRelation and XLogReadBuffer. This commit doesn't make much functional change, but it does eliminate some duplicated code --- for instance, PageIsNew tests are now done inside XLogReadBuffer rather than by each caller. The GIST xlog code still needs a lot of love, but I'll worry about that separately.	2006-03-29 21:17:39 +00:00
Tom Lane	0a971e2f20	Disable full_page_writes, because turning it off risks causing crash-recovery failures even when the hardware and OS did nothing wrong. Per recent analysis of a problem report from Alex Bahdushka. For the moment I've just diked out the test of the parameter, rather than removing the GUC infrastructure and documentation, in case we conclude that there's something salvageable there. There seems no chance of it being resurrected in the 8.1 branch though.	2006-03-28 22:01:16 +00:00
Tom Lane	0a20207060	Arrange to emit a description of the current XLOG record as error context when an error occurs during xlog replay. Also, replace the former risky 'write into a fixed-size buffer with no overflow detection' API for XLOG record description routines; use an expansible StringInfo instead. (The latter accounts for most of the patch bulk.) Qingqing Zhou	2006-03-24 04:32:13 +00:00
Bruce Momjian	f2f5b05655	Update copyright for 2006. Update scripts.	2006-03-05 15:59:11 +00:00
Neil Conway	8e5a10d46c	This patch makes the error message strings throughout the backend more compliant with the error message style guide. In particular, errdetail should begin with a capital letter and end with a period, whereas errmsg should not. I also fixed a few related issues in passing, such as fixing the repeated misspelling of "lexeme" in contrib/tsearch2 (per Tom's suggestion).	2006-03-01 06:30:32 +00:00
Tom Lane	c89a0dd3bb	Repair longstanding bug in slru/clog logic: it is possible for two backends to try to create a log segment file concurrently, but the code erroneously specified O_EXCL to open(), resulting in a needless failure. Before 7.4, it was even a PANIC condition :-(. Correct code is actually simpler than what we had, because we can just say O_CREAT to start with and not need a second open() call. I believe this accounts for several recent reports of hard-to-reproduce "could not create file ...: File exists" errors in both pg_clog and pg_subtrans.	2006-01-21 04:38:21 +00:00
Neil Conway	fb627b76cc	Cosmetic code cleanup: fix a bunch of places that used "return (expr);" rather than "return expr;" -- the latter style is used in most of the tree. I kept the parentheses when they were necessary or useful because the return expression was complex.	2006-01-11 08:43:13 +00:00
Tom Lane	195f164228	Get rid of the SpinLockAcquire/SpinLockAcquire_NoHoldoff distinction in favor of having just one set of macros that don't do HOLD/RESUME_INTERRUPTS (hence, these correspond to the old SpinLockAcquire_NoHoldoff case). Given our coding rules for spinlock use, there is no reason to allow CHECK_FOR_INTERRUPTS to be done while holding a spinlock, and also there is no situation where ImmediateInterruptOK will be true while holding a spinlock. Therefore doing HOLD/RESUME_INTERRUPTS while taking/releasing a spinlock is just a waste of cycles. Qingqing Zhou and Tom Lane.	2005-12-29 18:08:05 +00:00
Tom Lane	ab51bbaa06	Arrange to set the LC_XXX environment variables to match our locale setup. This protects against undesired changes in locale behavior if someone carelessly does setlocale(LC_ALL, "") (and we know who you are, perl guys).	2005-12-28 23:22:51 +00:00
Tom Lane	ec0baf949e	Divide the lock manager's shared state into 'partitions', so as to reduce contention for the former single LockMgrLock. Per my recent proposal. I set it up for 16 partitions, but on a pgbench test this gives only a marginal further improvement over 4 partitions --- we need to test more scenarios to choose the number of partitions.	2005-12-11 21:02:18 +00:00
Tom Lane	887a7c61f6	Get rid of slru.c's hardwired insistence on a fixed number of slots per SLRU area. The number of slots is still a compile-time constant (someday we might want to change that), but at least it's a different constant for each SLRU area. Increase number of subtrans buffers to 32 based on experimentation with a heavily subtrans-bashing test case, and increase number of multixact member buffers to 16, since it's obviously silly for it not to be at least twice the number of multixact offset buffers.	2005-12-06 23:08:34 +00:00
Tom Lane	a615acf555	Arrange for read-only accesses to SLRU page buffers to take only a shared lock, not exclusive, if the desired page is already in memory. This can be demonstrated to be a significant win on the pg_subtrans cache when there is a large window of open transactions. It should be useful for pg_clog as well. I didn't try to make GetMultiXactIdMembers() use the code, as that would have taken some restructuring, and what with the local cache for multixact contents it probably wouldn't really make a difference. Per my recent proposal.	2005-12-06 18:10:06 +00:00
Bruce Momjian	436a2956d8	Re-run pgindent, fixing a problem where comment lines after a blank comment line where output as too long, and update typedefs for /lib directory. Also fix case where identifiers were used as variable names in the backend, but as typedefs in ecpg (favor the backend for indenting). Backpatch to 8.1.X.	2005-11-22 18:17:34 +00:00
Tom Lane	2a8d3d83ef	R-tree is dead ... long live GiST.	2005-11-07 17:36:47 +00:00
Tom Lane	18691d8ee3	Clean up representation of SLRU page state. This is the cleaner fix for the SLRU race condition that I posted a few days ago, but we decided not to use in 8.1 and older branches.	2005-11-05 21:19:47 +00:00
Tom Lane	99d48695d4	Fix longstanding race condition in transaction log management: there was a very narrow window in which SimpleLruReadPage or SimpleLruWritePage could think that I/O was needed when it wasn't (and indeed the buffer had already been assigned to another page). This would result in an Assert failure if Asserts were enabled, and probably in silent data corruption if not. Reported independently by Jim Nasby and Robert Creager. I intend a more extensive fix when 8.2 development starts, but this is a reasonably low-impact patch for the existing branches.	2005-11-03 00:23:36 +00:00
Peter Eisentraut	07bb9f086b	Message corrections	2005-10-29 00:31:52 +00:00
Tom Lane	a037926295	Reorder code so that we don't have to hold a critical section while reserving SLRU space for a new MultiXact. The original coding would have treated out-of-disk-space as a PANIC condition, which is unnecessary.	2005-10-28 19:00:19 +00:00
Tom Lane	1986ca5ce5	Fix race condition in multixact code: it's possible to try to read a multixact's starting offset before the offset has been stored into the SLRU file. A simple fix would be to hold the MultiXactGenLock until the offset has been stored, but that looks like a big concurrency hit. Instead rely on knowledge that unset offsets will be zero, and loop when we see a zero. This requires a little extra hacking to ensure that zero is never a valid value for the offset. Problem reported by Matteo Beccati, fix ideas from Martijn van Oosterhout, Alvaro Herrera, and Tom Lane.	2005-10-28 17:27:29 +00:00
Tom Lane	6d6c3722fb	Make code for selecting default WAL sync method less confusing.	2005-10-22 20:27:17 +00:00
Bruce Momjian	1dc3498251	Standard pgindent run for 8.1.	2005-10-15 02:49:52 +00:00
Bruce Momjian	1d028537a2	This makes the error messages for PREPARE TRANSACTION, COMMIT PREPARED etc. match the docs, which talk about "transaction identifier" not "gid" or "global transaction identifier". Steve Woodcock	2005-10-13 22:55:55 +00:00
Bruce Momjian	84cc9a4bb3	Back out this because of fear of changing error strings: This makes the error messages for PREPARE TRANSACTION, COMMIT PREPARED etc. match the docs, which talk about "transaction identifier" not "gid" or "global transaction identifier". Steve Woodcock	2005-10-13 17:57:57 +00:00
Bruce Momjian	90c22c9206	This makes the error messages for PREPARE TRANSACTION, COMMIT PREPARED etc. match the docs, which talk about "transaction identifier" not "gid" or "global transaction identifier". Steve Woodcock	2005-10-13 17:57:17 +00:00
Tom Lane	64eea6c21d	Expand pg_control information so that we can verify that the database was created on a machine with alignment rules and floating-point format similar to the current machine. Per recent discussion, this seems like a good idea with the increasing prevalence of 32/64 bit environments.	2005-10-03 00:28:43 +00:00
Tom Lane	037709e0b3	Reduce default value of max_prepared_transactions from 50 to 5. This saves nearly 700kB in the default shared memory segment size, which seems worthwhile, and it is a feature that many users won't use anyway. Per Heikki's argument, there is no point in a compromise value --- those who are using 2PC at all will probably want it at least equal to max_connections. But we can't set it to zero by default without breaking the prepared_xacts regression test.	2005-08-29 21:38:18 +00:00
Tom Lane	9052537325	Rewrite gather-write patch into something less obviously bolted on after the fact. Fix bug with incorrect test for whether we are at end of logfile segment. Arrange for writes triggered by XLogInsert's is-cache-more-than-half-full test to synchronize with the cache boundaries, so that in long transactions we tend to write alternating halves of the cache rather than randomly chosen portions of it; this saves one more write syscall per cache load.	2005-08-22 23:59:04 +00:00
Bruce Momjian	8ad3965a11	Improve xid wraparound message (the server isn't really shut down, just not accepting queries). errmsg("database is not accepting queries to avoid wraparound data loss in database \"%s\"", errhint("Stop the postmaster and use a standalone backend to VACUUM database \"%s\".",	2005-08-22 16:59:47 +00:00
Tom Lane	d0096a41fa	Fix some inconsistent choices of datatypes in xlog.c. Make buffer indexes all be int, rather than variously int, uint16 and uint32; add some casts where necessary to support large buffer arrays.	2005-08-22 00:41:28 +00:00
Tom Lane	f39f6b500f	Seems that the childXids list would be better based on Oid lists than integer lists.	2005-08-20 23:45:08 +00:00
Tom Lane	0007490e09	Convert the arithmetic for shared memory size calculation from 'int' to 'Size' (that is, size_t), and install overflow detection checks in it. This allows us to remove the former arbitrary restrictions on NBuffers etc. It won't make any difference in a 32-bit machine, but in a 64-bit machine you could theoretically have terabytes of shared buffers. (How efficiently we could manage 'em remains to be seen.) Similarly, num_temp_buffers, work_mem, and maintenance_work_mem can be set above 2Gb on a 64-bit machine. Original patch from Koichi Suzuki, additional work by moi.	2005-08-20 23:26:37 +00:00
Tatsuo Ishii	ba2fc7eb4b	Make GetMultiXactIdMembers() a public function.	2005-08-20 01:29:27 +00:00
Tom Lane	f8d0a82bf9	Avoid an Assert failure if OuterUserId hasn't been set yet during AbortTransaction. This can happen if a backend's InitPostgres transaction fails (eg, because the given username is invalid). Per Alvaro.	2005-08-17 22:14:34 +00:00
Tom Lane	721e53785d	Solve the problem of OID collisions by probing for duplicate OIDs whenever we generate a new OID. This prevents occasional duplicate-OID errors that can otherwise occur once the OID counter has wrapped around. Duplicate relfilenode values are also checked for when creating new physical files. Per my recent proposal.	2005-08-12 01:36:05 +00:00
Tom Lane	d90c531188	Autovacuum loose end mop-up. Provide autovacuum-specific vacuum cost delay and limit, both as global GUCs and as table-specific entries in pg_autovacuum. stats_reset_on_server_start is now OFF by default, but a reset is forced if we did WAL replay. XID-wrap vacuums do not ANALYZE, but do FREEZE if it's a template database. Alvaro Herrera	2005-08-11 21:11:50 +00:00
Tom Lane	4568e0f791	Modify AtEOXact_CatCache and AtEOXact_RelationCache to assume that the ResourceOwner mechanism already released all reference counts for the cache entries; therefore, we do not need to scan the catcache or relcache at transaction end, unless we want to do it as a debugging crosscheck. Do the crosscheck only in Assert mode. This is the same logic we had previously installed in AtEOXact_Buffers to avoid overhead with large numbers of shared buffers. I thought it'd be a good idea to do it here too, in view of Kari Lavikka's recent report showing a real-world case where AtEOXact_CatCache is taking a significant fraction of runtime.	2005-08-08 19:17:23 +00:00
Tom Lane	2a4fad1a0e	Add NOWAIT option to SELECT FOR UPDATE/SHARE. Original patch by Hans-Juergen Schoenig, revisions by Karel Zak and Tom Lane.	2005-08-01 20:31:16 +00:00
Tom Lane	d42cf5a42a	Add per-user and per-database connection limit options. This patch also includes preliminary update of pg_dumpall for roles. Petr Jelinek, with review by Bruce Momjian and Tom Lane.	2005-07-31 17:19:22 +00:00
Bruce Momjian	5b0bfec414	Fix compile for no O_SYNC, but introduced with O_DIRECT.	2005-07-30 14:15:44 +00:00
Tom Lane	5d5f1a79e6	Clean up a number of autovacuum loose ends. Make the stats collector track shared relations in a separate hashtable, so that operations done from different databases are counted correctly. Add proper support for anti-XID-wraparound vacuuming, even in databases that are never connected to and so have no stats entries. Miscellaneous other bug fixes. Alvaro Herrera, some additional fixes by Tom Lane.	2005-07-29 19:30:09 +00:00
Bruce Momjian	c6b1724c67	Update O_DIRECT comment.	2005-07-29 03:25:53 +00:00
Bruce Momjian	c34bb00581	Use O_DIRECT if available when using O_SYNC for wal_sync_method. Also, write multiple WAL buffers out in one write() operation. ITAGAKI Takahiro --------------------------------------------------------------------------- > If we disable writeback-cache and use open_sync, the per-page writing > behavior in WAL module will show up as bad result. O_DIRECT is similar > to O_DSYNC (at least on linux), so that the benefit of it will disappear > behind the slow disk revolution. > > In the current source, WAL is written as: > for (i = 0; i < N; i++) { write(&buffers[i], BLCKSZ); } > Is this intentional? Can we rewrite it as follows? > write(&buffers[0], N * BLCKSZ); > > In order to achieve it, I wrote a 'gather-write' patch (xlog.gw.diff). > Aside from this, I'll also send the fixed direct io patch (xlog.dio.diff). > These two patches are independent, so they can be applied either or both. > > > I tested them on my machine and the results as follows. It shows that > direct-io and gather-write is the best choice when writeback-cache is off. > Are these two patches worth trying if they are used together? > > > \| writeback \| fsync= \| fdata \| open_ \| fsync_ \| open_ > patch \| cache \| false \| sync \| sync \| direct \| direct > ------------+-----------+--------+-------+-------+--------+--------- > direct io \| off \| 124.2 \| 105.7 \| 48.3 \| 48.3 \| 48.2 > direct io \| on \| 129.1 \| 112.3 \| 114.1 \| 142.9 \| 144.5 > gather-write\| off \| 124.3 \| 108.7 \| 105.4 \| (N/A) \| (N/A) > both \| off \| 131.5 \| 115.5 \| 114.4 \| 145.4 \| 145.2 > > - 20runs * pgbench -s 100 -c 50 -t 200 > - with tuning (wal_buffers=64, commit_delay=500, checkpoint_segments=8) > - using 2 ATA disks: > - hda(reiserfs) includes system and wal. > - hdc(jfs) includes database files. writeback-cache is always on. > > --- > ITAGAKI Takahiro	2005-07-29 03:22:33 +00:00
Tom Lane	e5d6b91220	Add SET ROLE. This is a partial commit of Stephen Frost's recent patch; I'm still working on the has_role function and information_schema changes.	2005-07-25 22:12:34 +00:00
Bruce Momjian	9af9d674c6	Remove unintended code addition.	2005-07-23 15:31:16 +00:00
Bruce Momjian	4098c8867d	Macro alignment cleanup.	2005-07-23 15:29:47 +00:00
Tom Lane	f2bf2d2dc5	Fix a couple of bogus comments, per Alvaro.	2005-07-13 22:46:09 +00:00
Tom Lane	d7207cfc6b	Even though I'd like to see full_page_writes go away before 8.1, a minimum requirement is that it not completely break the system meanwhile. Put the test in the right place.	2005-07-08 04:07:26 +00:00
Bruce Momjian	326a7a0788	Add GUC full_page_writes to control writing full pages to WAL.	2005-07-05 23:18:10 +00:00
Tom Lane	eb5949d190	Arrange for the postmaster (and standalone backends, initdb, etc) to chdir into PGDATA and subsequently use relative paths instead of absolute paths to access all files under PGDATA. This seems to give a small performance improvement, and it should make the system more robust against naive DBAs doing things like moving a database directory that has a live postmaster in it. Per recent discussion.	2005-07-04 04:51:52 +00:00
Tom Lane	401de9c8be	Improve the checkpoint signaling mechanism so that the bgwriter can tell the difference between checkpoints forced due to WAL segment consumption and checkpoints forced for other reasons (such as CREATE DATABASE). Avoid generating 'checkpoints are occurring too frequently' messages when the checkpoint wasn't caused by WAL segment consumption. Per gripe from Chris K-L.	2005-06-30 00:00:52 +00:00
Tom Lane	b5f7cff84f	Clean up the rather historically encumbered interface to now() and current time: provide a GetCurrentTimestamp() function that returns current time in the form of a TimestampTz, instead of separate time_t and microseconds fields. This is what all the callers really want anyway, and it eliminates low-level dependencies on AbsoluteTime, which is a deprecated datatype that will have to disappear eventually.	2005-06-29 22:51:57 +00:00
Tom Lane	7762619e95	Replace pg_shadow and pg_group by new role-capable catalogs pg_authid and pg_auth_members. There are still many loose ends to finish in this patch (no documentation, no regression tests, no pg_dump support for instance). But I'm going to commit it now anyway so that Alvaro can make some progress on shared dependencies. The catalog changes should be pretty much done.	2005-06-28 05:09:14 +00:00
Tom Lane	fc654583ab	Need #include <time.h> on some platforms.	2005-06-19 22:34:56 +00:00
Tom Lane	3f749924f8	Simplify uses of readdir() by creating a function ReadDir() that includes error checking and an appropriate ereport(ERROR) message. This gets rid of rather tedious and error-prone manipulation of errno, as well as a Windows-specific bug workaround, at more than a dozen call sites. After an idea in a recent patch by Heikki Linnakangas.	2005-06-19 21:34:03 +00:00
Tom Lane	e26b0abda3	Arrange to fsync two-phase-commit state files only during checkpoints; given reasonably short lifespans for prepared transactions, this should mean that only a small minority of state files ever need to be fsynced at all. Per discussion with Heikki Linnakangas.	2005-06-19 20:00:39 +00:00
Tom Lane	a8d1075f27	Add a time-of-preparation column to the pg_prepared_xacts view, per an old suggestion by Oliver Jowett. Also, add a transaction column to the pg_locks view to show the xid of each transaction holding or awaiting locks; this allows prepared transactions to be properly associated with the locks they own. There was already a column named 'transaction', and I chose to rename it to 'transactionid' --- since this column is new in the current devel cycle there should be no backwards compatibility issue to worry about.	2005-06-18 19:33:42 +00:00
Tom Lane	66b098492e	Dept. of second thoughts: regular COMMIT deletes deletable files before releasing locks, so COMMIT PREPARED should too.	2005-06-18 05:21:09 +00:00
Tom Lane	d0a89683a3	Two-phase commit. Original patch by Heikki Linnakangas, with additional hacking by Alvaro Herrera and Tom Lane.	2005-06-17 22:32:51 +00:00
Bruce Momjian	f4d907ca85	Remove old .backup files when we do pg_stop_backup(). This prevents a large number of .backup files from existing in pg_xlog/	2005-06-15 01:36:08 +00:00
Teodor Sigaev	37c839365c	WAL for GiST. It work for online backup and so on, but on recovery after crash (power loss etc) it may say that it can't restore index and index should be reindexed. Some refactoring code.	2005-06-14 11:45:14 +00:00
Bruce Momjian	51746c4549	Free buffer allocated via malloc (process is short-lived, but fix it anyway).	2005-06-09 22:36:27 +00:00
Tom Lane	f5b2f60bd1	Change WAL-logging scheme for multixacts to be more like regular transaction IDs, rather than like subtrans; in particular, the information now survives a database restart. Per previous discussion, this is essential for PITR log shipping and for 2PC.	2005-06-08 15:50:28 +00:00
Tom Lane	ee7ac7b11e	Modify XLogInsert API to make callers specify whether pages to be backed up have the standard layout with unused space between pd_lower and pd_upper. When this is set, XLogInsert will omit the unused space without bothering to scan it to see if it's zero. That saves time in XLogInsert, and also allows reversion of my earlier patch to make PageRepairFragmentation et al explicitly re-zero freed space. Per suggestion by Heikki Linnakangas.	2005-06-06 20:22:58 +00:00
Tom Lane	4c8495a1f2	Remove the mostly-stubbed-out-anyway support routines for WAL UNDO. That code is never going to be used in the foreseeable future, and where it's more than a stub it's making the redo routines harder to read.	2005-06-06 17:01:25 +00:00
Tom Lane	21fda22ec4	Change CRCs in WAL records from 64bit to 32bit for performance reasons. Instead of a separate CRC on each backup block, include backup blocks in their parent WAL record's CRC; this is important to ensure that the backup block really goes with the WAL record, ie there was not a page tear right at the start of the backup block. Implement a simple form of compression of backup blocks: drop any run of zeroes starting at pd_lower, so as not to store the unused 'hole' that commonly exists in PG heap and index pages. Tweak PageRepairFragmentation and related routines to ensure they keep the unused space zeroed, so that the above compression method remains effective. All per recent discussions.	2005-06-02 05:55:29 +00:00
Tom Lane	a91fa39028	Add test to WAL replay to verify that xl_prev points back to the previous WAL record; this is necessary to be sure we recognize stale WAL records when a WAL page was only partially written during a system crash.	2005-05-31 19:10:28 +00:00
Tom Lane	e92a88272e	Modify hash_search() API to prevent future occurrences of the error spotted by Qingqing Zhou. The HASH_ENTER action now automatically fails with elog(ERROR) on out-of-memory --- which incidentally lets us eliminate duplicate error checks in quite a bunch of places. If you really need the old return-NULL-on-out-of-memory behavior, you can ask for HASH_ENTER_NULL. But there is now an Assert in that path checking that you aren't hoping to get that behavior in a palloc-based hash table. Along the way, remove the old HASH_FIND_SAVE/HASH_REMOVE_SAVED actions, which were not being used anywhere anymore, and were surely too ugly and unsafe to want to see revived again.	2005-05-29 04:23:07 +00:00
Bruce Momjian	6dc7760ac3	Add support for wal_fsync_writethrough for Darwin, and restructure the code to better handle writethrough. Chris Campbell	2005-05-20 14:53:26 +00:00
Tom Lane	ff0c143a3b	Make a comment pgindent-proof, per suggestion from Alvaro.	2005-05-19 23:58:51 +00:00
Tom Lane	ee3b71f6bc	Split the shared-memory array of PGPROC pointers out of the sinval communication structure, and make it its own module with its own lock. This should reduce contention at least a little, and it definitely makes the code seem cleaner. Per my recent proposal.	2005-05-19 21:35:48 +00:00
Neil Conway	c891e05f26	Cleanup GiST header files. Since GiST extensions are often written as external projects, we should be careful about what parts of the GiST API are considered implementation details, and which are part of the public API. Therefore, I've moved internal-only declarations into gist_private.h -- future backward-incompatible changes to gist.h should be made with care, to avoid needlessly breaking external GiST extensions. Also did some related header cleanup: remove some unnecessary #includes from gist.h, and remove some unused definitions: isAttByVal(), _gistdump(), and GISTNStrategies.	2005-05-17 03:34:18 +00:00
Bruce Momjian	35e1651508	Back out check for unreferenced files. Heikki Linnakangas	2005-05-10 22:27:30 +00:00
Tom Lane	4cd4ed0cc2	Fix case in which a debug printout would print already-pfreed data.	2005-05-07 18:14:25 +00:00
Tom Lane	126eaef651	Clean up MultiXactIdExpand's API by separating out the case where we are creating a new MultiXactId from two regular XIDs. The original coding was unnecessarily complicated and didn't save any code anyway.	2005-05-03 19:42:41 +00:00
Bruce Momjian	76668e6eb4	Check the file system on postmaster startup and report any unreferenced files in the server log. Heikki Linnakangas	2005-05-02 18:26:54 +00:00
Tom Lane	bedb78d386	Implement sharable row-level locks, and use them for foreign key references to eliminate unnecessary deadlocks. This commit adds SELECT ... FOR SHARE paralleling SELECT ... FOR UPDATE. The implementation uses a new SLRU data structure (managed much like pg_subtrans) to represent multiple- transaction-ID sets. When more than one transaction is holding a shared lock on a particular row, we create a MultiXactId representing that set of transactions and store its ID in the row's XMAX. This scheme allows an effectively unlimited number of row locks, just as we did before, while not costing any extra overhead except when a shared lock actually has to be shared. Still TODO: use the regular lock manager to control the grant order when multiple backends are waiting for a row lock. Alvaro Herrera and Tom Lane.	2005-04-28 21:47:18 +00:00
Tom Lane	19d127548c	Add comment about checkpoint panic behavior during shutdown, per suggestion from Qingqing Zhou.	2005-04-23 18:49:54 +00:00
Bruce Momjian	1a6ad669fb	Fix comment typo.	2005-04-17 03:04:29 +00:00
Tom Lane	5f0a974ea9	Reduce PANIC to ERROR in several xlog routines that are used in both critical and noncritical contexts (an example of noncritical being post-checkpoint removal of dead xlog segments). In the critical cases the CRIT_SECTION mechanism will cause ERROR to be promoted to PANIC anyway, and in the noncritical cases we shouldn't let an error take down the entire database. Arguably there should be no explicit PANIC errors in this module, only more START/END_CRIT_SECTION calls, but I didn't go that far. (Yet.)	2005-04-15 22:19:48 +00:00
Tom Lane	61b861421b	Modify MoveOfflineLogs/InstallXLogFileSegment to avoid O(N^2) behavior when recycling a large number of xlog segments during checkpoint. The former behavior searched from the same start point each time, requiring O(checkpoint_segments^2) stat() calls to relocate all the segments. Instead keep track of where we stopped last time through.	2005-04-15 18:48:10 +00:00
Tom Lane	2193a856a2	Simplify initdb-time assignment of OIDs as I proposed yesterday, and avoid encroaching on the 'user' range of OIDs by allowing automatic OID assignment to use values below 16k until we reach normal operation. initdb not forced since this doesn't make any incompatible change; however a lot of stuff will have different OIDs after your next initdb.	2005-04-13 18:54:57 +00:00
Tom Lane	c3294f1cbf	Fix interaction between materializing holdable cursors and firing deferred triggers: either one can create more work for the other, so we have to loop till it's all gone. Per example from andrew@supernews. Add a regression test to help spot trouble in this area in future.	2005-04-11 19:51:16 +00:00
Tom Lane	8c85a34a3b	Officially decouple FUNC_MAX_ARGS from INDEX_MAX_KEYS, and set the former to 100 by default. Clean up some of the less necessary dependencies on FUNC_MAX_ARGS; however, the biggie (FunctionCallInfoData) remains.	2005-03-29 03:01:32 +00:00
Tom Lane	119191609c	Remove dead push/pop rollback code. Vadim once planned to implement transaction rollback via UNDO but I think that's highly unlikely to happen, so we may as well remove the stubs. (Someday we ought to rip out the stub xxx_undo routines, too.) Per Alvaro.	2005-03-28 01:50:34 +00:00
Bruce Momjian	b1f57d88f5	Change Win32 O_SYNC method to O_DSYNC because that is what the method currently does. This is now the default Win32 wal sync method because we perfer o_datasync to fsync. Also, change Win32 fsync to a new wal sync method called fsync_writethrough because that is the behavior of _commit, which is what is used for fsync on Win32. Backpatch to 8.0.X.	2005-03-24 04:36:20 +00:00
Tom Lane	4aefe75553	Remove some no-longer-needed kluges for bootstrapping, in particular the AMI_OVERRIDE flag. The fact that TransactionLogFetch treats BootstrapTransactionId as always committed is sufficient to make bootstrap work, and getting rid of extra tests in heavily used code paths seems like a win. The files produced by initdb are demonstrably the same after this change.	2005-02-20 21:46:50 +00:00
Tom Lane	60b2444cc3	Add code to prevent transaction ID wraparound by enforcing a safe limit in GetNewTransactionId(). Since the limit value has to be computed before we run any real transactions, this requires adding code to database startup to scan pg_database and determine the oldest datfrozenxid. This can conveniently be combined with the first stage of an attack on the problem that the 'flat file' copies of pg_shadow and pg_group are not properly updated during WAL recovery. The code I've added to startup resides in a new file src/backend/utils/init/flatfiles.c, and it is responsible for rewriting the flat files as well as initializing the XID wraparound limit value. This will eventually allow us to get rid of GetRawDatabaseInfo too, but we'll need an initdb so we can add a trigger to pg_database.	2005-02-20 02:22:07 +00:00
Bruce Momjian	7c44e57331	Move plpgsql DEBUG from DEBUG2 to DEBUG1 because it is a user-requested DEBUG. Fix a few places where DEBUG1 crept in that should have been DEBUG2.	2005-02-12 23:53:42 +00:00
Tom Lane	0ce4d56924	Phase 1 of fix for 'SMgrRelation hashtable corrupted' problem. This is the minimum required fix. I want to look next at taking advantage of it by simplifying the message semantics in the shared inval message queue, but that part can be held over for 8.1 if it turns out too ugly.	2005-01-10 20:02:24 +00:00
Bruce Momjian	2daed8c5b3	Update copyrights that were missed.	2005-01-01 05:43:09 +00:00
PostgreSQL Daemon	2ff501590b	Tag appropriate files for rc3 Also performed an initial run through of upgrading our Copyright date to extend to 2005 ... first run here was very simple ... change everything where: grep 1996-2004 && the word 'Copyright' ... scanned through the generated list with 'less' first, and after, to make sure that I only picked up the right entries ...	2004-12-31 22:04:05 +00:00
Tom Lane	bfa5f30481	Awhile back I added some code to StartupCLOG() to forcibly zero out the remainder of the current clog page during system startup. While this was a good idea, it turns out the code fails if nextXid is exactly at a page boundary, because we won't have created the "current" clog page yet in that case. Since the page will be correctly zeroed when we execute the first transaction on it, the solution is just to do nothing when exactly at a page boundary. Per trouble report from Dave Hartwig.	2004-12-22 18:45:49 +00:00
Tom Lane	ff5a354ece	Fix is-it-time-for-a-checkpoint logic so that checkpoint_segments can usefully be larger than 255. Per gripe from Simon Riggs.	2004-12-17 00:10:36 +00:00
Tom Lane	37d693033d	Minor adjustment of message style.	2004-11-17 16:26:59 +00:00
Neil Conway	b25d23e1e6	Don't allow pg_start_backup() to be invoked if archive_command has not been defined. Patch from Gavin Sherry, editorializing by Neil Conway.	2004-11-17 02:22:54 +00:00
Peter Eisentraut	0ed3c7665e	Small message clarifications	2004-11-05 17:11:34 +00:00
Tom Lane	88868d4fbc	Change COMMIT back to the old behavior of emitting command tag COMMIT, not ROLLBACK, for the case of COMMIT outside a transaction block. Alvaro Herrera	2004-10-30 20:44:43 +00:00
Tom Lane	23f264d125	Rearrange order of pre-commit operations: must close cursors before doing ON COMMIT actions. Per bug report from Michael Guerin.	2004-10-29 22:19:53 +00:00
Tom Lane	ee69be44d5	Add DEBUG1-level logging of checkpoint start and end. Also, reduce the 'recycled log files' and 'removed log files' messages from DEBUG1 to DEBUG2, replacing them with a count of files added/removed/recycled in the checkpoint end message, as per suggestion from Simon Riggs.	2004-10-29 00:16:08 +00:00
Tom Lane	fdd13f1568	Give the ResourceOwner mechanism full responsibility for releasing buffer pins at end of transaction, and reduce AtEOXact_Buffers to an Assert cross-check that this was done correctly. When not USE_ASSERT_CHECKING, AtEOXact_Buffers is a complete no-op. This gets rid of an O(NBuffers) bottleneck during transaction commit/abort, which recent testing has shown becomes significant above a few tens of thousands of shared buffers.	2004-10-16 18:57:26 +00:00
Bruce Momjian	5c267325ec	Add 'int' cast for getpid() because some Solaris releases return long for getpid().	2004-10-14 20:23:46 +00:00
Peter Eisentraut	0fd37839d9	Message style revisions	2004-10-12 21:54:45 +00:00
Bruce Momjian	67608a393b	Make getpid() use %d consistently for printing.	2004-10-09 02:46:42 +00:00
Bruce Momjian	a5d7ba773d	Adjust comments previously moved to column 1 by pgident.	2004-10-07 15:21:58 +00:00
Tom Lane	4c77cbb272	PortalRun must guard against the possibility that the portal it's running contains VACUUM or a similar command that will internally start and commit transactions. In such a case, the original caller values of CurrentMemoryContext and CurrentResourceOwner will point to objects that will be destroyed by the internal commit. We must restore these pointers to point to the newly-manufactured transaction context and resource owner, rather than possibly pointing to deleted memory. Also tweak xact.c so that AbortTransaction and AbortSubTransaction forcibly restore a sane value for CurrentResourceOwner, much as they have always done for CurrentMemoryContext. I'm not certain this is necessary but I'm feeling paranoid today. Responds to Sean Chittenden's bug report of 4-Oct.	2004-10-04 21:52:15 +00:00
Tom Lane	257cccbe5e	Add some marginal tweaks to eliminate memory leakages associated with subtransactions. Trivial subxacts (such as a plpgsql exception block containing no database access) now demonstrably leak zero bytes.	2004-09-16 20:17:49 +00:00
Tom Lane	86fff990b2	RecentXmin is too recent to use as the cutoff point for accessing pg_subtrans --- what we need is the oldest xmin of any snapshot in use in the current top transaction. Introduce a new variable TransactionXmin to play this role. Fixes intermittent regression failure reported by Neil Conway.	2004-09-16 18:35:23 +00:00
Tom Lane	8f9f198603	Restructure subtransaction handling to reduce resource consumption, as per recent discussions. Invent SubTransactionIds that are managed like CommandIds (ie, counter is reset at start of each top transaction), and use these instead of TransactionIds to keep track of subtransaction status in those modules that need it. This means that a subtransaction does not need an XID unless it actually inserts/modifies rows in the database. Accordingly, don't assign it an XID nor take a lock on the XID until it tries to do that. This saves a lot of overhead for subtransactions that are only used for error recovery (eg plpgsql exceptions). Also, arrange to release a subtransaction's XID lock as soon as the subtransaction exits, in both the commit and abort cases. This avoids holding many unique locks after a long series of subtransactions. The price is some additional overhead in XactLockTableWait, but that seems acceptable. Finally, restructure the state machine in xact.c to have a more orthogonal set of states for subtransactions.	2004-09-16 16:58:44 +00:00
Tom Lane	b2c4071299	Redesign query-snapshot timing so that volatile functions in READ COMMITTED mode see a fresh snapshot for each command in the function, rather than using the latest interactive command's snapshot. Also, suppress fresh snapshots as well as CommandCounterIncrement inside STABLE and IMMUTABLE functions, instead using the snapshot taken for the most closely nested regular query. (This behavior is only sane for read-only functions, so the patch also enforces that such functions contain only SELECT commands.) As per my proposal of 6-Sep-2004; I note that I floated essentially the same proposal on 19-Jun-2002, but that discussion tailed off without any action. Since 8.0 seems like the right place to be taking possibly nontrivial backwards compatibility hits, let's get it done now.	2004-09-13 20:10:13 +00:00
Tom Lane	b339d1fff6	Fire non-deferred AFTER triggers immediately upon query completion, rather than when returning to the idle loop. This makes no particular difference for interactively-issued queries, but it makes a big difference for queries issued within functions: trigger execution now occurs before the calling function is allowed to proceed. This responds to numerous complaints about nonintuitive behavior of foreign key checking, such as http://archives.postgresql.org/pgsql-bugs/2004-09/msg00020.php, and appears to be required by the SQL99 spec. Also take the opportunity to simplify the data structures used for the pending-trigger list, rename them for more clarity, and squeeze out a bit of space.	2004-09-10 18:40:09 +00:00
Tom Lane	23645f0582	Fix incorrect ordering of smgr cleanup relative to buffer pin cleanup during transaction abort. Add a regression test case to catch related mistakes in future. Alvaro Herrera and Tom Lane.	2004-09-06 17:56:33 +00:00
Tom Lane	e32bba202d	Downgrade LOG messages to DEBUG1 for normal recycling of xlog, clog, subtrans segments. Per Greg Mullane and Chris K-L.	2004-09-06 03:04:27 +00:00
Tom Lane	64cb889106	Ensure that the remainder of the current pg_clog page is zeroed during startup, just to be sure that there's no leftover junk there.	2004-08-30 19:00:42 +00:00
Tom Lane	9cf4eaa00b	Fix failure to advance nextXID beyond subtransactions whose XIDs appear only within COMMIT or ABORT records.	2004-08-30 19:00:03 +00:00
Bruce Momjian	15d3f9f6b7	Another pgindent run with lib typedefs added.	2004-08-30 02:54:42 +00:00
Tom Lane	50742aed68	Add WAL logging for CREATE/DROP DATABASE and CREATE/DROP TABLESPACE. Fix TablespaceCreateDbspace() to be able to create a dummy directory in place of a dropped tablespace's symlink. This eliminates the open problem of a PANIC during WAL replay when a replayed action attempts to touch a file in a since-deleted tablespace. It also makes for a significant improvement in the usability of PITR replay.	2004-08-29 21:08:48 +00:00
Tom Lane	0ffe11abd3	Widen xl_len field of XLogRecord header to 32 bits, so that we'll have a more tolerable limit on the number of subtransactions or deleted files in COMMIT and ABORT records. Buy back the extra space by eliminating the xl_xact_prev field, which isn't being used for anything and is rather unlikely ever to be used for anything. This does not force initdb, but you do need to do pg_resetxlog if you want to upgrade an existing 8.0 installation without initdb.	2004-08-29 16:34:48 +00:00
Bruce Momjian	b6b71b85bc	Pgindent run for 8.0.	2004-08-29 05:07:03 +00:00
Bruce Momjian	da9a8649d8	Update copyright to 2004.	2004-08-29 04:13:13 +00:00
Tom Lane	f78ecbf20e	Now that TransactionIdDidAbort doesn't think it should try to modify pg_clog, there's no reason to do abort marking of subtransactions in a nonintuitive order.	2004-08-28 22:04:12 +00:00
Tom Lane	7531d2fd85	Add missing Assert to make TransactionIdDidAbort more consistent with TransactionIdDidCommit.	2004-08-28 21:58:59 +00:00
Tom Lane	1c72d0dec1	Fix relcache to account properly for subtransaction status of 'new' relcache entries. Also, change TransactionIdIsCurrentTransactionId() so that if consulted during transaction abort, it will not say that the aborted xact is still current. (It would be better to ensure that it's never called at all during abort, but I'm not sure we can easily guarantee that.) In combination, these fix a crash we have seen occasionally during parallel regression tests of 8.0.	2004-08-28 20:31:44 +00:00
Tom Lane	f444dafab0	Can't truncate pg_subtrans during a recovery checkpoint --- subtrans module isn't fully initialized yet.	2004-08-28 18:18:03 +00:00
Tom Lane	fe455ee1d4	Revise ResourceOwner code to avoid accumulating ResourceOwner objects for every command executed within a transaction. For long transactions this was a significant memory leak. Instead, we can delete a portal's or subtransaction's ResourceOwner immediately, if we physically transfer the information about its locks up to the parent owner. This does not fully solve the leak problem; we need to do something about counting multiple acquisitions of the same lock in order to fix it. But it's a necessary step along the way.	2004-08-25 18:43:43 +00:00
Tom Lane	4dbb880d3c	Rearrange pg_subtrans handling as per recent discussion. pg_subtrans updates are no longer WAL-logged nor even fsync'd; we do not need to, since after a crash no old pg_subtrans data is needed again. We truncate pg_subtrans to RecentGlobalXmin at each checkpoint. slru.c's API is refactored a little bit to separate out the necessary decisions.	2004-08-23 23:22:45 +00:00
Tom Lane	f009c316ba	Tweak code so that pg_subtrans is never consulted for XIDs older than RecentXmin (== MyProc->xmin). This ensures that it will be safe to truncate pg_subtrans at RecentGlobalXmin, which should largely eliminate any fear of bloat. Along the way, eliminate SubTransXidsHaveCommonAncestor, which isn't really needed and could not give a trustworthy result anyway under the lookback restriction. In an unrelated but nearby change, #ifdef out GetUndoRecPtr, which has been dead code since 2001 and seems unlikely to ever be resurrected.	2004-08-22 02:41:58 +00:00
Bruce Momjian	10249abfa1	Cleanup Win32 COPY handling, and move archive examples to SGML.	2004-08-12 19:03:44 +00:00
Bruce Momjian	43ea65a0dc	Add mention of "WIN32" COPY.	2004-08-12 18:34:45 +00:00
Bruce Momjian	6525b42b10	Add make_native_path() because Win32 COPY is an internal CMD.EXE command and doesn't process forward slashes in the same way as external commands. Quoting the first argument to COPY does not convert forward to backward slashes, but COPY does properly process quoted forward slashes in the second argument. Win32 COPY works with quoted forward slashes in the first argument only if the current directory is the same as the directory of the first argument.	2004-08-12 18:32:52 +00:00
Tom Lane	3fdf649f4f	Fix failure to guarantee that a checkpoint will write out pg_clog updates for transaction commits that occurred just before the checkpoint. This is an EXTREMELY serious bug --- kudos to Satoshi Okada for creating a reproducible test case to prove its existence.	2004-08-11 04:07:16 +00:00
Tom Lane	35f539b481	When expanding %p in archive_command or restore_command, translate slashes to backslashes #ifdef WIN32. This is to cope with the fact that Windows seems exceedingly unfriendly to slashes in shell commands, as per recent discussion.	2004-08-09 16:26:06 +00:00
Tom Lane	7dca975c5d	Add a comment about why we always replay backup blocks from WAL.	2004-08-08 03:22:08 +00:00
Tom Lane	fcbc438727	Label CVS tip as 8.0devel instead of 7.5devel. Adjust various comments and documentation to reference 8.0 instead of 7.5.	2004-08-04 21:34:35 +00:00
Tom Lane	b387d16f96	Make use of backup label/history files to control recovery properly.	2004-08-04 16:25:02 +00:00
Tom Lane	58c41712d5	Add functions pg_start_backup, pg_stop_backup to create backup label and history files as per recent discussion. While at it, remove pg_terminate_backend, since we have decided we do not have time during this release cycle to address the reliability concerns it creates. Split the 'Miscellaneous Functions' documentation section into 'System Information Functions' and 'System Administration Functions', which hopefully will draw the eyes of those looking for such things.	2004-08-03 20:32:36 +00:00
Tom Lane	a83c45c4c6	Fix misplacement of savepointLevel test, per report from Chris K-L.	2004-08-03 15:57:26 +00:00
Tom Lane	410b1dfb88	Update the in-code documentation about the transaction system. Move it into a README file instead of being in xact.c's header comment. Alvaro Herrera.	2004-08-01 20:57:59 +00:00
Tom Lane	5cc380f9a3	Error message style adjustments, per Alvaro Herrera.	2004-08-01 17:45:43 +00:00
Tom Lane	efcaf1e868	Some mop-up work for savepoints (nested transactions). Store a small number of active subtransaction XIDs in each backend's PGPROC entry, and use this to avoid expensive probes into pg_subtrans during TransactionIdIsInProgress. Extend EOXactCallback API to allow add-on modules to get control at subxact start/end. (This is deliberately not compatible with the former API, since any uses of that API probably need manual review anyway.) Add basic reference documentation for SAVEPOINT and related commands. Minor other cleanups to check off some of the open issues for subtransactions. Alvaro Herrera and Tom Lane.	2004-08-01 17:32:22 +00:00
Tom Lane	beda4814c1	plpgsql does exceptions. There are still some things that need refinement; in particular I fear that the recognized set of error condition names probably has little in common with what Oracle recognizes. But it's a start.	2004-07-31 07:39:21 +00:00
Tom Lane	1bf3d61504	Fix subtransaction behavior for large objects, temp namespace, files, password/group files. Also allow read-only subtransactions of a read-write parent, but not vice versa. These are the reasonably noncontroversial parts of Alvaro's recent mop-up patch, plus further work on large objects to minimize use of the TopTransactionResourceOwner.	2004-07-28 14:23:31 +00:00
Tom Lane	cc813fc2b8	Replace nested-BEGIN syntax for subtransactions with spec-compliant SAVEPOINT/RELEASE/ROLLBACK-TO syntax. (Alvaro) Cause COMMIT of a failed transaction to report ROLLBACK instead of COMMIT in its command tag. (Tom) Fix a few loose ends in the nested-transactions stuff.	2004-07-27 05:11:48 +00:00
Tom Lane	acd907bfcc	Add cross-check that current timeline of pg_control is an ancestor of recovery_target_timeline --- otherwise there is no path from the backup to the requested timeline. This check was foreseen in the original discussion but I forgot to implement it.	2004-07-22 21:09:37 +00:00
Tom Lane	3dba9cb694	Add a check on file size as an additional safety check that a WAL file recovered from archive is not corrupt. It's not much but it will catch one common problem, viz out-of-disk-space. Also, force a WAL recovery scan when recovery.conf is present, even if pg_control shows a clean shutdown. This allows recovery with a tar backup that was taken with the postmaster shut down, as per complaint from Mark Kirkwood.	2004-07-22 20:18:40 +00:00
Tom Lane	2042b3428d	Invent WAL timelines, as per recent discussion, to make point-in-time recovery more manageable. Also, undo recent change to add FILE_HEADER and WASTED_SPACE records to XLOG; instead make the XLOG page header variable-size with extra fields in the first page of an XLOG file. This should fix the boundary-case bugs observed by Mark Kirkwood. initdb forced due to change of XLOG representation.	2004-07-21 22:31:26 +00:00
Tom Lane	9c7a765f02	Remove unportable use of strptime() to parse recovery target time spec. Instead use our own abstimein code, which is more flexible anyway.	2004-07-19 14:34:39 +00:00
Tom Lane	66ec2db728	XLOG file archiving and point-in-time recovery. There are still some loose ends and a glaring lack of documentation, but it basically works. Simon Riggs with some editorialization by Tom Lane.	2004-07-19 02:47:16 +00:00
Tom Lane	fe548629c5	Invent ResourceOwner mechanism as per my recent proposal, and use it to keep track of portal-related resources separately from transaction-related resources. This allows cursors to work in a somewhat sane fashion with nested transactions. For now, cursor behavior is non-subtransactional, that is a cursor's state does not roll back if you abort a subtransaction that fetched from the cursor. We might want to change that later.	2004-07-17 03:32:14 +00:00
Tom Lane	f5c798ee82	Fix no-longer-correct bit-pushing in TransactionIdSetStatus, per Alvaro.	2004-07-03 02:55:56 +00:00
Tom Lane	b6197fe069	Further review of xact.c state machine for nested transactions. Fix problems with starting subtransactions inside already-failed transactions. Clean up some comments.	2004-07-01 20:11:03 +00:00
Tom Lane	573a71a5da	Nested transactions. There is still much left to do, especially on the performance front, but with feature freeze upon us I think it's time to drive a stake in the ground and say that this will be in 7.5. Alvaro Herrera, with some help from Tom Lane.	2004-07-01 00:52:04 +00:00
Tom Lane	2467394ee1	Tablespaces. Alternate database locations are dead, long live tablespaces. There are various things left to do: contrib dbsize and oid2name modules need work, and so does the documentation. Also someone should think about COMMENT ON TABLESPACE and maybe RENAME TABLESPACE. Also initlocation is dead, it just doesn't know it yet. Gavin Sherry and Tom Lane.	2004-06-18 06:14:31 +00:00
Tom Lane	921d749bd4	Adjust our timezone library to use pg_time_t (typedef'd as int64) in place of time_t, as per prior discussion. The behavior does not change on machines without a 64-bit-int type, but on machines with one, which is most, we are rid of the bizarre boundary behavior at the edges of the 32-bit-time_t range (1901 and 2038). The system will now treat times over the full supported timestamp range as being in your local time zone. It may seem a little bizarre to consider that times in 4000 BC are PST or EST, but this is surely at least as reasonable as propagating Gregorian calendar rules back that far. I did not modify the format of the zic timezone database files, which means that for the moment the system will not know about daylight-savings periods outside the range 1901-2038. Given the way the files are set up, it's not a simple decision like 'widen to 64 bits'; we have to actually think about the range of years that need to be supported. We should probably inquire what the plans of the upstream zic people are before making any decisions of our own.	2004-06-03 02:08:07 +00:00
Tom Lane	9b178555fc	Per previous discussions, get rid of use of sync(2) in favor of explicitly fsync'ing every (non-temp) file we have written since the last checkpoint. In the vast majority of cases, the burden of the fsyncs should fall on the bgwriter process not on backends. (To this end, we assume that an fsync issued by the bgwriter will force out blocks written to the same file by other processes using other file descriptors. Anyone have a problem with that?) This makes the world safe for WIN32, which ain't even got sync(2), and really makes the world safe for Unixen as well, because sync(2) never had the semantics we need: it offers no way to wait for the requested I/O to finish. Along the way, fix a bug I recently introduced in xlog recovery: file truncation replay failed to clear bufmgr buffers for the dropped blocks, which could result in 'PANIC: heap_delete_redo: no block' later on in xlog replay.	2004-05-31 03:48:10 +00:00
Tom Lane	076a055acf	Separate out bgwriter code into a logically separate module, rather than being random pieces of other files. Give bgwriter responsibility for all checkpoint activity (other than a post-recovery checkpoint); so this child process absorbs the functionality of the former transient checkpoint and shutdown subprocesses. While at it, create an actual include file for postmaster.c, which for some reason never had its own file before.	2004-05-29 22:48:23 +00:00
Tom Lane	1a321f26d8	Code review for EXEC_BACKEND changes. Reduce the number of #ifdefs by about a third, make it work on non-Windows platforms again. (But perhaps I broke the WIN32 code, since I have no way to test that.) Fold all the paths that fork postmaster child processes to go through the single routine SubPostmasterMain, which takes care of resurrecting the state that would normally be inherited from the postmaster (including GUC variables). Clean up some places where there's no particularly good reason for the EXEC and non-EXEC cases to work differently. Take care of one or two FIXMEs that remained in the code.	2004-05-28 05:13:32 +00:00
Tom Lane	16974ee910	Get rid of the former rather baroque mechanism for propagating the values of ThisStartUpID and RedoRecPtr into new backends. It's a lot easier just to make them all grab the values out of shared memory during startup. This helps to decouple the postmaster from checkpoint execution, which I need since I'm intending to let the bgwriter do it instead, and it also fixes a bug in the Win32 port: ThisStartUpID wasn't getting propagated at all AFAICS. (Doesn't give me a lot of faith in the amount of testing that port has gotten.)	2004-05-27 17:12:57 +00:00
Tom Lane	4d86ae4260	For multi-table ANALYZE, use per-table transactions when possible (ie, when not inside a transaction block), so that we can avoid holding locks longer than necessary. Per trouble report from Philip Warner.	2004-05-22 23:14:38 +00:00
Tom Lane	e6319d1d28	Put back #include <sys/time.h> in files that seem to need it on Linux.	2004-05-21 16:08:47 +00:00
Tom Lane	63bd0db121	Integrate src/timezone library for all platforms. There is more we can and should do now that we control our own destiny for timezone handling, but this commit gets the bulk of the picayune diffs in place. Magnus Hagander and Tom Lane.	2004-05-21 05:08:06 +00:00
Tom Lane	0bd61548ab	Solve the 'Turkish problem' with undesirable locale behavior for case conversion of basic ASCII letters. Remove all uses of strcasecmp and strncasecmp in favor of new functions pg_strcasecmp and pg_strncasecmp; remove most but not all direct uses of toupper and tolower in favor of pg_toupper and pg_tolower. These functions use the same notions of case folding already developed for identifier case conversion. I left the straight locale-based folding in place for situations where we are just manipulating user data and not trying to match it to built-in strings --- for example, the SQL upper() function is still locale dependent. Perhaps this will prove not to be what's wanted, but at the moment we can initdb and pass regression tests in Turkish locale.	2004-05-07 00:24:59 +00:00
Bruce Momjian	31338352bd	* Most changes are to fix warnings issued when compiling win32 * removed a few redundant defines * get_user_name safe under win32 * rationalized pipe read EOF for win32 (UPDATED PATCH USED) * changed all backend instances of sleep() to pg_usleep - except for the SLEEP_ON_ASSERT in assert.c, as it would exceed a 32-bit long [Note to patcher: If a SLEEP_ON_ASSERT of 2000 seconds is acceptable, please replace with pg_usleep(2000000000L)] I added a comment to that part of the code: /* * It would be nice to use pg_usleep() here, but only does 2000 sec * or 33 minutes, which seems too short. */ sleep(1000000); Claudio Natoli	2004-04-19 17:42:59 +00:00
Bruce Momjian	823ac7c2b4	This is a cleanup patch for access/transam/xact.c. It only removes some #ifdef NOT_USED code, and adds a new TBLOCK state which signals the fact that StartTransaction() has been executed. Alvaro Herrera	2004-04-05 03:11:39 +00:00
Bruce Momjian	6367ed4382	Increase xlog str_time() static string variable, per Korean User's Group.	2004-03-22 04:16:57 +00:00
Tom Lane	7a57a67278	Replace opendir/closedir calls throughout the backend with AllocateDir and FreeDir routines modeled on the existing AllocateFile/FreeFile. Like the latter, these routines will avoid failing on EMFILE/ENFILE conditions whenever possible, and will prevent leakage of directory descriptors if an elog() occurs while one is open. Also, reduce PANIC to ERROR in MoveOfflineLogs() --- this is not critical code and there is no reason to force a DB restart on failure. All per recent trouble report from Olivier Hubaut.	2004-02-23 23:03:10 +00:00
Bruce Momjian	1f17316a3d	Here is an updated version of the win32 readdir patch. 1) Now puts in exactly the same change as the current-cvs mingw code does. (see http://cvs.sourceforge.net/viewcvs.py/mingw/runtime/mingwex/dirent.c?r1= 1.3&r2=1.4, second part of the patch). 2) Updates both xlog.c and slru.c in backend/access/transam/ 3) Also updates pg_resetxlog, which also uses readdir() and checks the errno value after the loop. Magnus Hagander	2004-02-17 03:45:17 +00:00
Tom Lane	c3c09be34b	Commit the reasonably uncontroversial parts of J.R. Nield's PITR patch, to wit: Add a header record to each WAL segment file so that it can be reliably identified. Avoid splitting WAL records across segment files (this is not strictly necessary, but makes it simpler to incorporate the header records). Make WAL entries for file creation, deletion, and truncation (as foreseen but never implemented by Vadim). Also, add support for making XLOG_SEG_SIZE configurable at compile time, similarly to BLCKSZ. Fix a couple bugs I introduced in WAL replay during recent smgr API changes. initdb is forced due to changes in pg_control contents.	2004-02-11 22:55:26 +00:00
Tom Lane	58f337a343	Centralize implementation of delay code by creating a pg_usleep() subroutine in src/port/pgsleep.c. Remove platform dependencies from miscadmin.h and put them in port.h where they belong. Extend recent vacuum cost-based-delay patch to apply to VACUUM FULL, ANALYZE, and non-btree index vacuuming. By the way, where is the documentation for the cost-based-delay patch?	2004-02-10 03:42:45 +00:00

... 2 3 4 5 6 ...

671 Commits