postgresql

mirror of https://git.postgresql.org/git/postgresql.git synced 2024-10-01 22:31:18 +02:00

Author	SHA1	Message	Date
Tom Lane	5f60086e10	Minor adjustments to make failures in startup/shutdown behave more cleanly. StartupXLOG and ShutdownXLOG no longer need to be critical sections, because in all contexts where they are invoked, elog(ERROR) would be translated to elog(FATAL) anyway. (One change in bgwriter.c is needed to make this true: set ExitOnAnyError before trying to exit. This is a good fix anyway since the existing code would have gone into an infinite loop on elog(ERROR) during shutdown.) That avoids a misleading report of PANIC during semi-orderly failures. Modify the postmaster to include the startup process in the set of processes that get SIGTERM when a fast shutdown is requested, and also fix it to not try to restart the bgwriter if the bgwriter fails while trying to write the shutdown checkpoint. Net result is that "pg_ctl stop -m fast" does something reasonable for a system in warm standby mode, and so should Unix system shutdown (ie, universal SIGTERM). Per gripe from Stephen Harris and some corner-case testing of my own.	2006-11-30 18:29:12 +00:00
Tom Lane	3ad0728c81	On systems that have setsid(2) (which should be just about everything except Windows), arrange for each postmaster child process to be its own process group leader, and deliver signals SIGINT, SIGTERM, SIGQUIT to the whole process group not only the direct child process. This provides saner behavior for archive and recovery scripts; in particular, it's possible to shut down a warm-standby recovery server using "pg_ctl stop -m immediate", since delivery of SIGQUIT to the startup subprocess will result in killing the waiting recovery_command. Also, this makes Query Cancel and statement_timeout apply to scripts being run from backends via system(). (There is no support in the core backend for that, but it's widely done using untrusted PLs.) Per gripe from Stephen Harris and subsequent discussion.	2006-11-21 20:59:53 +00:00
Peter Eisentraut	e138b80996	String fix	2006-11-16 14:28:41 +00:00
Tom Lane	792d6edd5b	Clean up some misleading references to %p being a full path, per Simon.	2006-11-10 22:32:20 +00:00
Tom Lane	dcbdf9b1d4	Change Windows rename and unlink substitutes so that they time out after 30 seconds instead of retrying forever. Also modify xlog.c so that if it fails to rename an old xlog segment up to a future slot, it will unlink the segment instead. Per discussion of bug #2712, in which it became apparent that Windows can handle unlinking a file that's being held open, but not renaming it.	2006-11-08 20:12:05 +00:00
Tom Lane	48188e1621	Fix recently-understood problems with handling of XID freezing, particularly in PITR scenarios. We now WAL-log the replacement of old XIDs with FrozenTransactionId, so that such replacement is guaranteed to propagate to PITR slave databases. Also, rather than relying on hint-bit updates to be preserved, pg_clog is not truncated until all instances of an XID are known to have been replaced by FrozenTransactionId. Add new GUC variables and pg_autovacuum columns to allow management of the freezing policy, so that users can trade off the size of pg_clog against the amount of freezing work done. Revise the already-existing code that forces autovacuum of tables approaching the wraparound point to make it more bulletproof; also, revise the autovacuum logic so that anti-wraparound vacuuming is done per-table rather than per-database. initdb forced because of changes in pg_class, pg_database, and pg_autovacuum catalogs. Heikki Linnakangas, Simon Riggs, and Tom Lane.	2006-11-05 22:42:10 +00:00
Tom Lane	1e758d5263	Add some code to CREATE DATABASE to check for pre-existing subdirectories that conflict with the OID that we want to use for the new database. This avoids the risk of trying to remove files that maybe we shouldn't remove. Per gripe from Jon Lapham and subsequent discussion of 27-Sep.	2006-10-18 22:44:12 +00:00
Peter Eisentraut	b9b4f10b5b	Message style improvements	2006-10-06 17:14:01 +00:00
Bruce Momjian	f99a569a2e	pgindent run for 8.2.	2006-10-04 00:30:14 +00:00
Tom Lane	35af5422f6	Make the server track an 'XID epoch', that is, maintain higher-order bits of the transaction ID counter. Nothing is done with the epoch except to store it in checkpoint records, but this provides a foundation with which add-on code can pretend that XIDs never wrap around. This is a severely trimmed and rewritten version of the xxid patch submitted by Marko Kreen. Per discussion, the epoch counter seems the only part of xxid that really needs to be in the core server.	2006-08-21 16:16:31 +00:00
Tom Lane	e8ea9e9587	Implement archive_timeout feature to force xlog file switches to occur no more than N seconds apart. This allows a simple, if not very high performance, means of guaranteeing that a PITR archive is no more than N seconds behind real time. Also make pg_current_xlog_location return the WAL Write pointer, add pg_current_xlog_insert_location to return the Insert pointer, and fix pg_xlogfile_name_offset to return its results as a two-element record instead of a smashed-together string, as per recent discussion. Simon Riggs	2006-08-17 23:04:10 +00:00
Tom Lane	e002836913	Make recovery from WAL be restartable, by executing a checkpoint-like operation every so often. This improves the usefulness of PITR log shipping for hot standby: formerly, if the standby server crashed, it was necessary to restart it from the last base backup and replay all the WAL since then. Now it will only need to reread about the same amount of WAL as the master server would. The behavior might also come in handy during a long PITR replay sequence. Simon Riggs, with some editorialization by Tom Lane.	2006-08-07 16:57:57 +00:00
Tom Lane	704ddaaa09	Add support for forcing a switch to a new xlog file; cause such a switch to happen automatically during pg_stop_backup(). Add some functions for interrogating the current xlog insertion point and for easily extracting WAL filenames from the hex WAL locations displayed by pg_stop_backup and friends. Simon Riggs with some editorialization by Tom Lane.	2006-08-06 03:53:44 +00:00
Alvaro Herrera	92c2ecc130	Modify snapshot definition so that lazy vacuums are ignored by other vacuums. This allows an OLTP-like system with big tables to continue regular vacuuming on small-but-frequently-updated tables while the big tables are being vacuumed. Original patch from Hannu Krossing, rewritten by Tom Lane and updated by me.	2006-07-30 02:07:18 +00:00
Bruce Momjian	e0522505bd	Remove 576 references of include files that were not needed.	2006-07-14 14:52:27 +00:00
Bruce Momjian	a22d76d96a	Allow include files to compile on their own. Strip unused include files out of include files, and add needed includes to C files. The next step is to remove unused include files in C files.	2006-07-13 16:49:20 +00:00
Tom Lane	3c71244b74	Put #ifdef NOT_USED around posix_fadvise call. We may want to resurrect this someday, but right now it seems that posix_fadvise is immature to the point of being broken on many platforms ... and we don't have any benchmark evidence proving it's worth spending time on.	2006-06-27 18:59:17 +00:00
Tom Lane	3a04f53e7f	pg_stop_backup was calling XLogArchiveNotify() twice for the newly created backup history file. Bug introduced by the 8.1 change to make pg_stop_backup delete older history files. Per report from Masao Fujii.	2006-06-22 20:42:57 +00:00
Tom Lane	1e8ae13640	Don't try to call posix_fadvise() unless <fcntl.h> supplies a declaration for it. Hopefully will fix core dump evidenced by some buildfarm members since fadvise patch went in. The actual definition of the function is not ABI-compatible with compiler's default assumption in the absence of any declaration, so it's clearly unsafe to try to call it without seeing a declaration.	2006-06-18 18:30:21 +00:00
Bruce Momjian	40bc06fa16	Test for POSIX_FADV_DONTNEED to use posix_fadvise().	2006-06-16 04:11:48 +00:00
Bruce Momjian	94a5c4a01b	Use posix_fadvise() to avoid kernel caching of WAL contents on WAL file close. ITAGAKI Takahiro	2006-06-15 19:15:00 +00:00
Tom Lane	eac825aa68	Ensure that we validate the page header of the first page of a WAL file whenever we start to read within that file. The first page carries extra identification information that really ought to be checked, but as the code stood, this was only checked when we switched sequentially into a new WAL file, or if by chance the starting checkpoint record was within the first page. This patch ensures that we will detect bogus 'long header' information before we start replaying the WAL sequence.	2006-04-20 04:07:38 +00:00
Tom Lane	0a87394956	Fix the torn-page hazard for PITR base backups by forcing full page writes to occur between pg_start_backup() and pg_stop_backup(), even if the GUC setting full_page_writes is OFF. Per discussion, doing this in combination with the already-existing checkpoint during pg_start_backup() should ensure safety against partial page updates being included in the backup. We do not have to force full page writes to occur during normal PITR operation, as I had first feared.	2006-04-17 18:55:05 +00:00
Tom Lane	defe93463c	Make the world safe for full_page_writes. Allow XLOG records that try to update no-longer-existing pages to fall through as no-ops, but make a note of each page number referenced by such records. If we don't see a later XLOG entry dropping the table or truncating away the page, complain at the end of XLOG replay. Since this fixes the known failure mode for full_page_writes = off, revert my previous band-aid patch that disabled that GUC variable.	2006-04-14 20:27:24 +00:00
Tom Lane	09b5271ebd	Add a field to the first page of each WAL file to indicate the XLOG_BLCKSZ. This ought to help in preventing configuration mismatch problems if anyone tries to ship PITR files between servers compiled with different XLOG_BLCKSZ settings. Simon Riggs	2006-04-05 03:34:05 +00:00
Tom Lane	e6140d9052	Don't use BLCKSZ for the physical length of the pg_control file, but instead a dedicated symbol. This probably makes no functional difference for likely values of BLCKSZ, but it makes the intent clearer. Simon Riggs, minor editorialization by Tom Lane.	2006-04-04 22:39:59 +00:00
Tom Lane	eaef111396	Define a separately configurable XLOG_BLCKSZ symbol for the page size used within WAL files. Historically this was the same as the data file BLCKSZ, but there's no necessary connection, and it's possible that performance gains might ensue from reducing XLOG_BLCKSZ. In any case distinguishing two symbols should improve code clarity. This commit does not actually change the page size, only provide the infrastructure to make it possible to do so. initdb forced because of addition of a field to pg_control. Mark Wong, with some help from Simon Riggs and Tom Lane.	2006-04-03 23:35:05 +00:00
Tom Lane	a8b8f4db23	Clean up WAL/buffer interactions as per my recent proposal. Get rid of the misleadingly-named WriteBuffer routine, and instead require routines that change buffer pages to call MarkBufferDirty (which does exactly what it says). We also require that they do so before calling XLogInsert; this takes care of the synchronization requirement documented in SyncOneBuffer. Note that because bufmgr takes the buffer content lock (in shared mode) while writing out any buffer, it doesn't matter whether MarkBufferDirty is executed before the buffer content change is complete, so long as the content change is completed before releasing exclusive lock on the buffer. So it's OK to set the dirtybit before we fill in the LSN. This eliminates the former kluge of needing to set the dirtybit in LockBuffer. Aside from making the code more transparent, we can also add some new debugging assertions, in particular that the caller of MarkBufferDirty must hold the buffer content lock, not merely a pin.	2006-03-31 23:32:07 +00:00
Tom Lane	6d61cdec07	Clean up and document the API for XLogOpenRelation and XLogReadBuffer. This commit doesn't make much functional change, but it does eliminate some duplicated code --- for instance, PageIsNew tests are now done inside XLogReadBuffer rather than by each caller. The GIST xlog code still needs a lot of love, but I'll worry about that separately.	2006-03-29 21:17:39 +00:00
Tom Lane	0a971e2f20	Disable full_page_writes, because turning it off risks causing crash-recovery failures even when the hardware and OS did nothing wrong. Per recent analysis of a problem report from Alex Bahdushka. For the moment I've just diked out the test of the parameter, rather than removing the GUC infrastructure and documentation, in case we conclude that there's something salvageable there. There seems no chance of it being resurrected in the 8.1 branch though.	2006-03-28 22:01:16 +00:00
Tom Lane	0a20207060	Arrange to emit a description of the current XLOG record as error context when an error occurs during xlog replay. Also, replace the former risky 'write into a fixed-size buffer with no overflow detection' API for XLOG record description routines; use an expansible StringInfo instead. (The latter accounts for most of the patch bulk.) Qingqing Zhou	2006-03-24 04:32:13 +00:00
Bruce Momjian	f2f5b05655	Update copyright for 2006. Update scripts.	2006-03-05 15:59:11 +00:00
Neil Conway	fb627b76cc	Cosmetic code cleanup: fix a bunch of places that used "return (expr);" rather than "return expr;" -- the latter style is used in most of the tree. I kept the parentheses when they were necessary or useful because the return expression was complex.	2006-01-11 08:43:13 +00:00
Tom Lane	195f164228	Get rid of the SpinLockAcquire/SpinLockAcquire_NoHoldoff distinction in favor of having just one set of macros that don't do HOLD/RESUME_INTERRUPTS (hence, these correspond to the old SpinLockAcquire_NoHoldoff case). Given our coding rules for spinlock use, there is no reason to allow CHECK_FOR_INTERRUPTS to be done while holding a spinlock, and also there is no situation where ImmediateInterruptOK will be true while holding a spinlock. Therefore doing HOLD/RESUME_INTERRUPTS while taking/releasing a spinlock is just a waste of cycles. Qingqing Zhou and Tom Lane.	2005-12-29 18:08:05 +00:00
Tom Lane	ab51bbaa06	Arrange to set the LC_XXX environment variables to match our locale setup. This protects against undesired changes in locale behavior if someone carelessly does setlocale(LC_ALL, "") (and we know who you are, perl guys).	2005-12-28 23:22:51 +00:00
Bruce Momjian	436a2956d8	Re-run pgindent, fixing a problem where comment lines after a blank comment line were output as too long, and update typedefs for /lib directory. Also fix case where identifiers were used as variable names in the backend, but as typedefs in ecpg (favor the backend for indenting). Backpatch to 8.1.X.	2005-11-22 18:17:34 +00:00
Peter Eisentraut	07bb9f086b	Message corrections	2005-10-29 00:31:52 +00:00
Tom Lane	6d6c3722fb	Make code for selecting default WAL sync method less confusing.	2005-10-22 20:27:17 +00:00
Bruce Momjian	1dc3498251	Standard pgindent run for 8.1.	2005-10-15 02:49:52 +00:00
Tom Lane	64eea6c21d	Expand pg_control information so that we can verify that the database was created on a machine with alignment rules and floating-point format similar to the current machine. Per recent discussion, this seems like a good idea with the increasing prevalence of 32/64 bit environments.	2005-10-03 00:28:43 +00:00
Tom Lane	9052537325	Rewrite gather-write patch into something less obviously bolted on after the fact. Fix bug with incorrect test for whether we are at end of logfile segment. Arrange for writes triggered by XLogInsert's is-cache-more-than-half-full test to synchronize with the cache boundaries, so that in long transactions we tend to write alternating halves of the cache rather than randomly chosen portions of it; this saves one more write syscall per cache load.	2005-08-22 23:59:04 +00:00
Tom Lane	d0096a41fa	Fix some inconsistent choices of datatypes in xlog.c. Make buffer indexes all be int, rather than variously int, uint16 and uint32; add some casts where necessary to support large buffer arrays.	2005-08-22 00:41:28 +00:00
Tom Lane	0007490e09	Convert the arithmetic for shared memory size calculation from 'int' to 'Size' (that is, size_t), and install overflow detection checks in it. This allows us to remove the former arbitrary restrictions on NBuffers etc. It won't make any difference in a 32-bit machine, but in a 64-bit machine you could theoretically have terabytes of shared buffers. (How efficiently we could manage 'em remains to be seen.) Similarly, num_temp_buffers, work_mem, and maintenance_work_mem can be set above 2Gb on a 64-bit machine. Original patch from Koichi Suzuki, additional work by moi.	2005-08-20 23:26:37 +00:00
Tom Lane	d90c531188	Autovacuum loose end mop-up. Provide autovacuum-specific vacuum cost delay and limit, both as global GUCs and as table-specific entries in pg_autovacuum. stats_reset_on_server_start is now OFF by default, but a reset is forced if we did WAL replay. XID-wrap vacuums do not ANALYZE, but do FREEZE if it's a template database. Alvaro Herrera	2005-08-11 21:11:50 +00:00
Bruce Momjian	5b0bfec414	Fix compile for no O_SYNC, but introduced with O_DIRECT.	2005-07-30 14:15:44 +00:00
Tom Lane	5d5f1a79e6	Clean up a number of autovacuum loose ends. Make the stats collector track shared relations in a separate hashtable, so that operations done from different databases are counted correctly. Add proper support for anti-XID-wraparound vacuuming, even in databases that are never connected to and so have no stats entries. Miscellaneous other bug fixes. Alvaro Herrera, some additional fixes by Tom Lane.	2005-07-29 19:30:09 +00:00
Bruce Momjian	c6b1724c67	Update O_DIRECT comment.	2005-07-29 03:25:53 +00:00
Bruce Momjian	c34bb00581	Use O_DIRECT if available when using O_SYNC for wal_sync_method. Also, write multiple WAL buffers out in one write() operation. ITAGAKI Takahiro
---------------------------------------------------------------------------
> If we disable writeback-cache and use open_sync, the per-page writing
> behavior in WAL module will show up as bad result. O_DIRECT is similar
> to O_DSYNC (at least on linux), so that the benefit of it will disappear
> behind the slow disk revolution.
>
> In the current source, WAL is written as:
>     for (i = 0; i < N; i++) { write(&buffers[i], BLCKSZ); }
> Is this intentional? Can we rewrite it as follows?
>     write(&buffers[0], N * BLCKSZ);
>
> In order to achieve it, I wrote a 'gather-write' patch (xlog.gw.diff).
> Aside from this, I'll also send the fixed direct io patch (xlog.dio.diff).
> These two patches are independent, so they can be applied either or both.
>
>
> I tested them on my machine and the results as follows. It shows that
> direct-io and gather-write is the best choice when writeback-cache is off.
> Are these two patches worth trying if they are used together?
>
>
>             | writeback | fsync= | fdata | open_ | fsync_ | open_
> patch       | cache     |  false |  sync |  sync | direct | direct
> ------------+-----------+--------+-------+-------+--------+---------
> direct io   | off       |  124.2 | 105.7 |  48.3 |   48.3 |   48.2
> direct io   | on        |  129.1 | 112.3 | 114.1 |  142.9 |  144.5
> gather-write| off       |  124.3 | 108.7 | 105.4 |  (N/A) |  (N/A)
> both        | off       |  131.5 | 115.5 | 114.4 |  145.4 |  145.2
>
> - 20runs * pgbench -s 100 -c 50 -t 200
> - with tuning (wal_buffers=64, commit_delay=500, checkpoint_segments=8)
> - using 2 ATA disks:
>   - hda(reiserfs) includes system and wal.
>   - hdc(jfs) includes database files. writeback-cache is always on.
>
> ---
> ITAGAKI Takahiro	2005-07-29 03:22:33 +00:00
Bruce Momjian	9af9d674c6	Remove unintended code addition.	2005-07-23 15:31:16 +00:00
Bruce Momjian	4098c8867d	Macro alignment cleanup.	2005-07-23 15:29:47 +00:00
Tom Lane	d7207cfc6b	Even though I'd like to see full_page_writes go away before 8.1, a minimum requirement is that it not completely break the system meanwhile. Put the test in the right place.	2005-07-08 04:07:26 +00:00
Bruce Momjian	326a7a0788	Add GUC full_page_writes to control writing full pages to WAL.	2005-07-05 23:18:10 +00:00
Tom Lane	eb5949d190	Arrange for the postmaster (and standalone backends, initdb, etc) to chdir into PGDATA and subsequently use relative paths instead of absolute paths to access all files under PGDATA. This seems to give a small performance improvement, and it should make the system more robust against naive DBAs doing things like moving a database directory that has a live postmaster in it. Per recent discussion.	2005-07-04 04:51:52 +00:00
Tom Lane	401de9c8be	Improve the checkpoint signaling mechanism so that the bgwriter can tell the difference between checkpoints forced due to WAL segment consumption and checkpoints forced for other reasons (such as CREATE DATABASE). Avoid generating 'checkpoints are occurring too frequently' messages when the checkpoint wasn't caused by WAL segment consumption. Per gripe from Chris K-L.	2005-06-30 00:00:52 +00:00
Tom Lane	b5f7cff84f	Clean up the rather historically encumbered interface to now() and current time: provide a GetCurrentTimestamp() function that returns current time in the form of a TimestampTz, instead of separate time_t and microseconds fields. This is what all the callers really want anyway, and it eliminates low-level dependencies on AbsoluteTime, which is a deprecated datatype that will have to disappear eventually.	2005-06-29 22:51:57 +00:00
Tom Lane	3f749924f8	Simplify uses of readdir() by creating a function ReadDir() that includes error checking and an appropriate ereport(ERROR) message. This gets rid of rather tedious and error-prone manipulation of errno, as well as a Windows-specific bug workaround, at more than a dozen call sites. After an idea in a recent patch by Heikki Linnakangas.	2005-06-19 21:34:03 +00:00
Tom Lane	e26b0abda3	Arrange to fsync two-phase-commit state files only during checkpoints; given reasonably short lifespans for prepared transactions, this should mean that only a small minority of state files ever need to be fsynced at all. Per discussion with Heikki Linnakangas.	2005-06-19 20:00:39 +00:00
Tom Lane	d0a89683a3	Two-phase commit. Original patch by Heikki Linnakangas, with additional hacking by Alvaro Herrera and Tom Lane.	2005-06-17 22:32:51 +00:00
Bruce Momjian	f4d907ca85	Remove old .backup files when we do pg_stop_backup(). This prevents a large number of .backup files from existing in pg_xlog/	2005-06-15 01:36:08 +00:00
Bruce Momjian	51746c4549	Free buffer allocated via malloc (process is short-lived, but fix it anyway).	2005-06-09 22:36:27 +00:00
Tom Lane	f5b2f60bd1	Change WAL-logging scheme for multixacts to be more like regular transaction IDs, rather than like subtrans; in particular, the information now survives a database restart. Per previous discussion, this is essential for PITR log shipping and for 2PC.	2005-06-08 15:50:28 +00:00
Tom Lane	ee7ac7b11e	Modify XLogInsert API to make callers specify whether pages to be backed up have the standard layout with unused space between pd_lower and pd_upper. When this is set, XLogInsert will omit the unused space without bothering to scan it to see if it's zero. That saves time in XLogInsert, and also allows reversion of my earlier patch to make PageRepairFragmentation et al explicitly re-zero freed space. Per suggestion by Heikki Linnakangas.	2005-06-06 20:22:58 +00:00
Tom Lane	4c8495a1f2	Remove the mostly-stubbed-out-anyway support routines for WAL UNDO. That code is never going to be used in the foreseeable future, and where it's more than a stub it's making the redo routines harder to read.	2005-06-06 17:01:25 +00:00
Tom Lane	21fda22ec4	Change CRCs in WAL records from 64bit to 32bit for performance reasons. Instead of a separate CRC on each backup block, include backup blocks in their parent WAL record's CRC; this is important to ensure that the backup block really goes with the WAL record, ie there was not a page tear right at the start of the backup block. Implement a simple form of compression of backup blocks: drop any run of zeroes starting at pd_lower, so as not to store the unused 'hole' that commonly exists in PG heap and index pages. Tweak PageRepairFragmentation and related routines to ensure they keep the unused space zeroed, so that the above compression method remains effective. All per recent discussions.	2005-06-02 05:55:29 +00:00
Tom Lane	a91fa39028	Add test to WAL replay to verify that xl_prev points back to the previous WAL record; this is necessary to be sure we recognize stale WAL records when a WAL page was only partially written during a system crash.	2005-05-31 19:10:28 +00:00
Bruce Momjian	6dc7760ac3	Add support for wal_fsync_writethrough for Darwin, and restructure the code to better handle writethrough. Chris Campbell	2005-05-20 14:53:26 +00:00
Tom Lane	ee3b71f6bc	Split the shared-memory array of PGPROC pointers out of the sinval communication structure, and make it its own module with its own lock. This should reduce contention at least a little, and it definitely makes the code seem cleaner. Per my recent proposal.	2005-05-19 21:35:48 +00:00
Bruce Momjian	35e1651508	Back out check for unreferenced files. Heikki Linnakangas	2005-05-10 22:27:30 +00:00
Bruce Momjian	76668e6eb4	Check the file system on postmaster startup and report any unreferenced files in the server log. Heikki Linnakangas	2005-05-02 18:26:54 +00:00
Tom Lane	bedb78d386	Implement sharable row-level locks, and use them for foreign key references to eliminate unnecessary deadlocks. This commit adds SELECT ... FOR SHARE paralleling SELECT ... FOR UPDATE. The implementation uses a new SLRU data structure (managed much like pg_subtrans) to represent multiple-transaction-ID sets. When more than one transaction is holding a shared lock on a particular row, we create a MultiXactId representing that set of transactions and store its ID in the row's XMAX. This scheme allows an effectively unlimited number of row locks, just as we did before, while not costing any extra overhead except when a shared lock actually has to be shared. Still TODO: use the regular lock manager to control the grant order when multiple backends are waiting for a row lock. Alvaro Herrera and Tom Lane.	2005-04-28 21:47:18 +00:00
Tom Lane	19d127548c	Add comment about checkpoint panic behavior during shutdown, per suggestion from Qingqing Zhou.	2005-04-23 18:49:54 +00:00
Bruce Momjian	1a6ad669fb	Fix comment typo.	2005-04-17 03:04:29 +00:00
Tom Lane	5f0a974ea9	Reduce PANIC to ERROR in several xlog routines that are used in both critical and noncritical contexts (an example of noncritical being post-checkpoint removal of dead xlog segments). In the critical cases the CRIT_SECTION mechanism will cause ERROR to be promoted to PANIC anyway, and in the noncritical cases we shouldn't let an error take down the entire database. Arguably there should be no explicit PANIC errors in this module, only more START/END_CRIT_SECTION calls, but I didn't go that far. (Yet.)	2005-04-15 22:19:48 +00:00
Tom Lane	61b861421b	Modify MoveOfflineLogs/InstallXLogFileSegment to avoid O(N^2) behavior when recycling a large number of xlog segments during checkpoint. The former behavior searched from the same start point each time, requiring O(checkpoint_segments^2) stat() calls to relocate all the segments. Instead keep track of where we stopped last time through.	2005-04-15 18:48:10 +00:00
Tom Lane	2193a856a2	Simplify initdb-time assignment of OIDs as I proposed yesterday, and avoid encroaching on the 'user' range of OIDs by allowing automatic OID assignment to use values below 16k until we reach normal operation. initdb not forced since this doesn't make any incompatible change; however a lot of stuff will have different OIDs after your next initdb.	2005-04-13 18:54:57 +00:00
Tom Lane	8c85a34a3b	Officially decouple FUNC_MAX_ARGS from INDEX_MAX_KEYS, and set the former to 100 by default. Clean up some of the less necessary dependencies on FUNC_MAX_ARGS; however, the biggie (FunctionCallInfoData) remains.	2005-03-29 03:01:32 +00:00
Bruce Momjian	b1f57d88f5	Change Win32 O_SYNC method to O_DSYNC because that is what the method currently does. This is now the default Win32 wal sync method because we prefer o_datasync to fsync. Also, change Win32 fsync to a new wal sync method called fsync_writethrough because that is the behavior of _commit, which is what is used for fsync on Win32. Backpatch to 8.0.X.	2005-03-24 04:36:20 +00:00
Bruce Momjian	7c44e57331	Move plpgsql DEBUG from DEBUG2 to DEBUG1 because it is a user-requested DEBUG. Fix a few places where DEBUG1 crept in that should have been DEBUG2.	2005-02-12 23:53:42 +00:00
PostgreSQL Daemon	2ff501590b	Tag appropriate files for rc3. Also performed an initial run through of upgrading our Copyright date to extend to 2005 ... first run here was very simple ... change everything where: grep 1996-2004 && the word 'Copyright' ... scanned through the generated list with 'less' first, and after, to make sure that I only picked up the right entries ...	2004-12-31 22:04:05 +00:00
Tom Lane	ff5a354ece	Fix is-it-time-for-a-checkpoint logic so that checkpoint_segments can usefully be larger than 255. Per gripe from Simon Riggs.	2004-12-17 00:10:36 +00:00
Tom Lane	37d693033d	Minor adjustment of message style.	2004-11-17 16:26:59 +00:00
Neil Conway	b25d23e1e6	Don't allow pg_start_backup() to be invoked if archive_command has not been defined. Patch from Gavin Sherry, editorializing by Neil Conway.	2004-11-17 02:22:54 +00:00
Peter Eisentraut	0ed3c7665e	Small message clarifications	2004-11-05 17:11:34 +00:00
Tom Lane	ee69be44d5	Add DEBUG1-level logging of checkpoint start and end. Also, reduce the 'recycled log files' and 'removed log files' messages from DEBUG1 to DEBUG2, replacing them with a count of files added/removed/recycled in the checkpoint end message, as per suggestion from Simon Riggs.	2004-10-29 00:16:08 +00:00
Bruce Momjian	5c267325ec	Add 'int' cast for getpid() because some Solaris releases return long for getpid().	2004-10-14 20:23:46 +00:00
Peter Eisentraut	0fd37839d9	Message style revisions	2004-10-12 21:54:45 +00:00
Bruce Momjian	67608a393b	Make getpid() use %d consistently for printing.	2004-10-09 02:46:42 +00:00
Bruce Momjian	a5d7ba773d	Adjust comments previously moved to column 1 by pgindent.	2004-10-07 15:21:58 +00:00
Tom Lane	8f9f198603	Restructure subtransaction handling to reduce resource consumption, as per recent discussions. Invent SubTransactionIds that are managed like CommandIds (ie, counter is reset at start of each top transaction), and use these instead of TransactionIds to keep track of subtransaction status in those modules that need it. This means that a subtransaction does not need an XID unless it actually inserts/modifies rows in the database. Accordingly, don't assign it an XID nor take a lock on the XID until it tries to do that. This saves a lot of overhead for subtransactions that are only used for error recovery (eg plpgsql exceptions). Also, arrange to release a subtransaction's XID lock as soon as the subtransaction exits, in both the commit and abort cases. This avoids holding many unique locks after a long series of subtransactions. The price is some additional overhead in XactLockTableWait, but that seems acceptable. Finally, restructure the state machine in xact.c to have a more orthogonal set of states for subtransactions.	2004-09-16 16:58:44 +00:00
Tom Lane	e32bba202d	Downgrade LOG messages to DEBUG1 for normal recycling of xlog, clog, subtrans segments. Per Greg Mullane and Chris K-L.	2004-09-06 03:04:27 +00:00
Bruce Momjian	15d3f9f6b7	Another pgindent run with lib typedefs added.	2004-08-30 02:54:42 +00:00
Tom Lane	0ffe11abd3	Widen xl_len field of XLogRecord header to 32 bits, so that we'll have a more tolerable limit on the number of subtransactions or deleted files in COMMIT and ABORT records. Buy back the extra space by eliminating the xl_xact_prev field, which isn't being used for anything and is rather unlikely ever to be used for anything. This does not force initdb, but you do need to do pg_resetxlog if you want to upgrade an existing 8.0 installation without initdb.	2004-08-29 16:34:48 +00:00
Bruce Momjian	b6b71b85bc	Pgindent run for 8.0.	2004-08-29 05:07:03 +00:00
Bruce Momjian	da9a8649d8	Update copyright to 2004.	2004-08-29 04:13:13 +00:00
Tom Lane	f444dafab0	Can't truncate pg_subtrans during a recovery checkpoint --- subtrans module isn't fully initialized yet.	2004-08-28 18:18:03 +00:00
Tom Lane	4dbb880d3c	Rearrange pg_subtrans handling as per recent discussion. pg_subtrans updates are no longer WAL-logged nor even fsync'd; we do not need to, since after a crash no old pg_subtrans data is needed again. We truncate pg_subtrans to RecentGlobalXmin at each checkpoint. slru.c's API is refactored a little bit to separate out the necessary decisions.	2004-08-23 23:22:45 +00:00
Bruce Momjian	10249abfa1	Cleanup Win32 COPY handling, and move archive examples to SGML.	2004-08-12 19:03:44 +00:00
Bruce Momjian	43ea65a0dc	Add mention of "WIN32" COPY.	2004-08-12 18:34:45 +00:00
Bruce Momjian	6525b42b10	Add make_native_path() because Win32 COPY is an internal CMD.EXE command and doesn't process forward slashes in the same way as external commands. Quoting the first argument to COPY does not convert forward to backward slashes, but COPY does properly process quoted forward slashes in the second argument. Win32 COPY works with quoted forward slashes in the first argument only if the current directory is the same as the directory of the first argument.	2004-08-12 18:32:52 +00:00
Tom Lane	3fdf649f4f	Fix failure to guarantee that a checkpoint will write out pg_clog updates for transaction commits that occurred just before the checkpoint. This is an EXTREMELY serious bug --- kudos to Satoshi Okada for creating a reproducible test case to prove its existence.	2004-08-11 04:07:16 +00:00
Tom Lane	35f539b481	When expanding %p in archive_command or restore_command, translate slashes to backslashes #ifdef WIN32. This is to cope with the fact that Windows seems exceedingly unfriendly to slashes in shell commands, as per recent discussion.	2004-08-09 16:26:06 +00:00
Tom Lane	7dca975c5d	Add a comment about why we always replay backup blocks from WAL.	2004-08-08 03:22:08 +00:00
Tom Lane	fcbc438727	Label CVS tip as 8.0devel instead of 7.5devel. Adjust various comments and documentation to reference 8.0 instead of 7.5.	2004-08-04 21:34:35 +00:00
Tom Lane	b387d16f96	Make use of backup label/history files to control recovery properly.	2004-08-04 16:25:02 +00:00
Tom Lane	58c41712d5	Add functions pg_start_backup, pg_stop_backup to create backup label and history files as per recent discussion. While at it, remove pg_terminate_backend, since we have decided we do not have time during this release cycle to address the reliability concerns it creates. Split the 'Miscellaneous Functions' documentation section into 'System Information Functions' and 'System Administration Functions', which hopefully will draw the eyes of those looking for such things.	2004-08-03 20:32:36 +00:00
Tom Lane	5cc380f9a3	Error message style adjustments, per Alvaro Herrera.	2004-08-01 17:45:43 +00:00
Tom Lane	acd907bfcc	Add cross-check that current timeline of pg_control is an ancestor of recovery_target_timeline --- otherwise there is no path from the backup to the requested timeline. This check was foreseen in the original discussion but I forgot to implement it.	2004-07-22 21:09:37 +00:00
Tom Lane	3dba9cb694	Add a check on file size as an additional safety check that a WAL file recovered from archive is not corrupt. It's not much but it will catch one common problem, viz out-of-disk-space. Also, force a WAL recovery scan when recovery.conf is present, even if pg_control shows a clean shutdown. This allows recovery with a tar backup that was taken with the postmaster shut down, as per complaint from Mark Kirkwood.	2004-07-22 20:18:40 +00:00
Tom Lane	2042b3428d	Invent WAL timelines, as per recent discussion, to make point-in-time recovery more manageable. Also, undo recent change to add FILE_HEADER and WASTED_SPACE records to XLOG; instead make the XLOG page header variable-size with extra fields in the first page of an XLOG file. This should fix the boundary-case bugs observed by Mark Kirkwood. initdb forced due to change of XLOG representation.	2004-07-21 22:31:26 +00:00
Tom Lane	9c7a765f02	Remove unportable use of strptime() to parse recovery target time spec. Instead use our own abstimein code, which is more flexible anyway.	2004-07-19 14:34:39 +00:00
Tom Lane	66ec2db728	XLOG file archiving and point-in-time recovery. There are still some loose ends and a glaring lack of documentation, but it basically works. Simon Riggs with some editorialization by Tom Lane.	2004-07-19 02:47:16 +00:00
Tom Lane	573a71a5da	Nested transactions. There is still much left to do, especially on the performance front, but with feature freeze upon us I think it's time to drive a stake in the ground and say that this will be in 7.5. Alvaro Herrera, with some help from Tom Lane.	2004-07-01 00:52:04 +00:00
Tom Lane	921d749bd4	Adjust our timezone library to use pg_time_t (typedef'd as int64) in place of time_t, as per prior discussion. The behavior does not change on machines without a 64-bit-int type, but on machines with one, which is most, we are rid of the bizarre boundary behavior at the edges of the 32-bit-time_t range (1901 and 2038). The system will now treat times over the full supported timestamp range as being in your local time zone. It may seem a little bizarre to consider that times in 4000 BC are PST or EST, but this is surely at least as reasonable as propagating Gregorian calendar rules back that far. I did not modify the format of the zic timezone database files, which means that for the moment the system will not know about daylight-savings periods outside the range 1901-2038. Given the way the files are set up, it's not a simple decision like 'widen to 64 bits'; we have to actually think about the range of years that need to be supported. We should probably inquire what the plans of the upstream zic people are before making any decisions of our own.	2004-06-03 02:08:07 +00:00
Tom Lane	076a055acf	Separate out bgwriter code into a logically separate module, rather than being random pieces of other files. Give bgwriter responsibility for all checkpoint activity (other than a post-recovery checkpoint); so this child process absorbs the functionality of the former transient checkpoint and shutdown subprocesses. While at it, create an actual include file for postmaster.c, which for some reason never had its own file before.	2004-05-29 22:48:23 +00:00
Tom Lane	1a321f26d8	Code review for EXEC_BACKEND changes. Reduce the number of #ifdefs by about a third, make it work on non-Windows platforms again. (But perhaps I broke the WIN32 code, since I have no way to test that.) Fold all the paths that fork postmaster child processes to go through the single routine SubPostmasterMain, which takes care of resurrecting the state that would normally be inherited from the postmaster (including GUC variables). Clean up some places where there's no particularly good reason for the EXEC and non-EXEC cases to work differently. Take care of one or two FIXMEs that remained in the code.	2004-05-28 05:13:32 +00:00
Tom Lane	16974ee910	Get rid of the former rather baroque mechanism for propagating the values of ThisStartUpID and RedoRecPtr into new backends. It's a lot easier just to make them all grab the values out of shared memory during startup. This helps to decouple the postmaster from checkpoint execution, which I need since I'm intending to let the bgwriter do it instead, and it also fixes a bug in the Win32 port: ThisStartUpID wasn't getting propagated at all AFAICS. (Doesn't give me a lot of faith in the amount of testing that port has gotten.)	2004-05-27 17:12:57 +00:00
Tom Lane	e6319d1d28	Put back #include <sys/time.h> in files that seem to need it on Linux.	2004-05-21 16:08:47 +00:00
Tom Lane	63bd0db121	Integrate src/timezone library for all platforms. There is more we can and should do now that we control our own destiny for timezone handling, but this commit gets the bulk of the picayune diffs in place. Magnus Hagander and Tom Lane.	2004-05-21 05:08:06 +00:00
Tom Lane	0bd61548ab	Solve the 'Turkish problem' with undesirable locale behavior for case conversion of basic ASCII letters. Remove all uses of strcasecmp and strncasecmp in favor of new functions pg_strcasecmp and pg_strncasecmp; remove most but not all direct uses of toupper and tolower in favor of pg_toupper and pg_tolower. These functions use the same notions of case folding already developed for identifier case conversion. I left the straight locale-based folding in place for situations where we are just manipulating user data and not trying to match it to built-in strings --- for example, the SQL upper() function is still locale dependent. Perhaps this will prove not to be what's wanted, but at the moment we can initdb and pass regression tests in Turkish locale.	2004-05-07 00:24:59 +00:00
Bruce Momjian	31338352bd	* Most changes are to fix warnings issued when compiling win32 * removed a few redundant defines * get_user_name safe under win32 * rationalized pipe read EOF for win32 (UPDATED PATCH USED) * changed all backend instances of sleep() to pg_usleep - except for the SLEEP_ON_ASSERT in assert.c, as it would exceed a 32-bit long [Note to patcher: If a SLEEP_ON_ASSERT of 2000 seconds is acceptable, please replace with pg_usleep(2000000000L)] I added a comment to that part of the code: /* * It would be nice to use pg_usleep() here, but only does 2000 sec * or 33 minutes, which seems too short. */ sleep(1000000); Claudio Natoli	2004-04-19 17:42:59 +00:00
Bruce Momjian	6367ed4382	Increase xlog str_time() static string variable, per Korean User's Group.	2004-03-22 04:16:57 +00:00
Tom Lane	7a57a67278	Replace opendir/closedir calls throughout the backend with AllocateDir and FreeDir routines modeled on the existing AllocateFile/FreeFile. Like the latter, these routines will avoid failing on EMFILE/ENFILE conditions whenever possible, and will prevent leakage of directory descriptors if an elog() occurs while one is open. Also, reduce PANIC to ERROR in MoveOfflineLogs() --- this is not critical code and there is no reason to force a DB restart on failure. All per recent trouble report from Olivier Hubaut.	2004-02-23 23:03:10 +00:00
Bruce Momjian	1f17316a3d	Here is an updated version of the win32 readdir patch. 1) Now puts in exactly the same change as the current-cvs mingw code does. (see http://cvs.sourceforge.net/viewcvs.py/mingw/runtime/mingwex/dirent.c?r1=1.3&r2=1.4, second part of the patch). 2) Updates both xlog.c and slru.c in backend/access/transam/ 3) Also updates pg_resetxlog, which also uses readdir() and checks the errno value after the loop. Magnus Hagander	2004-02-17 03:45:17 +00:00
Tom Lane	c3c09be34b	Commit the reasonably uncontroversial parts of J.R. Nield's PITR patch, to wit: Add a header record to each WAL segment file so that it can be reliably identified. Avoid splitting WAL records across segment files (this is not strictly necessary, but makes it simpler to incorporate the header records). Make WAL entries for file creation, deletion, and truncation (as foreseen but never implemented by Vadim). Also, add support for making XLOG_SEG_SIZE configurable at compile time, similarly to BLCKSZ. Fix a couple bugs I introduced in WAL replay during recent smgr API changes. initdb is forced due to changes in pg_control contents.	2004-02-11 22:55:26 +00:00
Tom Lane	87bd956385	Restructure smgr API as per recent proposal. smgr no longer depends on the relcache, and so the notion of 'blind write' is gone. This should improve efficiency in bgwriter and background checkpoint processes. Internal restructuring in md.c to remove the not-very-useful array of MdfdVec objects --- might as well just use pointers. Also remove the long-dead 'persistent main memory' storage manager (mm.c), since it seems quite unlikely to ever get resurrected.	2004-02-10 01:55:27 +00:00
Tom Lane	c77f363384	Ensure that close() and fclose() are checked for errors, at least in cases involving writes. Per recent discussion about the possibility of close-time failures on some filesystems. There is a TODO item for this, too.	2004-01-26 22:35:32 +00:00
Tom Lane	9bd681a522	Repair problem identified by Olivier Prenant: ALTER DATABASE SET search_path should not be too eager to reject paths involving unknown schemas, since it can't really tell whether the schemas exist in the target database. (Also, when reading pg_dumpall output, it could be that the schemas don't exist yet, but eventually will.) ALTER USER SET has a similar issue. So, reduce the normal ERROR to a NOTICE when checking search_path values for these commands. Supporting this requires changing the API for GUC assign_hook functions, which causes the patch to touch a lot of places, but the changes are conceptually trivial.	2004-01-19 19:04:40 +00:00
Tom Lane	06288d4e22	Suppress compiler warning (xlog_outrec is unused if not WAL_DEBUG).	2004-01-06 22:22:37 +00:00
Neil Conway	bc028beb16	Make the 'wal_debug' GUC variable a boolean (rather than an integer), and hide it behind #ifdef WAL_DEBUG blocks.	2004-01-06 17:26:23 +00:00
Bruce Momjian	d75b2ec4eb	This patch is the next step towards (re)allowing fork/exec. Claudio Natoli	2003-12-20 17:31:21 +00:00
Neil Conway	fef0c8345a	I posted some bufmgr cleanup a few weeks ago, but it conflicted with some concurrent changes Jan was making to the bufmgr. Here's an updated version of the patch -- it should apply cleanly to CVS HEAD and passes the regression tests. This patch makes the following changes: - remove the UnlockAndReleaseBuffer() and UnlockAndWriteBuffer() macros, and replace uses of them with calls to the appropriate functions. - remove a bunch of #ifdef BMTRACE code: it is ugly & broken (i.e. it doesn't compile) - make BufferReplace() return a bool, not an int - cleanup some logic in bufmgr.c; should be functionally equivalent to the previous code, just cleaner now - remove the BM_PRIVATE flag as it is unused - improve a few comments, etc.	2003-12-14 00:34:47 +00:00
Peter Eisentraut	2afacfc403	This patch properly sets the prototype for the on_shmem_exit and on_proc_exit functions, and adjusts all other related code to use the proper types too. by Kurt Roeckx	2003-12-12 18:45:10 +00:00
PostgreSQL Daemon	969685ad44	$Header: -> $PostgreSQL Changes ...	2003-11-29 19:52:15 +00:00
Tom Lane	4f7a2fa0c3	Fix typo in message.	2003-09-27 18:16:35 +00:00
Peter Eisentraut	d84b6ef56b	Various message fixes, among those fixes for the previous round of fixes	2003-09-26 15:27:37 +00:00
Peter Eisentraut	feb4f44d29	Message editing: remove gratuitous variations in message wording, standardize terms, add some clarifications, fix some untranslatable attempts at dynamic message building.	2003-09-25 06:58:07 +00:00
Bruce Momjian	f3c3deb7d0	Update copyrights to 2003.	2003-08-04 02:40:20 +00:00
Bruce Momjian	089003fb46	pgindent run.	2003-08-04 00:43:34 +00:00
Tom Lane	81b5c8a136	A visit from the message-style police ...	2003-07-28 00:09:16 +00:00
Tom Lane	ec7aa4b515	Error message editing in backend/access.	2003-07-21 20:29:40 +00:00
Tom Lane	8cf63ba920	Repair boundary-case bug introduced by patch of two months ago that fixed incorrect initial setting of StartUpID. The logic in XLogWrite() expects that Write->curridx is advanced to the next page as soon as LogwrtResult points to the end of the current page, but StartupXLOG() failed to make that happen when the old WAL ended exactly on a page boundary. Per trouble report from Hannu Krosing.	2003-07-17 16:45:04 +00:00
Tom Lane	0c985ab5a8	Add comment pointing out that XLByteToPrevSeg macro is not broken.	2003-06-26 18:23:07 +00:00
Tom Lane	39e98d9563	Repair sometimes-incorrect computation of StartUpID after a crash, per example from Rao Kumar. This is a very corner corner-case, requiring a minimum of three closely-spaced database crashes and an unlucky positioning of the second recovery's checkpoint record before you'd notice any problem. But the consequences are dire enough that it's a must-fix.	2003-05-22 14:39:28 +00:00
Tom Lane	8d86a96068	Adjust CreateCheckpoint so that buffer dumping activities and cleanup of dead xlog segments are not considered part of a critical section. It is not necessary to force a database-wide panic if we get a failure in these operations. Per recent trouble reports.	2003-05-10 18:01:31 +00:00
Tom Lane	9cbaf72177	In the continuing saga of FE/BE protocol revisions, add reporting of initial values and runtime changes in selected parameters. This gets rid of the need for an initial 'select pg_client_encoding()' query in libpq, bringing us back to one message transmitted in each direction for a standard connection startup. To allow server version to be sent using the same GUC mechanism that handles other parameters, invent the concept of a never-settable GUC parameter: you can 'show server_version' but it's not settable by any GUC input source. Create 'lc_collate' and 'lc_ctype' never-settable parameters so that people can find out these settings without need for pg_controldata. (These side ideas were all discussed some time ago in pgsql-hackers, but not yet implemented.)	2003-04-25 19:45:10 +00:00
Bruce Momjian	4d4953fc41	Make Win32 tests to match existing Cygwin tests, where appropriate.	2003-04-18 01:03:42 +00:00
Tom Lane	70508ba7ae	Make btree index structure adjustments and WAL logging changes needed to support btree compaction, as per proposal of a few days ago. btree index pages no longer store parent links, instead they have a level indicator (counting up from zero for leaf pages). The FixBTree recovery logic is removed, and replaced by code that detects missing parent-level insertions during WAL replay. Also, generate appropriate WAL entries when updating btree metapage and when building a btree index from scratch. I believe btree indexes are now completely WAL-legal for the first time. initdb forced due to index and WAL changes.	2003-02-21 00:06:22 +00:00
Tom Lane	80727ce14f	Use stat(2) to probe for existing xlog segments in InstallXLogFileSegment, rather than actually opening the files. This eliminates some corner cases where the file indeed exists but open() fails for another reason, such as being out of file descriptors. The net reliability gain is probably tiny, since xlog.c is full of other file open calls that will elog(PANIC) if they fail for any reason; but this specific failure mode has been observed in the field, so we may as well fix it.	2003-01-25 03:06:04 +00:00
Bruce Momjian	bea4792125	This patch removes a bunch of superfluous #include directives: if postgres.h or c.h includes a system header (such as stdio.h or stdlib.h), there's no need to specifically include it in any of the .c files in the backend. Neil Conway	2002-11-08 20:23:57 +00:00
Tom Lane	f6e0130b5b	Clean up a few fprintf(stderr)'s that should be elog's.	2002-11-02 15:54:13 +00:00

358 Commits