postgresql

Commit Graph

Author	SHA1	Message	Date
Peter Eisentraut	21f1e15aaf	Unify spelling of "canceled", "canceling", "cancellation". We had previously (`af26857a27`) established the U.S. spellings as standard.	2011-06-29 09:28:46 +03:00
Bruce Momjian	bf50caf105	pgindent run before PG 9.1 beta 1.	2011-04-10 11:42:00 -04:00
Robert Haas	727589995a	Move synchronous_standbys_defined updates from WAL writer to BG writer. This is advantageous because the BG writer is alive until much later in the shutdown sequence than WAL writer; we want to make sure that it's possible to shut off synchronous replication during a smart shutdown, else it might not be possible to complete the shutdown at all. Per very reasonable gripes from Fujii Masao and Simon Riggs.	2011-03-18 21:43:45 -04:00
Robert Haas	7f242d880b	Try to avoid running with a full fsync request queue. When we need to insert a new entry and the queue is full, compact the entire queue in the hopes of making room for the new entry. Doing this on every insertion might worsen contention on BgWriterCommLock, but when the queue is full, it's far better than allowing the backend to perform its own fsync, per testing by Greg Smith as reported in http://archives.postgresql.org/pgsql-hackers/2011-01/msg02665.php Original idea from Greg Smith. Patch by me. Review by Chris Browne and Greg Smith	2011-01-29 08:08:41 -05:00
Bruce Momjian	5d950e3b0c	Stamp copyrights for year 2011.	2011-01-01 13:18:15 -05:00
Robert Haas	3134d8863e	Add new buffers_backend_fsync field to pg_stat_bgwriter. This new field counts the number of times that a backend which writes a buffer out to the OS must also fsync() it. This happens when the bgwriter fsync request queue is full, and is generally detrimental to performance, so it's good to know when it's happening. Along the way, log a new message at level DEBUG1 whenever we fail to hand off an fsync, so that the problem can also be seen in examination of log files (if the logging level is cranked up high enough). Greg Smith, with minor tweaks by me.	2010-11-15 12:42:59 -05:00
Magnus Hagander	9f2e211386	Remove cvs keywords from all files.	2010-09-20 22:08:53 +02:00
Robert Haas	debcec7dc3	Include the backend ID in the relpath of temporary relations. This allows us to reliably remove all leftover temporary relation files on cluster startup without reference to system catalogs or WAL; therefore, we no longer include temporary relations in XLOG_XACT_COMMIT and XLOG_XACT_ABORT WAL records. Since these changes require including a backend ID in each SharedInvalSmgrMsg, the size of the SharedInvalidationMessage.id field has been reduced from two bytes to one, and the maximum number of connections has been reduced from INT_MAX / 4 to 2^23-1. It would be possible to remove these restrictions by increasing the size of SharedInvalidationMessage by 4 bytes, but right now that doesn't seem like a good trade-off. Review by Jaime Casanova and Tom Lane.	2010-08-13 20:10:54 +00:00
Tom Lane	77acab75df	Modify ShmemInitStruct and ShmemInitHash to throw errors internally, rather than returning NULL for some-but-not-all failures as they used to. Remove now-redundant tests for NULL from call sites. We had to do something about this because many call sites were failing to check for NULL; and changing it like this seems a lot more useful and mistake-proof than adding checks to the call sites without them.	2010-04-28 16:54:16 +00:00
Bruce Momjian	4b113d9cdc	Document that archive_timeout will force new WAL files even if a single checkpoint has happened, and recommend adjusting checkpoint_timeout to reduce the impact of this.	2010-02-05 23:37:43 +00:00
Heikki Linnakangas	40f908bdcd	Introduce Streaming Replication. This includes two new kinds of postmaster processes, walsenders and walreceiver. Walreceiver is responsible for connecting to the primary server and streaming WAL to disk, while walsender runs in the primary server and streams WAL from disk to the client. Documentation still needs work, but the basics are there. We will probably pull the replication section to a new chapter later on, as well as the sections describing file-based replication. But let's do that as a separate patch, so that it's easier to see what has been added/changed. This patch also adds a new section to the chapter about FE/BE protocol, documenting the protocol used by walsender/walreceiver. Bump catalog version because of two new functions, pg_last_xlog_receive_location() and pg_last_xlog_replay_location(), for monitoring the progress of replication. Fujii Masao, with additional hacking by me	2010-01-15 09:19:10 +00:00
Bruce Momjian	0239800893	Update copyright for the year 2010.	2010-01-02 16:58:17 +00:00
Peter Eisentraut	b63b967a7e	If there is no sigdelset(), define it as a macro. This removes some duplicate code that recreated the identical workaround when the newer signal API is missing.	2009-12-16 22:55:34 +00:00
Tom Lane	2487d872e0	Create a multiplexing structure for signals to Postgres child processes. This patch gets us out from under the Unix limitation of two user-defined signal types. We already had done something similar for signals directed to the postmaster process; this adds multiplexing for signals directed to backends and auxiliary processes (so long as they're connected to shared memory). As proof of concept, replace the former usage of SIGUSR1 and SIGUSR2 for backends with use of the multiplexing mechanism. There are still some hard-wired definitions of SIGUSR1 and SIGUSR2 for other process types, but getting rid of those doesn't seem interesting at the moment. Fujii Masao	2009-07-31 20:26:23 +00:00
Tom Lane	2de48a83e6	Cleanup and code review for the patch that made bgwriter active during archive recovery. Invent a separate state variable and inquiry function for XLogInsertAllowed() to clarify some tests and make the management of writing the end-of-recovery checkpoint less klugy. Fix several places that were incorrectly testing InRecovery when they should be looking at RecoveryInProgress or XLogInsertAllowed (because they will now be executed in the bgwriter not startup process). Clarify handling of bad LSNs passed to XLogFlush during recovery. Use a spinlock for setting/testing SharedRecoveryInProgress. Improve quite a lot of comments. Heikki and Tom	2009-06-26 20:29:04 +00:00
Heikki Linnakangas	7e48b77b1c	Fix some serious bugs in archive recovery, now that bgwriter is active during it: When bgwriter is active, the startup process can't perform mdsync() correctly because it won't see the fsync requests accumulated in bgwriter's private pendingOpsTable. Therefore make bgwriter responsible for the end-of-recovery checkpoint as well, when it's active. When bgwriter is active (= archive recovery), the startup process must not accumulate fsync requests to its own pendingOpsTable, since bgwriter won't see them there when it performs restartpoints. Make startup process drop its pendingOpsTable when bgwriter is launched to avoid that. Update minimum recovery point one last time when leaving archive recovery. It won't be updated by the end-of-recovery checkpoint because XLogFlush() sees us as out of recovery already. This fixes bug #4879 reported by Fujii Masao.	2009-06-25 21:36:00 +00:00
Bruce Momjian	d747140279	8.4 pgindent run, with new combined Linux/FreeBSD/MinGW typedef list provided by Andrew.	2009-06-11 14:49:15 +00:00
Tom Lane	76d4abf2d9	Improve the recently-added support for properly pluralized error messages by extending the ereport() API to cater for pluralization directly. This is better than the original method of calling ngettext outside the elog.c code because (1) it avoids double translation, which wastes cycles and in the worst case could give a wrong result; and (2) it avoids having to use a different coding method in PL code than in the core backend. The client-side uses of ngettext are not touched since neither of these concerns is very pressing in the client environment. Per my proposal of yesterday.	2009-06-04 18:33:08 +00:00
Tom Lane	4616d57dad	Fix all the server-side SIGQUIT handlers (grumble ... why so many identical copies?) to ensure they really don't run proc_exit/shmem_exit callbacks, as was intended. I broke this behavior recently by installing atexit callbacks without thinking about the one case where we truly don't want to run those callback functions. Noted in an example from Dave Page.	2009-05-15 15:56:39 +00:00
Peter Eisentraut	8032d76b5b	Gettext plural support. In the backend, I changed only a handful of exemplary or important-looking instances to make use of the plural support; there is probably more work there. For the rest of the source, this should cover all relevant cases.	2009-03-26 22:26:08 +00:00
Heikki Linnakangas	cdd46c7654	Start background writer during archive recovery. Background writer now performs its usual buffer cleaning duties during archive recovery, and it's responsible for performing restartpoints. This requires some changes in postmaster. When the startup process has done all the initialization and is ready to start WAL redo, it signals the postmaster to launch the background writer. The postmaster is signaled again when the point in recovery is reached where we know that the database is in consistent state. Postmaster isn't interested in that at the moment, but that's the point where we could let other backends in to perform read-only queries. The postmaster is signaled third time when the recovery has ended, so that postmaster knows that it's safe to start accepting connections. The startup process now traps SIGTERM, and performs a "clean" shutdown. If you do a fast shutdown during recovery, a shutdown restartpoint is performed, like a shutdown checkpoint, and postmaster kills the processes cleanly. You still have to continue the recovery at next startup, though. Currently, the background writer is only launched during archive recovery. We could launch it during crash recovery as well, but it seems better to keep that codepath as simple as possible, for the sake of robustness. And it couldn't do any restartpoints during crash recovery anyway, so it wouldn't be that useful. log_restartpoints is gone. Use log_checkpoints instead. This is yet to be documented. This whole operation is a pre-requisite for Hot Standby, but has some value of its own whether the hot standby patch makes 8.4 or not. Simon Riggs, with lots of modifications by me.	2009-02-18 15:58:41 +00:00
Bruce Momjian	511db38ace	Update copyright for 2009.	2009-01-01 17:24:05 +00:00
Tom Lane	6f6a6d8b14	Teach RequestCheckpoint() to wait and retry a few times if it can't signal the bgwriter immediately. This covers the case where the bgwriter is still starting up, as seen in a recent buildfarm failure. In future it might also assist with clean recovery after a bgwriter termination and restart --- right now the postmaster treats early bgwriter exit as a system crash, but that might not always be so.	2008-11-23 01:40:19 +00:00
Heikki Linnakangas	84c3769482	Fix oversight in the relation forks patch: forgot to copy fork number to fsync requests. This should fix the installcheck failure of the buildfarm member "kudu".	2008-10-14 08:06:39 +00:00
Heikki Linnakangas	15c121b3ed	Rewrite the FSM. Instead of relying on a fixed-size shared memory segment, the free space information is stored in a dedicated FSM relation fork, with each relation (except for hash indexes; they don't use FSM). This eliminates the max_fsm_relations and max_fsm_pages GUC options; remove any trace of them from the backend, initdb, and documentation. Rewrite contrib/pg_freespacemap to match the new FSM implementation. Also introduce a new variant of the get_raw_page(regclass, int4, int4) function in contrib/pageinspect that lets you return pages from any relation fork, and a new fsm_page_contents() function to inspect the new FSM pages.	2008-09-30 10:52:14 +00:00
Heikki Linnakangas	3f0e808c4a	Introduce the concept of relation forks. An smgr relation can now consist of multiple forks, and each fork can be created and grown separately. The bulk of this patch is about changing the smgr API to include an extra ForkNumber argument in every smgr function. Also, smgrscheduleunlink and smgrdounlink no longer implicitly call smgrclose, because other forks might still exist after unlinking one. The callers of those functions have been modified to call smgrclose instead. This patch in itself doesn't have any user-visible effect, but provides the infrastructure needed for upcoming patches. The additional forks envisioned are a rewritten FSM implementation that doesn't rely on a fixed-size shared memory block, and a visibility map to allow skipping portions of a table in VACUUM that have no dead tuples.	2008-08-11 11:05:11 +00:00
Alvaro Herrera	f8c4d7db60	Restructure some header files a bit, in particular heapam.h, by removing some unnecessary #include lines in it. Also, move some tuple routine prototypes and macros to htup.h, which allows removal of heapam.h inclusion from some .c files. For this to work, a new header file access/sysattr.h needed to be created, initially containing attribute numbers of system columns, for pg_dump usage. While at it, make contrib ltree, intarray and hstore header files more consistent with our header style.	2008-05-12 00:00:54 +00:00
Tom Lane	cd00406774	Replace time_t with pg_time_t (same values, but always int64) in on-disk data structures and backend internal APIs. This solves problems we've seen recently with inconsistent layout of pg_control between machines that have 32-bit time_t and those that have already migrated to 64-bit time_t. Also, we can get out from under the problem that Windows' Unix-API emulation is not consistent about the width of time_t. There are a few remaining places where local time_t variables are used to hold the current or recent result of time(NULL). I didn't bother changing these since they do not affect any cross-module APIs and surely all platforms will have 64-bit time_t before overflow becomes an actual risk. time_t should be avoided for anything visible to extension modules, however.	2008-02-17 02:09:32 +00:00
Bruce Momjian	9098ab9e32	Update copyrights in source tree to 2008.	2008-01-01 19:46:01 +00:00
Bruce Momjian	fdf5a5efb7	pgindent run for 8.3.	2007-11-15 21:14:46 +00:00
Tom Lane	5858990f87	Fix incorrect calculation of elapsed_xlogs. Itagaki Takahiro	2007-11-14 21:19:18 +00:00
Tom Lane	b26738b583	Change Assert() to a plain test and elog, just to see if that works around the icc bug exhibited by buildfarm member dugong.	2007-10-04 15:37:44 +00:00
Tom Lane	6f5c38dcd0	Just-in-time background writing strategy. This code avoids re-scanning buffers that cannot possibly need to be cleaned, and estimates how many buffers it should try to clean based on moving averages of recent allocation requests and density of reusable buffers. The patch also adds a couple more columns to pg_stat_bgwriter to help measure the effectiveness of the bgwriter. Greg Smith, building on his own work and ideas from several other people, in particular a much older patch from Itagaki Takahiro.	2007-09-25 20:03:38 +00:00
Tom Lane	039dc49d55	Remove Assert(BgWriterShmem != NULL), which is rather pointless since we'd dump core anyway immediately afterward if it were null; and it seems to confuse some versions of icc into generating bad code. Per report from Sergey Koposov. Patched in HEAD only, for the moment, since this is only likely to affect developers.	2007-09-16 16:33:04 +00:00
Tom Lane	f181f9e1e4	Make sure that open hash table scans are cleaned up when bgwriter tries to recover from elog(ERROR). Problem was created by introduction of hash seq search tracking awhile back, and affects all branches that have bgwriter; in HEAD the disease has snuck into autovacuum and walwriter too. (Not sure that the latter two use hash_seq_search at the moment, but surely they might someday.) Per report from Sergey Koposov.	2007-09-11 17:15:33 +00:00
Tom Lane	83aaebba63	Fix incorrect comment about the timing of AbsorbFsyncRequests() during checkpoint. The comment claimed that we could do this anytime after setting the checkpoint REDO point, but actually BufferSync is relying on the assumption that buffers dumped by other backends will be fsync'd too. So we really could not do it any sooner than we are doing it.	2007-07-03 14:51:24 +00:00
Tom Lane	9fc25c0511	Improve logging of checkpoints. Patch by Greg Smith, worked over by Heikki and a little bit by me.	2007-06-30 19:12:02 +00:00
Tom Lane	867e2c91a0	Implement "distributed" checkpoints in which the checkpoint I/O is spread over a fairly long period of time, rather than being spat out in a burst. This happens only for background checkpoints carried out by the bgwriter; other cases, such as a shutdown checkpoint, are still done at full speed. Remove the "all buffers" scan in the bgwriter, and associated stats infrastructure, since this seems no longer very useful when the checkpoint itself is properly throttled. Original patch by Itagaki Takahiro, reworked by Heikki Linnakangas, and some minor API editorialization by me.	2007-06-28 00:02:40 +00:00
Tom Lane	77947c51c0	Fix up pgstats counting of live and dead tuples to recognize that committed and aborted transactions have different effects; also teach it not to assume that prepared transactions are always committed. Along the way, simplify the pgstats API by tying counting directly to Relations; I cannot detect any redeeming social value in having stats pointers in HeapScanDesc and IndexScanDesc structures. And fix a few corner cases in which counts might be missed because the relation's pgstat_info pointer hadn't been set.	2007-05-27 03:50:39 +00:00
Magnus Hagander	335feca441	Add some instrumentation to the bgwriter, through the stats collector. New view pg_stat_bgwriter, and the functions required to build it.	2007-03-30 18:34:56 +00:00
Tom Lane	eddbf39756	Extend yesterday's patch so that the bgwriter is also told to forget pending fsyncs during DROP DATABASE. Obviously necessary in hindsight :-(	2007-01-17 16:25:01 +00:00
Tom Lane	6d660587f6	Revise bgwriter fsync-request mechanism to improve robustness when a table is deleted. A backend about to unlink a file now sends a "revoke fsync" request to the bgwriter to make it clean out pending fsync requests. There is still a race condition where the bgwriter may try to fsync after the unlink has happened, but we can resolve that by rechecking the fsync request queue to see if a revoke request arrived meanwhile. This eliminates the former kluge of "just assuming" that an ENOENT failure is okay, and lets us handle the fact that on Windows it might be EACCES too without introducing any questionable assumptions. After an idea of mine improved by Magnus. The HEAD patch doesn't apply cleanly to 8.2, but I'll see about a back-port later. In the meantime this could do with some testing on Windows; I've been able to force it through the code path via ENOENT, but that doesn't prove that it actually fixes the Windows problem ...	2007-01-17 00:17:21 +00:00
Bruce Momjian	29dccf5fe0	Update CVS HEAD for 2007 copyright. Back branches are typically not back-stamped for this.	2007-01-05 22:20:05 +00:00
Tom Lane	3049fe7cfa	Make the bgwriter's error recovery path do smgrcloseall(). On Windows this should allow delete-pending files to actually go away, and thereby work around the various complaints we've seen about 'permission denied' errors in such cases. Should be reasonably harmless in any case...	2006-12-01 19:55:28 +00:00
Tom Lane	5f60086e10	Minor adjustments to make failures in startup/shutdown behave more cleanly. StartupXLOG and ShutdownXLOG no longer need to be critical sections, because in all contexts where they are invoked, elog(ERROR) would be translated to elog(FATAL) anyway. (One change in bgwriter.c is needed to make this true: set ExitOnAnyError before trying to exit. This is a good fix anyway since the existing code would have gone into an infinite loop on elog(ERROR) during shutdown.) That avoids a misleading report of PANIC during semi-orderly failures. Modify the postmaster to include the startup process in the set of processes that get SIGTERM when a fast shutdown is requested, and also fix it to not try to restart the bgwriter if the bgwriter fails while trying to write the shutdown checkpoint. Net result is that "pg_ctl stop -m fast" does something reasonable for a system in warm standby mode, and so should Unix system shutdown (ie, universal SIGTERM). Per gripe from Stephen Harris and some corner-case testing of my own.	2006-11-30 18:29:12 +00:00
Tom Lane	3ad0728c81	On systems that have setsid(2) (which should be just about everything except Windows), arrange for each postmaster child process to be its own process group leader, and deliver signals SIGINT, SIGTERM, SIGQUIT to the whole process group not only the direct child process. This provides saner behavior for archive and recovery scripts; in particular, it's possible to shut down a warm-standby recovery server using "pg_ctl stop -m immediate", since delivery of SIGQUIT to the startup subprocess will result in killing the waiting recovery_command. Also, this makes Query Cancel and statement_timeout apply to scripts being run from backends via system(). (There is no support in the core backend for that, but it's widely done using untrusted PLs.) Per gripe from Stephen Harris and subsequent discussion.	2006-11-21 20:59:53 +00:00
Tom Lane	e82d9e6283	Adjust elog.c so that elog(FATAL) exits (including cases where ERROR is promoted to FATAL) end in exit(1) not exit(0). Then change the postmaster to allow exit(1) without a system-wide panic, but not for the startup subprocess or the bgwriter. There were a couple of places that were using exit(1) to deliberately force a system-wide panic; adjust these to be exit(2) instead. This fixes the problem noted back in July that if the startup process exits with elog(ERROR), the postmaster would think everything is hunky-dory and proceed to start up. Alternative solutions such as trying to run the entire startup process as a critical section seem less clean, primarily because of the fact that a fair amount of startup code is shared by all postmaster children in the EXEC_BACKEND case. We'd need an ugly special case somewhere near the head of main.c to make it work if it's the child process's responsibility to determine what happens; and what's the point when the postmaster already treats different children differently?	2006-11-21 00:49:55 +00:00
Peter Eisentraut	b9b4f10b5b	Message style improvements	2006-10-06 17:14:01 +00:00
Bruce Momjian	f99a569a2e	pgindent run for 8.2.	2006-10-04 00:30:14 +00:00
Tom Lane	e8ea9e9587	Implement archive_timeout feature to force xlog file switches to occur no more than N seconds apart. This allows a simple, if not very high performance, means of guaranteeing that a PITR archive is no more than N seconds behind real time. Also make pg_current_xlog_location return the WAL Write pointer, add pg_current_xlog_insert_location to return the Insert pointer, and fix pg_xlogfile_name_offset to return its results as a two-element record instead of a smashed-together string, as per recent discussion. Simon Riggs	2006-08-17 23:04:10 +00:00


76 Commits