postgresql

Commit Graph

Author	SHA1	Message	Date
Tom Lane	04011cc970	Allow backends to start up without use of the flat-file copy of pg_database. To make this work in the base case, pg_database now has a nailed-in-cache relation descriptor that is initialized using hardwired knowledge in relcache.c. This means pg_database is added to the set of relations that need to have a Schema_pg_xxx macro maintained in pg_attribute.h. When this path is taken, we'll have to do a seqscan of pg_database to find the row we need. In the normal case, we are able to do an indexscan to find the database's row by name. This is made possible by storing a global relcache init file that describes only the shared catalogs and their indexes (and therefore is usable by all backends in any database). A new backend loads this cache file, finds its database OID after an indexscan on pg_database, and then loads the local relcache init file for that database. This change should effectively eliminate number of databases as a factor in backend startup time, even with large numbers of databases. However, the real reason for doing it is as a first step towards getting rid of the flat files altogether. There are still several other sub-projects to be tackled before that can happen.	2009-08-12 20:53:31 +00:00
Heikki Linnakangas	23dc89d2c3	Improve error messages in md.c. When a filesystem operation like open() or fsync() fails, say "file" rather than "relation" when printing the filename. This makes messages that display block numbers a bit confusing. For example, in message 'could not read block 150000 of file "base/1234/5678.1"', 150000 is the block number from the beginning of the relation, ie. segment 0, not 150000th block within that segment. Per discussion, users aren't usually interested in the exact location within the file, so we can live with that. To ease constructing error messages, add FilePathName(File) function to return the pathname of a virtual fd.	2009-08-05 18:01:54 +00:00
Tom Lane	2487d872e0	Create a multiplexing structure for signals to Postgres child processes. This patch gets us out from under the Unix limitation of two user-defined signal types. We already had done something similar for signals directed to the postmaster process; this adds multiplexing for signals directed to backends and auxiliary processes (so long as they're connected to shared memory). As proof of concept, replace the former usage of SIGUSR1 and SIGUSR2 for backends with use of the multiplexing mechanism. There are still some hard-wired definitions of SIGUSR1 and SIGUSR2 for other process types, but getting rid of those doesn't seem interesting at the moment. Fujii Masao	2009-07-31 20:26:23 +00:00
Tom Lane	8504905793	Fix a thinko introduced into CountActiveBackends by a recent patch: we should ignore NULL array entries, not non-NULL ones. This had the effect of disabling commit_delay, and could have caused a crash in the rare race condition the patch was intended to fix. Bug report and diagnosis by Jeff Janes, in bug #4952.	2009-07-29 15:57:11 +00:00
Tom Lane	2de48a83e6	Cleanup and code review for the patch that made bgwriter active during archive recovery. Invent a separate state variable and inquiry function for XLogInsertAllowed() to clarify some tests and make the management of writing the end-of-recovery checkpoint less klugy. Fix several places that were incorrectly testing InRecovery when they should be looking at RecoveryInProgress or XLogInsertAllowed (because they will now be executed in the bgwriter not startup process). Clarify handling of bad LSNs passed to XLogFlush during recovery. Use a spinlock for setting/testing SharedRecoveryInProgress. Improve quite a lot of comments. Heikki and Tom	2009-06-26 20:29:04 +00:00
Heikki Linnakangas	7e48b77b1c	Fix some serious bugs in archive recovery, now that bgwriter is active during it: When bgwriter is active, the startup process can't perform mdsync() correctly because it won't see the fsync requests accumulated in bgwriter's private pendingOpsTable. Therefore make bgwriter responsible for the end-of-recovery checkpoint as well, when it's active. When bgwriter is active (= archive recovery), the startup process must not accumulate fsync requests to its own pendingOpsTable, since bgwriter won't see them there when it performs restartpoints. Make startup process drop its pendingOpsTable when bgwriter is launched to avoid that. Update minimum recovery point one last time when leaving archive recovery. It won't be updated by the end-of-recovery checkpoint because XLogFlush() sees us as out of recovery already. This fixes bug #4879 reported by Fujii Masao.	2009-06-25 21:36:00 +00:00
Tom Lane	6382448cf9	For bulk write operations (eg COPY IN), use a ring buffer of 16MB instead of the 256KB limit originally enforced by a patch committed 2008-11-06. Per recent test results, the smaller size resulted in an undesirable decrease in bulk data loading speed, due to COPY processing frequently getting blocked for WAL flushing. This area might need more tweaking later, but this setting seems to be good enough for 8.4.	2009-06-22 20:04:28 +00:00
Bruce Momjian	d747140279	8.4 pgindent run, with new combined Linux/FreeBSD/MinGW typedef list provided by Andrew.	2009-06-11 14:49:15 +00:00
Tom Lane	4616d57dad	Fix all the server-side SIGQUIT handlers (grumble ... why so many identical copies?) to ensure they really don't run proc_exit/shmem_exit callbacks, as was intended. I broke this behavior recently by installing atexit callbacks without thinking about the one case where we truly don't want to run those callback functions. Noted in an example from Dave Page.	2009-05-15 15:56:39 +00:00
Tom Lane	249a899f73	Install an atexit(2) callback that ensures that proc_exit's cleanup processing will still be performed if something in a backend process calls exit() directly, instead of going through proc_exit() as we prefer. This is a second response to the issue that we might load third-party code that doesn't know it should not call exit(). Such a call will now cause a reasonably graceful backend shutdown, if possible. (Of course, if the reason for the exit() call is out-of-memory or some such, we might not be able to recover, but at least we will try.)	2009-05-05 20:06:07 +00:00
Tom Lane	969d7cd431	Install a "dead man switch" to allow the postmaster to detect cases where a backend has done exit(0) or exit(1) without having disengaged itself from shared memory. We are at risk for this whenever third-party code is loaded into a backend, since such code might not know it's supposed to go through proc_exit() instead. Also, it is reported that under Windows there are ways to externally kill a process that cause the status code returned to the postmaster to be indistinguishable from a voluntary exit (thank you, Microsoft). If this does happen then the system is probably hosed --- for instance, the dead session might still be holding locks. So the best recovery method is to treat this like a backend crash. The dead man switch is armed for a particular child process when it acquires a regular PGPROC, and disarmed when the PGPROC is released; these should be the first and last touches of shared memory resources in a backend, or close enough anyway. This choice means there is no coverage for auxiliary processes, but I doubt we need that, since they shouldn't be executing any user-provided code anyway. This patch also improves the management of the EXEC_BACKEND ShmemBackendArray array a bit, by reducing search costs. Although this problem is of long standing, the lack of field complaints seems to mean it's not critical enough to risk back-patching; at least not till we get some more testing of this mechanism.	2009-05-05 19:59:00 +00:00
Tom Lane	c973051ae6	A session that does not have any live snapshots does not have to be waited for when we are waiting for old snapshots to go away during a concurrent index build. In particular, this rule lets us avoid waiting for idle-in-transaction sessions. This logic could be improved further if we had some way to wake up when the session we are currently waiting for goes idle-in-transaction. However that would be a significantly more complex/invasive patch, so it'll have to wait for some other day. Simon Riggs, with some improvements by Tom.	2009-04-04 17:40:36 +00:00
Tom Lane	1b2bb33a54	Add a comment documenting the question of whether PrefetchBuffer should try to protect an already-existing buffer from being evicted. This was left as an open issue when the posix_fadvise patch was committed. I'm not sure there's any evidence to justify more work in this area, but we should have some record about it in the source code.	2009-04-03 18:17:43 +00:00
Tom Lane	948d6ec90f	Modify the relcache to record the temp status of both local and nonlocal temp relations; this is no more expensive than before, now that we have pg_class.relistemp. Insert tests into bufmgr.c to prevent attempting to fetch pages from nonlocal temp relations. This provides a low-level defense against bugs-of-omission allowing temp pages to be loaded into shared buffers, as in the contrib/pgstattuple problem reported by Stuart Bishop. While at it, tweak a bunch of places to use new relcache tests (instead of expensive probes into pg_namespace) to detect local or nonlocal temp tables.	2009-03-31 22:12:48 +00:00
Heikki Linnakangas	eeeb782e60	Fix a rare race condition when commit_siblings > 0 and a transaction commits at the same instant as a new backend is spawned. Since CountActiveBackends() doesn't hold ProcArrayLock, it needs to be prepared for the case that a pointer at the end of the proc array is still NULL even though numProcs says it should be valid, since it doesn't hold ProcArrayLock. Backpatch to 8.1. 8.0 and earlier had this right, but it was broken in the split of PGPROC and sinval shared memory arrays. Per report and proposal by Marko Kreen.	2009-03-31 05:18:33 +00:00
Tom Lane	471913a6a5	More fixes for 8.4 DTrace probes. Remove useless BUFFER_HIT/BUFFER_MISS probes --- the BUFFER_READ_DONE probe provides the same information and more besides. Expand the LOCK_WAIT_START/DONE probe arguments so that there's actually some chance of telling what is being waited for. Update and clean up the documentation.	2009-03-23 01:52:38 +00:00
Tom Lane	44023dc5f5	Add isExtend to the parameters of the buffer_read_start and buffer_read_done DTrace probes, so that ordinary reads can be distinguished from relation extension operations. Move buffer_read_start probe to before the smgrnblocks() call that's needed in the isExtend case, since really that step should be charged as part of the time needed for the extension operation. (This makes it slightly harder to match the read_start with the associated read_done, since now you can't match them on blockNumber, but it should still be possible since isExtend operations on the same relation can never be interleaved.) Per recent discussion. In passing, add the page identity (forkNum/blockNum) to the parameters of the buffer_flush_start/buffer_flush_done probes, which were unaccountably lacking the info.	2009-03-22 22:39:05 +00:00
Tom Lane	d287c9eff0	Restore previous ordering of BUFFER_FLUSH_START probe. I had wanted to make it include the time for the possible smgropen() call, but that results in a null pointer dereference :-(. An alternative solution would be to fetch the buffer tag instead of looking at *reln, but I'll just put it back as it was for the moment. BTW, this indicates that DTrace probes evaluate their arguments even when nominally inactive. What was that about "zero cost", again?	2009-03-13 17:46:21 +00:00
Tom Lane	e04810e8c4	Code review for dtrace probes added (so far) to 8.4. Adjust placement of some bufmgr probes, take out redundant and memory-leak-inducing path arguments to smgr__md__read__done and smgr__md__write__done, fix bogus attempt to recalculate space used in sort__done, clean up formatting in places where I'm not sure pgindent will do a nice job by itself.	2009-03-11 23:19:25 +00:00
Peter Eisentraut	9add9f95c3	Don't actively violate the system limit of maximum open files (RLIMIT_NOFILE). This avoids irritating kernel logs (if system overstep violations are enabled) and also the grsecurity alert when starting PostgreSQL. original patch by Jacek Drobiecki References: http://archives.postgresql.org/pgsql-bugs/2004-05/msg00103.php http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=248967	2009-03-04 09:12:49 +00:00
Heikki Linnakangas	cdd46c7654	Start background writer during archive recovery. Background writer now performs its usual buffer cleaning duties during archive recovery, and it's responsible for performing restartpoints. This requires some changes in postmaster. When the startup process has done all the initialization and is ready to start WAL redo, it signals the postmaster to launch the background writer. The postmaster is signaled again when the point in recovery is reached where we know that the database is in consistent state. Postmaster isn't interested in that at the moment, but that's the point where we could let other backends in to perform read-only queries. The postmaster is signaled third time when the recovery has ended, so that postmaster knows that it's safe to start accepting connections. The startup process now traps SIGTERM, and performs a "clean" shutdown. If you do a fast shutdown during recovery, a shutdown restartpoint is performed, like a shutdown checkpoint, and postmaster kills the processes cleanly. You still have to continue the recovery at next startup, though. Currently, the background writer is only launched during archive recovery. We could launch it during crash recovery as well, but it seems better to keep that codepath as simple as possible, for the sake of robustness. And it couldn't do any restartpoints during crash recovery anyway, so it wouldn't be that useful. log_restartpoints is gone. Use log_checkpoints instead. This is yet to be documented. This whole operation is a pre-requisite for Hot Standby, but has some value of its own whether the hot standby patch makes 8.4 or not. Simon Riggs, with lots of modifications by me.	2009-02-18 15:58:41 +00:00
Heikki Linnakangas	b2a667b9ee	Add a new option to RestoreBkpBlocks() to indicate if a cleanup lock should be used instead of the normal exclusive lock, and make WAL redo functions responsible for calling RestoreBkpBlocks(). They know better what kind of a lock they need. At the moment, this just moves things around with no functional change, but makes the hot standby patch that's under review cleaner.	2009-01-20 18:59:37 +00:00
Tom Lane	b7b8f0b609	Implement prefetching via posix_fadvise() for bitmap index scans. A new GUC variable effective_io_concurrency controls how many concurrent block prefetch requests will be issued. (The best way to handle this for plain index scans is still under debate, so that part is not applied yet --- tgl) Greg Stark	2009-01-12 05:10:45 +00:00
Tom Lane	dad75a62bf	Create a "shmem_startup_hook" to be called at the end of shared memory initialization, to give loadable modules a reasonable place to perform creation of any shared memory areas they need. This is the logical conclusion of our previous creation of RequestAddinShmemSpace() and RequestAddinLWLocks(). We don't need an explicit shmem_shutdown_hook, because the existing on_shmem_exit and on_proc_exit mechanisms serve that need. Also, adjust SubPostmasterMain so that libraries that got loaded into the postmaster will be loaded into all child processes, not only regular backends. This improves consistency with the non-EXEC_BACKEND behavior, and might be necessary for functionality for some types of add-ons.	2009-01-03 17:08:39 +00:00
Bruce Momjian	511db38ace	Update copyright for 2009.	2009-01-01 17:24:05 +00:00
Bruce Momjian	5a90bc1fbe	The attached patch contains a couple of fixes in the existing probes and includes a few new ones. - Fixed compilation errors on OS X for probes that use typedefs - Fixed a number of probes to pass ForkNumber per the relation forks patch - The new probes are those that were taken out from the previous submitted patch and required simple fixes. Will submit the other probes that may require more discussion in a separate patch. Robert Lor	2008-12-17 01:39:04 +00:00
Tom Lane	55368223cd	Tweak the tree descent loop in fsm_search_avail to not look at the right child if it doesn't need to. This saves some miniscule number of cycles, but the ulterior motive is to avoid an optimization bug known to exist in SCO's C compiler (and perhaps others?)	2008-12-10 17:11:18 +00:00
Heikki Linnakangas	dea81a6cf6	Revert SIGUSR1 multiplexing patch, per Tom's objection.	2008-12-09 15:59:39 +00:00
Heikki Linnakangas	7b05b3fa39	Provide support for multiplexing SIGUSR1 signal. The upcoming synchronous replication patch needs a signal, but we've already used SIGUSR1 and SIGUSR2 in normal backends. This patch allows reusing SIGUSR1 for that, and for other purposes too if the need arises.	2008-12-09 14:28:20 +00:00
Alvaro Herrera	7b640b0345	Fix a couple of snapshot management bugs in the new ResourceOwner world: non-writable large objects need to have their snapshots registered on the transaction resowner, not the current portal's, because it must persist until the large object is closed (which the portal does not). Also, ensure that the serializable snapshot is recorded by the transaction resource owner too, even when a subtransaction has changed the current resource owner before serializable is taken. Per bug reports from Pavan Deolasee.	2008-12-04 14:51:02 +00:00
Heikki Linnakangas	011fa3662e	Small comment fixes.	2008-12-03 12:22:53 +00:00
Heikki Linnakangas	4d6ee26171	Don't force creation of the FSM on searches. It will still be created as soon as the first page fills up, and is marked as (almost) full, though.	2008-11-27 13:32:26 +00:00
Heikki Linnakangas	58bece7a60	Fix #ifdeffed debugging code to work with relation forks.	2008-11-27 07:38:01 +00:00
Heikki Linnakangas	9858a8c81c	Rely on relcache invalidation to update the cached size of the FSM.	2008-11-26 17:08:58 +00:00
Heikki Linnakangas	3396000684	Rethink the way FSM truncation works. Instead of WAL-logging FSM truncations in FSM code, call FreeSpaceMapTruncateRel from smgr_redo. To make that cleaner from modularity point of view, move the WAL-logging one level up to RelationTruncate, and move RelationTruncate and all the related WAL-logging to new src/backend/catalog/storage.c file. Introduce new RelationCreateStorage and RelationDropStorage functions that are used instead of calling smgrcreate/smgrscheduleunlink directly. Move the pending rel deletion stuff from smgrcreate/smgrscheduleunlink to the new functions. This leaves smgr.c as a thin wrapper around md.c; all the transactional stuff is now in storage.c. This will make it easier to add new forks with similar truncation logic, like the visibility map.	2008-11-19 10:34:52 +00:00
Heikki Linnakangas	f06b7604ca	Fix oversight in previous error-reporting patch; mustn't pfree path string before passing it to elog.	2008-11-14 11:09:50 +00:00
Tom Lane	cad3a26a95	Fix sloppy omission of now-required #include's.	2008-11-11 14:17:02 +00:00
Heikki Linnakangas	7e8b0b9ab1	Change error messages to print the physical path, like "base/11517/3767_fsm", instead of symbolic names like "1663/11517/3767/1", per Alvaro's suggestion. I didn't change the messages in the higher-level index, heap and FSM routines, though, where the fork is implicit.	2008-11-11 13:19:16 +00:00
Tom Lane	6517f377d6	Implement ALTER DATABASE SET TABLESPACE to move a whole database (or at least as much of it as lives in its default tablespace) to a new tablespace. Guillaume Lelarge, with some help from Bernd Helmle and Tom Lane	2008-11-07 18:25:07 +00:00
Tom Lane	85e2cedf98	Improve bulk-insert performance by keeping the current target buffer pinned (but not locked, as that would risk deadlocks). Also, make it work in a small ring of buffers to avoid having bulk inserts trash the whole buffer arena. Robert Haas, after an idea of Simon Riggs'.	2008-11-06 20:51:15 +00:00
Tom Lane	b4eae023bb	Clean up the messy semantics (not to mention inefficiency) of PageGetTempPage by splitting it into three functions with better-defined behaviors. Zdenek Kotala	2008-11-03 20:47:49 +00:00
Tom Lane	d7112cfa88	Remove the last vestiges of the MAKE_PTR/MAKE_OFFSET mechanism. We haven't allowed different processes to have different addresses for the shmem segment in quite a long time, but there were still a few places left that used the old coding convention. Clean them up to reduce confusion and improve the compiler's ability to detect pointer type mismatches. Kris Jurka	2008-11-02 21:24:52 +00:00
Tom Lane	902d1cb35f	Remove all uses of the deprecated functions heap_formtuple, heap_modifytuple, and heap_deformtuple in favor of the newer functions heap_form_tuple et al (which do the same things but use bool control flags instead of arbitrary char values). Eliminate the former duplicate coding of these functions, reducing the deprecated functions to mere wrappers around the newer ones. We can't get rid of them entirely because add-on modules probably still contain many instances of the old coding style. Kris Jurka	2008-11-02 01:45:28 +00:00
Heikki Linnakangas	e9816533e3	Update FSM on WAL replay. This is a bit limited; the FSM is only updated on non-full-page-image WAL records, and quite arbitrarily, only if there's less than 20% free space on the page after the insert/update (not on HOT updates, though). The 20% cutoff should avoid most of the overhead, when replaying a bulk insertion, for example, while ensuring that pages that are full are marked as full in the FSM. This is mostly to avoid the nasty worst case scenario, where you replay from a PITR archive, and the FSM information in the base backup is really out of date. If there was a lot of pages that the outdated FSM claims to have free space, but don't actually have any, the first unlucky inserter after the recovery would traverse through all those pages, just to find out that they're full. We didn't have this problem with the old FSM implementation, because we simply threw the FSM information away on a non-clean shutdown.	2008-10-31 19:40:27 +00:00
Heikki Linnakangas	19c8dc839b	Unite ReadBufferWithFork, ReadBufferWithStrategy, and ZeroOrReadBuffer functions into one ReadBufferExtended function, that takes the strategy and mode as argument. There's three modes, RBM_NORMAL which is the default used by plain ReadBuffer(), RBM_ZERO, which replaces ZeroOrReadBuffer, and a new mode RBM_ZERO_ON_ERROR, which allows callers to read corrupt pages without throwing an error. The FSM needs the new mode to recover from corrupt pages, which could happend if we crash after extending an FSM file, and the new page is "torn". Add fork number to some error messages in bufmgr.c, that still lacked it.	2008-10-31 15:05:00 +00:00
Alvaro Herrera	089ae3bc9a	Properly access a buffer's LSN using existing access macros instead of abusing knowledge of page layout. Stolen from Jonah Harris' CRC patch	2008-10-20 21:11:15 +00:00
Tom Lane	dd4c165bc3	Improve some of the comments in fsmpage.c.	2008-10-07 21:10:11 +00:00
Heikki Linnakangas	89f373bf5b	Index FSMs needs to be vacuumed as well. Report by Jeff Davis.	2008-10-06 08:04:11 +00:00
Tom Lane	68827a7ada	Suppress an uninitialized-variable warning (not all versions of gcc complain here, but some do)	2008-10-01 14:59:23 +00:00
Heikki Linnakangas	f06ef2bede	Fix WAL redo of FSM truncation. We can't call smgrtruncate() during WAL replay, because it tries to XLogInsert().	2008-10-01 08:12:14 +00:00

1 2 3 4 5 ...

1118 Commits