postgresql

Commit Graph

Author	SHA1	Message	Date
Heikki Linnakangas	15c121b3ed	Rewrite the FSM. Instead of relying on a fixed-size shared memory segment, the free space information is stored in a dedicated FSM relation fork for each relation (except for hash indexes; they don't use the FSM). This eliminates the max_fsm_relations and max_fsm_pages GUC options; remove any trace of them from the backend, initdb, and documentation. Rewrite contrib/pg_freespacemap to match the new FSM implementation. Also introduce a new variant of the get_raw_page(regclass, int4, int4) function in contrib/pageinspect that lets you return pages from any relation fork, and a new fsm_page_contents() function to inspect the new FSM pages.	2008-09-30 10:52:14 +00:00
Tom Lane	35c2a3c3cf	Allow ShowBufferUsage() to report the number of reads/writes that have occurred to temporary files. This replaces the unused NDirectFileRead/NDirectFileWrite counters. Itagaki Takahiro	2008-09-17 13:15:55 +00:00
Tom Lane	30df79a70b	Widen the nLocks counts in local lock tables from int to int64. This forestalls potential overflow when the same table (or other object, but usually tables) is accessed by very many successive queries within a single transaction. Per report from Michael Milligan. Back-patch to 8.0, which is as far back as the patch conveniently applies. There have been no reports of overflow in pre-8.3 releases, but clearly the risk existed all along. (Michael's report suggests that 8.3 may consume lock counts faster than prior releases, but with no test case to look at it's hard to be sure about that. Widening the counts seems a good future-proofing measure in any event.)	2008-09-16 01:56:26 +00:00
Heikki Linnakangas	3f0e808c4a	Introduce the concept of relation forks. An smgr relation can now consist of multiple forks, and each fork can be created and grown separately. The bulk of this patch is about changing the smgr API to include an extra ForkNumber argument in every smgr function. Also, smgrscheduleunlink and smgrdounlink no longer implicitly call smgrclose, because other forks might still exist after unlinking one. The callers of those functions have been modified to call smgrclose instead. This patch in itself doesn't have any user-visible effect, but provides the infrastructure needed for upcoming patches. The additional forks envisioned are a rewritten FSM implementation that doesn't rely on a fixed-size shared memory block, and a visibility map to allow skipping portions of a table in VACUUM that have no dead tuples.	2008-08-11 11:05:11 +00:00
Tom Lane	4abd7b49f1	Improve CREATE/DROP/RENAME DATABASE so that when failing because the source or target database is being accessed by other users, it tells you whether the "other users" are live sessions or uncommitted prepared transactions. (Indeed, it tells you exactly how many of each, but that's mostly just because it was easy to do so.) This should help forestall the gotcha of not realizing that a prepared transaction is what's blocking the command. Per discussion.	2008-08-04 18:03:46 +00:00
Tom Lane	d92c370c72	Clean up buildfarm failures arising from the seemingly straightforward page macros patch :-(. Results from both baiji and mastodon imply that MSVC fails to perceive offsetof(PageHeaderData, pd_linp[0]) as a constant expression in some contexts where offsetof(PageHeaderData, pd_linp) works fine. Sloth, thy name is Micro.	2008-07-14 03:22:32 +00:00
Tom Lane	6816577a78	Change the PageGetContents() macro to guarantee its result is maxalign'd, thereby forestalling any problems with alignment of the data structure placed there. Since SizeOfPageHeaderData is maxalign'd anyway in 8.3 and HEAD, this does not actually change anything right now, but it is foreseeable that the header size will change again someday. I had to fix a couple of places that were assuming that the content offset is just SizeOfPageHeaderData rather than MAXALIGN(SizeOfPageHeaderData). Per discussion of Zdenek's page-macros patch.	2008-07-13 21:50:04 +00:00
Tom Lane	5b965bf08b	Teach autovacuum how to determine whether a temp table belongs to a crashed backend. If so, send a LOG message to the postmaster log, and if the table is beyond the vacuum-for-wraparound horizon, forcibly drop it. Per recent discussions. Perhaps we ought to back-patch this, but it probably needs to age a bit in HEAD first.	2008-07-01 02:09:34 +00:00
Tom Lane	fad153ec45	Rewrite the sinval messaging mechanism to reduce contention and avoid unnecessary cache resets. The major changes are: * When the queue overflows, we only issue a cache reset to the specific backend or backends that still haven't read the oldest message, rather than resetting everyone as in the original coding. * When we observe backend(s) falling well behind, we signal SIGUSR1 to only one backend, the one that is furthest behind and doesn't already have a signal outstanding for it. When it finishes catching up, it will in turn signal SIGUSR1 to the next-furthest-back guy, if there is one that is far enough behind to justify a signal. The PMSIGNAL_WAKEN_CHILDREN mechanism is removed. * We don't attempt to clean out dead messages after every message-receipt operation; rather, we do it on the insertion side, and only when the queue fullness passes certain thresholds. * Split SInvalLock into SInvalReadLock and SInvalWriteLock so that readers don't block writers nor vice versa (except during the infrequent queue cleanout operations). * Transfer multiple sinval messages for each acquisition of a read or write lock.	2008-06-19 21:32:56 +00:00
Alvaro Herrera	a3540b0f65	Improve our #include situation by moving pointer types away from the corresponding struct definitions. This allows other headers to avoid including certain highly-loaded headers such as rel.h and relscan.h, instead using just relcache.h, heapam.h or genam.h, which are more lightweight and thus cause fewer unnecessary dependencies.	2008-06-19 00:46:06 +00:00
Heikki Linnakangas	a213f1ee6c	Refactor XLogOpenRelation() and XLogReadBuffer() in preparation for relation forks. XLogOpenRelation() and the associated light-weight relation cache in xlogutils.c is gone, and XLogReadBuffer() now takes a RelFileNode as argument, instead of Relation. For functions that still need a Relation struct during WAL replay, there's a new function called CreateFakeRelcacheEntry() that returns a fake entry like XLogOpenRelation() used to.	2008-06-12 09:12:31 +00:00
Alvaro Herrera	cc87402d6e	Move BufferGetPageSize and BufferGetPage from bufpage.h to bufmgr.h. It is more logical that way, and also it reduces the amount of unnecessary includes in bufpage.h, which is widely used. Zdenek Kotala. My previous patch to bufpage.h should also have credited him as author, but I forgot (sorry about that).	2008-06-08 22:00:48 +00:00
Alvaro Herrera	e4ca6cac43	Change xlog.h to xlogdefs.h in bufpage.h, and fix fallout.	2008-06-06 22:35:22 +00:00
Alvaro Herrera	5da9da71c4	Improve snapshot manager by keeping explicit track of snapshots. There are two ways to track a snapshot: there's the "registered" list, which is used for arbitrary long-lived snapshots; and there's the "active stack", which is used for the snapshot that is considered "active" at any time. This also allows users of snapshots to stop worrying about snapshot memory allocation and freeing, and about using PG_TRY blocks around ActiveSnapshot assignment. This is all done automatically now. As a consequence, this allows us to reset MyProc->xmin when there are no more snapshots registered in the current backend, reducing the impact that long-running transactions have on VACUUM.	2008-05-12 20:02:02 +00:00
Alvaro Herrera	9084399782	Put back bufmgr.h in bufpage.h -- it is needed by some macros. Remove #include bufmgr.h from (most?) source files which already include bufpage.h.	2008-05-12 16:06:10 +00:00
Alvaro Herrera	f8c4d7db60	Restructure some header files a bit, in particular heapam.h, by removing some unnecessary #include lines in it. Also, move some tuple routine prototypes and macros to htup.h, which allows removal of heapam.h inclusion from some .c files. For this to work, a new header file access/sysattr.h needed to be created, initially containing attribute numbers of system columns, for pg_dump usage. While at it, make contrib ltree, intarray and hstore header files more consistent with our header style.	2008-05-12 00:00:54 +00:00
Tom Lane	d1cbd26ded	Repair two places where SIGTERM exit could leave shared memory state corrupted. (Neither is very important if SIGTERM is used to shut down the whole database cluster together, but there's a problem if someone tries to SIGTERM individual backends.) To do this, introduce new infrastructure macros PG_ENSURE_ERROR_CLEANUP/PG_END_ENSURE_ERROR_CLEANUP that take care of transiently pushing an on_shmem_exit cleanup hook. Also use this method for createdb cleanup --- that wasn't a shared-memory-corruption problem, but SIGTERM abort of createdb could leave orphaned files lying around. Backpatch as far as 8.2. The shmem corruption cases don't exist in 8.1, and the createdb usage doesn't seem important enough to risk backpatching further.	2008-04-16 23:59:40 +00:00
Bruce Momjian	76365960d2	Revert addition of pg_terminate_backend() because of race conditions.	2008-04-15 20:28:47 +00:00
Bruce Momjian	18b286f3e3	Add pg_terminate_backend() to allow terminating only a single session.	2008-04-15 13:55:12 +00:00
Alvaro Herrera	d43b085d57	Separate snapshot management code from tuple visibility code, create a snapmgmt.c file for the former. The header files have also been reorganized in three parts: the most basic snapshot definitions are now in a new file snapshot.h, and the also new snapmgmt.h keeps the definitions for snapmgmt.c. tqual.h has been reduced to the bare minimum. This patch is just a first step towards managing live snapshots within a transaction; there is no functionality change. Per my proposal to pgsql-patches on 20080318191940.GB27458@alvh.no-ip.org and subsequent discussion.	2008-03-26 16:20:48 +00:00
Alvaro Herrera	23057f51f5	Move ProcState definition into sinvaladt.c from sinvaladt.h, since it's not needed anywhere after my previous patch. Noticed by Tom Lane. Also, remove #include <signal.h> from sinval.c.	2008-03-17 11:50:27 +00:00
Alvaro Herrera	ec6550c6c0	Modify interactions between sinval.c and sinvaladt.c. The code that actually deals with the queue, including locking etc, is all in sinvaladt.c. This means that the struct definition of the queue, and the queue pointer, are now internal "implementation details" inside sinvaladt.c. Per my proposal dated 25-Jun-2007 and followup discussion.	2008-03-16 19:47:34 +00:00
Tom Lane	f0828b2fc3	Provide a build-time option to store large relations as single files, rather than dividing them into 1GB segments as has been our longtime practice. This requires working support for large files in the operating system; at least for the time being, it won't be the default. Zdenek Kotala	2008-03-10 20:06:27 +00:00
Tom Lane	3fcc7e8e18	Reduce memory consumption during VACUUM of large relations, by using FSMPageData (6 bytes) instead of PageFreeSpaceInfo (8 or 16 bytes) for the temporary array of page-free-space information. Itagaki Takahiro	2008-03-10 02:04:10 +00:00
Tom Lane	7d6e6e2e97	Fix PREPARE TRANSACTION to reject the case where the transaction has dropped a temporary table; we can't support that because there's no way to clean up the source backend's internal state if the eventual COMMIT PREPARED is done by another backend. This was checked correctly in 8.1 but I broke it in 8.2 :-(. Patch by Heikki Linnakangas, original trouble report by John Smith.	2008-03-04 19:54:06 +00:00
Tom Lane	6322e84430	Change StatementCancelHandler() to check the DoingCommandRead flag to decide whether to execute an immediate interrupt, rather than testing whether LockWaitCancel() cancelled a lock wait. The old way misclassified the case where we were blocked in ProcWaitForSignal(), and arguably would misclassify any other future additions of new ImmediateInterruptOK states too. This allows reverting the old kluge that gave LockWaitCancel() a return value, since no callers care anymore. Improve comments in the various implementations of PGSemaphoreLock() to explain that on some platforms, the assumption that semop() exits after a signal is wrong, and so we must ensure that the signal handler itself throws elog if we want cancel or die interrupts to be effective. Per testing related to bug #3883, though this patch doesn't solve those problems fully. Perhaps this change should be back-patched, but since pre-8.3 branches aren't really relying on autovacuum to respond to SIGINT, it doesn't seem critical for them.	2008-01-26 19:55:08 +00:00
Tom Lane	ceb9360067	Fix CREATE INDEX CONCURRENTLY to not deadlock against an automatic or manual VACUUM that is blocked waiting to get lock on the table being indexed. Per report and fix suggestion from Greg Stark.	2008-01-09 21:52:36 +00:00
Tom Lane	da3df47c84	lmgr.c:DescribeLockTag was never taught about virtual xids, per Greg Stark. Also a couple of minor tweaks to try to future-proof the code a bit better against future locktag additions.	2008-01-08 23:18:51 +00:00
Bruce Momjian	9098ab9e32	Update copyrights in source tree to 2008.	2008-01-01 19:46:01 +00:00
Bruce Momjian	f6e8730d11	Re-run pgindent with updated list of typedefs. (Updated README should avoid this problem in the future.)	2007-11-15 22:25:18 +00:00
Bruce Momjian	fdf5a5efb7	pgindent run for 8.3.	2007-11-15 21:14:46 +00:00
Tom Lane	6cc4451b5c	Prevent re-use of a deleted relation's relfilenode until after the next checkpoint. This guards against an unlikely data-loss scenario in which we re-use the relfilenode, then crash, then replay the deletion and recreation of the file. Even then we'd be OK if all insertions into the new relation had been WAL-logged ... but that's not guaranteed given all the no-WAL-logging optimizations that have recently been added. Patch by Heikki Linnakangas, per a discussion last month.	2007-11-15 20:36:40 +00:00
Alvaro Herrera	acac68b2bc	Allow an autovacuum worker to be interrupted automatically when it is found to be locking another process (except when it's working to prevent Xid wraparound problems).	2007-10-26 20:45:10 +00:00
Alvaro Herrera	745c1b2c2a	Rearrange vacuum-related bits in PGPROC as a bitmask, to better support having several of them. Add two more flags: whether the process is executing an ANALYZE, and whether a vacuum is for Xid wraparound (which is obviously only set by autovacuum). Sneakily move the worker's recently-acquired PostAuthDelay to a more useful place.	2007-10-24 20:55:36 +00:00
Tom Lane	6f5c38dcd0	Just-in-time background writing strategy. This code avoids re-scanning buffers that cannot possibly need to be cleaned, and estimates how many buffers it should try to clean based on moving averages of recent allocation requests and density of reusable buffers. The patch also adds a couple more columns to pg_stat_bgwriter to help measure the effectiveness of the bgwriter. Greg Smith, building on his own work and ideas from several other people, in particular a much older patch from Itagaki Takahiro.	2007-09-25 20:03:38 +00:00
Tom Lane	cc59049daf	Improve handling of prune/no-prune decisions by storing a page's oldest unpruned XMAX in its header. At the cost of 4 bytes per page, this keeps us from performing heap_page_prune when there's no chance of pruning anything. Seems to be necessary per Heikki's preliminary performance testing.	2007-09-21 21:25:42 +00:00
Tom Lane	282d2a03dd	HOT updates. When we update a tuple without changing any of its indexed columns, and the new version can be stored on the same heap page, we no longer generate extra index entries for the new version. Instead, index searches follow the HOT-chain links to ensure they find the correct tuple version. In addition, this patch introduces the ability to "prune" dead tuples on a per-page basis, without having to do a complete VACUUM pass to recover space. VACUUM is still needed to clean up dead index entries, however. Pavan Deolasee, with help from a bunch of other people.	2007-09-20 17:56:33 +00:00
Tom Lane	6889303531	Redefine the lp_flags field of item pointers as having four states, rather than two independent bits (one of which was never used in heap pages anyway, or at least hadn't been in a very long time). This gives us flexibility to add the HOT notions of redirected and dead item pointers without requiring anything so klugy as magic values of lp_off and lp_len. The state values are chosen so that for the states currently in use (pre-HOT) there is no change in the physical representation.	2007-09-12 22:10:26 +00:00
Tom Lane	6bd4f401b0	Replace the former method of determining snapshot xmax --- to wit, calling ReadNewTransactionId from GetSnapshotData --- with a "latestCompletedXid" variable that is updated during transaction commit or abort. Since latestCompletedXid is written only in places that had to lock ProcArrayLock exclusively anyway, and is read only in places that had to lock ProcArrayLock shared anyway, it adds no new locking requirements to the system despite being cluster-wide. Moreover, removing ReadNewTransactionId from snapshot acquisition eliminates the need to take both XidGenLock and ProcArrayLock at the same time. Since XidGenLock is sometimes held across I/O this can be a significant win. Some preliminary benchmarking suggested that this patch has no effect on average throughput but can significantly improve the worst-case transaction times seen in pgbench. Concept by Florian Pflug, implementation by Tom Lane.	2007-09-08 20:31:15 +00:00
Tom Lane	cd1aae5864	Allow CREATE INDEX CONCURRENTLY to disregard transactions in other databases, per gripe from hubert depesz lubaczewski. Patch from Simon Riggs.	2007-09-07 00:58:57 +00:00
Tom Lane	295e63983d	Implement lazy XID allocation: transactions that do not modify any database rows will normally never obtain an XID at all. We already did things this way for subtransactions, but this patch extends the concept to top-level transactions. In applications where there are lots of short read-only transactions, this should improve performance noticeably; not so much from removal of the actual XID-assignments, as from reduction of overhead that's driven by the rate of XID consumption. We add a concept of a "virtual transaction ID" so that active transactions can be uniquely identified even if they don't have a regular XID. This is a much lighter-weight concept: uniqueness of VXIDs is only guaranteed over the short term, and no on-disk record is made about them. Florian Pflug, with some editorialization by Tom.	2007-09-05 18:10:48 +00:00
Tom Lane	c8b7e811f3	Apparently icc doesn't always define __ICC, and it's more correct to check for __INTEL_COMPILER. Per report from Dirk Tilger. Not back-patched since I don't fully trust it yet ...	2007-08-05 15:11:40 +00:00
Tom Lane	e4f4a7f5a4	Remove FileUnlink(), which wasn't being used anywhere and interacted poorly with the recent patch to log temp file sizes at removal time. Doesn't seem worth fixing since it's unused. In passing, make a few elog messages conform to the message style guide.	2007-07-26 15:15:18 +00:00
Magnus Hagander	906b2e1b37	Rename DLLIMPORT macro to PGDLLIMPORT to avoid conflict with third party includes (like tcl) that define DLLIMPORT.	2007-07-25 12:22:54 +00:00
Tom Lane	9f6f51d5d4	Hmm, so evidently _check_lock and _clear_lock take an argument of type int not unsigned int. Third try to get grebe building without warnings...	2007-07-16 14:02:22 +00:00
Tom Lane	5aaf09ac46	So our reward for including <sys/atomic_op.h> seems to be a bunch of nattering about casting away volatile. Losers.	2007-07-16 04:57:57 +00:00
Tom Lane	057d5c421f	On AIX, include <sys/atomic_op.h> so that the functions we use for TAS support are properly declared.	2007-07-16 02:03:14 +00:00
Tom Lane	867e2c91a0	Implement "distributed" checkpoints in which the checkpoint I/O is spread over a fairly long period of time, rather than being spat out in a burst. This happens only for background checkpoints carried out by the bgwriter; other cases, such as a shutdown checkpoint, are still done at full speed. Remove the "all buffers" scan in the bgwriter, and associated stats infrastructure, since this seems no longer very useful when the checkpoint itself is properly throttled. Original patch by Itagaki Takahiro, reworked by Heikki Linnakangas, and some minor API editorialization by me.	2007-06-28 00:02:40 +00:00
Alvaro Herrera	a03e8ad266	Remove unused BAD_LOCATION definition.	2007-06-25 17:12:07 +00:00
Tom Lane	6e07228728	Code review for log_lock_waits patch. Don't try to issue log messages from within a signal handler (this might be safe given the relatively narrow code range in which the interrupt is enabled, but it seems awfully risky); do issue more informative log messages that tell what is being waited for and the exact length of the wait; minor other code cleanup. Greg Stark and Tom Lane	2007-06-19 20:13:22 +00:00
Tom Lane	a04a423599	Arrange for large sequential scans to synchronize with each other, so that when multiple backends are scanning the same relation concurrently, each page is (ideally) read only once. Jeff Davis, with review by Heikki and Tom.	2007-06-08 18:23:53 +00:00
Tom Lane	24ee8af573	Rework temp_tablespaces patch so that temp tablespaces are assigned separately for each temp file, rather than once per sort or hashjoin; this allows spreading the data of a large sort or join across multiple tablespaces. (I remain dubious that this will make any difference in practice, but certain people insisted.) Arrange to cache the results of parsing the GUC variable instead of recomputing from scratch on every demand, and push usage of the cache down to the bottommost fd.c level.	2007-06-07 19:19:57 +00:00
Tom Lane	acfce502ba	Create a GUC parameter temp_tablespaces that allows selection of the tablespace(s) in which to store temp tables and temporary files. This is a list to allow spreading the load across multiple tablespaces (a random list element is chosen each time a temp object is to be created). Temp files are not stored in per-database pgsql_tmp/ directories anymore, but per-tablespace directories. Jaime Casanova and Albert Cervera, with review by Bernd Helmle and Tom Lane.	2007-06-03 17:08:34 +00:00
Tom Lane	bd0a260928	Make CREATE/DROP/RENAME DATABASE wait a little bit to see if other backends will exit before failing because of conflicting DB usage. Per discussion, this seems a good idea to help mask the fact that backend exit takes nonzero time. Remove a couple of thereby-obsoleted sleeps in contrib and PL regression test sequences.	2007-06-01 19:38:07 +00:00
Tom Lane	d526575f89	Make large sequential scans and VACUUMs work in a limited-size "ring" of buffers, rather than blowing out the whole shared-buffer arena. Aside from avoiding cache spoliation, this fixes the problem that VACUUM formerly tended to cause a WAL flush for every page it modified, because we had it hacked to use only a single buffer. Those flushes will now occur only once per ring-ful. The exact ring size, and the threshold for seqscans to switch into the ring usage pattern, remain under debate; but the infrastructure seems done. The key bit of infrastructure is a new optional BufferAccessStrategy object that can be passed to ReadBuffer operations; this replaces the former StrategyHintVacuum API. This patch also changes the buffer usage-count methodology a bit: we now advance usage_count when first pinning a buffer, rather than when last unpinning it. To preserve the behavior that a buffer's lifetime starts to decrease when it's released, the clock sweep code is modified to not decrement usage_count of pinned buffers. Work not done in this commit: teach GiST and GIN indexes to use the vacuum BufferAccessStrategy for vacuum-driven fetches. Original patch by Simon, reworked by Heikki and again by Tom.	2007-05-30 20:12:03 +00:00
Tom Lane	14c4d3dea9	Fix trivial misspelling in comment.	2007-05-30 16:16:32 +00:00
Tom Lane	c7464720a3	tas() support for Renesas' M32R processor. Kazuhiro Inaoka	2007-05-04 15:20:52 +00:00
Tom Lane	8c3cc86e7b	During WAL recovery, when reading a page that we intend to overwrite completely from the WAL data, don't bother to physically read it; just have bufmgr.c return a zeroed-out buffer instead. This speeds recovery significantly, and also avoids unnecessary failures when a page-to-be-overwritten has corrupt page headers on disk. This replaces a former kluge that accomplished the latter by pretending zero_damaged_pages was always ON during WAL recovery; which was OK when the kluge was put in, but is unsafe when restoring a WAL log that was written with full_page_writes off. Heikki Linnakangas	2007-05-02 23:18:03 +00:00
Alvaro Herrera	e2a186b03c	Add a multi-worker capability to autovacuum. This allows multiple worker processes to be running simultaneously. Also, now autovacuum processes do not count towards the max_connections limit; they are counted separately from regular processes, and are limited by the new GUC variable autovacuum_max_workers. The launcher now has intelligence to launch workers on each database every autovacuum_naptime seconds, limited only by the maximum number of worker slots available. Also, the global worker I/O utilization is limited by the vacuum cost-based delay feature. Workers are "balanced" so that the total I/O consumption does not exceed the established limit. This part of the patch was contributed by ITAGAKI Takahiro. Per discussion.	2007-04-16 18:30:04 +00:00
Tom Lane	9c9b619473	Remove the CheckpointStartLock in favor of having backends show whether they are in their commit critical sections via flags in the ProcArray. Checkpoint can watch the ProcArray to determine when it's safe to proceed. This is a considerably better solution to the original problem of race conditions between checkpoint and transaction commit: it speeds up commit, since there's one less lock to fool with, and it prevents the problem of checkpoint being delayed indefinitely when there's a constant flow of commits. Heikki, with some kibitzing from Tom.	2007-04-03 16:34:36 +00:00
Alvaro Herrera	626eb02198	Clean up the bootstrap code a little, and rename "dummy procs" in the code comments and variables to "auxiliary proc", per Heikki's request.	2007-03-07 13:35:03 +00:00
Bruce Momjian	0763a56501	Add lo_truncate() to backend and libpq for large object truncation. Kris Jurka	2007-03-03 19:52:47 +00:00
Bruce Momjian	e52c4a6e26	Add GUC log_lock_waits to log long wait times. Simon Riggs	2007-03-03 18:46:40 +00:00
Tom Lane	fb276438b6	Suppress useless searches for unused line pointers in PageAddItem. To do this, add a 16-bit "flags" field to page headers by stealing some bits from pd_tli. We use one flag bit as a hint to indicate whether there are any unused line pointers; the remaining 15 are available for future use. This is a cut-down form of an idea proposed by Hiroki Kataoka in July 2005. At the time it was rejected because the original patch increased the size of page headers and it wasn't clear that the benefit outweighed the distributed cost. The flag-bit approach gets most of the benefit without requiring an increase in the page header size. Heikki Linnakangas and Tom Lane	2007-03-02 00:48:44 +00:00
Bruce Momjian	6f519ad01c	btree source code cleanups: I refactored findsplitloc and checksplitloc so that the division of labor is more clear IMO. I pushed all the space calculation inside the loop to checksplitloc. I also fixed the off by 4 in free space calculation caused by PageGetFreeSpace subtracting sizeof(ItemIdData), even though it was harmless, because it was distracting and I felt it might come back to bite us in the future if we change the page layout or alignments. There's now a new function PageGetExactFreeSpace that doesn't do the subtraction. findsplitloc now tries the "just the new item to right page" split as well. If people don't like the refactoring, I can write a patch to just add that. Heikki Linnakangas	2007-02-21 20:02:17 +00:00
Alvaro Herrera	1820650934	Restructure autovacuum in two processes: a dummy process, which runs continuously, and requests vacuum runs of "autovacuum workers" to postmaster. The workers do the actual vacuum work. This allows for future improvements, like allowing multiple autovacuum jobs running in parallel. For now, the code keeps the original behavior of having a single autovac process at any time by sleeping until the previous worker has finished.	2007-02-15 23:23:23 +00:00
Tom Lane	c398300330	Combine cmin and cmax fields of HeapTupleHeaders into a single field, by keeping private state in each backend that has inserted and deleted the same tuple during its current top-level transaction. This is sufficient since there is no need to be able to determine the cmin/cmax from any other transaction. This gets us back down to 23-byte headers, removing a penalty paid in 8.0 to support subtransactions. Patch by Heikki Linnakangas, with minor revisions by moi, following a design hashed out awhile back on the pghackers list.	2007-02-09 03:35:35 +00:00
Tom Lane	eddbf39756	Extend yesterday's patch so that the bgwriter is also told to forget pending fsyncs during DROP DATABASE. Obviously necessary in hindsight :-(	2007-01-17 16:25:01 +00:00
Alvaro Herrera	eb63cc3da8	Arrange for autovacuum to be killed when another operation, like DROP DATABASE, wants exclusive access to the database. This allows the regression tests to pass with autovacuum enabled, which opens the gates for finally enabling autovacuum by default.	2007-01-16 13:28:57 +00:00
Bruce Momjian	29dccf5fe0	Update CVS HEAD for 2007 copyright. Back branches are typically not back-stamped for this.	2007-01-05 22:20:05 +00:00
Tom Lane	ef07221997	Clean up smgr.c/md.c APIs as per discussion a couple months ago. Instead of having md.c return a success/failure boolean to smgr.c, which was just going to elog anyway, let md.c issue the elog messages itself. This allows better error reporting, particularly in cases such as "short read" or "short write" which Peter was complaining of. Also, remove the kluge of allowing mdread() to return zeroes from a read-beyond-EOF: this is now an error condition except when InRecovery or zero_damaged_pages = true. (Hash indexes used to require that behavior, but no more.) Also, enforce that mdwrite() is to be used for rewriting existing blocks while mdextend() is to be used for extending the relation EOF. This restriction lets us get rid of the old ad-hoc defense against creating huge files by an accidental reference to a bogus block number: we'll only create new segments in mdextend() not mdwrite() or mdread(). (Again, when InRecovery we allow it anyway, since we need to allow updates of blocks that were later truncated away.) Also, clean up the original makeshift patch for bug #2737: move the responsibility for padding relation segments to full length into md.c.	2007-01-03 18:11:01 +00:00
Bruce Momjian	0c6f167c4a	Update lock comments for concurrent index creation, analyze. Walter Cruz	2006-11-23 05:14:04 +00:00
Tom Lane	def651f48f	Clean up local redeclarations of variables with DLLIMPORT, per report from Magnus that MSVC complains about this.	2006-10-19 18:32:48 +00:00
Tom Lane	e0dece127d	Redesign the patch for allocation of shmem space and LWLocks for add-on modules; the first try was not usable in EXEC_BACKEND builds (e.g., Windows). Instead, just provide some entry points to increase the allocation requests during postmaster start, and provide a dedicated LWLock that can be used to synchronize allocation operations performed by backends. Per discussion with Marc Munro.	2006-10-15 22:04:08 +00:00
Bruce Momjian	f99a569a2e	pgindent run for 8.2.	2006-10-04 00:30:14 +00:00
Tom Lane	d40d34863e	Fix pg_locks view to call advisory locks advisory locks, while preserving backward compatibility for anyone using the old userlock code that's now on pgfoundry --- locks from that code still show as 'userlock'.	2006-09-22 23:20:14 +00:00
Tom Lane	9e936693a9	Fix free space map to correctly track the total amount of FSM space needed even when a single relation requires more than max_fsm_pages pages. Also, make VACUUM emit a warning in this case, since it likely means that VACUUM FULL or other drastic corrective measure is needed. Per reports from Jeff Frost and others of unexpected changes in the claimed max_fsm_pages need.	2006-09-21 20:31:22 +00:00
Tom Lane	9b4cda0df6	Add built-in userlock manipulation functions to replace the former contrib functionality. Along the way, remove the USER_LOCKS configuration symbol, since it no longer makes any sense to try to compile that out. No user documentation yet ... mmoncure has promised to write some. Thanks to Abhijit Menon-Sen for creating a first draft to work from.	2006-09-18 22:40:40 +00:00
Bruce Momjian	a0e87ad7a5	Specify lo_write() to take a _const_ buffer, to match documentation.	2006-09-07 15:37:25 +00:00
Tom Lane	e06fda0a8b	Add a function GetLockConflicts() to lock.c to report xacts holding locks that would conflict with a specified lock request, without actually trying to get that lock. Use this instead of the former ad hoc method of doing the first wait step in CREATE INDEX CONCURRENTLY. Fixes problem with undetected deadlock and in many cases will allow the index creation to proceed sooner than it otherwise could've. Per discussion with Greg Stark.	2006-08-27 19:14:34 +00:00
Tom Lane	e093dcdd28	Add the ability to create indexes 'concurrently', that is, without blocking concurrent writes to the table. Greg Stark, with a little help from Tom Lane.	2006-08-25 04:06:58 +00:00
Tom Lane	7aa772f03e	Now that we've rearranged relation open to get a lock before touching the rel, it's easy to get rid of the narrow race-condition window that used to exist in VACUUM and CLUSTER. Did some minor code-beautification work in the same area, too.	2006-08-18 16:09:13 +00:00
Bruce Momjian	2c6d96cef6	Add support for loadable modules to allocate shared memory and lightweight locks. Marc Munro	2006-08-01 19:03:11 +00:00
Tom Lane	09d3670df3	Change the relation_open protocol so that we obtain lock on a relation (table or index) before trying to open its relcache entry. This fixes race conditions in which someone else commits a change to the relation's catalog entries while we are in process of doing relcache load. Problems of that ilk have been reported sporadically for years, but it was not really practical to fix until recently --- for instance, the recent addition of WAL-log support for in-place updates helped. Along the way, remove pg_am.amconcurrent: all AMs are now expected to support concurrent update.	2006-07-31 20:09:10 +00:00
Alvaro Herrera	92c2ecc130	Modify snapshot definition so that lazy vacuums are ignored by other vacuums. This allows an OLTP-like system with big tables to continue regular vacuuming on small-but-frequently-updated tables while the big tables are being vacuumed. Original patch from Hannu Krosing, rewritten by Tom Lane and updated by me.	2006-07-30 02:07:18 +00:00
Tom Lane	a794fb0681	Convert the lock manager to use the new dynahash.c support for partitioned hash tables, instead of the previous kluge involving multiple hash tables. This partially undoes my patch of last December.	2006-07-23 23:08:46 +00:00
Tom Lane	10b9ca3d05	Split the buffer mapping table into multiple separately lockable partitions, as per discussion. Passes functionality checks, but I don't have any performance data yet.	2006-07-23 03:07:58 +00:00
Bruce Momjian	a22d76d96a	Allow include files to compile on their own. Strip unused include files out of include files, and add needed includes to C files. The next step is to remove unused include files in C files.	2006-07-13 16:49:20 +00:00
Alvaro Herrera	d4cef0aa2a	Improve vacuum code to track minimum Xids per table instead of per database. To this end, add a couple of columns to pg_class, relminxid and relvacuumxid, based on which we calculate the pg_database columns after each vacuum. We now force all databases to be vacuumed, even template ones. A backend noticing too old a database (meaning pg_database.datminxid is in danger of falling behind Xid wraparound) will signal the postmaster, which in turn will start an autovacuum iteration to process the offending database. In principle this is only there to cope with frozen (non-connectable) databases without forcing users to set them to connectable, but it could force regular user database to go through a database-wide vacuum at any time. Maybe we should warn users about this somehow. Of course the real solution will be to use autovacuum all the time ;-) There are some additional improvements we could have in this area: for example the vacuum code could be smarter about not updating pg_database for each table when called by autovacuum, and do it only once the whole autovacuum iteration is done. I updated the system catalogs documentation, but I didn't modify the maintenance section. Also having some regression tests for this would be nice but it's not really a very straightforward thing to do. Catalog version bumped due to system catalog changes.	2006-07-10 16:20:52 +00:00
Tom Lane	b13c9686d0	Take the statistics collector out of the loop for monitoring backends' current commands; instead, store current-status information in shared memory. This substantially reduces the overhead of stats_command_string and also ensures that pg_stat_activity is fully up to date at all times. Per my recent proposal.	2006-06-19 01:51:22 +00:00
Bruce Momjian	399a36a75d	Prepare code to be built by MSVC: remove many WIN32_CLIENT_ONLY defines; add WIN32_ONLY_COMPILER define; add 3rd argument to open() for portability; add include/port/win32_msvc directory for system includes. Magnus Hagander	2006-06-07 22:24:46 +00:00
Bruce Momjian	b125d4b0ca	Fix Solaris/ASM test for x86.	2006-05-19 13:10:11 +00:00
Bruce Momjian	40a95aa25b	Use unsigned int for slock_t for pre-sparcv8plus.	2006-05-18 21:18:40 +00:00
Bruce Momjian	0622821853	Mention that gcc/sparc generates sparcv7 binaries.	2006-05-18 16:02:30 +00:00
Bruce Momjian	407885ea3b	Add comments that Solaris Sun compiler only supports sparc9 ASM.	2006-05-17 23:57:03 +00:00
Tom Lane	5749f6ef0c	Rewrite btree vacuuming to fold the former bulkdelete and cleanup operations into a single mostly-physical-order scan of the index. This requires some ticklish interlocking considerations, but should create no material performance impact on normal index operations (at least given the already-committed changes to make scans work a page at a time). VACUUM itself should get significantly faster in any index that's degenerated to a very nonlinear page order. Also, we save one pass over the index entirely, except in the case where there were no deletions to do and so only one pass happened anyway. Original patch by Heikki Linnakangas, rework by Tom Lane.	2006-05-08 00:00:17 +00:00
Bruce Momjian	908f317b73	Add Win32 semaphore implementation, rather than mimicking SysV semaphores. Qingqing Zhou	2006-04-29 16:34:41 +00:00
Bruce Momjian	291724dfa8	Solaris tas() uses 'int' now. Theo Schlossnagle	2006-04-29 11:55:19 +00:00
Bruce Momjian	dfec2b070d	Remove "volatile" from tas function, per Tom.	2006-04-28 03:43:19 +00:00
Bruce Momjian	128bed948f	Rewrite Solaris compiler tas() assembly routines, merge i386 and x86_64 assembler files, renamed as solaris_x86.s. Theo Schlossnagle	2006-04-27 22:28:42 +00:00
Tom Lane	486f994be7	Revise large-object access routines to avoid running with CurrentMemoryContext set to the large object context ("fscxt"), as this is inevitably a source of transaction-duration memory leaks. Not sure why we'd not noticed it before; maybe people weren't touching a whole lot of LOs in the same transaction before the 8.1 pg_dump changes. Per report from Wayne Conrad. Backpatched as far as 8.1, but the problem doubtless goes all the way back. I'm disinclined to spend the time to try to verify that the older branches would still work if patched, seeing that this code was significantly modified for 8.0 and again for 8.1, and that we don't have any trouble reports before 8.1. (Maybe the leaks were smaller before?)	2006-04-26 00:34:57 +00:00
Tom Lane	cc7eab38dd	Recognize __ppc64__, which seems to be Apple's spelling of the predefined symbol for PPC64 hardware. I hadn't known that Apple supported PPC64 at all, but darn if there aren't 64-bit variant libraries in OS X as well as support in their gcc.	2006-04-19 23:11:15 +00:00
Tom Lane	0fcc3c2f1d	Repair a low-probability race condition identified by Qingqing Zhou. If a process abandons a wait in LockBufferForCleanup (in practice, only happens if someone cancels a VACUUM) just before someone else sends it a signal indicating the buffer is available, it was possible for the wakeup to remain in the process' semaphore, causing misbehavior next time the process waited for an lmgr lock. Rather than try to prevent the race condition directly, it seems best to make the lock manager robust against leftover wakeups, by having it repeat waiting on the semaphore if the lock has not actually been granted or denied yet.	2006-04-14 03:38:56 +00:00
Tom Lane	a8b8f4db23	Clean up WAL/buffer interactions as per my recent proposal. Get rid of the misleadingly-named WriteBuffer routine, and instead require routines that change buffer pages to call MarkBufferDirty (which does exactly what it says). We also require that they do so before calling XLogInsert; this takes care of the synchronization requirement documented in SyncOneBuffer. Note that because bufmgr takes the buffer content lock (in shared mode) while writing out any buffer, it doesn't matter whether MarkBufferDirty is executed before the buffer content change is complete, so long as the content change is completed before releasing exclusive lock on the buffer. So it's OK to set the dirtybit before we fill in the LSN. This eliminates the former kluge of needing to set the dirtybit in LockBuffer. Aside from making the code more transparent, we can also add some new debugging assertions, in particular that the caller of MarkBufferDirty must hold the buffer content lock, not merely a pin.	2006-03-31 23:32:07 +00:00
Tom Lane	6d61cdec07	Clean up and document the API for XLogOpenRelation and XLogReadBuffer. This commit doesn't make much functional change, but it does eliminate some duplicated code --- for instance, PageIsNew tests are now done inside XLogReadBuffer rather than by each caller. The GIST xlog code still needs a lot of love, but I'll worry about that separately.	2006-03-29 21:17:39 +00:00
Tom Lane	0a20207060	Arrange to emit a description of the current XLOG record as error context when an error occurs during xlog replay. Also, replace the former risky 'write into a fixed-size buffer with no overflow detection' API for XLOG record description routines; use an expansible StringInfo instead. (The latter accounts for most of the patch bulk.) Qingqing Zhou	2006-03-24 04:32:13 +00:00
Bruce Momjian	f2f5b05655	Update copyright for 2006. Update scripts.	2006-03-05 15:59:11 +00:00
Tom Lane	60d3c9fdf4	Declare the arguments of AllocateFile() as const char *, not char *. This is consistent with the standard definition of fopen().	2006-03-04 21:32:47 +00:00
Bruce Momjian	d5dd3d451e	Add contrib/pg_freespacemap to display free space map information. Mark Kirkwood	2006-02-12 03:55:53 +00:00
Tom Lane	4513d9deda	It turns out that TablespaceCreateDbspace fails badly if a relcache flush occurs when it tries to heap_open pg_tablespace. When control returns to smgrcreate, that routine will be holding a dangling pointer to a closed SMgrRelation, resulting in mayhem. This is of course a consequence of the violation of proper module layering inherent in having smgr.c call a tablespace command routine, but the simplest fix seems to be to change the locking mechanism. There's no real need for TablespaceCreateDbspace to touch pg_tablespace at all --- it's only opening it as a way of locking against a parallel DROP TABLESPACE command. A much better answer is to create a special-purpose LWLock to interlock these two operations. This drops TablespaceCreateDbspace quite a few layers down the food chain and makes it something reasonably safe for smgr to call.	2006-01-19 04:45:38 +00:00
Bruce Momjian	a1675649e4	Remove QNX port.	2006-01-05 01:56:30 +00:00
Tom Lane	349f40b2c2	Rearrange backend startup sequence so that ShmemIndexLock can become an LWLock instead of a spinlock. This hardly matters on Unix machines but should improve startup performance on Windows (or any port using EXEC_BACKEND). Per previous discussion.	2006-01-04 21:06:32 +00:00
Bruce Momjian	12af9cdff4	Add support for Solaris x86_64 using Sun's compiler. Pierre Girard	2005-12-30 21:43:41 +00:00
Tom Lane	195f164228	Get rid of the SpinLockAcquire/SpinLockAcquire_NoHoldoff distinction in favor of having just one set of macros that don't do HOLD/RESUME_INTERRUPTS (hence, these correspond to the old SpinLockAcquire_NoHoldoff case). Given our coding rules for spinlock use, there is no reason to allow CHECK_FOR_INTERRUPTS to be done while holding a spinlock, and also there is no situation where ImmediateInterruptOK will be true while holding a spinlock. Therefore doing HOLD/RESUME_INTERRUPTS while taking/releasing a spinlock is just a waste of cycles. Qingqing Zhou and Tom Lane.	2005-12-29 18:08:05 +00:00
Bruce Momjian	ea771743c8	Fix typo.	2005-12-17 21:08:24 +00:00
Bruce Momjian	8d26730a9a	Update s_lock.c comments.	2005-12-17 20:39:16 +00:00
Bruce Momjian	70cab220c8	Update ASM comments.	2005-12-17 20:15:43 +00:00
Tom Lane	fb3dbdf986	Rethink prior patch to filter out dead backend entries from the pgstats file. The original code probed the PGPROC array separately for each PID, which was not good for large numbers of backends: not only is the runtime O(N^2) but most of it is spent holding ProcArrayLock. Instead, take the lock just once and copy the active PIDs into an array, then use qsort and bsearch so that the lookup time is more like O(N log N).	2005-12-16 04:03:40 +00:00
Tom Lane	ec0baf949e	Divide the lock manager's shared state into 'partitions', so as to reduce contention for the former single LockMgrLock. Per my recent proposal. I set it up for 16 partitions, but on a pgbench test this gives only a marginal further improvement over 4 partitions --- we need to test more scenarios to choose the number of partitions.	2005-12-11 21:02:18 +00:00
Tom Lane	c599a247bb	Simplify lock manager data structures by making a clear separation between the data defining the semantics of a lock method (ie, conflict resolution table and ancillary data, which is all constant) and the hash tables storing the current state. The only thing we give up by this is the ability to use separate hashtables for different lock methods, but there is no need for that anyway. Put some extra fields into the LockMethod definition structs to clean up some other uglinesses, like hard-wired tests for DEFAULT_LOCKMETHOD and USER_LOCKMETHOD. This commit doesn't do anything about the performance issues we were discussing, but it clears away some of the underbrush that's in the way of fixing that.	2005-12-09 01:22:04 +00:00
Bruce Momjian	436a2956d8	Re-run pgindent, fixing a problem where comment lines after a blank comment line were output as too long, and update typedefs for /lib directory. Also fix case where identifiers were used as variable names in the backend, but as typedefs in ecpg (favor the backend for indenting). Backpatch to 8.1.X.	2005-11-22 18:17:34 +00:00
Tom Lane	c859308aba	DropRelFileNodeBuffers failed to fix the state of the lookup hash table that was added to localbuf.c in 8.1; therefore, applying it to a temp table left corrupt lookup state in memory. The only case where this had a significant chance of causing problems was an ON COMMIT DELETE ROWS temp table; the other possible paths left bogus state that was unlikely to be used again. Per report from Csaba Nagy.	2005-11-17 17:42:02 +00:00
Bruce Momjian	1dc3498251	Standard pgindent run for 8.1.	2005-10-15 02:49:52 +00:00
Neil Conway	689c815b09	Add a comment describing the requirement that pointers into shared memory that is protected by a spinlock must be volatile, per recent discussion.	2005-10-13 06:17:34 +00:00
Tom Lane	07eeb9d109	Do all accesses to shared buffer headers through volatile-qualified pointers, to ensure that compilers won't rearrange accesses to occur while we're not holding the buffer header spinlock. It's probably not necessary to mark volatile in every single place in bufmgr.c, but better safe than sorry. Per trouble report from Kevin Grittner.	2005-10-12 16:45:14 +00:00
Tom Lane	a72ee09090	Add infrastructure for making spins_per_delay variable depending on whether we seem to be running in a uniprocessor or multiprocessor. The adjustment rules could probably still use further tweaking, but I'm convinced this should be a win overall.	2005-10-11 20:41:32 +00:00
Tom Lane	9907b9775b	Don't use a non-locked pre-test of the spinlock on x86_64 machines. The pre-test has been shown to be a big loss on Opterons and at best a wash on EM64T.	2005-10-11 20:01:30 +00:00
Bruce Momjian	4f915cd377	This patch cleans up the access to members of ItemIdData. It uses existing macros instead of touching directly. ITAGAKI Takahiro	2005-09-22 16:46:00 +00:00
Bruce Momjian	658657177e	Print proper cause of statement cancel, user interaction or timeout.	2005-09-19 17:21:49 +00:00
Tom Lane	2d03390945	Sigh, looks like you need '.set mips2' before you can access MIPS SYNC instruction.	2005-08-29 00:41:34 +00:00
Tom Lane	7319ab9a59	Add a SYNC instruction to the S_UNLOCK sequence for MIPS.	2005-08-28 18:26:01 +00:00
Tom Lane	5824d02155	Get the MIPS assembler syntax right. Also add a separate sync command; the reference I consulted yesterday said SC does a SYNC, but apparently this is not true on newer MIPS processors, so be safe.	2005-08-27 16:22:48 +00:00
Tom Lane	846319db3f	Another try at the inlined MIPS spinlock code. Can't test this myself, but for sure it's not any more broken than the prior version.	2005-08-26 22:04:42 +00:00
Tom Lane	396526d8c3	Adjust m68k spinlock code to avoid duplicate in-line and not-in-line definitions on recent Linux systems, per Martin Pitt.	2005-08-26 14:47:35 +00:00
Tom Lane	1a33436224	Replace out-of-line tas() assembly code for MIPS with a properly constrained GCC inline version. Thiemo Seufer, by way of Martin Pitt.	2005-08-25 17:17:10 +00:00
Tom Lane	0007490e09	Convert the arithmetic for shared memory size calculation from 'int' to 'Size' (that is, size_t), and install overflow detection checks in it. This allows us to remove the former arbitrary restrictions on NBuffers etc. It won't make any difference in a 32-bit machine, but in a 64-bit machine you could theoretically have terabytes of shared buffers. (How efficiently we could manage 'em remains to be seen.) Similarly, num_temp_buffers, work_mem, and maintenance_work_mem can be set above 2Gb on a 64-bit machine. Original patch from Koichi Suzuki, additional work by moi.	2005-08-20 23:26:37 +00:00
Tatsuo Ishii	bc3991c185	Add BackendXidGetPid().	2005-08-20 01:26:36 +00:00
Tom Lane	3ae7e4a33b	Remove BufferBlockPointers array in favor of a base + (bufnum) * BLCKSZ computation. On modern machines this is as fast if not faster, and we don't have to clog the CPU's L2 cache with a tens-of-KB pointer array. If we ever decide to adopt a more dynamic allocation method for shared buffers, we'll probably have to revert this patch, but in the meantime we might as well save a few bytes and nanoseconds. Per Qingqing Zhou.	2005-08-12 05:05:51 +00:00
Bruce Momjian	b609695b7a	Add files to do read I/O on the cluster directory: pg_stat_file(), pg_read_file(), pg_ls_dir(), pg_reload_conf(), pg_rotate_logfile(). Dave Page, Andreas Pflug	2005-08-12 03:25:13 +00:00
Tom Lane	7117cd3a77	Cause ShutdownPostgres to do a normal transaction abort during backend exit, instead of trying to take shortcuts. Introduce some additional shutdown callback routines to eliminate kluges like having ProcKill be responsible for shutting down the buffer manager. Ensure that the order of operations during shutdown is predictable and what you would expect given the module layering.	2005-08-08 03:12:16 +00:00
Tom Lane	2a4fad1a0e	Add NOWAIT option to SELECT FOR UPDATE/SHARE. Original patch by Hans-Juergen Schoenig, revisions by Karel Zak and Tom Lane.	2005-08-01 20:31:16 +00:00
Tom Lane	d42cf5a42a	Add per-user and per-database connection limit options. This patch also includes preliminary update of pg_dumpall for roles. Petr Jelinek, with review by Bruce Momjian and Tom Lane.	2005-07-31 17:19:22 +00:00
Neil Conway	b98b75eb3b	Remove MMCacheLock -- it is no longer used. Per ITAGAKI Takahiro.	2005-07-27 08:05:36 +00:00
Tom Lane	0eaa36a16a	Bring syntax of role-related commands into SQL compliance. To avoid syntactic conflicts, both privilege and role GRANT/REVOKE commands have to use the same production for scanning the list of tokens that might eventually turn out to be privileges or role names. So, change the existing GRANT/REVOKE code to expect a list of strings not pre-reduced AclMode values. Fix a couple other minor issues while at it, such as InitializeAcl function name conflicting with a Windows system function.	2005-06-28 19:51:26 +00:00
Tom Lane	3f749924f8	Simplify uses of readdir() by creating a function ReadDir() that includes error checking and an appropriate ereport(ERROR) message. This gets rid of rather tedious and error-prone manipulation of errno, as well as a Windows-specific bug workaround, at more than a dozen call sites. After an idea in a recent patch by Heikki Linnakangas.	2005-06-19 21:34:03 +00:00
Tom Lane	d0a89683a3	Two-phase commit. Original patch by Heikki Linnakangas, with additional hacking by Alvaro Herrera and Tom Lane.	2005-06-17 22:32:51 +00:00
Tom Lane	8563ccae2c	Simplify shared-memory lock data structures as per recent discussion: it is sufficient to track whether a backend holds a lock or not, and store information about transaction vs. session locks only in the inside-the-backend LocalLockTable. Since there can now be but one PROCLOCK per lock per backend, LockCountMyLocks() is no longer needed, thus eliminating some O(N^2) behavior when a backend holds many locks. Also simplify the LockAcquire/LockRelease API by passing just a 'sessionLock' boolean instead of a transaction ID. The previous API was designed with the idea that per-transaction lock holding would be important for subtransactions, but now that we have subtransactions we know that this is unwanted. While at it, add an 'isTempObject' parameter to LockAcquire to indicate whether the lock is being taken on a temp table. This is not used just yet, but will be needed shortly for two-phase commit.	2005-06-14 22:15:33 +00:00
Tom Lane	a2fb7b8a1f	Adjust lo_open() so that specifying INV_READ without INV_WRITE creates a descriptor that uses the current transaction snapshot, rather than SnapshotNow as it did before (and still does if INV_WRITE is set). This means pg_dump will now dump a consistent snapshot of large object contents, as it never could do before. Also, add a lo_create() function that is similar to lo_creat() but allows the desired OID of the large object to be specified. This will simplify pg_restore considerably (but I'll fix that in a separate commit).	2005-06-13 02:26:53 +00:00
Tom Lane	4c8495a1f2	Remove the mostly-stubbed-out-anyway support routines for WAL UNDO. That code is never going to be used in the foreseeable future, and where it's more than a stub it's making the redo routines harder to read.	2005-06-06 17:01:25 +00:00
Tom Lane	140b078d2a	Improve LockAcquire API per my recent proposal. All error conditions are now reported via elog, eliminating the need to test the result code at most call sites. Make it possible for the caller to distinguish a freshly acquired lock from one already held in the current transaction. Use that capability to avoid redundant AcceptInvalidationMessages() calls in LockRelation().	2005-05-29 22:45:02 +00:00
Bruce Momjian	b492c3accc	Add parentheses to macros when args are used in computations. Without them, the execution behavior could be unexpected.	2005-05-25 21:40:43 +00:00
Bruce Momjian	6dc7760ac3	Add support for wal_fsync_writethrough for Darwin, and restructure the code to better handle writethrough. Chris Campbell	2005-05-20 14:53:26 +00:00
Tom Lane	191b13aaca	Factor out lock cleanup code that is needed in several places in lock.c. Also, remove the rather useless return value of LockReleaseAll. Change response to detection of corruption in the shared lock tables to PANIC, since that is the only way of cleaning up fully. Originally an idea of Heikki Linnakangas, variously hacked on by Alvaro Herrera and Tom Lane.	2005-05-19 23:30:18 +00:00
Tom Lane	ee3b71f6bc	Split the shared-memory array of PGPROC pointers out of the sinval communication structure, and make it its own module with its own lock. This should reduce contention at least a little, and it definitely makes the code seem cleaner. Per my recent proposal.	2005-05-19 21:35:48 +00:00
Tom Lane	93b2477278	Use the standard lock manager to establish priority order when there is contention for a tuple-level lock. This solves the problem of a would-be exclusive locker being starved out by an indefinite succession of share-lockers. Per recent discussion with Alvaro.	2005-04-30 19:03:33 +00:00
Tom Lane	3a694bb0a1	Restructure LOCKTAG as per discussions of a couple months ago. Essentially, we shoehorn in a lockable-object-type field by taking a byte away from the lockmethodid, which can surely fit in one byte instead of two. This allows less artificial definitions of all the other fields of LOCKTAG; we can get rid of the special pg_xactlock pseudo-relation, and also support locks on individual tuples and general database objects (including shared objects). None of those possibilities are actually exploited just yet, however. I removed pg_xactlock from pg_class, but did not force initdb for that change. At this point, relkind 's' (SPECIAL) is unused and could be removed entirely.	2005-04-29 22:28:24 +00:00
Tom Lane	bedb78d386	Implement sharable row-level locks, and use them for foreign key references to eliminate unnecessary deadlocks. This commit adds SELECT ... FOR SHARE paralleling SELECT ... FOR UPDATE. The implementation uses a new SLRU data structure (managed much like pg_subtrans) to represent multiple- transaction-ID sets. When more than one transaction is holding a shared lock on a particular row, we create a MultiXactId representing that set of transactions and store its ID in the row's XMAX. This scheme allows an effectively unlimited number of row locks, just as we did before, while not costing any extra overhead except when a shared lock actually has to be shared. Still TODO: use the regular lock manager to control the grant order when multiple backends are waiting for a row lock. Alvaro Herrera and Tom Lane.	2005-04-28 21:47:18 +00:00
Tom Lane	94e03330cb	Create a routine PageIndexMultiDelete() that replaces a loop around PageIndexTupleDelete() with a single pass of compactification --- logic mostly lifted from PageRepairFragmentation. I noticed while profiling that a VACUUM that's cleaning up a whole lot of deleted tuples would spend as much as a third of its CPU time in PageIndexTupleDelete; not too surprising considering the loop method was roughly O(N^2) in the number of tuples involved.	2005-03-22 06:17:03 +00:00
Tom Lane	354049c709	Remove unnecessary calls of FlushRelationBuffers: there is no need to write out data that we are about to tell the filesystem to drop. smgr_internal_unlink already had a DropRelFileNodeBuffers call to get rid of dead buffers without a write after it's no longer possible to roll back the deleting transaction. Adding a similar call in smgrtruncate simplifies callers and makes the overall division of labor clearer. This patch removes the former behavior that VACUUM would write all dirty buffers of a relation unconditionally.	2005-03-20 22:00:54 +00:00
Tom Lane	91728fa26c	Add temp_buffers GUC variable to allow users to determine the size of the local buffer arena for temporary table access.	2005-03-19 23:27:11 +00:00
Tom Lane	88164799ce	Need to reset local buffer pin counts, not only shared buffer pins, before we attempt any file deletions in ShutdownPostgres. Per Tatsuo.	2005-03-18 16:16:09 +00:00
Bruce Momjian	609e32b929	Add spinlock support for Itanium processor with Intel compiler. Vikram Kalsi	2005-03-10 21:41:01 +00:00
Tom Lane	5d5087363d	Replace the BufMgrLock with separate locks on the lookup hashtable and the freelist, plus per-buffer spinlocks that protect access to individual shared buffer headers. This requires abandoning a global freelist (since the freelist is a global contention point), which shoots down ARC and 2Q as well as plain LRU management. Adopt a clock sweep algorithm instead. Preliminary results show substantial improvement in multi-backend situations.	2005-03-04 20:21:07 +00:00
Tom Lane	cc4f58f4cd	Ensure that all details of the ARC algorithm are hidden within freelist.c. This refactoring does not change any algorithms or data structures, just remove visibility of the ARC datastructures from other source files.	2005-02-03 23:29:19 +00:00
Tom Lane	fc299179df	Separate the functions of relcache entry flush and smgr cache entry flush so that we can get the size of a shared inval message back down to what it was in 7.4 (and simplify the logic too). Phase 2 of fixing the 'SMgrRelation hashtable corrupted' problem.	2005-01-10 21:57:19 +00:00
Tom Lane	0ce4d56924	Phase 1 of fix for 'SMgrRelation hashtable corrupted' problem. This is the minimum required fix. I want to look next at taking advantage of it by simplifying the message semantics in the shared inval message queue, but that part can be held over for 8.1 if it turns out too ugly.	2005-01-10 20:02:24 +00:00
PostgreSQL Daemon	2ff501590b	Tag appropriate files for rc3 Also performed an initial run through of upgrading our Copyright date to extend to 2005 ... first run here was very simple ... change everything where: grep 1996-2004 && the word 'Copyright' ... scanned through the generated list with 'less' first, and after, to make sure that I only picked up the right entries ...	2004-12-31 22:04:05 +00:00
Tom Lane	eee5abce46	Refactor EXEC_BACKEND code so that postmaster child processes reattach to shared memory as soon as possible, ie, right after read_backend_variables. The effective difference from the original code is that this happens before instead of after read_nondefault_variables(), which loads GUC information and is apparently capable of expanding the backend's memory allocation more than you'd think it should. This should fix the failure-to-attach-to-shared-memory reports we've been seeing on Windows. Also clean up a few bits of unnecessarily grotty EXEC_BACKEND code.	2004-12-29 21:36:09 +00:00
Tom Lane	8f6278d907	Put in place some defenses against being fooled by accidental match of shared memory segment ID. If we can't access the existing shmem segment, it must not be relevant to our data directory. If we can access it, then attach to it and check for an actual match to the data directory. This should avoid some cases of failure-to-restart-after-boot without introducing any significant risk of failing to detect a still-running old backend.	2004-11-09 21:30:18 +00:00
Tom Lane	fdd13f1568	Give the ResourceOwner mechanism full responsibility for releasing buffer pins at end of transaction, and reduce AtEOXact_Buffers to an Assert cross-check that this was done correctly. When not USE_ASSERT_CHECKING, AtEOXact_Buffers is a complete no-op. This gets rid of an O(NBuffers) bottleneck during transaction commit/abort, which recent testing has shown becomes significant above a few tens of thousands of shared buffers.	2004-10-16 18:57:26 +00:00
Tom Lane	1c2de47746	Remove BufferLocks[] array in favor of a single pointer to the buffer (if any) currently waited for by LockBufferForCleanup(), which is all that we were using it for anymore. Saves some space and eliminates proportional-to-NBuffers slowdown in UnlockBuffers().	2004-10-16 18:05:07 +00:00
Tom Lane	9ffc8ed58b	Repair possible failure to update hint bits back to disk, per http://archives.postgresql.org/pgsql-hackers/2004-10/msg00464.php. This fix is intended to be permanent: it moves the responsibility for calling SetBufferCommitInfoNeedsSave() into the tqual.c routines, eliminating the requirement for callers to test whether t_infomask changed. Also, tighten validity checking on buffer IDs in bufmgr.c --- several routines were paranoid about out-of-range shared buffer numbers but not about out-of-range local ones, which seems a tad pointless.	2004-10-15 22:40:29 +00:00
Neil Conway	f629583f94	Document what the "rep; nop" x86 assembler sequence is actually equivalent to, and what it is intended to do.	2004-10-06 23:41:59 +00:00
Tom Lane	0fb3152ea9	Minor adjustments to improve the accuracy of our computation of required shared memory size.	2004-09-29 15:15:56 +00:00
Tom Lane	3a246cc285	Arrange to preallocate all required space for the buffer and FSM hash tables in shared memory. This ensures that overflow of the lock table creates no long-lasting problems. Per discussion with Merlin Moncure.	2004-09-28 20:46:37 +00:00
Tom Lane	682598139e	Get rid of /*-inside-comment warning. My fault.	2004-09-24 01:48:43 +00:00
Tom Lane	409b38f514	Fix TAS assembly stuff for Solaris/386. (I'm not in a position to actually test this, but it couldn't be broken any worse than it was...)	2004-09-24 00:21:32 +00:00
Tom Lane	8f9f198603	Restructure subtransaction handling to reduce resource consumption, as per recent discussions. Invent SubTransactionIds that are managed like CommandIds (ie, counter is reset at start of each top transaction), and use these instead of TransactionIds to keep track of subtransaction status in those modules that need it. This means that a subtransaction does not need an XID unless it actually inserts/modifies rows in the database. Accordingly, don't assign it an XID nor take a lock on the XID until it tries to do that. This saves a lot of overhead for subtransactions that are only used for error recovery (eg plpgsql exceptions). Also, arrange to release a subtransaction's XID lock as soon as the subtransaction exits, in both the commit and abort cases. This avoids holding many unique locks after a long series of subtransactions. The price is some additional overhead in XactLockTableWait, but that seems acceptable. Finally, restructure the state machine in xact.c to have a more orthogonal set of states for subtransactions.	2004-09-16 16:58:44 +00:00
Tom Lane	5042985fb4	Add s_lock support for HPUX on IA64, per Shinji Teragaito.	2004-09-02 17:10:58 +00:00
Tom Lane	17364edce6	slock_t must be int not char for MIPS. 7.4 got this right, but the info was apparently mistranscribed in s_lock code rearrangement.	2004-08-30 22:49:07 +00:00
Bruce Momjian	b6b71b85bc	Pgindent run for 8.0.	2004-08-29 05:07:03 +00:00
Bruce Momjian	da9a8649d8	Update copyright to 2004.	2004-08-29 04:13:13 +00:00
Tom Lane	1785acebf2	Introduce local hash table for lock state, as per recent proposal. PROCLOCK structs in shared memory now have only a bitmask for held locks, rather than counts (making them 40 bytes smaller, which is a good thing). Multiple locks within a transaction are counted in the local hash table instead, and we have provision for tracking which ResourceOwner each count belongs to. Solves recently reported problem with memory leakage within long transactions.	2004-08-27 17:07:42 +00:00
Tom Lane	51d7e25651	Improve some comments.	2004-08-26 17:22:28 +00:00
Tom Lane	4dbb880d3c	Rearrange pg_subtrans handling as per recent discussion. pg_subtrans updates are no longer WAL-logged nor even fsync'd; we do not need to, since after a crash no old pg_subtrans data is needed again. We truncate pg_subtrans to RecentGlobalXmin at each checkpoint. slru.c's API is refactored a little bit to separate out the necessary decisions.	2004-08-23 23:22:45 +00:00
Tom Lane	3fdf649f4f	Fix failure to guarantee that a checkpoint will write out pg_clog updates for transaction commits that occurred just before the checkpoint. This is an EXTREMELY serious bug --- kudos to Satoshi Okada for creating a reproducible test case to prove its existence.	2004-08-11 04:07:16 +00:00
Tom Lane	fcbc438727	Label CVS tip as 8.0devel instead of 7.5devel. Adjust various comments and documentation to reference 8.0 instead of 7.5.	2004-08-04 21:34:35 +00:00
Tom Lane	efcaf1e868	Some mop-up work for savepoints (nested transactions). Store a small number of active subtransaction XIDs in each backend's PGPROC entry, and use this to avoid expensive probes into pg_subtrans during TransactionIdIsInProgress. Extend EOXactCallback API to allow add-on modules to get control at subxact start/end. (This is deliberately not compatible with the former API, since any uses of that API probably need manual review anyway.) Add basic reference documentation for SAVEPOINT and related commands. Minor other cleanups to check off some of the open issues for subtransactions. Alvaro Herrera and Tom Lane.	2004-08-01 17:32:22 +00:00
Tom Lane	1bf3d61504	Fix subtransaction behavior for large objects, temp namespace, files, password/group files. Also allow read-only subtransactions of a read-write parent, but not vice versa. These are the reasonably noncontroversial parts of Alvaro's recent mop-up patch, plus further work on large objects to minimize use of the TopTransactionResourceOwner.	2004-07-28 14:23:31 +00:00
Tom Lane	2042b3428d	Invent WAL timelines, as per recent discussion, to make point-in-time recovery more manageable. Also, undo recent change to add FILE_HEADER and WASTED_SPACE records to XLOG; instead make the XLOG page header variable-size with extra fields in the first page of an XLOG file. This should fix the boundary-case bugs observed by Mark Kirkwood. initdb forced due to change of XLOG representation.	2004-07-21 22:31:26 +00:00
Bruce Momjian	7a55ba7615	Back out pg_autovacuum commit after cvs clean failure causes commit.	2004-07-21 20:34:50 +00:00
Bruce Momjian	8dec0c1bf2	Please find enclosed a patch that matches the PL/Perl documentation (fairly closely, I hope) to the current PL/Perl implementation. David Fetter	2004-07-21 20:23:05 +00:00
Tom Lane	66ec2db728	XLOG file archiving and point-in-time recovery. There are still some loose ends and a glaring lack of documentation, but it basically works. Simon Riggs with some editorialization by Tom Lane.	2004-07-19 02:47:16 +00:00
Tom Lane	fe548629c5	Invent ResourceOwner mechanism as per my recent proposal, and use it to keep track of portal-related resources separately from transaction-related resources. This allows cursors to work in a somewhat sane fashion with nested transactions. For now, cursor behavior is non-subtransactional, that is a cursor's state does not roll back if you abort a subtransaction that fetched from the cursor. We might want to change that later.	2004-07-17 03:32:14 +00:00
Tom Lane	573a71a5da	Nested transactions. There is still much left to do, especially on the performance front, but with feature freeze upon us I think it's time to drive a stake in the ground and say that this will be in 7.5. Alvaro Herrera, with some help from Tom Lane.	2004-07-01 00:52:04 +00:00
Tom Lane	1098677482	Adjust TAS assembly as per recent discussions: use "+m"(*lock) everywhere to reference the spinlock variable, and specify "memory" as a clobber operand to be sure gcc does not try to keep shared-memory values in registers across a spinlock acquisition. Also tighten the S/390 asm sequence, which was apparently written with only minimal study of the gcc asm documentation. I have personally tested i386, ia64, ppc, hppa, and s390 variants --- there is some small chance that I broke the others, but I doubt it.	2004-06-19 23:02:32 +00:00
Tom Lane	2467394ee1	Tablespaces. Alternate database locations are dead, long live tablespaces. There are various things left to do: contrib dbsize and oid2name modules need work, and so does the documentation. Also someone should think about COMMENT ON TABLESPACE and maybe RENAME TABLESPACE. Also initlocation is dead, it just doesn't know it yet. Gavin Sherry and Tom Lane.	2004-06-18 06:14:31 +00:00
Tom Lane	e6cba71503	Add some code to Assert that when we release pin on a buffer, we are not holding the buffer's cntx_lock or io_in_progress_lock. A recent report from Litao Wu makes me wonder whether it is ever possible for us to drop a buffer and forget to release its cntx_lock. The Assert does not fire in the regression tests, but that proves little ...	2004-06-11 16:43:24 +00:00
Tom Lane	24a1e20f14	Adjust PageGetMaxOffsetNumber to ensure sane behavior on uninitialized pages, even when the macro's result is stored into an unsigned variable.	2004-06-05 17:42:46 +00:00
Bruce Momjian	e8d9d68ca4	Per previous discussions, here are two functions to send INT and TERM (cancel and terminate) signals to other backends. They permit only INT and TERM, and permit sending only to postgresql backends. Magnus Hagander	2004-06-02 21:29:29 +00:00
Tom Lane	2095206de1	Adjust btree index build to not use shared buffers, thereby avoiding the locking conflict against concurrent CHECKPOINT that was discussed a few weeks ago. Also, if not using WAL archiving (which is always true ATM but won't be if PITR makes it into this release), there's no need to WAL-log the index build process; it's sufficient to force-fsync the completed index before commit. This seems to gain about a factor of 2 in my tests, which is consistent with writing half as much data. I did not try it with WAL on a separate drive though --- probably the gain would be a lot less in that scenario.	2004-06-02 17:28:18 +00:00
Tom Lane	91d20ff7aa	Additional mop-up for sync-to-fsync changes: avoid issuing fsyncs for temp tables, and avoid WAL-logging truncations of temp tables. Do issue fsync on truncated files (not sure this is necessary but it seems like a good idea).	2004-05-31 20:31:33 +00:00
Tom Lane	e674707968	Minor code rationalization: FlushRelationBuffers just returns void, rather than an error code, and does elog(ERROR) not elog(WARNING) when it detects a problem. All callers were simply elog(ERROR)'ing on failure return anyway, and I find it hard to envision a caller that would not, so we may as well simplify the callers and produce the more useful error message directly.	2004-05-31 19:24:05 +00:00
Tom Lane	9b178555fc	Per previous discussions, get rid of use of sync(2) in favor of explicitly fsync'ing every (non-temp) file we have written since the last checkpoint. In the vast majority of cases, the burden of the fsyncs should fall on the bgwriter process not on backends. (To this end, we assume that an fsync issued by the bgwriter will force out blocks written to the same file by other processes using other file descriptors. Anyone have a problem with that?) This makes the world safe for WIN32, which ain't even got sync(2), and really makes the world safe for Unixen as well, because sync(2) never had the semantics we need: it offers no way to wait for the requested I/O to finish. Along the way, fix a bug I recently introduced in xlog recovery: file truncation replay failed to clear bufmgr buffers for the dropped blocks, which could result in 'PANIC: heap_delete_redo: no block' later on in xlog replay.	2004-05-31 03:48:10 +00:00
Tom Lane	076a055acf	Separate out bgwriter code into a logically separate module, rather than being random pieces of other files. Give bgwriter responsibility for all checkpoint activity (other than a post-recovery checkpoint); so this child process absorbs the functionality of the former transient checkpoint and shutdown subprocesses. While at it, create an actual include file for postmaster.c, which for some reason never had its own file before.	2004-05-29 22:48:23 +00:00
Tom Lane	1a321f26d8	Code review for EXEC_BACKEND changes. Reduce the number of #ifdefs by about a third, make it work on non-Windows platforms again. (But perhaps I broke the WIN32 code, since I have no way to test that.) Fold all the paths that fork postmaster child processes to go through the single routine SubPostmasterMain, which takes care of resurrecting the state that would normally be inherited from the postmaster (including GUC variables). Clean up some places where there's no particularly good reason for the EXEC and non-EXEC cases to work differently. Take care of one or two FIXMEs that remained in the code.	2004-05-28 05:13:32 +00:00
Tom Lane	ebfc56d3fb	Handle impending sinval queue overflow by means of a separate signal (SIGUSR1, which we have not been using recently) instead of piggybacking on SIGUSR2-driven NOTIFY processing. This has several good results: the processing needed to drain the sinval queue is a lot less than the processing needed to answer a NOTIFY; there's less contention since we don't have a bunch of backends all trying to acquire exclusive lock on pg_listener; backends that are sitting inside a transaction block can still drain the queue, whereas NOTIFY processing can't run if there's an open transaction block. (This last is a fairly serious issue that I don't think we ever recognized before --- with clients like JDBC that tend to sit with open transaction blocks, the sinval queue draining mechanism never really worked as intended, probably resulting in a lot of useless cache-reset overhead.) This is the last of several proposed changes in response to Philip Warner's recent report of sinval-induced performance problems.	2004-05-23 03:50:45 +00:00
Tom Lane	4af3421161	Get rid of rd_nblocks field in relcache entries. Turns out this was costing us lots more to maintain than it was worth. On shared tables it was of exactly zero benefit because we couldn't trust it to be up to date. On temp tables it sometimes saved an lseek, but not often enough to be worth getting excited about. And the real problem was that we forced an lseek on every relcache flush in order to update the field. So all in all it seems best to lose the complexity.	2004-05-08 19:09:25 +00:00
Neil Conway	0370951347	Tiny assorted fixes: correct a typo in a comment in vacuumlazy.c, remove some unused #include directives from bufmgr.c, and clarify comments in bufmgr.h and buf.h	2004-04-25 23:50:58 +00:00
Neil Conway	139abc2896	Make LocalRefCount and PrivateRefCount arrays of int32, rather than long. This saves a small amount of per-backend memory for LP64 machines.	2004-04-22 07:21:55 +00:00
Tom Lane	95a03e9cdf	Another round of code cleanup on bufmgr. Use BM_VALID flag to keep track of whether we have successfully read data into a buffer; this makes the error behavior a bit more transparent (IMHO anyway), and also makes it work correctly for local buffers which don't use Start/TerminateBufferIO. Collapse three separate functions for writing a shared buffer into one. This overlaps a bit with cleanups that Neil proposed awhile back, but seems not to have committed yet.	2004-04-21 18:06:30 +00:00
Tom Lane	011c3e62e7	Code review for ARC patch. Eliminate static variables, improve handling of VACUUM cases so that VACUUM requests don't affect the ARC state at all, avoid corner case where BufferSync would uselessly rewrite a buffer that no longer contains the page that was to be flushed. Make some minor other cleanups in and around the bufmgr as well, such as moving PinBuffer and UnpinBuffer into bufmgr.c where they really belong.	2004-04-19 23:27:17 +00:00
Bruce Momjian	31338352bd	* Most changes are to fix warnings issued when compiling win32 * removed a few redundant defines * get_user_name safe under win32 * rationalized pipe read EOF for win32 (UPDATED PATCH USED) * changed all backend instances of sleep() to pg_usleep - except for the SLEEP_ON_ASSERT in assert.c, as it would exceed a 32-bit long [Note to patcher: If a SLEEP_ON_ASSERT of 2000 seconds is acceptable, please replace with pg_usleep(2000000000L)] I added a comment to that part of the code: /* * It would be nice to use pg_usleep() here, but only does 2000 sec * or 33 minutes, which seems too short. */ sleep(1000000); Claudio Natoli	2004-04-19 17:42:59 +00:00
Bruce Momjian	c672aa823b	For application to HEAD, following community review. * Changes incorrect CYGWIN defines to __CYGWIN__ * Some localtime returns NULL checks (when unchecked cause SEGVs under Win32 regression tests) * Rationalized CreateSharedMemoryAndSemaphores and AttachSharedMemoryAndSemaphores (Bruce, I finally remembered to do it); requires attention. Claudio Natoli	2004-02-25 19:41:23 +00:00
Tom Lane	7a57a67278	Replace opendir/closedir calls throughout the backend with AllocateDir and FreeDir routines modeled on the existing AllocateFile/FreeFile. Like the latter, these routines will avoid failing on EMFILE/ENFILE conditions whenever possible, and will prevent leakage of directory descriptors if an elog() occurs while one is open. Also, reduce PANIC to ERROR in MoveOfflineLogs() --- this is not critical code and there is no reason to force a DB restart on failure. All per recent trouble report from Olivier Hubaut.	2004-02-23 23:03:10 +00:00
Tom Lane	f83356c7f5	Do a direct probe during postmaster startup to determine the maximum number of openable files and the number already opened. This eliminates depending on sysconf(_SC_OPEN_MAX), and allows much saner behavior on platforms where open-file slots are used up by semaphores.	2004-02-23 20:45:59 +00:00
Jan Wieck	fc65a3e1fd	Fixed bug where FlushRelationBuffers() did call StrategyInvalidateBuffer() for already empty buffers because their buffer tag was not cleared out when the buffers have been invalidated before. Also removed the misnamed BM_FREE bufhdr flag and replaced the checks, which effectively ask if the buffer is unpinned, with checks against the refcount field. Jan	2004-02-12 15:06:56 +00:00
Tom Lane	c3c09be34b	Commit the reasonably uncontroversial parts of J.R. Nield's PITR patch, to wit: Add a header record to each WAL segment file so that it can be reliably identified. Avoid splitting WAL records across segment files (this is not strictly necessary, but makes it simpler to incorporate the header records). Make WAL entries for file creation, deletion, and truncation (as foreseen but never implemented by Vadim). Also, add support for making XLOG_SEG_SIZE configurable at compile time, similarly to BLCKSZ. Fix a couple bugs I introduced in WAL replay during recent smgr API changes. initdb is forced due to changes in pg_control contents.	2004-02-11 22:55:26 +00:00
Tom Lane	87bd956385	Restructure smgr API as per recent proposal. smgr no longer depends on the relcache, and so the notion of 'blind write' is gone. This should improve efficiency in bgwriter and background checkpoint processes. Internal restructuring in md.c to remove the not-very-useful array of MdfdVec objects --- might as well just use pointers. Also remove the long-dead 'persistent main memory' storage manager (mm.c), since it seems quite unlikely to ever get resurrected.	2004-02-10 01:55:27 +00:00
Jan Wieck	8d09e25693	Backing out the background writer sync() option. Jan	2004-02-04 01:24:53 +00:00
Tom Lane	c77f363384	Ensure that close() and fclose() are checked for errors, at least in cases involving writes. Per recent discussion about the possibility of close-time failures on some filesystems. There is a TODO item for this, too.	2004-01-26 22:35:32 +00:00
Jan Wieck	d77b63b17c	Added GUC variable bgwriter_flush_method controlling the action done by the background writer between writing dirty blocks and napping: none (default), no action; sync, bgwriter calls smgrsync() causing a sync(2). A global sync() is only good on dedicated database servers, so more flush methods should be added in the future. Jan	2004-01-24 20:00:46 +00:00
Jan Wieck	dfdd59e918	Adjusted calculation of shared memory requirements to new ARC buffer replacement strategy. Jan	2004-01-15 16:14:26 +00:00
Tom Lane	037e2fcf8f	Must test for __hppa__ as well as __hppa to make linux-hppa happy.	2004-01-03 05:47:44 +00:00
Tom Lane	f8eed65dfb	Improve spinlock code for recent x86 processors: insert a PAUSE instruction in the s_lock() wait loop, and use test before test-and-set in TAS() macro to avoid unnecessary bus traffic. Patch from Manfred Spraul, reworked a bit by Tom.	2003-12-27 20:58:58 +00:00
Tom Lane	afb09b5a31	Use inlined TAS() on PA-RISC, if we are compiling with gcc. Patch inspired by original submission from ViSolve.	2003-12-23 22:15:07 +00:00
Tom Lane	9adaf64da3	Mop-up for HAS_TEST_AND_SET refactoring. Un-break two or three platforms that were broken, try to make layout of s_lock.h entries consistent, use HAVE_SPINLOCKS in preference to HAS_TEST_AND_SET everywhere outside s_lock.h itself.	2003-12-23 18:13:17 +00:00
Bruce Momjian	caf6e9d2dd	Have configure --without-spinlocks actually not use spinlock code, even if supported by the cpu.	2003-12-23 03:52:10 +00:00
Bruce Momjian	69f2e9b0fc	Move slock_t typedefs into s_lock.h from include/port files for centralization and easier maintenance.	2003-12-23 03:31:30 +00:00
Bruce Momjian	887b5a7be0	Remove NEED_I386_TAS_ASM and just test for compiler defines.	2003-12-23 00:32:06 +00:00
Bruce Momjian	b731d04101	Test for __alpha and __alpha__.	2003-12-22 23:36:38 +00:00
Bruce Momjian	d75b2ec4eb	This patch is the next step towards (re)allowing fork/exec. Claudio Natoli	2003-12-20 17:31:21 +00:00
Neil Conway	fef0c8345a	I posted some bufmgr cleanup a few weeks ago, but it conflicted with some concurrent changes Jan was making to the bufmgr. Here's an updated version of the patch -- it should apply cleanly to CVS HEAD and passes the regression tests. This patch makes the following changes: - remove the UnlockAndReleaseBuffer() and UnlockAndWriteBuffer() macros, and replace uses of them with calls to the appropriate functions. - remove a bunch of #ifdef BMTRACE code: it is ugly & broken (i.e. it doesn't compile) - make BufferReplace() return a bool, not an int - cleanup some logic in bufmgr.c; should be functionally equivalent to the previous code, just cleaner now - remove the BM_PRIVATE flag as it is unused - improve a few comments, etc.	2003-12-14 00:34:47 +00:00
Peter Eisentraut	2afacfc403	This patch properly sets the prototype for the on_shmem_exit and on_proc_exit functions, and adjust all other related code to use the proper types too. by Kurt Roeckx	2003-12-12 18:45:10 +00:00
Tom Lane	f288877f10	Fix thinko in comment.	2003-12-11 21:21:55 +00:00
Tom Lane	5e2b99db95	Avoid assuming that type key_t is 32 bits, since it reportedly isn't on 64-bit Solaris. Use a non-system-dependent datatype for UsedShmemSegID, namely unsigned long (which we were already assuming could hold a shmem key anyway, cf RecordSharedMemoryInLockFile).	2003-12-01 22:15:38 +00:00
Bruce Momjian	e7ca867485	Try to reduce confusion about what is a lock method identifier, a lock method control structure, or a table of control structures. . Use type LOCKMASK where an int is not a counter. . Get rid of INVALID_TABLEID, use INVALID_LOCKMETHOD instead. . Use INVALID_LOCKMETHOD instead of (LOCKMETHOD) NULL, because LOCKMETHOD is not a pointer. . Define and use macro LockMethodIsValid. . Rename LOCKMETHOD to LOCKMETHODID. . Remove global variable LongTermTableId in lmgr.c, because it is never used. . Make LockTableId static in lmgr.c, because it is used nowhere else. Why not remove it and use DEFAULT_LOCKMETHOD? . Rename the lock method control structure from LOCKMETHODTABLE to LockMethodData. Introduce a pointer type named LockMethod. . Remove elog(FATAL) after InitLockTable() call in CreateSharedMemoryAndSemaphores(), because if something goes wrong, there is elog(FATAL) in LockMethodTableInit(), and if this doesn't help, an elog(ERROR) in InitLockTable() is promoted to FATAL. . Make InitLockTable() void, because its only caller does not use its return value any more. . Rename variables in lock.c to avoid statements like LockMethodTable[NumLockMethods] = lockMethodTable; lockMethodTable = LockMethodTable[lockmethod]; . Change LOCKMETHODID type to uint16 to fit into struct LOCKTAG. . Remove static variables BITS_OFF and BITS_ON from lock.c, because I agree to this doubt: * XXX is a fetch from a static array really faster than a shift? . Define and use macros LOCKBIT_ON/OFF. Manfred Koizar	2003-12-01 21:59:25 +00:00
PostgreSQL Daemon	55b113257c	make sure the $Id tags are converted to $PostgreSQL as well ...	2003-11-29 22:41:33 +00:00
Jan Wieck	cfeca62148	Background writer process. This first part of the background writer does no syncing at all. Its only purpose is to keep the LRU heads clean so that regular backends seldom to never have to call write(). Jan	2003-11-19 15:55:08 +00:00
Jan Wieck	6b86d62b00	2nd try for the ARC strategy. I added a couple more Assertions while tracking down the exact cause of the former bug. All 93 regression tests pass now. Jan	2003-11-13 14:57:15 +00:00
Jan Wieck	923e994d79	ARC strategy backed out ... sorry Jan	2003-11-13 05:34:58 +00:00
Jan Wieck	48adc0b34b	Replacement of the buffer replacement strategy with an ARC algorithm adopted for PostgreSQL. Jan	2003-11-13 00:40:02 +00:00
Tom Lane	c1d62bfd00	Add operator strategy and comparison-value datatype fields to ScanKey. Remove the 'strategy map' code, which was a large amount of mechanism that no longer had any use except reverse-mapping from procedure OID to strategy number. Passing the strategy number to the index AM in the first place is simpler and faster. This is a preliminary step in planned support for cross-datatype index operations. I'm committing it now since the ScanKeyEntryInitialize() API change touches quite a lot of files, and I want to commit those changes before the tree drifts under me.	2003-11-09 21:30:38 +00:00
Tom Lane	f8a769b47a	Cause stats processes to detach from shared memory when started, so that they do not prevent the postmaster from deleting the shmem segment during a post-backend-crash restart cycle. Per recent discussion.	2003-11-07 21:55:50 +00:00
Peter Eisentraut	c119c554ed	Improve message wording for spinlocks-missing compilation error.	2003-11-04 09:53:36 +00:00
Bruce Momjian	9821455425	Rename __arm__/__arm__ to __arm__/__arm, found by Neil Conway	2003-10-10 03:58:57 +00:00
Bruce Momjian	f7fca96366	Fix #error message to mention renamed option --disable-spinlocks.	2003-09-29 04:20:22 +00:00
Bruce Momjian	06e3ec7a54	Implement compiler #error if spinlock code not found, add configure flag to bypass the error, --without-spinlocks.	2003-09-12 16:10:27 +00:00
Tom Lane	7a3693716d	Reimplement hash index locking algorithms, per my recent proposal to pghackers. This fixes the problem recently reported by Markus Kräutner (hash bucket split corrupts the state of scans being done concurrently), and I believe it also fixes all the known problems with deadlocks in hash index operations. Hash indexes are still not really ready for prime time (since they aren't WAL-logged), but this is a step forward.	2003-09-04 22:06:27 +00:00
Tom Lane	ffafacc1f6	Repair potential deadlock created by recent changes to recycle btree index pages: when _bt_getbuf asks the FSM for a free index page, it is possible (and, in some cases, even moderately likely) that the answer will be the same page that _bt_split is trying to split. _bt_getbuf already knew that the returned page might not be free, but it wasn't prepared for the possibility that even trying to lock the page could be problematic. Fix by doing a conditional rather than unconditional grab of the page lock.	2003-08-10 19:48:08 +00:00
Bruce Momjian	46785776c4	Another pgindent run with updated typedefs.	2003-08-08 21:42:59 +00:00
Bruce Momjian	f3c3deb7d0	Update copyrights to 2003.	2003-08-04 02:40:20 +00:00
Bruce Momjian	089003fb46	pgindent run.	2003-08-04 00:43:34 +00:00
Tom Lane	13ac54d1ca	Since HPUX now exists for Itanium, we should decouple the assumption that OS=hpux is the same as CPU=hppa. First steps at doing this. With these patches, we still work on hppa with either gcc or HP's cc. We might work on hpux/itanium with gcc, but I can't test it. Definitely will not work on hpux/itanium with non-gcc compiler, for lack of spinlock code.	2003-08-01 19:12:52 +00:00
Tom Lane	e8db9b26d0	elog mop-up.	2003-07-27 17:10:07 +00:00
Bruce Momjian	9132506477	Add Opteron/Itanium comment.	2003-07-20 04:31:32 +00:00
Bruce Momjian	aa62f7f74a	Add x86_64 support for spinlocks. Jeffrey W. Baker	2003-06-24 23:20:08 +00:00
Bruce Momjian	7cb4278e82	Small patch to link to the proper place in the "runtime" file, and to add the "schemaname" column to the description of the pg_stats view. Greg Sabino Mullane	2003-06-24 23:19:11 +00:00
Bruce Momjian	0abe7431c6	This patch extracts page buffer pooling and the simple least-recently-used strategy from clog.c into slru.c. It doesn't change any visible behaviour and passes all regression tests plus a TruncateCLOG test done manually. Apart from refactoring I made a little change to SlruRecentlyUsed, formerly ClogRecentlyUsed: It now skips incrementing lru_counts, if slotno is already the LRU slot, thus saving a few CPU cycles. To make this work, lru_counts are initialised to 1 in SimpleLruInit. SimpleLru will be used by pg_subtrans (part of the nested transactions project), so the main purpose of this patch is to avoid future code duplication. Manfred Koizar	2003-06-11 22:37:46 +00:00
Peter Eisentraut	1fed74f257	Support for Intel compiler on Linux	2003-06-05 16:07:25 +00:00
Bruce Momjian	5e7a5c9511	Pass shared memory address on command line to exec'ed backend. Allow backends to attach to specified shared memory address.	2003-05-08 14:49:04 +00:00
Bruce Momjian	d9fd7d12f6	Pass shared memory id and socket descriptor number on command line for fork/exec.	2003-05-06 23:34:56 +00:00
Bruce Momjian	a7fd03e1de	Handle clog structure in shared memory in exec() case, for Win32.	2003-05-03 03:52:07 +00:00
Bruce Momjian	a2e038fbee	Back out last commit --- wrong patch.	2003-05-02 21:59:31 +00:00
Bruce Momjian	fb1f7ccec5	Dump/read non-default GUC values for use by exec'ed backends, for Win32.	2003-05-02 21:52:42 +00:00
Tom Lane	4a5f38c4e6	Code review for holdable-cursors patch. Fix error recovery, memory context sloppiness, some other things. Includes Neil's mopup patch of 22-Apr.	2003-04-29 03:21:30 +00:00
Tom Lane	f9ba0a7fe5	Apple's assembler likes the inlined TAS syntax too, so no reason to maintain a separate out-of-line version of PPC tas() anymore. Also fix S_UNLOCK for __powerpc64__ platforms.	2003-04-20 21:54:34 +00:00
Tom Lane	eb5e4c58d1	Tighten up register usage for inline PPC version of tas().	2003-04-04 06:57:39 +00:00
Tom Lane	cd35d601b8	Put the isync where it's supposed to be.	2003-04-04 05:32:30 +00:00
Tom Lane	fd42262836	Add code to apply some simple sanity checks to the header fields of a page when it's read in, per pghackers discussion around 17-Feb. Add a GUC variable zero_damaged_pages that causes the response to be a WARNING followed by zeroing the page, rather than the normal ERROR; this is per Hiroshi's suggestion that there needs to be a way to get at the data in the rest of the table.	2003-03-28 20:17:13 +00:00
Bruce Momjian	54f7338fa1	This patch implements holdable cursors, following the proposal (materialization into a tuple store) discussed on pgsql-hackers earlier. I've updated the documentation and the regression tests. Notes on the implementation: - I needed to change the tuple store API slightly -- it assumes that it won't be used to hold data across transaction boundaries, so the temp files that it uses for on-disk storage are automatically reclaimed at end-of-transaction. I added a flag to tuplestore_begin_heap() to control this behavior. Is changing the tuple store API in this fashion OK? - in order to store executor results in a tuple store, I added a new CommandDest. This works well for the most part, with one exception: the current DestFunction API doesn't provide enough information to allow the Executor to store results into an arbitrary tuple store (where the particular tuple store to use is chosen by the call site of ExecutorRun). To work around this, I've temporarily hacked up a solution that works, but is not ideal: since the receiveTuple DestFunction is passed the portal name, we can use that to look up the Portal data structure for the cursor and then use that to get at the tuple store the Portal is using. This unnecessarily ties the Portal code with the tupleReceiver code, but it works... The proper fix for this is probably to change the DestFunction API -- Tom suggested passing the full QueryDesc to the receiveTuple function. In that case, callers of ExecutorRun could "subclass" QueryDesc to add any additional fields that their particular CommandDest needed to get access to. This approach would work, but I'd like to think about it for a little bit longer before deciding which route to go. In the mean time, the code works fine, so I don't think a fix is urgent. - (semi-related) I added a NO SCROLL keyword to DECLARE CURSOR, and adjusted the behavior of SCROLL in accordance with the discussion on -hackers. - (unrelated) Cleaned up some SGML markup in sql.sgml, copy.sgml Neil Conway	2003-03-27 16:51:29 +00:00
Tom Lane	e4704001ea	This patch fixes a bunch of spelling mistakes in comments throughout the PostgreSQL source code. Neil Conway	2003-03-10 22:28:22 +00:00
Tom Lane	4b6c198a6a	Add code to dump contents of free space map into $PGDATA/global/pg_fsm.cache at database shutdown, and then load it again at database startup. This preserves our hard-won knowledge of free space across restarts (given an orderly shutdown, that is).	2003-03-06 00:04:27 +00:00
Tom Lane	391eb5e5b6	Reimplement free-space-map management as per recent discussions. Adjustable threshold is gone in favor of keeping track of total requested page storage and doling out proportional fractions to each relation (with a minimum amount per relation, and some quantization of the results to avoid thrashing with small changes in page counts). Provide special-case code for indexes so as not to waste space storing useless page free space counts. Restructure internal data storage to be a flat array instead of list-of-chunks; this may cost a little more work in data copying when reorganizing, but allows binary search to be used during lookup_fsm_page_entry().	2003-03-04 21:51:22 +00:00
Bruce Momjian	69c049cef4	Back out LOCKTAG changes by Rod Taylor, pending code review. Sorry.	2003-02-19 23:41:15 +00:00
Bruce Momjian	d0f3a7e9c4	- Modifies LOCKTAG to include a 'classId'. Relations receive a classId of RelOid_pg_class, and transaction locks XactLockTableId. RelId is renamed to objId. - LockObject() and UnlockObject() functions created, and their use sprinkled throughout the code to do descent locking for domains and types. They accept lock modes AccessShare and AccessExclusive, as we only really need a 'read' and 'write' lock at the moment. Most locking cases are held until the end of the transaction. This fixes the cases Tom mentioned earlier in regards to locking with Domains. If the patch is good, I'll work on cleaning up issues with other database objects that have this problem (most of them). Rod Taylor	2003-02-19 04:02:54 +00:00
Bruce Momjian	32cc6cbe23	Rename 'holder' references to 'proclock' for PROCLOCK references, for consistency.	2003-02-18 02:13:24 +00:00
Tom Lane	227a404cf4	Add code to print information about a detected deadlock cycle. The printed data is comparable to what you could read in the pg_locks view, were you fortunate enough to have been looking at it at the right time.	2003-01-16 21:01:45 +00:00
Tom Lane	fadcb01177	TAS code originally written for s390 (32-bit) does not work for s390x (64-bit). Fix it. Per report from Permaine Cheung.	2002-11-22 01:13:16 +00:00
Bruce Momjian	ceb4f5ea9c	> > I'll re-check that with the ppc architecture guy here. > > ... he is now about to write an inlined version that can go into > s_lock.h . I'll send the new patch later on... OK, here it comes: An inlined version of tas(), that works for both, powerpc and powerpc64. The patch is against 7.3b5 and passes the test suite on both architectures. Reinhard Max	2002-11-10 00:33:43 +00:00
Tom Lane	55e4ef138c	Code review for statement_timeout patch. Fix some race conditions between signal handler and enable/disable code, avoid accumulation of timing error due to trying to maintain remaining-time instead of absolute-end-time, disable timeout before commit not after.	2002-10-31 21:34:17 +00:00
Peter Eisentraut	7d970df60e	Add DLLIMPORT declarations required by contrib with asserts enabled.	2002-10-22 20:00:48 +00:00
Peter Eisentraut	de9d7f4bd5	Add DLLIMPORT declarations needed by contrib modules.	2002-10-21 18:57:35 +00:00
Tom Lane	7233aae50b	Fix PPC s_lock operations to work correctly on multi-CPU machines. Need 'isync' during TAS and 'sync' during S_UNLOCK.	2002-09-21 00:14:05 +00:00
Tom Lane	b2735fcd52	Performance improvement for MultiRecordFreeSpace on large relations --- avoid O(N^2) behavior. Problem noted and fixed by Stephen Marshall <smarshall@wsicorp.com>, with some help from Tom Lane.	2002-09-20 19:56:01 +00:00
Bruce Momjian	e50f52a074	pgindent run.	2002-09-04 20:31:48 +00:00
Bruce Momjian	50938576d4	I tried to build PostgreSQL with the following steps and saw backends hang during the regression test. The problem has been reproduced on two machines, but both have the same type of hardware and software. I also tried to recreate the problem on other machines running older versions of AIX, but I couldn't. After looking through the pgsql-hackers mailing list, I focused on the spin lock issue to solve the problem. The easiest, though perhaps not the best, solution is to give up HAS_TEST_AND_SET. This actually works. Another, better solution is to use _check_lock() and _clear_lock() as the spin lock. The important thing here is to define S_UNLOCK() with _clear_lock(). This will solve the so-called "Compiler bug" issue someone wrote about on the mailing list. We have some other APIs, such as cs(), compare_and_swap() and fetch_and_or(), to do test and set on AIX, but none of these solved my problem. I wrote a tiny test program to see whether these AIX APIs have any bugs, but I couldn't find any problem except with compare_and_swap(). It seems that you cannot use compare_and_swap() for this purpose, as it would not work as a spin lock on any SMP machine I tested. I don't know why cs() and fetch_and_or()/fetch_and_and() will not work with PostgreSQL on p690. These worked with my test program on all machines I tested. Tomoyuki Niijima	2002-09-02 04:42:52 +00:00
Bruce Momjian	97ac103289	Remove sys/types.h in files that include postgres.h, and hence c.h, because c.h has sys/types.h.	2002-09-02 02:47:07 +00:00
Tom Lane	c7a165adc6	Code review for HeapTupleHeader changes. Add version number to page headers (overlaying low byte of page size) and add HEAP_HASOID bit to t_infomask, per earlier discussion. Simplify scheme for overlaying fields in tuple header (no need for cmax to live in more than one place). Don't try to clear infomask status bits in tqual.c --- not safe to do it there. Don't try to force output table of a SELECT INTO to have OIDs, either. Get rid of unnecessarily complex three-state scheme for TupleDesc.tdhasoids, which has already caused one recent failure. Improve documentation.	2002-09-02 01:05:06 +00:00
Tom Lane	1bab464eb4	Code review for pg_locks feature. Make shmemoffset of PROCLOCK structs available (else there's no way to interpret the list links). Change pg_locks view to show transaction ID locks separately from ordinary relation locks. Avoid showing N duplicate rows when the same lock is held multiple times (seems unlikely that users care about exact hold count). Improve documentation.	2002-08-31 17:14:28 +00:00
Bruce Momjian	626eca697c	This patch reserves the last superuser_reserved_connections slots for connections by the superuser only. This patch replaces the last patch I sent a couple of days ago. It closes a connection that has not been authorised by a superuser if it would leave fewer than the GUC variable ReservedBackends (superuser_reserved_connections in postgresql.conf) backend process slots free in the SISeg. This differs from the first patch, which only reserved the last ReservedBackends slots in the procState array. This has made the free slot test more expensive due to the use of a lock. After thinking about a comment on the first patch I've also made it a fatal error if the number of reserved slots is not less than the maximum number of connections. Nigel J. Andrews	2002-08-29 21:02:12 +00:00
Bruce Momjian	82119a696e	[ Newest version of patch applied.] This patch is an updated version of the lock listing patch. I've made the following changes: - write documentation - wrap the SRF in a view called 'pg_locks': all user-level access should be done through this view - re-diff against latest CVS One thing I chose not to do is adapt the SRF to use the anonymous composite type code from Joe Conway. I'll probably do that eventually, but I'm not really convinced it's a significantly cleaner way to bootstrap SRF builtins than the method this patch uses (of course, it has other uses...) Neil Conway	2002-08-17 13:04:19 +00:00
Tom Lane	e44beef712	Code review of CLUSTER patch. Clean up problems with relcache getting confused, toasted data getting lost, etc.	2002-08-11 21:17:35 +00:00
Tom Lane	4038e8610c	Remove no-longer-used PageManagerMode enum.	2002-08-06 19:37:10 +00:00
Tom Lane	5df307c778	Restructure local-buffer handling per recent pghackers discussion. The local buffer manager is no longer used for newly-created relations (unless they are TEMP); a new non-TEMP relation goes through the shared bufmgr and thus will participate normally in checkpoints. But TEMP relations use the local buffer manager throughout their lifespan. Also, operations in TEMP relations are not logged in WAL, thus improving performance. Since it's no longer necessary to fsync relations as they move out of the local buffers into shared buffers, quite a lot of smgr.c/md.c/fd.c code is no longer needed and has been removed: there's no concept of a dirty relation anymore in md.c/fd.c, and we never fsync anything but WAL. Still TODO: improve local buffer management algorithms so that it would be reasonable to increase NLocBuffer.	2002-08-06 02:36:35 +00:00
Bruce Momjian	5e6528adf7	* -Remove LockMethodTable.prio field, not used (Bruce)	2002-08-01 05:18:34 +00:00
Bruce Momjian	b75fcf9326	Complete TODO item: * -HOLDER/HOLDERTAB rename to PROCLOCK/PROCLOCKTAG	2002-07-19 00:17:40 +00:00
Bruce Momjian	981d045e88	Complete TODO item: * Merge LockMethodCtl and LockMethodTable into one shared structure (Bruce)	2002-07-18 23:06:20 +00:00
Bruce Momjian	4db8718e84	Add SET statement_timeout capability. Timeout is in ms. A value of zero turns off the timer.	2002-07-13 01:02:14 +00:00
Bruce Momjian	c9a7345217	>the extra level of struct naming for pd_opaque has no obvious >usefulness. > >> [...] should I post a patch that puts pagesize directly into >> PageHeaderData? > >If you're so inclined. Given that pd_opaque is hidden in those macros, >there wouldn't be much of any gain in readability either, so I haven't >worried about changing the declaration. Thanks for the clarification. Here is the patch. Not much gain, but at least it saves the next junior hacker from scratching his head ... Manfred Koizar	2002-07-02 06:18:57 +00:00
Bruce Momjian	33f1687879	There already was a macro PageGetItemId; this is now used in (almost) all places where pd_linp is accessed. Also introduce new macros SizeOfPageHeaderData and BTMaxItemSize. This is a purely cosmetic source code change; no behaviour changed. Manfred Koizar	2002-07-02 05:48:44 +00:00
Bruce Momjian	8864603f3c	Minor code cleanup in bufmgr.c and bufmgr.h, mainly by moving repeated lines of code into internal routines (drop_relfilenode_buffers, release_buffer) and by hiding unused routines (PrintBufferDescs, PrintPinnedBufs) behind #ifdef NOT_USED. Remove AbortBufferIO() declaration from bufmgr.c (already declared in bufmgr.h) Manfred Koizar	2002-07-02 05:47:37 +00:00
Bruce Momjian	d84fe82230	Update copyright to 2002.	2002-06-20 20:29:54 +00:00
Bruce Momjian	6e8a1a6717	WriteBuffer return value: >I'd vote for changing WriteBuffer to >return void, and have it elog() on bad argument. Manfred Koizar	2002-06-15 19:59:59 +00:00
Bruce Momjian	918e864f14	Remove some pre-WAL relics: SharedBufferChanged BufferRelidLastDirtied BufferTagLastDirtied BufferDirtiedByMe Manfred Koizar	2002-06-15 19:55:38 +00:00
Jan Wieck	469cb65aca	Katherine Ward wrote: > Changes to avoid collisions with WIN32 & MFC names... > 1. Renamed: > a. PROC => PGPROC > b. GetUserName() => GetUserNameFromId() > c. GetCurrentTime() => GetCurrentDateTime() > d. IGNORE => IGNORE_DTF in include/utils/datetime.h & utils/adt/datetim > > 2. Added _P to some lex/yacc tokens: > CONST, CHAR, DELETE, FLOAT, GROUP, IN, OUT Jan	2002-06-11 13:40:53 +00:00
Tom Lane	72a3902a66	Create an internal semaphore API that is not tied to SysV semaphores. As proof of concept, provide an alternate implementation based on POSIX semaphores. Also push the SysV shared-memory implementation into a separate file so that it can be replaced conveniently.	2002-05-05 00:03:29 +00:00
Bruce Momjian	171824087c	The patch I sent to -patches a little while ago wasn't applied: it was in the thread "make BufferGetBlockNumber() a macro". Tom objected to the original patch, so I prepared a new one which doesn't change BufferGetBlockNumber() into a macro, it just cleans up some comments and fixes an assertion. The patch is attached. Neil Conway	2002-04-15 23:47:12 +00:00
Tom Lane	26ac217173	Catcaches can now store negative entries as well as positive ones, to speed up repetitive failed searches; per pghackers discussion in late January. inval.c logic substantially simplified, since we can now treat inserts and deletes alike as far as inval events are concerned. Some repair work needed in heap_create_with_catalog, which turns out to have been doing CommandCounterIncrement at a point where the new relation has non-self-consistent catalog entries. With the new inval code, that resulted in assert failures during a relcache entry rebuild.	2002-03-03 17:47:56 +00:00
Tom Lane	7863404417	A bunch of changes aimed at reducing backend startup time... Improve 'pg_internal.init' relcache entry preload mechanism so that it is safe to use for all system catalogs, and arrange to preload a realistic set of system-catalog entries instead of only the three nailed-in-cache indexes that were formerly loaded this way. Fix mechanism for deleting out-of-date pg_internal.init files: this must be synchronized with transaction commit, not just done at random times within transactions. Drive it off relcache invalidation mechanism so that no special-case tests are needed. Cache additional information in relcache entries for indexes (their pg_index tuples and index-operator OIDs) to eliminate repeated lookups. Also cache index opclass info at the per-opclass level to avoid repeated lookups during relcache load. Generalize 'systable scan' utilities originally developed by Hiroshi, move them into genam.c, use in a number of places where there was formerly ugly code for choosing either heap or index scan. In particular this allows simplification of the logic that prevents infinite recursion between syscache and relcache during startup: we can easily switch to heapscans in relcache.c when and where needed to avoid recursion, so IndexScanOK becomes simpler and does not need any expensive initialization. Eliminate useless opening of a heapscan data structure while doing an indexscan (this saves an mdnblocks call and thus at least one kernel call).	2002-02-19 20:11:20 +00:00
Tom Lane	00fc295be0	Make S/390 TAS spell __inline__ the same way as the other eight GCC inline routines do.	2002-01-29 15:44:42 +00:00
Tom Lane	aa00e6134e	Add more sanity-checking to PageAddItem and PageIndexTupleDelete, to prevent spreading of corruption when page header pointers are bad. Merge PageZero into PageInit, since it was never used separately, and remove separate memset calls used at most other PageInit call points. Remove IndexPageCleanup, which wasn't used at all.	2002-01-15 22:14:17 +00:00
Tom Lane	4433eb1dff	Make sure that inlined S_UNLOCK is marked as an update of a 'volatile' object. This should prevent the compiler from reordering loads and stores into or out of a critical section.	2001-12-11 02:58:49 +00:00
Tom Lane	f6ee99a062	Clean up usage-statistics display code (ShowUsage and friends). StatFp is gone, usage messages now go through elog(DEBUG).	2001-11-10 23:51:14 +00:00
Tom Lane	ca7578d454	The extra semaphore that proc.c now allocates for checkpoint processes should be accounted for in the PROC_SEM_MAP_ENTRIES() macro. Otherwise the ports that rely on this macro to size data structures are broken. Mea culpa.	2001-11-06 00:38:26 +00:00
Bruce Momjian	ea08e6cd55	New pgindent run with fixes suggested by Tom. Patch manually reviewed, initdb/regression tests pass.	2001-11-05 17:46:40 +00:00
Tom Lane	fb5f1b2c13	Merge three existing ways of signaling postmaster from child processes, so that only one signal number is used not three. Flags in shared memory tell the reason(s) for the current signal. This method is extensible to handle more signal reasons without chewing up even more signal numbers, but the immediate reason is to keep pg_pwd reloads separate from SIGHUP processing in the postmaster. Also clean up some problems in the postmaster with delayed response to checkpoint status changes --- basically, it wouldn't schedule a checkpoint if it wasn't getting connection requests on a regular basis.	2001-11-04 19:55:31 +00:00
Bruce Momjian	6783b2372e	Another pgindent run. Fixes enum indenting, and improves #endif spacing. Also adds space for one-line comments.	2001-10-28 06:26:15 +00:00
Bruce Momjian	b81844b173	pgindent run on all C files. Java run to follow. initdb/regression tests pass.	2001-10-25 05:50:21 +00:00
Tom Lane	8a52b893b3	Further cleanup of dynahash.c API, in pursuit of portability and readability. Bizarre '(long *) TRUE' return convention is gone, in favor of just raising an error internally in dynahash.c when we detect hashtable corruption. HashTableWalk is gone, in favor of using hash_seq_search directly, since it had no hope of working with non-LONGALIGNable datatypes. Simplify some other code that was made undesirably grotty by proximity to HashTableWalk.	2001-10-05 17:28:13 +00:00
Tom Lane	5999e78fc4	Another round of cleanups for dynahash.c (maybe it's finally clean of portability issues). Caller-visible data structures are now allocated on MAXALIGN boundaries, allowing safe use of datatypes wider than 'long'. Rejigger hash_create API so that caller specifies size of key and total size of entry, not size of key and size of rest of entry. This simplifies life considerably since each number is just a sizeof(), and padding issues etc. are taken care of automatically.	2001-10-01 05:36:17 +00:00
Tom Lane	f9f258281e	Create a GUC parameter max_files_per_process that is a configurable upper limit on what we will believe from sysconf(_SC_OPEN_MAX). The default value is 1000, so that under ordinary conditions it won't affect the behavior. But on platforms where the kernel promises far more than it can deliver, this can be used to prevent running out of file descriptors. See numerous past discussions, eg, pgsql-hackers around 23-Dec-2000.	2001-09-30 18:57:45 +00:00
Bruce Momjian	0386ccfed1	Back out change. Too many places to change too close to beta: * HOLDER/HOLDERTAB rename to PROCLOCKLINK/PROCLOCKLINKTAG (Bruce) Will return later.	2001-09-30 00:45:48 +00:00
Bruce Momjian	f738747494	Do this TODO item: * HOLDER/HOLDERTAB rename to PROCLOCK/PROCLOCKTAG (Tom) Didn't use PROCLOCKLINK because it made PROCLOCKLINKTAG too long.	2001-09-29 21:35:14 +00:00
Tom Lane	499abb0c0f	Implement new 'lightweight lock manager' that's intermediate between existing lock manager and spinlocks: it understands exclusive vs shared lock but has few other fancy features. Replace most uses of spinlocks with lightweight locks. All remaining uses of spinlocks have very short lock hold times (a few dozen instructions), so tweak spinlock backoff code to work efficiently given this assumption. All per my proposal on pghackers 26-Sep-01.	2001-09-29 04:02:27 +00:00
Tom Lane	3d59ad00e8	Remove useless LockDisable() function and associated overhead, per my proposal of 26-Aug.	2001-09-27 16:29:13 +00:00
Peter Eisentraut	8401f06efd	Treat __s390x__ the same as __s390__. (taken from RPM patch set)	2001-09-24 20:10:44 +00:00
Tom Lane	35b7601b04	Add an overall timeout on the client authentication cycle, so that a hung client or lost connection can't indefinitely block a postmaster child (not to mention the possibility of deliberate DoS attacks). Timeout is controlled by new authentication_timeout GUC variable, which I set to 60 seconds by default ... does that seem reasonable?	2001-09-21 17:06:12 +00:00
Tom Lane	863aceb54f	Get rid of PID entries in shmem hash table; there is no longer any need for them, and making them just wastes time during backend startup/shutdown. Also, remove compile-time MAXBACKENDS limit per long-ago proposal. You can now set MaxBackends as high as your kernel can stand without any reconfiguration/recompilation.	2001-09-07 00:27:30 +00:00
Tom Lane	bc7d37a525	Transaction IDs wrap around, per my proposal of 13-Aug-01. More documentation to come, but the code is all here. initdb forced.	2001-08-26 16:56:03 +00:00
Tom Lane	2589735da0	Replace implementation of pg_log as a relation accessed through the buffer manager with 'pg_clog', a specialized access method modeled on pg_xlog. This simplifies startup (don't need to play games to open pg_log; among other things, OverrideTransactionSystem goes away), should improve performance a little, and opens the door to recycling commit log space by removing no-longer-needed segments of the commit log. Actual recycling is not there yet, but I felt I should commit this part separately since it'd still be useful if we chose not to do transaction ID wraparound.	2001-08-25 18:52:43 +00:00
Tom Lane	4fe42dfbc3	Add SHARE UPDATE EXCLUSIVE lock mode, coming soon to a VACUUM near you. Name chosen per pghackers discussion around 6/22/01.	2001-07-09 22:18:34 +00:00
Tom Lane	55432fedd2	Implement LockBufferForCleanup(), which will allow concurrent VACUUM to wait until it's safe to remove tuples and compact free space in a shared buffer page. Miscellaneous small code cleanups in bufmgr, too.	2001-07-06 21:04:26 +00:00
Tom Lane	42748087c1	First non-stub implementation of shared free space map. It's not super useful as yet, since its primary source of information is (full) VACUUM, which makes a concerted effort to get rid of free space before telling the map about it ... next stop is concurrent VACUUM ...	2001-07-02 20:50:46 +00:00
Tom Lane	af5ced9cfd	Further work on connecting the free space map (which is still just a stub) into the rest of the system. Adopt a cleaner approach to preventing deadlock in concurrent heap_updates: allow RelationGetBufferForTuple to select any page of the rel, and put the onus on it to lock both buffers in a consistent order. Remove no-longer-needed isExtend hack from API of ReleaseAndReadBuffer.	2001-06-29 21:08:25 +00:00
Tom Lane	e0c9301c87	Install infrastructure for shared-memory free space map. Doesn't actually do anything yet, but it has the necessary connections to initialization and so forth. Make some gestures towards allowing number of blocks in a relation to be BlockNumber, ie, unsigned int, rather than signed int. (I doubt I got all the places that are sloppy about it, yet.) On the way, replace the hardwired NLOCKS_PER_XACT fudge factor with a GUC variable.	2001-06-27 23:31:40 +00:00
Tom Lane	14807a3c98	Remove another unused include file with obsolete, useless, confusing definitions in it.	2001-06-27 19:02:48 +00:00
Tom Lane	d8d9ed931e	Add support to lock manager for conditionally locking a lock (ie, return without waiting if we can't get the lock immediately). Not used yet, but will be needed for concurrent VACUUM.	2001-06-22 00:04:59 +00:00
Tom Lane	986915c181	Remove unused include file for long-dead flavors of locking.	2001-06-21 21:01:36 +00:00
Tom Lane	bbbc00af88	Clean up some longstanding problems in shared-cache invalidation. SI messages now include the relevant database OID, so that operations in one database do not cause useless cache flushes in backends attached to other databases. Declare SI messages properly using a union, to eliminate the former assumption that Oid is the same size as int or Index. Rewrite the nearly-unreadable code in inval.c, and document it better. Arrange for catcache flushes at end of command/transaction to happen before relcache flushes do --- this avoids loading a new tuple into the catcache while setting up new relcache entry, only to have it be flushed again immediately.	2001-06-19 19:42:16 +00:00
Bruce Momjian	558fae16e3	The attached patch enables the contrib subtree to build cleanly under Cygwin with the possible exception of mSQL-interface. Since I don't have mSQL installed, I skipped this tool. Except for dealing with a missing getopt.h (oid2name) and HUGE (seg), the bulk of the patch uses the standard PostgreSQL approach to deal with Windows DLL issues. I tested the build aspect of this patch under Cygwin and Linux without any ill effects. Note that I did not actually attempt to test the code for functionality. The procedure to apply the patch is as follows: $ # save the attachment as /tmp/contrib.patch $ # change directory to the top of the PostgreSQL source tree $ patch -p0 </tmp/contrib.patch Jason	2001-06-18 21:38:02 +00:00
Tom Lane	2917f0a5dd	Tweak startup sequence so that running out of PROC array slots is detected sooner in backend startup, and is treated as an expected error (it gives 'Sorry, too many clients already' now). This allows us not to have to enforce the MaxBackends limit exactly in the postmaster. Also, remove ProcRemove() and fold its functionality into ProcKill(). There's no good reason for a backend not to be responsible for removing its PROC entry, and there are lots of good reasons for the postmaster not to be touching shared-memory data structures.	2001-06-16 22:58:17 +00:00
Tom Lane	2a6f7ac456	Move temporary files into 'pg_tempfiles' subdirectory of each database directory (which can be made a symlink to put temp files on another disk). Add code to delete leftover temp files during postmaster startup. Bruce, with some kibitzing from Tom.	2001-06-11 04:12:29 +00:00
Tom Lane	bdadc9bf1c	Remove RelationGetBufferWithBuffer(), which is horribly confused about appropriate pin-count manipulation, and instead use ReleaseAndReadBuffer. Make use of the fact that the passed-in buffer (if there is one) must be pinned to avoid grabbing the bufmgr spinlock when we are able to return this same buffer. Eliminate unnecessary 'previous tuple' and 'next tuple' fields of HeapScanDesc and IndexScanDesc, thereby removing a whole lot of bookkeeping from heap_getnext() and related routines.	2001-06-09 18:16:59 +00:00
Bruce Momjian	f6923ff3ac	Oops, only wanted python change in the last commit. Backing out.	2001-05-25 15:45:34 +00:00
Bruce Momjian	dffb673692	While changing Cygwin Python to build its core as a DLL (like Win32 Python) to support shared extension modules, I have learned that Guido prefers the style of the attached patch to solve the above problem. I feel that this solution is particularly appropriate in this case because the following types, PglargeType, PgType and PgQueryType, are already being handled in the way that I am proposing for PgSourceType. Jason Tishler	2001-05-25 15:34:50 +00:00
Bruce Momjian	f36fc7bb63	I haven't tried building postgres with the Watcom compiler for 7.1 because it does not support 64bit integers. AFAIK that's the default data type for OIDs, so I am not surprised that this does not work. Use gcc instead. BTW., 7.1 does not compile as is with gcc either, I believed the required patches made it into the 7.1.1 release but obviously I missed the deadline. Since the ports mailing list does not seem to be archived I have attached a copy of the patch (for 7.1 and 7.1.1). I've just performed a build of a Watcom compiled version and found a couple of bugs in the watcom specific part of that patch. Please use the attached version instead. Tegge, Bernd	2001-05-24 15:53:34 +00:00
Bruce Momjian	80d4ae931a	Small include file fix for pg_variabie.h	2001-05-14 22:06:41 +00:00
Tom Lane	eedb7d18fa	Modify RelationGetBufferForTuple() so that we only do lseek and lock when we need to move to a new page; as long as we can insert the new tuple on the same page as before, we only need LockBuffer and not the expensive stuff. Also, twiddle bufmgr interfaces to avoid redundant lseeks in RelationGetBufferForTuple and BufferAlloc. Successive inserts now require one lseek per page added, rather than one per tuple with several additional ones at each page boundary as happened before. Lock contention when multiple backends are inserting in same table is also greatly reduced.	2001-05-12 19:58:28 +00:00
Tom Lane	642107d5ba	Avoid unnecessary lseek() calls by cleanups in md.c. mdfd_lstbcnt was not being consulted anywhere, so remove it and remove the _mdnblocks() calls that were used to set it. Change smgrextend interface to pass in the target block number (ie, current file length) --- the caller always knows this already, having already done smgrnblocks(), so it's silly to do it over again inside mdextend. Net result: extension of a file now takes one lseek(SEEK_END) and a write(), not three lseeks and a write.	2001-05-10 20:38:49 +00:00
Tom Lane	ca224d2ba4	Suppress compiler warnings in Vax and NS32K assembly code: 'register foo' is not a complete declaration.	2001-04-13 23:32:57 +00:00
Tom Lane	dcbbdb1b3e	Add appropriately ifdef'd hack to make ARM compiler allocate ItemPointerData as six bytes not eight. This fixes a regression test failure but more importantly avoids wasting four bytes of pad space in every tuple header. Also add some commentary about what's going on.	2001-03-30 05:25:51 +00:00
Tom Lane	42eaad0575	Re-order declarations to un-break the non-HAS_TEST_AND_SET case.	2001-03-25 17:52:46 +00:00
Bruce Momjian	9e1552607a	pgindent run. Make it all clean.	2001-03-22 04:01:46 +00:00
Tom Lane	4d14fe0048	XLOG (and related) changes: * Store two past checkpoint locations, not just one, in pg_control. On startup, we fall back to the older checkpoint if the newer one is unreadable. Also, a physical copy of the newest checkpoint record is kept in pg_control for possible use in disaster recovery (ie, complete loss of pg_xlog). Also add a version number for pg_control itself. Remove archdir from pg_control; it ought to be a GUC parameter, not a special case (not that it's implemented yet anyway). * Suppress successive checkpoint records when nothing has been entered in the WAL log since the last one. This is not so much to avoid I/O as to make it actually useful to keep track of the last two checkpoints. If the things are right next to each other then there's not a lot of redundancy gained... * Change CRC scheme to a true 64-bit CRC, not a pair of 32-bit CRCs on alternate bytes. Polynomial borrowed from ECMA DLT1 standard. * Fix XLOG record length handling so that it will work at BLCKSZ = 32k. * Change XID allocation to work more like OID allocation. (This is of dubious necessity, but I think it's a good idea anyway.) * Fix a number of minor bugs, such as off-by-one logic for XLOG file wraparound at the 4 gig mark. * Add documentation and clean up some coding infelicities; move file format declarations out to include files where planned contrib utilities can get at them. * Checkpoint will now occur every CHECKPOINT_SEGMENTS log segments or every CHECKPOINT_TIMEOUT seconds, whichever comes first. It is also possible to force a checkpoint by sending SIGUSR1 to the postmaster (undocumented feature...) * Defend against kill -9 postmaster by storing shmem block's key and ID in postmaster.pid lockfile, and checking at startup to ensure that no processes are still connected to old shmem block (if it still exists). * Switch backends to accept SIGQUIT rather than SIGUSR1 for emergency stop, for symmetry with postmaster and xlog utilities. Clean up signal handling in bootstrap.c so that xlog utilities launched by postmaster will react to signals better. * Standalone bootstrap now grabs lockfile in target directory, as added insurance against running it in parallel with live postmaster.	2001-03-13 01:17:06 +00:00
Tom Lane	9c9936587c	Implement COMMIT_SIBLINGS parameter to allow pre-commit delay to occur only if at least N other backends currently have open transactions. This is not a great deal of intelligence about whether a delay might be profitable ... but it beats no intelligence at all. Note that the default COMMIT_DELAY is still zero --- this new code does nothing unless that setting is changed. Also, mark ENABLEFSYNC as a system-wide setting. It's no longer safe to allow that to be set per-backend, since we may be relying on some other backend's fsync to have synced the WAL log.	2001-02-26 00:50:08 +00:00
Bruce Momjian	a37666c2ec	Update comments on locks.	2001-02-23 19:24:06 +00:00
Bruce Momjian	81b48493aa	Bruce Momjian <pgman@candle.pha.pa.us> writes: > Is there one LOCKMETHODCTL for every backend? I thought there was only > one of them. >> >> You're right, that line is erroneous; it should read >> >> size += MAX_LOCK_METHODS * MAXALIGN(sizeof(LOCKMETHODCTL)); >> >> Not a significant error but it should be changed for clarity ...	2001-02-23 18:28:46 +00:00
Bruce Momjian	82fc51e0b3	More comment improvements.	2001-02-22 23:02:33 +00:00
Bruce Momjian	660ca3e01c	Change /*---- comments to /* where appropriate. pgindent will tighten up the comments later.	2001-02-22 18:39:20 +00:00
Bruce Momjian	15903a1ed4	Comment improvements.	2001-02-21 19:07:04 +00:00
Tom Lane	33cc5d8a4d	Change s_lock to not use any zero-delay select() calls; these are just a waste of cycles on single-CPU machines, and of dubious utility on multi-CPU machines too. Tweak s_lock_stuck so that caller can specify timeout interval, and increase interval before declaring stuck spinlock for buffer locks and XLOG locks. On systems that have fdatasync(), use that rather than fsync() to sync WAL log writes. Ensure that WAL file is entirely allocated during XLogFileInit.	2001-02-18 04:39:42 +00:00
Tom Lane	6249971b41	Just noticed that use of 'volatile' in HPPA S_UNLOCK() was causing gcc to generate unnecessarily stupid code. Tweak macro to describe a series of store-constant ops, not store/load/store/load/store/load/store.	2001-02-16 23:50:40 +00:00
Tom Lane	af0a15287d	Fix byte-vs-word-width oversight in m68k TAS() code. Man, this brings back some old memories ...	2001-02-10 04:07:25 +00:00
Tom Lane	d08741eab5	Restructure the key include files per recent pghackers discussion: there are now separate files "postgres.h" and "postgres_fe.h", which are meant to be the primary include files for backend .c files and frontend .c files respectively. By default, only include files meant for frontend use are installed into the installation include directory. There is a new make target 'make install-all-headers' that adds the whole content of the src/include tree to the installed fileset, for use by people who want to develop server-side code without keeping the complete source tree on hand. Cleaned up a whole lot of crufty and inconsistent header inclusions.	2001-02-10 02:31:31 +00:00
Bruce Momjian	b60c57da2d	Apply patches for QNX from Maurizio	2001-02-02 18:21:59 +00:00
Tom Lane	a05eae029a	Re-implement deadlock detection and resolution, per design notes posted to pghackers on 18-Jan-01.	2001-01-25 03:31:16 +00:00
Bruce Momjian	623bf843d2	Change Copyright from PostgreSQL, Inc to PostgreSQL Global Development Group.	2001-01-24 19:43:33 +00:00
Tom Lane	e84c429062	Clean up lockmanager data structures some more, in preparation for planned rewrite of deadlock checking. Lock holder objects are now reachable from the associated LOCK as well as from the owning PROC. This makes it practical to find all the processes holding a lock, as well as all those waiting on the lock. Also, clean up some of the grottier aspects of the SHMQueue API, and cause the waitProcs list to be stored in the intuitive direction instead of the nonintuitive one. (Bet you didn't know that the code followed the 'prev' link to get to the next waiting process, instead of the 'next' link. It doesn't do that anymore.)	2001-01-22 22:30:06 +00:00
Tom Lane	a7ea9f46e1	Still further tweaking of s_lock assembler: do not assume that leading whitespace is unimportant in assembly code. Also, move VAX definition of typedef slock_t to port header files to be like all the other ports. Note that netbsd.h and openbsd.h are now identical, and I rather think that freebsd.h is broken in the places where it doesn't agree --- but I'll leave it to the freebsders to look at that.	2001-01-20 00:03:55 +00:00
Bruce Momjian	75815c3100	cleanup.	2001-01-19 21:09:57 +00:00
Bruce Momjian	27aaf9df7e	Remove ; and add \n to ASM code.	2001-01-19 20:39:16 +00:00
Bruce Momjian	8fe8fc9db0	Fix alignment	2001-01-19 07:03:53 +00:00
Bruce Momjian	246b5398b4	Fix univel asm alignment	2001-01-19 06:59:59 +00:00
Bruce Momjian	cef28fd943	Add __volatile__ to all __asm__ and make consistent indenting	2001-01-19 03:58:35 +00:00
Bruce Momjian	d7810023c5	New ASM format: /* * Standard __asm__ format: * * __asm__( * "command;" * "command;" * "command;" * : "=r"(_res) return value, in register * : "r"(lock) argument, 'lock pointer', in register * : "r0"); inline code uses this register */	2001-01-19 02:58:59 +00:00
Bruce Momjian	c0a0f34618	Fix VAX ASM '1 f' -> '1f'.	2001-01-18 23:40:26 +00:00
Tom Lane	dae52bf3ec	Oops, I had managed to break query-cancel-while-waiting-for-lock.	2001-01-16 20:59:34 +00:00
Tom Lane	64e6c60897	Rename fields of lock and lockholder structures to something a tad less confusing, and clean up documentation.	2001-01-16 06:11:34 +00:00
Tom Lane	36839c1927	Restructure backend SIGINT/SIGTERM handling so that 'die' interrupts are treated more like 'cancel' interrupts: the signal handler sets a flag that is examined at well-defined spots, rather than trying to cope with an interrupt that might happen anywhere. See pghackers discussion of 1/12/01.	2001-01-14 05:08:17 +00:00
Tom Lane	4cb0950cfe	Fix small but critical typo ...	2001-01-09 02:15:16 +00:00
Vadim B. Mikheev	3e059b3802	1. WAL needs zeroed content in a newly initialized page. 2. Log record for PageRepairFragmentation now keeps an array of !LP_USED offnums to redo cleanup properly.	2000-12-30 15:19:57 +00:00
Tom Lane	f83b221598	Clean up spinlock assembly code slightly (just cosmetic improvements) for Alpha gcc case. For Alpha non-gcc case, replace use of __INTERLOCKED_TESTBITSS_QUAD builtin with __LOCK_LONG_RETRY and __UNLOCK_LONG. The former does not execute an MB instruction and therefore was guaranteed not to work on multiprocessor machines. The LOCK_LONG builtins produce code that is the same in all essential details as the gcc assembler code.	2000-12-30 02:34:56 +00:00
Tom Lane	7f60b81e1a	Fix failure in CreateCheckPoint on some Alpha boxes --- it's not OK to assume that TAS() will always succeed the first time, even if the lock is known to be free. Also, make sure that code will eventually time out and report a stuck spinlock, rather than looping forever. Small cleanups in s_lock.h, too.	2000-12-29 21:31:21 +00:00
Vadim B. Mikheev	7ceeeb662f	New WAL version - CRC and data blocks backup.	2000-12-28 13:00:29 +00:00
Tom Lane	6cc842abd3	Revise lock manager to support "session level" locks as well as "transaction level" locks. A session lock is not released at transaction commit (but it is released on transaction abort, to ensure recovery after an elog(ERROR)). In VACUUM, use a session lock to protect the master table while vacuuming a TOAST table, so that the TOAST table can be done in an independent transaction. I also took this opportunity to do some cleanup and renaming in the lock code. The previously noted bug in ProcLockWakeup, that it couldn't wake up any waiters beyond the first non-wakeable waiter, is now fixed. Also found a previously unknown bug of the same kind (failure to scan all members of a lock queue in some cases) in DeadLockCheck. This might have led to failure to detect a deadlock condition, resulting in indefinite waits, but it's difficult to characterize the conditions required to trigger a failure.	2000-12-22 00:51:54 +00:00
Tom Lane	a626b78c89	Clean up backend-exit-time cleanup behavior. Use on_shmem_exit callbacks to ensure that we have released buffer refcounts and so forth, rather than putting ad-hoc operations before (some of the calls to) proc_exit. Add commentary to discourage future hackers from repeating that mistake.	2000-12-18 00:44:50 +00:00
Tom Lane	fb47385fc8	Resurrect -F switch: it controls fsyncs again, though the fsyncs are mostly just on the WAL logfile nowadays. But if people want to disable fsync for performance, why should we say no?	2000-12-08 22:21:33 +00:00
Tom Lane	68ed296301	Don't use 'private' as a parameter name in visible headers ... makes C++ very unhappy ...	2000-12-03 17:18:10 +00:00
Thomas G. Lockhart	48781d44e4	Support IBM S/390. Patches from Neale Ferguson@softwareAG-usa.com.	2000-12-03 14:41:47 +00:00
Vadim B. Mikheev	81c8c244b2	No more #ifdef XLOG.	2000-11-30 08:46:26 +00:00
Tom Lane	680b7357ce	Rearrange bufmgr header files so that buf_internals.h need not be included by everything that includes bufmgr.h --- it's supposed to be internals, after all, not part of the API! This fixes the conflict against FreeBSD headers reported by Rosenman, by making it unnecessary for s_lock.h to be included by plperl.c.	2000-11-30 01:39:08 +00:00
Tom Lane	c715fdea26	Significant cleanups in SysV IPC handling (shared mem and semaphores). IPC key assignment will now work correctly even when multiple postmasters are using same logical port number (which is possible given -k switch). There is only one shared-mem segment per postmaster now, not 3. Rip out broken code for non-TAS case in bufmgr and xlog, substitute a complete S_LOCK emulation using semaphores in spin.c. TAS and non-TAS logic is now exactly the same. When deadlock is detected, "Deadlock detected" is now the elog(ERROR) message, rather than a NOTICE that comes out before an unhelpful ERROR.	2000-11-28 23:27:57 +00:00
Peter Eisentraut	a70e74b060	Put external declarations into header files.	2000-11-21 21:16:06 +00:00
Vadim B. Mikheev	c07bb9e0ad	No casting to LSN (XLogRecPtr) is required.	2000-11-20 21:12:26 +00:00
Tom Lane	ebb0a20149	Keep track of the last active slot in the shared ProcState array, so that search loops only have to scan that far and not through all maxBackends entries. This eliminates a performance penalty for setting maxBackends much higher than the average number of active backends. Also, eliminate no-longer-used 'backend tag' concept. Remove setting of environment variables at backend start (except for CYR_RECODE), since none of them are being examined by the backend any longer.	2000-11-12 20:51:52 +00:00
Vadim B. Mikheev	92875e6f44	pg_fsync is fsync in WAL version.	2000-11-10 03:53:45 +00:00
Tom Lane	3908473c80	Make DROP TABLE rollback-able: postpone physical file delete until commit. (WAL logging for this is not done yet, however.) Clean up a number of really crufty things that are no longer needed now that DROP behaves nicely. Make temp table mapper do the right things when drop or rename affecting a temp table is rolled back. Also, remove "relation modified while in use" error check, in favor of locking tables at first reference and holding that lock throughout the statement.	2000-11-08 22:10:03 +00:00
Vadim B. Mikheev	5b0740d3fc	WAL	2000-10-28 16:21:00 +00:00
Tom Lane	a9b6b01ee8	Reconsider page size for large objects: rather than stuffing disk pages as full as possible, seems better to use a tuple size around BLCKSZ/4 so that less space is wasted when a LO tuple is updated. Also, this lets us use a logical page size that's an exact power of two, avoiding partial-page writes when client is sending us stuff in power-of-2 buffer chunks.	2000-10-24 03:34:53 +00:00


971 Commits