postgresql

Commit Graph

Author	SHA1	Message	Date
Robert Haas	e26c539e9f	Wrap calls to SearchSysCache and related functions using macros. The purpose of this change is to eliminate the need for every caller of SearchSysCache, SearchSysCacheCopy, SearchSysCacheExists, GetSysCacheOid, and SearchSysCacheList to know the maximum number of allowable keys for a syscache entry (currently 4). This will make it far easier to increase the maximum number of keys in a future release should we choose to do so, and it makes the code shorter, too. Design and review by Tom Lane.	2010-02-14 18:42:19 +00:00
Tom Lane	cbe9d6beb4	Fix up rickety handling of relation-truncation interlocks. Move rd_targblock, rd_fsm_nblocks, and rd_vm_nblocks from relcache to the smgr relation entries, so that they will get reset to InvalidBlockNumber whenever an smgr-level flush happens. Because we now send smgr invalidation messages immediately (not at end of transaction) when a relation truncation occurs, this ensures that other backends will reset their values before they next access the relation. We no longer need the unreliable assumption that a VACUUM that's doing a truncation will hold its AccessExclusive lock until commit --- in fact, we can intentionally release that lock as soon as we've completed the truncation. This patch therefore reverts (most of) Alvaro's patch of 2009-11-10, as well as my marginal hacking on it yesterday. We can also get rid of assorted no-longer-needed relcache flushes, which are far more expensive than an smgr flush because they kill a lot more state. In passing this patch fixes smgr_redo's failure to perform visibility-map truncation, and cleans up some rather dubious assumptions in freespace.c and visibilitymap.c about when rd_fsm_nblocks and rd_vm_nblocks can be out of date.	2010-02-09 21:43:30 +00:00
Tom Lane	9184cc7dab	Fix serious performance bug in new implementation of VACUUM FULL: cluster_rel necessarily builds an all-new toast table, so it's useless to then go and VACUUM FULL the toast table.	2010-02-08 16:50:21 +00:00
Tom Lane	0a469c8769	Remove old-style VACUUM FULL (which was known for a little while as VACUUM FULL INPLACE), along with a boatload of subsidiary code and complexity. Per discussion, the use case for this method of vacuuming is no longer large enough to justify maintaining it; not to mention that we don't wish to invest the work that would be needed to make it play nicely with Hot Standby. Aside from the code directly related to old-style VACUUM FULL, this commit removes support for certain WAL record types that could only be generated within VACUUM FULL, redirect-pointer removal in heap_page_prune, and nontransactional generation of cache invalidation sinval messages (the last being the sticking point for Hot Standby). We still have to retain all code that copes with finding HEAP_MOVED_OFF and HEAP_MOVED_IN flag bits on existing tuples. This can't be removed as long as we want to support in-place update from pre-9.0 databases.	2010-02-08 04:33:55 +00:00
Tom Lane	b9b8831ad6	Create a "relation mapping" infrastructure to support changing the relfilenodes of shared or nailed system catalogs. This has two key benefits: * The new CLUSTER-based VACUUM FULL can be applied safely to all catalogs. * We no longer have to use an unsafe reindex-in-place approach for reindexing shared catalogs. CLUSTER on nailed catalogs now works too, although I left it disabled on shared catalogs because the resulting pg_index.indisclustered update would only be visible in one database. Since reindexing shared system catalogs is now fully transactional and crash-safe, the former special cases in REINDEX behavior have been removed; shared catalogs are treated the same as non-shared. This commit does not do anything about the recently-discussed problem of deadlocks between VACUUM FULL/CLUSTER on a system catalog and other concurrent queries; will address that in a separate patch. As a stopgap, parallel_schedule has been tweaked to run vacuum.sql by itself, to avoid such failures during the regression tests.	2010-02-07 20:48:13 +00:00
Itagaki Takahiro	946cf229e8	Support rewritten-based full vacuum as VACUUM FULL. Traditional VACUUM FULL was renamed to VACUUM FULL INPLACE. Also added a new option -i, --inplace for vacuumdb to perform FULL INPLACE vacuuming. Since the new VACUUM FULL uses CLUSTER infrastructure, we cannot use it for system tables. VACUUM FULL for system tables always fall back into VACUUM FULL INPLACE silently. Itagaki Takahiro, reviewed by Jeff Davis and Simon Riggs.	2010-01-06 05:31:14 +00:00
Bruce Momjian	0239800893	Update copyright for the year 2010.	2010-01-02 16:58:17 +00:00
Tom Lane	48c192c15e	Revise pgstat's tracking of tuple changes to improve the reliability of decisions about when to auto-analyze. The previous code depended on n_live_tuples + n_dead_tuples - last_anl_tuples, where all three of these numbers could be bad estimates from ANALYZE itself. Even worse, in the presence of a steady flow of HOT updates and matching HOT-tuple reclamations, auto-analyze might never trigger at all, even if all three numbers are exactly right, because n_dead_tuples could hold steady. To fix, replace last_anl_tuples with an accurately tracked count of the total number of committed tuple inserts + updates + deletes since the last ANALYZE on the table. This can still be compared to the same threshold as before, but it's much more trustworthy than the old computation. Tracking this requires one more intra-transaction counter per modified table within backends, but no additional memory space in the stats collector. There probably isn't any measurable speed difference; if anything it might be a bit faster than before, since I was able to eliminate some per-tuple arithmetic operations in favor of adding sums once per (sub)transaction. Also, simplify the logic around pgstat vacuum and analyze reporting messages by not trying to fold VACUUM ANALYZE into a single pgstat message. The original thought behind this patch was to allow scheduling of analyzes on parent tables by artificially inflating their changes_since_analyze count. I've left that for a separate patch since this change seems to stand on its own merit.	2009-12-30 20:32:14 +00:00
Tom Lane	649b5ec7c8	Add the ability to store inheritance-tree statistics in pg_statistic, and teach ANALYZE to compute such stats for tables that have subclasses. Per my proposal of yesterday. autovacuum still needs to be taught about running ANALYZE on parent tables when their subclasses change, but the feature is useful even without that.	2009-12-29 20:11:45 +00:00
Simon Riggs	efc16ea520	Allow read only connections during recovery, known as Hot Standby. Enabled by recovery_connections = on (default) and forcing archive recovery using a recovery.conf. Recovery processing now emulates the original transactions as they are replayed, providing full locking and MVCC behaviour for read only queries. Recovery must enter consistent state before connections are allowed, so there is a delay, typically short, before connections succeed. Replay of recovering transactions can conflict and in some cases deadlock with queries during recovery; these result in query cancellation after max_standby_delay seconds have expired. Infrastructure changes have minor effects on normal running, though introduce four new types of WAL record. New test mode "make standbycheck" allows regression tests of static command behaviour on a standby server while in recovery. Typical and extreme dynamic behaviours have been checked via code inspection and manual testing. Few port specific behaviours have been utilised, though primary testing has been on Linux only so far. This commit is the basic patch. Additional changes will follow in this release to enhance some aspects of behaviour, notably improved handling of conflicts, deadlock detection and query cancellation. Changes to VACUUM FULL are also required. Simon Riggs, with significant and lengthy review by Heikki Linnakangas, including streamlined redesign of snapshot creation and two-phase commit. Important contributions from Florian Pflug, Mark Kirkwood, Merlin Moncure, Greg Stark, Gianni Ciolli, Gabriele Bartolini, Hannu Krosing, Robert Haas, Tatsuo Ishii, Hiroyuki Yamada plus support and feedback from many other community members.	2009-12-19 01:32:45 +00:00
Tom Lane	62aba76568	Prevent indirect security attacks via changing session-local state within an allegedly immutable index function. It was previously recognized that we had to prevent such a function from executing SET/RESET ROLE/SESSION AUTHORIZATION, or it could trivially obtain the privileges of the session user. However, since there is in general no privilege checking for changes of session-local state, it is also possible for such a function to change settings in a way that might subvert later operations in the same session. Examples include changing search_path to cause an unexpected function to be called, or replacing an existing prepared statement with another one that will execute a function of the attacker's choosing. The present patch secures VACUUM, ANALYZE, and CREATE INDEX/REINDEX against these threats, which are the same places previously deemed to need protection against the SET ROLE issue. GUC changes are still allowed, since there are many useful cases for that, but we prevent security problems by forcing a rollback of any GUC change after completing the operation. Other cases are handled by throwing an error if any change is attempted; these include temp table creation, closing a cursor, and creating or deleting a prepared statement. (In 7.4, the infrastructure to roll back GUC changes doesn't exist, so we settle for rejecting changes of "search_path" in these contexts.) Original report and patch by Gurjeet Singh, additional analysis by Tom Lane. Security: CVE-2009-4136	2009-12-09 21:57:51 +00:00
Tom Lane	0cb65564e5	Add exclusion constraints, which generalize the concept of uniqueness to support any indexable commutative operator, not just equality. Two rows violate the exclusion constraint if "row1.col OP row2.col" is TRUE for each of the columns in the constraint. Jeff Davis, reviewed by Robert Haas	2009-12-07 05:22:23 +00:00
Tom Lane	5e66a51c2e	Provide a parenthesized-options syntax for VACUUM, analogous to that recently adopted for EXPLAIN. This will allow additional options to be implemented in future without having to make them fully-reserved keywords. The old syntax remains available for existing options, however. Itagaki Takahiro	2009-11-16 21:32:07 +00:00
Alvaro Herrera	e7ec022266	Fix longstanding problems in VACUUM caused by untimely interruptions In VACUUM FULL, an interrupt after the initial transaction has been recorded as committed can cause postmaster to restart with the following error message: PANIC: cannot abort transaction NNNN, it was already committed This problem has been reported many times. In lazy VACUUM, an interrupt after the table has been truncated by lazy_truncate_heap causes other backends' relcache to still point to the removed pages; this can cause future INSERT and UPDATE queries to error out with the following error message: could not read block XX of relation 1663/NNN/MMMM: read only 0 of 8192 bytes The window to this race condition is extremely narrow, but it has been seen in the wild involving a cancelled autovacuum process. The solution for both problems is to inhibit interrupts in both operations until after the respective transactions have been committed. It's not a complete solution, because the transaction could theoretically be aborted by some other error, but at least fixes the most common causes of both problems.	2009-11-10 18:00:06 +00:00
Tom Lane	9f2ee8f287	Re-implement EvalPlanQual processing to improve its performance and eliminate a lot of strange behaviors that occurred in join cases. We now identify the "current" row for every joined relation in UPDATE, DELETE, and SELECT FOR UPDATE/SHARE queries. If an EvalPlanQual recheck is necessary, we jam the appropriate row into each scan node in the rechecking plan, forcing it to emit only that one row. The former behavior could rescan the whole of each joined relation for each recheck, which was terrible for performance, and what's much worse could result in duplicated output tuples. Also, the original implementation of EvalPlanQual could not re-use the recheck execution tree --- it had to go through a full executor init and shutdown for every row to be tested. To avoid this overhead, I've associated a special runtime Param with each LockRows or ModifyTable plan node, and arranged to make every scan node below such a node depend on that Param. Thus, by signaling a change in that Param, the EPQ machinery can just rescan the already-built test plan. This patch also adds a prohibition on set-returning functions in the targetlist of SELECT FOR UPDATE/SHARE. This is needed to avoid the duplicate-output-tuple problem. It seems fairly reasonable since the other restrictions on SELECT FOR UPDATE are meant to ensure that there is a unique correspondence between source tuples and result tuples, which an output SRF destroys as much as anything else does.	2009-10-26 02:26:45 +00:00
Tom Lane	794e3e81a0	Force VACUUM to recalculate oldestXmin even when we haven't changed our own database's datfrozenxid, if the current value is old enough to be forcing autovacuums or warning messages. This ensures that a bogus value is replaced as soon as possible. Per a comment from Heikki.	2009-09-01 04:46:49 +00:00
Alvaro Herrera	a8bb8eb583	Remove flatfiles.c, which is now obsolete. Recent commits have removed the various uses it was supporting. It was a performance bottleneck, according to bug report #4919 by Lauris Ulmanis; seems it slowed down user creation after a billion users.	2009-09-01 02:54:52 +00:00
Tom Lane	25ec228ef7	Track the current XID wrap limit (or more accurately, the oldest unfrozen XID) in checkpoint records. This eliminates the need to recompute the value from scratch during database startup, which is one of the two remaining reasons for the flatfile code to exist. It should also simplify life for hot-standby operation. To avoid bloating the checkpoint records unreasonably, I switched from tracking the oldest database by name to tracking it by OID. This turns out to save cycles in general (everywhere but the warning-generating paths, which we hardly care about) and also helps us deal with the case that the oldest database got dropped instead of being vacuumed. The prior coding might go for a long time without updating the wrap limit in that case, which is bad because it might result in a lot of useless autovacuum activity.	2009-08-31 02:23:23 +00:00
Tom Lane	7fc7a7c4d0	Fix a violation of WAL coding rules in the recent patch to include an "all tuples visible" flag in heap page headers. The flag update must be applied before calling XLogInsert, but heap_update and the tuple moving routines in VACUUM FULL were ignoring this rule. A crash and replay could therefore leave the flag incorrectly set, causing rows to appear visible in seqscans when they should not be. This might explain recent reports of data corruption from Jeff Ross and others. In passing, do a bit of editorialization on comments in visibilitymap.c.	2009-08-24 02:18:32 +00:00
Bruce Momjian	d747140279	8.4 pgindent run, with new combined Linux/FreeBSD/MinGW typedef list provided by Andrew.	2009-06-11 14:49:15 +00:00
Tom Lane	32ea236361	Improve the IndexVacuumInfo/IndexBulkDeleteResult API to allow somewhat sane behavior in cases where we don't know the heap tuple count accurately; in particular partial vacuum, but this also makes the API a bit more useful for ANALYZE. This patch adds "estimated_count" flags to both structs so that an approximate count can be flagged as such, and adjusts the logic so that approximate counts are not used for updating pg_class.reltuples. This fixes my previous complaint that VACUUM was putting ridiculous values into pg_class.reltuples for indexes. The actual impact of that bug is limited, because the planner only pays attention to reltuples for an index if the index is partial; which probably explains why beta testers hadn't noticed a degradation in plan quality from it. But it needs to be fixed. The whole thing is a bit messy and should be redesigned in future, because reltuples now has the potential to drift quite far away from reality when a long period elapses with no non-partial vacuums. But this is as good as it's going to get for 8.4.	2009-06-06 22:13:52 +00:00
Tom Lane	948d6ec90f	Modify the relcache to record the temp status of both local and nonlocal temp relations; this is no more expensive than before, now that we have pg_class.relistemp. Insert tests into bufmgr.c to prevent attempting to fetch pages from nonlocal temp relations. This provides a low-level defense against bugs-of-omission allowing temp pages to be loaded into shared buffers, as in the contrib/pgstattuple problem reported by Stuart Bishop. While at it, tweak a bunch of places to use new relcache tests (instead of expensive probes into pg_namespace) to detect local or nonlocal temp tables.	2009-03-31 22:12:48 +00:00
Tom Lane	ff301d6e69	Implement "fastupdate" support for GIN indexes, in which we try to accumulate multiple index entries in a holding area before adding them to the main index structure. This helps because bulk insert is (usually) significantly faster than retail insert for GIN. This patch also removes GIN support for amgettuple-style index scans. The API defined for amgettuple is difficult to support with fastupdate, and the previously committed partial-match feature didn't really work with it either. We might eventually figure a way to put back amgettuple support, but it won't happen for 8.4. catversion bumped because of change in GIN's pg_am entry, and because the format of GIN indexes changed on-disk (there's a metapage now, and possibly a pending list). Teodor Sigaev	2009-03-24 20:17:18 +00:00
Heikki Linnakangas	6587818542	Add vacuum_freeze_table_age GUC option, to control when VACUUM should ignore the visibility map and scan the whole table, to advance relfrozenxid.	2009-01-16 13:27:24 +00:00
Bruce Momjian	511db38ace	Update copyright for 2009.	2009-01-01 17:24:05 +00:00
Heikki Linnakangas	dcf8409985	Don't reset pg_class.reltuples and relpages in VACUUM, if any pages were skipped. We could update relpages anyway, but it seems better to only update it together with reltuples, because we use the reltuples/relpages ratio in the planner. Also don't update n_live_tuples in pgstat. ANALYZE in VACUUM ANALYZE now needs to update pg_class, if the VACUUM-phase didn't do so. Added some boolean-passing to let analyze_rel know if it should update pg_class or not. I also moved the relcache invalidation (to update rd_targblock) from vac_update_relstats to where RelationTruncate is called, because vac_update_relstats is not called for partial vacuums anymore. It's more obvious to send the invalidation close to the truncation that requires it. Per report by Ned T. Crigler.	2008-12-17 09:15:03 +00:00
Heikki Linnakangas	608195a3a3	Introduce visibility map. The visibility map is a bitmap with one bit per heap page, where a set bit indicates that all tuples on the page are visible to all transactions, and the page therefore doesn't need vacuuming. It is stored in a new relation fork. Lazy vacuum uses the visibility map to skip pages that don't need vacuuming. Vacuum is also responsible for setting the bits in the map. In the future, this can hopefully be used to implement index-only-scans, but we can't currently guarantee that the visibility map is always 100% up-to-date. In addition to the visibility map, there's a new PD_ALL_VISIBLE flag on each heap page, also indicating that all tuples on the page are visible to all transactions. It's important that this flag is kept up-to-date. It is also used to skip visibility tests in sequential scans, which gives a small performance gain on seqscans.	2008-12-03 13:05:22 +00:00
Heikki Linnakangas	3396000684	Rethink the way FSM truncation works. Instead of WAL-logging FSM truncations in FSM code, call FreeSpaceMapTruncateRel from smgr_redo. To make that cleaner from modularity point of view, move the WAL-logging one level up to RelationTruncate, and move RelationTruncate and all the related WAL-logging to new src/backend/catalog/storage.c file. Introduce new RelationCreateStorage and RelationDropStorage functions that are used instead of calling smgrcreate/smgrscheduleunlink directly. Move the pending rel deletion stuff from smgrcreate/smgrscheduleunlink to the new functions. This leaves smgr.c as a thin wrapper around md.c; all the transactional stuff is now in storage.c. This will make it easier to add new forks with similar truncation logic, like the visibility map.	2008-11-19 10:34:52 +00:00
Tom Lane	c5451c22e3	Make relhasrules and relhastriggers work like relhasindex, namely we let VACUUM reset them to false rather than trying to clean 'em up during DROP.	2008-11-10 00:49:37 +00:00
Heikki Linnakangas	19c8dc839b	Unite ReadBufferWithFork, ReadBufferWithStrategy, and ZeroOrReadBuffer functions into one ReadBufferExtended function, that takes the strategy and mode as argument. There's three modes, RBM_NORMAL which is the default used by plain ReadBuffer(), RBM_ZERO, which replaces ZeroOrReadBuffer, and a new mode RBM_ZERO_ON_ERROR, which allows callers to read corrupt pages without throwing an error. The FSM needs the new mode to recover from corrupt pages, which could happend if we crash after extending an FSM file, and the new page is "torn". Add fork number to some error messages in bufmgr.c, that still lacked it.	2008-10-31 15:05:00 +00:00
Heikki Linnakangas	15c121b3ed	Rewrite the FSM. Instead of relying on a fixed-size shared memory segment, the free space information is stored in a dedicated FSM relation fork, with each relation (except for hash indexes; they don't use FSM). This eliminates the max_fsm_relations and max_fsm_pages GUC options; remove any trace of them from the backend, initdb, and documentation. Rewrite contrib/pg_freespacemap to match the new FSM implementation. Also introduce a new variant of the get_raw_page(regclass, int4, int4) function in contrib/pageinspect that let's you to return pages from any relation fork, and a new fsm_page_contents() function to inspect the new FSM pages.	2008-09-30 10:52:14 +00:00
Alvaro Herrera	d53a56687f	Initialize the minimum frozen Xid in vac_update_datfrozenxid using GetOldestXmin() instead of RecentGlobalXmin; this is safer because we do not depend on the latter being correctly set elsewhere, and while it is more expensive, this code path is not performance-critical. This is a real risk for autovacuum, because it can execute whole cycles without doing a single vacuum, which would mean that RecentGlobalXmin would stay at its initialization value, FirstNormalTransactionId, causing a bogus value to be inserted in pg_database. This bug could explain some recent reports of failure to truncate pg_clog. At the same time, change the initialization of RecentGlobalXmin to InvalidTransactionId, and ensure that it's set to something else whenever it's going to be used. Using it as FirstNormalTransactionId in HOT page pruning could incur in data loss. InitPostgres takes care of setting it to a valid value, but the extra checks are there to prevent "special" backends from behaving in unusual ways. Per Tom Lane's detailed problem dissection in 29544.1221061979@sss.pgh.pa.us	2008-09-11 14:01:10 +00:00
Alvaro Herrera	3ccde312ec	Have autovacuum consider processing TOAST tables separately from their main tables. This requires vacuum() to accept processing a toast table standalone, so there's a user-visible change in that it's now possible (for a superuser) to execute "VACUUM pg_toast.pg_toast_XXX".	2008-08-13 00:07:50 +00:00
Alvaro Herrera	9319fd89e1	Modify vacuum() to accept a single relation OID instead of a list (which we always pass as a single element anyway.) In passing, fix an outdated comment.	2008-06-05 15:47:32 +00:00
Tom Lane	93c701edc6	Add support for tracking call counts and elapsed runtime for user-defined functions. Note that because this patch changes FmgrInfo, any external C functions you might be testing with 8.4 will need to be recompiled. Patch by Martin Pihlak, some editorialization by me (principally, removing tracking of getrusage() numbers)	2008-05-15 00:17:41 +00:00
Alvaro Herrera	5da9da71c4	Improve snapshot manager by keeping explicit track of snapshots. There are two ways to track a snapshot: there's the "registered" list, which is used for arbitrary long-lived snapshots; and there's the "active stack", which is used for the snapshot that is considered "active" at any time. This also allows users of snapshots to stop worrying about snapshot memory allocation and freeing, and about using PG_TRY blocks around ActiveSnapshot assignment. This is all done automatically now. As a consequence, this allows us to reset MyProc->xmin when there are no more snapshots registered in the current backend, reducing the impact that long-running transactions have on VACUUM.	2008-05-12 20:02:02 +00:00
Alvaro Herrera	f8c4d7db60	Restructure some header files a bit, in particular heapam.h, by removing some unnecessary #include lines in it. Also, move some tuple routine prototypes and macros to htup.h, which allows removal of heapam.h inclusion from some .c files. For this to work, a new header file access/sysattr.h needed to be created, initially containing attribute numbers of system columns, for pg_dump usage. While at it, make contrib ltree, intarray and hstore header files more consistent with our header style.	2008-05-12 00:00:54 +00:00
Alvaro Herrera	73b0300b2a	Move the HTSU_Result enum definition into snapshot.h, to avoid including tqual.h into heapam.h. This makes all inclusion of tqual.h explicit. I also sorted alphabetically the includes on some source files.	2008-03-26 21:10:39 +00:00
Alvaro Herrera	78f02ca1f5	Rename snapmgmt.c/h to snapmgr.c/h, for consistency with other files. Per complaint from Tom Lane.	2008-03-26 18:48:59 +00:00
Alvaro Herrera	d43b085d57	Separate snapshot management code from tuple visibility code, create a snapmgmt.c file for the former. The header files have also been reorganized in three parts: the most basic snapshot definitions are now in a new file snapshot.h, and the also new snapmgmt.h keeps the definitions for snapmgmt.c. tqual.h has been reduced to the bare minimum. This patch is just a first step towards managing live snapshots within a transaction; there is no functionality change. Per my proposal to pgsql-patches on 20080318191940.GB27458@alvh.no-ip.org and subsequent discussion.	2008-03-26 16:20:48 +00:00
Alvaro Herrera	a9686591d7	We no longer need a snapshot set after opening the finishing transaction: this is redundant because autovacuum now always analyzes a single table per transaction.	2008-03-19 14:18:21 +00:00
Alvaro Herrera	adc4e1e635	Fix vacuum so that autovacuum is really not cancelled when doing an emergency job (i.e. to prevent Xid wraparound problems.) Bug reported by ITAGAKI Takahiro in 20080314103837.63D3.52131E4D@oss.ntt.co.jp, though I didn't use his patch.	2008-03-14 17:25:59 +00:00
Tom Lane	3fcc7e8e18	Reduce memory consumption during VACUUM of large relations, by using FSMPageData (6 bytes) instead of PageFreeSpaceInfo (8 or 16 bytes) for the temporary array of page-free-space information. Itagaki Takahiro	2008-03-10 02:04:10 +00:00
Alvaro Herrera	f78611bba4	Improve error messages emitted when VACUUM and ANALYZE skip a table. Per gripe from Clodoaldo Pinto Neto on Message-ID: <a595de7a0801060326qbfc790ax2a60573043c2e2be@mail.gmail.com>	2008-02-20 14:31:35 +00:00
Tom Lane	c931c07124	Repair VACUUM FULL bug introduced by HOT patch: the original way of calculating a page's initial free space was fine, and should not have been "improved" by letting PageGetHeapFreeSpace do it. VACUUM FULL is going to reclaim LP_DEAD line pointers later, so there is no need for a guard against the page being too full of line pointers, and having one risks rejecting pages that are perfectly good move destinations. This also exposed a second bug, which is that the empty_end_pages logic assumed that any page with no live tuples would get entered into the fraged_pages list automatically (by virtue of having more free space than the threshold in the do_frag calculation). This assumption certainly seems risky when a low fillfactor has been chosen, and even without tunable fillfactor I think it could conceivably fail on a page with many unused line pointers. So fix the code to force do_frag true when notup is true, and patch this part of the fix all the way back. Per report from Tomas Szepe.	2008-02-11 19:14:30 +00:00
Tom Lane	eedb068c0a	Make standard maintenance operations (including VACUUM, ANALYZE, REINDEX, and CLUSTER) execute as the table owner rather than the calling user, using the same privilege-switching mechanism already used for SECURITY DEFINER functions. The purpose of this change is to ensure that user-defined functions used in index definitions cannot acquire the privileges of a superuser account that is performing routine maintenance. While a function used in an index is supposed to be IMMUTABLE and thus not able to do anything very interesting, there are several easy ways around that restriction; and even if we could plug them all, there would remain a risk of reading sensitive information and broadcasting it through a covert channel such as CPU usage. To prevent bypassing this security measure, execution of SET SESSION AUTHORIZATION and SET ROLE is now forbidden within a SECURITY DEFINER context. Thanks to Itagaki Takahiro for reporting this vulnerability. Security: CVE-2007-6600	2008-01-03 21:23:15 +00:00
Bruce Momjian	9098ab9e32	Update copyrights in source tree to 2008.	2008-01-01 19:46:01 +00:00
Bruce Momjian	fdf5a5efb7	pgindent run for 8.3.	2007-11-15 21:14:46 +00:00
Alvaro Herrera	745c1b2c2a	Rearrange vacuum-related bits in PGPROC as a bitmask, to better support having several of them. Add two more flags: whether the process is executing an ANALYZE, and whether a vacuum is for Xid wraparound (which is obviously only set by autovacuum). Sneakily move the worker's recently-acquired PostAuthDelay to a more useful place.	2007-10-24 20:55:36 +00:00
Tom Lane	282d2a03dd	HOT updates. When we update a tuple without changing any of its indexed columns, and the new version can be stored on the same heap page, we no longer generate extra index entries for the new version. Instead, index searches follow the HOT-chain links to ensure they find the correct tuple version. In addition, this patch introduces the ability to "prune" dead tuples on a per-page basis, without having to do a complete VACUUM pass to recover space. VACUUM is still needed to clean up dead index entries, however. Pavan Deolasee, with help from a bunch of other people.	2007-09-20 17:56:33 +00:00
Tom Lane	6889303531	Redefine the lp_flags field of item pointers as having four states, rather than two independent bits (one of which was never used in heap pages anyway, or at least hadn't been in a very long time). This gives us flexibility to add the HOT notions of redirected and dead item pointers without requiring anything so klugy as magic values of lp_off and lp_len. The state values are chosen so that for the states currently in use (pre-HOT) there is no change in the physical representation.	2007-09-12 22:10:26 +00:00
Tom Lane	6bd4f401b0	Replace the former method of determining snapshot xmax --- to wit, calling ReadNewTransactionId from GetSnapshotData --- with a "latestCompletedXid" variable that is updated during transaction commit or abort. Since latestCompletedXid is written only in places that had to lock ProcArrayLock exclusively anyway, and is read only in places that had to lock ProcArrayLock shared anyway, it adds no new locking requirements to the system despite being cluster-wide. Moreover, removing ReadNewTransactionId from snapshot acquisition eliminates the need to take both XidGenLock and ProcArrayLock at the same time. Since XidGenLock is sometimes held across I/O this can be a significant win. Some preliminary benchmarking suggested that this patch has no effect on average throughput but can significantly improve the worst-case transaction times seen in pgbench. Concept by Florian Pflug, implementation by Tom Lane.	2007-09-08 20:31:15 +00:00
Tom Lane	295e63983d	Implement lazy XID allocation: transactions that do not modify any database rows will normally never obtain an XID at all. We already did things this way for subtransactions, but this patch extends the concept to top-level transactions. In applications where there are lots of short read-only transactions, this should improve performance noticeably; not so much from removal of the actual XID-assignments, as from reduction of overhead that's driven by the rate of XID consumption. We add a concept of a "virtual transaction ID" so that active transactions can be uniquely identified even if they don't have a regular XID. This is a much lighter-weight concept: uniqueness of VXIDs is only guaranteed over the short term, and no on-disk record is made about them. Florian Pflug, with some editorialization by Tom.	2007-09-05 18:10:48 +00:00
Tom Lane	647fd9a108	Fix two bugs induced in VACUUM FULL by async-commit patch. First, we cannot assume that XLogAsyncCommitFlush guarantees hint bits will be settable, because clog.c's inexact LSN bookkeeping results in windows where a previously flushed transaction is considered unhintable because it shares an LSN slot with a later unflushed transaction. But repair_frag requires XMIN_COMMITTED to be correct so that it can distinguish tuples moved by the current vacuum. Since not being able to set the bit is an uncommon corner case, the most practical way of dealing with it seems to be to abandon shrinking (ie, don't invoke repair_frag) when we find a non-dead tuple whose XMIN_COMMITTED bit couldn't be set. Second, it is possible for the same reason that a RECENTLY_DEAD tuple does not get its XMAX_COMMITTED bit set during scan_heap. But by the time repair_frag examines the tuple it might be possible to set the bit. We therefore must take buffer content lock when calling HeapTupleSatisfiesVacuum a second time, else we can get an Assert failure in SetBufferCommitInfoNeedsSave. This latter bug is latent in existing releases, but I think it cannot actually occur without async commit, since the first HeapTupleSatisfiesVacuum call should always have set the bit. So I'm not going to back-patch it. In passing, reduce the existing "cannot shrink relation" messages from NOTICE to LOG level. The new message must be no higher than LOG if we don't want unpredictable regression test failures, and consistency seems like a good idea. Also arrange that only one such message is reported per VACUUM FULL; in typical scenarios you could get spammed with many such messages, which seems a bit useless.	2007-08-13 19:08:26 +00:00
Tom Lane	4a78cdeb6b	Support an optional asynchronous commit mode, in which we don't flush WAL before reporting a transaction committed. Data consistency is still guaranteed (unlike setting fsync = off), but a crash may lose the effects of the last few transactions. Patch by Simon, some editorialization by Tom.	2007-08-01 22:45:09 +00:00
Alvaro Herrera	bd06ab29ae	Avoid having autovacuum run multiple ANALYZE commands in a single transaction, to prevent possible deadlock problems. Per request from Tom Lane.	2007-06-14 13:53:14 +00:00
Tom Lane	d526575f89	Make large sequential scans and VACUUMs work in a limited-size "ring" of buffers, rather than blowing out the whole shared-buffer arena. Aside from avoiding cache spoliation, this fixes the problem that VACUUM formerly tended to cause a WAL flush for every page it modified, because we had it hacked to use only a single buffer. Those flushes will now occur only once per ring-ful. The exact ring size, and the threshold for seqscans to switch into the ring usage pattern, remain under debate; but the infrastructure seems done. The key bit of infrastructure is a new optional BufferAccessStrategy object that can be passed to ReadBuffer operations; this replaces the former StrategyHintVacuum API. This patch also changes the buffer usage-count methodology a bit: we now advance usage_count when first pinning a buffer, rather than when last unpinning it. To preserve the behavior that a buffer's lifetime starts to decrease when it's released, the clock sweep code is modified to not decrement usage_count of pinned buffers. Work not done in this commit: teach GiST and GIN indexes to use the vacuum BufferAccessStrategy for vacuum-driven fetches. Original patch by Simon, reworked by Heikki and again by Tom.	2007-05-30 20:12:03 +00:00
Alvaro Herrera	3b0347b36e	Move the tuple freezing point in CLUSTER to a point further back in the past, to avoid losing useful Xid information in not-so-old tuples. This makes CLUSTER behave the same as VACUUM as far a tuple-freezing behavior goes (though CLUSTER does not yet advance the table's relfrozenxid). While at it, move the actual freezing operation in rewriteheap.c to a more appropriate place, and document it thoroughly. This part of the patch from Tom Lane.	2007-05-17 15:28:29 +00:00
Alvaro Herrera	e2a186b03c	Add a multi-worker capability to autovacuum. This allows multiple worker processes to be running simultaneously. Also, now autovacuum processes do not count towards the max_connections limit; they are counted separately from regular processes, and are limited by the new GUC variable autovacuum_max_workers. The launcher now has intelligence to launch workers on each database every autovacuum_naptime seconds, limited only on the max amount of worker slots available. Also, the global worker I/O utilization is limited by the vacuum cost-based delay feature. Workers are "balanced" so that the total I/O consumption does not exceed the established limit. This part of the patch was contributed by ITAGAKI Takahiro. Per discussion.	2007-04-16 18:30:04 +00:00
Tom Lane	d3ff180163	Fix a longstanding bug in VACUUM FULL's handling of update chains. The code did not expect that a DEAD tuple could follow a RECENTLY_DEAD tuple in an update chain, but because the OldestXmin rule for determining deadness is a simplification of reality, it is possible for this situation to occur (implying that the RECENTLY_DEAD tuple is in fact dead to all observers, but this patch does not attempt to exploit that). The code would follow a chain forward all the way, but then stop before a DEAD tuple when backing up, meaning that not all of the chain got moved. This could lead to copying the chain multiple times (resulting in duplicate copies of the live tuple at its end), or leaving dangling index entries behind (which, aside from generating warnings from later vacuums, creates a risk of wrong query results or bogus duplicate-key errors once the heap slot the index entry points to is repopulated). The fix is to recheck HeapTupleSatisfiesVacuum while following a chain forward, and to stop if a DEAD tuple is reached. Each contiguous group of RECENTLY_DEAD tuples will therefore be copied as a separate chain. The patch also adds a couple of extra sanity checks to verify correct behavior. Per report and test case from Pavan Deolasee.	2007-03-14 18:48:55 +00:00
Tom Lane	b9527e9840	First phase of plan-invalidation project: create a plan cache management module and teach PREPARE and protocol-level prepared statements to use it. In service of this, rearrange utility-statement processing so that parse analysis does not assume table schemas can't change before execution for utility statements (necessary because we don't attempt to re-acquire locks for utility statements when reusing a stored plan). This requires some refactoring of the ProcessUtility API, but it ends up cleaner anyway, for instance we can get rid of the QueryContext global. Still to do: fix up SPI and related code to use the plan cache; I'm tempted to try to make SQL functions use it too. Also, there are at least some aspects of system state that we want to ensure remain the same during a replan as in the original processing; search_path certainly ought to behave that way for instance, and perhaps there are others.	2007-03-13 00:33:44 +00:00
Tom Lane	2825337232	Fix vac_update_relstats to ensure it always sends a relcache inval message, even if none of the fields in the pg_class row change. This behavior is necessary to ensure other backends flush rd_targblock values that might point to truncated-away pages. We got this right pre-8.2 but it was broken by overoptimistic change to not write out the pg_class row if unchanged. Per report from Pavan Deolasee.	2007-03-08 17:03:31 +00:00
Alvaro Herrera	1820650934	Restructure autovacuum in two processes: a dummy process, which runs continuously, and requests vacuum runs of "autovacuum workers" to postmaster. The workers do the actual vacuum work. This allows for future improvements, like allowing multiple autovacuum jobs running in parallel. For now, the code keeps the original behavior of having a single autovac process at any time by sleeping until the previous worker has finished.	2007-02-15 23:23:23 +00:00
Tom Lane	23c4978e6c	Rename MaxTupleSize to MaxHeapTupleSize to clarify that it's not meant to describe the maximum size of index tuples (which is typically AM-dependent anyway); and consequently remove the bogus deduction for "special space" that was built into it. Adjust TOAST_TUPLE_THRESHOLD and TOAST_MAX_CHUNK_SIZE to avoid wasting two bytes per toast chunk, and to ensure that the calculation correctly tracks any future changes in page header size. The computation had been inaccurate in a way that didn't cause any harm except space wastage, but future changes could have broken it more drastically. Fix the calculation of BTMaxItemSize, which was formerly computed as 1 byte more than it could safely be. This didn't cause any harm in practice because it's only compared against maxalign'd lengths, but future changes in the size of page headers or btree special space could have exposed the problem. initdb forced because of change in TOAST_MAX_CHUNK_SIZE, which alters the storage of toast tables.	2007-02-05 04:22:18 +00:00
Bruce Momjian	8b4ff8b6a1	Wording cleanup for error messages. Also change can't -> cannot. Standard English uses "may", "can", and "might" in different ways: may - permission, "You may borrow my rake." can - ability, "I can lift that log." might - possibility, "It might rain today." Unfortunately, in conversational English, their use is often mixed, as in, "You may use this variable to do X", when in fact, "can" is a better choice. Similarly, "It may crash" is better stated, "It might crash".	2007-02-01 19:10:30 +00:00
Bruce Momjian	29dccf5fe0	Update CVS HEAD for 2007 copyright. Back branches are typically not back-stamped for this.	2007-01-05 22:20:05 +00:00
Tom Lane	48188e1621	Fix recently-understood problems with handling of XID freezing, particularly in PITR scenarios. We now WAL-log the replacement of old XIDs with FrozenTransactionId, so that such replacement is guaranteed to propagate to PITR slave databases. Also, rather than relying on hint-bit updates to be preserved, pg_clog is not truncated until all instances of an XID are known to have been replaced by FrozenTransactionId. Add new GUC variables and pg_autovacuum columns to allow management of the freezing policy, so that users can trade off the size of pg_clog against the amount of freezing work done. Revise the already-existing code that forces autovacuum of tables approaching the wraparound point to make it more bulletproof; also, revise the autovacuum logic so that anti-wraparound vacuuming is done per-table rather than per-database. initdb forced because of changes in pg_class, pg_database, and pg_autovacuum catalogs. Heikki Linnakangas, Simon Riggs, and Tom Lane.	2006-11-05 22:42:10 +00:00
Bruce Momjian	f99a569a2e	pgindent run for 8.2.	2006-10-04 00:30:14 +00:00
Tom Lane	9e936693a9	Fix free space map to correctly track the total amount of FSM space needed even when a single relation requires more than max_fsm_pages pages. Also, make VACUUM emit a warning in this case, since it likely means that VACUUM FULL or other drastic corrective measure is needed. Per reports from Jeff Frost and others of unexpected changes in the claimed max_fsm_pages need.	2006-09-21 20:31:22 +00:00
Tom Lane	2e5e856f6b	Marginal cleanup in arrangements for ensuring StrategyHintVacuum is cleared after an error during VACUUM. We have a PG_TRY block anyway around the only call sites, so just reset it in the CATCH clause instead of having AtEOXact_Buffers blindly do it during xact end. I think the old code was actively wrong for the case of a failure during ANALYZE inside a subtransaction --- the flag wouldn't get cleared until main transaction end. Probably not worth back-patching though.	2006-09-17 22:16:22 +00:00
Tom Lane	7aa772f03e	Now that we've rearranged relation open to get a lock before touching the rel, it's easy to get rid of the narrow race-condition window that used to exist in VACUUM and CLUSTER. Did some minor code-beautification work in the same area, too.	2006-08-18 16:09:13 +00:00
Tom Lane	09d3670df3	Change the relation_open protocol so that we obtain lock on a relation (table or index) before trying to open its relcache entry. This fixes race conditions in which someone else commits a change to the relation's catalog entries while we are in process of doing relcache load. Problems of that ilk have been reported sporadically for years, but it was not really practical to fix until recently --- for instance, the recent addition of WAL-log support for in-place updates helped. Along the way, remove pg_am.amconcurrent: all AMs are now expected to support concurrent update.	2006-07-31 20:09:10 +00:00
Alvaro Herrera	92c2ecc130	Modify snapshot definition so that lazy vacuums are ignored by other vacuums. This allows a OLTP-like system with big tables to continue regular vacuuming on small-but-frequently-updated tables while the big tables are being vacuumed. Original patch from Hannu Krossing, rewritten by Tom Lane and updated by me.	2006-07-30 02:07:18 +00:00
Bruce Momjian	e0522505bd	Remove 576 references of include files that were not needed.	2006-07-14 14:52:27 +00:00
Bruce Momjian	a22d76d96a	Allow include files to compile own their own. Strip unused include files out unused include files, and add needed includes to C files. The next step is to remove unused include files in C files.	2006-07-13 16:49:20 +00:00
Alvaro Herrera	d4cef0aa2a	Improve vacuum code to track minimum Xids per table instead of per database. To this end, add a couple of columns to pg_class, relminxid and relvacuumxid, based on which we calculate the pg_database columns after each vacuum. We now force all databases to be vacuumed, even template ones. A backend noticing too old a database (meaning pg_database.datminxid is in danger of falling behind Xid wraparound) will signal the postmaster, which in turn will start an autovacuum iteration to process the offending database. In principle this is only there to cope with frozen (non-connectable) databases without forcing users to set them to connectable, but it could force regular user database to go through a database-wide vacuum at any time. Maybe we should warn users about this somehow. Of course the real solution will be to use autovacuum all the time ;-) There are some additional improvements we could have in this area: for example the vacuum code could be smarter about not updating pg_database for each table when called by autovacuum, and do it only once the whole autovacuum iteration is done. I updated the system catalogs documentation, but I didn't modify the maintenance section. Also having some regression tests for this would be nice but it's not really a very straightforward thing to do. Catalog version bumped due to system catalog changes.	2006-07-10 16:20:52 +00:00
Tom Lane	b7b78d24f7	Code review for FILLFACTOR patch. Change WITH grammar as per earlier discussion (including making def_arg allow reserved words), add missed opt_definition for UNIQUE case. Put the reloptions support code in a less random place (I chose to make a new file access/common/reloptions.c). Eliminate header inclusion creep. Make the index options functions safely user-callable (seems like client apps might like to be able to test validity of options before trying to make an index). Reduce overhead for normal case with no options by allowing rd_options to be NULL. Fix some unmaintainably klugy code, including getting rid of Natts_pg_class_fixed at long last. Some stylistic cleanup too, and pay attention to keeping comments in sync with code. Documentation still needs work, though I did fix the omissions in catalogs.sgml and indexam.sgml.	2006-07-03 22:45:41 +00:00
Bruce Momjian	277807bd9e	Add FILLFACTOR to CREATE INDEX. ITAGAKI Takahiro	2006-07-02 02:23:23 +00:00
Tom Lane	3fdeb189e9	Clean up code associated with updating pg_class statistics columns (relpages/reltuples). To do this, create formal support in heapam.c for "overwrite" tuple updates (including xlog replay capability) and use that instead of the ad-hoc overwrites we'd been using in VACUUM and CREATE INDEX. Take the responsibility for updating stats during CREATE INDEX out of the individual index AMs, and do it where it belongs, in catalog/index.c. Aside from being more modular, this avoids having to update the same tuple twice in some paths through CREATE INDEX. It's probably not measurably faster, but for sure it's a lot cleaner than before.	2006-05-10 23:18:39 +00:00
Tom Lane	cb98e6fb8f	Create a syscache for pg_database-indexed-by-oid, and make use of it in various places that were previously doing ad hoc pg_database searches. This may speed up database-related privilege checks a little bit, but the main motivation is to eliminate the performance reason for having ReverifyMyDatabase do such a lot of stuff (viz, avoiding repeat scans of pg_database during backend startup). The locking reason for having that routine is about to go away, and it'd be good to have the option to break it up.	2006-05-03 22:45:26 +00:00
Tom Lane	e57345975c	Clean up API for ambulkdelete/amvacuumcleanup as per today's discussion. This formulation requires every AM to provide amvacuumcleanup, unlike before, but it's surely a whole lot cleaner. Also, add an 'amstorage' column to pg_am so that we can get rid of hardwired knowledge in DefineOpClass().	2006-05-02 22:25:10 +00:00
Teodor Sigaev	8a3631f8d8	GIN: Generalized Inverted iNdex. text[], int4[], Tsearch2 support for GIN.	2006-05-02 11:28:56 +00:00
Tom Lane	a8b8f4db23	Clean up WAL/buffer interactions as per my recent proposal. Get rid of the misleadingly-named WriteBuffer routine, and instead require routines that change buffer pages to call MarkBufferDirty (which does exactly what it says). We also require that they do so before calling XLogInsert; this takes care of the synchronization requirement documented in SyncOneBuffer. Note that because bufmgr takes the buffer content lock (in shared mode) while writing out any buffer, it doesn't matter whether MarkBufferDirty is executed before the buffer content change is complete, so long as the content change is completed before releasing exclusive lock on the buffer. So it's OK to set the dirtybit before we fill in the LSN. This eliminates the former kluge of needing to set the dirtybit in LockBuffer. Aside from making the code more transparent, we can also add some new debugging assertions, in particular that the caller of MarkBufferDirty must hold the buffer content lock, not merely a pin.	2006-03-31 23:32:07 +00:00
Bruce Momjian	f2f5b05655	Update copyright for 2006. Update scripts.	2006-03-05 15:59:11 +00:00
Tom Lane	fd267c1ebc	Skip ambulkdelete scan if there's nothing to delete and the index is not partial. None of the existing AMs do anything useful except counting tuples when there's nothing to delete, and we can get a tuple count from the heap as long as it's not a partial index. (hash actually can skip anyway because it maintains a tuple count in the index metapage.) GIST is not currently able to exploit this optimization because, due to failure to index NULLs, GIST is always effectively partial. Possibly we should fix that sometime. Simon Riggs w/ some review by Tom Lane.	2006-02-11 23:31:34 +00:00
Bruce Momjian	77bb65d3fc	Revert based on Tom's recommendation: > Allow VACUUM to complete faster by avoiding scanning the indexes when no > rows were removed from the heap by the VACUUM.	2006-02-11 17:14:09 +00:00
Bruce Momjian	bf324946b3	Allow VACUUM to complete faster by avoiding scanning the indexes when no rows were removed from the heap by the VACUUM. Simon Riggs	2006-02-11 16:59:09 +00:00
Tom Lane	d5db3abfb6	Modify pgstats code to reduce performance penalties from oversized stats data files: avoid creating stats hashtable entries for tables that aren't being touched except by vacuum/analyze, ensure that entries for dropped tables are removed promptly, and tweak the data layout to avoid storing useless struct padding. Also improve the performance of pgstat_vacuum_tabstat(), and make sure that autovacuum invokes it exactly once per autovac cycle rather than multiple times or not at all. This should cure recent complaints about 8.1 showing much higher stats I/O volume than was seen in 8.0. It'd still be a good idea to revisit the design with an eye to not re-writing the entire stats dataset every half second ... but that would be too much to backpatch, I fear.	2006-01-18 20:35:06 +00:00
Tom Lane	e0078ea22d	Fix another case in which autovacuum would fail while analyzing expressional indexes. Per report from Brian Hirt.	2006-01-04 19:16:24 +00:00
Bruce Momjian	436a2956d8	Re-run pgindent, fixing a problem where comment lines after a blank comment line where output as too long, and update typedefs for /lib directory. Also fix case where identifiers were used as variable names in the backend, but as typedefs in ecpg (favor the backend for indenting). Backpatch to 8.1.X.	2005-11-22 18:17:34 +00:00
Tom Lane	dd218ae7b0	Remove the t_datamcxt field of HeapTupleData. This was introduced for the convenience of tuptoaster.c and is no longer needed, so may as well get rid of some small amount of overhead.	2005-11-20 19:49:08 +00:00
Bruce Momjian	1dc3498251	Standard pgindent run for 8.1.	2005-10-15 02:49:52 +00:00
Tom Lane	12992ab37a	Separate out the VacRUsage stuff as an independent module, in preparation for using it for other things besides VACUUM.	2005-10-03 22:52:26 +00:00
Tom Lane	a453951dd9	Take exclusive buffer lock in scan_heap() to eliminate some corner cases in which invalid page data could be transiently written to disk by concurrent bgwriter activity. There doesn't seem any risk of loss of actual user data, but an empty page could possibly be left corrupt if a crash occurs before the correct data gets written out. Pointed out by Alvaro Herrera.	2005-09-22 17:32:58 +00:00
Tom Lane	35e9b1cc1e	Clean up a couple of ad-hoc computations of the maximum number of tuples on a page, as suggested by ITAGAKI Takahiro. Also, change a few places that were using some other estimates of max-items-per-page to consistently use MaxOffsetNumber. This is conservatively large --- we could have used the new MaxHeapTuplesPerPage macro, or a similar one for index tuples --- but those places are simply declaring a fixed-size buffer and assuming it will work, rather than actively testing for overrun. It seems safer to size these buffers in a way that can't overflow even if the page is corrupt.	2005-09-02 19:02:20 +00:00
Tom Lane	f57e3f4cf3	Repair problems with VACUUM destroying t_ctid chains too soon, and with insufficient paranoia in code that follows t_ctid links. (We must do both because even with VACUUM doing it properly, the intermediate state with a dangling t_ctid link is visible concurrently during lazy VACUUM, and could be seen afterwards if either type of VACUUM crashes partway through.) Also try to improve documentation about what's going on. Patch is a bit bulky because passing the XMAX information around required changing the APIs of some low-level heapam.c routines, but it's not conceptually very complicated. Per trouble report from Teodor and subsequent analysis. This needs to be back-patched, but I'll do that after 8.1 beta is out.	2005-08-20 00:40:32 +00:00
Tom Lane	5d5f1a79e6	Clean up a number of autovacuum loose ends. Make the stats collector track shared relations in a separate hashtable, so that operations done from different databases are counted correctly. Add proper support for anti-XID-wraparound vacuuming, even in databases that are never connected to and so have no stats entries. Miscellaneous other bug fixes. Alvaro Herrera, some additional fixes by Tom Lane.	2005-07-29 19:30:09 +00:00
Tom Lane	29094193f5	Integrate autovacuum functionality into the backend. There's still a few loose ends to be dealt with, but it seems to work. Alvaro Herrera, based on the contrib code by Matthew O'Connor.	2005-07-14 05:13:45 +00:00
Tom Lane	8563ccae2c	Simplify shared-memory lock data structures as per recent discussion: it is sufficient to track whether a backend holds a lock or not, and store information about transaction vs. session locks only in the inside-the-backend LocalLockTable. Since there can now be but one PROCLOCK per lock per backend, LockCountMyLocks() is no longer needed, thus eliminating some O(N^2) behavior when a backend holds many locks. Also simplify the LockAcquire/LockRelease API by passing just a 'sessionLock' boolean instead of a transaction ID. The previous API was designed with the idea that per-transaction lock holding would be important for subtransactions, but now that we have subtransactions we know that this is unwanted. While at it, add an 'isTempObject' parameter to LockAcquire to indicate whether the lock is being taken on a temp table. This is not used just yet, but will be needed shortly for two-phase commit.	2005-06-14 22:15:33 +00:00
Tom Lane	ee3b71f6bc	Split the shared-memory array of PGPROC pointers out of the sinval communication structure, and make it its own module with its own lock. This should reduce contention at least a little, and it definitely makes the code seem cleaner. Per my recent proposal.	2005-05-19 21:35:48 +00:00

1 2 3 4 5 ...

458 Commits