Commit Graph

401 Commits

Author SHA1 Message Date
Alvaro Herrera 3b0347b36e Move the tuple freezing point in CLUSTER to a point further back in the past,
to avoid losing useful Xid information in not-so-old tuples.  This makes
CLUSTER behave the same as VACUUM as far a tuple-freezing behavior goes
(though CLUSTER does not yet advance the table's relfrozenxid).

While at it, move the actual freezing operation in rewriteheap.c to a more
appropriate place, and document it thoroughly.  This part of the patch from
Tom Lane.
2007-05-17 15:28:29 +00:00
Alvaro Herrera e2a186b03c Add a multi-worker capability to autovacuum. This allows multiple worker
processes to be running simultaneously.  Also, now autovacuum processes do not
count towards the max_connections limit; they are counted separately from
regular processes, and are limited by the new GUC variable
autovacuum_max_workers.

The launcher now has intelligence to launch workers on each database every
autovacuum_naptime seconds, limited only on the max amount of worker slots
available.

Also, the global worker I/O utilization is limited by the vacuum cost-based
delay feature.  Workers are "balanced" so that the total I/O consumption does
not exceed the established limit.  This part of the patch was contributed by
ITAGAKI Takahiro.

Per discussion.
2007-04-16 18:30:04 +00:00
Tom Lane d3ff180163 Fix a longstanding bug in VACUUM FULL's handling of update chains. The code
did not expect that a DEAD tuple could follow a RECENTLY_DEAD tuple in an
update chain, but because the OldestXmin rule for determining deadness is a
simplification of reality, it is possible for this situation to occur
(implying that the RECENTLY_DEAD tuple is in fact dead to all observers,
but this patch does not attempt to exploit that).  The code would follow a
chain forward all the way, but then stop before a DEAD tuple when backing
up, meaning that not all of the chain got moved.  This could lead to copying
the chain multiple times (resulting in duplicate copies of the live tuple at
its end), or leaving dangling index entries behind (which, aside from
generating warnings from later vacuums, creates a risk of wrong query
results or bogus duplicate-key errors once the heap slot the index entry
points to is repopulated).

The fix is to recheck HeapTupleSatisfiesVacuum while following a chain
forward, and to stop if a DEAD tuple is reached.  Each contiguous group
of RECENTLY_DEAD tuples will therefore be copied as a separate chain.
The patch also adds a couple of extra sanity checks to verify correct
behavior.

Per report and test case from Pavan Deolasee.
2007-03-14 18:48:55 +00:00
Tom Lane b9527e9840 First phase of plan-invalidation project: create a plan cache management
module and teach PREPARE and protocol-level prepared statements to use it.
In service of this, rearrange utility-statement processing so that parse
analysis does not assume table schemas can't change before execution for
utility statements (necessary because we don't attempt to re-acquire locks
for utility statements when reusing a stored plan).  This requires some
refactoring of the ProcessUtility API, but it ends up cleaner anyway,
for instance we can get rid of the QueryContext global.

Still to do: fix up SPI and related code to use the plan cache; I'm tempted to
try to make SQL functions use it too.  Also, there are at least some aspects
of system state that we want to ensure remain the same during a replan as in
the original processing; search_path certainly ought to behave that way for
instance, and perhaps there are others.
2007-03-13 00:33:44 +00:00
Tom Lane 2825337232 Fix vac_update_relstats to ensure it always sends a relcache inval message,
even if none of the fields in the pg_class row change.  This behavior is
necessary to ensure other backends flush rd_targblock values that might
point to truncated-away pages.  We got this right pre-8.2 but it was broken
by overoptimistic change to not write out the pg_class row if unchanged.
Per report from Pavan Deolasee.
2007-03-08 17:03:31 +00:00
Alvaro Herrera 1820650934 Restructure autovacuum in two processes: a dummy process, which runs
continuously, and requests vacuum runs of "autovacuum workers" to postmaster.
The workers do the actual vacuum work.  This allows for future improvements,
like allowing multiple autovacuum jobs running in parallel.

For now, the code keeps the original behavior of having a single autovac
process at any time by sleeping until the previous worker has finished.
2007-02-15 23:23:23 +00:00
Tom Lane 23c4978e6c Rename MaxTupleSize to MaxHeapTupleSize to clarify that it's not meant to
describe the maximum size of index tuples (which is typically AM-dependent
anyway); and consequently remove the bogus deduction for "special space"
that was built into it.

Adjust TOAST_TUPLE_THRESHOLD and TOAST_MAX_CHUNK_SIZE to avoid wasting two
bytes per toast chunk, and to ensure that the calculation correctly tracks any
future changes in page header size.  The computation had been inaccurate in a
way that didn't cause any harm except space wastage, but future changes could
have broken it more drastically.

Fix the calculation of BTMaxItemSize, which was formerly computed as 1 byte
more than it could safely be.  This didn't cause any harm in practice because
it's only compared against maxalign'd lengths, but future changes in the size
of page headers or btree special space could have exposed the problem.

initdb forced because of change in TOAST_MAX_CHUNK_SIZE, which alters the
storage of toast tables.
2007-02-05 04:22:18 +00:00
Bruce Momjian 8b4ff8b6a1 Wording cleanup for error messages. Also change can't -> cannot.
Standard English uses "may", "can", and "might" in different ways:

        may - permission, "You may borrow my rake."

        can - ability, "I can lift that log."

        might - possibility, "It might rain today."

Unfortunately, in conversational English, their use is often mixed, as
in, "You may use this variable to do X", when in fact, "can" is a better
choice.  Similarly, "It may crash" is better stated, "It might crash".
2007-02-01 19:10:30 +00:00
Bruce Momjian 29dccf5fe0 Update CVS HEAD for 2007 copyright. Back branches are typically not
back-stamped for this.
2007-01-05 22:20:05 +00:00
Tom Lane 48188e1621 Fix recently-understood problems with handling of XID freezing, particularly
in PITR scenarios.  We now WAL-log the replacement of old XIDs with
FrozenTransactionId, so that such replacement is guaranteed to propagate to
PITR slave databases.  Also, rather than relying on hint-bit updates to be
preserved, pg_clog is not truncated until all instances of an XID are known to
have been replaced by FrozenTransactionId.  Add new GUC variables and
pg_autovacuum columns to allow management of the freezing policy, so that
users can trade off the size of pg_clog against the amount of freezing work
done.  Revise the already-existing code that forces autovacuum of tables
approaching the wraparound point to make it more bulletproof; also, revise the
autovacuum logic so that anti-wraparound vacuuming is done per-table rather
than per-database.  initdb forced because of changes in pg_class, pg_database,
and pg_autovacuum catalogs.  Heikki Linnakangas, Simon Riggs, and Tom Lane.
2006-11-05 22:42:10 +00:00
Bruce Momjian f99a569a2e pgindent run for 8.2. 2006-10-04 00:30:14 +00:00
Tom Lane 9e936693a9 Fix free space map to correctly track the total amount of FSM space needed
even when a single relation requires more than max_fsm_pages pages.  Also,
make VACUUM emit a warning in this case, since it likely means that VACUUM
FULL or other drastic corrective measure is needed.  Per reports from Jeff
Frost and others of unexpected changes in the claimed max_fsm_pages need.
2006-09-21 20:31:22 +00:00
Tom Lane 2e5e856f6b Marginal cleanup in arrangements for ensuring StrategyHintVacuum is cleared
after an error during VACUUM.  We have a PG_TRY block anyway around the only
call sites, so just reset it in the CATCH clause instead of having
AtEOXact_Buffers blindly do it during xact end.  I think the old code was
actively wrong for the case of a failure during ANALYZE inside a
subtransaction --- the flag wouldn't get cleared until main transaction end.
Probably not worth back-patching though.
2006-09-17 22:16:22 +00:00
Tom Lane 7aa772f03e Now that we've rearranged relation open to get a lock before touching
the rel, it's easy to get rid of the narrow race-condition window that
used to exist in VACUUM and CLUSTER.  Did some minor code-beautification
work in the same area, too.
2006-08-18 16:09:13 +00:00
Tom Lane 09d3670df3 Change the relation_open protocol so that we obtain lock on a relation
(table or index) before trying to open its relcache entry.  This fixes
race conditions in which someone else commits a change to the relation's
catalog entries while we are in process of doing relcache load.  Problems
of that ilk have been reported sporadically for years, but it was not
really practical to fix until recently --- for instance, the recent
addition of WAL-log support for in-place updates helped.

Along the way, remove pg_am.amconcurrent: all AMs are now expected to support
concurrent update.
2006-07-31 20:09:10 +00:00
Alvaro Herrera 92c2ecc130 Modify snapshot definition so that lazy vacuums are ignored by other
vacuums.  This allows a OLTP-like system with big tables to continue
regular vacuuming on small-but-frequently-updated tables while the
big tables are being vacuumed.

Original patch from Hannu Krossing, rewritten by Tom Lane and updated
by me.
2006-07-30 02:07:18 +00:00
Bruce Momjian e0522505bd Remove 576 references of include files that were not needed. 2006-07-14 14:52:27 +00:00
Bruce Momjian a22d76d96a Allow include files to compile own their own.
Strip unused include files out unused include files, and add needed
includes to C files.

The next step is to remove unused include files in C files.
2006-07-13 16:49:20 +00:00
Alvaro Herrera d4cef0aa2a Improve vacuum code to track minimum Xids per table instead of per database.
To this end, add a couple of columns to pg_class, relminxid and relvacuumxid,
based on which we calculate the pg_database columns after each vacuum.

We now force all databases to be vacuumed, even template ones.  A backend
noticing too old a database (meaning pg_database.datminxid is in danger of
falling behind Xid wraparound) will signal the postmaster, which in turn will
start an autovacuum iteration to process the offending database.  In principle
this is only there to cope with frozen (non-connectable) databases without
forcing users to set them to connectable, but it could force regular user
database to go through a database-wide vacuum at any time.  Maybe we should
warn users about this somehow.  Of course the real solution will be to use
autovacuum all the time ;-)

There are some additional improvements we could have in this area: for example
the vacuum code could be smarter about not updating pg_database for each table
when called by autovacuum, and do it only once the whole autovacuum iteration
is done.

I updated the system catalogs documentation, but I didn't modify the
maintenance section.  Also having some regression tests for this would be nice
but it's not really a very straightforward thing to do.

Catalog version bumped due to system catalog changes.
2006-07-10 16:20:52 +00:00
Tom Lane b7b78d24f7 Code review for FILLFACTOR patch. Change WITH grammar as per earlier
discussion (including making def_arg allow reserved words), add missed
opt_definition for UNIQUE case.  Put the reloptions support code in a less
random place (I chose to make a new file access/common/reloptions.c).
Eliminate header inclusion creep.  Make the index options functions safely
user-callable (seems like client apps might like to be able to test validity
of options before trying to make an index).  Reduce overhead for normal case
with no options by allowing rd_options to be NULL.  Fix some unmaintainably
klugy code, including getting rid of Natts_pg_class_fixed at long last.
Some stylistic cleanup too, and pay attention to keeping comments in sync
with code.

Documentation still needs work, though I did fix the omissions in
catalogs.sgml and indexam.sgml.
2006-07-03 22:45:41 +00:00
Bruce Momjian 277807bd9e Add FILLFACTOR to CREATE INDEX.
ITAGAKI Takahiro
2006-07-02 02:23:23 +00:00
Tom Lane 3fdeb189e9 Clean up code associated with updating pg_class statistics columns
(relpages/reltuples).  To do this, create formal support in heapam.c for
"overwrite" tuple updates (including xlog replay capability) and use that
instead of the ad-hoc overwrites we'd been using in VACUUM and CREATE INDEX.
Take the responsibility for updating stats during CREATE INDEX out of the
individual index AMs, and do it where it belongs, in catalog/index.c.  Aside
from being more modular, this avoids having to update the same tuple twice in
some paths through CREATE INDEX.  It's probably not measurably faster, but
for sure it's a lot cleaner than before.
2006-05-10 23:18:39 +00:00
Tom Lane cb98e6fb8f Create a syscache for pg_database-indexed-by-oid, and make use of it
in various places that were previously doing ad hoc pg_database searches.
This may speed up database-related privilege checks a little bit, but
the main motivation is to eliminate the performance reason for having
ReverifyMyDatabase do such a lot of stuff (viz, avoiding repeat scans
of pg_database during backend startup).  The locking reason for having
that routine is about to go away, and it'd be good to have the option
to break it up.
2006-05-03 22:45:26 +00:00
Tom Lane e57345975c Clean up API for ambulkdelete/amvacuumcleanup as per today's discussion.
This formulation requires every AM to provide amvacuumcleanup, unlike before,
but it's surely a whole lot cleaner.  Also, add an 'amstorage' column to
pg_am so that we can get rid of hardwired knowledge in DefineOpClass().
2006-05-02 22:25:10 +00:00
Teodor Sigaev 8a3631f8d8 GIN: Generalized Inverted iNdex.
text[], int4[], Tsearch2 support for GIN.
2006-05-02 11:28:56 +00:00
Tom Lane a8b8f4db23 Clean up WAL/buffer interactions as per my recent proposal. Get rid of the
misleadingly-named WriteBuffer routine, and instead require routines that
change buffer pages to call MarkBufferDirty (which does exactly what it says).
We also require that they do so before calling XLogInsert; this takes care of
the synchronization requirement documented in SyncOneBuffer.  Note that
because bufmgr takes the buffer content lock (in shared mode) while writing
out any buffer, it doesn't matter whether MarkBufferDirty is executed before
the buffer content change is complete, so long as the content change is
completed before releasing exclusive lock on the buffer.  So it's OK to set
the dirtybit before we fill in the LSN.
This eliminates the former kluge of needing to set the dirtybit in LockBuffer.
Aside from making the code more transparent, we can also add some new
debugging assertions, in particular that the caller of MarkBufferDirty must
hold the buffer content lock, not merely a pin.
2006-03-31 23:32:07 +00:00
Bruce Momjian f2f5b05655 Update copyright for 2006. Update scripts. 2006-03-05 15:59:11 +00:00
Tom Lane fd267c1ebc Skip ambulkdelete scan if there's nothing to delete and the index is not
partial.  None of the existing AMs do anything useful except counting
tuples when there's nothing to delete, and we can get a tuple count
from the heap as long as it's not a partial index.  (hash actually can
skip anyway because it maintains a tuple count in the index metapage.)
GIST is not currently able to exploit this optimization because, due to
failure to index NULLs, GIST is always effectively partial.  Possibly
we should fix that sometime.
Simon Riggs w/ some review by Tom Lane.
2006-02-11 23:31:34 +00:00
Bruce Momjian 77bb65d3fc Revert based on Tom's recommendation:
> Allow VACUUM to complete faster by avoiding scanning the indexes when no
> rows were removed from the heap by the VACUUM.
2006-02-11 17:14:09 +00:00
Bruce Momjian bf324946b3 Allow VACUUM to complete faster by avoiding scanning the indexes when no
rows were removed from the heap by the VACUUM.

Simon Riggs
2006-02-11 16:59:09 +00:00
Tom Lane d5db3abfb6 Modify pgstats code to reduce performance penalties from oversized stats data
files: avoid creating stats hashtable entries for tables that aren't being
touched except by vacuum/analyze, ensure that entries for dropped tables are
removed promptly, and tweak the data layout to avoid storing useless struct
padding.  Also improve the performance of pgstat_vacuum_tabstat(), and make
sure that autovacuum invokes it exactly once per autovac cycle rather than
multiple times or not at all.  This should cure recent complaints about 8.1
showing much higher stats I/O volume than was seen in 8.0.  It'd still be a
good idea to revisit the design with an eye to not re-writing the entire
stats dataset every half second ... but that would be too much to backpatch,
I fear.
2006-01-18 20:35:06 +00:00
Tom Lane e0078ea22d Fix another case in which autovacuum would fail while analyzing
expressional indexes.  Per report from Brian Hirt.
2006-01-04 19:16:24 +00:00
Bruce Momjian 436a2956d8 Re-run pgindent, fixing a problem where comment lines after a blank
comment line where output as too long, and update typedefs for /lib
directory.  Also fix case where identifiers were used as variable names
in the backend, but as typedefs in ecpg (favor the backend for
indenting).

Backpatch to 8.1.X.
2005-11-22 18:17:34 +00:00
Tom Lane dd218ae7b0 Remove the t_datamcxt field of HeapTupleData. This was introduced for
the convenience of tuptoaster.c and is no longer needed, so may as well
get rid of some small amount of overhead.
2005-11-20 19:49:08 +00:00
Bruce Momjian 1dc3498251 Standard pgindent run for 8.1. 2005-10-15 02:49:52 +00:00
Tom Lane 12992ab37a Separate out the VacRUsage stuff as an independent module, in preparation
for using it for other things besides VACUUM.
2005-10-03 22:52:26 +00:00
Tom Lane a453951dd9 Take exclusive buffer lock in scan_heap() to eliminate some corner cases
in which invalid page data could be transiently written to disk by
concurrent bgwriter activity.  There doesn't seem any risk of loss of
actual user data, but an empty page could possibly be left corrupt if a
crash occurs before the correct data gets written out.  Pointed out by
Alvaro Herrera.
2005-09-22 17:32:58 +00:00
Tom Lane 35e9b1cc1e Clean up a couple of ad-hoc computations of the maximum number of tuples
on a page, as suggested by ITAGAKI Takahiro.  Also, change a few places
that were using some other estimates of max-items-per-page to consistently
use MaxOffsetNumber.  This is conservatively large --- we could have used
the new MaxHeapTuplesPerPage macro, or a similar one for index tuples ---
but those places are simply declaring a fixed-size buffer and assuming it
will work, rather than actively testing for overrun.  It seems safer to
size these buffers in a way that can't overflow even if the page is
corrupt.
2005-09-02 19:02:20 +00:00
Tom Lane f57e3f4cf3 Repair problems with VACUUM destroying t_ctid chains too soon, and with
insufficient paranoia in code that follows t_ctid links.  (We must do both
because even with VACUUM doing it properly, the intermediate state with
a dangling t_ctid link is visible concurrently during lazy VACUUM, and
could be seen afterwards if either type of VACUUM crashes partway through.)
Also try to improve documentation about what's going on.  Patch is a bit
bulky because passing the XMAX information around required changing the
APIs of some low-level heapam.c routines, but it's not conceptually very
complicated.  Per trouble report from Teodor and subsequent analysis.
This needs to be back-patched, but I'll do that after 8.1 beta is out.
2005-08-20 00:40:32 +00:00
Tom Lane 5d5f1a79e6 Clean up a number of autovacuum loose ends. Make the stats collector
track shared relations in a separate hashtable, so that operations done
from different databases are counted correctly.  Add proper support for
anti-XID-wraparound vacuuming, even in databases that are never connected
to and so have no stats entries.  Miscellaneous other bug fixes.
Alvaro Herrera, some additional fixes by Tom Lane.
2005-07-29 19:30:09 +00:00
Tom Lane 29094193f5 Integrate autovacuum functionality into the backend. There's still a
few loose ends to be dealt with, but it seems to work.  Alvaro Herrera,
based on the contrib code by Matthew O'Connor.
2005-07-14 05:13:45 +00:00
Tom Lane 8563ccae2c Simplify shared-memory lock data structures as per recent discussion:
it is sufficient to track whether a backend holds a lock or not, and
store information about transaction vs. session locks only in the
inside-the-backend LocalLockTable.  Since there can now be but one
PROCLOCK per lock per backend, LockCountMyLocks() is no longer needed,
thus eliminating some O(N^2) behavior when a backend holds many locks.
Also simplify the LockAcquire/LockRelease API by passing just a
'sessionLock' boolean instead of a transaction ID.  The previous API
was designed with the idea that per-transaction lock holding would be
important for subtransactions, but now that we have subtransactions we
know that this is unwanted.  While at it, add an 'isTempObject' parameter
to LockAcquire to indicate whether the lock is being taken on a temp
table.  This is not used just yet, but will be needed shortly for
two-phase commit.
2005-06-14 22:15:33 +00:00
Tom Lane ee3b71f6bc Split the shared-memory array of PGPROC pointers out of the sinval
communication structure, and make it its own module with its own lock.
This should reduce contention at least a little, and it definitely makes
the code seem cleaner.  Per my recent proposal.
2005-05-19 21:35:48 +00:00
Tom Lane 278bd0cc22 For some reason access/tupmacs.h has been #including utils/memutils.h,
which is neither needed by nor related to that header.  Remove the bogus
inclusion and instead include the header in those C files that actually
need it.  Also fix unnecessary inclusions and bad inclusion order in
tsearch2 files.
2005-05-06 17:24:55 +00:00
Tom Lane bedb78d386 Implement sharable row-level locks, and use them for foreign key references
to eliminate unnecessary deadlocks.  This commit adds SELECT ... FOR SHARE
paralleling SELECT ... FOR UPDATE.  The implementation uses a new SLRU
data structure (managed much like pg_subtrans) to represent multiple-
transaction-ID sets.  When more than one transaction is holding a shared
lock on a particular row, we create a MultiXactId representing that set
of transactions and store its ID in the row's XMAX.  This scheme allows
an effectively unlimited number of row locks, just as we did before,
while not costing any extra overhead except when a shared lock actually
has to be shared.   Still TODO: use the regular lock manager to control
the grant order when multiple backends are waiting for a row lock.

Alvaro Herrera and Tom Lane.
2005-04-28 21:47:18 +00:00
Tom Lane 162bd08b3f Completion of project to use fixed OIDs for all system catalogs and
indexes.  Replace all heap_openr and index_openr calls by heap_open
and index_open.  Remove runtime lookups of catalog OID numbers in
various places.  Remove relcache's support for looking up system
catalogs by name.  Bulky but mostly very boring patch ...
2005-04-14 20:03:27 +00:00
Tom Lane 354049c709 Remove unnecessary calls of FlushRelationBuffers: there is no need
to write out data that we are about to tell the filesystem to drop.
smgr_internal_unlink already had a DropRelFileNodeBuffers call to
get rid of dead buffers without a write after it's no longer possible
to roll back the deleting transaction.  Adding a similar call in
smgrtruncate simplifies callers and makes the overall division of
labor clearer.  This patch removes the former behavior that VACUUM
would write all dirty buffers of a relation unconditionally.
2005-03-20 22:00:54 +00:00
Tom Lane f97aebd162 Revise TupleTableSlot code to avoid unnecessary construction and disassembly
of tuples when passing data up through multiple plan nodes.  A slot can now
hold either a normal "physical" HeapTuple, or a "virtual" tuple consisting
of Datum/isnull arrays.  Upper plan levels can usually just copy the Datum
arrays, avoiding heap_formtuple() and possible subsequent nocachegetattr()
calls to extract the data again.  This work extends Atsushi Ogawa's earlier
patch, which provided the key idea of adding Datum arrays to TupleTableSlots.
(I believe however that something like this was foreseen way back in Berkeley
days --- see the old comment on ExecProject.)  A test case involving many
levels of join of fairly wide tables (about 80 columns altogether) showed
about 3x overall speedup, though simple queries will probably not be
helped very much.

I have also duplicated some code in heaptuple.c in order to provide versions
of heap_formtuple and friends that use "bool" arrays to indicate null
attributes, instead of the old convention of "char" arrays containing either
'n' or ' '.  This provides a better match to the convention used by
ExecEvalExpr.  While I have not made a concerted effort to get rid of uses
of the old routines, I think they should be deprecated and eventually removed.
2005-03-16 21:38:10 +00:00
Tom Lane 5d5087363d Replace the BufMgrLock with separate locks on the lookup hashtable and
the freelist, plus per-buffer spinlocks that protect access to individual
shared buffer headers.  This requires abandoning a global freelist (since
the freelist is a global contention point), which shoots down ARC and 2Q
as well as plain LRU management.  Adopt a clock sweep algorithm instead.
Preliminary results show substantial improvement in multi-backend situations.
2005-03-04 20:21:07 +00:00
Tom Lane 0fc4ecf935 Finish up the flat-files project: get rid of GetRawDatabaseInfo() hack
in favor of looking at the flat file copy of pg_database during backend
startup.  This should finally eliminate the various corner cases in which
backend startup fails unexpectedly because it isn't able to distinguish
live and dead tuples in pg_database.  Simplify locking on pg_database
to be similar to the rules used with pg_shadow and pg_group, and eliminate
FlushRelationBuffers operations that were used only to reduce the odds
of failure of GetRawDatabaseInfo.
initdb forced due to addition of a trigger to pg_database.
2005-02-26 18:43:34 +00:00
Tom Lane 60b2444cc3 Add code to prevent transaction ID wraparound by enforcing a safe limit
in GetNewTransactionId().  Since the limit value has to be computed
before we run any real transactions, this requires adding code to database
startup to scan pg_database and determine the oldest datfrozenxid.
This can conveniently be combined with the first stage of an attack on
the problem that the 'flat file' copies of pg_shadow and pg_group are
not properly updated during WAL recovery.  The code I've added to
startup resides in a new file src/backend/utils/init/flatfiles.c, and
it is responsible for rewriting the flat files as well as initializing
the XID wraparound limit value.  This will eventually allow us to get
rid of GetRawDatabaseInfo too, but we'll need an initdb so we can add
a trigger to pg_database.
2005-02-20 02:22:07 +00:00
Bruce Momjian 3110cd237c Update comment on VACUUM FULL.
Manfred Koizar
2005-02-15 03:50:07 +00:00
PostgreSQL Daemon 2ff501590b Tag appropriate files for rc3
Also performed an initial run through of upgrading our Copyright date to
extend to 2005 ... first run here was very simple ... change everything
where: grep 1996-2004 && the word 'Copyright' ... scanned through the
generated list with 'less' first, and after, to make sure that I only
picked up the right entries ...
2004-12-31 22:04:05 +00:00
Tom Lane 08457504f0 Avoid memory leakage during VACUUM FULL when an index expression or
index predicate uses temporary memory for evaluation.  Per example
from Jean-Gerard Pailloncy.
2004-12-23 22:42:15 +00:00
Tom Lane e9c03c3b1b Disallow the combination VACUUM FULL FREEZE for safety's sake, for the
reasons I outlined in pghackers a few days ago.

Also, undo someone's overly optimistic decision to reduce tuple state
checks from if (...) elog() to Asserts.  If I trusted this code more,
I might think it was a good idea to disable these checks in production
installations.  But I don't.
2004-12-02 19:28:49 +00:00
Tom Lane 5374d097de Change planner to use the current true disk file size as its estimate of
a relation's number of blocks, rather than the possibly-obsolete value
in pg_class.relpages.  Scale the value in pg_class.reltuples correspondingly
to arrive at a hopefully more accurate number of rows.  When pg_class
contains 0/0, estimate a tuple width from the column datatypes and divide
that into current file size to estimate number of rows.  This improved
methodology allows us to jettison the ancient hacks that put bogus default
values into pg_class when a table is first created.  Also, per a suggestion
from Simon, make VACUUM (but not VACUUM FULL or ANALYZE) adjust the value
it puts into pg_class.reltuples to try to represent the mean tuple density
instead of the minimal density that actually prevails just after VACUUM.
These changes alter the plans selected for certain regression tests, so
update the expected files accordingly.  (I removed join_1.out because
it's not clear if it still applies; we can add back any variant versions
as they are shown to be needed.)
2004-12-01 19:00:56 +00:00
Tom Lane 9ffc8ed58b Repair possible failure to update hint bits back to disk, per
http://archives.postgresql.org/pgsql-hackers/2004-10/msg00464.php.
This fix is intended to be permanent: it moves the responsibility for
calling SetBufferCommitInfoNeedsSave() into the tqual.c routines,
eliminating the requirement for callers to test whether t_infomask changed.
Also, tighten validity checking on buffer IDs in bufmgr.c --- several
routines were paranoid about out-of-range shared buffer numbers but not
about out-of-range local ones, which seems a tad pointless.
2004-10-15 22:40:29 +00:00
Bruce Momjian 9da50e1f53 Back out unindented modification to file. 2004-10-07 14:19:58 +00:00
Bruce Momjian cdc84adbdb Indent comment pushed to new line by else so it is indented by BSD
indent.
2004-10-07 14:15:50 +00:00
Tom Lane d2af5f8a3e Adjust index locking rules as per my proposal of earlier today. You
now are supposed to take some kind of lock on an index whenever you
are going to access the index contents, rather than relying only on a
lock on the parent table.
2004-09-30 23:21:26 +00:00
Tom Lane b2c4071299 Redesign query-snapshot timing so that volatile functions in READ COMMITTED
mode see a fresh snapshot for each command in the function, rather than
using the latest interactive command's snapshot.  Also, suppress fresh
snapshots as well as CommandCounterIncrement inside STABLE and IMMUTABLE
functions, instead using the snapshot taken for the most closely nested
regular query.  (This behavior is only sane for read-only functions, so
the patch also enforces that such functions contain only SELECT commands.)
As per my proposal of 6-Sep-2004; I note that I floated essentially the
same proposal on 19-Jun-2002, but that discussion tailed off without any
action.  Since 8.0 seems like the right place to be taking possibly
nontrivial backwards compatibility hits, let's get it done now.
2004-09-13 20:10:13 +00:00
Bruce Momjian 15d3f9f6b7 Another pgindent run with lib typedefs added. 2004-08-30 02:54:42 +00:00
Bruce Momjian b6b71b85bc Pgindent run for 8.0. 2004-08-29 05:07:03 +00:00
Bruce Momjian da9a8649d8 Update copyright to 2004. 2004-08-29 04:13:13 +00:00
Tom Lane 4dbb880d3c Rearrange pg_subtrans handling as per recent discussion. pg_subtrans
updates are no longer WAL-logged nor even fsync'd; we do not need to,
since after a crash no old pg_subtrans data is needed again.  We truncate
pg_subtrans to RecentGlobalXmin at each checkpoint.  slru.c's API is
refactored a little bit to separate out the necessary decisions.
2004-08-23 23:22:45 +00:00
Bruce Momjian 0307c09cf5 Rename vacuum_cost_naptime to vacuum_cost_delay, with agreement from Jan. 2004-08-06 04:15:09 +00:00
Tom Lane a393fbf937 Restructure error handling as recently discussed. It is now really
possible to trap an error inside a function rather than letting it
propagate out to PostgresMain.  You still have to use AbortCurrentTransaction
to clean up, but at least the error handling itself will cooperate.
2004-07-31 00:45:57 +00:00
Tom Lane 2042b3428d Invent WAL timelines, as per recent discussion, to make point-in-time
recovery more manageable.  Also, undo recent change to add FILE_HEADER
and WASTED_SPACE records to XLOG; instead make the XLOG page header
variable-size with extra fields in the first page of an XLOG file.
This should fix the boundary-case bugs observed by Mark Kirkwood.
initdb forced due to change of XLOG representation.
2004-07-21 22:31:26 +00:00
Bruce Momjian 6f0d96f8c8 Fix unused variable when Assert() not used. 2004-07-20 22:56:29 +00:00
Tom Lane 573a71a5da Nested transactions. There is still much left to do, especially on the
performance front, but with feature freeze upon us I think it's time to
drive a stake in the ground and say that this will be in 7.5.

Alvaro Herrera, with some help from Tom Lane.
2004-07-01 00:52:04 +00:00
Bruce Momjian 5e926cbb6b vacuum.c refactoring
. rename variables
     . cur_buffer -> dst_buffer
     . ToPage -> dst_page
     . cur_page -> dst_vacpage
   . move variable declarations into block where variable is used
   . various Asserts instead of elog(ERROR, ...)
   . extract functionality from repair_frag() into new routines
     . move_chain_tuple()
     . move_plain_tuple()
     . update_hint_bits()
   . create type ExecContext
   . add comments

Manfred Koizar
2004-06-08 13:59:36 +00:00
Tom Lane c3a153afed Tweak palloc/repalloc to allow zero bytes to be requested, as per recent
proposal.  Eliminate several dozen now-unnecessary hacks to avoid palloc(0).
(It's likely there are more that I didn't find.)
2004-06-05 19:48:09 +00:00
Tom Lane e674707968 Minor code rationalization: FlushRelationBuffers just returns void,
rather than an error code, and does elog(ERROR) not elog(WARNING)
when it detects a problem.  All callers were simply elog(ERROR)'ing on
failure return anyway, and I find it hard to envision a caller that would
not, so we may as well simplify the callers and produce the more useful
error message directly.
2004-05-31 19:24:05 +00:00
Neil Conway d0b4399d81 Reimplement the linked list data structure used throughout the backend.
In the past, we used a 'Lispy' linked list implementation: a "list" was
merely a pointer to the head node of the list. The problem with that
design is that it makes lappend() and length() linear time. This patch
fixes that problem (and others) by maintaining a count of the list
length and a pointer to the tail node along with each head node pointer.
A "list" is now a pointer to a structure containing some meta-data
about the list; the head and tail pointers in that structure refer
to ListCell structures that maintain the actual linked list of nodes.

The function names of the list API have also been changed to, I hope,
be more logically consistent. By default, the old function names are
still available; they will be disabled-by-default once the rest of
the tree has been updated to use the new API names.
2004-05-26 04:41:50 +00:00
Tom Lane 4d86ae4260 For multi-table ANALYZE, use per-table transactions when possible
(ie, when not inside a transaction block), so that we can avoid holding
locks longer than necessary.  Per trouble report from Philip Warner.
2004-05-22 23:14:38 +00:00
Tom Lane e6319d1d28 Put back #include <sys/time.h> in files that seem to need it on Linux. 2004-05-21 16:08:47 +00:00
Tom Lane 4af3421161 Get rid of rd_nblocks field in relcache entries. Turns out this was
costing us lots more to maintain than it was worth.  On shared tables
it was of exactly zero benefit because we couldn't trust it to be
up to date.  On temp tables it sometimes saved an lseek, but not often
enough to be worth getting excited about.  And the real problem was that
we forced an lseek on every relcache flush in order to update the field.
So all in all it seems best to lose the complexity.
2004-05-08 19:09:25 +00:00
Tom Lane 7bed6df184 Check only live tuples for OID validity, to avoid issues after ALTER
TABLE.  Per recent discussion.
2004-02-12 05:39:55 +00:00
Tom Lane 58f337a343 Centralize implementation of delay code by creating a pg_usleep()
subroutine in src/port/pgsleep.c.  Remove platform dependencies from
miscadmin.h and put them in port.h where they belong.  Extend recent
vacuum cost-based-delay patch to apply to VACUUM FULL, ANALYZE, and
non-btree index vacuuming.

By the way, where is the documentation for the cost-based-delay patch?
2004-02-10 03:42:45 +00:00
Tom Lane 87bd956385 Restructure smgr API as per recent proposal. smgr no longer depends on
the relcache, and so the notion of 'blind write' is gone.  This should
improve efficiency in bgwriter and background checkpoint processes.
Internal restructuring in md.c to remove the not-very-useful array of
MdfdVec objects --- might as well just use pointers.
Also remove the long-dead 'persistent main memory' storage manager (mm.c),
since it seems quite unlikely to ever get resurrected.
2004-02-10 01:55:27 +00:00
Neil Conway 192ad63bd7 More janitorial work: remove the explicit casting of NULL literals to a
pointer type when it is not necessary to do so.

For future reference, casting NULL to a pointer type is only necessary
when (a) invoking a function AND either (b) the function has no prototype
OR (c) the function is a varargs function.
2004-01-07 18:56:30 +00:00
Neil Conway dfc7e7b71d Code cleanup, mostly in the smgr:
- Update comment in IsReservedName() to the present day

     - Improve some variable & function names in commands/vacuum.c. I
       was planning to rewrite this to avoid lappend(), but since I
       still intend to do the list rewrite, there's no need for that.

     - Update some smgr comments which seemed to imply that we still
       forced all dirty pages to disk at commit-time.

     - Replace some #ifdef DIAGNOSTIC code with assertions.

     - Make the distinction between OS-level file descriptors and
       virtual file descriptors a little clearer in a few comments

     - Other minor comment improvements in the smgr code
2004-01-06 18:07:32 +00:00
PostgreSQL Daemon 969685ad44 $Header: -> $PostgreSQL Changes ... 2003-11-29 19:52:15 +00:00
Jan Wieck 6b86d62b00 2nd try for the ARC strategy.
I added a couple more Assertions while tracking down the exact
cause of the former bug.

All 93 regression tests pass now.

Jan
2003-11-13 14:57:15 +00:00
Jan Wieck 923e994d79 ARC strategy backed out ... sorry
Jan
2003-11-13 05:34:58 +00:00
Jan Wieck 48adc0b34b Replacement of the buffer replacement strategy with an ARC
algorithm adopted for PostgreSQL.

Jan
2003-11-13 00:40:02 +00:00
Tom Lane fa5c8a055a Cross-data-type comparisons are now indexable by btrees, pursuant to my
pghackers proposal of 8-Nov.  All the existing cross-type comparison
operators (int2/int4/int8 and float4/float8) have appropriate support.
The original proposal of storing the right-hand-side datatype as part of
the primary key for pg_amop and pg_amproc got modified a bit in the event;
it is easier to store zero as the 'default' case and only store a nonzero
when the operator is actually cross-type.  Along the way, remove the
long-since-defunct bigbox_ops operator class.
2003-11-12 21:15:59 +00:00
Tom Lane c1d62bfd00 Add operator strategy and comparison-value datatype fields to ScanKey.
Remove the 'strategy map' code, which was a large amount of mechanism
that no longer had any use except reverse-mapping from procedure OID to
strategy number.  Passing the strategy number to the index AM in the
first place is simpler and faster.
This is a preliminary step in planned support for cross-datatype index
operations.  I'm committing it now since the ScanKeyEntryInitialize()
API change touches quite a lot of files, and I want to commit those
changes before the tree drifts under me.
2003-11-09 21:30:38 +00:00
Tom Lane 19c90bcc25 Add a bit more locking to vac_update_relstats and vac_update_dbstats
to make them comparable to what UpdateStats does in the same situation.
I'm not certain two instances of vac_update_relstats could run in
parallel for the same relation, but parallel invocations of vac_update_dbstats
do seem possible.
2003-10-02 23:19:44 +00:00
Peter Eisentraut 7438af96fa More message editing, some suggested by Alvaro Herrera 2003-09-29 00:05:25 +00:00
Peter Eisentraut feb4f44d29 Message editing: remove gratuitous variations in message wording, standardize
terms, add some clarifications, fix some untranslatable attempts at dynamic
message building.
2003-09-25 06:58:07 +00:00
Tom Lane a56a016ceb Repair some REINDEX problems per recent discussions. The relcache is
now able to cope with assigning new relfilenode values to nailed-in-cache
indexes, so they can be reindexed using the fully crash-safe method.  This
leaves only shared system indexes as special cases.  Remove the 'index
deactivation' code, since it provides no useful protection in the shared-
index case.  Require reindexing of shared indexes to be done in standalone
mode, but remove other restrictions on REINDEX.  -P (IgnoreSystemIndexes)
now prevents using indexes for lookups, but does not disable index updates.
It is therefore safe to allow from PGOPTIONS.  Upshot: reindexing system catalogs
can be done without a standalone backend for all cases except
shared catalogs.
2003-09-24 18:54:02 +00:00
Bruce Momjian f3c3deb7d0 Update copyrights to 2003. 2003-08-04 02:40:20 +00:00
Bruce Momjian 089003fb46 pgindent run. 2003-08-04 00:43:34 +00:00
Tom Lane da4ed8bfdd Another round of error message editing, covering backend/commands/. 2003-07-20 21:56:35 +00:00
Peter Eisentraut b256f24264 First batch of object rename commands. 2003-06-27 14:45:32 +00:00
Tom Lane fc8d970cbc Replace functional-index facility with expressional indexes. Any column
of an index can now be a computed expression instead of a simple variable.
Restrictions on expressions are the same as for predicates (only immutable
functions, no sub-selects).  This fixes problems recently introduced with
inlining SQL functions, because the inlining transformation is applied to
both expression trees so the planner can still match them up.  Along the
way, improve efficiency of handling index predicates (both predicates and
index expressions are now cached by the relcache) and fix 7.3 oversight
that didn't record dependencies of predicate expressions.
2003-05-28 16:04:02 +00:00
Bruce Momjian 98b6f37e47 Make debug_ GUC varables output DEBUG1 rather than LOG, and mention in
docs that CLIENT/LOG_MIN_MESSAGES now controls debug_* output location.
Doc changes included.
2003-05-27 17:49:47 +00:00
Tom Lane f85f43dfb5 Backend support for autocommit removed, per recent discussions. The
only remnant of this failed experiment is that the server will take
SET AUTOCOMMIT TO ON.  Still TODO: provide some client-side autocommit
logic in libpq.
2003-05-14 03:26:03 +00:00
Tom Lane de28dc9a04 Portal and memory management infrastructure for extended query protocol.
Both plannable queries and utility commands are now always executed
within Portals, which have been revamped so that they can handle the
load (they used to be good only for single SELECT queries).  Restructure
code to push command-completion-tag selection logic out of postgres.c,
so that it won't have to be duplicated between simple and extended queries.
initdb forced due to addition of a field to Query nodes.
2003-05-02 20:54:36 +00:00