Commit Graph

376 Commits

Author SHA1 Message Date
Bruce Momjian fab177e64f Improve documentation on loading large data sets into plperl.
David Fetter
2005-08-12 21:42:53 +00:00
Tom Lane 3ae7e4a33b Remove BufferBlockPointers array in favor of a base + (bufnum) * BLCKSZ
computation.  On modern machines this is as fast if not faster, and we
don't have to clog the CPU's L2 cache with a tens-of-KB pointer array.
If we ever decide to adopt a more dynamic allocation method for shared
buffers, we'll probably have to revert this patch, but in the meantime
we might as well save a few bytes and nanoseconds.  Per Qingqing Zhou.
2005-08-12 05:05:51 +00:00
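
As a rough illustration of the address computation described in commit 3ae7e4a33b, here is a minimal C sketch. The helper name `block_address`, the `nbuffers` bounds check, and the 8192-byte value used for `BLCKSZ` are assumptions made for this example; it is not the actual bufmgr code.

```c
#include <assert.h>
#include <stddef.h>

#define BLCKSZ 8192				/* assumed block size for this sketch */

/*
 * Hypothetical helper: return the address of buffer number bufnum (0-based)
 * inside one contiguous arena that starts at base.  The address is computed
 * as base + bufnum * BLCKSZ, so no per-buffer pointer array is needed.
 */
static char *
block_address(char *base, size_t bufnum, size_t nbuffers)
{
	assert(bufnum < nbuffers);	/* reject out-of-range buffer numbers */
	return base + bufnum * (size_t) BLCKSZ;
}
```

A caller would pass the start of the shared buffer arena as `base`; the only per-lookup cost is one multiply and one add.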
Tom Lane 15269b5955 Avoid useless loop overhead in AtEOXact routines when the backend is
compiled with USE_ASSERT_CHECKING but is running with assert_enabled false.
2005-08-08 19:44:22 +00:00
Tom Lane 7117cd3a77 Cause ShutdownPostgres to do a normal transaction abort during backend
exit, instead of trying to take shortcuts.  Introduce some additional
shutdown callback routines to eliminate kluges like having ProcKill
be responsible for shutting down the buffer manager.  Ensure that the
order of operations during shutdown is predictable and what you would
expect given the module layering.
2005-08-08 03:12:16 +00:00
Tom Lane 6eac4e69cf Tweak BgBufferSync() so that a persistent write error on a dirty buffer
doesn't block the bgwriter from making progress writing out other buffers.
This was a hard problem in the context of the ARC/2Q design, but it's
trivial in the context of clock sweep ... just advance the sweep counter
before we try to write not after.
2005-08-02 20:52:08 +00:00
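
The "advance the sweep counter before we try to write" idea from commit 6eac4e69cf can be sketched roughly as follows. Everything here (`SketchBuffer`, `sketch_bg_pass`, the `write_fails` flag) is hypothetical and exists only for illustration; the early return stands in for the error exit that a failed write would cause in the real bgwriter.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified buffer state for this sketch only. */
typedef struct
{
	bool		dirty;			/* buffer needs writing */
	bool		write_fails;	/* simulates a persistent write error */
} SketchBuffer;

/*
 * One background-writer pass over at most max_visits buffers, starting at
 * *sweep.  Returns false if a write failed and the pass was abandoned.
 * Because *sweep is advanced BEFORE the write attempt, the next pass starts
 * after the failing buffer instead of retrying it forever.
 */
static bool
sketch_bg_pass(SketchBuffer *bufs, int nbuffers, int *sweep, int max_visits)
{
	assert(nbuffers > 0);
	for (int i = 0; i < max_visits; i++)
	{
		SketchBuffer *buf = &bufs[*sweep];

		*sweep = (*sweep + 1) % nbuffers;	/* advance first */
		if (!buf->dirty)
			continue;
		if (buf->write_fails)
			return false;		/* stands in for an error exit */
		buf->dirty = false;		/* write succeeded */
	}
	return true;
}
```

If the counter were advanced only after a successful write, every later pass would begin at the same failing buffer and never reach the others.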
Tom Lane e92a88272e Modify hash_search() API to prevent future occurrences of the error
spotted by Qingqing Zhou.  The HASH_ENTER action now automatically
fails with elog(ERROR) on out-of-memory --- which incidentally lets
us eliminate duplicate error checks in quite a bunch of places.  If
you really need the old return-NULL-on-out-of-memory behavior, you
can ask for HASH_ENTER_NULL.  But there is now an Assert in that path
checking that you aren't hoping to get that behavior in a palloc-based
hash table.
Along the way, remove the old HASH_FIND_SAVE/HASH_REMOVE_SAVED actions,
which were not being used anywhere anymore, and were surely too ugly
and unsafe to want to see revived again.
2005-05-29 04:23:07 +00:00
Tom Lane ee3b71f6bc Split the shared-memory array of PGPROC pointers out of the sinval
communication structure, and make it its own module with its own lock.
This should reduce contention at least a little, and it definitely makes
the code seem cleaner.  Per my recent proposal.
2005-05-19 21:35:48 +00:00
Tom Lane 354049c709 Remove unnecessary calls of FlushRelationBuffers: there is no need
to write out data that we are about to tell the filesystem to drop.
smgr_internal_unlink already had a DropRelFileNodeBuffers call to
get rid of dead buffers without a write after it's no longer possible
to roll back the deleting transaction.  Adding a similar call in
smgrtruncate simplifies callers and makes the overall division of
labor clearer.  This patch removes the former behavior that VACUUM
would write all dirty buffers of a relation unconditionally.
2005-03-20 22:00:54 +00:00
Tom Lane 91728fa26c Add temp_buffers GUC variable to allow users to determine the size
of the local buffer arena for temporary table access.
2005-03-19 23:27:11 +00:00
Tom Lane d65522aeb6 Upgrade localbuf.c to use a hash table instead of linear search to
find already-allocated local buffers.  This is the last obstacle
in the way of setting NLocBuffer to something reasonably large.
2005-03-19 17:39:43 +00:00
Tom Lane 88164799ce Need to reset local buffer pin counts, not only shared buffer pins,
before we attempt any file deletions in ShutdownPostgres.  Per Tatsuo.
2005-03-18 16:16:09 +00:00
Tom Lane cef01c3355 Avoid infinite loop in InvalidateBuffer if we ourselves are holding
a pin on the victim buffer.
2005-03-18 05:25:23 +00:00
Tom Lane 5d5087363d Replace the BufMgrLock with separate locks on the lookup hashtable and
the freelist, plus per-buffer spinlocks that protect access to individual
shared buffer headers.  This requires abandoning a global freelist (since
the freelist is a global contention point), which shoots down ARC and 2Q
as well as plain LRU management.  Adopt a clock sweep algorithm instead.
Preliminary results show substantial improvement in multi-backend situations.
2005-03-04 20:21:07 +00:00
Tom Lane cc4f58f4cd Ensure that all details of the ARC algorithm are hidden within freelist.c.
This refactoring does not change any algorithms or data structures; it just
removes visibility of the ARC data structures from other source files.
2005-02-03 23:29:19 +00:00
Tom Lane 0ce4d56924 Phase 1 of fix for 'SMgrRelation hashtable corrupted' problem. This
is the minimum required fix.  I want to look next at taking advantage of
it by simplifying the message semantics in the shared inval message queue,
but that part can be held over for 8.1 if it turns out too ugly.
2005-01-10 20:02:24 +00:00
Tom Lane c9d8edc906 Repair bufmgr deadlock problem reported by Michael Wildpaner. Must take
share lock on a buffer being written out before releasing BufMgrLock in
the BufferAlloc code path; if we do it later we might block on someone
who's re-pinned the buffer.  I believe this is only an issue for BufferAlloc
and not the other places that call FlushBuffer.  BufferSync must continue
to do it the old way since it may well be trying to write buffers that
other backends have pinned; but it should not be holding any conflicting
locks.  FlushRelationBuffers is okay since it's got exclusive lock at the
relation level.
2005-01-03 18:49:41 +00:00
PostgreSQL Daemon 2ff501590b Tag appropriate files for rc3
Also performed an initial run through of upgrading our Copyright date to
extend to 2005 ... first run here was very simple ... change everything
where: grep 1996-2004 && the word 'Copyright' ... scanned through the
generated list with 'less' first, and after, to make sure that I only
picked up the right entries ...
2004-12-31 22:04:05 +00:00
Neil Conway 4acc97d7e4 Assert that BufferIsPinned() in IncrBufferRefCount(), rather than using
a home-brewed combination of assertions that boiled down to the same
thing.
2004-11-24 02:56:17 +00:00
Tom Lane 4347cc2392 Allow background writing to be shut down by setting limit values to zero.
This does not disable the bgwriter process: it still has to wake up often
enough to collect fsync requests from backends in a timely fashion.  But
it responds to the recent gripe about not being able to prevent the disk
from being spun up constantly.
2004-10-17 22:01:51 +00:00
Tom Lane fdd13f1568 Give the ResourceOwner mechanism full responsibility for releasing buffer
pins at end of transaction, and reduce AtEOXact_Buffers to an Assert
cross-check that this was done correctly.  When not USE_ASSERT_CHECKING,
AtEOXact_Buffers is a complete no-op.  This gets rid of an O(NBuffers)
bottleneck during transaction commit/abort, which recent testing has shown
becomes significant above a few tens of thousands of shared buffers.
2004-10-16 18:57:26 +00:00
Tom Lane 1c2de47746 Remove BufferLocks[] array in favor of a single pointer to the buffer
(if any) currently waited for by LockBufferForCleanup(), which is all
that we were using it for anymore.  Saves some space and eliminates
proportional-to-NBuffers slowdown in UnlockBuffers().
2004-10-16 18:05:07 +00:00
Tom Lane 9ffc8ed58b Repair possible failure to update hint bits back to disk, per
http://archives.postgresql.org/pgsql-hackers/2004-10/msg00464.php.
This fix is intended to be permanent: it moves the responsibility for
calling SetBufferCommitInfoNeedsSave() into the tqual.c routines,
eliminating the requirement for callers to test whether t_infomask changed.
Also, tighten validity checking on buffer IDs in bufmgr.c --- several
routines were paranoid about out-of-range shared buffer numbers but not
about out-of-range local ones, which seems a tad pointless.
2004-10-15 22:40:29 +00:00
Tom Lane 8f9f198603 Restructure subtransaction handling to reduce resource consumption,
as per recent discussions.  Invent SubTransactionIds that are managed like
CommandIds (ie, counter is reset at start of each top transaction), and
use these instead of TransactionIds to keep track of subtransaction status
in those modules that need it.  This means that a subtransaction does not
need an XID unless it actually inserts/modifies rows in the database.
Accordingly, don't assign it an XID nor take a lock on the XID until it
tries to do that.  This saves a lot of overhead for subtransactions that
are only used for error recovery (eg plpgsql exceptions).  Also, arrange
to release a subtransaction's XID lock as soon as the subtransaction
exits, in both the commit and abort cases.  This avoids holding many
unique locks after a long series of subtransactions.  The price is some
additional overhead in XactLockTableWait, but that seems acceptable.
Finally, restructure the state machine in xact.c to have a more orthogonal
set of states for subtransactions.
2004-09-16 16:58:44 +00:00
Tom Lane eb917c1a21 I can't see any good reason for DropRelFileNodeBuffers to be issuing
FATAL when it detects a nonzero reference count.  Reduce to ERROR.
2004-09-06 17:31:32 +00:00
Tom Lane a421b4e850 FlushRelationBuffers was also being a bit cavalier about whether the
relation is already opened by smgr.
2004-08-31 16:13:06 +00:00
Bruce Momjian b6b71b85bc Pgindent run for 8.0. 2004-08-29 05:07:03 +00:00
Bruce Momjian da9a8649d8 Update copyright to 2004. 2004-08-29 04:13:13 +00:00
Tom Lane fe548629c5 Invent ResourceOwner mechanism as per my recent proposal, and use it to
keep track of portal-related resources separately from transaction-related
resources.  This allows cursors to work in a somewhat sane fashion with
nested transactions.  For now, cursor behavior is non-subtransactional,
that is a cursor's state does not roll back if you abort a subtransaction
that fetched from the cursor.  We might want to change that later.
2004-07-17 03:32:14 +00:00
Tom Lane 573a71a5da Nested transactions. There is still much left to do, especially on the
performance front, but with feature freeze upon us I think it's time to
drive a stake in the ground and say that this will be in 7.5.

Alvaro Herrera, with some help from Tom Lane.
2004-07-01 00:52:04 +00:00
Tom Lane 2467394ee1 Tablespaces. Alternate database locations are dead, long live tablespaces.
There are various things left to do: contrib dbsize and oid2name modules
need work, and so does the documentation.  Also someone should think about
COMMENT ON TABLESPACE and maybe RENAME TABLESPACE.  Also initlocation is
dead, it just doesn't know it yet.

Gavin Sherry and Tom Lane.
2004-06-18 06:14:31 +00:00
Tom Lane bbf0ebadaf StrategyDirtyBufferList wasn't being careful to honor max_buffers limit.
Bug is only latent given that sole caller is passing NBuffers, but it
could bite someone in the rear someday.
2004-06-11 17:20:39 +00:00
Tom Lane e6cba71503 Add some code to Assert that when we release pin on a buffer, we are
not holding the buffer's cntx_lock or io_in_progress_lock.  A recent
report from Litao Wu makes me wonder whether it is ever possible for
us to drop a buffer and forget to release its cntx_lock.  The Assert
does not fire in the regression tests, but that proves little ...
2004-06-11 16:43:24 +00:00
Tom Lane 921d749bd4 Adjust our timezone library to use pg_time_t (typedef'd as int64) in
place of time_t, as per prior discussion.  The behavior does not change
on machines without a 64-bit-int type, but on machines with one, which
is most, we are rid of the bizarre boundary behavior at the edges of
the 32-bit-time_t range (1901 and 2038).  The system will now treat
times over the full supported timestamp range as being in your local
time zone.  It may seem a little bizarre to consider that times in
4000 BC are PST or EST, but this is surely at least as reasonable as
propagating Gregorian calendar rules back that far.

I did not modify the format of the zic timezone database files, which
means that for the moment the system will not know about daylight-savings
periods outside the range 1901-2038.  Given the way the files are set up,
it's not a simple decision like 'widen to 64 bits'; we have to actually
think about the range of years that need to be supported.  We should
probably inquire what the plans of the upstream zic people are before
making any decisions of our own.
2004-06-03 02:08:07 +00:00
Tom Lane 91d20ff7aa Additional mop-up for sync-to-fsync changes: avoid issuing fsyncs for
temp tables, and avoid WAL-logging truncations of temp tables.  Do issue
fsync on truncated files (not sure this is necessary but it seems like
a good idea).
2004-05-31 20:31:33 +00:00
Tom Lane e674707968 Minor code rationalization: FlushRelationBuffers just returns void,
rather than an error code, and does elog(ERROR) not elog(WARNING)
when it detects a problem.  All callers were simply elog(ERROR)'ing on
failure return anyway, and I find it hard to envision a caller that would
not, so we may as well simplify the callers and produce the more useful
error message directly.
2004-05-31 19:24:05 +00:00
Tom Lane 9b178555fc Per previous discussions, get rid of use of sync(2) in favor of
explicitly fsync'ing every (non-temp) file we have written since the
last checkpoint.  In the vast majority of cases, the burden of the
fsyncs should fall on the bgwriter process not on backends.  (To this
end, we assume that an fsync issued by the bgwriter will force out
blocks written to the same file by other processes using other file
descriptors.  Anyone have a problem with that?)  This makes the world
safe for WIN32, which ain't even got sync(2), and really makes the world
safe for Unixen as well, because sync(2) never had the semantics we need:
it offers no way to wait for the requested I/O to finish.

Along the way, fix a bug I recently introduced in xlog recovery:
file truncation replay failed to clear bufmgr buffers for the dropped
blocks, which could result in 'PANIC:  heap_delete_redo: no block'
later on in xlog replay.
2004-05-31 03:48:10 +00:00
Tom Lane 076a055acf Separate out bgwriter code into a logically separate module, rather
than being random pieces of other files.  Give bgwriter responsibility
for all checkpoint activity (other than a post-recovery checkpoint);
so this child process absorbs the functionality of the former transient
checkpoint and shutdown subprocesses.  While at it, create an actual
include file for postmaster.c, which for some reason never had its own
file before.
2004-05-29 22:48:23 +00:00
Tom Lane 1a321f26d8 Code review for EXEC_BACKEND changes. Reduce the number of #ifdefs by
about a third, make it work on non-Windows platforms again.  (But perhaps
I broke the WIN32 code, since I have no way to test that.)  Fold all the
paths that fork postmaster child processes to go through the single
routine SubPostmasterMain, which takes care of resurrecting the state that
would normally be inherited from the postmaster (including GUC variables).
Clean up some places where there's no particularly good reason for the
EXEC and non-EXEC cases to work differently.  Take care of one or two
FIXMEs that remained in the code.
2004-05-28 05:13:32 +00:00
Tom Lane 4af3421161 Get rid of rd_nblocks field in relcache entries. Turns out this was
costing us lots more to maintain than it was worth.  On shared tables
it was of exactly zero benefit because we couldn't trust it to be
up to date.  On temp tables it sometimes saved an lseek, but not often
enough to be worth getting excited about.  And the real problem was that
we forced an lseek on every relcache flush in order to update the field.
So all in all it seems best to lose the complexity.
2004-05-08 19:09:25 +00:00
Neil Conway 0370951347 Tiny assorted fixes: correct a typo in a comment in vacuumlazy.c, remove
some unused #include directives from bufmgr.c, and clarify comments in
bufmgr.h and buf.h
2004-04-25 23:50:58 +00:00
Neil Conway 139abc2896 Make LocalRefCount and PrivateRefCount arrays of int32, rather than long.
This saves a small amount of per-backend memory for LP64 machines.
2004-04-22 07:21:55 +00:00
Tom Lane 95a03e9cdf Another round of code cleanup on bufmgr. Use BM_VALID flag to keep track
of whether we have successfully read data into a buffer; this makes the
error behavior a bit more transparent (IMHO anyway), and also makes it
work correctly for local buffers which don't use Start/TerminateBufferIO.
Collapse three separate functions for writing a shared buffer into one.
This overlaps a bit with cleanups that Neil proposed awhile back, but
seems not to have committed yet.
2004-04-21 18:06:30 +00:00
Tom Lane 011c3e62e7 Code review for ARC patch. Eliminate static variables, improve handling
of VACUUM cases so that VACUUM requests don't affect the ARC state at all,
avoid corner case where BufferSync would uselessly rewrite a buffer that
no longer contains the page that was to be flushed.  Make some minor
other cleanups in and around the bufmgr as well, such as moving PinBuffer
and UnpinBuffer into bufmgr.c where they really belong.
2004-04-19 23:27:17 +00:00
Tom Lane da99cce7cd Avoid delaying postmaster shutdown by up to 10 seconds on platforms
where signals do not terminate sleep() delays.
2004-02-12 20:07:26 +00:00
Jan Wieck fc65a3e1fd Fixed bug where FlushRelationBuffers() did call StrategyInvalidateBuffer()
for already empty buffers because their buffer tag was not cleared out
when the buffers have been invalidated before.

Also removed the misnamed BM_FREE bufhdr flag and replaced the checks,
which effectively ask if the buffer is unpinned, with checks against the
refcount field.

Jan
2004-02-12 15:06:56 +00:00
Tom Lane 58f337a343 Centralize implementation of delay code by creating a pg_usleep()
subroutine in src/port/pgsleep.c.  Remove platform dependencies from
miscadmin.h and put them in port.h where they belong.  Extend recent
vacuum cost-based-delay patch to apply to VACUUM FULL, ANALYZE, and
non-btree index vacuuming.

By the way, where is the documentation for the cost-based-delay patch?
2004-02-10 03:42:45 +00:00
Tom Lane 87bd956385 Restructure smgr API as per recent proposal. smgr no longer depends on
the relcache, and so the notion of 'blind write' is gone.  This should
improve efficiency in bgwriter and background checkpoint processes.
Internal restructuring in md.c to remove the not-very-useful array of
MdfdVec objects --- might as well just use pointers.
Also remove the long-dead 'persistent main memory' storage manager (mm.c),
since it seems quite unlikely to ever get resurrected.
2004-02-10 01:55:27 +00:00
Jan Wieck f425b605f4 Cost based vacuum delay feature.
Jan
2004-02-06 19:36:18 +00:00
Jan Wieck 8d09e25693 Backing out the background writer sync() option.
Jan
2004-02-04 01:24:53 +00:00
Bruce Momjian 5ee2ae2049 Remove sleep() and use single PG_SLEEP call for Win32 signal handling
and consistency.

Change PG_USLEEP to use SleepEx() for signal interruptibility.
2004-01-30 15:57:04 +00:00
Jan Wieck d77b63b17c Added GUC variable bgwriter_flush_method controlling the action
done by the background writer between writing dirty blocks and
napping.

    none (default)   no action
    sync             bgwriter calls smgrsync() causing a sync(2)

A global sync() is only good on dedicated database servers, so
more flush methods should be added in the future.

Jan
2004-01-24 20:00:46 +00:00
Jan Wieck dfdd59e918 Adjusted calculation of shared memory requirements to new
ARC buffer replacement strategy.

Jan
2004-01-15 16:14:26 +00:00
Bruce Momjian 38081fd000 Change PG_DELAY from msec to usec and use it consistently rather than
select().   Add Win32 Sleep() for delay.
2004-01-09 21:08:50 +00:00
Neil Conway 192ad63bd7 More janitorial work: remove the explicit casting of NULL literals to a
pointer type when it is not necessary to do so.

For future reference, casting NULL to a pointer type is only necessary
when (a) invoking a function AND either (b) the function has no prototype
OR (c) the function is a varargs function.
2004-01-07 18:56:30 +00:00
Tom Lane 16cc9dff4f bufmgr.c failed to compile on Darwin, because it didn't include
<sys/time.h> where struct timeval is defined.
2003-12-20 22:18:02 +00:00
Bruce Momjian d75b2ec4eb This patch is the next step towards (re)allowing fork/exec.
Claudio Natoli
2003-12-20 17:31:21 +00:00
Neil Conway fef0c8345a I posted some bufmgr cleanup a few weeks ago, but it conflicted with
some concurrent changes Jan was making to the bufmgr. Here's an
updated version of the patch -- it should apply cleanly to CVS
HEAD and passes the regression tests.

This patch makes the following changes:

     - remove the UnlockAndReleaseBuffer() and UnlockAndWriteBuffer()
       macros, and replace uses of them with calls to the appropriate
       functions.

     - remove a bunch of #ifdef BMTRACE code: it is ugly & broken
       (i.e. it doesn't compile)

     - make BufferReplace() return a bool, not an int

     - clean up some logic in bufmgr.c; should be functionally
       equivalent to the previous code, just cleaner now

     - remove the BM_PRIVATE flag as it is unused

     - improve a few comments, etc.
2003-12-14 00:34:47 +00:00
Tom Lane 0902ece5b9 Force zero_damaged_pages to be effectively ON during recovery from WAL,
since there is no need to worry about damaged pages when we are going to
overwrite them anyway from the WAL.  Per recent discussion.
2003-12-01 16:53:19 +00:00
PostgreSQL Daemon 969685ad44 $Header: -> $PostgreSQL Changes ... 2003-11-29 19:52:15 +00:00
Peter Eisentraut c9190ef074 Conditionalize variable that is only used conditionally, to avoid warning. 2003-11-27 18:12:50 +00:00
Tom Lane 0a97cb37fc Remove unused variable. 2003-11-21 17:41:31 +00:00
Jan Wieck cfeca62148 Background writer process
This first part of the background writer does no syncing at all.
Its only purpose is to keep the LRU heads clean so that regular
backends seldom to never have to call write().

Jan
2003-11-19 15:55:08 +00:00
Jan Wieck 1f45555892 Changed parameter name for shared cache status report interval to
debug_shared_buffers = <seconds>

as per previous discussion.


Jan
2003-11-16 16:41:01 +00:00
Jan Wieck 7c360d65a8 Added documentation for the new interface between the buffer manager
and the cache replacement strategy as well as a description of the
ARC algorithm and the special tailoring of that done for PostgreSQL.

Jan
2003-11-14 04:32:11 +00:00
Jan Wieck 6b86d62b00 2nd try for the ARC strategy.
I added a couple more Assertions while tracking down the exact
cause of the former bug.

All 93 regression tests pass now.

Jan
2003-11-13 14:57:15 +00:00
Jan Wieck 923e994d79 ARC strategy backed out ... sorry
Jan
2003-11-13 05:34:58 +00:00
Jan Wieck 48adc0b34b Replacement of the buffer replacement strategy with an ARC
algorithm adapted for PostgreSQL.

Jan
2003-11-13 00:40:02 +00:00
Tom Lane 4240d2bffd Update future-tense comments in README to present tense. Noted by
Neil Conway.
2003-10-31 22:48:08 +00:00
Peter Eisentraut feb4f44d29 Message editing: remove gratuitous variations in message wording, standardize
terms, add some clarifications, fix some untranslatable attempts at dynamic
message building.
2003-09-25 06:58:07 +00:00
Tom Lane ffafacc1f6 Repair potential deadlock created by recent changes to recycle btree
index pages: when _bt_getbuf asks the FSM for a free index page, it is
possible (and, in some cases, even moderately likely) that the answer
will be the same page that _bt_split is trying to split.  _bt_getbuf
already knew that the returned page might not be free, but it wasn't
prepared for the possibility that even trying to lock the page could
be problematic.  Fix by doing a conditional rather than unconditional
grab of the page lock.
2003-08-10 19:48:08 +00:00
Bruce Momjian f3c3deb7d0 Update copyrights to 2003. 2003-08-04 02:40:20 +00:00
Bruce Momjian 089003fb46 pgindent run. 2003-08-04 00:43:34 +00:00
Tom Lane cfa191f3b8 Error message editing in backend/storage. 2003-07-24 22:04:15 +00:00
Tom Lane a4e775a263 Make use of new error context stack mechanism to allow random errors
detected during buffer dump to be labeled with the buffer location.
For example, if a page LSN is clobbered, we now produce something like
ERROR:  XLogFlush: request 2C000000/8468EC8 is not satisfied --- flushed only
to 0/8468EF0
CONTEXT:  writing block 0 of relation 428946/566240
whereas before there was no convenient way to find out which page had
been trashed.
2003-05-10 19:04:30 +00:00
Tom Lane fd42262836 Add code to apply some simple sanity checks to the header fields of a
page when it's read in, per pghackers discussion around 17-Feb.  Add a
GUC variable zero_damaged_pages that causes the response to be a WARNING
followed by zeroing the page, rather than the normal ERROR; this is per
Hiroshi's suggestion that there needs to be a way to get at the data
in the rest of the table.
2003-03-28 20:17:13 +00:00
Bruce Momjian 48ee6f4916 This trivial patch removes the usage of some old statistics code that no
longer works -- IncrHeapAccessStat() didn't actually *do* anything
anymore, so no reason to keep it around AFAICS. I also fixed a
grammatical error in a comment.


Neil Conway
2003-02-13 05:35:11 +00:00
Tom Lane a2e8e15dd4 localbuf.c must be able to do blind writes. 2002-12-05 22:48:03 +00:00
Tom Lane 8a6fab412e Remove ShutdownBufferPoolAccess exit callback, and do the work in
ProcKill instead, where we still have a PGPROC with which to wait on
LWLocks.  This fixes 'can't wait without a PROC structure' failures
occasionally seen during backend shutdown (I'm surprised they weren't
more frequent, actually).  Add an Assert() to LWLockAcquire to help
catch any similar mistakes in future.  Fix failure to update MyProcPid
for standalone backends and pgstat processes.
2002-09-25 20:31:40 +00:00
Tom Lane c91b8bc537 Cosmetic fixes from Neil Conway. 2002-09-14 19:59:20 +00:00
Bruce Momjian e50f52a074 pgindent run. 2002-09-04 20:31:48 +00:00
Bruce Momjian 97ac103289 Remove sys/types.h in files that include postgres.h, and hence c.h,
because c.h has sys/types.h.
2002-09-02 02:47:07 +00:00
Bruce Momjian b1a5f87209 Tom Lane wrote:
> There's no longer a separate call to heap_storage_create in that routine
> --- the right place to make the test is now in the storage_create
> boolean parameter being passed to heap_create.  A simple change, but
> it passeth patch's understanding ...

Thanks.

Attached is a patch against cvs tip as of 8:30 PM PST or so. Turned out
that even after fixing the failed hunks, there was a new spot in
bufmgr.c which needed to be fixed (related to temp relations;
RelationUpdateNumberOfBlocks). But thankfully the regression test code
caught it :-)

Joe Conway
2002-08-15 16:36:08 +00:00
Tom Lane e44beef712 Code review of CLUSTER patch. Clean up problems with relcache getting
confused, toasted data getting lost, etc.
2002-08-11 21:17:35 +00:00
Tom Lane 5df307c778 Restructure local-buffer handling per recent pghackers discussion.
The local buffer manager is no longer used for newly-created relations
(unless they are TEMP); a new non-TEMP relation goes through the shared
bufmgr and thus will participate normally in checkpoints.  But TEMP relations
use the local buffer manager throughout their lifespan.  Also, operations
in TEMP relations are not logged in WAL, thus improving performance.
Since it's no longer necessary to fsync relations as they move out of the
local buffers into shared buffers, quite a lot of smgr.c/md.c/fd.c code
is no longer needed and has been removed: there's no concept of a dirty
relation anymore in md.c/fd.c, and we never fsync anything but WAL.
Still TODO: improve local buffer management algorithms so that it would
be reasonable to increase NLocBuffer.
2002-08-06 02:36:35 +00:00
Bruce Momjian 8864603f3c Minor code cleanup in bufmgr.c and bufmgr.h, mainly by moving repeated
lines of code into internal routines (drop_relfilenode_buffers,
release_buffer) and by hiding unused routines (PrintBufferDescs,
PrintPinnedBufs) behind #ifdef NOT_USED. Remove AbortBufferIO()
declaration from bufmgr.c (already declared in bufmgr.h)

Manfred Koizar
2002-07-02 05:47:37 +00:00
Bruce Momjian d84fe82230 Update copyright to 2002. 2002-06-20 20:29:54 +00:00
Bruce Momjian 6e8a1a6717 WriteBuffer return value:
>I'd vote for changing WriteBuffer to
>return void, and have it elog() on bad argument.

Manfred Koizar
2002-06-15 19:59:59 +00:00
Bruce Momjian 918e864f14 Remove some pre-WAL relics:
SharedBufferChanged
  BufferRelidLastDirtied
  BufferTagLastDirtied
  BufferDirtiedByMe

Manfred Koizar
2002-06-15 19:55:38 +00:00
Tom Lane 1a69a37d5b Fix obsolete comments. 2002-05-03 17:42:11 +00:00
Bruce Momjian 171824087c The patch I sent to -patches a little while ago wasn't applied: it
was in the thread "make BufferGetBlockNumber() a macro". Tom
objected to the original patch, so I prepared a new one which
doesn't change BufferGetBlockNumber() into a macro, it just
cleans up some comments and fixes an assertion. The patch
is attached.

Neil Conway
2002-04-15 23:47:12 +00:00
Bruce Momjian 92288a1cf9 Change made to elog:
o  Change all current CVS messages of NOTICE to WARNING.  We were going
to do this just before 7.3 beta but it has to be done now, as you will
see below.

o Change current INFO messages that should be controlled by
client_min_messages to NOTICE.

o Force remaining INFO messages, like from EXPLAIN, VACUUM VERBOSE, etc.
to always go to the client.

o Remove INFO from the client_min_messages options and add NOTICE.

Seems we do need three non-ERROR elog levels to handle the various
behaviors we need for these messages.

Regression passed.
2002-03-06 06:10:59 +00:00
Bruce Momjian a033daf566 Commit to match discussed elog() changes. Only update is that LOG is
now just below FATAL in server_min_messages.  Added more text to
highlight ordering difference between it and client_min_messages.

---------------------------------------------------------------------------

REALLYFATAL => PANIC
STOP => PANIC
New INFO level that prints to client by default
New LOG level that prints to server log by default
Cause VACUUM information to print only to the client
NOTICE => INFO where purely information messages are sent
DEBUG => LOG for purely server status messages
DEBUG removed, kept as backward compatible
DEBUG5, DEBUG4, DEBUG3, DEBUG2, DEBUG1 added
DebugLvl removed in favor of new DEBUG[1-5] symbols
New server_min_messages GUC parameter with values:
        DEBUG[5-1], INFO, NOTICE, ERROR, LOG, FATAL, PANIC
New client_min_messages GUC parameter with values:
        DEBUG[5-1], LOG, INFO, NOTICE, ERROR, FATAL, PANIC
Server startup now logged with LOG instead of DEBUG
Remove debug_level GUC parameter
elog() numbers now start at 10
Add test to print error message if older elog() values are passed to elog()
Bootstrap mode now has a -d that requires an argument, like postmaster
2002-03-02 21:39:36 +00:00
Tom Lane f6ee99a062 Clean up usage-statistics display code (ShowUsage and friends). StatFp
is gone, usage messages now go through elog(DEBUG).
2001-11-10 23:51:14 +00:00
Bruce Momjian ea08e6cd55 New pgindent run with fixes suggested by Tom. Patch manually reviewed,
initdb/regression tests pass.
2001-11-05 17:46:40 +00:00
Bruce Momjian 6783b2372e Another pgindent run. Fixes enum indenting, and improves #endif
spacing.  Also adds space for one-line comments.
2001-10-28 06:26:15 +00:00
Bruce Momjian b81844b173 pgindent run on all C files. Java run to follow. initdb/regression
tests pass.
2001-10-25 05:50:21 +00:00
Tom Lane 8a52b893b3 Further cleanup of dynahash.c API, in pursuit of portability and
readability.  Bizarre '(long *) TRUE' return convention is gone,
in favor of just raising an error internally in dynahash.c when
we detect hashtable corruption.  HashTableWalk is gone, in favor
of using hash_seq_search directly, since it had no hope of working
with non-LONGALIGNable datatypes.  Simplify some other code that was
made undesirably grotty by proximity to HashTableWalk.
2001-10-05 17:28:13 +00:00
Tom Lane 5999e78fc4 Another round of cleanups for dynahash.c (maybe it's finally clean of
portability issues).  Caller-visible data structures are now allocated
on MAXALIGN boundaries, allowing safe use of datatypes wider than 'long'.
Rejigger hash_create API so that caller specifies size of key and
total size of entry, not size of key and size of rest of entry.
This simplifies life considerably since each number is just a sizeof(),
and padding issues etc. are taken care of automatically.
2001-10-01 05:36:17 +00:00
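The sizing convention described in the commit above (key size and total entry size, each a plain sizeof()) can be sketched as follows. This is a standalone illustration, not the dynahash.c code: `DemoKey`, `DemoEntry`, `DemoHashInfo` and `demo_hash_info` are hypothetical names, and the sketch assumes the key is stored at the start of the entry.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical key type, for illustration only. */
typedef struct
{
    unsigned int rel;
    unsigned int block;
} DemoKey;

/* Hypothetical entry type: key first, then the rest of the entry. */
typedef struct
{
    DemoKey key;
    double  payload;    /* wider than 'long' on some platforms */
} DemoEntry;

/* Assumption of this sketch: the key sits at offset 0 of the entry. */
static_assert(offsetof(DemoEntry, key) == 0, "key must come first");

/* Hypothetical stand-in for the sizing fields a caller would fill in. */
typedef struct
{
    size_t keysize;     /* size of the key alone */
    size_t entrysize;   /* total size of the entry, key included */
} DemoHashInfo;

static DemoHashInfo
demo_hash_info(void)
{
    DemoHashInfo info;

    /* Each number is just a sizeof(); the compiler accounts for padding. */
    info.keysize = sizeof(DemoKey);
    info.entrysize = sizeof(DemoEntry);
    return info;
}
```

Because the entry size comes from sizeof(DemoEntry) rather than from adding the key size to the size of the remaining fields, any padding the compiler inserts between them is already included.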
Tom Lane 499abb0c0f Implement new 'lightweight lock manager' that's intermediate between
existing lock manager and spinlocks: it understands exclusive vs shared
lock but has few other fancy features.  Replace most uses of spinlocks
with lightweight locks.  All remaining uses of spinlocks have very short
lock hold times (a few dozen instructions), so tweak spinlock backoff
code to work efficiently given this assumption.  All per my proposal on
pghackers 26-Sep-01.
2001-09-29 04:02:27 +00:00
Tom Lane 90aebf7f52 Move s_lock.c and spin.c into lmgr subdirectory, which seems a much
more reasonable location for them.
2001-09-27 19:10:02 +00:00
Peter Eisentraut f45b7270b6 Whoops, wrong logic. 2001-08-29 11:54:12 +00:00
Peter Eisentraut dd225655b9 Change the conditionals so the mips + gcc code here doesn't apply for Irix.
The code in s_lock.h should get used.

report from Bruno Mattarollo <bruno@web1.greenpeace.org>
2001-08-28 15:04:27 +00:00
Tom Lane 2589735da0 Replace implementation of pg_log as a relation accessed through the
buffer manager with 'pg_clog', a specialized access method modeled
on pg_xlog.  This simplifies startup (don't need to play games to
open pg_log; among other things, OverrideTransactionSystem goes away),
should improve performance a little, and opens the door to recycling
commit log space by removing no-longer-needed segments of the commit
log.  Actual recycling is not there yet, but I felt I should commit
this part separately since it'd still be useful if we chose not to
do transaction ID wraparound.
2001-08-25 18:52:43 +00:00
Tom Lane 55432fedd2 Implement LockBufferForCleanup(), which will allow concurrent VACUUM
to wait until it's safe to remove tuples and compact free space in a
shared buffer page.  Miscellaneous small code cleanups in bufmgr, too.
2001-07-06 21:04:26 +00:00
Tom Lane a29f6c095c Make the found-a-buffer-when-we-were-expecting-to-extend-the-rel path
actually work.  It had been throwing an Assert as of my recent changes
to bufmgr.c, but was not really right even before that AFAICT.
2001-07-02 18:47:18 +00:00
Tom Lane af5ced9cfd Further work on connecting the free space map (which is still just a
stub) into the rest of the system.  Adopt a cleaner approach to preventing
deadlock in concurrent heap_updates: allow RelationGetBufferForTuple to
select any page of the rel, and put the onus on it to lock both buffers
in a consistent order.  Remove no-longer-needed isExtend hack from
API of ReleaseAndReadBuffer.
2001-06-29 21:08:25 +00:00
Jan Wieck 8d80b0d980 Statistical system views (yet without the config stuff, but
it's hard to keep such massive changes in sync with the tree
so I need to get it in and work from there now).

Jan
2001-06-22 19:16:24 +00:00
Tom Lane bdadc9bf1c Remove RelationGetBufferWithBuffer(), which is horribly confused about
appropriate pin-count manipulation, and instead use ReleaseAndReadBuffer.
Make use of the fact that the passed-in buffer (if there is one) must
be pinned to avoid grabbing the bufmgr spinlock when we are able to
return this same buffer.  Eliminate unnecessary 'previous tuple' and
'next tuple' fields of HeapScanDesc and IndexScanDesc, thereby removing
a whole lot of bookkeeping from heap_getnext() and related routines.
2001-06-09 18:16:59 +00:00
Tom Lane eedb7d18fa Modify RelationGetBufferForTuple() so that we only do lseek and lock
when we need to move to a new page; as long as we can insert the new
tuple on the same page as before, we only need LockBuffer and not the
expensive stuff.  Also, twiddle bufmgr interfaces to avoid redundant
lseeks in RelationGetBufferForTuple and BufferAlloc.  Successive inserts
now require one lseek per page added, rather than one per tuple with
several additional ones at each page boundary as happened before.
Lock contention when multiple backends are inserting in same table
is also greatly reduced.
2001-05-12 19:58:28 +00:00
Tom Lane 642107d5ba Avoid unnecessary lseek() calls by cleanups in md.c. mdfd_lstbcnt was
not being consulted anywhere, so remove it and remove the _mdnblocks()
calls that were used to set it.  Change smgrextend interface to pass in
the target block number (ie, current file length) --- the caller always
knows this already, having already done smgrnblocks(), so it's silly to
do it over again inside mdextend.  Net result: extension of a file now
takes one lseek(SEEK_END) and a write(), not three lseeks and a write.
2001-05-10 20:38:49 +00:00
Tom Lane ff71301806 Spell __volatile__ correctly. 2001-03-27 01:16:24 +00:00
Bruce Momjian 9e1552607a pgindent run. Make it all clean. 2001-03-22 04:01:46 +00:00
Vadim B. Mikheev ab36582a19 Check bufHdr->cntxDirty and call StartBufferIO in BufferSync()
*before* acquiring shlock on buffer context. This way we should be
protected against conflicts with FlushRelationBuffers.
(Seems we never do excl lock and then StartBufferIO for the same
buffer, so there should be no deadlock here, - but we'd better
check this very soon).
2001-03-21 10:13:29 +00:00
Tom Lane 496ea7a876 At least on HPUX, select with delay.tv_sec = 0 and delay.tv_usec = 1000000
does not lead to a one-second delay, but to an immediate EINVAL failure.
This causes CHECKPOINT to crash with s_lock_stuck much too quickly :-(.
Fix by breaking down the requested wait div/mod 1e6.
2001-02-24 22:42:45 +00:00
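A minimal sketch of the div/mod breakdown mentioned in the commit above, assuming a hypothetical helper named `split_delay` (this is not the actual s_lock.c code). The requested wait in microseconds is split so that tv_usec always stays below one million.

```c
#include <assert.h>
#include <sys/time.h>

/*
 * Hypothetical helper, for illustration only: convert a delay given in
 * microseconds into a struct timeval whose tv_usec is always < 1000000.
 */
static struct timeval
split_delay(long usec)
{
    struct timeval delay;

    assert(usec >= 0);
    delay.tv_sec = usec / 1000000L;     /* whole seconds */
    delay.tv_usec = usec % 1000000L;    /* leftover microseconds */
    return delay;
}
```

The result could then be passed as the timeout argument of select(0, NULL, NULL, NULL, &delay), so a one-second wait is requested as tv_sec = 1, tv_usec = 0 rather than tv_sec = 0, tv_usec = 1000000.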
Tom Lane 33cc5d8a4d Change s_lock to not use any zero-delay select() calls; these are just a
waste of cycles on single-CPU machines, and of dubious utility on multi-CPU
machines too.
Tweak s_lock_stuck so that caller can specify timeout interval, and
increase interval before declaring stuck spinlock for buffer locks and XLOG
locks.
On systems that have fdatasync(), use that rather than fsync() to sync WAL
log writes.  Ensure that WAL file is entirely allocated during XLogFileInit.
2001-02-18 04:39:42 +00:00
Bruce Momjian 623bf843d2 Change Copyright from PostgreSQL, Inc to PostgreSQL Global Development Group. 2001-01-24 19:43:33 +00:00
Tom Lane 6ce0ed2813 Make critical sections (elog->crash) and interrupt holdoff sections
into distinct concepts, per recent discussion on pghackers.
2001-01-19 22:08:47 +00:00
Bruce Momjian 75815c3100 cleanup. 2001-01-19 21:09:57 +00:00
Bruce Momjian 27aaf9df7e Remove ; and add \n to ASM code. 2001-01-19 20:39:16 +00:00
Tom Lane 36839c1927 Restructure backend SIGINT/SIGTERM handling so that 'die' interrupts
are treated more like 'cancel' interrupts: the signal handler sets a
flag that is examined at well-defined spots, rather than trying to cope
with an interrupt that might happen anywhere.  See pghackers discussion
of 1/12/01.
2001-01-14 05:08:17 +00:00
Tom Lane 6162432de9 Add more critical-section calls: all code sections that hold spinlocks
are now critical sections, so as to ensure die() won't interrupt us while
we are munging shared-memory data structures.  Avoid insecure intermediate
states in some code that proc_exit will call, like palloc/pfree.  Rename
START/END_CRIT_CODE to START/END_CRIT_SECTION, since that seems to be
what people tend to call them anyway, and make them be called with () like
a function call, in hopes of not confusing pg_indent.
I doubt that this is sufficient to make SIGTERM safe anywhere; there's
just too much code that could get invoked during proc_exit().
2001-01-12 21:54:01 +00:00
Tom Lane e2586c3c62 LockBuffer should not elog while holding buffer's cntx_lock. 2001-01-08 18:31:49 +00:00
Tom Lane 7f60b81e1a Fix failure in CreateCheckPoint on some Alpha boxes --- it's not OK to
assume that TAS() will always succeed the first time, even if the lock
is known to be free.  Also, make sure that code will eventually time out
and report a stuck spinlock, rather than looping forever.  Small cleanups
in s_lock.h, too.
2000-12-29 21:31:21 +00:00
Vadim B. Mikheev 7ceeeb662f New WAL version - CRC and data blocks backup. 2000-12-28 13:00:29 +00:00
Vadim B. Mikheev 369aace5f3 Avoid XLogFlush for clean buffers in BufferSync. 2000-12-22 20:04:43 +00:00
Tom Lane a626b78c89 Clean up backend-exit-time cleanup behavior. Use on_shmem_exit callbacks
to ensure that we have released buffer refcounts and so forth, rather than
putting ad-hoc operations before (some of the calls to) proc_exit.  Add
commentary to discourage future hackers from repeating that mistake.
2000-12-18 00:44:50 +00:00
Tom Lane 41fe2a2a03 Darwin porting patches from Peter Bierman <bierman@apple.com> 2000-12-11 00:49:54 +00:00
Vadim B. Mikheev 309112267f misc 2000-11-30 19:06:37 +00:00
Vadim B. Mikheev 8247f47fc7 Hope that this is valid localbuf.c version 2000-11-30 19:03:26 +00:00
Vadim B. Mikheev 81c8c244b2 No more #ifdef XLOG. 2000-11-30 08:46:26 +00:00
Tom Lane 680b7357ce Rearrange bufmgr header files so that buf_internals.h need not be
included by everything that includes bufmgr.h --- it's supposed to be
internals, after all, not part of the API!  This fixes the conflict
against FreeBSD headers reported by Rosenman, by making it unnecessary
for s_lock.h to be included by plperl.c.
2000-11-30 01:39:08 +00:00
Tom Lane c715fdea26 Significant cleanups in SysV IPC handling (shared mem and semaphores).
IPC key assignment will now work correctly even when multiple postmasters
are using same logical port number (which is possible given -k switch).
There is only one shared-mem segment per postmaster now, not 3.
Rip out broken code for non-TAS case in bufmgr and xlog, substitute a
complete S_LOCK emulation using semaphores in spin.c.  TAS and non-TAS
logic is now exactly the same.
When deadlock is detected, "Deadlock detected" is now the elog(ERROR)
message, rather than a NOTICE that comes out before an unhelpful ERROR.
2000-11-28 23:27:57 +00:00
Hiroshi Inoue 36933b4628 avoid opening view files. 2000-11-22 02:19:14 +00:00
Peter Eisentraut 2b1d8bd29a Include postgres.h before checking #ifdef XLOG. 2000-11-20 16:47:32 +00:00
Bruce Momjian 312063c97b Make pgsql compile on FreeBSD-alpha.
Context diff this time.

Remove -m486 compile args for FreeBSD-i386, compile -O2 on i386.

Compile with only -O on alpha for codegen safety.

Make the port use the TEST_AND_SET for alpha and i386 on FreeBSD.

Fix a lot of bogus string formats for outputting pointers (cast to int
and %u/%x replaced with no cast and %p), and 'Size'(size_t) are now
cast to 'unsigned long' and output with %lu/

Remove an unused variable.

Alfred Perlstein
2000-11-16 05:51:07 +00:00
Vadim B. Mikheev 92875e6f44 pg_fsync is fsync in WAL version. 2000-11-10 03:53:45 +00:00
Tom Lane 3908473c80 Make DROP TABLE rollback-able: postpone physical file delete until commit.
(WAL logging for this is not done yet, however.)  Clean up a number of really
crufty things that are no longer needed now that DROP behaves nicely.  Make
temp table mapper do the right things when drop or rename affecting a temp
table is rolled back.  Also, remove "relation modified while in use" error
check, in favor of locking tables at first reference and holding that lock
throughout the statement.
2000-11-08 22:10:03 +00:00
Vadim B. Mikheev 5b0740d3fc WAL 2000-10-28 16:21:00 +00:00
Vadim B. Mikheev 4b65a2840b New relcache hash table with RelFileNode as key to be used
from bufmgr - it would be nice to have separate hash in smgr
for node <--> fd mappings, but for the moment it's easy to
add new hash to relcache.
Fixed small bug in xlog.c:ReadRecord.
2000-10-23 04:10:24 +00:00
Tom Lane 3c5d000749 Fix incorrect logic for clearing BufferDirtiedByMe in ReleaseRelationBuffers
and DropBuffers.  Formerly we cleared the flag for each buffer currently
belonging to the target rel or database, but that's completely wrong!
Must look at BufferTagLastDirtied to see whether the BufferDirtiedByMe
flag is relevant to target rel or not; this is *independent* of the
current contents of the buffer.  Vadim spotted this problem, but his
fix was only partially correct...
2000-10-22 20:20:49 +00:00
Vadim B. Mikheev a7fcadd10a WAL 2000-10-21 15:43:36 +00:00
Vadim B. Mikheev b58c0411ba redo/undo support functions and cleanups. 2000-10-20 11:01:21 +00:00
Vadim B. Mikheev 2e6358172f I had to change buffer tag: now RelFileNode is used instead of
LockRelId - ie physical information, not logical. It's required
for WAL. Regression tests passed.
2000-10-18 05:50:16 +00:00
Vadim B. Mikheev 2c7de17b07 New file naming. Database OID is used as "tablespace" id and
relation OID is used as file node on creation but may be changed later
if required. Regression Tests Approved (c) -:)))
2000-10-16 14:52:28 +00:00
Hiroshi Inoue 5f18e2183e BufferAlloc() doesn't allocate write error buffers.
Remove compiler warning (my fault).
2000-09-29 03:55:45 +00:00
Hiroshi Inoue 77df055c54 avoid database-wide restart on write error 2000-09-29 01:23:47 +00:00
Tom Lane a8405cfc4d Acquire read lock on a buffer while writing it out, to prevent
concurrent modifications to the page by other backends.
2000-09-25 04:11:09 +00:00
Peter Eisentraut 424f0edcb8 Fix relative path references so that make knows which dependencies refer
to one another. Sort out builddir vs srcdir variable namings. Remove some
now obsoleted make variables.
2000-08-31 16:12:35 +00:00
Bruce Momjian 20ad43b576 Mark functions as static and ifdef NOT_USED as appropriate. 2000-06-08 22:38:00 +00:00
Peter Eisentraut 6a68f42648 The heralded `Grand Unified Configuration scheme' (GUC)
That means you can now set your options in either or all of $PGDATA/configuration,
some postmaster option (--enable-fsync=off), or a SET command. The list of
options is in backend/utils/misc/guc.c, documentation will be written post haste.

pg_options is gone, so is that pg_geqo config file. Also removed were backend -K,
-Q, and -T options (no longer applicable, although -d0 does the same as -Q).

Added to configure an --enable-syslog option.

changed all callers from TPRINTF to elog(DEBUG)
2000-05-31 00:28:42 +00:00