Commit Graph

536 Commits

Author SHA1 Message Date
Tom Lane cfab4f6541 Use SEP_CHAR consistently in forming XLOG pathnames. 2001-03-14 20:23:04 +00:00
Tom Lane 1b87e24c4a Change xlog page-header format to include StartUpID. Use the SUI to
detect case that next page in log came from an older run than the prior
page.  This avoids the necessity to re-zero the log after recovery from
a crash, which is good because we need not risk destroying valuable log
information.
This forces another initdb since yesterday :-(.  Need to get that log
reset utility done...
2001-03-13 20:32:37 +00:00
Tom Lane 4d14fe0048 XLOG (and related) changes:
* Store two past checkpoint locations, not just one, in pg_control.
  On startup, we fall back to the older checkpoint if the newer one
  is unreadable.  Also, a physical copy of the newest checkpoint record
  is kept in pg_control for possible use in disaster recovery (ie,
  complete loss of pg_xlog).  Also add a version number for pg_control
  itself.  Remove archdir from pg_control; it ought to be a GUC
  parameter, not a special case (not that it's implemented yet anyway).

* Suppress successive checkpoint records when nothing has been entered
  in the WAL log since the last one.  This is not so much to avoid I/O
  as to make it actually useful to keep track of the last two
  checkpoints.  If the things are right next to each other then there's
  not a lot of redundancy gained...

* Change CRC scheme to a true 64-bit CRC, not a pair of 32-bit CRCs
  on alternate bytes.  Polynomial borrowed from ECMA DLT1 standard.

* Fix XLOG record length handling so that it will work at BLCKSZ = 32k.

* Change XID allocation to work more like OID allocation.  (This is of
  dubious necessity, but I think it's a good idea anyway.)

* Fix a number of minor bugs, such as off-by-one logic for XLOG file
  wraparound at the 4 gig mark.

* Add documentation and clean up some coding infelicities; move file
  format declarations out to include files where planned contrib
  utilities can get at them.

* Checkpoint will now occur every CHECKPOINT_SEGMENTS log segments or
  every CHECKPOINT_TIMEOUT seconds, whichever comes first.  It is also
  possible to force a checkpoint by sending SIGUSR1 to the postmaster
  (undocumented feature...)

* Defend against kill -9 postmaster by storing shmem block's key and ID
  in postmaster.pid lockfile, and checking at startup to ensure that no
  processes are still connected to old shmem block (if it still exists).

* Switch backends to accept SIGQUIT rather than SIGUSR1 for emergency
  stop, for symmetry with postmaster and xlog utilities.  Clean up signal
  handling in bootstrap.c so that xlog utilities launched by postmaster
  will react to signals better.

* Standalone bootstrap now grabs lockfile in target directory, as added
  insurance against running it in parallel with live postmaster.
2001-03-13 01:17:06 +00:00
Tom Lane b109b03fea Repair a number of places that didn't bother to check whether PageAddItem
succeeds or not.  Revise rtree page split algorithm to take care about
making a feasible split --- ie, will the incoming tuple actually fit?
Failure to make a feasible split, combined with failure to notice the
failure, account for Jim Stone's recent bug report.  I suspect that
hash and gist indices may have the same type of bug, but at least now
we'll get error messages rather than silent failures if so.  Also clean
up rtree code to use Datum rather than char* where appropriate.
2001-03-07 21:20:26 +00:00
Tom Lane 9c9936587c Implement COMMIT_SIBLINGS parameter to allow pre-commit delay to occur
only if at least N other backends currently have open transactions.  This
is not a great deal of intelligence about whether a delay might be
profitable ... but it beats no intelligence at all.  Note that the default
COMMIT_DELAY is still zero --- this new code does nothing unless that
setting is changed.
Also, mark ENABLEFSYNC as a system-wide setting.  It's no longer safe to
allow that to be set per-backend, since we may be relying on some other
backend's fsync to have synced the WAL log.
2001-02-26 00:50:08 +00:00
Bruce Momjian 4f6c49fef0 Clean up index/btree comments/macros, as approved. 2001-02-22 21:48:49 +00:00
Hiroshi Inoue 50e3c60b95 Avoid 'FATAL: out of free buffers: time to abort !" error
during WAL recovery.  Recovery failure is always serious.
2001-02-22 08:59:40 +00:00
Tom Lane 57e0847180 Change default commit_delay to zero, update documentation. 2001-02-18 04:50:43 +00:00
Tom Lane 33cc5d8a4d Change s_lock to not use any zero-delay select() calls; these are just a
waste of cycles on single-CPU machines, and of dubious utility on multi-CPU
machines too.
Tweak s_lock_stuck so that caller can specify timeout interval, and
increase interval before declaring stuck spinlock for buffer locks and XLOG
locks.
On systems that have fdatasync(), use that rather than fsync() to sync WAL
log writes.  Ensure that WAL file is entirely allocated during XLogFileInit.
2001-02-18 04:39:42 +00:00
Tom Lane 059e361481 Although we can't support out-of-line TOAST storage in indexes (yet),
compressed storage works perfectly well.  Might as well have a coherent
strategy for applying it, rather than the haphazard store-what-you-get
approach that was in the code before.  The strategy I've set up here is
to attempt compression of any compressible index value exceeding
BLCKSZ/16, or about 500 bytes by default.
2001-02-15 20:57:01 +00:00
Vadim B. Mikheev 7e04843ba7 Comments about GetFreeXLBuffer().
GetFreeXLBuffer(): use Insert->LgwrResult instead of private LgwrResult
copy if it's more fresh (attempt to avoid acquiring info_lck/lgwr_lck).
2001-02-13 20:40:25 +00:00
Vadim B. Mikheev 35273825dc Removed abort() in XLogFileOpen. 2001-02-13 08:44:09 +00:00
Tom Lane 7fdca53711 When updating a tuple containing compressed-in-line fields, do not
decompress the existing fields unnecessarily.
2001-02-09 17:30:03 +00:00
Vadim B. Mikheev c19dadbf08 Runtime btree recovery is now ON by default. 2001-02-07 23:35:33 +00:00
Vadim B. Mikheev b18c09ee3a Runtime tree recovery is implemented, just testing is left -:) 2001-02-02 19:49:15 +00:00
Vadim B. Mikheev dca0762efc Couple additional functions to fix tree at runtime.
Need in one more function to handle "my bits moved..."
case. FixBTree is still FALSE.
2001-01-31 01:08:36 +00:00
Vadim B. Mikheev 598a12722a Call _bt_fixroot() from _bt_insertonpg. 2001-01-29 07:28:17 +00:00
Tom Lane 0d54d6ac44 Clean up handling of tuple descriptors so that result-tuple descriptors
allocated by plan nodes are not leaked at end of query.  This doesn't
really matter for normal queries, but it sure does for queries invoked
repetitively inside SQL functions.  Clean up some other grotty code
associated with tupdescs, and fix a few other memory leaks exposed by
tests with simple SQL functions.
2001-01-29 00:39:20 +00:00
Vadim B. Mikheev c6e6d292bc First step in attempt to fix tree at runtime: create upper levels
and new root page if old root one was splitted but new root page
wasn't created.
New code is protected by FixBTree bool flag setted to FALSE, so
nothing should be affected by this untested approach.
2001-01-26 01:24:31 +00:00
Bruce Momjian 623bf843d2 Change Copyright from PostgreSQL, Inc to PostgreSQL Global Development Group. 2001-01-24 19:43:33 +00:00
Tom Lane 4e27b308e2 Do _bt_wrtbuf() outside critical section, per discussion with Vadim 1/19. 2001-01-23 23:29:22 +00:00
Tom Lane 786f1a59cd Fix all the places that called heap_update() and heap_delete() without
bothering to check the return value --- which meant that in case the
update or delete failed because of a concurrent update, you'd not find
out about it, except by observing later that the transaction produced
the wrong outcome.  There are now subroutines simple_heap_update and
simple_heap_delete that should be used anyplace that you're not prepared
to do the full nine yards of coping with concurrent updates.  In
practice, that seems to mean absolutely everywhere but the executor,
because *noplace* else was checking.
2001-01-23 04:32:23 +00:00
Tom Lane 6ce0ed2813 Make critical sections (elog->crash) and interrupt holdoff sections
into distinct concepts, per recent discussion on pghackers.
2001-01-19 22:08:47 +00:00
Vadim B. Mikheev 0a12767004 Comment out xlrec in xact_redo - no support for file unlinking on
commit yet.
2001-01-18 18:33:45 +00:00
Tom Lane efd6cade83 Tweak heap_update/delete so that we do not hold the buffer context lock
on the old tuple's page while we are doing TOAST pushups.
2001-01-15 05:29:19 +00:00
Tom Lane 36839c1927 Restructure backend SIGINT/SIGTERM handling so that 'die' interrupts
are treated more like 'cancel' interrupts: the signal handler sets a
flag that is examined at well-defined spots, rather than trying to cope
with an interrupt that might happen anywhere.  See pghackers discussion
of 1/12/01.
2001-01-14 05:08:17 +00:00
Tom Lane 6162432de9 Add more critical-section calls: all code sections that hold spinlocks
are now critical sections, so as to ensure die() won't interrupt us while
we are munging shared-memory data structures.  Avoid insecure intermediate
states in some code that proc_exit will call, like palloc/pfree.  Rename
START/END_CRIT_CODE to START/END_CRIT_SECTION, since that seems to be
what people tend to call them anyway, and make them be called with () like
a function call, in hopes of not confusing pg_indent.
I doubt that this is sufficient to make SIGTERM safe anywhere; there's
just too much code that could get invoked during proc_exit().
2001-01-12 21:54:01 +00:00
Marc G. Fournier 0ad7db4be4 New feature:
1. Support of variable size keys - new algorithm of insertion to tree
      (GLI - gist layrered insertion). Previous algorithm was implemented
      as described in paper by Joseph M. Hellerstein et.al
      "Generalized Search Trees for Database Systems".  This (old)
      algorithm was not suitable for variable size keys and could be
      not effective ( walking up-down ) in case of multiple levels split
Bug fixed:
   1. fixed bug in gistPageAddItem - key values were written to disk
      uncompressed. This caused failure if decompression function
      does real job.
   2. NULLs handling - we keep NULLs in tree. Right way is to remove them,
      but we don't know how to inform vacuum about index statistics. This is
      just cosmetic warning message (like in case with R-Tree),
      but I'm not sure how to recognize real problem if we remove NULLs
      and suppress this warning as Tom suggested.
   3. various memory leaks

This work was done by Teodor Sigaev (teodor@stack.net) and
Oleg Bartunov (oleg@sai.msu.su).
2001-01-12 00:12:58 +00:00
Vadim B. Mikheev 4b59366e57 1. Checkpoint.undo may be after checkpoint itself:
- no more elog(STOP) in StartupXLOG();
   - both checkpoint' undo & redo are used to define
     oldest on-line log file.
2. Ability to pre-allocate a few log files at checkpoint time
   (wal_files option). Off by default.
2001-01-09 06:24:33 +00:00
Tom Lane a4ddbbd1a4 Correct nasty error in heap_update: it was releasing the buffer refcount
before calling RelationInvalidateHeapTuple(), which is bad because the
latter needs to look at the tuple data, which is in the shared disk
buffer.  If another backend manages to recycle the buffer while this
is going on, we will compute the wrong hashindex for the tuple or
maybe even crash outright.  Must hold buffer refcount until afterwards.
(This bug is not in 7.0.*; seems to be have introduced during WAL changes.)
2001-01-07 22:14:31 +00:00
Tom Lane 1b8a219eef Clean up non-reentrant interface for hash_seq/HashTableWalk, so that
starting a new hashtable search no longer clobbers any other search
active anywhere in the system.  Fix RelationCacheInvalidate() so that
it will not crash or go into an infinite loop if invoked recursively,
as for example by a second SI Reset message arriving while we are still
processing a prior one.
2001-01-02 04:33:24 +00:00
Vadim B. Mikheev 3e059b3802 1. WAL needs in zero-ed content of newly initialized page.
2. Log record for PageRepaireFragmentation now keeps array
   of !LP_USED offnums to redo cleanup properly.
2000-12-30 15:19:57 +00:00
Vadim B. Mikheev c193f19a39 Fixed misprint in heap update WALoging. 2000-12-30 06:52:34 +00:00
Tom Lane 7f60b81e1a Fix failure in CreateCheckPoint on some Alpha boxes --- it's not OK to
assume that TAS() will always succeed the first time, even if the lock
is known to be free.  Also, make sure that code will eventually time out
and report a stuck spinlock, rather than looping forever.  Small cleanups
in s_lock.h, too.
2000-12-29 21:31:21 +00:00
Vadim B. Mikheev 7d363c4c33 MUST update (in-memory) data page BEFORE XLogInsert to log
NEW page content if WAL will decide to backup page.
2000-12-29 20:47:17 +00:00
Vadim B. Mikheev b3c4f03c9c nbtree_xlog_newroot: set meta flag in meta page opaque. 2000-12-29 08:08:59 +00:00
Vadim B. Mikheev 7ceeeb662f New WAL version - CRC and data blocks backup. 2000-12-28 13:00:29 +00:00
Tom Lane 8609d4abf2 Fix portability problems recently exposed by regression tests on Alphas.
1. Distinguish cases where a Datum representing a tuple datatype is an OID
from cases where it is a pointer to TupleTableSlot, and make sure we use
the right typlen in each case.
2. Make fetchatt() and related code support 8-byte by-value datatypes on
machines where Datum is 8 bytes.  Centralize knowledge of the available
by-value datatype sizes in two macros in tupmacs.h, so that this will be
easier if we ever have to do it again.
2000-12-27 23:59:14 +00:00
Tom Lane 6cc842abd3 Revise lock manager to support "session level" locks as well as "transaction
level" locks.  A session lock is not released at transaction commit (but it
is released on transaction abort, to ensure recovery after an elog(ERROR)).
In VACUUM, use a session lock to protect the master table while vacuuming a
TOAST table, so that the TOAST table can be done in an independent
transaction.

I also took this opportunity to do some cleanup and renaming in the lock
code.  The previously noted bug in ProcLockWakeup, that it couldn't wake up
any waiters beyond the first non-wakeable waiter, is now fixed.  Also found
a previously unknown bug of the same kind (failure to scan all members of
a lock queue in some cases) in DeadLockCheck.  This might have led to failure
to detect a deadlock condition, resulting in indefinite waits, but it's
difficult to characterize the conditions required to trigger a failure.
2000-12-22 00:51:54 +00:00
Bruce Momjian 1f159e562b >> Here is a patch for the beos port (All regression tests are OK).
>>     xlog.c : special case for beos to avoid 'link' which does not work yet
>>     beos/sem.c : implementation of new sem_ctl call (GETPID) and a new
>sem_op
>> flag (IPCNOWAIT)
>>     dynloader/beos.c : add a verification of symbol validity (seem that
the
>> loader sometime return OK with an invalid symbol)
>>     postmaster.c :  add beos forking support for the new checkpoint
process
>>     postgres.c : remove beos special case for getrusage
>>     beos.h : Correction of a bas definition of AF_UNIX, misc defnitions
>>
>>
>>     thanks
>>
>>
>>             cyril

Cyril VELTER
2000-12-18 18:45:05 +00:00
Tom Lane a626b78c89 Clean up backend-exit-time cleanup behavior. Use on_shmem_exit callbacks
to ensure that we have released buffer refcounts and so forth, rather than
putting ad-hoc operations before (some of the calls to) proc_exit.  Add
commentary to discourage future hackers from repeating that mistake.
2000-12-18 00:44:50 +00:00
Vadim B. Mikheev 5bb4f723d2 Remove elog for online log files. 2000-12-11 19:27:42 +00:00
Vadim B. Mikheev dae369d390 elog(LOG)-->elog(DEBUG) for skipped logs. 2000-12-11 18:02:25 +00:00
Hiroshi Inoue 6ef0219c34 Resolve complie error(was my fault). 2000-12-11 09:14:03 +00:00
Hiroshi Inoue a8824ff257 *redo: Heap move* neglects to set t_cmin for MOVED_IN tuples. 2000-12-11 05:25:23 +00:00
Tom Lane 376784cf8a Repair erroneous use of hashvarlena() for MACADDR, which is not a
varlena type.  (I did not force initdb, but you won't see the fix
unless you do one.)  Also, make sure all index support operators and
functions are careful not to leak memory for toasted inputs; I had
missed some hash and rtree support ops on this point before.
2000-12-08 23:57:03 +00:00
Tom Lane fb47385fc8 Resurrect -F switch: it controls fsyncs again, though the fsyncs are
mostly just on the WAL logfile nowadays.  But if people want to disable
fsync for performance, why should we say no?
2000-12-08 22:21:33 +00:00
Hiroshi Inoue 8bb4dab94d RecordTransactionAbort() shouldn't log XLOG_XACT_ABORT
if the transaction has already been committed ?
2000-12-07 10:03:46 +00:00
Tom Lane 06dde51ef0 Silence compiler warning. 2000-12-07 02:04:30 +00:00
Vadim B. Mikheev 65b362fae1 Disable elog(ERROR|FATAL) in signal handlers in
critical sections of code.
2000-12-03 10:27:29 +00:00