Commit Graph

1061 Commits

Author SHA1 Message Date
Tom Lane
278bd0cc22 For some reason access/tupmacs.h has been #including utils/memutils.h,
which is neither needed by nor related to that header.  Remove the bogus
inclusion and instead include the header in those C files that actually
need it.  Also fix unnecessary inclusions and bad inclusion order in
tsearch2 files.
2005-05-06 17:24:55 +00:00
Tom Lane
126eaef651 Clean up MultiXactIdExpand's API by separating out the case where we
are creating a new MultiXactId from two regular XIDs.  The original
coding was unnecessarily complicated and didn't save any code anyway.
2005-05-03 19:42:41 +00:00
Bruce Momjian
76668e6eb4 Check the file system on postmaster startup and report any unreferenced
files in the server log.

Heikki Linnakangas
2005-05-02 18:26:54 +00:00
Tom Lane
6c412f0605 Change CREATE TYPE to require datatype output and send functions to have
only one argument.  (Per recent discussion, the option to accept multiple
arguments is pretty useless for user-defined types, and would be a likely
source of security holes if it was used.)  Simplify call sites of
output/send functions to not bother passing more than one argument.
2005-05-01 18:56:19 +00:00
Tom Lane
93b2477278 Use the standard lock manager to establish priority order when there
is contention for a tuple-level lock.  This solves the problem of a
would-be exclusive locker being starved out by an indefinite succession
of share-lockers.  Per recent discussion with Alvaro.
2005-04-30 19:03:33 +00:00
Tom Lane
3a694bb0a1 Restructure LOCKTAG as per discussions of a couple months ago.
Essentially, we shoehorn in a lockable-object-type field by taking
a byte away from the lockmethodid, which can surely fit in one byte
instead of two.  This allows less artificial definitions of all the
other fields of LOCKTAG; we can get rid of the special pg_xactlock
pseudo-relation, and also support locks on individual tuples and
general database objects (including shared objects).  None of those
possibilities are actually exploited just yet, however.

I removed pg_xactlock from pg_class, but did not force initdb for
that change.  At this point, relkind 's' (SPECIAL) is unused and
could be removed entirely.
2005-04-29 22:28:24 +00:00
Tom Lane
bedb78d386 Implement sharable row-level locks, and use them for foreign key references
to eliminate unnecessary deadlocks.  This commit adds SELECT ... FOR SHARE
paralleling SELECT ... FOR UPDATE.  The implementation uses a new SLRU
data structure (managed much like pg_subtrans) to represent multiple-
transaction-ID sets.  When more than one transaction is holding a shared
lock on a particular row, we create a MultiXactId representing that set
of transactions and store its ID in the row's XMAX.  This scheme allows
an effectively unlimited number of row locks, just as we did before,
while not costing any extra overhead except when a shared lock actually
has to be shared.   Still TODO: use the regular lock manager to control
the grant order when multiple backends are waiting for a row lock.

Alvaro Herrera and Tom Lane.
2005-04-28 21:47:18 +00:00
Tom Lane
19d127548c Add comment about checkpoint panic behavior during shutdown, per
suggestion from Qingqing Zhou.
2005-04-23 18:49:54 +00:00
Tom Lane
3842892492 Recent changes got the sense of the notnull bit backwards in the 2.0
protocol output routines.  Mea culpa :-(.  Per report from Kris Jurka.
2005-04-23 17:45:35 +00:00
Bruce Momjian
1a6ad669fb Fix comment typo. 2005-04-17 03:04:29 +00:00
Tom Lane
5f0a974ea9 Reduce PANIC to ERROR in several xlog routines that are used in both
critical and noncritical contexts (an example of noncritical being
post-checkpoint removal of dead xlog segments).  In the critical cases
the CRIT_SECTION mechanism will cause ERROR to be promoted to PANIC
anyway, and in the noncritical cases we shouldn't let an error take
down the entire database.  Arguably there should be *no* explicit
PANIC errors in this module, only more START/END_CRIT_SECTION calls,
but I didn't go that far.  (Yet.)
2005-04-15 22:19:48 +00:00
Tom Lane
61b861421b Modify MoveOfflineLogs/InstallXLogFileSegment to avoid O(N^2) behavior
when recycling a large number of xlog segments during checkpoint.
The former behavior searched from the same start point each time,
requiring O(checkpoint_segments^2) stat() calls to relocate all the
segments.  Instead keep track of where we stopped last time through.
2005-04-15 18:48:10 +00:00
Tom Lane
8e14408028 Make equalTupleDescs() compare attlen/attbyval/attalign rather than
assuming comparison of atttypid is sufficient.  In a dropped column
atttypid will be 0, and we'd better check the physical-storage data
to make sure the tupdescs are physically compatible.
I do not believe there is a real risk before 8.0, since before that
we only used this routine to compare successive states of the tupdesc
for a particular relation.  But 8.0's typcache.c might be comparing
arbitrary tupdescs so we'd better play it safer.
2005-04-14 22:34:48 +00:00
Tom Lane
162bd08b3f Completion of project to use fixed OIDs for all system catalogs and
indexes.  Replace all heap_openr and index_openr calls by heap_open
and index_open.  Remove runtime lookups of catalog OID numbers in
various places.  Remove relcache's support for looking up system
catalogs by name.  Bulky but mostly very boring patch ...
2005-04-14 20:03:27 +00:00
Tom Lane
2193a856a2 Simplify initdb-time assignment of OIDs as I proposed yesterday, and
avoid encroaching on the 'user' range of OIDs by allowing automatic
OID assignment to use values below 16k until we reach normal operation.

initdb not forced since this doesn't make any incompatible change;
however a lot of stuff will have different OIDs after your next initdb.
2005-04-13 18:54:57 +00:00
Tom Lane
c3294f1cbf Fix interaction between materializing holdable cursors and firing
deferred triggers: either one can create more work for the other,
so we have to loop till it's all gone.  Per example from andrew@supernews.
Add a regression test to help spot trouble in this area in future.
2005-04-11 19:51:16 +00:00
Tom Lane
ad161bcc8a Merge Resdom nodes into TargetEntry nodes to simplify code and save a
few palloc's.  I also chose to eliminate the restype and restypmod fields
entirely, since they are redundant with information stored in the node's
contained expression; re-examining the expression at need seems simpler
and more reliable than trying to keep restype/restypmod up to date.

initdb forced due to change in contents of stored rules.
2005-04-06 16:34:07 +00:00
Tom Lane
47888fe842 First phase of OUT-parameters project. We can now define and use SQL
functions with OUT parameters.  The various PLs still need work, as does
pg_dump.  Rudimentary docs and regression tests included.
2005-03-31 22:46:33 +00:00
Tom Lane
8c85a34a3b Officially decouple FUNC_MAX_ARGS from INDEX_MAX_KEYS, and set the
former to 100 by default.  Clean up some of the less necessary
dependencies on FUNC_MAX_ARGS; however, the biggie (FunctionCallInfoData)
remains.
2005-03-29 03:01:32 +00:00
Tom Lane
70c9763d48 Convert oidvector and int2vector into variable-length arrays. This
change saves a great deal of space in pg_proc and its primary index,
and it eliminates the former requirement that INDEX_MAX_KEYS and
FUNC_MAX_ARGS have the same value.  INDEX_MAX_KEYS is still embedded
in the on-disk representation (because it affects index tuple header
size), but FUNC_MAX_ARGS is not.  I believe it would now be possible
to increase FUNC_MAX_ARGS at little cost, but haven't experimented yet.
There are still a lot of vestigial references to FUNC_MAX_ARGS, which
I will clean up in a separate pass.  However, getting rid of it
altogether would require changing the FunctionCallInfoData struct,
and I'm not sure I want to buy into that.
2005-03-29 00:17:27 +00:00
Tom Lane
119191609c Remove dead push/pop rollback code. Vadim once planned to implement
transaction rollback via UNDO but I think that's highly unlikely to
happen, so we may as well remove the stubs.  (Someday we ought to
rip out the stub xxx_undo routines, too.)  Per Alvaro.
2005-03-28 01:50:34 +00:00
Tom Lane
bf3dbb5881 First steps towards index scans with heap access decoupled from index
access: define new index access method functions 'amgetmulti' that can
fetch multiple TIDs per call.  (The functions exist but are totally
untested as yet.)  Since I was modifying pg_am anyway, remove the
no-longer-needed 'rel' parameter from amcostestimate functions, and
also remove the vestigial amowner column that was creating useless
work for Alvaro's shared-object-dependencies project.
Initdb forced due to changes in pg_am.
2005-03-27 23:53:05 +00:00
Tom Lane
617dd33b6e Eliminate duplicate hasnulls bit testing in index tuple access, and
clean up itup.h a little bit.
2005-03-27 18:38:27 +00:00
Bruce Momjian
b1f57d88f5 Change Win32 O_SYNC method to O_DSYNC because that is what the method
currently does.  This is now the default Win32 wal sync method because
we perfer o_datasync to fsync.

Also, change Win32 fsync to a new wal sync method called
fsync_writethrough because that is the behavior of _commit, which is
what is used for fsync on Win32.

Backpatch to 8.0.X.
2005-03-24 04:36:20 +00:00
Tom Lane
94e03330cb Create a routine PageIndexMultiDelete() that replaces a loop around
PageIndexTupleDelete() with a single pass of compactification ---
logic mostly lifted from PageRepairFragmentation.  I noticed while
profiling that a VACUUM that's cleaning up a whole lot of deleted
tuples would spend as much as a third of its CPU time in
PageIndexTupleDelete; not too surprising considering the loop method
was roughly O(N^2) in the number of tuples involved.
2005-03-22 06:17:03 +00:00
Tom Lane
ee4ddac137 Convert index-related tuple handling routines from char 'n'/' ' to bool
convention for isnull flags.  Also, remove the useless InsertIndexResult
return struct from index AM aminsert calls --- there is no reason for
the caller to know where in the index the tuple was inserted, and we
were wasting a palloc cycle per insert to deliver this uninteresting
value (plus nontrivial complexity in some AMs).
I forced initdb because of the change in the signature of the aminsert
routines, even though nothing really looks at those pg_proc entries...
2005-03-21 01:24:04 +00:00
Neil Conway
fe7015f5e8 Change the return value of HeapTupleSatisfiesUpdate() to be an enum,
rather than an integer, and fix the associated fallout. From Alvaro
Herrera.
2005-03-20 23:40:34 +00:00
Tom Lane
354049c709 Remove unnecessary calls of FlushRelationBuffers: there is no need
to write out data that we are about to tell the filesystem to drop.
smgr_internal_unlink already had a DropRelFileNodeBuffers call to
get rid of dead buffers without a write after it's no longer possible
to roll back the deleting transaction.  Adding a similar call in
smgrtruncate simplifies callers and makes the overall division of
labor clearer.  This patch removes the former behavior that VACUUM
would write all dirty buffers of a relation unconditionally.
2005-03-20 22:00:54 +00:00
Tom Lane
f97aebd162 Revise TupleTableSlot code to avoid unnecessary construction and disassembly
of tuples when passing data up through multiple plan nodes.  A slot can now
hold either a normal "physical" HeapTuple, or a "virtual" tuple consisting
of Datum/isnull arrays.  Upper plan levels can usually just copy the Datum
arrays, avoiding heap_formtuple() and possible subsequent nocachegetattr()
calls to extract the data again.  This work extends Atsushi Ogawa's earlier
patch, which provided the key idea of adding Datum arrays to TupleTableSlots.
(I believe however that something like this was foreseen way back in Berkeley
days --- see the old comment on ExecProject.)  A test case involving many
levels of join of fairly wide tables (about 80 columns altogether) showed
about 3x overall speedup, though simple queries will probably not be
helped very much.

I have also duplicated some code in heaptuple.c in order to provide versions
of heap_formtuple and friends that use "bool" arrays to indicate null
attributes, instead of the old convention of "char" arrays containing either
'n' or ' '.  This provides a better match to the convention used by
ExecEvalExpr.  While I have not made a concerted effort to get rid of uses
of the old routines, I think they should be deprecated and eventually removed.
2005-03-16 21:38:10 +00:00
Tom Lane
a9b05bdc83 Avoid O(N^2) overhead in repeated nocachegetattr calls when columns of
a tuple are being accessed via ExecEvalVar and the attcacheoff shortcut
isn't usable (due to nulls and/or varlena columns).  To do this, cache
Datums extracted from a tuple in the associated TupleTableSlot.
Also some code cleanup in and around the TupleTable handling.
Atsushi Ogawa with some kibitzing by Tom Lane.
2005-03-14 04:41:13 +00:00
Tom Lane
a52b4fb131 Adjust creation/destruction of TupleDesc data structure to reduce the
number of palloc calls.  This has a salutory impact on plpgsql operations
with record variables (which create and destroy tupdescs constantly)
and probably helps a bit in some other cases too.
2005-03-07 04:42:17 +00:00
Tom Lane
4aefe75553 Remove some no-longer-needed kluges for bootstrapping, in particular
the AMI_OVERRIDE flag.  The fact that TransactionLogFetch treats
BootstrapTransactionId as always committed is sufficient to make
bootstrap work, and getting rid of extra tests in heavily used code
paths seems like a win.  The files produced by initdb are demonstrably
the same after this change.
2005-02-20 21:46:50 +00:00
Tom Lane
60b2444cc3 Add code to prevent transaction ID wraparound by enforcing a safe limit
in GetNewTransactionId().  Since the limit value has to be computed
before we run any real transactions, this requires adding code to database
startup to scan pg_database and determine the oldest datfrozenxid.
This can conveniently be combined with the first stage of an attack on
the problem that the 'flat file' copies of pg_shadow and pg_group are
not properly updated during WAL recovery.  The code I've added to
startup resides in a new file src/backend/utils/init/flatfiles.c, and
it is responsible for rewriting the flat files as well as initializing
the XID wraparound limit value.  This will eventually allow us to get
rid of GetRawDatabaseInfo too, but we'll need an initdb so we can add
a trigger to pg_database.
2005-02-20 02:22:07 +00:00
Bruce Momjian
7c44e57331 Move plpgsql DEBUG from DEBUG2 to DEBUG1 because it is a user-requested
DEBUG.

Fix a few places where DEBUG1 crept in that should have been DEBUG2.
2005-02-12 23:53:42 +00:00
Tom Lane
12179c99b1 Marginal hack to merge adjacent ReleaseBuffer/ReadBuffer calls into
ReleaseAndReadBuffer during GIST index searches.  We already did this
in btree and rtree, might as well do it here too.
2005-02-05 19:38:58 +00:00
Neil Conway
a885ecd6ef Change heap_modifytuple() to require a TupleDesc rather than a
Relation. Patch from Alvaro Herrera, minor editorializing by
Neil Conway.
2005-01-27 23:24:11 +00:00
Tom Lane
0ffe9f7946 Fix memory leak in rtdosplit, per report from Clive Page. 2005-01-24 02:47:26 +00:00
Neil Conway
b4297c177c This patch makes some improvements to the rtree index implementation:
(1) Keep a pin on the scan's current buffer and mark buffer. This
avoids the need to do a ReadBuffer() for each tuple produced by the
scan. Since ReadBuffer() is expensive, this is a significant win.

(2) Convert a ReleaseBuffer(); ReadBuffer() pair into
ReleaseAndReadBuffer(). Surely not a huge win, but it saves a lock
acquire/release...

(3) Remove a bunch of duplicated code in rtget.c; make rtnext() handle
both the "initial result" and "subsequent result" cases.

(4) Add support for index tuple killing

(5) Remove rtscancache(): it is dead code, for the same reason that
gistscancache() is dead code (an index scan ought not be invoked with
NoMovementScanDirection).

The end result is about a 10% improvement in rtree index scan perf,
according to contrib/rtree_gist/bench.
2005-01-18 23:25:55 +00:00
Tom Lane
0ce4d56924 Phase 1 of fix for 'SMgrRelation hashtable corrupted' problem. This
is the minimum required fix.  I want to look next at taking advantage of
it by simplifying the message semantics in the shared inval message queue,
but that part can be held over for 8.1 if it turns out too ugly.
2005-01-10 20:02:24 +00:00
Bruce Momjian
2daed8c5b3 Update copyrights that were missed. 2005-01-01 05:43:09 +00:00
PostgreSQL Daemon
2ff501590b Tag appropriate files for rc3
Also performed an initial run through of upgrading our Copyright date to
extend to 2005 ... first run here was very simple ... change everything
where: grep 1996-2004 && the word 'Copyright' ... scanned through the
generated list with 'less' first, and after, to make sure that I only
picked up the right entries ...
2004-12-31 22:04:05 +00:00
Tom Lane
bfa5f30481 Awhile back I added some code to StartupCLOG() to forcibly zero out
the remainder of the current clog page during system startup.  While
this was a good idea, it turns out the code fails if nextXid is
exactly at a page boundary, because we won't have created the "current"
clog page yet in that case.  Since the page will be correctly zeroed
when we execute the first transaction on it, the solution is just to
do nothing when exactly at a page boundary.  Per trouble report from
Dave Hartwig.
2004-12-22 18:45:49 +00:00
Tom Lane
ff5a354ece Fix is-it-time-for-a-checkpoint logic so that checkpoint_segments can
usefully be larger than 255.  Per gripe from Simon Riggs.
2004-12-17 00:10:36 +00:00
Tom Lane
c3d6c7d8f9 Calculation of keys_are_unique flag was wrong for cases involving
redundant cross-datatype comparisons.  Per example from Merlin Moncure.
2004-12-15 19:16:39 +00:00
Tom Lane
5374d097de Change planner to use the current true disk file size as its estimate of
a relation's number of blocks, rather than the possibly-obsolete value
in pg_class.relpages.  Scale the value in pg_class.reltuples correspondingly
to arrive at a hopefully more accurate number of rows.  When pg_class
contains 0/0, estimate a tuple width from the column datatypes and divide
that into current file size to estimate number of rows.  This improved
methodology allows us to jettison the ancient hacks that put bogus default
values into pg_class when a table is first created.  Also, per a suggestion
from Simon, make VACUUM (but not VACUUM FULL or ANALYZE) adjust the value
it puts into pg_class.reltuples to try to represent the mean tuple density
instead of the minimal density that actually prevails just after VACUUM.
These changes alter the plans selected for certain regression tests, so
update the expected files accordingly.  (I removed join_1.out because
it's not clear if it still applies; we can add back any variant versions
as they are shown to be needed.)
2004-12-01 19:00:56 +00:00
Tom Lane
37d693033d Minor adjustment of message style. 2004-11-17 16:26:59 +00:00
Neil Conway
5d1dd2bc55 Micro-optimization of markpos() and restrpos() in btree and hash indexes.
Rather than using ReadBuffer() to increment the reference count on an
already-pinned buffer, we should use IncrBufferRefCount() as it is
faster and does not require acquiring the BufMgrLock.
2004-11-17 03:13:38 +00:00
Neil Conway
b25d23e1e6 Don't allow pg_start_backup() to be invoked if archive_command has not
been defined. Patch from Gavin Sherry, editorializing by Neil Conway.
2004-11-17 02:22:54 +00:00
Neil Conway
a236dd9536 There is no need for ReadBuffer() call sites to check that the returned
buffer is valid, as ReadBuffer() will elog on error. Most of the call
sites of ReadBuffer() got this right, but this patch fixes those call
sites that did not.
2004-11-14 02:04:14 +00:00
Neil Conway
4d0f669f3c Remove obsolete comment from btbuild() and hashbuild(): we no longer use
a global variable to control building indexes.
2004-11-11 00:32:50 +00:00
Peter Eisentraut
0ed3c7665e Small message clarifications 2004-11-05 17:11:34 +00:00
Tom Lane
88868d4fbc Change COMMIT back to the old behavior of emitting command tag COMMIT,
not ROLLBACK, for the case of COMMIT outside a transaction block.
Alvaro Herrera
2004-10-30 20:44:43 +00:00
Tom Lane
23f264d125 Rearrange order of pre-commit operations: must close cursors before doing
ON COMMIT actions.  Per bug report from Michael Guerin.
2004-10-29 22:19:53 +00:00
Tom Lane
ee69be44d5 Add DEBUG1-level logging of checkpoint start and end. Also, reduce the
'recycled log files' and 'removed log files' messages from DEBUG1 to
DEBUG2, replacing them with a count of files added/removed/recycled in
the checkpoint end message, as per suggestion from Simon Riggs.
2004-10-29 00:16:08 +00:00
Tom Lane
83cd2d8b0f Make heap_fetch API more consistent by having the buffer remain pinned
in all cases when keep_buf = true.  This allows ANALYZE's inner loop to
use heap_release_fetch, which saves multiple buffer lookups for the same
page and avoids overestimation of cost by the vacuum cost mechanism.
2004-10-26 16:05:03 +00:00
Tom Lane
fb22b32095 Allow functions returning void or cstring to appear in FROM clause,
to make life cushy for the JDBC driver.  Centralize the decision-making
that affects this by inventing a get_type_func_class() function, rather
than adding special cases in half a dozen places.
2004-10-20 16:04:50 +00:00
Tom Lane
fdd13f1568 Give the ResourceOwner mechanism full responsibility for releasing buffer
pins at end of transaction, and reduce AtEOXact_Buffers to an Assert
cross-check that this was done correctly.  When not USE_ASSERT_CHECKING,
AtEOXact_Buffers is a complete no-op.  This gets rid of an O(NBuffers)
bottleneck during transaction commit/abort, which recent testing has shown
becomes significant above a few tens of thousands of shared buffers.
2004-10-16 18:57:26 +00:00
Tom Lane
9ffc8ed58b Repair possible failure to update hint bits back to disk, per
http://archives.postgresql.org/pgsql-hackers/2004-10/msg00464.php.
This fix is intended to be permanent: it moves the responsibility for
calling SetBufferCommitInfoNeedsSave() into the tqual.c routines,
eliminating the requirement for callers to test whether t_infomask changed.
Also, tighten validity checking on buffer IDs in bufmgr.c --- several
routines were paranoid about out-of-range shared buffer numbers but not
about out-of-range local ones, which seems a tad pointless.
2004-10-15 22:40:29 +00:00
Bruce Momjian
5c267325ec Add 'int' cast for getpid() because some Solaris releases return long
for getpid().
2004-10-14 20:23:46 +00:00
Peter Eisentraut
0fd37839d9 Message style revisions 2004-10-12 21:54:45 +00:00
Bruce Momjian
67608a393b Make getpid() use %d consistently for printing. 2004-10-09 02:46:42 +00:00
Bruce Momjian
a5d7ba773d Adjust comments previously moved to column 1 by pgident. 2004-10-07 15:21:58 +00:00
Tom Lane
4c77cbb272 PortalRun must guard against the possibility that the portal it's
running contains VACUUM or a similar command that will internally start
and commit transactions.  In such a case, the original caller values of
CurrentMemoryContext and CurrentResourceOwner will point to objects that
will be destroyed by the internal commit.  We must restore these pointers
to point to the newly-manufactured transaction context and resource owner,
rather than possibly pointing to deleted memory.
Also tweak xact.c so that AbortTransaction and AbortSubTransaction
forcibly restore a sane value for CurrentResourceOwner, much as they
have always done for CurrentMemoryContext.  I'm not certain this is
necessary but I'm feeling paranoid today.
Responds to Sean Chittenden's bug report of 4-Oct.
2004-10-04 21:52:15 +00:00
Tom Lane
4c5e810fcd Code review for NOWAIT patch: downgrade NOWAIT from fully reserved keyword
to unreserved keyword, use ereport not elog, assign a separate error code
for 'could not obtain lock' so that applications will be able to detect
that case cleanly.
2004-10-01 16:40:05 +00:00
Tom Lane
d2af5f8a3e Adjust index locking rules as per my proposal of earlier today. You
now are supposed to take some kind of lock on an index whenever you
are going to access the index contents, rather than relying only on a
lock on the parent table.
2004-09-30 23:21:26 +00:00
Neil Conway
0ed07d49d5 Code cleanup: don't bother casting the argument to pfree() to void *
from another pointer type. Per C89, this is unnecessary, and it is common
practice throughout the rest of the tree anyway.
2004-09-27 04:01:23 +00:00
Tom Lane
054b78ba38 Now that xmax and cmin are distinct fields again, we should zero xmax when
creating a new tuple.  This is just for debugging sanity, though, since
nothing should be paying any attention to xmax when the HEAP_XMAX_INVALID
bit is set.
2004-09-17 18:09:55 +00:00
Tom Lane
257cccbe5e Add some marginal tweaks to eliminate memory leakages associated with
subtransactions.  Trivial subxacts (such as a plpgsql exception block
containing no database access) now demonstrably leak zero bytes.
2004-09-16 20:17:49 +00:00
Tom Lane
86fff990b2 RecentXmin is too recent to use as the cutoff point for accessing
pg_subtrans --- what we need is the oldest xmin of any snapshot in use
in the current top transaction.  Introduce a new variable TransactionXmin
to play this role.  Fixes intermittent regression failure reported by
Neil Conway.
2004-09-16 18:35:23 +00:00
Tom Lane
8f9f198603 Restructure subtransaction handling to reduce resource consumption,
as per recent discussions.  Invent SubTransactionIds that are managed like
CommandIds (ie, counter is reset at start of each top transaction), and
use these instead of TransactionIds to keep track of subtransaction status
in those modules that need it.  This means that a subtransaction does not
need an XID unless it actually inserts/modifies rows in the database.
Accordingly, don't assign it an XID nor take a lock on the XID until it
tries to do that.  This saves a lot of overhead for subtransactions that
are only used for error recovery (eg plpgsql exceptions).  Also, arrange
to release a subtransaction's XID lock as soon as the subtransaction
exits, in both the commit and abort cases.  This avoids holding many
unique locks after a long series of subtransactions.  The price is some
additional overhead in XactLockTableWait, but that seems acceptable.
Finally, restructure the state machine in xact.c to have a more orthogonal
set of states for subtransactions.
2004-09-16 16:58:44 +00:00
Tom Lane
b2c4071299 Redesign query-snapshot timing so that volatile functions in READ COMMITTED
mode see a fresh snapshot for each command in the function, rather than
using the latest interactive command's snapshot.  Also, suppress fresh
snapshots as well as CommandCounterIncrement inside STABLE and IMMUTABLE
functions, instead using the snapshot taken for the most closely nested
regular query.  (This behavior is only sane for read-only functions, so
the patch also enforces that such functions contain only SELECT commands.)
As per my proposal of 6-Sep-2004; I note that I floated essentially the
same proposal on 19-Jun-2002, but that discussion tailed off without any
action.  Since 8.0 seems like the right place to be taking possibly
nontrivial backwards compatibility hits, let's get it done now.
2004-09-13 20:10:13 +00:00
Tom Lane
493f72606b Renumber SnapshotNow and the other special snapshot codes so that
((Snapshot) NULL) can no longer be confused with a valid snapshot,
as per my recent suggestion.  Define a macro InvalidSnapshot for 0.
Use InvalidSnapshot instead of SnapshotAny as the do-nothing special
case for heap_update and heap_delete crosschecks; this seems a little
cleaner even though the behavior is really the same.
2004-09-11 18:28:34 +00:00
Tom Lane
b339d1fff6 Fire non-deferred AFTER triggers immediately upon query completion,
rather than when returning to the idle loop.  This makes no particular
difference for interactively-issued queries, but it makes a big difference
for queries issued within functions: trigger execution now occurs before
the calling function is allowed to proceed.  This responds to numerous
complaints about nonintuitive behavior of foreign key checking, such as
http://archives.postgresql.org/pgsql-bugs/2004-09/msg00020.php, and
appears to be required by the SQL99 spec.
Also take the opportunity to simplify the data structures used for the
pending-trigger list, rename them for more clarity, and squeeze out a
bit of space.
2004-09-10 18:40:09 +00:00
Tom Lane
23645f0582 Fix incorrect ordering of smgr cleanup relative to buffer pin cleanup
during transaction abort.  Add a regression test case to catch related
mistakes in future.  Alvaro Herrera and Tom Lane.
2004-09-06 17:56:33 +00:00
Tom Lane
e32bba202d Downgrade LOG messages to DEBUG1 for normal recycling of xlog, clog,
subtrans segments.  Per Greg Mullane and Chris K-L.
2004-09-06 03:04:27 +00:00
Tom Lane
64cb889106 Ensure that the remainder of the current pg_clog page is zeroed during
startup, just to be sure that there's no leftover junk there.
2004-08-30 19:00:42 +00:00
Tom Lane
9cf4eaa00b Fix failure to advance nextXID beyond subtransactions whose XIDs appear
only within COMMIT or ABORT records.
2004-08-30 19:00:03 +00:00
Bruce Momjian
15d3f9f6b7 Another pgindent run with lib typedefs added. 2004-08-30 02:54:42 +00:00
Tom Lane
50742aed68 Add WAL logging for CREATE/DROP DATABASE and CREATE/DROP TABLESPACE.
Fix TablespaceCreateDbspace() to be able to create a dummy directory
in place of a dropped tablespace's symlink.  This eliminates the open
problem of a PANIC during WAL replay when a replayed action attempts
to touch a file in a since-deleted tablespace.  It also makes for a
significant improvement in the usability of PITR replay.
2004-08-29 21:08:48 +00:00
Tom Lane
0ffe11abd3 Widen xl_len field of XLogRecord header to 32 bits, so that we'll have
a more tolerable limit on the number of subtransactions or deleted files
in COMMIT and ABORT records.  Buy back the extra space by eliminating the
xl_xact_prev field, which isn't being used for anything and is rather
unlikely ever to be used for anything.
This does not force initdb, but you do need to do pg_resetxlog if you
want to upgrade an existing 8.0 installation without initdb.
2004-08-29 16:34:48 +00:00
Bruce Momjian
b6b71b85bc Pgindent run for 8.0. 2004-08-29 05:07:03 +00:00
Bruce Momjian
da9a8649d8 Update copyright to 2004. 2004-08-29 04:13:13 +00:00
Tom Lane
f78ecbf20e Now that TransactionIdDidAbort doesn't think it should try to modify
pg_clog, there's no reason to do abort marking of subtransactions in a
nonintuitive order.
2004-08-28 22:04:12 +00:00
Tom Lane
7531d2fd85 Add missing Assert to make TransactionIdDidAbort more consistent with
TransactionIdDidCommit.
2004-08-28 21:58:59 +00:00
Tom Lane
1c72d0dec1 Fix relcache to account properly for subtransaction status of 'new'
relcache entries.  Also, change TransactionIdIsCurrentTransactionId()
so that if consulted during transaction abort, it will not say that
the aborted xact is still current.  (It would be better to ensure that
it's never called at all during abort, but I'm not sure we can easily
guarantee that.)  In combination, these fix a crash we have seen
occasionally during parallel regression tests of 8.0.
2004-08-28 20:31:44 +00:00
Tom Lane
f444dafab0 Can't truncate pg_subtrans during a recovery checkpoint --- subtrans
module isn't fully initialized yet.
2004-08-28 18:18:03 +00:00
Tom Lane
fe455ee1d4 Revise ResourceOwner code to avoid accumulating ResourceOwner objects
for every command executed within a transaction.  For long transactions
this was a significant memory leak.  Instead, we can delete a portal's
or subtransaction's ResourceOwner immediately, if we physically transfer
the information about its locks up to the parent owner.  This does not
fully solve the leak problem; we need to do something about counting
multiple acquisitions of the same lock in order to fix it.  But it's a
necessary step along the way.
2004-08-25 18:43:43 +00:00
Tom Lane
4dbb880d3c Rearrange pg_subtrans handling as per recent discussion. pg_subtrans
updates are no longer WAL-logged nor even fsync'd; we do not need to,
since after a crash no old pg_subtrans data is needed again.  We truncate
pg_subtrans to RecentGlobalXmin at each checkpoint.  slru.c's API is
refactored a little bit to separate out the necessary decisions.
2004-08-23 23:22:45 +00:00
Tom Lane
f009c316ba Tweak code so that pg_subtrans is never consulted for XIDs older than
RecentXmin (== MyProc->xmin).  This ensures that it will be safe to
truncate pg_subtrans at RecentGlobalXmin, which should largely eliminate
any fear of bloat.  Along the way, eliminate SubTransXidsHaveCommonAncestor,
which isn't really needed and could not give a trustworthy result anyway
under the lookback restriction.
In an unrelated but nearby change, #ifdef out GetUndoRecPtr, which has
been dead code since 2001 and seems unlikely to ever be resurrected.
2004-08-22 02:41:58 +00:00
Tom Lane
19cd31b068 Fix bug introduced into _bt_getstackbuf() on 2003-Feb-21: the initial
value of 'start' could be past the end of the page, if the page was
split by some concurrent inserting process since we visited it.  In
this situation the code could look at bogus entries and possibly find
a match (since after all those entries still contain what they had
before the split).  This would lead to 'specified item offset is too large'
followed by 'PANIC: failed to add item to the page', as reported by Joe
Conway for scenarios involving heavy concurrent insertion activity.
2004-08-17 23:15:33 +00:00
Tom Lane
1a3de15a3a Dept. of further reflection: I looked around to see if any other callers
of XLogInsert had the same sort of checkpoint interlock problem as
RecordTransactionCommit, and indeed I found some.  Btree index build
and ALTER TABLE SET TABLESPACE write data outside the friendly confines
of the buffer manager, and therefore they have to take their own
responsibility for checkpoint interlock.  The easiest solution seems to
be to force smgrimmedsync at the end of the index build or table copy,
even when the operation is being WAL-logged.  This is sufficient since
the new index or table will be of interest to no one if we don't get
as far as committing the current transaction.
2004-08-15 23:44:46 +00:00
Bruce Momjian
10249abfa1 Cleanup Win32 COPY handling, and move archive examples to SGML. 2004-08-12 19:03:44 +00:00
Bruce Momjian
43ea65a0dc Add mention of "WIN32" COPY. 2004-08-12 18:34:45 +00:00
Bruce Momjian
6525b42b10 Add make_native_path() because Win32 COPY is an internal CMD.EXE command
and doesn't process forward slashes in the same way as external
commands.  Quoting the first argument to COPY does not convert forward
to backward slashes, but COPY does properly process quoted forward
slashes in the second argument.

Win32 COPY works with quoted forward slashes in the first argument only if the
current directory is the same as the directory of the first argument.
2004-08-12 18:32:52 +00:00
Tom Lane
3fdf649f4f Fix failure to guarantee that a checkpoint will write out pg_clog updates
for transaction commits that occurred just before the checkpoint.  This is
an EXTREMELY serious bug --- kudos to Satoshi Okada for creating a
reproducible test case to prove its existence.
2004-08-11 04:07:16 +00:00
Tom Lane
35f539b481 When expanding %p in archive_command or restore_command, translate
slashes to backslashes #ifdef WIN32.  This is to cope with the fact
that Windows seems exceedingly unfriendly to slashes in shell commands,
as per recent discussion.
2004-08-09 16:26:06 +00:00
Tom Lane
7dca975c5d Add a comment about why we always replay backup blocks from WAL. 2004-08-08 03:22:08 +00:00
Tom Lane
fcbc438727 Label CVS tip as 8.0devel instead of 7.5devel. Adjust various comments
and documentation to reference 8.0 instead of 7.5.
2004-08-04 21:34:35 +00:00
Tom Lane
b387d16f96 Make use of backup label/history files to control recovery properly. 2004-08-04 16:25:02 +00:00
Tom Lane
58c41712d5 Add functions pg_start_backup, pg_stop_backup to create backup label
and history files as per recent discussion.  While at it, remove
pg_terminate_backend, since we have decided we do not have time during
this release cycle to address the reliability concerns it creates.
Split the 'Miscellaneous Functions' documentation section into
'System Information Functions' and 'System Administration Functions',
which hopefully will draw the eyes of those looking for such things.
2004-08-03 20:32:36 +00:00
Tom Lane
a83c45c4c6 Fix misplacement of savepointLevel test, per report from Chris K-L. 2004-08-03 15:57:26 +00:00
Tom Lane
410b1dfb88 Update the in-code documentation about the transaction system. Move it
into a README file instead of being in xact.c's header comment.
Alvaro Herrera.
2004-08-01 20:57:59 +00:00
Tom Lane
5cc380f9a3 Error message style adjustments, per Alvaro Herrera. 2004-08-01 17:45:43 +00:00
Tom Lane
efcaf1e868 Some mop-up work for savepoints (nested transactions). Store a small
number of active subtransaction XIDs in each backend's PGPROC entry,
and use this to avoid expensive probes into pg_subtrans during
TransactionIdIsInProgress.  Extend EOXactCallback API to allow add-on
modules to get control at subxact start/end.  (This is deliberately
not compatible with the former API, since any uses of that API probably
need manual review anyway.)  Add basic reference documentation for
SAVEPOINT and related commands.  Minor other cleanups to check off some
of the open issues for subtransactions.
Alvaro Herrera and Tom Lane.
2004-08-01 17:32:22 +00:00
Tom Lane
beda4814c1 plpgsql does exceptions.
There are still some things that need refinement; in particular I fear
that the recognized set of error condition names probably has little in
common with what Oracle recognizes.  But it's a start.
2004-07-31 07:39:21 +00:00
Tom Lane
1bf3d61504 Fix subtransaction behavior for large objects, temp namespace, files,
password/group files.  Also allow read-only subtransactions of a read-write
parent, but not vice versa.  These are the reasonably noncontroversial
parts of Alvaro's recent mop-up patch, plus further work on large objects
to minimize use of the TopTransactionResourceOwner.
2004-07-28 14:23:31 +00:00
Tom Lane
cc813fc2b8 Replace nested-BEGIN syntax for subtransactions with spec-compliant
SAVEPOINT/RELEASE/ROLLBACK-TO syntax.  (Alvaro)
Cause COMMIT of a failed transaction to report ROLLBACK instead of
COMMIT in its command tag.  (Tom)
Fix a few loose ends in the nested-transactions stuff.
2004-07-27 05:11:48 +00:00
Tom Lane
acd907bfcc Add cross-check that current timeline of pg_control is an ancestor of
recovery_target_timeline --- otherwise there is no path from the backup
to the requested timeline.  This check was foreseen in the original
discussion but I forgot to implement it.
2004-07-22 21:09:37 +00:00
Tom Lane
3dba9cb694 Add a check on file size as an additional safety check that a WAL file
recovered from archive is not corrupt.  It's not much but it will catch
one common problem, viz out-of-disk-space.
Also, force a WAL recovery scan when recovery.conf is present, even if
pg_control shows a clean shutdown.  This allows recovery with a tar backup
that was taken with the postmaster shut down, as per complaint from
Mark Kirkwood.
2004-07-22 20:18:40 +00:00
Tom Lane
2042b3428d Invent WAL timelines, as per recent discussion, to make point-in-time
recovery more manageable.  Also, undo recent change to add FILE_HEADER
and WASTED_SPACE records to XLOG; instead make the XLOG page header
variable-size with extra fields in the first page of an XLOG file.
This should fix the boundary-case bugs observed by Mark Kirkwood.
initdb forced due to change of XLOG representation.
2004-07-21 22:31:26 +00:00
Tom Lane
9c7a765f02 Remove unportable use of strptime() to parse recovery target time spec.
Instead use our own abstimein code, which is more flexible anyway.
2004-07-19 14:34:39 +00:00
Tom Lane
66ec2db728 XLOG file archiving and point-in-time recovery. There are still some
loose ends and a glaring lack of documentation, but it basically works.

Simon Riggs with some editorialization by Tom Lane.
2004-07-19 02:47:16 +00:00
Tom Lane
fe548629c5 Invent ResourceOwner mechanism as per my recent proposal, and use it to
keep track of portal-related resources separately from transaction-related
resources.  This allows cursors to work in a somewhat sane fashion with
nested transactions.  For now, cursor behavior is non-subtransactional,
that is a cursor's state does not roll back if you abort a subtransaction
that fetched from the cursor.  We might want to change that later.
2004-07-17 03:32:14 +00:00
Tom Lane
94d4d240bb Rename XLOG_BTREE_NEWPAGE xlog record type into XLOG_HEAP_NEWPAGE, and
shift support code into heapam.c accordingly.  This is in service of
soon-to-be-committed ALTER TABLE SET TABLESPACE code that will want to
use this same record type for both heaps and indexes.

Theoretically I should have forced initdb for this, but in practice there
is no change in xlog contents because CVS tip will never really emit this
record type anyhow...
2004-07-11 18:01:45 +00:00
Tom Lane
f5c798ee82 Fix no-longer-correct bit-pushing in TransactionIdSetStatus, per Alvaro. 2004-07-03 02:55:56 +00:00
Tom Lane
b6197fe069 Further review of xact.c state machine for nested transactions. Fix
problems with starting subtransactions inside already-failed transactions.
Clean up some comments.
2004-07-01 20:11:03 +00:00
Tom Lane
573a71a5da Nested transactions. There is still much left to do, especially on the
performance front, but with feature freeze upon us I think it's time to
drive a stake in the ground and say that this will be in 7.5.

Alvaro Herrera, with some help from Tom Lane.
2004-07-01 00:52:04 +00:00
Tom Lane
2467394ee1 Tablespaces. Alternate database locations are dead, long live tablespaces.
There are various things left to do: contrib dbsize and oid2name modules
need work, and so does the documentation.  Also someone should think about
COMMENT ON TABLESPACE and maybe RENAME TABLESPACE.  Also initlocation is
dead, it just doesn't know it yet.

Gavin Sherry and Tom Lane.
2004-06-18 06:14:31 +00:00
Tom Lane
950d047ec5 Give inet/cidr datatypes their own hash function that ignores the inet vs
cidr type bit, the same as network_eq does.  This is needed for hash joins
and hash aggregation to work correctly on these types.  Per bug report
from Michael Fuhr, 2004-04-13.
Also, improve hash function for int8 as suggested by Greg Stark.
2004-06-13 21:57:28 +00:00
Tom Lane
c541bb86e9 Infrastructure for I/O of composite types: arrange for the I/O routines
of a composite type to get that type's OID as their second parameter,
in place of typelem which is useless.  The actual changes are mostly
centralized in getTypeInputInfo and siblings, but I had to fix a few
places that were fetching pg_type.typelem for themselves instead of
using the lsyscache.c routines.  Also, I renamed all the related variables
from 'typelem' to 'typioparam' to discourage people from assuming that
they necessarily contain array element types.
2004-06-06 00:41:28 +00:00
Tom Lane
c3a153afed Tweak palloc/repalloc to allow zero bytes to be requested, as per recent
proposal.  Eliminate several dozen now-unnecessary hacks to avoid palloc(0).
(It's likely there are more that I didn't find.)
2004-06-05 19:48:09 +00:00
Tom Lane
ae93e5fd6e Make the world very nearly safe for composite-type columns in tables.
1. Solve the problem of not having TOAST references hiding inside composite
values by establishing the rule that toasting only goes one level deep:
a tuple can contain toasted fields, but a composite-type datum that is
to be inserted into a tuple cannot.  Enforcing this in heap_formtuple
is relatively cheap and it avoids a large increase in the cost of running
the tuptoaster during final storage of a row.
2. Fix some interesting problems in expansion of inherited queries that
reference whole-row variables.  We never really did this correctly before,
but it's now relatively painless to solve by expanding the parent's
whole-row Var into a RowExpr() selecting the proper columns from the
child.
If you dike out the preventive check in CheckAttributeType(),
composite-type columns now seem to actually work.  However, we surely
cannot ship them like this --- without I/O for composite types, you
can't get pg_dump to dump tables containing them.  So a little more
work still to do.
2004-06-05 01:55:05 +00:00
Tom Lane
8f2ea8b7b5 Resurrect heap_deformtuple(), this time implemented as a singly nested
loop over the fields instead of a loop around heap_getattr.  This is
considerably faster (O(N) instead of O(N^2)) when there are nulls or
varlena fields, since those prevent use of attcacheoff.  Replace loops
over heap_getattr with heap_deformtuple in situations where all or most
of the fields have to be fetched, such as printtup and tuptoaster.
Profiling done more than a year ago shows that this should be a nice
win for situations involving many-column tables.
2004-06-04 20:35:21 +00:00
Tom Lane
921d749bd4 Adjust our timezone library to use pg_time_t (typedef'd as int64) in
place of time_t, as per prior discussion.  The behavior does not change
on machines without a 64-bit-int type, but on machines with one, which
is most, we are rid of the bizarre boundary behavior at the edges of
the 32-bit-time_t range (1901 and 2038).  The system will now treat
times over the full supported timestamp range as being in your local
time zone.  It may seem a little bizarre to consider that times in
4000 BC are PST or EST, but this is surely at least as reasonable as
propagating Gregorian calendar rules back that far.

I did not modify the format of the zic timezone database files, which
means that for the moment the system will not know about daylight-savings
periods outside the range 1901-2038.  Given the way the files are set up,
it's not a simple decision like 'widen to 64 bits'; we have to actually
think about the range of years that need to be supported.  We should
probably inquire what the plans of the upstream zic people are before
making any decisions of our own.
2004-06-03 02:08:07 +00:00
Tom Lane
2095206de1 Adjust btree index build to not use shared buffers, thereby avoiding the
locking conflict against concurrent CHECKPOINT that was discussed a few
weeks ago.  Also, if not using WAL archiving (which is always true ATM
but won't be if PITR makes it into this release), there's no need to
WAL-log the index build process; it's sufficient to force-fsync the
completed index before commit.  This seems to gain about a factor of 2
in my tests, which is consistent with writing half as much data.  I did
not try it with WAL on a separate drive though --- probably the gain would
be a lot less in that scenario.
2004-06-02 17:28:18 +00:00
Tom Lane
e674707968 Minor code rationalization: FlushRelationBuffers just returns void,
rather than an error code, and does elog(ERROR) not elog(WARNING)
when it detects a problem.  All callers were simply elog(ERROR)'ing on
failure return anyway, and I find it hard to envision a caller that would
not, so we may as well simplify the callers and produce the more useful
error message directly.
2004-05-31 19:24:05 +00:00
Tom Lane
9b178555fc Per previous discussions, get rid of use of sync(2) in favor of
explicitly fsync'ing every (non-temp) file we have written since the
last checkpoint.  In the vast majority of cases, the burden of the
fsyncs should fall on the bgwriter process not on backends.  (To this
end, we assume that an fsync issued by the bgwriter will force out
blocks written to the same file by other processes using other file
descriptors.  Anyone have a problem with that?)  This makes the world
safe for WIN32, which ain't even got sync(2), and really makes the world
safe for Unixen as well, because sync(2) never had the semantics we need:
it offers no way to wait for the requested I/O to finish.

Along the way, fix a bug I recently introduced in xlog recovery:
file truncation replay failed to clear bufmgr buffers for the dropped
blocks, which could result in 'PANIC:  heap_delete_redo: no block'
later on in xlog replay.
2004-05-31 03:48:10 +00:00
Neil Conway
72b6ad6313 Use the new List API function names throughout the backend, and disable the
list compatibility API by default. While doing this, I decided to keep
the llast() macro around and introduce llast_int() and llast_oid() variants.
2004-05-30 23:40:41 +00:00
Tom Lane
076a055acf Separate out bgwriter code into a logically separate module, rather
than being random pieces of other files.  Give bgwriter responsibility
for all checkpoint activity (other than a post-recovery checkpoint);
so this child process absorbs the functionality of the former transient
checkpoint and shutdown subprocesses.  While at it, create an actual
include file for postmaster.c, which for some reason never had its own
file before.
2004-05-29 22:48:23 +00:00
Tom Lane
1a321f26d8 Code review for EXEC_BACKEND changes. Reduce the number of #ifdefs by
about a third, make it work on non-Windows platforms again.  (But perhaps
I broke the WIN32 code, since I have no way to test that.)  Fold all the
paths that fork postmaster child processes to go through the single
routine SubPostmasterMain, which takes care of resurrecting the state that
would normally be inherited from the postmaster (including GUC variables).
Clean up some places where there's no particularly good reason for the
EXEC and non-EXEC cases to work differently.  Take care of one or two
FIXMEs that remained in the code.
2004-05-28 05:13:32 +00:00
Tom Lane
16974ee910 Get rid of the former rather baroque mechanism for propagating the values
of ThisStartUpID and RedoRecPtr into new backends.  It's a lot easier just
to make them all grab the values out of shared memory during startup.
This helps to decouple the postmaster from checkpoint execution, which I
need since I'm intending to let the bgwriter do it instead, and it also
fixes a bug in the Win32 port: ThisStartUpID wasn't getting propagated at
all AFAICS.  (Doesn't give me a lot of faith in the amount of testing that
port has gotten.)
2004-05-27 17:12:57 +00:00
Neil Conway
d0b4399d81 Reimplement the linked list data structure used throughout the backend.
In the past, we used a 'Lispy' linked list implementation: a "list" was
merely a pointer to the head node of the list. The problem with that
design is that it makes lappend() and length() linear time. This patch
fixes that problem (and others) by maintaining a count of the list
length and a pointer to the tail node along with each head node pointer.
A "list" is now a pointer to a structure containing some meta-data
about the list; the head and tail pointers in that structure refer
to ListCell structures that maintain the actual linked list of nodes.

The function names of the list API have also been changed to, I hope,
be more logically consistent. By default, the old function names are
still available; they will be disabled-by-default once the rest of
the tree has been updated to use the new API names.
2004-05-26 04:41:50 +00:00
Tom Lane
4d86ae4260 For multi-table ANALYZE, use per-table transactions when possible
(ie, when not inside a transaction block), so that we can avoid holding
locks longer than necessary.  Per trouble report from Philip Warner.
2004-05-22 23:14:38 +00:00
Tom Lane
e6319d1d28 Put back #include <sys/time.h> in files that seem to need it on Linux. 2004-05-21 16:08:47 +00:00
Tom Lane
63bd0db121 Integrate src/timezone library for all platforms. There is more we can
and should do now that we control our own destiny for timezone handling,
but this commit gets the bulk of the picayune diffs in place.
Magnus Hagander and Tom Lane.
2004-05-21 05:08:06 +00:00
Tom Lane
868404b859 Fix speling. 2004-05-20 15:07:30 +00:00
Tom Lane
4af3421161 Get rid of rd_nblocks field in relcache entries. Turns out this was
costing us lots more to maintain than it was worth.  On shared tables
it was of exactly zero benefit because we couldn't trust it to be
up to date.  On temp tables it sometimes saved an lseek, but not often
enough to be worth getting excited about.  And the real problem was that
we forced an lseek on every relcache flush in order to update the field.
So all in all it seems best to lose the complexity.
2004-05-08 19:09:25 +00:00
Tom Lane
0bd61548ab Solve the 'Turkish problem' with undesirable locale behavior for case
conversion of basic ASCII letters.  Remove all uses of strcasecmp and
strncasecmp in favor of new functions pg_strcasecmp and pg_strncasecmp;
remove most but not all direct uses of toupper and tolower in favor of
pg_toupper and pg_tolower.  These functions use the same notions of
case folding already developed for identifier case conversion.  I left
the straight locale-based folding in place for situations where we are
just manipulating user data and not trying to match it to built-in
strings --- for example, the SQL upper() function is still locale
dependent.  Perhaps this will prove not to be what's wanted, but at
the moment we can initdb and pass regression tests in Turkish locale.
2004-05-07 00:24:59 +00:00
Tom Lane
37fa3b6c89 Tweak indexscan and seqscan code to arrange that steps from one page to
the next are handled by ReleaseAndReadBuffer rather than separate
ReleaseBuffer and ReadBuffer calls.  This cuts the number of acquisitions
of the BufMgrLock by a factor of 2 (possibly more, if an indexscan happens
to pull successive rows from the same heap page).  Unfortunately this
doesn't seem enough to get us out of the recently discussed context-switch
storm problem, but it's surely worth doing anyway.
2004-04-21 18:24:26 +00:00
Bruce Momjian
31338352bd * Most changes are to fix warnings issued when compiling win32
* removed a few redundant defines
* get_user_name safe under win32
* rationalized pipe read EOF for win32 (UPDATED PATCH USED)
* changed all backend instances of sleep() to pg_usleep

    - except for the SLEEP_ON_ASSERT in assert.c, as it would exceed a
32-bit long [Note to patcher: If a SLEEP_ON_ASSERT of 2000 seconds is
acceptable, please replace with pg_usleep(2000000000L)]

I added a comment to that part of the code:

    /*
     *  It would be nice to use pg_usleep() here, but only does 2000 sec
     *  or 33 minutes, which seems too short.
     */
    sleep(1000000);

Claudio Natoli
2004-04-19 17:42:59 +00:00
Bruce Momjian
823ac7c2b4 This is a cleanup patch for access/transam/xact.c. It only removes some
#ifdef NOT_USED code, and adds a new TBLOCK state which signals the fact
that StartTransaction() has been executed.

Alvaro Herrera
2004-04-05 03:11:39 +00:00
Tom Lane
375369acd1 Replace TupleTableSlot convention for whole-row variables and function
results with tuples as ordinary varlena Datums.  This commit does not
in itself do much for us, except eliminate the horrid memory leak
associated with evaluation of whole-row variables.  However, it lays the
groundwork for allowing composite types as table columns, and perhaps
some other useful features as well.  Per my proposal of a few days ago.
2004-04-01 21:28:47 +00:00
Teodor Sigaev
f2c064afcb Cleanup vectors of GISTENTRY and eliminate problem with 64-bit strict-aligned
boxes. Change interface to user-defined GiST support methods union and
picksplit. Now instead of bytea struct it used special GistEntryVector
structure.
2004-03-30 15:45:33 +00:00
Bruce Momjian
6367ed4382 Increase xlog str_time() static string variable, per Korean User's Group. 2004-03-22 04:16:57 +00:00
Tatsuo Ishii
0b86ade1c2 Add NOWAIT option to LOCK command 2004-03-11 01:47:41 +00:00
Tom Lane
7a57a67278 Replace opendir/closedir calls throughout the backend with AllocateDir
and FreeDir routines modeled on the existing AllocateFile/FreeFile.
Like the latter, these routines will avoid failing on EMFILE/ENFILE
conditions whenever possible, and will prevent leakage of directory
descriptors if an elog() occurs while one is open.
Also, reduce PANIC to ERROR in MoveOfflineLogs() --- this is not
critical code and there is no reason to force a DB restart on failure.
All per recent trouble report from Olivier Hubaut.
2004-02-23 23:03:10 +00:00
Bruce Momjian
1f17316a3d Here is an updated version of the win32 readdir patch.
1) Now puts in exactly the same change as the current-cvs mingw code
does. (see
http://cvs.sourceforge.net/viewcvs.py/mingw/runtime/mingwex/dirent.c?r1=
1.3&r2=1.4, second part of the patch).

2) Updates both xlog.c and slru.c in backend/access/transam/

3) Also updates pg_resetxlog, which also uses readdir() and checks the
errno value after the loop.

Magnus Hagander
2004-02-17 03:45:17 +00:00
Tom Lane
c3c09be34b Commit the reasonably uncontroversial parts of J.R. Nield's PITR patch, to
wit: Add a header record to each WAL segment file so that it can be reliably
identified.  Avoid splitting WAL records across segment files (this is not
strictly necessary, but makes it simpler to incorporate the header records).
Make WAL entries for file creation, deletion, and truncation (as foreseen but
never implemented by Vadim).  Also, add support for making XLOG_SEG_SIZE
configurable at compile time, similarly to BLCKSZ.  Fix a couple bugs I
introduced in WAL replay during recent smgr API changes.  initdb is forced
due to changes in pg_control contents.
2004-02-11 22:55:26 +00:00
Tom Lane
58f337a343 Centralize implementation of delay code by creating a pg_usleep()
subroutine in src/port/pgsleep.c.  Remove platform dependencies from
miscadmin.h and put them in port.h where they belong.  Extend recent
vacuum cost-based-delay patch to apply to VACUUM FULL, ANALYZE, and
non-btree index vacuuming.

By the way, where is the documentation for the cost-based-delay patch?
2004-02-10 03:42:45 +00:00
Tom Lane
87bd956385 Restructure smgr API as per recent proposal. smgr no longer depends on
the relcache, and so the notion of 'blind write' is gone.  This should
improve efficiency in bgwriter and background checkpoint processes.
Internal restructuring in md.c to remove the not-very-useful array of
MdfdVec objects --- might as well just use pointers.
Also remove the long-dead 'persistent main memory' storage manager (mm.c),
since it seems quite unlikely to ever get resurrected.
2004-02-10 01:55:27 +00:00